
Virtual Me

2010

Faculty of Engineering Computer Engineering Department

Virtual ME
A Graduation Project Report Presented in Partial Fulfillment of the B.Sc. Degree in Computer Engineering

Submitted by

Menna Hamza

Mohamad Hesham

Mona AbdelMageed

Yasmine Shaker

Supervised by

Dr. Magda K. Fayek


Abstract
In this document, we propose a real-time 3D human body tracking system that uses video cameras to track the motion of colored markers attached to particular locations on the actor's body. The marker locations are chosen so that the basic body parts (e.g. the arm) are covered. The problem is decomposed into three sub-problems: marker extraction, depth extraction and pose reconstruction. We implement a modular framework in which each module tackles one of those sub-problems, and we also implement an application that utilizes the framework.


ACKNOWLEDGEMENTS

Dr. Magda Fayek, for her help and support. The OpenCV team and community. Our families, for their understanding.

CONTACTS
Name | E-mail | Mobile

Menna Hamza Mohammed | hamza.menna@gmail.com | 0100173604
Mohamed Hesham Fadl | mohamad.hesham@studentpartners.com | 0106172670
Mona Abdel-Mageed El Koussy | monaabdelmageed@yahoo.com | 0161800073
Yasmine Shaker AbdelHameed | yasmine.shaker88@gmail.com | 0100766122

Supervisor
Name | E-mail | Mobile

Dr. Magda Fayek | magdafayek@gmail.com | 0101589411

TABLE OF CONTENTS


ACKNOWLEDGEMENTS  3
CONTACTS  4
SUPERVISOR  4
LIST OF TABLES  8
LIST OF FIGURES  9
TABLE OF ACRONYMS AND DEFINITIONS  10
DOCUMENT HISTORY  11
1. CHAPTER ONE: INTRODUCTION  12
   1.1. Motivation and Justification  13
   1.2. Problem Definition  13
   1.3. Summary of Approach  13
   1.4. Document Structure  14
2. CHAPTER TWO: THE SCIENCE BEHIND VIRTUAL ME  16
   2.1. Image Processing  17
      2.1.1. Image Smoothing (Median Filter; Dilation and Erosion)  17
      2.1.2. Segmentation (K-means Clustering; Thresholding; Edge Finding; Connected Component Labeling)  18
      2.1.3. Color Detection (Color Models)  20
   2.2. Geometry Overview  22
      2.2.1. System of Coordinates  22
      2.2.2. Homogeneous Coordinates; Vector Operations  22
      2.2.3. Geometric Transformation  22
   2.3. Camera Projection Models  23
      2.3.1. The Weak Projection Model  23
      2.3.2. The Pinhole Camera Projection Model  23
   2.4. Camera Calibration  25
   2.5. Stereo Vision  25
      2.5.1. Stereo Camera Parameters  31
   2.6. Kinematics  31
      2.6.1. Rigid Body  31
      2.6.2. Kinematic Chains  32
   2.7. Human Body  32
   2.8. Graphics Engine  33
   2.9. Physics Engine  33
3. CHAPTER THREE: LITERATURE AND MARKET SURVEY  34
   3.1. Literature Survey  35
   3.2. Market Survey (PlayStation Eye; Nintendo Wii; ARENA Motion Capture)  35
4. CHAPTER FOUR: SYSTEM ANALYSIS  38
   4.1. Development Tools & Environment  39
   4.2. Basic Blocks  40
   4.3. Framework Modules  41
      4.3.1. VM-M-00 Marker Extraction  41
      4.3.2. Spatial Correspondence (Camera Calibration; Stereo Calibration; Stereo Rectification; Depth Map Calculation; 3D Reconstruction)  43
      4.3.3. VM-M-01 Pose Reconstruction  49
   4.4. Virtual ME Application (Virtual Goal Keeper; Game Logic; Game Physics; Game Graphics)  51
   4.4. System Flow  55
   4.5. Limitations  55
5. CHAPTER FIVE: RESULTS, CONCLUSION & FUTURE WORK  56
   5.1. Testing & Results  56
   5.2. Conclusion  56
REFERENCES  58
APPENDIX A. USER MANUAL  61
APPENDIX B.  62
APPENDIX C. INSTALLATION & OPERATION MANUAL  63
APPENDIX D. PROGRAMMER'S MANUAL  64
APPENDIX E. DESIGN OUTLINE  65
APPENDIX F. TEST CASES  66

LIST OF TABLES
Table 1: Table of Acronyms and Definitions  10
Table 2: Limitation table  55
Table 3: Marker extraction test cases
Table 4: Color detection test cases
Table 5: Segmentation test cases
Table 6: Classification test cases
Table 7: Prediction test cases

LIST OF FIGURES


Figure 1: Median filter  17
Figure 2: Segmentation  19
Figure 3: RGB illustration  20
Figure 4: HSV color space  21
Figure 5: Pinhole camera projection  24
Figure 6: Stereo vision clarification
Figure 7: Stereo vision terminology
Figure 8: Triangulation
Figure 9: The correspondence problem
Figure 10: Depth reconstruction
Figure 11: Kinematic chain illustration
Figure 12: PlayStation Eye  35
Figure 13: Nintendo game  36
Figure 14: Virtual Me basic blocks  40
Figure 15: Marker extraction flow chart  41
Figure 16: Virtual Me marker distribution  43

Table of Acronyms and Definitions


Following is a table of the acronyms and abbreviations used throughout the document.

Term | Definition

Table 1. Table of Acronyms and Definitions



DOCUMENT HISTORY
Modified by | Version | Date | Descriptions/Remarks


1. Chapter One: Introduction

In recent years there has been a resurgence in human body tracking (HBT). It has a wide range of applications in human-computer interfaces and virtual environments, such as controlling virtual interfaces in games, or for educational purposes such as virtual driving and flying training. On the other hand, detailed analysis and tracking of the human body is used in clinical studies and diagnostics of orthopedic patients, or to help athletes understand and improve their performance.


Surveillance applications also have their place among HBT applications, where humans are tracked over time and monitored for special actions. Different applications have different needs, so different techniques have been invented to satisfy those needs. Marker-based techniques involve the placement of markers on different body parts; the markers may be optical, magnetic or colored, and each type has its own pros and cons. With optical markers the user has free movement and a long tracking range, but the markers are subject to optical occlusion, require controlled lighting and extensive pre- and post-processing, and this is among the most expensive techniques. With magnetic sensors the user is not subject to optical occlusion and the sensor orientation is given directly, but the sensors are subject to disruption by electrical or magnetic fields from other devices and have a small tracking range. With colored markers (our system) the user has free movement and a long tracking range, but the system is subject to marker disappearance and needs a controlled lighting environment. Marker-less techniques involve detecting human body features and using kinematic models, with no need for intrusive markers or special cameras, which is very attractive, but designing such a system is not a trivial task.

1.1. Motivation and Justification


Expectations for what the computer should be able to do have grown rapidly, and in recent years there has been a resurgence in HCI (human-computer interfaces). Touch, tilt and gesture interfaces have replaced the traditional keyboard and mouse in many applications, especially in the field of entertainment. One can claim that those methods are as much of a revolution as the move from text interfaces to graphical user interfaces. One of the flashier applications is using human body movements as an input method for the computer. For example, in a game, instead of pressing a button to kick the ball, the player actually performs the kicking motion and the computer interprets the motion as a kick.

1.2. Problem Definition


It is required to reconstruct the body posture of the actor using cameras that record the real-life movement of colored markers placed at strategic locations on the body, as sequences of Cartesian coordinates in 3D space. We focus on tracking the major joints of the actor's body: shoulders, elbows, wrists, waist, knees and ankles.

1.3. Summary of Approach


Our system consists of three main phases: Marker Extraction, Z Extraction and Pose Reconstruction. Marker Extraction finds the 2D pixel coordinates of the markers in the images; it involves segmentation of similar colors, mapping the segments to the different body parts, and using Kalman filters to predict the positions of missing markers.

Z Extraction recovers the depth (Z coordinate) of each marker from the two camera views: the stereo pair is calibrated and rectified, a disparity map is computed, and the disparities are triangulated to obtain an estimated 3D position for every marker.


Pose Reconstruction maps the estimated 3D marker positions onto a kinematic model of the human body, producing the reconstructed pose that is passed on to the application.

1.4. Document Structure


This is the graduation project final document. It describes the details of the project and gives an overview of the science and theories we have used. We start in Chapter Two with a review of the science behind Virtual Me. In Chapter Three we list some related products and papers; then, in Chapter Four, we illustrate the details of the Virtual ME system. We first describe a general overview and the decomposition of the design, followed by a detailed discussion of the different components used (i.e. Marker Extraction, Depth Extraction and Pose Reconstruction). Finally, we present the performance results obtained and possible future work.



2. Chapter two: the Science behind Virtual ME


2.1. Image Processing


In electrical engineering and computer science, image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it.

2.1.1. Image Smoothing


Image smoothing is a widely used pre-processing stage that reduces image noise, but it has the drawback of blurring sharp edges.

Median Filter
The median filter is one of the smoothing techniques. It is a nonlinear filter used to remove impulsive noise from an image, and it is more robust than traditional linear filtering because it preserves sharp edges. The median filter is a spatial filtering operation, so it uses a 2-D mask that is applied to each pixel in the input image. Applying the mask means centering it on a pixel, evaluating the covered pixel brightnesses and determining which brightness value is the median value. Figure 1 presents the concept of spatial filtering based on a 3x3 mask, where I is the input image and O is the output image [1].

Figure 1: median filter
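To make this concrete, below is a minimal sketch of median filtering with the OpenCV C++ interface (the project uses OpenCV 2.0, where header and function names differ slightly; the file names and the 3x3 mask size are illustrative assumptions):

#include <opencv2/opencv.hpp>

int main() {
    // Load a noisy frame (placeholder file name).
    cv::Mat input = cv::imread("noisy_frame.png");
    if (input.empty()) return 1;
    cv::Mat smoothed;
    // Median filter with a 3x3 mask: every output pixel is the median of the
    // nine pixel brightnesses covered by the mask, which removes impulsive
    // noise while preserving sharp edges better than a linear blur would.
    cv::medianBlur(input, smoothed, 3);
    cv::imwrite("smoothed_frame.png", smoothed);
    return 0;
}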

Dilation and Erosion


Dilation is a convolution of some image (or region of an image), which we will call A, with some kernel, which we will call B. The kernel, which can be any shape or size, has a single defined anchor point. Most often, the kernel is a small solid square with the anchor point at the center. The kernel can be thought of as a template or mask, and its effect for dilation is that of a local maximum operator. As the kernel B is scanned over the image, we compute the maximal pixel value overlapped by B and replace the image pixel under the anchor point with that maximal value. This causes bright regions within an image to grow, and this growth is the origin of the term dilation operator.


Erosion is the converse operation. The action of the erosion operator is equivalent to computing a local minimum over the area of the kernel. Erosion generates a new image from the original using the following algorithm: as the kernel B is scanned over the image, we compute the minimal pixel value overlapped by B and replace the image pixel under the anchor point with that minimal value. However, because dilation is just a max operator and erosion is just a min operator, morphology may be used on intensity images as well. In general, whereas dilation expands region A, erosion reduces region A. Moreover, dilation will tend to smooth concavities and erosion will tend to smooth away protrusions. Of course, the exact result will depend on the kernel, but these statements are generally true for the filled convex kernels typically used.
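As an illustration, here is a minimal sketch of dilation and erosion with the OpenCV C++ interface; the 3x3 rectangular kernel and the file name are illustrative assumptions, not values taken from the project:

#include <opencv2/opencv.hpp>

int main() {
    cv::Mat mask = cv::imread("marker_mask.png", 0);   // binary mask (placeholder)
    if (mask.empty()) return 1;
    // A small solid square kernel with the anchor at its center.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));

    cv::Mat dilated, eroded, cleaned;
    cv::dilate(mask, dilated, kernel);   // local maximum: bright regions grow
    cv::erode(mask, eroded, kernel);     // local minimum: bright regions shrink

    // Erode-then-dilate (an "opening") removes small speckle noise while
    // roughly preserving the size of the remaining blobs.
    cv::erode(mask, cleaned, kernel);
    cv::dilate(cleaned, cleaned, kernel);
    return 0;
}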

2.1.2. Segmentation
Segmentation of an image entails the division or separation of the image into parts that have a strong correlation with objects or areas of the real world. The result of image segmentation is a set of segments that collectively cover the entire image. Each of the pixels in a region is similar with respect to some characteristic or computed property, such as color, intensity, or texture, while adjacent regions are significantly different with respect to the same characteristic(s). [5] There are many segmentation techniques; a brief overview of the most widely used ones follows.

K-means Cluster
K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (say k clusters) fixed a priori. The main idea is to define k centers, one for each cluster. The next step is to take each point belonging to the data set and associate it with the nearest center. When no point is pending, the first step is completed and an early grouping is done. At this point we re-calculate k new centers of the clusters resulting from the previous step. After we have these k new centers, a new binding is done between the same data set points and the nearest new centers, and a loop is generated. As a result of this loop, the k centers change their locations step by step until no more changes occur; in other words, the centers do not move any more. [6]
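A minimal sketch of clustering the pixels of a frame by color with OpenCV's kmeans is shown below; the choice of k = 4, the termination criteria and the file name are illustrative assumptions rather than values from the project:

#include <opencv2/opencv.hpp>

int main() {
    cv::Mat image = cv::imread("frame.png");
    if (image.empty()) return 1;
    // Reshape the image into an N x 3 matrix of float samples (one row per pixel).
    cv::Mat samples = image.reshape(1, image.rows * image.cols);
    samples.convertTo(samples, CV_32F);

    const int k = 4;                 // assumed number of clusters (e.g. marker colors)
    cv::Mat labels, centers;
    cv::kmeans(samples, k, labels,
               cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 10, 1.0),
               3,                     // number of attempts with different initial centers
               cv::KMEANS_PP_CENTERS, // k-means++ initialization
               centers);
    // labels now holds, for every pixel, the index of its nearest cluster center.
    return 0;
}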

Threshold Technique
The technique makes decisions based on local pixel information. Since many objects or image regions are characterized by constant reflectivity or light absorption of their surfaces, thresholding is effective when the intensity levels of the objects fall squarely outside the range of levels in the background. This technique is computationally inexpensive and fast, and can easily be done in real time. [7]


Edge Finding Technique


It finds pixels that belong to the borders of the objects, using edge-detecting operators. The most common problems of edge-based segmentation are:
1. An edge is present in locations where there is no border.
2. No edge is present where a real border exists.

Figure 2: Segmentation (left: original image; right: segmented image)

Connected component labeling


Connected component labeling is an algorithm that groups the connected pixels in an image. (Note: there are different ways to define connectedness, such as intensity and position.) It works by scanning an image pixel by pixel in order to identify connected pixel regions, i.e. regions of adjacent pixels which share the same set of intensity values V (for a binary image V = {1}; in a gray-level image V will take on a range of values, for example V = {51, 52, 53, ..., 77, 78, 79, 80}). Connected component labeling works on binary or gray-level images and different measures of connectivity are possible; in the following we assume binary input images and 8-connectivity. The connected components labeling operator scans the image by moving along a row until it comes to a point p (where p denotes the pixel to be labeled at any stage in the scanning process) for which V = {1}. When this is true, it examines the four neighbors of p which have already been encountered in the scan (i.e. the neighbor to the left of p, the one above it, and the two upper diagonal neighbors). Based on this information, the labeling of p occurs as follows: if all four neighbors are 0, assign a new label to p; else, if only one neighbor has V = {1}, assign its label to p; else,


if more than one of the neighbors has V = {1}, assign one of their labels to p and make a note of the equivalences. After completing the scan, the equivalent label pairs are sorted into equivalence classes and a unique label is assigned to each class. As a final step, a second scan is made through the image, during which each label is replaced by the label assigned to its equivalence class.
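The following self-contained sketch labels 8-connected components; for simplicity it uses a flood-fill (breadth-first) scan rather than the two-pass equivalence-class method described above, but it produces an equivalent labeling:

#include <vector>
#include <queue>

// Label the 8-connected foreground components of a binary image stored
// row-major in `img` (0 = background, 1 = foreground). One label
// (1, 2, 3, ...) is assigned per component; returns the number of components.
int labelComponents(const std::vector<int>& img, int width, int height,
                    std::vector<int>& labels) {
    labels.assign(img.size(), 0);
    int next = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int idx = y * width + x;
            if (img[idx] == 1 && labels[idx] == 0) {
                labels[idx] = ++next;            // start a new component
                std::queue<int> q;
                q.push(idx);
                while (!q.empty()) {             // flood-fill its 8-neighbours
                    int cur = q.front(); q.pop();
                    int cx = cur % width, cy = cur / width;
                    for (int dy = -1; dy <= 1; ++dy)
                        for (int dx = -1; dx <= 1; ++dx) {
                            int nx = cx + dx, ny = cy + dy;
                            if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                            int n = ny * width + nx;
                            if (img[n] == 1 && labels[n] == 0) {
                                labels[n] = next;
                                q.push(n);
                            }
                        }
                }
            }
        }
    }
    return next;
}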

2.1.3. Color detection


The use of color in image processing is motivated by an important factor; color is a powerful descriptor that often simplifies object identification and extraction from a scene.

Color Models
The purpose of a color model (also called a color space) is to facilitate the specification of colors in some standard way. In terms of digital image processing, the most commonly used models are RGB (red, green, blue); CMY (cyan, magenta, yellow); CMYK (cyan, magenta, yellow, black); and HSV (hue, saturation, value), which corresponds closely with the way humans describe and interpret color. [2] In the detection problem we preferred to use the HSV model to make the project reliable, such that the user can adjust the brightness and the saturation of the colors depending on the lighting conditions in the room and the color of the markers.

RGB Model
In the RGB model, each color appears in its primary spectral components of red, green, and blue. The number of bits used to represent each pixel in RGB space is called the pixel depth. Consider an RGB image in which each of the red, green, and blue images is an 8-bit image; each RGB color pixel is then said to have a depth of 24 bits. [3] RGB is an additive color model in which red, green, and blue light are added together in various ways to reproduce a broad array of colors.

Figure 3: RGB illustration (red, green and blue channels combined on a color monitor)


HSV Model
The RGB and CMY color models are not well suited to describing colors in terms that are practical for human interpretation, so the HSV color space is an important color model for image processing applications because it represents colors similarly to how the human eye senses them. [4] The HSV color model represents every color with three components: hue (H), saturation (S), and value (V). The figure below illustrates how the HSV color space represents colors.

Figure 4:HSV color space

2.1.3.1. RGB to HSV Conversion: Creating an image in RGB and converting it to HSV is a straightforward process.
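For example, with the OpenCV C++ interface the conversion is a single call; in OpenCV's 8-bit HSV representation the hue is scaled to the range 0-179 while saturation and value use 0-255 (modern constant names are shown; OpenCV 2.0 used the CV_BGR2HSV spelling, and the file name is a placeholder):

#include <opencv2/opencv.hpp>

int main() {
    cv::Mat bgr = cv::imread("frame.png");      // OpenCV loads color images in BGR order
    if (bgr.empty()) return 1;
    cv::Mat hsv;
    cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);  // convert to hue, saturation, value
    return 0;
}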


2.2. Geometry Overview


2.2.1. System of coordinates


A point in n-dimensional space is completely represented by n constants representing the length of the point's vector component along each of the space axes. The axes are usually taken as orthonormal vectors for simplicity, but any n independent vectors will suffice to describe the space.

2.2.2. Homogeneous coordinates

Vector operations


Dot product
Also known as the scalar product of two vectors:
a . b = |a| |b| cos(theta)
where theta is the angle between the two vectors and |a| is the magnitude or length of the vector a.

Cross product
This operation produces a vector in the direction normal to the two vectors:
a x b = (a2 b3 - a3 b2) i + (a3 b1 - a1 b3) j + (a1 b2 - a2 b1) k
where i, j and k are orthonormal unit vectors along the x, y and z directions respectively.

2.2.3. Geometric Transformation


A geometric transformation is simply a function applied to a point to generate another point. Transformations include translation, rotation, scaling and affine transformations, and they are usually expressed as matrix multiplications. A vector V is defined in 3D homogeneous coordinates as V = (x, y, z, 1); the extra 1 is added to allow cascaded operations. The transformed point is defined by the equation P' = T P, where T is the transformation matrix.

Translation transform
Translation is moving a point in a straight line. Translation is defined by the translation vector T: the unit vector of T defines the direction of the displacement and the magnitude of T defines the magnitude of the translation.

The translation matrix for a displacement T = (tx, ty, tz) is defined as:

    [ 1  0  0  tx ]
    [ 0  1  0  ty ]
    [ 0  0  1  tz ]
    [ 0  0  0  1  ]


Rotation transform
The rotation transform is the process of rotating the point around one of the axes. Rotation about each axis in the right-handed direction is defined by the following matrices (shown as the 3x3 rotation blocks of the homogeneous form):

    Rx(theta) = [ 1        0           0        ]
                [ 0    cos(theta)  -sin(theta)  ]
                [ 0    sin(theta)   cos(theta)  ]

    Ry(theta) = [  cos(theta)   0   sin(theta)  ]
                [      0        1       0       ]
                [ -sin(theta)   0   cos(theta)  ]

    Rz(theta) = [ cos(theta)  -sin(theta)   0   ]
                [ sin(theta)   cos(theta)   0   ]
                [     0            0        1   ]

2.3. Camera Projection Models


Camera projection is the process of mapping a 3D point (x, y, z) in world space to a 2D point (u, v) in image space. It is worth mentioning that this is not a reversible transformation: depth information is lost in the process, and the problem of re-obtaining the 3D information from a single image is under-determined. Various camera models exist to represent the transformation mathematically.

2.3.1. The weak projection model


Weak projection is described by the equations u = s * x and v = s * y, where s is a constant scale factor that varies from one projection to the next. The weak projection model is valid for views where the variation in depth across the body is relatively small compared to the distance between the body and the camera lens. The weak projection model is considered a simplification of the pinhole camera projection model.

2.3.2. The pin hole camera projection model


The pinhole camera model, also known as perspective projection, assumes that all light passes through an infinitesimally small pinhole: all the light reflected from the object passes through this pinhole. The image is formed at a distance f behind the lens, where f is a physical property of the lens called the focal length. The image formed is equivalent to the image projected under perspective projection onto a plane at a distance f from the camera.


Figure 5: pinhole camera projection

The equation for projection under the pinhole model is:
x_screen = f * (X / Z),   y_screen = f * (Y / Z)
where (X, Y, Z) is the point in camera coordinates and f is the focal length.

Refining the pinhole projection model


The previous equations assume an ideal camera manufacturing process, which is usually not the case; actual cameras introduce other errors that need to be handled in the equations. We continue to refine our model to describe this sub-optimal behavior.

Imager shift
Due to manufacturing errors the imager is not centered on the principal axis (the center of the image). This leads to a shift along both the X and Y axes of the image. To accommodate this error the equations are modified to:
x_screen = fx * (X / Z) + cx,   y_screen = fy * (Y / Z) + cy

The projection equations in homogeneous coordinates, in matrix form:

      [ u ]     [ fx   0   cx ] [ X ]
    s [ v ]  =  [  0  fy   cy ] [ Y ]
      [ 1 ]     [  0   0    1 ] [ Z ]

Lens distortion
There are two main lens distortions:


Radial distortion: It is easier to manufacture a spherical lens than the ideal parabolic lens. This leads to the fish-eye phenomenon, where the image is distorted and bent around the optical center of the image. The distortion is directly proportional to r, the distance between the point and the optical center of the image, and can be modeled as a Taylor expansion around r = 0. The equations for radial distortion are:
x_corrected = x (1 + k1 r^2 + k2 r^4 + k3 r^6)
y_corrected = y (1 + k1 r^2 + k2 r^4 + k3 r^6)

where k1, k2 and k3 are the radial distortion coefficients. The number of terms used in the correction process depends on the camera; for modern cameras usually the first two parameters, k1 and k2, are sufficient. Tangential distortion: Tangential distortion is caused by defects in the camera manufacturing process, e.g. the lens not being exactly parallel to the imaging plane.

Tangential distortion is minimally defined by the following equations:
x_corrected = x + [ 2 p1 x y + p2 (r^2 + 2 x^2) ]
y_corrected = y + [ p1 (r^2 + 2 y^2) + 2 p2 x y ]
where r is still the distance between the point and the optical center, and p1 and p2 are the tangential distortion coefficients.
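Once the coefficients are known, both distortions can be removed in one call. A minimal sketch with the OpenCV C++ interface follows, where the intrinsics and distortion coefficients are made-up placeholder values standing in for the calibration output:

#include <opencv2/opencv.hpp>

int main() {
    // Placeholder intrinsics: fx, fy, cx, cy as estimated by calibration.
    cv::Mat cameraMatrix = (cv::Mat_<double>(3, 3) << 800, 0, 320,
                                                      0, 800, 240,
                                                      0,   0,   1);
    // Placeholder distortion coefficients: k1, k2, p1, p2.
    cv::Mat distCoeffs = (cv::Mat_<double>(4, 1) << -0.2, 0.05, 0.001, 0.001);

    cv::Mat distorted = cv::imread("frame.png");
    if (distorted.empty()) return 1;
    cv::Mat undistorted;
    // Applies the radial and tangential corrections above to every pixel.
    cv::undistort(distorted, undistorted, cameraMatrix, distCoeffs);
    return 0;
}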

2.4. Camera calibration


The purpose of camera calibration is to estimate the intrinsic parameters of the camera (fx, fy, cx, cy, k1, k2, p1 and p2) as well as the extrinsic parameters of the object, i.e. the translation and rotation of the object relative to the camera. Calibration is performed on a series of images of a known object; for practicality this is usually a known 2D pattern, and a black-and-white chessboard is one of the most common patterns for camera calibration. Camera calibration techniques include Zhang's and Tsai's methods.
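A minimal sketch of the single-camera calibration procedure with the OpenCV C++ interface is shown below (the project uses the C functions cvFindChessboardCorners and cvCalibrateCamera2; these are their C++ equivalents). The board size of 9x6 inner corners, the square size, the number of views and the file names are illustrative assumptions:

#include <opencv2/opencv.hpp>
#include <vector>
#include <cstdio>

int main() {
    cv::Size patternSize(9, 6);                            // assumed inner-corner grid
    float squareSize = 25.0f;                              // assumed square size (mm)

    std::vector<std::vector<cv::Point3f> > objectPoints;   // 3D corners (board frame)
    std::vector<std::vector<cv::Point2f> > imagePoints;    // detected 2D corners
    cv::Size imageSize;

    for (int i = 0; i < 15; ++i) {                         // several board orientations
        char name[64];
        std::sprintf(name, "chessboard_%02d.png", i);      // placeholder file names
        cv::Mat img = cv::imread(name, 0);
        if (img.empty()) continue;
        imageSize = img.size();

        std::vector<cv::Point2f> corners;
        if (!cv::findChessboardCorners(img, patternSize, corners)) continue;

        std::vector<cv::Point3f> obj;                      // known board geometry
        for (int r = 0; r < patternSize.height; ++r)
            for (int c = 0; c < patternSize.width; ++c)
                obj.push_back(cv::Point3f(c * squareSize, r * squareSize, 0));

        imagePoints.push_back(corners);
        objectPoints.push_back(obj);
    }

    cv::Mat cameraMatrix, distCoeffs;                      // fx, fy, cx, cy / k1, k2, p1, p2, ...
    std::vector<cv::Mat> rvecs, tvecs;                     // per-view extrinsics
    cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                        cameraMatrix, distCoeffs, rvecs, tvecs);
    return 0;
}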

2.5. Stereo Vision


Stereo Vision is the recovery of the 3D structure of a scene using two or more images of the 3D scene, each acquired from a different viewpoint in space. The images can be obtained using multiple cameras or one moving camera.


Why do we need stereo vision?


In our approach, we want to reproject two-dimensional points (the image coordinates of the marker) into three dimensions (the real-world coordinates of the marker) given their screen coordinates. As shown in Figure 6, a 2D point (shown as a cross) corresponds to a line of points (dashed line) in 3D; any point along this line would project to the same image point. Thus, the 3D position of a point cannot, in general, be determined from one camera.

Figure 6: stereo vision clarification

Triangulation - the principle underlying stereo vision


Let's see how we can work out where a point must be in real 3D space using stereo cameras (they act exactly as if they were your left and right eyes). As depicted in the figure below, the same point observed by two cameras gives two different lines which intersect at the 3D object. Hence, 3D reconstruction from stereo cameras is possible by observing a point's position in the two different cameras. This is called triangulation.


Figure 7

But how can we find the two corresponding points in the left and right images? Epipolar geometry answers that question.

Epipolar Geometry
The idea is simple: suppose an image point is observed in the left image from a 3D feature. The exact position of the 3D point is unknown, but it must lie somewhere on the back-projection line (shown dashed). If we take this line and project it onto the right camera image, we get a line in the right camera image, called an epipolar line. So, given a feature in one image, its matching view in the other image must lie along the corresponding epipolar line.

The Essential and Fundamental Matrices


The essential matrix E captures the essential geometry of stereo imaging i.e. it contains all of the information about the translation T and the rotation R, which describe the location of the second camera relative to the first in global coordinates as shown in Figure x.

The fundamental matrix F contains the same information as E, in addition to information about the intrinsics of both cameras. Once we have the fundamental matrix F, we may use it to rectify the two stereo images so that the epipolar lines are arranged along image rows and the scan lines are the same across both images.

Stereo Rectification
Given a pair of stereo images, rectification determines a transformation of each image plane such that pairs of epipolar lines become collinear and parallel to one of the image axes. The important advantage of rectification is that computing stereo correspondences (explained in next section) is reduced to a 1-D search problem along the horizontal raster lines of the rectified images. The following figure illustrates the effect of stereo rectification.

Figure 6 the search space before (1) and after (2) rectification


The result of stereo rectification is eight terms, four for the left camera and four for the right one. For each camera these are:
1. The distortion vector distCoeffs
2. The rotation matrix
3. The rectified camera matrix
4. The unrectified camera matrix
From these terms we build what is called a rectification map, using the cvInitUndistortRectifyMap function (explained in section x-y), which is used to interpolate pixels from the original image in order to create a new rectified image.
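A minimal sketch of building these maps with the C++ equivalents (cv::stereoRectify and cv::initUndistortRectifyMap) of the C functions discussed here; all variable names are placeholders for the calibration results:

#include <opencv2/opencv.hpp>

// cameraMatrix*/distCoeffs* come from single-camera calibration; R and T are
// the rotation and translation between the two cameras from stereo calibration.
void buildRectificationMaps(const cv::Mat& cameraMatrix1, const cv::Mat& distCoeffs1,
                            const cv::Mat& cameraMatrix2, const cv::Mat& distCoeffs2,
                            const cv::Mat& R, const cv::Mat& T, cv::Size imageSize,
                            cv::Mat maps[4], cv::Mat& Q) {
    cv::Mat R1, R2, P1, P2;
    // Compute the rectifying rotations (R1, R2) and rectified projection
    // matrices (P1, P2), plus the disparity-to-depth matrix Q.
    cv::stereoRectify(cameraMatrix1, distCoeffs1, cameraMatrix2, distCoeffs2,
                      imageSize, R, T, R1, R2, P1, P2, Q);
    // Precompute the pixel lookup maps later used by cv::remap() to produce
    // the left and right rectified images.
    cv::initUndistortRectifyMap(cameraMatrix1, distCoeffs1, R1, P1,
                                imageSize, CV_32FC1, maps[0], maps[1]);
    cv::initUndistortRectifyMap(cameraMatrix2, distCoeffs2, R2, P2,
                                imageSize, CV_32FC1, maps[2], maps[3]);
}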

Stereo Correspondence
Stereo correspondence is the problem of finding pairs of matched points such that each point in the pair is the projection of the same 3D point. Triangulation depends on the solution of the correspondence problem; ambiguous correspondence between points in the two images may lead to several different consistent interpretations of the scene.

The reconstruction problem


This is the computation of the disparity map given the corresponding points. The disparity map can be converted to a 3D map of the scene (i.e., the 3D structure can be recovered) if the stereo geometry is known.

Recovering depth (reconstruction)
Consider recovering the position of a point P from its projections pl and pr, with image coordinates xl and xr:
xl = f * (Xl / Zl),   xr = f * (Xr / Zr)

In general, the two cameras are related by the following transformation:
Pr = R (Pl - T)


Using Zr = Zl = Z and Xr = Xl - T:
Z = f * T / d

where d = xl - xr is the disparity (i.e., the difference in the position between the corresponding points in the two images)
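A small sketch of this reconstruction step follows; the focal length (in pixels), baseline and Q matrix are placeholders that would come from calibration and rectification, and reprojectImageTo3D is OpenCV's built-in equivalent of applying the formula to a whole disparity map:

#include <opencv2/opencv.hpp>

// Depth of a single correspondence: Z = f * T / d. The baseline T is in the
// units you want the depth in; the disparity d and focal length are in pixels.
float depthFromDisparity(float focalLength, float baseline, float disparity) {
    if (disparity <= 0.0f) return 0.0f;          // no valid correspondence
    return focalLength * baseline / disparity;
}

// Alternatively, reproject a whole disparity map to 3D using the Q matrix
// produced by stereo rectification; xyz holds an (X, Y, Z) triple per pixel.
void reproject(const cv::Mat& disparity, const cv::Mat& Q, cv::Mat& xyz) {
    cv::reprojectImageTo3D(disparity, xyz, Q);
}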

Stereo Vision Terminology


Fixation point: the point of intersection of the optical axes.
Baseline: the distance between the centers of projection.
Epipolar plane: the plane passing through the centers of projection and the point in the scene.
Epipolar line: the intersection of the epipolar plane with the image plane.
Conjugate pair: any point in the scene that is visible in both cameras is projected to a pair of image points in the two images.
Disparity: the distance between corresponding points when the two images are superimposed.
Disparity map: the disparities of all points form the disparity map (which can be displayed as an image).


2.5.1. Stereo camera parameters


Intrinsic parameters: parameters that characterize the transformation from image-plane coordinates to pixel coordinates in each camera.
Extrinsic parameters (R, T): parameters that describe the relative position and orientation of the two cameras:
Pr = R (Pl - T)   (aligns the right camera with the left camera)
They can be determined from the extrinsic parameters of each camera:
R = Rr (Rl)^T
T = Tl - (R)^T * Tr

2.6. Kinematics
Kinematics is a branch of classical mechanics concerned with the study of motion of bodies regardless of the causes of the motion like acting forces, mass etc.

2.6.1. Rigid body


A rigid body is an ideal finite solid body that does not deform regardless of the forces acting on it. The distance between any two particular points on the rigid body remains constant regardless of the transformation applied to it.


Rigid bodies only support rigid transforms, i.e. those that maintain the rigid property, such as rotation and translation. An example of a non-rigid transform is the scaling transform.

2.6.2. Kinematic chains


A kinematic chain is defined as a set of rigid bodies (segments) connected by joints. A transformation applied to a particular segment affects all subsequent segments in the chain.

Joints
There are various kinds of joints, but we will focus on two joint types:
Ball/spherical joint: a joint that allows free rotation in any direction in 3D space. An example of a ball joint is the human shoulder.
Hinge/revolute joint: a joint that allows rotation about one axis only. An example of a hinge joint is the rotation of the human elbow.

State of the kinematic chain


The state of the kinematic chain is defined by the state of every joint in the chain. Each joint state defines a transformation that applies to the subsequent part of the kinematic chain, including segments and joints. By chaining those transformations, the position and orientation of each segment can be uniquely identified.
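A minimal forward-kinematics sketch of this chaining idea is shown below, using OpenCV's small fixed-size matrix type; the Joint structure and the single-chain layout are illustrative assumptions rather than the project's actual data structures:

#include <opencv2/opencv.hpp>
#include <vector>

// Each joint stores its local transform (rotation plus the translation to the
// next segment) as a 4x4 homogeneous matrix relative to its parent.
struct Joint {
    cv::Matx44f localTransform;
};

// Returns the global transform of every joint along a single chain
// (e.g. shoulder -> elbow -> wrist) by accumulating the local transforms.
std::vector<cv::Matx44f> forwardKinematics(const std::vector<Joint>& chain) {
    std::vector<cv::Matx44f> global;
    cv::Matx44f current = cv::Matx44f::eye();    // root of the chain
    for (size_t i = 0; i < chain.size(); ++i) {
        current = current * chain[i].localTransform;
        global.push_back(current);
    }
    return global;
}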

2.7. Human Body


The human body can be approximated as a kinematic chain. The human body joints' degrees of freedom (DOF) are rotational. Two of the major joint types of the human body are:
The ball and socket joint: found in the hip and the shoulder; it allows 3 DOFs.
The hinge joint: found in the elbow and the knee; it allows 1 DOF.

Human body joint rotations are constrained rotations: each joint has a range of rotation for each DOF. For example, the hinge joint at the elbow has an approximate range of (30, 180) degrees; this means that the maximum relative angle that the forearm can make with the upper arm is 180 degrees and the minimum is 30 degrees.


2.8. Graphics Engine


The development of games involves a lot of work that is repetitive and changes very little between games. A piece of middleware called a game engine builds a layer that abstracts much of this repetitive work. The features provided by a game engine include 3D rendering, network handling and AI; a game engine does not need to implement all of these features. Game engines that focus on rendering features are called graphics engines. Some graphics engines' abstraction of the hardware also allows cross-platform development as well as operation on different drivers. Graphics engines include OGRE, RealmForge and Irrlicht.

2.9. Physics Engine


The need for real-life behavior of objects is rising rapidly in certain applications, including simulation and games (e.g. crashes and collisions). This can lead to an application logic layer that is bloated with physics simulation features, which is usually very complicated and quite hard to implement and test. A physics engine implements the physics simulation, allowing the application designer to focus on the actual logic of the application. The physics engine process usually consists of two steps:
World creation: a virtual physics world is created in which all objects are created and physical properties such as mass, volume and velocity are assigned.
Simulation: the physics engine applies physical constraints such as Newton's equations to determine the state of the world after a specific time step.
Usually the graphics and game logic are called after each simulation step to handle new events resulting from the changes in the world state. Examples of physics engines are ODE, Bullet and Havok.


3. Chapter Three: Literature and Market Survey

3.1. Literature Survey

3.2. Market Survey


Having chosen this project, we searched for similar products with functionality like ours, in order to know what we could benefit from them, gain good insight into our product, and see possible future work. The most famous systems are the PlayStation Eye, the Nintendo Wii and ARENA Motion Capture.

3.2.1. PlayStation Eye


Sony PlayStation Eye is a webcam device by Sony Computer Entertainment for the PlayStation 3 video game console. It is capable of capturing standard video at 60 frames per second at a 640x480 pixel resolution, and at 120 frames per second at 320x240 pixels. It has four microphones which allow the peripheral to be used for speech recognition and audio chat in noisy environments without the use of a headset.

Figure 8: Playstation Eye

Key Features:
- A sophisticated microphone with the ability to reduce background noise and focus on the spoken word for smoother, more accurate speech recognition.
- Engineered to perform in low-light conditions.
- Faster frame rate for improved tracking, responsiveness and smoothness.
- Two-position zoom lens for close-up and full-body options.
- Free EyeCreate editing software, which allows users to save, edit and add cool visual effects to photos, video and audio clips.


EyeCreate also enables users to capture videos and audio clips directly to the PS3's hard disk drive and access a range of different capturing modes, including slow motion and time-lapse.

Price:
PlayStation Eye is available for $39.99.

3.2.2. Nintendo Wii


The Nintendo Wii is a video game system introduced in 2006. It uses a wireless remote control (the Wiimote) to sense position and motion, which allows the player to interact with games through movements. For example, in tennis the player swings the Wiimote as if it were a tennis racket; sensors in the Wiimote send the motion to the game console, which applies the same motion to the game character.

Figure 9:Nintendo game

The Wiimote includes an accelerometer to determine the speed of movements and an infrared camera to determine its location relative to the LED lights in a sensor bar; Bluetooth is used to transmit the buttons that the user pushes on the controller.

3.2.3. ARENA motion capture


ARENA is an intuitive motion capture software package incorporating capture and export features. Used with OptiTrack FLEX:V100R2 cameras, ARENA is a tightly integrated motion capture tool, giving users precise capture data and control over final output sequences. The Foundation Package includes all the building blocks of an advanced motion capture system; it is the perfect building block for professionals and amateurs working in Poser or MotionBuilder.

Key Features:


- Single User Control: with user-defined timers for capture sessions and real-time 3D view feedback, a single user can be both motion capture operator and actor.
- Simple Camera Calibration: up to 24 cameras can be calibrated in capture volumes up to 400 square feet (20' x 20'), with capture volume preview.
- Automatic Skeletons & Marker Assignment: generate automatic skeletons and marker assignments with the easy-to-follow Skeleton Wizard.
- Real-time Data Capture: preview your motion capture data in real time, with full body rendering.
- Built-in Data Editor: advanced editing tools let users work with their capture data.

Technical Specifications:
6 FLEX: V100R2 cameras

Price:
The motion capture software with the six cameras is available for $5,999.


4. Chapter Four: System Analysis

4.1. Development Tools & Environment
- Microsoft Visual Studio 2008
- Microsoft Visual C++ 2008
- Intel OpenCV 2.0 image processing library
- Minoru 3D web cam
- Irrlicht game and graphics engine
- Havok physics engine
- AQtime profiler (trial version)
- Intel Parallel Studio

4.2. Basic Blocks
The following figure shows the basic blocks and interactions in our framework.


[Figure 10: Virtual Me basic blocks. The left and right web cams supply the left and right video feeds to the Virtual ME framework.]

[Figure 11: Virtual Me basic flow. Chessboard images yield the rectification maps; the left and right video feeds go to marker extraction, which outputs the markers' positions in the image; spatial correspondence uses the rectification maps to compute the disparity map and the markers' estimated 3D positions; pose reconstruction outputs the reconstructed pose to the application.]


4.3. Framework Modules


4.3.1. VM-M-00 Marker Extraction
Description: This module is responsible for detecting the positions of the markers (distinctly colored patches), segmenting them, and classifying the center point of each segment to a body part.
Interface:
Input: image.
Output: the 2D points of the marker centers; this output goes to the depth extraction module.
Flow chart:

[Figure 12: Marker extraction flow chart. The capture interface delivers a frame (RGB image) to color detection, which outputs the points of each marker; segmentation finds the center of each segment; classification maps each marker to a body part.]

Major Components
VM-00-C-00 Color Detection: there are four colors to be detected: red, green, blue and yellow. To be able to control the color saturation and brightness, the RGB image is first converted to the HSV color space.


Each color has a specific range in the hue, saturation and value channels. No perfect ranges are known; by trial and error we arrived at the following ranges for our markers, given as (hue min, hue max, saturation min, saturation max, value min, value max). Note that OpenCV only uses hue values between 0 and 179.
- Blue: (0, 30, 140, 255, 140, 255)
- Green: (40, 80, 140, 255, 100, 255)
- Yellow: (90, 100, 140, 255, 100, 255)
- Red: (110, 150, 140, 255, 100, 255)
When the color detection finds pixels within these ranges, it adds their positions to a structure and passes it to the segmentation component.
Noise handling: noise in the frame sometimes affects the detected values, so in a noisy environment we use the erode and dilate functions to eliminate it. However, we preferred to use a white background to eliminate the noise instead of erode and dilate, because they are extremely time consuming.
VM-00-C-01 Segmentation: the segmentation takes as input the position of each marker pixel (a point in 2D), decides which points belong to the same segment, and returns the center position of each segment. Segmentation is important in our project because we use the same color for different body parts, so we need to collect the connected components in order to differentiate between those parts. Example: detecting the green color in the image returns a large number of green points; these points represent the markers on the right and left shoulders. After passing these points to the segmentation function, it returns two components representing the left and right shoulders.
Noise handling: we faced some noise problems; the marker is sometimes divided into smaller markers due to lighting conditions, so at the end of segmentation we merge segments that are very near to each other. Also, to eliminate noise, we ignore any segment whose number of points is smaller than a specific threshold.
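The following sketch illustrates the color-detection and segmentation steps, using OpenCV's findContours as a stand-in for the project's own connected-component pass; the HSV range shown is the blue range from the list above, and the area threshold is an arbitrary illustrative value.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Illustrative only: HSV in-range masking followed by a connected-component
// style grouping (here via findContours) and per-segment center computation.
std::vector<cv::Point2f> detectMarkerCenters(const cv::Mat& frameBGR)
{
    cv::Mat hsv, mask;
    cv::cvtColor(frameBGR, hsv, CV_BGR2HSV);   // HSV separates hue from saturation/brightness
    cv::inRange(hsv, cv::Scalar(0, 140, 140), cv::Scalar(30, 255, 255), mask);

    // Optional noise handling, as described above:
    // cv::erode(mask, mask, cv::Mat());
    // cv::dilate(mask, mask, cv::Mat());

    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(mask, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);

    std::vector<cv::Point2f> centers;
    for (size_t i = 0; i < contours.size(); ++i) {
        cv::Moments m = cv::moments(contours[i]);
        if (m.m00 < 20.0)                      // ignore very small segments (noise threshold)
            continue;
        centers.push_back(cv::Point2f(float(m.m10 / m.m00),
                                      float(m.m01 / m.m00)));   // segment center
    }
    return centers;                            // passed on to the classification component
}
```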

VM-00-C-02 Classification:


The classification takes the center of each marker as input and maps each marker to a body part, based on the position of the marker and knowledge of the human body. Example: in the following figure we classify the green component with minimum x and minimum y as the right shoulder.

Figure 13: Virtual Me marker distribution

VM-00-C-04 Prediction: the problem of marker disappearance exists in our project, so we use a Kalman filter to predict marker positions. The filter is updated from the first frame onwards, and if a marker is missed in any frame the predicted value is used instead. The Kalman filter takes the previous points as input and predicts the current one. For more details about the Kalman filter refer to Appendix B.
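A sketch of this per-marker prediction using OpenCV's cv::KalmanFilter is shown below; the constant-velocity model and the noise values are illustrative, not the project's actual tuning.

```cpp
#include <opencv2/opencv.hpp>

// One filter per marker, with state [x, y, vx, vy] and measurement [x, y].
cv::KalmanFilter makeMarkerFilter(float x0, float y0)
{
    cv::KalmanFilter kf(4, 2, 0);
    kf.transitionMatrix = (cv::Mat_<float>(4, 4) <<
        1, 0, 1, 0,
        0, 1, 0, 1,
        0, 0, 1, 0,
        0, 0, 0, 1);                                         // constant-velocity model, dt = 1 frame
    cv::setIdentity(kf.measurementMatrix);
    cv::setIdentity(kf.processNoiseCov, cv::Scalar::all(1e-4));
    cv::setIdentity(kf.measurementNoiseCov, cv::Scalar::all(1e-1));
    cv::setIdentity(kf.errorCovPost, cv::Scalar::all(1));
    kf.statePost = (cv::Mat_<float>(4, 1) << x0, y0, 0, 0);  // initialize from the first detection
    return kf;
}

// Per frame: predict, then correct only when the marker was actually detected.
cv::Point2f trackStep(cv::KalmanFilter& kf, bool markerFound, cv::Point2f measured)
{
    cv::Mat prediction = kf.predict();
    if (markerFound) {
        cv::Mat z = (cv::Mat_<float>(2, 1) << measured.x, measured.y);
        kf.correct(z);
        return measured;
    }
    // Marker missing in this frame: fall back to the predicted position.
    return cv::Point2f(prediction.at<float>(0), prediction.at<float>(1));
}
```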

4.3.2. Spatial Correspondence


Description: This module is responsible for extracting the real-world coordinates (X, Y, Z) of the marker positions. The following figure is a high-level description of the depth extraction module.
Interface:
Input:
1. Calibration parameters (intrinsic parameters of the left and right cameras).
2. The left and right images to be rectified.
Output: the 3D coordinates of the markers detected in the stereo images provided as input.


Flow Diagram:

[Figure 14: Depth extraction block diagram. Stereo calibration takes the calibration parameters; stereo rectification takes the left and right images together with the calibration parameters and outputs rectified images; depth map calculation then feeds 3D extraction.]

Module decomposition
Stereo Calibration (offline): The block diagram in the figure below is a high-level description of the initialization process. As shown in the figure, it takes a set of chessboard images captured by the webcam as input and provides the left and right rectification maps needed for stereo correspondence.

[Figure: Stereo calibration block diagram. The webcam provides chessboard images; the corner finder outputs the corners' 2D coordinates, which go to stereo calibration.]

Find Chessboard Corners


First of all, we need to calibrate every camera used in the system, because camera calibration is necessary for any image processing that must be a highly accurate representation of the real world. Camera calibration can be done very efficiently by observing a calibration object (usually consisting of two or three planes orthogonal to each other) whose geometry in 3D space is known with very good precision. Camera calibration needs such an object because its computations require a set of 3D points in the real world along with the 2D points that result from projecting them onto the image plane. A chessboard is the practical choice: since we know the dimensions of the squares of the grid, we can compute the actual coordinates of every corner in the grid. The figure below shows the chessboard we used in calibration.


[Figure x: the chessboard used in calibration]

Steps for getting the set of 3D and 2D chessboard corner points:
1. Capture several images of the chessboard with different orientations.
2. Input each captured image to a corner finder function (cvFindChessboardCorners() in OpenCV [x]); this function returns the image coordinates of every corner in the input image.
3. The resulting set of 3D points with their corresponding 2D points is then passed to the calibration function.
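A sketch of step 2 with the C++ wrapper of the same OpenCV routine is shown below; the 9x6 inner-corner pattern size is an assumption and must match the printed board.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Find the chessboard corners in one captured image and refine them
// to sub-pixel accuracy for better calibration results.
bool collectCorners(const cv::Mat& imageBGR, std::vector<cv::Point2f>& corners)
{
    cv::Size patternSize(9, 6);                 // inner corners per row and column (assumed)
    bool found = cv::findChessboardCorners(imageBGR, patternSize, corners);
    if (found) {
        cv::Mat gray;
        cv::cvtColor(imageBGR, gray, CV_BGR2GRAY);
        cv::cornerSubPix(gray, corners, cv::Size(11, 11), cv::Size(-1, -1),
                         cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::MAX_ITER,
                                          30, 0.01));
    }
    return found;
}
```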

Stereo Calibration
We perform stereo calibration in order to find the rotation matrix R and translation vector T between the two cameras. Steps for getting the rotation matrix R and translation vector T:
1. Use cvCalibrateCamera2() to find the rotation matrix and translation vector of each camera separately.
2. Plug these parameters into the following relation to find the rotation and translation between the two cameras:
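A standard form of this relation (see, e.g., [2]) expresses the stereo pair's rotation and translation in terms of the per-camera extrinsics (R_l, T_l) and (R_r, T_r) obtained in step 1:

$$ R = R_r R_l^{T}, \qquad T = T_r - R\,T_l $$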

Stereo Rectification


[Figure 15: Stereo rectification block diagram. The intrinsic camera parameters are used to get the rectification matrices, which are then used to rectify the left and right images.]

For the purpose of our project, we treat this part as a black box, since it is a purely mathematical step and there exist a number of functions for doing it. We first used the implementation in OpenCV that is based on Bouguet's algorithm [x], but it was not very efficient with our system. We then used the cvWarpPerspective() OpenCV function (refer to [x] for more details), which improved the accuracy of the rectification.
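For reference, a sketch of the Bouguet-style calibrated rectification path using the OpenCV C++ wrappers is shown below; the variable names are illustrative, and the project's final implementation used cvWarpPerspective() as noted above.

```cpp
#include <opencv2/opencv.hpp>

// M1, D1, M2, D2: intrinsic matrices and distortion coefficients of the two cameras.
// R, T: rotation and translation between the cameras (from stereo calibration).
void rectifyPair(const cv::Mat& M1, const cv::Mat& D1,
                 const cv::Mat& M2, const cv::Mat& D2,
                 const cv::Mat& R,  const cv::Mat& T,
                 cv::Size imageSize,
                 const cv::Mat& left, const cv::Mat& right,
                 cv::Mat& leftRect, cv::Mat& rightRect, cv::Mat& Q)
{
    cv::Mat R1, R2, P1, P2;
    cv::stereoRectify(M1, D1, M2, D2, imageSize, R, T, R1, R2, P1, P2, Q);

    cv::Mat map1x, map1y, map2x, map2y;
    cv::initUndistortRectifyMap(M1, D1, R1, P1, imageSize, CV_32FC1, map1x, map1y);
    cv::initUndistortRectifyMap(M2, D2, R2, P2, imageSize, CV_32FC1, map2x, map2y);

    // After remapping, corresponding points lie on the same image row,
    // which is what the block-matching correspondence step relies on.
    cv::remap(left,  leftRect,  map1x, map1y, cv::INTER_LINEAR);
    cv::remap(right, rightRect, map2x, map2y, cv::INTER_LINEAR);
}
```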

Depth Map Calculation


In our approach, we use the calibrated stereo correspondence approach. This approach assumes that we know the parameters of the stereo rig and reduces the search for correspondences to the epipolar lines. The key feature of this type of correspondence processing is that the epipolar lines are easily determined, since the camera parameters are known, and the emphasis is on increasing the speed and accuracy of the search itself. For the purposes of our project we treat this part as a black box, since there exist a number of useful algorithms for doing this task. We use the OpenCV implementation, the cvFindStereoCorrespondenceBM() function.

[Figure: a pair of rectified images is passed to cvFindStereoCorrespondenceBM(), which outputs the depth (disparity) map.]
As shown in figure x, the function takes a pair of left and right rectified images as input and outputs the disparity map, which holds the disparity value of each point belonging to the visual area in which the views of the two cameras overlap. After that, once we know the physical coordinates of the cameras in the scene, we can derive depth measurements from the triangulated disparity measures (explained in section x-y) between the corresponding points in the two camera views. We use the cvFindStereoCorrespondenceBM() OpenCV function, which implements the stereo correspondence very effectively.
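A sketch of this block-matching step with the same OpenCV C function is shown below; the number of disparities and the window size are illustrative values, not the project's actual tuning.

```cpp
#include <opencv2/opencv.hpp>

// Inputs are single-channel (grayscale) rectified images; output is a 16-bit disparity map.
void computeDisparity(const cv::Mat& rectLeftGray, const cv::Mat& rectRightGray,
                      cv::Mat& disparity16S)
{
    disparity16S.create(rectLeftGray.size(), CV_16S);

    CvStereoBMState* state = cvCreateStereoBMState(CV_STEREO_BM_BASIC, 64);
    state->SADWindowSize = 21;                 // matching block size

    CvMat left  = rectLeftGray;                // thin C wrappers around the cv::Mat headers
    CvMat right = rectRightGray;
    CvMat disp  = disparity16S;
    cvFindStereoCorrespondenceBM(&left, &right, &disp, state);

    cvReleaseStereoBMState(&state);
}
```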

3D Reconstruction (Depth Map)


Given the disparity value d of a certain 2D point (x, y) and the 4-by-4 reprojection matrix Q which is obtained from the calibrated stereo rectification step, we can derive the actual coordinates of that point using the following equation:

$$\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} = Q \begin{bmatrix} x \\ y \\ d \\ 1 \end{bmatrix}$$

where the real-world coordinates are then (X/W, Y/W, Z/W). OpenCV has two functions that do this extraction automatically. The first, cvPerspectiveTransform, operates on an array of points and their associated disparities. The second, cvReprojectImageTo3D, operates on a whole image and its associated disparity map. For our approach, we chose the first function as it is more efficient with respect to time and memory, since we do not need to find the actual coordinates of all pixels of the image, only those of specific points (the marker positions).
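As a sketch, the point-wise variant looks like this with the C++ wrapper of cvPerspectiveTransform (illustrative only):

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Reproject a handful of marker points to 3D with the 4x4 reprojection matrix Q.
// Each input point is (x, y, d): image coordinates plus disparity.
std::vector<cv::Point3f> reprojectMarkers(const std::vector<cv::Point3f>& xyd,
                                          const cv::Mat& Q)
{
    std::vector<cv::Point3f> xyz(xyd.size());
    // For 3-channel points and a 4x4 matrix, perspectiveTransform computes
    // (X/W, Y/W, Z/W), exactly as in the equation above.
    cv::perspectiveTransform(xyd, xyz, Q);
    return xyz;
}
```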

4.3.3. VM-M-01 Pose Reconstruction


Description: This module is responsible for generating the pose information. It handles the reconstruction of the human body pose from the 3D points. The module works with a set of constraints during reconstruction to make sure the constructed pose is valid. For performance reasons this module depends on both the articulated body and the segments being tracked in the body.
Interface:
Input:
o The positions of the markers in 3D
o The classification of each marker
Output:
o The reconstructed 3D human pose
Flowchart:

[Figure 16: Pose reconstruction. The markers' positions and classification, together with the constraints, are fed to pose reconstruction, which outputs the pose data.]

The pose data is presented in two representations: the spherical coordinates of each segment of the human body, as well as the unit vector along each segment.
Major Components:
VM-02-C-00 Reconstruction: the reconstruction component is the core component of the pose reconstruction module. It uses vector algebra to obtain the orientation of each individual joint. The idea is that each segment vector is approximated by the two markers at the ends of the segment. For performance reasons the implemented reconstruction module only enforces the constant limb-length constraints.
VM-02-C-01 Pose:
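A sketch of this per-segment vector algebra is shown below; the types and names are illustrative, not the framework's actual classes.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

struct SegmentPose {
    Vec3  unitDir;    // unit vector along the segment
    float length;     // |end - start|
    float theta;      // polar angle, acos(z / r)
    float phi;        // azimuth, atan2(y, x)
};

// The segment direction is approximated by the two markers at its ends,
// then expressed both as a unit vector and in spherical coordinates.
bool reconstructSegment(const Vec3& startMarker, const Vec3& endMarker, SegmentPose& out)
{
    Vec3 d = { endMarker.x - startMarker.x,
               endMarker.y - startMarker.y,
               endMarker.z - startMarker.z };
    out.length = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    if (out.length <= 0.0f)
        return false;                          // coincident markers: orientation undefined
    out.unitDir.x = d.x / out.length;
    out.unitDir.y = d.y / out.length;
    out.unitDir.z = d.z / out.length;
    out.theta = std::acos(out.unitDir.z);
    out.phi   = std::atan2(out.unitDir.y, out.unitDir.x);
    return true;
}
// The constant limb-length constraint can then be enforced by placing the segment
// end at startMarker + knownLength * unitDir.
```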


Pose is the output of the framework. It is mainly generated by the reconstruction component. It stores the information about each segment needed to reconstruct the segment pose, essentially constructing the human body pose.

4.4. Virtual ME Application


A general sequence of how Virtual ME is used in an application is shown in the following figure.

[Figure 17: Virtual ME application sequence. The application (1) initializes the Virtual ME framework and creates the framework interface, (2) queries the framework for new pose information inside its processing loop, (3) receives the framework result after a processing delay, and (4) releases the Virtual ME resources on exit.]

Basically there are four major steps for using Virtual ME:
1. Initialization: the first step for using Virtual ME is initialization. The application creates the Virtual ME interface; this class is used to communicate with Virtual ME. Behind the scenes the framework allocates the required resources, most notably the capture interface used to talk to the cameras.
2. Query for pose: in this step the application asks the framework for input. It is worth mentioning that there is no buffering feature in Virtual ME; frames can be lost due to failure to acquire them from the camera before they are discarded. It is also worth mentioning that there is usually a delay for the Virtual ME function call. This delay can cause the application to become unresponsive, hence it is advisable to run the framework in a different thread.
3. Pose returned:


When the framework processing is done, the result is returned to the application. The application can then use the results to animate objects, create events, etc.
4. Release resources: this is the last step for using the framework. The framework resources are deallocated to prevent resource and memory leaks.
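The four steps are sketched below; the class and method names are hypothetical illustrations, not the framework's documented API.

```cpp
struct Pose { /* reconstructed per-segment orientation data */ };

class VirtualME {
public:
    bool initialize() { /* allocate the capture interface and internal resources */ return true; }
    bool queryPose(Pose& out) { /* run the marker/depth/pose pipeline */ (void)out; return false; }
    void release()    { /* free framework resources */ }
};

void runApplication()
{
    VirtualME vm;
    if (!vm.initialize())                  // step 1: initialization
        return;

    bool running = true;
    while (running) {
        Pose pose;
        if (vm.queryPose(pose)) {          // step 2: query; step 3: result returned
            // animate the avatar, raise game events, ...
        }
        // ... remaining application logic, rendering, exit check ...
        running = false;                   // single pass for illustration
    }

    vm.release();                          // step 4: release resources
}
```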

4.4.1. Virtual Goal Keeper


Virtual Goal Keeper is a Virtual ME application. It is a simple game where the player takes the role of a goalkeeper and tries to protect the goal from the incoming balls. The player's motion is captured and applied to the player's avatar. The game uses a physics engine to handle collisions. Essentially the game consists of two different worlds: the graphics world, which is rendered on the screen and seen by the player, and the physics world, a virtual world where object physics is simulated to obtain the objects' new states, e.g. position and velocity. The idea is to have two versions of each game entity: one for graphics, containing shape, textures and visual elements, and one for physics, containing properties like mass and velocity. To facilitate this, every game entity has two attached objects, one for graphics and one for physics.

[Figure 18: Game entity. Each game entity aggregates an entity-physics object and an entity-graphics object.]

The update process of the game world is divided into two steps:
- Update physics
- Update graphics to match physics
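A minimal sketch of this entity layout and update order follows; the names are hypothetical, while the real game wraps Havok and Irrlicht objects respectively.

```cpp
// Each game entity carries one physics representation and one graphics representation.
struct EntityPhysics {
    float mass;
    float position[3];
    float velocity[3];
};

struct EntityGraphics {
    // mesh, textures and other visual elements would live here
};

struct GameEntity {
    EntityPhysics  physics;
    EntityGraphics graphics;

    // One world-update step: physics is advanced first (by the physics engine),
    // then the new state is copied into the graphics representation so the
    // rendered world matches the simulated one.
    void syncGraphicsToPhysics() {
        // e.g. graphicsNode->setPosition(physics.position);
    }
};
```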

[Figure 19: Virtual Goal Keeper game flow. After initializing the application, the game loop handles the game logic, checks Virtual ME for output and updates physics (the physics engine advances the physics world and returns the new world state), then updates the object states and graphics (the graphics engine renders the new graphics world state); finally the game exits and releases its resources.]


During the initialization phase the Virtual ME framework is initialized and started in a thread. The physics and graphics worlds are also initialized; this includes initializing the graphics and physics engines. The application then starts the game loop. The main operations in the game loop are checking the Virtual ME thread and restarting it when needed, running physics, handling the game logic and then updating the displayed world.

4.4.2. VMAPP-M-01 Game Logic
Description:


This module is the centralized control of the application. It is responsible for handling the user input and keeping track of goals scored, as well as supervising the physics and graphics modules.

4.4.3. VMAPP-M-01 Game Physics
Description:


This module is responsible for handling the physics world. It acts as a wrapper for the Havok physics engine. The module handles most of the physics requirements of the game transparently.

4.4.4. VMAPP-M-02 Game Graphics
Description:


This module is responsible for handling the in-game graphics. It wraps Irrlicht and is responsible for the game graphics objects as well as the camera.


4.5. System Flow
4.6. Limitations


Marker colors: we use only four colors to identify the basic parts of the body, so some parts take the same color and we differentiate between them by their position with respect to the other marker of the same color (e.g. the left and right shoulders are both green, but the right shoulder always has the minimum x value). We therefore limit the motion of the user with the following constraints:

Body part | Constraint
Shoulders | Shoulders cannot be lower than the waist
Elbow | Elbow cannot be lower than the right knee
Wrist | Wrist cannot be lower than the ankle
Position | The user cannot turn around

Table 2: Limitation table


5. Chapter Five: Results, Conclusion & Future Work


5.1. Testing & Results
5.1.1. System Testing
5.1.2. Module Testing

5.2. Conclusion
5.2.1. Methodology Error
The methodology introduces two main types of errors: inherent methodology error and interpolation error.

Inherent Methodology Error:


This error sums up all of the inaccuracies in the library output during the non-interpolation phase. The major sources of these errors are:
1. Error in marker placement: because of the limitation on the number of markers, each two markers sharing the same joint are aggregated into one marker. This is acceptable as long as the marker is placed exactly on the joint, which is not very practical.
2. Marker center error: due to lighting conditions the detected portion of the marker may vary; this in turn affects the representative points, which affect the orientation vector of the segment.
3. Bad/inaccurate 3D remapping: the algorithm that generates the depth map fails to generate a complete depth map; if the marker position lies within a bad disparity part of the depth map, the result has to be approximated. Also, the accuracy of the depth map is not very high and can lead to instability in the position of a stationary point over time in the video.
4. Noise leading to bad marker classification: the marker classification is based on the relative spatial positions of the markers. Noise can lead to misclassification of markers. This is especially devastating because at least two segments are affected by each marker.
5. Inherent mathematical inaccuracies: numerical errors can accumulate, leading to inaccuracies in the orientation vectors.


Figure 20 Marker Position Error

The arm on the left has the ideal marker placement, while the arm on the right has only three markers, one of which is misplaced; this leads to a deviation in the orientation vector.

Interpolation Error:
Calling the complete framework sequence for each frame may have a huge impact on performance. To handle this problem the output can be interpolated between each two successive framework calls, so that the jerky motion from the framework is smoothed across a number of frames to give the motion a smoother feel. This interpolation, however, is not the actual motion, and changes in the motion between two interpolation frames may be lost.
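A sketch of the interpolation idea follows; the linear blend below is illustrative only (a production system might interpolate joint orientations spherically instead).

```cpp
struct Vec3 { float x, y, z; };

// Linear blend of two framework results; t is in [0, 1].
Vec3 lerp(const Vec3& a, const Vec3& b, float t)
{
    Vec3 r;
    r.x = a.x + t * (b.x - a.x);
    r.y = a.y + t * (b.y - a.y);
    r.z = a.z + t * (b.z - a.z);
    return r;
}

// Example: if the framework is queried every N render frames, render frame i
// (0 <= i < N) after the last result uses t = float(i) / N to blend the previous
// and the latest segment direction vectors before drawing the avatar.
```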

5.2.2. Quality of Reconstruction
5.2.3. Testing Results
5.2.4. Performance Enhancement


Two performance enhancement techniques have been used to improve the run time of the framework and application.

Code optimization
Using profilers, a list of functions that take up a significant percentage of the runtime was compiled. The list was then divided into three categories:
- Black-box functions: the external library functions used by the framework; these functions are quite hard to optimize.
- Performance/accuracy functions: functions whose processing depends on some parameter and whose accuracy varies inversely with the speed.
- Other functions: functions that do not fit into either of the above categories.


We chose not to optimize the black-box functions, since their optimization would take a significant amount of time and is quite risky. The other function categories were the focus of optimization; by inlining some strategic functions and changing the data structures used, a performance increase of --- was achieved.

Threading
In the initial runs the framework calls were putting a significant time penalty on the application render loop, adding a stall and an application freeze whenever the framework was called. To handle this, the logic was modified so that the library call is handled in an independent thread. The stall in the render loop was significantly reduced.
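A sketch of this threading scheme is shown below; it reuses the hypothetical VirtualME/Pose names from the earlier sketch and uses C++11 synchronization primitives (std::mutex, std::atomic) for brevity, although the project itself targeted Visual C++ 2008 and would have used a platform thread API.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical framework interface, stubbed for illustration.
struct Pose { /* per-segment orientation data */ };
class VirtualME { public: bool queryPose(Pose& out) { (void)out; return false; } };

std::mutex        g_poseMutex;
Pose              g_latestPose;
std::atomic<bool> g_running(true);

// Runs in its own thread so the potentially slow framework call
// never stalls the application's render loop.
void frameworkThread(VirtualME* vm)
{
    while (g_running) {
        Pose p;
        if (vm->queryPose(p)) {
            std::lock_guard<std::mutex> lock(g_poseMutex);
            g_latestPose = p;                 // publish the newest result
        }
    }
}

// The render loop only takes the lock briefly to copy g_latestPose,
// so it never blocks on the framework call itself.
```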

Future Work
5.2.5. Robust marker detection and classifier
A naive approach for detecting the markers and classifying them is used in Virtual ME. This imposed constraints on the player's pose, as well as introduced inaccuracies in the marker tracking process.

5.2.6. More Accurate Methods to get the depth


The current way of handling stereo calibration and producing the depth map is vulnerable to bad disparity values as well as erroneous results.

5.2.7. More Constraints for the Pose Constructor


The current pose constructor model only applies constant-length constraints. More constraints would help the pose construction process as well as prevent erroneous values. Examples of additional constraints are angular velocities of joints and additional constraints on joints' allowable orientations.

Educational Value

REFERENCES
[1] Connected Components Labeling, HIPR2: http://homepages.inf.ed.ac.uk/rbf/HIPR2/label.htm
[2] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library, O'Reilly Media, 2008.
[3] Kalman filter, Wikipedia: http://en.wikipedia.org/wiki/Kalman_filter

ARABIC SUMMARY


Appendix A. User Manual

Appendix B. Kalman Filter


The basic idea behind the Kalman filter is that it is possible, given a history of measurements of a system, to build a model for the state of the system that maximizes the a posteriori probability of those previous measurements. What does it mean to maximize the a posteriori probability of those previous measurements? It means that the new model we construct after making a measurement, taking into account both our previous model with its uncertainty and the new measurement with its uncertainty, is the model that has the highest probability of being correct. This means that the Kalman filter is the best way to combine data from different sources or from the same source at different times. [2]

Kalman Equations
The Kalman filter model assumes that the true state at time k evolves from the state at time k-1 according to

$$x_k = F_k x_{k-1} + B_k u_k + w_k$$

where F_k is the state transition model, which is applied to the previous state x_{k-1}; B_k is the control-input model, which is applied to the control vector u_k; and w_k is the process noise, which is assumed to be drawn from a zero-mean multivariate normal distribution with covariance Q_k. At time k an observation (or measurement) z_k of the true state x_k is made according to

$$z_k = H_k x_k + v_k$$

where H_k is the observation model, which maps the true state space into the observed space, and v_k is the observation noise, which is assumed to be zero-mean Gaussian white noise with covariance R_k. The initial state and the noise vectors at each step {x_0, w_1, ..., w_k, v_1, ..., v_k} are all assumed to be mutually independent. [3]

Appendix C. Installation & Operation Manual

Appendix D. Programmer's Manual

Appendix E. Design Outline

Appendix F. Test Cases
1. VM-M-00 Marker Extraction:
Test case | Values
Empty frame | Null frame
All red frame | Red image
No camera attached | Unplug the webcam
Not all markers available | Hide the left hand
A lot of markers available | Bring some colored objects in front of the cam
Noisy frame | Remove the white background

Table 3: Marker extraction test cases

2. Component testing


1. VM- M-01 Color Detection
Test case | Values
Empty frame | Null frame
All red frame | Red image
All green frame | Green image
All blue frame | Blue image
All yellow frame | Yellow image
Noisy frame with a lot of colors | Bring some colored objects in front of the cam

Table 4: Color detection test cases

2. VM- M-02 Segmentation


Test case | Values
Empty component | White image
Very large number of points | Bring some colored objects in front of the cam
2 non-connected components | Image with 2 red balls
2 very close non-connected components | Image with 2 red balls close to each other
Component with very few points | Image with one pixel

Table 5: Segmentation test cases

3. VM-M-03 Classification
Test case | Values
Empty container | White image
Container with a large number of components | Image with 10 red balls
Container with only one component | Image with one red ball
Container with a component whose center is at the image boundaries | Image with red corners

Table 6: Classification test cases

4. VM- M-04 Prediction


Test case | Values
Frame with a missing marker | Image with only one green, one red, one blue & one yellow ball

Table 7: Prediction test cases
