2010
Virtual ME
A Graduation Project Report Presented in Partial Fulfillment of the B.Sc. Degree in Computer Engineering
Submitted by
Supervised by
Abstract
In this document, we propose a real-time 3D human body tracking system that uses video cameras to track the motion of colored markers attached to particular locations on the actor's body. The marker locations are chosen so that the basic body parts (e.g. the arm) are covered. The problem is decomposed into three sub-problems: marker extraction, depth extraction, and pose reconstruction. We implement a modular framework in which each module tackles one of these sub-problems, and we also implement an application that utilizes our framework.
ACKNOWLEDGEMENTS
We thank Dr. Magda Fayek for her help and support, the OpenCV team and community, and our families for their understanding.
CONTACTS
Name | E-mail | Mobile
Menna Hamza Mohammed
Mohamed Hesham Fadl
Mona Abdel-Mageed El Koussy
Yasmine Shaker AbdelHameed
Supervisor
Name: Dr. Magda Fayek
E-mail: magdafayek@gmail.com
Mobile: 0101589411
TABLE OF CONTENTS

ACKNOWLEDGEMENTS
CONTACTS
SUPERVISOR
LIST OF TABLES
LIST OF FIGURES
TABLE OF ACRONYMS AND DEFINITIONS
DOCUMENT HISTORY
1. CHAPTER ONE: INTRODUCTION
   1.1. Motivation and Justification
   1.2. Problem Definition
   1.3. Summary of Approach
   1.4. Document Structure
2. CHAPTER TWO: THE SCIENCE BEHIND VIRTUAL ME
   2.1. Image Processing
      2.1.1. Image Smoothing: Median Filter; Dilation and Erosion
      2.1.2. Segmentation: K-means Clustering; Threshold Technique; Edge Finding Technique; Connected Component Labeling
      2.1.3. Color Detection: Color Models
   2.2. Geometry Overview
      2.2.1. Systems of Coordinates
      2.2.2. Homogeneous Coordinates; Vector Operations
      2.2.3. Geometric Transformations
   2.3. Camera Projection Models
      2.3.1. The Weak Projection Model
      2.3.2. The Pinhole Camera Projection Model; Refining the Pinhole Projection Model
   2.4. Camera Calibration
   2.5. Stereo Vision: Stereo Vision Terminology; Stereo Camera Parameters
   2.6. Kinematics
      2.6.1. Rigid Body
      2.6.2. Kinematic Chains
   2.7. Human Body
   2.8. Graphics Engine
   2.9. Physics Engine
3. CHAPTER THREE: LITERATURE AND MARKET SURVEY
   3.1. Literature Survey
   3.2. Market Survey
      3.2.1. PlayStation Eye
      3.2.2. Nintendo Wii
      3.2.3. ARENA Motion Capture
4. CHAPTER FOUR
   4.1. Development Tools & Environment
   4.2. Basic Blocks
   4.3. Framework Modules
      4.3.1. VM-M-00 Marker Extraction
      4.3.2. Spatial Correspondence: Find Chessboard Corners; Stereo Calibration; Stereo Rectification; Depth Map Calculation; 3D Reconstruction (Depth Map)
      4.3.3. VM-M-01 Pose Reconstruction
   4.4. Virtual ME Application
      4.4.1. Virtual Goal Keeper
      4.4.2. VMAPP-M-01 Game Logic
      4.4.3. VMAPP-M-01 Game Physics
      4.4.4. VMAPP-M-02 Game Graphics
   4.5. System Flow
   4.6. Limitations
5. CHAPTER FIVE: RESULTS, CONCLUSION & FUTURE WORK
   5.1. Testing & Results
      5.1.1. System Testing
      5.1.2. Module Testing
   5.2. Conclusion
      5.2.1. Methodology Error
      5.2.2. Quality of Reconstruction
      5.2.3. Testing Result
      5.2.4. Performance Enhancement
      5.2.5. Robust Marker Detection and Classifier
      5.2.6. More Accurate Methods to Get the Depth
      5.2.7. More Constraints for the Pose Constructor
REFERENCES
APPENDIX A. USER MANUAL
APPENDIX B.
APPENDIX C. INSTALLATION & OPERATION MANUAL
APPENDIX D. PROGRAMMERS MANUAL
APPENDIX E. DESIGN OUTLINE
APPENDIX F.
LIST OF TABLES
Table 1: Table of Acronyms and Definitions
Table 2: Limitation table
Table 3: Marker extraction test cases
Table 4: Color detection test cases
Table 5: Segmentation test cases
Table 6: Classification test cases
Table 7: Prediction test cases
LIST OF FIGURES
Figure 1: Median filter
Figure 2: Segmentation
Figure 3: RGB illustration
Figure 4: HSV color space
Figure 5: Pinhole camera projection
Figure 6: Stereo vision clarification
Figure 7: Stereo vision terminology
Figure 8: Triangulation
Figure 9: The correspondence problem
Figure 10: Depth reconstruction
Figure 11: Kinematic chain illustration
Figure 12: PlayStation Eye
Figure 13: Nintendo game
Figure 14: Virtual Me basic blocks
Figure 15: Marker extraction flow chart
Figure 16: Virtual Me marker distribution
Table of Acronyms and Definitions
Following is a table of the acronyms and abbreviations used throughout this document.

Term | Definition
DOCUMENT HISTORY
Modified by | Version | Date | Descriptions/Remarks
1. CHAPTER ONE: INTRODUCTION

In recent years there has been a resurgence of interest in human body tracking (HBT). It has a wide range of applications in human-computer interfaces and virtual environments, such as controlling virtual interfaces in games, or for educational purposes such as virtual driving and flight training. Detailed analysis and tracking of the human body is also used in clinical studies and diagnostics of orthopedic patients, and to help athletes understand and improve their performance.
Surveillance applications also have their place among HBT applications, where humans are tracked over time and monitored for particular actions. Different applications have different needs, so different techniques have been invented to satisfy them.

Marker-based techniques involve the placement of markers on different body parts. The markers may be optical, magnetic, or colored, and each type has its own pros and cons:

- Optical markers give the user free movement and a long tracking range, but the markers are subject to optical occlusion, require controlled lighting and extensive pre- and post-processing, and this is among the most expensive techniques.
- Magnetic sensors are not subject to optical occlusion and give the sensor orientation directly, but they are disrupted by the electrical/magnetic fields of other devices and have a small tracking range.
- Colored markers (used in our system) give the user free movement and a long tracking range, but are subject to marker disappearance and need a controlled lighting environment.

Marker-less techniques involve detecting human body features and using kinematic models, with no need for intrusive markers or special cameras. This is very attractive, but designing such a system is not a trivial task.
Median Filter
The median filter is one of the most widely used smoothing techniques. It is a nonlinear filter used to remove impulsive noise from an image. Furthermore, it is more robust than traditional linear filtering because it preserves sharp edges. The median filter is a spatial filtering operation, so it uses a 2-D mask that is applied to each pixel in the input image. Applying the mask means centering it on a pixel, evaluating the covered pixel brightnesses, and determining which brightness value is the median. Figure 1 presents the concept of spatial filtering based on a 3x3 mask, where I is the input image and O is the output image [1].
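The 3x3 masking step described above can be sketched as follows. This is an illustrative pure-Python version (our system itself uses OpenCV); images are lists of brightness rows, and leaving border pixels unfiltered is a simplifying assumption of this sketch.

```python
# A minimal sketch of 3x3 median filtering on a grayscale image stored as a
# list of rows. Border pixels are left unchanged here (a simplifying
# assumption; real implementations usually pad the image instead).

def median_filter_3x3(image):
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # copy; borders keep original values
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Gather the 9 brightness values covered by the mask.
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            window.sort()
            out[y][x] = window[4]  # the median of 9 sorted values
    return out
```

Because the median ignores extreme values, a single impulse-noise pixel in an otherwise flat region is replaced by the surrounding brightness, while a step edge survives unchanged.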
Erosion is the converse operation. The action of the erosion operator is equivalent to computing a local minimum over the area of the kernel. Erosion generates a new image from the original using the following algorithm: as the kernel B is scanned over the image, we compute the minimal pixel value overlapped by B and replace the image pixel under the anchor point with that minimal value. However, because dilation is just a max operator and erosion is just a min operator, morphology may be used on intensity images as well. In general, whereas dilation expands region A, erosion reduces region A. Moreover, dilation will tend to smooth concavities and erosion will tend to smooth away protrusions. Of course, the exact result will depend on the kernel, but these statements are generally true for the filled convex kernels typically used.
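The min/max view of erosion and dilation described above can be sketched directly. This is an illustrative pure-Python version; the helper names are ours, and ignoring out-of-image pixels is one possible border convention.

```python
# A sketch of grayscale erosion and dilation as a local minimum/maximum over
# a square kernel of radius k. Out-of-image pixels are simply ignored.

def _morph(image, k, select):
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [image[yy][xx]
                      for yy in range(max(0, y - k), min(h, y + k + 1))
                      for xx in range(max(0, x - k), min(w, x + k + 1))]
            out[y][x] = select(window)
    return out

def erode(image, k=1):
    return _morph(image, k, min)   # local minimum: shrinks bright regions

def dilate(image, k=1):
    return _morph(image, k, max)   # local maximum: expands bright regions
```

Dilating a single bright pixel grows it into a block the size of the kernel, and eroding that block with the same kernel shrinks it back, which matches the expand/reduce behavior described in the text.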
2.1.2. Segmentation
Segmentation of an image entails the division of the image into parts that have a strong correlation with objects or areas of the real world. The result of image segmentation is a set of segments that collectively cover the entire image. The pixels in a region are similar with respect to some characteristic or computed property, such as color, intensity, or texture, while adjacent regions differ significantly with respect to the same characteristic(s). [5] There are many segmentation techniques; a brief overview of the most widely used ones follows.
K-means Cluster
K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The procedure classifies a given data set through a certain number of clusters (say, k clusters) fixed a priori. The main idea is to define k centers, one for each cluster. Each point in the data set is then associated with the nearest center; when no point is pending, the first step is complete and an early grouping is done. At this point, k new centers are recomputed from the clusters resulting from the previous step, and a new binding is made between the same data set points and the nearest new centers. This loop repeats, and the k centers change their locations step by step until no more changes occur; in other words, the centers no longer move. [6]
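The loop described above can be sketched as follows. Initializing the centers to the first k points is a simplifying assumption of this sketch; real implementations choose the initial centers more carefully.

```python
# A minimal sketch of the k-means loop: assign each point to its nearest
# center, recompute the centers, and stop once the centers no longer move.

def kmeans(points, k, max_iters=100):
    centers = [points[i] for i in range(k)]  # assumption: first k points
    for _ in range(max_iters):
        # Step 1: associate each point with its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centers[i])))
            clusters[nearest].append(p)
        # Step 2: recompute each center as the mean of its cluster.
        new_centers = []
        for i in range(k):
            if clusters[i]:
                n, dim = len(clusters[i]), len(points[0])
                new_centers.append(tuple(sum(p[d] for p in clusters[i]) / n
                                         for d in range(dim)))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center
        if new_centers == centers:   # centers stopped moving: converged
            break
        centers = new_centers
    return centers
```

On two well-separated groups of 2-D points, the centers converge to the group means after a couple of iterations.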
Threshold Technique
Thresholding makes decisions based on local pixel information. Since many objects or image regions are characterized by constant reflectivity or light absorption of their surfaces, thresholding is effective when the intensity levels of the objects fall squarely outside the range of levels in the background. The technique is computationally inexpensive and fast, and can easily be applied in real time. [7]
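Fixed-level thresholding amounts to a single comparison per pixel, which is why it is so cheap. A minimal sketch (the threshold value passed in is an arbitrary illustrative choice):

```python
# A sketch of fixed-level thresholding on a grayscale image: pixels at or
# above the threshold map to 255 (object), the rest to 0 (background).

def threshold(image, t):
    return [[255 if px >= t else 0 for px in row] for row in image]
```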
Figure 2: Segmentation (original image and segmented image)
if more than one of the neighbors has V = {1}, assign one of their labels to p and make a note of the equivalences. After completing the scan, the equivalent label pairs are sorted into equivalence classes and a unique label is assigned to each class. As a final step, a second scan is made through the image, during which each label is replaced by the label assigned to its equivalence class.
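The two-pass procedure above can be sketched for binary images with 4-connectivity. The small union-find structure used to resolve the noted equivalences is our addition; any equivalent bookkeeping would do.

```python
# A sketch of classical two-pass connected component labeling for a binary
# image (values 0/1) with 4-connectivity. First pass: assign provisional
# labels and note equivalences; second pass: relabel by equivalence class.

def label_components(image):
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]
    parent = {}  # union-find over provisional labels

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    next_label = 1
    for y in range(h):
        for x in range(w):
            if image[y][x] != 1:
                continue
            left = labels[y][x - 1] if x > 0 else 0
            up = labels[y - 1][x] if y > 0 else 0
            neighbors = [l for l in (left, up) if l]
            if not neighbors:
                parent[next_label] = next_label  # brand-new provisional label
                labels[y][x] = next_label
                next_label += 1
            else:
                labels[y][x] = min(neighbors)
                if len(neighbors) == 2:  # both neighbors labeled: equivalence
                    ra, rb = find(left), find(up)
                    if ra != rb:
                        parent[max(ra, rb)] = min(ra, rb)

    # Second scan: replace each label by its equivalence-class representative.
    classes = sorted({find(l) for l in parent})
    remap = {c: i + 1 for i, c in enumerate(classes)}
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                labels[y][x] = remap[find(labels[y][x])]
    return labels
```

A U-shaped region gets two provisional labels on its first rows, and the equivalence noted where the arms join merges them into a single final label.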
Color Models
The purpose of a color model (also called a color space) is to facilitate the specification of colors in some standard way. In digital image processing, the most commonly used models are RGB (red, green, blue); CMY (cyan, magenta, yellow); CMYK (cyan, magenta, yellow, black); and HSV (hue, saturation, value), which corresponds closely to the way humans describe and interpret color. [2] For the detection problem we preferred the HSV model, to make the project reliable: the user can adjust the brightness and the saturation of the colors depending on the lighting conditions in the room and the color of his markers.

RGB Model

In the RGB model, each color appears in its primary spectral components of red, green, and blue. The number of bits used to represent each pixel in RGB space is called the pixel depth. Consider an RGB image in which each of the red, green, and blue images is an 8-bit image; each RGB color pixel is then said to have a depth of 24 bits. [3] RGB is an additive color model in which red, green, and blue light are added together in various ways to reproduce a broad array of colors.
Figure 3: RGB illustration (red, green, and blue channels combined on a color monitor)
HSV Model

The RGB and CMY color models are not well suited for describing colors in terms that are practical for human interpretation, so the HSV color space is an important model for image processing applications: it represents colors similarly to how the human eye senses them. [4] The HSV color model represents every color with three components: hue (H), saturation (S), and value (V). The figure below illustrates how the HSV color space represents colors.
2.1.3.1. RGB to HSV Conversion: Creating an image in RGB or CMY and converting it to HSV is a straightforward process.
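As a sketch, the conversion for a single 8-bit pixel might look like the following. The halving of the hue mirrors OpenCV's 0-179 hue convention for 8-bit images; exact values may differ from cvCvtColor by rounding, so treat this as illustrative.

```cpp
#include <algorithm>

// Convert one 8-bit RGB pixel to HSV, following OpenCV's 8-bit convention:
// H in [0, 180), S and V in [0, 255].
void rgbToHsv(int r, int g, int b, int& h, int& s, int& v) {
    int mx = std::max({r, g, b});
    int mn = std::min({r, g, b});
    int delta = mx - mn;
    v = mx;                                   // value = brightest channel
    s = (mx == 0) ? 0 : 255 * delta / mx;     // saturation = chroma / value
    double hue = 0.0;
    if (delta != 0) {
        if (mx == r)      hue = 60.0 * (g - b) / delta;
        else if (mx == g) hue = 120.0 + 60.0 * (b - r) / delta;
        else              hue = 240.0 + 60.0 * (r - g) / delta;
        if (hue < 0) hue += 360.0;
    }
    h = static_cast<int>(hue / 2.0 + 0.5);    // map 0..360 degrees to 0..180
}
```

Saturation and value come directly from the channel maximum and the chroma, which is why the user-adjustable S and V thresholds mentioned above are easy to apply in this space.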
A point in n-dimensional space is completely represented by n constants giving the length of the point's vector component along each of the space's axes. The axes are usually taken as orthonormal vectors for simplicity, but any n independent vectors suffice to describe the space.
Cross product
This operation produces a vector in the direction normal to both input vectors.
Translation transform
Translation moves a point along a straight line. It is defined by the translation vector T: the unit vector of T defines the direction of the displacement and the magnitude of T defines the magnitude of the translation.
The translation matrix for a displacement T = (tx, ty, tz), in homogeneous coordinates, is defined as:

    | 1 0 0 tx |
    | 0 1 0 ty |
    | 0 0 1 tz |
    | 0 0 0 1  |
Rotation transform
Rotation is the process of rotating a point around one of the axes. The rotation by an angle t around each axis, in the right-handed direction, is defined by the following matrices:

    Rx(t) = | 1   0      0     |      Ry(t) = |  cos t  0  sin t |      Rz(t) = | cos t  -sin t  0 |
            | 0  cos t  -sin t |              |  0      1  0     |              | sin t   cos t  0 |
            | 0  sin t   cos t |              | -sin t  0  cos t |              | 0       0      1 |
Lens distortion

There are two main lens distortions:
Radial distortion: It is easier to manufacture a spherical lens than the ideal parabolic lens. This leads to the fish-eye phenomenon, where the image is distorted and bent around the optical center of the image. The distortion is directly proportional to r, the distance between the point and the optical center of the image, and can be modeled as a Taylor expansion around r. The equations for radial distortion are:

    x_corrected = x (1 + k1 r^2 + k2 r^4 + k3 r^6)
    y_corrected = y (1 + k1 r^2 + k2 r^4 + k3 r^6)

where k1, k2, k3 are the radial distortion coefficients. The number of terms used in the correction process depends on the camera; for modern cameras the first two parameters k1 and k2 are usually sufficient.

Tangential distortion: Tangential distortion is caused by defects in the camera's manufacturing process, e.g. the imager not being parallel to the image plane. It is modeled as:

    x_corrected = x + [2 p1 x y + p2 (r^2 + 2 x^2)]
    y_corrected = y + [p1 (r^2 + 2 y^2) + 2 p2 x y]

where r is still the distance between the point and the optical center, and p1, p2 are the tangential distortion coefficients.
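A minimal sketch of applying this distortion model to a normalized image point; the coefficient values used when testing it are illustrative, not taken from a real camera.

```cpp
// Apply the radial (k1, k2) and tangential (p1, p2) distortion model
// to a normalized image point (x, y).
void distort(double x, double y,
             double k1, double k2, double p1, double p2,
             double& xd, double& yd) {
    double r2 = x * x + y * y;                 // squared distance from center
    double radial = 1.0 + k1 * r2 + k2 * r2 * r2;
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x);
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y;
}
```

With all coefficients zero the model is the identity, and a point at the optical center is never displaced, matching the description above.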
Figure 6
Figure 7
But how can we find the two corresponding points in the left and right images? Epipolar geometry answers that question.
Epipolar Geometry
The idea is simple: suppose an image point is observed in the left image from a 3D feature. The exact position of the 3D point is unknown, but it must lie somewhere on the back-projection line (shown dashed). If we take this line and project it onto the right camera image, we get a line in the right camera image, called an epipolar line. So, given a feature in one image, its matching view in the other image must lie along the corresponding epipolar line.
The Essential and Fundamental Matrices
The essential matrix E captures the essential geometry of stereo imaging, i.e. it contains all of the information about the translation T and the rotation R that describe the location of the second camera relative to the first in global coordinates, as shown in Figure x.
The fundamental matrix F contains the same information as E, plus information about the intrinsics of both cameras. Once we have the fundamental matrix F, we can use it to rectify the two stereo images so that the epipolar lines are arranged along image rows and the scan lines are the same across both images.
Stereo Rectification
Given a pair of stereo images, rectification determines a transformation of each image plane such that pairs of epipolar lines become collinear and parallel to one of the image axes. The important advantage of rectification is that computing stereo correspondences (explained in the next section) is reduced to a 1-D search problem along the horizontal raster lines of the rectified images. The following figure illustrates the effect of stereo rectification.
Figure 6: The search space before (1) and after (2) rectification
The result of stereo rectification is eight terms, four for the left camera and four for the right one. The terms for each camera are:
1. The distortion vector distCoeffs
2. The rotation matrix
3. The rectified camera matrix
4. The unrectified camera matrix
From these terms we build a rectification map, using the cvInitUndistortRectifyMap function (explained in section x-y), which is used to interpolate pixels from the original image in order to create a new rectified image.
Stereo Correspondence
Stereo correspondence is the problem of finding pairs of matched points such that each point in the pair is the projection of the same 3D point. Triangulation depends on the solution of the correspondence problem; ambiguous correspondences between points in the two images may lead to several different consistent interpretations of the scene.
Using Zr = Zl = Z and Xr = Xl - T, triangulation gives:

    Z = f T / d

where d = xl - xr is the disparity (i.e., the difference in position between the corresponding points in the two images).
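The relation can be captured in a one-line helper; it assumes consistent units (here, focal length in pixels and baseline in meters, both illustrative values).

```cpp
// Depth from disparity for a rectified, frontal-parallel stereo pair:
// Z = f * T / d, where f is the focal length in pixels, T the baseline,
// and d = xl - xr the disparity in pixels.
double depthFromDisparity(double focalPx, double baseline, double disparity) {
    if (disparity <= 0.0) return -1.0;  // no valid match / point at infinity
    return focalPx * baseline / disparity;
}
```

Note the inverse relationship: a larger disparity means the point is closer to the cameras.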
2.6. Kinematics
Kinematics is a branch of classical mechanics concerned with the motion of bodies regardless of the causes of that motion, such as the acting forces or masses.
Rigid bodies support only rigid transforms, i.e. transforms that maintain the rigid property, such as rotation and translation. An example of a non-rigid transform is the scaling transform.
Joints
There are various kinds of joints, but we focus on 2 joint types:
Ball/spherical joint: a joint that allows free rotation in any direction in 3D space. An example is the human shoulder.
Hinge/revolute joint: a joint that allows rotation in one direction only. An example is the rotation of the human elbow.
Human body joint rotations are constrained: each joint has a range of rotation for each DOF. For example, the hinge joint at the elbow has an approximate range of (30, 180). This means that the maximum relative angle the forearm can make with the upper arm is 180 degrees and the minimum is 30 degrees.
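Enforcing such a range amounts to a simple clamp; the (30, 180) elbow range used when exercising it below is the approximate figure quoted above.

```cpp
#include <algorithm>

// Clamp a hinge-joint angle (in degrees) to its allowed range,
// e.g. the elbow's approximate (30, 180) range.
double clampJointAngle(double angle, double minDeg, double maxDeg) {
    return std::min(maxDeg, std::max(minDeg, angle));
}
```

A reconstructed pose that violates a joint limit can be projected back to the nearest valid configuration this way, one DOF at a time.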
The development of games involves a lot of work that is repetitive and changes very little between games. A middleware called a game engine builds a layer that abstracts much of this repetitive work. The features provided by a game engine include 3D rendering, network handling, and AI; a game engine does not need to implement all of those features. Game engines that focus on rendering features are called graphics engines. Some graphics engines' abstraction of the hardware also allows cross-platform development as well as operation on different drivers. Graphics engines include:
- OGRE
- Realm Forge
- Irrlicht
3.1. Literature Survey

3.2. Market Survey
Having chosen this project, we searched for similar projects with functionality like ours, in order to learn what we could benefit from them, gain insight into our product, and see how such projects evolve. The most famous systems are: PlayStation Eye, Nintendo Wii, and Arena Motion Capture.
Key Features:
- A sophisticated microphone with the ability to reduce background noise and focus on the spoken word for smoother, more accurate speech recognition.
- Engineered to perform in low-light conditions.
- Faster frame rate for improved tracking, responsiveness and smoothness.
- Two-position zoom lens for close-up and full-body options.
- Free EyeCreate editing software, which allows users to save, edit and add cool visual effects to photos, video and audio clips.
EyeCreate also enables users to capture video and audio clips directly to the PS3's hard disk drive and access a range of different capturing modes, including slow motion and time-lapse.
Price:
PlayStation Eye is available for $39.99.
The Wiimote includes an accelerometer to determine the movement's speed and an infrared camera to determine its location relative to the LED lights in a sensor bar; Bluetooth is used to report the buttons that the user pushes on the controller.
Key Features:
- Single-user control: with user-defined timers for capture sessions and real-time 3D view feedback, a single user can be both motion capture operator and actor.
- Simple camera calibration: up to 24 cameras can be calibrated in capture volumes up to 400 square feet (20' x 20'), with capture volume preview.
- Automatic skeletons and marker assignment: generate automatic skeletons and marker assignments with the easy-to-follow Skeleton Wizard.
- Real-time data capture: preview your motion capture data in real time, with full-body rendering.
- Built-in data editor: advanced editing tools let users work with their capture data.
Technical Specifications:
6 FLEX: V100R2 cameras
Price:
The motion capture software with the 6 cameras is available for $5,999.
4.1. Development Tools & Environment
- Microsoft Visual Studio 2008
- Microsoft Visual C++ 2008
- Intel OpenCV 2.0 image processing library
- Minoru 3D web cam
- Irrlicht game and graphics engine
- Havok physics engine
- AQtime profiler (trial version)
- Intel Parallel Studio
4.2. Basic Blocks
The following figure shows the basic blocks and interactions in our framework.
[Figure: framework block diagram. Chessboard images and the left and right video feeds enter the framework; marker extraction and spatial correspondence (which uses the rectification maps and produces a disparity map) yield the markers' estimated 3D positions, from which pose reconstruction produces the reconstructed pose passed to the application.]
[Figure: marker extraction pipeline: capture interface -> color detection -> segmentation -> classification (mapping each marker to a body part).]

Major Components
VM-00-C-00 Color Detection: There are four colors to be detected: red, green, blue and yellow. To be able to control the color saturation and brightness, the RGB image is first converted to HSV.
Each color has a specific range in the hue, saturation, and value channels. No perfect ranges are known; by trial we arrived at the following ranges for our markers, written as (hue min, hue max, saturation min, saturation max, value min, value max). NB: OpenCV uses hues between 0 and 179.
- Blue (0, 30, 140, 255, 140, 255)
- Green (40, 80, 140, 255, 100, 255)
- Yellow (90, 100, 140, 255, 100, 255)
- Red (110, 150, 140, 255, 100, 255)
When color detection finds pixels within these ranges, it adds their positions to a structure and passes it to the segmentation component.
Noise handling: noise in the frame sometimes affects the detected values, so in a noisy environment we use the erode and dilate functions to eliminate it. We preferred, however, to use a white background to eliminate the noise instead of erode and dilate, because they are extremely time consuming.

VM-00-C-01 Segmentation: The segmentation takes as input the position of each marker pixel (a point in 2D), decides which points belong to the same segment, and returns the center position of each segment. Segmentation matters in our project because we use the same color for different body parts, so we need to collect the connected components in order to differentiate between those parts. Example: detecting the green color in an image returns a large number of green points; these points represent markers on the right and left shoulders, and passing them to the segmentation function returns 2 components representing the left and right shoulders.
Noise handling: we faced some noise problems where a marker is sometimes divided into smaller markers due to lighting conditions, so at the end of segmentation we connect segments that are very near each other. Also, to eliminate noise, we ignore any segment whose number of points is smaller than a specific threshold.
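The range test can be sketched as follows. The thresholds are the ones reported above, copied verbatim (including the unusual hue windows the authors found for their particular markers and lighting); the function name is ours.

```cpp
#include <string>

// Classify an HSV pixel against the marker ranges listed above.
// Hue follows OpenCV's 0..179 convention. Returns "none" if no range matches.
std::string classifyMarker(int h, int s, int v) {
    if (s < 140 || v < 100) return "none";            // too dull or too dark
    if (h >= 0   && h <= 30  && v >= 140) return "blue";
    if (h >= 40  && h <= 80)  return "green";
    if (h >= 90  && h <= 100) return "yellow";
    if (h >= 110 && h <= 150) return "red";
    return "none";
}
```

In the real pipeline this test runs per pixel, and the positions of matching pixels are collected into the structure handed to the segmentation component.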
VM-00-C-02 Classification:
The classification takes the center of each marker as input and maps each marker to a body part, based on the position of the marker and knowledge of the human body. Example: in the following figure we classify the green component with minimum x and minimum y as the right shoulder.
VM-00-C-04 Prediction: The problem of marker disappearance exists in our project, so we use a Kalman filter to predict marker positions. The filter is trained from the first frame in our system, and if a marker is missed in any frame the predicted value is used. The Kalman filter takes the previous points as input and predicts the current one. For more details about the Kalman filter, refer to Appendix B.
Figure 14: Depth extraction block diagram (stereo calibration -> calibration parameters -> stereo rectification -> 3D extraction)
Module decomposition

Stereo Calibration (offline): The block diagram in Figure x-y is a high-level description of the initialization process. As shown in the figure, it takes a set of chessboard images captured by the webcam as input and provides the left and right rectification maps needed for stereo correspondence.
Find Chessboard Corners
First of all, we need to calibrate every camera used in the system, because camera calibration is essential for any image processing that must represent the real world with high accuracy. Camera calibration can be done very efficiently by observing a calibration object (usually consisting of two or three planes orthogonal to each other) whose geometry in 3D space is known with very good precision. Calibration needs such an object because its computations require a set of 3D points in the real world along with the 2D points obtained by projecting them onto the image plane. A chessboard is the practical choice: since we know the dimensions of the squares of the grid, we can compute the actual coordinates of every corner in the grid. Figure x shows the chessboard we used in calibration.
Figure x: the chessboard used in calibration

Steps for getting the set of 3D and 2D chessboard corner points:
1. Capture different images of the chessboard with different orientations.
2. Input each captured image to a corner finder function (cvFindChessboardCorners() in OpenCV [x]); this function returns the image coordinates of every corner in the input image.
3. The resulting set of 3D points with their corresponding 2D points is passed to the calibration function.
Stereo Calibration
We perform stereo calibration in order to find the rotation matrix R and translation vector T between the two cameras. Steps for getting the rotation matrix R and translation vector T:
1. Use cvCalibrateCamera2() to find the rotation matrix and translation vector of each camera separately.
2. Plug these parameters into the following equations to find the rotation and translation between the two cameras:

    R = Rr * Rl^T
    T = Tr - R * Tl

where (Rl, Tl) and (Rr, Tr) are the left and right cameras' individual rotation matrices and translation vectors.
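The relation used in step 2 (R = Rr * Rl^T, T = Tr - R * Tl) can be sketched with plain 3x3 matrix arithmetic; no OpenCV types are used and the helper names are ours.

```cpp
#include <array>

using Mat3 = std::array<std::array<double, 3>, 3>;
using Vec3 = std::array<double, 3>;

Mat3 transpose(const Mat3& m) {
    Mat3 t{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) t[i][j] = m[j][i];
    return t;
}

Mat3 mul(const Mat3& a, const Mat3& b) {
    Mat3 c{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k) c[i][j] += a[i][k] * b[k][j];
    return c;
}

Vec3 mulv(const Mat3& m, const Vec3& v) {
    Vec3 r{};
    for (int i = 0; i < 3; ++i)
        for (int k = 0; k < 3; ++k) r[i] += m[i][k] * v[k];
    return r;
}

// Compose the stereo extrinsics from each camera's individual pose:
// R = Rr * Rl^T,  T = Tr - R * Tl.
void stereoExtrinsics(const Mat3& Rl, const Vec3& Tl,
                      const Mat3& Rr, const Vec3& Tr,
                      Mat3& R, Vec3& T) {
    R = mul(Rr, transpose(Rl));
    Vec3 rt = mulv(R, Tl);
    for (int i = 0; i < 3; ++i) T[i] = Tr[i] - rt[i];
}
```

As a sanity check: if both cameras have identity rotations, R stays the identity and T is simply the difference of the two translation vectors.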
Stereo Rectification
For the purposes of our project we treat this part as a black box, since it is purely mathematical and a number of functions exist for doing it. We first used the implementation in OpenCV based on Bouguet's algorithm [x], but it was not very efficient with our system. We then used the cvWarpPerspective() OpenCV function (refer to [x] for more details), which improved the accuracy of rectification.
[Figure x: a pair of rectified images -> cvFindStereoCorrespondenceBM() -> depth map]
As shown in figure x, the function takes as input a pair of left and right rectified images and outputs the disparity map, which holds the disparity value of each point belonging to the visual area in which the views of the two cameras overlap. Once we know the physical coordinates of the cameras in the scene, we can then derive depth measurements from the triangulated disparity measures (explained in section x-y) between the corresponding points in the two camera views. We use the cvFindStereoCorrespondenceBM() OpenCV function, which implements stereo correspondence very effectively.
Reprojection turns a disparity into a 3D point:

    [X Y Z W]^T = Q [x y d 1]^T

where the real-world coordinates are then (X/W, Y/W, Z/W). OpenCV has two functions that do this extraction automatically. The first, cvPerspectiveTransform, operates on an array of points and their associated disparities; the second, cvReprojectImageTo3D, operates on a whole image and its associated disparity map. For our approach we chose the first function, as it is more efficient with respect to time and memory: we do not need the actual coordinates of all pixels of the image, only of specific points (the markers' positions).
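A per-point version of this reprojection can be sketched as follows. It assumes the standard form of the reprojection matrix Q for rectified cameras sharing the same principal point (cx = cx'); the parameter values in the test are illustrative.

```cpp
// Reproject an image point (x, y) with disparity d to 3D, mimicking
// what cvPerspectiveTransform does with the reprojection matrix Q:
//   [X Y Z W]^T = Q [x y d 1]^T, world point = (X/W, Y/W, Z/W).
// cx, cy: principal point; f: focal length in pixels; b: baseline.
void reproject(double x, double y, double d,
               double cx, double cy, double f, double b,
               double& X, double& Y, double& Z) {
    double W = d / b;        // from Q's last row, with Tx = -b and cx = cx'
    X = (x - cx) / W;
    Y = (y - cy) / W;
    Z = f / W;               // reduces to Z = f * b / d, as derived earlier
}
```

A point at the principal point with disparity d thus lands on the optical axis at depth f * b / d, consistent with the triangulation formula in the depth extraction section.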
4.3.3. VM-M-01 Pose Reconstruction
Description: This module is responsible for generating the pose information. It handles reconstruction of the human body pose from the 3D points, and works with a set of constraints during reconstruction to make sure the constructed pose is valid. For performance reasons this module depends on both the articulated body and the segments being tracked in the body.
Interface:
Input:
o The position of the markers in 3D
o The classification of each marker
Output:
o The reconstructed 3D human pose
Flowchart: [constraints -> pose reconstruction -> pose data]
The pose data is presented in two representations: the spherical coordinates of each segment of the human body, and the unit vector along each segment.
Major Components:
VM-02-C-00 Reconstruction: The reconstruction component is the core component of the pose reconstruction module. It uses vector algebra to obtain the orientation of each individual joint; the idea is that each segment vector is approximated by the two markers at the ends of the segment. For performance reasons, the implemented reconstruction module enforces only the constant limb length constraints.
VM-02-C-01 Pose:
Pose is the output of the framework. It is mainly generated by the reconstruction component and stores the information needed about each segment to reconstruct the segment's pose, essentially constructing the human body pose.
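The segment-vector idea described for the reconstruction component can be sketched as a normalization of the vector between a segment's two end markers; the struct and function names here are ours, not the framework's.

```cpp
#include <cmath>

struct Point3 { double x, y, z; };

// Unit vector from the marker at a segment's proximal end to the marker
// at its distal end; this is the per-segment orientation the
// reconstruction component works with.
Point3 segmentDirection(const Point3& from, const Point3& to) {
    Point3 d{to.x - from.x, to.y - from.y, to.z - from.z};
    double len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    if (len > 0) { d.x /= len; d.y /= len; d.z /= len; }
    return d;
}
```

The constant limb length constraint mentioned above amounts to fixing len per segment and keeping only the direction.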
4.4. Virtual ME Application
A general sequence of how Virtual ME is used in an application is shown in the following figure.
Figure 17: Virtual ME application sequence (initialize application -> query Virtual ME, with a processing delay -> release Virtual ME resources -> exit application)
Basically there are 4 major steps to using Virtual ME:
1. Initialization: The first step in using Virtual ME is initialization. The application creates the Virtual ME interface; this class is used to communicate with Virtual ME. Behind the scenes the framework allocates the required resources, most notably the capture interface used to talk to the camera.
2. Query for pose: In this step the application asks the framework for input. It is worth mentioning that there is no buffering feature in Virtual ME: frames can be lost if they are not acquired from the camera before being discarded. There is also usually a delay in the Virtual ME function call, which can cause the application to become unresponsive; hence it is advisable to run the framework in a different thread.
3. Pose returned:
When the framework's processing is done, the result is returned to the application, which can then use it to animate objects, create events, etc.
4. Release resources: This is the last step in using the framework. The framework's resources are deallocated to prevent resource and memory leaks.
Figure 18: Game entity (entity physics and entity graphics)
The update process of the game world is divided into two steps:
1. Update physics.
2. Update graphics to match physics.
Virtual Goal Keeper
[Figure: initialize application -> game loop, which drives the physics engine and the graphics engine]
During the initialization phase the Virtual ME framework is initialized and started in a thread. The physics and graphics worlds are also initialized, which includes initializing the graphics and physics engines. The application then starts the game loop, whose main operations are checking the Virtual ME thread and restarting it when needed, running physics, handling game logic, and updating the displayed world.
Constraints:
- Shoulders cannot be lower than the waist.
- The elbow cannot be lower than the right knee.
- The wrist cannot be lower than the ankle.
- The user cannot turn around.

Table 2: Limitation table
5.
5.2. Conclusion
5.2.1. Methodology Error
The methodology introduces two main types of error: inherent methodology error and interpolation error.
The arm on the left has the ideal marker placement, while the arm on the right has only 3 markers, one of which is misplaced; this leads to a deviation in the orientation vector.
Interpolation Error:
Calling the complete framework sequence for each frame may have a huge impact on performance. To handle this problem, the output can be interpolated between each two successive framework calls, so that the jerky motion from the framework is smoothed across a number of frames and the motion feels smoother. This interpolation, however, is not the actual motion, and changes in the motion between two interpolation frames may be lost.
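Such interpolation can be sketched as a simple linear blend. A real implementation would interpolate every joint, and would use spherical interpolation for orientations; this illustrates the idea for a single 3D position (struct and function names are ours).

```cpp
struct Pos { double x, y, z; };

// Linear interpolation between two successive framework poses;
// t in [0, 1] selects a point between pose a (t = 0) and pose b (t = 1).
Pos lerp(const Pos& a, const Pos& b, double t) {
    return { a.x + (b.x - a.x) * t,
             a.y + (b.y - a.y) * t,
             a.z + (b.z - a.z) * t };
}
```

Between two framework calls the render loop advances t each frame, which smooths the motion at the cost of losing any real motion change that occurred between the two sampled poses.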
Code optimization
Using profilers, a list of the functions that take up a significant percentage of the runtime was compiled. The list was then divided into 3 categories:
- Black-box functions: the external library functions used by the framework; these functions are quite hard to optimize.
- Performance/accuracy functions: functions whose processing depends on some parameter, with accuracy varying inversely with speed.
- Other functions: functions that do not fit into either of the above categories.
We chose not to optimize the black-box functions, since their optimization would take a significant amount of time and is quite risky. The other function categories were the focus of optimization: by inlining some strategic functions and changing the data structures used, a performance increase of --- was achieved.
Threading
In the initial runs the framework calls were putting a significant time penalty on the application render loop, adding a stall and freezing the application whenever the framework was called. To handle this, the logic was modified so that the library call is handled in an independent thread, which significantly reduced the stall in the render loop.
Future Work
5.2.5. Robust marker detection and classifier
A naive approach to detecting the markers and classifying them is used in Virtual ME. This imposed constraints on the player's pose and introduced inaccuracies in the marker tracking process.
REFERENCES
[1] http://homepages.inf.ed.ac.uk/rbf/HIPR2/label.htm
[2] OpenCV book
[3] http://en.wikipedia.org/wiki/Kalman_filter
ARABIC SUMMARY
Appendix A. User Manual
Appendix B. Kalman Filter
The basic idea behind the Kalman filter is that, given a history of measurements of a system, it is possible to build a model for the state of the system that maximizes the a posteriori probability of those previous measurements. What does it mean to maximize the a posteriori probability of those previous measurements? It means that the new model we construct after making a measurement, taking into account both our previous model with its uncertainty and the new measurement with its uncertainty, is the model that has the highest probability of being correct. In this sense, the Kalman filter is the best way to combine data from different sources, or from the same source at different times. [2]
Kalman Equations
The Kalman filter model assumes that the true state at time k evolves from the state at (k-1) according to

    x_k = F_k x_{k-1} + B_k u_k + w_k

where F_k is the state transition model applied to the previous state x_{k-1}; B_k is the control-input model applied to the control vector u_k; and w_k is the process noise, assumed to be drawn from a zero-mean multivariate normal distribution with covariance Q_k. At time k an observation (or measurement) z_k of the true state x_k is made according to

    z_k = H_k x_k + v_k

where H_k is the observation model mapping the true state space into the observed space, and v_k is the observation noise, assumed to be zero-mean Gaussian white noise with covariance R_k. The initial state and the noise vectors at each step {x_0, w_1, ..., w_k, v_1, ..., v_k} are all assumed to be mutually independent. [3]
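A minimal one-dimensional instance of these equations, with F = H = 1 and no control input (B = 0), might look like the following; the variable names are ours, and the state is a single marker coordinate.

```cpp
// Minimal scalar Kalman filter: x_k = x_{k-1} + w_k, z_k = x_k + v_k.
// q and r are the process and measurement noise variances.
struct Kalman1D {
    double x = 0.0;  // state estimate
    double p = 1.0;  // estimate variance
    double q, r;     // process / measurement noise variances
    Kalman1D(double q_, double r_) : q(q_), r(r_) {}
    double predict() {
        p += q;                      // uncertainty grows between measurements
        return x;                    // prediction used when a marker is missed
    }
    void correct(double z) {
        double k = p / (p + r);      // Kalman gain
        x += k * (z - x);            // blend prediction and measurement
        p *= (1.0 - k);              // uncertainty shrinks after a measurement
    }
};
```

When a marker disappears, only predict() is called for that frame and the predicted position is used, which matches how the prediction component described earlier handles missing markers.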
Appendix C. Installation & Operation Manual
Appendix D. Programmer's Manual
Appendix E. Design Outline
Appendix F. Test Cases
1. VM-M-00 Marker Extraction:
Test case | Value
Empty frame | Null frame
All red frame | Red image
No camera attached | Unplug the webcam
Not all markers available | Hide the left hand
A lot of markers available |
Noisy frame |
Table 4: Color detection test cases
3. VM-M-03 Classification
Test case | Value
Empty container | White image
Container with a large number of components | Image with 10 red balls
Container with only one component | a missing marker
Container with a component whose center is at the image boundaries | balls