
OBJECT TRACKING

TSBB13 Computer Vision System

Fredrik Lundell

Andreas Wallin

Andreas Vikman

Wednesday 28th April, 2010

Contents
1 Introduction
  1.1 The dataset
  1.2 Generalization and Assumptions

2 Methods
  2.1 Background modeling
    2.1.1 Median filter
    2.1.2 Gaussian mixture model
  2.2 Foreground segmentation
    2.2.1 Morphological operations
    2.2.2 Segmentation with Graph Cuts
  2.3 Identity
  2.4 Object identity over time

3 Results
  3.1 Detection
    3.1.1 Background model
    3.1.2 Segmentation
  3.2 Tracking
    3.2.1 Identity
    3.2.2 Identity over time

4 Conclusion

References
.1 Code documentation

Chapter 1

Introduction
The aim of this project is to implement an object tracking system for the course Computer Vision Systems at Linköping University. The problem can be broken down into two major components: detection and tracking. Detection involves separating foreground from background, segmenting the result and giving each detected object an identity. Tracking refers to the part where identity is maintained over time. To realise such a system, techniques such as background modeling, graph cuts, contour extraction and temporal filtering are used. Furthermore, one important criterion is that the implemented system performs in real time, or as near real time as possible. To that effect the chosen implementation language is C/C++ and a complementary open source computer vision library, OpenCV 2.0 (http://opencv.willowgarage.com/wiki/), is used.

1.1

The dataset

The datasets used in the project come from the following sources:

- CAVIAR (EC Funded CAVIAR project/IST 2001 37540), http://homepages.inf.ed.ac.uk/rbf/CAVIAR/
- ViSOR (Video Surveillance Online Repository), http://www.openvisor.org

1.2

Generalization and Assumptions

To simplify the task of object tracking a number of assumptions have been made:

- The camera is assumed to be static, i.e. no ego-motion is present; therefore all observed movement comes from the scene and not from the camera itself.
- The background is mostly static. Mostly in this case means that the background consists of simple geometries, such as the inside of a building or a road, which do not change significantly over time (disregarding differing lighting conditions).
- All objects in a scene move at a similar velocity across the scene.

Chapter 2

Methods
2.1 Background modeling

By removing the background from a scene, movement over time can be detected. However, accurately determining what is background and what is foreground in a scene is not a simple task. Noise, differing lighting conditions (a dark vs. a lit room) and false positives (trees moving in the wind) all make the task harder. The simplest method would be to take a single image, assign it as a background reference and then subtract this reference from each frame. This approach, however, only works in scenes where most parameters are known and can be controlled, and is not really suited for most real-world situations. Therefore a more robust way of describing the background is needed.
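As an illustration of this naive reference-subtraction approach (not the method used in the project), a minimal sketch could look as follows; the function and variable names and the threshold value are assumptions:

// Naive background subtraction against a fixed reference image (BGR input assumed).
// Pixels whose absolute difference exceeds a threshold are marked as foreground.
#include "cv.h"

void staticBackgroundSubtraction(const cv::Mat& reference, const cv::Mat& frame,
                                 cv::Mat& foregroundMask, double threshold = 30.0)
{
    cv::Mat diff, gray;
    cv::absdiff(frame, reference, diff);                    // per-pixel absolute difference
    cv::cvtColor(diff, gray, CV_BGR2GRAY);                  // collapse to one channel
    cv::threshold(gray, foregroundMask, threshold, 255, CV_THRESH_BINARY);
}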

2.1.1

Median filter

Assuming that the background will be present more often than the moving objects, the background can be modeled as the temporal median intensity for each pixel [3]. The method can be modified so that the median is only incremented/decremented by a chosen step α for each pixel, if the current pixel intensity is larger/smaller than the median value, as outlined in [5]. This removes the need to explicitly calculate the median for each pixel in every frame. Median filtering is a better approach than a static background image, but a poorly chosen α will either incorporate slow-moving objects into the background or leave trailing ghosts in the foreground. See Figure 3.1 for reference.
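A minimal sketch of this update rule is given below; the loop structure, types and function name are assumptions, and the actual updateMedian implementation in the appendix may differ:

// Approximate median update: the stored estimate is nudged toward the current
// frame by a fixed step alpha for every pixel. Grayscale 8-bit frames are assumed;
// the median is kept as float so that sub-integer steps are possible.
void updateMedianSketch(const cv::Mat& frame, cv::Mat& median, float alpha)
{
    for (int y = 0; y < frame.rows; ++y) {
        for (int x = 0; x < frame.cols; ++x) {
            float pixel = frame.at<uchar>(y, x);
            float& med  = median.at<float>(y, x);
            if (pixel > med)      med += alpha;   // move the estimate up...
            else if (pixel < med) med -= alpha;   // ...or down, never by more than alpha
        }
    }
}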

2.1.2

Gaussian mixture model

An even more appealing approach is to model the background as a set of Gaussian distributions in each pixel, describing the intensities observed at that pixel over time. This approach also lends itself to a multi-modal interpretation of the background, where several such distributions can be considered background. This potentially solves the problem of periodically recurring movement, such as leaves moving in the wind [5].
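A much simplified sketch with a single running Gaussian per pixel is shown below; the full method in [5] keeps several weighted Gaussians per pixel, and all names and constants here are assumptions:

// Simplified per-pixel Gaussian background model. A pixel is background if it lies
// within k standard deviations of the model; the model is only adapted to pixels
// it explains, so foreground objects do not corrupt it immediately.
void updateGaussianModel(const cv::Mat& frame, cv::Mat& mean, cv::Mat& variance,
                         cv::Mat& foregroundMask, float rho = 0.01f, float k = 2.5f)
{
    foregroundMask.create(frame.size(), CV_8UC1);
    for (int y = 0; y < frame.rows; ++y) {
        for (int x = 0; x < frame.cols; ++x) {
            float I    = frame.at<uchar>(y, x);
            float& mu  = mean.at<float>(y, x);
            float& var = variance.at<float>(y, x);
            float d = I - mu;
            bool background = d * d < k * k * var;
            foregroundMask.at<uchar>(y, x) = background ? 0 : 255;
            if (background) {
                mu  += rho * d;                            // running mean update
                var  = (1.0f - rho) * var + rho * d * d;   // running variance update
            }
        }
    }
}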

2.2

Foreground segmentation

Modeling the background is an important step in separating the foreground from the background. However, we need additional segmentation methods to separate objects from noise and unwanted artifacts.

2.2.1

Morphological operations

One approach to image segmentation is to use fundamental morphological operations. Dilation and erosion are basic morphological operations, constructed as logical decisions made as a structuring element is moved over the image. Dilation acts like a union operator and causes objects to grow in size; erosion acts like an intersection operator and causes objects to shrink. They can be used independently or combined to remove noise and artifacts and to isolate individual objects in a local neighborhood.
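For example, an opening (erosion followed by dilation) on the binary foreground mask removes small noise specks while roughly preserving larger objects; the 3x3 structuring element below is an assumption:

// Morphological opening of a binary foreground mask: erosion removes small noise
// blobs, the following dilation restores the size of the objects that survived.
void cleanMask(cv::Mat& foregroundMask)
{
    cv::Mat element = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
    cv::erode(foregroundMask, foregroundMask, element);
    cv::dilate(foregroundMask, foregroundMask, element);
}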

2.2.2

Segmentation with Graph Cuts

Graph cuts [2] is a combination of existing image segmentation methods and simple machine learning techniques. Unlike morphological operations, graph cuts guarantees a global solution and finds an optimal, minimal cut which separates the foreground from the background. A graph is constructed with a node associated with each pixel in the image. Neighboring nodes are connected with edges weighted by some measure of similarity. Each node is also connected to a source and a sink node, which represent the objects and the background respectively. The structure of the graph is illustrated in Figure 2.1.

Figure 2.1: Graph structure

The edge weights between nodes should be tuned such that similar pixels close to each other belong to the same object. In this implementation the similarity measure is based on image intensity. The weight between two nodes is given in equation 2.1, where I denotes the intensity at a node and σ the standard deviation.

W_{ij} = e^{-(I(i) - I(j))^2 / \sigma^2}    (2.1)

The source and sink weights describe how likely it is that a node belongs to the foreground or the background. These weights are given in equations 2.2 and 2.3, where I_F and I_B denote the intensity of the foreground and the background respectively, and I_i the intensity at node i.

W_{is} = (I_F - I_i) / ((I_F - I_i) + (I_B - I_i))    (2.2)

W_{it} = (I_B - I_i) / ((I_F - I_i) + (I_B - I_i))    (2.3)

With this setup the cost of the cut is very high inside objects of interest and low around the border of the objects. The MAXFLOW[1] algorithm calculates the minimal cut which separates the objects from the background.
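A sketch of how such a graph could be assembled with the Boykov-Kolmogorov MAXFLOW library used in the implementation (GraphType is the typedef from the code appendix) is given below. The 4-connectivity and the helper functions weightN, weightS and weightT, assumed to implement equations 2.1-2.3, are assumptions and not taken from the report:

// Assumed helpers (not shown) implementing equations 2.1-2.3:
double weightN(double Ii, double Ij);   // n-link weight, eq. 2.1
double weightS(double Ii);              // source (foreground) weight, eq. 2.2
double weightT(double Ii);              // sink (background) weight, eq. 2.3

// Build a graph over a grayscale image and cut it with MAXFLOW [1].
void segmentWithGraphCut(const cv::Mat& img, cv::Mat& labels)
{
    const int n = img.rows * img.cols;
    GraphType* g = new GraphType(n, 2 * n);       // nodes, upper bound on n-links
    g->add_node(n);

    for (int y = 0; y < img.rows; ++y) {
        for (int x = 0; x < img.cols; ++x) {
            int i = y * img.cols + x;
            uchar I = img.at<uchar>(y, x);
            // t-links: how foreground- and background-like this pixel is.
            g->add_tweights(i, weightS(I), weightT(I));
            // n-links to the right and lower neighbours.
            if (x + 1 < img.cols) {
                double w = weightN(I, img.at<uchar>(y, x + 1));
                g->add_edge(i, i + 1, w, w);
            }
            if (y + 1 < img.rows) {
                double w = weightN(I, img.at<uchar>(y + 1, x));
                g->add_edge(i, i + img.cols, w, w);
            }
        }
    }

    g->maxflow();                                 // computes the minimal cut
    labels.create(img.size(), CV_8UC1);
    for (int y = 0; y < img.rows; ++y)
        for (int x = 0; x < img.cols; ++x)
            labels.at<uchar>(y, x) =
                (g->what_segment(y * img.cols + x) == GraphType::SOURCE) ? 255 : 0;
    delete g;
}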

2.3

Identity

When the foreground has been predicted, the pixels belonging to the foreground are grouped into individual objects and assigned a label. OpenCV provides a function for finding contours in a binary image, cvFindContours. It creates a vector of vectors containing the point coordinates of each contour. The function uses an algorithm described in [4] to detect the contours. To visualize the identity of tracked objects, a rectangular bounding box is rendered around the points provided by cvFindContours.
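A sketch using the C++ counterpart of this OpenCV functionality is shown below; mask and frame are assumed variables holding the binary foreground image and the image to draw on:

// Find contours in a binary foreground mask and draw a bounding box around each one.
// Note that findContours modifies the mask it is given.
void drawIdentities(cv::Mat& mask, cv::Mat& frame)
{
    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(mask, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);
    for (size_t i = 0; i < contours.size(); ++i) {
        cv::Rect box = cv::boundingRect(cv::Mat(contours[i]));
        cv::rectangle(frame, box.tl(), box.br(), cv::Scalar(0, 255, 0), 2);
    }
}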

2.4

Object identity over time

The identity of an object is maintained by calculating the intersection and area overlap of the bounding boxes found in the current and previous frame. If the intersection area is above a certain threshold, the previous identity is used to set the new identity. Rectangles which do not satisfy the condition of area intersection are given a new unique identity.
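A sketch of such a matching step is given below; the fields of the bbox struct (rect, id) and the function name are assumptions, and the overlap threshold corresponds to the 40 % figure in Table 3.1:

// Assign an identity to a new bounding box: it inherits the id of a previous box
// if their intersection covers a large enough fraction of the new box.
int matchIdentity(const cv::Rect& current, const std::vector<bbox>& prevIDs,
                  double minOverlap /* e.g. 0.4 */)
{
    for (size_t i = 0; i < prevIDs.size(); ++i) {
        cv::Rect intersection = current & prevIDs[i].rect;     // overlapping region
        if (intersection.area() >= minOverlap * current.area())
            return prevIDs[i].id;                              // keep previous identity
    }
    return -1;   // no match: the caller assigns a new unique identity
}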

Chapter 3

Results
3.1 Detection

3.1.1 Background model

The background is modeled with the median filtering approach described earlier. A crude but effective ramp-up of the update step α is performed to quickly reach a state where the median background is usable. A large α, e.g. 30, is used as a starting point for the first n, e.g. 20, frames; then, for every n frames, both α and n are halved until α reaches 0.5, where it remains constant. The choice of α depends on the apparent velocity of moving objects in the scene. In cars.wmv from the ViSOR dataset an α of 0.5 is clearly too small at certain times, as can be seen in Figure 3.1.
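A sketch of this ramp-up schedule is shown below; the numbers follow the text (initial α = 30, n = 20, final α = 0.5), but the structure and names are assumptions:

// Ramp-up of the median update step: start with a deliberately large alpha and
// halve both the step and the halving interval until the final value is reached.
float alpha       = 30.0f;   // initial, deliberately too large
int   interval    = 20;      // frames between halvings
int   nextHalving = 20;

void rampUpAlpha(int frameNumber)
{
    if (alpha > 0.5f && frameNumber >= nextHalving) {
        alpha /= 2.0f;
        if (alpha < 0.5f) alpha = 0.5f;     // never go below the final value
        interval /= 2;
        if (interval < 1) interval = 1;
        nextHalving += interval;
    }
}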

Figure 3.1: Left: Absolute difference of median. Right: Result of the median failing to keep the background intact. cars.wmv, ViSOR

3.1.2

Segmentation

The Graph cut method used for foreground and background segmentation proved to be effective but rather slow. Its computational cost made it impossible to perform all calculations in real time. We solved this problem by downsampling the image to half its size and using that information to set up the graph. This made it possible to run the algorithm in almost real time without any major loss in quality. To maintain object solidity we had to force the algorithm to only deal with rough structures. This resulted in better segmentations in most cases, but limited the possibility to detect small moving objects.
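A sketch of the downsampling step could look as follows; DOWN_SAMPLE_RATE (0.5) matches the constant in the code appendix, while runGraphCut is an assumed wrapper around the graph construction and cut described in Section 2.2.2:

// Run the graph cut on a half-size image and scale the resulting mask back up.
void segmentDownsampled(const cv::Mat& frame, cv::Mat& fullSizeMask)
{
    cv::Mat small, segmented;
    cv::resize(frame, small, cv::Size(), DOWN_SAMPLE_RATE, DOWN_SAMPLE_RATE);
    runGraphCut(small, segmented);                 // assumed wrapper (see Section 2.2.2)
    cv::resize(segmented, fullSizeMask, frame.size(), 0, 0, cv::INTER_NEAREST);
}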

Figure 3.2: Left: Absolute difference of median. Right: Result of Graph cut segmentation. cars.wmv, ViSOR

Figure 3.3: Left: Absolute difference of median. Right: Result of Graph cut segmentation. Note that it removes the small object to the left. Walk3.mpg, CAVIAR

3.2 Tracking

3.2.1 Identity

The cvFindContours function in OpenCV works well for identifying the tracked objects. It manages to provide a good prediction of which pixels belong together. The shape of a contour can, however, shift considerably between frames, causing the bounding box to expand and shrink rapidly. Interpolating between the current frame and the previous one reduces this behaviour, but could be improved further by taking more than two frames into consideration.

Median filtering     update step α                0.5
Graph Cuts           downsampling factor          0.5
                     std. deviation σ             100.0
                     ThresholdFG                  40
                     ThresholdBG                  0
Tracking             size of bounding boxes       > 40 px²
                     overlap of bounding boxes    > 40 %

Table 3.1: Constants used in the implementation

3.2.2

Identity over time

Maintaining identity between frames works well if objects are not moving too fast and do not overlap in the scene. If objects move too fast, bounding boxes will not overlap between frames and the identity will then not be preserved. Overlapping objects in the scene will probably be given the same identity.

Figure 3.4: Left: Identity at a previous frame. Right: The identity is maintained over time. Walk3.mpg, CAVIAR

Chapter 4

Conclusion
With rather simple computer vision techniques we have developed a system capable of detecting and tracking objects in several simple cases. However, many further improvements could be made.

Median filtering is a simple approach to background modeling and easy to implement, but it produces errors that could have been avoided using, for instance, Gaussian mixture models. In particular, median filtering tends to leave trailing ghosts in the foreground, which creates false positives.

The Graph cut segmentation algorithm is computationally expensive, which makes it hard to perform foreground segmentation in real time. On the other hand, it produces a far more stable result than, for instance, morphological operations, which need to be manually tuned for each scene.

When determining whether two bounding boxes ought to share the same identity we only take the previous frame into consideration. Averaging over several frames could have been used to predict the most likely position and size of the next bounding box. A more formal approach would be Kalman filtering, which would have given more accurate estimates when maintaining the identity of objects.

Considering the short project time we are pleased with the results and do not think we could have produced a far better result without replacing some of the fundamental algorithms for detection and tracking of objects.


References
[1] Yuri Boykov and Vladimir Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Mach. Intell., 26(9):1124-1137, 2004.

[2] Anders P. Eriksson, Olof Barr, and Kalle Åström. Image segmentation using minimal graph cuts. In Fredrik Georgsson and Niclas Börlin, editors, Proceedings SSBA 2006, pages 45-48, 2006.

[3] Milan Sonka, Vaclav Hlavac, and Roger Boyle. Image Processing, Analysis, and Machine Vision. Thomson, 2008.

[4] Ichiro Suzuki and Tadao Kasami. A distributed mutual exclusion algorithm. ACM Trans. Comput. Syst., 3(4):344-349, 1985.

[5] John Wood. Statistical background models with shadow detection for video based tracking. Master's thesis, Linköping University, 2007.

.1

Code documentation

/* tracking.h */
#ifndef TRACKING_HPP_
#define TRACKING_HPP_

#include <iostream>
#include <vector>
#include <string>
#include "graph.h"
#include "cv.h"
#include "highgui.h"
#include "cxcore.h"
#include "bBox.hpp"

typedef Graph<double, double, double> GraphType;

#define MEDIAN_FINAL_ALPHA   0.5
#define SIGMA_GC             100.0
#define CONTOUR_MIN_OVERLAP  0.4
#define CONTOUR_MIN_DISTANCE 150
#define CONTOUR_MIN_AREA     40
#define DOWN_SAMPLE_RATE     0.5   // Graph cut downsampling factor.

using namespace std;

/* Class Tracking. */
class Tracking
{
public:
    /* Constructor. Input: cv::Mat image; sets up the graph cut graph and the
       median background modeling data. */
    Tracking(cv::Mat& input);

    /* Initial setup of the median background model. */
    void medianSetup(const cv::Mat& input);

    /* Updates the median background model. Called from process(). */
    void updateMedian(cv::Mat& input);

    /* Initial setup of graph cut graph nodes and edges. */
    void graphCutSetup(cv::Mat& input);

    /* Updates the graph cut graph. Called from process(). */
    void updateGraph(const cv::Mat& input, cv::Mat& output);

    /* Constructs an image from the data in the graph cut graph. */
    void buildGraphImage(const cv::Mat& input, cv::Mat& output);

    // void morphSegment(const cv::Mat& input, cv::Mat& output);

    /* Finds and creates point lists of all contours found in an image. */
    void createContours(cv::Mat& input);

    /* Called each time a new frame needs to be processed. */
    void process(cv::Mat& frame, int fps);

    /* Updates the drawn bounding boxes in the final stage. */
    void updateBox();

    /* Updates ids of bounding boxes found in each frame. */
    void updateIDs(cv::Mat& output);

private:
    /* Stores the number of processed frames. */
    double frame;

    /* Container of the median background data. */
    cv::Mat median;

    /* The absolute difference of the current frame and the median. */
    cv::Mat absDiffMedian;

    /* The image generated by the graph cut graph. */
    cv::Mat graphImage;

    /* Storage container for contours. */
    cv::MemStorage storage;

    /* Pointer to graph cut graph. */
    GraphType* g;

    /* List of the previous frame's bounding boxes. */
    vector<bbox> prevIDs;

    /* List of the current frame's bounding boxes. */
    vector<bbox> newIDs;

    /* List of the "unique" bounding boxes found. */
    vector<bbox> finalIDs;
};

#endif /* TRACKING_HPP_ */
