
Detection and Tracking of Moving Objects from a Moving Platform

Gérard Medioni
Institute of Robotics and Intelligent Systems
Computer Science Department
Viterbi School of Engineering
University of Southern California

Problem Definition
Scenario: rigidly moving objects + moving camera

Goal
Motion segmentation: motion regions / background area
Tracking of multiple objects: consistent track(s) over time
Geo-registration and Geo-tracking: Geo-referenced mosaic and tracks

Scenario example 1 - moving cameras

[Figure: pipeline diagram with stages Moving cameras, Image stabilization, Motion segmentation, Tracking; outputs Mosaic and Mosaic + Tracks]

Scenario example 2 - moving cameras with a map

[Figure: pipeline diagram with inputs Moving camera and Map; stages Image stabilization, Geo-registration, Global data association, Motion segmentation, Tracking; outputs Geo-referenced mosaic, Geo-referenced tracks, and Geo-referenced mosaic + tracks]

Challenges & Applications


Information sources
Pixel colors + 2D coordinates
Object model information (optional)

Difficulties
Camera motion
3D Static structures (parallax)
Multiple moving objects

Applications
Video surveillance
Video compression and indexing

Outline
Introduction
2D Motion segmentation
Tracking of multiple moving objects
Geo-registration and geo-tracking
Summary and Discussion

Motion Segmentation Overview


Task: segment each frame into motion regions and background

Assumptions
General camera motion
Distant scene
Textured background

Feature Extraction & Matching


Salient parts of the scene
Extraction
Harris corners
Multi-scale
Multi-orientation
Sub-pixel accuracy

Matching
Small inter-frame motion
Gray-scale windows
Cross correlation

Large viewpoint change


Gradient histogram
Vector angle
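
The cross-correlation matching of gray-scale windows listed above can be sketched as follows. This is a minimal illustration, not the implementation used in the talk: the function names, the window half-size `half`, and the assumption that candidate corners are already available and lie away from the image border are all hypothetical.

```python
import numpy as np

def ncc(window_a, window_b):
    """Normalized cross-correlation between two equally sized gray-scale windows."""
    a = window_a.astype(float) - window_a.mean()
    b = window_b.astype(float) - window_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    # A flat window has no texture to correlate; score it as 0.
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def best_match(frame_a, frame_b, corner, candidates, half=3):
    """Pick the candidate (row, col) in frame_b whose window best matches the
    window around `corner` in frame_a. Assumes all windows fit in the image."""
    def window(img, rc):
        r, c = rc
        return img[r - half:r + half + 1, c - half:c + half + 1]

    ref = window(frame_a, corner)
    scores = [ncc(ref, window(frame_b, rc)) for rc in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]
```

Because the score is normalized by the mean and energy of each window, it is unaffected by a global gain or offset in brightness between the two frames.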

Multiple Image Registration


Frame motion model
Assumptions:
Small inter-frame motion
Distant planar scene

2D affine transform

Robust estimation
Random Sample Consensus (RANSAC)
Keep the model with the largest number of inliers
Non-linear refinement over the inliers

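\[
A\,p_1 = p_2, \qquad
\begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} u_1 \\ v_1 \\ 1 \end{bmatrix}
=
\begin{bmatrix} u_2 \\ v_2 \\ 1 \end{bmatrix}
\]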
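
A rough sketch of RANSAC estimation of the 2D affine frame motion model, assuming matched feature coordinates are already available as NumPy arrays. The iteration count, the inlier tolerance, and the function names are illustrative assumptions. The final step is a linear least-squares refit over the inliers, used here as a simple stand-in for the non-linear refinement named on the slide.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform (3x3, last row 0 0 1) mapping src onto dst."""
    X = np.hstack([src, np.ones((src.shape[0], 1))])      # homogeneous source points
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)      # shape (3, 2)
    A = np.eye(3)
    A[:2, :] = params.T
    return A

def ransac_affine(src, dst, n_iters=200, inlier_tol=1.5, seed=0):
    """Keep the affine model with the most inliers, then refit on those inliers.
    n_iters and inlier_tol (pixels) are assumed values, not taken from the talk."""
    rng = np.random.default_rng(seed)
    src_h = np.hstack([src, np.ones((src.shape[0], 1))])
    best_inliers = np.zeros(src.shape[0], dtype=bool)
    for _ in range(n_iters):
        sample = rng.choice(src.shape[0], size=3, replace=False)  # minimal sample
        A = fit_affine(src[sample], dst[sample])
        residual = np.linalg.norm((src_h @ A.T)[:, :2] - dst, axis=1)
        inliers = residual < inlier_tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers
```

Three correspondences are the minimal sample for an affine model, which keeps the chance of drawing an outlier-free sample high even when many matches fall on moving objects.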

Motion Segmentation
Two-frame pixel-level segmentation?
Segmentation within a temporal window
Accumulate the pixels warped from adjacent frames
K-Means to find the most representative pixel
Frame differencing and thresholding: $|I_{original} - I_{model}| > T_I$, with $T_I$ an intensity threshold

[Figure: temporal window of frames t-w, ..., t, ..., t+w]
t: reference frame
w: half size of the window
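
The temporal-window segmentation above can be sketched as follows, assuming the frames in the window have already been warped into the reference frame. The per-pixel clustering here is a two-cluster K-means initialized at the temporal minimum and maximum, with the centroid of the larger cluster taken as the representative background value. The choice of two clusters, the iteration count, and the threshold value are assumptions for illustration only.

```python
import numpy as np

def background_model(stack, iters=5):
    """stack: (T, H, W) float array of frames warped to the reference frame.
    Returns the per-pixel centroid of the larger of two temporal clusters."""
    lo = stack.min(axis=0)
    hi = stack.max(axis=0)
    for _ in range(iters):
        near_hi = np.abs(stack - hi) < np.abs(stack - lo)   # cluster assignment
        n_hi = near_hi.sum(axis=0)
        n_lo = stack.shape[0] - n_hi
        sum_hi = (stack * near_hi).sum(axis=0)
        sum_lo = stack.sum(axis=0) - sum_hi
        # Update centroids; keep the old value when a cluster is empty.
        hi = np.where(n_hi > 0, sum_hi / np.maximum(n_hi, 1), hi)
        lo = np.where(n_lo > 0, sum_lo / np.maximum(n_lo, 1), lo)
    return np.where(n_hi > n_lo, hi, lo)

def motion_mask(reference, stack, threshold=30.0):
    """Pixels of the reference frame that differ from the model by more than threshold."""
    return np.abs(reference.astype(float) - background_model(stack)) > threshold
```

A pixel that is briefly covered by a moving object ends up in the smaller cluster, so the model keeps the background value and the differencing step flags the object.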

Experimental Results (1)

[Figure rows: original images; motion probability maps; initial detection results; tracking results]

Experimental Results (2)

[Figure rows: original images; motion probability maps; initial detection results; tracking results]

Experimental Results (3)

A synthesized video without motion regions


Problem statement - multiple target tracking


Input: foreground regions in each frame
Output: trajectories with consistent track IDs
Challenges:
Noisy foreground regions
Occlusions

Problematic underlying assumption


One-to-one assumption
One target can correspond to at most one observation
One observation can be associated to at most one target
Appropriate for point-like (punctual) observations

The one-to-one assumption may not hold for visual tracking

[Figure: example observations from radar, a UAV camera, and a stationary camera]

Related work

One-to-one assumption
(Pasula et al., 99) Gibbs sampling to compute joint data association (DA)
  MAP, multi-scan, uniform prior (no missing or false detection)
(Cong et al., 04) Approximate association probabilities in JPDAF
  MMSE, MCMC outperforms JPDAF, one-scan/multi-scan
(Sastry et al., 04) MCMC to compute joint DA with unknown number of targets
  MAP, multi-scan, outperforms MHT, considers temporal association only
(F. Dellaert et al., 03) MCMC to SfM without correspondence
  MMSE, single scan, similar to JPDAF

Our method: overcome the one-to-one assumption
  MAP, multi-scan, considers both spatial and temporal association

Anatomy of the problem


Explaining foreground regions:
Hard from a single frame without any model information
Solvable when smoothness in motion and appearance is exploited

Explanation of foreground regions


Two ways to explain foreground regions: precisely or approximately

Labeling of foreground regions (precise)
The label(s) of a pixel indicate the track ID
Each pixel can have multiple labels to represent occlusions
Accurate but expensive!

Cover of foreground regions (approximate)
A set of shapes (rectangles)
Each rectangle can overlap with others to represent occlusions
Approximate but efficient!

Our formulation
Given
A set of noisy observations $Y$ (foreground regions)

Find
A cover $\omega$ of the foreground regions over time

Each track $\tau_k$ in the cover is a sequence of shapes (rectangles)

Solution space
The solution space is the collection of spatio-temporal covers of the observations $Y$.
Joint association event

\[
\omega = \{\tau_1, \tau_2, \ldots, \tau_K\}
\]

Two kinds of data association


Spatial data association - change the cover at one instant
Temporal data association - form consistent tracks

Uncovered area belongs to false alarms

(a) Observations Y

(b) One possible cover of Y
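
One way to picture a cover in code is sketched below, under assumptions of my own: a track is stored as a dictionary from frame index to a rectangle `(top, left, bottom, right)` with half-open bounds, and a cover is a list of such tracks. Foreground pixels left outside every rectangle at a given frame are counted as false alarms, as stated above. None of these names come from the talk.

```python
import numpy as np

# A hypothetical cover with two tracks; rectangles may overlap to represent occlusion.
cover = [
    {0: (2, 2, 6, 6), 1: (2, 3, 6, 7)},   # track 1, present in frames 0 and 1
    {1: (4, 5, 9, 9)},                    # track 2, present in frame 1 only
]

def rects_at(cover, t):
    """Rectangles of all tracks that are present at frame t."""
    return [track[t] for track in cover if t in track]

def uncovered_area(foreground, rects):
    """Number of foreground pixels not covered by any rectangle (false-alarm area)."""
    covered = np.zeros(foreground.shape, dtype=bool)
    for top, left, bottom, right in rects:
        covered[top:bottom, left:right] = True
    return int((foreground & ~covered).sum())
```

Storing rectangles rather than per-pixel labels is what makes the cover approximate but cheap: an occlusion is simply two rectangles that intersect.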

Bayesian formulation
MAP estimate

\[
\omega^* = \arg\max_{\omega}\, p(\omega \mid Y), \qquad p(\omega \mid Y) \propto p(Y \mid \omega)\, p(\omega)
\]

Prior model $p(\omega)$
A small number of long tracks
A track should overlap other tracks as little as possible unless necessary

\[
p(\omega) = p(L)\, p(K)\, p(O)
\]

Likelihood $p(Y \mid \omega)$
Smoothness in both motion and appearance
Area of uncovered false alarms, $p(F)$

\[
p(Y \mid \omega) = p(F) \prod_{k=1}^{K} \prod_{i=1}^{|\tau_k|-1} L\big(\tau_k(t_{i+1}) \mid \tau_k(t_i)\big)
\]

$L$ combines a motion likelihood and an appearance likelihood

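Motion and appearance likelihood

Motion

\[
x^k_{t+1} = A^k x^k_t + w, \qquad y^k_t = H x^k_t + v, \qquad w \sim N(0, Q), \quad v \sim N(0, R)
\]

\[
L_M\big(\tau_k(t_{i+1}) \mid \tau_k(t_i)\big) \propto p\big(\tau_k(t_{i+1}) \mid \hat{\tau}_k(t_i)\big)
\]

with $\hat{\tau}_k(t_i)$ the filtered state estimate at $t_i$

Appearance

\[
L_A\big(\tau_k(t_{i+1}) \mid \tau_k(t_i)\big) = \frac{1}{z_3} \exp\big(-\lambda_3\, D(\tau_k(t_i), \tau_k(t_{i+1}))\big)
\]

$D(\tau_k(t_i), \tau_k(t_{i+1}))$: Kullback-Leibler (KL) distance between two RGB color histograms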
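
The two likelihood terms can be sketched as below. This is only an illustration under stated assumptions: the appearance term uses a one-directional KL divergence between smoothed joint RGB histograms with an assumed bin count, the names `lam` and `z` stand in for the weight and normalizer of the appearance likelihood, and the motion term is the Gaussian density of an observation under a one-step Kalman prediction from the linear model with noise covariances Q and R.

```python
import numpy as np

def rgb_histogram(patch, bins=8):
    """Normalized joint RGB histogram of an (H, W, 3) uint8 patch."""
    hist, _ = np.histogramdd(patch.reshape(-1, 3).astype(float),
                             bins=(bins, bins, bins), range=((0, 256),) * 3)
    hist = hist.ravel() + 1e-6            # smoothing avoids log(0) and division by 0
    return hist / hist.sum()

def kl_distance(p, q):
    """KL divergence D(p || q); the direction is an assumption."""
    return float(np.sum(p * np.log(p / q)))

def appearance_likelihood(patch_prev, patch_next, lam=1.0, z=1.0):
    """(1/z) * exp(-lam * D) between the histograms of two patches."""
    d = kl_distance(rgb_histogram(patch_prev), rgb_histogram(patch_next))
    return float(np.exp(-lam * d) / z)

def motion_likelihood(x, P, y, A, H, Q, R):
    """Density of observation y given state estimate (x, P) one step earlier."""
    x_pred = A @ x                        # predicted state
    P_pred = A @ P @ A.T + Q              # predicted state covariance
    S = H @ P_pred @ H.T + R              # innovation covariance
    r = y - H @ x_pred                    # innovation
    norm = np.sqrt((2 * np.pi) ** len(y) * np.linalg.det(S))
    return float(np.exp(-0.5 * r @ np.linalg.solve(S, r)) / norm)
```

Identical patches give a KL distance of zero and therefore the maximum appearance likelihood, while an observation far from the predicted position is penalized by the Gaussian tail of the motion term.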

MAP of full posterior $p(\omega \mid Y)$

Finding the MAP estimate of such a posterior is not trivial
Even determining the parameters of such a posterior is not easy

\[
p(\omega \mid Y) \propto \exp\{-C_0 S_{len} - C_1 K - C_2 F - C_3 S_{olp} - C_4 S_{app} - S_{mot}\}
\]

MAP estimation is equivalent to minimizing an energy function.

Solution to MAP:
Sampling-based method to avoid enumerating all possible solutions
Two types of proposal moves (temporal and spatial moves)
Symmetric temporal information

Markov Chain Monte Carlo

Basic idea: construct a Markov chain that converges to the target distribution
The state of the Markov chain is defined in the solution space $\Omega$
Transitions of the Markov chain are guided by a proposal distribution

Metropolis-Hastings algorithm
Propose a new state $\omega'$ from the previous state $\omega^{(i)}$

\[
\omega' \sim q(\omega' \mid \omega^{(i)})
\]

Accept with probability

\[
\min\left\{1,\; \frac{p(\omega')\, q(\omega^{(i)} \mid \omega')}{p(\omega^{(i)})\, q(\omega' \mid \omega^{(i)})}\right\}
\]

Properties
No need to compute the global $p(\omega)$, only the local ratio $p(\omega') / p(\omega^{(i)})$
For MAP, no need to keep the whole chain, only the current state and the best one

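Metropolis-Hastings algorithm
1. Initialize $\omega^{(0)}$.
2. For $i = 0$ to $N-1$ ($N$ is the length of the Markov chain)
   - Sample $u \sim U[0,1]$
   - Propose $\omega' \sim q(\omega' \mid \omega^{(i)})$ ($q(\cdot)$ is called the proposal distribution)
   - Compute $A(\omega^{(i)}, \omega') = \min\left\{1,\; \dfrac{p(\omega')\, q(\omega^{(i)} \mid \omega')}{p(\omega^{(i)})\, q(\omega' \mid \omega^{(i)})}\right\}$
   - If $u < A(\omega^{(i)}, \omega')$: $\omega^{(i+1)} = \omega'$, else: $\omega^{(i+1)} = \omega^{(i)}$
   Endfor

The chain $\{\omega^{(0)}, \ldots, \omega^{(N)}\}$ converges to $p(\omega)$ as $N \to \infty$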
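
The algorithm above can be sketched generically in Python. The version below works in log space and keeps only the current state and the best state seen, which is all a MAP search needs. The toy target (an energy minimum at 13 on the integers 0 to 20) and the symmetric random-walk proposal are illustrative assumptions, not the tracker's state space or moves.

```python
import math
import random

def metropolis_hastings_map(log_p, propose, log_q, state, n_iters, rng):
    """Run a Metropolis-Hastings chain and return the best state visited.
    log_p(s): unnormalized log target; propose(s, rng): candidate state;
    log_q(new, old): log proposal density of `new` given `old`."""
    current_score = log_p(state)
    best, best_score = state, current_score
    for _ in range(n_iters):
        u = rng.random()
        candidate = propose(state, rng)
        cand_score = log_p(candidate)
        # Log of p(w') q(w | w') / (p(w) q(w' | w)); only the ratio is needed.
        log_ratio = (cand_score + log_q(state, candidate)
                     - current_score - log_q(candidate, state))
        if u < math.exp(min(0.0, log_ratio)):
            state, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = state, current_score
    return best, best_score

# Toy problem (hypothetical): energy (x - 13)^2 / 2 on the integers 0..20.
def toy_log_p(x):
    return -((x - 13) ** 2) / 2.0 if 0 <= x <= 20 else -math.inf

def toy_propose(x, rng):
    return x + rng.choice((-1, 1))

def toy_log_q(new, old):
    return math.log(0.5)   # symmetric random walk
```

Calling `metropolis_hastings_map(toy_log_p, toy_propose, toy_log_q, 0, 2000, random.Random(0))` starts far from the mode and climbs to it, since uphill proposals are always accepted.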

Two types of $q(\omega' \mid \omega)$
Temporal moves and spatial moves to drive the Markov chain
Data-driven proposal

\[
q(\omega' \mid \omega) \rightarrow q(\omega' \mid \omega, D)
\]

Spatial moves are made only after enough temporal information is collected

Temporal Moves
Birth/Death
Extension/Reduction
Split/Merge
Switch

Symmetric temporal information
Forward and backward (e.g. extension)
Deal with occlusions at the very beginning

Spatial Moves
Segmentation/Aggregation
Diffusion

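MCMC Data Association

1. Initialize $\omega^{(0)}$.
2. For $i = 0$ to $N-1$
   - Sample $u \sim U[0,1]$
   - If $i < \alpha N$, sample $\omega' \sim q_{Temporal}(\omega' \mid \omega^{(i)})$; else sample $\omega' \sim q_{All}(\omega' \mid \omega^{(i)})$, where $\alpha \in (0,1)$ is the fraction of the chain restricted to temporal moves
   - Compute $A(\omega^{(i)}, \omega') = \min\left\{1,\; \dfrac{p(\omega')\, q(\omega^{(i)} \mid \omega')}{p(\omega^{(i)})\, q(\omega' \mid \omega^{(i)})}\right\}$
   - If $u < A(\omega^{(i)}, \omega')$: $\omega^{(i+1)} = \omega'$, else: $\omega^{(i+1)} = \omega^{(i)}$
   Endfor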
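
The two-phase proposal schedule can be sketched as a small selector: early iterations draw only temporal moves, later ones draw from all moves. The move names follow the slides, but the uniform choice among moves and the parameter name `temporal_fraction` are assumptions made for illustration.

```python
import random

TEMPORAL_MOVES = ["birth", "death", "extension", "reduction", "split", "merge", "switch"]
SPATIAL_MOVES = ["segmentation", "aggregation", "diffusion"]

def pick_move(i, n_iters, temporal_fraction, rng):
    """Choose a move type for iteration i of a chain of length n_iters.
    Only temporal moves are allowed during the first temporal_fraction of the chain."""
    if i < temporal_fraction * n_iters:
        return rng.choice(TEMPORAL_MOVES)
    return rng.choice(TEMPORAL_MOVES + SPATIAL_MOVES)
```

Delaying the spatial moves reflects the point made earlier that they are only useful once enough temporal information has been collected.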

Determining Parameters
Determine the parameters in the full posterior
A casual setting can make the ground-truth posterior $p(\omega_{gt} \mid Y)$ much lower than that of the solution found.
Take advantage of the property of MCMC

\[
p(\omega \mid Y) \propto \exp\{-C_0 S_{len} - C_1 K - C_2 F - C_3 S_{olp} - C_4 S_{app} - S_{mot}\}
\]

Degenerate the ground truth $\omega_{gt}$ to $\omega'$ and require

\[
\frac{p(\omega_{gt})}{p(\omega')} \geq 1
\]

\[
A\,[C_0, C_1, C_2, C_3, C_4]^T \leq b, \qquad C_0, C_1, C_2, C_3, C_4 \geq 0, \qquad \max\,(C_0 + C_1 + C_2 + C_3 + C_4)
\]

Linear Programming to solve it (GNU Linear Programming Kit)
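
As a worked step, here is one way a single ground-truth comparison could turn into one row of the linear program. It assumes the posterior has the exponential form $\exp\{-C_0 S_{len} - C_1 K - C_2 F - C_3 S_{olp} - C_4 S_{app} - S_{mot}\}$ with every term entering with a negative sign, and it uses $S_0, \ldots, S_4$ as shorthand for $S_{len}, K, F, S_{olp}, S_{app}$; both the sign convention and the shorthand are assumptions.

```latex
% Require the ground truth to score at least as well as a perturbed cover \omega':
\frac{p(\omega_{gt} \mid Y)}{p(\omega' \mid Y)} \geq 1
\;\Longleftrightarrow\;
-\sum_{j=0}^{4} C_j\, S_j(\omega_{gt}) - S_{mot}(\omega_{gt})
\;\geq\;
-\sum_{j=0}^{4} C_j\, S_j(\omega') - S_{mot}(\omega')

% Collect the unknown weights on the left-hand side:
\sum_{j=0}^{4} C_j \big[ S_j(\omega_{gt}) - S_j(\omega') \big]
\;\leq\;
S_{mot}(\omega') - S_{mot}(\omega_{gt})
```

Under these assumptions each perturbed cover $\omega'$ contributes one row of $A$ (the bracketed differences) and one entry of $b$ (the right-hand side), so the normalizing constant of the posterior never needs to be computed.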

Simulation experiments
Settings

K (unknown number) moving discs in 200x200


Independent color appearance and motion
Static occlusion and inter-occlusion
False alarms

[Figure: original video and tracking result]

Simulation experiments
Quantitative comparison

MHT (I. Cox, 94), JPDAF (J. Kang, 03), Temporal only

STDA score in VACE-II eval
Same motion and appearance likelihood
Averaged over multiple sequences and multiple runs

[Plots: FA=0, W=50, 10K MCMC iterations; K=5, W=50, 10K MCMC iterations]

Simulation experiments
Online implementation
Sliding window W
Initialize $\omega_t$ with $\omega^*_{t-1}$

Online vs. offline comparison T=1000

Real Scenarios

Experiments

CLEAR 320x240

Vivid-II 320x240

Experiments
Can handle occlusion at the beginning by using symmetric
temporal information


Geo-registration
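Use 2D homography to compensate inter-frame (2-view) motion

\[
H_{i+1,M} = (H_{i,i+1})^{-1}\, H_{i,M}\, H_{update}
\]

Refine the homography between map and images

[Figure: homographies $H_{i,i+1}$, $H_{i,M}$, $H_{i+1,M}$ and $H_{update}$ relating frames $i$, $i+1$ and the map $M$]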
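
A minimal sketch of propagating the frame-to-map homography from one frame to the next. It assumes the column-vector convention, in which a homography named "a to b" maps homogeneous points of frame a into frame b, so the order of the factors below follows that convention and may differ from the slide's notation. The function and argument names are hypothetical.

```python
import numpy as np

def apply_h(H, pt):
    """Apply a 3x3 homography to a 2D point (column-vector convention)."""
    v = H @ np.array([pt[0], pt[1], 1.0])
    return v[:2] / v[2]

def propagate_to_map(H_i_to_map, H_i_to_next, H_update=None):
    """Homography from frame i+1 to the map, built from the frame-i-to-map
    homography and the inter-frame homography (frame i to frame i+1).
    H_update, if given, is a refinement applied in map coordinates."""
    H = H_i_to_map @ np.linalg.inv(H_i_to_next)   # frame i+1 -> frame i -> map
    if H_update is not None:
        H = H_update @ H
    return H / H[2, 2]                            # fix the projective scale
```

Chaining inter-frame homographies alone accumulates drift, which is why a periodic refinement against the map is applied.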

Geo-registration results

Geo-mosaicing 2000 frames on top of the reference frame.

Experimental results
Results are shown on two UAV data sets
Map is acquired from Google Earth
Geo-registration is performed every 50 frames
Local data association (MCMCDA) window: 50 frames

[Figure: geo-registration without geo-refinement vs. with geo-refinement]


System implementation
C++ implementation
Xeon Dual Core P4 3.0GHz
Preliminary time performance

Procedure                                     Time (seconds) on 320x240
Image registration                            ~0.25
Motion detection (moving cameras)             ~2 (CPU) / ~0.1 (GPU)
Object detection after motion segmentation    ~0.25
Geo-registration                              ~6 every 50 frames
Tracking                                      ~0.4
Total                                         ~1 (GPU)


Summary & Discussion


Detection and tracking in dynamic scenes
Moving camera + rigid moving objects

2D motion segmentation and geometric analysis of background
Spatial and temporal (2D+t) data association of moving objects
Tracking with geo-registration

Highlights
Solutions to practical problems in the detection and tracking area
Encouraging results and extensive applications

Future directions
Multi-view geometry + object recognition
Automatic determination of applicable tasks

References

Qian Yu and Gérard Medioni, "A GPU-based Implementation of Motion Detection from a Moving Platform," to appear in IEEE Workshop on Computer Vision on GPU, in conjunction with CVPR'08

Qian Yu and Gérard Medioni, "Integrated Detection and Tracking for Multiple Moving Objects using Data-Driven MCMC Data Association," IEEE Workshop on Motion and Video Computing (WMVC'08), 2008

Qian Yu, Gérard Medioni, and Isaac Cohen, "Multiple Target Tracking Using Spatio-Temporal Monte Carlo Markov Chain Data Association," IEEE Conference on Computer Vision and Pattern Recognition (CVPR'07), 2007, pp. 1-8

Qian Yu and Gérard Medioni, "Map-Enhanced Detection and Tracking from a Moving Platform with Local and Global Data Association," IEEE Workshop on Motion and Video Computing (WMVC'07), 2007

Yuping Lin, Qian Yu, and Gérard Medioni, "Map-Enhanced UAV Image Sequence Registration," Workshop on Applications of Computer Vision (WACV'07), 2007

Qian Yu, Isaac Cohen, Gérard Medioni, and Bo Wu, "Boosted Markov Chain Monte Carlo Data Association for Multiple Target Detection and Tracking," Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), Vol. 2, pp. 675-678

Q&A

Thank you!