Hierarchical Model-Based Motion Estimation
James R. Bergen, P. Anandan, Keith J. Hanna, and Rajesh Hingorani
David Sarnoff Research Center, Princeton NJ 08544, USA
Abstract. This paper describes a hierarchical estimation framework for
the computation of diverse representations of motion information. The key
features of the resulting framework (or family of algorithms) are a global
model that constrains the overall structure of the motion estimated, a local
model that is used in the estimation process, and a coarse-fine refinement
strategy. Four specific motion models (affine flow, planar surface flow, rigid
body motion, and general optical flow) are described along with their
application to specific examples.
1 Introduction
A large body of work in computer vision over the last 10 or 15 years has been
concerned with the extraction of motion information from image sequences. The motivation
of this work is actually quite diverse, with intended applications ranging from data
compression to pattern recognition (alignment strategies) to robotics and vehicle navigation.
In tandem with this diversity of motivation is a diversity of representation of motion
information: from optical flow, to affine or other parametric transformations, to 3-d
egomotion plus range or other structure. The purpose of this paper is to describe a common
framework within which all of these computations can be represented.
This unification is possible because all of these problems can be viewed from the
perspective of image registration. That is, given an image sequence, compute a
representation of motion that best aligns pixels in one frame of the sequence with those in
the next. The differences among the various approaches mentioned above can then be
expressed as different parametric representations of the alignment process. In all cases
the function minimized is the same; the difference lies in the fact that it is minimized
with respect to different parameters.
The key features of the resulting framework (or family of algorithms) are a global
model that constrains the overall structure of the motion estimated, a local model that is
used in the estimation process¹, and a coarse-fine refinement strategy. An example of a
global model is the rigidity constraint; an example of a local model is that displacement
is constant over a patch. Coarse-fine refinement or hierarchical estimation is included in
this framework for reasons that go well beyond the conventional ones of computational
efficiency. Its utility derives from the nature of the objective function common to the
various motion models.
1.1 Hierarchical estimation

Hierarchical approaches have been used by various researchers (e.g., see [2, 10, 11, 22, 19]).
More recently, a theoretical analysis of hierarchical motion estimation was described in
[8] and the advantages of using parametric models within such a framework have also
been discussed in [5].
¹ Because this model will be used in a multiresolution data structure, it is "local" in a slightly
unconventional sense that will be discussed below.
Arguments for use of hierarchical (i.e. pyramid based) estimation techniques for
motion estimation have usually focused on issues of computational efficiency. A matching
process that must accommodate large displacements can be very expensive to compute.
Simple intuition suggests that if large displacements can be computed using low
resolution image information great savings in computation will be achieved. Higher resolution
information can then be used to improve the accuracy of displacement estimation by
incrementally estimating small displacements (see, for example, [2]). However, it can also
be argued that it is not only efficient to ignore high resolution image information when
computing large displacements, in a sense it is necessary to do so. This is because of
aliasing of high spatial frequency components undergoing large motion. Aliasing is the
source of false matches in correspondence solutions or (equivalently) local minima in the
objective function used for minimization. Minimization or matching in a multiresolution
framework helps to eliminate problems of this type. Another way of expressing this is
to say that many sources of non-convexity that complicate the matching process are not
stable with respect to scale.
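The aliasing argument can be illustrated with a small one-dimensional experiment. The sketch below is not from the paper: it assumes periodic sinusoidal "frames", integer displacements, and hypothetical helper names (`ssd_over_shifts`, `zero_error_shifts`). It lists the candidate shifts at which the SSD error vanishes for a high-frequency and a low-frequency signal that both moved by 6 samples.

```python
import numpy as np

def ssd_over_shifts(frame0, frame1, max_shift):
    """SSD between frame1 and frame0 displaced by each candidate integer shift."""
    shifts = np.arange(-max_shift, max_shift + 1)
    errors = np.array([np.sum((frame1 - np.roll(frame0, d)) ** 2) for d in shifts])
    return shifts, errors

def zero_error_shifts(period, true_shift=6, length=256, max_shift=16):
    """Candidate shifts whose SSD is (numerically) zero for a sinusoid of the given period."""
    x = np.arange(length)
    frame0 = np.sin(2.0 * np.pi * x / period)
    frame1 = np.roll(frame0, true_shift)  # frame0 displaced by true_shift samples
    shifts, errors = ssd_over_shifts(frame0, frame1, max_shift)
    return shifts[errors < 1e-9].tolist()

fine = zero_error_shifts(period=8)     # high spatial frequency: several perfect matches
coarse = zero_error_shifts(period=64)  # low spatial frequency: one perfect match
print(fine, coarse)
```

With a period of 8 samples the error is zero at -10, -2, 6 and 14, so a search started near zero displacement would settle on the false match at -2; with a period of 64 samples only the true shift of 6 remains inside the search range.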
With only a few exceptions ([5, 9]), much of this work has concentrated on using a
small family of "generic" motion models within the hierarchical estimation framework.
Such models involve the use of some type of a smoothness constraint (sometimes allowing
for discontinuities) to constrain the estimation process at image locations containing
little or no image structure. However, as noted above, the arguments for use of a
multiresolution, hierarchical approach apply equally to more structured models of image
motion.
In this paper, we describe a variety of motion models used within the same
hierarchical framework. These models provide powerful constraints on the estimation process
and their use within the hierarchical estimation framework leads to increased accuracy,
robustness and efficiency. We outline the implementation of four new models and present
results using real images.
1.2 Motion Models

Because optical flow computation is an underconstrained problem, all motion estimation
algorithms involve additional assumptions about the structure of the motion computed.
In many cases, however, this assumption is not expressed explicitly as such; rather it is
presented as a regularization term in an objective function [14, 16] or described primarily
as a computational issue [18, 4, 2, 20].
Previous work involving explicitly model-based motion estimation includes direct
methods [17, 21], [13] as well as methods for estimation under restricted conditions [7, 9].
The first class of methods uses a global egomotion constraint while those in the second
class of methods rely on parametric motion models within local regions. The description
"direct methods" actually applies equally to both types.
With respect to motion models, these algorithms can be divided into three categories:
(i) fully parametric, (ii) quasi-parametric, and (iii) non-parametric. Fully parametric
models describe the motion of individual pixels within a region in terms of a parametric
form. These include affine and quadratic flow fields. Quasi-parametric models involve
representing the motion of a pixel as a combination of a parametric component that is
valid for the entire region and a local component which varies from pixel to pixel. For
instance, the rigid motion model belongs to this class: the egomotion parameters constrain
the local flow vector to lie along a specific line, while the local depth value determines the
exact value of the flow vector at each pixel. By non-parametric models, we mean those
such as are commonly used in optical flow computation, i.e. those involving the use of
some type of a smoothness or uniformity constraint.
A parallel taxonomy of motion models can be constructed by considering local models
that constrain the motion in the neighborhood of a pixel and global models that describe
the motion over the entire visual field. This distinction becomes especially useful in
analyzing hierarchical approaches where the meaning of "local" changes as the computation
moves through the multiresolution hierarchy. In this scheme fully parametric models are
global models, non-parametric models such as smoothness or uniformity of displacement
are local models, and quasi-parametric models involve both a global and a local model.
The reason for describing motion models in this way is that it clarifies the relationship
between different approaches and allows consideration of the range of possibilities in
choosing a model appropriate to a given situation. Purely global (or fully parametric)
models in essence trivially imply a local model so no choice is possible. However, in the
case of quasi- or non-parametric models, the local model can be more or less complex.
Also, it makes clear that by varying the size of local neighborhoods, it is possible to move
continuously from a partially or purely local model to a purely global one.
The reasons for choosing one model or another are generally quite intuitive, though
the exact choice of model is not always easy to make in a rigorous way. In general,
parametric models constrain the local motion more strongly than the less parametric
ones. A small number of parameters (e.g., six in the case of affine flow) are sufficient
to completely specify the flow vector at every point within their region of applicability.
However, they tend to be applicable only within local regions, and in many cases are
approximations to the actual flow field within those regions (although they may be very
good approximations). From the point of view of motion estimation, such models allow
the precise estimation of motion at locations containing no image structure, provided the
region contains at least a few locations with significant image structure.
Quasi-parametric models constrain the flow field less, but nevertheless constrain it
to some degree. For instance, for rigidly moving objects under perspective projection,
the rigid motion parameters (same as the egomotion parameters in the case of observer
motion) constrain the flow vector at each point to lie along a line in the velocity space.
One-dimensional image structure (e.g., an edge) is generally sufficient to precisely
estimate the motion of that point. These models tend to be applicable over a wide region
in the image, perhaps even the entire image. If the local structure of the scene can be
further parametrized (e.g., planar surfaces under rigid motion), the model becomes fully
parametric within the region.
Non-parametric models require local image structure that is two-dimensional (e.g.,
corner points, textured areas). However, with the use of a smoothness constraint it is
usually possible to "fill in" where there is inadequate local information. The estimation
process is typically more computationally expensive than the other two cases. These
models are more generally applicable (not requiring parametrizable scene structure or
motion) than the other two classes.
1.3 Paper Organization
The remainder of the paper consists of an overview of the hierarchical motion estimation
framework, a description of each of the four models and their application to specific
examples, and a discussion of the overall approach and its applications.
2 Hierarchical Motion Estimation
Figure 1 describes the hierarchical motion estimation framework. The basic components
of this framework are: (i) pyramid construction, (ii) motion estimation, (iii) image
warping, and (iv) coarse-to-fine refinement.
There are a number of ways to construct the image pyramids. Our implementation
uses the Laplacian pyramid described in [6], which involves simple local computations
and provides the necessary spatial-frequency decomposition.
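As a rough illustration of the pyramid construction step, the following sketch builds a Laplacian pyramid with NumPy. It is an assumption-laden stand-in rather than the implementation used here: the 5-tap binomial kernel, the reflected borders, and the function names (`blur`, `reduce_level`, `expand_level`, `laplacian_pyramid`) are all choices made for this example.

```python
import numpy as np

# Assumed 5-tap binomial low-pass kernel; the paper does not state the kernel it uses.
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def blur(image, kernel=KERNEL):
    """Separable blur with reflected borders; output has the same shape as the input."""
    pad = len(kernel) // 2
    padded = np.pad(image, pad, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def reduce_level(image):
    """Low-pass filter, then keep every second sample in each direction."""
    return blur(image)[::2, ::2]

def expand_level(image, shape):
    """Insert zeros between samples, then interpolate with a gain of 2 per axis."""
    up = np.zeros(shape)
    up[::2, ::2] = image
    return blur(up, KERNEL * 2.0)

def laplacian_pyramid(image, levels):
    """Band-pass levels from fine to coarse; the last entry is the low-pass residual."""
    gaussian = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        gaussian.append(reduce_level(gaussian[-1]))
    laplacian = [g - expand_level(g_next, g.shape)
                 for g, g_next in zip(gaussian[:-1], gaussian[1:])]
    laplacian.append(gaussian[-1])
    return laplacian
```

Each level is the difference between a low-pass image and the expanded version of the next coarser one, so adding the levels back together from coarse to fine recovers the original image.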
The motion estimator varies according to the model. In all cases, however, the
estimation process involves SSD minimization, but instead of performing a discrete search (such
as in [1]), Gauss-Newton minimization is employed in a refinement process. The basic
assumption behind SSD minimization is intensity constancy, as applied to the Laplacian
pyramid images. Thus,
$$I(\mathbf{x}, t) = I(\mathbf{x} - \mathbf{u}(\mathbf{x}),\, t - 1)$$
where $\mathbf{x} = (x, y)$ denotes the spatial image position of a point, $I$ the (Laplacian pyramid)
image intensity and $\mathbf{u}(\mathbf{x}) = (u(x, y), v(x, y))$ denotes the image velocity at that point.
The SSD error measure for estimating the flow field within a region is:
$$E(\{\mathbf{u}\}) = \sum_{\mathbf{x}} \left( I(\mathbf{x}, t) - I(\mathbf{x} - \mathbf{u}(\mathbf{x}),\, t - 1) \right)^2 \tag{1}$$
where the sum is computed over all the points within the region and $\{\mathbf{u}\}$ is used to denote
the entire flow field within that region. In general this error (which is actually the sum
of individual errors) is not quadratic in terms of the unknown quantities $\{\mathbf{u}\}$, because
of the complex pattern of intensity variations. Hence, we typically have a non-linear
minimization problem at hand.
Note that the basic structure of the problem is independent of the choice of a motion
model. The model is in essence a statement about the function $\mathbf{u}(\mathbf{x})$. To make this
explicit, we can write,
$$\mathbf{u}(\mathbf{x}) = \mathbf{u}(\mathbf{x}; \mathbf{p}_m), \tag{2}$$
where $\mathbf{p}_m$ is a vector representing the model parameters.
A standard numerical approach for solving such a problem is to apply Newton's
method. However, for errors which are sums of squares a good approximation to Newton's
method is the Gauss-Newton method, which uses a first order expansion of the individual
error quantities before squaring. If $\{\mathbf{u}\}_i$ is the current estimate of the flow field during the $i$th
iteration, the incremental estimate $\{\delta\mathbf{u}\}$ can be obtained by minimizing the quadratic
error measure
$$E(\{\delta\mathbf{u}\}) = \sum_{\mathbf{x}} \left( \Delta I + \nabla I \cdot \delta\mathbf{u}(\mathbf{x}) \right)^2, \tag{3}$$
where
$$\Delta I(\mathbf{x}) = I(\mathbf{x}, t) - I(\mathbf{x} - \mathbf{u}_i(\mathbf{x}),\, t - 1),$$
that is, the difference between the two images at corresponding pixels, after taking the
current estimate into account.
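The first order expansion behind Equation 3 can be written out explicitly. The following is a sketch of the missing step, assuming the increment $\delta\mathbf{u}$ is small and the image is locally smooth; $\mathbf{u}_i$ is the current estimate and $\nabla I$ the spatial gradient.

```latex
\begin{align*}
I(\mathbf{x}, t) - I(\mathbf{x} - \mathbf{u}_i(\mathbf{x}) - \delta\mathbf{u}(\mathbf{x}),\, t - 1)
  &\approx I(\mathbf{x}, t) - I(\mathbf{x} - \mathbf{u}_i(\mathbf{x}),\, t - 1)
     + \nabla I \cdot \delta\mathbf{u}(\mathbf{x}) \\
  &= \Delta I(\mathbf{x}) + \nabla I \cdot \delta\mathbf{u}(\mathbf{x})
\end{align*}
```

Squaring this per-pixel residual and summing over the region gives the quadratic error measure of Equation 3.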
As such, the minimization problem described in Equation 3 is underconstrained. The
different motion models constrain the flow field in different ways. When these are used
to describe the flow field, the estimation problem can be reformulated in terms of the
unknown (incremental) model parameters. The details of these reformulations are described
in the various sections corresponding to the individual motion models.
The third component, image warping, is achieved by using the current values of the
model parameters to compute a flow field, and then using this flow field to warp $I(t - 1)$
towards $I(t)$, which is used as the reference image. Our current warping algorithm uses
bilinear interpolation. The warped image (as against the original second image) is then
used for the computation of the error $\Delta I$ for further estimation². The spatial gradient
$\nabla I$ computations are based on the reference image.
The final component, coarse-to-fine refinement, propagates the current motion
estimates from one level to the next level where they are then used as initial estimates. For
the parametric component of the model, this is easy; the values of the parameters are
simply transmitted to the next level. However, when a local model is also used, that
information is typically in the form of a dense image (or images), e.g., a flow field or a
depth map. This image (or images) must be propagated via a pyramid expansion
operation as described in [6]. The global parameters in combination with the local information
can then be used to generate the flow field necessary to perform the initial warping at
this next level.
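The warping and propagation steps can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the implementation described here: the function names are hypothetical, samples outside the image are clamped to the border, the flow is upsampled by pixel replication instead of the pyramid expansion of [6], and the flow vectors are doubled on the assumption that displacements are measured in pixels of the current level.

```python
import numpy as np

def warp_bilinear(image, flow_u, flow_v):
    """Return warped(x, y) = image(x - u, y - v), sampled with bilinear interpolation."""
    height, width = image.shape
    ys, xs = np.mgrid[0:height, 0:width].astype(float)
    # Source positions, clamped so that every sample falls inside the image.
    src_x = np.clip(xs - flow_u, 0.0, width - 1.0)
    src_y = np.clip(ys - flow_v, 0.0, height - 1.0)
    x0 = np.minimum(np.floor(src_x).astype(int), width - 2)
    y0 = np.minimum(np.floor(src_y).astype(int), height - 2)
    fx, fy = src_x - x0, src_y - y0
    top = (1.0 - fx) * image[y0, x0] + fx * image[y0, x0 + 1]
    bottom = (1.0 - fx) * image[y0 + 1, x0] + fx * image[y0 + 1, x0 + 1]
    return (1.0 - fy) * top + fy * bottom

def expand_flow(flow_u, flow_v):
    """Carry a dense flow field to the next finer level: replicate pixels, double vectors."""
    block = np.ones((2, 2))
    return 2.0 * np.kron(flow_u, block), 2.0 * np.kron(flow_v, block)
```

Applying `warp_bilinear` to the frame at time t - 1 with the current flow gives the image that is differenced against the reference frame to form the residual used in the next estimation step.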
3 Motion Models
3.1 Affine Flow

The Model: When the distance between the background surfaces and the camera is
large, it is usually possible to approximate the motion of the surface as an affine
transformation:
$$\begin{aligned} u(x, y) &= a_1 + a_2 x + a_3 y \\ v(x, y) &= a_4 + a_5 x + a_6 y \end{aligned} \tag{4}$$
Using vector notation this can be rewritten as follows:
$$\mathbf{u}(\mathbf{x}) = X(\mathbf{x})\,\mathbf{a} \tag{5}$$
where $\mathbf{a}$ denotes the vector $(a_1, a_2, a_3, a_4, a_5, a_6)^T$, and
$$X(\mathbf{x}) = \begin{bmatrix} 1 & x & y & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & x & y \end{bmatrix}$$
Thus, the motion of the entire region is completely specified by the parameter vector $\mathbf{a}$,
which is the unknown quantity that needs to be estimated.
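Equations 4 and 5 translate directly into code. The sketch below uses hypothetical function names (`affine_basis`, `affine_flow`) and assumes image coordinates are supplied as floats or NumPy arrays.

```python
import numpy as np

def affine_basis(x, y):
    """X(x) of Equation 5: the 2 x 6 matrix for a single image position (x, y)."""
    return np.array([[1.0, x, y, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0, x, y]])

def affine_flow(a, xs, ys):
    """Equation 4 evaluated at coordinates xs, ys (scalars or arrays)."""
    a = np.asarray(a, dtype=float)
    u = a[0] + a[1] * xs + a[2] * ys
    v = a[3] + a[4] * xs + a[5] * ys
    return u, v

# Example: six parameters describe the flow at every pixel of the region.
a = np.array([1.0, 0.1, 0.0, -2.0, 0.0, 0.2])
print(affine_basis(10.0, 5.0) @ a)  # flow vector (u, v) at the point (10, 5)
```

The matrix form is convenient for the estimation step, while the vectorized form is what a warping routine would call to build the dense flow field.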
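The Estimation Algorithm: Let $\mathbf{a}_i$ denote the current estimate of the affine
parameters. After using the flow field represented by these parameters in the warping step, an
incremental estimate $\delta\mathbf{a}$ can be determined. To achieve this, we insert the parametric
form of $\delta\mathbf{u}$ into Equation 3, and obtain an error measure that is a function of $\delta\mathbf{a}$:
$$E(\delta\mathbf{a}) = \sum_{\mathbf{x}} \left( \Delta I + (\nabla I)^T X\, \delta\mathbf{a} \right)^2 \tag{6}$$
Minimizing this error with respect to $\delta\mathbf{a}$ leads to the equation:
$$\left[ \sum_{\mathbf{x}} X^T (\nabla I)(\nabla I)^T X \right] \delta\mathbf{a} = -\sum_{\mathbf{x}} X^T (\nabla I)(\Delta I). \tag{7}$$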
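A single Gauss-Newton update of the affine parameters, as in Equation 7, might look like the following sketch. The function name and argument layout are assumptions made for this example; the gradients and the residual are taken as precomputed arrays over the region.

```python
import numpy as np

def affine_gauss_newton_step(grad_x, grad_y, delta_i, xs, ys):
    """Solve Equation 7 for the incremental affine parameters delta_a."""
    gx, gy, di = grad_x.ravel(), grad_y.ravel(), delta_i.ravel()
    x, y = xs.ravel(), ys.ravel()
    ones = np.ones_like(x)
    # Row n of jac is (grad I)^T X(x_n), a 6-vector.
    jac = np.stack([gx * ones, gx * x, gx * y, gy * ones, gy * x, gy * y], axis=1)
    normal = jac.T @ jac   # sum over x of X^T (grad I)(grad I)^T X
    rhs = -jac.T @ di      # minus the sum over x of X^T (grad I) delta_I
    return np.linalg.solve(normal, rhs)
```

In a full estimator this step would be followed by adding the increment to the current parameters, re-warping, and repeating at each pyramid level; `np.linalg.solve` raises an error if the region has too little image structure to make the 6 x 6 system invertible.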
² We have avoided using the standard notation $I_t$ in order to avoid any confusion about this
point.
Experiments with the affine motion model: To demonstrate use of the affine flow
model, we show its performance on an aerial image sequence. A frame of the original
sequence is shown in Figure 2a and the unprocessed difference between two frames of
this sequence is shown in Figure 2b. Figure 2c shows the result of estimating an affine
transformation using the hierarchical warp motion approach, and then using this to
compensate for camera motion induced flow. Although the terrain is not perfectly flat, we
still obtain encouraging compensation results. In this example the simple difference
between the compensated and original image is sufficient to detect and locate a helicopter
in the image. We use extensions of the approach, like integration of compensated
difference images over time, to detect smaller objects moving more slowly with respect to the
background.
3.2 Planar Surface Flow

The Model: It is generally known that the instantaneous motion of a planar surface
undergoing rigid motion can be described as a second order function of image coordinates
involving eight independent parameters (e.g., see [15]). In this section we provide a brief
derivation of this description and make some observations concerning its estimation.
We begin by observing that the image motion induced by a rigidly moving object (in
this case a plane) can be written as:
$$\mathbf{u}(\mathbf{x}) = \frac{1}{Z(\mathbf{x})} A(\mathbf{x})\,\mathbf{t} + B(\mathbf{x})\,\boldsymbol{\omega} \tag{8}$$
where $Z(\mathbf{x})$ is the distance from the camera of the point (i.e., depth) whose image position
is $(\mathbf{x})$, and
$$A(\mathbf{x}) = \begin{bmatrix} -f & 0 & x \\ 0 & -f & y \end{bmatrix}$$
$$B(\mathbf{x}) = \begin{bmatrix} (xy)/f & -(f^2 + x^2)/f & y \\ (f^2 + y^2)/f & -(xy)/f & -x \end{bmatrix}$$
The $A$ and the $B$ matrices depend only on the image positions and the focal length $f$
and not on the unknowns: $\mathbf{t}$, the translation vector, $\boldsymbol{\omega}$, the angular velocity vector, and
$Z$.
A planar surface can be described by the equation
$$k_1 X + k_2 Y + k_3 Z = 1 \tag{9}$$
where $(k_1, k_2, k_3)$ relate to the surface slant, tilt, and the distance of the plane from
the origin of the chosen coordinate system (in this case, the camera origin). Dividing
throughout by $Z$, we get
$$\frac{1}{Z} = k_1 \frac{X}{Z} + k_2 \frac{Y}{Z} + k_3.$$
Using $\mathbf{k}$ to denote the vector $(k_1, k_2, k_3)^T$ and $\mathbf{r}$ to denote the vector $(x/f, y/f, 1)^T$ we obtain
$$\frac{1}{Z(\mathbf{x})} = \mathbf{r}(\mathbf{x})^T \mathbf{k}.$$
Substituting this into Equation 8 gives
$$\mathbf{u}(\mathbf{x}) = \left( A(\mathbf{x})\,\mathbf{t} \right) \left( \mathbf{r}(\mathbf{x})^T \mathbf{k} \right) + B(\mathbf{x})\,\boldsymbol{\omega} \tag{10}$$
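The substitution leading to Equation 10 can be checked numerically. The sketch below is illustrative only: the function names and the sample values of the focal length, motion, and plane parameters are arbitrary assumptions, and a perspective projection x = fX/Z, y = fY/Z is assumed.

```python
import numpy as np

def motion_matrices(x, y, f):
    """A(x) and B(x) of Equation 8 for image position (x, y) and focal length f."""
    A = np.array([[-f, 0.0, x],
                  [0.0, -f, y]])
    B = np.array([[x * y / f, -(f * f + x * x) / f, y],
                  [(f * f + y * y) / f, -x * y / f, -x]])
    return A, B

def planar_flow(x, y, f, t, omega, k):
    """Equation 10: flow at (x, y) for translation t, rotation omega, plane parameters k."""
    A, B = motion_matrices(x, y, f)
    r = np.array([x / f, y / f, 1.0])
    return (A @ t) * (r @ k) + B @ omega

# Arbitrary example values; the point (X, Y, Z) is chosen to lie on the plane of Equation 9.
f = 100.0
t = np.array([0.2, -0.1, 0.5])
omega = np.array([0.01, -0.02, 0.03])
k = np.array([0.01, 0.02, 0.1])
X, Y = 1.0, 2.0
Z = (1.0 - k[0] * X - k[1] * Y) / k[2]
x, y = f * X / Z, f * Y / Z
A, B = motion_matrices(x, y, f)
print(planar_flow(x, y, f, t, omega, k), (A @ t) / Z + B @ omega)  # the two should agree
```

The agreement holds because the inner product of r(x) and k equals 1/Z for any point on the plane.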
This flow field is quadratic in $(\mathbf{x})$ and can be written also as
$$\begin{aligned} u(\mathbf{x}) &= a_1 + a_2 x + a_3 y + a_7 x^2 + a_8 xy \\ v(\mathbf{x}) &= a_4 + a_5 x + a_6 y + a_7 xy + a_8 y^2 \end{aligned} \tag{11}$$
where the 8 coefficients $(a_1, \ldots, a_8)$ are functions of the motion parameters $\mathbf{t}$, $\boldsymbol{\omega}$ and the
surface parameters $\mathbf{k}$. Since this 8-parameter form is rather well-known (e.g., see [15]) we
omit its details.
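For readers who want the omitted details, the coefficients can be obtained by expanding Equation 10 term by term. The expressions in the sketch below are our own expansion under the sign conventions of the A and B matrices given above, not formulas quoted from [15], and the function names are hypothetical; the block also re-evaluates Equation 10 directly so the two forms can be compared.

```python
import numpy as np

def planar_coefficients(f, t, omega, k):
    """Coefficients a1..a8 of Equation 11, obtained by expanding Equation 10."""
    tx, ty, tz = t
    wx, wy, wz = omega
    k1, k2, k3 = k
    return np.array([
        -f * (tx * k3 + wy),  # a1: constant term of u
        tz * k3 - tx * k1,    # a2: x term of u
        wz - tx * k2,         # a3: y term of u
        f * (wx - ty * k3),   # a4: constant term of v
        -wz - ty * k1,        # a5: x term of v
        tz * k3 - ty * k2,    # a6: y term of v
        (tz * k1 - wy) / f,   # a7: x^2 term of u and xy term of v
        (tz * k2 + wx) / f,   # a8: xy term of u and y^2 term of v
    ])

def quadratic_flow(a, x, y):
    """Equation 11 evaluated at image position (x, y)."""
    u = a[0] + a[1] * x + a[2] * y + a[6] * x * x + a[7] * x * y
    v = a[3] + a[4] * x + a[5] * y + a[6] * x * y + a[7] * y * y
    return np.array([u, v])

def flow_equation_10(x, y, f, t, omega, k):
    """Direct evaluation of Equation 10, used as the reference for comparison."""
    A = np.array([[-f, 0.0, x], [0.0, -f, y]])
    B = np.array([[x * y / f, -(f * f + x * x) / f, y],
                  [(f * f + y * y) / f, -x * y / f, -x]])
    r = np.array([x / f, y / f, 1.0])
    return (A @ t) * (r @ k) + B @ omega
```

Comparing `quadratic_flow(planar_coefficients(...), x, y)` with `flow_equation_10(...)` at a handful of image positions is a quick way to confirm the expansion for a given sign convention.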
If the egomotion parameters are known, then the three-parameter vector k can be used
to represent the motion of the planar surface. Otherwise the 8-parameter representation
can be used. In either case, the flow field is linear in the unknown parameters.
The problem of estimating planar surface motion has been extensively studied
before [21, 1, 23]. In particular, Negahdaripour and Horn [21] suggest iterative methods
for estimating the motion and the surface parameters, as well as a method of estimating
the 8 parameters and then decomposing them into the five rigid motion parameters
and the three surface parameters in closed form. Besides the embedding of these computations
within the hierarchical estimation framework, we also take a slightly different approach
to the problem.
We assume that the rigid motion parameters are already known or can be estimated
(e.g., see Section 3.3 below). Then, the problem reduces to that of estimating the three
surface parameters k. There are several practical reasons to prefer this approach: First, in
many situations the rigid motion model may be more globally applicable than the planar
surface model, and can be estimated using information from all the surfaces undergoing
the same rigid motion. Second, unless the region of interest subtends a significant field
of view, the second order components of the flow field will be small, and hence the
estimation of the eight parameters will be inaccurate and the process may be unstable.
On the other hand, the information concerning the three parameters k is contained in the
first order components of the flow field, and (if the rigid motion parameters are known)
their estimation will be more accurate and stable.
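The Estimation Algorithm: Let $\mathbf{k}_i$ denote the current estimate of the surface
parameters, and let $\mathbf{t}$ and $\boldsymbol{\omega}$ denote the motion parameters. These parameters are used to
construct an initial flow field $\mathbf{u}_i$ that is used in the warping step. The residual information
is then used to determine an incremental estimate $\delta\mathbf{k}$.
By substituting the parametric form of $\delta\mathbf{u}$
$$\begin{aligned} \delta\mathbf{u} &= \mathbf{u} - \mathbf{u}_i \\ &= \left( A(\mathbf{x})\mathbf{t} \right) \left( \mathbf{r}(\mathbf{x})^T (\mathbf{k}_i + \delta\mathbf{k}) \right) + B(\mathbf{x})\boldsymbol{\omega} - \left[ \left( A(\mathbf{x})\mathbf{t} \right) \left( \mathbf{r}(\mathbf{x})^T \mathbf{k}_i \right) + B(\mathbf{x})\boldsymbol{\omega} \right] \\ &= \left( A(\mathbf{x})\mathbf{t} \right) \mathbf{r}(\mathbf{x})^T \delta\mathbf{k} \end{aligned} \tag{12}$$
in Equation 3, we can obtain the incremental estimate $\delta\mathbf{k}$ as the vector that minimizes:
$$E(\delta\mathbf{k}) = \sum_{\mathbf{x}} \left( \Delta I + (\nabla I)^T (A\mathbf{t})\, \mathbf{r}^T \delta\mathbf{k} \right)^2 \tag{13}$$
Minimizing this error leads to the equation:
$$\left[ \sum_{\mathbf{x}} \mathbf{r}\, (\mathbf{t}^T A^T)(\nabla I)(\nabla I)^T (A\mathbf{t})\, \mathbf{r}^T \right] \delta\mathbf{k} = -\sum_{\mathbf{x}} \mathbf{r}\, (\mathbf{t}^T A^T)(\nabla I)\, \Delta I \tag{14}$$
This equation can be solved to obtain the incremental estimate $\delta\mathbf{k}$.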
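Equation 14 is a 3 x 3 linear system, and one update can be sketched as below. The function name and argument layout are assumptions for this example; image coordinates are taken relative to the principal point, and the translation is assumed known, as in the text.

```python
import numpy as np

def planar_gauss_newton_step(grad_x, grad_y, delta_i, xs, ys, f, t):
    """Solve Equation 14 for the incremental plane parameters delta_k."""
    gx, gy, di = grad_x.ravel(), grad_y.ravel(), delta_i.ravel()
    x, y = xs.ravel(), ys.ravel()
    tx, ty, tz = t
    # Scalar (grad I)^T A(x) t at every pixel, using A = [[-f, 0, x], [0, -f, y]].
    g = gx * (-f * tx + x * tz) + gy * (-f * ty + y * tz)
    # Rows of r are r(x)^T = (x/f, y/f, 1).
    r = np.stack([x / f, y / f, np.ones_like(x)], axis=1)
    jac = g[:, None] * r
    return np.linalg.solve(jac.T @ jac, -jac.T @ di)
```

Because only three unknowns are estimated, the system stays well conditioned for much smaller regions than the eight-parameter form would require, which matches the motivation given above for fixing the rigid motion first.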
Experiments with the planar surface motion model: We demonstrate the
application of the planar surface model using images from an outdoor sequence. One of the
input images is shown in Figure 3a, and the difference between both input images is
shown in Figure 3b. After estimating the camera motion between the images using the
algorithm described in Section 3.3, we applied the planar surface estimation algorithm
to a manually selected image window placed roughly over a region on the ground plane.
These parameters were then used to warp the second frame towards the first (this process
should align the ground plane alone). The difference between this warped image and the
original image is shown in Figure 3c. The figure shows compensation of the ground plane
motion, leaving residual parallax motion of the trees and other objects in the background.
Finally, in order to demonstrate the plane-fit, we graphically projected a rectangular grid
onto that plane. This is shown superimposed on the input image in Figure 3d.
3.3 Rigid Body Model

The Model: The motion of arbitrary surfaces undergoing rigid motion cannot usually
be described by a single global model. We can however make use of the global rigid body
model if we combine it with a local model of the surface. In this section, we provide a
brief derivation of the global and the local models. Hanna [12] provides further details
and results, and also describes how the local and global models interact at corner-like
and edge-like image structures.
As described in Section 3.2, the image motion induced by a rigidly moving object
can be written as:
$$\mathbf{u}(\mathbf{x}) = \frac{1}{Z(\mathbf{x})} A(\mathbf{x})\,\mathbf{t} + B(\mathbf{x})\,\boldsymbol{\omega} \tag{15}$$
where $Z(\mathbf{x})$ is the distance from the camera of the point (i.e., its depth), whose image
position is $(\mathbf{x})$, and
$$A(\mathbf{x}) = \begin{bmatrix} -f & 0 & x \\ 0 & -f & y \end{bmatrix}$$
$$B(\mathbf{x}) = \begin{bmatrix} (xy)/f & -(f^2 + x^2)/f & y \\ (f^2 + y^2)/f & -(xy)/f & -x \end{bmatrix}$$
The $A$ and the $B$ matrices depend only on the image positions and the focal length
$f$ and not on the unknowns: $\mathbf{t}$, the translation vector, $\boldsymbol{\omega}$, the angular velocity vector, and
$Z$. Equation 15 relates the parameters of the global model, $\boldsymbol{\omega}$ and $\mathbf{t}$, with parameters of
the local scene structure, $Z(\mathbf{x})$.
A local model we use is the frontal-planar model, which means that over a local image
patch, we assume that $Z(\mathbf{x})$ is constant. An alternative model uses the assumption that
$\delta Z(\mathbf{x})$ (the difference between a previous estimate and a refined estimate) is constant
over each local image patch.
We refine the local and global models in turn using initial estimates of the local
structure parameters, $Z(\mathbf{x})$, and the global rigid body parameters $\boldsymbol{\omega}$ and $\mathbf{t}$. This local/global
refinement is iterated several times.
The Estimation Algorithm: Let the current estimates be denoted as $Z_i(\mathbf{x})$, $\mathbf{t}_i$ and
$\boldsymbol{\omega}_i$. As in the other models, we can use the model parameters to construct an initial flow
field, $\mathbf{u}_i(\mathbf{x})$, which is used to warp one of the image frames towards the next. The residual
error between the warped image and the original image to which it is warped is used to
refine the parameters of the local and global models. We now show how these models are
refined.
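We begin by writing Equation 15 in an incremental form so that
$$\delta\mathbf{u}(\mathbf{x}) = \frac{1}{Z(\mathbf{x})} A(\mathbf{x})\,\mathbf{t} + B(\mathbf{x})\,\boldsymbol{\omega} - \frac{1}{Z_i(\mathbf{x})} A(\mathbf{x})\,\mathbf{t}_i - B(\mathbf{x})\,\boldsymbol{\omega}_i \tag{16}$$
Inserting the parametric form of $\delta\mathbf{u}$ into Equation 3 we obtain the pixel-wise error as
$$E(\mathbf{t}, \boldsymbol{\omega}, 1/Z(\mathbf{x})) = \left( \Delta I + (\nabla I)^T A\mathbf{t}/Z(\mathbf{x}) + (\nabla I)^T B\boldsymbol{\omega} - (\nabla I)^T A\mathbf{t}_i/Z_i(\mathbf{x}) - (\nabla I)^T B\boldsymbol{\omega}_i \right)^2 \tag{17}$$
To refine the local models, we assume that $1/Z(\mathbf{x})$ is constant over 5 x 5 image patches
centered on each image pixel. We then algebraically solve for this $Z$ both in order to
estimate its current value, and to eliminate it from the global error measure. Consider
the local component of the error measure,
$$E_{local} = \sum_{5 \times 5} E(\mathbf{t}, \boldsymbol{\omega}, 1/Z(\mathbf{x})). \tag{18}$$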
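Differentiating Equation 18 (the pixel-wise error of Equation 17 summed over the patch)
with respect to $1/Z(\mathbf{x})$ and setting the result to zero, we get
$$\frac{1}{Z(\mathbf{x})} = \frac{-\sum_{5 \times 5} (\nabla I)^T A\mathbf{t} \left( \Delta I - (\nabla I)^T A\mathbf{t}_i/Z_i(\mathbf{x}) + (\nabla I)^T B\boldsymbol{\omega} - (\nabla I)^T B\boldsymbol{\omega}_i \right)}{\sum_{5 \times 5} \left( (\nabla I)^T A\mathbf{t} \right)^2} \tag{19}$$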
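The closed-form local update of Equation 19 can be sketched for a single patch as follows. The helper names and the flat per-pixel array layout are assumptions made for this example; `inv_z_i` holds the previous inverse-depth values at the pixels of the patch.

```python
import numpy as np

def translation_term(gx, gy, x, y, f, t):
    """(grad I)^T A(x) t per pixel, with A = [[-f, 0, x], [0, -f, y]]."""
    return gx * (-f * t[0] + x * t[2]) + gy * (-f * t[1] + y * t[2])

def rotation_term(gx, gy, x, y, f, w):
    """(grad I)^T B(x) omega per pixel, with B as given for Equation 15."""
    bu = (x * y / f) * w[0] - ((f * f + x * x) / f) * w[1] + y * w[2]
    bv = ((f * f + y * y) / f) * w[0] - (x * y / f) * w[1] - x * w[2]
    return gx * bu + gy * bv

def local_inverse_depth(gx, gy, delta_i, x, y, f, t, omega, t_i, omega_i, inv_z_i):
    """Equation 19: the inverse depth 1/Z that minimizes the error over one patch."""
    g = translation_term(gx, gy, x, y, f, t)
    # Everything in the residual of Equation 17 that does not multiply 1/Z.
    rest = (delta_i
            - translation_term(gx, gy, x, y, f, t_i) * inv_z_i
            + rotation_term(gx, gy, x, y, f, omega)
            - rotation_term(gx, gy, x, y, f, omega_i))
    return -np.sum(g * rest) / np.sum(g * g)
```

The denominator vanishes when the gradient is everywhere perpendicular to the translational flow direction within the patch, so a practical version would need to guard against that case.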
To refine the global model, we minimize the error in Equation 17 summed over the
entire image:
$$E_{global} = \sum_{Image} E(\mathbf{t}, \boldsymbol{\omega}, 1/Z(\mathbf{x})). \tag{20}$$
We insert the expression for $1/Z(\mathbf{x})$ given in Equation 19 (not the current numerical
value of the local parameter) into Equation 20. The result is an expression for $E_{global}$
that is non-quadratic in $\mathbf{t}$ but quadratic in $\boldsymbol{\omega}$. We recover refined estimates of $\mathbf{t}$ and $\boldsymbol{\omega}$
by performing one Gauss-Newton minimization step using the previous estimates of the
global parameters, $\mathbf{t}_i$ and $\boldsymbol{\omega}_i$, as starting values. Expressions are evaluated numerically
at $\mathbf{t} = \mathbf{t}_i$ and $\boldsymbol{\omega} = \boldsymbol{\omega}_i$.
We then repeat the estimation algorithm several times at each image resolution.
Experiments with the rigid body motion model: We have chosen an outdoor scene
to demonstrate the rigid body motion model. Figure 4a shows one of the input images,
and Figure 4b shows the difference between the two input images. The algorithm was
performed beginning at level 3 (subsampled by a factor of 8) of a Laplacian pyramid. The
local surface parameters $1/Z(\mathbf{x})$ were all initialized to zero, and the rigid-body motion
parameters were initialized to $\mathbf{t}_0 = (0, 0, 1)^T$ and $\boldsymbol{\omega}_0 = (0, 0, 0)^T$. The model parameters
were refined 10 times at each image resolution. Figure 4c shows the difference image
between the second image and the first image after being warped using the final estimates
of the rigid-body motion parameters and the local surface parameters. Figure 4d shows
an image of the recovered local surface parameters $1/Z(\mathbf{x})$ such that bright points are
nearer the camera than dark points. The recovered inverse ranges are plausible almost
everywhere, except at the image border and near the recovered focus of expansion. The
bright dot at the bottom right hand side of the inverse range map corresponds to a leaf
in the original image that is blowing across the ground towards the camera. Figure 4e
shows a table of rigid-body motion parameters that were recovered at the end of each
resolution of analysis.
More experimental results and a detailed discussion of the algorithm's performance
on various types of scenes can be found in [12].
3.4 General Flow Fields

The Model: Unconstrained general flow fields are typically not described by any global
parametric model. Different local models have been used to facilitate the estimation
process, including constant flow within a local window and locally smooth or continuous flow.
The former facilitates direct local estimation [18, 20], whereas the latter model requires
iterative relaxation techniques [16]. It is also not uncommon to use the combination of
these two types of local models (e.g., [3, 10]).
The local model chosen here is constant flow within 5 x 5 pixel windows at each level
of the pyramid. This is the same model as used by Lucas and Kanade [18] but here it is
embedded as a local model within the hierarchical estimation framework.
The Estimation Algorithm: Assume that we have an approximate flow field from
previous levels (or previous iterations at the same level). Assuming that the incremental
flow vector $\delta\mathbf{u}$ is constant within the 5 x 5 window, Equation 3 can be written as
$$E(\delta\mathbf{u}) = \sum_{\mathbf{x}} \left( \Delta I + \nabla I^T \delta\mathbf{u} \right)^2 \tag{21}$$
where the sum is taken within the 5 x 5 window. Minimizing this error with respect to
$\delta\mathbf{u}$ leads to the equation,
$$\left[ \sum (\nabla I)(\nabla I)^T \right] \delta\mathbf{u} = -\sum \nabla I\, \Delta I. \tag{22}$$
We make some observations concerning the singularities of this relationship. If the
summing window consists of a single element, the 2 x 2 matrix on the left-hand side is an
outer product of a 2 x 1 vector with itself and hence has a rank of at most unity. In our
case, when the summing window consists of 25 points, the rank of the matrix on the left-hand side
will be two unless the directions of the gradient vectors $\nabla I$ everywhere within the window
coincide. This situation is the general case of the aperture effect.
In our implementation of this technique, the flow estimate at each point is obtained by
using a 5 x 5 window centered around that point. This amounts to assuming implicitly
that the flow field varies smoothly over the image.
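The constant-flow update of Equation 22 and its rank behaviour can be sketched for a single window as follows. The function name is hypothetical, and returning the minimum-norm least-squares solution when the matrix is singular is a choice made for this illustration, not something the text prescribes.

```python
import numpy as np

def constant_flow_step(grad_x, grad_y, delta_i):
    """Solve Equation 22 for one window; also report the rank of the 2 x 2 matrix."""
    gx, gy, di = grad_x.ravel(), grad_y.ravel(), delta_i.ravel()
    G = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    b = -np.array([np.sum(gx * di), np.sum(gy * di)])
    rank = np.linalg.matrix_rank(G)
    # Least squares returns the minimum-norm solution when G is rank deficient.
    flow = np.linalg.lstsq(G, b, rcond=None)[0]
    return flow, rank

# A window containing only a vertical edge: every gradient points along x.
edge_gx, edge_gy = np.ones((5, 5)), np.zeros((5, 5))
true_flow = np.array([1.0, 3.0])
residual = -(edge_gx * true_flow[0] + edge_gy * true_flow[1])
print(constant_flow_step(edge_gx, edge_gy, residual))
```

For the edge window the matrix has rank one and only the flow component along the gradient, (1, 0), is recovered; the component along the edge is lost, which is the aperture effect described above.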
Experiments with the general flow model: We demonstrate the general flow
algorithm on an image sequence containing several independently moving objects, a case for
which the other motion models described here are not applicable. Figure 5a shows one
image of the original sequence. Figure 5b shows the difference between the two frames
that were used to compute image flow. Figure 5c shows little difference between the
compensated image and the other original image. Figure 5d shows the horizontal component
of the computed flow field, and Figure 5e shows the vertical component. In local image
regions where image structure is well-defined, and where the local image motion is
simple, the recovered motion estimates appear plausible. Errors predictably occur however
at motion boundaries. Errors also occur in image regions where the local image structure
is not well-defined (like some parts of the road), but for the same reason, such errors do
not appear as intensity errors in the compensated difference image.
4 Discussion
Thus far, we have described a hierarchical framework for the estimation of image motion
between two images using various models. Our motivation was to generalize the notion
of direct estimation to model-based estimation and unify a diverse set of model-based
estimation algorithms into a single framework. The framework also supports the combined
use of parametric global models and local models which typically represent some type of
a smoothness or local uniformity assumption.
One of the unifying aspects of the framework is that the same objective function
(SSD) is used for all models, but the minimization is performed with respect to different
parameters. As noted in the introduction, this is enabled by viewing all these problems
from the perspective of image registration.
It is interesting to contrast this perspective (of model-based image registration) with
some of the more traditional approaches to motion analysis. One such approach is to
compute image flow fields, which involves combining the local brightness constraint with
some sort of a global smoothness assumption, and then interpret them using appropriate
motion models. In contrast, the approach taken here is to use the motion models to
constrain the flow field computation. The obvious benefit of this is that the resulting
flow fields may generally be expected to be more consistent with models than general
smooth flow fields. Note, however, that the framework also includes general smooth flow
field techniques, which can be used if the motion model is unknown.
In the case of models that are not fully parametric, local image information is used to
determine local image/scene properties (e.g., the local range value). However, the accuracy
of these can only be as good as the available local image information. For example,
in homogeneous areas of the scene, it may be possible to achieve perfect registration even
if the surface range estimates (and the corresponding local flow vectors) are incorrect.
However, in the presence of significant image structures, these local estimates may be
expected to be accurate. On the other hand, the accuracy of the global parameters (e.g.,
the rigid motion parameters) depends only on having sufficient and sufficiently diverse
local information across the entire region. Hence, it may be possible to obtain reliable
estimates of these global parameters, even though estimated local information may not
be reliable everywhere within the region. For fully parametric models, this problem does
not exist.
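
The distinction between local and global reliability can be illustrated numerically. The sketch below uses a pure translation as a stand-in for the global model (an assumption of this example; the paper's example is rigid motion) and examines the 2x2 matrix of summed gradient products that appears in the least-squares normal equations. A near-zero smallest eigenvalue means the displacement is not constrained by the data in that area. The helper names and the synthetic image are hypothetical.

```python
import numpy as np

def gradient_matrix(img):
    """2x2 sum of grad(I) grad(I)^T over an image or patch."""
    gy, gx = np.gradient(img.astype(float))
    return np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                     [np.sum(gx * gy), np.sum(gy * gy)]])

def min_eig(m):
    """Smallest eigenvalue of a symmetric matrix."""
    return float(np.linalg.eigvalsh(m)[0])

# Synthetic image: left half homogeneous, top-right vertical stripes,
# bottom-right horizontal stripes.
ys, xs = np.mgrid[0:64, 0:64].astype(float)
img = np.zeros((64, 64))
img[:32, 32:] = np.sin(xs[:32, 32:] / 2.0)
img[32:, 32:] = np.sin(ys[32:, 32:] / 2.0)

flat_patch = img[8:24, 8:24]      # homogeneous: no constraint at all
stripe_patch = img[8:24, 40:56]   # one gradient direction only
e_flat = min_eig(gradient_matrix(flat_patch))
e_stripe = min_eig(gradient_matrix(stripe_patch))
e_whole = min_eig(gradient_matrix(img))   # pooled over the entire region
```

In this example the two local patches leave the displacement partly or wholly undetermined, while the matrix pooled over the whole region is well conditioned because it combines gradients of different orientations.
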
The image registration problem addressed in this paper occurs in a wide range of
image processing applications, far beyond the usual ones considered in computer vision
(e.g., navigation and image understanding). These include image compression via
motion-compensated encoding, spatiotemporal analysis of remote-sensing imagery, image
database indexing and retrieval, and possibly object recognition. One way to state this
general problem is as that of recovering the coordinate system that relates two images of
a scene taken from two different viewpoints. In this sense, the framework proposed here
unifies motion analysis across these different applications as well.

Acknowledgements: Many individuals have contributed to the ideas and results presented
here. These include Peter Burt and Leonid Oliker from the David Sarnoff Research
Center, and Shmuel Peleg from Hebrew University.
Fig. 1. Diagram of the hierarchical motion estimation framework.
Fig. 2. Affine motion estimation: a) Original. b) Raw difference. c) Compensated difference.

Fig. 3. Planar surface motion estimation.
a) Original image.
b) Raw difference.
c) Difference after planar compensation.
d) Planar grid superimposed on the original image.
Fig. 4. Egomotion based flow model.

a) Original image from an outdoor sequence.
b) Raw difference.
c) Difference after ego-motion compensation.
d) Inverse range map.
e) Rigid body parameters recovered at each resolution:

Resolution   Rotation                   Translation
initial      (.0000, .0000, .0000)      (.0000, .0000, 1.0000)
32 x 30      (.0027, .0039, -.0001)     (-.3379, -.1352, .9314)
64 x 60      (.0038, .0041, .00??)      (-.3319, -.0561, ?)
128 x 120    (.003?, .0012, .0008)      (?, -.0383, .9971)
256 x 240    (.0029, .0006, .0013)      (-.0255, -.0899, .9956)

A "?" marks a digit or entry that is illegible in the source.
Fig. 5. Optical flow estimation.

a) Original image.
b) Raw difference.
c) Difference after motion compensation.
d) Horizontal component of the recovered flow field.
e) Vertical component of the recovered flow field.
