Abstract
1 Introduction
Consider a camera moving in the world; for example, a person using a hand-held camera or a camera mounted on a car. The images obtained are often shaky, particularly when the camera moves fast. When there is a significant amount of motion, it is hard to keep track of objects in the video sequence, and it is easy to become disoriented. When comparing two consecutive frames of a video sequence, the camera motion causes a motion of the contents. Figure 1 illustrates this effect. The goal of image stabilization is to estimate and compensate for this motion.

In this paper, we present a system capable of stabilizing video with very large displacements between frames. The system is applied to feature tracking and to generating stable overlays on the video.

This paper contributes the formulation and results of the fastest software-only implementation of image stabilization. The system presented here is capable of stabilizing video at 5Hz on non-dedicated hardware (a Silicon Graphics R10000 workstation). This is also the only published work to use color information in image stabilization and to apply a gradient-based criterion to select the pixels to be used in the registration step.

This system was integrated with a visual position estimator [3]. The position estimator was developed for space robotics applications, where the video frame-rate is low. Thus, the most important factor for the image stabilization algorithm is dealing with large displacements between frames; a high frame-rate is not fundamental. In applications such as stabilizing a home video, on the other hand, a high frame-rate is fundamental, but displacements between frames tend to be of a smaller magnitude.

2 Related Work

There has been extensive work on image stabilization; [2, 5, 7, 9] are some examples available in the literature. Most of these, including our system, use the same general framework for image stabilization: the motion between consecutive frames of the video sequence is estimated (except for [2]) and compensated for. The main differences are in the motion estimation algorithm used, the type of motion compensation applied, the hardware, and the final goal application.
Authorized licensed use limited to: BIBLIOTECA D'AREA SCIENTIFICO TECNOLOGICA ROMA 3. Downloaded on October 8, 2009 at 04:16 from IEEE Xplore. Restrictions apply.
Morimoto and Chellappa developed a system that performs motion estimation by tracking a set of feature points [9]. The system of Hansen et al. fits a global motion model to the optical flow [5]. These implementations achieved higher frame-rates (between 10 and 28Hz) using specialized hardware. These approaches use tracking or flow to stabilize video. Our approach stabilizes the video by directly estimating a global transformation with an image registration algorithm; this result could then be used for tracking or flow estimation. We believe our approach is less prone to local effects. Our approach achieves a lower frame-rate, but does not use specialized hardware. The performance of other approaches would probably be up to one or two orders of magnitude lower if used on non-dedicated hardware, while our system would have performance comparable to existing systems if implemented on dedicated hardware.

Irani et al. formulate a method for determining 3D

Therefore, the phases (Φ) differ by a linear term in a, while the magnitude is invariant to translation:

  Φ[g(t)] − Φ[f(t)] = −aω .

The desired translation can be obtained by calculating the inverse Fourier transform of F[d(t)], where:

  Φ[d(t)] = Φ[g(t)] − Φ[f(t)] = −aω ,   ‖F[d(t)]‖ = 1 .

The inverse is a delta function translated by a:

  F[d(t)] = e^{−iaω}  ⇒  d(t) = δ(t − a) .

Therefore, d(t) will have value zero everywhere except at a, which is the desired translation.

This same reasoning can be extended to continuous 2-D functions. Consider g(x, y) = f(x − a, y − b); in this case, the function d(x, y) is defined by:

  Φ[d(x, y)] = Φ[g(x, y)] − Φ[f(x, y)] = −(aω₁ + bω₂) ,
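The 1-D derivation above can be checked numerically. Below is a minimal sketch, assuming discrete signals and circular shifts (the derivation in the text is for continuous functions); the signal is synthetic and the helper name is illustrative:

```python
import numpy as np

def phase_correlation_shift(f, g):
    """Recover the translation a such that g(t) = f(t - a).

    The normalized cross-power spectrum has unit magnitude and phase
    -a*omega, so its inverse transform d(t) is a delta at the shift a."""
    F = np.fft.fft(f)
    G = np.fft.fft(g)
    cross = G * np.conj(F)
    d = np.fft.ifft(cross / (np.abs(cross) + 1e-12))  # d(t) ~ delta(t - a)
    return int(np.argmax(np.abs(d)))

rng = np.random.default_rng(0)
f = rng.standard_normal(128)
a = 17
g = np.roll(f, a)                       # g(t) = f(t - a), circular boundary
print(phase_correlation_shift(f, g))    # -> 17
```

The same idea extends to 2-D images by using `np.fft.fft2` and taking the `argmax` over both axes.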
model of the effect on the images of the camera motion. Thus, the problem can be defined as finding the values of the parameters of T that minimize E².

An iterative numerical optimization method is used to minimize E² [12]. These methods are prone to fall into local minima instead of the desired global minimum. To circumvent this problem, the method can be performed in a coarse-to-fine fashion, for example, using a Gaussian image pyramid [11]. Another interesting technique is to also perform a coarse-to-fine approach in model space [6].

3.3 Optimizing in Color Space

When using grayscale images, a blue sky could be incorrectly registered to a brown mountain if they have similar intensities. This kind of mismatch is very disturbing to us, as viewers and users of the registered images, because a blue sky and a brown mountain are very different; they don't even have the same color! We have worked on the hypothesis that color gives a strong cue to the registration process, the cue necessary to minimize this kind of mismatch. Therefore, it was incorporated in the formulation of the algorithm.

Consider an image divided into three bands: red (R), green (G) and blue (B). This formulation incorporates color information into the error function:

  E² = K_R Σ_{x,y} [R_{t+1}(u, v) − R_t(x, y)]² +
       K_G Σ_{x,y} [G_{t+1}(u, v) − G_t(x, y)]² +
       K_B Σ_{x,y} [B_{t+1}(u, v) − B_t(x, y)]² ,   (2)

where (u, v) = T[(x, y)].

The constants K_R, K_G and K_B are weights for each component. Again, the problem is defined as finding the values of the parameters of T that minimize E².

4 Motion Compensation

In this Section, we present an overview of some motion compensation techniques [4]:

• overlays on raw video: an application of this work is creating stable graphical overlays on video so that they appear to "stick" to world features as the camera moves. For example, buildings can be labeled in an aerial sequence, or the goal of a teleoperated rover could be indicated.

• rigid frame: the motion estimate is used to warp all frames to a reference frame. The viewer feels that camera motion has been removed.

• smoothed camera motion: the high-frequency component of the camera motion is removed, eliminating "jittering" in the video, and is used to stabilize the overlays. The low-frequency component of the camera motion, such as that caused by a rover turning, should not be altered. A possible approach is to remove the low-frequency component from the motion compensation [4].

5 Feature Tracking

Feature tracking is used in many computer vision applications. The motion of the camera and of objects in the scene are the main causes of feature motion. Image stabilization can be used to estimate the camera motion component.

In most tracking approaches, a window (or template) around the desired feature is matched using a local search algorithm. These methods are affected by local effects (local minima), such as textureless regions and local repetitive patterns. A coarse-to-fine approach is usually applied to minimize this problem, but it will not solve it completely.

For applications in which camera motion is the main source of motion in the video sequence, our approach is able to estimate this motion. Therefore, it is possible to use image stabilization to track features. This is achieved by applying the transformation T to the original position of the feature, thus obtaining a new estimate of the feature location [4].

The advantage of using a global method is that it is not affected by local effects. This allows us, for example, to track a region with no texture. However, the motion of the feature may not be perfectly estimated with a global motion model. In this case, image stabilization can be used to obtain an initial estimate of feature motion. This estimate can then be refined using a local method.

6 Implementation and Results

We implemented an image stabilization system using the formulation presented previously. This system was integrated into a Visual Position Estimator designed to aid operators of teleoperated rovers [3]. This system was able to stabilize a live video stream at 5Hz, using a software-only implementation on a Silicon Graphics R10000 workstation. This frame-rate is adequate for a space robotics application, such as that of the visual position estimation system.
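As an illustration, the three-band error function of equation (2) can be evaluated directly. The sketch below assumes a pure-translation model T[(x, y)] = (x + dx, y + dy), equal weights K_R = K_G = K_B = 1, and circular image boundaries; the frames are hypothetical random data:

```python
import numpy as np

def color_error(frame_t, frame_t1, dx, dy, k=(1.0, 1.0, 1.0)):
    """Sum of squared band differences between frame_t(x, y) and
    frame_t1(u, v) with (u, v) = (x + dx, y + dy)."""
    # Rolling frame_t1 by (-dy, -dx) samples it at (u, v) for each (x, y).
    shifted = np.roll(frame_t1, shift=(-dy, -dx), axis=(0, 1))
    diff = shifted.astype(float) - frame_t.astype(float)
    return sum(kb * np.sum(diff[..., b] ** 2) for b, kb in enumerate(k))

rng = np.random.default_rng(1)
frame_t = rng.random((32, 32, 3))                       # R, G, B bands
frame_t1 = np.roll(frame_t, shift=(3, 5), axis=(0, 1))  # camera translated by (5, 3)
# The true translation minimizes E^2 (here exactly zero):
print(color_error(frame_t, frame_t1, dx=5, dy=3))       # -> 0.0
```

In the full system the parameters of T are found by numerical optimization of this error, rather than by exhaustive evaluation.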
The image stabilization system implemented is able
to stabilize video with large displacements between
frames, up to 50% of the image size. This is essential
when the frame-rate is low.
Video clips and illustrations are available at:
http://www.cs.cmu.edu/~guestrin/ImageStabilization/
6.1 Overview
6.2.2 Minimizing Image Difference

To perform this minimization, the Levenberg-Marquardt method for non-linear optimization was chosen [10]. Using this iterative optimization method, sub-pixel accuracy can be achieved.

In our implementation, optimization is carried out in a coarse-to-fine approach in image space, using a Gaussian pyramid. A coarse-to-fine approach is also applied in model space, starting with a simpler model and using the result as the initial estimate for the next, more detailed, model. In many cases, the largest motion component in video sequences is caused by translation. It is common to see the camera panning, but not rotating about its optical axis. For this reason, we first estimate translation between images and then estimate other motion models, such as the rigid model (rotation + translation) used in our system.

Registration in Color Space

We chose to weigh each band equally. Therefore, the constants K_R, K_G and K_B in equation (2) were set to one. The weights for each band could be tuned through experimentation or with a physical analysis of the application.

Although at first the color formulation seems two times more computationally expensive than the grayscale one, in practice this was not the case. The color formulation was only 20% slower, because the optimization converged in fewer iterations.

Robustness, on the other hand, increased dramatically with the use of color information. For many test sequences, visually noticeable mismatches were much more frequent with the grayscale formulation than with color. This algorithm was also applied to image mosaicing. A large number of sequences were tested, and the number of visually noticeable mismatches decreased by about 50%.

Gradient-based Subsampling

In order to increase performance, we optimize over a subset of the image pixels. If we chose the pixels in this subset randomly or using a regular pattern, we might lose important features that help the algorithm converge to a correct solution. Therefore, we would like to select a subset of the image pixels using a criterion that maintains a large proportion of the information. Thus, a gradient-based subsampling was performed.

The image gradient information is used to determine the partial derivatives and the Hessian of the error function E² with respect to the model parameters. Intuitively, we can think that regions of low intensity variation in the images only make E² numerically larger. The gradient information, however, is necessary to determine how to converge to a solution. Therefore, gradient-based subsampling will decrease the amount of data used in the optimization, but maintain a large part of the information needed for convergence.

In our implementation, we only consider a band of a pixel (each pixel is composed of three bands) if its gradient is above a threshold. This threshold is dynamically determined. First, the percentage of the pixels that will be used in the computation is defined. Then, the threshold gradient value that binds that percentage of the pixels is determined. In our implementation, the threshold value is found by first sorting the gradient values of the pixels. The desired threshold is the pixel with index

  n_pixels × (100 − binding percentage) / 100

in the sorted vector. For our application, we found that using 8% of the pixels yielded the best performance-to-accuracy ratio.

6.3 Motion Compensation

Figures 4 and 5 illustrate the results of image stabilization on a video sequence. The position of the flag was estimated using the global motion estimation calculated by the registration algorithm. Experiments were also performed in smoothing camera motion and motion coding, obtaining successful results (see [4]).

During our tests, we compared the visual result of smoothed camera motion to that of just using the raw video. We found that, for low frame-rates, the improvement is minimal. It seems that in the low frame-rate case the high-frequency motion we would like to remove has not been captured, due to the lower acquisition frequency. Therefore, smoothing the camera motion is only necessary when the frame-rate is high.

6.4 Feature Tracking

A feature tracker, using the global motion estimation as an initial estimate for feature motion, was implemented as described in Section 5. First, the global motion calculated in the stabilization step presented previously is used to estimate the position of the feature in the current frame. Then, a local tracking algorithm is applied to refine this estimate [4]. Using this feature tracking improved, visually, the accuracy of the positioning of overlays over long video sequences.
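The prediction step of this tracker can be sketched as follows, assuming the global motion is a rigid transform (rotation θ plus translation t). The names below are illustrative, not from the paper, and the local refinement step is omitted:

```python
import numpy as np

def predict_feature(pos, theta, t):
    """Apply the global rigid transform T to a feature position to get an
    initial estimate of its location in the next frame; a local template
    match would then refine this estimate."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])     # rotation about the optical axis
    return R @ np.asarray(pos, dtype=float) + np.asarray(t, dtype=float)

# Pure translation (theta = 0): the feature simply shifts with the camera.
print(predict_feature((10.0, 20.0), theta=0.0, t=(5.0, -3.0)))  # -> [15. 17.]
```

Because the prediction comes from the global registration, it is unaffected by local minima around the feature itself.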
Figure 4: The rigid frame technique of motion compensation is applied to stabilize a video sequence.
Figure 5: The motion compensation technique "overlays on raw video" is applied to stabilize the flag.
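The rigid frame technique shown in Figure 4 can be sketched as follows. This minimal version assumes per-frame integer translation estimates and circular warping via np.roll; a real implementation would use the full rigid model, interpolate, and crop:

```python
import numpy as np

def stabilize_to_reference(frames, shifts):
    """Warp each frame back by its accumulated (dy, dx) motion so that
    all frames align with the reference frame (frame 0)."""
    cum = np.cumsum(shifts, axis=0)                 # motion w.r.t. frame 0
    return [np.roll(f, shift=(-dy, -dx), axis=(0, 1))
            for f, (dy, dx) in zip(frames, cum)]

ref = np.arange(16.0).reshape(4, 4)
# Hypothetical shaky sequence: each frame shifted by the inter-frame motion.
frames = [ref,
          np.roll(ref, (1, 0), axis=(0, 1)),
          np.roll(ref, (1, 2), axis=(0, 1))]
shifts = [(0, 0), (1, 0), (0, 2)]                   # inter-frame estimates
out = stabilize_to_reference(frames, shifts)
print(np.allclose(out[2], ref))  # -> True
```

For the "smoothed camera motion" mode, one would low-pass filter the accumulated trajectory `cum` and compensate only the residual high-frequency component.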