
A Unified Approach for Registration and Depth in Depth from Defocus

Rami Ben-Ari
Orbotech Ltd.
E-mail: benari.rami@gmail.com
Keywords 3D reconstruction, registration, focus sensing, depth from defocus, extended depth of field, GPU computing
Abstract Depth from Defocus (DFD) suggests a simple optical set-up to recover the shape of a scene through imaging with a shallow depth of field. Although numerous methods have been proposed for DFD, less attention has been paid to the particular problem of alignment between the captured images. The inherent shift-variant defocus often prevents standard registration techniques from achieving the accuracy needed for a successful shape reconstruction. In this paper, we address the DFD and registration problems in a unified framework, exploiting their mutual relation to reach a better solution for both cues. We draw a formal connection between registration and defocus blur, find its limitations and reveal the weakness of the standard isolated approaches of registration and depth estimation. The solution is approached by energy minimization. The efficiency of the associated numerical scheme is justified by showing its equivalence to the celebrated Newton-Raphson method and by a proof of convergence of the emerging linear system. The computationally intensive approach of DFD, newly combined with simultaneous registration, is handled by GPU computing. Experimental results demonstrate the high sensitivity of the recovered shapes to slight errors in registration, and validate the superior performance of the suggested approach over two alternatives that apply registration and DFD separately.
1 Introduction
An essential goal of visual inference is to provide estimates of the scene structure from images.
It is well known that extracting shape from a single view is an ill-posed problem. Reliable
approaches therefore assume that several views of the scene are available to recover the shape. Common triangulation methods, such as stereo vision [4], [15], [26], structured light [12] or structure from motion [31], assume a pinhole image formation model, ignoring the lens blur. Imaging with a shallow depth of field creates a varying blur (defocus) encoding the shape of the scene. A single image is not sufficient for depth estimation, due to the unknown latent all-in-focus image of the scene. Depth from Defocus (DFD) thus attempts to extract depth from (at least) two images, captured at different focus settings [7, 8, 9, 29, 32, 33]. In DFD, one attempts to solve the inverse problem, i.e., to recover the shape that best agrees with the observations, according to the image formation model. Closely related methods are multi-aperture photography [11] and confocal stereo [14].
1.1 Problem Statement
In standard DFD, each point in the scene gains two unique observations, which are then compared in order to estimate the depth at that point. While in aligned images the corresponding points lie at the same coordinates, when the images are misaligned the correspondence is lost. DFD requires the scene to have a textured pattern (high frequencies). Hence, in most cases a slight misalignment drastically impacts the recovered shape, particularly due to sampling and quantization. Figure 1 demonstrates the degeneracy of the reconstructed shape under mild misalignments. Serious effort is therefore made to register the input images prior to the depth evaluation [8, 32, 33]. The problem of accurate registration was previously addressed in [32, 33], where the authors report alignment errors under 0.1 pixel for images captured in a controlled lab set-up. The alignment problem was also addressed in confocal stereo [14], where an optical model was developed to achieve sub-pixel accuracy.
Registration is a well-established domain in computer vision. Common methods today are able to align images even across different modalities (see [35] and the references therein). In an ideal case, the algorithm should be able to detect the same features (corners, curvatures or intensities) in all projections of the scene, regardless of the particular geometric and photometric transformations. However, the registration accuracy is bounded by several factors, such as discretization (sampling), photometric transformations, blur and noise [23]. A basic approach defines the registering transformation by best matching the dense appearance pattern between the observations. Other methods match a sparse set of points, called interest points, to better cope with geometric and photometric transformations. A well-known set of interest points is suggested by SIFT (Scale Invariant Feature Transform) [19].
Fig. 1: The effect of misalignment on 3D reconstruction from DFD. Top: a synthetic case of a slanted plane, misaligned by a translation of (1,1) pixels. Bottom: real images, warped here by a translation of (1.8,2.3) pixels to create a misalignment. On the left, a sample image (of two), with the recovered 3D reconstruction in the middle column. The shape reconstruction from the registered set is shown on the right for comparison. Note the distortions associated with rather slight displacements.
While dense appearance registration can cope with relatively smooth patterns under low contrast, sparse methods require the existence of unique cues in the image (usually corners), and offer a certain invariance with respect to zoom, rotation and translation. However, the aforementioned approaches rarely cope with defocus blur adequately. The registration of differently blurred images remains a particular challenge. To deal with this problem, focal-stack methods often abandon the standard registration approaches based on the observations and turn to device calibration, using specified targets on a controlled device [16]. The problem of registration in DFD escalates when an asymmetric defocus kernel is involved. In this case, registration features such as edges or corners are smeared directionally, resulting in an incorrect alignment. Figure 2 indicates the difficulty in the precise alignment of DFD sets, as sharp image features are matched with their blurred counterparts. Results are demonstrated in Fig. 2 for SIFT-based registration.
Current DFD methods separate the registration and depth evaluation stages [8, 9, 32, 33], ignoring the mutual relation between them. On the one hand, knowing the local blur (namely, the depth) can be utilized to match the observations, thereby resulting in an improved registration. On the other hand, an exact registration is a necessary condition for an accurate depth (blur) estimation. This coupled nature of registration and depth argues for a combined approach, driving a unified cost function toward a global minimum. The suggested paradigm demands that both registration and depth be satisfied simultaneously.
Fig. 2: The effect of defocus on registration, shown on two pairs of images captured/simulated at two different focus positions. Left: a patch from a real image set used for DFD, describing part of a spider. Note that the blurred features are meant to precisely match their sharp counterparts in registration (and vice versa). Right: a synthetic random-dot pattern, uniformly blurred by an asymmetric kernel. On the left, the overlaid red circles are SIFT key-points, while on the right, these points are mapped according to the SIFT registration. For comparison, the blue plus signs are our registration results, leading to the improved shape recovery shown in Section 5.
1.2 Previous Work
DFD is an active area of research with two main approaches: frequency-domain (Fourier) methods [17, 33, 35] and spatial methods [7, 9]. In the frequency domain, one often seeks the latent radiance image (the view of the scene in which all points are in focus) by deblurring the observations with a discrete range of blur kernels. The scale of the local kernel, indicating depth, is then obtained by minimizing an objective functional that satisfies the image formation model. The need for sufficiently large domains to estimate the local frequencies, as well as the discrete range of depths, often results in errors in regions of varying depth and at depth discontinuities.
The most popular minimization strategy for continuous modeling is the variational framework [34]. Variational methods in DFD suggest a spatial approach, allowing for a dense and continuous solution, while providing a well-grounded mathematical formalism that explicitly expresses the model assumptions. The joint optimization of data compliance and spatial smoothness efficiently fills in data values in ambiguous regions while preserving discontinuities. In this paper, we formulate our model for registration and depth in a single energy functional. To facilitate the model, avoiding the recovery of the radiance map, we encode the depth by the relative blur [8, 9] and assume a registration by a similarity transformation. This transformation applies to a rigid motion of the camera/scene, perpendicular to the optical axis, while the scale complies with the optical magnification. Our theoretical analysis shows that, regardless of the transformation, one can still relate the observations, modeling the DFD problem by the relative blur as in [2, 8].
The suggested numerical solution is in the spirit of [5, 8] and is based on a linearization of the Euler-Lagrange finite difference scheme, solved by nested (lagged) iterations. We furthermore follow a pyramidal coarse-to-fine approach to better cope with local minima in depth and with large misalignments. However, we suggest a slight modification of the previous approach of [9], to improve its stability. New theoretical observations justify our approach by showing the equivalence of our scheme to the celebrated Newton-Raphson method, and a proof of convergence of the emerging linear system establishes our numerical stability.
The registration in DFD plays the same role as weak calibration in stereo vision, where the fundamental matrix is eventually recovered. In stereo, images are captured with a wide depth of field (and are therefore approximated by the pinhole model) and are free of (depth-varying) blur. These conditions allow typical registration methods to be suitable for stereo weak calibration. Nonetheless, the DFD problem imposes a new challenge for registration, due to the shift-varying blur and the high sensitivity to registration errors (on the order of a fraction of a pixel). Combining depth estimation with registration was therefore previously found to be a practical solution [21]. However, as opposed to our approach, in [21] the blur is assumed to be constant in space, restricting this model to equifocal surfaces, far from the general case.
1.3 Parallel Implementation
Real-time performance of a 3D reconstruction method is commonly a desired property; often, a computationally expensive approach prevents a method from spreading and becoming popular. Triangulation methods such as stereo vision are the popular depth estimation techniques, offering acceptable accuracy and a capability for real-time performance. However, methods such as stereo vision are not easily applicable in many scenarios. Previously, a real-time DFD system was suggested by Watanabe et al. [32] using active illumination, allowing for computationally cheap decoding. While achieving real-time performance (at VGA image size), this approach required an additional projection system and was found to be highly sensitive to the object's reflectance properties. In this study, we address the problem of passive DFD, where no light emitters are involved, coping with the associated intensive computation by parallel processing on a GPU. Our implementation computes a dense 1 MPixel depth map in just 2 seconds, while conducting self-registration. The computational rate accelerates by an order of magnitude if DFD is activated without registration. These figures are based on a CUDA-C implementation of our fast-converging numerical scheme, run on a desktop NVIDIA GPU. To the best of our knowledge, this is the highest DFD computation rate reported in the literature.
1.4 Contribution and Experimental Tests
Our contribution is reflected in several important aspects, as specified below:
1. Accurate registration of images captured with a shallow depth of field introduces a challenge due to the varying blur. The reciprocal relation between registration and blur is exploited here to improve the accuracy of both the alignment and the shape recovery. The suggested approach presents a self-calibrating method (in terms of weak calibration in stereo vision) for the estimation of a depth indicator, similar to the disparity in stereo vision.
2. The relative blur model has been acknowledged as a reliable approximation for DFD, resulting in a plausible and stable depth indicator (see also the recent work in [18]). However, the terms under which the relative blur model applies to misaligned images had not yet been shown. In this paper, we formally derive the relative blur model in the presence of misalignment, exposing its limitations. This approach can also be used for the accurate registration of optically defocused images.
3. A new exploration enhances and justifies the stability of previously suggested numerical schemes. In this regard, the convergence of the emerging inner fixed-point iteration scheme is proved.
4. A newly derived parallel scheme for DFD allows an efficient implementation on the GPU. To the best of our knowledge, this presents the first real-time and self-calibrating DFD in the literature.
Our experimental tests demonstrate the limitations of standard appearance- and feature-based approaches in the accurate registration of images captured with a shallow depth of field. The test bed includes synthetic as well as real data, with quantitative and qualitative comparisons. Most of the currently available data sets are indoor images, often captured under controlled illumination in a lab facility. We hereby introduce new data captured in the wild and provide it for public use (additional figures and content are provided on the project page, www.cs.bgu.ac.il/~rba/dfd/DFDProjectPage.html, to meet length requirements). The results show successful recovery of depth, despite the misalignment involved. We further compare our method to a dense appearance-based registration and to the widely used SIFT matching. The improved results emerging from the suggested approach justify our paradigm.
1.5 Paper Organization
This paper is organized as follows. In Section 2, we begin by describing the image formation model for misaligned data sets. Section 3 then describes the objective functional, unifying registration and blur under a single coupled model. The numerical scheme for the energy minimization is then described in Section 4, including the modifications for a massively parallel scheme. Section 5 presents the experimental evaluations made on a diverse set of data, with a comparison to two alternatives based on separate registration and depth evaluation. Finally, the paper is concluded in Section 6 with a summary and discussion.
2 Image Formation Model
Let us define the latent all-in-focus view of the scene, namely the radiance image, by $r : \Omega \subset \mathbb{R}^2 \to [0, \infty)$, with $\Omega$ denoting a compact support. We consider two images of the scene captured under the same illumination regime but with different focus settings. Our first observation, $I_1 : \Omega \to [0, \infty)$, is formed by imposing the defocus operator $T_1$ on the radiance image:

$$I_1(x) = T_1\big(z(x)\big) \diamond r(x) \qquad (1)$$
Note that for a given camera, the defocus operator depends on the scene's shape, $z : \Omega \to (0, \infty)$, and on the optical setting, determined here by the subscript. By the optical setting, we refer to the position of the focal plane in the scene, or the distance between the image sensor and the lens in the camera. The new notation $\diamond$ denotes a shift-variant linear operation, to be defined in Section 2.1. We make use of 3D coordinate axes $(X, Y, Z)$, where $Z$ coincides with the optical axis of the camera and $(X, Y)$ span the perpendicular plane.
While previous DFD models assume perfect alignment between the images, we allow for a relative displacement and rotation of the camera/object prior to the capture of the second image (the first image is chosen as the reference, without loss of generality). Equifocal and planar motion impose a displacement in the $(X, Y)$ plane and a rotation around the optical axis ($Z$). On top of this geometric transformation there is an optical scale, often referred to as magnification [27]. The resulting transformation can then be described by a similarity map, $P$. Note that the scale is constant and not depth dependent, according to the geometric optics model (see proof in Appendix D.1 of [10]). Under these assumptions, the image formation for our second image emerges as:

$$I_2 = T_2\big(z(Px)\big) \diamond r(Px) \qquad (2)$$
While the defocus operator is attached to the camera frame, the shape and the radiance map are
both embedded in the object frame.
2.1 Defocus and Geometric Transformation
Defocused images are often described with linear models, applied by local convolution. To facilitate the problem, we adopt a parametric approach, assuming the shape of the kernel $h$ to be known and the depth encapsulated in the kernel's scale: $h : \Pi \times (0, \infty) \to [0, 1)$, with $\Pi \subset \mathbb{R}^2$ denoting a compact support and $z$ standing for the point-wise depth. This modeling reflects aberration-free optics, where the point-wise defocus kernel is determined by the corresponding scene depth and is explicitly independent of the spatial coordinates (in practice, residual spatial dependence can be compensated for by proper calibration, as conducted in [14]). Another common assumption is the normalization property:

$$\int_\Pi h\big(u, z(x)\big)\, du = 1, \quad \forall x \in \Omega \qquad (3)$$

reflecting energy conservation.
We now define the defocus operator in (1) by the following linear operation:

$$I_1 = T_1(z) \diamond r(x) := \int_\Pi h_1\big(u, z(x)\big)\, r(x - u)\, du \qquad (4)$$

Note that the defocus function $h_1(\cdot, \cdot)$ is determined by the focus setting (indicated by the subscript) and the local depth $z$. The second image in our DFD set is captured under a different optical setting and is further endowed with a similarity transformation, $Px = sRx + t$ (cf. Section 2), where $s$, $R$, $t$ denote the scale, the rotation matrix and the translation vector, respectively. The transformation $P$ relates the second image frame and the scene's frame, suggesting the following image formation model:
$$\tilde{I}_2 = \int_\Pi h_2\big(R^{-1}u, z(sx + t)\big)\, r(sx + t - u)\, du \qquad (5)$$
$$I_2 = \tilde{I}_2(Rx)$$
The image formation model in (5) relates the geometric transformation to the defocus operator. While the aberration-free assumption detaches the kernel from any dependency on position, the rotation creates a relative angle that should be considered in the image formation, particularly for asymmetric defocus kernels. This imaging model can be described by first applying the defocus with an inversely rotated kernel to a scaled and shifted radiance map, and then rotating the resulting image. Equation (5) extends previous image formation models in DFD to misaligned
object-camera frames, under similarity transformation. In the following propositions we prove
the properties of this model.
Proposition 1 Let $T$ be a defocus operator and $Px = sx + t$ a pure translation and optical scale. The defocus and geometric transformation are then commutative, i.e.:

$$T\big(z(Px)\big) \diamond r(Px) = I(Px) \qquad (6)$$

where $I(x) = T\big(z(x)\big) \diamond r(x)$. Note that we omit the optical setting index for readability.
Proof Since $R$ is an identity matrix, our extended image formation model in (5) reduces to:

$$T\big(z(Px)\big) \diamond r(Px) = \int_\Pi h\big(u, z(sx + t)\big)\, r(sx + t - u)\, du \qquad (7)$$

Using the change of variables $\tilde{x} = sx + t$ yields:

$$= \int_\Pi h\big(u, z(\tilde{x})\big)\, r(\tilde{x} - u)\, du = I(\tilde{x}) = I(Px) \qquad (8)$$

Note that the above claim holds for any defocus kernel satisfying the normalization and aberration-free properties. □
Proposition 2 Let the function $h(\cdot, \cdot)$ be a rotationally symmetric defocus kernel and $Px = sRx + t$ a similarity transformation. The defocus and geometric transformation are then commutative.

Proof Since the kernel is rotationally symmetric:

$$h(R^{-1}u) = h(u) \qquad (9)$$

The intermediate image in (5) is then:

$$\tilde{I} = \int_\Pi h\big(u, z(sx + t)\big)\, r(sx + t - u)\, du = I(sx + t) \qquad (10)$$

where the right equality follows from Proposition 1. Rotating the image $\tilde{I}$ as described in (5) yields the proof:

$$T\big(z(Px)\big) \diamond r(Px) = \tilde{I}(Rx) = I(sRx + t) = I(Px) \qquad (11)$$ □
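Before moving on, the shift-variant operation in (4) can be made concrete. Below is a minimal NumPy sketch, assuming a Gaussian kernel whose standard deviation is supplied per pixel by a depth-dependent map; the function names, the kernel radius and the sigma map are illustrative and not part of the paper's implementation.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalized 2D Gaussian kernel, satisfying the normalization (3)."""
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * max(sigma, 1e-6)**2))
    return k / k.sum()

def shift_variant_defocus(r, sigma_map, radius=7):
    """I(x) = integral of h(u, z(x)) r(x - u) du, cf. (4): a per-pixel blur
    whose kernel scale follows the (depth-dependent) map sigma_map."""
    H, W = r.shape
    rp = np.pad(r, radius, mode='edge')
    out = np.empty_like(r)
    for y in range(H):
        for x in range(W):
            k = gaussian_kernel(sigma_map[y, x], radius)
            out[y, x] = (k * rp[y:y + 2 * radius + 1,
                                x:x + 2 * radius + 1]).sum()
    return out
```

The per-pixel loop makes the operator's locality explicit; it is exactly this locality that the GPU scheme of Section 4.3 exploits for fine-grain parallelism.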
2.2 Relative Blur
Commonly, any defocus model assumes the existence of an underlying radiance map as the source of the defocused observations. However, in practice, the radiance map is a priori unknown and its recovery involves an unstable and noise-sensitive deblurring process. It has previously been shown [10] that, under certain conditions, one can recover the depth map by finding a blur function that maps the observations toward each other, eliminating the need to estimate the radiance. This blur function is known as the relative blur [7, 8]. In this section, we show that under certain conditions the relative blur paradigm can be extended to misaligned images. To this end, we start by expressing the radiance map in terms of the reference (untransformed) observed image, according to Eq. (1):

$$r(x) = T_1^{-1}(z) \diamond I_1(x) \qquad (12)$$
Indeed, the existence of the inverse defocus operator is a key assumption and will be discussed below. Satisfying Propositions 1 and 2 enables commuting the order of the transformation and the defocus in (2). This results in a new representation of our transformed observation, involving a new defocus operator independent of the transformation:

$$I_2(P^{-1}x) = T_2(z) \diamond r(x) \qquad (13)$$

Note that a key factor leading to this relation is the commutation property proved in Propositions 1 and 2. The two input images can now be related directly by substituting (12) into (13), setting $T = P^{-1}$ for convenience:

$$I_2(Tx) = T_2(z)\, T_1^{-1}(z) \diamond I_1 \qquad (14)$$
Similarly, one can derive the reference image from the transformed image by:

$$I_1 = T_1(z)\, T_2^{-1}(z) \diamond I_2(Tx) \qquad (15)$$
Note that neither of the involved defocus operators now depends on the geometric transformation $T$. Practically, equation (14) expresses a map between the observations as a two-stage process: first removing the defocus of $I_1$ according to the shape (and the corresponding focus setting) to obtain the radiance map, then applying a new defocus operator to form the second observation. The resulting image is then transformed geometrically to yield $I_2$. Equations (14) and (15) express two options for mapping points between the images, one conducted from the source $I_1$ to $I_2$ (14) and the second vice versa (15). Per point (pixel), one option involves a stable positive blur and the other an unstable negative alternative, namely a deblurring process. Seeking a stable scheme motivates decomposing the image domain into two disjoint sets, such that all points are mapped by positive blur.
To discuss the existence of the operator $T_i^{-1}$ in (14) and (15), let us restrict ourselves to an equifocal patch. The intensity pattern in this domain is then modelled by the convolution of the corresponding radiance patch with the defocus kernel. For a pillbox kernel type, $T_i$ is a non-invertible operator, according to the convolution theorem (zero crossings of the kernel frequencies). The inverse operator, however, exists for a Gaussian kernel. In practice, a Gaussian kernel suffers from a high condition number and is therefore sensitive to noise or even rounding errors. This problem is addressed by regularization, such as in the Wiener filter and the more recent approaches of [17, 29]. Interestingly, Subbarao et al. [29] handle the inverse defocus operator by restricting the intensity to behave locally as a third-order polynomial function. The inverse problem can then be formulated explicitly (see also [28]).
2.3 Gaussian Kernel
In a parametric approach, we assume the shape of the defocus kernel to be known. A Gaussian-shaped kernel can be justified as an approximation for an ideal lens [21]. Generalizing from the local approach to the whole domain allows replacing the defocus operator in (1) by a Gaussian blur $D_i(z) = G(V_i)$. In this representation, $V_i : z(x) \to \mathbb{R}^+$ is a strictly monotonic function with respect to $z$. Based on the self-similarity and cascade properties of Gaussian convolution, one can replace the two variance maps $V_i$, $i \in \{1, 2\}$, with a single function obtained from their difference, $V_r := V_2 - V_1$ (see Appendix A.3 for elaboration). The operators relating the two observations in (14) and (15) can now be described by the following Gaussians with a single unknown, called the relative blur:

$$D_2(z)\, D_1^{-1}(z) = G(V_r)$$
$$D_1(z)\, D_2^{-1}(z) = G(-V_r) \qquad (16)$$
For the rest of the paper, we refer to depth as the relative blur, occasionally interchanging these terms. The sign of this newly introduced function indicates the direction of blur, from $I_1$ to $I_2$ or vice versa. This relation creates a matching problem, as previously shown in the case of aligned images [6, 7, 8, 29], and is equivalent to the correspondence problem in stereo. Note that if the focal plane for imaging $I_1$ is closer to the camera, then positive $V_r$ indicates points near the camera and negative values parts further away.
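The cascade property underlying (16) is easy to verify numerically. The following is a minimal sketch, assuming ideal Gaussian kernels and an equifocal scene; the image content and blur levels are made up for illustration:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
r = rng.random((128, 128))        # stand-in radiance: high-frequency texture

sigma1, sigma2 = 1.0, 2.5         # kernel stds of the two focus settings
I1 = gaussian_filter(r, sigma1)   # first observation
I2 = gaussian_filter(r, sigma2)   # second observation (more defocused)

# Relative blur in variance units: V_r = sigma2^2 - sigma1^2 > 0, so I1 is
# mapped onto I2 by one extra Gaussian blur of std sqrt(V_r) (cascade property).
V_r = sigma2**2 - sigma1**2
I1_to_I2 = gaussian_filter(I1, np.sqrt(V_r))

print(np.abs(I1_to_I2 - I2).max())   # small residual: G(V_r) * I1 ~ I2
```

Reversing the roles would require deblurring I2 — the unstable "negative blur" direction that the sign of V_r is used to avoid.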
3 The Energy Functional
The derived matching paradigm can be formulated as an optimization problem. Our new model endows two unknowns: the relative blur function $V_r$ and the registering transformation $T$. The optimization problem is formulated as the minimization of an energy functional, consisting of a data fidelity term $E_d$ and a smoothness prior $E_s$:

$$[\tilde{V}_r(x), \tilde{T}] = \arg\min_{V_r, T}\; E_d[V_r, T] + \alpha\, E_s[|\nabla V_r|] \qquad (17)$$

where the upper tilde denotes the evaluated values and $\alpha$ is a balancing weight.
The data term reflects the image formation models (14) and (15). Considering a Gaussian defocus kernel yields (see (16)):

$$E_d[V_r, T] = \int_\Omega |G(V_r) * I_1 - I_2(Tx)|_{\ell_1}\, H(V_r) + |G(-V_r) * I_2(Tx) - I_1|_{\ell_1}\, \big(1 - H(V_r)\big)\, dx \qquad (18)$$

Note that the $\ell_1$ penalization norm for the data fidelity introduces robustness to outliers. The Heaviside function $H(V_r)$ divides the image into two disjoint and complementary sub-domains, according to the sign of $V_r$:
$$H(V_r) = \begin{cases} 1 & V_r \ge 0 \\ 0 & V_r < 0 \end{cases} \qquad (19)$$
As the regularization term, we employ total variation (TV) to favor piecewise-smooth depth maps:

$$E_s[V_r] = \int_\Omega |\nabla V_r|_{\ell_1}\, dx \qquad (20)$$
In practice, a modified $\ell_1$ norm is used in both the data and regularization terms, $|f|_{\ell_1} \approx \sqrt{f^2 + \epsilon^2}$, with $\epsilon$ a small constant avoiding numerical instability. In fact, this parameter has a broader impact and can be used to tune the norm between $\ell_1$ and $\ell_2$ [3].
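For concreteness, the following is a minimal sketch of the data term (18) in the simplest case of an equifocal patch, i.e., a single scalar $V_r$, with $I_2$ already warped by $T$; the smoothed Heaviside anticipates (35), and all parameter values are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

EPS = 1e-3

def mod_l1(f):
    """Modified l1 norm |f| ~ sqrt(f^2 + eps^2), tunable between l1 and l2."""
    return np.sqrt(f**2 + EPS**2)

def heaviside(v, eps=1e-2):
    """Smoothed Heaviside, softly labeling the blur direction (cf. (35))."""
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(v / eps))

def data_energy(I1, I2w, Vr):
    """Discrete counterpart of the data term (18) for a scalar relative
    blur Vr (an equifocal patch); I2w is I2 pre-warped by T."""
    sig = np.sqrt(abs(Vr))
    o1 = gaussian_filter(I1, sig) - I2w     # G(V_r) * I1 - I2(Tx)
    o2 = gaussian_filter(I2w, sig) - I1     # G(-V_r) * I2(Tx) - I1
    h = heaviside(Vr)
    return (mod_l1(o1) * h + mod_l1(o2) * (1.0 - h)).sum()
```

For a spatially varying $V_r$ the Gaussian blur becomes shift-variant (cf. the sketch in Section 2.1) and the TV term (20) is added per pixel.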
A necessary condition for the minimizer of (18) is given by the Euler-Lagrange (EL) equations, which yield a pair of coupled PDEs. Variation with respect to $V_r$ results in:

$$\frac{\partial E}{\partial V_r} = \Psi'(o_1^2)\, o_1\, \hat{I}_{1v}\, H(V_r) + \Psi'(o_2^2)\, o_2\, \hat{I}_{2v}\, \big(1 - H(V_r)\big) - \alpha\, \mathrm{div}\big(\Psi'(|\nabla V_r|^2)\, \nabla V_r\big) = 0 \qquad (21)$$

where $\Psi'(\cdot)$ is the derivative of the modified $\ell_1$ norm (defined explicitly in (32)) and the following abbreviations are used to allow a compact representation:
$$o_1 := G(V_r) * I_1 - I_2(Tx), \qquad o_2 := G(-V_r) * I_2(Tx) - I_1 \qquad (22)$$
$$\hat{I}_{1v} := \frac{\partial}{\partial V_r}\big[G(V_r) * I_1\big], \qquad \hat{I}_{2v} := \frac{\partial}{\partial V_r}\big[G(-V_r) * I_2(Tx)\big]$$
The new terms $\hat{I}_{1v}$ and $\hat{I}_{2v}$ are functional derivatives with respect to $V_r$. They endow a high sensitivity and will be elaborated on in Section 4. The additional hat sign indicates a function after the defocus blur operation.
The second PDE emerges from the variation with respect to the transformation components $\tau_i$:

$$\frac{\partial E}{\partial \tau_i} = \int_\Omega \Psi'(o_1^2)\, o_1\, \partial_{\tau_i} I_2(Tx)\, H(V_r) + \Psi'(o_2^2)\, o_2\, \partial_{\tau_i}\big[G(-V_r) * I_2(Tx)\big]\, \big(1 - H(V_r)\big)\, dx = 0 \qquad (23)$$

The transformation entries $\tau_i$ correspond to the scale, rotation and shift. For the details of the derivation of the EL equations, the reader is referred to Appendices A.1 and A.2.
4 The Numerical Scheme
There are several approaches to solving the PDEs in (21) and (23), ranging from explicit gradient descent to semi-implicit schemes [22, 24, 25]. Recently, semi-implicit methods have shown superior performance in terms of stability and convergence rate. In this paper we follow the semi-implicit approach of [8, 22], based on a linearization of the finite difference scheme, but with a few modifications.
Discretization of (21), while linearizing the scheme by lagged iterations, yields:

$$\Psi'\big((o_1^k)^2\big)\, o_1^{k+1}\, \hat{I}_{1v}^k\, H(V_r^k) + \Psi'\big((o_2^k)^2\big)\, o_2^{k+1}\, \hat{I}_{2v}^k\, \big(1 - H(V_r^k)\big) - \alpha\, \mathrm{div}\big(\Psi'(|\nabla V_r^k|^2)\, \nabla V_r^{k+1}\big) = 0 \qquad (24)$$

where the superscript $k$ in the iterations stands for a known function (obtained from the previous iteration step) and $k + 1$ indexes the unknown entity. Note that, as opposed to previous approaches, the norm derivatives $\Psi'(\cdot)$ are set with a lagged iteration index, to avoid nesting and further computation.
The functional derivatives $\hat{I}_{1v}$ and $\hat{I}_{2v}$ in (24) are often derived analytically and then discretized. However, the resulting analytical expression suffers from numerical instability due to its high sensitivity to $V_r$, particularly at small values (the exponent and the coefficient are inversely proportional to $V_r$ and to the cube of $V_r$, respectively). We have thus employed the following approximation for the computation of $\hat{I}_{1v}$ and $\hat{I}_{2v}$:

$$\hat{I}_v \approx \frac{\hat{I}(V_r + \Delta V_r) - \hat{I}(V_r - \Delta V_r)}{2\, \Delta V_r} \qquad (25)$$
The upper hat indicates a defocused image. In addition to higher stability, this approximation can directly accommodate an implicit and discretely represented defocus kernel (e.g., in the case of a coded aperture [17]). The intensive computational burden of the multiple defocus operations is handled here by parallel processing.
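A minimal sketch of the central-difference approximation (25), again in the scalar (equifocal) setting with an ideal Gaussian kernel; the step size dV is a made-up value:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def defocused(I, Vr):
    """I-hat: the image blurred with (non-negative) variance Vr."""
    return gaussian_filter(I, np.sqrt(max(Vr, 0.0)))

def I_v(I, Vr, dV=0.05):
    """Central-difference approximation (25) of the derivative of the
    defocused image with respect to the relative blur V_r. Works for any
    kernel available only as a black-box blur (e.g. a coded aperture)."""
    return (defocused(I, Vr + dV) - defocused(I, Vr - dV)) / (2.0 * dV)
```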
The obtained numerical scheme is still non-linear with respect to $V_r$. Following [5, 8], we introduce $V_r^{k+1}$ through small updates around the previous iteration:

$$V_r^{k+1} = V_r^k + dV_r^k \qquad (26)$$

Describing the update as a small perturbation allows a first-order approximation:

$$o_i^{k+1} = o_i^k + \hat{I}_{iv}^k\, dV_r^k, \qquad i \in \{1, 2\} \qquad (27)$$

Substituting the linearized terms into the lagged iteration scheme of (24) yields a linear system in the newly introduced variable $dV_r^k$. After discretization, a system of linear algebraic equations emerges, described symbolically by:

$$A^{k-1}\, dV_r^k = b^{k-1} \qquad (28)$$
In this scheme, the spatial derivatives are discretized by central differences. In the following sections, we elaborate on the iterative solution of this linear system, revealing its equivalence to the celebrated Newton-Raphson scheme.

The remaining unknowns are now the registration parameters. Stabilized cameras endure a relatively small misalignment between the captured images. It is therefore reasonable to use an explicit scheme for the registration PDE in (23). The image warping associated with the alignment uses interpolation and thus blurs the intensity edges. This blur component adds to the blur caused by the optical system and therefore affects the depth evaluation. To this end, we employed a bi-cubic interpolation to reduce this effect, obtaining only a mild impact on the defocus.

To cope with the emerging coupled PDEs, we conduct a joint minimization, modifying the relative blur and the registration parameters successively at each iteration. A Gaussian pyramid is further applied to handle large misalignments and local minima.
4.1 Relation to Newton-Raphson Iterative Method
A popular numerical approach for the efficient solution of Euler-Lagrange equations conducts the linearization on the finite difference scheme [4, 5, 8]. While this is commonly preferred in order to cope with large variations, in this section we bring a theoretical justification for the efficiency of such an approach. To establish this claim, let us restrict ourselves to the data term, assuming without loss of generality that $V_r > 0$. Under these conditions, Eq. (28) reads as:
$$dV_r = -\Psi'(o_1^2)\, \frac{\hat{I}_1 - I_2(Tx)}{\hat{I}_{1v}} \qquad (29)$$

Defining the numerator as $f := \hat{I}_1 - I_2(Tx)$ yields the following relation:

$$dV_r \propto -\frac{f(V_r)}{f'(V_r)} \qquad (30)$$
The term $\Psi'(o_1^2)$ is a weight function emerging from our penalization, and would take the unit value for a quadratic norm. The updating scheme in (30) is equivalent to the well-known Newton-Raphson method, which has at least a quadratic convergence rate (for a continuously differentiable function with a non-zero derivative at the solution). This equivalence reflects the efficiency of our semi-implicit scheme.
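As a one-dimensional toy illustration of (29)-(30) (the residual function below is made up; it merely stands in for the per-pixel matching residual):

```python
def newton(f, fprime, v, iters=8):
    """Newton-Raphson updates v <- v - f(v)/f'(v), the form taken by the
    semi-implicit update for the data term alone (eqs. (29)-(30))."""
    for _ in range(iters):
        v -= f(v) / fprime(v)
    return v

# Toy residual standing in for f(V_r) = I1-hat(V_r) - I2(Tx) at one pixel.
f = lambda v: v**2 - 2.0
fp = lambda v: 2.0 * v
print(newton(f, fp, v=1.0))   # -> 1.414213..., quadratic convergence
```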
4.2 Solution of fixed-point iterations and proof of convergence
Eq. (28) presents a sparse linear system for the discrete vector of variables $dV_r$, and can now be approached for a solution. The most relevant tools for this task are the iterative schemes known as relaxation methods. This category contains two classes: the stationary methods, e.g., Jacobi, Gauss-Seidel (GS) or SOR, and the non-stationary approaches, e.g., conjugate gradients. Gauss-Seidel offers an efficient iterative approach for large and sparse linear systems. Although GS is inherently sequential, a known relaxation called Red-Black allows for a parallel alternative suitable for GPU computing. In this section, we prove the convergence of the inner fixed-point iterations described in (28).
Lemma 1 The fixed-point coefficient matrix in (28) is strictly diagonally dominant.

Proof Let us consider an image of size $M \times N$. The coefficient matrix in (28) is then a sparse matrix of size $MN \times MN$, having three diagonal and two off-diagonal bands. The matrix entries are obtained from the finite difference scheme in (24). In a row-stacking format, the diagonal terms $a_{m,m}$ can be decomposed into two parts, given by:

$$a_{m,m} = D_m + S_m \qquad (31a)$$
$$D_m := \Psi'(o_1^2)\, \hat{I}_{1v}^2\, H(V_r) + \Psi'(o_2^2)\, \hat{I}_{2v}^2\, \big(1 - H(V_r)\big) \qquad (31b)$$
$$S_m := \frac{\alpha}{2}\big(4\Psi'_{m,m} + \Psi'_{m,m-1} + \Psi'_{m,m+1} + \Psi'_{m+M,m} + \Psi'_{m-M,m}\big) \qquad (31c)$$
where $D_m$ and $S_m$ are obtained from the data and smoothing terms, respectively, and:

$$\Psi' := \big(\sqrt{|\nabla V_r|^2 + \epsilon^2}\big)^{-1}, \qquad \Psi'(o^2) = \big(\sqrt{o^2 + \epsilon^2}\big)^{-1} \qquad (32)$$
The sum of the absolute off-diagonal entries of $A^k$ in the $m$-th row, $R_m$, is practically given by $S_m$:

$$R_m := \sum_{n=1,\, n \ne m}^{MN} |a_{m,n}| = |S_m| = S_m \qquad (33)$$
where the right equality is due to the fact that $\Psi' > 0$ (see the definition in (32)). All assignments are made for iteration $k$. To prove the diagonal dominance property of $A^k$, we first calculate the (signed) distance between the diagonal entry and the sum of the off-diagonal terms:

$$K_m := |a_{m,m}| - R_m = D_m \qquad (34)$$
As $S_m$ cancels out, the remainder is $D_m$ in (31b). We are now interested in the sign of this term. To this end, let us examine the components of $D_m$ as shown in (31b). The first term is positive, $\Psi'(o^2) > 0$, by definition (see (32)). Assuming a radiance map with high-frequency content (a necessary condition for DFD) assures $\hat{I}_{1v}, \hat{I}_{2v} \ne 0$, inferring the strict positivity of the corresponding quadratic terms in (31b). Finally, our approximation of the Heaviside function:

$$H_\epsilon(V_r) = \frac{1}{2}\Big(1 + \frac{2}{\pi}\arctan\big(V_r / \epsilon\big)\Big), \qquad \epsilon > 0 \qquad (35)$$

yields $0 < H(V_r) < 1$. Having shown the positive sign of all the components of $D_m$ proves the positivity of this term, and therefore of the equal term $K_m$ in (34). The coefficient matrix $A^k$ in (28) is hence strictly diagonally dominant. □
Corollary 1: The fixed-point matrix is positive definite.

Proof According to the Gershgorin circle theorem, the eigenvalues of $A^k$ must lie within discs of radius $R_m$, as derived in (33). The diagonal entries of $A$ satisfy $a_{m,m} > 0$, due to the positivity of its components $S_m$ and $D_m$, as proved in Lemma 1. Since the matrix $A^k$ is strictly diagonally dominant and all its diagonal elements are positive, all the eigenvalues are strictly positive, and therefore $A^k$ is positive definite. □
Corollary 2: The inner Gauss-Seidel iteration in (28) is convergent.

Proof The convergence properties of the Gauss-Seidel iterative scheme depend on the coefficient matrix $A^k$. Having proven the strict diagonal dominance of $A^k$ in (28) infers the convergence of the emerging iterative scheme [1]. □
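The chain Lemma 1 → Corollary 2 is easy to check numerically. A minimal sketch with a small strictly diagonally dominant system of the same flavor as (31) — a 1D Laplacian-like band plus a positive data weight; all sizes and values are made up:

```python
import numpy as np

def gauss_seidel(A, b, iters=200):
    """Plain Gauss-Seidel sweeps; convergent whenever A is strictly
    diagonally dominant (Corollary 2)."""
    x = np.zeros_like(b)
    for _ in range(iters):
        for m in range(len(b)):
            x[m] = (b[m] - A[m, :m] @ x[:m] - A[m, m + 1:] @ x[m + 1:]) / A[m, m]
    return x

n = 16
A = np.zeros((n, n))
np.fill_diagonal(A, 4.0 + 0.5)        # smoothness part S_m plus data weight D_m > 0
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = -1.0  # neighbor couplings
b = np.ones(n)
print(np.abs(A @ gauss_seidel(A, b) - b).max())   # residual -> ~0
```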
4.3 Parallel implementation
The obtained numerical scheme is computationally intensive, particularly due to the recursive defocus operations. Although our defocus operator is shift-variant (in contrast to a shift-invariant convolution), its locality allows for fine-grain parallelism. In fact, such a case is typical of many PDE solvers [13, 30, 20] and particularly of DFD, as previously shown in [2]. For real-time operation (on reasonable image sizes), we adapt our numerical scheme to harness the fine parallelism available on the GPU. The main challenge in this effort was the efficient solution of the sparse linear system in (28). On the CPU, such linear systems are often solved by the Gauss-Seidel or SOR relaxation methods, which are inherently sequential in nature. However, the classic Gauss-Seidel scheme can be relaxed by the Red-Black strategy to allow for a parallel alternative [30]. A single red-black relaxation then consists of two half iterations, each updating every alternate point (called the red and black points). The updates of all the red/black points are now parallel, as the only dependencies are the neighboring counter-color pixels from the previous half iteration.
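A schematic NumPy version of one red-black sweep, here for a 5-point Poisson-type model problem rather than the full system (28); each vectorized half-sweep corresponds to what would be a single GPU kernel launch:

```python
import numpy as np

def red_black_sweep(u, f, h2=1.0):
    """One red-black Gauss-Seidel relaxation for the 5-point stencil of the
    model problem laplace(u) = f. All points of one color are updated at
    once, using only counter-color neighbors -- the source of parallelism."""
    H, W = u.shape
    yy, xx = np.mgrid[0:H, 0:W]
    for color in (0, 1):                                  # red, then black
        m = ((yy + xx) % 2 == color)
        m[0, :] = m[-1, :] = m[:, 0] = m[:, -1] = False   # keep boundary fixed
        nb = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
              np.roll(u, 1, 1) + np.roll(u, -1, 1))       # 4-neighbor sum
        u[m] = 0.25 * (nb[m] - h2 * f[m])
    return u
```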
Finally, our gradient descent steps introduce an explicitly parallel process, similar to the implementation in [2]. Note that the integral (sum) computation in (23) can be efficiently distributed by the reduction process on the GPU. Another heavy process harnessing GPU computing is the image warp, which involves repetitive interpolations.
Our scheme typically converges in fewer than 20 iterations, over 2-3 levels of the pyramid. The fast-converging scheme, together with the fine-grain GPU parallelism, yields a rate of 0.5 MP of depth values per second on an NVIDIA GeForce GTX 460. This performance ramps up by a factor of 10 when the registration mode is off, i.e., when DFD is applied to aligned images. In spite of the high throughput, there is still room for speed-up by device optimization.
5 Experimental Validation
In this section, we demonstrate the effectiveness of our approach on three types of data sets: synthetic, simulated (where real images were computationally misaligned) and newly introduced real data captured in the wild. We further demonstrate the shortcomings of standard registration methods, particularly in the case of an asymmetric defocus kernel.
For comparison, we conduct the commonly separated approach, where the images are first registered and then used for DFD. The performance is then measured by the accuracy of the registration (where applicable) and of the recovered shape. To this end, we employ two different
registration methods. The first is based on dense appearance matching, formally minimizing the following objective functional:

$$E_R[T] = \int_\Omega \sqrt{\big[I_1 - I_2(Tx)\big]^2 + \epsilon^2}\, dx \qquad (36)$$

We minimize this energy functional by gradient descent to obtain the registration, similarly to our unified approach. The second comparison is made with a registration based on SIFT [19] key-points, well known for its robustness and its invariance to scale and rotation. The registration is then based on fitting a similarity transformation to the corresponding points (found according to their descriptors), using least squares.
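For reference, the following is a minimal sketch of that least-squares fit, using the standard linear parameterization sR = [[a, -b], [b, a]]; the key-point matching itself (SIFT descriptors) is assumed given:

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform: dst ~ s*R*src + t.
    src, dst: (N, 2) arrays of matched key-point coordinates.
    With s*R = [[a, -b], [b, a]], the problem is linear in (a, b, tx, ty)."""
    N = src.shape[0]
    A = np.zeros((2 * N, 4))
    A[0::2] = np.c_[src[:, 0], -src[:, 1], np.ones(N), np.zeros(N)]
    A[1::2] = np.c_[src[:, 1],  src[:, 0], np.zeros(N), np.ones(N)]
    rhs = dst.reshape(-1)                  # interleaved [x0, y0, x1, y1, ...]
    a, b, tx, ty = np.linalg.lstsq(A, rhs, rcond=None)[0]
    s = np.hypot(a, b)                     # scale
    theta = np.arctan2(b, a)               # rotation angle
    return s, theta, np.array([tx, ty])
```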
5.1 Synthetic and simulated images
Our synthetic and simulated data consist of four image pairs, as shown in Fig. 3. The first synthetic data set encodes the blur variation of a slanted plane, and the second demonstrates an equifocal plane defocused by an asymmetric kernel. The asymmetric kernel used was a Gaussian whose left side (negative x values) was zeroed out. This simple example demonstrates a limitation of standard, separate registration methods, where the directional blur is commonly perceived as a displacement, resulting in registration errors.
We further test the methods on simulated real data, where one image is intentionally warped to create a controlled misalignment. To this end, the celebrated Cup pair from [33] and the Cylinders pair from [7] were used.
Having the ground-truth transformation at hand, we report the registration accuracies in Table 1. Our results show high accuracy in all the test cases, with errors below one quarter of a pixel in translation, and up to 0.001° in rotation and 0.1% in scale. In separate registration and DFD, however, we find significantly higher errors. For instance, in the asymmetric defocus test case (ASK), both competing methods drift, as expected, due to the directional blur. While the SIFT registration succeeds in two of the four tests, larger errors are obtained in the Cylinders case, while the lack of key-points in the Cup set yields a registration failure (indicated by dash signs in the table). The SIFT results are based on the code provided by David Lowe at http://www.cs.ubc.ca/~lowe/keypoints/; we also tested the OpenCV implementation on all the test cases, and it resulted in key-points which in most cases were wrongly matched.
Errors in registration impact the recovered depth map. Fig. 4 demonstrates how conducting DFD on raw (misaligned) data harms the depth map evaluation. Although prior registration improves this outcome, a few noticeable artifacts still remain. Inconsistent depth values are
Fig. 3: Synthetic and simulated data sets. Left to right: synthetic plane, equifocal surface with asymmetric blur, Cup and Cylinders. Designated squares show zoomed-in image patches to better visualize the change in blur. Note how the changes in the optical setting yield a different range of blur.
Ground Truth Misalignment

            t_x     t_y     θ        s
Plane       2.0     1.0     1.0°     1.0%
ASK         3.5     1.0     0.0°     0.0%
Cup         1.5     0.8     0.0°     0.5%
Cylinder    1.2     2.5    -0.5°     1.0%

Registration Errors

            Appearance                        SIFT                              Our
            |Δt_x|  |Δt_y|  |Δθ|     |Δs|     |Δt_x|  |Δt_y|  |Δθ|     |Δs|     |Δt_x|  |Δt_y|  |Δθ|     |Δs|
Plane       0.76    0.07    0.875°   0.5%     0.01    0.02    0.0      0.0      0.04    0.01    0.0      0.0
ASK         0.45    0.0     0.0      0.0      0.61    0.01    0.0      0.0      0.24    0.0     0.0      0.0
Cup         0.0     0.09    0.06°    0.1%     -       -       -        -        0.08    0.02    0.001°   0.1%
Cylinder    0.02    0.37    0.68°    0.06%    0.05    0.14    0.05°    0.0      0.01    0.03    0.006°   0.1%
Table 1: The misalignments and registration errors for the different approaches. Top: the ground truth, given by the translation (t_x, t_y) in pixels, the rotation (θ) and the incremental scale between the images (s). Bottom: the registration errors for the following approaches. Appearance: the dense appearance-based registration; SIFT: the sparse SIFT-based registration; Our: our unified DFD and registration method. The resulting depth maps, influenced by the registration accuracy, are shown in Fig. 4. Dash signs in the SIFT results indicate a registration failure due to a lack of key-points.
particularly present in the asymmetric kernel example, due to the registration errors reported in Table 1. Although the SIFT matching outperforms the appearance method, the best results are still obtained by the unified approach, which successfully copes with all the test cases. Note the results for the aligned data (without the added misalignment), shown for comparison in the right column of Fig. 4. In the Cup case, for instance, our depth map is comparable to the aligned
Fig. 4: Evaluated depth (relative blur) for the data sets in Fig. 3 (see also Table 1 for the misalignments and registration errors). The depth color code is shown at the bottom. (a) Without registration, (b) appearance-based registration, (c) SIFT-based alignment, (d) our approach, (e) DFD result for the original aligned data, shown for comparison. From top to bottom: synthetic plane, constant defocus with asymmetric kernel, Cup and Cylinders. Note the significant effect of the misalignment in (a). The unified approach outperforms the tested alternatives by showing results that are closest to the depth maps obtained from aligned data. Note the example of asymmetric defocus: while the separate approaches include artifacts, ours yields a constant depth as expected (the slight color difference in the blur level w.r.t. the aligned data is of sub-pixel order). The failure of SIFT matching on the Cup set is due to a lack of key-points.
data, with a reported registration accuracy of 0.1 pixel. The preservation of discontinuities in the depth maps is a consequence of our TV regularization term.
5.2 Real data
We further extend our test bed with new real data captured in nature (courtesy of Ilia Lutsker, Orbotech Ltd.). The data includes images of tiny creatures, scaled in the intermediate domain between the microscopic and the typical macroscopic fields, where, to the best of our knowledge, no previous data for DFD has been shown. In particular, the images were captured in the wild from a living antlion, a spider and two insects of the Aphrophoridae family. The camera used was a semi-professional high-end device, a Canon 5D Mark II. The images were captured with the camera's color sensor operating in 21 MP
Misalignment

                Appearance                        SIFT                               Our
                t_x     t_y     θ        s        t_x     t_y     θ         s        t_x     t_y     θ         s
Antlion         1.67    1.1     0.04°    1%       2.06    1.48   -0.185°    0.9%     2.07    1.46   -0.181°    0.9%
Spider         -0.126   0.304   0.084°   0.4%     1.15   -0.43   -0.169°    2.8%    -0.28   -0.17    0.006°    2.3%
Aphrophoridae   1.19   -1.13    0.044°   0.6%     -       -       -         -        1.2    -0.91   -0.02°     0.7%
Table 2: The misalignments evaluated for the real data, indicated by the translation (t_x, t_y) in pixels, the rotation (θ) and the incremental scale between the images (s). Appearance: registration based on appearance similarity; SIFT: SIFT-based registration; Our: our unified DFD and registration approach. Note the relatively large scale (over 2%) in the spider case. While the SIFT registration agrees with our method in the antlion case, there are discrepancies in the spider case and a failure in the Aphrophoridae case, due to a lack of key-points. Inaccuracy in the registration yields artifacts in the recovered depth maps; see Figs. 6 and 7 for details.
and 14-bit raw mode, then demosaiced in Adobe Camera Raw, converted to gray-level and down-sampled to 1 MP (for the imaging parameters, the reader is referred to the project page, www.cs.bgu.ac.il/~rba/dfd/DFDProjectPage.html).
Images of our real data set are shown in Fig. 5 (in color). The scenarios include an antlion on a groundsel-shaped plant, a spider on top of a flower and two Aphrophoridae on a nest covered by white bubbles. This challenging data set introduces fine structures such as the sepals
Fig. 5: The real data set. From left to right: antlion, spider and Aphrophoridae.
and the antennae of the antlion, the spider's legs and extremely thin elements like the spider's hairs.
The focus setting was changed by mounting the camera on a rail and manually moving it between the shots along the optical axis. The captured images endure a misalignment due to the camera displacement, and a relative scale caused by the optical magnification. The misalignment components are up to a 2 pixel shift, 0.2° rotation and 2.3% scale (based on estimation). Table 2 summarizes the misalignments evaluated by our approach and the competing methods.
The recovered depth maps are shown in Figures 6 and 7. Interesting details in the results are the correct depth order of the antlion's and spider's limbs. As Fig. 6 shows, while the misalignment in the raw data causes a corrupted depth map, prior registration indeed improves the results. However, a qualitative examination shows that, while comparable results are obtained with SIFT matching for the antlion, overall the best results are still achieved in the unified framework. In our depth map for the antlion, the distance order of the limbs, particularly the wings, antennae and legs, is correctly recovered. The depth resolution is sufficient to expose the slanted angle of the insect's wing. More details are shown in Fig. 7. Interestingly, the results reveal the existence of a second wing underneath the front one, hardly observable from the images (see the lower row of the antlion close-ups in Fig. 7). This observation indicates a sub-millimetre depth resolution. In the spider case, the flower petals and the spider's limbs, such as the head, legs and the claws in front, are well recovered with their relative depth order. The result in the Aphrophoridae case indicates the nest structure and the insects residing in it.
Comparative examination of Figs. 6 and 7 shows the benefits of our approach. While the appearance-based registration yields significant artifacts in the antlion case, the SIFT-based alternative shows results comparable to our unified approach. In the spider test set, however, the unified approach outperforms SIFT registration, as inconsistent depths appear in the flower, the spider head and fine structures such as the spider legs. Furthermore, due to the relatively smooth pattern in the Aphrophoridae set, the SIFT approach fails (caused by a lack of key-points). This demonstrates another case where our appearance-based and unified framework has an advantage over sparse and separate approaches.
We further provide a 3D visualization of our depth maps in Fig. 8. Eminent details are visible in this representation. Note the 3D structure of the antlion's double-folded wing, the flower petals in the spider image, as well as the 3D poses of the spider's legs, head and claws.
Fig. 8: Color-coded 3D visualization of the real data set (please refer to the electronic copy or the project page for better visibility).
In addition to depth, the relative blur actually labels the sharper pixels, by determining the direction of blur between the two observations (see Section 2.3). This characteristic can be further utilized to fuse the input images into a new view with improved sharpness. We call this new image the pseudo-radiance (in contrast to the radiance, where all image parts are in focus) and compute it by the following fusion equation:

$$\tilde{r}(x) = I_1(x)\, H(V_r) + I_2(Tx)\, \big(1 - H(V_r)\big) \qquad (37)$$
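A minimal sketch of the fusion rule (37), using the smoothed Heaviside of (35) as the soft label; I2w denotes I_2 already warped by the recovered T:

```python
import numpy as np

def pseudo_radiance(I1, I2w, Vr, eps=1e-2):
    """Fusion (37): each pixel is taken (softly) from the sharper of the two
    observations, as labeled by the sign of the relative blur V_r."""
    h = 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(Vr / eps))   # smoothed Heaviside
    return I1 * h + I2w * (1.0 - h)
```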
where the Heaviside function provides the (soft) labeling. The resulting pseudo-radiance images obtained from our real data are shown in Figure 9. Note how these images are fused seamlessly to suggest a sharper view. Close-ups show relevant parts of the input images and their corresponding sharper variants fused into the pseudo-radiance map. This is an outcome of our reliable depth ordering. On the project page (www.cs.bgu.ac.il/~rba/dfd/Movies/), we also provide 3D animations created by rendering our recovered shapes with the pseudo-radiance image.
6 Summary and Discussion
Depth from Defocus (DFD) exploits optical blur to infer the shape of the scene from just two images. A strong requirement of this approach is accurate alignment between the input images, as a slight misalignment causes a significant error in the recovered depth map. Previous DFD methods perform the registration between the images as a preprocessing stage. However, as the depth of field is decreased to allow for a higher depth resolution, the blur dominates, deteriorating the capability of standard approaches to accurately register the inputs. While robust registration methods, such as SIFT point matching, show invariance to rotation and scale, and partially to constant blur, they commonly lack the capability to accurately register images under shift-variant, and particularly asymmetric, blur.
Based on the coupled nature of registration and blur, we propose a unified model for depth from defocus. To this end, we formally derive the image formation model and the corresponding relative-blur paradigm in the presence of misalignment. Assuming a rigid in-plane motion of the object/camera, the observed images are related by a similarity transformation (in addition to the optical blur). Accordingly, the intuitive registration conducted by warping one image toward the other is justified for a rotationally symmetric defocus kernel. However, in the case of an arbitrary kernel this relation is satisfied only for a pure translation of the object.
The suggested framework accommodates both registration and depth (as relative blur) in
a single unified objective functional. The solution is then approached by a numerical scheme
based on fixed-point iterations and gradient descent. We further show the equivalence of the
derived numerical scheme to the celebrated Newton-Raphson method and prove the conver-
gence of the iterative scheme associated with our linear system. This reflects the stability of the
derived numerical scheme. A pyramidal approach (of up to 2-3 levels) allows recovery of large
misalignments and copes with local minima.
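The flavor of this convergence guarantee can be demonstrated on a toy system: for a strictly diagonally dominant matrix, plain Jacobi (or Gauss-Seidel) iterations are guaranteed to converge (cf. the unified proof in [1]). The following is a minimal sketch of that classical fact, not the paper's actual solver:

```python
import numpy as np

def jacobi(A, b, iters=100):
    """Plain Jacobi iteration: x <- D^{-1} (b - (A - D) x)."""
    D = np.diag(A)            # diagonal entries
    R = A - np.diagflat(D)    # off-diagonal remainder
    x = np.zeros_like(b, dtype=float)
    for _ in range(iters):
        x = (b - R @ x) / D
    return x

# Toy strictly diagonally dominant system: Jacobi provably converges here.
A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])
b = np.array([1.0, 2.0, 3.0])
print(jacobi(A, b))  # matches np.linalg.solve(A, b)
```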
Experimental evaluations cover synthetic data as well as real images from the literature and
newly introduced photographs captured in the wild. The tests show successful registration and re-
covery of the target shape. To emphasize the capabilities of the suggested approach, experimental
evaluations were made against two alternatives conducting separate registration and DFD. The out-
come shows that, overall, our combined approach outperforms the competing methods in both
registration and depth recovery.
DFD is recognized as a computationally intensive method for 3D reconstruction. In this
paper, we handle the computational burden by GPU computing. A new numerical scheme har-
nessing the fine parallelism of GPU computing yields a runtime of 2 sec for a 1 MP image. The
computational bottleneck is the explicit scheme for the registration, as the throughput increases
more than tenfold when DFD is run without it.
Future work can move in various directions. On the theoretical side, there is room for analysis
of DFD sensitivity with respect to registration. The impact of the interpolation in image warping
suggests another direction, as this approximation imposes a blur effect undesirably merged with
the optical blur. For further speed-up, we intend to focus on more efficient schemes to recover the
registration in our model. In an era where multi-core devices become ubiquitous, the suggested
approach presents a practical alternative for inferring a realistic depth map that can further be
used for focus manipulation, object detection and segmentation.
Acknowledgements I would like to thank Dr. Shmuel Rippa for sharing with me his vast knowledge in numerical algebra, my colleague Gonen
Raveh for his devoted assistance in the GPU implementation and, last but not least, Ilia Lutsker for providing me with his amazing photographs. I
further thank the anonymous reviewers for their thorough review and constructive remarks, which led to a major improvement of this manuscript.
References
1. Bagnara, R.: A unified proof for the convergence of Jacobi and Gauss-Seidel methods. SIAM Review 37 (1995)
2. Ben-Ari, R., Raveh, G.: Variational depth from defocus in real-time. In: The 3rd IEEE Workshop on GPU for
Computer Vision, ICCV, pp. 522–529 (2011)
3. Ben-Ari, R., Sochen, N.: A geometric framework and a new criterion in optical flow modeling. Journal of Mathe-
matical Imaging and Vision 33(2), 178–194 (2009)
4. Ben-Ari, R., Sochen, N.: Stereo matching with Mumford-Shah regularization and occlusion handling. TPAMI 32(11),
2071–2084 (2011)
5. Brox, T., Bruhn, A., Papenberg, N., Weickert, J.: High accuracy optical flow estimation based on a theory for warp-
ing. In: Proc. of 8th European Conference on Computer Vision, LNCS, vol. 3024, pp. 25–36 (2004)
6. Ens, J., Lawrence, P.: An investigation of methods for determining depth from focus. TPAMI 15(2), 97–108 (1993)
7. Favaro, P.: Shape from defocus via diffusion. IEEE TPAMI 30(3), 518–531 (2008)
8. Favaro, P.: Recovering thin structures via nonlocal-means regularization with application to depth from defocus. In:
Proc. CVPR (2010)
9. Favaro, P., Soatto, S.: A geometric approach to shape from defocus. IEEE Transactions on Pattern Analysis and
Machine Intelligence 27(3), 1–12 (2005)
10. Favaro, P., Soatto, S.: 3-D Shape Estimation and Image Restoration: Exploiting Defocus and Motion Blur. Springer-
Verlag (2007)
11. Green, P., Sun, W., Matusik, W., Durand, F.: Multi-aperture photography. In: ACM SIGGRAPH (2007)
12. Gupta, M., Agrawal, A., Veeraraghavan, A., Narasimhan, S.G.: Structured light 3D scanning in the presence of global
illumination. In: CVPR, pp. 713–720 (2011)
13. Gwosdek, P., Zimmer, H., Grewenig, S., Bruhn, A., Weickert, J.: A highly efficient GPU implementation for varia-
tional optic flow based on the Euler-Lagrange framework. In: Proc. 3rd ECCV Workshop CVGPU (2010)
14. Hasinoff, S.W., Kutulakos, K.N.: Confocal stereo. International Journal of Computer Vision 81(1), 82–104 (2008)
15. Heo, Y.S., Lee, M., Lee, S.U.: Robust stereo matching using adaptive normalized cross-correlation. TPAMI 33(4),
807–822 (2011)
16. Kumar, A., Ahuja, N.: A generative focus measure with application to omnifocus imaging. In: Int. Conference on
Computational Photography, pp. 112–119 (2013)
17. Levin, A., Fergus, R., Durand, F., Freeman, W.T.: Image and depth from a conventional camera with a coded aperture.
In: Proc. SIGGRAPH (2007)
18. Lin, X., Suo, J., Wetzstein, G., Dai, Q., Raskar, R.: Coded focal stack photography. In: Proc. ICCP (2013)
19. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. Journal of Computer Vision 60(2), 91–110
(2004)
20. Mairal, J., Keriven, R., Chariot, A.: Fast and efficient dense variational stereo on GPU. In: Proc. 3DPVT (2006)
21. Myles, Z., da Vitoria Lobo, N.: Recovering affine motion and defocus blur simultaneously. TPAMI 20(6), 652–658
(1998)
22. Papenberg, N., Bruhn, A., Brox, T., Didas, S., Weickert, J.: Highly accurate optic flow computation with theoretically
justified warpings. IJCV 67(2), 141–158 (2006)
23. Robinson, D., Milanfar, P.: Fundamental performance limits in image registration. IEEE Transactions on Image
Processing 13(9), 1185–1199 (2004)
24. Rosman, G., Dascal, L., Sidi, A., Kimmel, R.: Efficient Beltrami image filtering via vector extrapolation methods.
SIAM Journal on Imaging Sciences 2(3), 858–878 (2009)
25. Rosman, G., Dascal, L., Tai, X.C., Kimmel, R.: On semi-implicit splitting schemes for the Beltrami color image
filtering. Journal of Mathematical Imaging and Vision 40(2), 199–213 (2011)
26. Sabater, N., Almansa, A., Morel, J.M.: Meaningful matches in stereo vision. IEEE Transactions on Pattern Analysis
and Machine Intelligence 34(5), 930–942 (2012)
27. Smith, W.J.: Modern Optical Engineering. McGraw-Hill (2000)
28. Soon-Yong, P.: An image-based calibration technique of spatial domain depth-from-defocus. Pattern Recognition
Letters 27(12), 1318–1324 (2006)
29. Subbarao, M., Surya, G.: Depth from defocus: A spatial domain approach. International Journal of Computer Vision
13(3), 271–294 (1994)
30. Sundaram, N., Brox, T., Keutzer, K.: Dense point trajectories by GPU-accelerated large displacement optical flow.
In: Proc. ECCV, pp. 438–451 (2010)
31. Szeliski, R.: Recovering 3D shape and motion from image streams using nonlinear least squares. In: Proc. CVPR
(1993)
32. Watanabe, M., Nayar, S., Noguchi, M.: Real-time computation of depth from defocus. In: Proc. of The International
Society for Optical Engineering (SPIE), vol. 2599, pp. 14–25 (1996)
33. Watanabe, M., Nayar, S.K.: Rational filters for passive depth from defocus. Int. Journal of Computer Vision 27(3),
203–225 (1998)
34. Weickert, J.: Anisotropic Diffusion in Image Processing. Teubner, Stuttgart (1998)
35. Zhou, C., Lin, S., Nayar, S.K.: Coded aperture pairs for depth from defocus. In: Proc. ICCV (2009)
Fig. 6: Depth maps (relative blur) for the real data set. (a) Without registration, (b) appearance-based
registration, (c) SIFT matching, (d) our approach. From top to bottom: Antlion, Spider and Aphrophoridae.
Columns (a) and (b) are dominated by artifacts. The SIFT matching fails in the Aphrophoridae case
(third row) due to a lack of key-points. The color coding presents red as near and blue as far from the
camera. Note that the background obtains an intermediate (zero) blur due to lack of texture.
Fig. 7: Close-ups on depth maps (relative blur). These close-ups show details and existing artifacts as
obtained from the different approaches. (a) Without registration. (b) Appearance-based registration. (c) SIFT
matching. (d) Our approach. First row, Antlion: Outliers are observed in the head as well as the legs. At
the front wing, one can distinguish the folded secondary wing, a delicate detail hardly observed in the
image. Second row, Spider: Artifacts are observed on the flower and the leg. Note that the Spider's leg
is correctly recovered only in our approach. Third row, Aphrophoridae: Observed inconsistencies,
appearing as sharp depth variations, are marked by arrows. The SIFT matching fails here due to a lack
of key-points.
Fig. 9: Synthesized best-focus image (pseudo-radiance image). (a) Near focus. (b) Far focus. (c) Best-
focus image. Note that while parts appear in and out of focus in the inputs, our pseudo-radiance image
includes the sharper choice.
Rami Ben-Ari received the B.Sc. and M.Sc. degrees in Aerospace Engineering from the Israel Institute of
Technology - Technion, and the Ph.D. degree in Applied Mathematics from Tel-Aviv University, in 2008.
During 2008-2009 he was an Associate Researcher at the Technion (IIT) and Ben-Gurion University. In
2009 he joined Orbotech Ltd., the world leader in Automated Optical Inspection of electronics manu-
facturing, as an algorithm researcher, and was later appointed an Expert in Computer Vision (on the
professional career ladder). Currently, he is part of eyesight-Technologies, as a Senior Algorithm Researcher
working on visual tracking and gesture recognition. His recent research interests are in Computational
Photography, PDE methods in Computer Vision, and Machine Learning, with a mission to bring state-of-
the-art academic research into industrial applications.