© 2003 Kluwer Academic Publishers. Manufactured in The Netherlands.
PAOLO FAVARO
Electrical Engineering Department, Washington University in St. Louis, One Brookings Drive,
St. Louis, MO 63130
fava@ee.wustl.edu
ANDREA MENNUCCI
Scuola Normale Superiore, Piazza Dei Cavalieri 7, Pisa 56100, Italy
mennucci@tonelli.sns.it
STEFANO SOATTO
Computer Science Department, University of California, Los Angeles, 405 Hilgard Avenue,
Los Angeles, CA 90095
soatto@ucla.edu
Received June 13, 2001; Revised April 18, 2002; Accepted September 9, 2002
Abstract. Accommodation cues are measurable properties of an image that are associated with a change in the
geometry of the imaging device. To what extent can three-dimensional shape be reconstructed using accommodation
cues alone? This question is fundamental to the problem of reconstructing shape from focus (SFF) and shape from
defocus (SFD) for applications in inspection, microscopy, image restoration and visualization. We address it by
studying the “observability” of accommodation cues in an analytical framework that reveals under what conditions
shape can be reconstructed from defocused images. We do so in three steps: (1) we characterize the observability
of any surface in the presence of a controlled radiance (“weak observability”), (2) we conjecture the existence of a
radiance that allows distinguishing any two surfaces (“sufficient excitation”) and (3) we show that in the absence of
any prior knowledge on the radiance, two surfaces can be distinguished up to the degree of resolution determined
by the complexity of the radiance (“strong observability”). We formulate the problem of reconstructing the shape
and radiance of a scene as the minimization of the information divergence between blurred images, and propose an
algorithm that is provably convergent and guarantees that the solution is admissible, in the sense of corresponding
to a positive radiance and imaging kernel.
Keywords: shape, surface geometry, low-level vision, shape from defocus, shape from focus
As an alternative to relying on prior assumptions, one can try to retrieve spatial information by looking at different images of the same scene taken with different imaging devices. Measurable properties of images which are associated with a changing viewpoint are called “parallax” cues (for instance stereo and motion).¹

In addition to changing the position of the imaging device, one could change its geometry. For instance, one can take different photographs of the same scene with different lens apertures or focal lengths. Similarly, in the eye one can change the shape of the lens by acting on the lens muscles. We call “accommodation cues” all measurable properties of images that are associated with a change of the geometry of the imaging device.

The study of accommodation raises a number of questions. Is accommodation a visual cue (i.e. does it carry information about the shape of the scene)? What are the conditions under which two different scenes can be distinguished, if at all? Do such conditions depend upon the particular imaging device? If two scenes are distinguishable, is there an algorithm that provably distinguishes them? How does the human visual system make use of accommodation? Is it possible to render the accommodation cue, so as to create a “controlled illusion” in the same way photographs do for pictorial cues? In this manuscript we intend to answer some of these questions rigorously for an idealized imaging model, and test the resulting algorithms on realistic data sets.

1.1. Relation to Previous Work

Depth from defocus can be formalized as a “blind deconvolution” of certain integral operators, a problem common to many research fields. The same mathematical formulation appears in inverse filtering, in speech analysis, in image restoration (corrupted by noise or optical aberrations), in source separation, in inverse scattering, in computer tomography. These are problems studied in a number of fields such as signal processing, information theory, communication theory and computer vision. Consequently, this paper relates to a large body of literature.

Solving blind deconvolution can be posed as the minimization of a discrepancy measure between the input data and the model that generates the data. However, this approach inevitably leads to multiple solutions due to the ill-posedness of the blind deconvolution problem. A common procedure used to guarantee desirable properties of the solution and its uniqueness is to add regularization terms. Most regularization schemes impose smoothness constraints that are not appropriate for all scenes. For example, Chan and Wong (1998) propose a total variation regularization that is effective in recovering edges of images. Alternatively, regularization with wavelets is explored by Kalifa et al. (1998), Namba and Ishida (1998), and Neelamani et al. (1999).

In the computer vision community, blind deconvolution is known as passive depth from defocus (e.g. Hwang et al., 1989; Schechner et al., 1994, 1998; Subbarao and Surya, 1994; Subbarao and Wei, 1992), which is different from active depth from defocus (Girod and Scherock, 1989; Nayar et al., 1995; Noguchi and Nayar, 1994; Scherock, 1981) where structured light is used. While in the passive technique both the radiance (or “texture” or “deblurred image”) and the depth are unknown, in the active case the radiance is given as a pattern projected onto the scene.

The literature on active/passive depth from defocus can be divided into two main groups: one poses the problem within a statistical framework (Rajagopalan and Chaudhuri, 1995, 1997, 1998; Schechner and Kiryati, 1999), and one as a deterministic optimization (Pentland, 1997; Pentland et al., 1989, 1994; Hopkins, 1955). In the statistical approach, one choice is to use Markov random fields to describe both the images and the space-variant point spread function (see Section 1.2). This effectively eliminates the need for windowing, allowing for precise depth and radiance estimation also where the equifocal assumption is not fully satisfied. Other methods (Farid and Simoncelli, 1998; Simoncelli and Farid, 1996) work on differential variations in the image intensities using masks or exploiting aperture changes. In the deterministic approach, we find that many algorithms formulate the problem in the frequency domain. The advantages of such a representation are overshadowed by inaccuracies due to windowing effects, edge bleeding, feature shifts, image noise and field curvature (see Ens and Lawrence, 1993; Nair and Stewart, 1992). A solution is to use highly specialized filters that operate in narrow bands. The main problem is that narrow-band filters have to be defined on a large support; therefore, the amount of computation grows considerably. Xiong and Shafer (1995) propose a set of 240 moment filters that select narrow bands of the radiance, and allow for precise depth estimation. Gokstorp (1994) uses a multiresolution local frequency representation of the input image pair. Many algorithms
Observing Shape from Defocused Images 27
operate in the spatial domain (see Watanabe and Nayar, 1996a; Nayar and Nakagawa, 1990; Noguchi and Nayar, 1994; Favaro and Soatto, 2000). One approach involves rational filters tuned for restricted frequency bands. For active depth from defocus, optimal patterns are derived and projected onto the scene according to the chosen point spread function. When the optical point spread function is modeled with a 2D Laplacian, the best pattern is a checkerboard. A number of other spatial domain algorithms exist (see Subbarao and Surya, 1994; Ziou, 1998; Asada et al., 1998; Marshall et al., 1996) where the scene is approximated by a planar surface, or the analysis is concentrated on step edges, line edges, occluding edges and junctions (see Marshall et al., 1996; Asada et al., 1998).

In our paper we study depth from defocus from a deterministic point of view. We pose the problem in a mathematical framework, and derive conditions for a unique reconstruction of both radiance and surface. In particular, we give precise definitions of observability of radiance and shape for active and passive depth from defocus (see Sections 2 and 4), and identify which radiances are “unobservable” and to what degree reconstruction of radiance and shape is possible (see Section 3). A similar study, but restricted to finite impulse response filters, has been done in Harikumar and Bresler (1999).

In addition to answering observability issues, we propose a locally optimal algorithm which borrows mainly from blind deconvolution techniques (Kundur and Hatzinakos, 1998; Snyder et al., 1992). We choose as criterion the minimization of the information divergence (I-divergence) between blurred images, motivated by the work of Csiszár (1991). The algorithm we propose is iterative, and we give a proof of its convergence to a (local) minimum and test its performance on both real and simulated images.

1.2. Notation and Ideal Formalization of the Problem²

In order to introduce some of the concepts that we will address in this paper, consider a scene whose three-dimensional shape is represented by S and that emits energy with a radiance r. While we will make the meaning of S and r precise soon, an intuitive grasp suffices for now.

The image of the scene can be represented as a function with support on a compact subset Ω ⊂ R² of the imaging surface (e.g. the retina or the CCD sensor). The brightness at a particular point is constrained to be a positive³ value that depends on the radiance r, the surface S and the optics of the imaging device. Define the vector u ∈ U where U is the set of parameters (for instance focal length, optical center, lens aperture etc.) that describe the geometry of the imaging device. We denote the image of the scene as I_u^S(·, r) to emphasize its dependence on the radiance r, the surface S and the parameters u of the optics.

The shape S is represented as a piecewise smooth function. We consider the radiance⁴ r to be a positive integrable function defined on S, and we also assume that the scene is populated by Lambertian⁵ objects. Given these assumptions, it is possible to find a change of coordinates so that both the radiance r and the surface S are defined on the domain Ω.

Based on these assumptions, a model of the imaging process that is suitable to study the accommodation cue is given by integral equations of the form

    I_u^S(y, r) = ∫ h_u^S(y, x) r(x) dx    (1)

where y ∈ Ω are the image plane coordinates, and x ∈ Ω are the scene coordinates. The kernel h has two different interpretations: for any fixed x ∈ Ω, the function y → h_u^S(y, x) is the point-spread function of the optics; dually, for any fixed y ∈ Ω, the function x → h_u^S(y, x) is a function that has the normalization property

    ∫_{R²} h_u^S(y, x) dx = 1.    (2)

Both functions can be either a delta measure or a bounded function (that is continuous inside a support, but possibly discontinuous across its border); they depend on u ∈ U, the camera parameters, and on the surface S.

Given the characteristics of the kernel, we note that, by duality, r belongs to L_loc(R²), which includes locally integrable positive functions as well as delta measures. This is why we call r the radiance distribution, or energy distribution.⁶

Since the model Eq. (1) is common to most of the literature on shape from defocus, we will not further motivate it here, and adopt it for analysis in the following sections (see Chaudhuri and Rajagopalan (1999) for details).

In some particular cases the surfaces may be described by some parameters s. For example, suppose that S consists of a single slanted plane: the scene S is then completely described by the intercept with the optical axis S_Z, and by a reduced normal ñ, so that
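The model of Eqs. (1)–(2) can be made concrete with a small discrete sketch for the equifocal, space-invariant case, where the kernel reduces to a pillbox and the integral becomes a convolution. This is an illustrative discretization of our own (kernel size, padding and grid are arbitrary choices, not the paper's):

```python
import numpy as np

def pillbox_kernel(radius, size=9):
    """Discrete pillbox PSF: constant on a disc, normalized so that it
    sums to one -- the discrete analogue of the constraint in Eq. (2)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = (xx ** 2 + yy ** 2 <= radius ** 2).astype(float)
    return k / k.sum()

def defocused_image(radiance, radius):
    """Discrete analogue of Eq. (1) for an equifocal plane: the kernel is
    space-invariant, so I(y) = sum_x h(y - x) r(x). The pillbox is
    symmetric, so correlation and convolution coincide."""
    k = pillbox_kernel(radius)
    m = k.shape[0] // 2
    r = np.pad(radiance, m, mode='edge')   # crude boundary handling
    out = np.empty_like(radiance, dtype=float)
    for i in range(radiance.shape[0]):
        for j in range(radiance.shape[1]):
            out[i, j] = np.sum(r[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out
```

Because the kernel sums to one, blurring a constant radiance leaves it unchanged, which is a quick sanity check of the normalization property.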
the surface S can be represented by the parameters s = (S_Z, ñ) ∈ R³. In particular, in Appendix C we will study the case of equifocal planes. We will consider the (ideal) model of a thin lens camera that is imaging an equifocal plane. In this case the kernel h_u^S(y, x) takes the form of a pillbox kernel. The scene is completely described by the depth s = z of the imaged plane, and the parameters u encode the geometry of the camera. The same surface representation will be used locally at each point in Section 6. This will considerably simplify the implementation of the proposed algorithm for estimating depth from defocus.

In Eq. (1), we assume that the values I_u^S(y, r) can be measured for each y ∈ Ω and control parameter u ∈ U. The energy distribution, or radiance, r is unknown, and h_u^S(y, x) is known only up to the surface S, which is unknown. The scope of this paper can be summarized into the following question:

Question 1. To what extent and under what conditions can the radiance r and the surface S be reconstructed from measurements of defocused images I_u^S(y, r)?

The next few sections are devoted to answering the above question. We proceed in increasing order of generality, from assuming that the radiance r can be chosen purposefully (Section 2) to an arbitrary unknown variation (Section 4). Along the way, we point out some issues concerning the hypothesis on the radiance in the design of algorithms for reconstructing shape from focus/defocus (Section 3).

2. Weak Observability

The concept of “weak observability”, which we are about to define, is relevant to the problem of shape reconstruction from active usage of the control u and the radiance pattern r.

Consider two scenes with surfaces S and S′ respectively.

Definition 1. We say that a surface S is weakly indistinguishable from a surface S′ if for all possible r ∈ L_loc(R²) there exists at least a radiance r′ ∈ L_loc(R²) such that we have

    I_u^S(y, r) = I_u^{S′}(y, r′)  ∀ y ∈ Ω, ∀ u ∈ U.    (3)

Two surfaces are weakly distinguishable if they are not weakly indistinguishable. If a surface S is weakly distinguishable from any other surface, we say that it is weakly observable.

Piecewise smooth surfaces may have discontinuities (which form a set of measure zero). We do not distinguish between surfaces that differ on a set of measure zero.

The purpose of this section is to establish that piecewise smooth surfaces are weakly observable. In order to prove this statement we need to make explicit assumptions on the imaging system. In particular,

(1) For each scalar z > 0, and for each y ∈ Ω, there exists a setting ū, which we call focus setting, and an open set O ⊂ Ω, such that y ∈ O and for any surface S smooth in O with S(y) = z, we have

    h_ū^S(y, x) = δ(y − x)  ∀ x ∈ O.

(2) Furthermore, given a scalar z > 0 and a y ∈ Ω, such a focus setting ū is unique in U.

(3) For any surface S and for any open O ⊂ Ω such that S is smooth in O, we have

    h_u^S(y, x) > 0  ∀ x, y ∈ O

whenever u is not a focus setting.

The above statements formalize some of the properties of typical real aperture cameras. The first statement corresponds to guaranteeing the existence of a control u at each point y such that the resulting image of that point is in focus. Notice that the control u may change at each point y depending on the scene's surface. The second property states that such a control is unique. Finally, when the kernel is not a Dirac delta function, it is a strictly positive function over an open set.

We wish to emphasize that the existence of a focus setting is a mathematical idealization. In practice, diffraction and other optical effects prevent a kernel from approaching a delta distribution. Nevertheless, analysis based on an idealized model can shed light on the design of algorithms operating on real data. Under the above conditions we can state the following result:

Proposition 1. Any piecewise smooth surface S is weakly observable.

Proof: See Appendix A.

Proposition 1 shows that it is possible—in principle—to distinguish the shape of a surface from that of any other surface by looking at images under
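Assumptions (1)–(2) — for every depth there exists a unique setting that brings it into focus — can be illustrated with the standard thin-lens model of geometric optics. The following sketch is ours, not the paper's exact parametrization (that is developed in Appendix C); the symbols v (lens-to-sensor distance, playing the role of the setting u), F (focal length) and D (aperture diameter) are our assumptions:

```python
def focus_setting(z, F):
    """The unique sensor distance v_bar that brings depth z into focus,
    from the thin-lens law 1/F = 1/z + 1/v_bar (assumptions (1)-(2))."""
    return 1.0 / (1.0 / F - 1.0 / z)

def blur_radius(z, v, F, D):
    """Radius of the pillbox blur disc for a point at depth z imaged with
    sensor distance v: it vanishes exactly at the focus setting and is
    strictly positive everywhere else (assumption (3))."""
    v_star = focus_setting(z, F)
    return 0.5 * D * abs(v / v_star - 1.0)
```

At v = focus_setting(z, F) the blur radius is zero (the kernel degenerates to a delta); for any other setting it is strictly positive, mirroring the trichotomy formalized above.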
different camera settings u and radiances r. This, however, requires the active usage of the control u and of the radiance r. While it may be possible to change r in an active vision context, this is not the case in a general setting.

The definition of weak observability leaves the freedom to choose the radiance r to distinguish S from S′. As we have seen in the proposition above, this is indeed sufficient to render any piecewise smooth surface observable. However, a natural question is left open:

Question 2. Can a radiance r be found that allows distinguishing S from any other surface?

We address this issue in the next subsection.

2.1. Excitation

Let I(S | r) denote the set of surfaces that cannot be distinguished from S given the energy distribution r:

    I(S | r) = { S̃ : I_u^{S̃}(y, r) = I_u^S(y, r) ∀ y ∈ Ω, u ∈ U }.    (4)

Note that we are using the same radiance on both surfaces S, S̃. This is the case, for instance, when using structured light, so that r is a (static) pattern projected onto a surface. Clearly, not all radiances allow distinguishing two surfaces. For instance, r = 0 does not allow distinguishing any surface. We now discuss the existence of a “sufficiently exciting” radiance.

Definition 2. We say that a distribution r is sufficiently exciting for S if

    I(S | r) = {S}.    (5)

We conjecture that distributions with unbounded variation on any open subset of the image are sufficiently exciting for any surface, i.e. “universally exciting.” A motivation for this conjecture can be extrapolated from the discussion relative to L²_loc(R²) that follows in Section 3, as we will see in Remark 1.

3. Physically Realizable Radiances, Resolution and Harmonic Components

Energy distributions with unbounded variation cannot be physically realized. Therefore, in this section we are interested in introducing a notion of “bandwidth” on the space of energy distributions, which would allow us to model it as L²(R²). However, such a restriction is legitimate only if the radiance r does not have any harmonic component, which we will define shortly. Before doing so, we remind the reader of the basic properties of harmonic functions.

Definition 3. A function r : R² → R is said to be harmonic in an open region O ⊂ R² if

    Δr(x) = Σ_{i=1}^{2} ∂²r(x)/∂x_i² = 0

for all x ∈ O.

Proposition 2 (mean-value property). If r : R² → R is harmonic, then, for any integrable function F : R² → R that is rotationally symmetric,

    ∫ F(y − x) r(x) dx = r(y) ∫ F(x) dx

whenever the integrals exist.

Proof: See Appendix B.

Corollary 1. In general ∫ h_u^S(y, x) dx = 1; when we are imaging an equifocal plane, the kernel h_u^S(y, ·) is indeed rotationally symmetric. Hence, the above property leads us to conclude that, in that case, if r is harmonic, then no information can be obtained from the accommodation cue; from Mennucci and Soatto (1999), we have that, in many cases, h_u^S(y, ·) can be approximated by a circularly symmetric function; if r is a harmonic function, then

    ∫ h_u^S(y, x) r(x) dx = r(y)    (6)

so that we can state that harmonic radiances are negligible as carriers of shape information.

A simple example of a harmonic radiance is r(x) = a₀ · x + a₁, where a₀ and a₁ are constants (i.e. a brightness gradient). It is easy to convince ourselves that when imaging an equifocal plane in real scenes, images generated by such a radiance do not change for any focal setting (i.e. they appear as always perfectly focused).

The above result says that if one decomposes a radiance into the sum of a harmonic function and a square integrable function, the harmonic component can (and
Definition 4. Let {θ_i} be an orthonormal basis of L²(R²). We say that a radiance r has a degree of definition (or degree of complexity) k if there exists a set of coefficients α_i, i = 1, ..., k such that

    r(x) = Σ_{i=1}^{k} α_i θ_i(x).    (7)

When k < ∞ we say that the distribution r is band-limited.

Note that for practical purposes there is no loss of generality in assuming that r ∈ L²(R²), since the energy emitted from a surface is necessarily finite and the definition is necessarily limited by the optics. If we also assume that the kernels h_u^S ∈ L²(R² × R²), we can define a degree of resolution for the surface S.

Definition 5. Let h_u^S ∈ L²(R² × R²). If there exists a positive integer ρ and coefficients β_j, j = 1, ..., ρ such that

    h_u^S(y, x) = Σ_{j=1}^{ρ} β_j(y, u, S) θ_j(x)    (8)

then we say that the surface S has a degree of resolution ρ.

The above definitions of degrees of complexity (or resolution) depend upon the choice of basis of L²(R²). In the following, we will always assume that the two are defined relative to the same basis. There is a natural link between the degree of complexity of a distribution and the degree of resolution at which two surfaces can be distinguished.

Proposition 3. Let r be a band-limited distribution with degree of complexity k. Then two surfaces S1 and S2 can only be distinguished up to the resolution determined by k, that is, if we write

    h_u^S(y, x) = Σ_{j=1}^{∞} β_j(y, u, S) θ_j(x)    (9)

Proof: Substituting the expression of r in terms of the basis θ_i, we have

    I_u^{S1}(y, r) = Σ_{i,j=1}^{k} β_j(y, u, S1) α_i ∫ θ_j(x) θ_i(x) dx + Σ_{i=1}^{k} Σ_{j=k+1}^{∞} β_j(y, u, S1) α_i ∫ θ_i(x) θ_j(x) dx.    (11)

From the orthogonality of the basis elements θ_i we are left with

    I_u^{S1}(y, r) = Σ_{i=1}^{k} β_i(y, u, S1) α_i    (12)

from which we see that, if β_i(y, u, S1) = β_i(y, u, S2) for all i = 1, ..., k, then I_u^{S1}(y, r) = I_u^{S2}(y, r) for all y, u.

Remark 1. The practical value of the last proposition is to state that, the more “irregular” the radiance, the more resolving power it has. In the case of structured light, the proposition establishes that irregular patterns should be used in conjunction with accommodation. An “infinitely irregular” pattern should therefore be universally exciting, as conjectured in the previous section.

We note that, as suggested by an anonymous reviewer, the degree of resolution with which two surfaces can be distinguished also depends upon the optics of the imaging system. In our analysis this is taken into account by the degree of complexity of the radiance, which can itself be limited, or it can be limited by the optics of the imaging system.

Besides the fact that delta distributions are a mathematical idealization and cannot be physically realized, it is most often the case that we cannot choose the distribution r at will. Rather, r is a property of the scene being viewed, over which we have no control. In the next section we will consider the observability of a scene depending upon its particular radiance.
Definition 6. We say that the pair (S2, r2) is indistinguishable from the pair (S1, r1) if

    I_u^{S1}(y, r1) = I_u^{S2}(y, r2)  ∀ y ∈ Ω, ∀ u ∈ U.    (13)

If the set of pairs that are indistinguishable from (S1, r1) is a singleton (i.e. it contains only (S1, r1)), we say (S1, r1) is strongly distinguishable. The following proposition characterizes the set of indistinguishable scenes in terms of their coefficients in the basis {θ_i}.

Proposition 4. Let r1 have degree of complexity k. The set of scenes (S2, r2) that are indistinguishable from (S1, r1) are those for which r2 = r1 up to a degree of complexity k, and S1 = S2 up to a resolution ρ = k.

Proof: For (S2, r2) to be indistinguishable from (S1, r1) we must have

    Σ_{i,j=1}^{k} β_j(y, u, S1) α_{1i} ∫ θ_i(x) θ_j(x) dx = Σ_{i,j=1}^{∞} β_j(y, u, S2) α_{2i} ∫ θ_i(x) θ_j(x) dx    (14)

for all y and u. From the orthogonality of {θ_i} and the arbitrariness of y, u we conclude that

    α_{1i} = α_{2i}  ∀ i = 1, ..., k    (15)

from which we have that β_j(y, u, S1) = β_j(y, u, S2) for almost all y, u and for all j = 1, ..., k, and therefore S1 = S2 almost everywhere up to the degree of definition k.

The above proposition is independent of the particular imaging device used, in the sense that the dependency is coded into the coefficients β_i. Any explicit characterization will necessarily be dependent on the particular imaging device used. It is possible to give explicit examples using particular devices, for instance those with kernels described in Mennucci and Soatto (1999), and concentrating on a class of surfaces, for instance slanted planes or occluding boundaries. The calculations are straightforward but messy, and are therefore not reported here.

In this section we formulate the problem of optimally estimating depth from an equifocal imaging model. Recall that in the imaging model (1) all quantities are constrained to be positive: r because it represents the radiant energy (which cannot be negative), h_u^S because it specifies the region of space over which energy is integrated, and I_u^S(y, r) because it measures the photon count on the surface of the sensor (CCD) corresponding to the pixel y. We are interested in estimating the energy distribution r and the parameters s that describe the shape S, by measuring a finite number L of images obtained with different camera settings u_1, ..., u_L. If we collect the corresponding images I_{u_1}^S, ..., I_{u_L}^S and organize them into a vector I^S = [I_{u_1}^S, ..., I_{u_L}^S] (and likewise for the kernels h_{u_i}^S, h^S = [h_{u_1}^S, ..., h_{u_L}^S]), we can write:

    I^S(y, r) = ∫ h^S(x, y) r(x) dx.    (16)

The collection of images I^S(·, r) represents the observations (or measurements) from the scene. Together with it we define a collection of estimated images, which we generate by introducing our current estimates for the radiance r̂ and surface parameters ŝ (which encode the estimated surface Ŝ) in the above equation. We call such an estimated image B_u^{Ŝ}(·, r̂), and the corresponding collection B^{Ŝ} = [B_{u_1}^{Ŝ}, ..., B_{u_L}^{Ŝ}]:

    B^{Ŝ}(y, r̂) := ∫ h^{Ŝ}(x, y) r̂(x) dx.    (17)

The collection I^S(·, r) is measured on the pixel grid and, hence, its domain (and the corresponding domain of B^{Ŝ}(·, r̂)) is a finite lattice Ω = [x_1, ..., x_N] × [y_1, ..., y_M] for some integers N and M.

We now want a “criterion” (or cost function) φ to measure the discrepancy between the collection of measured images I^S(·, r) and the collection of estimated images B^{Ŝ}(·, r̂), so that we can formulate the problem of reconstructing both the radiance and the surface parameters in terms of the minimization of φ. Common choices of criteria include, for example, the least-squares distance between I^S(·, r) and B^{Ŝ}(·, r̂), or the integral of the absolute value of their difference (total variation) (Chan and Wong, 1998).

The choice of such a criterion is in principle arbitrary as long as the criterion satisfies a basic set of properties (i.e. it is always positive, and 0 if and only if I^S(·, r)
and B^{Ŝ}(·, r̂) are identical), and can lead to very different results. We follow Csiszár's approach (Csiszár, 1991) that poses the problem of defining an “optimal” discrepancy function within an axiomatic framework. Csiszár defines a set of desirable, but general, properties that a discrepancy measure should satisfy. He concludes that, when the quantities involved are constrained to be positive⁷ (such as in our case), the only consistent choice of criterion is the so-called information divergence, or I-divergence, which generalizes the well-known Kullback-Leibler pseudo-metric and is defined as

    φ(I^S(·, r) ‖ B^{Ŝ}(·, r̂)) := Σ_{y∈Ω} [ I^S(y, r) log( I^S(y, r) / B^{Ŝ}(y, r̂) ) − I^S(y, r) + B^{Ŝ}(y, r̂) ]    (18)

where log(·) indicates the natural logarithm. Notice that the function defined above is always positive and is 0 if and only if I^S(·, r) coincides with B^{Ŝ}(·, r̂). However, the I-divergence is not a true metric as it does not satisfy the triangular inequality.

The I-divergence is defined only for positive functions. We assume ŝ represents a positive function (with respect to the chosen parameterization) and r̂ is a positive function defined on R². Furthermore, we assume that ŝ and r̂ are such that the estimated image B^{Ŝ}(y, r̂) (where Ŝ is the surface generated with the parameters ŝ) has finite values for any y ∈ Ω. In this case, we say that ŝ and r̂ are admissible.

In order to emphasize the dependency of the cost function on the parameters ŝ and the radiance r̂, we define

the optimal solution. To this end, suppose an initial estimate of r is given: r_0. Then, iteratively solving the two following optimization problems

    s_{k+1} = arg min_{s̃} φ(s̃, r_k)
    r_{k+1} = arg min_{r̃} φ(s_{k+1}, r̃)    (21)

leads to the (local) minimization of φ, since

    0 ≤ φ(s_{k+1}, r_{k+1}) ≤ φ(s_{k+1}, r_k) ≤ φ(s_k, r_k).    (22)

However, solving the two optimization problems in (21) may be an overkill. In order to have the sequence {φ(s_k, r_k)} converging monotonically it suffices that, at each step, we choose s_k and r_k in such a way as to guarantee that Eq. (22) holds, that is

    s_{k+1} : φ(s_{k+1}, r̃) ≤ φ(s_k, r̃),  r̃ = r_k
    r_{k+1} : φ(s̃, r_{k+1}) ≤ φ(s̃, r_k),  s̃ = s_{k+1}.    (23)

The iteration step of the surface parameters s_k can be realized in a number of ways. In the next section we will adopt the equifocal assumption, which in general holds only locally and where the surface is smooth. However, the derivation of the minimization algorithm that follows does not depend on this particular choice. We generically indicate the step on the surface parameters as:

    s_{k+1} = arg min_{s̃} φ(s̃, r_k).    (24)

The second step is obtained from the Kuhn-Tucker conditions (Luenberger, 1968) associated with the problem of minimizing φ for fixed s̃ under positivity constraints for r_{k+1}:
Proof: The proof follows Snyder et al. (1992). From the definition of φ in Eq. (18) we get

    φ(s̃, r_{k+1}) − φ(s̃, r_k) = − Σ_{y∈Ω} I^S(y, r) log( B^{S̃}(y, r_{k+1}) / B^{S̃}(y, r_k) ) + Σ_{y∈Ω} [ B^{S̃}(y, r_{k+1}) − B^{S̃}(y, r_k) ].    (28)

The second sum in the above expression is given by

    Σ_{y∈Ω} ∫ h^{S̃}(y, x) r_{k+1}(x) dx − Σ_{y∈Ω} ∫ h^{S̃}(y, x) r_k(x) dx = ∫ h_0^{S̃}(x) r_{k+1}(x) dx − ∫ h_0^{S̃}(x) r_k(x) dx    (29)

where we have defined h_0^{S̃}(x) = Σ_{y∈Ω} h^{S̃}(y, x), while from the expression of r_{k+1} in (27) we have that the ratio in the first sum is

    B^{S̃}(y, r_{k+1}) / B^{S̃}(y, r_k) = ∫ ( h^{S̃}(y, x) r_k(x) / B^{S̃}(y, r_k) ) F_{r_k}^{S̃}(x) dx.    (30)

We next note that, from Jensen's inequality (Cover and Thomas, 1991),

    log ∫ ( h^{S̃}(y, x) r_k(x) / B^{S̃}(y, r_k) ) F_{r_k}^{S̃}(x) dx ≥ ∫ ( h^{S̃}(y, x) r_k(x) / B^{S̃}(y, r_k) ) log F_{r_k}^{S̃}(x) dx    (31)

    φ(s̃, r_{k+1}) − φ(s̃, r_k) ≤ 0.    (33)

Note that Jensen's inequality becomes an equality if and only if F_{r_k}^{S̃} is a constant; since the only admissible constant value is 1, because of the normalization constraint, we have r_{k+1} = r_k, which concludes the proof.

Finally, we can conclude that the proposed algorithm generates a monotonically decreasing sequence of values of the cost function φ.

Corollary 2. Let s_0, r_0 be admissible initial conditions for the sequences s_k and r_k defined from Eqs. (24) and (27) respectively. The sequence φ(s_k, r_k) converges to a limit φ*:

    lim_{k→∞} φ(s_k, r_k) = φ*.    (34)

Proof: Follows directly from Eqs. (24), (22) and Proposition 5, together with the fact that the I-divergence is bounded from below by zero.

Even if φ(s_k, r_k) converges to a limit, it is not necessarily the case that s_k and r_k do. Whether this happens or not depends on the observability of the model (16), which has been analyzed in the previous sections. In Appendix D we compute the Cramér-Rao lower bound considering that the images are affected by Gaussian noise. The Cramér-Rao bound allows us to derive conclusions about the settings that yield the best estimates.
[Figure 1 plot: mean and standard deviation of the depth error as a function of noise variance (abscissa 0–0.1, normalized with respect to the radiance magnitude).]
Figure 1. Experiments with synthetic data: mean and standard deviation of the depth error as a function of image noise variance. The mean of the depth error is normalized with respect to the true depth, i.e. it is computed as (ŝ − s)/s, where ŝ is the estimated surface parameter (a single parameter in the case of an equifocal plane) and s is the true surface parameter. The standard deviation of the depth error is also normalized with respect to the true surface parameter s. Similarly, the noise variance is normalized with respect to the radiance magnitude, so that the value 0.01 in the abscissa means that the variance is 0.01 · M (i.e. 1% of M), where M is the maximum radiance magnitude.
Figure 3. Near-focused original image (left); far-focused original image (right) (courtesy of S.K. Nayar). The difference between the two images is barely perceptible since the two focal planes are only 340 mm apart.
Figure 4. Reconstructed depth for the scene in Fig. 3. On the left the depth map is rendered as gray levels (lighter corresponds to a smaller
depth value), while on the right it is rendered as a smoothed mesh.
Figure 5. Near-focused original image (left); far-focused original image (right). In this data set the maximum blurring radius is approximately 5 pixels.
Observing Shape from Defocused Images 37
Figure 6. Reconstructed depth for the scene in Fig. 5 coded in grayscale (lighter corresponds to smaller depth values) on the left, and recovered radiance on the right. In this data set the algorithm is challenged with larger blurring. During the iterative minimization of the I-divergence, a smoothing constraint is also imposed on the surface to regularize the reconstruction process.
Figure 7. Texture mapping of the reconstructed radiance on the reconstructed surface of the scene in Fig. 5 as seen from a novel view.
where J₁ is the Bessel function of the first kind.

Proposition 6. Suppose that we are given n ≥ 2 (different) defocused images.9 Suppose that r is non zero and has finite energy (or, equivalently, suppose that the images I_{u_j}^S are non zero and have finite energy); let (ρ₁, ..., ρ_n), r(x) be a solution to the equations

    ∀ j, y   I_{u_j}^S(y) = ∫_{R²} p_{ρ_j}(y − x) r(x) dx.   (47)

Then, (ρ₁, ..., ρ_n) is isolated in Rⁿ.

We define the set D₂ = { f | J₂(|f|ρ*_i) = 0 for some i ≤ n }; the set D₂ is again a union of a countable number of circles, so it is a set of measure zero. Whenever f ∉ A ∪ D₀ ∪ D₁ ∪ D₂, we have that the derivative (∂/∂ρ_i) g_{ρ*_i}(f) ≠ 0;10 the gradient of d_{ρ_i,ρ_j} with respect to ρ_i, ρ_j is then

    ∇d_{ρ_i,ρ_j} = ( (∂g_{ρ_i}/∂ρ_i) / g_{ρ_i} , −(∂g_{ρ_j}/∂ρ_j) / g_{ρ_j} )
                 = |f| ( J₂(|f|ρ_i) / J₁(|f|ρ_i) , −J₂(|f|ρ_j) / J₁(|f|ρ_j) ).   (51)
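The appearance of the ratio J₂/J₁ in (51) rests on the classical Bessel identity d/dx [J₁(x)/x] = −J₂(x)/x (the case n = 1 of d/dx [x^{−n} J_n(x)] = −x^{−n} J_{n+1}(x)), which governs the logarithmic derivative of the radially symmetric kernel transform up to the sign and normalization conventions used for g_ρ. A quick numerical sketch of the identity, with the Bessel functions evaluated from their power series:

```python
import math

def bessel_j(n, x, terms=30):
    # power series: J_n(x) = sum_k (-1)^k / (k! (k+n)!) * (x/2)^(2k+n)
    return sum((-1) ** k / (math.factorial(k) * math.factorial(k + n))
               * (x / 2) ** (2 * k + n) for k in range(terms))

def d_j1_over_x(x, eps=1e-6):
    # central finite difference of J1(x)/x
    f = lambda t: bessel_j(1, t) / t
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# check d/dx [J1(x)/x] = -J2(x)/x at a few sample points
for x in (0.5, 1.0, 2.0, 3.5):
    assert abs(d_j1_over_x(x) - (-bessel_j(2, x) / x)) < 1e-6
```

The series converges rapidly for the moderate arguments used here, so truncation at 30 terms is more than sufficient.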
It is easy to see that ∂d_{ρ_i,ρ_j}/∂ρ_i and ∂d_{ρ_i,ρ_j}/∂ρ_j are non zero for f ∉ A ∪ D₀ ∪ D₁ ∪ D₂. Then, it is possible to express ρ_i in terms of ρ_j (and vice versa) in the equation {d_{ρ_i,ρ_j} = 0}: this implies that, locally in a neighborhood of (ρ*₁, ..., ρ*_n), there is at most a parametric curve (ρ₁, ..., ρ_n) = (ρ₁(t), ..., ρ_n(t)) of solutions. If there were such a curve, then, for any fixed i, j ≠ i, all the gradients {∇d_{ρ_i,ρ_j}(f) | ∀ f ∉ A ∪ D₀ ∪ D₁ ∪ D₂} would be parallel, that is, there would be constant vectors (a_{i,j}, b_{i,j}) (not zero) such that

    a_{i,j} (∂/∂ρ_i) d_{ρ*_i,ρ*_j}(f) = b_{i,j} (∂/∂ρ_j) d_{ρ*_i,ρ*_j}(f).   (52)

We wish to show that this is not the case: we will show that the above cannot be an equality, and this will prove the proposition. The condition leads to

    a_{i,j} ρ*_i = b_{i,j} ρ*_j

and

    a_{i,j} (ρ*_i ρ*_j² / 128 + ρ*_i³ / 192) = b_{i,j} (ρ*_j ρ*_i² / 128 + ρ*_j³ / 192),   (58)

or (having c = a_{i,j}/b_{i,j})

    ρ*_i = c ρ*_j,
    c ρ*_j³ (1/128 + 1/192) = c³ ρ*_j³ / 128 + ρ*_j³ / 192,   (59)

or the equality

    c³/128 − c (1/128 + 1/192) + 1/192 = 0.
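Clearing denominators (multiplying by 384), the last equality reads 3c³ − 5c + 2 = 0, which factors as (c − 1)(3c² + 3c − 2) = 0, so c = 1 is always a root. As a small numerical check (illustration only):

```python
import math

def cubic(c):
    # c^3/128 - c*(1/128 + 1/192) + 1/192, the equality derived above
    return c ** 3 / 128 - c * (1 / 128 + 1 / 192) + 1 / 192

assert abs(cubic(1.0)) < 1e-12            # c = 1 is a root

# the remaining roots solve 3c^2 + 3c - 2 = 0
c_pos = (-3 + math.sqrt(33)) / 6          # the only other positive root (~0.457)
c_neg = (-3 - math.sqrt(33)) / 6
assert abs(cubic(c_pos)) < 1e-12 and abs(cubic(c_neg)) < 1e-12
```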
We define I_n^N (where the superscript N denotes that it is affected by noise) as a Gaussian random variable with mean I_n and variance σ²; the components of the random variable vector I^N ≐ {I_n^N}_{n=1..M} are also independent from each other. The probability density distribution of I^N is therefore

    p_{I^N}(I^N) = ∏_{n=1}^{M} (1/√(2πσ²)) exp(−(I_n^N − I_n)² / (2σ²)).   (63)

The radiance step defined in Eq. (27) is recursive and nonlinear, and thus it is not suitable for an evaluation of its covariance. However, we are interested in analyzing the "stability" of the estimator around the solution. In this case we can approximate the estimated radiance r̂ with the true radiance r and plug it into the right-hand side of Eq. (27). Hence, with the notation just introduced, we obtain

    r̂_i = r_i (∑_{n=1}^M h_{n,i} I_n^N) / (∑_{n=1}^M h_{n,i} I_n).   (64)

This estimator is unbiased. Taking the derivative of the logarithm of (63) with respect to r_i we have

    ∂ log p_{I^N}(I^N) / ∂r_i = ∑_{n=1}^M h_{n,i} (I_n^N − I_n) / σ²,   (69)

and it is easy to verify that it satisfies the regularity condition (a necessary condition to apply the Cramér-Rao lower bound):

    E[∂ log p_{I^N}(I^N) / ∂r_i] = 0   ∀ i ∈ [1, ..., N].   (70)

Now, taking derivatives with respect to r_j we obtain:

    ∂² log p_{I^N}(I^N) / (∂r_i ∂r_j) = −∑_{n=1}^M h_{n,i} h_{n,j} / σ².   (71)

Since the above equation does not depend on any random variable, taking the expectation does not change the expression, and we have:

    [I_F]_{i,j} = ∑_{n=1}^M h_{n,i} h_{n,j} / σ².   (72)
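Under the Gaussian model above, Eq. (72) says the Fisher information is simply a scaled Gram matrix of the kernel. The following sketch, with a hypothetical 3×2 kernel chosen purely for illustration, computes [I_F]_{i,j} and checks that the linearized estimator (64) returns the true radiance exactly when the noise is zero:

```python
# hypothetical kernel h[n][i]: M = 3 measurements, N = 2 radiance samples
h = [[0.7, 0.3],
     [0.5, 0.5],
     [0.2, 0.8]]
sigma2 = 0.01
r = [2.0, 1.0]
M, N = len(h), len(h[0])

# noise-free measurements I_n = sum_i h[n][i] r[i]
I = [sum(h[n][i] * r[i] for i in range(N)) for n in range(M)]

# Fisher information [I_F]_{i,j} = sum_n h[n][i] h[n][j] / sigma^2   (Eq. 72)
I_F = [[sum(h[n][i] * h[n][j] for n in range(M)) / sigma2 for j in range(N)]
       for i in range(N)]
assert all(I_F[i][j] == I_F[j][i] for i in range(N) for j in range(N))

# linearized estimator (64): r_hat_i = r_i * (sum_n h[n][i] I_n^N) / (sum_n h[n][i] I_n);
# with zero noise (I^N = I) it returns the true radiance exactly
I_noisy = list(I)
r_hat = [r[i] * sum(h[n][i] * I_noisy[n] for n in range(M))
              / sum(h[n][i] * I[n] for n in range(M)) for i in range(N)]
assert all(abs(r_hat[i] - r[i]) < 1e-12 for i in range(N))
```

The symmetry assertion reflects the Gram-matrix structure of (72); adding Gaussian perturbations to I_noisy would let one compare the empirical scatter of r̂ against the bound.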
Again, to simplify the notation, we define the modified kernel h̃_{n,i} ≐ h_{n,i} / ∑_{m=1}^M h_{m,i} and its corresponding vector H̃_n ≐ [h̃_{n,1}, ..., h̃_{n,N}]. Finally, we introduce the vectors r ≐ [r₁, ..., r_N]^T and R_n ≐ [r₁ h̃_{n,1}, ..., r_N h̃_{n,N}]. The bound is now expressed as

    ∑_{n=1}^M R_n^T R_n / (r^T H̃_n^T H̃_n r) ≥ ( ∑_{n=1}^M H̃_n^T H̃_n )^{−1}.   (75)

We notice that the bound depends on the radiance being imaged and on the kernel of the optics, which in turn depends on the surface of the scene.

Consider the simplifying case of an almost constant radiance vector r; recalling the earlier discussions about observability of shape and radiance, this case yields a poor reconstruction and can be analyzed as the worst case. The equation above becomes approximately

    ∑_{n=1}^M H̃_n^T H̃_n ≥ ( ∑_{n=1}^M H̃_n^T H̃_n )^{−1},   (76)

which is satisfied if and only if ∑_{n=1}^M H̃_n^T H̃_n is the identity matrix (given the constraints on H̃_n). This corresponds to having an image focused everywhere. On the other hand, as our setting is such that the scene is more and more defocused, the bound is less and less tight, and the estimation of the radiance worsens, as expected.

Acknowledgments

We wish to thank S.K. Nayar for kindly providing us with experimental test data and with a special camera to acquire it. We also wish to thank H. Jin for helping with the experimental session and an anonymous reviewer (Reviewer 1) for many useful comments and suggestions. We also wish to thank Prof. J. O'Sullivan for stimulating discussions.

Notes

1. Note that it is still necessary to make a-priori assumptions in order to solve the correspondence problem.
2. This section introduces the problem of depth from defocus and our notation. The reader who is familiar with the literature may skip to Section 2.
3. We use the term "positive" for a quantity x to indicate x ≥ 0. When x > 0 we say that x is "strictly positive".
4. Note that, since neither the light source nor the viewer move, we do not make a distinction between radiance and reflectance in this paper. This corresponds to assuming that the appearance of the surface does not change when seen from different points on the lens. This is the case when the aperture is negligible relative to the distance from the scene.
5. A surface is called Lambertian if its bidirectional reflectance distribution function is independent of the outgoing direction (and, by the reciprocity principle, of the incoming direction as well). An intuitive notion is that a point on a Lambertian surface appears equally bright from all viewing directions.
6. The term distribution is used in the sense of the theory of distributions (see for instance Friedlander and Joshi, 1998), and not in the sense of probability.
7. When there are no positivity constraints, Csiszár argues that the only consistent choice of discrepancy criterion is the L2 norm, which we have addressed in Soatto and Favaro (2000).
8. Moreover, if the relationship ρ_j = ρ(z, u_j) is known only up to a certain number of parameters of the optics, then, given a large enough number n of images, we may be able to recover the distance of the plane alongside the parameters of the optics: this could lead to an auto-calibration of the model.
9. By saying that the images are defocused and different we mean ρ_i > 0, and ρ_i ≠ ρ_j when i ≠ j.
10. Incidentally, this tells us that, for many values of f, it is possible to make explicit the dependence of ρ_i on r̂(f): this will be mostly useful in studying how the noise affects the solution to (47).

References

Asada, N., Fujiwara, H., and Matsuyama, T. 1998. Seeing behind the scene: Analysis of photometric properties of occluding edges by reversed projection blurring model. IEEE Trans. Pattern Analysis and Machine Intelligence, 20:155–167.
Chan, T.F. and Wong, C.K. 1998. Total variation blind deconvolution. IEEE Transactions on Image Processing, 7(3):370–375.
Chaudhuri, S. and Rajagopalan, A.N. 1999. Depth from Defocus: A Real Aperture Imaging Approach. Springer-Verlag: Berlin.
Cover, T.M. and Thomas, J.A. 1991. Elements of Information Theory. Wiley Interscience.
Csiszár, I. 1991. Why least squares and maximum entropy? An axiomatic approach to inverse problems. Ann. of Stat., 19:2033–2066.
Ens, J. and Lawrence, P. 1993. An investigation of methods for determining depth from focus. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(2):97–108.
Farid, H. and Simoncelli, E.P. 1998. Range estimation by optical differentiation. Journal of the Optical Society of America A, 15(7):1777–1786.
Favaro, P. and Soatto, S. 2000. Shape and radiance estimation from the information divergence of blurred images. In Proc. European Conference on Computer Vision, 1:755–768.
Friedlander, G. and Joshi, M. 1998. Introduction to the Theory of Distributions. Cambridge University Press: Cambridge.
Girod, B. and Scherock, S. 1989. Depth from focus of structured light. In SPIE, pp. 209–215.
Gokstorp, M. 1994. Computing depth from out-of-focus blur using a local frequency representation. In International Conference on Pattern Recognition, Vol. A, pp. 153–158.
Harikumar, G. and Bresler, Y. 1999. Perfect blind restoration of images blurred by multiple filters: Theory and efficient algorithms. IEEE Transactions on Image Processing, 8(2):202–219.
Hopkins, H.H. 1955. The frequency response of a defocused optical system. Proc. R. Soc. London Ser. A, 231:91–103.
Hwang, T.L., Clark, J.J., and Yuille, A.L. 1989. A depth recovery algorithm using defocus information. In Computer Vision and Pattern Recognition, pp. 476–482.
Kalifa, J., Mallat, S., and Rouge, B. 1998. Image deconvolution in mirror wavelet bases. In Proc. International Conference on Image Processing, p. 98.
Kundur, D. and Hatzinakos, D. 1998. A novel blind deconvolution scheme for image restoration using recursive filtering. IEEE Transactions on Signal Processing, 46(2):375–390.
Luenberger, D. 1968. Optimization by Vector Space Methods. Wiley: New York.
Marshall, J.A., Burbeck, C.A., Ariely, D., Rolland, J.P., and Martin, K.E. 1996. Occlusion edge blur: A cue to relative visual depth. Journal of the Optical Society of America A, 13(4):681–688.
Mennucci, A. and Soatto, S. 1999. The accommodation cue, part 1: Modeling. ESSRL Technical Report 99-001, Washington University.
Murase, H. and Nayar, S. 1995. Visual learning and recognition of 3d objects from appearance. Intl. J. Computer Vision, 14(1):5–24.
Nair, H.N. and Stewart, C.V. 1992. Robust focus ranging. In Computer Vision and Pattern Recognition, pp. 309–314.
Namba, M. and Ishida, Y. 1998. Wavelet transform domain blind deconvolution. Signal Processing, 68(1):119–124.
Nayar, S.K. and Nakagawa, Y. 1990. Shape from focus: An effective approach for rough surfaces. In IEEE International Conference on Robotics and Automation, pp. 218–225.
Nayar, S.K., Watanabe, M., and Noguchi, M. 1995. Real-time focus range sensor. In Proc. International Conference on Computer Vision, pp. 995–1001.
Neelamani, R., Choi, H., and Baraniuk, R. 1999. Wavelet-domain regularized deconvolution for ill-conditioned systems. In Proc. International Conference on Image Processing, pp. 58–72.
Noguchi, M. and Nayar, S.K. 1994. Microscopic shape from focus using active illumination. In International Conference on Pattern Recognition, pp. 147–152.
Pentland, A. 1987. A new sense for depth of field. IEEE Trans. Pattern Anal. Mach. Intell., 9:523–531.
Pentland, A., Darrell, T., Turk, M., and Huang, W. 1989. A simple, real-time range camera. In Computer Vision and Pattern Recognition, pp. 256–261.
Pentland, A., Scherock, S., Darrell, T., and Girod, B. 1994. Simple range cameras based on focal error. Journal of the Optical Society of America A, 11(11):2925–2934.
Rajagopalan, A.N. and Chaudhuri, S. 1995. A block shift-variant blur model for recovering depth from defocused images. In Proc. International Conference on Image Processing, pp. 636–639.
Rajagopalan, A.N. and Chaudhuri, S. 1997. Optimal selection of camera parameters for recovery of depth from defocused images. In Computer Vision and Pattern Recognition, pp. 219–224.
Rajagopalan, A.N. and Chaudhuri, S. 1998. Optimal recovery of depth from defocused images using an MRF model. In Proc. International Conference on Computer Vision, pp. 1047–1052.
Schechner, Y.Y. and Kiryati, N. 1999. The optimal axial interval in estimating depth from defocus. In IEEE Proc. International Conference on Computer Vision, Vol. II, pp. 843–848.
Schechner, Y.Y., Kiryati, N., and Basri, R. 1998. Separation of transparent layers using focus. In IEEE Proc. International Conference on Computer Vision, pp. 1061–1066.
Scherock, S. 1981. Depth from focus of structured light. Technical Report 167, Media-Lab, MIT.
Schneider, G., Heit, B., Honig, J., and Bremont, J. 1994. Monocular depth perception by evaluation of the blur in defocused images. In Proc. International Conference on Image Processing, Vol. 2, pp. 116–119.
Simoncelli, E.P. and Farid, H. 1996. Direct differential range estimation using optical masks. In European Conference on Computer Vision, Vol. II, pp. 82–93.
Snyder, D.L., Schulz, T.J., and O'Sullivan, J.A. 1992. Deblurring subject to nonnegativity constraints. IEEE Transactions on Signal Processing, 40(5):1142–1150.
Soatto, S. and Favaro, P. 2000. A geometric approach to blind deconvolution with application to shape from defocus. In Proc. IEEE Computer Vision and Pattern Recognition, 2:10–17.
Subbarao, M. and Surya, G. 1994. Depth from defocus: A spatial domain approach. International Journal of Computer Vision, 13(3):271–294.
Subbarao, M. and Wei, T.C. 1992. Depth from defocus and rapid autofocusing: A practical approach. In Computer Vision and Pattern Recognition, pp. 773–776.
Taylor, M. 1996. Partial Differential Equations (Volume I: Basic Theory). Springer Verlag: Berlin.
Watanabe, M. and Nayar, S.K. 1996a. Minimal operator set for passive depth from defocus. In Computer Vision and Pattern Recognition, pp. 431–438.
Watanabe, M. and Nayar, S.K. 1996b. Telecentric optics for computational vision. In European Conference on Computer Vision, Vol. II, pp. 439–445.
Xiong, Y. and Shafer, S.A. 1995. Moment filters for high precision computation of focus and stereo. In Proc. of International Conference on Intelligent Robots and Systems, pp. 108–113.
Ziou, D. 1998. Passive depth from defocus using a spatial domain approach. In Proc. International Conference on Computer Vision, pp. 799–804.