
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 3, MARCH 2015

Estimation of Sunlight Direction Using 3D Object Models

Yang Liu, Theo Gevers, Member, IEEE, and Xueqing Li

Abstract— The direction of sunlight is an important informative cue in a number of applications in image processing, such as augmented reality and object recognition. In general, existing methods to estimate the direction of the sunlight rely on different image features (e.g., sky, texture, shadows, and shading). These features can be considered as weak informative cues as no single feature can reliably estimate the sunlight direction. Moreover, existing methods may require that the camera parameters are known, limiting their applicability. In this paper, we present a new method to estimate the sunlight direction from a single (outdoor) image by inferring cast shadows through object modeling and recognition. First, objects (e.g., cars or persons) are (automatically) recognized in images by exemplar-SVMs. Instead of training the Support Vector Machines (SVMs) using natural images (limited variation in viewpoints), we propose to train on 2D object samples generated from 3D object models. Then, the recognized objects are used as sundial cues (probes) to estimate the sunlight direction by inferring the corresponding shadows generated by 3D object models considering different illumination directions. We demonstrate the effectiveness of our approach on synthetic and real images. Experiments show that our method estimates the azimuth angle accurately within a quadrant (smaller than 45°) and computes the zenith angle with a mean angular error of 23°.

Index Terms— Image processing, object detection, shadow detection, sunlight direction estimation.

Manuscript received April 11, 2014; revised August 4, 2014 and October 13, 2014; accepted November 27, 2014. Date of publication December 22, 2014; date of current version January 26, 2015. The associate editor coordinating the review of this manuscript and approving it for publication was Mr. Pierre-Marc Jodoin.
Y. Liu is with the Intelligent Systems Laboratory, University of Amsterdam, Amsterdam 1098 XH, The Netherlands (e-mail: y.liu@uva.nl).
T. Gevers is with the Intelligent Systems Laboratory, University of Amsterdam, Amsterdam 1098 XH, The Netherlands, and also with the Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona 08193, Spain (e-mail: th.gevers@uva.nl).
X. Li is with the School of Computer Science and Technology, Shandong University, Jinan 250000, China (e-mail: xqli@sdu.edu.cn).
Digital Object Identifier 10.1109/TIP.2014.2378032

I. INTRODUCTION

IN IMAGE processing, the direction of the sunlight is an important source of information for a number of applications such as augmented reality and object recognition. For example, for augmented reality, objects may appear unrealistic without natural shadowing cues, see Fig. 1. To generate more realistic shading (Fig. 1(d)), the sunlight direction is an important cue.

Fig. 1. Considering an input image (a), the aim is to estimate the sunlight direction with the sun dial inserted (b). For the purpose of augmented reality, an object is inserted (c) with corresponding shading cues (d).

Existing methods compute the illumination (sunlight) direction from shading and specular reflections. They can be divided into two classes: (3D object-based) [1], [2] using shading cues from 3D object geometry to estimate the light source direction. The work by [3] and [4] deals with inaccurate object geometry. Other methods use multiple cues [5]–[8]. Instead of considering the 3D object geometry, these methods estimate the sunlight direction using multiple cues such as the sky, shadows, shading and texture patterns. The disadvantage of the above methods is that they require either the 3D geometry of the scene or the camera parameters, limiting their applicability.

In this paper, the aim is to estimate the sunlight direction from a single (outdoor) image without requiring any prior information. The only assumption is that images contain at least one object (e.g. a person, animal, car, train or street light) generating a cast shadow on a planar surface (e.g. the ground). These (cast) shadows are used to infer the sunlight direction. Hence, objects are used as sundial cues to estimate the sunlight direction by inferring the correspondence between objects and their cast shadows. To achieve this, objects are automatically detected by the DPM detector [9]. The object category is obtained using exemplar-SVMs [10]. As it is hard to obtain cast shadow ground-truth (for different viewpoints) from real scenes, instead of training SVMs on (real) images, we propose to train on 3D (synthetic) models. In this way, (synthetic) training images are generated for a wide variety of viewpoints. Through 3D object modeling and detection, the object category is obtained as well as the viewpoint. The sunlight direction is estimated by matching the detected shadows with shadow models generated under different camera viewpoints and directions of the sunlight.

In summary, we propose a new 3D model inference approach to estimate the sunlight direction from a single outdoor image. To achieve this, we combine the following contributions: (1) locating objects using the DPM detector [9] and classifying their category using exemplar-SVMs [10]; (2) training the exemplar-SVMs on (synthetic) images generated from 3D models and a wide range of viewpoints; (3) generating shadow models from 3D models for different directions of the light; and (4) estimating the sunlight direction by a model inference method.

The remainder of this paper is organized as follows. In Section II, we give a brief overview of the related work. In Section III, we describe the method generating shadow models and present our sunlight direction estimation method. In Section IV, experimental results are presented.

II. RELATED WORK

In this section, we discuss the methods that are most relevant to our work, including object recognition, shadow detection and illumination estimation.

A. Viewpoint-Independent Object Detection Methods

These methods can be divided into two categories. The first group starts from a collection of training images taken from objects with varying viewpoints [11]–[13]. Then, a classifier is trained for each viewpoint. In general, to obtain the ground-truth per viewpoint, these methods may need human intervention to determine the correspondence between training images and the different viewpoints. Moreover, in most cases, it is difficult to obtain recordings from different viewpoints for the same object class, resulting in a limited range of viewpoints (a limited and unbalanced training set). To overcome these drawbacks, the second group is based on 3D object modeling [14]–[18]. [15] generates shape models from 3D CAD data and establishes the correspondence between 3D CAD models and real test images. Instead of using geometric features for matching, [14], [16] generate a vocabulary of photometric descriptors from existing 3D models and use these discriminative descriptors. [18] uses individual vertices and faces to describe 3D deformable objects, which makes it possible to reason about the locations and shapes of objects. [17] collects images with corresponding 3D descriptions of the full 3D geometry of a scene, given 2D projections from low-cost RGBD cameras (Microsoft Kinect).

The first type of viewpoint-independent object detection methods is simple and efficient but is limited by the amount and nature of the training data. The second type maps 3D objects onto 2D images, solving the problem of constrained training data. However, these methods use complex training models which are time-consuming. Therefore, in contrast to previous work, we combine the two types of methods by a 3D model inference approach using exemplar-SVMs [10]. Our method generates different viewpoints of objects from 3D models and uses each generated instance as an exemplar. This implies that our method directly transfers the 3D geometry and viewpoint during the detection process, and there is no need to train complex models.

B. Shadow Detection Methods

These methods detect and/or remove shadows from a single image [19]–[23]. The approach by [19] is based on a physical model of illumination. To classify shadow edges, the method compares edges in the original RGB image to edges derived from an illuminant-invariant image. Other methods detect shadows based on training images. In [20], regions are classified based on the frequency of different shading features. [21] detects contours in images and then classifies them into shadow edges. [22] learns shadow features automatically using multiple convolutional networks (ConvNets) and classifies the contours into shadow edges using the learned features. All the methods mentioned above do not distinguish shadows from shading. [23] detects hard and soft shadows while ignoring surface texture and shading, using normal cues from a single RGB-D image.

In our algorithm, for the shadow detection part, we follow the approach of [21]. The difference is that, instead of incorporating the scene layout to remove false positives, we discard false positives using object locations. In this way, our approach reduces the search space in the shadow inference step.

C. Illumination Estimation Methods

These methods can be divided into three categories. The first type of methods is data-driven. [5] uses four different cues to estimate the sun position: the sky, the shadows on the ground, the shading on vertical surfaces and the appearance of humans. These weak cues are combined to yield a more robust estimate of the sun position. The second type of methods is geometry-driven. [24] uses occlusion information of the incoming light to estimate the illumination. For each pixel, the intensity value is computed from the different light sources that are not occluded. Then, a set of linear equations is obtained. The light distribution is acquired by solving these linear equations. The drawback of the method is that it requires the 3D geometry of the scene.

The last type of methods uses shadow models. [3], [4] generate shadow models and match them with real shadows using a higher-order graphical model. This method needs human intervention to adjust the camera parameters and to determine the viewpoint of the objects. In contrast to [3] and [4], our method detects the viewpoint of objects by a viewpoint-independent object detection approach. For sunlight direction estimation, we match a model with a distance transformed image (DT image) rather than an edge map as in [3] and [4]. This allows, to a certain degree, dissimilarity between the model and the shadow boundaries.

III. ESTIMATION OF SUNLIGHT DIRECTION FROM A SINGLE IMAGE

In our approach, objects are automatically detected and used as sundial cues. The sunlight direction is estimated by inferring the relationship between the detected objects and their cast shadows.
Different objects can be taken, such as city attributes (e.g. kiosks, market stalls, traffic signs), animals (e.g. dogs, cats, cows and horses), vehicles (e.g. bikes, trams and trains) and vegetation (e.g. flowers, bushes and trees). In this paper, we focus on cars and humans as they appear most frequently in outdoor (street) scenes.

The flowchart of the proposed approach to estimate the light source direction is shown in Fig. 2. It consists of three major components: (1) object detection, (2) shadow detection, and (3) sunlight direction estimation. Given an image, the method detects objects and shadow boundaries in the image and then acquires a set of shadow models according to the viewpoint of the object. To compute the sunlight direction, the shadow boundaries are matched with the shadow models.

Fig. 2. Flowchart of the sunlight direction estimation algorithm. Starting with the input image, we detect objects by combining the DPM detector with exemplar-SVMs. Shadows are detected using the method proposed by [21]. Then, the light source direction is computed by a model inference approach.

Section III-A presents the process of generating shadows from 3D models. Section III-B discusses the process of detecting objects and their viewpoint classification. Section III-C describes the detection of shadow boundaries. Section III-D focuses on the estimation of the sunlight direction.

A. Model Acquisition

A shadow occurs when an object partially or totally occludes direct lighting. Given an object and the direction of the light source (sunlight), a shadow image is defined as follows:

E = E_sun(θ, φ) S(θ, φ),    (1)

where (θ, φ) is the direction of the sunlight and E_sun(θ, φ) is the sun radiance. S(θ, φ) is the occlusion coefficient, where S(θ, φ) = 0 means total occlusion; otherwise S(θ, φ) = 1.

To generate cast shadows, the following information is needed: the 3D geometry of the object, the reflectance of the different surfaces in the scene, and the light direction. In this paper, to generate a wide and balanced set of viewpoints, 3D models are used to describe the 3D geometry of the object. Fig. 3 shows a number of 3D car and pedestrian models used in our algorithm.

Fig. 3. Examples of 3D car and pedestrian models used in our algorithm.

As we consider outdoor (street) scenes to estimate the sunlight direction, we assume that objects cast their shadows onto planar Lambertian surfaces. To this end, we compute the minimum-volume enclosing rectangular bounding box from the 3D model's mesh geometry and then normalize the 3D model. Each model is rendered from a discrete number of light directions (see Fig. 4), where V denotes the sunlight direction. It is represented by its angular position relative to the camera, V = (θ, φ), in which θ is the angle between the light direction and the normal of the ground n (zenith angle), and φ denotes the angle between the light direction and the view direction of the camera (azimuth angle). d is the distance between the model and the camera.

There are three degrees of freedom: the object-camera distance, the zenith angle and the azimuth angle. The step size is 10° for both angles. The object-camera distance is set to 10 units. The camera can move along a certain direction; this defines a different viewpoint of the 3D object, which is acquired by the corresponding exemplar. When generating shadow models, we fix the viewpoint of the camera and move the object along an axis. Finally, the standard camera model is used, with zero skew and unit pixel ratio. We assume that the object is positioned at the world coordinate origin (0,0,0). The field of view is set to a default of 40°, and the camera height is set to 90% of the normalized size of the model.
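The enumeration of light directions described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical render_shadow_mask routine (e.g. backed by an off-screen renderer) that projects the cast shadow of a 3D model onto the ground plane for one light direction; the 10° step, the 40° field of view and the fixed object-camera distance follow the settings given in the text.

```python
# Hypothetical renderer: returns a binary ground-plane shadow mask for one
# 3D model, one camera configuration and one sunlight direction.
def render_shadow_mask(model, camera, zenith_deg, azimuth_deg):
    raise NotImplementedError  # backed by an off-screen renderer in practice


def generate_shadow_models(model, step_deg=10, fov_deg=40.0, distance=10.0):
    """Enumerate discrete sunlight directions and render one shadow model each."""
    camera = {
        "fov_deg": fov_deg,              # default field of view from the text
        "distance": distance,            # fixed object-camera distance (10 units)
        "height": 0.9,                   # 90% of the normalized model size
        "skew": 0.0, "pixel_ratio": 1.0, # zero-skew, unit pixel ratio
    }
    shadow_models = {}
    for zenith in range(step_deg, 90, step_deg):      # angle to the ground normal
        for azimuth in range(0, 360, step_deg):       # angle to the viewing direction
            mask = render_shadow_mask(model, camera, zenith, azimuth)
            shadow_models[(zenith, azimuth)] = mask
    return shadow_models
```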
Fig. 4. Generating shadow models under different directions of the light source. 3D models are fixed at the origin. The viewpoint of the camera is fixed at the world coordinate origin (0,0,0) and varies only along the viewing direction. During the generation process, the light direction (both the zenith angle and the azimuth angle) is changed step by step by 10°.

Fig. 5. Generating synthetic images for training SVM models under discretization of the camera parameters (azimuth and elevation). The camera moves in both the azimuth and elevation direction, while the distance between the 3D object and the camera is fixed.

B. Object Detection

In general, object detectors provide a bounding box and a category label, such as the DPM detector [9]. However, the ensemble of exemplar-SVMs [10] is able to associate each detection with a visually similar training exemplar. The basic idea behind the method is to train a separate linear SVM classifier for every exemplar in the training set. Although this method can transfer each detection to the corresponding 3D model, it needs human intervention to associate 3D models with each exemplar. Moreover, it is very difficult to collect images containing objects of all viewpoints. Besides, as each model is tested at each position and scale in the image, the detection time is much longer than that of standard object detection methods.

In our method, objects in (2D) real images are located first using the DPM detector [9]. Then, exemplar-SVMs are used to classify each object obtained. Instead of using real images for training the classifiers, we focus on training models using synthetic images generated from 3D models under discretization of the camera parameters azimuth and elevation (see Fig. 5). The image generation process is similar to that of the shadow model generation, where the camera moves around the azimuth and elevation direction.

There are now two degrees of freedom: the azimuth and the elevation of the camera. The step size of the angle is 10°.¹ The distance between the object and the camera is not considered. The same 3D models are used as in the shadow model generation process, automatically associating 3D models with each exemplar. We normalize the 3D objects and use the camera parameters described in the shadow model generating process.

¹ For pedestrians, we only consider standing/walking persons from frontal viewpoints.
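As a rough illustration of this training scheme, the sketch below trains one linear SVM per rendered exemplar on HOG features against a shared pool of negative (background) patches, and classifies a detected patch by the best-scoring exemplar. The feature parameters and helper names are illustrative assumptions, not the exact configuration used in the paper.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_feat(image):
    # HOG descriptor of a fixed-size grayscale patch (parameters are illustrative).
    return hog(image, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def train_exemplar_svms(exemplars, negative_patches, C=1.0):
    """Train one linear SVM per synthetic exemplar (rendered view of a 3D model).

    exemplars: list of dicts {"image": patch, "model_id": ..., "azimuth": ..., "elevation": ...}
    negative_patches: list of background patches of the same size.
    """
    neg_feats = np.array([hog_feat(p) for p in negative_patches])
    classifiers = []
    for ex in exemplars:
        pos_feat = hog_feat(ex["image"])[None, :]            # a single positive sample
        X = np.vstack([pos_feat, neg_feats])
        y = np.concatenate([[1], np.zeros(len(neg_feats))])
        clf = LinearSVC(C=C, class_weight="balanced").fit(X, y)
        classifiers.append((clf, ex))                        # keep exemplar metadata (3D model, viewpoint)
    return classifiers

def classify_detection(patch, classifiers):
    """Return the exemplar whose SVM scores the detected patch highest."""
    f = hog_feat(patch)[None, :]
    scores = [clf.decision_function(f)[0] for clf, _ in classifiers]
    return classifiers[int(np.argmax(scores))][1]            # exemplar carries 3D model and viewpoint
```

Because each exemplar is a rendered view of a known 3D model, the winning exemplar directly provides the viewpoint used in the later shadow inference step.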
C. Shadow Detection

The aim of the shadow detection process is to detect cast shadow regions. Recent approaches have mainly used illumination invariants, which can fail when the quality of the images is poor. [21] presents an algorithm to automatically detect shadows cast by objects on the ground. The method is based on the observation that the types of materials constituting the ground in typical outdoor scenes are limited, most commonly including concrete, asphalt, grass, mud, stone and brick [8]. They assume that these shadows can be learned from a set of labeled images of real world scenes. In fact, they transfer the detection problem into a classification between shadow and non-shadow boundaries. As shown in [21], reflections, self-shadowing, and complex geometry are common phenomena that may confuse the classifier. In their implementation, they incorporate the scene layout to remove false positives. Instead of incorporating the scene layout to remove the false results, our approach uses the object locations that are already detected in the first step.

Algorithm 1. Pseudo code for estimating the sunlight direction from the corresponding exemplar and shadow.

D. Sunlight Direction Estimation

After obtaining the detected object, its corresponding exemplar and the candidate shadow boundaries, the sunlight direction is estimated using a model inference method.
Fig. 6 shows the flowchart of our algorithm for sunlight direction estimation.

Fig. 6. Flowchart of our algorithm to estimate the sunlight direction. Given the detected object and its corresponding exemplar, a set of shadow models is obtained. Each shadow model is matched with the detected shadow boundary in the region provided by the shadow model. The longest shadow model is considered as the final matched shadow model.

The viewpoint and the 3D model are directly acquired from the exemplar, and a set of shadow models is obtained for the corresponding viewpoint. To reduce the influence of errors in the object viewpoint detection, we also add shadow models corresponding to angles with an offset of the detected viewpoint. Instead of searching all candidate shadow boundaries in the image, only those regions are considered which are close to the object. The search area corresponds to the size of the shadow model and the object location in the image. Then, the sunlight direction is acquired by searching for the most similar shadow model in the selected region. The shadow matching function is defined by:

D(M, I) = (1/|M|) Σ_{m ∈ M} d_I(m),    (2)

where M is the shadow model and I represents the shadow boundaries in an image. |M| denotes the length of the shadow model M. d_I(m) denotes the chamfer distance between pixel m in the shadow model M and the closest shadow boundary location in I. To compute the chamfer distance, the shadow boundary image is transformed into a distance transformed image (DT image). The distance transform converts a binary image, which consists of edge feature and non-feature pixels, into an image where each pixel value denotes the distance to the nearest edge feature pixel. Fig. 7 shows a shadow boundary image and its corresponding DT image. The advantage of matching a model with the DT image rather than the edge image itself is that it allows for a certain degree of dissimilarity between the model and the shadow boundaries in the image.

Fig. 7. Shadow boundary image and its corresponding DT image. Darker color indicates that a pixel is closer to the nearest shadow boundary.

To compute the correct shadow model, the match distance is calculated between the shadow model and the shadow boundaries detected in the image. There are a number of small shadow models that can match parts of the detected shadow boundaries. As a result, we do not select the shadow model with the lowest matching score as the correct one. Instead, we set a threshold, and any shadow model with a score lower than the threshold is taken as a candidate. Then, the longest candidate is chosen as the correct shadow model. Finally, the sunlight direction is acquired from the selected shadow model (see Algorithm 1).
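A condensed sketch of this matching and selection step is given below, assuming the detected shadow boundaries are available as a binary mask and each shadow model as an array of boundary pixel coordinates already placed at the detected object location; the threshold is the free parameter analyzed in Section IV-B.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_score(dt_image, model_pixels):
    """Mean chamfer distance D(M, I) of a shadow model against the DT image (Eq. 2)."""
    rows, cols = model_pixels[:, 0], model_pixels[:, 1]
    return float(dt_image[rows, cols].mean())

def estimate_sunlight_direction(shadow_boundary_mask, shadow_models, threshold):
    """Match each shadow model to the detected boundaries and pick the longest candidate.

    shadow_boundary_mask: binary image, True on detected shadow boundary pixels.
    shadow_models: dict mapping (zenith, azimuth) -> (N, 2) array of boundary pixel coordinates.
    """
    # Distance from every pixel to the nearest detected shadow boundary pixel (DT image).
    dt_image = distance_transform_edt(~shadow_boundary_mask)

    candidates = []
    for (zenith, azimuth), pixels in shadow_models.items():
        score = chamfer_score(dt_image, pixels)
        if score < threshold:                      # keep all models scoring below the threshold
            candidates.append((len(pixels), zenith, azimuth))

    if not candidates:
        return None
    # The longest candidate shadow model determines the sunlight direction.
    _, zenith, azimuth = max(candidates)
    return zenith, azimuth
```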
IV. EXPERIMENTS

The proposed method is evaluated in three different ways. First, in Section IV-A, we quantitatively evaluate our approach using images collected under controlled illumination conditions. We investigate how the accuracy of each algorithmic component influences the overall light source direction estimation result. Second, in Section IV-B, a quantitative evaluation is conducted on a dataset of manually labeled natural images downloaded from the Internet. Then, in Section IV-C, several qualitative results are provided for real images. Finally, an application is presented of inserting a 3D object into a single 2D photograph.

A. Synthetic Images: Quantitative

We evaluate the proposed method quantitatively on a set of synthetic images.
We render 3D car models with natural backgrounds under different light directions and generate 120 images in total for testing. The models used here are different from those used in the model generation process. Fig. 8 shows a few synthetic images. We first evaluate the accuracy of the viewpoint detection on synthetic images in Section IV-A1. Then, we demonstrate the influence of each algorithmic component in Section IV-A2.

Fig. 8. Synthetic images used in the evaluation.

1) Viewpoint Detection: To evaluate the performance of the viewpoint detection method, we compute the mean error of the viewpoint detection averaged over all images in the synthetic test set. The mean error is calculated as follows:

E_mean = (1/N) Σ_{i=1}^{N} |V_i^det − V_i^gt|,    (3)

where N is the number of images in the synthetic test set, V_i^det is the detected viewpoint and V_i^gt is the ground-truth. The mean error is 36°, which is smaller than the discrete step size of 45°.² The objects considered in this paper (persons and cars) are roughly symmetric. For example, the back- and front-view of a car could generate similar shadow models under the same sunlight direction. Therefore, we also compute the mean error of the viewpoint considering object symmetry, resulting in a mean error of 21.25°. When a viewpoint is estimated for the given object, the light direction can be estimated. To do so, the detected shadows in the image are correlated with the shadows of the exemplar, computed by rendering the exemplar with several light directions sampled on the hemisphere. To increase robustness to errors in orientation estimation, the exemplar is also rotated by +20° before computing the shadows. The detected shadows are then correlated with all shadow models obtained from the original and the rotated exemplar.

² Usually in datasets for viewpoint detection, eight views are labeled, such as frontal, left, etc. The step size between these eight views is therefore 45°.

2) Influence of the Algorithmic Components: In this section, each algorithmic component is analyzed on the synthetic images generated from 3D car models. The aim is to study the contribution (failure sensitivity and discriminative power) of each component to the overall light source direction estimation. To demonstrate the effect of each component, we investigate the influence of (1) object detection and (2) shadow detection.

To test the influence of object detection, the ground-truth of object detection is manually labeled, and, to test the influence of shadow detection, we use the shadow ground-truth of the test images. The shadow ground-truth is generated using a method that is very similar to that of the shadow model generation. The object is masked out, and the shadow ground-truth is acquired using edge detection.

TABLE I. Results on synthetic images: from left to right, the mean error in estimating the sunlight direction using (1) object detection, (2) shadow detection, and (3) both components.

Table I shows the errors of the sunlight direction, averaged over all images in the synthetic test set. The accuracy of shadow detection yields the lowest mean error at 14.78°, and the accuracy of object detection yields the second lowest mean error at 23.37°. It can be derived that both shadow detection and object detection influence the estimation result. However, shadow detection has a higher impact than object detection. This is because the detected shadows are correlated with the shadow models obtained not only from the original but also from the rotated exemplar, which reduces the influence of errors in the object viewpoint detection. The mean error of the azimuth angle is 25.46°, which is smaller than 45° and thus corresponds to estimating the light direction within a quadrant.

Further, to test the influence of an increasing failure when detecting objects, we randomly generate bounding boxes for every location and scale. Then the overlap scores between the generated bounding boxes and the ground-truth are computed using a histogram of 10 bins. For each bin, a bounding box is randomly selected for further evaluation, with the object viewpoint and shadow ground-truth. Fig. 9(a) shows the influence of an increasing failure in object detection averaged over 50 images. It can be derived that the error of the sunlight direction decreases exponentially with the amount of object overlap (i.e. correct detection).
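The overlap-based sensitivity test can be summarized as in the following sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2); the intersection-over-union overlap score and the 10-bin histogram follow the protocol described above.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union overlap score of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def pick_boxes_per_overlap_bin(random_boxes, gt_box, n_bins=10, rng=None):
    """Group randomly generated boxes by their overlap with the ground-truth
    into n_bins bins and draw one box per bin for further evaluation."""
    rng = rng or np.random.default_rng(0)
    scores = np.array([iou(b, gt_box) for b in random_boxes])
    bins = np.minimum((scores * n_bins).astype(int), n_bins - 1)
    picks = {}
    for b in range(n_bins):
        idx = np.flatnonzero(bins == b)
        if idx.size:
            picks[b] = random_boxes[rng.choice(idx)]
    return picks
```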
To evaluate the influence of an increasing failure in viewpoint detection, for each object, different angle offsets are added with regard to the true viewpoint, from 0° to 360° with 10° added at each step. To measure only the viewpoint sensitivity, we keep the other components consistent. Therefore, the correct object bounding boxes are used as well as the shadow ground-truth. Fig. 9(b) shows the influence of viewpoint detection averaged over 50 images. It is shown that the azimuth angle error increases with the offset added. It achieves a local maximum around a 90° offset from the ground-truth.
Then, the angle error starts to decrease to a local minimum around 180°. This confirms the assumption that symmetrical viewpoints have little influence on the algorithm. Accuracy is low around 90° and 270° from the ground-truth. This is because the viewpoint used for evaluation is perpendicular to the ground-truth.

To evaluate the influence of increasing failures in shadow detection, more and more shadows are discarded from the shadow ground-truth. The other components are kept the same, i.e. correct object locations and viewpoints. Fig. 9(c) shows that the azimuth angle accuracy decreases linearly with the number of omitted shadows.

Fig. 9. Influence of different algorithmic components averaged over 50 images: (a) percentage of correctly detected objects, (b) percentage of correct viewpoints, (c) percentage of correct shadows. The error bars on the graph represent the standard deviation of the angle errors.

B. Real Images: Quantitative

In this subsection, our method is quantitatively tested on a set of real images. First, we introduce a new dataset with images containing objects and their corresponding shadows. Then, we analyze how the performance changes (with varying threshold) for the different object categories. Third, we consider the performance and computational cost for different sampling steps. Finally, we compare our results with [5] on the new dataset.

1) Data Set: [5] introduces a dataset taken from webcam sequences and uses the technique of [8] and [25] to estimate the position of the sun. However, only a few images from that dataset contain objects with corresponding cast shadows. Therefore, we collected a new dataset. We randomly selected 100 outdoor images from Flickr containing objects and their corresponding shadows (50 for cars and 50 for pedestrians). Then, we manually label the sunlight direction. An interactive graphical interface is used that resembles a virtual sun dial to help users label the ground-truth. The task is to select the sunlight direction in the circle part. A synthetic sun dial is provided using the selected light parameters, inserted in the original image. This makes it more convenient for human labelers to adjust the light parameters so that they align with the perceived sunlight direction. Even so, for some images, it is still difficult to provide the sunlight direction. To overcome this problem, we ask 9 persons to label the sunlight directions in the image. The mean shift method is used to cluster the labeled ground-truth. The center of the cluster with the most items is considered as the final ground-truth.
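A small sketch of this ground-truth consolidation step is shown below, assuming each of the 9 annotations is an (azimuth, zenith) pair in degrees; scikit-learn's MeanShift is used here as one possible implementation of the mean shift clustering, and the bandwidth is an illustrative choice.

```python
import numpy as np
from sklearn.cluster import MeanShift

def consolidate_labels(annotations, bandwidth=20.0):
    """Cluster per-image sunlight annotations and return the center of the
    largest cluster as the final ground-truth direction.

    annotations: (n_labelers, 2) array of (azimuth, zenith) angles in degrees.
    """
    annotations = np.asarray(annotations, dtype=float)
    ms = MeanShift(bandwidth=bandwidth).fit(annotations)
    labels, counts = np.unique(ms.labels_, return_counts=True)
    largest = labels[np.argmax(counts)]          # cluster with the most items
    return ms.cluster_centers_[largest]
```

Note that azimuth values near the 0°/360° wrap-around would need to be unwrapped before clustering.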
2) Parameter Analysis: To remove the influence of small shadow models, we first truncate the shadow models with a fixed threshold, and then select the shadow model with the largest length as our final result. In this subsection, we estimate the light direction using different thresholds for cars and pedestrians, respectively, on the full collected dataset. We use the azimuth angle error as the criterion, and the results are shown in Fig. 10 (left for cars and right for pedestrians).

Fig. 10. Light direction detection using different thresholds for cars (left) and pedestrians (right).
From Fig. 10, it is shown that similar trends exist for cars and pedestrians when the threshold changes. The azimuth angle error starts to drop when the threshold is very small. When the threshold exceeds a certain value, the angle error starts to increase. For cars, the minimal angle error is obtained at a threshold of 2.4, while for pedestrians the minimum is at 2.1. Therefore, in the following experiments, we use a threshold of 2.4 for cars and 2.1 for pedestrians.

Fig. 12. Cumulative light direction azimuth angle error for cars (left) and for pedestrians (right).

Fig. 13. Estimation results for cars (top) and pedestrians (bottom) as sundial cues, respectively. The first and third rows are results computed by the proposed method. The second and fourth rows are the ground-truth labeled by humans.

TABLE II. Results using different sampling steps for cars.

TABLE III. Results using different sampling steps for pedestrians.

3) Shadow Model Sampling: In our algorithm, we render shadow models from 3D models under a discrete number of light directions and match the generated shadow models with the detected shadows. In this subsection, we analyze to what extent the angle error and the computational cost are influenced by the sampling scheme.
Table II shows the results with the sampling step changing from 10° to 60° for cars, and Table III shows the results for pedestrians.

From Table II, it can be derived that the error of the sunlight direction increases exponentially as the number of sampled light directions decreases (i.e. as the sampling step grows), while the computational cost decreases. The same observation holds for pedestrians (see Table III).

TABLE IV. Results using DT and NMI in the light direction estimation part.

4) NMI Edge Matching: In [3] and [4], models are matched with the edge distance. In this subsection, the same matching approach is used: we match models based on the NMI (Normalized Moment of Inertia). The NMI of an image is invariant to translation, rotation and scale [26].

Given an image I(x, y), the centroid (x_c, y_c) of the image is computed first:

x_c = Σ_{x=1}^{M} Σ_{y=1}^{N} x · I(x, y) / Σ_{x=1}^{M} Σ_{y=1}^{N} I(x, y),
y_c = Σ_{x=1}^{M} Σ_{y=1}^{N} y · I(x, y) / Σ_{x=1}^{M} Σ_{y=1}^{N} I(x, y).    (4)

Then, the moment of inertia J_(x_c, y_c) around the centroid is:

J_(x_c, y_c) = Σ_{x=1}^{M} Σ_{y=1}^{N} ((x − x_c)² + (y − y_c)²) I(x, y).    (5)

Then, J_(x_c, y_c) is normalized:

NMI = √(J_(x_c, y_c)) / Σ_{x=1}^{M} Σ_{y=1}^{N} I(x, y).    (6)

Table IV shows the results using DT and NMI in the light direction estimation part of our method. For the azimuth angle, using the DT map outperforms NMI for all object categories. This also holds for the zenith angle, although slightly less so. This confirms that using the DT map allows, to a certain degree, dissimilarity between the models and the shadow boundaries in the image.
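For reference, the NMI of Eqs. (4)–(6) can be computed directly from an image array; the sketch below is a straightforward transcription of those equations.

```python
import numpy as np

def nmi(image):
    """Normalized Moment of Inertia (Eqs. 4-6) of a 2D intensity image."""
    image = np.asarray(image, dtype=float)
    total = image.sum()
    if total == 0:
        return 0.0
    # Pixel coordinate grids (1-based, as in the text).
    x = np.arange(1, image.shape[0] + 1)[:, None]
    y = np.arange(1, image.shape[1] + 1)[None, :]
    # Centroid (Eq. 4).
    xc = (x * image).sum() / total
    yc = (y * image).sum() / total
    # Moment of inertia around the centroid (Eq. 5).
    j = (((x - xc) ** 2 + (y - yc) ** 2) * image).sum()
    # Normalization (Eq. 6).
    return np.sqrt(j) / total
```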
5) Comparison: In this subsection, we carry out quantitative experiments with Lalonde's approach [5] on our new dataset. Table V shows the comparison between Lalonde's approach and our method. The mean error of the azimuth angles is 82.68° for [5], while ours is 40.74°. Our method outperforms Lalonde's approach for the azimuth angles.

Further, Lalonde proposed a pedestrian-based illumination predictor which is based on the shading information generated by the human body. This is also a weak cue for light direction estimation. Therefore, we compare our method with their pedestrian-based illumination predictor here. Our method outperforms the pedestrian-based illumination predictor (see Table V).

TABLE V. Light direction estimation results on the real image dataset of our method (left), the Lalonde approach (middle) and the Lalonde approach using only pedestrians (right).

Fig. 11 shows the cumulative light direction azimuth error for the approach of Lalonde and for our method using the DT map and NMI, respectively. It can be derived that our method outperforms the approach of Lalonde on the new image set. This also holds for cars (see Fig. 12, left) and for pedestrians (see Fig. 12, right).

Fig. 11. Cumulative light direction azimuth angle error for different methods.

C. Real Images: Qualitative

Fig. 13 shows several example results of applying our algorithm on real images taken from our dataset. The results show that the estimations are close to the ground-truth as labeled by humans. The relationship between the objects and their cast shadows provides a strong cue to estimate the azimuth angle. For images containing an object and its cast shadow, the algorithm provides accurate results, even for images with occluded cast shadows.

Fig. 14 shows the comparison between our method and the method of [5]. The results show that both methods estimate the sunlight direction reasonably well. However, when shadows have wrong directions, the results of [5] are poor.

Fig. 14. Comparison to Lalonde [5]. The blue lines in the left images demonstrate the sunlight direction detected by our method, while the blue lines in the right images demonstrate the sunlight direction detected by Lalonde's method. The black lines in all images demonstrate the ground-truth labeled by humans. The quantitative errors for both the azimuth and zenith angles are shown at the bottom right of the image.
Our algorithm is based on the observation that the relationship between the object and its cast shadow provides sufficient information for sunlight direction estimation. Fig. 15 shows typical failure cases. In the first image, although the shadow boundary is correctly detected, shadow boundaries generated by other objects are also detected. For example, the shadow boundary at the end of the car is considered as providing a cue to predict the sunlight direction as it is matched with a longer shadow model. In the second image, the object is not correctly detected. In our algorithm, the shadow model is scaled according to the bounding box enclosing the object. The inaccuracy of the bounding box detection may lead to errors in the sunlight direction estimation.

Fig. 15. Typical failure cases. (a) object detection result; (b) shadow boundary detection result; (c) ground-truth labeled by a human; (d) result using our method.

D. Application

The application is to insert an object into a single 2D photograph with the corresponding light direction. We use a simple method to generate shadows with a consistent direction. First, we render a 3D object under the direction estimated by our approach. Second, we acquire scale factors between the pixel values in the shadow region and those in the shadow's surrounding area in each channel. Third, we merge the corresponding shadow areas of the 3D model into the image. The shadow is darkened with the scaling factors. Then the shadow region is smoothed using a Gaussian function to generate a soft shadow. Finally, the 3D model is rendered (see Fig. 16). We use the same camera parameters as the matched shadow model. Another example is shown in Fig. 1.

Fig. 16. 3D object insertion. From estimating the light direction by our proposed method (a), we generate the 3D model and its corresponding shadow area (b). Then, we render the real image with the generated shadow area (c). In the end, the 3D object is rendered into the real scene (d).
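The shadow compositing step can be sketched as follows, assuming the renderer provides a binary mask of the synthetic object's cast shadow in image coordinates and a detected real shadow is available to estimate the per-channel scale factors; the dilation width and blur sigma are illustrative choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, binary_dilation

def composite_cast_shadow(image, synth_shadow_mask, real_shadow_mask, sigma=3.0):
    """Darken the synthetic shadow area consistently with a real shadow in the image.

    image: HxWx3 float array in [0, 1].
    synth_shadow_mask: bool mask of the inserted object's cast shadow.
    real_shadow_mask: bool mask of a detected real shadow used to estimate the scale factors.
    """
    img = image.astype(float).copy()

    # Per-channel scale factor: shadow intensity relative to its surrounding area.
    surround = binary_dilation(real_shadow_mask, iterations=15) & ~real_shadow_mask
    scale = img[real_shadow_mask].mean(axis=0) / (img[surround].mean(axis=0) + 1e-6)

    # Soft shadow matte: smooth the binary mask with a Gaussian.
    matte = gaussian_filter(synth_shadow_mask.astype(float), sigma)[..., None]

    # Blend: inside the matte, darken each channel by its scale factor.
    shadowed = img * scale[None, None, :]
    return matte * shadowed + (1.0 - matte) * img
```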
V. CONCLUSION

In this paper, we have proposed a method to estimate the sunlight direction from a single image. We used objects (such as cars and pedestrians) in scenes as sundial cues and estimated the sunlight direction through the inference of objects and their cast shadows. The DPM detector [9] has been combined with exemplar-SVMs [10]. The light direction has been estimated by matching the detected shadows with the shadow models generated from 3D models.

The experiments show that our approach is able to estimate the azimuth angle effectively, with an accuracy within a quadrant (smaller than 45°), and to infer the zenith angle reasonably well. In the future, we plan to involve more objects, such as motorbikes, traffic signs and buildings that often appear in outdoor scenes, to estimate the sunlight direction.

REFERENCES

[1] Y. Wang and D. Samaras, “Estimation of multiple directional light sources for synthesis of augmented reality images,” Graph. Models (Special Issue Pacific Graph.), vol. 65, no. 4, pp. 38–47, Jul. 2003.
[2] W. Zhou and C. Kambhamettu, “A unified framework for scene illuminant estimation,” Image Vis. Comput., vol. 26, no. 3, pp. 415–429, Mar. 2008.
[3] A. Panagopoulos, D. Samaras, and N. Paragios, “Robust shadow and illumination estimation using a mixture model,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2009, pp. 651–658.
[4] A. Panagopoulos, C. Wang, D. Samaras, and N. Paragios, “Illumination estimation and cast shadow detection through a higher-order graphical model,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2011, pp. 673–680.
[5] J.-F. Lalonde, A. A. Efros, and S. G. Narasimhan, “Estimating natural illumination from a single outdoor image,” in Proc. IEEE 12th Int. Conf. Comput. Vis., Oct. 2009, pp. 183–190.
[6] C. B. Madsen and B. B. Lal, Augmented Reality. Rijeka, Croatia: InTech, 2010.
[7] J.-F. Lalonde, D. Hoiem, A. A. Efros, C. Rother, J. Winn, and A. Criminisi, “Photo clip art,” ACM Trans. Graph., vol. 26, no. 3, Aug. 2007, Art. ID 3.
[8] J.-F. Lalonde, S. G. Narasimhan, and A. A. Efros, “What do the sun and the sky tell us about the camera?” Int. J. Comput. Vis., vol. 88, no. 1, pp. 24–51, May 2010.
[9] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1627–1645, Sep. 2010.
[10] T. Malisiewicz, A. Gupta, and A. A. Efros, “Ensemble of exemplar-SVMs for object detection and beyond,” in Proc. IEEE Int. Conf. Comput. Vis., Nov. 2011, pp. 89–96.
[11] M. Arie-Nachimson and R. Basri, “Constructing implicit 3D shape models for pose estimation,” in Proc. IEEE 12th Int. Conf. Comput. Vis., Sep./Oct. 2009, pp. 1341–1348.
[12] M. Ozuysal, V. Lepetit, and P. Fua, “Pose estimation for category specific multiview object localization,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2009, pp. 778–785.
[13] H. Su, M. Sun, L. Fei-Fei, and S. Savarese, “Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories,” in Proc. IEEE 12th Int. Conf. Comput. Vis., Sep./Oct. 2009, pp. 213–220.
[14] J. Liebelt, C. Schmid, and K. Schertler, “Viewpoint-independent object class detection using 3D feature maps,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2008, pp. 1–8.
[15] M. Stark, M. Goesele, and B. Schiele, “Back to the future: Learning shape models from 3D CAD data,” in Proc. Brit. Mach. Vis. Conf., Sep. 2010, pp. 1–11.
[16] J. Schels, J. Liebelt, and R. Lienhart, “Learning an object class representation on a continuous viewsphere,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2012, pp. 3170–3177.
[17] S. Satkin and M. Hebert, “3DNN: Viewpoint invariant 3D geometry matching for scene understanding,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2014, pp. 1873–1880.
[18] M. Z. Zia, M. Stark, and K. Schindler, “Are cars just 3D boxes?—Jointly estimating the 3D shape of multiple objects,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2014, pp. 3678–3685.
[19] G. D. Finlayson, M. S. Drew, and C. Lu, “Entropy minimization for shadow removal,” Int. J. Comput. Vis., vol. 85, no. 1, pp. 35–57, Oct. 2009.
[20] R. Guo, Q. Dai, and D. Hoiem, “Single-image shadow detection and removal using paired regions,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2011, pp. 2033–2040.
[21] J.-F. Lalonde, A. A. Efros, and S. G. Narasimhan, “Detecting ground shadows in outdoor consumer photographs,” in Proc. Eur. Conf. Comput. Vis., 2010, pp. 322–335.
[22] S. H. Khan, M. Bennamoun, F. Sohel, and R. Togneri, “Automatic feature learning for robust shadow detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2014, pp. 1939–1946.
[23] Y. Xiao, E. Tsougenis, and C.-K. Tang, “Shadow removal from single RGB-D images,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2014, pp. 3011–3018.
[24] I. Sato, Y. Sato, and K. Ikeuchi, “Illumination from shadows,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 3, pp. 290–300, Mar. 2003.
[25] J.-F. Lalonde, S. G. Narasimhan, and A. A. Efros, “What does the sky tell us about the camera?” in Proc. 10th Eur. Conf. Comput. Vis., 2008, pp. 354–367.
[26] X. Yang, G. Fu, D. Miao, and W. Zhang, “A new approach to target recognition based on image NMI feature,” Comput. Eng., vol. 6, no. 28, pp. 149–151, 2002.

Yang Liu received the B.S. and Ph.D. degrees from the Department of Computer Science, Shandong University, Jinan, China, in 2008 and 2013, respectively. His research interests are in the areas of scene understanding, color constancy, and 3D scene reconstruction.

Theo Gevers (M'01) is currently a Full Professor of Computer Vision with the University of Amsterdam (UvA), Amsterdam, The Netherlands, and a Full Professor with the Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, Spain. He is a Founder and Chief Scientific Officer with Sightcorp, a spinoff of the Intelligent Systems Laboratory, UvA. He is a Founder and Chief Executive Officer with 3DUniversum. His main research interests are in the fundamentals of image understanding, object recognition, and color in computer vision. He is interested in different aspects of human behavior, in particular, emotion recognition. He is the Chair of various conferences and an Associate Editor of the IEEE TRANSACTIONS ON IMAGE PROCESSING. He is a Program Committee Member for a number of conferences and an Invited Speaker at major conferences. He is also a Lecturer delivering post-doctoral courses at various major conferences, including the IEEE Conference on Computer Vision and Pattern Recognition, the International Conference on Pattern Recognition, SPIE, and Computer Graphics, Imaging, and Vision.

Xueqing Li is currently a Professor with the School of Computer Science and Technology, Shandong University, Jinan, China, where he received the B.Sc., M.Eng., and Ph.D. degrees from the Department of Computer Science, in 1987, 1990, and 2002, respectively. His current research interests include human–computer interaction, virtual reality, computer graphics, image processing, and computer geometry. He has authored over 20 papers in essential periodicals both at home and abroad, a number of which are indexed by SCI and EI. He participates in compiling computer graphics and human technology.
