
Computer Vision and Image Understanding 115 (2011) 1384–1394. doi:10.1016/j.cviu.2011.06.004


Multiscale illumination normalization for face recognition using dual-tree complex wavelet transform in logarithm domain

Haifeng Hu
School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510275, PR China
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
E-mail address: huhaif@mail.sysu.edu.cn

Article history: Received 14 July 2010; accepted 11 June 2011; available online 21 June 2011.

Keywords: Face recognition; Illumination normalization; Dual-tree complex wavelet transform; Logarithm transform

Abstract

A new illumination normalization approach based on the multiscale dual-tree complex wavelet transform (DT-CWT) is presented for variable lighting face recognition. In our method, low resolution wavelet coefficients are truncated to minimize variations under different lighting conditions. On the other hand, the coefficients in the directional subbands are used for extracting the invariant facial features. In order to reduce halo artifacts, a new conduction function is presented which can smooth the high-frequency illumination discontinuities while keeping most of the facial features unimpaired. The histogram remapping technique is employed on the normalized image, which remaps the histogram into a normal distribution and thus improves the recognition performance. Experimental results on the Yale B and CMU PIE databases show that the proposed method achieves satisfactory performance for recognizing face images under large illumination variations.

© 2011 Elsevier Inc. All rights reserved.

1. Introduction

Face recognition has attracted many researchers in the areas of pattern recognition, machine learning, and computer vision because of its immense application potential. Numerous methods have been proposed in the last two decades. However, substantial challenging problems remain unsolved. One of the critical issues is how to recognize faces across large illumination changes [1]. It has been proven, both experimentally [2] and theoretically [3], that in face recognition the variations caused by illumination are more significant than the inherent differences between individuals.

Many approaches have been proposed to handle the illumination problem. In [4,5], the illumination cone method is proposed, where human faces are approximated by low-dimensional linear subspaces. In [6], a 9D linear subspace is used to approximate the set of images obtained under a variety of lighting conditions. In [7], a linear subspace model is presented which can segment the image into regions that have surface normals whose directions are close to each other. The main drawback of these approaches is that they require knowledge either about the light source or a large volume of training data, which limits their applications in practical recognition systems. On the other hand, there are alternative methods available to normalize the images so that they appear stable under different lighting conditions. In [8], the quotient image (QI) method is presented, which can extract an illumination-normalized representation of an image using the albedo information. In [9], the self-quotient image (SQI) is introduced, which can extract the intrinsic lighting properties of the image. In [10], Gross and Brajovic propose a pre-processing algorithm that extracts the albedo information through an estimation of the illumination field. In [11], Xie and Lam propose a related approach where the intensity of the pixels is locally normalized using a 7 × 7 window. These methods share the advantages that they do not require training images and have relatively low computational complexity. However, they may fail when recognizing face images with strong shadows cast by a direct light source. In this case, shadow boundaries may create abrupt illumination discontinuities and, hence, halo artifacts are often visible in the normalized image.

Recently, some researchers have attempted to solve the illumination problem using invariant features extracted from facial images. In [12], the logarithmic total variation (LTV) model is presented, where the illumination invariant facial structure is obtained by factorizing a single face image with the LTV model. In [13], a wavelet-based method is proposed which can detect key facial features by using wavelet shrinkage techniques. In [14], Chen et al. propose a discrete cosine transform (DCT) method for compensating illumination variations. In this method, the coefficients corresponding to low-level frequencies are discarded so as to minimize the illumination variations. The results show the DCT model can achieve satisfactory recognition rates under some conditions.

There are two problems when applying the DCT method to variable lighting face recognition. One is that an appropriate number of DCT coefficients must be selected for truncation: if too small, the illumination variation will be preserved after the normalization process; if too large, the key facial features will be lost. The other is that the effect of illumination discontinuities is not considered. As the illumination discontinuities of shadow boundaries mainly lie in the high-frequency band, they will be preserved if only low-frequency coefficients are discarded. As a result, the DCT method cannot avoid a decrease in recognition rates when dealing with images under severe lighting conditions, which has been verified by our experimental results.

Motivated by Chen's work, a new illumination normalization method is presented in this paper. In our method, the coefficients of the low resolution subbands are truncated to minimize the variations under different lighting conditions. Moreover, the illumination invariant features are extracted from the directional subbands. In order to reduce the halo artifacts, a new discontinuity measure and conduction function are devised which can smooth the illumination discontinuities while keeping the key facial features unimpaired. Finally, the histogram remapping technique is employed to map the normalized image into a normal distribution, which may improve recognition performance. Experimental results on the Yale B and CMU PIE databases show that, compared with some well-known illumination normalization techniques, our approach achieves better performance for recognizing face images under variable lighting conditions.

The organization of this paper is as follows. Section 2 briefly describes some related work. In Section 3, we present the DT-CWT based illumination normalization method. Sections 4 and 5 show the experimental results and our conclusion.
2. Related works

2.1. Dual-tree complex wavelet transform

In the DT-CWT, two real discrete wavelet transforms ψ_h(t) and ψ_g(t) are employed in parallel to generate the real and imaginary parts of the complex wavelet ψ(t) := ψ_h(t) + jψ_g(t) separately [15]. Note that ψ(t) is approximately analytic; equivalently, ψ_g(t) is approximately the Hilbert transform of ψ_h(t), i.e., ψ_g(t) ≈ H{ψ_h(t)}. In the 2-D scenario, the DT-CWT is obtained by filtering an image separably: two trees are used for the rows of the image and two trees for the columns, in a quad-tree structure with 4:1 redundancy. The four quad-tree components of each DT-CWT coefficient are combined by simple arithmetic sum and difference operations to yield a pair of complex coefficients. This produces six directionally selective subbands for each scale, oriented at approximately ±15°, ±45° and ±75° [15]. The impulse responses of the filters for the six directional subbands are shown in Fig. 1.

Since the DT-CWT produces complex coefficients (Q^{s,o}, C^{s,o}) for each directional subband at each scale, we define the magnitude of the coefficients as

M^{s,o}(x, y) = sqrt( Q^{s,o}(x, y)² + C^{s,o}(x, y)² )    (1)

where s refers to the scale, o ∈ {1, ..., 6} indexes the six subbands oriented in the ±15°, ±45° and ±75° directions, and M^{s,o} is the magnitude of the coefficients of subband o at scale s.
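As a concrete illustration, the magnitude in Eq. (1) is simply the modulus of each complex subband coefficient. The following sketch (a minimal example in Python, assuming the open-source `dtcwt` package rather than any code by the authors) computes a three-level DT-CWT of an image and the magnitudes M^{s,o}:

```python
import numpy as np
import dtcwt  # open-source DT-CWT implementation (an assumed tooling choice)

def dtcwt_magnitudes(image, nlevels=3):
    """Return the lowpass band and the magnitudes M^{s,o} of Eq. (1).

    pyramid.highpasses holds one complex array per scale s, shaped
    (H_s, W_s, 6); the six slices are the +-15, +-45, +-75 degree
    subbands, so |Z| = sqrt(Re^2 + Im^2) is exactly Eq. (1).
    """
    transform = dtcwt.Transform2d()
    pyramid = transform.forward(image.astype(float), nlevels=nlevels)
    magnitudes = [np.abs(hp) for hp in pyramid.highpasses]
    return pyramid.lowpass, magnitudes
```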
2.2. Histogram remapping

The histogram remapping technique is regularly used as a preprocessing step for face recognition. For example, histogram equalization is one of the most frequently used histogram remapping techniques. With the property of improving contrast and simultaneously compensating for the illumination-introduced variations in the appearance of facial images, histogram equalization can ensure enhanced and more robust recognition performance. As non-uniform distribution mapping seems more suitable than the uniform one for the face recognition task [16], normal distribution mapping is adopted in this paper, which remaps the facial image into a normal distribution.

The first step common to all histogram remapping techniques is the rank transform, which renders the histogram of the given image approximately uniform. Here, each pixel value in an N-dimensional image I(x, y) is replaced with the rank H the pixel would correspond to if the image pixels were ordered in an ascending manner. Then the general mapping function to match the target distribution f(x) is calculated as

(N − H + 0.5) / N = ∫_{−∞}^{t} f(x) dx    (2)

where f(x) is the normal distribution density

f(x) = (1 / (σ√(2π))) exp( −(x − μ)² / (2σ²) )    (3)

where μ denotes the mean value and σ > 0 represents the standard deviation. As the facial images are normalized to zero mean and unit variance, μ = 0 and σ = 1.

Let F(x) be the target cumulative distribution function (CDF) and u be the scalar on the left-hand side of Eq. (2). The mapped value t is found by

t = F^{−1}(u)    (4)

where F^{−1} denotes the inverse of the CDF.

A visual example of an image transformed using the normal distribution is shown in Fig. 2.
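A direct implementation of Eqs. (2)-(4) is a rank transform followed by the inverse normal CDF. The sketch below is an illustrative reading of this procedure, using SciPy's `rankdata` for H and `norm.ppf` as F^{−1}; the tie-handling choice is our assumption:

```python
import numpy as np
from scipy.stats import norm, rankdata

def remap_to_normal(image):
    """Histogram remapping to N(0, 1) following Eqs. (2)-(4)."""
    n = image.size
    ranks = rankdata(image.ravel(), method='average')  # H: ascending ranks 1..N
    u = (n - ranks + 0.5) / n                          # left-hand scalar of Eq. (2)
    return norm.ppf(u).reshape(image.shape)            # Eq. (4) with mu = 0, sigma = 1
```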

Fig. 1. Complex dual-tree 2D wavelets and corresponding labels.



Fig. 2. A sample image and its histogram before (upper row) and after (lower row) mapping the histogram to a normal distribution.

3. The proposed method

A new method is presented in this section for extracting the illumination invariant from the directional subbands in the wavelet domain. In our method, a new discontinuity measure and conduction function are devised which can reduce the effect of illumination discontinuities while keeping most of the facial features unimpaired.

3.1. Illumination invariant extraction from the directional subbands

In a simple situation, a face image I(x, y) under illumination conditions follows the model [17]:

I(x, y) = R(x, y) · L(x, y)    (5)

where L(x, y) is the luminance and R(x, y) is the reflectance at each point (x, y). As R(x, y) relates solely to the objects in an image [18], it is a stable characteristic of facial features and thus can be regarded as an illumination invariant measure.

Taking the logarithm transform of (5), we have

I′(x, y) = R′(x, y) + L′(x, y)    (6)

where I′(x, y) = log I(x, y), R′(x, y) = log R(x, y) and L′(x, y) = log L(x, y).

Suppose L̃′(x, y) is the desired uniform illumination replacing L′(x, y); then we have

Ĩ′(x, y) = R′(x, y) + L̃′(x, y)
         = R′(x, y) + L′(x, y) + C′(x, y)    (7)
         = I′(x, y) + C′(x, y)

where Ĩ′(x, y) is the desired face image under uniform illumination and C′(x, y) is defined by

C′(x, y) = L̃′(x, y) − L′(x, y)    (8)
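To make the additive structure of Eqs. (6)-(8) concrete, a short sketch is given below; the +1 offset inside the logarithm is our own guard against log(0) and is not part of the model:

```python
import numpy as np

def to_log_domain(image):
    """I'(x, y) = log I(x, y); the +1 offset avoids log(0) (our choice)."""
    return np.log(image.astype(float) + 1.0)

# Under the model I = R * L, the log image is I' = R' + L' (Eq. (6)), so a
# uniform-illumination image is obtained additively: I~' = I' + C' (Eq. (7)).
# The method below never computes L' explicitly; it suppresses C' by
# truncating the low resolution wavelet coefficients of I'.
```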
Eq. (7) shows that the normalized face image can be obtained from the original image by adding the term C′(x, y). If we know where the illumination variations are, the input face image can be well compensated by adding or subtracting C′(x, y).

According to the common assumption, the luminance part of the model in (5) varies slowly with the spatial position and, hence, represents a low-frequency phenomenon [19]. Based on this assumption, in [14] the illumination variations C′(x, y) are estimated by the low-frequency DCT coefficients, which are discarded so as to obtain the normalized face image. As noted above, the performance of the DCT method depends on the proper selection of the coefficients to be truncated. Moreover, it does not consider the effect of illumination discontinuities. As a result, the DCT method may fail when dealing with images under extreme lighting conditions.

The DT-CWT can also be used to transform an image from the spatial domain to the frequency domain. The coefficients in the low resolution subbands and the directional subbands respectively correspond to the low-frequency and high-frequency components of the image. Therefore, we can obtain an estimate for the illumination variations C′(x, y) from the wavelet coefficients in the low resolution bands, and the variations under different lighting conditions can be minimized by directly truncating these approximate wavelet coefficients. On the other hand, the illumination invariant facial features can be obtained from the multiscale directional subbands. As the DT-CWT holds desirable characteristics such as spatial locality and orientation selectivity [15], it has more capability than the DCT in detecting and eliminating illumination effects, which will be verified by our experimental results.

Note that the illumination discontinuities of shadow boundaries correspond to the high-frequency components of the image. They will be preserved if we use the unaltered directional subband coefficients to extract invariant features, which may create abrupt halo artifacts in the normalized image and degrade the recognition performance.

Therefore, we must discriminate the illumination discontinuities from the key facial features; this is the main task of this paper.

3.2. Smoothing illumination discontinuities in the directional subbands

In this section, we show how to reduce the effect of the discontinuities of shadows by devising a proper measure function φ(x, y) and conduction function g(x, y). Here φ(x, y) is used for measuring the level of discontinuity at pixel (x, y), and g is a weighting function for modifying the coefficients such that the key facial features are preserved while the luminance discontinuities are smoothed.

3.2.1. Discontinuity measure function φ(x, y)

Although there are many discontinuity measures, it is hard to measure discontinuities of illumination accurately in an image once it has been transformed into the wavelet domain. In our method, the local variance σ²_L(x, y) is adopted for measuring the discontinuity of each coefficient, because it reflects the degree of uniformity of the pixels in the small neighborhood centered at pixel (x, y). Moreover, unlike traditional variance estimation methods, the context modeling technique [20,21] is employed to estimate σ²_L(x, y), for it is flexible and can adapt to changing image characteristics.

Consider one particular subband o at scale s with coefficients {M^{s,o}(x, y)}. To simplify notation, the superscript (s, o) is dropped and will be used only when necessary for clarity. Each M(x, y) can be modeled as a random variable whose variance is estimated as follows.

Consider a neighborhood of M(x, y), whose m elements are placed in an m × 1 vector u_{xy}. In our method, u_{xy} is constructed from the eight nearest neighbors of M(x, y) in the same subband, plus its parent coefficient M^{s+1,o}(x/2, y/2). To characterize the activity level of the current pixel, we calculate the context F(x, y) as a weighted average of the values of the neighbors:

F(x, y) = wᵀ u_{xy}    (9)

The weight w is found by using the least squares estimate, that is,

w_LS = arg min_w Σ_{x,y} ( M(x, y) − wᵀ u_{xy} )²  =  (Uᵀ U)^{−1} Uᵀ P    (10)

where U is an S × m matrix with each row being uᵀ_{xy}, P is the S × 1 vector containing all coefficients M(x, y), and S = M × N is the size of the subband under consideration.

Then the local variance σ²_L(x, y) of M(x, y) can be estimated from other coefficients whose context variables are close in value to F(x, y). This is illustrated in Fig. 3, which shows the relations of the pairs {(M(x, y), F(x, y))}, x = 1, ..., M, y = 1, ..., N. As can be seen, for an interval of small-valued F(x, y), the associated coefficients M(x, y) have a small spread; on the other hand, an interval of large-valued F(x, y) has corresponding M(x, y) with a larger spread. Note the intervals are of different widths so as to capture the same number of points. This indicates that the context provides a good indication of local variability.

Thus, for a given coefficient M(x₀, y₀), an interval is placed around F(x₀, y₀), and the variance is estimated from the points whose context F(x, y) falls within this window. In particular, we take the K closest points (in value) above F(x₀, y₀) and the K closest points below, resulting in a total of 2K points. We choose K = max(50, 0.02 × M × N) to ensure that enough points are used to estimate the variance.

Let B_{x₀,y₀} denote the set of points {M(x, y)} whose context falls in the moving window. The estimate of the variance σ²_L(x₀, y₀) is given by

σ̂²_L(x₀, y₀) = (1 / 2K) Σ_{(x,y)∈B_{x₀,y₀}} ( M(x, y) − μ(x₀, y₀) )²    (11)

where

μ(x₀, y₀) = (1 / 2K) Σ_{(x,y)∈B_{x₀,y₀}} M(x, y)    (12)

In the implementation, the contexts {F(x, y)} are first sorted and a moving window is placed over them, so the set B_{xy} and the variance estimate σ̂²_L(x, y) can be updated efficiently.

It should be mentioned that the magnitude values M(x, y), rather than the original wavelet values, are used for computing σ²_L(x₀, y₀). This is because the wavelet coefficients themselves are uncorrelated, and an average of the neighbors would not yield much information about the coefficient of interest. However, the magnitude values of neighboring coefficients are correlated [22], and therefore their averages are useful in collecting information in the near vicinity.

At this point, the local variance σ̂²_L(x, y) has been obtained. It can be directly adopted as the measure for the discontinuity at pixel (x, y), i.e.,

φ(x, y) = σ̂²_L(x, y)    (13)

where φ(x, y) is called the discontinuity measure function, which is normalized by

φ̄(x, y) = ( φ(x, y) − φ_min ) / ( φ_max − φ_min )    (14)

where φ_max and φ_min are the maximal and minimal values of φ across the subband under consideration. To emphasize higher values of φ̄(x, y), which more likely correspond to cast shadows, we adopt a nonlinear transformation as follows:

φ̃(x, y) = sin( (π/2) φ̄(x, y) ),   0 ≤ φ̄(x, y) ≤ 1    (15)
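The estimation chain of Eqs. (9)-(15) can be summarized in a few lines of NumPy. The sketch below is a simplified, loop-based reading; its assumptions are ours: border pixels are handled by edge padding, and the parent coefficient is omitted from u_{xy} for brevity, so only the eight spatial neighbors are used:

```python
import numpy as np

def discontinuity_measure(M, K=None):
    """phi~ of Eq. (15) from a magnitude subband M, via Eqs. (9)-(14)."""
    h, w = M.shape
    if K is None:
        K = max(50, int(0.02 * h * w))            # window size rule from the text
    pad = np.pad(M, 1, mode='edge')
    # u_xy: the eight spatial neighbors of each coefficient (parent omitted here)
    shifts = [(i, j) for i in range(3) for j in range(3) if (i, j) != (1, 1)]
    U = np.stack([pad[i:i + h, j:j + w].ravel() for i, j in shifts], axis=1)
    P = M.ravel()
    w_ls, *_ = np.linalg.lstsq(U, P, rcond=None)  # Eq. (10): least squares weights
    F = U @ w_ls                                  # Eq. (9): context of every pixel
    order = np.argsort(F)
    var = np.empty_like(P)
    for pos, idx in enumerate(order):
        lo = max(0, min(pos - K, P.size - 2 * K))
        window = P[order[lo:lo + 2 * K]]          # 2K coefficients with closest context
        var[idx] = np.mean((window - window.mean()) ** 2)   # Eqs. (11)-(12)
    phi = var.reshape(h, w)                       # Eq. (13)
    phi_bar = (phi - phi.min()) / (phi.max() - phi.min() + 1e-12)  # Eq. (14)
    return np.sin(0.5 * np.pi * phi_bar)          # Eq. (15)
```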
3.2.2. Conduction function g(x, y)

A proper conduction function g must be defined to utilize the discontinuity measure. As noted above, g must be a nonnegative, monotonically decreasing function: a large weight should be assigned to a pixel that involves low discontinuity, and vice versa, so that the discontinuities are smoothed.

In our method, the conduction function g is given by

g(x, y) = 1,                           if φ̃(x, y) < T
        = 1 / (1 + φ̃(x, y)/T)²,       otherwise        (16)

where φ̃(x, y) is the obtained discontinuity measure at pixel (x, y), and T is a parameter that determines the level of discontinuities which must be smoothed. From Eq. (16), it can be seen that the discontinuities will be smoothed if φ̃(x, y) ≥ T. On the other hand, the regions of facial features are kept unaltered when φ̃(x, y) < T. Therefore, the expected normalized face image can be obtained with an appropriate T.

Let {φ̃(x, y)} be the discontinuity measure set in some subband and S = M × N be the size of the subband. There are S elements in {φ̃(x, y)}, which can be sorted in decreasing order of value:

φ̃₁ ≥ φ̃₂ ≥ ... ≥ φ̃_S    (17)

where φ̃₁ and φ̃_S are respectively the maximal and minimal values of {φ̃(x, y)}.

Fig. 3. Sample plot of {(M(x, y), F(x, y))}, where M(x, y) is the wavelet coefficient and F(x, y) is its context. A collection of M(x, y) with small values of F(x, y) has a smaller spread than those with large values of F(x, y), suggesting that context modeling provides a good variability estimate of M(x, y).

Let α be a control parameter ranging within (0, 1] and l = ⌈α·S⌉ be the nearest integer greater than or equal to α·S. Then the optimal value of T is given by

T = φ̃_l    (18)

It is easy to see that α is actually a ratio reflecting the proportion of pixels related to discontinuities in the whole subband. Its value can be adjusted according to the varying lighting conditions. In the experiment section, we give an empirical value for α.

Fig. 4c-h illustrate the visualization of the conduction function g(x, y) obtained in the six directional bands for a one-level complex wavelet transform, where α = 0.2 and black points represent the pixels which will be smoothed. From the figures, we find that both the illumination discontinuities and the facial features with strong edges (e.g. eyes, nose, mouth, etc.) are detected. This means facial features may become ambiguous after the normalization process, as shown in Fig. 5c. Therefore, the formulation of g(x, y) must be modified such that it keeps most of the key facial features unaltered.

From Fig. 4c-h, we can also observe an interesting phenomenon: the large discontinuities related to the facial features (e.g. eyes) can always be detected in every subband, whereas those related to illumination can only be detected in one or two specific bands. This means we can discriminate the illumination discontinuities from the facial edges based on the number of times each pixel (x, y) is detected as a discontinuity in the six directional bands. Thus, we can reformulate the conduction function g(x, y) using the updated discontinuity information.

Fig. 4. (a) The original facial image from the Yale B database. (b) Visualization of discontinuity measure values obtained by g̃(x, y) (see Eq. (21)). (c)-(h) Visualization of discontinuity measure values obtained by g(x, y) (see Eq. (16)) in the six subbands. Note that α = 0.2.
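The threshold rule of Eqs. (17)-(18) and the conduction function of Eq. (16) translate directly into code. The sketch below follows our reconstructed reading of Eq. (16): the squared denominator is inferred from the garbled original and from the Perona-Malik form of [27]:

```python
import numpy as np

def conduction(phi_t, alpha=0.2):
    """g(x, y) of Eq. (16), with T chosen by Eqs. (17)-(18)."""
    S = phi_t.size
    l = int(np.ceil(alpha * S))                    # l = ceil(alpha * S), Eq. (18)
    T = max(np.sort(phi_t.ravel())[::-1][l - 1], 1e-12)  # l-th largest phi~
    g = np.ones_like(phi_t)
    mask = phi_t >= T                              # fraction alpha is attenuated
    g[mask] = 1.0 / (1.0 + phi_t[mask] / T) ** 2   # hedged reading of Eq. (16)
    return g
```

For α = 0.2, roughly 20% of the subband pixels fall at or above T and are attenuated, while the rest keep weight 1.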

Fig. 5. Comparison of normalization results for different conditions. (a) The original facial image from the Yale B database. (b) Normalization result obtained directly from the directional subbands with all the coefficients kept unaltered. (c) Normalization result when using g(x, y) defined in Eq. (16), where α = 0.2. (d)-(e) Normalization results when using g̃(x, y) defined in Eq. (21), where α is set to 0.2 and 0.4 respectively.

Let C_n^s(x, y) be an additive function defined by

C_n^s(x, y) = Σ_{o=1}^{6} δ^{s,o}(x, y)    (19)

where

δ^{s,o}(x, y) = 1,   if φ̃^{s,o}(x, y) > T
             = 0,   otherwise               (20)

It is obvious that C_n^s(x, y) is simply the number of times the pixel (x, y) is detected as a discontinuity in the six subbands. With C_n^s(x, y), the conduction function can be reformulated as

g̃^{s,o}(x, y) = (1 / C_n^s(x, y)) Σ_{o=1}^{6} g^{s,o}(x, y) δ^{s,o}(x, y),   if 1 ≤ C_n^s(x, y) ≤ 2
             = 1,                                                            otherwise        (21)

Fig. 4b shows the visualization of the new conduction function g̃^{s,o}(x, y). From the figure, we find that most of the facial features are preserved. Therefore, the effect of strong illumination discontinuities can be effectively reduced, as shown in Fig. 5(d)-(e).

3.3. Illumination normalization based on DT-CWT

In the above two sections, we have established that the illumination variations can be minimized by truncating all of the low resolution coefficients and that the illumination invariant can be extracted from the directional subbands. Moreover, we have presented the conduction function for reducing the effect of illumination discontinuities. In the following, we present the whole procedure of our model.

Let Z = W(I′) denote the matrix of complex wavelet coefficients of I′, where W is the two-dimensional dual-tree complex wavelet transform operator and I′ is the input face image in the logarithm domain. Let {Z^{s,o}(x, y)} be the M × N complex coefficients in subband o at scale s. Suppose Z^{s,o}(x, y) = Q^{s,o}(x, y) + jC^{s,o}(x, y), where j = √−1. For each pixel at (x, y), we can modify the coefficients using the conduction function g̃^{s,o}(x, y) as follows:

Q̃^{s,o}(x, y) = Q^{s,o}(x, y) · g̃^{s,o}(x, y)    (22)
C̃^{s,o}(x, y) = C^{s,o}(x, y) · g̃^{s,o}(x, y)    (23)

where o = 1, ..., 6, s = 1, ..., J, and J is the largest scale in the decomposition.

The updated complex coefficients are expressed as

Z̃^{s,o}(x, y) = Q̃^{s,o}(x, y) + jC̃^{s,o}(x, y)    (24)

As for the coefficients in the low resolution subbands, they are truncated so as to minimize the illumination variations, which can be expressed as

Z̃_LL = 0    (25)

where Z̃_LL is the matrix of updated low resolution coefficients and 0 is the zero matrix.

Using the updated complex coefficient matrix Z̃, we have

Î′ = W^{−1}(Z̃)    (26)

where W^{−1} is the inverse dual-tree complex wavelet transform and Î′ is the estimated normalized image in the logarithm domain.

3.4. Overview of the proposed method

A schematic overview of our method is as follows (a code sketch follows the list):

1. Take the logarithm of image I to obtain I′.
2. Compute the dual-tree complex wavelet transform of I′, obtaining Z = W(I′).
3. Compute the discontinuity measure φ̃(x, y) and the conduction function g̃^{s,o}(x, y) for all the directional subbands.
4. Modify the wavelet coefficients Z^{s,o}(x, y), obtaining Q̃^{s,o}(x, y) = Q^{s,o}(x, y) · g̃^{s,o}(x, y) and C̃^{s,o}(x, y) = C^{s,o}(x, y) · g̃^{s,o}(x, y), for s = 1, ..., J, o = 1, ..., 6, and truncate all the low resolution coefficients, i.e., Z̃_LL = 0.
5. Apply the inverse dual-tree complex wavelet transform to the updated coefficients Z̃ and obtain the expected normalized image as Î′ = W^{−1}(Z̃).
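In code, the counting of Eq. (19), the case analysis of Eq. (21), and the five steps above compose as sketched below. This again assumes the open-source `dtcwt` package and reuses the helper functions sketched earlier (`discontinuity_measure`, `conduction`, `remap_to_normal`); the (6, H, W) stacking of per-subband maps and the single pooled threshold T per scale are our simplifications of the per-subband Eq. (18):

```python
import numpy as np
import dtcwt

def reformulated_conduction(phi_bands, g_bands, T):
    """g~^{s,o} of Eq. (21); phi_bands and g_bands are (6, H, W) stacks
    of the per-subband phi~ and g maps (an assumed data layout)."""
    delta = (phi_bands > T).astype(float)          # Eq. (20)
    C_n = delta.sum(axis=0)                        # Eq. (19): detection count
    avg_g = (g_bands * delta).sum(axis=0) / np.maximum(C_n, 1.0)
    smooth = (C_n >= 1) & (C_n <= 2)               # flagged in only 1-2 bands:
    g_tilde = np.ones_like(g_bands)                # likely an illumination edge
    g_tilde[:, smooth] = avg_g[smooth]             # same g~ in all six subbands
    return g_tilde

def normalize_illumination(image, nlevels=3, alpha=0.2):
    """Steps 1-5 of Section 3.4, composed from the earlier sketches."""
    I_log = np.log(image.astype(float) + 1.0)            # step 1: I' = log I
    transform = dtcwt.Transform2d()
    pyr = transform.forward(I_log, nlevels=nlevels)      # step 2: Z = W(I')
    new_hp = []
    for hp in pyr.highpasses:                            # per scale: (H, W, 6) complex
        phi = np.stack([discontinuity_measure(np.abs(hp[..., o]))
                        for o in range(6)])              # step 3
        T = np.sort(phi.ravel())[::-1][int(np.ceil(alpha * phi.size)) - 1]
        g = np.stack([conduction(phi[o], alpha) for o in range(6)])
        g_tilde = reformulated_conduction(phi, g, T)
        new_hp.append(hp * np.moveaxis(g_tilde, 0, -1))  # step 4: Eqs. (22)-(24)
    lowpass = np.zeros_like(pyr.lowpass)                 # step 4: Z~_LL = 0, Eq. (25)
    out = transform.inverse(dtcwt.Pyramid(lowpass, tuple(new_hp)))  # step 5, Eq. (26)
    return remap_to_normal(out)                          # Section 2.2 postprocessing
```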

Fig. 6. (a) Five subsets for the same person in YALEB dataset. (b) The corresponding normalized images obtained by the proposed method. Note that the images have been
remapped into normal distribution.

Fig. 7. (a) The 21 different lighting conditions for a single subject in CMU PIE dataset. (b) The corresponding normalized images obtained by the proposed method where the
histograms of the images have been remapped into normal distribution.

Fig. 8. Example images of an individual from the Yale B database after applying several illumination normalization algorithms. Note that the histograms of the images have been remapped into normal distribution.

Fig. 9. Example images of an individual from the CMU PIE database after applying several illumination normalization algorithms. Note that the histograms of the images have been remapped into normal distribution.

4. Experimental results

In this section, we evaluate the capability of the proposed method to extract the illumination invariant using different face databases.

4.1. Datasets

For consistency with other studies, the Yale face database B and the CMU PIE database are adopted. The important statistics of these databases are summarized below:

- The Yale face database B [5] contains images of 10 individuals in nine poses and 64 illuminations per pose. We only use the frontal face images for evaluation. Generally, all the images per subject can be divided into five subsets according to the angle of the light source direction: subset 1 (0°-12°), subset 2 (13°-25°), subset 3 (26°-50°), subset 4 (51°-77°) and subset 5 (above 78°). Fig. 6 shows four images from each subset for a single subject and their corresponding normalized images obtained by our method.

- The CMU PIE database [23] contains 68 subjects with 41,368 face images as a whole. The face images are captured by 13 synchronized cameras and 21 flashes, under variations in pose, illumination, and expression. As we are concerned with illumination variations rather than pose and expression, only frontal images are chosen for testing, corresponding to the ones captured using camera C27. Fig. 7 shows the different lighting images for a single subject and the corresponding normalized images obtained by the proposed method.

4.2. Training datasets

In the case of the Yale B database, two training datasets are built, denoted YALEB1-TRAIN and YALEB5-TRAIN respectively. YALEB1-TRAIN is constructed using two images per subject randomly selected from subset 1. The other is built using two images per individual randomly selected from subset 5.

As for the CMU PIE database, two training datasets are built, denoted PIE-TRAIN-1 and PIE-TRAIN-2 respectively. PIE-TRAIN-1 is built using images from good illumination conditions, which correspond to images 08 and 20 of each individual. The other is formed with images from bad illumination conditions, corresponding to images 00 and 15 of each subject.

Fig. 10. Averaged best correct recognition rates obtained by our method versus parameter α using the YALEB1-TRAIN dataset. Note that subsets 2-5 are used for testing and the results are averaged over 10 random splits to obtain statistically representative data.

4.3. Compared algorithms

Eight illumination normalization algorithms are compared in our test, as listed below:

- the Multi Scale Retinex (MSR) algorithm [24],
- the Homomorphic filtering (HOMO) algorithm [25],
- the Multi scale Self Quotient image (MSQ) algorithm [9],
- the DCT based normalization technique (DCT) [14],
- the WAvelet (WA) based normalization technique [26],
- the Wavelet Denoising (WD) based normalization technique [13],
- the ISotropic diffusion (IS) based normalization technique [10],
- the AnIsotropic diffusion (AS) based normalization technique [27].

For the convenience of comparison, some example images obtained after applying the above illumination compensation methods are shown in Figs. 8 and 9.
we average the results over 10 random splits, which are shown
4.4. Test conditions

- All images are cropped to a size of 128 × 128. In our system, the positions of the two eyes are located either manually or automatically and are then used for alignment. No masking is done, since it turns out that cropping eliminates enough background. Examples of the cropped images are shown in Figs. 6 and 7.
- For all the normalized face images obtained by the different methods, the histogram remapping technique is employed so as to remap their histograms into the normal distribution.
- Linear discriminant analysis (LDA) is selected as the projection method.
- The algorithms are evaluated in a face identification scenario and the results are characterized in terms of the rank-1 recognition rate (see the sketch after this list).
- Both the Euclidean (EUC) and cosine angle (COS) distance measures are used.
- A three-level complex wavelet transform is implemented in our method.
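This evaluation protocol amounts to LDA projection followed by nearest-neighbor matching at rank 1 under either distance. A compact sketch is given below; scikit-learn is an assumed tooling choice, not the software used in the paper:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def rank1_rate(X_train, y_train, X_test, y_test, metric='cosine'):
    """Rank-1 identification rate with LDA features and EUC/COS matching."""
    y_train, y_test = np.asarray(y_train), np.asarray(y_test)
    lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
    gallery, probes = lda.transform(X_train), lda.transform(X_test)
    D = cdist(probes, gallery, metric=metric)   # metric: 'euclidean' or 'cosine'
    predictions = y_train[np.argmin(D, axis=1)]
    return float(np.mean(predictions == y_test))
```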

4.5. Results

4.5.1. Determination of the parameter α

The control parameter α is essential in our algorithm. Therefore, the first experiment is performed to determine its optimal value using the Yale B database. Here, we use YALEB1-TRAIN and YALEB5-TRAIN as the training sets respectively, and all the images in the other subsets are used for testing. In order to obtain statistically representative data, we average the results over 10 random splits; the results are shown in Figs. 10 and 11, where the averaged recognition rates are plotted versus different α values. Note that α ranges from 0.05 to 0.5 with a step of 0.05. From the figures, we find that the best performance is achieved when α is set within [0.1, 0.3].

It should be mentioned that almost all of the compared methods depend on the selection of parameter values. In the following tests, the optimal parameters for each algorithm are tuned by hand, and the highest recognition rate obtained by each method is adopted for the performance comparison.

Fig. 11. Averaged best correct recognition rates obtained by our method versus parameter α using the YALEB5-TRAIN dataset. Note that subsets 1-4 are used for testing and the results are averaged over 10 random splits to obtain statistically representative data.

Table 1
Averaged recognition rates (%) obtained with different illumination normalization methods on the YALEB1-TRAIN dataset. Note that LDA is implemented for dimensionality reduction and the results are averaged over 10 random splits to obtain statistically representative data.

Method       Subset 2        Subset 3        Subset 4        Subset 5
             EUC     COS     EUC     COS     EUC     COS     EUC     COS
MSQ          99.67   100     99.17   99.83   90.57   93.57   89.05   93.73
HOMO         99.50   99.92   82.92   90.50   50.07   61.43   47.52   53.21
MSR          99.58   99.83   96.83   97.67   78.29   81.71   71.32   76.79
DCT          99.50   99.33   88.16   92.41   53.28   63.78   45.68   62.31
WA           99.58   99.92   84.92   91.25   50.36   62.00   49.58   52.26
WD           99.83   100     98.25   98.75   91.93   93.86   92.11   92.79
IS           99.83   100     99.50   99.75   94.07   94.43   92.74   93.52
AS           92.25   92.67   99.50   100     80.29   89.50   64.42   81.37
Our method   99.62   100     99.41   100     94.79   95.28   93.63   95.00

Table 2
Averaged recognition rates (%) obtained with different illumination normalization methods on the YALEB5-TRAIN dataset. Note that LDA is implemented for dimensionality reduction and the results are averaged over 10 random splits to obtain statistically representative data.

Method       Subset 1        Subset 2        Subset 3        Subset 4
             EUC     COS     EUC     COS     EUC     COS     EUC     COS
MSQ          94.12   93.00   87.92   88.67   90.33   92.41   83.92   90.07
HOMO         50.28   51.29   55.33   57.83   52.50   59.50   50.07   67.57
MSR          70.86   70.43   65.67   66.25   65.58   66.67   62.57   72.43
DCT          63.00   64.14   56.50   60.17   50.16   63.58   47.71   66.50
WA           48.00   47.43   52.92   58.00   52.17   60.00   50.36   64.50
WD           92.28   92.52   88.97   87.92   88.25   89.50   92.85   95.36
IS           92.29   94.00   91.75   89.33   92.42   92.42   92.93   95.50
AS           66.86   78.57   61.83   62.83   77.83   82.17   80.93   84.43
Our method   97.57   98.43   95.11   95.47   92.83   94.08   94.43   96.04

4.5.2. Performance comparisons on the Yale B database

In this section, we compare the performance of the different illumination compensation methods using YALEB1-TRAIN and YALEB5-TRAIN as the training datasets respectively. All the images in the other subsets are used for testing. The recognition results, averaged over 10 random splits, are listed in Tables 1 and 2. Note that the bold values in the original tables are the best correct recognition rates obtained by our approach under different testing conditions. From the tables, we can observe the following:

(1) The proposed method performs the best in most of the cases, regardless of which dataset or distance measure is used. This indicates our method is more powerful than the other techniques in extracting the illumination invariant from face images.
(2) Our method is clearly superior to the DCT method. Although the proper number of low-frequency DCT coefficients has been truncated, the DCT method's recognition performance is unsatisfactory when training or testing images are selected from the subsets captured under bad lighting conditions. This indicates the DCT method is not suitable for dealing with face images with large illumination variations. On the other hand, our method achieves good performance under different lighting conditions, which indicates that the illumination variations are minimized by truncating all the low resolution coefficients while the key facial features are recovered from the directional subband information. Moreover, the proposed approach can reduce the effect of illumination discontinuities, which makes it more robust to largely variable lighting conditions.
(3) When comparing the similarity measures, we find that the cosine distance gives better results than the Euclidean distance. A similar observation has also been made in [28].

4.5.3. Performance comparison on the CMU PIE database

In this section, performance comparisons of the different illumination normalization techniques are conducted on the CMU PIE database. In our test, the PIE-TRAIN-1 and PIE-TRAIN-2 sets are selected for training respectively, while the remaining images are used for testing. The results are listed in Table 3, from which we can find the following:

Table 3
Recognition rates (%) obtained with different illumination normalization methods on the CMU PIE database. Note that LDA is implemented for dimensionality reduction and training images are from PIE-TRAIN-1 (good illumination conditions) and PIE-TRAIN-2 (bad illumination conditions) respectively.

Method       PIE-TRAIN-1     PIE-TRAIN-2
             EUC     COS     EUC     COS
MSQ          99.85   100     92.79   93.82
HOMO         83.56   87.19   73.58   77.09
MSR          98.99   99.23   85.82   87.31
DCT          85.28   89.97   78.01   81.47
WA           86.75   89.48   71.54   76.33
WD           99.85   99.92   92.41   93.06
IS           99.69   99.78   90.75   91.20
AS           92.40   93.65   66.09   70.53
Our method   99.94   100     94.66   96.03

(1) Our method consistently performs the best in all cases. Note that all the best correct recognition rates are above 94%, which indicates the proposed illumination normalization method can satisfy the requirements of most real-life law enforcement applications.
(2) Our method performs slightly better than the MSQ method, which is regarded as one of the most effective illumination compensation methods. Furthermore, when considering the processing time, we find that the proposed algorithm is

obviously faster than MSQ. A simple comparison has been made using a standard PC with an Intel Core 2 CPU (2.2 GHz), 4 GB RAM, and Windows 7 Home Premium 64-bit. For our method, the mean processing time for extracting the illumination invariant from an image of 128 × 128 pixels is about 95 ms, whereas the mean processing time for MSQ is more than 730 ms. This indicates that, for applications in which speed is critical, our method may be a better alternative to the MSQ technique.

5. Conclusions

In this paper, a new illumination normalization method is presented for variable lighting face recognition. In our approach, the low resolution coefficients of the approximate subbands are truncated so as to minimize the illumination variations contained in the face images. Moreover, the coefficients of the directional subbands are used to extract the key facial features. In order to reduce the effect of illumination discontinuities, a new discontinuity measure and conduction function are presented which can smooth the illumination discontinuities while keeping most of the facial features unimpaired. The histogram remapping technique is employed on the normalized image, which remaps the histogram into a normal distribution and thus can improve the recognition performance. We compare our method with some well-known illumination normalization techniques on the Yale B and CMU PIE databases. The results show the proposed method achieves high recognition performance under all the test conditions, which indicates it has more capability than the other approaches in extracting the illumination invariant.

In the future, we will concentrate our work on the following two points:

(1) We have only used the spatial information of the DT-CWT. In the future, we will investigate how to include its phase information in the invariant facial structure extraction.
(2) We will extend our method to color images and evaluate its performance on larger and more challenging outdoor face databases.

Acknowledgments

This work is supported by the National Science Foundation of China (NSFC) under Contract 60802069 and the Fundamental Research Funds for the Central Universities of China. The authors are very thankful to Simon Baker for facilitating the CMU PIE database and to Vitomir Štruc for providing the INface Matlab Toolbox [29,30] for testing.

References

[1] D. Shin, H.-S. Lee, D. Kim, Illumination-robust face recognition using ridge regressive bilinear models, Pattern Recognition Letters 29 (2008) 49-58.
[2] J. Phillips, T. Scruggs, A. O'Toole, P. Flynn, K. Bowyer, C. Schott, M. Sharpe, FRVT 2006 and ICE 2006 Large-Scale Results, NISTIR 7408 Report, 2007.
[3] W. Zhao, R. Chellappa, Robust Face Recognition Using Symmetric Shape-from-Shading, Technical Report, Center for Automation Research, Univ. of Maryland, 1999.
[4] P.N. Belhumeur, D.J. Kriegman, What is the set of images of an object under all possible lighting conditions, in: Proc. Int'l Conf. Computer Vision and Pattern Recognition, 1996.
[5] A.S. Georghiades, P.N. Belhumeur, D.J. Kriegman, From few to many: illumination cone models for face recognition under differing pose and lighting, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (6) (2001) 643-660.
[6] R. Ramamoorthi, P. Hanrahan, On the relationship between radiance and irradiance: determining the illumination from images of a convex Lambertian object, Journal of the Optical Society of America 18 (10) (2001) 2448-2459.
[7] A.U. Batur, M.H. Hayes, Linear subspaces for illumination robust face recognition, in: Proc. Int'l Conf. Computer Vision and Pattern Recognition, 2001, pp. 296-301.
[8] A. Shashua, T. Riklin-Raviv, The quotient image: class-based re-rendering and recognition with varying illuminations, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (2) (2001) 129-139.
[9] H. Wang, S.Z. Li, Y. Wang, Generalized quotient image, in: Proc. Int'l Conf. Computer Vision and Pattern Recognition, 2004.
[10] R. Gross, V. Brajovic, An image preprocessing algorithm for illumination invariant face recognition, in: 4th Int'l Conf. Audio and Video Based Biometric Person Authentication, 2003, pp. 10-18.
[11] X. Xie, K.-M. Lam, An efficient illumination normalization method for face recognition, Pattern Recognition Letters 27 (2006) 609-617.
[12] T. Chen, W. Yin, X. Zhou, D. Comaniciu, T.S. Huang, Total variation models for variable lighting face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (9) (2006) 1519-1524.
[13] T. Zhang, B. Fang, Y. Yuan, Y.Y. Tang, Z. Shang, D. Li, F. Lang, Multiscale facial structure representation for face recognition under varying illumination, Pattern Recognition 42 (2) (2009) 252-258.
[14] W. Chen, M.J. Er, S. Wu, Illumination compensation and normalization for robust face recognition using discrete cosine transform in logarithmic domain, IEEE Transactions on Systems, Man and Cybernetics, Part B 36 (2) (2006) 458-466.
[15] I.W. Selesnick, R.G. Baraniuk, N.C. Kingsbury, The dual-tree complex wavelet transform, IEEE Signal Processing Magazine 22 (6) (2005) 123-151.
[16] V. Štruc, J. Žibert, N. Pavešić, Histogram remapping as a preprocessing step for robust face recognition, WSEAS Transactions on Information Science and Applications 6 (3) (2009) 520-529.
[17] B.K.P. Horn, Robot Vision, MIT Press, Cambridge, MA, 1986.
[18] J. Short, J. Kittler, K. Messer, Photometric normalisation for face verification, in: Proc. of AVBPA'05, 2005, pp. 617-626.
[19] Y.K. Park, S.L. Park, J.K. Kim, Retinex method based on adaptive smoothing for illumination invariant face recognition, Signal Processing 88 (8) (2008) 1929-1945.
[20] S. LoPresto, K. Ramchandran, M. Orchard, Image coding based on mixture modeling of wavelet coefficients and a fast estimation-quantization framework, in: Proc. Data Compression Conf., Snowbird, UT, 1997, pp. 221-230.
[21] Y. Yoo, A. Ortega, B. Yu, Image subband coding using context based classification and adaptive quantization, IEEE Transactions on Image Processing 8 (1999) 1702-1715.
[22] J. Shapiro, Embedded image coding using zerotrees of wavelet coefficients, IEEE Transactions on Signal Processing 41 (1993) 3445-3462.
[23] T. Sim, S. Baker, M. Bsat, The CMU pose, illumination, and expression database, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (12) (2003) 1615-1618.
[24] D.J. Jobson, Z. Rahman, G.A. Woodell, Properties and performance of a center/surround retinex, IEEE Transactions on Image Processing 6 (3) (1997) 451-462.
[25] G. Heusch, F. Cardinaux, S. Marcel, Lighting Normalization Algorithms for Face Verification, IDIAP, 2005.
[26] S. Du, R. Ward, Wavelet-based illumination normalization for face recognition, in: Proceedings of the IEEE International Conference on Image Processing, vol. 2, 2005, pp. 954-957.
[27] P. Perona, J. Malik, Scale-space and edge detection using anisotropic diffusion, IEEE Transactions on Pattern Analysis and Machine Intelligence 12 (7) (1990) 629-639.
[28] J. Ruiz-del-Solar, P. Navarrete, Eigenspace-based face recognition: a comparative study of different approaches, IEEE Transactions on Systems, Man, and Cybernetics C 35 (3) (2005) 315-325.
[29] V. Štruc, N. Pavešić, Performance evaluation of photometric normalization techniques for illumination invariant face recognition, in: Y.J. Zhang (Ed.), Advances in Face Image Analysis: Techniques and Technologies, IGI Global, 2010.
[30] V. Štruc, N. Pavešić, Gabor-based kernel partial-least-squares discrimination features for face recognition, Informatica (Vilnius) 20 (1) (2009) 115-138.
