
Pose Invariant Face Recognition using Hybrid DWT-DCT Frequency Features with Support Vector Machines


Jawad Nagi (1,2,*), Syed Khaleel Ahmed (2), and Farrukh Nagi (3)

(1) Power Engineering Centre, Research Management Centre
(2) Department of Electronics and Communication Engineering, College of Engineering
(3) Department of Mechanical Engineering, College of Engineering
Universiti Tenaga Nasional, Km 7, Jalan Kajang-Puchong, 43009 Selangor, Malaysia
jawad@uniten.edu.my, syedkhaleel@uniten.edu.my, farrukh@uniten.edu.my


Abstract

Face recognition is a challenging problem and, to date,
no technique provides a robust solution for all
situations. This paper presents a hybrid approach to
pose invariant human face recognition. The proposed
scheme is based on a combination of Discrete Wavelet
Transform (DWT) and Discrete Cosine Transform (DCT)
analysis of face images. The DWT-DCT domain
coefficients are used for feature extraction using
simple statistical measures and quantization. This
approach reduces the dimension of the original face
images while preserving the property of data
distribution in the feature subspace. A Support Vector
Machine (SVM) classifier is used to classify the
DWT-DCT based feature vectors into separate groups for
recognition purposes. The hybrid DWT-DCT-SVM face
recognition model is evaluated in MATLAB on the
Cambridge ORL face database. Comparison of the
proposed technique with existing face recognition
schemes indicates that the DWT-DCT combination
improves feature selection performance compared to
other approaches.

Keywords: Face recognition, Discrete wavelet
transform, Discrete cosine transform, Support vector
machine, Dominant frequency features.

1. Introduction

Face recognition has become an active area of
research in recent years mainly due to increasing
security demands and its potential commercial and law
enforcement applications [1]. The last decade has
shown dramatic progress in this area, with emphasis on
applications such as human-computer interaction
(HCI), biometric analysis, content-based coding of
images and videos, intelligent monitoring, and
identification fields [2].
Commonly researched face recognition methods include
geometry-based, Eigenface, neural network and hidden
Markov model based methods [3]. Raw face images cannot
be used directly for classifier design, because face
vectors are high-dimensional and contain redundant
information that obscures the dominant characteristics
of the face.
Techniques commonly used for feature extraction and
dimensionality reduction in face recognition include
Principal Component Analysis (PCA) and Linear
Discriminant Analysis (LDA). The disadvantage of PCA
is that it treats within-class and between-class
variation equally and therefore becomes sensitive to
variations such as facial expression. To overcome this
problem, methods such as Fisherfaces, a combination of
PCA and LDA, have been put forward [4]. Recently,
wavelets, which offer good localization in both the
spatial and frequency domains, have been considered an
ideal tool for face recognition problems [5,6]. In
addition, the DCT, a commonly used approach for image
compression, has previously been used for feature
extraction from face images [2].
This paper presents a novel approach for pose
invariant face recognition with dramatic reduction in
computational requirements. This approach is based on
a hybrid combination of the Discrete Wavelet
Transform (DWT) and Discrete Cosine Transform
(DCT) for feature selection from face images. Support
Vector Machines (SVM) having good generalization
ability, non-linear dividing hypersurfaces and high
discrimination are used to classify DWT-DCT based
feature vectors for recognition purposes [6]. The
hybrid DWT-DCT-SVM face recognition model is
evaluated in MATLAB on the ORL face database.

Proceedings of the 4th International Conference on Information Technology and Multimedia at UNITEN (ICIMU 2008), Malaysia, 17th-19th November 2008


2. Discrete Wavelet Transform

The Discrete Wavelet Transform (DWT) is a popular tool
for image analysis and has become part of the JPEG2000
standard. The DWT decomposes a signal into a set of
basis functions called wavelets, where each level of
decomposition corresponds to a resolution of the
signal. The DWT thus performs a multi-resolution
analysis of a signal with localization in both the
time and frequency domains [5]. The DWT can be
expressed mathematically as follows:


$$
\mathrm{DWT}_{x(n)} =
\begin{cases}
d_{j,k} = \sum_{n} x(n)\, h_j^{*}(n - 2^{j}k) \\
a_{j,k} = \sum_{n} x(n)\, g_j^{*}(n - 2^{j}k)
\end{cases}
\tag{1}
$$

where the coefficients $d_{j,k}$ are the detail
components of the signal $x(n)$ and $a_{j,k}$ are the
approximation components. The functions $h(n)$ and
$g(n)$ represent the coefficients of the high-pass and
low-pass filters respectively, while the parameters
$j$ and $k$ are the wavelet scale and translation
factors.
For images, the 2D-DWT is implemented as a set of
filter banks comprising a cascaded scheme of high-pass
and low-pass filters. The result is a decomposition of
the input image into four non-overlapping
multi-resolution sub-bands: LL, LH, HL and HH. The
sub-band LL represents the coarse-scale DWT
coefficients, while the sub-bands LH, HL and HH
represent the fine-scale DWT coefficients. To obtain
the next coarser scale of wavelet coefficients, the
sub-band LL is processed further until some final
scale $N$ is reached. At scale $N$, a total of $3N+1$
sub-bands have been computed: the sub-band $LL_N$
together with $LH_y$, $HL_y$ and $HH_y$, where $y$
ranges from 1 to $N$.
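
The sub-band structure described above can be sketched in a few lines of Python. This is an illustration only, not the MATLAB Wavelet Toolbox code used for the experiments: it assumes the orthonormal Haar filter pair, an image stored as a list of rows with even dimensions, and one common naming convention for LH and HL (conventions differ between toolboxes). The function names `haar_dwt2` and `haar_wavedec2` are hypothetical.

```python
import math

def haar_dwt2(image):
    """One level of a 2D Haar DWT; returns the sub-bands (LL, LH, HL, HH)."""
    s = 1.0 / math.sqrt(2.0)  # orthonormal Haar filter weight

    def split_rows(rows):
        # Low-pass: scaled pairwise sums. High-pass: scaled pairwise differences.
        low = [[(r[2 * k] + r[2 * k + 1]) * s for k in range(len(r) // 2)] for r in rows]
        high = [[(r[2 * k] - r[2 * k + 1]) * s for k in range(len(r) // 2)] for r in rows]
        return low, high

    def transpose(m):
        return [list(col) for col in zip(*m)]

    low, high = split_rows(image)  # filter along each row
    # Filter the row outputs along the columns by working on the transposes.
    ll, lh = (transpose(b) for b in split_rows(transpose(low)))
    hl, hh = (transpose(b) for b in split_rows(transpose(high)))
    return ll, lh, hl, hh

def haar_wavedec2(image, levels):
    """Repeatedly decompose LL; returns (LL_N, [(LH_y, HL_y, HH_y) for y = 1..N])."""
    details = []
    ll = image
    for _ in range(levels):
        ll, lh, hl, hh = haar_dwt2(ll)
        details.append((lh, hl, hh))
    return ll, details
```

With `levels = N` the result holds one approximation band and `3 * N` detail bands, which matches the 3N+1 count given in the text. Because the filters are orthonormal, the total energy (sum of squared values) of the four sub-bands equals that of the input.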

3. Discrete Cosine Transform

The Discrete Cosine Transform (DCT) is an algorithm
widely used for image compression, as it forms the
basis of the international standard lossy image
compression algorithm known as JPEG. The DCT converts
spatial domain signals into elementary frequency
components by representing an image as a sum of
sinusoids of varying magnitudes and frequencies [4].
Face images have high correlation and redundant
information, which imposes a computational burden in
terms of processing speed and memory utilization.
Therefore, the 2D blocked-DCT segments an image into
non-overlapping blocks and applies the DCT to each
block, which yields low frequency and high frequency
sub-bands. Most of the visually significant signal
energy lies in the low-frequency sub-band, which
contains the most important visual parts of the image.
These coefficients can be used as a type of signature
that is useful for recognition tasks such as face
recognition. High frequency components of the image
are usually removed through compression, which reduces
data volume with little loss of image quality. For an
input image $x$, the 2D-DCT coefficients of the
transformed output image $y$ are computed as follows:


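$$
y(u,v) = \sqrt{\frac{2}{M}}\sqrt{\frac{2}{N}}\,\alpha_u \alpha_v
\sum_{m=0}^{M-1}\sum_{n=0}^{N-1} x(m,n)
\cos\frac{(2m+1)u\pi}{2M}\cos\frac{(2n+1)v\pi}{2N}
\tag{2}
$$
where,
$$
\alpha_u =
\begin{cases}
\frac{1}{\sqrt{2}}, & u = 0 \\
1, & 1 \le u \le M-1
\end{cases}
\qquad
\alpha_v =
\begin{cases}
\frac{1}{\sqrt{2}}, & v = 0 \\
1, & 1 \le v \le N-1
\end{cases}
\tag{3}
$$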

The input image $x$ is a matrix of $M \times N$
pixels, where $x(m,n)$ is the intensity of the pixel
in row $m$ and column $n$, and $y(u,v)$ is the 2D-DCT
coefficient in row $u$ and column $v$. The image is
reconstructed by applying the 2D-IDCT, defined as
follows:



$$
x(m,n) = \sqrt{\frac{2}{M}}\sqrt{\frac{2}{N}}
\sum_{u=0}^{M-1}\sum_{v=0}^{N-1} \alpha_u \alpha_v\, y(u,v)
\cos\frac{(2m+1)u\pi}{2M}\cos\frac{(2n+1)v\pi}{2N}
\tag{4}
$$
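
As a check on the transform pair, eqs. (2) to (4) can be written out directly. The following is a deliberately naive Python sketch (quadratic cost per coefficient), assuming the factor 1/sqrt(2) at zero frequency and 1 elsewhere for eq. (3); it is not the MATLAB routine used in the experiments, and the names `alpha`, `dct2` and `idct2` are chosen here for illustration.

```python
import math

def alpha(k):
    # Normalisation factor of eq. (3): 1/sqrt(2) at zero frequency, 1 otherwise.
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

def _cos_table(size):
    # table[k][i] = cos((2i + 1) * k * pi / (2 * size))
    return [[math.cos((2 * i + 1) * k * math.pi / (2 * size)) for i in range(size)]
            for k in range(size)]

def dct2(x):
    """2D-DCT of an M x N block, following eq. (2)."""
    M, N = len(x), len(x[0])
    cm, cn = _cos_table(M), _cos_table(N)
    scale = math.sqrt(2.0 / M) * math.sqrt(2.0 / N)
    return [[scale * alpha(u) * alpha(v) *
             sum(x[m][n] * cm[u][m] * cn[v][n] for m in range(M) for n in range(N))
             for v in range(N)] for u in range(M)]

def idct2(y):
    """2D-IDCT of an M x N coefficient block, following eq. (4)."""
    M, N = len(y), len(y[0])
    cm, cn = _cos_table(M), _cos_table(N)
    scale = math.sqrt(2.0 / M) * math.sqrt(2.0 / N)
    return [[scale *
             sum(alpha(u) * alpha(v) * y[u][v] * cm[u][m] * cn[v][n]
                 for u in range(M) for v in range(N))
             for n in range(N)] for m in range(M)]
```

With this normalisation the transform is orthonormal, so `idct2(dct2(block))` returns the original block up to floating-point error, and a constant block places all of its energy in the single coefficient `y[0][0]`.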

4. Support Vector Machine

Support vector machines (SVMs) were introduced by
Vapnik in the late 1960s on the foundation of
statistical learning theory [7]. Training an SVM
amounts to solving a quadratic programming (QP)
problem whose solution is global and unique. For
empirical data $(x_1, y_1), \ldots, (x_m, y_m) \in
\mathbb{R}^n \times \{-1,+1\}$ that are mapped by
$\phi: \mathbb{R}^n \rightarrow F$ into a feature
space $F$, the linear hyperplane that divides them
into two labeled classes is given by:

$$
w \cdot \phi(x) + b = 0, \qquad w \in \mathbb{R}^n,\; b \in \mathbb{R}
\tag{5}
$$
To construct an optimal hyperplane with maximum-
margin and bounded error in the training data (soft
margin), the following QP problem is to be solved:

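$$
\min_{w,b,\xi}\; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}\xi_i
\quad \text{subject to} \quad
y_i\left(w \cdot \phi(x_i) + b\right) \ge 1 - \xi_i,\;\;
\xi_i \ge 0,\;\; i = 1, 2, \ldots, m
\tag{6}
$$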

The first term in the cost function (6) maximizes the
margin of separation between the classes, and the
second term provides an upper bound on the error in
the training data. The constant $C \in [0, \infty)$
sets the tradeoff between the number of misclassified
samples in the training set and the separation of the
remaining samples with maximum margin. One way to
solve (6) is via its Lagrange function. Given a kernel
$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$, the
Lagrange function of (6) simplifies to:

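$$
\max_{\alpha}\; \sum_{i=1}^{m}\alpha_i
- \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i \alpha_j y_i y_j K(x_i, x_j)
\tag{7}
$$

$$
w = \sum_{i=1}^{m}\alpha_i y_i \phi(x_i), \qquad
\sum_{i=1}^{m}\alpha_i y_i = 0, \qquad
0 \le \alpha_i \le C,\; \forall i
\tag{8}
$$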
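
The step from the primal problem (6) to the dual form (7) with the conditions (8) is the standard Lagrangian argument, sketched below. The multipliers $\mu_i$ for the constraints $\xi_i \ge 0$ are notation introduced here for the sketch and do not appear elsewhere in the paper.

```latex
\begin{aligned}
L(w,b,\xi,\alpha,\mu) &= \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}\xi_i
  - \sum_{i=1}^{m}\alpha_i\left[y_i\left(w\cdot\phi(x_i)+b\right) - 1 + \xi_i\right]
  - \sum_{i=1}^{m}\mu_i\xi_i, \qquad \alpha_i,\ \mu_i \ge 0 \\
\frac{\partial L}{\partial w} = 0 &\;\Rightarrow\; w = \sum_{i=1}^{m}\alpha_i y_i \phi(x_i) \\
\frac{\partial L}{\partial b} = 0 &\;\Rightarrow\; \sum_{i=1}^{m}\alpha_i y_i = 0 \\
\frac{\partial L}{\partial \xi_i} = 0 &\;\Rightarrow\; C - \alpha_i - \mu_i = 0
  \;\Rightarrow\; 0 \le \alpha_i \le C
\end{aligned}
```

Substituting these three conditions back into $L$ removes $w$, $b$ and $\xi$, and replacing each inner product $\phi(x_i)\cdot\phi(x_j)$ by $K(x_i, x_j)$ leaves the objective of (7).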

From eq. (8) it is seen that the optimal hyperplane in
feature space can be written as a linear combination
of the training samples with $\alpha_i \neq 0$. These
informative samples, known as support vectors,
construct the decision function of the classifier
based on the kernel function:

$$
f(x) = \operatorname{sgn}\left(\sum_{i=1}^{m}\alpha_i y_i K(x_i, x) + b\right)
\tag{9}
$$


Kernel functions in SVMs are selected based on the
structure of the data and the type of boundaries
between the classes. A widely applied kernel function
is the radial basis function (RBF) kernel, which is
defined as:

$$
K_{RBF}(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right)
\tag{10}
$$

where $\gamma > 0$ is the RBF kernel parameter. The
RBF kernel induces an infinite-dimensional kernel
space, and the kernel width parameter $\gamma$
controls the scaling of the mapping.
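
To make the roles of the kernel parameter and the support vectors concrete, the RBF kernel and the sign-based decision rule can be sketched as follows. This is an illustrative Python fragment, not LIBSVM code: the support vectors, labels, multipliers and bias are assumed to come from an already trained binary classifier, and the names `rbf_kernel` and `decide` are hypothetical.

```python
import math

def rbf_kernel(xi, xj, gamma):
    """RBF kernel of eq. (10): exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)

def decide(x, support_vectors, labels, alphas, bias, gamma):
    """Binary decision in the spirit of eq. (9): sign of the weighted kernel sum plus bias."""
    total = sum(a * y * rbf_kernel(sv, x, gamma)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if total + bias >= 0 else -1
```

A larger `gamma` makes the kernel decay faster with distance, so each support vector influences a smaller neighbourhood of the feature space.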

5. Methodology

A general overview of the proposed hybrid face
recognition model is shown in Figure 1. A hybrid
combination of the DWT and the DCT is used to extract
features from face images, which then undergo
classification using an SVM. Feature vectors obtained
from DWT-DCT selection are used as inputs to the SVM
classifier. Classification is carried out by
validating the input image against a trained face
database. Classification results, including the
identities of the closest matches and confidence
scores, form the output of the system.
To evaluate the performance of the proposed method,
the ORL face database is used. This database was
developed at the Olivetti Research Laboratory,
Cambridge, U.K. The ORL database contains 400 images
of 40 people, i.e., 10 different images for each
person. The images are frontal views of each person
with some tolerance in pose and rotation. The size of
each image is 48 × 48 pixels in Bitmap file format
with 256 grey levels per pixel [10]. Three individuals
from the ORL face database with five different images
each are shown in Figure 2.
The hybrid face recognition model presented in this
paper is developed using MATLAB R2008a v7.6.0.
The DWT and DCT are implemented using the
MATLAB Wavelet and Signal Processing Toolboxes.
A MATLAB library for support vector machines
(LIBSVM) [11] is used as the core of the multi-class
SVM classifier.

Figure 1. Proposed hybrid face recognition model
(Block diagram labels: Input Image; Feature Extraction: DWT, DCT; Classification: SVM Classification; Face Database: ORL Face Database; Output: Match / No Match)



Figure 2. ORL face image database

5.1. Feature Extraction

For feature extraction, the face images were first
preprocessed, which includes noise removal, gray-level
modification and equalization. The feature extraction
method consists of two stages: DWT and DCT.


Figure 3. 2D-DWT wavelet decomposition: (a) image being decomposed; (b)-(e) Level 1 sub-bands; (f)-(i) Level 2 sub-bands

For DWT computation, the image is decomposed into
blocks. Each block then undergoes wavelet
decomposition, producing an approximation image and a
sequence of detail images. In this work, Level 2 Haar
wavelet decomposition is employed, as demonstrated in
Figure 3. Level 1 Haar wavelet decomposition of
Figure 3(a) produces one approximation image and three
orientation detail images, shown in Figures 3(b),
3(c), 3(d) and 3(e), i.e., the LL, LH, HL and HH
sub-bands respectively. Figure 3(b) contains the main
energy of the image, concentrated in the low frequency
components, while the other three sub-bands contain
much less energy. Similarly, for Level 2 Haar wavelet
decomposition the LL sub-band image (Figure 3(b)) is
decomposed, which produces the four images shown in
Figures 3(f), 3(g), 3(h) and 3(i). In the proposed
work, the 2D-DWT is used to extract the coefficients
of the lowest frequency range in the sub-bands.
Therefore, the LL sub-band component, which has the
highest image energy, is selected for feature
extraction.
Secondly, the 2D-DCT of the LL sub-band component is
computed. The 2D-DCT of each of the 36 image
sub-blocks of 8 × 8 pixels is computed, and 8 of the
64 DCT coefficients are retained. The remaining
coefficients are discarded. The image is then
reconstructed by computing the 2D-IDCT of each of the
36 blocks. Figure 4(a) represents the LL sub-band
image from Level 1 Haar wavelet decomposition. As
shown in Figure 4(b), only a few components are
visible after the 2D blocked-DCT. The DC component and
low frequency components are concentrated in the top
left corner of each image block. The resulting
compressed image produced after 2D-IDCT computation is
shown in Figure 4(c).
Figure 4. 2D-DCT computation: (a) LL sub-band image; (b) 2D blocked-DCT coefficients; (c) image reconstructed by the 2D-IDCT

To build the DCT-feature vectors, an average over the
DCT feature set was computed for each of the 36
sub-blocks. The upper-left DCT coefficients, i.e., the
DC component and the first five AC components, were
selected in zigzag order. From each face image,
36 × 6 = 216 DCT-features were extracted, where 36 is
the number of DCT sub-blocks in the image and 6 is the
number of features in each sub-block.
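
The zigzag selection can be illustrated with a short Python sketch. It assumes the JPEG-style zigzag scan and square coefficient blocks given as lists of rows. The averaging and quantization steps mentioned in the text are not reproduced here, and the names `zigzag_indices` and `block_features` are hypothetical.

```python
def zigzag_indices(n):
    """(row, col) positions of an n x n block in JPEG-style zigzag order."""
    order = []
    for d in range(2 * n - 1):  # d indexes the anti-diagonals, where row + col == d
        cells = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        # Even anti-diagonals run from bottom-left to top-right, odd ones the other way.
        order.extend(reversed(cells) if d % 2 == 0 else cells)
    return order

def block_features(coefficient_blocks, per_block=6):
    """Concatenate the first `per_block` zigzag coefficients of every block."""
    features = []
    for block in coefficient_blocks:
        picks = zigzag_indices(len(block))[:per_block]
        features.extend(block[r][c] for r, c in picks)
    return features
```

For 36 blocks and `per_block=6` the resulting vector has 36 × 6 = 216 entries, where the first entry taken from each block is its DC coefficient.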

5.2. SVM Classification

Image feature vectors obtained from DWT-DCT selection
are combined to obtain the support vectors, satisfying
eq. (7). For solving multi-class problems, LIBSVM [11]
uses the One-Against-One (OAO) method. In the training
stage, a C-SVM classifier is used. With OAO,
$k(k-1)/2$ binary classifiers are needed to separate
$k$ independent classes. It is noted that if $k$
cannot be represented as a power of 2, $k$ should be
separated into a sum of powers of 2. In this
experiment the ORL database consists of 40 people;
therefore $k = 40$, i.e., a 40-class C-SVM model is
used.
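
The classifier count for the One-Against-One scheme, and the majority vote that turns pairwise decisions into a single label, can be sketched as below. This is a minimal Python illustration under stated assumptions: `pairwise_winner` is a hypothetical callable standing in for a trained binary classifier, and ties are broken towards the smaller class index, which may differ from what LIBSVM does internally.

```python
from collections import Counter
from itertools import combinations

def oao_pairs(k):
    """All unordered class pairs; one binary classifier is trained per pair."""
    return list(combinations(range(k), 2))

def oao_predict(pairwise_winner, k):
    """Majority vote over all pairwise decisions for one sample.

    pairwise_winner(a, b) must return either a or b.
    """
    votes = Counter(pairwise_winner(a, b) for a, b in oao_pairs(k))
    # Highest vote count wins; ties go to the smaller class index.
    return max(range(k), key=lambda c: (votes[c], -c))
```

For k = 40 classes, `len(oao_pairs(40))` is 40 × 39 / 2 = 780 binary classifiers.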

Figure 5. SVM classification engine
(Flowchart labels: Start of Recognition Engine; Cross-validation using 70% Training data and 30% Testing data from 50% ORL face data; Find optimal SVM hyper-parameters for SVM classifier using Grid-Search; Cross-validation Accuracy: Bad / Good; C-SVM Trained Classifier; 50% ORL Testing data; C-SVM Prediction; C-SVM Prediction Accuracy > 95%: Yes / No; Classification Result; End of Recognition Engine)

Classification accuracy of the SVM was optimized using
the Grid-Search method over different RBF kernel
parameters $\gamma$ and cost parameters $C$.
Exponentially growing sequences of $(C, \gamma)$ were
used to identify the best parameters, where
$C = [2^{-5}, 2^{-3}, \ldots, 2^{15}]$ and
$\gamma = [2^{-15}, 2^{-13}, \ldots, 2^{3}]$ were used
for 75 × 75 = 5625 combinations. For each pair
$(C, \gamma)$, validation performance was measured by
training on 70% of the classifier data and testing on
the other 30%. Based on the highest 10-fold
cross-validation accuracy of 99.26%, the optimal
hyper-parameters $C = 1$ and $\gamma = 0.8$ were
selected to fit the SVM classifier. The C-SVM
classification engine modeled for face recognition is
illustrated in Figure 5.
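
The grid search itself is simple to sketch. The Python fragment below builds exponentially spaced candidate values and keeps the pair with the best score. It is an illustration under assumptions: `score` is a hypothetical callable standing in for a cross-validation accuracy, and the exponent step is left as a parameter because the spacing that produces the 75 × 75 grid is not stated in the text.

```python
def exp_grid(lo_exp, hi_exp, step):
    """Values 2**lo_exp, 2**(lo_exp + step), ..., up to 2**hi_exp."""
    count = int(round((hi_exp - lo_exp) / step)) + 1
    return [2.0 ** (lo_exp + i * step) for i in range(count)]

def grid_search(score, c_values, gamma_values):
    """Return (best_score, best_C, best_gamma) over all (C, gamma) pairs."""
    best = None
    for c in c_values:
        for gamma in gamma_values:
            s = score(c, gamma)  # e.g. a cross-validation accuracy
            if best is None or s > best[0]:
                best = (s, c, gamma)
    return best
```

With an exponent step of 2, the listed ranges give 11 values of C and 10 values of gamma, i.e. 110 pairs; a denser grid follows from a smaller `step`.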

6. Experimental Results

Experiments carried out involved splitting the ORL
face database into training and testing sets. For the
purpose of training, 70% of the ORL face images (280
images) were used, while the other 30% (120 images)
were used for testing the face recognition model. All
experimental results obtained were an average of 10
consecutive simulations, with different sets being used
for training and testing each time.
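
The evaluation protocol, a fresh 70/30 split for each of 10 runs followed by averaging, can be sketched as follows. This Python fragment is an assumption-laden illustration: it uses a plain random split over all 400 image indices (the text does not say whether the split is balanced per person), and `evaluate` is a hypothetical callable that trains on the first index list, tests on the second and returns a recognition rate.

```python
import random

def split_indices(n_items, train_fraction, rng):
    """Shuffle the indices 0..n_items-1 and cut them into (train, test)."""
    indices = list(range(n_items))
    rng.shuffle(indices)
    cut = int(round(n_items * train_fraction))
    return indices[:cut], indices[cut:]

def averaged_rate(evaluate, n_items=400, train_fraction=0.7, runs=10, seed=0):
    """Mean of evaluate(train, test) over `runs` independent random splits."""
    rng = random.Random(seed)
    rates = [evaluate(*split_indices(n_items, train_fraction, rng)) for _ in range(runs)]
    return sum(rates) / len(rates)
```

For 400 images and a training fraction of 0.7, each split contains 280 training and 120 testing indices, matching the counts given above.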
6.1. Wavelet Decomposition Level

The first experiment studies the effect of the Haar
wavelet decomposition level on the recognition rate of
the system. The recognition rates and evaluation times
obtained for different wavelet decomposition levels
are given in Table 1.

Table 1. Recognition Rates for Different Wavelet Decomposition Levels

Decomposition level   Evaluation time (secs)   Recognition rate (%)
1                     35.42                    98.71
2                     42.20                    97.03

As can be seen from Table 1, Haar Level 1 wavelet
decomposition yields a better recognition rate than
Level 2 decomposition. This is because the Level 1 LL
sub-band retains most of the image energy,
concentrated in the low frequency components, whereas
the $LL_2$ sub-band contains much less energy at
reduced dimensions.

6.2. DCT Block Size

The second experiment studies the effect of DCT block
size on the recognition rate of the system, with each
DCT coefficient being used in the feature vector.
Table 2 indicates that the best recognition rate is
obtained for a DCT block size of 6 × 6.

Table 2. Recognition Rates for Different DCT Block Sizes

DCT block size         2 × 2    4 × 4    6 × 6    8 × 8
Recognition rate (%)   97.42    98.16    98.97    98.71

6.3. DCT Feature Vector Size

The third experiment is concerned with the
computational load, which comes from large sized
DCT-feature vectors. The aim of this experiment is to
determine if smaller DCT-feature vectors can be used
without significantly degrading system performance.
For the chosen DCT block size of 6 × 6 pixels, a total
of 36 DCT coefficients are computed for every sample.
Each of these coefficients represents a separate
dimension in a 36-dimensional feature space. By
assessing the variance in each dimension of this
space, it was possible to determine which of the
coefficients contribute most to the final decision of
the classifier. The results in Table 3 reveal that, in
spite of the dramatic reduction from 216 DCT-features
to only 22, the recognition rates obtained are
essentially the same. This experiment demonstrates
that good face recognition performance is possible
even with feature vectors that are dramatically
reduced in size relative to the usual case for
DCT-based analysis.
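
One simple way to act on per-dimension variance is to rank the feature dimensions by their variance over the training samples and keep the highest-ranked ones. The Python sketch below shows that idea only. The exact selection rule used to arrive at 22 features is not given in the text, so this ranking criterion is an assumption, and the function names are hypothetical.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def top_variance_dims(samples, keep):
    """Indices of the `keep` feature dimensions with the largest variance."""
    dims = range(len(samples[0]))
    variances = [variance([s[d] for s in samples]) for d in dims]
    ranked = sorted(dims, key=lambda d: variances[d], reverse=True)
    return sorted(ranked[:keep])  # ascending index order for stable slicing

def reduce_features(sample, dims):
    """Project one feature vector onto the selected dimensions."""
    return [sample[d] for d in dims]
```

The same index list, computed once on the training set, would then be applied to every training and testing vector before classification.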

Table 3. Recognition Rates for DCT Feature Vector Sizes

DCT feature vector size   Evaluation time (secs)   Recognition rate (%)
22                        14.06                    98.90
216                       35.42                    98.97

For comparison purposes, face recognition results as
reported by the respective authors on the ORL face
database are shown in Table 4. The proposed hybrid
DWT-DCT-SVM face recognition approach achieves the
highest recognition rate among the listed techniques,
which suggests that the hybrid DWT-DCT combination for
feature selection provides more accurate recognition.

Table 4. Comparative Results on ORL Face Database

Method                         Recognition rate (%)   Ref.
Eigenface                      90.50                  [4]
FisherFace (PCA + LDA)         95.00                  [4]
Gabor Wavelet + KPCA + SVM     95.40                  [7]
Discriminant Wavelet + NFS     96.10                  [8]
DWT-PCA                        96.50                  [6]
DWT-SHMM                       97.00                  [9]
SVM                            97.00                  [4]
DCT-LDA                        97.50                  [4]
DWT-HMM                        98.50                  [5]
DWT-DCT-SVM                    98.90                  Proposed

7. Conclusion

In this paper a novel framework for pose invariant
human face recognition is presented. This framework
uses a hybrid combination of the DWT and DCT to
extract dominant frequency features from face images.
The DWT is used to capture major face features,
whereas the DCT extracts the most dominant face
information from the lower frequency components in
the DCT frequency domain. Furthermore, SVM is
adopted to classify DWT-DCT based features for
recognition purposes. The proposed DWT-DCT-SVM
face recognition model is evaluated in MATLAB on
the ORL face database. Experimental results obtained
reveal that the proposed DWT-DCT-SVM approach
shows good accuracy and outperforms previously
proposed face recognition techniques. Furthermore, the
reduced feature space substantially lowers the
computational requirements compared with standard DCT
feature extraction methods, making the system well
suited for low-cost, real-time implementation.

8. References

[1] R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and
machine recognition of faces: A survey," Proceedings of the
IEEE, Vol. 83, No. 5, pp. 705-741, May 1995.

[2] J. Nagi, S. K. Ahmed, and F. Nagi, "A MATLAB based
Face Recognition System using Image Processing and
Neural Networks," in Proc. of 4th International Colloquium
on Signal Processing and its Applications, pp. 83-88, 2008.

[3] E. Hjelmås, and B. K. Low, "Face detection: A survey,"
Computer Vision and Image Understanding, Vol. 83, No. 3,
pp. 236-274, Sept. 2001.

[4] Zhang Yankun, and Liu Chongqing, "Efficient face
recognition method based on DCT and LDA," Journal of
Systems Engineering and Electronics, Vol. 15, No. 2,
pp. 211-216, 2004.

[5] Vinayadatt V. Kohir, and U. B. Desai, "DWT-HMM Based
Face Recognition," in Proc. of Indian Conference on
Computer Vision and Image Processing, pp. 173-178, 1998.

[6] P. Nicholl, and A. Amira, "DWT/PCA Face Recognition
using Automatic Coefficient Selection," in Proc. of the 4th
IEEE International Symposium on Electronic Design, Test
and Applications, pp. 390-393, 2008.

[7] Guang Dai, and Changle Zhou, "Face Recognition Using
Support Vector Machines with the Robust Feature," in Proc.
of IEEE International Workshop on Robot and Human
Interactive Communication, pp. 49-53, 2003.

[8] Jen-Tzung Chien, and Chia-Chen Wu, "Discriminant
waveletfaces and nearest feature classifiers for face
recognition," IEEE Transactions on PAMI, Vol. 24, No. 12,
pp. 1644-1649, 2002.

[9] P. Nicholl, A. Amira, D. Bouchaffra, and R. H. Perrott,
"Multiresolution Hybrid Approaches for Automated Face
Recognition," in Proc. of the Second NASA/ESA Conference
on Adaptive Hardware and Systems, pp. 89-96, 2007.

[10] Dmitry Briliuk (2002). Normalized ORL face database.
Face recognition articles and demos. [Online]. Available:
http://handysolution.com/facerec.htm

[11] C.-C. Chang, and C.-J. Lin. LIBSVM: A library for
support vector machines. [Online]. Available:
http://www.csie.ntu.edu.tw/~cjlin/libsvm
