Beruflich Dokumente
Kultur Dokumente
Authorized licensed use limited to: IEEE Xplore. Downloaded on January 9, 2009 at 04:34 from IEEE Xplore. Restrictions apply.
Pattern Pattern+Bias Pattern×Gain
hface
x (Mi (x)) = αhface
x (Mi (x)) , ∀x ∈ W. (6)
P face
If ci is a nonface and x∈W hx (Mi (x)) > γ then demotion
Original
takes place
hface
x (Mi (x)) = βhface
x (Mi (x)) , ∀x ∈ W. (7)
(2L )D
X
vx = |hx (Pi )|, ∀x ∈ W (8)
i=1
in the feature patch. Sorting all the feature relevance values in the
patch will result in an importance list. Let W ⊆ W be a subset
chosen to contain the features with the largest relevance values. Then
Fig. 1. Example 4 × 4 local area patterns and SMQT results. X
θ = hx (M(x)) (9)
x∈W
local patterns which contain the same structure will yield the same can function as a weak classifier, rejecting no faces within the train-
SMQT features for a specified level L, see Fig. 1. ing database, but at the cost of an increased number of false detec-
The number of possible patterns using local SMQT features will
` ´D tions. The desired threshold used on θ is found from the face in the
be 2L . For example the 4×4 pattern at L = 1 in Fig. 1 has training database that results in the lowest classification value from
` 1 ´4×4 Eq. (9).
2 = 65536 possible patterns.
Extending the number of sub classifiers can be achieved by se-
lecting more subsets and performing the same operations as described
3. SPLIT UP SNOW CLASSIFIER for one sub classifier. Consider any division, according to the rele-
vance values, of the full set W ⊆ W ⊆ . . . ⊆ W. Then W
The SNoW learning architecture is a sparse network of linear units has fewer features and more false detections compared to W and
over a feature space [9]. One of the strong properties of SNoW is so forth in the same manner until the full classifier is reached. One
the possibility to create lookup-tables for classification. Consider a of the advantages of this division is that W will use the sum result
patch W of the SMQT features M(x), then a classifier from W . Hence, the maximum of summations and lookups in the
table will be the number of features in the patch W.
X X
θ= hnonface
x (M(x)) − hface
x (M(x)) (4)
x∈W x∈W 4. FACE DETECTION TRAINING AND CLASSIFICATION
II 590
Authorized licensed use limited to: IEEE Xplore. Downloaded on January 9, 2009 at 04:34 from IEEE Xplore. Restrictions apply.
A circular mask containing P = 648 pixels is applied to each This setting will remove over 90% of the background patches in the
patch to remove background pixels, avoid edge effects from possi- initial stages from video frames recorded in an office environment.
ble filtering and to avoid undefined pixels at rotation operation, see Overlapped detections are pruned using geometrical location and
Fig. 2. classification scores. Each detection is tested against all other de-
tections. If one of the area overlap ratios is over a fixed threshold,
Pixels Pixels masked then the different detections are considered to belong to the same
face. Given that two detections overlap each other, the detection with
the highest classification score is kept and the other one is removed.
This procedure is repeated until no more overlapping detections are
found.
Mask
Fig. 2. Masking of pixel image and feature indices image. The fea-
tures are here found by using a 3 × 3 local area and L = 1.
With the SNoW and the split up SNoW classifier, the lookup
table is the major memory-intense issue. Consider the use of Nbit =
32 bit floating numbers in the table, then the classifier size (in bits)
will be
D↓ L→ 1 2 3
2×2 40.5 KB 648 KB -
3×3 1.26 MB 648 MB 324 GB Fig. 3. Left - face image marked with three landmarks. Right -
4×4 162 MB 10.1 TB 648 PB examples of how the three landmarks are used to warp the face to
5×5 81 GB - - the 32 × 32 patches with different destination points for variation.
Table 1. Size of the classifier table with different local area sizes and
different levels of the SMQT. P = 648 and Nbit = 32, see Eq. (11).
4.2. Nonface Database
The choice of the local area and the level of the SMQT are of Initially the nonface database contains randomly generated patches.
vital import to successful practical operation. For the split up SNoW A classifier is then trained using this nonface database and the face
classifier, with fast lookup table operation, one of the properties to database. A collection of videos are prepared from clips of movies
consider is memory. Another is the local area required to make containing no faces and are used to bootstrap the database by ana-
Eq. (3) valid. Finally, the level of the transform is important in order lyzing all frames in the videos. Every false positive detection in any
to control the information gained from each feature. In this paper, frame will be added to the nonface database. The nonface database
the 3 × 3 local area and level L = 1 are used and found to be a is expanded using this bootstrap methodology. In final training, a
proper balance for the classifier. Some tests with 3 × 3 and L = 2 total of approximatively one million nonface patches are used after
were also conducted. Although these tests showed promising results, bootstrapping.
the amount of memory required made them impractical, see Tab. 1.
The face and nonface tables are trained with the parameters α = 5. RESULTS
1.005, β = 0.995 and γ = 200. The two trained tables are then
combined into one table according to Eq. 5. Given the SNoW clas- The proposed face detector is evaluated on the CMU+MIT database
sifier table, the proposed split up SNoW classifier is created. The [6] which contains 130 images with 507 frontal faces and the BioID
splits are here performed on 20, 50, 100, 200 and 648 summations. database [11] which has 1521 images showing 1522 upright faces.
II 591
Authorized licensed use limited to: IEEE Xplore. Downloaded on January 9, 2009 at 04:34 from IEEE Xplore. Restrictions apply.
For the scanning procedure used here, the CMU+MIT database has into several weaker classifiers in cascade. Each weak classifier uses
77138600 patches to analyze and the BioID database 389252799 the result from previous weaker classifiers which makes it computa-
patches. Both these databases are commonly used for upright face tionally efficient.
detection within the face detection community. The performance A face detection system using the local SMQT features and the
is presented with a Receiver Operation Characteristic (ROC) curve split up SNoW classifier was proposed. The face detector achieves
[13] for each database, see Fig. 4. With regard to the scanning used the best published ROC curve for the BioID database, and a ROC
here, the False Positive Rate (FPR) is 1.93 ∗ 10−7 and the True Pos- curve comparable with state-of-the-art published face detectors for
itive Rate (TPR) is 0.95 if the operation on both databases is consid- the CMU+MIT database.
ered (77138600+389252799 patches analyzed).
The proposed local SMQT features and the split up SNoW clas- 7. REFERENCES
sifier achieves the best presented BioID ROC curve and compara-
ble results with other works on the CMU+MIT database. An exten- [1] O. Lahdenoja, M. Laiho, and A. Paasio, “Reducing the feature
sive comparison to other works on these databases can be found in vector length in local binary pattern based face recognition,” in
[2, 4, 5, 11]. IEEE International Conference on Image Processing (ICIP),
September 2005, vol. 2, pp. 914–917.
1
[2] B. Froba and A. Ernst, “Face detection with the modified cen-
sus transform,” in Sixth IEEE International Conference on Au-
0.9 tomatic Face and Gesture Recognition, May 2004, pp. 91–96.
[3] M. Nilsson, M. Dahl, and I. Claesson, “The successive mean
quantization transform,” in IEEE International Conference on
True Positive Rate (TPR)
0.8
Acoustics, Speech, and Signal Processing (ICASSP), March
2005, vol. 4, pp. 429–432.
0.7 [4] M.-H. Yang, D. Kriegman, and N. Ahuja, “Detecting faces in
images: A survey,” IEEE Transactions on Pattern Analysis and
Machine Intelligence (PAMI), vol. 24, no. 1, pp. 34–58, 2002.
0.6
[5] E. Hjelmas and B. K. Low, “Face detection: A survey,” Com-
puter Vision and Image Understanding, vol. 3, no. 3, pp. 236–
0.5 274, 2001.
BioID − database [6] H. Rowley, S. Baluja, and T. Kanade, “Neural network-based
MIT+CMU − database
0.4 face detection,” in In Procedings of Computer Vision and Pat-
0 5 10 15 20 25 30 35 40
Number of False Positives (FP)
tern Recognition, June 1996, pp. 203–208.
[7] H. Schneiderman and T. Kanade, “Probabilistic modeling of
Fig. 4. Detection results on MIT+CMU (130 images, 507 faces local appearance and spatial relationships for object recogni-
to detect and 77138600 patches to analyze) and the BioID (1521 tion,” in Proceedings of the IEEE Conference on Computer
images, 1522 faces to detect and 389252799 patches to analyze) Vision and Pattern Recognition (CVPR ’98), July 1998, pp. 45–
databases. 51.
[8] P. Viola and M. Jones, “Rapid object detection using a boosted
Note that the masking performed on each patch restricts detec- cascade of simple features,” in In Proceedings of the 2001
tion of faces located on the edge of images, since important infor- IEEE Computer Society Conference on Computer Vision and
mation, such as the eyes, can be masked away in those particular Pattern Recognition (CVPR), 2001, vol. 1, pp. 511–518.
positions. This is typically the case with only few of the images [9] D. Roth, M. Yang, and N. Ahuja, “A snow-based face detector,”
found in the BioID database, hence to achieve a detection rate of in In Advances in Neural Information Processing Systems 12
one requires a large amount of false detections for those particular (NIPS 12), pp. 855–861, MIT Press 2000.
faces. The patches of size 32 × 32 also restrict detection of smaller [10] E. Osuna, R. Freund, and F. Girosi, “Training support vector
faces unless upscaling is performed. The upscaling could be utilized machines:an application to face detection,” in Proceedings of
on the CMU+MIT database, since it contains some faces that are of the IEEE Conference on Computer Vision and Pattern Recog-
smaller size, however it is not considered here for the purpose of fair nition (CVPR ’97), 1997, pp. 193–199.
comparison with other works. Some of the faces were missed in the
[11] K. J. Kirchberg, O. Jesorsky, and R. W. Frischholz, “Robust
databases - a result which may have ensued due to scanning issues
face fetection using the hausdorff distance,” in Audio- and
such as masking or patch size.
Video-Based Person Authentication - AVBPA 2001, Josef Bi-
gun and Fabrizio Smeraldi, Eds., Halmstad, Sweden, 2001,
6. CONCLUSIONS vol. 2091 of Lecture Notes in Computer Science, pp. 90–95,
Springer.
This paper has presented local SMQT features which can be used as [12] E. Adelson, “Lightness perception and lightness illusions,” in
feature extraction for object detection. Properties for these features The Cognitive Neurosciences, M. Gazzaniga, Ed. MIT Press,
were presented. The features were found to be able to cope with Cambridge, MA, 1999.
illumination and sensor variation in object detection.
[13] T. Fawcett, “Roc graphs: Notes and practical considerations
Further, the split up SNoW was introduced to speed up the stan-
for researchers,” Tech. Rep., HP Laboratories, MS 1143, 1501
dard SNoW classifier. The split up SNoW classifier requires only
Page Mill Road, Palo Alto CA 94304, USA, 2004.
training of one classifier network which can be arbitrarily divided
II 592
Authorized licensed use limited to: IEEE Xplore. Downloaded on January 9, 2009 at 04:34 from IEEE Xplore. Restrictions apply.