3D Object Recognition
COMPUTER GRAPHICS
(CSE-405)
INDEX
Acknowledgement
Introduction
Algorithms
Neural Network
References
Acknowledgement
I would like to thank my ma'am, Ms. Shashi Raina, who gave me a topic that is completely of my
interest. Her continuous help in the completion of my research paper is greatly appreciated. I also
thank all my friends who helped me from time to time with everything that was required to complete this.
Warm Regards,
Praneet Jain
Introduction
Face recognition and face detection are both instances of a wider class of computer vision
problems called pattern classification.
Face Detection:
Given as input an arbitrary image, which could be a digitized video signal or a scanned photograph,
determine whether or not there are any human faces in the image, and if there are, return an encoding
of the location and spatial extent of each human face in the image.
Face Recognition:
Given an input image of a face, compare the input face against models in a library of known faces
and report if a match is found.
Face Localization:
The input image contains exactly one human face, and the task is to determine the location and scale
of the face, and sometimes also its pose.
View-based Detection:
View-based detection is used by almost all face detection systems. The idea is to classify a small
view window (20 x 20 pixels in the systems described here) as face or non-face.
Overall, face detection can be extremely difficult given varied inputs with different facial
appearances, lighting, shadows, scaling and dimensional variances.
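As an illustration of view-based detection, the sketch below slides a 20 x 20 window over a grayscale image and hands each window to a classifier. The scan step and the classifyWindow placeholder are assumptions made for the example, not details of any particular system.

// Sketch of view-based detection: slide a 20 x 20 window over a grayscale
// image and classify each window as face / non-face.
// classifyWindow() is a hypothetical stand-in for a trained classifier.
import java.util.ArrayList;
import java.util.List;

public class WindowScanner {

    static final int WINDOW = 20;   // view window size in pixels
    static final int STEP = 2;      // assumed scan step

    // Returns the top-left corners of windows the classifier accepts.
    static List<int[]> scan(double[][] gray) {
        List<int[]> hits = new ArrayList<>();
        int rows = gray.length, cols = gray[0].length;
        for (int y = 0; y + WINDOW <= rows; y += STEP) {
            for (int x = 0; x + WINDOW <= cols; x += STEP) {
                double[] window = new double[WINDOW * WINDOW];
                for (int dy = 0; dy < WINDOW; dy++)
                    for (int dx = 0; dx < WINDOW; dx++)
                        window[dy * WINDOW + dx] = gray[y + dy][x + dx];
                if (classifyWindow(window) > 0.5) {
                    hits.add(new int[] {x, y});
                }
            }
        }
        return hits;
    }

    // Placeholder for the classifier output in [0, 1].
    static double classifyWindow(double[] window) {
        return 0.0; // replace with the trained network's output
    }
}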
Algorithms
Training Data Preparation:
- For each face and non-face image:
o Subtract out an approximation of the shading plane to correct for single light source
effects.
o Rescale histogram so that every image has the same gray level range.
- Aggregate data into data sets.
The shading plane is approximated by fitting a linear function a * x + b * y + c to the gray levels of
the window by least squares. Then

I'(x, y) = I(x, y) − (a * x + b * y + c)
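Below is a minimal, self-contained Java sketch of this correction, assuming the window is a small 2-D array of gray levels. The plane coefficients are found here from the 3 x 3 normal equations with a tiny Gaussian-elimination helper; these are illustration choices, not the implementation actually used in this report.

// Sketch: fit a linear shading plane a*x + b*y + c to a gray-level window
// by least squares, then subtract it: I'(x,y) = I(x,y) - (a*x + b*y + c).
public class ShadingCorrection {

    static double[][] subtractShadingPlane(double[][] img) {
        int h = img.length, w = img[0].length;
        // Build the 3x3 normal equations (A^T A) p = A^T b for p = (a, b, c).
        double[][] ata = new double[3][3];
        double[] atb = new double[3];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                double[] row = {x, y, 1.0};
                for (int i = 0; i < 3; i++) {
                    for (int j = 0; j < 3; j++) ata[i][j] += row[i] * row[j];
                    atb[i] += row[i] * img[y][x];
                }
            }
        }
        double[] p = solve3x3(ata, atb);
        double[][] out = new double[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                out[y][x] = img[y][x] - (p[0] * x + p[1] * y + p[2]);
        return out;
    }

    // Gaussian elimination with partial pivoting for a 3x3 system.
    static double[] solve3x3(double[][] a, double[] b) {
        int n = 3;
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            double[] tmp = a[col]; a[col] = a[pivot]; a[pivot] = tmp;
            double t = b[col]; b[col] = b[pivot]; b[pivot] = t;
            for (int r = col + 1; r < n; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c < n; c++) a[r][c] -= f * a[col][c];
                b[r] -= f * b[col];
            }
        }
        double[] x = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double s = b[r];
            for (int c = r + 1; c < n; c++) s -= a[r][c] * x[c];
            x[r] = s / a[r][r];
        }
        return x;
    }
}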
In order to implement the linear fit function, I used the Java Image Processing API [5], and the Java
Matrix Class [6]. The histogram equalization function is from the Java Image Processing API.
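For completeness, here is an assumed, self-contained sketch of standard histogram equalization on an 8-bit grayscale array; it only illustrates the usual cumulative-histogram mapping, not the API function actually used.

// Sketch of histogram equalization for an 8-bit grayscale image stored as
// int values in [0, 255]; gray levels are remapped through the cumulative
// histogram so every image ends up using the same gray-level range.
public class HistogramEqualization {

    static int[][] equalize(int[][] img) {
        int h = img.length, w = img[0].length, total = h * w;
        int[] hist = new int[256];
        for (int[] row : img)
            for (int v : row) hist[v]++;
        // Cumulative distribution, rescaled to [0, 255].
        int[] map = new int[256];
        int cum = 0;
        for (int v = 0; v < 256; v++) {
            cum += hist[v];
            map[v] = (int) Math.round(255.0 * cum / total);
        }
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) out[y][x] = map[img[y][x]];
        return out;
    }
}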
Also, as [2] suggests, it is a good idea to mask an oval within the face rectangle to prune the pixels
used in training the neural net; a sketch of this mask follows Figure 2 below. Figure 1 shows some of
the processed images. Figure 2 shows some non-face images.
Figure 1. The first row shows the original faces. The second row is processed by the linear fit
function. The third row is processed by the histogram equalization function. The last row is masked
by the oval mask.
Figure 2. Non-face images are randomly picked images containing no face.
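A possible sketch of the oval mask mentioned above, assuming the mask is an ellipse inscribed in the rectangular view window; the exact shape and size of the mask actually used may differ.

// Sketch: mark the pixels inside an oval (ellipse) inscribed in the window
// so that only those pixels are fed to the network during training.
public class OvalMask {

    static boolean[][] insideOval(int width, int height) {
        boolean[][] mask = new boolean[height][width];
        double cx = (width - 1) / 2.0, cy = (height - 1) / 2.0;
        double rx = width / 2.0, ry = height / 2.0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                double dx = (x - cx) / rx, dy = (y - cy) / ry;
                mask[y][x] = dx * dx + dy * dy <= 1.0;  // inside the ellipse
            }
        }
        return mask;
    }
}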
Neural Network
(Diagram: a small feed-forward network with input neurons X1 and X2, a Bias neuron, one hidden layer, and an output neuron.)
A neural network is a simulation of the human brain. Each neuron has an activation function, which
can be a linear or a sigmoid function. Basically, the activation function of a neuron computes a value
based on the previous layer's outputs and the weights of the links connected to it. Commonly used
activation functions include the identity, sigmoid and tanh activation functions. There are three kinds
of neuron layers: input, hidden and output. Both the input and the output have only one layer, but
there can be multiple hidden layers in the net. There is a special neuron named Bias in the neural
network, which handles all-0 inputs. Without the Bias, if all input neurons receive 0, the weights are
useless and the output is always 0.
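As a small illustration of this computation, the sketch below shows a single neuron with a sigmoid activation and an explicit bias weight; the class and method names are assumptions made for the example.

// Sketch of a single neuron: weighted sum of the previous layer's outputs
// plus a bias weight, passed through a sigmoid activation.
public class Neuron {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // inputs: outputs of the previous layer; weights: one per input;
    // biasWeight: weight on the constant Bias neuron (its input is fixed at 1).
    static double activate(double[] inputs, double[] weights, double biasWeight) {
        double z = biasWeight;  // the Bias contributes even when all inputs are 0
        for (int i = 0; i < inputs.length; i++) {
            z += weights[i] * inputs[i];
        }
        return sigmoid(z);
    }
}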
Both [1] and [2] suggested using a three-layer neural network, which includes one hidden layer. In
[1], the author stated that it is important to split the input into small pieces instead of using complete
connections over the entire input (Figure 3), while the experiments in [2] show that the detailed net
architecture is not crucial. I implemented a basic backpropagation neural network based on the code
from [4]. The net has 400 input neurons mapping the 400 pixels of the view window, 20 hidden
neurons and 1 output neuron, with complete connections between layers. I also tried the architecture
mentioned in [1], but there was no apparent evidence that the results are better.
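The training itself follows the backpropagation code from [4]; the sketch below only shows an assumed forward pass of the 400-20-1 fully connected net described above, with the bias stored as the last entry of each weight vector.

// Sketch of the forward pass for a fully connected 400-20-1 network:
// 400 inputs (one per pixel of the 20 x 20 view window), 20 hidden neurons,
// 1 output neuron. The weights would be learned by backpropagation.
public class FaceNet {

    double[][] hiddenWeights = new double[20][401]; // 400 inputs + bias
    double[] outputWeights = new double[21];        // 20 hidden + bias

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // window: the 400 gray values of the 20 x 20 view window.
    double forward(double[] window) {
        double[] hidden = new double[20];
        for (int j = 0; j < 20; j++) {
            double z = hiddenWeights[j][400];  // bias weight
            for (int i = 0; i < 400; i++) z += hiddenWeights[j][i] * window[i];
            hidden[j] = sigmoid(z);
        }
        double z = outputWeights[20];          // bias weight
        for (int j = 0; j < 20; j++) z += outputWeights[j] * hidden[j];
        return sigmoid(z);                     // near 1 => face, near 0 => non-face
    }
}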
Figure 4: The basic algorithm used in [1]. On the left is the input image pyramid, scaled by a factor
of 1.2. In the middle are the input windows after the brightness gradient correction and histogram
equalization. On the right is the neural network architecture, whose input neurons are grouped by
different input regions.
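A rough sketch of the pyramid step from Figure 4, assuming grayscale arrays and nearest-neighbour resampling; the stopping size and the resampling method are illustrative assumptions, not details taken from [1].

// Sketch: build an image pyramid by repeatedly shrinking the image by a
// factor of 1.2, so that faces larger than the 20 x 20 view window can
// still be found by scanning each pyramid level.
import java.util.ArrayList;
import java.util.List;

public class ImagePyramid {

    static List<double[][]> build(double[][] img, int minSize) {
        List<double[][]> levels = new ArrayList<>();
        double[][] current = img;
        while (current.length >= minSize && current[0].length >= minSize) {
            levels.add(current);
            current = shrink(current, 1.2);
        }
        return levels;
    }

    // Nearest-neighbour downscaling by the given factor.
    static double[][] shrink(double[][] img, double factor) {
        int h = (int) (img.length / factor);
        int w = (int) (img[0].length / factor);
        double[][] out = new double[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                out[y][x] = img[(int) (y * factor)][(int) (x * factor)];
        return out;
    }
}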
Figure 5. The results above were created by feeding the net the same data on which it was trained.
Figure 6. From left to right: original image, mirror flipped, translated up, translated down,
translated left, translated right, scaled 0.9, scaled 0.95, scaled 1.05 and scaled 1.10.
Figure 5: 48 x 48 pixel images. The first two faces are in the training data set, while the last three are
not. The faces are all detected.
Figure 6: 100 x 76 pixel image. The image is not in the training data set. The face is detected, but
with a lot of false detections.
Figure 7: 120 x 86 pixel image. The image is not in the training data set. Neither of the two faces is
detected.
Figure 8: 150 x 65 pixel image. The image is not in the training data set. Only one of the 7 faces is
detected, with a lot of false detections.
References
[2] Sung, Kah-Kay and Poggio, Tomaso. Example-based learning for view-based human face
detection. A.I. Memo 1521, CBCL Paper 112, MIT, December 1994.
[3] Duda, R.O., Hart, P.E. and Stork, D.G. Pattern Classification. Wiley, New York, 2001.
[4] Russell, S.J. and Norvig, P. Artificial intelligence: A modern approach. Prentice Hall/Pearson
Education, Upper Saddle River, N.J., 2003.