
Object Detection - Basics¹

Lecture 28

See Sections 10.1.1, 10.1.2, and 10.1.3 in:
Reinhard Klette: Concise Computer Vision. Springer-Verlag, London, 2014

¹ See last slide for copyright information.

Agenda

1 Localization, Classification, Evaluation

2 Descriptors, Classifiers, Learning

3 Performance of Object Detectors

4 Descriptor Example: Histogram of Oriented Gradients


Localization

Localization, classification, and evaluation are three basic steps of an object detection system

Object candidates are localized within a rectangular bounding box


Classification
Localized object candidates are mapped by classification either into detected objects or into rejected candidates

Face detection example: one false-positive and two false-negatives (not counting the side-view of a face)

Evaluation

A true-positive, also called a detection, is a correctly detected object

A false-positive, also called a false detection, occurs if we detect an object where there is none

A false-negative denotes a case where we miss an object

A true-negative describes the cases where non-object regions are correctly identified as non-object regions (typically not of interest)


Which one is a TP, FP, FN, or TN?



Descriptors

Classification assigns membership in pairwise-disjoint classes, which are subsets of R^n, where n > 0 is defined by the used descriptors

A descriptor x = (x_1, ..., x_n) is a point in the n-dimensional descriptor space R^n, representing measured or calculated property values in a given order

Two examples:

n = 128 for SIFT

n = 2 in the example below: the descriptor space is defined by the properties perimeter and area; e.g. descriptor x_1 = (621.605, 10940) for Segment 1


Example: 2D Descriptor Space

[Figure: Left: six labeled segments (1 to 6) in a segmented image. Right: the corresponding 2D descriptor space with axis perimeter (about 200 to 2,600) and axis area (about 10,000 to 80,000); each segment appears as one descriptor point, and a blue line separates the classes +1 and -1]

The blue line defines a binary classifier; it subdivides the descriptor space into two half-planes such that descriptors in one half-plane are assigned the value +1 (i.e. +1 is a class identifier), and descriptors in the other half-plane the value -1


Classifiers

A classifier (i.e. a partitioning of the descriptor space) assigns class numbers to descriptors

Training: using a given set {x_1, ..., x_m} of already-classified descriptors (the learning set) for defining the partitioning (i.e. the classifier)

Application: on descriptors generated for recorded data

General classifier: assigns class numbers 1, 2, ..., k for k > 1 classes, and 0 for not classified

Binary classifier: assigns class number -1 or +1


Weak or Strong Classifiers

A classifier is weak if it does not perform up to expectations (e.g., it might be just a bit better than random guessing)

Multiple weak classifiers can be combined into a strong classifier, aiming at a satisfactory solution of a classification problem

Weak or strong classifiers can be general-case (i.e. multi-class) classifiers or just binary classifiers; just being binary does not define weak

Example: AdaBoost defines a statistical combination of multiple weak classifiers into one strong classifier (see later)
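To make the idea of combining weak classifiers concrete, here is a minimal sketch, assuming the weak classifiers and their weights already come from some training procedure; the decision stumps and weights below are made up, and this shows only the combination step, not AdaBoost training itself:

```python
def strong_classify(x, weak_classifiers, alphas):
    """Weighted vote of weak binary classifiers (each returning -1 or +1);
    the sign of the weighted sum defines the strong classifier's output."""
    votes = sum(alpha * h(x) for h, alpha in zip(weak_classifiers, alphas))
    return 1 if votes >= 0 else -1

# Hypothetical weak classifiers: decision stumps on single descriptor components
weak = [lambda x: 1 if x[0] > 0.5 else -1,
        lambda x: 1 if x[1] > 0.2 else -1,
        lambda x: 1 if x[0] + x[1] > 1.0 else -1]
alphas = [0.4, 0.3, 0.8]   # made-up weights, e.g. from AdaBoost training

print(strong_classify((0.7, 0.1), weak, alphas))   # prints -1 for this input
```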


Example 1: Binary Classifier by Linear Separation

We define a binary classifier by constructing a hyperplane

Π : w^T x + b = 0

in R^n, for n ≥ 1

Vector w ∈ R^n is the weight vector

Real b ∈ R is the bias of Π

Example: for n = 2 or n = 3, w is the gradient or normal orthogonal to the defined line or plane Π, respectively


Example 1: Continued

[Figure: two distributions of 2D descriptors plotted over coordinates x_1 and x_2]

Left: Linearly separable distribution of descriptors, pre-classified to be either in class +1 (green descriptors) or -1 (red descriptors)

Right: Not linearly separable; the sum of the shown distances (black line segments) of misclassified descriptors defines the total error for Π

Example 1: Continued

h(x) = w^T x + b

h(x) ≥ 0: one side of the hyperplane (including the plane itself) defines value +1

h(x) < 0: the other side (not including the plane itself) defines value -1

A linear classifier defined by w and b can be calculated for a distribution of (pre-classified) training descriptors in the n-dimensional descriptor space

The error for a misclassified descriptor x is the perpendicular distance

d_2(x, Π) = |w^T x + b| / ||w||_2

to the hyperplane Π

Task: Calculate Π such that the total error over all misclassified training descriptors is minimized
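A minimal sketch of applying such a linear classifier and of evaluating the total error; the hyperplane (w, b) and the training descriptors below are made-up 2D values, not taken from the lecture:

```python
import numpy as np

def classify(x, w, b):
    """Binary linear classifier: +1 on the side with h(x) = w^T x + b >= 0, else -1."""
    return 1 if np.dot(w, x) + b >= 0 else -1

def distance_to_hyperplane(x, w, b):
    """Perpendicular distance d_2(x, Pi) = |w^T x + b| / ||w||_2."""
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

def total_error(descriptors, labels, w, b):
    """Sum of distances to Pi over all misclassified training descriptors."""
    return sum(distance_to_hyperplane(x, w, b)
               for x, y in zip(descriptors, labels)
               if classify(x, w, b) != y)

# Made-up pre-classified 2D training descriptors and a candidate hyperplane
X = [np.array([0.2, 0.9]), np.array([0.8, 0.3]), np.array([0.6, 0.4])]
y = [+1, -1, +1]
w, b = np.array([-1.0, 1.0]), 0.0
print(total_error(X, y, w, b))   # only the third descriptor is misclassified
```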

Example 2: Classification by Using a Binary Decision Tree

The classifier is defined by binary decisions (i.e. yes or no) at the split nodes of a tree

Each decision is formalized by a rule, and given input data can be tested whether they satisfy the rule or not

Accordingly, we proceed with the identified successor node in the tree

Each leaf node of the tree finally defines an assignment of the data arriving at this node into classes

Example: each leaf node identifies exactly one class in R^n; see the next slide for n = 2


Example 2: Continued
Left: Decision tree with split rules such as x_1 < 100, x_2 > 60, x_1 > 160, and x_1 + x_2 < 120, each answered by yes or no

Right: Resulting subdivision of the 2D descriptor space (coordinates x_1 and x_2, shown up to 200) into regions, one per leaf node

The tested rules in the shown tree define straight lines in the 2D descriptor space; descriptors arriving at one of the leaf nodes are in one of the shown subsets of R^2
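For illustration, a minimal sketch of a decision tree with rules like those in the figure; the exact tree layout and the class numbers returned at the leaves are hypothetical:

```python
def tree_classify(x1, x2):
    """Toy decision tree over a 2D descriptor (x1, x2); each split node tests
    a rule (a straight line in the descriptor space), each leaf returns a class."""
    if x1 < 100:
        return 1 if x2 > 60 else 2
    elif x1 > 160:
        return 3
    else:
        return 4 if x1 + x2 < 120 else 5

print(tree_classify(50, 80))    # x1 < 100 and x2 > 60            -> class 1
print(tree_classify(110, 5))    # 100 <= x1 <= 160, x1 + x2 < 120 -> class 4
```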

Trees, Forests, Cascades of Binary Classifiers

A single decision tree (defined by at least one split node) can be considered to be an example of a weak classifier

A set of decision trees, called a forest, can then be used for defining a strong classifier

Observation.

A single decision tree provides a way to partition a descriptor space into multiple regions (i.e. classes)

When applying binary classifiers defined by linear separation, we need to combine several of those (e.g. in a cascade) to achieve a similar partitioning of a descriptor space


Learning

Learning is the process of defining or training a classifier based on a set of descriptors

Classification is the actual application of the classifier

During classification we may also identify some misbehavior, and this can lead to another phase of learning

The set of descriptors used for learning may be pre-classified or not

Supervised learning: we have a mechanism for assigning class numbers to descriptors (e.g. manually, based on expertise such as "yes, the driver does have closed eyes in this image")

Unsupervised learning: we do not have prior knowledge about class memberships of descriptors, e.g. for randomly selected patches in an image: is this a typical patch for a pedestrian or not?


Unsupervised Learning: Two Examples

The data distribution in the learning set decides about the classifier

Clustering:
Apply a clustering algorithm to a given set of descriptors for identifying a separation of R^n into classes
Example: analyze the density of the distribution of the given descriptors in R^n; a region having a dense distribution defines a seed point of one class, and then we assign all descriptors to the identified seed points by applying, for example, the nearest-neighbor rule (see the sketch below)

Learn Rules at Split Nodes in a Decision Tree:
Learn decision rules at split nodes, e.g. by having a general scheme for how to define such rules, and optimise parameters by maximising the information gain at this split node (e.g. an equal number of training descriptors passing to either the left or the right successor)
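A minimal sketch of the clustering example just described, assuming the seed points have already been identified by some density analysis; the seeds and descriptors here are made-up 2D values:

```python
import numpy as np

def assign_to_seeds(descriptors, seeds):
    """Nearest-neighbor rule: assign each descriptor the index of the closest
    seed point; that index serves as the (unsupervised) class number."""
    return [int(np.argmin([np.linalg.norm(x - s) for s in seeds]))
            for x in descriptors]

# Hypothetical seed points from a density analysis of the learning set
seeds = [np.array([0.1, 0.2]), np.array([0.8, 0.7])]
descriptors = [np.array([0.15, 0.25]), np.array([0.9, 0.6]), np.array([0.4, 0.5])]
print(assign_to_seeds(descriptors, seeds))   # [0, 1, 0] for these values
```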

Positive (for Pedestrian) and Negative Class Examples


Combined Learning Approaches

There are also cases where we may combine supervised learning with
strategies known from unsupervised learning
Example
Supervised: Decide whether a given bounding box shows a pedestrian, or
decide for a patch, being a subwindow of a bounding box, whether it
possibly belongs to a pedestrian
Unsupervised: Generate a decision tree, e.g. by maximising information
gain at split nodes
Result: Assign class probabilities to a leaf node in the generated tree
according to percentages of pre-classified descriptors arriving at this leaf
node
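A minimal sketch of the last step, under the assumption that some routing function mapping a descriptor to a leaf identifier is already available; the routing function, descriptors, and labels below are made up:

```python
from collections import Counter, defaultdict

def leaf_class_probabilities(descriptors, labels, leaf_id):
    """Route each pre-classified descriptor to its leaf and turn the label
    counts arriving at each leaf into class probabilities."""
    counts = defaultdict(Counter)
    for x, y in zip(descriptors, labels):
        counts[leaf_id(x)][y] += 1
    return {leaf: {cls: n / sum(c.values()) for cls, n in c.items()}
            for leaf, c in counts.items()}

# Hypothetical routing function (stands in for a trained tree) and training data
leaf_id = lambda x: 'left' if x[0] < 0.5 else 'right'
X = [(0.2, 0.1), (0.3, 0.9), (0.7, 0.4), (0.8, 0.8)]
y = ['pedestrian', 'background', 'pedestrian', 'pedestrian']
print(leaf_class_probabilities(X, y, leaf_id))
# {'left': {'pedestrian': 0.5, 'background': 0.5}, 'right': {'pedestrian': 1.0}}
```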



Object Detector and Measures

An object detector is defined by applying a classifier to an object detection problem

We assume that any decision made can be evaluated as being either correct or false

Evaluations of designed object detectors are required to compare their performance under particular conditions

There are common measures in pattern recognition and information retrieval for the performance evaluation of classifiers


Basic Definitions

Let tp and fp denote the numbers of true-positives and false-positives, respectively

Let tn and fn denote the numbers of true-negatives and false-negatives, respectively

What are the numbers for the earlier face-detection example ("Which one is a TP, FP, FN, or TN?")?

Note: the image alone does not indicate how many non-object regions have been analyzed (and correctly identified as containing no faces); thus we cannot specify the number tn; we need to analyze the applied classifier for obtaining tn


PR, RC, MR, and FPPI


Precision is the ratio of true-positives to all detections

Recall (or sensitivity) is the ratio of true-positives to all potentially possible detections

PR = tp / (tp + fp)   and   RC = tp / (tp + fn)

PR = 1: no false-positive is detected

RC = 1: all visible objects are detected & there is no false-negative

Miss rate is the ratio of false-negatives to all objects

False-positives per image is the ratio of false-positives to all detected objects

MR = fn / (tp + fn) = 1 - RC   and   FPPI = fp / (tp + fp) = 1 - PR

MR = 0: all visible objects are detected

FPPI = 0: detected objects are correctly classified

TNR and AC

tn is not a common entry for performance measures, but, if available, then we also have TNR and AC:

True-negative rate (or specificity) is the ratio of true-negatives to all decisions in no-object regions

Accuracy is the ratio of correct decisions to all decisions

TNR = tn / (tn + fp)   and   AC = (tp + tn) / (tp + tn + fp + fn)
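A minimal sketch that computes the above measures from the four counts; the counts in the usage line are made up:

```python
def detection_measures(tp, fp, fn, tn=None):
    """PR, RC, MR, and FPPI from the counts of true-positives, false-positives,
    and false-negatives; TNR and AC are added only if tn is available."""
    measures = {
        'PR': tp / (tp + fp),
        'RC': tp / (tp + fn),
        'MR': fn / (tp + fn),
        'FPPI': fp / (tp + fp),
    }
    if tn is not None:
        measures['TNR'] = tn / (tn + fp)
        measures['AC'] = (tp + tn) / (tp + tn + fp + fn)
    return measures

# Made-up counts, e.g. from evaluating a face detector on a small test set
print(detection_measures(tp=8, fp=1, fn=2, tn=50))
```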


Detected?
How do we decide whether a detected object is a true-positive?

Assume: objects in images have been locally identified (e.g. manually) by bounding boxes, serving as the ground truth

Detected objects are matched with these ground-truth boxes by calculating the ratio of areas of overlapping regions

a_o = A(D ∩ T) / A(D ∪ T)

where A denotes the area of a region in the image, D is the detected bounding box of the object, and T is the matched ground-truth bounding box

If a_o is at least a given threshold, say 0.5, the detected object is taken as a true-positive

If there is more than one possible match for a detected bounding box, then use the one with the largest a_o value
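A minimal sketch of this overlap test for axis-aligned boxes given as (x, y, width, height); the box coordinates and the threshold default are illustrative assumptions:

```python
def overlap_ratio(d, t):
    """a_o = A(D ∩ T) / A(D ∪ T) for two axis-aligned boxes (x, y, width, height)."""
    dx, dy, dw, dh = d
    tx, ty, tw, th = t
    iw = max(0.0, min(dx + dw, tx + tw) - max(dx, tx))   # intersection width
    ih = max(0.0, min(dy + dh, ty + th) - max(dy, ty))   # intersection height
    inter = iw * ih
    union = dw * dh + tw * th - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(detected, ground_truth_boxes, threshold=0.5):
    """Match the detection against all ground-truth boxes and keep the largest
    a_o; the detection is a true-positive if that value reaches the threshold."""
    best = max((overlap_ratio(detected, t) for t in ground_truth_boxes), default=0.0)
    return best >= threshold

# Made-up example: one detection, two ground-truth boxes
print(is_true_positive((10, 10, 50, 100), [(15, 12, 50, 100), (200, 50, 40, 80)]))  # True
```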


Scanning an Image for Object Candidates

1 A window of the size of the expected bounding box scans through the image

2 The scan stops at potential object candidates

3 If a potential bounding box has been identified, a process for descriptor calculation starts

The histogram of oriented gradients (HoG) is a common way to derive a descriptor for the bounding box of an object candidate
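A minimal sketch of such a sliding-window scan; the window size, the step size, and the simple candidate test are hypothetical placeholders (a real detector would score each window with a classifier applied to, e.g., its HoG descriptor):

```python
import numpy as np

def scan_image(image, win_h, win_w, step, is_candidate):
    """Slide a window of the expected bounding-box size over the image and
    collect all window positions flagged as potential object candidates."""
    H, W = image.shape[:2]
    candidates = []
    for y in range(0, H - win_h + 1, step):
        for x in range(0, W - win_w + 1, step):
            window = image[y:y + win_h, x:x + win_w]
            if is_candidate(window):
                candidates.append((x, y, win_w, win_h))
    return candidates

# Hypothetical candidate test: enough average gradient energy in the window
gradient_energy = lambda w: np.abs(np.diff(w, axis=1)).mean() > 0.3
image = np.random.rand(128, 256)          # made-up gray-level image
boxes = scan_image(image, win_h=64, win_w=32, step=16, is_candidate=gradient_energy)
print(len(boxes), "candidate boxes")
```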


Bounding Box, Blocks, and Cells


A bounding box (here: of a pedestrian) is subdivided into blocks, and each
block into smaller cells for calculating the HoG

Yellow solid or dashed blocks are subdivided into red cells; a block moves
left to right, top down, through a bounding box
Right: Magnitudes of gradient vectors

Algorithm for Calculating the HoG Descriptor

1 Preprocessing. Intensity normalization and smoothing

2 Calculate an edge map. Gradient magnitudes and gradient angles for each pixel, generating a magnitude map I_m and an angle map I_a

3 Spatial binning.
  1 Group pixels into non-overlapping cells (e.g. 8 × 8)
  2 Accumulate the magnitude values of I_m into direction bins (e.g., nine bins for intervals of 20° each) to obtain a voting vector for each cell

4 Normalize voting values for generating a descriptor.
  1 Group cells (e.g., 2 × 2) into one block
  2 Normalize the voting vectors over each block, and combine them into one block vector

5 Concatenation. Augment all block vectors consecutively; this produces the final HoG descriptor
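A compact sketch of these steps under simplifying assumptions: unsigned gradient angles binned into nine 20° bins, 8 × 8 cells, non-overlapping 2 × 2-cell blocks with L2 normalization, and no preprocessing; real HoG implementations (e.g. Dalal and Triggs) use overlapping blocks and further refinements:

```python
import numpy as np

def hog_descriptor(gray, cell=8, block=2, bins=9):
    """Sketch of a HoG computation: gradient maps, per-cell orientation
    histograms, per-block normalization, and concatenation."""
    # Step 2: magnitude map Im and angle map Ia (unsigned angles in [0, 180))
    gy, gx = np.gradient(gray.astype(float))
    Im = np.hypot(gx, gy)
    Ia = np.rad2deg(np.arctan2(gy, gx)) % 180.0

    # Step 3: accumulate magnitudes into per-cell direction bins (voting vectors)
    ch, cw = gray.shape[0] // cell, gray.shape[1] // cell
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = Im[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            a = Ia[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            b = np.minimum((a / (180.0 / bins)).astype(int), bins - 1)
            for k in range(bins):
                hist[i, j, k] = m[b == k].sum()

    # Steps 4-5: group cells into blocks, normalize each block vector, concatenate
    blocks = []
    for i in range(0, ch - block + 1, block):
        for j in range(0, cw - block + 1, block):
            v = hist[i:i+block, j:j+block].ravel()
            blocks.append(v / (np.linalg.norm(v) + 1e-6))
    return np.concatenate(blocks)

# Made-up 128 x 64 bounding-box image (e.g. a pedestrian candidate)
print(hog_descriptor(np.random.rand(128, 64)).shape)   # (1152,) with these settings
```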

Two Examples

The length of the vectors in nine different directions in each cell represents the accumulated magnitude of gradient vectors for one of those nine directions

Copyright Information

This slide show was prepared by Reinhard Klette with kind permission from Springer Science+Business Media B.V.

The slide show can be used freely for presentations. However, all the material is copyrighted.

R. Klette. Concise Computer Vision. © Springer-Verlag London, 2014.

In case of citation: just cite the book, that's fine.
