
Agenda

Introduction
Bag-of-words models
Visual words with spatial location
Part-based models
Discriminative methods
Segmentation and recognition
Recognition-based image retrieval
Datasets & Conclusions

Classifier-based methods


Object detection and recognition is formulated as a classification problem:
the image is partitioned into a set of overlapping windows,
and a decision is made at each window about whether it contains the target object or not.
Illustration: where are the screens? A bag of image patches from the "computer screen" and "background" classes, separated by a decision boundary in some feature space.
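A minimal sketch of this sliding-window formulation in Python/NumPy; the window size, stride, and the classify_window scoring function are illustrative assumptions rather than the course code.

import numpy as np

def sliding_window_detect(image, classify_window, win=(64, 64), stride=16):
    """Scan overlapping windows and keep those the classifier accepts.

    classify_window: hypothetical function mapping a window (an H x W array)
    to a score; windows scoring above zero are reported as detections.
    """
    detections = []
    H, W = image.shape[:2]
    wh, ww = win
    for y in range(0, H - wh + 1, stride):
        for x in range(0, W - ww + 1, stride):
            score = classify_window(image[y:y + wh, x:x + ww])
            if score > 0:
                detections.append((x, y, ww, wh, score))
    return detections

In practice the scan is repeated over a pyramid of image scales so that objects of different sizes fall into the fixed-size window.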

Discriminative methods

Nearest neighbor
Neural networks
Support Vector Machines and Kernels
Conditional Random Fields

Nearest Neighbors
Difficult due to the high intrinsic dimensionality of images:
- lots of data needed
- slow neighbor lookup

10^6 examples

Shakhnarovich, Viola, Darrell 2003

Torralba, Fergus, Freeman 2008

Neural networks
Multi-layer Hubel-Wiesel architectures (biologically inspired)

LeCun, Bottou, Bengio, Haffner 1998


Rowley, Baluja, Kanade 1998
Hinton & Salakhutdinov 2006
Ranzato, Huang, Boureau, LeCun 2007
Riesenhuber & Poggio 1999
Serre, Wolf, Poggio. 2005
Mutch & Lowe 2006

Support Vector Machines

Face detection: Heisele, Serre, Poggio, 2001
Combining multiple kernels: Varma & Ray, 2007; Bosch, Munoz, Zisserman, 2007
Pyramid match kernel: Grauman & Darrell, 2005

Conditional Random Fields


Kumar & Hebert 2003

Quattoni, Collins, Darrell 2004

More in the Segmentation section

Boosting
A simple algorithm for learning robust classifiers
Freund & Schapire, 1995
Friedman, Hastie, Tibshirani, 1998

Provides an efficient algorithm for sparse visual feature selection
Tieu & Viola, 2000
Viola & Jones, 2003

Easy to implement; does not require external optimization tools.

A simple object detector with Boosting

Download (code and dataset):
Matlab code: gentle boosting, object detector using a part-based model
Toolbox for manipulating the dataset
Dataset with cars and computer monitors

http://people.csail.mit.edu/torralba/iccv2005/

Boosting
Boosting fits the additive model

F(x) = f_1(x) + f_2(x) + f_3(x) + ...

by minimizing the exponential loss over the training samples (x_i, y_i):

Σ_i exp(−y_i F(x_i))

The exponential loss is a differentiable upper bound to the misclassification error.

Weak classifiers
The input is a set of weighted training samples (x, y, w).
Regression stumps: simple but commonly used in object detection.

f_m(x) = a·[x^f < θ] + b·[x^f > θ]

Four parameters: the feature f, the threshold θ, and the two outputs
a = E_w(y·[x^f < θ]),  b = E_w(y·[x^f > θ])
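A minimal sketch of fitting one such regression stump to weighted samples, for a single scalar feature; the brute-force threshold scan and the function name are ours, not the course's Matlab code. In gentle boosting, the next round reweights the samples as w ← w·exp(−y·f_m(x)) and fits another stump.

import numpy as np

def fit_stump(x, y, w):
    """Fit a regression stump f(x) = a if x < theta else b on weighted data.

    x: (N,) feature values; y: (N,) labels in {-1, +1}; w: (N,) weights.
    Returns (theta, a, b) minimizing the weighted squared error.
    """
    best = None
    for theta in np.unique(x):
        below = x < theta
        above = ~below
        # a and b are the weighted means of y on each side of the threshold
        a = np.sum(w[below] * y[below]) / max(np.sum(w[below]), 1e-12)
        b = np.sum(w[above] * y[above]) / max(np.sum(w[above]), 1e-12)
        err = np.sum(w * (y - np.where(below, a, b)) ** 2)
        if best is None or err < best[0]:
            best = (err, theta, a, b)
    return best[1], best[2], best[3]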

From images to features:
A myriad of weak detectors

We will now define a family of visual features that can be used as weak classifiers (weak detectors). Each feature takes an image as input and produces a binary response: the thresholded output is a weak detector.

A myriad of weak detectors

Yuille, Snow, Nitzberg, 1998


Amit, Geman 1998
Papageorgiou, Poggio, 2000
Heisele, Serre, Poggio, 2001
Agarwal, Awan, Roth, 2004
Schneiderman, Kanade 2004
Carmichael, Hebert 2004

Weak detectors
Textures of textures
Tieu and Viola, CVPR 2000

Every combination of three filters generates a different feature.

This gives thousands of features. Boosting selects a sparse subset, so computation at test time is very efficient. Boosting also avoids overfitting to some extent.
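A rough sketch of how such composed features could be computed; the filter bank, the rectification by absolute value, and the 2x downsampling between stages are assumptions about the pipeline, not the exact Tieu-Viola recipe.

import itertools
import numpy as np
from scipy.ndimage import convolve

def texture_of_texture_features(image, filters):
    """One scalar feature per ordered combination of three filters:
    filter, rectify, downsample, repeat, then sum the final response map."""
    feats = []
    for f1, f2, f3 in itertools.product(filters, repeat=3):
        r = np.abs(convolve(image, f1))[::2, ::2]
        r = np.abs(convolve(r, f2))[::2, ::2]
        r = np.abs(convolve(r, f3))
        feats.append(r.sum())
    return np.array(feats)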

Haar wavelets
Haar filters and integral image
Viola and Jones, ICCV 2001

The average intensity in the block is computed with four sums, independently of the block size.
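A small sketch of the integral-image trick described above: after one cumulative-sum pass, the sum over any rectangle needs only four table lookups, whatever the block size (NumPy illustration, not the Viola-Jones implementation).

import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[:y, :x]; a zero row/column is prepended so the
    four-lookup formula below needs no boundary checks."""
    ii = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def block_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] from exactly four lookups."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

def haar_two_rect(ii, y0, x0, y1, x1):
    """A two-rectangle Haar feature: left half minus right half of the block."""
    xm = (x0 + x1) // 2
    return block_sum(ii, y0, x0, y1, xm) - block_sum(ii, y0, xm, y1, x1)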

Haar wavelets
Papageorgiou & Poggio (2000)

Polynomial SVM

Edges and chamfer distance

Gavrila, Philomin, ICCV 1999

Edge fragments
Opelt, Pinz, Zisserman,
ECCV 2006

Weak detector = k edge fragments and a threshold.
The chamfer distance uses 8 orientation planes.

Histograms of oriented gradients


SIFT, D. Lowe, ICCV 1999

Dalal & Triggs, 2005

Shape context
Belongie, Malik, Puzicha, NIPS 2000

Weak detectors
Part based: similar to part-based generative models. We create weak detectors by using parts and voting for the object center location.

Illustration: car model and screen model.

These features are used for the detector on the course web site.

Weak detectors
First we collect a set of part templates from a set of training objects.
Vidal-Naquet, Ullman, Nature Neuroscience 2003

Weak detectors
We now define a family of weak detectors: each part template is matched against the image and votes for the object center; the thresholded response is the weak detector.

Better than chance.

Weak detectors
We can do a better job using filtered images.

Still a weak detector, but better than before.
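A rough sketch of one such part-based weak detector under stated assumptions: plain correlation with a part template (rather than the filtered-image matching described above), a shift of the response so each part votes for the object center, and a threshold. The offset and threshold stand for learned values; the wrap-around that np.roll introduces at the image border is ignored for brevity.

import numpy as np
from scipy.signal import correlate2d

def part_weak_detector(image, template, offset, threshold):
    """Binary weak detector: correlate a part template with the image,
    shift the response by the part's offset to the object center so the
    part votes for the center, then threshold the vote map."""
    response = correlate2d(image, template, mode="same")
    votes = np.roll(response, shift=offset, axis=(0, 1))
    return votes > threshold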

Example: screen detection
Figure sequence; each panel shows the feature output, the thresholded output, and the strong classifier.

A single weak detector produces many false alarms (strong classifier at iteration 1).
The second weak detector produces a different set of false alarms; the two are summed (+) into the strong classifier at iteration 2.
Strong classifier at iteration 10.
Adding features: final classification with the strong classifier at iteration 200.

Cascade of classifiers

Fleuret and Geman 2001, Viola and Jones 2001

Figure: precision vs. recall (0-100%) for classifiers with 3, 30, and 100 features.

We want the complexity of the 3-feature classifier with the performance of the 100-feature classifier:
- Select a threshold with high recall for each stage.
- The cascade increases precision.
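A minimal sketch of the cascade logic just described: cheap stages whose thresholds are tuned for very high recall reject most background windows early, and only the survivors reach the expensive stages. The stage scoring functions and thresholds are placeholders.

def cascade_detect(windows, stages):
    """Run candidate windows through a cascade of (score_fn, threshold) stages,
    ordered from cheapest to most expensive. Each threshold is chosen (e.g. on
    validation data) for very high recall, so true objects are rarely dropped
    while most background windows are discarded early."""
    survivors = list(windows)
    for score_fn, threshold in stages:
        survivors = [w for w in survivors if score_fn(w) >= threshold]
        if not survivors:
            break
    return survivors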

Some goals for object recognition

Able to detect and recognize many object classes
Computationally efficient
Able to deal with data-starved situations: some training samples might be harder to collect than others
We want on-line learning to be fast

Shared features
Is learning the 1000th object class easier than learning the first?
Can we transfer knowledge from one object to another?
Are the shared properties interesting by themselves?

Shared features
Independent binary classifiers:
Screen detector
Car detector
Face detector
Binary classifiers that share features:
Screen detector
Car detector
Face detector
Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007
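A deliberately simplified illustration of the sharing idea: joint boosting (Torralba, Murphy, Freeman) greedily searches over subsets of classes that share each stump, whereas here, for one round, a single feature is shared by all classes. The errors matrix is a hypothetical table of per-class weighted stump errors.

import numpy as np

def pick_features(errors):
    """errors[c, f]: weighted error of feature f used as a stump for class c.

    Independent classifiers pick the best feature per class (C features added
    this round); sharing picks the single feature with the lowest summed error,
    so one feature is reused by every class."""
    errors = np.asarray(errors)
    class_specific = errors.argmin(axis=1)
    shared = int(errors.sum(axis=0).argmin())
    return class_specific, shared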

Results (shared vs. class-specific features): 29 object classes, 50 training samples per class, 2000 entries in the dictionary.
Results averaged over 20 runs; error bars = 80% interval.

Shared features

Krempp, Geman, & Amit, 2002


Torralba, Murphy, Freeman. CVPR 2004

Generalization as a function of object similarities
Two plots of area under ROC vs. number of training samples per class: 12 viewpoints (K = 2.1) and 12 unrelated object classes (K = 4.8).

Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007

Sharing patches
Bart and Ullman, 2004
For a new class, use only features similar to features that were good for other classes.

Illustration: proposed dog features.

Sharing transformations
Miller, E., Matsakis, N., and Viola, P. (2000). Learning from one example through shared densities on transforms. In IEEE Computer Vision and Pattern Recognition.

Transformations are shared and can be learnt from other tasks.

Some references on multiclass

Caruana 1997
Schapire, Singer, 2000
Thrun, Pratt 1997
Krempp, Geman, Amit, 2002
E. L. Miller, Matsakis, Viola, 2000
Mahamud, Hebert, Lafferty, 2001
Fink 2004
LeCun, Huang, Bottou, 2004
Holub, Welling, Perona, 2005
