Recognition by relations

Matching by relations
Idea:
find bits, then say object is present if bits are ok
Advantage:
objects with complex configuration spaces don't make good templates, because of:
internal degrees of freedom
aspect changes
(possibly) shading
variations in texture, etc.
Computer Vision - A
Simplest
Define a set of local feature templates
could find these with filters, etc. (e.g. corner detector + filters)
Think of objects as patterns
Each template votes for all patterns that contain it
Pattern with the most votes wins
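The voting scheme above can be sketched in a few lines. This is a minimal illustration, not the slides' implementation: the pattern names, template ids and the helper `vote_for_pattern` are all made up for the example, and template detection itself is assumed to have already happened.

```python
from collections import Counter


def vote_for_pattern(detected_templates, pattern_templates):
    """Return (winning pattern, vote counts) for a list of detected templates.

    pattern_templates maps each pattern name to the set of template ids it
    contains. Names and ids are hypothetical, chosen only for illustration.
    """
    votes = Counter()
    for template in detected_templates:
        for pattern, templates in pattern_templates.items():
            if template in templates:
                # A template votes for every pattern that contains it.
                votes[pattern] += 1
    if not votes:
        return None, votes
    winner, _ = votes.most_common(1)[0]
    return winner, votes


# Hypothetical toy data: three patterns described by local feature templates.
PATTERNS = {
    "mug": {"corner_a", "handle_arc", "rim_edge"},
    "book": {"corner_a", "corner_b", "spine_edge"},
    "phone": {"corner_b", "screen_edge"},
}

winner, votes = vote_for_pattern(
    ["corner_a", "handle_arc", "rim_edge", "corner_b"], PATTERNS
)
print(winner, dict(votes))
```

Every vote counts equally here, whether the template is distinctive or is one that fires often on clutter.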
Figure from "Local grayvalue invariants for image retrieval", by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997; copyright 1997, IEEE
Probabilistic interpretation
Write
Assume
Likelihood of image given pattern
Possible alternative strategies

Notice:
different patterns may yield different templates with different probabilities
different templates may be found in noise with different probabilities
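One way to act on these two observations is to replace equal votes with log-likelihood ratios, so that a template counts for more when it is likely under the pattern and rare in noise. The sketch below assumes that detected templates are conditionally independent given the pattern; that assumption, the probability tables and the function name are illustrative, not taken from the slides.

```python
import math


def log_likelihood_ratio_scores(detected, p_given_pattern, p_given_noise, floor=1e-6):
    """Score each pattern by sum of log P(template | pattern) / P(template | noise).

    Assumes templates are independent given the pattern (a naive-Bayes style
    simplification). `floor` stands in for templates a pattern never produces.
    """
    scores = {}
    for pattern, probs in p_given_pattern.items():
        score = 0.0
        for template in detected:
            p = probs.get(template, floor)
            score += math.log(p / p_given_noise[template])
        scores[pattern] = score
    return scores


# Hypothetical probabilities: "corner" is common everywhere, "arc" is distinctive.
P_GIVEN_PATTERN = {
    "mug": {"corner": 0.9, "arc": 0.8},
    "book": {"corner": 0.9, "arc": 0.05},
}
P_GIVEN_NOISE = {"corner": 0.5, "arc": 0.02}

scores = log_likelihood_ratio_scores(["corner", "arc"], P_GIVEN_PATTERN, P_GIVEN_NOISE)
best = max(scores, key=scores.get)
print(best, scores)
```

With these numbers the corner contributes the same small amount to both patterns, and the decision rests almost entirely on the distinctive arc template.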
Employ spatial relations
Finding faces using relations

Strategy:
Face is eyes, nose, mouth, etc. with appropriate relations between them
build a specialised detector for each of these (template matching) and look for groups with the right internal structure
Once we've found enough of a face, there is little uncertainty about where the other bits could be
Finding faces using relations

Strategy: compare
Notice that once some facial features have been found, the position of the rest is quite strongly constrained.
Figure from "Finding faces in cluttered scenes using random labelled graph matching", by T. Leung, M. Burl and P. Perona, Proc. Int. Conf. on Computer Vision, 1995; copyright 1995, IEEE
Detection
This means we compare
Issues
Plugging in values for position of nose, eyes, etc.
search for next one given what we've found
when to stop searching:
when nothing that is added to the group could change the decision
i.e. it's not a face, whatever features are added, or
it's a face, and anything you can't find is occluded
what to do next:
look for another eye? or a nose?
probably look for the easiest to find
What if there's no nose response?

marginalize
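Marginalizing means summing the model over every value the missing feature could have taken, so the remaining features can still be scored. The sketch below uses a small discrete joint distribution over coarse position labels; the labels, the probabilities and the helper name are all hypothetical, and a real system would more likely use a continuous density over feature positions.

```python
def marginalize(joint, missing_index):
    """Sum a discrete joint distribution over one variable.

    `joint` maps tuples of feature values to probabilities; the entry at
    `missing_index` of each tuple is summed out.
    """
    marginal = {}
    for values, prob in joint.items():
        kept = values[:missing_index] + values[missing_index + 1:]
        marginal[kept] = marginal.get(kept, 0.0) + prob
    return marginal


# Hypothetical joint P(eye layout, nose layout | face) over coarse labels.
JOINT_FACE = {
    ("eyes_near", "nose_near"): 0.6,
    ("eyes_near", "nose_far"): 0.1,
    ("eyes_far", "nose_near"): 0.1,
    ("eyes_far", "nose_far"): 0.2,
}

# No nose response: score the eyes alone using P(eye layout | face).
eyes_only = marginalize(JOINT_FACE, missing_index=1)
print(eyes_only)
```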
Figure from "Finding faces in cluttered scenes using random labelled graph matching", by T. Leung, M. Burl and P. Perona, Proc. Int. Conf. on Computer Vision, 1995; copyright 1995, IEEE
Pruning
Prune using a classifier
crude criterion: if this small assembly doesn't work, there is no need to build on it.
Example: finding people without clothes on

find skin
find extended skin regions
construct groups that pass local classifiers (i.e. lower arm, upper arm)
give these to broader scale classifiers (e.g. girdle)
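The crude criterion amounts to a depth-first search that refuses to extend any partial assembly failing its test. The sketch below is a generic illustration under stated assumptions: candidates are plain numbers standing in for image segments, and the "classifier" is a hypothetical check that all members lie within a fixed gap of each other. Neither is the classifier used in the people-finding example.

```python
def build_assemblies(candidates, size, partial_ok):
    """Find all groups of `size` candidates, pruning failed partial groups."""
    results = []

    def extend(group, start):
        if len(group) == size:
            results.append(tuple(group))
            return
        for i in range(start, len(candidates)):
            group.append(candidates[i])
            if partial_ok(group):
                # Only assemblies that pass the test are built upon.
                extend(group, i + 1)
            group.pop()

    extend([], 0)
    return results


def within_gap(group, max_gap=2.0):
    """Hypothetical stand-in classifier: every pair must be close together."""
    return all(
        abs(a - b) <= max_gap
        for i, a in enumerate(group)
        for b in group[i + 1:]
    )


# Toy 1D "segments"; the outlier at 9.0 is pruned as soon as it is paired.
groups = build_assemblies([0.0, 1.0, 1.5, 9.0], size=3, partial_ok=within_gap)
print(groups)
```

Pruning this way is only safe when a failing partial group can never be rescued by adding more members, which holds for the pairwise test used here.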
Pruning
Prune using a classifier
better criterion: if there is nothing that can be added to this assembly to make it acceptable, stop
equivalent to projecting classifier boundaries.
Horses
Hidden Markov Models

Elements of sign language understanding
the speaker makes a sequence of signs
Some signs are more common than others
the next sign depends (roughly, and probabilistically) only on the current sign
there are measurements, which may be inaccurate; different signs tend to generate different probability densities on measurement values
Many problems share these properties

tracking is like this, for example
Hidden Markov Models
Now in each state we could emit a measurement, with probability depending on the state and the measurement
We observe these measurements
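The generative story can be made concrete by sampling from a small model: draw a state, emit a measurement from that state's distribution, then move to the next state. The two signs, the three measurement symbols and all probabilities below are hypothetical; only the hidden states are sampled here so they can be compared with what an observer would see.

```python
import random


def sample_hmm(length, start_p, trans_p, emit_p, rng):
    """Sample `length` steps, returning (hidden states, observed measurements)."""

    def draw(dist):
        outcomes = list(dist)
        return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]

    states, measurements = [], []
    state = draw(start_p)
    for _ in range(length):
        states.append(state)
        # The measurement depends only on the current state.
        measurements.append(draw(emit_p[state]))
        # The next state depends only on the current state.
        state = draw(trans_p[state])
    return states, measurements


# Hypothetical two-sign model with a coarse hand-height measurement.
START = {"sign_a": 0.6, "sign_b": 0.4}
TRANS = {
    "sign_a": {"sign_a": 0.7, "sign_b": 0.3},
    "sign_b": {"sign_a": 0.4, "sign_b": 0.6},
}
EMIT = {
    "sign_a": {"hand_low": 0.5, "hand_mid": 0.4, "hand_high": 0.1},
    "sign_b": {"hand_low": 0.1, "hand_mid": 0.3, "hand_high": 0.6},
}

hidden, observed = sample_hmm(5, START, TRANS, EMIT, random.Random(0))
print(hidden)
print(observed)
```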
HMMs - dynamics
HMMs - the Joint and Inference
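For reference, the standard factorization of the joint distribution of a hidden Markov model is given below. The notation is an assumption made here, not taken from the slide: X_i is the hidden state at step i and Y_i is the measurement emitted at that step.

```latex
P(X_1,\dots,X_n,\,Y_1,\dots,Y_n)
  = P(X_1)\,P(Y_1 \mid X_1)\prod_{i=2}^{n} P(X_i \mid X_{i-1})\,P(Y_i \mid X_i)
```

Inference then asks for the state sequence that maximizes the posterior given the measurements. Because the measurements are fixed, this is the same sequence that minimizes the negative log of the joint:

```latex
\hat{X}_{1:n} = \arg\max_{X_{1:n}} P(X_{1:n} \mid Y_{1:n})
              = \arg\min_{X_{1:n}} \bigl[-\log P(X_{1:n},\,Y_{1:n})\bigr]
```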
Trellises
Each column corresponds to a measurement in the sequence
Trellis makes the collection of legal paths obvious
Now we would like to get the path with the smallest negative log-posterior (i.e. the most probable path)
Trellis makes this easy, as follows.
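Dynamic programming over the trellis is usually called the Viterbi algorithm. The sketch below processes one column per measurement, keeps the cheapest path into each state, and then follows back-pointers from the cheapest final state. Costs are negative log probabilities, so the cheapest path is the most probable one. The model (two hypothetical signs and a coarse hand-height measurement) is made up for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best state path, its negative log joint probability)."""

    def cost(p):
        return -math.log(p) if p > 0 else math.inf

    # First trellis column: start cost plus emission cost.
    best = {s: cost(start_p[s]) + cost(emit_p[s][observations[0]]) for s in states}
    back = []  # back[t][s] is the best predecessor of s in column t + 1
    for obs in observations[1:]:
        new_best, pointers = {}, {}
        for s in states:
            prev = min(states, key=lambda r: best[r] + cost(trans_p[r][s]))
            new_best[s] = best[prev] + cost(trans_p[prev][s]) + cost(emit_p[s][obs])
            pointers[s] = prev
        best = new_best
        back.append(pointers)

    # Trace the cheapest path backwards through the trellis.
    last = min(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical model, for illustration only.
STATES = ["sign_a", "sign_b"]
START = {"sign_a": 0.6, "sign_b": 0.4}
TRANS = {
    "sign_a": {"sign_a": 0.7, "sign_b": 0.3},
    "sign_b": {"sign_a": 0.4, "sign_b": 0.6},
}
EMIT = {
    "sign_a": {"hand_low": 0.5, "hand_mid": 0.4, "hand_high": 0.1},
    "sign_b": {"hand_low": 0.1, "hand_mid": 0.3, "hand_high": 0.6},
}

path, neg_log_prob = viterbi(["hand_low", "hand_mid", "hand_high"], STATES, START, TRANS, EMIT)
print(path, math.exp(-neg_log_prob))
```

The work per column is proportional to the square of the number of states, so the total cost grows linearly with the length of the sequence rather than exponentially with it.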
Fitting an HMM
I have:
sequence of measurements
collection of states
topology
I want:
state transition probabilities
measurement emission probabilities
Straightforward application of EM
discrete vars give state for each measurement
M step is just averaging, etc.
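For a model with discrete measurements, EM is usually run as the Baum-Welch procedure: the E step computes, for every measurement, the posterior weight of each state and of each transition, and the M step turns those weights into new probabilities by normalised averaging. The sketch below is a minimal version for one short sequence. It uses unscaled forward and backward passes, which would underflow on long sequences, and the states, symbols and starting values are hypothetical.

```python
def forward_backward(obs, states, start_p, trans_p, emit_p):
    """Unscaled forward and backward passes (suitable for short sequences only)."""
    n = len(obs)
    alpha = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    for t in range(1, n):
        alpha.append({
            s: emit_p[s][obs[t]] * sum(alpha[t - 1][r] * trans_p[r][s] for r in states)
            for s in states
        })
    beta = [{s: 1.0 for s in states} for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = {
            s: sum(trans_p[s][r] * emit_p[r][obs[t + 1]] * beta[t + 1][r] for r in states)
            for s in states
        }
    return alpha, beta, sum(alpha[-1].values())


def em_step(obs, states, symbols, start_p, trans_p, emit_p):
    """One EM update; returns new parameters and the likelihood of the old ones."""
    alpha, beta, likelihood = forward_backward(obs, states, start_p, trans_p, emit_p)
    n = len(obs)
    # E step: posterior weight of each state, and of each transition, per step.
    gamma = [{s: alpha[t][s] * beta[t][s] / likelihood for s in states} for t in range(n)]
    xi = [
        {(r, s): alpha[t][r] * trans_p[r][s] * emit_p[s][obs[t + 1]] * beta[t + 1][s] / likelihood
         for r in states for s in states}
        for t in range(n - 1)
    ]
    # M step: weighted averages of the posterior weights.
    new_start = dict(gamma[0])
    new_trans = {}
    for r in states:
        total = sum(g[r] for g in gamma[:-1])
        new_trans[r] = {s: sum(x[(r, s)] for x in xi) / total for s in states}
    new_emit = {}
    for s in states:
        total = sum(g[s] for g in gamma)
        new_emit[s] = {
            v: sum(g[s] for g, o in zip(gamma, obs) if o == v) / total for v in symbols
        }
    return new_start, new_trans, new_emit, likelihood


# Hypothetical data and starting guess.
STATES = ["s0", "s1"]
SYMBOLS = ["a", "b"]
OBS = list("aaabbbaaabbb")
start = {"s0": 0.6, "s1": 0.4}
trans = {"s0": {"s0": 0.7, "s1": 0.3}, "s1": {"s0": 0.4, "s1": 0.6}}
emit = {"s0": {"a": 0.6, "b": 0.4}, "s1": {"a": 0.3, "b": 0.7}}

likelihoods = []
for _ in range(10):
    start, trans, emit, likelihood = em_step(OBS, STATES, SYMBOLS, start, trans, emit)
    likelihoods.append(likelihood)
print(likelihoods[0], likelihoods[-1])
```

A transition that starts at probability zero stays at zero under these updates, which is how a fixed topology is respected without extra code.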
HMMs for sign language understanding-1

Build an HMM for each word
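With one HMM per word, an isolated sign can be recognised by scoring the measurement sequence under every word model and reporting the word whose model gives the highest likelihood. The sketch below computes that likelihood with the forward algorithm. The two word models ("up" and "down"), their states and the measurement symbols are hypothetical, and no language model is involved.

```python
def sequence_likelihood(obs, model):
    """P(measurements | word model), computed with the forward algorithm."""
    start_p, trans_p, emit_p = model
    states = list(start_p)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {
            s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


def recognise_word(obs, word_models):
    """Return the word whose HMM assigns the measurements the highest likelihood."""
    return max(word_models, key=lambda w: sequence_likelihood(obs, word_models[w]))


# Hypothetical left-to-right word models sharing one topology.
START = {"s0": 1.0, "s1": 0.0}
TRANS = {"s0": {"s0": 0.5, "s1": 0.5}, "s1": {"s0": 0.0, "s1": 1.0}}
WORD_MODELS = {
    # "up": the hand starts low and ends high.
    "up": (START, TRANS, {"s0": {"low": 0.9, "high": 0.1}, "s1": {"low": 0.1, "high": 0.9}}),
    # "down": the hand starts high and ends low.
    "down": (START, TRANS, {"s0": {"low": 0.1, "high": 0.9}, "s1": {"low": 0.9, "high": 0.1}}),
}

rising = recognise_word(["low", "low", "high", "high"], WORD_MODELS)
falling = recognise_word(["high", "high", "low", "low"], WORD_MODELS)
print(rising, falling)
```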
HMMs for sign language understanding-2

Build an HMM for each word
Then build a language model
For both isolated word recognition tasks and for recognition using a language model that has five-word sentences (words always appearing in the order pronoun, verb, noun, adjective, pronoun), Starner and Pentland's system displays a word accuracy of the order of 90%. Values are slightly larger or smaller, depending on the features, the task, etc.
User gesturing
Figure from "Real time American sign language recognition using desk and wearable computer based video", T. Starner et al., Proc. Int. Symp. on Computer Vision, 1995; copyright 1995, IEEE
HMMs can be spatial rather than temporal; for example, we have a simple model where the position of the arm depends on the position of the torso, and the position of the leg depends on the position of the torso. We can build a trellis, where each node represents correspondence between an image token and a body part, and do DP on this trellis.
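The torso, arm and leg model described above can be matched with a very small dynamic program: once a token is fixed for the torso, the best arm and the best leg can be chosen independently of each other. The sketch below is an illustration under stated assumptions. The token positions, the appearance costs, the ideal offsets and the squared-distance relation cost are all hypothetical, and nothing stops two parts from claiming the same token.

```python
def best_configuration(tokens, unary_cost, pair_cost, root="torso", children=("arm", "leg")):
    """Minimise appearance cost plus root-child relation cost over all matchings."""
    best_total, best_assignment = float("inf"), None
    for root_token in tokens:
        total = unary_cost[root][root_token]
        assignment = {root: root_token}
        for child in children:
            # Given the root, each child is optimised on its own.
            token = min(
                tokens,
                key=lambda c: unary_cost[child][c] + pair_cost(child, root_token, c),
            )
            total += unary_cost[child][token] + pair_cost(child, root_token, token)
            assignment[child] = token
        if total < best_total:
            best_total, best_assignment = total, assignment
    return best_assignment, best_total


# Hypothetical image tokens (2D positions) and model parameters.
POSITIONS = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, -2.0), "D": (4.0, 4.0)}
IDEAL_OFFSET = {"arm": (1.0, 0.0), "leg": (0.0, -2.0)}
UNARY = {
    "torso": {"A": 0.1, "B": 2.0, "C": 2.0, "D": 1.0},
    "arm": {"A": 2.0, "B": 0.2, "C": 2.0, "D": 0.5},
    "leg": {"A": 2.0, "B": 2.0, "C": 0.3, "D": 0.5},
}


def relation_cost(child, root_token, child_token):
    """Squared distance between the child and its ideal position relative to the root."""
    rx, ry = POSITIONS[root_token]
    cx, cy = POSITIONS[child_token]
    ox, oy = IDEAL_OFFSET[child]
    return (cx - rx - ox) ** 2 + (cy - ry - oy) ** 2


assignment, total = best_configuration(list(POSITIONS), UNARY, relation_cost)
print(assignment, total)
```

With T tokens this takes on the order of T squared work per child, instead of the T cubed needed to try every (torso, arm, leg) triple.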
Figure from "Efficient Matching of Pictorial Structures", P. Felzenszwalb and D.P. Huttenlocher, Proc. Computer Vision and Pattern Recognition, 2000; copyright 2000, IEEE
The future is bright

Computation is cheap
Lots of pix
cameras are cheap, many pix are digital, ink wars
Lots of demand for slicing and dicing pix

generate models
new movies from old
search
Lots of hidden value

can't do data mining for collections with pix in them
e.g. mortgage papers, cheques, etc.
e.g. filtering
Recent flowering of vision

can do (sort of!)
structure from motion
segmentation
video representation
model building
tracking
face finding
will be able to do (sort of!)

face recognition
inference about people
character recognition
perhaps more
Big open problems

Next step in structure from motion
Really good missing variable formalism
Decent understanding of illumination, materials and shading
Segmentation
Representation for recognition
Efficient management of relations
Recognition processes for lots of objects
A lot of this looks like applied statistics
