
Robot Vision

Peter Corke
CyberPhysical Systems Lab.

CVPR Summer School


Kioloa 2012
petercorke.com/kioloa.pdf



Queensland University of Technology
Why would it be useful for a robot to see?


What is a robot?


1956

1977
Robot: the word

Karel Čapek

1921

2004
Robot: a definition

where am I?
where are you?
A goal-oriented machine that can sense, plan and act.

how do I get there?


What about GPS?
GPS is not perfect
Satellites are obscured in urban areas
Multi-pathing in industrial sites
Not available in many important work domains such as
underwater
underground
deep forest
Only tells me where I am


What about vision?

Eyes are useful/essential for the critical life tasks of all animals:
finding food
avoiding being food
finding mates

Can sense shape, color, motion

A long-range sensor, beyond our fingertips


Evolution of the eye

eye invented 540 million years ago
10 different eye designs
lensed eye invented 7 times

Vision does need a light source, but we evolved next to a bright star


Consider the bee

brain: 1 g, 10^6 neurons


Anterior Median and Anterior Lateral Eyes of an Adult Female
Phidippus putnami Jumping Spider


Compound Eyes of a Holocephala fusca Robber Fly


Face of a Southern Yellowjacket Queen (Vespula squamosa)



Eye anatomy: pupil, iris, sclera

human brain: 1.5 kg, 10^11 neurons, ~1/3 for vision
bee brain: 1 g, 10^6 neurons

Seeing is an active process


Robots and vision


A great sensor for robots


Vision is the process of discovering from images
what is present in the world and where it is.

David Marr


Robots that read


Dual-camera image-based visual servo


Watching whales with UAVs


Robots underwater


Image
geometry


Reflection of light

Specular reflection
- angle of incidence equals angle of reflection:  θ_i = θ_r

Reflection of light

I ∝ cos θ

Lambertian reflection
- diffuse/matte surface
- brightness invariant to the observer's angle of view

Johann Heinrich Lambert, 1728-1777

Extramission theory


Image formation

points in the world



Image formation

points in the world image plane



Image formation

points in the world image plane



The pin hole camera


Pin hole images


The world's largest pin-hole camera

http://www.legacyphotoproject.com

Image formation


Use a lens to gather more light

George R. Lawrence 1900


Image formation

bigger aperture area gathers more light

F = f / d    (the f-number; d is the aperture diameter)

Pin-hole image geometry

    Y / Z = y / z    ⇒    y = z Y / Z

Image formation is the mapping of scene points to the image plane

No unique inverse


Thin lens model

The pin-hole camera admits very little light, so images are dim. The way to
obtain brighter images is to gather more light with a lens: a convex lens forms
an image just as the pin hole does, but allows much more light through.

[diagram: object, ray through the equivalent pin hole, ideal thin lens, focal
points, inverted image on the image plane; distances zo, zi and focal length f]

For an ideal thin lens the object distance zo and the image distance zi are
related by

    1/zo + 1/zi = 1/f

where f is the focal length of the lens.
Thin lens model

[diagram: object, equivalent pin hole, ideal thin lens, focal points, inverted
image on the image plane; distances zo, zi and focal length f]

Focussing on distant objects:  as  zo → ∞,  zi → f
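A quick numeric check of the thin-lens equation (the lens and object distance here are just illustrative values):

>> f = 0.015; zo = 3;            % 15 mm lens, object 3 m away
>> zi = 1 / (1/f - 1/zo)         % image distance, just beyond the focal point
zi =
    0.0151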
Thin lens model

[diagram: object, equivalent pin hole, ideal thin lens, focal points, inverted
image on the image plane; distances zo, zi and focal length f]
Pin-hole image geometry (3D)

    y / f = Y / Z        x / f = X / Z

    x = f X / Z ,    y = f Y / Z

    (X, Y, Z) → (x, y) :  R^3 → R^2
Perspective transform


Perspective projection

Lines → lines
parallel lines → not necessarily parallel
Conics → conics

Ideal city (1470)

Piero della Francesca (1415-1492)


Homogeneous coordinates

Cartesian → homogeneous:

    P = (x, y)    →    P~ = (x, y, 1)

homogeneous → Cartesian:

    P~ = (x~, y~, z~)    →    P = (x, y) ,   x = x~ / z~ ,   y = y~ / z~
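In the Toolbox the conversion between Euclidean and homogeneous coordinates can be done with the helpers e2h and h2e; a minimal sketch (the point values are just illustrative):

>> P = [2; 3];       % a Cartesian point
>> Ph = e2h(P);      % append a 1: Ph = [2; 3; 1]
>> h2e(3*Ph)         % overall scale is irrelevant: back to (2, 3)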
Pin-hole model in homogeneous form

    [x~]   [f 0 0 0] [X]
    [y~] = [0 f 0 0] [Y]
    [z~]   [0 0 1 0] [Z]
                     [1]

    x~ = f X ,  y~ = f Y ,  z~ = Z        x = x~ / z~ ,  y = y~ / z~

    ⇒   x = f X / Z ,   y = f Y / Z

Perspective transformation, with the pesky divide by Z, is linear in
homogeneous coordinate form.
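A minimal numeric sketch of the same thing (the focal length and point are illustrative; e2h and h2e are the Toolbox conversion helpers used above):

>> f  = 0.015;
>> C0 = [f 0 0 0; 0 f 0 0; 0 0 1 0];   % homogeneous projection matrix
>> P  = [0.3; 0.4; 3.0];               % world point 3 m in front of the camera
>> p  = h2e(C0 * e2h(P))               % = (0.0015, 0.0020), i.e. (fX/Z, fY/Z) in metres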


    [x~]   [f 0 0 0] [X]
    [y~] = [0 f 0 0] [Y]
    [z~]   [0 0 1 0] [Z]
                     [1]

The projection matrix can be written as the product

    [f 0 0 0]     [1 0 0 0]
    [0 f 0 0]  x  [0 1 0 0]
    [0 0 1 0]     [0 0 1 0]
                  [0 0 0 1]

where the 4 x 4 identity is a placeholder for the camera pose (the extrinsic
parameters) introduced below.

Central projection model

    P = (X, Y, Z)

    [x~]   [f 0 0 0] [X]
    [y~] = [0 f 0 0] [Y]
    [z~]   [0 0 1 0] [Z]
                     [1]

[diagram: camera origin {C} with axes xC, yC, zC, optical axis along zC, image
plane at z = f, principal point, and the projection p of the world point P]

Change of coordinates

    P = (X, Y, Z)

From image-plane coordinates (metres) to pixel coordinates:

    [u~]   [1/ρ_w    0     u0] [x~]
    [v~] = [  0    1/ρ_h   v0] [y~]
    [w~]   [  0      0      1] [z~]

[diagram: image plane with principal point (u0, v0); ρ_w and ρ_h are the pixel
width and height]

Complete camera model

    world point P = (X, Y, Z) in frame {0}; camera frame {C}

    [u~]   [1/ρ_w    0    u0] [f 0 0 0] [   R     t ] [X]
    [v~] = [  0    1/ρ_h  v0] [0 f 0 0] [           ] [Y]
    [w~]   [  0      0     1] [0 0 1 0] [ 0_1x3   1 ] [Z]
                                                      [1]

    first two matrices: intrinsic parameters (K)
    third matrix: extrinsic parameters (camera pose)
    the complete 3 x 4 product: the camera matrix C

MATLAB example
>> cam = CentralCamera('focal', 0.015, 'pixel', 10e-6, ...
'resolution', [1280 1024], 'centre', [640 512], ...
'name', 'mycamera')

cam =
name: mycamera [central-perspective]
focal length: 0.015
pixel size: (1e-05, 1e-05)
principal pt: (640, 512)
number pixels: 1280 x 1024
T:
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1

>> whos cam


Name Size Bytes Class Attributes
cam 1x1 112 CentralCamera

>> P = [0.3, 0.4, 3.0]';


>> cam.project(P)
ans =
790
712

>> cam.C
ans =
1.0e+03 *
1.5000 0 0.6400 0
0 1.5000 0.5120 0
0 0 0.0010 0


MATLAB example

>> [X,Y,Z] = mkcube(0.2, 'centre', [0.2, 0, 0.3], 'edge');


>> cam.mesh(X, Y, Z)

>> T = transl(-1,0,0.5)*troty(0.8);
>> cam.mesh(X, Y, Z, 'Tcam', T)


Fish-eye lens


Fish-eye imaging model

>> cam = FishEyeCamera('name', 'fisheye', ...
        'projection', 'equiangular', ...
        'pixel', 10e-6, ...
        'resolution', [1280 1024])
>> [X,Y,Z] = mkcube(0.2, 'centre', [0.2, 0, 0.3], 'edge');
>> cam.mesh(X, Y, Z)

Imaging by reflection

From Opticks, Newton, 1704.

An Accompt of a New Catadioptrical


Telescope invented by Mr. Newton
by Isaac Newton
Philosophical Transactions of the Royal
Society, No. 81 (25 March 1672)


Catadioptric imaging model

>> cam = CatadioptricCamera('name', 'panocam', ...
        'projection', 'equiangular', ...
        'maxangle', pi/4, ...
        'pixel', 10e-6, ...
        'resolution', [1280 1024])
>> [X,Y,Z] = mkcube(0.2, 'centre', [0.2, 0, 0.3], 'edge');
>> cam.mesh(X, Y, Z)

Multi-view
correspondence
The problem of finding the
same point in different views of
the same scene

The correspondence problem


The correspondence problem


The correspondence problem

patch 1:
131 131 131 131 131 131 131 130 131
131 131 131 131 130 131 130 131 131
131 133 131 131 130 131 131 130 130
130 130 131 130 131 131 131 131 131
131 131 131 130 130 130 131 131 130
131 131 130 130 130 131 130 130 130
131 131 130 131 130 130 128 130 131
130 130 128 130 130 130 130 130 130
131 130 130 130 131 130 130 130 130

patch 2:
145 147 144 144 147 145 144 145 144
147 147 143 144 144 145 145 145 147
147 147 145 147 145 145 145 145 145
147 145 147 148 147 147 144 144 143
147 144 147 147 147 147 147 145 144
147 145 147 145 145 147 147 144 145
147 145 147 145 145 145 144 144 145
147 144 147 147 147 147 147 147 145
147 145 145 147 147 147 147 144 144

Corner detector


Corner detector


Corner detector

Classical corner detectors (see §13.3.1, Point Features)

Corner detector

More general approach

Consider the Gaussian-weighted sum of squared differences between an image
region and a displaced copy of itself. Expanding to first order gives a
quadratic form in the displacement whose kernel is the matrix

        A = [ Σ g·I_u^2      Σ g·I_u·I_v ]
            [ Σ g·I_u·I_v    Σ g·I_v^2   ]

where I_u, I_v are the horizontal and vertical image gradients and g is a
Gaussian weighting matrix over a window about the point. A is variously
referred to as the structure tensor, auto-correlation matrix or second moment
matrix; it captures the intensity structure of the local neighbourhood.

Convolution

An image is convolved with a kernel K: every output pixel is a weighted sum of
the input pixels in a window about it, with the weights given by the kernel.

Smoothing

>> K = ones(21,21) / 21^2;
>> lena = iread('lena.pgm', 'double');
>> idisp( iconv(K, lena) );

Defocus involves a kernel which is a 2-dimensional Airy pattern or sinc
function. The Gaussian function is similar in shape, but is always positive
whereas the Airy pattern has low amplitude negative going rings.
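Smoothing with a Gaussian kernel works the same way; a minimal sketch using the Toolbox kgauss function (the value of σ is just illustrative):

>> G = kgauss(5);              % 2-D Gaussian kernel, sigma = 5 pixels
>> idisp( iconv(G, lena) );    % Gaussian-smoothed image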
Corner detector

More general approach

The two eigenvalues λ1, λ2 of the structure tensor A describe the intensity
structure of the local neighbourhood:

                   λ2 small      λ2 large
    λ1 small       constant      edge
    λ1 large       edge          peak

A "peak" -- both eigenvalues large -- marks an interest point (corner) that can
be reliably re-found in another view.

Compression removes detail from the image -- exactly the detail that defines a
corner -- so corner detectors should be applied to images that have not been
compressed and decompressed.

Harris corner value

    C(u, v) = det(A) - k · trace(A)^2 ,    k ≈ 0.04

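A minimal sketch of computing this corner-strength image by hand with Toolbox primitives (the σ values and k are illustrative; in practice icorner does all of this internally):

>> Iu  = iconv(kdgauss(1), lena);     % horizontal gradient (derivative of Gaussian)
>> Iv  = iconv(kdgauss(1)', lena);    % vertical gradient
>> g   = kgauss(2);                   % Gaussian weighting window
>> A11 = iconv(g, Iu.^2);             % per-pixel elements of the structure tensor
>> A22 = iconv(g, Iv.^2);
>> A12 = iconv(g, Iu.*Iv);
>> C   = A11.*A22 - A12.^2 - 0.04*(A11 + A22).^2;   % Harris corner strength
>> idisp(C)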
Harris corner features

>> b1 = iread('building2-1.png', 'grey', 'double');


>> idisp(b1)

>> C1 = icorner(b1, 'nfeat', 200, 'patch', 5);


7497 corners found (0.7%), 200 corner features saved

>> C1(1:4)
ans =
(600,662), strength=0.0054555, descrip= ..
(24,277), strength=0.0039721, descrip= ..
(54,407), strength=0.00393328, descrip= ..
(116,515), strength=0.00382975, descrip= ..

>> C1.plot()


Harris corner features


Image motion sequence

>> im = iread('bridge-l/*.png', 'roi', [20 750; 20 480]);


>> c = icorner(im, 'nfeat', 200, 'patch', 7);

>> ianimate(im, c, 'fps', 10)


Comparing features

We've found the coordinates of some interesting points in each image:

    { 1p_i , i = 1 ... N1 }        { 2p_j , j = 1 ... N2 }

Now we need to determine the correspondence, that is, which 1p_i ↔ 2p_j

The pixel values 1I[u, v] themselves are not sufficiently unique

Feature matching

We use a W x W window of pixels centred on each corner point (W is odd)

We use a similarity metric to compare the windows

For each point 1p_i we test the similarity against all the points
{ 2p_j , j = 1 ... N2 } in the other image

This is an N1 x N2 search problem
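One common similarity metric is zero-mean normalized cross-correlation; a minimal sketch with the Toolbox zncc function (the window coordinates are hypothetical, and b2 is assumed to be loaded like b1):

>> W = 11;  h = (W-1)/2;
>> w1 = b1(v1-h:v1+h, u1-h:u1+h);   % W x W window about a corner in image 1
>> w2 = b2(v2-h:v2+h, u2-h:u2+h);   % candidate window in image 2
>> s = zncc(w1, w2)                 % +1 identical, 0 uncorrelated, -1 inverted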


MATLAB example

>> m = C1.match(C2)
m =
39 corresponding points (listing suppressed)
>> whos m
Name Size Bytes Class
Attributes
m 1x39 416 FeatureMatch

We need to separate good and bad matches.

>> m(1:5)
ans =
(999, 602) <-> (770, 624), dist=0.125548
(885, 570) <-> (659, 588), dist=0.131761
(251, 599) <-> (26, 588), dist=0.148539
(272, 647) <-> (42, 638), dist=0.161652
(591, 314) <-> (448, 290), dist=0.172520


>> idisp({b1, b2});
>> m.plot()
For tracking

>> t = Tracker(im, c)
200 continuing tracks, 41 new tracks, 0 retired
241 continuing tracks, 46 new tracks, 0 retired
287 continuing tracks, 34 new tracks, 0 retired
.
.


Tracking results


Problems with feature matching

Best match is not necessarily a good match
obscuration

Non-unique matches
many to one
visual similarity

left right


More problems with feature matching!

Large changes in viewpoint will distort the pattern of pixels

    view direction    scale change    rotation    perspective

We need a descriptor that is invariant to scale and rotation

Harris feature recap
Concise:
hundreds of features instead of millions of pixels
a description that is useful for the viewer and not cluttered
with irrelevant information (Marr)
Finds points that are distinct and easily located in a different
view of the same scene
Computationally efficient (good for real-time tracking)

Problems:
finds only small scale features
simple neighbourhood window matching is problematic
with changes in scale and rotation
with missing parts
leading to erroneous matches


Epipolar
geometry
The geometry underlying
different views of the same
scene

Epipolar geometry

[diagram: world point P, the epipolar plane through P and both camera centres,
image planes I1 and I2, image points 1p and 2p, epipolar lines, epipoles 1e
and 2e, camera frames {1} and {2}]

Homogeneous coordinates

Cartesian → homogeneous:

    P = (x, y)    →    P~ = (x, y, 1)

homogeneous → Cartesian:

    P~ = (x~, y~, z~)    →    P = (x~/z~, y~/z~)

lines and points are duals

A line in homogeneous form

    ℓ = (l1, l2, l3)  is the set of points  p~ = (x~, y~, z~)  such that

        ℓᵀ p~ = 0

the point equation of a line (cf. y = mx + c)

Line joining two points

    p~1 = (a, b, c) ,   p~2 = (d, e, f)

    ℓ = p~1 × p~2

Point joining two lines

    ℓ1 = (a, b, c) ,   ℓ2 = (d, e, f)

    p~ = ℓ1 × ℓ2

handles the case of non-intersecting (parallel) lines automatically -- the
result is a point at infinity

line equation of a point

Fundamental matrix

The point 1p~ maps to an epipolar line in the other image, 2ℓ = F 1p~, and the
corresponding point must lie on it:  2ℓᵀ 2p~ = 0

    ⇒    2p~ᵀ F 1p~ = 0

F is the fundamental matrix.

Testing match validity

If a pair of points is genuinely corresponding then

    2p~ᵀ F 1p~ = 0

Now we just have to figure out F...
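A minimal numeric check of this constraint for one candidate pair (the coordinates are illustrative, and F is assumed to have been estimated already, e.g. with fmatrix on the next slide):

>> p1 = [999; 602];  p2 = [770; 624];    % candidate corresponding points (pixels)
>> e = [p2; 1]' * F * [p1; 1]            % should be close to zero for a true match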


Computing F

F has special structure


rank 2
null vector is the epipole coordinate
Can be computed from 8 pairs of corresponding
points
>> F = fmatrix(p1, p2);

but we don't know the correspondences...

The RANSAC algorithm

1. Take 8 random possible pairs
2. Compute F
3. Test all other pairs with this F and score how well they fit
4. Repeat N times and choose the F that best explains the most pairs

>> F = m.ransac(@fmatrix, 1e-4, 'verbose')

62 trials
6 outliers
5.94368e-05 final residual

F =
-0.0000 -0.0000 0.0052
0.0000 -0.0000 0.0010
-0.0053 -0.0023 1.0682

>> idisp({b1,b2})
>> m.outlier.plot('r');

How did the camera move?

essential matrix:    2x~ᵀ E 1x~ = 0

with  p~ ≃ K x~  this becomes    2p~ᵀ K2^-T E K1^-1 1p~ = 0

so  K2^-T E K1^-1  must be F, giving

    E = K2ᵀ F K1 ,    E ≃ S(t) R

Dealing with
scale


More problems with feature matching!

Large changes in viewpoint will distort the pattern of pixels

    view direction    scale change    rotation    perspective

We need a descriptor that is invariant to scale and rotation

Gaussian sequence


Laplacian of Gaussian sequence

The Laplacian is the sum of the second spatial derivatives in the horizontal
and vertical directions. For a discrete image it is computed by convolution
with the Laplacian kernel:

>> L = klaplace()
L =
     0     1     0
     1    -4     1
     0     1     0

which is isotropic -- it responds equally to edges in any direction. The second
derivative is even more sensitive to noise than the first, so it is used in
conjunction with a Gaussian smoothing kernel, which we combine into the
Laplacian of Gaussian (LoG) kernel -- the Mexican hat function, also known as
the Marr-Hildreth operator.

Stack the images

Find local maxima with respect to u, v and scale σ

The (u, v) coordinate is the position of a feature

The position in the stack (the z coordinate) is the scale of the feature
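A minimal sketch using the Toolbox scale-space functions (the number of scale steps and the σ increment are illustrative):

>> [G, L, s] = iscalespace(lena, 60, 2);   % Gaussian and Laplacian-of-Gaussian stacks
>> f = iscalemax(L, s);                    % feature positions and characteristic scales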


Simple scale-space example


Laplacian of Gaussian sequence


Magnitude of LoG function
this peak is the
characteristic scale of the
feature


Characteristic scale


Scale Invariant Feature Transform

>> s = isift(b1, 'nfeat', 200);
>> s.plot('clock');

SIFT detector (Lowe, 2004)

Compare Harris and SIFT

[two feature plots over the same image, u (pixels) vs v (pixels)]

    Harris corner                              SIFT

Epipolar magic example

    image 1        image 2

Figure 14.3: Feature matching. Subset (100 out of 1664) of matches based on
SURF descriptor similarity. We note that some (low in the image) are clearly
incorrect.

Epipolar magic

[image 1 with epipolar lines overlaid, u (pixels) vs v (pixels); inset: the
epipolar geometry diagram and image 2]

Figure 14.10: Image from Figure 14.1(a) showing epipolar lines converging on
the projection of the second camera's centre. In this case the second camera is
clearly visible in the bottom of the image.

3 dimensional
vision


How big is it?



The Ames room


The Ames room


How do we estimate distance?

1. Occlusion
2. Height in visual field
3. Relative size
4. Texture density
5. Aerial perspective
6. Binocular disparity
7. Accommodation
8. Convergence
9. Motion perspective

Figure 1. Just-discriminable ordinal depth thresholds as a function of the
logarithm of distance from the observer, from 0.5 to 10,000 m, for nine sources
of information about layout. I assume that more potent sources of information
are associated with smaller depth-discrimination thresholds; and that these
thresholds reflect suprathreshold utility. This array of functions is idealized
for the assumptions given in Table 1. From "Perceiving Layout and Knowing
Distances: The Integration, Relative Potency, and Contextual Use of Different
Information About Depth," by J. E. Cutting and P. M. Vishton, 1995, in
W. Epstein and S. Rogers (Eds.), Perception of Space and Motion (p. 80),
San Diego: Academic Press. Copyright 1995 by Academic Press. Reprinted with
permission.

From: J. E. Cutting, "How the eye measures reality and virtual reality,"
Behavior Research Methods, Instruments, & Computers, 1997, 29 (1), 27-36.

How do we estimate distance?
1. Occlusion
2. Height in visual field
3. Relative size
4. Texture density
5. Aerial perspective
6. Binocular disparity
7. Accommodation
8. Convergence
9. Motion perspective

How do we estimate distance?
1. Occlusion
2. Height in visual field
3. Relative size
4. Texture density
5. Aerial perspective
6. Binocular disparity
7. Accommodation
8. Convergence
9. Motion perspective

How do we estimate distance?

1. Occlusion
2. Height in visual field
3. Relative size
4. Texture density
5. Aerial perspective
6. Binocular disparity
7. Accommodation
8. Convergence
9. Motion perspective

How do we estimate distance?

1. Occlusion
2. Height in visual field
3. Relative size
4. Texture density
5. Aerial perspective
6. Binocular disparity
7. Accommodation
8. Convergence
9. Motion perspective

No unique inverse


Binocular disparity


Stereo disparity

[left and right images overlaid, u (pixels) vs v (pixels)]

points in the right image are shifted to the left

the shift is less for distant points

1954-59


Disparity

The horizontal displacement of an image point due to horizontal translation of
the camera:

    d = f b / Z

f is focal length, b is baseline, Z is depth.
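A quick numeric sketch (the values are just illustrative, with the focal length expressed in pixels):

>> f = 1500;           % focal length in pixels (focal length / pixel size)
>> b = 0.1;            % baseline, metres
>> Z = 5;              % depth, metres
>> d = f * b / Z       % 30 pixels of disparity; halving Z doubles d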


2010


Prague


Anaglyph images


Anaglyph image


left


right


Shutter glasses


Computational
stereo
Depth from 2 images


Disparity

The horizontal displacement of an image point due to horizontal translation of
the camera:

    d = f b / Z

[Fig. 14.21: for a pixel at (uL, vL) in the left image the match is searched
for along the same row of the right image, using a template T, over a disparity
range from 0 to dmax]

Computational stereo

[left image, u (pixels) vs v (pixels); computed disparity image with a colour
scale of roughly 40-85 pixels]

>> d = istereo(L, R, [40 90], 3);

Planar
homography
The problem of finding the
same point in different views of
the same scene

Homography

[diagram: points P on an object plane with normal n, viewed by two cameras;
image planes I1 and I2, image points 1p and 2p, camera frames {1} and {2};
H maps 1p to 2p]

    2p~ = H 1p~

Homography

The matrix H is called a homography

The 3x3 matrix contains 9 elements

The overall scale factor is arbitrary, that is, kH is the same as H

So there are only 8 unique numbers to determine

It can be computed from 4 or more corresponding point pairs

    2p~ = H 1p~

Corresponding points

p1 q1
p2
q2
q3
p3 q4
p4

    p_i = (u_i, v_i)            q_i = (u'_i, v'_i)

    P = [u1 u2 u3 u4            Q = [u'1 u'2 u'3 u'4
         v1 v2 v3 v4]                v'1 v'2 v'3 v'4]

>> H = homography(P, Q)
>> Q = homtrans(H, P)


Perspective rectification

>> H = homography(p1, p2)


    2p~ = H 1p~
H=
1.4003 0.3827 -136.5900
-0.0785 1.8049 -83.1054
-0.0003 0.0016 1.0000

Perspective rectification

[rectified (warped) image, u (pixels) vs v (pixels)]

    2p~ = H 1p~

>> homwarp(H, im, 'full')

Virtual camera

>> tr2rpy(sol(2).T, 'deg')
ans =
  -76.6202    9.4210  -13.8081

Optical flow

How points in an image move as the camera moves

Optical flow patterns

[image-plane flow fields, u (pixels) vs v (pixels), for camera translation
along x (tx), translation along z (tz) and rotation about z (ωz)]

>> cam.flowfield([1 0 0 0 0 0])

Optical flow equation

The image-plane velocity of a point is a linear function of the camera's
translational and rotational velocity; the 2 x 6 matrix that relates them is
the image Jacobian.

(u, v) are distances from the principal point
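The Toolbox can compute this image Jacobian directly; a minimal sketch reusing the cam object from the earlier example (the point and depth are illustrative):

>> p = cam.project([1 1 5]');          % pixel coordinates of a world point
>> J = cam.visjac_p(p, 5);             % 2 x 6 image Jacobian at depth Z = 5 m
>> pdot = J * [1 0 0 0 0 0]'           % image-plane velocity for vx = 1 m/s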


Ambiguity between translation and rotation

[two flow fields, u (pixels) vs v (pixels): camera translation along x (tx) and
rotation about y (ωy) produce very similar patterns]

[flow fields, u (pixels) vs v (pixels): translation along x (tx) compared with
rotation about y (ωy) for a small focal length f and for a large focal length f]

Motion of multiple points

The image-plane velocity of each point depends on the camera's translational
and rotational velocity components.

Consider the case of three points: stacking their image Jacobians gives, in
matrix form, a 6 x 6 matrix relating the camera velocity to the six image-plane
velocity components.

Inverting the problem

Given a desired feature velocity we can compute the required camera velocity by
inverting the stacked Jacobian.

The simplest strategy is a controller that drives the features toward their
desired values: the required point velocity to move from p to p*.

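A minimal sketch of one such control step using the Toolbox image Jacobian (the gain lambda, the desired points pstar and the fixed depth Z are all assumptions for illustration):

>> lambda = 0.1;  Z = 3;                 % assumed gain and point depth
>> pdot_star = lambda * (pstar - p);     % desired image-plane velocity, 2 x 3
>> J = cam.visjac_p(p, Z);               % stacked 6 x 6 Jacobian for 3 points
>> v = inv(J) * pdot_star(:)             % required camera spatial velocity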
Desired view


Current view


Image plane motion


Image plane motion

(u,
v)

(u, v)


IBVS simulation


New concepts
Image convolution
Gaussian, Laplacian operators
Homogeneous coordinates
Feature detectors
Scale-space
Geometry of image formation
Fundamental and essential matrix
Homography
Image correspondence
RANSAC


Further reading

Peter Corke
Robotics, Vision and Control: Fundamental Algorithms in MATLAB
Springer, ISBN 978-3-642-20143-1 (springer.com)

The practice of robotics and computer vision each involve the application of
computational algorithms to data. The research community has developed a very
large body of algorithms but for a newcomer to the field this can be quite
daunting. For more than 10 years the author has maintained two open-source
MATLAB Toolboxes, one for robotics and one for vision. They provide
implementations of many important algorithms and allow users to work with real
problems, not just trivial examples.

This new book makes the fundamental algorithms of robotics, vision and control
accessible to all. It weaves together theory, algorithms and examples in a
narrative that covers robotics and computer vision separately and together.
Using the latest versions of the Toolboxes the author shows how complex
problems can be decomposed and solved using just a few simple lines of code.

The topics covered are guided by real problems observed by the author over many
years as a practitioner of both robotics and computer vision. It is written in
a light but informative style, it is easy to read and absorb, and includes over
1000 MATLAB and Simulink examples and figures. The book is a real walk through
the fundamentals of mobile robots, navigation, localization, arm-robot
kinematics, dynamics and joint level control, then camera models, image
processing, feature extraction and multi-view geometry, and finally bringing it
all together with an extensive discussion of visual servo systems.

Homework

How could a robot vision system exploit other (non-binocular) depth cues to
determine distance?

How could a robot recognize a place (kitchen, bathroom, garden) in a way that
is invariant to:
lighting levels
position of the robot
small changes in the environment


