
Advanced Computer Vision

Sarah Constantin
April 19, 2012
1 Lecture 1
Eye to retina to lateral geniculate nucleus to visual cortex, in the back of the brain. The
cortex is a layered structure, numbered 1, 2-3, 4, 5-6. The eye plugs into the middle layer,
layer 4, and there are connections above and below. This is the first visual area, V1.
There's a second visual area, named V2, and it too consists of layered cortex. Many of the
cells in V1 project into V2, and some of the cells in V2 project back into V1.
There are about one million neurons. What kind of abstraction could we build around
this? Each individual cell is defined by a property. The function of a cell is assessed by
what images make the cell fire a lot. Some cells are sensitive to edges in a certain direction,
or lines. An orientation tuning curve defines which direction the cell is most sensitive to.
There's a concrete way this works: a +/- filter convolved against the image detects sharp
gradients or edges.
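A minimal sketch of that idea in Python (my own illustration, not from the lecture; the filter construction and all parameters are assumptions chosen for the demo): an odd-symmetric oriented filter, convolved with an image, responds most strongly to edges near its preferred orientation.

```python
# A minimal sketch (my own, not the lecture's): an oriented +/- filter
# responds most strongly to edges near its preferred orientation theta.
import numpy as np
from scipy.signal import convolve2d

def oriented_edge_filter(theta, size=15, sigma=3.0):
    """Derivative-of-Gaussian filter tuned to orientation theta (radians)."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    u = x * np.cos(theta) + y * np.sin(theta)   # coordinate across the edge
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return u * g                                # +/- lobes along u

# Vertical step edge: the theta=0 filter fires, the theta=pi/2 one barely does.
image = np.zeros((32, 32))
image[:, 16:] = 1.0
for theta in (0.0, np.pi / 2):
    response = convolve2d(image, oriented_edge_filter(theta), mode="same")
    print(f"theta = {theta:.2f}, peak |response| = {np.abs(response).max():.2f}")
```

Sweeping theta over many values and plotting the peak response traces out exactly an orientation tuning curve for this artificial cell.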
We could also vary the length of the line used as the stimulus. The firing rate increases
with the length of the line, up until a point. If the brain were linear, then it would plateau.
But in fact, in some cells it drops off with length, to varying extents. The amount they
drop off is called the percentage endstopping. The idea is that these cells are finding line
ends and corners.
Orientation hypercolumn: the line of cells in a tangential penetration. Each column is at
an (x, y) position in the visual field, and has a stack of cells sensitive to different
orientation angles.
But there are also complex cells, which generalize over position. They're responsible for
some notion of shape.
In the horseshoe crab Limulus polyphemus, there's a compound eye, and the cells inhibit
their neighbors, so an incoming stimulus produces a response that amplifies edges. This
is called lateral inhibition. It's kind of taking a derivative.
Circuits among neurons: there are excitatory and inhibitory interactions between the units
involved. What would a theory of this kind of network be?
Take a curve, and two cells with different sizes of receptive field. For a straight line, both
cells respond well. For a curved line, the cell with the larger receptive field will be less
responsive, because of endstopping.
What does all this have to do with computer science? We learn the parameters rᵢⱼ(θ, θ'),
with neuron i tuned to θ and neuron j tuned to θ'. We would like to maximize

Σᵢⱼ pᵢ(θ) rᵢⱼ(θ, θ') pⱼ(θ')
Computational energy, Hopfield energy, average local consistency, whatever you want to
call it. How do brains maximize this? It's related to game theory, and we'll discuss this
later.
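As a toy illustration of what maximizing this average local consistency can mean computationally (my own sketch; the sizes, the random compatibilities, and the multiplicative update rule are all made up for the demo, not the lecture's algorithm):

```python
# Toy hill-climbing of A(p) = sum p_i(a) r_ij(a,b) p_j(b) over distributions p_i.
import numpy as np

rng = np.random.default_rng(0)
n_units, n_labels = 5, 8     # e.g. 8 orientation labels at each of 5 sites
r = rng.normal(size=(n_units, n_labels, n_units, n_labels))  # r_ij(theta, theta')
p = np.full((n_units, n_labels), 1.0 / n_labels)             # start uniform

for _ in range(50):
    support = np.einsum("iajb,jb->ia", r, p)    # q_i(theta) = sum_j,theta' r * p
    p = p * np.exp(0.1 * support)               # reinforce well-supported labels
    p /= p.sum(axis=1, keepdims=True)           # stay on the probability simplex

print("A(p) =", np.einsum("ia,iajb,jb->", p, r, p))
```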
Frenet equations: the differential equation of frames along a curve.

[T', N'] = K [T, N]
2 Lecture 2
Why is the visual cortex arranged in (x, y, θ)? Does it serve any purpose?
Mathematical inspiration: a curve is a map from some interval [a, b] into the plane. Pick
some point t along the interval; this corresponds to some point on the curve, c(t). We're
thinking of curves as maps. The piece that's missing from here is curvature. Does curvature
have to work somehow with x, y, θ? Straight lines have zero curvature. The radius of the
osculating circle is inversely proportional to the curvature.
Given an interval and a smooth function defined on that interval, the function has a maximum
on that interval. Rolle's Theorem: if f(a) = f(b) = 0 and the function is smooth, there exists
at least one point c between a and b such that f'(c) = 0. If you pick three points
s₁, s₂, s₃ on the curve, then there's a circle with all three points on its border. Define the
vector p(s) = α(s) - c(s₁, s₂, s₃), the displacement from that circle's center; then p · p is
the radius-squared function.
Standard orientation: two dimensions, the tangent direction and the normal direction,
[T, N]. Just a linear transformation.

d/ds (T · T) = d/ds (1) = 0, so T' · T = 0

With some more calculations we can get

T' = k(s) N(s)

This gives us the notion of directed curvature.
We can also attach, at every point along the curve, the tangent vector. The pair of vectors,
tangent and normal, is called a frame. This smells like a basis... but of WHAT?
Another way to think about it: think of the change in the normal, instead of the change in
the tangent. That's called the shape operator.
α(s) ≈ α(0) + s T₀ + (k s²/2) N₀

This is approximated by the osculating circle. How well? This is the first homework. (Possibly
do it with MATLAB.)
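One way to start that homework numerically, in Python rather than MATLAB (a sketch only; the test curve and step sizes are my own choices): for a circle the osculating circle is exact, while the Taylor approximation degrades with s.

```python
# Compare the curve, its second-order Taylor approximation
# a(0) + s*T0 + (k s^2 / 2) N0, and its osculating circle near s = 0.
import numpy as np

def frenet_at_zero(gamma, h=1e-5):
    """Tangent, normal, curvature of an arclength-parametrized plane curve at s=0."""
    d1 = (gamma(h) - gamma(-h)) / (2 * h)                 # T = gamma'
    d2 = (gamma(h) - 2 * gamma(0.0) + gamma(-h)) / h**2   # gamma'' = k N
    k = np.linalg.norm(d2)
    return d1, d2 / k, k

# Unit-speed parametrization of a circle of radius 2 (so k should be 1/2).
gamma = lambda s: 2.0 * np.array([np.cos(s / 2), np.sin(s / 2)])
T0, N0, k = frenet_at_zero(gamma)
center = gamma(0.0) + N0 / k                              # osculating circle's center

for s in (0.1, 0.5, 1.0):
    taylor = gamma(0.0) + s * T0 + 0.5 * k * s**2 * N0
    circle_err = abs(np.linalg.norm(gamma(s) - center) - 1 / k)
    print(f"s={s}: |curve - Taylor| = {np.linalg.norm(gamma(s) - taylor):.4f}, "
          f"osculating-circle distance error = {circle_err:.2e}")
```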
Fundamental Theorem for plane curves: if k(s) is a real-valued function, then there exists
a curve α, parametrized by arclength, such that k(s) is the directed curvature of α at every
point. This curve is unique up to translations and rigid rotations.
Modern point of view: a point p of attachment, and the vector attached at that point. I'm
starting to think in terms of vector geometry. Pairs of vectors which each live in R². We're
looking at points in R² × R². Tangent space: it consists of points (p, v) ∈ R² × R², with a
tangent space at every point. We say these collections of vectors are fibers over the point
of attachment p, and these fibers can be bundled together: ∪ₚ(fibers) is called a fiber
bundle. If these fibers are vector spaces, this becomes a vector bundle. The tangent space
attached to the point p has the structure of a vector space. We need a rule for addition of
vectors and a rule for scaling vectors:

(p, v) + (p, w) = (p, v + w)
λ(p, v) = (p, λv)

d/dt (x + t, y) = (1, 0)
d/dt (x, y + t) = (0, 1)

The coordinates that span the local tangent space are the x and y derivatives. Choosing
a value at each point is called taking a section through the tangent bundle; this is a
curve!
3 Lecture 3
Neurons in the visual cortex have a lot of axonal arborization. What are they sensitive to,
and what kinds of circuits are being formed?
The cortex has six layers, stacked vertically. The top half is the superficial layers. Input
comes into the middle layer, layer IV, and processing goes down to the lateral geniculate,
and also up into the next visual area. The unfolded brain is about the size of a large
pizza.
Neurons have visual receptive fields tuned to location and orientation.
If you insert an electrode into the superficial layers tangentially, at each position in the
retinotopic array of x-y coordinates there are cells tuned to different angle orientations.
These are orientation columns, in roughly ten-degree steps. Every position is covered by
a bunch of cells whose position-orientation responses overlap.
There are nuclei below the cortex: the lateral geniculate nucleus, the superior colliculus.
The colliculus is involved a lot in eye movement and eye movement control, especially
involuntary eye movements. Voluntary eye movements are in the frontal cortex, the Frontal
Eye Fields, and these project to the colliculus. The colliculus has layers and locations
corresponding to locations in retinotopic space; and a motor map; and an auditory map, all
pretty much matched up. There are projections between these laminae, but very few.
You're looking somewhere, you make a saccade, and there's a huge optical flow swing:
the world shifts quickly. One of the things that happens in the lateral geniculate is that
during high optical flow periods, the colliculus shuts down. You don't want to notice as
much while you're undergoing a saccade. But even when you think you're fixated, you have
micro-saccades, 20-30 minutes of arc. That's huge in terms of photoreceptors.
It turns out that the lines traced by saccades form a fractal. It's impossible to pinpoint
the points of fixation from a trace. This is relevant for autism: it's hypothesized that
autistics avoid faces. How do they even know where the face is, to avoid it?
Also: even though your eyes are saccading through the visual scene, you have a fixed repre-
sentation of the room. How does that work? You must synchronize retinotopic coordinates
somehow.
Experimental setup: while the monkey views a pattern, keep track of metabolic activity
in the cortex. In places where horizontal is really active, vertical is really quiet, and so
on. That's how we get a color map, where color corresponds to the orientation of the
underlying grating. This is an optical image of the visual cortex, layer II/III. If you made
a penetration in some directions, you would cross a lot of colors: an orientation hypercolumn.
If you stayed in one color region, Hubel and Wiesel would have said it was an uninteresting
penetration.
If at the same time you inject a tracer into the optical images, you can get a picture of
the long-range connections between different orientation regions. Like orientations pretty
much wire together. That's Gestalt "good continuation." That's wonderful. But there
are still a lot of connections to non-identical orientations. However, cocircularity works
better as a model: orientation columns are connected to orientation columns that share a
circle.
The number of connections is most sharply peaked across orientation differences for those
cocircular compatibility fields that are most restricted to a single angle (shallowest curvature).
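The cocircularity condition itself is easy to state in code (my own sketch of the standard geometric test; the positions and angles are illustrative): two oriented elements lie on a common circle exactly when their orientations are mirror images across the chord joining them.

```python
# Cocircularity test: theta_j = 2*phi - theta_i (mod pi), phi = chord angle.
import numpy as np

def cocircularity_error(p_i, theta_i, p_j, theta_j):
    """0 when the two oriented edge elements are perfectly cocircular."""
    phi = np.arctan2(p_j[1] - p_i[1], p_j[0] - p_i[0])     # chord direction
    err = (theta_j - (2 * phi - theta_i)) % np.pi          # orientations mod pi
    return min(err, np.pi - err)

# Two tangents to the unit circle, at (1, 0) and (0, 1):
p_i, theta_i = np.array([1.0, 0.0]), np.pi / 2
p_j, theta_j = np.array([0.0, 1.0]), 0.0
print(cocircularity_error(p_i, theta_i, p_j, theta_j))     # ~0.0
```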
4 Lecture 4
Curves as maps:

F : Rⁿ → Rᵐ

Plane curves: n = 1, m = 2. Once you have maps, you naturally ask: what happens if I
compose maps?

α : I → Rⁿ
F : Rⁿ → Rᵐ
Derivatives are maps:

Δx ↦ f'(x) Δx

α takes an interval to a line in Rⁿ, α(t) = p + tv, and F takes this line to a curve in
Rᵐ, β(t) = F(p + tv). We can also define F∗ₚ(v), a function of the unit vector v, that
sends v to β'(0). This is called the tangent map. What does it mean to have coordinates
on a surface? A local homeomorphism between R² and the surface. The local copy of R²
for a point on the surface is called a chart, and these charts comprise an atlas. We've just
defined a manifold. The tangent bundle is not a surface, but it *is* a manifold.
Commutative diagram:

f∗ : TRⁿ → TRᵐ, a linear map between tangent spaces
f : Rⁿ → Rᵐ
πₙ : TRⁿ → Rⁿ
πₘ : TRᵐ → Rᵐ

The commutativity of the diagram says that πₘ ∘ f∗ = f ∘ πₙ. The tangent map f∗(p) is
the linear transformation that approximates f near p. If we have a pair of vectors i and j
in R², and Qi = [cos θ, sin θ], Qj = [-sin θ, cos θ], we can put these together in a rotation
matrix through the angle θ:

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
The sign of the determinant of this gives the orientation. Rotation is an isometry: the
distance between a pair of points is left invariant.

d(f(x), f(y)) = d(x, y)

F∗ = Q : TR² → TR²

Rotate the tangent vector by θ and then project back up. If F is an isometry, then the star
map is a rotation. Recall that if you have the curvature then you can define the curve
up to translation and rotation. This is a key part of the proof of that statement. If you
have a function

F = (f₁, ..., fₘ) : Rⁿ → Rᵐ

we could ask what is

F∗(Uⱼᴺ(p)) = Σᵢ (∂fᵢ/∂xⱼ) Uᵢᴹ(f(p))

This is the Jacobian.
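A quick numeric sketch of this (my own example map F, not from the lecture): the Jacobian, estimated by finite differences, is exactly the linear map sending v to the derivative of t ↦ F(p + tv).

```python
# The tangent map as the Jacobian: the linear approximation of F near p.
import numpy as np

def F(p):
    x, y = p
    return np.array([x * y, np.sin(x), x + y**2])

def jacobian_fd(F, p, h=1e-6):
    """Columns are dF/dx_j, estimated by central differences."""
    p = np.asarray(p, float)
    cols = []
    for j in range(p.size):
        e = np.zeros_like(p); e[j] = h
        cols.append((F(p + e) - F(p - e)) / (2 * h))
    return np.stack(cols, axis=1)

p, v = np.array([1.0, 2.0]), np.array([0.3, -0.1])
J = jacobian_fd(F, p)
t = 1e-3
# F(p + tv) ~ F(p) + t * (J @ v): the tangent map sends v to J @ v.
print(J @ v, (F(p + t * v) - F(p)) / t)   # nearly equal
```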
Note for later: curvature resides in connections. (Jet bundle.)
The unit tangent bundle on the circle is SO(2).
If TRⁿ = Rⁿ × Rⁿ we say the tangent bundle is trivial: it's a product between the base
space and the fiber space. Typically this is true only locally. The famous example is
the Möbius strip. Imagine the base space is a 2d surface embedded in R³. Because of
the non-uniformity in orientation, the bundle is not the product between R¹ and R¹.
Normal bundle of a submanifold C embedded in M: N(C) is the set of tangent vectors
in M based on C and orthogonal to TC. Here the normal field is not trivial; as you go around
once, the normal is pointing in the opposite direction.
Now let's consider a space curve in R³. The tangent to the space curve is one dimensional,
but I have these other two coordinates of the fiber in the normal bundle. The structure of
the rotation of the tangent and normal is an element of SO(2). If I impose the condition
that the movement of the frame is smooth, when I get all the way back to the point where I
started, the normals don't have to agree. But I can restrict the change of frame so that
the normals do agree. The normal bundle of a smooth closed curve in R³ is trivial, unlike
a 2-d manifold in R³.
5 Lecture 5
If patches of image aren't the right objects to base a stereo correspondence on, what are
the right objects? Orientation vector fields. Templates of straight lines are one way of doing
this: classical computer vision, used for TV commercials. A primate's world has trees, which
are mostly lines, but not frontoparallel, at all different depths. What's a space curve?
How do those space curves land in our visual cortex?
Orientation disparity: the left eye vs. the right eye sees slightly different positions and
orientations.
What's the generalization of cocircularity in 3-d? Last time: Frenet equations:

$$\begin{pmatrix} T' \\ N' \end{pmatrix} = \begin{pmatrix} 0 & k \\ -k & 0 \end{pmatrix} \begin{pmatrix} T \\ N \end{pmatrix}$$

The general equation involves the tangent, the normal, and the torsion τ:

$$\begin{pmatrix} T' \\ N' \\ B' \end{pmatrix} = \begin{pmatrix} 0 & k & 0 \\ -k & 0 & \tau \\ 0 & -\tau & 0 \end{pmatrix} \begin{pmatrix} T \\ N \\ B \end{pmatrix}$$
Orthographic projection: project onto a 2-dimensional image plane, along rays orthogonal to
that plane. The tangent goes to the tangent, the normal goes to the normal, and all you lose
is the third direction.
Three planes: (T, N) the osculating plane, (T, B) the rectifying plane, (N, B) the normal plane.
How do you solve stereo correspondence? There's a space curve in the world; it projects to
a point in the left eye and a point in the right eye. We know the tangent in both images,
the normal in both images, the curvature in both images. That's enough information to
reconstruct the position of the point. The cross product between the tangents is approximately
a normal, and the binormal is the error between that and the actual normal.
This assumes I can do correspondence between points.
If you've got four points, how do you connect them up and see if they're consistent?
Transport the tangents along a curve: take pairs of tangents and normals, and transfer
them along the curve to see if they correspond, as in the sketch below.
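A sketch of that transport in code (my own numerics; the Euler integrator and the constant-curvature test case are illustrative assumptions): integrate the Frenet equations to carry a frame along the curve, then check it against the frame measured at the far end.

```python
# Transport a Frenet frame: T' = kN, N' = -kT + tau*B, B' = -tau*N.
import numpy as np

def transport_frame(k, tau, s_max=1.0, n=20000):
    """Evolve a frame [T, N, B] (rows) under the Frenet equations."""
    F = np.eye(3)
    ds = s_max / n
    for i in range(n):
        s = i * ds
        K = np.array([[0.0,   k(s),    0.0],
                      [-k(s), 0.0,     tau(s)],
                      [0.0,   -tau(s), 0.0]])
        F = F + ds * (K @ F)   # one Euler step
    return F

# Constant curvature 1, zero torsion: a unit circle. After arclength pi,
# the tangent should point in the opposite direction.
F = transport_frame(lambda s: 1.0, lambda s: 0.0, s_max=np.pi)
print(np.round(F, 3))   # first row ~ (-1, 0, 0)
```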
Geometrically consistent matching pairs: the algorithm. We want to maximize compatibility,

Σᵢⱼ pᵢ(θ) rᵢⱼ(θ, θ') pⱼ(θ')
6 Lecture 6
Review of the Hubel-Wiesel model: a block of cortex, input from layer 4, upstairs to layers
2/3. A tangential penetration reveals a whole range of orientations, alternating left and
right eyes, and from this perspective the question is: given a little cube from the left eye
and a little cube from the right eye, how could we compute the stereo response? This
would involve connections between the left and right eyes. We could imagine what these
looked like if we thought of our connection structure in x, y, z coordinates in the actual
world. A space curve, with a Frenet 3-frame with a tangent, normal, and binormal; and
we worked out an rᵢⱼ compatibility field between frames that are compatible by transport
along the curve.
Why couldn't we do this with some extra long-range connections? Why not do it in V1?
Why elaborate stereo into V2?
One thing to note: I want to be able to compute stereo even though depth changes and
relative position changes. Relative vs. absolute spatial disparity. The circle of zero
disparity is the circle of points the same distance from the eyes when they're verged on
a particular point. Smaller circles are crossed disparities, because if you're verged at one
distance and you look at something closer, you see double; these are called diplopic
images. Around the circle of zero disparity there is a zone where images fuse (appear as
one), and this is called Panum's Fusion Area.
We're going to build up a complex cell. A Gabor filter:

p(x) = [(I ∗ G_c)² + (I ∗ G_s)²]^(1/2) = |I ∗ (G_c + iG_s)|

I ∗ (G_c + iG_s) = p(x) e^(iφ(x))
so depending on the phase, you can have positive stripes, negative stripes, or either kind
of boundary. How about a left and right eye?
|(I_l(x, y) + I_r(x, y)) ∗ (G_c + iG_s)| = |ρ_l e^(iφ_l) + ρ_r e^(iφ_r)|

and the norm squared is

ρ_l² + ρ_r² + ρ_l ρ_r (e^(i(φ_l - φ_r)) + e^(-i(φ_l - φ_r)))

p_stereo² = ρ_l² + ρ_r² + 2 ρ_l ρ_r cos(φ_l - φ_r)

This is maximized when φ_l = φ_r and minimized when φ_l - φ_r = π. This is a tuned excitatory
binocular neuron.
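A toy 1-d version of this binocular energy computation (my own sketch; the Gabor parameters and the shift-as-disparity setup are simplifying assumptions):

```python
# Binocular energy: |L + R|^2 = rho_l^2 + rho_r^2 + 2 rho_l rho_r cos(phi_l - phi_r).
import numpy as np

def gabor_response(signal, x, freq=0.5, sigma=4.0):
    """Complex (cosine + i*sine) Gabor inner product centered at x."""
    t = np.arange(len(signal))
    g = np.exp(-(t - x)**2 / (2 * sigma**2)) * np.exp(1j * freq * (t - x))
    return np.sum(signal * g)

pattern = np.random.default_rng(1).normal(size=128)
for shift in (0, 2, 6):                       # disparity in pixels
    left = pattern
    right = np.roll(pattern, shift)
    L, R = gabor_response(left, 64), gabor_response(right, 64)
    energy = abs(L + R)**2                    # tuned-excitatory response
    print(f"disparity {shift}: energy = {energy:.1f}")
# For this toy, energy is largest at zero disparity, where phi_l and phi_r agree.
```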
Anti-correlated dots are a problem for the correlation model. If you have correlated images
(even random mush) you can fuse them. If you have uncorrelated dots, you can't. If they're
anti-correlated, the correlation is going to look like -1.
Train a monkey (with water) to fuse a random-dot stereogram. The saccade process will
correct how the monkey looks; the offset introduced by the stereogram is consistent. If
you put up an anti-correlated stereogram, the eye will move in the opposite direction. There
are cells in V1 and V2 tuned to be excitatory and inhibitory based on horizontal disparity,
and then there are cells tuned to near and far (you want to see things ahead of the fixation
point and behind the fixation point). Near and far are used to get rid of the symmetry when
an image could correspond to two different real-world depths.
What happens if you only illuminate one eye? Just the left ocular dominance bands in
V1.
Stripes that give a strong cytochrome oxidase staining are full of cells that are disparity-
selective. There are also thin stripes, and pale stripes.
The thick-stripe system. The parvo and the magno cells in the LGN: parvo are color-
sensitive, with poor contrast sensitivity, slow and sustained; the magno are fast and transient,
color blind, and high in contrast sensitivity. These are localized: more parvo at the fovea and
all magno at the periphery. One is for clarity, one is for motion.
On average, V2 is more than twice as tightly tuned as V1, and that holds across all classes of
stripes.
Cocircularity gives you 2-d consistency; you go up, you get 3d; not everything makes sense,
so you go back down, and that kind of explains the feedforward and feedback projections.
The other pathway involves the faster, quicker, dirtier aspects of things. The visual area
MT also has columns and optical flow vectors; there's a dual input into MT, and if you're
moving in an orthogonal direction near-far changes fast, while if you're moving tangentially
then near-far won't change much.
7 Lecture 7
We're going to go from curves to surfaces.
The vector field (-r sin θ, r cos θ) represents the tangent to a circle of radius r, at every
point. We like these to be differentiable. A frame field is a set of frames at different points;
the vectors in a frame should all be mutually orthogonal.
Covariant derivatives: where w is a vector field,

∇ᵤw = (d/dt) w(p + tu) |ₜ₌₀ = Σᵢ (Dᵤwᵢ)(p) Uᵢ

That is, the change in the vectors as we move in direction u, in the limit as the distance
traveled goes to zero. Suppose w = x²U₁ + yzU₃ = (x², 0, yz). Let v = (v₁, v₂, v₃) = (1, 1, 1).
Then

∇ᵥw = Σᵢ (Dᵥwᵢ) Uᵢ

Dᵥw₁ = v₁ Dₓ(x²) + v₂ D_y(x²) + v₃ D_z(x²) = 2x v₁ = 2x

Dᵥw₂ = 0

Dᵥw₃ = v₁ Dₓ(yz) + v₂ D_y(yz) + v₃ D_z(yz) = 0 + 1·z + 1·y

So

∇ᵥw = 2x U₁ + 0 U₂ + (z + y) U₃ = (2x, 0, z + y)
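A quick numeric check of this example (my own; the evaluation point is arbitrary):

```python
# nabla_v w = d/dt w(p + t v) at t = 0, for w(x,y,z) = (x^2, 0, y z), v = (1,1,1).
import numpy as np

w = lambda p: np.array([p[0]**2, 0.0, p[1] * p[2]])
v = np.array([1.0, 1.0, 1.0])

p = np.array([1.5, 2.0, -0.5])
h = 1e-6
numeric = (w(p + h * v) - w(p - h * v)) / (2 * h)
analytic = np.array([2 * p[0], 0.0, p[2] + p[1]])   # (2x, 0, z + y)
print(numeric, analytic)                            # agree to ~1e-9
```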
The covariant derivative of the normal on a surface is called the shape operator, and it's
very important. Changes in tangents and changes in normals define everything you need
to know about a surface.
A differential form is a linear map that takes a vector space to a real number, and it
follows the alternation rule: dg₁ dg₂ = -dg₂ dg₁. Anticommutative. The differential is a
one-form:

df(v) = (∂f/∂x) dx + (∂f/∂y) dy + (∂f/∂z) dz = ∇f · v ∈ R

The gradient points across the level curves of the function. ∇f is a vector; df is a
form. One-forms are objects that measure vectors. We also have interesting operations on
forms, such as the wedge product

dg ∧ dh

which goes from V × V → R. It can calculate, roughly, an area. It takes
two inputs instead of one. That's a two-form. Now consider the 3d Frenet equations.
The general equation involves tangent, normal, and torsion:

$$\begin{pmatrix} T' \\ N' \\ B' \end{pmatrix} = \begin{pmatrix} 0 & k & 0 \\ -k & 0 & \tau \\ 0 & -\tau & 0 \end{pmatrix} \begin{pmatrix} T \\ N \\ B \end{pmatrix}$$
Cartan's connection forms generalize Frenet frames. For any frame field {Eᵢ} define con-
nection forms wᵢⱼ(v) = (∇ᵥEᵢ) · Eⱼ, which are one-forms, with the property wᵢⱼ = -wⱼᵢ.

$$\begin{pmatrix} \nabla_v E_1 \\ \nabla_v E_2 \\ \nabla_v E_3 \end{pmatrix} = \begin{pmatrix} 0 & w_{12} & w_{13} \\ -w_{12} & 0 & w_{23} \\ -w_{13} & -w_{23} & 0 \end{pmatrix} \begin{pmatrix} E_1 \\ E_2 \\ E_3 \end{pmatrix}$$
Forms take vectors to numbers; you need to evaluate them on directions.
Now we introduce surfaces. Instead of a surface, talk about a collection of maps. A
surface is a collection of patches: for every point p there exists a patch x(u, v) containing p,
where x is differentiable at each point and x is 1-1.
8 Lecture 8
Locally, a surface is a bunch of curves (infinitely many) about each point. But it isn't an
infinite amount of information. A subset S ⊂ R³ is a surface if for every point p there is a
neighborhood around the point that maps to some coordinate patch

x : U → V ∩ S

such that x is differentiable, x⁻¹ is continuous, and the differential dxₚ is one-to-one. The
differential takes vectors to vectors: basis vectors in R² to basis vectors on the surface.
We can talk about the collection of all curves in U which are tangent to the surface at a
point:

{α'(0) : α(t) ∈ U, α(0) = p}

This is the tangent plane. Problems arise when we deal with surfaces: we have basis vectors
on U, and you map them onto the surface with the differential, but they might no longer be
orthogonal;

dx(e₁) · dx(e₂) = 0

need not hold. So you instead need the First Fundamental Form

$$\begin{pmatrix} E & F \\ F & G \end{pmatrix}$$
If we let Xᵤ = dx(e₁), Xᵥ = dx(e₂), and

w = a Xᵤ + b Xᵥ
r = c Xᵤ + d Xᵥ

then

w · r = ac (Xᵤ · Xᵤ) + (ad + bc)(Xᵤ · Xᵥ) + bd (Xᵥ · Xᵥ)

which reduces to ac + bd when the form is the identity. This means the right dot product
is exactly matrix multiplication with the First Fundamental Form, where

E = Xᵤ · Xᵤ
F = Xᵤ · Xᵥ
G = Xᵥ · Xᵥ
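These coefficients are mechanical to compute for a given patch; a sketch with sympy (my own code; the cylinder example anticipates the next paragraph):

```python
# E, F, G for the unit cylinder x(u, v) = (cos u, sin u, v).
import sympy as sp

u, v = sp.symbols("u v", real=True)
x = sp.Matrix([sp.cos(u), sp.sin(u), v])   # cylinder patch

X_u, X_v = x.diff(u), x.diff(v)
E, F, G = X_u.dot(X_u), X_u.dot(X_v), X_v.dot(X_v)
print(sp.simplify(E), sp.simplify(F), sp.simplify(G))   # 1 0 1: the identity
```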
Example: a cylinder. In a cylinder, the first fundamental form is the identity
matrix, so the tangent vectors stay orthogonal. Look at two curves and their velocity
vectors. From the Frenet frame, we know that the velocity vector can change in the
direction of the normal, or of the binormal. Geodesic curvature is the projection onto the
tangent plane; normal curvature is the projection onto the surface normal. Think of the
geodesic curvature as a curve property, and the normal curvature as a surface property.
The curvature of a great circle is just 1/r, where r is the radius. All normal curvatures on the
sphere are the same. This is only true for spheres; it defines the sphere, in fact. lim B/A,
the ratio of areas spanned by normals at adjacent solid angles, is called the Gaussian
curvature. Shape operator: -dN, the differential of the Gauss map. This is defined as

S(v) = -∇ᵥN

the covariant derivative in the direction v of the normal.

⟨S(v), v⟩

is the normal curvature in direction v. When diagonalized, the shape operator's diagonal
elements are the principal curvatures. K, the Gaussian curvature, is the product of the
principal curvatures. If K > 0, a paraboloid; K = 0, a plane or cylinder; K < 0, a hyperboloid.
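A sketch of those definitions in sympy (my own code; the paraboloid z = x² + y² is a standard K > 0 example, and expressing the shape operator as I⁻¹ II in the patch basis is the usual convention): build the two fundamental forms and read off the principal curvatures.

```python
# Shape operator of the paraboloid z = x^2 + y^2 at the origin.
import sympy as sp

u, v = sp.symbols("u v", real=True)
x = sp.Matrix([u, v, u**2 + v**2])

X_u, X_v = x.diff(u), x.diff(v)
n = X_u.cross(X_v); n = n / n.norm()        # unit surface normal
# First (I) and second (II) fundamental forms; shape operator S = I^{-1} II.
I  = sp.Matrix([[X_u.dot(X_u), X_u.dot(X_v)], [X_u.dot(X_v), X_v.dot(X_v)]])
II = sp.Matrix([[x.diff(u, 2).dot(n), x.diff(u, v).dot(n)],
                [x.diff(u, v).dot(n), x.diff(v, 2).dot(n)]])
S = sp.simplify(I.inv() * II)
print(S.subs({u: 0, v: 0}).eigenvals())     # {2: 2}: both principal curvatures 2, K = 4 > 0
```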
9 Lecture 9
Recovering the geometry of the surface from a 2-d image of the surface: can we do this?
On a Lambertian surface, the irradiance is n · l, the surface normal dotted with the light
source vector. On a specular surface, light hits the surface, flips about the surface normal,
and is reflected:

I = f(v - 2(n · v)n)

I(u, v) = ρ(x(u, v))

ρ is an albedo map; x is a shape parametrization; M is the shape.
dx = (xᵤ, xᵥ) = U Σ Vᵀ

The singular value decomposition has coordinates that indicate how much a circle is squished
onto an ellipse by the surface of the image. The direction in which image intensities
are changing the quickest is the direction in which the environment map is changing the
most.
r(u, v) = v - 2(n · v)n = v - 2(nᵀv)n

r : R² → S²

dr : TR² → TS²

r is the reflected view vector; n is the surface normal; v is the view vector; I is the image.
So we can predict orientation structure for a specular surface. Isophotes: lines where
brightness isn't changing at all. Shading flow: the tangents of this brightness map. Unfortu-
nately, the measured orientation flows are noisy. How can we make something smooth and
internally consistent? Maybe there's something intermediate in how our vision solves the
problem. Maybe we can go from the flow field to the surface structure somehow.
θ(x, y) : R² → S¹

s(x, y) = (x, y, θ(x, y))

E_T(x, y) : R² → R², (v₁(x, y), v₂(x, y)), v₁² + v₂² = 1

This is a tangent field. Directional derivative of f in direction v at point p:

vₚ(f) = d/dt f(p + tv) |ₜ₌₀

Covariant derivatives:

∇ᵥ(w) = d/dt w(p + tv) |ₜ₌₀
Frame field: a collection of orthogonal vectors, with the property that Eᵢ · Eⱼ = δᵢⱼ.

vₚ = α₁ E₁(p) + α₂ E₂(p)

∇ᵥE₁(p) = w₁₁(v) E₁(p) + w₁₂(v) E₂(p)
∇ᵥE₂(p) = w₂₁(v) E₁(p) + w₂₂(v) E₂(p)

The frame field is orthogonal, so

0 = v[Eᵢ · Eⱼ] = (∇ᵥEᵢ) · Eⱼ + Eᵢ · (∇ᵥEⱼ)

and in particular 2 (∇ᵥEᵢ) · Eᵢ = 0.
If

E_T = (cos θ, sin θ)
E_N = (-sin θ, cos θ)

then the flow has two curvatures, one along the tangent direction and one along the normal
direction:

κ_T = ∇θ · E_T
κ_N = ∇θ · E_N

so that ∇θ = κ_T E_T + κ_N E_N, with magnitude √(κ_T² + κ_N²).
We want to find some function over a local neighborhood that reflects coherent properties
of a flow, the equivalent of an osculating circle.
10 Lecture 10
What is the probability that you'll die of some disease? There's a huge list of diseases; how
do you get all that to balance out? At some time t, you have observations, but the observations
are noisy: a probability distribution. But if at an earlier time you knew where the ball was,
you should have some kind of correlation between the distributions: Xₜ = f(Xₜ₋₁). Here this is
a stochastic equation, with some noise process. We have some notion of the state of the world,
which we want to predict. The state of the world is observed through some probabilistic
observation model, and the state of the world updates on the past states of the world, chunking
its way along.
Kalman Filter: a system of equations built on the assumption that the state at time t is
only a function of the state at time t-1: the Markov assumption.

P(xₜ | xₜ₋₁) = N(Axₜ₋₁, Q)

A is the linear system of update formulae. Observations go the same way:

P(Oₜ | xₜ) = N(Hxₜ, R)
This is the Kalman model. The estimated mean position at t + 1 is equal to the estimated
position at t times the state transition matrix:

μ̂ₜ₊₁ = A μₜ

Σ̂ₜ₊₁ = A Σₜ Aᵀ + Q

This is the state update; this is all we need. Observation update: define something called
the Kalman Gain.

Kₜ₊₁ = Σ̂ₜ₊₁ Hᵀ (H Σ̂ₜ₊₁ Hᵀ + R)⁻¹

μₜ₊₁ = μ̂ₜ₊₁ + Kₜ₊₁ (Oₜ₊₁ - H μ̂ₜ₊₁)

Recall that the hat denotes an estimate.

Σₜ₊₁ = (I - Kₜ₊₁ H) Σ̂ₜ₊₁

The Kalman gain is a sort of measure of confidence. This is very useful in computer vision
for tracking and so on.
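These updates translate directly into code; a minimal sketch (the standard Kalman equations above, with my own toy constant-velocity tracking problem):

```python
# 1-d position/velocity tracking from noisy position observations.
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition (constant velocity)
H = np.array([[1.0, 0.0]])               # we observe position only
Q, R = 0.01 * np.eye(2), np.array([[1.0]])

mu, Sigma = np.zeros(2), np.eye(2)
rng = np.random.default_rng(0)
true_x = np.array([0.0, 1.0])

for t in range(20):
    true_x = A @ true_x
    obs = H @ true_x + rng.normal(scale=1.0)
    # State update (prediction):
    mu_hat = A @ mu
    Sigma_hat = A @ Sigma @ A.T + Q
    # Observation update, via the Kalman gain:
    K = Sigma_hat @ H.T @ np.linalg.inv(H @ Sigma_hat @ H.T + R)
    mu = mu_hat + K @ (obs - H @ mu_hat)
    Sigma = (np.eye(2) - K @ H) @ Sigma_hat

print("true:", true_x, " estimate:", np.round(mu, 2))
```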
Generalized graphical model (belief propagation): coming from Asia increases the proba-
bility that you contract tuberculosis. Smoking is another thing that affects lung disease,
influencing the probability of lung cancer or bronchitis. X-rays could tell you whether
you have lung cancer or tuberculosis. Tuberculosis, lung cancer, or bronchitis could all
cause shortness of breath. You can set up a lovely directed acyclic graph. This gives us a
graphical model!
Nodes that have no arrows going into them: you can't say anything but a prior. Other nodes
have observables going into them. Belief *propagates* through the magic of Bayes!
P({x}) = P(X_A, X_B, ...) = P(X_A) P(X_B) P(X_T | X_A) P(X_C | X_S) · · ·

depending on the precise arrangement of the graph; in general,

∏ᵢ₌₁ⁿ p(xᵢ | parents(xᵢ))

We talk instead about beliefs in the marginals, b(xᵢ), approximately the marginals. This
is the heart of belief propagation.
Stereo problem: given the brightness at each position, what's the disparity at each position?
(Depth location.) Coupled relationships between different disparities; legal and illegal
labelings (so you don't get impossible drawings). So you get a graph. It factors into two sets
of terms: how the observables affect the depth values, and how nearby depth values affect
one another. Write it as a big product:

(1/Z) ∏ᵢⱼ ψᵢⱼ(xᵢ, xⱼ) ∏ᵢ φᵢ(xᵢ, yᵢ)
The φ's are about how the brightness at i affects the depth at i; the ψ's are about
how the depth value at i relates to the depth at j. This is like the smoothness term. Potts
potential:

ψᵢⱼ(xᵢ, xⱼ) = e^(-V(xᵢ, xⱼ))

You can't compute the messages until you have all the information from things passing
into you. This is like relaxation labeling, but a harder and more general model.
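A minimal sketch of the message idea (my own toy: sum-product messages on a 3-node chain with a Potts-style pairwise term; on the loopy stereo graph the same computation is iterated until it settles):

```python
# Sum-product belief propagation on a chain x1 - x2 - x3.
import numpy as np

n_labels = 4
phi = np.random.default_rng(2).random((3, n_labels))   # unary (brightness) terms
# Potts pairwise term: 1 on the diagonal, e^{-1} off it.
psi = np.exp(-(np.arange(n_labels)[:, None] != np.arange(n_labels)) * 1.0)

# Message m[i->j](x_j) = sum_{x_i} phi_i(x_i) psi(x_i, x_j).
m12 = phi[0] @ psi                     # from node 1 to node 2
m32 = phi[2] @ psi                     # from node 3 to node 2
b2 = phi[1] * m12 * m32                # belief at node 2: unary times messages
b2 /= b2.sum()
print("belief at node 2:", np.round(b2, 3))
```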
11 Lecture 11
From the video: functional and structural imaging of a cortical circuit.
Related to Paul Allen's connectome project, the Allen Institute.
How are circuits wired up, and what do they do? One side is physiology and one is func-
tional labeling. Color-code neurons based on the orientation that excites them.
A mouse is running on a trackball, but its head is fixed and it's shown visual images.
There's a cranial window, an observatory into the mouse brain. The goal is complete
functional and structural imaging of a cortical circuit.
Cortical neurons come in two types, excitatory and inhibitory. Computers are organized
in OR and NOR gates and so on. Statistics at that level won't tell you much; you need motifs,
such as binary addition, 8-bit addition, etc.
Is the excitatory network sparse and random? Probably not. One can study that non-
randomness in the context of what the neurons actually do. Neurons that do the same
things are interconnected?
In the rodent visual cortex, neurons that do different things are mixed together. Re-
construct bits of the brain with serial electron microscopy. Identify connections between
neurons physiologically. The brain isn't a perfectly organized VLSI-like structure. It's all
the stuff jumbled together.
Neuro View: a simple question about a possible motif. Excitatory cells all synapsing on
the same interneuron. Why?
A mouse gets about three times as much visual activity while running as while standing
still.
Recall relaxation labeling: set up compatibilities rᵢⱼ to maximize A(p) = Σ pᵢ rᵢⱼ pⱼ.
What's in between this abstraction and the physiology? Dendrites to a neuron are inputs,
the axon is the output; a neuron fires when the sum of the inputs exceeds some threshold.
The neuron makes a decision when the sum of the evidence exceeds some threshold. This
is the individual decision-making level.
Two connected neurons: one's decision is a function of the other's, and vice versa. Coupled
decision-makers.
Winding Road Game: one player wants to drive on the left side of the road, one wants to
drive on the right. You die if you disagree; you win if you agree, but you win more if you
get your way. Mixed strategies work well.
Put a probability distribution on the strategies, and an rᵢⱼ on the linkage between them,
and you get a relaxation labeling quadratic form!

A(p) = Σ pᵢ rᵢⱼ pⱼ

p' = Rp + c

R is the update matrix. This gives you gradient ascent. We need the sum of the probabilities
to stay equal to one, so

Ap = 1

where A is just a big block of ones; and you need the probabilities to be nonnegative,

p ≥ 0.

This is an interior point method!
Minimax:

max_x min_y yᵀAx = min_y max_x yᵀAx

These are equal for the optimal strategy. This is the Min-Max Theorem. We want to maximize
a scalar payoff v such that

v ≤ eᵢᵀ A x for every pure strategy i
Σⱼ xⱼ = 1
xⱼ ≥ 0

This is called a linear program.
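That linear program is small enough to just solve; a sketch with scipy (my own formulation and a toy matching-pennies-style zero-sum payoff matrix, not the coordination game above):

```python
# Solve max v s.t. v <= e_i^T A x, sum x = 1, x >= 0 as an LP.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])   # zero-sum payoffs: rows = opponent, cols = us

# Variables z = (x1, x2, v); maximize v, i.e. minimize -v.
c = np.array([0.0, 0.0, -1.0])
A_ub = np.hstack([-A, np.ones((2, 1))])   # -Ax + v*1 <= 0
b_ub = np.zeros(2)
A_eq = np.array([[1.0, 1.0, 0.0]])        # probabilities sum to one
b_eq = np.array([1.0])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None), (0, None), (None, None)])
x1, x2, v = res.x
print(f"mixed strategy ({x1:.2f}, {x2:.2f}), game value {v:.2f}")  # (0.50, 0.50), 0.00
```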
12 Lecture 12
How can the connectome and game theory be related?
Linear programming: minimize an objective function cᵀx subject to Ax ≥ b, x ≥ 0. Slack
variable: switch an inequality constraint to an equality constraint. Say

7x₁ + 5x₂ ≥ 7

becomes

7x₁ + 5x₂ - x₃ = 7, with x₃ ≥ 0
The objective function pushes towards one of the corners of the feasible set. Test:
evaluate the objective function at all the vertices of the feasible polytope. This is called the
Simplex Method. This is technically exponential in the number of dimensions, since you could
go through all the vertices; in practice it tends to be linear.
Payoff:

Σ_{j≠i} pᵢ(λᵢ) rᵢⱼ(λᵢ, λⱼ) pⱼ(λⱼ) + (1/2) Σ_{λᵢ, λᵢ'} pᵢ(λᵢ) rᵢᵢ(λᵢ, λᵢ') pᵢ(λᵢ') + Σᵢ cᵢ(λᵢ) pᵢ(λᵢ)

an elaborated version of the usual payoff. The first term is the effect of your neighbors; the
second is the effect of you playing yourself; the third is a constant freebie. In neurons, if the
sum of the inputs is less than a threshold, it doesn't fire; if it exceeds the threshold, it fires.
Treat this like voltage instead.
cᵢ duᵢ/dt = Σⱼ Tᵢⱼ vⱼ - uᵢ/Rᵢ + Iᵢ

With the change of variables uᵢ = αᵢ Vᵢ + βᵢ, the same equation can be written in terms
of Vᵢ:

αᵢ cᵢ dVᵢ/dt = Σⱼ Tᵢⱼ vⱼ - (αᵢ Vᵢ + βᵢ)/Rᵢ + Iᵢ
The game is between players which are amplifiers (or neurons):

pᵢ(d) = vᵢ

rᵢⱼ(d, d) = 2Tᵢⱼ / (αᵢ cᵢ)
System of equations:

p' = Rp + c
Ap = q, p ≥ 0

q is a vector of all 1s. Coupled equations. The p's are probabilities, the rᵢⱼ's are links between
neurons, the cᵢ's the weights on the freebies. Cliques are self-excitatory and continue to fire;
non-cliques will go back down to zero because there's a leakage term. This also gets stability,
even in the case where one neuron doesn't fire. Greater stability, increased accuracy, increased
reliability.
You wind up with about 33 cells/clique, by experimental calculation.
13 Lecture 13
Last time: input vector, length 500. We want to project into 3 dimensions, 3 colors. Imagine
the data points form a cloud, and we want the three top principal components. (PCA all the
way!) First component: coordinate of maximum variance. Second component: orthogonal
coordinate of second greatest variance. Third component: orthogonal coordinate of third
greatest variance.
Alternative: directions that span the space but don't necessarily have to be orthogonal.
This is independent components analysis. There's lots of literature about this.
But it's still linear. What if the data lives on a curved surface? Optimal straight axes don't
make sense; you need kernel PCA.
Colors actually have some structure, luckily: there's a color wheel!
There's got to be a lower bound on the number of directions: at the least, projecting everything
onto zero dimensions isn't distance-preserving. Here we go: Johnson-Lindenstrauss!
xᵢ and xⱼ are data points, xᵢ, xⱼ ∈ Rᵈ, and we want a map f : Rᵈ → Rᵏ, k << d, with

(1 - ε) ‖xᵢ - xⱼ‖ ≤ ‖f(xᵢ) - f(xⱼ)‖ ≤ (1 + ε) ‖xᵢ - xⱼ‖

provided k = O(ε⁻² log N). You can choose f randomly; with high probability this works.
Almost every vector is approximately orthogonal to almost every other vector in a sufficiently
high-dimensional space.
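A quick numeric sketch of the lemma in action (my own; dimensions and sample count are arbitrary): a random Gaussian projection approximately preserves all pairwise distances.

```python
# Random projection from d = 10000 down to k = 500 dimensions.
import numpy as np

rng = np.random.default_rng(3)
d, k, n = 10000, 500, 20
X = rng.normal(size=(n, d))
P = rng.normal(size=(d, k)) / np.sqrt(k)     # random projection, scaled
Y = X @ P

orig = np.linalg.norm(X[:, None] - X[None, :], axis=2)
proj = np.linalg.norm(Y[:, None] - Y[None, :], axis=2)
ratio = proj[orig > 0] / orig[orig > 0]
print(f"distance ratios in [{ratio.min():.3f}, {ratio.max():.3f}]")  # near 1
```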
Also: concentration of measure. The length of a unit vector projected onto a k-dimensional
subspace is about √(k/d). (Multivariate Gaussian.) The probability mass of vectors collects
on the k-equator of the (d-1)-sphere.
Two ways of thinking about this: discrete (graphs) and continuous (N → ∞, manifold).
Think of each data point as a point (a Munsell patch), where each point has a vector of
500 wavelengths. Define a similarity matrix, or a weight on the edges of the graph, as

Wᵢⱼ = e^(-‖xᵢ - xⱼ‖²/σ)

This is an affinity; it's one between a point and itself. A similarity kernel.
Think of minimizing a form

Σᵢⱼ (yᵢ - yⱼ)² Wᵢⱼ

If the similarity kernel says they're close, then they'd better be similar in the low-d space.
Graph Laplacian: L = D - W, where Dᵢᵢ = Σⱼ Wᵢⱼ, the weighted degree of each point. So in
graph Laplacian terms, we're minimizing

YᵀLY.
That's the spectral graph theory approach. The other way: think of D⁻¹W, a row-
stochastic matrix. That's a transition probability! A random walk on the graph. Just plain
old diffusion. Diffusion distance:

Σₗ (pₜ(l|i) - pₜ(l|j))²

is the diffusion distance between i and j: the difference between probability distributions
after t steps given that you start at i or that you start at j. Basically an L² difference
between probability distributions.
The diffusion operator raised to the power t can be expanded in eigenvectors. Only the
components with big eigenvalues are large in the long term. Project onto those, and you have
nonlinear dimensionality reduction.
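Putting the last few paragraphs together in a compact sketch (my own; the kernel width, diffusion time, and eigen-scaling convention are choices, and the noisy ring is a made-up test set):

```python
# Diffusion-map embedding: affinity -> random walk -> leading eigenvectors.
import numpy as np

def diffusion_map(X, sigma=1.0, k=2, t=4):
    d2 = ((X[:, None] - X[None, :])**2).sum(-1)       # pairwise squared distances
    W = np.exp(-d2 / sigma)                           # similarity kernel
    P = W / W.sum(axis=1, keepdims=True)              # D^{-1} W, row-stochastic
    evals, evecs = np.linalg.eig(P)
    order = np.argsort(-evals.real)                   # leading eigenvalue is 1
    lam, V = evals.real[order], evecs.real[:, order]
    return V[:, 1:k+1] * lam[1:k+1]**t                # skip the constant eigenvector

# Noisy ring of points: the 2-d diffusion coordinates recover the circle.
theta = np.linspace(0, 2 * np.pi, 60, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta)]
X += 0.05 * np.random.default_rng(4).normal(size=X.shape)
Y = diffusion_map(X)
print(Y.shape)   # (60, 2)
```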
Note that the diffusion operator is kind of a clustering operator: bottlenecks get pulled apart,
clumps get clumpier.
Now let's go back to color patches. Embed the color patches in diffusion coordinates: they
live on a CONE. Hue and brightness and saturation! It's a nonlinear manifold recovered
exactly from the Munsell patches. There's evidence in V2 and V4 that there's a direct
representation of hue: this cone.
Go back to hue: color is a VECTOR FIELD, because each hue is an element of S¹. Hue
flow.
14 Lecture 14
MMPI data: you can embed questionnaire answers in a manifold. Every person is a point in 800
dimensions. Is there a lower-dimensional manifold? If you subtract questions at random,
you can fill in the missing data via extension techniques (remember the geometric harmonics
paper). It turns out only about 12-15 dimensions are relevant. Similarity over countries:
UN voting relationships. Create a similarity tree. Look at the eigenfunctions based on the
voting kernel. If you look at the manifold of countries, you can see an alliance of red
countries (Soviet Bloc) and blue countries (Western Europe), and you can see the effect of
De Gaulle's withdrawal from NATO on the manifold. You can also see the US sitting way out
away from the principal axis running from blue (Western Europe) to red (Russia). After 1990,
the Soviet countries fly along over to Western Europe. But another satellite country moved
out with the US, away from the main axis. That other country is Israel.
You have databases on conflicts between countries. Do countries fight with geographical
neighbors? Yes. You fight less with countries that you're close to on the measure of UN
agreement. (Interesting note: Canada is technically involved in many military conflicts.)
You can use the Nyström extension techniques to tell whether a country that abstained was
leaning toward voting yes or voting no.
Shading flow at boundaries: how does the tangent map to the iso-brightness contours
behave as you approach the boundary of an object bounded by a smooth surface? Margaret
Wong-Riley had the following idea in the 1970s: the metabolic load of neurons is dictated by
how much firing they do. Maybe there are clusters of neurons that fire more, and clusters
that fire less. You can look for cytochrome oxidase patterns. This distinguishes clusters
which are richly tuned to contrast and color opponency, but not to edge detection. Long-
range connections go from blob to blob, or within blobs to within blobs, but not much
between color cells and edge cells.
Coherent vs. noisy hue fields: they look just like texture fields. We make connections
between hue fields just like texture fields, and so on. Denoising via diffusion turns out to
look horrible. Finding hues is a matter of finding texture discontinuities.
15 Lecture 15
Visual curve completion in the tangent bundle. Object occlusion: objects in front of
others. Curve completion: how do you guess how the curves extend behind a block? It has
nothing to do with closed shapes. There are infinitely many ways to complete a curve from
the two points where it intersects the occluder. This is, of course, ill posed. There are many
possible models for curve completion, imposing geometrical properties on the curve and
seeking the curve that uniquely satisfies them: minimum total curvature, isotropy, smoothness,
scale invariance, roundedness, etc.
Relevant perceptual insights: curve completion is a quick, low-level mechanism. Com-
pleted curves between cocircular inducers are not circular arcs: no roundedness. No scale
invariance either: completed curves depend on the distance between inducers.
The shortest path between two points in the unit tangent bundle cannot simply be a straight
line, because the orientation coordinate along such a path doesn't match the tangents of its
projection. Such a path is called inadmissible. We need to find the shortest path between two
endpoints that is admissible: the orientation coordinate at each point should be the
corresponding tangent direction of the curve at that point. We use gradient descent to
minimize this length. The Euler-Lagrange equations give the following family of solutions:

(θ')² = c² / sin²(θ + φ) - 1

where we need to choose c and φ.
The calculus of variations is a technique for finding the equations that minimize a functional
of the form

ℓ = ∫ ‖β'(t)‖ dt

which in our case is

∫ √(x'² + y'² + h² θ'²) dt

subject to

tan θ(t) = y'/x'
Numerical procedure: make an initial guess for c and φ, and compute the length of the
projected curve. Then reconstruct the curve by iteratively (say, with Euler steps) solving the
differential equation; examine its endpoint, and use its distance from the correct endpoint
as a measure of error. Take gradient descent steps on the error function until it converges.
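A schematic version of that procedure (my own simplification; the integration scheme, the branch choice for θ', the toy endpoint, and the step sizes are all illustrative assumptions, not the model's actual parameters):

```python
# Shooting method: integrate a candidate completion, descend on endpoint error.
import numpy as np

def integrate_curve(c, phi, theta0=0.3, n=200, dt=0.01):
    """Euler steps for x' = cos(theta), y' = sin(theta), with theta' taken
    from the Euler-Lagrange family (theta')^2 = c^2/sin^2(theta + phi) - 1."""
    x, y, theta = 0.0, 0.0, theta0
    for _ in range(n):
        rate2 = c**2 / np.sin(theta + phi)**2 - 1.0
        theta += dt * np.sqrt(max(rate2, 0.0))   # choose the + branch
        x += dt * np.cos(theta)
        y += dt * np.sin(theta)
    return np.array([x, y])

def endpoint_error(c, phi, target):
    return np.linalg.norm(integrate_curve(c, phi) - target)

target = np.array([1.5, 0.8])        # where the other inducer sits (toy values)
c, phi, lr, h = 1.5, 0.8, 0.05, 1e-4
for _ in range(300):                 # finite-difference gradient descent
    e = endpoint_error(c, phi, target)
    gc = (endpoint_error(c + h, phi, target) - e) / h
    gp = (endpoint_error(c, phi + h, target) - e) / h
    c, phi = c - lr * gc, phi - lr * gp
print("final endpoint error:", endpoint_error(c, phi, target))
```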
These compare well to human completion in psychophysical studies.
Network model: source to sink, a path in a weighted graph. We want to minimize the energy of
traveling. This is actually analogous to a network of neurons. There's a nice way to do this:
Dijkstra! Bellman! But the trouble is that these algorithms are not useful for distributed
computing as in the cortex.
Dynamic programming = divide into sub-problems and combine. Our model consists of
four layers which work simultaneously. Each node tells its neighbors what the shortest
path between its neighbors and itself is, computing the cost of the shortest path from each
neuron to the source.
16 Lecture 16
Perceptual organization: how you ascribe structure to patterns. What is the nature of that
structure?
Fractals are related to perceptual organization. Lines against a background: what's the
dimensionality of each part? It's like 1-d in front of 2-d. How can you tell inside from
outside on a Jordan curve? You can't see it instantly; you have to trace. Texture vs. lines:
are you looking at part of a contour or a texture? Discontinuities are multiple tangents at
the same position. But these can come from quantization issues rather than evidence that
you're part of a contour.
Orientations that are really different from the background orientations stand out; orienta-
tions that match the background orientations are camouflaged.
A curve is called rectifiable if its line integral ∫ ‖α'(t)‖ dt is finite. Fractals are
non-rectifiable curves. Hausdorff measure:

inf d : Σᵢ rᵢᵈ

the smallest power d such that the curve can be covered by finitely many balls of radii rᵢ.
Four categories: dust, curves, turbulence, flow. If you continue *along* the edge, is it long
(flow, curve) or short (dust, turbulence)? If you continue perpendicular to the edge, is it
long (flow, turbulence) or short (dust, curve)?
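In practice one estimates such dimensions with box counting, a standard stand-in for the Hausdorff definition (my own sketch; the scales and the test curve are arbitrary): count occupied boxes at several scales and fit log N(r) against -log r.

```python
# Box-counting estimate of fractal dimension.
import numpy as np

def box_dimension(points, scales=(0.1, 0.05, 0.025, 0.0125)):
    counts = []
    for r in scales:
        boxes = {tuple(np.floor(p / r).astype(int)) for p in points}
        counts.append(len(boxes))
    d, _ = np.polyfit(np.log(1 / np.asarray(scales)), np.log(counts), 1)
    return d

# A straight segment should come out near dimension 1.
t = np.linspace(0, 1, 5000)
segment = np.c_[t, 0.5 * t]
print(f"estimated dimension: {box_dimension(segment):.2f}")
```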
Divide an image into four parts: dust, turbulence, flow, and curves. This distinguishes
background and foreground.
On Paolina, curves are her body's outline; flow is the straight parts of her hair; turbulence
is the curly parts of her hair; isolated dust points are just specks.