
CS 231A Computer Vision Midterm

Solution Set

Out: 12:30pm, February 25, 2015

Due: 12:30pm, February 27, 2015

1 Transformations

In this question, we will explore some interesting low-level properties of transformations.

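(a) In the lectures, we showed that a 3D rotation can be formed as the product of three matrices

$$R = R_x(\alpha)\, R_y(\beta)\, R_z(\gamma),$$

where

$$R_x(\alpha) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}, \quad
R_y(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}, \quad
R_z(\gamma) = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

are rotations around the x, y, and z axes, respectively. This is called the X-Y-Z Euler angle representation, indicating the order of matrix multiplication.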

However, representing rotations in this form can lead to ambiguities, since the result depends on the order in which the transforms are performed.

Let $p'$ be the point obtained by rotating a point $p$ with a rotation matrix $R$, so $p' = Rp$. Give an expression for $p'$ obtained by rotating $p$ in the following two ways:

First rotate $\alpha$ around the x axis, then rotate $\gamma$ around the z axis.

First rotate $\gamma$ around the z axis, then rotate $\alpha$ around the x axis.

Show that these rotations produce different values of $p'$.

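Solution:

The result depends on the order of rotation. Writing out the two products:

$$p'_1 = R_x(\alpha) R_z(\gamma)\, p
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}
\begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix} p
= \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \cos\alpha\sin\gamma & \cos\alpha\cos\gamma & -\sin\alpha \\ \sin\alpha\sin\gamma & \sin\alpha\cos\gamma & \cos\alpha \end{bmatrix} p$$

$$p'_2 = R_z(\gamma) R_x(\alpha)\, p
= \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix} p
= \begin{bmatrix} \cos\gamma & -\sin\gamma\cos\alpha & \sin\gamma\sin\alpha \\ \sin\gamma & \cos\gamma\cos\alpha & -\cos\gamma\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix} p$$

The two matrices differ in general (compare their first rows), so $p'_1 \neq p'_2$. Because $p' = Rp$ applies the rightmost factor first, $p'_1$ rotates about z and then x, while $p'_2$ rotates about x and then z.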
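As a quick numerical illustration of the order dependence, the sketch below multiplies the two products for one pair of angles and one point. The angles, the point, and the helper names (`rot_x`, `rot_z`, `matmul`, `apply`) are illustrative assumptions, not part of the original solution.

```python
import math

def rot_x(a):
    """Rotation about the x axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_z(g):
    """Rotation about the z axis by angle g (radians)."""
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(R, p):
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(R[i][k] * p[k] for k in range(3)) for i in range(3)]

alpha, gamma = math.radians(30), math.radians(60)  # illustrative angles
p = [1.0, 2.0, 3.0]                                # illustrative point

p1 = apply(matmul(rot_x(alpha), rot_z(gamma)), p)  # R_x(alpha) R_z(gamma) p
p2 = apply(matmul(rot_z(gamma), rot_x(alpha)), p)  # R_z(gamma) R_x(alpha) p
print(p1)
print(p2)
```

Both results have the same length as `p`, since each product is still a rotation, but they are different points.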

(b) To avoid the representation ambiguity introduced in part (a), we can fix the rotation order. However, even if the order of rotation is fixed, this rotation system may still result in a degenerate representation.

For instance, let $\alpha$, $\beta$, and $\gamma$ be three different angles. We can represent an arbitrary rotation with respect to the X-Y-X axes as the product $R_x(\alpha) R_y(\beta) R_x(\gamma)$ (note that the axes here are X-Y-X, not X-Y-Z). This representation of a rotation matrix should have three degrees of freedom. However, if we set $\beta = 0$, something unusual happens. How many degrees of freedom are left? Hint: some trigonometric identities may be useful.

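Solution:

$$R_x(\alpha) R_y(0) R_x(\gamma)
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\gamma & -\sin\gamma \\ 0 & \sin\gamma & \cos\gamma \end{bmatrix}$$

$$= \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha\cos\gamma - \sin\alpha\sin\gamma & -\cos\alpha\sin\gamma - \sin\alpha\cos\gamma \\ 0 & \sin\alpha\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha\sin\gamma + \cos\alpha\cos\gamma \end{bmatrix}$$

$$= \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(\alpha+\gamma) & -\sin(\alpha+\gamma) \\ 0 & \sin(\alpha+\gamma) & \cos(\alpha+\gamma) \end{bmatrix}$$

Since the product is parameterized only by $(\alpha + \gamma)$, it has just one degree of freedom.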
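The collapse to a single degree of freedom can also be checked numerically. The sketch below is a minimal illustration with made-up angles: two different $(\alpha, \gamma)$ pairs with the same sum give the same matrix when $\beta = 0$, and different matrices once $\beta \neq 0$. The helper names are assumptions for this example.

```python
import math

def rot_x(a):
    """Rotation about the x axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(b):
    """Rotation about the y axis by angle b (radians)."""
    c, s = math.cos(b), math.sin(b)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def xyx(alpha, beta, gamma):
    """X-Y-X Euler product R_x(alpha) R_y(beta) R_x(gamma)."""
    return matmul(matmul(rot_x(alpha), rot_y(beta)), rot_x(gamma))

def max_abs_diff(A, B):
    """Largest entrywise difference between two 3x3 matrices."""
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))

# beta = 0: only alpha + gamma matters (both pairs sum to 0.8 rad).
print(max_abs_diff(xyx(0.3, 0.0, 0.5), xyx(0.6, 0.0, 0.2)))
# beta != 0: the same two pairs now give different rotations.
print(max_abs_diff(xyx(0.3, 0.4, 0.5), xyx(0.6, 0.4, 0.2)))
```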

(c) To avoid the loss of a degree of freedom mentioned above, we can use an alternative rotation representation defined as follows:

$$R(\hat{n}, \theta) = I + \sin\theta\,[\hat{n}]_\times + (1 - \cos\theta)\,[\hat{n}]_\times^2,$$

where $\hat{n}$ is a unit vector and $\theta$ is the angle of rotation. Recall that $[\cdot]_\times$ is the matrix cross product operator (see Lecture 5).

Any valid rotation representation must satisfy some properties of rotation. Generally, for any rotation matrix $R$, we must have that $R^T R = I$. Prove that this equality holds for the rotation matrix $R$ defined above. (Hint: Consider the rotation given by $R^T$.)

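Solution:

$[\hat{n}]_\times^T = -[\hat{n}]_\times$ because

$$[\hat{n}]_\times^T = \begin{bmatrix} 0 & -n_z & n_y \\ n_z & 0 & -n_x \\ -n_y & n_x & 0 \end{bmatrix}^T
= \begin{bmatrix} 0 & n_z & -n_y \\ -n_z & 0 & n_x \\ n_y & -n_x & 0 \end{bmatrix} = -[\hat{n}]_\times.$$

Therefore, we have

$$\begin{aligned}
R(\hat{n}, \theta)^T &= \left(I + \sin\theta\,[\hat{n}]_\times + (1 - \cos\theta)\,[\hat{n}]_\times^2\right)^T \\
&= I + \sin\theta\,[\hat{n}]_\times^T + (1 - \cos\theta)\,([\hat{n}]_\times^T)^2 \\
&= I - \sin\theta\,[\hat{n}]_\times + (1 - \cos\theta)\,[\hat{n}]_\times^2 \\
&= R(\hat{n}, -\theta),
\end{aligned}$$

where the last step uses $\sin(-\theta) = -\sin\theta$ and $\cos(-\theta) = \cos\theta$. So $R^T$ is the rotation about the same axis $\hat{n}$ by the angle $-\theta$. Applying $R^T R$ rotates by $\theta$ and then by $-\theta$ about the same axis, which is equivalent to not rotating at all, so $R^T R = I$.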

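Alternatively, students can also write out $R^T R$ explicitly. First, $[\hat{n}]_\times^T = -[\hat{n}]_\times$, since $[\hat{n}]_\times$ is skew-symmetric.

Also observe that $[\hat{n}]_\times^2 = \hat{n}\hat{n}^T - I$, so we have

$$\begin{aligned}
[\hat{n}]_\times^4 &= (\hat{n}\hat{n}^T - I)(\hat{n}\hat{n}^T - I) \\
&= \hat{n}\hat{n}^T\hat{n}\hat{n}^T - 2\hat{n}\hat{n}^T + I \\
&= \hat{n}\hat{n}^T - 2\hat{n}\hat{n}^T + I \\
&= -[\hat{n}]_\times^2,
\end{aligned}$$

using $\hat{n}^T\hat{n} = 1$. Thus,

$$\begin{aligned}
R^T R &= \left(I + \sin\theta\,[\hat{n}]_\times + (1 - \cos\theta)\,[\hat{n}]_\times^2\right)^T \left(I + \sin\theta\,[\hat{n}]_\times + (1 - \cos\theta)\,[\hat{n}]_\times^2\right) \\
&= \left(I - \sin\theta\,[\hat{n}]_\times + (1 - \cos\theta)\,[\hat{n}]_\times^2\right)\left(I + \sin\theta\,[\hat{n}]_\times + (1 - \cos\theta)\,[\hat{n}]_\times^2\right) \\
&= I + \sin\theta\,[\hat{n}]_\times + (1 - \cos\theta)\,[\hat{n}]_\times^2 - \sin\theta\,[\hat{n}]_\times - \sin^2\theta\,[\hat{n}]_\times^2 - \sin\theta(1 - \cos\theta)\,[\hat{n}]_\times^3 \\
&\quad + (1 - \cos\theta)\,[\hat{n}]_\times^2 + \sin\theta(1 - \cos\theta)\,[\hat{n}]_\times^3 + (1 - \cos\theta)^2\,[\hat{n}]_\times^4 \\
&= I + \left(2(1 - \cos\theta) - \sin^2\theta - (1 - \cos\theta)^2\right)[\hat{n}]_\times^2 \\
&= I,
\end{aligned}$$

because $2(1 - \cos\theta) - \sin^2\theta - (1 - \cos\theta)^2 = 1 - \sin^2\theta - \cos^2\theta = 0$.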
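The two facts used in this proof, $R^T R = I$ and $R(\hat{n}, \theta)^T = R(\hat{n}, -\theta)$, can be sanity-checked numerically. The sketch below builds $R$ from the formula in the question; the axis, the angle, and the function names (`skew`, `rodrigues`) are illustrative assumptions.

```python
import math

def skew(n):
    """Cross-product matrix [n]_x of a 3-vector n."""
    return [[0, -n[2], n[1]], [n[2], 0, -n[0]], [-n[1], n[0], 0]]

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    """Transpose of a 3x3 matrix."""
    return [list(row) for row in zip(*A)]

def rodrigues(n, theta):
    """R(n, theta) = I + sin(theta) [n]_x + (1 - cos(theta)) [n]_x^2, n a unit vector."""
    N = skew(n)
    N2 = matmul(N, N)
    s, c = math.sin(theta), math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * N[i][j] + (1 - c) * N2[i][j]
             for j in range(3)] for i in range(3)]

n_hat = [1 / 3, 2 / 3, 2 / 3]   # illustrative unit axis (1 + 4 + 4 = 9)
theta = 0.7                     # illustrative angle in radians

R = rodrigues(n_hat, theta)
for row in matmul(transpose(R), R):   # should be close to the identity
    print([round(x, 12) for x in row])
```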

(d) Let us now consider points in 1-D. In homogeneous coordinates, a point in 1-D can be represented as $(x, 1)^T$, where $x \in \mathbb{R}$. More generally, such a point can be represented as $(kx, k)^T$ or generally as $(x_1, x_2)^T$.

A projective transformation of a point in 1D is represented by a 2 × 2 matrix,

$$x' = H_{2\times 2}\, x,$$

where $H_{2\times 2}$ has a non-zero determinant.

Given 4 points on a line, the cross ratio is defined as

$$\mathrm{Cross}(x_1, x_2, x_3, x_4) = \frac{|x_1 x_2|\,|x_3 x_4|}{|x_1 x_3|\,|x_2 x_4|},$$

where

$$|x_i x_j| = \det \begin{bmatrix} x_{i1} & x_{j1} \\ x_{i2} & x_{j2} \end{bmatrix}$$

with point $x_i$ given by $(x_{i1}, x_{i2})^T$ and $x_j$ given by $(x_{j1}, x_{j2})^T$.

Prove that the value of the cross ratio is invariant under any projective transformation of the line, i.e., show that if $x' = H_{2\times 2}\, x$, then

$$\mathrm{Cross}(x'_1, x'_2, x'_3, x'_4) = \mathrm{Cross}(x_1, x_2, x_3, x_4).$$

Equivalently, the cross ratio is invariant to the projective coordinate frame chosen for the line.

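Solution:

$$\begin{aligned}
|x'_i\, x'_j| &= |\lambda_i H x_i \;\; \lambda_j H x_j| \\
&= \det \begin{bmatrix} \lambda_i (H_{11} x_{i1} + H_{12} x_{i2}) & \lambda_j (H_{11} x_{j1} + H_{12} x_{j2}) \\ \lambda_i (H_{21} x_{i1} + H_{22} x_{i2}) & \lambda_j (H_{21} x_{j1} + H_{22} x_{j2}) \end{bmatrix} \\
&= \lambda_i \lambda_j\, (x_{i1} x_{j2} - x_{i2} x_{j1}) \det \begin{bmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{bmatrix} \\
&= \lambda_i \lambda_j\, |x_i x_j| \det H
\end{aligned}$$

$$\mathrm{Cross}(x'_1, x'_2, x'_3, x'_4) = \frac{|x'_1 x'_2|\,|x'_3 x'_4|}{|x'_1 x'_3|\,|x'_2 x'_4|}
= \frac{\lambda_1 \lambda_2 |x_1 x_2| \det H \;\cdot\; \lambda_3 \lambda_4 |x_3 x_4| \det H}{\lambda_1 \lambda_3 |x_1 x_3| \det H \;\cdot\; \lambda_2 \lambda_4 |x_2 x_4| \det H}
= \frac{|x_1 x_2|\,|x_3 x_4|}{|x_1 x_3|\,|x_2 x_4|}.$$

Setting every $\lambda_i = 1$ gives the case $x' = H_{2\times 2}\, x$; the general $\lambda_i$ also cover arbitrary rescaling of the homogeneous points.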
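A small numerical check of this invariance, under illustrative assumptions: four made-up points on a line, an arbitrary invertible $H$, and arbitrary non-zero scale factors. The function names are hypothetical helpers for this sketch.

```python
def det2(a, b):
    """|a b| = det [[a1, b1], [a2, b2]] for homogeneous 1-D points a, b."""
    return a[0] * b[1] - a[1] * b[0]

def cross_ratio(x1, x2, x3, x4):
    """Cross ratio |x1 x2||x3 x4| / (|x1 x3||x2 x4|)."""
    return (det2(x1, x2) * det2(x3, x4)) / (det2(x1, x3) * det2(x2, x4))

def transform(H, x, scale=1.0):
    """Return scale * H x for a 2x2 matrix H and a homogeneous point x."""
    return [scale * (H[0][0] * x[0] + H[0][1] * x[1]),
            scale * (H[1][0] * x[0] + H[1][1] * x[1])]

points = [[0.0, 1.0], [1.0, 1.0], [3.0, 1.0], [7.0, 1.0]]  # illustrative points
H = [[2.0, 1.0], [1.0, 3.0]]                               # det H = 5, invertible
scales = [1.5, -2.0, 0.5, 4.0]                             # arbitrary lambda_i

before = cross_ratio(*points)
after = cross_ratio(*[transform(H, x, s) for x, s in zip(points, scales)])
print(before, after)
```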

(e) Given the previous result, does the cross ratio remain invariant under rotation? Give a brief explanation.

Solution:

Yes, because a rotation matrix is a special case of a projective transformation.

(f) Extra Credit

Show that the cross-ratio invariance result holds even when $x'_i = \lambda_i H_{2\times 2}\, x_i$, where points are scaled by arbitrary scale factors $\lambda_i$.

2 Multiview geometry

Assume that we have two cameras with camera matrices $M_1$ and $M_2$. Suppose that the cameras generate images $I_1$ and $I_2$, respectively, where $p_1, p_2, p_3, p_4$ and $p'_1, p'_2, p'_3, p'_4$ are corresponding points across the two images.

(a) Suppose

$$M_1 = K_1 \begin{bmatrix} R_1 & T_1 \end{bmatrix}, \qquad M_2 = K_2 \begin{bmatrix} R_2 & T_2 \end{bmatrix} \tag{1}$$

where

$$K_1 = K_2 = K, \qquad R_1 = I \text{ (identity matrix)}, \qquad R_2 = R, \qquad T_1 = T_2 = [0, 0, 0]^T \tag{2}$$

Prove that the homography $H$ defined by

$$p'_i = H p_i \quad \text{where } i = 1, 2, 3, 4 \tag{3}$$

can be expressed as $H = K R K^{-1}$.

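Solution:

Let $p$ and $p'$ be corresponding points in the two images, with world coordinate $P$. We have

$$p = K \begin{bmatrix} I & 0 \end{bmatrix} P, \qquad p' = K \begin{bmatrix} R & 0 \end{bmatrix} P.$$

Therefore, we have

$$\begin{aligned}
p' &= K \begin{bmatrix} R & 0 \end{bmatrix} P \\
&= K R \begin{bmatrix} I & 0 \end{bmatrix} P \\
&= K R (K^{-1} K) \begin{bmatrix} I & 0 \end{bmatrix} P \\
&= (K R K^{-1}) \left(K \begin{bmatrix} I & 0 \end{bmatrix} P\right) \\
&= (K R K^{-1})\, p = H p
\end{aligned}$$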
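The derivation can be illustrated numerically with a made-up camera. In the sketch below, the intrinsics `K`, the rotation angle, and the test point are assumptions chosen only for illustration; `transfer_error` is a hypothetical helper that compares $Hp$ with the true projection $p'$ in pixels.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def pixel(x):
    """Convert a homogeneous image point to pixel coordinates."""
    return [x[0] / x[2], x[1] / x[2]]

# Illustrative intrinsics: focal length f, principal point (cx, cy).
f, cx, cy = 800.0, 320.0, 240.0
K = [[f, 0, cx], [0, f, cy], [0, 0, 1]]
K_inv = [[1 / f, 0, -cx / f], [0, 1 / f, -cy / f], [0, 0, 1]]

b = 0.1  # illustrative rotation about the y axis, in radians
R = [[math.cos(b), 0, math.sin(b)], [0, 1, 0], [-math.sin(b), 0, math.cos(b)]]

H = matmul(matmul(K, R), K_inv)  # H = K R K^{-1}

def transfer_error(P):
    """Pixel distance between H p and p' for a 3D point P (both T are zero)."""
    p = matvec(K, P)                     # p  = K [I 0] P
    p_prime = matvec(K, matvec(R, P))    # p' = K [R 0] P
    a, q = pixel(matvec(H, p)), pixel(p_prime)
    return math.hypot(a[0] - q[0], a[1] - q[1])

print(transfer_error([0.5, -0.2, 4.0]))
```

The test point is arbitrary, which matches the observation in part (b) that no coplanarity is needed when the cameras differ only by a rotation.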

(b) Continuing with our setup in part (a): do N > 3 corresponding points necessarily need to belong to the same planar surface in 3D in order for the result in part (a) to hold? Justify your answer.

Solution:

No. Notice that in part (a), we did not use any coplanarity constraints.


(c) Now we change our setup of two cameras $M_1$ and $M_2$ as follows:

$$M_1 = K_1 \begin{bmatrix} R_1 & T_1 \end{bmatrix}, \qquad M_2 = K_2 \begin{bmatrix} R_2 & T_2 \end{bmatrix} \tag{4}$$

where

$$K_1 = K_2 = K, \qquad R_1 = I \text{ (identity matrix)}, \qquad R_2 = R, \qquad T_1 = 0 = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}^T, \qquad T_2 = t = \begin{bmatrix} t_x & t_y & t_z \end{bmatrix}^T \tag{5}$$

All four image correspondences $p_i \leftrightarrow p'_i$ are projections of four points $P_i$ which lie on the same planar surface $\Pi$ in 3D. $\Pi$ is defined as the set of points $X$ such that $Z^T X = 0$, where $Z = \begin{bmatrix} v^T & 1 \end{bmatrix}^T$ ($v$ is a 3 × 1 vector). An illustration of this setup can be found in Figure 1. Express the homography $H$ (defined by $p' = Hp$) in terms of $K$, $R$, $t$, $v$ and explain your derivation.

Figure 1: Camera setup used in parts (c), (d), (e), and (f).

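Solution:

To compute $H$ we back-project a point $p$ in the first view and determine the intersection point $P$ of this ray with the plane $\Pi$. The 3D point $P$ is then projected into the second view.

For the first view $p = K \begin{bmatrix} I & 0 \end{bmatrix} P$, and so any point on the ray $P = \begin{bmatrix} K^{-1} p \\ d \end{bmatrix}$ projects to $p$, where $d$ parameterizes the point on the ray. Since the 3D point $P$ is on the plane $\Pi$, it satisfies $Z^T P = 0$. This determines $d = -v^T K^{-1} p$, and

$$P = \begin{bmatrix} K^{-1} p \\ -v^T K^{-1} p \end{bmatrix}.$$

The 3D point $P$ projects into the second view as

$$\begin{aligned}
p' &= K \begin{bmatrix} R & t \end{bmatrix} P \\
&= \begin{bmatrix} KR & Kt \end{bmatrix} \begin{bmatrix} K^{-1} p \\ -v^T K^{-1} p \end{bmatrix} \\
&= K (R - t v^T) K^{-1} p
\end{aligned}$$

Therefore, we have $H = K (R - t v^T) K^{-1}$.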
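To see the role of the plane, the sketch below evaluates $H = K(R - tv^T)K^{-1}$ for a made-up camera pair and compares $Hp$ with the true projection $p'$ for one point on $\Pi$ and one point off it. All numeric values and helper names (`plane_homography`, `point_on_plane`, `transfer_error`) are assumptions for illustration only.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def pixel(x):
    """Convert a homogeneous image point to pixel coordinates."""
    return [x[0] / x[2], x[1] / x[2]]

def plane_homography(K, K_inv, R, t, v):
    """H = K (R - t v^T) K^{-1} for the plane v . P + 1 = 0."""
    M = [[R[i][j] - t[i] * v[j] for j in range(3)] for i in range(3)]
    return matmul(matmul(K, M), K_inv)

def point_on_plane(X, Y, v):
    """3D point (X, Y, Z) with v . P + 1 = 0, i.e. [v; 1]^T [P; 1] = 0."""
    return [X, Y, -(1.0 + v[0] * X + v[1] * Y) / v[2]]

# Illustrative camera pair and plane.
f, cx, cy = 800.0, 320.0, 240.0
K = [[f, 0, cx], [0, f, cy], [0, 0, 1]]
K_inv = [[1 / f, 0, -cx / f], [0, 1 / f, -cy / f], [0, 0, 1]]
b = 0.1
R = [[math.cos(b), 0, math.sin(b)], [0, 1, 0], [-math.sin(b), 0, math.cos(b)]]
t = [0.5, 0.1, 0.05]
v = [0.02, -0.01, -0.2]
H = plane_homography(K, K_inv, R, t, v)

def transfer_error(P):
    """Pixel distance between H p and the true projection p' of P in view 2."""
    p = matvec(K, P)                                    # p  = K [I 0] P
    RP = matvec(R, P)
    p_prime = matvec(K, [RP[i] + t[i] for i in range(3)])  # p' = K [R t] P
    a, q = pixel(matvec(H, p)), pixel(p_prime)
    return math.hypot(a[0] - q[0], a[1] - q[1])

on_plane = point_on_plane(0.4, -0.3, v)
off_plane = [0.4, -0.3, 8.0]
print(transfer_error(on_plane), transfer_error(off_plane))
```

Unlike the pure-rotation case, the off-plane point is transferred to the wrong pixel: with a non-zero translation the homography is only valid for points on $\Pi$.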

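(d) Continuing with our setup in part (c). Prove that the fundamental matrix $F$ which satisfies $p^T F p' = 0$ can be expressed as

$$F = ([e_2]_\times H)^T$$

where $e_2$ (see Figure 1) is the epipole in the second image plane.

Solution:

By the definition of the fundamental matrix, the epipolar line $l'$ in the second image plane should be

$$l' = F^T p$$

On the other hand, $l'$ passes through both the epipole $e_2$ and the corresponding point $p'$, so we can also express $l'$ as

$$l' = e_2 \times p' = [e_2]_\times p'$$

Since $p' = Hp$, we have $F^T = [e_2]_\times H$. For a point off the plane, $Hp$ is the image of the point where the ray through $p$ meets $\Pi$, so it still lies on $l'$ and the same expression holds.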

(e) Continuing with our setup in part (c). In addition to the four correspondences on the plane, we are given two additional pairs of corresponding points ($p_5 \leftrightarrow p'_5$, $p_6 \leftrightarrow p'_6$) that do NOT belong to the same planar surface $\Pi$. Show how to find the epipole $e_2$ and explain your derivation. Hint: $e_2$ can be expressed as the intersection of two lines.

Solution:

Suppose the two additional correspondences are $\{p_5, p'_5\}$ and $\{p_6, p'_6\}$. $e_2$ can be expressed as the intersection of two lines in the second image plane, where these two lines are $p'_5 \times H p_5$ and $p'_6 \times H p_6$.

(f) Continuing with our setup in part (c). From what you have shown in the previous parts, write a function computeF.m that estimates the fundamental matrix $F$ using 6 correspondences (4 of them belong to the same 3D plane).

Figure 2: Two images from cameras $M_1$ and $M_2$. (a) Image 1. (b) Image 2.

We provide

A dataset of six corresponding points from two images (Figure 2) under the camera setup shown in Figure 1. p1.mat includes six points from image 1 and p2.mat includes six corresponding points from image 2. The first four corresponding points are from the same 3D plane, and the last two corresponding points are out of that plane. You can run loadData.m to get familiar with the data.

Function computeH.m that estimates a homography from at least 4 points.

For this part, you need to turn in

An outline of the algorithm that you are using to compute $F$.

Your code for function computeF.m.

Your numerical results of $F$.

Hint:

You don't need to (and should not) label additional points.

You can verify your results by computing $p^T F p'$, which should be close to zero for each correspondence.

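Solution:

    function F = computeF(pt1, pt2)
    % pt1, pt2: 6x2 matrices of matching points (rows 1-4 lie on the plane).
    if (size(pt1, 1) ~= 6 || size(pt2, 1) ~= 6)
        error('Wrong number of points.');
    end
    % Homography induced by the plane, from the four coplanar matches.
    H = computeH(pt1(1:4, :), pt2(1:4, :));
    % Transfer the two off-plane points into image 2.
    x5 = H * [pt1(5,:)'; 1];
    x5 = x5 / x5(3);
    x6 = H * [pt1(6,:)'; 1];
    x6 = x6 / x6(3);
    % Epipolar lines through the transferred and the observed points.
    l1 = cross(x5, [pt2(5,:)'; 1]);
    l2 = cross(x6, [pt2(6,:)'; 1]);
    % The epipole is the intersection of the two lines.
    e = cross(l1, l2);
    e = e / e(3);
    % Cross-product matrix [e]_x.
    ex = zeros(3,3);
    ex(1, 2) = -e(3);
    ex(2, 1) = e(3);
    ex(1, 3) = e(2);
    ex(3, 1) = -e(2);
    ex(2, 3) = -e(1);
    ex(3, 2) = e(1);
    F = (ex * H)';
    end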
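The same steps can be sketched in Python on synthetic data, which makes the result easy to check because the true epipole is known ($e_2 \propto Kt$). This is not the provided dataset or `computeH.m`: the camera values are made up, the homography is computed analytically from $H = K(R - tv^T)K^{-1}$ instead of being estimated from four points, and `compute_f` and `epipolar_residual` are hypothetical names.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in a))

def skew(e):
    """Cross-product matrix [e]_x."""
    return [[0, -e[2], e[1]], [e[2], 0, -e[0]], [-e[1], e[0], 0]]

def compute_f(H, p5, q5, p6, q6):
    """Return (F, e2) with F = ([e2]_x H)^T.

    p5, p6: homogeneous off-plane points in image 1; q5, q6: their matches
    in image 2; H: homography induced by the plane.
    """
    l1 = cross(matvec(H, p5), q5)   # epipolar line through H p5 and p5'
    l2 = cross(matvec(H, p6), q6)   # epipolar line through H p6 and p6'
    e2 = cross(l1, l2)              # the two lines intersect at the epipole
    ExH = matmul(skew(e2), H)
    return [list(col) for col in zip(*ExH)], e2   # transpose of [e2]_x H

def epipolar_residual(F, p, q):
    """|p^T F q| divided by |p| |F q|, so the value does not depend on scale."""
    Fq = matvec(F, q)
    return abs(sum(a * b for a, b in zip(p, Fq))) / (norm(p) * norm(Fq))

# Synthetic setup (all values are illustrative assumptions).
f, cx, cy = 800.0, 320.0, 240.0
K = [[f, 0, cx], [0, f, cy], [0, 0, 1]]
K_inv = [[1 / f, 0, -cx / f], [0, 1 / f, -cy / f], [0, 0, 1]]
b = 0.1
R = [[math.cos(b), 0, math.sin(b)], [0, 1, 0], [-math.sin(b), 0, math.cos(b)]]
t = [0.5, 0.1, 0.05]
v = [0.02, -0.01, -0.2]            # plane: v . P + 1 = 0
H = matmul(matmul(K, [[R[i][j] - t[i] * v[j] for j in range(3)]
                      for i in range(3)]), K_inv)

def view1(P):
    """Projection into camera 1, K [I 0]."""
    return matvec(K, P)

def view2(P):
    """Projection into camera 2, K [R t]."""
    RP = matvec(R, P)
    return matvec(K, [RP[i] + t[i] for i in range(3)])

P5, P6 = [1.0, 0.5, 8.0], [-0.8, -0.4, 3.0]   # two points off the plane
F, e2 = compute_f(H, view1(P5), view2(P5), view1(P6), view2(P6))

P7 = [0.3, -0.6, 5.5]                          # a further point, not used to fit F
print(epipolar_residual(F, view1(P7), view2(P7)))
```

The residual uses the same convention as the solution, $p^T F p'$ with $p$ from image 1, and should be near zero for any correspondence, not only the six used to build $F$.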

F = 10 4 × 0.0001

0.0000 0.0000 0.0008

0.0000

0.0139 0.0800 5.2679

0.0899


3 Hough Transform and Vanishing Points

For this problem, we wish to find the vanishing points of a scene by finding the intersection point of a set of parallel lines. In Problem Set 1, we selected a pair of parallel lines using 4 points and then extrapolated to find their intersection. This process is prone to noise and is sensitive to the selection of points (as many of you complained). To improve this process, we will densely sample many points along a set of parallel lines and use the Hough Transform to better fit lines in the scene. If we parameterize a line using $r$ and $\theta$ as presented in class, we can represent a line as

$$y = -\frac{\cos\theta}{\sin\theta}\, x + \frac{r}{\sin\theta}.$$

As a result, each edge point will be mapped to a curve in the polar parameter space by the Hough Transform.

(a) We will begin by building some intuition about the Hough transform. For the given Cartesian points in the left plot, sketch the corresponding curves in the Hough space. A rough sketch is ok; ensure your sketch captures the fact that there are 10 points that all lie along the same line.

Solution:

(b) Once we identify peaks in the Hough space, we can map these back to the Cartesian space. For the given peaks in the Hough space (left plot below), sketch the corresponding lines in the Cartesian space. The sketches can be approximate.

Solution:

(c) Assuming we densely sample the parallel lines in the scene that correspond to the same vanishing point, we can use the Hough Transform to identify the most likely set of lines. Since there could be multiple parallel lines in the scene, there could be multiple peaks in the Hough space.

Let the vanishing point lie at $(x_0, y_0)$. The lines in the image that intersect the vanishing point will form peaks in Hough space. Describe the curve in $(\theta, r)$ Hough space that all peaks will lie along. Express your answer as a function of the form $r_i = r(\theta_i, x_0, y_0)$.

Solution:

Since all of the points correspond to lines that should intersect at the same vanishing point, all of the peaks in the Hough transform should lie along the curve $r(\theta) = x_0 \cos(\theta) + y_0 \sin(\theta)$, which is the equation in Hough space of the set of all lines that go through the vanishing point.
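A short numerical check of this curve, under illustrative assumptions: the sketch below takes a made-up vanishing point, draws lines from it through a few arbitrary image points, converts each line to $(\theta, r)$ using only the second point, and compares $r$ with $x_0\cos\theta + y_0\sin\theta$. The name `hough_params` is a hypothetical helper.

```python
import math

def hough_params(pa, pb):
    """(theta, r) of the line through pa and pb, with r = x cos(theta) + y sin(theta)."""
    dx, dy = pb[0] - pa[0], pb[1] - pa[1]
    # The line normal is (-dy, dx); theta is its direction angle.
    theta = math.atan2(dx, -dy)
    # r is measured using pb only, so pa is not assumed to satisfy the formula.
    r = pb[0] * math.cos(theta) + pb[1] * math.sin(theta)
    return theta, r

vp = (400.0, -150.0)                       # illustrative vanishing point
others = [(10.0, 20.0), (250.0, 300.0), (-40.0, 120.0), (600.0, 90.0)]

for q in others:
    theta, r = hough_params(vp, q)
    predicted = vp[0] * math.cos(theta) + vp[1] * math.sin(theta)
    print(round(theta, 4), round(r, 4), round(predicted, 4))
```

No sign or range normalization of $(\theta, r)$ is applied here; the relation holds for either orientation of the normal.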

(d) If the points selected were from lines that corresponded to more than one vanishing point, what would the peaks in the Hough space look like? Give your answer qualitatively, referencing your solution to (c).

Solution:

The intersection points/peaks would represent a pointwise sampling of multiple sinusoids, of the form $r(\theta) = x_i \cos(\theta) + y_i \sin(\theta)$ for each unique vanishing point, with $(x_i, y_i)$ being the image coordinates of the vanishing points.

(e) If we took the locations of the peaks described in part (d) and transformed them back into Cartesian space, how could we use a procedure similar to the Hough transform to find the location of the vanishing points?

Solution:

Plot lines in Cartesian space corresponding to the detected peaks in the Hough parameter space. Then quantize the Cartesian space and use the lines found by the Hough transform to cast votes for intersection points; the cells with the most votes give the vanishing points.

(f) In the previous parts of this problem we overlooked the size of the bins in the Hough space. If the Hough space had bins of finite size to act as accumulators for the transformed points, describe the effects of bin size on the procedure described in parts (d) and (e), assuming that the original point sampling of the parallel lines described in part (a) was subject to noise. Specifically discuss the trade-offs that must be considered when selecting bin size.

Solution:

Assuming that the threshold for peak detection is proportional to the size of the bins:

Smaller bin sizes are more susceptible to noise, since the intersection of the curves or lines in parts (d) and (e) could be spread by noise, so spurious peaks could be detected in neighboring cells. For part (d), this would cause spurious lines to be found, which would cause even more spread in part (e) when finding vanishing points.

Larger bin sizes are less susceptible to noise causing spurious peaks; however, they could result in coarser detection of peaks, since more noisy points could be incorporated in each bin. Also, if lines/points are close together in the parameter spaces, they might get binned together, so some lines and vanishing points may be merged.

Figure 3: Scene with two planes containing parallel lines

(g) Suppose the sets of parallel lines described thus far in this problem belong to N planes, where the number of planes, N, is unknown. Each plane contains multiple sets of parallel lines with different directions. A simple illustration of this setup can be seen in Figure 3 with N = 2 planes. How might we find N, the number of planes in the scene, and their normal directions using techniques developed in earlier parts of this question? What additional information might you need? Hint: What properties do vanishing points that belong to the same plane have? Are they geometrically related?

Solution:

We can use the Hough transform as described in parts (d) and (e) to find the vanishing points; some might be spurious because there are many more intersections. All vanishing points for a plane should fall on the same horizon line (vanishing line). So we can use a Hough transform on these vanishing points to find the most likely set of horizon lines. The number of peaks in the Hough transform is an estimate for the number of planes. The horizon line $\vec{l}$ for each plane can be used to find the normal $n$ from the expression $\vec{l} = K^{-T} n$ (equivalently, $n = K^T \vec{l}$). We would need the camera matrix $K$ to find the normal.

(h) Extra Credit

One way to estimate the position $(x_0, y_0)$ of the vanishing point is to fit a model of the form found in (c) to the $n$ observed Hough peaks that lie at $(r_i, \theta_i)$. We could then use least squares to minimize the sum of squared errors between the measured $r_i$ and that predicted by the model $r(\theta_i, x_0, y_0)$:

$$\min_{x_0, y_0} \sum_{i=1}^{n} \left(r_i - r(\theta_i, x_0, y_0)\right)^2$$

This is an elegant way to find a vanishing point just from the point measurements of the parallel lines in a scene.

Formulate an appropriate objective to describe this problem and solve it using least squares. For this part, you can assume that all peaks in Hough space correspond to lines intersecting at a single vanishing point $(x_0, y_0)$ in the image. What is the expression for the optimal vanishing point? Note: it is OK to leave matrix inverses in your solution, but try to simplify other terms as much as possible.

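Solution:

With $r(\theta_i, x_0, y_0) = x_0 \cos(\theta_i) + y_0 \sin(\theta_i)$, setting the partial derivatives of the objective with respect to $x_0$ and $y_0$ to zero gives

$$\sum_{i=1}^{n} \cos(\theta_i) \left(r_i - x_0 \cos(\theta_i) - y_0 \sin(\theta_i)\right) = 0$$

$$\sum_{i=1}^{n} \sin(\theta_i) \left(r_i - x_0 \cos(\theta_i) - y_0 \sin(\theta_i)\right) = 0$$

which can be rewritten as

$$\sum_{i=1}^{n} \cos(\theta_i)^2\, x_0 + \sum_{i=1}^{n} \cos(\theta_i)\sin(\theta_i)\, y_0 = \sum_{i=1}^{n} \cos(\theta_i)\, r_i$$

$$\sum_{i=1}^{n} \cos(\theta_i)\sin(\theta_i)\, x_0 + \sum_{i=1}^{n} \sin(\theta_i)^2\, y_0 = \sum_{i=1}^{n} \sin(\theta_i)\, r_i$$

In matrix form:

$$\begin{bmatrix} \sum_{i=1}^{n} \cos(\theta_i)^2 & \sum_{i=1}^{n} \cos(\theta_i)\sin(\theta_i) \\ \sum_{i=1}^{n} \cos(\theta_i)\sin(\theta_i) & \sum_{i=1}^{n} \sin(\theta_i)^2 \end{bmatrix}
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}
= \begin{bmatrix} \sum_{i=1}^{n} \cos(\theta_i)\, r_i \\ \sum_{i=1}^{n} \sin(\theta_i)\, r_i \end{bmatrix}$$

$$\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}
= \begin{bmatrix} \sum_{i=1}^{n} \cos(\theta_i)^2 & \sum_{i=1}^{n} \cos(\theta_i)\sin(\theta_i) \\ \sum_{i=1}^{n} \cos(\theta_i)\sin(\theta_i) & \sum_{i=1}^{n} \sin(\theta_i)^2 \end{bmatrix}^{-1}
\begin{bmatrix} \sum_{i=1}^{n} \cos(\theta_i)\, r_i \\ \sum_{i=1}^{n} \sin(\theta_i)\, r_i \end{bmatrix}$$

Alternate formulations using SVD are also possible.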
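The closed-form solution above can be sketched directly, inverting the 2 × 2 matrix by hand. The peak values below are synthetic, generated from a made-up vanishing point, and `fit_vanishing_point` is a hypothetical name; real Hough peaks would be noisy and quantized.

```python
import math

def fit_vanishing_point(peaks):
    """Least-squares (x0, y0) from Hough peaks given as (theta, r) pairs.

    Solves the 2x2 normal equations of r_i = x0 cos(theta_i) + y0 sin(theta_i).
    """
    a = b = d = u = w = 0.0
    for theta, r in peaks:
        c, s = math.cos(theta), math.sin(theta)
        a += c * c      # sum of cos^2
        b += c * s      # sum of cos * sin
        d += s * s      # sum of sin^2
        u += c * r      # sum of cos * r
        w += s * r      # sum of sin * r
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ValueError("peaks do not determine a unique point")
    # Inverse of [[a, b], [b, d]] is [[d, -b], [-b, a]] / det.
    return (d * u - b * w) / det, (a * w - b * u) / det

# Synthetic, noise-free peaks from an illustrative vanishing point.
true_vp = (400.0, -150.0)
thetas = [0.2, 0.9, 1.4, 2.3]
peaks = [(th, true_vp[0] * math.cos(th) + true_vp[1] * math.sin(th)) for th in thetas]

print(fit_vanishing_point(peaks))
```

At least two peaks with different $\theta$ are needed; with a single peak (or identical angles) the matrix is singular, which the sketch reports as an error.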

4 Blob detection

In lecture we learned how we can use a convolution with a Laplacian of a Gaussian to find edges in an image. In this question we will explore this idea in more detail.

(a) Suppose that you have the following black-and-white image. In our image encoding, white is represented by the value of 1 and black the value of 0.

Sketch a one-dimensional slice of the intensity (i.e. brightness) of this image in the x-direction. Then sketch the result of convolving this signal with the Laplacian of a Gaussian and mark where the edges are located in the plot of the convolution. Your sketches do not need to be scaled correctly because we have not specified the width of the Gaussian. Please ignore border effects, i.e. ignore the effect of convolving the kernel with the borders of the image.

Draw your answer here:

Solution:

The signal starts at a high intensity / brightness (white), goes to low values (black), and then goes back to high values (white). We therefore draw the signal as starting at 1, going to 0, and going back to 1 (the shape of the graph is the important part here; the scale is arbitrary).

After convolving the signal with the Laplacian of a Gaussian, the response looks like this (ignoring border effects):

Note that this is the reverse of the response from the lecture slides, since the signal in the lecture slides went from low to high to low, whereas the signal here goes from high to low to high.

(b) Given the following formula for a 2D Gaussian centered at (0, 0):

$$g = \frac{1}{2\pi\sigma^2}\, e^{-(x^2 + y^2)/(2\sigma^2)}$$

take derivatives and compute the formula for the 2D Laplacian of a Gaussian.

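Solution:

First, we have the formula for a 2D Gaussian:

$$g = \frac{1}{2\pi\sigma^2}\, e^{-(x^2 + y^2)/(2\sigma^2)}$$

Taking the first partial derivative:

$$\frac{\partial g}{\partial x} = -\frac{x g}{\sigma^2}$$

The second partial derivative is then:

$$\frac{\partial^2 g}{\partial x^2} = -\frac{g}{\sigma^2} + \frac{x^2 g}{\sigma^4}$$

Similarly:

$$\frac{\partial^2 g}{\partial y^2} = -\frac{g}{\sigma^2} + \frac{y^2 g}{\sigma^4}$$

So we can write the sum of the partial derivatives as:

$$\frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2} = \frac{(x^2 + y^2 - 2\sigma^2)\, g}{\sigma^4}$$

Therefore:

$$\begin{aligned}
\nabla^2_{\text{norm}}\, g &= \sigma^2 \left(\frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}\right) \\
&= \frac{(x^2 + y^2 - 2\sigma^2)\, g}{\sigma^2} \\
&= \frac{1}{2\pi\sigma^4}\, e^{-(x^2 + y^2)/(2\sigma^2)}\, (x^2 + y^2 - 2\sigma^2)
\end{aligned}$$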
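As a sanity check on the derivation, the sketch below compares the closed-form (unnormalized) Laplacian with a central finite-difference approximation of $\partial^2 g/\partial x^2 + \partial^2 g/\partial y^2$ at one point. The evaluation point, $\sigma$, step size, and function names are illustrative assumptions.

```python
import math

def gaussian(x, y, sigma):
    """2D Gaussian centered at the origin."""
    return math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def log_closed_form(x, y, sigma):
    """Laplacian of the Gaussian: (x^2 + y^2 - 2 sigma^2) g / sigma^4."""
    return (x * x + y * y - 2 * sigma ** 2) * gaussian(x, y, sigma) / sigma ** 4

def log_normalized(x, y, sigma):
    """Scale-normalized Laplacian: sigma^2 times the Laplacian."""
    return sigma ** 2 * log_closed_form(x, y, sigma)

def log_finite_diff(x, y, sigma, h=1e-3):
    """Five-point central-difference approximation of g_xx + g_yy."""
    g = gaussian
    return (g(x + h, y, sigma) + g(x - h, y, sigma)
            + g(x, y + h, sigma) + g(x, y - h, sigma)
            - 4 * g(x, y, sigma)) / (h * h)

x, y, sigma = 0.7, -0.4, 1.5   # illustrative evaluation point and scale
print(log_closed_form(x, y, sigma), log_finite_diff(x, y, sigma))
```

The normalized kernel changes sign where $x^2 + y^2 = 2\sigma^2$, which is the property used in part (c).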

(c) Suppose that you convolve a 2D Laplacian of a Gaussian with the following image, where the circle has radius $r$. Using your answers from the previous parts, find the characteristic scale of this image, in terms of the radius of the circle $r$.

Hint: At the characteristic scale, the convolution will have the same zeros as the Laplacian convolution kernel.

Solution:

From part (b), the zeros of the Laplacian occur when

$$x^2 + y^2 - 2\sigma^2 = 0$$

Because the convolution has the same zeros as the Laplacian convolution kernel at the characteristic scale (given in the hint), we want the zeros of the kernel to align with the edges of the image. The edges of the image occur at

$$x^2 + y^2 = r^2$$

Combining the above equations, we get that the characteristic scale occurs at

$$r^2 - 2\sigma^2 = 0$$

or

$$\sigma = r / \sqrt{2}$$