

Springer Topics in Signal Processing

Volume 10

For further volumes:

Amar Mitiche J. K. Aggarwal

Computer Vision Analysis of Image Motion by Variational Methods

Amar Mitiche INRS-Energie, Matériaux et Télécommunications Institut National de la Recherche Scientifique Montreal, QC Canada

J. K. Aggarwal Department of Electrical and Computer Engineering The University of Texas Austin, TX USA

ISSN 1866-2609          ISSN 1866-2617 (electronic)
ISBN 978-3-319-00710-6  ISBN 978-3-319-00711-3 (eBook)
DOI 10.1007/978-3-319-00711-3

Springer Cham Heidelberg New York Dordrecht London

Library of Congress Control Number: 2013939338

© Springer International Publishing Switzerland 2014
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)


Chapter 1

Image Motion Processing in Visual Function

Retinal motion comes about whenever we move or look at moving objects. Small involuntary retinal movements take place even when we fixate on a stationary target. Processing of this ever-present image motion plays several fundamental functional roles in human vision. In machine vision as well, image motion processing by computer vision algorithms serves, in many useful applications, several essential functions reminiscent of the processing by the human visual system. As the following discussion sets out to show, computer vision modelling of motion has addressed problems similar to some that have arisen in human vision research, including those concerning the earliest fundamental questions and explanations put forth by Helmholtz and by Gibson about human motion perception. However, computer vision motion models have evolved independently of human perception concerns and specificities, much as the camera has evolved independently of the understanding of human eye biology and function [1].

1.1 Image Motion in Visual Function

The most obvious role of image motion processing by the human visual system is to perceive the motion of real objects. The scope and quality of this perception vary widely according to the visual task performed, ranging from detection, where a moving-versus-static labelling of objects in the visual field is sufficient, to event interpretation, where a characterization of motion by more detailed evaluation or attributes is required. A less evident role is the perception of depth. Computational and experimental investigations have revealed the link between image motion and the variables of depth and three-dimensional (3D) motion. To emphasize this role of image motion, Nakayama and Loomis [2] named kineopsis, by analogy to stereopsis, the process of recovering depth and 3D motion from image motion.


Kineopsis: The role of motion in the perception of depth, and structure thereof, has been known for a long time. In the words of Helmholtz, for instance ([3], p. 297), over a hundred years ago in his Handbook of Physiological Optics, 1910:

“If anybody with two good eyes will close one of them and look at unfamiliar objects of irregular form, he will be apt to get a wrong, or at any rate an unreliable, idea of their shape. But the instant he moves about, he will begin to have the correct apperceptions.”

He adds the following explanation as to the origin of this perception of environmental structure, or apperception, as he called it:

“In the variations of the retinal image, which are the results of movements, the only way an apperception of differences of distance is obtained is by comparing the instantaneous image with the previous images in the eye that are retained in memory.”

This is the first recorded enunciation of structure-from-motion, tying the perception of structure to image brightness variations. By distinguishing geometry from photometry, Gibson elaborated on this Helmholtz view of structure-from-motion and stated in his book The Perception of the Visual World, 1950, that image motion was the actual stimulus for the perception of structure, rather than image variations as Helmholtz conjectured. He was quite explicit about it when he wrote ([4], p. 119):

“When it is so considered, as a projection of the terrain or as the projection of an array of slanted surfaces, the retinal image is not a picture of objects but a complex of variations. If the relative motion is analyzed out and isolated from the complex of other variations, it proves to be a lawful and regular phenomenon. Defined as a gradient of motion, it is potentially a stimulus correlate for an experience of continuous distance on a surface, as we shall see, and one no longer is required to postulate a process of unconscious inference about isolated objects.”

By gradient of motion Gibson meant not the spatial or temporal variations of image motion but the image motion field itself, or optical flow, stating, when he discussed the example of the motion field on the retina of a flier landing on a runway ([4], p. 128), that:

“The gradients of motion are approximately represented by a set of vectors indicating direction and rate at various points. All velocities vanish at the horizon”.

The veracity of Gibson’s statement that image motion is the stimulus for the perception of structure is not so surprising when we observe that the perception of the structure of a surface in motion does not change for different texture coverings of this surface. There have been several experiments designed to demonstrate unequivocally this perception of structure-from-motion, first the landmark kinetic depth effect experiment of Wallach and O’Connell [5], which used the shadow of a tilted rod projected on a translucent screen that viewers observed from the back. It was also demonstrated by Gibson et al. [6], who used a texture of paint splashed on two lined-up parallel transparent screens, the shadows of which were presented to viewers on a frontal translucent screen. The most striking demonstrations are perhaps the experiments of Ullman [7] and of Rogers and Graham [8] with random dot distributions. Random dots constitute stimuli void of any texture or geometric arrangement. Rogers and Graham’s demonstration [8] is to some extent a mechanical counterpart of Ullman’s experiment with computer-generated random dots on rotating cylinders [7]. Ullman presented viewers with the orthographic projection on a computer screen of about a hundred points on each of two imaginary coaxial


Fig. 1.1 Ullman’s rotating cylinders setup simulated by a computer program: Viewers were shown the orthographic projection on a computer screen of a set of about a hundred random points on each of two coaxial cylinders of different radii. The cylinders’ outline was not included in the display, so that they were imaginary to the viewers and, therefore, contributed no clue to the perception. Looking at the random-dot image on the screen when the cylinders were not moving afforded no perception of depth. But when the cylinders were made to move, by a computer program, observers reported the vivid perception of two rotating coaxial cylinders and were also able to give a good estimate of the amount of rotation.

cylinders of different radii (Fig. 1.1). The cylinders were imaginary in the sense that their outline was not presented in the display so as not to offer viewers a cue to the perception of structure. Looking at the image on the screen of the random dots on static cylinders afforded no perception of depth. But when the cylinders were made to move, by a computer program, observers reported perceiving vividly two rotating coaxial cylinders and could also estimate the amount of rotation.

The view of Helmholtz on the role of image variations in the perception of structure, which we quoted previously, is quite general but otherwise correct, because the link between image variations and image motion is lawful, to employ this expression often used by Gibson to mean a relation which can be formally specified by governing laws. Horn and Schunck provided us with such a law [9] in the form of a relation, or equation, deduced from the assumption that the image sensed from a given point of a surface in space remains unchanged when the surface moves. The equation, which we will investigate thoroughly in Chap. 3 and use repeatedly in the other chapters, is called the optical flow constraint, or the Horn and Schunck equation:

$$I_x u + I_y v + I_t = 0, \qquad (1.1)$$

where $I_x$, $I_y$, $I_t$ are the image spatiotemporal derivatives, $t$ being the time and $x, y$ the image spatial coordinates, and $(u, v) = (dx/dt, dy/dt)$ is the optical flow vector. The equation is written for every point of the image positional array.
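As a concrete illustration, not drawn from the book, the constraint (1.1) can be checked numerically on a pair of frames. The sketch below, in Python with NumPy, assumes two consecutive grayscale frames stored as 2D arrays; it estimates $I_x$, $I_y$, $I_t$ with finite differences and returns the residual $I_x u + I_y v + I_t$ of a candidate flow field. The function name and the synthetic test pattern are ours, chosen only for the example.

```python
import numpy as np

def constraint_residual(frame0, frame1, u, v):
    """Residual of the optical flow constraint I_x*u + I_y*v + I_t (Eq. 1.1).

    frame0, frame1 : consecutive grayscale frames (2D arrays).
    u, v           : candidate optical flow components, same shape as the frames.
    A small residual at a pixel means (u, v) is consistent with the constraint there.
    """
    I0 = np.asarray(frame0, dtype=float)
    I1 = np.asarray(frame1, dtype=float)
    Ix = np.gradient(I0, axis=1)   # spatial derivative along x (columns)
    Iy = np.gradient(I0, axis=0)   # spatial derivative along y (rows)
    It = I1 - I0                   # temporal derivative over a unit time step
    return Ix * u + Iy * v + It

# Illustrative check: a smooth pattern shifted one pixel to the right corresponds
# to (u, v) = (1, 0); the residual is then small except near the wrap-around seam.
yy, xx = np.mgrid[0:64, 0:64]
frame0 = np.sin(0.2 * xx) + np.cos(0.15 * yy)
frame1 = np.roll(frame0, 1, axis=1)
residual = constraint_residual(frame0, frame1, np.ones((64, 64)), np.zeros((64, 64)))
```

The single equation does not determine $(u, v)$ by itself, which is why variational methods add further constraints, as discussed in Chap. 3.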

As to the link between image motion and environmental motion and structure, one can get a law, or equation, by drawing a viewing system configuration model, projecting points in three-dimensional space onto the imaging surface, and taking the time derivative of the projected points’ coordinates. Under a Cartesian reference

system and a central projection model of imaging (to be detailed in Chap. 6), one can immediately get equations connecting optical flow to 3D motion and depth:

$$u = \frac{fU - xW}{Z}, \qquad v = \frac{fV - yW}{Z}, \qquad (1.2)$$

where $X, Y, Z$ are the 3D coordinates, $Z$ being the depth, and $U = dX/dt$, $V = dY/dt$, $W = dZ/dt$ are the corresponding coordinates of the 3D motion vector, called the scene flow vector, and $f$ is a constant representing the focal length of imaging. The equation applies to every point on the visible environmental surfaces. The viewing system configuration model, and Eq. (1.2), as well as other equations which follow from it, will be the subject of Chap. 6.
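To make Eq. (1.2) concrete, here is a small sketch, again ours rather than the book’s, that projects a 3D point under central projection with focal length $f$ and evaluates the image velocity induced by a given scene flow vector; the function name and the numerical values are illustrative only.

```python
def image_velocity(f, point3d, scene_flow):
    """Image position and induced optical flow of a 3D point, per Eq. (1.2).

    f          : focal length of the central projection model.
    point3d    : (X, Y, Z) coordinates of the point, with depth Z > 0.
    scene_flow : (U, V, W) = (dX/dt, dY/dt, dZ/dt), the scene flow vector.
    Returns ((x, y), (u, v)).
    """
    X, Y, Z = point3d
    U, V, W = scene_flow
    x, y = f * X / Z, f * Y / Z     # central projection onto the image plane
    u = (f * U - x * W) / Z         # time derivative of x = f*X/Z
    v = (f * V - y * W) / Z         # time derivative of y = f*Y/Z
    return (x, y), (u, v)

# Two points sharing the same 3D velocity but at different depths: the image
# velocity is inversely proportional to depth, the familiar motion parallax.
_, flow_near = image_velocity(1.0, (0.2, 0.1, 2.0), (0.5, 0.0, 0.0))
_, flow_far = image_velocity(1.0, (0.2, 0.1, 8.0), (0.5, 0.0, 0.0))
```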

We will nonetheless mention now the special but prevailing case of rigid 3D motion, i.e., compositions of 3D translations and rotations. When we express the 3D velocity $(U, V, W)$ as coming from rigid body motion, we have the Longuet-Higgins and Prazdny model equations [10]:

$$u = -\frac{xy}{f}\,\omega_1 + \frac{f^2 + x^2}{f}\,\omega_2 - y\,\omega_3 + \frac{f\tau_1 - x\tau_3}{Z},$$
$$v = -\frac{f^2 + y^2}{f}\,\omega_1 + \frac{xy}{f}\,\omega_2 + x\,\omega_3 + \frac{f\tau_2 - y\tau_3}{Z}, \qquad (1.3)$$

where $\tau_1, \tau_2, \tau_3$ are the coordinates of the translational component of the rigid motion and $\omega$