
Image Processing

Outline

• Logistics

• Motivation

• Convolution

• Filtering
Waitlist

• We are at 103 enrolled with 158 students on the wait list. This room holds 107.

• I'm getting numerous requests of the form "how likely is it that I'll get registered?" Unlikely :(

• If you are considering dropping, please do so quickly.
Some final class philosophies
• The diverse background of the class means folks will find some topics redundant and others new (e.g., EE folks might be bored by today's signal processing)

• I think 1-way lectures are boring (and such content can easily be found elsewhere). Discussions are way more fun! I encourage you to come to class.

• I hate PowerPoint. I'd rather write on the board, but this room is not conducive to it. I still encourage you to take notes.

• If you are going to come and check e-mail / Facebook, I’d rather you
drop now to make room for someone else who’d get more out of
lecture.
Outline

• Logistics

• Motivation

• Convolution

• Filtering
Computational perspective

David Marr (1970s) is credited with an early computational approach to vision.

(Slide credit: Fei-Fei Li & Andrej Karpathy, Lecture 1, 5 Jan 2015)

David Marr, 1982

Low-level    Mid-level    High-level


Low-level vision

Finding edges, blobs, bars, etc….


Consider a family of low-level image processing operations.

Photoshop / Instagram filters: blur, sharpen, colorize, etc.

Are certain combinations redundant? Is there a mathematical way to characterize them?


Recall: what is a digital (grayscale) image?

Matrix of integer values


Images as height fields

Let's think of an image as a zero-padded function

F[i,j]
Characterizing image
transformations

F[i,j] → T → G[i,j]

F[i] → T → G[i]

F[i]: 5 4 2 3 7 4 6 5 3 6

G = T(F)
G[i] = T(F[i])

(Abuse of notation: this does not mean the transformation is applied at each pixel separately.)
How do we characterize image processing operations?

Properties of "nice" functional transformations

Additivity
T(F₁ + F₂) = T(F₁) + T(F₂)

Scaling
T(αF) = αT(F)

Direct consequence: Linearity

T(αF₁ + βF₂) = αG₁ + βG₂

Shift invariance
G[i − j] = T(F[i − j])
Impulse response
[also called delta function]

δ[i] = 1 for i = 0 (0 otherwise)

What does this look like for an image?

Any function can be written as a linear combination of shifted and scaled impulse responses

(Figure 1: Staircase approximation to a continuous-time signal — the signal drawn as a sum of scaled, shifted impulses.)

F[i] = ?

F[i] = F[0]δ[i] + F[1]δ[i − 1] + ...
     = Σ_u F[u]δ[i − u]
Convolution

(Figure 1: Staircase approximation to a continuous-time signal.)

F[i] = F[0]δ[i] + F[1]δ[i − 1] + ...
     = Σ_u F[u]δ[i − u]

Representing signals with impulses. Any signal can be expressed as a sum of scaled and shifted unit impulses. We begin with the pulse or "staircase" approximation to a continuous signal, as illustrated in Fig. 1. Conceptually, this is trivial: for each discrete sample of the original signal, we make a pulse signal. Then we add up all these pulse signals to make up the approximate signal. Each of these pulse signals can in turn be represented as a standard pulse scaled by the appropriate value and shifted to the appropriate place. In mathematical notation:

T(F[i]) = Σ_u F[u] T(δ[i − u])

G[i] = Σ_u F[u] H[i − u],   where H[i] = T(δ[i]) is the impulse response (also called the filter or kernel) and G[i] = T(F[i]). Therefore,

G = F ∗ H

(As we let the pulse width approach zero, the approximation becomes better and better, and in the limit equals the continuous signal; the summation approaches an integral, and the pulse approaches the unit impulse.)

Example

G[i] = F[i] ∗ H[i] = Σ_u F[u]H[i − u] = Σ_u H[u]F[i − u]

H: 1 2 3            F: 5 4 2 3 7 4 6 5 3 6
(i = 0 1 2)         (i = 0 1 2 3 4 5 6 7 8 9)

G[0] = ?    G[1] = ?

For comparison, cross-correlation:
G[i] = F[i] ⊗ H[i] = Σ_u H[u]F[i + u]

In 2D:
G[i,j] = F ∗ H = Σ_u Σ_v F[u,v]H[i − u, j − v]
G[i,j] = F ∗ H = H ∗ F = Σ_u Σ_v H[u,v]F[i − u, j − v]
Example

1 2 3  ∗  5 4 2 3 7 4 6 5 3 6

(i = −3 −2 −1 0 1 2 3 4 5 6 7 8 9; the flipped kernel 3 2 1 slides across F)

G[0] = 5×1 = 5
G[1] = 5×2 + 4×1 = 14
G[2] = 5×3 + 4×2 + 2×1 = 25
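The worked example above can be checked directly. A minimal numpy sketch (the course uses MATLAB's conv; np.convolve is assumed here as the equivalent):

```python
import numpy as np

# Worked example from the slide: convolve F with H using
# G[i] = sum_u F[u] * H[i - u].
F = np.array([5, 4, 2, 3, 7, 4, 6, 5, 3, 6])
H = np.array([1, 2, 3])

G = np.convolve(F, H)  # 'full' output, length len(F) + len(H) - 1

print(G[:3])  # [ 5 14 25 ] -- matches G[0], G[1], G[2] above
```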

Preview of 2D

(Figure: a 2D kernel h sliding over an image f.)
Properties of convolution

F ∗ H = H ∗ F                       Commutative

(F ∗ H) ∗ G = F ∗ (H ∗ G)           Associative

(F ∗ G) + (H ∗ G) = (F + H) ∗ G     Distributive

Implies that we can efficiently implement complex operations.

A powerful way to think about any image transformation that satisfies additivity, scaling, and shift-invariance.
Proof: commutativity

H ∗ F = Σ_u H[u]F[i − u] = Σ_{u′} H[i − u′]F[u′]   where u′ = i − u
      = Σ_u F[u]H[i − u] = F ∗ H

Conceptually wacky: allows us to interchange the filter and image.
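These properties are easy to verify numerically; a numpy sketch under the same definitions:

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.integers(0, 10, size=20)
F2 = rng.integers(0, 10, size=20)
H = rng.integers(0, 10, size=5)
G = rng.integers(0, 10, size=3)

# Commutative: F * H == H * F
assert np.array_equal(np.convolve(F, H), np.convolve(H, F))

# Associative: (F * H) * G == F * (H * G)
assert np.array_equal(np.convolve(np.convolve(F, H), G),
                      np.convolve(F, np.convolve(H, G)))

# Distributive over addition: (F + F2) * H == F*H + F2*H
assert np.array_equal(np.convolve(F + F2, H),
                      np.convolve(F, H) + np.convolve(F2, H))
```

Integer signals keep the comparison exact (no floating-point tolerance needed).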


Size

Given F of length N and H of length M, what's the size of G = F ∗ H?

N + M − 1    >> conv(F,H,'full')
N − M + 1    >> conv(F,H,'valid')
N            >> conv(F,H,'same')

Deva Ramanan, January 14, 2015
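The same three modes exist in numpy's convolve, which makes the size rule easy to confirm:

```python
import numpy as np

F = np.arange(10)   # N = 10
H = np.ones(3)      # M = 3

print(len(np.convolve(F, H, 'full')))   # N + M - 1 = 12
print(len(np.convolve(F, H, 'valid')))  # N - M + 1 = 8
print(len(np.convolve(F, H, 'same')))   # N = 10
```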

A simpler approach

G[i,j] = F ⊗ H = Σ_u Σ_v H[u,v]F[i + u, j + v]

1 2 3  ⊗  5 4 2 3 7 4 6 5 3 6
(kernel indexed −1 0 1; signal indexed 0 … 9)

Scan the original F instead of the flipped version. What's the math?

(Cross) correlation

G[i,j] = F ⊗ H = Σ_u Σ_v H[u,v]F[i + u, j + v]

In 1D, with a kernel of radius k:

F[i] ⊗ H[i] = Σ_{u=−k}^{k} H[u]F[i + u]

Properties

The associative and commutative properties do not hold for correlation
… but correlation is easier to think about.


Convolution vs correlation (1-d)

G[i] = F[i] ∗ H[i] = Σ_u F[u]H[i − u]     (convolution)

     = H[i] ∗ F[i] = Σ_u H[u]F[i − u]     (commutative property)

G[i] = F[i] ⊗ H[i] = Σ_u H[u]F[i + u]     (cross-correlation)

     = F[i] ∗ H[−i]                        (exercise for reader!)
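The "exercise for reader" relation — correlation equals convolution with a flipped kernel — checks out numerically; a numpy sketch:

```python
import numpy as np

F = np.array([5, 4, 2, 3, 7, 4, 6, 5, 3, 6], dtype=float)
H = np.array([1, 2, 3], dtype=float)

# Convolution: G[i] = sum_u F[u] H[i - u]
conv = np.convolve(F, H, 'full')

# Cross-correlation: G[i] = sum_u H[u] F[i + u]
corr = np.correlate(F, H, 'full')

# Correlating with H is the same as convolving with the flipped kernel H[-i]
assert np.allclose(corr, np.convolve(F, H[::-1], 'full'))
```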

In 2D:

G[i,j] = F ∗ H = Σ_u Σ_v F[u,v]H[i − u, j − v]
Image filtering (2D correlation)

G[i,j] = F ⊗ H = Σ_{u=−k}^{k} Σ_{v=−k}^{k} H[u,v]F[i + u, j + v]

Mean filter: h[·,·] = (1/9) ×
1 1 1
1 1 1
1 1 1

(Figure: input image f[.,.], a block of 90s on a background of 0s, and its smoothed output g[.,.] with values 0, 10, 20, …, 90.)

Gaussian filtering

A Gaussian kernel gives less weight to pixels further from the center of the window:

(1/16) ×
1 2 1
2 4 2
1 2 1

This kernel is an approximation of a Gaussian function.

Slide by Steve Seitz
60
Convolution vs correlation (2-d)

Convolution:
G[i,j] = F ∗ H = H ∗ F = Σ_u Σ_v H[u,v]F[i − u, j − v]
>> conv2(H,F)

Correlation:
G[i,j] = F ⊗ H = Σ_{u=−k}^{k} Σ_{v=−k}^{k} H[u,v]F[i + u, j + v]
>> filter2(H,F)

Can we compute correlation with convolution?


Annoying details
What is the size of the output?
• MATLAB: filter2(g, f, shape)
Border effects
• shape = ‘full’: output size is sum of sizes of f and g
• shape = ‘same’: output size is same as f
• shape = ‘valid’: output size is difference of sizes of f and g

full                same                valid

(Figure: the three output shapes — 'full' includes every position where f and g overlap at all, 'same' crops the result to f's size, 'valid' keeps only positions where g fits entirely inside f.)
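The 'same' option amounts to zero-padding the borders of f. As a sketch (not the course's reference code), a direct numpy implementation of zero-padded 'same' correlation, roughly MATLAB's filter2(H, F):

```python
import numpy as np

def correlate2d_same(F, H):
    """2D cross-correlation with zero padding, 'same' output size.
    Assumes a square, odd-sized kernel H."""
    k = H.shape[0] // 2
    Fp = np.pad(F, k)    # zero-pad the borders
    G = np.zeros_like(F, dtype=float)
    for i in range(F.shape[0]):
        for j in range(F.shape[1]):
            G[i, j] = np.sum(H * Fp[i:i + 2*k + 1, j:j + 2*k + 1])
    return G

F = np.zeros((5, 5)); F[2, 2] = 1.0          # unit impulse image
H = np.arange(9, dtype=float).reshape(3, 3)  # arbitrary 3x3 kernel

G = correlate2d_same(F, H)
# Correlating an impulse stamps down the *flipped* kernel;
# convolution would reproduce H itself.
assert np.array_equal(G[1:4, 1:4], H[::-1, ::-1])
```

The impulse test at the end is a handy way to remember which operation flips the kernel.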
Border padding
Examples of correlation
Linear filters: examples

(1/9) ×
1 1 1
1 1 1
1 1 1

Original → Blur (with a mean filter)

Source: D. Lowe
Practice with linear filters

0 0 0
0 1 0   → ?
0 0 0

Original

Source: D. Lowe
Practice with linear filters

0 0 0
0 1 0
0 0 0

Original → Filtered (no change)

Source: D. Lowe
Practice with linear filters

0 0 0
0 0 1   → ?
0 0 0

Original

Source: D. Lowe
Practice with linear filters

0 0 0
0 0 1
0 0 0

Original → Shifted left by 1 pixel

Source: D. Lowe

What would this look like for convolution?
Practice with linear filters

(1/16) ×
1 2 1
2 4 2
1 2 1

Original → Blurred (Gaussian approximation)

What would this look like for convolution?

Source: D. Lowe
Practice with linear filters

0 0 0        1 2 1
0 2 0   −    2 4 2  /16   → ?
0 0 0        1 2 1

Original

Source: D. Lowe
Practice with linear filters

( 0 0 0        1 2 1          0 0 0
  0 1 0   −    2 4 2  /16 ) + 0 1 0
  0 0 0        1 2 1          0 0 0

  unit impulse (identity) − blurred image, plus the unit impulse

Sharpen filter: the original image plus the detail lost by blurring.

Original → Sharpened

Source: D. Lowe

(Figure: scaled impulse, Gaussian, Laplacian of Gaussian.)

Unsharp filter
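The unsharp-mask idea (sharpen = 2·impulse − blur) can be sanity-checked in numpy:

```python
import numpy as np

impulse = np.zeros((3, 3)); impulse[1, 1] = 1.0
blur = np.array([[1, 2, 1],
                 [2, 4, 2],
                 [1, 2, 1]]) / 16.0

# Sharpen = identity + (identity - blur) = 2*impulse - blur
sharpen = 2 * impulse - blur

# The kernel sums to 1, so flat regions pass through unchanged;
# only the detail (high-frequency) content gets amplified.
assert np.isclose(sharpen.sum(), 1.0)
print(sharpen)
```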
Examples
Image rotation

(Figure: is there a kernel h[m,n] such that f[m,n] ⊗ h[m,n] = g[m,n] rotates the image?)

Can rotations be represented with a convolution?

Rotation is linear, but it is not a spatially invariant operation, so there is no convolution that implements it.

Are they linear shift-invariant (LSI) operations G[i,j] = T(F[i,j])?
Derivative filters (correlation)

[ −1  1 ]     (horizontal derivative)

[ −1 ]
[  1 ]        (vertical derivative)
Practice with linear filters

Question: what happens as we repeatedly convolve an image F with filter H?

F → F ∗ H → (F ∗ H) ∗ H → …

Aside for the probability junkies: the PDF of the sum of two random variables = the convolution of their PDFs. Repeated convolutions => repeated sums => CLT, so the repeated filter tends toward a Gaussian.
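The CLT aside can be demonstrated by convolving a box filter with itself; a numpy sketch:

```python
import numpy as np

box = np.ones(3) / 3.0   # box filter = uniform PMF over offsets {-1, 0, 1}
kernel = box.copy()
for _ in range(19):      # 20 box filters convolved together
    kernel = np.convolve(kernel, box)

# The result is still a valid smoothing filter (sums to 1)...
assert np.isclose(kernel.sum(), 1.0)

# ...and, as the CLT predicts, it approaches a Gaussian whose variance is
# the sum of the individual filters' variances: 20 * (2/3) here.
x = np.arange(len(kernel)) - len(kernel) // 2
var = np.sum(kernel * x**2)
assert np.isclose(var, 20 * 2/3)
```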
Gaussian

(1/16) ×
1 2 1
2 4 2
1 2 1
Gaussian filters

σ = 1 pixel    σ = 5 pixels    σ = 10 pixels    σ = 30 pixels

Implementation: Gaussian kernel

Matlab: >> G = fspecial('gaussian',HSIZE,SIGMA)

σ = 2 with 30 x 30 kernel        σ = 5 with 30 x 30 kernel

• Standard deviation σ: determines extent of smoothing

Source: K. Grauman
Finite-support filters
Choosing kernel width

• The Gaussian function has infinite support, but discrete filters use finite kernels

What should HSIZE be?

Source: K. Grauman
Rule-of-thumb

Set radius of filter to be 3 sigma
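A sketch of this rule of thumb in numpy (gaussian_kernel is my name, not a library function; MATLAB's fspecial('gaussian', hsize, sigma) plays a similar role for the right hsize):

```python
import numpy as np

def gaussian_kernel(sigma):
    """1D Gaussian filter, truncated at the rule-of-thumb radius of 3*sigma."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()   # normalize so flat regions pass through unchanged

g = gaussian_kernel(2.0)
print(len(g))  # 13: hsize = 2*ceil(3*sigma) + 1
```

Beyond 3σ the Gaussian has decayed to about 1% of its peak, so truncating there loses little.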


Useful representation:
Gaussian pyramid

Filter + subsample (to exploit redundancy in output)

Figure 1: Depicted are four levels of the Gaussian pyramid, levels 0 to 3 presented from left to right.

Burt & Adelson 83
http://persci.mit.edu/pub_pdfs/pyramid83.pdf
Smoothing filters vs edge filters

How should filters behave on a flat region with value 'v'?

Smoothing filter: output 'v'        Edge filter: output 0

Σ_ij H[i,j] = 1                     Σ_ij H[i,j] = 0
Template matching with filters

Goal: find the template H[i,j] in image F[i,j]

Main challenge: What is a good similarity or distance measure between two patches?
• Correlation
• Zero-mean correlation
• Sum of Squared Differences (SSD)
• Normalized Cross-Correlation

Slide by Derek Hoiem

Can we use filtering to build detectors?
Attempt 1: correlate with eye patch

G[i,j] = Σ_{u=−k}^{k} Σ_{v=−k}^{k} H[u,v]F[i + u, j + v]
       = Hᵀ F_ij = ||H|| ||F_ij|| cos θ_ij,   H, F_ij ∈ R^((2k+1)²)

Useful way to think about correlation: an inner product between the filter H and the local patch F_ij, viewed as vectors at angle θ_ij.

Input → Filtered Image

What went wrong?

Slide by Derek Hoiem

Attempt 1.5: correlate with transformed eye patch

Goal: find the template in the image

Let's transform the filter such that its response on a flat region is 0.

Slide by Derek Hoiem
Attempt 1.5: correlate with zero-mean eye patch

G[i,j] = Σ_{u=−k}^{k} Σ_{v=−k}^{k} (H[u,v] − H̄) F[i + u, j + v]
       = Σ_u Σ_v H[u,v]F[i + u, j + v]  −  H̄ Σ_u Σ_v F[i + u, j + v]

where H̄ = mean of H

Input → Filtered Image (scaled) → Thresholded Image
True detections, but also false detections.
Attempt 2: SSD

SSD[i,j] = ||H − F_ij||² = (H − F_ij)ᵀ(H − F_ij)

(In Hoiem's notation: h[m,n] = Σ_{k,l} (g[k,l] − f[m+k, n+l])²)

Can this be implemented with filtering?

Input → 1 − sqrt(SSD) → Thresholded Image
True detections.
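SSD can indeed be implemented with filtering: expanding the square gives ||H||² − 2(H ⊗ F) + (1 ⊗ F²), and each term is a correlation. A 1D numpy sketch (illustrative, not the course's code):

```python
import numpy as np

rng = np.random.default_rng(1)
F = rng.random(50)   # "image"
H = rng.random(5)    # "template"

# Brute-force SSD at every window position
ssd_brute = np.array([np.sum((H - F[i:i+5])**2) for i in range(46)])

# Same thing via filtering: ||H||^2 - 2 (H corr F) + (ones corr F^2)
ssd_filt = (np.sum(H**2)
            - 2 * np.correlate(F, H, 'valid')
            + np.correlate(F**2, np.ones(5), 'valid'))

assert np.allclose(ssd_brute, ssd_filt)
```

Only the middle term depends on H; the last term is a local energy map of F that can be shared across templates.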
What will SSD find here?

What's the potential downside of SSD?
(The eyes in this image have been darkened by a 0.5 scale factor.)

Method 2: SSD
h[m,n] = Σ_{k,l} (g[k,l] − f[m+k, n+l])²

Input → 1 − sqrt(SSD)

SSD is sensitive to overall brightness: it will fire on the shirt rather than the darkened eyes.

Slide by Derek Hoiem
Normalized cross-correlation

NCC[i,j] = Hᵀ F_ij / (||H|| ||F_ij||) = cos θ_ij,   where H and F_ij are mean-centered

Method 3: normalized cross-correlation

Input → Normalized X-Correlation → Thresholded Image
True detections.
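A minimal 1D sketch of Method 3 (the function name ncc and the toy signal are mine, not from the slides):

```python
import numpy as np

def ncc(F, H):
    """Normalized cross-correlation of signal F against template H,
    with both the template and each patch mean-centered."""
    Hc = H - H.mean()
    out = np.zeros(len(F) - len(H) + 1)
    for i in range(len(out)):
        patch = F[i:i + len(H)]
        Pc = patch - patch.mean()
        # cos of the angle between centered patch and centered template
        out[i] = np.dot(Hc, Pc) / (np.linalg.norm(Hc) * np.linalg.norm(Pc) + 1e-12)
    return out

template = np.array([0., 3., 0.])
signal = np.array([1., 1., 1., 2., 9., 2., 1., 1.])
scores = ncc(signal, template)

# Invariant to brightness and contrast changes of the image:
assert np.argmax(scores) == np.argmax(ncc(0.5 * signal + 10, template))
```

The mean-centering and normalization are exactly what make NCC robust to the darkened-eyes failure of SSD above.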
Modern filter banks

Convolutional Neural Nets (CNNs), LeCun et al. 98
Learn filters from training data to look for low-, mid-, and high-level features.

The above approaches to filtering were largely hand designed. This is partly due to limitations in computing power and lack of access to large datasets in the 80s and 90s. In modern approaches to image recognition the convolution kernels/filtering operations are often learned from huge amounts of training data.

In 1998 Yann LeCun created a Convolutional Network (named "LeNet") that could recognize hand-written digits using a sequence of filtering operations, subsampling and assorted nonlinearities, the parameters of which were learned via stochastic gradient descent on a large, labeled training set. Rather than hand selecting the filters to use, part of LeNet's training was to pick for itself the most effective set of filters. Modern ConvNets use basically the same structure as LeNet, but because of richer training sets and greater computing power we can recognize far more complex objects than handwritten digits (see, for example, GoogLeNet in 2014 and other submissions to the ImageNet challenge).
A look back

• Any linear shift-invariant operation can be characterized by a convolution

• (Convolution) Correlation intuitively corresponds to (flipped) matched filters

• Derive filters from continuous operations (derivative, Gaussian, …)

• Contemporary application: convolutional neural networks
