SimNets
Nadav Cohen, Amnon Shashua (HUJI)
November 2014
Outline
5 Experiments
6 Summary
Artificial neuron
Sigmoid: $\varphi(z) = \frac{1}{1+\exp\{-z\}}$    ReLU: $\varphi(z) = \max\{0, z\}$
ConvNet example
Kernel methods for deep learning [Cho and Saul, “Kernel methods for deep learning”]
Similarity operator (with a point-wise similarity mapping $\phi$):

$u^\top \phi(x, z) = \sum_{i=1}^{d} u_i \cdot \phi(x, z)_i$
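As a concrete illustration, here is a minimal Python sketch (not from the slides; the function name and the `form` flag are illustrative) of the weighted similarity $u^\top \phi(x,z)$ for the three point-wise similarities considered later (linear, l1, l2):

```python
import numpy as np

def similarity(x, z, u, form="l2"):
    """Weighted similarity u^T phi(x, z) = sum_i u_i * phi(x, z)_i."""
    if form == "lin":
        phi = x * z               # linear: phi(x, z)_i = x_i * z_i
    elif form == "l1":
        phi = -np.abs(x - z)      # l1:     phi(x, z)_i = -|x_i - z_i|
    elif form == "l2":
        phi = -(x - z) ** 2       # l2:     phi(x, z)_i = -(x_i - z_i)^2
    else:
        raise ValueError(f"unknown form: {form}")
    return u @ phi

# Unweighted similarity corresponds to u = 1:
x, z = np.array([1.0, 2.0]), np.array([0.5, 2.5])
print(similarity(x, z, np.ones(2), form="l2"))   # -(0.5^2) - (0.5^2) = -0.5
```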
[Figure: similarity layer; H×W×D input with patches x_ij, templates z_1, …, z_n; output out(i, j, l) = u_l^⊤ φ(x_ij, z_l)]
$\mathrm{MEX}_\xi\{c_i\}_{i=1,\dots,n} := \frac{1}{\xi}\log\left(\frac{1}{n}\sum_{i=1}^{n}\exp\{\xi \cdot c_i\}\right)$
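A minimal, numerically stable implementation of the MEX operator (a sketch, not from the slides; SciPy's `logsumexp` is used for stability):

```python
import numpy as np
from scipy.special import logsumexp

def mex(c, xi):
    """MEX_xi{c_i}_{i=1..n} = (1/xi) * log((1/n) * sum_i exp(xi * c_i)).

    Limits: xi -> +inf gives max(c), xi -> -inf gives min(c),
    and xi -> 0 gives the arithmetic mean of c.
    """
    c = np.asarray(c, dtype=float)
    if xi == 0:
        return c.mean()                      # the xi -> 0 limit
    return (logsumexp(xi * c) - np.log(c.size)) / xi

print(mex([1.0, 2.0, 3.0], xi=100.0))   # ~3.0 (approaches max)
print(mex([1.0, 2.0, 3.0], xi=0.0))     # 2.0  (mean)
print(mex([1.0, 2.0, 3.0], xi=-100.0))  # ~1.0 (approaches min)
```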
[Figure: MEX layer; out(t) = MEX_ξ({inp(s) + b_ts}_{s∈block(t)}, c_t); recovering ReLU: out(t) = max{inp(t), 0}]

Simply set:
Input blocks – single entries (output dimensions same as input’s).
b_ts = 0
c_t = 0
ξ → +∞
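A quick numeric check of this special case (a sketch; a stabilized inline MEX keeps the snippet self-contained):

```python
import numpy as np

def mex(c, xi):
    c = np.asarray(c, dtype=float)
    m = c.max()                               # shift for numerical stability
    return m + np.log(np.mean(np.exp(xi * (c - m)))) / xi

# Single-entry block {inp_t}, offsets b_ts = 0, constant c_t = 0, large xi:
for t in (-1.5, 0.0, 0.3, 2.0):
    print(t, "->", round(mex([t, 0.0], xi=1e3), 4))   # approaches max{t, 0}
```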
[Figure: MEX layer realizing pooling; out(i, j, l) = max/mean{inp(i′, j′, l) : (i′, j′) ∈ pool(i, j)}]

Simply set:
Input blocks – 2D windows (output depth same as input’s).
b_ts = 0
Omit c_t
ξ → +∞ for max-pooling, ξ → 0 for average-pooling.
Note that ξ can be learned during training, i.e. a trade-off between max- and average-pooling can be learned.
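A sketch of MEX pooling over one window, showing the max/average trade-off controlled by ξ (illustrative code, not from the slides):

```python
import numpy as np
from scipy.special import logsumexp

def mex(c, xi):
    c = np.asarray(c, dtype=float)
    if xi == 0:
        return c.mean()
    return (logsumexp(xi * c) - np.log(c.size)) / xi

window = np.array([0.1, 0.5, 0.9, 0.2])   # one flattened 2x2 pooling window
print(mex(window, xi=1e3))   # ~0.900 -> max-pooling     (xi -> +inf)
print(mex(window, xi=0.0))   # 0.425  -> average-pooling (xi -> 0)
print(mex(window, xi=5.0))   # in between; xi can be learned during training
```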
The SimNet architecture
[Figure: basic SimNet; input x, hidden similarity units sim(l) = u_l^⊤ φ(x, z_l) for l = 1, …, n, outputs out(r) = MEX_ξ{sim(l) + b_rl}_{l=1}^{n} for r = 1, …, k]

$\hat{y}(x) = \operatorname{argmax}_{r=1,\dots,k} \mathrm{MEX}_\xi\{u_l^\top \phi(x, z_l) + b_{rl}\}_{l=1}^{n}$
$\hat{y}(x) = \operatorname{argmax}_{r=1,\dots,k} \sum_{l=1}^{n} \alpha_{rl} \cdot K(x, z_l)$

For all considered similarities, $K(x, z_l) := \exp\{\xi \cdot \sum_{i=1}^{d} \phi(x, z_l)_i\}$ is a kernel function:
$K_{\mathrm{lin}}(x, z) = \exp\{\xi \cdot \langle x, z\rangle\}$ – “Exponential” kernel
$K_{l_1}(x, z) = \exp\{-\xi\,\|x - z\|_1\}$ – “Laplacian” kernel
$K_{l_2}(x, z) = \exp\{-\xi\,\|x - z\|_2^2\}$ – “RBF” kernel

Corollary
For all considered similarities, the basic SimNet with fixed ξ > 0 and uniform weights (u_l = 1) is a “reduced” kernel-SVM.
The network similarity templates z_1, …, z_n are the (reduced) support vectors, and the MEX offsets b_rl are directly related to the SVM coefficients.
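The corollary can be checked numerically. Below is an illustrative sketch (random templates and offsets, unweighted l2 similarity) verifying that exp(ξ · MEX) equals a kernel expansion with the “RBF” kernel and coefficients α_rl = exp(ξ · b_rl)/n, so the two classifiers agree:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, d, xi = 5, 3, 4, 2.0           # templates, classes, dim, fixed xi > 0
Z = rng.normal(size=(n, d))           # similarity templates z_1..z_n
B = rng.normal(size=(k, n))           # MEX offsets b_rl
x = rng.normal(size=d)

# Basic SimNet with unweighted l2 similarity: phi(x, z)_i = -(x_i - z_i)^2
sim = -((x - Z) ** 2).sum(axis=1)                              # 1^T phi(x, z_l)
out = np.log(np.mean(np.exp(xi * (sim + B)), axis=1)) / xi     # MEX per class

# Equivalent kernel machine with K(x, z) = exp(-xi * ||x - z||^2) ("RBF")
K = np.exp(xi * sim)                    # K(x, z_l)
alpha = np.exp(xi * B) / n              # alpha_rl = exp(xi * b_rl) / n
scores = alpha @ K                      # sum_l alpha_rl K(x, z_l)

assert np.allclose(np.exp(xi * out), scores)   # exp(xi*MEX) = kernel sum
assert np.argmax(out) == np.argmax(scores)     # same predicted class
```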
SimNets and kernel machines: a basic neural-network analogy (input → hidden layer → output)
With weighted similarities (i.e. when u_l are not fixed), the basic SimNet is no longer a kernel machine.
$\hat{y}(x) = \operatorname{argmax}_{r=1,\dots,k} \max_{l\in[n]}\{u_l^\top \phi(x, z_l) + b_{rl}\}$
Then, up to boundary conditions: $A_r = \bigcup_{l\in[n]} A_{r,l}$, where $A_r$ is the decision region of class r and $A_{r,l}$ is the region in which class r is predicted with the maximum attained at index l.
With unweighted linear similarity (φ(x, z)_i = x_i z_i, u_l = 1), the sets

$A^{r,l}_{r',l'} = \{x : \langle x, z_l\rangle + b_{rl} \geq \langle x, z_{l'}\rangle + b_{r'l'}\}$

are half-spaces. $A_{r,l}$ are intersections of half-spaces (polytopes), and the decision region $A_r$ is thus a union of n polytopes.

With unweighted l2-similarity (φ(x, z)_i = −(x_i − z_i)², u_l = 1):

$A^{r,l}_{r',l'} = \{x : -\|x - z_l\|_2^2 + b_{rl} \geq -\|x - z_{l'}\|_2^2 + b_{r'l'}\}$
$\phantom{A^{r,l}_{r',l'}} = \{x : 2\langle x, z_l\rangle - \|z_l\|_2^2 + b_{rl} \geq 2\langle x, z_{l'}\rangle - \|z_{l'}\|_2^2 + b_{r'l'}\}$

i.e. $A^{r,l}_{r',l'}$ are again half-spaces, $A_{r,l}$ are polytopes and $A_r$ is a union of n polytopes.

Adding weights to the l2-similarity (u_l no longer fixed) converts $A^{r,l}_{r',l'}$ from half-spaces into regions bounded by quadratic hypersurfaces, since the weighted quadratic terms no longer cancel.
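A small sketch confirming the algebra above: with unweighted l2 similarity the pairwise score difference is affine in x, so each $A^{r,l}_{r',l'}$ is indeed a half-space (random values, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
zl, zlp = rng.normal(size=d), rng.normal(size=d)   # templates z_l, z_l'
brl, brplp = 0.3, -0.7                              # offsets b_rl, b_r'l'

def score_diff(x):
    # (-||x - z_l||^2 + b_rl) - (-||x - z_l'||^2 + b_r'l')
    return (-np.sum((x - zl)**2) + brl) - (-np.sum((x - zlp)**2) + brplp)

def affine_form(x):
    # 2<x, z_l - z_l'> - ||z_l||^2 + ||z_l'||^2 + b_rl - b_r'l'
    return 2 * x @ (zl - zlp) - zl @ zl + zlp @ zlp + brl - brplp

x = rng.normal(size=d)
assert np.isclose(score_diff(x), affine_form(x))   # boundary is a hyperplane
```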
“Locality-sharing-pooling SimNet”:

[Figure: input → similarity → pooling → output; sim(i, j, l) = u_l^⊤ φ(x_ij, z_l); pool(p_h, p_w, l) = MEX_{ξ₁}{sim(i, j, l) : (i, j) s.t. q(i, j) = (p_h, p_w)}; out(r) = MEX_{ξ₂}{pool(p_h, p_w, l) + b_{r,l,p_h,p_w}}_{p_h,p_w,l}]
Three layers:
1 Similarity layer
2 MEX layer for pooling: 2D input blocks, c_t terms omitted, offsets zeroed.
3 MEX layer for classification: densely connected, c_t terms omitted, offsets serve for classification.
SimNets and kernel machines: a basic 3-layer SimNet with locality, sharing and pooling
As before, we set u_l = 1 and denote the kernel function exp{ξ · 1^⊤ φ(x, z)} by K(x, z). The classification becomes:

$\hat{y}(\mathrm{inp}) = \operatorname{argmax}_{r=1,\dots,k} \sum_{p_h,p_w,l} \alpha_{r l p_h p_w} \sum_{i,j:\,q(i,j)=(p_h,p_w)} K(x_{ij}, z_l)$
where:
K(·,·) is a kernel function.
Classified instance X contains the concatenation of all input patches.
Support vectors Z_{l,p_h,p_w} are concatenations of vectors subject to:
“Sharing” constraint: entries that correspond to the pool index (p_h, p_w) contain copies of z_l.
“Locality” constraint: entries that do not correspond to the pool index (p_h, p_w) contain “null values”.
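As a numeric sanity check, here is a sketch (random data; it assumes the same ξ in both MEX layers, u_l = 1 and the l2 similarity) verifying that the 3-layer network output is a monotone transform of the double kernel sum above, with α_{r,l,p_h,p_w} = exp(ξ · b_{r,l,p_h,p_w})/(P·n·m):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, d, xi = 4, 3, 5, 1.5         # templates, classes, patch dim, shared xi
P, m = 2, 6                         # pooling regions, patches per region
Z = rng.normal(size=(n, d))         # templates z_l
B = rng.normal(size=(k, P, n))      # classification offsets b_{r,p,l}
X = rng.normal(size=(P, m, d))      # input patches x_ij grouped by region

# Layer 1: unweighted l2 similarity, sim[p, i, l] = -||x_pi - z_l||^2
sim = -((X[:, :, None, :] - Z[None, None, :, :]) ** 2).sum(-1)   # (P, m, n)

# Layer 2: MEX pooling over each region (offsets zeroed, c_t omitted)
pool = np.log(np.mean(np.exp(xi * sim), axis=1)) / xi            # (P, n)

# Layer 3: dense MEX classification over all (p, l)
out = np.log(np.mean(np.exp(xi * (pool[None] + B)), axis=(1, 2))) / xi

# Equivalent kernel machine with K(x, z) = exp(-xi * ||x - z||^2)
K = np.exp(xi * sim)                        # K(x_pi, z_l), shape (P, m, n)
alpha = np.exp(xi * B) / (P * n * m)        # alpha_{r,p,l}
scores = np.einsum('kpn,pmn->k', alpha, K)  # sum_{p,l} alpha sum_{ij in p} K

assert np.allclose(np.exp(xi * out), scores)
assert np.argmax(out) == np.argmax(scores)
```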
Other SimNet settings – global average pooling

[Figure: input → similarity → shared MEX offsets → output; sim(i, j, l) = u_l^⊤ φ(x_ij, z_l); label(i, j, r) = MEX_{ξ₁}{sim(i, j, l) + b_{r,l,q(i,j)}}_{l=1}^{n}; out(r) = MEX_{ξ₂}{label(i, j, r)}_{i,j}]

Three layers:
1 Similarity layer
2 MEX layer with offsets shared across pool regions, computing per-location class scores label(i, j, r)
3 MEX layer pooling the per-location class scores into out(r) – global average pooling¹ as ξ₂ → 0

¹ Lin, Chen, Yan. “Network in network”.
The exact same rationale and resulting initialization scheme presented for
l2 -similarity apply also to the case of l1 -similarity, the only difference being
the replacement of the GMM with a mixture of Laplacian distributions,
where each Laplacian has statistically independent coordinates.
[Figure: similarity layer followed by a MEX layer with per-location offsets; sim(i, j, l) = u_l^⊤ φ(x_ij, z_l); mex(t) = MEX_ξ({sim(i, j, l) + b_{t,(i,j,l)}}_{(i,j,l)∈block(t)}, c_t)]
This suggests estimating the mixture separately for each location, and
calculating offsets such that when appended to the outputs of the similarity
layer, the probabilistic heat maps take into account location dependency.
The computed offsets serve for initialization of the MEX layer’s offsets.
Experiments

² Coates et al. “An analysis of single-layer networks in unsupervised feature learning”.
Evaluated SimNet

[Figure: 32×32×3 input; 6×6×3 patches x_ij, stride 1 → 27×27×n similarity maps sim(i, j, l) = u_l^⊤ φ(x_ij, z_l) (weighted l2/l1 similarity); per-location class scores label(i, j, r) = MEX_{ξ₁}{sim(i, j, l) + b_{r,l,q(i,j)}}_{l=1}^{n}, with q(i, j) ∈ {1, 2}² indexing 2×2 shared-offset regions; out(r) = MEX_{ξ→0}{label(i, j, r)}_{1≤i,j≤27}, i.e. global average pooling]
Evaluated ConvNet

[Figure: same topology; conv(i, j, l) = ⟨x_ij, z_l⟩; relu(i, j, l) = max{conv(i, j, l), 0}; pool(p_h, p_w, l) = max{relu(i, j, l) : 1 ≤ i, j ≤ 27, q(i, j) = (p_h, p_w)}; out(r) = ⟨pool, w_r⟩]
[Figure: evaluated single-layer feature-encoding network of Coates et al.²; code(i, j, l) = max{mean_{l′} ‖x_ij − z_{l′}‖₂ − ‖x_ij − z_l‖₂, 0}; pool(p_h, p_w, l) aggregates code(i, j, l) over {(i, j) : 1 ≤ i, j ≤ 27, q(i, j) = (p_h, p_w)}; out(r) = ⟨pool, w_r⟩]
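For reference, a sketch of the “triangle” K-means encoding used by the third evaluated network (following the formula above; the function name is illustrative):

```python
import numpy as np

def triangle_encode(x, Z):
    """code_l = max{ mean_{l'} ||x - z_l'||_2 - ||x - z_l||_2, 0 }

    Coates et al.'s soft K-means feature encoding: centroids closer
    than the average distance get a positive code, the rest get 0.
    """
    d = np.linalg.norm(x - Z, axis=1)    # distances to all centroids z_l
    return np.maximum(d.mean() - d, 0.0)

Z = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
print(triangle_encode(np.array([0.5, 0.5]), Z))   # [1.414, 1.414, 0.0]
```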
Results
Conclusions
Summary
The SimNets architecture consists of two basic building blocks:
Similarity operator: generalizes the ConvNet convolutional operator.
MEX operator: generalizes ConvNet ReLU activation and pooling, but allows much more...
Summary (cont’d)
Thank You