
SimNets

A Generalization of Convolutional Networks

Nadav Cohen Amnon Shashua

The Hebrew University of Jerusalem

November 2014



SimNets — Goals

Generalization: an architecture which includes ConvNets as a special case.

Abstraction: higher abstraction levels of each layer can potentially give rise to more compact networks (fewer layers, fewer channels).

Initialization: statistical analysis of unlabeled data (K-means, GMM, Laplacian mixtures) forms a "natural" initialization of parameters, and could help in determining network architecture (layers, channels).

Kernel Machines: a stronger connection to classical machine learning could open new doors from analysis to optimization.


Outline

1 Convolutional neural networks (ConvNets)

2 The SimNet architecture

3 SimNets and kernel machines

A basic neural-network analogy: input → hidden layer → output
A basic 3-layer SimNet with locality, sharing and pooling

4 Other SimNet settings – global average pooling

5 Experiments

6 Summary


Convolutional neural networks (ConvNets)

Artificial neuron

Common activation functions:

Sigmoid: φ(z) = 1 / (1 + exp{−z})
ReLU: φ(z) = max{0, z}


Artificial neural network (ANN)

instance to classify: x = (x_1, ..., x_d)
object class of x: y ∈ {1, ..., k}
prediction rule: ŷ(x) = argmax_{r=1,...,k} o_r


ConvNet example

ConvNet: locality, sharing and pooling.

Source: Zeiler and Fergus. Visualizing and understanding convolutional networks.


Other deep learning approaches

HMAX [Serre et al. Robust object recognition with cortex-like mechanisms.]

Kernel methods for deep learning [Cho and Saul. Kernel methods for deep learning.]

Sum-product networks [Poon and Domingos. Sum-Product Networks: A New Deep Architecture.]

Invariant scattering convolution networks [Bruna and Mallat. Invariant Scattering Convolution Networks.]

Network in network [Lin, Chen, Yan. Network in network.]

Polynomial networks [Livni et al. An algorithm for training polynomial networks.]



The SimNet architecture

The SimNet architecture consists of two basic building blocks:

Similarity operator: generalizes the inner-product (convolutional) operator found in ConvNets.
MEX operator: replaces ConvNet ReLU activation and max/average pooling, but allows much more...


The similarity operator

The "similarity" between input x ∈ R^d and template z ∈ R^d with corresponding weights u ∈ R^d_+:

u^T φ(x, z) = Σ_{i=1}^d u_i · φ(x, z)_i

φ : R^d × R^d → R^d – point-wise similarity mapping. We consider the following forms:

"linear": φ_lin(x, z)_i = x_i · z_i
"l1": φ_l1(x, z)_i = −|x_i − z_i|
"l2": φ_l2(x, z)_i = −(x_i − z_i)^2

When setting u = 1, the corresponding similarities reduce to ⟨x, z⟩, −‖x − z‖_1 and −‖x − z‖_2^2 respectively.
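As an illustration (not part of the original slides), a minimal NumPy sketch of the weighted similarity operator and its three point-wise forms, checking the u = 1 reductions stated above:

```python
import numpy as np

def pointwise_similarity(x, z, form="l2"):
    """Point-wise similarity mapping phi(x, z) in R^d for the three forms above."""
    if form == "linear":
        return x * z                  # phi(x, z)_i = x_i * z_i
    if form == "l1":
        return -np.abs(x - z)         # phi(x, z)_i = -|x_i - z_i|
    if form == "l2":
        return -(x - z) ** 2          # phi(x, z)_i = -(x_i - z_i)^2
    raise ValueError(form)

def similarity(x, z, u, form="l2"):
    """Weighted similarity u^T phi(x, z), with u in R^d_+."""
    return float(np.dot(u, pointwise_similarity(x, z, form)))

d = 5
rng = np.random.default_rng(0)
x, z, u = rng.normal(size=d), rng.normal(size=d), rng.random(size=d)

# With u = 1 the similarities reduce to <x, z>, -||x - z||_1 and -||x - z||_2^2.
ones = np.ones(d)
assert np.isclose(similarity(x, z, ones, "linear"), x @ z)
assert np.isclose(similarity(x, z, ones, "l1"), -np.abs(x - z).sum())
assert np.isclose(similarity(x, z, ones, "l2"), -((x - z) ** 2).sum())
```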


The similarity layer

[Figure: an H × W × D input is scanned with h × w × D patches x_ij; the layer outputs n feature maps with out(i, j, l) = u_l^T φ(x_ij, z_l).]

Templates z_1, ..., z_n ∈ R^{hwD} with corresponding weights u_1, ..., u_n ∈ R^{hwD}_+ are applied to input patches, creating n output channels ("feature maps").

Note that setting u_l = 1 and φ = φ_lin reduces the similarity layer to a standard convolutional layer.
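A minimal sketch of a similarity layer, assuming a dense stride-1 scan of h × w × D patches; with the linear form and unit weights it reduces to an ordinary convolutional layer, as noted above:

```python
import numpy as np

def similarity_layer(inp, Z, U, form="l2"):
    """inp: H x W x D array; Z, U: (n, h, w, D) templates and positive weights.
    Returns out with out[i, j, l] = u_l^T phi(x_ij, z_l) over all h x w x D patches."""
    H, W, D = inp.shape
    n, h, w, _ = Z.shape
    out = np.empty((H - h + 1, W - w + 1, n))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            patch = inp[i:i + h, j:j + w, :]
            for l in range(n):
                if form == "linear":
                    phi = patch * Z[l]              # phi_i = x_i * z_i
                elif form == "l1":
                    phi = -np.abs(patch - Z[l])     # phi_i = -|x_i - z_i|
                else:                               # "l2"
                    phi = -(patch - Z[l]) ** 2      # phi_i = -(x_i - z_i)^2
                out[i, j, l] = np.sum(U[l] * phi)   # weighted similarity u_l^T phi
    return out

# Setting U = 1 and form="linear" makes out[i, j, l] = <x_ij, z_l>,
# i.e. a standard convolutional layer, as noted above.
```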


The MEX operator

Max-min-Expectation Collapsing Smooth ("CS" → "X") operator:

MEX_ξ{c_i}_{i=1,...,n} := (1/ξ) · log( (1/n) Σ_{i=1}^n exp{ξ·c_i} )

Parameter ξ ∈ R spans a continuum between max, expectation (mean) and min:

MEX_ξ{c_i} → max{c_i} as ξ → +∞
MEX_ξ{c_i} → mean{c_i} as ξ → 0
MEX_ξ{c_i} → min{c_i} as ξ → −∞

For a given ξ, MEX exhibits the following "collapsing" property:

MEX_ξ{ MEX_ξ{c_ij}_j }_i = MEX_ξ{c_ij}_{i,j}
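A hedged NumPy sketch of the MEX operator (using a numerically stable log-sum-exp), illustrating the max/mean/min limits and the collapsing property:

```python
import numpy as np
from scipy.special import logsumexp

def mex(c, xi, axis=None):
    """MEX_xi{c_i} = (1/xi) * log( mean_i exp(xi * c_i) ), computed stably via log-sum-exp."""
    c = np.asarray(c, dtype=float)
    n = c.size if axis is None else c.shape[axis]
    return (logsumexp(xi * c, axis=axis) - np.log(n)) / xi

c = np.random.randn(10)
print(mex(c, 1e3), c.max())     # xi -> +inf approaches max
print(mex(c, 1e-6), c.mean())   # xi -> 0 approaches the mean
print(mex(c, -1e3), c.min())    # xi -> -inf approaches min

# Collapsing property: MEX over rows of MEX over columns equals MEX over all entries.
C = np.random.randn(4, 6)
xi = 2.5
assert np.isclose(mex(mex(C, xi, axis=1), xi), mex(C, xi))
```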


The MEX layer

[Figure: output element t is computed from its input block as out(t) = MEX_ξ( {inp(s) + b_ts}_{s ∈ block(t)}, c_t ).]

Output element t is assigned by a MEX taken over the corresponding input block, where:
Output-specific offsets b_ts ∈ R are added to the elements of the block.
Optionally, the output-specific term c_t ∈ R participates in the MEX.


The MEX layer – generalization of ReLU activation

The MEX layer can realize the ReLU activation found in ConvNets:

[Figure: with single-entry input blocks, out(t) = MEX_ξ({inp(t)}, c_t = 0) → max{inp(t), 0} as ξ → +∞.]

Simply set:
Input blocks – single entries (output dimensions same as the input's).
b_ts = 0
c_t = 0
ξ → +∞


The MEX layer – generalization of max/average-pooling

The MEX layer can realize the max/average-pooling found in ConvNets:

[Figure: with 2D input blocks pool(i, j), out(i, j, l) = max/mean{ inp(i', j', l) : (i', j') ∈ pool(i, j) }.]

Simply set:
Input blocks – 2D windows (output depth same as the input's).
b_ts = 0
Omit c_t
ξ → +∞ for max-pooling, ξ → 0 for average-pooling.

Note that ξ can be learned during training, i.e. a trade-off between max- and average-pooling can be learned.
The SimNet architecture – generalization of ConvNets

To recap, the SimNet architecture can realize conventional ConvNets as follows (a short sketch follows the list):
Convolutional layer: similarity layer with linear form φ(x, z)_i = x_i · z_i and unit weights u_l = 1.
ReLU activation: MEX layer with single-entry input blocks, b_ts = 0, c_t = 0 and ξ → +∞.
Max-pooling: MEX layer with 2D input blocks, b_ts = 0, c_t omitted and ξ → +∞.
Dense layer: similarity layer with the entire input as its only block, linear form φ(x, z)_i = x_i · z_i and unit weights u_l = 1.
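A minimal sketch of the ReLU and pooling cases of this recap (the MEX helper is re-defined for self-containment, and the large finite ξ standing in for ξ → +∞ is an illustrative choice):

```python
import numpy as np
from scipy.special import logsumexp

def mex(c, xi, axis=None):
    c = np.asarray(c, dtype=float)
    n = c.size if axis is None else c.shape[axis]
    return (logsumexp(xi * c, axis=axis) - np.log(n)) / xi

BIG = 1e4  # stands in for xi -> +inf

# ReLU via MEX: single-entry blocks, b_ts = 0, c_t = 0, xi -> +inf.
v = np.random.randn(8)
relu_via_mex = np.array([mex([vi, 0.0], BIG) for vi in v])
assert np.allclose(relu_via_mex, np.maximum(v, 0), atol=1e-3)

# Max / average pooling via MEX: 2D blocks, offsets zero, c_t omitted.
fmap = np.random.randn(4, 4)                      # one channel, 2x2 pooling windows
blocks = [fmap[i:i+2, j:j+2].ravel() for i in (0, 2) for j in (0, 2)]
max_pool = np.array([mex(b, BIG) for b in blocks])
avg_pool = np.array([mex(b, 1e-6) for b in blocks])
assert np.allclose(max_pool, [b.max() for b in blocks], atol=1e-3)
assert np.allclose(avg_pool, [b.mean() for b in blocks], atol=1e-3)
```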



SimNets and kernel machines

So far, we set the architectural choices of SimNets to realize the special case of classical ConvNets. We did not make use of many available options, such as the l1 and l2 similarities, and the MEX offsets b_ts.

Next, we consider two basic SimNet constructions, exploring their connections to kernel machines:

Basic neural-network analogy: input → hidden layer → output
Basic 3-layer network with locality, sharing and pooling



A basic neural-network analogy: input → hidden layer → output

The "basic SimNet":

[Figure: an input x ∈ R^{H·W·D} feeds a similarity layer with n hidden units, sim(l) = u_l^T φ(x, z_l), followed by a MEX output layer with k outputs, out(r) = MEX_ξ{sim(l) + b_rl}_{l=1}^n = MEX_ξ{u_l^T φ(x, z_l) + b_rl}_{l=1}^n.]


Classification corresponding to this network:

ŷ(x) = argmax_{r=1,...,k} MEX_ξ{u_l^T φ(x, z_l) + b_rl}_{l=1}^n

MEX combines weighted similarities to n templates, where offsets assign the relevancy of templates to classes. For ξ → +∞ the prediction is determined by the highest similarity ("nearest-neighbor").
The basic SimNet and kernel machines

Fix ξ > 0. The basic SimNet's classification rule becomes:

ŷ(x) = argmax_{r=1,...,k} MEX_ξ{u_l^T φ(x, z_l) + b_rl}_{l=1}^n
     = argmax_{r=1,...,k} Σ_{l=1}^n exp{ξ·b_rl} · exp{ξ·u_l^T φ(x, z_l)}
     = argmax_{r=1,...,k} Σ_{l=1}^n α_rl · exp{ξ·u_l^T φ(x, z_l)},   with α_rl := exp{ξ·b_rl}

Setting uniform weights u_l = 1 we get:

ŷ(x) = argmax_{r=1,...,k} Σ_{l=1}^n α_rl · exp{ ξ · Σ_{i=1}^d φ(x, z_l)_i },   with K(x, z_l) := exp{ ξ · Σ_{i=1}^d φ(x, z_l)_i }


The basic SimNet and kernel machines (cont'd)

ŷ(x) = argmax_{r=1,...,k} Σ_{l=1}^n α_rl · K(x, z_l)

For all considered similarities, K(x, z_l) := exp{ ξ · Σ_{i=1}^d φ(x, z_l)_i } is a kernel function:

K_lin(x, z) = exp{ξ·⟨x, z⟩} – "Exponential" kernel
K_l1(x, z) = exp{−ξ‖x − z‖_1} – "Laplacian" kernel
K_l2(x, z) = exp{−ξ‖x − z‖_2^2} – "RBF" kernel

Corollary
For all considered similarities, the basic SimNet with fixed ξ > 0 and uniform weights (u_l = 1) is a "reduced" kernel-SVM.
The network similarity templates z_1, ..., z_n are the (reduced) support vectors, and the MEX offsets b_rl are directly related to the SVM coefficients.
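A small numerical check (illustrative, with randomly drawn templates and offsets) that the MEX form of the basic SimNet with u_l = 1 and the l2 similarity coincides with the kernel-SVM form using the RBF kernel and α_rl = exp{ξ·b_rl}:

```python
import numpy as np
from scipy.special import logsumexp

def mex(c, xi):
    c = np.asarray(c, dtype=float)
    return (logsumexp(xi * c) - np.log(c.size)) / xi

d, n, k, xi = 5, 7, 3, 0.5
rng = np.random.default_rng(0)
x = rng.normal(size=d)
Z = rng.normal(size=(n, d))          # templates (the "reduced" support vectors)
B = rng.normal(size=(k, n))          # MEX offsets b_rl

# MEX form of the basic SimNet (u_l = 1, l2 similarity).
sims = np.array([-np.sum((x - Z[l]) ** 2) for l in range(n)])
y_mex = np.argmax([mex(sims + B[r], xi) for r in range(k)])

# Kernel-SVM form: argmax_r sum_l alpha_rl * K(x, z_l), RBF kernel, alpha_rl = exp(xi * b_rl).
K = np.exp(-xi * np.sum((x - Z) ** 2, axis=1))
alpha = np.exp(xi * B)
y_svm = np.argmax(alpha @ K)

assert y_mex == y_svm   # the two classification rules agree
```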
The basic SimNet and kernel machines (cont'd)

With weighted similarities (i.e. when u_l are not fixed), the basic SimNet is no longer a kernel machine.

More formally, as the following theorem states, learning weighted templates (z_l, u_l) through the support-vectors of a kernel machine is not possible in the cases of l1 and l2 similarities (for the linear similarity, weights are not applicable):

Theorem
For any dimension d ∈ N, constant c > 0 and p ∈ {1, 2}, there are no mappings Z : R^d → R^d and U : R^d → R^d_+ and a kernel K : (R^d × R^d_+) × (R^d × R^d_+) → R, such that for all z, x ∈ R^d and u ∈ R^d_+:

K([Z(x), U(x)], [z, u]) = exp{ −c · Σ_{i=1}^d u_i |x_i − z_i|^p }
The basic SimNet – abstraction level

We now turn to a qualitative study of the basic SimNet's abstraction level (ability to capture category distributions) under the different similarity measures.

Consider the basic SimNet's classification rule in the case ξ → +∞:

ŷ(x) = argmax_{r=1,...,k} max_{l ∈ [n]} {u_l^T φ(x, z_l) + b_rl}

Define A_r to be the decision region corresponding to class r ∈ {1, ..., k}, i.e. A_r := {x : ŷ(x) = r}.

For r' ∈ {1, ..., k} and l, l' ∈ {1, ..., n} define:

A_{r,l}^{r',l'} := {x : u_l^T φ(x, z_l) + b_rl ≥ u_{l'}^T φ(x, z_{l'}) + b_{r'l'}}

A_{r,l} := ∩_{(r',l') ≠ (r,l)} A_{r,l}^{r',l'}

Then, up to boundary conditions: A_r = ∪_{l ∈ [n]} A_{r,l}.
The basic SimNet – abstraction level with linear and unweighted l2 similarities

In the case of linear similarity (φ(x, z)_i = x_i · z_i, u_l = 1) the A_{r,l}^{r',l'} are half-spaces:

A_{r,l}^{r',l'} = {x : ⟨x, z_l⟩ + b_rl ≥ ⟨x, z_{l'}⟩ + b_{r'l'}}

A_{r,l} are intersections of half-spaces (polytopes), and the decision region A_r is thus a union of n polytopes.

With unweighted l2-similarity (φ(x, z)_i = −(x_i − z_i)^2, u_l = 1):

A_{r,l}^{r',l'} = {x : −‖x − z_l‖_2^2 + b_rl ≥ −‖x − z_{l'}‖_2^2 + b_{r'l'}}
               = {x : 2⟨x, z_l⟩ − ‖z_l‖_2^2 + b_rl ≥ 2⟨x, z_{l'}⟩ − ‖z_{l'}‖_2^2 + b_{r'l'}}

i.e. the A_{r,l}^{r',l'} are again half-spaces, A_{r,l} are polytopes and A_r is a union of n polytopes.

This implies that qualitatively, unweighted l2-similarity induces the same abstraction level as linear similarity (convolutional layer).
The basic SimNet – abstraction level with weighted l2 similarity

Adding weights to the l2-similarity (u_l no longer fixed) converts the A_{r,l}^{r',l'} from half-spaces to regions defined by second-order hyper-surfaces. A_{r,l} are not necessarily polytopes, and the shapes that A_r can take are enriched.

Qualitatively, the addition of weights to the l2-similarity increases the abstraction level above that of the linear similarity (convolutional layer).


The basic SimNet – abstraction level with l1 similarity

With l1 similarity (φ(x, z)_i = −|x_i − z_i|) the A_{r,l}^{r',l'} are regions defined by piecewise-linear surfaces.

In the unweighted case (u_l = 1) the space is partitioned equally (up to a shift caused by the offsets). Adding weights allows more complex decision surfaces.

Qualitatively, the abstraction level induced by the l1 similarity is higher than that of the linear similarity (convolutional layer).



A basic 3-layer SimNet with locality, sharing and pooling

"Locality-sharing-pooling SimNet":

[Figure: an H × W × D input feeds a similarity layer over h × w patches, sim(i, j, l) = u_l^T φ(x_ij, z_l); a MEX pooling layer, pool(p_h, p_w, l) = MEX_{ξ1}{sim(i, j, l)}_{(i,j) : q(i,j) = (p_h,p_w)}; and a MEX output layer, out(r) = MEX_{ξ2}{pool(p_h, p_w, l) + b_{r,l,p_h,p_w}}_{p_h,p_w,l}.]

Three layers:
1 Similarity layer
2 MEX layer for pooling: 2D input blocks, terms c_t omitted, offsets zeroed.
3 MEX layer for classification: densely connected, terms c_t omitted, offsets serve for classification.
Locality-sharing-pooling SimNet classification

Fixing ξ1 = ξ2 = ξ > 0, and using the MEX collapsing property, the network's classification becomes:

ŷ(inp) = argmax_{r=1,...,k} MEX_ξ{u_l^T φ(x_ij, z_l) + b_{r,l,q(i,j)}}_{i,j,l}

Recall the classification of the basic SimNet:

ŷ(x) = argmax_r MEX_ξ{u_l^T φ(x, z_l) + b_rl}_l

Two important differences here:

Similarity is applied to patches ("locality" + "sharing")
MEX offsets are shared across pooling regions ("pooling")


Locality-sharing-pooling SimNet and kernel machines

The network's classification can be expressed as:

ŷ(inp) = argmax_{r=1,...,k} Σ_{p_h,p_w,l} α_{r,l,p_h,p_w} · Σ_{i,j : q(i,j)=(p_h,p_w)} exp{ξ·u_l^T φ(x_ij, z_l)}

where α_{r,l,p_h,p_w} := exp{ξ·b_{r,l,p_h,p_w}}.

As before, we set u_l = 1 and denote the kernel function exp{ξ·1^T φ(x, z)} by K(x, z). The classification becomes:

ŷ(inp) = argmax_{r=1,...,k} Σ_{p_h,p_w,l} α_{r,l,p_h,p_w} · Σ_{i,j : q(i,j)=(p_h,p_w)} K(x_ij, z_l)

This classification can be expressed as a (reduced) kernel-SVM, with a kernel K̄(·, ·) based on K(·, ·), that is designed for instances represented by collections of vectors ("patches").


Locality-sharing-pooling SimNet and kernel machines (cont'd)

More explicitly, we may write:

ŷ(inp) = argmax_{r=1,...,k} Σ_{p_h,p_w,l} α_{r,l,p_h,p_w} · K̄(X, Z_{l,p_h,p_w})

where:
K̄(·, ·) is a kernel function.
The classified instance X contains the concatenation of all input patches.
The support-vectors Z_{l,p_h,p_w} are concatenations of vectors subject to:
"Sharing" constraint: entries that correspond to the pool index (p_h, p_w) contain copies of z_l.
"Locality" constraint: entries that do not correspond to the pool index (p_h, p_w) contain "null values".



Other SimNet settings – global average pooling

Following the "global average pooling" paradigm¹ recently suggested for ConvNets, we reverse the order of the MEX pooling and classification, referring to the resulting network as the "patch-labeling SimNet":

[Figure: a similarity layer, sim(i, j, l) = u_l^T φ(x_ij, z_l), is followed by a MEX labeling layer with offsets shared across pooling regions, label(i, j, r) = MEX_{ξ1}{sim(i, j, l) + b_{r,l,q(i,j)}}_l, and a global MEX pooling layer, out(r) = MEX_{ξ2}{label(i, j, r)}_{i,j}.]

Three layers:
1 Similarity layer
2 MEX layer for patch classification: cross-channel input blocks, terms c_t omitted, offsets serve for classification and are shared across regions.
3 MEX layer for pooling (combining patch classifications): channel input blocks, terms c_t omitted, offsets zeroed.

¹ Lin, Chen, Yan. "Network in network".
Patch labeling SimNet

The network's classification:

ŷ(inp) = argmax_{r=1,...,k} MEX_{ξ2}{ MEX_{ξ1}{u_l^T φ(x_ij, z_l) + b_{r,l,q(i,j)}}_l }_{i,j}

If we set ξ1 = ξ2 = ξ, and use the MEX collapsing property, we obtain the same classification as the locality-sharing-pooling SimNet:

ŷ(inp) = argmax_{r=1,...,k} MEX_ξ{u_l^T φ(x_ij, z_l) + b_{r,l,q(i,j)}}_{i,j,l}

In our experiments, the best results were obtained by a different setting – ξ2 → 0, which corresponds to:

ŷ(inp) = argmax_{r=1,...,k} Σ_{i,j} MEX_{ξ1}{u_l^T φ(x_ij, z_l) + b_{r,l,q(i,j)}}_l
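An illustrative end-to-end sketch of the patch-labeling SimNet's classification with a finite ξ1 and ξ2 → 0 (i.e. averaging the patch labelings); the l2 similarity, the 2×2 pooling map q(i, j) and all shapes are assumptions made for the example:

```python
import numpy as np
from scipy.special import logsumexp

def mex(c, xi, axis=-1):
    c = np.asarray(c, dtype=float)
    return (logsumexp(xi * c, axis=axis) - np.log(c.shape[axis])) / xi

def patch_labeling_simnet(inp, Z, U, B, xi1, h=6):
    """inp: H x W x D image; Z, U: (n, h*h*D) templates and positive weights;
    B: (k, n, 2, 2) offsets shared across the 2x2 pooling regions q(i, j).
    Returns the k class scores for xi2 -> 0 (mean over patch labelings)."""
    H, W, D = inp.shape
    k = B.shape[0]
    I = J = H - h + 1
    labels = np.empty((I, J, k))
    for i in range(I):
        for j in range(J):
            x_ij = inp[i:i + h, j:j + h, :].ravel()
            sim = -(U * (x_ij - Z) ** 2).sum(axis=1)       # weighted l2 similarity to each template
            qi, qj = (2 * i) // I, (2 * j) // J             # which 2x2 pooling region (i, j) falls in
            labels[i, j] = mex(sim + B[:, :, qi, qj], xi1)  # label(i, j, r), offsets shared per region
    return labels.mean(axis=(0, 1))                         # xi2 -> 0: average of the patch labelings

# Prediction: y_hat = np.argmax(patch_labeling_simnet(...)).
```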


Unsupervised initialization of l2-similarity layer

Consider a Gaussian Mixture Model (GMM) with n components and diagonal covariances – {N(μ_l, diag(σ_l)^2)}_{l=1}^n.

It can be shown that with l2-similarity (φ(x, z)_i = −(x_i − z_i)^2), setting z_l = μ_l and u_l = 0.5 · σ_l^{−2} gives:

u_l^T φ(x_ij, z_l) = log Pr(x_ij ∧ Gaussian l) + c_l

for some c_1, ..., c_n ∈ R. In other words, assuming input patches follow a GMM distribution as above, channel l of the layer's output holds (up to a constant) the probabilistic heat map of Gaussian l and the input patches.

In practice, this suggests estimating the GMM means and covariances based on unlabeled patches, and assigning templates and weights as above.
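A minimal sketch of this initialization, assuming scikit-learn's diagonal-covariance GaussianMixture as the estimator (a tooling choice not made in the slides):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def init_l2_similarity_from_gmm(patches, n_templates):
    """patches: (num_patches, h*w*D) array of unlabeled input patches.
    Returns templates Z and positive weights U for an l2-similarity layer."""
    gmm = GaussianMixture(n_components=n_templates, covariance_type="diag").fit(patches)
    Z = gmm.means_                # z_l = mu_l
    U = 0.5 / gmm.covariances_    # u_l = 1 / (2 * sigma_l^2), element-wise (diagonal covariances)
    return Z, U

# With these values, u_l^T phi(x, z_l) = -sum_i (x_i - mu_l,i)^2 / (2 sigma_l,i^2),
# i.e. the Gaussian log-density of component l up to an additive per-channel constant.
```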


Unsupervised initialization of l1-similarity layer

The exact same rationale and resulting initialization scheme presented for l2-similarity apply also to the case of l1-similarity, the only difference being the replacement of the GMM with a mixture of Laplacian distributions, where each Laplacian has statistically independent coordinates.


Unsupervised initialization of MEX offsets

Consider the case of a MEX layer following the similarity layer:

[Figure: sim(i, j, l) = u_l^T φ(x_ij, z_l) feeds a MEX layer computing mex(t) = MEX_ξ( {sim(i, j, l) + b_{t,i,j,l}}_{(i,j,l) ∈ block(t)}, c_t ).]

Taking into account location-dependent statistics, we not only assume that input patches follow a mixture distribution, but also that each patch location corresponds to a different mixture of the same n components.

This suggests estimating the mixture separately for each location, and calculating offsets such that, when appended to the outputs of the similarity layer, the probabilistic heat maps take the location dependency into account.
Unsupervised initialization of MEX offsets (cont'd)

For example, if a template is deemed unlikely to appear in the top-left corner of an image, that template's heat map will be suppressed there.

The computed offsets serve as the initialization of the MEX layer's offsets.
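The slides do not spell out the offset formula; one plausible reading, sketched here under that assumption, is to estimate per-location mixing weights of the shared components and use their logarithms as the location-dependent offsets:

```python
import numpy as np

def init_mex_offsets(responsibilities):
    """responsibilities: (num_images, I, J, n) posteriors of the n shared mixture
    components for the patch at each location (i, j), e.g. from the GMM fitted above.
    Returns b[i, j, l] = log pi_l(i, j), the log of an estimated per-location mixing
    weight (a hypothetical concretization of the location-dependent offsets)."""
    pi = responsibilities.mean(axis=0)              # location-dependent mixing weights, (I, J, n)
    pi = np.clip(pi, 1e-8, None)
    pi = pi / pi.sum(axis=-1, keepdims=True)
    return np.log(pi)
```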



Experiments

Large-scale implementation and evaluation of deep SimNets against state-of-the-art ConvNets is currently under way. We report here a comparison between the patch-labeling SimNet, an equivalent ConvNet and the "single-layer" network of Coates et al.²

Evaluation on the CIFAR-10 dataset:

32 × 32 color images
10 classes, 50K training images, 10K test images

² Coates et al.: An analysis of single-layer networks in unsupervised feature learning.
Evaluated SimNet

Patch-labeling SimNet:

[Figure: 32×32×3 CIFAR-10 input; 6×6 patches (stride 1) feed an l1/l2-similarity layer with n templates, sim(i, j, l) = −Σ_t u_{l,t}|x_{ij,t} − z_{l,t}| or −Σ_t u_{l,t}(x_{ij,t} − z_{l,t})^2, giving 27×27×n maps; a MEX labeling layer with offsets shared across 2×2 pooling regions q(i, j) produces 27×27×10 patch labelings, label(i, j, r) = MEX_{ξ1}{sim(i, j, l) + b_{r,l,q(i,j)}}_l; global MEX pooling with ξ2 → 0 yields the 10 class outputs.]

Initialization of similarity templates and weights via statistical parameter estimation, using the training set without the labels.
Supervised training via SGD minimization of the softmax loss.
Experimented with up to n = 400 templates.


Evaluated ConvNet

[Figure: the same layout with a convolutional layer over 6×6 patches (stride 1), conv(i, j, l) = ⟨x_ij, z_l⟩, giving 27×27×n maps; ReLU activation, relu(i, j, l) = max{conv(i, j, l), 0}; max-pooling over the 2×2 regions, pool(p_h, p_w, l) = max{relu(i, j, l) : q(i, j) = (p_h, p_w)}; and a dense output layer, out(r) = ⟨pool, w_r⟩.]

Implemented with the Caffe toolbox.
Random initialization, training via SGD minimization of the softmax loss, Dropout regularization included.
Experimented with up to n = 6400 templates.


Evaluated "single-layer" network of Coates et al.

[Figure: the same layout with "triangle" coding, code(i, j, l) = max{ mean_{l'} ‖x_ij − z_{l'}‖_2 − ‖x_ij − z_l‖_2, 0 }, followed by average-pooling over the 2×2 regions and a linear output layer, out(r) = ⟨pool, w_r⟩.]

"Triangle" coding – state-of-the-art network of this depth.
We used the implementation published by the authors, and added a supervised training phase that jointly optimizes the coding templates and the SVM coefficients.
Experimented with up to n = 6400 templates.


Results

[Figure: CIFAR-10 accuracy results for the evaluated SimNet, ConvNet and "single-layer" network.]


Conclusions

SimNets with weighted l1 and l2 similarities reach slightly higher accuracies than the ConvNet, at less than 1/9 of its size.
SimNets with weighted l1 and l2 similarities are slightly outperformed by the "single-layer" network of Coates et al., but are almost 1/5 of its size.
Without similarity weights the SimNets are comparable to the ConvNet. Weights add parameters to the network, but provide a super-linear gain in accuracy.
SimNets with l1 and l2 similarities produce comparable performance.



Summary

The SimNets architecture consists of two basic building blocks:
Similarity operator: generalizes the ConvNet convolutional operator.
MEX operator: generalizes ConvNet ReLU activation and pooling, but allows much more...

The SimNet analogue of input → hidden units → output nodes, and the basic locality-sharing-pooling structure, are both generalizations of kernel machines.

We considered three types of similarities: l1, l2 and linear (convolution).
In their unweighted form, the similarities correspond to kernel machines with Laplacian, RBF and Exponential kernels respectively.
In their weighted forms, the l1 and l2 similarities go beyond kernel machines, providing higher abstraction levels.


Summary (cont'd)

The SimNet architecture exhibits a natural unsupervised initialization scheme based on statistical estimation, which has the potential of automatically determining the number of channels in a similarity layer via variance analysis of patterns generated from previous layers.

Large-scale evaluation of deep SimNets against state-of-the-art ConvNets is under way. In a benchmark of a 3-layer SimNet against an analogous ConvNet and the "single-layer" network of Coates et al., the SimNet achieved accuracies comparable to the competition at 1/9 and 1/5 (respectively) of their size.

Experimental results validate that the similarity weighting, which takes SimNets beyond kernel machines, is crucial in terms of performance.

Offsets in a MEX layer enable the addition of locality-based biases to the templates in a preceding similarity layer. This is something that ConvNets cannot express, and it will be evaluated in future work.
Thank You
