
DNN Theory

1
A unit is a single neuron; a node can contain multiple units and can thus act as a shallow network on its own.

A unit computes: $x \mapsto \sigma(\langle w, x \rangle + b)$, where $\sigma(z) = \max(0, z)$ is the ReLU activation.
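
As a minimal Python sketch (the function name unit and the example values are illustrative, not from the slides):

import numpy as np

def unit(x, w, b):
    # A single unit: ReLU applied to an affine function of the input x.
    return np.maximum(0.0, np.dot(w, x) + b)

x = np.array([1.0, -2.0, 0.5])
w = np.array([0.3, 0.1, -0.4])
print(unit(x, w, b=0.2))   # ReLU(-0.1 + 0.2) = 0.1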

2
A specific deep net (panel (b) of the figure)

Functions with a compositional structure can be approximated to the same degree of accuracy by both deep and shallow networks, but the number of parameters required is much smaller for deep networks than for shallow networks of equivalent approximation accuracy.

The first result to be shown:

Shallow networks do not carry any structural information (compositional structure) in their architecture.
3
Let:

$\mathcal{S}_N$ = the class of all shallow networks with $N$ units, i.e. functions of the form
$f(x) = \sum_{k=1}^{N} a_k\, \sigma(\langle w_k, x \rangle + b_k)$

Parameters to be trained: $w_k \in \mathbb{R}^n$, $b_k \in \mathbb{R}$, $a_k \in \mathbb{R}$, i.e. $(n + 2)N$ parameters in total.
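
A minimal Python sketch of this class (array shapes and names are my own, for illustration):

import numpy as np

def shallow_net(x, W, b, a):
    # f(x) = sum_k a_k * ReLU(<w_k, x> + b_k)
    # W: (N, n) weights, b: (N,) biases, a: (N,) output weights.
    return a @ np.maximum(0.0, W @ x + b)

n, N = 8, 32
rng = np.random.default_rng(0)
W, b, a = rng.normal(size=(N, n)), rng.normal(size=N), rng.normal(size=N)
print(shallow_net(rng.normal(size=n), W, b, a))   # scalar output
print(W.size + b.size + a.size)                   # (n + 2) * N = 320 parameters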

4
$W_m^n$ = the set of all functions of $n$ variables with continuous partial derivatives of orders up to $m < \infty$

Smoothness (smooth function class)


with $\|f\| + \sum_{1 \le |k|_1 \le m} \|D^k f\| \le 1$,

ensuring a bounded, continuous mapping.

Each constituent function of the compositional class is in $W_m^2$, i.e. is a smooth function of only two variables.

$\mathcal{D}_N$ = the set of all deep networks with a binary tree architecture and $N$ units in total.

Each constituent node is in $\mathcal{S}_M$, i.e. is itself a shallow subnetwork with $M$ units.
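
A minimal sketch of the binary-tree architecture, assuming $n$ is a power of 2 and each node is a shallow subnet with $M$ units (all names illustrative):

import numpy as np

def node(x2, W, b, a):
    # One tree node: a shallow subnet S_M acting on a 2-dimensional input.
    return a @ np.maximum(0.0, W @ x2 + b)

def binary_tree_net(x, params):
    # params[level] holds one (W, b, a) triple per node at that level.
    layer = list(x)                                  # leaves: the n scalar inputs
    for level in params:                             # combine pairs bottom-up
        layer = [node(np.array([layer[2 * i], layer[2 * i + 1]]), *p)
                 for i, p in enumerate(level)]
    return layer[0]                                  # the root produces the output

n, M = 8, 4
rng = np.random.default_rng(1)
params, width = [], n
while width > 1:                                     # n - 1 = 7 internal nodes
    width //= 2
    params.append([(rng.normal(size=(M, 2)), rng.normal(size=M),
                    rng.normal(size=M)) for _ in range(width)])
print(binary_tree_net(rng.normal(size=n), params))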


5
$V$ = the set of non-leaf (internal) vertices; each node holds $M$ units, so $N = M\,|V|$ is a multiple of $M$.

Total parameters involved: $4N$, since each unit takes two inputs (2 weights + 1 bias + 1 output weight);

$|V| = n - 1$ when $n$ is an integer power of 2.
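
As a concrete check, with illustrative values $n = 8$ and $M = 4$ units per node:

\[
|V| = n - 1 = 7, \qquad N = M\,|V| = 28, \qquad 4N = 112 \text{ parameters},
\]

while a shallow network with the same $N = 28$ units needs $(n + 2)N = 280$ parameters.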

We have

• Deep networks can exploit in their architecture the special structure of compositional functions, unlike shallow ones.

• To prove valid approximation, it is sufficient that the acyclic graph representing the structure of the function is a subgraph of the graph representing the structure of the deep network.

6
For Shallow Nets

We have: $\operatorname{dist}(f, \mathcal{S}_N) = O(N^{-m/n})$ for $f \in W_m^n$; typically, accuracy $\epsilon$ requires complexity $N = O(\epsilon^{-n/m})$.

The proof approximates $f$ by a $k$th-degree polynomial of $n$ variables, whose $\binom{n+k}{k}$ coefficients grow exponentially with $n$.
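
To make the exponent concrete, take illustrative values $m = 4$ and target accuracy $\epsilon = 10^{-2}$, so that $\epsilon^{-n/m} = 10^{n/2}$:

\[
n = 4:\; N = O(10^{2}), \qquad n = 16:\; N = O(10^{8}), \qquad n = 64:\; N = O(10^{32}).
\]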


7
For Deep Nets (Hierarchical local nets)

We have: $\operatorname{dist}(f, \mathcal{D}_N) = O(N^{-m/2})$ for compositional $f$; equivalently, $N = O(\epsilon^{-2/m})$ units per node, for a total complexity of $O((n-1)\,\epsilon^{-2/m})$.

8
On Theorem 2:

$N = O(\epsilon^{-n/m})$ in a shallow net.

In each node: $\epsilon = c\,N^{-m/2}$, so that

$\|f_{1i} - P_{1i}\| \le \epsilon$

By additive accumulation:
$\|f_{2k}(f_{1i}, f_{1j}) - P_{2k}(P_{1i}, P_{1j})\| \le C_1\,\epsilon$
$\;\vdots$
$\|f_{D}(f_{(D-1)i}, f_{(D-1)j}) - P_{D}(P_{(D-1)i}, P_{(D-1)j})\| \le C\,\epsilon$

So $N = O(\epsilon^{-2/m})$ per node.
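
The accumulation step rests on the Lipschitz continuity of the constituent functions; a sketch of one level, writing $L$ for a Lipschitz constant of $f_{2k}$:

\[
\|f_{2k}(f_{1i}, f_{1j}) - P_{2k}(P_{1i}, P_{1j})\|
\le \|f_{2k}(f_{1i}, f_{1j}) - f_{2k}(P_{1i}, P_{1j})\| + \|f_{2k}(P_{1i}, P_{1j}) - P_{2k}(P_{1i}, P_{1j})\|
\le L\,(\|f_{1i} - P_{1i}\| + \|f_{1j} - P_{1j}\|) + \epsilon \le (2L + 1)\,\epsilon,
\]

so each level multiplies the error only by a constant, which is absorbed into $C$ at the root.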

9
So:

Curse of dimensionality:

$n$, the input dimension, enters only as a multiplicative scaling factor for a binary deep net, but as an exponent for a shallow net.

Effective dimensionality of the net: $2/m$ for deep nets vs. $n/m$ for shallow nets.

• $N$ can be increased (by increasing depth) to improve accuracy, without aggravating either dimensionality issue!
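
A quick numerical comparison of the two complexity bounds (constants dropped; $n$, $m$, $\epsilon$ chosen for illustration):

# Units needed for accuracy eps, up to constants:
#   shallow: N ~ eps**(-n/m)        deep binary tree: N ~ (n - 1) * eps**(-2/m)
n, m, eps = 16, 4, 1e-2
shallow = eps ** (-n / m)                 # 1e8 units
deep = (n - 1) * eps ** (-2 / m)          # 15 * 10 = 150 units
print(f"shallow ~ {shallow:.0e}, deep ~ {deep:.0f}")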

10
The result also holds for the non-smooth ReLU activation, based on Lipschitz continuity!

[Figure: generalization to compositional functions structured by a general Directed Acyclic Graph (DAG); a polynomial-type $f$; the effective dimension of each function class.]

11