Version: January 24, 2018
Why information theory
Overview
Definition
Entropy
For a discrete variable R:

H(R) = -\sum_r p(r) \log_2 p(r)

For a continuous variable discretized into bins of width \Delta r, letting \Delta r \to 0 we have

\lim_{\Delta r \to 0} \left[ H + \log_2 \Delta r \right] = -\int p(r) \log_2 p(r) \, dr
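A quick numerical check of this limit (a minimal sketch, not from the slides; the unit Gaussian density and the bin widths are arbitrary choices):

```python
import numpy as np

def discrete_entropy(p):
    """Entropy in bits of a discrete distribution p (zero entries are ignored)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Discretize a unit Gaussian density into bins of width dr.
for dr in [1.0, 0.1, 0.01]:
    r = np.arange(-10, 10, dr)
    p = np.exp(-r**2 / 2) / np.sqrt(2 * np.pi) * dr   # probability mass per bin
    p /= p.sum()                                      # correct for tail truncation
    H = discrete_entropy(p)
    # H + log2(dr) should approach the differential entropy of a unit
    # Gaussian, 0.5 * log2(2*pi*e) ≈ 2.047 bits, as dr shrinks.
    print(dr, H + np.log2(dr))
```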
Joint, Conditional entropy
Joint entropy:
H(S, R) = -\sum_{r,s} P(s, r) \log_2 P(s, r)

Conditional entropy:

H(S|R) = \sum_r P(R = r) \, H(S|R = r)
       = -\sum_r P(r) \sum_s P(s|r) \log_2 P(s|r)
       = H(S, R) - H(R)

If S and R are independent, H(S|R) = H(S) and H(S, R) = H(S) + H(R).
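A small numerical sanity check of the identity H(S|R) = H(S, R) - H(R) (a sketch; the 2x3 joint table is an arbitrary made-up example):

```python
import numpy as np

def entropy(p):
    """Entropy in bits of an array of probabilities summing to 1."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Arbitrary joint distribution P(s, r): rows index s, columns index r.
P_sr = np.array([[0.10, 0.20, 0.15],
                 [0.25, 0.05, 0.25]])

H_joint = entropy(P_sr.ravel())
P_r = P_sr.sum(axis=0)                     # marginal P(r)
H_R = entropy(P_r)

# Conditional entropy computed directly from its definition.
H_S_given_R = sum(pr * entropy(P_sr[:, j] / pr) for j, pr in enumerate(P_r))

print(H_S_given_R, H_joint - H_R)          # the two numbers should agree
```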
Mutual information
Mutual information:
I_m(R; S) = \sum_{r,s} p(r, s) \log_2 \frac{p(r, s)}{p(r)\, p(s)}
          = H(R) - H(R|S) = H(S) - H(S|R)
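The identities in the second line follow by expanding the logarithm (standard algebra, written out here for completeness):

I_m(R; S) = \sum_{r,s} p(r, s) \left[ \log_2 p(r, s) - \log_2 p(r) - \log_2 p(s) \right]
          = H(R) + H(S) - H(R, S)
          = H(R) - H(R|S) = H(S) - H(S|R)

using H(R|S) = H(R, S) - H(S) and H(S|R) = H(R, S) - H(R) from the conditional-entropy definition above.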
Mutual Information: Examples
Two joint distributions for Y1 (smoker vs. non-smoker) and Y2 (lung cancer vs. no lung cancer):

Left joint distribution:

                        Y1 = smoker   Y1 = non-smoker
  Y2 = lung cancer          1/3             0
  Y2 = no lung cancer        0             2/3

Right joint distribution:

                        Y1 = smoker   Y1 = non-smoker
  Y2 = lung cancer          1/9            2/9
  Y2 = no lung cancer       2/9            4/9
Only for the left joint distribution is Im > 0 (how much?). On the right, knowledge of Y1 tells us nothing about Y2: the joint distribution is exactly the product of its marginals. A worked calculation is sketched below.
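A minimal sketch of that calculation (a plain plug-in computation from the two tables above):

```python
import numpy as np

def mutual_information(P):
    """Mutual information in bits of a joint distribution given as a 2-D array."""
    P_col = P.sum(axis=0)      # marginal of the column variable (Y1)
    P_row = P.sum(axis=1)      # marginal of the row variable (Y2)
    mask = P > 0
    return np.sum(P[mask] * np.log2(P[mask] / np.outer(P_row, P_col)[mask]))

left  = np.array([[1/3, 0.0], [0.0, 2/3]])   # rows: Y2, columns: Y1
right = np.array([[1/9, 2/9], [2/9, 4/9]])

print(mutual_information(left))    # ≈ 0.918 bits: here Y1 determines Y2
print(mutual_information(right))   # 0 bits: the joint factorizes
```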
Kullback-Leibler divergence
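For reference, the definition and its relation to the mutual information above (standard identities, stated here for completeness rather than taken from the slide):

D_{KL}(p \| q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)} \ge 0

I_m(R; S) = D_{KL}\big( p(r, s) \,\|\, p(r)\, p(s) \big)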
Mutual info between jointly Gaussian variables
For jointly Gaussian Y1, Y2 with correlation coefficient \rho:

I(Y_1; Y_2) = \int\!\!\int P(y_1, y_2) \log_2 \frac{P(y_1, y_2)}{P(y_1)\, P(y_2)} \, dy_1 \, dy_2 = -\frac{1}{2} \log_2 (1 - \rho^2)

For an N-dimensional response R = (R_1, \ldots, R_N), given

H(R) = -\int p(\mathbf{r}) \log_2 p(\mathbf{r}) \, d\mathbf{r} - N \log_2 \Delta r

and

H(R_i) = -\int p(r_i) \log_2 p(r_i) \, dr_i - \log_2 \Delta r

we have

H(R) \le \sum_i H(R_i)
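A quick check of the Gaussian result (a sketch using the closed-form Gaussian entropies and an arbitrary correlation coefficient):

```python
import numpy as np

rho = 0.7                                   # arbitrary correlation coefficient
cov = np.array([[1.0, rho], [rho, 1.0]])    # unit-variance jointly Gaussian pair

def gaussian_entropy(C):
    """Differential entropy in bits of a Gaussian with covariance matrix C."""
    C = np.atleast_2d(C)
    n = C.shape[0]
    return 0.5 * np.log2((2 * np.pi * np.e) ** n * np.linalg.det(C))

I_direct = (gaussian_entropy(cov[:1, :1]) + gaussian_entropy(cov[1:, 1:])
            - gaussian_entropy(cov))
I_formula = -0.5 * np.log2(1 - rho**2)

print(I_direct, I_formula)    # both ≈ 0.486 bits for rho = 0.7
```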
Mutual information in populations of neurons
Entropy Maximization for a Single Neuron
Let r = f(s) with s \sim p(s). Which f (assumed monotonic) maximizes H(R) under a maximum firing rate constraint? Maximal entropy requires the output to be uniform:

P(r) = \frac{1}{r_{max}}

Conservation of probability under the change of variables gives

p(s) = P(r) \frac{dr}{ds} = \frac{1}{r_{max}} \frac{df}{ds}

Thus df/ds = r_{max} \, p(s) and

f(s) = r_{max} \int_{s_{min}}^{s} p(s') \, ds'
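A small simulation of this histogram-equalization rule (a sketch with an arbitrary stimulus distribution and r_max, not from the slides): passing stimuli through f(s) = r_max × CDF(s) yields responses that are approximately uniform on [0, r_max].

```python
import numpy as np

rng = np.random.default_rng(0)
r_max = 100.0                        # arbitrary maximum firing rate (Hz)

# Arbitrary stimulus distribution: Gaussian with mean 2 and standard deviation 0.5.
s = rng.normal(loc=2.0, scale=0.5, size=100_000)

# f(s) = r_max * CDF(s); here the CDF is estimated empirically from the samples.
ranks = np.argsort(np.argsort(s))
responses = r_max * (ranks + 1) / len(s)

# The response histogram should be approximately flat on [0, r_max].
counts, _ = np.histogram(responses, bins=10, range=(0, r_max))
print(counts / len(s))               # every entry ≈ 0.1
```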
Fly retina
Evidence that the large monopolar cell in the fly visual system carries out histogram equalization.
Similar results in V1, but with separate On and Off channels [Brady and Field, 2000].
Information of time-varying signals
Information of graded synapses
Spiking neurons: maximal information
H = -\frac{T}{\delta t} \left[ p \log_2 p + (1 - p) \log_2 (1 - p) \right]

For low rates p \ll 1, setting p = \lambda \, \delta t (with \lambda the firing rate):

H \approx T \lambda \log_2\!\left( \frac{e}{\lambda \, \delta t} \right)
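A numerical comparison of the exact and low-rate expressions (a sketch with arbitrary parameter values chosen so that p ≪ 1):

```python
import numpy as np

T = 1.0       # observation window in seconds (arbitrary)
dt = 0.001    # bin width in seconds (arbitrary)
lam = 10.0    # firing rate in spikes/s (arbitrary, low enough that p << 1)

p = lam * dt  # probability of a spike in a single bin
H_exact = -(T / dt) * (p * np.log2(p) + (1 - p) * np.log2(1 - p))
H_approx = T * lam * np.log2(np.e / (lam * dt))

print(H_exact, H_approx)   # ≈ 80.8 vs ≈ 80.9 bits for these values
```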
Spiking neurons
The calculation above is incorrect when more than one spike can fall in a bin. Instead, for large bins the information is maximal for an exponential count distribution:

P(n) = \frac{1}{Z} \exp\!\left[ -n \log\!\left( 1 + \frac{1}{\langle n \rangle} \right) \right]

H = \log_2 (1 + \langle n \rangle) + \langle n \rangle \log_2\!\left( 1 + \frac{1}{\langle n \rangle} \right) \approx \log_2 (1 + \langle n \rangle) + 1
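A numerical check of the count-entropy formula (a sketch; the mean counts are arbitrary, and the exponential count distribution is written in its geometric form P(n) = (1 - q) q^n with q = ⟨n⟩ / (1 + ⟨n⟩)):

```python
import numpy as np

for n_mean in [0.5, 2.0, 10.0]:
    q = n_mean / (1.0 + n_mean)
    n = np.arange(0, 2000)
    P = (1 - q) * q**n                  # equivalently (1/Z) exp[-n log(1 + 1/<n>)]

    H_direct = -np.sum(P[P > 0] * np.log2(P[P > 0]))
    H_formula = np.log2(1 + n_mean) + n_mean * np.log2(1 + 1 / n_mean)

    print(n_mean, H_direct, H_formula)  # the two entropies should agree
```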
Spiking neurons: rate code
[Stein, 1967]
[Stein, 1967]
Similar behaviour for Poisson spiking: H ∝ log(T)
Spiking neurons: dynamic stimuli
v =w·u+η
For a Gaussian random variable with variance \sigma^2 we have H = \frac{1}{2} \log 2\pi e \sigma^2. Since the variance of v is w^T Q w plus the noise variance (with Q the covariance of the input u), maximizing H(v) amounts to maximizing w^T Q w subject to the constraint \|w\|^2 = 1.

Thus w \propto e_1, the eigenvector of Q with the largest eigenvalue, and we obtain PCA.

If v is non-Gaussian, this calculation gives an upper bound on H(v), since the Gaussian distribution is the maximum entropy distribution for a given mean and covariance.
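A quick illustration that the constrained optimum is the leading eigenvector (a sketch with an arbitrary 2x2 input covariance):

```python
import numpy as np

Q = np.array([[2.0, 0.8],
              [0.8, 1.0]])      # arbitrary input covariance matrix

# Closed-form optimum: the eigenvector of Q with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(Q)
w_opt = eigvecs[:, np.argmax(eigvals)]

# Brute-force check over unit-norm weight vectors w = (cos a, sin a).
angles = np.linspace(0, np.pi, 1000)
ws = np.stack([np.cos(angles), np.sin(angles)], axis=1)
variances = np.einsum('ij,jk,ik->i', ws, Q, ws)    # w^T Q w for each candidate
w_best = ws[np.argmax(variances)]

print(w_opt, w_best)            # same direction, possibly up to an overall sign
```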
Infomax
Infomax: maximize the information in multiple outputs with respect to the weights [Linsker, 1988]

v = Wu + \eta

For Gaussian outputs (up to additive constants):

H(v) = \frac{1}{2} \log \det\left( \langle v v^T \rangle \right)

Example: 2 inputs and 2 outputs, with correlated input and each output unit k constrained by w_{k1}^2 + w_{k2}^2 = 1 (a numerical sketch follows below).
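A numerical sketch of that 2-input, 2-output example (the input covariance and noise variance are assumed values; each weight row is a unit vector parametrized by an angle, and the best angles are found by grid search):

```python
import numpy as np

Q = np.array([[1.0, 0.6],
              [0.6, 1.0]])      # correlated input covariance (assumed values)
noise_var = 0.1                 # output noise variance (assumed value)

def output_entropy(theta1, theta2):
    """H(v) up to constants: 0.5 * log det <v v^T>, with unit-norm weight rows."""
    W = np.array([[np.cos(theta1), np.sin(theta1)],
                  [np.cos(theta2), np.sin(theta2)]])
    C = W @ Q @ W.T + noise_var * np.eye(2)
    return 0.5 * np.log(np.linalg.det(C))

# Grid search over the two weight-vector angles.
angles = np.linspace(0, np.pi, 181)
H = np.array([[output_entropy(a1, a2) for a2 in angles] for a1 in angles])
i, j = np.unravel_index(np.argmax(H), H.shape)

print(np.degrees(angles[i]), np.degrees(angles[j]))
# With low noise the two weight vectors decorrelate the outputs (they align
# with different eigenvectors of Q) rather than both picking the leading one.
```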
A common bias-correction technique for estimating Im from limited data: the shuffle correction [Panzeri et al., 2007] (a simplified sketch follows below).
See also: [Paninski, 2003, Nemenman et al., 2002]
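A minimal illustration of the idea behind shuffle-style corrections (a simplified sketch, not the full procedure of [Panzeri et al., 2007]): the plug-in estimate of Im is biased upward for small samples, and the information remaining after shuffling the stimulus-response pairing, which should be zero, gives a rough estimate of that bias.

```python
import numpy as np

rng = np.random.default_rng(1)

def plugin_mi(x, y, n_bins):
    """Plug-in mutual information (bits) from paired discrete samples."""
    joint = np.histogram2d(x, y, bins=n_bins)[0]
    P = joint / joint.sum()
    Px, Py = P.sum(axis=1, keepdims=True), P.sum(axis=0, keepdims=True)
    mask = P > 0
    return np.sum(P[mask] * np.log2(P[mask] / (Px @ Py)[mask]))

# Independent stimulus and response: the true information is exactly zero.
n_trials = 200
stim = rng.integers(0, 8, n_trials)
resp = rng.integers(0, 8, n_trials)

I_raw = plugin_mi(stim, resp, 8)
# Bias estimate: mean information left after destroying the stimulus-response pairing.
I_shuffle = np.mean([plugin_mi(rng.permutation(stim), resp, 8)
                     for _ in range(100)])

print(I_raw, I_raw - I_shuffle)   # raw estimate is biased upward; corrected ≈ 0
```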
Summary
References
Stein, R. B. (1967).
The information capacity of nerve cells using a frequency code.
Biophys J, 7:797–826.
Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., and Bialek, W. (1998).
Entropy and Information in Neural Spike Trains.
Phys Rev Lett, 80:197–200.
Warzecha, A. K. and Egelhaaf, M. (1999).
Variability in spike trains during constant and dynamic stimulation.
Science, 283(5409):1927–1930.