# Optimal Learning Rate

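What is the optimal value $\eta_{\text{opt}}$ of the learning rate?

Consider the one-dimensional case. Use the Taylor expansion around the current weight $w_c$, kept to second order:

$$E(w) = E(w_c) + (w - w_c)\frac{\partial E(w_c)}{\partial w} + \frac{1}{2}(w - w_c)^2\frac{\partial^2 E(w_c)}{\partial w^2}$$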

Differentiating both sides with respect to $w$ gives:

$$\frac{\partial E(w)}{\partial w} = \frac{\partial E(w_c)}{\partial w} + (w - w_c)\frac{\partial^2 E(w_c)}{\partial w^2}$$

Setting $w = w_{\min}$ and noting that $\frac{\partial E(w_{\min})}{\partial w} = 0$, one obtains

$$0 = \frac{\partial E(w_c)}{\partial w} + (w_{\min} - w_c)\frac{\partial^2 E(w_c)}{\partial w^2}$$

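# Optimal Learning Rate (cont.)

$$w_{\min} = w_c - \underbrace{\left(\frac{\partial^2 E(w_c)}{\partial w^2}\right)^{-1}}_{\eta_{\text{opt}}} \frac{\partial E(w_c)}{\partial w}$$

[Figure: two plots of $E(w)$ against $w$, one for $\eta < \eta_{\text{opt}}$ and one for $\eta = \eta_{\text{opt}}$, each marking $w_{\min}$.]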
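A minimal numerical sketch of this result, assuming a hypothetical quadratic error $E(w) = 2(w - 3)^2 + 1$ chosen only for illustration (the function names are not from the slides). For a quadratic error, one step with $\eta_{\text{opt}}$ equal to the inverse second derivative lands on $w_{\min}$, while a smaller rate stops short of it.

```python
def dE(w):
    """First derivative of the illustrative error E(w) = 2*(w - 3)**2 + 1."""
    return 4.0 * (w - 3.0)


def d2E(w):
    """Second derivative of the same error (constant for a quadratic)."""
    return 4.0


def gradient_step(w_c, eta):
    """One gradient-descent update: w = w_c - eta * dE/dw at w_c."""
    return w_c - eta * dE(w_c)


w_c = 0.0                                     # current weight
eta_opt = 1.0 / d2E(w_c)                      # inverse second derivative at w_c
w_opt = gradient_step(w_c, eta_opt)           # lands on the minimum w = 3
w_small = gradient_step(w_c, 0.5 * eta_opt)   # eta < eta_opt: only part of the way
```

For a non-quadratic error the expansion is only approximate, so the same step would generally move towards the minimum rather than reach it exactly.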

# Hopfield Network Introductory Example

• Suppose we want to store N binary images in some memory.
• The memory should be content-addressable and insensitive to small errors.
• We present corrupted images to the memory (e.g. our brain) and recall the corresponding images.

[Figure: presentation of corrupted images; images recalled by the memory.]

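# Hopfield Network

[Figure: network of five units $S_1$ to $S_5$, with the connection between units 1 and 5 labelled $w_{51} = w_{15}$.]

• $w_{ij}$ denotes the weight of the connection from unit $j$ to unit $i$
• no unit has a connection with itself: $w_{ii} = 0,\ \forall i$
• connections are symmetric: $w_{ij} = w_{ji},\ \forall i, j$

The state of unit $i$ can take the values $\pm 1$ and is denoted by $S_i$. State dynamics are governed by the activity rule:

$$S_i = \mathrm{sgn}\Big(\sum_j w_{ij} S_j\Big), \quad \text{where } \mathrm{sgn}(a) = \begin{cases} +1 & \text{if } a \ge 0, \\ -1 & \text{if } a < 0 \end{cases}$$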
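The activity rule and the two constraints on the weights can be sketched as follows. The 3-unit weight matrix and the helper names are hypothetical and chosen only for illustration; note that the rule maps an input of exactly 0 to +1.

```python
def sgn(a):
    """Sign function of the activity rule; sgn(0) is +1."""
    return 1 if a >= 0 else -1


def is_valid_weight_matrix(W):
    """Check the two constraints: w_ii = 0 and w_ij = w_ji."""
    n = len(W)
    no_self = all(W[i][i] == 0 for i in range(n))
    symmetric = all(W[i][j] == W[j][i] for i in range(n) for j in range(n))
    return no_self and symmetric


def update_unit(W, S, i):
    """New state of unit i: S_i = sgn(sum_j w_ij * S_j)."""
    return sgn(sum(W[i][j] * S[j] for j in range(len(S))))


# Hypothetical 3-unit network, not taken from the slides.
W = [[0, 1, -1],
     [1, 0, 1],
     [-1, 1, 0]]
S = [1, -1, 1]

# New state of every unit, each computed from the same old state vector S.
new_states = [update_unit(W, S, i) for i in range(len(S))]
```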

# Learning Rule in a Hopfield Network

Learning in Hopfield networks: store a set of desired memories $\{x^{(n)}\}$ in the network, where each memory is a binary pattern with $x_i \in \{-1, +1\}$.

• The weights are set using the sum of outer products

$$w_{ij} = \frac{1}{N} \sum_n x_i^{(n)} x_j^{(n)},$$

where $N$ denotes the number of units ($N$ can also be some positive constant, e.g. the number of patterns).

Given an $m \times 1$ column vector $a$ and a $1 \times n$ row vector $b$, the outer product $a \otimes b$ (short: $ab$) is defined as the $m \times n$ matrix with entries $a_i b_j$; e.g. for $m = n = 3$:

$$\begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} \otimes [b_1\ b_2\ b_3] = \begin{pmatrix} a_1 b_1 & a_1 b_2 & a_1 b_3 \\ a_2 b_1 & a_2 b_2 & a_2 b_3 \\ a_3 b_1 & a_3 b_2 & a_3 b_3 \end{pmatrix}$$
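The learning rule can be sketched in a few lines. The function names and the single 4-unit pattern are hypothetical; zeroing the diagonal is an explicit step added here to enforce $w_{ii} = 0$, since the sum of outer products alone would leave non-zero diagonal entries.

```python
def outer(a, b):
    """Outer product: entry (i, j) of the m x n result is a[i] * b[j]."""
    return [[ai * bj for bj in b] for ai in a]


def hopfield_weights(patterns, scale):
    """w_ij = (1/scale) * sum_n x_i^(n) * x_j^(n), then w_ii is set to 0."""
    d = len(patterns[0])
    W = [[0.0] * d for _ in range(d)]
    for x in patterns:
        P = outer(x, x)
        for i in range(d):
            for j in range(d):
                W[i][j] += P[i][j] / scale
    for i in range(d):
        W[i][i] = 0.0  # no unit has a connection with itself
    return W


# Hypothetical example: one 4-unit pattern, scaled by the number of units.
x = [1, -1, 1, -1]
W = hopfield_weights([x], scale=len(x))
```

The `scale` argument is left to the caller because the rule allows any positive constant in place of the number of units.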

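# Learning in Hopfield Network (Example)

Suppose we want to store patterns $x^{(1)} = [-1, +1, -1]$ and $x^{(2)} = [+1, -1, +1]$. The two outer products, which are summed to form the weights, are:

$$\begin{pmatrix} -1 \\ +1 \\ -1 \end{pmatrix} \otimes [-1, +1, -1] = \begin{pmatrix} +1 & -1 & +1 \\ -1 & +1 & -1 \\ +1 & -1 & +1 \end{pmatrix}$$

$$\begin{pmatrix} +1 \\ -1 \\ +1 \end{pmatrix} \otimes [+1, -1, +1] = \begin{pmatrix} +1 & -1 & +1 \\ -1 & +1 & -1 \\ +1 & -1 & +1 \end{pmatrix}$$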

# Learning in Hopfield Network (Example) (cont.)

$$W = \frac{1}{3}\begin{pmatrix} 0 & -2 & +2 \\ -2 & 0 & -2 \\ +2 & -2 & 0 \end{pmatrix}$$

Recall: no unit has a connection with itself.

The storage of patterns in the network can also be interpreted as constructing stable states. The condition for patterns to be stable is:

$$\mathrm{sgn}\Big(\sum_j w_{ij} x_j\Big) = x_i, \quad \forall i.$$

Suppose we present pattern $x^{(1)}$ to the network and want to restore the corresponding pattern.
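The stability condition can be checked directly for the weight matrix of this example. This is a minimal sketch; `is_stable` is a hypothetical helper name, and the state `[1, 1, 1]` is included only as an example of a pattern that was not stored.

```python
def sgn(a):
    """Sign function with sgn(0) = +1."""
    return 1 if a >= 0 else -1


def is_stable(W, x):
    """True if sgn(sum_j w_ij * x_j) == x_i holds for every unit i."""
    n = len(x)
    return all(
        sgn(sum(W[i][j] * x[j] for j in range(n))) == x[i]
        for i in range(n)
    )


# Weight matrix of the example; the factor 1/3 is applied to every entry.
W = [[w / 3 for w in row] for row in [[0, -2, 2], [-2, 0, -2], [2, -2, 0]]]

x1 = [-1, 1, -1]
x2 = [1, -1, 1]

stable_1 = is_stable(W, x1)             # stored pattern x(1)
stable_2 = is_stable(W, x2)             # stored pattern x(2)
stable_other = is_stable(W, [1, 1, 1])  # a pattern that was not stored
```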

# Learning in Hopfield Network (Example) (cont.)

Let us assume that the network states are set as follows: $S_i = x_i,\ \forall i$. We can restore pattern $x^{(1)} = [-1, +1, -1]$ as follows:

$$S_1 = \mathrm{sgn}\Big(\sum_{j=1}^{3} w_{1j} S_j\Big) = -1$$

$$S_2 = \mathrm{sgn}\Big(\sum_{j=1}^{3} w_{2j} S_j\Big) = +1$$

$$S_3 = \mathrm{sgn}\Big(\sum_{j=1}^{3} w_{3j} S_j\Big) = -1$$

Can we also restore the original patterns by presenting "similar" patterns which are corrupted by noise?
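For this small example the question can be tried out directly. The sketch below assumes the units are updated one at a time in a fixed order and uses a hypothetical corrupted input, namely $x^{(1)}$ with its first component flipped. The weights are written without the factor 1/3, which is positive and therefore does not change any sign.

```python
def sgn(a):
    """Sign function with sgn(0) = +1."""
    return 1 if a >= 0 else -1


def recall(W, S, max_sweeps=10):
    """Update units one at a time, in fixed order, until no state changes."""
    S = list(S)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(S)):
            new = sgn(sum(W[i][j] * S[j] for j in range(len(S))))
            if new != S[i]:
                S[i] = new
                changed = True
        if not changed:
            break
    return S


# Example weights without the positive factor 1/3 (signs are unaffected).
W = [[0, -2, 2], [-2, 0, -2], [2, -2, 0]]

corrupted = [1, 1, -1]           # x(1) = [-1, 1, -1] with the first unit flipped
restored = recall(W, corrupted)  # the first update flips unit 1 back
```

With only three units every state lies within one flip of $x^{(1)}$ or $x^{(2)}$, so this toy case says little about larger networks.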

# Updating States in a Hopfield Network

Synchronous updates:
• all units update their states $S_i = \mathrm{sgn}\big(\sum_j w_{ij} S_j\big)$ simultaneously.

Asynchronous updates:
• one unit at a time updates its state. The sequence of selected units may be a fixed sequence or a random sequence.

Synchronously updating states can lead to oscillation (no convergence to a stable state).

[Figure: two connected units with states $S_1 = +1$ and $S_2 = -1$; the connections are labelled 1.]
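The oscillation can be reproduced with a two-unit network. The sketch assumes weights $w_{12} = w_{21} = 1$ and the start state $S_1 = +1$, $S_2 = -1$; the function names are hypothetical.

```python
def sgn(a):
    """Sign function with sgn(0) = +1."""
    return 1 if a >= 0 else -1


def synchronous_step(W, S):
    """All units compute their new state from the same old state vector."""
    n = len(S)
    return [sgn(sum(W[i][j] * S[j] for j in range(n))) for i in range(n)]


def asynchronous_sweep(W, S):
    """Units update one at a time, each seeing the latest states."""
    S = list(S)
    for i in range(len(S)):
        S[i] = sgn(sum(W[i][j] * S[j] for j in range(len(S))))
    return S


W = [[0, 1], [1, 0]]   # assumed weights w12 = w21 = 1
S0 = [1, -1]

sync_1 = synchronous_step(W, S0)          # both units flip
sync_2 = synchronous_step(W, sync_1)      # both flip back: period-2 oscillation
async_1 = asynchronous_sweep(W, S0)       # unit 1 flips, then unit 2 agrees
async_2 = asynchronous_sweep(W, async_1)  # nothing changes any more
```

In the synchronous case each unit copies the other's old state, so the two states swap forever. In the asynchronous case the second unit already sees the first unit's new state and the network settles.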

# Aim of a Hopfield Network

Our aim is that, by presenting a corrupted pattern and iteratively applying the state update rule, the Hopfield network settles into a stable state which corresponds to the desired pattern.

A Hopfield network is a method for
• pattern completion
• error correction.

The state of a Hopfield network can be expressed in terms of the energy function

$$E = -\frac{1}{2} \sum_{i,j} w_{ij} S_i S_j$$

Hopfield observed that if a state is a local minimum in the energy function, it is also a stable state for the network.
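The energy function can be evaluated along a sequence of single-unit updates. The sketch below uses the weight matrix $W = \frac{1}{3}[[0, -2, 2], [-2, 0, -2], [2, -2, 0]]$ of the three-unit example and a hypothetical start state; `async_energy_trace` is an illustrative helper name. On this example the energy does not increase from one update to the next.

```python
def sgn(a):
    """Sign function with sgn(0) = +1."""
    return 1 if a >= 0 else -1


def energy(W, S):
    """E = -1/2 * sum over all i, j of w_ij * S_i * S_j."""
    n = len(S)
    return -0.5 * sum(W[i][j] * S[i] * S[j] for i in range(n) for j in range(n))


def async_energy_trace(W, S):
    """Energy before the sweep and after each single-unit update."""
    S = list(S)
    trace = [energy(W, S)]
    for i in range(len(S)):
        S[i] = sgn(sum(W[i][j] * S[j] for j in range(len(S))))
        trace.append(energy(W, S))
    return S, trace


W = [[w / 3 for w in row] for row in [[0, -2, 2], [-2, 0, -2], [2, -2, 0]]]

# Hypothetical start state: the stored pattern [-1, 1, -1] with unit 1 flipped.
final_state, trace = async_energy_trace(W, [1, 1, -1])
```

Here the start state has energy 2/3 and the first update moves the network to the stored pattern with energy −2, where it stays.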

# Basin of Attraction and Stable States

[Figure: sketch of the space of states with the labels "basin of attraction" and "stable states".]

Within the space, the stored patterns $x^{(n)}$ act like attractors.

# Haykin's Digit Example

Suppose we stored the following digits in the Hopfield network:

[Figure: the eight stored patterns (0, 1, 2, 3, 4, 6, 9 and box), each labelled with its energy; among them pattern 2 with energy −82.33, pattern 6 with −90.47, pattern 9 with −83.13 and pattern box with −66.93.]

# Updated States of Corrupted Digit 6

[Figure: the network state after each single-unit update, beginning with the start pattern; every panel is labelled with its energy and the index of the updated unit. The energy goes from roughly −10 for the start pattern down to −90.47, and the last panel shows the original pattern 6 with energy −90.47.]

The resulting pattern (stable state with energy −90.47) matches the desired pattern.

# Recall a Spurious Pattern

[Figure: the network state after each single-unit update, beginning with the start pattern; every panel is labelled with its energy and the index of the updated unit. The energy goes from roughly −28 for the start pattern down to −84.93, and the last panel shows the original pattern 9 with energy −83.13.]

The Hopfield network settled in a local minimum with energy −84.93. This pattern, however, is not the desired pattern. It is a pattern which was not stored in the network.

# Incorrect Recall of Corrupted Pattern 2

[Figure: the network state after each single-unit update, beginning with the start pattern; every panel is labelled with its energy and the index of the updated unit. The energy goes from roughly −22 for the start pattern down to −90.47, and the last panel shows the original pattern 2 with energy −82.33.]

Although we presented the corrupted pattern 2, the Hopfield network settled in the stable state that corresponds to pattern 6.

# MacKay's Example of an Overloaded Network

Six patterns are stored in the Hopfield network; however, most of them are not stable states.

[Figure: desired memories.]

Spurious states represent stable states that are different from the stored desired patterns.

# Spurious States and Capacity

• Reversed states $((-1) \cdot x^{(n)})$ have the same energy as the original patterns $x^{(n)}$.
• Stable mixture states are not equal to any single pattern. They correspond to a linear combination of an odd number of patterns.
• Spin glass states are local minima that are not correlated with any finite number of the original patterns.
• Capacity: what is the relation between the number $d$ of units and the maximum number $N_{\max}$ of patterns one can store when allowing some small error? If $N_{\max} = \frac{d}{4 \log d}$, then most of the stored patterns can be recalled perfectly.
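Two of these statements are easy to check numerically. The sketch below assumes that $\log$ denotes the natural logarithm, which the slides do not state, and uses a hypothetical network size $d = 120$. The weights are the integer matrix $[[0, -2, 2], [-2, 0, -2], [2, -2, 0]]$ with zero diagonal, used here without any scaling factor.

```python
import math


def energy(W, S):
    """E = -1/2 * sum over all i, j of w_ij * S_i * S_j."""
    n = len(S)
    return -0.5 * sum(W[i][j] * S[i] * S[j] for i in range(n) for j in range(n))


def max_patterns(d):
    """N_max = d / (4 * log d); log is assumed to be the natural logarithm."""
    return d / (4.0 * math.log(d))


W = [[0, -2, 2], [-2, 0, -2], [2, -2, 0]]
x = [-1, 1, -1]
reversed_x = [-s for s in x]

# Reversing every sign leaves each product S_i * S_j unchanged.
same_energy = energy(W, x) == energy(W, reversed_x)

capacity_120 = max_patterns(120)  # between 6 and 7 under the natural-log assumption
```

The energy is unchanged by reversal because it depends on the states only through the products $S_i S_j$.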