Optimal Learning Rate

What is the optimal value ηopt of the learning rate?

Consider the 1-dimensional case and use a second-order Taylor expansion of E around the current weight wc:

E(w) ≈ E(wc) + (w − wc) ∂E(wc)/∂w + (1/2)(w − wc)² ∂²E(wc)/∂w²

Differentiating both sides with respect to w gives:

∂E(w)/∂w = ∂E(wc)/∂w + (w − wc) ∂²E(wc)/∂w²

Setting w = wmin and noting that ∂E(wmin)/∂w = 0, one obtains

0 = ∂E(wc)/∂w + (wmin − wc) ∂²E(wc)/∂w²

Optimal Learning Rate (cont.)
Solving for wmin gives

wmin = wc − [∂²E(wc)/∂w²]⁻¹ ∂E(wc)/∂w

Comparing this with the gradient-descent step w = wc − η ∂E(wc)/∂w shows that the minimum is reached in a single step when the learning rate equals

ηopt = [∂²E(wc)/∂w²]⁻¹

[Figure: E(w) plotted against w for η < ηopt and for η = ηopt; with η < ηopt the iterates approach wmin gradually, whereas with η = ηopt a single step from wc reaches wmin.]

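A quick numerical check of this result (a minimal sketch; the quadratic error and all values are assumptions chosen for illustration):

```python
# Minimal sketch (a, w_star and w_c are assumed for illustration): for a
# quadratic error E(w) = 0.5 * a * (w - w_star)**2 the curvature d2E/dw2
# equals a, so eta_opt = 1/a and one step from w_c lands exactly on w_min.
a, w_star = 4.0, 2.5
dE  = lambda w: a * (w - w_star)    # dE/dw
d2E = lambda w: a                   # d2E/dw2 (constant for a quadratic)

w_c = -1.0
eta_opt = 1.0 / d2E(w_c)            # eta_opt = (d2E/dw2)^(-1)
w_new = w_c - eta_opt * dE(w_c)     # one gradient step with eta = eta_opt
print(w_new)                        # 2.5, i.e. w_min reached in a single step
```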

Hopfield Network: Introductory Example

• Suppose we want to store N binary images in some memory (e.g. our brain).
• We present corrupted images to the memory and recall the corresponding images.
• The memory should be content-addressable and insensitive to small errors.

[Figure: presentation of corrupted images to the memory, and the corresponding images recalled by the memory.]

Hopfield Network

[Figure: a network of units S1, ..., S5 with symmetric pairwise connections, e.g. w51 = w15.]

• wij denotes the weight of the connection from unit j to unit i
• no unit has a connection with itself: wii = 0, ∀i
• connections are symmetric: wij = wji, ∀i, j

The state of unit i can take values ±1 and is denoted Si. The state dynamics are governed by the activity rule:

Si = sgn( Σj wij Sj ),   where sgn(a) = +1 if a ≥ 0 and −1 if a < 0.
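A minimal sketch of this activity rule in NumPy (function and variable names are my own, not from the slides; sgn(0) is taken as +1, as defined above):

```python
import numpy as np

def sgn(a):
    # sgn(a) = +1 if a >= 0, and -1 if a < 0 (as defined above)
    return 1 if a >= 0 else -1

def update_unit(W, S, i):
    # activity rule: S_i = sgn( sum_j w_ij * S_j )
    # since w_ii = 0, the unit's own current state does not contribute
    return sgn(np.dot(W[i], S))
```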

Learning Rule in a Hopfield Network

Learning in Hopfield networks: store a set of desired memories {x(n)} in the network, where each memory is a binary pattern with xi ∈ {−1, +1}.

• The weights are set using the sum of outer products:

wij = (1/N) Σn xi(n) xj(n)

• where N denotes the number of units (N can also be some positive constant, e.g. the number of patterns).

Given an m × 1 column vector a and a 1 × n row vector b, the outer product a ⊗ b (short: a b) is defined as the m × n matrix

[a1]                  [a1b1  a1b2  a1b3]
[a2] ⊗ [b1 b2 b3]  =  [a2b1  a2b2  a2b3] ,   here with m = n = 3.
[a3]                  [a3b1  a3b2  a3b3]
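A sketch of this learning rule (assuming the patterns are stacked as rows of a ±1 matrix X; names are mine, not from the slides):

```python
import numpy as np

def hopfield_weights(X):
    """Hebbian storage rule: w_ij = (1/N) * sum_n x_i^(n) * x_j^(n)."""
    N = X.shape[1]                 # N = number of units
    W = (X.T @ X) / N              # sum of outer products over all patterns
    np.fill_diagonal(W, 0.0)       # enforce w_ii = 0 (no self-connections)
    return W
```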

Learning in Hopfield Network (Example)

Suppose we want to store the patterns x(1) = [−1, +1, −1] and x(2) = [+1, −1, +1].

The two outer products are

[−1]                    [+1 −1 +1]
[+1] ⊗ [−1, +1, −1]  =  [−1 +1 −1]
[−1]                    [+1 −1 +1]

[+1]                    [+1 −1 +1]
[−1] ⊗ [+1, −1, +1]  =  [−1 +1 −1]
[+1]                    [+1 −1 +1]

and the weights are obtained by adding these outer products and scaling by 1/N = 1/3.

Learning in Hopfield Network (Example) (cont.)

          [  0  −2  +2 ]
W = (1/3) [ −2   0  −2 ]
          [ +2  −2   0 ]

Recall: no unit has a connection with itself, so the diagonal entries are zero.

The storage of patterns in the network can also be interpreted as constructing stable states. The condition for a pattern x to be stable is:

sgn( Σj wij xj ) = xi,   ∀i.

Suppose we present pattern x(1) to the network and want to restore the corresponding pattern.
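The example weight matrix can be reproduced numerically with the weight-construction sketch above (the array layout is an assumption; the patterns are the two from the previous slide):

```python
import numpy as np

X = np.array([[-1, +1, -1],    # x^(1)
              [+1, -1, +1]])   # x^(2)
W = (X.T @ X) / 3.0            # N = 3 units
np.fill_diagonal(W, 0.0)
print(3 * W)                   # [[ 0. -2.  2.]
                               #  [-2.  0. -2.]
                               #  [ 2. -2.  0.]]  i.e. W = (1/3) * this matrix
```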

Learning in Hopfield Network (Example) (cont.)

Let us assume that the network states are set to the presented pattern: Si = xi(1), ∀i. We can then restore pattern x(1) = [−1, +1, −1] as follows:

S1 = sgn( Σ_{j=1..3} w1j Sj ) = −1
S2 = sgn( Σ_{j=1..3} w2j Sj ) = +1
S3 = sgn( Σ_{j=1..3} w3j Sj ) = −1

Can we also restore the original patterns by presenting "similar" patterns which are corrupted by noise?
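The same restoration can be checked numerically (a sketch; W and x(1) as above, sgn(0) again taken as +1):

```python
import numpy as np

W = np.array([[ 0, -2,  2],
              [-2,  0, -2],
              [ 2, -2,  0]]) / 3.0
x1 = np.array([-1, +1, -1])              # presented pattern x^(1)
S = np.where(W @ x1 >= 0, 1, -1)         # S_i = sgn( sum_j w_ij S_j )
print(S)                                 # [-1  1 -1] -> x^(1) is restored (stable)
```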

Updating States in a Hopfield Network

Synchronous updates:
• all units update their states Si = sgn( Σj wij Sj ) simultaneously.

Synchronously updating the states can lead to oscillation (no convergence to a stable state).

[Figure: two units with S1 = +1 and S2 = −1 and connection weights equal to 1, illustrating oscillation under synchronous updates.]

Asynchronous updates:
• one unit at a time updates its state.

The sequence of selected units may be a fixed sequence or a random sequence.
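A sketch of asynchronous updating with a random visiting order (the stopping criterion, a full sweep with no state change, and all names are assumptions for illustration):

```python
import numpy as np

def recall(W, S_init, max_sweeps=100, seed=0):
    """Asynchronous recall: update one randomly chosen unit at a time."""
    rng = np.random.default_rng(seed)
    S = S_init.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(S)):            # random sequence of units
            s_new = 1 if np.dot(W[i], S) >= 0 else -1
            if s_new != S[i]:
                S[i] = s_new
                changed = True
        if not changed:                              # no change in a full sweep:
            break                                    # a stable state was reached
    return S
```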

Aim of a Hopfield Network

Our aim is that, by presenting a corrupted pattern and by iteratively applying the state update rule, the Hopfield network will settle down in a stable state which corresponds to the desired pattern.

A Hopfield network is thus a method for
• pattern completion
• error correction.

The state of a Hopfield network can be expressed in terms of the energy function

E = −(1/2) Σi,j wij Si Sj

Hopfield observed that if a state is a local minimum of the energy function, it is also a stable state for the network.
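The energy of a state follows directly from this formula (a minimal sketch):

```python
import numpy as np

def energy(W, S):
    # E = -1/2 * sum_{i,j} w_ij * S_i * S_j
    return -0.5 * float(S @ W @ S)
```

For symmetric weights with zero diagonal, an asynchronous update can never increase this energy, which is consistent with the step-by-step decreasing energies in the digit example below.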

Basin of Attraction and Stable States

[Figure: the state space divided into basins of attraction, each containing one stable state.]

Within the state space the stored patterns x(n) act like attractors.

Haykin's Digit Example

Suppose we stored the following digits in the Hopfield network:

[Figure: the stored patterns 0, 1, 2, 3, 4, 6, 9 and a "box" pattern, each shown together with its energy.]

Updated States of Corrupted Digit 6

[Figure: a corrupted digit 6 is presented as the start pattern; the network state is shown after each asynchronous single-unit update, with the energy decreasing step by step.]

Updated States of Corrupted Digit 6 (cont.)

[Figure: further asynchronous single-unit updates; the energy continues to decrease with each update.]

Updated States of Corrupted Digit 6 (cont.)

[Figure: the final updates; the network reaches a stable state with energy −90.47, shown next to the original Pattern 6.]

The resulting pattern (the stable state with energy −90.47) matches the desired pattern.

Recall a Spurious Pattern

[Figure: a corrupted start pattern is presented; asynchronous single-unit updates lower the energy step by step.]

Recall a Spurious Pattern (cont.)

[Figure: further asynchronous single-unit updates; the energy keeps decreasing.]

Recall a Spurious Pattern (cont.)

[Figure: the final updates; the reached stable state is shown next to the original Pattern 9.]

The Hopfield network settled down in a local minimum with energy −84.93. This pattern, however, is not the desired pattern; it is a pattern which was not stored in the network.

Incorrect Recall of Corrupted Pattern 2

[Figure: a corrupted digit 2 is presented as the start pattern; asynchronous single-unit updates lower the energy step by step.]

Incorrect Recall of Corrupted Pattern 2 (cont.)

[Figure: further asynchronous single-unit updates; the energy keeps decreasing.]

Incorrect Recall of Corrupted Pattern 2 (cont.)

[Figure: further asynchronous single-unit updates; the energy keeps decreasing.]

Incorrect Recall of Corrupted Pattern 2 (cont.)

[Figure: the final update; the reached stable state is shown next to the original Pattern 2.]

Although we presented the corrupted pattern 2, the Hopfield network settled down in the stable state that corresponds to pattern 6.

MacKay's Example of an Overloaded Network

Six patterns are stored in the Hopfield network; however, most of them are not stable states. Spurious states represent stable states that are different from the stored desired patterns.

[Figure: the six stored patterns and the states the overloaded network actually converges to.]

Spurious States and Capacity

• Reversed states (−1) · x(n) have the same energy as the original patterns x(n).
• Stable mixture states are not equal to any single pattern; they correspond to a linear combination of an odd number of patterns.
• Spin glass states are local minima that are not correlated with any finite number of the original patterns.
• Capacity: What is the relation between the number d of units and the maximum number Nmax of patterns one can store while allowing some small error? If Nmax = d / (4 log d), then most of the stored patterns can be recalled perfectly.
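As a rough numerical illustration of this bound (a sketch; taking d = 120 units as an assumed example size and assuming the natural logarithm):

```python
import math

d = 120                          # assumed number of units
n_max = d / (4 * math.log(d))    # Nmax = d / (4 log d)
print(round(n_max, 1))           # about 6.3 patterns for near-perfect recall
```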
