Optimal Learning Rate

What is the optimal value ηopt of the learning rate?

Consider the 1-dimensional case. Use a second-order Taylor expansion of E around the current weight $w_c$:

$$E(w) \approx E(w_c) + (w - w_c)\,\frac{\partial E(w_c)}{\partial w} + \frac{1}{2}(w - w_c)^2\,\frac{\partial^2 E(w_c)}{\partial w^2}$$

Differentiating both sides with respect to $w$ gives:

$$\frac{\partial E(w)}{\partial w} \approx \frac{\partial E(w_c)}{\partial w} + (w - w_c)\,\frac{\partial^2 E(w_c)}{\partial w^2}$$

Setting $w = w_{\min}$ and noting that $\frac{\partial E(w_{\min})}{\partial w} = 0$, one obtains

$$0 = \frac{\partial E(w_c)}{\partial w} + (w_{\min} - w_c)\,\frac{\partial^2 E(w_c)}{\partial w^2}$$

Optimal Learning Rate (cont.)
Solving for $w_{\min}$ gives

$$w_{\min} = w_c - \left(\frac{\partial^2 E(w_c)}{\partial w^2}\right)^{-1} \frac{\partial E(w_c)}{\partial w}$$

Comparing this with the gradient descent step $\Delta w = -\eta\,\frac{\partial E(w_c)}{\partial w}$ shows that the minimum of the quadratic approximation is reached in a single step when

$$\eta_{\text{opt}} = \left(\frac{\partial^2 E(w_c)}{\partial w^2}\right)^{-1}$$

[Figure: two plots of E(w) versus w, one for η < ηopt and one for η = ηopt, each marking the minimum wmin.]
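As a quick numerical check (a minimal sketch, not from the slides: the quadratic error, the curvature a, and the starting point are made-up illustrative values), gradient descent with η = ηopt = (∂²E/∂w²)⁻¹ reaches the minimum of a 1-D quadratic in one step, while a smaller η only approaches it:

    # Illustrative quadratic error E(w) = 0.5 * a * (w - w_star)**2 (made-up values)
    a, w_star = 4.0, 1.5                 # curvature d2E/dw2 = a, minimum at w_star
    dE = lambda w: a * (w - w_star)      # gradient dE/dw
    d2E = lambda w: a                    # second derivative (constant for a quadratic)

    w_c = -2.0                           # current weight
    eta_opt = 1.0 / d2E(w_c)             # optimal learning rate = inverse curvature

    print(w_c - eta_opt * dE(w_c))       # one step with eta_opt -> 1.5 (the minimum)

    w = w_c                              # a smaller learning rate needs many steps
    for _ in range(20):
        w -= 0.5 * eta_opt * dE(w)
    print(w)                             # close to 1.5, but not exactly there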

Hopfield Network Introductory Example
• Suppose we want to store N binary images in some memory (e.g. our brain) and recall the corresponding images.
• The memory should be content-addressable and insensitive to small errors.
• We present corrupted images to the memory.
[Figure: corrupted images presented to the memory and the images recalled by the memory.]

Hopfield Network
[Figure: five units S1, ..., S5, fully connected with symmetric weights, e.g. w51 = w15.]
• wij denotes the weight of the connection from unit j to unit i
• no unit has a connection with itself: wii = 0, ∀i
• connections are symmetric: wij = wji, ∀i, j
The state of unit i can take the values ±1 and is denoted by Si. The state dynamics are governed by the activity rule

$$S_i = \operatorname{sgn}\Big(\sum_j w_{ij} S_j\Big), \qquad \operatorname{sgn}(a) = \begin{cases} +1 & \text{if } a \ge 0 \\ -1 & \text{if } a < 0 \end{cases}$$
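A minimal sketch of this activity rule in Python (the helper names sgn and update_unit are my own; the weights are assumed to be held in a NumPy matrix W with zero diagonal):

    import numpy as np

    def sgn(a):
        # Sign convention from the slide: +1 for a >= 0, -1 otherwise
        return 1 if a >= 0 else -1

    def update_unit(W, S, i):
        # Activity rule S_i = sgn(sum_j w_ij S_j); w_ii = 0, so S_i itself does not contribute
        S[i] = sgn(W[i] @ S)
        return S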

Learning Rule in a Hopfield Network
Learning in Hopfield networks: store a set of desired memories {x^(n)} in the network, where each memory is a binary pattern with xi ∈ {−1, +1}.
• The weights are set using the sum of outer products

$$w_{ij} = \frac{1}{N} \sum_n x_i^{(n)} x_j^{(n)}$$

where N denotes the number of units (N can also be some other positive constant, e.g. the number of patterns).
• Given an m × 1 column vector a and a 1 × n row vector b, the outer product a ⊗ b (short: ab) is defined as the m × n matrix; e.g. for m = n = 3:

$$\begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} \otimes \begin{pmatrix} b_1 & b_2 & b_3 \end{pmatrix} = \begin{pmatrix} a_1 b_1 & a_1 b_2 & a_1 b_3 \\ a_2 b_1 & a_2 b_2 & a_2 b_3 \\ a_3 b_1 & a_3 b_2 & a_3 b_3 \end{pmatrix}$$
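A sketch of this learning rule (the helper name store_patterns is my own; patterns are assumed to be the rows of a ±1 matrix, and the number of units N is used as the normalizer):

    import numpy as np

    def store_patterns(X):
        # Outer-product rule: w_ij = (1/N) * sum_n x_i^(n) x_j^(n), with w_ii = 0
        # X has shape (number of patterns, N) and entries in {-1, +1}
        N = X.shape[1]
        W = X.T @ X / N              # sum of outer products, scaled by 1/N
        np.fill_diagonal(W, 0.0)     # no self-connections
        return W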

Learning in Hopfield Network (Example)
Suppose we want to store the patterns x^(1) = [−1, +1, −1] and x^(2) = [+1, −1, +1]:

$$\begin{pmatrix} -1 \\ +1 \\ -1 \end{pmatrix} \otimes \begin{pmatrix} -1 & +1 & -1 \end{pmatrix} = \begin{pmatrix} +1 & -1 & +1 \\ -1 & +1 & -1 \\ +1 & -1 & +1 \end{pmatrix}$$
$$+$$
$$\begin{pmatrix} +1 \\ -1 \\ +1 \end{pmatrix} \otimes \begin{pmatrix} +1 & -1 & +1 \end{pmatrix} = \begin{pmatrix} +1 & -1 & +1 \\ -1 & +1 & -1 \\ +1 & -1 & +1 \end{pmatrix}$$

Learning in Hopfield Network (Example) (cont.)
Summing the two outer products, setting the diagonal to zero (no unit has a connection with itself) and scaling by 1/3 gives

$$W = \frac{1}{3}\begin{pmatrix} 0 & -2 & +2 \\ -2 & 0 & -2 \\ +2 & -2 & 0 \end{pmatrix}$$

The storage of patterns in the network can also be interpreted as constructing stable states. The condition for a pattern to be stable is

$$\operatorname{sgn}\Big(\sum_j w_{ij} x_j\Big) = x_i, \qquad \forall i$$

Suppose we present pattern x^(1) to the network and want to restore the corresponding pattern.

∀i.) Let us assume that the network states are set as follows: Si = xi . 139 .Learning in Hopfield Network (Example) (cont. We can restore pattern x(1) = [−1. +1. −1] as follows:     3 3 S1 = sgn  j=1 3 Can we also restore the original patterns by presenting “similar” patterns which are corrupted by noise? S3 = sgn   w1j Sj  = −1  S2 = sgn  j=1 w2j Sj  = +1 j=1 w3j Sj  = −1 – p.

Updating States in a Hopfield Network
Synchronous updates:
• all units update their states Si = sgn(Σj wij Sj) simultaneously.
Asynchronous updates:
• one unit at a time updates its state. The sequence of selected units may be a fixed sequence or a random sequence.
Synchronously updating states can lead to oscillation (no convergence to a stable state).
[Figure: two units with a connection of weight 1 and states S1 = +1, S2 = −1, which oscillate under synchronous updates.]
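A sketch of the two-unit oscillation (assuming, as the figure suggests, a single symmetric connection of weight 1 between the two units):

    import numpy as np

    W = np.array([[0.0, 1.0],
                  [1.0, 0.0]])           # two units, weight w12 = w21 = 1, no self-connections
    S = np.array([+1, -1])

    # Synchronous updates: both units flip at every step, so the state oscillates
    for step in range(4):
        S = np.where(W @ S >= 0, 1, -1)
        print(step, S)                   # alternates between [-1 +1] and [+1 -1]

    # Asynchronous update of a single unit settles immediately
    S = np.array([+1, -1])
    S[0] = 1 if W[0] @ S >= 0 else -1    # unit 1 aligns with unit 2
    print(S)                             # [-1 -1] is a stable state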

Aim of a Hopfield Network
Our aim is that, by presenting a corrupted pattern and applying the state update rule iteratively, the Hopfield network will settle down in a stable state which corresponds to the desired pattern. The Hopfield network is therefore a method for
• pattern completion
• error correction.
The state of a Hopfield network can be expressed in terms of the energy function

$$E = -\frac{1}{2} \sum_{i,j} w_{ij} S_i S_j$$

Hopfield observed that if a state is a local minimum of the energy function, it is also a stable state for the network.
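A sketch of the energy function, together with the property (illustrated empirically here on random made-up patterns) that asynchronous updates never increase it:

    import numpy as np

    def energy(W, S):
        # E = -1/2 * sum_ij w_ij S_i S_j for a state vector S with entries in {-1, +1}
        return -0.5 * S @ W @ S

    rng = np.random.default_rng(0)
    N = 20
    X = rng.choice([-1, 1], size=(3, N))          # three random patterns (illustrative)
    W = X.T @ X / N
    np.fill_diagonal(W, 0.0)

    S = rng.choice([-1, 1], size=N)               # random start state
    for i in rng.permutation(N):
        E_before = energy(W, S)
        S[i] = 1 if W[i] @ S >= 0 else -1         # asynchronous update of unit i
        assert energy(W, S) <= E_before + 1e-12   # the energy never increases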

Basin of Attraction and Stable States
[Figure: the state space partitioned into basins of attraction, each containing one stable state.]
Within the state space the stored patterns x^(n) are acting like attractors.

Haykin's Digit Example
Suppose we stored the following digits in the Hopfield network:
[Figure: the stored patterns 0, 1, 2, 3, 4, 6, 9 and "box", each shown with its energy (values between about −91 and −66).]

Updated States of Corrupted Digit 6
[Figure: starting from the corrupted digit 6 (Start Pattern), one unit is updated at a time; the energy decreases with every update.]

Updated States of Corrupted Digit 6 (cont.)
[Figure: further asynchronous updates; the energy continues to decrease.]

Updated States of Corrupted Digit 6 (cont.)
The resulting pattern (a stable state with energy −90.47) matches the desired pattern.
[Figure: the final updates and the resulting stable state, shown next to the original pattern 6.]

Recall a Spurious Pattern
[Figure: starting from a corrupted pattern (Start Pattern), one unit is updated at a time; the energy decreases with every update.]

Recall a Spurious Pattern (cont.)
[Figure: further asynchronous updates; the energy continues to decrease.]

Recall a Spurious Pattern (cont.)
The Hopfield network settled down in a local minimum with energy −84.93. This pattern, however, is not the desired pattern; it is a pattern which was not stored in the network.
[Figure: the final updates and the resulting spurious stable state, shown next to the original pattern 9.]

Incorrect Recall of Corrupted Pattern 2
[Figure: starting from the corrupted pattern 2 (Start Pattern), one unit is updated at a time; the energy decreases with every update.]

Incorrect Recall of Corrupted Pattern 2 (cont.)
[Figure: further asynchronous updates; the energy continues to decrease.]

Incorrect Recall of Corrupted Pattern 2 (cont.)
[Figure: further asynchronous updates; the energy continues to decrease.]

Incorrect Recall of Corrupted Pattern 2 (cont.)
Although we presented the corrupted pattern 2, the Hopfield network settled down in the stable state that corresponds to pattern 6 (energy −90.47).
[Figure: the final update and the resulting stable state, shown next to the original pattern 2.]

MacKay's Example of an Overloaded Network
Six patterns are stored in the Hopfield network; however, most of them are not stable states.
Spurious states represent stable states that are different from the stored desired patterns.
[Figure: the six desired memories and the states the overloaded network actually settles into.]

Spurious States and Capacity
• Reversed states ((−1) · x^(n)) have the same energy as the original patterns x^(n).
• Stable mixture states are not equal to any single pattern. They correspond to a linear combination of an odd number of patterns.
• Spin glass states are local minima that are not correlated with any finite number of the original patterns.
• Capacity: What is the relation between the number d of units and the maximum number Nmax of patterns one can store while allowing some small error? If Nmax = d / (4 log d), then most of the stored patterns can be recalled perfectly (a small numerical illustration follows below).
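As a rough numerical illustration of this capacity estimate (a sketch; the natural logarithm is assumed here, and the exact base and constant vary between references, so treat the numbers as order-of-magnitude values only):

    import math

    def hopfield_capacity(d):
        # Rough capacity estimate N_max = d / (4 * log(d)) from the slide
        return d / (4 * math.log(d))

    for d in (100, 1000, 10000):
        print(d, round(hopfield_capacity(d)))   # -> 5, 36, 271 patterns respectively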
