[Figure: architecture of the Hopfield network, a single layer of $N$ fully interconnected threshold neurons with weights $w_{ij}$, biases $b_1,\dots,b_N$ and outputs $y_1,\dots,y_N$.]
the output vector is a vector of $\pm 1$ components in the $N$-dimensional space, $\mathbf{y} \in \{-1,1\}^N$. Starting from an initial state, the network stabilizes for a given input, where the update of the states can be driven by different index rules:

- Sequential index rule: the output to be updated is chosen as $l = k \bmod N$.
- Probabilistic index rule: the output to be updated is selected by a given random number generator.

The state transition rule for both index rules is
$$y_l(k+1) = \mathrm{sgn}\left(\sum_{j=1}^{N} W_{lj}\, y_j(k) - b_l\right).$$
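As a quick illustration, the following Python sketch implements the recursion with both index rules. The weight matrix, bias vector and initial state chosen at the bottom are arbitrary assumptions for demonstration, not data taken from these notes.

```python
import numpy as np

def hopfield_step(y, W, b, l):
    """Update the l-th component: y_l <- sgn(sum_j W_lj * y_j - b_l)."""
    activation = W[l] @ y - b[l]
    y[l] = 1 if activation >= 0 else -1      # ties broken towards +1
    return y

def run_hopfield(W, b, y0, steps=200, rule="sequential", seed=0):
    """Run the recursion with the sequential (l = k mod N) or the probabilistic index rule."""
    rng = np.random.default_rng(seed)
    y = y0.copy()
    N = len(y)
    for k in range(steps):
        l = k % N if rule == "sequential" else int(rng.integers(N))
        y = hopfield_step(y, W, b, l)
    return y

# Arbitrary symmetric weight matrix and bias (illustrative assumption)
rng = np.random.default_rng(1)
N = 8
A = rng.standard_normal((N, N))
W = (A + A.T) / 2
b = rng.standard_normal(N)
y0 = rng.choice([-1, 1], size=N)
print(run_hopfield(W, b, y0, rule="sequential"))
print(run_hopfield(W, b, y0, rule="probabilistic"))
```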
[Figure: the Lyapunov function over the state space $Y$, with its global upper and lower bounds; each state transition decreases the Lyapunov function by at least $\varepsilon$ in one step.]
If

- $L(\mathbf{y})$ has a global lower and upper bound over the state space, $A \le L(\mathbf{y}) \le B$ for all $\mathbf{y} \in Y$;
- the change of $L(\mathbf{y})$, denoted by $\Delta L(\mathbf{y}(k+1)) := L(\mathbf{y}(k+1)) - L(\mathbf{y}(k)) < -\varepsilon$, in each step of the recursion;

then the recursion is stable, converges to one of the local minima of $L(\mathbf{y})$, and its transient time can be upper bounded by
$$T_R \le \frac{B-A}{\varepsilon}.$$
In each step $L(\mathbf{y})$ decreases at least by $\varepsilon$, so the maximum number of steps needed to cover the distance $B-A$ is $T_R \le (B-A)/\varepsilon$.
What should be the Lyapunov function? Let us define the following Lyapunov function:
$$L(\mathbf{y}) = \mathbf{y}^T\mathbf{W}\mathbf{y} - 2\mathbf{b}^T\mathbf{y} = \sum_i\sum_j y_i W_{ij} y_j - 2\sum_i b_i y_i .$$
Over the state space its magnitude is bounded (using $\|\mathbf{y}\|^2 = N$) by
$$|L(\mathbf{y})| \le \|\mathbf{W}\|\,\|\mathbf{y}\|^2 + 2\|\mathbf{b}\|\,\|\mathbf{y}\| = N\|\mathbf{W}\| + 2\sqrt{N}\,\|\mathbf{b}\| ,$$
which provides the global upper bound $B$.
It is known that for a quadratic form the location of the minimum can be calculated as $\mathbf{x} = \mathbf{W}^{-1}\mathbf{b}$. This yields the following lower bound ($A$):
$$L(\mathbf{y}) \ge \min_{\mathbf{y}\in\{-1,1\}^N} L(\mathbf{y}) \ge \min_{\mathbf{x}\in R^N} L(\mathbf{x}) = \min_{\mathbf{x}\in R^N}\left\{\mathbf{x}^T\mathbf{W}\mathbf{x} - 2\mathbf{b}^T\mathbf{x}\right\} = -\mathbf{b}^T\mathbf{W}^{-1}\mathbf{b} \ge -\|\mathbf{W}^{-1}\|\,\|\mathbf{b}\|^2 .$$
To obtain the change in one step, write
$$L(\mathbf{y}(k)) = \sum_i\sum_j y_i(k) W_{ij} y_j(k) - 2\sum_i b_i y_i(k) .$$
Let us assume that the sequential index rule is used, so the change of the state vector can be written as
$$\mathbf{y}(k) = \left(y_1(k),\dots,y_l(k),\dots,y_N(k)\right), \qquad \mathbf{y}(k+1) = \left(y_1(k),\dots,y_l(k+1),\dots,y_N(k)\right).$$
Substituting this into the form above, the change is
$$\Delta L(\mathbf{y}(k)) = 2\,\Delta y_l(k)\left(\sum_j W_{lj}\, y_j(k) - b_l\right) - 2 W_{ll}\,\Delta y_l(k)\, y_l(k),$$
where $\Delta y_l(k) := y_l(k+1) - y_l(k)$, and $\varepsilon$ is
$$\varepsilon := \min_{\mathbf{y}\in\{-1,1\}^N,\ i=1,\dots,N}\ \left|\sum_j W_{ij}\, y_j - b_i\right| .$$
The possible state transitions are summarized in the following table:

| $y_l(k)$ | $y_l(k+1)$ | $\Delta y_l(k)$ | $\Delta y_l(k)\left(\sum_j W_{lj} y_j(k) - b_l\right)$ | $\Delta L(\mathbf{y}(k))$ |
|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 |
| -1 | -1 | 0 | 0 | 0 |
| -1 | 1 | 2 | $\ge 2\varepsilon$ | $\ge 4(W_{ll} + \varepsilon)$ |
| 1 | -1 | -2 | $\ge 2\varepsilon$ | $\ge 4(W_{ll} + \varepsilon)$ |
It can be seen from the table that, whatever the state transition is, the value of $L(\mathbf{y})$ never decreases, and every effective update changes it by at least $4(W_{ll} + \varepsilon)$; applying the convergence theorem above to $-L(\mathbf{y})$, the same transient-time bound holds.
The Hopfield nonlinear recursion
$$y_l(k+1) = \mathrm{sgn}\left(\sum_{j=1}^{N} W_{lj}\, y_j(k) - b_l\right), \qquad l = k \bmod N,$$

- is stable;
- converges to a local maximum of the Lyapunov function
$$L(\mathbf{x}) = \mathbf{x}^T\mathbf{W}\mathbf{x} - 2\mathbf{b}^T\mathbf{x} = \sum_i\sum_j x_i W_{ij} x_j - 2\sum_i b_i x_i .$$

It can be proven that the Hopfield network gives a very powerful solution for maximum search: its transient time is much lower than that of the conventional exhaustive search, $O(N^2) \ll O(2^N)$.
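To make the stability claim tangible, the sketch below tracks $L(\mathbf{y}) = \mathbf{y}^T\mathbf{W}\mathbf{y} - 2\mathbf{b}^T\mathbf{y}$ along the sequential recursion. The symmetric weight matrix with non-negative diagonal and the bias are assumptions made for the demo; in line with the table above, the per-step change never becomes negative and the recursion freezes in a fixed point.

```python
import numpy as np

def lyapunov(y, W, b):
    """L(y) = y^T W y - 2 b^T y, the quadratic form of the notes."""
    return y @ W @ y - 2 * b @ y

rng = np.random.default_rng(2)
N = 10
A = rng.standard_normal((N, N))
W = (A + A.T) / 2
np.fill_diagonal(W, np.abs(np.diag(W)))     # non-negative diagonal (assumption)
b = rng.standard_normal(N)

y = rng.choice([-1, 1], size=N)
values = [lyapunov(y, W, b)]
for k in range(20 * N):                      # sequential index rule: l = k mod N
    l = k % N
    y[l] = 1 if W[l] @ y - b[l] >= 0 else -1
    values.append(lyapunov(y, W, b))

diffs = np.diff(values)
print("minimum per-step change of L:", diffs.min())          # expected to be >= 0
print("last step at which L changed:", int(np.nonzero(diffs)[0].max() + 1) if diffs.any() else 0)
```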
Associative memory

[Figure: the retrieval phase of the associative memory, where a clue is mapped by the network onto one of the stored patterns.]

Given a set of patterns $S = \{\mathbf{s}^{(\mu)} : \mu = 1,\dots,M\}$, how do we construct $\mathbf{W}$ such that
$$\mathbf{s}^{(\mu)} = \mathrm{sgn}\left(\mathbf{W}\mathbf{s}^{(\mu)}\right), \qquad \mu = 1,\dots,M \quad \text{(learning)?}$$
What is the maximum $M$ as a function of the state-space dimension (number of neurons), $M = f(N)$? (capacity)
1.2.1.1 The Hebbian learning-rule
The construction of the weight matrix is done in accordance with the Hebbian learning rule:

Given $S = \{\mathbf{s}^{(\mu)} : \mu = 1,\dots,M\}$, then
$$W_{ij} = \frac{1}{N}\sum_{\mu=1}^{M} s_i^{(\mu)} s_j^{(\mu)} . \qquad (1)$$
To check whether a stored pattern is a fixed point, substitute it into the state transition rule:
$$\mathrm{sgn}\left(\sum_{j=1}^{N} W_{ij}\, s_j^{(\nu)}\right) = \mathrm{sgn}\left(\frac{1}{N}\sum_{j=1}^{N}\sum_{\mu=1}^{M} s_i^{(\mu)} s_j^{(\mu)} s_j^{(\nu)}\right) = \mathrm{sgn}\left(s_i^{(\nu)} + \delta_i^{(\nu)}\right),$$
where
$$\delta_i^{(\nu)} := \frac{1}{N}\sum_{\mu\neq\nu} s_i^{(\mu)} \sum_{j=1}^{N} s_j^{(\mu)} s_j^{(\nu)} .$$
Apparently, if $\left|\delta_i^{(\nu)}\right| < 1$ for all $i$ and $\nu$, then $s_i^{(\nu)} = \mathrm{sgn}\left(s_i^{(\nu)} + \delta_i^{(\nu)}\right)$ holds. This casts the problem as a coding task, defined as follows: construct a set $S = \{\mathbf{s}^{(\mu)} : \mu = 1,\dots,M\}$ in such a way that
$$\left|\delta_i^{(\nu)}\right| = \left|\frac{1}{N}\sum_{\mu\neq\nu}\sum_{j=1}^{N} s_i^{(\mu)} s_j^{(\mu)} s_j^{(\nu)}\right| < 1 \quad \text{holds for all } i, \nu .$$
The maximum number of such vectors M yields the capacity of the Hopfield net.
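A small sketch of the Hebbian construction (1) and of the cross-talk term $\delta_i^{(\nu)}$ follows; the sizes and the randomly drawn patterns are assumptions made purely for demonstration.

```python
import numpy as np

def hebbian_weights(S):
    """W_ij = (1/N) * sum_mu s_i^(mu) s_j^(mu), equation (1)."""
    M, N = S.shape
    return (S.T @ S) / N

def crosstalk(S, nu, i):
    """delta_i^(nu) = (1/N) * sum_{mu != nu} s_i^(mu) * sum_j s_j^(mu) s_j^(nu)."""
    M, N = S.shape
    overlaps = S @ S[nu]                       # sum_j s_j^(mu) s_j^(nu) for every mu
    overlaps[nu] = 0                           # exclude mu = nu
    return (S[:, i] @ overlaps) / N

rng = np.random.default_rng(0)
N, M = 200, 10
S = rng.choice([-1, 1], size=(M, N))           # M random patterns of dimension N
W = hebbian_weights(S)

# pattern nu is a fixed point iff |delta_i^(nu)| < 1 for every component i
deltas = np.array([[crosstalk(S, nu, i) for i in range(N)] for nu in range(M)])
print("largest |delta|:", np.abs(deltas).max())
print("all patterns stable:", bool(np.all(np.sign(S @ W.T) == S)))
```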
1.2.1.2.1 An easy approach to static capacity:
Let us set $\delta_i^{(\nu)} = 0$ for all $i,\nu$, which can easily be achieved by using an orthogonal vector set:
$$\sum_{j=1}^{N} s_j^{(\mu)} s_j^{(\nu)} = 0, \qquad \mu \neq \nu ,$$
in which case $s_i^{(\nu)} = \mathrm{sgn}\left(s_i^{(\nu)}\right)$ indeed holds. However, the maximum number of orthogonal vectors in an $N$-dimensional space is $N$, which yields a maximal capacity of
$$\mathrm{Cap}_{\mathrm{stat}} = M_{\max} = N .$$
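One concrete way to realise such an orthogonal $\pm 1$ set is to take rows of a Hadamard matrix. The sketch below (using scipy's Hadamard construction, an implementation choice rather than anything from the notes) verifies that every stored pattern is a fixed point of the sign mapping.

```python
import numpy as np
from scipy.linalg import hadamard

N = 64                                  # must be a power of two for this construction
H = hadamard(N)                         # rows are mutually orthogonal +-1 vectors
S = H[:8]                               # store M = 8 of the N possible orthogonal patterns
W = (S.T @ S) / N                       # Hebbian rule (1)

print("pairwise inner products:\n", S @ S.T)             # N on the diagonal, 0 elsewhere
print("every stored pattern is a fixed point:", bool(np.all(np.sign(S @ W) == S)))
```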
With the Hebbian weight matrix and an orthogonal pattern set, the value of the quadratic form at a stored pattern is
$$\mathbf{s}^{(\nu)T}\mathbf{W}\mathbf{s}^{(\nu)} = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N} s_i^{(\nu)} \sum_{\mu=1}^{M} s_i^{(\mu)} s_j^{(\mu)} s_j^{(\nu)} = \frac{1}{N}\sum_{\mu=1}^{M}\left(\sum_{i=1}^{N} s_i^{(\nu)} s_i^{(\mu)}\right)^2 = \frac{1}{N}\left(\sum_{i=1}^{N} s_i^{(\nu)} s_i^{(\nu)}\right)^2 = N ,$$
while for an arbitrary state $\mathbf{y}$
$$\mathbf{y}^T\mathbf{W}\mathbf{y} = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N} y_i \sum_{\mu=1}^{M} s_i^{(\mu)} s_j^{(\mu)} y_j = \frac{1}{N}\sum_{\mu=1}^{M}\left(\sum_{i=1}^{N} y_i s_i^{(\mu)}\right)^2 \le N ,$$
so the stored patterns are global maxima of the quadratic form.
Remark 1:
Note that the inner product of two vectors $\mathbf{a}^T\mathbf{b}$, where $\mathbf{a},\mathbf{b} \in \{-1,1\}^N$, can be expressed with their Hamming distance as $\mathbf{a}^T\mathbf{b} = N - 2d(\mathbf{a},\mathbf{b})$, i.e. it measures how far a state deviates from a stored pattern.
Thus, the condition for stable steady states in the case of orthogonal coding reduces to
$$N > \frac{1}{N}\sum_{\mu=1}^{M}\left(N - 2d\left(\mathbf{s}^{(\mu)},\mathbf{y}\right)\right)^2 \quad \text{for } \mathbf{y} \in \{-1,1\}^N \setminus S . \qquad (4)$$
If this condition is neglected, then spurious steady states can occur, meaning that the Hopfield algorithm can stop in states $\mathbf{y}$ which do not belong to the pre-selected set of patterns, $\mathbf{y} \notin S = \{\mathbf{s}^{(\mu)} : \mu = 1,\dots,M\}$.
Remark 2:
A heuristic estimate is obtained by upper bounding the right-hand side of inequality (4) as
$$\frac{1}{N}\sum_{\mu=1}^{M}\left(N - 2d\left(\mathbf{s}^{(\mu)},\mathbf{y}\right)\right)^2 \le \frac{M}{N}\left(N - 2d\right)^2 ,$$
where $d$ denotes the smallest Hamming distance between $\mathbf{y}$ and the stored patterns. Requiring $N > \frac{M}{N}(N-2d)^2$ leads to a very small capacity,
$$\mathrm{Cap}_{\mathrm{dynamic}} = \frac{N^2}{(N-2d)^2} \ll N .$$
In this way, the event $\mathbf{s}^{(\nu)} = \mathrm{sgn}\left\{\mathbf{W}\mathbf{s}^{(\nu)}\right\}$ only holds with a certain probability. As a result, we define the capacity as the function $M = M(N)$ which still guarantees that
$$\lim_{N\to\infty} P\left(\mathbf{s}^{(\nu)} = \mathrm{sgn}\left\{\mathbf{W}\mathbf{s}^{(\nu)}\right\}\right) = 1, \qquad \nu = 1,\dots,M .$$
Writing the event component-wise,
$$P\left(\mathbf{s}^{(\nu)} = \mathrm{sgn}\left\{\mathbf{W}\mathbf{s}^{(\nu)}\right\}\right) = P\left(\bigcap_{i=1}^{N}\left\{s_i^{(\nu)} = \mathrm{sgn}\left(\sum_{j=1}^{N} W_{ij}\, s_j^{(\nu)}\right)\right\}\right) = \prod_{i=1}^{N} P\left(s_i^{(\nu)} = \mathrm{sgn}\left(\sum_{j=1}^{N} W_{ij}\, s_j^{(\nu)}\right)\right).$$
Let us first investigate one component:
$$P\left(s_i^{(\nu)} = \mathrm{sgn}\left(\sum_{j=1}^{N} W_{ij}\, s_j^{(\nu)}\right)\right) = P\left(s_i^{(\nu)} = \mathrm{sgn}\left(s_i^{(\nu)} + \delta_i^{(\nu)}\right)\right).$$
For independent random patterns $\delta_i^{(\nu)}$ is approximately Gaussian with zero mean and variance $(M-1)/N \approx M/N$, therefore
$$P\left(s_i^{(\nu)} = \mathrm{sgn}\left(s_i^{(\nu)} + \delta_i^{(\nu)}\right)\right) = P\left(\delta_i^{(\nu)} > -1\right) = \Phi\left(\sqrt{\frac{N}{M}}\right) = 0.5\left(1 + \mathrm{erf}\left(\sqrt{\frac{N}{2M}}\right)\right).$$
Hence
$$P\left(\mathbf{s}^{(\nu)} = \mathrm{sgn}\left\{\mathbf{W}\mathbf{s}^{(\nu)}\right\}\right) = \Phi\left(\sqrt{\frac{N}{M}}\right)^{N},$$
and we want to find the function $M = M(N)$ for which
$$\lim_{N\to\infty} \Phi\left(\sqrt{\frac{N}{M}}\right)^{N} = 1, \quad \text{which is the same as} \quad \lim_{N\to\infty} N\log\Phi\left(\sqrt{\frac{N}{M}}\right) = 0 .$$
One can recall an approximation formula of $\Phi(x)$ for large $x$:
$$\Phi(x) \approx 1 - \frac{1}{x\sqrt{2\pi}}\, e^{-\frac{x^2}{2}},$$
which yields
$$\lim_{N\to\infty} N\log\left(1 - \sqrt{\frac{M}{2\pi N}}\, e^{-\frac{N}{2M}}\right) = -\lim_{N\to\infty} N\sqrt{\frac{M}{2\pi N}}\, e^{-\frac{N}{2M}} .$$
If we choose
$$M = \frac{N}{2\log N},$$
then $e^{-\frac{N}{2M}} = e^{-\log N} = 1/N$ and the expression above still tends to zero, so the dynamic capacity of the Hopfield network is $M = N/(2\log N)$.
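The $N/(2\log N)$ rule can be probed empirically. The sketch below gives a rough Monte Carlo estimate of the probability that a stored pattern is reproduced exactly in one synchronous step, for $M$ below, at, and above $N/(2\ln N)$; the sizes and trial counts are arbitrary assumptions.

```python
import numpy as np

def retrieval_probability(N, M, trials=100, seed=0):
    """Estimate P(s^(nu) == sgn(W s^(nu))) for Hebbian storage of M random patterns."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        S = rng.choice([-1, 1], size=(M, N))
        W = (S.T @ S) / N
        hits += bool(np.all(np.sign(W @ S[0]) == S[0]))   # test the first stored pattern
    return hits / trials

N = 300
M_rule = int(N / (2 * np.log(N)))                         # about N / (2 ln N)
for M in (M_rule // 2, M_rule, 2 * M_rule):
    print(f"N = {N}, M = {M:3d}: P(exact retrieval) ~ {retrieval_probability(N, M):.2f}")
```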
At the micro-state level the recursion is
$$\mathbf{y}(k+1) = \mathrm{sgn}\left\{\mathbf{W}\mathbf{y}(k)\right\}.$$
On the other hand, we define macro-states of the system as statistical averages of $\mathbf{y}(k)$, given in the form $a(k) = \mathrm{Av}(\mathbf{y}(k))$. Our concern is to derive the state transition rule of the macro-states, i.e. to arrive at a recursion $a(k+1) = \varphi(a(k))$ based on the micro-state dynamics; in this case we say that the system is described at the macro-state level. This approach is often referred to as mean field theory.
Let us introduce the direction cosine as a macro-state, given by the following form:
$$a(k) := \cos\left(\mathbf{y}(k),\mathbf{s}^{(\nu)}\right) = \frac{1}{N}\sum_{i=1}^{N} s_i^{(\nu)} y_i(k) .$$
First we would like to derive the recursion $a(k+1) = \varphi(a(k))$; then we would like to prove that $a(k+1) = \varphi(a(k))$ is a noncontractive mapping, namely $a(k+1) > a(k)$, which means that $a(k) \to 1$ and $\mathbf{s}^{(\nu)}$ is a stable steady state. Without loss of generality, we assume that $\mathbf{s}^{(\nu)} = (1,1,\dots,1)$.
Let us start with
$$a(0) := \frac{1}{N}\sum_{i=1}^{N} s_i^{(\nu)} y_i(0) = \frac{1}{N}\sum_{i=1}^{N} y_i(0)$$
and investigate
$$a(1) = \frac{1}{N}\sum_{i=1}^{N} s_i^{(\nu)} y_i(1) = \frac{1}{N}\sum_{i=1}^{N} y_i(1) .$$
For large $N$,
$$\frac{1}{N}\sum_{i=1}^{N} y_i(1) \approx E\, y_i(1) = 1\cdot P\left(y_i(1)=1\right) - 1\cdot P\left(y_i(1)=-1\right) = P\left(\mathrm{sgn}\{a(0)+\delta_i(0)\} = 1\right) - P\left(\mathrm{sgn}\{a(0)+\delta_i(0)\} = -1\right),$$
where $\delta_i(0)$ is the Gaussian cross-talk noise with standard deviation $\sigma$. Hence
$$a(1) = 2\Phi\left(\frac{a(0)}{\sigma}\right) - 1,$$
and, in general,
$$a(k+1) = 2\Phi\left(\frac{a(k)}{\sigma}\right) - 1 . \qquad (1.1)$$
If $2\Phi(x/\sigma) - 1 > x$ for all $x \in (0,1)$, then $a(k+1) > a(k)$ and the macro-state converges to $1$, i.e. the recursion is driven towards the stored pattern; otherwise $a(k) > a(k+1) > a(k+2) > \dots$ and the recursion drifts away from it.

[Figure: cobweb diagrams of the recursion $a(k+1) = 2\Phi(a(k)/\sigma) - 1$ in the convergent case ($a(0) < a(1) < a(2) < \dots$) and in the divergent case ($a(0) > a(1) > a(2) > \dots$).]
A sufficient condition for this noncontractive behavior is that the slope of $\varphi$ at the origin exceeds $\mathrm{tg}\,\frac{\pi}{4} = 1$:
$$\left.\frac{d\,\varphi(a)}{da}\right|_{a=0} = \frac{2}{\sigma\sqrt{2\pi}}\, e^{0} = \sqrt{\frac{2}{\pi}}\cdot\frac{1}{\sigma} > 1, \qquad \text{i.e.} \qquad \sigma^2 < \frac{2}{\pi}.$$
This yields very strict conditions when applying the Hopfield net as an associative memory.
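The mean-field recursion (1.1) and the $\sigma^2 < 2/\pi$ threshold can be explored with a few lines of code; the noise levels and step counts below are illustrative assumptions.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def iterate_macro_state(a0, sigma, steps=50):
    """Mean-field recursion (1.1): a(k+1) = 2*Phi(a(k)/sigma) - 1."""
    a = a0
    for _ in range(steps):
        a = 2.0 * phi(a / sigma) - 1.0
    return a

threshold = math.sqrt(2.0 / math.pi)      # slope condition: sigma < sqrt(2/pi)
for sigma in (0.5 * threshold, 1.5 * threshold):
    a_final = iterate_macro_state(a0=0.2, sigma=sigma)
    print(f"sigma = {sigma:.3f} (threshold {threshold:.3f}): a(50) = {a_final:.3f}")
```

Below the threshold the macro-state climbs towards 1 (the pattern is retrieved); above it the recursion collapses towards 0.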
7. Figure The Hopfield network as an associative memory can be used for pattern recognition, where the stored patterns are the stable points of the network.

8. Figure The Hopfield network as a universal solver for the pattern recognition problem
The Hopfield network can be used for image recognition if it is used as an associative memory, where the stored images are the patterns to be recognized. The only problem is that the size of the network increases drastically with the number of images.

[Figure: bitmap image presented to the HNN (holding the stored images), which returns the recalled image.]

The question is how to construct the network in such a way that it always converges to one of the stored patterns, and that this pattern is the right one. In Figure 11 one can follow how pattern recognition works well, and in Figure 12 how it misses the characters and converges to wrong steady states.
11. Figure One can see how the Hopfield network associates to the correct characters: a) output of the HNN, b) the character to be recognized, c) the character corrupted by noise, d) the noise

Although in Figure 11 one can follow the work of the HNN as a good character recognizer, we must consider that the probability of recognizing the right number is very low. This can be explained by the very large correlation between the characters. In our case the noise power was taken to be 3 dB. Number 1 was almost always recognized well (the error probability was about $10^{-3}$). For numbers 0, 2 and 3 the match probability was only 0.1, which is extremely low, but can be explained by the high similarity of these numbers.
12. Figure Examples of bad character recognition: a) output of the HNN, b) the character to be recognized, c) the character corrupted by noise, d) the noise. Note that in the case of recognizing digit one, the noise energy was increased.

It has been seen that the HNN can get stuck in unwanted steady states for this kind of recognition problem. The question is how to solve this. The idea of coding the stored patterns can give better results: the point is to construct a transformation that increases the attraction area (see Figure 12) and decreases the probability of mismatch, so that there are only as many steady states as there are characters to be recognized.
The capacity of the network is determined as $M = \frac{N}{2\log N}$, where $N$ is the dimension of the patterns.
In practice the memory items $S = \{\mathbf{s}^{(\mu)}, \mu = 1,\dots,M\}$ are not independent, so we have to encode the original $S$ into $S'$ which satisfies the quasi-orthogonality. The coder $F: S \to S'$ must be topologically invariant and of small complexity.
Coder
) (
d x,r s
Hopfield
Network
s ()
Random patterns
-1
r ()
{(s ,r ) = 1,2,..., M}
Storage phase
Real patterns
s ()
=1,2,...,M
r ()
x'i = sgn M ij x j
j =1
This relation assures a small complexity mapping that can easily be calculated. The decoder
could be the (Moore-Penrose) inverse matrix of F.
d ( x , s ) d ( x , s ), = 1..M,
The coder output for an input vector $\mathbf{x}$, with the cross-correlation matrix $M_{ij} = \frac{1}{N}\sum_{\mu=1}^{M} s'^{(\mu)}_i s^{(\mu)}_j$, is
$$x'_i = \mathrm{sgn}\left(\frac{1}{N}\sum_{\mu=1}^{M} s'^{(\mu)}_i \sum_{j=1}^{N} s^{(\mu)}_j x_j\right) = \mathrm{sgn}\left(s'^{(\nu)}_i\,\frac{1}{N}\sum_{j=1}^{N} s^{(\nu)}_j x_j + \delta_i\right),$$
where
$$\delta_i = \frac{1}{N}\sum_{\mu\neq\nu} s'^{(\mu)}_i \sum_{j=1}^{N} s^{(\mu)}_j x_j .$$
Note that the inner product of two vectors $\mathbf{a}^T\mathbf{b}$, where $\mathbf{a},\mathbf{b} \in \{-1,1\}^N$, can be expressed with their Hamming distance as $\mathbf{a}^T\mathbf{b} = \sum_{i=1}^{N} a_i b_i = N - 2d(\mathbf{a},\mathbf{b})$, where $d(\mathbf{a},\mathbf{b})$ denotes the Hamming distance.
Using this remark,
$$x'_i = \mathrm{sgn}\left\{\frac{N - 2d(\mathbf{x},\mathbf{s}^{(\nu)})}{N} + \delta_i\right\},$$
and the normalized overlap between the coder output and the coded pattern becomes
$$\frac{1}{N}\sum_{i=1}^{N} x'_i\, s'^{(\nu)}_i \approx E\left[x'_i\, s'^{(\nu)}_i\right] = 1\cdot P\left(x'_i = s'^{(\nu)}_i\right) - 1\cdot P\left(x'_i = -s'^{(\nu)}_i\right) =$$
$$= P\left(\delta_i > -\frac{N - 2d(\mathbf{x},\mathbf{s}^{(\nu)})}{N}\right) - P\left(\delta_i < -\frac{N - 2d(\mathbf{x},\mathbf{s}^{(\nu)})}{N}\right) = 2\Phi\left(\frac{N - 2d(\mathbf{x},\mathbf{s}^{(\nu)})}{\sqrt{N(M-1)}}\right) - 1 .$$
The Hamming distance between $\mathbf{x}'$ and $\mathbf{s}'^{(\nu)}$ will therefore be
$$d(\mathbf{x}',\mathbf{s}'^{(\nu)}) = N\left(1 - \Phi\left(\frac{N - 2d(\mathbf{x},\mathbf{s}^{(\nu)})}{\sqrt{N(M-1)}}\right)\right).$$
So the requirement $d(\mathbf{x}',\mathbf{s}'^{(\nu)}) \le d(\mathbf{x}',\mathbf{s}'^{(\mu)})$ is equivalent to the monotonicity of the mapping (see Figure 16)
$$x \mapsto N\left(1 - \Phi\left(\frac{N - 2x}{\sqrt{N(M-1)}}\right)\right), \qquad x \in (0,N) .$$
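Treating the expression above as given, a short numerical check of the distance map $g(x) = N\left(1 - \Phi\left((N-2x)/\sqrt{N(M-1)}\right)\right)$ confirms that it is monotonically increasing on $(0, N)$; the sizes used are arbitrary assumptions.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def transformed_distance(x, N, M):
    """g(x) = N * (1 - Phi((N - 2x) / sqrt(N*(M-1))))."""
    return N * (1.0 - phi((N - 2.0 * x) / math.sqrt(N * (M - 1))))

N, M = 256, 10                                     # illustrative sizes
xs = range(0, N + 1, 8)
values = [transformed_distance(x, N, M) for x in xs]
print("monotonically increasing:", all(b >= a for a, b in zip(values, values[1:])))
print("g(0) =", round(values[0], 2), " g(N) =", round(values[-1], 2))
```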
The statistical gain $G$ strongly depends on $M$: if $M$ is small, $G$ will be large, while the optimal value of $G$ may converge to 1.
18. Figure a) The block diagram of the surveillance system (cameras, audio and video signals, preprocessing and the HN blocks), b) the Hopfield network in the surveillance system as an associative memory
Optimization problem

[Figure: an optimization problem $\mathrm{opt}_{\mathbf{y}}\, L(\mathbf{y})$ is cast as quadratic programming and solved by the HN.]

Given the quadratic optimization task
$$\mathbf{y}_{\mathrm{opt}}: \min_{\mathbf{y}\in\{-1,1\}^N}\ \mathbf{y}^T\mathbf{W}\mathbf{y} - 2\mathbf{b}^T\mathbf{y} ,$$
and the task solved by the Hopfield network with the modified (zero-diagonal) weight matrix $\tilde{\mathbf{W}}$, denoted by $\tilde{\mathbf{y}}_{\mathrm{opt}}$, the following equality should be examined: $\mathbf{y}_{\mathrm{opt}} = \tilde{\mathbf{y}}_{\mathrm{opt}}$.
y{1,1}
y W
i
ij
2
~
y j 2 bi yi = yiWij y j + Wii yi 2 bi yi =Wij yi y j + Wii 2 bi yi =
i
j
i j
independent from y
~
~
= const + y T Wy 2b T y = y T Wy 2b T y
As the minimization is independent of the diagonal elements of the weight matrix, the minima of this function are exactly the same as the minima of the original Lyapunov function. This leads to a modified state transition rule:
$$y_l(k+1) = \mathrm{sgn}\left(\sum_{j=1}^{N} \tilde{W}_{lj}\, y_j(k) - b_l\right).$$
As the minimization is the same for both Lyapunov functions, the global upper and lower bounds are also the same for the modified network:

Global upper bound: $N\|\mathbf{W}\| + 2\sqrt{N}\,\|\mathbf{b}\|$

Global lower bound: $-\mathbf{b}^T\mathbf{W}^{-1}\mathbf{b}$
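The statement that cancelling the diagonal only shifts the Lyapunov function by a constant (namely $\sum_i W_{ii}$) is easy to verify numerically; the test data below is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 12
A = rng.standard_normal((N, N))
W = (A + A.T) / 2
b = rng.standard_normal(N)

W_tilde = W.copy()
np.fill_diagonal(W_tilde, 0.0)                     # modified (zero-diagonal) weight matrix

L = lambda y, Wm: y @ Wm @ y - 2 * b @ y
ys = rng.choice([-1, 1], size=(1000, N))
gaps = np.array([L(y, W) - L(y, W_tilde) for y in ys])

# The difference equals trace(W) for every y in {-1,1}^N, so both forms share their minima.
print("constant gap:", bool(np.allclose(gaps, np.trace(W))))
```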
To examine the behavior of the HN as an optimizer, the change of the Lyapunov function must be investigated again. As the diagonal elements are cancelled, only the second part of the expression is taken into account, so the change in one step is:

| $y_l(k)$ | $y_l(k+1)$ | $\Delta y_l(k)$ | $\Delta y_l(k)\left(\sum_j \tilde{W}_{lj} y_j(k) - b_l\right)$ | $\Delta L(\mathbf{y}(k))$ |
|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 |
| -1 | -1 | 0 | 0 | 0 |
| -1 | 1 | 2 | $\ge 2\varepsilon$ | $\ge 4\varepsilon$ |
| 1 | -1 | -2 | $\ge 2\varepsilon$ | $\ge 4\varepsilon$ |
Not every optimization problem lends itself to a quadratic (polynomial) formulation. There are some problems, however, which do yield themselves to quadratic optimization and can thus be solved by the HN. One of the most important classes of such problems is the routing problem, or traveling salesman problem (TSP) as it is known in the literature. In the following sections we will investigate this problem closely.
How to use the HN as an optimizer:

20. Figure The philosophy of quadratic programming: a combinatorial (NP-hard) optimization problem $\mathbf{y}: \min L(\mathbf{y})$ is given a binary data representation, mapped to a quadratic optimization task, solved by the HN, and the result is mapped back (de-representation) to obtain the optimal solution.
Limitations:

The task is to find an optimal path (a sequence of cities visited one after another), described by a sequence such as $\{B, A, D, N\}$, which yields a minimum-distance path. In this way an associated cost function which depends on the distance (e.g. the price of fuel associated with the travel) is minimized. Figure 21 shows some solutions in the case of 5 cities.
21. Figure The traveling salesman problem for five cities (A, B, C, D, E, with pairwise distances such as $d_{AB}$): a) the fully connected graph, b) a suboptimal solution and c) the optimal solution

It is easy to see that the third solution is more favourable than the second one.
The path can be represented by an $N \times N$ matrix $\mathbf{V} = [v_{ij}]$ indexed by time instants and cities, where $v_{ij} \in \{0,1\}$ indicates whether at time instant $i$ the salesman is in city $j$: $v_{ij} = 0$ if not, $v_{ij} = 1$ if yes.
The length of the tour can then be expressed as
$$L(\mathbf{V}) = \sum_i\sum_j\sum_{l\neq j} v_{ij}\, v_{i+1,l}\, D_{jl} , \qquad (1.2)$$
where $D_{jl}$ denotes the distance between cities $j$ and $l$. Simply minimizing this function, however, one can find a solution which does not fulfil the properties of a valid path listed above. Thus those properties must be built into the optimization function as constraints.
Calculation of the goal function to be minimized, including the constraints

As was mentioned above, the goal function must be extended to include the constraints. Therefore, instead of (1.2), we use the following function (which is to be minimized):
$$\phi(\mathbf{V}) = \sum_i\left(\sum_j v_{ij} - 1\right)^2 + \sum_j\left(\sum_i v_{ij} - 1\right)^2 + \sum_i\sum_j\sum_{l\neq j} v_{ij}\, v_{i+1,l}\, D_{jl} . \qquad (1.3)$$
The first term forces the salesman to be in exactly one city at each time instant, the second term forces each city to be visited exactly once, and the third term is the length of the tour.
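To make the role of the terms in (1.3) concrete, the sketch below evaluates the constraint penalties and the tour length separately for a valid tour (a permutation matrix $\mathbf{V}$) and for an invalid assignment. The distance matrix and city count are arbitrary assumptions, and in practical formulations the penalty terms usually carry additional weighting coefficients so that they dominate the tour-length term.

```python
import numpy as np

def tsp_goal(V, D):
    """Constraint penalties and tour length of (1.3); the tour is closed by wrapping i+1 modulo N."""
    row_penalty = np.sum((V.sum(axis=1) - 1.0) ** 2)   # exactly one city per time instant
    col_penalty = np.sum((V.sum(axis=0) - 1.0) ** 2)   # each city visited exactly once
    N = V.shape[0]
    tour = sum(V[i, j] * V[(i + 1) % N, l] * D[j, l]
               for i in range(N) for j in range(N) for l in range(N) if l != j)
    return row_penalty + col_penalty, tour

rng = np.random.default_rng(4)
N = 5
D = rng.uniform(1, 10, size=(N, N))
D = (D + D.T) / 2
np.fill_diagonal(D, 0.0)

valid = np.eye(N)[rng.permutation(N)]              # permutation matrix: a valid tour
invalid = np.zeros((N, N)); invalid[0, :] = 1.0    # "everywhere at once" at time 0
for name, V in (("valid", valid), ("invalid", invalid)):
    penalty, tour = tsp_goal(V, D)
    print(f"{name:8s} constraint penalty = {penalty:5.1f}, tour length = {tour:5.1f}")
```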
The $N \times N$ matrix $\mathbf{V}$ is then flattened into an $N^2$-dimensional vector $\mathbf{y} \in \{-1,1\}^{N^2}$ through the substitution $v_{ij} = \frac{y_{Ni+j}}{2} + 0.5$. Substituting this into (1.3),
$$\phi(\mathbf{y}) = \sum_i\left(\sum_n\left(\frac{y_{Ni+n}}{2} + 0.5\right) - 1\right)^2 + \sum_n\left(\sum_i\left(\frac{y_{Ni+n}}{2} + 0.5\right) - 1\right)^2 + \sum_i\sum_j\sum_{l\neq j}\left(\frac{y_{Ni+j}}{2} + 0.5\right)\left(\frac{y_{N(i+1)+l}}{2} + 0.5\right) D_{jl} =$$
$$= \mathbf{y}^T\mathbf{M}\mathbf{y} - 2\mathbf{c}^T\mathbf{y} + \mathrm{const} ,$$
i.e. expanding the squares and products yields a quadratic form in $\mathbf{y}$ with an appropriately constructed matrix $\mathbf{M}$ and vector $\mathbf{c}$.
As the derivation above indicates, the problem has been reduced to the optimization of a quadratic form, which is the Lyapunov function of the Hopfield algorithm:
$$\mathbf{y}_{\mathrm{opt}}: \min_{\mathbf{y}}\ \phi(\mathbf{y}) = \min_{\mathbf{y}}\ \mathbf{y}^T\mathbf{M}\mathbf{y} - 2\mathbf{c}^T\mathbf{y} .$$
The corresponding state transition rule is
$$y_e(k+1) = \mathrm{sgn}\left(\sum_{j=1}^{N^2} M_{ej}\, y_j(k) - c_e\right),$$
and the change of the Lyapunov function in one step is
$$\Delta\phi(k) = \left(\Delta y_e(k)\right)^2 M_{ee} + 2\,\Delta y_e(k)\left(\sum_{j=1}^{N^2} M_{ej}\, y_j(k) - c_e\right),$$
the sign of which guarantees the convergence of the recursion.
j=1
Coder
Channel
Distortion
x
Clue
HNN
c , = 1,2,..., M
21
Channel
Max
likelihood
~y
is: x = Hy + , where y {-1,1} is the sent signal, x R , H is the channel matrix, and is
the additional noise.
Assuming Gaussian noise, the likelihood of the received vector is
$$P(\mathbf{x}\,|\,\mathbf{y}) = \frac{1}{\sqrt{(2\pi)^N \det \mathbf{K}}}\, e^{-\frac{1}{2}(\mathbf{x}-\mathbf{H}\mathbf{y})^T\mathbf{K}^{-1}(\mathbf{x}-\mathbf{H}\mathbf{y})} ,$$
so the maximum-likelihood detection task becomes
$$\tilde{\mathbf{y}}: \max_{\mathbf{y}\in\{-1,1\}^N} P(\mathbf{x}\,|\,\mathbf{y}) = \min_{\mathbf{y}\in\{-1,1\}^N} (\mathbf{x}-\mathbf{H}\mathbf{y})^T\mathbf{K}^{-1}(\mathbf{x}-\mathbf{H}\mathbf{y}) = \min_{\mathbf{y}\in\{-1,1\}^N}\ \mathbf{y}^T\mathbf{W}\mathbf{y} - 2\mathbf{b}^T\mathbf{y} .$$
Here $\mathbf{K}$ is the covariance matrix of the noise, $\mathbf{W} = \mathbf{H}^T\mathbf{K}^{-1}\mathbf{H}$ and $\mathbf{b} = \mathbf{H}^T\mathbf{K}^{-1}\mathbf{x}$. Unfortunately, solving this problem by exhaustive search has exponential complexity, which means it cannot be done in real time. Therefore a suboptimal solution is needed which guarantees a real-time solution. Such an outcome can be assured using the Hopfield net; the detection scheme based on the HN can be followed in Figure 22.
The weight matrix used in the recursion is the zero-diagonal modification of $\mathbf{W}$,
$$\tilde{W}_{lj} = \begin{cases} W_{lj}, & l \neq j \\ 0, & l = j , \end{cases}$$
and the detector runs the recursion
$$y_l(k+1) = \mathrm{sgn}\left(\sum_{j=1}^{N} \tilde{W}_{lj}\, y_j(k) - b_l\right)$$
with complexity $O(N^2)$ per sweep, yielding the estimate $\tilde{\mathbf{y}}$.
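A compact end-to-end sketch of such a detector follows. The channel matrix, noise level and problem sizes are arbitrary assumptions, the exhaustive search is included only to cross-check the result on a small problem, and the coordinate update is written in the direction that decreases $(\mathbf{x}-\mathbf{H}\mathbf{y})^T\mathbf{K}^{-1}(\mathbf{x}-\mathbf{H}\mathbf{y})$.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(5)
N = 8
H = rng.standard_normal((N, N))                     # channel matrix (assumed known)
K = np.eye(N)                                       # white-noise covariance (assumption)
y_sent = rng.choice([-1, 1], size=N)
x = H @ y_sent + 0.3 * rng.standard_normal(N)       # received signal

Kinv = np.linalg.inv(K)
W = H.T @ Kinv @ H
b = H.T @ Kinv @ x
W_tilde = W.copy()
np.fill_diagonal(W_tilde, 0.0)                      # cancelled diagonal

# Hopfield-type recursion with the sequential index rule, O(N^2) per sweep;
# each coordinate is set to the value that decreases the quadratic form.
y = rng.choice([-1, 1], size=N)
for k in range(50 * N):
    l = k % N
    y[l] = 1 if b[l] - W_tilde[l] @ y >= 0 else -1

# Exhaustive ML search (exponential cost, only feasible for small N)
best = min(product([-1, 1], repeat=N),
           key=lambda v: (x - H @ np.array(v)) @ Kinv @ (x - H @ np.array(v)))

print("sent:      ", y_sent)
print("Hopfield:  ", y)
print("exhaustive:", np.array(best))
```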