[Figure: architecture of the Hopfield network, a single layer of $N$ fully interconnected threshold neurons with weights $w_{ij}$, biases $b_1,\dots,b_N$ and outputs $y_1,\dots,y_N$.]
the output vector is a vector of $\pm 1$ components in the $N$-dimensional space, $\mathbf{y} \in \{-1,1\}^N$. Starting from an initial state, the network stabilizes for a given input, where the update of the states can be driven by different index rules:

- Sequential index rule: the output to be updated is chosen as $l = k \bmod N$.
- Probabilistic index rule: the output to be updated is selected by a given random number generator.

The state transition rule for both index rules is
$$y_l(k+1) = \mathrm{sgn}\left(\sum_{j=1}^{N} W_{lj}\, y_j(k) - b_l\right).$$
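As a quick illustration, the following Python sketch implements the recursion with both index rules. The weight matrix, bias vector and initial state chosen at the bottom are arbitrary assumptions for demonstration, not data taken from these notes.

```python
import numpy as np

def hopfield_step(y, W, b, l):
    """Update the l-th component: y_l <- sgn(sum_j W_lj * y_j - b_l)."""
    activation = W[l] @ y - b[l]
    y[l] = 1 if activation >= 0 else -1      # ties broken towards +1
    return y

def run_hopfield(W, b, y0, steps=200, rule="sequential", seed=0):
    """Run the recursion with the sequential (l = k mod N) or the probabilistic index rule."""
    rng = np.random.default_rng(seed)
    y = y0.copy()
    N = len(y)
    for k in range(steps):
        l = k % N if rule == "sequential" else int(rng.integers(N))
        y = hopfield_step(y, W, b, l)
    return y

# Arbitrary symmetric weight matrix and bias (illustrative assumption)
rng = np.random.default_rng(1)
N = 8
A = rng.standard_normal((N, N))
W = (A + A.T) / 2
b = rng.standard_normal(N)
y0 = rng.choice([-1, 1], size=N)
print(run_hopfield(W, b, y0, rule="sequential"))
print(run_hopfield(W, b, y0, rule="probabilistic"))
```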
[Figure: the Lyapunov function over the state space $Y$, with its global upper and lower bounds; each state transition decreases the Lyapunov function by at least $\varepsilon$ in one step.]
If

- $L(\mathbf{y})$ has a global lower and upper bound over the state space, $A \le L(\mathbf{y}) \le B$ for all $\mathbf{y} \in Y$;
- the change of $L(\mathbf{y})$, denoted by $\Delta L(\mathbf{y}(k+1)) := L(\mathbf{y}(k+1)) - L(\mathbf{y}(k)) < -\varepsilon$, in each step of the recursion;

then the recursion is stable, converges to one of the local minima of $L(\mathbf{y})$, and its transient time can be upper bounded by
$$T_R \le \frac{B-A}{\varepsilon}.$$
In each step $L(\mathbf{y})$ decreases at least by $\varepsilon$, so the maximum number of steps needed to cover the distance $B-A$ is $T_R \le (B-A)/\varepsilon$.
What should be the Lyapunov function? Let us define the following Lyapunov function:
$$L(\mathbf{y}) = \mathbf{y}^T\mathbf{W}\mathbf{y} - 2\mathbf{b}^T\mathbf{y} = \sum_i\sum_j y_i W_{ij} y_j - 2\sum_i b_i y_i .$$
Over the state space its magnitude is bounded (using $\|\mathbf{y}\|^2 = N$) by
$$|L(\mathbf{y})| \le \|\mathbf{W}\|\,\|\mathbf{y}\|^2 + 2\|\mathbf{b}\|\,\|\mathbf{y}\| = N\|\mathbf{W}\| + 2\sqrt{N}\,\|\mathbf{b}\| ,$$
which provides the global upper bound $B$.
It is known that for a quadratic form the location of the minimum can be calculated as $\mathbf{x} = \mathbf{W}^{-1}\mathbf{b}$. This yields the following lower bound ($A$):
$$L(\mathbf{y}) \ge \min_{\mathbf{y}\in\{-1,1\}^N} L(\mathbf{y}) \ge \min_{\mathbf{x}\in R^N} L(\mathbf{x}) = \min_{\mathbf{x}\in R^N}\left\{\mathbf{x}^T\mathbf{W}\mathbf{x} - 2\mathbf{b}^T\mathbf{x}\right\} = -\mathbf{b}^T\mathbf{W}^{-1}\mathbf{b} \ge -\|\mathbf{W}^{-1}\|\,\|\mathbf{b}\|^2 .$$
To obtain the change in one step, write
$$L(\mathbf{y}(k)) = \sum_i\sum_j y_i(k) W_{ij} y_j(k) - 2\sum_i b_i y_i(k) .$$
Let us assume that the sequential index rule is used, so the change of the state vector can be written as
$$\mathbf{y}(k) = \left(y_1(k),\dots,y_l(k),\dots,y_N(k)\right), \qquad \mathbf{y}(k+1) = \left(y_1(k),\dots,y_l(k+1),\dots,y_N(k)\right).$$
Substituting this into the form above, the change is
$$\Delta L(\mathbf{y}(k)) = 2\,\Delta y_l(k)\left(\sum_j W_{lj}\, y_j(k) - b_l\right) - 2 W_{ll}\,\Delta y_l(k)\, y_l(k),$$
where $\Delta y_l(k) := y_l(k+1) - y_l(k)$, and $\varepsilon$ is
$$\varepsilon := \min_{\mathbf{y}\in\{-1,1\}^N,\ i=1,\dots,N}\ \left|\sum_j W_{ij}\, y_j - b_i\right| .$$
The possible state transitions are summarized in the following table:

| $y_l(k)$ | $y_l(k+1)$ | $\Delta y_l(k)$ | $\Delta y_l(k)\left(\sum_j W_{lj} y_j(k) - b_l\right)$ | $\Delta L(\mathbf{y}(k))$ |
|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 |
| -1 | -1 | 0 | 0 | 0 |
| -1 | 1 | 2 | $\ge 2\varepsilon$ | $\ge 4(W_{ll} + \varepsilon)$ |
| 1 | -1 | -2 | $\ge 2\varepsilon$ | $\ge 4(W_{ll} + \varepsilon)$ |
It can be seen from the table that, whatever the state transition is, the value of $L(\mathbf{y})$ never decreases, and every effective update changes it by at least $4(W_{ll} + \varepsilon)$; applying the convergence theorem above to $-L(\mathbf{y})$, the same transient-time bound holds.
The Hopfield nonlinear recursion
$$y_l(k+1) = \mathrm{sgn}\left(\sum_{j=1}^{N} W_{lj}\, y_j(k) - b_l\right), \qquad l = k \bmod N,$$

- is stable;
- converges to a local maximum of the Lyapunov function
$$L(\mathbf{x}) = \mathbf{x}^T\mathbf{W}\mathbf{x} - 2\mathbf{b}^T\mathbf{x} = \sum_i\sum_j x_i W_{ij} x_j - 2\sum_i b_i x_i .$$

It can be proven that the Hopfield network gives a very powerful solution for maximum search: its transient time is much lower than that of the conventional exhaustive search, $O(N^2) \ll O(2^N)$.
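To make the stability claim tangible, the sketch below tracks $L(\mathbf{y}) = \mathbf{y}^T\mathbf{W}\mathbf{y} - 2\mathbf{b}^T\mathbf{y}$ along the sequential recursion. The symmetric weight matrix with non-negative diagonal and the bias are assumptions made for the demo; in line with the table above, the per-step change never becomes negative and the recursion freezes in a fixed point.

```python
import numpy as np

def lyapunov(y, W, b):
    """L(y) = y^T W y - 2 b^T y, the quadratic form of the notes."""
    return y @ W @ y - 2 * b @ y

rng = np.random.default_rng(2)
N = 10
A = rng.standard_normal((N, N))
W = (A + A.T) / 2
np.fill_diagonal(W, np.abs(np.diag(W)))     # non-negative diagonal (assumption)
b = rng.standard_normal(N)

y = rng.choice([-1, 1], size=N)
values = [lyapunov(y, W, b)]
for k in range(20 * N):                      # sequential index rule: l = k mod N
    l = k % N
    y[l] = 1 if W[l] @ y - b[l] >= 0 else -1
    values.append(lyapunov(y, W, b))

diffs = np.diff(values)
print("minimum per-step change of L:", diffs.min())          # expected to be >= 0
print("last step at which L changed:", int(np.nonzero(diffs)[0].max() + 1) if diffs.any() else 0)
```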
Associative memory

[Figure: the retrieval phase of the associative memory, where a clue is mapped by the network onto one of the stored patterns.]

Given a set of patterns $S = \{\mathbf{s}^{(\mu)} : \mu = 1,\dots,M\}$, how do we construct $\mathbf{W}$ such that
$$\mathbf{s}^{(\mu)} = \mathrm{sgn}\left(\mathbf{W}\mathbf{s}^{(\mu)}\right), \qquad \mu = 1,\dots,M \quad \text{(learning)?}$$
What is the maximum $M$ as a function of the state-space dimension (number of neurons), $M = f(N)$? (capacity)
1.2.1.1 The Hebbian learning-rule
The construction of the weight matrix is done in accordance with the Hebbian learning rule:

Given $S = \{\mathbf{s}^{(\mu)} : \mu = 1,\dots,M\}$, then
$$W_{ij} = \frac{1}{N}\sum_{\mu=1}^{M} s_i^{(\mu)} s_j^{(\mu)} . \qquad (1)$$
To check whether a stored pattern is a fixed point, substitute it into the state transition rule:
$$\mathrm{sgn}\left(\sum_{j=1}^{N} W_{ij}\, s_j^{(\nu)}\right) = \mathrm{sgn}\left(\frac{1}{N}\sum_{j=1}^{N}\sum_{\mu=1}^{M} s_i^{(\mu)} s_j^{(\mu)} s_j^{(\nu)}\right) = \mathrm{sgn}\left(s_i^{(\nu)} + \delta_i^{(\nu)}\right),$$
where
$$\delta_i^{(\nu)} := \frac{1}{N}\sum_{\mu\neq\nu} s_i^{(\mu)} \sum_{j=1}^{N} s_j^{(\mu)} s_j^{(\nu)} .$$
Apparently, if $\left|\delta_i^{(\nu)}\right| < 1$ for all $i$ and $\nu$, then $s_i^{(\nu)} = \mathrm{sgn}\left(s_i^{(\nu)} + \delta_i^{(\nu)}\right)$ holds. This casts the problem as a coding task, defined as follows: construct a set $S = \{\mathbf{s}^{(\mu)} : \mu = 1,\dots,M\}$ in such a way that
$$\left|\delta_i^{(\nu)}\right| = \left|\frac{1}{N}\sum_{\mu\neq\nu}\sum_{j=1}^{N} s_i^{(\mu)} s_j^{(\mu)} s_j^{(\nu)}\right| < 1 \quad \text{holds for all } i, \nu .$$
The maximum number of such vectors M yields the capacity of the Hopfield net.
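A small sketch of the Hebbian construction (1) and of the cross-talk term $\delta_i^{(\nu)}$ follows; the sizes and the randomly drawn patterns are assumptions made purely for demonstration.

```python
import numpy as np

def hebbian_weights(S):
    """W_ij = (1/N) * sum_mu s_i^(mu) s_j^(mu), equation (1)."""
    M, N = S.shape
    return (S.T @ S) / N

def crosstalk(S, nu, i):
    """delta_i^(nu) = (1/N) * sum_{mu != nu} s_i^(mu) * sum_j s_j^(mu) s_j^(nu)."""
    M, N = S.shape
    overlaps = S @ S[nu]                       # sum_j s_j^(mu) s_j^(nu) for every mu
    overlaps[nu] = 0                           # exclude mu = nu
    return (S[:, i] @ overlaps) / N

rng = np.random.default_rng(0)
N, M = 200, 10
S = rng.choice([-1, 1], size=(M, N))           # M random patterns of dimension N
W = hebbian_weights(S)

# pattern nu is a fixed point iff |delta_i^(nu)| < 1 for every component i
deltas = np.array([[crosstalk(S, nu, i) for i in range(N)] for nu in range(M)])
print("largest |delta|:", np.abs(deltas).max())
print("all patterns stable:", bool(np.all(np.sign(S @ W.T) == S)))
```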
1.2.1.2.1 An easy approach to static capacity:
Let us set $\delta_i^{(\nu)} = 0$ for all $i,\nu$, which can easily be achieved by using an orthogonal vector set:
$$\sum_{j=1}^{N} s_j^{(\mu)} s_j^{(\nu)} = 0, \qquad \mu \neq \nu ,$$
in which case $s_i^{(\nu)} = \mathrm{sgn}\left(s_i^{(\nu)}\right)$ indeed holds. However, the maximum number of orthogonal vectors in an $N$-dimensional space is $N$, which yields a maximal capacity of
$$\mathrm{Cap}_{\mathrm{stat}} = M_{\max} = N .$$
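One concrete way to realise such an orthogonal $\pm 1$ set is to take rows of a Hadamard matrix. The sketch below (using scipy's Hadamard construction, an implementation choice rather than anything from the notes) verifies that every stored pattern is a fixed point of the sign mapping.

```python
import numpy as np
from scipy.linalg import hadamard

N = 64                                  # must be a power of two for this construction
H = hadamard(N)                         # rows are mutually orthogonal +-1 vectors
S = H[:8]                               # store M = 8 of the N possible orthogonal patterns
W = (S.T @ S) / N                       # Hebbian rule (1)

print("pairwise inner products:\n", S @ S.T)             # N on the diagonal, 0 elsewhere
print("every stored pattern is a fixed point:", bool(np.all(np.sign(S @ W) == S)))
```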
With the Hebbian weight matrix and an orthogonal pattern set, the value of the quadratic form at a stored pattern is
$$\mathbf{s}^{(\nu)T}\mathbf{W}\mathbf{s}^{(\nu)} = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N} s_i^{(\nu)} \sum_{\mu=1}^{M} s_i^{(\mu)} s_j^{(\mu)} s_j^{(\nu)} = \frac{1}{N}\sum_{\mu=1}^{M}\left(\sum_{i=1}^{N} s_i^{(\nu)} s_i^{(\mu)}\right)^2 = \frac{1}{N}\left(\sum_{i=1}^{N} s_i^{(\nu)} s_i^{(\nu)}\right)^2 = N ,$$
while for an arbitrary state $\mathbf{y}$
$$\mathbf{y}^T\mathbf{W}\mathbf{y} = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N} y_i \sum_{\mu=1}^{M} s_i^{(\mu)} s_j^{(\mu)} y_j = \frac{1}{N}\sum_{\mu=1}^{M}\left(\sum_{i=1}^{N} y_i s_i^{(\mu)}\right)^2 \le N ,$$
so the stored patterns are global maxima of the quadratic form.
Remark 1:
Note that the inner product of two vectors $\mathbf{a}^T\mathbf{b}$, where $\mathbf{a},\mathbf{b} \in \{-1,1\}^N$, can be expressed with their Hamming distance as $\mathbf{a}^T\mathbf{b} = N - 2d(\mathbf{a},\mathbf{b})$, i.e. it measures how far a state deviates from a stored pattern.
Thus, the condition for stable steady states in the case of orthogonal coding reduces to
$$N > \frac{1}{N}\sum_{\mu=1}^{M}\left(N - 2d\left(\mathbf{s}^{(\mu)},\mathbf{y}\right)\right)^2 \quad \text{for } \mathbf{y} \in \{-1,1\}^N \setminus S . \qquad (4)$$
If this condition is neglected, then spurious steady states can occur, meaning that the Hopfield algorithm can stop in states $\mathbf{y}$ which do not belong to the pre-selected set of patterns, $\mathbf{y} \notin S = \{\mathbf{s}^{(\mu)} : \mu = 1,\dots,M\}$.
Remark 2:
A heuristic estimate is obtained by upper bounding the right-hand side of inequality (4) as
$$\frac{1}{N}\sum_{\mu=1}^{M}\left(N - 2d\left(\mathbf{s}^{(\mu)},\mathbf{y}\right)\right)^2 \le \frac{M}{N}\left(N - 2d\right)^2 ,$$
where $d$ denotes the smallest Hamming distance between $\mathbf{y}$ and the stored patterns. Requiring $N > \frac{M}{N}(N-2d)^2$ leads to a very small capacity,
$$\mathrm{Cap}_{\mathrm{dynamic}} = \frac{N^2}{(N-2d)^2} \ll N .$$
In this way, the event $\mathbf{s}^{(\nu)} = \mathrm{sgn}\left\{\mathbf{W}\mathbf{s}^{(\nu)}\right\}$ only holds with a certain probability. As a result, we define the capacity as the function $M = M(N)$ which still guarantees that
$$\lim_{N\to\infty} P\left(\mathbf{s}^{(\nu)} = \mathrm{sgn}\left\{\mathbf{W}\mathbf{s}^{(\nu)}\right\}\right) = 1, \qquad \nu = 1,\dots,M .$$
Writing the event component-wise,
$$P\left(\mathbf{s}^{(\nu)} = \mathrm{sgn}\left\{\mathbf{W}\mathbf{s}^{(\nu)}\right\}\right) = P\left(\bigcap_{i=1}^{N}\left\{s_i^{(\nu)} = \mathrm{sgn}\left(\sum_{j=1}^{N} W_{ij}\, s_j^{(\nu)}\right)\right\}\right) = \prod_{i=1}^{N} P\left(s_i^{(\nu)} = \mathrm{sgn}\left(\sum_{j=1}^{N} W_{ij}\, s_j^{(\nu)}\right)\right).$$
Let us first investigate one component:
$$P\left(s_i^{(\nu)} = \mathrm{sgn}\left(\sum_{j=1}^{N} W_{ij}\, s_j^{(\nu)}\right)\right) = P\left(s_i^{(\nu)} = \mathrm{sgn}\left(s_i^{(\nu)} + \delta_i^{(\nu)}\right)\right).$$
For independent random patterns $\delta_i^{(\nu)}$ is approximately Gaussian with zero mean and variance $(M-1)/N \approx M/N$, therefore
$$P\left(s_i^{(\nu)} = \mathrm{sgn}\left(s_i^{(\nu)} + \delta_i^{(\nu)}\right)\right) = P\left(\delta_i^{(\nu)} > -1\right) = \Phi\left(\sqrt{\frac{N}{M}}\right) = 0.5\left(1 + \mathrm{erf}\left(\sqrt{\frac{N}{2M}}\right)\right).$$
Hence
$$P\left(\mathbf{s}^{(\nu)} = \mathrm{sgn}\left\{\mathbf{W}\mathbf{s}^{(\nu)}\right\}\right) = \Phi\left(\sqrt{\frac{N}{M}}\right)^{N},$$
and we want to find the function $M = M(N)$ for which
$$\lim_{N\to\infty} \Phi\left(\sqrt{\frac{N}{M}}\right)^{N} = 1, \quad \text{which is the same as} \quad \lim_{N\to\infty} N\log\Phi\left(\sqrt{\frac{N}{M}}\right) = 0 .$$
One can recall an approximation formula of $\Phi(x)$ for large $x$:
$$\Phi(x) \approx 1 - \frac{1}{x\sqrt{2\pi}}\, e^{-\frac{x^2}{2}},$$
which yields
$$\lim_{N\to\infty} N\log\left(1 - \sqrt{\frac{M}{2\pi N}}\, e^{-\frac{N}{2M}}\right) = -\lim_{N\to\infty} N\sqrt{\frac{M}{2\pi N}}\, e^{-\frac{N}{2M}} .$$
If we choose
$$M = \frac{N}{2\log N},$$
then $e^{-\frac{N}{2M}} = e^{-\log N} = 1/N$ and the expression above still tends to zero, so the dynamic capacity of the Hopfield network is $M = N/(2\log N)$.
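The $N/(2\log N)$ rule can be probed empirically. The sketch below gives a rough Monte Carlo estimate of the probability that a stored pattern is reproduced exactly in one synchronous step, for $M$ below, at, and above $N/(2\ln N)$; the sizes and trial counts are arbitrary assumptions.

```python
import numpy as np

def retrieval_probability(N, M, trials=100, seed=0):
    """Estimate P(s^(nu) == sgn(W s^(nu))) for Hebbian storage of M random patterns."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        S = rng.choice([-1, 1], size=(M, N))
        W = (S.T @ S) / N
        hits += bool(np.all(np.sign(W @ S[0]) == S[0]))   # test the first stored pattern
    return hits / trials

N = 300
M_rule = int(N / (2 * np.log(N)))                         # about N / (2 ln N)
for M in (M_rule // 2, M_rule, 2 * M_rule):
    print(f"N = {N}, M = {M:3d}: P(exact retrieval) ~ {retrieval_probability(N, M):.2f}")
```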
At the micro-state level the recursion is
$$\mathbf{y}(k+1) = \mathrm{sgn}\left\{\mathbf{W}\mathbf{y}(k)\right\}.$$
On the other hand, we define macro-states of the system as statistical averages of $\mathbf{y}(k)$, given in the form $a(k) = \mathrm{Av}(\mathbf{y}(k))$. Our concern is to derive the state transition rule of the macro-states, i.e. to arrive at a recursion $a(k+1) = \varphi(a(k))$ based on the micro-state dynamics; in this case we say that the system is described at the macro-state level. This approach is often referred to as mean field theory.
Let us introduce the direction cosine as a macro-state, given by the following form:
$$a(k) := \cos\left(\mathbf{y}(k),\mathbf{s}^{(\nu)}\right) = \frac{1}{N}\sum_{i=1}^{N} s_i^{(\nu)} y_i(k) .$$
First we would like to derive the recursion $a(k+1) = \varphi(a(k))$; then we would like to prove that $a(k+1) = \varphi(a(k))$ is a noncontractive mapping, namely $a(k+1) > a(k)$, which means that $a(k) \to 1$ and $\mathbf{s}^{(\nu)}$ is a stable steady state. Without loss of generality, we assume that $\mathbf{s}^{(\nu)} = (1,1,\dots,1)$.
Let us start with
$$a(0) := \frac{1}{N}\sum_{i=1}^{N} s_i^{(\nu)} y_i(0) = \frac{1}{N}\sum_{i=1}^{N} y_i(0)$$
and investigate
$$a(1) = \frac{1}{N}\sum_{i=1}^{N} s_i^{(\nu)} y_i(1) = \frac{1}{N}\sum_{i=1}^{N} y_i(1) .$$
For large $N$,
$$\frac{1}{N}\sum_{i=1}^{N} y_i(1) \approx E\, y_i(1) = 1\cdot P\left(y_i(1)=1\right) - 1\cdot P\left(y_i(1)=-1\right) = P\left(\mathrm{sgn}\{a(0)+\delta_i(0)\} = 1\right) - P\left(\mathrm{sgn}\{a(0)+\delta_i(0)\} = -1\right),$$
where $\delta_i(0)$ is the Gaussian cross-talk noise with standard deviation $\sigma$. Hence
$$a(1) = 2\Phi\left(\frac{a(0)}{\sigma}\right) - 1,$$
and, in general,
$$a(k+1) = 2\Phi\left(\frac{a(k)}{\sigma}\right) - 1 . \qquad (1.1)$$
If $2\Phi(x/\sigma) - 1 > x$ for all $x \in (0,1)$, then $a(k+1) > a(k)$ and the macro-state converges to $1$, i.e. the recursion is driven towards the stored pattern; otherwise $a(k) > a(k+1) > a(k+2) > \dots$ and the recursion drifts away from it.

[Figure: cobweb diagrams of the recursion $a(k+1) = 2\Phi(a(k)/\sigma) - 1$ in the convergent case ($a(0) < a(1) < a(2) < \dots$) and in the divergent case ($a(0) > a(1) > a(2) > \dots$).]
A sufficient condition for this noncontractive behavior is that the slope of $\varphi$ at the origin exceeds $\mathrm{tg}\,\frac{\pi}{4} = 1$:
$$\left.\frac{d\,\varphi(a)}{da}\right|_{a=0} = \frac{2}{\sigma\sqrt{2\pi}}\, e^{0} = \sqrt{\frac{2}{\pi}}\cdot\frac{1}{\sigma} > 1, \qquad \text{i.e.} \qquad \sigma^2 < \frac{2}{\pi}.$$
This yields very strict conditions when applying the Hopfield net as an associative memory.
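The mean-field recursion (1.1) and the $\sigma^2 < 2/\pi$ threshold can be explored with a few lines of code; the noise levels and step counts below are illustrative assumptions.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def iterate_macro_state(a0, sigma, steps=50):
    """Mean-field recursion (1.1): a(k+1) = 2*Phi(a(k)/sigma) - 1."""
    a = a0
    for _ in range(steps):
        a = 2.0 * phi(a / sigma) - 1.0
    return a

threshold = math.sqrt(2.0 / math.pi)      # slope condition: sigma < sqrt(2/pi)
for sigma in (0.5 * threshold, 1.5 * threshold):
    a_final = iterate_macro_state(a0=0.2, sigma=sigma)
    print(f"sigma = {sigma:.3f} (threshold {threshold:.3f}): a(50) = {a_final:.3f}")
```

Below the threshold the macro-state climbs towards 1 (the pattern is retrieved); above it the recursion collapses towards 0.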
7. Figure The Hopfield network as an associative memory can be used for pattern recognition, where the stored patterns are the stable points of the network.

8. Figure The Hopfield network as a universal solver for the pattern recognition problem
The Hopfield network can be used for image recognition if it is used as an associative memory, where the stored images are the patterns to be recognized. The only problem is that the size of the network increases drastically with the number of images.

[Figure: bitmap image presented to the HNN (holding the stored images), which returns the recalled image.]

The question is how to construct the network in such a way that it always converges to one of the stored patterns, and that this pattern is the right one. In Figure 11 one can follow how pattern recognition works well, and in Figure 12 how it misses the characters and converges to wrong steady states.
11. Figure One can see how the Hopfield network associates to the correct characters: a) output of the HNN, b) the character to be recognized, c) the character corrupted by noise, d) the noise

Although in Figure 11 one can follow the work of the HNN as a good character recognizer, we must consider that the probability of recognizing the right number is very low. This can be explained by the very large correlation between the characters. In our case the noise power was taken to be 3 dB. Number 1 was almost always recognized well (the error probability was about $10^{-3}$). For numbers 0, 2 and 3 the match probability was only 0.1, which is extremely low, but can be explained by the high similarity of these numbers.
12. Figure Examples of bad character recognition: a) output of the HNN, b) the character to be recognized, c) the character corrupted by noise, d) the noise. Note that in the case of recognizing digit one, the noise energy was increased.

It has been seen that the HNN can get stuck in unwanted steady states for this kind of recognition problem. The question is how to solve this. The idea of coding the stored patterns can give better results: the point is to construct a transformation that increases the attraction area (see Figure 12) and decreases the probability of mismatch, so that there are only as many steady states as there are characters to be recognized.
The capacity of the network is determined as $M = \frac{N}{2\log N}$, where $N$ is the dimension of the patterns.
In practice the memory items $S = \{\mathbf{s}^{(\mu)}, \mu = 1,\dots,M\}$ are not independent, so we have to encode the original $S$ into $S'$ which satisfies the quasi-orthogonality. The coder $F: S \to S'$ must be topologically invariant and of small complexity.
Coder
) (
d x,r s
Hopfield
Network
s ()
Random patterns
-1
r ()
{(s ,r ) = 1,2,..., M}
Storage phase
Real patterns
s ()
=1,2,...,M
r ()
x'i = sgn M ij x j
j =1
This relation assures a small complexity mapping that can easily be calculated. The decoder
could be the (Moore-Penrose) inverse matrix of F.
d ( x , s ) d ( x , s ), = 1..M,
The coder output for an input vector $\mathbf{x}$, with the cross-correlation matrix $M_{ij} = \frac{1}{N}\sum_{\mu=1}^{M} s'^{(\mu)}_i s^{(\mu)}_j$, is
$$x'_i = \mathrm{sgn}\left(\frac{1}{N}\sum_{\mu=1}^{M} s'^{(\mu)}_i \sum_{j=1}^{N} s^{(\mu)}_j x_j\right) = \mathrm{sgn}\left(s'^{(\nu)}_i\,\frac{1}{N}\sum_{j=1}^{N} s^{(\nu)}_j x_j + \delta_i\right),$$
where
$$\delta_i = \frac{1}{N}\sum_{\mu\neq\nu} s'^{(\mu)}_i \sum_{j=1}^{N} s^{(\mu)}_j x_j .$$
Note that the inner product of two vectors $\mathbf{a}^T\mathbf{b}$, where $\mathbf{a},\mathbf{b} \in \{-1,1\}^N$, can be expressed with their Hamming distance as $\mathbf{a}^T\mathbf{b} = \sum_{i=1}^{N} a_i b_i = N - 2d(\mathbf{a},\mathbf{b})$, where $d(\mathbf{a},\mathbf{b})$ denotes the Hamming distance.
Using this remark,
$$x'_i = \mathrm{sgn}\left\{\frac{N - 2d(\mathbf{x},\mathbf{s}^{(\nu)})}{N} + \delta_i\right\},$$
and the normalized overlap between the coder output and the coded pattern becomes
$$\frac{1}{N}\sum_{i=1}^{N} x'_i\, s'^{(\nu)}_i \approx E\left[x'_i\, s'^{(\nu)}_i\right] = 1\cdot P\left(x'_i = s'^{(\nu)}_i\right) - 1\cdot P\left(x'_i = -s'^{(\nu)}_i\right) =$$
$$= P\left(\delta_i > -\frac{N - 2d(\mathbf{x},\mathbf{s}^{(\nu)})}{N}\right) - P\left(\delta_i < -\frac{N - 2d(\mathbf{x},\mathbf{s}^{(\nu)})}{N}\right) = 2\Phi\left(\frac{N - 2d(\mathbf{x},\mathbf{s}^{(\nu)})}{\sqrt{N(M-1)}}\right) - 1 .$$
The Hamming distance between $\mathbf{x}'$ and $\mathbf{s}'^{(\nu)}$ will therefore be
$$d(\mathbf{x}',\mathbf{s}'^{(\nu)}) = N\left(1 - \Phi\left(\frac{N - 2d(\mathbf{x},\mathbf{s}^{(\nu)})}{\sqrt{N(M-1)}}\right)\right).$$
So the requirement $d(\mathbf{x}',\mathbf{s}'^{(\nu)}) \le d(\mathbf{x}',\mathbf{s}'^{(\mu)})$ is equivalent to the monotonicity of the mapping (see Figure 16)
$$x \mapsto N\left(1 - \Phi\left(\frac{N - 2x}{\sqrt{N(M-1)}}\right)\right), \qquad x \in (0,N) .$$
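Treating the expression above as given, a short numerical check of the distance map $g(x) = N\left(1 - \Phi\left((N-2x)/\sqrt{N(M-1)}\right)\right)$ confirms that it is monotonically increasing on $(0, N)$; the sizes used are arbitrary assumptions.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def transformed_distance(x, N, M):
    """g(x) = N * (1 - Phi((N - 2x) / sqrt(N*(M-1))))."""
    return N * (1.0 - phi((N - 2.0 * x) / math.sqrt(N * (M - 1))))

N, M = 256, 10                                     # illustrative sizes
xs = range(0, N + 1, 8)
values = [transformed_distance(x, N, M) for x in xs]
print("monotonically increasing:", all(b >= a for a, b in zip(values, values[1:])))
print("g(0) =", round(values[0], 2), " g(N) =", round(values[-1], 2))
```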
The statistical gain $G$ strongly depends on $M$: if $M$ is small, $G$ will be large, while the optimal value of $G$ may converge to 1.
18. Figure a) The block diagram of the surveillance system (cameras, audio and video signals, preprocessing and the HN blocks), b) the Hopfield network in the surveillance system as an associative memory
Optimization problem

[Figure: an optimization problem $\mathrm{opt}_{\mathbf{y}}\, L(\mathbf{y})$ is cast as quadratic programming and solved by the HN.]

Given the quadratic optimization task
$$\mathbf{y}_{\mathrm{opt}}: \min_{\mathbf{y}\in\{-1,1\}^N}\ \mathbf{y}^T\mathbf{W}\mathbf{y} - 2\mathbf{b}^T\mathbf{y} ,$$
and the task solved by the Hopfield network with the modified (zero-diagonal) weight matrix $\tilde{\mathbf{W}}$, denoted by $\tilde{\mathbf{y}}_{\mathrm{opt}}$, the following equality should be examined: $\mathbf{y}_{\mathrm{opt}} = \tilde{\mathbf{y}}_{\mathrm{opt}}$.
y{1,1}
y W
i
ij
2
~
y j 2 bi yi = yiWij y j + Wii yi 2 bi yi =Wij yi y j + Wii 2 bi yi =
i
j
i j
independent from y
~
~
= const + y T Wy 2b T y = y T Wy 2b T y
As the minimization is independent of the diagonal elements of the weight matrix, the minima of this function are exactly the same as the minima of the original Lyapunov function. This leads to a modified state transition rule:
$$y_l(k+1) = \mathrm{sgn}\left(\sum_{j=1}^{N} \tilde{W}_{lj}\, y_j(k) - b_l\right).$$
As the minimization is the same for both Lyapunov functions, the global upper and lower bounds are also the same for the modified network:

Global upper bound: $N\|\mathbf{W}\| + 2\sqrt{N}\,\|\mathbf{b}\|$

Global lower bound: $-\mathbf{b}^T\mathbf{W}^{-1}\mathbf{b}$
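The statement that cancelling the diagonal only shifts the Lyapunov function by a constant (namely $\sum_i W_{ii}$) is easy to verify numerically; the test data below is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 12
A = rng.standard_normal((N, N))
W = (A + A.T) / 2
b = rng.standard_normal(N)

W_tilde = W.copy()
np.fill_diagonal(W_tilde, 0.0)                     # modified (zero-diagonal) weight matrix

L = lambda y, Wm: y @ Wm @ y - 2 * b @ y
ys = rng.choice([-1, 1], size=(1000, N))
gaps = np.array([L(y, W) - L(y, W_tilde) for y in ys])

# The difference equals trace(W) for every y in {-1,1}^N, so both forms share their minima.
print("constant gap:", bool(np.allclose(gaps, np.trace(W))))
```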
To examine the behavior of the HN as an optimizer, the change of the Lyapunov function must be investigated again. As the diagonal elements are cancelled, only the second part of the expression is taken into account, so the change in one step is:

| $y_l(k)$ | $y_l(k+1)$ | $\Delta y_l(k)$ | $\Delta y_l(k)\left(\sum_j \tilde{W}_{lj} y_j(k) - b_l\right)$ | $\Delta L(\mathbf{y}(k))$ |
|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 |
| -1 | -1 | 0 | 0 | 0 |
| -1 | 1 | 2 | $\ge 2\varepsilon$ | $\ge 4\varepsilon$ |
| 1 | -1 | -2 | $\ge 2\varepsilon$ | $\ge 4\varepsilon$ |
Not every optimization problem lends itself to a quadratic (polynomial) formulation. There are some problems, however, which do yield themselves to quadratic optimization and can thus be solved by the HN. One of the most important classes of such problems is the routing problem, or traveling salesman problem (TSP) as it is known in the literature. In the following sections we will investigate this problem closely.
How to use the HN as an optimizer:

20. Figure The philosophy of quadratic programming: a combinatorial (NP-hard) optimization problem $\mathbf{y}: \min L(\mathbf{y})$ is given a binary data representation, mapped to a quadratic optimization task, solved by the HN, and the result is mapped back (de-representation) to obtain the optimal solution.
Limitations:

The task is to find an optimal path (a sequence of cities visited one after another), described by a sequence such as $\{B, A, D, N\}$, which yields a minimum-distance path. In this way an associated cost function which depends on the distance (e.g. the price of fuel associated with the travel) is minimized. Figure 21 shows some solutions in the case of 5 cities.
21. Figure The traveling salesman problem for five cities (A, B, C, D, E, with pairwise distances such as $d_{AB}$): a) the fully connected graph, b) a suboptimal solution and c) the optimal solution

It is easy to see that the third solution is more favourable than the second one.
The path can be represented by an $N \times N$ matrix $\mathbf{V} = [v_{ij}]$ indexed by time instants and cities, where $v_{ij} \in \{0,1\}$ indicates whether at time instant $i$ the salesman is in city $j$: $v_{ij} = 0$ if not, $v_{ij} = 1$ if yes.
The length of the tour can then be expressed as
$$L(\mathbf{V}) = \sum_i\sum_j\sum_{l\neq j} v_{ij}\, v_{i+1,l}\, D_{jl} , \qquad (1.2)$$
where $D_{jl}$ denotes the distance between cities $j$ and $l$. Simply minimizing this function, however, one can find a solution which does not fulfil the properties of a valid path listed above. Thus those properties must be built into the optimization function as constraints.
Calculation of the goal function to be minimized, including the constraints

As was mentioned above, the goal function must be extended to include the constraints. Therefore, instead of (1.2), we use the following function (which is to be minimized):
$$\phi(\mathbf{V}) = \sum_i\left(\sum_j v_{ij} - 1\right)^2 + \sum_j\left(\sum_i v_{ij} - 1\right)^2 + \sum_i\sum_j\sum_{l\neq j} v_{ij}\, v_{i+1,l}\, D_{jl} . \qquad (1.3)$$
The first term forces the salesman to be in exactly one city at each time instant, the second term forces each city to be visited exactly once, and the third term is the length of the tour.
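To make the role of the terms in (1.3) concrete, the sketch below evaluates the constraint penalties and the tour length separately for a valid tour (a permutation matrix $\mathbf{V}$) and for an invalid assignment. The distance matrix and city count are arbitrary assumptions, and in practical formulations the penalty terms usually carry additional weighting coefficients so that they dominate the tour-length term.

```python
import numpy as np

def tsp_goal(V, D):
    """Constraint penalties and tour length of (1.3); the tour is closed by wrapping i+1 modulo N."""
    row_penalty = np.sum((V.sum(axis=1) - 1.0) ** 2)   # exactly one city per time instant
    col_penalty = np.sum((V.sum(axis=0) - 1.0) ** 2)   # each city visited exactly once
    N = V.shape[0]
    tour = sum(V[i, j] * V[(i + 1) % N, l] * D[j, l]
               for i in range(N) for j in range(N) for l in range(N) if l != j)
    return row_penalty + col_penalty, tour

rng = np.random.default_rng(4)
N = 5
D = rng.uniform(1, 10, size=(N, N))
D = (D + D.T) / 2
np.fill_diagonal(D, 0.0)

valid = np.eye(N)[rng.permutation(N)]              # permutation matrix: a valid tour
invalid = np.zeros((N, N)); invalid[0, :] = 1.0    # "everywhere at once" at time 0
for name, V in (("valid", valid), ("invalid", invalid)):
    penalty, tour = tsp_goal(V, D)
    print(f"{name:8s} constraint penalty = {penalty:5.1f}, tour length = {tour:5.1f}")
```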
The $N \times N$ matrix $\mathbf{V}$ is then flattened into an $N^2$-dimensional vector $\mathbf{y} \in \{-1,1\}^{N^2}$ through the substitution $v_{ij} = \frac{y_{Ni+j}}{2} + 0.5$. Substituting this into (1.3),
$$\phi(\mathbf{y}) = \sum_i\left(\sum_n\left(\frac{y_{Ni+n}}{2} + 0.5\right) - 1\right)^2 + \sum_n\left(\sum_i\left(\frac{y_{Ni+n}}{2} + 0.5\right) - 1\right)^2 + \sum_i\sum_j\sum_{l\neq j}\left(\frac{y_{Ni+j}}{2} + 0.5\right)\left(\frac{y_{N(i+1)+l}}{2} + 0.5\right) D_{jl} =$$
$$= \mathbf{y}^T\mathbf{M}\mathbf{y} - 2\mathbf{c}^T\mathbf{y} + \mathrm{const} ,$$
i.e. expanding the squares and products yields a quadratic form in $\mathbf{y}$ with an appropriately constructed matrix $\mathbf{M}$ and vector $\mathbf{c}$.
As the derivation above indicates, the problem has been reduced to the optimization of a quadratic form, which is the Lyapunov function of the Hopfield algorithm:
$$\mathbf{y}_{\mathrm{opt}}: \min_{\mathbf{y}}\ \phi(\mathbf{y}) = \min_{\mathbf{y}}\ \mathbf{y}^T\mathbf{M}\mathbf{y} - 2\mathbf{c}^T\mathbf{y} .$$
The corresponding state transition rule is
$$y_e(k+1) = \mathrm{sgn}\left(\sum_{j=1}^{N^2} M_{ej}\, y_j(k) - c_e\right),$$
and the change of the Lyapunov function in one step is
$$\Delta\phi(k) = \left(\Delta y_e(k)\right)^2 M_{ee} + 2\,\Delta y_e(k)\left(\sum_{j=1}^{N^2} M_{ej}\, y_j(k) - c_e\right),$$
the sign of which guarantees the convergence of the recursion.
j=1
Coder
Channel
Distortion
x
Clue
HNN
c , = 1,2,..., M
21
Channel
Max
likelihood
~y
is: x = Hy + , where y {-1,1} is the sent signal, x R , H is the channel matrix, and is
the additional noise.
Assuming Gaussian noise, the likelihood of the received vector is
$$P(\mathbf{x}\,|\,\mathbf{y}) = \frac{1}{\sqrt{(2\pi)^N \det \mathbf{K}}}\, e^{-\frac{1}{2}(\mathbf{x}-\mathbf{H}\mathbf{y})^T\mathbf{K}^{-1}(\mathbf{x}-\mathbf{H}\mathbf{y})} ,$$
so the maximum-likelihood detection task becomes
$$\tilde{\mathbf{y}}: \max_{\mathbf{y}\in\{-1,1\}^N} P(\mathbf{x}\,|\,\mathbf{y}) = \min_{\mathbf{y}\in\{-1,1\}^N} (\mathbf{x}-\mathbf{H}\mathbf{y})^T\mathbf{K}^{-1}(\mathbf{x}-\mathbf{H}\mathbf{y}) = \min_{\mathbf{y}\in\{-1,1\}^N}\ \mathbf{y}^T\mathbf{W}\mathbf{y} - 2\mathbf{b}^T\mathbf{y} .$$
Here $\mathbf{K}$ is the covariance matrix of the noise, $\mathbf{W} = \mathbf{H}^T\mathbf{K}^{-1}\mathbf{H}$ and $\mathbf{b} = \mathbf{H}^T\mathbf{K}^{-1}\mathbf{x}$. Unfortunately, solving this problem by exhaustive search has exponential complexity, which means it cannot be done in real time. Therefore a suboptimal solution is needed which guarantees a real-time solution. Such an outcome can be assured using the Hopfield net; the detection scheme based on the HN can be followed in Figure 22.
The weight matrix used in the recursion is the zero-diagonal modification of $\mathbf{W}$,
$$\tilde{W}_{lj} = \begin{cases} W_{lj}, & l \neq j \\ 0, & l = j , \end{cases}$$
and the detector runs the recursion
$$y_l(k+1) = \mathrm{sgn}\left(\sum_{j=1}^{N} \tilde{W}_{lj}\, y_j(k) - b_l\right)$$
with complexity $O(N^2)$ per sweep, yielding the estimate $\tilde{\mathbf{y}}$.
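A compact end-to-end sketch of such a detector follows. The channel matrix, noise level and problem sizes are arbitrary assumptions, the exhaustive search is included only to cross-check the result on a small problem, and the coordinate update is written in the direction that decreases $(\mathbf{x}-\mathbf{H}\mathbf{y})^T\mathbf{K}^{-1}(\mathbf{x}-\mathbf{H}\mathbf{y})$.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(5)
N = 8
H = rng.standard_normal((N, N))                     # channel matrix (assumed known)
K = np.eye(N)                                       # white-noise covariance (assumption)
y_sent = rng.choice([-1, 1], size=N)
x = H @ y_sent + 0.3 * rng.standard_normal(N)       # received signal

Kinv = np.linalg.inv(K)
W = H.T @ Kinv @ H
b = H.T @ Kinv @ x
W_tilde = W.copy()
np.fill_diagonal(W_tilde, 0.0)                      # cancelled diagonal

# Hopfield-type recursion with the sequential index rule, O(N^2) per sweep;
# each coordinate is set to the value that decreases the quadratic form.
y = rng.choice([-1, 1], size=N)
for k in range(50 * N):
    l = k % N
    y[l] = 1 if b[l] - W_tilde[l] @ y >= 0 else -1

# Exhaustive ML search (exponential cost, only feasible for small N)
best = min(product([-1, 1], repeat=N),
           key=lambda v: (x - H @ np.array(v)) @ Kinv @ (x - H @ np.array(v)))

print("sent:      ", y_sent)
print("Hopfield:  ", y)
print("exhaustive:", np.array(best))
```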