The possibilistic partitioning space for $(X, k)$ is the set
$$M_f^{kn} = \left\{ U \in \mathbb{R}^{k \times n} \;\middle|\; \mu_{ij} \in [0,1] \; \forall i, j; \; \sum_{i=1}^{k} \mu_{ij} \le 1 \; \forall j; \; 0 < \sum_{j=1}^{n} \mu_{ij} < n \; \forall i \right\}$$
The possibilistic c-means objective function is formulated as:
$$J_m = \sum_{j=1}^{n} \sum_{i=1}^{k} \mu_{ij}^{m} \, d^2(x_j, z_i) + \sum_{i=1}^{k} \eta_i \sum_{j=1}^{n} (1 - \mu_{ij})^{m}$$
The coefficient $m \in (1, \infty)$ is a fuzzification parameter.
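To make the objective concrete, here is a minimal sketch in Python; this is our illustration, not part of the original text: it assumes squared Euclidean distance for $d^2$, and the name `pcm_objective` is ours.

```python
import numpy as np

def pcm_objective(X, Z, U, eta, m):
    """Possibilistic c-means objective J_m.

    X: (n, p) data, Z: (k, p) prototypes, U: (k, n) typicality matrix,
    eta: (k,) penalty weights, m: fuzzifier with m > 1.
    """
    # d^2(x_j, z_i): squared Euclidean distances, shape (k, n)
    d2 = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    fit = (U ** m * d2).sum()                           # first (distance) term
    penalty = (eta * ((1 - U) ** m).sum(axis=1)).sum()  # second (penalty) term
    return fit + penalty
```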
6.3 Projected Gustafson Kessel Possibilistic Clustering
The possibilistic partitioning subspace for $(X, d)$ is the set
$$M_f^{kd} = \left\{ W \in \mathbb{R}^{k \times d} \;\middle|\; w_{ir} \in [0,1] \; \forall i, r; \; \sum_{r=1}^{d} w_{ir} \le 1 \; \forall i; \; 0 < \sum_{i=1}^{k} w_{ir} < k \; \forall r \right\}$$
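Both constraint sets can be checked mechanically. The sketch below is our illustration (the helper name and the numpy formulation are assumptions); one function covers $M_f^{kn}$ and $M_f^{kd}$ because the constraints are transposes of one another:

```python
import numpy as np

def in_possibilistic_space(M, cap_axis):
    """Check membership in the possibilistic partition (sub)space.

    For U (k x n) use cap_axis=0: column sums <= 1, row sums in (0, n).
    For W (k x d) use cap_axis=1: row sums <= 1, column sums in (0, k).
    """
    in_unit = np.all((M >= 0) & (M <= 1))          # entries in [0, 1]
    caps = np.all(M.sum(axis=cap_axis) <= 1)       # the "<= 1" constraint
    other_axis = 1 - cap_axis
    sums = M.sum(axis=other_axis)
    bound = M.shape[other_axis]                    # n for U, k for W
    return bool(in_unit and caps and np.all((sums > 0) & (sums < bound)))

# e.g. in_possibilistic_space(U, cap_axis=0) and in_possibilistic_space(W, cap_axis=1)
```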
Thus, we formulate a new possibilistic objective function as:
$$J_{\alpha,\beta} = \sum_{j=1}^{n} \sum_{i=1}^{k} \sum_{r=1}^{d} \mu_{ij}^{\alpha} \, w_{ir}^{\beta} \, d_{ijr}^{2} + \sum_{i=1}^{k} \eta_i \sum_{j=1}^{n} (1 - \mu_{ij})^{\alpha} + \sum_{i=1}^{k} \gamma_i \sum_{r=1}^{d} (1 - w_{ir})^{\beta}$$
where
$$d_{ijr}^{2} = (x_{jr} - z_{ir})(x_{jr} - z_{ir})$$
and $\eta_i$ ($\gamma_i$) is defined in a similar fashion as in [117]:
$$\eta_i = K \, \frac{\sum_{j=1}^{n} \sum_{r=1}^{d} \mu_{ij}^{\alpha} \, d_{ijr}^{2}}{\sum_{j=1}^{n} \sum_{r=1}^{d} \mu_{ij}^{\alpha}}, \qquad \gamma_i = K \, \frac{\sum_{j=1}^{n} \sum_{r=1}^{d} w_{ir}^{\beta} \, d_{ijr}^{2}}{\sum_{j=1}^{n} \sum_{r=1}^{d} w_{ir}^{\beta}}$$
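As a concrete reading of $J_{\alpha,\beta}$, $\eta_i$ and $\gamma_i$, the following sketch (our notation; it assumes $d^2_{ijr} = (x_{jr} - z_{ir})^2$ as above and fixes $K = 1$) evaluates all three quantities:

```python
import numpy as np

def per_dim_d2(X, Z):
    """d2[i, j, r] = (x_jr - z_ir)^2, shape (k, n, d)."""
    return (X[None, :, :] - Z[:, None, :]) ** 2

def penalty_weights(U, W, d2, alpha, beta, K=1.0):
    """eta_i and gamma_i, with exponents alpha and beta (K = 1 assumed)."""
    ua, wb = U ** alpha, W ** beta                      # (k, n), (k, d)
    n, d = d2.shape[1], d2.shape[2]
    eta = K * (ua[:, :, None] * d2).sum(axis=(1, 2)) / (d * ua.sum(axis=1))
    gamma = K * (wb[:, None, :] * d2).sum(axis=(1, 2)) / (n * wb.sum(axis=1))
    return eta, gamma

def projected_objective(U, W, d2, eta, gamma, alpha, beta):
    """J_{alpha,beta}: weighted fit term plus the two penalty terms."""
    ua, wb = U ** alpha, W ** beta
    fit = (ua[:, :, None] * wb[:, None, :] * d2).sum()
    pen_u = (eta * ((1 - U) ** alpha).sum(axis=1)).sum()
    pen_w = (gamma * ((1 - W) ** beta).sum(axis=1)).sum()
    return fit + pen_u + pen_w
```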
The possibilistic partitioning subspaces for $(X, k)$ and $(X, d)$ together form the new partition space for the high dimensional data set X. The parameters $\alpha \in (1, \infty)$ and $\beta \in (1, \infty)$ are weighting exponents; they control the fuzzification of $\mu_{ij}$ ($w_{ir}$). The larger the value of $\alpha$ ($\beta$), the more equal the distribution of $\mu_{ij}$ and $w_{ir}$, giving each pattern an equal chance to impact all clusters and dimensions. A value of $\alpha$ ($\beta$) closer to 1 indicates good clustering behaviour, as $\mu_{ij}$ ($w_{ir}$) assigns higher values to clusters (subspaces), as the toy evaluation below illustrates.
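A toy evaluation of the $\mu_{ij}$ update derived in section 6.4 illustrates this behaviour (the numbers are made up):

```python
import numpy as np

# relative weighted distances of one pattern to two clusters:
# ratio_i = sum_r w_ir^beta * d2_ijr / eta_i
ratio = np.array([0.25, 4.0])

for alpha in (1.1, 2.0, 5.0):
    mu = 1.0 / (1.0 + ratio ** (1.0 / (alpha - 1.0)))
    print(alpha, mu.round(3))
# alpha near 1 -> memberships near 1 and 0 (crisp assignment);
# large alpha -> both approach 0.5 (each pattern impacts all clusters equally)
```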
Inputs:
n: size of data set X
k: number of clusters, 1 < k < n
$\alpha$: the weight exponent of matrix U, $\alpha > 1$
$\beta$: the weight exponent of matrix W, $\beta > 1$
$\epsilon$: the termination tolerance, $\epsilon > 0$
A: the norm-inducing matrix

Outputs:
U: membership matrix of objects in clusters
W: matrix indicating relevance of dimensions for clusters
Z: cluster centers

Initialize the partition matrices U, W randomly; set t = 0.
while $\|U^{t} - U^{t-1}\| > \epsilon$ do
Step 1. Compute the cluster prototypes:
$$z_{ir}^{t} = \frac{\sum_{j=1}^{n} (w_{ir}^{t-1})^{\beta} (\mu_{ij}^{t-1})^{\alpha} x_{jr}}{\sum_{j=1}^{n} (w_{ir}^{t-1})^{\beta} (\mu_{ij}^{t-1})^{\alpha}}$$
Step 2. Compute the cluster covariance matrices:
$$F_{i}^{t} = \sum_{j=1}^{n} (w_{ir}^{t-1})^{\beta} (\mu_{ij}^{t-1})^{\alpha} \left(x_{jr} - z_{ir}^{t}\right) \left(x_{js} - z_{is}^{t}\right), \quad 1 \le r \le d, \; 1 \le s \le d$$
Step 3. Compute the distances:
$$d_{ijr}^{2} = \left[\left(x_{j1} - z_{i1}^{t}\right) \ldots \left(x_{jd} - z_{id}^{t}\right)\right] A_i \left[\left(x_{j1} - z_{i1}^{t}\right) \ldots \left(x_{jd} - z_{id}^{t}\right)\right]^{T}$$
Step 4. Update the partition matrices:
$$\mu_{ij} = \frac{1}{1 + \left[\sum_{r=1}^{d} (w_{ir})^{\beta} \, d_{ijr}^{2} / \eta_i\right]^{1/(\alpha-1)}}$$
$$w_{ir} = \frac{1}{1 + \left[\sum_{j=1}^{n} (\mu_{ij})^{\alpha} \, d_{ijr}^{2} / \gamma_i\right]^{1/(\beta-1)}}$$
Step 5. t = t + 1
end

Algorithm 6.1: Projected Gustafson Kessel Possibilistic Clustering Algorithm
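The listing translates into code directly. The sketch below is our interpretation under simplifying assumptions, not a reference implementation: $\rho_i = 1$ in eq. 6.6, $K = 1$ for $\eta_i$ and $\gamma_i$, $F_i$ weighted by the memberships only (so it stays symmetric), and the Step 3 quadratic form shared across dimensions $r$, exactly as the listing writes it. Function names are ours.

```python
import numpy as np

def pgkpc(X, k, alpha, beta, eps=1e-5, max_iter=100, seed=0):
    """Sketch of Algorithm 6.1. X: (n, d) data. Returns U (k, n), W (k, d), Z (k, d)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = rng.uniform(0.1, 1.0, size=(k, n))
    W = rng.uniform(0.1, 1.0, size=(k, d))
    for _ in range(max_iter):
        U_prev = U.copy()
        ua, wb = U ** alpha, W ** beta
        # Step 1: prototypes (the w_ir^beta factor cancels but is kept as in the listing)
        coef = wb[:, None, :] * ua[:, :, None]                    # (k, n, d)
        Z = (coef * X[None, :, :]).sum(axis=1) / coef.sum(axis=1)
        # Step 2: covariance-like matrices F_i, weighted by mu_ij^alpha (symmetric form)
        diff = X[None, :, :] - Z[:, None, :]                      # (k, n, d)
        F = np.einsum('kn,knr,kns->krs', ua, diff, diff)
        # eq. 6.6 with rho_i = 1: A_i = det(F_i)^(1/d) F_i^{-1}
        A = np.empty((k, d, d))
        for i in range(k):
            Fi = F[i] + 1e-9 * np.eye(d)                          # keep F_i invertible
            A[i] = np.linalg.det(Fi) ** (1.0 / d) * np.linalg.inv(Fi)
        # Step 3: d2_ijr; the quadratic form is constant in r, per the listing
        d2 = np.maximum(np.einsum('knr,krs,kns->kn', diff, A, diff), 1e-12)
        d2r = np.broadcast_to(d2[:, :, None], (k, n, d))
        # penalty weights eta_i, gamma_i (K = 1)
        eta = (ua[:, :, None] * d2r).sum(axis=(1, 2)) / (d * ua.sum(axis=1))
        gamma = (wb[:, None, :] * d2r).sum(axis=(1, 2)) / (n * wb.sum(axis=1))
        # Step 4: update partition matrices
        su = (wb[:, None, :] * d2r).sum(axis=2)                   # (k, n)
        U = 1.0 / (1.0 + (su / eta[:, None]) ** (1.0 / (alpha - 1.0)))
        sw = (ua[:, :, None] * d2r).sum(axis=1)                   # (k, d)
        W = 1.0 / (1.0 + (sw / gamma[:, None]) ** (1.0 / (beta - 1.0)))
        # Step 5 / termination check
        if np.linalg.norm(U - U_prev) <= eps:
            break
    return U, W, Z
```

For example, `U, W, Z = pgkpc(X, k=3, alpha=2.0, beta=2.0)` on an (n, d) array X runs the alternating updates until the memberships stabilize.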
6.4 Convergence
Theorem 6.1 Let $\Phi : M_f^{kn} \to \mathbb{R}$, $\Phi(U) = J_{\alpha,\beta}(U, W, Z)$, where $W \in M_f^{kd}$ and $Z \in \mathbb{R}^{kd}$ are fixed. Then $U \in M_f^{kn}$ is a strict local minimum of $\Phi$ if and only if U is computed by the equation:
$$\mu_{ij} = \frac{1}{1 + \left[\sum_{r=1}^{d} (w_{ir})^{\beta} \, d_{ijr}^{2} / \eta_i\right]^{1/(\alpha-1)}}$$
Proof 6.1 We have to minimize $J_{\alpha,\beta}$ with respect to U, W subject to constraints 5.7. We formulate the objective function as follows:
$$J_{\alpha,\beta} = \sum_{j=1}^{n} \sum_{i=1}^{k} \sum_{r=1}^{d} \mu_{ij}^{\alpha} w_{ir}^{\beta} d_{ijr}^{2} + \sum_{i=1}^{k} \eta_i \sum_{j=1}^{n} (1 - \mu_{ij})^{\alpha} + \sum_{i=1}^{k} \gamma_i \sum_{r=1}^{d} (1 - w_{ir})^{\beta} - \sum_{i=1}^{k} \lambda_i \left(\det(A_i) - \rho_i\right) \quad (6.1)$$
In order to obtain the first order necessary condition for optimality, we set the gradient of $J_{\alpha,\beta}$ w.r.t. $A_i$ equal to zero. We use the following identities [25]:
$$\frac{\partial}{\partial A_i}\left(x_j^{T} A_i x_j\right) = x_j x_j^{T} \quad \text{and} \quad \frac{\partial}{\partial A_i} \det(A_i) = \det(A_i) A_i^{-1}$$
and obtain:
$$\frac{\partial J}{\partial A_i} = \left[\sum_{j=1}^{n} \mu_{ij}^{\alpha} w_{ir}^{\beta} (x_{jr} - z_{ir})(x_{js} - z_{is})\right]_{d \times d} - \lambda_i \det(A_i) A_i^{-1} = 0 \quad (6.2)$$
Writing the first part of the summation in eq. 6.2 as
$$F_i = \sum_{j=1}^{n} \mu_{ij}^{\alpha} w_{ir}^{\beta} (x_{jr} - z_{ir})(x_{js} - z_{is}), \quad 1 \le r \le d, \; 1 \le s \le d \quad (6.3)$$
Eq. 6.2 can be rewritten as:
$$F_i = \lambda_i \rho_i A_i^{-1} \quad (6.4)$$
Multiplying by $A_i$ on both sides we get:
$$F_i A_i = \lambda_i \rho_i I$$
Taking the determinant on both sides we get:
$$\det(F_i A_i) = \det(\lambda_i \rho_i I)$$
or equivalently
$$\det(F_i A_i) = (\lambda_i \rho_i)^{d} \quad (6.5)$$
Using eq. 6.4 and eq. 6.5 we get:
$$A_i = \left(\rho_i \det(F_i)\right)^{1/d} F_i^{-1} \quad (6.6)$$
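Eq. 6.6 enforces the volume constraint $\det(A_i) = \rho_i$; a small numerical check (with an arbitrary positive definite $F_i$ and $\rho_i = 1$; purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
B = rng.normal(size=(d, d))
F_i = B @ B.T + d * np.eye(d)        # an arbitrary positive definite F_i
rho_i = 1.0

# eq. 6.6: A_i = (rho_i det(F_i))^(1/d) F_i^{-1}
A_i = (rho_i * np.linalg.det(F_i)) ** (1.0 / d) * np.linalg.inv(F_i)

print(np.isclose(np.linalg.det(A_i), rho_i))   # True: the determinant constraint holds
```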
Now, we prove the sufficiency condition. The metric used above is the Mahalanobis distance, which is generated by a convex function, and $J_{\alpha,\beta}$ is a linear combination of convex functions and hence convex. As a critical point of a convex function is a minimum, and a local minimum of a convex function is a global minimum, the sufficiency condition is proved.
Next, we compute the first order derivative of $J_{\alpha,\beta}$ with respect to $\mu_{ij}$, which gives a necessary condition for optimality:
$$\frac{\partial J_{\alpha,\beta}}{\partial \mu_{ij}} = \alpha \sum_{r=1}^{d} \mu_{ij}^{\alpha-1} w_{ir}^{\beta} d_{ijr}^{2} - \alpha \eta_i (1 - \mu_{ij})^{\alpha-1}$$
Setting the derivative to zero,
$$\alpha \left[\sum_{r=1}^{d} \mu_{ij}^{\alpha-1} w_{ir}^{\beta} d_{ijr}^{2} - \eta_i (1 - \mu_{ij})^{\alpha-1}\right] = 0$$
$$\sum_{r=1}^{d} \mu_{ij}^{\alpha-1} w_{ir}^{\beta} d_{ijr}^{2} = \eta_i (1 - \mu_{ij})^{\alpha-1}$$
$$\frac{\sum_{r=1}^{d} w_{ir}^{\beta} d_{ijr}^{2}}{\eta_i} = \frac{(1 - \mu_{ij})^{\alpha-1}}{\mu_{ij}^{\alpha-1}}$$
$$\left[\frac{\sum_{r=1}^{d} w_{ir}^{\beta} d_{ijr}^{2}}{\eta_i}\right]^{1/(\alpha-1)} = \frac{1 - \mu_{ij}}{\mu_{ij}}$$
$$1 + \left[\frac{\sum_{r=1}^{d} w_{ir}^{\beta} d_{ijr}^{2}}{\eta_i}\right]^{1/(\alpha-1)} = \frac{1}{\mu_{ij}}$$
$$\mu_{ij} = \frac{1}{1 + \left[\sum_{r=1}^{d} w_{ir}^{\beta} d_{ijr}^{2} / \eta_i\right]^{1/(\alpha-1)}}$$
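This closed form can be sanity-checked by scanning the $\mu_{ij}$-dependent part of $J_{\alpha,\beta}$ over $(0, 1)$ with everything else held fixed (toy values, our naming):

```python
import numpy as np

alpha, eta_i = 2.0, 1.5
s = 0.8   # sum_r w_ir^beta * d2_ijr, held fixed

def g(mu):  # terms of J_{alpha,beta} that depend on mu_ij
    return mu ** alpha * s + eta_i * (1 - mu) ** alpha

grid = np.linspace(1e-6, 1 - 1e-6, 100001)
numeric = grid[np.argmin(g(grid))]
closed = 1.0 / (1.0 + (s / eta_i) ** (1.0 / (alpha - 1.0)))
print(numeric, closed)   # both approx 0.652
```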
Now, to prove the sufficiency condition, we compute the second order partial derivative:
$$\frac{\partial^{2} J_{\alpha,\beta}}{\partial \mu_{ij} \, \partial \mu_{i'j'}} = \begin{cases} \alpha(\alpha-1) \sum_{r=1}^{d} \mu_{ij}^{\alpha-2} w_{ir}^{\beta} d_{ijr}^{2} + \alpha(\alpha-1) \eta_i (1 - \mu_{ij})^{\alpha-2} & \text{if } i = i', \; j = j', \\ 0 & \text{otherwise.} \end{cases}$$
Hence the Hessian of $J_{\alpha,\beta}$ with respect to U is a diagonal matrix with n distinct eigenvalues, each of multiplicity k. With the assumptions $\alpha > 1$ and $\mu_{ij} > 0 \; \forall i, j$, it implies $\partial^{2} J_{\alpha,\beta} / \partial \mu_{ij}^{2} > 0$, so the Hessian is positive definite and the sufficiency condition is proved.

Theorem 6.2 Let $\Psi : M_f^{kd} \to \mathbb{R}$, $\Psi(W) = J_{\alpha,\beta}(U, W, Z)$, where $U \in M_f^{kn}$ and $Z \in \mathbb{R}^{kd}$ are fixed. Then $W \in M_f^{kd}$ is a strict local minimum of $\Psi$ if and only if W is computed by the equation:
$$w_{ir} = \frac{1}{1 + \left[\sum_{j=1}^{n} \mu_{ij}^{\alpha} d_{ijr}^{2} / \gamma_i\right]^{1/(\beta-1)}}$$
Proof 6.2 Now, we compute the first order derivative of $J_{\alpha,\beta}$ with respect to $w_{ir}$, which gives a necessary condition for optimality:
$$\frac{\partial J_{\alpha,\beta}}{\partial w_{ir}} = \beta \sum_{j=1}^{n} \mu_{ij}^{\alpha} w_{ir}^{\beta-1} d_{ijr}^{2} - \beta \gamma_i (1 - w_{ir})^{\beta-1}$$
Setting the derivative to zero,
$$\beta \left[\sum_{j=1}^{n} \mu_{ij}^{\alpha} w_{ir}^{\beta-1} d_{ijr}^{2} - \gamma_i (1 - w_{ir})^{\beta-1}\right] = 0$$
$$\sum_{j=1}^{n} \mu_{ij}^{\alpha} w_{ir}^{\beta-1} d_{ijr}^{2} = \gamma_i (1 - w_{ir})^{\beta-1}$$
$$\frac{\sum_{j=1}^{n} \mu_{ij}^{\alpha} d_{ijr}^{2}}{\gamma_i} = \frac{(1 - w_{ir})^{\beta-1}}{w_{ir}^{\beta-1}}$$
$$\left[\frac{\sum_{j=1}^{n} \mu_{ij}^{\alpha} d_{ijr}^{2}}{\gamma_i}\right]^{1/(\beta-1)} = \frac{1 - w_{ir}}{w_{ir}}$$
$$1 + \left[\frac{\sum_{j=1}^{n} \mu_{ij}^{\alpha} d_{ijr}^{2}}{\gamma_i}\right]^{1/(\beta-1)} = \frac{1}{w_{ir}}$$
$$w_{ir} = \frac{1}{1 + \left[\sum_{j=1}^{n} \mu_{ij}^{\alpha} d_{ijr}^{2} / \gamma_i\right]^{1/(\beta-1)}}$$
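The same one-dimensional check works for the $w_{ir}$ update, with the roles of the sums and exponents swapped:

```python
# reusing the scan above: w**beta * t + gamma_i * (1 - w)**beta over w in (0, 1)
beta, gamma_i, t = 2.0, 1.5, 0.8   # t = sum_j mu_ij^alpha * d2_ijr, held fixed
closed_w = 1.0 / (1.0 + (t / gamma_i) ** (1.0 / (beta - 1.0)))
# a grid scan of the objective attains its minimum at the same value (approx 0.652)
```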
Now, to prove the sufficiency condition, we compute the second order partial derivative:
$$\frac{\partial^{2} J_{\alpha,\beta}}{\partial w_{ir} \, \partial w_{i'r'}} = \begin{cases} \beta(\beta-1) \sum_{j=1}^{n} w_{ir}^{\beta-2} \mu_{ij}^{\alpha} d_{ijr}^{2} + \beta(\beta-1) \gamma_i (1 - w_{ir})^{\beta-2} & \text{if } i = i', \; r = r', \\ 0 & \text{otherwise.} \end{cases}$$
Hence the Hessian of $J_{\alpha,\beta}$ with respect to W is a diagonal matrix with k distinct eigenvalues, each of multiplicity d. With the assumptions $\beta > 1$ and $w_{ir} > 0 \; \forall i, r$, it implies $\partial^{2} J_{\alpha,\beta} / \partial w_{ir}^{2} > 0$, so the Hessian is positive definite and the sufficiency condition is proved.

Theorem 6.3 Let $\xi : \mathbb{R}^{kd} \to \mathbb{R}$, $\xi(Z) = J_{\alpha,\beta}(U, W, Z)$, where $U \in M_f^{kn}$ and $W \in M_f^{kd}$ are fixed. Then Z is a strict local minimum of $\xi$ if and only if Z is computed by the equation:
$$z_{ir} = \sum_{j=1}^{n} \mu_{ij}^{\alpha} x_{jr} \Big/ \sum_{j=1}^{n} \mu_{ij}^{\alpha}$$
Proof 6.3 To minimize $J_{\alpha,\beta}$ with respect to the prototypes Z, we fix U and W. Let
$$\sum_{r=1}^{d} w_{ir}^{\beta} d_{ijr}^{2} = \hat{d}_{ij}^{2}$$
Now, taking the first derivative of $J_{\alpha,\beta}$ w.r.t. $z_i$ we get:
$$\frac{\partial J}{\partial z_i} = \sum_{j=1}^{n} \mu_{ij}^{\alpha} (z_i - x_j) = 0 \quad (6.7)$$
Solving it for $z_i$ we obtain:
$$z_i = \sum_{j=1}^{n} \mu_{ij}^{\alpha} x_j \Big/ \sum_{j=1}^{n} \mu_{ij}^{\alpha} \quad (6.8)$$
or,
$$z_{ir} = \sum_{j=1}^{n} \mu_{ij}^{\alpha} x_{jr} \Big/ \sum_{j=1}^{n} \mu_{ij}^{\alpha} \quad (6.9)$$
Hence, the necessary condition is proved. Now, to prove the sufficiency condition, we compute the second derivative of $J_{\alpha,\beta}$ with respect to the prototypes Z:
$$\frac{\partial^{2} J}{\partial z_i^{2}} = \sum_{j=1}^{n} \mu_{ij}^{\alpha} \quad (6.10)$$
Since $\partial^{2} J / \partial z_i^{2} > 0$, the sufficiency condition is proved.
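Eq. 6.9 is simply a weighted mean; a brute-force scan confirms it minimizes the fixed-(U, W) objective (toy values, our naming):

```python
import numpy as np

alpha = 2.0
x = np.array([0.0, 1.0, 4.0])     # coordinates x_jr of three patterns
mu = np.array([0.9, 0.5, 0.1])    # memberships mu_ij for one cluster i

z_closed = (mu ** alpha * x).sum() / (mu ** alpha).sum()   # eq. 6.9

grid = np.linspace(-1.0, 5.0, 600001)
obj = ((mu ** alpha)[:, None] * (x[:, None] - grid[None, :]) ** 2).sum(axis=0)
print(z_closed, grid[np.argmin(obj)])   # both approx 0.271
```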
Theorem 6.4 Let $\alpha > 1$ and $\beta > 1$ be fixed, and let $X = \{x_1, x_2, \ldots, x_n\}$ contain at least $k \; (< n)$ distinct points. Let S be the solution set of the optimization problem
$$\min_{(W, Z, U) \in M_f^{kd} \times \mathbb{R}^{kd} \times M_f^{kn}} J_{\alpha,\beta}(W, Z, U),$$
that is,
$$S = \left\{ (\hat{W}, \hat{Z}, \hat{U}) \in M_f^{kd} \times \mathbb{R}^{kd} \times M_f^{kn} \;\middle|\; J_{\alpha,\beta}(\hat{W}, \hat{Z}, \hat{U}) < J_{\alpha,\beta}(\hat{W}, \hat{Z}, U) \; \forall \, U \neq \hat{U}; \; J_{\alpha,\beta}(\hat{W}, \hat{Z}, \hat{U}) < J_{\alpha,\beta}(W, \hat{Z}, \hat{U}) \; \forall \, W \neq \hat{W}; \; J_{\alpha,\beta}(\hat{W}, \hat{Z}, \hat{U}) < J_{\alpha,\beta}(\hat{W}, Z, \hat{U}) \; \forall \, Z \neq \hat{Z} \right\}$$
Let $(\hat{W}, \hat{Z}, \hat{U}) \in M_f^{kd} \times \mathbb{R}^{kd} \times M_f^{kn}$. Then $J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U}) \le J_{\alpha,\beta}(\hat{W}, \hat{Z}, \hat{U})$ for every $(\tilde{W}, \tilde{Z}, \tilde{U}) \in T(\hat{W}, \hat{Z}, \hat{U})$, and the inequality is strict if $(\hat{W}, \hat{Z}, \hat{U}) \notin S$. The proof follows as in Theorem 11 of [23].
Proof 6.4 Let $(\tilde{W}, \tilde{Z}, \tilde{U}) \in T_{\alpha,\beta}(\hat{W}, \hat{Z}, \hat{U})$; then $(\tilde{W}, \tilde{Z}, \tilde{U}) \in A_3 \circ A_2 \circ A_1 (\hat{W}, \hat{Z}, \hat{U})$, where $A_1$, $A_2$ and $A_3$ are defined in eq. 6.2, 3.17 and 3.18. From the definition of $A_1$, $A_2$ and $A_3$, and theorems 6.1, 6.2 and 6.3, it follows that $(\tilde{W}, \tilde{Z}, \tilde{U}) \in M_f^{kd} \times \mathbb{R}^{kd} \times M_f^{kn}$ and
$$J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U}) \le J_{\alpha,\beta}(\hat{W}, \tilde{Z}, \tilde{U}) \le J_{\alpha,\beta}(\hat{W}, \hat{Z}, \tilde{U}) \le J_{\alpha,\beta}(\hat{W}, \hat{Z}, \hat{U}), \quad (6.11)$$
which implies
$$J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U}) \le J_{\alpha,\beta}(\hat{W}, \hat{Z}, \hat{U})$$
We now show that $J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U}) < J_{\alpha,\beta}(\hat{W}, \hat{Z}, \hat{U})$ if $(\hat{W}, \hat{Z}, \hat{U}) \notin S$. Using proof by contradiction, we assume that $J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U}) = J_{\alpha,\beta}(\hat{W}, \hat{Z}, \hat{U})$ for some $(\hat{W}, \hat{Z}, \hat{U}) \notin S$. Then from eq. 6.11,
$$J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U}) = J_{\alpha,\beta}(\hat{W}, \tilde{Z}, \tilde{U}) = J_{\alpha,\beta}(\hat{W}, \hat{Z}, \tilde{U}) = J_{\alpha,\beta}(\hat{W}, \hat{Z}, \hat{U}) \quad (6.12)$$
From theorem 6.3, $\tilde{Z} = F(\hat{W}, \tilde{U})$ is the global minimizer of $J_{\alpha,\beta}(\hat{W}, Z, \tilde{U})$, which has a unique global minimizer, so eq. 6.12 gives $\tilde{Z} = \hat{Z}$; similarly, $\tilde{W} = \hat{W}$ and, by theorem 6.1, $\tilde{U} = \hat{U}$.
Since $(\hat{W}, \hat{Z}, \hat{U}) \notin S$, from the definition of S we have the following three cases:
(a) $J_{\alpha,\beta}(\hat{W}, \hat{Z}, \hat{U}) \ge J_{\alpha,\beta}(\hat{W}, \hat{Z}, U^{*})$ for some $U^{*} \neq \hat{U}$;
(b) $J_{\alpha,\beta}(\hat{W}, \hat{Z}, \hat{U}) \ge J_{\alpha,\beta}(\hat{W}, Z^{*}, \hat{U})$ for some $Z^{*} \neq \hat{Z}$; or
(c) $J_{\alpha,\beta}(\hat{W}, \hat{Z}, \hat{U}) \ge J_{\alpha,\beta}(W^{*}, \hat{Z}, \hat{U})$ for some $W^{*} \neq \hat{W}$.
In case (a), as $\tilde{Z} = \hat{Z}$, $\tilde{W} = \hat{W}$ and $\tilde{U} = \hat{U}$, from theorem 6.1 we have
$$J_{\alpha,\beta}(\hat{W}, \hat{Z}, \hat{U}) \ge J_{\alpha,\beta}(\hat{W}, \hat{Z}, U^{*}) > J_{\alpha,\beta}(\hat{W}, \hat{Z}, \tilde{U}) = J_{\alpha,\beta}(\hat{W}, \hat{Z}, \hat{U}),$$
which contradicts our assumption.
In case (b), as $\tilde{Z} = \hat{Z}$ and $Z^{*} \neq \hat{Z}$, from theorem 6.3 and eq. 6.12 we have
$$J_{\alpha,\beta}(\hat{W}, \hat{Z}, \hat{U}) \ge J_{\alpha,\beta}(\hat{W}, Z^{*}, \hat{U}) > J_{\alpha,\beta}(\hat{W}, \tilde{Z}, \hat{U}) = J_{\alpha,\beta}(\hat{W}, \hat{Z}, \hat{U}),$$
which contradicts our assumption.
In case (c), as $\tilde{Z} = \hat{Z}$, $\tilde{W} = \hat{W}$ and $W^{*} \neq \hat{W}$, from theorem 6.2 and eq. 6.12 we have
$$J_{\alpha,\beta}(\hat{W}, \hat{Z}, \hat{U}) \ge J_{\alpha,\beta}(W^{*}, \hat{Z}, \hat{U}) > J_{\alpha,\beta}(\tilde{W}, \hat{Z}, \hat{U}) = J_{\alpha,\beta}(\hat{W}, \hat{Z}, \hat{U}),$$
which contradicts our assumption. Hence, the theorem is proved.
Lemma 6.5 Let $M : X \to Y$ be a function and $\Phi : Y \to P(Y)$ be a point-to-set map. If M is continuous at $a^{(0)}$ and $\Phi$ is closed at $M(a^{(0)})$, then the point-to-set map $\Phi \circ M : X \to P(Y)$ is closed at $a^{(0)}$.
Theorem 6.6 Let $\alpha > 1$ and $\beta > 1$ be fixed, and let $X = \{x_1, x_2, \ldots, x_n\}$ contain at least $k \; (< n)$ distinct points. Then the point-to-set map $T_{\alpha,\beta} : M_f^{kd} \times \mathbb{R}^{kd} \times M_f^{kn} \to P(M_f^{kd} \times \mathbb{R}^{kd} \times M_f^{kn})$ is closed at every point in $M_f^{kd} \times \mathbb{R}^{kd} \times M_f^{kn}$.
To prove the above theorem we need to prove the following:
a) The function $A_1 : M_f^{kd} \times \mathbb{R}^{kd} \times M_f^{kn} \to \mathbb{R}^{kd} \times M_f^{kn}$, as defined in eq. 6.2, is continuous at every point in $M_f^{kd} \times \mathbb{R}^{kd} \times M_f^{kn}$.
b) The function $A_2 : \mathbb{R}^{kd} \times M_f^{kn} \to M_f^{kd} \times \mathbb{R}^{kd}$, as defined in eq. 3.17, is continuous at every point in $\mathbb{R}^{kd} \times M_f^{kn}$.
c) The point-to-set map $A_3 : M_f^{kd} \times \mathbb{R}^{kd} \to P(M_f^{kd} \times \mathbb{R}^{kd} \times M_f^{kn})$, as defined in eq. 3.18, is closed on $M_f^{kd} \times \mathbb{R}^{kd}$.
Proof 6.5 The proof follows immediately from lemmas 19, 20 and 21, respectively, in [23].