
Chapter 6

Projected Gustafson Kessel Possibilistic Clustering
The clustering literature shows that high-dimensional data sets render the clustering problem harder, and the problem is further aggravated in noisy environments. Moreover, fuzzy clustering methods fail to generate appropriate possibilistic membership grades. Hence, by extending the intuitive concept behind the degree of belongingness, PCM produces possibilistic partitions rather than probabilistic partitions. To accomplish the task of projected clustering we present a novel algorithm in the framework of possibility theory. This is effected by introducing a new weight vector that assigns a continuum of grades to each dimension in each cluster. In this chapter, we propose a new objective function for projective possibilistic clustering which automatically detects the relevant cluster dimensions. Experimental results indicate that it enhances the efficiency of the clustering solution by simultaneously pruning away the irrelevant subspaces. The structure of this chapter is as follows: in Section 6.2 we present the PCM algorithm; in Section 6.3 we describe the proposed projected PCM algorithm in detail and discuss its convergence in Section 6.4; results based on UCI data sets are presented in Section 6.5; and finally Section 6.6 contains conclusions and suggestions for future work.
6.1 Introduction
Traditional probabilistic algorithms such as FCM impose a constraint on the memberships: the sum of the memberships of an object across all classes must equal one. This membership constraint causes FCM to generate memberships that can be interpreted as degrees of sharing but not as degrees of typicality. As a result, the memberships of two points in a given cluster that are equidistant from the cluster prototype can be significantly different, and the memberships of two points in a given cluster can be equal even though the two points are arbitrarily far away from each other [117]. FCM also cannot estimate class centers appropriately when noisy or outlying points exist. In order to overcome such drawbacks, robust techniques such as Possibilistic c-Means (PCM) clustering have been proposed [117]. In the possibilistic approach to clustering, the membership value of a point in a cluster represents the typicality of the point in the cluster, or the possibility of the point belonging to the cluster.
Mixture models have the underlying assumption that a data object belongs to a single cluster. However, a data object may belong to multiple clusters. Moreover, a mixture model constrains the membership probabilities of a data point to sum to one across clusters, so the presence of outliers cannot be detected. In [110] Chatzis and Tsechpenakis proposed a possibilistic clustering method based on generative mixture models, together with a maximum-likelihood fitting algorithm for the proposed model. In [111] Coppi et al. proposed clustering models for LR$_2$ data using fuzzy and possibilistic clustering; the models detect clusters described by fuzzy variables. In [112] Ji et al. extended the possibilistic fuzzy c-means algorithm to segment MR images with intensity inhomogeneities and noise. They proposed a method to compute weights for the neighbourhood of each pixel in the image, as well as an energy minimization method that combines local and global intensity information to address intensity inhomogeneity. Their method is robust to parameter settings and is not affected by initialization. In [113] Romdhane and Ayeb proposed a two-step unsupervised possibilistic clustering method to mine microarray gene expression data. It finds the optimal number of clusters using information entropy and partitions the data using possibilistic clustering. In the second step it selects the most informative genes from each cluster and models them as proximity graphs, which are further used to predict the functioning of unknown genes. In [114] Wang proposed an alternating-optimization-based possibilistic shell clustering algorithm that detects generic-template-based shell clusters; three different types of transformations were proposed to obtain cluster prototypes from a given template with different degrees of complexity and flexibility. In [115] Yang and Lai extended the possibilistic c-means clustering algorithm to make it robust to noise, the number of clusters, cluster volumes and initialization: initially all data points are treated as cluster centers, and the method then automatically merges data points to obtain clusters. In [116] Yu and Luo proposed a possibilistic fuzzy leader algorithm which resolves the noise sensitivity of FCM and the coincident-cluster problem of PCM.
6.2 Possibilistic c-Means Clustering
Fuzzy clustering algorithms outperform hard clustering algorithms by incorporating a fuzzy extension of the least-squares error criterion, subject to the probabilistic constraint that the memberships of a data point across clusters sum to one. However, the memberships produced by the FCM algorithm do not consistently reflect typicality. In order to overcome these difficulties, Krishnapuram and Keller introduced the Possibilistic c-Means (PCM) algorithm [117]. In PCM each cluster is independent of the other clusters: PCM removes the probabilistic constraint of FCM and adds a penalty term to the FCM objective function, yielding an algorithm more resistant to noise than FCM.
The possibilistic partition space for $(X, k)$ is the set
\[
M_{f}^{kn} = \left\{ U \in \mathbb{R}^{kn} \;\middle|\; \mu_{ij} \in [0, 1]\ \forall i, j;\ \sum_{i=1}^{k} \mu_{ij} \le 1\ \forall j;\ 0 < \sum_{j=1}^{n} \mu_{ij} < n\ \forall i \right\}.
\]
The possibilistic c-means objective function is formulated as
\[
J_{m} = \sum_{j=1}^{n}\sum_{i=1}^{k} \mu_{ij}^{m}\, d^{2}(x_{j}, z_{i}) + \sum_{i=1}^{k} \eta_{i} \sum_{j=1}^{n} (1 - \mu_{ij})^{m},
\]
where the coefficient $m \in (1, \infty)$ is a fuzzification parameter and the $\eta_{i}$ are the positive penalty parameters defined in [117].
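For concreteness, the following minimal NumPy sketch evaluates the PCM membership update $\mu_{ij} = 1/\big(1 + (d^{2}(x_{j}, z_{i})/\eta_{i})^{1/(m-1)}\big)$ from [117] and the objective above. It is an illustration only: the function names are ours, squared Euclidean distance is assumed, and the penalty parameters $\eta_{i}$ are supplied by the user.

```python
import numpy as np


def pcm_memberships(X, Z, eta, m=2.0):
    """u_ij = 1 / (1 + (d2_ij / eta_i)^(1/(m-1))), the PCM typicality update."""
    d2 = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # (k, n) squared distances
    return 1.0 / (1.0 + (d2 / eta[:, None]) ** (1.0 / (m - 1.0)))


def pcm_objective(X, Z, U, eta, m=2.0):
    """J_m = sum_ij u_ij^m d2_ij + sum_i eta_i sum_j (1 - u_ij)^m."""
    d2 = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.sum(U ** m * d2) + np.sum(eta * np.sum((1 - U) ** m, axis=1))


# toy usage: two prototypes in the plane
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))              # n = 100 points, d = 2
Z = np.array([[0.0, 0.0], [3.0, 3.0]])     # k = 2 cluster prototypes
eta = np.ones(2)                           # penalty parameters, one per cluster
U = pcm_memberships(X, Z, eta)
print(U.shape, round(pcm_objective(X, Z, U, eta), 3))
```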
6.3 Projected Gustafson Kessel Possibilistic Clustering
The possibilistic partitioning subspace for $(X, d)$ is the set
\[
M_{f}^{kd} = \left\{ W \in \mathbb{R}^{kd} \;\middle|\; w_{ir} \in [0, 1]\ \forall i, r;\ \sum_{r=1}^{d} w_{ir} \le 1\ \forall i;\ 0 < \sum_{i=1}^{k} w_{ir} < k\ \forall r \right\}.
\]
Thus, we formulate a new possibilistic objective function as
\[
J_{\alpha,\beta} = \sum_{j=1}^{n}\sum_{i=1}^{k}\sum_{r=1}^{d} \mu_{ij}^{\alpha}\, w_{ir}^{\beta}\, d_{ijr}^{2}
+ \sum_{i=1}^{k} \eta_{i} \sum_{j=1}^{n} (1-\mu_{ij})^{\alpha}
+ \sum_{i=1}^{k} \gamma_{i} \sum_{r=1}^{d} (1-w_{ir})^{\beta},
\]
where
\[
d_{ijr}^{2} = (x_{jr} - z_{ir})^{2},
\]
and $\eta_{i}$ and $\gamma_{i}$ are defined in a similar fashion as in [117]:
\[
\eta_{i} = K\, \frac{\sum_{j=1}^{n}\sum_{r=1}^{d} \mu_{ij}^{\alpha}\, d_{ijr}^{2}}{\sum_{j=1}^{n}\sum_{r=1}^{d} \mu_{ij}^{\alpha}},
\qquad
\gamma_{i} = K\, \frac{\sum_{j=1}^{n}\sum_{r=1}^{d} w_{ir}^{\beta}\, d_{ijr}^{2}}{\sum_{j=1}^{n}\sum_{r=1}^{d} w_{ir}^{\beta}}.
\]
The possibilistic partitioning subspaces for $(X, k)$ and $(X, d)$ together form the new partition space for the high-dimensional data set $X$. The parameters $\alpha \in (1, \infty)$ and $\beta \in (1, \infty)$ are weighting exponents; they control the fuzzification of $\mu_{ij}$ ($w_{ir}$). The larger the value of $\alpha$ ($\beta$), the more uniform the distribution of $\mu_{ij}$ and $w_{ir}$, giving each pattern an equal chance to influence all clusters and dimensions. A value of $\alpha$ ($\beta$) closer to 1 indicates good clustering behaviour, as $\mu_{ij}$ ($w_{ir}$) then assigns higher values to the relevant clusters (subspaces).
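As an illustration of how the penalty parameters could be computed from the current partition matrices, here is a small NumPy sketch of the $\eta_{i}$ and $\gamma_{i}$ formulas above (with $K = 1$ by default; the function and argument names are ours, not from the chapter):

```python
import numpy as np


def penalty_parameters(U, W, d2, alpha=2.0, beta=2.0, K=1.0):
    """eta_i and gamma_i as in the formulas above.

    U  : (k, n) possibilistic memberships mu_ij
    W  : (k, d) dimension weights w_ir
    d2 : (k, n, d) per-dimension squared distances d2_ijr
    """
    Ua = U ** alpha                                   # mu_ij^alpha, shape (k, n)
    Wb = W ** beta                                    # w_ir^beta,  shape (k, d)
    n_pts, n_dims = d2.shape[1], d2.shape[2]
    # eta_i = K * sum_{j,r} mu_ij^alpha d2_ijr / sum_{j,r} mu_ij^alpha
    eta = K * np.einsum('kn,knd->k', Ua, d2) / (Ua.sum(axis=1) * n_dims)
    # gamma_i = K * sum_{j,r} w_ir^beta d2_ijr / sum_{j,r} w_ir^beta
    gamma = K * np.einsum('kd,knd->k', Wb, d2) / (Wb.sum(axis=1) * n_pts)
    return eta, gamma


# toy usage with random partitions and distances
rng = np.random.default_rng(0)
k, n, d = 3, 20, 4
eta, gamma = penalty_parameters(rng.random((k, n)), rng.random((k, d)), rng.random((k, n, d)))
print(eta.shape, gamma.shape)
```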
Inputs
n: size of the data set X
k: number of clusters, 1 < k < n
$\alpha$: the weight exponent of matrix U, $\alpha > 1$
$\beta$: the weight exponent of matrix W, $\beta > 1$
$\varepsilon$: the termination tolerance, $\varepsilon > 0$
A: the norm-inducing matrix

Outputs
U: membership matrix of objects in clusters
W: matrix indicating the relevance of dimensions for clusters
Z: cluster centers

Initialize the partition matrices U, W randomly; set t = 0.
while $\left\| U^{t} - U^{t-1} \right\| > \varepsilon$ do
Step 1. Compute the cluster prototypes
\[
z_{ir}^{t} = \frac{\sum_{j=1}^{n} (w_{ir}^{t-1})^{\beta} (\mu_{ij}^{t-1})^{\alpha}\, x_{jr}}{\sum_{j=1}^{n} (w_{ir}^{t-1})^{\beta} (\mu_{ij}^{t-1})^{\alpha}}
\]
Step 2. Compute the cluster covariance matrices
\[
F_{i}^{t} = \sum_{j=1}^{n} (w_{ir}^{t-1})^{\beta} (\mu_{ij}^{t-1})^{\alpha} \big( x_{jr} - z_{ir}^{t} \big)\big( x_{js} - z_{is}^{t} \big), \qquad 1 \le r \le d,\ 1 \le s \le d.
\]
Step 3. Compute the norm-inducing matrices $A_{i} = \big(\rho_{i} \det(F_{i}^{t})\big)^{1/d} (F_{i}^{t})^{-1}$ (eq. 6.6) and the distances
\[
d_{ijr}^{2} = \big[ (x_{j1} - z_{i1}^{t})\ \ldots\ (x_{jd} - z_{id}^{t}) \big]\, A_{i}\, \big[ (x_{j1} - z_{i1}^{t})\ \ldots\ (x_{jd} - z_{id}^{t}) \big]^{T}
\]
Step 4. Update the partition matrices
\[
\mu_{ij} = \frac{1}{1 + \left[ \dfrac{\sum_{r=1}^{d} w_{ir}^{\beta}\, d_{ijr}^{2}}{\eta_{i}} \right]^{1/(\alpha - 1)}},
\qquad
w_{ir} = \frac{1}{1 + \left[ \dfrac{\sum_{j=1}^{n} \mu_{ij}^{\alpha}\, d_{ijr}^{2}}{\gamma_{i}} \right]^{1/(\beta - 1)}}
\]
Step 5. t = t + 1
end

Algorithm 6.1: Projected Gustafson Kessel Possibilistic Clustering Algorithm
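To make the iteration concrete, the following is a minimal NumPy sketch of Algorithm 6.1, not the implementation used for the experiments. It makes several simplifying assumptions: $K = 1$ in the penalty formulas, the covariance in Step 2 is taken as the plain membership-weighted scatter, and Step 3 is read per dimension as $d_{ijr}^{2} = a_{i,rr}(x_{jr} - z_{ir}^{t})^{2}$ using the diagonal of $A_{i}$; all function and variable names are illustrative.

```python
import numpy as np


def projected_gk_pcm(X, k, alpha=2.0, beta=2.0, eps=1e-4, max_iter=100, rho=1.0, seed=0):
    """Illustrative sketch of Algorithm 6.1 under the assumptions stated above."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = rng.random((k, n))            # memberships mu_ij, initialized randomly
    W = rng.random((k, d))            # dimension weights w_ir, initialized randomly
    reg = 1e-6 * np.eye(d)            # regularizer keeping F_i invertible

    for _ in range(max_iter):
        U_old = U.copy()
        Ua, Wb = U ** alpha, W ** beta

        # Step 1: prototypes (per-dimension weighted means; w_ir cancels within a dimension)
        Z = (Ua @ X) / Ua.sum(axis=1, keepdims=True)

        # Steps 2-3: membership-weighted scatter, GK norm-inducing matrix,
        # and a diagonal reading of the per-dimension distances d2_ijr
        d2 = np.empty((k, n, d))
        for i in range(k):
            diff = X - Z[i]                                   # (n, d)
            F = (diff * Ua[i][:, None]).T @ diff + reg        # (d, d), symmetric PD
            A = (rho * np.linalg.det(F)) ** (1.0 / d) * np.linalg.inv(F)
            d2[i] = diff ** 2 * np.diag(A) + 1e-12            # keep d2_ijr > 0

        # penalty parameters eta_i and gamma_i (with K = 1)
        eta = np.einsum('kn,knd->k', Ua, d2) / (Ua.sum(axis=1) * d)
        gamma = np.einsum('kd,knd->k', Wb, d2) / (Wb.sum(axis=1) * n)

        # Step 4: possibilistic updates of U and W
        U = 1.0 / (1.0 + (np.einsum('kd,knd->kn', Wb, d2) / eta[:, None]) ** (1.0 / (alpha - 1)))
        W = 1.0 / (1.0 + (np.einsum('kn,knd->kd', Ua, d2) / gamma[:, None]) ** (1.0 / (beta - 1)))

        # Step 5 / termination test on successive membership matrices
        if np.linalg.norm(U - U_old) <= eps:
            break

    return U, W, Z


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # two Gaussian blobs plus two pure-noise dimensions
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
    X = np.hstack([X, rng.normal(0, 1, (100, 2))])
    U, W, Z = projected_gk_pcm(X, k=2)
    print(np.round(W, 2))   # inspect the learned dimension weights
```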
6.4 Convergence
Theorem 6.1 Let $\Phi : M_{f}^{kn} \to \mathbb{R}$, $\Phi(U) = J_{\alpha,\beta}(U, W, Z)$, where $W \in M_{f}^{kd}$ and $Z \in \mathbb{R}^{kd}$ are fixed. Then $U \in M_{f}^{kn}$ is a strict local minimum of $\Phi$ if and only if $U$ is computed by the equation
\[
\mu_{ij} = \frac{1}{1 + \left[ \dfrac{\sum_{r=1}^{d} w_{ir}^{\beta}\, d_{ijr}^{2}}{\eta_{i}} \right]^{1/(\alpha - 1)}}.
\]
Proof 6.1 We have to minimize $J_{\alpha,\beta}$ with respect to $U$ and $W$, subject to constraints 5.7. We formulate the objective function as follows:
\[
J_{\alpha,\beta} = \sum_{j=1}^{n}\sum_{i=1}^{k}\sum_{r=1}^{d} \mu_{ij}^{\alpha}\, w_{ir}^{\beta}\, d_{ijr}^{2}
+ \sum_{i=1}^{k} \eta_{i} \sum_{j=1}^{n} (1-\mu_{ij})^{\alpha}
+ \sum_{i=1}^{k} \gamma_{i} \sum_{r=1}^{d} (1-w_{ir})^{\beta}
- \sum_{i=1}^{k} \lambda_{i} \big( \det(A_{i}) - \rho_{i} \big). \tag{6.1}
\]
In order to obtain the first-order necessary condition for optimality, we set the gradient of $J_{\alpha,\beta}$ with respect to $A_{i}$ equal to zero. We use the following identities [25]:
\[
\frac{\partial}{\partial A_{i}}\big( x_{j}^{T} A_{i} x_{j} \big) = x_{j} x_{j}^{T}
\qquad\text{and}\qquad
\frac{\partial}{\partial A_{i}} \det(A_{i}) = \det(A_{i})\, A_{i}^{-1},
\]
and obtain
\[
\frac{\partial J}{\partial A_{i}} = \left[ \sum_{j=1}^{n} \mu_{ij}^{\alpha}\, w_{ir}^{\beta}\, (x_{jr} - z_{ir})(x_{js} - z_{is}) \right]_{d \times d} - \lambda_{i}\, \det(A_{i})\, A_{i}^{-1} = 0. \tag{6.2}
\]
Writing the first part of the summation in eq. 6.2 as
\[
F_{i} = \sum_{j=1}^{n} \mu_{ij}^{\alpha}\, w_{ir}^{\beta}\, (x_{jr} - z_{ir})(x_{js} - z_{is}), \qquad 1 \le r \le d,\ 1 \le s \le d, \tag{6.3}
\]
eq. 6.2 can be rewritten as
\[
F_{i} = \lambda_{i}\, \rho_{i}\, A_{i}^{-1}. \tag{6.4}
\]
Multiplying by $A_{i}$ on both sides we get
\[
F_{i} A_{i} = \lambda_{i}\, \rho_{i}\, I.
\]
Taking the determinant on both sides we get
\[
\det(F_{i} A_{i}) = \det(\lambda_{i}\, \rho_{i}\, I),
\]
or equivalently
\[
\det(F_{i} A_{i}) = (\lambda_{i}\, \rho_{i})^{d}. \tag{6.5}
\]
Using eq. 6.4 and eq. 6.5 we get
\[
A_{i} = \big( \rho_{i} \det(F_{i}) \big)^{1/d} F_{i}^{-1}. \tag{6.6}
\]
Now, we prove the sufficiency condition. The metric used above is the Mahalanobis distance, which is generated by a convex function, and $J_{\alpha,\beta}$ is a linear combination of convex functions and is hence convex. Since a critical point of a convex function is a minimum, and a local minimum of a convex function is a global minimum, the sufficiency condition is proved.
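As a quick numerical illustration of eq. 6.6 (a sketch with an arbitrary symmetric positive definite $F_{i}$ and an assumed volume $\rho_{i} = 2$), the scaling $(\rho_{i} \det(F_{i}))^{1/d}$ makes the resulting $A_{i}$ satisfy the volume constraint $\det(A_{i}) = \rho_{i}$:

```python
import numpy as np


def gk_norm_matrix(F, rho=1.0):
    """A_i = (rho_i * det(F_i))^(1/d) * F_i^{-1}, which enforces det(A_i) = rho_i."""
    d = F.shape[0]
    return (rho * np.linalg.det(F)) ** (1.0 / d) * np.linalg.inv(F)


# check on an arbitrary symmetric positive definite F_i
rng = np.random.default_rng(1)
B = rng.normal(size=(3, 3))
F = B @ B.T + 1e-3 * np.eye(3)
A = gk_norm_matrix(F, rho=2.0)
print(np.isclose(np.linalg.det(A), 2.0))   # the volume constraint det(A_i) = rho_i holds
```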
Next, we compute the first-order derivative of $J_{\alpha,\beta}$ with respect to $\mu_{ij}$, which gives a necessary condition for optimality:
\[
\frac{\partial J_{\alpha,\beta}}{\partial \mu_{ij}} = \alpha \sum_{r=1}^{d} \mu_{ij}^{\alpha-1}\, w_{ir}^{\beta}\, d_{ijr}^{2} - \alpha\, \eta_{i}\, (1-\mu_{ij})^{\alpha-1}.
\]
Setting it to zero and rearranging,
\[
\sum_{r=1}^{d} \mu_{ij}^{\alpha-1}\, w_{ir}^{\beta}\, d_{ijr}^{2} = \eta_{i}\, (1-\mu_{ij})^{\alpha-1}
\]
\[
\frac{\sum_{r=1}^{d} w_{ir}^{\beta}\, d_{ijr}^{2}}{\eta_{i}} = \frac{(1-\mu_{ij})^{\alpha-1}}{\mu_{ij}^{\alpha-1}}
\]
\[
\left[ \frac{\sum_{r=1}^{d} w_{ir}^{\beta}\, d_{ijr}^{2}}{\eta_{i}} \right]^{1/(\alpha-1)} = \frac{1-\mu_{ij}}{\mu_{ij}}
\]
\[
1 + \left[ \frac{\sum_{r=1}^{d} w_{ir}^{\beta}\, d_{ijr}^{2}}{\eta_{i}} \right]^{1/(\alpha-1)} = \frac{1}{\mu_{ij}}
\]
\[
\mu_{ij} = \frac{1}{1 + \left[ \dfrac{\sum_{r=1}^{d} w_{ir}^{\beta}\, d_{ijr}^{2}}{\eta_{i}} \right]^{1/(\alpha-1)}}.
\]
Now, to prove the sufficiency condition we compute the second-order partial derivative:
\[
\frac{\partial^{2} J_{\alpha,\beta}}{\partial \mu_{ij}\, \partial \mu_{i'j'}} =
\begin{cases}
\alpha(\alpha-1) \sum_{r=1}^{d} \mu_{ij}^{\alpha-2}\, w_{ir}^{\beta}\, d_{ijr}^{2} + \alpha(\alpha-1)\, \eta_{i}\, (1-\mu_{ij})^{\alpha-2} & \text{if } i = i',\ j = j', \\
0 & \text{otherwise.}
\end{cases}
\]
Hence the Hessian of $J_{\alpha,\beta}$ with respect to $U$ is a diagonal matrix with $n$ distinct eigenvalues, each of multiplicity $k$. With the assumptions $\alpha > 1$ and $\mu_{ij} > 0\ \forall i, j$, it follows that $\frac{\partial^{2} J_{\alpha,\beta}}{\partial \mu_{ij}^{2}} > 0$. Thus, the Hessian with respect to $U$ is positive definite and hence the sufficiency condition is proved. □
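The closed-form $\mu_{ij}$ derived above can be checked numerically. With arbitrary assumed values of $w_{ir}^{\beta}$, $d_{ijr}^{2}$, $\eta_{i}$ and $\alpha$ for a single $(i, j)$ pair, the update zeroes the first derivative and the second derivative is positive, in line with the proof; the helper below is only a sketch:

```python
import numpy as np


def dJ_dmu(mu, wb, d2r, eta, alpha):
    """First derivative of the (i, j) terms of J w.r.t. mu_ij, as in the proof above."""
    return alpha * mu ** (alpha - 1) * np.sum(wb * d2r) - alpha * eta * (1 - mu) ** (alpha - 1)


# arbitrary (assumed) values for one (i, j) pair
alpha, eta = 2.5, 0.7
wb = np.array([0.9, 0.4, 0.1]) ** 3.0        # w_ir^beta, already raised to beta = 3
d2r = np.array([0.2, 1.3, 0.5])              # per-dimension squared distances d2_ijr

# closed-form update from Theorem 6.1
mu = 1.0 / (1.0 + (np.sum(wb * d2r) / eta) ** (1.0 / (alpha - 1.0)))

print(np.isclose(dJ_dmu(mu, wb, d2r, eta, alpha), 0.0))      # first-order condition holds
h = 1e-6
curv = (dJ_dmu(mu + h, wb, d2r, eta, alpha) - dJ_dmu(mu - h, wb, d2r, eta, alpha)) / (2 * h)
print(curv > 0)                                              # second derivative positive (minimum)
```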
Theorem 6.2 Let $\Phi : M_{f}^{kd} \to \mathbb{R}$, $\Phi(W) = J_{\alpha,\beta}(U, W, Z)$, where $U \in M_{f}^{kn}$ and $Z \in \mathbb{R}^{kd}$ are fixed. Then $W \in M_{f}^{kd}$ is a strict local minimum of $\Phi$ if and only if $W$ is computed by the equation
\[
w_{ir} = \frac{1}{1 + \left[ \dfrac{\sum_{j=1}^{n} \mu_{ij}^{\alpha}\, d_{ijr}^{2}}{\gamma_{i}} \right]^{1/(\beta - 1)}}.
\]
Proof 6.2 We compute the first-order derivative of $J_{\alpha,\beta}$ with respect to $w_{ir}$, which gives a necessary condition for optimality:
\[
\frac{\partial J_{\alpha,\beta}}{\partial w_{ir}} = \beta \sum_{j=1}^{n} \mu_{ij}^{\alpha}\, w_{ir}^{\beta-1}\, d_{ijr}^{2} - \beta\, \gamma_{i}\, (1-w_{ir})^{\beta-1}.
\]
Setting it to zero and rearranging,
\[
\sum_{j=1}^{n} \mu_{ij}^{\alpha}\, w_{ir}^{\beta-1}\, d_{ijr}^{2} = \gamma_{i}\, (1-w_{ir})^{\beta-1}
\]
\[
\frac{\sum_{j=1}^{n} \mu_{ij}^{\alpha}\, d_{ijr}^{2}}{\gamma_{i}} = \frac{(1-w_{ir})^{\beta-1}}{w_{ir}^{\beta-1}}
\]
\[
\left[ \frac{\sum_{j=1}^{n} \mu_{ij}^{\alpha}\, d_{ijr}^{2}}{\gamma_{i}} \right]^{1/(\beta-1)} = \frac{1 - w_{ir}}{w_{ir}}
\]
\[
w_{ir} = \frac{1}{1 + \left[ \dfrac{\sum_{j=1}^{n} \mu_{ij}^{\alpha}\, d_{ijr}^{2}}{\gamma_{i}} \right]^{1/(\beta-1)}}.
\]
Now, to prove the sufficiency condition we compute the second-order partial derivative:
\[
\frac{\partial^{2} J_{\alpha,\beta}}{\partial w_{ir}\, \partial w_{i'r'}} =
\begin{cases}
\beta(\beta-1) \sum_{j=1}^{n} w_{ir}^{\beta-2}\, \mu_{ij}^{\alpha}\, d_{ijr}^{2} + \beta(\beta-1)\, \gamma_{i}\, (1-w_{ir})^{\beta-2} & \text{if } i = i',\ r = r', \\
0 & \text{otherwise.}
\end{cases}
\]
Hence the Hessian of $J_{\alpha,\beta}$ with respect to $W$ is a diagonal matrix with $k$ distinct eigenvalues, each of multiplicity $d$. With the assumptions $\beta > 1$ and $w_{ir} > 0\ \forall i, r$, it follows that $\frac{\partial^{2} J_{\alpha,\beta}}{\partial w_{ir}^{2}} > 0$. Thus, the Hessian with respect to $W$ is positive definite and hence the sufficiency condition is proved. □
Theorem 6.3 Let $\Phi : \mathbb{R}^{kd} \to \mathbb{R}$, $\Phi(Z) = J_{\alpha,\beta}(U, W, Z)$, where $U \in M_{f}^{kn}$ and $W \in M_{f}^{kd}$ are fixed. Then $Z$ is a strict local minimum of $\Phi$ if and only if $z_{i}$, $1 \le i \le k$, is computed by the equation
\[
z_{ir} = \frac{\sum_{j=1}^{n} \mu_{ij}^{\alpha}\, x_{jr}}{\sum_{j=1}^{n} \mu_{ij}^{\alpha}}.
\]
Proof 6.3 To minimize $J_{\alpha,\beta}$ with respect to the prototypes $Z$, we fix $U$ and $W$. Let
\[
\tilde{d}_{ij}^{2} = \sum_{r=1}^{d} w_{ir}^{\beta}\, d_{ijr}^{2}.
\]
Taking the first derivative of $J_{\alpha,\beta}$ with respect to $z_{i}$ we get
\[
\frac{\partial J}{\partial z_{i}} = \sum_{j=1}^{n} \mu_{ij}^{\alpha}\, (z_{i} - x_{j}) = 0. \tag{6.7}
\]
Solving for $z_{i}$ we obtain
\[
z_{i} = \frac{\sum_{j=1}^{n} \mu_{ij}^{\alpha}\, x_{j}}{\sum_{j=1}^{n} \mu_{ij}^{\alpha}}, \tag{6.8}
\]
or, componentwise,
\[
z_{ir} = \frac{\sum_{j=1}^{n} \mu_{ij}^{\alpha}\, x_{jr}}{\sum_{j=1}^{n} \mu_{ij}^{\alpha}}. \tag{6.9}
\]
Hence the necessary condition is proved. To prove the sufficiency condition we compute the second derivative of $J_{\alpha,\beta}$ with respect to the prototypes:
\[
\frac{\partial^{2} J}{\partial z_{i}^{2}} = \sum_{j=1}^{n} \mu_{ij}^{\alpha}. \tag{6.10}
\]
Since $\frac{\partial^{2} J}{\partial z_{i}^{2}} > 0$, the sufficiency condition is proved. □
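As a small numerical illustration with assumed values for a single cluster and dimension, the weighted mean of eq. 6.9 satisfies the stationarity condition of eq. 6.7, and the quantity in eq. 6.10 is positive:

```python
import numpy as np

# arbitrary (assumed) memberships and 1-D coordinates for one cluster/dimension
rng = np.random.default_rng(2)
alpha = 2.0
mu = rng.random(10)          # mu_ij for j = 1..n
x = rng.normal(size=10)      # x_jr for j = 1..n

# closed-form prototype from eq. 6.9 and the gradient from eq. 6.7
z = np.sum(mu ** alpha * x) / np.sum(mu ** alpha)
grad = np.sum(mu ** alpha * (z - x))

print(np.isclose(grad, 0.0))          # stationarity of the weighted mean
print(np.sum(mu ** alpha) > 0)        # second derivative (6.10) is positive
```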
Theorem 6.4 Let $\alpha > 1$ and $\beta > 1$ be fixed, and let $X = \{x_{1}, x_{2}, \ldots, x_{n}\}$ contain at least $k\,(<n)$ distinct points. Let the solution set $S$ of the optimization problem
\[
\min_{(W, Z, U)\,\in\, M_{f}^{kd} \times \mathbb{R}^{kd} \times M_{f}^{kn}} J_{\alpha,\beta}(W, Z, U)
\]
be
\[
S = \Big\{ (\bar{W}, \bar{Z}, \bar{U}) \in M_{f}^{kd} \times \mathbb{R}^{kd} \times M_{f}^{kn} \;\Big|\;
J_{\alpha,\beta}(\bar{W}, \bar{Z}, \bar{U}) < J_{\alpha,\beta}(\bar{W}, \bar{Z}, U)\ \forall\, U \neq \bar{U},\;
J_{\alpha,\beta}(\bar{W}, \bar{Z}, \bar{U}) < J_{\alpha,\beta}(W, \bar{Z}, \bar{U})\ \forall\, W \neq \bar{W},\ \text{and}\;
J_{\alpha,\beta}(\bar{W}, \bar{Z}, \bar{U}) < J_{\alpha,\beta}(\bar{W}, Z, \bar{U})\ \forall\, Z \neq \bar{Z} \Big\}.
\]
Let $(\tilde{W}, \tilde{Z}, \tilde{U}) \in M_{f}^{kd} \times \mathbb{R}^{kd} \times M_{f}^{kn}$. Then $J_{\alpha,\beta}(W^{*}, Z^{*}, U^{*}) \le J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U})$ for every $(W^{*}, Z^{*}, U^{*}) \in T_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U})$, and the inequality is strict if $(\tilde{W}, \tilde{Z}, \tilde{U}) \notin S$. The proof follows as in Theorem 11 of [23].
Proof 6.4 Let $(W^{*}, Z^{*}, U^{*}) \in T_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U})$; then $(W^{*}, Z^{*}, U^{*}) \in A_{3} \circ A_{2} \circ A_{1}(\tilde{W}, \tilde{Z}, \tilde{U})$, where $A_{1}$, $A_{2}$ and $A_{3}$ are defined in eqs. 6.2, 3.17 and 3.18. From the definitions of $A_{1}$, $A_{2}$ and $A_{3}$, and Theorems 6.1, 6.2 and 6.3, it follows that $(W^{*}, Z^{*}, U^{*}) \in M_{f}^{kd} \times \mathbb{R}^{kd} \times M_{f}^{kn}$ and
\[
J_{\alpha,\beta}(W^{*}, Z^{*}, U^{*}) \le J_{\alpha,\beta}(W^{*}, Z^{*}, \tilde{U}) \le J_{\alpha,\beta}(W^{*}, \tilde{Z}, \tilde{U}) \le J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U}), \tag{6.11}
\]
which implies
\[
J_{\alpha,\beta}(W^{*}, Z^{*}, U^{*}) \le J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U}).
\]
We now show that $J_{\alpha,\beta}(W^{*}, Z^{*}, U^{*}) < J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U})$ if $(\tilde{W}, \tilde{Z}, \tilde{U}) \notin S$. Using proof by contradiction, we assume that $J_{\alpha,\beta}(W^{*}, Z^{*}, U^{*}) = J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U})$ for some $(\tilde{W}, \tilde{Z}, \tilde{U}) \notin S$. Then from eq. 6.11,
\[
J_{\alpha,\beta}(W^{*}, Z^{*}, U^{*}) = J_{\alpha,\beta}(W^{*}, Z^{*}, \tilde{U}) = J_{\alpha,\beta}(W^{*}, \tilde{Z}, \tilde{U}) = J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U}). \tag{6.12}
\]
From Theorem 6.3, $Z^{*}$ is the unique global minimizer of $J_{\alpha,\beta}(W^{*}, \cdot\,, \tilde{U})$; since eq. 6.12 gives $J_{\alpha,\beta}(W^{*}, Z^{*}, \tilde{U}) = J_{\alpha,\beta}(W^{*}, \tilde{Z}, \tilde{U})$, we have $Z^{*} = \tilde{Z}$, and similarly $W^{*} = \tilde{W}$.
Since $(\tilde{W}, \tilde{Z}, \tilde{U}) \notin S$, from the definition of $S$ at least one of the following three cases holds:
(a) $J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U}) \ge J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, U')$ for some $U' \neq \tilde{U}$;
(b) $J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U}) \ge J_{\alpha,\beta}(\tilde{W}, Z', \tilde{U})$ for some $Z' \neq \tilde{Z}$; or
(c) $J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U}) \ge J_{\alpha,\beta}(W', \tilde{Z}, \tilde{U})$ for some $W' \neq \tilde{W}$.
In case (a), since $Z^{*} = \tilde{Z}$ and $W^{*} = \tilde{W}$, Theorem 6.1 states that $U^{*}$ is the strict minimizer of $J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \cdot)$; case (a) implies that $\tilde{U}$ is not this strict minimizer, so $\tilde{U} \neq U^{*}$ and
\[
J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U}) > J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, U^{*}) = J_{\alpha,\beta}(W^{*}, Z^{*}, U^{*}),
\]
which contradicts our assumption.
In case (b), since $Z' \neq \tilde{Z}$ and $Z^{*} = \tilde{Z}$, from Theorem 6.3 and eq. 6.12 we have
\[
J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U}) \ge J_{\alpha,\beta}(\tilde{W}, Z', \tilde{U}) > J_{\alpha,\beta}(\tilde{W}, Z^{*}, \tilde{U}) = J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U}),
\]
which contradicts our assumption.
In case (c), since $Z^{*} = \tilde{Z}$, $W^{*} = \tilde{W}$ and $W' \neq \tilde{W}$, from Theorem 6.2 and eq. 6.12 we have
\[
J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U}) \ge J_{\alpha,\beta}(W', \tilde{Z}, \tilde{U}) > J_{\alpha,\beta}(W^{*}, \tilde{Z}, \tilde{U}) = J_{\alpha,\beta}(\tilde{W}, \tilde{Z}, \tilde{U}),
\]
which contradicts the assumption. Hence, the theorem is proved. □
Lemma 6.5 Let $M : X \to Y$ be a function and $\Psi : Y \to P(Y)$ be a point-to-set map. If $M$ is continuous at $a^{(0)}$ and $\Psi$ is closed at $M(a^{(0)})$, then the point-to-set map $\Psi \circ M : X \to P(Y)$ is closed at $a^{(0)}$.
Theorem 6.6 Let $\alpha > 1$ and $\beta > 1$ be fixed, and let $X = \{x_{1}, x_{2}, \ldots, x_{n}\}$ contain at least $k\,(<n)$ distinct points. Then the point-to-set map $T_{\alpha,\beta} : M_{f}^{kd} \times \mathbb{R}^{kd} \times M_{f}^{kn} \to P(M_{f}^{kd} \times \mathbb{R}^{kd} \times M_{f}^{kn})$ is closed at every point in $M_{f}^{kd} \times \mathbb{R}^{kd} \times M_{f}^{kn}$.
To prove the above theorem we need to prove the following:
a) The function $A_{1} : M_{f}^{kd} \times \mathbb{R}^{kd} \times M_{f}^{kn} \to \mathbb{R}^{kd} \times M_{f}^{kn}$, as defined in eq. 6.2, is continuous at every point in $M_{f}^{kd} \times \mathbb{R}^{kd} \times M_{f}^{kn}$.
b) The function $A_{2} : \mathbb{R}^{kd} \times M_{f}^{kn} \to M_{f}^{kd} \times \mathbb{R}^{kd}$, as defined in eq. 3.17, is continuous at every point in $\mathbb{R}^{kd} \times M_{f}^{kn}$.
c) The point-to-set map $A_{3} : M_{f}^{kd} \times \mathbb{R}^{kd} \to P(M_{f}^{kd} \times \mathbb{R}^{kd} \times M_{f}^{kn})$, as defined in eq. 3.18, is closed on $M_{f}^{kd} \times \mathbb{R}^{kd}$.
Proof 6.5 The proof follows immediately from Lemmas 19, 20 and 21, respectively, in [23]. □