
Activity:

Explain the fuzzy c-means clustering technique


Try to cluster data points into fuzzy clusters using the FCMC technique
Given a set of data, clustering techniques partition the data into several groups such that the
degree of association is strong within one group and weak between data in different groups.
Classical crisp clustering techniques result in crisp partitions, in which each data point can
belong to only one cluster. This situation is illustrated in Fig. 3-1-1, where six data points are
clustered into two clusters. In this example the data are two-dimensional, i.e. the so-called
feature space has two dimensions.

Fig. 3-1-1. A simple example for crisp clustering

Fuzzy clustering by contrast allows data points to belong to more than one group. The
resulting partition is therefore a fuzzy partition. Each cluster is associated with a membership
function that expresses the degree to which individual data points belong to the cluster.
Among all fuzzy clustering methods, Fuzzy c-Means Clustering (FCMC) remains
predominant in the literature [2] due to its successful application in both academia and
industry.
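The difference between crisp and fuzzy partitions can be made concrete with membership matrices. The following sketch (the data and membership values are invented purely for illustration) contrasts a crisp and a fuzzy partition of six points into two clusters:

```python
import numpy as np

# Six data points assigned to two clusters (rows: clusters, columns: points).
# Crisp partition: each point belongs to exactly one cluster (entries 0 or 1).
U_crisp = np.array([
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
])

# Fuzzy partition: each point has a degree of membership in every cluster;
# the memberships of each point across all clusters still sum to 1.
U_fuzzy = np.array([
    [0.9, 0.8, 0.7, 0.3, 0.1, 0.2],
    [0.1, 0.2, 0.3, 0.7, 0.9, 0.8],
])

# In both cases every column sums to 1.
assert np.allclose(U_crisp.sum(axis=0), 1)
assert np.allclose(U_fuzzy.sum(axis=0), 1)
```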
Activity:
Observe that the number of clusters should be given in advance
Observe the iterativity in the FCMC algorithm
Fuzzy c-Means Clustering performs clustering by iteratively searching for a set of fuzzy
clusters and the associated cluster centres that represent the structure of the data as well as
possible. The algorithm relies on the user to specify the number of clusters present in the set
of data to be clustered. Given a number of clusters c, FCMC partitions the data X =
{x_1, x_2, …, x_n} into c fuzzy clusters by minimising the within-group sum of squared error
objective function:

J_m(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} (U_{ik})^m \|x_k - v_i\|^2        (1)

where J_m(U, V) is the sum of squared error for the set of fuzzy clusters represented by the
membership matrix U and the associated set of cluster centres V, and ||·|| is some inner-product-induced
norm. In the formula, ||x_k − v_i||² represents the distance between the data point x_k and the
cluster centre v_i. The squared error is used as a performance index that measures the
weighted sum of distances between cluster centres and elements in the corresponding fuzzy
clusters. The exponent m governs the influence of the membership grades in the performance
index. The partition becomes fuzzier with increasing m, and it has been shown that the FCMC
algorithm converges for any m ∈ (1, ∞). The necessary conditions for Eq. (1) to reach its
minimum are
U_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{\|x_k - v_i\|}{\|x_k - v_j\|} \right)^{2/(m-1)} \right]^{-1}, \quad \forall i, k        (2)

and
v_i = \frac{\sum_{k=1}^{n} (U_{ik})^m x_k}{\sum_{k=1}^{n} (U_{ik})^m}        (3)

In each iteration of the FCMC algorithm, the matrix U is computed using Eq. (2) and the
associated cluster centres are computed using Eq. (3). This is followed by computing the
squared error in Eq. (1). The algorithm stops when either the error is below a certain tolerance
value or its improvement over the previous iteration is below a certain threshold. The clustering
process is displayed in Fig. 3-1-2 (initial state) and Fig. 3-1-3 (final state) using three clusters.
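The iteration described above can be sketched as follows. This is a minimal NumPy illustration of Eqs. (1)–(3), not a production implementation; the function name, the random initialisation of U, and the stopping tolerance are choices made here, not prescribed by the text:

```python
import numpy as np

def fcm(X, c, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Fuzzy c-means: alternate Eq. (3) (centres) and Eq. (2) (memberships)
    until the objective J_m of Eq. (1) stops improving."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0)                      # each column sums to 1
    J_prev = np.inf
    for _ in range(max_iter):
        # Eq. (3): centres as membership-weighted means of the data.
        W = U ** m                          # (c, n)
        V = (W @ X) / W.sum(axis=1, keepdims=True)
        # Squared Euclidean distances ||x_k - v_i||^2, shape (c, n).
        D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        # Eq. (1): within-group sum of squared error.
        J = (W * D2).sum()
        if J_prev - J < tol:                # improvement below threshold
            break
        J_prev = J
        # Eq. (2): update memberships; exponent -1/(m-1) on the SQUARED
        # distance equals -2/(m-1) on the distance itself.
        D2 = np.fmax(D2, 1e-12)             # guard against a zero distance
        inv = D2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0)
    return U, V

# Two well-separated blobs: FCMC should place one centre near each.
X = np.vstack([np.random.default_rng(1).normal(0, 0.1, (20, 2)),
               np.random.default_rng(2).normal(5, 0.1, (20, 2))])
U, V = fcm(X, c=2)
```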

Fig. 3-1-2. Clustering process (initial state)

Fig. 3-1-3. Clustering process (final state)

Activity:
See the different types of clustering algorithms, note their advantages
Over the years, many extensions and variations of FCMC have been proposed [5]. In general,
the use of a different distance function in Eq. (1), or a slight modification of the objective
function, leads to clustering algorithms capable of detecting different types of clusters. Using
the Euclidean distance, FCMC is capable of detecting spherical clusters of approximately
similar size.
Gustafson and Kessel in [6] proposed the use of a transformed Mahalanobis distance,
allowing the resulting GK clustering algorithm to detect cylinder-shaped normal clusters of
approximately the same size. The technique was analysed and compared to other clustering
techniques in [7]. The distance used is given in Eq. (4), where C_i is the covariance matrix of
the i-th cluster, C_i^{-1} denotes its inverse, d is the number of dimensions and ρ_i = 1 is a
constant.
d_{ik}^2 = \rho_i^{1/d} \, |C_i|^{1/d} \, (x_k - v_i)^T C_i^{-1} (x_k - v_i)        (4)
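The GK distance of Eq. (4) can be sketched as follows, assuming ρ_i = 1 and a given cluster covariance matrix C_i; the function name and array layout are illustrative choices, not part of the original algorithm description:

```python
import numpy as np

def gk_distance_sq(X, v_i, C_i, rho_i=1.0):
    """Squared GK distance of Eq. (4): a Mahalanobis distance scaled by
    (rho_i * det(C_i))^(1/d), which normalises each cluster's volume."""
    d = X.shape[1]
    scale = (rho_i * np.linalg.det(C_i)) ** (1.0 / d)
    C_inv = np.linalg.inv(C_i)
    diff = X - v_i                                  # rows are x_k - v_i
    # (x_k - v_i)^T C_i^{-1} (x_k - v_i) for every row at once.
    return scale * np.einsum('nd,de,ne->n', diff, C_inv, diff)

# With C_i = I the GK distance reduces to the squared Euclidean distance.
X = np.array([[1.0, 0.0], [0.0, 2.0]])
d2 = gk_distance_sq(X, np.zeros(2), np.eye(2))      # -> [1.0, 4.0]
```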

Fuzzy c-Elliptotypes (FCE) was designed to find linear (flat) clusters. The algorithm uses the
distance in Eq. (5). Here, d_{Eik}^2 denotes the squared Euclidean distance between the cluster
centre i and data point k; {e_{i1}, e_{i2}, …, e_{ir}} are the eigenvectors (arranged in descending
order of the corresponding eigenvalues) of the covariance matrix of cluster i; and r, with
0 ≤ r ≤ d − 1, represents the number of dimensions in which the flatness extends in the
d-dimensional space. By the choice of α, the cluster shape can be changed from point-shaped
(α = 0) via elliptic shapes (α ∈ (0, 1)) to straight lines (α = 1).
d_{ik}^2 = \alpha \, d_{Lik}^2 + (1 - \alpha) \, d_{Eik}^2, \qquad d_{Lik}^2 = \|x_k - v_i\|^2 - \sum_{l=1}^{r} \left( (x_k - v_i)^T e_{il} \right)^2        (5)
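Eq. (5) can be sketched as follows, assuming the top-r eigenvectors of the cluster covariance matrix are already available as unit-length rows; the function name and the test data are illustrative:

```python
import numpy as np

def fce_distance_sq(X, v_i, E_i, alpha):
    """Squared FCE distance of Eq. (5).

    E_i: (r, d) array whose rows are the top-r unit eigenvectors
    e_i1..e_ir of the cluster's covariance matrix.
    alpha blends the linear distance d_L^2 with the Euclidean d_E^2.
    """
    diff = X - v_i                              # (n, d)
    d_E2 = (diff ** 2).sum(axis=1)              # Euclidean part
    proj = diff @ E_i.T                         # projections onto eigenvectors
    d_L2 = d_E2 - (proj ** 2).sum(axis=1)       # distance to the r-flat
    return alpha * d_L2 + (1.0 - alpha) * d_E2

# A line cluster along the x-axis: with alpha = 1, a point on that axis
# has zero distance no matter how far it lies from the centre.
X = np.array([[3.0, 0.0], [0.0, 2.0]])
e1 = np.array([[1.0, 0.0]])                     # r = 1 eigenvector
d2 = fce_distance_sq(X, np.zeros(2), e1, alpha=1.0)   # -> [0.0, 4.0]
```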

The Fuzzy c-Shells (FCS) algorithm is designed to detect circles. The distance, which
involves a circle radius r_i, is defined in Eq. (6).
d_{ik}^2 = \left( \|x_k - v_i\| - r_i \right)^2        (6)
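Eq. (6) is straightforward to evaluate; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def fcs_distance_sq(X, v_i, r_i):
    """Squared FCS distance of Eq. (6): how far each point lies from the
    circular shell with centre v_i and radius r_i."""
    dist_to_centre = np.linalg.norm(X - v_i, axis=1)
    return (dist_to_centre - r_i) ** 2

# Points on the unit circle have zero shell distance; a point at radius 2
# is distance 1 from the shell (squared distance 1).
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
d2 = fcs_distance_sq(X, np.zeros(2), 1.0)       # -> [0.0, 0.0, 1.0]
```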

Rectangular clusters can be detected by the Fuzzy c-Rectangular Shells (FCRS) algorithm
using Eq. (7). Here, n_s and r_s are as shown in Fig. 3-1-4.

d_{ik}^2 = \left( \min\left\{ (x_k - v_i)^T n_s + r_{s \bmod 2} \;\middle|\; s \in \{0, 1, 2, 3\} \right\} \right)^2        (7)

Fig. 3-1-4. FCRS parameters

Table 1 summarises some well-known clustering algorithms that have not been discussed
and the corresponding types of clusters detectable by each algorithm [8].

The review of the extensions and variations of FCMC suggests that no single fuzzy clustering
algorithm is universally applicable. The effectiveness of an algorithm depends heavily on the
data at hand. When the clusters' attributes (e.g. size and shape) are known, the most suitable
technique can be chosen accordingly. If such prior knowledge is not available (as is often the
case), assumptions have to be made.
Clustering Algorithm                       | Types of clusters detectable by the algorithm
-------------------------------------------|----------------------------------------------
Gath-Geva clustering (GG)                  | Ellipsoidal clusters with varying size
Adaptive Fuzzy c-Varieties (AFC)           | Line segments in 2D data
Fuzzy c-Spherical Shells algorithm (FCSS)  | Clusters of circle shape
Fuzzy c-Rings algorithm (FCR)              | Clusters of circle shape
Fuzzy c-Quadratic Shells algorithm (FCQS)  | Ellipsoids

Table 1. Clustering algorithms capable of detecting different types of clusters
