IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 6, NO. 2, MAY 1998

Fuzzy Clustering for Symbolic Data


Yasser El-Sonbaty and M. A. Ismail

Abstract—Most of the techniques used in the literature for clustering symbolic data are based on the hierarchical methodology, which utilizes the concept of agglomerative or divisive methods as the core of the algorithm. The main contribution of this paper is to show how to apply the concept of fuzziness to a data set of symbolic objects and how to use this concept in formulating the clustering problem of symbolic objects as a partitioning problem. Finally, a fuzzy symbolic c-means algorithm is introduced as an application for applying and testing the proposed approach on real and synthetic data sets. The results of the application of the new algorithm show that the new technique is quite efficient and, in many respects, superior to traditional methods of a hierarchical nature.

Index Terms—Fuzzy clustering, hierarchical techniques, partitioning techniques, soft clustering, symbolic objects.

I. INTRODUCTION

THE objective of cluster analysis is to group a set of objects into clusters such that objects within the same cluster have a high degree of similarity, while objects belonging to different clusters have a high degree of dissimilarity.

The clustering of a data set into subsets can be divided into hierarchical and nonhierarchical, or partitioning, methods. The general rationale behind partitioning methods is to choose some initial partitioning of the data set and then alter cluster memberships so as to obtain better partitions according to a predefined objective function.

Hierarchical clustering procedures can be divided into agglomerative methods, which progressively merge the objects according to some distance measure in such a way that whenever two objects belong to the same cluster at some level they remain together at all higher levels, and divisive methods, which progressively subdivide the data set [1].

Objects to be clustered usually come from an experimental study of some phenomenon and are described by a specific set of features selected by the data analyst. The feature values may be measured on different scales, and these can be continuous numeric, symbolic, or structured.

Continuous numeric data are well known as a classical data type, and many algorithms for clustering this type of data using partitioning or hierarchical techniques can be found in the literature [2]. Meanwhile, there is some research dealing with symbolic objects [3]–[9]; this is due to the nature of such objects, which are simple in construction but hard to process. Besides, the values taken by the features of symbolic objects may include one or more elementary objects, and the data set may have a variable number of features [4]. Structured objects have higher complexity than continuous and symbolic objects because of their structure, which is much more complex, and their representation, which needs richer data structures to permit the description of relations between elementary object components and to facilitate hierarchical object models that describe how an object is built up from the primitives. A survey of different representations and proximity measures of structured objects can be found in [10].

Diday [3] and Gowda and Diday [4], [5] presented dissimilarity and similarity measures based on the position, span, and content of symbolic objects. These distance measures are used in the area of conventional hierarchical clustering of symbolic data. More work can be found in the field of conceptual hierarchical clustering of symbolic data. Fisher [6] introduced COBWEB, a top-down incremental conceptual clustering algorithm using a category utility metric. Cheng and Fu [7] developed HUATUO, which produces intermediate conceptual structures for rule-based systems. Michalski and Stepp [8] proposed CLUSTER/2, a conjunctive conceptual clustering method in which descriptive concepts are conjunctive statements involving relations on selected object features, optimized according to a certain criterion of clustering quality. Ralambondrainy [9] presented a conceptual version of the K-means algorithm for numeric and discrete data, based on coding symbolic data numerically and using a mix of Euclidean and Chi-square distances to calculate the distance between the hybrid types of data, which are represented using predicates as groups of attribute-value tuples joined by logical operators. A survey of different techniques of conceptual hierarchical clustering for symbolic data can be found in [10].

From the above, it is clear that most of the algorithms available in the literature for clustering symbolic objects are based on either conventional or conceptual hierarchical techniques, using agglomerative or divisive methods as the core of the algorithm [3]–[8], [10]. Although a partitioning technique was introduced in [9], it has many drawbacks: it codes symbolic data numerically (which distorts the original data); it cannot handle interval-type data; the suggested distance has two weights whose values are very difficult to choose; and the structure of the predicates selected to represent the objects is hard to process.

The drawbacks of using hierarchical techniques are well known in the field of data clustering. Memory size, updating the membership matrix, the per-iteration complexity of calculating the distance function, the overall complexity of the algorithm, and producing a nondetermined classification of the patterns are a few of the difficulties faced when using any hierarchically based technique [2].

Manuscript received March 18, 1996; revised February 24, 1997.
Y. El-Sonbaty is with the Department of Electrical and Computer Engineering, Arab Academy for Science and Technology, Alexandria, 1029 Egypt.
M. A. Ismail is with the Department of Computer Science, Faculty of Engineering, Alexandria, 21544 Egypt.
Publisher Item Identifier S 1063-6706(98)00806-6.
1063–6706/98$10.00 © 1998 IEEE
The main contribution of this paper is to show how to apply the concept of fuzziness to a data set of symbolic objects and how to use this concept in formulating the clustering problem of symbolic objects as a partitioning problem. Finally, a modified fuzzy c-means algorithm is introduced as an application for applying and testing the proposed approach on real and synthetic data sets.

The proposed algorithm eliminates most of the drawbacks found in hierarchical techniques by formulating the given symbolic clustering problem as an optimization problem with a specific objective function subject to a group of constraints. The concept of fuzziness is applied here to give more meaning and easier interpretation to the results obtained from the proposed algorithm.

In the following, the details of the proposed algorithm are given together with a description of symbolic objects and their distance measure. Section II discusses the definition of, and the distance measure for, symbolic objects. Section III describes the proposed algorithm. Applications and analysis of experimental results are presented in Sections IV and V.

II. SYMBOLIC OBJECTS

Various definitions and descriptions of symbolic objects and distance measures are found in the literature. Here, we follow those given by Diday [3] and Gowda and Diday [4], [5]; in the following two sections, these definitions and descriptions are demonstrated.

A. Feature Types

The symbolic object $A$ can be written as the Cartesian product of the specific values of its features $A_k$ as

$$A = A_1 \times A_2 \times \cdots \times A_N.$$

The feature values may be measured on different scales, resulting in the following types: 1) quantitative features, which can be classified into continuous, discrete, and interval values, and 2) qualitative features, which can be classified into nominal (unordered), ordinal (ordered), and combinational.

B. Dissimilarity

Many distance measures have been introduced in the literature for symbolic objects [4], [5], [12]. Here we follow the dissimilarity measure introduced by Gowda and Diday [4], with a brief explanation of this distance.

The dissimilarity between two symbolic objects $A$ and $B$ is defined as

$$D(A, B) = \sum_{k=1}^{N} D(A_k, B_k).$$

If some of the features exhibit a more profound effect on the total dissimilarity, the above formula can be rewritten in the following format:

$$D(A, B) = \sum_{k=1}^{N} W_k\, D(A_k, B_k)$$

where $W_k$ represents the weight corresponding to the $k$th feature.

For the $k$th feature, $D(A_k, B_k)$ is defined using the following three components [4]:

1) $D_p(A_k, B_k)$, due to position;
2) $D_s(A_k, B_k)$, due to span;
3) $D_c(A_k, B_k)$, due to content.

The dissimilarity component due to position arises only when the feature type is quantitative. It indicates the relative positions of two feature values on the real line. The dissimilarity component due to span indicates the relative sizes of the feature values without referring to the common parts between them. The component due to content is a measure of the noncommon parts between two feature values. The components $D_p$, $D_s$, and $D_c$ are defined such that their values are normalized between zero and one [4].

1) Quantitative Type of $A_k$ and $B_k$: The dissimilarity between two feature values of quantitative type is defined as the dissimilarity of these values due to the position, span, and content components previously mentioned.

The dissimilarity component due to position is

$$D_p(A_k, B_k) = \frac{|\text{lower limit of } A_k - \text{lower limit of } B_k|}{\text{length of maximum interval ($k$th feature)}}$$

where the length of the maximum interval ($k$th feature) is the difference between the highest and lowest values of the $k$th feature over all the objects.

The dissimilarity component due to span is

$$D_s(A_k, B_k) = \frac{|\text{length of } A_k - \text{length of } B_k|}{\text{span length of } A_k \text{ and } B_k}$$

where the span length of $A_k$ and $B_k$ is the length of the minimum interval containing both $A_k$ and $B_k$.

The dissimilarity component due to content is

$$D_c(A_k, B_k) = \frac{\text{length of } A_k + \text{length of } B_k - 2 \times \text{length of intersection of } A_k \text{ and } B_k}{\text{span length of } A_k \text{ and } B_k}.$$

The net dissimilarity between $A_k$ and $B_k$ is

$$D(A_k, B_k) = D_p(A_k, B_k) + D_s(A_k, B_k) + D_c(A_k, B_k).$$

Quantitative ratio and absolute types of features are special cases of the interval type, having the property that each value is a single point, so that length of $A_k$ = length of $B_k$.

2) Qualitative Type of $A_k$ and $B_k$: For qualitative types of features, the dissimilarity component due to position is absent. The two components which contribute to the dissimilarity are as follows.

The dissimilarity component due to span is

$$D_s(A_k, B_k) = \frac{|\text{length of } A_k - \text{length of } B_k|}{\text{span length of } A_k \text{ and } B_k}$$

where the length of a qualitative feature value is the number of its elements, and the span length of two qualitative feature values is defined as the number of elements in their union.
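The interval-type components defined above can be read procedurally. The following sketch is illustrative only (the function and variable names are ours, not the paper's), assuming each interval value is given as a `(low, high)` pair:

```python
def interval_dissimilarity(a, b, max_feature_length):
    """Gowda-Diday-style dissimilarity between two interval values
    a = (a_low, a_high) and b = (b_low, b_high) of one feature.

    max_feature_length: difference between the highest and lowest
    values of this feature over all objects."""
    len_a, len_b = a[1] - a[0], b[1] - b[0]
    # Span length: length of the minimum interval containing both values.
    span = max(a[1], b[1]) - min(a[0], b[0])
    # Length of the intersection of the two intervals (0 if disjoint).
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

    d_position = abs(a[0] - b[0]) / max_feature_length
    if span == 0:  # both values are the same single point
        return d_position
    d_span = abs(len_a - len_b) / span
    d_content = (len_a + len_b - 2.0 * inter) / span
    return d_position + d_span + d_content
```

Identical intervals give zero dissimilarity, while disjoint intervals pick up position, span, and content contributions.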
TABLE I
MICROCOMPUTER DATA

The dissimilarity component due to content is

$$D_c(A_k, B_k) = \frac{\text{length of } A_k + \text{length of } B_k - 2 \times \text{length of intersection of } A_k \text{ and } B_k}{\text{span length of } A_k \text{ and } B_k}.$$

The net dissimilarity between $A_k$ and $B_k$ is

$$D(A_k, B_k) = D_s(A_k, B_k) + D_c(A_k, B_k).$$

Illustrative Example: Assume we would like to calculate the distance between object number zero (Apple II) and object number nine (Ohio Sc. II Series) shown in Table I:

D(Apple II, Ohio Sc. II Series) = D(COLOR TV, B&W TV) + D(48K, 48K) + D(10K, 10K) + D(6502, 6502C) + D(52, 53–56)

with D(48K, 48K) = 0 and D(10K, 10K) = 0, while D(COLOR TV, B&W TV), D(6502, 6502C), and D(52, 53–56) are evaluated from the qualitative and quantitative components above; the denominator of the position component of D(52, 53–56) is the maximum length of the interval "Keys."

This value is different from the equivalent distance published in [4], "0.42," because the distance in [4] is normalized.

III. THE PROPOSED ALGORITHM

Most of the techniques found in the literature that deal with symbolic objects are hierarchical rather than partitioning techniques, and their drawbacks are well known, as mentioned in Section I. In this section, a new algorithm for fuzzy clustering of symbolic objects is presented. The main purpose of this algorithm is to show how to apply the concept of fuzziness [13]–[16] to a data set of symbolic objects. A modified version of the fuzzy c-means algorithm [17]–[19] is introduced here to test the behavior of the proposed algorithm on different data sets. Unlike hard clustering, the patterns in fuzzy clustering need not commit to a cluster center indefinitely, and this makes it possible to escape from local extrema of the objective function.

Hard clustering for symbolic objects is intuitive and can be easily implemented using the concept of the Cartesian join, which will be mentioned later in this section. On the other hand, fuzzy clustering of symbolic data is quite involved and is the main concern of this paper.

A. Fuzzy Clustering for Symbolic Objects

Fuzzy c-means clustering for numerical data is the algorithm that attempts to find a solution to the mathematical program [16]

Minimize
$$J(W, Z) = \sum_{i=1}^{n} \sum_{j=1}^{c} (w_{ij})^m\, D(x_i, z_j) \quad (1)$$

subject to

$$\sum_{j=1}^{c} w_{ij} = 1, \quad 1 \le i \le n; \qquad w_{ij} \ge 0 \quad (2)$$

where
n: number of patterns;
c: number of clusters;
m: a scalar, m > 1;
z_j: center of cluster j;
w_ij: degree of membership of pattern i in cluster j;
Z: cluster centers matrix;
d: dimension of the feature space;
W: membership matrix;
x_i: pattern i.

In applying the above algorithm to symbolic objects, two main problems are encountered. These problems are as follows.

Problem 1: The formation of cluster centers. This process differs from that for numeric objects, where the centers are calculated using the formula

$$z_j = \frac{\sum_{i=1}^{n} (w_{ij})^m\, x_i}{\sum_{i=1}^{n} (w_{ij})^m}. \quad (3)$$

For symbolic objects, arithmetic operations are completely absent because of the nature of the objects with which we are dealing. The only valid operation is the Cartesian join [4], which states that if $A = A_1 \times A_2 \times \cdots \times A_N$ and $B = B_1 \times B_2 \times \cdots \times B_N$ are two objects, then the composite object resulting from merging $A$ and $B$ is

$$A \oplus B = (A_1 \oplus B_1) \times (A_2 \oplus B_2) \times \cdots \times (A_N \oplus B_N)$$

where $\oplus$ is the Cartesian join operator.
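The Cartesian join described above can be sketched in a few lines; the representation choices (tuples for interval values, sets for nominal values) and the function names are ours, for illustration only:

```python
def cartesian_join_feature(a_k, b_k):
    """Cartesian join of two feature values, per [4]:
    - quantitative/ordinal values (interval tuples): the minimum
      interval that includes both values;
    - nominal values (sets): the union of the two value sets."""
    if isinstance(a_k, tuple):
        return (min(a_k[0], b_k[0]), max(a_k[1], b_k[1]))
    return set(a_k) | set(b_k)

def cartesian_join(a, b):
    """Composite object resulting from merging objects a and b,
    joined feature by feature."""
    return [cartesian_join_feature(a_k, b_k) for a_k, b_k in zip(a, b)]
```

For example, joining a RAM interval (48, 48) with (32, 64) yields (32, 64), and joining display sets {'COLOR TV'} and {'B&W TV'} yields their union.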
Fig. 1. Structure of a cluster center for the data set in Table I.

TABLE II
MEMBERSHIP VALUES OF PATTERNS IN TABLE I TO THE CLUSTER CENTER IN FIG. 1

When the $k$th feature is quantitative or ordinal qualitative, $A_k \oplus B_k$ is defined as the minimum interval that includes both $A_k$ and $B_k$; that is

$$A_k \oplus B_k = [\min(A_k^-, B_k^-),\ \max(A_k^+, B_k^+)]$$

where $-$ and $+$ stand for the lower and upper limits of the interval. When the $k$th feature is qualitative nominal, $A_k \oplus B_k$ is the union of $A_k$ and $B_k$; that is

$$A_k \oplus B_k = A_k \cup B_k.$$

In hard clustering of symbolic objects, the Cartesian join operator can be used for constructing the cluster centers. In this case, the cluster centers can be represented as the merging of all symbolic objects belonging to a specific cluster. On the other hand, the Cartesian join operator cannot be used in solving the problem of fuzzy clustering of symbolic objects because the domain of this problem is completely different from that of hard clustering.

To overcome this problem, the following solution is suggested. A cluster center can be formed as a group of features; each feature is a group of ordered pairs, each of the form $(e_{jkl}, \mu_{jkl})$, where $e_{jkl}$ is the $l$th event of feature $k$ and $\mu_{jkl}$ is the degree of association of this event to the feature in cluster center $j$, such that

$$\sum_{l=1}^{L} \mu_{jkl} = 1 \quad (4)$$

where $z_{jk}$, the $k$th feature of the $j$th cluster center, is regarded as a fuzzy set of all possible events in the $k$th feature of the patterns

$$z_{jk} = \{(e_{jkl}, \mu_{jkl}),\ l = 1, \ldots, L\} \quad (5)$$

and the associated memberships are degrees of association with each event

$$0 \le \mu_{jkl} \le 1 \quad (6)$$

$$1 \le j \le c, \quad 1 \le k \le N \quad (7)$$

$$1 \le l \le L \quad (8)$$

where
N: no. of features;
L: no. of events/feature;
n: no. of objects;
c: no. of clusters.

$\mu_{jkl} = 0$ if the event associated with it is not a part of feature $k$, while $\mu_{jkl} = 1$ if there are no other events sharing with this event in forming the feature.

The starting value of $\mu_{jkl}$ is one for all events forming the features because the initial conditions are chosen to be distinct. The value of $\mu_{jkl}$ is updated using the formula

$$\mu_{jkl} = \frac{\sum_{i=1}^{n} (w_{ij})^m\, \delta_{ikl}}{\sum_{i=1}^{n} (w_{ij})^m} \quad (9)$$

where $1 \le j \le c$, $1 \le k \le N$, $1 \le l \le L$, and $\delta_{ikl} = 1$ if the $k$th feature of the $i$th pattern contains the $l$th event; otherwise $\delta_{ikl} = 0$. Here $w_{ij}$ is the membership of the $i$th pattern in the $j$th cluster.

Illustrative Example: Fig. 1 shows an example of the structure of a cluster center for the fuzzy clustering of the data set shown in Table I, assuming the membership values to this cluster center given in Table II. The data set describes a group of microcomputers [8]. The data consist of 12 objects. Each object has five features. Two of the features are qualitative (display and MP) and the rest are quantitative (RAM, ROM, and keys).

For the first feature, "DISPLAY," the structure of this feature in the cluster center can be calculated as follows:
Fig. 2. Average number of iterations for different values of m.

Fig. 3. Objective function for different values of m.

{(COLOR, (0.20 + 0.40 + 0.15)/3.4), (B&W, (0.50 + 0.60 + 0.20 + 0.15)/3.4), (BUILT-IN, (0.25 + 0.30 + 0.05 + 0.50)/3.4), (TERMINAL, 0.10/3.4)}
= {(COLOR, 0.22), (B&W, 0.43), (BUILT-IN, 0.32), (TERMINAL, 0.03)}

where 3.4 equals the total sum of the $(w_{ij})^m$'s shown in Table II.

The second problem encountered when applying the fuzzy c-means to symbolic objects is as follows.

Problem 2: The calculation of the dissimilarity between patterns and cluster centers. This problem arises due to the changes made in forming the cluster centers, as discussed in Problem 1. It is solved using the concept of weighted dissimilarity [20], given by

$$D(A, B) = \sum_{k=1}^{N} W_k\, D(A_k, B_k)$$

where $A_k$ and $B_k$ are the features constructing objects $A$ and $B$, respectively, for $k = 1, \ldots, N$. The weights $W_k$ associated with the features are calculated heuristically or using some optimization routines.

However, due to the changes proposed in forming cluster centers for symbolic objects, some modifications are made to render the above formula more suitable for the problem of symbolic clustering.

The dissimilarity between pattern $x_i$ and cluster center $z_j$ is given by

$$D(x_i, z_j) = \sum_{k=1}^{N} \sum_{l=1}^{L} \mu_{jkl}\, D(x_{ik}, e_{jkl}) \quad (10)$$

where $D(x_{ik}, e_{jkl})$ is the dissimilarity between feature $k$ in pattern "$i$" and event $l$ of feature $k$ in cluster center "$j$," calculated as mentioned earlier in Section II-B.

Illustrative Example: Assume we would like to calculate the dissimilarity distance between the cluster center shown in Fig. 1 and the following pattern:

DISPLAY: COLOR TV; RAM: 48K; ROM: 10K; MP: 6502; KEYS: 52.

The dissimilarity expands, feature by feature, into event terms: for the display, D(Color TV, Color TV), D(Color TV, B&W TV), D(Color TV, Built-in), and D(Color TV, Terminal); for the RAM, D(48K, 48K), D(48K, 32K), and D(48K, 64K); for the ROM, D(10K, 10K), D(10K, 11–16K), D(10K, 4K), D(10K, 1K), D(10K, 8K), D(10K, 80K), D(10K, 12K), and D(10K, 14K); together with the corresponding terms for the MP and KEYS events. Each term is weighted by the event's degree of association, and D is the distance between the pattern's feature value and the event based on their data types.

The dissimilarity (10) is mainly dependent on the selection of the distance measure used in calculating the distance between symbolic data. Review [4], [5], [12] for different distance measures of symbolic data.

C. Proof of Correctness

Assume we have objects $x_1, \ldots, x_n$ belonging to cluster center $z_j$ with degrees of membership $w_{1j}, \ldots, w_{nj}$. For numerical objects, feature $k$ of the cluster center is calculated as follows:
Fig. 4. Results for the proposed algorithm.

Fig. 5. Membership matrix for microcomputer data at m = 1.1.

Fig. 6. Membership matrix for microcomputer data at m = 2.0.

Fig. 7. Values of objective function at different initial states for m = 2.0.

$$z_{jk} = \frac{\sum_{i=1}^{n} (w_{ij})^m\, x_{ik}}{\sum_{i=1}^{n} (w_{ij})^m} \quad [\text{review (3)}]$$

where

$$p_{ij} = \frac{(w_{ij})^m}{\sum_{i'=1}^{n} (w_{i'j})^m}$$

implies

$$z_{jk} = \sum_{i=1}^{n} p_{ij}\, x_{ik}.$$

For symbolic objects, the arithmetic operation "+" is not valid in the equation for calculating $z_{jk}$. This equation can therefore be rewritten in the form of the event-membership centers of (4)–(9). To calculate the distance between feature $k$ in object "$i$" and feature $k$ in cluster center "$j$," assuming the $e_{jkl}$'s are the events constructing feature $k$ in cluster center $z_j$, and from (5) and (6),

$$D(x_{ik}, z_{jk}) = \sum_{l=1}^{L} \mu_{jkl}\, D(x_{ik}, e_{jkl})$$

which is equivalent to (10).

D. Fuzzy Symbolic C-Means Algorithm (FSCM)

In this section, a new partitioning algorithm for clustering symbolic data is introduced.
This algorithm is a modified version of fuzzy c-means for numerical data. The main objective of the new algorithm (FSCM) is to show how to apply the concept of fuzziness to symbolic data sets. In each iteration, the membership matrix and cluster centers are updated according to (4)–(9). The initial cluster centers are chosen arbitrarily, to allow different results to be obtained rather than sticking to a particular one. In case of divergence, the number of iterations exceeds the maximum allowed number of iterations and the algorithm is terminated. The fuzzy symbolic c-means algorithm is as follows.

[FSCM-1] Choose the initial cluster centers arbitrarily.
[FSCM-2] Select the value of the exponent m.
[FSCM-3] Repeat
  [FSCM-3-1] Calculate the membership matrix W from the formula
    $$w_{ij} = \left[ \sum_{j'=1}^{c} \left( \frac{D(x_i, z_j)}{D(x_i, z_{j'})} \right)^{1/(m-1)} \right]^{-1}$$
    where $D(x_i, z_j)$ is calculated as in (10). [review (10)]
  [FSCM-3-2] Calculate new cluster centers from
    $$z_{jk} = \{(e_{jkl}, \mu_{jkl}),\ l = 1, \ldots, L\}, \quad k = 1, \ldots, N \text{ (no. of features)}$$
    where $\mu_{jkl}$ is calculated as in (4)–(9). [review (4)–(9)]
Until (convergence or the maximum number of iterations is exceeded)
End [FSCM]

IV. EXPERIMENTAL RESULTS

In this section, the performance of the proposed algorithm is tested and evaluated using some test data reported in the literature. The data sets used in these experiments are synthetic or real data whose classification is known from other clustering techniques [4], [5], [8], [12]. A comparison between the results obtained from the proposed algorithm and other techniques is given. Every experiment is repeated for different values of m, 1.1 ≤ m ≤ 2.0. The experiments used are explained in Sections IV-A and B.

A. The Problem of Determining a Classification of Microcomputers

The data set of microcomputers [8] shown in Table I is used in this experiment. As previously mentioned in Section III-A, the data consist of 12 objects; each object has five features, two of which are qualitative while the rest are quantitative.

Fig. 2 shows the variations of the average number of iterations needed to reach a solution for different values of m. Fig. 3 shows the distribution of the average objective function over 1.1 ≤ m ≤ 2.0. The final membership matrix of the proposed algorithm tends to follow the distribution shown in Fig. 4, and these results are completely the same as the results reported in [4]. Figs. 5 and 6 show samples of the final membership matrices obtained for m = 1.1 and m = 2.0, respectively. The rest of the memberships for different values of m are omitted due to the scope of the paper. Fig. 7 shows the distribution of the objective function over different initial states.

B. The Problem of Determining a Classification of the Fat–Oil Data

The data set [4], [5], [21], [22] used for this problem is shown in Table III. It consists of data on fats and oils having four quantitative features of interval type and one qualitative feature. Fig. 8 shows the variations of the average number of iterations needed to reach a solution for different values of m. Fig. 9 shows the distribution of the average objective function over 1.1 ≤ m ≤ 2.0. The final membership matrix of the proposed algorithm tends to follow the distribution shown in Fig. 10, and these results are better than the results reported in [4].
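The FSCM loop can be sketched end to end. This is an illustrative reading rather than the authors' code: patterns are represented as lists of event sets, the event-level distance is a simple 0/1 placeholder for the Section II-B measure, the centers follow the event-membership form of (9), and the membership update is the standard fuzzy c-means rule driven by (10). All names are ours.

```python
import random

def event_distance(feature_value, event):
    """Placeholder for the Section II-B dissimilarity between a
    pattern's feature value (a set of events) and a single event:
    0 if the event occurs in the value, else 1."""
    return 0.0 if event in feature_value else 1.0

def dissimilarity(x, center):
    # Eq. (10): weighted sum of event-level distances over all features.
    return sum(mu * event_distance(x[k], e)
               for k, feat in enumerate(center)
               for e, mu in feat.items())

def update_centers(X, W, m, c):
    # Eq. (9): mu_jkl = sum_i (w_ij)^m * delta_ikl / sum_i (w_ij)^m.
    n, nfeat = len(X), len(X[0])
    centers = []
    for j in range(c):
        denom = sum(W[i][j] ** m for i in range(n))
        feats = []
        for k in range(nfeat):
            events = set().union(*(X[i][k] for i in range(n)))
            feats.append({e: sum(W[i][j] ** m for i in range(n)
                                 if e in X[i][k]) / denom
                          for e in events})
        centers.append(feats)
    return centers

def update_memberships(X, centers, m):
    # Standard fuzzy c-means membership rule with the symbolic distance.
    W = []
    for x in X:
        d = [max(dissimilarity(x, z), 1e-9) for z in centers]
        W.append([1.0 / sum((d[j] / d[jj]) ** (1.0 / (m - 1))
                            for jj in range(len(centers)))
                  for j in range(len(centers))])
    return W

def fscm(X, c, m=2.0, iters=20, seed=0):
    """Fuzzy symbolic c-means sketch: arbitrary initial memberships,
    then alternating center and membership updates."""
    random.seed(seed)
    W = []
    for _ in range(len(X)):
        row = [random.random() for _ in range(c)]
        s = sum(row)
        W.append([v / s for v in row])
    for _ in range(iters):
        centers = update_centers(X, W, m, c)
        W = update_memberships(X, centers, m)
    return W, centers
```

By construction, each membership row sums to one, each center feature's event memberships sum to one, and identical patterns receive identical membership rows.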
TABLE III
FAT–OIL DATA

Fig. 8. Average number of iterations for different values of m.

Fig. 9. Objective function for different values of m.

Fig. 10. Results for the proposed algorithm.

Fig. 11. Membership matrix for fat–oil data at m = 1.1.

Fig. 12. Membership matrix for fat–oil data at m = 2.0.

The classification reported in [4] was different; the optimum solution for this problem, as mentioned in [5], [21], and [22], is the one obtained here. The distribution obtained from the proposed algorithm always has a lower value of the objective function than those in [4]. These results are explained further in Section V. Figs. 11 and 12 show samples of the final membership matrices obtained for m = 1.1 and m = 2.0, respectively. The rest of the memberships for different values of m are omitted due to the scope of the paper. Fig. 13 shows the effect of initial states on the value of the objective function.

V. ANALYSIS OF PROPOSED ALGORITHM AND EXPERIMENTAL RESULTS

A new approach for the clustering of symbolic data based on the principle of fuzziness has been introduced. Solutions to the problems of the formation of cluster centers and of the calculation of the distance measure between objects and cluster centers have been proposed. The fuzzy symbolic c-means algorithm (FSCM) was then introduced as a tool for applying and testing the behavior of the proposed approach on different data sets.

From the experimental results, the following points can be concluded.
Fig. 13. Values of objective function at different initial states for m = 2.0.

Fig. 14. Results of the proposed algorithm on (a) microcomputer data and (b) fat–oil data using the similarity measure reported in [5].

Fig. 15. Fuzzy and soft clustering for microcomputer data.

Fig. 16. Fuzzy and soft clustering for fat–oil data.

1) The proposed algorithm succeeds in applying the concept of fuzziness to symbolic objects.

2) The behavior of the proposed fuzzy symbolic c-means for symbolic objects is consistent with that of conventional fuzzy c-means for numerical vectors.

3) The final results of the proposed algorithm are consistent with the results published in the literature [4] for solving the test problems.

4) From the experimental results, it was found that the most appropriate value of m lies within the range 1.1 ≤ m ≤ 2.0 used in the experiments.

5) Although different results for the test problems can be found in the literature [4], [5], [8], [12], [21], [22], the results obtained from the proposed algorithm are completely logical and consistent with those in [4], since the same distance measure is used. The variations of the results in the literature are due to the use of different distance measures and different methodologies.

6) When applying the proposed algorithm to the microcomputer data and the fat–oil data shown in Tables I and III, respectively, using the similarity measure introduced in [5], the results shown in Fig. 14(a) and (b), respectively, were obtained. The results represent the crisp distribution of the final membership matrix for each experiment. The similarity measure of [5] was used here instead of that in [4] to facilitate comparing the results obtained from the proposed algorithm with those reported in [5], [22], which used the same similarity measure. These results are the same as those reported in [5]. When applying the single-linkage method given in [20], we obtained the same results for the fat–oil data and a different merging for the microcomputer data. In [22], the same classification for the fat–oil data was obtained as a result of applying the NERF algorithm [22] to the similarity distance matrix published in [5].
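The crisp distributions mentioned in point 6 can be read off a final membership matrix by assigning each pattern to its maximum-membership cluster. A small sketch (the matrix values below are illustrative, not from the experiments):

```python
def crisp_assignment(W):
    """Assign each pattern to the cluster with maximal membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in W]

# Illustrative membership matrix: four patterns, two clusters.
W = [[0.9, 0.1],
     [0.8, 0.2],
     [0.3, 0.7],
     [0.4, 0.6]]
print(crisp_assignment(W))  # -> [0, 0, 1, 1]
```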
7) The new algorithm overcomes most of the drawbacks encountered in dealing with hierarchical algorithms for clustering symbolic data by formulating the given symbolic clustering problem as an optimization problem with a specific objective function subject to a group of constraints. The concept of fuzziness is applied here to give more meaning and easier interpretation to the results obtained from the proposed algorithm.

8) The results obtained from the proposed algorithm depend mainly on m, the initial conditions, and the similarity/dissimilarity measure used for calculating the distance between the symbolic objects.

9) For small values of m near 1.1, the behavior of the new algorithm is almost the same as that of hard clustering, as can be observed from the membership matrix. This result is consistent with the behavior of conventional fuzzy c-means for numerical data.

10) Increasing the factor m increases the number of iterations required to reach a solution. This can be explained as follows: in the range 1.1 ≤ m ≤ 2.0, the degree of the fuzzy memberships assigned at the levels of event-to-feature and pattern-to-cluster increases, in addition to the high percentage of overlap between the feature values.

11) The nature of symbolic data and the overlap between the feature values increase the number of iterations needed to reach a solution.

12) The number of iterations needed to reach a solution can be reduced by applying the concept of soft clustering [23]–[25], by changing $w_{ij}$ as follows: $w_{ij} = 1$ if $w_{ij} \ge T$, or $w_{ij} = 0$ if $w_{ij} < T$, where $T$ is a threshold value. The default value of $T$ is 0.5. The new values for the number of iterations are shown in Figs. 15 and 16.

REFERENCES

[1] K. C. Gowda and G. Krishna, "Disaggregative clustering using the concept of mutual nearest neighborhood," IEEE Trans. Syst., Man, Cybern., vol. 8, pp. 883–895, Dec. 1978.
[2] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data. Englewood Cliffs, NJ: Prentice-Hall, 1988.
[3] E. Diday, "The symbolic approach in clustering," in Classification and Related Methods of Data Analysis, H. H. Bock, Ed. Amsterdam, The Netherlands: Elsevier, 1988.
[4] K. C. Gowda and E. Diday, "Symbolic clustering using a new dissimilarity measure," Pattern Recogn., vol. 24, no. 6, pp. 567–578, 1991.
[5] ——, "Symbolic clustering using a new similarity measure," IEEE Trans. Syst., Man, Cybern., vol. 22, pp. 368–378, Feb. 1992.
[6] D. H. Fisher, "Knowledge acquisition via incremental conceptual clustering," Mach. Learning, vol. 2, pp. 103–138, 1987.
[7] Y. Cheng and K. S. Fu, "Conceptual clustering in knowledge organization," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-7, pp. 592–598, 1985.
[8] R. Michalski and R. E. Stepp, "Automated construction of classifications: Conceptual clustering versus numerical taxonomy," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-5, pp. 396–410, 1983.
[9] H. Ralambondrainy, "A conceptual version of the K-means algorithm," Pattern Recogn. Lett., vol. 16, pp. 1147–1157, 1995.
[10] D. H. Fisher and P. Langley, "Approaches to conceptual clustering," in Proc. 9th Int. Joint Conf. Artificial Intell., Los Angeles, CA, 1985, pp. 691–697.
[11] Y. A. El-Sonbaty, M. S. Kamel, and M. A. Ismail, "Representations and proximity measures of structured features," to be published.
[12] K. C. Gowda and T. V. Ravi, "Divisive clustering of symbolic objects using the concepts of both similarity and dissimilarity," Pattern Recogn., vol. 28, no. 8, pp. 1277–1282, 1995.
[13] M. P. Windham, "Cluster validity for fuzzy clustering algorithms," Fuzzy Sets Syst., vol. 5, pp. 177–185, 1981.
[14] E. R. Ruspini, "Numerical methods for fuzzy clustering," Inform. Sci., vol. 2, pp. 318–350, 1970.
[15] M. Roubens, "Fuzzy clustering algorithms and their cluster validity," Eur. J. Oper. Res., vol. 10, pp. 294–301, 1982.
[16] ——, "Pattern classification problems and fuzzy sets," Fuzzy Sets Syst., vol. 1, pp. 239–253, 1978.
[17] M. A. Ismail and S. Z. Selim, "Fuzzy c-means: Optimality of solutions and effective termination of the algorithm," Pattern Recogn., vol. 19, no. 6, pp. 481–485, 1986.
[18] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum, 1981.
[19] R. J. Hathaway and J. C. Bezdek, "Local convergence of the fuzzy c-means algorithms," Pattern Recogn., vol. 19, no. 6, pp. 477–480, 1986.
[20] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.
[21] M. Ichino, "General metrics for mixed features—The Cartesian space theory for pattern recognition," in Proc. IEEE Int. Conf. Syst., Man, Cybern., Atlanta, GA, Oct. 1988, pp. 14–17.
[22] R. J. Hathaway and J. C. Bezdek, "NERF c-means: Non-Euclidean relational fuzzy clustering," Pattern Recogn., vol. 27, no. 3, pp. 429–437, 1994.
[23] M. A. Ismail, "Soft clustering: Algorithms and validity of solutions," in Fuzzy Computing. Amsterdam, The Netherlands: Elsevier, 1988, pp. 445–471.
[24] Y. A. El-Sonbaty, "Fuzzy and soft clustering for symbolic data," M.Sc. thesis, Alexandria Univ., Egypt, 1993.
[25] S. Z. Selim and M. A. Ismail, "Soft clustering of multidimensional data: A semi-fuzzy approach," Pattern Recogn., vol. 17, no. 5, pp. 559–568, 1984.

Yasser El-Sonbaty received the B.Sc. (honors) and M.Sc. degrees in computer science from the University of Alexandria, Egypt. He is currently working toward the Ph.D. degree at the same university. His research interests include object representation and recognition, machine vision, and pattern recognition.

M. A. Ismail received the B.Sc. (honors) and M.Sc. degrees in computer science from the University of Alexandria, Egypt, in 1970 and 1974, respectively, and the Ph.D. degree in electrical engineering from the University of Waterloo, Canada, in 1980. He is a Professor of computer science in the Department of Computer Science, Alexandria University, Egypt. He has taught computer science and engineering at the University of Waterloo, Canada, the University of Petroleum and Minerals (UPM), Saudi Arabia, the University of Windsor, Canada, and the University of Michigan, Ann Arbor. His research interests include pattern analysis and machine intelligence, data structures and analysis, medical computer science, and nontraditional databases.
