ABSTRACT
Nowadays, storing information (data) is not a problem; organizing that data effectively is. Clustering is the classification of patterns into groups of similar items: the data within a group are similar to each other but quite different from the data in other groups. The clustering problem has been addressed in many fields, which shows its wide usability. In this paper, clustering is applied to image data. Feature values are extracted, and the final solution depends upon these values, on which the categorization is done. The complexities of the different methods are also given. The paper ends with some of the difficulties, solutions for them, and the results of the clustering.
A. Hierarchical
---Agglomerative
a) Single linkage,
b) Complete linkage,
c) Group average linkage,
d) Median linkage,
e) Centroid linkage,
f) Ward’s method,
g) Balanced iterative reducing and clustering using
hierarchies (BIRCH),
h) Clustering using representatives (CURE),
i) Robust clustering using links (ROCK)
---Divisive
Divisive analysis (DIANA), monothetic analysis
(MONA)
Volume 2 Special Issue ISSN 2079-8407
Journal of Emerging Trends in Computing and Information Sciences
http://www.cisjournal.org
3. Compute distances (similarities) between the new cluster and each of the old clusters.
4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.

Step 3 can be done in different ways, which is what distinguishes single-linkage from complete-linkage and average-linkage clustering.

In single-linkage clustering (also called the connectedness or minimum method), we consider the distance between one cluster and another to be equal to the shortest distance from any member of one cluster to any member of the other cluster. If the data consist of similarities, we consider the similarity between one cluster and another to be equal to the greatest similarity from any member of one cluster to any member of the other cluster.

In complete-linkage clustering (also called the diameter or maximum method), we consider the distance between one cluster and another to be equal to the greatest distance from any member of one cluster to any member of the other cluster.

In average-linkage clustering, we consider the distance between one cluster and another to be equal to the average distance from any member of one cluster to any member of the other cluster. The result with image data is shown in the results section.

Single Linkage Algorithm:

The single linkage algorithm is also called the minimum method. It is obtained by defining the distance between two clusters to be the smallest distance between two points such that one point is in each cluster. If Ci and Cj are clusters, the distance between them is defined as

DSL(Ci, Cj) = min d(a, b),  a ∈ Ci, b ∈ Cj

where d(a, b) denotes the distance between the samples a and b.

Complete Linkage Algorithm:

The complete linkage algorithm is also called the maximum method. It is obtained by defining the distance between two clusters to be the largest distance between two points such that one point is in each cluster. If Ci and Cj are clusters, the distance between them is defined as

DCL(Ci, Cj) = max d(a, b),  a ∈ Ci, b ∈ Cj

Average Linkage Algorithm:

The average linkage algorithm is obtained by defining the distance between two clusters to be the average distance between two points such that one point is in each cluster. If Ci and Cj are clusters, the distance between them is defined as

DAL(Ci, Cj) = (1 / ni nj) Σ d(a, b),  a ∈ Ci, b ∈ Cj

where d(a, b) denotes the distance between the samples a and b.

The main weaknesses of agglomerative clustering methods are:
a) They do not scale well: the time complexity is at least O(n²), where n is the total number of objects;
b) They can never undo what was done previously.

V. K-Means Clustering

K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters) fixed a priori. The main idea is to define k centroids, one for each cluster. These centroids should be placed in a cunning way, because different locations cause different results [6]. So the better choice is to place them as far away from each other as possible. The next step is to take each point belonging to the given data set and associate it with the nearest centroid. When no point is pending, the first step is completed and an early grouping is done. At this point we need to re-calculate k new centroids as barycenters of the clusters resulting from the previous step. After we have these k new centroids, a new binding has to be done between the same data set points and the nearest new centroid. A loop has been generated. As a result of this loop, we may notice that the k centroids change their location step by step until no more changes are made; in other words, the centroids do not move any more.

Finally, this algorithm aims at minimizing an objective function, in this case a squared error function:

J = Σ(j=1..k) Σ(i=1..n) || xi(j) − cj ||²

where xi(j) is a data point assigned to cluster j and cj is the centroid of that cluster.
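The three linkage rules can be sketched directly from their definitions; the following is a minimal Python sketch, where the use of Euclidean distance for d(a, b) is an assumption (the paper does not fix the metric until the experiments):

```python
import math

def euclidean(a, b):
    """Distance d(a, b) between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def d_single(ci, cj):
    """D_SL: smallest distance over all pairs with one point in each cluster."""
    return min(euclidean(a, b) for a in ci for b in cj)

def d_complete(ci, cj):
    """D_CL: largest distance over all pairs with one point in each cluster."""
    return max(euclidean(a, b) for a in ci for b in cj)

def d_average(ci, cj):
    """D_AL: mean of all ni * nj pairwise distances."""
    return sum(euclidean(a, b) for a in ci for b in cj) / (len(ci) * len(cj))
```

For any pair of clusters, D_SL ≤ D_AL ≤ D_CL, which is one way to sanity-check an implementation.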
1. Place k points into the space represented by the objects being clustered; these points represent the initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the k centroids.
4. Repeat steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.
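The looped procedure described above can be sketched in pure Python; the random choice of initial centroids from the data and the restriction to 2-D points are illustrative assumptions:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means: assign points to nearest centroid, recompute barycenters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct data points
    for _ in range(iters):
        # assignment step: bind each point to its nearest centroid
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda j: (p[0] - centroids[j][0]) ** 2
                                  + (p[1] - centroids[j][1]) ** 2)
            groups[j].append(p)
        # update step: recompute each centroid as the barycenter of its group
        new = [(sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
               if g else centroids[j]
               for j, g in enumerate(groups)]
        if new == centroids:  # centroids no longer move: converged
            break
        centroids = new
    return centroids, groups
```

On two well-separated blobs the loop converges to the blob barycenters within a few iterations, regardless of which data points are drawn as initial centroids.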
Advantages

Disadvantages

1. Although it can be proved that the procedure will always terminate, the k-means algorithm does not necessarily find the optimal configuration corresponding to the global objective function minimum.

2. The algorithm is also significantly sensitive to the initial, randomly selected cluster centers. The k-means algorithm can be run multiple times to reduce this effect.

A large number of attempts have been made to estimate the appropriate number of clusters k, and some representative examples are illustrated in the following [6]. Some solutions for this algorithm are:

1. Visualization of the data set. For data points that can be effectively projected onto a two-dimensional Euclidean space, commonly depicted with a histogram or scatterplot, direct observation can provide good insight into the value of k. However, the complexity of most real data sets restricts the effectiveness of this strategy to only a small scope of applications.

2. Construction of certain indices (or stopping rules). These indices usually emphasize intra-cluster compactness and inter-cluster isolation, and consider the comprehensive effects of several factors, including the defined squared error, the geometric or statistical properties of the data, the number of patterns, the dissimilarity (or similarity), and the number of clusters. Milligan and Cooper compared and ranked 30 indices according to their performance over a series of artificial data sets.

Fig. 1 Different image patterns

The patterns can be clustered using a number of features. The basic features are color, shape, and texture. Here, one feature from the basic features is taken, in addition to two new features, i.e., the number of objects and the size of the object. Various methods for the detection of size, shape, etc. are available in [7] and elsewhere in the literature. The corresponding value for each feature is shown in Table III. The results after the experimentation for clustering are shown in Figure 2. The results are found to be the same for the single, complete, and average linkage algorithms, and are even the same using the Euclidean and City-block distances.

Table III: Image feature values with respect to patterns

Pattern No   Color            No. of objects   Size of object
1            15 (White)       1                3
2            08 (Dark Gray)   6                1
3            14 (Yellow)      1                2.5
4            06 (Brown)       12               1
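As a rough cross-check of the reported result that the linkage variants agree, the following runs a naive single-linkage pass over the Table III feature vectors. Treating each pattern as a point (color value, object count, object size) with unscaled Euclidean distance is our reading; the paper does not state its exact distance setup:

```python
import math

# Feature vectors (color value, no. of objects, size of object) from Table III.
patterns = {
    1: (15, 1, 3),    # white, one large object
    2: (8, 6, 1),     # dark gray, six small objects
    3: (14, 1, 2.5),  # yellow, one large object
    4: (6, 12, 1),    # brown, twelve small objects
}

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage_merges(items):
    """Agglomerative clustering: repeatedly merge the two closest clusters (min rule)."""
    clusters = [[key] for key in items]
    merges = []
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: min(dist(items[a], items[b])
                                             for a in clusters[ij[0]]
                                             for b in clusters[ij[1]]))
        merges.append(sorted(clusters[i] + clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

print(single_linkage_merges(patterns))  # → [[1, 3], [2, 4], [1, 2, 3, 4]]
```

With these raw feature values, patterns 1 and 3 (a single bright object) join first and patterns 2 and 4 (many small dark objects) join next, consistent with the stable grouping the paper reports across linkage methods.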
Fig. 2 Dendrogram for the clustering

The presented work consists of the basic idea and the implementation of some of the basic clustering methods. In the future, we will implement more of the clustering methods and explore their utilities.
REFERENCES
[1] Earl Gose, Richard Johnsonbaugh, Steve Jost, "Pattern Recognition and Image Analysis" (textbook).