
Clustering Techniques

Mohammad Ali Joneidi


Department of Electronic and Computer Engineering
Shahid Rajaee University, Tehran, Iran
joneidimohamad@gmail.com

Abstract—Clustering is traditionally viewed as an unsupervised method for data analysis. However, in some cases information about the problem domain is available in addition to the data instances themselves. In this report, we demonstrate how popular clustering methods such as k-means, hierarchical clustering, and the self-organizing map (SOM) can cluster an already solved problem. In these experiments we apply the clustering methods to a face detection problem that was previously solved with supervised classification.

Index Terms—K-means, Hierarchical clustering, Self-Organizing Map, Clustering

I. INTRODUCTION
Clustering algorithms are generally used in an unsupervised fashion. They are presented with a set of data instances that must be grouped according to some notion of similarity. The algorithm has access only to the set of features describing each object; it is not given any information (e.g., labels) as to where each of the instances should be placed within the partition. However, in real application domains, it is often the case that the experimenter possesses some background knowledge (about the domain or the data set) that could be useful in clustering the data. Traditional clustering algorithms have no way to take advantage of this information even when it does exist. In what follows we use these clustering techniques:
1) Hierarchical clustering
2) K-means
3) Self-Organizing Map

II. EUCLIDEAN DISTANCE
To calculate similarity we use the Euclidean distance, because our feature vectors are numerical and the most natural way to compare numerical data is by their distance. To make the distances larger, so that the clusters separate from each other more clearly, we do not take the square root in this problem and simply use the squared distance.
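A minimal sketch of this squared-distance comparison (the 384-dimensional vector shape is taken from the feature vectors described later in the report; the values here are random placeholders):

```python
import numpy as np

def squared_euclidean(a, b):
    # Squared Euclidean distance: the square root is skipped on purpose,
    # since only the ordering of distances matters for clustering.
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.dot(diff, diff))

x = np.random.rand(384)  # one face feature vector (placeholder values)
y = np.random.rand(384)
print(squared_euclidean(x, y))
```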

III. HIERARCHICAL CLUSTERING
Given a set of N items to be clustered (400 in this problem) and an N×N distance (or similarity) matrix, the basic process of hierarchical clustering is this:
1) Start by assigning each item to its own cluster, so that if you have N items, you now have N clusters, each containing just one item. Let the distances (similarities) between the clusters equal the distances (similarities) between the items they contain.
2) Find the closest (most similar) pair of clusters and merge them into a single cluster, so that you now have one cluster fewer.
3) Compute the distances (similarities) between the new cluster and each of the old clusters.
4) Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.
In step 3 we calculate the Euclidean distance between each training vector (384×1) and the mean of all data assigned to each cluster.

Fig. 3. Diagram of hierarchical clustering.
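One way to realize this procedure is SciPy's agglomerative clustering with centroid linkage, which merges the pair of clusters whose means are closest and so matches the use of cluster means in step 3. This is a sketch under stated assumptions (random placeholder data, SciPy as the implementation, and a cut at the 40 clusters used elsewhere in the report), not the report's own code:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Placeholder for the 400 face feature vectors, each 384-dimensional.
X = np.random.rand(400, 384)

# Centroid linkage: at every merge, the two clusters whose means are closest
# in Euclidean distance are joined (step 3 of the procedure above).
Z = linkage(X, method="centroid", metric="euclidean")

# Cut the tree into 40 flat clusters, one per class in the face data set.
labels = fcluster(Z, t=40, criterion="maxclust")
print(labels[:10])
```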
IV. K-MEANS
K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (in this problem 40 clusters) fixed a priori. The main idea is to define 40 centroids, one for each cluster. These centroids should be placed carefully, because different locations lead to different results, so the better choice is to place them as far away from each other as possible; for better performance we choose each initial centroid from one class. The next step is to take each point of the data set, calculate its distance to every centroid, and associate it with the nearest centroid. When no point is pending, an early grouping is done; the centroids are then recomputed and the assignment step repeated until the clustering is completed.
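The class-based initialization described above can be sketched as follows. The use of scikit-learn, the 10-samples-per-class layout, and the variable names are assumptions for illustration; the report does not state which implementation was used:

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder data: 400 face feature vectors (384-dim), assumed 10 per class.
X = np.random.rand(400, 384)
y = np.repeat(np.arange(40), 10)   # known class labels, 40 classes

# Initialize one centroid per class by picking a random sample from that class,
# as described above, instead of purely random initialization.
rng = np.random.default_rng(0)
init_centers = np.stack([X[rng.choice(np.where(y == c)[0])] for c in range(40)])

km = KMeans(n_clusters=40, init=init_centers, n_init=1).fit(X)
print(km.labels_[:10])
```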

V. SOM
Kohonen Self-Organizing Maps are a type of neural
network.Self- Organizing Maps are aptly named. SelfOrganizing is because no supervision is required. SOMs
learn on their own through unsupervised competitive learning.
Maps is because they attempt to map their weights to conform
to the given input data. The nodes in a SOM network attempt
to become like the inputs presented to them. In this sense,
this is how they learn. They can also be called Feature Maps,
as in Self-Organizing Feature Maps. Retaining principle
features of the input data is a fundamental principle of
SOMs, and one of the things that makes them so valuable.
The structure of a SOM is fairly simple, and is best understood
with the use of an illustration such as Figure 4
Figure 4 is a 4x4 SOM network . It is easy to overlook this
structure , but there are a few key things to notice. First, each
map node is connected to each input node. For this small
4x4 node network, that is 4x4x3=48 connections.Secondly,
notice that map nodes are not connected to each other. The
nodes are organized in this manner, as a 2-D grid makes
it easy to visualize the results. This representation is also
useful when the SOM algorithm is used. In this configuration,
each map node has a unique (i,j) coordinate. This makes it
easy to reference a node in the network, and to calculate the
distances between nodes. Because of the connections only
to the input nodes, the map nodes are oblivious as to what
values their neighbors have. A map node will only update its
weights (explained next) based on what the input vector tells it.

Fig. 4. Structure of SOM
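As a small illustration of this structure (not code from the report), the map can be stored as a grid of weight vectors, and node-to-node distances are simply distances between grid coordinates; the 4x4 map with 3 inputs mirrors the example above:

```python
import numpy as np

# 4x4 map, 3 input nodes: each map node holds one weight per input,
# giving 4*4*3 = 48 connections in total.
rows, cols, n_inputs = 4, 4, 3
weights = np.random.rand(rows, cols, n_inputs)

# Each map node is addressed by its (i, j) grid coordinate, so the
# distance between two nodes is just a distance between coordinates.
a, b = (0, 0), (2, 3)
grid_distance_sq = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
print(weights.shape, grid_distance_sq)   # (4, 4, 3) 13
```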

A. The SOM Algorithm
The Self-Organizing Map algorithm can be broken up into six steps (a code sketch of the full training loop is given at the end of this section):
1) Each node's weights are initialized (in the face detection problem we select random numbers between 0 and 1).
2) A vector is chosen at random from the set of training data and presented to the network (a random number between 1 and 400).
3) Every node in the network is examined to calculate which one's weights are most like the input vector. The winning node is commonly known as the Best Matching Unit (BMU).
4) The radius of the neighborhood of the BMU is calculated. This value starts large; typically it is set to be the radius of the network and diminishes at each time-step (we compute this value from the distance and a variance that depends on the iteration number).
5) Any nodes found within the calculated radius of the BMU are adjusted to make them more like the input vector. The closer a node is to the BMU, the more its weights are altered.
6) Repeat from step 2) for N iterations.
In step 5 we update the map node weights based on this formula:

W(t + 1) = W(t) + Θ(t) L(t) (D(t) - W(t))

where W(t + 1) is the new map node weight vector, W(t) is the old map node weight vector, Θ(t) is a restraint due to distance from the BMU (usually called the neighborhood function), L(t) is the learning rate, which, like the radius, depends on the iteration number, and D(t) is the input vector selected in step 2.
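A brief numerical sketch of this update rule, applied to every map node at once (the 5x8 map and 384-dimensional vectors follow the setup described later; the BMU position, variance, and learning rate are made-up values for illustration):

```python
import numpy as np

W = np.random.rand(5, 8, 384)   # W(t): current node weights
d = np.random.rand(384)         # D(t): the input vector chosen in step 2
lr = 0.5                        # L(t): learning rate for this iteration

# Theta(t): neighborhood value per node, larger for nodes nearer the BMU
# (made-up BMU at grid position (2, 3) and a made-up variance of 1.0).
ii, jj = np.meshgrid(np.arange(5), np.arange(8), indexing="ij")
theta = np.exp(-((ii - 2) ** 2 + (jj - 3) ** 2) / (2 * 1.0))

# W(t+1) = W(t) + Theta(t) * L(t) * (D(t) - W(t)), for every node at once.
W_next = W + lr * theta[:, :, None] * (d - W)
print(W_next.shape)             # (5, 8, 384)
```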


Fig. 5. Size of the radius depending on the iteration number.

In this example we consider a variable called variance that is calculated in each iteration as follows:

variance = exp(-2 · iteration_number / epoch_number)

We use this variance to determine the neighborhood value of each node in each iteration as follows:

Θ = exp(-distance / (2 · variance))

where distance is the squared distance between the row and column coordinates of the neighbor node and those of the winner node. Finally, the learning rate is calculated in each iteration as follows:

L = exp(-iteration_number / epoch_number)
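A short sketch of these three schedules, assuming the negative signs read as above so that both the neighborhood and the learning rate shrink as training proceeds (iteration and epoch counts are placeholders):

```python
import numpy as np

def schedules(iteration, epochs, grid_dist_sq):
    # Variance shrinks over training, narrowing the Gaussian neighborhood.
    variance = np.exp(-2.0 * iteration / epochs)
    # Neighborhood value Theta for a node at squared grid distance
    # grid_dist_sq from the winner node.
    theta = np.exp(-grid_dist_sq / (2.0 * variance))
    # Learning rate L also decays with the iteration number.
    lr = np.exp(-1.0 * iteration / epochs)
    return variance, theta, lr

print(schedules(0, 2000, grid_dist_sq=1))     # early: wide neighborhood, lr = 1.0
print(schedules(1999, 2000, grid_dist_sq=1))  # late: narrow neighborhood, small lr
```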
We have 400 data instances and we know that there are 40 clusters, so we define a map of size 5×8 in which each node is a 384×1 vector. Figure 6 demonstrates the performance of the SOM technique for various numbers of epochs.

Fig. 6. Accuracy of SOM on the face detection problem for various numbers of epochs.
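Pulling the six steps and the decay formulas together, a self-contained sketch of the training loop for this setup might look as follows (random placeholder data stands in for the face features; the report's actual implementation may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder for the 400 face feature vectors, each 384-dimensional.
X = rng.random((400, 384))

rows, cols, dim = 5, 8, 384   # 5x8 map, one 384-dim weight vector per node
epochs = 2000                 # best accuracy reported around 2000 epochs

# Step 1: initialize node weights with random numbers between 0 and 1.
W = rng.random((rows, cols, dim))

# (i, j) grid coordinates of every map node, used for neighborhood distances.
ii, jj = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")

for it in range(epochs):
    # Decay schedules from the formulas above.
    variance = np.exp(-2.0 * it / epochs)
    lr = np.exp(-1.0 * it / epochs)

    # Step 2: pick one training vector at random.
    d = X[rng.integers(len(X))]

    # Step 3: the Best Matching Unit is the node whose weights are closest to d.
    bmu_i, bmu_j = np.unravel_index(
        np.argmin(np.sum((W - d) ** 2, axis=2)), (rows, cols)
    )

    # Step 4: squared grid distance (rows and columns) of every node to the BMU.
    grid_dist = (ii - bmu_i) ** 2 + (jj - bmu_j) ** 2

    # Neighborhood function Theta = exp(-distance / (2 * variance)).
    theta = np.exp(-grid_dist / (2.0 * variance))

    # Step 5: W(t+1) = W(t) + Theta(t) * L(t) * (D(t) - W(t)).
    W += lr * theta[:, :, None] * (d - W)
    # Step 6: the loop repeats steps 2-5 for the chosen number of iterations.

# After training, each input is mapped to the node whose weights are closest.
assignments = [
    np.unravel_index(np.argmin(np.sum((W - x) ** 2, axis=2)), (rows, cols))
    for x in X
]
print(assignments[:5])
```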

VI. CONCLUSION
In hierarchical clustering we obtain low performance because our data are not hierarchical in nature. K-means clustering with random initial centers has low accuracy; to enhance performance we set the center of each cluster to one randomly chosen sample of that cluster. In SOM clustering, after 2000 epochs on 400 training samples we obtain the best performance, and for higher epoch numbers we observe over-fitting.

