
Clustering with Self Organizing Maps

Vahid Moosavi
Supervisor: Prof. Ludger Hovestadt
September 2011

Outline
SOM Clustering Approaches
U*C Clustering (1)
Basic Definitions
The Algorithm
Results

(1): Alfred Ultsch: U*C: Self-organized Clustering with Emergent Feature Maps. LWA 2005: 240-244.

The Learning Algorithm

Competition

Cooperation And Adaptation

Representation
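The three phases above (competition, cooperation, adaptation) can be sketched as a minimal online SOM in NumPy. The exponential learning-rate and neighborhood schedules and all parameter values here are illustrative assumptions, not the settings used in this presentation.

```python
import numpy as np

def train_som(data, grid_h=10, grid_w=10, n_iter=1000, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal online SOM: competition (find the best matching unit),
    cooperation (Gaussian neighborhood on the grid), adaptation
    (pull weights toward the sample)."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    weights = rng.random((grid_h, grid_w, dim))
    # grid coordinates of every node, used by the neighborhood kernel
    gy, gx = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    for t in range(n_iter):
        x = data[rng.integers(len(data))]
        # competition: the BMU is the node whose weight is closest to x
        d = np.linalg.norm(weights - x, axis=2)
        by, bx = np.unravel_index(np.argmin(d), d.shape)
        # cooperation: learning rate and neighborhood width decay over time
        lr = lr0 * np.exp(-t / n_iter)
        sigma = sigma0 * np.exp(-t / n_iter)
        h = np.exp(-((gy - by) ** 2 + (gx - bx) ** 2) / (2 * sigma ** 2))
        # adaptation: move every node toward x, weighted by the kernel
        weights += lr * h[:, :, None] * (x - weights)
    return weights
```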

SOM Clustering
One-stage clustering: for maps with a small number of nodes, each node represents one cluster.
Two-stage clustering (for large maps) (1):
First train the SOM, then apply any clustering algorithm to the nodes instead of the original data:
Partitional clustering algorithms
Hierarchical clustering algorithms
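The two-stage approach can be sketched as follows, assuming a trained codebook `weights` of shape (rows, cols, dim). The plain k-means loop stands in for "any clustering algorithm"; all function names are hypothetical.

```python
import numpy as np

def cluster_codebook(weights, k=2, n_iter=50, seed=0):
    """Stage 2: run k-means on the SOM codebook vectors (one vector per
    map node) instead of the raw data."""
    nodes = weights.reshape(-1, weights.shape[-1])  # flatten the map grid
    rng = np.random.default_rng(seed)
    centers = nodes[rng.choice(len(nodes), k, replace=False)]
    for _ in range(n_iter):
        # assign each node to its nearest center, then recompute centers
        labels = np.argmin(np.linalg.norm(nodes[:, None] - centers, axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = nodes[labels == j].mean(axis=0)
    return labels.reshape(weights.shape[:2]), centers

def assign_data(data, weights, node_labels):
    """Label each data point with the cluster of its best matching unit."""
    nodes = weights.reshape(-1, weights.shape[-1])
    bmu = np.argmin(np.linalg.norm(data[:, None] - nodes, axis=2), axis=1)
    return node_labels.reshape(-1)[bmu]
```

Clustering the (few) codebook vectors instead of the (many) data points is what makes the second stage cheap for large data sets.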

U*C Clustering Algorithm (2)


(1): J. Vesanto and E. Alhoniemi: Clustering of the Self-Organizing Map. IEEE Transactions on Neural Networks, Vol. 11, No. 3, May 2000. (2): Alfred Ultsch: U*C: Self-organized Clustering with Emergent Feature Maps. LWA 2005: 240-244.

Weaknesses of Existing Clustering Algorithms


Weaknesses of other algorithms (e.g. K-Means, GK, hierarchical clustering):
The number of clusters must be known in advance.
The algorithms rest on geometrical assumptions (Euclidean distance, ellipsoidal or spherical cluster shapes, ...).

U*C clustering addresses all of the issues mentioned above.



Basic Definitions
Component Planes
U-Matrix (1990)
P-Matrix (2003)
U*-Matrix

U*C clustering

Presentation and visualization (Component Plane)

Presentation and visualization U Matrix (1990)


Neuron i: n_i
Neighborhood neurons of n_i: N(i)

Definition of the U-Matrix: the U-height of neuron i is u(i) = sum over n_j in N(i) of d(w_i, w_j), i.e. the summed distance between the weight vector w_i and the weight vectors of its immediate map neighbors.

Presentation and visualization U Matrix (1990)


A display of all U-heights on top of the grid is called a U-Matrix (Ultsch, 1990). The U-Matrix visually reveals the hidden clusters in the data set.
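A minimal sketch of the U-Matrix computation. It uses 4-connected grid neighbors and the mean (rather than the sum) of neighbor distances as the U-height; both are common variants and may differ from the exact convention of the slides.

```python
import numpy as np

def u_matrix(weights):
    """U-height of node i: mean distance between its weight vector and
    the weight vectors of its 4-connected grid neighbors N(i)."""
    h, w, _ = weights.shape
    um = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            dists = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:
                    dists.append(np.linalg.norm(weights[i, j] - weights[ni, nj]))
            um[i, j] = np.mean(dists)
    return um
```

Large U-heights mark nodes whose neighbors are far away in input space, i.e. cluster borders; small U-heights mark cluster interiors.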

Presentation and visualization U Matrix (1990)


[Figure: the original data set and its U-Matrix: basins (clusters) separated by watersheds (borders).]


Presentation and visualization P Matrix (2003)


In some cases the U-Matrix is not enough, so a measure of density is used in addition to the distance.
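The density measure can be sketched as a P-Matrix whose P-height counts the data points within a fixed radius of each node's weight vector. Ultsch derives this radius from the data (the Pareto radius); it is simplified to a plain parameter here.

```python
import numpy as np

def p_matrix(weights, data, radius):
    """P-height of node i: number of data points inside a hypersphere of
    the given radius around the node's weight vector (a density estimate).
    The radius is a free parameter in this sketch; Ultsch's P-Matrix
    computes it from the data as the Pareto radius."""
    h, w, _ = weights.shape
    pm = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            pm[i, j] = np.sum(np.linalg.norm(data - weights[i, j], axis=1) <= radius)
    return pm
```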



Presentation and visualization U* Matrix (2003)


As the TwoDiamonds data set shows, a combination of distance relationships and density relationships is necessary for an appropriate clustering. The combination of a U-Matrix and a P-Matrix is called the U*-Matrix. The main idea: the U*-Matrix exhibits the local data distances as heights where the data density is low (cluster borders); where the data density is high (cluster centers), the distances are scaled down to zero.
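One way to combine the two matrices in code, following the idea above: scale each U-height by a factor that is 1 at mean density and falls linearly to 0 at maximum density. This linear scaling is an assumption consistent with the description, not a verbatim transcription of Ultsch's formula.

```python
import numpy as np

def u_star_matrix(um, pm):
    """U*-height: U-height scaled by a factor that is 1 where the
    P-height equals its mean and 0 where the P-height is maximal, so
    distances vanish in dense cluster centers and survive at sparse
    cluster borders. Assumes pm is not constant."""
    scale = (pm - pm.max()) / (pm.mean() - pm.max())
    return um * np.clip(scale, 0.0, None)
```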

Presentation and visualization U* Matrix (2003)


U*C Clustering (Main Ideas)


First main idea: the U-height at the center of a cluster is smaller than the U-height at the border of the cluster in the U-Matrix.

Second main idea: the P-height at the center of a cluster is larger than the P-height at the border of the cluster in the P-Matrix. At cluster borders, the local density of the points should decrease substantially.


U*C Clustering (Main Ideas)


A movement from one position n_i to another position n_j, such that w_j lies more within a cluster C than w_i, is called immersive.
Sometimes immersion can be found on the U-Matrix (by gradient descent); sometimes it can be found on the P-Matrix (by gradient ascent). The algorithm:
1. Gradient descent on the U-Matrix: start from a node n and move through its neighborhood to a point U of minimum U-height (distance); this is probably a node within a cluster.
2. Gradient ascent on the P-Matrix: start from point U and move through its neighborhood to a point of maximum P-height. These are the immersion points, which are probably cluster centers.
3. Calculate the watersheds on the U*-Matrix with any existing watershed algorithm.
4. Partition the immersion points using these watersheds into cluster centers C1, ..., Cc.
5. Assign the data points to the clusters via the immersion points of their corresponding SOM units.
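Steps 1 and 2 above (gradient descent on the U-Matrix followed by gradient ascent on the P-Matrix) can be sketched as greedy walks over the map grid. This is a simplified sketch of the immersion step only; the watershed partitioning (steps 3-5) is omitted.

```python
import numpy as np

def _neighbors(i, j, h, w):
    # 4-connected grid neighbors, clipped to the map borders
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < h and 0 <= nj < w:
            yield ni, nj

def greedy_walk(matrix, i, j, sign=1):
    """Greedy walk on the grid: repeatedly move to the neighbor with the
    smallest value (sign=1, descent) or largest value (sign=-1, ascent)
    until the current node is a local optimum."""
    h, w = matrix.shape
    while True:
        best = min(((sign * matrix[ni, nj], ni, nj)
                    for ni, nj in _neighbors(i, j, h, w)),
                   default=None)
        if best is None or best[0] >= sign * matrix[i, j]:
            return i, j
        _, i, j = best

def immersion_points(um, pm):
    """For every node: descend the U-Matrix toward small distances, then
    ascend the P-Matrix toward high density; the end point is the node's
    immersion point."""
    h, w = um.shape
    imm = {}
    for i in range(h):
        for j in range(w):
            ui, uj = greedy_walk(um, i, j, sign=1)          # step 1
            imm[(i, j)] = greedy_walk(pm, ui, uj, sign=-1)  # step 2
    return imm
```

Nodes that share an immersion point end up in the same cluster; in the full algorithm, the watershed lines of the U*-Matrix then separate immersion points belonging to different clusters.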

U*C Clustering (2005)


U*C Clustering (2005) Some Experimental Results


Conclusion

SOM can transform high-dimensional data sets into a two-dimensional representation; then, just by analyzing the distances and densities of the transformed data, we can find the natural clusters hidden in the original data set and use them for classification and prediction in future experiments.

[Diagram: High Dimensional Data Set -> SOM Modeling -> Two Dimensional Representation (U-Matrix, P-Matrix) -> Clustering -> Classification and Prediction for future experiments]

Alternative Way

[Diagram: High Dimensional Data Set -> Feature Selection and Extraction -> Transformed (reduced) Data Set -> SOM Modeling -> Two Dimensional Representation (U-Matrix, P-Matrix) -> Clustering -> Classification and Prediction for future experiments]

THANKS

