
Assignment

Clustering
ROCK Clustering Algorithm
Input:

A set S of data points

The number of clusters k to be found

The similarity threshold θ

Output: Groups of clustered data


The ROCK algorithm is divided into three major parts:
1. Draw a random sample from the data set:
2. Perform a hierarchical agglomerative clustering algorithm
3. Label data on disk
In our case, we do not deal with a very large data set, so we will consider the
whole data set in the process of forming clusters, i.e. we skip steps 1 and 3.
1. Draw a random sample from the data set:
Sampling is used to ensure scalability to very large data sets.
The initial sample is used to form clusters, then the remaining data on disk is
assigned to these clusters.
In our case, we will consider the whole data set in the process of forming clusters.
2. Perform a hierarchical agglomerative clustering algorithm:
ROCK performs the following steps, which are common to all hierarchical
agglomerative clustering algorithms, but with a different definition of the
similarity measure:
a. Place each single data point into a separate cluster.
b. Compute the similarity measure for all pairs of clusters.
c. Merge the two clusters with the highest similarity (goodness measure).
d. Verify a stop condition. If it is not met, go to step b.
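Taken together, steps a-d form a simple merge loop. The following is a minimal Python sketch of that loop (the function and parameter names are our own, and the goodness measure is passed in as a function, since it is only defined later in the text):

```python
def rock_merge_loop(point_ids, k, goodness):
    """Sketch of the agglomerative loop in steps a-d.

    point_ids: identifiers of the data points
    k:         desired number of clusters (the stop condition)
    goodness:  function taking two clusters and returning their
               goodness measure (defined later in the text)
    """
    # a. place each single data point into a separate cluster
    clusters = [frozenset([p]) for p in point_ids]
    while len(clusters) > k:          # d. verify the stop condition
        # b. compute the goodness measure for all pairs of clusters
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                g = goodness(clusters[i], clusters[j])
                if best is None or g > best[0]:
                    best = (g, i, j)
        # c. merge the two clusters with the highest goodness
        _, i, j = best
        merged = clusters[i] | clusters[j]
        clusters = [c for t, c in enumerate(clusters)
                    if t not in (i, j)] + [merged]
    return clusters
```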
3. Label data on disk:
Finally, the remaining data points on disk are assigned to the generated
clusters.

This is done by selecting a random sample Li from each cluster Ci; then we
assign each point p to the cluster for which it has the strongest linkage with
Li.
As we said, we will consider the whole data set in the process of forming clusters.
Computation of links:
1. Using the similarity threshold θ, we can convert the similarity matrix into an
adjacency matrix A.
2. Then we obtain a matrix indicating the number of links by calculating A × A,
i.e., by multiplying the adjacency matrix A with itself.
3. Suppose we have four verses, each containing some subjects, as follows:

P1 = {judgment, faith, prayer, fair}
P2 = {fasting, faith, prayer}
P3 = {fair, fasting, faith}
P4 = {fasting, prayer, pilgrimage}

(A parallel market-basket version of the same example uses B1 = {Milk, butter,
bread, eggs}, B2 = {honey, butter, bread}, B3 = {eggs, honey, butter}, and
B4 = {honey, bread, Jam}; it yields exactly the same pairwise similarities.)
4. The similarity threshold θ = 0.3, and the number of required clusters is 2.
5. Using the Jaccard coefficient as a similarity measure, we obtain the following
similarity table:

        P1      P2      P3      P4
P1      1.00    0.40    0.40    0.17
P2      0.40    1.00    0.50    0.50
P3      0.40    0.50    1.00    0.20
P4      0.17    0.50    0.20    1.00

Since we have a similarity threshold equal to 0.3, we derive the following
adjacency table (a pair is adjacent if its similarity is at least 0.3; each
point is also a neighbor of itself):

        P1   P2   P3   P4
P1      1    1    1    0
P2      1    1    1    1
P3      1    1    1    0
P4      0    1    0    1

By multiplying the adjacency table with itself, we derive the following table,
which shows the number of links (or common neighbors) between each pair:

        P1   P2   P3   P4
P1      3    3    3    1
P2      3    4    3    2
P3      3    3    3    1
P4      1    2    1    2
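The three link-computation steps can be reproduced with a short script. This is a sketch in Python using the four verse sets from the example (the variable names are our own):

```python
from itertools import combinations

# The four verse sets from the example
points = {
    "P1": {"judgment", "faith", "prayer", "fair"},
    "P2": {"fasting", "faith", "prayer"},
    "P3": {"fair", "fasting", "faith"},
    "P4": {"fasting", "prayer", "pilgrimage"},
}
theta = 0.3  # similarity threshold

def jaccard(a, b):
    return len(a & b) / len(a | b)

names = sorted(points)
n = len(names)
# 1. similarity matrix, then adjacency matrix A (1 where sim >= theta;
#    the diagonal is 1, since every point is a neighbor of itself)
sim = [[jaccard(points[p], points[q]) for q in names] for p in names]
adj = [[1 if s >= theta else 0 for s in row] for row in sim]
# 2. link matrix = A x A: entry (i, j) counts the common neighbors
links = [[sum(adj[i][t] * adj[t][j] for t in range(n)) for j in range(n)]
         for i in range(n)]

for i, j in combinations(range(n), 2):
    print(f"sim({names[i]},{names[j]}) = {sim[i][j]:.2f}, "
          f"link = {links[i][j]}")
```

The printed similarities and link counts match the three tables above.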

We compute the goodness measure for all adjacent points, assuming that

f(θ) = (1 - θ) / (1 + θ)

The goodness measure for merging two clusters Pi and Pj is then

g(Pi, Pj) = link[Pi, Pj] / ((ni + nj)^(1+2f(θ)) - ni^(1+2f(θ)) - nj^(1+2f(θ)))

where ni and nj are the numbers of points in the two clusters. With θ = 0.3,
f(θ) = 0.7/1.3 ≈ 0.53, so the exponent 1 + 2f(θ) ≈ 2.06.

We obtain the following table of link counts and goodness values for all
adjacent pairs:

Pair       link[Pi,Pj]   g(Pi,Pj)
(P1,P2)        3           1.38
(P1,P3)        3           1.38
(P2,P3)        3           1.38
(P2,P4)        2           0.92
(P1,P4)        1           0.46
(P3,P4)        1           0.46

For example:
g(P1,P2) = 3 / ((1+1)^2.06 - 1^2.06 - 1^2.06) ≈ 1.38
g(P1,P4) = 1 / ((1+1)^2.06 - 1^2.06 - 1^2.06) ≈ 0.46
etc.
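As a quick arithmetic check, the goodness measure can be evaluated directly. This small Python sketch uses the unrounded exponent 1 + 2f(θ) ≈ 2.077, so its printed values differ slightly from the rounded figures above:

```python
def f(theta):
    # f(theta) = (1 - theta) / (1 + theta), as defined above
    return (1 - theta) / (1 + theta)

def goodness(link_ij, n_i, n_j, theta):
    e = 1 + 2 * f(theta)  # ~2.077 for theta = 0.3 (2.06 when rounded)
    return link_ij / ((n_i + n_j) ** e - n_i ** e - n_j ** e)

print(goodness(3, 1, 1, 0.3))  # g(P1,P2): ~1.35 (1.38 with exponent 2.06)
print(goodness(1, 1, 1, 0.3))  # g(P1,P4): ~0.45 (0.46 with exponent 2.06)
```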

We have an equal (and highest) goodness measure (≈ 1.38) for the pairs (P1,P2), (P2,P3) and (P1,P3).

Now, we start the hierarchical algorithm by merging, say, P1 and P2.
A new cluster (let's call it C(P1,P2)) is formed.
It should be noted that some other hierarchical clustering techniques would
not start the clustering process by merging P1 and P2, since Sim(P1,P2) = 0.4
is not the highest similarity. But ROCK uses the number of links, rather than
distance, as its similarity measure.
Now, after merging P1 and P2, we have only three clusters. The number of links
between the new cluster and another cluster is the sum of the links between
their members; for example, link(C(P1,P2), P3) = link(P1,P3) + link(P2,P3) =
3 + 3 = 6. The following table shows the number of common neighbors for these
clusters:

            C(P1,P2)   P3   P4
C(P1,P2)       -        6    3
P3             6        -    1
P4             3        1    -

Then we can obtain the following goodness measures for all adjacent
clusters:
g((P1,P2),P3) = 6 / ((2+1)^2.06 - 2^2.06 - 1^2.06) ≈ 1.35 (≈ 1.31 with the unrounded exponent 2.077)
g((P1,P2),P4) = 3 / ((2+1)^2.06 - 2^2.06 - 1^2.06) ≈ 0.67 (≈ 0.66 with the unrounded exponent)
g(P3,P4) = 1 / ((1+1)^2.06 - 1^2.06 - 1^2.06) ≈ 0.46 (≈ 0.45 with the unrounded exponent)

Since the number of required clusters is 2, we finish the clustering
algorithm by merging C(P1,P2) and P3, obtaining a new cluster C(P1,P2,P3),
which contains {P1, P2, P3}, and leaving P4 alone in a separate cluster.
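Finally, the whole example can be run end to end. The sketch below (Python, with illustrative names; a simplified rendering of the procedure described above, not the original implementation) reproduces the result: P1, P2 and P3 end up in one cluster and P4 in its own.

```python
from itertools import combinations

points = {
    "P1": {"judgment", "faith", "prayer", "fair"},
    "P2": {"fasting", "faith", "prayer"},
    "P3": {"fair", "fasting", "faith"},
    "P4": {"fasting", "prayer", "pilgrimage"},
}
theta, k = 0.3, 2
e = 1 + 2 * (1 - theta) / (1 + theta)  # exponent 1 + 2 f(theta)

def jaccard(a, b):
    return len(a & b) / len(a | b)

names = list(points)
# Neighbor sets (a point is its own neighbor, since jaccard(p, p) = 1)
nbrs = {p: {q for q in names if jaccard(points[p], points[q]) >= theta}
        for p in names}
# Point-level links = number of common neighbors
link = {(p, q): len(nbrs[p] & nbrs[q]) for p, q in combinations(names, 2)}

def cluster_links(c1, c2):
    # Links between clusters are the sum of links between their points
    return sum(link[(p, q)] if (p, q) in link else link[(q, p)]
               for p in c1 for q in c2)

def goodness(c1, c2):
    n1, n2 = len(c1), len(c2)
    return cluster_links(c1, c2) / ((n1 + n2) ** e - n1 ** e - n2 ** e)

clusters = [frozenset([p]) for p in names]
while len(clusters) > k:  # merge the best pair until k clusters remain
    c1, c2 = max(combinations(clusters, 2), key=lambda pr: goodness(*pr))
    clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]

print(clusters)  # [frozenset({'P4'}), frozenset({'P1', 'P2', 'P3'})]
```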
