
Intelligent Data Analysis and Probabilistic Inference
Data Mining Tutorial 3: Clustering and Association Rules

1.
i. Explain the operation of the k-means clustering algorithm using pseudocode.
ii. Given the following eight points, and assuming initial cluster centroids given by A, B and C, and that a Euclidean distance function is used for measuring the distance between points, use k-means to show the three clusters and calculate their new centroids after the second round of execution.

ID   X   Y
A    2   10
B    2   5
C    8   4
D    5   8
E    7   5
F    6   4
G    1   2
H    4   9

2.
i. Explain the meaning of support and confidence in the context of association rule discovery algorithms, and explain how the Apriori heuristic can be used to improve the efficiency of such algorithms.
ii. Given the transactions described below, find all rules between single items that have support >= 60%. For each rule report both support and confidence.

1:  (Beer)
2:  (Cola, Beer)
3:  (Cola, Beer)
4:  (Nuts, Beer)
5:  (Nuts, Cola, Beer)
6:  (Nuts, Cola, Beer)
7:  (Crisps, Nuts, Cola)
8:  (Crisps, Nuts, Cola, Beer)
9:  (Crisps, Nuts, Cola, Beer)
10: (Crisps, Nuts, Cola, Beer)


3.
a. Explain how hierarchical clustering algorithms work. Make sure your answer describes what is meant by a linkage method and how it is used.
b. Explain the advantages and disadvantages of hierarchical clustering compared to k-means clustering.

4.
The following table shows the distance matrix between five genes:

      G1   G2   G3   G4   G5
G1     0    9    3    6   11
G2          0    7    5   10
G3               0    9    2
G4                    0    8
G5                         0

i. Based on a complete linkage method, show the distance matrix between the first formed cluster and the other data points.
ii. Draw a dendrogram showing the full hierarchical clustering tree for the five points based on complete linkage.
iii. Draw a dendrogram showing the full hierarchical tree for the five points based on single linkage.


Data Mining Tutorial 3: Answers


1.
Clusters after 1st iteration:
Cluster 1: A (2,10), D (5,8), H (4,9)
Cluster 2: B (2,5), G (1,2)
Cluster 3: C (8,4), E (7,5), F (6,4)

Centroids after 1st iteration:
Cluster 1: (3.67, 9)
Cluster 2: (1.5, 3.5)
Cluster 3: (7, 4.33)

Clusters after 2nd iteration (no change):
Cluster 1: A (2,10), D (5,8), H (4,9)
Cluster 2: B (2,5), G (1,2)
Cluster 3: C (8,4), E (7,5), F (6,4)

Centroids after 2nd iteration (no change):
Cluster 1: (3.67, 9)
Cluster 2: (1.5, 3.5)
Cluster 3: (7, 4.33)
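Part i of the question asks for pseudocode, which the answer above does not spell out. The following is a minimal Python sketch of the usual k-means loop (the function name `kmeans` and the fixed two-round cut-off are illustrative assumptions, not from the original sheet); run on the eight points with initial centroids A, B and C, it reproduces the clusters and centroids listed above.

```python
from math import dist  # Euclidean distance (Python 3.8+)

points = {"A": (2, 10), "B": (2, 5), "C": (8, 4), "D": (5, 8),
          "E": (7, 5), "F": (6, 4), "G": (1, 2), "H": (4, 9)}

def kmeans(points, centroids, rounds=2):
    for _ in range(rounds):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for name, p in points.items():
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(name)
        # Update step: move each centroid to the mean of its cluster
        # (assumes no cluster goes empty, which holds for this data).
        centroids = [tuple(sum(points[n][d] for n in c) / len(c)
                           for d in (0, 1))
                     for c in clusters]
    return clusters, centroids

clusters, centroids = kmeans(points, [points["A"], points["B"], points["C"]])
print(clusters)   # [['A', 'D', 'H'], ['B', 'G'], ['C', 'E', 'F']]
print(centroids)  # approx. [(3.67, 9.0), (1.5, 3.5), (7.0, 4.33)]
```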

2.
Initial supports (single items):
Beer:   support = 9/10
Cola:   support = 8/10
Nuts:   support = 7/10
Crisps: support = 4/10 (below 60%, so Crisps is dropped and no pair containing it need be counted)

Frequent pairs:
Beer, Cola: support = 7/10
Beer, Nuts: support = 6/10
Cola, Nuts: support = 6/10

Rules:
Beer -> Cola: support = 70%, confidence = 7/9 = 77.8%
Cola -> Beer: support = 70%, confidence = 7/8 = 87.5%
Beer -> Nuts: support = 60%, confidence = 6/9 = 66.7%
Nuts -> Beer: support = 60%, confidence = 6/7 = 85.7%
Cola -> Nuts: support = 60%, confidence = 6/8 = 75%
Nuts -> Cola: support = 60%, confidence = 6/7 = 85.7%
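As a cross-check, here is a short brute-force Python sketch of the same computation (the helper `support` is illustrative, not part of the original answer). The Apriori pruning from part i shows up as the line that drops Crisps before any pair is counted: no superset of an infrequent itemset can be frequent.

```python
from itertools import combinations

transactions = [
    {"Beer"}, {"Cola", "Beer"}, {"Cola", "Beer"}, {"Nuts", "Beer"},
    {"Nuts", "Cola", "Beer"}, {"Nuts", "Cola", "Beer"},
    {"Crisps", "Nuts", "Cola"}, {"Crisps", "Nuts", "Cola", "Beer"},
    {"Crisps", "Nuts", "Cola", "Beer"}, {"Crisps", "Nuts", "Cola", "Beer"},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / n

# Apriori pruning: only items meeting min support can appear in pairs.
items = {i for t in transactions for i in t}
frequent = {i for i in items if support({i}) >= 0.6}   # Crisps dropped here

for a, b in combinations(sorted(frequent), 2):
    s = support({a, b})
    if s >= 0.6:
        print(f"{a}->{b}: support={s:.0%}, confidence={s / support({a}):.1%}")
        print(f"{b}->{a}: support={s:.0%}, confidence={s / support({b}):.1%}")
```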


4.
The first cluster is formed from G3 and G5, since they are at the minimum distance (2). Under complete linkage, the distance from the merged cluster G35 to each remaining point is the maximum of the two original distances:

      G35   G1   G2   G4
G35     0   11   10    9
G1           0    9    6
G2                0    5
G4                     0

[Dendrogram: single linkage]

[Dendrogram: complete linkage]
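The two dendrograms above did not survive in this copy. As a sketch (the SciPy usage is my addition, not part of the original answers), the following code rebuilds both trees from the distance matrix; complete linkage merges at heights 2 (G3, G5), 5 (G2, G4), 9 and 11, while single linkage merges at 2, 3, 5 and 6.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

labels = ["G1", "G2", "G3", "G4", "G5"]
D = np.array([[ 0,  9,  3,  6, 11],
              [ 9,  0,  7,  5, 10],
              [ 3,  7,  0,  9,  2],
              [ 6,  5,  9,  0,  8],
              [11, 10,  2,  8,  0]], dtype=float)

condensed = squareform(D)  # linkage() expects the condensed distance vector
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, method in zip(axes, ("complete", "single")):
    Z = linkage(condensed, method=method)  # first merge: G3 + G5 at distance 2
    dendrogram(Z, labels=labels, ax=ax)
    ax.set_title(f"{method} linkage")
plt.show()
```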

yg@doc.ic.ac.uk, mmg@doc.ic.ac.uk

16th Dec 2003
