Q1) [1.5 points] Implement the k-means algorithm with k=2, 3, 4, 5, 6 to cluster train1.txt and train2.txt. Which values of k would you choose for the first and second datasets? (You can use the cluster reconstruction error on the training and test sets.)

Q2) [3.5 points] Implement the GMM using EM with k=2, 3, 4, 5, 6 to cluster train1.txt and train2.txt. Which values of k would you choose for the first and second datasets? (You can use the cluster reconstruction error computed with the Mahalanobis distance on the training and test sets.)

Answers

1

The k-means algorithm was used to cluster the Train1, Test1, Train2 and Test2 data. To initialize the cluster mean vectors, C instances were chosen at random from the data. Figures 1-4 show the clustered Train1, Test1, Train2 and Test2 data for the different cluster counts.

To determine the cluster count, the cluster reconstruction error was computed for each cluster count on the Train1 and Train2 data (and on the corresponding test sets). Figures 5 and 6 show how the error changes with the cluster count. In Figure 5 the cluster reconstruction error changes little beyond C=5, so C=5 was chosen for Train1. Similarly, in Figure 6 the error changes little beyond C=3, so C=3 was chosen for Train2.
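The k-means procedure described above (initial means taken from C randomly chosen instances, then alternating assignment and mean-update steps) can be sketched in Python with NumPy as below. This is a minimal sketch under stated assumptions, not the code used for the report: the name `kmeans`, the stopping rule (stop once no assignment changes) and the empty-cluster handling (keep the previous mean) are choices made here.

```python
import numpy as np


def kmeans(X, k, max_iter=100, seed=None):
    """Lloyd's k-means on an (n, d) array X; returns (means, labels).

    As in the report, the k initial means are k distinct instances drawn
    at random from X. Everything else here is an illustrative assumption.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Assignment step: index of the nearest mean (squared Euclidean distance).
        dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # no instance changed cluster: converged
        labels = new_labels
        # Update step: each mean becomes the average of its assigned instances;
        # a cluster that received no instances keeps its previous mean.
        for i in range(k):
            members = X[labels == i]
            if len(members) > 0:
                means[i] = members.mean(axis=0)
    return means, labels
```

Running it once per C in 2..6 on each training file gives one clustering per panel of Figures 1 and 3. Because the initialization is random, different seeds can give different local optima; repeating the run and keeping the lowest-error result is a common safeguard.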

[Figure: one panel per cluster count, C=2, C=3, C=4, C=5, C=6]
Figure 1: Clustering of the Train1 data with different cluster counts (+ cluster means)

[Figure: one panel per cluster count, C=2, C=3, C=4, C=5, C=6]
Figure 2: Clustering of the Test1 data with different cluster counts (+ cluster means)

[Figure: one panel per cluster count, C=2, C=3, C=4, C=5, C=6]
Figure 3: Clustering of the Train2 data with different cluster counts (+ cluster means)

[Figure: one panel per cluster count, C=2, C=3, C=4, C=5, C=6]
Figure 4: Clustering of the Test2 data with different cluster counts (+ cluster means)

[Figure: Cluster Reconstruction Error versus Cluster Count C, one curve for Train Data and one for Test Data]
Figure 5: Cluster reconstruction error versus cluster count for the Train1 and Test1 data sets
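The curves in Figures 5 and 6 plot the cluster reconstruction error against C. A minimal sketch of how such a curve can be computed follows, assuming the usual textbook definition E = sum over instances x_t of min_i ||x_t - m_i||^2 (each instance is "reconstructed" by its nearest mean) and assuming the test error is evaluated with the means fitted on the training set; the report does not state either detail. The names `reconstruction_error`, `error_curve` and the `fit_means` callback are hypothetical.

```python
import numpy as np


def reconstruction_error(X, means):
    """E = sum over instances x_t of min_i ||x_t - m_i||^2 (squared Euclidean)."""
    X = np.asarray(X, dtype=float)
    means = np.asarray(means, dtype=float)
    # (n, k) matrix of squared distances from every instance to every mean
    dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    # each instance contributes its distance to the nearest mean
    return float(dists.min(axis=1).sum())


def error_curve(train, test, fit_means, ks=(2, 3, 4, 5, 6)):
    """Map each cluster count C to (train error, test error).

    fit_means(X, k) is any routine returning a (k, d) array of means, for
    example a k-means implementation; it is called once per C, on the
    training set only, and the same means are reused for the test set.
    """
    curve = {}
    for k in ks:
        means = fit_means(train, k)
        curve[k] = (reconstruction_error(train, means),
                    reconstruction_error(test, means))
    return curve
```

Because the error is a sum rather than a mean, its scale grows with the number of instances, so train and test curves are only directly comparable when the two sets have similar sizes.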

[Figure: Cluster Reconstruction Error versus Cluster Count C, one curve for Train Data and one for Test Data]
Figure 6: Cluster reconstruction error versus cluster count for the Train2 and Test2 data sets
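In the answer above, C was read off Figures 5 and 6 by eye as the point after which the error "changes little". One hypothetical way to make that criterion explicit is a relative-drop threshold: return the smallest C for which moving to the next cluster count lowers the error by less than a fraction `rel_drop` of the current error. Both the rule and the 10% default are assumptions for illustration, not the procedure the report used.

```python
def choose_k(errors, rel_drop=0.1):
    """Smallest cluster count after which the error stops dropping much.

    errors maps cluster count C -> reconstruction error (e.g. on the
    training set). rel_drop is a hypothetical, hand-picked threshold.
    """
    ks = sorted(errors)
    for k, k_next in zip(ks, ks[1:]):
        # improvement gained by moving from k to the next cluster count
        if errors[k] - errors[k_next] < rel_drop * errors[k]:
            return k
    # the curve never flattened within the tested range
    return ks[-1]
```

Since the optimal reconstruction error never increases as C grows, its minimum cannot be used to pick C; the threshold only formalizes where the curve flattens, and a smaller `rel_drop` favours larger C.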
