Adama Science and Technology University
School of Electrical Engineering and Computing
Department of Computer Science and Engineering
1/5/2020 2
Grouping Unlabeled Items
Using K-means Clustering
Clustering in Machine Learning
Clustering is the assignment of a set of observations into
subsets (clusters) so that observations in the same cluster are
similar in some sense.
K-means Clustering
Step 2: Make any initial partition that classifies the data into k
clusters.
You may assign the training samples randomly, or
systematically as follows:
Take the first k training samples as single-element clusters.
Assign each of the remaining (N-k) training samples to the cluster
with the nearest centroid. After each assignment, recompute the
centroid of the gaining cluster.
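The systematic initialization described above can be sketched in Python; this is a minimal illustration (the function name and the toy data are mine, not from the slides):

```python
import numpy as np

def initial_partition(X, k):
    """Take the first k samples as single-element clusters; assign each
    remaining sample to the cluster with the nearest centroid, updating
    the gaining cluster's centroid after every assignment."""
    centroids = X[:k].astype(float).copy()
    counts = np.ones(k)
    labels = np.zeros(len(X), dtype=int)
    labels[:k] = np.arange(k)
    for i in range(k, len(X)):
        d = np.linalg.norm(centroids - X[i], axis=1)
        j = int(np.argmin(d))
        labels[i] = j
        # incremental (running-mean) update of the gaining centroid
        counts[j] += 1
        centroids[j] += (X[i] - centroids[j]) / counts[j]
    return labels, centroids

X = np.array([[1.0, 1.0], [5.0, 4.0], [1.2, 0.8], [4.8, 4.2]])
labels, centroids = initial_partition(X, k=2)
```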
K-means Algorithm
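The full K-means loop, an assignment step followed by a centroid-update step repeated until the assignments stop changing, can be sketched as follows. This is a minimal illustration assuming Euclidean distance and random initialization; the toy data are mine:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-means (Lloyd's iteration): alternate between assigning
    points to the nearest centroid and recomputing centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(n_iter):
        # assignment step: label each point with its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # converged: assignments no longer change
        labels = new_labels
        # update step: each centroid becomes the mean of its points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(X, k=2)
```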
K-means Clustering (Example)
5. Iteration 1, objects-centroids
distances: The next step is to
compute the distance of all
objects to the new centroids.
Similar to step 2, the
distance matrix at iteration 1 is:
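The objects-to-centroids distance matrix can be computed in one vectorized step. The coordinates below are hypothetical stand-ins, since the example's actual values appear in the slide figures:

```python
import numpy as np

# hypothetical object and centroid coordinates, just to show the computation
objects = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)
centroids = np.array([[1, 1], [2, 1]], dtype=float)

# D[i, j] = Euclidean distance from centroid i to object j
D = np.linalg.norm(centroids[:, None, :] - objects[None, :, :], axis=2)
```

Each object is then assigned to the centroid of the column-wise minimum, i.e. `D.argmin(axis=0)`.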
K-means Clustering (Example)
Group 1 and Group 2 each have two members; thus the new
centroids are C1 =
and C2 =
K-means Clustering (Example)
8. Iteration 2, objects-
centroids distances: Repeating
step 2, the new
distance matrix at iteration 2 is:
K-Means: the Syntax
Fit the instance on the data and then predict clusters for new
data.
kmeans = kmeans.fit(X1)
y_predict = kmeans.predict(X2)
Can also be used in batch mode with MiniBatchKMeans.
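Assuming scikit-learn, the full workflow looks like this (the variable names and toy data are illustrative):

```python
# Assumes scikit-learn is installed; X1 is any (n_samples, n_features) array.
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans

X1 = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [8.5, 9.0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans = kmeans.fit(X1)           # learn the centroids from the data
y_predict = kmeans.predict(X1)    # cluster label for each sample

# the same interface in batch mode
mbk = MiniBatchKMeans(n_clusters=2, n_init=10, random_state=0).fit(X1)
```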
Distance Metrics
Euclidean distance:
Manhattan distance:
Cosine distance:
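The three metrics above can be written directly from their definitions. A minimal sketch (the toy vectors are mine); note that parallel vectors have cosine distance 0 even when their Euclidean distance is large:

```python
import numpy as np

def euclidean(a, b):
    """Straight-line distance: sqrt of the sum of squared differences."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

def manhattan(a, b):
    """City-block distance: sum of absolute coordinate differences."""
    return float(np.sum(np.abs(a - b)))

def cosine_distance(a, b):
    """1 minus the cosine of the angle between the two vectors."""
    return float(1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([3.0, 4.0])
b = np.array([6.0, 8.0])  # same direction as a, twice as long
```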
Euclidean vs. Cosine Distance
Jaccard distance:
Applies to sets (like word occurrence)
Sentence A: “I like chocolate ice cream.”
Set A = {I, like, chocolate, ice, cream}
Sentence B: “Do I want chocolate cream or vanilla cream?”
Set B = {Do, I, want, chocolate, cream, or, vanilla}
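For these two sets the Jaccard distance works out to 1 minus the ratio of shared words to total distinct words:

```python
def jaccard_distance(s1, s2):
    """Jaccard distance = 1 - |A ∩ B| / |A ∪ B|."""
    s1, s2 = set(s1), set(s2)
    return 1 - len(s1 & s2) / len(s1 | s2)

A = {"I", "like", "chocolate", "ice", "cream"}
B = {"Do", "I", "want", "chocolate", "cream", "or", "vanilla"}

# shared words: {I, chocolate, cream} -> 3; distinct words in total: 9
d = jaccard_distance(A, B)  # 1 - 3/9 = 2/3
```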
Distance Metrics: the Syntax
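One common way to compute all of these metrics, assuming scikit-learn, is `pairwise_distances`; the toy points below are mine:

```python
import numpy as np
from sklearn.metrics import pairwise_distances

X = np.array([[1.0, 1.0], [4.0, 5.0]])

# D[i, j] = distance between row i and row j of X, under the chosen metric
D_euc = pairwise_distances(X, metric='euclidean')
D_man = pairwise_distances(X, metric='manhattan')
D_cos = pairwise_distances(X, metric='cosine')
```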
Other Types of Clustering
Association Analysis with the
Apriori Algorithm
Mining Association Rules
What could be a rule, and what kinds of rules are we looking for?
An example of an association rule could be:
computer => financial_management_software
What is support?
Support (frequency) is simply the probability that a randomly
chosen transaction t contains both items A and B.
Notation and Basic Concepts
What is confidence?
Confidence (accuracy) is simply the probability that itemset B
is purchased in a randomly chosen transaction t, given that
itemset A is purchased.
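Both definitions can be made concrete on a toy transaction database (the transactions below are hypothetical, chosen only to illustrate the computation):

```python
# hypothetical toy transaction database
transactions = [
    {"computer", "financial_management_software"},
    {"computer", "financial_management_software", "printer"},
    {"computer", "printer"},
    {"printer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(A, B, transactions):
    """P(t contains B | t contains A) = support(A ∪ B) / support(A)."""
    return support(A | B, transactions) / support(A, transactions)

s = support({"computer", "financial_management_software"}, transactions)
c = confidence({"computer"}, {"financial_management_software"}, transactions)
```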
Apriori Algorithm
If the minimum confidence was 75%, only the second, third, and
sixth rules would be considered strong and thus output.
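The frequent-itemset phase of Apriori can be sketched as follows. This is a simplified illustration (the subset-pruning optimization of full Apriori candidate generation is omitted, and the toy transactions are mine):

```python
def apriori(transactions, min_support):
    """Grow candidate itemsets level by level, keeping only those whose
    support (fraction of transactions containing them) meets the threshold."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    # L1: frequent single items
    current = [frozenset([i]) for i in items
               if sum(i in t for t in transactions) / n >= min_support]
    frequent = list(current)
    k = 2
    while current:
        # candidates: unions of frequent (k-1)-itemsets that have size k
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        current = [c for c in candidates
                   if sum(c <= t for t in transactions) / n >= min_support]
        frequent.extend(current)
        k += 1
    return frequent

transactions = [{"beer", "diaper"}, {"beer", "diaper", "milk"},
                {"beer", "milk"}, {"milk"}]
frequent = apriori(transactions, min_support=0.5)
```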
Efficiently Finding Frequent Items with FP-growth
Disadvantage of Apriori
FP-growth
Step 1:
First, we count all the items across all the transactions:
TMario = [beer: 5, bread: 2, butter: 3, milk: 3, cheese: 3,
diapers: 1]
Step 3:
Now we sort the list according to the count of each item:
TMario_sorted = [beer: 5, butter: 3, milk: 3, cheese: 3, bread: 2]
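Steps 1 and 3 can be sketched as follows. The transactions are hypothetical, chosen only so the counts match the example, and a minimum count of 2 is assumed (which is why diapers drops out of the sorted list):

```python
from collections import Counter

# hypothetical transactions consistent with the counts above
transactions = [
    ["beer", "bread", "butter"],
    ["beer", "milk", "cheese"],
    ["beer", "butter", "milk", "diapers"],
    ["beer", "bread", "cheese"],
    ["beer", "butter", "milk", "cheese"],
]

# Step 1: count every item across all transactions
counts = Counter(item for t in transactions for item in t)

# Step 3: drop items below the assumed minimum count (2),
# then sort the rest by descending count
min_count = 2
sorted_items = sorted(((i, c) for i, c in counts.items() if c >= min_count),
                      key=lambda ic: -ic[1])
```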
Step 5:
We go through every branch of the tree and include in the
association only the nodes whose count passes the threshold.
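The FP-tree that step 5 walks can be sketched as a prefix tree with counts: each transaction's frequent items, sorted by the global frequency order, are inserted along one branch, incrementing counts on shared prefixes. A minimal sketch (the node structure and toy data are mine):

```python
class Node:
    """FP-tree node: an item, its count, and links to parent/children."""
    def __init__(self, item, parent=None):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}

def build_fp_tree(transactions, order):
    """Insert each transaction (items sorted by the global frequency
    order) into a prefix tree, incrementing counts on shared prefixes."""
    root = Node(None)
    for t in transactions:
        node = root
        for item in sorted(t, key=order.index):
            if item in node.children:
                node.children[item].count += 1
            else:
                node.children[item] = Node(item, parent=node)
            node = node.children[item]
    return root

# frequent items only, in descending-count order
order = ["beer", "butter", "milk", "cheese", "bread"]
transactions = [
    ["beer", "bread", "butter"],
    ["beer", "milk", "cheese"],
    ["beer", "butter", "milk"],
    ["beer", "bread", "cheese"],
    ["beer", "butter", "milk", "cheese"],
]
tree = build_fp_tree(transactions, order)
```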
Apriori vs. FP-growth
Question & Answer