Topics Covered
Frequent Pattern
Association
Correlation
Support & Confidence
Closed Patterns and Max-Patterns
Apriori Algorithm
FP-Growth
Comparison between Apriori and FP-Growth
Correlation Analysis
7/11/15
Frequent Patterns
Frequent patterns are patterns, such as itemsets,
subsequences or substructures, that appear frequently
in a dataset.
Finding them helps in data classification, clustering and mining
associations, correlations and other interesting
relationships among data.
Frequent pattern mining has become an important data mining task and a
focused theme in data mining research.
Association
Association rules are if/then statements that
help uncover relationships between seemingly
unrelated data in a relational database or other
information repository.
Correlation
A correlation is a mutual relationship or connection between
two or more things.
The main goal is to find interesting correlated
itemsets.
Example
Items in each transaction:
  A, B, C
  A, C
  A, D
  B, D
Min_sup = 50%
Frequent itemset   Support
{A}                75%
{B}                50%
{C}                50%
{D}                50%
{A, C}             50%
For rule A ⇒ C:
support = support({A, C}) = 50%
confidence = support({A, C}) / support({A}) = 50% / 75% = 66.7%
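The support and confidence figures for the rule A ⇒ C can be checked with a few lines of Python. This is a minimal sketch: the helper names `support` and `confidence` are illustrative choices, not part of the slides, and the four transactions are the ones listed in the example.

```python
from fractions import Fraction

# The four transactions of the example.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "D"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

sup_ac = support({"A", "C"}, transactions)
conf_a_c = confidence({"A"}, {"C"}, transactions)
print(f"support(A => C)    = {float(sup_ac):.1%}")
print(f"confidence(A => C) = {float(conf_a_c):.1%}")
```

Exact fractions are used so that the confidence comes out as 2/3 rather than a rounded decimal.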
Example
Exercise: Suppose there are only two transactions
<a1, ..., a100>, <a1, ..., a50>
Let min_sup = 1
What is the set of closed itemsets?
{a1, ..., a100}: 1
{a1, ..., a50}: 2
What is the set of max-patterns?
{a1, ..., a100}: 1
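The exercise cannot be brute-forced at full size (2^100 subsets), but a scaled-down analogue shows the same answer shape. The sketch below assumes two transactions <a1, ..., a4> and <a1, a2>; these sizes and the function name `frequent_itemsets` are illustrative assumptions.

```python
from itertools import combinations

# Scaled-down analogue of the exercise: <a1, ..., a4> and <a1, a2>.
transactions = [
    frozenset({"a1", "a2", "a3", "a4"}),
    frozenset({"a1", "a2"}),
]
min_sup = 1  # absolute support count

def frequent_itemsets(transactions, min_sup):
    """Brute force: count every nonempty subset of the item universe."""
    items = sorted(set().union(*transactions))
    counts = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_sup:
                counts[candidate] = count
    return counts

freq = frequent_itemsets(transactions, min_sup)

# Closed: no proper superset has the same support count.
closed = {
    s: c for s, c in freq.items()
    if not any(s < t and c == freq[t] for t in freq)
}
# Maximal: no proper superset is frequent at all.
maximal = {s: c for s, c in freq.items() if not any(s < t for t in freq)}

for s, c in sorted(closed.items(), key=lambda kv: len(kv[0])):
    print("closed:", sorted(s), c)
for s, c in maximal.items():
    print("max:   ", sorted(s), c)
```

All 15 nonempty subsets are frequent here, yet only two are closed and one is maximal, which mirrors the answer to the exercise.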
Apriori Algorithm
Finding frequent itemsets by candidate
generation
Apriori property:
All nonempty subsets of a frequent itemset must
also be frequent.
Apriori Algorithm
Apriori pruning principle
If any itemset is infrequent, its supersets
should not be generated or tested.
Process
Scan the database once to get the frequent 1-itemsets
For each level k:
Generate length-(k+1) candidates from the length-k frequent
patterns
Scan the database and remove the infrequent candidates
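The candidate-generation step with pruning can be sketched as follows. This is a simplified join (any two frequent k-itemsets whose union has k+1 items) rather than the classic prefix-based join; the function name `apriori_gen` and the input sets are assumptions made for illustration.

```python
from itertools import combinations

def apriori_gen(prev_frequent):
    """Build length-(k+1) candidates from the set of frequent k-itemsets.

    Join step: union two k-itemsets that differ in exactly one item.
    Prune step: drop a candidate if any of its k-subsets is infrequent.
    """
    prev_frequent = set(prev_frequent)
    if not prev_frequent:
        return set()
    k = len(next(iter(prev_frequent)))
    candidates = set()
    for a, b in combinations(prev_frequent, 2):
        union = a | b
        if len(union) != k + 1:
            continue
        if all(frozenset(sub) in prev_frequent
               for sub in combinations(union, k)):
            candidates.add(union)
    return candidates

# Illustrative input (an assumption, not data from the slides).
frequent_2 = {frozenset(p) for p in
              [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]}
candidates_3 = apriori_gen(frequent_2)
print(sorted(sorted(c) for c in candidates_3))
```

In this input {A, B, D} and {B, C, D} are never counted against the database, because {A, D} and {C, D} are not frequent.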
Pseudo-code
Find all large 1-itemsets (L1)
For (k = 2; while Lk-1 is non-empty; k++)
{
    Ck = apriori-gen(Lk-1)
    For each c in Ck, initialise c.count to zero
    For all records r in the DB
    {
        Cr = subset(Ck, r)
        For each c in Cr, c.count++
    }
    Set Lk := all c in Ck whose count >= minsup
}
/* end -- return all of the Lk sets */
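One possible Python rendering of the pseudo-code is shown below. It is a sketch, not a tuned implementation: the function name `apriori`, the use of an absolute `min_count`, and the simplified join are assumptions. The demo data are the four transactions of the support and confidence example, with min_sup = 50% expressed as a count of 2.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Find all large 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)

    k = 2
    while level:
        # Ck = apriori-gen(Lk-1): join, then prune by the Apriori property.
        prev = set(level)
        candidates = set()
        for a, b in combinations(prev, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in prev for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # One pass over the database to count the candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        # Lk := candidates whose count reaches the minimum support count.
        level = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(level)
        k += 1
    return frequent

db = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "D"}]
result = apriori(db, min_count=2)
for itemset, count in sorted(result.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The loop stops at k = 3 because a single frequent 2-itemset cannot be joined with anything.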
Example
Consider a database, D, consisting of 9 transactions.
TID    List of item IDs
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3
Example
Generating 1-itemset Frequent Pattern:
Scan D for the count of each candidate to obtain C1:
Itemset   Sup. count
{I1}      6
{I2}      7
{I3}      6
{I4}      2
{I5}      2
Compare each candidate's support count with the minimum support count (2, the threshold implied by the frequent itemsets of this example) to obtain L1:
Itemset   Sup. count
{I1}      6
{I2}      7
{I3}      6
{I4}      2
{I5}      2
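The scan that produces C1 and L1 amounts to counting item occurrences over D. A minimal sketch, assuming a minimum support count of 2 (the value implied by the frequent itemsets of this example) and illustrative variable names:

```python
from collections import Counter

# Database D from the example (TID -> items).
D = {
    "T100": ["I1", "I2", "I5"],
    "T200": ["I2", "I4"],
    "T300": ["I2", "I3"],
    "T400": ["I1", "I2", "I4"],
    "T500": ["I1", "I3"],
    "T600": ["I2", "I3"],
    "T700": ["I1", "I3"],
    "T800": ["I1", "I2", "I3", "I5"],
    "T900": ["I1", "I2", "I3"],
}
min_count = 2  # assumed minimum support count

# C1: support count of every single item.
c1 = Counter(item for items in D.values() for item in items)
# L1: the candidates that reach the minimum support count.
l1 = {item: n for item, n in c1.items() if n >= min_count}

for item in sorted(c1):
    print(item, c1[item])
```

Every item reaches the threshold, so L1 contains the same five itemsets as C1.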
Example
Generating 2-itemset Frequent Pattern:
Generate C2 candidates from L1:
{I1, I2}, {I1, I3}, {I1, I4}, {I1, I5}, {I2, I3},
{I2, I4}, {I2, I5}, {I3, I4}, {I3, I5}, {I4, I5}
Scan D for the count of each candidate (C2):
Itemset    Sup. count
{I1, I2}   4
{I1, I3}   4
{I1, I4}   1
{I1, I5}   2
{I2, I3}   4
{I2, I4}   2
{I2, I5}   2
{I3, I4}   0
{I3, I5}   1
{I4, I5}   0
Compare each candidate's support count with the minimum support count to obtain L2:
Itemset    Sup. count
{I1, I2}   4
{I1, I3}   4
{I1, I5}   2
{I2, I3}   4
{I2, I4}   2
{I2, I5}   2
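The support counts of the 2-itemsets can be reproduced by enumerating the pairs inside each transaction. This is a shortcut for illustration: it counts pairs that occur rather than matching a pre-built candidate list, which gives the same counts here because every single item is frequent. The minimum support count of 2 is an assumption taken from the example's results.

```python
from collections import Counter
from itertools import combinations

# The nine transactions of D.
D = [
    ["I1", "I2", "I5"],
    ["I2", "I4"],
    ["I2", "I3"],
    ["I1", "I2", "I4"],
    ["I1", "I3"],
    ["I2", "I3"],
    ["I1", "I3"],
    ["I1", "I2", "I3", "I5"],
    ["I1", "I2", "I3"],
]
min_count = 2  # assumed minimum support count

# Count every 2-item subset of every transaction.
pair_counts = Counter(
    pair for items in D for pair in combinations(sorted(items), 2)
)
# L2: pairs that reach the minimum support count.
l2 = {pair: n for pair, n in pair_counts.items() if n >= min_count}

for pair in sorted(l2):
    print(pair, l2[pair])
```

Pairs that never occur together, such as (I3, I4), simply do not appear in `pair_counts`, which corresponds to a support count of 0.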
Example
Generating 3-itemset Frequent Pattern:
Generate C3 candidates from L2:
{I1, I2, I3}, {I1, I2, I5}
Scan D for the count of each candidate (C3):
Itemset        Sup. count
{I1, I2, I3}   2
{I1, I2, I5}   2
Compare each candidate's support count with the minimum support count to obtain L3:
Itemset        Sup. count
{I1, I2, I3}   2
{I1, I2, I5}   2
Example
From the previous example, the frequent itemsets are {{I1}, {I2}, {I3}, {I4},
{I5}, {I1,I2}, {I1,I3}, {I1,I5}, {I2,I3}, {I2,I4}, {I2,I5}, {I1,I2,I3},
{I1,I2,I5}}.
Let's take l = {I1,I2,I5}.
Its nonempty proper subsets are {I1,I2}, {I1,I5}, {I2,I5}, {I1}, {I2}, {I5}.
Each subset s gives a candidate rule s ⇒ (l − s) with confidence = support(l) / support(s).
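Generating rules from l = {I1, I2, I5} can be sketched as below. The support counts are obtained by counting the nine transactions of D; the function name `rules_from` and the 70% confidence threshold are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Support counts obtained by counting the nine transactions of D.
support_count = {
    frozenset({"I1"}): 6,
    frozenset({"I2"}): 7,
    frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4,
    frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I5"}): 2,
    frozenset({"I1", "I2", "I5"}): 2,
}

def rules_from(itemset, support_count, min_conf):
    """Return (antecedent, consequent, confidence) for each strong rule."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for combo in combinations(sorted(itemset), size):
            antecedent = frozenset(combo)
            conf = Fraction(support_count[itemset], support_count[antecedent])
            if conf >= min_conf:
                rules.append((antecedent, itemset - antecedent, conf))
    return rules

min_conf = Fraction(7, 10)  # hypothetical threshold of 70%
rules = rules_from({"I1", "I2", "I5"}, support_count, min_conf)
for ante, cons, conf in rules:
    print(sorted(ante), "=>", sorted(cons), f"{float(conf):.0%}")
```

With this threshold only the three rules whose antecedent contains I5 survive, each with confidence 2/2.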
Bottlenecks of Apriori
Generate a huge number of candidate sets
Repeatedly scan the whole database
Check a large set of candidates by pattern
matching
FP-Growth
FP-growth, or Frequent Pattern Growth, adopts
a divide-and-conquer strategy.
It compresses the database into an FP-tree.
It divides the compressed database into a set of conditional
databases, each associated with one pattern
fragment.
The data set associated with each fragment is
examined separately.
Example
TID    Items
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3
Header table (each entry also holds a node link into the tree):
Item ID   Support count
I2        7
I1        6
I3        6
I4        2
I5        2
FP-tree before any transaction is inserted:
null{}
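Before insertion into the FP-tree, each transaction is reordered to follow the header table. A minimal sketch of that preparation step; the tie-break by item name is an assumption that happens to reproduce the order I2, I1, I3, I4, I5, and the minimum support count of 2 is taken from the example's results.

```python
from collections import Counter

D = {
    "T100": ["I1", "I2", "I5"],
    "T200": ["I2", "I4"],
    "T300": ["I2", "I3"],
    "T400": ["I1", "I2", "I4"],
    "T500": ["I1", "I3"],
    "T600": ["I2", "I3"],
    "T700": ["I1", "I3"],
    "T800": ["I1", "I2", "I3", "I5"],
    "T900": ["I1", "I2", "I3"],
}
min_count = 2  # assumed minimum support count

counts = Counter(item for items in D.values() for item in items)

# Header order: descending support count, ties broken by item name.
header = sorted(
    (item for item in counts if counts[item] >= min_count),
    key=lambda item: (-counts[item], item),
)
rank = {item: pos for pos, item in enumerate(header)}

# Drop infrequent items and sort each transaction in header order.
ordered = {
    tid: sorted((i for i in items if i in rank), key=rank.__getitem__)
    for tid, items in D.items()
}

print(header)
for tid, items in ordered.items():
    print(tid, items)
```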
Example
FP-tree after inserting T100 (ordered as I2, I1, I5):
null{}
  I2:1
    I1:1
      I5:1
Example
FP-tree after inserting T200 (I2, I4):
null{}
  I2:2
    I1:1
      I5:1
    I4:1
Example
FP-tree after inserting T300 (I2, I3):
null{}
  I2:3
    I1:1
      I5:1
    I4:1
    I3:1
Example
FP-tree after inserting T400 (ordered as I2, I1, I4):
null{}
  I2:4
    I1:2
      I5:1
      I4:1
    I4:1
    I3:1
Example
FP-tree after inserting T500 (I1, I3), which starts a new branch at the root:
null{}
  I2:4
    I1:2
      I5:1
      I4:1
    I4:1
    I3:1
  I1:1
    I3:1
Example
FP-tree after inserting T600 (I2, I3):
null{}
  I2:5
    I1:2
      I5:1
      I4:1
    I4:1
    I3:2
  I1:1
    I3:1
Example
FP-tree after inserting T700 (I1, I3):
null{}
  I2:5
    I1:2
      I5:1
      I4:1
    I4:1
    I3:2
  I1:2
    I3:2
Example
FP-tree after inserting T800 (ordered as I2, I1, I3, I5):
null{}
  I2:6
    I1:3
      I5:1
      I4:1
      I3:1
        I5:1
    I4:1
    I3:2
  I1:2
    I3:2
Example
FP-tree after inserting T900 (ordered as I2, I1, I3), the final tree:
null{}
  I2:7
    I1:4
      I5:1
      I4:1
      I3:2
        I5:1
    I4:1
    I3:2
  I1:2
    I3:2
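The step-by-step insertion traced in these slides can be sketched in Python. The `Node` class, `build_fp_tree` and `show` are illustrative names, not part of the slides; the input is the nine transactions already sorted in header order (I2, I1, I3, I4, I5).

```python
class Node:
    """One FP-tree node: an item, its count, and its children by item."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(ordered_transactions):
    """Insert each ordered transaction as a path starting at the root."""
    root = Node()
    node_links = {}  # item -> list of tree nodes carrying that item
    for items in ordered_transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, parent=node)
                node.children[item] = child
                node_links.setdefault(item, []).append(child)
            child.count += 1  # shared prefixes only bump the count
            node = child
    return root, node_links

def show(node, depth=0):
    """Return the tree below `node` as indented 'item:count' lines."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(show(child, depth + 1))
    return lines

ordered = [
    ["I2", "I1", "I5"],
    ["I2", "I4"],
    ["I2", "I3"],
    ["I2", "I1", "I4"],
    ["I1", "I3"],
    ["I2", "I3"],
    ["I1", "I3"],
    ["I2", "I1", "I3", "I5"],
    ["I2", "I1", "I3"],
]
root, node_links = build_fp_tree(ordered)
print("null{}")
print("\n".join("  " + line for line in show(root)))
```

The `node_links` dictionary plays the role of the header table's node links: summing the counts of all nodes for an item gives that item's support count.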
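Example
Branches of I5:
I2, I1, I5: 1
I2, I1, I3, I5: 1
Conditional FP-tree of I5 (I3 is dropped because its count of 1 is below the minimum support count):
null{}
  I2:2
    I1:2
Item   Conditional pattern base           Conditional FP-tree
I5     {(I2, I1: 1), (I2, I1, I3: 1)}     <I2: 2, I1: 2>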
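Example
Branches of I4:
I2, I4: 1
I2, I1, I4: 1
Conditional FP-tree of I4 (I1 is dropped because its count of 1 is below the minimum support count):
null{}
  I2:2
Item   Conditional pattern base   Conditional FP-tree   Frequent pattern generated
I4     {(I2: 1), (I2, I1: 1)}     <I2: 2>               {I2, I4: 2}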
Example
Branches of I3:
I2, I1, I3: 2
I2, I3: 2
I1, I3: 2
Conditional FP-tree of I3:
null{}
  I2:4
    I1:2
  I1:2
Item   Conditional pattern base             Conditional FP-tree
I3     {(I2, I1: 2), (I2: 2), (I1: 2)}      <I2: 4, I1: 2>, <I1: 2>
Example
Branches of I1:
I2, I1: 4
Conditional FP-tree of I1:
null{}
  I2:4
Item   Conditional pattern base   Conditional FP-tree   Frequent pattern generated
I1     {(I2: 4)}                  <I2: 4>               {I2, I1: 4}
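Example
Item   Conditional pattern base            Conditional FP-tree        Frequent patterns generated
I5     {(I2, I1: 1), (I2, I1, I3: 1)}      <I2: 2, I1: 2>             {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4     {(I2: 1), (I2, I1: 1)}              <I2: 2>                    {I2, I4: 2}
I3     {(I2, I1: 2), (I2: 2), (I1: 2)}     <I2: 4, I1: 2>, <I1: 2>    {I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}
I1     {(I2: 4)}                           <I2: 4>                    {I2, I1: 4}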
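The conditional pattern bases in this table can be reproduced without walking the tree, because the prefix path of an item is just the part of each ordered transaction that precedes it. The sketch below takes that shortcut instead of following node links; `pattern_base` and `conditional_counts` are illustrative names, and `conditional_counts` returns per-item totals rather than the branch structure of the conditional FP-tree.

```python
from collections import Counter, defaultdict

# The nine transactions, already sorted in header order (I2, I1, I3, I4, I5).
ordered = [
    ["I2", "I1", "I5"],
    ["I2", "I4"],
    ["I2", "I3"],
    ["I2", "I1", "I4"],
    ["I1", "I3"],
    ["I2", "I3"],
    ["I1", "I3"],
    ["I2", "I1", "I3", "I5"],
    ["I2", "I1", "I3"],
]

# item -> Counter mapping each nonempty prefix path to its count.
pattern_base = defaultdict(Counter)
for items in ordered:
    for pos, item in enumerate(items):
        prefix = tuple(items[:pos])
        if prefix:
            pattern_base[item][prefix] += 1

def conditional_counts(item, min_count):
    """Total count of each prefix item, keeping only the frequent ones."""
    totals = Counter()
    for prefix, n in pattern_base[item].items():
        for p in prefix:
            totals[p] += n
    return {p: n for p, n in totals.items() if n >= min_count}

for item in ["I5", "I4", "I3", "I1"]:
    print(item, dict(pattern_base[item]), conditional_counts(item, 2))
```

For I3 the totals are I2: 4 and I1: 4; the conditional FP-tree splits the I1 total across two branches, which this shortcut does not show.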
Pros of FP-growth
No candidate generation, no candidate test
Use compact data structure
Eliminate repeated database scan
Basic operation is counting and FP-tree building
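Apriori Algorithm vs. FP-growth Algorithm
Technique
  Apriori: generates candidate itemsets level by level and tests them against the database
  FP-growth: constructs conditional pattern bases and conditional FP-trees from the database that satisfy the minimum support
Memory utilization
  Apriori: has to hold a huge number of candidate sets
  FP-growth: uses a compact data structure, the FP-tree
Number of scans
  Apriori: scans the whole database repeatedly
  FP-growth: eliminates repeated database scans
Execution time
  Apriori: spent checking a large set of candidates by pattern matching
  FP-growth: spent on counting and FP-tree building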
Correlation Analysis
Correlation analysis provides an alternative
framework for finding interesting
relationships, or for improving the understanding of the
meaning of some association rules.
Correlation measures
Lift
χ² measure
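Both measures can be computed from four counts: the number of transactions, the count of each itemset, and the count of both together. The sketch below applies the standard definitions to I1 and I2 in the nine-transaction database D; the function names and the choice of I1 and I2 are assumptions made for illustration.

```python
def lift(n, n_a, n_b, n_ab):
    """lift(A, B) = P(A and B) / (P(A) * P(B)), from absolute counts."""
    return (n_ab / n) / ((n_a / n) * (n_b / n))

def chi_square(n, n_a, n_b, n_ab):
    """Pearson chi-square statistic of the 2x2 contingency table of A and B."""
    observed = {
        (True, True): n_ab,
        (True, False): n_a - n_ab,
        (False, True): n_b - n_ab,
        (False, False): n - n_a - n_b + n_ab,
    }
    total = 0.0
    for (has_a, has_b), obs in observed.items():
        row = n_a if has_a else n - n_a
        col = n_b if has_b else n - n_b
        expected = row * col / n  # count expected under independence
        total += (obs - expected) ** 2 / expected
    return total

# Counts for I1 and I2 in the nine-transaction database D.
n, n_i1, n_i2, n_both = 9, 6, 7, 4
print(f"lift(I1, I2) = {lift(n, n_i1, n_i2, n_both):.3f}")
print(f"chi-square   = {chi_square(n, n_i1, n_i2, n_both):.3f}")
```

A lift below 1 indicates that the two itemsets occur together less often than independence would predict, a lift of 1 indicates independence, and a lift above 1 indicates positive correlation.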
Reference
Chapter 6, Data Mining Concepts and
Techniques, Third Edition. By Jiawei Han,
Micheline Kamber and Jian Pei.
Thank You