
Association Rule Mining

Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.

Market-Basket transactions

Example of Association Rules

{Diaper} → {Beer},
{Milk, Bread} → {Eggs, Coke},
{Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!

Itemset
  A collection of one or more items
  Example: {Milk, Bread, Diaper}

k-itemset
  An itemset that contains k items

Support count (σ)
  Frequency of occurrence of an itemset
  E.g. σ({Milk, Bread, Diaper}) = 2

Support
  Fraction of transactions that contain an itemset
  E.g. s({Milk, Bread, Diaper}) = 2/5

Frequent Itemset
  An itemset whose support is greater than or equal to a minsup threshold

Association Rule
  An implication expression of the form X → Y, where X and Y are itemsets
  Example: {Milk, Diaper} → {Beer}

Rule Evaluation Metrics

Support (s)
  Fraction of transactions that contain both X and Y
Confidence (c)
  Measures how often items in Y appear in transactions that contain X

Example: {Milk, Diaper} → {Beer}

s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 = 0.67
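The two metrics can be checked with a few lines of Python. The transaction table itself is not reproduced in this text, so the five baskets below are an assumption, chosen so that they reproduce the counts quoted above (2 for {Milk, Diaper, Beer}, 3 for {Milk, Diaper}, 5 transactions in total). The helper names are illustrative.

```python
# Assumed market-basket data (not given in the text); chosen to match the quoted counts.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(itemset): number of transactions that contain every item."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """sigma(lhs union rhs) / sigma(lhs) for the rule lhs -> rhs."""
    both = set(lhs) | set(rhs)
    return support_count(both, transactions) / support_count(lhs, transactions)

s = support({"Milk", "Diaper", "Beer"}, transactions)
c = confidence({"Milk", "Diaper"}, {"Beer"}, transactions)
print(f"s = {s:.2f}, c = {c:.2f}")
```

Under that assumed table the script reports s = 0.40 and c = 0.67, the values worked out by hand above.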

Given a set of transactions T, the goal of association rule mining is to find all rules having
support ≥ minsup threshold
confidence ≥ minconf threshold

Brute-force approach:
List all possible association rules
Compute the support and confidence for each rule
Prune rules that fail the minsup and minconf thresholds
Computationally prohibitive!
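To see why listing every rule is prohibitive, it helps to count them. The sketch below counts the rules X → Y with X and Y non-empty and disjoint over d items and compares the result with the closed form 3^d − 2^(d+1) + 1 (each item goes to X, to Y, or to neither, minus the assignments that leave X or Y empty). The function name is illustrative.

```python
from itertools import combinations

def count_rules(d):
    """Count rules X -> Y with X and Y non-empty, disjoint subsets of d items."""
    items = range(d)
    total = 0
    for size in range(1, d):                    # size of the left-hand side X
        for x in combinations(items, size):
            remaining = d - len(x)
            # Y can be any non-empty subset of the items not used in X.
            total += 2 ** remaining - 1
    return total

for d in (3, 6, 10):
    closed_form = 3 ** d - 2 ** (d + 1) + 1
    print(d, count_rules(d), closed_form)
```

Six items already give 602 possible rules, and ten items give 57002.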

Example of Rules:

{Milk, Diaper} → {Beer} (s=0.4, c=0.67)
{Milk, Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk, Beer} (s=0.4, c=0.5)
{Milk} → {Diaper, Beer} (s=0.4, c=0.5)

Observations:
All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}
Rules originating from the same itemset have identical support but can have different confidence
Thus, we may decouple the support and confidence requirements
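Every rule drawn from one itemset splits that itemset into a non-empty left-hand side and a non-empty right-hand side, so a k-itemset yields 2^k − 2 candidate rules. A minimal sketch of that enumeration (the function name is illustrative):

```python
from itertools import combinations

def binary_partitions(itemset):
    """Yield (lhs, rhs) pairs that split the itemset into two non-empty parts."""
    items = sorted(itemset)
    for size in range(1, len(items)):
        for lhs in combinations(items, size):
            rhs = tuple(i for i in items if i not in lhs)
            yield lhs, rhs

rules = list(binary_partitions({"Milk", "Diaper", "Beer"}))
for lhs, rhs in rules:
    print("{" + ", ".join(lhs) + "} -> {" + ", ".join(rhs) + "}")
print(len(rules), "rules")   # 2**3 - 2
```

All six rules share the support of {Milk, Diaper, Beer}, so once the itemset is known to be frequent only the confidence of each split remains to be checked.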

Two-step approach:
1. Frequent Itemset Generation
Generate all itemsets whose support ≥ minsup

2. Rule Generation
Generate high confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset

Frequent itemset generation is still computationally expensive

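Brute-force approach:
Each itemset in the lattice is a candidate frequent itemset
Count the support of each candidate by scanning the database
Match each transaction against every candidate
Complexity ~ O(NMw), where N is the number of transactions, M the number of candidate itemsets and w the maximum transaction width => expensive, since M = 2^d for d items!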

Transaction ID   Items
100              Bread, Cheese
200              Bread, Cheese, Juice
300              Bread, Milk
400              Cheese, Juice, Milk

Find out all possible combinations

Itemsets                        Frequency
Bread                           3
Cheese                          3
Juice                           2
Milk                            2
(Bread, Cheese)                 2
(Bread, Juice)                  1
(Bread, Milk)                   1
(Cheese, Juice)                 2
(Cheese, Milk)                  1
(Juice, Milk)                   1
(Bread, Cheese, Juice)          1
(Bread, Cheese, Milk)           0
(Bread, Juice, Milk)            0
(Cheese, Juice, Milk)           1
(Bread, Cheese, Juice, Milk)    0
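The brute-force enumeration for these four transactions can be sketched as follows: every non-empty subset of the four items is a candidate (2^4 − 1 = 15 of them), and each candidate is matched against every transaction. Variable names are illustrative.

```python
from itertools import combinations

transactions = {
    100: {"Bread", "Cheese"},
    200: {"Bread", "Cheese", "Juice"},
    300: {"Bread", "Milk"},
    400: {"Cheese", "Juice", "Milk"},
}
items = sorted(set().union(*transactions.values()))

# Every non-empty subset of the items is a candidate itemset.
candidates = [
    frozenset(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
]

# Match each candidate against every transaction.
frequency = {
    c: sum(1 for t in transactions.values() if c <= t)
    for c in candidates
}

for c in candidates:
    print(sorted(c), frequency[c])
```

With four items this is harmless, but the candidate list doubles with every additional item.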

Minimum support 50% (at least 2 of the 4 transactions)
Minimum confidence 75%

Itemsets         Frequency
Bread            3
Cheese           3
Juice            2
Milk             2
Bread, Cheese    2
Cheese, Juice    2

Bread → Cheese with confidence of 2/3 = 67%
Cheese → Bread with confidence of 2/3 = 67%
Cheese → Juice with confidence of 2/3 = 67%
Juice → Cheese with confidence of 2/2 = 100%
Rules whose confidence is at least the user-specified minimum confidence are called confident. With a minimum confidence of 75%, only Juice → Cheese is confident here.
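A sketch of the rule-generation step for this example, assuming the frequencies of the frequent itemsets have already been counted from the four transactions (Bread 3, Cheese 3, Juice 2, Milk 2, {Bread, Cheese} 2, {Cheese, Juice} 2). Each frequent 2-itemset is split both ways, and a rule is kept only if its confidence reaches 75%. Variable names are illustrative.

```python
MIN_CONF = 0.75

# Frequencies counted from the four transactions (minimum support 50% = 2 of 4).
frequency = {
    frozenset({"Bread"}): 3,
    frozenset({"Cheese"}): 3,
    frozenset({"Juice"}): 2,
    frozenset({"Milk"}): 2,
    frozenset({"Bread", "Cheese"}): 2,
    frozenset({"Cheese", "Juice"}): 2,
}

rules = []
for itemset, count in frequency.items():
    if len(itemset) != 2:
        continue                      # only 2-itemsets are frequent beyond single items
    for lhs_item in sorted(itemset):
        lhs = frozenset({lhs_item})
        (rhs_item,) = itemset - lhs   # the single remaining item
        rules.append((lhs_item, rhs_item, count / frequency[lhs]))

for lhs_item, rhs_item, conf in rules:
    status = "confident" if conf >= MIN_CONF else "rejected"
    print(f"{lhs_item} -> {rhs_item}: {conf:.0%} ({status})")

confident = [(l, r) for l, r, conf in rules if conf >= MIN_CONF]
```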

Find out all possible combinations

Transaction ID   Items                  Combinations
100              Bread, Cheese          {Bread, Cheese}
200              Bread, Cheese, Juice   {Bread, Cheese}, {Bread, Juice}, {Cheese, Juice}, {Bread, Cheese, Juice}
300              Bread, Milk            {Bread, Milk}
400              Cheese, Juice, Milk    {Cheese, Juice}, {Cheese, Milk}, {Juice, Milk}, {Cheese, Juice, Milk}
Find out all possible combinations with non-zero frequency

Itemsets                  Frequency
Bread                     3
Cheese                    3
Juice                     2
Milk                      2
(Bread, Cheese)           2
(Bread, Juice)            1
(Bread, Milk)             1
(Cheese, Juice)           2
(Cheese, Milk)            1
(Juice, Milk)             1
(Bread, Cheese, Juice)    1
(Cheese, Juice, Milk)     1
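Listing only the combinations that actually occur can be done by expanding each transaction into its subsets and tallying them, so itemsets with zero frequency such as (Bread, Cheese, Milk) are never generated. A minimal sketch using collections.Counter (variable names are illustrative):

```python
from collections import Counter
from itertools import combinations

transactions = [
    ["Bread", "Cheese"],
    ["Bread", "Cheese", "Juice"],
    ["Bread", "Milk"],
    ["Cheese", "Juice", "Milk"],
]

frequency = Counter()
for t in transactions:
    items = sorted(t)
    # Every non-empty subset of a transaction occurs at least once.
    for k in range(1, len(items) + 1):
        frequency.update(combinations(items, k))

for itemset, count in sorted(frequency.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, count)
```

The work per transaction now grows with 2^w for a transaction of width w rather than with 2^d over all items, which only pays off when transactions are short.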

Method:
Let k=1
Generate frequent itemsets of length 1
Repeat until no new frequent itemsets are identified
  Generate length (k+1) candidate itemsets from length k frequent itemsets
  Prune candidate itemsets containing subsets of length k that are infrequent
  Count the support of each candidate by scanning the DB
  Eliminate candidates that are infrequent, leaving only those that are frequent
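The loop above can be sketched in Python as follows. This is a simplified reading of the method, not a tuned implementation: candidates are produced by uniting pairs of frequent k-itemsets, support is counted by a plain scan, and the function and parameter names are illustrative. It is run on the four-transaction Bread/Cheese/Juice/Milk example with a minimum support count of 2.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count_frequent(candidates):
        # Scan the DB once per level and keep only the frequent candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    # k = 1: frequent itemsets of length 1
    items = {i for t in transactions for i in t}
    current = count_frequent({frozenset({i}) for i in items})
    frequent = dict(current)
    k = 1
    while current:
        # Generate length-(k+1) candidates from length-k frequent itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Prune candidates that contain an infrequent length-k subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k))
        }
        current = count_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent

db = [
    {"Bread", "Cheese"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk"},
    {"Cheese", "Juice", "Milk"},
]
result = apriori(db, min_count=2)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

On this data the only 3-item candidate, {Bread, Cheese, Juice}, is pruned without a database scan because its subset {Bread, Juice} is not frequent.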

Candidate counting:

Scan the database of transactions to determine the support of each candidate itemset
To reduce the number of comparisons, store the candidates in a hash structure

Instead of matching each transaction against every candidate, match it against candidates contained in the hashed buckets
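One way to picture the idea is sketched below, with a plain Python dict standing in for the hash structure. This is a simplification rather than a full hash tree: the k-item subsets of each transaction are looked up directly instead of comparing the transaction with every candidate. The function name and the sample candidates are illustrative.

```python
from itertools import combinations

def count_with_hash(transactions, candidates, k):
    """Count support of k-item candidates via hash lookups of transaction subsets."""
    counts = {tuple(sorted(c)): 0 for c in candidates}   # hash table of candidates
    for t in transactions:
        # Only the k-subsets of this transaction are looked up.
        for subset in combinations(sorted(t), k):
            if subset in counts:
                counts[subset] += 1
    return counts

db = [
    {"Bread", "Cheese"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk"},
    {"Cheese", "Juice", "Milk"},
]
candidates = [{"Bread", "Cheese"}, {"Cheese", "Juice"}, {"Juice", "Milk"}]
counts = count_with_hash(db, candidates, k=2)
print(counts)
```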

Transaction ID   Items
100              Bread, Cheese, Eggs, Juice
200              Bread, Cheese, Juice
300              Bread, Milk, Yogurt
400              Bread, Juice, Milk
500              Cheese, Juice, Milk

50% support (an itemset must appear in at least 3 of the 5 transactions)

Frequent items

Item      Frequency
Bread     4
Cheese    3
Juice     4
Milk      3

Candidate 2-itemsets

Itemsets           Frequency
(Bread, Cheese)    2
(Bread, Juice)     3
(Bread, Milk)      2
(Cheese, Juice)    3
(Cheese, Milk)     1
(Juice, Milk)      2

Frequent 2-itemsets

Item             Frequency
Bread, Juice     3
Cheese, Juice    3

Bread → Juice with confidence of 3/4 = 75%
Juice → Bread with confidence of 3/4 = 75%
Cheese → Juice with confidence of 3/3 = 100%
Juice → Cheese with confidence of 3/4 = 75%

Item Number   Item Name
1             Biscuits
2             Bread
3             Cereal
4             Cheese
5             Chocolate
6             Coffee
7             Donuts
8             Eggs
9             Juice
10            Milk
11            Newspaper
12            Pastry
13            Rolls
14            Sugar
15            Tea
16            Yogurt

TID   Items
1     Biscuits, Bread, Cheese, Coffee, Yogurt
2     Bread, Cereal, Cheese, Coffee
3     Cheese, Chocolate, Donuts, Juice, Milk
4     Bread, Cheese, Coffee, Cereal, Juice
5     Bread, Cereal, Chocolate, Donuts, Juice
6     Milk, Tea
7     Biscuits, Bread, Cheese, Coffee, Milk
8     Eggs, Milk, Tea
9     Bread, Cereal, Cheese, Chocolate, Coffee
10    Bread, Cereal, Chocolate, Donuts, Juice
11    Bread, Cheese, Juice
12    Bread, Cheese, Coffee, Donuts, Juice
13    Biscuits, Bread, Cereal
14    Cereal, Cheese, Chocolate, Donuts, Juice
15    Chocolate, Coffee
16    Donuts
17    Donuts, Eggs, Juice
18    Biscuits, Bread, Cheese, Coffee
19    Bread, Cereal, Chocolate, Donuts, Juice
20    Cheese, Chocolate, Donuts, Juice
21    Milk, Tea, Yogurt
22    Bread, Cereal, Cheese, Coffee
23    Chocolate, Donuts, Juice, Milk, Newspaper
24    Newspaper, Pastry, Rolls
25    Rolls, Sugar, Tea


25% support (an itemset must appear in at least 7 of the 25 transactions)

Frequency count for all items

Item Number   Item Name    Frequency
1             Biscuits     4
2             Bread        13
3             Cereal       10
4             Cheese       11
5             Chocolate    9
6             Coffee       9
7             Donuts       10
8             Eggs         2
9             Juice        11
10            Milk         6
11            Newspaper    2
12            Pastry       1
13            Rolls        2
14            Sugar        1
15            Tea          4
16            Yogurt       2

Frequent items

Item         Frequency
Bread        13
Cereal       10
Cheese       11
Chocolate    9
Coffee       9
Donuts       10
Juice        11

Candidate 2-itemsets

{Bread, Cereal}
{Bread, Cheese}
{Bread, Chocolate}
{Bread, Coffee}
{Bread, Donuts}
{Bread, Juice}
{Cereal, Cheese}
{Cereal, Chocolate}
{Cereal, Coffee}
{Cereal, Donuts}
{Cereal, Juice}
{Cheese, Chocolate}
{Cheese, Coffee}
{Cheese, Donuts}
{Cheese, Juice}
{Chocolate, Coffee}
{Chocolate, Donuts}
{Chocolate, Juice}
{Coffee, Donuts}
{Coffee, Juice}
{Donuts, Juice}

Frequent 2-itemsets

{Bread, Cereal}
{Bread, Cheese}
{Bread, Coffee}
{Cheese, Coffee}
{Chocolate, Donuts}
{Chocolate, Juice}
{Donuts, Juice}

Candidate 3-itemsets

{Bread, Cereal, Cheese}
{Bread, Cereal, Coffee}
{Bread, Cheese, Coffee}
{Chocolate, Donuts, Juice} (frequency 7)

Frequent 3-itemsets, or L3

{Bread, Cheese, Coffee}
{Chocolate, Donuts, Juice}

Confidence of association rules from {Chocolate, Donuts, Juice}
(N = Chocolate, M = Donuts, P = Juice)

Rule      Support of NMP   Frequency of LHS   Confidence
N → MP    7                9                  0.78
M → NP    7                10                 0.70
P → NM    7                11                 0.64
MP → N    7                9                  0.78
NP → M    7                7                  1.0
NM → P    7                7                  1.0
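The confidences for {Chocolate, Donuts, Juice} can be recomputed from the 25 transactions. The sketch below counts the frequency of the itemset and of each left-hand side, then divides; the helper names are illustrative.

```python
from itertools import combinations

transactions = [
    {"Biscuits", "Bread", "Cheese", "Coffee", "Yogurt"},       # TID 1
    {"Bread", "Cereal", "Cheese", "Coffee"},                   # 2
    {"Cheese", "Chocolate", "Donuts", "Juice", "Milk"},        # 3
    {"Bread", "Cheese", "Coffee", "Cereal", "Juice"},          # 4
    {"Bread", "Cereal", "Chocolate", "Donuts", "Juice"},       # 5
    {"Milk", "Tea"},                                           # 6
    {"Biscuits", "Bread", "Cheese", "Coffee", "Milk"},         # 7
    {"Eggs", "Milk", "Tea"},                                   # 8
    {"Bread", "Cereal", "Cheese", "Chocolate", "Coffee"},      # 9
    {"Bread", "Cereal", "Chocolate", "Donuts", "Juice"},       # 10
    {"Bread", "Cheese", "Juice"},                              # 11
    {"Bread", "Cheese", "Coffee", "Donuts", "Juice"},          # 12
    {"Biscuits", "Bread", "Cereal"},                           # 13
    {"Cereal", "Cheese", "Chocolate", "Donuts", "Juice"},      # 14
    {"Chocolate", "Coffee"},                                   # 15
    {"Donuts"},                                                # 16
    {"Donuts", "Eggs", "Juice"},                               # 17
    {"Biscuits", "Bread", "Cheese", "Coffee"},                 # 18
    {"Bread", "Cereal", "Chocolate", "Donuts", "Juice"},       # 19
    {"Cheese", "Chocolate", "Donuts", "Juice"},                # 20
    {"Milk", "Tea", "Yogurt"},                                 # 21
    {"Bread", "Cereal", "Cheese", "Coffee"},                   # 22
    {"Chocolate", "Donuts", "Juice", "Milk", "Newspaper"},     # 23
    {"Newspaper", "Pastry", "Rolls"},                          # 24
    {"Rolls", "Sugar", "Tea"},                                 # 25
]

def freq(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

itemset = ("Chocolate", "Donuts", "Juice")
itemset_freq = freq(itemset)

confidence = {}
for size in (1, 2):
    for lhs in combinations(itemset, size):
        rhs = tuple(i for i in itemset if i not in lhs)
        confidence[(lhs, rhs)] = itemset_freq / freq(lhs)
        print(f"{', '.join(lhs)} -> {', '.join(rhs)}: "
              f"{itemset_freq}/{freq(lhs)} = {confidence[(lhs, rhs)]:.2f}")
```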

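Confidence of association rules from {Bread, Cheese, Coffee}
(B = Bread, C = Cheese, D = Coffee)

Rule      Support of BCD   Frequency of LHS   Confidence
B → CD    8                13                 0.62
C → BD    8                11                 0.73
D → BC    8                9                  0.89
CD → B    8                9                  0.89
BD → C    8                8                  1.0
BC → D    8                8                  1.0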

Cheese → Bread
Cheese → Coffee
Coffee → Bread
Coffee → Cheese
Cheese, Coffee → Bread
Bread, Coffee → Cheese
Bread, Cheese → Coffee
Chocolate → Donuts
Chocolate → Juice
Donuts → Chocolate
Donuts → Juice
Donuts, Juice → Chocolate
Chocolate, Juice → Donuts
Chocolate, Donuts → Juice
Bread → Cereal
Cereal → Bread
