DUVVADA, VISAKHAPATNAM.
PRESENTED BY :
P.BHAKTHAVATHSALANAIDU K.K.S.SWAROOP
pbhaktavatsala@gmail.com
karey_swaroop@yahoo.co.in
1) In the first iteration of the algorithm, each item is a member of the set of candidate
1-itemsets, C1. The algorithm simply scans all of the transactions in order to count the
number of occurrences of each item.
2) Suppose that the minimum transaction support count required is 2. The set of frequent
1-itemsets, L1, can then be determined. It consists of the candidate 1-itemsets satisfying
minimum support.
3) To discover the set of frequent 2-itemsets, L2, the algorithm uses the join L1 ⋈ L1 to
generate a candidate set of 2-itemsets, C2.
The algorithm continues in this way, generating the candidate set Ck from L(k-1) at each step,
until a candidate set is empty.
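The steps above can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: the function name `apriori`, the demo transactions, and the subset-based prune step are assumptions added for the example and do not come from the text.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # C1: every item is a candidate 1-itemset; one scan counts occurrences.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}  # L1
    frequent = dict(level)
    k = 2
    while level:  # stops once no candidate of size k is frequent
        prev = list(level)
        # Join step: L(k-1) joined with itself, keeping unions of size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
        # Scan the transactions to count each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        level = {s: c for s, c in counts.items() if c >= min_count}  # Lk
        frequent.update(level)
        k += 1
    return frequent

# Hypothetical transactions, used only to exercise the sketch.
demo = [{"bread", "milk"}, {"bread", "butter"},
        {"bread", "milk", "butter"}, {"milk"}]
frequent = apriori(demo, min_count=2)
```

With a minimum support count of 2, the candidate {milk, butter} is counted once and dropped, so the only candidate 3-itemset is pruned and the loop ends.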
The mining of the FP-tree proceeds as follows. Start from each frequent length-1 pattern (as an
initial suffix pattern), construct its conditional pattern base (a sub-database which consists of
the set of prefix paths in the FP-tree co-occurring with the suffix pattern), then construct its
(conditional) FP-tree and perform mining recursively on such a tree. The pattern growth is
achieved by concatenating the suffix pattern with the frequent patterns generated from its
conditional FP-tree.
Let us first consider I5, which is the last item in L, rather than the first. The reasoning
behind this will become apparent as we explain the FP-tree mining process. I5 occurs in two
branches of the FP-tree. The paths formed by these branches are <(I2, I1, I5:1)> and <(I2, I1,
I3, I5:1)>. Therefore, considering I5 as a suffix, its corresponding two prefix paths are
<(I2, I1:1)> and <(I2, I1, I3:1)>, which form its conditional pattern base. Its conditional
FP-tree contains only a single path, <(I2:2, I1:2)>; I3 is not included because its support count
of 1 is less than the minimum support count. The single path generates all the combinations of
frequent patterns: I2 I5:2, I1 I5:2, I2 I1 I5:2.
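The I5 case can be reproduced with a short Python sketch. It starts from the two prefix paths given above and assumes, as the text states for I5, that the conditional FP-tree is a single path; the function name `single_path_patterns` is hypothetical, and the sketch does not cover the general multi-branch case.

```python
from itertools import combinations

def single_path_patterns(pattern_base, suffix, min_count):
    """Patterns for one suffix item whose conditional FP-tree is a single path."""
    # Accumulate the support count of each item over all prefix paths.
    counts = {}
    for path, count in pattern_base:
        for item in path:
            counts[item] = counts.get(item, 0) + count
    # Drop infrequent items; what remains is the single path of the tree.
    path = [(item, c) for item, c in counts.items() if c >= min_count]
    # Every non-empty combination of path nodes, plus the suffix, is a
    # frequent pattern; its support is the smallest count in the combination.
    patterns = {}
    for r in range(1, len(path) + 1):
        for combo in combinations(path, r):
            items = tuple(item for item, _ in combo) + (suffix,)
            patterns[items] = min(c for _, c in combo)
    return patterns

# Conditional pattern base of I5, taken from the text.
i5_base = [(("I2", "I1"), 1), (("I2", "I1", "I3"), 1)]
patterns = single_path_patterns(i5_base, "I5", min_count=2)
```

Here I3 accumulates a count of 1 and is dropped, leaving the path (I2:2, I1:2) and the three patterns listed in the text.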
In the same way, find the frequent itemsets for all the other items. The FP-growth method
transforms the problem of finding long frequent patterns into looking for shorter ones recursively
and then concatenating the suffix. It uses the least frequent items as suffixes, offering good
selectivity. The method substantially reduces the search costs.
PROBLEM DEFINITION:
An itemset X is contained in transaction <tid, Y> if X ⊆ Y. Given a transaction database
TDB, the support of an itemset X, denoted as sup(X), is the number of transactions in TDB
which contain X. An association rule R: X ⇒ Y is an implication between two itemsets X and Y
where X, Y ⊂ I and X ∩ Y = ∅. The support of the rule, denoted as sup(X ⇒ Y), is defined as
sup(X ∪ Y). The confidence of the rule, denoted as conf(X ⇒ Y), is defined as sup(X ∪ Y)/sup(X).
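As a small worked example of these definitions, the sketch below computes sup and conf directly from a list of <tid, Y> pairs. The four transactions and the function names `sup`, `rule_support`, and `conf` are hypothetical and serve only to illustrate the formulas.

```python
# Hypothetical transaction database: a list of (tid, itemset) pairs.
tdb = [
    (1, {"bread", "milk"}),
    (2, {"bread", "butter"}),
    (3, {"bread", "milk", "butter"}),
    (4, {"milk"}),
]

def sup(x, tdb):
    """Number of transactions <tid, Y> with X contained in Y."""
    return sum(1 for _tid, y in tdb if set(x) <= y)

def rule_support(x, y, tdb):
    """sup(X => Y) = sup(X u Y)."""
    return sup(set(x) | set(y), tdb)

def conf(x, y, tdb):
    """conf(X => Y) = sup(X u Y) / sup(X); assumes sup(X) > 0."""
    return rule_support(x, y, tdb) / sup(x, tdb)

bread_sup = sup({"bread"}, tdb)                       # 3 transactions
rule_sup = rule_support({"bread"}, {"milk"}, tdb)     # 2 transactions
rule_conf = conf({"bread"}, {"milk"}, tdb)            # 2 / 3
```

For the rule bread ⇒ milk, two of the three transactions containing bread also contain milk, so the support is 2 and the confidence is 2/3.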
The requirement of mining the complete set of association rules leads to two problems:
1) There may exist a large number of frequent itemsets in a transaction database, especially
when the support threshold is low.
2) There may exist a huge number of association rules. It is hard for users to comprehend
and manipulate a huge number of rules.
An interesting alternative is to mine only the frequent closed itemsets and their
corresponding association rules.
Frequent closed itemset: An itemset X is a closed itemset if there exists no itemset X' such that
(1) X' is a proper superset of X and (2) every transaction containing X also contains X'. A closed
itemset X is frequent if its support passes the given support threshold.
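The definition can be checked by brute force on a small database. In the sketch below, an itemset is closed exactly when it equals the intersection of all transactions that contain it. The five transactions are an assumption: Table 1 is not reproduced in this text, so they were chosen to agree with the f_list and the d-conditional database quoted in the worked example. The function name is hypothetical, and the exhaustive enumeration is for illustration only, not the CLOSET method.

```python
from itertools import combinations

# Assumed transactions, consistent with f_list = {c:4, e:4, f:4, a:3, d:2}.
tdb = [set("acdef"), set("abe"), set("cef"), set("acdf"), set("cef")]

def frequent_closed_itemsets(tdb, min_sup):
    """Enumerate all itemsets and keep the frequent closed ones (min_sup >= 1)."""
    items = sorted(set().union(*tdb))
    closed = {}
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            x = set(combo)
            covering = [t for t in tdb if x <= t]
            if len(covering) < min_sup:
                continue  # not frequent
            # Items shared by every transaction containing X. If this is
            # larger than X, a proper superset X' violates closedness.
            closure = set.intersection(*covering)
            if closure == x:
                closed["".join(sorted(x))] = len(covering)
    return closed

closed = frequent_closed_itemsets(tdb, min_sup=2)
```

With a support threshold of 2, this yields a:3, e:4, ae:2, cf:4, cef:3, and acdf:2. An itemset such as c is frequent but not closed, because every transaction containing c also contains f.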
The problem of finding the complete set of frequent closed itemsets efficiently in a large
database is called the frequent closed itemset mining problem.
For the transaction database in Table 1 with min_sup = 2, the divide-and-conquer method for
mining frequent closed itemsets proceeds as follows.
1) Find frequent items. Scan TDB to find the set of frequent items and derive a global
frequent item list, called f_list: f_list = {c:4, e:4, f:4, a:3, d:2}, where the items are
sorted in descending order of support. Any infrequent item, such as b, is omitted.
2) Divide the search space. All the frequent closed itemsets can be divided into 5
non-overlapping subsets based on the f_list: (1) the ones containing item d, (2) the ones
containing item a but not d, (3) the ones containing item f but neither a nor d, (4) the ones
containing e but none of f, a, or d, and (5) the ones containing only c. Once all the subsets
are found, the complete set of frequent closed itemsets is obtained.
3) Find subsets of frequent closed itemsets. The subsets of frequent closed itemsets can be
mined by constructing the corresponding conditional databases and mining each one recursively.
Find frequent closed itemsets containing d. Only transactions containing d are needed. The
d-conditional database, denoted as TDB|d, contains all the transactions having d, which is
{cefa, cfa}. Notice that item d is omitted in each transaction since it appears in every
transaction in the d-conditional database.
The support of d is 2. Items c, f, and a each appear twice in TDB|d. Therefore, cfad:2 is a
frequent closed itemset. Since this itemset covers every frequent item in TDB|d, the mining of
TDB|d finishes.
In the same way find the frequent closed itemsets for a, f, e, and c.
4) The set of frequent closed itemsets found is {acdf:2, a:3, ae:2, cf:4, cef:3, e:4}.
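Steps 1 and 3 can be traced for item d with the following Python sketch. The five transactions are again an assumption chosen to match the f_list and TDB|d given above, since Table 1 is not reproduced here. The helper names `conditional_db` and `common_items` are hypothetical, ties in support are broken alphabetically to reproduce the order c, e, f, and the recursion over the remaining items is left out.

```python
from collections import Counter

# Assumed transactions, consistent with the f_list and TDB|d in the text.
tdb = ["acdef", "abe", "cef", "acdf", "cef"]
min_sup = 2

# Step 1: count items, drop infrequent ones, sort by descending support.
counts = Counter(item for t in tdb for item in set(t))
f_list = sorted((i for i in counts if counts[i] >= min_sup),
                key=lambda i: (-counts[i], i))

def conditional_db(tdb, item, f_list):
    """Transactions containing `item`, restricted to the frequent items
    that precede it in f_list and written in f_list order."""
    rank = {x: r for r, x in enumerate(f_list)}
    result = []
    for t in tdb:
        if item in t:
            kept = [x for x in t if x in rank and rank[x] < rank[item]]
            result.append("".join(sorted(kept, key=rank.get)))
    return result

def common_items(cond_db):
    """Items that appear in every transaction of a conditional database."""
    return set.intersection(*(set(t) for t in cond_db))

# Step 3 for item d: build TDB|d and extract the items common to all of it.
tdb_d = conditional_db(tdb, "d", f_list)
itemset_d = "".join(sorted(common_items(tdb_d) | {"d"}))
support_d = len(tdb_d)
```

The sketch gives f_list = c, e, f, a, d and TDB|d = {cefa, cfa}. Items c, f, and a occur in both transactions, so they are merged with d into the closed itemset acdf with support 2.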
Optimization 1: Compress transactional and conditional databases using FP-tree structures. FP-
tree compresses databases for frequent itemset mining. Conditional databases can be derived
from FP-tree efficiently.
Optimization 2: Extract items appearing in every transaction of conditional databases.
Optimization 3: Directly extract frequent closed itemsets from FP-tree.
Optimization 4: Prune search branches.
PERFORMANCE STUDY
In a comparison of A-close, CHARM, and CLOSET, CLOSET outperforms both CHARM
and A-close. CLOSET is efficient and scalable in mining frequent closed itemsets in large
databases. It is much faster than A-close, and also faster than CHARM.
CONCLUSION
CLOSET leads to fewer and more interesting association rules than the other previously
proposed methods.