
International Journal of Computational Intelligence and Information Security, March 2012, Vol. 3, No. 3

An Efficient Frequent Pattern Mining in Health Care Database Computing


N. Venkatesan 1, E. Ramaraj 2

1 Research Scholar, Email: envenki@gmail.com, envenki@sify.com
2 Technology Advisor, Madurai Kamaraj University, Madurai, Email: dr_ramaraj@yahoo.co.in

Abstract

The tremendous growth in data has generated the need for new techniques that can intelligently transform massive data into useful information and knowledge. Data mining is such a technique: it extracts non-trivial, implicit, previously unknown and potentially useful information from the data in databases. Association rule mining is one of the most important and well-researched techniques of data mining. It aims to extract interesting correlations, frequent patterns, associations, or causal structures among sets of items in transaction databases or other data repositories. Association rules are widely used in market databases, spatial databases, biological databases, medical databases and crime databases. This paper focuses on mining association rules from a real-time surveyed medical database. So far, arrays, trees, hashing, depth-first, breadth-first and prefix-tree based searching have been used in association rule mining algorithms; as the size of the input grows, the run time of these algorithms also increases. In this paper, a novel data structure is introduced that reduces the dataset scan to a single search. This new search technique is bit search, which finds the k-itemsets (where k = 1, 2, 3, ..., n) in one search scan.

Keywords: Association Rules, Frequent Itemsets, Bit Search

1. INTRODUCTION

Association rule mining [26] is one of the important advances in data mining. Association rules are widely used in market databases, fraudulent data, etc. Applying association rule algorithms to a medical database is a new approach to extract hidden knowledge. Various algorithms have emerged to solve the problem of finding associations. ARM [6] is divided into a two-phase process as follows:

Phase 1: Identify the sets of frequent items, itemsets or patterns within the set of transactions using a user-specified support threshold.
Phase 2: Generate inferences or rules from these patterns using a user-specified confidence threshold.

Association rule mining [27] finds interesting association or correlation relationships among a large set of data items. With massive amounts of data continuously being collected and stored, many industries are becoming interested in mining association rules from their databases. Let D be a set of n transactions such that D = {T1, T2, T3, ..., Tn}, where Ti ⊆ I and I is a set of items, I = {i1, i2, i3, ..., im}. A subset of I containing k items is called a k-itemset. Let X and Y be two itemsets such that X ⊆ I, Y ⊆ I, and X ∩ Y = ∅. An association rule is an implication denoted by X => Y, where X is called the antecedent and Y is called the consequent. Given an itemset X, the support s(X) is defined as the fraction of transactions Ti ∈ D such that X ⊆ Ti. Consider P(X), the probability of appearance of X in D, and P(Y|X), the conditional probability of appearance of Y given X. P(X) can be estimated as P(X) = s(X). The support of a rule X => Y is defined as s(X => Y) = s(X ∪ Y). An association rule X => Y has a measure of reliability called the confidence, defined as c(X => Y) = s(X => Y) / s(X).
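As a concrete illustration of these definitions, the following minimal sketch (our code, not part of the original paper; the helper names are hypothetical) computes the support of an itemset and the confidence of a rule over a small list of transactions represented as Python sets:

def support(itemset, transactions):
    # fraction of transactions that contain every item of the itemset
    count = sum(1 for t in transactions if itemset <= t)
    return count / len(transactions)

def confidence(antecedent, consequent, transactions):
    # c(X => Y) = s(X U Y) / s(X)
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

transactions = [
    {"Db", "HT", "heartdz"},
    {"Db", "heartdz", "kidneydz"},
    {"HT", "heartdz", "kidneydz"},
    {"Db", "kidneydz"},
]
print(support({"Db"}, transactions))                   # 0.75
print(confidence({"Db"}, {"kidneydz"}, transactions))  # 0.5 / 0.75 = 0.67 approx.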


Confidence can be used to estimate P(Y|X): P(Y|X) = P(X ∪ Y) / P(X) = c(X => Y). Eclat, FP-growth and several other frequent itemset mining algorithms rely on this basic scheme but differ in how they represent the conditional databases. The main approaches are the horizontal and the vertical representation. In a horizontal representation, the database is stored as a list (or array) of transactions, each of which is a list (or array) of the items contained in it. In a vertical representation, the database is represented by first referring with a list (or array) to the different items; for each item, a list (or array) of identifiers is stored which indicates the transactions that contain the item. There exist many algorithms which efficiently solve the FIM problem. Most of them are Apriori, Eclat or FP-growth based, where the efficiency comes from the sophisticated use of data structures. All the data structures used in finding frequent itemsets serve only to search for elements in the transactions. Arrays, trees and graphs are some of the data structures used in association rule mining techniques, and breadth-first search and depth-first search are some of the procedures used to traverse trees and graphs. Frequent itemset mining is still a major research issue, and for frequent itemset generation the search plays a vital role in finding the support count of k-itemsets.

This paper is organized as follows: Section 2 describes the related work. In Section 3, the problem is defined. The new bit search technique is proposed in Section 4. The experimental data and the implementation on the medical dataset are explained in Section 5. Medical rules are analyzed in Section 6. Section 7 concludes along with future enhancements.

2. RELATED WORK

Frequent patterns are itemsets, subsequences or substructures that appear in a data set with frequency not less than a user-specified threshold. For example, a set of items, such as milk and bread, that appears frequently together in a transaction data set is a frequent itemset. A subsequence, such as buying first a PC, then a digital camera and then a memory card, if it occurs frequently in a shopping history database, is a (frequent) sequential pattern. A substructure can refer to different structural forms, such as subgraphs, subtrees or sublattices, which may be combined with itemsets or subsequences. If a substructure occurs frequently in a graph database, it is called a (frequent) structural pattern. Frequent itemset mining algorithms are classified into three categories:

1. Apriori based candidate itemset generation
2. FP-growth based, without candidate itemset generation
3. Eclat based vertical data layout

2.1. Apriori based candidate itemset generation

The Apriori algorithm developed by Agrawal et al. [2] is a great achievement in the history of mining association rules [5]. It is by far the most well-known association rule algorithm. This technique uses the property that any subset of a large itemset must itself be a large itemset. Since the Apriori algorithm [11] was proposed, there have been extensive studies on improvements or extensions of Apriori, e.g., the partitioning technique [19], the sampling approach [21], dynamic itemset counting [4], a novel method for counting the occurrences of itemsets by Tiwari et al. [20], CARMA by Hidber [10], C-Miner by Li et al. [12], H-Mine by Pei et al. [13], Park et al.'s hashing technique [16], incremental mining by Cheung et al.
[5], parallel and distributed mining [3] [24], and integrating mining with relational database systems by Sarawagi et al. [18]. Geerts et al. [7] derived a tight upper bound on the number of candidate patterns that can be generated in the level-wise mining approach; this result is effective at reducing the number of database scans. The Apriori algorithm performs a breadth-first search in the search space by generating candidate (k+1)-itemsets from frequent k-itemsets. The frequency of an itemset is computed by counting its occurrences in each transaction. Many variants of the Apriori algorithm have been developed, such as AprioriTid, AprioriHybrid, Direct Hashing and Pruning (DHP), Dynamic Itemset Counting (DIC), the Partition algorithm, etc.
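As an illustration of this level-wise scheme, the following is a minimal Apriori sketch (our simplification, not the implementation used in this paper): candidate (k+1)-itemsets are joined from frequent k-itemsets, pruned using the Apriori property, and counted with one scan per level.

from itertools import combinations

def apriori(transactions, min_support):
    # Level-wise frequent itemset mining (a simplified sketch of Apriori).
    n = len(transactions)
    items = {i for t in transactions for i in t}
    # frequent 1-itemsets
    freq = {frozenset([i]) for i in items
            if sum(i in t for t in transactions) / n >= min_support}
    all_freq, k = set(freq), 1
    while freq:
        # join step: candidate (k+1)-itemsets from pairs of frequent k-itemsets
        candidates = {a | b for a in freq for b in freq if len(a | b) == k + 1}
        # prune step: every k-subset of a candidate must already be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in freq for s in combinations(c, k))}
        # count step: one database scan per level
        freq = {c for c in candidates
                if sum(c <= t for t in transactions) / n >= min_support}
        all_freq |= freq
        k += 1
    return all_freq

transactions = [{"Db", "HT", "heartdz"}, {"Db", "heartdz", "kidneydz"},
                {"HT", "heartdz", "kidneydz"}, {"Db", "kidneydz"}]
print(apriori(transactions, 0.5))
# yields the frequent 1- and 2-itemsets, e.g. {'Db','kidneydz'} and {'heartdz','kidneydz'}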


2.2. Mining Frequent Itemsets without candidate generation

In many cases, the Apriori algorithm significantly reduces the size of candidate sets using the Apriori property. However, it can suffer from two non-trivial costs: (1) generating a huge number of candidate sets and (2) repeatedly scanning the database and checking the candidates by pattern matching. Han et al. [9] [25] devised an FP-growth method that mines the complete set of frequent itemsets without candidate generation. FP-growth [22] is based on the divide-and-conquer principle. The first scan of the database derives a list of frequent items in which the items are ordered by descending frequency. According to this frequency-descending list, the database is compressed into a frequent pattern tree, or FP-tree, which retains the itemset association information. The FP-tree is mined by starting from each frequent length-1 pattern (as an initial suffix pattern), constructing its conditional pattern base (a sub-database which consists of the set of prefix paths in the FP-tree co-occurring with the suffix pattern), then constructing its conditional FP-tree and performing mining recursively. Pattern growth is achieved by concatenating the suffix pattern with the frequent patterns generated from a conditional FP-tree. The main problem with the FP-tree [23] is that the construction of the frequent pattern tree is a time-consuming activity; further, FP-tree based approaches do not offer flexibility and reusability of computation during the mining process. There are many alternatives and extensions to the FP-growth approach, including depth-first generation of frequent itemsets by Agarwal et al. [1]; H-Mine by Pei et al. [17], which explores a hyper-structure mining of frequent patterns; building alternative trees; exploring top-down and bottom-up traversal of such trees in pattern-growth mining by Liu et al. [15] [14]; and an array-based implementation of a prefix-tree structure for efficient pattern growth mining by Grahne and Zhu [8].

2.3. Mining Frequent Itemsets using vertical data layout

Most of the algorithms discussed earlier generate frequent itemsets from a set of transactions in horizontal data format (i.e., {TID: itemset}), where TID is a transaction id and itemset is the set of items contained in transaction TID. Alternatively, mining can also be performed with the data presented in vertical data format (i.e., {item: TID_set}). The following are some of the medical uses of mining techniques for analyzing medical databases:

- Prediction of occurrence of diseases and/or complications
- Therapy of choice for individual cancers
- Choice of antibiotics for individual infections
- Choice of a particular technique (of anastomosis, sutures, suture materials etc.) in a given surgical procedure

2.4. Medical Dataset

Experimental data in many domains serves as a basis for predicting useful trends. It is opted to generate association rules in one such medical database and to consider the problem of discovering association rules between items in a large database in the medical field. The data in the medical field are usually very vast and interrelated. Researchers and health specialists are increasingly obtaining information on chronic illnesses from self-reports. This study validates self-reports of two major health conditions, hypertension and diabetes, based on a recently fielded survey.
The information used to assess the validity of self-reports of hypertension and diabetes in this study derives from both the home-based interviews and the physical examinations. In the interviews, respondents were asked whether they ever had, and whether they currently have, each of 14 specific medical conditions, including high blood pressure and diabetes.


Based on these principles, various algorithms have been developed following the Apriori and Eclat approaches; this work provides an implementation of such algorithms for a real-time medical dataset and analyses the various issues. This work proposes the use of rule induction in data mining applications in the field of medicine using new, efficient algorithmic approaches. Association rule induction is a powerful method for finding interesting information in the medical dataset.

3. PROBLEM DEFINITION

Data mining has many techniques to mine useful patterns from a dataset. One such technique is association rule generation, which is done with the help of frequent itemsets. Frequent itemsets are generated by searching for items in the given dataset. A dataset contains transactions and items; more than one item makes an itemset. Searching for itemsets can be done with many types of algorithms, and a major research goal is to reduce the searching time of these algorithms. Bit search is a novel search procedure which generates frequent itemsets and overcomes this problem in order to make associations among the items of the dataset. Bit search is used to find frequent itemsets, and with its help association rules are generated for the surveyed medical dataset.

4. PROPOSED METHOD OF BIT SEARCHING TECHNIQUE

The smallest storage element of the computer is called a bit. The storage of any value is done by bits. The bit supports several operations in memory and is manipulated by operators such as AND and OR to produce an output.

4.1. Bit Operations

A bit stream is a time series of bits. The term bit stream is frequently used to describe the configuration data to be loaded into a reconfigurable computer instead of an application-specific integrated circuit. A bit array is an array data structure that compactly stores individual bits (Boolean values). It implements a simple set data structure storing a subset of {1, 2, ..., n}. Each bit in a word can be singled out and manipulated using bitwise operations, and machines have instructions to manipulate single bits. For example, OR can be used to set a bit to one:

11101010 OR 00000100 = 11101110

AND can be used to set a bit to zero:

11101010 AND 11111101 = 11101000

AND together with zero-testing can be used to determine whether a bit is set:

11101010 AND 00000001 = 00000000 = 0
11101010 AND 00000010 = 00000010 ≠ 0
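These bit manipulations can be reproduced directly in code; the following short Python snippet (ours, for illustration only) repeats the three examples above:

x = 0b11101010

print(bin(x | 0b00000100))    # set a bit to one  -> 0b11101110
print(bin(x & 0b11111101))    # set a bit to zero -> 0b11101000
print((x & 0b00000001) != 0)  # test a bit: False (bit not set)
print((x & 0b00000010) != 0)  # test a bit: True  (bit set)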


4.2. Bit Search Technique

In any transaction, it is not easy to search for an item or a set of items. Searching for an item or itemset in the transactions is usually done through sequential search, binary search or some other searching technique. In this work, bit search is introduced and implemented to generate association rules.

Conversion of Database into Numerical Dataset

The entire transaction database is converted into a numeric dataset for easy manipulation during the searching process.

Procedure: Numerical Transformation. The input is a text file, and each item is given a unique number.

Input: transaction database
Output: numeric dataset in which each item is transformed into a unique integer value

Procedure Numerical_transform(database)
begin
  for each item in the database
  {
    if the item has not already been scanned
      assign it a new number = old number + 1
  }
  return (numerical file)
end
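A runnable version of this procedure might look as follows (our Python sketch of the pseudocode above; the function and variable names are ours):

def numerical_transform(transactions):
    # Map each distinct item to a unique integer, in order of first appearance.
    numbering = {}
    numeric_db = []
    for t in transactions:
        row = []
        for item in t:
            if item not in numbering:                 # item not scanned before
                numbering[item] = len(numbering) + 1  # new number = old number + 1
            row.append(numbering[item])
        numeric_db.append(row)
    return numbering, numeric_db

items, db = numerical_transform([["Db", "HT", "heartdz"], ["Db", "heartdz", "kidneydz"]])
print(items)  # {'Db': 1, 'HT': 2, 'heartdz': 3, 'kidneydz': 4}
print(db)     # [[1, 2, 3], [1, 3, 4]]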

The Numerical_transform procedure transforms each item in the database into a numerical value by assigning the values 1, 2, ..., n. For each item it checks whether a value has already been assigned; if not, a new value is assigned to the item, namely the value of the previous item plus 1.

Transaction bit array

Let N be the number of transactions in the data set and let M be the total number of items in the dataset. The dataset items are converted into an N x M sparse matrix. All non-zero elements of the sparse matrix are substituted by 1, and the matrix is masked as a sparse bit matrix. Hence, all the transactions of the dataset are kept as transaction bit arrays.

Subset bit array

Let I be a set of items. A set X = {i1, ..., ik} that is a subset of I is called an itemset, or a k-itemset if it contains k items. All the k-itemsets are converted into bit arrays by substituting the presence of an item by 1 and its absence by 0. In this way all subset itemsets are converted into subset bit arrays.

Frequent itemset

An itemset is called frequent if its support is not less than a given absolute minimal support, a user-defined threshold value.

Bit Mask

A bit mask is a pattern of binary values which is combined with some value, using the bit value 1 for the presence of an item and 0 for its absence. The transaction expressed as a combination of 0s and 1s for the searching process is called the bit mask. Figure 4.1 represents the bit search procedure for searching the itemsets in the dataset.

Bitwise AND

The bitwise AND operation is the searching step used to find the frequent itemsets. AND is applied between the subset bit array and the transaction bit array of the dataset's sparse bit matrix, and the result is examined. If the result value is the same as the subset bit array value, the k-itemset is present in the transaction; if it is not, the items are not present in that transaction. This operation is applied to all the subset k-itemsets (where k = 1, 2, 3, ..., n), so the result is found in a single search. A small code sketch of this test follows.
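The following is a minimal sketch of this bitwise AND membership test (our illustration; the function names are hypothetical). Each transaction and each candidate itemset is encoded as an integer bit mask, and a single AND per transaction decides whether the itemset is contained in it; the example uses the transactions of Table 4.1.

def to_bitmask(itemset, num_items):
    # item numbers start at 1; here bit (i - 1) represents item i
    mask = 0
    for item in itemset:
        mask |= 1 << (item - 1)
    return mask

def contains(transaction_mask, subset_mask):
    # the k-itemset is present iff AND leaves the subset mask unchanged
    return transaction_mask & subset_mask == subset_mask

def support_count(subset, transactions, num_items):
    subset_mask = to_bitmask(subset, num_items)
    masks = [to_bitmask(t, num_items) for t in transactions]
    return sum(contains(m, subset_mask) for m in masks)

# transactions of Table 4.1, items numbered 1..5
transactions = [{1, 3, 5}, {2, 3, 4}, {3, 4, 5}, {1, 5}, {5}]
print(support_count({3, 4}, transactions, 5))  # itemset {3,4} occurs in T2 and T3 -> 2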


[Figure 4.1 block diagram: the dataset is converted into a sparse bit array and transaction bit arrays (item bit masks per TID); subset generation produces subset bit arrays (bit masks); a bitwise AND of the two bit masks decides whether the items are present in the transaction or not.]

Figure 4.1: Bit Search Procedure

4.3. Representation of Itemsets

From the definition of the association rule mining problem, transaction databases and sets of association rules have in common that they contain sets of items together with additional information. For example, a transaction in the database contains a transaction ID and an itemset, while a rule in a set of mined association rules contains two itemsets, one for the antecedent and one for the consequent, together with additional quality information, e.g. values for various itemset measures. Figure 4.1 shows the bit search procedure used to find the frequent itemsets.

4.3.1. Transaction Bit Array

All the transactions of the dataset are converted into a sparse matrix, which contains a large number of zeros, as represented in Table 4.1.

Table 4.1: Sparse Bit Matrix Representation

Transaction   1   2   3   4   5
T1            1   0   1   0   1
T2            0   1   1   1   0
T3            0   0   1   1   1
T4            1   0   0   0   1
T5            0   0   0   0   1


To adopt the new searching concept, all the non-zero elements are converted into 1, so the two-dimensional sparse matrix becomes a sparse bit matrix; Table 4.1 represents the transformed sparse bit matrix. Each row is treated as a transaction bit mask, which is known as a transaction bit array. From Table 4.1, T1 = 10101 is the transaction bit array for T1, 01110 is the bit array for T2, and so on.

4.3.2. Subset Bit Array

Datasets commonly contain N items in total, but a particular transaction holds only a limited number of them. Subsets are generated by the user for searching for k items in a particular transaction, and for each such subset it must be determined whether it is present in the transactions or not. For searching k-itemsets in any transaction, the transaction bit array and the subset bit array are processed as in Figure 4.2. In the figure, the transaction contains {1, 4, 6, 8, 10} as its itemset and the subset being searched is {1, 4}, with the corresponding subset bit array. Bitwise AND is the operation used to find whether a k-itemset is present in the transaction: the transaction bit array is ANDed with the k-itemset bit array, and if the result equals the k-itemset bit array, all the items of the itemset are present in the transaction; otherwise they are not. Figure 4.2 shows the transaction bit array with value 1001010101; the subset {1, 4} has the bit array value 1001000000, and the result of the bitwise AND operation is also shown.
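For completeness, the Figure 4.2 example can be reproduced with explicit bit strings (our snippet; the leftmost bit stands for item 1, matching the notation used in the text):

def bit_string(itemset, num_items):
    # leftmost character corresponds to item 1
    return "".join("1" if i in itemset else "0" for i in range(1, num_items + 1))

transaction = bit_string({1, 4, 6, 8, 10}, 10)  # '1001010101'
subset      = bit_string({1, 4}, 10)            # '1001000000'

result = "".join("1" if a == "1" and b == "1" else "0" for a, b in zip(transaction, subset))
print(result, result == subset)                 # '1001000000' True -> {1,4} is present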

[Figure 4.2 diagram: transaction bit array 1001010101, bitwise AND with subset bit array 1001000000, result 1001000000.]

Figure 4.2: Bitwise AND operation

5. EXPERIMENTAL ANALYSIS WITH SAMPLE MEDICAL DATA

Collections of itemsets may contain duplicated elements (identical rows), i.e., itemsets containing exactly the same items, since a transaction database can contain different transactions with the same items. Such a database is still a set of transactions, because each transaction also carries a unique transaction ID. The binary incidence matrix will in general be very sparse, with many items and a very large number of rows, and a natural representation for such data is a sparse matrix format. To demonstrate the above procedures, the following example is used to show the searching efficiency.

Bit Search Item Representation

Association rules are generated for the surveyed medical dataset. Table 5.1 shows sample transactions of the patients' disease and complication particulars.


Table 5.1: Medical Dataset transactions

Transaction id   Item set
T1               Db, HT, heartdz
T2               Db, heartdz, kidneydz
T3               HT, heartdz, kidneydz
T4               Db, kidneydz
T5               HT, heartdz
T6               HT, stroke
T7               HT, heartdz, stroke
T8               Db, HT, kidneydz
T9               Db, HT, kidneydz
T10

(Db -> Diabetes, HT -> Hypertension, dz -> disease)

All the transactions of the above dataset are converted into sparse matrix form and masked into sparse bit form. Table 5.2 shows the horizontal representation of the dataset as a sparse bit matrix, which optimizes process memory occupation and search time. In Table 5.2, all the transactions are in bit format, and the total count of each individual item is calculated by counting the number of 1 bits in its column. Table 5.3 represents the same itemsets in the vertical layout of the sparse bit matrix for searching the data items.

Table 5.2: Horizontal Sparse Bit Representation of Medical Dataset

Transaction/Items   Numeric Itemset   Diabetes   Hypertension   Heart Disease   Kidney Disease   Stroke
T1                  1,2,4             1          1              1               0                0
T2                  1,3,4             1          0              1               1                0
T3                  2,3,4             0          1              1               1                0
T4                  1,4               1          0              0               1                0
T5                  2,3               0          1              1               0                0
T6                  2,5               0          1              0               0                1
T7                  2,3,5             1          0              1               0                1
T8                  1                 1          0              0               0                0
T9                  2,4               1          0              0               1                0
T10                 1,2,4             1          1              0               1                0
Total                                 7          5              5               5                2

Table 5.3: Vertical representation of Medical dataset

Tid     Diab   HT   HDz   KDz   ST
T1      1      1    1     0     0
T2      1      0    1     1     0
T3      0      1    1     1     0
T4      1      0    0     1     0
T5      0      1    1     0     0
T6      0      1    0     0     1
T7      1      0    1     0     1
T8      1      0    0     0     0
T9      1      0    0     1     0
T10     1      1    0     1     0
Total   7      5    5     5     2
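To illustrate how the vertical layout of Table 5.3 supports counting (the {item: TID_set} format described in Section 2.3), the following small sketch (ours, Eclat-style rather than the bit search proposed here) stores for each item the set of transaction IDs that contain it and obtains support counts by TID-set intersection:

# Vertical layout: item -> set of transaction IDs (taken from Table 5.3)
tidsets = {
    "Diab": {1, 2, 4, 7, 8, 9, 10},
    "HT":   {1, 3, 5, 6, 10},
    "HDz":  {1, 2, 3, 5, 7},
    "KDz":  {2, 3, 4, 9, 10},
    "ST":   {6, 7},
}

def support_count(itemset):
    # intersect the TID sets of all items in the itemset
    tids = set.intersection(*(tidsets[i] for i in itemset))
    return len(tids)

print(support_count({"Diab"}))         # 7
print(support_count({"Diab", "KDz"}))  # T2, T4, T9, T10 -> 4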


Each subset bit array is bitwise ANDed with the corresponding transaction rows so that the itemsets are searched in one scan, as per Table 5.2: the transaction bit array is ANDed with the subset bit array, and the result is compared with the corresponding itemset bit array. If it is the same, all the items of the itemset are present in the transaction and its count is incremented by one; otherwise the comparison proceeds to the next transaction. In this way the support level of the itemset is found. The above operations are continued up to the k-itemsets, all within one search.

For 1-Itemset Search

An individual data item is known as a 1-itemset. The total occurrence of each individual item in both the vertical and the horizontal representation is calculated as per the proposed algorithms.

For 2 and more Itemsets Search

Any combination of 2-itemsets, 3-itemsets and more is searched using this new technique in both types of algorithms. For example, for both the vertical and the horizontal sparse bit representation, to search for the 2-itemset {2,3} in the 3rd transaction, the search proceeds as follows. The 5 elements of the transaction 3 bit array are 01110, and the subset {2,3} itemset bit array is 01100. The searching operation is Transaction Bit Array AND Subset Bit Array:

Transaction Bit Array of T3 - 01110
Subset {2,3} bit array      - 01100
01110 BITAND 01100 = 01100

The result is 01100, which equals the subset bit array. Hence, the items are present in the 3rd transaction.

6. MEDICAL ASSOCIATION RULES

In the database used for this work, the records of patients with hypertension and diabetes as diseases were chosen initially. It is a selection of operational data on Primary Health Centre patients and contains information about the name, designation, age, address, disease particulars and duration of diseases. In order to facilitate the KDD process, a copy of this operational data is drawn and stored in a separate database. From the above dataset implementation, rules such as the following are generated. Over the whole database, the following association rules satisfying the minimum support and confidence values are obtained; if the confidence value is high, then the rule is highly informative. The following are the association rules derived from the given medical dataset:

Diabetes => kidney disease (Confidence 82%)
Diabetes => heart disease (Confidence 69%)
Diabetes => Stroke (Confidence 10%)
Hypertension => kidney disease (Confidence 24%)
Hypertension => heart disease (Confidence 55%)
Hypertension => Stroke (Confidence 30%)

7. CONCLUSION

With this type of implementation, the execution time is very low. Many rules are generated with the help of the novel searching technique, and the rules generated in this way from the medical dataset are known as single-dimensional association rules. These rules guide the user in crucial decision making by retrieving hidden information, and each rule is very important in the medical field. Rules with low confidence do not provide any interesting information for patients' further investigations. Hence, the dataset is analyzed further in order to retrieve intelligent relationships from the medical dataset.


References:

1. Agarwal, R.C., Aggarwal, C. and Prasad, V.V.V., A tree projection algorithm for generation of frequent item sets. Journal of Parallel and Distributed Computing, 61, 2001, pp. 350-371.
2. Agrawal, R. and Srikant, R., Fast algorithms for mining association rules. Proceedings of the 20th International Conference on Very Large Data Bases, Sept. 12-15, 1994, San Francisco, CA, USA, pp. 487-499.
3. Agrawal, R. and Srikant, R., Mining sequential patterns. Proceedings of the 11th International Conference on Data Engineering, March 6-10, 1995, Taipei, Taiwan, pp. 3-14.
4. Brin, S., Motwani, R., Ullman, J.D. and Tsur, S., Dynamic itemset counting and implication rules for market basket data. Proc. 1997 ACM SIGMOD Int. Conf. on Management of Data, 26, 1997, pp. 255-264.
5. Cheung, D.W., Han, J., Ng, V.T. and Wong, C.Y., Maintenance of discovered association rules in large databases: An incremental updating technique. Proceedings of the International Conference on Data Engineering, Feb. 26-Mar. 1, 1996, New Orleans, Louisiana, pp. 106-114.
6. Gade, K., Wang, J. and Karypis, G., Efficient closed pattern mining in the presence of tough block constraints. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 22-25, 2004, Seattle, WA, pp. 138-147.
7. Geerts, F., Goethals, B. and Bussche, J., A tight upper bound on the number of candidate patterns. Proceedings of the 2001 International Conference on Data Mining, Nov. 29-Dec. 2, 2001, San Jose, CA, pp. 155-162.
8. Grahne, G. and Zhu, J., Efficiently using prefix-trees in mining frequent itemsets. Proceedings of the 2003 ICDM International Workshop on Frequent Itemset Mining Implementations (IWFIMI 03), 2003, Melbourne, FL, pp. 123-132.
9. Han, J., Pei, J., Yin, Y. and Mao, R., Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery, 8, 2004, pp. 53-87.
10. Hidber, C., Online association rule mining. ACM SIGMOD Rec., 28, 1999, pp. 145-156.
11. Srikant, R. and Agrawal, R., Fast algorithms for mining association rules. Proc. VLDB Conference, 1994, pp. 487-499.
12. Li, Z., Chen, Z., Srinivasan, S.M. and Zhou, Y., C-Miner: Mining block correlations in storage systems. Proceedings of the 3rd USENIX Conference on File and Storage Technologies, March 31, 2004, San Francisco, CA, pp. 173-186.
13. Pei, J., Han, J., Lu, H., Nishio, S., Tang, S. and Yang, D., H-Mine: Hyper-structure mining of frequent patterns in large databases. Proc. of IEEE Intl. Conference on Data Mining, 2001, pp. 441-448.
14. Liu, G., Lu, H., Lou, W. and Yu, J.X., On computing, storing and querying frequent patterns. Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 24-27, 2003, Washington, DC, pp. 607-612.
15. Liu, J., Pan, Y., Wang, K. and Han, J., Mining frequent item sets by opportunistic projection. Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery in Databases, July 23-26, 2002, Edmonton, Canada, pp. 239-248.
16. Park, J.S., Chen, M.S. and Yu, P.S., An effective hash-based algorithm for mining association rules. ACM SIGMOD Rec., 24, 1995, pp. 175-186.
17. Pei, J., Han, J. and Lakshmanan, L.V.S., Mining frequent itemsets with convertible constraints. Proceedings of the 17th International Conference on Data Engineering, April 2-6, 2001, Heidelberg, Germany, pp. 433-442.
18. Sarawagi, S., Thomas, S. and Agrawal, R., Integrating association rule mining with relational database systems: Alternatives and implications. ACM SIGMOD Rec., 27, 1998, pp. 343-354.
19. Savasere, A., Omiecinski, E. and Navathe, S., An efficient algorithm for mining association rules in large databases. Proceedings of the 21st International Conference on Very Large Databases, Sept. 11-15, 1995, Zurich, Switzerland, pp. 432-443.
20. Tiwari, A., Gupta, R.K. and Agrawal, D.P., A novel algorithm for mining frequent itemsets from large database. Int. J. Information Technology and Knowledge Management, 2, 2009, pp. 223-229.
21. Toivonen, H., Sampling large databases for association rules. Proceedings of the 22nd International Conference on Very Large Databases, Sept. 3-6, 1996, Bombay, India, pp. 134-145.
22. Han, J. and Pei, J., Mining frequent patterns by pattern-growth: methodology and implications. ACM SIGKDD Explorations Newsletter, 2(2), 2000, pp. 14-20.
23. Han, J., Pei, J. and Yin, Y., Mining frequent patterns without candidate generation. In: 2000 ACM SIGMOD Intl. Conference on Management of Data, W. Chen, J. Naughton and P. A. Bernstein, Eds., ACM Press, 2000.
24. Zaki, M.J., Parthasarathy, S., Ogihara, M. and Li, W., Parallel algorithms for discovery of association rules. Data Mining and Knowledge Discovery, 1, 1997, pp. 343-374.
25. Han, J. and Kamber, M., Data Mining: Concepts and Techniques. San Francisco: Morgan Kaufmann, 2000.
26. Google. www.google.com
27. Wikipedia. www.wikipedia.com

