WLAR-Viz: Weighted Least Association Rules Visualization

A. Noraziah1, Zailani Abdullah2, Tutut Herawan1, and Mustafa Mat Deris3


1 Faculty of Computer Systems and Software Engineering, Universiti Malaysia Pahang,
Lebuhraya Tun Razak, 26300 Kuantan Pahang, Malaysia
2 Department of Computer Science, Universiti Malaysia Terengganu, 21030 Kuala Terengganu,
Terengganu, Malaysia
3 Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn
Malaysia, Parit Raja, Batu Pahat 86400, Johor, Malaysia
{noraziah,tutut,zailania}@ump.edu.my,
mmustafa@uthm.edu.my

Abstract. Mining weighted least association rules is in increasing demand in
data mining research. However, mining these types of rules often faces
difficulties, especially in identifying which rules are really interesting. One
alternative solution is to apply a visualization model to these rules. In this
paper, a model for visualizing weighted least association rules is proposed. The
proposed model contains five main steps: scanning the dataset, constructing the
Least Pattern Tree (LP-Tree), applying Weighted Support Association Rules
(WSAR*), capturing Weighted Least Association Rules (WELAR) and finally
visualizing the respective rules. The results show that three-dimensional plots
provide user-friendly navigation for understanding the weighted support and
weighted least association rules.

Keywords: Weighted least association rules, Data mining, Visualization.

1 Introduction
In the past decades, mining association rules or patterns from transaction databases
has attracted much research interest. The aim of mining association rules is to uncover
all interesting and useful patterns that are present in data repositories. It was first
introduced by Agrawal et al. [1] and still attracts much attention from the knowledge
discovery community [2,3,4,5,6]. In association rules, a set of items is defined as an
itemset. An itemset is said to be frequent if its support exceeds a predefined
minimum support. Confidence is a second measurement, used together with support.
An association rule is said to be strong if it meets the minimum confidence. In
contrast to a frequent itemset, a least itemset is a set of items that is infrequently
found in the database. However, it may produce interesting results for certain domain
applications, such as detecting air pollution [7], serious diseases [8], educational
decision support [9,10,11,12] and many more.
Normally, the least itemset can only be captured by lowering the minimum support
threshold. As a result, this approach may produce an enormous number of association
rules, and it becomes extremely difficult to identify which association rules are most
significant. Furthermore, lowering the minimum support also increases the
computational cost of generating the complete set of association rules.

B. Liu, M. Ma, and J. Chang (Eds.): ICICA 2012, LNCS 7473, pp. 592–599, 2012.
© Springer-Verlag Berlin Heidelberg 2012

In our previous work [13], we proposed the Weighted Least Association Rules
framework (WELAR-f) to extract significant association rules. The WELAR framework
contains enhanced versions of the existing prefix tree and frequent pattern growth
algorithm, called LP-Tree and LP-Growth, respectively. The Weighted Support
Association Rules (WSAR*) measurement is also suggested in [13]. We show that by
adapting this framework into a suitable visualization model, the significant rules can
be captured and visualized. In this paper, significant rules from the Breast-Cancer
Wisconsin and Mushroom datasets are presented and visualized in three dimensions.
The rest of the paper is organized as follows. Section 2 describes related work.
Section 3 explains the proposed method in detail. This is followed by experimental
results in Section 4. Finally, the conclusion and future directions are reported in
Section 5.

2 Related Works
Association rules visualization is one of the exciting subfields of association rule
mining. Its main objective is to display data in a way that facilitates the user's
interpretation and comprehension. To date, many authors have developed
visualization techniques to support the analysis and comprehensive viewing of
association rules.
Wong et al. [14] used a 3-Dimensional method to visualize association rules for text
mining. Bruzzese and Buono [15] presented a visual strategy to analyze large rule sets
by exploiting graph-based techniques and parallel coordinates to visualize the results
of association rule mining algorithms. Ceglar et al. [16] reviewed the current
association visualization techniques and introduced a new technique for visualizing
hierarchical association rules. Kopanakis et al. [17] proposed 3-Dimensional visual
data mining techniques for the representation and mining of classification
outcomes and association rules. Lopes et al. [18] presented a framework for visual
text mining to support exploration of both general structure and relevant topics within
a textual document collection. Leung et al. [19,20] developed a visualizer
for frequent pattern mining. Later, Herawan and Deris employed soft set theory for
mining maximal association rules and visualized them. Besides that, Abdullah et al.
visualized the construction of the Incremental Disorder Trie Itemset Data Structure
(DOSTrieIT) for the Frequent Pattern Tree (FP-Tree).
Agrawal et al. were the first to introduce the support-and-confidence measurement
for the evaluation and classification of association rules. However, this measurement
does not cover weighted items. Cai et al. [25] introduced Weighted Association Rules
(WAR) with the MINWAL(O) and MINWAL(W) algorithms, based on the support
bounds approach, to mine weighted binary association rules. It can be considered
among the first attempts to allow a single item to carry a weight rather than 0 or 1.
Tao et al. [26] proposed an algorithm named Weighted Association Rule Mining
(WARM) to discover itemsets of significant weight. In summary, the study of weighted
association rules is still very limited and worth exploring.
3 Proposed Method

Throughout this section the set $I = \{i_1, i_2, \ldots, i_{|A|}\}$, for $|A| > 0$, refers to the
set of literals called the set of items, and the set $D = \{t_1, t_2, \ldots, t_{|U|}\}$, for
$|U| > 0$, refers to the data set of transactions, where each transaction $t \in D$ is a list
of distinct items $t = \{i_1, i_2, \ldots, i_{|M|}\}$, $1 \le |M| \le |A|$, and each transaction can be
identified by a distinct identifier TID.

3.1 Definition

Definition 1. (Least Items). An itemset $X$ is called a least item if
$\alpha \le \mathrm{supp}(X) \le \beta$, where $\alpha$ and $\beta$ are the lowest and highest support,
respectively.
The set of least items is denoted as Least Items and

$$\text{Least Items} = \{ X \subset I \mid \alpha \le \mathrm{supp}(X) \le \beta \}$$

Definition 2. (Frequent Items). An itemset $X$ is called a frequent item if
$\mathrm{supp}(X) > \beta$, where $\beta$ is the highest support.
The set of frequent items is denoted as Frequent Items and

$$\text{Frequent Items} = \{ X \subset I \mid \mathrm{supp}(X) > \beta \}$$

Definition 3. (Merge Least and Frequent Items). An itemset $X$ is called a least
frequent item if $\mathrm{supp}(X) \ge \alpha$, where $\alpha$ is the lowest support.
The set of merged least and frequent items is denoted as LeastFrequent Items
and

$$\text{LeastFrequent Items} = \{ X \subset I \mid \mathrm{supp}(X) \ge \alpha \}$$

LeastFrequent Items is sorted in descending order of support and denoted as

$$\text{LeastFrequent Items}^{desc} = \{ X_i \mid \mathrm{supp}(X_i) \ge \mathrm{supp}(X_j),\ 1 \le i, j \le k,\ i \ne j,\ k = |\text{LeastFrequent Items}|,\ x_i, x_j \subset \text{LeastFrequent Items} \}$$
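
Definitions 1 to 3 can be illustrated with a short sketch. It assumes relative supports over single items and uses a toy transaction list; the function names and the thresholds `alpha` and `beta` are illustrative choices, not part of the original framework.

```python
from collections import Counter


def item_supports(transactions):
    """Relative support of every single item (fraction of transactions containing it)."""
    counts = Counter(item for t in transactions for item in set(t))
    n = len(transactions)
    return {item: c / n for item, c in counts.items()}


def classify_items(supports, alpha, beta):
    """Least items: alpha <= supp <= beta. Frequent items: supp > beta."""
    least = {x for x, s in supports.items() if alpha <= s <= beta}
    frequent = {x for x, s in supports.items() if s > beta}
    return least, frequent


def least_frequent_desc(supports, alpha):
    """Items with supp >= alpha, in descending order of support (ties broken by name)."""
    kept = [x for x, s in supports.items() if s >= alpha]
    return sorted(kept, key=lambda x: (-supports[x], x))


# Toy data, invented for illustration only.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c", "d"],
    ["a", "b", "e"],
    ["a", "d"],
]
supports = item_supports(transactions)
least, frequent = classify_items(supports, alpha=0.4, beta=0.5)
order = least_frequent_desc(supports, alpha=0.4)

print(sorted(least))     # ['c', 'd']
print(sorted(frequent))  # ['a', 'b']
print(order)             # ['a', 'b', 'c', 'd']
```

Item `e` has support 0.2, below `alpha`, so it belongs to neither set and is dropped from the ordering.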

Definition 4. (Ordered Items Transaction). An ordered items transaction is a
transaction in which the items are sorted in descending order of their support. It is
denoted as $t_i^{desc}$, where $t_i^{desc} = \text{LeastFrequentItems}^{desc} \cap t_i$, $1 \le i \le n$,
$|t_i^{least}| > 0$, $|t_i^{frequent}| > 0$.
An ordered items transaction will be used in constructing the proposed model, the
so-called LP-Tree.
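
As a minimal sketch of Definition 4, the following code intersects a transaction with the globally ordered item list. The side conditions are read here as requiring at least one least item and one frequent item in the transaction, which is an interpretation; the helper names and sample values are hypothetical.

```python
def order_transaction(transaction, ordered_items):
    """Keep the items that appear in ordered_items, in that (descending-support) order."""
    present = set(transaction)
    return [item for item in ordered_items if item in present]


def has_least_and_frequent(transaction, least, frequent):
    """True if the transaction contains at least one least and one frequent item."""
    items = set(transaction)
    return bool(items & least) and bool(items & frequent)


# Hypothetical inputs, chosen only for illustration.
ordered_items = ["a", "b", "c", "d"]   # plays the role of LeastFrequentItems^desc
least, frequent = {"c", "d"}, {"a", "b"}

t = ["e", "c", "a"]
ordered_t = order_transaction(t, ordered_items)
print(ordered_t)                                   # ['a', 'c']
print(has_least_and_frequent(t, least, frequent))  # True
```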
Definition 5. (Significant Least Data). Significant least data is data whose
occurrence is less than the standard minimum support but which appears together in
high proportion with certain other data.
Definition 6. (Item Weight). The weight of an item is defined as a non-negative real
number and is denoted as $\text{Item Weight} = \{ X \subset I \mid 0 \le \mathrm{weight}(X) \le 1 \}$.
Definition 7. (Itemset Length). The length of an itemset $X$ is the number of items it
contains and is denoted as $|X|$.
Definition 8. (Weighted Support Association Rules). Weighted Support Association
Rules (WSAR*) is a weight of an itemset formulated by combining the support and
weight of each item, together with the total support of either of them.
The value of Weighted Support Association Rules is denoted as WSAR* and

$$\mathrm{WSAR}^*(I) = \frac{(\mathrm{supp}(A) \times \mathrm{weight}(A)) + (\mathrm{supp}(B) \times \mathrm{weight}(B))}{\mathrm{supp}(A) + \mathrm{supp}(B) - \mathrm{supp}(A \Rightarrow B)}$$

The WSAR* value is thus the sum of the weighted supports of the antecedent $A$ and
the consequent $B$, divided by the support of the transactions that contain either of
them.
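
The WSAR* formula of Definition 8 can be sketched as a small function. The function name `wsar`, its parameter names and the sample numbers are assumptions made for illustration; supports may be given either as counts or as fractions, as long as all three use the same scale.

```python
def wsar(supp_a, weight_a, supp_b, weight_b, supp_ab):
    """WSAR* of a rule A => B.

    supp_a, supp_b : supports of antecedent A and consequent B
    weight_a, weight_b : item weights, assumed to lie in [0, 1]
    supp_ab : support of the rule, i.e. of transactions containing both A and B
    """
    numerator = supp_a * weight_a + supp_b * weight_b
    # Support of transactions containing A or B (inclusion-exclusion).
    denominator = supp_a + supp_b - supp_ab
    if denominator <= 0:
        raise ValueError("supports must describe at least one transaction")
    return numerator / denominator


# Hypothetical values: A occurs in 50% of transactions, B in 25%, both together in 25%.
value = wsar(supp_a=0.5, weight_a=0.8, supp_b=0.25, weight_b=0.4, supp_ab=0.25)
print(round(value, 4))  # 1.0
```

In this example B never occurs without A, so the denominator equals the support of A alone and the weighted supports sum to the same amount.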

3.2 Model
Five major components are involved in visualizing the weighted least association
rules (WELAR). These components are closely interrelated and the process flows in
one direction. A complete overview of the model for visualizing the weighted least
association rules is shown in Figure 1.
Dataset. All datasets used in this model are in a flat file format. Each record (or
transaction) is written on its own line in the file. A flat file takes up much less
space than a structured file.
LP-Tree. The Least Pattern Tree (LP-Tree) structure is constructed from items sorted
in descending order of support. After that, several iterations take place over the
LP-Tree to mine and generate the desired patterns or least association rules.
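
To make the tree construction step concrete, here is a minimal sketch of a generic prefix tree built from ordered transactions, in the style of an FP-Tree. It is not the authors' LP-Tree: the header table, node links and the mining step are omitted, and the class and function names are hypothetical.

```python
class Node:
    """One prefix-tree node: an item, its count, and its children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_prefix_tree(ordered_transactions):
    """Insert each support-ordered transaction as a path, sharing common prefixes."""
    root = Node()
    for transaction in ordered_transactions:
        node = root
        for item in transaction:
            if item not in node.children:
                node.children[item] = Node(item)
            node = node.children[item]
            node.count += 1
    return root


# Transactions are assumed to be already sorted by descending item support.
tree = build_prefix_tree([["a", "b"], ["a", "c"], ["a", "b", "c"]])
a = tree.children["a"]
print(a.count, a.children["b"].count, a.children["c"].count)  # 3 2 1
```

Because every transaction starts with the most frequent item, the three paths share the single node `a`, which is the compression effect that ordering by support is meant to achieve.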
Weighted Support Association Rules (WSAR*). WSAR* is computed for each
association rule. The item weights are assigned randomly (they can also be fixed) in
the range 0.1 to 1.0. The supports of the antecedent, the consequent and the
antecedent-consequent pair are used in calculating the WSAR*.
Weighted Least Association Rules (WELAR). Only association rules that have a
WSAR* value equal to or greater than the predefined minimum WSAR* are classified
as significant rules. These rules consist of combinations of both least and frequent
itemsets.
Visualization. Weighted least association rules are presented in 3-D. Besides the value
of WSAR*, the values of other measurements are also displayed for comparison
and further analysis.
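
The last two components can be sketched as a filter followed by a layout step. This is an illustrative outline only: the rule records, their field names and the measure values below are invented, and the layout simply produces (rule index, measure index, height) triples that a 3-D bar plotting routine could consume.

```python
def select_significant(rules, min_wsar):
    """Keep rules whose WSAR* is at least min_wsar, highest WSAR* first."""
    kept = [r for r in rules if r["wsar"] >= min_wsar]
    return sorted(kept, key=lambda r: r["wsar"], reverse=True)


def bar_coordinates(rules, measures):
    """One (x, y, height) triple per bar: x = rule index, y = measure index."""
    return [
        (x, y, rule[measure])
        for x, rule in enumerate(rules)
        for y, measure in enumerate(measures)
    ]


# Hypothetical rules and measure values, for illustration only.
rules = [
    {"rule": "A -> B", "support": 0.10, "wsar": 0.7},
    {"rule": "C -> D", "support": 0.30, "wsar": 0.2},
    {"rule": "E -> F", "support": 0.05, "wsar": 0.9},
]
significant = select_significant(rules, min_wsar=0.5)
bars = bar_coordinates(significant, ["support", "wsar"])

print([r["rule"] for r in significant])  # ['E -> F', 'A -> B']
print(bars)  # [(0, 0, 0.05), (0, 1, 0.9), (1, 0, 0.1), (1, 1, 0.7)]
```

Placing the measures along a second axis is what allows the WSAR* value of each rule to be compared side by side with the other measurements.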
Fig. 1. WLAR visualization model

4 Experimental Results

In this section, we carry out a comparative analysis of the weighted least rules
generated using current weighted association rule measures and the proposed measure,
Weighted Support Association Rules (WSAR*). Abbreviations for each measurement
are shown in Table 1.

Table 1. Abbreviations of different weighted measures


Measures   Description
CAR        Classical Association Rules
WAR        Weighted Association Rules
WSSAR      Weighted Support Significant Association Rules
WSAR*      Weighted Support Association Rules (proposed measure)

4.1 Breast-Cancer Wisconsin Dataset


In the first experiment, we apply our proposed visualization model to the
Breast-Cancer Wisconsin dataset from the UCI Machine Learning Repository. The aim
of the dataset is to diagnose breast cancer according to the Fine-Needle Aspirates
(FNA) test. The dataset was obtained from the machine learning repository of the
University of California, Irvine. It was compiled by Dr. William H. Wolberg from the
University of Wisconsin Hospitals, Madison, WI, United States. It has 11 attributes
and 699 records (as of 15 July 1992), with 458 benign and 241 malignant cases.
Figure 2 visualizes the top 20 weighted least association rules in 3-Dimensional
bar form.

4.2 Mushroom Dataset


In the second experiment, we apply our proposed visualization model to the
Mushroom dataset, which is also from the UCI Machine Learning Repository. It is a
dense dataset that describes 23 species of gilled mushrooms in the Agaricus and
Lepiota family. Table 12 shows the fundamental characteristics of the dataset.
Figure 16 shows a portion of the data taken from the Mushroom dataset. It has 23
attributes and 8,124 records. Fig. 3 visualizes the top 26 weighted least association
rules in 3-Dimensional bar form.

Fig. 2. A three dimensional bar form of visualizing weighted least association rules (WSAR*)
of Breast-Cancer Wisconsin dataset

Fig. 3. A three dimensional bar form of visualizing weighted least association rules (WSAR*)
of Mushroom dataset

5 Conclusion

Current approaches for visualizing association rules still focus on common rules.
To our knowledge, no research has been carried out to visualize weighted least
association rules. Therefore, in this paper we proposed WLAR-Viz (Weighted
Least Association Rules Visualization), an approach for visualizing significant
weighted least association rules using the Weighted Support Association Rules
(WSAR*) measurement. We evaluated the proposed model on the benchmark
Breast-Cancer Wisconsin and Mushroom datasets. The results show that the
3-Dimensional bar form provides a useful way to compare the different types of
measurements in association rules. We believe that the proposed approach can also
be used to capture weighted least association rules from other domain applications.

Acknowledgement. This research is supported by the Fundamental Research Grant
Scheme (FRGS) from the Ministry of Higher Education of Malaysia, Vote No. RDU
100109.

References
1. Agrawal, R., Imielinski, T., Swami, A.: Database Mining: A Performance Perspective.
IEEE Transactions on Knowledge and Data Engineering 5(6), 914–925 (1993)
2. Abdullah, Z., Herawan, T., Deris, M.M.: An Alternative Measure for Mining Weighted
Least Association Rule and Its Framework. In: Zain, J.M., Wan Mohd, W.M.B., El-
Qawasmeh, E. (eds.) ICSECS 2011, Part II. CCIS, vol. 180, pp. 480–494. Springer,
Heidelberg (2011)
3. Herawan, T., Yanto, I.T.R., Deris, M.M.: Soft Set Approach for Maximal Association
Rules Mining. In: Ślęzak, D., Kim, T.-H., Zhang, Y., Ma, J., Chung, K.-I. (eds.) DTA
2009. CCIS, vol. 64, pp. 163–170. Springer, Heidelberg (2009)
4. Abdullah, Z., Herawan, T., Deris, M.M.: Mining Significant Least Association Rules
Using Fast SLP-Growth Algorithm. In: Kim, T.-H., Adeli, H. (eds.)
AST/UCMA/ISA/ACN 2010. LNCS, vol. 6059, pp. 324–336. Springer, Heidelberg (2010)
5. Herawan, T., Deris, M.M.: A soft set approach for association rules mining.
Knowledge-Based Systems 24(1), 186–195 (2011)
6. Abdullah, Z., Herawan, T., Deris, M.M.: Tracing Significant Information using Critical
Least Association Rules Model. To Appear in Special Issue of ICICA 2010, International
Journal of Innovative Computing and Applications x(x), xxx–xxx (2012)
7. Mustafa, M.D., Nabila, N.F., Evans, D.J., Saman, M.Y., Mamat, A.: Association rules on
significant rare data using second support. International Journal of Computer
Mathematics 83(1), 69–80 (2006)
8. Abdullah, Z., Herawan, T., Deris, M.M.: Detecting Critical Least Association Rules in
Medical Databases. International Journal of Modern Physics: Conference Series 9,
464–479 (2012)
9. Abdullah, Z., Herawan, T., Noraziah, A., Deris, M.M.: Extracting Highly Positive
Association Rules from Students Enrollment Data. Procedia Social and Behavioral
Sciences 28, 107–111 (2011)
10. Abdullah, Z., Herawan, T., Noraziah, A., Deris, M.M.: Mining Significant Association
Rules from Educational Data using Critical Relative Support Approach. Procedia Social
and Behavioral Sciences 28, 97–101 (2011)
11. Herawan, T., Vitasari, P., Abdullah, Z.: Mining Interesting Association Rules of Student
Suffering Mathematics Anxiety. In: Zain, J.M., Wan Mohd, W.M.B., El-Qawasmeh, E.
(eds.) ICSECS 2011, Part II. CCIS, vol. 180, pp. 495–508. Springer, Heidelberg (2011)
12. Herawan, T., Vitasari, P., Abdullah, Z.: Mining Interesting Association Rules on Student
Suffering Study Anxieties using SLP-Growth Algorithm. International Journal of
Knowledge and Systems Science 3(2), 24–41 (2012)
13. Abdullah, Z., Herawan, T., Deris, M.M.: An Alternative Measure for Mining Weighted
Least Association Rule and Its Framework. In: Zain, J.M., Wan Mohd, W.M.B., El-
Qawasmeh, E. (eds.) ICSECS 2011, Part II. CCIS, vol. 180, pp. 480–494. Springer,
Heidelberg (2011)
14. Wong, P.C., Whitney, P., Thomas, J.: Visualizing Association Rules for Text Mining. In:
Proceedings 1999 IEEE Symposium on Information Visualization (Info Vis 1999), pp.
120–123 (1999)
15. Bruzzese, D., Buono, P.: Combining Visual Techniques for Association Rules Exploration.
In: Proceedings of the Working Conference on Advanced Visual Interfaces (AVI 2004),
pp. 381–384. ACM Press (2004)
16. Ceglar, A., Roddick, J., Calder, P., Rainsford, C.: Visualizing hierarchical associations.
Knowledge and Information Systems 8, 257–275 (2005)
17. Kopanakis, I., Pelekis, N., Karanikas, H., Mavroudkis, T.: Visual Techniques for the
Interpretation of Data Mining Outcomes. In: Bozanis, P., Houstis, E.N. (eds.) PCI 2005.
LNCS, vol. 3746, pp. 25–35. Springer, Heidelberg (2005)
18. Lopes, A.A., Pinho, R., Paulovich, F.V., Minghim, R.: Visual text mining using
association rules. Computers & Graphics 31, 316–326 (2007)
19. Leung, C.K.S., Irani, P., Carmichael, C.L.: WiFIsViz: Effective Visualization of Frequent
Itemsets. In: Proceeding of Eighth IEEE International Conference on Data Mining (ICDM
2008), pp. 875–880. IEEE Press (2008)
20. Leung, C.K.-S., Irani, P.P., Carmichael, C.L.: FIsViz: A Frequent Itemset Visualizer. In:
Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds.) PAKDD 2008. LNCS (LNAI),
vol. 5012, pp. 644–652. Springer, Heidelberg (2008)