IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 22, NO. 6, JUNE 2010
Abstract—In Data Mining, the usefulness of association rules is strongly limited by the huge amount of delivered rules. To overcome
this drawback, several methods were proposed in the literature such as itemset concise representations, redundancy reduction, and
postprocessing. However, being generally based on statistical information, most of these methods do not guarantee that the extracted
rules are interesting for the user. Thus, it is crucial to help the decision-maker with an efficient postprocessing step in order to reduce
the number of rules. This paper proposes a new interactive approach to prune and filter discovered rules. First, we propose to use
ontologies in order to improve the integration of user knowledge in the postprocessing task. Second, we propose the Rule Schema
formalism extending the specification language proposed by Liu et al. for user expectations. Furthermore, an interactive framework is
designed to assist the user throughout the analysis task. Applying our new approach to voluminous sets of rules, we were able, by
integrating domain expert knowledge in the postprocessing step, to reduce the number of rules to several dozen or fewer. Moreover,
the quality of the filtered rules was validated by the domain expert at various points in the interactive process.
Index Terms—Clustering, classification, and association rules, interactive data exploration and discovery, knowledge management
applications.
1 INTRODUCTION
[...] expressive, and accurate formalism, the more efficient the rule selection. In the Semantic Web¹ field, ontologies are considered the most appropriate representation for expressing the complexity of user knowledge, and several specification languages have been proposed.

1. http://www.w3.org/2001/sw/.

This paper proposes a new interactive postprocessing approach, ARIPSO (Association Rule Interactive post-Processing using Schemas and Ontologies), to prune and filter discovered rules. First, we propose to use Domain Ontologies in order to strengthen the integration of user knowledge in the postprocessing task. Second, we introduce the Rule Schema formalism by extending the specification language proposed by Liu et al. [12] for user beliefs and expectations toward the use of ontology concepts. Furthermore, an interactive and iterative framework is designed to assist the user throughout the analysis task. The interactivity of our approach relies on a set of rule mining operators, defined over the Rule Schemas, that describe the actions the user can perform.

This paper is structured as follows: Section 2 introduces the notations and definitions used throughout the paper. Section 3 justifies our motivations for using ontologies. Section 4 describes the research domain and reviews related work. Section 5 presents the proposed framework and its elements. Section 6 is devoted to the results obtained by applying our method to a questionnaire database. Finally, Section 7 presents conclusions and shows directions for future research.

2 NOTATIONS AND DEFINITIONS

The association rule mining task can be stated as follows: let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of literals, called items. Let $D = \{t_1, t_2, \ldots, t_m\}$ be a set of transactions over $I$. A nonempty subset of $I$ is called an itemset and is written $X = \{i_1, i_2, \ldots, i_k\}$, or, in short, $X = i_1 i_2 \ldots i_k$. The number of items in an itemset is called its length, and an itemset of length $k$ is referred to as a k-itemset. Each transaction $t_i$ contains an itemset $i_1 i_2 \ldots i_k$, with a variable number $k$ of items for each $t_i$.

Definition 1. Let $X \subseteq I$ and $T \subseteq D$. We define the set of all transactions that contain the itemset $X$ as:

$t : I \to D, \quad t(X) = \{t \in D \mid X \subseteq t\}.$

Similarly, we describe the itemsets contained in all the transactions of $T$ by:

$i : D \to I, \quad i(T) = \{x \in I \mid \forall t \in T,\ x \in t\}.$

Definition 2. An association rule is an implication $X \to Y$, where $X$ and $Y$ are two itemsets and $X \cap Y = \emptyset$. The former, $X$, is called the antecedent of the rule, and the latter, $Y$, is called the consequent.

A rule $X \to Y$ is described using two important statistical factors:

• The support of the rule, defined as $\mathrm{supp}(X \to Y) = \mathrm{supp}(X \cup Y) = |t(X \cup Y)| / |D|$, is the proportion of transactions that contain $X \cup Y$. If $\mathrm{supp}(X \to Y) = s$, then $s\%$ of the transactions contain the itemset $X \cup Y$.
• The confidence of the rule, defined as $\mathrm{conf}(X \to Y) = \mathrm{supp}(X \to Y)/\mathrm{supp}(X) = \mathrm{supp}(X \cup Y)/\mathrm{supp}(X) = c$, is the ratio ($c\%$) of the transactions containing $X$ that also contain $Y$.

Starting from a database and two thresholds, minsupp and minconf, for the minimal support and the minimal confidence, respectively, the problem of finding association rules, as discussed in [1], is to generate all rules whose support and confidence are at least equal to the given thresholds. This problem can be divided into two subproblems:

• first, all frequent itemsets are extracted. An itemset $X$ is called a frequent itemset in the transaction database $D$ if $\mathrm{supp}(X) \geq minsupp$;
• then, for each frequent itemset $X$, the set of rules $X \setminus Y \to Y$, with $Y \subset X$ and satisfying $\mathrm{conf}(X \setminus Y \to Y) \geq minconf$, is generated.

If $X$ is frequent and no superset of $X$ is frequent, $X$ is called a maximal itemset.

Theorem 1. Let $X \subseteq I$ and $T \subseteq D$. Let $c_{it}(X)$ denote the composition of the two mappings, $c_{it}(X) = i \circ t(X) = i(t(X))$. Also, let $c_{ti}(T) = t \circ i(T) = t(i(T))$. Then, $c_{it}$ and $c_{ti}$ are both Galois closure operators [13] on itemsets and sets of transactions, respectively.

Definition 3. A closed itemset [14] is an itemset $X$ that is identical to its closure, i.e., $X = c_{it}(X)$. The minimal closed itemset containing an itemset $Y$ is obtained by applying the closure operator $c_{it}$ to $Y$.

Definition 4. Let $R_1$ and $R_2$ be two association rules. We say that rule $R_1$ is more general than rule $R_2$, denoted $R_1 \preceq R_2$, if $R_2$ can be generated by adding items to the antecedent or the consequent of $R_1$. We then say that a rule $R_j$ is redundant [15] if there exists another rule $R_i$ such that $R_i \preceq R_j$. Consequently, in a collection of rules, the nonredundant rules are the most general ones, i.e., the rules having minimal antecedents and consequents with respect to the subset relation.

Definition 5. A rule set is optimal [6] with respect to an interestingness metric if it contains all the rules except those with no greater interestingness than one of their more general rules. An optimal rule set is a subset of a nonredundant rule set.

Definition 6. Formally, an ontology is a quintuple $O = \{C, R, I, H, A\}$ [16]. $C = \{C_1, C_2, \ldots, C_n\}$ is a set of concepts and $R = \{R_1, R_2, \ldots, R_m\}$ is a set of relations defined over concepts. $I$ is a set of instances of concepts, and $H$ is a Directed Acyclic Graph (DAG) defined by the subsumption relation (is-a relation, $\sqsubseteq$) between concepts. We say that $C_2$ is-a $C_1$, written $C_1 \sqsupseteq C_2$, if the concept $C_1$ subsumes the concept $C_2$. $A$ is a set of axioms bringing additional constraints on the ontology.

3 MOTIVATIONS FOR THE GENERAL IMPRESSION IMPROVEMENT USING ONTOLOGIES

Since the early 2000s, in the Semantic Web context, the number of available ontologies has been increasing, covering a wide range of application domains. This is a great advantage for an ontology-based representation of user knowledge.
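The definitions of Section 2 can be made concrete with a small brute-force sketch. It assumes a hypothetical toy transaction database and uses relative support $|t(X)|/|D|$; the function names (`t`, `i`, `supp`, `conf`, `closure`, `frequent_itemsets`, `rules`, `more_general`) are illustrative only. The exhaustive enumeration is exponential in the number of items, so it merely mirrors the two-step definition and is not an efficient miner such as Apriori or the closed-itemset algorithms cited in this paper.

```python
from itertools import combinations

# Hypothetical toy database D: each transaction is a frozenset of items.
D = [
    frozenset({"grape", "pear", "milk"}),
    frozenset({"grape", "milk"}),
    frozenset({"apple", "beef"}),
    frozenset({"grape", "pear"}),
]


def t(X, D):
    """Definition 1: t(X) = transactions of D that contain every item of X."""
    return [tr for tr in D if X <= tr]


def i(T, items):
    """i(T) = items that belong to every transaction of T."""
    return frozenset(x for x in items if all(x in tr for tr in T))


def supp(X, D):
    """Relative support |t(X)| / |D|; supp(X -> Y) is supp(X | Y)."""
    return len(t(X, D)) / len(D)


def conf(X, Y, D):
    """conf(X -> Y) = supp(X u Y) / supp(X)."""
    return supp(X | Y, D) / supp(X, D)


def closure(X, D):
    """Galois closure c_it(X) = i(t(X)); X is closed iff closure(X) == X."""
    return i(t(X, D), frozenset().union(*D))


def frequent_itemsets(D, minsupp):
    """Step 1: every nonempty itemset with supp >= minsupp (exhaustive search)."""
    items = sorted(frozenset().union(*D))
    return [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if supp(frozenset(c), D) >= minsupp
    ]


def rules(D, minsupp, minconf):
    """Step 2: for each frequent X, keep (X - Y) -> Y, Y a nonempty proper subset."""
    out = []
    for X in frequent_itemsets(D, minsupp):
        for k in range(1, len(X)):
            for c in combinations(sorted(X), k):
                Y = frozenset(c)
                if conf(X - Y, Y, D) >= minconf:
                    out.append((X - Y, Y))
    return out


def more_general(r1, r2):
    """Definition 4: r2 only adds items to the antecedent or consequent of r1."""
    (a1, c1), (a2, c2) = r1, r2
    return r1 != r2 and a1 <= a2 and c1 <= c2
```

On this toy database, minsupp = 0.5 and minconf = 0.8 leave only milk → grape and pear → grape, and the closure of {pear} is {grape, pear}, so {pear} is not a closed itemset.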
More recently, Li [6] proposed optimal rule sets, defined with respect to an interestingness metric. An optimal rule set contains all rules except those with no greater interestingness than one of their more general rules.

A set of reduction techniques for redundant rules was proposed and implemented in [21]. The techniques are based on the generalization/specialization of the antecedent/consequent of the rules, and they are divided into methods for multiantecedent rules and methods for multiconsequent rules.

Hahsler et al. [22] were interested in generating association rules from arbitrary sets of itemsets. This makes it possible for a user to propose a set of itemsets and to integrate another set generated by a data mining tool. Since generating rules requires a support counter, the authors proposed an adequate data structure that provides fast access: prefix trees.

Toivonen et al. [9] proposed a novel technique for redundancy reduction based on rule covers. A rule cover is defined as the subset of a rule set that describes the same set of database transactions as the whole rule set. The authors developed an algorithm to efficiently extract a rule cover out of a given set of rules.

The notion of subsumed rules, discussed in [23], describes a set of rules having the same consequent as a given rule and several additional conditions in the antecedent. Bayardo, Jr., et al. [24] proposed a new pruning measure (Minimum Improvement), defined as the difference between the confidences of two rules in a specialization/generalization relationship. The specific rule is pruned if this measure is less than a prespecified threshold, since the rule then brings no more information than the general one.

Nevertheless, both closed and maximal itemset mining still break down at low support thresholds. To address these limitations, Omiecinski [25] proposed three important new interestingness measures: any-confidence, all-confidence, and bond. All these measures indicate the degree of relatedness between the items in an association. The most interesting one, all-confidence, introduced as an alternative to support, represents the minimum confidence of all association rules extracted from an itemset. Bond is also similar to support, but with respect to a subset of the data rather than the entire database.

4.3 User-Driven Association Rule Mining

Interestingness measures were proposed in order to discover only those association rules that are interesting according to these measures. They have been divided into objective measures and subjective measures. Objective measures depend only on the data structure. Many survey papers summarize and compare the definitions and properties of objective measures [26], [27]. Unfortunately, being restricted to data evaluation, objective measures are not sufficient to reduce the number of extracted rules and to capture the interesting ones. Several approaches integrating user knowledge have therefore been proposed.

In addition, subjective measures were proposed to explicitly integrate the decision-maker's knowledge and to offer a better selection of interesting association rules. Silberschatz and Tuzhilin [3] classified subjective measures into unexpectedness (a pattern is interesting if it is surprising to the user) and actionability (a pattern is interesting if it can help the user take some actions).

As early as 1994, the KEFIR system [28] introduced the notions of key finding and deviation. Grouped in findings, deviations represent the difference between the actual and the expected values. KEFIR defines the interestingness of a key finding in terms of the estimated benefits and potential savings of taking corrective actions that restore the deviation to its expected value. These corrective actions are specified in advance by the domain expert for various classes of deviations.

Later, Klemettinen et al. [29] proposed templates to describe the form of interesting rules (inclusive templates) and uninteresting rules (restrictive templates). The idea of using templates for association rule extraction was reused in [30]. Other approaches use a rule-like formalism to express user expectations [3], [12], [31], and the discovered association rules are pruned/summarized by comparing them to these expectations.

Imielinski et al. [32] proposed a query language for association rule pruning based on SQL, called M-SQL. It allows imposing constraints on the condition and/or the consequent of the association rules. In the same domain of query-based association rule pruning, but more constraint-driven, Ng et al. [33] proposed an architecture for exploratory mining of rules. The authors suggested solutions to several problems: the lack of user exploration and control, the rigid notion of relationship, and the lack of focus. In order to overcome these problems, Ng et al. proposed a new query language called Constrained Association Query, and they pointed out the importance of user feedback and of user flexibility in choosing interestingness metrics.

Another related approach was proposed by An et al. [34], where the authors introduced domain knowledge in order to prune and summarize discovered rules. Their first algorithm uses a user-defined data taxonomy to describe the semantic distance between rules and to group the rules. Their second algorithm groups the discovered rules that share at least one item in the antecedent and the consequent.

In 2007, a new methodology was proposed in [35] to prune and organize rules with the same consequent. The authors suggested transforming the database into an association rule base in order to extract second-level association rules. Called metarules, the extracted rules $r_1 \to r_2$ express relations between two association rules and help prune/group the discovered rules.

4.4 Ontologies in Data Mining

In the knowledge engineering and Semantic Web fields, ontologies have interested researchers since their first proposition, by Aristotle, in the philosophy branch. Ontologies have evolved over the years from controlled vocabularies to thesauri (glossaries), and later, to taxonomies [36].

In the early 1990s, Gruber defined an ontology as a formal, explicit specification of a shared conceptualization [37]. By conceptualization, we understand here an abstract model of some phenomenon described by its important concepts. Formal denotes the idea that machines should be able to interpret an ontology. Explicit refers to the transparent definition of ontology elements. Finally, shared emphasizes that an ontology brings together knowledge common to a certain group, not individual knowledge.

Several other definitions are proposed in the literature. For instance, in [38], an ontology is viewed as a logical theory [...]
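The all-confidence measure of Omiecinski [25], described in the related work above as the minimum confidence of all association rules extracted from an itemset, can be checked numerically. The sketch below uses hypothetical data and illustrative function names. It computes the measure both by enumerating every rule A → X \ A and with the closed form supp(X) / max over x in X of supp({x}). The two agree because support is antimonotone, so the largest antecedent support, which yields the smallest confidence, is reached by a single item.

```python
from itertools import combinations

# Hypothetical toy database; X is assumed to have nonzero support and >= 2 items.
D = [
    {"grape", "pear"},
    {"grape", "pear"},
    {"grape"},
    {"grape"},
    {"pear", "milk"},
]


def count(X, D):
    """Absolute support: number of transactions containing every item of X."""
    return sum(1 for tr in D if set(X) <= tr)


def all_confidence_by_rules(X, D):
    """Minimum confidence over all rules A -> X - A, A a nonempty proper subset."""
    X = frozenset(X)
    return min(
        count(X, D) / count(A, D)
        for k in range(1, len(X))
        for A in combinations(sorted(X), k)
    )


def all_confidence(X, D):
    """Closed form: count(X) divided by the largest single-item count in X."""
    return count(X, D) / max(count({x}, D) for x in X)
```

For {grape, pear}, the rule grape → pear has confidence 2/4 and pear → grape has 2/3, so both functions return 0.5.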
[...] defining all food items whose Boolean property isDiet is set to TRUE. For our example, isDiet is instantiated as follows:

$\mathit{isDiet} : \{(\mathit{apple}, \mathit{TRUE}), (\mathit{chicken}, \mathit{TRUE})\}.$

Now, we are able to connect the ontology and the database. As already presented, leaf-concepts are connected to items in a very simple way; for example, the concept grape is connected to the item of the same name, $f(\mathit{grape}) = \mathit{grape}$. In contrast, the generalized concept Fruits is connected through its three subsumed concepts:

$f(\mathit{Fruits}) = \{\mathit{grape}, \mathit{pear}, \mathit{apple}\}.$

The connection for the other concepts is described similarly.

More interestingly, the restriction concept DietProducts is connected through those concepts that satisfy the restrictions in its definition. Thus, DietProducts is connected through the concepts apple and chicken:

$f(\mathit{DietProducts}) = \{\mathit{apple}, \mathit{chicken}\}.$

5.4 Operations over Rule Schemas

The rule schema filter is based on operators applied over rule schemas, which allow the user to perform several actions over the discovered rules. We propose two important operators: the pruning and the filtering operators. The filtering operator is composed of three different operators: conforming, unexpectedness, and exception. We reuse two operators proposed by Liu et al., conforming and unexpectedness, and we bring two new operators into the postprocessing task: pruning and exceptions.

These four operators are presented in this section. To this end, let us consider an implicative rule schema $RS_1: (\langle X \to Y \rangle)$, a nonimplicative rule schema $RS_2: (\langle U, V \rangle)$, and an association rule $AR_1: A \to B$, where $X$, $Y$, $U$, and $V$ are ontology concepts, and $A$ and $B$ are itemsets.

Definition 8. Let us consider an ontology concept $C$ associated in the database to $f(C) = \{y_1, \ldots, y_n\}$ and an itemset $X = \{x_1, \ldots, x_k\}$. We say that the itemset $X$ is conforming to the concept $C$ if $\exists y_i \in f(C),\ y_i \in X$.

Pruning. The pruning operator allows the user to remove families of rules that he/she considers uninteresting. In most databases, there are relations between items that we consider obvious or that we already know, so it is not useful to find these relations among the discovered associations. The pruning operator applied over a rule schema, $P(RS)$, eliminates all association rules matching the rule schema. To extract all the rules matching a rule schema, the conforming operator is used.

Conforming. The conforming operator applied over a rule schema, $C(RS)$, confirms an implication or finds the implication between several concepts. For a nonimplicative rule schema, the rules matching all the elements of the schema are filtered. For an implicative rule schema, the condition and the conclusion of the association rule must match those of the schema.

Example. The rule $AR_1$ is selected by the operator $C(RS_1)$ if the condition and the conclusion of $AR_1$ are conforming to the condition and, respectively, the conclusion of $RS_1$. In terms of the ontological definition of concepts, $AR_1$ is conforming to $RS_1$ if the itemset $A$ is conforming to the concept $X$ and the itemset $B$ is conforming to the concept $Y$.

Similarly, rule $AR_1$ is filtered by $C(RS_2)$ if the condition and/or the conclusion of $AR_1$ are conforming to the schema $RS_2$. In other words, if the itemset $A \cup B$ is conforming to the concept $U$ and the itemset $A \cup B$ is conforming to the concept $V$, then the rule $AR_1$ is conforming to the nonimplicative rule schema $RS_2$.

Unexpectedness. The unexpectedness operator $U(RS)$, of greater interest to the user, filters the rules that have a surprise effect for the user. Such rules interest the user more than the conforming ones since, generally, a decision-maker seeks to discover new knowledge with regard to his/her prior knowledge. Several types of unexpected rules can be filtered according to the rule schema: rules unexpected regarding the antecedent, $U_p$; rules unexpected regarding the consequent, $U_c$; and rules unexpected regarding both sides, $U_b$.

For instance, the operator $U_p(RS_1)$ extracts the rule $AR_1$ if it is unexpected with respect to the condition of the rule schema $RS_1$, that is, if the rule consequent $B$ is conforming to the concept $Y$ while the condition itemset $A$ is not conforming to the concept $X$. The two other unexpectedness operators are defined in a similar way.

Exceptions. Finally, the exception operator is defined only over implicative rule schemas (i.e., $RS_1$) and extracts the rules conforming to the following new implicative rule schema: $X \wedge Z \to \neg Y$, where $Z$ is a set of items.

Example. Let us consider the implicative rule schema $RS: \mathit{Fruits} \to \mathit{EcologicalProducts}$, where

$f(\mathit{Fruits}) = \{\mathit{grape}, \mathit{apple}, \mathit{pear}\}$ and $f(\mathit{EcologicalProducts}) = \{\mathit{grape}, \mathit{milk}\}$,

and $I = \{\mathit{grape}, \mathit{apple}, \mathit{pear}, \mathit{milk}, \mathit{beef}\}$ (see Fig. 1 for the supermarket taxonomy). Also, let us consider that the following set of association rules is extracted by traditional techniques:

$R_1: \mathit{grape}, \mathit{beef} \to \mathit{milk}, \mathit{pear}$;
$R_2: \mathit{apple} \to \mathit{beef}$;
$R_3: \mathit{apple}, \mathit{pear}, \mathit{milk} \to \mathit{grape}$;
$R_4: \mathit{grape}, \mathit{pear} \to \mathit{apple}$;
$R_5: \mathit{beef} \to \mathit{grape}$;
$R_6: \mathit{milk}, \mathit{beef} \to \mathit{grape}$.

Thus, the operator $C(RS)$ filters the rules $R_1$ and $R_3$, the operator $U_p(RS)$ filters the rules $R_5$ and $R_6$, and the operator $U_c(RS)$ filters the rules $R_2$ and $R_4$. The pruning operator $P(RS)$ prunes the rules selected by the conforming operator $C(RS)$. Let us explain the operator $U_c(RS)$: the $U_c$ operator filters the rules whose conclusion itemset is [...]
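The concept-to-item connection, Definition 8, and the operators of Section 5.4 can be sketched on the supermarket example above. This is a simplified illustration under stated assumptions, not the ARIPSO implementation. The dictionary-based is-a hierarchy `SUBSUMES` and the `IS_DIET` property are hypothetical stand-ins for the ontology (Fig. 1 is not reproduced here), EcologicalProducts is given its item set directly as in the example, and rules are (antecedent, consequent) pairs. The exception operator and $U_b$ are left out because their matching conditions are not spelled out in full in this section.

```python
# Hypothetical is-a hierarchy: each generalized concept lists its subsumed concepts.
SUBSUMES = {"Fruits": ["grape", "pear", "apple"]}
# Hypothetical Boolean property used by the restriction concept DietProducts.
IS_DIET = {"apple": True, "chicken": True}


def items_of(concept):
    """f(C): a leaf concept maps to the item of the same name; a generalized
    concept maps to the union of the items of the concepts it subsumes."""
    children = SUBSUMES.get(concept)
    if not children:
        return {concept}
    return set().union(*(items_of(c) for c in children))


def diet_products():
    """Restriction concept: the concepts whose isDiet property is TRUE."""
    return {item for item, flag in IS_DIET.items() if flag}


def conforms(itemset, concept_items):
    """Definition 8: the itemset conforms to C if some y in f(C) belongs to it."""
    return not concept_items.isdisjoint(itemset)


def conforming(rules, X, Y):
    """C(RS) for RS = <X -> Y>: antecedent conforms to X and consequent to Y."""
    return [n for n, (a, b) in rules.items() if conforms(a, X) and conforms(b, Y)]


def unexpected_antecedent(rules, X, Y):
    """U_p(RS): consequent conforms to Y, antecedent does not conform to X."""
    return [n for n, (a, b) in rules.items() if not conforms(a, X) and conforms(b, Y)]


def unexpected_consequent(rules, X, Y):
    """U_c(RS): antecedent conforms to X, consequent does not conform to Y."""
    return [n for n, (a, b) in rules.items() if conforms(a, X) and not conforms(b, Y)]


def prune(rules, X, Y):
    """P(RS): drop the rules that C(RS) would select."""
    selected = set(conforming(rules, X, Y))
    return [n for n in rules if n not in selected]


def conforming_nonimplicative(rules, U, V):
    """C(RS) for RS = <U, V>: the itemset A u B conforms to both U and V."""
    return [n for n, (a, b) in rules.items() if conforms(a | b, U) and conforms(a | b, V)]


FRUITS = items_of("Fruits")
ECOLOGICAL = {"grape", "milk"}  # f(EcologicalProducts), given as in the example
RULES = {
    "R1": ({"grape", "beef"}, {"milk", "pear"}),
    "R2": ({"apple"}, {"beef"}),
    "R3": ({"apple", "pear", "milk"}, {"grape"}),
    "R4": ({"grape", "pear"}, {"apple"}),
    "R5": ({"beef"}, {"grape"}),
    "R6": ({"milk", "beef"}, {"grape"}),
}
```

With these inputs the sketch reproduces the partition stated in the example: `conforming` returns R1 and R3, `unexpected_antecedent` returns R5 and R6, `unexpected_consequent` returns R2 and R4, and `prune` keeps R2, R4, R5, and R6.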
5.5 Filters

In order to reduce the number of rules, three filters are integrated into the framework: operators applied over rule schemas, the minimum improvement constraint filter [24], and the item-relatedness filter [45].

The minimum improvement constraint filter [24] (MICF) selects only those rules whose confidence exceeds the confidence of each of their simplifications by at least minimp.

Example. Let us consider the following three association rules: [...]

[...]
$R_1: \mathit{grape}, \mathit{pear}, \mathit{butter} \to \mathit{milk}$;
$IR(R_1) = \min(d(\mathit{grape}, \mathit{milk}), d(\mathit{pear}, \mathit{milk}), d(\mathit{butter}, \mathit{milk})) = \min(4, 4, 2) = 2.$
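The two data-driven filters of Section 5.5 can be sketched as follows. Both parts rest on assumptions. The transactions are hypothetical. A simplification of A → B is taken to be any rule A' → B with A' a proper subset of A, including the empty antecedent, which is our reading of [24]. The item distance d is taken as the number of edges between two items in a hypothetical taxonomy (`PARENT`), chosen so that it reproduces the distances 4, 4, and 2 of the $IR(R_1)$ example; the paper's Fig. 1 taxonomy is not reproduced here. How IRF thresholds IR is not specified in this section, so only IR itself is computed.

```python
from itertools import combinations

# Hypothetical transactions used only to illustrate MICF.
D = [
    {"grape", "pear", "milk"},
    {"grape", "pear", "milk"},
    {"grape", "milk"},
    {"pear"},
    {"grape"},
]


def count(X, D):
    """Transactions containing every item of X (the empty set matches all)."""
    return sum(1 for tr in D if set(X) <= tr)


def conf(A, B, D):
    """conf(A -> B) computed from absolute counts."""
    return count(set(A) | set(B), D) / count(A, D)


def improvement(A, B, D):
    """min over proper subsets A' of A of conf(A -> B) - conf(A' -> B)."""
    base = conf(A, B, D)
    return min(
        base - conf(sub, B, D)
        for k in range(len(A))  # k = 0 yields the empty antecedent
        for sub in combinations(sorted(A), k)
    )


def micf(rules, D, minimp):
    """Keep the rules whose improvement is at least minimp (>= is assumed)."""
    return [(A, B) for A, B in rules if improvement(A, B, D) >= minimp]


# Hypothetical taxonomy, child -> parent.
PARENT = {
    "grape": "Fruits", "pear": "Fruits",
    "milk": "DairyProducts", "butter": "DairyProducts",
    "Fruits": "Products", "DairyProducts": "Products",
}


def path_to_root(x):
    """x followed by all of its ancestors up to the taxonomy root."""
    path = [x]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def d(a, b):
    """Number of edges between a and b through their lowest common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    common = next(n for n in pa if n in pb)
    return pa.index(common) + pb.index(common)


def item_relatedness(A, B):
    """IR(A -> B): minimum d(a, b) over antecedent items a and consequent items b."""
    return min(d(a, b) for a in A for b in B)
```

On these toy transactions, grape, pear → milk has improvement 0.25 (its weakest simplification is grape → milk, with confidence 0.75), so it survives minimp = 0.2, whereas grape → milk improves on the empty-antecedent rule only by about 0.15 and is removed.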
TABLE 2
Pruning Rule Schemas

TABLE 3
Filtering Rule Schemas

TABLE 4
Pruning Rate for Each Filter Combination

TABLE 5
Notation Meaning

TABLE 6
Rates for Rule Schema Filters Applied after the Other Three Filter Combinations
[...] q17 concepts, are subsumed by the concept Stairwell. Similarly, for the second rule, q8 and q9 are subsumed by the CalmDistrict concept. Thus, the expert applied the IRF filter, and only three rules were filtered. One of these rules attracted the interest of the expert:

q15 = 4, q16 = 4, q97 = 4 ==> q9 = 4;
Support = 2.3%, Confidence = 79.1%,

which can be translated as: if a client is not satisfied with the cleaning of the close surroundings and the entry hall, and if he is not satisfied with the service charges, then, with a confidence of 79.1 percent, he considers that his district has a bad reputation. This rule is very interesting because the expert thought that the state of the building does not influence the opinion concerning the district, but the rule indicates that it does.

7 CONCLUSION

This paper discusses the problem of selecting interesting association rules from huge volumes of discovered rules. The major contributions of our paper are stated below. First, we propose to integrate user knowledge in association rule mining using two different types of formalism: ontologies and rule schemas. On the one hand, domain ontologies improve the integration of the user's domain knowledge concerning the database field in the postprocessing step. On the other hand, we propose a new formalism, called Rule Schemas, extending the specification language proposed by Liu et al. This formalism is especially used to express the user's expectations and goals concerning the discovered rules.

Second, a set of operators, applicable over the rule schemas, is proposed in order to guide the user throughout the postprocessing step. Thus, several types of actions, such as pruning and filtering, are available to the user. Finally, the interactivity of our ARIPSO framework, relying on the set of rule mining operators, assists the user throughout the analysis task and allows him/her to select interesting rules more easily by reiterating the rule filtering process.

By applying our new approach to a voluminous questionnaire database, we integrated domain expert knowledge into the postprocessing step and reduced the number of rules to several dozen or fewer. Moreover, the quality of the filtered rules was validated by the expert throughout the interactive process.

ACKNOWLEDGMENTS

The authors would like to thank Nantes Habitat, the Public Housing Unit in Nantes, France, and especially Ms. Christelle Le Bouter, and also M. Loic Glimois for supporting this work.

REFERENCES

[1] R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules between Sets of Items in Large Databases," Proc. ACM SIGMOD, pp. 207-216, 1993.
[2] U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996.
[3] A. Silberschatz and A. Tuzhilin, "What Makes Patterns Interesting in Knowledge Discovery Systems," IEEE Trans. Knowledge and Data Eng., vol. 8, no. 6, pp. 970-974, Dec. 1996.
[4] M.J. Zaki and M. Ogihara, "Theoretical Foundations of Association Rules," Proc. Workshop Research Issues in Data Mining and Knowledge Discovery (DMKD '98), pp. 1-8, June 1998.
[5] D. Burdick, M. Calimlim, J. Flannick, J. Gehrke, and T. Yiu, "Mafia: A Maximal Frequent Itemset Algorithm," IEEE Trans. Knowledge and Data Eng., vol. 17, no. 11, pp. 1490-1504, Nov. 2005.
[6] J. Li, "On Optimal Rule Discovery," IEEE Trans. Knowledge and Data Eng., vol. 18, no. 4, pp. 460-471, Apr. 2006.
[7] M.J. Zaki, "Generating Non-Redundant Association Rules," Proc. Int'l Conf. Knowledge Discovery and Data Mining, pp. 34-43, 2000.
[8] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, "Efficient Mining of Association Rules Using Closed Itemset Lattices," Information Systems, vol. 24, pp. 25-46, 1999.
[9] H. Toivonen, M. Klemettinen, P. Ronkainen, K. Hatonen, and H. Mannila, "Pruning and Grouping of Discovered Association Rules," Proc. ECML-95 Workshop Statistics, Machine Learning, and Knowledge Discovery in Databases, pp. 47-52, 1995.
[10] B. Baesens, S. Viaene, and J. Vanthienen, "Post-Processing of Association Rules," Proc. Workshop Post-Processing in Machine Learning and Data Mining: Interpretation, Visualization, Integration, and Related Topics with Sixth ACM SIGKDD, pp. 20-23, 2000.
[11] J. Blanchard, F. Guillet, and H. Briand, "A User-Driven and Quality-Oriented Visualization for Mining Association Rules," Proc. Third IEEE Int'l Conf. Data Mining, pp. 493-496, 2003.
[12] B. Liu, W. Hsu, K. Wang, and S. Chen, "Visually Aided Exploration of Interesting Association Rules," Proc. Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD), pp. 380-389, 1999.
[13] G. Birkhoff, Lattice Theory, vol. 25. Am. Math. Soc., 1967.
[14] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, "Discovering Frequent Closed Itemsets for Association Rules," Proc. Seventh Int'l Conf. Database Theory (ICDT '99), pp. 398-416, 1999.
[15] M. Zaki, "Mining Non-Redundant Association Rules," Data Mining and Knowledge Discovery, vol. 9, pp. 223-248, 2004.
[16] A. Maedche and S. Staab, "Ontology Learning for the Semantic Web," IEEE Intelligent Systems, vol. 16, no. 2, pp. 72-79, Mar. 2001.
[17] B. Liu, W. Hsu, L.-F. Mun, and H.-Y. Lee, "Finding Interesting Patterns Using User Expectations," IEEE Trans. Knowledge and Data Eng., vol. 11, no. 6, pp. 817-832, Nov. 1999.
[18] I. Horrocks and P.F. Patel-Schneider, "Reducing OWL Entailment to Description Logic Satisfiability," J. Web Semantics, vol. 2870, pp. 17-29, 2003.
[19] J. Pei, J. Han, and R. Mao, "Closet: An Efficient Algorithm for Mining Frequent Closed Itemsets," Proc. ACM SIGMOD Workshop Research Issues in Data Mining and Knowledge Discovery, pp. 21-30, 2000.
[20] M.J. Zaki and C.J. Hsiao, "Charm: An Efficient Algorithm for Closed Itemset Mining," Proc. Second SIAM Int'l Conf. Data Mining, pp. 34-43, 2002.
[21] M.Z. Ashrafi, D. Taniar, and K. Smith, "Redundant Association Rules Reduction Techniques," AI 2005: Advances in Artificial Intelligence, Proc. 18th Australian Joint Conf. Artificial Intelligence, pp. 254-263, 2005.
[22] M. Hahsler, C. Buchta, and K. Hornik, "Selective Association Rule Generation," Computational Statistics, vol. 23, no. 2, pp. 303-315, Kluwer Academic Publishers, 2008.
[23] R.J. Bayardo, Jr. and R. Agrawal, "Mining the Most Interesting Rules," Proc. ACM SIGKDD, pp. 145-154, 1999.
[24] R.J. Bayardo, Jr., R. Agrawal, and D. Gunopulos, "Constraint-Based Rule Mining in Large, Dense Databases," Proc. 15th Int'l Conf. Data Eng. (ICDE '99), pp. 188-197, 1999.
[25] E.R. Omiecinski, "Alternative Interest Measures for Mining Associations in Databases," IEEE Trans. Knowledge and Data Eng., vol. 15, no. 1, pp. 57-69, Jan./Feb. 2003.
[26] F. Guillet and H. Hamilton, Quality Measures in Data Mining. Springer, 2007.
[27] P.-N. Tan, V. Kumar, and J. Srivastava, "Selecting the Right Objective Measure for Association Analysis," Information Systems, vol. 29, pp. 293-313, 2004.
[28] G. Piatetsky-Shapiro and C.J. Matheus, "The Interestingness of Deviations," Proc. AAAI '94 Workshop Knowledge Discovery in Databases, pp. 25-36, 1994.
[29] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A.I. Verkamo, "Finding Interesting Rules from Large Sets of Discovered Association Rules," Proc. Int'l Conf. Information and Knowledge Management (CIKM), pp. 401-407, 1994.
MARINICA AND GUILLET: KNOWLEDGE-BASED INTERACTIVE POSTMINING OF ASSOCIATION RULES USING ONTOLOGIES
[30] E. Baralis and G. Psaila, "Designing Templates for Mining Association Rules," J. Intelligent Information Systems, vol. 9, pp. 7-32, 1997.
[31] B. Padmanabhan and A. Tuzhilin, "Unexpectedness as a Measure of Interestingness in Knowledge Discovery," Proc. Workshop Information Technology and Systems (WITS), pp. 81-90, 1997.
[32] T. Imielinski, A. Virmani, and A. Abdulghani, "Datamine: Application Programming Interface and Query Language for Database Mining," Proc. Int'l Conf. Knowledge Discovery and Data Mining (KDD), pp. 256-262, http://www.aaai.org/Papers/KDD/1996/KDD96-042.pdf, 1996.
[33] R.T. Ng, L.V.S. Lakshmanan, J. Han, and A. Pang, "Exploratory Mining and Pruning Optimizations of Constrained Association Rules," Proc. ACM SIGMOD Int'l Conf. Management of Data, vol. 27, pp. 13-24, 1998.
[34] A. An, S. Khan, and X. Huang, "Objective and Subjective Algorithms for Grouping Association Rules," Proc. Third IEEE Int'l Conf. Data Mining (ICDM '03), pp. 477-480, 2003.
[35] A. Berrado and G.C. Runger, "Using Metarules to Organize and Group Discovered Association Rules," Data Mining and Knowledge Discovery, vol. 14, no. 3, pp. 409-431, 2007.
[36] M. Uschold and M. Grüninger, "Ontologies: Principles, Methods, and Applications," Knowledge Eng. Rev., vol. 11, pp. 93-155, 1996.
[37] T.R. Gruber, "A Translation Approach to Portable Ontology Specifications," Knowledge Acquisition, vol. 5, pp. 199-220, 1993.
[38] N. Guarino, "Formal Ontology in Information Systems," Proc. First Int'l Conf. Formal Ontology in Information Systems, pp. 3-15, 1998.
[39] H. Nigro, S.G. Cisaro, and D. Xodo, Data Mining with Ontologies: Implementations, Findings and Frameworks. Idea Group, Inc., 2007.
[40] R. Srikant and R. Agrawal, "Mining Generalized Association Rules," Proc. 21st Int'l Conf. Very Large Databases, pp. 407-419, http://citeseer.ist.psu.edu/srikant95mining.html, 1995.
[41] V. Svatek and M. Tomeckova, "Roles of Medical Ontology in Association Mining CRISP-DM Cycle," Proc. Workshop Knowledge Discovery and Ontologies in ECML/PKDD, 2004.
[42] X. Zhou and J. Geller, "Raising, to Enhance Rule Mining in Web Marketing with the Use of an Ontology," Data Mining with Ontologies: Implementations, Findings and Frameworks, pp. 18-36, Idea Group Reference, 2007.
[43] M.A. Domingues and S.A. Rezende, "Using Taxonomies to Facilitate the Analysis of the Association Rules," Proc. Second Int'l Workshop Knowledge Discovery and Ontologies, held with ECML/PKDD, pp. 59-66, 2005.
[44] A. Bellandi, B. Furletti, V. Grossi, and A. Romei, "Ontology-Driven Association Rule Extraction: A Case Study," Proc. Workshop Context and Ontologies: Representation and Reasoning, pp. 1-10, 2007.
[45] R. Natarajan and B. Shekar, "A Relatedness-Based Data-Driven Approach to Determination of Interestingness of Association Rules," Proc. 2005 ACM Symp. Applied Computing (SAC), pp. 551-552, 2005.
[46] A.C.B. Garcia and A.S. Vivacqua, "Does Ontology Help Make Sense of a Complex World or Does It Create a Biased Interpretation?" Proc. Sensemaking Workshop in CHI '08 Conf. Human Factors in Computing Systems, 2008.
[47] A.C.B. Garcia, I. Ferraz, and A.S. Vivacqua, "From Data to Knowledge Mining," Artificial Intelligence for Eng. Design, Analysis and Manufacturing, vol. 23, pp. 427-441, 2009.
[48] L.M. Garshol, "Metadata? Thesauri? Taxonomies? Topic Maps! Making Sense of It All," J. Information Science, vol. 30, no. 4, pp. 378-391, 2004.
[49] I. Horrocks and P.F. Patel-Schneider, "A Proposal for an OWL Rules Language," Proc. 13th Int'l Conf. World Wide Web, pp. 723-731, 2004.
[50] W.E. Grosso, H. Eriksson, R.W. Fergerson, J.H. Gennari, S.W. Tu, and M.A. Musen, "Knowledge Modeling at the Millennium (the Design and Evolution of Protege-2000)," Proc. 12th Workshop Knowledge Acquisition, Modeling and Management (KAW '99), 1999.
[51] M.-A. Storey, N.F. Noy, M. Musen, C. Best, R. Fergerson, and N. Ernst, "Jambalaya: An Interactive Environment for Exploring Ontologies," Proc. Seventh Int'l Conf. Intelligent User Interfaces (IUI '02), pp. 239-239, 2002.

Claudia Marinica received the master's degree in "KDD" from the Polytechnique School of Nantes University in 2006, and the Computer Science degree from Politehnica University of Bucharest, Romania, in 2006. She is currently working toward the PhD degree in computer science in the "Knowledge and Decision" Team, LINA UMR CNRS 6241, at the Polytechnique School of Nantes University, France. Her main research interests are in Association Rule Mining and Semantic Web.

Fabrice Guillet received the PhD degree in computer sciences from the Ecole Nationale Superieure des Telecommunications de Bretagne in 1995. He has been an associate professor (HdR) in computer science at Polytech'Nantes, and a member of the "KnOwledge and Decision" team (KOD) in the Nantes-Atlantic Laboratory of Computer Sciences (LINA UMR CNRS 6241), since 1997. He is a founder of the "Knowledge Extraction and Management" French-speaking research association (EGC, www.egc.asso.fr). His research interests include knowledge quality and visualization in the frameworks of Data Mining and Knowledge Engineering. He has recently coedited two refereed books of chapters, entitled Quality Measures in Data Mining (Springer, 2007) and Statistical Implicative Analysis: Theory and Applications (Springer, 2008).