
Improved collaborative filtering recommendations using quantitative implication rules mining in implication field

Hoang Tan Nguyen
Department of Information and Communications of Dong Thap
12 Tran Phu Street, Ward 1, Cao Lanh City, Dong Thap, Viet Nam
hoangntdt@gmail.com

Hung Huu Huynh
University of Science and Technology, Da Nang University
54 Nguyen Luong Bang Street, Lien Chieu District, Da Nang City, Viet Nam
hhhung@dut.udn.vn

Lan Phuong Phan
Can Tho University
3/2 Street, Ninh Kieu District, Can Tho City, Viet Nam
pplan@cit.ctu.edu.vn

Hiep Xuan Huynh
Can Tho University
3/2 Street, Ninh Kieu District, Can Tho City, Viet Nam
hxhiep@ctu.edu.vn

ABSTRACT
Collaborative filtering recommendation based on association rule mining has become a research trend in the field of recommender systems. However, most research results focus only on binary data, whereas in practice sets of transactions are usually quantitative. Moreover, association rule mining algorithms are designed and optimized for basket analysis, so they need to be adjusted to better serve recommendation. A solution that lets recommender systems handle association rules on both binary and quantitative data, while also improving the quality of the recommendations based on the rule set, therefore remains a challenge. This paper proposes a new approach to improve the accuracy, the performance and the recommendation time of a model based on mining quantitative implication rules in the implication field.

Keywords
Implication field, quantitative implication rules, implication index, implication intensity.

1. INTRODUCTION
The development of the Internet and the Internet of Things (IoT) has caused data overload, so users face increasing difficulty in finding the right items for their needs from the huge collection of items available. In such a situation, recommender systems [1][2][3] can provide better options according to the needs and previous preferences of users: they try to determine the needs and preferences of each user and present the most appropriate options using a number of well-defined algorithms. Because of this, algorithms for recommender systems have attracted the attention of researchers for practical applications. Among them, collaborative filtering algorithms [4] are widely used and most effective. In particular, collaborative filtering based on association rule mining (ARM) is a current trend to further improve the effectiveness of recommendations, especially in cold-start situations [5][6][7]. However, algorithms for mining association rules focus on binary data only, whereas transaction data sets in real life are mostly quantitative. Moreover, ARM algorithms are designed and optimized for basket analysis, so they need to be adjusted to better serve recommendation. These problems are addressed by solutions such as using fuzzy logic [8][9] to extend the results to quantitative data sets, but those solutions have to trade off between the performance and the accuracy of the algorithm, and they also suffer from information loss.

On the other hand, ARM algorithms based on the traditional probability framework have problems of their own that make the quality of the rules insufficient for recommendation:

- The confidence measure used for generating rules is insensitive (unchanged) when the size of the consequence or the size of the population changes [10]. Besides, the confidence measure ignores the consequence, so it does not reflect the relationship between the premise and the consequence of a rule [10];

- The support measure decreases as the size of the premise of the rules increases;

- The number of generated rules increases exponentially with the number of itemsets [11][12]. As a result, a very large number of rules can be extracted even from small datasets, which can cause time and memory problems. In ARM algorithms, the higher the likelihood (the higher the support and confidence), the stronger the rule. In fact, the unlikelihood also matters: the higher the likelihood and the lower the unlikelihood, the stronger the rule. To keep the number of rules manageable, minimum thresholds on support and confidence are used, but identifying these thresholds is itself a challenge for users.

To solve the above problems, the application of statistical implication analysis to recommender systems was proposed in [13][14]. Those studies were based on the variation of the implication index (or implication intensity) in the implication field. In this paper, we continue using statistical implication analysis to propose a new approach for improving collaborative filtering recommendations. The proposed approach is based on a model for mining quantitative implication rules in the implication field, using ARM algorithms and the implication relationship among items.

The paper is organized in five parts. The first introduces the context, the issues to be solved by present systems, and our proposed approach. The second part presents the related contents. The third part describes the recommendation model based on mining quantitative/binary implication rules in the implication field. The fourth part is the experiment. The last part is the conclusion.

2. STATISTICAL IMPLICATION FIELD
2.1 Statistical implication analysis
Statistical implication analysis (SIA) theory [15][16][17], proposed by Régis Gras, studies the implication relationship of data variables. It can be presented as follows:
Let the population $E$ of $n$ objects or individuals be described by a finite set of variables (properties). Let $A$ ($B$) be the subset of $E$ containing the elements $i$ such that $i(a) = true$ ($i(b) = true$), and let $\bar{A}$, $\bar{B}$ be the complements of $A$ and $B$ respectively. Let $n_a = card(A)$, $n_b = card(B)$, $n_{\bar{a}} = n - n_a$, $n_{\bar{b}} = n - n_b$ be the cardinalities of $A$, $B$, $\bar{A}$ and $\bar{B}$ respectively (the numbers of elements possessing the properties $a$, $b$, $\bar{a}$ and $\bar{b}$), and let $n_{a\bar{b}} = card(A \cap \bar{B})$ be the cardinality of the set $A \cap \bar{B}$ (the set of elements $i$ that satisfy $i(a) = true$ and $i(b) = false$); $n_{a\bar{b}}$ is also called the number of counter-examples [15].

The implication relationship between $A$ and $B$ is modeled in statistical implication analysis as follows (see Figure 1).

Figure 1. The illustration of the components of statistical implication analysis by a Venn diagram.

The implication intensity measure $\varphi(a,b)$ of the rule $a \rightarrow b$ is defined by (1) [15][16]:

$$\varphi(a,b) = \begin{cases} 1 - \sum_{s=0}^{n_{a\bar{b}}} \frac{\lambda^s}{s!} e^{-\lambda} = \int_{q(a,\bar{b})}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{t^2}{2}} dt, & \text{if } n_b < n \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

where $\lambda = \frac{n_a n_{\bar{b}}}{n}$ and $q(a,\bar{b})$ is the implication index. Under the approximation condition (e.g. $\lambda \geq 4$), the implication index approximately follows the standard normal distribution $N(0,1)$.

For binary variables, the implication index is defined by (2) [15]:

$$q(a,\bar{b}) = \frac{n_{a\bar{b}} - \frac{n_a n_{\bar{b}}}{n}}{\sqrt{\frac{n_a n_{\bar{b}}}{n}}} \qquad (2)$$
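For illustration, formulas (1) and (2) can be computed directly from the four counts $n$, $n_a$, $n_b$, $n_{a\bar{b}}$. The following short Python sketch (illustrative only; it is not part of the implicativefield toolkit, and the counts are hypothetical) uses the normal approximation on the right-hand side of (1):

```python
from math import erf, sqrt

def implication_index(n, n_a, n_b, n_ab_bar):
    """Implication index q(a, b-bar) for binary variables, formula (2)."""
    lam = n_a * (n - n_b) / n          # lambda = n_a * n_bbar / n
    return (n_ab_bar - lam) / sqrt(lam)

def implication_intensity(n, n_a, n_b, n_ab_bar):
    """Implication intensity phi(a, b), formula (1), via the N(0,1) approximation."""
    if n_b >= n:                       # degenerate case: phi = 0
        return 0.0
    q = implication_index(n, n_a, n_b, n_ab_bar)
    # P(N(0,1) >= q) = 1 - Phi(q)
    return 1.0 - 0.5 * (1.0 + erf(q / sqrt(2.0)))

# Example: 100 transactions, 20 contain a, 40 contain b, 3 counter-examples.
q = implication_index(100, 20, 40, 3)
phi = implication_intensity(100, 20, 40, 3)
print(f"q = {q:.3f}, phi = {phi:.3f}")   # a strongly negative q gives phi close to 1
```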
For modal variables, the implication index is defined by (3) [15]:

$$q_p(a,\bar{b}) = \frac{\sum_{i \in E} a(i)\bar{b}(i) - \frac{n_a n_{\bar{b}}}{n}}{\sqrt{\frac{(n^2 s_a^2 + n_a^2)(n^2 s_{\bar{b}}^2 + n_{\bar{b}}^2)}{n^3}}} \qquad (3)$$

where $n_a = \sum_{i=1}^{n} a(i)$, $n_b = \sum_{i=1}^{n} b(i)$, $a(i)$ ($\bar{b}(i)$) is the value given by element $i$ for the variable $a$ ($\bar{b}$), and $s_a$, $s_{\bar{b}}$ are the standard deviations of the modal variables $a$ and $\bar{b}$. This extension is still valid for frequency and quantitative variables when they are normalized by (4) [15]:

$$\tilde{a}(i) = a(i) / \max_{i \in E} a(i) \qquad (4)$$

When $a$ and $b$ are binary variables, $q_p(a,\bar{b}) = q(a,\bar{b})$.

The implication rule $a \rightarrow b$ is admissible at the level $\alpha$ if and only if $\varphi(a,b) \geq 1 - \alpha$ [15][16].
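As an illustration of (3) and (4), the toy sketch below (not part of the implicativefield toolkit; the ratings and the choice $\bar{b}(i) = 1 - b(i)$ on normalized values are assumptions of the example) normalizes two columns of quantitative ratings and evaluates the modal implication index $q_p$:

```python
import numpy as np

def normalize(column):
    """Formula (4): scale a quantitative variable to [0, 1] by its maximum."""
    column = np.asarray(column, dtype=float)
    return column / column.max()

def modal_implication_index(a, b):
    """Modal implication index q_p(a, b-bar), formula (3); a, b are values in [0, 1]."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n = len(a)
    b_bar = 1.0 - b
    n_a, n_b_bar = a.sum(), b_bar.sum()
    s_a, s_b_bar = a.std(), b_bar.std()     # population standard deviations
    numerator = (a * b_bar).sum() - n_a * n_b_bar / n
    denominator = np.sqrt((n**2 * s_a**2 + n_a**2) * (n**2 * s_b_bar**2 + n_b_bar**2) / n**3)
    return numerator / denominator

# Ratings on a 1-5 scale for two items, normalized with (4) before applying (3).
item_a = normalize([5, 4, 1, 5, 3, 4])
item_b = normalize([4, 5, 2, 4, 3, 5])
print(round(modal_implication_index(item_a, item_b), 3))
```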
2.2 Implication index variation
Let us consider small variations in the neighborhood of the four observed values of the variables $n$, $n_a$, $n_b$, $n_{a\bar{b}}$. These variables are treated as real numbers, and $q$ as a continuously differentiable function, under the constraints $0 \leq n_a \leq n_b$, $n_{a\bar{b}} \leq \inf\{n_a, n_b\}$ and $\sup\{n_a, n_b\} \leq n$. The differential of $q$ in the sense of Fréchet is expressed in the following way [18]:

$$dq = \frac{\partial q}{\partial n} dn + \frac{\partial q}{\partial n_a} dn_a + \frac{\partial q}{\partial n_b} dn_b + \frac{\partial q}{\partial n_{a\bar{b}}} dn_{a\bar{b}} = grad\,q \cdot dM \qquad (5)$$

where $M$ is the point with coordinates $(n, n_a, n_b, n_{a\bar{b}})$ belonging to the scalar field $C$, $dM$ is the differential vector of the four variables, and $grad\,q$ is the vector of partial derivatives of $q$ with respect to those variables.

From (5), the differential of the function $q$ appears as a scalar product between the gradient of $q$ and the displacement on the surface representing the variables of the function $q(n, n_a, n_b, n_{a\bar{b}})$; $grad\,q$ expresses the variability of this function of four variables and points in the direction of increase of $q$ in the four-dimensional space. In practice, the interest of this differential lies in estimating the increase (positive or negative) of $q$, denoted $\Delta q$, with respect to the variations $\Delta n$, $\Delta n_a$, $\Delta n_b$ and $\Delta n_{a\bar{b}}$. Therefore, we have [18][15]:

$$\Delta q = \frac{\partial q}{\partial n} \Delta n + \frac{\partial q}{\partial n_a} \Delta n_a + \frac{\partial q}{\partial n_b} \Delta n_b + \frac{\partial q}{\partial n_{a\bar{b}}} \Delta n_{a\bar{b}} + o(\Delta q) \qquad (6)$$

where $o(\Delta q)$ is a first-order infinitesimal.

To further examine the relationship between the implication index $q$ and the implication intensity $\varphi$, taking the derivative of equation (1) we obtain [18]:

$$\frac{d\varphi}{dq} = -\frac{1}{\sqrt{2\pi}} e^{-\frac{q^2}{2}} < 0 \qquad (7)$$

This confirms that the implication intensity $\varphi$ increases as $q$ decreases; this rate of increase, combined with formula (6), allows a more rigorous study of the variability of $\varphi$.
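The first-order estimate (6) can be checked numerically. The sketch below (illustrative only, with hypothetical counts) approximates $grad\,q$ by central finite differences and compares the predicted $\Delta q$ with the exact change when one counter-example is added:

```python
from math import sqrt

def q(n, n_a, n_b, n_ab_bar):
    """Implication index, formula (2)."""
    lam = n_a * (n - n_b) / n
    return (n_ab_bar - lam) / sqrt(lam)

def grad_q(n, n_a, n_b, n_ab_bar, h=1e-6):
    """Finite-difference approximation of grad q over (n, n_a, n_b, n_ab_bar)."""
    point = [n, n_a, n_b, n_ab_bar]
    grads = []
    for k in range(4):
        plus, minus = point.copy(), point.copy()
        plus[k] += h
        minus[k] -= h
        grads.append((q(*plus) - q(*minus)) / (2 * h))
    return grads

point = (200, 30, 80, 4)
delta = (0, 0, 0, 1)                      # one additional counter-example
estimate = sum(g * d for g, d in zip(grad_q(*point), delta))
exact = q(200, 30, 80, 5) - q(*point)
print(f"first-order estimate {estimate:.4f} vs exact change {exact:.4f}")
```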
2.3 Implication Field
2.3.1 Statistical implication field
Consider the implication index $q(a,\bar{b})$ in the four-dimensional space $E$ in which the point $M$, whose coordinates are the parameters $(n, n_a, n_b, n_{a\bar{b}})$ associated with $a$ and $b$, varies; $q(a,\bar{b})$ is a scalar field obtained by applying a mapping from the space $R^4$ to the space $R$. The vector $grad\,q$, containing the partial derivatives of $q$ with respect to the parameters $n$, $n_a$, $n_b$, $n_{a\bar{b}}$, is a special gradient field called the implication field, because it satisfies the Schwartz criterion (8) for the mixed derivatives of each pair of parameters in $(n, n_a, n_b, n_{a\bar{b}})$ [18]:

$$\frac{\partial}{\partial n_{a\bar{b}}}\left(\frac{\partial q}{\partial n_b}\right) = \frac{\partial}{\partial n_b}\left(\frac{\partial q}{\partial n_{a\bar{b}}}\right) \qquad (8)$$

and similarly for every other pair of the parameters $(n, n_a, n_b, n_{a\bar{b}})$; $grad\,q$ is therefore considered to be the potential of $q$. The gradient describes how the field changes across the space, from low values of the potential towards higher ones: at each point it indicates an increase of the implication density of the space and the extent to which this density changes under the influence of one or more parameters.

2.3.2 Implication index equipotential plane
The implication field formed from the four-dimensional space consists of ordered equipotential planes corresponding to the successive values of $q$ with respect to the variation of the cardinalities $(n, n_a, n_b, n_{a\bar{b}})$ [18]. Considering the implication index as a function of four parameters $q(n, n_a, n_b, n_{a\bar{b}})$, a line or plane of equipotential in the implication field is a curve in the 4-dimensional space $E$ along which a variable point $M$ maintains the same value of the potential $q$. The equipotential planes are ordered. The equation of such a curve is shown in (9) [18]:

$$q(a,\bar{b}) - \frac{n_{a\bar{b}} - \frac{n_a n_{\bar{b}}}{n}}{\sqrt{\frac{n_a n_{\bar{b}}}{n}}} = 0 \qquad (9)$$
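As a small illustration of (9) (a hypothetical sketch, not the model implementation), the code below groups rules into equipotential planes: rules whose implication index falls into the same bucket of a chosen width are treated as lying on the same plane.

```python
from math import sqrt
from collections import defaultdict

def q(n, n_a, n_b, n_ab_bar):
    lam = n_a * (n - n_b) / n
    return (n_ab_bar - lam) / sqrt(lam)

def equipotential_planes(rules, tol=0.05):
    """Group rules (given by their cardinalities) by their discretized implication index."""
    planes = defaultdict(list)
    for rule in rules:
        level = round(q(*rule) / tol) * tol      # bucket of width `tol`
        planes[round(level, 6)].append(rule)
    return dict(planes)

rules = [(100, 20, 40, 3), (100, 25, 50, 4), (100, 10, 70, 6)]
for level, members in sorted(equipotential_planes(rules).items()):
    print(f"q ~ {level:+.2f}: {members}")
```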
3. RECOMMENDATIONS BASED ON QUANTITATIVE IMPLICATION RULES MINING IN IMPLICATION FIELD
3.1 Statistical implication rules and association rules
A rule is a relationship between a pair of variables $(a,b)$, denoted $a \rightarrow b$. Typically, in the literature, the three parameters $n$, $n_a$, $n_b$ together with one parameter of the joint distribution of the two variables, such as $n_{a\bar{b}}$, $n_{ab}$ or $n_{\bar{a}\bar{b}}$, are used to present a rule. The examples (likelihood) of a rule, $n_{ab}$, are the objects identified by both the antecedent $a$ and the consequence $b$, while the counter-examples (unlikelihood) of a rule, $n_{a\bar{b}}$, are the objects identified by $a$ and the negation of $b$. For an association rule the fourth parameter is $n_{ab}$, whereas for an implication rule it is $n_{a\bar{b}}$. The relationship between those parameters is shown in Table 1.

Table 1. Contingency table for rule $a \rightarrow b$

              $b$                 $\bar{b}$             total
  $a$         $n_{ab}$            $n_{a\bar{b}}$        $n_a$
  $\bar{a}$   $n_{\bar{a}b}$      $n_{\bar{a}\bar{b}}$  $n_{\bar{a}}$
  total       $n_b$               $n_{\bar{b}}$         $n$

An implication rule, a special association rule, is (1) a rule in which $a$ and $b$ are two itemsets with $a \cap b = \emptyset$; and (2) modeled as a mathematical model of four variables $(n, n_a, n_b, n_{a\bar{b}})$ with some constraints.

Specifically, an association rule set $\mathcal{R}_a$ and an implication rule set $\mathcal{R}_i$ can be expressed as follows:

$$\mathcal{R}_a = \{(n, n_a, n_b, n_{ab}) \mid n_a \leq n,\ n_b \leq n,\ \max(0, n_a + n_b - n) \leq n_{ab} \leq \min(n_a, n_b),\ s_{min} \leq s,\ c_{min} \leq c\}$$

$$\mathcal{R}_i = \{(n, n_a, n_b, n_{a\bar{b}}) \mid 0 \leq n_a \leq n_b \leq n,\ 0 \leq n_{a\bar{b}} \leq n_a,\ s_{min} \leq s,\ c_{min} \leq c,\ i_{min} \leq i\}$$

where $s$ is the support of the rule, $c$ is the confidence of the rule, and $i$ is the value of the rule according to a SIA measure; $s_{min}$, $c_{min}$ and $i_{min}$ are the minimum values used for keeping quality rules.
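The two constraint sets above translate directly into a validity check. The following sketch (a hypothetical helper with made-up threshold values) tests whether a candidate tuple of cardinalities, together with its measure values, can describe an admissible association or implication rule:

```python
def valid_association_rule(n, n_a, n_b, n_ab, s, c, s_min=0.1, c_min=0.5):
    """Membership test for the association rule set R_a."""
    return (n_a <= n and n_b <= n
            and max(0, n_a + n_b - n) <= n_ab <= min(n_a, n_b)
            and s >= s_min and c >= c_min)

def valid_implication_rule(n, n_a, n_b, n_ab_bar, s, c, i,
                           s_min=0.1, c_min=0.5, i_min=0.5):
    """Membership test for the implication rule set R_i."""
    return (0 <= n_a <= n_b <= n
            and 0 <= n_ab_bar <= n_a
            and s >= s_min and c >= c_min and i >= i_min)

# A rule with n_a = 20, n_b = 40, 17 joint occurrences and 3 counter-examples.
print(valid_association_rule(100, 20, 40, 17, s=0.17, c=0.85))          # True
print(valid_implication_rule(100, 20, 40, 3, s=0.17, c=0.85, i=0.99))   # True
```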
A rule is more meaningful when it has more examples and fewer counter-examples, and vice versa. Likewise, a rule is reinforced if the number of examples grows faster than the number of counter-examples. Implication rules add further properties to the nature of association rules, such as asymmetry and the constraint $n_a \leq n_b$. Some strengths of the SIA measures compared to other probabilistic and statistical measures are the following:

- The implication index (and also the implication intensity) is an asymmetric and nonlinear statistical measure.

- The value of the implication intensity increases with the size of the training set, while other measures (support/confidence, lift, etc.) remain constant [10].

- The implication intensity reflects the way a human revises (withdraws) a previous statement [10]: if a statement has a strong implication, a few counter-examples are insufficient to change the implication of the rule. However, if counter-examples keep appearing, the implication of the rule decreases, and eventually, if the number of counter-examples is large enough, the rule is eliminated.

- The implication intensity measure adapts well to noisy data [10], since a small number of counter-examples cannot invalidate the rule.

- The implication intensity measure does not allow the creation of rules such as $a \rightarrow b$ when the consequence $b$ is true for almost all examples of the training set (whether $a$ is true or false): in that case, it is not surprising that the set where $a$ is true is almost included in the set where $b$ is true [10].
3.2 Recommendation models based on association rules and implication rules
The recommendation model based on association rules, $M_{\mathcal{R}_a}$, consists of a set of rules $\mathcal{R}_a$; a support/confidence framework $F_{R_a}$ with parameters such as the type of data and the thresholds of support and confidence; algorithms for initializing, registering, executing and training the model; algorithms for evaluating the model; and algorithms for displaying the recommendation result.

$$M_{\mathcal{R}_a} = \{X \mid \mathcal{R}_a, F_{R_a}\}$$

Like $M_{\mathcal{R}_a}$, the recommendation model based on implication rules, $M_{\mathcal{R}_i}$, also includes the components presented in $M_{\mathcal{R}_a}$, but it uses a framework $F_{R_i}$ of support, confidence and a SIA measure, and it processes data in binary, modal or quantitative form according to equations (2), (3) and (4) respectively.

Another difference from $M_{\mathcal{R}_a}$ is that $M_{\mathcal{R}_i}$ is a set of models, each of which corresponds to a SIA measure added to the framework. Those SIA measures are the variation of the implication index and the variation of the implication intensity. In this paper, the recommendation model based on the variation of the implication index in the implication field is used for the experiment.

$$M_{\mathcal{R}_i} = \{X \mid \mathcal{R}_i, F_{R_i}\}$$

3.3 Threshold of implication variation in equipotential plane
In practice, the set of rules on an equipotential plane in the implication field consists of the rules whose implication index (or implication intensity) lies within a certain threshold $\theta$. Specifically, in [13][14] these thresholds were defined as follows.

The threshold $\theta$ of the implication index $q$ in an equipotential plane is defined by (10):

$$\frac{\partial q}{\partial \xi} = k \frac{\Delta q}{\Delta \xi} + o(q) \qquad (10)$$

where $o(q)$ is a first-order infinitesimal; $\frac{\partial q}{\partial \xi}$ and $\frac{\Delta q}{\Delta \xi}$ are respectively the partial derivative and the increment of $q$ according to $\xi$, with $\xi \in \{n, n_a, n_b, n_{a\bar{b}}\}$; and $k$ is the number of rules that change when an item is added to or removed from the data set.

The threshold $\theta$ of the implication intensity $\varphi$ in the equipotential plane is defined by (11):

$$\frac{\partial \varphi}{\partial \xi} = \max\left(\frac{\Delta \varphi}{\Delta \xi}\right) + o(\varphi) \qquad (11)$$

where $o(\varphi)$ is a first-order infinitesimal; $\frac{\partial \varphi}{\partial \xi}$ and $\frac{\Delta \varphi}{\Delta \xi}$ are respectively the partial derivative and the increment of $\varphi$ according to $\xi$, with $\xi \in \{n, n_a, n_b, n_{a\bar{b}}\}$.
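A minimal sketch of the $F_{R_i}$ framework is given below; the class name, fields and threshold values are illustrative assumptions rather than the toolkit's API. Rules are kept only if they pass the support/confidence thresholds and the threshold on the SIA measure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rule = Tuple[int, int, int, int]          # (n, n_a, n_b, n_ab_bar)

@dataclass
class ImplicationFramework:
    """Hypothetical F_Ri configuration: thresholds used to keep quality rules."""
    kind: str = "modal"                   # "binary" or "modal"
    s_min: float = 0.1                    # minimum support
    c_min: float = 0.5                    # minimum confidence
    i_min: float = 0.0                    # minimum value of the SIA measure
    kept: List[Rule] = field(default_factory=list)

    def accept(self, rule: Rule, support: float, confidence: float, sia_value: float) -> bool:
        ok = support >= self.s_min and confidence >= self.c_min and sia_value >= self.i_min
        if ok:
            self.kept.append(rule)
        return ok

framework = ImplicationFramework(kind="modal", s_min=0.05, c_min=0.6, i_min=0.2)
framework.accept((100, 20, 40, 3), support=0.17, confidence=0.85, sia_value=0.24)
print(len(framework.kept))   # 1
```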
3.4 Recommendation model based on quantitative implication rules mining
In this paper, we use the variation of the implication index presented in Section 2 as the basis for proposing a recommendation model: collaborative filtering recommendation with a threshold value of the equipotential plane in the implication field. This model consists of two main algorithms: one for generating implication rules (QIRG), and one for predicting and recommending the appropriate items to users using the set of equipotential planes (RBMQIR). They are the following:

Algorithm 1. QIRG (Quantitative Implication Rules Generator)
Input: a set of transactions.
Output: a quantitative implication rule set presented by the cardinalities $(n, n_a, n_b, n_{a\bar{b}})$ and the values of those rules according to measures such as support, confidence and SIA measures.

Step 1: Construct a measure named ifbyCountExam that calculates the variation of the implication index in the implication field caused by counter-examples:

$$ifbyCountExam = \frac{n_{a\bar{b}} - \frac{n_a n_{\bar{b}}}{n}}{\sqrt{\frac{n_a n_{\bar{b}}}{n}}} + \frac{1}{\sqrt{\frac{n_a (n - n_b)}{n}}}$$

Step 2: Improve the support/confidence framework $F_{R_a}$ by adding the ifbyCountExam measure, obtaining $F_{R_i}$.

Step 3: Generate the implication rule set from the set of transactions by using data mining algorithms (such as Apriori, Eclat, etc.) with the support/confidence and ifbyCountExam framework $F_{R_i}$. Note that if the data is in binary form, $q$ is computed by equation (2); if the data is in modal form, $q$ is computed by equation (3); otherwise, equation (4) is used to normalize the ratings to modal values.

Step 4: Calculate the cardinalities $n$, $n_a$, $n_b$, $n_{a\bar{b}}$ of the implication rules, and the values of the rules according to measures such as support, confidence, implication index, implication intensity, ifbyCountExam and so on.
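Under the reading of Step 1 above (the implication index of formula (2) shifted by the effect of one additional counter-example), a Python sketch of the ifbyCountExam measure could look as follows; the function names and the simple filtering strategy are illustrative assumptions, not the toolkit implementation.

```python
from math import sqrt

def ifby_count_exam(n, n_a, n_b, n_ab_bar):
    """Variation of the implication index by counter-examples (Step 1 of QIRG).

    Equals the implication index of formula (2) plus the increase
    1/sqrt(n_a * n_bbar / n) caused by one additional counter-example.
    """
    lam = n_a * (n - n_b) / n
    return (n_ab_bar - lam) / sqrt(lam) + 1.0 / sqrt(lam)

def keep_rule(n, n_a, n_b, n_ab_bar, threshold=0.0):
    """Keep a rule only if its counter-example-adjusted index stays below a threshold."""
    return ifby_count_exam(n, n_a, n_b, n_ab_bar) < threshold

# Example cardinalities for a candidate rule a -> b.
print(round(ifby_count_exam(100, 20, 40, 3), 3))    # -2.309
print(keep_rule(100, 20, 40, 3))                    # True
```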
Algorithm 2. RBMQIR (Recommendation By Mining Quantitative Implication Rules)
Input: dataset, threshold $\theta$, ind, kind, byFactor.
Output: a recommendation of 1 item or a top-k item list.

Step 1: Call QIRG(dataset) to generate the rule set, the cardinalities of the implication rules, and the values of the rules according to measures such as support, confidence and SIA measures.

Step 2: Build a recommendation model based on quantitative or binary implication rules, depending on the type of data (kind = binary or kind = modal) and on the implication field measure (ifbyCountExam).

Step 3: Train the predictive model.

Step 4: Predict and return the recommendation result (1 item or k items), and evaluate that result.
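A compact sketch of the prediction step is shown below. It assumes, as a simplification of RBMQIR rather than the actual implementation, that each mined rule is a pair of single items with a score, and it recommends the top-k consequents whose antecedents appear in the active user's profile.

```python
from typing import Dict, List, Tuple

# Each mined rule: (antecedent item, consequent item, score of the rule).
Rule = Tuple[str, str, float]

def recommend_top_k(rules: List[Rule], user_items: set, k: int = 5) -> List[str]:
    """Score unseen items by the best rule whose antecedent the user already has."""
    scores: Dict[str, float] = {}
    for antecedent, consequent, score in rules:
        if antecedent in user_items and consequent not in user_items:
            scores[consequent] = max(scores.get(consequent, float("-inf")), score)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]

rules = [("m1", "m7", 0.98), ("m1", "m3", 0.91), ("m4", "m7", 0.88), ("m2", "m9", 0.80)]
print(recommend_top_k(rules, user_items={"m1", "m2"}, k=2))   # ['m7', 'm3']
```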
The algorithms described above serve as the basis for the proposed recommendation model based on quantitative implication rules in the implication field, named IFAR. Figure 2 shows the overall structure of that model.

[Figure 2: block diagram in which the ARM algorithms, the SIA knowledge and the implication field connect the support/confidence measures and the implication field measure into the framework F_Ri; the rule sets R_a and R_i mined from the binary/quantitative dataset build the models M_Ra and M_Ri (and other models), which produce the k-recommendation list (top-k list) and are assessed by the model evaluation metrics.]
Figure 2. The overall structure of the recommendation model based on quantitative implication rules in the implication field.
3.5 Evaluation
To evaluate a recommendation model, the training set is first used for finding the implication rules; then a test set and the rule set are used for predicting the recommendation result; and lastly the real (actual) ratings and the predicted result are used for evaluating the accuracy of the recommendation model.

Recommender systems try to predict whether an item will be liked or disliked by a particular user and propose the appropriate items. When recommending a list of items to a user, the usual way to calculate accuracy is to compare the predicted items with the list of items that the user prefers. In this comparison, each item can be classified as True Positive (TP) - a relevant and recommended item, False Positive (FP) - an irrelevant but recommended item, False Negative (FN) - a relevant but not recommended item, or True Negative (TN) - an irrelevant and not recommended item, as shown in Table 2 [20][21][22].

Table 2. Confusion matrix

          Recommended            Not recommended
  Good    True Positive (TP)     False Negative (FN)
  Bad     False Positive (FP)    True Negative (TN)

In addition, other indexes such as the TPR (True Positive Rate - the percentage of purchased items that have been recommended; TPR = TP/(TP + FN)), the FPR (False Positive Rate - the percentage of not purchased items that have been recommended; FPR = FP/(FP + TN)), precision and recall [20][21][22] are used to evaluate the usefulness of the predictions. These indexes evaluate the appropriateness of the recommendations for each user instead of evaluating the number of items related to each recommendation. A recommendation is considered appropriate when the user selects an item from the list of recommendations that has been suggested to him or her.

$$precision = \frac{TP}{TP + FP} \qquad (12)$$

$$recall = \frac{TP}{TP + FN} \qquad (13)$$

In this paper, the above listed measures are used to evaluate the accuracy of the proposed model.
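These metrics follow directly from the confusion matrix. The short sketch below (an illustrative helper with a made-up catalog, not the evaluation code of the implicativefield toolkit) computes precision, recall/TPR and FPR for one recommendation list:

```python
def evaluation_metrics(recommended: set, relevant: set, catalog: set) -> dict:
    """Confusion-matrix metrics for one user (formulas (12) and (13) plus FPR)."""
    tp = len(recommended & relevant)
    fp = len(recommended - relevant)
    fn = len(relevant - recommended)
    tn = len(catalog - recommended - relevant)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall":    tp / (tp + fn) if tp + fn else 0.0,   # recall == TPR
        "fpr":       fp / (fp + tn) if fp + tn else 0.0,
    }

catalog = {f"i{k}" for k in range(1, 11)}
print(evaluation_metrics(recommended={"i1", "i2", "i3"},
                         relevant={"i1", "i3", "i7"},
                         catalog=catalog))
# approximately {'precision': 0.667, 'recall': 0.667, 'fpr': 0.143}
```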
4. EXPERIMENT
4.1 Dataset
With the collaborative filtering recommendation model based on the implication field built above, we conduct experiments on both a binary (MSWeb¹) and a quantitative (MovieLens²) dataset. The MovieLens dataset, collected by GroupLens, consists of 100,000 ratings made by 943 users for 1682 films. The ratings range from 1 to 5, from the lowest to the highest. The MSWeb dataset is created by sampling and processing the www.microsoft.com logs of 38,000 anonymous, randomly selected users over a one-week timeframe. For each user, the data lists all the areas of the web site (Vroots) that the user visited.

¹ https://kdd.ics.uci.edu/databases/msweb/msweb.html
² https://grouplens.org/datasets/movielens/

To make the experiment more accurate, the datasets are preprocessed by:

- Normalizing the data, because ratings that are high (or low) for all of a user's films/Vroots can lead to bias.

- Selecting relevant data, because there are items rated by only a few users and there are users rating only a few items.

- Using the k-fold cross validation method (with k = 5 in this paper) to avoid overfitting as well as to obtain a better accuracy estimate for each model evaluation. This splits the dataset (MovieLens or MSWeb) into k equally sized parts and performs the following function:

Function Accuracy()
Begin
    i = 0;
    Repeat
        i = i + 1;
        Determine the accuracy Accuracy_i of the model by using the i-th part for testing and the rest of the set (k-1 parts) for training;
    until i = k;
    return Accuracy = (Accuracy_1 + ... + Accuracy_k) / k;
End.
4.2 Experimental tools
The experiments are conducted using the implicativefield toolkit developed by our group. It includes the proposed algorithms shown in Section 3.4.

4.3 Scenario 1: Comparing with user/item-based collaborative filtering recommendation models on the quantitative dataset
The experimental results with given = 5 (the number of known ratings of an active user needing the recommendation) on the quantitative dataset MovieLens are shown in Figure 3. In this scenario, the accuracy of the proposed model (IFAR), of the user-based collaborative filtering models (UBCF) and of the item-based collaborative filtering models (IBCF) are compared. For the user/item-based models, the Pearson and Cosine measures are used for finding the nearest neighbors, and the number of nearest neighbors is 40. The number of items to be recommended to a user is 1, 5, 10, 20 and 25. The thresholds for support and confidence are ? and ? respectively.

As shown in Figure 3a (ROC curve) and Figure 3b (Precision/recall curve), the result of the proposed model is better than those of the user/item-based collaborative filtering models, and the results of the item-based collaborative filtering models (for both Cosine and Pearson) are the worst. This remains valid for other values of given.

Figure 3. (a) ROC curve and (b) Precision/recall curve of the IFAR model and the user/item-based collaborative filtering models on quantitative data.

4.4 Scenario 2: Comparing with the association rules based recommendation model on the binary dataset
Similar to Scenario 1, the experimental result with given = 5 on the binary dataset MSWeb is shown in Figure 4. Here, the accuracy of the proposed model (IFAR) and that of the ARM model using the support/confidence framework are compared.

As shown in Figure 4a (ROC curve) and Figure 4b (Precision/recall curve), the result of the proposed model is better than that of the ARM model. This also holds for other values of given.

Figure 4. (a) ROC curve and (b) Precision/recall curve of the IFAR model and the ARM model on the binary dataset.

4.5 Scenario 3: Comparing with the association rules based recommendation model on the quantitative dataset
This scenario is similar to Scenario 2, but it is carried out on the quantitative dataset instead of the binary dataset.

Figure 5 shows the ROC curve and the Precision/recall curve of the proposed model and the ARM model on the quantitative dataset MovieLens with given = 5. The result of the proposed model is better than that of the ARM model. For other values of given, we obtain the same result.
Figure 5. (a) ROC curve and (b) Precision/recall curve of the IFAR model and the ARM model on quantitative data.

4.6 Scenario 4: Comparing the performance and processing time of the two rule-based models

Figure 6. Comparison of the training/modelling time, the predicting time and the size of the rule set generated by the two models.

In order to evaluate the performance of the proposed model IFARRS (in terms of the size of the rule set), its training/modelling time and its predicting time, the experiment is conducted on the ? dataset using the k-fold method, with the number of predicted items set to 5 and the number of executions per model set to 2. The average result is shown in Figure 6: the modelling time and the predicting time of IFARRS compared to those of the ARRS model are 46.57% and 62.80% respectively, and the size of the rule set of the IFARRS model compared to that of the ARRS model is 9.12%. This demonstrates that the IFARRS model for quantitative rules produces a more compact rule set and faster execution times. This is due to combining the support/confidence measures, which find the resulting rules with the optimized likelihood component, with the ifbyCountExam measure, which optimizes the unlikelihood to increase the interestingness of the rules.
5. CONCLUSIONS
This article has contributed a solution for recommendation based on the combination of rule mining on both binary and quantitative data with statistical implication analysis, and has improved the quality of the recommendation result. By integrating ARM with the variation of the implication index (or implication intensity) in the implication field, the recommendation model improves the accuracy, the performance and the execution time of the recommendation. The experiment is performed on MSWeb (a binary dataset) and MovieLens (a quantitative dataset). The experimental result shows that the collaborative filtering model based on mining implication rules in the implication field is more effective than the model based on mining association rules, as well as the collaborative filtering models based on items/users.

6. REFERENCES
[1] Adomavicius Gediminas, Tuzhilin Alexander, Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions, IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 6, pp. 734-749, 2005.
[2] Adomavicius Gediminas, Tuzhilin Alexander, Context-aware recommender systems, Springer US, pp. 217-253, 2011.
[3] Francesco Ricci, Lior Rokach and Bracha Shapira, Introduction to Recommender Systems Handbook, Springer-Verlag and Business Media LLC, pp. 1-35, 2011.
[4] Rahul Katarya, Om Prakash Verma, Effective collaborative movie recommender system using asymmetric user similarity and matrix factorization, The 2016 IEEE International Conference on Computing, Communication and Automation, DOI: 10.1109/CCAA.2016.7813692, pp. 1-12, 2016.
[5] Gavin Shaw, Yue Xu and Shlomo Geva (2010), Using Association Rules to Solve the Cold-Start Problem in Recommender Systems, Advances in Knowledge Discovery and Data Mining, pp. 340-347, DOI 10.1007/978-3-642-13657-3_37, ISSN 0302-9743.
[6] Timur Osadchiy, Ivan Poliakov, Patrick Olivier, Maisie Rowland, Emma Foster (2018), Recommender system based on pairwise association rules, Expert Systems with Applications 115, pp. 535-542. https://doi.org/10.1016/j.eswa.2018.07.077.
[7] Ahmed Mohammed K. Alsalama (2015), A Hybrid Recommendation System Based On Association Rules, International Science Index, Computer and Information Engineering, Vol. 9, No. 1, 2015. waset.org/Publication/10000147
[8] Tzung Pei Hong, Chang Sheng Kuo, Sheng Chai Chi (2001), Trade-off between computation time and number of rules for fuzzy mining from quantitative data, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, Vol. 9, No. 5, pp. 587-604.
[9] Tzung-Pei Hong, Chun-Hao Chen, Yeong-Chyi Lee, and Yu-Lung Wu (2008), Genetic-Fuzzy Data Mining with Divide-and-Conquer Strategy, IEEE Transactions on Evolutionary Computation, Vol. 12, No. 2.
[10] Guillaume S., Guillet F., Philippé J. (1998), Contribution of the integration of intensity of implication into the algorithm proposed by Agrawal, EMCSR'98, Vienna, Vol. 2, pp. 805-810.
[11] Rakesh Agrawal, Imielinski T., and Swami A. (1993), Mining association rules between sets of items in large databases, In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'93), pp. 207-216.
[12] Rakesh Agrawal and Ramakrishnan Srikant, Fast algorithms for mining association rules, In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pp. 487-499, Santiago, Chile, September 1994.
[13] Hoang Tan Nguyen, Hung Huu Huynh and Hiep Xuan Huynh (2018), Collaborative filtering recommendation with threshold value of the equipotential plane in implication field, The 2nd International Conference on Machine Learning and Soft Computing (ICMLSC 2018), Phu Quoc Island, Vietnam, ISBN: 978-1-4503-6336-5, pp. 39-44.
[14] Hoang Tan Nguyen, Hung Huu Huynh and Hiep Xuan Huynh (2018), Collaborative Filtering Recommendation in the Implication Field, International Journal of Machine Learning and Computing, Vol. 8, No. 3 (Jun. 2018), pp. 214-222.
[15] Régis Gras, Einoshin Suzuki, Fabrice Guillet, Filippo Spagnolo (Eds.), Statistical Implicative Analysis: Theory and Applications, Springer-Verlag Berlin Heidelberg, 2008.
[16] Régis Gras, Raphaël Couturier, Spécificités de l'Analyse Statistique Implicative (A.S.I.) par rapport à d'autres mesures de qualité de règles d'association, Quaderni di Ricerca in Didattica - GRIM, ISSN on-line 1592-4424, pp. 19-57, 2010.
[17] Dominique Lahanier-Reuter, Didactics of Mathematics and Implicative Statistical Analysis, Statistical Implicative Analysis - Studies in Computational Intelligence, pp. 277-298, 2008.
[18] Régis Gras, Pascale Kuntz and Nicolas Greffard, Notion de champ implicatif en analyse statistique implicative, The 8th International Meeting on Statistical Implicative Analysis, Tunisia, pp. 1-21, 2015 (in French).
[19] Yanchang Zhao, Chengqi Zhang, Longbing Cao (2009), Post-Mining of Association Rules: Techniques for Effective Knowledge Extraction, Information Science Reference (an imprint of IGI Global), USA, ISBN 978-1-60566-405-7.
[20] Herlocker J. L. et al. (2004), Evaluating collaborative filtering recommender systems, ACM Transactions on Information Systems, Vol. 22, No. 1, pp. 5-53.
[21] Sarwar B. and G. Karypis (2000), Analysis of recommendation algorithms for e-commerce, EC '00, USA: ACM, pp. 158-167.
[22] Yeong et al. (2005), Mining changes in customer buying behavior for collaborative recommendations, Expert Systems with Applications, Vol. 28, No. 2 (February 2005), pp. 359-369. DOI: 10.1016/j.eswa.2004.10.015.

Authors’ background

Name                 Title                       Research Field                                      Personal website
Hoang, Tan Nguyen    PhD candidate               Data mining, Statistical Implicative Analysis,      none
                                                 Recommender systems
Lan, Phuong Phan     PhD candidate               Recommender systems, statistical implicative        none
                                                 analysis, software engineering
Hung, Huu Huynh      PhD candidate               Computer Vision                                     Scv.udn.vn/hhhung
Hiep, Xuan Huynh     Associate Professor (HDR)   Data Mining, Artificial Intelligence, Statistical   none
                                                 Implicative Analysis, Wireless Sensor Network
