Fuzzy in Rs

Fuzzy Sets and Systems 136 (2003) 133–149
www.elsevier.com/locate/fss
Fuzzy logic methods in recommender systems

Ronald R. Yager
Machine Intelligence Institute, Iona College, 715 North Avenue, New Rochelle, NY 10801, USA
Received 5 July 2001; received in revised form 4 March 2002; accepted 25 April 2002
Abstract
Here we consider methodologies for constructing recommender systems. The approaches studied here differ from collaborative filtering: they are based solely on the preferences of the single individual for whom we are providing the recommendation and make no use of the preferences of other collaborators. We have called these reclusive methods. Another important feature distinguishing these reclusive methods from collaborative methods is that they require a representation of the objects. Considerable use is made of fuzzy set methods for the representation and subsequent construction of justifications and recommendation rules. It is pointed out that these reclusive methods, rather than being competitive with collaborative methods, are complementary. © 2002 Elsevier Science B.V. All rights reserved.
Keywords: Customization; Recommender systems; Fuzzy methods; Collaborative filtering
1. Introduction

Recommender systems [8,5] are a rapidly emerging class of software, especially within the domain of E-Commerce [9]. Their importance is directly related to the ability of the internet to collect, store and process vast quantities of information about individuals' actions and preferences. They are an important component toward the goal of providing specific customized information to each user. Most of the current generation of recommender systems are based on collaborative filtering technologies [4,6,7,10]. An important component of collaborative filtering type systems is the calculation of similarity of interest based on correlations between individuals. In order to predict a user's potential interest in some object they have not experienced, collaborative filtering uses these measures of similarity of interest in conjunction with ratings of the object by other individuals who have experienced the object. An important feature of these pure collaborative filtering systems is that they do not require any representation of the objects being considered.
Tel.: +1-212-249-2047; fax: +1-212-249-1689. E-mail address: ryager@iona.edu (R.R. Yager).
0165-0114/03/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0165-0114(02)00223-3
In this work we shall focus on a different class of recommender systems which are not collaborative. These types of systems, which we call reclusive, use only preference information about the user of interest. These types of systems require some representation of the objects. An essential difference between the collaborative filtering approach and the reclusive approach is that collaborative filtering is based upon finding a similarity between people whereas the reclusive approach is based upon finding a similarity between the objects. What is clear, although we shall not study it here, is that future recommender systems will incorporate both these perspectives.

2. A general view of recommender systems

A recommender system is associated with a collection of objects $D = \{d_1, \ldots, d_n\}$. The purpose of this system is to recommend to the user objects of $D$ that may be of interest to him. As a tangible example of a recommender system we shall often find it convenient to use one in which the objects are movies. Here we shall consider some approaches to this problem of recommendation and shall describe some methods for performing this task. The implementation of technologies for developing these systems is strongly dependent upon the type of information that is being used. In the following, we shall discuss the types of information that may be available to a recommender system. A prime source of information for use in a recommender system is the knowledge about the objects in $D$. The usefulness of this information is dependent upon the representation used for the objects in $D$. The least information-rich situation is the one in which all we have is some unique identification of an object and no other information. For example, all we know about a movie is its title. A more information-rich environment is one in which we describe an object with some attributes. For example, we indicate the year the movie was made, the type of movie, the stars.
These attributes and their associated values provide a representation of the objects. We can have degrees of representation; more sophisticated representations will depend upon the features used to characterize the objects. Many techniques that can be used in recommender systems are based upon using some representation of the objects. Generally, the more sophisticated the representation, the better these techniques perform. In order to make a recommendation to a user we must have some information about the user's preferences. Information about user preferences can essentially be obtained in two different ways, although these need not be mutually exclusive. We shall refer to these two modes, respectively, as extensionally and intentionally expressed preference information. By extensionally expressed preference information we mean information based upon the actions or past experiences of the user with respect to specific objects of the type found in $D$. Examples of this are movies a user has previously seen and possibly some rating of these movies. In another domain we could mean the objects which the user has purchased. By intentionally expressed information we mean some specifications by the user of what they desire in objects of the type under consideration. Generally, to be of use, these specifications must be of such a nature that they can be related to the attributes and features used in the representation of the objects in $D$. We would like to make some comment on the distinction between targeted marketing [16] and recommender systems. We say that recommender systems are participatory in that the user is
participating in the process by providing information about their preferences. In a targeted marketing system, while information may be available about a user's preferences, this is generally based on extensional information obtained from past actions of the user. Here the target is a passive supplier of information rather than an active supplier as in the recommender system. For example, the system used by Amazon.com, while called a recommender system, is according to our definition more appropriately a targeted marketing system: it is based solely upon past information about the user and does not involve any cooperation by the user. On the other hand, the system used by NETFLIX¹ is a true recommender system as it uses ratings supplied by the user. Another characterizing aspect of these recommender systems is whether the system is collaborative or not. We shall say a system is collaborative if information about the preferences of other people is used in determining the recommendation to the current user. Furthermore, in these collaborative recommender systems the available technologies depend on the nature of the preference information used with respect to participating agents. Generally, in the collaborative approach, when extensional information is used one tries to obtain, based on mutually experienced items, some measure of correlation between the participants and uses this as a basis for providing recommendations. In Fig. 1 we summarize the situation with respect to the information available in a recommender system. In order to develop a recommender system we need to use information about the user's preferences. Collaborative-type recommenders can be constructed using only extensional preference information. Non-collaborative, reclusive, type systems require the availability of object representations. Table 1 shows a simplified typology of different types of recommender systems. The first column indicates a purely collaborative type of system.
Columns 2–4 indicate those based purely on representation, the differences between these columns being the types of information used to specify the user's preferences. The final column indicates systems which use both collaborative and representational information. Here we shall focus on reclusive, non-collaborative, recommender systems in which there exists a representation of the objects. At a meta level, we see a kind of symmetry between reclusive methods and collaborative methods. In both cases we make use of a vector of ratings of the objects the current user has experienced. We shall denote this as $A$. In the collaborative filtering approach we have for each collaborator a vector $A_j$ indicating their ratings of the corresponding objects. For any object $d$ unexperienced by our current user we have a vector $R$ whose components, $r_j$, are ratings of this object by the collaborators. The procedure for obtaining the degree of recommendation can be seen to essentially involve two steps:
1. We combine $A$ and $A_j$ to obtain $S_j$, a degree of similarity of our user with each collaborator.
2. Rating of $d$ = Aggregation of weighted tuples $(S_j, r_j)$.
In the reclusive method, for each object the user has experienced we have a representation $R_i$ as well as the current user's rating $a_i$ as contained in $A$. In addition, for an unexperienced object $d$ which we are trying to evaluate we only have a representation $R$. The procedure for obtaining the
¹ NETFLIX is a website that rents DVD videos. It asks users to rate the videos they have rented and uses this to recommend other videos.
Fig. 1. Recommender systems information structure. [Diagram: USER (preference information type: intentional, extensional, both); OBJECTS OF INTEREST (representation available: yes, no); COLLABORATORS, peer group (available: yes, no; preference information type: intentional, extensional, both).]
Table 1. Recommender systems typology. Rows: extensional preferences, intentional preferences, representation, collaborators.
degree of recommendation also essentially involves a two-step procedure:
1. We combine $R$ and $R_i$ to obtain $S_i$, a degree of similarity of the object $d$ with the experienced objects.
2. Rating of $d$ = Aggregation of weighted tuples $(S_i, a_i)$.
Our goal in the following is to develop modules that can be used to help evaluate objects with regard to their degree of recommendation in the case of reclusive approaches.
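The two-step reclusive procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: the similarity measure (sum of minima over sum of maxima of the two representation vectors) and the aggregation (a similarity-weighted average) are choices made here for concreteness, since the text does not fix either at this point, and the names `fuzzy_jaccard` and `reclusive_rating` are hypothetical.

```python
def fuzzy_jaccard(r, r_i):
    """Assumed similarity S_i between two representation vectors in [0, 1]^n:
    sum of component-wise minima divided by sum of component-wise maxima."""
    numerator = sum(min(x, y) for x, y in zip(r, r_i))
    denominator = sum(max(x, y) for x, y in zip(r, r_i))
    return numerator / denominator if denominator else 0.0


def reclusive_rating(r, experienced):
    """Step 1: compute S_i between the new object's representation r and each
    experienced object's representation R_i.
    Step 2: aggregate the tuples (S_i, a_i); here, as an assumption, by a
    similarity-weighted average of the user's ratings a_i."""
    pairs = [(fuzzy_jaccard(r, r_i), a_i) for r_i, a_i in experienced]
    total_similarity = sum(s for s, _ in pairs)
    if total_similarity == 0:
        return 0.0  # nothing similar has been experienced
    return sum(s * a for s, a in pairs) / total_similarity


# Two experienced objects over two assertions, with the user's ratings.
experienced = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]
rating_same = reclusive_rating([1.0, 0.0], experienced)
rating_mixed = reclusive_rating([0.5, 0.5], experienced)
```

An object identical to the liked one inherits its rating, while an object equally similar to both experienced objects lands halfway between their ratings.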
3. Object representation

We now turn to the issue of object representation. For our purposes the representation of an object shall be based upon a set of primitive assertions or statements. Each assertion can essentially be viewed as some declarative statement. Associated with each object and each assertion is a value contained in the unit interval indicating the degree to which the assertion is valid for that object. For example, in the movie domain a primitive assertion may be "this movie is a comedy". In this case, the value associated with this assertion for a movie indicates the degree to which it is true that this movie is a comedy. Another assertion may be that "Robert DeNiro is a star in this movie". If the movie has Robert DeNiro as one of its stars then this assertion has validity one, otherwise it is zero. Another assertion may be that "this movie was made in 1993"; if the movie was made in 1995 this would have a validity of zero. If it was made in 1993, this assertion would have truth value one. Essentially, then, the basis of our representation scheme is a collection of assertions or statements whose validity is determinable for any object in $D$. We shall denote this set of primitive or atomic assertions as $A = \{A_1, \ldots, A_n\}$. The representation of an object consists of a valuation of these assertions for the object. For object $d$, $A_j(d)$ indicates the degree to which assertion $A_j$ is satisfied by $d$. When we are just focusing on one object we can denote this $a_j$. For some purposes we can view any object $d$ as a fuzzy subset over the space $A$. Using this perspective, the membership grade of $A_j$ in $d$ is $d(A_j) = A_j(d) = a_j$. As an alternative perspective, an object can be viewed as an $n$-dimensional vector whose $j$th component is $A_j(d)$. As we shall subsequently see, these different perspectives are useful in inspiring different information processing operations. We shall call a subset of related assertions an attribute or feature.
For example, the subset $V$ may consist of all the assertions of the form "this movie was made in the year xyz". We can denote this attribute as the year the movie was made. Another notable subset of related assertions from $A$ may consist of all the assertions of the form "x stars in this movie". This feature corresponds to the attribute of who are the stars of the movie. For a given recommender system, in addition to the set $A$ of primitive assertions, we shall assume the existence of a collection of features or attributes associated with the objects in $D$. We denote this collection of attributes as $F = \{V_1, V_2, \ldots, V_q\}$. Each attribute $V_j$ corresponds to a subset of assertions which can be seen as constituting the possible values for the attribute. In some special cases a feature may consist of a single assertion. The performance of a recommender system is clearly related to the sophistication of the primitive assertions and associated features used to represent the objects of interest. We note that while we have started with assertions and constructed features by forming subsets of related assertions, it is equally valid, and perhaps more intuitive, to start with attributes and generate the primitive assertions as being the possible values for these attributes. Attributes can be classified by various characteristics associated with their solution space [23,24]. Attributes can be distinguished with respect to the number of solutions they allow: is the attribute restricted to having only one solution, does it allow multiple solutions, must it have a solution? For example, the attribute corresponding to the release year of a movie must have only one solution. On the other hand, the attribute corresponding to the star of a movie can take on multiple values. The primitive assertions can also be classified with respect to the allowable truth values they can
assume. For example, binary type assertions are those whose truth value must be either one or zero, while other assertions can have truth values lying in the unit interval. We shall look a little more carefully at the relationship between atomic assertions and attributes. As we have indicated, an assertion is a declarative statement that is assigned a value for a given object depending on its degree of validity for that object; this value generally lies in the unit interval. On the other hand, an attribute can be viewed as a variable that takes its value(s) from a universe associated with the variable. In our framework the universe associated with an attribute corresponds to the subset of primitive assertions that is used to define it. The value of an attribute for a given object depends upon the truth values of the associated primitives. Let us look at this. If $V_j$ is an attribute we can find its value for a particular object $d$ in the following way. Let $A(V_j)$ indicate the subset of primitives associated with $V_j$. Let $d$ represent the fuzzy subset of $A$ corresponding to object $d$; then the value of the feature $V_j$ for object $d$ is $V_j(d) = A(V_j) \cap d$. It is the intersection of the attribute definition, the crisp subset $A(V_j)$, and the object representation, the fuzzy subset $d$. The collection of elements in the subset $V_j(d)$ determines the value of the attribute $V_j$ for the object $d$. Often the information about an object will be specified directly in terms of attribute values. We shall assume the ability to extract information about assertion validity from information expressed about attribute values. To illustrate this we consider the following. Let $A(V_j) = \{A_{j1}, A_{j2}, \ldots, A_{jn}\}$ be the subset of assertions related to the attribute $V_j$. If we are informed that the value of attribute $V_j$ for object $d$ is $q$, this means that $V_j(d) = \{A_{jq}\}$, where $A_{jq}$ is the assertion that $V_j$ is $q$.
Since $V_j(d) = A(V_j) \cap d$, we can conclude that $A_{ji}(d) = 0$ for all $i \neq q$ and $A_{jq}(d) = 1$. In relating the knowledge about assertion validation and feature values it is necessary to carefully distinguish between features that can only assume one unique value, such as the date of release of a movie, and features that can assume multiple values, such as the people starring in the movie. In the first case, multiple assertions in $V_j(d)$ are an indication of uncertainty regarding our knowledge of the value of $V_j$. In the second case, multiple assertions in $V_j(d)$ are an indication of multiple solutions for $V_j$. Here we shall not further pursue this important issue regarding different types of variables but only point to [17] for those interested. Here, we shall assume the ability to interchange between these two representations.
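As a minimal sketch of this representation, an object can be stored as a mapping from assertion names to validity degrees, and an attribute as a crisp set of assertion names; the attribute value is then the intersection of the two. The dictionary encoding, the assertion labels and the function names below are illustrative assumptions, not part of the original formulation.

```python
def attribute_value(attribute_assertions, obj):
    """Intersection of the crisp subset A(V_j) with the fuzzy subset d:
    keep the assertions of the attribute that hold to a nonzero degree."""
    return {a: obj[a] for a in attribute_assertions if obj.get(a, 0.0) > 0.0}


def assertions_from_value(attribute_assertions, q):
    """For a single-valued attribute known to take value q, recover the
    assertion validities: 1 for the assertion 'V_j is q', 0 for the others."""
    return {a: (1.0 if a == q else 0.0) for a in attribute_assertions}


# Hypothetical assertion labels for a movie object.
year = {"year=1993", "year=1994", "year=1995"}
stars = {"star=DeNiro", "star=Pacino", "star=Streep"}

movie = {"comedy": 0.7, "star=DeNiro": 1.0, "star=Pacino": 1.0}
movie.update(assertions_from_value(year, "year=1995"))

year_value = attribute_value(year, movie)    # single-valued attribute
star_value = attribute_value(stars, movie)   # multi-valued attribute
```

For the single-valued attribute the result contains exactly one assertion, while the multi-valued attribute legitimately returns several.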
4. Modeling user expressed preferences

The basic function of a recommender system is to use what we shall call justifications to generate recommendations to a user. By a justification we shall mean a rationale for believing a user may like an object. These justifications can be obtained either from preferences directly expressed by users or induced using data about the user's experiences. In the following we shall look at techniques for obtaining recommendations which make use of a representation of the objects. As we noted, not all recommender technologies require representations, collaborative filtering being an example of a technology that does not need representations of the objects.
In this section we shall consider the situation in which we have a representation of the objects and the user has specified their preferences intentionally in some manner compatible with this representation. This situation is closely related to the problem of information retrieval [1]. The availability of technologies for this environment is quite rich. The quality of performance of a recommender system in this environment is strongly dependent upon the ability of the system to allow the user to effectively express their preferences. This capability is itself dependent both upon the assertions and features used to represent the objects and upon the sophistication of the language available to the user to express their preferences in terms of these assertions and features. In the following we briefly describe a language which we introduced in [18]. This language, called Hi-Ret, is very expressive. It makes considerable use of the ordered weighted averaging (OWA) operator [13,21]. We shall first briefly describe this operator. The OWA operator of dimension $n$ is a mapping $\mathrm{OWA}: R^n \to R$ characterized by an $n$-dimensional vector $W$, called the weighting vector, such that its components $w_j$, $j = 1$ to $n$, lie in the unit interval and sum to one. The OWA aggregation is defined as
$$\mathrm{OWA}(a_1, \ldots, a_n) = \sum_{j=1}^{n} w_j b_j,$$
where $b_j$ is the $j$th largest of the $a_i$. The unique feature of this operator is the ordering of the arguments by value, a process that introduces a nonlinearity into the operation. We can represent this aggregation operator in vector notation as $\mathrm{OWA}(a_1, a_2, \ldots, a_n) = W^T B$, where $W$ is the weighting vector and $B$ is a vector, called the ordered argument vector, whose components are the $b_j$. The generality of the operator lies in the fact that by selecting $W$ we can implement many different aggregation operators. Specifically, by appropriately selecting the weights in $W$, we can emphasize different arguments based upon their position in the ordering. From an application point of view, an important feature of this operator is that the characterizing vector $W$ can be readily related to natural language expressions of aggregation rules. A number of special cases of this operator are illustrated in the following. If the components in $W$ are such that $w_1 = 1$ and $w_j = 0$ for all $j \neq 1$ we get $\mathrm{OWA}(a_1, a_2, \ldots, a_n) = \mathrm{Max}_j[a_j]$. We denote this weighting vector as $W^*$. If the weights are $w_n = 1$ and $w_j = 0$ for $j \neq n$ we get $\mathrm{OWA}(a_1, a_2, \ldots, a_n) = \mathrm{Min}_j[a_j]$. We denote this weighting vector as $W_*$. If the weights are such that $w_j = 1/n$ for all $j$, denoted $W_{ave}$, then $\mathrm{OWA}(a_1, a_2, \ldots, a_n) = (1/n) \sum_{j=1}^{n} a_j$. Thus we see that the simple average is a special case of these operators. A number of different methods have been suggested for obtaining the weighting vector to be used in the aggregation. For our purpose we shall use an approach based upon the idea of linguistic quantifiers. Classical logic provides two quantifiers for aggregating truth values, "for all" and "there exists"; these correspond to and-ing and or-ing. The concept of linguistic quantifiers was originally introduced by Zadeh [22] to help formalize the many expressions of quantification available in natural language.
According to Zadeh, a linguistic quantifier is a natural language expression corresponding to a proportional quantity. Examples of this are at least one, all, at least $\alpha$%, most, more than a few and some. Zadeh [22] suggested a method for formally representing these linguistic quantifiers. Let $Q$ be a linguistic expression corresponding to a quantifier such as most; then Zadeh suggested representing this as a fuzzy subset $Q$ over $I = [0, 1]$ in which, for any proportion $r \in I$, $Q(r)$ indicates the degree to which $r$ satisfies the concept identified by the quantifier $Q$.
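The OWA aggregation defined earlier in this section, together with its maximum, minimum and simple-average special cases, can be sketched as follows. The function name `owa` and the variable names for the three weighting vectors are illustrative choices, not names used in the source.

```python
def owa(weights, args):
    """Ordered weighted average: sort the arguments in descending order to
    obtain the ordered argument vector B, then take the dot product with W."""
    if len(weights) != len(args):
        raise ValueError("weighting vector and arguments must have equal length")
    ordered = sorted(args, reverse=True)  # b_j is the j-th largest argument
    return sum(w * b for w, b in zip(weights, ordered))


values = [0.2, 0.9, 0.5]
n = len(values)

w_max = [1.0] + [0.0] * (n - 1)   # all weight on the largest argument
w_min = [0.0] * (n - 1) + [1.0]   # all weight on the smallest argument
w_ave = [1.0 / n] * n             # equal weights give the simple average

owa_max = owa(w_max, values)
owa_min = owa(w_min, values)
owa_ave = owa(w_ave, values)
```

Because the weights attach to positions in the ordering rather than to particular arguments, permuting `values` leaves every result unchanged.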
Fig. 2. Linguistic quantifier at least $\alpha$.
In [15], Yager considered the use of linguistic quantifiers to generalize the logical quantification operation. He considered the valuation of the statement $Q(a_1, \ldots, a_n)$ where $Q$ is a linguistic quantifier and the $a_j$ are truth values. It was suggested that the truth value of this type of statement could be obtained with the aid of the OWA operator. This process involved first representing the quantifier $Q$ as a fuzzy subset $Q$ and then using $Q$ to obtain an OWA weighting vector $W$ which was used to perform an OWA aggregation of the $a_i$. Formally we denote this as $Q(a_1, \ldots, a_n) = \mathrm{OWA}_Q(a_1, \ldots, a_n)$. In the following we shall describe the process of obtaining the weighting vector from the associated fuzzy subset $Q$. Here we shall restrict ourselves to the class of linguistic quantifiers called RIM quantifiers. A RIM quantifier is represented by a fuzzy subset $Q: I \to I$ in which $Q(0) = 0$, $Q(1) = 1$ and if $r_1 \geq r_2$ then $Q(r_1) \geq Q(r_2)$ (monotonic). These RIM quantifiers model the class in which an increase in proportion results in an increase in compatibility with the linguistic expression being modeled. Examples of these types of quantifiers are at least one, all, at least $\alpha$%, most, more than a few, some. These are the types of quantifiers that are generally used by people in expressing their preferences. If $Q$ is a RIM quantifier we associate with it an OWA weighting vector $W$ such that $w_j = Q(j/n) - Q((j-1)/n)$ for $j = 1$ to $n$. Fig. 2 is seen as corresponding to the quantifier at least $\alpha$%. For this quantifier $w_j = 1$ for $j$ such that $(j-1)/n < \alpha \leq j/n$ and $w_j = 0$ for all others. Another quantifier is one in which $Q(r) = r$ for $r \in [0, 1]$. For this quantifier we get $w_j = 1/n$ for all $j$. This gives us the simple average. We shall denote this quantifier as some. One can consider parameterized families of quantifiers [14]. For example, consider the parameterized family $Q(r) = r^{\rho}$ where $\rho \in [0, \infty]$. Here if $\rho = 0$, we get the existential quantifier; when $\rho \to \infty$, we get the quantifier for all; and when $\rho = 1$, we get the quantifier some.
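The construction of the weighting vector from a RIM quantifier, $w_j = Q(j/n) - Q((j-1)/n)$, can be sketched as below. The step shape used for the "at least" quantifier (zero below the threshold, one from the threshold upward) is an assumption about the figure, and the helper names `rim_weights`, `at_least` and `power` are hypothetical.

```python
def rim_weights(quantifier, n):
    """OWA weights from a RIM quantifier Q: w_j = Q(j/n) - Q((j-1)/n)."""
    return [quantifier(j / n) - quantifier((j - 1) / n) for j in range(1, n + 1)]


def at_least(threshold):
    """Assumed step quantifier: Q(r) = 1 once the proportion r reaches the
    threshold (0 < threshold <= 1), and 0 below it."""
    return lambda r: 1.0 if r >= threshold else 0.0


def power(exponent):
    """Parameterized family Q(r) = r ** exponent, for exponent > 0."""
    return lambda r: r ** exponent


w_half = rim_weights(at_least(0.5), 4)   # single weight where 0.5 is first reached
w_some = rim_weights(power(1.0), 4)      # Q(r) = r gives the simple average
w_square = rim_weights(power(2.0), 2)    # Q(r) = r^2 shifts weight to smaller arguments
```

Since $Q(0) = 0$ and $Q(1) = 1$, the differences telescope and the weights always sum to one; monotonicity keeps each weight nonnegative.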
In addition, for the case in which $\rho = 2$, $Q(r) = r^2$, we get one possible interpretation of the quantifier most. We are now in a position to describe the use of the OWA operator in the construction of a recommender system. We shall assume available to the user a vocabulary of linguistic quantifiers
$Q = \{Q_1, Q_2, \ldots, Q_q\}$ in which they can express themselves. Furthermore, we assume that, transparent to the user, each of these quantifiers $Q_k$ is represented by a fuzzy subset of the unit interval. We now turn to the representation of user preference information. We first introduce the idea of a primal preference module (PPM). As we shall see, this will serve as the basic unit which can be used to evaluate the appropriateness of an object for recommendation based on the user's preferences. A PPM is of the form $\langle A_1, \ldots, A_q : Q \rangle$. The components of a PPM, the $A_i$, are assertions associated with the objects in $D$ and $Q$ is a linguistic quantifier. With a PPM a user can express preference information by describing what properties they are interested in with respect to the class of objects in $D$ and then using $Q$ to capture the desired relationship between these properties. For example, do they desire all or most or some or at least one of these requirements satisfied? If $h$ is a PPM we can evaluate any object in $D$ with respect to it. In particular, for object $d$ we obtain the values $A_j(d)$ from our representation of $d$ and then use the OWA aggregation to evaluate it: $h(d) = \mathrm{OWA}_Q[A_1(d), A_2(d), \ldots, A_q(d)]$. Here the weighting vector is determined from $Q$. While the PPM can be directly evaluated for any object, the great benefit of our system is that we can let users express their preferences in much more sophisticated ways. We now shall introduce the idea of a basic preference module (BPM). A BPM is a module of the form $m = \langle C_1, C_2, \ldots, C_p : Q \rangle$ in which the $C_i$ are called the components of the BPM. The only required property of these components is that they can be evaluated for each object. That is, for any $C_i$ we need to be able to obtain $C_i(d)$. Once having this we can obtain, using the OWA aggregation, $m(d) = \mathrm{OWA}_Q[C_1(d), \ldots, C_p(d)]$. Let us see what kinds of elements can constitute the $C_i$. Clearly the $C_i$ can be any of the assertions in the set $A$.
More generally, the $C_i$ can be any PPM, as we know how to evaluate these. Even more generally, the $C_i$ can itself be a BPM if we can evaluate it.² Additionally, the $C_i$ can be the negation of any of the preceding types. For example, if $C$ is a component which we can evaluate and we include as one of our components not $C$, denoted $\bar{C}$, then $\bar{C}(d) = 1 - C(d)$. We note that preferences specified in terms of attribute values can be easily represented in this framework. Let us illustrate this. Consider an attribute $V_j$ and let $A(V_j) = \{A_{j1}, A_{j2}, \ldots, A_{jn}\}$ be the subset of assertions related to the attribute $V_j$. Without loss of generality we shall let $A_{ji}$ indicate the assertion that $V_j$ is $a_{ji}$. First let us consider the case where $V_j$ is a variable, such as star in a movie, which can take multiple solutions. The requirement that $V_j$ has $a_{jq}$ as one of its values can be expressed simply by using the assertion $A_{jq}$ as one of the components in our preference modules. Consider now the situation where $V_j$ is an attribute, such as year of release of a movie, that can assume one and only one value. Consider now the representation of the desire that $V_j$ is $a_{j1}$. We
² In this case we must be careful to avoid self-reference.
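Fig. 3. Hierarchical structure of BPM. [Tree: the root BPM C = <C1, C2, ..., Cn : Q> has components C1, ..., Ck, ..., Cn; component C1 is itself the BPM <C11, C12, ... : Q1> and component Cn is the BPM <Cn1, Cn2, ... : Qn>; these decompose further (BPM C11, BPM Cn1) until the leaves, which are primitive assertions Ai, Aj.]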
represent this as the BPM $m = \langle C_1, C_2 : \text{all} \rangle$ where $C_1$ is simply the assertion $A_{j1}$. The component $C_2$ is obtained as not $C_3$, where $C_3$ is the BPM defined by $\langle A_{j2}, A_{j3}, \ldots, A_{jn} : Q \rangle$ where $Q$ is the quantifier any. The preceding illustrates the ability to formalize preferences expressed in terms of attributes within this framework. This allows users to express their preferences in terms of attribute requirements. Using this framework based on BPMs we can express very sophisticated user preferences. Using a BPM we can express any type of user preference information as long as it can be evaluated by decomposing it into primitive assertions. Of particular value is the fact that a user can express their preferences even using concepts and language not within the given set of primitive assertions and associated attributes, as long as they can eventually formulate their concepts using the primitive assertions. The general structure resulting from the use of BPMs is a hierarchical tree structure whose leaves are primitive assertions (see Fig. 3). Let us see the process. A user expresses a predilection, $C$, for some types of objects. This predilection is formalized in terms of some BPM, a collection of components (criteria) and some quantifier relating these components. These components get further expressed (decomposed) by BPMs which are then further decomposed until we reach a component that is a primitive assertion, which terminates a branch. This process can be considered as a type of grounding. We start at the top with the most highly abstract cognitive concepts; we then express these using less abstract terms and continue downward in the tree until we reach a grounded concept, a primitive assertion. Once having terminated each of the branches with a primitive assertion, our tree provides an operational definition of the predilection expressed by the user. For any object $d$ in $D$ we can evaluate the degree to which it satisfies the predilection expressed.
Starting at the bottom of the tree with the primitive assertions, whose validities can be obtained from our database, we then back up the tree using the OWA aggregation method. We stop when we reach the top of the tree; the value obtained there is the degree to which the object $d$ satisfies the expressed preference.
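The bottom-up evaluation of a BPM tree can be sketched as a short recursion. The encoding below (strings for primitive assertions, small dataclasses for negation and for modules, quantifiers as functions on the unit interval) is an illustrative assumption, as are the assertion labels and the step-function forms chosen for the quantifiers all and any.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class Not:
    component: "Component"          # negation: evaluates to 1 - C(d)


@dataclass
class BPM:
    components: List["Component"]   # C_1, ..., C_p
    quantifier: Callable[[float], float]  # RIM quantifier Q on [0, 1]


Component = Union[str, Not, BPM]    # a string names a primitive assertion


def owa_q(quantifier, values):
    """OWA aggregation with weights w_j = Q(j/n) - Q((j-1)/n)."""
    n = len(values)
    ordered = sorted(values, reverse=True)
    return sum((quantifier(j / n) - quantifier((j - 1) / n)) * b
               for j, b in enumerate(ordered, start=1))


def evaluate(component: Component, obj: Dict[str, float]) -> float:
    """Back up the tree from the primitive assertions to the root."""
    if isinstance(component, str):
        return obj.get(component, 0.0)          # leaf: validity from the data
    if isinstance(component, Not):
        return 1.0 - evaluate(component.component, obj)
    scores = [evaluate(c, obj) for c in component.components]
    return owa_q(component.quantifier, scores)


all_q = lambda r: 1.0 if r >= 1.0 else 0.0   # "all": behaves as the minimum
any_q = lambda r: 1.0 if r > 0.0 else 0.0    # "any": behaves as the maximum
some_q = lambda r: r                         # "some": the simple average

# Single-valued attribute requirement "year is 1993", built as <C1, C2 : all>
# with C2 the negation of <other years : any>.
year_is_1993 = BPM(["year=1993", Not(BPM(["year=1994", "year=1995"], any_q))], all_q)
preference = BPM([year_is_1993, "comedy"], some_q)

movie = {"year=1993": 1.0, "year=1994": 0.0, "year=1995": 0.0, "comedy": 0.7}
score = evaluate(preference, movie)
```

Because a `BPM` may contain other `BPM` objects, the recursion terminates only if no module refers to itself, which mirrors the caution about self-reference in the footnote.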
5. User profiles

Using these basic preference modules we can now define what we shall call a user profile. One part of the user profile is the user preference profile, which consists of a collection of basic preference modules, $m_j$ for $j = 1$ to $K$, of the type described in the preceding section. Each $m_j$ provides a description of a class of objects from $D$ that the user likes. These BPMs can be simple or sophisticated, ranging from statements such as "I like Robert DeNiro movies" to complicated descriptions of movies. For any object $d$, $m_j(d)$ indicates the degree to which it satisfies the BPM $m_j$. As any of these preference modules provides a justification for recommendation, an object satisfying any one of these $m_j$ is recommendable to the user. If $M = \{m_1, m_2, \ldots, m_K\}$ is the preference profile for a given user, then for any object $d$ in $D$ we calculate $M(d) = \mathrm{Max}_j[m_j(d)]$ as the degree of positive recommendation of this object to the user. At a formal level one can view a user preference profile $M$ as a BPM in which the components are the $m_j$. One reason we choose not to do this is that we prefer to reserve the idea of a BPM for user preferences that are in some sense conceptually distinct. Thus, while we can formally combine a user preference for Robert DeNiro movies with a user preference for 1940s musicals in a single BPM by just or-ing these, it is more in keeping with the way the user sees these to keep them distinct. Thus, in introducing the user profile we are emphasizing the individuality of each of the preferences in $M$. In the preceding we have assumed that each of the BPMs in $M$ had an equal value with regard to their worth to the user. We can consider the situation in which the user associates with each BPM $m_j$ in his profile a value $\alpha_j \in [0, 1]$ indicating the weight or strength of this preference module. Using this we can calculate
$$M(d) = \mathrm{Max}_j[m_j(d) \wedge \alpha_j].$$
We can also allow a user to supply negative or rejection information. We define a basic rejection module (BRM) $n_i$ to be a description of objects from $D$ which the user prefers not to have recommended to him. A BRM is of the same form as a BPM except that it describes features which the user specifies as constituting objects he does not want. Thus a second component of the user profile is a collection $N = \{n_1, n_2, \ldots\}$ of basic rejection modules. Using this we can calculate the degree of negative recommendation (rejection) of any object to a user, $N(d) = \mathrm{Max}_i[n_i(d)]$. It is not necessary that a user have any negative modules. Additionally, we can associate with each rejection module $n_i$ a value $\beta_i \in [0, 1]$ indicating the weight associated with the rejection module $n_i$. Using this we get $N(d) = \mathrm{Max}_i[n_i(d) \wedge \beta_i]$. We must now combine these two types of scores, recommendation and rejection. Here we just describe some types of operations available. The builder of the system must implement the one that best represents the situation they are trying to model. We note that closely related issues have been discussed by Dubois et al. [3] in their use of examples and counter-examples for querying databases.
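The positive and negative profile scores can be sketched with one helper, since both take a maximum over weighted module scores. How a module score is combined with its weight is treated here as an assumption: the default is the minimum, and the operator is exposed as a parameter so that another choice (for instance a product) can be substituted. The name `profile_score` and the example modules are hypothetical.

```python
def profile_score(modules, obj, combine=min):
    """Maximum over modules of combine(module score, module weight).

    modules: list of (module, weight) pairs, where module is a callable
    returning a degree in [0, 1] for an object and weight lies in [0, 1].
    combine: assumed weighting operator; min by default.
    An empty profile scores 0, so a user need not have any rejection modules.
    """
    return max((combine(module(obj), weight) for module, weight in modules),
               default=0.0)


# Hypothetical preference and rejection modules over assertion validities.
preferences = [
    (lambda d: d.get("star=DeNiro", 0.0), 1.0),
    (lambda d: d.get("1940s musical", 0.0), 0.5),
]
rejections = [
    (lambda d: d.get("horror", 0.0), 0.8),
]

movie = {"star=DeNiro": 0.0, "1940s musical": 0.9, "horror": 0.3}
positive = profile_score(preferences, movie)   # degree of recommendation M(d)
negative = profile_score(rejections, movie)    # degree of rejection N(d)
```

With unit weights the helper reduces to the unweighted maximum over module scores.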
Let $R(d)$ indicate the overall degree of recommendation of the object $d$ with respect to a user. One possibility is some kind of bounded subtraction of the two types of recommendations,

$R(d) = (M(d) - N(d)) \vee 0$.

Another possibility is to assume that rejection has priority over preference. In this case we have

$R(d) = (1 - N(d)) \wedge M(d)$.

Here we are saying that we recommend things that are preferred and not rejected by the user.

6. Extensionally expressed preference

Now we consider the environment in which each object in $D$ is represented but the user preference information is expressed extensionally. We assume each user has an associated subset $E$ of $D$ corresponding to the objects which it is known they have experienced. Since we shall focus on just one user, we can equivalently view this as a situation in which each object has an attribute indicating whether the person has experienced this object or not. Here we shall initially make a closed world assumption: any object which we have not been informed of the user experiencing we shall assume they have not experienced. In addition, in some systems of this type we shall assume that for any object which the user has experienced they have provided a value $a \in [0, 1]$ indicating their scoring of that object. Our goal here is to suggest ways in which we can use this type of information to recommend new objects to our user.

In this environment, one basic paradigm for justifying the recommendation of objects becomes obvious. We look for objects which the user has experienced and liked, and try to find objects which the user has not experienced that are similar to these. In trying to implement this we are faced with the issue of determining the similarity between objects. The problem of determining similarity is clearly context dependent and often very complex [2].
However, the assumed availability of a representation for each of the objects allows us to develop some kind of tool for calculating the similarity (proximity) between objects. In the following we shall assume the existence of a similarity (proximity) relationship $S$ over the set $D$ of objects. That is, for any two objects $d_i$ and $d_j$ in $D$ we assume $S(d_i, d_j) \in [0, 1]$ is available. The larger $S(d_i, d_j)$, the more related or similar the objects. As already indicated, we also assume the existence of some subset $E \subseteq D$ of objects the user has experienced. Furthermore, we assume for each object $d_i$ in $E$ the availability of a rating $a_i$ indicating the score the user has attributed to this object. We note that these ratings can be viewed as a fuzzy subset $A$ of $E$ in which $A(d_i) = a_i$. Semantically, $A$ corresponds to the subset of objects the user liked.

Our goal here is to obtain a fuzzy subset $R$ over the space $M = D - E$ corresponding to the objects to be recommended. One approach to obtaining this fuzzy subset is in the spirit of fuzzy modeling. We shall try to provide a collection of justifications, rules or circumstances, which indicate that an object in $M$ is suitable for recommendation. If $R_j$ are a collection of circumstances for recommending objects, where $R_j(d_i)$ indicates the degree to which $d_i \in M$ meets this condition, then

$R(d_i) = \mathrm{Max}_j[R_j(d_i)]$.
Let us begin by considering some guidelines that can be used to support recommendation of an object $d_i$. Our focus here is not so much on providing a definitive listing of rules as on seeing how fuzzy logic can be used to evaluate some commonsense guidelines which are expressed in a natural type of language. That is, we are interested in the process of translating rules that appear to provide reasonable justifications for recommending objects. A most natural source of recommendation can be captured by the following guideline.

Rule 1: Recommend an object if there exists a similar object that the user liked.

Under this rule the strength of recommendation of an unexperienced object $d_i$ in $D - E$ can be obtained as

$R_1(d_i) = \mathrm{Max}_{d_j \in E}[S(d_i, d_j) \wedge A(d_j)]$.
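Rule 1 translates almost directly into code. The sketch below is a minimal illustration: the paper only assumes that some similarity relationship $S$ exists, so the Jaccard similarity over feature sets used here, along with the object identifiers and ratings, is a stand-in chosen for the example.

```python
from typing import Callable, Dict, Set

# Hypothetical object representation: each object is a set of feature tags.
features: Dict[str, Set[str]] = {
    "d1": {"crime", "drama"},
    "d2": {"musical", "romance"},
    "d3": {"crime", "drama", "thriller"},
}


def jaccard(x: str, y: str) -> float:
    """Stand-in similarity S(x, y) in [0, 1]; any proximity relation would do."""
    union = features[x] | features[y]
    return len(features[x] & features[y]) / len(union) if union else 0.0


def rule1(candidate: str, ratings: Dict[str, float],
          sim: Callable[[str, str], float]) -> float:
    """R1(d_i): max over experienced d_j of min(S(d_i, d_j), A(d_j))."""
    return max((min(sim(candidate, dj), a) for dj, a in ratings.items()),
               default=0.0)


# Ratings A(d_j) for the experienced objects E = {d1, d2}.
ratings = {"d1": 0.9, "d2": 0.3}
r1_d3 = rule1("d3", ratings, jaccard)
print(r1_d3)
```

Here `d3` shares two of three features with the well-liked `d1`, so the similarity rather than the rating is the limiting factor in the minimum.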
We express a second guideline for recommending objects, which can be seen as a softening of the first rule.

Rule 2: Recommend an object for which we have at least several comparable objects which the user somewhat liked.

Here we are softening the requirements of Rule 1 by allowing a weaker indication of satisfaction, "somewhat liked", and by allowing a weaker connection, as denoted by the use of the word "comparable" instead of "similar". We compensate for this reduction by requiring at least several such objects instead of just a single object. Our goal now is to suggest a way to formalize this type of rule so that we can evaluate it.

In anticipation of expressing this rule we introduce some fuzzy subsets. First we note that the term "at least several" is an example of what Zadeh [22] called a linguistic quantifier: words denoting precise or imprecise quantities. In [22] Zadeh suggested that any linguistic quantifier can be represented as a fuzzy subset $Q$ of the set of integers. It is clear that "at least several" is monotonic in that $Q(k_1) \geq Q(k_2)$ if $k_1 > k_2$.

We must now introduce a fuzzy subset to capture the idea of "somewhat liked". This concept can be modeled in a number of different ways. With $A$ being the fuzzy subset of $E$ indicating the user's satisfaction, $A(d_j) = a_j$, we let $\tilde{A}$ be a softening of this corresponding to the concept "somewhat liked". One way of defining $\tilde{A}$ is as $\tilde{A}(d_i) = (A(d_i))^{\alpha}$ for $0 < \alpha \leq 1$. The smaller $\alpha$, the greater the softening. Another method of defining $\tilde{A}$ uses a threshold $\beta$:

$\tilde{A}(d_i) = 1$ if $A(d_i) \geq \beta$,
$\tilde{A}(d_i) = A(d_i)/\beta$ if $A(d_i) \leq \beta$.
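The two softenings can be sketched as small functions. The power form follows the text directly. The threshold form is one reading of the piecewise definition, assuming the ratio $A(d_i)/\beta$ below the threshold, so it should be treated as an assumption. The parameter names `alpha` and `beta` are illustrative.

```python
def soften_power(a: float, alpha: float) -> float:
    """Power softening a**alpha with 0 < alpha <= 1; smaller alpha softens more."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return a ** alpha


def soften_threshold(a: float, beta: float) -> float:
    """Threshold softening (assumed form): 1 at or above beta, a / beta below."""
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    return 1.0 if a >= beta else a / beta


# Both transforms never lower a membership grade, i.e. T(a) >= a on [0, 1].
grid = [k / 10 for k in range(11)]
never_lowers = all(
    soften_power(a, 0.5) >= a and soften_threshold(a, 0.6) >= a for a in grid
)
print(soften_power(0.25, 0.5), soften_threshold(0.3, 0.6), never_lowers)
```

The final check confirms on a coarse grid that neither transform reduces a grade, which is the property a softening function needs.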
More generally, we can express $\tilde{A}$ using a transformation function $T : [0, 1] \to [0, 1]$ such that $T(a) \geq a$, and then defining $\tilde{A}(x_j) = T(A(x_j))$. The function $T$ can be expressed using a fuzzy systems model [12], for example:

if $a$ is low then $T(a)$ is medium
if $a$ is moderate then $T(a)$ is high
if $a$ is large then $T(a)$ is very large

Finally, we must define the concept "comparable". As used, the term comparable is meant to indicate a softening of the concept of similar. Again, if $T$ is defined as in the preceding, as some softening function with $T(a) \geq a$, we can use it to provide a definition for comparable. Thus, if $S(x, y)$ indicates the degree of similarity between two objects, then we can use $\mathrm{Comp}(x, y) = T(S(x, y))$ to indicate
the degree to which they are comparable. One possible definition for $T$ in this case is $T(a) = 1$ if $a \geq \beta$ and $T(a) = a/\beta$ if $a < \beta$.

Once we have satisfactory representations of these softening concepts we can use them to provide an operational formulation of this second rule. For any object $d_i \in D - E$ we have

$R_2(d_i) = \mathrm{Max}_{F \subseteq E}\big[Q(|F|) \wedge \mathrm{Min}_{d_j \in F}\big(\tilde{A}(d_j) \wedge \mathrm{Comp}(d_j, d_i)\big)\big]$.
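The formula for $R_2$ can be evaluated literally by enumerating subsets of $E$. The sketch below does that. It is exponential in $|E|$ and meant only to make the definition concrete. The quantifier `at_least_several`, the square-root softening, and the ratings and similarities are all hypothetical choices for illustration.

```python
from itertools import combinations
from math import sqrt
from typing import Callable, Dict


def at_least_several(k: int) -> float:
    """Hypothetical monotone quantifier Q: 0 for one object, 1 from four upward."""
    return min(1.0, max(0.0, (k - 1) / 3))


def at_least_one(k: int) -> float:
    """Crisp quantifier: satisfied by any non-empty subset."""
    return 1.0 if k >= 1 else 0.0


def rule2(ratings: Dict[str, float], sim_to_candidate: Dict[str, float],
          quantifier: Callable[[int], float],
          soften_rating: Callable[[float], float],
          soften_sim: Callable[[float], float]) -> float:
    """Max over non-empty F of min(Q(|F|), min over F of softened rating and sim)."""
    experienced = list(ratings)
    best = 0.0
    for size in range(1, len(experienced) + 1):
        for subset in combinations(experienced, size):
            weakest = min(
                min(soften_rating(ratings[dj]), soften_sim(sim_to_candidate[dj]))
                for dj in subset
            )
            best = max(best, min(quantifier(size), weakest))
    return best


# Hypothetical ratings A(d_j) and similarities S(d_j, d_i) to one candidate d_i.
ratings = {"e1": 0.81, "e2": 0.64, "e3": 0.49, "e4": 0.04}
sims = {"e1": 0.64, "e2": 0.81, "e3": 0.36, "e4": 0.90}

r2 = rule2(ratings, sims, at_least_several, sqrt, sqrt)
# With identity transforms and "at least one", the value reduces to Rule 1.
r1 = rule2(ratings, sims, at_least_one, lambda a: a, lambda a: a)
print(r2, r1)
```

In this data set the best subset for the softened rule has three members: a fourth object would satisfy the quantifier fully but drags the minimum down to the poorly rated `e4`.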
In the preceding we can express $\tilde{A}(d_j) = T_1(A(d_j))$ and $\mathrm{Comp}(d_j, d_i) = T_2(S(d_j, d_i))$, where $T_1$ and $T_2$ are two transformations. It is interesting to see that our first rule is a special case of this. If we let $T_1$ and $T_2$ be identity transforms, $T_1(a) = T_2(a) = a$, then

$R_2(d_i) = \mathrm{Max}_{F \subseteq E}\big[Q(|F|) \wedge \mathrm{Min}_{d_j \in F}\big(A(d_j) \wedge S(d_j, d_i)\big)\big]$.

Furthermore, if $Q$ is defined to be "at least one" then $Q(|F|) = 1$ if $F \neq \emptyset$ and hence

$R_2(d_i) = \mathrm{Max}_{F \subseteq E}\big[\mathrm{Min}_{d_j \in F}\big(A(d_j) \wedge S(d_j, d_i)\big)\big]$.
This can be seen to be equal to $\mathrm{Max}_{d_j \in E}[A(d_j) \wedge S(d_j, d_i)]$, which is $R_1(d_i)$.

It is interesting to consider a collection of rules of this type, $R_k = \langle Q_k, \tilde{A}_k, \mathrm{Comp}_k \rangle$, where each is a softening: each one requires more objects but softens either or both of the requirements regarding satisfaction to the user and proximity to the object being evaluated. In this softening process we are essentially increasing the radius about the object and decreasing the required strength, but increasing the number of objects that need to be found.

Another method for justifying possible objects to recommend is to look for unexperienced objects that have a lot of neighbors which the user has experienced, regardless of the valuation they have been given. This captures the idea that the user likes objects of this type regardless of their evaluation. For example, a person may see horror movies even if they think these movies are bad. This type of situation can be expressed as an extreme case of the preceding recommendation rules. Again consider a rule $\langle Q, \tilde{A}, \mathrm{Comp} \rangle$, which we evaluate for an item as

$R(d_i) = \mathrm{Max}_{F \subseteq E}\big[Q(|F|) \wedge \mathrm{Min}_{d_j \in F}\big(\tilde{A}(d_j) \wedge \mathrm{Comp}(d_j, d_i)\big)\big]$.
To capture the above imperative we let $\tilde{A}(d_j) = 1$ if $d_j \in E$ and hence we get

$R(d_i) = \mathrm{Max}_{F \subseteq E}\big[Q(|F|) \wedge \mathrm{Min}_{d_j \in F}\,\mathrm{Comp}(d_j, d_i)\big]$.

Letting $\mathrm{Comp}(d_j, d_i) = S(d_j, d_i)$ we have $R(d_i) = \mathrm{Max}_{F \subseteq E}\big[Q(|F|) \wedge \mathrm{Min}_{d_j \in F}\,S(d_j, d_i)\big]$. This can be seen as a type of fuzzy integral [11]. Let $S_{\mathrm{index}_i(k)}$ be the similarity of the $k$th most similar object in $E$ to the object $d_i$. Furthermore let $q_k = Q(k)$; then

$R(d_i) = \mathrm{Max}_k[q_k \wedge S_{\mathrm{index}_i(k)}]$.
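The ranked form avoids enumerating subsets: for a fixed size $k$, the subset of $E$ with the largest minimum similarity is simply the $k$ most similar objects, so only the $k$th largest similarity matters. A minimal sketch, with a hypothetical quantifier and hypothetical similarity values:

```python
from typing import Callable, Iterable


def neighbourhood_score(similarities: Iterable[float],
                        quantifier: Callable[[int], float]) -> float:
    """Max over k of min(Q(k), k-th largest similarity to the candidate)."""
    ranked = sorted(similarities, reverse=True)
    return max(
        (min(quantifier(k), s) for k, s in enumerate(ranked, start=1)),
        default=0.0,
    )


def at_least_several(k: int) -> float:
    """Hypothetical monotone quantifier Q: 0 for one object, 1 from four upward."""
    return min(1.0, max(0.0, (k - 1) / 3))


# Hypothetical similarities S(d_j, d_i) of four experienced objects to d_i.
score = neighbourhood_score([0.9, 0.4, 0.7, 0.8], at_least_several)
print(score)
```

Sorting makes the evaluation $O(|E| \log |E|)$ per candidate, compared with the exponential cost of a literal maximisation over all subsets.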
The essential idea of the preceding methods for justifying items to recommend was to discover unexperienced items located in areas of the object space that are rich in objects that the user liked or experienced. We can capture this imperative in an alternative manner, one that is in the spirit of the mountain method [19,20]. For each unexperienced item $d_i \in D - E$ we calculate $M_1(d_i) = \sum_{d_j \in E} a_j S(d_i, d_j)$. Here $M_1(d_i)$ can be seen as a kind of support for $d_i$ based on a weighted sum. We let $d^*$ be such that $M_1(d^*) = \mathrm{Max}_i[M_1(d_i)]$. Using this we can obtain an evaluation or score for each $d_i$ as $R(d_i) = M_1(d_i)/M_1(d^*)$.

Alternatively, for each $d_i \in D - E$ we can calculate $M_2(d_i) = \sum_{d_j \in E} S(d_i, d_j)$, the density of nearby experienced objects regardless of the user's rating. We then calculate $d^+$ such that $M_2(d^+) = \mathrm{Max}_i[M_2(d_i)]$. From this we obtain a normalized score function $R(d_i) = M_2(d_i)/M_2(d^+)$.

7. Using domain expert prototypes

We shall now consider another approach for obtaining a basis for recommendation of objects. In this approach, we rely upon the use of expert-defined prototype objects. That is, using the features available in the representation of the objects, we allow experts in the domain to define prototype objects. In some sense these prototypes can be seen as a reflection of the language and categories in which the community discusses the domain. For example, terms like film noir, happy movie, blockbuster and epic can be seen as prototypes used in the movie domain. The actual creation and selection of these domain prototypes is clearly a creative activity, and we shall not venture into the issue of methods for generating prototypes. From a formal point of view these prototypes can be expressed in terms of the primitive attributes of the objects in a manner analogous to that used to express user preference models. Here we shall assume the availability of a collection of prototypes.
For our subsequent purposes we shall consider a prototype object $T_i$ to be some function of the domain of attributes such that for each $d_j \in D$ we can obtain $T_i(d_j)$ as a value in the unit interval. As a justification for recommendation we can say that if a user likes prototype class $T_i$ and if a given unexperienced object $d$ is in this prototype class, then we recommend the object. To calculate the degree to which an object $d$ is of type $T_i$ we use our definition of this prototype class, in which $T_i(d)$ indicates the degree to which $d$ belongs to the prototype. If we let $\alpha_i$ indicate the degree to which the user likes type $T_i$ objects, then the degree of recommendation of object $d$ is

$R_i(d) = \alpha_i \wedge T_i(d)$.
In order to implement this we need to obtain $\alpha_i$, the degree to which the user likes type $T_i$ objects. One method of determining this is as follows. For any experienced object $d_j \in E$ let $a_j$ be the user's rating. Then

$L(T_i) = \sum_{d_j \in E} a_j T_i(d_j) \,\big/\, \sum_{d_j \in E} T_i(d_j)$

gives us the user's average rating for objects in class $T_i$. We can use this for $\alpha_i$.

Let us consider another method for determining a user's inclination toward objects in class $T_i$. If a user has experienced a lot of objects in the class $T_i$ it is reasonable to assume that they are interested in this class. This interest may be independent of whether they report liking the objects or not. People experience things for various, sometimes neurotic, reasons, not necessarily only because they think they are good. The term "camp", used to describe objects that are so bad they become amusing, reflects this situation. Thus, it appears useful to be able to provide some indication that a user has
a significant degree of interest in movies of type $T_i$ based solely on the quantity of items of this type experienced by the user. We can calculate the number of objects of type $T_i$ experienced by this user as

$N(T_i) = \sum_{d_j \in E} T_i(d_j)$.
Using this, we can calculate a number of indices. The first,

$P_1(T_i) = \sum_{d_j \in E} T_i(d_j) \,\big/\, \sum_{d_j \in D} T_i(d_j)$,

indicates the proportion of available type $T_i$ objects experienced by this user. The second index is

$P_2(T_i) = \sum_{d_j \in E} T_i(d_j) \,\big/\, \sum_{k} \big(\sum_{d_j \in E} T_k(d_j)\big)$.
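The quantities $L(T_i)$, $N(T_i)$, $P_1(T_i)$ and $P_2(T_i)$ are simple sums over membership grades. The sketch below computes them for hypothetical data. It assumes that the keys of each membership table make up the whole of $D$, and it reads $P_2$ as the share of the user's experienced membership mass, summed over all prototypes, that falls in $T_i$. That reading is an assumption about the second index. Zero denominators are mapped to 0 as a further convention of the sketch.

```python
from typing import Dict

# Prototype name -> object id -> membership grade T_i(d_j); hypothetical data.
Membership = Dict[str, Dict[str, float]]


def safe_div(num: float, den: float) -> float:
    """Division that returns 0.0 when the denominator is zero (a convention)."""
    return num / den if den > 0 else 0.0


def prototype_indices(proto: str, membership: Membership,
                      ratings: Dict[str, float]) -> Dict[str, float]:
    """Compute L, N, P1 and P2 for one prototype; E is the key set of ratings."""
    T = membership[proto]
    n_exp = sum(T[dj] for dj in ratings)                       # N(T_i)
    liking = safe_div(sum(a * T[dj] for dj, a in ratings.items()), n_exp)
    p1 = safe_div(n_exp, sum(T.values()))                      # share of all of D
    # Assumed reading of P2: mass in T_i over mass in every prototype, within E.
    total = sum(membership[p][dj] for p in membership for dj in ratings)
    p2 = safe_div(n_exp, total)
    return {"L": liking, "N": n_exp, "P1": p1, "P2": p2}


membership = {
    "noir": {"d1": 1.0, "d2": 0.5, "d3": 0.0, "d4": 1.0},
    "epic": {"d1": 0.0, "d2": 0.5, "d3": 1.0, "d4": 0.5},
}
ratings = {"d1": 0.8, "d2": 0.4}  # the user has experienced d1 and d2

noir = prototype_indices("noir", membership, ratings)
print(noir)
```

For this user the film noir prototype has an average rating of about 0.67, covers 60% of the available noir membership, and accounts for 75% of what the user has experienced.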
We now define a fuzzy subset of the unit interval corresponding to the concept "significant proportion" and calculate the degree of membership of both $P_1(T_i)$ and $P_2(T_i)$ in this set. Let us denote these values as $SP_1(T_i)$ and $SP_2(T_i)$. Using these and the value $L(T_i)$ we can calculate $\alpha_i = \mathrm{Max}[L(T_i), SP_1(T_i), SP_2(T_i)]$ and then use as our degree of recommendation $R_i(d) = \alpha_i \wedge T_i(d)$.

8. Conclusion

Here we have considered methodologies for constructing recommender systems. The approaches studied here differ from collaborative filtering in that they are based solely on the preferences of the individual for whom we are providing the recommendation and make no use of the preferences of other individuals. We have called these reclusive methods. Another important feature distinguishing these reclusive methods from collaborative methods is that they require a representation of the objects, which collaborative filtering methods do not necessarily require. While our focus has been on reclusive rather than collaborative methods, optimal recommender systems should of course use all the information available and hence should be based on a combination of these two classes of systems. In future research we shall look at methods integrating collaborative and reclusive approaches.

References
[1] R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, Reading, MA, 1999.
[2] J.C. Bezdek, J. Keller, R. Krishnapuram, N.R. Pal, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, Kluwer, Boston, 1999.
[3] D. Dubois, H. Prade, F. Sedes, Fuzzy logic techniques in multimedia database querying: a preliminary investigation of the potentials, IEEE Trans. Knowledge Data Eng. 13 (2001) 383-392.
[4] D. Goldberg, D. Nichols, B.M. Oki, D. Terry, Using collaborative filtering to weave an information tapestry, Comm. ACM 35 (12) (1992) 61-70.
[5] H. Kautz, Recommender Systems, AAAI Press, Menlo Park, CA, 1998.
[6] J.A. Konstan, B.N. Miller, D. Maltz, J.L. Herlocker, L.R. Gordon, J. Riedl, Grouplens: applying collaborative filtering to Usenet news, Comm. ACM 40 (3) (1997) 77-87.
[7] P. Perny, J.-D. Zucker, Collaborative filtering methods based on fuzzy preference relations, EUROFUSE-SIC99, Budapest, 1999.
[8] P. Resnick, H.R. Varian, Recommender systems, Comm. ACM 40 (3) (1997) 56-58.
[9] J.B. Schafer, J.A. Konstan, J. Riedl, E-Commerce recommendation applications, Data Mining Knowledge Discovery 5 (2001) 115-153.
[10] U. Shardanand, P. Maes, Social information filtering: algorithms for automating word of mouth, Proc. Computer Human Interaction-95 Conference, Denver, 1995, pp. 210-217.
[11] M. Sugeno, Fuzzy measures and fuzzy integrals: a survey, in: M.M. Gupta, G.N. Saridis, B.R. Gaines (Eds.), Fuzzy Automata and Decision Process, North-Holland, Amsterdam, 1977, pp. 89-102.
[12] T. Terano, K. Asai, M. Sugeno, Applied Fuzzy Systems, Academic Press, Orlando, FL, 1994.
[13] R.R. Yager, On ordered weighted averaging aggregation operators in multi-criteria decision making, IEEE Trans. Systems Man Cybernet. 18 (1988) 183-190.
[14] R.R. Yager, Families of OWA operators, Fuzzy Sets and Systems 59 (1993) 125-148.
[15] R.R. Yager, Quantifier guided aggregation using OWA operators, Internat. J. Intell. Systems 11 (1996) 49-73.
[16] R.R. Yager, Targeted e-commerce marketing using fuzzy intelligent agents, IEEE Intell. Systems (2000) 42-45.
[17] R.R. Yager, Veristic variables, IEEE Trans. Systems Man Cybernet. Part B: Cybernetics 30 (2000) 71-84.
[18] R.R. Yager, A hierarchical document retrieval language, Inform. Retrieval 3 (2000) 357-377.
[19] R.R. Yager, D.P. Filev, Approximate clustering via the mountain method, IEEE Trans. Systems Man Cybernet. 24 (1994) 1279-1284.
[20] R.R. Yager, D.P. Filev, Generation of fuzzy rules by mountain clustering, J. Intell. Fuzzy Systems 2 (1994) 209-219.
[21] R.R. Yager, J. Kacprzyk, The Ordered Weighted Averaging Operators: Theory and Applications, Kluwer, Norwell, MA, 1997.
[22] L.A. Zadeh, A computational approach to fuzzy quantifiers in natural languages, Comput. Math. Appl. 9 (1983) 149-184.
[23] L.A. Zadeh, Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic, Fuzzy Sets and Systems 90 (1997) 111-127.
[24] L.A. Zadeh, A new direction in AI - toward a computational theory of perceptions, Artif. Intell. Mag. 22 (1) (2001) 73-84.


Tel.: +1-212-249-2047; fax: +1-212-249-1689. E-mail address: ryager@iona.edu (R.R. Yager).

Fig. 1. Recommender systems information structure. [Figure not reproduced; surviving labels: Collaborators, Peer Group, Preference Information Type.]

Fig. 2. Linguistic quantifier at least .

In this case we must be careful to avoid self-reference.

Fig. 3. Hierarchical structure of BPM. [Figure not reproduced; it shows a top-level BPM C = <C1, C2, ..., Cn : Q> whose components are themselves BPMs, e.g. C1 = <C11, C12, ... : Q1> and Cn = <Cn1, Cn2, ... : Qn>.]
