Sie sind auf Seite 1von 16

Advanced information

retrieval
Chapter. 02: Modeling (Set
Theoretic Models)
Fuzzy model


Set Theoretic Models
The Boolean model imposes a binary
criterion for deciding relevance
The question of how to extend the
Boolean model to accomodate partial
matching and a ranking has attracted
considerable attention in the past
We discuss now two set theoretic models
for this:
Fuzzy Set Model
Extended Boolean Model
Fuzzy Set Model
Queries and docs represented by sets of index
terms: matching is approximate from the start
This vagueness can be modeled using a fuzzy
framework, as follows:
with each term is associated a fuzzy set
each doc has a degree of membership in this fuzzy
set
This interpretation provides the foundation for
many models for IR based on fuzzy theory
In here, we discuss the model proposed by
Ogawa, Morita, and Kobayashi (1991)
Fuzzy Set Theory
Framework for representing classes whose
boundaries are not well defined
Key idea is to introduce the notion of a degree of
membership associated with the elements of a set
This degree of membership varies from 0 to 1 and
allows modeling the notion of marginal membership
Thus, membership is now a gradual notion, contrary
to the crispy notion enforced by classic Boolean logic
Alternative Set Theoretic Models
-Fuzzy Set Model
Model
a query term: a fuzzy set
a document: degree of membership in this set
membership function
Associate membership function with the elements
of the class
0: no membership in the set
1: full membership
0~1: marginal elements of the set
documents
Fuzzy Set Theory
A fuzzy subset A of a universe of discourse
U is characterized by a membership
function
A
: U[0,1] which associates
with each element u of U a number
A
(u)
in the interval [0,1]
complement:
union:
intersection:
) ( 1 ) ( u u
A
A
=
)) ( ), ( max( ) ( u u u
B A B A
=

)) ( ), ( min( ) ( u u u
B A B A
=

a class
a document
document collection
for query term
Examples
Assume U={d
1
, d
2
, d
3
, d
4
, d
5
, d
6
}
Let A and B be {d
1
, d
2
, d
3
} and {d
2
, d
3
, d
4
},
respectively.
Assume
A
={d
1
:0.8, d
2
:0.7, d
3
:0.6, d
4
:0, d
5
:0, d
6
:0}
and

B
={d
1
:0, d
2
:0.6, d
3
:0.8, d
4
:0.9, d
5
:0, d
6
:0}
={d
1
:0.2, d
2
:0.3, d
3
:0.4, d
4
:1, d
5
:1, d
6
:1}
=
{d
1
:0.8, d
2
:0.7, d
3
:0.8, d
4
:0.9, d
5
:0, d
6
:0}
=
{d
1
:0, d
2
:0.6, d
3
:0.6, d
4
:0, d
5
:0, d
6
:0}
) ( 1 ) ( u u
A
A
=
)) ( ), ( max( ) ( u u u
B A B A
=

)) ( ), ( min( ) ( u u u
B A B A
=

Fuzzy Information Retrieval


basic idea
Expand the set of index terms in the query
with related terms (from the thesaurus) such
that additional relevant documents can be
retrieved
A thesaurus can be constructed by defining a
term-term correlation matrix c whose rows
and columns are associated to the index
terms in the document collection
keyword connection matrix
Fuzzy Information Retrieval
(Continued)
normalized correlation factor c
i,l
between
two terms k
i
and k
l
(0~1)



In the fuzzy set associated to each index
term k
i
, a document d
j
has a degree of
membership
i,j
l i l i
l i
l i
n n n
n
c
,
,
,
+
=
) 1 ( 1
, ,
[
e
=
j
d
l
k
l i j i
c
where
n
i
is # of documents containing term k
i
n
l
is # of documents containing term k
l
n
i,l
is # of documents containing k
i
and k
l
Fuzzy Information Retrieval
(Continued)
physical meaning
A document d
j
belongs to the fuzzy set associated to
the term k
i
if its own terms are related to k
i
, i.e.,

i,j
=1.
If there is at least one index term k
l
of d
j
which is
strongly related to the index k
i
, then
i,j
~1.
k
i
is a good fuzzy index
When all index terms of d
j
are only loosely related to
k
i
,
i,j
~0.
k
i
is not a good fuzzy index
Example
q=(k
a
. (k
b
v k
c
))
=(k
a
. k
b
. k
c
) v (k
a
. k
b
. k
c
) v(k
a
. k
b
. k
c
)
=cc
1
+cc
2
+cc
3

D
a
D
b
D
c
cc
3
cc
2
cc
1
D
a
: the fuzzy set of documents
associated to the index k
a
d
j
eD
a
has a degree of membership

a,j
> a predefined threshold K

D
a
: the fuzzy set of documents
associated to the index k
a

(the negation of index term k
a
)


Example
)) 1 )( 1 ( 1 ( )) 1 ( 1 ( ) 1 ( 1
) 1 ( 1
, , , , , , , , ,
3
1
,
, 3 2 1 ,
j c j b j a j c j b j a j c j b j a
i
j
i
cc
j cc cc cc j q


=
=
=
[
=
+ +
Query q=k
a
. (k
b
v k
c
)
disjunctive normal form q
dnf
=(1,1,1) v (1,1,0) v (1,0,0)
(1) the degree of membership in a disjunctive fuzzy set is computed
using an algebraic sum (instead of max function) more smoothly
(2) the degree of membership in a conjunctive fuzzy set is computed
using an algebraic product (instead of min function)
Recall
) ( 1 ) ( u u
A
A
=
Fuzzy Set Model
Q: gold silver truck
D1: Shipment of gold damaged in a fire
D2: Delivery of silver arrived in a silver truck
D3: Shipment of gold arrived in a truck
IDF (Select Keywords)
a = in = of = 0 = log
3/3

arrived = gold = shipment = truck = 0.176 = log
3/2

damaged = delivery = fire = silver = 0.477 = log
3/1

8 Keywords (Dimensions) are selected
arrived(1), damaged(2), delivery(3), fire(4), gold(5),
silver(6), shipment(7), truck(8)
Fuzzy Set Model
Fuzzy Set Model
Fuzzy Set Model
Sim(q,d): Alternative 1





Sim(q,d
3
) > Sim(q,d
2
) > Sim(q,d
1
)
Sim(q,d): Alternative 2





Sim(q,d
3
) > Sim(q,d
2
) > Sim(q,d
1
)

Das könnte Ihnen auch gefallen