
Proceedings of the Third Workshop on Knowledge Economy and Electronic Commerce

Discovering fuzzy time-interval sequential patterns in sequence databases
Yen-Liang Chen
Department of Information Management
National Central University
ylchen@mgt.ncu.edu.tw

Cheng-Kui Huang
Department of Information Management
National Central University
ckhuang@mgt.ncu.edu.tw

Abstract
Given a sequence database and a minimum support threshold, the task of sequential
pattern mining is to discover the complete set of sequential patterns in the database. From
the discovered sequential patterns, we can learn what items are frequently bought
together and in what order they appear. However, these patterns cannot tell us the time gaps
between successive items. Accordingly, Chen, Chiang and Ko have proposed a
generalization of sequential patterns, called time-interval sequential patterns, which
reveals not only the order of items but also the time intervals between successive items
(Chen et al. 2003). A time-interval sequential pattern has a form like (A, I2,
B, I1, C), meaning that we buy A first, then after an interval of I2 we buy B, and finally
after an interval of I1 we buy C, where I2 and I1 are predetermined time ranges. Although
this new type of pattern alleviates the above concern, it causes the sharp boundary
problem: when a time interval is near the boundary of two predetermined time
ranges, we either ignore or overemphasize it. Therefore, this paper uses the concept of
fuzzy sets to extend the original research so that fuzzy time-interval sequential patterns
can be discovered from databases. An efficient algorithm, the FTI-Apriori algorithm, is
developed for mining fuzzy time-interval sequential patterns by modifying the traditional
Apriori algorithm. An experimental study of the algorithm is presented.
Keywords: data mining, sequential patterns, sequence data, time interval, fuzzy sets

1. Introduction
Data mining extracts implicit, previously unknown and potentially useful
information from databases. The discovered information and knowledge are useful for
various applications, including market analysis, decision support, fraud detection and
business management. Many approaches have been proposed to extract information, and
mining sequential patterns is one of the most important approaches (Han et al. 2000).
The problem of mining sequential patterns, first introduced in the mid-1990s,
is to discover patterns that occur frequently in a sequence database (Agrawal et al.
1995; Pei et al. 2000). A typical example of a sequential pattern is a
customer who, having bought a computer, returns to buy a scanner and a microphone.
Although the discovered sequential patterns can reveal what items are frequently bought
together and in what order they appear, they cannot tell us the time gaps between
successive items. Unfortunately, not knowing the time intervals means that, although
we know what items will be bought next, we have no idea when the next purchase will
happen; this makes it difficult to take the right action at the right time. In view of this
problem, Chen, Chiang and Ko (Chen et al. 2003) have proposed a generalization of
sequential patterns, called time-interval sequential patterns, which reveals not only the
order of items but also the time intervals between successive items. The following are
some examples of the time-interval sequential pattern: (a) having bought a laser printer, a
customer returns to buy a scanner in three months and then a CD burner in six months. (b)
A customer revisits website A within a week. (c) After an operation X, a patient is very
likely to be infected by virus Y in two weeks.
Here, we briefly restate the approach proposed by Chen, Chiang and Ko (Chen et al.
2003). The input of their problem contains a sequence database S, a set $I = \{i_1, i_2, \ldots, i_m\}$
of items and a set $TI = \{I_0, I_1, I_2, \ldots, I_r\}$ of time intervals, where TI is a complete and
non-overlapping partition of the time domain. A sequence $B = (b_1, \&_1, b_2, \&_2, \ldots, b_{v-1}, \&_{v-1}, b_v)$
is a time-interval sequence if $b_i \in I$ for $1 \le i \le v$ and $\&_i \in TI$ for $1 \le i \le v-1$. The output is all
time-interval sequences which occur frequently in database S. A
time-interval sequential pattern has a form like (A, I2, B, I1, C), meaning that we buy A
first, then after an interval of I2 we buy B, and finally after an interval of I1 we buy C,
where I2 and I1 are predetermined time intervals.
Although sequential patterns extended with time-intervals can offer more
information than those without time-intervals, the approach may cause the sharp
boundary problem. That is, when a time interval is near the boundary of two adjacent
ranges, we either ignore or overemphasize it. For example, let the interval of I2 be
$5 \le t < 10$ and that of I3 be $10 \le t < 20$, where t is the time gap between two successive items.
Then if the time gap between items A and B is near 10, either a little larger or smaller than 10,
it is not fair to judge whether the time interval between A and B is in I2 or in I3.
However, according to the original definition of Chen, Chiang and Ko, it can only be one
hundred percent in I2 or in I3. This difficulty can be adequately tackled by using fuzzy
techniques, for fuzzy set theory allows this time gap to be 50% in I2 and at the same time
50% in I3. This simple example indicates that the fuzzy concept is better than the partition
method because fuzzy sets provide a smooth transition between member and
non-member of a set.
Besides the above-mentioned benefit, there are several other reasons that support the
use of fuzzy time intervals in place of crisp intervals. First, human knowledge can be
represented more naturally and appropriately by fuzzy logic, and how to partition and
represent the time interval is a sort of human knowledge. Second, it is widely recognized
that many real-world situations are intrinsically fuzzy, and the partition of time intervals is
one of them. Third, fuzzy time intervals are simple and easy for users. For example, if we
use fuzzy sets to handle the time intervals, we can first define the linguistic terms that are
meaningful and understandable to users. Then, for each such term we can choose an
appropriate fuzzy membership function to represent it.
A number of studies have exploited fuzzy techniques to mine fuzzy association
rules or sequential patterns from databases. These efforts can be roughly classified into
the following: (1) fuzzy representation of item quantities (Lee et al. 1997), (2) fuzzy
representation of quantitative attributes (Hong et al. 1999; Zhang 1999), (3) fuzzy product
taxonomies or generalization hierarchies (Chen et al. 2002), (4) fuzzy representation of
item importance (Yue et al. 2000), (5) fuzzy representation of transactions (Lee 2000), (6)
fuzzy support and confidence measures (Kuok et al. 1998), (7) using fuzzy techniques for
determining linguistic terms or domain partitions (Fu et al. 1998; Vazirgiannis 1998), and
(8) using fuzzy techniques to determine rule interestingness (Au et al. 1997; Au et al.
1998; Au et al. 1999; Au et al. 2003).
To our knowledge, no research has ever applied fuzzy techniques to deal with time
intervals in time-interval sequential patterns. We, therefore, extend the original research
of Chen, Chiang and Ko so that fuzzy time-interval sequential patterns can be discovered
from databases. Some linguistic terms, such as Long, Middle, and Short, will be provided
to represent time-intervals. And, a fuzzy time-interval sequential pattern may have a form
like:
Having bought a laser printer, a customer returns to buy a scanner in a Short
period and then a CD burner in a Long period.
The rest of this paper is organized as follows. Section 2 formally defines the
problem and the fuzzy time-interval sequential pattern. Thereafter, Section 3 develops
an algorithm to find fuzzy time-interval sequential patterns, which is developed by
modifying the traditional Apriori algorithm. Section 4 shows the performance of the
algorithm. Conclusions are finally drawn in Section 5.

2. Problem Definition
As done in the previous research of Chen, Chiang and Ko (Chen et al. 2003), we
represent a sequence in the following way.
Definition 1. A sequence s is represented as $((a_1, t_1), (a_2, t_2), (a_3, t_3), \ldots, (a_n, t_n))$, where
$a_j$ is an item and $t_j$ stands for the time at which $a_j$ occurs, $1 \le j \le n$, and $t_{j-1} \le t_j$ for $2 \le j \le n$.
In the sequence, if items occur at the same time, they are ordered alphabetically.
From the time tags attached to the items in sequence s, we can compute the time
interval values as $ti_j = |t_{j+1} - t_j|$, where $j = 1, 2, \ldots, n-1$. For example, if we have a sequence
s as ((a, 1), (b, 4), (e, 29)), then its time interval values are 3 and 25. Suppose we have the
set $LT = \{lt_j \mid j = 1, 2, \ldots, l\}$ of linguistic terms. Then we use $\mu_{lt_j}(ti_i)$ to denote the
membership degree of time-interval value $ti_i$ to linguistic term $lt_j$.
Two approaches have been used to determine linguistic terms and fuzzy membership
functions (Medasani et al. 1998). The first approach relies on domain experts to specify
the functions based on their background knowledge and requirements. The second
approach assumes that the functions are obtained by a preprocessing phase that learns the
functions from the data, such as learning by neural-network (Lin et al. 1991), by genetic
algorithm (Karr et al. 1993), by clustering method (Fu et al. 1998), and by entropy
measure (Ross 1995). Therefore, a complete process in fuzzy mining may contain two
phases, where the first phase learns fuzzy functions from data and the second phase
discovers patterns according to the fuzzy functions learned from the first phase.
Interestingly but not surprisingly, almost all of the existing papers in fuzzy mining only
deal with the second phase by assuming that the fuzzy functions are given, because this
can simplify the presentation of the paper and enable us to focus on the design of mining
algorithms. Due to these reasons, we adopt the same assumption that the fuzzy functions
are given.
Example 1. Suppose we want to represent a time interval by using three linguistic terms:
Short(S), Middle(M), and Long(L). Their membership functions can be represented as
follows.

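$$
\mu_{Short}(ti_j) =
\begin{cases}
1, & ti_j \le 2 \\
\dfrac{15 - ti_j}{13}, & 2 < ti_j < 15 \\
0, & ti_j \ge 15
\end{cases}
$$

$$
\mu_{Middle}(ti_j) =
\begin{cases}
0, & \text{either } ti_j \le 2 \text{ or } ti_j \ge 28 \\
\dfrac{ti_j - 2}{13}, & 2 < ti_j \le 15 \\
\dfrac{28 - ti_j}{13}, & 15 < ti_j < 28
\end{cases}
$$

$$
\mu_{Long}(ti_j) =
\begin{cases}
0, & ti_j \le 15 \\
\dfrac{ti_j - 15}{13}, & 15 < ti_j < 28 \\
1, & ti_j \ge 28
\end{cases}
$$

Fig. 1. The fuzzy membership functions for time-interval concept.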
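The membership functions of Fig. 1 can be sketched in a few lines of Python. This is only an illustration: the breakpoints 2, 15 and 28 come from Example 1, while the function names and the `fuzzify` helper are assumptions introduced here.

```python
def short(ti: float) -> float:
    """Membership degree of time-interval value ti in Short."""
    if ti <= 2:
        return 1.0
    if ti < 15:
        return (15 - ti) / 13
    return 0.0


def middle(ti: float) -> float:
    """Membership degree of time-interval value ti in Middle."""
    if ti <= 2 or ti >= 28:
        return 0.0
    if ti <= 15:
        return (ti - 2) / 13
    return (28 - ti) / 13


def long_(ti: float) -> float:
    """Membership degree of time-interval value ti in Long."""
    if ti <= 15:
        return 0.0
    if ti < 28:
        return (ti - 15) / 13
    return 1.0


# Hypothetical lookup table from linguistic term to membership function.
MEMBERSHIP = {"Short": short, "Middle": middle, "Long": long_}


def fuzzify(ti: float) -> dict:
    """Return the membership degree of ti in every term, rounded to 2 digits."""
    return {term: round(mu(ti), 2) for term, mu in MEMBERSHIP.items()}


print(fuzzify(3))
print(fuzzify(25))
```

For the time-interval values 3 and 25 the sketch yields the same degrees as those quoted in the text for the sequence ((a, 1), (b, 4), (e, 29)).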


By applying the fuzzy functions above, we find that the time-interval value 3 is
0.92/Short + 0.08/Middle + 0.0/Long and the time-interval value 25 is 0.0/Short +
0.23/Middle + 0.77/Long. According to the linguistic terms and the membership functions,
we can define the fuzzy time-interval sequence as follows.
Definition 2. Let $I = \{i_1, i_2, \ldots, i_m\}$ be the set of all items and $LT = \{lt_j \mid j = 1, 2, \ldots, l\}$ be
the set of all linguistic terms. A sequence $\alpha = (b_1, lg_1, b_2, lg_2, \ldots, b_{r-1}, lg_{r-1}, b_r)$ is a fuzzy
time-interval sequence if $b_i \in I$ for $1 \le i \le r$ and $lg_i \in LT$ for $1 \le i \le r-1$.


Definition 3. Let $s = ((a_1, t_1), (a_2, t_2), (a_3, t_3), \ldots, (a_n, t_n))$ be a sequence and $\alpha = (b_1, lg_1, b_2,
lg_2, \ldots, b_{r-1}, lg_{r-1}, b_r)$ be a fuzzy time-interval sequence, where $r \ge 2$. Let $\mu_{lg_i}(t)$ denote the
membership degree of time-interval value t to linguistic term $lg_i$. Suppose there are K
lists of indexes in s, denoted as $1 \le w_{k,1} < w_{k,2} < \cdots < w_{k,r} \le n$ for k=1 to K, each of which
satisfies the condition of $b_1 = a_{w_{k,1}}$, $b_2 = a_{w_{k,2}}$, ..., and $b_r = a_{w_{k,r}}$. Then we say that $\alpha$ is
contained in s with degree $\gamma$, or that $\alpha$ is a fuzzy time-interval subsequence of s with
degree $\gamma$, iff the following conditions hold:
(1) $ti_{w_{k,i}} = |t_{w_{k,i+1}} - t_{w_{k,i}}|$ for $i = 1, 2, \ldots, r-1$ and $k = 1, 2, \ldots, K$;
(2) $\gamma = \max_{1 \le k \le K} \min_{1 \le i \le r-1} \{\mu_{lg_i}(ti_{w_{k,i}})\}$.

Definition 3 does not cover the situation
of r=1, where the fuzzy time-interval sequence degenerates into a crisp sequence
containing a single item. To make the definition complete, we add the following
amendment.
Definition 4. When a fuzzy time-interval sequence $\alpha$ contains only a single item, it can
be represented as $\alpha = (b_1)$, where $b_1 \in I$. In such a case, we say that $\alpha$ is contained in s with
degree 1 if there exists an integer j, where $1 \le j \le n$, such that $b_1 = a_j$.


The total number of items in a fuzzy time-interval sequence is referred to as the
length of the sequence. A fuzzy time-interval sequence whose length is k is referred to as
a fuzzy k-time-interval sequence.
Example 2. Suppose we are given a sequence s=((a, 4), (d, 5), (d, 10), (e, 28)) and a
fuzzy time-interval sequence $\alpha$=(a, Short, d, Middle, e). There are two ways that we
can match $\alpha$: one is ((a, 4), (d, 5), (e, 28)) and the other is ((a, 4), (d, 10), (e, 28)). For the
first case, we have the degree as min{$\mu_{Short}(1)$, $\mu_{Middle}(23)$} = min{1, 5/13} = 0.385. The
second case has the degree as min{$\mu_{Short}(6)$, $\mu_{Middle}(18)$} = min{9/13, 10/13} = 0.692.
Consequently, $\alpha$ is contained in s with degree max{0.385, 0.692} = 0.692.
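The max-min rule of Definition 3 can be checked on Example 2 with a brute-force sketch. It enumerates every increasing index list of the right length, which mirrors the definition but is not meant to be efficient. The tuple encoding of patterns and the names `MU` and `degree` are assumptions made for illustration.

```python
from itertools import combinations

# Membership functions of Example 1 (breakpoints 2, 15 and 28).
MU = {
    "Short": lambda t: 1.0 if t <= 2 else (15 - t) / 13 if t < 15 else 0.0,
    "Middle": lambda t: 0.0 if t <= 2 or t >= 28
    else (t - 2) / 13 if t <= 15 else (28 - t) / 13,
    "Long": lambda t: 0.0 if t <= 15 else (t - 15) / 13 if t < 28 else 1.0,
}


def degree(pattern, seq, mu):
    """Degree to which a fuzzy time-interval pattern is contained in seq.

    pattern alternates items and linguistic terms, e.g. ("a", "Short", "d");
    seq is a list of (item, time) pairs ordered by time.
    """
    items, terms = pattern[0::2], pattern[1::2]
    if len(items) == 1:  # Definition 4: a single item is contained with degree 1
        return 1.0 if any(a == items[0] for a, _ in seq) else 0.0
    best = 0.0
    # Every increasing index list w_1 < ... < w_r is one possible matching.
    for idx in combinations(range(len(seq)), len(items)):
        if all(seq[w][0] == b for w, b in zip(idx, items)):
            gaps = [abs(seq[idx[i + 1]][1] - seq[idx[i]][1])
                    for i in range(len(terms))]
            # min over the intervals of one matching, max over all matchings
            best = max(best, min(mu[lg](g) for lg, g in zip(terms, gaps)))
    return best


s = [("a", 4), ("d", 5), ("d", 10), ("e", 28)]
alpha = ("a", "Short", "d", "Middle", "e")
print(round(degree(alpha, s, MU), 3))
```

The two matchings found by the loop are exactly the two listed in Example 2, and the larger of their degrees, 9/13, is returned.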
For ease of reference, let $\gamma(\alpha, s)$ represent the degree to which a fuzzy time-interval
sequence $\alpha$ is contained in sequence s, which is determined according to Definitions 2, 3
and 4. A transaction is represented by <sid, s>, where sid is the identifier of this
transaction and s is a sequence. A sequence database S is formed by a set of transactions.
For a given fuzzy time-interval sequence $\alpha$, its support in database S is defined as
follows.
Definition 5. $support_S(\alpha) = \sum_{\langle sid, s \rangle \in S} \gamma(\alpha, s) \,/\, |S|$
A fuzzy time-interval sequence $\alpha$ is called a fuzzy time-interval sequential pattern or
a frequent fuzzy time-interval sequence if its support in S is greater than or equal to the
user-specified minimum support (called min_sup). A fuzzy time-interval sequential
pattern with length k is referred to as a fuzzy k-time-interval sequential pattern.
Given a sequence database and min_sup, the goal of fuzzy time-interval sequential
pattern mining is to determine all the fuzzy time-interval subsequences in the sequence
database whose supports are greater than or equal to min_sup.

Sid    Sequence
10     ( (a, 1), (b, 4), (e, 29) )
20     ( (d, 1), (a, 2), (d, 24) )
30     ( (b, 1), (a, 11), (e, 28) )
40     ( (f, 1), (b, 5), (c, 19) )
50     ( (a, 4), (b, 5), (d, 10), (e, 28) )
60     ( (a, 0), (b, 5), (e, 30) )
70     ( (j, 2), (a, 17), (h, 17) )
80     ( (c, 3), (i, 10), (f, 18) )
90     ( (h, 4), (a, 10), (b, 21) )
100    ( (g, 0), (a, 0), (b, 3), (e, 30) )

Fig. 2. A sequence database.


Example 3. Consider the sequence database shown in Fig. 2 with the linguistic terms
defined in Example 1. If min_sup=0.3, then we can find fuzzy time-interval sequential
pattern (a, Short, b, Long, e) with support 0.308 in the database. Four transactions
(Sid=10, 50, 60 and 100) contribute to this pattern, whose degrees are respectively 0.77,
0.62, 0.77 and 0.92. According to Definition 5, the support of this pattern is
(0.77+0.62+0.77+0.92)/10=0.308.
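The support computation of Example 3 can be reproduced with the following sketch, which evaluates Definition 5 over the database of Fig. 2. The recursive `degree` function is one possible way to compute the max-min degree of Definitions 3 and 4; it and the names `MU`, `DB` and `support` are assumptions for illustration, not part of the original method description.

```python
# Membership functions of Example 1 (breakpoints 2, 15 and 28).
MU = {
    "Short": lambda t: 1.0 if t <= 2 else (15 - t) / 13 if t < 15 else 0.0,
    "Middle": lambda t: 0.0 if t <= 2 or t >= 28
    else (t - 2) / 13 if t <= 15 else (28 - t) / 13,
    "Long": lambda t: 0.0 if t <= 15 else (t - 15) / 13 if t < 28 else 1.0,
}

# The sequence database of Fig. 2, in Sid order 10, 20, ..., 100.
DB = [
    [("a", 1), ("b", 4), ("e", 29)],
    [("d", 1), ("a", 2), ("d", 24)],
    [("b", 1), ("a", 11), ("e", 28)],
    [("f", 1), ("b", 5), ("c", 19)],
    [("a", 4), ("b", 5), ("d", 10), ("e", 28)],
    [("a", 0), ("b", 5), ("e", 30)],
    [("j", 2), ("a", 17), ("h", 17)],
    [("c", 3), ("i", 10), ("f", 18)],
    [("h", 4), ("a", 10), ("b", 21)],
    [("g", 0), ("a", 0), ("b", 3), ("e", 30)],
]


def degree(pattern, seq):
    """Max over matchings of the min membership degree along the matching."""
    items, terms = pattern[0::2], pattern[1::2]

    def extend(i, pos):
        # Best degree for matching items[i:], given items[i-1] matched at pos.
        if i == len(items):
            return 1.0
        best = 0.0
        for q in range(pos + 1, len(seq)):
            if seq[q][0] == items[i]:
                d = MU[terms[i - 1]](abs(seq[q][1] - seq[pos][1]))
                if d > best:  # a smaller d can never improve the maximum
                    best = max(best, min(d, extend(i + 1, q)))
        return best

    starts = (p for p in range(len(seq)) if seq[p][0] == items[0])
    return max((extend(1, p) for p in starts), default=0.0)


def support(pattern, db):
    """Definition 5: sum of containment degrees divided by |S|."""
    return sum(degree(pattern, s) for s in db) / len(db)


pattern = ("a", "Short", "b", "Long", "e")
print(round(support(pattern, DB), 3))
```

Only the transactions with Sid 10, 50, 60 and 100 contribute a nonzero degree, so the sum is divided by 10 exactly as in the example.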

3. Algorithms for Mining Fuzzy Time-interval Sequential Patterns


The goal of this section is to develop an algorithm for mining fuzzy time-interval
sequential patterns from databases. The algorithm is developed by modifying the
well-known Apriori algorithm. We introduce it in the following.
3.1. The FTI-Apriori algorithm
The Fuzzy Time Interval (FTI)-Apriori Algorithm is developed by modifying the
well-known Apriori algorithm. Basically, two phases are repeatedly executed to generate
the patterns. The first phase generates candidate sequences of length k, denoted by Ck,
from the frequent sequences of length k-1, denoted by Lk-1. So, each candidate sequence
generated in the current cycle will have one more item and one more linguistic term than
the frequent sequences in the preceding cycle. After finding the set of candidate
sequences, the second phase scans the database to determine the support of each
candidate pattern, and the resulting set comprises all frequent sequences of length k.
In the following, we discuss how to execute the first phase for different values of k:
(1) For k=1: The set of candidate patterns of length 1, $C_1$, is generated by listing all
distinct items in the database.
(2) For k=2: Traditionally, $C_2$ is obtained by directly joining $L_1$ with $L_1$. However,
since the first item and the second item in $C_2$, say b and c, may have various fuzzy
time-interval relations, pairs for all possible fuzzy time-interval relations must be
generated. Let us consider an explanatory example. Suppose that (b) and (c) belong
to $L_1$ and $LT = \{lt_1, lt_2, lt_3, lt_4, lt_5\}$. Then there are 20 candidate fuzzy
time-interval sequences in $C_2$ in total (2 first items, 5 linguistic terms and 2 second
items). Some of them are $(b, lt_1, b)$, $(b, lt_3, b)$, $(b, lt_2, c)$, $(c, lt_2, b)$
and $(c, lt_2, c)$. In short, $C_2$ can be generated as $L_1 * LT * L_1$, where $*$ denotes
join.
(3) For k>2: Let $(b_1, lg_1, b_2, lg_2, \ldots, lg_{k-1}, b_k)$ be a fuzzy k-time-interval sequence in $L_k$.
Then, the fuzzy (k-1)-time-interval sequences $(b_1, lg_1, b_2, lg_2, \ldots, lg_{k-2}, b_{k-1})$ and
$(b_2, lg_2, \ldots, b_{k-1}, lg_{k-1}, b_k)$ must also be frequent, because the support of
$(b_1, lg_1, b_2, lg_2, \ldots, lg_{k-1}, b_k)$ must be no larger than the supports of the other two. (For the proof, please
refer to Theorem A.1 in Appendix). Therefore, if the time-interval sequences
$(b_1, lg_1, b_2, lg_2, \ldots, lg_{k-2}, b_{k-1})$ and $(b_2, lg_2, \ldots, b_{k-1}, lg_{k-1}, b_k)$ exist in $L_{k-1}$, then
$(b_1, lg_1, b_2, lg_2, \ldots, lg_{k-1}, b_k)$ must exist in $C_k$. All the time-interval sequences in $C_k$ can be
generated by joining the time-interval sequences in $L_{k-1}$ this way.
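The candidate generation rules for k=2 and k>2 can be sketched as follows. The paper names a function fuzzy_apriori_gen but does not list its body, so the join below is an assumed reading of rule (3): two frequent sequences are joined when the first one without its leading item and term equals the second one without its trailing term and item. The function `gen_c2` and the tuple encoding are likewise illustrative.

```python
from itertools import product


def gen_c2(l1_items, terms):
    """C2 = L1 * LT * L1: every ordered item pair with every linguistic term."""
    return [(b, lg, c) for b, c in product(l1_items, repeat=2) for lg in terms]


def fuzzy_apriori_gen(l_prev):
    """Join frequent (k-1)-sequences into candidate k-sequences.

    p = (b1, lg1, ..., b_{k-1}) and q = (b2, lg2, ..., b_k) are joined when
    p without its first item and term equals q without its last term and item.
    """
    prev = set(l_prev)
    return sorted(p + q[-2:] for p in prev for q in prev if p[2:] == q[:-2])


# Rule (2): two frequent items and five linguistic terms give 20 candidates.
c2 = gen_c2(["b", "c"], ["lt1", "lt2", "lt3", "lt4", "lt5"])
print(len(c2))

# Rule (3): joining three frequent 2-sequences into candidate 3-sequences.
l2 = [("a", "Short", "b"), ("a", "Long", "e"), ("b", "Long", "e")]
print(fuzzy_apriori_gen(l2))
```

Because the join only requires the two overlapping subsequences to be frequent, it may still produce candidates that turn out to be infrequent; the support counting phase filters those out.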
Next, we discuss how to execute the second phase, i.e., how to determine the supports
of all patterns in Ck. To this end, a tree structure, called the fuzzy candidate tree, is used as a
basis. Basically, the candidate tree is similar to the prefix tree adopted in previous
research (Agrawal et al. 1994; Liu et al. 2003). The major difference is that the
traditional approach labels each tree branch with an item name, whereas in the new
approach two components are attached to each branch: an item name and a linguistic term.
Suppose we are given a candidate set Ck. Initially, we have an empty tree with a
single root node. Then we insert every fuzzy time-interval pattern in Ck into the tree, just
as we would build a prefix tree. After all the patterns in Ck have been inserted, the tree is
built. Next, we traverse the tree for every transaction. For a given transaction, after
finishing the traversal we can determine the degrees to which the patterns in the tree are
contained in that transaction. Finally, after the tree has been traversed for all transactions,
the support value of every pattern is kept in the corresponding leaf node in the tree. So,
we can determine which patterns are frequent and which are not.
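A minimal sketch of such a fuzzy candidate tree is given below. The paper omits the detailed traversal, so the `Node` class, the edge encoding as (linguistic term, item) pairs with `None` as the term of the first item, and the recursive `traverse` function are all assumptions; the sketch only illustrates how each leaf can accumulate the containment degrees of its pattern.

```python
MU = {
    "Short": lambda t: 1.0 if t <= 2 else (15 - t) / 13 if t < 15 else 0.0,
    "Middle": lambda t: 0.0 if t <= 2 or t >= 28
    else (t - 2) / 13 if t <= 15 else (28 - t) / 13,
    "Long": lambda t: 0.0 if t <= 15 else (t - 15) / 13 if t < 28 else 1.0,
}

DB = [  # the sequence database of Fig. 2
    [("a", 1), ("b", 4), ("e", 29)], [("d", 1), ("a", 2), ("d", 24)],
    [("b", 1), ("a", 11), ("e", 28)], [("f", 1), ("b", 5), ("c", 19)],
    [("a", 4), ("b", 5), ("d", 10), ("e", 28)], [("a", 0), ("b", 5), ("e", 30)],
    [("j", 2), ("a", 17), ("h", 17)], [("c", 3), ("i", 10), ("f", 18)],
    [("h", 4), ("a", 10), ("b", 21)], [("g", 0), ("a", 0), ("b", 3), ("e", 30)],
]


class Node:
    def __init__(self):
        self.children = {}   # (linguistic term or None, item) -> Node
        self.pattern = None  # set on the node where a candidate ends
        self.count = 0.0     # accumulated containment degree


def build_tree(candidates):
    """Insert every candidate like a word into a prefix tree."""
    root = Node()
    for pat in candidates:
        node = root
        edges = [(None, pat[0])] + list(zip(pat[1::2], pat[2::2]))
        for edge in edges:
            node = node.children.setdefault(edge, Node())
        node.pattern = pat
    return root


def traverse(root, seq):
    """Add to each leaf the degree to which its pattern is contained in seq."""
    best = {}  # leaf -> best degree over all matchings in this sequence

    def visit(node, pos, deg):
        if node.pattern is not None:
            best[node] = max(best.get(node, 0.0), deg)
        for (lg, item), child in node.children.items():
            for q in range(pos + 1, len(seq)):
                if seq[q][0] != item:
                    continue
                gap = 0 if lg is None else abs(seq[q][1] - seq[pos][1])
                d = deg if lg is None else min(deg, MU[lg](gap))
                if d > 0:  # branches with degree 0 cannot contribute
                    visit(child, q, d)

    visit(root, -1, 1.0)
    for leaf, d in best.items():
        leaf.count += d


def supports(root, n):
    """Collect count / n for every candidate stored in the tree."""
    out, stack = {}, [root]
    while stack:
        node = stack.pop()
        if node.pattern is not None:
            out[node.pattern] = node.count / n
        stack.extend(node.children.values())
    return out


tree = build_tree([("a", lg, "b") for lg in ("Short", "Middle", "Long")])
for seq in DB:
    traverse(tree, seq)
result = supports(tree, len(DB))
print({p: round(v, 3) for p, v in result.items()})
```

Sharing prefixes means the three candidates above match the leading item a only once per sequence position, which is the saving a prefix tree is meant to provide.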

In the following, the major steps of the FTI-Apriori algorithm are listed. For clarity,
we omit the detailed functions and steps.
Fig. 3. The FTI-Apriori algorithm.
Input: Sequence database S, minimum support min_sup, and linguistic terms LT;
Output: The complete set of fuzzy time-interval sequential patterns
Variable: c.count is the accumulated degree of fuzzy time-interval sequence c over all sequences in S
Method:
C1 = find_all_items(S);
L1 = {c ∈ C1 | (c.count / |S|) ≥ min_sup}
For each i1 ∈ L1 {
    For each i2 ∈ L1 {
        For each ltd ∈ LT {
            c = i1 * ltd * i2;
            add c to C2;
        }
    }
}
L2 = {c ∈ C2 | (c.count / |S|) ≥ min_sup}
For (k = 3; Lk-1 ≠ ∅; k++) do begin
{
    Ck = fuzzy_apriori_gen(Lk-1);
    Build the fuzzy candidate tree from Ck;
    For each sequence s ∈ S
    { Traverse the fuzzy candidate tree and accumulate the supports; }
    Lk = {c ∈ Ck | (c.count / |S|) ≥ min_sup}
}
return ∪k Lk;
Example 4. Consider the sequence database shown in Fig. 2 and assume that we set
min_sup as 0.3. C1 is generated as follows, where each number is the count of
sequences that contain the item:
(a): 8, (b): 7, (c): 2, (d): 2, (e): 5, (f): 2, (g): 1, (h): 2, (i): 1, (j): 1
Then, we have L1={a, b, e} because their supports (0.8, 0.7 and 0.5) are larger than min_sup. After that, C2
can be generated by joining L1 with LT={Short, Middle, Long}, whose membership
functions are given in Fig. 1. The supports of the a then b patterns are computed as follows:
For a then b with different linguistic terms:
Short = (0.92+1.0+0.77+0.31+0.92)/10 = 3.92/10 = 0.392
Middle = (0.08+0.0+0.23+0.69+0.08)/10 = 1.08/10 = 0.108
Long = (0.0+0.0+0.0+0.0+0.0)/10 = 0.0
Among the above three, only the pattern of a then b in Short is placed in L2
since its support is greater than min_sup. Besides this pattern, other patterns in L2 include:
a then e in Long with support 0.361, and b then e in Long with support 0.308.
After the generation of L2, the algorithm starts to produce Ck and Lk for k>2. Since
the patterns in L2 are (a, Short, b), (a, Long, e) and (b, Long, e), the candidate pattern that
we can generate for C3 is (a, Short, b, Long, e). The following computation indicates that
the support of this pattern exceeds min_sup, and thus we have L3 as {(a, Short, b, Long,
e)}.
Sid=10: min{(a, 0.92/Short, b), (b, 0.77/Long, e)} = 0.77
Sid=50: min{(a, 1.0/Short, b), (b, 0.62/Long, e)} = 0.62
Sid=60: min{(a, 0.77/Short, b), (b, 0.77/Long, e)} = 0.77
Sid=100: min{(a, 0.92/Short, b), (b, 0.92/Long, e)} = 0.92
Support= (0.77+0.62+0.77+0.92)/10=0.308
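The level-wise loop of Fig. 3 can be sketched end to end as follows. To stay short, the sketch counts supports by applying Definitions 3 to 5 directly to every candidate instead of building the fuzzy candidate tree, so it illustrates the control flow rather than the authors' implementation. The names `MU`, `DB`, `degree`, `frequent` and `fti_apriori` are assumptions introduced here.

```python
from itertools import product

MU = {
    "Short": lambda t: 1.0 if t <= 2 else (15 - t) / 13 if t < 15 else 0.0,
    "Middle": lambda t: 0.0 if t <= 2 or t >= 28
    else (t - 2) / 13 if t <= 15 else (28 - t) / 13,
    "Long": lambda t: 0.0 if t <= 15 else (t - 15) / 13 if t < 28 else 1.0,
}

DB = [  # the sequence database of Fig. 2
    [("a", 1), ("b", 4), ("e", 29)], [("d", 1), ("a", 2), ("d", 24)],
    [("b", 1), ("a", 11), ("e", 28)], [("f", 1), ("b", 5), ("c", 19)],
    [("a", 4), ("b", 5), ("d", 10), ("e", 28)], [("a", 0), ("b", 5), ("e", 30)],
    [("j", 2), ("a", 17), ("h", 17)], [("c", 3), ("i", 10), ("f", 18)],
    [("h", 4), ("a", 10), ("b", 21)], [("g", 0), ("a", 0), ("b", 3), ("e", 30)],
]


def degree(pattern, seq):
    """Max over matchings of the min membership degree (Definitions 3 and 4)."""
    items, terms = pattern[0::2], pattern[1::2]

    def extend(i, pos):
        if i == len(items):
            return 1.0
        best = 0.0
        for q in range(pos + 1, len(seq)):
            if seq[q][0] == items[i]:
                d = MU[terms[i - 1]](abs(seq[q][1] - seq[pos][1]))
                if d > best:
                    best = max(best, min(d, extend(i + 1, q)))
        return best

    starts = (p for p in range(len(seq)) if seq[p][0] == items[0])
    return max((extend(1, p) for p in starts), default=0.0)


def frequent(cands, db, min_sup):
    """Keep the candidates whose support (Definition 5) reaches min_sup."""
    out = {}
    for c in cands:
        sup = sum(degree(c, s) for s in db) / len(db)
        if sup >= min_sup:
            out[c] = sup
    return out


def fti_apriori(db, terms, min_sup):
    items = sorted({a for s in db for a, _ in s})
    level = frequent([(i,) for i in items], db, min_sup)            # L1
    result = dict(level)
    # C2 = L1 * LT * L1
    cands = [(b[0], lg, c[0]) for b, c in product(level, repeat=2) for lg in terms]
    while cands:
        level = frequent(cands, db, min_sup)                        # Lk
        result.update(level)
        # C(k+1): join sequences that overlap on all but one item and term
        cands = [p + q[-2:] for p in level for q in level if p[2:] == q[:-2]]
    return result


patterns = fti_apriori(DB, ["Short", "Middle", "Long"], 0.3)
for p in sorted(patterns, key=len):
    print(p)
```

With min_sup = 0.3 the sketch ends with the same single pattern of length 3 as Example 4, and the loop stops because no candidate of length 4 can be joined from it.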

4. Experimental Results
In this section, we perform a simulation study of the FTI-Apriori algorithm. It is
implemented in Java (Sun J2SDK 1.4.1_02) and tested on a PC with two Intel
Pentium III 933 processors and 1GB main memory under the Windows 2000 operating
system. Neither multithreading nor parallel computing is used in
our implementation.
Synthetic datasets are generated by applying the well-known synthetic data generation
algorithm of Agrawal and Srikant (Agrawal et al. 1995). Basically, each transaction is a
sequence of itemsets. However, we extend the transaction data so that the items in
different itemsets have different time values and those in the same itemset have the
same time value. A value w is drawn from a Poisson distribution with mean TI for each
customer. The drawn value w represents the average time interval between successive
itemsets in the sequence of this particular customer. After that, we determine the
intervals between successive itemsets of this customer by repeatedly drawing values
from a Poisson distribution with mean w.
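The two-stage Poisson draw described above can be sketched as follows. The paper does not give the generator's code, so this is only an illustration: the sampler uses Knuth's multiplication method from the standard library's uniform generator, the first itemset is assumed to occur at time 0, and the names `poisson` and `assign_times` are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Draw from a Poisson distribution with mean lam (Knuth's method).

    Adequate for small means such as TI = 15; exp(-lam) underflows for
    very large means.
    """
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def assign_times(num_itemsets, mean_ti, rng):
    """Assign a time value to each itemset of one customer sequence."""
    # Stage 1: this customer's own average interval w, drawn with mean TI.
    w = poisson(mean_ti, rng)
    times, t = [], 0  # assumption: the first itemset occurs at time 0
    for _ in range(num_itemsets):
        times.append(t)
        # Stage 2: the gap to the next itemset, drawn with mean w.
        t += poisson(w, rng)
    return times


rng = random.Random(42)  # fixed seed so that runs are repeatable
print(assign_times(5, 15, rng))
```

Items inside one itemset would all receive that itemset's time value, which matches the requirement that items in the same itemset share a time while different itemsets differ whenever the drawn gap is positive.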
Table I lists the parameters used in the simulation; the first eight parameters are the
classical ones used in previous research, whereas the last parameter, TI, is a new parameter
created for the problem considered here. In the simulation, some parameters are fixed:
N=10000, NS=5000, NI=25000, TI=15 and |D|=250000.
Table I
Parameters
|D|    Number of customers
|C|    Average number of transactions per customer
|T|    Average number of items per transaction
|S|    Average length of maximal potentially large sequences
|I|    Average size of itemsets in maximal potentially large sequences
NS     Number of maximal potentially large sequences
NI     Number of maximal potentially large itemsets
N      Number of items
TI     Average length of time intervals

Table II
Parameters
Name                 |C|    |T|    |S|    |I|
C10-T2.5-S4-I1.25    10     2.5    4      1.25
C10-T5-S4-I1.25      10     5      4      1.25
C10-T5-S4-I2.5       10     5      4      2.5
C20-T2.5-S4-I1.25    20     2.5    4      1.25
C20-T2.5-S4-I2.5     20     2.5    4      2.5
C20-T2.5-S8-I1.25    20     2.5    8      1.25

The experiment measures the run times of the FTI-Apriori algorithm for
different minimum supports. It is carried out on the six data
sets shown in Table II, where the minimum support threshold is varied from 3.0% to
1.5%. Fig. 4 summarizes the results.

[Figure: six panels, one per data set (C10-T2.5-S4-I1.25, C10-T5-S4-I1.25, C10-T5-S4-I2.5, C20-T2.5-S4-I1.25, C20-T2.5-S4-I2.5, C20-T2.5-S8-I2.5), each plotting Runtime (Second) of FTI-Apriori against Minimum Support.]

Fig. 4. Run times for the six data sets.

5. Conclusion
Sequential-pattern mining is useful in discovering customer purchasing patterns
over time from transactional databases. Since the method was first proposed by
Agrawal and Srikant (Agrawal et al. 1995) in 1995, it has become an established and active
research area. The existing methods, however, do not discover the time intervals
between successive items in the pattern. In view of this problem, Chen, Chiang and Ko
proposed a novel method to discover the time-interval information between successive
items in the pattern. With this additional information, we can know when the next
purchase will happen after the previous purchase was made.
Although time-interval sequential patterns can provide more information than those
without time intervals, the approach may cause the sharp boundary problem. That is,
when a time interval is near the boundary of two adjacent ranges, we either ignore or
overemphasize it. Therefore, this paper uses the concept of fuzzy sets to extend the
original research of Chen, Chiang and Ko so that fuzzy time-interval sequential patterns
can be discovered from databases. Some linguistic terms, such as Long, Middle and Short,
are provided to represent time intervals.
Fuzzy time-interval sequential pattern mining represents a new and promising
research area in data mining. The results of this paper can be extended by considering
time constraints, spatial constraints, fuzzy time-hierarchy and other kinds of time-related
knowledge. Furthermore, it is important to explore how different fuzzy membership
functions may influence the result of mining.

References
Agrawal, R., and Srikant, R. Fast Algorithms for Mining Association Rules, in
Proceedings of 1994 International Conference Very Large Data Bases, 1994, pp.
487-499.
Agrawal, R., and Srikant, R. Mining Sequential Patterns, in Proceedings of 1995
International Conference Data Engineering, 1995, pp. 3-14.

Au, W. H., and Chan, K. C. C. Mining fuzzy association rules, in Proc. 6th Int.
Conference Information Knowledge Management, Las Vegas, NV, 1997, pp.
209-215.
Au, W. H., and Chan, K. C. C. An effective algorithm for discovering fuzzy rules in
relational databases, in Proceedings IEEE International Conference Fuzzy Systems,
vol. II, 1998, pp. 1314-1319.
Au, W. H., and Chan, K. C. C. FARM: A data mining system for discovering fuzzy
association rules, in Proceedings FUZZ-IEEE99, vol. 3, 1999, pp. 22-25.
Au, W. H., and Chan, K. C. C. Mining fuzzy association rules in a bank-account
database, IEEE Transactions on Fuzzy Systems (11), 2003, pp. 238-248.
Chen, G., and Wei, Q. Fuzzy association rules and the extended mining algorithms,
Information Sciences (147), 2002, pp. 201-228.
Chen, Y. L., Chiang, M. C., and Ko, M. T. Discovering Time-interval Sequential
Patterns in Sequence Databases, Expert Systems with Applications (25:3), 2003, pp.
343-354.
Fu, A. W. C., Wong, M. H., Sze, S. C., Wong, W. C., Wong, W. L., and Yu, W. K.
Finding fuzzy sets for the mining of fuzzy association rules for numerical
attributes, in Proceedings International Symposium Intelligent Data Engineering
Learning (IDEAL98), Hong Kong, 1998, pp. 263-268.
Hong, T. P., Kuo, C. S., and Chi, S. C. Mining association rules from quantitative data,
Intelligent Data Analysis (3), 1999, pp. 363-376.
Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., and Hsu, M. C. FreeSpan:
Frequent Pattern-projected Sequential Pattern Mining, in Proceedings of 2000
International Conference on Knowledge Discovery and Data Mining, 2000, pp.
355-359.
Karr, C. L., and Gentry, E. J. Fuzzy control of pH using genetic algorithms, IEEE
Transactions on Fuzzy Systems (1:1), 1993, pp. 46-53.
Kuok, C. M., Fu, A., and Wong, M. H. Mining fuzzy association rules in databases,
SIGMOD Record (27:1), 1998, pp. 41-46.
Lin, C. T., and Lee, C. S. G. Neural network based fuzzy logic control and decision
systems, IEEE Transactions on Computers (40:12), 1991, pp. 1320-1336.
Lee, J. H., and Kwang, H. L. An extension of association rules using fuzzy sets,
presented at the IFSA97, Prague, Czech Republic, 1997.
Lee, J. W. T. An ordinal framework for data mining of fuzzy rules, in FUZZ IEEE
2000, San Antonio, TX, 2000, pp. 399-404.
Liu, G., Lu, H., Xu, Y., and Yu, J. X. Ascending frequency ordered prefix-tree: efficient
mining of frequent patterns, in Proceedings of the Eighth International Conference on
Database Systems for Advanced Applications, 2003, pp. 65-72.

Medasani, S., Kim, J., and Krishnapuram, R. An overview of membership function
generation techniques for pattern recognition, International Journal of Approximate
Reasoning (19), 1998, pp. 391-417.
Pei, J., Han, J., Mortazavi-Asl, B., and Zhu, H. Mining access patterns efficiently from
web logs, in Proceedings of 2000 Pacific-Asia Conference on Knowledge Discovery
and Data Mining, 2000, pp. 396-407.
Ross, T. J. Fuzzy Logic with Engineering Applications, McGraw-Hill, Inc. 1995.
Vazirgiannis, M. A classification and relationship extraction scheme for relational
databases based on fuzzy logic, in Proceedings Research Development Knowledge
Discovery Data Mining, Melbourne, Australia, 1998, pp. 414-416.
Yue, J. S., Tsang, E., Yenng, D., and Daming, S. Mining fuzzy association rules with
weighted items, in Proc. IEEE International Conference Systems, Man, Cybernetics,
Nashville, TN, 2000, pp. 1906-1911.
Zhang, W. Mining fuzzy quantitative association rules, in Proceedings 11th
International Conference Tools Artificial Intelligence, Chicago, IL, 1999, pp. 99-102.
