
2016 International Symposium on Computer, Consumer and Control

Fuzzy C-means with Spatiotemporal Constraints


Xu Gao, Fusheng Yu
School of Mathematical Sciences, Beijing Normal University,
Laboratory of Mathematics and Complex Systems, Ministry of Education, Beijing 100875, China
yufusheng@bnu.edu.cn

Abstract— Cluster analysis divides data into meaningful clusters based on the similarity of the data, making it possible to observe the data at a higher level. Spatiotemporal data mining has been a hot research topic in recent years, yet research on spatiotemporal clustering analysis is still at a preliminary stage. Unlike the existing methods for clustering spatiotemporal data, this article proposes a novel fuzzy clustering method, fuzzy c-means with spatiotemporal constraints, which imposes both spatial and temporal constraints and thus considers the degree of closeness in both space and time. Using the Lagrange multiplier method, the iterative formulas for the cluster centers and the partition matrix are derived. The proposed method is verified using both artificially generated data and real data. Experimental results show that the algorithm treats trajectory data effectively and is also applicable to spatiotemporal data with transition points or mixed points.

Keywords—spatiotemporal data, fuzzy clustering, trajectory, fuzzy c-means, spatiotemporal constraints.

I. INTRODUCTION

In recent years, with the development of information technology, we have entered an era of big data, and the volume of spatiotemporal data is increasing explosively. Temporal and spatial properties are key issues when analysing data from business, management, scientific research and so on. Therefore, developing an approach for analysing spatiotemporal data is meaningful.

Cluster analysis is a method of dividing data into meaningful clusters based on the similarity of the data, making it possible to observe the data at a higher level. Clustering spatiotemporal data has become a focus because it can reveal the essential features and developing trends of the data. It has important application value in the analysis of global climate change [1], public health security [2], seismic monitoring [3,4] and so on. However, there is at present no universal approach to spatiotemporal data clustering.

The existing methods of spatiotemporal data clustering include the space-time permutation statistics method [5], the density-based method [3,4,6], and the spatiotemporal-distance-based method [7,8,9]. The space-time permutation statistics method adopts a permutation window, usually a cylinder with the spatial distance as radius and the temporal distance as height, and scans around every data point with the help of a statistical identification method. This approach requires an assumption about the probability distribution of the data, and the result is influenced by the window chosen. The density-based method extends the classical DBSCAN by adding a time dimension. The spatiotemporal-distance-based method works by defining a space-time coupled distance, but it is difficult to fuse spatial and temporal properties in this way. Fuzzy c-means (FCM) is a classic method of data clustering [10], based on which Pedrycz proposed a fuzzy clustering method with a spatial constraint [11]; he also proposed a fuzzy clustering method for spatiotemporal data with separable spatial and temporal components [12]. Yujie Li and Huimin Lu proposed a robust color image segmentation method based on weighting fuzzy c-means clustering, introducing a method for determining the number of clusters for image segmentation, but the dataset has no temporal properties [13,14]. In this paper, we propose FCM with spatiotemporal constraints, which is applicable to all kinds of data with spatiotemporal information. Experimental results show its feasibility in dealing with trajectory data, providing an approach for the granulation of trajectories.

The structure of this paper is as follows: Section II presents the preliminaries; Section III illustrates the type of data to be processed, proposes the objective function of FCM with spatiotemporal constraints, derives the iterative formulas using the Lagrange multiplier method, and introduces the algorithm of FCM with spatiotemporal constraints. The experimental results are shown in Section IV, and Section V gives the conclusions.

II. PRELIMINARY

In this section, the standard FCM algorithm is briefly introduced.

Given a dataset $X = \{x_1, x_2, \dots, x_n\}$, a fuzzy matrix $A = (a_{ij})_{c \times n}$ is called a partition matrix of $X$ if $A$ satisfies:

$$\sum_{i=1}^{c} a_{ij} = 1, \quad j = 1, 2, \dots, n;$$

$$0 < \sum_{j=1}^{n} a_{ij} < n, \quad i = 1, 2, \dots, c.$$

A partition matrix denotes a fuzzy partition of $X$ into $c$ clusters. In the partition matrix, $a_{ij}$ denotes the membership degree of $x_j$ to the $i$-th cluster. The center of the $i$-th cluster is denoted $v_i$, and $V = (v_1, v_2, \dots, v_c)$.
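The two conditions that define a partition matrix can be checked numerically. The following is a minimal sketch, assuming the matrix is stored as a NumPy array of shape (c, n); the function name `is_partition_matrix` and the tolerance `tol` are illustrative and do not come from the paper.

```python
import numpy as np


def is_partition_matrix(A, tol=1e-9):
    """Return True if A (shape c x n) satisfies the partition-matrix conditions."""
    A = np.asarray(A, dtype=float)
    c, n = A.shape
    # Every membership degree must lie in [0, 1] (up to the tolerance).
    in_unit_interval = np.all((A >= -tol) & (A <= 1.0 + tol))
    # First condition: the memberships of each point x_j sum to 1 over the clusters.
    columns_sum_to_one = np.allclose(A.sum(axis=0), 1.0, atol=tol)
    # Second condition: no cluster is empty and no cluster absorbs all n points.
    row_sums = A.sum(axis=1)
    rows_in_open_interval = np.all((row_sums > tol) & (row_sums < n - tol))
    return bool(in_unit_interval and columns_sum_to_one and rows_in_open_interval)


# Hypothetical example matrices with c = 2 clusters and n = 3 points.
valid = np.array([[0.9, 0.2, 0.5],
                  [0.1, 0.8, 0.5]])
degenerate = np.array([[1.0, 1.0, 1.0],
                       [0.0, 0.0, 0.0]])
print(is_partition_matrix(valid))       # True
print(is_partition_matrix(degenerate))  # False
```

The second matrix fails only the row condition: its columns sum to 1, but one cluster holds every point and the other is empty.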

978-1-5090-3071-2/16 $31.00 © 2016 IEEE
DOI 10.1109/IS3C.2016.94
Let $J(A, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} a_{ij}^2 \|x_j - v_i\|^2$; then the optimal partition and cluster centers can be obtained by minimizing $J(A, V)$ using the following formulas:

$$v_i = \frac{\sum_{j=1}^{n} (a_{ij})^2 x_j}{\sum_{j=1}^{n} (a_{ij})^2}, \qquad a_{ij} = \left[ \sum_{l=1}^{c} \left( \frac{\|x_j - v_i\|}{\|x_j - v_l\|} \right)^2 \right]^{-1}$$

III. FCM WITH SPATIOTEMPORAL CONSTRAINTS

In this section, the novel method, FCM with spatiotemporal constraints, is introduced. The type of data to be processed is illustrated in subsection A, and the objective function of our method is proposed in subsection B. Next, the iterative formulas are derived in subsection C. Finally, the iterative algorithm is introduced in subsection D.

A. Type of data to be processed

Spatial and temporal properties are two key features of spatiotemporal data which distinguish it from other types of data. There are mainly five types of spatiotemporal data [13]: ST events, geo-referenced series, geo-referenced time series, moving points and trajectories.

In this paper, a single trajectory related to time and position is processed. Let $\{X_j\}_{j=1}^{N}$ be the dataset to be clustered, where $X_j = (P_j, x_j)$, $j = 1, 2, \dots, N$. The index $j$ denotes the temporal feature, and the time interval between two adjacent elements is equal. $P_j$ is the spatial part of $X_j$, and $x_j \in R^m$ denotes the other features of $X_j$, for instance, the maximal wind speed of a typhoon.

B. Objective function

Spatial and temporal properties are two important features of spatiotemporal data, and compactness in both is expected within each cluster. Drawing on the idea of supervised clustering, an improved FCM method that adds spatial and temporal constraints to the objective function is proposed as shown below:

$$
\begin{aligned}
Q = {} & \alpha \sum_{i=1}^{c} \sum_{j=1}^{N} u_{ij}^2 d_{ij}^2 + \beta \sum_{i=1}^{c} \sum_{k=1}^{N} \sum_{l=1}^{N} (u_{ik} - u_{il})^2 \psi(k, l) d_{ik}^2 \\
& + \gamma \sum_{i=1}^{c} \sum_{k=1}^{N} \sum_{l=1}^{N} (u_{ik} - u_{il})^2 \varphi(P_k, P_l) d_{ik}^2
\end{aligned}
\tag{1}
$$

where $U = (u_{ij})_{c \times N}$, $u_{ij} \in [0, 1]$, denotes the fuzzy membership degree of $X_j$ to the $i$-th cluster, satisfying $\sum_{i=1}^{c} u_{ij} = 1$, $j = 1, 2, \dots, N$. $d_{ij} = \|x_j - v_i\|$ is the distance between $X_j$ and the $i$-th cluster center, which does not include spatial or temporal information. $\psi(k, l)$ is the temporal neighbourhood function, and $\varphi(P_k, P_l)$ is the spatial neighbourhood function. Here we choose the uniform kernel and the Gaussian kernel as our neighbourhood functions, so $\psi(k, l)$ and $\varphi(P_k, P_l)$ are as follows:

$$\psi(k, l) = \begin{cases} 1, & |k - l| \le \varepsilon \\ 0, & \text{elsewhere} \end{cases} \tag{2}$$

$$\varphi(P_k, P_l) = (2\pi)^{-d/2} \exp\left\{ -\frac{1}{2} \|P_k - P_l\|^2 \right\} \tag{3}$$

where $d$ denotes the dimension of the spatial information.

The first term of the objective function is the total distance of the features $\{x_j\}_{j=1}^{N}$ to the cluster centers. The second term denotes the temporal constraint, and the third term denotes the spatial constraint. $\alpha$, $\beta$, $\gamma$ are the weights of these terms, satisfying $\alpha + \beta + \gamma = 1$.

C. Calculation of partition matrix and cluster center

The Lagrange multiplier method is applied to obtain the iterative formulas for equation (1). For $X_t$, suppose

$$
\begin{aligned}
V = {} & \lambda \left( \sum_{i=1}^{c} u_{it} - 1 \right) + \alpha \sum_{i=1}^{c} u_{it}^2 d_{it}^2 \\
& + \beta \sum_{i=1}^{c} \left[ \sum_{\substack{k=1 \\ k \ne t}}^{N} (u_{ik} - u_{it})^2 \psi(k, t) d_{ik}^2 + \sum_{\substack{l=1 \\ l \ne t}}^{N} (u_{it} - u_{il})^2 \psi(t, l) d_{it}^2 \right] \\
& + \gamma \sum_{i=1}^{c} \left[ \sum_{\substack{k=1 \\ k \ne t}}^{N} (u_{ik} - u_{it})^2 \varphi(P_k, P_t) d_{ik}^2 + \sum_{\substack{l=1 \\ l \ne t}}^{N} (u_{it} - u_{il})^2 \varphi(P_t, P_l) d_{it}^2 \right]
\end{aligned}
\tag{4}
$$

Let $\frac{\partial V}{\partial u_{st}} = 0$ and consider $\sum_{i=1}^{c} u_{ij} = 1$, $j = 1, 2, \dots, N$; then $u_{st}$ can be calculated by the following formula:

$$u_{st} = \frac{1 + \sum_{i=1}^{c} \dfrac{B_{st} - B_{it}}{A_{it}}}{\sum_{i=1}^{c} \dfrac{A_{st}}{A_{it}}} \tag{5}$$

where

$$A_{st} = 2\alpha d_{st}^2 + 2 \sum_{\substack{k=1 \\ k \ne t}}^{N} \big( \beta \psi(k, t) + \gamma \varphi(P_k, P_t) \big) d_{sk}^2 + 2 \sum_{\substack{l=1 \\ l \ne t}}^{N} \big( \beta \psi(t, l) + \gamma \varphi(P_t, P_l) \big) d_{st}^2 \tag{6}$$

$$B_{st} = 2 \sum_{\substack{k=1 \\ k \ne t}}^{N} \big( \beta \psi(k, t) + \gamma \varphi(P_k, P_t) \big) u_{sk} d_{sk}^2 + 2 \sum_{\substack{l=1 \\ l \ne t}}^{N} \big( \beta \psi(t, l) + \gamma \varphi(P_t, P_l) \big) u_{sl} d_{st}^2 \tag{7}$$
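The neighbourhood functions (2) and (3) and the membership update (5) to (7) can be sketched in NumPy as follows. This is an illustrative reading of the formulas, not the authors' implementation. The function names, the random test data and the weights $\alpha = 0.4$ and $\gamma = 0.2$ are assumptions made for the example, and the sketch further assumes that all memberships are updated at once from the previous partition matrix and that every $A_{st}$ is nonzero.

```python
import numpy as np


def temporal_kernel(N, eps):
    """Uniform kernel of Eq. (2): psi(k, l) = 1 if |k - l| <= eps, else 0."""
    idx = np.arange(N)
    return (np.abs(idx[:, None] - idx[None, :]) <= eps).astype(float)


def spatial_kernel(P):
    """Gaussian kernel of Eq. (3) for positions P of shape (N, d)."""
    N, d = P.shape
    sq_dist = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
    return (2.0 * np.pi) ** (-d / 2.0) * np.exp(-0.5 * sq_dist)


def update_memberships(U, D2, P, alpha, beta, gamma, eps):
    """One pass of Eqs. (5)-(7); U and D2 (squared distances d_ik^2) are c x N."""
    N = U.shape[1]
    # Combined weight beta*psi + gamma*phi; zeroing the diagonal realises
    # the k != t and l != t restrictions of the sums.
    W = beta * temporal_kernel(N, eps) + gamma * spatial_kernel(P)
    np.fill_diagonal(W, 0.0)
    # Eq. (6): A_st
    A = 2 * alpha * D2 + 2 * (D2 @ W) + 2 * D2 * W.sum(axis=1)[None, :]
    # Eq. (7): B_st, evaluated with the memberships of the previous iteration
    B = 2 * ((U * D2) @ W) + 2 * D2 * (U @ W.T)
    # Eq. (5): u_st = (1 + sum_i (B_st - B_it) / A_it) / sum_i (A_st / A_it)
    inv_A = 1.0 / A
    numerator = 1.0 + B * inv_A.sum(axis=0) - (B * inv_A).sum(axis=0)
    denominator = A * inv_A.sum(axis=0)
    # No clipping to [0, 1] is applied here.
    return numerator / denominator


# Hypothetical toy data: N = 8 time-ordered points, 2-D positions, 1 feature.
rng = np.random.default_rng(0)
c, N = 3, 8
P = rng.normal(size=(N, 2))
x = rng.normal(size=(N, 1))
v = rng.normal(size=(c, 1))
D2 = ((x[None, :, :] - v[:, None, :]) ** 2).sum(axis=2)  # shape c x N
U = rng.random((c, N))
U /= U.sum(axis=0)  # random initial partition matrix
U_new = update_memberships(U, D2, P, alpha=0.4, beta=0.4, gamma=0.2, eps=1)
print(np.allclose(U_new.sum(axis=0), 1.0))  # True
```

The final check reflects a property of formula (5): summing it over $s$ gives exactly 1 for every $t$, so the constraint $\sum_{i} u_{it} = 1$ is preserved by the update.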

The objective function can also be written as:

$$
\begin{aligned}
Q = {} & \alpha \sum_{i=1}^{c} \sum_{j=1}^{N} u_{ij}^2 \sum_{m=1}^{n} (x_{jm} - v_{im})^2 \\
& + \beta \sum_{i=1}^{c} \sum_{k=1}^{N} \sum_{l=1}^{N} (u_{ik} - u_{il})^2 \psi(k, l) \sum_{m=1}^{n} (x_{km} - v_{im})^2 \\
& + \gamma \sum_{i=1}^{c} \sum_{k=1}^{N} \sum_{l=1}^{N} (u_{ik} - u_{il})^2 \varphi(P_k, P_l) \sum_{m=1}^{n} (x_{km} - v_{im})^2
\end{aligned}
\tag{8}
$$

Let $\frac{\partial Q}{\partial v_{sr}} = 0$; the formula for $v_s$ is as follows:

$$
v_s = \frac{2\alpha \sum_{j=1}^{N} u_{sj}^2 x_j + 2\beta \sum_{k=1}^{N} \sum_{l=1}^{N} (u_{sk} - u_{sl})^2 \psi(k, l) x_k + 2\gamma \sum_{k=1}^{N} \sum_{l=1}^{N} (u_{sk} - u_{sl})^2 \varphi(P_k, P_l) x_k}{2\alpha \sum_{j=1}^{N} u_{sj}^2 + 2\beta \sum_{k=1}^{N} \sum_{l=1}^{N} (u_{sk} - u_{sl})^2 \psi(k, l) + 2\gamma \sum_{k=1}^{N} \sum_{l=1}^{N} (u_{sk} - u_{sl})^2 \varphi(P_k, P_l)}
\tag{9}
$$

Once the partition matrix is initialized, the clustering method can be realized by iterating formulas (5) and (9).

D. Algorithm of FCM with spatiotemporal constraints

1) Implementation of the proposed algorithm.

Step 1: Initialization. Determine the cluster number $c$ and the maximal iteration number $M$, and randomly initialize the partition matrix $U = (u_{ij})$, $i = 1, 2, \dots, c$, $j = 1, 2, \dots, N$.

Step 2: Calculate the cluster centers $v^k = (v_s)_{s=1}^{c}$ of the $k$-th iteration according to formula (9).

Step 3: Update the partition matrix $U^k = (u_{st})$, $s = 1, 2, \dots, c$, $t = 1, 2, \dots, N$, according to formulas (5), (6) and (7).

Step 4: Check whether $k$ has reached the maximal iteration number $M$ or $\|U^k - U^{k-1}\| \le \varepsilon_0$. If either condition is satisfied, the algorithm terminates; otherwise, return to Step 2.

2) De-fuzzification of clustering results

To de-fuzzify the clustering results, the maximum membership degree rule is adopted: for any $X_j$, if $u_{ij} = \max_{1 \le k \le c} u_{kj}$, then $X_j$ is assigned to the $i$-th cluster.

3) Computational complexity of the proposed algorithm

Once $M$ and $c$ are fixed, the computational complexity of the proposed method is $O(n^2)$, which is the same as the traditional FCM algorithm.

IV. EXPERIMENT STUDIES

A. Experiment with artificial datasets

First, a single trajectory that can clearly be divided into three clusters is considered, as shown in Fig. 1; a line connecting two points indicates that they are adjacent in time. The experimental result in Fig. 2 shows that the dataset is divided into three clusters (black, red, green), which is consistent with visual inspection.

Fig. 1. Original spatiotemporal dataset (3-D plot with axes x, y, z).

Fig. 2. Clustering result using the augmented FCM (3-D plot with axes x, y, z).

Then we add some transitional points between the clusters and mix the positions of some points from different clusters, as shown in Fig. 3; the result is shown in Fig. 4. The weights are user-determined. Here we set the weight of the temporal term to 0.4 because the temporal property plays an important part in trajectory data. The algorithm tends to assign two successive points to the same cluster, which is reasonable. The spatial property nevertheless remains an important aspect, because the membership degrees of the transitional points to the two neighbouring clusters are close; for example, the membership degree vectors of the three transitional points between the red cluster and the green cluster are [0.52,0.39,0.09], [0.44,0.45,0.01], [0.35,0.55,0.10]. The result also shows that the temporal constraint can effectively prevent misclassification, as the mixed points are assigned to the temporally adjacent cluster.
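The maximum membership degree rule can be applied directly to the three membership vectors quoted for the transitional points. The sketch below is only an illustration: the function name `defuzzify` is hypothetical, and the text does not say which vector position corresponds to which colour, so the labels are plain cluster indices.

```python
import numpy as np

# Membership vectors of the three transitional points, copied as printed
# (one row per point, one column per cluster).
memberships = np.array([
    [0.52, 0.39, 0.09],
    [0.44, 0.45, 0.01],
    [0.35, 0.55, 0.10],
])


def defuzzify(U):
    """U has shape (c, N); return, for each point, the index of its largest membership."""
    return np.argmax(U, axis=0)


labels = defuzzify(memberships.T)  # transpose to the paper's c x N layout
print(labels.tolist())  # [0, 1, 1]
```

The second point shows how close the decision can be: its two largest memberships differ by only 0.01, so the hard label hides information that the fuzzy partition retains.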

Fig. 3. Original spatiotemporal dataset (3-D plot with axes x, y, z).

Fig. 4. Clustering result using the augmented FCM (3-D plot with axes x, y, z).

B. Experiment with real data

To further verify the effectiveness of FCM with spatiotemporal constraints, real data of typhoon Halong is processed. Besides the spatial and temporal properties, the maximal wind speed is selected as another property. As shown in Fig. 5, the trajectory is clustered into three parts. The first cluster, marked with 'o', is the initial stage of Halong, during which the wind is not very strong; the second cluster is the period of strong wind; and in the last cluster the wind decreases as a result of landfall. The clustering result is quite reasonable.

Fig. 5. Clustering result of typhoon Halong.

V. CONCLUSIONS

In this paper, a fuzzy clustering method with spatiotemporal constraints is proposed, and its effectiveness is verified using artificial trajectory data. This approach can be used for the granulation of trajectories, which can simplify calculation and help us view the data at a higher level. It can be generalized to other types of spatiotemporal data if the spatial and temporal information is given.

Further research can focus on the clustering of multiple trajectories and on how to determine the weights for different datasets and different problems.

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China (No. 11571001), Beijing Natural Science Foundation (No. 4112031), and the Fundamental Research Funds for the Central Universities.

REFERENCES

[1] Steinbach M, Tan P N, Kumar V. "Clustering earth science data: goals, issues and results," In: Kamath C, ed. Proceedings of the 4th KDD Workshop on Mining Scientific Datasets in conjunction with 7th ACM SIGKDD International Conference on Knowledge and Data Mining. San Francisco: ACM Press, 2001. p.1-8.
[2] Gaudart J, Poudiougou B, Dicko A, Ranque S, Toure O, Sagara I, Diallo M, Diawara S, Ouattara A, Diakite M, Doumbo O K. "Space-time clustering of childhood malaria at the household level: a dynamic cohort in a mali village," BMC Public Health, vol.6, pp.1-13, 2006.
[3] Wang M, Wang A P, Li A B. "Mining spatial-temporal clusters from geo-database," Lect Notes Artif Intell, pp.263-270, 2006.
[4] Pei T, Zhou C H, Zhu A X, Li B, Qin C. "Windowed nearest neighbour method for mining spatio-temporal clusters in the presence of noise," Int J Geogr Inf Sci, vol.24, pp.925-948, 2010.
[5] Kulldorff M, Heffernan R, Hartman J, Assuncao R, Mostashari F. "A space-time permutation scan statistics for disease outbreak detection," PLoS Med, vol.2, pp.216-224, 2005.
[6] Birant D, Kut A. "ST-DBSCAN: an algorithm for clustering spatial-temporal data," Data Knowl Disc, vol.60, pp.208-221, 2007.
[7] Jacquez G M. "A k nearest neighbour test for space-time interaction," Stat Med, vol.15, pp.1935-1949, 1996.
[8] Kulldorff M, Hjalmars U. "The Knox method and other tests for space-time interaction," Biometrics, vol.55, pp.544-552, 1999.
[9] Zaliapin I, Gabrielov A, Keilis-Borok V, Wong H. "Clustering analysis of seismicity and aftershock identification," Phys Rev Lett, pp.1-4, 2008.
[10] Luo C Z, Introduction of fuzzy sets, 2nd ed., China: Beijing Normal University Press, 2005.
[11] Witold Pedrycz, Yu F S translated, Knowledge-based clustering: from data to information granules, China: Beijing Normal University Press, 2005.
[12] Hesam Izakian, Witold Pedrycz, Iqbal Jamal, "Clustering spatio-temporal data: an augmented fuzzy c-means," IEEE Transactions on Fuzzy Systems, vol.21, no.5, Oct 2013.
[13] Y Li, H Lu, L Zhang, J Zhu, S Yang, X Hu, X Zhang, Y Li, B Li, S Serikawa, "An automatic image segmentation algorithm based on weighting fuzzy c-means clustering," Soft Computing in Information Communication Technology, pp.27-32, 2011.
[14] Yujie Li, Huimin Lu, Yingying Wang, Lifeng Zhang, Shiyuan Yang, Seiichi Serikawa, "Robust color image segmentation method based on weighting fuzzy c-means clustering," Proc. of 2012 IEEE/SICE International Symposium on System Integration, pp.133-137, Fukuoka, Japan.
[15] Slava Kisilevich, Florian Mansmann, Mirco Nanni, Salvatore Rinzivillo, Data mining and knowledge discovery handbook. Springer Science+Business Media, LLC, 2010.
