Skyline

Continuously Maintaining Sliding Window Skylines in a Sensor Network
Junchang Xin (1), Guoren Wang (1), Lei Chen (2), Xiaoyi Zhang (1), and Zhenhua Wang (1)
(1) Institute of Computer System, Northeastern University, Shenyang, China. wanggr@mail.neu.edu.cn
(2) Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China. leichen@cs.ust.hk
Abstract. Wireless sensor networks are now widely used in environmental monitoring. The skyline query, an important operator for multi-criteria decision making and data mining, plays an important role in many sensing applications. Although skyline queries have been well studied in traditional database systems, existing solutions designed for data stored at a centralized site are not directly applicable to sensor environments because of the unique characteristics of wireless sensor networks. In this paper, we propose an energy-efficient algorithm, called the Sliding Window Skyline Monitoring Algorithm (SWSMA), to continuously maintain sliding window skylines over a wireless sensor network. Specifically, SWSMA employs two types of filters within each sensor to reduce the amount of data transferred and thereby save energy. In addition, a set of optimization mechanisms is discussed to further improve the performance of SWSMA. Our extensive simulation studies show that SWSMA, together with the optimization techniques, effectively reduces communication cost and saves energy when monitoring sliding window skylines.
1 Introduction
In recent years, wireless sensor networks (WSNs) have been widely used in environmental monitoring [16, 24], for example in earthquake monitoring, habitat monitoring, agriculture monitoring, and coal mine environment monitoring. Current sensors are generally cheap, resource-constrained, and battery-powered, and it is impossible, or at least very difficult, to change their batteries. Therefore, applications over sensor networks need a scalable, energy-efficient, and fault-tolerant method to monitor the tremendous amount of data generated by sensors. Among all queries, the skyline query, as an important operator for multi-criteria decision making and data mining, plays an important role in many sensing applications. A skyline query is defined as follows.
Definition 1. Given a set of tuples T in a relational database, a skyline query retrieves the tuples in T that are not dominated by any other tuple. For two tuples ti and tj in T, tuple ti dominates tuple tj if it is no worse than tj in all dimensions and better than tj in at least one.
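Definition 1 can be made concrete with a few lines of Python. The sketch below is illustrative only: it assumes that smaller values are preferred in every dimension (the definition itself leaves the preference direction open), and the names `dominates` and `skyline` are ours, not the paper's. It uses a simple quadratic scan rather than any of the optimized algorithms cited later.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b.

    Assumption: smaller is better. a must be no worse than b in every
    dimension and strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(tuples: Sequence[Point]) -> List[Point]:
    """Naive skyline: keep every tuple that no other tuple dominates."""
    return [t for t in tuples if not any(dominates(o, t) for o in tuples)]


# Hypothetical two-dimensional readings.
readings = [(1, 5), (2, 2), (3, 3), (4, 1), (5, 5)]
print(skyline(readings))
```

For these five sample readings, (3, 3) and (5, 5) are both dominated by (2, 2) and therefore drop out of the skyline.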
R. Kotagiri et al. (Eds.): DASFAA 2007, LNCS 4443, pp. 509-521, 2007. © Springer-Verlag Berlin Heidelberg 2007
Within a sensor network, data are collected by each sensor node periodically. It is impossible, or at least meaningless, to conduct skyline queries over the infinite data streams collected by sensors. Thus, sliding window skylines, which compute the skyline over the latest data constrained by a sliding window, are very useful for data monitoring applications. For example, an ornithologist studying birds in a forest may want to know when and where certain kinds of birds are most likely to be found. Existing skyline solutions cannot be applied to the sensor environment directly because of the distributed nature of sensory data. In addition, as mentioned above, energy is a precious resource in a sensor network and wireless communication is its main consumer. Sliding window skylines over a wireless sensor network therefore raise a new challenge, minimizing the communication cost, which is not addressed by centralized skyline solutions.
In this paper, we propose an energy-efficient algorithm, called the Sliding Window Skyline Monitoring Algorithm (SWSMA), to continuously maintain sliding window skylines over a wireless sensor network. SWSMA employs two types of filters within each sensor to reduce the amount of data transferred and thereby save energy. The contributions of this paper are:
1. We prove theoretically that skyline queries are decomposable, which indicates that in-network computation can be applied to skylines;
2. We propose an energy-efficient Sliding Window Skyline Monitoring Algorithm (SWSMA) that continuously maintains sliding window skylines by employing two types of filters to avoid transmitting unqualified tuples;
3. In addition to SWSMA, we discuss a set of optimization mechanisms that improve its performance.
2 Related Work
Various query processing models have been proposed for sensor networks, of which TinyDB [13, 14, 15] and COUGAR [25] are two typical systems. Both provide an SQL-like interface and implement aggregation operators such as MAX, MIN, AVERAGE, SUM, and COUNT.
The skyline query was first investigated in [3], where several methods were presented, including an SQL implementation, divide-and-conquer (DC), and block-nested-loop (BNL). Chomicki et al. [5] present a pre-sort method, which sorts the dataset according to a monotone preference function and then computes the skyline in another pass over the sorted list. Two progressive methods, Bitmap and Index, are presented in [22]. Since the nearest neighbor (NN) is guaranteed to belong to the final skyline, Kossmann et al. [10] present a progressive online method based on NN search, which allows the user to interact with the process. Papadias et al. [19] use an R-tree to further improve the performance of the algorithm presented in [10].
The methods above all assume centralized scenarios. To our knowledge, no approach has been proposed so far to address skyline queries over a sensor network. The work most closely related to ours studies skylines in distributed scenarios. Balke et al. [4] extend the skyline problem to the World Wide Web, where the attributes of an object are distributed over different web-accessible servers, and present a basic distributed skyline algorithm (BDS) and an improved distributed skyline algorithm (IDS) that compute the skyline in such a distributed environment. BDS uses a simple method to identify a subset of the objects that includes the skyline, and then filters out all the non-skyline objects in that subset. IDS finds the subset more quickly than BDS by using a heuristic. Later, Lo et al. [11] propose a progressive distributed skyline algorithm (PDS), based on progressiveness and rank estimation, to improve on the performance of BDS and IDS. Huang et al. [9] propose a filtration policy to reduce communication cost among mobile devices and a hybrid storage model to reduce the execution time on each single mobile device, which is similar to our proposal. However, their approach mainly focuses on answering skyline queries at one timestamp, i.e., snapshot skylines, whereas ours focuses on continuously monitoring sliding window skyline queries.
Some work has addressed sliding window skyline queries with a focus on handling the characteristics of stream data. Tao et al. [23] propose a framework to continuously monitor the skyline over stream data. Lin et al. [12] explore n-of-N skyline queries, which compute the skyline over the most recent n elements within the set of the most recent N elements, and present a pruning technique to reduce the amount of data kept, an encoding scheme to reduce memory space, and a new trigger-based technique to continuously process an n-of-N skyline query.
3 Preliminaries
In this section, the sensor stream is first introduced, followed by sliding window skylines; finally, some related properties that form the foundation of our filtering algorithms are presented.
3.1 Sliding Window Skyline
In a sensor network, data are collected periodically by sensor nodes. Strictly speaking, the sensory data collected by a sensor network therefore cannot simply be treated as a traditional database; it is more like a distributed, multiple data stream system in which all sensor streams are append-only. Since a data stream is infinite and its total volume is theoretically unbounded, it is impossible to carry out the skyline operation after all data have been collected. In this paper, we therefore consider the sliding window skyline, which only takes the latest set of data into account. Each tuple t in a data stream has a timestamp t.arr indicating its arrival time. If the size of the sliding window is W and the lifespan of the tuple is [t.arr, t.exp], we have t.exp = t.arr + W. When the current time is t.curr, all valid data in the interval [t.curr - W, t.curr] of the sliding window are used to compute the skyline.
3.2 Properties
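We denote the whole tuple set in a sensor network as $T$ and the tuple set of node $i$ as $T_i$. Furthermore, we assume that the dimension set of the tuple set is $D$, the dimensionality of the tuple set is $|D|$, and $n$ is the number of sensor nodes. We use $SKY$ to stand for the skyline operator, $\prec$ for the dominance relationship ($t_i \prec t_j$ means that $t_i$ dominates $t_j$), and $t.x_d$ for the $d$th attribute of tuple $t$. Based on the definition of skylines in Section 1, we have the following lemma.
Lemma 1. Let $t_i$, $t_j$ and $t_k$ be three tuples in $T$. If $t_i \prec t_j$ and $t_j \prec t_k$, then $t_i \prec t_k$.
Proof: According to the definition of the dominance relationship (written here with larger values preferred),
$$ t_i \prec t_j \Rightarrow (\forall d \in D,\ t_i.x_d \ge t_j.x_d) \wedge (\exists d \in D,\ t_i.x_d > t_j.x_d) \tag{1} $$
$$ t_j \prec t_k \Rightarrow \forall d \in D,\ t_j.x_d \ge t_k.x_d \tag{2} $$
$$ \Rightarrow (\forall d \in D,\ t_i.x_d \ge t_k.x_d) \wedge (\exists d \in D,\ t_i.x_d > t_k.x_d) \tag{3} $$
So we can conclude $t_i \prec t_k$.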
If an operation is decomposable, it can be computed in-network [14]: the computation is carried out within each sensor, and the intermediate results are transmitted during the data collection phase of the query. Thus, many redundant transmissions can be saved. Fortunately, the skyline in a sensor network has this attractive property.
Theorem 1. The skyline query in a sensor network is decomposable.
Proof: According to Lemma 1 and the definition of the skyline query,
$$ SKY(T) \subseteq \bigcup_{i=1}^{n} SKY(T_i) \tag{4} $$
$$ SKY(T) \subseteq SKY\Big(\bigcup_{i=1}^{n} SKY(T_i)\Big) \subseteq T \tag{5} $$
$$ SKY(T) = SKY\Big(\bigcup_{i=1}^{n} SKY(T_i)\Big) \tag{6} $$
This satisfies the formula $f(v_1, v_2, \ldots, v_n) = g(f(v_1, v_2, \ldots, v_k), f(v_{k+1}, \ldots, v_n))$ given in [6], so the skyline query in a sensor network is decomposable.
Theorem 1 indicates that in-network computation can be applied to skylines. To further reduce the number of tuples transmitted among the sensors, setting a filter within each sensor becomes essential. The following theorem describes the effect of using a tuple as a filter.
Theorem 2. If a tuple t, whether or not it exists, is dominated by a valid tuple, then no tuple dominated by t belongs to the skyline. Here a valid tuple is a tuple that exists and is unexpired.
Proof: This follows immediately from Lemma 1.
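The decomposability stated in Theorem 1 can be checked on a small example. The sketch below is only an illustration under stated assumptions: smaller values are preferred, the tuple set is split over three hypothetical nodes, and the function names are ours. It compares the skyline of the merged local skylines with the skyline of the full tuple set.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b, assuming smaller values are preferred."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(tuples: Sequence[Point]) -> List[Point]:
    """Naive skyline of a tuple collection."""
    return [t for t in tuples if not any(dominates(o, t) for o in tuples)]


def merged_skyline(partitions: Sequence[Sequence[Point]]) -> List[Point]:
    """Skyline of the union of the local skylines SKY(T_i)."""
    candidates = [t for part in partitions for t in skyline(part)]
    return skyline(candidates)


# Hypothetical tuple sets T_1, T_2, T_3 held by three sensor nodes.
node_data = [
    [(1, 9), (6, 6)],
    [(2, 7), (4, 4), (9, 1)],
    [(5, 3), (7, 5), (8, 2)],
]
all_tuples = [t for part in node_data for t in part]

# Both sides contain the same tuples.
print(sorted(merged_skyline(node_data)) == sorted(skyline(all_tuples)))
```

In this data, (6, 6) is in the local skyline of the first node but is removed during the merge because (4, 4) from the second node dominates it, which is exactly the saving the in-network merge relies on.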
4 Sliding Window Skyline Monitoring Algorithm
In this section, we present the Sliding Window Skyline Monitoring Algorithm (SWSMA). Its computation and maintenance modules are discussed in Sections 4.1 and 4.2, respectively.
4.1 The Computation Module
In this paper, we design our skyline algorithms on top of the popular tree-based routing structure described in TAG [14, 13] and Cougar [25]. In a tree-based routing structure, a spanning tree is created with the base station as the root. The naive approach to computing a skyline is a centralized one: collect all the data at the base station through the tree-based routing structure and compute the sliding window skyline there. This causes a large communication cost and network congestion, so the method is impractical for wireless sensor networks. Possible improvements are listed below.
Merge Approach. From Theorem 1, we know that the skyline operation is decomposable, so one feasible method is in-network computation. That is, the skyline is computed in-network whenever possible: each intermediate node merges its own skyline with the skyline results sent by its children and then sends the merged result to its parent. Most of the tuples that belong to a local skyline but not to the global skyline are filtered out at the intermediate nodes. However, many tuples that do not belong to the final skyline are still transmitted.
Tuple Filter Approach. If a tuple belongs to a local skyline but not to the global skyline, there must exist a global skyline tuple that dominates it. The tuple does not need to be transmitted if it is known in advance that such a global skyline tuple exists. The amount of transmitted data is therefore greatly reduced when the tuple that dominates the most tuples is found and announced to the other nodes. In general, suppose the tuple set $T$ lies in a $|D|$-dimensional space and the range of the $i$th dimension is $[0, U_i]$. Let $X$ and $Y$ denote tuples in $T$ with coordinates $(x_1, x_2, \ldots, x_{|D|})$ and $(y_1, y_2, \ldots, y_{|D|})$, respectively. If the probability density function of the tuples in $T$ is $p(X) = p(x_1, x_2, \ldots, x_{|D|})$, the total number of tuples that tuple $Y$ can dominate is
$$ c(Y) = |T| \int_R p(x_1, x_2, \ldots, x_{|D|})\, dx_1\, dx_2 \ldots dx_{|D|} \tag{7} $$
where $R = \{X \mid x_1 \ge y_1, x_2 \ge y_2, \ldots, x_{|D|} \ge y_{|D|}\}$ is the region dominated by $Y$ when smaller values are preferred. The tuple that can dominate the most tuples is the one with the largest value of equation 7. Although $p(X)$ is unknown in most cases, an approximation of it can easily be obtained by random sampling or by collecting histograms [7, 21]. To reduce the cost of integration, we use a polynomial function to represent $p(X)$. After the integration is finished at the base station, only a few coefficients of $p(X)$ are sent, which enables each node to reconstruct the expression approximately.
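As a worked special case of equation 7, suppose p(X) is uniform on [0, U_1] x ... x [0, U_|D|] and smaller values are preferred. The integral then has the closed form c(Y) = |T| * prod_i (U_i - y_i) / U_i. The sketch below evaluates this form and picks the candidate with the largest value as the filter tuple. The uniform density, the function names, and the sample numbers are assumptions made for illustration; the paper itself approximates p(X) with a polynomial.

```python
from math import prod
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def dominated_count_uniform(y: Sequence[float],
                            upper: Sequence[float],
                            total: int) -> float:
    """Equation 7 under an assumed uniform density on [0, U_i] per dimension.

    With smaller values preferred, Y dominates the box [y_i, U_i] in each
    dimension, so the integral reduces to a product of side-length ratios.
    """
    return total * prod((u - yi) / u for yi, u in zip(y, upper))


def best_filter(candidates: Sequence[Point],
                upper: Sequence[float],
                total: int) -> Point:
    """Pick the candidate tuple with the largest estimated c(Y)."""
    return max(candidates, key=lambda y: dominated_count_uniform(y, upper, total))


# Hypothetical setting: 1000 tuples in [0, 100] x [0, 100].
upper = (100.0, 100.0)
candidates = [(10.0, 80.0), (30.0, 30.0), (70.0, 5.0)]
for y in candidates:
    print(y, round(dominated_count_uniform(y, upper, 1000), 1))
print("filter:", best_filter(candidates, upper, 1000))
```

A balanced tuple such as (30, 30) dominates a larger share of the space than tuples that are excellent in one dimension only, which is why it is the preferred filter here.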
Equation 7 is then used to find the most powerful filter tuple. After obtaining p(X), the skyline is computed in the following steps:
1. Calculate the value c (according to equation 7) of the locally stored tuples.
2. Find the tuple with the maximum value of c using in-network aggregation, and set it as the tuple filter.
3. Broadcast the tuple filter to the entire network.
4. Filter out the tuples in the sensor nodes that are dominated by the filter.
5. Use the merge approach to carry out the skyline computation.
Grid Filter Approach. Intuitively, for some distributions the tuple filter dominates most non-skyline tuples and filters them effectively; for other distributions, it dominates only a part of the non-skyline tuples and filters poorly. To solve this problem, the grid filter approach is introduced. In the grid filter approach, a regular grid is used to partition the data space. Each dimension is divided into s segments, and the extent of each segment in dimension i is $U_i/s$, so there are $s^{|D|}$ cells in total. A flag cell.sta records the state of each cell: if any tuple falls in the cell, cell.sta is set to 1, otherwise to 0. Another option is to pre-process the grid: the cell.sta of every dominated cell is set to 0, meaning that the tuples in the cell do not belong to the skyline, and the cell.sta of all other cells is set to 1, meaning that the tuples in the cell may belong to the skyline. We call the former the original grid and the latter the pre-processed grid. To determine whether a tuple is dominated by a grid, the former needs to examine the cell.sta of all cells to the bottom left of the tuple's cell, while the latter only needs to examine the cell.sta of the tuple's own cell. Furthermore, to merge grids on intermediate nodes, we apply the OR operation to original grids and the AND operation to pre-processed grids. The original grid costs more for the dominance check, while the merge costs of the two are the same. We conclude that the pre-processed grid performs better overall, so we adopt it; the grid mentioned in the following discussion is the pre-processed grid.
Adaptive Filter Approach. The tuple filter and the grid filter each have their own pros and cons: the tuple filter is more effective on independent and correlated distributions, whereas the grid filter performs better on anti-correlated ones. A feasible method is therefore to use a selection mechanism that chooses the right filter, avoiding their disadvantages and fully exploiting their advantages. We call this the adaptive filter approach. First, samples or histograms [7, 21] are used to obtain the rough distribution of the data, and then the adaptive filter approach determines the filter strategy according to that distribution. If the data approximately follow an independent or correlated distribution, the tuple filter is used; if they approximately follow an anti-correlated distribution, the grid filter is used. In this way, the merits of both filter strategies can be fully utilized to optimize system performance. To adapt to variations in the data distribution, the base station selects a new filter that suits the new distribution according to the current skyline results, and decides whether to broadcast it to the whole network based on the result of a cost-benefit model.
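The pre-processed grid filter described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes smaller values are preferred, and it marks a cell as dominated only when some occupied cell has a strictly smaller index in every dimension, a conservative reading under which every tuple of the dominated cell is guaranteed to be dominated. All function names are hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, Sequence, Tuple

Cell = Tuple[int, ...]
Grid = Dict[Cell, int]


def cell_of(t: Sequence[float], upper: Sequence[float], s: int) -> Cell:
    """Index of the grid cell containing tuple t (s segments per dimension)."""
    return tuple(min(int(x * s / u), s - 1) for x, u in zip(t, upper))


def preprocessed_grid(tuples: Iterable[Sequence[float]],
                      upper: Sequence[float], s: int) -> Grid:
    """cell.sta = 0 for dominated cells, 1 for cells that may hold skyline tuples."""
    occupied = {cell_of(t, upper, s) for t in tuples}
    grid: Grid = {}
    for cell in product(range(s), repeat=len(upper)):
        # Conservative assumption: dominated only if an occupied cell is
        # strictly lower in every dimension.
        dominated = any(all(o < c for o, c in zip(occ, cell)) for occ in occupied)
        grid[cell] = 0 if dominated else 1
    return grid


def merge_grids(g1: Grid, g2: Grid) -> Grid:
    """Merge two pre-processed grids with a cell-wise AND."""
    return {cell: g1[cell] & g2[cell] for cell in g1}


def may_be_skyline(t: Sequence[float], grid: Grid,
                   upper: Sequence[float], s: int) -> bool:
    """A tuple passes the filter if its own cell still has cell.sta = 1."""
    return grid[cell_of(t, upper, s)] == 1


# Two hypothetical nodes, a 4 x 4 grid over [0, 100] x [0, 100].
upper = (100.0, 100.0)
grid_a = preprocessed_grid([(10.0, 60.0)], upper, 4)
grid_b = preprocessed_grid([(60.0, 10.0)], upper, 4)
merged = merge_grids(grid_a, grid_b)
print(may_be_skyline((40.0, 90.0), grid_b, upper, 4))   # passes node B's grid alone
print(may_be_skyline((40.0, 90.0), merged, upper, 4))   # filtered after the merge
```

Checking a tuple against the pre-processed grid is a single lookup, which is the reason given in the text for preferring it over the original grid.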
4.2 The Maintenance Module
After the initial skyline has been computed, sensor nodes keep generating new tuples, while old tuples move out of the window and expire. A simple and direct way to keep the global skyline consistent is to recompute the skyline periodically with the method presented in Section 4.1. Obviously this is impractical, because the old window and the new one overlap to a great extent. If the overlapping information is reused, the data transmission cost in the maintenance phase can be reduced. An effective method should be update-only: only new local skyline tuples are transmitted, and tuples that have already been transmitted are not retransmitted. Thus, the communication cost is reduced. The merge approach needs no extra processing, while for the two filter-based approaches, how to maintain the filters incrementally during global skyline maintenance becomes a critical problem.
Maintenance of Tuple Filter. By Theorem 2, even if the filter tuple has expired, it does not necessarily have to be replaced: as long as a tuple in the skyline dominates it, the filtered tuples definitely do not belong to the skyline, so there are no false negatives. However, once a tuple f with a small c(f) (according to equation 7) is chosen as the filter, the probability that it is dominated by a skyline tuple is high, so it may not be replaced for a long time. Its poor filtration ability then causes many false alarms and a great waste of energy. Replacing the filter can improve the filtration ability, but the replacement itself incurs a communication cost. To balance benefit against cost, we introduce a benefit-cost model:
$$ benefit(f) = (c(f) - c(f_{old})) \cdot n \cdot \bar{t}_f \tag{8} $$
$$ cost(f) = broadcast(f) \tag{9} $$
where $\bar{t}_f$ is the average lifetime of the filter tuple and benefit(f) is the increase in the number of tuples dominated by f compared with $f_{old}$. The filter is replaced in either of the following cases:
1. The filter has expired and no tuple in the skyline dominates it.
2. There is a new tuple whose benefit exceeds the cost.
If one of these conditions is satisfied, a new tuple is chosen as the new filter and broadcast to the network.
Maintenance of Grid Filter. Before discussing the replacement policy of the grid filter, we define the dominance relationship between grids.
Definition 2. For two grids g1 and g2, let s1 be the set of cells whose cell.sta is 0 in g1 and s2 the set of cells whose cell.sta is 0 in g2. If s1 is a subset of s2, we say that grid g2 dominates g1.
Theorem 3. If a grid g is dominated by a valid grid, then, whether g has expired or not, no tuple dominated by g belongs to the skyline. Here a valid grid is one that is obtained from unexpired skyline tuples.
Proof: This follows immediately from Definition 2.
In the same way, an expired grid does not necessarily need to be replaced. When a better grid appears, a decision has to be made whether to replace the old one or keep it. The measure of the filtration ability of a grid differs from equation 7. Under the same preconditions as for the tuple filter, the filtration ability of a grid is
$$ c(Grid) = |T| \int_G p(x_1, x_2, \ldots, x_{|D|})\, dx_1\, dx_2 \ldots dx_{|D|} \tag{10} $$
where $G = \{X \mid X \in cell \wedge cell.sta = 0\}$. The benefit-cost model of the tuple filter can also be applied to the grid filter, and the replacement policy of the grid filter is similar to that of the tuple filter.
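The replacement policy of this section can be summarized in a short sketch. It is illustrative only and rests on assumptions: equation 8 is read as the product (c(f) - c(f_old)) * n * t_f, the broadcast cost is given as a single number in the same unit as the benefit, and all names are hypothetical. The last function expresses the grid dominance of Definition 2 as a subset test on the cells whose cell.sta is 0.

```python
from typing import Dict, Tuple

Cell = Tuple[int, ...]


def benefit(c_new: float, c_old: float, n_nodes: int, avg_lifetime: float) -> float:
    """Equation 8, read as a product (an assumption about the garbled formula)."""
    return (c_new - c_old) * n_nodes * avg_lifetime


def should_replace_filter(expired: bool, dominated_by_skyline: bool,
                          c_new: float, c_old: float, n_nodes: int,
                          avg_lifetime: float, broadcast_cost: float) -> bool:
    """Apply the two replacement conditions of the text."""
    # Condition 1: the filter expired and no skyline tuple dominates it.
    if expired and not dominated_by_skyline:
        return True
    # Condition 2: a candidate's benefit exceeds the cost of broadcasting it.
    return benefit(c_new, c_old, n_nodes, avg_lifetime) > broadcast_cost


def grid_dominates(g2: Dict[Cell, int], g1: Dict[Cell, int]) -> bool:
    """Definition 2: g2 dominates g1 if the zero cells of g1 are a subset of g2's."""
    zeros1 = {cell for cell, sta in g1.items() if sta == 0}
    zeros2 = {cell for cell, sta in g2.items() if sta == 0}
    return zeros1 <= zeros2


# Hypothetical numbers: a candidate that filters 10 more tuples per node.
print(should_replace_filter(False, True, 500, 490, 10, 2.0, broadcast_cost=1000))
print(should_replace_filter(False, True, 500, 490, 10, 2.0, broadcast_cost=100))
```

An expired filter that is still dominated by a skyline tuple stays in place unless a candidate pays for its own broadcast, which matches the observation that expiry alone does not force a replacement.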
5 Optimizations
In this section, several optimization approaches are discussed to further improve the performance of SWSMA. The snooping method is applicable to all filter strategies, while the shearing and compressing methods apply to the grid filter.
5.1 Snooping
The snooping method uses information from non-child nodes to reduce the communication cost. In snooping mode, an intermediate node not only keeps the data sent by its child nodes but also snoops on messages sent by other nodes. The snooped data are used like a tuple filter when computing the skyline: they only take part in filtration and do not enter the final skyline. Some skyline tuples that would otherwise be sent to the parent node no longer need to be transmitted, since they are dominated by a snooped tuple, which further reduces the amount of data transmitted in the sensor network.
5.2 Shearing
The shearing method transmits only the useful part of a grid; the part that can be deduced is not transmitted. Take the skyline operation using min as an example. Since the top right edge of a grid does not dominate any cell, it does not need to be transmitted when grids are merged. When a grid is broadcast as a filter, there is no need to transmit its bottom left edge, because the cell.sta of every cell on the bottom left edge is always 1. Therefore, the edges of the grid can be cut according to the situation so as to reduce the communication cost.
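A minimal sketch of shearing for the broadcast case is given below. It assumes the min skyline of the example above, a pre-processed grid stored as a dictionary from cell indices to cell.sta, and it interprets the bottom left edge as the cells whose index is 0 in at least one dimension. These choices and the function names are ours and are meant only to illustrate the idea.

```python
from itertools import product
from typing import Dict, List, Tuple

Cell = Tuple[int, ...]


def shear_for_broadcast(grid: Dict[Cell, int], s: int, dims: int) -> List[int]:
    """Keep only the cells that are not on the bottom left edge."""
    return [grid[cell] for cell in product(range(1, s), repeat=dims)]


def restore_after_broadcast(bits: List[int], s: int, dims: int) -> Dict[Cell, int]:
    """Rebuild the full grid: edge cells are known to be 1, the rest come from bits."""
    grid = {cell: 1 for cell in product(range(s), repeat=dims)}
    for cell, bit in zip(product(range(1, s), repeat=dims), bits):
        grid[cell] = bit
    return grid


# Hypothetical 4 x 4 pre-processed grid with five dominated cells.
s, dims = 4, 2
grid = {cell: 1 for cell in product(range(s), repeat=dims)}
for dominated in [(1, 3), (2, 3), (3, 3), (3, 1), (3, 2)]:
    grid[dominated] = 0

sheared = shear_for_broadcast(grid, s, dims)
print(len(grid), "cells reduced to", len(sheared))
print(restore_after_broadcast(sheared, s, dims) == grid)
```

For an s x s grid the broadcast shrinks from s^2 to (s - 1)^2 cells, and the receiver can rebuild the grid without loss because the omitted cells are always 1.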
5.3 Compressing
The compressing method aims to reduce the communication cost of transmitting a grid. A grid is a binary string, for which special compression mechanisms can be used. Since the probability that the cell.sta of each cell is 0 is computable, long runs of successive 1s or 0s are more likely to occur if the cells of the grid are sorted by this probability, which makes compression more effective. The encoding of [18] is used to compress the data to 30% of the original size.
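The effect of sorting cells by probability can be illustrated with plain run-length encoding. Note that run-length encoding is used here only as a stand-in: the encoding of [18] is not reproduced, and the probabilities, bit values, and function names below are made up for the example. The sketch assumes that sender and receiver can both compute the same per-cell probabilities and hence the same ordering.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def run_length_encode(bits: Sequence[int]) -> List[Tuple[int, int]]:
    """Encode a bit sequence as (bit, run length) pairs."""
    return [(bit, len(list(group))) for bit, group in groupby(bits)]


def run_length_decode(runs: Sequence[Tuple[int, int]]) -> List[int]:
    """Inverse of run_length_encode."""
    return [bit for bit, count in runs for _ in range(count)]


def order_by_probability(prob_zero: Sequence[float]) -> List[int]:
    """Cell indices sorted by decreasing probability that cell.sta is 0."""
    return sorted(range(len(prob_zero)), key=lambda i: -prob_zero[i])


# Hypothetical grid of eight cells in its natural order.
bits = [1, 0, 1, 0, 1, 0, 0, 1]
prob_zero = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.6, 0.4]

order = order_by_probability(prob_zero)
sorted_bits = [bits[i] for i in order]

print(len(run_length_encode(bits)), "runs in natural order")
print(len(run_length_encode(sorted_bits)), "runs after sorting by probability")
```

The receiver decodes the runs and then applies the inverse of the shared ordering to recover the grid in its natural layout.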
6 Simulation Evaluation
In this section, we compare the performance of the merge approach (MA), tuple filter approach (TF), grid filter approach (GF), and adaptive filter approach (AF) on the tree-based routing structure under two data distributions, independent and anti-correlated, which are common benchmarks for skyline queries [3, 23]. The experimental data are distributed evenly over 600-1000 nodes, each with a communication radius of $2\sqrt{2}$ times the side length of the area occupied by the node itself. The dimensionality of the sensory data ranges from 2 to 4, and the size of the sliding window varies from 100 to 500. At each timestamp, each node generates a new tuple, so n new tuples are generated in the whole network. All simulations are run on a Pentium 2.8 GHz CPU with 512 MB of memory. The default values are number of nodes n = 1000, cardinality C = 300, and dimension d = 3. Since AF adopts the TF algorithm under the independent distribution, the performance of MA, GF, and AF is compared there; since AF adopts the GF algorithm under the anti-correlated distribution, the performance of MA, TF, and AF is compared there.
[Figure 1: total communication cost (×10^3) versus grid size; panels (a) Independent, (b) Anti-correlated]
Fig. 1. Effect of GF granularity in computation module
Before comparing the performance of the algorithms in the computation module, we first study the effect of grid granularity. Figure 1 shows the total communication cost (the number of bytes of the messages transmitted) under different grid granularities. The best grid granularity for the independent distribution is 0, and the best grid granularity for the anti-correlated distribution is 13. This is because a grid filters out some of the data, but its computation and broadcasting are not free. For independent data, as the granularity increases, the broadcast cost of the grid grows faster than the savings obtained from filtration, while for anti-correlated data, cost and benefit balance well. Therefore, in the experiments on the computation module, the grid granularity for independent data is 0, at which GF degenerates into MA, while the grid granularity for anti-correlated data is 13.
[Figure 2: total communication cost (×10^3); panels (a), (b), (c); x-axes: Number of Nodes, Cardinality, Dimension]
Fig. 2. Performance with independent data
Figures 2 and 3 present the influence of dimension, cardinality, and the number of nodes on performance under the independent and anti-correlated distributions, respectively. They show that AF is the best under all circumstances. This is because, under the independent distribution, TF filters a large number of tuples at a far lower transmission cost than GF, while under the anti-correlated distribution GF filters out several times as much data as TF, a gain that far exceeds its own transmission cost. Since AF always chooses the better strategy, its performance turns out to be the best. Meanwhile, the cost increases with the number of dimensions, since the skyline result grows with the dimensionality, which increases the communication cost. A change in cardinality does not directly affect the cost, because there is no obvious functional relationship between the size of the result set and the cardinality. For the same reason, a change in the number of nodes does not directly affect the cost.
[Figure 3: total communication cost (×10^4) for TF, MA, and AF; panels (a), (b), (c); x-axes: Number of Nodes, Cardinality, Dimension]
Fig. 3. Performance with anti-correlated data
Figure 4 reports the influence of the optimization strategies on algorithm performance for the independent and anti-correlated distributions. It shows that snooping (SNP) remarkably improves the efficiency of the algorithm under all circumstances. Because the shearing and compressing methods only apply to the grid filter, they appear only for the anti-correlated distribution. Figure 4(b) shows that both shearing (SHR) and compressing (CPS) are effective for anti-correlated distributions.

[Figure 4: total communication cost; panel (a) Independent (×10^3): AF, AF+SNP; panel (b) Anti-correlated (×10^4): AF, AF+SNP, AF+SHR+CPS, AF+SNP+SHR+CPS]
Fig. 4. Optimizations in computation module
Next, we study the performance of the algorithms in the skyline maintenance module. Figure 5 shows the influence of grid granularity on GF performance. Since a grid stays in use for a long time during maintenance, its computation and broadcast cost is shared among the time segments, so the optimum grid granularity changes. For the two different routing structures, the best grid granularity is 25 for the independent distribution and 15 for the anti-correlated distribution.
[Figure 5: total communication cost (×10^5) versus grid size; panels (a) Independent, (b) Anti-correlated]
Fig. 5. Effect of GF granularity in maintenance module
Figure 6 presents how the communication cost of each algorithm varies over time during maintenance. We can observe that the communication cost increases smoothly with time. As in the computation module, AF is the best under both the independent and anti-correlated distributions, for essentially the same reasons.
[Figure 6: total communication cost (×10^5) versus timestamp; panel (a) Independent: GF, MA, AF; panel (b) Anti-correlated: TF, MA, AF]
Fig. 6. Communication cost varying with time
Finally, we study the influence of the optimization strategies on algorithm performance in the maintenance module. Figure 7(a) shows that SNP further improves performance and reduces communication cost for independent distributions. Figure 7(b) shows that all optimizations work well for anti-correlated distributions.
[Figure 7: total communication cost (×10^5) versus timestamp; panels (a) Independent, (b) Anti-correlated]
Fig. 7. Optimizations in maintenance module
7 Conclusions
In most sensor networks, energy is a critical resource and is mainly consumed by communication. Minimizing the communication cost of applications in sensor networks is therefore an essential problem. In this paper, we introduce an energy-efficient algorithm called SWSMA to maintain the sliding window skyline of a sensor network. First, the merge, tuple filter, grid filter, and adaptive filter approaches are proposed to compute the initial skyline in a sensor network; then methods for maintaining the filters during continuous queries are discussed. Furthermore, some optimizations that improve the performance of SWSMA are presented. The experimental results show that SWSMA is an efficient method for computing and maintaining skylines in sensor networks.
Acknowledgement. This work is partially supported by the National Natural Science Foundation of China under grants No. 60573089 and 60473074, and by the Natural Science Foundation of Liaoning Province under grant No. 20052031.
References
1. D. J. Abadi, S. Madden, and W. Lindner: REED: Robust, Efficient Filtering and Event Detection in Sensor Networks. In Proc. of VLDB, 2005.
2. Boris Jan Bonfils and Philippe Bonnet: Adaptive and Decentralized Operator Placement for In-Network Query Processing. In Proc. of IPSN, 2003.
3. S. Börzsönyi, D. Kossmann, and K. Stocker: The skyline operator. In Proc. of ICDE, 2001.
4. W.-T. Balke, U. Güntzer, J. X. Zheng: Efficient distributed skylining for web information systems. In Proc. of EDBT, 2004.
5. J. Chomicki, P. Godfrey, J. Gryz, and D. Liang: Skyline with presorting. In Proc. of ICDE, 2003.
6. J. Considine, F. Li, G. Kollios, and J. Byers: Approximate aggregation techniques for sensor databases. In Proc. of ICDE, 2004.
7. Surajit Chaudhuri, Nilesh N. Dalvi, Raghav Kaushik: Robust Cardinality and Cost Estimation for Skyline Operator. In Proc. of ICDE, 2006.
8. Vishal Chowdhary, Himanshu Gupta: Communication-Efficient Implementation of Join in Sensor Networks. In Proc. of DASFAA, 2005.
9. Zhiyong Huang, Christian S. Jensen, Hua Lu, Beng Chin Ooi: Skyline Queries Against Mobile Lightweight Devices in MANETs. In Proc. of ICDE, 2006.
10. D. Kossmann, F. Ramsak, S. Rost: Shooting Stars in the Sky: An Online Algorithm for Skyline Queries. In Proc. of VLDB, 2002.
11. Eric Lo, Kevin Ip, King-Ip Lin, David Cheung: Progressive Skylining over Web-Accessible Database. DKE, 57(2): 122-147, 2006.
12. Xuemin Lin, Yidong Yuan, Wei Wang, Hongjun Lu: Stabbing the Sky: Efficient Skyline Computation over Sliding Windows. In Proc. of ICDE, 2005.
13. S. Madden, M. Franklin, J. Hellerstein, and W. Hong: The design of an acquisitional query processor for sensor networks. In Proc. of SIGMOD, 2003.
14. S. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong: TAG: A Tiny AGgregation Service for Ad-Hoc Sensor Networks. In Proc. of OSDI, 2002.
15. S. Madden et al.: Supporting aggregate queries over ad-hoc wireless sensor networks. In Proc. of WMCSA, 2002.
16. R. Oliver, K. Smettem, M. Kranz, K. Mayer: A Reactive Soil Moisture Sensor Network: Design and Field Evaluation. JDSN, 1: 149-162, 2005.
17. Aditi Pandit, Himanshu Gupta: Communication-Efficient Implementation of Range-Joins in Sensor Networks. In Proc. of DASFAA, 2006.
18. C. Palmer, P. Gibbons, and C. Faloutsos: ANF: A Fast and Scalable Tool for Data Mining in Massive Graphs. In Proc. of SIGKDD, 2002.
19. D. Papadias, Y. Tao, G. Fu, et al.: An Optimal and Progressive Algorithm for Skyline Queries. In Proc. of SIGMOD, 2003.
20. G. Pottie and W. Kaiser: Wireless integrated sensor networks. Communications of the ACM, 2000.
21. Bernard W. Silverman: Density Estimation for Statistics and Data Analysis. CRC Press, 1986.
22. K.-L. Tan, P.-K. Eng, and B. C. Ooi: Efficient progressive skyline computation. In Proc. of VLDB, 2001.
23. Yufei Tao, Dimitris Papadias: Maintaining Sliding Window Skylines on Data Streams. TKDE, 18(3): 377-391, 2006.
24. W. Xue, Q. Luo, L. Chen, and Y. Liu: Contour Map Matching For Event Detection in Sensor Networks. In Proc. of SIGMOD, 2006.
25. Y. Yao and Johannes Gehrke: The cougar approach to in-network query processing in sensor networks. SIGMOD Record, 31(3), 2002.
