
IEEE 8th International Conference on Computer and Information Technology Workshops

Secure Fault Tolerance in Wireless Sensor Networks


Ting Yuan, Shiyong Zhang
Department of Computing and Information Technology
Fudan University, Shanghai, 200433, P. R. China
{061021081, szhang}@fudan.edu.cn

Abstract
Fault tolerance provides wireless sensor networks (WSNs) with reliable collection and dissemination of data while conserving the limited resources of sensor nodes, especially energy. Although data redundancy achieves fault tolerance in the data-centric network infrastructure of WSNs, it also raises security concerns by making data available in multiple locations. This deserves particular attention when WSNs are deployed in hostile environments, where sensor nodes are easily captured and misused by an adversary. In this context, cryptographic keys are of little use for protecting data that is not involved in communication. In this paper, we propose a secure fault tolerance scheme for WSNs which uses secret sharing to checkpoint the state of the sink over multiple nodes. Through security analysis, we show that our scheme enhances resiliency against node capture in the presence of data redundancy.

1 Introduction
A wireless sensor network (WSN) typically consists of sensor nodes with very limited energy and limited computation, communication and storage capabilities. These nodes, usually hundreds of thousands in number, are deployed in many traditionally challenging applications [1], such as real-time traffic monitoring and military sensing and tracking. In these applications, data is sensed from the surroundings by sensor nodes, then aggregated to and stored in sink nodes, which may themselves be sensor nodes or other nodes with stronger capabilities and richer resources. The aggregated data is then forwarded to the end user, through a higher-level node or a base station, for further handling and analysis. These applications typically require reliable collection and dissemination of data that satisfies specific quality of service (QoS) constraints, such as a predictable delay time or maximal node availability. These requirements pose unique challenges for resource-limited WSNs, especially when they are deployed unattended in hostile environments such as battlefields or enemy territories, which makes them more susceptible to various kinds of failures than other wireless networks such as mobile ad hoc networks (MANETs).

Many fault tolerance schemes for WSNs [2, 3, 4, 5] have been proposed to guarantee network dependability by recovering from node failures. However, none of these schemes takes security into account when carrying out its fault tolerance functions. As node failures can result in data loss, data redundancy is thought to be essential to guarantee data dependability. Although data redundancy achieves fault tolerance of data, it also degrades security by making data available in multiple locations. The situation becomes worse when WSNs are deployed in hostile environments, where an adversary can easily capture sensor nodes to obtain their stored data. To protect data in such situations, cryptographic keys are used extensively, in collaboration with various key management schemes [6], to provide data confidentiality and authentication services. Although these keys and key management schemes can effectively protect data exchanged between sensor nodes, they are of little use for protecting data not involved in communication, i.e., for providing local data security. The reason is that, without tamper-resistant hardware, the keys are typically stored together with the data in the sensor nodes, so they are exposed along with the data if the nodes are captured by an adversary.

This work was supported by Grant No. 60672113 from the National Natural Science Foundation of China.
In order to provide local data security in the presence of data redundancy, we propose a secure fault tolerance scheme for WSNs which applies secure distributed data checkpointing and recovery while incurring only a small amount of security degradation compared to the situation without data redundancy. Our scheme focuses on the secure fault tolerance of the sink node, which is thought to be the node most closely associated with in-network data dependability.

The remainder of the paper is organized as follows. In Section 2, we give the system model of our scheme. In Section 3, we focus on the secure fault tolerance of our scheme. In Section 4, we give a comparative security analysis of our scheme. In Section 5, we conclude this paper.

978-0-7695-3242-4/08 $25.00 2008 IEEE. DOI 10.1109/CIT.2008.Workshops.26

2 System Model
In this section, we describe the system model for our secure fault tolerance scheme, together with the assumptions about the model which are used in the following sections.

We consider a network that is homogeneous in terms of node capabilities. All sensor nodes are identical in hardware and software configuration and thus have the same computation, communication and storage capabilities. Because of its inherently limited capabilities, a node is either operational, meaning that it can carry out its tasks in the current WSN application, or nonoperational, meaning that it has failed because of problems such as a system crash or power depletion. Nonoperational nodes are logically disconnected from the rest of the network. We assume that all task data is stored in the volatile storage of the nodes, since persistent storage is not always available on WSN platforms. Thus, when a node becomes nonoperational, all its task data is thought to be lost.

The role of aggregating the data and forwarding it to the end user is called sink. The functionality of the sink can be located in any suitable sensor node, which we call a sink node. The role of carrying out the sensing task and sending the sensed data to the sink is called source, and a node with this role is called a source node. These two roles are mutually exclusive: a node cannot be both a sink and a source, because this would drastically deplete its limited energy. A routing protocol is assumed to be present which can efficiently forward source messages from different sources to the sink through one or more wireless hops. The sink sends the aggregated messages to the user either periodically, say every $T$ seconds, or on demand when the user requests it.

A source message can be represented in the form $\langle w_0, w_1, \dots, w_{t-1} \rangle$, which we call a state vector of the sink, since the collected messages are part of the sink's state. Each weight $w_i$ in the vector represents one part of the message.
It is observed that many applications can use this kind of message format to represent their in-network data. In temperature monitoring, for example, we use $\langle ID, Time, Location, Degree \rangle$ to indicate a temperature reading of $Degree$ sensed at $Location$ at $Time$ by node $ID$. We assume that all the weights can be represented by integers in the range $[0, l-1]$ (e.g., $l = 2^{64}$).

In our scheme, we define three possible failure models. In the first model, a node becomes nonoperational due to power depletion. We assume that nodes in the network are battery-powered and know the percentage of the remaining power of their batteries. In the second model, messages are lost because of the unreliable wireless transmission characteristic of the network. In the third model, a node becomes nonoperational because of a system failure which the node itself cannot predict.

We assume that the sink consumes more energy than a source. We define two power percentage thresholds, the operational threshold $\theta_1$ and the sink threshold $\theta_2$, where $\theta_1 < \theta_2$. A node can be a sink only if its current remaining power percentage $\rho$ satisfies $\rho \ge \theta_2$. If $\rho < \theta_1$, the node becomes nonoperational and quits the current application. If $\theta_1 \le \rho < \theta_2$, the node cannot run the sink but can still be a source. A new sink is selected if necessary, and we call the movement between the old and new sinks sink mobility.

We also define an attack model against the sink. In this model, an adversary tries to obtain and manipulate the aggregated data by randomly capturing a group of one or more nodes, which may include the sink itself. When a node is captured, all its stored data becomes known to, and can be manipulated by, the adversary. We assume that some kind of key management scheme [6] is used to protect messages in transit throughout the network with cryptographic keys. Moreover, each node shares a secret key with the user, whom we regard as logically independent of the network. These security measures compose our scheme's cryptographic infrastructure.
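The two power thresholds partition a node's possible roles into three bands. A minimal sketch of that rule follows; the names `rho`, `theta1` and `theta2` (remaining power percentage, operational threshold and sink threshold), the `Role` enum and the threshold values in the usage note are illustrative choices, not taken from the paper.

```python
from enum import Enum


class Role(Enum):
    NONOPERATIONAL = "nonoperational"   # node quits the application
    SOURCE_ONLY = "source only"         # may sense, may not run the sink
    SINK_ELIGIBLE = "sink eligible"     # may be selected as the sink


def eligible_role(rho, theta1, theta2):
    """Map a remaining power percentage to the role band it allows.

    rho    -- current remaining power percentage of the node
    theta1 -- operational threshold
    theta2 -- sink threshold, which must be larger than theta1
    """
    if not theta1 < theta2:
        raise ValueError("the operational threshold must be below the sink threshold")
    if rho < theta1:
        return Role.NONOPERATIONAL
    if rho < theta2:
        return Role.SOURCE_ONLY
    return Role.SINK_ELIGIBLE
```

With hypothetical thresholds of 10 and 40 percent, a node at 25 percent could still act as a source but would have to hand over the sink role.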

3 Secure Fault Tolerance


In this section, we focus on the secure fault tolerance of the sink. Initially, a node whose remaining power percentage $\rho$ satisfies $\rho \ge \theta_2$ is selected by the user as the sink. When the current sink fails or its remaining power falls below $\theta_2$, a new node becomes the sink. In order to guarantee secure fault tolerance of the sink under the defined failure models, the state of the sink is periodically checkpointed by dividing it into state shares, which are then stored in a set of other nodes called state checkpoint (SC) nodes. The state shares are securely established such that any combination of up to a certain number of them exposes no information about the state. When the current sink $S$ fails, the state shares are transferred to the new sink $S'$, which resumes the application from the checkpointed state recovered from the state shares.

In Figure 1, we can see that $S$ checkpoints its state to a group of $m$ SC nodes, $SC_1$ through $SC_m$, which compose the state checkpoint group of $S$, before it fails. When $S$ fails, sink mobility happens: $S'$ takes the place of $S$ as the sink and recovers the checkpointed state from the state shares stored by the $m$ SC nodes. All the operational sources are notified of the sink mobility event and redirect their source messages from $S$ to $S'$.


[Figure 1: two panels showing the sink S and its state checkpoint nodes SC_1, ..., SC_j, ..., SC_m. (a) Checkpoint. (b) Sink mobility. Legend: sink node, state checkpoint node, source node, state checkpoint path, data path.]

Figure 1. The procedure overview of secure fault tolerance of the sink.

3.1 Distributed checkpointing

In the fault tolerance literature, checkpointing techniques have been extensively studied and applied to provide fault tolerance in computing systems [7, 8]. In our scheme, as presented in Figure 1, multiple nodes are used to store the checkpointed state of the sink. The checkpointed state is composed of multiple state shares, which are securely established in order to enhance the security of fault tolerance of the sink. To achieve this, we define a checkpointing polynomial $Q_c(x) = w_0 + w_1 x + w_2 x^2 + \dots + w_{t-1} x^{t-1}$ of degree $t-1$, where the coefficients $w_0, w_1, \dots, w_{t-1}$ are the weights of a state vector of the sink, so that there is one checkpointing polynomial for each state vector. We assume that the state of the sink contains the source messages and its remaining power percentage $\rho$.

Figure 2 presents the functionality of the sink $S$ in our scheme. The sink mobility agent (SMA) node $S_a$ is an SC node in the SC group of $S$ and is also responsible for carrying out the sink mobility procedure from $S$ to the new sink $S'$. $U$ and $C$, both matrices, hold the source messages received since the last data interaction with the user and the state shares produced since the last checkpointing procedure, respectively. In the matrix $C$, each column holds all the state shares of one source message. Thus, $C$ is $t \times n$ if $n$ messages have been added since the last checkpointing procedure. $SM_{i,j}$ and $SS_{i,j}$ refer to the $j$th source message from source $i$ and its state shares, respectively.

As presented in Figure 2, $S_a$ is selected from the SC group $\langle SC_1, SC_2, \dots, SC_t \rangle$ of $S$. Upon receiving a source message $SM_{i,j}: \langle w_0, w_1, \dots, w_{t-1} \rangle$ from $Source_i$, $S$ adds $SM_{i,j}$ to $U$. The weights $w_0, w_1, \dots, w_{t-1}$ are then used to construct its checkpointing polynomial $Q_c(x)$ for deriving state shares. Because the values of $x$ are chosen to be different, all the state shares of the message are distinct from each other, which allows the unique $Q_c(x)$ for the message to be recovered. These state shares are organized as $SS_{i,j}$, which is then added to $C$. Every $T_u$ time, or when the user requests data, the sink sends $U$ to the user. Every $T_c$ time, when the checkpointing procedure is launched, the remaining power percentage $\rho$ and the $k$th row of $C$, which contains the $k$th state shares of all the source messages received since the last checkpointing procedure, are sent in the CheckPoint control message (CM) to $SC_k$ for each $k$ in the SC group. $\rho$ is sent so that each member of the SC group can estimate the sink's current status.

    U = ∅, C = ∅;
    Select the sink mobility agent S_a ∈ {SC_1, ..., SC_t};
    do
        Measure the remaining power percentage ρ;
        if Receive(SM_{i,j} : <w_0, w_1, ..., w_{t-1}>, Source_i)
            Add SM_{i,j} to U;
            Randomly choose different x_1, x_2, ..., x_t;
            for each x_k in {x_1, x_2, ..., x_t}
                Q_c(x_k) = w_0 + w_1 x_k + ... + w_{t-1} x_k^{t-1};
            SS_{i,j} = <(x_1, Q_c(x_1)), ..., (x_t, Q_c(x_t))>^T;
            Add SS_{i,j} to C;
        if Receive(DataRequest, User) or Timeout(T_u)
            Send U to the end user, U = ∅;
        if Timeout(T_c)
            for each SC_k in {SC_1, SC_2, ..., SC_t}
                CM : <CheckPoint, ρ, kth row of C>;
                Send CM to SC_k, C = ∅;
        if ρ < θ_2
            CM : <SinkMobility, S, NULL>;
            Send CM to S_a;
    while ApplicationRunning

Figure 2. The functionality of the sink S.

According to our system model, when the sink fails all its task data is thought to be lost, so we implement distributed checkpointing by storing the state shares on other nodes, making the state more persistent and available. Moreover, our distributed checkpointing is perfectly $(t-1)$-secure, because a collection of fewer than $t$ state shares exposes nothing about the original state of the sink for a specific source message. In terms of security, this differs from a distributed checkpointing scheme that directly stores all $t$ weights on other nodes, which we call the insecure DCP scheme in comparison with ours: there, the exposure of the original state of a source message grows as more weights are collected. In our scheme, we also implement incremental rather than full checkpointing, sending only the message updates to both the user and the SC nodes, in order to reduce communication overhead, since communication resources are very limited in WSNs. When $\rho$ falls below the sink threshold $\theta_2$, $S$ informs $S_a$ of the sink mobility event with the SinkMobility control message, because the lack of battery power makes it unaffordable for $S$ to continue as the sink.
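The share derivation and the later recovery by a new sink can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper does not say which arithmetic the polynomial is evaluated in, so the sketch assumes a prime field whose modulus `P` exceeds $l = 2^{64}$, and the function names `make_state_shares` and `recover_state_vector` are hypothetical. Recovery uses Lagrange interpolation over the $t$ collected pairs.

```python
import secrets

# Assumed field modulus: the Mersenne prime 2**127 - 1, larger than l = 2**64.
P = 2**127 - 1


def make_state_shares(weights, p=P):
    """Evaluate the checkpointing polynomial at t distinct random points.

    weights -- the state vector [w_0, ..., w_{t-1}], each weight below p
    Returns a list of t pairs (x_k, Q_c(x_k)).
    """
    t = len(weights)
    xs = set()
    while len(xs) < t:                      # draw distinct nonzero points
        xs.add(secrets.randbelow(p - 1) + 1)
    shares = []
    for x in xs:
        y = 0
        for w in reversed(weights):         # Horner's rule, highest power first
            y = (y * x + w) % p
        shares.append((x, y))
    return shares


def recover_state_vector(shares, p=P):
    """Rebuild [w_0, ..., w_{t-1}] from t shares by Lagrange interpolation."""
    t = len(shares)
    coeffs = [0] * t
    for i, (xi, yi) in enumerate(shares):
        basis = [1]                         # coefficients of prod (x - x_j), low to high
        denom = 1
        for j, (xj, _) in enumerate(shares):
            if j == i:
                continue
            grown = [0] * (len(basis) + 1)
            for k, b in enumerate(basis):   # multiply basis by (x - x_j)
                grown[k] = (grown[k] - b * xj) % p
                grown[k + 1] = (grown[k + 1] + b) % p
            basis = grown
            denom = denom * (xi - xj) % p
        scale = yi * pow(denom, -1, p) % p  # modular inverse, Python 3.8+
        for k, b in enumerate(basis):
            coeffs[k] = (coeffs[k] + b * scale) % p
    return coeffs
```

In the scheme's terms, the $k$th pair of every message would travel to $SC_k$ as part of the $k$th row of $C$, and a new sink that gathers all $t$ pairs of a message can call `recover_state_vector` to obtain its weights again. With fewer than $t$ pairs the interpolation is not uniquely determined.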

3.2 Failure detection and recovery

In this section, we focus on how to detect the three possible failures described in the failure models and how to recover the sink functionality from these failures.

In the first failure model, the remaining power of the current sink $S$ first falls below its sink threshold $\theta_2$ before the battery runs out. According to Figure 2, when $\rho < \theta_2$, $S$ sends a SinkMobility message to the SMA node $S_a$, which then selects a new sink $S'$ to resume the sink functionality. When an SC node falls below its operational threshold $\theta_1$, it becomes nonoperational because of power depletion, so it quits the SC group and loses all its task data, including its part of the checkpointed state of $S$. Therefore, it is necessary to provide fault tolerance for all the SC nodes in this case. To address this, we use the scheme described in [2] to let each SC node have a fault tolerance path representing its mobility trace. We call the checkpoint node of an SC node its auxiliary checkpoint (AC) node. According to the model-based analysis in [2], having exactly one such checkpoint node turns out to be the optimal setting for the scheme. Adding more such nodes would only consume more energy without significant impact on fault tolerance. When the current SC node fails, its AC node replaces it as a member of the SC group of $S$.

In the second failure model, a message loss from $S$ can be detected by estimating the message arrival time on the side of each SC node. We know from Figure 2 that every $T_c$ time $S$ carries out the checkpointing procedure and sends state shares to the SC group. We assume that the maximum network delay is $D_m$, so that a delayed message eventually arrives at its destination within at most $D_m$ time. We define a timeout $T_d$ for suspecting that a message has been lost, such that $T_d > T_c + D_m$. As for the SC nodes themselves, their corresponding AC nodes are thought to guarantee the detection and recovery of message loss for them.

In the third failure model, $S$ may suffer unpredictable failures without any notification to its SC group. Since $\rho$ is periodically sent to the SC group, all the SC nodes can estimate with a high level of certainty whether $S$ has failed or has just suffered a message loss in the network, by using the chronologically stored values of $\rho$ and the $T_d$ timeout.

    U_i = ∅, C_i = ∅;
    do
        Measure the remaining power percentage ρ_i;
        if Receive(CM : <CheckPoint, ρ, data>, S)
            Add CheckPoint.data to both U_i and C_i;
        if Receive(CM : <SinkMobility, S, S'>, S_a)
            Send U_i to S', U_i = ∅;
        if Timeout(T_ac)
            CM : <CheckPoint, ρ_i, C_i>;
            Send CM to AC_i, C_i = ∅;
        if Timeout(T_d)
            Decide(S, ρ);
            if Decision result for S is Failure
                CM : <SinkMobility, S, NULL>;
                Send CM to S_a;
        if ρ_i < θ_1
            CM : <SelfMobility, SC_i, AC_i>;
            Send CM to AC_i;
    while ApplicationRunning

Figure 3. The functionality of the non-SMA member SC_i in the SC group of S.

Figure 3 presents the functionality of a non-SMA member $SC_i$ in the SC group of $S$. Upon receiving a CheckPoint message from $S$, $SC_i$ stores the message, which contains part of the checkpointed state of $S$, both locally and remotely in its AC node $AC_i$. Upon receiving a SinkMobility message from $S_a$ for transferring the sink from $S$ to $S'$, $SC_i$ sends its part of the checkpointed state to $S'$. The time $T_{ac}$ is the timeout for the checkpointing procedure of $SC_i$, which follows the fault tolerance steps described in [2]. When the timeout $T_d$ is triggered for a suspected message loss, $SC_i$ invokes the function Decide(), which takes $\rho$ as a parameter to estimate the cause of the timeout for $S$. If $SC_i$ concludes that $S$ has failed rather than suffered a message loss, it informs $S_a$ to carry out the sink mobility procedure. If $SC_i$ itself fails when $\rho_i < \theta_1$, $AC_i$ takes its place as the new $SC_i$.
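The timeout test on an SC node can be sketched as below. The constraint that the detection timeout exceeds the checkpoint period plus the maximum delay comes from the text; everything else is an assumption made for illustration. In particular, the paper does not specify how Decide() weighs the stored power readings, so the `decide` method here is a purely hypothetical rule: it extrapolates the last two readings linearly and reports a failure only if the projected power would already be below the operational threshold. The class and method names are likewise invented for this sketch.

```python
class SinkMonitor:
    """Tracks CheckPoint arrivals from the sink on one SC node (illustrative)."""

    def __init__(self, t_c, d_m, t_d, theta1):
        if t_d <= t_c + d_m:
            raise ValueError("the detection timeout must exceed T_c + D_m")
        self.t_d = t_d
        self.theta1 = theta1
        self.history = []                   # chronologically stored (time, rho) pairs

    def on_checkpoint(self, now, rho):
        """Record the power percentage carried by a CheckPoint message."""
        self.history.append((now, rho))

    def timed_out(self, now):
        """True when no CheckPoint has arrived for longer than the timeout."""
        return bool(self.history) and now - self.history[-1][0] > self.t_d

    def decide(self, now):
        """Hypothetical stand-in for Decide(): 'failure' or 'message_loss'."""
        if len(self.history) < 2:
            return "message_loss"           # too little evidence to suspect a failure
        (t0, r0), (t1, r1) = self.history[-2], self.history[-1]
        if t1 == t0:
            return "message_loss"
        drain = (r0 - r1) / (t1 - t0)       # power percentage lost per time unit
        projected = r1 - drain * (now - t1) # linear extrapolation to the present
        return "failure" if projected < self.theta1 else "message_loss"
```

A node would call `timed_out` on each tick and, once it returns true, call `decide`; a "failure" result corresponds to sending the SinkMobility message to the sink mobility agent. A rule of this kind only covers power-related failures, so a real Decide() for the third failure model would need additional evidence.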


    U_a = ∅, C_a = ∅;
    do
        Measure the remaining power percentage ρ_a;
        if Receive(CM : <CheckPoint, ρ, data>, S)
            Add CheckPoint.data to both U_a and C_a;
        if Receive(CM : <SinkMobility, S, NULL>, S or SC_i)
            Select a new sink S';
            CM : <SinkMobility, S, S'>;
            Send CM to all non-SMA SC nodes;
            Send U_a to S', U_a = ∅;
        if Timeout(T_ac)
            CM : <CheckPoint, ρ_a, C_a>;
            Send CM to the AC node of S_a, C_a = ∅;
        if Timeout(T_d)
            Decide(S, ρ);
        if ρ_a < θ_1
            CM : <SelfMobility, S_a, S_a's AC node>;
            Send CM to the AC node of S_a;
    while ApplicationRunning

Figure 4. The functionality of the sink mobility agent S_a.

Figure 4 presents the functionality of the sink mobility agent $S_a$. The procedure is much like that of the non-SMA $SC_i$ in Figure 3, except for the cases when a SinkMobility message is received and when $S_a$ concludes that $S$ has failed after the timeout $T_d$ is triggered. In these cases, $S_a$ selects a new sink $S'$, because $S$ can no longer run the sink, and informs all the non-SMA SC nodes to send their parts of the checkpointed state of $S$ to $S'$. Its own part of the checkpointed state is also sent, since $S_a$ itself is a member of the SC group of $S$.

4 Security Analysis
In this section, we give a security analysis of our fault tolerance scheme under the defined attack model. This attack model has been extensively applied in the WSN literature to evaluate the security strength of key management schemes [6]. Since communication links between sensor nodes are usually protected by cryptographic keys when security is taken into account, compromising nodes by directly capturing them is thought to be one of the most effective ways for the adversary to expose secret information. Moreover, tamper-resistant hardware is still economically unsuitable for low-cost sensor nodes, which makes node capture even more attractive. To evaluate the security of our scheme, we try to answer the following question: when a certain number of nodes are captured, how much can they influence the sink once their stored information has been retrieved and analyzed?

4.1 Resilience against node capture

In our attack model, an adversary intends to obtain the information, i.e., the state, stored in the sink in order to analyze the in-network data stream. The adversary can either directly capture the sink or indirectly capture certain members of its SC group to obtain the information. If this information is obtained without authorization by either means, we say that the sink is compromised. Since the sink and all the members of its SC group are indistinguishable from the other sensor nodes in appearance, we assume that the adversary randomly captures $c$ sensor nodes in the network.

As presented in Figures 2, 3 and 4, the number $s$ of state shares of a source message and the number $m$ of members of the SC group of the sink are both fixed to $t$, the number of coefficients of $Q_c(x)$. As for $s$, it cannot be less than $t$ if the new sink is to recover the unique $Q_c(x)$ for each state vector when the current sink fails. Although fault tolerance may be enhanced if $s > t$ state shares are stored, security strength would decrease, since the adversary would be more likely to obtain enough state shares ($\ge t$) to recover the corresponding polynomials $Q_c(x)$. As for $m$, it seems optimal for an SC node to store exactly one state share for each source message. Either storing more state shares per source message in a node or letting $m > t$ would decrease security strength, although letting $m > t$ along with $s > t$ would actually enhance fault tolerance by providing a larger SC group for the sink.

We now consider the possibility, $P(c)$, of compromising the sink when the adversary randomly captures $c$ sensor nodes in a network of size $n$. If the sink happens to be captured by the adversary, it is obviously compromised. If the sink survives the node capture, the adversary has to capture at least $tm/s$ SC nodes to compromise the sink. Figure 5 presents $P(c)$ in terms of $c$. We do not include the influence of the underlying cryptographic infrastructure defined earlier, since it does not affect our security analysis. As presented in Figure 5.a, the setting that derives the least required number of state shares for a given $t$ and stores exactly one state share in an SC node for each source message ($m = s = t$) turns out to be the most secure setting for our scheme.
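The text describes the two ways of compromising the sink but does not give a closed form for $P(c)$. One plausible reading can be sketched as follows, assuming that the $c$ captured nodes are drawn uniformly at random without replacement, that each SC node holds $s/m$ shares of every message, and that the required number of SC nodes is rounded up to $\lceil tm/s \rceil$. Under these assumptions the number of captured SC nodes follows a hypergeometric distribution. The function name and the formula are a sketch, not the authors' evaluation code, and the values it produces are not claimed to match Figure 5.

```python
from math import comb


def compromise_probability(c, n, m, t, s):
    """Probability that the sink is compromised when c of n nodes are captured.

    Assumes uniform random capture without replacement and a threshold of
    ceil(t*m/s) captured SC nodes once the sink itself has escaped capture.
    """
    if not 0 <= c <= n or not 0 < m < n:
        raise ValueError("need 0 <= c <= n and 0 < m < n")
    if c == n:
        return 1.0                          # every node, including the sink, is captured
    k = -(-t * m // s)                      # integer ceiling of t*m/s
    others = n - 1                          # nodes other than the sink
    # Given that the sink survives, all c captured nodes are among the others.
    favourable = sum(
        comb(m, i) * comb(others - m, c - i)
        for i in range(k, min(m, c) + 1)
    )
    p_enough_sc = favourable / comb(others, c)
    p_sink = c / n                          # chance that the sink itself is captured
    return p_sink + (1 - p_sink) * p_enough_sc
```

Calling `compromise_probability(c, 1000, 4, 4, 4)` over a range of `c` gives one curve for the setting $m = s = t = 4$ in a network of 1000 nodes; changing `m` and `s` shows how requiring fewer captured SC nodes raises the probability.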

4.2 Security comparison

In this section, we give a security comparison among our scheme, the insecure DCP scheme and the scheme presented in [2], which we call the NSS scheme. We also give the result of the non-fault-tolerance (NOFT) version of the NSS scheme to establish a comparison benchmark. We assume that the other three schemes are all deployed with the same cryptographic infrastructure as our scheme. We also use $m$ to represent the number of checkpoint sensors of the sink in the NSS scheme, which is fixed to 1; this should not be confused with $m$ in our scheme. Our comparison results are also shown in Figure 5. As presented in Figure 5.b, the results of the insecure DCP scheme and the NSS scheme are indistinguishable. All the settings incur security degradation compared to the benchmark NOFT scheme when fault tolerance is introduced. It can be inferred from the analysis that, with properly configured parameters (i.e., $t$, $s$ and $m$), our scheme exhibits better robustness against node capture than the insecure DCP scheme and the NSS scheme.

[Figure 5: two plots of the possibility of compromising the sink, P(c), from 0 to 1, against the number of nodes captured, c, from 0 to 1000. (a) Relationship between m, t and s, with curves for m = 4, t = 4, s = 4; m = 8, t = 4, s = 8; m = 2, t = 4, s = 4; m = 2, t = 4, s = 8. (b) Comparison with other fault tolerance schemes, with curves for NOFT, NSS scheme (benchmark); m = 4, t = 4, s = 4; m = 1, NSS scheme; t = 4, insecure DCP.]

Figure 5. The possibility, P(c), of compromising the sink when c sensor nodes are captured. The network size is assumed to be 1000.

5 Conclusion and Future Work
In this paper, we propose a secure fault tolerance scheme for wireless sensor networks. As data redundancy is essential to guarantee fault tolerance of the sink, our scheme manages to bridge the security gap between itself and the situation when fault tolerance is unavailable. We also compare our scheme with the NSS scheme presented in [2], which also applies data checkpointing and recovery to tolerate sink failures. In the future, we will focus on addressing tradeoffs between security, performance and dependability as well as other related parameters. Detailed comparisons with other fault tolerance schemes in terms of performance will be conducted, and other security means will also be explored in order to provide more secure fault tolerance schemes.

References
[1] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam and E. Cayirci. A Survey on Sensor Networks. In IEEE Communications Magazine, 40(8): 102-114, Aug. 2002.
[2] I. Saleh, A. Agbaria and M. Eltoweissy. In-Network Fault Tolerance in Networked Sensor Systems. In Proceedings of the 2006 Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS '06), pp. 47-54, Los Angeles, CA, USA, Sep. 2006.
[3] F. Koushanfar, M. Potkonjak and A. Sangiovanni-Vincentelli. Fault Tolerance in Wireless Ad-Hoc Sensor Networks. In IEEE Sensors, pp. 1491-1496, Jun. 2002.
[4] M. Ishizuka and M. Aida. Performance Study of Node Placement in Sensor Networks. In Proceedings of the 24th International Conference on Distributed Computing Systems Workshops, pp. 598-603, 2004.
[5] H. W. Liu and C. D. Mu. An Efficient Algorithm for Fault Tolerance in Multisensor Networks. In International Conference on Machine Learning and Cybernetics, vol. 2, pp. 1258-1262, Aug. 2004.
[6] Y. Xiao, V. K. Rayi, B. Sun, X. Du, F. Hu and M. Galloway. A Survey of Key Management Schemes in Wireless Sensor Networks. In Computer Communications, 30(11-12): 2314-2341, Sep. 2007.
[7] E. N. Elnozahy, L. Alvisi, Y. M. Wang and D. B. Johnson. A Survey of Rollback-Recovery Protocols in Message-Passing Systems. ACM Computing Surveys, 34(3): 375-408, Sep. 2002.
[8] J. S. Plank. An Overview of Checkpointing in Uniprocessor and Distributed Systems, Focusing on Implementation and Performance. Technical Report UT-CS-97-372, Department of Computer Science, University of Tennessee, Jul. 1997.
