
Distributed Outsourcing of Computation on Private Data

A Distributed Implementation of the Certified Information Access Service

Carlo Blundo

Emiliano De Cristofaro

Aniello Del Sorbo

Clemente Galdi

Giuseppe Persiano

Abstract. In this paper we consider the problem of securely outsourcing computation on private data. We present a protocol for securely distributing the computation of the data structures used by current implementations of the Certified Information Access primitive. To this aim, we introduce the concept of a Verifiable Deterministic Envelope, which may be of independent interest and of which we provide practical implementations based on standard cryptographic assumptions. We further discuss a prototype implementation of our proposal based on the PUB-WCL framework developed at the University of Paderborn for sharing computation across a network.

1 Introduction

The advances in communication technology of the last decades have made possible the emergence of global-scale computer networks. Networks of such a large scale are currently used to share information for fast dissemination and efficient access. It has been observed that most of the nodes in a global-size network are idle most of the time, and it is very tempting to use the idle cycles of this huge computer to our advantage. The typical scenario that we envision in this paper consists of a user that outsources its computation to the network. Several researchers have considered the challenging issues related to load balancing and service discovery that naturally arise in this context. In this paper, instead, we concentrate on the security issues related to the scenario outlined above and, more specifically, on the security issues related to outsourcing computation on private data.

Sharing computation across the network has been successfully achieved for several computation-intensive tasks like number-theoretic computation [3] or the search for signs of extraterrestrial intelligence [5]. We notice that in both cases the inputs to the shared computation are public and security issues are trivial or non-existent. In other applications in which the outsourced computation is to be carried out on public data, most of the security issues are directly related to the problem of user authentication: is the user entitled to use the network for his own computation?
This work was partially supported by the European Union under IST FET Integrated Project AEOLUS (IST-015964).
Dip. di Informatica ed App., Università di Salerno, 84084 Fisciano (SA), Italy. blundo@dia.unisa.it.
Dip. di Informatica ed App., Università di Salerno, 84084 Fisciano (SA), Italy. emidec@dia.unisa.it.
Dip. di Informatica ed App., Università di Salerno, 84084 Fisciano (SA), Italy. anidel@dia.unisa.it.
Dip. di Scienze Fisiche, Università degli Studi di Napoli Federico II, 80126 Napoli, Italy. galdi@na.infn.it.
Dip. di Informatica ed App., Università di Salerno, 84084 Fisciano (SA), Italy. giuper@dia.unisa.it.

In some cases, though, one wants to share computation that is to be performed on private data. A typical example is when the user wants to share the computation associated with a cryptographic algorithm. For example, in [9], the authors showed how a computationally weak device, such as a mobile device, could be helped by faster devices in computing the encryption of a message. In this case the input of the computation, that is the cleartext, is obviously private, for otherwise there would be no need for encryption in the first place. Distributing computation on private data is thus more problematic. On one side, the user wants to harness the computational power of the network to speed up his computation; on the other side, he does not want to trust the network with his private data.

In this paper, we show that it is possible to securely share the computation associated with the construction of a data structure to be used for Certified Information Access. To do so, we introduce a new cryptographic primitive, Verifiable Deterministic Envelopes, and show how to use it to securely share the construction of the data structures needed for Certified Information Access. We then give a very efficient implementation of Deterministic Envelopes. We also describe an implementation based on the PUBWCL framework developed at the University of Paderborn.

Certified access to a database. Certified Information Access (CIA for short) is a cryptographic primitive introduced in [9] that provides certified access to a database. Specifically, the database owner publishes a snapshot of its current database, which we refer to as the Public Information, on a trusted entity. Once the public information is available, any user may issue queries to the database. The database owner gives the user the query result along with a short proof that the result is consistent with the published snapshot. CIA is particularly suited for slowly changing databases like the Domain Name System.
In this case a query returns the IP address associated with a name. One trivial way of implementing CIA is to publish the whole content of the database on a trusted server. A user can then compare the received reply with the one contained in the public copy of the database (or simply query the trusted server directly). This solution is, of course, neither secure nor efficient. Since the database owner has to publish the whole content, the confidentiality of the information it contains is compromised. Furthermore, since all the elements in the database need to be transmitted and stored, the communication and space complexity of this solution is linear in the size of the database. For these reasons the public information should satisfy the following properties:

Compactness: The size of the public information should be smaller than the size of the database.
Confidentiality: The public information should not reveal anything about the actual content of the database.
Correctness: A correct answer to a query should be consistent with the public information with probability one.
Soundness: Any wrong answer to a query will be detected with high probability.

Currently, one way of implementing CIA is by means of a new cryptographic primitive, namely mercurial commitments, introduced and studied in [12, 8, 7, 11]. Unfortunately, implementations of this primitive are computationally intensive. On one hand, the generation of the public information is time-consuming even on current servers. On the other hand, although the verification procedure can easily be executed on a PC in a few seconds, it still requires much more time on mobile devices.

In [9] the authors provide a distributed implementation of a CIA service. In that implementation, the database owner builds a tree representing the database. Each node of the tree is associated with the computation of modular exponentiations. The system presented in the above paper uses pre-computation peers to securely distribute, for each node in the tree, the computation of the modular exponentiations. The database owner is simply required to locally combine the partial results obtained by the peers. Pre-computation peers are only required to precompute elements of the form $g^r \bmod p$, where $g$ and $p$ are fixed elements and $r$ is randomly chosen by the peer.

Our Contribution. In this paper we present an efficient distributed implementation of a CIA service. Our implementation allows the database owner to completely outsource the tree computation. Clearly, the information in the database needs to be hidden before it is transferred. Unfortunately, as will become clear in the next sections, simple encryption schemes (whether randomized or deterministic) cannot be used. To this aim, we first present a new primitive, which we call Verifiable Deterministic Envelope (or VDE for short), that allows the encryption of all the elements in the database. The encrypted version of the database is then transferred to a web-cluster that computes the tree associated with the database and sends it back to the database owner. In this scenario, the database owner still needs to locally compute an encryption of each element in the database.

As in [9] we describe a solution for static databases, i.e., databases in which the content does not change. This is however without loss of generality, as our main target is slowly changing databases, for which a new snapshot is published every time the database is updated.
For rapidly changing databases, one could use the construction of [11], which is however not very practical.

We assume that there exists a trusted entity that does not collude with the entity holding the database. The trusted party is only required to generate some of the public parameters for the scheme and to store the database snapshot. Furthermore, the system comprises a sufficient number of peers whose only role is to compute modular exponentiations. We assume that such peers are honest, that is, they properly execute the protocols, but a small fraction of them may be curious, in the sense that they may collude in order to infer additional information from the messages they have exchanged.

This paper is organized as follows. In Section 2 we describe the cryptographic primitives used to implement our prototype. In Section 3 we describe a new primitive, which we call the Verifiable Deterministic Envelope. In Section 4 we show how the new primitive can be used to securely outsource computation. In Section 5 we report the design principles for a distributed architecture implementing a CIA service.

2 Cryptographic Background

In this section we describe the cryptographic primitives we will use in this work.

One-Way Trapdoor Permutations. A trapdoor permutation family over $D$ consists of the following triple of algorithms: (PermGen, Permute, Invert). The randomized algorithm PermGen, on input a security parameter, outputs the description $f$ of a permutation along with the corresponding trapdoor $t_f$. The algorithm Permute, given the permutation description $f$ and a value $x \in D$, outputs $y \in D$, the image of $x$ under the permutation $f$. The inversion algorithm Invert, given the permutation description $f$, the trapdoor $t_f$, and a value $y \in D$, outputs the preimage of $y$ under the permutation $f$. When it engenders no ambiguity, we will write $f(x)$ instead of Permute$(f, x)$ and $f^{-1}(y)$ instead of Invert$(f, t_f, y)$. We require that Permute$(f, \cdot)$ be a permutation of $D$ for all possible $(f, t_f)$ output by PermGen, and that Invert$(f, t_f, \mathrm{Permute}(f, x)) = x$ hold for all $(f, t_f)$ output by PermGen and for all $x \in D$. Let $(f, t_f) \leftarrow \mathrm{PermGen}(1^k)$ and $y = f(x)$. A trapdoor permutation family is said to be one-way if, given $y$ and $f$, no polynomial-time algorithm can compute $x$ with non-negligible probability.

Deterministic Signatures. A deterministic signature scheme is a triple of algorithms S = (KeyGen, Sign, Ver). The algorithm KeyGen is a randomized algorithm that, on input a security parameter, outputs a pair $(sk, vk)$, consisting of a signing key $sk$ and a verification key $vk$. The algorithm Sign is a deterministic algorithm that takes as input a message $x$ and the signing key $sk$ and outputs a signature $\sigma$ for $x$. The verification algorithm Ver is a (possibly randomized) algorithm that takes as input a message $x$, a signature $\sigma$, and a verification key $vk$ and outputs 1 if and only if $\sigma = \mathrm{Sign}(x, sk)$. A signature scheme is secure against selective forgery attacks if there exists no (probabilistic) polynomial-time algorithm that, on input a message $x$ and the verification key $vk$, outputs a valid signature for $x$ with non-negligible probability.

Mercurial Commitments. A Mercurial Commitment scheme, studied in [12, 8, 7, 11], allows two possible ways of creating a commitment. A hard commitment $h$ to a message $m$, corresponding to a classical commitment, meets both the hiding and binding properties. In other words, the hard commitment $h$ for a message $m$ does not reveal any information on $m$. At the same time, $h$ cannot be opened to a value $m' \neq m$.
A soft commitment does not take any message as input and cannot be opened the way hard commitments are. On the other hand, soft commitments are not binding, that is, there exists a special tease operation that allows one to open a soft commitment to any message $m$. A hard commitment can be opened and teased only to the original message $m$. A feature of Mercurial Commitment schemes is that hard and soft commitments are indistinguishable. Thus, given a mercurial commitment, it is not possible to determine a priori whether it is a hard or a soft one. More formally, a Mercurial Commitment scheme consists of the following tuple MC = (Setup, Commit, VerifyOpen, SoftCommit, VerifyTease) of possibly randomized algorithms.

We now show how we implement Mercurial Commitment schemes. Our implementation relies on the hardness of the discrete logarithm in cyclic groups and is based on [8].

The Setup procedure consists in picking a random prime $p$ and two generators $g, h$ of the cyclic group $Z_p^*$. All operations are to be considered in the group $Z_p^*$ unless otherwise specified.

The Commit procedure takes as input the string $m$ and the public parameters $(p, g, h)$ and computes com and dec as follows: randomly pick $r_0, r_1 \in Z_p^*$ and set $com = (g^m (h^{r_1})^{r_0}, h^{r_1})$ and $dec = (r_0, r_1)$.

The VerifyOpen procedure takes as input the public parameters $(p, g, h)$, a message $m$, a commitment $com = (C_0, C_1)$ and a decommitment key $dec = (r_0, r_1)$ and consists in checking whether $C_0 = g^m C_1^{r_0}$ and $C_1 = h^{r_1}$.

The SoftCommit procedure takes as input the public parameters $(p, g, h)$ and computes Scom and Sdec as follows: randomly pick $r_0, r_1 \in Z_p^*$ and set $Scom = (g^{r_0}, g^{r_1})$ and $Sdec = (r_0, r_1)$.

As stated above, the Tease procedure can be applied to both hard and soft commitments. In the former case, the Tease procedure takes as input a hard commitment $com = (g^m (h^{r_1})^{r_0}, h^{r_1})$, the decommitment key $dec = (r_0, r_1)$ and the string $m$ and simply returns $\tau = r_0$. In the latter case, the procedure takes as input a soft commitment $Scom = (g^{r_0}, g^{r_1})$, the teasing key $Sdec = (r_0, r_1)$ and the string $m$ and returns $\tau = (r_0 - m)/r_1 \pmod{p - 1}$.

The VerifyTease procedure takes as input the public parameters $(p, g, h)$, a commitment $(C_0, C_1)$, a string $m$ and a teasing $\tau$ of the commitment to $m$, and consists in checking whether $C_0 = g^m C_1^{\tau}$. Correctness and security of this scheme have been shown in [8].

Certified Information Access. In the context of secure databases, an implementation of certified information access has to provide the users with a database service in which each answer to a query consists of the actual query result and a proof that such information is indeed the actual content of the database. The verification of the proof can be accomplished by using some public information that the database provided before the query was issued. Such public information should not reveal anything about the actual content of the database.

In a CIA system we identify three parties: the CertifiedDBOwner, the User and the PubInfoStorage. In a setup phase, the PubInfoStorage generates the public parameters that will be used for the CIA service. The PubInfoStorage is assumed not to collude with the CertifiedDBOwner. The CertifiedDBOwner, based on the public parameters and the content of the database, produces the public information that is then sent to the PubInfoStorage. Whenever a User makes a query to the CertifiedDBOwner, he obtains an object that contains the answer to the query and some information that can be used, along with the information held by the PubInfoStorage, to prove that the answer is indeed correct and that the CertifiedDBOwner has not cheated.
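The discrete-logarithm-based mercurial commitment procedures described earlier in this section (Commit, VerifyOpen, SoftCommit, Tease, VerifyTease) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the prototype's code: the toy prime P = 1019, the elements G = 4 and H = 9, and all function names are chosen for the example only. To keep the division by r1 always well defined, the sketch works in the subgroup of prime order Q = 509 and reduces exponents modulo Q, whereas the text reduces them modulo p - 1.

```python
import secrets

# Toy parameters (assumptions for illustration, far too small to be secure).
P = 1019        # safe prime, P = 2*Q + 1
Q = 509         # prime order of the subgroup of squares mod P
G, H = 4, 9     # squares mod P, hence generators of the order-Q subgroup

def commit(m):
    """Hard commitment: com = (g^m * (h^r1)^r0, h^r1), dec = (r0, r1)."""
    r0 = secrets.randbelow(Q)
    r1 = 1 + secrets.randbelow(Q - 1)          # non-zero exponent
    c1 = pow(H, r1, P)
    c0 = pow(G, m, P) * pow(c1, r0, P) % P
    return (c0, c1), (r0, r1)

def verify_open(m, com, dec):
    """Check C0 = g^m * C1^r0 and C1 = h^r1."""
    (c0, c1), (r0, r1) = com, dec
    return c0 == pow(G, m, P) * pow(c1, r0, P) % P and c1 == pow(H, r1, P)

def soft_commit():
    """Soft commitment: Scom = (g^r0, g^r1), Sdec = (r0, r1)."""
    r0 = secrets.randbelow(Q)
    r1 = 1 + secrets.randbelow(Q - 1)          # must be invertible mod Q
    return (pow(G, r0, P), pow(G, r1, P)), (r0, r1)

def tease_hard(dec):
    """Teasing a hard commitment just reveals r0."""
    return dec[0]

def tease_soft(m, sdec):
    """Teasing a soft commitment to any m: tau = (r0 - m) / r1."""
    r0, r1 = sdec
    return (r0 - m) * pow(r1, -1, Q) % Q       # modular inverse (Python 3.8+)

def verify_tease(m, com, tau):
    """Check C0 = g^m * C1^tau."""
    c0, c1 = com
    return c0 == pow(G, m, P) * pow(c1, tau, P) % P
```

A hard commitment to 42 opens and teases only to 42, while a single soft commitment can be teased to any message, which is what the tree construction exploits for empty subtrees.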
Certified Information Access via Mercurial Commitments. In this section we describe how to implement the CIA functionality based on Mercurial Commitments. This description resembles the one in [8]. We consider a simple database $D$ associating to a key $x$ a value $D(x) = v$. Let us assume that all keys have the same length $\ell$, i.e., $x \in \{0, 1\}^{\ell}$. A reasonable choice is $\ell = 60$: it allows a large key space and, at the same time, hash functions can be used to reduce the key size to this small value. The database $D$ can thus be represented by a height-$\ell$ binary tree in which the leaf identified by the binary representation of $x$ contains the value $v = D(x)$. If no value is associated by the database to key $x$, the leaf numbered $x$ contains the special value $\bot$.

A first solution to the CIA problem could be to construct a binary tree as follows: leaves of the tree contain classical commitments to the elements of the database, and each internal node of the tree contains the classical commitment to the contents of its two children. The classical commitment contained in the root of such a tree constitutes the public information that is sent to the PubInfoStorage. To answer a query about $x$, the CertifiedDBOwner simply decommits the corresponding leaf and provides the authenticating path (along with all the decommitments) to the root. The problem with this approach is that it requires time exponential in the height of the tree: if we choose $\ell = 60$, then $2^{61} - 1$ commitments need to be computed.

This is where Mercurial Commitments help. Observe that the exponential-size tree might have large empty subtrees (that is, subtrees where each leaf is a commitment to $\bot$). Hence, instead of actually computing such a subtree ahead of time, the CertifiedDBOwner forms the root of this subtree as a soft commitment and does not compute anything for the rest of the tree. Thus, the size of the tree is reduced from $2^{\ell+1} - 1$ to at most $2\ell|D|$, where $|D|$ represents the number of elements in the database.

Answering a query about $x$ such that $D(x) \neq \bot$ is still done in the same way, i.e., the CertifiedDBOwner decommits the leaf corresponding to $x$ along with all the commitments on the path from the root of the tree to the leaf. If instead $D(x) = \bot$, the CertifiedDBOwner teases the path from the root to $x$. More precisely, the path from the root to $x$ will consist of hard commitments until the root $R$ of the empty subtree containing $x$ is encountered. All hard commitments from the root of the tree to $R$ are teased to their real values (recall that hard commitments can be teased only to their real value). Then, the CertifiedDBOwner generates a path of soft commitments from $R$ to (the leaf with number) $x$, ending with the commitment to $\bot$. Each soft commitment corresponding to a node along the path is teased to the soft commitments corresponding to its two children. The User simply needs to verify that each teasing has been correctly computed.

We stress that for positive queries (that is, queries for $x$ such that $D(x) \neq \bot$) the User expects to see openings of hard commitments, whereas for negative queries (that is, queries for $x$ such that $D(x) = \bot$) the User expects to see teasings of commitments; some of them will be hard commitments and some will be soft commitments, but the User cannot tell which ones are which. Due to space limitations, we describe the above procedures in detail in Appendix A.
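To make the size reduction concrete, the following sketch counts which nodes of the height-$\ell$ tree actually have to be materialized: the nodes on root-to-leaf paths of existing keys (hard commitments) and their siblings that root empty subtrees (soft commitments). It is only an illustration; the function name and the representation of nodes as bit-string prefixes are assumptions, and no commitments are computed. Under this counting, which includes the root, the total is at most $2\ell|D| + 1$.

```python
def pruned_tree_nodes(keys, ell):
    """Return (hard, soft): node prefixes that must be computed.

    Nodes are identified by the bit-string prefix leading to them;
    the root is the empty string and leaves have length `ell`.
    """
    hard = {""}                                  # the root is always present
    for key in keys:
        bits = format(key, f"0{ell}b")           # binary representation of the key
        for depth in range(1, ell + 1):
            hard.add(bits[:depth])               # every node on the path to the leaf

    soft = set()
    for node in hard:
        if node:                                 # the root has no sibling
            sibling = node[:-1] + ("1" if node[-1] == "0" else "0")
            if sibling not in hard:
                soft.add(sibling)                # root of an empty subtree
    return hard, soft


# Three keys in a height-4 tree: 15 nodes instead of 2**5 - 1 = 31.
hard, soft = pruned_tree_nodes([0b0000, 0b0001, 0b1000], ell=4)
total = len(hard) + len(soft)
```

With $\ell = 60$ the full tree is out of reach, while the pruned tree grows only linearly with the number of stored keys.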

3 Verifiable Deterministic Envelopes (VDE)

In this section we introduce a new primitive, the Verifiable Deterministic Envelope, or VDE for short, which we will use in our implementation.

3.1 VDE: Definition

Definition 1 (Verifiable Deterministic Envelope). A Verifiable Deterministic Envelope scheme (VDE for short) is a triple (EnvKeyGen, EnvelopeGen, EnvelopeOpen) of algorithms, involving a sender S and a receiver R, described as follows:

A randomized key generation algorithm EnvKeyGen executed by the sender. The algorithm EnvKeyGen generates a pair $(pk, sk)$. The value $pk$ is made public while $sk$ is kept secret by the sender.

A deterministic envelope generation algorithm EnvelopeGen executed by the sender. The algorithm EnvelopeGen, on input a string $x$ and the pair $(pk, sk)$, computes a VDE for $x$. In order to simplify the notation, we will denote by EnvelopeGen$(x)$ the VDE EnvelopeGen$(x, (pk, sk))$.

An interactive opening protocol EnvelopeOpen executed by both the sender and the receiver. The protocol is started by the receiver, who provides a string $x$. At the end of the protocol, the receiver is able to compute the (correct) value of EnvelopeGen$(x)$.

The following is a security definition for VDE:

Definition 2. A VDE scheme (EnvKeyGen, EnvelopeGen, EnvelopeOpen) is OW-secure if the following holds:

a. For any $x$, given $pk$ and $sk$, the sender can compute EnvelopeGen$(x)$ in polynomial time.
b. For any $x$, given $pk$, the receiver R, by executing the opening protocol EnvelopeOpen, can compute in polynomial time the (correct) value $y$ = EnvelopeGen$(x)$.
c. For any $x$, given $pk$, there exists no polynomial-time algorithm that computes $y$ = EnvelopeGen$(x)$ with non-negligible probability.
d. For any VDE $y$ = EnvelopeGen$(x)$, given $pk$, there exists no polynomial-time algorithm that computes $x$ with non-negligible probability.

3.2 A general construction

In the following we describe a general construction for a VDE scheme based on one-way trapdoor permutations.

Definition 3 (A VDE Scheme). Let (PermGen, Permute, Invert) be a one-way trapdoor permutation family. The proposed VDE consists of the following algorithms:

EnvKeyGen(): The algorithm takes as input $1^k$, where $k$ is a security parameter, and outputs the pair $(pk, sk) = ((\ell, (f, g)), (t_f, t_g))$, where $\ell$ is an integer and $(f, t_f) \leftarrow \mathrm{PermGen}(1^k)$ and $(g, t_g) \leftarrow \mathrm{PermGen}(1^k)$ are one-way trapdoor permutations. The sender publishes the pair $(f, g)$ while $(t_f, t_g)$ is kept secret.

EnvelopeGen$(\cdot, \cdot)$: The algorithm is executed by the sender. It takes as input a value $x \in \{0, 1\}^{\ell}$ and the pair $(pk, sk)$ and outputs $e$ = EnvelopeGen$(x, (pk, sk)) = f(g^{-1}(x))$. The value $e$ is sent to the receiver.

EnvelopeOpen: An interactive protocol executed by the sender and the receiver. Let $x \in \{0, 1\}^{\ell}$. The opening protocol works as follows: the receiver R sends $x$ to S; the sender S computes $w = g^{-1}(x)$ and sends $w$ to R; the receiver checks whether $g(w) = x$ and computes EnvelopeGen$(x) = f(w)$.

The security of the presented scheme is established by the following lemmas, whose proofs are reported in Appendix C.

Lemma 1. If $f$ and $g$ are one-way trapdoor permutations, then the Verifiable Deterministic Envelope scheme described above satisfies the following property: for any message $x$, there exists no polynomial-time algorithm $A$ that, on input $x$, $f$ and $g$, computes $y$ = EnvelopeGen$(x)$ with non-negligible probability, where the probability is taken over the possible choices of $x$ and the random coins tossed by $A$.

Lemma 2. If $f$ and $g$ are one-way trapdoor permutations, then the Verifiable Deterministic Envelope scheme described above satisfies the following property: for any verifiable envelope $y$ = EnvelopeGen$(x) = f(g^{-1}(x))$, there exists no polynomial-time algorithm $A$ that, on input $y$, $f$ and $g$, computes $x$ with non-negligible probability, where the probability is taken over the possible choices of $y$ and the random coins tossed by $A$.
Given the above lemmas, we can derive the following:

Theorem 3. If $f$ and $g$ are one-way trapdoor permutations, then the Verifiable Deterministic Envelope scheme described above is OW-secure.

Proof. Statement (a) can be derived easily, while statement (b) follows immediately from the fact that $f$ and $g$ are permutations. Finally, statements (c) and (d) have been proven in Lemmas 1 and 2.

3.3 VDE: A practical instantiation

In this section we describe a practical instantiation of the VDE primitive we have introduced in the previous section. In our system we use the following implementation of the VDE: the function $f$ is implemented as a deterministic RSA encryption, and the function $g$ is implemented as a deterministic RSA signature. The different phases of the VDE are implemented as follows:

EnvKeyGen(): The algorithm EnvKeyGen outputs the tuple $((n_E, e_E, d_E), (n_S, e_S, d_S))$, where $(n_E, e_E, d_E)$ are the parameters for the RSA encryption scheme and $(n_S, e_S, d_S)$ are the parameters for the RSA signature scheme. The parameters $pk = ((n_E, e_E), (n_S, e_S))$ are published while $sk = (d_E, d_S)$ is kept secret.

EnvelopeGen$(x, (pk, sk))$: Given a value $x$, the sender computes $e = E(\mathrm{Sig}(x, d_S), e_E) = (x^{d_S} \bmod n_S)^{e_E} \bmod n_E$. The value $e$ is sent to the receiver.

EnvelopeOpen: Let $x$ be a key. The opening protocol works as follows: the receiver R sends $x$ to the sender S; the sender S computes $g = \mathrm{Sig}(x, d_S) = x^{d_S} \bmod n_S$ and sends $g$ to R; the receiver checks whether $g^{e_S} = x \bmod n_S$ and computes EnvelopeGen$(x, (pk, sk)) = g^{e_E} \bmod n_E$.
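The RSA-based instantiation above can be sketched as follows. This is a toy illustration, not the prototype's implementation: the tiny primes, the helper rsa_keypair and the function names are assumptions for the example, and textbook RSA without padding is used exactly because the scheme needs deterministic operations. The sketch also assumes $n_S \le n_E$, so that every signature fits in the encryption domain.

```python
def rsa_keypair(p, q, e):
    """Textbook RSA key from two primes: returns (n, e, d)."""
    n, phi = p * q, (p - 1) * (q - 1)
    return n, e, pow(e, -1, phi)       # modular inverse needs Python 3.8+

# EnvKeyGen: one key pair for signing, one for encryption (toy sizes).
N_S, E_S, D_S = rsa_keypair(61, 53, 17)    # signature parameters
N_E, E_E, D_E = rsa_keypair(89, 97, 5)     # encryption parameters

def envelope_gen(x):
    """Sender side: e = (x^dS mod nS)^eE mod nE. Requires the secret dS."""
    signature = pow(x, D_S, N_S)
    return pow(signature, E_E, N_E)

def sender_open(x):
    """Sender's reply in EnvelopeOpen: the deterministic signature of x."""
    return pow(x, D_S, N_S)

def receiver_open(x, signature):
    """Receiver side: verify the signature with eS, then encrypt with eE."""
    if pow(signature, E_S, N_S) != x % N_S:
        raise ValueError("signature does not match the queried key")
    return pow(signature, E_E, N_E)

# The receiver recomputes the same envelope the sender used for placement.
x = 1234
assert receiver_open(x, sender_open(x)) == envelope_gen(x)
```

The receiver only ever uses the public exponents; without the sender's reply it has no way to obtain the signature, which is what prevents it from computing envelopes for keys of its own choice.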

4 Securely Outsourcing CIA using VDE

As stated in Section 2, a CIA service can be implemented by using mercurial commitments. An implicit assumption is that the CertifiedDBOwner uses a deterministic algorithm to associate the elements in the database with the leaves of the tree, e.g., the value $v = D(x)$ is associated with the leaf identified by the binary representation of the key $x$. Such a deterministic approach allows the user to verify that the received proof corresponds to a specific path that, in turn, corresponds to the queried key. Notice that the above argument does not rule out the possibility of (pseudo-)randomly assigning keys to tree leaves. However, if the CertifiedDBOwner uses a randomized strategy for such an assignment, then for each key in the database there should exist a certified way of binding the key to the randomness used to create the path in the tree. Thus, a deterministic placement of database elements on tree leaves guarantees the security of the CIA service.

On the other hand, if the CertifiedDBOwner is willing to outsource the tree construction, the problem of confidentiality of the information contained in the database has to be taken into account. Notice, however, that there exist two conflicting requirements. On one hand, the CertifiedDBOwner needs to hide the information contained in the database from the entity that constructs the tree. Specifically, it should not be possible to check whether there exists in the database a value associated with a specific key. On the other hand, the User should be able to verify that the answer received for a given query actually corresponds to the submitted key.

A first solution could be to (deterministically) encrypt the keys of the elements in the database (along with the values associated with them). If the CertifiedDBOwner uses a symmetric key encryption scheme, she can securely hide all the information from the computing entity. Unfortunately, after the tree has been constructed, she needs to publish the key used to encrypt the database keys in order to allow the users to properly verify the answers.

A second possible approach could be to encrypt each key in the database using a public key encryption scheme. In this case, the leaf associated with the key is identified by the binary representation of the encryption of the key itself. This approach allows the User to compute by herself the permuted value of the key. Unfortunately, the entity that computes the tree can execute the same operation herself, thereby obtaining information on the existence of some specific key in the database.

Another possible solution could be to place each element of the database in the leaf identified by the (binary representation of the deterministically computed) signature of the key. Notice that this solution does not prevent the computing entity from checking whether or not some key belongs to the database either: it simply verifies whether there exists a valid signature for the key under consideration.

For the above reasons, we need a way of deterministically permuting the keys in the database such that the computing entity is not able (a) to compute the key given its permuted value and (b) to compute the permuted value of any key of its choice. Furthermore, the user, by interacting with the CertifiedDBOwner, should be able (a) to compute the permuted value associated with a given key and (b) to verify that the computed value is the one used by the CertifiedDBOwner during the tree construction phase.

As stated in the previous section, such properties can be achieved by using a VDE scheme for permuting the elements in the database. Given such a permutation, the user can, by interacting with the CertifiedDBOwner, compute the value associated with each key. On the other hand, the computing entity, given the permuted database, cannot tell whether or not a key is contained in it without interacting with the CertifiedDBOwner.
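The weakness of the two public-key placements discussed above can be illustrated with a short sketch. It uses toy textbook RSA parameters and hypothetical names (rsa_keypair, db_keys, guess_in_enc, guess_in_sig) that are assumptions for the example only. In both cases the computing entity, holding nothing but public values and the leaf positions, can test whether a key of its choice is in the database.

```python
def rsa_keypair(p, q, e):
    """Textbook RSA key from two primes: returns (n, e, d)."""
    n, phi = p * q, (p - 1) * (q - 1)
    return n, e, pow(e, -1, phi)       # modular inverse needs Python 3.8+

N_E, E_E, D_E = rsa_keypair(89, 97, 5)     # encryption key pair (toy)
N_S, E_S, D_S = rsa_keypair(61, 53, 17)    # signature key pair (toy)

db_keys = [12, 345, 2024]                  # hypothetical private database keys

# Placement by public-key encryption of the key: leaf = x^eE mod nE.
enc_leaves = {pow(x, E_E, N_E) for x in db_keys}

# Placement by deterministic signature of the key: leaf = x^dS mod nS.
sig_leaves = {pow(x, D_S, N_S) for x in db_keys}

def guess_in_enc(x):
    """Computing entity re-encrypts its guess with the public exponent."""
    return pow(x, E_E, N_E) in enc_leaves

def guess_in_sig(x):
    """Computing entity verifies each leaf as a signature on its guess."""
    return any(pow(leaf, E_S, N_S) == x for leaf in sig_leaves)
```

With the VDE placement $(x^{d_S} \bmod n_S)^{e_E} \bmod n_E$ neither test applies: recomputing a leaf requires the signature, and checking a leaf against a guess requires first decrypting it, so one of the two secret exponents is needed in either direction.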

5 The Architectural Design Principles

In this section we describe the architectural design principles of the implemented prototype.

The PUBWCL library. The PUBWCL library [6] has been designed to execute massively parallel algorithms in the BSP model on PCs distributed over the Internet, exploiting unused computation donated by PC owners. This goal is achieved efficiently by supporting thread migration in Java and providing a load-balanced programming interface. The BSP (Bulk Synchronous Parallel) model was first introduced in [15] in the context of distributed and parallel computing. This model provides a parallel computing scheme consisting of a set of processors with local memory, an interconnection mechanism allowing point-to-point communication, and a mechanism for barrier-style synchronization. A BSP program consists of a set of processes and a sequence of supersteps, i.e., time intervals bounded by the barrier synchronization. A more detailed description of the PUBWCL library can be found in Appendix D.

A BSP-based tree construction implementation. Here we present a distributed algorithm for distributing the construction of the tree of commitments among the nodes of a cluster. Notice that, because of the specific characteristics of the BSP model, the work assigned to all nodes should be, for efficiency reasons, more or less the same.

We first compute the hash value of the key of each element of the database and store it in an ordered list of nodes. This list is partitioned into $2^x$ sub-lists of ordered nodes. Sub-list $i$ includes all the elements in the subtree rooted at the node identified by the binary representation of $i$. The parameter $x$ is computed from the number of available processors in the BSP cluster. For example, if the cluster contains 16 processors, a good value for $x$ is 4, so that each processor of the cluster will work on one sub-list of ordered nodes.

The list is then sent to the BSP cluster, where a task spanning $2^x$ processors has been previously created. The task will run the following algorithm:

If the height of the resulting subtree is not too large (i.e., not bigger than a chosen parameter $k$), then processor $i$ will compute the subtree from the ordered block of leaf nodes assigned to it (i.e., the $i$-th sub-list). Once the computation of the subtree is completed, processor $i$ sends (a) the whole subtree back to the CertifiedDBOwner, in order to cache it, and (b) the root node of the subtree to processor 0.

If the subtree is higher than $k$, then processor $i$ will first partition its list of nodes in the same way the whole database was partitioned before sending it to the cluster. Processor $i$ will then compute, and send back to the CertifiedDBOwner, one sub-subtree at a time, keeping only its root in a list of nodes. This is done in order to keep the memory footprint low on each processor $i$. When processor $i$ is done with its own sub-subtrees and has a list of root nodes, it will compute the top part of its subtree. Again, the resulting tree is sent back to the CertifiedDBOwner while the root node is sent to processor 0.

Processor 0 will retrieve all the roots from the other processors and will build a new ordered list of leaf nodes. From this new ordered list it will build a new subtree and will send it to the CertifiedDBOwner. This subtree represents the top part of the whole tree and its root node contains the commitment to the whole database.

Once the CertifiedDBOwner has received all the subtrees (including the last top subtree from PID 0), it is ready to answer queries from clients.
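The partitioning step that precedes the BSP task can be sketched as follows. This is an illustrative reading of the description above, not the prototype's code: the function names are hypothetical, hashed keys are modeled as $\ell$-bit integers, and the rule for choosing $x$ (the largest power of two not exceeding the processor count) is an assumption consistent with the 16-processor example.

```python
def choose_x(processors):
    """Largest x such that 2**x does not exceed the number of processors."""
    return processors.bit_length() - 1

def partition_by_prefix(hashed_keys, ell, x):
    """Split the ordered list of ell-bit hashed keys into 2**x sub-lists.

    Sub-list i holds the keys whose top x bits equal i, i.e. the leaves
    of the subtree rooted at the depth-x node numbered i.
    """
    sublists = [[] for _ in range(2 ** x)]
    for key in sorted(hashed_keys):
        sublists[key >> (ell - x)].append(key)   # top x bits select the sub-list
    return sublists


# 8-bit hashed keys split among 4 processors (x = 2).
parts = partition_by_prefix([200, 3, 65, 255, 64], ell=8, x=choose_x(4))
# parts[i] is the block of leaves handed to processor i; parts[2] is empty,
# so the corresponding subtree root would simply be a soft commitment.
```

Because hash values are close to uniformly distributed, the sub-lists tend to have similar sizes, which matches the BSP requirement that all processors receive roughly the same amount of work.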

Figure 1: The advantage of outsourcing the tree computation is clearly visible in this graph.

Figure 2: Detailed view of the timing on the cluster.

Analysis. In Figure 1 we compare the time needed to compute the whole tree locally against the time needed to outsource the computation to a remote cluster. Unsurprisingly, the total local time is linear in the number of elements in the database. In Figure 2 we show, in greater detail, the time needed for permuting the elements in the database with a VDE scheme (labeled VDE), the time to transfer the permuted elements to the cluster (labeled Transfer), the time the cluster needs to compute the tree with the aforementioned BSP algorithm (labeled TreeComputation, which also includes the time to transfer it back to the CertifiedDBOwner) and the total time the CertifiedDBOwner takes to outsource the tree computation (labeled Totaltime). The VDE scheme used to permute the elements of the database is computed locally by the CertifiedDBOwner, and this is clearly visible in the graph: the VDE time quickly drives up the total time needed for outsourcing the computation of the tree. Even so, the overall time is much lower than that of a local computation of the whole tree.

Architectural design. In this section we briefly describe the architectural design for a system implementing the primitives described above. We identify five different entities: CertifiedDBOwner, User, PubInfoStorage, Peer and Cluster. As described in Section 2, a CIA service consists essentially of three phases: an initialization phase, in which the parameters used for the scheme are generated; a tree construction phase, in which the tree of commitments is constructed; and a query phase, in which the User queries the database. In our setting, the applications cooperate as follows:

Generation of the parameters: The parameters used for the implementations are generated as follows. At startup, the PubInfoStorage generates the public parameters that will be used for the Mercurial Commitment implementations. Since the public parameters are used both for generating the public information and for verifying all the answers to queries, this generation is carried out once, and the parameters are stored in a file.
The CertifiedDBOwner generates the pair $((n_E, e_E, d_E), (n_S, e_S, d_S))$ that will be used for the VDE computations. The CertifiedDBOwner sends $pk = ((n_E, e_E), (n_S, e_S))$ to the PubInfoStorage, while it keeps $sk = (d_E, d_S)$ secret.

Construction of the Tree of Commitments: The tree of commitments is constructed in three main steps.

Local VDEs Computation: For each element (key, value) in the database, the CertifiedDBOwner locally computes EnvelopeGen(h(key), (pk, sk)), where h is a collision-resistant hash function. Once the VDEs for all the keys have been computed, the CertifiedDBOwner holds a new database consisting of the pairs (EnvelopeGen(h(key), (pk, sk)), value).

Distributed Computation of Tree Leaves: For each element in the database, the CertifiedDBOwner computes the hard commitment to the string value with the help of the Peer applications (see footnote 1). After this phase, the CertifiedDBOwner holds a database consisting of the pairs (EnvelopeGen(h(key), (pk, sk)), com(value)).

Outsourcing the Tree Construction: The CertifiedDBOwner sends the transformed database to the Cluster by using the PUBWCL library. The Cluster computes the public information as described in the previous paragraph and sends it back to the CertifiedDBOwner, who forwards it to the PubInfoStorage.

User Queries: Once the public information has been published by the PubInfoStorage, the User can query the database as follows. Let key be a key. The User engages in an Opening protocol with the CertifiedDBOwner. At the end of the protocol, the User computes y = EnvelopeGen(key, (pk, sk)). The CertifiedDBOwner sends to the User the answer to the query, along with the open/tease of all the nodes on the path identified by the binary representation of y. The User verifies the proof received from the CertifiedDBOwner and accepts it if it is consistent with the public information held by the PubInfoStorage.

We assume secure point-to-point communication among CertifiedDBOwner, User and PubInfoStorage. Such an assumption can easily be implemented, e.g., by using a Diffie-Hellman key exchange to establish, once, a common key to be used for all communications. Furthermore, the communication between any peer and an entity invoking its service is also secured.

Conclusions and Future Work

In this paper we have presented a distributed architecture for a Certified Information Access system. We have shown that it is possible to securely distribute the load of the most time-consuming operations among a set of untrusted peers. Furthermore, we have presented a solution that allows the use of pre-computation in order to reduce the time needed by the CertifiedDBOwner for generating the public information. Currently, pre-computation is used only by the CertifiedDBOwner for constructing the tree leaves. A possible way of further reducing the computation time is to use pre-computation also during the tree construction phase on the BSP architecture. Unfortunately, two obstacles currently stand in the way. On the one hand, to date the PUBWCL library only allows one-way communication between CertifiedDBOwner and Cluster: the CertifiedDBOwner can send a task along with some data to the Cluster and, when the task is completed, some output is sent back to the CertifiedDBOwner. On the other hand, because of the superstep synchronization, communication between different subtasks, namely a pre-computation subtask and a tree-computation subtask, would not be efficient.

1 A secure distributed scheme for the computation of hard commitments was presented in [9] and is reported, for the sake of completeness, in Appendix B.

Acknowledgements

We want to thank Joachim Gehweiler from the University of Paderborn for his invaluable help.

References
[1] The JXTA Project. http://www.jxta.org, 2007.
[2] distributed.net. http://www.distributed.net, Last Visit July 2007.
[3] Great Internet Mersenne Prime Search (GIMPS). http://www.mersenne.org, Last Visit July 2007.
[4] JavaGo RMI. http://www.joachim-gehweiler.de/en/software/javago.php, Last Visit July 2007.
[5] Search for extraterrestrial intelligence (SETI@home). http://setiathome.berkley.edu, Last Visit July 2007.
[6] Olaf Bonorden, Joachim Gehweiler, and Friedhelm Meyer auf der Heide. A Web Computing Environment for Parallel Algorithms in Java. In Proceedings of International Conference on Parallel Processing and Applied Mathematics (PPAM), pages 801-808, 2005.
[7] Dario Catalano, Yevgeniy Dodis, and Ivan Visconti. Mercurial commitments: Minimal assumptions and efficient constructions. In Shai Halevi and Tal Rabin, editors, Third Theory of Cryptography Conference (TCC 2006), volume 3876 of Lecture Notes in Computer Science, pages 120-144. Springer, 2006.
[8] Melissa Chase, Alexander Healy, Anna Lysyanskaya, Tal Malkin, and Leonid Reyzin. Mercurial commitments with applications to zero-knowledge sets. In Ronald Cramer, editor, 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT 2005), volume 3494 of Lecture Notes in Computer Science, pages 422-439. Springer, 2005.
[9] Aniello Del Sorbo, Clemente Galdi, and Giuseppe Persiano. Distributed certified information access for mobile devices. In Damien Sauveron, Constantinos Markantonakis, Angelos Bilas, and Jean-Jacques Quisquater, editors, WISTP, volume 4462 of Lecture Notes in Computer Science, pages 67-79. Springer, 2007.
[10] Li Gong. JXTA: A Network Programming Environment. IEEE Internet Computing, 5(3):88-95, 2001.
[11] Moses Liskov. Updatable zero-knowledge databases. In Bimal K. Roy, editor, 11th International Conference on the Theory and Application of Cryptology and Information Security (ASIACRYPT 2005), volume 3788 of Lecture Notes in Computer Science, pages 174-198. Springer, 2005.
[12] Silvio Micali, Michael O. Rabin, and Joe Kilian. Zero-knowledge sets. In 44th Symposium on Foundations of Computer Science (FOCS 2003), pages 80-91. IEEE Computer Society, 2003.
[13] R. Schollmeier. A Definition of Peer-to-Peer Networking for the Classification of Peer-to-Peer Architectures and Applications. In Proceedings of the First International Conference on Peer-to-Peer Computing (P2P'01), pages 101-102. IEEE Computer Society, 2001.
[14] Tatsurou Sekiguchi, Hidehiko Masuhara, and Akinori Yonezawa. A Simple Extension of Java Language for Controllable Transparent Migration and Its Portable Implementation. In COORDINATION '99: Proceedings of the Third International Conference on Coordination Languages and Models, pages 211-226, London, UK, 1999. Springer-Verlag.
[15] Leslie G. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8):103-111, 1990.

Certified Information Access via Mercurial Commitments

In this appendix we describe in more detail the generation of the tree of commitments, the construction of an answer by the CertifiedDBOwner and the verification procedure executed by the User.

Generation of Public Information. To generate the public information representing the database $D$, the CertifiedDBOwner proceeds as follows. We stress that, even though it is not explicitly specified, all calls to Commit and SoftCommit also take as input the public parameters generated by the PubInfoStorage during the setup. The construction of the tree of commitments starts from its leaves. More precisely, for each $x$ such that $D(x) \neq \bot$, the CertifiedDBOwner produces $(C_x, D_x) = \mathrm{Commit}(D(x))$. Then, for each $x$ such that $D(x) = \bot$ but $D(x') \neq \bot$ (where $x'$ is $x$ with the last bit flipped), the CertifiedDBOwner produces $(C_x, D_x) = \mathrm{SoftCommit}$. For all the other $x$, $C_x = \bot$. The tree is then constructed in a bottom-up fashion as follows: for each level $i = \ell - 1, \ldots, 0$ and for each string $s$ of length $i$, define $C_s$ as follows:

1. If $C_{s0} \neq \bot$ and $C_{s1} \neq \bot$, then let $(C_s, D_s) = \mathrm{Commit}((C_{s0}, C_{s1}))$.
2. For all $s$ such that $C_{s'}$ has been defined in the previous step ($s'$ is $s$ with the last bit flipped) but $C_s$ has not, define $(C_s, D_s) = \mathrm{SoftCommit}$.
3. For all other $s$, define $C_s = \bot$.

The value at the root, $C_\epsilon$, is the public information. If we have $C_\epsilon = \bot$, we set $(C_\epsilon, D_\epsilon) = \mathrm{SoftCommit}$.

Answer to a query. This method constructs an object that contains the answer to the query and a proof of validity, composed of the decommitment of the corresponding leaf along with the authenticating path (together with all the decommitments) to the root. More specifically, we distinguish between the case in which the query $x$ is such that $D(x) \neq \bot$ and the case in which $D(x) = \bot$. For a string $x$ we denote by $x|_i$ the first $i$ bits of $x$ and by $(x|_i)'$ the first $i - 1$ bits of $x$ followed by the $i$-th bit of $x$ flipped.


If $D(x) \neq \bot$, the authenticating path is computed by sending $D(x)$, the corresponding decommitment key $D_x$ and, for $0 \le i \le \ell - 1$, the values $(C_{x|_i 0}, C_{x|_i 1})$ along with the decommitment key $D_{x|_i}$.

Suppose instead that $D(x) = \bot$ and let $h$ be the largest value such that $C_{x|_h} \neq \bot$. Set $(C_x, D_x) = \mathrm{Commit}(\bot)$ and build a path from $x$ to $C_{x|_h}$ as follows: set $(C_{x'}, D_{x'}) = \mathrm{SoftCommit}$; for each level $i$ from $\ell - 1$ down to $h + 1$, define $(C_{x|_i}, D_{x|_i}) = \mathrm{Commit}(C_{x|_i 0}, C_{x|_i 1})$ and $(C_{(x|_i)'}, D_{(x|_i)'}) = \mathrm{SoftCommit}$. Note that the only values inside the tree redefined by the above procedure are those that were not defined before. Let $\tau_x = \mathrm{Tease}(D(x), C_x, D_x)$ and $\tau_{x|_i} = \mathrm{Tease}((C_{x|_i 0}, C_{x|_i 1}), C_{x|_i}, D_{x|_i})$ for $0 \le i < \ell$. The response to the query consists of $\bot$ along with its validation path: $(C_{x|_i}, C_{(x|_i)'})$ for $1 \le i \le \ell$ and $\tau_{x|_i}$ for $0 \le i \le \ell$.

Verification of an Answer. To verify the certified answer to a query for a key $x$ such that $D(x) \neq \bot$, the User executes the VerifyOpen method on all the decommitments received, from the bottom up to the root. The last verification is made against the database commitment that the User has previously retrieved from the PubInfoStorage. If the key that has been queried is not in the database, the User has to execute VerifyTease instead of VerifyOpen.

Hashing the values. In the discussion above, we have at several points constructed a hard commitment to two hard commitments. This would make the size of the commitment roughly double at each level. To avoid this, we assume that, instead of committing to a string (or to a pair of strings), we commit to its hash value, computed using a collision-resistant hash function $H$ that hashes a string down to a fixed length $\ell = 128$. Similarly, we can have a database with keys of different lengths if, instead of storing the pair $(x, v)$, we store the pair $(H(x), v)$.
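To make the shape of the tree and of a validation path more concrete, the following Python sketch computes only which node labels receive a hard commitment and which receive a soft one, and lists the pairs of labels $x|_i$ and $(x|_i)'$ along a path. It is a structural illustration under stated assumptions: no commitments are actually computed, node labels are bit strings, and the function names `flip_last`, `node_kinds` and `validation_path` are hypothetical.

```python
def flip_last(s: str) -> str:
    """Return s with its last bit flipped (the label written s' in the text)."""
    return s[:-1] + ("1" if s[-1] == "0" else "0")

def node_kinds(keys, ell):
    """Map every defined node label to 'hard' or 'soft'; undefined nodes are absent.

    `keys` are the ell-bit strings x with D(x) defined. The empty string is the root.
    """
    kinds = {}
    current = set(keys)                      # nodes that get a hard commitment
    for _ in range(ell, 0, -1):              # levels ell, ell-1, ..., 1
        for s in current:
            kinds[s] = "hard"
        for s in current:
            # A sibling that is not already defined gets a soft commitment.
            kinds.setdefault(flip_last(s), "soft")
        current = {s[:-1] for s in current}  # parents, one level up
    # The root is hard unless the database is empty, in which case it is soft.
    kinds[""] = "hard" if current else "soft"
    return kinds

def validation_path(x: str):
    """Pairs (x|_i, (x|_i)') for i = len(x) down to 1."""
    return [(x[:i], flip_last(x[:i])) for i in range(len(x), 0, -1)]

if __name__ == "__main__":
    for label, kind in sorted(node_kinds(["000", "011"], 3).items()):
        print(repr(label), kind)
    print(validation_path("011"))
```

With two keys in a tree of depth 3, only nine of the fifteen nodes are ever defined, which is why the size of the published structure grows with the number of keys rather than with $2^\ell$.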

Distributing Mercurial Commitment Computation

This appendix describes a way of distributing the computations needed to create and verify mercurial commitments while preserving the security of the Mercurial Commitment scheme. Since the most time-consuming operation is modular exponentiation, we show a way of distributing this operation securely. The idea behind the load distribution is to use the computational power of the peers to execute modular exponentiations. In this way, the CertifiedDBOwner and the User are only required to execute additions and multiplications. The crucial operations to be distributed are modular exponentiations in which either the exponent, or both the base and the exponent, are sensitive information. We assume that a service Mod Exp is run by a set of peers that the User and the CertifiedDBOwner may use for such operations. Peers are assumed to be honest, i.e., they correctly compute the modular exponentiations they are asked for, but curious, in the sense that they may collect the information they receive in order to learn something about the values queried by the User or about the elements of the database.

Mod Exp: We assume that the peers can be identified by an ID in the set $\{1, \ldots, t\}$, for some integer $t$. This service is invoked with, as input, a base $b$, an exponent $r$, the modulus $p$ and the ID of the peer that will execute the task. The peer with identity ID computes $b^r \bmod p$ and sends the result back to the player who requested it. We assume secure point-to-point communication between the peers and the player who invokes their services.


Computation with secure exponent. We first analyze the case in which the exponent is secret information. Let $k$ be an integer such that $k \le t$. We assume that the number of colluding peers is at most $k - 1$. In this case, given $b$ and $e$, it is possible to compute $b^e \bmod p$ while keeping the exponent $e$ private, as follows:

Procedure Secure Exp$(b, e, p, k)$:
    randomly select $k$ out of the $t$ peers and let $\{ID_1, \ldots, ID_k\}$ be their identities
    pick $e_1, \ldots, e_{k-1} \in_R \mathbb{Z}_{p-1}$
    $e_k = e - (e_1 + \cdots + e_{k-1}) \bmod (p - 1)$
    $r_i = \text{Mod Exp}(b, e_i, p, ID_i)$, for $i = 1, \ldots, k$
    $r = \prod_{i=1}^{k} r_i \bmod p$
    output $r$

The correctness of the above procedure follows immediately from the fact that $r_i = b^{e_i} \bmod p$ and, thus, $r = b^{e_1 + \cdots + e_k} = b^e \bmod p$, since the shares sum to $e$ modulo $p - 1$, the order of the multiplicative group modulo $p$. Security follows from the observation that the exponent $e$ is shared among the $k$ peers using a $(k, k)$-threshold secret sharing scheme. Thus, the only way to reconstruct $e$ is to collect all the $k$ shares.

Computation with secure base and exponent. Using a similar idea, it is possible to compute $b^e$ while keeping both the base $b$ and the exponent $e$ private from any coalition consisting of at most $k - 1$ players, provided that $k^2 \le t$. The main difference is that, in this case, we need to properly share both the base and the exponent and then recombine the partial results.
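Both procedures rest on the same additive sharing of the exponent. As a minimal Python sketch of Secure Exp, assuming that $p$ is prime and $b$ is not a multiple of $p$: the remote Mod Exp service is replaced here by a local stand-in `mod_exp`, the number of peers `t` is passed explicitly, and both names are hypothetical choices for illustration rather than the prototype's interface.

```python
import secrets

def mod_exp(b: int, r: int, p: int, peer_id: int) -> int:
    """Local stand-in for the remote Mod Exp service run by peer `peer_id`."""
    return pow(b, r, p)

def secure_exp(b: int, e: int, p: int, k: int, t: int) -> int:
    """Compute b**e mod p so that each peer sees only one additive share of e."""
    # Randomly select k of the t peers.
    peers = secrets.SystemRandom().sample(range(1, t + 1), k)
    # Pick k-1 random shares in Z_{p-1}; the last share makes them sum to e mod p-1.
    shares = [secrets.randbelow(p - 1) for _ in range(k - 1)]
    shares.append((e - sum(shares)) % (p - 1))
    # Multiply the partial results returned by the peers.
    r = 1
    for share, peer in zip(shares, peers):
        r = (r * mod_exp(b, share, p, peer)) % p
    return r

if __name__ == "__main__":
    p = 2**61 - 1  # a Mersenne prime, used here only as a toy modulus
    print(secure_exp(5, 123456789, p, k=3, t=10) == pow(5, 123456789, p))
```

Any $k - 1$ of the shares are uniformly distributed and independent of $e$, which is the $(k, k)$-threshold property invoked in the security argument; the client's own work is reduced to $k - 1$ modular multiplications.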

Procedure Secure Base Exp$(b, e, p, k^2)$:
    randomly select $k^2$ peers out of the $t$ and let $\{ID_1, \ldots, ID_{k^2}\}$ be their identifiers
    pick $e_1, \ldots, e_{k-1} \in_R \mathbb{Z}_{p-1}$
    pick $b_1, \ldots, b_{k-1} \in_R \mathbb{Z}_p^*$
    $e_k = e - (e_1 + \cdots + e_{k-1}) \bmod (p - 1)$
    $b_k = b / \prod_{i=1}^{k-1} b_i \bmod p$
    $r_{i,j} = \text{Mod Exp}(b_i, e_j, p, ID_{(i-1)k+j})$, for $i = 1, \ldots, k$ and $j = 1, \ldots, k$
    $r_i = \prod_{j=1}^{k} r_{i,j} \bmod p$
    $r = \prod_{i=1}^{k} r_i \bmod p$
    output $r$

The correctness of the above procedure can be derived by observing that $r_i = b_i^e \bmod p$, since Secure Base Exp implicitly contains an invocation of the procedure Secure Exp with parameters $b_i$ and $e$. Furthermore, $r = b_1^e \cdots b_k^e = (\prod_{i=1}^{k} b_i)^e = b^e \bmod p$. The security of the procedure derives from the fact that we use two independent $(k, k)$-threshold secret sharing schemes for sharing $b$ and $e$. Since each pair $(b_i, e_j)$, for $i, j \in \{1, \ldots, k\}$, is assigned to a different peer, the only way to reconstruct the value of $b$ (resp., the value of $e$) is to collect all the values $b_i$ (resp., $e_i$).

Given the above procedures, we can distribute the computation of commitments as follows. Recall that, given the public parameters $(p, g, h)$, a soft commitment consists of a pair $Scom = (g^{r_0}, g^{r_1})$ and $Sdec = (r_0, r_1)$, where $r_0$ and $r_1$ are randomly chosen. In this case, the values $r_0$ and $r_1$ need to be kept private, since they are used for the teasing of $Scom$. Thus, the SoftCommit procedure can be distributed as follows:


Procedure SoftCommit$(p, g, h, k)$:
    $r_0, r_1 \in_R \mathbb{Z}_p$
    $y_0 = \text{Secure Exp}(g, r_0, p, k)$
    $y_1 = \text{Secure Exp}(g, r_1, p, k)$
    output $Scom = (y_0, y_1)$ and $Sdec = (r_0, r_1)$

The correctness of the above procedure follows immediately by inspection, while its security follows from the fact that $y_0$ and $y_1$ are computed by independent executions of the Secure Exp algorithm.

Let us now consider hard commitments. A hard commitment, given the public parameters $(p, g, h)$ and a message $m$, consists of a pair $com = (g^m (h^{r_1})^{r_0}, h^{r_1})$ and $dec = (r_0, r_1)$, where $r_0$ and $r_1$ are randomly chosen. In this case, the value $m$ clearly needs to be private in order to guarantee the hiding property of the commitment. The exponents $r_0$ and $r_1$ need to be private since they constitute the decommitment key $dec$. Finally, the value $h^{r_1}$ needs to be kept private since it constitutes the second component of $com$. Indeed, if this value becomes public, an attacker that eavesdrops a pair $(x, h^{r_1})$ knows, with high probability, that such a pair defines a hard commitment, contradicting the indistinguishability of hard and soft commitments. Thus, the Commit procedure can be distributed as follows:

Procedure Commit$(p, g, h, m, k^2)$:
    $r_0, r_1 \in_R \mathbb{Z}_p$
    $w = \text{Secure Exp}(g, m, p, k)$
    $y_1 = \text{Secure Exp}(h, r_1, p, k)$
    $y_0 = \text{Secure Base Exp}(y_1, r_0, p, k^2)$
    set $com = (w y_0, y_1)$ and $dec = (r_0, r_1)$

The correctness of the above algorithm can be verified by inspection while, as before, its security follows from the independence of the computations of $w$, $y_1$ and $y_0$.
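The way Secure Base Exp and the distributed Commit fit together can be sketched in Python as follows. The sketch assumes a prime modulus $p$ and bases that are invertible modulo $p$; the peers are simulated by a local `mod_exp` call, peer selection is omitted, and the helper names `share_exponent` and `share_base` are hypothetical. The toy parameters in the usage lines are for illustration only and offer no security.

```python
import secrets

def mod_exp(b: int, r: int, p: int) -> int:
    """Local stand-in for one peer's Mod Exp service (peer selection omitted)."""
    return pow(b, r, p)

def share_exponent(e: int, p: int, k: int):
    """Additive (k, k) sharing of e modulo p - 1."""
    shares = [secrets.randbelow(p - 1) for _ in range(k - 1)]
    return shares + [(e - sum(shares)) % (p - 1)]

def share_base(b: int, p: int, k: int):
    """Multiplicative (k, k) sharing of b modulo p, with shares in [1, p - 1]."""
    shares = [1 + secrets.randbelow(p - 1) for _ in range(k - 1)]
    prod = 1
    for s in shares:
        prod = (prod * s) % p
    return shares + [(b * pow(prod, -1, p)) % p]  # modular inverse: Python 3.8+

def secure_exp(b: int, e: int, p: int, k: int) -> int:
    r = 1
    for e_j in share_exponent(e, p, k):
        r = (r * mod_exp(b, e_j, p)) % p
    return r

def secure_base_exp(b: int, e: int, p: int, k: int) -> int:
    """k*k Mod Exp calls; in the protocol each (b_i, e_j) goes to a distinct peer."""
    r = 1
    exps = share_exponent(e, p, k)
    for b_i in share_base(b, p, k):
        for e_j in exps:
            r = (r * mod_exp(b_i, e_j, p)) % p
    return r

def commit(p: int, g: int, h: int, m: int, k: int):
    """Distributed hard commitment: com = (g^m (h^r1)^r0, h^r1), dec = (r0, r1)."""
    r0, r1 = secrets.randbelow(p), secrets.randbelow(p)
    w = secure_exp(g, m, p, k)
    y1 = secure_exp(h, r1, p, k)
    y0 = secure_base_exp(y1, r0, p, k)
    return ((w * y0) % p, y1), (r0, r1)

if __name__ == "__main__":
    p, g, h = 2**61 - 1, 3, 7  # toy parameters, chosen only for illustration
    (c0, c1), (r0, r1) = commit(p, g, h, m=42, k=2)
    print(c0 == pow(g, 42, p) * pow(c1, r0, p) % p and c1 == pow(h, r1, p))
```

The quadratic number of Mod Exp calls in `secure_base_exp` is the price of hiding both inputs, which is why the text requires $k^2 \le t$ peers for this case and only $k \le t$ when the base is public.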

Proofs

Proof of Lemma 1

Proof. Let us assume that there exists a polynomial-time algorithm $A$ that, on input $x$, $f$ and $g$, outputs the value $y = \mathrm{EnvelopeGen}(x) = f(g^{-1}(x))$ with probability at least $\epsilon > 0$. Given the algorithm $A$, we design a polynomial-time algorithm $A'$ that, on input $g$ and $x$, outputs $z = g^{-1}(x)$ with probability at least $\epsilon$. The algorithm $A'$ works as shown in Figure 3. We note that, by hypothesis, $F = \mathrm{EnvelopeGen}(x) = f(g^{-1}(x))$ with probability at least $\epsilon$. Furthermore, the correctness of the output follows from the fact that $f$ is a permutation. We note that the above lemma holds even if $g$ is a one-way permutation.

Algorithm $A'(g, x)$
1. $(f, t_f) \leftarrow \mathrm{PermGen}(1^k)$
2. Let $F = A(x, f, g)$
3. Compute $z = f^{-1}(F)$
4. Output $z$.
Figure 3: Inverting $g$

Proof of Lemma 1

Proof. Let us assume that there exists an algorithm $A$ that, on input $y = \mathrm{EnvelopeGen}(x) = f(g^{-1}(x))$, $f$ and $g$, outputs the value of $x$ with probability at least $\epsilon > 0$. By using the algorithm $A$, it is possible to design a polynomial-time algorithm $A'$ that, given $f$ and a value $y = f(z)$, computes $z$ with probability at least $\epsilon$. The algorithm $A'$ works as shown in Figure 4. For any input $y$, by hypothesis, with probability at least $\epsilon$ the algorithm $A$ outputs a value $x$ such that $y = \mathrm{EnvelopeGen}(x) = f(g^{-1}(x))$. The correctness of the output and the probability of success follow from the fact that $f$ and $g$ are permutations. We note that the above lemma holds even if $f$ is a one-way permutation.

Algorithm $A'(f, y)$
1. $(g, t_g) \leftarrow \mathrm{PermGen}(1^k)$
2. $x \leftarrow A(y, f, g)$
3. $z \leftarrow g^{-1}(x)$
4. Output $z$.
Figure 4: Inverting $f$
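The reduction of Figure 3 can be replayed on toy permutations. The Python sketch below assumes textbook RSA over the tiny modulus 3233 as the trapdoor permutation family, which is an illustrative choice and not necessarily the instantiation used in the paper. The hypothetical algorithm $A$ is simulated by a function that cheats by using the trapdoor of $g$, since no real algorithm with that ability is assumed to exist; the names `perm_gen`, `adversary` and `invert_g` are likewise hypothetical.

```python
N, PHI = 3233, 3120  # toy RSA modulus 61 * 53 and its totient (insecure, illustration only)

def perm_gen(e: int):
    """Return a permutation x -> x**e mod N and its inverse (the trapdoor)."""
    d = pow(e, -1, PHI)  # modular inverse: Python 3.8+
    return (lambda x: pow(x, e, N)), (lambda y: pow(y, d, N))

# The challenge permutation g; its trapdoor g_inv is hidden from the reduction.
g, g_inv = perm_gen(7)

def adversary(x, f, g_pub):
    """Stand-in for the hypothetical A: outputs EnvelopeGen(x) = f(g^{-1}(x)).
    It is simulated by cheating with g's trapdoor."""
    return f(g_inv(x))

def invert_g(g_pub, x, A):
    """Algorithm A' of Figure 3: invert g on x using A as a subroutine."""
    f, f_inv = perm_gen(17)  # step 1: A' generates f together with its trapdoor
    F = A(x, f, g_pub)       # step 2: F = f(g^{-1}(x)) whenever A succeeds
    return f_inv(F)          # steps 3-4: z = f^{-1}(F) = g^{-1}(x)

if __name__ == "__main__":
    z = 1234
    print(invert_g(g, g(z), adversary) == z)
```

The reduction itself never touches `g_inv`: it only strips off the layer $f$ that it generated on its own, which mirrors the role of the remark that correctness follows from $f$ being a permutation.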

P2P and Web-computing platforms

JXTA services

JXTA technology is a set of open protocols that allows any connected device on the network to communicate and collaborate in a peer-to-peer style [1]. For a detailed definition of the peer-to-peer (P2P) networking concept, we refer to [13]. The JXTA Project was originally conceived by Sun Microsystems and designed with the participation of a small number of experts from academic institutions and industry [10]. Its main goals were:

Interoperability: overcoming the problem of binding each peer-to-peer system to a single service, and hence to a single infrastructure.
Platform and programming language independence: avoiding binding the system to a single development environment and deployment platform.
Ubiquity: being implementable on every device, ranging from mobile devices and PDAs to PCs and servers.

JXTA provides several features. Indeed, peers are able to discover other peers in the network, organize themselves into peer groups, expose and discover network services, communicate directly through pipes, and monitor other peers' activities.

The JXTA architecture is quite simple and is presented in Figure 5.

Figure 5: The JXTA Architecture.

The JXTA core sits at the Platform Layer and encapsulates the main primitives needed for basic P2P operations, such as discovery, transport, creation of peers and groups, and security. Above it, the Services Layer implements useful features, such as searching and indexing, storage systems, authentication, PKI, and so on. The Application Layer provides a set of integrated implementations, such as P2P instant messaging, P2P e-mail systems, resource sharing, and so on.

In a JXTA network, peers are organized in groups, satisfying the need to offer a common set of services. Each peer exposes its services through XML documents, called Advertisements, which specify all the information other peers need in order to connect and communicate. In order to exchange messages, peers use Pipes. Pipes are used to transfer messages, asynchronously and unidirectionally, between peers that do not necessarily share a direct communication channel. The exchanged messages are XML documents whose envelopes contain information on credentials, digest, and routing. Pipes are connected to pipe endpoints, which can be either input pipes or output pipes. Pipes offer the following connection modalities:

Point-to-Point pipes. Two pipe endpoints (one input and one output) are directly connected.
Propagate pipes. One output pipe is connected to several input pipes.
Secure Unicast pipes. A Point-to-Point pipe that offers a reliable and secure communication channel.

Peers can carry out different activities and can be classified as follows:

Minimal edge peer (limited resources). It sends and receives messages, but it cannot cache advertisements or route messages to other peers.
Full-feature edge peer. It sends and receives messages, and typically caches advertisements. It replies to discovery requests but does not forward them.
Rendezvous peer. It is similar to the full-feature edge peer, but it also forwards discovery requests to help other peers in their search. Whenever a peer joins a group, it looks for a rendezvous peer, or becomes one itself if it does not find any. Each rendezvous peer maintains a list of rendezvous peers.
Relay peer. It maintains information on the routes to the other peers and routes messages.

The PUBWCL library

The PUBWCL library [6] has been designed to execute massively parallel algorithms in the BSP model on PCs distributed over the Internet, exploiting unused computation power donated by PC owners. This goal is achieved efficiently by supporting thread migration in Java and providing a load-balanced programming interface. The BSP (Bulk Synchronous Parallel) model was first introduced in [15] in the context of distributed and parallel computing. This model provides a parallel computing scheme consisting of a set of processors with local memory, an interconnection mechanism allowing point-to-point communication, and a mechanism for barrier-style synchronization, as shown in Figure 6. A BSP program consists of a set of processes and a sequence of supersteps, i.e., time intervals bounded by the barrier synchronization.

Figure 6: The BSP model
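The superstep structure of the BSP model can be illustrated with a small Python sketch. It assumes threads in place of the processors of a real BSP machine and uses `threading.Barrier` from the standard library for the barrier synchronization; the function `bsp_ring_sum` and the ring-shaped communication pattern are hypothetical choices made only for this example and are unrelated to the PUBWCL API.

```python
import threading

def bsp_ring_sum(n: int):
    """Each of n processes starts with the value pid + 1 and learns the global sum.

    In every round a process sends a token to its right neighbour, waits at the
    barrier, and then reads the token it received from its left neighbour.
    """
    barrier = threading.Barrier(n)
    mailbox = [0] * n   # mailbox[i] holds the message delivered to process i
    totals = [0] * n

    def process(pid: int):
        token = pid + 1
        total = token
        for _ in range(n - 1):
            mailbox[(pid + 1) % n] = token  # communication phase
            barrier.wait()                  # barrier: all messages are delivered
            token = mailbox[pid]            # local computation on received data
            total += token
            barrier.wait()                  # everyone has read before the next send
        totals[pid] = total

    threads = [threading.Thread(target=process, args=(i,)) for i in range(n)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return totals

if __name__ == "__main__":
    print(bsp_ring_sum(4))
```

Because every process blocks at `barrier.wait()` until all the others arrive, a single slow process delays the whole round, which is the behaviour that motivates the load-balancing measures discussed next.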

However, when dealing with complex algorithms, the BSP model suffers from some efficiency problems. In fact, due to the presence of barrier synchronization, the delay of a single BSP process (which can be caused, for instance, by a lower percentage of donated computational power) delays the entire BSP program. To address this limitation, PUBWCL implements thread migration, which moves processes at run time to other hosts that currently have more computation power available. Since the PUBWCL library is implemented in Java, it can use JavaGo RMI [4, 14] to support thread migration. This implementation consists of a source code transformation, which extends the Java programming language with three features:

1. Migrations are performed using the keyword go;
2. All methods which can be migrated have to be declared migratory;
3. The depth up to which the stack will be migrated can be bounded using the undock statement.

The JavaGo compiler jgoc translates this extended language into Java source code. Related work in the context of distributed parallel computing, such as distributed.net [2], GIMPS [3], or SETI@home [5], follows the master-worker paradigm. Applied to BSP, this means that the master decomposes the BSP program into pieces of work, each consisting of one superstep in one BSP process; once all BSP processes have completed the current superstep, the master moves to the next superstep. In this way, all the communication passes through the server. In PUBWCL, instead, a peer-to-peer mechanism is achieved by letting threads migrate without the intervention of the master. However, administrative tasks (e.g., user management) and scheduling must still be performed on a client-server basis.


The architecture of PUBWCL is presented in Figure 7. As depicted there, a user can ask the central PUBWCL server for additional clients on which to distribute its application. The PUBWCL server sends a list of these clients, and the execution is supervised by the user. A PUBWCL environment is started on each of the assigned clients, and the parallel programs are executed there.

Figure 7: The PUBWCL architecture

