You are on page 1of 26

Integrity Verification of

Cloud--hosted Data Analytics
Cloud
Computations
(Position paper)
Hui (Wendy) Wang
Stevens Institute of Technology
New Jersey, USA

VLDB Cloud Intelligence workshop,
2012

9/4/2012

1

Data-Analytics
DataAnalytics--as
as--aService (DAaS
(DAaS))
Outsource: Data + Analytics Needs
Analytics results

Data Owner (Client)

Owns large volume of data
Computationally weak

Cloud

Computationally powerful
Provide data analytics as a service

Examples of DAaS
Google Prediction APIs, Amazon EC2
VLDB Cloud Intelligence workshop,
2012

9/4/2012

2

2012 9/4/2012 3 .Fear of Cloud: Security VLDB Cloud Intelligence workshop.

o The client is computationally weak to perform sophisticated analysis. VLDB Cloud Intelligence workshop.Integrity Verification of the result in DAaS Paradigm • The server cannot be fully trusted • We focus on result integrity o The client should be able to verify that the analytics result returned by the cloud is correct. 2012 9/4/2012 4 . • Challenges: o Analytics result is unknown before mining.

2012 9/4/2012 5 . challenge tokens [5]. and counterfeit records [9] • Protect sensitive data and data mining results in the data-mining-as-a-service (DMAS) o Association rule mining [1. 6-7] • Integrity Verification in DMAS Paradigm o Frequent itemset mining [8] VLDB Cloud Intelligence workshop. signatures on a chain of paired tuples [4].3].Related Work • Integrity Assurance for Database-as-a-Service (DAS) Paradigm o Merkle hash trees [2.

Outline • • • • • Introduction Related Work Preliminaries Our Verification Approach Future Work and Conclusion VLDB Cloud Intelligence workshop. 2012 9/4/2012 6 .

2012 9/4/2012 7 .Outline • • • • • Introduction Related Work Preliminaries Our Verification Approach Future Work and Conclusion VLDB Cloud Intelligence workshop.

2012 9/4/2012 8 .Summarization Form • Training data: an m×n dimensional matrix X • Target lables: an m × 1 matrix Y • Problem: o Find the parameter vector θ such that Y = θTX. • The solution: obtain θ∗ = (XTX)−1XT Y. • In other words. • This computation can be reformulated to compute (1) A = XTX. compute VLDB Cloud Intelligence workshop. and (2) B = XT Y.

Importance of Summarization Form • A large class of machine learning algorithms can be expressed in the summation form. 2012 9/4/2012 9 . Naïve Bayes. • Examples of summarization form based algorithms: o o o o o Locally weighted linear regression. Principle component analysis. Neural network. … VLDB Cloud Intelligence workshop.

2012 9/4/2012 10 . VLDB Cloud Intelligence workshop.Summarization Form in MapReduce • Mappers: o Each Mapper is assigned a split and/or a split o The Mappers compute the partial values • Reducers: o The Reducers sum up the partial values As and Bs.

without consulting other malicious workers. VLDB Cloud Intelligence workshop. • Collusive workers o Communicate with each other before cheating.Attack Model • Non-collusive workers o Return incorrect result independently. 2012 9/4/2012 11 .

Verification Goal VLDB Cloud Intelligence workshop. 2012 9/4/2012 12 .

2012 9/4/2012 13 .Outline • • • • • Introduction Related Work Preliminaries Our Verification Approach Future Work and Conclusion VLDB Cloud Intelligence workshop.

Architecture • MapReduce core consists of one master JobTracker task and many TaskTracker tasks. • We assume the master node is trusted. and is responsible for verification. VLDB Cloud Intelligence workshop. • Typical configurations run the JobTracker task on the same machine. and run TaskTracker tasks on other machines. called the master. called slaves. 2012 9/4/2012 14 .

2012 9/4/2012 15 .Overview of Our Verification Approaches • Catch non-collusive malicious workers o When honest workers take majority: the replication-based verification approach o When malicious workers take majority: the artificial data injection (ADI) approach • Catch collusive malicious workers o The instance-hiding verification approach VLDB Cloud Intelligence workshop.

2012 9/4/2012 16 . • The majority of workers will return correct answer • Workers whose results are inconsistent with the majority of the workers that are assigned the same task are caught as malicious. • If there is no winning answer by majority voting. the master node assigns more copies of the task to additional workers VLDB Cloud Intelligence workshop.Catch NonNon-collusive Malicious Workers • When honest workers take majority o We propose the replication-based verification approach • The master node assigns the same task to multiple workers.

Catch NonNon-collusive Malicious Workers • When malicious workers take majority o We propose the artificial data injection (ADI) verification approach • The master nodes inserts an artificial k × ℓ matrix Xa into Xs VLDB Cloud Intelligence workshop. 2012 9/4/2012 17 .

2012 9/4/2012 18 . β)-correctness requirement. o Otherwise. the master node concludes with 100% certainty that the worker returns incorrect answer. where β is the precision threshold given in (α. VLDB Cloud Intelligence workshop. the master node determines the result correctness with a probability .) • Verification o The master node pre-computes o After the worker returns checks whether .ADI Approach (Cont. the master node o If there exists any mismatch.

VLDB Cloud Intelligence workshop. • ℓ is independent of the size of the input matrix. Thus our ADI mechanism is especially useful for verification of computation of large matrics. it must satisfy that Where is the number of columns in the artificial matrix Xa (the number of rows of Xa is the same as that of Xs). • It only needs a small to catch workers that change a small fraction of result with high correctness probability.ADT Discussion • To satisfy (α. β)-correctness. 2012 9/4/2012 19 .

ADT Complexity • The complexity of verification preparation: O(kℓ). • The complexity of verification: O(kℓ2) o K: number of columns of the input matrix Xs o L: number of columns of the artificial matrix Xa VLDB Cloud Intelligence workshop. 2012 9/4/2012 20 .

• Before assigning X to the workers. the master node applies transformation on X by computing s s where T is a k×k transformation matrix. VLDB Cloud Intelligence workshop.Catch Collusive Malicious Workers • ADT approach cannot resist the cheating of collusive malicious workers • We propose the instance-hiding verification approach. o Then the master node injects the artificial data Xa into the transformed matrix X′s and apply the ADI verification procedure. s o Ts is unique for each input matrix Xs. 2012 9/4/2012 21 .

the master node computes o After that.Post--processing Post • The post-processing procedure eliminates the noise due to the insertion of artificial matrix • Non-collusive malicious workers o For the ADI verification approach. the master node computes VLDB Cloud Intelligence workshop. 2012 9/4/2012 22 . the master node returns as the real answer of • Collusive malicious workers o For the instance-hiding approach.

in which a large class of machine learning algorithms can be expressed.Conclusion • Result integrity verification of the result in cloudbased data-analytics-as-a-service paradigm is very important • We consider summarization form. • We propose verification approaches for both noncollusive and collusive malicious Mappers VLDB Cloud Intelligence workshop. 2012 9/4/2012 23 .

2012 9/4/2012 24 .. as well as whether collusive workers take the majority. in a cloud in practice? • Can we achieve a deterministic verification guarantee by adapting the existing cryptographic techniques to DAaS paradigm? VLDB Cloud Intelligence workshop. e.Open Questions • What will be the cost that our verification techniques will bring to computations in real-world cloud. Amazon EC2? • Can we define a budget-driven model to allow the client to specify her verification needs in terms of budget (possibly in monetary format) besides α and β? • How can we identify the collusive and non-collusive workers.g.

Giannotti. K. M. W. In: SIGKDD (2010) Wong..-S. In: SPCC (2010) Li. Kao. Meng.. In: SIGMOD (2006) Mykletun. In: VLDB (2007) VLDB Cloud Intelligence workshop. Hung. Tsudik. H.S. N.: Query execution assurance for outsourced databases. 3. M.. 5....W. Narasimha. 9. A. 2012 9/4/2012 25 . M. F. X. D. Ramamritham. B. E. In: VLDB (2007) Wong. 8..: k-support anonymity based on pseudo taxonomy for outsourcing of frequent itemset mining.. In: VLDB (2005) Tai. Hung. L.-L. H.. Monreale. A. R. D. H. M.. G. N. F. Wang. 4.. Mamoulis. G..: Security in outsourcing of association rule mining.: Verifying completeness of relational query results in data publishing.: Privacy-preserving mining of association rules from outsourced transaction databases. D. B. Hadjieleftheriou... 7. Yu. W.. Kao... 2.. Cheung.: An audit environment for outsourcing of frequent itemset mining. Chen. K. Trans. E. E. Kollios. L.-H.. Wang.. Jain. 6.. Cheung. P. Mamoulis.K. PVLDB 2 (2009) Xie. C.: Authentication and integrity in outsourced databases.: Dynamic authenticated index structures for outsourced databases. Storage 2 (May 2006) Pang.W. Tan.. In: SIGMOD (2005) Sion.V.: Integrity auditing of outsourced data.References 1.. Pedreschi. J. Reyzin.K. Yin... Lakshmanan.

2012 9/4/2012 26 .Q&A • Thanks ! VLDB Cloud Intelligence workshop.