Presentation 1

Efficient Closeness Privacy Metric for Data Publishing
P. Rajesh (210CS2001)
Need Of Privacy with Utility

• Collection of digital information by governments, corporations, and individuals
• Publishing of data driven by mutual benefits and regulations
• Demand for exchange and publication of data among various parties
Example:
• Netflix, a popular online movie rental service, published a data set containing movie ratings of 500,000 subscribers, in a drive to improve the accuracy of movie recommendations based on personal preferences [New York Times, 2006]
• Licensed hospitals in California are required to submit specific demographic data on every patient discharged from their facility [Carlisle et al. 2007]
Need Of Privacy with Utility

• Published data (original data) contains sensitive information about individuals
• Such data publications violate individual privacy
• Published data is framed by policies, guidelines, and agreements
• These lead to excessive data distortion or insufficient protection
• A method is needed for publishing useful information while preserving data privacy, which is called Privacy Preserving Data Publishing
• Privacy preserving data publishing protects the data while still providing useful information
Ladder of Privacy Models

• k-anonymity
• l-diversity
• t-closeness
• (n, t)-closeness
Basic words in Tables

SSN   Name   Job        Sex      Age   Disease
101   Mr.A   Engineer   Male     35    Hepatitis
102   Mr.B   Engineer   Male     38    Hepatitis
209   Mr.C   Lawyer     Male     38    HIV
340   Mr.D   Writer     Female   30    Flu
789   Mr.E   Writer     Female   30    HIV
123   Mr.G   Dancer     Female   30    HIV
657   Mr.H   Dancer     Female   30    HIV

• Explicit Identifier: explicitly identifies the record owner (Name, SSN)
• Quasi Identifier: potentially identifies the record owner (sex, age, zip code)
• Sensitive Attribute: person-specific information (Disease, Salary)
Concerning Disclosures
• Identity Disclosure: the attacker can identify a subject or respondent from the released data
• Attribute Disclosure: confidential information about a data subject is revealed
k-anonymity
Let RT(A1, ..., An) be a table and QI_RT be the quasi-identifier associated with it. RT is said to satisfy k-anonymity if and only if each sequence of values in RT[QI_RT] appears with at least k occurrences in RT[QI_RT].
Drawbacks
• k-anonymity protects against identity disclosure but not attribute disclosure
• Homogeneity attack
• Background knowledge attack
l-diversity
An equivalence class is said to have l-diversity if there are at least l well-represented values for the sensitive attribute. A table is said to have l-diversity if every equivalence class of the table has l-diversity.
Drawbacks
• The l-diversity principle goes beyond k-anonymity in protection against attribute disclosure
• Achieving it is difficult
• It is insufficient to prevent attribute disclosure
• Similarity attack
• Skewness attack
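The two definitions above can be checked mechanically. The sketch below is illustrative only: it uses the rows of the example table, picks sex and age as quasi-identifiers (an assumption), and reads "well-represented" as "distinct", which is the simplest of several possible interpretations.

```python
from collections import Counter, defaultdict

# Rows of the example table: (job, sex, age, disease).
RECORDS = [
    ("Engineer", "Male", 35, "Hepatitis"),
    ("Engineer", "Male", 38, "Hepatitis"),
    ("Lawyer", "Male", 38, "HIV"),
    ("Writer", "Female", 30, "Flu"),
    ("Writer", "Female", 30, "HIV"),
    ("Dancer", "Female", 30, "HIV"),
    ("Dancer", "Female", 30, "HIV"),
]
SEX, AGE, DISEASE = 1, 2, 3  # column positions, chosen for this sketch


def is_k_anonymous(records, qi_columns, k):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(row[c] for c in qi_columns) for row in records)
    return all(size >= k for size in groups.values())


def is_distinct_l_diverse(records, qi_columns, sensitive_column, l):
    """True if every equivalence class holds at least l distinct sensitive values."""
    classes = defaultdict(set)
    for row in records:
        classes[tuple(row[c] for c in qi_columns)].add(row[sensitive_column])
    return all(len(values) >= l for values in classes.values())


print(is_k_anonymous(RECORDS, [SEX, AGE], 2))            # (Male, 35) occurs only once
print(is_k_anonymous(RECORDS, [SEX], 3))                 # classes of size 3 and 4
print(is_distinct_l_diverse(RECORDS, [SEX], DISEASE, 2))
print(is_distinct_l_diverse(RECORDS, [SEX], DISEASE, 3))
```

Grouping on sex alone, the Female class is 2-diverse even though three of its four records share the value HIV, which is the kind of imbalance the skewness attack exploits.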
t-closeness
An equivalence class is said to have t-closeness if the distance between the distribution of a sensitive attribute in this class and the distribution of the attribute in the whole table is no more than a threshold t. A table is said to have t-closeness if all equivalence classes have t-closeness.
Drawbacks
• Distance between two probability distributions is measured using the earth mover's distance (EMD) method
• EMD does not satisfy the probability scaling property expected of the distance measure
• Loss of information, i.e. utility is lower
• The extension of t-closeness is (n, t)-closeness
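A minimal sketch of the t-closeness test with EMD, under stated assumptions: the "equal distance" form (all categorical values equally far apart, so EMD is half the L1 distance) and the "ordered distance" form (values on an ordered scale with unit gaps, normalised by m - 1). The function names and the example distributions, taken from the Disease column of the example table grouped by sex, are choices made here for illustration.

```python
def emd_equal_distance(p, q):
    """EMD when every pair of values is at distance 1: half the L1 distance."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def emd_ordered(p, q):
    """EMD for m ordered values with unit gaps, normalised by m - 1."""
    running, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        running += pi - qi      # mass that still has to move past this value
        total += abs(running)
    return total / (len(p) - 1)


def satisfies_t_closeness(class_dist, table_dist, t, distance=emd_equal_distance):
    """True if the class distribution is within t of the whole-table distribution."""
    return distance(class_dist, table_dist) <= t


# Order of values: Hepatitis, HIV, Flu.
table_dist = [2 / 7, 4 / 7, 1 / 7]   # whole table (7 records)
female_dist = [0.0, 3 / 4, 1 / 4]    # Female equivalence class (4 records)

print(emd_equal_distance(female_dist, table_dist))
print(satisfies_t_closeness(female_dist, table_dist, t=0.3))
print(satisfies_t_closeness(female_dist, table_dist, t=0.2))
```

For these distributions the equal-distance EMD works out to 2/7, so the class passes with t = 0.3 and fails with t = 0.2.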
(n, t)-closeness
• Extends t-closeness with a different distance metric
• t-closeness and (n, t)-closeness protect against attribute disclosure
Drawbacks
• Does not deal with identity disclosure
• It is desirable to use (n, t)-closeness together with k-anonymity
• Possibility of:
  • Homogeneity attacks
  • Background knowledge attacks
  • Similarity attacks
Need of New Model

• Privacy is important in publishing data
• t-closeness achieves privacy
• (n, t)-closeness also achieves privacy by maintaining the same t-closeness metric
• Both compromise on either privacy or utility
• The trade-off between privacy and utility should be good
• Data is preserved using the t-closeness metric
• To provide t-closeness, the distance between two probability distributions is measured
• The distance measure forms the correlation between privacy and utility
• Deploying another distance metric may give a chance of a more efficient model
Design of New Model
Bhattacharyya Distance Measure

• Distance between two probability distributions is measured using the Bhattacharyya distance metric
• Bhattacharyya satisfies all distance metric properties
• Formula: one form for a discrete attribute and one for a continuous attribute, where p(x) and q(x) are the probabilities of x in one equivalence class and in the whole table
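For a discrete attribute, the standard Bhattacharyya coefficient is BC(p, q) = sum over x of sqrt(p(x) q(x)); for a continuous attribute the sum becomes an integral. Two distances are commonly built on it: -ln BC and sqrt(1 - BC). Which form the slide's formula used is not recoverable, so the sketch below shows both as an assumption. Only the sqrt(1 - BC) form obeys the triangle inequality. The example distributions come from the Disease column of the example table and are chosen here for illustration.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Overlap between two discrete distributions: sum of sqrt(p(x) * q(x))."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def bhattacharyya_log_distance(p, q):
    """-ln(BC); infinite when the distributions do not overlap at all."""
    bc = bhattacharyya_coefficient(p, q)
    return math.inf if bc == 0 else -math.log(bc)


def bhattacharyya_sqrt_distance(p, q):
    """sqrt(1 - BC); this form satisfies the triangle inequality."""
    return math.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(p, q)))


# Order of values: Hepatitis, HIV, Flu.
table_dist = [2 / 7, 4 / 7, 1 / 7]   # q(x): whole table
class_dist = [0.0, 3 / 4, 1 / 4]     # p(x): one equivalence class

print(bhattacharyya_coefficient(class_dist, table_dist))
print(bhattacharyya_log_distance(class_dist, table_dist))
print(bhattacharyya_sqrt_distance(class_dist, table_dist))
```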
Bhattacharyya vs (n, t) Distance Measure

• Computed the distance between two probability distributions using the Bhattacharyya distance method and the (n, t) distance method
Experimental Results
• D1: distance value using the Bhattacharyya method
• D2: distance value using the (n, t) method
Variations with (n, t)-closeness Distance

• The Bhattacharyya distance is comparatively close to the (n, t) distance method
• Anonymization hides the information
Experimental Results
• Anonymization is inversely related to utility
• Amount of anonymization is less
• Loss of information is very small
• Better utility than (n, t)-closeness
Dealing With Identity Disclosure

• Identity disclosure is one of the serious privacy concerns
• k-anonymity is a well-known model for identity disclosure
• The problem with k-anonymity is that an intruder can learn the confidential information of individuals in anonymized data
• The solution is the Data Reconstruction Approach
Data Reconstruction Approach

• Mainly two steps:
  1) Randomization with discretized median
  2) Swapping with 2nd order distribution
Randomization with Discretized Median

• Randomization means adding noise to column data
• X is the original data
• Y is the noise data
• Z is the randomized data, i.e. Z = X + Y
• The original table has attribute X with values (x1, x2, x3, ..., xn), where n is the number of records
• Y is the noise data with values (y1, y2, ..., yk), where yi is the i-th group median and k is the number of discretized groups
• Z is (z1, z2, z3, ..., zn)
Computing Noise
The Age attribute values are divided into two groups and each value is masked by its group median: age <= 34 with median = 28, and age > 34 with median = 40.

Original and anonymized tables (only the Age column changes)

ID   Age (original)   Age (anonymized)   Marital Status   BP        Blood Type   Test Result
1    23               28                 Never married    75/120    O            Negative
2    23               28                 Never married    66/113    O            Positive
3    27               28                 Never married    74/115    A            Negative
4    28               28                 Never married    77/128    AB           Negative
5    32               28                 Married          72/125    B            Negative
6    33               28                 Married          93/147    O            Negative
7    35               40                 Married          75/124    AB           Positive
8    37               40                 Divorced         95/142    O            Negative
9    40               40                 Widow(er)        88/146    A            Positive
10   43               40                 Divorced         110/155   O            Positive
11   45               40                 Married          90/140    O            Positive
12   45               40                 Divorced         104/145   B            Positive
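A small sketch of the masking step in the Age example, assuming groups are defined by inclusive upper bounds (here a single bound at 34). The function name and signature are chosen here for illustration. It uses `statistics.median`, which averages the two middle values of an even-sized group, so it yields 27.5 and 41.5 rather than the 28 and 40 shown in the example.

```python
from bisect import bisect_left
from statistics import median


def mask_with_group_median(values, upper_bounds):
    """Replace each value by the median of its discretized group.

    upper_bounds is a sorted list of inclusive upper limits; values above
    the last limit form the final group.
    """
    groups = {}
    for v in values:
        groups.setdefault(bisect_left(upper_bounds, v), []).append(v)
    medians = {g: median(members) for g, members in groups.items()}
    return [medians[bisect_left(upper_bounds, v)] for v in values]


ages = [23, 23, 27, 28, 32, 33, 35, 37, 40, 43, 45, 45]
print(mask_with_group_median(ages, upper_bounds=[34]))
```

Swapping `median` for `statistics.median_low` or `statistics.median_high` would keep every masked value equal to an age that actually occurs in the group.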
Experimental Results
Swapping
• Swapping with second-order frequency distribution
• Algorithm
  Input: the set of all frequency tables Ft, 0 <= t <= 2
  Output: an N x V database D consistent with Ft
  For i = 1 to N Do
      Choose(i, 1, f0(1));
      For j = 2 to V Do
          p2 = 0;
          For k = 1 to j-1 Do
              p1 = f1(i, k, j);
              p2 = f2(p1, p2);
          End
      End
  End
Algorithm terms
• Choose(i, j, p) sets Dij = 0 with probability p, and to 1 otherwise
• There are obvious choices for the selection of f0, f1, f2, and f3. Notably:
  • f1(i, j, k) is defined piecewise: one value if F1[vj = Di,j] ..., another otherwise
  • f2(p1, p2) = p1 + p2
  • f3(p2, n) = p2 / (n - 1)
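A runnable sketch of the generation loop for a binary table, with several labeled assumptions. The slide does not give f0 or the full definition of f1, and its loop never shows where f3 is applied. Here f0 is taken as the share of zeros in the first column, f1 as an estimated conditional probability from the first- and second-order counts, and f3 is applied in a final Choose for each column. The helper names are hypothetical.

```python
import random
from collections import Counter


def frequency_tables(data):
    """Count single values (F1) and value pairs (F2) in a binary table."""
    f1, f2 = Counter(), Counter()
    for row in data:
        for a, va in enumerate(row):
            f1[(a, va)] += 1
            for b, vb in enumerate(row):
                if a != b:
                    f2[(a, va, b, vb)] += 1
    return f1, f2


def choose(p, rng):
    """Return 0 with probability p and 1 otherwise, like Choose on the slide."""
    return 0 if rng.random() < p else 1


def generate(data, n_rows, seed=0):
    """Build n_rows synthetic binary rows from the frequency tables of data."""
    rng = random.Random(seed)
    total, n_cols = len(data), len(data[0])
    f1, f2 = frequency_tables(data)
    synthetic = []
    for _ in range(n_rows):
        row = [choose(f1[(0, 0)] / total, rng)]  # assumed f0: share of zeros in column 1
        for j in range(1, n_cols):
            p2 = 0.0
            for k in range(j):
                seen = f1[(k, row[k])]
                # Assumed f1: estimated P(column j is 0 | column k has the chosen value),
                # falling back to the marginal of column j if that value never occurs.
                p1 = f2[(k, row[k], j, 0)] / seen if seen else f1[(j, 0)] / total
                p2 = p1 + p2                     # f2(p1, p2) = p1 + p2
            row.append(choose(p2 / j, rng))      # assumed use of f3(p2, n) = p2 / (n - 1)
        synthetic.append(row)
    return synthetic


original = [[0, 0, 1], [0, 1, 1], [0, 1, 0], [0, 0, 0]]
for row in generate(original, 4):
    print(row)
```

Because the rows are drawn at random, the output matches the frequency tables only approximately; a fixed seed makes a run repeatable.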
Future Work
• Design a clustering-based Efficient Closeness algorithm with overlapping equivalence classes
• Apply Efficient Closeness in social networking to prevent the neighbourhood attack
Conclusion
• The trade-off between privacy and utility should be good in publishing data, considering identity disclosure and attribute disclosure, which can be addressed by a closeness metric using the Bhattacharyya distance method together with the Data Reconstruction Approach
• Existing privacy models were unable to maintain good utility, while our approach achieved it, and is therefore called the Efficient Closeness Privacy Metric for Publishing Data
References
Fung et al., Privacy-Preserving Data Publishing: A Survey of Recent Developments, ACM Computing Surveys, Vol. 42(4), June 2010.
Charu C. Aggarwal, Jian Pei, Bo Zhang, On privacy preservation against adversarial data mining, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, USA, 2006.
Charu C. Aggarwal, Philip S. Yu, Privacy-Preserving Data Mining: Models and Algorithms, Springer, Advances in Database Systems, Vol. 34, 2008.
George T. Duncan, Thomas B. Jabine, Virginia A. de Wolf, Private Lives and Public Policies, National Academy Press, 1993.
Mohammad S. Obaidat, Joaquim Filipe, e-Business and Telecommunications, Springer, Communications in Computer and Information Science, 6th International Joint Conference, July 2009.
L. Sweeney, k-anonymity: a model for protecting privacy, International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5), 2002, pp. 557-570.
N. Li, T. Li, S. Venkatasubramanian, t-closeness: Privacy beyond k-anonymity and l-diversity, Proceedings of the 23rd IEEE International Conference on Data Engineering (ICDE), 2007.
A. Machanavajjhala, D. Kifer, J. Gehrke, M. Venkitasubramaniam, l-diversity: Privacy beyond k-anonymity, ACM Trans. Knowl. Discov. Data 1, 2007.
R. Motwani, Y. Xu, Efficient algorithms for masking and finding quasi-identifiers, Proceedings of the Conference on Very Large Data Bases (VLDB), 2007.
Ninghui Li, Tiancheng Li, Suresh Venkatasubramanian, Closeness: A new privacy measure for data publishing, IEEE Transactions on Knowledge and Data Engineering, Vol. 22(7), July 2010, pp. 943-956.
S. P. Reiss, Practical data-swapping: The first steps, ACM Trans. Datab. Syst. 9, 1, 20-37, 1984.
K. G. Derpanis, The Bhattacharyya Measure, Mendeley, Computer, Volume 1, Issue 4, Pages 1990-1992, 2008.
Road Map
Period                 Activity
June - September       Literature survey and learning R programming language
September - October    Proposal of model, theoretical computations
October - November     Practical implementations of Bhattacharyya, randomization, comparison of distance methods
December - January     Swapping with 2nd order distribution implementation
January - February     Evaluation of methods using various parameters
February - March       Thesis report
March - April          Review of thesis
Thank you!
Seminar Presentation under the Guidance of
Prof. K Satya Babu
