Yao Rong
Jie Li
Ashwin Machanavajjhala
{er40, yao.rong}@duke.edu
{jieli, ashwin}@cs.duke.edu

ABSTRACT

General Terms
Algorithms, Security

Keywords

1. INTRODUCTION
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
DBSocial '13, New York, NY, USA
Copyright 2013 ACM 978-1-2191-4 ...$15.00.
2. RELATED WORK

3. PROBLEM FORMULATION

Figure 1 shows the local social network around an end-user named Alice. Alice has a set of public attributes accessible to all of her friends. She has a hidden attribute, say, her ability to speak Mandarin. She wants to find out, to the best of her knowledge, whether an adversary can guess this information based on her public attributes. She is also worried about whether publicizing this attribute would breach any of her friends' privacy. Specifically, based on the structure of her local network, Alice wants to:

sents Alice herself. Assume each user $i$ has $M$ binary attributes/features $\{x_{ij}\}_{j=1}^{M}$, where $x_{ij} \in \{0, 1\}$. For some user-attribute pairs $(i, j)$, $x_{ij}$ is public, while for others $x_{ij}$ is hidden. If $x_{ij} = 1$, we say that user $i$ has a positive $j$th attribute/feature. Conversely, $x_{ij} = 0$ means that user $i$ has a negative $j$th attribute/feature. If $x_{ij}$ is not known (missing), then user $i$ has a hidden (or private) $j$th attribute/feature. Under this construction, Alice has access to $T = (L, X)$ for all public $x_{ij}$s.

Suppose Alice has a hidden attribute $x_{0m} \in \{0, 1\}$. We seek to:
3.1

In this section we formulate our error function $E(x_{ij})$ associated with the estimator $f(x_{ij})$. We first define the deviation function $g(x_{ij}) = g(i, j)$ as

$$g(i, j) = |x_{ij} - f(x_{ij})| \qquad (1)$$
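As a minimal sketch of the deviation function in Eq. (1), assuming the estimator $f$ returns a score in $[0, 1]$ for a binary attribute, the following Python computes $g(i, j)$ for one pair and averages it over several pairs. The helper `mean_deviation` is a hypothetical illustration only; it is not claimed to be the paper's error function $E$.

```python
from typing import Sequence


def deviation(x: int, f_x: float) -> float:
    """g(i, j) = |x_ij - f(x_ij)| for a binary attribute x_ij.

    Assumes the estimator output f_x is a score in [0, 1].
    """
    if x not in (0, 1):
        raise ValueError("x_ij must be 0 or 1")
    return abs(x - f_x)


def mean_deviation(xs: Sequence[int], estimates: Sequence[float]) -> float:
    """Hypothetical aggregate: the average of g over several (i, j) pairs."""
    if not xs or len(xs) != len(estimates):
        raise ValueError("need non-empty sequences of equal length")
    return sum(deviation(x, f) for x, f in zip(xs, estimates)) / len(xs)


# Usage: a confident correct guess, a mild miss, and a bad miss.
print(mean_deviation([1, 0, 1], [0.9, 0.2, 0.4]))
```

Because $x_{ij}$ is binary, $g$ is small exactly when the estimator's score lies close to the true value.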
4. ATTRIBUTE INFERENCE

4.1

4.2 Utility Functions

4.2.1 Importance of a friend

4.2.2 Value of an attribute

Observe that $t$ runs through the friends of user $i$ with (positive) feature $j$, and $|I_t^+|$ is the number of friends and positive features associated with user $t$. As in the case of $u(i, i')$, we downplay the significance of high-degree social nodes. This utility function is designed so that feature $j$ is more significant to user $i$ if more of $i$'s friends have feature $j$.

4.2.3 Power of an attribute

4.3 Deterministic Algorithm

$M_i = \{m \mid x_{im} \text{ is public}\}$.
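To make the notation concrete, here is a minimal Python sketch under stated assumptions: the attribute matrix $X$ is stored as nested lists with `None` marking a hidden entry, friendships as a dictionary of sets, and $M_i$ is computed directly from its definition. The attribute-value utility of Section 4.2.2 is given a hypothetical form that sums $1/|I_t^+|$ over friends $t$ of user $i$ with $x_{tj} = 1$; the inverse weight is an assumption, chosen only because it downplays high-degree nodes as the text describes. All function names are illustrative.

```python
from typing import Dict, List, Optional, Set

# Hypothetical encoding: X[i][j] is 1 (positive), 0 (negative) or None (hidden).
Matrix = List[List[Optional[int]]]


def public_attributes(X: Matrix, i: int) -> Set[int]:
    """M_i = {m | x_im is public}: indices of user i's non-hidden entries."""
    return {m for m, x in enumerate(X[i]) if x is not None}


def positive_count(X: Matrix, friends: Dict[int, Set[int]], t: int) -> int:
    """|I_t^+|: number of t's friends plus t's public positive features."""
    return len(friends[t]) + sum(1 for x in X[t] if x == 1)


def attribute_value(X: Matrix, friends: Dict[int, Set[int]], i: int, j: int) -> float:
    """Assumed form of the value of feature j to user i.

    Each friend t with x_tj = 1 contributes 1 / |I_t^+|, so friends with
    many links or many positive features count for less. The count is at
    least 1 whenever x_tj = 1, so there is no division by zero.
    """
    return sum(1.0 / positive_count(X, friends, t)
               for t in friends[i] if X[t][j] == 1)


# Usage: user 0 (Alice) with two friends; her attribute 1 is hidden.
X = [[1, None, 0], [1, 1, None], [0, 1, 1]]
friends = {0: {1, 2}, 1: {0}, 2: {0}}
print(public_attributes(X, 0))
print(attribute_value(X, friends, 0, 1))
```

Both of Alice's friends have feature 1 and each has $|I_t^+| = 3$, so the sketch assigns that feature a value of $2/3$ for her.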
4.4 Logistic Regression

4.5 Matrix Factorization

Specifically, we assume

$$W_{ij} \sim \mathcal{N}(d_i^T s_j, \alpha^{-1}), \quad \alpha \sim \mathrm{Gamma}(a, b),$$
$$d_{ki} \sim \mathcal{N}(0, 1), \quad s_{kj} \sim \mathcal{N}(0, \beta^{-1}),$$
$$\beta \sim \mathrm{Gamma}(c, d), \quad K \sim \mathrm{Uniform}(1, \ldots, K_{\max}).$$

Let $\theta = (\alpha, \beta, K)$. We use the Integrated Nested Laplace Approximation (INLA) [10] to approximate the posterior predictive distribution $p(W_{ij} \mid W)$. First, approximate the marginal posterior $p(\theta \mid W)$ by

$$p(\theta \mid W) \approx \tilde{p}(\theta \mid W) \propto \left. \frac{p(W, D, S, \theta)}{\tilde{p}(D, S \mid W, \theta)} \right|_{(D, S) = (D^*, S^*)},$$

where the precisions $H_{d_i}$ and $G_{s_j}$ are the Hessians evaluated at the modes. We can use $\tilde{p}(\theta \mid W)$ to approximate $p(W_{ij} \mid W)$ for all public $(i, j)$ by approximating the integral

$$p(W_{ij} \mid W) = \int p(W_{ij} \mid d_i, s_j, \theta)\, p(d_i, s_j \mid W, \theta)\, \tilde{p}(\theta \mid W)\, ds_j\, dd_i\, d\theta.$$

5. EXPERIMENTS

5.1 Datasets

5.1.1 Google+ dataset

5.1.2

The Facebook sampling dataset collected at UCI [4] contains the network of nearly one million unique users, their network IDs and their privacy settings. Each person can have zero, one or multiple network IDs, and exactly four privacy settings: add as friend, photo thumbnail, view friends, and send message. As more than 90% of users keep the default privacy settings (all enabled), we pre-process the dataset to minimize the number of overlapping attributes that add little information about the identity of the users. We can also regard nodes with an exceptionally large number of friends as the sensitive attributes and test how well our model predicts these links.

5.1.3

We created a new Facebook dataset corresponding to profiles of students at Duke University. We crawled the Facebook pages of Duke students and retrieved attributes such as gender, education, employment, and likes. We use employment as our sensitive attribute.

The Duke Online Phonebook is a service available to all Duke students, staff and faculty that returns a comprehensive set of attributes about Duke affiliates. We use data from the Duke Phonebook as ground truth (when the Facebook profiles of Duke students are hidden) and use it to verify the accuracy of attribute inference by our algorithms. Table 1 summarizes the datasets.
Table 1: Summary of datasets

Dataset    Nodes   Attributes      Domain Size
Google+    5200    School, Work    275
UCI FB     984K    Popular Nodes   367
Duke FB    1475    Work            69
5.2 Experimental Setup

$$\frac{1}{|M_0|} \sum_{m \in M_0} |x_{0m} - f(x_{0m})|, \qquad (12)$$
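Equation (12) averages the deviation over Alice's public attributes $M_0$. A minimal sketch follows, assuming each estimate $f(x_{0m})$ was produced with $x_{0m}$ itself withheld from the estimator (an assumption about the evaluation protocol); `task1_error` is a hypothetical name.

```python
from typing import Mapping


def task1_error(truth: Mapping[int, int], estimates: Mapping[int, float]) -> float:
    """(1 / |M_0|) * sum over m in M_0 of |x_0m - f(x_0m)|.

    truth maps each attribute index m in M_0 to Alice's public value x_0m;
    estimates maps the same m to f(x_0m), assumed to be inferred with
    x_0m held out.
    """
    if not truth:
        raise ValueError("M_0 must be non-empty")
    return sum(abs(x - estimates[m]) for m, x in truth.items()) / len(truth)


print(task1_error({0: 1, 3: 0}, {0: 0.75, 3: 0.25}))  # 0.25
```

Keying both dictionaries by attribute index keeps $M_0$ implicit: it is simply the key set of `truth`.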
Figure 5: Scatter plot of degree of the user versus inference error for Task 1 using the Matrix algorithm for the Duke Facebook dataset.

$$B'(m) = 1 - \frac{1}{|\Gamma_{0m}|} \sum_{i \in \Gamma_{0m}} |x_{im} - f(x_{im} \mid x_{0m})|, \qquad (15)$$
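Equation (15) turns an average deviation over a set of users into an accuracy-like score. The sketch below assumes that set is given as the keys of a dictionary (the users $i$ over whom the sum runs) and that each estimate $f(x_{im} \mid x_{0m})$ was computed after Alice's attribute $x_{0m}$ is made public; the function name `accuracy_given_x0m` is hypothetical.

```python
from typing import Hashable, Mapping


def accuracy_given_x0m(truth: Mapping[Hashable, int],
                       estimates: Mapping[Hashable, float]) -> float:
    """1 - (1 / |set|) * sum over users i of |x_im - f(x_im | x_0m)|.

    truth maps each user i in the averaging set to x_im; estimates maps the
    same users to f(x_im | x_0m), assumed to be inferred with x_0m public.
    """
    if not truth:
        raise ValueError("the averaging set must be non-empty")
    total = sum(abs(x - estimates[i]) for i, x in truth.items())
    return 1.0 - total / len(truth)


print(accuracy_given_x0m({"bob": 1, "carol": 0}, {"bob": 0.5, "carol": 0.0}))  # 0.75
```

A score of 1 means every listed user's attribute $m$ is inferred exactly once $x_{0m}$ is public; lower values mean larger average deviation.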
5.3 Algorithms

5.4 Results
Google+          UCI FB           Duke FB
.6844 ± .1068    .7490 ± .1233    .7511 ± .0965
.7635 ± .0788    .6812 ± .1381    .7186 ± .0611
.8073 ± .0917    .7249 ± .1192    .7401 ± .0824
.5082 ± .1385    .5201 ± .1305    .6257 ± .0717

Google+          UCI FB           Duke FB
.0217 ± .0079    .0091 ± .0038    .0419 ± .0064
.0225 ± .0062    .0057 ± .0027    .0327 ± .0071
.0334 ± .0093    .0048 ± .0016    .0648 ± .0108
.0119 ± .0027    .0021 ± .0035    .0196 ± .0163
Figure 3: Fraction of users with inference errors less than a threshold, for each of the three datasets used in Task 1.

Figure 4: Fraction of users that experience an accuracy improvement of at least a given threshold after Alice publicizes her hidden attribute $x_{0m}$.
work can give a reasonable estimate of hidden attributes: information content in social networks is densely clustered.

6. CONCLUSION

Task 2: determine whether publicizing her hidden attribute would breach her friends' privacy.
We presented three novel schemes to answer the above two questions. While the proposed framework cannot prove whether Alice's profile is free of attribute inference, our results indicate that in some cases even the local network can give a reasonable estimate of hidden attributes, and thus the framework can be used to warn individuals of such privacy breaches.
7. REFERENCES