Curso - Protect Yourself From Curse of Attribute Inference

curso: Protect Yourself from Curse of Attribute Inference
A social network privacy-analyzer

Eunsu Ryu
Yao Rong
Jie Li
Ashwin Machanavajjhala
Dept. of Computer Science
Dept. of Electrical & Comp. Engg
Duke University, Durham, NC, USA
{er40, yao.rong}@duke.edu
{jieli, ashwin}@cs.duke.edu

ABSTRACT
While social networking platforms allow users to control how their private information is shared, recent research has shown that a user's sensitive attribute can be inferred based on friendship links and group memberships, even when the attribute value is not shared with anyone else. Thus, existing access control mechanisms are unable to protect against such privacy breaches.

Our research goal is to develop tools that help a user Alice be aware of privacy breaches via attribute inference. In this paper, we specifically focus on two problems: (a) whether Alice's sensitive attribute can be inferred based on public information in Alice's neighborhood, and (b) whether making Alice's sensitive attribute public leads to the disclosure of sensitive information of another user Bob in Alice's neighborhood. We propose three algorithms to detect the aforementioned privacy breaches. We limit our scope to the one-hop neighbors of Alice, that is, information that is visible to an app that can be executed on behalf of Alice. Our results indicate that analyzing local networks is sufficient to extract a significant amount of information about most users.

Categories and Subject Descriptors
H.2 [Database Management]: Data mining

General Terms
Algorithms, Security

Keywords
social networks, attribute inference

1. INTRODUCTION
Social networks have gained wide popularity over the past decade. While the unprecedented success of the social networking industry has established an attractive ecosystem with advertisements and social gaming, the increasing volume of personal information shared in social networks has raised serious privacy concerns [2, 7, 9, 13]. For instance, social networks have been criticized for leaking user privacy [6], and advertisers take advantage of social networks to collect information about users.

As a remedy, social networking companies allow users to hide a portion of their profiles, or to select specific groups of friends with whom to share sensitive information. Unfortunately, it has been shown that this approach does little to protect users from privacy breaches. Recent work [5, 13] demonstrates that it is still possible to infer sensitive user attributes to an embarrassingly high accuracy using only friendship and group information. The fact that every user publishes different parts of their profile implies that private information is present, can be learned, and is even shared subconsciously in the social network. For instance, while Alice may want to keep the fact that she can speak Mandarin private, if all of her friends publicize the fact that they speak Mandarin, then one might infer with high probability that Alice also speaks Mandarin.

Therefore, the access control mechanisms provided by social networks cannot protect against such privacy breaches, and in fact lull users into a false sense of privacy. It is thus important to raise awareness amongst social network users about the possibility of the aforementioned attribute inference attacks. Our research goal is to build tools that can execute on behalf of a user, detect potential attribute inference attacks, and warn the user so that they can make an informed decision. In this paper, we present some initial work toward this goal.

Contributions: In this paper we focus on two concrete problems: (a) whether a user Alice's sensitive information can be inferred based on public attributes of her friends, and (b) whether making Alice's attribute value publicly accessible results in the disclosure of a private attribute value for another user Bob in Alice's neighborhood. While the former problem directly affects Alice's privacy, the latter problem may inform a conscientious friend of Bob about Bob's privacy disclosure. We depart from prior work in the following way: rather than analyzing the risks of attribute inference using large global networks that contain millions of users and thousands of attributes [5, 12, 13], we focus our attention on only the one-hop neighborhood of Alice. Not only does this allow our algorithms to be very efficient, but focusing on the immediate neighborhood also helps in building tools (future work) that run on behalf of the user by leveraging the information in the social network that is accessible to the user (via APIs). Using the entire social network would require these tools to be built in cooperation with the social network platform.

We propose three models for detecting and quantifying the aforementioned privacy breach scenarios. Our methods are evaluated on real social network data. Our results indicate that analyzing the one-hop neighborhood is sufficient to infer the values of private attributes for a significant fraction of users. However, it should be noted that the proposed framework cannot prove that Alice's profile is free of attribute inference, as the adversary can potentially have more information than is available to our tool.

Outline: The remainder of the paper is organized as follows. In Section 2, we discuss related work. Section 3 introduces notation and describes the problem formulation. Section 4 presents three novel models for attribute inference along with inference methods. In Section 5 we discuss experiments and performance evaluations of our models, and we conclude in Section 6.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
DBSocial '13, New York, NY, USA
Copyright 2013 ACM 978-1-2191-4 ...$15.00.
2. RELATED WORK
Many recent research publications analyze social network structures to infer hidden attributes. Zheleva & Getoor [13] use a large Facebook network to show that friendships and group memberships contain sufficient information to learn end-user hidden attributes with astounding accuracy. Other work along similar lines [7, 9] uses friendships and attribute information to evaluate risks of privacy breach. Backstrom et al. [2] also show active attacks through which adversaries may re-identify and learn sensitive information from anonymized social networks.

Many models have been proposed for inferring links and attributes in social networks. In [1, 5, 12], the authors use social-attribute networks (SANs) to jointly infer latent attributes as well as friendships.

The attribute inference problem can be formulated as a clustering problem [3], a matrix factorization problem [8], or a regression problem. The Restricted Boltzmann Machine (RBM) introduced in [11] also has an interesting application in inferring latent features such as hidden attributes. Using RBMs for social network attribute inference may be an interesting direction for future research.

3. PROBLEM FORMULATION
Figure 1 shows the local social network around an end-user named Alice. Alice has a set of public attributes accessible by all of her friends. She has a hidden attribute, say, her ability to speak Mandarin. She wants to find out, to the best of her knowledge, whether an adversary can guess this information based on her public attributes. She is also worried that publicizing this attribute would breach her friends' privacy. Specifically, based on the structure of her local network, Alice wants to:

Task 1: determine whether her secret can be guessed from public information.
Task 2: determine whether publicizing her hidden attribute would breach her friends' privacy.

Figure 1: Local network around an end-user Alice. Alice only has access to her local network.

Formally, consider a local network $L = (V, E)$ around a user (Alice). This network is a graph that consists of Alice and her friends; each node $i \in V$ represents a user while the edges in $E$ model friendships. Let $N$ denote the number of Alice's friends so that $|V| = N + 1$, and $i = 0$ represents Alice herself. Assume each user $i$ has $M$ binary attributes/features $\{x_{ij}\}_{j=1}^{M}$, where $x_{ij} \in \{0, 1\}$. For some user-attribute pairs $(i, j)$, $x_{ij}$ is public, while for others, $x_{ij}$ is hidden. If $x_{ij} = 1$, then we say that user $i$ has a positive $j$th attribute/feature. Conversely, $x_{ij} = 0$ means that user $i$ has a negative $j$th attribute/feature. If $x_{ij}$ is not known (missing), then user $i$ has a hidden (or private) $j$th attribute/feature. Under this construction, Alice has access to $T = (L, X)$ for all public $x_{ij}$'s.

Suppose Alice has a hidden attribute $x_{0m} \in \{0, 1\}$. We seek to:

• Quantify the amount of information about $x_{0m}$ that can be inferred by adversaries based on the structure of $T$ (for Task 1).
• Quantify the gain of information about the attribute $m$ across the network $L$ induced by publicizing $x_{0m}$ (for Task 2).

Specifically, we seek to design an estimator/predictor $f(x_{ij})$ for a hidden attribute $j$ of user $i$. Based on $f(x_{ij})$, we compute an error function $E(x_{ij})$ that evaluates the goodness of our estimation. We shall design $E(x_{ij})$ to have a low value if:

• An adversary can guess $x_{0m}$ with high accuracy (for Task 1).
• Alice breaches her friends' privacy by publicizing $x_{0m}$ (for Task 2).

For $\epsilon > 0$, we declare that privacy is breached at level $\epsilon$ if $E(x_{ij}) < \epsilon$.

3.1 Deviation and Error Metrics
In this section we formulate our error function $E(x_{ij})$ associated with the estimator $f(x_{ij})$. We first define the deviation function $g(x_{ij}) = g(i, j)$,

$$g(i, j) = |x_{ij} - f(x_{ij})| \tag{1}$$

as the residue in approximating $x_{ij}$ with $f(x_{ij})$.

Suppose Alice knows the value of her hidden attribute $x_{0m} \in \{0, 1\}$. For Task 1, we declare that $x_{0m}$ is breached at level $\epsilon$ if it is possible to infer $x_{0m}$ up to an error $\epsilon$ based on Alice's local network $L$:

$$E_0(m) \equiv g(0, m) = |x_{0m} - f(x_{0m})| < \epsilon. \tag{2}$$

Since $x_{0m}$ is either 0 or 1, another interpretation of $E_0(m)$ is that an adversary can use $f(x_{0m})$ to correctly guess $x_{0m}$ with probability $1 - E_0(m)$.
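As a small illustration of the Task 1 criterion, the sketch below computes the deviation of (1) and checks it against a breach level as in (2). The function names `deviation` and `is_breached` and the example values are hypothetical and not part of the original framework; the estimator output is simply passed in as a number in [0, 1].

```python
def deviation(x: int, f: float) -> float:
    """Deviation g = |x - f| between a binary attribute x and its estimate f, as in (1)."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    return abs(x - f)


def is_breached(x0m: int, f0m: float, eps: float) -> bool:
    """Task 1 criterion (2): breached at level eps if E_0(m) = |x0m - f0m| < eps."""
    return deviation(x0m, f0m) < eps


# Hypothetical example: Alice speaks Mandarin (x = 1) and the estimator outputs 0.9.
e0 = deviation(1, 0.9)
print(round(e0, 2), is_breached(1, 0.9, eps=0.2))
```

With these illustrative numbers the deviation is about 0.1, so the attribute counts as breached at level 0.2, and an adversary guessing from the estimate would be right with probability of about 0.9.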
For Task 2, we say that the $m$th attribute is breached at level $\epsilon$ due to $x_{0m}$ if the deviation after publicizing $x_{0m}$ is on average lower than that before publicizing $x_{0m}$. Let $\Omega_m = \{i \mid x_{im} \text{ is public}\}$ denote the set of Alice's neighbors whose $m$th attribute value is public. Then we have

$$E(m) \equiv \frac{1}{|\Omega_m|} \sum_{i \in \Omega_m} \big[ g(i, m) - g'(i, m) \big] < \epsilon. \tag{3}$$

Here $g(i, m) = g(x_{im})$ is the deviation without $x_{0m}$, while $g'(i, m) \equiv g(x_{im} \mid x_{0m})$ is the deviation given $x_{0m}$. With the error functions defined as above, we now turn our attention to the design of the estimator function $f(x_{ij})$ to infer attribute values.

4. ATTRIBUTE INFERENCE
In this section, we present three techniques for inferring private attribute values using a user's 1-hop neighborhood in a social network. Before we describe our algorithms, we describe the social-attribute network model and the utility metrics that will be used in our algorithms.

4.1 Social-Attribute Network Model
We adopt the notion of Social-Attribute Networks (SAN) [12]. A SAN can be constructed by augmenting the original network $L$ with $M$ distinct nodes corresponding to $M$ attributes. The original nodes corresponding to the users are called social nodes, and the new nodes representing the attributes are called attribute nodes. An undirected link between user $i$ and attribute $j$ is formed if $x_{ij}$ is public (positive or negative). Figure 2 shows an example of a SAN (from [5]).

Figure 2: Example of a simple SAN model (from [5])

The plus sign between a social node $u_i$ and an attribute node $j$ means $x_{ij} = 1$, while a minus sign signifies $x_{ij} = 0$. The mutex links tie a set of mutually exclusive attributes together so that no two mutually exclusive attributes are selected simultaneously.

Based on the above description of a SAN, we define a number of useful sets that will be used in this section. As before, we let $i$ represent a user and $j$ an attribute.

$I_l$ = {all users connected to user/attribute $l$}.
$I_i^+$ = {all friends and positive attributes of user $i$}.
$F_j^z$ = {all users with feature $j$ having value $z$}.
$G_i$ = {all friends of user $i$ in the network}.
$M_i = \{m \mid x_{im} \text{ is public}\}$.

4.2 Utility Functions
Here, we define three utility functions as weighted sums of common neighbors with lower weights on popular nodes.

4.2.1 Importance of a friend
We define the importance of user $i'$ to user $i$ as:

$$u(i, i') = \frac{1}{\log |G_{i'}|} \sum_{t \in I_i^+ \cap I_{i'}^+} \frac{1}{\log |I_t|}. \tag{4}$$

Note that $t$ runs through all the common friends and attributes between $i$ and $i'$. $|I_t|$ is the number of users connected to the user/attribute $t$, signifying the popularity of $t$. $|G_{i'}|$ is the total number of friends associated with $i'$. The logarithm $\log |I_t|$ is inspired by the Adamic-Adar (AA) notion defined in [7], in which popular friends and attributes are considered less significant. The multiplicative factor $1/\log |G_{i'}|$ takes into account the local nature of our network $L$ by further reducing the significance of a social node with a large number of friends (e.g. celebrities). $u(i, i')$ quantifies the significance of user $i'$ to user $i$, and is larger if $i$ and $i'$ share more friends (or attributes) in common.

4.2.2 Value of an attribute
Define $v(j, i)$ as the value of attribute $j$ to user $i$:

$$v(j, i) = \sum_{t \in I_i \cap I_j} \frac{1}{\log |I_t^+|}. \tag{5}$$

Observe that $t$ runs through the friends of user $i$ with (positive) feature $j$. $|I_t^+|$ is the number of friends and positive features associated with user $t$. As in the case of $u(i, i')$, we downplay the significance of high-degree social nodes. This utility function is designed so that feature $j$ is more significant to user $i$ if more of $i$'s friends have feature $j$.

4.2.3 Power of an attribute
The power of an attribute $j$ (having value $z$) to user $i$ is:

$$w_z(i, j) = \sum_{t \in I_i \cap F_j^z} u(i, t). \tag{6}$$

Here $t$ runs through all the friends of user $i$ having value $z$ for feature $j$ (i.e. $x_{tj} = z$). The power of $x_{ij} = z$ is obtained by adding up the importance of all the friends $i'$ of user $i$ having $x_{i'j} = z$. For example, the power of the ability to speak Chinese to Alice is computed by summing the importance of all of her Chinese-speaking friends.

We define the relative power of attribute $j$ to user $i$:

$$w_{ij} = w_1(i, j) - w_0(i, j).$$

Note that $w_{ij} > 0$ if and only if $x_{ij} = 1$ has more power/significance to user $i$ than does $x_{ij} = 0$. $w_{ij} = 0$ means that $x_{ij} = 1$ possesses equal importance to $x_{ij} = 0$.

Now we present three designs of the estimator $f(x_{ij})$.

4.3 Deterministic Algorithm
For a hidden attribute $x_{ij}$ of interest, we compute the relative power $w_{ij}$ as defined in (7):

$$w_{ij} = w_1(i, j) - w_0(i, j). \tag{7}$$

Since $w_{ij}$ can be any real number, we map it onto $(0, 1)$ to construct the estimator $f(x_{ij})$:

$$f(x_{ij}) = h(w_{ij}) = 1/[1 + \exp(-w_{ij})], \tag{8}$$

where $h(\theta) = 1/(1 + \exp(-\theta))$ is the sigmoid function. We say that $x_{ij} = 1$ is more likely than $x_{ij} = 0$ if $x_{ij} = 1$ has
more power relative to $x_{ij} = 0$ (i.e. $w_{ij} > 0$). Conversely, $x_{ij} = 0$ is more likely than $x_{ij} = 1$ if $w_{ij} < 0$. In short, $w_{ij} > 0$ implies that $x_{ij} = 1$ is a better guess than $x_{ij} = 0$.

4.4 Logistic Regression
Next, we use logistic regression to model an adversary trying to learn a sensitive attribute $x_{im}$ associated with user $i$ and a given feature $m$. Since $x_{im}$ takes on binary values and is currently hidden, we model

$$\Pr[x_{im} = 1] = h\Big( \sum_{i' \neq i} \big[ u(i, i') + v(m, i') \big] \beta_{i'} \Big) \tag{9}$$

using the utility functions $u(i, i')$ and $v(j, i')$ defined in (4) and (5). To learn the coefficients $\beta_{i'}$, we use a regularized maximum likelihood approach with an $\ell_1$ penalty on $\beta$. Specifically, we minimize $\ell(\beta)$ defined as

$$\ell(\beta) = -\sum_{i \in \Omega_m} \big[ x_{im} \log h(y_{im}) + (1 - x_{im}) \log(1 - h(y_{im})) \big] + \lambda \|\beta\|_1, \tag{10}$$

where $y_{im}$ denotes the argument of $h$ in (9) and the sum runs over the users $i \in \Omega_m$ whose $m$th attribute value is public.

We may use known algorithms (such as gradient methods) to solve the above optimization problem in $\beta$. Once the coefficients $\hat{\beta}$ are estimated, we may construct $f(x_{im})$ by simply computing the predictive value $\hat{y}_{im}$,

$$\hat{y}_{im} = \sum_{i' \neq i} \big[ u(i, i') + v(m, i') \big] \hat{\beta}_{i'},$$

and taking the sigmoid transformation:

$$f(x_{im}) = \widehat{\Pr}(x_{im} = 1) = h(\hat{y}_{im}).$$

4.5 Matrix Factorization
Here we use a Bayesian model to construct the estimator $f(x_{ij})$. For all public $x_{ij}$'s, we compute the relative power as in (7),

$$w_{ij} = w_1(i, j) - w_0(i, j), \qquad w_z(i, j) = \sum_{t \in I_i \cap F_j^z} u(i, t),$$

and organize them in an $(N + 1) \times M$ array $W = [W_{ij}]_{ij} \in \mathbb{R}^{(N+1) \times M}$. Observe that the matrix $W$ has missing values. Our goal is to estimate those missing entries $W_{ij}$ associated with the hidden attribute of interest $x_{ij}$.

We first assume that the matrix $W$ can be represented as the inner product of two latent matrices $D$ and $S$ plus some noise:

$$W = D^T S + E. \tag{11}$$

Specifically, we assume

$$W_{ij} \sim N(d_i^T s_j, \tau^{-1}), \quad \tau \sim \text{gamma}(a, b),$$
$$d_{ki} \sim N(0, 1), \quad s_{kj} \sim N(0, \alpha^{-1}),$$
$$\alpha \sim \text{gamma}(c, d), \quad K \sim \text{Uniform}(1, \ldots, K_{\max}).$$

Let $\theta = (\tau, \alpha, K)$. We use the Integrated Nested Laplace Approximation (INLA) [10] to approximate the posterior predictive distribution $p(W_{ij} \mid W)$. First, approximate the marginal posterior $p(\theta \mid W)$ by

$$p(\theta \mid W) \approx \tilde{p}(\theta \mid W) \propto \left. \frac{p(W, D, S, \theta)}{p_G(D, S \mid W, \theta)} \right|_{(D, S) = (D^*, S^*)},$$

where $p_G$ is the Gaussian approximation to $p(D, S \mid W, \theta)$ with mode at $(D^*, S^*)$. These modes can be approximated by stochastic gradient descent on $\log p(D, S \mid W, \theta)$. For simplicity, we employ

$$p_G(D, S \mid W, \theta) = \prod_i N(d_i \mid d_i^*, H_{d_i}^{-1}) \prod_j N(s_j \mid s_j^*, G_{s_j}^{-1}),$$

where the precisions $H_{d_i}$ and $G_{s_j}$ are the Hessians evaluated at the modes. We can use $\tilde{p}(\theta \mid W)$ to approximate $p(W_{ij} \mid W)$ for all public $(i, j)$ by approximating the integral

$$p(W_{ij} \mid W) = \int p(W_{ij} \mid d_i, s_j, \theta)\, p(d_i, s_j \mid W, \theta)\, p(\theta \mid W)\, \mathrm{d}d_i\, \mathrm{d}s_j\, \mathrm{d}\theta,$$

from which we can approximate the expectation $E(W_{ij} \mid W)$. We can then construct the estimator $f(x_{ij})$ by

$$f(x_{ij}) = h(E(W_{ij} \mid W)).$$

5. EXPERIMENTS
5.1 Datasets
5.1.1 Google+ dataset
The Google+ dataset introduced in [5] contains the social and attribute links (SAN) of roughly 5200 users collected separately at three different times of the year 2012. The authors use the education and employment profiles of the targets to construct a vocabulary of attributes. For analysis, we use education and employment attribute values that belong to more than five users.

5.1.2 UCI Facebook data
The Facebook sampling dataset collected at UCI [4] contains the network of nearly one million unique users, their network IDs and their privacy settings. Each person can have zero, one or multiple network IDs, and exactly four privacy settings: add as friend, photo thumbnail, view friends, send message. As more than 90% of users use the default privacy settings (all enabled), we pre-process the dataset to minimize the number of overlapping attributes that do not add much information about the identity of the users. We can also regard nodes with an exceptionally large number of friends as the sensitive attributes and test how well our model predicts these links.

5.1.3 Duke Facebook data
We created a new Facebook dataset corresponding to profiles of students at Duke University. We crawled Facebook pages of Duke students, and retrieved attributes such as gender, education, employment, and likes. We use employment as our sensitive attribute.

Duke Online phonebook is a service available to all Duke students, staff and faculty, which returns a comprehensive set of attributes about Duke affiliates. We use data from the Duke phonebook as ground truth (when the Facebook profiles of Duke students are hidden) and use it to verify the quality of detecting attribute inference using our algorithms. Table 1 shows the summary of datasets.

Dataset    Nodes   Attributes      Domain Size
Google+    5200    School, Work    275
UCI FB     984K    Popular Nodes   367
Duke FB    1475    Work            69

Table 1: Summary of datasets
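Before turning to the experimental setup, it may help to see the deterministic estimator of Section 4.3 run end to end on a toy network. The sketch below is only an illustration under stated assumptions: the four-user network, the attribute names, and the helper names (`degree`, `i_plus`, `importance`, `power`, `deterministic_estimate`) are hypothetical, and the clamp in `safe_log` is an addition of this sketch to avoid dividing by log 1 = 0 for nodes with a single connection, a case the formulas do not address.

```python
import math

# Toy local network (hypothetical data): user 0 is Alice, 1-3 are her friends.
FRIENDS = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
# Public binary attributes only; Alice's "mandarin" value is hidden, so it is absent.
ATTRS = {
    0: {"duke": 1},
    1: {"mandarin": 1, "duke": 1},
    2: {"mandarin": 1, "duke": 1},
    3: {"mandarin": 0, "duke": 1},
}


def safe_log(n: int) -> float:
    # Assumption of this sketch: clamp counts below 2 so that log(n) is never 0.
    return math.log(max(n, 2))


def degree(node) -> int:
    """|I_t|: number of users connected to a social node ("u", id) or attribute node ("a", name)."""
    kind, key = node
    if kind == "u":
        return len(FRIENDS[key])
    return sum(1 for a in ATTRS.values() if key in a)


def i_plus(i: int) -> set:
    """I_i^+: friends and positive attributes of user i, as tagged SAN nodes."""
    nodes = {("u", t) for t in FRIENDS[i]}
    nodes |= {("a", j) for j, x in ATTRS[i].items() if x == 1}
    return nodes


def importance(i: int, i2: int) -> float:
    """u(i, i') as in (4): weighted count of common neighbors, damped by |G_i'|."""
    common = i_plus(i) & i_plus(i2)
    return sum(1.0 / safe_log(degree(t)) for t in common) / safe_log(len(FRIENDS[i2]))


def power(i: int, j: str, z: int) -> float:
    """w_z(i, j) as in (6): total importance of friends of i whose public x_tj equals z."""
    return sum(importance(i, t) for t in FRIENDS[i] if ATTRS[t].get(j) == z)


def deterministic_estimate(i: int, j: str) -> float:
    """f(x_ij) as in (8): sigmoid of the relative power w_ij = w_1 - w_0."""
    w = power(i, j, 1) - power(i, j, 0)
    return 1.0 / (1.0 + math.exp(-w))


f_alice = deterministic_estimate(0, "mandarin")
print(f"estimated Pr[Alice speaks Mandarin] = {f_alice:.3f}")
```

In this toy network two well-connected friends publicly speak Mandarin and one weakly connected friend does not, so the relative power is positive and the estimate lands well above 0.5.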
5.2 Experimental Setup
In order to test the performance of our proposed approaches, we evaluate the prediction/inference accuracy for each of the three algorithms on held-out test data. Specifically, we randomly take out some public attributes (ground truth), then run our algorithms to reconstruct these values assuming that they are hidden. For the Duke dataset, we use ground truth from the online phonebook when available. For Task 1, if $M_0$ is the set of binary attributes on which we run attribute inference, the average prediction accuracy $A_0$ is computed as follows:

$$A_0 = 1 - \frac{1}{|M_0|} \sum_{m \in M_0} |x_{0m} - f(x_{0m})|. \tag{12}$$

Figure 5: Scatter plot of degree of the user versus inference error for Task 1 using the Matrix algorithm for the Duke Facebook dataset

For Task 2, we compute the improvement $\Delta B(m)$ defined as

$$\Delta B(m) = B'(m) - B(m), \tag{13}$$

$$B(m) = 1 - \frac{1}{|\Omega_m|} \sum_{i \in \Omega_m} |x_{im} - f(x_{im})|, \tag{14}$$

$$B'(m) = 1 - \frac{1}{|\Omega'_m|} \sum_{i \in \Omega'_m} |x_{im} - f(x_{im} \mid x_{0m})|, \tag{15}$$

where, as in Section 3.1, $\Omega_m$ denotes the set of Alice's neighbors whose $m$th attribute value is public, $B(m)$ is the fraction of correctly predicted instances without the knowledge of $x_{0m}$, and $B'(m) = B(m \mid x_{0m})$ is the same metric computed after $x_{0m}$ is publicized.
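The accuracy metrics above all share the form "one minus the mean absolute deviation", so they can be sketched with one helper. The function names and the example numbers below are hypothetical; the sketch also assumes, for simplicity, that the same set of neighbors is scored before and after Alice publicizes her attribute.

```python
from typing import Sequence


def average_accuracy(truth: Sequence[int], estimates: Sequence[float]) -> float:
    """1 - mean |x - f|, the form shared by (12), (14) and (15)."""
    if len(truth) != len(estimates) or len(truth) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return 1.0 - sum(abs(x - f) for x, f in zip(truth, estimates)) / len(truth)


def improvement(truth: Sequence[int], before: Sequence[float], after: Sequence[float]) -> float:
    """Improvement as in (13): accuracy after publicizing x_0m minus accuracy before."""
    return average_accuracy(truth, after) - average_accuracy(truth, before)


# Hypothetical held-out values for four neighbors and estimates before/after.
truth = [1, 0, 1, 1]
before = [0.6, 0.4, 0.5, 0.7]
after = [0.8, 0.3, 0.6, 0.9]
print(average_accuracy(truth, before), improvement(truth, before, after))
```

For these illustrative values the accuracy rises from about 0.60 to about 0.75, an improvement of about 0.15.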
Figure 6: Scatter plot of degree of the user versus the number of friends with attribute for the Duke Facebook dataset

5.3 Algorithms
We evaluate the following algorithms:

• Det: The deterministic method
• Log: Logistic regression based inference
• Mat: Matrix factorization using INLA
• Maj: Majority vote, for baseline
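The majority-vote baseline is not specified further in the text, so the following is only one plausible reading: predict the value held by most of the friends whose attribute is public. The function name `majority_vote`, the use of `None` for hidden values, and the tie-breaking value of 0.5 are assumptions of this sketch.

```python
from typing import Iterable, Optional


def majority_vote(friend_values: Iterable[Optional[int]]) -> float:
    """Return 1.0 or 0.0 for the more common public value among friends, 0.5 on a tie.

    None marks a friend whose attribute is hidden; such friends are skipped.
    """
    public = [x for x in friend_values if x is not None]
    ones = sum(public)
    zeros = len(public) - ones
    if ones == zeros:
        return 0.5  # tie, including the case of no public values at all
    return 1.0 if ones > zeros else 0.0


# Hypothetical example: two friends publicly have the attribute, one does not, one hides it.
print(majority_vote([1, 1, 0, None]))
```

Unlike the three proposed estimators, this baseline ignores how important each friend is, which is what makes it a useful point of comparison.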
5.4 Results
We now present our evaluation results. Table 2 shows the average prediction accuracy of inferring Alice's hidden attributes. Higher accuracy means that the estimation is in general accurate. In each of the datasets, the prediction accuracy is averaged over 20 different users (acting as Alice). We can see that all algorithms have about the same performance on all the datasets. Table 3 shows the improvement in prediction accuracy after publicizing a given hidden attribute $x_{0m}$ for Alice.

Method   Google+          UCI FB           Duke FB
Det      .6844 ± .1068    .7490 ± .1233    .7511 ± .0965
Log      .7635 ± .0788    .6812 ± .1381    .7186 ± .0611
Mat      .8073 ± .0917    .7249 ± .1192    .7401 ± .0824
Maj      .5082 ± .1385    .5201 ± .1305    .6257 ± .0717

Table 2: Average prediction accuracy

Method   Google+          UCI FB           Duke FB
Det      .0217 ± .0079    .0091 ± .0038    .0419 ± .0064
Log      .0225 ± .0062    .0057 ± .0027    .0327 ± .0071
Mat      .0334 ± .0093    .0048 ± .0016    .0648 ± .0108
Maj      .0119 ± .0027    .0021 ± .0035    .0196 ± .0163

Table 3: Improvement induced by making $x_{0m}$ public

We demonstrate our results in greater detail in Figures 3 and 4. Figure 3 plots on the x-axis the inference error $\epsilon$, and on the y-axis the fraction of users with inference errors less than the threshold $\epsilon$ for each of the three datasets used in Task 1. We see, for instance, that for about 20% of the users the inference error is less than 20%, and for more than 75% of the users the inference error is less than 50% (that is, we can do better than random guessing for more than 75% of the users). To further investigate our algorithms, we also plotted the inference error versus the degree of the user for Task 1 (Figure 5). We can see that as the degree of the user increases, the inference error also increases. This can be explained by the fact that users with higher degree tend to have friends who are more diverse, and thus inferring their sensitive attribute is harder using our algorithms. Studying whether this result holds fundamentally for all inference algorithms is an interesting direction for future work.

In Figure 6, we plot the fraction of neighbors with an attribute value against the degree of users. As the number of friends increases, a diverse set of attribute values is observed in Alice's neighborhood. Hence, the prevalence of the target attribute value decreases, and attribute inference could give higher errors for high-degree nodes.

Figure 4 plots the fraction of users that experience an accuracy improvement of at least $\epsilon$ after Alice publicizes her hidden attribute $x_{0m}$.

Figure 3: Fraction of users with inference errors less than threshold $\epsilon$ for each of the three datasets used in Task 1

Figure 4: Fraction of users that experience accuracy improvement of at least $\epsilon$ after Alice publicizes her hidden attribute $x_{0m}$.

In summary, our results indicate that even the local network can give a reasonable estimate of hidden attributes: information content in social networks is densely clustered.
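The curves described for Figure 3 are cumulative fractions of users whose inference error falls below a threshold. A minimal sketch of that computation follows; the function name `fraction_below` and the per-user error values are hypothetical and are not taken from the experiments.

```python
from typing import Sequence


def fraction_below(errors: Sequence[float], threshold: float) -> float:
    """Fraction of users whose inference error is strictly less than the threshold."""
    if len(errors) == 0:
        raise ValueError("errors must be non-empty")
    return sum(1 for e in errors if e < threshold) / len(errors)


# Hypothetical per-user errors |x_0m - f(x_0m)| for eight users acting as Alice.
errors = [0.05, 0.15, 0.30, 0.45, 0.48, 0.55, 0.70, 0.90]
curve = [(t, fraction_below(errors, t)) for t in (0.2, 0.5, 1.0)]
print(curve)
```

Reading the value at a threshold of 0.5 gives the share of users for whom the estimator does better than random guessing.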
6. CONCLUSION
Social networks are vulnerable to privacy attacks. Since users publish different parts of their profiles, adversaries are capable of inferring their sensitive attributes by exploiting the link structure of the social network. Since sharing even a small, seemingly benign chunk of personal information may be detrimental to privacy, it is important to analyze the risk of publicizing hidden attributes.

Though there has been recent interest in analyzing the privacy risks in social networks, the current trend seems to be toward the use of large networks. However, for an end-user with access to only his/her one-hop neighbors, using such global networks is impractical. Thus we proposed three ways of making the best use of the information locally available to individual end-users.

Throughout the paper, we answered two questions for an end-user Alice:

Task 1: determine whether her secret can be guessed from her public information.
Task 2: determine whether publicizing her hidden attribute would breach her friends' privacy.

We presented three novel schemes to answer the above two questions. While the proposed framework is not able to prove that Alice's profile is free of attribute inference, our results indicate that in some cases, even the local network can give a reasonable estimate of hidden attributes, and thus can be used to warn individuals of such privacy breaches.

7. REFERENCES
[1] L. Adamic and E. Adar. Friends and neighbors on the web. Social Networks, 25:211-230, 2001.
[2] L. Backstrom, C. Dwork, and J. Kleinberg. Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography. In WWW, 2007.
[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. J. Mach. Learn. Res., 3:993-1022, Mar. 2003.
[4] M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou. Walking in facebook: a case study of unbiased sampling of osns. In INFOCOM, 2010.
[5] N. Z. Gong, A. Talwalkar, L. W. Mackey, L. Huang, E. C. R. Shin, E. Stefanov, E. Shi, and D. Song. Predicting links and inferring attributes using a social-attribute network (san). CoRR, abs/1112.3265, 2011.
[6] R. Gross and A. Acquisti. Information revelation and privacy in online social networks. In WPES, 2005.
[7] J. He, W. W. Chu, and Z. V. Liu. Inferring privacy information from social networks. In ISI, 2006.
[8] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30-37, Aug. 2009.
[9] J. Lindamood, R. Heatherly, M. Kantarcioglu, and B. Thuraisingham. Inferring private information using social network data. In WWW, 2009.
[10] H. Rue, S. Martino, and N. Chopin. Approximate Bayesian inference for latent Gaussian models using integrated nested Laplace approximations. J. Royal Stat. Soc., Series B, 2009.
[11] R. Salakhutdinov, A. Mnih, and G. Hinton. Restricted boltzmann machines for collaborative filtering. In ICML, 2007.
[12] Z. Yin, M. Gupta, T. Weninger, and J. Han. Linkrec: a unified framework for link recommendation with user attributes and graph structure. In WWW, 2010.
[13] E. Zheleva and L. Getoor. To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. In WWW, 2009.
