Beruflich Dokumente
Kultur Dokumente
Alan Mislove
Bimal Viswanath
Krishna P. Gummadi
Peter Druschel
Northeastern University
MPI-SWS
Rice University
Exploiting homophily
People associate with others like them
or
05.02.2010 WSDM10
Alan Mislove
This talk
Explore how much implicit data exists on online social networks?
Or, how much information can be inferred? How much data is needed to be able to infer?
05.02.2010 WSDM10
Alan Mislove
Roadmap
. Idea: Use communities to infer attributes . Collect ne-grained community data . Do attribute-based communities exist? . How well can we infer user attributes?
05.02.2010 WSDM10
Alan Mislove
05.02.2010 WSDM10
Alan Mislove
Group: Users who share a common attribute Community: Users more densely connected than overall graph
05.02.2010 WSDM10 Alan Mislove 7
Group: Users who share a common attribute Community: Users more densely connected than overall graph
05.02.2010 WSDM10 Alan Mislove 7
Roadmap
. Idea: Use communities to infer attributes . Collect ne-grained community data . Do attribute-based communities exist? . How well can we infer user attributes?
05.02.2010 WSDM10
Alan Mislove
Collecting attributes
Obtained authoritative information
Queried student directory College (dormitory), major(s), year
RICE
05.02.2010 WSDM10
Roadmap
. Idea: Use communities to infer attributes . Collect ne-grained community data . Do attribute-based communities exist? . How well can we infer user attributes?
05.02.2010 WSDM10
Alan Mislove
11
05.02.2010 WSDM10
Alan Mislove
12
Roadmap
. Idea: Use communities to infer attributes . Collect ne-grained community data . Do attribute-based communities exist? . How well can we infer user attributes?
05.02.2010 WSDM10
Alan Mislove
14
05.02.2010 WSDM10
Alan Mislove
15
Normalized Conductance
How strong is a particular community A?
Rest of Network
Range is [-1,1]
0 represents no stronger than random
Alan Mislove
16
Algorithm
Given seed users, nd a community by
Adding users Stopping at some point
A
At each step, add user who increases normalized conductance by the most
05.02.2010 WSDM10
Alan Mislove
17
How to evaluate?
Evaluate performance using precision and recall
Algorithm takes in fraction sharing attribute
recall = fraction of remaining attribute-sharing users identied precision = fraction of identied users that share attribute
05.02.2010 WSDM10
Alan Mislove
18
0.80 0.60 0.40 0.20 0.00 1.00 0.80 0.60 0.40 0.20 0.00
0.80 0.60 0.40 0.20 0.00 1.00 0.80 0.60 0.40 0.20 0.00
1st Years
Recall
0.80 0.60 0.40 0.20 0.00 1.00 0.80 0.60 0.40 0.20 0.00
1st Years
2nd Years
Recall
0.80 0.60 0.40 0.20 0.00 1.00 0.80 0.60 0.40 0.20 0.00
2nd Years
Recall
0.80 0.60 0.40 0.20 0.00 1.00 0.80 0.60 0.40 0.20 0.00
2nd Years
4th Years
Recall
Recall
Recall
Dormitory
Matriculation year
05.02.2010 WSDM10
Alan Mislove
20
Other algorithms
1.00 0.75 0.50 0.25 0.00 1.00 0.75 0.50 0.25 0.00
Precision
Recall
Recall
Precision
Dormitory
Matriculation year
05.02.2010 WSDM10
Alan Mislove
20
Precision Recall
05.02.2010 WSDM10
Alan Mislove
21
Summary
Ongoing online social network privacy debate
Focuses mainly on explicitly provided attributes
Questions?
05.02.2010 WSDM10
Alan Mislove
23
Backup slides
05.02.2010 WSDM10
Alan Mislove
24
05.02.2010 WSDM10
Alan Mislove
25
Modularity
How good is a community division? Metric: Modularity Q
Fraction of links within communities Relative to a random graph
Range is [-1,1]
Q=
= Tr e ||e ||
2
05.02.2010 WSDM10
(eii a2 ) i
Modularity
How good is a community division? Metric: Modularity Q
Fraction of links within communities Relative to a random graph
Range is [-1,1]
Q=
= Tr e ||e ||
2
05.02.2010 WSDM10
(eii a2 ) i