Beruflich Dokumente
Kultur Dokumente
Walking in Facebook
Outline
Walking in Facebook
Social Graph
Walking in Facebook
Walking in Facebook
Problem statement
Obtain a representative sample of users in a
given OSN by exploration of the social graph.
in this work we sample Facebook (FB)
explore graph using various crawling techniques
Walking in Facebook
Related Work
Graph traversal (BFS)
A. Mislove et al, IMC 2007
Y. Ahn et al, WWW 2007
C. Wilson, Eurosys 2009
Walking in Facebook
Our Contributions
Compare various crawling techniques in FBs social graph
Breadth-First-Search (BFS)
Random Walk (RW)
Metropolis-Hastings Random Walk (MHRW)
Practical recommendations
online convergence diagnostic tests
proper use of multiple parallel chains
methods that perform better and tradeoffs
Walking in Facebook
Outline
Motivation and Problem Statement
Sampling Methodology
crawling methods
data collection
convergence evaluation
method comparisons
Data Analysis
Conclusion
Walking in Facebook
Walking in Facebook
Degree of node
k
2 E
Number of edges
Walking in Facebook
10
p( A )
| Ai |
|V |
uV 1
uAi
Subset of sampled
nodes with value i
p( A )
uAi
1/ ku
uV
1/ ku
Walking in Facebook
Degree of node u
11
PMH
,w
1 PMH
,y
if w=
1
V
MH
AA
P
Walking in Facebook
1 1 31 1 2
1P ( )
3 3 55 51215
MH
AC
Walking in Facebook
13
Summary of Datasets
Sampling method
MHRW
RW
BFS
#Valid Users
28x81K
28x81K 28x81K
984K
# Unique Users
957K
2.19M
984K
2.20M
UNI
http://odysseas.calit2.uci.edu/research/osn.html
Walking in Facebook
14
Data Collection
Walking in Facebook
15
Detecting Convergence
Number of samples (iterations) to loose
dependence from starting points?
Walking in Facebook
16
Xa
Xb
z
E( X a ) E( X b )
Var ( X a ) Var ( X b )
Walking in Facebook
17
Walk 2
Between walks
variance
n 1 m 1 B
R
mn W
n
Walk 3
Within walks
variance
Walking in Facebook
18
Burn-in determined to be 3K
Minas Gjoka, UC Irvine
Walking in Facebook
19
Methods Comparison
Node Degree
Poor performance
for BFS, RW
28 crawls
MHRW, RWRW
produce good
estimates
per chain
overall
Minas Gjoka, UC Irvine
Walking in Facebook
20
Sampling Bias
BFS
BFS is biased
Walking in Facebook
21
Sampling Bias
MHRW, RW, RWRW
Walking in Facebook
22
Practical Recommendations
for Sampling Methods
Use MHRW or RWRW. Do not use BFS, RW.
Use formal convergence diagnostics
assess convergence online
use multiple parallel walks
MHRW vs RWRW
RWRW slightly better performance
MHRW provides a ready-to-use sample
Minas Gjoka, UC Irvine
Walking in Facebook
23
Outline
Walking in Facebook
24
FB Social Graph
Degree Distribution
a =1.32
1
a =3.38
2
Walking in Facebook
25
FB Social Graph
Topological Characteristics
Our MHRW sample
Assortativity Coefficient = 0.233
range of Clustering Coefficient [0.05, 0.35]
Walking in Facebook
26
Conclusion
Compared graph crawling methods
MHRW, RWRW performed remarkably well
BFS, RW lead to substantial bias
Practical recommendations
correct for bias
usage of online convergence diagnostics
proper use of multiple chains
Walking in Facebook
27
Thank you
Questions?
Walking in Facebook
28