Embedding Graphs under Centrality Constraints for Network Visualization
Brian Baingana, Student Member, IEEE, and Georgios B. Giannakis, Fellow, IEEE
Abstract—Visual rendering of graphs is a key task in the mapping of complex network data. Although most graph drawing
algorithms emphasize aesthetic appeal, certain applications such as travel-time maps place more importance on visualization
of structural network properties. The present paper advocates two graph embedding approaches with centrality considerations
to comply with node hierarchy. The problem is formulated first as one of constrained multi-dimensional scaling (MDS), and it is
solved via block coordinate descent iterations with successive approximations and guaranteed convergence to a KKT point. In
addition, a regularization term enforcing graph smoothness is incorporated with the goal of reducing edge crossings. A second
approach leverages the locally-linear embedding (LLE) algorithm which assumes that the graph encodes data sampled from
a low-dimensional manifold. Closed-form solutions to the resulting centrality-constrained optimization problems are determined
yielding meaningful embeddings. Experimental results demonstrate the efficacy of both approaches, especially for visualizing
large networks on the order of thousands of nodes.
Index Terms—Multi-dimensional scaling, locally linear embedding, network visualization, manifold learning.
INTRODUCTION
centrality-based weights. However, the proposed algorithm is sensitive to initialization and offers no convergence guarantee, a limitation that is addressed in
this paper. In another study whose results have been
used extensively for internet visualization, the k-core
decomposition is used to hierarchically place nodes
within concentric onion-like shells [39]. Although
effective for large-scale networks, it is a heuristic
with no associated optimality guarantees.
PRELIMINARIES

X = \arg\min_{x_1,\dots,x_N} \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left[ \|x_i - x_j\|_2 - \delta_{ij} \right]^2.  (1)
X = \arg\min_{x_1,\dots,x_N} \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \|x_i - x_j\|_2 - \delta_{ij} \right)^2
s. to \|x_i\|_2 = f(c_i), \quad i = 1,\dots,N.  (2)
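For concreteness, the stress objective in (1) can be evaluated numerically; the sketch below uses NumPy, with `delta` holding the pairwise dissimilarities (the function and variable names are illustrative, not from the paper).

```python
import numpy as np

def mds_stress(X, delta):
    """Evaluate the raw stress (1/2) * sum_{i,j} (||x_i - x_j||_2 - delta_ij)^2."""
    # Pairwise Euclidean distances between embedding coordinates.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return 0.5 * ((dist - delta) ** 2).sum()

# Tiny example: two points embedded exactly at their prescribed dissimilarity.
X = np.array([[0.0, 0.0], [3.0, 4.0]])
delta = np.array([[0.0, 5.0], [5.0, 0.0]])
print(mds_stress(X, delta))  # 0.0 for a perfect embedding
```

A nonzero value quantifies how far the layout deviates from the prescribed dissimilarities, which is the quantity the BCD iterations below drive down.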
BCD WITH SUCCESSIVE APPROXIMATIONS
x_i^r = \arg\min_{x} \frac{1}{2} \sum_{j \neq i} \left( \|x - x_j\|_2 - \delta_{ij} \right)^2
s. to \|x\|_2 = f(c_i)  (3)
or equivalently as
x_i^r = \arg\min_{x} \frac{N-1}{2} \|x\|_2^2 - x^T \Big( \sum_{j<i} x_j^r + \sum_{j>i} x_j^{r-1} \Big) - \sum_{j<i} \delta_{ij} \|x - x_j^r\|_2 - \sum_{j>i} \delta_{ij} \|x - x_j^{r-1}\|_2
s. to \|x\|_2 = f(c_i)  (4)
where \sum_{j<i}(\cdot) := \sum_{j=1}^{i-1}(\cdot) and \sum_{j>i}(\cdot) := \sum_{j=i+1}^{N}(\cdot).
With the last two sums in the cost function of (4)
being non-convex and non-smooth, convergence of
the BCD algorithm cannot be guaranteed [45, p. 272].
Moreover, it is desirable that each per-iteration
subproblem be solvable to global optimality, in closed
form, and at minimal computational cost. The
proposed approach seeks a global upper bound of the
objective with the desirable properties of smoothness
and convexity. To this end, consider the function
\varphi(x) := \varphi_1(x) - \varphi_2(x), where

\varphi_1(x) := \frac{N-1}{2} \|x\|_2^2 - x^T \Big( \sum_{j<i} x_j^r + \sum_{j>i} x_j^{r-1} \Big)  (5)

and

\varphi_2(x) := \sum_{j<i} \delta_{ij} \|x - x_j^r\|_2 + \sum_{j>i} \delta_{ij} \|x - x_j^{r-1}\|_2.  (6)
\partial_x \varphi_2(x) = \sum_{j=1}^{N} \delta_{ij} \, \partial_x \|x - x_j\|_2.  (9)
u(x_0; x_0) = \varphi(x_0), \quad x_0 \in \mathcal{F}  (12a)
x_i^r = \arg\min_{\{x: \|x\|_2 = f(c_i)\}} \frac{N-1}{2} x^T x - x^T \Big[ \sum_{j<i} \big( x_j^r + \delta_{ij} g_j^r(x^{r-1}) \big) + \sum_{j>i} \big( x_j^{r-1} + \delta_{ij} g_j^{r-1}(x^{r-1}) \big) \Big].  (13)
whose closed-form solution is

x_i^r = f(c_i) \frac{\bar{x}}{\|\bar{x}\|_2}  (14)

where

\bar{x} := \frac{1}{N-1} \Big[ \sum_{j<i} \big( x_j^r + \delta_{ij} g_j^r(x^{r-1}) \big) + \sum_{j>i} \big( x_j^{r-1} + \delta_{ij} g_j^{r-1}(x^{r-1}) \big) \Big].  (15)
Algorithm 1: CC-MDS
1: Input: \{c_i\}_{i=1}^N, \Delta, \epsilon
2: Initialize X^0, r = 0
3: repeat
4:    r = r + 1
5:    for i = 1 \dots N do
6:       Compute x_i^r according to (14) and (15)
7:       X^r(i,:) = (x_i^r)^T
8:    end for
9: until \|X^r - X^{r-1}\|_F \le \epsilon
10: X = (I - \frac{1}{N} 1 1^T) X^r
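The steps of Algorithm 1 can be sketched in a few lines of NumPy. This is a hedged reading of the updates (14)-(15): each node is placed at the rescaled average projected onto the sphere of radius f(c_i), with subgradients of the norm terms evaluated at the previous iterate. The random initialization and helper names are assumptions, not the paper's reference code.

```python
import numpy as np

def cc_mds(delta, radii, p=2, max_iter=200, tol=1e-4, seed=0):
    """BCD sketch for centrality-constrained MDS: node i is updated in closed
    form and kept on a sphere of radius radii[i] (cf. Algorithm 1)."""
    rng = np.random.default_rng(seed)
    N = delta.shape[0]
    X = rng.standard_normal((N, p))
    for _ in range(max_iter):
        X_prev = X.copy()
        for i in range(N):
            # Subgradients of ||x - x_j||_2 evaluated at the previous iterate;
            # X holds fresh coordinates for j < i and old ones for j > i (BCD).
            d = X_prev[i] - X
            norms = np.linalg.norm(d, axis=1)
            g = np.divide(d, norms[:, None], out=np.zeros_like(d),
                          where=norms[:, None] > 0)
            s = X + delta[i][:, None] * g
            xbar = (s.sum(axis=0) - s[i]) / (N - 1)   # exclude j = i
            nb = np.linalg.norm(xbar)
            if nb > 0:
                X[i] = radii[i] * xbar / nb           # project onto sphere
        if np.linalg.norm(X - X_prev) <= tol:
            break
    return X - X.mean(axis=0)  # final centering, as in step 10
```

The centering in the last line mirrors step 10 of the algorithm and removes the translational ambiguity of the layout.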
ENFORCING GRAPH SMOOTHNESS
In this section, the MDS stress in (3) is regularized with an additional term that encourages smoothness over the graph. Intuitively, even though the node placement in low-dimensional Euclidean space must respect the inherent network structure, e.g., by preserving node centralities, nodes that are neighbors in a graph-theoretic sense (meaning nodes that share an edge) are also expected to be close in Euclidean distance within the embedding. An application where smoothness is well motivated is the visualization of transportation networks defined over geographical regions. Despite the interest in their centrality structure, embeddings that place nodes that are both geographically close and adjacent in the representative graph near each other are more acceptable to users who are familiar with conventional (and often geographically representative) maps. Such a requirement can be captured by a term that discourages large distances between neighboring nodes; in essence, it enforces smoothness over the graph embedding.
A popular choice of smoothness-promoting function is h(X) := \mathrm{Tr}(X^T L X), where \mathrm{Tr}(\cdot) denotes the trace operator and L := D - A is the graph Laplacian, with D a diagonal matrix whose (i, i)th entry is the degree of node i, and A the adjacency matrix. It can be shown that h(X) = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij} \|x_i - x_j\|_2^2, and the regularized problem becomes

(P3) X = \arg\min_{x_1,\dots,x_N} \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \|x_i - x_j\|_2 - \delta_{ij} \right)^2 + \frac{\lambda}{2} \mathrm{Tr}(X^T L X)
s. to \|x_i\|_2 = f(c_i), \quad i = 1, \dots, N  (16)
where the scalar \lambda \ge 0 controls the degree of smoothness. The penalty term has a separable structure and is convex with respect to each x_i. Consequently, P3 fits within the framework of successive approximations used to solve each per-iteration subproblem. Following the same relaxations and invoking the successive upper-bound approximations described earlier yields the following QCQP
x_i^r = \arg\min_{\{x: \|x\|_2 = f(c_i)\}} \frac{N + \lambda d_{ii} - 1}{2} x^T x - x^T \Big[ \sum_{j<i} \big( (1 + \lambda a_{ij}) x_j^r + \delta_{ij} g_j^r(x^{r-1}) \big) + \sum_{j>i} \big( (1 + \lambda a_{ij}) x_j^{r-1} + \delta_{ij} g_j^{r-1}(x^{r-1}) \big) \Big]  (17)

with d_{ii} := \sum_{j=1}^{N} a_{ij}.
The two steps of standard LLE amount to solving

W = \arg\min_{W} \sum_{i=1}^{N} \Big\| y_i - \sum_{j \in N_i} w_{ij} y_j \Big\|_2^2
s. to \sum_{j \in N_i} w_{ij} = 1, \quad i = 1, \dots, N  (19)

and

X = \arg\min_{x_1,\dots,x_N} \sum_{i=1}^{N} \Big\| x_i - \sum_{j=1}^{N} w_{ij} x_j \Big\|_2^2
s. to \sum_{i=1}^{N} x_i = 0, \quad \frac{1}{N} \sum_{i=1}^{N} x_i x_i^T = I.  (20)
Algorithm 2: smoothness-regularized CC-MDS
1: Input: A, \{c_i\}_{i=1}^N, \Delta, \lambda, \epsilon
2: Initialize X^0, r = 0
3: repeat
4:    r = r + 1
5:    for i = 1 \dots N do
6:       Compute x_i^r according to (14) and (18)
7:       X^r(i,:) = (x_i^r)^T
8:    end for
9: until \|X^r - X^{r-1}\|_F \le \epsilon
10: X = (I - \frac{1}{N} 1 1^T) X^r
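The Laplacian identity behind the smoothness penalty, h(X) = Tr(X^T L X) = (1/2) \sum_{ij} a_{ij} \|x_i - x_j\|_2^2, is easy to check numerically; this snippet is purely illustrative, with a hand-picked 3-node path graph.

```python
import numpy as np

def smoothness_penalty(X, A):
    """h(X) = Tr(X^T L X) with L = D - A, the graph-Laplacian smoothness term."""
    L = np.diag(A.sum(axis=1)) - A
    return np.trace(X.T @ L @ X)

# Sanity check of the identity h(X) = (1/2) * sum_ij a_ij ||x_i - x_j||^2
# on the path graph 0 - 1 - 2.
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
X = np.array([[0., 0.], [1., 0.], [1., 2.]])
lhs = smoothness_penalty(X, A)
rhs = 0.5 * sum(A[i, j] * np.sum((X[i] - X[j]) ** 2)
                for i in range(3) for j in range(3))
```

Both sides evaluate to the sum of squared edge lengths, which is exactly why the penalty discourages long edges between adjacent nodes.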
LLE belongs to a family of non-linear dimensionality reduction techniques which impose a low-dimensional manifold structure on the data, with the objective of seeking an embedding that preserves the local structure on the manifold [38, Chap. 14]. In particular, LLE accomplishes this by approximating each data point by a linear combination of its neighbors, determined by a well-defined criterion, followed by construction of a lower-dimensional embedding that best preserves these local relationships.
6.1
Centrality constraints
This section proposes an adaptation of the LLE algorithm to the graph embedding problem by imposing a centrality structure through appropriately selected constraints. Since G is given, selection of neighbors for each node i is straightforward, and can be accomplished by setting N_i to the set of its n-hop neighbors. As a result, each node takes on a different number of neighbors, as dictated by the topology of the graph.
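Selecting N_i as the n-hop neighborhood amounts to a depth-limited breadth-first search; a minimal sketch, assuming an adjacency-list representation of G (names illustrative):

```python
from collections import deque

def n_hop_neighbors(adj, i, n):
    """Return the set N_i of nodes within n hops of node i (excluding i),
    via breadth-first search; adj maps each node to its adjacent nodes."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] == n:      # depth limit reached: do not expand further
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {v for v in dist if v != i}

# Path graph 0 - 1 - 2 - 3: the 2-hop neighborhood of node 0 is {1, 2}.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

As noted above, |N_i| varies from node to node, unlike the fixed-K neighborhoods of classical LLE on point clouds.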
Associating with node i a vector y_i of arbitrary dimension q \ge p, and adding the centrality constraint \big( \sum_{j \in N_i} w_{ij} y_j \big)^T \big( \sum_{j \in N_i} w_{ij} y_j \big) = f^2(c_i) to (19) per node yields
(P4) w_i = \arg\min_{\{w: 1^T w = 1\}} \|y_i - Y_i w\|_2^2
s. to \Big( \sum_{j \in N_i} w_{ij} y_j \Big)^T \Big( \sum_{j \in N_i} w_{ij} y_j \Big) = f^2(c_i)  (21)

where Y_i := [y_1^i, \dots, y_K^i] contains the K n-hop neighbors of i, \{y_j^i\}_{j=1}^K, and w_i := [w_{i1}, \dots, w_{iK}]^T.
Similarly, the final LLE step is modified as follows
(P5)
arg min
x1 ,...,xN
s. to
N
X
i=1
kxi
N
X
j=1
wij xj k22
P
P
( jNi wij yj )T ( jNi wij yj ) f 2 (ci ) leads to a
convex problem which can be easily solved.
In general, the vectors \{y_i\}_{i=1}^N are unknown, and a graph embedding must be determined entirely from G. However, the terms in both the cost function and the constraint in (21) are inner products y_i^T y_j, i, j \in \{1, \dots, N\}, which can be approximated using the dissimilarities \{\delta_{ij}\}. This is reminiscent of classical MDS, which seeks vectors \{y_i\}_{i=1}^N so that \|y_i - y_j\|_2 \approx \delta_{ij} for any pair of nodes i and j. To this end, (21) can be written in terms of these inner products.
The Lagrangian of P6 is

L(w, \lambda, \mu) = w^T H_i w - 2 h_i^T w + \lambda \big( w^T H_i w - f^2(c_i) \big) + \mu (1^T w - 1).  (27)

Assuming Slater's condition is satisfied [44], a zero duality gap is achieved if the following KKT conditions are satisfied by the optimal primal and dual variables:

w^T H_i w - f^2(c_i) \le 0  (28a)
1^T w - 1 = 0  (28b)
\lambda \ge 0  (28c)
\lambda \big( w^T H_i w - f^2(c_i) \big) = 0.  (28d)
w_i = \arg\min_{\{w: 1^T w = 1\}} w^T Y_i^T Y_i w - 2 y_i^T Y_i w
s. to w^T Y_i^T Y_i w \le f^2(c_i).  (23)
If D denotes the N \times N matrix whose (i, j)th entry is the squared Euclidean distance between y_i and y_j, i.e., [D]_{ij} := \|y_i - y_j\|_2^2, and Y := [y_1, \dots, y_N], then

D = \psi 1^T + 1 \psi^T - 2 Y^T Y  (24)

where \psi := [y_1^T y_1, \dots, y_N^T y_N]^T.
Solving these conditions for the multipliers (detailed in Appendix B) yields

\lambda = -1 + \sqrt{ \frac{ (1^T H_i^{-1} 1)(h_i^T H_i^{-1} h_i) - (1^T H_i^{-1} h_i)^2 }{ f^2(c_i) (1^T H_i^{-1} 1) - 1 } }  (29)

and

\frac{\mu}{2} = \frac{ 1^T H_i^{-1} h_i - \lambda - 1 }{ 1^T H_i^{-1} 1 }.  (30)
w_i = \begin{cases} \frac{1}{1+\lambda} H_i^{-1} \big( h_i - \frac{\mu}{2} 1 \big), & \text{if } \lambda \in \mathbb{R}_+ \\ H_i^{-1} \Big( h_i - \frac{1^T H_i^{-1} h_i - 1}{1^T H_i^{-1} 1} 1 \Big), & \text{otherwise.} \end{cases}  (31)
Although H_i is provably positive semidefinite, it is not guaranteed to be non-singular as required by (29)-(31). Since H_i approximates a matrix of inner products, ensuring positive definiteness by slightly perturbing the diagonal entries uniformly is reasonable in this case, i.e., H_i \leftarrow H_i + \epsilon I_K, where \epsilon > 0 is a small number.
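The constrained weight computation in (29)-(31), together with the diagonal perturbation H_i \leftarrow H_i + \epsilon I_K, can be sketched as follows. This is a sketch under stated assumptions, not the paper's reference implementation: the active-case multiplier below is a root obtained by substituting the stationarity solution into the active constraint.

```python
import numpy as np

def cc_weights(H, h, f, eps=1e-8):
    """Weights w solving min w'Hw - 2h'w  s.to  1'w = 1, w'Hw <= f^2 (cf. P6).
    The active-case lambda is a reconstructed KKT root: treat as a sketch."""
    K = H.shape[0]
    Hinv = np.linalg.inv(H + eps * np.eye(K))  # perturb: H may be singular
    one = np.ones(K)
    a = one @ Hinv @ one
    b = one @ Hinv @ h
    c = h @ Hinv @ h
    F = f ** 2
    # Case 1: inequality constraint inactive (lambda = 0).
    w = Hinv @ (h - ((b - 1.0) / a) * one)
    if w @ H @ w <= F:
        return w
    # Case 2: constraint active; solve for lambda, then mu, then w.
    lam = -1.0 + np.sqrt(max(a * c - b * b, 0.0) / (a * F - 1.0))
    mu2 = (b - lam - 1.0) / a
    return Hinv @ (h - mu2 * one) / (1.0 + lam)
```

For a quick sanity check, with H = I and h = [2, 0] the unconstrained sum-to-one solution violates the radius, and the active-case branch returns weights that sum to one while meeting w'Hw = f^2 exactly.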
Setting w_{ij} = 0 for all j \notin N_i, the sought graph embedding is determined by solving P5 over \{x_i\}_{i=1}^N.
(P6) w_i = \arg\min_{\{w: 1^T w = 1\}} w^T H_i w - 2 h_i^T w
s. to w^T H_i w \le f^2(c_i), \quad i = 1, \dots, N.  (26)

In order to solve P6 per node, Lagrange multipliers \lambda and \mu are associated with the inequality and equality constraints, leading to the Lagrangian in (27) and the KKT conditions in (28).

Remark 2. Since the entries of H_i are inner products, it is a kernel matrix and can be replaced by a graph kernel that reliably captures similarities between nodes in G. The Laplacian pseudoinverse L^\dagger is such a kernel in the space spanned by the graph nodes, where node i is represented by e_i [40]. In fact, -(1/2) J \Delta^{(2)} J and L^\dagger are equivalent when the entries of \Delta^{(2)} are average commute times, briefly discussed in Section 8.

P5 is non-convex, and global optimality is not guaranteed. However, the problem decouples over the vectors \{x_i\}_{i=1}^N, and this block separability can be exploited to solve it via BCD iterations. Inner iteration i under outer iteration r involves solving

x_i^r = \arg\min_{x} \Big\| x - \sum_{j<i} w_{ij} x_j^r - \sum_{j>i} w_{ij} x_j^{r-1} \Big\|_2^2
s. to \|x\|_2^2 = f^2(c_i).  (32)

With v_i^r := \sum_{j<i} w_{ij} x_j^r + \sum_{j>i} w_{ij} x_j^{r-1} and \lambda denoting a Lagrange multiplier, minimization of the Lagrangian for (32),

x_i^r = \arg\min_{x} \|x - v_i^r\|_2^2 + \lambda \big( \|x\|_2^2 - f^2(c_i) \big)  (33)

yields

x_i^r = \frac{v_i^r}{1 + \lambda}.  (34)

Substituting (34) into the equality constraint in (32) leads to the closed-form per-iteration update of x_i

x_i^r = \begin{cases} f(c_i) \frac{v_i^r}{\|v_i^r\|_2}, & \text{if } \|v_i^r\|_2 > 0 \\ x_i^{r-1}, & \text{otherwise.} \end{cases}  (35)
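The BCD sweep for P5 with the update (35) reduces to rescaling each node's weighted neighbor average onto its sphere. A minimal sketch, assuming the weight matrix W has already been computed (row i holding w_{ij}) and with illustrative names:

```python
import numpy as np

def cc_lle_embed(W, radii, p=2, max_iter=100, tol=1e-6, seed=0):
    """BCD for the centrality-constrained LLE embedding: each x_i is the
    weighted neighbor combination v_i rescaled to the sphere of radius
    radii[i], as in the closed-form update (35)."""
    rng = np.random.default_rng(seed)
    N = W.shape[0]
    X = rng.standard_normal((N, p))
    for _ in range(max_iter):
        X_prev = X.copy()
        for i in range(N):
            v = W[i] @ X              # entries j < i already updated (BCD)
            nv = np.linalg.norm(v)
            if nv > 0:
                X[i] = radii[i] * v / nv
            # else: keep the previous iterate, as in (35)
        if np.linalg.norm(X - X_prev) <= tol:
            break
    return X
```

Each inner update is O(|N_i| p), so a full sweep scales linearly in the number of edges, which is consistent with the large-network experiments reported below.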
Centrality measures offer a valuable means of quantifying the importance attributed to a specific node or edge within the network. For instance, such measures address questions pertaining to the most authoritative authors in a research community, the most influential web pages, or which genes would be most lethal to an organism if deleted. Such questions often arise in the context of social, information, and biological networks, and a plethora of centrality measures, ranging from straightforward ones like the node degree to more sophisticated ones like PageRank [28], have been proposed in the network science literature.
where \sigma_{j,k}^i is the number of shortest paths between nodes j and k that pass through node i [23]. An immediate application of betweenness centrality to network visualization arises in the context of exploratory analysis of transport networks (for instance, a network whose nodes are transit and terminal stations and whose edges represent connections by rail lines). In such a network, knowledge of the stations that route the most traffic is critical for transit planning, and visualizations that effectively capture betweenness centrality are well motivated.
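Betweenness centrality on an unweighted graph can be computed by shortest-path counting; the sketch below uses Brandes' accumulation scheme, which the paper does not prescribe but which matches the shortest-path-fraction definition given above.

```python
from collections import deque

def betweenness(adj):
    """Betweenness centrality of an undirected, unweighted graph:
    c_i = sum over pairs (j, k) of the fraction of shortest j-k paths
    passing through i, computed via Brandes' algorithm."""
    nodes = list(adj)
    bc = {v: 0.0 for v in nodes}
    for s in nodes:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        sigma = {v: 0 for v in nodes}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in nodes}
        order, queue = [], deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Back-propagate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in nodes}
        for v in reversed(order):
            for u in preds[v]:
                delta[u] += sigma[u] / sigma[v] * (1.0 + delta[v])
            if v != s:
                bc[v] += delta[v]
    return {v: b / 2.0 for v, b in bc.items()}  # undirected: halve pairs

# Path graph 0 - 1 - 2: every shortest 0-2 path passes through node 1.
path = {0: [1], 1: [0, 2], 2: [1]}
```

On the path graph above, node 1 mediates the single pair (0, 2) and therefore has betweenness 1, while the endpoints have betweenness 0.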
NODE DISSIMILARITIES
This section briefly discusses dissimilarity measures for the graph embedding task. Although V can represent a plethora of inter-related objects with well-defined criteria for pairwise dissimilarity (e.g., \delta_{ij} = \|y_i - y_j\|_2 when vectors y_i and y_j corresponding to nodes i and j are available in Euclidean space), \Delta must be determined entirely from G in order to obtain embeddings that reliably encode the underlying network topology. Node dissimilarity metrics can be broadly classified as: i) methods based on the number of shared neighbors, and ii) methods based on graph-theoretic distances.
NUMERICAL EXPERIMENTS
f(c_i) = \frac{\mathrm{diam}(G)}{2} \cdot \frac{c_i - \min_{i \in V} c_i}{\max_{i \in V} c_i - \min_{i \in V} c_i}  (42)

with diam(G) denoting the diameter of G. Node dissimilarities were computed via ECTD.
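A centrality-to-radius mapping of the form in (42) can be sketched directly; the normalization direction (most central node mapped to the largest radius) follows the reconstruction above and should be treated as an assumption.

```python
def radius_map(centralities, diameter):
    """Map centralities to embedding radii per (42): radii are scaled to
    lie in [0, diam(G)/2], linearly in the normalized centrality."""
    cmin, cmax = min(centralities), max(centralities)
    span = cmax - cmin or 1.0   # guard against a constant centrality profile
    return [0.5 * diameter * (c - cmin) / span for c in centralities]
```

Flipping the numerator to (max c - c_i) would instead draw the most central nodes at the origin, as in radial layouts such as MFRL.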
The visual quality of the embeddings from CC-MDS and CC-LLE was evaluated against a recent centrality-based graph drawing algorithm, the more flexible radial layout (MFRL) [30], and the spring embedding (SE) algorithm [21]. The first column in
1. https://wikis.bris.ac.uk/display/ipshe/London+Tube
Graph        | Type       | Number of vertices | Number of edges
-            | undirected | 34                 | 78
-            | undirected | 307                | 353
-            | undirected | 5,242              | 28,980
Gnutella-04  | directed   | 10,879             | 39,994
Gnutella-24  | directed   | 26,518             | 65,369
[Figure: histograms, panels (a)-(d); vertical axes show the number of nodes.]
[Figure: MDS stress as a function of the BCD iteration index.]
Fig. 4: Embedding of the London tube graph emphasizing different centrality considerations (betweenness, closeness, and degree centrality) for CC-MDS, CC-LLE, MFRL, and SE.
[Figure: embeddings for \lambda = 0 (a), \lambda = 1 (b), and \lambda = 100 (c).]
[Figure: running times (s) of CC-MDS, CC-LLE, and MFRL.]
CONCLUSION
APPENDIX A
PROOF OF PROPOSITION 1

Since

\varphi(x^{r-1}) = \varphi_1(x^{r-1}) - \varphi_2(x^{r-1})
= \frac{N-1}{2} \|x^{r-1}\|_2^2 - (x^{r-1})^T \Big( \sum_{j<i} x_j^r + \sum_{j>i} x_j^{r-1} \Big) - \sum_{j<i} \delta_{ij} \|x^{r-1} - x_j^r\|_2 - \sum_{j>i} \delta_{ij} \|x^{r-1} - x_j^{r-1}\|_2,  (43)

and since g_j^r(x^{r-1}) and g_j^{r-1}(x^{r-1}) are subgradients of the corresponding norm terms at x^{r-1}, the surrogate obtained by linearizing \varphi_2 around x^{r-1} satisfies

u(x^{r-1}; x^{r-1}) = \varphi(x^{r-1}).  (44)

APPENDIX B
SOLVING THE KKT CONDITIONS

Setting the gradient of the Lagrangian in (27) with respect to w to zero yields

2 H_i w - 2 h_i + 2 \lambda H_i w + \mu 1 = 0  (45)

and hence

w_i = \frac{1}{1+\lambda} H_i^{-1} \Big( h_i - \frac{\mu}{2} 1 \Big).  (46)

Upon applying the result in (46) to the primal feasibility condition in (28b), it turns out that

\frac{1}{1+\lambda} 1^T H_i^{-1} \Big( h_i - \frac{\mu}{2} 1 \Big) = 1.  (47)

Thus,

\frac{\mu}{2} = \frac{1^T H_i^{-1} h_i - \lambda - 1}{1^T H_i^{-1} 1}.  (48)

Assuming the inequality constraint is inactive at optimality, then \lambda = 0, which upon solving (48) yields

\mu = 2 \, \frac{1^T H_i^{-1} h_i - 1}{1^T H_i^{-1} 1}  (49)

and

w_i = H_i^{-1} \Big( h_i - \frac{1^T H_i^{-1} h_i - 1}{1^T H_i^{-1} 1} 1 \Big).  (50)

If instead the inequality constraint is active at optimality, then

w_i^T H_i w_i = f^2(c_i).  (51)

Substituting (46) into (51) yields

(2 h_i - \mu 1)^T H_i^{-1} (2 h_i - \mu 1) = 4 f^2(c_i)(1+\lambda)^2  (52)

or, equivalently,

h_i^T H_i^{-1} h_i - \mu \, 1^T H_i^{-1} h_i + \frac{\mu^2}{4} 1^T H_i^{-1} 1 = f^2(c_i)(1+\lambda)^2.  (53)

Substituting (48) into (53) and simplifying leads to

(1^T H_i^{-1} 1)(h_i^T H_i^{-1} h_i) - (1^T H_i^{-1} h_i)^2 + (1+\lambda)^2 = f^2(c_i)(1^T H_i^{-1} 1)(1+\lambda)^2.  (54)

Finally, solving for \lambda from (54) and taking one root yields

\lambda = -1 + \sqrt{ \frac{(1^T H_i^{-1} 1)(h_i^T H_i^{-1} h_i) - (1^T H_i^{-1} h_i)^2}{f^2(c_i)(1^T H_i^{-1} 1) - 1} }.  (55)
Fig. 6: Visualization of large networks: a) AGR (CC-MDS), b) AGR (CC-LLE), c) Gnutella-04 (CC-LLE), and d) Gnutella-24 (CC-LLE).
REFERENCES
[24] P. Jaccard, "Etude comparative de la distribution florale dans une portion des Alpes et des Jura," Bulletin de la Societe Vaudoise des Sciences Naturelles, vol. 37, pp. 547–579, November 1901.
[25] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A. L. Barabasi, "Hierarchical organization of modularity in metabolic networks," Science, vol. 297, pp. 1551–1555, August 2002.
[26] E. A. Leicht, P. Holme, and M. E. J. Newman, "Vertex similarity in networks," Physical Review E, vol. 73, 026120, February 2006.
[27] D. J. Klein and M. Randic, "Resistance distance," Journal of Mathematical Chemistry, vol. 12, pp. 81–95, December 1993.
[28] L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: bringing order to the web," Stanford InfoLab, November 1999.
[29] W. Liu and L. Lu, "Link prediction based on local random walk," Europhysics Letters, vol. 89, 58007, March 2010.
[30] U. Brandes and C. Pich, "More flexible radial layout," Journal of Graph Algorithms and Applications, vol. 15, pp. 157–173, February 2011.
[31] I. Borg and P. J. Groenen, Modern Multidimensional Scaling: Theory and Applications, Springer, New York, NY, 2005.
[32] M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Computation, vol. 15, pp. 1373–1396, June 2003.
[33] T. Kamada and S. Kawai, "An algorithm for drawing general undirected graphs," Information Processing Letters, vol. 31, pp. 7–15, April 1989.
[34] L. K. Saul and S. T. Roweis, "Think globally, fit locally: unsupervised learning of low dimensional manifolds," Journal of Machine Learning Research, vol. 4, pp. 119–155, June 2003.
[35] J. B. Tenenbaum, V. de Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, pp. 2319–2323, December 2000.
[36] D. De Ridder, O. Kouropteva, O. Okun, M. Pietikainen, and R. Duin, "Supervised locally linear embedding," Artificial Neural Networks and Neural Information Processing - ICANN/ICONIP, pp. 175–175, June 2003.
[37] H. Chang and D.-Y. Yeung, "Robust locally linear embedding," Pattern Recognition, vol. 39, pp. 1053–1065, June 2006.
[38] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York, NY, USA, 2001.
[39] J. Alvarez-Hamelin, L. Dall'Asta, A. Barrat, and A. Vespignani, "Large scale networks fingerprinting and visualization using the k-core decomposition," Advances in Neural Information Processing Systems, vol. 18, pp. 41–50, May 2006.
[40] F. Fouss, A. Pirotte, J. M. Renders, and M. Saerens, "Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation," IEEE Transactions on Knowledge and Data Engineering, vol. 19, pp. 355–369, March 2007.
[41] S. L. France and J. D. Carroll, "Two-way multidimensional scaling: a review," IEEE Transactions on Systems, Man, and Cybernetics, vol. 41, pp. 644–661, September 2011.
[42] K. Q. Weinberger and L. K. Saul, "Unsupervised learning of image manifolds by semidefinite programming," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 988–995, July 2004.
[43] K. Q. Weinberger, B. D. Packer, and L. K. Saul, "Nonlinear dimensionality reduction by semidefinite programming and kernel matrix factorization," Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, pp. 381–388, January 2005.
[44] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, New York, NY, 2004.
[45] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, MA, 1999.
[46] M. Razaviyayn, M. Hong, and Z. Q. Luo, "A unified convergence analysis of block successive minimization methods for nonsmooth optimization," arXiv preprint arXiv:1209.2385v1, September 2012.
[47] B. Baingana and G. B. Giannakis, "Centrality-constrained graph embedding," Proc. of the 38th Intern. Conf. on Acoust., Speech and Signal Proc., Vancouver, Canada, May 26–31, 2013.
[48] K. Q. Weinberger, F. Sha, Q. Zhu, and L. K. Saul, "Graph Laplacian regularization for large-scale semidefinite programming," Advances in Neural Information Processing Systems, vol. 19, pp. 1489–1496, September 2007.
[49] A. Buja, D. F. Swayne, M. L. Littman, N. Dean, H. Hofmann, and L. Chen, "Data visualization with multidimensional scaling," Journal of Comp. and Graph. Stats., pp. 444–472, June 2008.
[50] J. Leskovec, J. Kleinberg, and C. Faloutsos, "Graph evolution: densification and shrinking diameters," ACM Transactions on Knowledge Discovery from Data, vol. 1, article 2, March 2007.
[51] C. Shi, X. Kong, P. S. Yu, S. Xie, and B. Wu, "Relevance search in heterogeneous networks," Proceedings of the 15th International Conf. on Extending Database Technology, pp. 180–191, March 2012.