
Centrality

Concepts and Methods

Ulrik Brandes
Department of Computer & Information Science
University of Konstanz

NetSci 2006 Workshop, 16-19 May 2006, Bloomington

network analysis

- element-level: centrality
- group-level: dense groups, clustering, roles and blockmodels
- network-level: statistics, models, comparison

across all levels: dynamics, scalability


centrality in networks
structural importance

"One of the primary uses of graph theory in social network analysis is the identification of the most important actors in a social network." (Wasserman/Faust, 1994)

(figure: example network with nodes labeled Strozzi and Medici)

applications other than social networks include bibliometrics, Web graph analysis, transportation networks, systems biology, electrical networks, . . .

used synonymously: centrality, status, prestige, prominence, power, impact, . . .


closeness centrality
Beauchamp 1965, Sabidussi 1966

definition:

$$c(v) = \frac{1}{\sum_{t \in V} d_G(v,t)}$$

intuition: a node is central if it is close (on average) to all other nodes.
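A minimal sketch of the definition, assuming an unweighted, connected, undirected graph stored as an adjacency dict (the representation and the function names are illustrative, not from the slides). One breadth-first search from v yields all distances d_G(v, ·).

```python
from collections import deque


def bfs_distances(adj, s):
    """Distances d_G(s, t) for every t reachable from s (unweighted BFS)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:  # first visit happens at the shortest distance
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(adj):
    """c(v) = 1 / sum_t d_G(v, t); assumes a connected graph with >= 2 nodes."""
    return {v: 1.0 / sum(bfs_distances(adj, v).values()) for v in adj}


# path a - b - c: the middle node is closest to all others
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = closeness(path)  # b -> 1/2, a and c -> 1/3
```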

betweenness centrality
Anthonisse 1971, Freeman 1977

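definition:

$$c(v) = \sum_{s,t \in V} \frac{\sigma(s,t \mid v)}{\sigma(s,t)}$$

where $\sigma(s,t)$ is the number of shortest $(s,t)$-paths and $\sigma(s,t \mid v)$ the number of those passing through $v$.

intuition: a node is central if it lies between many pairs of other nodes.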
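The definition can be evaluated directly, if slowly, using the fact that v lies on a shortest (s,t)-path exactly when d(s,v) + d(v,t) = d(s,t), in which case σ(s,t | v) = σ(s,v)·σ(v,t). This sketch assumes an adjacency dict, counts ordered pairs (s, t) as in the sum above, and skips pairs with v ∈ {s, t} or with t unreachable from s; the names are hypothetical.

```python
from collections import deque


def bfs_counts(adj, s):
    """Distances d(s, .) and numbers sigma(s, .) of shortest paths from s."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                sigma[w] += sigma[v]
    return dist, sigma


def betweenness_by_definition(adj):
    """c(v) = sum over ordered pairs (s, t), v not in {s, t}, of sigma(s,t|v)/sigma(s,t)."""
    info = {s: bfs_counts(adj, s) for s in adj}
    c = {v: 0.0 for v in adj}
    for s in adj:
        dist_s, sigma_s = info[s]
        for v in dist_s:  # v must be reachable from s
            if v == s:
                continue
            dist_v, sigma_v = info[v]
            for t in dist_v:  # t must be reachable from v
                if t in (s, v):
                    continue
                if dist_s[v] + dist_v[t] == dist_s[t]:
                    c[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return c


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
cycle = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
bp = betweenness_by_definition(path)   # b lies on (a, c) and (c, a)
bc = betweenness_by_definition(cycle)  # each node gets 1/2 from two opposite pairs
```

This takes cubic time and only illustrates the formula; the accumulation technique discussed under distance computations avoids the inner loop over t.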

status
Katz 1953

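definition:

$$c(v) = \sum_{u \to v} \alpha \, (1 + c(u))$$

intuition: the status of a vertex is the total weighted number of paths reaching that vertex, where the contribution of a path decreases exponentially with its length:

$$c = \sum_{k=1}^{\infty} (\alpha A^T)^k \mathbf{1}$$

with attenuation factor $\alpha$; the series converges if $\alpha$ is smaller than the reciprocal of the spectral radius of the adjacency matrix $A$.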
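A small sketch showing that iterating the definition from c = 0 produces exactly the partial sums of the series: after k rounds, every path of length j ≤ k into v has contributed α^j. Arcs are assumed to be given as (u, v) pairs meaning u → v; the function name is hypothetical, and α is assumed small enough for convergence.

```python
def katz_status(nodes, arcs, alpha, rounds=200):
    """Fixed-point iteration of c(v) = sum_{u -> v} alpha * (1 + c(u)).

    Starting from c = 0, round k yields sum_{j=1..k} (alpha A^T)^j 1.
    Assumes alpha < 1 / spectral radius of A, so the series converges.
    """
    c = {v: 0.0 for v in nodes}
    for _ in range(rounds):
        new = {v: 0.0 for v in nodes}
        for u, v in arcs:  # arc u -> v passes weighted status from u to v
            new[v] += alpha * (1.0 + c[u])
        c = new
    return c


# directed path a -> b -> c: c receives alpha (from b) + alpha**2 (from a via b)
chain = katz_status(["a", "b", "c"], [("a", "b"), ("b", "c")], alpha=0.5)
# two nodes pointing at each other: c = alpha * (1 + c), hence c = 1 for alpha = 1/2
mutual = katz_status(["x", "y"], [("x", "y"), ("y", "x")], alpha=0.5)
```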

eigenvector centrality
Bonacich 1972

definition:

$$c(v) = \frac{1}{\lambda} \sum_{\{u,v\} \in E} c(u)$$

where $\lambda$ is the largest eigenvalue of the adjacency matrix.

intuition: a node is central if it has many central neighbors.
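A minimal power-iteration sketch for a connected undirected graph given as an adjacency dict (representation and names are assumptions). It iterates with A + I rather than A: the eigenvectors are unchanged, but the dominant eigenvalue becomes strictly largest in absolute value, so the iteration also converges on bipartite graphs such as stars.

```python
import math


def eigenvector_centrality(adj, max_rounds=1000, tol=1e-12):
    """Power iteration for c = (1/lambda) A c, scaled to unit length."""
    c = {v: 1.0 for v in adj}
    for _ in range(max_rounds):
        # multiply by (A + I): own score plus the sum of neighbor scores
        new = {v: c[v] + sum(c[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(x * x for x in new.values()))
        new = {v: x / norm for v, x in new.items()}
        if max(abs(new[v] - c[v]) for v in adj) < tol:
            return new
        c = new
    return c


star = {"h": ["x", "y", "z"], "x": ["h"], "y": ["h"], "z": ["h"]}
ev = eigenvector_centrality(star)  # centre/leaf ratio approaches lambda = sqrt(3)
```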

status & contrastatus
Harary 1959

definition:

$$c(v) = \sum_{t \in V} d_G(v,t) \qquad \text{(status)}$$

$$c(v) = \sum_{s \in V} d_G(s,v) \qquad \text{(contrastatus)}$$

where the sums are over all finite distances.

intuition: a node has high status if it is far above many other nodes; a node has high contrastatus if it is deep below many other nodes.
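Both sums can be computed with breadth-first search: along the arcs for status, against the arcs for contrastatus. This sketch assumes a directed graph stored as a dict of successor lists (a hypothetical representation) and simply skips unreachable nodes, matching the restriction to finite distances.

```python
from collections import deque


def bfs_distances(adj, s):
    """Distances from s to every node reachable along the arcs of adj."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def status_contrastatus(out_adj):
    """status(v) = sum of finite d(v, t); contrastatus(v) = sum of finite d(s, v)."""
    in_adj = {v: [] for v in out_adj}  # reversed arcs
    for u, successors in out_adj.items():
        for w in successors:
            in_adj[w].append(u)
    status = {v: sum(bfs_distances(out_adj, v).values()) for v in out_adj}
    contra = {v: sum(bfs_distances(in_adj, v).values()) for v in out_adj}
    return status, contra


# a small hierarchy: boss -> a, boss -> b, a -> c
org = {"boss": ["a", "b"], "a": ["c"], "b": [], "c": []}
status, contra = status_contrastatus(org)
```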

backbone centrality
definition:

$$c(v) = \sum_{T} \max_{t \in V} d_T(v,t)$$

where the sum is over all spanning trees T of G.

intuition: a node is central if it has low eccentricity in many sparsifications.


just kidding

what is a centrality measure, anyway?

"There is certainly no unanimity on exactly what centrality is or on its conceptual foundations, and there is little agreement on the proper procedure for its measurement." (Freeman, 1979)

loose collection of ad-hoc indices:
- requirements?
- characterization?
- properties?
- comparison?
- interpretability?
- conclusions?


directed multigraphs
general data model

undirected and directed graphs, with loops and multiple edges
= single, uniform network model

levels of generality
- general
- (weakly) connected
- strongly connected
- symmetric (= undirected)

axiomatization and classification

1. minimal requirements (invariance under automorphisms, consistency)
2. distinguish based on
   - access/influence, such as closeness centrality or graph center
   - visibility/reputation, such as eigenvector centrality or Katz's status
   - mediation/control, such as betweenness centrality or stress

what about my favorite centrality measure?

- indegree is a visibility centrality index,
- outdegree is an access centrality index,
- nodal degree is an access, visibility, and mediation centrality index
- closeness is a constrained access centrality,
- eigenvector is a constrained visibility centrality,
- betweenness is a mediation centrality, but neither an access nor a visibility centrality (not even constrained)
- status & contrastatus are not centralities, but rather the opposite


standard normalization
Freeman 1979

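centrality c:
- non-negative measure $c : V \to \mathbb{R}_{\geq 0}$
- normalized: $c'(v) = c(v) \,/\, \max_{G' \in \mathcal{G}(n),\, x \in V(G')} c(x) \in [0, 1]$, where $\mathcal{G}(n)$ is the class of graphs with $n$ nodes

note:
- applies to few measures (degree, closeness, betweenness)
- connected undirected unweighted graphs only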
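For the three measures listed, the maximum over all connected undirected graphs with n nodes is attained by the centre of a star: n − 1 for degree, 1/(n − 1) for closeness as defined above, and (n − 1)(n − 2) for betweenness when ordered pairs are summed (Freeman's own formula counts unordered pairs, halving this value). A minimal sketch, with a hypothetical function name, that applies these maxima to precomputed raw scores:

```python
def freeman_normalize(scores, measure):
    """Divide raw scores by the maximum the measure attains on any connected
    undirected graph with the same number n >= 3 of nodes (a star centre)."""
    n = len(scores)
    maximum = {
        "degree": n - 1,
        "closeness": 1.0 / (n - 1),        # c(v) = 1 / sum of distances
        "betweenness": (n - 1) * (n - 2),  # ordered pairs (s, t)
    }[measure]
    return {v: x / maximum for v, x in scores.items()}


# raw scores for a star with centre h and leaves x, y, z (n = 4)
deg = freeman_normalize({"h": 3, "x": 1, "y": 1, "z": 1}, "degree")
clo = freeman_normalize({"h": 1 / 3, "x": 1 / 5, "y": 1 / 5, "z": 1 / 5}, "closeness")
btw = freeman_normalize({"h": 6, "x": 0, "y": 0, "z": 0}, "betweenness")
```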


generalized normalization
1. basic measure
computed node-wise on the maximum subgraph it is
well-defined for (range)
2. normalization w.r.t. range
divide by maximum for this range
3. normalization w.r.t. network
weight by relative size of range
4. normalization across networks
divide by total score

result: a probability distribution on the set of nodes of any network (read: relative importance, share of power, . . . )

normalization example
betweenness centrality

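1. basic measure: $c(v) = \sum_{s,t \in V} \frac{\sigma(s,t \mid v)}{\sigma(s,t)}$
2. normalization w.r.t. range: $c'(v) = \frac{c(v)}{|N^-(v)| \cdot |N^+(v)| - |N^-(v) \cap N^+(v)|}$
3. normalization w.r.t. network: $c''(v) = \bigl(|N^-(v)| \cdot |N^+(v)| - |N^-(v) \cap N^+(v)|\bigr) \cdot c'(v) = c(v)$
4. normalization across networks: $p_v = \frac{c''(v)}{\sum_{x \in V} c''(x)}$

with $N^-(v) = \{s \in V : s \to^* v\}$ and $N^+(v) = \{t \in V : v \to^* t\}$, where $\to^*$ denotes reachability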
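A minimal sketch of the four steps for betweenness on a directed graph stored as a dict of successor lists (representation and names are assumptions). It additionally assumes that N⁻(v) and N⁺(v) exclude v itself, so that the range of v consists of the ordered pairs (s, t), s ≠ t, with s reaching v and v reaching t; nodes with an empty range get score 0.

```python
from collections import deque


def bfs_counts(adj, s):
    """Distances d(s, .) and shortest-path counts sigma(s, .) along arcs."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
    return dist, sigma


def normalized_betweenness(adj):
    """Returns the range-normalized scores c'(v) and the distribution p_v."""
    info = {s: bfs_counts(adj, s) for s in adj}
    c = {v: 0.0 for v in adj}  # step 1: basic measure
    for s in adj:
        dist_s, sigma_s = info[s]
        for v in dist_s:
            dist_v, sigma_v = info[v]
            for t in dist_v:
                if v not in (s, t) and t != s and dist_s[v] + dist_v[t] == dist_s[t]:
                    c[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    out_reach = {v: set(info[v][0]) - {v} for v in adj}                # N^+(v)
    in_reach = {v: {s for s in adj if v in out_reach[s]} for v in adj}  # N^-(v)
    size = {v: len(in_reach[v]) * len(out_reach[v]) - len(in_reach[v] & out_reach[v])
            for v in adj}
    c1 = {v: c[v] / size[v] if size[v] else 0.0 for v in adj}  # step 2
    c2 = {v: size[v] * c1[v] for v in adj}                      # step 3: equals c(v)
    total = sum(c2.values())
    p = {v: c2[v] / total if total else 0.0 for v in adj}       # step 4
    return c1, p


chain = {"a": ["b"], "b": ["c"], "c": []}           # a -> b -> c
sym = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}     # undirected path as symmetric digraph
c1_chain, p_chain = normalized_betweenness(chain)
c1_sym, p_sym = normalized_betweenness(sym)
```

In both examples b lies on every shortest path its range allows, so its range-normalized score is 1.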

normalization example
closeness centrality

1. basic measure: $c(v) = \frac{1}{\sum_{t \in V} d_G(v,t)}$
2. normalization w.r.t. range: $c'(v) = \frac{c(v)}{1 / |N^+(v)|} = \frac{|N^+(v)|}{\sum_{t \in V} d_G(v,t)}$
3. normalization w.r.t. network: $c''(v) = |N^+(v)| \cdot c'(v)$
4. normalization across networks: $p_v = \frac{c''(v)}{\sum_{x \in V} c''(x)}$

with $N^+(v) = \{t \in V : v \to^* t\}$
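A minimal sketch of the four steps for closeness on a directed graph stored as a dict of successor lists (a hypothetical representation). Following step 1, distances are summed only over N⁺(v), here assumed to exclude v itself; a node that reaches nothing has an empty range and is given score 0.

```python
from collections import deque


def bfs_distances(adj, s):
    """Distances d(s, t) for every t reachable from s along arcs (BFS)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def normalized_closeness(adj):
    """Returns the range-normalized scores c'(v) and the distribution p_v."""
    c1, c2 = {}, {}
    for v in adj:
        dist = bfs_distances(adj, v)
        reach = len(dist) - 1                 # |N^+(v)|
        if reach == 0:
            c1[v] = c2[v] = 0.0
            continue
        c = 1.0 / sum(dist.values())          # step 1: basic measure
        c1[v] = c / (1.0 / reach)             # step 2: |N^+(v)| / sum of distances
        c2[v] = reach * c1[v]                 # step 3: weight by size of range
    total = sum(c2.values())
    p = {v: c2[v] / total if total else 0.0 for v in adj}  # step 4
    return c1, p


chain = {"a": ["b"], "b": ["c"], "c": []}  # a -> b -> c
c1, p = normalized_closeness(chain)        # c'(a) = 2/3, c'(b) = 1, p = (4/7, 3/7, 0)
```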

centrality concepts

summary
- basic centrality measures
- axiomatic classification
- normalization

left out
- many, many more measures, e.g. based on information theory, vitality, random walks, etc.
- refined classification, e.g. based on spreading processes
- centrality of groups, centralization
- dynamic and temporal networks

pagerank: surfing the Web


Brin and Page 1998

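what is the probability that a random surfer views this document (at any given point in time)?

$$p_v = \underbrace{\alpha \sum_{u \to v} \frac{p_u}{\mathrm{links}(u)}}_{\text{random surfing}} + \underbrace{(1-\alpha)\, \mathrm{popularity}(v)}_{\text{jump}}$$

where links(u) is the number of links on document u, and $\alpha \in (0,1)$ is the probability of following a link rather than jumping to a document chosen by popularity.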
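A fixed-point sketch of the formula. It assumes every document has at least one outgoing link (dangling documents would need an extra rule, which the slide does not specify) and, unless given, a uniform popularity; the default α = 0.85 and all names are illustrative assumptions.

```python
def pagerank(out_links, alpha=0.85, popularity=None, rounds=200):
    """Iterate p_v = alpha * sum_{u -> v} p_u / links(u) + (1 - alpha) * popularity(v)."""
    nodes = list(out_links)
    if popularity is None:
        popularity = {v: 1.0 / len(nodes) for v in nodes}  # assumed uniform
    if any(not out_links[u] for u in nodes):
        raise ValueError("this sketch assumes no documents without outgoing links")
    p = dict(popularity)
    for _ in range(rounds):
        new = {v: (1 - alpha) * popularity[v] for v in nodes}  # jump
        for u in nodes:
            share = alpha * p[u] / len(out_links[u])  # links(u) = number of links on u
            for v in out_links[u]:                    # random surfing along u -> v
                new[v] += share
        p = new
    return p


# a and b link to each other, c links to a (nothing links to c)
p = pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})
```

Since each round redistributes all probability mass, the scores keep summing to 1; c receives only the jump term (1 − α)/3.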

java query
pagerank

(figure: Web graph for the query "java"; labeled documents:)
www.sikasenbey.or.jp/~ueshima/
home.interlink.or.jp/~ichisaka/
www.auscomp.com
www.w3.org
www.gamelan.com
www.sun.com
java.sun.com
www.nep.chubu.ac.jp/~nepjava/
tacocity.com.tw/java/
www.stat.duke.edu/sites/java.html
www.phy.syr.edu/courses/java-suite/crosspro.html
www.javafile.com
www.china-contact.com/java/
physics.syr.edu

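hubs & authorities
Kleinberg 1998

an authoritative document is linked to by many hub documents, whereas a hub document links to many authoritative documents.

$$C_a(v) = \sum_{u \to v} a_{uv}\, C_h(u) = \sum_{w} \Bigl( \underbrace{\sum_{u} a_{uv}\, a_{uw}}_{\text{co-citation}} \Bigr) C_a(w)$$

$$C_h(v) = \sum_{v \to w} a_{vw}\, C_a(w) = \sum_{u} \Bigl( \underbrace{\sum_{w} a_{vw}\, a_{uw}}_{\text{bibliographic coupling}} \Bigr) C_h(u)$$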
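The mutual definitions suggest alternating updates. The sketch below assumes a_uv = 1 for each arc u → v and rescales both vectors to unit length after every round, since the raw sums would grow without bound; under this iteration the authority vector tends to the principal eigenvector of the co-citation matrix AᵀA and the hub vector to that of the bibliographic-coupling matrix AAᵀ. Names and the example graph are illustrative.

```python
import math


def hits(nodes, arcs, rounds=100):
    """Alternate C_a(v) = sum_{u -> v} C_h(u) and C_h(v) = sum_{v -> w} C_a(w)."""
    hub = {v: 1.0 for v in nodes}
    auth = {v: 0.0 for v in nodes}
    for _ in range(rounds):
        auth = {v: 0.0 for v in nodes}
        for u, v in arcs:          # authorities collect hub scores of in-neighbors
            auth[v] += hub[u]
        hub = {v: 0.0 for v in nodes}
        for v, w in arcs:          # hubs collect authority scores of out-neighbors
            hub[v] += auth[w]
        for scores in (auth, hub):
            norm = math.sqrt(sum(x * x for x in scores.values())) or 1.0
            for v in scores:
                scores[v] /= norm
    return auth, hub


nodes = ["h1", "h2", "h3", "a1", "a2"]
arcs = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h2", "a2"), ("h3", "a1")]
auth, hub = hits(nodes, arcs)  # a1 is cited by more hubs; h3 cites fewer authorities
```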

java query
authority vs. pagerank

(figure: the labeled documents of the "java" query, comparing authority and pagerank scores)

within-site links must not be considered

the greatest football nations

network of matches at world cup finals 1930-2002

excerpt (figure): matches among Brazil, Germany, and England, with edges labeled by results: 0:2 (2002), 1:2 (2002), 4:2 a.e.t. (1966), 4:3 on penalties (1990), 3:2 a.e.t. (1970), 3:1 (1962)

nation ranking
closeness centrality

Argentina
Bulgaria

Russia
Brazil

Netherlands

Uruguay

Belgium
Germany
England
Mexico

Sweden

Italy
Spain

nation ranking
betweenness centrality

Bulgaria

Argentina
Netherlands
Uruguay
Belgium
Sweden
Brazil

England

Germany

Mexico
Italy

Spain

nation ranking
pagerank

Brazil

Germany
Italy

Argentina

England

France
Sweden
Spain

Uruguay

Netherlands

Yugoslavia

Poland

Hungary

Scotland

Russia

Czech Republic
Belgium
Bulgaria

Austria

Mexico

Chile
USA

nation ranking
pagerank (elimination games only)

Brazil

Italy

Germany
France

Argentina

Sweden

Hungary

England

Poland

nation ranking
authority (elimination games only)

Brazil

Germany

Italy
Argentina

France

Hungary
Sweden

England
Poland

nation ranking
pagerank (elimination games only)

England

Italy

Sweden

Germany

France

Poland

Hungary

Brazil
Argentina

The Internet Movie Database (IMDb)
www.imdb.com

comprehensive collection of movie information
- detailed (facts, trivia)
- submitted by industry and users
- edited by permanent staff
- free for non-commercial use
- growing by the day

Multidimensional Scaling
new implementation for large-scale data

Stars and Blockbusters


hubs & authorities
actors:
Phelps, Lee (I)
Flowers, Bess
Vogan, Emmett
London, Tom
Bacon, Irving
O'Connor, Frank (I)
Sullivan, Charles (I)
Flavin, James
Cobb, Edmund
Mower, Jack
Strang, Harry
Boteler, Wade
Blystone, Stanley
Kibbee, Milton
Adams, Ernie (I)
Ellis, Frank (I)
Osborne, Bud
Jackson, Selmer
Crehan, Joseph
Homans, Robert

movies:
San Quentin (1937)
Union Pacific (1939)
Wells Fargo (1937)
Whole Town's Talking, The (1926)
You Can't Take It with You (1938)
Spider's Web, The (1912)
Meet John Doe (1941)
Unconquered (1917)
Star Is Born, A (1937)
Law and Order (1917)
Mr. Smith Goes to Washington (1939)
Honky Tonk (1929)
Roaring Twenties, The (1939)
Adventures of Mark Twain, The (1944)
Incendiary Blonde (1945)
Reap the Wild Wind (1942)
Buccaneer, The (1938)
Castle on the Hudson (1940)
Shadow, The (1913)
Broadway Bill (1918)

restrict to decade around some focal year

Dominating the Data in 1992


actors:
North, Peter (I)
Byron, Tom
Boy, T.T.
Wallice, Marc
West, Randy (I)
Silvera, Joey
Dough, Jon
Horner, Mike
Sanders, Alex (I)
Jeremy, Ron
Diamond, Debi
Michaels, Sean
East, Nick
Drake, Steve (I)
Morgan, Jonathan (I)
Bionca
Moore, Melanie (I)
Hartley, Nina
Tedeschi, Tony
Spears, Randy (I)

movies:
Adult Video News Awards 1992 (1992)
Adult Video News Awards 1994 (1994)
Bloopers (1994)
Boobs Butts and Bloopers 1 (1992)
True Legends of Adult Cinema[...] (1993)
Orgy 3, The (1993)
Mirage 2 (1992)
Adult Video News Awards 1996 (1996)
Gang Bang Girl 12 (1993)
Sorority Sex Kittens (1992)
Sodomania: The Baddest[...] (1994)
Orgy 2, The (1993)
Blondes Who Blow (1996)
True Legends of Adult Cinema[...] (1992)
Cumshot Revue 5 (1989)
Only the Very Best On Video (1992)
Tight Squeeze (1992)
Oral Majority 8 (1991)
Devil in Miss Jones 5[...] (1994)
Queen of Hearts 3 (1992)

IMDb red-light district

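distance computations
single-source shortest paths using breadth-first search: $d_G(s, \cdot)$ in time $O(n + m)$

dependency of $s$ on $v$:

$$\delta_s(v) = \sum_{t \in V} \delta(s,t \mid v), \qquad \delta(s,t \mid v) = \frac{\sigma(s,t \mid v)}{\sigma(s,t)}$$

recursive accumulation over the sets $P_s(w)$ of predecessors of $w$ on shortest paths from $s$:

$$\delta_s(v) = \sum_{w \,:\, v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \bigl(1 + \delta_s(w)\bigr)$$

where $\sigma_{sv} = \sigma(s,v)$.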
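A minimal sketch of how the recursion turns into an algorithm for unweighted graphs given as adjacency dicts (an assumed representation; names hypothetical): one BFS per source s records σ and the predecessor lists P_s(w), then vertices are processed in order of non-increasing distance from s so that δ_s(w) is final before it is passed on to the predecessors of w.

```python
from collections import deque


def betweenness_accumulated(adj):
    """Betweenness over ordered pairs via BFS plus dependency accumulation."""
    c = {v: 0.0 for v in adj}
    for s in adj:
        order, pred = [], {v: [] for v in adj}  # BFS order and P_s(w)
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # farthest vertices first
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                c[w] += delta[w]
    return c


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
star = {"h": ["x", "y", "z"], "x": ["h"], "y": ["h"], "z": ["h"]}
bp = betweenness_accumulated(path)
bs = betweenness_accumulated(star)  # centre lies on all 3 * 2 ordered leaf pairs
```

Each source costs O(n + m) time, which gives the O(n² + nm) total stated below.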

closeness and betweenness computation
- unit of computation is the single-source shortest-paths problem
- execute n times
- total of $O(n^2 + nm)$ running time, $O(n + m)$ space

(figure: running time in seconds vs. number of vertices, up to 5000)

approximation by sampling
Hoeffding 1963

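independent identically distributed random variables $X_1, \ldots, X_k$ with $0 \le X_i \le M$ $(i = 1, \ldots, k)$, and $\xi > 0$ arbitrary:

$$P\left\{ \left| \frac{X_1 + \ldots + X_k}{k} - E\left(\frac{X_1 + \ldots + X_k}{k}\right) \right| \ge \xi \right\} \le 2 e^{-2k (\xi / M)^2}$$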
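Solving 2e^{−2k(ξ/M)²} ≤ δ for k gives k ≥ M² ln(2/δ) / (2ξ²), the number of samples (pivots) that suffices for a deviation of at least ξ to have probability at most δ. A small helper (name and δ are illustrative) computes the smallest such k:

```python
import math


def samples_needed(M, xi, delta):
    """Smallest k with 2 * exp(-2 * k * (xi / M) ** 2) <= delta (Hoeffding bound)."""
    return math.ceil(M * M * math.log(2.0 / delta) / (2.0 * xi * xi))


M, xi, delta = 1.0, 0.1, 0.05
k = samples_needed(M, xi, delta)
```

The bound does not depend on the number of nodes, only on the range M of the sampled values; for centrality estimation M grows with the maximum possible per-pivot contribution.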

closeness centrality
Eppstein and Wang 2001

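for pivots $p_1, \ldots, p_k$ and constant diameter $\Delta$, the estimator

$$X_i(v) = \frac{n \cdot d(v, p_i)}{n-1}$$

guarantees w.h.p. an $\epsilon\Delta$-approximation of the average distance $\sum_{t} d(v,t)/(n-1)$ using only $k \in O(\log n / \epsilon^2)$ pivots, i.e.

$$c_C(v) \approx \frac{k\,(n-1)}{n \sum_{i=1}^{k} d(v, p_i)}$$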
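A sketch of the pivot estimate for a connected undirected graph stored as an adjacency dict (representation and names are assumptions). The key saving is that one BFS from each pivot p yields d(v, p) = d(p, v) for all v at once, so k BFS runs replace n. Pivots are drawn uniformly with replacement unless supplied; a node is left undefined (None) if every sampled pivot equals it.

```python
import random
from collections import deque


def bfs_distances(adj, s):
    """Distances d(s, t) for every t reachable from s (unweighted BFS)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def estimate_closeness(adj, k=None, pivots=None, seed=0):
    """c_C(v) ~ 1 / mean_i X_i(v) with X_i(v) = n * d(v, p_i) / (n - 1)."""
    nodes = list(adj)
    n = len(nodes)
    if pivots is None:
        pivots = random.Random(seed).choices(nodes, k=k)
    k = len(pivots)
    total = {v: 0 for v in nodes}
    for p in pivots:
        for v, d in bfs_distances(adj, p).items():
            total[v] += d  # accumulates sum_i d(v, p_i)
    return {v: k * (n - 1) / (n * total[v]) if total[v] else None for v in nodes}


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
exact = estimate_closeness(path, pivots=["a", "b", "c"])  # all nodes as pivots
sampled = estimate_closeness(path, k=2, seed=1)
```

With every node used once as a pivot, the estimate equals (n − 1)/Σ_t d(v,t) exactly, the inverse average distance.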

betweenness estimation

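$$c_B(v) = \sum_{s \neq v \neq t} \frac{\sigma(s,t \mid v)}{\sigma(s,t)} = \sum_{s \neq v \neq t} \delta(s,t \mid v) = \sum_{s \neq v} \delta(s \mid v)$$

where $\delta(s,t \mid v) = \sigma(s,t \mid v) / \sigma(s,t)$ and $\delta(s \mid v) = \sum_{t \neq v} \delta(s,t \mid v)$

for pivots $p_1, \ldots, p_k$, the estimator is

$$X_i(v) = \frac{n \cdot \delta(p_i \mid v)}{n-1}$$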
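Because c_B(v) is a sum of per-source dependencies δ(s | v), each pivot needs only one BFS plus accumulation. This sketch (adjacency-dict representation and names are assumptions) averages X_i(v) over the pivots; with pivots drawn uniformly from V, the expected value of X_i(v) is c_B(v)/(n − 1), so multiplying the average by n − 1 gives an estimate of c_B(v).

```python
import random
from collections import deque


def dependencies(adj, s):
    """delta(s|v) for every v, by BFS from s and dependency accumulation."""
    order, pred = [], {v: [] for v in adj}
    sigma = {v: 0 for v in adj}
    sigma[s] = 1
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                pred[w].append(v)
    delta = {v: 0.0 for v in adj}
    for w in reversed(order):
        for v in pred[w]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    delta[s] = 0.0  # delta(s|s) = 0 by definition
    return delta


def estimate_betweenness(adj, k=None, pivots=None, seed=0):
    """Average of X_i(v) = n * delta(p_i | v) / (n - 1) over the pivots."""
    nodes = list(adj)
    n = len(nodes)
    if pivots is None:
        pivots = random.Random(seed).choices(nodes, k=k)
    mean = {v: 0.0 for v in nodes}
    for p in pivots:
        delta = dependencies(adj, p)
        for v in nodes:
            mean[v] += n * delta[v] / ((n - 1) * len(pivots))
    return mean


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
exact = estimate_betweenness(path, pivots=["a", "b", "c"])  # c_B(b) / (n - 1) = 1
sampled = estimate_betweenness(path, k=2, seed=1)
```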
n1

random graphs
Gilbert 1959

(figure: closeness centrality and betweenness centrality in random graphs (n=1000, m≈10000, 20 runs); average Euclidean distance and average inversion number vs. number of pivots (0 to 1000) for the pivot selection strategies MaxMin, MaxSum, MinSum, RanDeg, Random, Mixed)
small worlds
Watts and Strogatz 1998

(figure: closeness centrality and betweenness centrality in small worlds (n=1000, m=10000, 20 runs); average Euclidean distance and average inversion number vs. number of pivots (0 to 1000) for the pivot selection strategies MaxMin, MaxSum, MinSum, RanDeg, Random, Mixed)
preferential attachment
Barabási and Albert 1999, Bollobás et al. 2001

(figure: closeness centrality and betweenness centrality in preferential attachment graphs (n=1000, m=20000, 20 runs); average Euclidean distance and average inversion number vs. number of pivots (0 to 1000) for the pivot selection strategies MaxMin, MaxSum, MinSum, RanDeg, Random, Mixed)
closeness (exact vs. estimated using MaxMin)
protein interaction network (Saccharomyces cerevisiae; n = 2 114, m = 4 480)

closeness (Euclidean distance error)
Reuters ticker news (n = 13 308, m = 148 036)

(figure: closeness centrality of ticker news text network (n=13308, m=148036, 20 runs); Euclidean distance vs. number of pivots (0 to 14000) for MaxMin, MaxSum, MinSum, RanDeg, Random, Mixed)

pivot selection using MinSum
Reuters ticker news (n = 13 308, m = 148 036)
