
Improved Achievability and Converse Bounds for Erdős-Rényi Graph Matching

ABSTRACT
Take a pair of correlated Erdős-Rényi graphs and permute the vertex labels of one of the graphs. When does the correlation allow the original labels to be recovered? Learning the original labels is equivalent to finding the true correspondence between the vertex sets. We improve on the achievability bound derived by Pedarsani and Grossglauser. Previously, the converse was only known in the limiting case where the two graphs are identical. We prove a converse that applies to correlated pairs. The converse and achievability thresholds differ by a factor of two throughout the parameter space.

1. INTRODUCTION

2. NOTATION

Let $[n] = \{1, 2, \ldots, n\}$. Represent a labelled graph on the vertex set $[n]$ by its edge indicator function $g : \binom{[n]}{2} \to [2]$. A permutation of $[n]$ is a bijective function $\sigma : [n] \to [n]$. The set of permutations of $[n]$ under the binary operation of function composition forms the symmetric group $S_n$. This group has an action on $\binom{[n]}{2}$. We can write the action of the permutation $\sigma$ on the graph $G$ as the composition of functions $G \circ \sigma$, where $\sigma$ here stands for its lifted version acting on vertex pairs:
\[
\binom{[n]}{2} \to \binom{[n]}{2}, \qquad \{i, j\} \mapsto \{\sigma(i), \sigma(j)\}.
\]
Throughout, let $N = \binom{n}{2}$.

Pedarsani and Grossglauser [2] generate correlated Erdős-Rényi (ER) graphs as follows. Let $H$ be an ER graph on $[n]$ with edge probability $r$. Let $G_a$ and $G_b$ be independent random subgraphs of $H$ such that each edge of $H$ appears in $G_a$ with probability $s_0$ and in $G_b$ with probability $s_1$. Thus for $e \in \binom{[n]}{2}$, the pairs $(G_a, G_b)(e)$ are i.i.d. and are equal to
\[
\begin{array}{lll}
(1, 1) & \text{w.p.} & p_{11} = r s_0 s_1 \\
(1, 0) & \text{w.p.} & p_{10} = r s_0 (1 - s_1) \\
(0, 1) & \text{w.p.} & p_{01} = r (1 - s_0) s_1 \\
(0, 0) & \text{w.p.} & p_{00} = 1 - r (s_0 + s_1 - s_0 s_1).
\end{array}
\]
It is possible to interpret $H$ as representing some ground truth and $G_a$, $G_b$ as incomplete observations of $H$.
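The sampling procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `joint_probs` and `sample_pair` and the representation of a graph as a set of `frozenset` vertex pairs are choices made here, not part of the paper.

```python
import itertools
import random

def joint_probs(r, s0, s1):
    """Return (p11, p10, p01, p00) for the pair (G_a, G_b) on a single vertex pair."""
    p11 = r * s0 * s1
    p10 = r * s0 * (1 - s1)
    p01 = r * (1 - s0) * s1
    p00 = 1 - r * (s0 + s1 - s0 * s1)
    return p11, p10, p01, p00

def sample_pair(n, r, s0, s1, rng):
    """Sample (G_a, G_b) as sets of frozenset vertex pairs on {1, ..., n}.

    Hypothetical helper: each pair joins the parent graph H with probability r,
    then is kept independently in G_a (prob. s0) and in G_b (prob. s1).
    """
    ga, gb = set(), set()
    for i, j in itertools.combinations(range(1, n + 1), 2):
        if rng.random() < r:              # the pair is an edge of H
            if rng.random() < s0:         # edge survives in G_a
                ga.add(frozenset((i, j)))
            if rng.random() < s1:         # edge survives in G_b
                gb.add(frozenset((i, j)))
    return ga, gb
```

Passing an explicit `random.Random` instance keeps the sketch reproducible; the four values returned by `joint_probs` always sum to one.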
Theorem 1. Let $(G_a, G_b) \sim ER(n, p)$ where $p_{00} \to 1$ and $\frac{p_{11} p_{00}}{p_{10} p_{01}} \to \infty$. If $p_{11} \ge \frac{2 \log n + \omega(1)}{n}$, then there is a deanonymizer that succeeds with probability $1 - o(1)$. If $p_{11} \le \frac{\log n - \omega(1)}{n}$, then any deanonymizer succeeds with probability $o(1)$.

Theorem 2. Let $(G_a, G_b) \sim ER(n, p)$ where $p_{10} = p_{01}$, $r = o(1)$ and $s = \omega(1/n)$. If
\[
\frac{p_{11}^2}{p_{10} + p_{01} + p_{11}} \ge \frac{8 \log n + \omega(1)}{n},
\]
then there is a deanonymizer that succeeds with probability $1 - o(1)$.

In the symmetric case, we have reduced the achievability threshold by a factor of $\frac{4 (p_{10} + p_{01} + p_{11})}{p_{11}}$. This improvement becomes more significant as the graphs $G_a$ and $G_b$ become less correlated. Additionally, the gap between the achievability and converse thresholds has been reduced to a factor of 2 throughout the whole parameter space.
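To make the size of the improvement concrete, the factor can be evaluated numerically. The sketch below assumes the symmetric model $s_0 = s_1 = s$, in which case the factor simplifies to $4(2 - s)/s$; the function names are illustrative only.

```python
def symmetric_probs(r, s):
    """(p11, p10, p01) for the symmetric model with s0 = s1 = s."""
    return r * s * s, r * s * (1 - s), r * (1 - s) * s

def improvement_factor(p11, p10, p01):
    """Leading-order ratio of the two achievability thresholds on p11."""
    return 4.0 * (p10 + p01 + p11) / p11

if __name__ == "__main__":
    # The factor grows as s shrinks, i.e. as the graphs become less correlated.
    for s in (0.9, 0.5, 0.1):
        p11, p10, p01 = symmetric_probs(0.01, s)
        print(s, improvement_factor(p11, p10, p01))
```

The factor does not depend on $r$, since $r$ cancels between numerator and denominator.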

3. CONVERSE

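The converse statement depends on the following lemma. Write $\Delta(G_a, G_b)$ for the number of vertex pairs $e$ with $G_a(e) \ne G_b(e)$, and $G_a \wedge G_b$ for the graph whose edges are the common edges of $G_a$ and $G_b$.

Lemma 1. Let $G_a$ and $G_b$ be graphs on the vertex set $[n]$. Let $\sigma \in \mathrm{Aut}(G_a \wedge G_b)$. Then $\Delta(G_a \circ \sigma, G_b) \le \Delta(G_a, G_b)$.

Proof: Let $e \in \binom{[n]}{2}$. Suppose that $(G_a, G_b)(e) = (1, 1)$, so $(G_a \wedge G_b)(e) = 1$. Because $\sigma \in \mathrm{Aut}(G_a \wedge G_b)$, for all $i$, $(G_a \wedge G_b)(\sigma^i(e)) = 1$. Then the contribution of this cycle of $\sigma$ to both $\Delta(G_a, G_b)$ and $\Delta(G_a \circ \sigma, G_b)$ is zero.
Suppose $(G_a \wedge G_b)(e) = 0$. Then for all $i$, $(G_a \wedge G_b)(\sigma^i(e)) = 0$ and $(G_a, G_b)(\sigma^i(e))$ is $(0, 0)$, $(0, 1)$, or $(1, 0)$. The contribution of this cycle to $\Delta(G_a, G_b)$ is equal to the total number of edges of $G_a$ and $G_b$ on the cycle. The contribution of this cycle to $\Delta(G_a \circ \sigma, G_b)$ cannot be larger.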
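Lemma 1 can be sanity-checked by brute force on small vertex sets. The following sketch assumes vertices labelled $0, \ldots, n-1$ (rather than $1, \ldots, n$), graphs stored as sets of `frozenset` pairs, and permutations stored as tuples; all helper names are hypothetical.

```python
import itertools
import random

def pairs(n):
    """All unordered vertex pairs on {0, ..., n-1}."""
    return [frozenset(c) for c in itertools.combinations(range(n), 2)]

def compose(g, perm, n):
    """Edge set of G∘σ: e is an edge iff the lifted image σ(e) is an edge of G."""
    return {e for e in pairs(n) if frozenset(perm[v] for v in e) in g}

def delta(g1, g2):
    """Number of vertex pairs on which the two graphs disagree."""
    return len(g1 ^ g2)

def automorphisms(g, n):
    """All permutations σ of the vertices with G∘σ = G (exhaustive search)."""
    return [p for p in itertools.permutations(range(n)) if compose(g, p, n) == g]

def check_lemma1(ga, gb, n):
    """True if every σ in Aut(G_a ∧ G_b) satisfies Δ(G_a∘σ, G_b) <= Δ(G_a, G_b)."""
    base = delta(ga, gb)
    return all(delta(compose(ga, s, n), gb) <= base
               for s in automorphisms(ga & gb, n))

def random_graph(n, p, rng):
    return {e for e in pairs(n) if rng.random() < p}
```

The exhaustive search over all $n!$ permutations is only practical for very small $n$; it is meant as a check of the statement, not as an algorithm.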

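Lemma 2. Let $(G_a, G_b) \sim ER(n, r, s_0, s_1)$ and let $\Pi$ be a uniformly random permutation of $[n]$, independent of $(G_a, G_b)$. Let $G_a' = G_a \circ \Pi$ and let $k = \frac{1}{2} \Delta(G_a' \circ \pi^{-1}, G_b)$. Then
\[
P[\Pi = \pi \mid (G_a', G_b)] \propto \left( \frac{p_{10} p_{01}}{p_{11} p_{00}} \right)^{k}.
\]

Proof: By Bayes' rule and the independence of $\Pi$ from $(G_a, G_b)$, we have
\[
\begin{aligned}
P[\Pi = \pi \mid (G_a \circ \Pi, G_b) = (g_a', g_b)]
&\propto P[\Pi = \pi, (G_a \circ \Pi, G_b) = (g_a', g_b)] \\
&= P[(G_a, G_b) = (g_a' \circ \pi^{-1}, g_b)] \, P[\Pi = \pi] \\
&\propto P[(G_a, G_b) = (g_a' \circ \pi^{-1}, g_b)].
\end{aligned}
\]
Let $g_a = g_a' \circ \pi^{-1}$, $m_a = |E(g_a)|$, and $m_b = |E(g_b)|$. Then
\[
\begin{aligned}
|\{e : (g_a, g_b)(e) = (1, 1)\}| &= \frac{m_a + m_b}{2} - k \\
|\{e : (g_a, g_b)(e) = (1, 0)\}| &= \frac{m_a - m_b}{2} + k \\
|\{e : (g_a, g_b)(e) = (0, 1)\}| &= \frac{m_b - m_a}{2} + k \\
|\{e : (g_a, g_b)(e) = (0, 0)\}| &= N - \frac{m_a + m_b}{2} - k.
\end{aligned}
\]
From the definition of the distribution of $(G_a, G_b)$, we have
\[
\begin{aligned}
P[(G_a, G_b) = (g_a, g_b)]
&= p_{11}^{\frac{m_a + m_b}{2} - k} \, p_{10}^{\frac{m_a - m_b}{2} + k} \, p_{01}^{\frac{m_b - m_a}{2} + k} \, p_{00}^{N - \frac{m_a + m_b}{2} - k} \\
&= c \left( \frac{p_{10} p_{01}}{p_{11} p_{00}} \right)^{k}
\end{aligned}
\]
where $c$ depends only on $m_a$ and $m_b$. Permuting $G_a$ does not change $m_a$.

It is well known that Erdős-Rényi graphs with average degree less than $\log n$ have many automorphisms [1]. The following lemma is a precise version of this fact that is suitable for our purposes.

Lemma 3. Let $G \sim ER(n, p)$. If $p \le \frac{\log n - c_n}{n}$ and $c_n \to \infty$, then there is some sequence $q_n \to 0$ such that $P[|\mathrm{Aut}(G)| \le q_n^{-1}] \le q_n$.

Proof: Let $X$ be the number of isolated vertices in $G$. A permutation that moves only isolated vertices is an automorphism of $G$, so $|\mathrm{Aut}(G)| \ge X!$. We will use Chebyshev's inequality to bound the probability that there are few isolated vertices in $G$:
\[
P\left[ X \le \tfrac{1}{2} E[X] \right] \le 4 \, \frac{E[X^2] - E[X]^2}{E[X]^2}.
\]
The probability that a particular vertex is isolated is $(1 - p)^{n-1}$. Thus $E[X] = n (1 - p)^{n-1}$. The probability that a particular pair of vertices are both isolated is $(1 - p)^{2n-3}$. Thus $E\left[\binom{X}{2}\right] = \binom{n}{2} (1 - p)^{2n-3}$. Then
\[
\begin{aligned}
\frac{E[X^2]}{E[X]^2} - 1
&= \frac{2 E\left[\binom{X}{2}\right] + E[X]}{E[X]^2} - 1 \\
&= \frac{(n^2 - n)(1 - p)^{2n-3} + n (1 - p)^{n-1}}{n^2 (1 - p)^{2n-2}} - 1 \\
&= (1 - p)^{-1} - n^{-1} (1 - p)^{-1} + n^{-1} (1 - p)^{-n+1} - 1 \\
&\le \frac{p}{1 - p} + n^{-1} (1 - p)^{-n+1}.
\end{aligned}
\]
The expected value of $X$ goes to infinity:
\[
\begin{aligned}
n (1 - p)^{n-1}
&= n \left( 1 + \frac{p}{1 - p} \right)^{-(n-1)} \\
&\ge n \exp\left( - \frac{(n - 1) p}{1 - p} \right) \\
&= \exp\left( \log n - \frac{(n - 1) p}{1 - p} \right) \\
&\ge \exp\left( \frac{c_n + p - p \log n}{1 - p} \right).
\end{aligned}
\]
Since $p \to 0$ and $n^{-1} (1 - p)^{-n+1} = 1 / E[X] \to 0$, the Chebyshev bound vanishes. Thus $P[X \le \frac{1}{2} E[X]] \to 0$.

Proof of Theorem 1, converse part: From Lemma 2, if $\Delta(G_a \circ \sigma, G_b) \le \Delta(G_a, G_b)$, then the posterior probability of $\sigma$ is at least as large as that of the true permutation. From Lemma 1, there are at least $|\mathrm{Aut}(G_a \wedge G_b)|$ such permutations. Thus any deanonymizer succeeds with probability at most $1 / |\mathrm{Aut}(G_a \wedge G_b)|$. The graph $G_a \wedge G_b$ is distributed as $ER(n, p_{11})$. With high probability, the size of the automorphism group of an $ER(n, p_{11})$ graph goes to infinity with $n$. More precisely, if $p_{11} \le \frac{\log n - \omega(1)}{n}$, then from Lemma 3 there is some sequence $q_n \to 0$ such that
\[
P\left[ \frac{1}{|\mathrm{Aut}(G_a \wedge G_b)|} \ge q_n \right] \le q_n.
\]
Any deanonymizer succeeds with probability at most $2 q_n$.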
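The moment computations in the proof of Lemma 3 are easy to evaluate numerically. The sketch below assumes $c_n$ is defined by $c_n = \log n - np$, so that the hypothesis $p \le (\log n - c_n)/n$ holds with equality; the function names are illustrative.

```python
import math

def expected_isolated(n, p):
    """E[X] = n (1 - p)^(n - 1), the expected number of isolated vertices."""
    return n * (1 - p) ** (n - 1)

def expectation_lower_bound(n, p):
    """exp((c_n + p - p log n) / (1 - p)) with c_n = log n - n p (assumed)."""
    c_n = math.log(n) - n * p
    return math.exp((c_n + p - p * math.log(n)) / (1 - p))

def relative_variance(n, p):
    """(E[X^2] - E[X]^2) / E[X]^2, using E[X^2] = 2 E[C(X, 2)] + E[X]."""
    ex = expected_isolated(n, p)
    ex2 = n * (n - 1) * (1 - p) ** (2 * n - 3) + ex
    return ex2 / ex ** 2 - 1

def chebyshev_bound(n, p):
    """Upper bound 4 * (p / (1 - p) + 1 / E[X]) on P[X <= E[X] / 2]."""
    return 4 * (p / (1 - p) + 1 / expected_isolated(n, p))
```

For example, with $n = 10^4$ and $c_n = 5$ the expected number of isolated vertices is on the order of $e^{5}$, and `chebyshev_bound` is already small.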

4. ACHIEVABILITY

Now we will prove the achievability half of Theorem 1. From Lemma 2, we know that the maximum a posteriori estimator is closely connected to the statistic $\Delta(G_a \circ \pi, G_b)$. This measures the quality of the matching produced by the permutation $\pi$. We would like to show that with high probability, all nonidentity permutations decrease the quality of the matching between $G_a$ and $G_b$.
Here is the basic strategy. Throughout, we will analyze random graphs for some fixed permutation. First, we will relate the distribution of $\Delta(G_a \circ \pi, G_b)$ to $\Delta(G_a \circ \pi, G_a)$. Then, we will precisely analyze the distribution of $\Delta(G_a \circ \pi, G_a)$. This will allow us to obtain a tight bound on the probability that a particular permutation produces a better matching than the identity. We conclude the proof by applying the union bound over permutations.

Definition 1. Define $d(\pi, G_a, G_b)$ to be $\Delta(G_a \circ \pi, G_b) - \Delta(G_a, G_b)$. This is the difference in matching quality between the permutation $\pi$ and the identity permutation.

Definition 2. Let $K_\pi(z)$ be the generating function for the random variable $k = \frac{1}{2} \Delta(G_a \circ \pi, G_a)$ where $G_a \sim ER(n, p_1)$:
\[
K_\pi(z) = \sum_k P[\Delta(G_a \circ \pi, G_a) = 2k] \, z^k.
\]
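Both definitions can be evaluated directly on small examples. The sketch below assumes vertices labelled $0, \ldots, n-1$, graphs stored as sets of `frozenset` pairs, and permutations stored as tuples with `perm[i]` the image of `i`; it computes $d(\pi, G_a, G_b)$ and obtains the coefficients of $K_\pi(z)$ by enumerating every graph on $n$ vertices, which is feasible only for tiny $n$. The helper names are hypothetical.

```python
import itertools

def pairs(n):
    """All unordered vertex pairs on {0, ..., n-1}."""
    return [frozenset(c) for c in itertools.combinations(range(n), 2)]

def compose(g, perm, n):
    """Edge set of G∘π: e is an edge iff the lifted image π(e) is an edge of G."""
    return {e for e in pairs(n) if frozenset(perm[v] for v in e) in g}

def delta(g1, g2):
    """Number of vertex pairs on which the two graphs disagree."""
    return len(g1 ^ g2)

def d(perm, ga, gb, n):
    """Definition 1: change in mismatch count relative to the identity."""
    return delta(compose(ga, perm, n), gb) - delta(ga, gb)

def k_distribution(perm, n, p1):
    """Definition 2: coefficients {k: P[Δ(G_a∘π, G_a) = 2k]} of K_π(z)."""
    all_pairs = pairs(n)
    coeffs = {}
    for bits in itertools.product((0, 1), repeat=len(all_pairs)):
        ga = {e for e, b in zip(all_pairs, bits) if b}
        weight = p1 ** len(ga) * (1 - p1) ** (len(all_pairs) - len(ga))
        k = delta(compose(ga, perm, n), ga) // 2   # Δ(G_a∘π, G_a) is always even
        coeffs[k] = coeffs.get(k, 0.0) + weight
    return coeffs
```

For the transposition of two vertices on $n = 3$, the lifted permutation has one fixed pair and one 2-cycle, so $K_\pi(z) = 1 - 2 p_1 p_0 + 2 p_1 p_0 z$ with $p_0 = 1 - p_1$.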

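Lemma 4. Let $(G_a, G_b) \sim ER(n, p)$, let $\pi$ be a permutation of $[n]$, and let $2k = \Delta(G_a \circ \pi, G_a)$. Write $p_1 = p_{11} + p_{10}$ and $p_0 = p_{01} + p_{00}$. Conditioned on $k$, $d(\pi, G_a, G_b)$ has the generating function
\[
D_{k,G_b}(z) = \left( \frac{p_{01} z^{-1} + p_{00} z}{p_0} \right)^{k} \left( \frac{p_{11} z + p_{10} z^{-1}}{p_1} \right)^{k}.
\]
Additionally, $P[d(\pi, G_a, G_b) \le 0 \mid G_a] \le (z^*)^k$ and $P[d(\pi, G_a, G_b) \le 0] \le K_\pi(z^*)$, where
\[
z^* = \frac{p_{00} p_{10} + p_{01} p_{11} + 2 \sqrt{p_{00} p_{10} p_{01} p_{11}}}{p_0 p_1}.
\]

Proof: Let $a(e) = |G_a(\pi(e)) - G_b(e)| - |G_a(e) - G_b(e)|$. Then $d(\pi, G_a, G_b) = \sum_e a(e)$. Because $a(e)$ depends on $G_b$ only at $G_b(e)$, the terms of the sum are conditionally independent given $G_a$.
If $G_a(\pi(e)) = G_a(e)$, then $|G_a(\pi(e)) - G_b(e)| = |G_a(e) - G_b(e)|$ and the contribution of $a(e)$ to $d(\pi, G_a, G_b)$ is zero. If $G_a(\pi(e)) \ne G_a(e)$, then $|G_a(\pi(e)) - G_b(e)| \ne |G_a(e) - G_b(e)|$ and $a(e)$ is either $1$ or $-1$.
Within each cycle of $\pi$, the number of $e$ such that $G_a(e) = 0$ and $G_a(\pi(e)) = 1$ is equal to the number of $e$ such that $G_a(e) = 1$ and $G_a(\pi(e)) = 0$. Throughout all of $\pi$, the number of $e$ such that $G_a(e) = 0$ and $G_a(\pi(e)) = 1$ is equal to $k = \frac{1}{2} \Delta(G_a \circ \pi, G_a)$.
Suppose that $G_a(e) = 0$ and $G_a(\pi(e)) = 1$. Then
\[
P[a(e) = -1 \mid G_a(e) = 0, G_a(\pi(e)) = 1] = \frac{p_{01}}{p_0}.
\]
Suppose that $G_a(e) = 1$ and $G_a(\pi(e)) = 0$. Then
\[
P[a(e) = -1 \mid G_a(e) = 1, G_a(\pi(e)) = 0] = \frac{p_{10}}{p_1}.
\]
Thus
\[
\begin{aligned}
D_{k,G_b}(z)
&= \left( \frac{p_{01} z^{-1} + p_{00} z}{p_0} \right)^{k} \left( \frac{p_{11} z + p_{10} z^{-1}}{p_1} \right)^{k} \\
&= \left( \frac{p_{00} p_{11} z^{2} + p_{00} p_{10} + p_{01} p_{11} + p_{10} p_{01} z^{-2}}{p_0 p_1} \right)^{k}.
\end{aligned}
\]
Abbreviate $d = d(\pi, G_a, G_b)$. For all $0 < z \le 1$,
\[
P[d \le 0 \mid G_a] = E[1_{d \le 0} \mid G_a] \le E[z^d \mid G_a] = D_{k,G_b}(z).
\]
The value of $z$ that minimizes $D_{k,G_b}(z)$ is
\[
\left( \frac{p_{00} p_{11}}{p_{01} p_{10}} \right)^{-1/4}.
\]
At this $z$, $(p_0 p_1)^k D_{k,G_b}(z)$ is equal to
\[
\left( \sqrt{p_{00} p_{10} p_{01} p_{11}} + p_{00} p_{10} + p_{01} p_{11} + \sqrt{p_{00} p_{10} p_{01} p_{11}} \right)^{k}.
\]
Thus
\[
\begin{aligned}
P[d \le 0]
&= \sum_k P[\Delta(G_a \circ \pi, G_a) = 2k] \, P[d \le 0 \mid \Delta(G_a \circ \pi, G_a) = 2k] \\
&\le \sum_k P[\Delta(G_a \circ \pi, G_a) = 2k] \, (z^*)^k \\
&= K_\pi(z^*).
\end{aligned}
\]

4.1 Cycle combinatorics

Lemma 5. Let $a_{l,k,r}$ be the number of cyclic sequences of length $l$ with $k$ ones and $r$ ones that are followed by zeros. Let
\[
a_l(x, y, z) = \sum_{k,r} a_{l,k,r} \, x^k y^{l-k} z^r.
\]
Let $c_l$ be the number of cycles of length $l$ in (the lifted version of) $\pi$. Then
\[
K_\pi(z) = \prod_{l=1}^{n} a_l(p_1, p_0, z)^{c_l}.
\]

Theorem 3. Let $q = \left( \sqrt{p_{11} p_{00}} - \sqrt{p_{10} p_{01}} \right)^2$ and let $z^*$ be as defined in Lemma 4. Then for $l \ge 2$
\[
a_l(p_1, p_0, z^*) \le (1 - 2q)^{l/2}
\]
and
\[
K_\pi(z^*) \le (1 - 2q)^{\frac{N - c_1}{2}}.
\]

The proof of Theorem 3 will use a few combinatorial lemmas. Let $b_{l,s}$ be the number of cyclic sequences of length $l$ with $s$ ones, none of which are consecutive.

Lemma 6. For all $l, k, s \in \mathbb{N}$,
\[
\sum_r a_{l,k,r} \binom{r}{s} = b_{l,s} \binom{l - 2s}{k - s}.
\]

Proof: This identity is due to the following bijection. In the first cycle, mark $s$ of the $r$ ones that are followed by zeros. No two of these are consecutive. Place a one followed by a zero at each of these positions in the second cycle and fill in the rest with zeros. There are $b_{l,s}$ such cycles. There are $l - 2s$ remaining unspecified positions in the first cycle. In these positions are $k - s$ ones and $l - k - s$ zeros. Record the symbols at these positions in a vector. There are $\binom{l - 2s}{k - s}$ such vectors.

Lemma 7. For all $l, s \in \mathbb{N}$,
\[
2^{l - 2s} \, b_{l,s} = 2 \sum_i \binom{l}{2i} \binom{i}{s}.
\]

Proof: For $s \ge 1$, consider the set of ternary cyclic sequences of length $l$ with exactly $s$ ones, such that in each interval separating a pair of ones there are an odd number of twos (which forces the interval to be nonempty). Thus there are an even number, $2i$, of ones and twos. There are $2 \binom{i}{s}$ possible induced sequences of ones and twos: ones appear either only in even positions or only in odd positions. Regardless of the location of the ones, there are $2^{l - 2s}$ possible induced sequences of zeros and twos: there are $l - s$ total symbols broken into $s$ segments and there is a parity constraint on each segment.
For $s = 0$, both sides are equal to $2^l$.

Lemma 8. The formal power series $a_l(x, y, z)$ satisfies
\[
a_l(x, y, z) = 2^{1-l} (x + y)^l \sum_i \binom{l}{2i} \left( 1 + \frac{4 x y (z - 1)}{(x + y)^2} \right)^{i}.
\]

Proof: Applying the binomial theorem, then Lemma 6, then the binomial theorem again, we obtain
\[
\begin{aligned}
a_l(x, y, z)
&= \sum_{k,r} x^k y^{l-k} z^r a_{l,k,r} \\
&= \sum_{k,r} x^k y^{l-k} a_{l,k,r} \sum_s \binom{r}{s} (z - 1)^s \\
&= \sum_{k,s} x^k y^{l-k} \, b_{l,s} \binom{l - 2s}{k - s} (z - 1)^s \\
&= \sum_s b_{l,s} (xy)^s (x + y)^{l - 2s} (z - 1)^s \\
&= (x + y)^l \, b_l\!\left( \frac{x y (z - 1)}{(x + y)^2} \right).
\end{aligned}
\]
Applying Lemma 7 followed by the binomial theorem, we obtain
\[
\begin{aligned}
b_l(z)
&= \sum_s b_{l,s} z^s \\
&= \sum_s 2^{1 - l + 2s} \sum_i \binom{l}{2i} \binom{i}{s} z^s \\
&= 2^{1-l} \sum_i \binom{l}{2i} (1 + 4z)^i.
\end{aligned}
\]
Combining these gives the lemma.

Proof of Theorem 3: Let $g = \sqrt{1 + 4 p_1 p_0 (z^* - 1)}$, where $z^*$ is the value from Lemma 4. Substituting $p_1$, $p_0$, and $z^*$ into the expression from Lemma 8 and using $p_1 + p_0 = 1$, we obtain for $l \ge 2$
\[
\begin{aligned}
a_l(p_1, p_0, z^*)
&= 2^{1-l} \sum_i \binom{l}{2i} g^{2i} \\
&= 2^{-l} \sum_j \binom{l}{j} (1 + (-1)^j) g^j \\
&= \left( \frac{1 + g}{2} \right)^{l} + \left( \frac{1 - g}{2} \right)^{l} \\
&\le \left( \left( \frac{1 + g}{2} \right)^{2} + \left( \frac{1 - g}{2} \right)^{2} \right)^{l/2} \\
&= \left( \frac{1 + g^2}{2} \right)^{l/2} \\
&= (1 + 2 p_1 p_0 (z^* - 1))^{l/2}.
\end{aligned}
\]
Then we have
\[
\begin{aligned}
p_1 p_0 (1 - z^*)
&= p_1 p_0 - p_{00} p_{10} - p_{01} p_{11} - 2 \sqrt{p_{00} p_{10} p_{01} p_{11}} \\
&= p_{10} p_{00} + p_{10} p_{01} + p_{11} p_{00} + p_{11} p_{01} - p_{00} p_{10} - p_{01} p_{11} - 2 \sqrt{p_{00} p_{10} p_{01} p_{11}} \\
&= p_{10} p_{01} + p_{11} p_{00} - 2 \sqrt{p_{00} p_{10} p_{01} p_{11}} \\
&= \left( \sqrt{p_{11} p_{00}} - \sqrt{p_{10} p_{01}} \right)^2,
\end{aligned}
\]
which is $q$, so $a_l(p_1, p_0, z^*) \le (1 - 2q)^{l/2}$. From
\[
K_\pi(z) = \prod_{l=1}^{n} a_l(p_1, p_0, z)^{c_l},
\]
$a_1(p_1, p_0, z) = p_1 + p_0 = 1$, and $\sum_{l=1}^{n} l c_l = N$, we obtain the second claim.

Proof of Theorem 1, achievability part: Let $S_{n,m}$ be the set of permutations of $[n]$ that move exactly $m$ points and fix the other $n - m$. Then $|S_{n,m}| = \binom{n}{m} \, !m \le n^m$, where $!m$ is the number of derangements of $[m]$. If $\pi \in S_{n,m}$, then $e = \{i, j\}$ is a fixed point of the lifted $\pi$ if either $i$ and $j$ are both fixed points of $\pi$ or $i$ and $j$ form a cycle of length 2 in $\pi$. Thus $c_1$, the number of fixed points of the lifted $\pi$, satisfies $\binom{n - m}{2} \le c_1 \le \binom{n - m}{2} + \frac{m}{2}$. Thus
\[
N - c_1 \ge \frac{n(n - 1) - (n - m)(n - m - 1) - m}{2} = \frac{m (2n - m - 2)}{2}.
\]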

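The probability that there is some permutation that produces a better match than the identity permutation is
\[
\begin{aligned}
P\Big[ \bigvee_{\pi \ne I} d(\pi, G_a, G_b) \le 0 \Big]
&\le \sum_{\pi \ne I} P[d(\pi, G_a, G_b) \le 0] \\
&= \sum_{m=2}^{n} \sum_{\pi \in S_{n,m}} P[d(\pi, G_a, G_b) \le 0] \\
&\le \sum_{m=2}^{n} n^m \max_{\pi \in S_{n,m}} P[d(\pi, G_a, G_b) \le 0].
\end{aligned}
\]
Here we applied the union bound, grouped permutations by the number of points that they move, and considered the worst case permutation in each group.
From Lemma 4 and Theorem 3,
\[
P[d(\pi, G_a, G_b) \le 0] \le K_\pi(z^*) \le (1 - 2q)^{\frac{N - c_1}{2}}
\]
where $q = \left( \sqrt{p_{11} p_{00}} - \sqrt{p_{10} p_{01}} \right)^2$. Substituting, we obtain
\[
\begin{aligned}
P\Big[ \bigvee_{\pi \ne I} d(\pi, G_a, G_b) \le 0 \Big]
&\le \sum_{m=2}^{n} n^m (1 - 2q)^{\frac{N - c_1}{2}} \\
&\le \sum_{m=2}^{n} n^m (1 - 2q)^{\frac{m (2n - m - 2)}{4}} \\
&= \sum_{m=2}^{n} \left( n \exp\left( \frac{2n - m - 2}{4} \log(1 - 2q) \right) \right)^{m} \\
&\le \sum_{m=2}^{n} \left( n \exp\left( - \frac{q (2n - m - 2)}{2} \right) \right)^{m} \\
&\le \sum_{m=2}^{n} \left( \exp\left( \log n - \frac{q (n - 2)}{2} \right) \right)^{m} \\
&\le \frac{x^2}{1 - x}
\end{aligned}
\]
where
\[
\log x = \log n - \frac{q (n - 2)}{2}.
\]
As long as
\[
q \ge \frac{2 (\log n + \omega(1))}{n}
\]
we have
\[
\log x \le \log n - (\log n + \omega(1)) \left( 1 - \frac{2}{n} \right).
\]
Because $q \le 1$, $\log x \to -\infty$ and the probability of being able to deanonymize goes to one. Finally, under the assumptions of Theorem 1, we have
\[
q = p_{11} p_{00} \left( 1 - \sqrt{\frac{p_{10} p_{01}}{p_{11} p_{00}}} \right)^{2} = (1 - o(1)) \, p_{11}.
\]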
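The chain of inequalities above can be checked numerically for concrete values of $n$ and $q$. The sketch below evaluates the sum $\sum_{m=2}^{n} n^m (1 - 2q)^{m(2n - m - 2)/4}$ in log space and compares it with the geometric bound $x^2 / (1 - x)$; it assumes $0 < q < 1/2$, and the function names are illustrative.

```python
import math

def union_bound(n, q):
    """Sum over m = 2..n of n^m (1 - 2q)^(m (2n - m - 2) / 4)."""
    log_base = math.log1p(-2.0 * q)           # log(1 - 2q), requires q < 1/2
    total = 0.0
    for m in range(2, n + 1):
        log_term = m * math.log(n) + m * (2 * n - m - 2) / 4.0 * log_base
        if log_term > 700.0:                   # exp would overflow: bound is useless
            return math.inf
        total += math.exp(log_term)
    return total

def geometric_bound(n, q):
    """x^2 / (1 - x) with log x = log n - q (n - 2) / 2; infinite if x >= 1."""
    x = math.exp(math.log(n) - q * (n - 2) / 2.0)
    return x * x / (1.0 - x) if x < 1.0 else math.inf
```

Choosing $q$ a constant factor above $2 \log n / n$ makes both quantities small already for moderate $n$, while $q$ well below that threshold makes the geometric bound vacuous.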
p11 p00

5. REFERENCES

[1] B. Bollobás. Random graphs. Springer, 1998.
[2] P. Pedarsani and M. Grossglauser. On the privacy of anonymized networks. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1235-1243. ACM, 2011.
