It has taken place.
George Bernard Shaw

Words empty as the wind are best left unsaid.
Homer

Everything becomes a little different as soon as it is spoken out loud.
Hermann Hesse

Language is a virus from outer space.
William S. Burroughs

Anup Rao, Amir Yehudayoff

COMMUNICATION COMPLEXITY

(Early Draft)
Contents
I Fundamentals

1 Deterministic Protocols
    Defining 2-party protocols
    Balancing Protocols
    Rectangles
    From Rectangles to Protocols
    Some lower bounds
    Rectangle Covers

2 Rank
    Basic Properties of Rank
    Lower bounds using Rank
    Towards the Log-Rank Conjecture
    Non-negative Rank and Covers

3 Randomized Protocols
    Variants of Randomized Protocols
    Public Coins vs Private Coins
    Nearly Monochromatic Rectangles

4 Numbers On Foreheads
    Cylinder Intersections
    Lower bounds from Ramsey Theory

5 Discrepancy
    Some Examples Using Convexity in Combinatorics
    Lower bounds for Inner-Product
    Lower bounds for Disjointness in the Number-on-Forehead model

6 Information
    Entropy, Divergence and Mutual Information
    Some Examples from Combinatorics
    Lower bound for Indexing
    Randomized Communication of Disjointness
    Lower bound for Number of Rounds
    Lower bounds on Non-Negative Rank

7 Compressing Communication

II Applications

Bibliography
Introduction
Acknowledgements
For a positive integer h, we use [h] to denote the set {1, 2, . . . , h}. 2^[h] denotes the power set, namely the family of all subsets of [h]. All logarithms are computed base 2 unless otherwise specified. A boolean function is a function whose values are in the set {0, 1}.

Random variables are denoted by capital letters (e.g. A) and values they attain are denoted by lower-case letters (e.g. a). Events in a probability space will be denoted by calligraphic letters (e.g. E). Given a = a₁, a₂, . . . , a_n, we write a_{≤i} to denote a₁, . . . , a_i. We define a_{<i} similarly. We write a_S to denote the projection of a to the coordinates specified in the set S ⊆ [n]. [k] denotes the set {1, 2, . . . , k}, and [k]^{<n} denotes the set of all strings of length less than n over the alphabet [k], including the empty string. |z| denotes the length of the string z.
Graphs
Probability
|p − q| = (1/2) Σ_x |p(x) − q(x)| = max_T (p(T) − q(T)),

E_{b∼p(b)} [ |p(a|b) − p(a)| ] ≤ ε.
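The two standard expressions for statistical distance can be checked numerically; the distributions below are our own toy example on a three-element support.

```python
from itertools import combinations

# Toy distributions (ours) on the support {0, 1, 2}.
p = {0: 0.5, 1: 0.3, 2: 0.2}
q = {0: 0.2, 1: 0.3, 2: 0.5}

# Half the L1 distance between p and q.
half_l1 = 0.5 * sum(abs(p[x] - q[x]) for x in p)

# The maximum advantage of p over q on any event T.
events = [set(c) for r in range(len(p) + 1) for c in combinations(p, r)]
max_gap = max(sum(p[x] - q[x] for x in T) for T in events)

assert abs(half_l1 - max_gap) < 1e-12
```

Both expressions evaluate to 0.3 here; the maximizing event T is the set of points where p exceeds q.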
Convexity

A function f : ℝ → ℝ is said to be convex if (f(x) + f(y))/2 ≥ f((x + y)/2) for all x, y in the domain. It is said to be concave if (f(x) + f(y))/2 ≤ f((x + y)/2). Some convex functions: x², e^x, x log x. Some concave functions: log x, √x.

Jensen's inequality says that if a function f is convex, then E[f(X)] ≥ f(E[X]) for any real-valued random variable X. Similarly, if f is concave, then E[f(X)] ≤ f(E[X]).

Figure: the margin plots show the convex function x log x.
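As a quick numerical illustration of Jensen's inequality, with a toy random variable of our choosing (uniform on {1, 2, 3, 4}):

```python
import math

# X uniform on {1, 2, 3, 4}, so E[X] = 2.5.
xs = [1, 2, 3, 4]
mean = sum(xs) / len(xs)

def f(t): return t * math.log2(t)   # convex on (0, infinity)
def g(t): return math.log2(t)       # concave on (0, infinity)

e_f = sum(f(t) for t in xs) / len(xs)   # E[f(X)]
e_g = sum(g(t) for t in xs) / len(xs)   # E[g(X)]

assert e_f >= f(mean)   # convex:  E[f(X)] >= f(E[X])
assert e_g <= g(mean)   # concave: E[g(X)] <= g(E[X])
```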
Part I
Fundamentals
1
Deterministic Protocols
Equality  Alice and Bob are given two n-bit strings x, y ∈ {0,1}^n and want to know if x = y. There is a trivial solution: Alice can send her input to Bob, and Bob can let her know if x = y. This is a deterministic¹ protocol that takes n + 1 bits of communication, and no deterministic protocol can do better. On the other hand, there is a randomized¹ protocol that uses only O(1) bits of communication: the parties can hash their inputs and check that the hashes are the same. There is a non-deterministic¹ protocol that uses O(log n) bits of communication: if Alice guessed an index i where x_i ≠ y_i, she could send it to Bob and they could confirm that their inputs are not the same.

¹ These terms will be made clear in due course.
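The hashing idea can be sketched as follows; the inner-product hash, the trial count, and the seeding are our choices, not fixed by the text.

```python
import random

def equality_protocol(x, y, trials=8, seed=0):
    """One public-coin realization of the hashing idea: on shared random
    vectors r, Alice sends <x, r> mod 2 (one bit per trial) and Bob
    compares it with <y, r> mod 2. If x != y, each trial exposes the
    difference with probability 1/2, so the error is 2**-trials."""
    rng = random.Random(seed)       # stands in for the shared random string
    for _ in range(trials):
        r = [rng.randrange(2) for _ in range(len(x))]
        alice_bit = sum(a * b for a, b in zip(x, r)) % 2
        bob_bit = sum(a * b for a, b in zip(y, r)) % 2
        if alice_bit != bob_bit:
            return False            # certainly unequal
    return True                     # equal, up to small one-sided error

assert equality_protocol([0, 1, 1, 0], [0, 1, 1, 0])
```

The error is one-sided: the protocol never rejects equal inputs, and accepts unequal ones with probability at most 2^−trials.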
Cliques and Independent Sets  Alice and Bob are given two sets A, B ⊆ [n] and both know a graph G on the vertex set [n], with the promise that A is a clique and B is an independent set. They want to know whether A intersects B or not. There is no one-way protocol that solves this problem using less than n bits of communication. However, there is an interactive protocol that uses O(log² n) bits of communication. If A contains a vertex v of degree less than n/2, Alice announces the name of the vertex. Either v ∈ B, or Alice and Bob can safely discard all the non-neighbors of v, since these cannot be a part of A. This reduces the size of the graph by a factor of 2. Similarly, if B contains a vertex v of degree at least n/2, Bob announces the name of v. Again, either v ∈ A, or Alice and Bob can safely discard all the neighbors of v, which reduces the size of the graph by a factor of 2. After at most log n such steps, Alice and Bob will have determined the answer.
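A runnable sketch of this interactive protocol (the data layout and the bit accounting are ours; naming a vertex is charged about log of the number of live vertices):

```python
def cliques_vs_independent_sets(adj, A, B):
    """Sketch of the protocol above. adj[v] is the set of neighbors of
    vertex v, A is a clique, B is an independent set. Returns
    (whether A intersects B, an estimate of the bits communicated)."""
    A, B = set(A), set(B)
    live = set(range(len(adj)))     # vertices not yet discarded
    bits = 0
    while True:
        A &= live
        B &= live
        if not A or not B:
            return False, bits      # an empty side certifies disjointness
        m = len(live)
        bits += 2                   # announce which case applies
        # Alice: a clique vertex of low degree in the live graph.
        v = next((u for u in A if len(adj[u] & live) < m / 2), None)
        if v is not None:
            bits += max(1, m.bit_length())   # naming v costs ~log m bits
            if v in B:
                return True, bits
            live &= adj[v] | {v}    # A lives among v and its neighbors
            continue
        # Bob: an independent-set vertex of high degree.
        w = next((u for u in B if len(adj[u] & live) >= m / 2), None)
        if w is not None:
            bits += max(1, m.bit_length())
            if w in A:
                return True, bits
            live -= adj[w]          # B lives among w's non-neighbors
            continue
        return False, bits          # neither case applies: A, B disjoint

# Triangle {0,1,2} plus three isolated vertices.
adj = [{1, 2}, {0, 2}, {0, 1}, set(), set(), set()]
assert cliques_vs_independent_sets(adj, {0, 1, 2}, {2, 4})[0] is True
assert cliques_vs_independent_sets(adj, {0, 1, 2}, {3, 4})[0] is False
```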
k-Disjointness  Alice and Bob are given two sets A, B ⊆ [n], each of size k, and want to know if the sets share a common element. Alice can send her set over, which takes k log n bits of communication. There is a randomized protocol that uses only O(k) bits of communication: Alice and Bob sample a random sequence of sets in the universe, and Alice announces the index of the first set that contains A. If A and B are disjoint, this eliminates half of B in expectation. Repeating this procedure gives a protocol with O(k) bits of communication. There is a non-deterministic protocol that uses O(log n) bits of communication.
Lemma 1.2. The number of leaves in the protocol tree for k k is at most
2k k .
Balancing Protocols
Lemma 1.4. In every protocol tree that has ℓ leaves, there is a vertex v such that the subtree rooted at v contains r leaves, and ℓ/3 ≤ r ≤ 2ℓ/3.

Proof. Consider the sequence of vertices v₁, v₂, . . . defined as follows: v₁ is the root of the tree, and for each i, v_{i+1} is the child of v_i that has the most leaves under it. Let ℓ_i denote the number of leaves in the subtree rooted at v_i. By the definition of v_i, we have that ℓ_{i+1} ≥ ℓ_i/2, and ℓ_{i+1} < ℓ_i. Since ℓ₁ = ℓ, and the sequence is decreasing until it hits 1, there must be some i for which ℓ/3 ≤ ℓ_i ≤ 2ℓ/3.

In each step of the balanced protocol, the parties pick a vertex v as promised by Lemma 1.4 and each verify whether their input is consistent with the entire path leading up to v. If this is the case, the parties repeat the procedure at the subtree rooted at v. If not, the parties delete the vertex v from the protocol tree and continue the protocol. In each step, the number of leaves of the protocol tree is reduced by a factor of at least 2/3, so there can be at most log_{3/2} ℓ such steps.

Figure 1.1: Rebalancing Protocol
Input: Alice knows x ∈ X, Bob knows y ∈ Y, both know a protocol π that has ℓ leaves.
Output: The outcome of the protocol π.
while π has more than 1 leaf do
    Find a vertex v as promised by Lemma 1.4;
    Alice and Bob exchange two bits to confirm that their inputs are consistent with the path in the protocol tree to v;
    if both inputs are consistent with v then
        Replace π with the subtree rooted at v;
    else
        Remove v from the protocol tree, and replace v's parent with v's sibling;
    end
end
Output the unique leaf in π;
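The walk down the tree in the proof of Lemma 1.4 can be sketched directly (the tree representation is ours; we assume the tree has at least 2 leaves):

```python
# Walk from the root toward the child with more leaves; the first
# subtree holding at most 2/3 of the leaves also holds at least 1/3.

class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def leaves(v):
    if v.left is None and v.right is None:
        return 1
    return leaves(v.left) + leaves(v.right)

def balanced_vertex(root):
    """Assumes the protocol tree has at least 2 leaves."""
    total = leaves(root)
    v = root
    while leaves(v) > 2 * total / 3:
        # The bigger child has at least half of v's leaves, so the
        # leaf count shrinks by at most a factor of 2 per step.
        v = v.left if leaves(v.left) >= leaves(v.right) else v.right
    return v

# A skewed 4-leaf tree: the balanced vertex holds 2 of the 4 leaves.
t = Node(Node(), Node(Node(), Node(Node(), Node())))
assert leaves(balanced_vertex(t)) == 2
```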
Rectangles
A₀ = {x ∈ A : f_v(x) = 0},
A₁ = {x ∈ A : f_v(x) = 1}.

A₀, A₁ are a partition of A, and moreover R_u = A₀ × B, R_w = A₁ × B. Thus R_u, R_w are rectangles that partition R_v. Continuing in this way,

Figure 1.3: A partition of the space into rectangles.

Theorem 1.7. If the communication complexity of g : X × Y → {0,1} is c, then X × Y can be partitioned into at most 2^c monochromatic rectangles.

Figure 1.5: A partition into rectangles that does not correspond to a protocol.
monochromatic rectangles R that cover all inputs. The aim of the protocol is to find a rectangle R ∈ R such that (x, y) ∈ R. In each step, one of the parties will announce the name of a rectangle R = A × B that is consistent with their input. If Alice announces such a rectangle, then it must be that x ∈ A, so both parties can safely discard all rectangles in R that do not vertically intersect R. Any rectangle that contains (x, y) will not be discarded. Similarly, if Bob announces R, then both parties can safely discard all rectangles that do not horizontally intersect R. We shall show that there will always be an R that one of the parties can announce that will allow for many other rectangles to be discarded.

Figure 1.6: Rectangles 1 and 2 intersect vertically, while rectangles 1 and 3 intersect horizontally.

Let R₀ = {R ∈ R : g(R) = 0} be the set of rectangles that have value 0, and R₁ = {R ∈ R : g(R) = 1} be the set of rectangles with value 1.
Alice can send Bob her input, and Bob can respond with the value of the function, giving a protocol with communication n + 1. Is there a protocol with communication n? Since any such protocol induces a partition into 2^n monochromatic rectangles, a first attempt at proving a lower bound might be to try to show that there is no large monochromatic rectangle. If we could prove that, then we could argue that many monochromatic rectangles are needed to cover the whole input. Unfortunately, equality does have large monochromatic rectangles; for example, the rectangle R = {(x, y) : x₁ = 0, y₁ = 1} has density 1/4, and it is monochromatic, since EQ(x, y) = 0 for every (x, y) ∈ R. Instead, we will try to show that equality does not have any large 1-monochromatic rectangle.
Observe that if x ≠ x′, then the points (x, x) and (x′, x′) cannot be in the same monochromatic rectangle. Otherwise, by Lemma 1.5, (x, x′) would also have to be included in this rectangle. Since the rectangle is monochromatic, we would have EQ(x, x′) = EQ(x, x), which is a contradiction. We have shown:

Claim 1.13. Every 1-monochromatic rectangle of EQ has size at most 1.
Alice can send her whole set X to Bob, which gives a protocol with communication n + 1. Can we prove that this is optimal? Once again, this function does have large monochromatic rectangles, for example the rectangle R = {(X, Y) : 1 ∈ X, 1 ∈ Y}, but we shall show that there are no large monochromatic 1-rectangles. Indeed, suppose R = A × B is a 1-monochromatic rectangle. Let X′ = ∪_{X∈A} X and Y′ = ∪_{Y∈B} Y. Then X′ and Y′ must be disjoint, so |X′| + |Y′| ≤ n. On the other hand, |A| ≤ 2^{|X′|} and |B| ≤ 2^{|Y′|}, so |R| = |A| · |B| ≤ 2^n. We have shown:

Claim 1.15. Every 1-monochromatic rectangle of Disj has size at most 2^n.
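For small n, Claim 1.15 can be verified exhaustively. Since the largest valid Bob-side for a fixed Alice-side A consists of all sets disjoint from the union of A, it suffices to enumerate Alice's side:

```python
# Exhaustive check of Claim 1.15 for n = 3: in any 1-monochromatic
# rectangle A x B of Disj (all pairs disjoint), |A| * |B| <= 2^n.
n = 3
all_sets = list(range(1 << n))          # subsets of [n] as bitmasks

best = 0
for mask in range(1, 1 << len(all_sets)):        # nonempty families A
    A = [s for i, s in enumerate(all_sets) if mask >> i & 1]
    union = 0
    for s in A:
        union |= s
    B = [s for s in all_sets if s & union == 0]  # sets disjoint from all of A
    best = max(best, len(A) * len(B))

assert best == 1 << n   # the largest 1-monochromatic rectangle has size 2^n
```

The bound is tight: taking A to be all subsets of some S and B all subsets of its complement gives exactly 2^n pairs.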
Richness
Sometimes we need to understand asymmetric communication protocols, where we need separate bounds on the communication complexity of Alice and of Bob. The concept of richness⁵ is useful here:

⁵ Miltersen et al., 1998
Proof. The statement is proved inductively. For the base case, if the protocol does not communicate at all, then clearly g(x, y) = 1 for all x ∈ X, y ∈ Y, and the statement holds.

If Bob sends the first bit of the protocol, then Bob partitions Y = Y₀ ∪ Y₁. One of these two sets must contain v/2 of the inputs y that make g (u, v)-rich. By induction, this set contains a u/2^a × (v/2)/2^{a+b−1} 1-monochromatic rectangle, as required. On the other hand, if Alice sends the first bit, then this bit partitions X into two sets X₀, X₁. Every input y that has u 1s must have u/2 1s in either X₀ or X₁. Thus there must be v/2 choices of inputs y ∈ Y that have u/2 1s for g restricted to X₀ × Y or for g restricted to X₁ × Y. By induction, we get that there is a 1-monochromatic rectangle with dimensions (u/2)/2^{a−1} × (v/2)/2^{a−1+b}, as required.
ever, what can we say about the communication of this problem if Alice is forced to send much less than log (n choose k) bits?

The communication complexity of this problem is at least log (n choose k).

To prove a lower bound, we need to analyze rectangles of a certain shape. We restrict our attention to a special family of sets for Alice and Bob. Let n = 2kt, and suppose Y contains exactly one element of {2i − 1, 2i} for each i, and that X contains exactly one element from {2t(i − 1) + 1, . . . , 2ti} for each i.
The disjointness matrix here is at least (t^k, 2^{kt})-rich, since every choice of Y allows for t^k possible choices for X that are disjoint. By Lemma 1.18, any protocol where Alice sends a bits and Bob sends b bits induces a 1-monochromatic rectangle with dimensions t^k/2^a × 2^{kt−a−b}, so Claim 1.19 gives:

2^{kt−a−b} ≤ 2^{kt − kt/2^{a/k}},  and so  a + b ≥ kt/2^{a/k} = n/2^{a/k+1}.

We conclude:
Span  Suppose Alice is given a vector x ∈ {0,1}^n, and Bob is given an n/2-dimensional subspace V ⊆ {0,1}^n. Their goal is to figure out whether or not x ∈ V. As in the case of disjointness, we start by claiming that the inputs do not have 1-monochromatic rectangles of a certain shape:
The problem we are working with is at least (2^{n/2}, 2^{n²/4}/n!)-rich, since there are at least 2^{n²/4}/n! subspaces, and each contains 2^{n/2} vectors. Applying Lemma 1.18 and Claim 1.21, we get that if there is a protocol where Alice sends a bits and Bob sends b bits,

2^{n²/4 − a − b}/n! ≤ 2^{n²/2 − n log 2^{n/2 − a}}
n²/4 − a − b ≤ n log n + na
n²/4 − a(n + 1) − n log n ≤ b.

Theorem 1.22. If Alice sends a bits and Bob sends b bits to solve the span problem, then b ≥ n²/4 − a(n + 1) − n log n.
last pair of sets must intersect, while the other two pairs are disjoint. Since S has 2^n pairs, and at least one more monochromatic rectangle is required, this proves:
Krapchenko's Method

We end this chapter with a clever idea of Krapchenko. Let X = {x ∈ {0,1}^n : Σ_{i=1}^n x_i = 0 mod 2} and Y = {y ∈ {0,1}^n : Σ_{i=1}^n y_i = 1 mod 2}. Since X and Y are disjoint, for every x ∈ X, y ∈ Y, there is an index i such that x_i ≠ y_i. Suppose Alice is given x and Bob is given y, and they want to find such an index i. How much communication is required?

Perhaps the most trivial protocol is for Alice to send Bob her entire string, but we can use binary search to do better. Since

Σ_{i≤n/2} x_i + Σ_{i>n/2} x_i ≠ Σ_{i≤n/2} y_i + Σ_{i>n/2} y_i  mod 2,

Alice and Bob can exchange Σ_{i≤n/2} x_i mod 2 and Σ_{i≤n/2} y_i mod 2. If these values are not the same, they can safely restrict their attention to the strings x_{≤n/2}, y_{≤n/2} and continue. On the other hand, if the values are the same, they can continue the protocol on the strings x_{>n/2}, y_{>n/2}. In this way, in every step they communicate 2 bits and eliminate half of their input string, giving a protocol of communication complexity 2 log n.

It is easy to see that log n bits of communication are necessary, because that's how many bits it takes to write down the answer. Now we shall prove that 2 log n bits are necessary, using a variant of fooling sets. Consider the set of inputs

We need at least n monochromatic rectangles to cover pairs of the type (0, e_i), where e_i is the ith unit vector.
since they both belong to S, that is the only coordinate on which they disagree. Thus y = y′. Similarly, we cannot have two distinct elements (x, y), (x′, y) ∈ S that belong to the same monochromatic rectangle. Thus, if R = A × B has r elements of S, we must have |A| ≥ r, |B| ≥ r, proving that |R| ≥ r².
Now suppose there are t monochromatic rectangles that cover the set S, and the ith rectangle covers r_i elements of S. Then |S| = Σ_{i=1}^t r_i, but since the rectangles are disjoint, 2^{2n−2} ≥ Σ_{i=1}^t r_i². Using these facts and the Cauchy-Schwarz inequality:

2^{2n−2} ≥ Σ_{i=1}^t r_i² ≥ (Σ_{i=1}^t r_i)² / t = n² 2^{2n−2}/t,

proving that t ≥ n². This shows that the binary search protocol is the best one can do.
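The binary search protocol above can be sketched directly (our encoding of the inputs as 0/1 lists; 2 bits are charged per round):

```python
def find_difference(x, y):
    """Binary-search protocol for Krapchenko's problem: x has even
    parity, y has odd parity. Returns (an index i with x[i] != y[i],
    the number of bits exchanged)."""
    lo, hi = 0, len(x)      # invariant: x[lo:hi] and y[lo:hi] differ in parity
    bits = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        bits += 2           # Alice and Bob each send one parity bit
        if sum(x[lo:mid]) % 2 != sum(y[lo:mid]) % 2:
            hi = mid        # the left halves already differ in parity
        else:
            lo = mid        # so the right halves must differ
    return lo, bits

x = [1, 1, 0, 0, 1, 0, 1, 0]        # parity 0
y = [1, 1, 0, 1, 1, 0, 1, 1]        # parity 1
i, bits = find_difference(x, y)
assert x[i] != y[i] and bits == 6   # 2 bits per round, log2(8) = 3 rounds
```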
Rectangle Covers
(1 − 2^{−2k})^t ≤ e^{−t·2^{−2k}} ≤ e^{−2},

Fact: 1 − x ≤ e^{−x} for x ≥ 0.
g^k((x₁, . . . , x_k), (y₁, . . . , y_k)) = (g(x₁, y₁), g(x₂, y₂), . . . , g(x_k, y_k)).
⁷ Feder et al., 1995

In fact, one can show that even computing the two bits ∧_{i=1}^k g(x_i, y_i) and ∨_{i=1}^k g(x_i, y_i) requires k(√c − log n − 1) bits of communication⁸.
The main technical lemma we show is:
Theorem 1.8 and Lemma 1.30 imply that g has a protocol with communication O((ℓ/k + log n + 1)²). Thus,
as required.
Now we turn to proving Lemma 1.30. We find the rectangles that cover the inputs to g iteratively. Let S ⊆ {0,1}^n × {0,1}^n denote the set of inputs to g that have not yet been covered by one of the monochromatic rectangles we have found. Initially, S is the set of all inputs. We claim:

proving that this process will stop after at most 2n · 2^{ℓ/k} steps.
Exercise 1.1
Define the inner product of two n-bit strings x, y to be Σ_{i=1}^n x_i y_i mod 2. Use linear algebra to show that inner-product has no 0-monochromatic rectangle of size bigger than 2^n. Conclude that the communication complexity of inner-product is Ω(n).
Exercise 1.2
Use ideas from Yannakakis's protocol to show that if g : X × Y → {0,1} is such that g⁻¹(1) can be partitioned into 2^c rectangles, then g has communication complexity at most O(c²).
Exercise 1.3
Suppose Alice and Bob each get a subset of size k of the elements of [n], and want to know whether these sets intersect or not. Use the fooling set method to show that at least log(⌊n/k⌋) bits are required.
Exercise 1.4
Suppose Alice gets a string x ∈ {0,1}^n which has more 0s than 1s, and Bob gets a string y ∈ {0,1}^n that has more 1s than 0s. They wish to communicate to find a coordinate i where x_i ≠ y_i. Use Krapchenko's method to show that at least 2 log n bits of communication are required.
Exercise 1.5
Show that almost all functions f : {0,1}^n × {0,1}^n → {0,1} require communication Ω(n), where Alice gets x ∈ {0,1}^n, Bob gets y ∈ {0,1}^n, and they must evaluate f(x, y).
Exercise 1.6
Let X and Y be families of subsets of [n]. Assume for all x ∈ X and y ∈ Y the intersection of x and y contains at most 1 element, that is, |x ∩ y| ≤ 1. Define the communication problem as follows. Alice receives x ∈ X, Bob receives y ∈ Y, and they wish to evaluate the function f : X × Y → {0,1} defined as f(x, y) = |x ∩ y|. Show that the deterministic complexity of f is O(log²(n)).
Exercise 1.7
Alice and Bob receive subsets of numbers X, Y ⊆ [n]. They wish to output the median of X ∪ Y. Exhibit a deterministic protocol with O(log n) bits of communication. Show that no protocol can do asymptotically better.
Exercise 1.8
Consider the partial function f : {0,1}^n × {0,1}^n → {0,1}, where the inputs to the parties are interpreted as pairs of n/2-bit strings, and

f((x, x′), (y, y′)) = 1 if x = y and x′ ≠ y′,
f((x, x′), (y, y′)) = 0 if x ≠ y and x′ = y′.
Exercise 1.9
For a boolean function g, define g^k by g^k(x₁, x₂, . . . , x_k, y₁, . . . , y_k) = ∧_{i=1}^k g(x_i, y_i). Show that if g^k has a 1-cover of size 2^ℓ, then g has a 1-cover of size 2^{ℓ/k}.
Exercise 1.10
⁹ Alon and Orlitsky, 1995

In this exercise, we will show⁹ that an optimal direct sum theorem does not hold for the deterministic communication complexity of relations. Consider the problem where Alice is given a subset X ⊆ [n] of size t, and Bob is given no input. The players want to output an element of X.

bits, which is significantly less than k log(n − t + 1) when t = n/2.

Hint: Pick a random subset of [n]^k of size (n/t)^k · k ln (n choose t), and argue that it intersects X₁ × · · · × X_k with positive probability.
Exercise 1.11
Show that if g : {0,1}^n × {0,1}^n → {0,1} requires c bits of communication, then computing ∧_{i=1}^k g(x_i, y_i) and ∨_{i=1}^k g(x_i, y_i) requires k(√(c/2) − log n − 1) bits of communication. Hint: Find a small 1-cover using the protocol for computing ∧_{i=1}^k g(x_i, y_i), and a small 0-cover using the protocol for computing ∨_{i=1}^k g(x_i, y_i).
2
Rank
The most basic quantity associated with a matrix is its rank. The
rank of a matrix is the maximum size of a set of linearly independent
rows in the matrix. Its versatility stems from the fact that it has many
interpretations:
The matrices we are working with are boolean, so one can view
the entries of the matrix as real numbers, or rationals, or coming
from the field of integers modulo 2: F2 . This potentially leads to 3
different notions of rank, but we have:
Lemma 2.7. The real rank of a boolean matrix is the same as its rational
rank. The real rank is always at least as large as the rank over F2 .
The proof of the first fact follows from Gaussian elimination. If the rank over the rationals is r, we can always apply a linear transformation to the rows using rational coefficients to bring the matrix into this form:

1 0 0 . . . 0 M_{1,r+1} . . . M_{1,n}
0 1 0 . . . 0 M_{2,r+1} . . . M_{2,n}
0 0 1 . . . 0 M_{3,r+1} . . . M_{3,n}
. . .
0 0 0 . . . 1 M_{r,r+1} . . . M_{r,n}
0 0 0 . . . 0 0 . . . 0
. . .

This transformation does not affect the rank over the reals, and now it is clear that the rank is exactly r. Now if any set of rows is linearly
The trivial protocol takes n bits, and one can use bounds on the size of the largest rectangle to show that the communication is at least Ω(n). Here it will be helpful to use Fact 2.4. If P_n represents the matrix whose entries are (−1)^{⟨x,y⟩}, sorting the rows and columns lexicographically, we see that

P_n = [ P_{n−1}  P_{n−1} ; P_{n−1}  −P_{n−1} ] = P_{n−1} ⊗ [ 1 1 ; 1 −1 ],

See Exercise 1.1.
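This recursion can be checked numerically. Assuming the entries of P_n are (−1)^{⟨x,y⟩}, the Kronecker construction produces the Sylvester-Hadamard matrix, which has full rank over the reals:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]])
P = np.array([[1]])
n = 5
for _ in range(n):
    P = np.kron(P, H)               # P_n = P_{n-1} tensor [[1,1],[1,-1]]

assert P.shape == (2**n, 2**n)
# Entry (x, y) is (-1)^{<x,y>}, where <x,y> counts common 1-bits.
assert all(P[x][y] == (-1) ** bin(x & y).count("1")
           for x in range(2**n) for y in range(2**n))
assert np.linalg.matrix_rank(P) == 2**n     # full rank
```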
¹ Lovász and Saks, 1988

Lovász and Saks conjectured¹ that Theorem 2.11 is closer to the truth than Theorem 2.9:
Let us see how to use Lemma 2.14 to get a protocol. Let R be the rectangle promised by the lemma. Then, rearranging the rows and columns, we can write the matrix as M = [ R A ; B C ]. Now we claim⁶ that

rank([ R ; B ]) + rank([ R A ]) ≤ rank([ R A ; B C ]) + 3.

So by Fact 2.3,

rank([ R ; B ]) + rank([ R A ]) ≤ rank(A) + rank(B) + 2
    ≤ rank([ 0 A ; B C ]) + 2
    ≤ rank([ R A ; B C ]) + 3.   (2.2)

Now suppose [ R ; B ] has the smaller rank. Then Bob sends the bit 0 if his input is consistent with R and 1 otherwise. If it is consistent, then if rank(M) > 9, the players have reduced⁷ the rank of the matrix by a factor of at least 2/3. If it is not consistent, the players have reduced the size of the matrix by a factor of 1 − 2^{−20√r log r}.

By Lemma 2.8, we can assume that any matrix of rank r has at most 2^r rows and columns. The number of 0 transmissions in this protocol is at most 2r ln 2 · 2^{20√r log r}, since after that many transmissions, the number of entries in the matrix has been reduced to⁸

2^{2r} (1 − 2^{−20√r log r})^{2r ln 2 · 2^{20√r log r}} < 2^{2r} e^{−2r ln 2} = 1.

The number of 1 transmissions is at most O(log_{3/2} r), since after that many transmissions, the rank of the matrix is reduced to less than 6. Thus, the number of leaves in this protocol is at most

(2r ln 2 · 2^{20√r log r})^{O(log_{3/2} r)} ≤ 2^{O(√r log² r)}.

By Theorem 1.3, we can balance the protocol tree to obtain a protocol with communication O(√r log² r) that computes the same function.

⁷ (t + 3)/2 ≤ 2t/3, when t ≥ 9.
⁸ Fact: 1 − x ≤ e^{−x}, for x ≥ 0.

Figure 2.1: Protocol for Low Rank Matrices with 2^{O(√r log² r)} leaves.
Input: Alice knows i, Bob knows j.
Output: M_{i,j}.
while rank(M) > 9 do
    Find a monochromatic rectangle R as promised by Lemma 2.14;
    Write M = [ R A ; B C ];
    if rank([ R ; B ]) > rank([ R A ]) then
        if i is consistent with R then
            Both parties replace M with [ R A ];
        else
            Both parties replace M with [ B C ];
        end
    else
        if j is consistent with R then
            Both parties replace M with [ R ; B ];
        else
            Both parties replace M with [ A ; C ];
        end
    end
end
The parties exchange at most 9 bits to compute M_{i,j}, using Theorem 2.9;

It only remains to prove Lemma 2.14. To prove it, we need to understand John's theorem. A set K ⊆ ℝ^r is called convex if whenever x, y ∈ K, then all the points on the line from x to y are also in K. The
where j = j if j 6= i, and i = ei .
For the rest of the proof, we assume that M has at least mn/2 0s. We can do this because if M has more 1s than 0s, we can replace M with J − M, where J is the all-1s matrix. This can increase the rank by at most 1, but now the roles of 0s and 1s have been reversed.
Lemma 2.17 says something about the angles between the vectors we have found. Define θ_{i,j} = arccos( ⟨v_i, w_j⟩ / (‖v_i‖ ‖w_j‖) ). Then observe that when v_i, w_j are orthogonal, the angle is π/2. But when the inner product is 1, the angle is at most arccos(1/√r) ≤ π/2 − 2/(7√r). So we get:

θ_{i,j} = π/2 if M_{i,j} = 0,
θ_{i,j} ≤ π/2 − 2/(7√r) if M_{i,j} = 1.

For a fixed (i, j) and a single sample z_k, the probability that ⟨v_i, z_k⟩ > 0 and ⟨w_j, z_k⟩ < 0 is exactly θ_{i,j}/(2π). So we get

Pr_R[(i, j) ∈ R] = (1/4)^t if M_{i,j} = 0,
Pr_R[(i, j) ∈ R] ≤ (1/4 − 1/(7π√r))^t if M_{i,j} = 1.

E[Q] ≥ (mn/2)(1 − 1/r) · 2^{−14√r log r} ≥ mn · 2^{−16√r log r}, since r > 1.
Figure 2.5: Going from a nearly monochromatic rectangle to a monochromatic rectangle.

Another way to measure the complexity of a matrix is by measuring its non-negative rank. The non-negative rank of an m × n boolean matrix M is the smallest number r such that M = AB, where A, B are matrices with non-negative entries, such that A is an m × r matrix and B is an r × n matrix. Equivalently, it is the smallest number of non-negative rank-1 matrices that sum to M. Clearly, we have
¹¹ Lovász, 1990

Moreover, one can prove that if a matrix has both small rank and a small cover, then there is a small communication protocol¹¹:
Proof. The protocol is similar to the one used to prove Theorem 1.8. For every rectangle R in the cover, we can write

M = [ R A ; B C ],

and by inequality (2.2), either rank([ R A ]) ≤ (rank(M) + 3)/2, or

rank([ R ; B ]) ≤ (rank(M) + 3)/2.   (2.4)
Exercise 2.1
Fix a function f : X Y {0, 1} with the property that in
every row and column of the communication matrix M f there are
exactly t ones. Cover the zeros of M f using O(t(log| X | + log|Y |))
monochromatic rectangles.
Exercise 2.2
Show that the Nisan-Wigderson protocol (i.e., the proof of Lemma 2.13) goes through even if we weaken Lemma 2.14 to only guarantee a rectangle with rank at most r/8 (instead of rank at most one, or monochromatic).
Exercise 2.3
Recall that for a simple, undirected graph G the chromatic number χ(G) is the minimum number of colors needed to color the vertices of G so that no two adjacent vertices have the same color. Show that log χ(G) is at most the deterministic communication complexity of G's adjacency matrix.
Exercise 2.4
For any symmetric matrix M ∈ {0,1}^{n×n} with ones in all diagonal entries, show that

n²/|M| ≤ 2^c,

where c is the deterministic communication complexity of M, and |M| is the number of ones in M.
Exercise 2.5
For any boolean matrix M, define rank₂(M) to be the rank of M over F₂, the field with two elements. Exhibit an explicit family of matrices M ∈ {0,1}^{n×n} with the property that c ≤ rank₂(M)/10, where c is the deterministic communication complexity of M. Conclude that this falsifies the analogue of the log-rank conjecture for rank₂.
Exercise 2.6
Show that if f has a fooling set of size s then rank(M_f) ≥ √s. Hint: tensor product.
3
Randomized Protocols
used the rank method to argue that at least log (n choose k) ≥ k log(n/k) bits of communication are required. Here we give a randomized protocol² that requires only O(k) bits of communication³, which is more efficient when k ≪ n.

² Håstad and Wigderson, 2007
³ Later, we show that Ω(k) bits are required.

Alice and Bob sample a sequence of sets R₁, R₂, . . . ⊆ [n], uniformly at random. They exchange 2 bits to announce whether or not their sets are empty. If neither set is empty, Alice announces the index i of the first set R_i that contains her set, and Bob announces the index j of the first set R_j that contains his set. Now Alice can safely replace her set with X ∩ R_j, and Bob can replace his set with Y ∩ R_i. If at any point one of the parties is left with an empty set, they can safely conclude that the inputs were disjoint. We will argue that if the sets are disjoint, this process terminates after O(k) bits of communication.

Assume that X, Y are disjoint. Let us start by analyzing the expected number of bits that will be communicated in the first step. We claim:

Claim 3.1. E[i] = 2^{|X|}, E[j] = 2^{|Y|}.

Proof. The probability that the first set of the sequence contains X is exactly 2^{−|X|}. In the event that it does not contain X, we are picking the first set that contains X from the rest of the sequence.

Figure 3.3: Public-coin protocol for greater-than.
Let J = [n];
while |J| > 1 do
    Let J′ be the first |J|/2 elements of J;
    Both parties use shared randomness to sample a random function h : {0,1}^{|J′|} → {0,1}^{2 log log ℓ};
    Alice sends h evaluated on her bits in J′, h(x_{J′});
    Bob announces whether or not h(x_{J′}) = h(y_{J′});
    if h(x_{J′}) = h(y_{J′}) then
        Alice and Bob replace J = J \ J′;
    else
        Alice and Bob replace J = J′;
    end
end
Both parties announce x_J, y_J;
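A simulation sketch of this protocol (the encoding, the bit accounting via index lengths, and the stopping budget are our additions; the real protocol's analysis, not a cutoff, bounds the communication):

```python
import random

def disjointness_protocol(X, Y, n, rng, budget=600):
    """Simulation of the random-sets protocol above. Returns
    (certainly_disjoint, bits). One-sided: True is reported only when
    X and Y really are disjoint."""
    X, Y = set(X), set(Y)
    bits = 0
    shared = []                     # the shared random sequence R1, R2, ...

    def first_containing(S):        # index of the first Ri with S <= Ri
        i = 0
        while True:
            if i == len(shared):
                shared.append({u for u in range(n) if rng.random() < 0.5})
            i += 1
            if S <= shared[i - 1]:
                return i

    while bits < budget:
        bits += 2                   # announce whether either set is empty
        if not X or not Y:
            return True, bits       # an empty set certifies disjointness
        i = first_containing(X)     # Alice announces i
        j = first_containing(Y)     # Bob announces j
        bits += i.bit_length() + j.bit_length()
        X, Y = X & shared[j - 1], Y & shared[i - 1]
        shared.clear()              # fresh randomness each round (our choice)
    return False, bits              # out of budget: presumably intersecting

rng = random.Random(0)
assert disjointness_protocol({0, 2}, {5, 7}, 10, rng)[0] is True
assert disjointness_protocol({0, 2}, {2, 7}, 10, rng)[0] is False
```

A common element can never be eliminated (it lies in every announced set that contains the other side), so intersecting inputs always exhaust the budget, while disjoint sets shrink by about half per round.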
Theorem 3.4. Let M be an m × n matrix. Then

min_{x≥0} max_{y≥0} xMy = max_{y≥0} min_{x≥0} xMy,

The minimax principle can also be seen as a consequence of linear programming duality.
Theorem 3.5. If g : {0,1}^n × {0,1}^n → {0,1} can be computed with c bits of communication, and error ε in the worst case, then it can be computed by a private coin protocol with c + log(n/ε²) + O(1) bits of communication, and error 2ε in the worst case.

It is known that computing whether or not two n-bit strings are equal requires Ω(log n) bits of communication if only private coins are used. This shows that Theorem 3.5 is tight.

Proof. We use the probabilistic method to find the required private coin protocol. Let us pick t independent random strings, each of which can be used as the randomness for the given public-coin protocol.
For any fixed input, some of these t random strings lead to the public coin protocol computing the right answer, and some of them lead to the protocol computing the wrong answer. By the Chernoff bound, the probability that a 2ε fraction of the t strings lead to the wrong answer is at most 2^{−Ω(ε²t)}. We set t = O(n/ε²) to be large enough so that this probability is less than 2^{−2n}. Then by the union bound, we get that the probability that 2εt of these strings give the wrong answer for some input is less than 1. Thus there must be some fixed strings with this property.

The private coin protocol is now simple. Alice samples one of the t strings and sends its index to Bob, which takes at most log(n/ε²) + O(1) bits. Alice and Bob then run the original public coin protocol.
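A toy instance of this probabilistic-method argument, with parameters of our choosing: for n = 4 we fix t shared strings, each serving as the coins of a two-trial inner-product equality protocol (error 1/4 on unequal inputs), and verify by brute force that every input pair is decided correctly by a clear majority of the fixed strings.

```python
import random

n, t = 4, 200
rng = random.Random(7)
coins = [([rng.randrange(2) for _ in range(n)],
          [rng.randrange(2) for _ in range(n)]) for _ in range(t)]

def ip(z, r):                       # <z, r> mod 2, z given as a bitmask
    return sum(b * (z >> k & 1) for k, b in enumerate(r)) % 2

def run(x, y, rs):                  # one run: accept iff both hashes agree
    r1, r2 = rs
    return ip(x, r1) == ip(y, r1) and ip(x, r2) == ip(y, r2)

# Worst-case (over all input pairs) fraction of the fixed strings that
# answer correctly.
worst = min(
    sum(run(x, y, rs) == (x == y) for rs in coins) / t
    for x in range(1 << n) for y in range(1 << n)
)
assert worst > 0.6   # every pair: a clear majority of the strings is correct
```

Once such a collection of strings is fixed, Alice needs only log t bits to name one of them, exactly as in the proof.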
Pr[g(x, y) = b | (x, y) ∈ R] ≥ 1 − ε.
Theorem 3.7. If there is a c-bit protocol that computes g with error ε under a distribution μ, then you can partition the inputs into 2^c rectangles, such that the average bias of a random rectangle from the partition is at least 1 − ε.

Theorem 3.8. If there is a c-bit protocol that computes g with error ε under μ, then for every ℓ, there are disjoint (1 − ℓε)-monochromatic rectangles R₁, R₂, . . . , R_{2^c} such that Pr[(x, y) ∈ ∪_i R_i] ≥ 1 − 1/ℓ.

Theorem 3.8 will be instrumental in proving lower bounds on randomized protocols.

As a corollary, we get:

Corollary 3.9. If there is a c-bit protocol that computes g with error ε under μ, then for every ℓ, there is a (1 − ℓε)-monochromatic rectangle of density at least 2^{−c}(1 − 1/ℓ).
Exercise 3.1
In this exercise we will develop a randomized protocol for greater-than that requires only O(log n log log n) bits of communication. Let x, y ∈ {0,1}^ℓ be two strings. Alice and Bob want to find the smallest i such that x_i ≠ y_i.
Exercise 3.2
In this exercise, we design a randomized protocol for finding the first difference between two n-bit strings. Alice and Bob are given n-bit strings x ≠ y and want to find the smallest i such that x_i ≠ y_i. In class we saw how to accomplish this using O(log n log log n) bits of communication. Here we do it with O(log n) bits of communication.
Define a rooted tree as follows. Every vertex will correspond to an
interval of coordinates from [n]. The root corresponds to the interval
I = [n]. Every internal vertex corresponding to the interval I will
have two children, the left child corresponding to the first half of I
and the right child corresponding to the right half of I. This defines
a tree of depth log n, where the leaves correspond to intervals of
size 1 (i.e. coordinates) of the input. At each leaf, attach a path of
length 3 log n. Every vertex of this path represents the same interval
of size 1. The depth of the tree is now 4 log n.
3. Use the Chernoff bound to argue that the number of hashes that
gives the right answer is high enough to ensure that the protocol
Exercise 3.3
Show that if the inputs to greater-than are sampled uniformly
and independently, then there is a protocol that communicates only
O(log(1/e)) bits and has error at most e under this distribution.
4
Numbers On Foreheads
1 Chandra et al., 1983

The number-on-forehead model1 of communication is one way to generalize the case of two party communication to the multiparty setting. There are k parties communicating, and the ith party has an input drawn from the set $X_i$ written on their forehead. Each party can see all of the inputs except the one that is written on their own forehead. (When there are only k = 2 parties, this model is identical to the model of 2-party communication.) The fact that each party can see most of the inputs means that parties do not need to communicate as much. It also means that proving lower bounds against this model is particularly hard. Indeed, we do not yet know how to prove optimal lower bounds in this model of computation, in stark contrast to models where the inputs are completely private. (Moreover, optimal lower bounds in this model would have very interesting consequences for the study of circuit complexity.)

We start with some examples of clever number-on-forehead protocols.
Intersection size. Suppose there are k parties, and the ith party has a subset $X_i \subseteq [n]$ on their forehead. The parties want to compute the size of the intersection $\cap_i X_i$. We shall describe a protocol2 that requires only $O(k^4 n/2^k)$ bits of communication. (A protocol solving this problem would compute both the disjointness function and the inner product function.)

Claim 4.1. If there are two valid solutions $A_k, \dots, A_0$ and $A'_k, \dots, A'_0$ that are both consistent with the values $C_{i,j}$, then either $A'_k = A_k$, or for each j, $|A_j - A'_j| \ge \binom{k}{j}$,

as required.

Claim 4.1 implies that if $n < \binom{k}{k/2}$, then there can only be one solution for $A_k$, since $|A_{k/2} - A'_{k/2}| \ge \binom{k}{k/2}$ cannot exceed n. To get the final protocol, the parties divide the columns of the matrix into blocks of size at most $\binom{k}{k/2}$, and compute $A_k$ for each such block separately. The total communication is then
$$\frac{n k^2 \log k}{\binom{k}{k/2}} = O(k^4 n / 2^k).$$
Exactly n. Suppose 3 parties each have a number from [n] written on their forehead, and want to know whether these numbers sum to n or not. A trivial protocol is for one of the parties to announce one of the numbers she sees, which takes log n bits. Here we use ideas of Behrend3 to show that one can do it with just $O(\sqrt{\log n})$ bits of communication. (The randomized communication of exactly-n is only a constant, since Alice can use the randomized protocol for equality to check that her forehead is equal to what it needs to be for all numbers to sum to n.) Behrend's ideas lead to a coloring of the integers
$$a + (a + 2b) = 2(a + b),$$
$$\bigl(a + 2b - W(a + 2b)\bigr) + \bigl(a - W(a)\bigr) = 2\bigl(a + b - W(a + b)\bigr),$$
as required. The communication required is $O(\sqrt{\log n})$ since there are at most $2^{O(\sqrt{\log n})}$ colors in the coloring.
Cylinder Intersections

protocol that computes this function using $O(\sqrt{\log n})$ bits of communication. Here we show that $\Omega(\log\log\log n)$ bits of communication are required5.

5 Chandra et al., 1983
Let $c_n$ be the communication complexity of the exactly-n problem. Three points of $[n] \times [n]$ form a corner if they are of the form $(x, y), (x + d, y), (x, y + d)$ for some $d \neq 0$. A coloring of $[n] \times [n]$ with C colors is a function $g : [n] \times [n] \to [C]$. We say that the coloring avoids monochromatic corners if there is no corner with $g(x, y) = g(x + d, y) = g(x, y + d)$. Let $C_n$ be the minimum number of colors required to avoid monochromatic corners in any coloring of $[n] \times [n]$. We claim that $C_n$ essentially captures the value of $c_n$:
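For small n, these definitions can be checked by brute force. The helper below is our own, not from the text; it searches a coloring for a monochromatic corner:

```python
from itertools import product

def has_mono_corner(g, n):
    """Return True iff the coloring g of [n] x [n] contains a
    monochromatic corner (x, y), (x+d, y), (x, y+d) with d != 0."""
    for x, y in product(range(n), repeat=2):
        for d in range(-n + 1, n):
            if d == 0:
                continue
            if 0 <= x + d < n and 0 <= y + d < n:
                if g(x, y) == g(x + d, y) == g(x, y + d):
                    return True
    return False
```

Coloring a point by its first coordinate avoids monochromatic corners, but uses n colors; $C_n$ asks how few colors suffice.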
Next we prove6:

Theorem 4.7. $C_n \ge \Omega\left(\frac{\log\log n}{\log\log\log n}\right)$.

We shall prove by induction that as long as C > 3, if $n \ge 2^{C^{2r}}$, then any coloring of $[n] \times [n]$ with C colors must contain either a monochromatic corner, or a rainbow corner with r colors. When $r = C + 1$, this means that if $n \ge 2^{C^{2(C+1)}}$, $[n] \times [n]$ must contain a monochromatic corner, proving that $C_n \ge \Omega\left(\frac{\log\log n}{\log\log\log n}\right)$. (Indeed, avoiding a monochromatic corner then requires $\log\log n \le 2(C_n + 1)\log C_n$, which cannot happen if $C_n = o(\log\log n/\log\log\log n)$.)

For the base case, when r = 2, two of the points of the type $(x, n - x)$ must have the same color, since n > C. If $(x, n - x)$ and $(x', n - x')$ have the same color, with $x > x'$, then $(x', n - x), (x, n - x), (x', n - x')$ are either a monochromatic corner, or a rainbow corner with 2 colors.
For the inductive step, if $n = 2^{C^{2r}}$, then [n] contains $m = 2^{C^{2r} - C^{2(r-1)}}$ consecutive disjoint intervals: $[n] = I_1 \cup I_2 \cup \dots \cup I_m$, each of size exactly $2^{C^{2(r-1)}}$. By induction, each of the sets $I_j \times I_j$ must have either a monochromatic corner, or a rainbow corner with r − 1 colors. If one of them has a monochromatic corner, we are done, so suppose they all have rainbow corners with r − 1 colors. Since a rainbow corner is specified by choosing the center, choosing the colors and choosing the offsets for each color, there are at most $(2^{C^{2(r-1)}})^2 \cdot 2^C \cdot (2^{C^{2(r-1)}})^C$ rainbow corners in each interval. This number is at most $2^{2C^{2(r-1)} + C + C^{2r-1}} < 2^{C^{2r} - C^{2(r-1)}} = m$, so there must be $j < j'$ that have exactly the same rainbow corner with the same coloring. Then we see (Figure 4.9) that these two rainbow corners must induce a monochromatic corner centered in the box $I_j \times I_{j'}$, or a rainbow corner with r colors.

(Figure 4.9: A rainbow corner induced by two smaller rainbow corners.)
Exercise 4.1

Define the generalized inner product function GIP as follows. Each of the k players is given a binary string $x_i \in \{0,1\}^n$. They want to compute $GIP(x) = \sum_{j=1}^n \prod_{i=1}^k x_{i,j} \pmod 2$. (Each vector $x_i$ can be interpreted as a subset of [n]. Our set intersection protocol computes GIP with $O(k^4 n/2^k)$ bits. The improved protocol outlined here, by A. Chattopadhyay, slightly improves a famous protocol of V. Grolmusz.)

This exercise outlines a number-on-forehead GIP protocol using $O(n/2^k + k)$ bits. It will be convenient to think about the input X as a $k \times n$ matrix with rows corresponding to $x_1, \dots, x_k$.

Fix $z \in \{0,1\}^n$. Assume the first t coordinates of z are ones and the rest are zeros. For $\ell \in \{0, 1, \dots, k-1\}$ define $c_\ell$ as the number of columns in X with $\ell$ ones, followed by either a one or a zero, followed by $k - \ell - 1$ zeros. Note that $GIP(x) = c_k \pmod 2$. Find a protocol to compute GIP(x) using O(k) bits assuming the players know $c_t \pmod 2$. (One can extend this protocol to compute any function of the number of all-ones columns using $O(n/2^k + k\log n)$ bits.)

Exhibit an overall protocol for GIP by showing that the players can agree upon a vector z and communicate to determine $c_t \pmod 2$ using $O(n/2^k + k)$ bits.
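A reference implementation of GIP itself (naming ours) is handy for testing protocol sketches against:

```python
def gip(rows):
    """Generalized inner product: the parity of the number of
    all-ones columns of the k x n 0/1 matrix whose ith row sits on
    player i's forehead."""
    k, n = len(rows), len(rows[0])
    all_ones_columns = sum(
        all(rows[i][j] == 1 for i in range(k)) for j in range(n)
    )
    return all_ones_columns % 2
```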
Exercise 4.2

Show that $\lim_{n \to \infty} c_n = \infty$.
Exercise 4.3
A three player NOF puzzle demonstrates that unexpected effi-
ciency is sometimes possible.
Inputs: Alice has a number i [n] on her forehead, Bob has a
number j [n] on his forehead, and Charlie has a string x {0, 1}n
on his forehead.
Output: On input (i, j, x ) the goal is for Charlie to output the bit xk
where k = i + j (mod n).
Question: Find a deterministic protocol such that Bob sends one bit
to Charlie, and Alice sends b n2 c bits to Charlie. Alice and Bob must
Exercise 4.4

Show that any degree-d polynomial over $\mathbb{F}_2$ in the variables $x_1, \dots, x_n$ can be computed by d + 1 players with O(d) bits of Number-On-Forehead communication, for any partition of the inputs where each party has $n/(d+1)$ bits on their forehead. (You may assume d + 1 divides n exactly.)
5
Discrepancy

$$\frac{1 - 2\epsilon}{2^c} \le \max_R \left| \mathbb{E}_{x,y}\left[ \chi_R(x,y)\,(-1)^{g(x,y)} \right] \right|.$$

Lemma 5.3. Every n-vertex graph with $\epsilon\binom{n}{2}$ edges has at least $(\epsilon n - 1)^4/4$ 4-cycles.

Proof. Let $1_{x,y}$ be 1 when there is an edge between the vertices x and y, and 0 otherwise. Then if $x, x', y, y'$ are chosen uniformly at random, (Figure 5.1: A dense graph with no 3-cycles.)
We can use similar ideas to prove that every dense bipartite graph must contain a reasonably large bipartite clique. Next we show a slightly different way to prove this:

Lemma 5.4. If G is a bipartite graph of edge density $\epsilon$ with bipartition A, B, where $|A| = |B| = n$, then there exist subsets $Q \subseteq A$, $R \subseteq B$ with $|Q| \ge \frac{\log n}{2\log(e/\epsilon)}$, $|R| \ge \sqrt{n}$, such that every pair of vertices $q \in Q$, $r \in R$ is connected by an edge.

Proof. Pick a random subset $Q \subseteq A$ of size $q = \frac{\log n}{2\log(e/\epsilon)}$, and let R be all the common neighbors of Q. Given any vertex $b \in B$ that has degree d, the probability that b is included in R is exactly
$$\binom{d}{q} \Big/ \binom{n}{q} \ge \frac{(d/q)^q}{(en/q)^q} = \left(\frac{d}{en}\right)^q.$$
(Fact: $(n/k)^k \le \binom{n}{k} \le (en/k)^k$.)

So if $d_i$ is the degree of the ith vertex, the expected size of the set R is at least
$$\sum_{i=1}^n \left(\frac{d_i}{en}\right)^q \ge n\left(\frac{1}{n}\sum_{i=1}^n \frac{d_i}{en}\right)^q = n\left(\frac{\epsilon}{e}\right)^q = n \cdot n^{-1/2} = \sqrt{n},$$
by convexity. So there must be some choice of Q, R that proves the Lemma.
Say Alice and Bob are given $x, y \in \{0,1\}^n$ and want to compute $\langle x, y\rangle \bmod 2$. We have seen that this requires n + 1 bits of communication using a deterministic protocol. Here we show that it requires about n/2 bits of communication even using a randomized protocol.

Lemma 5.5. For any rectangle R, the discrepancy of R with respect to the inner product is at most $2^{-n/2}$.

Theorem 5.6. Any 2-party protocol that computes the inner-product with error at most $\epsilon$ over the uniform distribution must have communication at least $n/2 - \log(1/(1-2\epsilon))$.
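The quantity in Lemma 5.5 can be computed exactly for small n. The sketch below (our own helper) evaluates the discrepancy of the full rectangle, which turns out to be exactly $2^{-n}$, comfortably below the $2^{-n/2}$ bound:

```python
import itertools

def ip_discrepancy_full(n):
    """|E[(-1)^{<x,y>}]| over uniform x, y in {0,1}^n, i.e. the
    discrepancy of the full rectangle for inner product mod 2."""
    total = 0
    for x in itertools.product((0, 1), repeat=n):
        for y in itertools.product((0, 1), repeat=n):
            ip = sum(a * b for a, b in zip(x, y)) % 2
            total += (-1) ** ip
    return abs(total) / 4 ** n
```

The hard part of Lemma 5.5 is that no rectangle, of any shape, does much better.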
where the inequality follows from the fact that $\mathbb{E}[Z]^2 \le \mathbb{E}[Z^2]$ for any real valued random variable Z. Now we can drop $\chi_k(x)$ from this expression to get:
$$\mathbb{E}_x\left[\chi_S(x)\,(-1)^{GIP(x)}\right]^2 \le \mathbb{E}_{x_1,\dots,x_{k-1}}\left[\mathbb{E}_{x_k}\left[\prod_{i=1}^{k-1}\chi_i(x)\,(-1)^{GIP(x)}\right]^2\right]$$
$$= \mathbb{E}_{x_1,\dots,x_k,x_k'}\left[\prod_{i=1}^{k-1}\chi_i(x)\,\chi_i(x')\,(-1)^{\sum_{j=1}^n (x_{k,j}+x_{k,j}')\prod_{i=1}^{k-1} x_{i,j}}\right]$$
Theorem 5.8. Any randomized protocol for computing the generalized inner product in the number-on-forehead model with error $\epsilon$ requires $n/4^{k-1} - \log(1/(1-2\epsilon))$ bits of communication.
At first it may seem that the discrepancy method is not very useful for proving lower bounds against functions like disjointness, which do have large monochromatic rectangles.

Suppose Alice and Bob are given two sets $X, Y \subseteq [n]$ and want to compute disjointness. If we use a distribution on inputs that gives intersecting sets with probability at most $\epsilon$, then there is a trivial protocol with error at most $\epsilon$. On the other hand, if the probability of intersection is at least $\epsilon$, then there must be some fixed coordinate i such that an intersection occurs in coordinate i with probability at least $\epsilon/n$. Setting $R = \{(X, Y) : i \in X, i \in Y\}$, we get
$$\left|\mathbb{E}\left[\chi_R(X,Y)\,(-1)^{Disj(X,Y)}\right]\right| \ge \epsilon/n,$$
$$\mathbb{E}\left[\chi_R(X,Y)\,(-1)^{\sum_{i=1}^m Disj(X_i,Y_i)}\right]^2 = \mathbb{E}\left[A(X)B(Y)\,(-1)^{\sum_{i=1}^m Disj(X_i,Y_i)}\right]^2$$
$$\le \mathbb{E}\left[A(X)^2\right]\cdot \mathbb{E}_X\left[\mathbb{E}_Y\left[B(Y)\,(-1)^{\sum_{i=1}^m Disj(X_i,Y_i)}\right]^2\right]$$
$$\le \mathbb{E}_{X,Y,Y'}\left[B(Y)B(Y')\,(-1)^{\sum_{i=1}^m Disj(X_i,Y_i)+\sum_{i=1}^m Disj(X_i,Y_i')}\right]$$
$$\le \mathbb{E}_{Y,Y'}\left[\left|\mathbb{E}_X\left[(-1)^{\sum_{i=1}^m Disj(X_i,Y_i)+\sum_{i=1}^m Disj(X_i,Y_i')}\right]\right|\right]$$
Lemma 5.9 may not seem useful at first, because under the given distribution, the probability that X, Y are disjoint is $2^{-m}$. However, we can actually use it to give a linear lower bound on the communication of deterministic protocols. Suppose a deterministic protocol for disjointness has communication c. Then there must be at most $2^c$ monochromatic 1-rectangles $R_1, \dots, R_t$ that cover all the 1s. Whenever X, Y are disjoint, we have that $\sum_{j=1}^m Disj(X_j, Y_j) = m$. On the other hand, the probability that X, Y are disjoint is exactly $2^{-m}$. Thus, we get
$$2^{-m} \le \mathbb{E}\left[\sum_{i=1}^t \chi_{R_i}(X,Y)\,(-1)^{m - \sum_{j=1}^m Disj(X_j,Y_j)}\right] \le \sum_{i=1}^t \left|\mathbb{E}\left[\chi_{R_i}(X,Y)\,(-1)^{\sum_{j=1}^m Disj(X_j,Y_j)}\right]\right| \le 2^c \prod_{j=1}^m \frac{1}{\sqrt{|I_j|}}.$$
$$\left|\mathbb{E}\left[\chi_S(X)\,(-1)^{\sum_{j=1}^m Disj(X_{1,j},\dots,X_{k,j})}\right]\right| \le \prod_{j=1}^m \frac{2^{k-1}-1}{\sqrt{|I_j|}}.$$
$$\mathbb{E}\left[\chi_S(X)\,(-1)^{\sum_{j=1}^m Disj(X_{1,j},\dots,X_{k,j})}\right]^2$$
$$\le \mathbb{E}_{X_1,\dots,X_{k-1}}\left[\chi_k(X)^2\right]\cdot\mathbb{E}_{X_1,\dots,X_{k-1}}\left[\mathbb{E}_{X_k}\left[\prod_{i=1}^{k-1}\chi_i(X)\,(-1)^{\sum_{j=1}^m Disj(T_j)}\right]^2\right]$$
$$= \mathbb{E}_{X_1,\dots,X_{k-1},X_k,X_k'}\left[\prod_{i=1}^{k-1}\chi_i(X)\,\chi_i(X')\,(-1)^{\sum_{j=1}^m Disj(T_j)+Disj(T_j')}\right], \qquad (5.2)$$

Then we get:
$$(5.2) \le \mathbb{E}\left[\prod_{j=1}^m Z_j\right] = \prod_{j=1}^m \mathbb{E}\left[Z_j\right],$$
Proof.
$$\mathbb{E}_{Q,v}\left[\frac{1}{|Q|}\right] = \sum_{Q \neq \emptyset}\frac{1}{|Q|}\left(\frac{1}{|I_j|}\right)^{|Q|-1}\left(1-\frac{1}{|I_j|}\right)^{|I_j|-|Q|}\cdot\frac{1}{|I_j|} = \frac{1}{|I_j|}.$$
$$Z_j = \frac{(2^{k-2}-1)^2}{\sqrt{|X_{k,j}\setminus X'_{k,j}|\cdot|X'_{k,j}\setminus X_{k,j}|}} \le \frac{(2^{k-2}-1)^2}{2}\left(\frac{1}{|X_{k,j}\setminus X'_{k,j}|}+\frac{1}{|X'_{k,j}\setminus X_{k,j}|}\right),$$
by the arithmetic mean–geometric mean inequality: $\sqrt{ab} \le \frac{a+b}{2}$.
$$\mathbb{E}\left[Z_j\right] \le (2^{k-2}-1)^2\left(\Pr[v = v'] + \mathbb{E}\left[\frac{1}{|X_{k,j}\setminus X'_{k,j}|}\right]\right) \le \frac{2^{k-1}-1}{|I_j|}+\frac{2(2^{k-1}-1)(2^{k-2}-1)^2}{(2^{k-2}-1)|I_j|} = \frac{(2^{k-1}-1)^2}{|I_j|},$$
as required.
1 Shannon, 1948

Shannon's seminal work on information theory1 has had a big impact on communication complexity. Shannon wanted to measure the amount of information (or entropy) contained in a random variable X. Shannon's definition was motivated by the observation that the amount of information contained in a message is not the same as the length of the message. Suppose we are working in the distributional setting, where the inputs are sampled from some distribution $\mu$.

Consider a protocol where Alice's first message to Bob is a c-bit string that is always $0^c$, no matter what her input is. This message does not convey any information to Bob. (The entropy of the message is 0.) We might as well run the protocol imagining that this first message has already been sent, and so reduce the communication of the first step to 0.

Consider a protocol where Alice's first message to Bob is a random string from a set $S \subseteq \{0,1\}^c$, with $|S| \ll 2^c$. (The entropy of the message is log |S|.) In this case, the parties should use log |S| bits to index the elements of the set, reducing the communication from c to log |S|.
(Figure 6.1: The entropy of a bit with $p(1) = \epsilon$.)
$$H(X) = \sum_x p(x)\log(1/p(x)) = \mathbb{E}_{p(x)}\left[\log\frac{1}{p(x)}\right].$$
so some vertex will be available at the jth step. This ensures that
every step of this process succeeds.
Conversely, suppose X can be encoded in such a way that i is encoded using $\ell_i$ bits. Then the expected length of the encoding is:
$$\mathbb{E}_{p(i)}[\ell_i] = \mathbb{E}_{p(i)}[\log(1/p(i))] - \mathbb{E}_{p(i)}\left[\log\left(2^{-\ell_i}/p(i)\right)\right] \ge H(X) - \log\left(\mathbb{E}_{p(i)}\left[2^{-\ell_i}/p(i)\right]\right) = H(X) - \log\left(\sum_i 2^{-\ell_i}\right),$$
where the inequality uses convexity of the log function: $\mathbb{E}[\log Y] \le \log(\mathbb{E}[Y])$.

If you pick a random path starting from the root in the protocol tree, you hit the leaf encoding i with probability $2^{-\ell_i}$. The probability that you hit one of the leaves encoding a number from [n] is thus $\sum_i 2^{-\ell_i} \le 1$. Thus $\log \sum_i 2^{-\ell_i}$ is at most 0, and the entropy is at most the expected length of the encoding.
Fact 6.2. $\mathbb{D}\left(p(x)\,\|\,q(x)\right) \ge 0$.

Proof.
$$-\mathbb{D}\left(p(x)\,\|\,q(x)\right) = \mathbb{E}_{p(x)}\left[\log\frac{q(x)}{p(x)}\right] \le \log\left(\sum_x p(x)\cdot\frac{q(x)}{p(x)}\right) = \log\sum_x q(x) = \log 1 = 0,$$
where the inequality follows from the convexity of the log function.
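The same computation in code (helper name ours), which also makes the asymmetry of the divergence easy to observe:

```python
import math

def divergence(p, q):
    """KL divergence D(p || q) = sum_x p(x) log2(p(x)/q(x)), in bits.
    Fact 6.2 says this is always non-negative."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)
```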
(Figure 6.2: The divergence between two bits.)
However, the divergence is not symmetric: $\mathbb{D}(p\,\|\,q) \neq \mathbb{D}(q\,\|\,p)$ in general. Moreover, the divergence can be infinite, for example if p is supported on a point that has 0 probability under q. If X is an $\ell$-bit string, we see that:
$$H(X) = \mathbb{E}_{p(x)}[\log(1/p(x))] = \ell - \mathbb{E}_{p(x)}\left[\log\frac{p(x)}{2^{-\ell}}\right] = \ell - \mathbb{D}\left(p(x)\,\|\,u(x)\right),$$
where u is the uniform distribution on $\ell$-bit strings.

Fact 6.3. $\mathbb{D}\left(p(x|\mathcal{E})\,\|\,p(x)\right) \le \log\frac{1}{p(\mathcal{E})}$.

Proof.
$$\mathbb{D}\left(p(x|\mathcal{E})\,\|\,p(x)\right) = \mathbb{E}_{p(x|\mathcal{E})}\left[\log\frac{p(x|\mathcal{E})}{p(x)}\right] = \mathbb{E}_{p(x|\mathcal{E})}\left[\log\frac{p(\mathcal{E}|x)}{p(\mathcal{E})}\right] \le \log\frac{1}{p(\mathcal{E})}.$$

We can use divergence to quantify the dependence between two random variables. If p(a, b) is a joint distribution, we define the mutual information between A and B to be
$$I(A : B) = \mathbb{E}_{p(a,b)}\left[\log\frac{p(a,b)}{p(a)p(b)}\right] = \mathbb{E}_{p(a,b)}\left[\log\frac{p(b|a)}{p(b)}\right] = \mathbb{E}_{p(a)}\left[\mathbb{D}\left(p(b|a)\,\|\,p(b)\right)\right].$$
Chain Rules

Chain rules allow one to relate bounds on the information of a collection of random variables to the information associated with each of the variables.

In words, the total divergence is the sum of the divergence from the first variable, plus the expected divergence from the second variable. Similar chain rules hold for the entropy and mutual information. Suppose A, B are two random bits that are always equal. Then $H(AB) = 1 \neq H(A) + H(B)$, so the entropy does not add in general. Nevertheless, a chain rule does exist for entropy. Denote
$$H(B \mid A) = \mathbb{E}_{p(a,b)}\left[\log\frac{1}{p(b|a)}\right].$$
Then we have the chain rule2: $H(AB) = H(A) + H(B \mid A)$.

2 $H(AB) = \mathbb{E}_{p(a,b)}\left[\log\frac{1}{p(a)\,p(b|a)}\right] = \mathbb{E}_{p(a,b)}\left[\log\frac{1}{p(a)} + \log\frac{1}{p(b|a)}\right] = H(A) + H(B \mid A)$.

Suppose A, B, C are three random bits that are all equal to each other. Then $I(AB : C) = 1 < 2 = I(A : C) + I(B : C)$. On the other hand, if A, B, C are three random bits satisfying $A + B + C = 0 \bmod 2$, we have $I(AB : C) = 1 > 0 = I(A : C) + I(B : C)$. Nevertheless, a chain rule does hold for mutual information, after we use the right definition. Denote:
$$I(B : C \mid A) = \mathbb{E}_{p(a,b,c)}\left[\log\frac{p(b,c|a)}{p(b|a)\,p(c|a)}\right].$$
Then we have the chain rule3: $I(AB : C) = I(A : C) + I(B : C \mid A)$.

3 $I(AB : C) = \mathbb{E}_{p(a,b,c)}\left[\log\frac{p(a,c)\,p(b|a,c)}{p(a)\,p(b|a)\,p(c)}\right] = I(A : C) + \mathbb{E}_{p(a,b,c)}\left[\log\frac{p(b|a,c)}{p(b|a)}\right] = I(A : C) + \mathbb{E}_{p(a,b,c)}\left[\log\frac{p(b,c|a)}{p(b|a)\,p(c|a)}\right] = I(A : C) + I(B : C \mid A)$.

Subadditivity
Each of the definitions we have seen so far satisfies the property that conditioning on variables can either only increase the quantity or only decrease the quantity, a property that we loosely refer to as subadditivity. We start with the divergence. Suppose p(a, b), q(a, b) are two distributions. Then:
$$\mathbb{E}_{p(b)}\left[\mathbb{D}\left(p(a|b)\,\|\,q(a)\right)\right] = \mathbb{E}_{p(a,b)}\left[\log\frac{p(a)}{q(a)} + \log\frac{p(a|b)}{p(a)}\right] = \mathbb{D}\left(p(a)\,\|\,q(a)\right) + I(A : B).$$
$$\mathbb{D}\left(p(x_1,\dots,x_n)\,\|\,q(x_1,\dots,x_n)\right) \ge \sum_{i=1}^n \mathbb{D}\left(p(x_i)\,\|\,q(x_i)\right),$$
when q is a product distribution.

Proof of Fact 6.4.
$$H(X_S) = H(X_a) + H(X_b \mid X_a) + H(X_c \mid X_a, X_b) \ge H(X_a \mid X_{<a}) + H(X_b \mid X_{<b}) + H(X_c \mid X_{<c}),$$
Pinsker's Inequality

Pinsker's inequality bounds the statistical distance between two distributions in terms of the divergence between them.

Lemma 6.6. $\mathbb{D}\left(p(x)\,\|\,q(x)\right) \ge \frac{2}{\ln 2}\,|p - q|^2$.
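For two-point (Bernoulli) distributions the inequality can be verified numerically; the helper names below are ours:

```python
import math

def dkl_bernoulli(e, d):
    """Divergence (in bits) between coins with heads-probability e and d."""
    return e * math.log2(e / d) + (1 - e) * math.log2((1 - e) / (1 - d))

def pinsker_gap(e, d):
    """D(p||q) - (2/ln 2)|p - q|^2, which Pinsker's inequality says
    is >= 0; the statistical distance between the coins is |e - d|."""
    return dkl_bernoulli(e, d) - (2 / math.log(2)) * (e - d) ** 2
```

As the proof below shows, the two-point case is in fact the heart of the general inequality.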
(Figure 6.3: Pinsker's inequality.)
$$\mathbb{D}\left(p(x)\,\|\,q(x)\right) \ge \mathbb{D}\left(p(x_T)\,\|\,q(x_T)\right) \ge \frac{2}{\ln 2}\left(p(x_T = 1) - q(x_T = 1)\right)^2 = \frac{2}{\ln 2}\,|p - q|^2.$$
The first inequality follows from the chain rule for divergence. It only remains to prove the second inequality. Suppose $p(x_T = 1) = \epsilon$ and $q(x_T = 1) = \delta$. Then we shall show that
$$\epsilon\log\frac{\epsilon}{\delta} + (1-\epsilon)\log\frac{1-\epsilon}{1-\delta} - \frac{2}{\ln 2}(\epsilon-\delta)^2 \qquad (6.1)$$
is always non-negative. (6.1) is 0 when $\epsilon = \delta$, and its derivative with respect to $\delta$ is
$$-\frac{\epsilon}{\delta\ln 2} + \frac{1-\epsilon}{(1-\delta)\ln 2} + \frac{4(\epsilon-\delta)}{\ln 2} = \frac{-\epsilon + \delta\epsilon + \delta - \delta\epsilon}{\delta(1-\delta)\ln 2} + \frac{4(\epsilon-\delta)}{\ln 2} = \frac{\delta-\epsilon}{\ln 2}\left(\frac{1}{\delta(1-\delta)} - 4\right).$$
Since $\frac{1}{\delta(1-\delta)} \ge 4$, the derivative has the sign of $\delta - \epsilon$, so (6.1) is minimized at $\delta = \epsilon$, where it equals 0.

(Figure 6.4: $f = \epsilon\log\frac{\epsilon}{2/3} + (1-\epsilon)\log\frac{1-\epsilon}{1/3}$, $g = \frac{2}{\ln 2}(\epsilon - 2/3)^2$.)
$$H(B)/n \ge I(A_1, \dots, A_n : B)/n \ge (1/n)\sum_{j=1}^n I\left(A_j : B\right),$$
$$\mathbb{E}_i\left[I(A_i : B)\right] \le H(B)/n.$$
Claim 6.9. One of the three projections must have size at least $n^{2/3}$.
$$\frac{H(XY) + H(YZ) + H(XZ)}{3} \ge \frac{2}{3}H(XYZ) = \frac{2}{3}\log n,$$
so one of the first three terms must be at least $\frac{2}{3}\log n$, proving that the projection must be of size at least $n^{2/3}$.
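The entropy inequality behind this claim, Shearer's bound $H(XY) + H(YZ) + H(XZ) \ge 2H(XYZ)$, can be spot-checked for uniform distributions over small point sets (helpers ours):

```python
import math
from collections import Counter

def H(samples):
    """Entropy (base 2) of the empirical distribution of `samples`."""
    c = Counter(samples)
    n = len(samples)
    return sum((v / n) * math.log2(n / v) for v in c.values())

def shearer_holds(points):
    """Check H(XY) + H(YZ) + H(XZ) >= 2 H(XYZ) for the uniform
    distribution over a list of distinct 3-D points."""
    xy = [(x, y) for x, y, z in points]
    yz = [(y, z) for x, y, z in points]
    xz = [(x, z) for x, y, z in points]
    return H(xy) + H(yz) + H(xz) >= 2 * H(points) - 1e-9
```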
$$\frac{1}{2}\binom{n}{2} = \mathbb{E}_S[e(S)] \le \mathbb{E}_S\left[H(G_S \mid S)\right] + 1 \le \frac{1}{2}H(G) + 1,$$
and so $H(G) \ge \binom{n}{2} - 2$, which implies that $|\mathcal{G}| \ge 2^{\binom{n}{2}-2} = 2^{\binom{n}{2}}/4$.
One of the triumphs of information theory is its ability to prove optimal lower bounds on the randomized communication complexity of functions like disjointness9, which we do not know how to prove any other way. (This result is especially impactful because many other lower bounds in other models (more in Part II) are consequences of Theorem 6.13.)
$$\frac{1}{32} \ge \mathbb{E}_{p(q,s)}\left[\left|q_s - q_s'\right|\right] \ge \frac{1}{4}\,p(a_t = 0 = b_t)\cdot\mathbb{E}_{p(q,s\,\mid\,a_t=0=b_t)}\left[\left|q_s - q_s'\right|\right],$$
$$\mathbb{E}_{p(q,s\,\mid\,a_t=0=b_t)}\left[\left|q_s - \tfrac{1}{4}\right| + \left|q_s' - \tfrac{1}{4}\right|\right] \ge \frac{1}{8},$$
and so one of the two terms in the expectation must be at least $\frac{1}{16}$. Without loss of generality, say we have
$$\mathbb{E}_{p(q,s\,\mid\,a_t=0=b_t)}\left[\left|q_s - \tfrac{1}{4}\right|\right] \ge \frac{1}{16},$$
$$\sum_{i=1}^n I(X_i : M \mid X_{<i}\,Y_i) \le I(X : M \mid Y),$$
$$\sum_{i=1}^n I(Y_i : M \mid X_i\,Y_{>i}) \le I(Y : M \mid X).$$
$$p(y_i \mid r_{k-2}) \stackrel{\epsilon'}{\approx} p(y_i) \stackrel{\epsilon'}{\approx} p(y_i \mid m_{k-1}, r_{k-2}),$$
where $\epsilon' = \sqrt{\frac{\ell + k\log n}{n}}$. There are two cases to consider:
Bob sends the message $m_{k-1}$. In this case, after fixing $r_{k-2}$, $z_{k-1}$ is independent of $y_i$ for every i. By induction, $p(z_{k-1} \mid r_{k-2})$ is $\epsilon$-close to uniform, with $\epsilon = (k-2)\sqrt{\frac{\ell + k\log n}{n}}$. So on average over $r_{k-1}$, i:
$$p(z_k \mid r_{k-1}) = p(y_{z_{k-1}} \mid m_{k-1}, r_{k-2}) \stackrel{\epsilon}{\approx} p(y_i \mid m_{k-1}, r_{k-2}) \stackrel{\epsilon'}{\approx} p(y_i).$$
(Fact: If i, j, a are independent, and $p(i) \approx p(j)$, we have $p(a_i) \approx p(a_j)$. See the Conventions chapter of the book for a proof.)

Alice sends the message $m_{k-1}$. In this case, $p(y_i \mid r_{k-1}) = p(y_i \mid r_{k-2})$, since after fixing $r_{k-2}$, $y_i$ is independent of $m_{k-1}, z_{k-1}$. So on average over $r_{k-1}$, i:
$$p(z_k \mid r_{k-1}) = p(y_{z_{k-1}} \mid r_{k-2}) \stackrel{\epsilon}{\approx} p(y_i \mid r_{k-2}) \stackrel{\epsilon'}{\approx} p(y_i).$$
Both of these bounds imply that $p(z_k \mid r_{k-1})$ is $(k-1)\sqrt{\frac{\ell + k\log n}{n}}$-close to uniform, as required.
Alice sends the i + 1st message. In this case, fixing $r_{i-1}$ leaves $m_i$ and $z_i$ independent. Pick $m_i$ by greedily setting each bit of $m_i$ in such a way

To choose $z_i$, define
$$B_1 = \{z_0, z_1, \dots, z_{i-1}\},$$
$$B_2 = \left\{ j : \frac{p(x_j \mid m_i, r_{i-1})}{p(x_j)} > 4\,\frac{\ell+k}{n} \right\},$$
$$B_3 = \left\{ j : \frac{p(Z_i = j \mid r_{i-1})}{p(Z_i = j \mid z_{i-1})} < 1/2 \right\}.$$
We shall prove:
$$|B_2 \setminus B_1| \cdot 4\,\frac{\ell+k}{n} \le \sum_{j \in B_2\setminus B_1} \frac{p(x_j \mid m_i, r_{i-1})}{p(x_j)} = \sum_{j \in B_2\setminus B_1} \frac{p(x_j \mid m_i, z_{i-1})}{p(x_j \mid z_{i-1})},$$
since $x_j$ is independent of $z_{i-1}$ for all $j \notin B_1$.
$$p(m_i \mid z_{i-1}) = p(m_i \mid r_{i-1})\, p(m_{i-1} \mid z_{i-1}) \ge 2^{-|m_i|}\cdot 2^{-|m_{i-1}|-i+1} \ge 2^{-\ell-k},$$
$$p(m_i \mid z_i) = p(m_i \mid r_{i-1})\, p(m_{i-1} \mid z_i) \ge 2^{-|m_i|}\, p(m_{i-1} \mid z_{i-1})\,\frac{p(z_i \mid m_{i-1}, z_{i-1})}{p(z_i \mid z_{i-1})} \ge 2^{-|m_i|-i}\cdot(1/2) = 2^{-|m_i|-(i+1)},$$
by the choice of $m_i$, and the fact that $z_i \notin B_3$. Here we used
$$p(m_{i-1} \mid z_i) = \frac{p(m_{i-1}, z_i \mid z_{i-1})}{p(z_i \mid z_{i-1})} = p(m_{i-1} \mid z_{i-1})\,\frac{p(z_i \mid m_{i-1}, z_{i-1})}{p(z_i \mid z_{i-1})}.$$
Bob sends the i + 1st message. In this case, we pick $z_i$ first. Define the sets:
$$B_1 = \{z_0, z_1, \dots, z_{i-1}\},$$
$$B_2 = \left\{ j : \frac{p(x_j \mid r_{i-1})}{p(x_j)} > 4\,\frac{\ell+k}{n} \right\},$$
$$B_3 = \left\{ j : \frac{p(Z_i = j \mid r_{i-1})}{p(Z_i = j \mid z_{i-1})} < 1/2 \right\}.$$
$$|B_2 \setminus B_1| \cdot 4\,\frac{\ell+k}{n} \le \sum_{j \in B_2\setminus B_1} \frac{p(x_j \mid r_{i-1})}{p(x_j)} = \sum_{j \in B_2\setminus B_1} \frac{p(x_j \mid m_{i-1}, z_{i-1})}{p(x_j \mid z_{i-1})},$$
since $x_j$ is independent of $z_{i-1}$ for all $j \notin B_1$.
Exercise 6.1

Show that for any two joint distributions p(x, y), q(x, y) with the same support, we have
$$\mathbb{E}_{p(y)}\left[\mathbb{D}\left(p(x|y)\,\|\,p(x)\right)\right] \le \mathbb{E}_{p(y)}\left[\mathbb{D}\left(p(x|y)\,\|\,q(x)\right)\right].$$
Exercise 6.2

Suppose n is odd, and $x \in \{0,1\}^n$ is sampled uniformly at random from the set of strings that have more 1s than 0s. Use Pinsker's inequality to show that the expected number of 1s in x is at most $n/2 + O(\sqrt{n})$.
Exercise 6.3

Let X be a random variable supported on [n] and $g : [n] \to [n]$ be a function. Prove that
$$\Pr[X \neq g(X)] \ge \frac{H(X \mid g(X)) - 1}{\log n}.$$
(Use the fact that $\delta\log\frac{1}{\delta} \le \frac{\log e}{e} \le 1$ for $\delta > 0$.)

Use this bound to show that if Alice has a uniformly random vector $y \in [n]^n$, and Bob has a uniformly random input $i \in [n]$, and Alice sends Bob a message M that contains $\ell$ bits, the probability that Bob guesses $y_i$ is at most $\frac{1 + \ell/n}{\log n}$.
Exercise 6.4

Let $\mathcal{G}$ be a family of graphs on n vertices, such that every two vertices in the graph share a clique on r vertices. Show that the number of graphs in the family is at most $2^{\binom{n}{2}}/2^{r-1}$.
7
Compressing Communication

Proof. We prove the first inequality, since the second has the same proof. By the chain rule, we can write:
$$I(X : M \mid Y) = \sum_i I(X : M_i \mid Y M_{<i}),$$
where here $M_1, M_2, \dots$ are the bits of M. Now for any fixing of $m_{<i}$,
$$I(X : M_i \mid Y m_{<i}) \le \begin{cases} 1 & \text{if } M_i \text{ is to be transmitted by Alice,} \\ 0 & \text{if } M_i \text{ is to be transmitted by Bob,}\end{cases}$$
since in the first case the bit can have at most 1 bit of information, and in the second case $M_i$ is independent of X once we fix $M_{<i}$ and Y. Taking the expectation over $m_{<i}$ gives
Applications
8
Circuits and Branching Programs
Boolean Circuits
Theorem 8.4. Every monotone circuit computing Match has depth $\Omega(n)$.

Proof. It is enough to prove a lower bound on the communication complexity of the corresponding monotone Karchmer-Wigderson game. (Recall Theorem 8.2.) In the game, Alice gets a graph G which has a matching of size n/3 + 1 and Bob gets a graph H that does not have a matching of size n/3 + 1. Their goal is to compute an edge which is in G, but not in H.

Set m = n/3. We shall show that if the parties can solve the monotone Karchmer-Wigderson game using c bits of communication, then they can get a randomized protocol for computing disjointness on a universe of size m. Since any such randomized protocol requires linear communication3 (by Theorem 6.13), this gives a communication lower bound of $\Omega(n)$.
Suppose Alice and Bob get inputs $X \subseteq [m]$ and $Y \subseteq [m]$. Alice constructs the graph $G_X$ on the vertex set [3m + 2] such that for each i, $G_X$ contains the edge {3i, 3i − 1} if $i \in X$, and the edge {3i, 3i − 2} if $i \notin X$. In addition, $G_X$ contains the edge {3m + 1, 3m + 2}. Alice's graph consists of m + 1 disjoint edges. Bob uses Y to build a graph $H_Y$ on the same vertex set as follows. For each $i \in [m]$, Bob connects 3i − 2 to all the other 3m + 1 vertices of the graph if $i \in Y$. If $i \notin Y$, Bob connects 3i to all the other vertices. Since every edge of $H_Y$ contains exactly one of {3i, 3i + 1}.

Alice and Bob permute the vertices of the graph randomly and run the protocol promised by the monotone Karchmer-Wigderson game
Theorem 8.5. Any circuit of depth k − 1 that computes F must have size at least $2^{\frac{n}{16k}-1}$.

the coordinates of $y' \in \{0,1\}^{[n]^k}$ that are consistent with her input
to be 1, and all other coordinates to be 0. Clearly, there is only one coordinate where $x'$ is 1 and $y'$ is 0, and that is the coordinate that is consistent with both x, y.

Now we claim that every gate in the formula that corresponds to a path that is consistent with Alice's input evaluates to 0. This is clearly true for the gates at depth k, since that is how we set the variables in $x'$. For the gates at depth d < k, if the gate is an AND gate, then it must be true because one of its inputs will be consistent with x and so evaluate to 0. On the other hand, if the gate is an OR gate, then all of its inputs correspond to paths that must be consistent with Alice's input, so they must all evaluate to 0. Thus $F(x') = 0$, since the gate at the root is certainly consistent with Alice's input. Similar arguments prove that $F(y') = 1$.
Thus any protocol for the monotone Karchmer-Wigderson game gives a protocol solving the pointer-chasing problem, and so by Theorem 6.17, we get that the communication of the game must be at least n/16 − k, as required.
Branching Programs

Branching programs model computations that do not use a lot of memory. A branching program of length $\ell$ and width w is a layered directed graph whose vertices are a subset of $[\ell+1] \times [w]$. All the vertices in the uth layer $(u, \cdot)$ are associated with a variable from $x_1, \dots, x_n$. Every vertex (u, v), with $u < \ell+1$, has exactly two edges coming out of it, and both go to a vertex $(u+1, \cdot)$ in the next layer. Every vertex in the last layer $(\ell+1, \cdot)$ is labeled with an output of the program. On input $x \in \{0,1\}^n$, the program is executed by starting at the vertex (1, 1) and reading the variables associated with each layer in turn. These variables define a path through the program. The program outputs the label of the last vertex on this path.
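The execution model can be made concrete with a small evaluator; the representation and names below are our own:

```python
def run_bp(layers, outputs, x):
    """Evaluate an oblivious branching program on input bits x.

    layers[u] = (var, edges): every vertex of layer u reads x[var],
    and edges[v] = (dest0, dest1) names the next-layer vertex reached
    when x[var] is 0 / 1. outputs labels the last layer's vertices.
    """
    v = 0  # start at the first vertex of the first layer
    for var, edges in layers:
        v = edges[v][x[var]]
    return outputs[v]

# A width-2 program for x1 AND x2 AND x3: vertex 0 means "all ones so
# far", vertex 1 is a rejecting sink (cf. Figure 8.2).
and3_layers = [(i, [(1, 0), (1, 1)]) for i in range(3)]
and3_outputs = [1, 0]
```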
Every function $f : \{0,1\}^n \to \{0,1\}$ can be computed by a branching program of width $2^n$, and most functions require exponential width. Here we define an explicit function that requires that $\ell \log w \ge \Omega(n \log^2 n)$, for any branching program that computes it.

(Figure 8.2: A branching program that computes the logical AND, $x_1 \wedge x_2 \wedge x_3$.)

(In the literature, the programs we define here are referred to as oblivious branching programs. In general branching programs, every vertex of the program can be associated with a different variable; vertices in a particular layer need not read the same variable.)

To prove the lower bound, we shall first show that any branching program can be simulated (at least, in a sense) by an efficient communication protocol in the number-on-forehead model (Chapter 4). Let $g : (\{0,1\}^r)^k \to \{0,1\}$ be an arbitrary function that k parties wish to compute in the number-on-forehead model. Then define
$g' : \{0,1\}^{r\log t + t} \to \{0,1\}$ to be $g'(S, x) = g(x_S)$, where here the first part of $g'$'s input is a set $S \subseteq [t]$ of size r, and the second part of $g'$'s input is a string $x \in \{0,1\}^t$. $g(x_S)$ is the output obtained when x is projected to the coordinates in S.

4 Babai et al., 1989; and Hrubeš and Rao, 2015

We claim4:
Lemma 9.1. There is a family of t subsets of $[5d^2\log t]$, such that for any d + 1 sets $S_1, \dots, S_{d+1}$ in the family, $S_1$ is not contained in the union of $S_2, \dots, S_{d+1}$.

Proof. Pick the t sets at random from $[5d^2\log t]$, where each element is included in each set independently with probability 1/d. Then for a particular choice of $S_1, \dots, S_{d+1}$, the probability that a fixed element $j \in S_1$ but j is not in any of the other sets is
$$\frac{1}{d}\,(1 - 1/d)^d \ge \frac{2^{-2}}{d} = \frac{1}{4d}.$$
The probability that there is no such j is at most $(1 - \frac{1}{4d})^{5d^2\log t} \le e^{-\frac{5}{4}d\log t}$. The number of choices for d + 1 such sets from the family is $\binom{t}{d+1} \le t^{d+1} = 2^{(d+1)\log t}$. Thus, by the union bound, the probability that the family does not have the property we need is at most $e^{-\frac{5}{4}d\log t}\cdot 2^{(d+1)\log t} < 1$.
In each round of the protocol, all parties send their current color to
all the other parties. If there are t colors in a particular round, each
party looks at the d colors she received and associates each with a set
from the family promised by Lemma 9.1. She picks a color by picking
an element that belongs to her own set but not to any of the others.
Thus, the next round will have at most 5d2 log t colors. Continuing in
this way, the number of colors will be reduced to O(d2 log d) in log n
rounds.
(Figure: a network with blocks A, B on one side connected through the links v, w to blocks C, D on the other; vertex i is joined to vertex j on Alice's side if $(i, j) \notin X$, and on Bob's side if $(i, j) \notin Y$.)
Consider the protocol obtained when Alice simulates all the vertices close to A, B, and Bob simulates all the nodes close to C, D. This protocol must solve the disjointness problem, and so has communication at least $\Omega(n^2)$. This proves that the O(n) links that cross from the left to the right in the above network must carry at least $\Omega(n^2)$ bits of communication to compute the diameter of the graph.
Detecting Triangles

4 Drucker et al., 2014

Another basic measure associated with a graph is its girth, which is the length of the shortest cycle in the graph. Here we show that any distributed protocol for computing the girth of an n-vertex graph must involve at least $\Omega(n^2/2^{O(\sqrt{\log n})})$ bits of communication.

We prove4 this by showing that any such protocol can be used to compute disjointness in the number-on-forehead model with 3 parties, and a universe of size $\Omega(n^2/2^{O(\sqrt{\log n})})$. Applying Theorem 5.12 gives the lower bound.
Suppose Alice, Bob and Charlie have 3 sets $X, Y, Z \subseteq U$ written on their foreheads, where U is a set that we shall soon specify. Let A, B, C be 3 disjoint sets of size 2n. We shall define a graph $G_{X,Y,Z}$ on the vertex set $A \cup B \cup C$, that will have a triangle (namely a cycle of length 3) if and only if $X \cap Y \cap Z$ is non-empty.

To construct $G_{X,Y,Z}$ we need the coloring promised by Theorem 4.2. This is a coloring of [n] with $2^{O(\sqrt{\log n})}$ colors such that there are no monochromatic arithmetic progressions. Since such a coloring exists, there must be a subset $Q \subseteq [n]$ of size $n/2^{O(\sqrt{\log n})}$ that does not contain any non-trivial arithmetic progressions.

Now define a graph G on the vertex set $A \cup B \cup C$, where for each $a \in A$, $b \in B$, $c \in C$,
$$(a, b) \in G \iff b - a \in Q,$$
$$(b, c) \in G \iff c - b \in Q,$$
$$(a, c) \in G \iff \frac{c - a}{2} \in Q.$$
(Figure 9.2: The graph G.)
Claim 9.3. The graph G has at least $n|Q| = \Omega(n^2/2^{O(\sqrt{\log n})})$ triangles, and no two triangles in G share an edge.

$$(a, b) \in G_{X,Y,Z} \iff \text{the triangle of } G \text{ containing } (a, b) \text{ is in } Z,$$
$$(b, c) \in G_{X,Y,Z} \iff \text{the triangle of } G \text{ containing } (b, c) \text{ is in } X,$$
$$(a, c) \in G_{X,Y,Z} \iff \text{the triangle of } G \text{ containing } (a, c) \text{ is in } Y.$$
(Figure 9.3: The graph $G_{X,Y,Z}$.)

Given sets X, Y, Z as input, Alice, Bob and Charlie build the network $G_{X,Y,Z}$ and execute the protocol for detecting triangles, with Alice, Bob, Charlie simulating the behavior of the nodes in A, B, C of the network. Each of the players knows enough information to simulate the behavior of these nodes. By Theorem 5.12, the total communication of triangle detection must be at least $\Omega(n^2/2^{O(\sqrt{\log n})})$, as required.
Polygons
Correlation Polytope
Matching Polytope
Bibliography

Tomás Feder, Eyal Kushilevitz, Moni Naor, and Noam Nisan. Amortized communication complexity. SIAM Journal on Computing, 24(4): 736–750, 1995. Prelim version by Feder, Kushilevitz, Naor, FOCS 1991.

Pavel Hrubeš and Anup Rao. Circuits with medium fan-in. In 30th Conference on Computational Complexity, CCC 2015, June 17-19, 2015, Portland, Oregon, USA, volume 33, pages 381–391, 2015.