Dimitris Koukoulopoulos
University of Montreal
5 Selberg's sieve 55
5.1 An optimization problem 55
5.2 The fundamental lemma: encore 59
5.3 Applications 62
5.4 The parity problem in sieve methods 68
6 Smooth numbers 71
6.1 Iterative arguments and integral-delay equations 71
6.2 Rankin's method: encore 77
We assume that the reader has already taken a first course in Analytic Number Theory and is thus familiar with basic techniques, such as partial summation and Perron's inversion formula, and with basic results, such as Chebyshev's and Mertens's estimates and the Prime Number Theorem for arithmetic progressions. However, we do give an almost self-contained proof of the latter in Appendix B.
Throughout these notes there are various exercises which are embedded in the text (rather than at the end of each section or chapter). The most difficult ones are marked with a star.
We make use of some standard and of some less standard notation. We write 1_A to denote the characteristic function of the set A. The symbol P denotes the set of prime numbers and the letter p, with or without subscripts, will always denote a member of P. We write f = O(g) or, equivalently, f ≪ g if there is a constant M such that |f| ≤ M g. The constant M will be absolute unless otherwise specified, e.g. by a subscript. Also, we write f ≍ g if f ≪ g and g ≪ f. For n ∈ N we use P⁺(n) and P⁻(n) to denote the largest and smallest prime factor of n, respectively, with the notational conventions that P⁺(1) = 1 and P⁻(1) = +∞. We write ω(n) for the number of distinct prime factors of n and Ω(n) for the total number of prime factors of n, counted with multiplicity. As usual, μ denotes the Möbius function, defined to be (−1)^{ω(n)} if n is square-free and 0 otherwise, φ denotes Euler's totient function, which counts the size of the set {r (mod n) : (r, n) = 1}, Λ denotes the von Mangoldt function, which is defined to be log p if n = p^k, for some prime p and some k ≥ 1, and 0 otherwise, and τ_k denotes the k-divisor function, defined by τ_k(n) = ∑_{d₁⋯d_k = n} 1. In particular, τ₂(n) is the number of divisors of n, which we simply denote by τ(n). We write π(x) for the number of primes up to x and π(x; q, a) for the number of primes up to x that lie in the arithmetic progression a (mod q). Finally, we give below references to the page where some additional basic notation is introduced.
Symbol    Page        Equation    Page
A_d       23          (A1)        23
S(A, z)   23          (A2)        24
P(z)      23          (A3)        25
g(d)      23          (A4a)       25
r_d       23          (A4b)       26
V(z)      24          (R)         33
          37          (R′)        58
                      (r)         58
Chapter 0

Prelude: multiplicative functions
We start by covering some basic background material, which we will need in order to handle
some technical parts of the theory of sieve methods. In particular, we will see various
techniques for evaluating asymptotically the average value of a multiplicative function.
An arithmetic function f : N → C is called multiplicative if f(1) = 1 and f(mn) = f(m)f(n) whenever (m, n) = 1, whereas f is called completely multiplicative if f(1) = 1 and the above relation holds for all m and n, without the requirement that they be co-prime. Some important examples of multiplicative functions are the functions n ↦ n^s and the divisor functions
τ_k(n) := #{(d₁, . . . , d_k) ∈ Nᵏ : d₁⋯d_k = n}.
The function f ∗ g, defined by (f ∗ g)(n) = ∑_{d|n} f(d) g(n/d), is called the convolution of f and g. For example, we have that τ = 1 ∗ 1 and, in general, τ_k = 1 ∗ ⋯ ∗ 1 (k times).
8 CHAPTER 0. PRELUDE: MULTIPLICATIVE FUNCTIONS
Note that the operation ∗ is commutative and associative. Moreover, if f and g are both multiplicative functions, then their convolution f ∗ g is also multiplicative. The unit of the convolution operation is the completely multiplicative function
δ(n) = 1 if n = 1, and δ(n) = 0 if n > 1.
Any arithmetic function f with f(1) ≠ 0 has an inverse with respect to ∗, that is to say, there is g : N → C with f ∗ g = δ. In particular, any multiplicative function has an inverse with respect to ∗. Combining all of the above, we conclude that the set of multiplicative functions together with the operation ∗ is an abelian group.
A particularly important example of a multiplicative function is the constant function 1. Its convolution inverse is the Möbius function μ. Equivalently, we have the Möbius inversion formula
(0.1.1)   ∑_{d|n} μ(d) = 1 if n = 1, and 0 if n > 1,
which follows by the inclusion-exclusion principle. Alternatively, one may observe that μ ∗ 1 is multiplicative and verify directly the above formula when n is a prime power. A direct consequence of (0.1.1) is that if f is completely multiplicative, then its convolution inverse is given by μf.
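Both facts above are easy to check numerically. The following sketch (helper names such as `mobius` and `divisors` are ours, not from the text) verifies (0.1.1) and the inverse of a completely multiplicative function for a range of n:

```python
def mobius(n):
    """Moebius function: 0 if n is not square-free, else (-1)^omega(n)."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divides the original n, so mu = 0
            result = -result
        p += 1
    if n > 1:          # leftover prime factor
        result = -result
    return result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

# (0.1.1): sum over d | n of mu(d) equals 1 for n = 1 and 0 otherwise
for n in range(1, 200):
    assert sum(mobius(d) for d in divisors(n)) == (1 if n == 1 else 0)

# if f is completely multiplicative, mu*f is its convolution inverse;
# here f(n) = n serves as an example
f = lambda n: n
for n in range(1, 100):
    s = sum(mobius(d) * f(d) * f(n // d) for d in divisors(n))
    assert s == (1 if n == 1 else 0)
```

The second loop checks (μf ∗ f)(n) = δ(n), i.e. exactly the statement above.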
∑_{n≤x} n/φ(n) = x ∏_p ( 1 + 1/(p(p−1)) ) + O(√x).
0.2. AVERAGES OF MULTIPLICATIVE FUNCTIONS: BASIC TECHNIQUES 9
We now turn to another important example, the divisor function τ. As we said above, τ = 1 ∗ 1. So we have that
∑_{n≤x} τ(n) = ∑_{n≤x} ∑_{d|n} 1 = ∑_{d≤x} ⌊x/d⌋ = ∑_{d≤x} ( x/d + O(1) ) = x log x + O(x).
Dirichlet discovered that it is possible to take advantage of the fact that both factors of τ = 1 ∗ 1 are nice functions and improve significantly upon this result.
First, note that for every A and B with AB = x we have the general formula
(0.2.1)   ∑_{n≤x} (f ∗ g)(n) = ∑_{a≤A} f(a) ∑_{b≤x/a} g(b) + ∑_{b≤B} g(b) ∑_{a≤x/b} f(a) − ( ∑_{a≤A} f(a) ) ( ∑_{b≤B} g(b) ).
If we know something about the average behaviour of both f and g, then we may pick both A and B to be increasing functions of x. This allows us to reduce the length of the sums we are considering and thus improve the error term in our formula for the summatory function of f ∗ g. In particular, when f = g = 1, choosing A = B = √x yields the formula
∑_{n≤x} τ(n) = 2 ∑_{a≤√x} ⌊x/a⌋ − ⌊√x⌋² = 2 ∑_{a≤√x} ( x/a + O(1) ) − ( √x + O(1) )²
= 2x ( log √x + γ + O(1/√x) ) − x + O(√x)
= x log x + (2γ − 1)x + O(√x).
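Dirichlet's hyperbola identity and the resulting asymptotic are easy to test numerically; the sketch below (function names are ours) compares the hyperbola-method count with the direct lattice-point count and with x log x + (2γ − 1)x:

```python
import math

def tau_sum_direct(x):
    # sum_{n<=x} tau(n), counted as the number of pairs (d, m) with d*m <= x
    return sum(x // d for d in range(1, x + 1))

def tau_sum_hyperbola(x):
    # Dirichlet's identity with A = B = sqrt(x):
    # sum_{n<=x} tau(n) = 2 * sum_{a<=sqrt(x)} floor(x/a) - floor(sqrt(x))^2
    s = math.isqrt(x)
    return 2 * sum(x // a for a in range(1, s + 1)) - s * s

x = 10**5
assert tau_sum_hyperbola(x) == tau_sum_direct(x)   # the identity is exact

gamma = 0.5772156649015329   # Euler-Mascheroni constant
approx = x * math.log(x) + (2 * gamma - 1) * x
assert abs(tau_sum_hyperbola(x) - approx) < 3 * math.sqrt(x)  # error O(sqrt x)
```

Note that the hyperbola version needs only O(√x) floor evaluations, against O(x) for the direct sum, which mirrors the shortening of the sums described above.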
This argument can be generalized to deduce the following theorem.
Proof. Exercise.
Exercise 0.2.4. (a) Find an asymptotic formula for the summatory function of 2^{ω(n)}, i.e. for ∑_{n≤x} 2^{ω(n)}.
(b)* Try doing the same thing for the summatory function of 2^{Ω(n)}. What happens? Can you explain why?
Exercise 0.2.5. (a) Show that, for every x ≥ 1,
#{n ≤ x : n is square-free} = x ∏_p ( 1 − 1/p² ) + O(√x).
(b)* Show that the error term is in fact o(√x) as x → ∞.
Remark 0.2.7. Taking f(n) = τ_k(n), we see that the above theorem is best possible in this generality, up to multiplicative constants (cf. Theorem 0.2.3).
So we conclude that
∑_{n≤x} f(n) ≪ (A + B + 1) (x/log x) ∑_{n≤x} f(n)/n.
To see the second part of the theorem, note that
∑_{n≤x} f(n)/n ≤ ∏_{p≤x} ( 1 + f(p)/p + f(p²)/p² + ⋯ ) ≤ ∏_{p≤x} ( 1 + f(p)/p ) exp( f(p²)/p² + f(p³)/p³ + ⋯ ),
and
∑_{n≤x} τ_k(n)^r ≪_{r,k} x (log x)^{k^r − 1}   (x ≥ 1).
Exercise 0.2.10. Show that the error term in Theorem 0.2.1 can be strengthened to
O(log x).
Exercise 0.2.11. This exercise generalizes and strengthens Theorem 0.2.1. Let s ≠ 0 and x ≥ 1. Show that
∑_{a≤x, (a,s)=1} a/φ(a) = x ∏_{p∤s} ( 1 + 1/(p(p−1)) ) ∏_{p|s} ( 1 − 1/p ) + O( (|s|/φ(s)) log(2x) ).
Conclude that
∑_{a≤x, (a,s)=1} 1/φ(a) = ∏_{p∤s} ( 1 + 1/(p(p−1)) ) ∏_{p|s} ( 1 − 1/p ) ( log x + γ + ∑_{p|s} log p/(p−1) − ∑_{p∤s} log p/(p²−p+1) )
+ O( (|s|/φ(s)) · log(2x)/x ).
The following result is another application of Rankin's trick. In fact, it is precisely this form of the trick that we will see appearing again and again while discussing sieve methods.
0.3. RANKINS METHOD 13
Remark 0.3.4. Integers n all of whose prime divisors are ≤ y are called y-smooth. We will study their distribution more carefully in Chapter 6.
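Small cases of the counting function Ψ(x, y) = #{n ≤ x : P⁺(n) ≤ y} can be computed directly; the following sketch (the helper names are ours) does this by brute-force factorization, which is enough to experiment with the ranges discussed in this section:

```python
def largest_prime_factor(n):
    """P+(n), with the text's convention P+(1) = 1."""
    p, largest = 2, 1
    while p * p <= n:
        while n % p == 0:
            largest, n = p, n // p
        p += 1
    if n > 1:
        largest = n
    return largest

def psi(x, y):
    """Psi(x, y): the number of y-smooth integers up to x."""
    return sum(1 for n in range(1, x + 1) if largest_prime_factor(n) <= y)

# e.g. the 5-smooth numbers up to 100 are exactly those of the form 2^a 3^b 5^c
print(psi(100, 5))
```

A loop of this kind is only feasible for small x, but it is handy for checking the asymptotic behaviour of Ψ(x, y) predicted in Chapter 6.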
Proof. Clearly, we may assume that u is large enough. For every ε ∈ (0, 1/3], we have that
(0.3.1)   ∑_{P⁺(n)≤y, n>x} 1/n ≤ (1/x^ε) ∑_{P⁺(n)≤y} 1/n^{1−ε} = (1/x^ε) ∏_{p≤y} ( 1 − 1/p^{1−ε} )^{−1}
≪ (log y/x^ε) ∏_{p≤y} ( 1 − 1/p^{1−ε} )^{−1} ( 1 − 1/p )
≤ (log y/x^ε) exp{ ∑_{p≤y} (p^ε − 1)/p },
since
log ∏_{p≤y} ( 1 − 1/p^{1−ε} )^{−1} ( 1 − 1/p ) = ∑_{p≤y} ∑_{m=1}^∞ (p^{mε} − 1)/(m p^m) = ∑_{p≤y} (p^ε − 1)/p + O(1)
for all ε ≤ 2/3.
Moreover, note that if p ≤ e^{1/ε}, then p^ε = 1 + O(ε log p) and, consequently,
∑_{p≤e^{1/ε}} (p^ε − 1)/p ≪ ε ∑_{p≤e^{1/ε}} (log p)/p ≪ 1.
In order to optimize the above inequality, we choose w ≥ 1 with e^{w−1}/w = u (for every u ≥ 1, there is such a w) and take ε = w/log y. We need that ε ≤ 1/3, that is to say w ≤ (log y)/3. Since e^{w−1}/w is increasing for w ≥ 1, this does hold if and only if
u = e^{w−1}/w ≤ 3y^{1/3}/(e log y).
This last inequality is guaranteed by our assumption that y ≥ (log x)³. Finally, note that w − log w − 1 = log u. In particular, w ≍ log u and thus w = log u + log w + 1 = log u + log log u + O(1). Therefore
ε log x = uw = u(log u + log log u) + O(u),
which together with (0.3.1) completes the proof of the theorem.
Exercise 0.3.5. Let x ≥ y ≥ 2 with u ≥ 1. Show that, for every fixed c ≥ 1, we have that
∑_{P⁺(n)≤y, n>x} 1/n ≪_c (log y)/e^{cu}.
where
𝔖(z) = ∏_{p≤z} ( 1 + g(p) ) ( 1 − 1/p )^α.
Proof. A natural way to try and prove this theorem would be to approximate μ²(n)g(n) by the function Λ(n)/(n log n) or, even, 1_P(n)/n, where Λ is defined via the identity −ζ′(s)/ζ(s) = ∑_{n≥1} Λ(n)/n^s, the point being that both of these functions are equal to 1/p at prime numbers n = p. This would theoretically reduce our task to estimating ∑_{n≤x} Λ(n)/(n log n) or ∑_{p≤x} 1/p, which should be easier to analyze due to the regularity of the summands. However, this cannot be made to work, because our assumptions on g are too weak to allow the convolution method to work. Instead, we use a different approach, working directly with the partial sums of μ² g.
Set S(z) = ∑_{n≤z} μ²(n)g(n). The idea behind this theorem is the following: we have that
∑_{n≤z} μ²(n)g(n) log n = (log z) S(z) − ∫₁^z S(t)/t dt,
So if z = e^u and σ(u) = S(e^u)/u^α, then the above formulas imply that
(0.4.2)   u^{α+1} σ(u) ≈ (α + 1) ∫₀^u w^α σ(w) dw.
In particular, the theorem holds trivially when z ≤ e^L, so we only need to consider the case
Moreover,
∑_{m≤e^w, p|m} μ²(m)g(m) = g(p) ∑_{r≤e^w/p, p∤r} μ²(r)g(r) ≤ g(p) ∑_{r≤e^w} μ²(r)g(r) ≪ S(e^w)/p,
where we used our assumption that g(p) ≤ C₅/p. Inserting (A3′) into the above formula, we deduce that
∑_{m≤e^w} μ²(m)g(m) log m = ∑_{m≤e^w} μ²(m)g(m) ( α log(e^w/m) + O_{C₆}(L) ) + O( S(e^w) L w )
= α ∑_{m≤e^w} μ²(m)g(m) log(e^w/m) + O( S(e^w) L w )
= α ∫₁^{e^w} S(y)/y dy + O( S(e^w) L w ),
0.4. INTEGRAL-DELAY EQUATIONS 17
on using the formula log(e^w/m) = ∫_m^{e^w} dy/y and inverting the order of summation and integration. By partial summation, the left-hand side of the above formula is w S(e^w) − ∫₁^{e^w} S(y)/y dy. So we find that
w S(e^w) = (α + 1) ∫₁^{e^w} S(y)/y dy + O( S(e^w) L w )
= (α + 1) ∫₀^w S(e^t) dt + O( S(e^w) L w )
= (α + 1) ∫₁^w S(e^t) dt + O( S(e^w) L w ),
Set
(0.4.5)   E(w) := σ(w) − ( (α + 1)/w^{α+1} ) ∫₁^w t^α σ(t) dt,
so that E(w) ≪ S(e^w) L/w for all w ≥ 1. We multiply E(w) by a weight function k(w) and integrate over w ∈ [1, u]. In anticipation of the choice of k, and in order to simplify some calculations, we let k(w) = f′(w) w^{α+1}/(α + 1), where f′ is the derivative of a twice differentiable increasing function f : [1, +∞) → R to be chosen later. (Assuming that k is of this form is clearly not a serious restriction.) We have that
∫₁^u E(w) f′(w) w^{α+1}/(α + 1) dw = ∫₁^u σ(w) f′(w) w^{α+1}/(α + 1) dw − ∫₁^u f′(w) ∫₁^w t^α σ(t) dt dw
= ∫₁^u σ(w) f′(w) w^{α+1}/(α + 1) dw − ∫₁^u t^α σ(t) ∫_t^u f′(w) dw dt
= ∫₁^u σ(w) f′(w) w^{α+1}/(α + 1) dw − ∫₁^u t^α σ(t) ( f(u) − f(t) ) dt
= ∫₁^u σ(w) ( f(w) w^{α+1} )′/(α + 1) dw − f(u) ∫₁^u t^α σ(t) dt.
Using (0.4.5) to rewrite the integral ∫₁^u t^α σ(t) dt and rearranging the terms, we find that
(0.4.6)   ( f(u) u^{α+1}/(α + 1) ) ( σ(u) − E(u) ) = ∫₁^u σ(w) ( f(w) w^{α+1} )′/(α + 1) dw − ∫₁^u E(w) f′(w) w^{α+1}/(α + 1) dw.
We choose f(w) = −(α + 1)/w^{α+1}, so that f(w) w^{α+1} is constant, the first integral on the right-hand side vanishes, and k(w) = f′(w) w^{α+1}/(α + 1) = (α + 1)/w. Then (0.4.6) becomes
(0.4.7)   σ(u) = E(u) + (α + 1) ∫₁^u E(w) dw/w.
Since E(w) ≪ S(e^w) L/w for w ≥ 1 and S(e^w) ≪ S for w ≥ L, by relation (0.4.4), the integral in (0.4.7) converges absolutely as u → ∞. Moreover,
∫_u^∞ |E(w)| dw/w ≪ S L ∫_u^∞ dw/w² ≪ S L/u   (u ≥ L),
and, consequently,
(0.4.8)   σ(u) = (α + 1) ∫₁^∞ E(w) dw/w + O( S L/u ) =: I + O( S L/u )   (u ≥ L).
an identity which completes the proof of (0.4.1) and, hence, of the theorem. In order to show (0.4.9), note that, as s → 0⁺, we have that
(0.4.10)   ∑_{n=1}^∞ μ²(n) g(n)/n^s = ∏_p ( 1 + g(p)/p^s ) = ζ(s + 1)^α ∏_p ( 1 + g(p)/p^s ) ( 1 − 1/p^{s+1} )^α ∼ 𝔖/s^α,
by (A3′) and the fact that ζ(s) ∼ 1/(s − 1) as s → 1. On the other hand, integration by parts implies that
(0.4.11)   ∑_{n=1}^∞ μ²(n) g(n)/n^s = ∫₁^∞ dS(t)/t^s = s ∫₁^∞ S(t)/t^{s+1} dt = ∫₀^∞ S(e^{u/s})/e^u du
= ∫₀^∞ ( I (u/s)^α + O_g( 1 + (u/s)^{α−1} ) )/e^u du = ( I Γ(α + 1) + O_g(s) )/s^α,
for every s > 0. Comparing (0.4.10) with (0.4.11) shows relation (0.4.9), thus completing the proof of the theorem.
Chapter 1

Introduction to sieve methods

Then, Eratosthenes's algorithm for finding all primes up to x has the following steps:
The termination of this algorithm is guaranteed by Theorem 1.1.1. After its termination, the integers which have not yet been deleted, together with the ones that are circled, will be exactly the prime numbers in [1, x]. Eratosthenes's algorithm is called a sieve because the integers that are not deleted by it (that is, those that do not pass through it) are exactly the primes up to a given point.
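The deletion-and-circling procedure described above translates directly into code; here is a minimal sketch (the function name is ours):

```python
def eratosthenes(x):
    """Return the list of primes up to x by the sieve of Eratosthenes."""
    is_deleted = [False] * (x + 1)
    primes = []
    for n in range(2, x + 1):
        if not is_deleted[n]:
            primes.append(n)                    # n survived: circle it, it is prime
            for m in range(n * n, x + 1, n):
                is_deleted[m] = True            # delete the proper multiples of n
    return primes

print(len(eratosthenes(100)))   # pi(100)
```

Starting the inner loop at n² reflects the fact that smaller multiples of n were already deleted at an earlier stage; this is also why the algorithm may stop circling new primes once n > √x.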
Eratosthenes's sieve provides a way to count primes in various settings. We give below its simplest application to the basic question of how big π(x) = #{p ≤ x} is. We already know from the Prime Number Theorem that
(1.1.1)   π(x) ∼ x/log x   (x → ∞).
For comparison, let us see what Eratosthenes's sieve gives as an answer: consider all primes p ≤ √x. If an integer n ≤ x is not divisible by any of these primes, then either n = 1 or n
is a prime number lying in (√x, x]. So
π(x) = #{n ≤ x : p|n ⟹ p > √x} − 1 + π(√x)
(1.1.2)   = #{n ≤ x : p|n ⟹ p > √x} + O(√x).
Now we will try to understand what is the cardinality of the set appearing on the right-hand side of (1.1.2). Let {p₁, p₂, . . . , p_r} be an indexing of the set P ∩ [1, √x]. Then, by the inclusion-exclusion principle, we have that
#{n ≤ x : p|n ⟹ p > √x} = #( ⋂_{i=1}^r {n ≤ x : p_i ∤ n} ) = #{n ≤ x} − #( ⋃_{i=1}^r {n ≤ x : p_i | n} )
= #{n ≤ x} − ∑_{i=1}^r #{n ≤ x : p_i | n} + ∑_{1≤i<j≤r} #{n ≤ x : p_i p_j | n} − ∑_{1≤i<j<k≤r} #{n ≤ x : p_i p_j p_k | n} + ⋯
(1.1.3)   = ⌊x⌋ − ∑_{i=1}^r ⌊x/p_i⌋ + ∑_{1≤i<j≤r} ⌊x/(p_i p_j)⌋ − ∑_{1≤i<j<k≤r} ⌊x/(p_i p_j p_k)⌋ + ⋯
where γ denotes the Euler–Mascheroni constant. Inserting (1.1.5) into (1.1.4) and combining the resulting estimate with (1.1.2) leads to the prediction that
π(x) ≈ 2e^{−γ} x/log x   (x → ∞).
Comparing this estimate with (1.1.1), we see that it overestimates π(x), since 2e^{−γ} = 1.1229189671… > 1, a typical feature of sieve methods, as we will see. In order to understand why this happens, it is convenient to recast formula (1.1.3) in a more compact way, using the Möbius function μ, so that (1.1.3) becomes
(1.1.6)   #{n ≤ x : p|n ⟹ p > √x} = ∑_{d : p|d ⟹ p≤√x} μ(d) ⌊x/d⌋.
1.1. THE SIEVE OF ERATOSTHENES-LEGENDRE 21
#{n ≤ x : p|n ⟹ p > √x} = x ∑_{d : p|d ⟹ p≤√x} μ(d)/d + O( 2^{π(√x)} )
(1.1.7)   = (2e^{−γ} + o(1)) x/log x + O( 4^{(1+o(1))√x/log x} ).
So we see that our attempt to obtain an asymptotic formula for (x) fails dramatically, as the
error term in (1.1.7) is much bigger than the main term. This happens for two interconnected
reasons:
• The numbers d in the sum appearing on the right-hand side of (1.1.6) are in one-to-one correspondence with the divisors of ∏_{p≤√x} p, and there are too many of these.
• The numbers d in (1.1.6) can get as big as
∏_{p≤√x} p = exp( ∑_{p≤√x} log p ) = e^{√x(1+o(1))},
Nevertheless, Legendre observed that it is possible to use the above idea to gain some information on the size of π(x). His starting point was that, for any z ∈ [1, x], the set of integers that have all their prime factors > z contains the primes in the interval (z, x]. So
π(x) ≤ z + #{n ≤ x : p|n ⟹ p > z}.
Moreover, as before, the inclusion-exclusion principle and Mertens's estimate imply that
#{n ≤ x : p|n ⟹ p > z} = ∑_{d : p|d ⟹ p≤z} μ(d) ⌊x/d⌋ = ∑_{d : p|d ⟹ p≤z} μ(d) ( x/d + O(1) )
= x ∏_{p≤z} ( 1 − 1/p ) + O( 2^{π(z)} ) ≪ x/log z + 2^z
and, consequently,
π(x) ≪ x/log z + 2^z.
This formula is valid for all z ∈ [1, x]. Taking z = (log x)/2 then yields that
(1.1.8)   π(x) ≪ x/log log x,
a non-trivial estimate for π(x). Of course, this estimate is much worse than Chebyshev's estimate π(x) ≪ x/log x. However, as the following exercise shows, it is possible to use these ideas to obtain other interesting results.
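The Legendre–Eratosthenes identity used above, #{n ≤ x : p|n ⟹ p > z} = ∑ μ(d)⌊x/d⌋ over d composed of primes ≤ z, can be checked in code; the sketch below (our function names) enumerates the square-free d recursively and compares with a direct count:

```python
def primes_up_to(z):
    sieve = [True] * (z + 1)
    ps = []
    for n in range(2, z + 1):
        if sieve[n]:
            ps.append(n)
            for m in range(n * n, z + 1, n):
                sieve[m] = False
    return ps

def legendre_count(x, z):
    """sum of mu(d) * floor(x/d) over square-free d with all prime factors <= z."""
    ps = primes_up_to(z)
    total = 0
    def rec(i, d, sign):
        nonlocal total
        total += sign * (x // d)
        for j in range(i, len(ps)):
            if d * ps[j] > x:       # floor(x/d') = 0 beyond this point
                break
            rec(j + 1, d * ps[j], -sign)
    rec(0, 1, 1)
    return total

def direct_count(x, z):
    ps = primes_up_to(z)
    return sum(1 for n in range(1, x + 1) if all(n % p for p in ps))

x, z = 10_000, 13
assert legendre_count(x, z) == direct_count(x, z)
```

Running this for growing z also illustrates the obstruction discussed above: the number of terms in the recursion grows like 2^{π(z)}, which is exactly why the error term in (1.1.7) explodes.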
Exercise 1.1.2 (The square-free sieve). Use the ideas developed above to give a new proof of the estimate
#{n ≤ x : n is square-free} = x ∏_p ( 1 − 1/p² ) + O(√x)   (x ≥ 2).
Exercise 1.1.3. Use Legendre's refinement of the sieve of Eratosthenes to show that
π(x) = o(x)   (x → ∞),
Exercise 1.1.4. Find the average value of the greatest common divisor of a and b asymp-
totically, as a and b range over all integers up to x.
• The twin prime conjecture: Are there infinitely many pairs of integers (n, n + 2) which are both prime? (Such pairs are called twin primes.)
• Goldbach's conjecture: Can every even integer greater than 2 be written as the sum of two primes?
• Is there a prime number between any two consecutive squares?
• Are there infinitely many primes of the form n² + 1?
To this day, all of the above problems remain wide open. However, it is possible to use sieve
methods to make some progress towards them. Examples of results that can be proven using
sieve methods include:
• There are infinitely many primes p such that p + 2 has at most two prime factors (Chen, 1966).
• ∑_{p : p, p+2 twin primes} 1/p < ∞ (Brun, 1919).
• For every large m, the interval (m², (m + 1)²) contains an integer with at most 2 prime factors (Chen, 1975).
• There are infinitely many integers n such that n² + 1 has at most 2 prime factors (Iwaniec, 1980).
• There are infinitely many primes of the form a² + b⁴ (Friedlander–Iwaniec, 2004).
1.2. GENERAL SET-UP 23
In the context of sieve methods, all four problems listed in the beginning of this chapter
as well as many others can be viewed in a unified way. To see this, we need to introduce
some notation. Let A be a finite set of integers and z ≥ 1 some real number. We set
P(z) = ∏_{p<z} p
and
S(A, z) = #{a ∈ A : (a, P(z)) = 1}.
By Theorem 1.1.1, if we wish to extract primes (or products of primes) from the set A, then we need to be able to obtain non-trivial lower bounds on S(A, z) with z as big as max{ |a| : a ∈ A }^{1/2}. For example,
• To count the number of twin prime pairs (n, n + 2) with n ≤ x, we take
A = {n(n + 2) : n ≤ x} and z = √(x + 2).
Alternatively, we can take
A = {p + 2 : p ≤ x} and z = √(x + 2).
• To count the number of representations of the even integer 2N as the sum of two primes, we take
A = {n(2N − n) : n ≤ 2N} and z = √(2N).
The above examples indicate how flexible this terminology is. Analyzing S(A, z) will be one of the main objectives of this course. Recall the Möbius inversion formula
∑_{d|n} μ(d) = 1 if n = 1, and 0 if n > 1.
So
(1.2.1)   S(A, z) = ∑_{a∈A} ∑_{d|(a,P(z))} μ(d) = ∑_{d|P(z)} μ(d) |A_d|,
where
A_d := {a ∈ A : a ≡ 0 (mod d)}.
In order to proceed, we write
(A1)   |A_d| = g(d) X + r_d
for every integer d, where
Therefore we find that V(z) ≫ 1/(log z)², and relation (1.2.2) yields the estimate
S(A, z) ≪ x/(log z)² + ∑_{d|P(z)} μ²(d) ρ(d) ≪ x/(log z)² + 3^{π(z)} ≪ x/(log z)² + 3^z.
Choosing z = (log x)/3 in the above estimate and combining the resulting estimate with (1.2.4), we deduce that
#{n ≤ x : n, n + 2 are both prime} ≪ x/(log log x)².
This bound is, however, worse than the semi-trivial bound
#{n ≤ x : n, n + 2 are both prime} ≤ π(x) ≪ x/log x.
In conclusion, we see that the sieve of Eratosthenes-Legendre suffers a lot from a quanti-
tative point of view. We will see how this deficiency was remedied, first by Viggo Brun and
then by others. In particular, we shall show that (1.2.3) holds if log z = o(log X) and A is
structured.
Often, the above condition is unnecessarily strong, and we can get away with the weaker condition
(A4a)   V(w)/V(w′) = ∏_{w≤p<w′} ( 1 − g(p) )^{−1} ≤ K ( log w′/log w )^κ   (3/2 ≤ w ≤ w′).
¹Note that when p = 2, the classes 0 (mod 2) and −2 (mod 2) coincide. However, this does not affect things on average.
26 CHAPTER 1. INTRODUCTION TO SIEVE METHODS
for some C₁ ≥ 0.
Exercise 1.3.1. Verify the sieve axioms (A1), (A2), (A3) and (A4b) when A is the set {n(2N − n) : n ≤ 2N}, {x² < n ≤ (x + 1)²}, or {n² + 1 : n ≤ N}.
Exercise 1.3.2. Verify the sieve axioms (A1), (A2), (A3) and (A4b) when A = {p + 2 : p ≤ x}. For this set, S(A, √(x + 2)) counts twin primes. This is also true if A = {n(n + 2) : n ≤ x}. What difference do you notice between these two choices of A?
Chapter 2

Interlude: probabilistic number theory

Before we embark on the study of the combinatorial sieve, and in order to motivate Brun's pure sieve, we give a very brief introduction to probabilistic number theory.
The above theorem can be expressed loosely by saying that a typical integer n has about log log n prime factors. Hardy and Ramanujan deduced Theorem 2.1.1 from the following result.
for some constant c, by Chebyshev's estimate π(x) ≪ x/log x. We will show the theorem with A = c and B large enough.
We argue by induction: assume that the result holds for some r ∈ N. Let n ≤ x be an integer with r + 1 distinct prime factors, say n = p₁^{a₁} ⋯ p_{r+1}^{a_{r+1}} with p₁ < p₂ < ⋯ < p_{r+1}. Then p_j^{a_j+1} ≤ p_j^{a_j} ⋯ p_{r+1}^{a_{r+1}} ≤ x for j ≤ r, so that there are at least r ways to write n = p^a m for some p ≤ x^{1/(a+1)}, a ≥ 1 and m with ω(m) = r. Consequently,
r π_{r+1}(x) ≤ ∑_{a≥1} ∑_{p≤x^{1/(a+1)}} π_r(x/p^a) ≤ ∑_{a≥1} ∑_{p≤x^{1/(a+1)}} ( A x/p^a ) ( log log(x/p^a) + B )^{r−1} / ( (r−1)! log(x/p^a) )
≤ ( A x ( log log x + B )^{r−1} / (r−1)! ) ∑_{a≥1} ∑_{p≤x^{1/(a+1)}} 1/( p^a log(x/p^a) ).
Therefore, we need to show that ∑_{a≥1} ∑_{p≤x^{1/(a+1)}} 1/( p^a log(x/p^a) ) ≤ (log log x + O(1))/log x. Then choosing B large enough will complete the inductive step and hence the proof. Indeed, note that 1/(1 − y) ≤ 1 + (a + 1) y for y ∈ [0, a/(a + 1)]. So
∑_{a≥1} ∑_{p≤x^{1/(a+1)}} 1/( p^a log(x/p^a) ) = (1/log x) ∑_{a≥1} ∑_{p≤x^{1/(a+1)}} (1/p^a) · 1/( 1 − a log p/log x )
≤ (1/log x) ∑_{a≥1} ∑_{p≤x^{1/(a+1)}} (1/p^a) ( 1 + (a + 1) log p/log x )
≤ (1/log x) ( ∑_{p≤√x} (1/p) ( 1 + 2 log p/log x ) + O( ∑_{a≥1} (a + 1)/2^{a−1} ) ) = (log log x + O(1))/log x.
Exercise 2.1.4. (a) Show that the entries of the N × N multiplication table form a sparse set of the integers, in the sense that
lim_{N→∞} (1/N²) #{ab : a ≤ N, b ≤ N} = 0.
(1/⌊x⌋) #{n ≤ x : d|n} = ⌊x/d⌋/⌊x⌋ ≈ 1/d.
Motivated by this rough calculation, Kubilius assigned to the event {d|n} the probability 1/d. In addition, note that if d₁ and d₂ are co-prime, then
Prob({d₁|n} ∩ {d₂|n}) = Prob({d₁d₂|n}) = 1/(d₁d₂) = Prob({d₁|n}) · Prob({d₂|n}).
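This approximate independence is easy to observe numerically. The sketch below (our names; the moduli 7 and 10 are arbitrary coprime example values) compares the exact joint density of {d₁|n} ∩ {d₂|n} on {n ≤ x} with the product of the individual densities:

```python
import math

def density(x, d):
    """Exact proportion of n <= x divisible by d."""
    return (x // d) / x

x = 10**6
d1, d2 = 7, 10                      # coprime moduli (illustrative choice)
assert math.gcd(d1, d2) == 1

joint = density(x, d1 * d2)
product = density(x, d1) * density(x, d2)
assert abs(joint - product) < 2 / x  # independence holds up to O(1/x)
```

The O(1/x) discrepancy comes from the floor functions only; as the text notes below, the model breaks down not here but when many "events" for large primes are combined.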
So it is reasonable to assume that the events {d₁|n} and {d₂|n} are independent of each other. This leads to the estimate
Prob({ω(n) = r | n ≤ x}) = ∑_{p₁<⋯<p_r≤x} Prob( {p₁⋯p_r | n} ∩ {p ∤ n for all p ∈ [1, x] \ {p₁, . . . , p_r}} )
= ∑_{p₁<⋯<p_r≤x} ( 1/(p₁⋯p_r) ) ∏_{p≤x, p∉{p₁,…,p_r}} ( 1 − 1/p )
= ∏_{p≤x} ( 1 − 1/p ) ∑_{p₁<⋯<p_r≤x} 1/( (p₁ − 1)⋯(p_r − 1) )
= (1/r!) ∏_{p≤x} ( 1 − 1/p ) ∑_{p₁,…,p_r≤x distinct} 1/( (p₁ − 1)⋯(p_r − 1) ) ≤ (1/r!) ∏_{p≤x} ( 1 − 1/p ) ( ∑_{p≤x} 1/(p − 1) )^r
≍ (1/log x) · ( log log x + O(1) )^r / r!,
So we see that, even though the Kubilius model is very successful in predicting the distribution of ω, it fails in the case of S(A, z). The reason for this failure lies in the assumption that the events A_{d₁} and A_{d₂} are independent of each other whenever d₁ and d₂ are coprime. Indeed, if for example A = {n ≤ x} and z = √x, then an integer a ≤ x is divisible by at most two distinct primes p ∈ (x^{1/3}, z], which certainly violates the independence assumption we made above³. In rough terms, this is why it is hard to obtain asymptotic formulas for S(A, z) when z is large and, in fact, why the approximation S(A, z) ≈ X V(z) is generally false if z is large. However, we shall see in the next sections that the Kubilius model is accurate when z = X^{o(1)}.
²Indeed, if λ > 1, then N ≤ 1 + λ + λ² + ⋯ + λ^{N−1} ≤ λ^N/(λ − 1).
³This phenomenon is not significant in the study of ω, since n has at most 1000 prime factors in [n^{1/1000}, n]. So we may ignore these large primes without sacrificing a whole lot in precision.
Chapter 3

The combinatorial sieve
by (A1). Even though the above relation holds for all r, Theorem 2.1.1 suggests that, in order to have any hope of succeeding, we must take r to be a function of x and z (usually, r ∼ c log log x suffices).
Instead of estimating the main term in (3.1.2) directly, we observe that
V(z) = 1 + ∑_{j≥1} (−1)^j ∑_{p_j<⋯<p₁<z} g(p₁⋯p_j).
So
(3.1.3)   1 + ∑_{j=1}^{2m+1} (−1)^j ∑_{p_j<⋯<p₁<z} g(p₁⋯p_j) ≤ V(z) ≤ 1 + ∑_{j=1}^{2m} (−1)^j ∑_{p_j<⋯<p₁<z} g(p₁⋯p_j).
Since
∑_{p<z} g(p) ≤ ∑_{p<z} log( 1/(1 − g(p)) ) = log( 1/V(z) ),
By Stirling's formula, the main term in the above formula is bigger than the first error term if r > 3.6 |log V(z)|. Indeed, we have the following estimate:
3.1. BRUNS PURE SIEVE 33
Theorem 3.1.2. Let A be a finite set of integers which satisfies (A1) and (A2). For z ≥ 1 and r ≥ 3.6 |log V(z)|, we have that
S(A, z) = X V(z) ( 1 + O(1/r) ) + O( ∑_{d|P(z), d≤z^r, ω(d)≤r} |r_d| ).
Controlling the second error term in Theorem 3.1.2 is often very hard and depends on having at our disposal an estimate of the form
(R)   ∑_{d≤D} μ²(d) |r_d| ≤ C₂ X/(log X)^A
with D = z^r and A some large enough positive number. If such an estimate holds, then we say that A has level of distribution D. Assuming (A4a) and (R), we may simplify the statement of Theorem 3.1.2.
Theorem 3.1.3. Let A be a finite set of integers which satisfies (A1) and (A4a), as well as (R) with A = κ + 1 and D = X^ε for some ε ∈ (0, 1]. For 1 ≤ z ≤ X^{ε/(4 log log X)}, we have that
S(A, z) = X V(z) ( 1 + O( 1/log log X ) );
the implied constant depends at most on ε, K, κ and C₂.
Proof. Without loss of generality, we may assume that X is large enough. Note that relation (A2) holds by (1.3.1). The result then follows by Theorem 3.1.2 applied with r = 4 log log X ≥ 3.6 |log V(z)|, so that z^r ≤ X^ε.
In the next section we will see that, by improving upon Brun's idea, it is possible to extend the above theorem to much larger values of z. But before this, we give a nice application of Theorem 3.1.3, due to Brun.
Remark 3.1.5. The value of the sum ∑_{p : p, p+2 twin primes} 1/p is called Brun's constant. Since we know that ∑_p 1/p = +∞, the above theorem tells us that primes p for which (p, p + 2) is a pair of twin primes are sparse among the sequence of primes.
Proof of Theorem 3.1.4. First, we apply Theorem 3.1.3 with A = {n(n + 2) : n ≤ x}. Note that (A1) holds with X = x, g(d) = ρ(d)/d and r_d ≪ ρ(d), where
by (1.2.5). Moreover, (A4a) holds with κ = 2, by (1.2.6). Finally, (R) holds for any A > 0 and any ε < 1, since ρ(d) ≤ τ(d) for d square-free. So Theorem 3.1.3, applied with z = x^{1/(10 log log x)}, yields the estimate
#{n ≤ x : n, n + 2 twin primes} ≤ z + S(A, z) ≪ z + x/(log z)² ≪ x (log log x)²/(log x)².
The desired result then follows by partial summation.
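The sparseness of the twin primes asserted by Brun's theorem is already visible in small ranges; the following sketch (our names) lists twin prime pairs up to N and the partial sums of ∑(1/p + 1/(p+2)), which stay well below 2 (the full series converges to Brun's constant):

```python
def primes_up_to(n):
    sieve = [True] * (n + 1)
    ps = []
    for k in range(2, n + 1):
        if sieve[k]:
            ps.append(k)
            for m in range(k * k, n + 1, k):
                sieve[m] = False
    return ps

N = 10**5
ps = set(primes_up_to(N + 2))
twins = [p for p in sorted(ps) if p <= N and p + 2 in ps]
partial = sum(1 / p + 1 / (p + 2) for p in twins)
print(len(twins), round(partial, 4))   # the partial sums grow very slowly
```

By contrast, the partial sums of ∑_p 1/p over all primes up to N already exceed 2.8 at N = 10⁵, illustrating Remark 3.1.5.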
This identity can be used to obtain an entire family of combinatorial sieves, as follows. Consider sets
𝒟_j ⊆ {(p₁, . . . , p_j) : z > p₁ > p₂ > ⋯ > p_j primes}   (j ≥ 1)
such that
𝒟₃ ⊆ 𝒟₁ × {p < z}², 𝒟₅ ⊆ 𝒟₃ × {p < z}², . . . , 𝒟_{2j+1} ⊆ 𝒟_{2j−1} × {p < z}², . . .
and
𝒟₄ ⊆ 𝒟₂ × {p < z}², 𝒟₆ ⊆ 𝒟₄ × {p < z}², . . . , 𝒟_{2j+2} ⊆ 𝒟_{2j} × {p < z}², . . .
Moreover, set
D⁺ = {1} ∪ {d = p₁⋯p_r > 1 : (p₁, . . . , p_j) ∈ 𝒟_j for all 1 ≤ j ≤ r with j odd}
and
D⁻ = {1} ∪ {d = p₁⋯p_r > 1 : (p₁, . . . , p_j) ∈ 𝒟_j for all 1 ≤ j ≤ r with j even}.
Then we have that
S(A, z) = |A| − ∑_{p₁<z} S(A_{p₁}, p₁) ≤ |A| − ∑_{p₁∈𝒟₁} S(A_{p₁}, p₁)
= |A| − ∑_{p₁∈𝒟₁} |A_{p₁}| + ∑_{p₁∈𝒟₁} ∑_{p₂<p₁} S(A_{p₁p₂}, p₂)
= |A| − ∑_{p₁∈𝒟₁} |A_{p₁}| + ∑_{p₁∈𝒟₁} ∑_{p₂<p₁} |A_{p₁p₂}| − ∑_{p₁∈𝒟₁} ∑_{p₃<p₂<p₁} S(A_{p₁p₂p₃}, p₃)
(3.2.2)   ≤ |A| − ∑_{p₁∈𝒟₁} |A_{p₁}| + ∑_{p₁∈𝒟₁} ∑_{p₂<p₁} |A_{p₁p₂}| − ∑_{(p₁,p₂,p₃)∈𝒟₃} S(A_{p₁p₂p₃}, p₃)
= ⋯ = ∑_{d|P(z)} λ⁺(d) |A_d|,
3.2. BUCHSTAB ITERATIONS AND GENERAL UPPER & LOWER BOUND SIEVES35
where
(3.2.3)   λ⁺(d) = μ(d) if d ∈ D⁺, and λ⁺(d) = 0 otherwise.
Similarly,
S(A, z) ≥ ∑_{d|P(z)} λ⁻(d) |A_d|,
where
(3.2.4)   λ⁻(d) = μ(d) if d ∈ D⁻, and λ⁻(d) = 0 otherwise.
From the sum over p₁ and p₂ we drop the terms with D/(p₁p₂) < p₂, since some of these are potentially much smaller than expected, and keep all of the terms with D/(p₁p₂) ≥ p₂, since for these we have that
S(A_{p₁p₂}, p₂) ≈ |A_{p₁p₂}| V(p₂) ≈ g(p₁p₂) X V(p₂).
36 CHAPTER 3. THE COMBINATORIAL SIEVE
As before, we drop the terms with D/(p₁p₂p₃p₄) < p₄ and keep the rest. This leads us to the choice
𝒟₄ = {(p₁, . . . , p₄) : z > p₁ > ⋯ > p₄, p₁p₂p₃p₄ < D/p₄}.
is called an upper bound sieve of level D. Similarly, a sequence S⁻ = {λ⁻(d)}_{d≤D} such that
(S⁻)   λ⁻(1) = 1 and ∑_{d|n} λ⁻(d) ≤ 0   (n > 1)
is called a lower bound sieve of level D. The reason for this terminology is that, under the above assumptions, we have that
S(A, z) = ∑_{a∈A, (a,P(z))=1} 1 ≤ ∑_{a∈A} ∑_{d|(a,P(z))} λ⁺(d) = ∑_{d|P(z)} λ⁺(d) |A_d|
and
S(A, z) = ∑_{a∈A, (a,P(z))=1} 1 ≥ ∑_{a∈A} ∑_{d|(a,P(z))} λ⁻(d) = ∑_{d|P(z)} λ⁻(d) |A_d|,
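A concrete family of such sieves comes from Brun's pure sieve: truncating μ at divisors with ω(d) ≤ r gives an upper bound sieve for r even and a lower bound sieve for r odd (a Bonferroni-type inequality). The following sketch (our names) verifies the defining conditions (S⁺) and (S⁻) for these truncations:

```python
def mobius_and_omega(n):
    """Return (mu(n), omega(n)); (0, None) if n is not square-free."""
    mu, omega = 1, 0
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0, None
            mu, omega = -mu, omega + 1
        p += 1
    if n > 1:
        mu, omega = -mu, omega + 1
    return mu, omega

def truncated_sum(n, r):
    """sum of mu(d) over square-free divisors d of n with omega(d) <= r."""
    s = 0
    for d in range(1, n + 1):
        if n % d == 0:
            mu, om = mobius_and_omega(d)
            if mu != 0 and om <= r:
                s += mu
    return s

for n in range(2, 300):
    for r in range(0, 6):
        s = truncated_sum(n, r)
        if r % 2 == 0:
            assert s >= 0   # even truncation: condition (S+) for n > 1
        else:
            assert s <= 0   # odd truncation: condition (S-) for n > 1
```

Both truncations take the value 1 at n = 1 (only d = 1 contributes), so the sign conditions checked above are exactly (S⁺) and (S⁻).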
Theorem 3.3.1 (Fundamental Lemma of Sieve Methods, I). Let A be a finite set of integers which satisfies (A1) and (A4a), for some κ > 0 and K ≥ 1. For z ≥ 1 and u > 0, we have that
S(A, z) = X V(z) ( 1 + O( e^{−u log u + O_{κ,K}(u)} ) ) + O( ∑_{d|P(z), d≤z^u} |r_d| ).
Theorem 3.3.2 (Fundamental Lemma of Sieve Methods, II). Let A be a finite set of integers which satisfies (A1) and (A4a), for some κ > 0 and K ≥ 1, and (R) for A = κ + 1 and D = X^ε, ε ∈ (0, 1]. If z = X^{ε/s} with s ≥ 1, then we have that
S(A, z) = X V(z) ( 1 + O_{κ,K,ε}( e^{−s log s + O_{κ,K}(s)} + 1/log X ) ).
In particular,
S(A, z) ≥ X V(z)/10   (z₀ ≤ z ≤ X^{ε/k}),
where z₀ is a sufficiently large constant that depends at most on κ and C₁.
Proof. The first part of the theorem follows by taking u = s in the first part of Theorem 3.3.1, since relation (A4a) implies that V(z) ≫_{ε,K} (log X)^{−κ}. For the second part, note that z^k ≤ X^ε. So the desired lower bound on S(A, z) follows by the second part of Theorem 3.3.1.
Theorem 3.3.3 (Fundamental Lemma of Sieve Methods, III). Let κ > 0, z ≥ 1 and D = z^u with u ≥ 2. There exist two arithmetic functions λ± : N → [−1, 1], depending at most on κ, z and D, such that:
Deduction of Theorem 3.3.1 from Theorem 3.3.3. Note that we may assume throughout the proof that u ≥ 2, since if u ≤ 2, then we have that
0 ≤ S(A, z) ≤ S(A, z^{u/2}) ≤ c_{κ,K} X V(z^{u/2}) + O( ∑_{d|P(z^{u/2}), d≤z^u} |r_d| )
≤ c_{κ,K} 2^κ K X V(z) + O( ∑_{d|P(z), d≤z^u} |r_d| )
by the case u = 2 and relation (A4a), where c_{κ,K} is some constant depending at most on κ and K. Therefore the theorem holds in this case too, possibly after enlarging the implied constants in its statement. Now, assume that u ≥ 2 and let λ± be the two sequences from Theorem 3.3.3, applied with z and κ as in the statement of Theorem 3.3.1, and with D = z^u. Then
S(A, z) = ∑_{a∈A, (a,P(z))=1} 1 ≤ ∑_{a∈A} ∑_{d|(a,P(z))} λ⁺(d) = ∑_{d|P(z)} λ⁺(d) |A_d|
= X ∑_{d|P(z)} λ⁺(d) g(d) + ∑_{d|P(z)} λ⁺(d) r_d
= X V(z) ( 1 + O_{κ,K}( e^{−u log u + O_{κ,K}(u)} ) ) + O( ∑_{d|P(z), d≤z^u} |r_d| ),
3.3. THE FUNDAMENTAL LEMMA OF SIEVE METHODS 39
and
(3.3.3)   V(z) − ∑_{d|P(z)} λ⁻(d) g(d) = ∑_{r≥1, r even} V_r.
We fix an integer r ≥ 1 and proceed to the estimation of V_r. This will be done by studying the complicated range of summation in V_r and replacing it by a much simpler one. In particular, we will show that the primes p_i are bounded from below by certain appropriate powers of z. Consider p₁, . . . , p_r lying in the range of V_r. We claim that
(3.3.4)   p₁⋯p_j ≤ D z^{−(u−1)((β−1)/(β+1))^{⌊j/2⌋}}   (0 ≤ j ≤ r, j ≡ r − 1 (mod 2)).
We argue by induction: if j = 0 or j = 1, then relation (3.3.4) holds trivially, since p₁ < z = D z^{−(u−1)}. Assume now that (3.3.4) holds for some j ∈ {0, . . . , r − 3} that has opposite parity than r. Then
p₁⋯p_{j+2} < p₁⋯p_j · p_{j+1}² < p₁⋯p_j ( D/(p₁⋯p_j) )^{2/(β+1)}
= ( p₁⋯p_j )^{(β−1)/(β+1)} D^{2/(β+1)} ≤ ( D z^{−(u−1)((β−1)/(β+1))^{⌊j/2⌋}} )^{(β−1)/(β+1)} D^{2/(β+1)}
= D z^{−(u−1)((β−1)/(β+1))^{⌊(j+2)/2⌋}},
which completes the inductive step and hence the proof of (3.3.4). Note that relation (3.3.4) and the condition p_r^{β+1} ≥ D/(p₁⋯p_{r−1}) imply that
(3.3.5)   p_r ≥ ( D/(p₁⋯p_{r−1}) )^{1/(β+1)} ≥ z^{τ_r},
where
(3.3.6)   τ_r = ( (u − 1)/(β + 1) ) ( (β − 1)/(β + 1) )^{⌊(r−1)/2⌋}.
V_r ≤ ( V(z^{τ_r})/D^η ) ∑_{z>p₁>⋯>p_r≥z^{τ_r}} g(p₁)⋯g(p_r) ( p₁⋯p_{r−1} p_r^{β+1} )^η
(3.3.7)   ≤ ( V(z^{τ_r})/(D^η (r−1)!) ) ( ∑_{z^{τ_r}≤p<z} g(p) p^{(β+1)η} ) ( ∑_{z^{τ_r}≤p<z} g(p) p^η )^{r−1}
≤ ( K V(z)/(D^η (r−1)!) ) ( ∑_{z^{τ_r}≤p<z} g(p) p^{(β+1)η} ) ( ∑_{z^{τ_r}≤p<z} g(p) p^η )^{r−1},
by relation (A4a) and Rankin's trick (see Section 0.3). Moreover, if λ is some number > 1 and η ≤ 1/log z, then we have that
∑_{z^{τ_r}≤p<z} g(p) p^η ≤ λ ∑_{z^{τ_r}≤p<λ^{1/η}} g(p) + ∑_{t≥1} λ^{t+1} ∑_{λ^{t/η}≤p<λ^{(t+1)/η}} g(p)
≤ κλ log( 1/τ_r ) + O_{K,κ,λ}( z^{−τ_r} ) ≤ ( κλ r/2 ) log( (β+1)/(β−1) ) + O_{K,κ,λ}( z^{−τ_r} ),
by (A4a). The inequality x + y ≤ x e^{y/x}, for x and y positive, and relation (3.3.6) then imply that
(3.3.8)   (1/(r−1)!) ( ∑_{z^{τ_r}≤p<z} g(p) p^η )^{r−1} ≤ ( κλ θ (β−1)/(β+1) )^{r−1} e^{O_{K,κ,λ}(z^{−τ_r})},
where
(3.3.9)   θ := ( e(β+1)/(2(β−1)) ) log( (β+1)/(β−1) ) = e^{5/4}/4 < 0.873,
Inserting (3.3.8) and (3.3.10) into (3.3.7) and choosing λ > 1 so that κλθ(β−1)/(β+1) ≤ 0.9, we deduce that
V_r ≤ ( e^{O_{K,κ}(z^{−τ_r})}/D^η ) (0.9)^{r−1} r^{κ+1}/(e · r!).
Since r!/r ≥ r^{r−1}/e^r, we conclude that
V_r ≤ ( e^{O_{K,κ}(z^{−τ_r})}/D^η ) r^{κ+2} (0.9)^r.
Summing the above inequality over r ≥ 1, we arrive at the estimate
∑_{r≥1} V_r ≪_{κ,K} 1/D^η.
and
( (β+1)/(β−1) )^{κ+2} = ( 1 + 2/(β−1) )^{κ+2} < e⁴,
which is possible since β > 1 + 4κ and θ = e^{5/4}/4 < 0.873, by relation (3.3.9). Note that for 3/2 ≤ w < z we have that
V(w)/V(z) ≤ ( 1 + C/log w ) ( log z/log w )^κ ≤ 1.001 ( log z/log w )^{κ+δ},
∑_{z^{τ_r}≤p<z} g(p) ≤ log( V(z^{τ_r})/V(z) ) ≤ log( 1.001/τ_r^{κ+δ} ),
so that
(1/V(z)) ∑_{r≥1, r even} V_r ≤ 0.272 ∑_{r≥1, r even} 0.873^r = 0.272 · 0.873²/(1 − 0.873²) < 7/8.
r even r even
Inserting the above inequality into (3.3.3) completes the proof of property (5) and hence of
Theorem 3.3.3.
Chapter 4

Some applications of sieve methods
In this chapter, we give some applications of the results we proved in the previous sections,
particularly of the fundamental lemma of sieve methods. In order to show that (R) holds
for certain sequences A, we need to control the number of primes in arithmetic progressions
on average. If we let
$$\operatorname{li}(x) := \int_2^x \frac{dt}{\log t},$$
then we expect that $\pi(x; q, a)$, the number of primes up to $x$ in the arithmetic progression $a \pmod q$, should be very well approximated by $\operatorname{li}(x)/\varphi(q)$. The prime number theorem for arithmetic progressions implies that this is true if $x$ is large enough in terms of $q$, specifically, when $x > e^{q^{1/3}}$. Bombieri and Vinogradov showed (independently) that this remains true for much smaller $x$ as well (for $x$ as small as $q^{2+\epsilon}$), but in an average sense. Indeed, if we set
$$(4.0.1)\qquad E(x;q) = \max_{y\le x}\ \max_{(a,q)=1}\left|\pi(y;q,a)-\frac{\operatorname{li}(y)}{\varphi(q)}\right|,$$
then their result is formulated as follows:
We will prove this theorem in Chapter 10. Note that the trivial bound on the sum in question, which comes from the Brun-Titchmarsh inequality (see Theorem 4.1.4 below), is
$$\sum_{q\le \sqrt{x}/(\log x)^{B}} \max_{y\le x}\max_{(a,q)=1}\left|\pi(y;q,a)-\frac{\operatorname{li}(y)}{\varphi(q)}\right| \ll \sum_{q\le \sqrt{x}/(\log x)^{B}} \frac{x}{\varphi(q)\log x} \ll x \qquad (x\ge 2).$$
So the Bombieri-Vinogradov theorem allows us to save an arbitrary power of $\log$ over this trivial estimate. Moreover, Theorem 4.0.4 is essentially of the same strength as the Generalized Riemann Hypothesis (GRH) on average. Indeed, GRH implies the bound
$$\sum_{q\le \sqrt{x}/(\log x)^{B}} \max_{y\le x}\max_{(a,q)=1}\left|\pi(y;q,a)-\frac{\operatorname{li}(y)}{\varphi(q)}\right| \ll \sum_{q\le \sqrt{x}/(\log x)^{B}} \sqrt{x}\log x \ll \frac{x}{(\log x)^{B-1}} \qquad (x\ge 2).$$
Remark 4.0.6. As Friedlander and Granville [FG] have shown, if we replace $x^{1-\epsilon}$ with $x/(\log x)^{B}$ in the above conjecture, then the conclusion is no longer always correct.
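To get a concrete feel for the quantity $E(x;q)$, here is a small numerical sketch (all function names are ours, not from the text; for simplicity the maximum over $y\le x$ is replaced by the single value $y=x$):

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(2, n + 1) if sieve[i]]

def li(x, steps=20000):
    """Trapezoidal approximation of li(x) = integral over [2, x] of dt/log t."""
    h = (x - 2) / steps
    total = 0.5 * (1 / math.log(2) + 1 / math.log(x))
    total += sum(1 / math.log(2 + k * h) for k in range(1, steps))
    return total * h

def max_discrepancy(x, q):
    """max over a with (a,q)=1 of |pi(x;q,a) - li(x)/phi(q)|."""
    counts = {a: 0 for a in range(q) if math.gcd(a, q) == 1}
    for p in primes_up_to(x):
        if p % q in counts:
            counts[p % q] += 1
    expected = li(x) / len(counts)  # len(counts) = phi(q)
    return max(abs(c - expected) for c in counts.values())
```

For $x = 10^5$ and $q = 3$, the discrepancy is a few dozen — tiny compared with the trivial bound of order $x/(\varphi(q)\log x)$.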
Exercise 4.0.7. Show that the Bombieri-Vinogradov theorem is equivalent to the following statement: For every $A > 0$, there is some $C > 0$, such that if $1 \le Q \le \sqrt{x}/(\log x)^{C}$, then
$$\#\left\{q\le Q : E(x;q) \ge \frac{x}{\varphi(q)\log x}\cdot\frac{1}{(\log x)^{A}}\right\} \ll_A \frac{Q}{(\log x)^{A}}.$$
Exercise 4.0.8. Let $k\in\mathbb{N}$ and $A > 0$. Using the Bombieri-Vinogradov theorem, prove that there is $B = B(A,k)$ such that
$$\sum_{q\le \sqrt{x}/(\log x)^{B}} \tau_k(q)\,E(x;q) \ll_{k,A} \frac{x}{(\log x)^{A}} \qquad (x\ge 2).$$
and
$$\#\{p\le x : \Omega(p+2)\le 8\} \gg \frac{x}{(\log x)^{2}} \qquad (x\ge 3).$$
$$|A_d| = \pi(x; d, -2), \qquad g(d) = \begin{cases} \dfrac{1}{\varphi(d)} & \text{if } (d,2)=1,\\[4pt] 0 & \text{otherwise},\end{cases}$$
and
$$r_d = \begin{cases} \pi(x;d,-2) - \dfrac{\operatorname{li}(x)}{\varphi(d)} & \text{if } (d,2)=1,\\[4pt] \pi(x;d,-2) \le 1 & \text{if } (d,2)>1.\end{cases}$$
$$\frac{V(w)}{V(w_0)} = \frac{\log w_0}{\log w}\left(1+O\!\left(\frac{1}{\log w}\right)\right)\left(1+O\!\left(\frac{1}{\log w_0}\right)\right)$$
$$(4.1.1)\qquad = \frac{\log w_0}{\log w}\left(1+O\!\left(\frac{1}{\log w}\right)\right) \qquad (3/2\le w\le w_0),$$
by Mertens' estimate (1.1.5); that is to say, relation (A4b) holds with $\kappa = 1$. Lastly, relation (R) holds with $A = 2$ and $D = x^{\delta}$, for any $\delta\in(0,1/2)$, by the Bombieri-Vinogradov Theorem (see Theorem 4.0.4). So Theorem 3.3.1 and relation (4.1.1) imply that
$$\#\{p\le x : p+2\ \text{prime}\} \le \sqrt{x} + S(A,\sqrt{x}) \ll \sqrt{x} + \operatorname{li}(x)\prod_{2<p<\sqrt{x}}\left(1-\frac{1}{\varphi(p)}\right) \ll \frac{x}{(\log x)^{2}},$$
Exercise 4.1.2.
(a) Show that
$$\#\{n\le x : (n, P(z))=1\} \ll \frac{x}{\log z} \qquad (3/2\le z\le x)$$
and
$$\#\{n\le x : (n, P(z))=1\} \gg \frac{x}{\log z} \qquad (3\le 2z\le x).$$
(b)* Show the stronger estimates
$$\sum_{\substack{n\le x\\ P^-(n)\ge z}} \frac{n}{\varphi(n)} \ll \frac{x}{\log z} \qquad (3/2\le z\le x)$$
and
$$\sum_{\substack{n\le x\\ P^-(n)\ge z}} \frac{\mu^2(n)\varphi(n)}{n} \gg \frac{x}{\log z} \qquad (3\le 2z\le x).$$
Proof. If $y/2 < q \le y$, then the theorem follows by the trivial inequality
$$\pi(x;q,a)-\pi(x-y;q,a) \le \#\{x-y<n\le x : n\equiv a\ (\mathrm{mod}\ q)\} \le 1+\frac{y}{q}.$$
Assume now that $q\le y/2$ and let $A = \{x-y<n\le x : n\equiv a\ (\mathrm{mod}\ q)\}$. Fix for the moment an integer $d$. Note that if $(d,q)>1$, then there are no solutions to the congruence $dm\equiv a\ (\mathrm{mod}\ q)$, since $(a,q)=1$ by assumption. Therefore $|A_d|=0$ in this case. On the other hand, if $(d,q)=1$ and we let $\bar{d}\in[1,q]$ be the multiplicative inverse of $d$ modulo $q$, then
$$|A_d| = \#\left\{\frac{x-y}{d}<m\le \frac{x}{d} : dm\equiv a\ (\mathrm{mod}\ q)\right\} = \#\left\{\frac{x-y}{d}<m\le\frac{x}{d} : m\equiv \bar{d}a\ (\mathrm{mod}\ q)\right\}$$
$$= \#\left\{k\in\mathbb{Z} : \frac{x-y}{d}<kq+\bar{d}a\le\frac{x}{d}\right\} = \frac{y}{dq}+O(1),$$
4.1. PRIME VALUES OF POLYNOMIALS 47
that is to say, (A4a) holds with $\kappa = 1$. Finally, relation (R) holds trivially with $A = 2$ and $D = (y/q)^{\delta}$, for any $\delta < 1$. Therefore applying Corollary 3.3.3 with $z = \sqrt{y/q}$ and $u = 1$ implies that
$$\pi(x;q,a)-\pi(x-y;q,a) \ll \sqrt{\frac{y}{q}} + S(A,\sqrt{y/q}) \ll \sqrt{\frac{y}{q}} + \frac{y}{q}\prod_{p<\sqrt{y/q}}(1-g(p))$$
$$= \sqrt{\frac{y}{q}} + \frac{y}{q}\prod_{\substack{p<\sqrt{y/q}\\ p\nmid q}}\left(1-\frac{1}{p}\right) \ll \sqrt{\frac{y}{q}} + \frac{y}{q\log(y/q)}\prod_{\substack{p<\sqrt{y/q}\\ p\mid q}}\left(1+\frac{1}{p}\right) \ll \frac{y}{\varphi(q)\log(y/q)}.$$
Theorem 4.1.5. Let $F_1(x),\dots,F_r(x)$ be distinct irreducible polynomials in $\mathbb{Z}[x]$ with positive leading coefficients. Suppose that the polynomial $F = F_1\cdots F_r$ has no fixed prime divisors, i.e. there is no prime $p$ such that $p\mid F(n)$ for all integers $n$. Then we have that
$$\#\{n\le x : F_1(n),\dots,F_r(n)\ \text{are all primes}\} \ll_F \frac{x}{(\log x)^{r}}.$$
Moreover, if $x$ is large enough and $d$ denotes the degree of $F$, then
$$\#\{n\le x : \Omega(F(n))\le 4rd\} \gg_F \frac{x}{(\log x)^{r}}.$$
Proof. Exercise.
Hint: If $G$ is an irreducible polynomial in $\mathbb{Z}[x]$ and
$$\rho_G(d) = \#\{m\in\mathbb{Z}/d\mathbb{Z} : G(m)\equiv 0\ (\mathrm{mod}\ d)\},$$
then there is a constant $c_G$ such that
$$\sum_{p\le x}\frac{\rho_G(p)}{p} = \log\log x + c_G + O_G\!\left(\frac{1}{\log x}\right) \qquad (x\ge 2).$$
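The counting function $\rho_G$ in the hint is easy to compute directly for small moduli; here is a minimal sketch for the illustrative example $G(x) = x^2+1$ (the function name and example are ours):

```python
def rho(coeffs, d):
    """#{m mod d : G(m) = 0 (mod d)} for G(x) = sum(coeffs[i] * x**i),
    with coeffs listed constant term first."""
    return sum(
        1
        for m in range(d)
        if sum(c * pow(m, i, d) for i, c in enumerate(coeffs)) % d == 0
    )
```

For $G(x) = x^2+1$ one sees the familiar pattern: $\rho_G(p) = 2$ for $p\equiv 1 \pmod 4$ (e.g. $p=5$) and $\rho_G(p) = 0$ for $p\equiv 3 \pmod 4$ (e.g. $p=3$); $\rho_G$ is also multiplicative, e.g. $\rho_G(10)=\rho_G(2)\rho_G(5)$.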
Theorem 4.2.1. There are absolute constants $A' > 0$ and $B' > 0$ such that, for every $x\ge 3$ and every integer $r\ge 1$, we have that
$$\#\left\{3<p\le x : \omega\!\left(\frac{p-1}{2}\right)=r\right\} \le \frac{A' x}{\log^2 x}\cdot\frac{(\log\log x + B')^{r-1}}{(r-1)!}.$$
Proof. Exercise.
Hint: Show the stronger statement that
$$(4.2.1)\qquad S_r(x,k) := \#\left\{p\le x : p\equiv 1\ (\mathrm{mod}\ 2k),\ \omega\!\left(\frac{p-1}{2k}\right)=r\right\} \le \frac{A' x}{\varphi(k)\log^2(x/k)}\cdot\frac{(\log\log(x/k)+B')^{r-1}}{(r-1)!},$$
uniformly in $r\ge 1$ and $1\le k\le x/2$.
Now, note that $\varphi(n) = \prod_{p^a\| n} p^{a-1}(p-1)$, so we expect $\varphi(n)$ to have many prime factors, many more than a typical integer of the same size. Our goal in this section is to determine how fast the image of $\varphi$ grows. So set $V(x) = \#\{m\le x : m=\varphi(n)\ \text{for some}\ n\}$. Clearly, $V(x)\ge \pi(x+1)\gg x/\log x$ for $x\ge 2$. Erdős showed that this lower bound is not so far from the truth:
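The count $V(x)$ can be computed by brute force for small $x$, using the elementary bound $\varphi(n)\ge\sqrt{n/2}$ to limit the search range; a small sketch (function names ours):

```python
def totients_up_to(n):
    """phi(m) for 1 <= m <= n via a sieve."""
    phi = list(range(n + 1))
    for p in range(2, n + 1):
        if phi[p] == p:  # p is prime
            for k in range(p, n + 1, p):
                phi[k] -= phi[k] // p
    return phi

def V(x):
    """#{m <= x : m = phi(n) for some n}.  Since phi(n) >= sqrt(n/2),
    every totient value <= x already occurs for some n <= 2*x**2."""
    phi = totients_up_to(2 * x * x)
    return len({v for v in phi[1:] if v <= x})
```

For instance, $V(10) = 6$, since the totient values up to 10 are exactly $1, 2, 4, 6, 8, 10$ (every odd number $> 1$ is a non-totient).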
Indeed, if $m = \varphi(n)\le x$ is counted by $V(x)$ but not by $V_1$, $V_2$ or $V_3$, then $\omega(m)\le 10\log\log x$ and there are $> R/2$ primes in $\mathcal{P}$ that divide $n$. So, if we write $m = \prod_{i=1}^{r} p_i^{a_i} = \prod_{p^b\| n} p^{b-1}(p-1)$, where $p_1,\dots,p_r$ are the distinct prime factors of $m$, then
$$\Omega(m)-\omega(m) \ge \sum_{\substack{p\mid n\\ p\in\mathcal{P}}}\Omega(p-1) - 10\log\log x \ge \frac{R}{2}\cdot\frac{100}{N} - 10\log\log x = 40\log\log x.$$
Consequently,
$$\prod_{i=1}^{r} p_i^{\lfloor a_i/2\rfloor} \ge \prod_{i=1}^{r} 2^{(a_i-1)/2} = 2^{(\Omega(m)-\omega(m))/2} \ge 2^{20\log\log x} > (\log x)^{10},$$
that is to say, $m$ is divisible by a square $d^2$ with $d > (\log x)^{10}$, and thus it is counted by $V_4$.
As we mentioned above, we expect that in the right-hand side of (4.2.2) the main contribution comes from $V_1$, whereas $V_2$, $V_3$ and $V_4$ are error terms. We start by estimating $V_4$, which is the easiest. We have that
$$V_4 \le \sum_{d>(\log x)^{10}} \#\{m\le x : d^2\mid m\} \le \sum_{d>(\log x)^{10}} \frac{x}{d^2} \ll \frac{x}{(\log x)^{10}},$$
which is admissible. Next, Theorem 2.1.2 and Stirling's formula imply that
$$V_3 \le \sum_{r>10\log\log x}\pi_r(x) \le \sum_{r>10\log\log x}\frac{Ax}{\log x}\cdot\frac{(\log\log x+B)^{r-1}}{(r-1)!} \ll \frac{x}{\log x}\cdot\frac{(\log\log x+B)^{\lfloor 10\log\log x\rfloor}}{\lfloor 10\log\log x\rfloor!}$$
$$\ll \frac{x}{\log x}\left(\frac{e}{10}\right)^{10\log\log x} = \frac{x}{(\log x)^{1+10\log(10/e)}} \ll \frac{x}{(\log x)^{14}},$$
which is also admissible. Similarly, we have that
$$V_1 \le \sum_{r\ge R}\pi_r(x_1) \ll \sum_{r\ge R}\frac{Ax_1}{\log x_1}\cdot\frac{(\log\log x+B)^{r-1}}{(r-1)!} \ll \frac{x_1}{\log x_1}\cdot\frac{(\log\log x+B)^{\lfloor R\rfloor-1}}{(\lfloor R\rfloor-1)!} \ll \frac{x_1}{(\log x)^{N\log N - N + 1}},$$
which is admissible, provided that $N$ is large enough. Finally, we estimate $V_2$: note that if $n$ is counted by $V_2$, then it is possible to write $n = ab$, where $a$ has exactly $S = \lfloor R/2\rfloor$ prime factors, all of which are in $\mathcal{P}$. Call $\mathcal{A}$ the set of such numbers $a$. Then
$$V_2 \le \sum_{a\in\mathcal{A}} \#\{n\le x_1 : a\mid n\} \le \sum_{a\in\mathcal{A}}\frac{x}{a} \le x\sum_{\substack{p_1<\cdots<p_S\\ p_i\in\mathcal{P}}}\frac{1}{p_1\cdots p_S} \le \frac{x}{S!}\left(\sum_{p\in\mathcal{P}}\frac{1}{p}\right)^{S}.$$
So we deduce that
$$V_2 \ll \frac{x\,e^{O_N(S)}}{S!} \ll \frac{x\,e^{O_N(S)}}{(S/e)^{S}} \ll_N \frac{x}{(\log x)^{2}},$$
which is admissible. This completes the proof of the theorem.
Titchmarsh first studied this sum when $k = 2$ and evaluated it asymptotically under the assumption of GRH. Subsequently, Linnik removed this assumption, so that the following result now holds unconditionally:
Proof. We shall present a proof due to Rodriguez [Ro] based on the Bombieri-Vinogradov theorem (i.e. Theorem 4.0.4). We have that
$$T = \sum_{p\le x}\tau(p+s) = \sum_{p\le x}\sum_{ab=p+s} 1 = \sum_{p\le x}\left(2\sum_{\substack{a\mid p+s\\ a<\sqrt{p+s}}} 1 + \mathbf{1}_{p+s=\square}\right) = 2\sum_{p\le x}\sum_{\substack{a\mid p+s\\ a<\sqrt{p+s}}} 1 + O(\sqrt{x})$$
$$= 2\sum_{a<\sqrt{x+s}}\left(\pi(x;a,-s)-\pi(a^2-s;a,-s)\right) + O(\sqrt{x}).$$
4.3. THE TITCHMARSCH-LINNIK DIVISOR PROBLEM 51
So we deduce that
$$T = 2\sum_{\substack{a<\sqrt{x+s}\\ (a,s)=1}}\left(\pi(x;a,-s)-\pi(a^2-s;a,-s)\right) + O(\sqrt{x}).$$
Next, the Bombieri-Vinogradov Theorem with $A = 1$ implies that there is some absolute constant $B > 0$ such that, with $Q := \sqrt{x}/(\log x)^{B}$,
$$\sum_{a\le Q}\left|\pi(x;a,-s)-\frac{\operatorname{li}(x)}{\varphi(a)}\right| \ll \frac{x}{\log x},$$
and consequently
$$\sum_{\substack{a\le Q\\ (a,s)=1}}\frac{1}{\varphi(a)} = \prod_{p\nmid s}\left(1+\frac{1}{p(p-1)}\right)\prod_{p\mid s}\left(1-\frac{1}{p}\right)\cdot\left(\frac{\log x}{2}+O(\log\log x)\right).$$
Inserting this estimate into (4.3.1), and noting that $\operatorname{li}(x) = x/\log x + O(x/\log^2 x)$, we deduce that
$$T = x\prod_{p\nmid s}\left(1+\frac{1}{p(p-1)}\right)\prod_{p\mid s}\left(1-\frac{1}{p}\right) + 2\sum_{\substack{Q<a\le\sqrt{x+s}\\ (a,s)=1}}\pi(x;a,-s) + O\!\left(\frac{x\log\log x}{\log x}\right).$$
The reason that this sum is so small is that $\log a$ lies in a short interval: $\log a$ is of size $\asymp\log x$ and it is restricted to an interval of length $\ll\log\log x$. To see (4.3.2), we apply the Brun-Titchmarsh inequality and Exercise 0.2.9 to get that
$$\sum_{\substack{Q<a\le\sqrt{x+s}\\ (a,s)=1}}\pi(x;a,-s) \ll \sum_{\substack{Q<a\le\sqrt{x+s}\\ (a,s)=1}}\frac{x}{\varphi(a)\log(x/a)} \ll \frac{x}{\log x}\sum_{Q<a\le 2\sqrt{x}}\frac{1}{\varphi(a)}$$
$$\ll \frac{x}{\log x}\left(\log(2\sqrt{x})-\log Q+1\right) \ll \frac{x\log\log x}{\log x},$$
where the sum over $a$ was estimated by splitting it into $\ll\log\log x$ dyadic blocks $2^m<a\le 2^{m+1}$ and using the bound $\sum_{T<a\le 2T} 1/\varphi(a)\ll 1$. This completes the proof of (4.3.2) and hence of the theorem.
Exercise 4.3.2.
(a) Show that, for $1\le |s|\le x$ and $x\ge 3$, we have that
$$\sum_{p\le x}\tau_3(p+s) \ll x\log x\prod_{p\mid s}\left(1-\frac{1}{p}\right)^{2}.$$
(b)* Let $1\le |s|\le x$ and $x\ge 3$. Show that under the Elliott-Halberstam conjecture we have that
$$\sum_{p\le x}\tau_3(p+s) = C(s)\,x\log x + O(x),$$
Chapter 5

Selberg's sieve
Note that this immediately provides an upper bound sieve: expanding the square, we find that
$$\mathbf{1}_{(n,P(z))=1} \le \left(\sum_{d\mid (n,P(z))}\lambda_d\right)^{2} = \sum_{d\mid (n,P(z))}\lambda^{+}(d), \qquad\text{where}\qquad \lambda^{+}(d) = \sum_{\substack{d_1,d_2\in\mathbb{N}\\ [d_1,d_2]=d}}\lambda_{d_1}\lambda_{d_2}.$$
In particular, for any finite set of integers $A$ satisfying (A1), we have that
$$S(A,z) \le \sum_{d_1,d_2\mid P(z)}\lambda_{d_1}\lambda_{d_2}\,|A_{[d_1,d_2]}| = X\sum_{d_1,d_2\mid P(z)}\lambda_{d_1}\lambda_{d_2}\,g([d_1,d_2]) + \sum_{d_1,d_2\mid P(z)}\lambda_{d_1}\lambda_{d_2}\,r_{[d_1,d_2]}$$
$$\le X\sum_{d_1,d_2\mid P(z)}\lambda_{d_1}\lambda_{d_2}\,g([d_1,d_2]) + \sum_{d\mid P(z)}|\lambda^{+}(d)\,r_{d}| =: X\,G + R.$$
This inequality turns upper bounds for $S(A,z)$ into an optimization problem: we need to choose the parameters $\lambda_d$ so that $XG + R$ is minimized. It turns out that this problem is too hard. Instead, we optimize only the main term $XG$. In order to have some control on the error term $R$, we impose the additional constraint that $\lambda_d = 0$ for $d > \sqrt{D}$, so that $\lambda^{+}$ is supported on integers $d\le D$.
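The fact that any real weights $\lambda_d$ with $\lambda_1 = 1$ already give an upper bound can be checked numerically. The following sketch (our own toy illustration, not the optimal choice derived below) uses truncated Möbius weights $\lambda_d = \mu(d)$ for $d\le 6$ and $z = 7$:

```python
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

# Toy weights lambda_d = mu(d) for d <= 6; any real weights with
# lambda_1 = 1 give an upper bound, since the squared inner sum is
# >= 1 when (n, P(z)) = 1 and >= 0 otherwise.
lam = {1: 1, 2: -1, 3: -1, 5: -1, 6: 1}

x = 10**4
P = 2 * 3 * 5  # P(z) with z = 7

# exact count of n <= x with (n, P(z)) = 1
exact = sum(1 for n in range(1, x + 1) if gcd(n, P) == 1)

# expanding the square: sum_n (sum_{d | (n,P)} lam_d)^2
#   = sum_{d1,d2} lam_{d1} * lam_{d2} * #{n <= x : [d1,d2] | n}
bound = sum(
    lam[d1] * lam[d2] * (x // lcm(d1, d2)) for d1 in lam for d2 in lam
)
```

For this toy choice the main term is roughly $0.4x$, compared with the true density $\prod_{p<7}(1-1/p) = 4/15 \approx 0.267$ — the gap is exactly what optimizing the $\lambda_d$'s, as the text does next, is designed to close.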
Now we turn to the problem of optimizing the value of the bilinear form
$$G = \sum_{\substack{d_i\mid P(z),\ d_i\le\sqrt{D}\\ i\in\{1,2\}}}\lambda_{d_1}\lambda_{d_2}\,g([d_1,d_2]),$$
under the restriction that $\lambda_1 = 1$. For any prime $p$, if $p^{\nu_1}\| d_1$ and $p^{\nu_2}\| d_2$, then $p^{\max\{\nu_1,\nu_2\}}\|[d_1,d_2]$ and $p^{\min\{\nu_1,\nu_2\}}\|(d_1,d_2)$. This implies that
$$(5.1.1)\qquad g((d_1,d_2))\,g([d_1,d_2]) = g(d_1)\,g(d_2).$$
Let $\mathcal{P} = \{p\ \text{prime} : g(p)\ne 0\}$ and $\mathcal{P}_z = \mathcal{P}\cap(1,z)$. Note that in optimizing $G$ we may assume that the $\lambda_d$'s are supported in
$$\mathcal{D}_z = \Big\{d\le\sqrt{D} : d\ \Big|\prod_{p\in\mathcal{P}_z} p\Big\}.$$
Moreover, we assume that (A2) holds. From the above discussion, we have that
$$G = \sum_{d_1,d_2\in\mathcal{D}_z}\lambda_{d_1}\lambda_{d_2}\,\frac{g(d_1)\,g(d_2)}{g((d_1,d_2))}.$$
Next, let $h(n) = \prod_{p\mid n} g(p)/(1-g(p)) > 0$, so that $1/g(n) = (1*(1/h))(n)$ whenever $n$ is a square-free integer for which $g(n)\ne 0$. Then
$$G = \sum_{d_1,d_2\in\mathcal{D}_z}\lambda_{d_1}\lambda_{d_2}\,g(d_1)g(d_2)\sum_{m\mid(d_1,d_2)}\frac{1}{h(m)} = \sum_{m\in\mathcal{D}_z}\frac{1}{h(m)}\left(\sum_{\substack{d\in\mathcal{D}_z\\ d\equiv 0\ (\mathrm{mod}\ m)}}\lambda_d\,g(d)\right)^{2}.$$
This proves that there is a one-to-one correspondence between the variables $\lambda_d$ and the variables $y_m := \sum_{d\in\mathcal{D}_z,\ d\equiv 0\ (\mathrm{mod}\ m)}\lambda_d\,g(d)$. In particular, the constraint $\lambda_1 = 1$ becomes
$$(5.1.3)\qquad \sum_{m\in\mathcal{D}_z}\mu(m)\,y_m = 1.$$
Additionally, for the choice $y_m = c\,\mu(m)h(m)$ with $c = \big(\sum_{m\in\mathcal{D}_z}h(m)\big)^{-1}$, which satisfies (5.1.3),
$$(5.1.5)\qquad G = \sum_{m\in\mathcal{D}_z}\frac{1}{h(m)}\,c^2\mu^2(m)h^2(m) = c^2\sum_{m\in\mathcal{D}_z}h(m) = \left(\sum_{m\in\mathcal{D}_z}h(m)\right)^{-1}.$$
Theorem 5.1.1. Let $A$ be a finite set of integers satisfying (A1) and (A2) and let $D$ and $z$ be positive real numbers. If $h(n) = \prod_{p\mid n} g(p)/(1-g(p))$, then
$$S(A,z) \le X\left(\sum_{\substack{m\le\sqrt{D}\\ m\mid P(z)}}h(m)\right)^{-1} + \sum_{\substack{d\le D\\ d\mid P(z)}}\tau_3(d)\,|r_d|.$$
Remark 5.1.2. Note that we no longer need to keep track of the fact that the first sum
above runs over integers m with g(m) 6= 0, as this is captured by the vanishing of h(m)
whenever g(m) = 0.
As we will see in the next section, this theorem is essentially as strong as the upper bound implicit in Theorem 3.3.1. For now, let us point out that
$$\sum_{\substack{m\le\sqrt{D}\\ m\mid P(z)}}h(m) \le \prod_{p<z}(1+h(p)) = \prod_{p<z}\frac{1}{1-g(p)} = \frac{1}{V(z)}.$$
So the upper bound provided by Theorem 5.1.1 is always at least as big as $X\,V(z)$.
Remark 5.1.3. As in the combinatorial sieve, we need control of the error terms on average. We may then impose the condition that
$$(\mathrm{R}')\qquad \sum_{\substack{d\le D\\ d\mid P(z)}}\tau_3(d)\,|r_d| \le \frac{C_3 X}{(\log X)^{B}}.$$
This can be related to condition (R) under the assumption of a crude bound on $r_d$. Indeed, given such a bound, the Cauchy-Schwarz inequality yields
$$\sum_{\substack{d\le D\\ d\mid P(z)}}\tau_3(d)|r_d| \le \left(\sum_{\substack{d\le D\\ d\mid P(z)}}\tau_3(d)^2|r_d|\right)^{1/2}\left(\sum_{\substack{d\le D\\ d\mid P(z)}}|r_d|\right)^{1/2} \le \left(C_4K^9\left(\frac{\log X}{\log 2}\right)^{9}X\cdot\frac{C_2X}{(\log X)^{A}}\right)^{1/2},$$
and hence $(\mathrm{R}')$ holds with $C_3 = (C_2C_4)^{1/2}(K/\log 2)^{9/2}$ and $B = (A-9)/2$.
Theorem 5.2.1 (Fundamental Lemma of Sieve Methods, IV). Let $A$ be a finite set of integers which satisfies (A1) and (A4b), for some $\kappa > 0$ and $C_1\ge 0$. For $z\ge 1$ and $u > 0$, we have that
$$S(A,z) = X\,V(z)\left(1+O_{\kappa,C_1}\!\left(u^{-u/2}\right)\right) + O\!\left(\sum_{\substack{d\mid P(z)\\ d\le z^u}}\tau_3(d)\,|r_d|\right).$$
Proof. Without loss of generality, we may assume that $u$ is large enough. First, we show the upper bound. We claim that
$$(5.2.1)\qquad \sum_{\substack{m\le z^{u/2}\\ m\mid P(z)}}h(m) = \frac{1+O\!\left(u^{-(u+1)/2}\right)}{V(z)}.$$
Together with Theorem 5.1.1, relation (5.2.1) certainly implies the desired upper bound. Since
$$\sum_{m\mid P(z)}h(m) = \prod_{p<z}(1+h(p)) = \prod_{p<z}\frac{1}{1-g(p)} = \frac{1}{V(z)},$$
Note that $P(z)\le e^{O(z)}$, so we may assume that $u\le z/\log z$. We shall show (5.2.1) using
Rankin's trick, arguing as in Theorem 0.3.3. For every $\epsilon\in[1/\log z, 1]$, we have that
$$(5.2.3)\qquad \sum_{\substack{m\mid P(z)\\ m>z^{u/2}}}h(m) \le z^{-\epsilon u/2}\sum_{m\mid P(z)}h(m)\,m^{\epsilon} = z^{-\epsilon u/2}\prod_{p<z}(1+h(p)p^{\epsilon})$$
$$= \frac{z^{-\epsilon u/2}}{V(z)}\prod_{p<z}\frac{1+h(p)p^{\epsilon}}{1+h(p)} = \frac{z^{-\epsilon u/2}}{V(z)}\prod_{p<z}\left(1+g(p)(p^{\epsilon}-1)\right),$$
by (A4b). Moreover, note that for every integer $r$ with $2\le r\le \epsilon\log z+1$, we have that
$$\log\prod_{e^{(r-1)/\epsilon}\le p<e^{r/\epsilon}}\left(1+g(p)(p^{\epsilon}-1)\right) \le e^{r}\sum_{e^{(r-1)/\epsilon}\le p<e^{r/\epsilon}}g(p) \le e^{r}\log\frac{V(e^{(r-1)/\epsilon})}{V(e^{r/\epsilon})}$$
$$\le e^{r}\left(\log\frac{r}{r-1}+\frac{C_1\epsilon}{r-1}\right) \ll_{C_1}\frac{e^{r}}{r},$$
by (A4b), and consequently
$$\log\prod_{e^{1/\epsilon}\le p<z}\left(1+g(p)(p^{\epsilon}-1)\right) \ll_{C_1}\frac{z^{\epsilon}}{\epsilon\log z},$$
for every $\epsilon\in[1/\log z, 1]$. We set $\epsilon = w/\log z$ and choose $w\ge 1$ so that $e^{w-1}/w = u$. Then $w = \log u + \log\log u + O(1)$ and thus
$$\frac{\epsilon u\log z}{2} + O_{C_1}\!\left(\frac{z^{\epsilon}}{\epsilon\log z}\right) = \frac{uw}{2} + O_{C_1}\!\left(\frac{e^{w}}{w}\right) = \frac{u\log u + u\log\log u}{2} + O_{C_1}(u) \ge \frac{(u+1)\log u}{2} + O_{C_1}(1),$$
5.2. THE FUNDAMENTAL LEMMA: ENCORE 61
which completes the proof of (5.2.2) and hence of the desired upper bound.
Finally, for the lower bound, we start with Buchstab's identity (3.2.1):
$$(5.2.7)\qquad S(A,z) = |A| - \sum_{p<z}S(A_p,p),$$
which allows us to turn the task of finding a lower bound for $S(A,z)$ into finding upper bounds for $S(A_p,p)$, for $p<z$, which will be accomplished by an appeal to (5.2.2). We remark here that in the proof of (5.2.2) we only needed (A1) for $d\mid P(z)$ and (A4b) with $3/2\le w\le w_0\le z$. Note that for $d\mid P(p)$, we have that
which shows that (A1) holds with $Xg(p)$ and $r_{dp}$ in place of $X$ and $r_d$, respectively. Moreover, relation (A4b) is clearly satisfied for $3/2\le w\le w_0\le p$ with the same parameters $\kappa$ and $C_1$. Therefore, relation (5.2.2) with $A_p$, $p$ and $w_p$ in place of $A$, $z$ and $u$, respectively, implies that
$$S(A_p,p) \le \left(1+O_{C_1}\!\left(w_p^{-(w_p+1)/2}\right)\right)g(p)\,X\prod_{p'<p}(1-g(p')) + O\!\left(\sum_{\substack{d\le p^{w_p}\\ d\mid P(p)}}\tau_3(d)\,|r_{dp}|\right).$$
We pick $w_p$ so that $p^{w_p} = z^u/p$, that is to say, $w_p = u\log z/\log p - 1$, so that if we set $h = u(\log z/\log p - 1)$, then
$$(w_p+1)\log w_p = O(1)+(w_p+1)\log(w_p+1) = O(1)+(u+h)\log(u+h)$$
$$\ge O(1)+u\log u+h \ge O(1)+u\log u+\frac{\log z}{\log p},$$
by the Mean Value Theorem, so that $w_p^{-(w_p+1)/2} \ll u^{-u/2}\,e^{-\log z/\log p}$.
So we conclude that
$$S(A_p,p) \le \left(1+O_{C_1}\!\left(u^{-u/2}e^{-\log z/\log p}\right)\right)g(p)\,X\prod_{p'<p}(1-g(p')) + O\!\left(\sum_{\substack{d\le z^u/p\\ d\mid P(p)}}\tau_3(d)\,|r_{dp}|\right).$$
Inserting this inequality into (5.2.7) and using (A1) to write $|A|$ in terms of $g$, $X$ and $r_1$, we find that
$$(5.2.9)\qquad S(A,z) \ge X\left(1-\sum_{p<z}g(p)V(p)\right) - O_{C_1}\!\left(\frac{X}{u^{u/2}}\sum_{p<z}\frac{g(p)V(p)}{e^{\log z/\log p}}\right) + O\!\left(\sum_{\substack{m\le z^u\\ m\mid P(z)}}\tau_3(m)\,|r_m|\right),$$
since each $m\mid P(z)$ with $m\le z^u$ can be written uniquely as $m = dp$ with $d\mid P(p)$, $p<z$ and $d\le z^u/p$. Finally, observe that
$$(5.2.10)\qquad 1-\sum_{p<z}g(p)V(p) = V(z)$$
and that
$$(5.2.11)\qquad \sum_{p<z}\frac{g(p)V(p)}{e^{\log z/\log p}} \ll_{C_1} V(z)\sum_{p<z}g(p)\,\frac{\log z}{\log p}\,e^{-\log z/\log p} \ll_{C_1}\frac{V(z)}{\log z}\sum_{p<z}g(p)\log p \ll_{C_1} V(z),$$
by the argument leading to (5.2.6). Combining relations (5.2.9), (5.2.10) and (5.2.11) completes the proof of the desired lower bound as well.
5.3 Applications
Our goal here is to obtain upper bounds on S(A, z) that are as tight as possible for z as
large as possible. Our starting point is Theorem 5.1.1 with $D = z^2$, which implies that
$$(5.3.1)\qquad S(A,z) \le X\left(\sum_{m<z}\mu^2(m)h(m)\right)^{-1} + \sum_{d\le z^2}\tau_3(d)\,|r_d|.$$
This reduces upper bounds on $S(A,z)$ to lower bounds on the partial sums of $\mu^2(m)h(m)$. Theorem 0.4.1 then handles these averages.
Theorem 5.3.1. Let $A$ be a finite set of integers which satisfies (A1) and (A3$'$) for some $\kappa > 0$, $C_6\ge 0$ and $L\ge 1$. Furthermore, assume that $g(p)\le C_5/p$ for all primes $p$, and that $(\mathrm{R}')$ holds with $B = \kappa+1$ and $D = X^{\delta}$, for some $\delta\in(0,1)$. For $z\ge e^{L/\delta}$, we have that
$$S(A,z) \le \frac{\Gamma(\kappa+1)}{(\delta/2)^{\kappa}}\cdot\frac{\mathfrak{S}(A)\,X}{(\log X)^{\kappa}}\left(1+O_{\kappa,C_5,C_6}\!\left(\frac{L}{\log z}\right)\right),$$
where
$$\mathfrak{S}(A) = \prod_{p}(1-g(p))\left(1-\frac{1}{p}\right)^{-\kappa}.$$
Proof. This is a direct corollary of Theorems 5.1.1 and 0.4.1 (note that since g satisfies its
hypotheses, so does h).
Corollary 5.3.2. Let $F_1(x),\dots,F_r(x)$ be distinct irreducible polynomials in $\mathbb{Z}[x]$ with positive leading coefficients. Suppose that the polynomial $F = F_1\cdots F_r$ has no fixed prime divisors, i.e. there is no prime $p$ such that $p\mid F(n)$ for all integers $n$. Then we have that
$$\#\{n\le x : F_1(n),\dots,F_r(n)\ \text{are all primes}\} \le \frac{\mathfrak{S}(F)\,x}{(\log x)^{r}}\left(2^r\,r!+O_F\!\left(\frac{\log\log x}{\log x}\right)\right),$$
where
$$\mathfrak{S}(F) = \prod_{p}\left(1-\frac{\rho_F(p)}{p}\right)\left(1-\frac{1}{p}\right)^{-r},$$
and $\rho_F(m) = \#\{n\in\mathbb{Z}/m\mathbb{Z} : F(n)\equiv 0\ (\mathrm{mod}\ m)\}$.
Proof. Exercise.
With regard to the frequency with which tuples of polynomials are simultaneously prime numbers, Bateman and Horn conjectured that
Exercise 5.3.4. Using the ideas of Section 7.4, build a probabilistic model in favour of the
above conjecture.
where
$$c = 2\prod_{p>2}\left(1-\frac{2}{p}\right)\left(1-\frac{1}{p}\right)^{-2}$$
and
$$|r_d| \le \begin{cases} E(x;d) & \text{if } (d,2s)=1,\\ \omega(2s) & \text{otherwise},\end{cases}$$
where $E(x;d)$ is defined by (4.0.1). Therefore, if $D\le\sqrt{x}/(\log x)^{B}$ for some large enough $B$, then the Bombieri-Vinogradov theorem (Theorem 4.0.4) and Remark 5.1.3 imply that
$(\mathrm{R}')$ holds with $B = 3$. Consequently, Theorem 5.1.1 and relation (5.3.2) with $D = z^2 = \sqrt{x}/(\log x)^{B}$ yield that
$$(5.3.3)\qquad \#\{p\le x : p+2s\ \text{is prime}\} \le \frac{\operatorname{li}(x)}{S}+O\!\left(\frac{x}{(\log x)^{3}}\right),$$
where
$$S = \sum_{\substack{n\le\sqrt{D}\\ (n,s)=1}}\mu^2(n)h(n) \qquad\text{with}\qquad h(n) = \prod_{\substack{p\mid n\\ p>2}}\frac{1}{p-2}.$$
Note that
$$\prod_{p\mid s}(1+h(p))\sum_{\substack{n\le\sqrt{D}\\ (n,s)=1}}\mu^2(n)h(n) \ge \sum_{d\mid s}\mu^2(d)h(d)\sum_{\substack{n\le\sqrt{D}\\ (n,s)=1}}\mu^2(n)h(n) \ge \sum_{m\le\sqrt{D}}\mu^2(m)h(m).$$
Hence
$$S \ge \left(\frac{\log x}{4c}+O(\log\log x)\right)\prod_{p\mid s}(1+h(p))^{-1} = \left(\frac{\log x}{4c}+O(\log\log x)\right)\prod_{\substack{p\mid s\\ p>2}}\frac{p-2}{p-1}.$$
Since $\operatorname{li}(x) = x/\log x + O(x/\log^2 x)$ and $(p-1)/(p-2) > 1$, the theorem follows.
Proof. Exercise.
Finally, following an argument due to Selberg [Se91, p. 226-233], we show how keeping track of the special structure of the parameters $\lambda_d$ can yield an improved version of Theorem 5.3.6.
Proof. Clearly, we may assume that $y\ge cq$, for some large enough constant $c$. First, we show the theorem when $q = 1$. The general case will follow from this special case. We shall go through the proof of Theorem 5.1.1 again, because we need to use the structure of the $\lambda_d$'s in the error term too.
We start with the inequality
$$\#\{x-y<n\le x : (n,P(z))=1\} \le \sum_{x-y<n\le x}\left(\sum_{d\mid (n,P(z))}\lambda_d\right)^{2} = \sum_{d_1,d_2\mid P(z)}\lambda_{d_1}\lambda_{d_2}\left(\left\lfloor\frac{x}{[d_1,d_2]}\right\rfloor-\left\lfloor\frac{x-y}{[d_1,d_2]}\right\rfloor\right)$$
$$(5.3.5)\qquad \le y\sum_{d_1,d_2\mid P(z)}\frac{\lambda_{d_1}\lambda_{d_2}}{[d_1,d_2]} + \sum_{d_1,d_2\mid P(z)}|\lambda_{d_1}\lambda_{d_2}| = y\sum_{d_1,d_2\mid P(z)}\frac{\lambda_{d_1}\lambda_{d_2}}{[d_1,d_2]} + \left(\sum_{d\mid P(z)}|\lambda_d|\right)^{2}.$$
Let $D\ge 1$. Arguing as in the proof of Theorem 5.1.1 (with $g(d) = 1/d$ and $h(d) = 1/\varphi(d)$ for $d$ square-free), we find that if we let
$$\lambda_d = \mu(d)\,d\sum_{\substack{m\mid P(z),\ m\le\sqrt{D}\\ m\equiv 0\ (\mathrm{mod}\ d)}}\frac{1}{\varphi(m)}\Bigg/\sum_{\substack{m\mid P(z)\\ m\le\sqrt{D}}}\frac{1}{\varphi(m)},$$
then
$$(5.3.6)\qquad \sum_{d_1,d_2\mid P(z)}\frac{\lambda_{d_1}\lambda_{d_2}}{[d_1,d_2]} = \left(\sum_{\substack{m\mid P(z)\\ m\le\sqrt{D}}}\frac{1}{\varphi(m)}\right)^{-1} =: \frac{1}{S}.$$
Moreover,
$$(5.3.7)\qquad \sum_{d\mid P(z)}|\lambda_d| \le \frac{1}{S}\sum_{\substack{m\mid P(z)\\ m\le\sqrt{D}}}\frac{\sigma(m)}{\varphi(m)} \le \frac{1}{S}\sum_{m\le\sqrt{D}}\frac{\mu^2(m)\sigma(m)}{\varphi(m)} \le (1+o(1))\,\frac{\sqrt{D}}{S}\prod_{p}\left(1+\frac{p+1}{p(p-1)}\right)\left(1-\frac{1}{p}\right),$$
and the product equals $\prod_p(1+1/p^2) = \zeta(2)/\zeta(4)\le 1.52$.
So if $D$ is large enough, then inserting (5.3.6) and (5.3.7) into (5.3.5), we deduce that
$$\#\{x-y<n\le x : (n,P(z))=1\} \le \frac{y}{S}+\frac{2.31\,D}{S^{2}}.$$
Next, note that if $D = z^2$, then
$$S = \sum_{d\le z}\frac{\mu^2(d)}{\varphi(d)} = \log z + \gamma + \sum_{p}\frac{\log p}{p(p-1)} + O\!\left(\frac{\log z}{\sqrt{z}}\right) \ge \log z + 1.29$$
for $z$ large enough. Consequently, writing $z = \sqrt{y/t}$ for some $t > 0$, we find that
$$\#\{x-y<n\le x : (n,P(z))=1\} \le \frac{y}{\log z+1.29}+\frac{2.31\,z^2}{\log^2 z} = \frac{2y}{\log y-\log t+2.58}+\frac{4.62\cdot 2y}{t(\log y-\log t)^2}$$
$$= \frac{2y}{\log y}\left(1+\frac{\log t-2.58+4.62/t}{\log y}+O_t\!\left(\frac{1}{\log^2 y}\right)\right).$$
We choose $t = 4.62$ to conclude that
$$(5.3.8)\qquad \#\{x-y<n\le x : (n,P(\sqrt{y/4.62}))=1\} \le \frac{2y}{\log y+2.409} \qquad (y\ge c_1).$$
Inserting this estimate into (5.3.4) when $z = \sqrt{y/4.62}$ completes the proof of the theorem in the case when $q = 1$.
The case $q > 1$ now follows from the case $q = 1$: note that
$$(5.3.9)\qquad \pi(x;q,a)-\pi(x-y;q,a) \le \frac{z}{q}+1+\sum_{\substack{x-y<n\le x\\ n\equiv a\ (\mathrm{mod}\ q)\\ (n,P(z))=1}} 1.$$
5.3. APPLICATIONS 67
Write $P_q(z) = \prod_{p<z,\ p\nmid q} p$ and note that if $n\equiv a\ (\mathrm{mod}\ q)$, then $(n,P(z))=1$ if and only if $(n,P_q(z))=1$. Let $w\in[1,P_q(z)]$ be the multiplicative inverse of $q\ (\mathrm{mod}\ P_q(z))$, that is to say $wq\equiv 1\ (\mathrm{mod}\ P_q(z))$. Then if we write $n = a+kq$, we find that
Moreover, since $x-y<n\le x$, we have that $(x-a-y)/q<k\le(x-a)/q$. Thus if we set $m = k+aw$, then $x_1-y/q<m\le x_1$, where $x_1 = (x-a)/q+aw$, and consequently
$$\sum_{\substack{x-y<n\le x\\ n\equiv a\ (\mathrm{mod}\ q)\\ (n,P(z))=1}} 1 = \sum_{\substack{x_1-y/q<m\le x_1\\ (m,P_q(z))=1}} 1.$$
Now note that if $z = \sqrt{(y/q)/4.62}$, then
$$\varphi(q)\sum_{\substack{x_1-y/q<m\le x_1\\ (m,P_q(z))=1}} 1 = \sum_{\substack{x_1-y/q<m\le x_1\\ (m,P_q(z))=1}}\ \sum_{\substack{1\le j\le q\\ (m+jP_q(z),\,q)=1}} 1 = \sum_{j=1}^{q}\ \sum_{\substack{x_1-y/q<m\le x_1\\ (m+jP_q(z),\,P(z))=1}} 1 \le q\cdot\frac{2y/q}{\log(y/q)+2.409} \qquad (y\ge c_1 q),$$
by (5.3.8). So letting $z = \sqrt{(y/q)/4.62}$ in (5.3.9), we deduce that
$$\pi(x;q,a)-\pi(x-y;q,a) \le \frac{\sqrt{y/q}}{q}+1+\frac{2y}{\varphi(q)(\log(y/q)+2.409)} \qquad (y\ge c_1 q),$$
Remark 5.3.8. Using different methods, Montgomery and Vaughan [MV] proved that
$$\pi(x;q,a)-\pi(x-y;q,a) \le \frac{2y}{\varphi(q)\log(y/q)} \qquad (1\le q<y\le x,\ (a,q)=1).$$
Remark 5.3.9. Improving the constant 2 in the statement of Theorem 5.3.7 would have very important consequences. In particular, if there are positive constants $\delta$ and $L$ such that
$$\pi(x;q,a) \le (2-\delta)\,\frac{x}{\varphi(q)\log(x/q)} \qquad (q\ge 2,\ (a,q)=1,\ x\ge q^{L}),$$
then it is possible to show that there are no Landau-Siegel zeroes; that is to say, there is some constant $c_0$, depending at most on $L$ and $\delta$, such that the $L$-function $L(s,\chi)$ has no zeroes in $(1-c_0/\log q, 1)$, for every Dirichlet character $\chi\ (\mathrm{mod}\ q)$. However, Selberg's sieve alone cannot lead to such an improvement: as we will see in the next section, the constant 2 is, in general, best possible.
z. This was accomplished by taking the integers $n\in[1,x]$ and splitting them into two subsets of roughly the same size, according to the parity of $\Omega(n)$. This inability of sieve methods to distinguish between the elements of the same set that have an even or an odd number of prime factors is referred to as the parity problem in sieve methods, and it is a stumbling block in many important number-theoretic questions, such as the twin prime conjecture. As Friedlander and Iwaniec showed [FI98], it is possible to overcome this obstacle by adding an extra axiom to (A1), (A3) and $(\mathrm{R}')$, which eliminates examples such as the sets $A^{(0)}$ and $A^{(1)}$ defined above. This extra axiom guarantees that the characteristic function of $A$ does not correlate with the Möbius function; that is to say, we impose conditions of the form
$$\sum_{a\in A}\mu(a) = o(|A|) \qquad (|A|\to\infty)$$
Chapter 6

Smooth numbers
So far we have dealt with sieving problems where we try to estimate the size of a set $A$ after having removed all multiples of small primes from it, say of primes $< z$, in the hope of detecting prime numbers in $A$. However, another natural problem, which also occurs frequently, is to study the size of a set $A$ after having removed all multiples of large primes from it, say of primes $> y$. Such numbers are called $y$-smooth numbers¹, and their study has various important applications. We shall study the problem of counting $y$-smooth numbers in the simple case when $A = \{n\le x\}$. To this end, given real numbers $x$ and $y$, we define
Following a heuristic argument based on probabilistic grounds (see Section 2.2), one might guess that $\Psi(x,y)\approx x\,\frac{\log y}{\log x}$. However, as we will see, in reality the size of $\Psi(x,y)$ is much smaller.
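For small parameters, $\Psi(x,y)$ can be computed by brute force, which already shows it falling well below the naive guess $x\log y/\log x$; a minimal sketch (function name ours):

```python
def Psi(x, y):
    """Count of y-smooth n <= x (i.e. P^+(n) <= y), by trial division.
    Dividing by every d <= y (prime or not) is harmless, since the prime
    factors of a composite d have already been divided out."""
    count = 0
    for n in range(1, x + 1):
        m = n
        for d in range(2, y + 1):
            while m % d == 0:
                m //= d
        count += m == 1
    return count
```

For example, $\Psi(100,2) = 7$ (the powers of 2 up to 100, together with 1), while $\Psi(10^4, 10)$ is only a few hundred — far below the guess $10^4\cdot\log 10/\log 10^4 = 2500$.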
As one might expect, estimating $\Psi(x,y)$ becomes harder as $y$ becomes smaller in terms of $x$. But for large values of $y$, it is not terribly hard to estimate $\Psi(x,y)$. Indeed, observe that when $y\ge x$, we trivially have that
¹The term friable numbers, that is to say, numbers that can be easily broken into many little pieces, is also often used.
so there is not much to say in this case. Assume now that $y\in[\sqrt{x},x]$. Note that an integer $n\le x$ can have at most one prime divisor in $(\sqrt{x},x]$. Therefore
$$(6.1.3)\qquad \Psi(x,y) = \Psi(x,\sqrt{x}) - \sum_{y<p\le\sqrt{x}}\#\{m\le x/p : P^{+}(m)\le p\} = \Psi(x,\sqrt{x}) - \sum_{y<p\le\sqrt{x}}\Psi(x/p,p).$$
Now, note that $\sqrt{x/p}<p\le x/p$ for all $p\in(y,\sqrt{x}]\subseteq(x^{1/3},x^{1/2}]$. So, we may use (6.1.2) to estimate all terms appearing in (6.1.3). Hence we arrive at the estimate
$$(6.1.4)\qquad \Psi(x,y) = x(1-\log 2)+O\!\left(\frac{x}{\log x}\right) - \sum_{y<p\le\sqrt{x}}\left(\frac{x}{p}\left(1-\log\frac{\log(x/p)}{\log p}\right)+O\!\left(\frac{x}{p\log(x/p)}\right)\right)$$
$$= x(1-\log 2) - \sum_{y<p\le\sqrt{x}}\frac{x}{p}\left(1-\log\frac{\log(x/p)}{\log p}\right)+O\!\left(\frac{x}{\log x}\right)$$
$$= x(1-\log 2) - \int_{y}^{\sqrt{x}}\left(1-\log\frac{\log(x/t)}{\log t}\right)\frac{x}{t\log t}\,dt + O\!\left(\frac{x}{\log x}\right),$$
by Mertens' estimate and partial summation, which is an asymptotic formula for $\Psi(x,y)$ for $y\in[x^{1/3},x^{1/2})$.
Relations (6.1.2) and (6.1.4) suggest introducing the parameter
$$u = \frac{\log x}{\log y}.$$
6.1. ITERATIVE ARGUMENTS AND INTEGRAL-DELAY EQUATIONS 73
Indeed, if we set
$$(6.1.5)\qquad \rho(u) = \begin{cases} 1 & \text{if } 0\le u<1,\\ 1-\log u & \text{if } 1\le u\le 2,\end{cases}$$
which can be derived using the argument leading to (6.1.3). So it should come as no surprise that $\rho(u) = \lim_{x\to\infty}\Psi(x,x^{1/u})/x$. Indeed, this is a consequence of the next theorem:
On the other hand, the left-hand side of the above identity is equal to
$$\int_{1}^{u}w\,\rho'(w)\,dw = u\rho(u)-\rho(1)-\int_{1}^{u}\rho(w)\,dw = u\rho(u)-\int_{0}^{u}\rho(t)\,dt.$$
This formula is particularly useful in the study of the Dickman-de Bruijn function. For example, using this formula, it is not very hard to show that
$$(6.1.12)\qquad 0<\rho(u)\le\frac{1}{\Gamma(u+1)} \qquad (u\ge 0).$$
To see the lower bound, we argue by contradiction: let $u_0$ be the smallest $u\ge 0$ with $\rho(u)\le 0$. Then we necessarily have that $u_0 > 1$, since $\rho(u) = 1$ for $u\in[0,1]$. So (6.1.11) implies that
$$0\ge\rho(u_0) = \frac{1}{u_0}\int_{u_0-1}^{u_0}\rho(t)\,dt > 0,$$
which is a contradiction. This proves that the lower bound in (6.1.12) does hold. Now, the upper bound follows by (6.1.11) and induction: indeed, the lower bound and (6.1.10) imply that $\rho$ is a decreasing function. Therefore (6.1.11) yields that $u\rho(u)\le\rho(u-1)$, and the inequality $\rho(u)\le 1/\Gamma(u+1)$ follows by inducting on $\lfloor u\rfloor$ and the fact that when $u\in[0,1]$, we have that $\rho(u) = 1\le 1/\Gamma(u+1)$.
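The integral-delay equation $u\rho(u) = \int_{u-1}^{u}\rho(t)\,dt$, together with $\rho = 1$ on $[0,1]$, determines $\rho$ and is easy to solve numerically; a minimal sketch (function names ours):

```python
import math

def dickman_grid(u_max, steps_per_unit=1000):
    """Tabulate rho on a grid of spacing h = 1/steps_per_unit, using
    rho = 1 on [0, 1] and the delay equation u * rho'(u) = -rho(u - 1),
    which is equivalent to u * rho(u) = integral of rho over [u-1, u]."""
    N = steps_per_unit
    h = 1.0 / N
    K = int(u_max * N)
    rho = [1.0] * (K + 1)
    for k in range(N + 1, K + 1):
        # trapezoidal step for rho'(u) = -rho(u - 1) / u
        d1 = rho[k - 1 - N] / ((k - 1) * h)
        d2 = rho[k - N] / (k * h)
        rho[k] = rho[k - 1] - h * (d1 + d2) / 2
    return rho

rho = dickman_grid(3.0)
```

The tabulated values reproduce the closed form on $[1,2]$, e.g. $\rho(2) = 1-\log 2\approx 0.3069$, and the rapid decay beyond ($\rho(3)\approx 0.0486$).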
The following theorem gives a more accurate estimate on the rate of decay of $\rho$.
Proof. First, we show the lower bound, which is simpler. Define $\xi(u)$ by $e^{\xi(u)} = u\,\xi(u)+1$. Clearly, $\xi$ is an increasing function and $\xi(1) = 0$. Moreover, since $e^{\xi(u)} = u\xi(u)+1$, we find that $\xi(u)\asymp\log u$. Inserting this estimate into (6.1.13), we see that it suffices to prove (6.1.15). Suppose that (6.1.15) fails, and let $u_0$ be the smallest counterexample to it. By choosing $c$ small enough, we may assume that $u_0\ge 2$. Then
$$\frac{c}{e^{u_0\xi(u_0)}} > \rho(u_0) = \frac{1}{u_0}\int_{u_0-1}^{u_0}\rho(t)\,dt \ge \frac{1}{u_0}\int_{u_0-1}^{u_0}\frac{c\,dt}{e^{t\,\xi(u_0)}} = \frac{c}{u_0\,\xi(u_0)}\left(\frac{1}{e^{(u_0-1)\xi(u_0)}}-\frac{1}{e^{u_0\xi(u_0)}}\right) = \frac{c}{e^{u_0\xi(u_0)}},$$
by the definition of $\xi(u)$, which is a contradiction. So relation (6.1.15) does hold, and the lower bound in our theorem follows.
For the upper bound we follow a similar argument: define $\eta(u)$ by $e^{\eta(u)+2} = u\,\eta(u)$. Then relation (6.1.14) also holds with $\eta$ in place of $\xi$. Moreover, arguing as above, we can show that
$$\rho(u) \le \frac{C}{e^{u\,\eta(u)}} \qquad (u\ge 1),$$
where $C$ is some large absolute constant. This completes the proof of the upper bound as well.
$$\frac{d}{ds}\big(s\,\hat{b}(s)\big) = e^{-s}\,\hat{b}(s).$$
Derive a formula for $\hat{b}$ (this formula might not be closed).
Proof of Theorem 6.1.1. We argue by induction. However, instead of using relation (6.1.8), we establish another iterative formula for $\Psi(x,y)$: using the identity $\log = \Lambda * 1$, we have that
$$\sum_{\substack{n\le x\\ P^+(n)\le y}}\log n = \sum_{\substack{n\le x\\ P^+(n)\le y}}\sum_{d\mid n}\Lambda(d) = \sum_{\substack{d\le x\\ P^+(d)\le y}}\Lambda(d)\,\Psi(x/d,y)$$
$$= \sum_{p\le y}(\log p)\,\Psi(x/p,y)+O\!\left(\sum_{p\le y,\ \nu\ge 2}\frac{x\log p}{p^{\nu}}\right) = \sum_{p\le y}(\log p)\,\Psi(x/p,y)+O(x).$$
It is not hard to show that a similar formula also holds for Dickman's function: indeed, by Mertens' theorem and partial summation, we find that
$$\sum_{p\le y}\frac{\log p}{p}\,\rho\!\left(\frac{\log(x/p)}{\log y}\right) = \int_{1}^{y}\rho\!\left(\frac{\log(x/t)}{\log y}\right)\frac{dt}{t}+O\!\left(\int_{1}^{y}\left|\rho'\!\left(\frac{\log(x/t)}{\log y}\right)\right|\frac{dt}{t\log y}\right)$$
$$= (\log y)\int_{u-1}^{u}\rho(w)\,dw + O\!\left(\int_{u-1}^{u}|\rho'(w)|\,dw\right).$$
The first integral is equal to $u\rho(u)$, by (6.1.11), and the second one is equal to $\rho(u-1)-\rho(u)\le 1$, since $\rho'(w)\le 0$ for all $w\ge 1$, by (6.1.10) and (6.1.12). So we conclude that
$$(6.1.17)\qquad (\log x)\,\rho(u) = \sum_{p\le y}\frac{\log p}{p}\,\rho\!\left(\frac{\log(x/p)}{\log y}\right)+O(1),$$
for all $x\ge y\ge 2$.
We are now ready to prove the theorem. Fix $y\ge 2$. We need to prove that there is some absolute constant $C$ such that
$$(6.1.19)\qquad E(x,y) := \left|\frac{\Psi(x,y)}{x}-\rho\!\left(\frac{\log x}{\log y}\right)\right| \le \frac{C}{\log y},$$
for all $x\ge y$. We may assume that $y\ge e^{C/2}$; else, relation (6.1.19) holds trivially for all $x\ge 2$. Moreover, (6.1.19) holds when $x\in[y,y^2]$, by the discussion in the beginning of this chapter. In order to establish it for larger $x$, we argue inductively: assume that (6.1.19) holds for all $x\in[y,2^m y^2]$, for some $m\ge 0$, and consider $x\in(2^m y^2, 2^{m+1}y^2]$. Write $x = y^u$ and note that $y\le x/p\le 2^m y^2$ for every prime $p\in[2,y]$, so the induction hypothesis and (6.1.18) imply that
$$(\log x)E(x,y) \le \sum_{p\le y}\frac{\log p}{p}\cdot\frac{C}{\log y}+O(1) = \frac{C(\log y+O(1))}{\log y}+O(1) = C+O(1),$$
by Mertens' theorem and our assumption that $y\ge e^{C/2}$. Since $x\ge y^2$, we deduce that
$$E(x,y) \le \frac{C}{\log x}+O\!\left(\frac{1}{\log x}\right) \le \frac{1}{2}\cdot\frac{C}{\log y}+O\!\left(\frac{1}{\log y}\right) \le \frac{C}{\log y},$$
provided that $C$ is large enough, which we may assume. This completes the inductive step and hence the proof of the theorem.
6.2 Rankin's method: encore

Theorem 6.2.1. Let $x\ge y\ge 3$ and set $x = y^u$. If $y\ge(\log x)^{3}$, then we have that
$$\Psi(x,y) \ll x\,\frac{e^{O(u)}}{(u\log u)^{u}}.$$
Proof. Without loss of generality, we may assume that $u$ and $x$ are large enough. As in the proof of Theorem 6.1.1, we start with the formula
$$(6.2.1)\qquad \sum_{\substack{n\le x\\ P^+(n)\le y}}\log n = \sum_{\substack{m\le x\\ P^+(m)\le y}}\ \sum_{\substack{d\le x/m\\ P^+(d)\le y}}\Lambda(d).$$
Now, fix some $\epsilon\in[1/\log y, 1/3]$, and note that for $1\le n\le x$ we have that
$$\log x = \log n+\log\frac{x}{n} \le \log n+\frac{1}{1-\epsilon}\cdot\frac{x^{1-\epsilon}}{n^{1-\epsilon}} \le \log n+\frac{3\,x^{1-\epsilon}}{2\,n^{1-\epsilon}},$$
by the inequality $\log t\le t$, for $t>0$. Together with (6.2.1), this implies that
$$(6.2.2)\qquad (\log x)\,\Psi(x,y) \le \frac{3\,x^{1-\epsilon}}{2}\sum_{\substack{n\le x\\ P^+(n)\le y}}\frac{1}{n^{1-\epsilon}} + \sum_{\substack{m\le x\\ P^+(m)\le y}}\ \sum_{\substack{d\le x/m\\ P^+(d)\le y}}\Lambda(d).$$
We bound this last sum as in the proof of Theorem 0.3.3 and choose $\epsilon = w/\log y$, where $e^{w-1}/w = u$, to complete the proof of the theorem.
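Rankin's trick as used here is easy to test numerically: for any $\sigma\in(0,1]$ one has $\Psi(x,y)\le\sum_{P^+(n)\le y}(x/n)^{\sigma} = x^{\sigma}\prod_{p\le y}(1-p^{-\sigma})^{-1}$, since every $y$-smooth $n\le x$ contributes a term $\ge 1$. A small sketch (function names ours):

```python
def Psi(x, y):
    """Brute-force count of y-smooth n <= x, by trial division."""
    count = 0
    for n in range(1, x + 1):
        m = n
        for d in range(2, y + 1):
            while m % d == 0:
                m //= d
        count += m == 1
    return count

def rankin_bound(x, y, sigma):
    """Rankin's trick: Psi(x, y) <= x**sigma * prod_{p<=y} 1/(1 - p**-sigma),
    valid for any sigma > 0 (the geometric series over prime powers converge)."""
    primes = [p for p in range(2, y + 1) if all(p % q for q in range(2, p))]
    bound = x ** sigma
    for p in primes:
        bound /= 1 - p ** (-sigma)
    return bound
```

Optimizing over a grid of $\sigma$ tightens the bound, which is the whole point of the choice $\epsilon = w/\log y$ above.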
Chapter 7

Gaps between primes
Let $p_1<p_2<p_3<\cdots$ be the sequence of prime numbers in increasing order. Our goal in this chapter is to study the spacing of this sequence and, in particular, how small and how large the gaps between two consecutive primes can get. The Prime Number Theorem implies that $p_n\sim n\log n$ as $n\to\infty$ or, equivalently, that $\sum_{k\le n}(p_{k+1}-p_k) = p_{n+1}-p_1 \sim n\log n$. Consequently,
$$\sum_{1<k\le n}\frac{p_{k+1}-p_k}{\log k} = \int_{1}^{n}\frac{1}{\log t}\,d\!\left(\sum_{1<k\le t}(p_{k+1}-p_k)\right) = \frac{p_{n+1}-p_2}{\log n}+\int_{1}^{n}\left(\sum_{1<k\le t}(p_{k+1}-p_k)\right)\frac{dt}{t\log^2 t}$$
$$= \frac{p_{n+1}-p_2}{\log n}+\int_{1}^{n}\frac{(p_{\lfloor t\rfloor+1}-p_2)\,dt}{t\log^2 t} \sim n \qquad (n\to\infty),$$
that is to say, the mean value of $(p_{k+1}-p_k)/\log k$ is 1. However, it could be possible that this ratio deviates significantly from its mean. Indeed, if the twin prime conjecture is true, then it immediately follows that $p_{k+1}-p_k = 2$ for infinitely many values of $k$. One of the main results of this chapter, which will be proven in Section 7.0.2, is a major step towards this conjecture.
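The mean-value computation above can be checked numerically; a small sketch (using $\log p_k$ in place of $\log k$, which changes the mean only by lower-order terms):

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(2, n + 1) if sieve[i]]

ps = primes_up_to(10**6)
# normalized gaps (p_{k+1} - p_k) / log p_k for p_k >= 3
ratios = [(q - p) / math.log(p) for p, q in zip(ps[1:], ps[2:])]
mean = sum(ratios) / len(ratios)
```

Up to $10^6$ the mean of the normalized gaps is already very close to 1, even though the individual ratios fluctuate wildly.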
This result is due to Maynard [May15a] and Tao [Tab], who built upon previous work of Goldston, Pintz and Yildirim [GPY]. In the special case $m = 1$, the first person to show that $\liminf_{n\to\infty}(p_{n+1}-p_n)<\infty$ was Zhang [Z]. We will discuss the history of Theorem 7.0.2 further in Section 7.1, where its proof will be presented.
show that
$$\sum_{k\le\frac{(1+\epsilon)\log N}{2}}\big((1+\epsilon)\log N-2k\big)\sum_{\substack{N<n\le 2N\\ p_{n+1}-p_n=2k}} 1 \ge (\delta+o(1))\,N\log N,$$
as $N\to\infty$. Next, use the sieve to bound $\#\{N<n\le 2N : p_{n+1}-p_n = 2k\}$ from above and deduce the desired result.
On the other hand, one could imagine that it would be feasible to construct long strings of consecutive numbers that are all composite. For example, all numbers in the sequence $n!+2, n!+3, \dots, n!+n$ are composite. Consequently, if $p_r < n!+2 < p_{r+1}$ are consecutive primes, then $p_{r+1}-p_r \ge n$. Moreover, Bertrand's postulate implies that $p_r > n!/2$ and therefore $\log r \sim \log p_r \sim \log(n!) \sim n\log n$ as $n\to\infty$, by Stirling's formula; that is to say, $n \sim \log r/\log\log r$. So this construction produces gaps $d_r \gtrsim \log r/\log\log r$, for infinitely many integers $r$. This is not quite enough to yield gaps that are longer than the average ones. However, Westzynthius used a different construction to produce large gaps between primes, that is to say, infinitely many integers $n$ such that $(p_{n+1}-p_n)/\log n$ is arbitrarily large. His ideas were strengthened by Erdős and subsequently by Rankin. Following their arguments, we will show the following slightly stronger result (Rankin's result had the constant $e^{\gamma}/2$ in place of $e^{\gamma}$):
$\mathbf s=(s_1,\dots,s_k)$ is admissible, that is to say, its elements do not cover all congruence classes modulo any prime. Then, for an integer $N\ge s_k$ and a sequence of non-negative weights $\{w_n\}_{n=1}^{\infty}$, we consider the sum
$$S = \sum_{N<n\le 2N}\Bigg(\sum_{j=1}^k 1_{\mathbb P}(n+s_j)-m\Bigg)w_n.$$
and $\lambda^+$ is an upper bound sieve. Indeed, the original idea presented here is due to Goldston, Pintz and Yildirim [GPY], who defined the weights using the same idea as in Selberg's sieve, by letting
$$(7.1.1)\qquad w_n = \Bigg(\sum_{d\mid(P(z),\,Q(n))}\lambda_d\Bigg)^2,$$
with the parameters $\lambda_d$ being at our disposal, subject to the condition that they are supported on integers $d\le D$ (we don't have to assume that $\lambda_1=1$ here). If we set $D=N^{\theta/2}$, then opening the square and interchanging the order of summation, we find that we need a bound of the form
$$(7.1.2)\qquad \sum_{q\le x^{\theta}}\ \max_{(a,q)=1}\bigg|\pi(x;q,a)-\frac{\mathrm{li}(x)}{\phi(q)}\bigg| \ll_A \frac{x}{(\log x)^A},$$
for all $x\ge 2$ and all $A\ge 1$.
Note that the Bombieri-Vinogradov theorem implies that (7.1.2) holds with $x^{1/2}/(\log x)^B$, for $B$ sufficiently large, in place of $x^{\theta}$. In particular, any $\theta<1/2$ is permissible. However, this was not enough to deduce that $\liminf_{n\to\infty}(p_{n+1}-p_n)<\infty$ in the original approach of Goldston, Pintz and Yildirim. Indeed, their method required (a weaker version of) (7.1.2) with some fixed $\theta>1/2$, which was not available at the time. This missing ingredient was supplied by Zhang in May 2013 in his breakthrough paper [Z], where he proved a variation of (7.1.2) with $\theta=1/2+1/584$ that was sufficient for him to deduce that $\liminf_{n\to\infty}(p_{n+1}-p_n)<\infty$.
Very shortly after Zhang's paper was published, another method was proposed by James Maynard [May15a]. Instead of trying to prove stronger level-of-distribution results about the primes, Maynard took an alternative path and introduced a multidimensional variation of the GPY weights. His idea, also discovered independently by Tao [Tab], was to consider weights of the form
$$w_n = \Bigg(\sum_{\substack{d_j\mid n+s_j\\ 1\le j\le k}}\lambda_{d_1,\dots,d_k}\Bigg)^2.$$
It turned out that this simple idea has very far-reaching consequences. As we will see, these modified weights require only very weak level-of-distribution results as input. In fact, we only need to know that (7.1.2) holds for some $\theta>0$, a much weaker statement than the available Bombieri-Vinogradov theorem, which allows us to take $\theta=1/2-\epsilon$. This new flexibility of the Maynard-Tao variation of the Goldston-Pintz-Yildirim weights makes it applicable to a wide variety of set-ups. Indeed, after the publication of Maynard's paper, the subject has witnessed an explosion of activity.
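The admissibility condition from the beginning of this discussion is easy to test by machine: for a prime $p>k$, the $k$ elements of the tuple can never occupy all $p$ residue classes, so only primes $p\le k$ need to be checked. A small sketch (the example tuples are standard ones, not taken from the text):

```python
def is_admissible(s):
    """True if the tuple s omits at least one residue class modulo every prime.

    Only primes p <= len(s) can possibly be fully covered by len(s) numbers.
    """
    k = len(s)
    for p in range(2, k + 1):
        if all(p % d != 0 for d in range(2, p)):  # p is prime (trial division)
            if len({x % p for x in s}) == p:
                return False
    return True

print(is_admissible((0, 2, 6, 8, 12)))  # True: an admissible 5-tuple
print(is_admissible((0, 2, 4)))         # False: hits 0, 1, 2 mod 3
```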
We now proceed to the proof of Theorem 7.0.2. We will add a small technical twist to the construction of Maynard's weights, by performing a preliminary sieve up to $z$, where $z$ is a parameter at our disposal (we will eventually take $z$ to be a large power of $\log N$; a similar idea is also used in [May15a], but there the pre-sieving parameter is smaller). Indeed, we set
$$w_n = 1_{P^-(Q(n))>z}\Bigg(\sum_{\substack{d_j\mid n+s_j\\ 1\le j\le k}}\lambda_{d_1,\dots,d_k}\Bigg)^2.$$
In the estimation of $S(N,z)$ in the following two lemmas, we use the notation $M:=\max_{\mathbf d}|\lambda_{\mathbf d}|$; the error terms will be of the shape
$$O_{A,\epsilon}\left(\frac{NM^2}{(\log N)^A}\right).$$
Proof. Call $T$ the sum in question. Applying the Fundamental Lemma of Sieve Methods (cf. Lemma 3.3.1) with $u$ defined via the relation $z^u=N^{\epsilon}$, we find that
$$T = \sum_{\substack{(d_ie_i,\,d_je_jP(z))=1\\ 1\le i,j\le k,\ i\ne j}}\lambda_{\mathbf d}\lambda_{\mathbf e}\,\#\left\{N<n\le 2N:\ \begin{array}{l}n\equiv -s_j\ (\mathrm{mod}\ [d_j,e_j]),\ 1\le j\le k,\\ P^-(Q(n))>z\end{array}\right\}$$
$$= \sum_{\substack{(d_ie_i,\,d_je_jP(z))=1\\ 1\le i,j\le k,\ i\ne j}}\lambda_{\mathbf d}\lambda_{\mathbf e}\left(\frac{N\prod_{p\le z}(1-\nu(p)/p)}{[d_1,e_1]\cdots[d_k,e_k]}\big(1+O(u^{-u/2})\big)+O\big(N^{\epsilon}(\log N)^{k-1}\big)\right)$$
$$= N\prod_{p\le z}\left(1-\frac{\nu(p)}{p}\right)\sum_{\substack{(d_ie_i,\,d_je_jP(z))=1\\ 1\le i,j\le k,\ i\ne j}}\frac{\lambda_{\mathbf d}\lambda_{\mathbf e}}{[d_1,e_1]\cdots[d_k,e_k]}+O_{A,\epsilon}\left(\frac{M^2N}{(\log N)^A}\right),$$
which is admissible by our assumption that $z\ge(\log N)^{3k+A+1}$. Finally, we note that
$$\frac{1}{[d,e]} = \frac{(d,e)}{de} = \frac{1}{de}\sum_{a\mid(d,e)}\phi(a).$$
Therefore
$$\sum_{\substack{P^-(d_ie_i)>z\\ 1\le i\le k}}\frac{\lambda_{\mathbf d}\lambda_{\mathbf e}}{[d_1,e_1]\cdots[d_k,e_k]} = \sum_{P^-(a_1\cdots a_k)>z}\phi(a_1)\cdots\phi(a_k)\Bigg(\sum_{\substack{d_j\equiv 0\ (\mathrm{mod}\ a_j),\ P^-(d_j)>z\\ 1\le j\le k}}\frac{\lambda_{\mathbf d}}{d_1\cdots d_k}\Bigg)^2.$$
Since $\phi(a)=a\prod_{p\mid a}(1-1/p)=a\big(1+O(\omega(a)/z)\big)$ when $P^-(a)>z$, and $\omega(a)\ll\log a$, we may replace $\phi(a_j)$ by $a_j$ for all $j\in\{1,\dots,k\}$, producing a total error term of size $\ll M^2N(\log N)^{3k+1}/z$, which is admissible by our assumption on $z$.
Lemma 7.1.2. If $(\log N)^{A+3k+1}\le z\le N^{1/(10\log\log N)}$, $D\le N^{1/4}/(\log N)^{\log z}$ and $j_0\in\{1,\dots,k\}$, then
$$\sum_{\substack{N<n\le 2N\\ P^-(Q(n))>z}}1_{\mathbb P}(n+s_{j_0})\Bigg(\sum_{\substack{d_j\mid n+s_j\\ 1\le j\le k}}\lambda_{\mathbf d}\Bigg)^2 = X\prod_{p\le z}\left(1-\frac{\nu(p)}{p}\right)\left(1-\frac1p\right)^{-1}\sum_{\substack{P^-(a_j)>z,\ 1\le j\le k\\ a_{j_0}=1}}\frac{1}{a_1\cdots a_k}\Bigg(\sum_{\substack{P^-(d_j)>z,\ d_{j_0}=1\\ d_j\equiv 0\ (\mathrm{mod}\ a_j)\\ 1\le j\le k}}\frac{\lambda_{\mathbf d}}{d_1\cdots d_k}\Bigg)^2 + O_{A,\epsilon}\left(\frac{NM^2}{(\log N)^A}\right).$$
Proof. Call $T_{j_0}$ the sum in question. For simplicity, we consider the case $j_0=k$; the proof of the other cases is identical. Observe that the fact that $n+s_k$ is a prime number greater than $N$ forces $n+s_k$ to be coprime to all integers $1<d\le D<N$. Therefore, for such an $n$, the sum $\sum_{d_j\mid n+s_j\ (1\le j\le k)}\lambda_{\mathbf d}$ must have $d_k=1$, and the condition $P^-(Q(n))>z$ is reduced to $P^-(Q^*(n))>z$, where
$$Q^*(n) = \prod_{j=1}^{k-1}(n+s_j).$$
(This decreases the dimension of the sieve we are considering by 1.) So, opening the square, changing the order of summation and setting $p=n+s_k$, we find that
$$T_k = \sum_{\substack{d_j,e_j\\ (d_ie_i,\,d_je_jP(z))=1\\ 1\le i,j<k,\ i\ne j}}\lambda_{\mathbf d,1}\lambda_{\mathbf e,1}\,\#\left\{N+s_k<p\le 2N+s_k:\ \begin{array}{l}p\equiv s_k-s_j\ (\mathrm{mod}\ [d_j,e_j]),\ 1\le j<k,\\ P^-(Q^*(p-s_k))>z\end{array}\right\}.$$
For each fixed $\mathbf d,\mathbf e$, we use the Fundamental Lemma of Sieve Methods with $u=\log\log N$ to control the cardinality of the above set of primes. If we set $\nu^*(p)$ to be the number of congruence classes modulo $p$ occupied by $s_1-s_k,\dots,s_{k-1}-s_k$, so that $\nu^*(p)=\nu(p)-1$, then we find that the main term equals
$$\frac{XV}{\phi([d_1,e_1])\cdots\phi([d_{k-1},e_{k-1}])},$$
7.1. BOUNDED GAPS BETWEEN PRIMES
where
$$V := \prod_{p\le z}\left(1-\frac{\nu(p)-1}{p-1}\right) = \prod_{p\le z}\left(1-\frac{\nu(p)}{p}\right)\left(1-\frac1p\right)^{-1},$$
and the error term is
$$\ll \frac{X}{(\log N)^{A+3k}\,\phi([d_1,e_1])\cdots\phi([d_{k-1},e_{k-1}])} + \sum_{\substack{a\le z^{\log\log N}\\ a\mid P(z)}}(k-1)^{\omega(a)}\,E\big(a[d_1,e_1]\cdots[d_{k-1},e_{k-1}]\big),$$
where
$$E(q) := \max_{(a,q)=1}\Bigg|\sum_{\substack{N+s_k<p\le 2N+s_k\\ p\equiv a\ (\mathrm{mod}\ q)}}1-\frac{X}{\phi(q)}\Bigg| = \max_{(a,q)=1}\Bigg|\sum_{\substack{N<p\le 2N\\ p\equiv a\ (\mathrm{mod}\ q)}}1-\frac{X}{\phi(q)}\Bigg| + O(1).$$
Therefore
$$T_k = XV\sum_{\substack{(d_ie_i,\,d_je_jP(z))=1\\ 1\le i,j<k,\ i\ne j}}\frac{\lambda_{\mathbf d,1}\lambda_{\mathbf e,1}}{\phi([d_1,e_1])\cdots\phi([d_{k-1},e_{k-1}])} + O\Bigg(\frac{M^2N}{(\log N)^A}+M^2\sum_{q\le N^{1/2}/(\log N)^{\log z}}\tau_{2k}(q)\,E(q)\Bigg).$$
The Bombieri-Vinogradov theorem (see, also, Exercise 4.0.8) implies that the sum over $q$ in the error term is $\ll N/(\log N)^A$. So we conclude that
$$T_k = XV\sum_{\substack{(d_ie_i,\,d_je_jP(z))=1\\ 1\le i,j<k,\ i\ne j}}\frac{\lambda_{\mathbf d,1}\lambda_{\mathbf e,1}}{\phi([d_1,e_1])\cdots\phi([d_{k-1},e_{k-1}])} + O\left(\frac{M^2N}{(\log N)^A}\right).$$
As in the proof of Lemma 7.1.1, we may remove the conditions $(d_ie_i,d_je_j)=1$ and we may replace $\phi([d_j,e_j])$ by $[d_j,e_j]$ at the cost of introducing an error of total size $O(M^2N(\log N)^{3k+1}/z)$, which is admissible since $z\ge(\log N)^{A+3k+1}$. Moreover, using the formula
$$\frac{1}{[d,e]} = \frac{1}{de}\sum_{a\mid(d,e)}\phi(a),$$
we find that
$$\sum_{\substack{P^-(d_ie_i)>z\\ 1\le i<k}}\frac{\lambda_{\mathbf d,1}\lambda_{\mathbf e,1}}{[d_1,e_1]\cdots[d_{k-1},e_{k-1}]} = \sum_{\substack{P^-(a_i)>z\\ 1\le i<k}}\phi(a_1)\cdots\phi(a_{k-1})\Bigg(\sum_{\substack{P^-(d_i)>z\\ d_i\equiv 0\ (\mathrm{mod}\ a_i)\\ 1\le i<k}}\frac{\lambda_{\mathbf d,1}}{d_1\cdots d_{k-1}}\Bigg)^2.$$
Finally, we replace $\phi(a_j)$ by $a_j$, thus introducing a total error of size $O(M^2N(\log N)^{3k+1}/z)$, which is admissible since $z\ge(\log N)^{A+3k+1}$. This completes the proof of the lemma.
Motivated by the two lemmas above, we make a change of variables which diagonalizes the quadratic form appearing in Lemma 7.1.1: for a $k$-tuple $\mathbf a=(a_1,\dots,a_k)$, we set
$$(7.1.3)\qquad \frac{y_{\mathbf a}}{a_1\cdots a_k} := \sum_{\substack{P^-(d_j)>z\\ d_j\equiv 0\ (\mathrm{mod}\ a_j)\\ 1\le j\le k}}\frac{\lambda_{\mathbf d}}{d_1\cdots d_k}.$$
By Möbius inversion, the dual relation to (7.1.3) is
$$(7.1.4)\qquad \frac{\lambda_{\mathbf d}}{d_1\cdots d_k} = \sum_{\substack{b_j\equiv 0\ (\mathrm{mod}\ d_j)\\ 1\le j\le k}}\frac{y_{\mathbf b}}{b_1\cdots b_k}\prod_{j=1}^k\mu(b_j/d_j).$$
Hence, for $\mathbf a$ with $a_{j_0}=1$,
$$\sum_{\substack{P^-(d_1\cdots d_k)>z\\ d_j\equiv 0\ (\mathrm{mod}\ a_j)\\ 1\le j\le k,\ d_{j_0}=1}}\frac{\lambda_{\mathbf d}}{d_1\cdots d_k} = \sum_{\substack{P^-(d_1\cdots d_k)>z\\ d_j\equiv 0\ (\mathrm{mod}\ a_j)\\ 1\le j\le k,\ d_{j_0}=1}}\ \sum_{\substack{b_j\equiv 0\ (\mathrm{mod}\ d_j)\\ 1\le j\le k}}\frac{y_{\mathbf b}}{b_1\cdots b_k}\prod_{j=1}^k\mu(b_j/d_j)$$
$$= \sum_{\substack{P^-(b_1\cdots b_k)>z\\ b_j\equiv 0\ (\mathrm{mod}\ a_j)\\ 1\le j\le k}}\frac{y_{\mathbf b}}{b_1\cdots b_k}\prod_{j=1}^k\ \sum_{\substack{d_j\equiv 0\ (\mathrm{mod}\ a_j),\ (d_j/a_j)\mid(b_j/a_j)\\ d_{j_0}=1}}\mu(b_j/d_j)$$
$$(7.1.5)\qquad = \sum_{\substack{b_j=a_j,\ 1\le j\le k,\ j\ne j_0\\ P^-(b_{j_0})>z}}\frac{\mu(b_{j_0})\,y_{\mathbf b}}{b_1\cdots b_k} = \sum_{P^-(b)>z}\frac{\mu(b)\,y_{a_1,\dots,a_{j_0-1},b,a_{j_0+1},\dots,a_k}}{a_1\cdots a_k\,b}.$$
$$\frac{S(N,z)}{\mathfrak S(\mathbf s)\,N(\log D)^k} = \frac14\sum_{j_0=1}^k\int\Bigg(\int_0^1 f(\mathbf t)\,\mathrm dt_{j_0}\Bigg)^2\prod_{\substack{1\le j\le k\\ j\ne j_0}}\mathrm dt_j\ -\ m\int f(\mathbf t)^2\,\mathrm d\mathbf t\ +\ O\left(\frac{1}{\log N}\right),$$
where
$$\mathfrak S(\mathbf s) := \prod_p\left(1-\frac{\nu(p)}{p}\right)\left(1-\frac1p\right)^{-k};$$
Except for Lemmas 7.1.1 and 7.1.2, the key input to the proof of the above lemma comes from the following result:

Lemma 7.1.4. Let $D$ and $z$ be two parameters with $\log D\le z\le D$. If $g:\mathbb R^k\to\mathbb R$ is a smooth function supported on $\{\mathbf t\in[1,+\infty)^k : t_1\cdots t_k\le D\}$ and such that $\frac{\partial g}{\partial t_j}\ll 1/t_j$, then
$$\sum_{P^-(n_1\cdots n_k)>z}\frac{g(n_1,\dots,n_k)}{n_1\cdots n_k} = \prod_{p\le z}\left(1-\frac1p\right)^k\int\frac{g(t_1,\dots,t_k)}{t_1\cdots t_k}\,\mathrm d\mathbf t + O\left(\left(\frac{\log D}{\log z}\right)^{k-1}\right);$$
Proof. All implied constants might depend on $g$ and on $k$. We may assume that $z\le D^{1/(1000k)}$; otherwise, the result is trivially true. We note that
$$\sum_{P^-(n_1\cdots n_k)>z}\frac{g(n_1,\dots,n_k)}{n_1\cdots n_k} = \sum_{\substack{P^-(n_j)>z,\ n_j>z^{100}\\ 1\le j\le k}}\frac{g(n_1,\dots,n_k)}{n_1\cdots n_k} + O\left(\left(\frac{\log D}{\log z}\right)^{k-1}\right).$$
We split the range of summation $\{\mathbf n : n_1\cdots n_k\le D,\ n_j>z^{100}\ (1\le j\le k)\}$ into small cubes of the form $B=\prod_{j=1}^k(x_j,x_j+\delta x_j]$ with $\delta:=z^{-100}$, and we set $I:=\int_B g(\mathbf t)\,\mathrm d\mathbf t/(t_1\cdots t_k)$. We note that, for $\mathbf n\in B$,
$$\frac{g(n_1,\dots,n_k)}{n_1\cdots n_k} = \frac{g(x_1,\dots,x_k)}{x_1\cdots x_k} + O\left(\frac{1}{z^{100}\,x_1\cdots x_k}\right) = \frac{I}{\delta^k x_1\cdots x_k} + O\left(\frac{1}{z^{100}\,x_1\cdots x_k}\right).$$
Therefore
$$\sum_{\substack{\mathbf n\in B\\ P^-(n_1\cdots n_k)>z}}\frac{g(n_1,\dots,n_k)}{n_1\cdots n_k} = \frac{I}{\delta^k x_1\cdots x_k}\prod_{j=1}^k\ \sum_{\substack{x_j<n_j\le x_j+\delta x_j\\ P^-(n_j)>z}}1\ +\ O\big(\delta^{k+1}\big)$$
$$= I\prod_{p\le z}\left(1-\frac1p\right)^k\left(1+O\Big(\max_{1\le j\le k}(\delta x_j)^{-1/\log z}\Big)\right) + O\big(\delta^{k+1}\big)$$
$$= I\prod_{p\le z}\left(1-\frac1p\right)^k + O\left(\frac{\delta^k}{(\log z)^k}\Big(\delta+\max_{1\le j\le k}x_j^{-1/\log z}\Big)\right)$$
by the Fundamental Lemma of Sieve Methods (cf. Lemma 3.3.1). Hence, summing over all cubes $B\subseteq\{\mathbf n : n_1\cdots n_k\le D,\ n_j>z^{100}\ (1\le j\le k)\}$ completes the proof of the lemma.
Proof of Lemma 7.1.3. Note that our choice of the $y_{\mathbf a}$ and the fact that $\lambda^+$ is an upper bound sieve imply that
$$M = \max_{\mathbf d}|\lambda_{\mathbf d}| \ll (\log N)^k.$$
So, if we set
$$g(a_1,\dots,a_k) = f\left(\frac{\log a_1}{\log D},\dots,\frac{\log a_k}{\log D}\right)
\qquad\text{and}\qquad
V = \prod_{p\le z}\left(1-\frac1p\right),$$
then Lemmas 7.1.1 and 7.1.4, and our assumption that $(\log N)^{6k+2}\le z\le e^{\sqrt{\log N}}$, imply that
$$\sum_{\substack{N<n\le 2N\\ P^-(Q(n))>z}}\Bigg(\sum_{\substack{d_j\mid n+s_j\\ 1\le j\le k}}\lambda_{\mathbf d}\Bigg)^2 = \frac{N}{V^{2k}}\prod_{p\le z}\left(1-\frac{\nu(p)}{p}\right)\sum_{P^-(a_1\cdots a_k)>z}\frac{g(\mathbf a)^2}{a_1\cdots a_k} + O\left(\frac{N}{(\log N)^{k+1}}\right)$$
$$= \frac{N}{V^{k}}\prod_{p\le z}\left(1-\frac{\nu(p)}{p}\right)\left(\int\frac{g(\mathbf u)^2}{u_1\cdots u_k}\,\mathrm d\mathbf u + O\big((\log N)^{k-1/2}\big)\right).$$
Moreover,
$$(7.1.6)\qquad V^{-k}\prod_{p\le z}\left(1-\frac{\nu(p)}{p}\right) = \mathfrak S(\mathbf s)\left(1+O\left(\frac kz\right)\right)$$
and
$$\int\frac{g(\mathbf u)^2}{u_1\cdots u_k}\,\mathrm d\mathbf u = (\log D)^k\int_{\mathcal T_k}f(\mathbf t)^2\,\mathrm d\mathbf t,$$
so that
$$(7.1.7)\qquad \sum_{\substack{N<n\le 2N\\ P^-(Q(n))>z}}\Bigg(\sum_{\substack{d_j\mid n+s_j\\ 1\le j\le k}}\lambda_{\mathbf d}\Bigg)^2 = \mathfrak S(\mathbf s)\,N(\log D)^k\left(\int f(\mathbf t)^2\,\mathrm d\mathbf t + O\left(\frac1{\log N}\right)\right).$$
Next, if we set
$$S_{j_0}(\mathbf a) = \sum_{P^-(b)>z}\frac{\mu^2(b)\,g(a_1,\dots,a_{j_0-1},b,a_{j_0+1},\dots,a_k)}{b},$$
where $j_0\in\{1,\dots,k\}$, then Lemma 7.1.2 and relations (7.1.5) and (7.1.6) imply that
$$\sum_{\substack{N<n\le 2N\\ P^-(Q(n))>z}}1_{\mathbb P}(n+s_{j_0})\Bigg(\sum_{\substack{d_j\mid n+s_j\\ 1\le j\le k}}\lambda_{\mathbf d}\Bigg)^2 = \frac{X}{V^{2k+1}}\prod_{p\le z}\left(1-\frac{\nu(p)}{p}\right)\sum_{\substack{P^-(a_1\cdots a_k)>z\\ a_{j_0}=1}}\frac{S_{j_0}(\mathbf a)^2}{a_1\cdots a_k} + O\left(\frac{N}{(\log N)^{k+1}}\right)$$
$$= \frac{\mathfrak S(\mathbf s)\,N}{V^{k+1}\log N}\left(\sum_{\substack{P^-(a_1\cdots a_k)>z\\ a_{j_0}=1}}\frac{S_{j_0}(\mathbf a)^2}{a_1\cdots a_k} + O\left(\frac1{\log N}\right)\right).$$
If $\mu^2(b)=0$ and $P^-(b)>z$, then $b$ is divisible by the square of a prime $>z$. Therefore,
$$S_{j_0}(\mathbf a) = \sum_{P^-(b)>z}\frac{g(a_1,\dots,a_{j_0-1},b,a_{j_0+1},\dots,a_k)}{b} + O\left(\frac{\log N}{z}\right)$$
$$= V\int_1^{\infty}\frac{g(a_1,\dots,a_{j_0-1},u,a_{j_0+1},\dots,a_k)}{u}\,\mathrm du + O\big(\sqrt{\log N}\big) = V(\log D)\,G_{j_0}(a_1,\dots,a_k) + O\big(\sqrt{\log N}\big),$$
where $G_{j_0}(\mathbf a)$ denotes the integral $\int_0^{\infty}f\big(\frac{\log a_1}{\log D},\dots,t,\dots,\frac{\log a_k}{\log D}\big)\,\mathrm dt$, with the variable $t$ in the $j_0$-th coordinate. Therefore
$$\sum_{\substack{N<n\le 2N\\ P^-(Q(n))>z}}1_{\mathbb P}(n+s_{j_0})\Bigg(\sum_{\substack{d_j\mid n+s_j\\ 1\le j\le k}}\lambda_{\mathbf d}\Bigg)^2 = \frac{\mathfrak S(\mathbf s)\,N(\log D)^2}{V^{k-1}\log N}\left(\sum_{\substack{P^-(a_1\cdots a_k)>z\\ a_{j_0}=1}}\frac{G_{j_0}(\mathbf a)^2}{a_1\cdots a_k} + O\left(\frac1{\log N}\right)\right).$$
In view of Lemma 7.1.3, it is clear that our goal is to choose $f$ supported on $\mathcal T_k$ and maximizing the ratio
$$\rho(f) := \frac{\frac1k\sum_{j=1}^k\int\big(\int_0^1 f(\mathbf t)\,\mathrm dt_j\big)^2\,\mathrm dt_1\cdots\mathrm dt_{j-1}\,\mathrm dt_{j+1}\cdots\mathrm dt_k}{\int f(\mathbf t)^2\,\mathrm d\mathbf t}.$$
If we can show that, for $k$ large enough, $\rho(f)>4m/k$, then we deduce that $\liminf_{n\to\infty}(p_{n+m}-p_n)<\infty$.
As a warm-up, using calculus of variations, we show that the maximiser of $\rho(f)$ is an eigenvector of the linear operator
$$(L_kf)(\mathbf t) := \frac1k\sum_{j=1}^k\int_0^1 f(t_1,\dots,t_{j-1},u,t_{j+1},\dots,t_k)\,\mathrm du,$$
and the corresponding eigenvalue is the maximum possible ratio $\rho(f)$. Indeed, for a function $f$ supported on $\mathcal T_k$, we have that
$$\rho(f) = \frac{\langle L_kf,f\rangle}{\langle f,f\rangle},\qquad\text{where}\quad \langle g,h\rangle := \int_{\mathcal T_k}g(\mathbf t)h(\mathbf t)\,\mathrm d\mathbf t.$$
If, now, $f$ is a maximiser of the functional $\rho(\cdot)$, then the function $\epsilon\mapsto\rho(f+\epsilon g)$ has a maximum at $\epsilon=0$ for any smooth $g:\mathbb R^k\to\mathbb R$ supported on $\mathcal T_k$. So its derivative at $\epsilon=0$ must vanish, which implies that
$$\langle L_kf,g\rangle + \langle L_kg,f\rangle = 2\rho(f)\,\langle f,g\rangle.$$
It is easy to see that $L_k$ is a self-adjoint operator, so this implies that $\langle L_kf-\rho(f)f,g\rangle=0$ for all such $g$, that is to say, $L_kf=\rho(f)\,f$.
An asymptotic estimate for $M_k$ is given in Lemma 7.1.6 below. For explicit bounds on $M_k$, the reader is invited to consult the paper by Maynard [May15a] as well as [Pol].

Remark 7.1.5. The original weights of Goldston, Pintz and Yildirim essentially correspond to taking the supremum over the restricted set of functions of the form $f(\mathbf t)=F(t_1+\cdots+t_k)$, where $F$ is supported on $[0,1]$. Then we have that
$$\rho(f) = (k-1)\,\frac{\int_0^1 u^{k-2}\big(\int_u^1 F(t)\,\mathrm dt\big)^2\,\mathrm du}{\int_0^1 u^{k-1}F(u)^2\,\mathrm du}.$$
It is possible to show that $\rho(f)\le 4/k$ in this special case [Sou], which means that the weights of Goldston, Pintz and Yildirim cannot yield bounded gaps between primes. On the other hand, choosing $F(t)=(1-t)^{\ell}$, we find that
$$\rho(f) = \frac{k-1}{(\ell+1)^2}\cdot\frac{\int_0^1 u^{k-2}(1-u)^{2\ell+2}\,\mathrm du}{\int_0^1 u^{k-1}(1-u)^{2\ell}\,\mathrm du} = \frac{k-1}{(\ell+1)^2}\cdot\frac{\ \frac{(k-2)!\,(2\ell+2)!}{(k+2\ell+1)!}\ }{\ \frac{(k-1)!\,(2\ell)!}{(k+2\ell)!}\ } = \frac{2(2\ell+1)}{(\ell+1)(k+2\ell+1)} \sim \frac4k,$$
if $\ell=o(k)$ and $\ell,k\to\infty$. This means that if we could have inserted a slightly stronger input in the computations of Lemma 7.1.2, which would have allowed us to take $D$ slightly larger, then we would have been able to prove Theorem 7.0.2 when $m=1$ with these weights. This is precisely what Zhang did. As we will see in the lemma below, using the higher-dimensional structure of $f$ allows us to show that $\rho(f)$ can get much bigger.
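The closed form in the remark is easy to evaluate with exact rational arithmetic. A small sketch (the function name is ours): it confirms that $k\rho(f)/4$ approaches 1 from below when $\ell=o(k)$, matching the $4/k$ barrier for the one-dimensional weights.

```python
from fractions import Fraction

def rho_gpy(k, l):
    """rho(f) for F(t) = (1-t)^l: the value 2(2l+1)/((l+1)(k+2l+1))
    computed via Beta integrals in the remark above."""
    return Fraction(2 * (2 * l + 1), (l + 1) * (k + 2 * l + 1))

# k * rho / 4 tends to 1 as l, k grow with l = o(k), but never exceeds it.
for k, l in [(10**3, 30), (10**6, 1000)]:
    print(k, l, float(rho_gpy(k, l) * k / 4))
```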
where $g:[0,+\infty)\to[0,+\infty)$ is a function supported on the interval $[0,T]$ and such that $\int_0^{\infty}g(t)^2\,\mathrm dt=1$. Then
$$\int f(\mathbf t)^2\,\mathrm d\mathbf t \le \prod_{j=1}^k\int_0^{\infty}g(kt_j)^2\,\mathrm dt_j = \frac1{k^k},$$
so that
$$\rho(f) \ge \frac1k\int\cdots\int g(t_1)^2\cdots g(t_{k-1})^2\left(\int_0^{k-t_1-\cdots-t_{k-1}}g(t_k)\,\mathrm dt_k\right)^2\mathrm dt_1\cdots\mathrm dt_{k-1}$$
$$\ge \frac{\big(\int_0^{\infty}g(t)\,\mathrm dt\big)^2}{k}\int_{t_1+\cdots+t_{k-1}\le k-T}g(t_1)^2\cdots g(t_{k-1})^2\,\mathrm dt_1\cdots\mathrm dt_{k-1} = \frac{\big(\int_0^{\infty}g(t)\,\mathrm dt\big)^2}{k}\,\mathrm{Prob}\big(X_1+\cdots+X_{k-1}\le k-T\big),$$
where $X_1,\dots,X_{k-1}$ are independent random variables with density function $g^2$. Let
$$\mu := \mathrm E[X_1] = \int_0^{\infty}t\,g(t)^2\,\mathrm dt\qquad\text{and}\qquad \sigma^2 := \mathrm{Var}[X_1].$$
This suggests choosing $A\asymp\log T$. We take $T=k/(\log k)^3$ and $A=\log k$, so that
$$\mu = \frac{\log(k/(\log k)^2)}{\log k}\left(1+O\left(\frac1{\log k}\right)\right) = 1-\frac{2\log\log k}{\log k}+O\left(\frac1{\log k}\right) \le 1-\frac Tk-\frac{\log\log k}{\log k}$$
for $k$ large enough. In particular,
$$\frac{k\sigma^2}{(k-T-k\mu)^2} = \frac{\sigma^2}{k(1-T/k-\mu)^2} \ll \frac1{\log k},$$
and therefore
$$k\,\rho(f) \ge \left(\int_0^T g(t)\,\mathrm dt\right)^2\left(1-O\left(\frac1{\log k}\right)\right) = \frac{c^2\log^2(1+AT)}{A^2}\left(1+O\left(\frac1{\log k}\right)\right) = \frac{\log^2(k/(\log k)^2)}{\log k}\left(1+O\left(\frac1{\log k}\right)\right) = \log k - 4\log\log k + O(1),$$
as claimed.
Finally, we prove the upper bound on $\rho(f)$. Let $f$ be a symmetric measurable function supported on $\mathcal T_k$. Motivated by the choice of $f$ above, we use the Cauchy-Schwarz inequality in the following fashion:
$$\left(\int_0^{\infty}f(t_1,\dots,t_k)\,\mathrm dt_k\right)^2 = \left(\int_0^1 f(t_1,\dots,t_k)\,\mathrm dt_k\right)^2 \le \int_0^1(1+kAt_k)\,f(t_1,\dots,t_k)^2\,\mathrm dt_k\int_0^1\frac{\mathrm dt_k}{1+kAt_k}$$
$$= \frac{\log(1+kA)}{kA}\int_0^1(1+kAt_k)\,f(t_1,\dots,t_k)^2\,\mathrm dt_k.$$
Therefore
$$\int\left(\int f(t_1,\dots,t_k)\,\mathrm dt_k\right)^2\mathrm dt_1\cdots\mathrm dt_{k-1} \le \frac{\log(1+kA)}{kA}\int(1+kAt_k)\,f(t_1,\dots,t_k)^2\,\mathrm d\mathbf t,$$
and, by symmetry, the analogous bound with the weight $1+kAt_j$ holds for all $j\in\{1,\dots,k\}$. So, summing over $j$ and using the fact that $t_1+\cdots+t_k\le 1$ in the support of $f$, we find that
$$k\,\rho(f) \le \frac{\log(1+kA)}{kA}\cdot\frac{\int_{\mathcal T_k}\big(k+kA(t_1+\cdots+t_k)\big)f(\mathbf t)^2\,\mathrm d\mathbf t}{\int f(\mathbf t)^2\,\mathrm d\mathbf t} \le \frac{(1+A)\log(1+kA)}{A}$$
for any $A>0$. So, setting $A=\log k$ yields that
$$k\,\rho(f) \le \log k + O(\log\log k)$$
for all symmetric functions $f$ supported on $\mathcal T_k$. This completes the proof of the lemma.
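Both bounds for $k\rho(f)$ grow like $\log k$; a quick numerical comparison of the two expressions derived above, with $A=\log k$ (the function names are ours):

```python
import math

def lower_bound(k):
    # from the construction: log k - 4 log log k (up to O(1))
    return math.log(k) - 4 * math.log(math.log(k))

def upper_bound(k):
    # from Cauchy-Schwarz: (1 + A) log(1 + kA) / A with A = log k
    A = math.log(k)
    return (1 + A) * math.log(1 + k * A) / A

for k in (10**2, 10**4, 10**8, 10**16):
    print(k, round(lower_bound(k), 2), round(upper_bound(k), 2))
```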
Remark 7.1.7.
So, if $k=\lfloor Cm^4e^{4m}\rfloor$ for a large enough constant $C$, then $S(N,z)>0$ for large enough $N$, which implies that $\liminf_{n\to\infty}(p_{n+m}-p_n)\le s_k-s_1$. We take $s_j$ to be the $j$-th prime that is $>k$; these clearly form an admissible set. Then $s_k\ll k\log k\ll e^{4m}m^5$ by the Prime Number Theorem, which completes the proof of Theorem 7.0.2.
Lemma 7.2.1. Let $N\ge 1$, and assume that there are $z\ge 3$ and progressions $a_p\ (\mathrm{mod}\ p)$, $p<z$, such that if $n\le N$, then $n\equiv a_p\ (\mathrm{mod}\ p)$ for some $p<z$. Then there exists $x\in(P(z),2P(z)]$ for which there are no primes in $(x,x+N]$.
7.2. LARGE GAPS BETWEEN PRIMES
Proof. Consider $x\in\{P(z)+1,P(z)+2,\dots,2P(z)\}$ such that $x\equiv -a_p\ (\mathrm{mod}\ p)$ for all $p<z$. If $n\in[1,N]\cap\mathbb Z$, then $n\equiv a_p\ (\mathrm{mod}\ p)$ for some $p<z$, that is to say, there exists a prime $p<z$ that divides $x+n$. In particular, $x+n$ cannot be a prime number, for all $1\le n\le N$. This completes the proof of the lemma.
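Lemma 7.2.1 can be illustrated on a toy scale. Below, hand-picked residues $a_p$ for $p\in\{2,3,5,7\}$ cover an initial segment $[1,N]$, and the shift $x\equiv -a_p\ (\mathrm{mod}\ p)$ then produces a prime-free interval $(x,x+N]$ inside $(P,2P]$ with $P=2\cdot3\cdot5\cdot7$. (The particular residues are our own choice, not from the text.)

```python
from math import prod

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n**0.5) + 1))

primes = [2, 3, 5, 7]
residues = {2: 0, 3: 1, 5: 3, 7: 5}
# N = length of the initial segment fully covered by the classes a_p (mod p)
N = 0
while any((N + 1) % p == residues[p] for p in primes):
    N += 1
P = prod(primes)
# find x in (P, 2P] with x + a_p ≡ 0 (mod p) for every p, as in the proof
x = next(x for x in range(P + 1, 2 * P + 1)
         if all((x + residues[p]) % p == 0 for p in primes))
assert not any(is_prime(x + n) for n in range(1, N + 1))
print(f"N = {N}, x = {x}: no primes in ({x}, {x + N}]")
```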
We are now in position to show Theorem 7.0.4.

Proof of Theorem 7.0.4. Let $z\ge 1$, and $N\in[z,z^2]$. To each prime $p<z$, we will assign a congruence class $a_p\ (\mathrm{mod}\ p)$ such that every $n\le N$ lies in some class $a_p\ (\mathrm{mod}\ p)$, as in Lemma 7.2.1. Therefore, in order to show Theorem 7.0.4, we need to be able to take
$$z \asymp \frac{N(\log\log N)^2}{(\log N)(\log\log\log N)},$$
so that
$$N \gg \frac{z(\log z)(\log\log\log z)}{(\log\log z)^2} \gg \frac{(\log X)(\log\log X)(\log\log\log\log X)}{(\log\log\log X)^2}$$
with $X=P(z)$.
We will choose $a_p\ (\mathrm{mod}\ p)$ using different arguments, according to whether $p\le N^{1/u}$, $N^{1/u}<p\le N^{1-1/M}$ or $N^{1-1/M}<p\le z$, where $u$ and $M$ are defined by
$$(7.2.1)\qquad u^u = \log N \quad\Longleftrightarrow\quad u\sim\frac{\log\log N}{\log\log\log N}$$
and
$$(7.2.2)\qquad N^{1/M} = \frac{\log N}{\log\log N} \quad\Longleftrightarrow\quad M\sim\frac{\log N}{\log\log N}.$$
and consequently
$$|S| \le \Psi\big(N,N^{1/u}\big) + \sum_{N^{1-1/M}<p\le N}\frac Np + O\left(\frac{N}{(\log N)^2}\right) \ll \frac{N\,e^{O(u)}}{(u\log u)^u} + \frac NM + O\left(\frac{N}{(\log N)^2}\right) = \frac NM + o\left(\frac N{\log N}\right),$$
by Theorem 6.2.1, the Prime Number Theorem and our choice of $u$ and $M$. (Here $S$ denotes the set of integers $n\le N$ that remain uncovered after assigning $a_p=0$ to the medium primes $p\in(N^{1/u},N^{1-1/M}]$, that is to say, the $n\le N$ without a prime factor in $(N^{1/u},N^{1-1/M}]$.)
Small primes: For each $p\le N^{1/u}$, we select the progressions $a_p\ (\mathrm{mod}\ p)$ greedily: we let $a_2\ (\mathrm{mod}\ 2)$ be such that
$$\#\{n\in S : n\equiv a_2\ (\mathrm{mod}\ 2)\} \ge \frac{|S|}{2}.$$
Having chosen $a_2$, we set $S_2=\{n\in S : n\not\equiv a_2\ (\mathrm{mod}\ 2)\}$, and we select $a_3\ (\mathrm{mod}\ 3)$ such that
$$\#\{n\in S_2 : n\equiv a_3\ (\mathrm{mod}\ 3)\} \ge \frac{|S_2|}{3},$$
and we set $S_3=\{n\in S_2 : n\not\equiv a_3\ (\mathrm{mod}\ 3)\}$. Continuing this way, we find that there are progressions $a_p\ (\mathrm{mod}\ p)$, $p\le N^{1/u}$, such that the set $S'$ of elements of $S$ lying outside all of them has cardinality
$$(7.2.3)\qquad |S'| \le |S|\prod_{p\le N^{1/u}}\left(1-\frac1p\right) \ll \frac{e^{-\gamma}\,uN\log\log N}{(\log N)^2} \asymp \frac{e^{-\gamma}\,N(\log\log N)^2}{(\log N)^2\,\log\log\log N}.$$
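The greedy step above is simple to emulate: picking the most popular class modulo $p$ removes at least a $1/p$ fraction of the survivors, so the survivor count is bounded by the product appearing in (7.2.3). A toy sketch (the range and prime list are our own choices):

```python
from collections import Counter

S = set(range(1, 10_001))
used = [2, 3, 5, 7, 11, 13]
for p in used:
    # most popular residue class mod p among the survivors
    a_p, _ = Counter(n % p for n in S).most_common(1)[0]
    S = {n for n in S if n % p != a_p}

bound = 10_000
for p in used:
    bound *= 1 - 1 / p
print(len(S), "<=", round(bound, 1))  # survivors vs. the product bound
```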
and
$$\mathcal N_2 := \{mq\le N :\ m\le N^{1/M},\ q>N^{1-1/M}\ \text{prime},\ P^-(mq-1)>N^{1/u}\}.$$
7.3. EVEN LARGER GAPS BETWEEN PRIMES
(Here and for the rest of this section, the letter $q$ will always denote a prime number, and the same will be true later on for the letter $\ell$.) Recall that $N^{1/M}=(\log N)/\log\log N$. We further write $\mathcal N_2$ as a disjoint union $\mathcal N_2'\sqcup\mathcal N_2''$, where
$$\mathcal N_2' := \{mq\le N :\ N^{1/M}/\log\log N<m\le N^{1/M},\ q>N^{1-1/M},\ P^-(mq-1)>N^{1/u}\},$$
so that $\mathcal N_2$ is the disjoint union of the sets $\{mq : q\in Q_m\}$, $m\le N^{1/M}$, with $\mathcal N_2'$ consisting of those with $N^{1/M}/\log\log N<m\le N^{1/M}$. An upper bound sieve implies that
$$|Q_m| \ll \frac{uN}{\phi(m)(\log N)^2} \asymp \frac{N\log\log N}{\phi(m)(\log N)^2\log\log\log N},$$
so that
$$|\mathcal N_2'| \ll \frac{N\log\log N}{(\log N)^2\log\log\log N}\sum_{N^{1/M}/\log\log N<m\le N^{1/M}}\frac1{\phi(m)} \ll \frac{N\log\log N}{(\log N)^2\log\log\log N}\cdot\log\log\log N = o\left(\frac{z}{\log z}\right).$$
So we may pick residue classes $a_p\ (\mathrm{mod}\ p)$ for the primes $p\in(N^{1-1/M},z/2]$ to cover $\mathcal N_1\cup\mathcal N_2'$. We are then left with the challenge of covering $\mathcal N_2''$ using residue classes $a_p\ (\mathrm{mod}\ p)$ for the primes $p\in(z/2,z]$.
Note that if $m\le N^{1/M}/\log\log N$, then
$$\frac Nm - N^{1-1/M} = \frac Nm\left(1-\frac{m}{N^{1/M}}\right) \asymp \frac Nm.$$
So the Fundamental Lemma of Sieve Methods (cf. Lemma 3.3.1) and the Bombieri-Vinogradov theorem imply that
$$|Q_m| \ge \{c+o(1)\}\,e^{-\gamma}\prod_{\substack{p\mid m\\ p>2}}\frac{p-1}{p-2}\cdot\frac{N/m}{\log(N^{1/u})\,\log(N/m)} \ge \{c+o(1)\}\,e^{-\gamma}\prod_{\substack{p\mid m\\ p>2}}\frac{p-1}{p-2}\cdot\frac{N\log\log N}{m(\log N)^2\log\log\log N}$$
for even integers $m\le N^{1/M}/\log\log N$, where $c$ is the twin prime constant, that is to say,
$$c = 2\prod_{p>2}\left(1-\frac2p\right)\left(1-\frac1p\right)^{-2} = 2\prod_{p>2}\left(1-\frac1{(p-1)^2}\right).$$
In particular, writing $m=2n$, we find that
$$|\mathcal N_2''| \ge \frac{c\,e^{-\gamma}}{2}\,\{1+o(1)\}\,\frac{N\log\log N}{(\log N)^2\log\log\log N}\sum_{n\le N^{1/M}/(2\log\log N)}\frac1n\prod_{\substack{p\mid n\\ p>2}}\frac{p-1}{p-2}$$
$$\ge \frac{e^{-\gamma}}{2}\,\{1+o(1)\}\,\frac{N\log\log N}{M(\log N)\log\log\log N} \ge \frac{e^{-\gamma}}{2}\,\{1+o(1)\}\,\frac{N(\log\log N)^2}{(\log N)^2\log\log\log N} \ge \frac{C}{2e^{\gamma}}\cdot\frac{z}{\log z},$$
where the sum over $n$ is estimated using the convolution method (see, for example, the proof of Theorem 0.2.1 and Exercise 0.2.2). Therefore, we need to choose the residue classes $a_p\ (\mathrm{mod}\ p)$ for $p\in(z/2,z]$ in such a way that, on average, each one covers $>C/e^{\gamma}$ elements of $\mathcal N_2''$. In order to do so, we will use the Maynard-Tao sieve weights from Section 7.1 to construct a probability measure on $\prod_{z/2<p\le z}\mathbb Z/p\mathbb Z$ that is biased towards choices of residue classes $a_p\ (\mathrm{mod}\ p)$, $z/2<p\le z$, that each cover many elements of $\mathcal N_2''$ simultaneously. Here is the key result:
Proposition 7.3.1. Fix $\epsilon>0$ and let $m\le N^{1/M}/\log\log N$ be even. If $I_m\subseteq[z/2,z]$ is an interval whose length is between $\epsilon|Q_m|\log N$ and $2\epsilon|Q_m|\log N$, and $C_k>k^5$, then there are choices of residue classes $a_p\ (\mathrm{mod}\ p)$, $p\in I_m$, whose union covers $Q_m$.

Theorem 7.0.5 is now a direct corollary of Proposition 7.3.1 and of the above discussion.
As we mentioned before, our goal is to construct a probability measure on the space $\prod_{p\in I_m}\mathbb Z/p\mathbb Z$. Indeed, we set
$$s_j := p_{\pi(C_k)+j}\prod_{p\le C_k}p,$$
where $C_k$ is a large auxiliary integer to be chosen later. For the convenience of notation, we also set $s_0=0$, and we assume that $C_k>2k$, so that the polynomial
$$Q_{p,m}(x) := \prod_{j=0}^k\big[(x+ps_j)(m(x+ps_j)-1)\big]$$
does not have a fixed prime divisor (recall that $m$ is even here). We further decompose $Q_{p,m}=Q^{(1)}_{p,m}Q^{(2)}_{p,m}$, where
$$Q^{(1)}_{p,m}(x) := \prod_{j=0}^k(m(x+ps_j)-1)\qquad\text{and}\qquad Q^{(2)}_{p,m}(x) := \prod_{j=0}^k(x+ps_j).$$
We also set
$$w = e^{\sqrt{\log N}}$$
and fix two upper bound sieves $(\lambda_j(d))_{d\ge1}$ such that $\lambda_1$ has dimension $k+1$, $\lambda_2$ has dimension 1, and
$$\mathrm{supp}(\lambda_1)\subseteq\{d\le N^{1/100} : p\mid d\Rightarrow w<p\le N^{1/u}\},\qquad \mathrm{supp}(\lambda_2)\subseteq\{d\le N^{1/100} : p\mid d\Rightarrow N^{1/u}<p\le N^{1/1000}\}.$$
Finally, we let $\lambda_{\mathbf d}$ be some sieve parameters supported on $k$-tuples $\mathbf d=(d_1,\dots,d_k)$ with $d_1\cdots d_k\le N^{1/100}$. Eventually, we will take $\lambda_{\mathbf d}$ exactly as in Section 7.1. Then we define the probability density function
$$\rho_{p,m}(a) := \frac{1}{\sigma_{p,m}}\sum_{\substack{n\le N/m,\ n\equiv a\ (\mathrm{mod}\ p)\\ P^-(Q_{p,m}(n))>w}}(1*\lambda_1)\big(Q^{(1)}_{p,m}(n)\big)\,(1*\lambda_2)(n)\Bigg(\sum_{\substack{d_j\mid n+ps_j\\ 1\le j\le k}}\lambda_{\mathbf d}\Bigg)^2,$$
where $\sigma_{p,m}$ is the normalizing constant making $\sum_{a\ (\mathrm{mod}\ p)}\rho_{p,m}(a)=1$.
The choice of the parameters $\lambda_{\mathbf d}$ will be similar to the one leading to the proof of Theorem 7.0.2: we choose them so that many of the numbers $n+ps_j$ are biased towards being primes, while the weight $1_{P^-(Q_{p,m}(n))>w}\,(1*\lambda_1)(Q^{(1)}_{p,m}(n))\,(1*\lambda_2)(n)$ guarantees that our sum is essentially supported on integers $n$ for which $mn-1, m(n+ps_1)-1,\dots,m(n+ps_k)-1$ have no prime factors $\le N^{1/u}$ and $n$ has no prime factors $\le N^{1/1000}$ (so, there is a positive probability that it will be prime). We then have the following crucial estimate.
Lemma 7.3.2. There are choices of $\lambda_{\mathbf d}$ such that the following holds: if $m$ and $I_m$ are as in the statement of Proposition 7.3.1, $q\in Q_m$ with $q>s_kz$, and $N$ is large enough in terms of $k$ and $\epsilon$, then there are absolute constants $c,c'>0$ such that
$$\sum_{p\in I_m}\rho_{p,m}(q) \ge \epsilon\left(c\log k - c'\,\frac{k^3\log k}{C_k}\,e^{G_m(q)}\right),$$
where
$$(7.3.1)\qquad G_m(q) := \sum_{\substack{1\le j,j',j''\le k\\ j,\,j',\,j''\ \text{distinct}}}\ \sum_{\substack{C_k<\ell\le(\log N)^2\\ \ell\,\mid\,qm(s_j-s_{j'})(s_j-s_{j''})}}\frac{2k}{\ell}.$$
Before we turn to the proof of Lemma 7.3.2, let us see how it implies Proposition 7.3.1.

Deduction of Proposition 7.3.1 from Lemma 7.3.2. Let $Q_m'$ be the set of $q\in Q_m$ such that $G_m(q)\le 1$ and $q>s_kz$. Markov's inequality and the sieve imply that
$$|Q_m\setminus Q_m'| \le \sum_{\substack{q\le N/m\\ P^-(mq-1)>N^{1/u}}}G_m(q) + \#\{q\le s_kz : P^-(mq-1)>N^{1/u}\} = o(|Q_m|)$$
for all $m\le N^{1/M}/\log\log N$, where we used the fact that if a prime $\ell>C_k$ divides $(s_j-s_{j'})(s_j-s_{j''})$, then it must be one of the $O(\log C_k)$ divisors of the number $(p_{\pi(C_k)+j}-p_{\pi(C_k)+j'})(p_{\pi(C_k)+j}-p_{\pi(C_k)+j''})$.
Now, let $J_m$ be the left half of the interval $I_m$ and fix $q\in Q_m'$. The probability that $q$ is not covered by a random selection of congruences $(a_p)_{p\in J_m}\in\prod_{p\in J_m}\mathbb Z/p\mathbb Z$, chosen according to the product of the densities $\rho_{p,m}$, is
$$\prod_{p\in J_m}\big(1-\rho_{p,m}(q)\big) \le \exp\Bigg\{-\sum_{p\in J_m}\rho_{p,m}(q)\Bigg\} < \frac{\epsilon}{5}$$
if $N$ and $k$ are large enough in terms of $\epsilon$, by Lemma 7.3.2. Hence, the expected cardinality of the random set
$$R_m\big((a_p)_{p\in J_m}\big) := \Bigg\{q\in Q_m' : q\notin\bigcup_{p\in J_m}\{a_p\ (\mathrm{mod}\ p)\}\Bigg\},$$
which is the uncovered part of $Q_m'$, is $<\epsilon|Q_m|/4$. This implies that there is a choice of $(a_p)_{p\in J_m}$ such that $|R_m((a_p)_{p\in J_m})|<\epsilon|Q_m|/4$; the remaining uncovered elements of $Q_m$ can then be covered using the primes in the right half of $I_m$, one residue class per element.
$n=q-ps_j$ for some $j\in\{1,\dots,k\}$. Note that $q-ps_j\le q\le N/m$ and $q-ps_j\ge q-zs_k>0$, so that the condition $n\in[1,N/m]$ is always satisfied for these integers, and the same is true for the condition $n\equiv q\ (\mathrm{mod}\ p)$. Therefore
$$(7.3.2)\qquad \sum_{p\in I_m}\rho_{p,m}(q) \ge \sum_{j=1}^k\sum_{\substack{p\in I_m\\ P^-(Q_{p,m}(q-ps_j))>w}}\frac{(1*\lambda_1)\big(Q^{(1)}_{p,m}(q-ps_j)\big)\,(1*\lambda_2)(q-ps_j)}{\sigma_{p,m}}\Bigg(\sum_{\substack{d_i\mid q+p(s_i-s_j)\\ 1\le i\le k}}\lambda_{\mathbf d}\Bigg)^2.$$
So we see that we need an upper bound for $\sigma_{p,m}$ and a lower bound for the resulting sums over $p\in I_m$. Anticipating the choice of the parameters $\lambda_{\mathbf d}$, and in order to simplify the statements of the results, we make the change of variables
$$(7.3.3)\qquad \frac{y_{\mathbf a}}{a_1\cdots a_k} := \sum_{\substack{P^-(d_j)>w\\ d_j\equiv 0\ (\mathrm{mod}\ a_j)\\ 1\le j\le k}}\frac{\lambda_{\mathbf d}}{d_1\cdots d_k}.$$
We further set
$$y_{\mathbf a} := \frac{1_{P^-(a_1\cdots a_k)>w}\,\mu^2(a_1\cdots a_k)}{\prod_{\ell\le w}(1-1/\ell)^{k}}\,f\left(\frac{\log a_1}{\log(N^{1/100})},\dots,\frac{\log a_k}{\log(N^{1/100})}\right).$$
As in Section 7.1, we have the inversion formula (7.1.4), which also implies that $M=\max_{\mathbf d}|\lambda_{\mathbf d}|\ll(\log N)^k$ here as well.
Lemma 7.3.3. Let $m\le N$ and $p\in(z/2,z]$. If $C_k>e^k$ and $N$ is large enough in terms of $k$ and $f$, then
$$\sigma_{p,m} \ll e^{H_{p,m}}\left(\frac{e(\log C_k)^2}{100}\right)^k u\,|Q_m|\int f(\mathbf t)^2\,\mathrm d\mathbf t,$$
where the implied constant is absolute and
$$H_{p,m} = \sum_{\substack{1\le i,j\le k\\ i\ne j}}\ \sum_{\substack{C_k<\ell\le(\log N)^2\\ \ell\,\mid\,pm(s_i-s_j)-1}}\frac{2k+2}{\ell}.$$
Proof. All implicit constants might depend on $k$, $f$ and $A$, unless otherwise specified. Write $\mathcal P(s,t)$ for the set of integers all of whose prime factors lie in the interval $(s,t]$. Opening the square and the two convolutions $1*\lambda_1$ and $1*\lambda_2$ in the definition of $\sigma_{p,m}$, we find that
$$\sigma_{p,m} = \sum_{\substack{(d_ie_i,\,d_je_jP(w))=1\\ 1\le i,j\le k,\ i\ne j}}\ \sum_{\substack{f_1\in\mathcal P(w,N^{1/u})\\ f_2\in\mathcal P(N^{1/u},N^{1/1000})}}\lambda_1(f_1)\lambda_2(f_2)\lambda_{\mathbf d}\lambda_{\mathbf e}\,\#\left\{n\le\frac Nm:\ \begin{array}{l}Q^{(1)}_{p,m}(n)\equiv 0\ (\mathrm{mod}\ f_1),\ n\equiv 0\ (\mathrm{mod}\ f_2),\\ n+ps_j\equiv 0\ (\mathrm{mod}\ [d_j,e_j]),\ 1\le j\le k,\\ P^-(Q_{p,m}(n))>w\end{array}\right\}.$$
On the other hand, if $(f_1f_2,d_1e_1\cdots d_ke_k)>1$, then we note that there must exist a prime $r>w$ dividing $f_1f_2$ and $d_1e_1\cdots d_ke_k$. In particular, $r\mid Q^{(1)}_{p,m}(n)$ and $r\mid n+ps_i$ for some $i\in\{1,\dots,k\}$, which implies that $r\mid Q^{(1)}_{p,m}(-ps_i)$. Therefore, in this case the cardinality of the above set of integers $n$ is
$$\ll \sum_{i=1}^k\ \sum_{\substack{r\mid(f_1f_2,\,d_1e_1\cdots d_ke_k)\\ r\mid Q^{(1)}_{p,m}(-ps_i),\ r>w}}\frac{N\,(k+1)^{\Omega(f_1)}}{m\,[f_1f_2,[d_1,e_1]\cdots[d_k,e_k]]},$$
since $Q^{(1)}_{p,m}(-ps_i)$ has $\ll\log N$ prime factors in total. Next, we remove again the condition $(f_1f_2,d_1e_1\cdots d_ke_k)=1$ from our new formula for $\sigma_{p,m}$. Since any common prime factor of $f_1f_2$ and $d_1e_1\cdots d_ke_k$ must be $>w$, by the fact that $(d_1e_1\cdots d_ke_k,P(w))=1$, this produces an error term of size $\ll N(\log N)^{O(1)}/w$, which is admissible. In conclusion,
$$\sigma_{p,m} = \frac Nm\,S_1S_2\prod_{\ell\le w}\left(1-\frac{\nu_{p,m}(\ell)}{\ell}\right)\sum_{\substack{d_j,e_j\\ (d_ie_i,\,d_je_jP(w))=1\\ 1\le i,j\le k,\ i\ne j}}\frac{\lambda_{\mathbf d}\lambda_{\mathbf e}}{[d_1,e_1]\cdots[d_k,e_k]} + O\left(\frac{N}{m(\log N)^{10}}\right),$$
where $\nu_{p,m}(\ell)$ and $\nu^{(1)}_{p,m}(\ell)$ denote the number of roots of $Q_{p,m}$ and of $Q^{(1)}_{p,m}$ modulo $\ell$, respectively, and
$$S_1 := \sum_{f_1\in\mathcal P(w,N^{1/u})}\frac{\lambda_1(f_1)\,\nu^{(1)}_{p,m}(f_1)}{f_1}\qquad\text{and}\qquad S_2 := \sum_{f_2\in\mathcal P(N^{1/u},N^{1/1000})}\frac{\lambda_2(f_2)}{f_2}.$$
by Mertens's formula. The implied constants in the second and third lines are absolute, provided that $N$ is large enough in terms of $k$. Moreover, the Fundamental Lemma of Sieve Methods (cf. Lemma 3.3.3) implies that¹
$$S_1 \asymp \prod_{w<\ell\le N^{1/u}}\left(1-\frac{\nu^{(1)}_{p,m}(\ell)}{\ell}\right) \asymp \left(\frac{\log w}{\log(N^{1/u})}\right)^{k+2},$$
since $\nu^{(1)}_{p,m}(\ell)=k+2$ unless $\ell$ is one of the $O(\log N)$ prime divisors of the discriminant of the polynomial $Q_{p,m}$, and
$$S_2 \asymp \prod_{N^{1/u}<\ell\le N^{1/1000}}\left(1-\frac1\ell\right) \asymp \frac1u.$$
Both constants are absolute here, as long as $N$ is large enough in terms of $k$. Finally, note that the definition of $s_j$ implies that $\nu_{p,m}(\ell)=1+1_{\ell\nmid m}$ for primes $\ell\le C_k$. Moreover,
¹Strictly speaking, Lemma 3.3.3 cannot be used directly to estimate $S_1$ and $S_2$. It is easy to deduce the claimed estimates from it though. For example, we may take $\lambda_1:=\tilde\lambda_1\cdot 1_{P^-(n)>w}$, where $\tilde\lambda_1$ is an upper bound sieve supported on the set $\{d\le N^{1/100} : d\mid P(N^{1/u})\}$, and apply Lemma 3.3.3 with the multiplicative function $g(n)=1_{P^-(n)>w}/n$. The sum $S_2$ is handled in an analogous way.
if $C_k<\ell\le w$, then $\nu_{p,m}(\ell)=2k+2$, unless $\ell$ divides one of the numbers $s_i-s_j$ or $mp(s_i-s_j)-1$, for some $0\le i,j\le k$ with $i\ne j$. Since
$$\sum_{\substack{0\le i,j\le k\\ i\ne j}}\ \sum_{\substack{C_k<\ell\le w\\ \ell\mid s_i-s_j}}\frac k\ell \ll \frac{k^3\log C_k}{C_k}$$
and
$$\sum_{\substack{0\le i,j\le k\\ i\ne j}}\ \sum_{\substack{(\log N)^2<\ell\le w\\ \ell\mid mp(s_i-s_j)-1}}\frac k\ell \ll \frac{k^3\log N}{(\log N)^2},$$
Proof. The proof is a combination of the proofs of Lemmas 7.1.2 and 7.3.3. We outline the argument when $j_0=k$, the other cases being similar. All implicit constants might depend on $k$, $f$ and $A$, unless otherwise specified. Also, recall the notation $\mathcal P(s,t)$ from the proof of Lemma 7.3.3.
Note that since $q$ is a prime $>z\ge N^{1/100}$, it cannot have any divisors in $(1,N^{1/100}]$. In particular, we must have that $d_k=1$. Similarly, since $q\in Q_m$ and $q>N^{1/u}$, we know that $P^-(q(qm-1))>N^{1/u}$. Therefore the condition $P^-(Q_{p,m}(q-ps_k))>w$ reduces to $P^-(\widetilde Q_{p,m}(q-ps_k))>w$, where
$$\widetilde Q_{p,m}(x) := \prod_{i=0}^{k-1}\big[(x+ps_i)(m(x+ps_i)-1)\big].$$
In the same fashion, we have that $(1*\lambda_1)(Q^{(1)}_{p,m}(q-ps_k)) = (1*\lambda_1)(\widetilde Q^{(1)}_{p,m}(q-ps_k))$ by our assumption that $\lambda_1$ is supported on integers from $\mathcal P(w,N^{1/u})$, where
$$\widetilde Q^{(1)}_{p,m}(x) := \prod_{i=0}^{k-1}(m(x+ps_i)-1).$$
So, if we set
$$S = \sum_{\substack{p\in I_m,\ p\equiv b_0\ (\mathrm{mod}\ \ell_0)\\ P^-(Q_{p,m}(q-ps_k))>w}}(1*\lambda_1)\big(Q^{(1)}_{p,m}(q-ps_k)\big)\,(1*\lambda_2)(q-ps_k)\Bigg(\sum_{\substack{d_i\mid q+p(s_i-s_k)\\ 1\le i<k}}\lambda_{\mathbf d,1}\Bigg)^2$$
and
$$D = \prod_{j=1}^k\prod_{i=0}^k\big((s_k-s_j)(mq-1)-q(s_k-s_i)\big),$$
then, arguing as in Lemma 7.1.2, we may evaluate $S$ by the Fundamental Lemma of Sieve Methods, where
$$\tilde\nu(d) := \#\{n\ (\mathrm{mod}\ d) : \widetilde Q_{n,m}(q-ns_k)\equiv 0\ (\mathrm{mod}\ d)\}.$$
Next, we remove again the condition that $(f_1f_2,d_1e_1\cdots d_{k-1}e_{k-1})=1$, as in Lemma 7.3.3, at the cost of a total error that is $\ll N(\log N)^{O(1)}/(mw)$, which is of admissible size. This separates the variables $f_1,f_2$ from each other and from $d_1,e_1,\dots,d_{k-1},e_{k-1}$, and we deduce that
$$S = \frac{\#\{p\in I_m\}}{\phi(\ell_0)}\,S_1S_2\prod_{\substack{\ell\le w\\ \ell\ne\ell_0}}\left(1-\frac{\tilde\nu(\ell)}{\ell-1}\right)\sum_{\substack{d_i,e_i\\ (d_ie_i,\,d_je_jP(w))=1\\ 1\le i,j<k,\ i\ne j}}\frac{\lambda_{\mathbf d,1}\lambda_{\mathbf e,1}}{\phi([d_1,e_1])\cdots\phi([d_{k-1},e_{k-1}])} + O\left(\frac{N}{m(\log N)^A}\right),$$
where
$$S_1 := \sum_{f_1}\frac{\tilde\nu^{(1)}(f_1)\,\lambda_1(f_1)}{\phi(f_1)}\qquad\text{and}\qquad S_2 := \sum_{f_2}\frac{\lambda_2(f_2)}{\phi(f_2)}.$$
The sum over $\mathbf d$ and over $\mathbf e$ is estimated as in Section 7.1: we have that
$$\sum_{\substack{d_i,e_i\\ (d_ie_i,\,d_je_jP(w))=1\\ 1\le i,j<k,\ i\ne j}}\frac{\lambda_{\mathbf d,1}\lambda_{\mathbf e,1}}{\phi([d_1,e_1])\cdots\phi([d_{k-1},e_{k-1}])} = \sum_{P^-(a_1\cdots a_{k-1})>w}\frac{1}{a_1\cdots a_{k-1}}\Bigg(\sum_{P^-(b)>w}\frac{\mu(b)\,y_{\mathbf a,b}}{b}\Bigg)^2 + O\left(\frac{(\log N)^{O(1)}}{w}\right)$$
$$\gg \big(\log(N^{1/100})\big)^{k+1}\prod_{\ell\le w}\left(1-\frac1\ell\right)^{k-1}\int\Bigg(\int f(\mathbf t)\,\mathrm dt_k\Bigg)^2\mathrm dt_1\cdots\mathrm dt_{k-1}$$
$$\gg \frac{e^{-\gamma k}\,(\log N)^{k+1}}{100^k\,(\log w)^{k-1}}\int\Bigg(\int f(\mathbf t)\,\mathrm dt_k\Bigg)^2\mathrm dt_1\cdots\mathrm dt_{k-1}$$
by Mertens's formula, provided that $N$ is large enough in terms of $k$. The implied constants in the third and fourth lines are absolute. Moreover, the Fundamental Lemma of Sieve Methods (cf. Lemma 3.3.3) implies that
$$S_1 \asymp \prod_{w<\ell\le N^{1/u}}\left(1-\frac{\tilde\nu^{(1)}(\ell)}{\ell-1}\right) \asymp \left(\frac{\log w}{\log(N^{1/u})}\right)^{k+1},$$
where the constant is uniform in $k$ if $N$ is large enough, since $\tilde\nu^{(1)}(\ell)=k+1$ unless $\ell$ is one of the $O(\log N)$ divisors of the discriminant of the polynomial $n\mapsto\widetilde Q_{n,m}(q-ns_k)$, and that
$$S_2 \asymp \prod_{N^{1/u}<\ell\le N^{1/1000}}\left(1-\frac1{\ell-1}\right) \asymp \frac1u.$$
Moreover, note that $\tilde\nu(\ell)=0$ if $\ell\le C_k$, since $q(qm-1)$ has no divisors in $(1,N^{1/u}]$. This implies that
$$\prod_{\substack{C_k<\ell\le w\\ \ell\ne\ell_0}}\left(1-\frac{\tilde\nu(\ell)}{\ell-1}\right) = \prod_{\substack{C_k<\ell\le w\\ \ell\ne\ell_0}}\left(1-\frac{2k}{\ell}\right)\cdot\prod_{\substack{C_k<\ell\le w\\ \ell\ne\ell_0}}\frac{1-\tilde\nu(\ell)/(\ell-1)}{1-2k/\ell} \gg \left(\frac{\log C_k}{\log w}\right)^{2k}\prod_{\substack{C_k<\ell\le w\\ \tilde\nu(\ell)<2k}}\frac{1-\tilde\nu(\ell)/\ell}{1-2k/\ell}.$$
Note that
$$\widetilde Q_{n,m}(q-ns_k) = \prod_{j=0}^{k-1}\big[(q+n(s_j-s_k))(mq-1+mn(s_j-s_k))\big].$$
Finally, note that if $\ell\mid s_j-s_{j'}$ with $\ell>C_k$, then $\ell$ must be one of the $O(\log C_k)$ divisors of the number $p_{\pi(C_k)+j}-p_{\pi(C_k)+j'}$. Since $k^3(\log C_k)/C_k\le 1$ by our assumption that $C_k>e^k$, the lemma follows.
Proof of Lemma 7.3.2. Since we are looking for a lower bound, we may restrict the summation in $\sum_{p\in I_m}\rho_{p,m}(q)$ to those primes $p\in I_m$ with $H_{p,m}\le 1$, where $H_{p,m}$ is defined in the statement of Lemma 7.3.3. Then relation (7.3.2) and Lemma 7.3.3 imply that
$$\sum_{p\in I_m}\rho_{p,m}(q) \ge \frac{\Sigma_1-\Sigma_2}{e\left(\frac{e(\log C_k)^2}{100}\right)^k u\,|Q_m|\int f(\mathbf t)^2\,\mathrm d\mathbf t},$$
where
$$\Sigma_1 := \sum_{j=1}^k\sum_{\substack{p\in I_m\\ P^-(Q_{p,m}(q-ps_j))>w}}(1*\lambda_1)\big(Q^{(1)}_{p,m}(q-ps_j)\big)\,(1*\lambda_2)(q-ps_j)\Bigg(\sum_{\substack{d_i\mid q+p(s_i-s_j)\\ 1\le i\le k}}\lambda_{\mathbf d}\Bigg)^2$$
and
$$\Sigma_2 := \sum_{j=1}^k\sum_{\substack{p\in I_m,\ H_{p,m}>1\\ P^-(Q_{p,m}(q-ps_j))>w}}(1*\lambda_1)\big(Q^{(1)}_{p,m}(q-ps_j)\big)\,(1*\lambda_2)(q-ps_j)\Bigg(\sum_{\substack{d_i\mid q+p(s_i-s_j)\\ 1\le i\le k}}\lambda_{\mathbf d}\Bigg)^2.$$
The lower bound of Lemma 7.3.4 implies that
$$\Sigma_1 \gg \epsilon\left(\frac{e(\log C_k)^2}{100}\right)^k u\,|Q_m|\sum_{j=1}^k\int\Bigg(\int f(\mathbf t)\,\mathrm dt_j\Bigg)^2\mathrm dt_1\cdots\mathrm dt_{j-1}\,\mathrm dt_{j+1}\cdots\mathrm dt_k,$$
whereas Markov's inequality and the upper bound in Lemma 7.3.4 imply that
$$\Sigma_2 \le \sum_{j=1}^k\sum_{\substack{p\in I_m,\ H_{p,m}>1\\ P^-(Q_{p,m}(q-ps_j))>w}}H_{p,m}\,(1*\lambda_1)\big(Q^{(1)}_{p,m}(q-ps_j)\big)\,(1*\lambda_2)(q-ps_j)\Bigg(\sum_{\substack{d_r\mid q+p(s_r-s_j)\\ 1\le r\le k}}\lambda_{\mathbf d}\Bigg)^2$$
$$= \sum_{j=1}^k\sum_{\substack{0\le i,i'\le k\\ i\ne i'}}\ \sum_{\substack{C_k<\ell\le(\log N)^2\\ \ell\nmid m(s_i-s_{i'})}}\frac{2k+2}{\ell}\sum_{\substack{p\in I_m,\ pm(s_i-s_{i'})\equiv 1\ (\mathrm{mod}\ \ell)\\ P^-(Q_{p,m}(q-ps_j))>w}}(1*\lambda_1)\big(Q^{(1)}_{p,m}(q-ps_j)\big)\,(1*\lambda_2)(q-ps_j)\Bigg(\sum_{\substack{d_r\mid q+p(s_r-s_j)\\ 1\le r\le k}}\lambda_{\mathbf d}\Bigg)^2$$
$$\ll \frac{\epsilon\,k^3e^{G_m(q)}}{C_k}\left(\frac{e(\log C_k)^2}{100}\right)^k u\,|Q_m|\sum_{j=1}^k\int\Bigg(\int f(\mathbf t)\,\mathrm dt_j\Bigg)^2\mathrm dt_1\cdots\mathrm dt_{j-1}\,\mathrm dt_{j+1}\cdots\mathrm dt_k.$$
Putting together the above estimates and choosing $f$ such that the quantity
$$\sum_{j=1}^k\frac{\int\big(\int f(\mathbf t)\,\mathrm dt_j\big)^2\,\mathrm dt_1\cdots\mathrm dt_{j-1}\,\mathrm dt_{j+1}\cdots\mathrm dt_k}{\int f(\mathbf t)^2\,\mathrm d\mathbf t}$$
is $\gg\log k$, which is possible by Lemma 7.1.6, completes the proof of Lemma 7.3.2.
7.4 Cramér's model
We conclude this chapter with a heuristic discussion of the local distribution of primes and, in particular, of the gaps between them. In order to do so, we introduce the so-called Cramér model for the prime numbers and its refinement due to Granville. This model turns out to be very effective in making accurate predictions about the distribution of primes (local and global), unlike the Kubilius model which, as we saw in Section 2.2, is not very successful in this task.
First of all, let us recall a quantitative form of the Prime Number Theorem:
$$\pi(x) = \mathrm{li}(x) + O_A\left(\frac{x}{(\log x)^A}\right) = \int_2^x\frac{\mathrm dt}{\log t} + O_A\left(\frac{x}{(\log x)^A}\right)\qquad(x\ge 2).$$
This estimate can also be interpreted as saying that the density of primes around $x$ is about $1/\log x$ (this is in fact what Gauss had conjectured). So an integer $n$ should be prime with probability about $1/\log n$. Of course, the primes are deterministic objects, so this statement is obviously false. However, modelling them this way leads to surprisingly accurate estimates. More precisely, we define a sequence of Bernoulli random variables $Y_2,Y_3,\dots$ such that
$$(7.4.1)\qquad \mathrm{Prob}(Y_n=1) = \frac1{\log n},\qquad \mathrm{Prob}(Y_n=0) = 1-\frac1{\log n}.$$
Furthermore, we assume that the variables $Y_n$ are independent from each other, since a priori there should not be any correlation between two integers being prime. This is the so-called Cramér model of the prime numbers, which we can use to make predictions about various questions regarding the primes.
For example, the random variable
$$\pi(x) = \sum_{2\le n\le x} Y_n$$
models the prime-counting function. Its mean and variance satisfy
$$\mathrm{E}[\pi(x)] = \sum_{2\le n\le x} \frac{1}{\log n} = \mathrm{li}(x) + O(1)$$
and
$$\mathrm{Var}[\pi(x)] = \sum_{2\le n\le x} \mathrm{Var}[Y_n] = \sum_{2\le n\le x} \frac{1}{\log n}\Big(1 - \frac{1}{\log n}\Big) \asymp \frac{x}{\log x}.$$
A standard concentration argument then shows that
$$\pi(x) = \mathrm{li}(x) + O_\varepsilon(x^{1/2+\varepsilon})$$
almost surely, for every fixed $\varepsilon > 0$. In particular, with probability 1, the Riemann Hypothesis is true for our model $\pi(x)$.
Next, we use Cramér's model to study the distribution of the difference of two consecutive primes. Let $p_1 < p_2 < \cdots$ denote the sequence of prime numbers in increasing order. Then, according to Cramér's model, a model for
$$\#\{p_m \le x : p_{m+1} = p_m + k\}$$
is given by
$$\mathrm{E}\Big[\sum_{n\le x} Y_n Y_{n+k} \prod_{j=1}^{k-1}(1 - Y_{n+j})\Big] = \sum_{n\le x} \frac{1}{(\log n)\log(n+k)} \prod_{j=1}^{k-1}\Big(1 - \frac{1}{\log(n+j)}\Big) \approx \frac{x}{(\log x)^2}\Big(1 - \frac{1}{\log x}\Big)^{k-1} \approx \frac{x}{(\log x)^2}\, e^{-k/\log x},$$
as $x \to \infty$. This leads to the prediction that the sequence $(p_{m+1} - p_m)/\log m$ should follow an exponential distribution.
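These heuristics are easy to probe numerically. The following sketch (with illustrative parameters of our own choosing; it is not part of the argument) draws one sample of the variables $Y_n$ from (7.4.1) and checks that the normalized gaps between consecutive "primes" of the model have mean close to 1, as an exponential law with unit mean predicts.

```python
import math
import random

random.seed(12345)

# One sample of the Cramer model: Y_n = 1 with probability 1/log n (n >= 3).
sample = [n for n in range(3, 2 * 10**6) if random.random() < 1 / math.log(n)]

# Normalized gaps (p_{m+1} - p_m) / log p_m; an Exp(1) law has mean 1.
gaps = [(q - p) / math.log(p) for p, q in zip(sample, sample[1:])]
mean_gap = sum(gaps) / len(gaps)
print(round(mean_gap, 2))  # close to 1
```

One can push this further and compare the empirical distribution of the normalized gaps with $e^{-t}$, which already matches well at this scale.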
Now, let us use this model to study another problem: counting twin primes. A model for the number of twin primes $(p, p+2)$ with $p \le x$ is then $\pi_2(x) = \sum_{2\le n\le x} Y_n Y_{n+2}$. We have
$$\mathrm{E}[\pi_2(x)] = \sum_{2\le n\le x} \mathrm{E}[Y_n Y_{n+2}] = \sum_{2\le n\le x} \mathrm{E}[Y_n]\,\mathrm{E}[Y_{n+2}] = \sum_{2\le n\le x} \frac{1}{(\log n)\log(n+2)} \sim \frac{x}{\log^2 x},$$
as $x \to \infty$. So this leads to the prediction that the number of twin primes $(p, p+2)$ with $p \le x$ is asymptotic to $x/\log^2 x$. However, it is widely believed that
$$(7.4.2)\qquad \#\{n \le x : (n, n+2) \text{ are twin primes}\} \sim 2 \prod_{p>2} \Big(1 - \frac{2}{p}\Big)\Big(1 - \frac{1}{p}\Big)^{-2} \cdot \frac{x}{\log^2 x}.$$
The reason why Cramér's model fails to give the right asymptotic formula for the number of twin primes is the assumption that the variables $Y_n$ are independent from each other. Indeed, two consecutive integers larger than 2 can never be prime simultaneously, so there is a strong correlation between $Y_n$ and $Y_{n+1}$. Similarly, if $n > 3$, then $n$, $n+2$ and $n+4$ cannot be simultaneously prime, since at least one of them is divisible by 3; so the variables $Y_n$, $Y_{n+2}$ and $Y_{n+4}$ are correlated. Similar constraints arise for all small primes, and we need to adjust our model appropriately if we want to make accurate predictions.
The way we modify Cramér's model is by ensuring that $n$ and $n+2$ lie in the right residue classes modulo all small primes. This is done by applying what is called a preliminary sieve, restricting $n$ and $n+2$ a priori to have no small prime factors. This increases the probability that the randomly chosen $n$ and $n+2$ are both prime. More precisely, we set $N = \{n \in \mathbb{N} : P^-(n) > z\}$
and we consider a new sequence of independent Bernoulli random variables $\{Z_n\}_{n\in N}$ such that
$$(7.4.3)\qquad \mathrm{Prob}(Z_n = 1) = \frac{1}{\log n} \prod_{p\le z}\Big(1 - \frac{1}{p}\Big)^{-1}, \qquad \mathrm{Prob}(Z_n = 0) = 1 - \frac{1}{\log n} \prod_{p\le z}\Big(1 - \frac{1}{p}\Big)^{-1}.$$
(Strictly speaking, we have to assume that n is large enough in terms of z, otherwise the
above probabilities might not be numbers in [0, 1]. This is a minor technical assumption.)
Then our model for the number of twin primes up to x is the function
$$\tilde\pi_2(x) = \sum_{\substack{n\le x\\ n,\,n+2\in N}} Z_n Z_{n+2}.$$
We have that
$$\mathrm{E}[\tilde\pi_2(x)] = \sum_{\substack{n\le x\\ n,\,n+2\in N}} \mathrm{E}[Z_n Z_{n+2}] = \sum_{\substack{n\le x\\ n,\,n+2\in N}} \mathrm{E}[Z_n]\,\mathrm{E}[Z_{n+2}] = \sum_{\substack{n\le x\\ n,\,n+2\in N}} \frac{1}{(\log n)\log(n+2)} \prod_{p\le z}\Big(1 - \frac{1}{p}\Big)^{-2} \sim \frac{x}{\log^2 x} \prod_{p\le z} \Big(1 - \frac{\nu(p)}{p}\Big)\Big(1 - \frac{1}{p}\Big)^{-2},$$
where
$$\nu(p) := \#\{m \ (\mathrm{mod}\ p) : m(m+2) \equiv 0 \ (\mathrm{mod}\ p)\} = \begin{cases} 1 & \text{if } p = 2,\\ 2 & \text{otherwise,}\end{cases}$$
since $\#\{n\le x : n, n+2 \in N\} \sim x\prod_{p\le z}(1 - \nu(p)/p)$ by the Fundamental Lemma of Sieve Methods (cf. Lemma 3.3.1). So we arrive at the refined prediction that
$$\#\{n \le x : (n, n+2) \text{ are twin primes}\} \sim 2 \prod_{2<p\le z} \Big(1 - \frac{2}{p}\Big)\Big(1 - \frac{1}{p}\Big)^{-2} \cdot \frac{x}{\log^2 x}.$$
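The singular series above converges quickly, so the constant in the corrected prediction, the twin prime constant $2\prod_{p>2}(1-2/p)(1-1/p)^{-2} \approx 1.3203$, is easy to approximate; the following sketch (truncation point chosen arbitrarily) does so with a finite product.

```python
def primes_up_to(n):
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

# Truncated singular series 2 * prod_{2 < p <= 10^6} (1 - 2/p)(1 - 1/p)^(-2).
twin_const = 2.0
for p in primes_up_to(10**6):
    if p > 2:
        twin_const *= (1 - 2 / p) / (1 - 1 / p) ** 2
print(round(twin_const, 4))  # approximately 1.3203
```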
Finally, it is possible to use Cramér's model (or its refinement) to make predictions about how big the difference of two consecutive primes can be. To do this, we appeal to the Borel–Cantelli lemma. Before we state this lemma, recall that, for a sequence of sets $\{A_n\}_{n=1}^{\infty}$, all of which are subsets of some ambient space $\Omega$, we have that
$$\limsup_{n\to\infty} A_n = \bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m = \{\omega \in \Omega : \omega \in A_m \text{ for infinitely many } m\}$$
and
$$\liminf_{n\to\infty} A_n = \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} A_m = \{\omega \in \Omega : \omega \in A_m \text{ for all but finitely many } m\}.$$
Lemma 7.4.2 (Borel–Cantelli). Let $(\Omega, \mathcal{F}, \mathrm{Prob})$ denote a probability space and consider $\{A_n\}_{n=1}^{\infty}$, a sequence of events in the $\sigma$-algebra $\mathcal{F}$.
(1) If $\sum_{n=1}^{\infty} \mathrm{Prob}(A_n) < \infty$, then $\mathrm{Prob}(\limsup_{n\to\infty} A_n) = 0$.
(2) If the events $A_n$ are independent and $\sum_{n=1}^{\infty} \mathrm{Prob}(A_n) = \infty$, then $\mathrm{Prob}(\limsup_{n\to\infty} A_n) = 1$.
Now, let $\Omega$ be the sample space where the random variables $Y_n$, defined by (7.4.1), live. Given an increasing function $h : \mathbb{N} \to [1, \infty)$, we set
$$A_n(h) = \{\omega \in \Omega : Y_n = 1,\ Y_{n+1} = \cdots = Y_{n+\lfloor h(n)\rfloor} = 0\},$$
the event that n is "prime" in our model but none of the next $\lfloor h(n)\rfloor$ integers is, and we ask whether
$$(7.4.4)\qquad \mathrm{Prob}\big(\limsup_{n\to\infty} A_n(h)\big) = 0,$$
that is to say, whether almost surely all but finitely many of the events $A_n(h)^c$ occur simultaneously. If so, this leads to the prediction that, for all but finitely many n, we have that $p_{n+1} - p_n \le h(n)$.
Assume that $h(n) \ll n/\log n$ for all n, since we already know by the Prime Number Theorem that $p_{n+1} = p_n + O(p_n/\log^2 p_n)$. Then
$$\mathrm{Prob}(A_n(h)) = \frac{1}{\log n} \prod_{j=1}^{\lfloor h(n)\rfloor} \Big(1 - \frac{1}{\log(n+j)}\Big) \asymp \frac{1}{\log n}\, e^{-h(n)/\log n}.$$
In particular, if $h(n) = (1+\varepsilon)\log^2 n$ for some fixed $\varepsilon > 0$, then $\mathrm{Prob}(A_n(h)) \asymp n^{-1-\varepsilon}/\log n$, so that $\sum_n \mathrm{Prob}(A_n(h)) < \infty$. So applying Lemma 7.4.2, we deduce that (7.4.4) holds for this choice of h. Consequently, the discussion of the previous paragraph leads to the prediction that, for every fixed $\varepsilon > 0$, we have that $p_{n+1} - p_n \le (1+\varepsilon)\log^2 n$ for $n \ge n_0(\varepsilon)$.
Using a similar but slightly more involved argument, we arrive at the prediction that $p_{n+1} - p_n > (1-\varepsilon)\log^2 n$ for infinitely many n. Let
$$m_n = \Big\lfloor \sum_{k=2}^{n} (1-\varepsilon)\log^2 k \Big\rfloor = (1-\varepsilon)\big(n\log^2 n - 2n\log n + 2n\big) + O(\log^2 n),$$
and let $B_n(h)$ be the event that exactly one integer $m \in [m_n, m_{n+1})$ has $Y_m = 1$. A computation similar to the one for $\mathrm{Prob}(A_n(h))$ shows that $\mathrm{Prob}(B_n(h)) \gg n^{-1+\varepsilon/2}$, and consequently
$$\sum_{n\ge 10} \mathrm{Prob}(B_n(h)) = \infty.$$
Since the events Bn (h) are independent, by our assumption that the random variables Yn
are independent, Lemma 7.4.2 implies that
$$\mathrm{Prob}\big(\limsup_{n\to\infty} B_n(h)\big) = 1,$$
that is to say, infinitely many of the events Bn (h) occur simultaneously almost surely. This
leads to the prediction that infinitely many of the intervals [mn , mn+1 ) contain precisely one
prime number, or equivalently, that pn+1 pn > (1 ) log2 n for infinitely many n, as we
mentioned above.
Putting together the above predictions, Cramér was led to conjecture that
$$\limsup_{n\to\infty} \frac{p_{n+1} - p_n}{(\log n)^2} = 1.$$
Later, building on the work of Maier [Mai85], Granville [Gr95] gave evidence that this
conjecture might not be true. More precisely, using the refinement of Cramér's model that we gave above, he conjectured that
$$(7.4.5)\qquad \limsup_{n\to\infty} \frac{p_{n+1} - p_n}{(\log n)^2} \ge 2e^{-\gamma} = 1.12291896713\ldots > 1.$$
Exercise 7.4.3. Use the refinement of Cramér's model given by (7.4.3) to give evidence in support of (7.4.5).
Chapter 8
Irregularities in the distribution of primes
So far we have concentrated our efforts on proving that the primes behave in the expected way. As discussed in Section 7.4, we expect the maximal gap between two consecutive primes $p_n$ and $p_{n+1}$ to be of the order of $(\log p_n)^2$. So, if $y \ge (\log x)^{2+\varepsilon}$, then it seems natural to conjecture that the interval $(x, x+y]$ contains the right amount of prime numbers, that is to say,
$$(8.0.1)\qquad \pi(x+y) - \pi(x) \sim \frac{y}{\log x},$$
as $x \to \infty$.
Theorem 8.0.4 (Selberg). Assume that the Riemann Hypothesis is true. For every fixed $\varepsilon > 0$ and $\delta > 0$, we have that
$$\lim_{X\to\infty} \frac{1}{X}\,\mathrm{meas}\Big\{X \le x \le 2X : \Big|\pi(x+y) - \pi(x) - \frac{y}{\log x}\Big| \le \delta\,\frac{y}{\log x},\ y = (\log x)^{2+\varepsilon}\Big\} = 1.$$
However, in 1985 Maier arrived at the groundbreaking conclusion that (8.0.1) fails infinitely often when y is a fixed power of $\log x$:
Remark 8.0.6. Note that if relation (8.0.2) holds for some fixed exponent $\lambda_0$, then it also holds for all $\lambda \in (0, \lambda_0]$ by the pigeonhole principle.
We will show Theorem 8.0.5 in Section 8.2. Before this, in Section 8.1, we will introduce
and study the so-called Buchstab function, which is central in the proof of Theorem 8.0.5.
We know that
$$(8.1.1)\qquad \Phi(x, z) = 1 + \pi(x) - \pi(z) = \int_z^x \frac{dt}{\log t} + O\big(x\,e^{-c\sqrt{\log x}}\big) = \frac{x}{\log x} + O\Big(\frac{x}{(\log x)^2} + \frac{z}{\log z}\Big) \qquad (\sqrt{x} \le z \le x).$$
On the other hand, Theorem 3.3.2 and Mertens' estimate imply that
$$(8.1.2)\qquad \Phi(x, z) = \frac{e^{-\gamma}\,x}{\log z}\Big(1 + O\Big(\frac{1}{\log z}\Big) + O\big(e^{-s\log s + O(s)}\big)\Big) \qquad (1 \le z \le \sqrt{x},\ x = z^s).$$
We want to understand the transition from (8.1.1) to (8.1.2) as z becomes smaller compared to x. As in Chapter 6, in order to achieve this, we use a variation of Buchstab's identity:
$$(8.1.3)\qquad \Phi(x, z) = \Phi(x, z') + \sum_{z < p \le z'} \Phi(x/p, p) \qquad (1 \le z \le z').$$
When $x^{1/3} \le z < x^{1/2}$, applying formula (8.1.3) with $z' = \sqrt{x}$ yields
$$\Phi(x, z) = \Phi(x, \sqrt{x}) + \sum_{z < p \le \sqrt{x}} \Phi(x/p, p)$$
$$= \frac{x}{\log x} + O\Big(\frac{x}{(\log x)^2}\Big) + \sum_{z<p\le\sqrt{x}} \Big[\frac{x/p}{\log(x/p)} + O\Big(\frac{x/p}{\log^2(x/p)} + \frac{p}{\log p}\Big)\Big]$$
$$(8.1.4)\qquad = \frac{x}{\log x} + x\int_z^{\sqrt{x}} \frac{du}{u\,\log(x/u)\,\log u} + O\Big(\frac{x}{(\log x)^2}\Big)$$
$$= \frac{x}{\log x} + x\int_z^{\sqrt{x}} \frac{1}{\frac{\log x}{\log u} - 1}\cdot\frac{du}{u(\log u)^2} + O\Big(\frac{x}{(\log x)^2}\Big)$$
$$= \frac{x}{\log x} + \frac{x}{\log x}\int_2^{\log x/\log z} \frac{dt}{t - 1} + O\Big(\frac{x}{(\log x)^2}\Big).$$
Motivated by relations (8.1.1), (8.1.2), (8.1.3) and (8.1.4), we define Buchstab's function $\omega : [0, +\infty) \to \mathbb{R}$ by letting $\omega(u) = 0$ for $u \le 1$, $\omega(u) = 1/u$ for $1 < u \le 2$, and then defining $\omega$ inductively for $u > 2$ via the relation
$$(8.1.5)\qquad \omega(u) = \frac{1}{u} + \frac{1}{u}\int_1^{u-1} \omega(t)\,dt.$$
This defines a continuous and differentiable function for $u > 2$. The following theorem gives the expected relation between $\Phi(x, z)$ and $\omega(u)$.
Theorem 8.1.1. For $2 \le z \le \sqrt{x}$ with $x = z^u$, we have that
$$\Phi(x, z) = \frac{x\,\omega(u)}{\log z} + O\Big(\frac{x}{(\log z)^2}\Big).$$
Before we prove Theorem 8.1.1, we discuss briefly the properties of Buchstab's function. By induction and relation (8.1.5), we immediately find that
$$(8.1.6)\qquad \frac{1}{u} \le \omega(u) \le 1 \qquad (u > 1).$$
Moreover, multiplying (8.1.5) by u and differentiating the resulting identity, we find that
$$(8.1.7)\qquad u\,\omega'(u) = \omega(u-1) - \omega(u) \qquad (u > 2).$$
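The delay equation (8.1.5) is easy to integrate numerically. The sketch below (step size chosen for illustration) reproduces the closed form $\omega(3) = (1+\log 2)/3$ and shows $\omega(u)$ settling rapidly near $e^{-\gamma} \approx 0.5615$, in line with the limit computed later in this section.

```python
import math

h = 1e-3                                  # grid step on [1, 10]
n = int(9 / h)
u = [1 + i * h for i in range(n + 1)]
w = [0.0] * (n + 1)                       # w[i] approximates omega(u[i])
I = [0.0] * (n + 1)                       # I[i] approximates int_1^{u[i]} omega
for i in range(n + 1):
    if u[i] <= 2:
        w[i] = 1 / u[i]                   # omega(u) = 1/u on (1, 2]
    else:
        w[i] = (1 + I[i - int(1 / h)]) / u[i]        # relation (8.1.5)
    if i > 0:
        I[i] = I[i - 1] + h * (w[i] + w[i - 1]) / 2  # trapezoid rule

print(round(w[int(2 / h)], 4), round(w[-1], 4))  # omega(3), omega(10)
```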
Proof of Theorem 8.1.1. First, we show that Buchstab's function satisfies an identity similar to (8.1.3). Indeed, the Prime Number Theorem implies that
$$(8.1.8)\qquad \sum_{p\le x} \frac{1}{p\log p} = c_1 + \int_2^x \frac{dt}{t(\log t)^2} + R(x) \quad \text{with } R(x) \ll e^{-c_2\sqrt{\log x}},$$
for some appropriate constants $c_1 \in \mathbb{R}$ and $c_2 > 0$. Together with relations (8.1.5), (8.1.6) and (8.1.8), this implies that, for $z' = x^{1/u'} \in [z, \sqrt{x}]$,
$$\sum_{z<p\le z'} \omega\Big(\frac{\log x}{\log p} - 1\Big)\frac{1}{p\log p} = \int_z^{z'} \omega\Big(\frac{\log x}{\log t} - 1\Big)\frac{dt}{t(\log t)^2} + \Big[R(t)\,\omega\Big(\frac{\log x}{\log t} - 1\Big)\Big]_{t=z}^{t=z'} + \int_z^{z'} R(t)\,\omega'\Big(\frac{\log x}{\log t} - 1\Big)\frac{\log x}{t(\log t)^2}\,dt$$
$$= \frac{1}{\log x}\int_{u'}^{u} \omega(s-1)\,ds + O\big(e^{-c_2\sqrt{\log z}}\big) = \frac{u\,\omega(u) - u'\,\omega(u')}{\log x} + O\big(e^{-c_2\sqrt{\log z}}\big) = \frac{\omega(u)}{\log z} - \frac{\omega(u')}{\log z'} + O\big(e^{-c_2\sqrt{\log z}}\big).$$
Together with (8.1.3), the above relation implies that
$$(8.1.9)\qquad R(x, z) = R(x, z') + \sum_{z<p\le z'} R(x/p, p) + O\big(x\,e^{-c_2\sqrt{\log z}}\big),$$
where
$$R(y, t) := \Phi(y, t) - \frac{y}{\log t}\,\omega\Big(\frac{\log y}{\log t}\Big).$$
We will prove Theorem 8.1.1 using the above formula and an inductive argument, much like the proof of Theorem 6.1.1. It suffices to prove that there is some absolute constant C such that
$$(8.1.10)\qquad |R(x, z)| \le \frac{Cx}{(\log z)^2} \qquad (x = z^u \ge z^2 \ge 4).$$
Clearly, we may assume that z is large enough. Also, if $u \in [2, 3)$, then note that (8.1.10) follows from (8.1.4). Next, assume that relation (8.1.10) holds for all $u \in [2, U)$, for some $U \ge 3$, and consider $u \in [U, U+1)$. Note that $\frac{\log(x/p)}{\log p} \le u - 1 < U$ for all $p > z$, so the induction hypothesis and (8.1.9) with $z' = \sqrt{x}$ imply that
$$|R(x, z)| \le |R(x, \sqrt{x})| + \sum_{z<p\le\sqrt{x}} \frac{C(x/p)}{(\log p)^2} + O\big(x\,e^{-c_2\sqrt{\log z}}\big) \le Cx\sum_{p>z} \frac{1}{p(\log p)^2} + O\Big(\frac{x}{(\log x)^2} + x\,e^{-c_2\sqrt{\log z}}\Big).$$
Since $\sum_{p>z} \frac{1}{p(\log p)^2} = \frac{1+o(1)}{2(\log z)^2}$ as $z \to \infty$, choosing C large enough implies that (8.1.10) holds when $u \in [U, U+1)$ too, thus completing the inductive step. This concludes the proof of (8.1.10), and hence of Theorem 8.1.1.
$$(8.1.11)\qquad \omega(u) = \omega(2) + \int_2^u \omega'(t)\,dt = \omega(2) + \int_2^{\infty} \omega'(t)\,dt + O\big(e^{-u\log(u\log u)+O(u)}\big) \qquad (u > 2).$$
In particular, the limit $\lim_{u\to\infty} \omega(u)$ exists. On the other hand, Theorem 3.3.2 and Mertens' estimate imply that
$$\Phi(x, z) = \frac{e^{-\gamma}\,x}{\log z}\Big(1 + O\Big(\frac{1}{\log z}\Big) + O\big(e^{-u\log u + O(u)}\big)\Big).$$
Comparing the above formula with Theorem 8.1.1, we deduce that $\lim_{u\to\infty} \omega(u) = e^{-\gamma}$. Together with relation (8.1.11), this implies that
$$\omega(u) = e^{-\gamma} + O\big(e^{-u\log(u\log u)+O(u)}\big),$$
and the first part of the theorem follows. To see the second part, write $E(u) = \omega(u) - e^{-\gamma}$ and note that relation (8.1.7) can be rewritten as
$$\big(u\,E(u)\big)' = E(u - 1) \qquad (u > 2).$$
Now, suppose that there is $u_+ \ge 2$ such that $E(t) > 0$ for all $t \ge u_+$. Integrating the above relation over $[u_+ + 1, \infty)$ and using that $u\,E(u) \to 0$ as $u \to \infty$, we would get
$$0 < (u_+ + 1)E(u_+ + 1) = -\int_{u_+}^{\infty} E(t)\,dt < 0,$$
a contradiction. Similarly, there is no $u_- \ge 2$ with $E(t) < 0$ for all $t \ge u_-$. These two facts together imply that E changes sign infinitely often, thus completing the proof of the theorem.
where m, h and q are fixed positive integers. Note that the i-th row of this matrix contains all integers in the short interval $((m+i)q,\ (m+i)q + h]$, whereas its j-th column contains all integers in the arithmetic progression $\{n \equiv j \ (\mathrm{mod}\ q) : j + mq < n \le j + 2mq\}$. Now, if we know that primes are well distributed in the arithmetic progressions $\{n \equiv j \ (\mathrm{mod}\ q) : j + mq < n \le j + 2mq\}$, $1 \le j \le h$, $(j, q) = 1$, then we expect that
$$\#\{p \text{ prime} : p \text{ appears in } M(m, q, h)\} \approx \sum_{\substack{1\le j\le h\\ (j,q)=1}} \frac{1}{\varphi(q)}\int_{j+mq}^{j+2mq} \frac{dt}{\log t} \approx \frac{mq}{\varphi(q)\log(mq)}\,\#\{1\le j\le h : (j, q) = 1\}.$$
On the other hand, if we know that each short interval $((m+i)q,\ (m+i)q + h]$, $1 \le i \le m$, contains the expected proportion of primes, then
$$\#\{p \text{ prime} : p \text{ appears in } M(m, q, h)\} \approx \sum_{i=1}^{m} \frac{h}{\log(mq)} = \frac{mh}{\log(mq)}.$$
Lemma 8.2.1. There exist arbitrarily large moduli q of the form $q = \prod_{p\le z} p$ such that
$$\pi(x; q, a) = \frac{\mathrm{li}(x)}{\varphi(q)}\Big\{1 + O\big(x^{-c/\log q} + e^{-c\sqrt{\log x}}\big)\Big\} \qquad (x \ge q,\ (a, q) = 1).$$
Proof. Without loss of generality, we may assume that $x \ge q^L$ for a sufficiently large L; otherwise, the result follows from the Brun–Titchmarsh inequality. For any $T \ge 1$, we have that
$$(8.2.4)\qquad \psi(x; q, a) := \sum_{\substack{n\le x\\ n\equiv a\ (\mathrm{mod}\ q)}} \Lambda(n) = \frac{x}{\varphi(q)} - \frac{1}{\varphi(q)}\sum_{\chi\ (\mathrm{mod}\ q)} \bar\chi(a) \sum_{\substack{\rho:\ L(\rho,\chi)=0\\ |\mathrm{Im}(\rho)|\le T}} \frac{x^{\rho}}{\rho} + O\Big(\frac{x\log^2(qx)}{T}\Big).$$
(This is a standard consequence of Perron's formula and the residue theorem in complex analysis; see, for example, [Da, Chapters 17, 19].) Since the zeroes of $L(s, \chi)$ are symmetric about the line $\mathrm{Re}(s) = 1/2$ (a consequence of the functional equation for Dirichlet L-functions), relation (8.2.4) can be rewritten as
$$(8.2.5)\qquad \psi(x; q, a) = \frac{x}{\varphi(q)} + O\Bigg(\frac{x}{\varphi(q)} \sum_{\chi\ (\mathrm{mod}\ q)} \sum_{\substack{\rho:\ L(\rho,\chi)=0\\ \mathrm{Re}(\rho)\ge 1/2,\ |\mathrm{Im}(\rho)|\le T}} \frac{x^{\mathrm{Re}(\rho)-1}}{1+|\mathrm{Im}(\rho)|} + \frac{x\log^2(qx)}{T}\Bigg).$$
Now, Linnik (see, for example, [IK, Chapter 18]) showed that there exists an absolute positive constant $c_1$ such that
$$(8.2.6)\qquad \sum_{\chi\ (\mathrm{mod}\ q)} \#\{\rho : L(\rho, \chi) = 0,\ \mathrm{Re}(\rho) \ge \sigma,\ |\mathrm{Im}(\rho)| \le T\} \ll (qT)^{c_1(1-\sigma)} \qquad (\sigma \ge 1/2,\ T \ge 1).$$
We will show that this estimate, together with an assumption about a certain zero-free region for all $L(s, \chi)$, implies the conclusion of the lemma. More precisely, we assume that there exists some constant $c_2 > 0$ such that
$$(8.2.7)\qquad L(\sigma + it, \chi) \ne 0 \quad \text{for all } \sigma \ge 1 - \frac{c_2}{\log(q + |t|)} \text{ and all } \chi\ (\mathrm{mod}\ q).$$
Inserting (8.2.6) and (8.2.7) into (8.2.5) and choosing T suitably, we find that
$$\psi(x; q, a) = \frac{x}{\varphi(q)}\Big\{1 + O\big(x^{-c_3/\log q} + e^{-c_3\sqrt{\log x}}\big)\Big\}$$
for some positive constant $c_3$ that depends at most on $c_2$, provided that L is large in terms of the absolute constant $c_1$. Partial summation then implies that the conclusion of the lemma holds, provided that q satisfies (8.2.7).
So it remains to show that (8.2.7) holds for infinitely many moduli q of the form $q = \prod_{p\le z} p$.
We know (see, for example, [Da, p. 93]) that for each modulus $q \ge 2$ there is at most one possible counterexample to (8.2.7); that is to say, there exists a constant $c_4 > 0$ which has the following property: there is at most one real non-principal Dirichlet character $\chi\ (\mathrm{mod}\ q)$ which has at most one zero $\beta = \sigma + i\gamma$ with $\sigma \ge 1 - c_4/\log(q + |\gamma|)$. Moreover, if such a zero exists, then it is necessarily real and simple, that is to say $\beta > 1 - c_4/\log q$.
Now, consider $q = \prod_{p\le z} p$ for which an exceptional zero as above exists, say at $\beta$. By Bertrand's postulate, there exists some $q' = \prod_{p\le z'} p \ge q$ such that
$$1 - \frac{c_4}{\log q'} \le \beta \le 1 - \frac{c_4}{2\log q'}.$$
By the discussion in the previous paragraph, the modulus $q'$ satisfies relation (8.2.7) with $c_2 = c_4/2$. (The character $\chi$ induces a character $\chi'\ (\mathrm{mod}\ q')$, which has a zero $\beta > 1 - c_4/\log q'$. Therefore, this is the unique character that could fail (8.2.7) when $c_2 = c_4$, and it does not fail (8.2.7) when $c_2 = c_4/2$, since $\beta \le 1 - c_4/(2\log q')$.) In any case, we see that we can construct arbitrarily large moduli of the form $q = \prod_{p\le z} p$ for which relation (8.2.7) is true. As we saw above, such moduli satisfy the conclusion of the lemma, which completes the proof.
Lemma 8.2.1 together with the argument we gave in the beginning of the section yield
Theorem 8.0.5. We give the complete argument below.
Proof of Theorem 8.0.5. By Remark 8.0.6, it suffices to show the theorem for an unbounded sequence of values of $\lambda$. Fix for the moment some $\lambda \ge 3$. Let $q = \prod_{p\le z} p$ be a sufficiently large modulus which satisfies the conclusion of Lemma 8.2.1. Set $m = q^L$, where L is a large integer, and let $h = (\log q)^{\lambda} \in [z^2, q]$. Consider the matrix $M(m, q, h)$, defined by (8.2.1), and let N be the number of its elements that are prime numbers. Then
$$N = \sum_{\substack{1\le j\le h\\ (j,q)=1}}\ \sum_{\substack{j+mq < p \le j+2mq\\ p\equiv j\ (\mathrm{mod}\ q)}} 1 = \sum_{\substack{1\le j\le h\\ (j,q)=1}} \Big[\frac{1}{\varphi(q)}\int_{j+mq}^{j+2mq} \frac{dt}{\log t} + O\Big(\frac{mq}{e^{cL}\,\varphi(q)\log(mq)}\Big)\Big]$$
$$= \sum_{\substack{1\le j\le h\\ (j,q)=1}} \frac{mq}{\varphi(q)\log(mq)}\Big(1 + O\Big(\frac{1}{L\log q} + \frac{1}{e^{cL}}\Big)\Big) = \Phi(h, z)\,\frac{mq}{\varphi(q)\log(mq)}\Big(1 + O\Big(\frac{1}{L\log q} + \frac{1}{e^{cL}}\Big)\Big),$$
since $\#\{1 \le j \le h : (j, q) = 1\} = \#\{1 \le j \le h : P^-(j) > z\} = \Phi(h, z)$.
Theorem 8.1.1 and the fact that $\omega(t) \asymp 1$ for all $t \ge 2$ imply that
$$\Phi(h, z) = \frac{h\,\omega(\lambda_0)}{\log z}\Big(1 + O\Big(\frac{1}{\log z}\Big)\Big), \qquad \text{where } \lambda_0 = \frac{\log h}{\log z},$$
as $z \to \infty$. Since we also have that
$$\frac{\varphi(q)}{q} = \prod_{p\le z}\Big(1 - \frac{1}{p}\Big) = \frac{e^{-\gamma}}{\log z}\Big(1 + O\Big(\frac{1}{\log z}\Big)\Big),$$
Therefore
$$\limsup_{x\to\infty} \frac{\pi(x + (\log x)^{\lambda}) - \pi(x)}{(\log x)^{\lambda-1}} \ge e^{\gamma}\,\omega(\lambda),$$
which is $> 1$ for an unbounded set of values of $\lambda$ by (8.2.9). This proves the first part of the theorem. The second part follows similarly, by using relation (8.2.10) in place of (8.2.9).
Chapter 9
The large sieve
In this chapter we will see a quite different approach to sieving, the so-called large sieve.
There are three versions of it, each suited for different applications. Originally, the large
sieve arose as an inequality involving trigonometric polynomials, which explored the idea of
quasi-orthogonality. However, for number-theoretic purposes, the power of the large sieve is
revealed when stated in its two other forms: the arithmetic version and the character sum
version. As we will see, the former allows us to obtain quite good estimates in sieve problems
of very large dimension, whereas using the latter we can control the average distribution of
interesting sets of integers in arithmetic progressions.
We begin with the arithmetic formulation of the large sieve, Theorem 9.1.1, which will
allow for a direct comparison with the results of the previous chapters. Then we give the more
classical trigonometric version of the large sieve, Theorem 9.2.1, and show how to deduce
Theorem 9.1.1 from it. Finally, we conclude with the character sum version in Section 9.3.
Arguably the most important application of this last version is the Bombieri-Vinogradov
theorem, whose proof we give in the subsequent chapter.
where
$$h(m) = \prod_{p|m} \frac{|R_p|/p}{1 - |R_p|/p} = \prod_{p|m} \frac{|R_p|}{p - |R_p|}.$$
then
$$\#\{n \in \mathcal{N} : n \not\in R_p\ (\mathrm{mod}\ p)\ \text{for all } p < z\} = \#\{n \in \mathcal{N} : p \nmid F(n)\ \text{for all } p < z\} = S(\mathcal{A}, z).$$
So we see that Theorem 9.1.1 can be translated to a sieve estimate in important cases such as the above one. Moreover, it provides an upper bound on $S(\mathcal{A}, z)$ that is, up to the value of the constant C, as strong as Theorem 5.1.1. Finally, Theorem 9.1.1 has the significant advantage that it does not depend on the assumption of hypotheses such as (A1), (A3) and (R0), and it is particularly strong when the sifting dimension becomes unbounded.
The proof of Theorem 9.1.1 will be given in Section 9.2. We give below a couple of
applications of it.
Given a prime p, we define
$$n(p) = \min\Big\{a \in \mathbb{N} : \Big(\frac{a}{p}\Big) \ne 1\Big\}.$$
It is easy to see that $n(p)$ is a prime number $< p$. The least quadratic non-residue problem asks for estimates on $n(p)$. It is believed that $n(p) \ll_\varepsilon (\log p)^{1+\varepsilon}$, whereas the Generalized Riemann Hypothesis would imply that $n(p) \ll (\log p)^2$. The best pointwise bound known is $n(p) \ll_\varepsilon p^{1/(4\sqrt{e})+\varepsilon}$, for every fixed $\varepsilon > 0$. Using Theorem 9.1.1, we will show that for most primes p it is possible to do much better than this bound:
$$(9.1.1)\qquad |\mathcal{N}| \gg N^2,$$
so that
$$|R_p| = \begin{cases} \dfrac{p-1}{2} & \text{if } n(p) > N \text{ and } p \ge 3,\\ 0 & \text{otherwise.}\end{cases}$$
Note that if p is such that $n(p) > N$, then $(\frac{q}{p}) = 1$ for all primes $q \le N$, and consequently $(\frac{m}{p}) = 1$ for all $m \in \mathcal{N}$. This implies that
$$(9.1.2)\qquad \mathcal{N} \subseteq \{m \le N^2 : m \not\in R_p\ (\mathrm{mod}\ p)\ \text{for all } p < N\}.$$
The idea of the proof is that if $R_p \ne \emptyset$ for too many primes $p < N$, then $\mathcal{N}$ will be forced to have abnormally small size, thus contradicting (9.1.1). Indeed, relations (9.1.1) and (9.1.2) imply that
$$N^2 \ll \#\{n \le N^2 : n \not\in R_p\ (\mathrm{mod}\ p)\ \text{for all } p < N\}.$$
So, if we let
$$S = \sum_{m < N} \mu^2(m) \prod_{p|m} \frac{|R_p|}{p - |R_p|},$$
then Theorem 9.1.1 implies that
$$(9.1.3)\qquad S \ll 1.$$
Combining the above relation with (9.1.3) completes the proof of the theorem.
Proof. Exercise.
Exercise 9.1.5. Fix $\varepsilon > 0$. Show that, for every prime p, we have that
$$n(p) \ll_\varepsilon p^{1/(2\sqrt{e})+\varepsilon}.$$
Hint: Use Theorem A.3.1 and the fact that $(\frac{m}{p}) = 1$ for all m with $P^+(m) < n(p)$.
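For small primes, $n(p)$ is cheap to compute with Euler's criterion, and one can watch these bounds in action; the sketch below (range chosen for illustration) checks that $n(p)$ is always prime and remains a small prime throughout this range.

```python
def primes_up_to(n):
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

def least_nonresidue(p):
    # Euler's criterion: a is a QR mod p iff a^((p-1)/2) = 1 (mod p).
    a = 2
    while pow(a, (p - 1) // 2, p) == 1:
        a += 1
    return a

small_primes = set(primes_up_to(100))
vals = {p: least_nonresidue(p) for p in primes_up_to(10**4) if p > 2}
print(max(vals.values()))  # largest least non-residue in this range
```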
Finally, we give a last application to demonstrate the power of the large sieve when the sifting dimension grows. As a motivation, note that the set of squares occupies exactly $(p+1)/2$ congruence classes modulo each odd prime p or, equivalently, it avoids $(p-1)/2$ congruence classes modulo each odd prime p. Moreover, there are about $\sqrt{N}$ squares of size at most N. Theorem 9.1.1 implies that this is in fact best possible:
Since |Rp | = p/2 + O(1), we find that |Rp |/(p |Rp |) = 1 + O(1/p), and (9.1.4) follows by
an application of the convolution method (see Section 0.2).
for some relatively small M = M (, N ) that is close to N . The following theorem confirms
this guess.
Theorem 9.2.1 (Large sieve, trigonometric version). Let $\{a_n\}_{n=1}^{N}$ be a sequence of complex numbers and let $\{\theta_1, \ldots, \theta_R\}$ be a set of $\delta$-spaced real numbers, that is, $\|\theta_r - \theta_s\| \ge \delta$ for $r \ne s$, where $\|\cdot\|$ denotes the distance to the nearest integer. Then
$$(9.2.1)\qquad \sum_{r=1}^{R} \Big|\sum_{n=1}^{N} a_n e(n\theta_r)\Big|^2 \ll (N + 1/\delta) \sum_{n=1}^{N} |a_n|^2.$$
Remark 9.2.2. Selberg [Se91] and, independently, Montgomery and Vaughan [MV] showed that the above theorem holds with $N + 1/\delta - 1$ in place of $N + 1/\delta$, which is best possible in this generality, as the two examples below indicate:
If $R = 1$ and $a_n = e(-n\theta_1)$ for all n, then the left hand side of (9.2.1) equals $N^2$, whereas $\sum_{n=1}^{N} |a_n|^2 = N$, so that the right hand side is asymptotically $(N + 1/\delta - 1)N$. In fact, if $R = 1$ and $\delta = 1$, then we have that
$$\sum_{r=1}^{R} \Big|\sum_{n=1}^{N} a_n e(n\theta_r)\Big|^2 = N^2 = (N + 1/\delta - 1)\sum_{n=1}^{N} |a_n|^2.$$
So we have that
$$\sum_{r=1}^{R} \Big|\sum_{n=1}^{N} a_n e(n\theta_r)\Big|^2 \approx \frac{1}{\delta} \int_0^1 \Big|\sum_{n=1}^{N} a_n e(n\theta)\Big|^2\, d\theta = \frac{1}{\delta} \sum_{n=1}^{N} |a_n|^2.$$
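It is instructive to test (9.2.1) numerically. The sketch below (arbitrary test data of our choosing) uses the Farey fractions with denominators at most Q, which are $\delta$-spaced modulo 1 for $\delta = 1/Q^2$, and checks the inequality with the sharp constant $N + 1/\delta - 1$ of Remark 9.2.2.

```python
import cmath
from math import gcd

N, Q = 40, 6
a = [complex((3 * n) % 7 - 3, (n * n) % 5 - 2) for n in range(1, N + 1)]

# Farey fractions j/q with q <= Q are delta-spaced mod 1 for delta = 1/Q^2.
thetas = [j / q for q in range(1, Q + 1) for j in range(1, q + 1) if gcd(j, q) == 1]
delta = 1 / Q**2

def S(t):
    return sum(a[n - 1] * cmath.exp(2j * cmath.pi * n * t) for n in range(1, N + 1))

lhs = sum(abs(S(t)) ** 2 for t in thetas)
rhs = (N + 1 / delta - 1) * sum(abs(z) ** 2 for z in a)
print(lhs <= rhs)  # True
```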
There are several approaches to proving Theorem 9.2.1, or other closely related results.
We follow an argument due to Gallagher, which is based on the following key lemma.
Proof. We may assume that $c = 0$ and $\delta = 2$; if not, we replace $f(t)$ by $g(t) = f(c + \delta t/2)$. The idea of the proof is that $f(0)$ should be well approximated by the mean value $\frac{1}{2}\int_{-1}^{1} f(t)\,dt$, and the quality of this approximation should be controlled by the size of $f'$. Indeed, note that
$$\int_{-1}^{1} f(t)\,dt - 2f(0) = \int_{-1}^{1} (f(t) - f(0))\,dt = \int_{-1}^{1} \Big(\int_0^t f'(s)\,ds\Big) dt = \int_{-1}^{1} \mathrm{sgn}(t)\,(1 - |t|)\,f'(t)\,dt,$$
where the last equality follows by interchanging the order of integration. Taking absolute values and using the triangle inequality then completes the proof of the lemma.
Proof of Theorem 9.2.1. For each $r \in \{1, \ldots, R\}$, we apply Lemma 9.2.3 with $c = \theta_r$ and
$$f(t) = \Big|\sum_{n=1}^{N} a_n e((n - N/2)t)\Big|^2$$
to get that
$$(9.2.2)\qquad |f(\theta_r)| \le \frac{1}{\delta}\int_{\theta_r - \delta/2}^{\theta_r + \delta/2} |f(t)|\,dt + \frac{1}{2}\int_{\theta_r - \delta/2}^{\theta_r + \delta/2} |f'(t)|\,dt.$$
Since the points $\{\theta_1, \ldots, \theta_R\}$ are $\delta$-spaced, the intervals $(\theta_r - \delta/2,\ \theta_r + \delta/2)$ are disjoint (mod 1). But f is 1-periodic, so summing relation (9.2.2) over r yields that
$$\sum_{r=1}^{R} |f(\theta_r)| \le \frac{1}{\delta}\int_0^1 |f(t)|\,dt + \frac{1}{2}\int_0^1 |f'(t)|\,dt.$$
In order to complete the proof of the theorem, note that
$$\int_0^1 |f(t)|\,dt = \int_0^1 \Big|\sum_{n=1}^{N} a_n e((n - N/2)t)\Big|^2 dt = \sum_{1\le n,m\le N} a_n \overline{a_m} \int_0^1 e((n - m)t)\,dt = \sum_{n=1}^{N} |a_n|^2$$
and, similarly,
$$\int_0^1 |f'(t)|\,dt \le \int_0^1 2\Big|\sum_{n=1}^{N} a_n e((n-N/2)t)\Big|\cdot 2\pi\Big|\sum_{n=1}^{N} (n - N/2)\,a_n e((n-N/2)t)\Big|\,dt$$
$$\le 4\pi \Big(\int_0^1 \Big|\sum_{n=1}^{N} a_n e(nt)\Big|^2 dt\Big)^{1/2} \Big(\int_0^1 \Big|\sum_{n=1}^{N} (n-N/2)\,a_n e(nt)\Big|^2 dt\Big)^{1/2}$$
$$= 4\pi \Big(\sum_{n=1}^{N} |a_n|^2\Big)^{1/2} \Big(\sum_{n=1}^{N} \Big(n - \frac{N}{2}\Big)^2 |a_n|^2\Big)^{1/2} \le 2\pi N \sum_{n=1}^{N} |a_n|^2.$$
The theorem now follows.
We now show how Corollary 9.2.4 can be used to obtain control on the average of the error
$$\sum_{\substack{1\le n\le N\\ n\equiv a\ (\mathrm{mod}\ p)}} a_n - \frac{1}{p}\sum_{n=1}^{N} a_n,$$
Corollary 9.2.5. Let $Q \ge 1$, and let $\{a_n\}_{n=1}^{N}$ be a sequence of complex numbers. Then we have that
$$\sum_{p\le Q} p \sum_{b=1}^{p} \Big|\sum_{\substack{1\le n\le N\\ n\equiv b\ (\mathrm{mod}\ p)}} a_n - \frac{1}{p}\sum_{n=1}^{N} a_n\Big|^2 \ll (N + Q^2)\sum_{n=1}^{N} |a_n|^2.$$
Proof. For brevity, we write $S(\theta) = \sum_{n=1}^{N} a_n e(n\theta)$. Note that
$$\sum_{\substack{n\le N\\ n\equiv b\ (\mathrm{mod}\ p)}} a_n - \frac{1}{p}\sum_{n=1}^{N} a_n = \frac{1}{p}\sum_{n=1}^{N} a_n \sum_{j=1}^{p} e(j(n-b)/p) - \frac{1}{p}\sum_{n=1}^{N} a_n = \frac{1}{p}\sum_{n=1}^{N} a_n \sum_{j=1}^{p-1} e(j(n-b)/p) = \frac{1}{p}\sum_{j=1}^{p-1} e(-jb/p)\,S(j/p).$$
Consequently,
$$\sum_{b=1}^{p} \Big|\sum_{\substack{n\le N\\ n\equiv b\ (\mathrm{mod}\ p)}} a_n - \frac{1}{p}\sum_{n=1}^{N} a_n\Big|^2 = \frac{1}{p^2}\sum_{b=1}^{p}\sum_{j_1=1}^{p-1}\sum_{j_2=1}^{p-1} e((j_2 - j_1)b/p)\,S(j_1/p)\,\overline{S(j_2/p)}$$
$$(9.2.3)\qquad = \frac{1}{p^2}\sum_{j_1=1}^{p-1}\sum_{j_2=1}^{p-1} S(j_1/p)\,\overline{S(j_2/p)} \sum_{b=1}^{p} e((j_2 - j_1)b/p) = \frac{1}{p}\sum_{j=1}^{p-1} |S(j/p)|^2.$$
Multiplying the above identity by p, summing the resulting formula over p Q, and applying
Corollary 9.2.4 completes the proof of the corollary.
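Identity (9.2.3) is exact, so it can be confirmed to machine precision; the following sketch (arbitrary test data of our choosing) does exactly that.

```python
import cmath

N, p = 30, 7
a = [complex(n % 4 - 1, (2 * n) % 5 - 2) for n in range(1, N + 1)]
total = sum(a)

def S(t):
    return sum(a[n - 1] * cmath.exp(2j * cmath.pi * n * t) for n in range(1, N + 1))

# Left side of (9.2.3): squared discrepancies over the residue classes mod p.
lhs = sum(
    abs(sum(a[n - 1] for n in range(1, N + 1) if n % p == b % p) - total / p) ** 2
    for b in range(1, p + 1)
)
# Right side of (9.2.3): (1/p) times the power at the nonzero frequencies j/p.
rhs = sum(abs(S(j / p)) ** 2 for j in range(1, p)) / p
print(abs(lhs - rhs) < 1e-9)  # True
```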
Remark 9.2.6. Note that if we divide the inequality in the statement of Corollary 9.2.5 by $N^2$, we find that
$$\sum_{p\le Q} \frac{1}{p} \sum_{b=1}^{p} \Big|\frac{1}{N/p}\sum_{\substack{1\le n\le N\\ n\equiv b\ (\mathrm{mod}\ p)}} a_n - \frac{1}{N}\sum_{n=1}^{N} a_n\Big|^2 \ll \Big(\frac{1}{N} + \frac{Q^2}{N^2}\Big) \sum_{n=1}^{N} |a_n|^2.$$
If $|a_n| \le 1$ for all n, then the right hand side of the above inequality is $\ll 1 + Q^2/N$. Therefore, if $Q = o(N)$, then we find that, for most $p \le Q$,
$$\frac{1}{p}\sum_{b=1}^{p} \Big|\frac{1}{N/p}\sum_{\substack{1\le n\le N\\ n\equiv b\ (\mathrm{mod}\ p)}} a_n - \frac{1}{N}\sum_{n=1}^{N} a_n\Big|^2 = o(1),$$
that is to say, $(a_n)$ is well-distributed in most progressions $b\ (\mathrm{mod}\ p)$, for most primes $p \le Q$.
$$\mathcal{M} = \{M < n \le M + N : n \not\in R_p\ (\mathrm{mod}\ p)\ \text{for all } p < z\}.$$
We claim that for all sequences of complex numbers, and for all square-free integers $q < z$, we have that
$$(9.2.4)\qquad \sum_{\substack{1\le a\le q\\ (a,q)=1}} \Big|\sum_{n\in\mathcal{M}} a_n e(an/q)\Big|^2 \ge h(q)\Big|\sum_{n\in\mathcal{M}} a_n\Big|^2.$$
When $q = 1$, this holds trivially. For $q > 1$, we argue by induction on $\omega(q)$. First, we establish (9.2.4) when $q = p$ is prime. Our starting point is relation (9.2.3), which implies that
$$\sum_{a=1}^{p-1} \Big|\sum_{n\in\mathcal{M}} a_n e(an/p)\Big|^2 = p \sum_{b=1}^{p} \Big|\sum_{\substack{n\in\mathcal{M}\\ n\equiv b\ (\mathrm{mod}\ p)}} a_n - \frac{1}{p}\sum_{n\in\mathcal{M}} a_n\Big|^2.$$
Now, using the identity $|z - w|^2 = |z|^2 + |w|^2 - 2\,\mathrm{Re}(z\bar{w})$, we find that
$$\sum_{a=1}^{p-1}\Big|\sum_{n\in\mathcal{M}} a_n e(an/p)\Big|^2 = p\sum_{b=1}^{p} \Bigg(\Big|\sum_{\substack{n\in\mathcal{M}\\ n\equiv b\ (p)}} a_n\Big|^2 + \frac{1}{p^2}\Big|\sum_{n\in\mathcal{M}} a_n\Big|^2 - \frac{2}{p}\,\mathrm{Re}\Big(\overline{\sum_{n\in\mathcal{M}} a_n}\ \sum_{\substack{n\in\mathcal{M}\\ n\equiv b\ (p)}} a_n\Big)\Bigg)$$
$$= p\sum_{b=1}^{p}\Big|\sum_{\substack{n\in\mathcal{M}\\ n\equiv b\ (p)}} a_n\Big|^2 + \Big|\sum_{n\in\mathcal{M}} a_n\Big|^2 - 2\Big|\sum_{n\in\mathcal{M}} a_n\Big|^2 = p\sum_{b=1}^{p}\Big|\sum_{\substack{n\in\mathcal{M}\\ n\equiv b\ (p)}} a_n\Big|^2 - \Big|\sum_{n\in\mathcal{M}} a_n\Big|^2.$$
On the other hand, the Cauchy–Schwarz inequality and the fact that if $b \in R_p\ (\mathrm{mod}\ p)$, then there are no elements of $\mathcal{M}$ that lie in the arithmetic progression $b\ (\mathrm{mod}\ p)$, imply that
$$(9.2.5)\qquad \Big|\sum_{n\in\mathcal{M}} a_n\Big|^2 = \Big|\sum_{\substack{1\le b\le p\\ b\notin R_p}} \sum_{\substack{n\in\mathcal{M}\\ n\equiv b\ (p)}} a_n\Big|^2 \le (p - |R_p|)\sum_{b=1}^{p} \Big|\sum_{\substack{n\in\mathcal{M}\\ n\equiv b\ (p)}} a_n\Big|^2.$$
Combining the last two displayed relations, we deduce that
$$\sum_{a=1}^{p-1}\Big|\sum_{n\in\mathcal{M}} a_n e(an/p)\Big|^2 \ge \Big(\frac{p}{p - |R_p|} - 1\Big)\Big|\sum_{n\in\mathcal{M}} a_n\Big|^2 = h(p)\Big|\sum_{n\in\mathcal{M}} a_n\Big|^2,$$
that is to say, (9.2.4) does hold when p is a prime $< z$. Now assume that (9.2.4) holds for all square-free integers $q < z$ with $\omega(q) \le j$, where j is some positive integer. Let q be a square-free integer $< z$ with $\omega(q) = j + 1$. Then we may write $q = q_1 q_2$ with $\omega(q_i) \le j$ for $i \in \{1, 2\}$. Furthermore, note that the set $\{a_1 q_2 + a_2 q_1 : 1 \le a_i \le q_i,\ (a_i, q_i) = 1\ (i \in \{1, 2\})\}$ is a set of representatives for the set of residues $\{1 \le a \le q : (a, q) = 1\}$. Hence
$$\sum_{\substack{1\le a\le q\\ (a,q)=1}} \Big|\sum_{n\in\mathcal{M}} a_n e(an/q)\Big|^2 = \sum_{\substack{1\le a_1\le q_1\\ (a_1,q_1)=1}} \sum_{\substack{1\le a_2\le q_2\\ (a_2,q_2)=1}} \Big|\sum_{n\in\mathcal{M}} a_n e\Big(\frac{a_1 n}{q_1} + \frac{a_2 n}{q_2}\Big)\Big|^2.$$
For each fixed a1 as above, we apply (9.2.4) with q2 in place of q and an e(a1 n/q1 ) in place
of an , which holds by the induction hypothesis. So
$$\sum_{\substack{1\le a_2\le q_2\\ (a_2,q_2)=1}} \Big|\sum_{n\in\mathcal{M}} a_n e\Big(\frac{a_1 n}{q_1} + \frac{a_2 n}{q_2}\Big)\Big|^2 \ge h(q_2)\Big|\sum_{n\in\mathcal{M}} a_n e\Big(\frac{a_1 n}{q_1}\Big)\Big|^2.$$
Summing the above inequality over $a_1$ and applying (9.2.4) with $q_1$ in place of q, which also holds by the induction hypothesis, yields that
$$\sum_{\substack{1\le a\le q\\ (a,q)=1}} \Big|\sum_{n\in\mathcal{M}} a_n e(an/q)\Big|^2 \ge h(q_1)\,h(q_2)\Big|\sum_{n\in\mathcal{M}} a_n\Big|^2.$$
Since h is a multiplicative function, we deduce that relation (9.2.4) is true. This completes the inductive step, and hence the proof of (9.2.4). Finally, applying this relation with $a_n$ being the characteristic function of the set $\mathcal{N}$ implies that
$$|\mathcal{N}|^2 \sum_{q<z} \mu^2(q)\,h(q) \le \sum_{q<z} \sum_{\substack{1\le a\le q\\ (a,q)=1}} \Big|\sum_{n\in\mathcal{N}} e(na/q)\Big|^2 \ll (N + z^2)\,|\mathcal{N}|,$$
$$\sum_{q\le Q} \frac{q}{\varphi(q)}\ \sum_{\chi\ (\mathrm{mod}\ q)}^{*} \Big|\sum_{n=1}^{N} a_n \chi(n)\Big|^2 \le (N + Q^2)\sum_{n=1}^{N} |a_n|^2,$$
where the notation $\sum^{*}$ means that the sum runs over primitive characters only.
Proof. For brevity, we write $S(\theta) = \sum_{n=1}^{N} a_n e(n\theta)$. In order to translate the statement of the theorem to an inequality involving the additive characters $n \to e(an/q)$ and apply Corollary 9.2.4, we use Theorem A.2.2 (i.e. we use Fourier inversion with respect to the additive characters; see Section A.1). This theorem implies that, for every primitive character $\chi\ (\mathrm{mod}\ q)$, we have
$$\sum_{n=1}^{N} a_n \chi(n) = \sum_{n=1}^{N} a_n\,\frac{1}{\tau(\bar\chi)} \sum_{a=1}^{q} \bar\chi(a)\,e(an/q) = \frac{1}{\tau(\bar\chi)} \sum_{a=1}^{q} \bar\chi(a)\,S(a/q).$$
Since for such a character we also have that $|\tau(\bar\chi)| = \sqrt{q}$, we find that
$$\sum_{\chi\ (\mathrm{mod}\ q)}^{*} \Big|\sum_{n=1}^{N} a_n\chi(n)\Big|^2 = \sum_{\chi\ (\mathrm{mod}\ q)}^{*} \frac{1}{q}\Big|\sum_{a=1}^{q} \bar\chi(a)\,S(a/q)\Big|^2 \le \sum_{\chi\ (\mathrm{mod}\ q)} \frac{1}{q}\Big|\sum_{a=1}^{q} \bar\chi(a)\,S(a/q)\Big|^2$$
$$= \frac{1}{q}\sum_{\chi\ (\mathrm{mod}\ q)} \sum_{a_1=1}^{q}\sum_{a_2=1}^{q} \bar\chi(a_1)\chi(a_2)\,S(a_1/q)\,\overline{S(a_2/q)} = \frac{1}{q}\sum_{a_1=1}^{q}\sum_{a_2=1}^{q} S(a_1/q)\,\overline{S(a_2/q)} \sum_{\chi\ (\mathrm{mod}\ q)} \bar\chi(a_1)\chi(a_2) = \frac{\varphi(q)}{q} \sum_{\substack{1\le a\le q\\ (a,q)=1}} |S(a/q)|^2,$$
by Theorem A.1.1. Multiplying the above relation by $q/\varphi(q)$, summing the resulting inequality over $q \le Q$, and applying Corollary 9.2.4 completes the proof of the theorem.
Chapter 10
The Bombieri–Vinogradov theorem
In this chapter we prove the Bombieri-Vinogradov theorem. We shall prove this theorem in
a slightly different but equivalent formulation: set
$$E'(x; q) = \max_{(a,q)=1}\ \max_{y\le x} \Big|\sum_{\substack{n\le y\\ n\equiv a\ (\mathrm{mod}\ q)}} \Lambda(n) - \frac{y}{\varphi(q)}\Big|,$$
where $\Lambda$ is the von Mangoldt function. Then we have the following result.
Theorem 10.0.2 (Bombieri–Vinogradov theorem, II). Fix $A > 0$. There is $B = B(A) > 0$ such that
$$\sum_{q\le x^{1/2}/(\log x)^{B}} E'(x; q) \ll_A \frac{x}{(\log x)^{A}}.$$
Remark 10.1.2. If $Q \ge x^{3/7}$, then $x + x^{1/2}Q^2 + x^{4/5}Q^{13/10} \ll x^{1/2}Q^2$. So Theorem 10.1.1 implies that
$$\sum_{n\le x} \Lambda(n)\chi(n) \ll x^{1/2+\varepsilon}$$
for most $q \in (Q, 2Q]$ and for most primitive characters $\chi\ (\mathrm{mod}\ q)$, an estimate which is as good as what the Generalized Riemann Hypothesis yields.
As we prove below, Theorem 10.0.2 follows from Theorem 10.1.1 and the following fundamental result:
Theorem 10.1.3 (Siegel–Walfisz). Fix $A > 0$. Let $\chi$ be a primitive Dirichlet character (mod q). For $1 \le q \le (\log x)^{A}$, we have that
$$\sum_{n\le x} \Lambda(n)\chi(n) = \delta(\chi)\,x + O_A\Big(\frac{x}{e^{c\sqrt{\log x}}}\Big),$$
where $\delta(\chi) = 1$ if $\chi$ is the trivial character, $\delta(\chi) = 0$ otherwise, and c is an absolute constant.
Remark 10.1.4. As is well known, the implied constant in the above theorem cannot be computed effectively, due to the potential presence of Landau–Siegel zeroes. This deficiency is inherited by the Bombieri–Vinogradov theorem, as will become clear below.
Deduction of Theorem 10.0.2 from Theorem 10.1.1. First, we rewrite $E'(x; q)$ in terms of primitive characters. Note that
$$(10.1.1)\qquad \sum_{\substack{n\le y\\ (n,q)>1}} \Lambda(n) = \sum_{p|q} \log p \sum_{\substack{m\ge 1\\ p^m\le y}} 1 \le \omega(q)\log y \ll (\log q)(\log y).$$
In particular,
$$\sum_{\substack{n\le y\\ (n,q)=1}} \Lambda(n) = \sum_{n\le y} \Lambda(n) + O((\log y)(\log q)) = y + O\Big(\frac{y}{e^{c\sqrt{\log y}}}\Big) + O((\log y)(\log q)) =: y + O(R_q(y)).$$
So we have that
$$\sum_{\substack{n\le y\\ n\equiv a\ (\mathrm{mod}\ q)}} \Lambda(n) - \frac{y}{\varphi(q)} = \sum_{\substack{n\le y\\ n\equiv a\ (\mathrm{mod}\ q)}} \Lambda(n) - \frac{1}{\varphi(q)}\sum_{\substack{n\le y\\ (n,q)=1}} \Lambda(n) + O(R_q(y))$$
$$= \frac{1}{\varphi(q)}\sum_{\chi\ (\mathrm{mod}\ q)} \bar\chi(a)\sum_{n\le y} \Lambda(n)\chi(n) - \frac{1}{\varphi(q)}\sum_{n\le y} \Lambda(n)\chi_0(n) + O(R_q(y))$$
$$= \frac{1}{\varphi(q)}\sum_{\substack{\chi\ (\mathrm{mod}\ q)\\ \chi\ne\chi_0}} \bar\chi(a)\sum_{n\le y} \Lambda(n)\chi(n) + O(R_q(y)),$$
by Theorem A.1.1, where $\chi_0$ denotes the principal character. Given a non-principal character $\chi\ (\mathrm{mod}\ q)$, let $\chi'$ be the primitive character which induces it, say of conductor $d > 1$. Then
$$\Big|\sum_{n\le y} \Lambda(n)\chi(n) - \sum_{n\le y} \Lambda(n)\chi'(n)\Big| \le \sum_{\substack{n\le y\\ (n,q)>1}} \Lambda(n) \ll (\log q)(\log y),$$
so if we set
$$R'_q(y) = \frac{|R_q(y)|}{\varphi(q)} + (\log q)(\log y),$$
then
$$\sum_{\substack{n\le y\\ n\equiv a\ (\mathrm{mod}\ q)}} \Lambda(n) - \frac{y}{\varphi(q)} = \frac{1}{\varphi(q)}\sum_{\substack{\chi\ (\mathrm{mod}\ q)\\ \chi\ne\chi_0}} \bar\chi(a)\sum_{n\le y} \Lambda(n)\chi'(n) + O(R'_q(y))$$
$$= \frac{1}{\varphi(q)}\sum_{\substack{d\mid q\\ d>1}}\ \sum_{\chi'\ (\mathrm{mod}\ d)}^{*}\Big(\sum_{n\le y} \Lambda(n)\chi'(n)\Big) \sum_{\substack{\chi\ (\mathrm{mod}\ q)\\ \chi \text{ is induced by } \chi'}} \bar\chi(a) + O(R'_q(y)).$$
Since each primitive character $\chi'\ (\mathrm{mod}\ d)$ induces at most one character (mod q), taking absolute values we deduce that
$$\sum_{q\le x^{1/2}/(\log x)^{B}} E'(x; q) \le \sum_{q\le x^{1/2}/(\log x)^{B}} \frac{1}{\varphi(q)} \sum_{\substack{d|q\\ d>1}}\ \sum_{\chi'\ (\mathrm{mod}\ d)}^{*} \max_{y\le x}\Big|\sum_{n\le y} \Lambda(n)\chi'(n)\Big| + O\Bigg(\sum_{q\le x^{1/2}/(\log x)^{B}} R'_q(x)\Bigg)$$
$$= \sum_{1<d\le x^{1/2}/(\log x)^{B}}\ \sum_{\chi'\ (\mathrm{mod}\ d)}^{*} \max_{y\le x}\Big|\sum_{n\le y} \Lambda(n)\chi'(n)\Big| \sum_{\substack{q\le x^{1/2}/(\log x)^{B}\\ d|q}} \frac{1}{\varphi(q)} + O\Big(\frac{x\log x}{e^{c\sqrt{\log x}}}\Big).$$
For the sum over q, note that $\varphi(ab) \ge \varphi(a)\varphi(b)$ for all a and b, and consequently
$$\sum_{\substack{q\le x^{1/2}/(\log x)^{B}\\ d|q}} \frac{1}{\varphi(q)} \le \sum_{m\le x^{1/2}/(d(\log x)^{B})} \frac{1}{\varphi(dm)} \le \frac{1}{\varphi(d)} \sum_{m\le x^{1/2}/(d(\log x)^{B})} \frac{1}{\varphi(m)} \ll \frac{\log x}{\varphi(d)}.$$
Consequently,
$$\sum_{q\le x^{1/2}/(\log x)^{B}} E'(x; q) \ll \sum_{1<d\le x^{1/2}/(\log x)^{B}} \frac{\log x}{\varphi(d)} \sum_{\chi'\ (\mathrm{mod}\ d)}^{*} \max_{y\le x}\Big|\sum_{n\le y} \Lambda(n)\chi'(n)\Big| + O\Big(\frac{x\log x}{e^{c\sqrt{\log x}}}\Big)$$
$$\ll \sum_{\substack{k\ge 1\\ 2^k\le x^{1/2}/(\log x)^{B}}} \frac{\log x}{2^k} \sum_{2^k<d\le 2^{k+1}} \frac{d}{\varphi(d)} \sum_{\chi'\ (\mathrm{mod}\ d)}^{*} \max_{y\le x}\Big|\sum_{n\le y} \Lambda(n)\chi'(n)\Big| + O\Big(\frac{x\log x}{e^{c\sqrt{\log x}}}\Big)$$
$$\ll (\log x)^2 \max_{1\le Q\le x^{1/2}/(\log x)^{B}} \frac{1}{Q} \sum_{Q<d\le 2Q} \frac{d}{\varphi(d)} \sum_{\chi'\ (\mathrm{mod}\ d)}^{*} \max_{y\le x}\Big|\sum_{n\le y} \Lambda(n)\chi'(n)\Big| + O\Big(\frac{x\log x}{e^{c\sqrt{\log x}}}\Big).$$
Otherwise, if $(\log x)^{A+8} \le Q \le x^{1/2}/(\log x)^{B}$, then Theorem 10.1.1 implies that
$$\sum_{Q<d\le 2Q} \frac{d}{\varphi(d)} \sum_{\chi'\ (\mathrm{mod}\ d)}^{*} \max_{y\le x}\Big|\sum_{n\le y} \Lambda(n)\chi'(n)\Big| \ll (\log x)^{6}\big(x + x^{1/2}Q^2 + x^{4/5}Q^{13/10}\big) \ll \frac{xQ}{(\log x)^{A+2}}.$$
This is barely not sufficient for deducing Theorem 10.0.2: indeed, the above inequality can be rewritten as
$$\frac{1}{Q}\sum_{Q<q\le 2Q} \frac{q}{\varphi(q)} \sum_{\chi\ (\mathrm{mod}\ q)}^{*} \Big|\sum_{n\le x} \Lambda(n)\chi(n)\Big| \ll Q\sqrt{x\log x} + x\sqrt{\log x}.$$
However, the right hand side in the above estimate is never $\ll x/(\log x)^A$, a bound that is a crucial ingredient in the deduction of Theorem 10.0.2.
We will see that the above approach can only work if, instead of $\Lambda(n)$, we have weights that have a certain bilinear structure, that is to say weights of the form $\sum_{k\ell=n} a_k b_\ell$, where $a_k$ is supported on integers $k \asymp K$ and $b_\ell$ is supported on integers $\ell \asymp L$, with $KL = x$. Indeed, the key idea in proving Theorem 10.1.1 is to decompose $\Lambda$ as a sum of convolutions $f * g$ of arithmetic functions f and g which have one of the two following properties: either f is supported on small integers and g is a nice smooth function, such as $g = 1$ or $g = \log$, or both f and g are supported on large integers. In the first case, we take advantage of the cancellation coming from the sums $\sum_{n\le t} \chi(n) g(n)$, which is a consequence of the smoothness of g and of the Pólya–Vinogradov inequality (see Theorem A.3.1). In the second case, we argue as in the previous paragraph, applying the Cauchy–Schwarz inequality together with the character sum version of the Large Sieve.
The aforementioned decomposition is given by the following lemma due to Vaughan.
Note that the first two sums appearing are of the first kind (which are often referred to in
the literature as Type I sums), whereas the third sum is of the second kind (which are often
referred to in the literature as Type II sums).
Lemma 10.2.1 (Vaughan's identity). Let $U\ge1$ and $V\ge1$ be two parameters. For any $n>U$, we have that
\[
\Lambda(n) = \sum_{\substack{ab=n\\ a\le V}}\mu(a)\log b - \sum_{\substack{ab=n\\ a\le UV}}\bigg(\sum_{\substack{cd=a\\ c\le V,\ d\le U}}\mu(c)\Lambda(d)\bigg) - \sum_{\substack{ab=n\\ a>U,\ b>V}}\Lambda(a)\sum_{\substack{d\mid b\\ d\le V}}\mu(d).
\]
Moreover,
\[
\sum_{\substack{k\ell=n\\ k>V}}\mu(k)\log\ell = \sum_{\substack{k\ell=n\\ k>V}}\mu(k)\sum_{m\mid\ell}\Lambda(m) = \sum_{m\mid n}\Lambda(m)\sum_{\substack{k\mid n/m\\ k>V}}\mu(k)
\]
\[
= \sum_{\substack{mr=n\\ r>1}}\Lambda(m)\sum_{\substack{k\mid r\\ k>V}}\mu(k) = -\sum_{\substack{mr=n\\ r>1}}\Lambda(m)\sum_{\substack{k\mid r\\ k\le V}}\mu(k)
\]
\[
= -\sum_{\substack{mr=n\\ m\le U}}\Lambda(m)\sum_{\substack{k\mid r\\ k\le V}}\mu(k) - \sum_{\substack{mr=n\\ m>U,\ r>1}}\Lambda(m)\sum_{\substack{k\mid r\\ k\le V}}\mu(k)
\]
\[
= -\sum_{\substack{mk\ell=n\\ k\le V,\ m\le U}}\mu(k)\Lambda(m) - \sum_{\substack{mr=n\\ m>U,\ r>V}}\Lambda(m)\sum_{\substack{k\mid r\\ k\le V}}\mu(k),
\]
so that, since $\Lambda(n) = \sum_{ab=n}\mu(a)\log b$, splitting this last sum according to whether $a\le V$ or $a>V$ yields the identity.
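Since the signs in Vaughan's identity are easy to get wrong, it can be checked numerically. The following brute-force Python sketch (illustrative) evaluates the three sums, here called $S_1$, $S_2$, $S_3$, and verifies $\Lambda(n) = S_1 - S_2 - S_3$ for every $n > U$:

```python
from math import log

def mangoldt(n):
    # von Mangoldt Lambda(n): log p if n = p^k for a prime p, else 0
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            m = n
            while m % p == 0:
                m //= p
            return log(p) if m == 1 else 0.0
        p += 1
    return log(n)  # n is prime

def mu(n):
    # Moebius function
    res, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            res = -res
        p += 1
    return -res if m > 1 else res

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def vaughan(n, U, V):
    # the three sums of Lemma 10.2.1, combined as S1 - S2 - S3
    s1 = sum(mu(a) * log(n // a) for a in divisors(n) if a <= V)
    s2 = sum(mu(c) * mangoldt(a // c)
             for a in divisors(n) if a <= U * V
             for c in divisors(a) if c <= V and a // c <= U)
    s3 = sum(mangoldt(a) * sum(mu(d) for d in divisors(n // a) if d <= V)
             for a in divisors(n) if a > U and n // a > V)
    return s1 - s2 - s3

U = V = 3
for n in range(U + 1, 300):
    assert abs(mangoldt(n) - vaughan(n, U, V)) < 1e-9
```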
Remark 10.2.2. Lemma 10.2.1 can also be proven starting from the straightforward identity
\[
-\frac{\zeta'}{\zeta} = F - \zeta FG - \zeta' G + \Big(-\frac{\zeta'}{\zeta} - F\Big)(1-\zeta G),
\]
where $\zeta$ is the Riemann zeta function,
\[
F(s) = \sum_{n\le U}\frac{\Lambda(n)}{n^s} \quad\text{and}\quad G(s) = \sum_{m\le V}\frac{\mu(m)}{m^s}.
\]
Lemma 10.3.1 (Estimates for Type I sums). Let $\chi$ be a non-principal character $(\mathrm{mod}\ q)$. Also, let $f\colon\mathbb{N}\to\mathbb{C}$ be an arithmetic function that is supported on integers $d\le D$ and which satisfies the inequality $|f|\le\log^r$, for some $r\ge0$. Then, for any $s\ge0$, we have that
\[
\sum_{n\le x}(f*\log^s)(n)\,\chi(n) \ll D\sqrt{q}\,\big(\log(Dqx)\big)^{r+s+1} \qquad (x\ge2).
\]
Remark 10.3.2. The above lemma yields a non-trivial result as soon as $x > (D\sqrt{q})^{1+\epsilon}$.
Now, for every $y\ge1$, the Pólya-Vinogradov inequality (i.e. Theorem A.3.1) and partial summation imply that
\[
\sum_{b\le y}\chi(b)(\log b)^s = \int_1^y (\log t)^s\, d\bigg(\sum_{b\le t}\chi(b)\bigg) = (\log y)^s\sum_{b\le y}\chi(b) - s\int_1^y \frac{(\log t)^{s-1}}{t}\bigg(\sum_{b\le t}\chi(b)\bigg)dt
\]
\[
\ll \sqrt{q}(\log q)(\log y)^s + s\sqrt{q}(\log q)\int_1^y \frac{(\log t)^{s-1}}{t}\,dt \ll \sqrt{q}(\log q)(\log y)^s.
\]
So
\[
\sum_{n\le x}(f*\log^s)(n)\,\chi(n) = \sum_{a\le D} f(a)\chi(a)\sum_{b\le x/a}\chi(b)(\log b)^s \ll \sum_{a\le D}(\log a)^r\sqrt{q}(\log q)\Big(\log\frac{x}{a}\Big)^s \ll D\sqrt{q}(\log q)(\log Dx)^{r+s},
\]
and
\[
\sum_{U<n\le y}\chi(n)\sum_{\substack{ab=n\\ a\le V}}\mu(a)\log b \ll V\sqrt{q}\,(\log x)^2;
\]
then, if we set
\[
S(x;Q) := \sum_{Q<q\le 2Q}\frac{q}{\phi(q)}\ \sideset{}{^*}\sum_{\chi\,(\mathrm{mod}\ q)}\max_{y\le x}\bigg|\sum_{n\le y}\Lambda(n)\chi(n)\bigg|,
\]
we have that
\[
\mathrm{(10.3.2)}\qquad S(x;Q) \ll \sum_{Q<q\le 2Q}\frac{q}{\phi(q)}\ \sideset{}{^*}\sum_{\chi\,(\mathrm{mod}\ q)}\max_{y\le x}\bigg|\sum_{\substack{mn\le y\\ m>U,\ n>V}}\chi(mn)\,\alpha_m\beta_n\bigg| + UVQ^{5/2}(\log x)^2.
\]
This reduces Theorem 10.1.1 to a bilinear sum estimate, which will be accomplished in the next section.
Inserting the above estimate into (10.3.2), and majorizing the integrand by its maximum over all $t\in[-x^2, x^2]$, we deduce that
\[
S(x;Q) \ll x(\log x)^3 \max_{\substack{M\ge U,\ N\ge V\\ MN\le x,\ |t|\le x^2}} \sum_{Q<q\le 2Q}\frac{q}{\phi(q)}\ \sideset{}{^*}\sum_{\chi\,(\mathrm{mod}\ q)} \bigg|\sum_{U<m\le x/V}\frac{\alpha_m\chi(m)}{m^{1/2+it}}\bigg|\, \bigg|\sum_{V<n\le x/U}\frac{\beta_n\chi(n)}{n^{1/2+it}}\bigg| + UVQ^{5/2}(\log x)^2,
\]
since $M\le x/V$ and $N\le x/U$, for all $M$ and $N$ as above. For $Q\le x^{1/2}$, we choose $U = V = x^{2/5}/Q^{3/5}$, so that (10.4.2) becomes
This completes the proof of Theorem 10.1.1, and hence of the Bombieri-Vinogradov theorem.
Dirichlet characters
In this chapter we gather some basic facts about an important class of multiplicative functions, the Dirichlet characters. In general, a Dirichlet character modulo some integer $q$ is a completely multiplicative function $\chi\colon\mathbb{N}\to\mathbb{C}$ such that:
The two above facts imply that $\chi$ induces canonically a group homomorphism $\widetilde{\chi}\colon(\mathbb{Z}/q\mathbb{Z})^*\to\mathbb{C}\setminus\{0\}$, simply by letting $\widetilde{\chi}(n\ (\mathrm{mod}\ q)) = \chi(n)$. In group-theoretic terms, $\widetilde{\chi}$ is a character of the abelian group $(\mathbb{Z}/q\mathbb{Z})^*$.
The above discussion gives a natural correspondence between Dirichlet characters mod $q$ and characters of the group $(\mathbb{Z}/q\mathbb{Z})^*$. In the next section, we develop the basics of the character theory of finite abelian groups, and then discuss it further in the case of the groups $(\mathbb{Z}/q\mathbb{Z})^*$ and $\mathbb{Z}/q\mathbb{Z}$.
148 APPENDIX A. DIRICHLET CHARACTERS
the relation |C(G)| = |G| follows by writing G as the direct product of cyclic groups, say
The characters of an abelian group G satisfy the following important orthogonality rela-
tions.
Theorem A.1.1. Let $(G,\cdot)$ be a finite abelian group. For every $\chi\in C(G)$, we have that
\[
\mathrm{(A.1.2)}\qquad \frac{1}{|G|}\sum_{g\in G}\chi(g) = \begin{cases} 1 & \text{if } \chi = \chi_0,\\ 0 & \text{otherwise},\end{cases}
\]
and, for every $g\in G$, we have that
\[
\mathrm{(A.1.3)}\qquad \frac{1}{|G|}\sum_{\chi\in C(G)}\chi(g) = \begin{cases} 1 & \text{if } g = 1,\\ 0 & \text{otherwise}.\end{cases}
\]
Proof. First, we prove relation (A.1.2). If $\chi = \chi_0$, then (A.1.2) is trivially true. Now assume that $\chi\ne\chi_0$. For every $h\in G$, we have that $hG = G$. So we have that
\[
\mathrm{(A.1.4)}\qquad \chi(h)\sum_{g\in G}\chi(g) = \sum_{g\in G}\chi(hg) = \sum_{g\in G}\chi(g).
\]
Since $\chi\ne\chi_0$, there must be some $h\in G$ with $\chi(h)\ne1$. Together with (A.1.4), this completes the proof of (A.1.2).
The proof of relation (A.1.3) is very similar. This relation is trivial when $g = 1$. Now, if $g\ne1$, then there exists some character $\chi_1\in C(G)$ such that $\chi_1(g)\ne1$. Indeed, if $G$ is cyclic, say $G = \mathbb{Z}/q\mathbb{Z}$, then this is equivalent to saying that, for every $n\not\equiv0\ (\mathrm{mod}\ q)$, there is some $a\in\mathbb{Z}$ such that $an\not\equiv0\ (\mathrm{mod}\ q)$, which is true. In general, the existence of $\chi_1$ follows from the cyclic case and relation (A.1.1). Now, we have that
\[
\mathrm{(A.1.5)}\qquad \chi_1(g)\sum_{\chi\in C(G)}\chi(g) = \sum_{\chi\in C(G)}(\chi_1\chi)(g) = \sum_{\chi\in C(G)}\chi(g).
\]
Since $\chi_1(g)\ne1$ by assumption, relation (A.1.5) completes the proof of (A.1.3), and hence of the theorem.
By (A.1.2), the characters of a group $G$ form an orthonormal set in the space of functions $f\colon G\to\mathbb{C}$ with respect to the inner product
\[
\langle f_1, f_2\rangle_G = \frac{1}{|G|}\sum_{g\in G} f_1(g)\overline{f_2(g)}.
\]
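The orthogonality relations are easy to test numerically. The following Python sketch (illustrative; it builds the characters of $(\mathbb{Z}/7\mathbb{Z})^*$ through the primitive root $3$, a standard construction not spelled out above) checks both (A.1.2) and (A.1.3):

```python
import cmath

q = 7          # prime modulus; (Z/7Z)* is cyclic of order 6
g = 3          # 3 is a primitive root mod 7
order = q - 1

# discrete logarithm table: idx[g^k mod q] = k
idx, x = {}, 1
for k in range(order):
    idx[x] = k
    x = (x * g) % q

def chi(j, a):
    # the j-th character of (Z/qZ)*: chi_j(g^k) = e(jk/(q-1))
    return cmath.exp(2 * cmath.pi * 1j * j * idx[a % q] / order)

units = sorted(idx)
# (A.1.2): the sum over the group vanishes unless chi is principal (j = 0)
for j in range(order):
    s = sum(chi(j, a) for a in units)
    assert abs(s - (order if j == 0 else 0)) < 1e-9
# (A.1.3): the sum over the characters vanishes unless a = 1
for a in units:
    s = sum(chi(j, a) for j in range(order))
    assert abs(s - (order if a == 1 else 0)) < 1e-9
```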
A.2. PRIMITIVE CHARACTERS 149
These functions are also called additive characters $(\mathrm{mod}\ q)$, as opposed to the Dirichlet characters $(\mathrm{mod}\ q)$, which are also called multiplicative characters $(\mathrm{mod}\ q)$. Of particular importance is the interaction between these two different kinds of characters. To this end, given a function $f\colon\mathbb{Z}/q\mathbb{Z}\to\mathbb{C}$, we define its Fourier transform by
\[
\widehat{f}(n) = \sum_{a=1}^{q} f(a)\,e(an/q).
\]
In order to study the Fourier transform of a Dirichlet character $\chi$, we define its Gauss sum $\tau(\chi)$ by the formula
\[
\tau(\chi) = \sum_{a=1}^{q}\chi(a)\,e(a/q),
\]
or equivalently,
\[
\widehat{\chi}(n) = \overline{\chi}(n)\,\tau(\chi), \quad\text{whenever } (n,q) = 1.
\]
This gives the Fourier transform of $\chi$ in terms of the additive characters mod $q$ for the frequencies $n$ that are co-prime to $q$. In the next section we shall see that this formula can be extended to all $n$ for an important class of Dirichlet characters, the primitive characters, thus showing that a primitive Dirichlet character $(\mathrm{mod}\ q)$ is a conjugate eigenvector of the Fourier transform $(\mathrm{mod}\ q)$, with conjugate eigenvalue equal to $\tau(\chi)$.
to $q$, that is to say, $\chi$ is a genuine character $(\mathrm{mod}\ q)$, then we say that $\chi$ is primitive. We give the formal definitions below.
Let $q_1\mid q_2$ and consider two Dirichlet characters $\chi_1$ and $\chi_2$, modulo $q_1$ and $q_2$, respectively. We say that $\chi_1$ induces $\chi_2$ if
\[
\chi_2(n) = \begin{cases} \chi_1(n) & \text{if } (n,q_2) = 1,\\ 0 & \text{otherwise}.\end{cases}
\]
Theorem A.2.2. Let $\chi$ be a primitive Dirichlet character mod $q$. Then, for every $n\in\mathbb{N}$, we have that
\[
\chi(n) = \frac{1}{\tau(\overline{\chi})}\sum_{a=1}^{q}\overline{\chi}(a)\,e(an/q).
\]
Proof. When $(n,q) = 1$, this is a consequence of the general Fourier inversion formula for characters, that is to say, relation (A.1.6). Assume now that $(n,q) = d > 1$, in which case we need to show that
\[
\sum_{a=1}^{q}\overline{\chi}(a)\,e(an/q) = 0.
\]
for all $b\in\{1,2,\ldots,q_1\}$. Since $\chi$ is primitive, Exercise A.2.1 implies that there is some $k\in\mathbb{Z}$ for which $(1+kq_1, q) = 1$ and $\chi(1+kq_1)\ne1$. Consequently,
\[
\chi(1+kq_1)\sum_{j=1}^{d}\chi(b+jq_1) = \sum_{j=1}^{d}\chi\big(b+[bk+(1+kq_1)j]\,q_1\big).
\]
Since $(1+kq_1, q) = 1$, the numbers $\{bk+j(1+kq_1) : 1\le j\le d\}$ run over a complete set of representatives mod $d$, and therefore
\[
\chi(1+kq_1)\sum_{j=1}^{d}\chi(b+jq_1) = \sum_{j=1}^{d}\chi(b+jq_1).
\]
Since $\chi(1+kq_1)\ne1$, relation (A.2.1) follows. This completes the proof of the theorem.
Using the above theorem, we can determine the size of the Gauss sum for primitive Dirichlet characters.
Theorem A.2.3. Let $\chi$ be a primitive Dirichlet character mod $q$. Then $|\tau(\chi)| = \sqrt{q}$.
Exercise A.2.4. Calculate the absolute value of the Gauss sum $\tau(\chi)$ for all Dirichlet characters $\chi\ (\mathrm{mod}\ q)$.
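For a concrete instance of Theorems A.2.2 and A.2.3, one can take $\chi$ to be the Legendre symbol mod a prime, which is a primitive real character; the following Python check (illustrative) verifies both $|\tau(\chi)| = \sqrt{q}$ and the inversion formula for every residue $n$, including those with $(n,q)>1$:

```python
import cmath, math

p = 11  # prime; the Legendre symbol mod p is a primitive real character

def chi(a):
    # Legendre symbol via Euler's criterion
    if a % p == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def e(x):
    return cmath.exp(2 * cmath.pi * 1j * x)

tau = sum(chi(a) * e(a / p) for a in range(1, p))  # Gauss sum tau(chi)
assert abs(abs(tau) - math.sqrt(p)) < 1e-9        # Theorem A.2.3

# Theorem A.2.2 (chi is real, so chi equals its conjugate): for every n,
# sum_a chi(a) e(an/p) = chi(n) tau(chi)
for n in range(p):
    s = sum(chi(a) * e(a * n / p) for a in range(1, p))
    assert abs(s - chi(n) * tau) < 1e-9
```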
A.3 The Pólya-Vinogradov inequality
Theorem A.3.1 (Pólya-Vinogradov inequality). Let $\chi$ be a non-principal character mod $q$. Then
\[
\sum_{M<n\le M+N}\chi(n) \ll \sqrt{q}\log q.
\]
Proof. First, we prove the theorem when $\chi$ is primitive. Theorem A.2.2 and the periodicity of $e(\cdot)$ imply that
\[
\sum_{M<n\le M+N}\chi(n) = \sum_{M<n\le M+N}\frac{1}{\tau(\overline{\chi})}\sum_{\substack{1\le a\le q\\ (a,q)=1}}\overline{\chi}(a)\,e(an/q) = \frac{1}{\tau(\overline{\chi})}\sum_{\substack{-q/2<a\le q/2\\ (a,q)=1}}\overline{\chi}(a)\sum_{M<n\le M+N}e(an/q)
\]
\[
\ll \frac{1}{|\tau(\overline{\chi})|}\sum_{\substack{-q/2<a\le q/2\\ (a,q)=1}}\frac{1}{|1-e(a/q)|}.
\]
So Theorem A.2.3, and the fact that $|1-e(x)|\gg|x|$ for all $x\in[-1/2,1/2]$, imply that
\[
\sum_{M<n\le M+N}\chi(n) \ll \frac{1}{\sqrt{q}}\sum_{\substack{-q/2<a\le q/2\\ (a,q)=1}}\frac{1}{|a/q|} \le 2\sqrt{q}\sum_{1\le a\le q/2}\frac{1}{a} \ll \sqrt{q}\log q.
\]
This completes the proof in the case that $\chi$ is primitive. Finally, if $\chi$ is induced by the primitive character $\chi_1\ (\mathrm{mod}\ q_1)$, then
\[
\sum_{M<n\le M+N}\chi(n) = \sum_{\substack{M<n\le M+N\\ (n,q)=1}}\chi(n) = \sum_{\substack{M<n\le M+N\\ (n,q)=1}}\chi_1(n) = \sum_{\substack{M<n\le M+N\\ (n,\,q/q_1)=1}}\chi_1(n)
\]
\[
= \sum_{M<n\le M+N}\chi_1(n)\sum_{d\mid(n,\,q/q_1)}\mu(d) = \sum_{d\mid q/q_1}\mu(d)\chi_1(d)\sum_{M/d<m\le(M+N)/d}\chi_1(m)
\]
\[
\ll \sum_{d\mid q/q_1}\sqrt{q_1}\log q_1 \le \tau(q/q_1)\sqrt{q_1}\log q \ll \sqrt{q/q_1}\cdot\sqrt{q_1}\log q = \sqrt{q}\log q,
\]
where $\tau(m)$ here denotes the number of divisors of $m$, so that $\tau(m)\ll\sqrt{m}$.
\[
\pi(x;q,a) = \frac{\mathrm{li}(x)}{\phi(q)} + O_A\bigg(\frac{x}{e^{c_A\sqrt{\log x}}}\bigg).
\]
The proof we will present is not the classical complex-analytic proof. Rather, we use the approach of [Kou], which is based on the theory of pretentious multiplicative functions.
As is common, instead of working directly with the indicator function of the primes, we use von Mangoldt's function $\Lambda(n)$, which reduces Theorem B.0.2 to proving that
\[
\sum_{\substack{n\le x\\ n\equiv a\,(\mathrm{mod}\ q)}}\Lambda(n) = \frac{x}{\phi(q)} + O_A\bigg(\frac{x}{e^{c_A\sqrt{\log x}}}\bigg) \qquad \big(q\le(\log x)^A,\ (a,q)=1\big),
\]
where
\[
\epsilon(\chi) := \begin{cases} 1 & \text{if } \chi \text{ is principal},\\ 0 & \text{otherwise}.\end{cases}
\]
So, it suffices to prove that
\[
\mathrm{(B.0.1)}\qquad \sum_{n\le x}\big(\Lambda(n)\chi(n)-\epsilon(\chi)\big) \ll \frac{x}{e^{c\sqrt{\log x}}} \qquad \big(x\ge e^{q^{\epsilon}}\big).
\]
154 APPENDIX B. PRIMES IN ARITHMETIC PROGRESSIONS
We write $F(s)$ for the Dirichlet series corresponding to $\Lambda\chi - \epsilon(\chi)$ and we note that
\[
F(s) = \sum_{n=1}^{\infty}\frac{\Lambda(n)\chi(n)-\epsilon(\chi)}{n^s} = -\frac{L'}{L}(s,\chi) - \epsilon(\chi)\,\zeta(s).
\]
Our strategy towards proving (B.0.1) is to multiply $\Lambda(n)\chi(n)-\epsilon(\chi)$ by an appropriately high power of $\log n$ and control that sum instead. This manoeuvre will allow us to work with $F(s)$ when $\mathrm{Re}(s)>1$, where no arguments about the analyticity and the location of the zeroes of $L(s,\chi)$ are needed. For technical reasons, we also introduce the smoothing $\log(x/n)$ to the sum we consider. Fourier inversion implies that
\[
\sum_{n\le x}\big(\Lambda(n)\chi(n)-\epsilon(\chi)\big)(\log n)^{k-1}\log\frac{x}{n} = \frac{(-1)^{k-1}}{2\pi i}\int_{\mathrm{Re}(s)=1+1/\log x} F^{(k-1)}(s)\,\frac{x^s}{s^2}\,ds
\]
\[
\mathrm{(B.0.2)}\qquad\qquad \ll x\int_{-\infty}^{\infty}\bigg|F^{(k-1)}\Big(1+\frac{1}{\log x}+it\Big)\bigg|\,\frac{dt}{1+t^2}
\]
for all $k\in\mathbb{N}$. In view of the above formula, we need to understand the size of the derivatives of $F$. The following key lemma, which is based on an idea in [IK, p. 40], allows us to reduce this problem to upper bounds for the derivatives of $F(s)$ and a lower bound on $|F(s)|$.
\[
\mathrm{(B.0.3)}\qquad \Big(\frac{F'}{F}\Big)^{(k-1)}(s) = k!\sum_{a_1+2a_2+\cdots=k}\frac{(-1)^{1+a_1+a_2+\cdots}\,(-1+a_1+a_2+\cdots)!}{a_1!\,a_2!\cdots}\Big(\frac{F'}{1!\,F}(s)\Big)^{a_1}\Big(\frac{F''}{2!\,F}(s)\Big)^{a_2}\cdots,
\]
which can be easily verified by induction. In order to complete the proof of the lemma, we will show that
\[
\mathrm{(B.0.4)}\qquad \sum_{a_1+2a_2+\cdots+ka_k=k}\frac{(a_1+a_2+\cdots+a_k)!}{a_1!\,a_2!\cdots a_k!} = \sum_{a_1+2a_2+\cdots+ka_k=k}\binom{a_1+a_2+\cdots+a_k}{a_1,a_2,\ldots,a_k} = 2^{k-1}.
\]
Indeed, for each fixed $k$-tuple $(a_1,\ldots,a_k)\in(\mathbb{N}\cup\{0\})^k$ with $a_1+2a_2+\cdots+ka_k = k$, the multinomial coefficient $\binom{a_1+a_2+\cdots+a_k}{a_1,a_2,\ldots,a_k}$ represents the number of ways of writing $k$ as the sum of $a_1$ ones, $a_2$ twos, and so on, with the order of the different summands being important; e.g. if $k=5$, $a_1=1$, $a_2=2$ and $a_3=a_4=a_5=0$, then there are three such ways to write 5: $5 = 2+2+1 = 2+1+2 = 1+2+2$. So we conclude that
\[
\sum_{a_1+2a_2+\cdots+ka_k=k}\binom{a_1+a_2+\cdots+a_k}{a_1,a_2,\ldots,a_k} = \#\{\text{ordered partitions of } k\},
\]
where we define an ordered partition of $k$ to be a way to write $k$ as the sum of positive integers, with the order of the different summands being important. To every ordered partition $k = b_1+\cdots+b_m$, we can associate a unique subset of $\{1,\ldots,k\}$ in the following way: consider the set $B\subseteq\{1,\ldots,k\}$ which contains $\{1,\ldots,b_1\}$, does not contain $\{b_1+1,\ldots,b_1+b_2\}$, contains $\{b_1+b_2+1,\ldots,b_1+b_2+b_3\}$, does not contain $\{b_1+b_2+b_3+1,\ldots,b_1+b_2+b_3+b_4\}$, and so on. Then $B$ necessarily contains 1 and, conversely, every subset of $\{1,\ldots,k\}$ containing 1 can arise this way. So we conclude that there are $2^{k-1}$ ordered partitions, and (B.0.4) follows, thus completing the proof of the lemma.
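Identity (B.0.4) and the composition count $2^{k-1}$ are easy to confirm by exhaustive enumeration; an illustrative Python check:

```python
from math import factorial

def multinomial_sum(k):
    # sum of (a1+...+ak)!/(a1!...ak!) over a1 + 2*a2 + ... + k*ak = k
    total = 0

    def rec(j, remaining, counts):
        nonlocal total
        if j > k:
            if remaining == 0:
                t = factorial(sum(counts))
                for a in counts:
                    t //= factorial(a)
                total += t
            return
        a = 0
        while a * j <= remaining:
            rec(j + 1, remaining - a * j, counts + [a])
            a += 1

    rec(1, k, [])
    return total

def compositions(k):
    # number of ordered partitions (compositions) of k
    if k == 0:
        return 1
    return sum(compositions(k - b) for b in range(1, k + 1))

for k in range(1, 9):
    assert multinomial_sum(k) == 2 ** (k - 1) == compositions(k)
```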
Our next goal is to obtain upper bounds on the derivatives of $L(s,\chi)$ and a lower bound on $|L(s,\chi)|$. In fact, we shall perform a technical manoeuvre and switch to a sifted version of $L(s,\chi)$. We may do so since
\[
\Big(\frac{L'}{L}\Big)^{(k-1)}(s,\chi) = \Big(\frac{L_y'}{L_y}\Big)^{(k-1)}(s,\chi) + O\big(c^k k!\,(\log y)^k\big),
\]
where
\[
L_y(s,\chi) := \sum_{P^-(n)>y}\frac{\chi(n)}{n^s}.
\]
The reason for doing so is that, in general, we can only control $\sum_{n\le N}\chi(n)n^{-it}$ for $N$ large enough in terms of $q$ and $t$, and it is possible that this sum is rather large for small values of $N$. This would force $L^{(j)}(s,\chi)$ to be large. But then, the same thing would be true for $L(s,\chi)$. However, it is rather hard to capture this correlation in the sizes of $L^{(j)}(s,\chi)$ and $L(s,\chi)$ in practice. By considering the sifted $L$-function $L_y(s,\chi)$, we ensure that the partial sums $\sum_{n\le N}\chi(n)n^{-it}$ we consider all have $N>y$, so the small values of $N$ can no longer cause any problems.
Next, we need to control the sum $\sum_{n\le x,\ P^-(n)>y}\chi(n)n^{-it}$. We use the Fundamental Lemma of Sieve Methods (cf. Lemma 3.3.3) to construct upper and lower bound weights $(\lambda^{\pm}(d))_{d\ge1}$ supported on the set $\{d\le\sqrt{x} : d\mid P(y)\}$. Then
\[
\sum_{\substack{n\le x\\ P^-(n)>y}}\chi(n)n^{-it} = \sum_{n\le x}(1*\lambda^{\pm})(n)\,\chi(n)n^{-it} + O\bigg(\sum_{n\le x}\big(1*(\lambda^+-\lambda^-)\big)(n)\bigg)
\]
\[
= \sum_{d\le\sqrt{x}}\lambda^{\pm}(d)\chi(d)d^{-it}\sum_{m\le x/d}\chi(m)m^{-it} + O\bigg(\frac{x^{1-1/\log y}}{\log y}\bigg).
\]
We break the inner sum into congruence classes mod $q$. For each $b\in(\mathbb{Z}/q\mathbb{Z})^*$, partial summation implies that
\[
\sum_{\substack{m\le x/d\\ m\equiv b\,(\mathrm{mod}\ q)}} m^{-it} = \int_1^{x/d} u^{-it}\,d\Big(\frac{u}{q}+O(1)\Big) = \frac{(x/d)^{1-it}}{q(1-it)} + O(1+|t|\log x).
\]
Summing over $b$ and $d$, we deduce that
\[
\sum_{\substack{n\le x\\ P^-(n)>y}}\chi(n)n^{-it} = \epsilon(\chi)\,\frac{\phi(q)}{q}\cdot\frac{x^{1-it}}{1-it}\prod_{\substack{p\le y\\ p\nmid q}}\Big(1-\frac{1}{p}\Big) + O\bigg(\frac{x^{1-1/\log y}}{\log y}\bigg)
\]
for all $x\ge y$. Let $R_t(x)$ be the error term in the above approximation.
Then
\[
\sum_{\substack{n>z\\ P^-(n)>y}}\frac{\chi(n)(\log n)^k}{n^s} = \int_z^{\infty}\frac{(\log u)^k}{u^s}\,d\bigg(\epsilon(\chi)\,\frac{\phi(q)}{q}\cdot\frac{u^{1-it}}{1-it}\prod_{\substack{p\le y\\ p\nmid q}}\Big(1-\frac{1}{p}\Big) + R_t(u)\bigg)
\]
\[
= \epsilon(\chi)\,\frac{\phi(q)}{q}\prod_{\substack{p\le y\\ p\nmid q}}\Big(1-\frac{1}{p}\Big)\int_z^{\infty}\frac{(\log u)^k}{u^{s+it}}\,du + \int_z^{\infty}\frac{(\log u)^k}{u^{s}}\,dR_t(u).
\]
Since
\[
\int_z^{\infty}\frac{(\log u)^k}{u^{\sigma}}\,dR_t(u) = -\frac{(\log z)^k R_t(z)}{z^{\sigma}} + \int_z^{\infty} R_t(u)\,\frac{(\log u)^{k-1}(\sigma\log u - k)}{u^{\sigma+1}}\,du
\]
\[
\ll \frac{(\log z)^k(\sigma+k)}{\log y} + \frac{\sigma+k}{\log y}\int_z^{\infty}\frac{(\log u)^k}{u^{1+1/\log y}}\,du \ll \frac{(k+1)!\,(\log z)^{k+1}}{\log y},
\]
This quantity defines a certain measure of distance between f and g, that is to say
D(f, g; y, x) is small if f pretends to be g for primes in (y, x]. It was introduced by Granville
and Soundararajan, partially in order to conceptualize and put under a unified framework
several results in the literature. It is central in the theory of pretentious multiplicative func-
tions and it satisfies the triangle inequality (see [GS, p. 207] for a slightly weaker form of
this inequality).
for all $r\in(0,1]$ and all complex numbers $w, w'$ such that $|w|,|w'|\le r$. We write $w = x+iy$ and $w' = a+ib$ and we consider $r$, $x$ and $a$ fixed for the moment. So our goal is to show that
\[
\sqrt{1-(ax+by)/r^2} \le \sqrt{1-x} + \sqrt{1-a},
\]
for all $b, y$ with $|b|\le\sqrt{r^2-a^2}$ and $|y|\le\sqrt{r^2-x^2}$. It is easy to see that $1-(ax+by)$ is maximized when $b^2 = r^2-a^2$ and $y^2 = r^2-x^2$, so that (B.0.8) follows by the case $|w| = |w'| = r$. We set $w = re^{i\theta}$ and $w' = re^{i\varphi}$, and we define $s\in[0,1]$ via the relation
\[
\frac{2s}{s^2+1} = r.
\]
We can now give our lower bound for $|L_y(s,\chi)|$.
Lemma B.0.6. Fix $\epsilon\in(0,1]$. Let $\chi$ be a Dirichlet character modulo $q$, $s = \sigma+it$ with $\sigma>1$ and $t\in\mathbb{R}$, and $y\ge q+|t|$. If $|t|\ge\epsilon/\log y$ or if $\chi$ is complex, then we have that $|L_y(s,\chi)|\gg_\epsilon 1$. Finally, if $\chi$ is a non-principal, real character, and $|t|\le 1/\log y$, then $|L_y(s,\chi)|\gg L_y(1,\chi)$.
Proof. First, assume that either $|t|\ge\epsilon/\log y$ or $\chi$ is complex; equivalently, either $\chi^2$ is non-principal or $|t|\ge\epsilon/\log y$. Note that
\[
\log\bigg|L_y\Big(1+\frac{1}{\log x}+it,\chi\Big)\bigg| = \sum_{p>y}\frac{\mathrm{Re}(\chi(p)p^{-it})}{p^{1+1/\log x}} + O(1) = \sum_{y<p\le x}\frac{\mathrm{Re}(\chi(p)p^{-it})}{p} + O(1)
\]
\[
= D\big(\chi(n),\mu(n)n^{it};y,x\big)^2 - \log\frac{\log x}{\log y} + O(1),
\]
upon observing that $p^{1/\log x} = 1+O(\log p/\log x)$ for $p\le x$ and that $\sum_{p>x}1/p^{1+1/\log x}\ll1$. Therefore we see that if $|L_y(1+1/\log x+it,\chi)|$ is small, then $\chi(n)$ must pretend to be $\mu(n)n^{it}$. But then the triangle inequality implies that $\chi^2(n)$ must pretend to be $n^{2it}$, and this is impossible by our upper bounds on $|L_y(1+1/\log x+2it,\chi^2)|$ provided by Lemma B.0.4.
Indeed, for every $x\ge y$ we have that
\[
D\big(\chi(n),\mu(n)n^{it};y,x\big)^2 \ge \frac{1}{4}\,D\big(\chi^2(n),n^{2it};y,x\big)^2 = \frac{1}{4}\log\frac{\log x}{\log y} - \frac{1}{4}\sum_{y<p\le x}\frac{\mathrm{Re}(\chi^2(p)p^{-2it})}{p} + O(1)
\]
\[
= \frac{1}{4}\log\frac{\log x}{\log y} - \frac{1}{4}\log\bigg|L_y\Big(1+\frac{1}{\log x}+2it,\chi^2\Big)\bigg| + O(1) \ge \frac{1}{4}\log\frac{\log x}{\log y} + O_\epsilon(1),
\]
by relation (B.0.6) with $k=0$, provided that $y\ge\max\{q+|t|,\,e^{(2-\epsilon)/(2|t|)}\}$. Hence,
\[
\mathrm{(B.0.9)}\qquad \bigg|L_y\Big(1+\frac{1}{\log x}+it,\chi\Big)\bigg| \gg \Big(\frac{\log y}{\log x}\Big)^{3/4} \qquad \Big(x\ge y\ge\max\big\{q+|t|,\,e^{(2-\epsilon)/(2|t|)}\big\}\Big).
\]
We claim that this inequality is self-improving. Indeed, if $x$ and $y$ are as above, then
\[
\sum_{y<p\le x}\frac{\chi(p)}{p^{1+it}} = \sum_{p>y}\frac{\chi(p)}{p^{1+it+1/\log x}} + O(1) = \sum_{p>y}\frac{\chi(p)}{p^{1+it+1/\log x}} - \sum_{p>y}\frac{\chi(p)}{p^{1+it+1/\log y}} + O(1)
\]
\[
= \int_y^x \sum_{p>y}\frac{\chi(p)\log p}{p^{1+it+1/\log u}}\,\frac{du}{u(\log u)^2} + O(1) = -\int_y^x \frac{L_y'}{L_y}\Big(1+it+\frac{1}{\log u},\chi\Big)\,\frac{du}{u(\log u)^2} + O(1)
\]
\[
\ll \int_y^x \Big((\log y)^{3/4}(\log u)^{1/4} + 1\Big)\,\frac{du}{u(\log u)^2} \ll 1,
\]
by (B.0.6) and (B.0.9). So we conclude that
\[
\mathrm{(B.0.10)}\qquad \sum_{y<p\le x}\frac{\chi(p)}{p^{1+it}} \ll 1 \qquad \Big(x\ge y\ge\max\big\{q+|t|,\,e^{(2-\epsilon)/(2|t|)}\big\}\Big).
\]
The above relation for $x = \max\{e^{1/(\sigma-1)}, y\}$ implies that $|L_y(s,\chi)|\gg1$, which completes the proof of the first part of the lemma.
Finally, assume that $\chi$ is a real, non-principal character, and that $|t|\le1/\log y$. Let $x = \max\{e^{1/(\sigma-1)}, y\}$ and $z = \min\{x, e^{1/|t|}\}\ge y\ge q+|t|$. Then we have that
\[
\sum_{z<p\le x}\frac{\chi(p)}{p^{1+it}} \ll 1,
\]
trivially if $z = x$, and by relation (B.0.10) with $z$ in place of $y$ otherwise, which holds, since in this case $z = e^{1/|t|}\ge e^{(2-\epsilon)/(2|t|)}$. So we deduce that
\[
\sum_{y<p\le x}\frac{\chi(p)}{p^{1+it}} = \sum_{y<p\le z}\frac{\chi(p)}{p^{1+it}} + O(1) = \sum_{y<p\le z}\frac{\chi(p)+O(|t|\log p)}{p} + O(1) = \sum_{y<p\le z}\frac{\chi(p)}{p} + O(1).
\]
Finally, we prove an estimate for the derivatives of $(L'/L)(s,\chi)$, which will be key in the proof of Theorem B.0.2.
Lemma B.0.7. Let $\chi$ be a Dirichlet character modulo $q$ and $s = \sigma+it$ with $\sigma>1$ and $t\in\mathbb{R}$. For every $k\in\mathbb{N}$ we have that
\[
\Big(\frac{L'}{L}\Big)^{(k-1)}(s,\chi) + \epsilon(\chi)\,\frac{(-1)^{k-1}(k-1)!}{(s-1)^k} \ll \frac{\big(ck\log(2q+|t|)\big)^k}{\epsilon(\chi) + (1-\epsilon(\chi))\,|L_{q+|t|}(s,\chi)|}.
\]
Proof. Set $y = q+|t|$ and fix some constant $\alpha$ to be chosen later. We separate three cases.
Case 1: $\sigma\ge1+\alpha/\log y$. Note that $\epsilon(\chi)+(1-\epsilon(\chi))|L_{q+|t|}(s,\chi)|\gg1$, trivially if $\epsilon(\chi)=1$, and by relation (B.0.6) with $k=0$ if $\epsilon(\chi)=0$. Since we also have that
\[
\Big(\frac{L'}{L}\Big)^{(k-1)}(s,\chi) + \epsilon(\chi)\,\frac{(-1)^{k-1}(k-1)!}{(s-1)^k} \ll \sum_{n=1}^{\infty}\frac{\Lambda(n)(\log n)^{k-1}}{n^{1+\alpha/\log y}} + \frac{(k-1)!}{(\alpha/\log y)^k} \ll (c_1k\log y)^k,
\]
the desired bound follows in this case.
For the remaining cases, set
\[
F(s) = (s-1)^{\epsilon(\chi)}\,L_y(s,\chi)\prod_{p\le y}\Big(1-\frac{1}{p^s}\Big)^{-\epsilon(\chi)}.
\]
Together with (B.0.11), the above estimate reduces the desired result to showing that
\[
\mathrm{(B.0.12)}\qquad |F(s)| \gg 1.
\]
If $\epsilon(\chi)=0$, then (B.0.12) holds, since $|F(s)| = |L_y(s,\chi)|\gg1$, by relation (B.0.6) with $k=0$. Lastly, if $\epsilon(\chi)=1$, then $|F(s)| = 1+O(|s-1|\log y) = 1+O(\alpha)$ by relation (B.0.5) with $k=0$ (note that $\prod_{p\le y,\ p\mid q}(1-1/p) = \phi(q)/q$, since $y>q$ by assumption). So choosing $\alpha$ small enough (independently of $k$, $q$ and $s$), we find that $|F(s)|\gg1$, that is, (B.0.12) is satisfied in this case too. This completes the proof of (B.0.12) and hence of the lemma.
The last ingredient missing in order to complete the proof of Theorem B.0.2 is Siegel's theorem:
Theorem B.0.8. Let $\epsilon\in(0,1]$. With at most one exception, for all real, non-principal, primitive Dirichlet characters $\chi$ we have that $L(1,\chi)\gg_\epsilon q^{-\epsilon}$, where $q$ denotes the conductor of $\chi$; the implied constant is effectively computable.
Before proving Theorem B.0.8, we need a preliminary lemma. The idea behind its proof can be traced back to [Pi].
If $L(1-\delta, f)\ge0$ for some $\delta\in[0,1/20]$, then $L(1,f)\ge c_{\delta,r}/Q^{2\delta}$. If, in addition, $L(1-\delta, f) = 0$, then $L(1,f)\ge c_{\delta,r}(\log Q)^{r}$.
and
\[
\sum_{b\le B}\frac{1}{b} = \log B + \gamma + O(B^{-1}) \quad (B\ge1), \qquad\text{where } \gamma = 1-\int_1^{\infty}\frac{\{u\}}{u^2}\,du.
\]
By Lemma B.0.9 applied to $\chi_1$ and the standard bound $L(1,\chi_1)\ll\log q$, which follows by partial summation, we deduce the theorem.
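Both the harmonic-sum expansion and the stated integral formula for $\gamma$ can be checked numerically; the second check below integrates $\{u\}/u^2$ exactly on each interval $[n, n+1]$ (a small illustrative computation):

```python
import math

gamma = 0.5772156649015329  # Euler-Mascheroni constant

# sum_{b <= B} 1/b = log B + gamma + O(1/B); the error is in fact ~ 1/(2B)
for B in (10 ** 3, 10 ** 4, 10 ** 5):
    H = sum(1.0 / b for b in range(1, B + 1))
    assert abs(H - math.log(B) - gamma) < 1.0 / B

# gamma = 1 - int_1^inf {u}/u^2 du; on [n, n+1] the antiderivative of
# (u - n)/u^2 is log(u) + n/u, so each piece equals log((n+1)/n) - 1/(n+1)
N = 10 ** 5
integral = sum(math.log((n + 1) / n) - 1.0 / (n + 1) for n in range(1, N + 1))
assert abs((1 - integral) - gamma) < 1e-4
```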
Also, we state the following classical result, which is a consequence of Dirichlet's class number formula [Da, p. 49-50]. See also [IK, p. 37] for an elementary proof, in the spirit of the proof of Lemma B.0.9.
If, now, $|t|\ge1/\log(3q)\ge1/(3\log(q+|t|))$ or $\chi$ is complex, then $|L_{q+|t|}(s,\chi)|\gg1$ by Lemma B.0.6, so (B.0.14) follows. Also, if $|t|\le1/\log(3q)$ and $\chi$ is principal, that is to say, $\epsilon(\chi)=1$, then we have trivially that $\epsilon(\chi)+(1-\epsilon(\chi))|L_{q+|t|}(s,\chi)| = 1$, so (B.0.14) holds in this case too. Finally, if $|t|\le1/\log(3q)\le1$ and $\chi$ is real and non-principal, then $|t|\le1/\log(q+|t|)$, and thus Lemma B.0.6 implies that
\[
\epsilon(\chi)+(1-\epsilon(\chi))|L_{q+|t|}(s,\chi)| = |L_{q+|t|}(s,\chi)| \gg L_{q+|t|}(1,\chi).
\]
So, a continuity argument implies that
\[
\mathrm{(B.0.15)}\qquad L_{q+|t|}(1,\chi) = \lim_{\sigma\to1^+} L(\sigma,\chi)\prod_{p\le q+|t|}\Big(1-\frac{\chi(p)}{p^{\sigma}}\Big) = L(1,\chi)\prod_{p\le q+|t|}\Big(1-\frac{\chi(p)}{p}\Big) \gg \frac{L(1,\chi)}{\log q} \gg_\epsilon \frac{q^{-\epsilon/3}}{\log q}
\]
by Theorems B.0.8 and B.0.10, which shows (B.0.14) in this last case too. This completes the proof of (B.0.13).
Next, for every integer $k\ge3$ and for every real number $y\ge2$, we apply relations (B.0.2) and (B.0.13) to get that
\[
\sum_{n\le y}\big(\Lambda(n)\chi(n)-\epsilon(\chi)\big)(\log n)^{k-1}\log\frac{y}{n} \ll y\int_{|t|\ge1/\log(3q)}\frac{\big(c_5k\log(2q)\big)^k+\big(c_5k\log(|t|+1)\big)^k}{1+t^2}\,dt + y\big(c_2kq^{\epsilon/3}\big)^k
\]
\[
\ll y\big(c_6k^2\big)^k + y\big(c_6kq^{\epsilon/3}\big)^k.
\]
Now, set
\[
\eta(x) = \sqrt{x}\log x + x\bigg\{\Big(\frac{c_6k^2}{\log x}\Big)^{k/2} + \Big(\frac{c_6kq^{\epsilon/3}}{\log x}\Big)^{k/2}\bigg\}
\]
and note that $\eta(x)\le x$, provided that $c_6k^2\le\log x$ and $c_6kq^{\epsilon/3}\le\log x$. We claim that
\[
\mathrm{(B.0.16)}\qquad \sum_{n\le x}\big(\Lambda(n)\chi(n)-\epsilon(\chi)\big)(\log n)^{k-1} \ll \eta(x)(\log x)^{k-1} \qquad (x\ge4).
\]
If $\eta(x) > x/2$, then (B.0.16) holds trivially. So assume that $\eta(x) < x/2$. Applying (B.0.2) for $y = x$ and $y = x-\eta(x)$ and subtracting one inequality from the other completes the proof of (B.0.16). Relation (B.0.16) and partial summation imply that
\[
\sum_{n\le x}\big(\Lambda(n)\chi(n)-\epsilon(\chi)\big) = O(\sqrt{x}) + \int_{\sqrt{x}}^{x}\frac{1}{(\log t)^{k-1}}\,d\bigg(\sum_{n\le t}\big(\Lambda(n)\chi(n)-\epsilon(\chi)\big)(\log n)^{k-1}\bigg) \ll 2^k\,\eta(x) \qquad (x\ge16).
\]
Choosing
\[
k = \min\bigg\{\frac{\sqrt{\log x}}{4c_6},\ \frac{\log x}{4c_6\,q^{\epsilon/3}}\bigg\} + O(1) = \frac{\sqrt{\log x}}{4c_6} + O(1),
\]
where we used our assumption that $x\ge e^{q^{\epsilon}}$, yields the estimate
\[
\mathrm{(B.0.17)}\qquad \sum_{n\le x}\big(\Lambda(n)\chi(n)-\epsilon(\chi)\big) \ll x\,e^{-c_7\sqrt{\log x}},
\]
which proves (B.0.1) and, consequently, completes the proof of the theorem.
Bibliography
[Da] H. Davenport, Multiplicative number theory. Third edition. Revised and with a preface
by Hugh L. Montgomery. Graduate Texts in Mathematics, 74. Springer-Verlag, New
York, 2000.
[Fo] K. Ford, Sieve methods. Lecture notes (2011). http://www.math.uiuc.edu/~ford/sieve_methods_Sp2011.html
[FGKT] K. Ford, B. Green, S. Konyagin and T. Tao, Large gaps between consecutive prime
numbers. Preprint (2014). arXiv:1408.4505
[FGKMT] K. Ford, B. Green, S. Konyagin, J. Maynard and T. Tao, Long gaps between
primes. Preprint (2014). arXiv:1412.5029
[FG] J. Friedlander and A. Granville, Limitations to the equi-distribution of primes. I. Ann. of Math. (2) 129 (1989), no. 2, 363–382.
[FI98] J. Friedlander and H. Iwaniec, Asymptotic sieve for primes. Ann. of Math. (2) 148 (1998), no. 3, 1041–1065.
[FI10] , Opera de cribro. American Mathematical Society Colloquium Publications, 57.
American Mathematical Society, Providence, RI, 2010.
[GPY] D. A. Goldston, J. Pintz and C. Y. Yildirim, Primes in tuples. I. Ann. of Math. (2) 170 (2009), no. 2, 819–862.
[GGPY] D. A. Goldston, S. W. Graham, J. Pintz and C. Y. Yildirim, Small gaps between primes or almost primes. Trans. Amer. Math. Soc. 361 (2009), no. 10, 5285–5330.
[Gr95] A. Granville, Harald Cramér and the distribution of primes. Harald Cramér Symposium (Stockholm, 1993). Scand. Actuar. J. 1995, no. 1, 12–28.
[Gr15] , Primes in intervals of bounded length. Bull. Amer. Math. Soc. (N.S.) 52 (2015), 171–222.
[Gr12] , Analytic number theory. Lecture notes (2012). http://www.dms.umontreal.ca/
~andrew/Courses/MAT6627.A12.html
[GS] A. Granville and K. Soundararajan, Pretentious multiplicative functions and an inequality for the zeta-function. Anatomy of integers, 191–197, CRM Proc. Lecture Notes, 46, Amer. Math. Soc., Providence, RI, 2008.
165
[HR] H. Halberstam and H.E. Richert, Sieve methods. London Mathematical Society Mono-
graphs, No. 4. Academic Press [A subsidiary of Harcourt Brace Jovanovich, Publishers],
London-New York, 1974.
[IK] H. Iwaniec and E. Kowalski, Analytic number theory. American Mathematical Society
Colloquium Publications, 53, American Mathematical Society, Providence, RI, 2004.
[Kou] D. Koukoulopoulos, Pretentious multiplicative functions and the prime number theorem for arithmetic progressions. Compos. Math. 149 (2013), no. 7, 1129–1149.
[Kow] E. Kowalski, Gaps between prime numbers and prime numbers in arithmetic progressions, after Y. Zhang and J. Maynard. Survey (Bourbaki seminar, March 2014). The English version is available at people.math.ethz.ch/~kowalski/zhang-bourbaki.pdf
[Mai81] H. Maier, Chains of large gaps between consecutive primes. Adv. in Math. 39 (1981), no. 3, 257–269.
[May15a] J. Maynard, Small gaps between primes. Ann. of Math. (2) 181 (2015), no. 1, 383–413.
[MV] H. L. Montgomery and R. C. Vaughan, The large sieve. Mathematika 20 (1973), 119–134.
[Pi] J. Pintz, Elementary methods in the theory of L-functions. I. Hecke's theorem. Acta Arith. 31 (1976), no. 1, 53–60.
[Pol] D. H. J. Polymath, Variants of the Selberg sieve, and bounded intervals containing
many primes. Research in the Mathematical Sciences 1:12 (2014). arXiv:1407.4897
[Ra] R. A. Rankin, The difference between consecutive primes. J. London Math. Soc. (1938) s1-13 (4), 242–247.
[Ro] G. Rodriquez, Sul problema dei divisori di Titchmarsh. (Italian. English summary). Boll. Un. Mat. Ital. (3) 20 (1965), 358–366.
[Se47] A. Selberg, On an elementary method in the theory of primes. Norske Vid. Selsk. Forh., Trondhjem 19 (1947), no. 18, 64–67.
[Se91] , Lectures on sieves. Collected papers. Vol. II. With a foreword by K. Chan-
drasekharan. Springer-Verlag, Berlin, 1991.
[Sou] K. Soundararajan, Small gaps between prime numbers: the work of Goldston-Pintz-Yildirim. Bull. Amer. Math. Soc. (N.S.) 44 (2007), no. 1, 1–18.
[Taa] T. Tao, The parity problem in sieve methods. Blog post: terrytao.wordpress.com/2007/06/05/open-question-the-parity-problem-in-sieve-theory/
[Tab] , Polymath8b: Bounded intervals with many primes, after Maynard. Blog post: terrytao.wordpress.com/2013/11/19/polymath8b-bounded-intervals-with-many-primes-after-maynard/#more-7155
[Z] Y. Zhang, Bounded gaps between primes. Ann. of Math. (2) 179 (2014), no. 3, 1121–1174.