
MA4263 Introduction to Analytic Number Theory

Notes by Chan Heng Huat


References

The references are “Introduction to Analytic Number Theory” by T.M.
Apostol, 1991 Analytic Number Theory notes by A. Hildebrand, and
“Analytic Number Theory for Undergraduates” by H.H. Chan. The last
chapter is based on notes by F. Sica.

1
Fundamental Theorem of Arithmetic

1.1 Least Integer Axiom and Mathematical Induction


Let
Z = {0, ±1, ±2, · · · }

be the set of integers. Let N denote the set of non-negative integers.


The Least Integer Axiom, more commonly known as the Well Ordering
Principle, is the following:
The Least Integer Axiom
Let S be a non-empty subset of the set of positive integers Z+. Then
there exists an integer ℓ ∈ S such that

ℓ ≤ s

for all s ∈ S.
The Least Integer Axiom is equivalent to the Principle of Mathemat-
ical Induction. Assuming the Principle of Mathematical Induction, we
now derive the Least Integer Axiom.

Proof
Let S be any non-empty subset of Z+ and let u ∈ S. Write

S = {s ∈ S|s ≤ u} ∪ {s ∈ S|s > u}.

Note that if the first set has a least positive integer, then this integer is
also a least positive integer of S. This shows that we only need to prove
the Least Integer Axiom for finite subsets of Z+ .
For k ≥ 1, let P (k) denote the proposition that “any non-empty subset
T of Z+ with k integers contains a least non-negative integer.” If a set
T has one integer, then that integer is the least integer in T and hence,
P (1) is true. Suppose P (k) is true. Let T be a subset of Z+ with k + 1
elements. Let a ∈ T . Consider the set T − {a}. This is a set with k
elements. By induction hypothesis, T − {a} has a least non-negative
integer, say m. Then m ≤ s for all s ∈ T − {a}.
Now, if a ≤ m, then a ≤ t for all t ∈ T . If m ≤ a, then m is the least
non-negative integer in T . Hence, T has a least non-negative integer
and P (k + 1) is true. This completes the derivation of the Least Integer
Axiom from the Principle of Mathematical Induction.

Remark 1.1.1
Assuming the Least Integer Axiom, one can derive the Principle of Math-
ematical Induction.

1.2 The Division Algorithm

Theorem 1.2.1 (Division Algorithm)


Let a and b be integers such that b > 0. Then there exist unique integers
q and r with
a = bq + r, where 0 ≤ r < b.

Proof
Let
S = {y ∈ Z|y = a − bx, x ∈ Z and y ≥ 0}.
Note that since
a − b(−|a|) = a + b|a| ≥ 0,

we find that
a + b|a| ∈ S,

and we conclude that S is nonempty. By the Least Integer Axiom, S


contains a least non-negative integer, which we denote by r. We note
that since r ∈ S,
r = a − bq,
for some integer q. We therefore conclude that
a = bq + r and r ≥ 0.

We now show that r < b. Suppose r ≥ b. Then


r − b ≥ 0 and r − b = a − b(q + 1).

This implies that


r − b ∈ S.

By assumption, b > 0 and hence r − b < r. Hence, we have found


a non-negative integer r − b contained in S and smaller than r. This
contradicts the minimality of r and we conclude that r < b.
Finally, we show that the integers q and r are unique. We suppose the
contrary. Then there is a different representation of the form a = bq′ + r′
with 0 ≤ r′ < b. This implies that
b(q′ − q) = r − r′ (1.2.1)

and we conclude that

|r − r′| = kb

for some non-negative integer k. On the other hand, both r, r′ lie in the
interval [0, b) and |r − r′| can be a multiple of b only when |r − r′| = 0.
In other words, r = r′ and by (1.2.1), q = q′. This contradicts the fact
that the representations a = bq′ + r′ and a = bq + r are different and
therefore, the integers q and r must be unique.
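
The proof of Theorem 1.2.1 can be traced on a computer. The following is a minimal Python sketch, not part of the original notes: it starts from the element a + b|a| of S used in the proof and subtracts b until the least element of S is reached. The function name division_algorithm is our own choice.

```python
def division_algorithm(a, b):
    """Return (q, r) with a == b*q + r and 0 <= r < b, assuming b > 0."""
    if b <= 0:
        raise ValueError("b must be positive")
    # x = -|a| gives the element a + b|a| >= 0 of S from the proof.
    x = -abs(a)
    r = a - b * x
    # While r >= b, the smaller number r - b = a - b(x + 1) still lies in S.
    while r >= b:
        r -= b
        x += 1
    # Now r is the least element of S and q = x.
    return x, r


if __name__ == "__main__":
    print(division_algorithm(-7, 3))
    print(division_algorithm(17, 5))
```

The loop mirrors the minimality argument only; it is slow for large |a| and is meant as an illustration, since Python's built-in divmod returns the same pair when b > 0.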

When r = 0 in Theorem 1.2.1, we have a = bq and we say that b
divides a and we write b|a. When r > 0, we say that b does not divide
a and we write b ∤ a.

Definition 1.2.1
If an integer b divides a, we say that b is a divisor of a and that a is a
multiple of b.

Definition 1.2.2
We say that a positive integer is a prime if it has exactly two divisors,
namely, 1 and itself.

We now state some elementary properties of divisibility.


Theorem 1.2.2
Let m, n, a, b, d ∈ Z with ad ≠ 0. Then the following statements are true:

(a) For all integers n, n|n.
(b) Let d ≠ 0. If d|n and n|m, then d|m.
(c) Let d ≠ 0. If d|n and d|m, then d|(an + bm).
(d) If d|n, then ad|an.
(e) If ad|an, then d|n.
(f) If d|n and n ≠ 0, then |d| ≤ |n|.
(g) Suppose nd ≠ 0. If d|n and n|d, then |d| = |n|.
(h) Suppose n ≠ 0. If d|n, then (n/d)|n.

Proof
We will prove (c) and leave the rest of the statements as exercises. Since
d|n, we find that n = ds for some integer s. Similarly, d|m implies that
m = dt for some integer t. Now,

an + bm = ads + bdt = d(as + bt).

This shows that d|(an + bm) for any integers a and b.

In Theorem 1.2.2 (h), we see that if d is a divisor of n, then n/d is
also a divisor of n. If d is a divisor of n, then n/d is called the
conjugate divisor of d.

1.3 Greatest common divisors

Definition 1.3.1
Let a and b be integers for which at least one of them is non-zero. A
common divisor of integers a and b is an integer c with c|a and c|b.

Definition 1.3.2
A greatest common divisor of integers a and b is a number d with the
following properties :
(a) The integer d is non-negative.
(b) The integer d is a common divisor of a and b.
(c) If e is any common divisor of a and b, then e|d.
The greatest common divisor of two integers (one of which is non-zero)
is unique. It is written as
(a, b).
We will show later that the greatest common divisor of two integers a
and b exists.

Remark 1.3.1

Note that if b ≠ 0 and a = 0 then |b| = (b, 0).


We will next show that the greatest common divisor of two integers
exists. By Remark 1.3.1, it suffices to consider the case when both a
and b are nonzero.

Theorem 1.3.1
Let a and b be integers for which at least one of them is non-zero. Then
the smallest positive integer in the set
P := {sa + tb|s, t ∈ Z and sa + tb > 0}
is (a, b).

Proof
If a is positive then a ∈ P since
a = 1 · a + 0 · b.
Similarly, if b is positive, then b ∈ P . Suppose a and b are both negative.
Then 0 · a + (−1) · b ∈ P . Hence P is nonempty. By the Least Integer
Axiom, there is a smallest positive integer, say d, in P . Our aim is to
show that
d = (a, b).

Since d ∈ P ,
d = xa + yb (1.3.1)
for some integers x, y ∈ Z. We first show that d is a common divisor of
a and b.
By Theorem 1.2.1, we may suppose
a = dq + r, 0 ≤ r < d.
Then
r = a − dq = a − (xaq + ybq) = a(1 − xq) + b(−yq).
Therefore, if r > 0, then r ∈ P and r is smaller than d. But d is the
smallest integer in P . Hence r = 0. In other words, d|a. By a similar
argument, with a replaced by b, we conclude that d|b. This shows that d
is a common divisor of a and b.
Next, we observe that since d ∈ P , d > 0. Furthermore, if c|a and c|b
then a = cu and b = cv. This implies, by (1.3.1), that
d = xa + yb = c(ux + vy)
and hence, c|d. This shows that d satisfies the conditions in Definition
1.3.2 and we conclude that d = (a, b).
Identity (1.3.1) will be used frequently and we record it as follows:

Corollary 1.3.2 Let a and b be integers, at least one of which is
non-zero. Then there exist integers x and y such that
(a, b) = ax + by.
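
Corollary 1.3.2 only asserts that x and y exist. As an illustration, the following Python sketch computes one such pair. It uses the extended Euclidean algorithm, which is a standard technique but is not the argument used in the proof of Theorem 1.3.1; the function name bezout is hypothetical.

```python
from math import gcd


def bezout(a, b):
    """Return (d, x, y) with d = (a, b) >= 0 and d == a*x + b*y."""
    # Loop invariants: old_r == a*old_x + b*old_y and r == a*x + b*y.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:
        # Normalise the sign so that d is non-negative, as in Definition 1.3.2.
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


if __name__ == "__main__":
    d, x, y = bezout(12, 18)
    print(d, x, y, 12 * x + 18 * y == d, d == gcd(12, 18))
```

The pair (x, y) is not unique; the sketch returns whichever pair the recursion produces.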

Definition 1.3.3
We say that two integers a and b are relatively prime if
(a, b) = 1.

Theorem 1.3.3
Let a and b be nonzero integers. Then (a, b) = 1 if and only if 1 = ax+by
for some integers x and y.

Proof
Note that if (a, b) = 1, then by Corollary 1.3.2,
1 = ax + by
for some integers x and y.
Conversely, if
1 = ax + by,
then (a, b)|a and (a, b)|b, and therefore (a, b)|1. This implies that (a, b) =
1.

We now list down some basic properties of the greatest common divisor
of two integers.

Theorem 1.3.4
Let a, b and c be nonzero integers. Then

(a) (a, b) = (b, a)


(b) (a, (b, c)) = ((a, b), c) and
(c) (ac, bc) = |c|(a, b).

Proof
We will prove only (c) and leave the proofs of the other statements as
exercises. Let d = (ac, bc) and d′ = |c|(a, b). By Corollary 1.3.2,

d = acx + bcy

for some integers x and y. Hence,

\[ d = \frac{c}{|c|}\,\bigl(a \cdot |c| \cdot x + b \cdot |c| \cdot y\bigr). \tag{1.3.2} \]

Now, d′ = |c|(a, b) and since (a, b)|a and (a, b)|b, we find that d′ is a
common divisor of a · |c| and b · |c| and therefore, by (1.3.2), d′|d.
Next, since d′/|c| = (a, b), by Corollary 1.3.2,

\[ \frac{d'}{|c|} = au + bv \]

for some integers u and v. This implies that

\[ d' = a \cdot |c| \cdot u + b \cdot |c| \cdot v = \frac{|c|}{c}\,(acu + bcv). \]

But d is a common divisor of ac and bc and hence d|d′. Since d′|d and
d|d′, we conclude by Theorem 1.2.2 (g) that |d| = |d′|. Since both d and
d′ are positive, we deduce that d = d′.
1.4 Congruences
We say that a is congruent to b modulo n when n|(a − b). The notation
is
a ≡ b (mod n).

Theorem 1.4.1 (Basic Properties of Congruences)

Let a, b, c, d, n be integers with n > 0. Then


(a) For all integers k, k ≡ k (mod n).
(b) If a ≡ b (mod n) then b ≡ a (mod n).
(c) If a ≡ b (mod n) and b ≡ c (mod n) then a ≡ c (mod n).
(d) If a ≡ b (mod n) and c ≡ d (mod n) then a + c ≡ b + d (mod n)
and ac ≡ bd (mod n).

The proof of Theorem 1.4.1 is straightforward and we leave it as an


exercise.
We know that if c ≠ 0 then ca = cb implies that a = b. This is known
as the law of cancellation for equality. The law is not true in general if we
replace “=” by “≡”. For example, 3 · 5 = 15 ≡ 3 = 3 · 1 (mod 12) but
5 ≢ 1 (mod 12).
The next result shows that the law of cancellation holds if we impose a
condition on the integer c.

Theorem 1.4.2
Let a, b, c and n be integers. If ca ≡ cb (mod n) and (c, n) = 1, then
a ≡ b (mod n).

Proof
Recall from Corollary 1.3.2 that if (c, n) = 1 then there exist integers x
and y such that cx + ny = 1. Multiplying this identity by a and by b yields

acx + any = a

and
bcx + bny = b,

respectively. Subtracting, we find that a − b = (ac − bc)x + (a − b)ny.
Since ac ≡ bc (mod n), we conclude that

a − b ≡ (ac − bc)x ≡ 0 (mod n)

and hence,
a ≡ b (mod n).
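
The role of the condition (c, n) = 1 in Theorem 1.4.2 can be checked by brute force for small moduli. The sketch below is only an illustration under our own naming (congruent, cancellation_counterexamples); it lists the residue pairs for which cancellation fails.

```python
from math import gcd


def congruent(a, b, n):
    """True when a ≡ b (mod n), that is, when n divides a - b."""
    return (a - b) % n == 0


def cancellation_counterexamples(n, c):
    """Pairs (a, b), 0 <= a, b < n, with ca ≡ cb (mod n) but a ≢ b (mod n)."""
    return [
        (a, b)
        for a in range(n)
        for b in range(n)
        if congruent(c * a, c * b, n) and not congruent(a, b, n)
    ]


if __name__ == "__main__":
    # (3, 12) = 3, so cancellation by 3 can fail modulo 12.
    print((5, 1) in cancellation_counterexamples(12, 3))
    # (5, 12) = 1, so Theorem 1.4.2 predicts an empty list.
    print(cancellation_counterexamples(12, 5))
```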

Theorem 1.4.2 can be used to prove the following result of Euclid.

Corollary 1.4.3 (Euclid’s Lemma) Let a and b be integers and p be


a prime. If p|(ab), then p|a or p|b.

Proof
For any integer n, (n, p) = 1 or p since p has only two divisors. Suppose
p ∤ a. Then (p, a) = 1. By Theorem 1.4.2, the relation
ab ≡ 0 ≡ a · 0 (mod p)
then implies that
b ≡ 0 (mod p).

By induction, we have the following:

Corollary 1.4.4 Let $a_1, a_2, \ldots, a_m$ be integers and let p be a prime.
If $p \mid (a_1 a_2 \cdots a_m)$ then $p \mid a_k$ for some k.

1.5 Fundamental Theorem of Arithmetic

Theorem 1.5.1 (Fundamental Theorem of Arithmetic)


Every positive integer n > 1 can be expressed as a product of primes;
this representation is unique apart from the order in which the factors
occur.

Proof
We first show that n can be expressed as a prime or a product of primes.
We use induction on n. The statement is clearly true for n = 2 since 2 is
a prime. Suppose m is a prime or a product of primes for 2 ≤ m ≤ n− 1.
If n is a prime then we are done. Suppose n is composite. Then n = ab,
where 1 < a, b < n. By the induction hypothesis, each of a and b is either
a prime or a product of primes. Hence, n = ab is a product of primes. By
mathematical induction, every positive integer n > 1 is a prime or a
product of primes.
To prove uniqueness, we use induction on n again. If n = 2 then the
representation of n as a product of primes is clearly unique. Assume,
then that it is true for all integers greater than 1 and less than n. We
shall prove that it is also true for n. If n is prime, then there is nothing
to prove. Assume, then, that n is composite and that n has two
factorizations, say,
\[ n = p_1 p_2 \cdots p_s = q_1 q_2 \cdots q_t. \tag{1.5.1} \]
Since $p_1$ divides the product $q_1 q_2 \cdots q_t$, it must divide at least one factor
by Corollary 1.4.4. Relabel $q_1, q_2, \ldots, q_t$ so that $p_1 \mid q_1$. Then $p_1 = q_1$ since
both $p_1$ and $q_1$ are primes. In (1.5.1), we may cancel $p_1$ on both sides
to obtain
\[ n/p_1 = p_2 \cdots p_s = q_2 \cdots q_t. \]
Now the induction hypothesis implies that the two factorizations of
$n/p_1$ must be the same, apart from the order of the factors. Therefore,
s = t and the factorizations in (1.5.1) are also identical, apart from
order. This completes the proof.
In subsequent chapters, whenever we write
\[ n = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_r^{\alpha_r}, \]
we mean that $p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_r^{\alpha_r}$ is the prime power decomposition of n that
is unique up to rearrangement of factors. When we write
\[ n = \prod_{k=1}^{r} p_k^{\alpha_k}, \]
we mean that $\alpha_j \neq 0$, 1 ≤ j ≤ r. If we write
\[ n = \prod_{p} p^{\alpha_p}, \]
then we understand that only finitely many $\alpha_p$'s are nonzero.
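
The prime power decomposition can be computed for small n. The following Python sketch uses trial division, which is not the inductive argument of the proof of Theorem 1.5.1 but produces the same exponents; the function name prime_factorization and the dictionary representation {p: α_p} are our own choices.

```python
def prime_factorization(n):
    """Return {p: alpha_p} with n equal to the product of p**alpha_p, for n > 1."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    factors = {}
    p = 2
    while p * p <= n:
        # Divide out p as often as possible; p is prime whenever it divides n here.
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        # What remains has no divisor up to its square root, hence is prime.
        factors[n] = factors.get(n, 0) + 1
    return factors


if __name__ == "__main__":
    print(prime_factorization(360))
    print(prime_factorization(97))
```

Only primes with nonzero exponent appear in the returned dictionary, in line with the convention stated above.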


2
Arithmetical Functions

2.1 Arithmetical functions

Definition 2.1.1
A real or complex-valued function defined on the set of positive integers
is called an arithmetical function.

Example 2.1.1

1. The simplest arithmetical function is u(n) = 1 for all positive


integers n.

2. The function N (n) = n is also an arithmetical function.


3. We write
\[ \sum_{\ell \mid n} f(\ell) \]
to denote the sum of the values of f over divisors ℓ of n. Let
\[ d(n) = \sum_{\ell \mid n} 1 = \sum_{\ell \mid n} u(\ell). \]
The function d(n) is called the divisor function and it is the
number of divisors of n.

4. Let α be a non-negative integer. Define
\[ \sigma_\alpha(n) = \sum_{\ell \mid n} \ell^{\alpha}. \]
When α = 0, the function $\sigma_0(n)$ coincides with d(n). When
α = 1, we write σ(n) instead of $\sigma_1(n)$ and this gives the sum
of divisors of n. A positive integer n is called a perfect number
if the sum of all divisors of n less than n is equal to n. There is a
one to one correspondence between even perfect numbers and
the Mersenne primes (which are primes of the form $2^p - 1$, with
p a prime). More precisely, an even integer is perfect if and
only if it is of the form $2^{p-1}(2^p - 1)$ where $2^p - 1$ is a prime.
For example, the perfect numbers 6 and 28 correspond to the
Mersenne primes $3 = 2^2 - 1$ and $7 = 2^3 - 1$ respectively. Showing
that there are infinitely many even perfect numbers is therefore
equivalent to showing the infinitude of Mersenne primes. Both
statements, however, remain conjectures. We mention here that it
is not known whether any odd perfect number exists.
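
The functions $\sigma_\alpha$ and the definition of a perfect number are easy to experiment with. The sketch below is a direct, unoptimised transcription of the definitions; the names divisors, sigma and is_perfect are our own.

```python
def divisors(n):
    """All positive divisors of n, in increasing order."""
    return [l for l in range(1, n + 1) if n % l == 0]


def sigma(n, alpha=1):
    """Sum of l**alpha over the divisors l of n; alpha = 0 gives d(n)."""
    return sum(l ** alpha for l in divisors(n))


def is_perfect(n):
    """n is perfect when its divisors less than n sum to n, i.e. sigma(n) == 2n."""
    return sigma(n) == 2 * n


if __name__ == "__main__":
    print(sigma(12, 0), sigma(12))
    print([n for n in range(1, 500) if is_perfect(n)])
```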

Remark 2.1.1

Suppose $\ell_1, \ell_2, \ldots, \ell_k$ are the divisors of n, then
\[ \sum_{\ell \mid n} f(\ell) = f(\ell_1) + f(\ell_2) + \cdots + f(\ell_k). \]
Now, there is a one to one correspondence between $\ell_j$ and its conjugate
divisor $n/\ell_j$. And each of these conjugate divisors is also a divisor of n.
Hence,
\[ \sum_{\ell \mid n} f(\ell) = f(\ell_1) + \cdots + f(\ell_k) = f(n/\ell_1) + \cdots + f(n/\ell_k) = \sum_{\ell \mid n} f\left(\frac{n}{\ell}\right). \]
Therefore
\[ \sum_{\ell \mid n} f(\ell) = \sum_{\ell \mid n} f\left(\frac{n}{\ell}\right). \tag{2.1.1} \]
2.2 Multiplicative functions

Definition 2.2.1
An arithmetical function f is said to be multiplicative if

f (1) = 1

and
f (mn) = f (m)f (n) whenever (m, n) = 1.

Example 2.2.1

1. The functions u(n) and N (n) are clearly multiplicative.

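2. The function d(n) is multiplicative. If m and n are relatively
prime, we may let $m = \prod_{k=1}^{s} p_k^{\alpha_k}$ and $n = \prod_{j=1}^{t} q_j^{\beta_j}$, where the
primes $p_k$ and $q_j$ are all distinct. Then every divisor of mn is of the form
\[ \prod_{k} p_k^{\gamma_k} \prod_{j} q_j^{\delta_j}, \]
where $0 \le \gamma_k \le \alpha_k$ and $0 \le \delta_j \le \beta_j$. The total number of
such divisors is $\prod_k (\alpha_k + 1) \prod_j (\beta_j + 1)$. But this is precisely
d(m)d(n).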

If n > 1 is an integer written in the form
\[ n = \prod_{i=1}^{k} p_i^{\alpha_i} \]
and f is multiplicative, then
\[ f\left(\prod_{i=1}^{k} p_i^{\alpha_i}\right) = \prod_{i=1}^{k} f(p_i^{\alpha_i}). \]
This shows that if f is multiplicative, then its value at any positive
integer n is determined by its values at prime powers.
We now prove a simple but useful result for multiplicative functions.
Theorem 2.2.1
Let f be a multiplicative function. Then the function
\[ g(n) = \sum_{\ell \mid n} f(\ell) \]
is also multiplicative.

Proof
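It is immediate that g(1) = f(1) = 1. Let (m, n) = 1. Then observe that if
ℓ|mn, then we may write $\ell = \ell_1 \ell_2$ with $\ell_1 \mid m$ and $\ell_2 \mid n$ in exactly one
way since m and n are relatively prime. Moreover, $(\ell_1, \ell_2) = 1$, so that
$f(\ell) = f(\ell_1) f(\ell_2)$. Therefore
\[ g(mn) = \sum_{\ell \mid mn} f(\ell) = \sum_{\ell_1 \mid m} \sum_{\ell_2 \mid n} f(\ell_1) f(\ell_2) = g(m) g(n). \]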

Remark 2.2.1

1. Note that since u(n) is multiplicative, this implies immediately


that d(n) is multiplicative.
2. We can show one direction of the correspondence between perfect
numbers and Mersenne primes. Suppose $2^p - 1$ is a prime. We
claim that $n = 2^{p-1}(2^p - 1)$ is a perfect number. Note that σ(n)
is multiplicative by Theorem 2.2.1 since N (n) is multiplicative.
Since $2^{p-1}$ and $2^p - 1$ are relatively prime, we conclude that
\[ \sigma(n) = \sigma(2^{p-1})\,\sigma(2^p - 1). \]
But the divisors of $2^{p-1}$ are $2^d$, 0 ≤ d ≤ p − 1, and the divisors of
$2^p - 1$ are 1 and $2^p - 1$ since $2^p - 1$ is a prime. Hence
\[ \sigma(n) = (1 + 2 + \cdots + 2^{p-1})(1 + 2^p - 1) = (2^p - 1)2^p = 2n. \]
Thus the divisors of n less than n sum to σ(n) − n = n, so n is a
perfect number. We leave the converse as an exercise.
2.3 The Möbius function

Let us now introduce one of the most important arithmetical functions,


namely, the Möbius function µ(n).

Definition 2.3.1
Let µ(1) = 1. If $n = p_1^{\alpha_1} \cdots p_k^{\alpha_k}$, then define
\[ \mu(n) = \begin{cases} (-1)^k & \text{if } \alpha_1 = \alpha_2 = \cdots = \alpha_k = 1, \\ 0 & \text{otherwise.} \end{cases} \]
The function µ(n) is known as the Möbius function.
We observe that by the above definition, µ(n) = 0 if and only if n has
a square factor greater than 1.
Now, let
\[ m = \prod_{i=1}^{k} p_i^{\alpha_i} \quad \text{and} \quad n = \prod_{j=1}^{t} q_j^{\beta_j}. \]
If (m, n) = 1, then for $\alpha_i = \beta_j = 1$, 1 ≤ i ≤ k, 1 ≤ j ≤ t,
\[ \mu(mn) = (-1)^{k+t} = \mu(m)\mu(n). \]
If at least one of the elements in $\{\alpha_i, \beta_j \mid 1 \le i \le k, 1 \le j \le t\}$ is greater
than 1, then µ(mn) = 0 = µ(m)µ(n). Hence µ is multiplicative.

Theorem 2.3.1
Let n be any positive integer and [x] denote the integer part of a real
number x. We have
\[ \sum_{\ell \mid n} \mu(\ell) = \left[\frac{1}{n}\right] = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n > 1. \end{cases} \]

Proof
By Theorem 2.2.1, we know that $g(n) = \sum_{\ell \mid n} \mu(\ell)$ is multiplicative. In
other words, g(1) = 1 and
\[ g\Bigl(\prod_{p} p^{\alpha_p}\Bigr) = \prod_{p} g(p^{\alpha_p}). \]
But for $\alpha_p \ge 1$,
\[ g(p^{\alpha_p}) = \mu(1) + \mu(p) + 0 + \cdots + 0 = 1 - 1 = 0. \]
In other words, if n ≠ 1, then g(n) = 0.
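
Theorem 2.3.1 can be verified numerically for small n. The following Python sketch computes µ(n) by trial division, following Definition 2.3.1, and sums it over the divisors of n. The function names mobius and mobius_divisor_sum are our own.

```python
def mobius(n):
    """Möbius function: (-1)**k for a product of k distinct primes, 0 otherwise."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                # p**2 divides the original n, so mu vanishes.
                return 0
            result = -result
        p += 1
    if n > 1:
        # One remaining prime factor, occurring to the first power.
        result = -result
    return result


def mobius_divisor_sum(n):
    """Sum of mu(l) over all divisors l of n."""
    return sum(mobius(l) for l in range(1, n + 1) if n % l == 0)


if __name__ == "__main__":
    print([mobius(n) for n in range(1, 11)])
    print([mobius_divisor_sum(n) for n in range(1, 11)])
```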

2.4 The Euler totient function

Definition 2.4.1
The Euler totient ϕ(n) is defined to be the number of positive integers
not exceeding n which are relatively prime (see Definition 1.3.3) to n.

It is sometimes convenient to write ϕ(n) as
\[ \varphi(n) = \sum_{\substack{k=1 \\ (k,n)=1}}^{n} 1. \tag{2.4.1} \]

Theorem 2.4.1
Let n be any positive integer. Then
\[ \varphi(n) = \sum_{\ell \mid n} \mu(\ell)\,\frac{n}{\ell}. \]

Proof
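We first observe that, by Theorem 2.3.1, the function $I(j) = \sum_{\ell \mid j} \mu(\ell)$ satisfies
\[ I(j) = \begin{cases} 1 & \text{if } j = 1, \\ 0 & \text{if } j > 1. \end{cases} \]
Therefore, if g(k) is an arithmetical function, then
\[ \sum_{\substack{k=1 \\ (k,n)=1}}^{n} g(k) = \sum_{k=1}^{n} g(k)\, I((k, n)). \]
Setting g(k) = 1, we find that
\[ \varphi(n) = \sum_{\substack{k=1 \\ (k,n)=1}}^{n} 1 = \sum_{k=1}^{n} I((k, n)). \]
Now,
\begin{align*}
\varphi(n) &= \sum_{k=1}^{n} I((k, n)) = \sum_{k=1}^{n} \sum_{\ell \mid (k,n)} \mu(\ell) = \sum_{k=1}^{n} \sum_{\substack{\ell \mid k \\ \ell \mid n}} \mu(\ell) \\
&= \sum_{\ell \mid n} \mu(\ell) \sum_{q=1}^{n/\ell} 1 = \sum_{\ell \mid n} \mu(\ell)\,\frac{n}{\ell},
\end{align*}
where in the second line we interchanged the order of summation and wrote
k = ℓq with 1 ≤ q ≤ n/ℓ.
This completes the proof of the theorem.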

Theorem 2.4.2
Let n be any positive integer with prime factorization
\[ n = \prod_{j=1}^{k} p_j^{\alpha_j}. \]
Then
\[ \varphi(n) = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right). \]

Proof
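Since µ(n)/n is multiplicative, Theorem 2.2.1 shows that $\sum_{\ell \mid n} \mu(\ell)/\ell$ is
multiplicative. By Theorem 2.4.1, this sum equals ϕ(n)/n, and we
conclude that ϕ(n)/n is multiplicative. At a prime power $p^{\alpha}$ with α ≥ 1,
its value is µ(1) + µ(p)/p = 1 − 1/p. Therefore
\[ \frac{\varphi(n)}{n} = \prod_{p \mid n} \left(1 - \frac{1}{p}\right), \]
which completes the proof of the theorem.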
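
As a sanity check, Definition 2.4.1 and the product formula of Theorem 2.4.2 can be compared for small n. The sketch below is an illustration only; phi_by_count, prime_divisors and phi_by_product are names of our own choosing.

```python
from math import gcd


def phi_by_count(n):
    """Definition 2.4.1: count 1 <= k <= n with (k, n) = 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def prime_divisors(n):
    """Distinct prime divisors of n, found by trial division."""
    primes = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def phi_by_product(n):
    """Theorem 2.4.2: n times the product of (1 - 1/p) over primes p dividing n."""
    value = n
    for p in prime_divisors(n):
        # Multiply by (1 - 1/p) in exact integer arithmetic; p divides value here.
        value = value // p * (p - 1)
    return value


if __name__ == "__main__":
    print(phi_by_count(36), phi_by_product(36))
```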

Note that since N (n) = n and ϕ(n)/n are multiplicative, we deduce


that

Theorem 2.4.3
If m and n are positive integers such that (m, n) = 1, then ϕ(mn) =
ϕ(m)ϕ(n).

We next show the following result.


Theorem 2.4.4
If f is a multiplicative function, then
f (m)f (n) = f ((m, n))f ([m, n]),
where [m, n] denotes the least common multiple of m and n.

Proof
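Let
\[ m = \prod_{j} p_j^{\alpha_j} \prod_{k} q_k^{\beta_k} \quad \text{and} \quad n = \prod_{j} p_j^{\gamma_j} \prod_{\ell} r_\ell^{\delta_\ell}, \]
where $p_j$ are the prime divisors of (m, n). Note that
\[ f(m)f(n) = \prod_{j} f(p_j^{\alpha_j}) f(p_j^{\gamma_j}) \prod_{k} f(q_k^{\beta_k}) \prod_{\ell} f(r_\ell^{\delta_\ell}) \]
and
\[ f([m, n])f((m, n)) = \prod_{j} f\bigl(p_j^{\min(\alpha_j, \gamma_j)}\bigr) f\bigl(p_j^{\max(\alpha_j, \gamma_j)}\bigr) \prod_{k} f(q_k^{\beta_k}) \prod_{\ell} f(r_\ell^{\delta_\ell}). \]
The result follows from the fact that
\[ (\min(\alpha_j, \gamma_j), \max(\alpha_j, \gamma_j)) = (\alpha_j, \gamma_j) \text{ or } (\gamma_j, \alpha_j). \]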

Note that when f (n) = N (n), we recover the formula


mn = (m, n)[m, n]. (2.4.2)
We are now ready to prove a generalization of Theorem 2.4.3.

Corollary 2.4.5 If m and n are positive integers such that d = (m, n), then
\[ \varphi(mn) = \varphi(m)\varphi(n)\,\frac{d}{\varphi(d)}. \]

Proof
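Let f (n) = ϕ(n) in Theorem 2.4.4. Then
ϕ(m)ϕ(n) = ϕ((m, n))ϕ([m, n]).
Note that with the notation of m and n in the proof of Theorem 2.4.4,
we find that
\[ \varphi([m, n]) = [m, n] \prod_{j} \left(1 - \frac{1}{p_j}\right) \prod_{k} \left(1 - \frac{1}{q_k}\right) \prod_{\ell} \left(1 - \frac{1}{r_\ell}\right) = \frac{\varphi(mn)}{d}, \]
where we have used (2.4.2) and the fact that mn and [m, n] have the same
prime divisors. Hence ϕ(m)ϕ(n) = ϕ(d)ϕ(mn)/d, which completes the
proof of the corollary.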
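
Both Theorem 2.4.4 (with f = ϕ) and Corollary 2.4.5 can be tested on small pairs (m, n). The following sketch is illustrative; the helper names phi, lcm, check_theorem_2_4_4 and check_corollary_2_4_5 are hypothetical, and the identities are written without division so that only integer arithmetic is used.

```python
from math import gcd


def phi(n):
    """Euler totient by direct count (Definition 2.4.1)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def lcm(m, n):
    """Least common multiple [m, n], computed from mn = (m, n)[m, n]."""
    return m * n // gcd(m, n)


def check_theorem_2_4_4(m, n):
    """phi(m) phi(n) == phi((m, n)) phi([m, n])."""
    return phi(m) * phi(n) == phi(gcd(m, n)) * phi(lcm(m, n))


def check_corollary_2_4_5(m, n):
    """phi(mn) phi(d) == phi(m) phi(n) d, where d = (m, n)."""
    d = gcd(m, n)
    return phi(m * n) * phi(d) == phi(m) * phi(n) * d


if __name__ == "__main__":
    print(check_theorem_2_4_4(12, 18), check_corollary_2_4_5(12, 18))
```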

2.5 Dirichlet product of arithmetical functions and multiplicative functions

Definition 2.5.1
Let f and g be two arithmetical functions. We define the Dirichlet
product of f and g, denoted by f ∗ g, as
\[ (f * g)(n) = \sum_{\ell \mid n} f(\ell)\, g\left(\frac{n}{\ell}\right). \]

We will often use f ∗ g to represent the function (f ∗ g)(n), suppressing


the argument n.
Using the above notation, Theorem 2.4.1 can simply be written as

ϕ = µ ∗ N.
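
The Dirichlet product is straightforward to implement for functions on small integers. The sketch below, with our own names dirichlet_product, mobius, N and phi, builds f ∗ g as a new function and checks the identity ϕ = µ ∗ N on an initial range.

```python
from math import gcd


def dirichlet_product(f, g):
    """Return the function n -> sum over l | n of f(l) * g(n / l)."""
    def h(n):
        return sum(f(l) * g(n // l) for l in range(1, n + 1) if n % l == 0)
    return h


def mobius(n):
    """Möbius function by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def N(n):
    """The arithmetical function N(n) = n."""
    return n


def phi(n):
    """Euler totient by direct count."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


mu_star_N = dirichlet_product(mobius, N)

if __name__ == "__main__":
    print([mu_star_N(n) for n in range(1, 13)])
    print([phi(n) for n in range(1, 13)])
```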

Our aim now is to show that the set of multiplicative functions, which
we denote as M, together with the operation ∗ forms an abelian group.
We first note that ∗ is a binary operation on M. The proof of this fact
is similar to the proof of Theorem 2.2.1.

Theorem 2.5.1
Let f and g be multiplicative functions. Then f ∗ g is multiplicative.

Proof
Let h = f ∗ g. Note that

h(1) = f (1)g(1) = 1.

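Next, consider the expression
\[ h(mn) = \sum_{c \mid mn} f(c)\, g\left(\frac{mn}{c}\right). \]
Given that (m, n) = 1, we can write c = ab, where a|m and b|n. Therefore,
we deduce that
\begin{align*}
h(mn) &= \sum_{a \mid m} \sum_{b \mid n} f(ab)\, g\left(\frac{m}{a} \cdot \frac{n}{b}\right) \\
&= \sum_{a \mid m} \sum_{b \mid n} f(a) f(b)\, g\left(\frac{m}{a}\right) g\left(\frac{n}{b}\right),
\end{align*}
since (a, b) = 1, (m/a, n/b) = 1 and both f and g are multiplicative. This
implies that
\[ h(mn) = \sum_{a \mid m} f(a)\, g\left(\frac{m}{a}\right) \sum_{b \mid n} f(b)\, g\left(\frac{n}{b}\right) = h(m)h(n). \]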

The following result shows that ∗ is both a commutative and associa-


tive operation on M.

Theorem 2.5.2
The Dirichlet product is commutative and associative, that is, for any
arithmetical functions f, g, k, we have

f ∗ g = g ∗ f

and
(f ∗ g) ∗ k = f ∗ (g ∗ k).

Proof
The Dirichlet product of f and g is given by
\[ (f * g)(n) = \sum_{\ell \mid n} f(\ell)\, g\left(\frac{n}{\ell}\right). \]
Let $d_1 = n/\ell$ be the conjugate divisor of ℓ. As ℓ runs through all divisors
of n, so does $d_1$. By (2.1.1),
\[ (f * g)(n) = \sum_{d_1 \mid n} f\left(\frac{n}{d_1}\right) g(d_1) = (g * f)(n). \]
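To prove the associativity property, let A = g ∗ k. Then
\begin{align*}
(f * A)(n) &= \sum_{a \mid n} f(a)\, A\left(\frac{n}{a}\right) \\
&= \sum_{a \cdot d = n} f(a) \sum_{b \cdot c = d} g(b) k(c) \\
&= \sum_{a \cdot b \cdot c = n} f(a) g(b) k(c).
\end{align*}
Similarly, if we set B = f ∗ g, then
\begin{align*}
(B * k)(n) &= \sum_{d \cdot c = n} B(d) k(c) \\
&= \sum_{d \cdot c = n} \sum_{a \cdot b = d} f(a) g(b) k(c) \\
&= \sum_{a \cdot b \cdot c = n} f(a) g(b) k(c).
\end{align*}
Therefore,
(f ∗ (g ∗ k))(n) = ((f ∗ g) ∗ k)(n).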

Our next step is to establish an identity for (M, ∗).

Definition 2.5.2
Let n be any positive integer. The arithmetical function I is defined by
\[ I(n) = \left[\frac{1}{n}\right] = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n > 1. \end{cases} \]
The function I(n) appeared in Theorem 2.3.1 as µ ∗ u.

Theorem 2.5.3
The function I is the identity function for ∗, that is, I ∗ f = f ∗ I = f
for every arithmetical function f .

Proof
By the definition of I, only the term ℓ = 1 contributes to the sum and
we find that
\[ (I * f)(n) = \sum_{\ell \mid n} I(\ell)\, f\left(\frac{n}{\ell}\right) = f(n). \]
By the commutative law in Theorem 2.5.2, we conclude that
f ∗ I = f.

We now show that for any arithmetical function f (n) such that f (1) ≠
0 (not necessarily multiplicative), the inverse of f under ∗ exists.

Theorem 2.5.4
Let f be an arithmetical function. If f (1) ≠ 0, then there is a unique
function g such that
f ∗ g = I. (2.5.1)

Proof
We show by induction on m that (2.5.1) has a unique solution g(m). In
order for (2.5.1) to hold, the function g(n) must satisfy
f (1)g(1) = 1.
Since f (1) ≠ 0, we find that
\[ g(1) = \frac{1}{f(1)} \]
and g(1) is uniquely determined. Suppose m > 1 and assume the values
of g(k) have been determined for 1 ≤ k ≤ m − 1. From (2.5.1), we find
that
\[ f(1)g(m) + \sum_{\substack{\ell \mid m \\ \ell > 1}} f(\ell)\, g\left(\frac{m}{\ell}\right) = 0. \]
Therefore,
\[ g(m) = -\frac{1}{f(1)} \sum_{\substack{\ell \mid m \\ \ell > 1}} f(\ell)\, g\left(\frac{m}{\ell}\right) \]
and g(m) is uniquely determined. By mathematical induction, there is
a unique function g(n) such that
f ∗ g = I.
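
The recursion in the proof of Theorem 2.5.4 translates directly into code. The sketch below is a minimal illustration using exact rational arithmetic; the name dirichlet_inverse is our own, and the function f passed in is assumed to satisfy f(1) ≠ 0.

```python
from fractions import Fraction
from functools import lru_cache


def dirichlet_inverse(f):
    """Return g with f * g = I, built by the recursion of Theorem 2.5.4."""
    if f(1) == 0:
        raise ValueError("f(1) must be nonzero")

    @lru_cache(maxsize=None)
    def g(m):
        if m == 1:
            return Fraction(1) / f(1)
        # Sum over divisors l > 1 of m; each g(m // l) has a smaller argument.
        total = sum(f(l) * g(m // l) for l in range(2, m + 1) if m % l == 0)
        return -total / f(1)

    return g


def u(n):
    """The arithmetical function u(n) = 1."""
    return 1


if __name__ == "__main__":
    u_inverse = dirichlet_inverse(u)
    # By Theorem 2.3.1 these values should agree with the Möbius function.
    print([int(u_inverse(n)) for n in range(1, 11)])
```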
Definition 2.5.3
Let f be an arithmetical function such that f (1) ≠ 0. The unique
function g such that f ∗ g = I is called the Dirichlet inverse of f . The
notation for the Dirichlet inverse of f is $f^{-1}$.

Example 2.5.1
From Theorem 2.3.1 which states that I = u ∗ µ, we conclude that the
inverse of µ is u.
The above observation leads us to the following important result.

Theorem 2.5.5 (Möbius inversion formula)


If f = g ∗ u, then g = f ∗ µ. Conversely, g = f ∗ µ implies that f = g ∗ u.

Proof
Suppose f = g ∗ u. Then
f ∗ µ = (g ∗ u) ∗ µ = g ∗ (u ∗ µ) = g ∗ I = g.
Conversely, if g = f ∗ µ then
g ∗ u = (f ∗ µ) ∗ u = f ∗ (µ ∗ u) = f ∗ I = f.
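
The Möbius inversion formula can be illustrated numerically. In the following sketch the function g is an arbitrary choice of ours, made only for the demonstration; we form f = g ∗ u and recover g as f ∗ µ. All function names are hypothetical.

```python
def dirichlet_product(f, g):
    """Return the function n -> sum over l | n of f(l) * g(n / l)."""
    def h(n):
        return sum(f(l) * g(n // l) for l in range(1, n + 1) if n % l == 0)
    return h


def mobius(n):
    """Möbius function by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def u(n):
    return 1


def g(n):
    # Arbitrary test function (an assumption for this demo, not multiplicative).
    return n * n + 1


f = dirichlet_product(g, u)               # f = g * u
recovered = dirichlet_product(f, mobius)  # f * mu, which should equal g

if __name__ == "__main__":
    print([g(n) for n in range(1, 9)])
    print([recovered(n) for n in range(1, 9)])
```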

We can now show the following identity that relates N (n) to ϕ(n).

Theorem 2.5.6
Let n be any positive integer. Then
\[ \sum_{\ell \mid n} \varphi(\ell) = n. \]

Proof
We have seen from Theorem 2.4.1 that
ϕ = µ ∗ N.
By Möbius inversion formula, we conclude that
N = u ∗ ϕ.
From the construction of $f^{-1}$ in Theorem 2.5.4, it is not clear that
the Dirichlet inverse of a multiplicative function f is multiplicative. To
complete the proof that (M, ∗) forms an abelian group, it suffices to show
that $f^{-1}$ is multiplicative if f is multiplicative.

Theorem 2.5.7
If both g and f ∗ g are multiplicative, then f is also multiplicative.

Proof
We prove the theorem by contradiction. Suppose f is not multiplicative.
Let
h = f ∗ g.

Since f is not multiplicative, there exist two relatively prime integers m
and n such that
f (mn) ≠ f (m)f (n).

We choose such a pair with mn as small as possible. If mn = 1, then

f (1) ≠ f (1)f (1)

and f (1) ≠ 1. Since h(1) = f (1)g(1) = f (1) ≠ 1, we conclude that h is
not multiplicative, which leads to a contradiction. Hence, mn ≠ 1.
If mn > 1, then
f (ab) = f (a)f (b)

for all 1 ≤ ab < mn and (a, b) = 1. Now,
\begin{align*}
h(mn) &= f(mn)g(1) + \sum_{\substack{a \mid m,\ b \mid n \\ 1 < ab \le mn}} f\left(\frac{mn}{ab}\right) g(ab) \\
&= f(mn) + \sum_{\substack{a \mid m,\ b \mid n \\ 1 < ab \le mn}} f\left(\frac{m}{a}\right) f\left(\frac{n}{b}\right) g(a) g(b). \tag{2.5.2}
\end{align*}
Note that the difference between
\[ \sum_{\substack{a \mid m,\ b \mid n \\ 1 < ab \le mn}} f\left(\frac{m}{a}\right) f\left(\frac{n}{b}\right) g(a) g(b) \quad \text{and} \quad \sum_{\substack{a \mid m \\ b \mid n}} f\left(\frac{m}{a}\right) f\left(\frac{n}{b}\right) g(a) g(b) = h(m)h(n) \]
is the term when a and b are both 1, namely f (m)f (n). In other words,
we find from (2.5.2) that
h(mn) − h(m)h(n) = f (mn) − f (m)f (n).
Since f (mn) ≠ f (m)f (n), we find that h(mn) ≠ h(m)h(n). This implies
that h is not multiplicative, which contradicts our assumption that h is
multiplicative.

Theorem 2.5.8
If g is multiplicative, then the Dirichlet inverse $g^{-1}$ is also multiplicative.

Proof
The functions g and $g * g^{-1} = I$ are multiplicative. By Theorem 2.5.7,
$g^{-1}$ is multiplicative.
3
Averages of Arithmetical Functions

3.1 Introduction

Let x be a positive real number. We use the notation
\[ \sum_{n \le x} f(n) \]
to denote the sum
f (1) + f (2) + · · · + f ([x]).
For a positive real number x, the mean of the function f from 1 to x is
defined by
\[ \bar{f}(x) = \frac{1}{x} \sum_{n \le x} f(n). \]
The reason for studying $\bar{f}(x)$ is that, in general, $\bar{f}(x)$ behaves more
regularly than f ([x]), especially when x is large. For example, let f
be the characteristic function for primes, namely,
\[ f(n) = \begin{cases} 1 & \text{if } n \text{ is a prime,} \\ 0 & \text{otherwise.} \end{cases} \]
The function
\[ \sum_{n \le x} f(n) \]
is usually written as π(x) and the Prime Number Theorem states that
$\bar{f}(x) = \pi(x)/x$ “behaves” like 1/ln x. On the other hand, we cannot
predict the value of f (n) for each n = [x] since we do not know the
location of primes in Z.
We now introduce the “big-O” notation and the notion of asymptotic.
We now introduce the “big-O” notation and the notion of asymptotic.

Definition 3.1.1
Let a be any real number and let g(x) be a real-valued function such
that g(x) > 0 when x ≥ a. We write

f (x) = O(g(x))

to mean that the quotient f (x)/g(x) is bounded for x ≥ a; that is, there
exists a constant M > 0 such that

|f (x)| ≤ M g(x) for all x ≥ a.

Sometimes, we will also use the notation

f (x) ≪ g(x)

to represent f (x) = O(g(x)).

Example 3.1.1
The function $x^2 = O(x^3)$ when x is large. The function $x^n = O(e^x)$ for
any positive integer n.

Definition 3.1.2
If
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = 1, \]
then we say that f (x) is asymptotic to g(x) as x → ∞, and we write

f (x) ∼ g(x) as x → ∞.

Example 3.1.2
The Prime Number Theorem can be written as
\[ \pi(x) \sim \frac{x}{\ln(x)}. \]
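
To get a feeling for the asymptotic relation, one can tabulate the ratio π(x) ln(x)/x for a few values of x. The sketch below counts primes with a sieve of Eratosthenes, a standard method that is our own choice and is not discussed in these notes; the function name prime_count is hypothetical.

```python
from math import log


def prime_count(x):
    """pi(x): the number of primes not exceeding x, via a sieve of Eratosthenes."""
    n = int(x)
    if n < 2:
        return 0
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Cross out the multiples p*p, p*p + p, ... of the prime p.
            is_prime[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
        p += 1
    return sum(is_prime)


if __name__ == "__main__":
    for x in (10**2, 10**3, 10**4, 10**5):
        print(x, prime_count(x), prime_count(x) * log(x) / x)
```

In this range the printed ratios stay above 1 and approach it only slowly, which is consistent with, but of course does not prove, the Prime Number Theorem.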
3.2 Partial summation and the Euler-Maclaurin summation formula

Theorem 3.2.1
Let a(n) be an arithmetical function and set
\[ A(x) = \sum_{n \le x} a(n). \]
Let 0 ≤ y < x be real numbers and f be a real-valued function with
continuous derivative on [y, x]. Then
\[ \sum_{y < n \le x} a(n) f(n) = f(x)A(x) - f(y)A(y) - \int_{y}^{x} A(t) f'(t)\,dt. \tag{3.2.1} \]

Proof
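We observe that
\begin{align*}
\int_{y}^{x} A(t) f'(t)\,dt &= \int_{y}^{x} \sum_{n \le t} a(n) f'(t)\,dt \tag{3.2.2} \\
&= \sum_{n \le x} a(n) \int_{\max(y,n)}^{x} f'(t)\,dt \\
&= \sum_{n \le x} a(n) \bigl[f(x) - f(\max(y, n))\bigr].
\end{align*}
Therefore,
\begin{align*}
\int_{y}^{x} A(t) f'(t)\,dt &= f(x)A(x) - \sum_{n \le y} a(n) f(y) - \sum_{y < n \le x} a(n) f(n) \\
&= f(x)A(x) - f(y)A(y) - \sum_{y < n \le x} a(n) f(n).
\end{align*}
Simplifying, we find that
\[ \sum_{y < n \le x} a(n) f(n) = A(x)f(x) - A(y)f(y) - \int_{y}^{x} A(t) f'(t)\,dt. \]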
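
Theorem 3.2.1 can be checked numerically. Since A(t) is a step function, the integral of A(t)f′(t) over [y, x] splits into pieces on which A is constant, and on each piece the integral of f′ equals a difference of values of f. The sketch below uses this observation, so no numerical integration is needed; all names (partial_summation_lhs, partial_summation_rhs) and the test data are our own assumptions.

```python
import math


def partial_summation_lhs(a, f, y, x):
    """Sum of a(n) f(n) over integers n with y < n <= x."""
    return sum(a(n) * f(n) for n in range(math.floor(y) + 1, math.floor(x) + 1))


def partial_summation_rhs(a, f, y, x):
    """f(x)A(x) - f(y)A(y) - integral of A(t) f'(t) dt over [y, x]."""
    def A(t):
        return sum(a(n) for n in range(1, math.floor(t) + 1))

    # Break [y, x] at the integers in (y, x]; A is constant on each [left, right).
    points = [y] + list(range(math.floor(y) + 1, math.floor(x) + 1)) + [x]
    integral = 0.0
    for left, right in zip(points, points[1:]):
        integral += A(left) * (f(right) - f(left))
    return f(x) * A(x) - f(y) * A(y) - integral


if __name__ == "__main__":
    a = lambda n: n % 3  # arbitrary arithmetical function for the demo
    print(partial_summation_lhs(a, math.log, 1.5, 20.7))
    print(partial_summation_rhs(a, math.log, 1.5, 20.7))
```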
Remark 3.2.1
The second equality of (3.2.2) follows from interchanging the integral
with the summation. We now explain the limits in the integral using
Figure 3.1. Note that for a fixed t, the sum is over all n ≤ t (consider
the vertical line). For a fixed n, we integrate from y to x if n < y and
from n to x if n ≥ y (consider the two horizontal lines in the shaded
region). Hence for a fixed n, we integrate from max(n, y) to x.

Fig. 3.1. Diagram for explaining the limits in the integral of (3.2.2)

When y = 1, we have
\[ \sum_{1 < n \le x} a(n) f(n) = A(x)f(x) - A(1)f(1) - \int_{1}^{x} A(t) f'(t)\,dt. \]
But
\[ \sum_{1 < n \le x} a(n) f(n) + A(1)f(1) = \sum_{1 < n \le x} a(n) f(n) + a(1)f(1) = \sum_{n \le x} a(n) f(n). \]
Consequently, we have
\[ \sum_{n \le x} a(n) f(n) = A(x)f(x) - \int_{1}^{x} A(t) f'(t)\,dt. \tag{3.2.3} \]

We now deduce the Euler-Maclaurin summation from Theorem 3.2.1.

Theorem 3.2.2 (The Euler-Maclaurin summation formula)

Let 0 < y < x and let f be a real-valued function with continuous
derivative on [y, x]. Then
\[ \sum_{y < n \le x} f(n) = \int_{y}^{x} f(t)\,dt + \int_{y}^{x} \{t\} f'(t)\,dt - f(x)\{x\} + f(y)\{y\}, \tag{3.2.4} \]
where {t} = t − [t] denotes the fractional part of t.

Proof
By the partial summation formula with a(n) = 1 and A(x) = [x], we find
that
\begin{align*}
\sum_{y < n \le x} f(n) &= f(x)[x] - f(y)[y] - \int_{y}^{x} [t] f'(t)\,dt \\
&= f(x)x - \{x\}f(x) + f(y)\{y\} - f(y)y - \int_{y}^{x} (t - \{t\}) f'(t)\,dt \\
&= -\{x\}f(x) + f(y)\{y\} + \int_{y}^{x} \{t\} f'(t)\,dt + f(x)x - f(y)y - \int_{y}^{x} t f'(t)\,dt \\
&= -f(x)\{x\} + f(y)\{y\} + \int_{y}^{x} \{t\} f'(t)\,dt + \int_{y}^{x} f(t)\,dt,
\end{align*}
where the last step follows from integration by parts.

We can derive the analogue of (3.2.3) from (3.2.4) by taking y = 1. We
observe that {1}f (1) = 0 and by adding f (1) to both sides of (3.2.4), we
deduce that
\[ \sum_{n \le x} f(n) = f(1) - f(x)\{x\} + \int_{1}^{x} \{t\} f'(t)\,dt + \int_{1}^{x} f(t)\,dt. \tag{3.2.5} \]

3.3 Some facts about Riemann-Stieltjes integrals


In this section, we give another proof of Theorem 3.2.1. Our main
reference for this section is Mathematical Analysis (second edition) by
T.M. Apostol (which will be referred to as [MA-Apostol]).
Definition 3.3.1
If [a, b] is a compact interval, a set of points
\[ P = \{a = x_0, x_1, x_2, \ldots, x_{n-1}, x_n = b\} \]
satisfying the inequalities
\[ a = x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n = b, \]
is called a partition of [a, b]. The collection of all possible partitions of
[a, b] will be denoted by P[a, b].

Definition 3.3.2
Let f be defined on [a, b] and let $P = \{a = x_0, x_1, x_2, \ldots, x_{n-1}, x_n = b\}$
be a partition of [a, b]. If there exists a positive number M such that
\[ \sum_{k=1}^{n} |f(x_k) - f(x_{k-1})| \le M \]
for all partitions P ∈ P[a, b], then f is said to be of bounded variation
on [a, b].

Definition 3.3.3
A partition P′ is said to be finer than P if P ⊂ P′.

Definition 3.3.4
Let $P = \{a = x_0, x_1, x_2, \ldots, x_{n-1}, x_n = b\}$ be a partition of [a, b] and
let $t_k$ be a point in the subinterval $[x_{k-1}, x_k]$. A sum of the form
\[ S(P, f, \alpha) = \sum_{k=1}^{n} f(t_k)\bigl(\alpha(x_k) - \alpha(x_{k-1})\bigr) \]
is called a Riemann-Stieltjes sum of f with respect to α. We say that f is
Riemann-integrable with respect to α on [a, b], and we write “f ∈ R(α)
on [a, b]” if there exists a number A having the property: For every
ε > 0, there exists a partition $P_\varepsilon$ of [a, b] such that for every partition P
finer than $P_\varepsilon$ and for every choice of the points $t_k \in [x_{k-1}, x_k]$, we have

|S(P, f, α) − A| < ε.

The number A, if it exists, is called the Riemann-Stieltjes integral of f
with respect to α on [a, b]. The number A is denoted by
\[ \int_{a}^{b} f\,d\alpha \quad \text{or} \quad \int_{a}^{b} f(t)\,d\alpha(t). \]

We have the following facts:

Theorem 3.3.1
If f is continuous on [a, b] and if α is of bounded variation on [a, b], then
f ∈ R(α) on [a, b].

Theorem 3.3.2
If f ∈ R(α) on [a, b], then α ∈ R(f ) on [a, b] and we have
\[ \int_a^b f(x)\,d\alpha(x) + \int_a^b \alpha(x)\,df(x) = f(b)\alpha(b) - f(a)\alpha(a). \]

For the proofs of Theorems 3.3.1 and 3.3.2, see [MA-Apostol, p.159,
Theorem 7.27] and [MA-Apostol, p.144, Theorem 7.6] respectively.
We now give another proof of Theorem 3.2.1.

Proof [Second proof of Theorem 3.2.1]


We apply the definitions above on the interval [y, x] with α = A, where
A(t) = Σ_{n≤t} a(n) is the step function that jumps by a(n) at each integer n.
Note that if P is chosen so that each subinterval [xk−1 , xk ] contains at most
one integer, then
\[ S(P, f, A) = \sum_{k=1}^{n} f(t_k)\bigl(A(x_k) - A(x_{k-1})\bigr)
 = f(s_{[y+1]})\,a([y+1]) + \cdots + f(s_{[x]})\,a([x]), \]
where sj = tk for the index k for which [xk−1 , xk ] contains the integer j.
Note that as the length of each interval of the partition P tends to 0,
tk → j. Hence, we conclude that
\[ \int_y^x f(t)\,dA(t) = \sum_{y<n\le x} f(n)a(n). \tag{3.3.1} \]
By Theorem 3.3.2, we find that
\[ \int_y^x f(t)\,dA(t) + \int_y^x A(t)\,df(t) = f(x)A(x) - f(y)A(y), \]
and this is precisely Theorem 3.2.1 by (3.3.1).


Note that Theorem 3.2.2 can also be proved directly using
Theorem 3.3.2 by setting a(n) = 1 for all positive integers n.

3.4 Some elementary asymptotic formulas

Definition 3.4.1
For each real number s > 1, we define the Riemann zeta function as
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}. \]

Definition 3.4.2
The Euler constant C is defined as
\[ C = \lim_{n\to\infty}\left(1 + \frac12 + \frac13 + \cdots + \frac1n - \ln n\right). \]

Theorem 3.4.1
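If x ≥ 1, then

(a) \[ \sum_{n\le x}\frac1n = \ln x + C + O\left(\frac1x\right), \]

(b) \[ \sum_{n\le x}\frac{1}{n^s} = \frac{x^{1-s}}{1-s} + C(s) + O(x^{-s}) \quad\text{if } s>0 \text{ and } s\ne 1, \]
where
\[ C(s) = \begin{cases} \zeta(s) & \text{if } s>1,\\[4pt]
\displaystyle\lim_{x\to\infty}\left(\sum_{n\le x}\frac{1}{n^s} - \frac{x^{1-s}}{1-s}\right) & \text{if } 0<s<1, \end{cases} \]

(c) \[ \sum_{n>x}\frac{1}{n^s} = O(x^{1-s}) \quad\text{if } s>1, \text{ and} \]

(d) \[ \sum_{n\le x} n^{\alpha} = \frac{x^{\alpha+1}}{\alpha+1} + O(x^{\alpha}) \quad\text{if } \alpha\ge 0. \]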
Proof
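To prove (a), we first let f (t) = 1/t in Theorem 3.2.2. Then by (3.2.5),
\begin{align*}
\sum_{n\le x}\frac1n &= \int_1^x\frac{dt}{t} - \int_1^x\frac{\{t\}}{t^2}\,dt + 1 - \frac{\{x\}}{x}\\
&= \ln x - \int_1^x\frac{\{t\}}{t^2}\,dt + 1 + O\left(\frac1x\right)\\
&= \ln x + 1 - \int_1^\infty\frac{\{t\}}{t^2}\,dt + \int_x^\infty\frac{\{t\}}{t^2}\,dt + O\left(\frac1x\right).
\end{align*}
The improper integral
\[ \int_1^\infty \{t\}\,t^{-2}\,dt \]
exists since it is dominated by
\[ \int_1^\infty t^{-2}\,dt. \]
Furthermore,
\[ 0 \le \int_x^\infty\frac{\{t\}}{t^2}\,dt \le \int_x^\infty\frac{1}{t^2}\,dt = \frac1x, \]
so the last equation becomes
\[ \sum_{n\le x}\frac1n = \ln x + 1 - \int_1^\infty\frac{\{t\}}{t^2}\,dt + O\left(\frac1x\right). \]
This proves (a) with
\[ C = 1 - \int_1^\infty\frac{\{t\}}{t^2}\,dt = \lim_{x\to\infty}\left(\sum_{n\le x}\frac1n - \ln x\right). \]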

To prove (b), we use the same argument with
\[ f(x) = x^{-s}, \]
where s > 0, s ≠ 1. The Euler-Maclaurin summation formula implies that
\[ \sum_{n\le x}\frac{1}{n^s} = \frac{x^{1-s}}{1-s} - \frac{1}{1-s} + 1 - s\int_1^\infty\frac{\{t\}}{t^{s+1}}\,dt + O(x^{-s}). \]
Therefore,
\[ \sum_{n\le x}\frac{1}{n^s} = \frac{x^{1-s}}{1-s} + C(s) + O(x^{-s}), \tag{3.4.1} \]
where
\[ C(s) = 1 - \frac{1}{1-s} - s\int_1^\infty\frac{\{t\}}{t^{s+1}}\,dt. \]
If s > 1, then the left-hand side of (3.4.1) approaches ζ(s) as x approaches
∞ and both x^{1−s} and x^{−s} approach 0. Hence
\[ C(s) = \zeta(s) \]
if s > 1. If 0 < s < 1, then
\[ \lim_{x\to\infty}\frac{1}{x^s} = 0 \]
and (3.4.1) shows that
\[ C(s) = \lim_{x\to\infty}\left(\sum_{n\le x}\frac{1}{n^s} - \frac{x^{1-s}}{1-s}\right), \]
and this completes the proof of (b).


To prove (c), we use (b) with s > 1 to obtain
\[ \sum_{n>x}\frac{1}{n^s} = \zeta(s) - \sum_{n\le x}\frac{1}{n^s}
 = \zeta(s) + \frac{x^{1-s}}{s-1} - C(s) + O(x^{-s}) = O(x^{1-s}), \]
since C(s) = ζ(s) for s > 1 and x^{−s} ≤ x^{1−s} for x ≥ 1.


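Finally, to prove (d), we use the Euler-Maclaurin summation formula
with f (t) = t^α to obtain
\begin{align*}
\sum_{n\le x} n^{\alpha} &= \int_1^x t^{\alpha}\,dt + \alpha\int_1^x t^{\alpha-1}\{t\}\,dt + 1 - \{x\}x^{\alpha}\\
&= \frac{x^{\alpha+1}}{\alpha+1} - \frac{1}{\alpha+1} + O\left(\alpha\int_1^x t^{\alpha-1}\,dt\right) + O(x^{\alpha})\\
&= \frac{x^{\alpha+1}}{\alpha+1} + O(x^{\alpha}).
\end{align*}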
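The estimates (a) and (d) of Theorem 3.4.1 can be checked numerically. The following is a minimal sketch, not part of the original notes; the function names are illustrative, and the value of Euler's constant is hard-coded to double precision as an assumption. Each function returns the error term divided by the size predicted by the theorem, so the output should stay bounded as x grows.

```python
import math

# Euler's constant C (Definition 3.4.2), hard-coded to double precision.
EULER_C = 0.5772156649015329

def harmonic_error(x: int) -> float:
    """Return x * |sum_{n<=x} 1/n - ln x - C|; part (a) says this is bounded."""
    partial = math.fsum(1.0 / n for n in range(1, x + 1))
    return x * abs(partial - math.log(x) - EULER_C)

def power_sum_error(x: int, alpha: float) -> float:
    """Return |sum_{n<=x} n^alpha - x^(alpha+1)/(alpha+1)| / x^alpha; part (d) says this is bounded."""
    partial = math.fsum(n ** alpha for n in range(1, x + 1))
    return abs(partial - x ** (alpha + 1) / (alpha + 1)) / x ** alpha

if __name__ == "__main__":
    for x in (10, 100, 1000, 10000):
        print(x, harmonic_error(x), power_sum_error(x, 0.5), power_sum_error(x, 2.0))
```

In both cases the printed ratios settle near 1/2, which is consistent with the O-terms in the theorem.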

3.5 The divisor function and Dirichlet’s hyperbola method


In this section, we will first discuss the hyperbola method and then apply
the method to study the mean value of the divisor function d(n).
Theorem 3.5.1
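Let f and g be two arithmetic functions with
\[ F(x) = \sum_{n\le x} f(n) \quad\text{and}\quad G(x) = \sum_{n\le x} g(n). \]
For 1 ≤ y ≤ x, we have
\[ \sum_{n\le x}(f*g)(n) = \sum_{n\le y} g(n)F\left(\frac{x}{n}\right) + \sum_{m\le x/y} f(m)G\left(\frac{x}{m}\right) - F\left(\frac{x}{y}\right)G(y). \]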

Proof
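First, we observe that
\[ \sum_{n\le x}(f*g)(n) = \sum_{md\le x} f(m)g(d). \]
Next, for y ≤ x, we find that
\begin{align*}
\sum_{md\le x} f(m)g(d) &= \sum_{\substack{md\le x\\ d\le y}} f(m)g(d) + \sum_{\substack{md\le x\\ d>y}} f(m)g(d)\\
&= \sum_{d\le y} g(d)F\left(\frac{x}{d}\right) + \sum_{m\le x/y} f(m)\left\{G\left(\frac{x}{m}\right) - G(y)\right\}.
\end{align*}
Since Σ_{m≤x/y} f (m)G(y) = F (x/y)G(y), this is the identity stated in the theorem.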

We now set f = g = u. Then
\[ f*g = u*u = d. \]
Note that F (x) = [x] = G(x). Let y = √x. Then by Theorem 3.5.1,
\begin{align*}
\sum_{n\le x} d(n) &= 2\sum_{n\le\sqrt{x}}\left[\frac{x}{n}\right] - [\sqrt{x}]^2\\
&= 2x\sum_{n\le\sqrt{x}}\frac{1}{n} - x + O(\sqrt{x}).
\end{align*}

Using Theorem 3.4.1 (a), we conclude that

Theorem 3.5.2
For all x ≥ 1,
\[ \sum_{n\le x} d(n) = x\ln x + (2\gamma - 1)x + O(\sqrt{x}), \]
where γ is Euler's constant (the constant C of Definition 3.4.2).
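The hyperbola identity used above also gives a fast way to compute the summatory divisor function. The sketch below is illustrative and not from the notes; the function names are assumptions. It evaluates 2 Σ_{n≤√x} [x/n] − [√x]² for integer x, compares it with the direct count Σ_{n≤x} [x/n], and measures the distance to the main term of Theorem 3.5.2.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, written C or gamma in the text

def divisor_sum_hyperbola(x: int) -> int:
    """Sum of d(n) for n <= x via 2 * sum_{n <= sqrt(x)} [x/n] - [sqrt(x)]^2."""
    r = math.isqrt(x)  # [sqrt(x)] for integer x
    return 2 * sum(x // n for n in range(1, r + 1)) - r * r

def divisor_sum_direct(x: int) -> int:
    """Same sum as sum_{n <= x} [x/n], i.e. the number of pairs (m, d) with m*d <= x."""
    return sum(x // n for n in range(1, x + 1))

def main_term(x: int) -> float:
    """x ln x + (2*gamma - 1) x, the main term of Theorem 3.5.2."""
    return x * math.log(x) + (2 * EULER_GAMMA - 1) * x

if __name__ == "__main__":
    for x in (10**2, 10**4, 10**6):
        s = divisor_sum_hyperbola(x)
        print(x, s, abs(s - main_term(x)) / math.sqrt(x))
```

The hyperbola version needs about √x divisions instead of x, which is the same saving that produces the O(√x) error term.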

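As a corollary, we deduce that the average value of d(n) over n ≤ x is
asymptotic to ln x:
\[ \frac{1}{x}\sum_{n\le x} d(n) \sim \ln x. \tag{3.5.1} \]
The asymptotic (3.5.1) can be shown without using Dirichlet's
hyperbola method. However, the error term obtained for Σ_{n≤x} d(n) would
be O(x) instead of O(√x).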

Remark 3.5.1
The error term in Theorem 3.5.2 can be improved. In 1903, Voronoi
proved that it is O(x^{1/3} log x). In 1928, J.G. van der Corput improved the
error term to O(x^{27/82}) using the method of exponential sums. In 1988,
H. Iwaniec and C.J. Mozzochi showed that the error term can be taken
as O(x^{7/22}). The sharpest of these bounds is due to M.N. Huxley, who
showed in 2003 that the error is O(x^{131/416} (ln x)^{26947/8320}).

3.6 An application of the hyperbola method


An interesting question one can ask is:
“If two positive integers are randomly chosen, what is the probability
that they are relatively prime?”
To answer this question, we first show the following result:

Theorem 3.6.1
Let ϕ(n) be the Euler ϕ function. For x > 1,
\[ \sum_{n\le x}\varphi(n) = \frac{3}{\pi^2}x^2 + O(x^{3/2}). \]

Proof
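We recall that ϕ = µ ∗ N . Applying Theorem 3.5.1 with f = N and
g = µ, we find that
\[ \sum_{n\le x}\varphi(n) = \sum_{n\le x}(\mu*N)(n) = \sum_{n\le y}\mu(n)F\left(\frac{x}{n}\right) + \sum_{m\le x/y} N(m)G\left(\frac{x}{m}\right) - F\left(\frac{x}{y}\right)G(y), \]
where
\[ F(x) = \sum_{n\le x} N(n) = \frac{x^2}{2} + O(x) \]
and
\[ G(x) = \sum_{n\le x}\mu(n) = O(x). \]
Therefore,
\[ \sum_{n\le x}\varphi(n) = \sum_{n\le y}\mu(n)\left\{\frac12\left(\frac{x}{n}\right)^2 + O\left(\frac{x}{n}\right)\right\}
 + O\Bigl(\sum_{m\le x/y} m\,\frac{x}{m}\Bigr) + O\Bigl(\left(\frac{x}{y}\right)^2 y\Bigr). \]
Let y = √x and we conclude that
\[ \sum_{n\le x}\varphi(n) = \frac{x^2}{2}\sum_{n\le\sqrt{x}}\frac{\mu(n)}{n^2} + O(x^{3/2}). \tag{3.6.1} \]
We will show in the chapter on Dirichlet series that
\[ \zeta(2)\sum_{n=1}^{\infty}\frac{\mu(n)}{n^2} = 1. \]
This implies that
\[ \sum_{n=1}^{\infty}\frac{\mu(n)}{n^2} = \frac{6}{\pi^2}, \]
since
\[ \zeta(2) = \frac{\pi^2}{6}. \]
Given the above identity, we find that
\begin{align*}
\sum_{n\le\sqrt{x}}\frac{\mu(n)}{n^2} &= \sum_{n=1}^{\infty}\frac{\mu(n)}{n^2} - \sum_{n>\sqrt{x}}\frac{\mu(n)}{n^2}\\
&= \sum_{n=1}^{\infty}\frac{\mu(n)}{n^2} + O\Bigl(\sum_{n>\sqrt{x}}\frac{1}{n^2}\Bigr)\\
&= \frac{6}{\pi^2} + O(x^{-1/2}),
\end{align*}
where the last equality follows from Theorem 3.4.1 (c). Substituting the
above into (3.6.1), we conclude the proof of the theorem.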
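Now, let T be a positive integer and
\[ S = \{(m, n) \mid 1\le m\le T,\ 1\le n\le T\}. \]
Then the total number of elements of S such that (m, n) = 1 is given by
\begin{align*}
\sum_{n\le T}\sum_{\substack{m\le T\\ (m,n)=1}} 1 &= 1 + 2\sum_{m\le T}\sum_{\substack{n<m\\ (m,n)=1}} 1\\
&= 1 + 2\sum_{2\le m\le T}\varphi(m) = \frac{6}{\pi^2}T^2 + O(T^{3/2}).
\end{align*}
This shows that the probability that two randomly chosen positive integers
are relatively prime is
\[ \lim_{T\to\infty}\frac{|\{(m,n)\in S : (m,n)=1\}|}{T^2} = \frac{6}{\pi^2}. \]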
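The limit above can be observed by brute force. The following sketch is an illustration only (the function name is an assumption); it counts coprime pairs in the square 1 ≤ m, n ≤ T directly with `math.gcd` and divides by T².

```python
import math

def coprime_pair_density(T: int) -> float:
    """Fraction of pairs (m, n) with 1 <= m, n <= T and gcd(m, n) = 1."""
    count = sum(
        1
        for m in range(1, T + 1)
        for n in range(1, T + 1)
        if math.gcd(m, n) == 1
    )
    return count / T**2

if __name__ == "__main__":
    for T in (10, 100, 400):
        print(T, coprime_pair_density(T), 6 / math.pi**2)
```

The direct count costs T² gcd computations; summing ϕ(m) as in the derivation above would give the same number with far less work.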
4
Elementary Results on the Distribution of
Primes

4.1 Introduction

Definition 4.1.1
For real number x > 0, let π(x) denote the number of primes not
exceeding x.

The behavior of π(x) as a function of x has been studied by many
mathematicians ever since the eighteenth century. Inspection of tables
of primes led C.F. Gauss (1792) and A.M. Legendre (1798) to conjecture
that
\[ \pi(x) \sim \frac{x}{\ln x}. \tag{4.1.1} \]
This conjecture was first proved independently by J. Hadamard and
de la Vallée Poussin in 1896 and is now known as the Prime Number
Theorem. We record the theorem as follows:

Theorem 4.1.1 (Prime Number Theorem)


Let x be a positive real number and let π(x) be the number of primes not
exceeding x. Then
\[ \pi(x) \sim \frac{x}{\ln x}. \]

Proofs of the Prime Number Theorem are often classified as elementary
or analytic. The proofs of J. Hadamard and de la Vallée Poussin are
analytic, using complex function theory and properties of the Riemann
zeta function ζ(s) (see Definition 3.4.1 for the definition of ζ(s) when
s ∈ R and s > 1). Elementary proofs were discovered around 1949 by
A. Selberg and P. Erdös. Their proofs do not involve ζ(s) and complex
function theory, hence the name “elementary”.
There are other elementary proofs of the prime number theorem since
the appearance of the work of Selberg and Erdös, one of which is due
to A. Hildebrand. The proof given by Hildebrand relies on proving an
equivalent statement of the Prime Number Theorem and the mean value
of µ(n). This equivalent statement of the Prime Number Theorem will
be established in Section 4.5.
In this chapter, we derive some basic properties of π(x) and establish
several statements equivalent to the Prime Number Theorem. We will
also use the results discussed in this chapter to study Bertrand’s Pos-
tulate, which states that for n ≥ 2, there exists a prime between n and
2n.

4.2 The function ψ(x)


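We recall the definition of the von Mangoldt function.

Definition 4.2.1
Let n be a positive integer and let
\[ \Lambda(n) = \begin{cases} \ln p, & \text{if } n = p^m \text{ for some prime } p \text{ and integer } m\ge 1,\\ 0, & \text{otherwise.} \end{cases} \]

Definition 4.2.2
For real number x ≥ 1,
\[ \psi(x) = \sum_{n\le x}\Lambda(n) = \sum_{p^m\le x}\ln p. \]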

Theorem 4.2.1
There exist positive constants c1 and c2 such that

c1 x ≤ ψ(x) ≤ c2 x.
Proof
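For x ≥ 4, let
\[ S = \sum_{n\le x}\ln n - 2\sum_{n\le x/2}\ln n. \]
By Theorem 3.2.2 with f (n) = ln n (and lower limit y = 1, so that the
boundary term {y} ln y vanishes), we find that
\begin{align*}
\sum_{n\le x}\ln n &= \int_1^x \ln t\,dt + \int_1^x \frac{\{t\}}{t}\,dt - \{x\}\ln x\\
&= x\ln x - x + O(\ln x). \tag{4.2.1}
\end{align*}
This implies that
\[ S = x\ln 2 + O(\ln x). \]
Therefore, there exists an x0 ≥ 4 such that
\[ \frac{x}{2} \le S \le x \tag{4.2.2} \]
whenever x ≥ x0 . Next, since
\[ \ln n = \sum_{d\mid n}\Lambda(d), \]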

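we find that
\begin{align*}
S &= \sum_{n\le x}\sum_{d\mid n}\Lambda(d) - 2\sum_{n\le x/2}\sum_{d\mid n}\Lambda(d)\\
&= \sum_{d\le x}\Lambda(d)\left[\frac{x}{d}\right] - 2\sum_{d\le x/2}\Lambda(d)\left[\frac{x}{2d}\right]\\
&= \sum_{d\le x/2}\Lambda(d)\left\{\left[\frac{x}{d}\right] - 2\left[\frac{x}{2d}\right]\right\} + \sum_{x/2<d\le x}\Lambda(d)\left[\frac{x}{d}\right].
\end{align*}
Hence,
\[ S = \sum_{d\le x/2}\Lambda(d)\theta_d + \sum_{x/2<d\le x}\Lambda(d)\left[\frac{x}{d}\right], \]
where
\[ \theta_d = \left[\frac{x}{d}\right] - 2\left[\frac{x}{2d}\right]. \tag{4.2.3} \]
Now, for
\[ \frac{x}{2} < d \le x, \]
we have
\[ \left[\frac{x}{d}\right] = 1. \]
Therefore, we may simplify the second sum in the above expression for S
to obtain
\[ S = \sum_{d\le x/2}\Lambda(d)\theta_d + \sum_{x/2<d\le x}\Lambda(d). \tag{4.2.4} \]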

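We now observe that θd = 0 or 1 since
\[ [y] - 2[y/2] = 0 \text{ or } 1. \]
Using (4.2.4), we deduce that
\[ S \le \sum_{d\le x/2}\Lambda(d) + \sum_{x/2<d\le x}\Lambda(d) = \sum_{d\le x}\Lambda(d) = \psi(x) \tag{4.2.5} \]
and
\[ S \ge \sum_{x/2<d\le x}\Lambda(d) = \psi(x) - \psi\left(\frac{x}{2}\right). \tag{4.2.6} \]
From (4.2.2) and (4.2.5),
\[ \psi(x) \ge S \ge \frac{x}{2} \qquad (x\ge x_0). \]
Therefore,
\[ \psi(x) \ge c_1 x. \]
To obtain an upper bound for ψ(x), we first deduce from (4.2.2) and (4.2.6)
that
\[ \psi(x) - \psi\left(\frac{x}{2}\right) \le S \le x. \]
Therefore,
\begin{align*}
\psi(x) &\le x + \psi\left(\frac{x}{2}\right), & x &\ge x_0\\
&\le x + \frac{x}{2} + \psi\left(\frac{x}{4}\right), & x &\ge 2x_0\\
&\ \ \vdots\\
&\le x + \frac{x}{2} + \cdots + \frac{x}{2^k} + \psi\left(\frac{x}{2^{k+1}}\right), & \frac{x}{2^{k+1}} &< x_0 \le \frac{x}{2^k}.
\end{align*}
This implies that
\[ \psi(x) \le 2x + \psi(x_0) \le c_2 x \]
for some positive real number c2 .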

4.3 The functions θ(x) and π(x)

Definition 4.3.1
For real number x ≥ 1, let
X
θ(x) = ln p.
p≤x

Theorem 4.3.1
For real number x ≥ 1, we have
\[ \theta(x) = \psi(x) + O(\sqrt{x}). \]

Proof
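We first note that the difference of ψ(x) and θ(x) is
\begin{align*}
\psi(x) - \theta(x) &= \sum_{\substack{p^m\le x\\ m\ge 2}}\ln p\\
&= \sum_{p\le\sqrt{x}}\ln p + \sum_{p\le x^{1/3}}\ln p\sum_{3\le m\le\frac{\ln x}{\ln p}} 1,
\end{align*}
where the first sum collects the terms with m = 2. Hence,
\begin{align*}
\psi(x) - \theta(x) &\le \psi(\sqrt{x}) + \sum_{p\le x^{1/3}}\ln p\,\frac{\ln x}{\ln p}\\
&\ll \sqrt{x} + x^{1/3}\ln x \ll \sqrt{x},
\end{align*}
where f (x) ≪ g(x) is another notation for f (x) = O(g(x)) (see Definition 3.1.1).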

Using Theorems 4.2.1 and 4.3.1, we deduce the following corollary.


Corollary 4.3.2 For x ≥ 4, there exist real positive constants c1 and
c2 such that

c1 x ≤ θ(x) ≤ c2 x.

We give a relation between θ(x) and π(x), where π(x) is given by


Definition 4.1.1.

Theorem 4.3.3
There exist positive constants c1 and c2 such that, for each real x ≥ 4,
\[ \frac{c_1 x}{\ln x} \le \pi(x) \le \frac{c_2 x}{\ln x}. \]

Proof
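It suffices to prove that
\[ \pi(x) = \frac{\theta(x)}{\ln x} + O\left(\frac{x}{\ln^2 x}\right), \]
since the theorem then follows from Corollary 4.3.2. We observe that
\begin{align*}
\pi(x) - \frac{\theta(x)}{\ln x} &= \sum_{p\le x}\left(1 - \frac{\ln p}{\ln x}\right)\\
&= \sum_{p\le x}\ln p\left(\frac{1}{\ln p} - \frac{1}{\ln x}\right). \tag{4.3.1}
\end{align*}
If
\[ a(n) = \begin{cases} \ln p & \text{if } n \text{ is a prime } p,\\ 0 & \text{otherwise,} \end{cases} \]
then by Corollary 4.3.2,
\[ A(t) = \sum_{n\le t} a(n) = \theta(t) \ll t. \]
The last expression in (4.3.1) is
\begin{align*}
&\theta(x)\left(\frac{1}{\ln x} - \frac{1}{\ln x}\right) - \int_2^x \theta(t)\left(\frac{1}{\ln t} - \frac{1}{\ln x}\right)'\,dt\\
&\qquad= \int_2^x \frac{\theta(t)}{t\ln^2 t}\,dt \ll \int_2^x \frac{dt}{\ln^2 t}\\
&\qquad= \int_2^{\sqrt{x}}\frac{dt}{\ln^2 t} + \int_{\sqrt{x}}^x \frac{dt}{\ln^2 t}\\
&\qquad\ll \sqrt{x} + \int_{\sqrt{x}}^x \frac{dt}{\ln^2 x} \ll \frac{x}{\ln^2 x}.
\end{align*}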

As corollaries of Theorems 4.3.1 and 4.3.3, we have the following results.
We leave the details of the proofs of these corollaries to the readers.

Corollary 4.3.4 The Prime Number Theorem
\[ \pi(x) \sim \frac{x}{\ln x} \]
is equivalent to each of the following relations:
(a) θ(x) ∼ x, and
(b) ψ(x) ∼ x.
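To see Theorems 4.2.1, 4.3.1 and 4.3.3 at work, one can tabulate π(x), θ(x) and ψ(x) for moderate x. The sketch below is not part of the notes; the helper names are illustrative assumptions, and primes are generated with a simple sieve of Eratosthenes.

```python
import math

def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes: all primes p <= n (for n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            # Cross out i*i, i*i + i, ... ; the right side has matching length.
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]

def chebyshev(x: int) -> tuple[int, float, float]:
    """Return (pi(x), theta(x), psi(x)) for an integer x >= 2."""
    ps = primes_up_to(x)
    theta = math.fsum(math.log(p) for p in ps)
    psi = 0.0
    for p in ps:
        pm = p
        while pm <= x:  # one contribution ln p for each prime power p^m <= x
            psi += math.log(p)
            pm *= p
    return len(ps), theta, psi

if __name__ == "__main__":
    for x in (10**2, 10**3, 10**4, 10**5):
        pi_x, theta, psi = chebyshev(x)
        print(x, pi_x * math.log(x) / x, theta / x, psi / x)
```

The three printed ratios are the quantities that Corollary 4.3.4 asserts tend to 1 simultaneously; the gap ψ(x) − θ(x) stays of size about √x, as in Theorem 4.3.1.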

4.4 Mertens' estimates


In this section, we show that there are infinitely many primes by showing
that Σ_{p≤x} 1/p diverges as x → ∞.

Theorem 4.4.1 (Mertens' estimates)


Let x be a positive real number greater than 1. We have

(a) \[ \sum_{n\le x}\frac{\Lambda(n)}{n} = \ln x + O(1), \]

(b) \[ \sum_{p\le x}\frac{\ln p}{p} = \ln x + O(1), \]

(c) \[ \sum_{p\le x}\frac{1}{p} = \ln\ln x + A + O\left(\frac{1}{\ln x}\right), \text{ and} \]

(d) (Mertens' Theorem) \[ \prod_{p\le x}\left(1 - \frac{1}{p}\right) = \frac{e^{A'}}{\ln x}\left(1 + O\left(\frac{1}{\ln x}\right)\right), \]

where A and A′ are constants.

Proof
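(a) First, we write
\begin{align*}
\sum_{n\le x}\frac{\Lambda(n)}{n} &= \sum_{n\le x}\Lambda(n)\,\frac{1}{x}\left(\left[\frac{x}{n}\right] + O(1)\right)\\
&= \frac{1}{x}\sum_{n\le x}\Lambda(n)\left[\frac{x}{n}\right] + O\Bigl(\frac{1}{x}\sum_{n\le x}\Lambda(n)\Bigr).
\end{align*}
Now,
\[ \sum_{n\le x}\Lambda(n)\left[\frac{x}{n}\right] = \sum_{n\le x}(\Lambda*u)(n). \]
Hence, using ψ(x) = O(x) from Theorem 4.2.1, Λ ∗ u = ln and (4.2.1), we deduce that
\begin{align*}
\sum_{n\le x}\frac{\Lambda(n)}{n} &= \frac{1}{x}\sum_{n\le x}(\Lambda*u)(n) + O(1)\\
&= \frac{1}{x}\sum_{n\le x}\ln n + O(1)\\
&= \ln x + O(1).
\end{align*}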

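(b) We observe that
\begin{align*}
0 \le \sum_{n\le x}\frac{\Lambda(n)}{n} - \sum_{p\le x}\frac{\ln p}{p} &= \sum_{p\le\sqrt{x}}\ln p\sum_{2\le m\le\frac{\ln x}{\ln p}}\frac{1}{p^m}\\
&\ll \sum_{p\le\sqrt{x}}\frac{\ln p}{p^2} \ll 1.
\end{align*}
Hence,
\[ \sum_{p\le x}\frac{\ln p}{p} = \ln x + O(1). \]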

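(c) Let
\[ A(x) = \sum_{n\le x} a(n), \]
where
\[ a(n) = \begin{cases} \dfrac{\ln p}{p}, & \text{if } n = p \text{ is prime},\\[4pt] 0, & \text{otherwise.} \end{cases} \]
Then, we find that
\begin{align*}
\sum_{p\le x}\frac{1}{p} &= \sum_{p\le x}\left(\frac{\ln p}{p}\right)\left(\frac{1}{\ln p}\right)\\
&= A(x)\frac{1}{\ln x} - \int_2^x A(t)\left(\frac{1}{\ln t}\right)'\,dt\\
&= \frac{A(x)}{\ln x} + \int_2^x \frac{A(t)}{t\ln^2 t}\,dt. \tag{4.4.1}
\end{align*}
By Theorem 4.4.1 (b), we find that
\[ A(t) = \ln t + R(t), \]
with
\[ R(t) \ll 1, \qquad t\ge 2. \tag{4.4.2} \]
Using (4.4.2) in the last term of (4.4.1), we deduce that
\begin{align*}
\int_2^x \frac{\ln t + R(t)}{t\ln^2 t}\,dt &= \int_2^x \frac{dt}{t\ln t} + \int_2^x \frac{R(t)}{t\ln^2 t}\,dt\\
&= \ln\ln x - \ln\ln 2 + \int_2^\infty \frac{R(t)}{t\ln^2 t}\,dt - \int_x^\infty \frac{R(t)}{t\ln^2 t}\,dt\\
&= \ln\ln x - \ln\ln 2 + A'' + O\left(\frac{1}{\ln x}\right). \tag{4.4.3}
\end{align*}
Substituting (4.4.3) into (4.4.1), and noting that A(x)/ ln x = 1 + O(1/ ln x),
we conclude our proof of (c).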

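(d) We observe that
\begin{align*}
\ln\prod_{p\le x}\left(1 - \frac{1}{p}\right) &= \sum_{p\le x}\ln\left(1 - \frac{1}{p}\right)\\
&= \sum_{p\le x}\left(-\frac{1}{p} + r_p\right),
\end{align*}
where
\[ r_p = \ln\left(1 - \frac{1}{p}\right) + \frac{1}{p}. \]
Hence, by (c),
\begin{align*}
\ln\prod_{p\le x}\left(1 - \frac{1}{p}\right) &= \sum_{p\le x} r_p - \sum_{p\le x}\frac{1}{p}\\
&= -\ln\ln x - A + O\left(\frac{1}{\ln x}\right) + \sum_{p} r_p - \sum_{p>x} r_p. \tag{4.4.4}
\end{align*}
Now,
\[ r_p = -\sum_{m=2}^{\infty}\frac{1}{mp^m} = O\left(\frac{1}{p^2}\right), \tag{4.4.5} \]
since for m ≥ 2 and p ≥ 2,
\[ mp^m \ge p^2\,2^{m-2}, \]
so that the series is at most 2/p². Using (4.4.5) in (4.4.4), and writing
A′ = −A + Σ_p r_p , we deduce that
\begin{align*}
\ln\prod_{p\le x}\left(1 - \frac{1}{p}\right) &= -\ln\ln x + A' + O\left(\frac{1}{\ln x}\right) + O\Bigl(\sum_{p>x}\frac{1}{p^2}\Bigr)\\
&= -\ln\ln x + A' + O\left(\frac{1}{\ln x}\right) + O\left(\frac{1}{x-1}\right)\\
&= -\ln\ln x + A' + O\left(\frac{1}{\ln x}\right).
\end{align*}
Hence,
\[ \ln\prod_{p\le x}\left(1 - \frac{1}{p}\right) = -\ln\ln x + A' + O\left(\frac{1}{\ln x}\right). \tag{4.4.6} \]
Exponentiating both sides of (4.4.6), we arrive at
\begin{align*}
\prod_{p\le x}\left(1 - \frac{1}{p}\right) &= \exp\left(-\ln\ln x + A' + O\left(\frac{1}{\ln x}\right)\right)\\
&= \frac{e^{A'}}{\ln x}\exp\left(O\left(\frac{1}{\ln x}\right)\right)\\
&= \frac{e^{A'}}{\ln x}\left(1 + O\left(\frac{1}{\ln x}\right)\right),
\end{align*}
since e^t = 1 + O(t) for bounded t.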
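Parts (c) and (d) can be illustrated numerically. The sketch below is not from the notes and its helper names are assumptions. It relies on two classical facts that the text does not derive: the constant in (c) is approximately 0.2614972 (often called the Mertens constant), and the constant factor in (d) equals e^{−γ} with γ Euler's constant.

```python
import math

MERTENS_CONSTANT = 0.2614972128  # classical value of the constant in (c); assumed, not derived here
EULER_GAMMA = 0.5772156649015329

def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes: all primes p <= n (for n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]

def mertens_quantities(x: int) -> tuple[float, float]:
    """Return (sum_{p<=x} 1/p - ln ln x, ln x * prod_{p<=x} (1 - 1/p))."""
    ps = primes_up_to(x)
    reciprocal_sum = math.fsum(1.0 / p for p in ps)
    product = 1.0
    for p in ps:
        product *= 1.0 - 1.0 / p
    return reciprocal_sum - math.log(math.log(x)), product * math.log(x)

if __name__ == "__main__":
    for x in (10**2, 10**3, 10**4, 10**5):
        print(x, mertens_quantities(x), (MERTENS_CONSTANT, math.exp(-EULER_GAMMA)))
```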
4.5 Prime Number Theorem and M (µ)

Definition 4.5.1
Let f be an arithmetical function. We define
\[ M(f) = \lim_{x\to\infty}\frac{1}{x}\sum_{n\le x} f(n) \]
if the limit on the right-hand side exists.

As mentioned in the introduction, there are several elementary proofs


of the Prime Number Theorem. One of the proofs relies on showing
that M (µ) = 0. In this section, we will show that if M (µ) = 0 then
the Prime Number Theorem is true. Conversely, the Prime Number
Theorem implies that M (µ) = 0.

Theorem 4.5.1
The Prime Number Theorem is equivalent to the relation

M (µ) = 0.

Proof
We will first show that the Prime Number Theorem implies that M (µ) =
0.
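Define
\[ M_1(\mu) = \lim_{x\to\infty}\left(\frac{1}{x\ln x}\sum_{n\le x}\mu(n)\ln n\right). \]
Note that
\[ M(\mu) = 0 \text{ if and only if } M_1(\mu) = 0, \tag{4.5.1} \]
since
\[ \frac{1}{x}\sum_{n\le x}\left|\frac{\ln n}{\ln x} - 1\right| \ll \frac{1}{\ln x}. \]
Assume the Prime Number Theorem in the form θ(x) ∼ x.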


We observe that
X X X
µ(n) ln n = µ(n) Λ(d)
n≤x n≤x d|n
X X
= µ(n) ln p,
n≤x p|n

where we have used the fact that the terms with n non-squarefree are 0.
Furthermore, the value Λ(d) is nonzero only when d is a prime power.
But since n is squarefree, the divisors d|n that are prime powers are
simply primes. Hence,
X X X
µ(n) ln n = ln p µ(n)
n≤x p≤x n≤x
p|n
X X
= ln p (µ(n0 )) ,
p≤x 0
n ≤x/p
p-n0

where we have written n = pn0 . Next, observe that


 
X X X X X 
µ(n) ln n = − ln p µ(n) + O 
 ln p 1

n≤x p≤x n≤x/p p≤x n≤x/p
p|n
 
X X X ln p
=− ln p µ(n) + O x 
p2
p≤x n≤x/p p≤x
X X
=− µ(n) ln p + O(x)
n≤x p≤x/n
X x
=− µ(n)θ + O(x).
n
n≤x

Now, write
X X µ(n) X x
µ(n) ln n = −x − µ(n)R + O(x), (4.5.2)
n n
n≤x n≤x n≤x

where
R(y) = θ(y) − y.
Since
XX X hxi X µ(n)
1= µ(d) = µ(n) =x + O(x),
n n
n≤x d|n n≤x n≤x
we find that
X µ(n)
x = O(x).
n
n≤x

Therefore, (4.5.2) may be written as


X X x
µ(n) ln n = − µ(n)R + O(x).
n
n≤x n≤x

This implies that



 
1 X X  x 
µ(n) ln n ≤ 1 R + O
1
.
x ln x x ln x
n ln x
n≤x n≤x

Let  > 0 be given. By the Prime Number Theorem,


R(y)
lim = 0.
y→∞ y
Therefore, there exists a y0 such that
|R(y)| ≤ y (y ≥ y0 ).
For x ≥ y0 ,
X  x  X x X
R ≤  + max |R(y)|
n n y≤y0
n≤x n≤x/y0 x/y0 <n≤x

≤ x ln x + O (x).
Thus,
 
1  X  x  
lim R ≤ .
x ln x n
n≤x

This shows that the Prime Number Theorem implies that M1 (µ) = 0.
By (4.5.1), we deduce that M (µ) = 0.
To prove the converse, let
ln n = d(n) − 2C + r(n)
where r(n) is some arithmetical function. Let y ≥ 1. By (4.2.1),
X X X
r(n) = ln n − d(n) + 2Cy + O(1)
n≤y n≤y n≤y

= y(ln y − 1) + O(ln y)
√ √
− (y ln y + (2C − 1)y + O( y)) + 2Cy + O(1) = O( y).
(4.5.3)
Next, since
Λ = ln ∗µ,

we conclude that
X X
Λ(n) = (µ ∗ ln)(n)
n≤x n≤x
X X X
= (µ ∗ d) − 2C( µ ∗ u) + (µ ∗ r)
n≤x n≤x n≤x
X
= [x] − 2C + (µ ∗ r)(n).
n≤x

The last equality holds because


X
µ ∗ u ∗ u = [x].
n≤x

Thus, the Prime Number Theorem follows from ψ(x) ∼ x (see Corollary
4.3.4) if
1X
lim (µ ∗ r)(n) = 0.
x→∞ x
n≤x

Now, by Theorem 3.5.1,


X X X
µ ∗ r(n) = µ(d1 )r(d2 )
n≤x n≤x d1 d2 =n
X X
= µ(d1 )r(d2 )
d1 ≤x d2 ≤x
d1 d2 ≤x

= S1 + S2 − S3 ,

where
X X
S1 = µ(d1 )r(d2 )
d2 ≤y d1 ≤x/d2
X X
S2 = µ(d1 )r(d2 )
d1 ≤x/y d2 ≤x/d1
X X
S3 = µ(d1 )r(d2 ),
d1 ≤x/y d2 ≤y

and y is a parameter in [1, x] to be chosen.


For a fixed y ∈ [1, x],


X X
|S1 | ≤ |r(d2 )| µ(d1 ) .
d2 ≤y d1 ≤x/d2

Now, using the assumption that

M (µ) = 0,

we find that

1 X
lim µ(d1 ) = 0.
x→∞ x
d1 ≤x/d2

Next, using (4.5.3), we deduce that




X X X rx
|S2 | ≤
r(d2 ) ≤ c
d1
d1 ≤x/y d2 ≤x/d1 d1 ≤x/y
Z x/y !
√ X 1 √ dt x
≤c x √ ≤c x 1+ √ ≤ c1 √ .
d1 1 t y
d1 ≤x/y

Finally, by using (4.5.3), we deduce that




x X x √ x
|S3 | ≤ r(d2 ) ≤ c2 y = c2 √ .
y y y
d2 ≤y

Hence,


1 X 1 c2
lim (µ ∗ r)(n) ≤ 0 + c1 √ + √ .
x→∞ x y y
n≤x

Since y is arbitrary,
1X
lim (µ ∗ r)(n) = 0.
x→∞ x
n≤x
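The mean value M(µ) can be explored numerically. The following sketch is an illustration only, with hypothetical function names; it computes µ(n) for n ≤ N with a sieve and prints the averages (1/x) Σ_{n≤x} µ(n), which by Theorem 4.5.1 tend to 0 exactly when the Prime Number Theorem holds.

```python
def mobius_up_to(n: int) -> list[int]:
    """Return a list mu with mu[k] = Moebius function of k for 1 <= k <= n (mu[0] is unused)."""
    mu = [1] * (n + 1)
    mu[0] = 0
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for k in range(2 * p, n + 1, p):
                is_prime[k] = False
            for k in range(p, n + 1, p):
                mu[k] = -mu[k]  # one sign change for each prime divisor
            for k in range(p * p, n + 1, p * p):
                mu[k] = 0       # k is divisible by a square
    return mu

def mobius_mean(x: int) -> float:
    """Return (1/x) * sum_{n <= x} mu(n)."""
    mu = mobius_up_to(x)
    return sum(mu[1:]) / x

if __name__ == "__main__":
    for x in (10, 100, 1000, 10**4, 10**5):
        print(x, mobius_mean(x))
```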

4.6 The Bertrand Postulate (Ramanujan’s proof )


In this section, we will use the properties of the functions θ(x) and ψ(x)
to give a proof of the well-known Bertrand’s Postulate.
Theorem 4.6.1 (Bertrand’s Postulate)
Let n be an integer. Then for n ≥ 2, there exists a prime p between n
and 2n.

Most books that discuss Theorem 4.6.1 prove the result following
Erdös’ approach. In this course, we will first present the proof due
to S. Ramanujan. In the next section, we will discuss Erdös’ proof.
Ramanujan’s proof was mentioned in an interesting article by P. Erdös
titled “Ramanujan and I”. Erdös’ proof of Theorem 4.6.1 was published
around 1932, and it was Kalmar who asked Erdös to look up Ramanujan’s
proof; that was the first time Erdös heard about Ramanujan.
By definitions of ψ(x) and θ(x), we observe that

Lemma 4.6.2 For each positive real number x,
\[ \psi(x) = \theta(x) + \theta(\sqrt{x}) + \theta(\sqrt[3]{x}) + \cdots. \tag{4.6.1} \]

Next, we will show that

Lemma 4.6.3
\[ \ln([x]!) = \psi(x) + \psi\left(\frac{x}{2}\right) + \psi\left(\frac{x}{3}\right) + \cdots. \tag{4.6.2} \]

Proof
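The function
\[ \psi(x) = \sum_{n\le x}\Lambda(n), \]
where Λ(n) is the von Mangoldt function. Hence
\begin{align*}
\sum_{k=1}^{\infty}\psi\left(\frac{x}{k}\right) &= \sum_{k=1}^{\infty}\sum_{n\le x/k}\Lambda(n) = \sum_{\substack{kn\le x\\ k\ge 1}}\Lambda(n)\\
&= \sum_{n\le x}\sum_{k\le x/n}\Lambda(n) = \sum_{n\le x}\Lambda(n)\left[\frac{x}{n}\right]\\
&= \sum_{n\le x}\sum_{d\mid n}\Lambda(d) = \ln[x]!,
\end{align*}
where we have used properties of Λ(n) for the last equality.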


We will now establish a few equalities and inequalities.

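Lemma 4.6.4 For positive real number x, we have
\[ \psi(x) - 2\psi(\sqrt{x}) = \theta(x) - \theta(\sqrt{x}) + \theta(\sqrt[3]{x}) - \cdots, \tag{4.6.3} \]
\[ \ln[x]! - 2\ln[x/2]! = \psi(x) - \psi\left(\frac{x}{2}\right) + \psi\left(\frac{x}{3}\right) - \cdots, \tag{4.6.4} \]
\[ \psi(x) - 2\psi(\sqrt{x}) \le \theta(x) \le \psi(x) \tag{4.6.5} \]
and
\[ \psi(x) - \psi\left(\frac{x}{2}\right) \le \ln[x]! - 2\ln[x/2]! \le \psi(x) - \psi\left(\frac{x}{2}\right) + \psi\left(\frac{x}{3}\right). \tag{4.6.6} \]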
2 3

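Proof of (4.6.3).
This follows directly from (4.6.1). More precisely,
\[ \psi(x) - 2\psi(\sqrt{x}) = \sum_{k=1}^{\infty}\theta\left(\sqrt[k]{x}\right) - 2\sum_{k=1}^{\infty}\theta\left(\sqrt[2k]{x}\right). \]

Proof of (4.6.4).
This follows from (4.6.2), namely,
\[ \ln[x]! - 2\ln[x/2]! = \sum_{k=1}^{\infty}\psi\left(\frac{x}{k}\right) - 2\sum_{k=1}^{\infty}\psi\left(\frac{x}{2k}\right). \]

Proof of (4.6.5).
Note that θ(x) is increasing, so the terms of the alternating series in (4.6.3)
decrease in absolute value. Hence, from (4.6.3),
\[ \psi(x) - 2\psi(\sqrt{x}) \le \theta(x). \]
Also, from (4.6.1),
\[ \psi(x) \ge \theta(x). \]

Proof of (4.6.6).
This follows immediately from (4.6.4).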
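Lemma 4.6.5 Let x be a real number. Then
\[ \ln[x]! - 2\ln[x/2]! > \frac{2}{3}x \quad\text{if } x > 750, \tag{4.6.7} \]
\[ \ln[x]! - 2\ln[x/2]! < \frac{3}{4}x \quad\text{if } x > 3, \tag{4.6.8} \]
\[ \psi(x) - \psi\left(\frac{x}{2}\right) + \psi\left(\frac{x}{3}\right) > \frac{2}{3}x \quad\text{if } x > 750, \tag{4.6.9} \]
and
\[ \psi(x) - \psi\left(\frac{x}{2}\right) < \frac{3}{4}x \quad\text{if } x > 3. \tag{4.6.10} \]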
2 4

Proof of (4.6.7).

For real number z, the Gamma function Γ(z) is given by
\[ \Gamma(z) = \lim_{n\to\infty}\frac{(n-1)!\,n^z}{z(z+1)\cdots(z+n-1)}. \]
The function Γ(x) satisfies the well-known Stirling’s formula
\[ \ln\Gamma(x) = \ln\sqrt{2\pi} + \left(x - \frac12\right)\ln x - x + \frac{\vartheta_x}{12x}, \qquad 0 < \vartheta_x < 1. \tag{4.6.11} \]
It is known that ln Γ(x) is convex, that is, it is continuous and satisfies
the relation
\[ \frac{\ln\Gamma(u) + \ln\Gamma(v)}{2} \ge \ln\Gamma\left(\frac{u+v}{2}\right). \]
The convexity of ln Γ(x) implies that
\[ (\ln\Gamma(x))'' \ge 0, \]
that is, Γ′′(x)Γ(x) − Γ′(x)² ≥ 0, which leads to
\[ \Gamma''(x)\Gamma(x) \ge 0. \]
Since Γ(x) is positive for x > 0 (see the definition of Γ(x)), we deduce
that
\[ \Gamma''(x) \ge 0. \]
Observe now that Γ(3) > Γ(2) and thus, by the mean value theorem, there
is a c ∈ [2, 3] such that Γ′(c) > 0. Since Γ′′(x) ≥ 0 for x > 0, we deduce
that Γ′(x) > 0 for all x ≥ c. In other words, Γ(x) is increasing for
x ≥ 3 ≥ c and we conclude that
\[ \ln[x]! - 2\ln[x/2]! \ge \ln\Gamma(x) - 2\ln\Gamma\left(\frac12 x + 1\right). \]
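To prove (4.6.7), it suffices to show that for x > 750,
\[ \ln\Gamma(x) - 2\ln\Gamma\left(\frac12 x + 1\right) > \frac{2x}{3}. \tag{4.6.12} \]
By (4.6.11), we deduce that
\begin{align*}
\ln\Gamma(x) - 2\ln\Gamma\left(\frac12 x + 1\right)
={}& \ln\sqrt{2\pi} + \left(x - \frac12\right)\ln x - x + \frac{\vartheta_1}{12x} - 2\ln\sqrt{2\pi}\\
& - 2\left(\frac{x}{2} + \frac12\right)\ln\left(\frac{x}{2} + 1\right) + 2\left(\frac{x}{2} + 1\right) - \frac{\vartheta_2}{3x+6}, \tag{4.6.13}
\end{align*}
where both ϑ1 , ϑ2 belong to the interval (0, 1). Since
\[ 2 + \frac{\vartheta_1}{12x} - \frac{\vartheta_2}{3x+6} \ge 1, \]
we find from (4.6.13) that
\[ \ln\Gamma(x) - 2\ln\Gamma\left(\frac{x}{2} + 1\right) > -\ln\sqrt{2\pi} + 1 + x\ln\left(\frac{2x}{2+x}\right) - \frac12\ln x - \ln\left(1 + \frac{x}{2}\right). \]
Using the fact that − ln √(2π) + 1 > 0, −1/2 > −1 and that for x > 2,
\[ -\ln\left(x\left(1 + \frac{x}{2}\right)\right) > -\ln x^2, \]
we find that
\[ \ln\Gamma(x) - 2\ln\Gamma\left(\frac{x}{2} + 1\right) > x\ln\left(\frac{2x}{x+2}\right) - 2\ln x. \]
It suffices to show that for x > 750,
\[ x\ln\left(\frac{2x}{x+2}\right) - 2\ln x > \frac{2x}{3}. \]
But if we let
\[ f(x) = \ln 2x - \ln(x+2) - 2\,\frac{\ln x}{x}, \]
then
\[ f'(x) = \frac{1}{x} - \frac{1}{x+2} - \frac{2}{x^2} + 2\,\frac{\ln x}{x^2}. \]
But
\[ \frac{1}{x} - \frac{1}{x+2} > 0 \]
and
\[ 2\,\frac{\ln x}{x^2} - \frac{2}{x^2} > 0 \]
if x > 3. Hence if x > 3, then f ′(x) > 0. Therefore, f (x) is increasing.
In other words, if x > 750, then
\[ f(x) > f(750) = 0.672\cdots > \frac{2}{3}, \]
and the proof of (4.6.12) is complete.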
Proof of (4.6.8).
The proof is similar to that for (4.6.7). We use the inequality
\[ \ln[x]! - 2\ln[x/2]! \le \ln\Gamma(x+1) - 2\ln\Gamma\left(\frac12 x + \frac12\right) \]
and Stirling’s formula to conclude that (why?)
\[ \ln[x]! - 2\ln[x/2]! \le \frac{3}{4}x \]
for all x > 3.
Proof of (4.6.9) and (4.6.10).
These two inequalities follow immediately from (4.6.6)-(4.6.8).

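Lemma 4.6.6 For each positive real number x, we have
\[ \psi(x) < \frac{3}{2}x \quad\text{if } x > 3 \tag{4.6.14} \]
and
\begin{align*}
\psi(x) - \psi\left(\frac{x}{2}\right) + \psi\left(\frac{x}{3}\right) &\le \theta(x) + 2\psi(\sqrt{x}) - \theta\left(\frac{x}{2}\right) + \psi\left(\frac{x}{3}\right)\\
&< \theta(x) - \theta\left(\frac{x}{2}\right) + \frac{x}{2} + 3\sqrt{x}. \tag{4.6.15}
\end{align*}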

Proof of (4.6.14).
To prove (4.6.14), we use (4.6.10) repeatedly, with x replaced by x/2,
x/4, · · · and add up the results. We find that
\[ \psi(x) \le \frac{3}{4}x\left(1 + \frac12 + \cdots\right) < \frac{3}{2}x. \]

Proof of (4.6.15).
From (4.6.5), we find that
\[ \psi(x) - 2\psi(\sqrt{x}) \le \theta(x). \]
Hence
\[ \psi(x) \le \theta(x) + 2\psi(\sqrt{x}). \]
Next, from (4.6.5),
\[ \theta(x/2) \le \psi(x/2). \]
Using the above inequalities, we deduce that
\[ \psi(x) - \psi(x/2) + \psi(x/3) \le \theta(x) + 2\psi(\sqrt{x}) - \theta(x/2) + \psi(x/3). \]
For the second inequality, we use (4.6.14) to deduce that
\[ 2\psi(\sqrt{x}) + \psi(x/3) \le 3\sqrt{x} + x/2. \]

We are now ready to prove Bertrand’s Postulate. By (4.6.9),
\[ \psi(x) - \psi(x/2) + \psi(x/3) \ge \frac{2}{3}x \]
for x > 750. Hence we deduce from (4.6.15) that
\[ \theta(x) - \theta(x/2) \ge \frac{2x}{3} - \frac{x}{2} - 3\sqrt{x} > 0 \]
whenever x > 750. This implies that for n > 375, there is a prime
between n and 2n.
We are now left with verifying that Bertrand’s Postulate is true for
2 ≤ n ≤ 375. This is straightforward and we leave it as an exercise.
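The remaining finite verification can be carried out mechanically. The sketch below is one possible way to do the exercise, not the author's; the function names are illustrative. For each n it searches for the smallest prime p with n < p < 2n using trial division.

```python
import math
from typing import Optional

def is_prime(k: int) -> bool:
    """Trial division primality test, adequate for small k."""
    if k < 2:
        return False
    return all(k % d != 0 for d in range(2, math.isqrt(k) + 1))

def bertrand_witness(n: int) -> Optional[int]:
    """Return the smallest prime p with n < p < 2n, or None if there is none."""
    for p in range(n + 1, 2 * n):
        if is_prime(p):
            return p
    return None

if __name__ == "__main__":
    failures = [n for n in range(2, 376) if bertrand_witness(n) is None]
    print("failures for 2 <= n <= 375:", failures)
```

An empty list of failures confirms the cases 2 ≤ n ≤ 375 left open by the argument above.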

4.7 Bertrand’s postulate (Erdös’ proof )

We will need several elementary lemmas.

Lemma 4.7.1 For each prime p, let r(p) satisfy
\[ p^{r(p)} \le 2n < p^{r(p)+1}. \tag{4.7.1} \]
Then
\[ \binom{2n}{n} \,\Big|\, \prod_{p\le 2n} p^{r(p)}. \]
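Proof
The number of positive integers not exceeding n and divisible by m is [n/m].
Therefore, the number of integers from 1 to n that are divisible by p^j but
not by p^{j+1} is
\[ \left[\frac{n}{p^j}\right] - \left[\frac{n}{p^{j+1}}\right]. \]
Hence, the exponent of p in n! is
\begin{align*}
&\left(\left[\frac{n}{p}\right] - \left[\frac{n}{p^2}\right]\right) + 2\left(\left[\frac{n}{p^2}\right] - \left[\frac{n}{p^3}\right]\right) + \cdots + (k-1)\left(\left[\frac{n}{p^{k-1}}\right] - \left[\frac{n}{p^k}\right]\right) + k\left[\frac{n}{p^k}\right]\\
&\qquad= \left[\frac{n}{p}\right] + \left[\frac{n}{p^2}\right] + \cdots + \left[\frac{n}{p^k}\right],
\end{align*}
where k is such that
\[ p^k \le n < p^{k+1}. \]
Therefore the exponent of p in \binom{2n}{n} is
\[ \sum_{j=1}^{r(p)}\left(\left[\frac{2n}{p^j}\right] - 2\left[\frac{n}{p^j}\right]\right) \le \sum_{j=1}^{r(p)} 1 = r(p). \]
Hence,
\[ \binom{2n}{n} \,\Big|\, \prod_{p\le 2n} p^{r(p)}. \]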
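The exponent formula in this proof (Legendre's formula) and the bound of Lemma 4.7.1 are easy to test by machine. The following sketch is illustrative only; the function names are assumptions, and `math.comb` (Python 3.8 or later) is used to obtain the central binomial coefficient for comparison.

```python
import math

def exponent_in_factorial(n: int, p: int) -> int:
    """Exponent of the prime p in n!, computed as [n/p] + [n/p^2] + ..."""
    e, q = 0, p
    while q <= n:
        e += n // q
        q *= p
    return e

def exponent_in_central_binomial(n: int, p: int) -> int:
    """Exponent of p in binom(2n, n) = (2n)! / (n!)^2."""
    return exponent_in_factorial(2 * n, p) - 2 * exponent_in_factorial(n, p)

def r(n: int, p: int) -> int:
    """The integer r(p) of (4.7.1): the largest r with p^r <= 2n."""
    k, q = 0, p
    while q <= 2 * n:
        k += 1
        q *= p
    return k

if __name__ == "__main__":
    n = 50
    for p in (2, 3, 5, 7, 11, 37, 53, 97):
        e = exponent_in_central_binomial(n, p)
        print(p, e, r(n, p), math.comb(2 * n, n) % p**e == 0)
```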

Lemma 4.7.2 If p > 2 and
\[ \frac{2n}{3} < p \le n, \]
then
\[ p \nmid \binom{2n}{n}. \]

Proof
If p satisfies
\[ \frac{2n}{3} < p \le n, \]
then p occurs exactly once in the factorization of n!. This is because if 2p ≤ n,
then
\[ p \le \frac{n}{2} < \frac{2n}{3} < p, \]
which is a contradiction to our assumption. Now p occurs exactly twice in (2n)!
because 3p > 2n, and p² ≥ 3p > 2n since p ≥ 3. Therefore,
\[ p \nmid \binom{2n}{n}. \]

Lemma 4.7.3 For n ≥ 2,
\[ \prod_{p\le n} p < 4^n. \]

Proof

Let P (n) denote the statement. It is clear that P (2) and P (3) are true.
If m > 1, then 2m is not prime, so
\[ \prod_{p\le 2m} p = \prod_{p\le 2m-1} p \le 4^{2m-1} < 4^{2m}. \]
Therefore, P (2m − 1) implies P (2m).

Suppose n = 2m + 1. Then each prime in the interval [m + 2, 2m + 1]
is a factor of \binom{2m+1}{m}. This is because primes in the interval do not
occur in the denominator of \binom{2m+1}{m} (which is m!(m + 1)!).
Since P (m + 1) holds, we find that
\[ \prod_{p\le 2m+1} p = \prod_{m+2\le p\le 2m+1} p\ \prod_{p\le m+1} p \le \binom{2m+1}{m} 4^{m+1}. \]
But,
\begin{align*}
(1+1)^{2m+1} &= \binom{2m+1}{0} + \binom{2m+1}{1} + \cdots + \binom{2m+1}{m}\\
&\quad + \binom{2m+1}{m+1} + \cdots + \binom{2m+1}{2m+1} \ge 2\binom{2m+1}{m}.
\end{align*}
Therefore,
\[ \binom{2m+1}{m} < 4^m. \]
Hence,
\[ \prod_{p\le 2m+1} p \le 4^m\cdot 4^{m+1} = 4^{2m+1} \]
and P (2m + 1) is true.
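Lemma 4.7.3 can be spot-checked with exact integer arithmetic. The sketch below is an illustration with a hypothetical helper name; it multiplies all primes up to n (found by trial division) and compares the product with 4^n.

```python
import math

def primorial(n: int) -> int:
    """Product of all primes p <= n (equal to 1 when n < 2)."""
    prod = 1
    for p in range(2, n + 1):
        if all(p % d != 0 for d in range(2, math.isqrt(p) + 1)):
            prod *= p
    return prod

if __name__ == "__main__":
    for n in (2, 3, 10, 50, 100):
        print(n, primorial(n) < 4**n)
```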


We are now ready to prove Bertrand’s postulate.
Suppose that Bertrand’s postulate is false. Then there exists a positive
integer n > 1 such that there is no prime p with n < p < 2n. By
Lemma 4.7.2, all prime factors of
\[ \binom{2n}{n} \]
must satisfy p ≤ 2n/3. Let s(p) be the largest exponent such that p^{s(p)}
divides \binom{2n}{n}. By Lemma 4.7.1,
\[ \prod_{p\le 2n/3} p^{s(p)} = \binom{2n}{n} \,\Big|\, \prod_{p\le 2n} p^{r(p)}. \]
Therefore, s(p) ≤ r(p) and
\[ p^{s(p)} \le p^{r(p)} \le 2n \tag{4.7.2} \]
by (4.7.1). If s(p) > 1, then p^{s(p)} ≥ p² and thus,
\[ p \le \sqrt{2n} \]
since p^{s(p)} ≤ 2n. In other words, no more than [√(2n)] primes occur in
\binom{2n}{n} with exponent larger than 1. Now,
\begin{align*}
\binom{2n}{n} = \prod_{p\le 2n/3} p^{s(p)} &= \prod_{\substack{p\le 2n/3\\ s(p)>1}} p^{s(p)} \prod_{\substack{p\le 2n/3\\ s(p)=1}} p^{s(p)}\\
&\le \prod_{p\le\sqrt{2n}} p^{s(p)} \prod_{p\le 2n/3} p\\
&< (2n)^{[\sqrt{2n}]}\, 4^{[2n/3]},
\end{align*}
by (4.7.2) and Lemma 4.7.3.
Next, since
\[ (1+1)^{2n} = \binom{2n}{0} + \cdots + \binom{2n}{n} + \cdots + \binom{2n}{2n} < (2n+1)\binom{2n}{n}, \]
we conclude that
\[ \frac{4^n}{2n+1} \le \binom{2n}{n} \le (2n)^{\sqrt{2n}}\, 4^{2n/3}, \]
which implies that
\[ 4^{n/3} \le (2n+1)^{\sqrt{2n}+1}. \]
Therefore,
\[ \frac{\ln 4}{3}\, n < (\sqrt{2n} + 1)\ln(2n+1). \]
This is false for large n. In particular, it is false for n ≥ 750. In other
words, Bertrand’s postulate is true for n ≥ 750. For n < 750, we verify
directly that Bertrand’s postulate is true.
5
The Prime Number Theorem

5.1 The Prime Number Theorem


In Chapter 4, Corollary 4.3.4, we proved that the Prime Number Theo-
rem is equivalent to the statement
ψ(x) ∼ x. (5.1.1)
In this chapter, we will prove the following theorem.

Theorem 5.1.1
For positive real number x, we have
\[ \psi(x) = x + O\left(x\exp\left(-c\sqrt[10]{\ln x}\right)\right), \]
where c > 0 is some constant independent of x.

We note that (5.1.1) follows immediately from Theorem 5.1.1.
Theorem 5.1.1, which was mentioned in [?, p. 169], is weaker than the
result obtained independently by J. Hadamard and de la Vallée Poussin,
which states that
\[ \psi(x) = x + O\left(x\exp\left(-c\sqrt{\ln x}\right)\right). \]
But the treatment here (adapted from A. Hildebrand’s 1991 “Analytic
Number Theory” notes [?]) allows us to appreciate the analytic method
used in the proofs of the Prime Number Theorem with fewer technicalities.

5.2 The Riemann zeta function

In Chapter 3, Definition 3.4.1, we have encountered the Riemann zeta


function for real s > 1. We now give the definition of the function when
s is a complex number.

Definition 5.2.1
Let s = σ + it ∈ C and σ > 1. Define
\[ \zeta(s) = \sum_{n=1}^{\infty}\frac{1}{n^s}. \]

Theorem 5.2.1
The Riemann zeta function ζ(s) is an analytic function for σ > 1.

Proof
Note that if σ ≥ 1 + δ, then
\[ \left|\sum_{n=m}^{M}\frac{1}{n^s}\right| \le \sum_{n=m}^{M}\frac{1}{n^{\sigma}} \le \sum_{n=m}^{M}\frac{1}{n^{1+\delta}}. \]
Now, for every ε > 0, there exists N > 0 such that
\[ \sum_{n=m}^{M}\frac{1}{n^{1+\delta}} < \varepsilon \]
for M > m > N . Hence, we conclude that
\[ \left|\sum_{n=m}^{M}\frac{1}{n^s}\right| < \varepsilon \]
for M > m > N . Therefore, by the Weierstrass M -test, the series
\[ \sum_{n=1}^{\infty}\frac{1}{n^s} \]
is absolutely and uniformly convergent in any region σ ≥ 1 + δ, with
δ > 0. The Riemann zeta function ζ(s) is therefore an analytic function
in σ > 1.

5.3 Euler’s product and the product representation of ζ(s)

Theorem 5.3.1
For σ > 1,
\[ \zeta(s) = \prod_{p}\left(1 - \frac{1}{p^s}\right)^{-1}. \]

The above follows immediately from the next theorem.

Definition 5.3.1
An infinite product
\[ \prod_{n=1}^{\infty}(1 + a_n) \]
is said to be absolutely convergent if
\[ \sum_{n=1}^{\infty}\ln(1 + a_n) \]
is absolutely convergent.

Theorem 5.3.2
Let f be a multiplicative arithmetical function such that the series
\[ \sum_{n=1}^{\infty} f(n) \]
is absolutely convergent. Then the sum of the series can be expressed
as an absolutely convergent infinite product, namely,
\[ \sum_{n=1}^{\infty} f(n) = \prod_{p}\left(1 + f(p) + f(p^2) + \cdots\right), \tag{5.3.1} \]
extended over all primes.

The product above is called the Euler product of the series.

Proof
Consider the finite product
Y
P (x) = (1 + f (p) + f (p2 ) + · · · )
p≤x

extended over all primes p ≤ x. Since this is a product of a finite number


of absolutely convergent series we can multiply the series and rearrange
the terms without altering the sum. A typical term is of the form
!
Y Y
α α
f (p ) = f p ,
p p

since f is multiplicative. By the fundamental theorem of arithmetic we


can write
X
P (x) = f (n)
n∈A

where A consists of those n having all their prime factors less than or
equal to x. Therefore,

X X
f (n) − P (x) = f (n),
n=1 n∈B

where B is the set of n having at least one prime factor greater than x.
Therefore,

X∞ X X

f (n) − P (x) ≤ |f (n)| ≤ |f (n)|.
n>x
n=1 n∈B

Since

X
|f (n)|
n=1

is convergent,
X
lim |f (n)| = 0.
x→∞
n>x

Hence,

X
lim P (x) = f (n).
x→∞
n=1

We have proved that the infinite product is convergent. We now estab-


lish the absolute convergence of the infinite product. A necessary and
sufficient condition for the absolute convergence of the product
Y
(1 + an )
n

is the convergence of the series (see [?, p. 192])


X
|an |.
n
In this case, we have
X X ∞
X
|f (p) + f (p2 ) + f (p3 ) + · · · | ≤ (|f (p)| + |f (p2 )| + · · · ) ≤ |f (n)|.
p≤x p≤x n=2

Since the partial sums are bounded, the series of positive terms
X
|f (p) + f (p2 ) + f (p3 ) + · · · |
p≤x

converges, and this implies absolute convergence of the product (5.3.1).

Applying Theorem 5.3.2 with
\[ f(n) = \frac{1}{n^s}, \]
we obtain Theorem 5.3.1.
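For s = 2 the Euler product can be compared with the known value ζ(2) = π²/6 quoted in Section 3.6. The sketch below is illustrative (function names are assumptions); it evaluates the finite product P(x) from the proof of Theorem 5.3.2 with f(n) = n^{−s}, where each factor is the geometric series 1/(1 − p^{−s}).

```python
import math

def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes: all primes p <= n (for n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]

def euler_product(s: float, x: int) -> float:
    """Finite Euler product over primes p <= x of 1 / (1 - p^(-s)), for real s > 1."""
    prod = 1.0
    for p in primes_up_to(x):
        prod *= 1.0 / (1.0 - p ** (-s))
    return prod

if __name__ == "__main__":
    for x in (10, 100, 1000, 10**4):
        print(x, euler_product(2.0, x), math.pi**2 / 6)
```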

5.4 Analytic continuation of ζ(s) to σ > 0

Theorem 5.4.1
The Riemann zeta function ζ(s) can be extended to a function that
is analytic in σ > 0, except at s = 1, where it has a simple pole with
residue 1.

Proof
Recall from Theorem 3.2.2 that
X Z x Z x
f (n) = f (1) + f (t)dt + f 0 (t){t}dt − {x}f (x).
n≤x 1 1

With s real,
1
x = N ∈ N and f (n) = ,
ns
we have
XN Z N Z N
1 dη s{η}
s
= 1 + s
− dη.
n=1
n 1 η 1 η s+1

By analytic continuation, the above identity is also valid for complex


numbers s = σ + it with σ > 1.
5.5 Upper bounds for |ζ(s)| and |ζ 0 (s)| near σ = 1 71
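Now, assume σ > 1. Then
\[ \lim_{N \to \infty} \sum_{n=1}^{N} \frac{1}{n^s} = \zeta(s), \]
\[ \lim_{N \to \infty} \int_1^N \frac{d\eta}{\eta^s} = \int_1^{\infty} \frac{d\eta}{\eta^s} = \frac{1}{s-1} \]
and
\[ \lim_{N \to \infty} \int_1^N \frac{\{\eta\}}{\eta^{s+1}}\,d\eta = \int_1^{\infty} \frac{\{\eta\}}{\eta^{s+1}}\,d\eta =: \Phi(s). \]
Therefore,
\[ \zeta(s) = 1 + \frac{1}{s-1} - s\Phi(s), \qquad \sigma > 1. \]
But Φ(s) is analytic for σ > 0, since 0 ≤ {η} < 1 makes the integral defining
Φ(s) converge absolutely and locally uniformly there. Define, for σ > 0, the
extension of ζ(s) by
\[ 1 + \frac{1}{s-1} - s\Phi(s). \]
Note that this function has a simple pole at s = 1 with residue 1, coming
from the term 1/(s − 1).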
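
The formula above can be tried numerically. The sketch below is an illustration only, with hypothetical helper names `phi_truncated` and `zeta_extended`: it truncates the integral defining Φ(s) at N, using the fact that on [n, n+1) the fractional part is η − n, so each piece has a closed form. It assumes s is real, s > 0 and s ≠ 1; the truncation error is of size about N^(−s).

```python
import math


def phi_truncated(s, N):
    """Approximate Phi(s) by the integral of {eta} / eta^(s+1) over [1, N]."""
    total = 0.0
    for n in range(1, N):
        # Integral over [n, n+1] of (eta - n) * eta^(-s-1), in closed form.
        a = ((n + 1) ** (1 - s) - n ** (1 - s)) / (1 - s)
        b = n * ((n + 1) ** (-s) - n ** (-s)) / s
        total += a + b
    return total


def zeta_extended(s, N=10**5):
    """Evaluate 1 + 1/(s-1) - s*Phi(s) with Phi truncated at N."""
    return 1 + 1 / (s - 1) - s * phi_truncated(s, N)


if __name__ == "__main__":
    print(zeta_extended(2.0), math.pi ** 2 / 6)   # inside the region sigma > 1
    print(zeta_extended(0.5))                     # inside the strip 0 < sigma < 1
```

For s = 1/2 the output should be close to the known value ζ(1/2) ≈ −1.46035, although the convergence in N is slow in the strip 0 < σ < 1.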

5.5 Upper bounds for |ζ(s)| and |ζ′(s)| near σ = 1

Theorem 5.5.1
Let A be a positive real number. If
\[ |t| \ge 2 \quad\text{and}\quad \sigma \ge \max\left( \frac{1}{2},\; 1 - \frac{A}{\ln |t|} \right), \tag{5.5.1} \]
then there are positive constants M and M′ (depending on A) such that
\[ |\zeta(s)| \le M \ln |t| \tag{5.5.2} \]
and
\[ |\zeta'(s)| \le M' \ln^2 |t|. \tag{5.5.3} \]

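Proof
Suppose s is real and s > 1. Then by the Euler-Maclaurin summation
formula (see Theorem 3.2.2),
\begin{align*}
\sum_{n=1}^{N} \frac{1}{n^s} &= 1 + \int_1^N \frac{dx}{x^s} - s \int_1^N \frac{\{x\}}{x^{s+1}}\,dx \\
&= 1 + \frac{N^{1-s} - 1}{1-s} - s\Phi(s) + s \int_N^{\infty} \frac{\{x\}}{x^{s+1}}\,dx \\
&= \zeta(s) + \frac{N^{1-s}}{1-s} + s \int_N^{\infty} \frac{\{x\}}{x^{s+1}}\,dx.
\end{align*}

Fig. 5.1. The shaded regions indicate the regions for which (5.5.2) and (5.5.3)
hold.

By analytic continuation, the above identity holds for complex numbers
s with σ > 0 and s ≠ 1, since both sides are analytic there. In particular,
it holds in the region (5.5.1).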
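Now,
\begin{align*}
|\zeta(s)| &\le \sum_{n=1}^{N} \frac{1}{n^{\sigma}} + \frac{N^{1-\sigma}}{|1-s|} + |s| \int_N^{\infty} \frac{dx}{x^{\sigma+1}} \\
&\le \sum_{n=1}^{N} \frac{1}{n^{\sigma}} + \frac{N^{1-\sigma}}{|t|} + \frac{|s|}{\sigma N^{\sigma}}.
\end{align*}

Assume that s is in the region specified by (5.5.1) and let N = [|t|].
Then
\[ N^{1-\sigma} \le \exp\left( \frac{A \ln N}{\ln |t|} \right) \le \exp(A). \]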
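This implies that
\begin{align*}
|\zeta(s)| &\le \sum_{n \le |t|} \frac{1}{n^{\sigma}} + \frac{e^A}{|t|} + \frac{|s|}{\sigma N}\, e^A \\
&\le \sum_{n \le |t|} \frac{1}{n^{\sigma}} + \frac{e^A}{2} + \frac{(\sigma + |t|)\, e^A}{\sigma (|t|/2)} \\
&\le \sum_{n \le |t|} \frac{1}{n^{\sigma}} + e^A \left( \frac{1}{2} + \frac{2}{|t|} + \frac{2}{\sigma} \right). \tag{5.5.4}
\end{align*}

Since σ ≥ 1/2 and |t| ≥ 2, we find that
\[ \frac{1}{2} + \frac{2}{|t|} + \frac{2}{\sigma} < 6. \]
This shows that (5.5.4) may be written as
\[ |\zeta(s)| \le \sum_{n \le |t|} \frac{1}{n^{\sigma}} + 6e^A. \tag{5.5.5} \]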

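For σ ≥ 1,
\[ \sum_{n \le |t|} \frac{1}{n^{\sigma}} \le \sum_{n \le |t|} \frac{1}{n} = \ln |t| + O(1). \tag{5.5.6} \]

For
\[ \max\left( \frac{1}{2},\; 1 - \frac{A}{\ln |t|} \right) < \sigma < 1, \]
and n ≤ N, we find that
\[ \frac{1}{n^{\sigma}} \le \frac{1}{n}\, n^{1-\sigma} \le \frac{1}{n}\, N^{1-\sigma} \le \frac{1}{n} \left( 1 + \left( e^A - 1 \right) \right). \]
Hence,
\[ \sum_{n \le |t|} \frac{1}{n^{\sigma}} \le e^A \sum_{n \le |t|} \frac{1}{n} = e^A \left( \ln |t| + u(t) \right), \tag{5.5.7} \]

where u(t) = O(1).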


Combining (5.5.6) and (5.5.7), we conclude that if s is in the region
specified by (5.5.1), then
\[ |\zeta(s)| \le M \ln |t|, \]
where M is a positive constant depending on A. This proves (5.5.2).
We leave the proof of (5.5.3) as an exercise.

5.6 The non-vanishing of ζ(1 + it)

Theorem 5.6.1
For every real number t ≠ 0,
\[ \zeta(1 + it) \ne 0. \]

We first prove several simple lemmas.

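Lemma 5.6.2 For all θ ∈ R,
\[ 3 + 4\cos\theta + \cos 2\theta \ge 0. \]

Proof
Using cos 2θ = 2 cos²θ − 1, the inequality follows immediately from the
following computations:
\begin{align*}
3 + 4\cos\theta + 2\cos^2\theta - 1 &= 2\cos^2\theta + 4\cos\theta + 2 \\
&= 2(\cos^2\theta + 2\cos\theta + 1) \\
&= 2(\cos\theta + 1)^2 \ge 0.
\end{align*}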

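Lemma 5.6.3 For σ > 1,
\[ \zeta(s) = e^{G(s)}, \]
where
\[ G(s) = \sum_{p} \sum_{m=1}^{\infty} \frac{1}{m p^{ms}}. \]

Proof
Using the Euler product representation of ζ(s), we find that
\begin{align*}
\zeta(s) &= \prod_{p} \left( 1 - \frac{1}{p^s} \right)^{-1} \\
&= \exp\left( - \sum_{p} \ln\left( 1 - \frac{1}{p^s} \right) \right) \\
&= \exp\left( \sum_{p} \sum_{m=1}^{\infty} \frac{1}{m p^{sm}} \right) = \exp(G(s)).
\end{align*}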

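Lemma 5.6.4 For σ > 1, and all t ∈ R,
\[ |\zeta(\sigma)|^3\, |\zeta(\sigma + it)|^4\, |\zeta(\sigma + 2it)| \ge 1. \]

Proof
By Lemma 5.6.3, we have for σ > 1,
\begin{align*}
\zeta(s) &= \exp\left( \sum_{p} \sum_{m=1}^{\infty} \frac{1}{m p^{ms}} \right) \\
&= \exp\left( \sum_{p} \sum_{m=1}^{\infty} \frac{1}{m} \exp\{ -(\ln p)\, m s \} \right) \\
&= \exp\left( \sum_{p} \sum_{m=1}^{\infty} \frac{1}{m} \exp\{ -m\sigma \ln p - itm \ln p \} \right),
\end{align*}
since s = σ + it. Hence,
\[ \zeta(s) = \exp\left( \sum_{p} \sum_{m=1}^{\infty} \frac{1}{m} \frac{1}{p^{\sigma m}} \{ \cos(tm \ln p) - i \sin(tm \ln p) \} \right). \]

Therefore,
\[ |\zeta(s)| = \exp\left( \sum_{p} \sum_{m=1}^{\infty} \frac{1}{m} \frac{1}{p^{\sigma m}} \cos(tm \ln p) \right). \]
This implies that
\begin{align*}
|\zeta(\sigma)|^3\, |\zeta(\sigma + it)|^4\, |\zeta(\sigma + 2it)|
&= \exp\left( \sum_{p} \sum_{m=1}^{\infty} \frac{1}{m p^{\sigma m}} \left( 3 + 4\cos(tm \ln p) + \cos(2tm \ln p) \right) \right) \\
&\ge \exp(0) = 1,
\end{align*}
where the inequality follows from Lemma 5.6.2 with θ = tm ln p.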
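
The two lemmas above lend themselves to a quick numerical sanity check. The following Python sketch is illustrative only: it evaluates the trigonometric polynomial of Lemma 5.6.2 and approximates the product of Lemma 5.6.4 with truncated Dirichlet sums, so the values are approximations. The names `trig_poly`, `zeta_partial` and `product_3_4_1` are hypothetical, and σ = 2 is an arbitrary choice that keeps the truncation error small.

```python
import math


def trig_poly(theta):
    """3 + 4 cos(theta) + cos(2 theta), which equals 2 (1 + cos(theta))^2."""
    return 3 + 4 * math.cos(theta) + math.cos(2 * theta)


def zeta_partial(s, N=5000):
    """Truncated Dirichlet series for zeta(s); a reasonable proxy when Re s = 2."""
    return sum(n ** (-s) for n in range(1, N + 1))


def product_3_4_1(sigma, t, N=5000):
    """Approximate |zeta(sigma)|^3 |zeta(sigma+it)|^4 |zeta(sigma+2it)|."""
    z0 = abs(zeta_partial(complex(sigma, 0.0), N))
    z1 = abs(zeta_partial(complex(sigma, t), N))
    z2 = abs(zeta_partial(complex(sigma, 2 * t), N))
    return z0 ** 3 * z1 ** 4 * z2


if __name__ == "__main__":
    print(min(trig_poly(2 * math.pi * k / 1000) for k in range(1000)))
    for t in (0.5, 1.0, 5.0, 14.134725):
        print(t, product_3_4_1(2.0, t))
```

Up to truncation error, every printed product should be at least 1, and the minimum of the trigonometric polynomial should be 0 up to rounding (attained at θ = π).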

Proof of Theorem 5.6.1.

Suppose ζ(1 + it0) = 0 for some t0 ≠ 0. By Lemma 5.6.4, we deduce
that for σ > 1,
\[ |\zeta(\sigma)(\sigma - 1)|^3 \left| \frac{\zeta(\sigma + it_0)}{\sigma - 1} \right|^4 |\zeta(\sigma + 2it_0)| \ge \frac{1}{\sigma - 1}. \tag{5.6.1} \]

Now, since ζ(σ) has a simple pole with residue 1 at σ = 1, we find that
\[ \lim_{\sigma \to 1^+} \zeta(\sigma)(\sigma - 1) = 1. \tag{5.6.2} \]

Next,
\[ \zeta(\sigma + it_0) = \zeta(1 + it_0) + (\sigma - 1)\zeta'(1 + it_0) + O((\sigma - 1)^2). \]

Since ζ(1 + it0) = 0, this implies that
\[ \lim_{\sigma \to 1^+} \frac{\zeta(\sigma + it_0)}{\sigma - 1} = \zeta'(1 + it_0). \tag{5.6.3} \]

It is clear that
\[ \zeta(\sigma + 2it_0) \to \zeta(1 + 2it_0). \tag{5.6.4} \]

Combining (5.6.2)–(5.6.4), we find that when σ approaches 1 from the
right, the right-hand side of (5.6.1) approaches ∞ and the left-hand side
of (5.6.1) is finite. This leads to a contradiction and we conclude that
\[ \zeta(1 + it) \ne 0 \]
for all nonzero real t.


5.7 A lower bound for |ζ(s)| near σ = 1

Theorem 5.7.1
For |t| ≥ 2, there exist positive constants c and d such that for
\[ \sigma > 1 - \frac{c}{(\ln |t|)^9}, \]
we have
\[ |\zeta(\sigma + it)| \ge \frac{d}{(\ln |t|)^7}. \]

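Proof
For σ ≥ 2,
\begin{align*}
|\zeta(s)| = \left| \sum_{n=1}^{\infty} \frac{1}{n^s} \right| &\ge 1 - \left| \sum_{n=2}^{\infty} \frac{1}{n^s} \right| \\
&\ge 1 - \sum_{n=2}^{\infty} \frac{1}{n^2} = 2 - \frac{\pi^2}{6} > \frac{1}{4}.
\end{align*}

Therefore, for σ ≥ 2,
\[ |\zeta(s)| \ge \frac{d}{(\ln |t|)^7}, \]
provided that
\[ d \le \frac{\ln^7(2)}{4} \quad\text{and}\quad |t| \ge 2. \tag{5.7.1} \]
For δ > 0, let
\[ 1 + \frac{\delta}{(\ln |t|)^9} \le \sigma \le 2, \qquad |t| \ge 2. \]

By Lemma 5.6.4, we find that
\[ |\zeta(\sigma + it)| \ge \frac{1}{|\zeta(\sigma)|^{3/4}\, |\zeta(\sigma + 2it)|^{1/4}}. \]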
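Now, if σ ≤ 2,
\begin{align*}
\zeta(\sigma) = \sum_{n=1}^{\infty} \frac{1}{n^{\sigma}} &\le 1 + \int_1^{\infty} \frac{1}{x^{\sigma}}\,dx = 1 + \frac{1}{\sigma - 1} \\
&\le \frac{2}{\sigma - 1} \\
&\le \frac{2}{\delta} (\ln |t|)^9,
\end{align*}
since
\[ \sigma \ge 1 + \frac{\delta}{(\ln |t|)^9}. \]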
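Now suppose
\[ \delta < \frac{\ln^9 2}{2}. \tag{5.7.2} \]
Then since |t| ≥ 2 we have
\[ \frac{1}{\ln |t|} \le \frac{1}{\ln 2} \]
and therefore,
\[ \frac{\delta}{\ln^9 |t|} \le \frac{1}{2} \frac{\ln^9 2}{\ln^8 |t|} \frac{1}{\ln |t|} \le \frac{1}{2} \frac{\ln 2}{\ln |t|}. \]
In other words, if δ satisfies (5.7.2) and
\[ \sigma > 1 - \frac{\delta}{\ln^9 |t|}, \]
we must have
\[ \sigma > 1 - \frac{1}{2} \frac{\ln 2}{\ln |t|}. \tag{5.7.3} \]
By Theorem 5.5.1 with A = (1/2) ln 2, we can find a constant M > 0 such
that
\[ |\zeta(\sigma + 2it)| \le 2M \ln |t|. \]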

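Hence,
\begin{align*}
|\zeta(\sigma + it)| &\ge \left( \frac{\delta}{2 \ln^9 |t|} \right)^{3/4} \left( \frac{1}{2M \ln |t|} \right)^{1/4} \\
&= \frac{\delta^{3/4}}{2M^{1/4} \ln^7 |t|} \ge \frac{d}{\ln^7 |t|},
\end{align*}
for
\[ d \le \frac{\delta^{3/4}}{2M^{1/4}}. \tag{5.7.4} \]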
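Next, consider
\[ 1 - \frac{\delta}{\ln^9 |t|} \le \sigma \le 1 + \frac{\delta}{\ln^9 |t|}, \qquad |t| \ge 2. \]
If
\[ \sigma_0 = 1 + \frac{\delta}{\ln^9 |t|}, \]
then we want to show that ζ(σ + it) is close to ζ(σ0 + it). We have
\begin{align*}
|\zeta(\sigma + it) - \zeta(\sigma_0 + it)| &= \left| \int_{\sigma}^{\sigma_0} \zeta'(u + it)\,du \right| \\
&\le |\sigma - \sigma_0| \max_{\sigma \le u \le \sigma_0} |\zeta'(u + it)|.
\end{align*}

Now, by Theorem 5.5.1, there exists an M′ > 0 such that
\[ |\zeta'(u + it)| \le M' \ln^2 |t|, \]
for
\[ u \ge \sigma \ge 1 - \frac{1}{2} \frac{\ln 2}{\ln |t|} \quad\text{and}\quad |t| \ge 2. \]
Therefore, since |σ − σ0| ≤ 2δ / ln⁹|t|,
\[ |\zeta(\sigma + it) - \zeta(\sigma_0 + it)| \le \frac{2\delta}{\ln^7 |t|} M'. \]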
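Hence,
\begin{align*}
|\zeta(\sigma + it)| &\ge |\zeta(\sigma_0 + it)| - |\zeta(\sigma + it) - \zeta(\sigma_0 + it)| \\
&\ge \frac{\delta^{3/4}}{2M^{1/4} \ln^7 |t|} - M' \frac{2\delta}{\ln^7 |t|} \\
&= \frac{\delta^{3/4}}{\ln^7 |t|} \left( \frac{1}{2M^{1/4}} - 2\delta^{1/4} M' \right).
\end{align*}
We now choose a positive real number δ = δ1 such that
\[ \frac{1}{2M^{1/4}} - 2\delta_1^{1/4} M' > 0. \]
Now, letting
\[ 0 < c < \min\left( \frac{1}{2} \ln^9(2),\; \delta_1 \right) \]
and
\[ 0 < d < \min\left( \delta_1^{3/4} \left( \frac{1}{2M^{1/4}} - 2\delta_1^{1/4} M' \right),\; \frac{\ln^7(2)}{4},\; \frac{\delta_1^{3/4}}{2M^{1/4}} \right), \]
we conclude that for |t| ≥ 2 and σ > 1 − c / ln⁹|t|,
\[ |\zeta(\sigma + it)| \ge \frac{d}{\ln^7 |t|}. \]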

5.8 Perron’s Formula

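Theorem 5.8.1
Let x be a half integer, that is, x = N + 1/2 for some positive integer N.
Then for any b ∈ (1, 3] and any T ≥ 1,
\[ \psi(x) = \frac{1}{2\pi i} \int_{b - iT}^{b + iT} \left( -\frac{\zeta'}{\zeta}(s) \right) \frac{x^s}{s}\,ds + O\left( \frac{x^b}{T(b - 1)} + x \frac{\ln^2 x}{T} \right). \]

We first begin with several lemmas.

Lemma 5.8.2 For σ > 1,
\[ \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s} = -\frac{\zeta'}{\zeta}(s). \]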

Proof
The proof is immediate using the formula
\[ \Lambda = \mu * \ln \]
and the fact that (see Chapter 6, Theorem 6.3.1)
\[ \sum_{n=1}^{\infty} \frac{f * g(n)}{n^s} = \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \sum_{n=1}^{\infty} \frac{g(n)}{n^s}. \]
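
The identity Λ = μ ∗ ln used in this proof can be checked directly for small n. The sketch below is only an illustration; the helper names `mobius`, `von_mangoldt` and `mu_conv_log` are hypothetical, and both arithmetic functions are computed by naive trial division.

```python
import math


def mobius(n):
    """Moebius function mu(n) by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # p^2 divides the original n
            result = -result
        p += 1
    if n > 1:
        result = -result          # one remaining prime factor
    return result


def von_mangoldt(n):
    """Lambda(n) = ln p if n is a power of the prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)            # n itself is prime


def mu_conv_log(n):
    """(mu * ln)(n) = sum over d | n of mu(d) ln(n / d)."""
    return sum(mobius(d) * math.log(n // d) for d in range(1, n + 1) if n % d == 0)


if __name__ == "__main__":
    for n in (1, 2, 8, 12, 27, 30):
        print(n, von_mangoldt(n), mu_conv_log(n))
```

The two columns should agree up to floating-point rounding.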
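Lemma 5.8.3 For σ > 1,
\[ \frac{\zeta'}{\zeta}(s) \ll \frac{1}{\sigma - 1} + 1. \]

Proof
For σ > 1,
\begin{align*}
\left| \frac{\zeta'}{\zeta}(s) \right| &\le \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^{\sigma}} = \sigma \int_1^{\infty} \sum_{n \le t} \Lambda(n) \frac{dt}{t^{\sigma+1}} \\
&\le \sigma \int_1^{\infty} \frac{ct}{t^{\sigma+1}}\,dt, \quad\text{by Theorem 4.2.1,} \\
&= c\, \frac{\sigma}{\sigma - 1} \ll 1 + \frac{1}{\sigma - 1}.
\end{align*}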

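Lemma 5.8.4 For b > 0, T ≥ 1, and y > 0, y ≠ 1, we have
\[ \frac{1}{2\pi i} \int_{b - iT}^{b + iT} \frac{y^s}{s}\,ds =
\begin{cases}
1 + O\left( \dfrac{y^b}{T |\ln y|} \right) & \text{if } y > 1, \\[2ex]
O\left( \dfrac{y^b}{T |\ln y|} \right) & \text{if } 0 < y < 1.
\end{cases} \]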

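Proof
We will only prove the result when y > 1. By the Residue Theorem,
\[ \frac{1}{2\pi i} \int_{b - iT}^{b + iT} \frac{y^s}{s}\,ds = \frac{1}{2\pi i} \int_{\Gamma_0} \frac{y^s}{s}\,ds = 1 + \frac{1}{2\pi i} \sum_{j=1}^{3} \int_{\Gamma_j} \frac{y^s}{s}\,ds. \]

Fig. 5.2. Contours used in the proof of Lemma 5.8.4: the rectangle with
vertical sides Γ0 (on Re s = b) and Γ2 (on Re s = a) and horizontal sides
Γ1 and Γ3.

Thus, it suffices to show that with −a large enough,
\[ \left| \int_{\Gamma_j} \frac{y^s}{s}\,ds \right| \ll \frac{y^b}{T |\ln y|}, \]
with j = 1, 2, 3.
On Γ2,
\[ \left| \frac{y^s}{s} \right| = \frac{y^a}{|s|} \le y^a, \]
if a ≤ −1. This implies that
\[ \left| \int_{\Gamma_2} \frac{y^s}{s}\,ds \right| \le y^a\, 2T. \]

Letting a approach −∞, we conclude that the above integral tends to 0.

On Γ1 and Γ3,
\[ \left| \frac{y^s}{s} \right| = \frac{y^{\sigma}}{|s|} \le \frac{y^{\sigma}}{T}, \]
since
\[ |s| \ge T. \]

Hence, for j = 1 or 3,
\[ \left| \int_{\Gamma_j} \frac{y^s}{s}\,ds \right| \le \int_a^b \frac{y^{\sigma}}{T}\,d\sigma \le \frac{1}{T} \int_{-\infty}^{b} e^{\sigma \ln y}\,d\sigma \ll \frac{y^b}{T |\ln y|}. \]

The case 0 < y < 1 is left as an exercise for the reader.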
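
To see the dichotomy of Lemma 5.8.4 numerically, one can approximate the truncated integral by the trapezoidal rule. The sketch below is an illustration under stated assumptions: the function name `perron_integral`, the step size h and the default values b = 1, T = 200 are arbitrary choices for this example, not part of the notes.

```python
import cmath
import math


def perron_integral(y, b=1.0, T=200.0, h=0.01):
    """Trapezoidal approximation of (1/(2 pi i)) * integral of y^s / s ds
    along the segment from b - iT to b + iT."""
    steps = int(round(2 * T / h))
    log_y = math.log(y)
    total = 0j
    for k in range(steps + 1):
        t = -T + k * h
        s = complex(b, t)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * cmath.exp(s * log_y) / s
    # On the segment ds = i dt, so (1/(2 pi i)) * i = 1/(2 pi).
    return (total * h / (2 * math.pi)).real


if __name__ == "__main__":
    print(perron_integral(2.0))   # y > 1: expected to be close to 1
    print(perron_integral(0.5))   # 0 < y < 1: expected to be close to 0
```

The deviations from 1 and 0 should be of the size y^b / (T |ln y|) predicted by the lemma, so they shrink as T grows and blow up as y approaches 1.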

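Proof of Theorem 5.8.1.

Let
\[ I = \frac{1}{2\pi i} \int_{b - iT}^{b + iT} \left( -\frac{\zeta'}{\zeta}(s) \right) \frac{x^s}{s}\,ds. \tag{5.8.1} \]
By Lemmas 5.8.2 and 5.8.4, we find that
\begin{align*}
I &= \frac{1}{2\pi i} \int_{b - iT}^{b + iT} \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s} \frac{x^s}{s}\,ds \\
&= \sum_{n=1}^{\infty} \Lambda(n) \frac{1}{2\pi i} \int_{b - iT}^{b + iT} \frac{(x/n)^s}{s}\,ds \\
&= \sum_{n \le x} \Lambda(n) + \sum_{n=1}^{\infty} \Lambda(n)\, O\left( \frac{(x/n)^b}{T |\ln(x/n)|} \right) \\
&= \psi(x) + O\left( \frac{x^b}{T} \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^b |\ln(x/n)|} \right).
\end{align*}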

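Let
\[ R = \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^b |\ln(x/n)|}. \]

Then
\[ R = \sum_{x/2 \le n \le 2x} \frac{\Lambda(n)}{n^b |\ln(x/n)|} + \sum_{n \notin [x/2,\, 2x]} \frac{\Lambda(n)}{n^b |\ln(x/n)|} = R_1 + R_2. \]

Note that if n > 2x or n < x/2 then |ln(x/n)| ≥ ln 2. Furthermore,
since 1 < b ≤ 3, by Lemmas 5.8.2 and 5.8.3,
\[ R_2 \le \frac{1}{\ln 2} \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^b} \ll 1 + \frac{1}{b - 1} \ll \frac{1}{b - 1}, \]
where the last step uses b − 1 ≤ 2.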

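Now, if
\[ -\frac{1}{2} \le t \le 1, \]
then
\[ |\ln(1 + t)| \ge \frac{|t|}{2} \]
and we deduce that for x/2 ≤ n ≤ 2x,
\[ \left| \ln \frac{x}{n} \right| = \left| \ln\left( 1 + \frac{x - n}{n} \right) \right| \gg \frac{|x - n|}{n}. \tag{5.8.2} \]
Furthermore, for n ≤ 2x,
\[ \Lambda(n) \le \ln n \le \ln 2x \ll \ln x \tag{5.8.3} \]
and
\[ \frac{1}{n^b} \le \frac{2^b}{x^b} \tag{5.8.4} \]
for x/2 < n. Using (5.8.2)–(5.8.4), with the observations that 2^b ≤ 2³
and n ≤ 2x, we find that
\[ R_1 = \sum_{x/2 \le n \le 2x} \frac{\Lambda(n)}{n^b |\ln(x/n)|} \ll \frac{\ln x}{x^b} \sum_{x/2 \le n \le 2x} \frac{x}{|x - n|}. \tag{5.8.5} \]

Since x is a half integer, the denominator in the sum
\[ \sum_{x/2 \le n \le 2x} \frac{x}{|x - n|} \]
is nonzero and we find that
\[ \sum_{x/2 \le n \le 2x} \frac{x}{|x - n|} \ll x \ln x. \tag{5.8.6} \]

Substituting (5.8.6) into (5.8.5), we conclude that
\[ R_1 \ll \frac{\ln^2 x}{x^b}\, x. \]
Hence, the error term for I, given by (5.8.1), is
\[ O\left( \frac{x^b}{T(b - 1)} + x \frac{\ln^2 x}{T} \right). \]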

5.9 Completion of the proof of the Prime Number Theorem

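Step 1.
Application of Perron’s Formula:
Let
\[ T \ge 1, \quad x = N + \frac{1}{2} \ge 2 \quad\text{and}\quad b = 1 + \frac{1}{\ln x}. \]
Then
\[ \psi(x) = \frac{1}{2\pi i} \int_{b - iT}^{b + iT} \left( -\frac{\zeta'}{\zeta}(s) \right) \frac{x^s}{s}\,ds + O\left( \frac{x \ln^2 x}{T} \right). \]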

Step 2.
Shifting of path of integration:
Choose a sufficiently close to 1 so that
\[ \zeta(s) \ne 0 \]
for all σ ≥ a, |t| ≤ T. We note that the integrand
\[ -\frac{\zeta'}{\zeta}(s) \frac{x^s}{s} \]
is analytic in the region enclosed by the old and new paths with the
exception of a pole at s = 1, with residue x. By the Residue Theorem,
\[ \frac{1}{2\pi i} \int_{b - iT}^{b + iT} \left( -\frac{\zeta'}{\zeta}(s) \right) \frac{x^s}{s}\,ds = x + \sum_{j=1}^{3} \frac{1}{2\pi i} \int_{\Gamma_j} \left( -\frac{\zeta'}{\zeta}(s) \right) \frac{x^s}{s}\,ds. \]

Fig. 5.3. Contours Γ1, Γ2, Γ3 used in the proof of the Prime Number Theorem

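Step 3.
Estimation of
\[ \int_{\Gamma_j} \left( -\frac{\zeta'}{\zeta}(s) \right) \frac{x^s}{s}\,ds: \]
Let
\[ B = \max_{s \in \Gamma_1, \Gamma_2, \Gamma_3} \left| \frac{\zeta'}{\zeta}(s) \right|. \]

The number B depends on T and will be estimated in Step 4.

Now, for T ≥ 2,
\begin{align*}
\left| \int_{\Gamma_2} \left( -\frac{\zeta'}{\zeta}(s) \right) \frac{x^s}{s}\,ds \right| &\le x^a B \int_{a - iT}^{a + iT} \frac{|ds|}{|s|} \\
&= 2x^a B \int_0^T \frac{dt}{|a + it|} \\
&\ll B x^a \ln T. \tag{5.9.1}
\end{align*}
The last inequality follows from the fact that for T ≥ 2,
\[ \int_0^T \frac{dt}{|a + it|} \le \int_1^T \frac{dt}{t} + \int_0^1 \frac{dt}{a} \le \ln T + 2 \ll \ln T. \]
We will now estimate the integral on Γ3. The estimate of the integral
on Γ1 is similar. Since
\[ b = 1 + \frac{1}{\ln x}, \]
we find that
\begin{align*}
\left| \int_{\Gamma_3} \left( -\frac{\zeta'}{\zeta}(s) \right) \frac{x^s}{s}\,ds \right| &\le \frac{B}{T} \int_a^b x^{\sigma}\,d\sigma \\
&\ll \frac{B x^b}{T} \ll \frac{Bx}{T}. \tag{5.9.2}
\end{align*}
We therefore conclude from (5.9.1) and (5.9.2) that
\[ \psi(x) = x + O\left( \frac{Bx}{T} \right) + O(B x^a \ln T) + O\left( \frac{x \ln^2 x}{T} \right). \]
We note that the above holds for T ≥ 2 and a suitable choice of a.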
Step 4.
Choice of a and estimation of B:
For |t| ≤ 2, we note that ζ(s) ≠ 0 for s = 1 + it. Therefore, there exists
a δ > 0 such that for |t| ≤ 2 and σ > 1 − δ,
\[ \frac{1}{\zeta(s)} \]
is analytic and bounded there. This implies that
\[ \frac{\zeta'}{\zeta}(s) \ll 1 \ll \ln^9 T. \tag{5.9.3} \]

Suppose 2 ≤ |t| ≤ T. Then by Theorems 5.5.1 and 5.7.1, there exist c
and d such that
\[ |\zeta(s)| \ge \frac{d}{\ln^7 |t|} \quad\text{and}\quad |\zeta'(s)| \ll \ln^2 |t| \]
in the region
\[ \sigma \ge 1 - \frac{c}{\ln^9 |t|}. \]
Note that we must choose c so that
\[ c < \delta \ln^9 2. \tag{5.9.4} \]
The additional condition imposed on c is necessary for the validity of
(5.9.3). Next, with 2 ≤ |t| ≤ T, and
\[ a = 1 - \frac{c}{\ln^9 T}, \]
we conclude that
\[ \frac{\zeta'}{\zeta}(s) \ll \ln^9 T. \]

Together with (5.9.3), we find that
\[ B = \max_{s \in \Gamma_1, \Gamma_2, \Gamma_3} \left| \frac{\zeta'}{\zeta}(s) \right| \ll \ln^9 T. \]

Therefore,
\[ \psi(x) = x + O\left( x \frac{\ln^9 T}{T} \right) + O\left( \frac{x \ln^2 x}{T} \right) + O\left( x \ln^{10} T \exp\left( -c \frac{\ln x}{\ln^9 T} \right) \right). \]
Now the first two error terms can be bounded by
\[ O\left( x \frac{\ln^{10} x}{T} \right). \]
Hence
\[ \psi(x) = x + O\left( x \frac{\ln^{10} x}{T} \right) + O\left( x \ln^{10} T \exp\left( -c \frac{\ln x}{\ln^9 T} \right) \right). \]

Step 5.
Choice of T:
Assume 2 ≤ T ≤ x. The expression in the error term is minimal if
\[ \frac{1}{T} = \exp\left( -\frac{c \ln x}{\ln^9 T} \right). \]
Therefore,
\[ T = \exp\{ c^{1/10} \ln^{1/10} x \}. \]
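
For the reader who wants the intermediate algebra behind this choice, the following short derivation (added here as a worked step, not taken from the original notes) solves the balancing condition for T by taking logarithms.

```latex
\begin{align*}
\frac{1}{T} = \exp\left( -\frac{c \ln x}{\ln^9 T} \right)
&\iff \ln T = \frac{c \ln x}{\ln^9 T} \\
&\iff \ln^{10} T = c \ln x \\
&\iff T = \exp\left( c^{1/10} \ln^{1/10} x \right).
\end{align*}
```

With this T the exponential factor in the second error term equals 1/T, and since ln T ≤ ln x for sufficiently large x, the second error term is then no larger than the first.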
With the choice of T, we have for sufficiently large x ≥ x0, 2 ≤ T ≤ x,
\[ \psi(x) = x + O\left( x \frac{\ln^{10} x}{\exp(c^{1/10} \ln^{1/10} x)} \right). \]
Since for any ε > 0,
\[ \ln^{10} x \ll \exp\left( \varepsilon \ln^{1/10} x \right), \]
we conclude that
\[ \psi(x) = x + O(x \exp(-c_0 \ln^{1/10} x)) \]
with a suitable choice of c0 > 0. For 2 ≤ x ≤ x0, we have
\[ \psi(x) = x + O(x \exp(-c_0 \ln^{1/10} x)). \]
This completes the proof of the Prime Number Theorem.
The equivalent statements of the above for θ(x) and π(x) are
\[ \theta(x) = x + O(x \exp(-c \ln^{1/10} x)), \qquad (x \ge 2), \]
and
\[ \pi(x) = \mathrm{Li}(x) + O(x \exp(-c \ln^{1/10} x)), \qquad (x \ge 2), \]
where
\[ \mathrm{Li}(x) = \int_2^x \frac{dt}{\ln t}. \]
6
Dirichlet Series

6.1 Absolute convergence of a Dirichlet series


A Dirichlet series is a series of the form
\[ \sum_{n=1}^{\infty} \frac{f(n)}{n^s}, \qquad s = \sigma + it, \]
where f(n) is an arithmetical function.

Note that if σ ≥ a then |n^s| ≥ n^a. Therefore,
\[ \left| \frac{f(n)}{n^s} \right| \le \frac{|f(n)|}{n^a}. \]

Therefore, if a Dirichlet series converges absolutely for s = a + ib, then
by the comparison test, it also converges absolutely for all s with σ ≥ a.
This observation implies the following theorem.

Theorem 6.1.1
Suppose the series
\[ \sum_{n=1}^{\infty} \left| \frac{f(n)}{n^s} \right| \]
does not converge for all s or diverge for all s. Then there exists a real
number σa called the abscissa of absolute convergence, such that the
series
\[ \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \]
converges absolutely if σ > σa but does not converge absolutely if σ < σa.
Proof. Let D be the set of all reals σ such that
\[ \sum_{n=1}^{\infty} \left| \frac{f(n)}{n^s} \right| \]
diverges. Then D is not empty because the series does not converge for
all s. The set D is bounded above since the series does not diverge for
all s. Therefore, D has a least upper bound which we call σa. If σ < σa
then we claim that
\[ \sum_{n=1}^{\infty} \frac{|f(n)|}{n^{\sigma}} \]
diverges. For otherwise, the convergence of
\[ \sum_{n=1}^{\infty} \frac{|f(n)|}{n^{\sigma}} \]
implies that
\[ \sum_{n=1}^{\infty} \left| \frac{f(n)}{n^s} \right| \]
converges for all Re s > σ. Hence, σ is an upper bound for D and since
σ < σa, σa is not the least upper bound for D, a contradiction. If σ > σa,
then σ ∉ D since σa is an upper bound for D, and so the Dirichlet series
converges absolutely. This proves the theorem.

6.2 The Uniqueness Theorem

Theorem 6.2.1
Let
\[ F(s) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \quad\text{and}\quad G(s) = \sum_{n=1}^{\infty} \frac{g(n)}{n^s} \]
be absolutely convergent for σ ≥ σ0. If F(s) = G(s) for each s in an
infinite sequence {sk} such that σk → ∞ as k → ∞, then f(n) = g(n)
for every n.
Proof. Let h(n) = f(n) − g(n) and let H(s) = F(s) − G(s). Then
H(sk) = 0 for each k. To prove that h(n) = 0 for all n we assume that
h(n) ≠ 0 for some n and obtain a contradiction.
Let N be the smallest integer n for which
\[ h(n) \ne 0. \tag{6.2.1} \]
Then
\[ H(s) = \sum_{n=N}^{\infty} \frac{h(n)}{n^s} = \frac{h(N)}{N^s} + \sum_{n=N+1}^{\infty} \frac{h(n)}{n^s}. \]

Hence,
\[ h(N) = N^s H(s) - N^s \sum_{n=N+1}^{\infty} \frac{h(n)}{n^s}. \]

Putting s = sk, we have H(sk) = 0, and hence
\[ h(N) = -N^{s_k} \sum_{n=N+1}^{\infty} \frac{h(n)}{n^{s_k}}. \]

Choose k so that σk > c where c > σa. Now, note that for n ≥ N + 1,
\[ \frac{N^{\sigma_k}}{n^{\sigma_k}} = \frac{N^{\sigma_k - c}}{n^{\sigma_k - c}} \frac{N^c}{n^c} \le \left( \frac{N}{N+1} \right)^{\sigma_k - c} \frac{N^c}{n^c}. \]
Then
\[ |h(N)| \le \left( \frac{N}{N+1} \right)^{\sigma_k - c} N^c \sum_{n=N+1}^{\infty} \frac{|h(n)|}{n^c} = \left( \frac{N}{N+1} \right)^{\sigma_k - c} A, \]
where A is independent of k. Letting k → ∞, we find that
\[ \left( \frac{N}{N+1} \right)^{\sigma_k} \to 0. \]
Hence, h(N) = 0, a contradiction to (6.2.1). Consequently, h(n) = 0 for
all positive integers n.
The above result is very useful. For example, let f(n) be a completely
multiplicative function. Suppose
\[ F(s) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \]
and
\[ G(s) = \sum_{n=1}^{\infty} \frac{f^{-1}(n)}{n^s} \]
are absolutely convergent for σ ≥ σ0. Then we know that
\[ G(s) = 1/F(s) = \prod_{p} \left( 1 - \frac{f(p)}{p^s} \right) = \sum_{n=1}^{\infty} \frac{\mu(n) f(n)}{n^s}. \]
By Theorem 6.2.1, this shows that
\[ f^{-1}(n) = \mu(n) f(n). \]

6.3 Multiplication of Dirichlet series


The next theorem relates products of Dirichlet series with the Dirichlet
convolution of their coefficients.

Theorem 6.3.1
Given two functions F(s) and G(s) represented by Dirichlet series
\[ F(s) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \quad\text{for } \sigma > a, \]
and
\[ G(s) = \sum_{n=1}^{\infty} \frac{g(n)}{n^s} \quad\text{for } \sigma > b. \]
Then in the half plane where both series converge absolutely, we have
\[ F(s) G(s) = \sum_{n=1}^{\infty} \frac{f * g(n)}{n^s}. \]
If
\[ F(s) G(s) = \sum_{n=1}^{\infty} \frac{\alpha(n)}{n^s} \]
for all s in a sequence {sk} such that σk → ∞ as k → ∞ then α = f ∗ g.

Proof
For any s for which both series converge absolutely, we have
\[ F(s) G(s) = \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} \frac{f(n) g(m)}{(nm)^s}. \]

Because of absolute convergence, we can multiply these series together
and arrange the terms in any way we please without altering the sum.
Collect together those terms for which mn is constant, say mn = k. The
possible values of k are 1, 2, · · · , hence,
\[ F(s) G(s) = \sum_{k=1}^{\infty} \frac{1}{k^s} \left( \sum_{mn = k} f(n) g(m) \right) = \sum_{k=1}^{\infty} \frac{h(k)}{k^s}, \]
where
\[ h(k) = \sum_{mn = k} f(n) g(m) = f * g(k). \]

This proves the first assertion. The second assertion follows from
Theorem 6.2.1.

6.4 Conditional convergence of Dirichlet series

Theorem 6.4.1
For every Dirichlet series, there exists σc ∈ [−∞, ∞] such that the series
converges (conditionally) for any s with σ > σc and diverges for any s
with σ < σc . Moreover,
σc ≤ σa ≤ σc + 1.

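Proof
We will show that if
\[ \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \]
converges for s = s1, then it also converges for every s with σ > σ1.
Since
\[ \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \]
converges at s = s1, we conclude that there exists a positive integer N0
such that
\[ \left| \sum_{y < n \le x} \frac{f(n)}{n^{s_1}} \right| \le 1 \]
for all x > y > N0. Now, let s with σ > σ1 be given and let x > y ≥ N0.
Let ε > 0 be given. Then, by partial summation,
\begin{align*}
\sum_{y < n \le x} \frac{f(n)}{n^s} &= \sum_{y < n \le x} \frac{f(n)}{n^{s_1}}\, n^{s_1 - s} \\
&= \sum_{y < n \le x} \frac{f(n)}{n^{s_1}}\, x^{s_1 - s} - \frac{f(y)}{y^{s_1}}\, y^{s_1 - s} - \int_y^x \sum_{y < n \le t} \frac{f(n)}{n^{s_1}}\, t^{s_1 - s - 1} (s_1 - s)\,dt.
\end{align*}

Therefore,
\begin{align*}
\left| \sum_{y < n \le x} \frac{f(n)}{n^s} \right| &\le 2y^{\sigma_1 - \sigma} + \int_y^x |s_1 - s|\, t^{\sigma_1 - \sigma - 1}\,dt \\
&\le 2y^{\sigma_1 - \sigma} \left( 1 + \frac{|s_1 - s|}{\sigma - \sigma_1} \right) \tag{6.4.1} \\
&< \varepsilon
\end{align*}
provided that
\[ y \ge \left\{ \frac{2}{\varepsilon} \left( 1 + \frac{|s_1 - s|}{\sigma - \sigma_1} \right) \right\}^{1/(\sigma - \sigma_1)}. \]

We have therefore shown that for any ε > 0 and a fixed s with σ > σ1,
\[ \left| \sum_{y < n \le x} \frac{f(n)}{n^s} \right| < \varepsilon \]
whenever
\[ x \ge y \ge \max\left( N_0,\; \left\{ \frac{2}{\varepsilon} \left( 1 + \frac{|s_1 - s|}{\sigma - \sigma_1} \right) \right\}^{1/(\sigma - \sigma_1)} \right). \]

This shows the convergence of the Dirichlet series at s.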


Now, let
\[ \sigma_c := \sup\left\{ \operatorname{Re} s \;\Big|\; \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \text{ diverges} \right\}. \tag{6.4.2} \]

If σ > σa then, since absolute convergence implies convergence, F(s) is
convergent whenever σ = σa + δ, δ > 0. Therefore, we conclude that σa ≥ σc.
It remains to show that σa ≤ σc + 1. We first show that if
\[ \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \]
is convergent at s = s1 then it is absolutely convergent at any s with
σ > σ1 + 1. The convergence of the series
\[ \sum_{n=1}^{\infty} \frac{f(n)}{n^{s_1}} \]
implies that f(n)n^{−s1} → 0 as n → ∞, so that
\[ \left| \frac{f(n)}{n^{s_1}} \right| \le C \]
for all n ∈ N and some positive constant C. Given s with σ > σ1 + 1,
\[ \left| \frac{f(n)}{n^s} \right| = \left| \frac{f(n)}{n^{s_1}} \right| \frac{1}{n^{\sigma - \sigma_1}} \le \frac{C}{n^{\sigma - \sigma_1}}, \]
with σ − σ1 > 1. Therefore, for any positive integer m,
\[ \sum_{n=1}^{m} \left| \frac{f(n)}{n^s} \right| \le \sum_{n=1}^{m} \frac{C}{n^{\sigma - \sigma_1}} < \infty. \]

Since σ − σ1 > 1, the series
\[ \sum_{n=1}^{\infty} \frac{1}{n^{\sigma - \sigma_1}} \]
converges. By the comparison test, we conclude that
\[ \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \]
is absolutely convergent.
Now, we have shown that if
\[ \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \]
is convergent at s1 = σ1 + it, then it is absolutely convergent
whenever σ > σ1 + 1.
Therefore,
\[ \sigma_1 + 1 \ge \sigma_a. \]

Now, we may take
\[ \sigma_1 = \sigma_c + \delta \]
for any positive δ and hence,
\[ \sigma_c + 1 \ge \sigma_a. \]

6.5 Landau’s Theorem for Dirichlet series

Theorem 6.5.1
A Dirichlet series
\[ F(s) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \]
is analytic in σ > σc, where σc is given by (6.4.2).

We now come to the main theorem of this chapter.

Theorem 6.5.2
Let
\[ F(s) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \]
be a Dirichlet series with f(n) ≥ 0 for all n ∈ N and σc < ∞. Then the
function F(s) has a singularity at s = σc.

Proof
Suppose F(s) is analytic at σc. Then there exists δ > 0 such that F(s)
is analytic in D1 := {s : |s − σc| < δ}. Fix a point on the real axis, say
σ0 > σc contained in this disc and an ε > 0 such that the whole disc
D2 := {s : |s − σ0| < ε} is inside D1 and σc ∈ D2 (see Fig. 6.1).

Fig. 6.1. Diagram used to illustrate the choice of discs D1 (centre σc,
radius δ) and D2 (centre σ0)

Since F(s) is analytic in D2, we have the expansion
\[ F(s) = \sum_{n=0}^{\infty} \frac{F^{(n)}(\sigma_0)}{n!} (s - \sigma_0)^n. \]

Now σ0 > σc and therefore, near σ0, F(s) is given by
\[ F(s) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s}. \]
Since F(s) is analytic at σ0, we can differentiate the above term by term
to deduce that
\[ F^{(\nu)}(\sigma_0) = (-1)^{\nu} \sum_{n=1}^{\infty} \frac{f(n) \ln^{\nu} n}{n^{\sigma_0}}. \]

Substituting this into the Taylor series expansion, we find that
\[ F(s) = \sum_{\nu=0}^{\infty} \frac{(\sigma_0 - s)^{\nu}}{\nu!} \sum_{n=1}^{\infty} \left( \frac{f(n) \ln^{\nu} n}{n^{\sigma_0}} \right). \]

Now taking s real, say σ0 − ε < s = σ < σ0, we have
\begin{align*}
F(\sigma) &= \sum_{\nu=0}^{\infty} \frac{(\sigma_0 - \sigma)^{\nu}}{\nu!} \sum_{n=1}^{\infty} \left( \frac{f(n) \ln^{\nu} n}{n^{\sigma_0}} \right) \\
&= \sum_{n=1}^{\infty} \frac{f(n)}{n^{\sigma_0}} \sum_{\nu=0}^{\infty} \frac{(\sigma_0 - \sigma)^{\nu} \ln^{\nu} n}{\nu!},
\end{align*}
where the interchanging of the summations is valid since all the terms
involved are non-negative. Hence,
\begin{align*}
F(\sigma) &= \sum_{n=1}^{\infty} \frac{f(n)}{n^{\sigma_0}} \exp((\sigma_0 - \sigma) \ln n) \\
&= \sum_{n=1}^{\infty} \frac{f(n)}{n^{\sigma_0}}\, n^{\sigma_0 - \sigma} = \sum_{n=1}^{\infty} \frac{f(n)}{n^{\sigma}}.
\end{align*}

Since σc ∈ D2, we may choose σ with σ0 − ε < σ < σc. The last equality then
shows that the Dirichlet series is convergent for some σ < σc and this
contradicts our assumption that σc is the abscissa of conditional
convergence.
7
Primes in Arithmetic Progression

7.1 Introduction
In Chapter 4, we proved that there are infinitely many primes by showing
that (see Theorem 4.4.1 (c))
\[ \sum_{p \le x} \frac{1}{p} = \ln \ln x + O(1). \tag{7.1.1} \]

The Dirichlet Theorem of primes in arithmetic progression states that for
(k, l) = 1, there are infinitely many primes of the form kn + l. If we can
prove a result similar to (7.1.1), with the sum over primes p replaced by the
sum over primes of the form kn + l, then we would have Dirichlet’s Theorem
as a consequence. This strategy motivates the following theorem.

Theorem 7.1.1
Let k > 1 and l be positive integers such that (k, l) = 1. Then
\[ \sum_{\substack{p \le x \\ p \equiv l \ (\mathrm{mod}\ k)}} \frac{1}{p} = \frac{\ln \ln x}{\varphi(k)} + O(1). \]

Theorem 7.1.1 immediately implies the Dirichlet Theorem of primes
in arithmetic progression.

Theorem 7.1.2 (Dirichlet’s Theorem of primes in arithmetic
progression)

If k and l are positive integers such that (k, l) = 1, then there are
infinitely many primes of the form kn + l.

7.2 Dirichlet’s characters

Definition 7.2.1
A Dirichlet character (mod k) is an arithmetical function
χ : N → C
satisfying
(i) χ(mn) = χ(m)χ(n) for all m, n ∈ N,
(ii) |χ(n)| = 1 if (n, k) = 1, and |χ(n)| = 0 otherwise,
(iii) χ(n + km) = χ(n) for all n, m ∈ N,
(iv) χ^{ϕ(k)}(n) = 1 for (n, k) = 1.

Remark 7.2.1

(a) The values of χ are 0 or ϕ(k)-th roots of unity. This follows from (iv).
(b) There are only finitely many characters (mod k). This follows from the
fact that χ is determined by its values at the ϕ(k) integers j with 1 ≤ j ≤ k
and (j, k) = 1. Hence, from (iv), we see that for each j, there are ϕ(k)
values we can assign to χ(j). This shows that there can be at most
ϕ(k)^{ϕ(k)} characters.
(c) If χ1 and χ2 are characters modulo k, then so is χ1 χ2.
(d) A character χ modulo k can be obtained from a homomorphism
\[ \tilde{\chi} : (\mathbf{Z}/k\mathbf{Z})^* \to \{ z \in \mathbf{C} : |z| = 1 \}, \]
where (Z/kZ)∗ is the multiplicative group of residue classes
\[ (\{ [n]_k : (n, k) = 1 \}, \cdot), \]
with multiplication · as group operation. Given a homomorphism $\tilde{\chi}$,
one defines
\[ \chi(n) = \begin{cases} \tilde{\chi}([n]_k), & (n, k) = 1, \\ 0, & \text{otherwise.} \end{cases} \]
Conversely, given χ, one obtains a homomorphism $\tilde{\chi}$ given by
\[ \tilde{\chi}([n]_k) = \chi(n). \]
This shows that there is a one to one correspondence between Dirichlet
characters modulo k and homomorphisms from
\[ (\mathbf{Z}/k\mathbf{Z})^* \ \text{to}\ \{ z \in \mathbf{C} : |z| = 1 \}. \]
Theorem 7.2.1
There are exactly ϕ(k) Dirichlet characters modulo k.

Proof. From Remark 7.2.1 (d) above, it suffices to show that there
are exactly ϕ(k) homomorphisms from
\[ (\mathbf{Z}/k\mathbf{Z})^* \ \text{to}\ \{ z \in \mathbf{C} : |z| = 1 \}. \]

From the structure theorem of abelian groups [?, Theorem 8.2], we
know that (Z/kZ)∗ can be written as a direct product of cyclic groups
of prime power order, say,
\[ (\mathbf{Z}/k\mathbf{Z})^* = C_{h_1} \times \cdots \times C_{h_r}, \]
where the h_i are prime powers and C_m denotes a cyclic group of order m.

Let [a_i]_k be a generator for C_{h_i}, 1 ≤ i ≤ r. Given w_1, · · · , w_r such
that
\[ w_i^{h_i} = 1, \]
set
\[ \tilde{\chi}([a_i]_k) = w_i, \qquad 1 \le i \le r. \]

If
\[ [n]_k = \prod_{i} [a_i]_k^{\alpha_i}, \]
then define
\[ \tilde{\chi}([n]_k) = \prod_{i} \tilde{\chi}([a_i]_k)^{\alpha_i}. \]

Note that $\tilde{\chi}$ is a homomorphism from
\[ (\mathbf{Z}/k\mathbf{Z})^* \ \text{to}\ \{ z \in \mathbf{C} : |z| = 1 \}. \]

Therefore, we have at least
\[ h_1 \cdots h_r = \varphi(k) \]
such homomorphisms.

Next, let [a]_k ∈ (Z/kZ)∗. Then
\[ [a]_k = [a_1]_k^{\alpha_1} \cdots [a_r]_k^{\alpha_r}, \]
where
\[ 0 \le \alpha_i \le h_i - 1. \]
Now if $\tilde{\chi}$ is a homomorphism from
\[ (\mathbf{Z}/k\mathbf{Z})^* \ \text{to}\ \{ z \in \mathbf{C} : |z| = 1 \}, \]
then
\[ \tilde{\chi}([a]_k) = \prod_{i} \tilde{\chi}([a_i]_k)^{\alpha_i}. \]

The value $\tilde{\chi}([a]_k)$ is determined by the values $\tilde{\chi}([a_i]_k)$, 1 ≤ i ≤ r. The
number of possible values for $\tilde{\chi}([a_i]_k)$ is h_i, 1 ≤ i ≤ r. Therefore, there
can be at most h_1 h_2 · · · h_r = ϕ(k) characters. In conclusion, we deduce
that there are exactly ϕ(k) characters (mod k).
The character χ0 will always denote the principal character (mod k),
that is,
\[ \chi_0(n) = \begin{cases} 1 & \text{if } (n, k) = 1, \\ 0 & \text{otherwise.} \end{cases} \]

The character $\overline{\chi}$ will denote the inverse of χ, that is, $\chi \cdot \overline{\chi} = \chi_0$.

7.3 The orthogonal relations

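In this section, we will often identify (see Remark 7.2.1 (d)) Dirichlet
characters χ with homomorphisms $\tilde{\chi}$ from
\[ (\mathbf{Z}/k\mathbf{Z})^* \ \text{to}\ \{ z \in \mathbf{C} : |z| = 1 \}. \]

Theorem 7.3.1

(a) Let χ1, χ2 be two Dirichlet characters modulo k. Then
\[ \sum_{a=1}^{k} \chi_1(a) \overline{\chi_2}(a) = \begin{cases} \varphi(k) & \text{if } \chi_1 = \chi_2, \\ 0 & \text{otherwise.} \end{cases} \]

(b) Let a1, a2 be integers with (a_i, k) = 1. Then
\[ \sum_{\chi \ (\mathrm{mod}\ k)} \chi(a_1) \overline{\chi}(a_2) = \begin{cases} \varphi(k) & \text{if } a_1 \equiv a_2 \ (\mathrm{mod}\ k), \\ 0 & \text{otherwise.} \end{cases} \]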
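Proof of (a).

We will prove the following:
\[ \sum_{a=1}^{k} \chi(a) = \begin{cases} \varphi(k) & \text{if } \chi = \chi_0, \\ 0 & \text{otherwise.} \end{cases} \tag{7.3.1} \]

We first observe that since χ(l) = 0 whenever (l, k) ≠ 1, we must have
\[ \sum_{a=1}^{k} \chi(a) = \sum_{\substack{a=1 \\ (a,k)=1}}^{k} \chi(a). \]

If χ = χ0 then for (a, k) = 1, χ(a) = 1 and
\[ \sum_{a=1}^{k} \chi(a) = \sum_{\substack{a=1 \\ (a,k)=1}}^{k} \chi(a) = \sum_{\substack{a=1 \\ (a,k)=1}}^{k} 1 = \varphi(k). \]

If χ ≠ χ0, then there exists an a0 relatively prime to k such that
χ(a0) ≠ 1. Now,
\begin{align*}
\chi(a_0) \sum_{a=1}^{k} \chi(a) &= \tilde{\chi}([a_0]_k) \sum_{[a]_k \in (\mathbf{Z}/k\mathbf{Z})^*} \tilde{\chi}([a]_k) \\
&= \sum_{[a]_k \in (\mathbf{Z}/k\mathbf{Z})^*} \tilde{\chi}([a_0]_k [a]_k).
\end{align*}

Now, the multiplication of elements in (Z/kZ)∗ by [a0]_k permutes the
elements in (Z/kZ)∗. Hence,
\[ \sum_{[a]_k \in (\mathbf{Z}/k\mathbf{Z})^*} \tilde{\chi}([a_0]_k [a]_k) = \sum_{[a]_k \in (\mathbf{Z}/k\mathbf{Z})^*} \tilde{\chi}([a]_k) = \sum_{a=1}^{k} \chi(a). \]

Therefore, since χ(a0) ≠ 1, we conclude that
\[ \sum_{a=1}^{k} \chi(a) = 0. \]

We now let $\chi = \chi_1 \overline{\chi_2}$ in (7.3.1) to complete the proof of (a).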

Proof of (b).

We will first show that
\[ \sum_{\chi \ (\mathrm{mod}\ k)} \chi(a) = \begin{cases} \varphi(k) & \text{if } a \equiv 1 \ (\mathrm{mod}\ k), \\ 0 & \text{otherwise.} \end{cases} \tag{7.3.2} \]

If a ≡ 1 (mod k) then χ(a) = 1 for all characters χ. Since there are
exactly ϕ(k) such characters, we conclude that
\[ \sum_{\chi} \chi(a) = \varphi(k). \]

Next, suppose a ≢ 1 (mod k). Then there exists a character χ∗ so
that χ∗(a) ≠ 1. Now, observe that the set of Dirichlet characters modulo
k forms a group under multiplication of functions. Therefore,
\[ \chi^*(a) \sum_{\chi} \chi(a) = \sum_{\chi} \chi^* \chi(a) = \sum_{\chi} \chi(a), \]
where we have used the fact that multiplying the elements in the group
of Dirichlet characters modulo k by χ∗ permutes the elements in the
group. This implies that
\[ \sum_{\chi} \chi(a) = 0. \]

Now, in order to prove (b), we simply view χ as $\tilde{\chi}$ and let
$[a]_k = [a_1]_k \overline{[a_2]_k}$, where $\overline{[a]_k}$ denotes the inverse of [a]_k in the
group (Z/kZ)∗, and observe that
\[ \chi(a_1) \overline{\chi}(a_2) = \tilde{\chi}([a_1]_k)\, \tilde{\chi}(\overline{[a_2]_k}). \]
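
Both orthogonality relations can be verified numerically for a small modulus. The sketch below is an illustration restricted to an odd prime modulus p, where (Z/pZ)∗ is cyclic; the function names `primitive_root` and `characters_mod_prime` are hypothetical, and the characters are built by sending a generator g to the (p − 1)-th roots of unity.

```python
import cmath
import math


def primitive_root(p):
    """Smallest generator of (Z/pZ)* for an odd prime p, by brute force."""
    for g in range(2, p):
        x, seen = 1, set()
        for _ in range(p - 1):
            x = x * g % p
            seen.add(x)
        if len(seen) == p - 1:
            return g
    raise ValueError("no primitive root found; p must be an odd prime")


def characters_mod_prime(p):
    """Return the p - 1 Dirichlet characters modulo an odd prime p."""
    g = primitive_root(p)
    index = {pow(g, a, p): a for a in range(p - 1)}   # discrete logarithm table
    chars = []
    for j in range(p - 1):
        def chi(n, j=j):
            r = n % p
            if r == 0:
                return 0j
            return cmath.exp(2j * math.pi * j * index[r] / (p - 1))
        chars.append(chi)
    return chars


if __name__ == "__main__":
    p = 7
    chars = characters_mod_prime(p)
    # Relation (a): sum over a of chi_1(a) * conj(chi_2(a)).
    for c1 in chars:
        print([round(abs(sum(c1(a) * c2(a).conjugate() for a in range(1, p + 1))), 6)
               for c2 in chars])
```

The printed matrix should be ϕ(7) = 6 on the diagonal and 0 elsewhere; summing over characters instead of over residues gives relation (b) in the same way.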

7.4 The Dirichlet L-series

Definition 7.4.1
The Dirichlet L-series is defined as
\[ L(s, \chi) = \sum_{n \ge 1} \frac{\chi(n)}{n^s}, \qquad \sigma > 1. \]

Theorem 7.4.1
(a) If χ = χ0 then L(s, χ) can be analytically continued to the half-plane
σ > 0, with the exception of the point s = 1 where it has a simple pole
with residue ϕ(k)/k.
(b) If χ is not the principal character (mod k), then L(s, χ) can be
analytically continued to σ > 0.

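Proof of (a).

For σ > 1, we have by Theorem 5.3.2,
\[ L(s, \chi) = \prod_{p} \left( 1 - \frac{\chi(p)}{p^s} \right)^{-1}. \]

Therefore,
\begin{align*}
L(s, \chi_0) = \prod_{p \nmid k} \left( 1 - \frac{1}{p^s} \right)^{-1} &= \prod_{p} \left( 1 - \frac{1}{p^s} \right)^{-1} \prod_{p \mid k} \left( 1 - \frac{1}{p^s} \right) \\
&= \zeta(s) \prod_{p \mid k} \left( 1 - \frac{1}{p^s} \right).
\end{align*}

The function ζ(s) has an analytic continuation to σ > 0 with a simple pole
of residue 1 at s = 1, and the finite product is entire with value
\[ \prod_{p \mid k} \left( 1 - \frac{1}{p} \right) = \frac{\varphi(k)}{k} \]
at s = 1. Therefore, the residue of L(s, χ0) at s = 1 is ϕ(k)/k.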

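Proof of (b).

If χ ≠ χ0, then
\[ \sum_{n=1}^{k} \chi(n) = 0. \]

Therefore,
\[ \left| \sum_{n \le x} \chi(n) \right| \le k, \]
for x ≥ 1. Now,
\[ \sum_{y < n \le x} \frac{\chi(n)}{n^s} = \left. \frac{1}{t^s} \sum_{n \le t} \chi(n) \right|_{y}^{x} + s \int_y^x \sum_{n \le t} \chi(n) \frac{1}{t^{s+1}}\,dt. \]
Hence, for any ε > 0,
\[ \left| \sum_{y < n \le x} \frac{\chi(n)}{n^s} \right| \le \frac{2k}{y^{\sigma}} \left( 1 + \frac{|s|}{\sigma} \right) < \varepsilon, \]
whenever
\[ y > \left\{ \frac{2k}{\varepsilon} \left( 1 + \frac{|s|}{\sigma} \right) \right\}^{1/\sigma}. \]
This implies that the L-series converges for σ > 0.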
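
As a concrete instance of part (b), consider the non-principal character modulo 4. The sketch below is an illustration only (the names `chi4` and `L_partial` are hypothetical): it checks that the partial sums of χ stay bounded and that the partial sums of L(1, χ) approach the classical value π/4 of the Leibniz series.

```python
import math


def chi4(n):
    """The non-principal Dirichlet character modulo 4."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1


def L_partial(s, chi, N):
    """Partial sum of the L-series: sum over n <= N of chi(n) / n^s (real s > 0)."""
    return sum(chi(n) / n ** s for n in range(1, N + 1))


if __name__ == "__main__":
    # Bounded character sums are what drives convergence for sigma > 0.
    running, largest = 0, 0
    for n in range(1, 1001):
        running += chi4(n)
        largest = max(largest, abs(running))
    print("max |sum of chi(n)| for x <= 1000:", largest)
    print(L_partial(1.0, chi4, 10**5), math.pi / 4)
```

The same function can be called with 0 < s < 1, where the series still converges but only conditionally and more slowly.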

7.5 Proof of Dirichlet’s Theorem

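Step 1.

It suffices to show that if x ≥ 3 and
\[ \sigma = 1 + \frac{1}{\ln x}, \]
then
\[ \sum_{p \equiv l \ (\mathrm{mod}\ k)} \frac{1}{p^{\sigma}} = \frac{1}{\varphi(k)} \ln\left( \frac{1}{\sigma - 1} \right) + O(1). \]

Let
\[ \Sigma_1 = \sum_{p \equiv l \ (\mathrm{mod}\ k)} \frac{1}{p^{\sigma}} \quad\text{and}\quad \Sigma_2 = \sum_{\substack{p \le x \\ p \equiv l \ (\mathrm{mod}\ k)}} \frac{1}{p}, \]
where
\[ \sigma = 1 + \frac{1}{\ln x}. \]
Then
\[ |\Sigma_1 - \Sigma_2| \le \underbrace{\sum_{p \le x} \left( \frac{1}{p} - \frac{1}{p^{\sigma}} \right)}_{\Sigma_3} + \underbrace{\sum_{p > x} \frac{1}{p^{\sigma}}}_{\Sigma_4}. \]
Now,
\begin{align*}
\Sigma_3 = \sum_{p \le x} \frac{1 - e^{-(\sigma - 1) \ln p}}{p} &\le \sum_{p \le x} \frac{(\sigma - 1) \ln p}{p} \\
&= \frac{1}{\ln x} \sum_{p \le x} \frac{\ln p}{p} = O(1),
\end{align*}
and
\begin{align*}
\Sigma_4 &= \lim_{y \to \infty} \sum_{x \le p \le y} \frac{1}{p^{\sigma}} \\
&= \lim_{y \to \infty} \left( \frac{1}{y^{\sigma}} \sum_{p \le y} 1 - \frac{1}{x^{\sigma}} \sum_{p \le x} 1 - \int_x^y \sum_{p \le t} 1 \left( -\frac{\sigma}{t^{\sigma+1}} \right) dt \right) \\
&= O(1) + O\left( \int_x^{\infty} \frac{t}{\ln t} \frac{dt}{t^{\sigma+1}} \right) \\
&= O(1) + O\left( \int_x^{\infty} \frac{dt}{t^{\sigma} \ln t} \right) \\
&= O(1) + O\left( \frac{1}{\ln x} \int_x^{\infty} \frac{dt}{t^{\sigma}} \right) = O(1).
\end{align*}

Therefore, if
\[ \sigma = 1 + \frac{1}{\ln x}, \]
then
\[ \sum_{p \equiv l \ (\mathrm{mod}\ k)} \frac{1}{p^{\sigma}} = \frac{1}{\varphi(k)} \ln \frac{1}{\sigma - 1} + O(1) = \frac{1}{\varphi(k)} \ln \ln x + O(1) \]
and this would imply the Dirichlet Theorem since
\[ \sum_{\substack{p \le x \\ p \equiv l \ (\mathrm{mod}\ k)}} \frac{1}{p} - \sum_{p \equiv l \ (\mathrm{mod}\ k)} \frac{1}{p^{\sigma}} = O(1). \]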
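Step 2.
We observe that for σ > 1,
\begin{align*}
\sum_{p \equiv l \ (\mathrm{mod}\ k)} \frac{1}{p^{\sigma}} &= \sum_{p} \frac{1}{p^{\sigma}} \left( \frac{1}{\varphi(k)} \sum_{\chi \ (\mathrm{mod}\ k)} \overline{\chi}(l) \chi(p) \right) \\
&= \frac{1}{\varphi(k)} \sum_{\chi \ (\mathrm{mod}\ k)} \overline{\chi}(l)\, S(\sigma, \chi),
\end{align*}
where
\[ S(\sigma, \chi) = \sum_{p} \frac{\chi(p)}{p^{\sigma}}. \]

Now,
\[ \sum_{p} \sum_{m \ge 1} \frac{1}{m p^{m\sigma}} - \sum_{p} \frac{1}{p^{\sigma}} = -\sum_{p} \ln\left( 1 - \frac{1}{p^{\sigma}} \right) - \sum_{p} \frac{1}{p^{\sigma}} = O(1), \]
since
\[ \sum_{p} \sum_{m \ge 2} \frac{1}{m p^{m\sigma}} \le \frac{1}{2} \sum_{p} \sum_{m \ge 2} \frac{1}{p^{m\sigma}} = \frac{1}{2} \sum_{p} \frac{1}{p^{\sigma}(p^{\sigma} - 1)} = O(1). \]

Therefore,
\begin{align*}
S(\sigma, \chi_0) &= \sum_{p} \sum_{m \ge 1} \frac{1}{m p^{m\sigma}} - \sum_{p \mid k} \sum_{m \ge 1} \frac{1}{m p^{m\sigma}} + O(1) \\
&= -\sum_{p} \ln\left( 1 - \frac{1}{p^{\sigma}} \right) + O(1) \\
&= \ln \prod_{p} \left( 1 - \frac{1}{p^{\sigma}} \right)^{-1} + O(1) \\
&= \ln \zeta(\sigma) + O(1) \\
&= \ln\left( \frac{1}{\sigma - 1} \right) + O(1).
\end{align*}
The last equality follows from the fact that for σ near 1, we have
\[ \zeta(\sigma) = \frac{1}{\sigma - 1} + g(\sigma), \]
where g(σ) is a function analytic at 1. We conclude that the main term
arises from the principal character χ0. Hence, it remains to show that
\[ S(\sigma, \chi) = O(1) \]
for σ > 1 and all non-principal characters χ (mod k).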
Step 3.

Now, using computations similar to Step 2, we find that
\begin{align*}
S(\sigma, \chi) = \sum_{p} \frac{\chi(p)}{p^{\sigma}} &= \sum_{p} \sum_{m \ge 1} \frac{\chi(p)^m}{m p^{m\sigma}} + O(1) \\
&= -\sum_{p} \ln\left( 1 - \frac{\chi(p)}{p^{\sigma}} \right) + O(1) \\
&= \ln(L(\sigma, \chi)) + O(1).
\end{align*}

Now, for χ ≠ χ0, L(s, χ) is analytic in σ > 0. So, L(σ, χ) is continuous
at σ = 1 and
\[ \lim_{\sigma \to 1} L(\sigma, \chi) = L(1, \chi). \]

If L(1, χ) ≠ 0 then we are done. It remains to show that L(1, χ) ≠ 0.


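Step 4.

We first show that when χ ≠ χ0 is a complex character (mod k), then
L(1, χ) ≠ 0.
Consider the expression
\[ P(\sigma) = \prod_{\chi \ (\mathrm{mod}\ k)} L(\sigma, \chi). \]

We find that for σ > 1,
\begin{align*}
\ln P(\sigma) &= \sum_{\chi \ (\mathrm{mod}\ k)} \ln L(\sigma, \chi) \\
&= \sum_{\chi \ (\mathrm{mod}\ k)} \sum_{p} \sum_{m \ge 1} \frac{\chi(p^m)}{m p^{m\sigma}} \\
&= \sum_{p} \sum_{m \ge 1} \frac{1}{m p^{m\sigma}} \sum_{\chi \ (\mathrm{mod}\ k)} \chi(p^m) \overline{\chi}(1) \\
&= \varphi(k) \sum_{p} \sum_{\substack{m \ge 1 \\ p^m \equiv 1 \ (\mathrm{mod}\ k)}} \frac{1}{m p^{m\sigma}} \ge 0,
\end{align*}
where the last equality follows from Theorem 7.3.1 (b).

Hence, for σ > 1,
\[ P(\sigma) \ge 1. \tag{7.5.1} \]
Suppose that L(1, χ) = 0 for some complex character χ. Then
$L(1, \overline{\chi}) = 0$, and $\overline{\chi} \ne \chi$. Hence, P(s) has two factors
vanishing at s = 1. But L(s, χ0) has only a simple pole at s = 1, which
means that P(1) = 0. This is a contradiction to (7.5.1).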

Step 5.

In this final step, we show that for a real character χ ≠ χ0, L(1, χ) ≠ 0.

Consider the function f = χ ∗ u. Then f is multiplicative by Theorem
??. Note that
\[ f(p^m) = \sum_{l=0}^{m} \chi(p^l) \ \begin{cases} = 1 & \text{if } p \mid k, \\ \ge 1 & \text{if } p \nmid k,\ m \text{ even}, \\ \ge 0 & \text{if } p \nmid k,\ m \text{ odd}. \end{cases} \]

Thus, f(n) ≥ 0 for all n and f(n) ≥ 1 when n is a square. Hence,
\[ F(\sigma) = \sum_{n \ge 1} \frac{f(n)}{n^{\sigma}} \ge \sum_{n \ge 1} \frac{1}{n^{2\sigma}} = \zeta(2\sigma). \]

In particular, F(σ) diverges at σ = 1/2 and so σc ≥ 1/2. By Theorem
6.5.2, F(s) must have a singularity at s = σc ≥ 1/2.
On the other hand, for σ > 1,
\[ F(s) = L(s, \chi) \zeta(s). \]
If L(1, χ) = 0, then F(s) would be analytic in σ > 0 and hence at σ = σc.
This contradicts our previous observation that F(s) has a singularity at
σc and we must have L(1, χ) ≠ 0.

From Steps 4 and 5, we conclude that L(1, χ) ≠ 0 for all non-principal
characters χ. This completes the proof of Dirichlet’s Theorem.

Remark 7.5.1
If p is a prime that satisfies the property that p + 2 is also a prime, then
we call p a twin prime. The twin primes conjecture states that there are
infinitely many twin primes. This statement remains an open problem.
Motivated by Mertens’ estimates and the proof of Dirichlet’s Theorem
of primes in arithmetical progression, it is natural to consider the sum
\[ \sum_{p \le x,\ p \in T} \frac{1}{p}, \]
where T is the set of twin primes. If one can prove that the sum is
divergent, then there would be infinitely many twin primes. Unfortunately,
this sum turns out to be convergent (this can be shown using a sieve
method). Consequently, this line of attack fails to provide a proof of the
twin primes conjecture.
8
Introduction to Sieves

8.1 A weaker upper bound for π(x)

In this chapter, we will learn some important tools in analytic number
theory. We begin with the Large Sieve. Our treatment here is based on
the lectures of F. Sica and the paper † by E. Bombieri.

† E. Bombieri, Le grand crible dans la théorie analytique des nombres, Astérisque
No. 18, Soc. Math. France, Paris, 1974.

Sieve methods are important tools in analytic number theory. The
earliest sieve is due to Eratosthenes. The basic idea of a sieve is simple.
Given a set of integers less than x, we want to sieve out the subset A which
satisfies a certain property P. For example, how many primes are there
from 1 to x? We know by an elementary argument that an integer
1 < n ≤ x is a prime if p ∤ n for all primes p ≤ √x. In other words, if
\[ P = \prod_{p \le \sqrt{x}} p, \]
then to decide if √x < n ≤ x is a prime, it suffices to check if (n, P) = 1.
Let x > 1 be a real number and π(x) be the number of primes not exceeding
x. We find that
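\begin{align*}
\pi(x) - \pi(\sqrt{x}) &= \sum_{\substack{\sqrt{x} < n \le x \\ (n,P)=1}} 1 \\
&= \sum_{\sqrt{x} < n \le x} \ \sum_{\substack{d \mid P \\ d \mid n}} \mu(d) \\
&= \sum_{d \mid P} \mu(d) \sum_{\substack{\sqrt{x} < n \le x \\ d \mid n}} 1 \\
&= \sum_{d \mid P} \mu(d) \left( \left[ \frac{x}{d} \right] - \left[ \frac{\sqrt{x}}{d} \right] \right) \\
&= x \sum_{d \mid P} \frac{\mu(d)}{d} + O\Big( \sum_{d \mid P} 1 \Big) - \sqrt{x} \sum_{d \mid P} \frac{\mu(d)}{d} \\
&= x \prod_{p \le \sqrt{x}} \left( 1 - \frac{1}{p} \right) + O\left( 2^{\pi(\sqrt{x})} \right) - \sqrt{x} \prod_{p \le \sqrt{x}} \left( 1 - \frac{1}{p} \right).
\end{align*}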
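The inclusion-exclusion sum above can be evaluated directly for small $x$. The following Python sketch is not part of the original notes; the function names `primes_up_to` and `legendre_count` are illustrative choices. It computes $\sum_{d \mid P} \mu(d)\left([x/d] - [\sqrt{x}/d]\right)$ by running over all squarefree products of primes $p \le \sqrt{x}$, and also reports the number $2^{\pi(\sqrt{x})}$ of terms in the sum, which is the source of the large error term.

```python
from itertools import combinations
from math import isqrt, prod


def primes_up_to(n):
    """Return all primes <= n using the sieve of Eratosthenes."""
    flags = [True] * (n + 1)
    flags[0:2] = [False] * min(2, n + 1)
    for i in range(2, isqrt(n) + 1):
        if flags[i]:
            for j in range(i * i, n + 1, i):
                flags[j] = False
    return [i for i, is_prime in enumerate(flags) if is_prime]


def legendre_count(x):
    """Evaluate sum_{d | P} mu(d) * ([x/d] - [sqrt(x)/d]) for a positive integer x,
    where P is the product of the primes <= sqrt(x).
    Returns the value of the sum and the number of divisors d of P."""
    r = isqrt(x)                      # [sqrt(x)/d] equals r // d for integer d
    small = primes_up_to(r)
    total = 0
    for k in range(len(small) + 1):
        sign = -1 if k % 2 else 1     # mu(d) for a product d of k distinct primes
        for combo in combinations(small, k):
            d = prod(combo)
            total += sign * (x // d - r // d)
    return total, 2 ** len(small)


count, terms = legendre_count(100)
print(count, terms)  # 21 16
```

For $x = 100$ the sum equals $\pi(100) - \pi(10) = 25 - 4 = 21$ and has $2^4 = 16$ terms. The number of terms doubles with every new prime below $\sqrt{x}$, which is why this direct approach does not give a useful bound.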

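But by Mertens' estimate,
\[
x \prod_{p \le \sqrt{x}} \left( 1 - \frac{1}{p} \right) = O\left( \frac{x}{\ln x} \right),
\]
and so the error term $2^{\pi(\sqrt{x})}$ is larger than the main term, that is,
\[
\frac{x}{\ln x} \ll 2^{\pi(\sqrt{x})},
\]
since $\pi(\sqrt{x}) \gg \sqrt{x}/\ln x$ by Chebyshev's estimate.
We now modify the argument, introducing a new parameter $y$ with $1 \le y \le x$.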
Given an integer $n$ with $y < n \le x$, a necessary condition for $n$ to be prime is that $p \nmid n$ for every prime $p \le y$. Then
\[
\pi(x) - \pi(y) \le \alpha(x), \tag{8.1.1}
\]
where
\[
\alpha(x) = \sum_{\substack{n \le x \\ p \mid n \implies p > y}} 1 = \sum_{\substack{n \le x \\ (P,n)=1}} 1,
\]
with
\[
P = \prod_{p \le y} p.
\]
To see why (8.1.1) is true, we note that the left hand side gives the number of primes between $y$ and $x$. The function $\alpha(x)$ counts the integers $n \le x$ all of whose prime divisors are “large primes”, that is, primes greater than $y$. This includes the primes between $y$ and $x$, and hence the left hand side is at most the right hand side of (8.1.1). Equivalently, $\alpha(x)$ counts those integers $n \le x$ which are relatively prime to $P$. Therefore, arguing as before with $\sqrt{x}$ replaced by $y$,
\begin{align*}
\pi(x) - \pi(y) &= O\Big( x \prod_{p \le y} \Big( 1 - \frac{1}{p} \Big) \Big) + O\left( 2^{\pi(y)} \right) \\
&= O\left( \frac{x}{\ln y} \right) + O(2^{y}).
\end{align*}
Setting $y = \ln x$, we find that $2^{y} = x^{\ln 2}$ and
\[
x^{\ln 2} \ll \frac{x}{\ln \ln x},
\]
and hence
\[
\pi(x) - \pi(\ln x) + 1 \ll \frac{x}{\ln \ln x},
\]
or, as a corollary,
\[
\pi(x) \ll \frac{x}{\ln \ln x}.
\]
This bound is of course weaker than Chebyshev's estimate, but it is nonetheless a non-trivial bound, obtained by replacing $\sqrt{x}$ by $y$.
We will now study the situation more closely. Let $A$ be a subset of the set of positive integers not exceeding $x$ and let $\mathcal{P}$ be a set of primes. Let
\[
E = \{ n \in A \mid n \not\equiv 0 \pmod{p} \text{ for all } p \in \mathcal{P} \}.
\]
The set $E$ is an example of a sifted set. Note that in our example above, $A$ is the set of positive integers not exceeding $x$, $\mathcal{P}$ is the set of primes not exceeding $y$, and $\alpha(x) = |E|$.

8.2 Selberg’s sieve and its applications

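We now generalize the situation in the previous section. Let $A$ be a subset of $\mathbf{N}$. Let $\mathcal{P}$ be the set of primes less than $Q$. For each $p \in \mathcal{P}$, let $\Omega_p$ be a set of $\gamma(p)$ distinguished residue classes modulo $p$. Let
\[
E = \{ n \in A \mid n \ (\mathrm{mod}\ p) \notin \Omega_p \text{ for all } p \in \mathcal{P} \}.
\]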


In the case when we are bounding π(x), the distinguished residue class
is 0 (mod p).
Let
S(A, P, Q) = |E|.

Our aim is to bound S(A, P, Q).

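Theorem 8.2.1 (A. Selberg)

Let $M$, $N$ and $Q$ be positive integers. Let $A$ be the set of integers between $M + 1$ and $M + N$. Let $\mathcal{Q}$ be the set of positive integers $q \le Q$ whose prime factors are in $\mathcal{P}$. Then
\[
S(A, \mathcal{P}, Q) \le \frac{N + 1 + 3Q^2}{L},
\]
where
\[
L = \sum_{q \in \mathcal{Q}} \mu^2(q) \prod_{p \mid q} \frac{\gamma(p)}{p - \gamma(p)}.
\]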

We will prove Selberg's sieve in Section 8.4. In this section, we will study two applications of Selberg's sieve.
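Before turning to the applications, the statement of Theorem 8.2.1 can be tested on a small instance. The Python sketch below is an illustration only and is not taken from the notes; the helper names `prime_factorization`, `sifted_count` and `selberg_L`, and the choice $M = 0$, $N = 1000$, $Q = 30$ with $\Omega_p = \{0\}$, are assumptions made for the example. It computes $|E|$ by brute force and compares it with $(N + 1 + 3Q^2)/L$, using exact rational arithmetic for $L$.

```python
from fractions import Fraction


def prime_factorization(n):
    """Return {prime: exponent} for an integer n >= 1, by trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def sifted_count(M, N, omega):
    """|E| for A = {M+1, ..., M+N}; omega maps each sieving prime p to the set
    of excluded residues modulo p (the set Omega_p)."""
    return sum(
        1
        for n in range(M + 1, M + N + 1)
        if all(n % p not in residues for p, residues in omega.items())
    )


def selberg_L(Q, omega):
    """L = sum of prod_{p | q} gamma(p) / (p - gamma(p)) over squarefree q <= Q
    whose prime factors are all sieving primes."""
    total = Fraction(0)
    for q in range(1, Q + 1):
        factors = prime_factorization(q)
        if any(e > 1 for e in factors.values()):
            continue                  # mu^2(q) = 0
        if any(p not in omega for p in factors):
            continue                  # q is not in the set of admissible moduli
        term = Fraction(1)
        for p in factors:
            g = len(omega[p])         # gamma(p)
            term *= Fraction(g, p - g)
        total += term
    return total


M, N, Q = 0, 1000, 30
# Sieving primes: the primes less than Q, each with the single residue class 0.
omega = {p: {0} for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)}
E_size = sifted_count(M, N, omega)
bound = Fraction(N + 1 + 3 * Q * Q) / selberg_L(Q, omega)
print(E_size, float(bound))
```

Here $E$ consists of $1$, the primes between $31$ and $1000$, and $961 = 31^2$, so $|E| = 160$, which is comfortably below the bound. For the twin prime problem one would instead pass `{0, p - 2}` as the excluded residues for each odd prime $p$.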

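Example 8.2.1

Let $A = \{ n \mid n \le x \}$ and $\mathcal{P} = \{ p \text{ prime} \mid p \le \sqrt{x} \}$. In other words, $Q = \sqrt{x}$. Let $\Omega_p = \{0\}$. This implies that $\gamma(p) = 1$. Let
\[
E = \{ n \in A \mid n \not\equiv 0 \pmod{p} \text{ for all } p \in \mathcal{P} \}.
\]
If $n$ is a prime between $\sqrt{x}$ and $x$ then $n \in E$. Hence
\[
\pi(x) - \pi(\sqrt{x}) \le |E|.
\]
By Selberg's sieve, we conclude that
\[
\pi(x) - \pi(\sqrt{x}) \le \frac{x + 1 + 3(\sqrt{x})^2}{L} \ll \frac{x}{L},
\]
where
\[
L = \sum_{q \in \mathcal{Q}} \mu^2(q) \prod_{p \mid q} \frac{1}{p - 1}.
\]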
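But every $q \le \sqrt{x}$ has all its prime factors in $\mathcal{P}$, so
\begin{align*}
L &= \sum_{q \le \sqrt{x}} \frac{\mu^2(q)}{q} \prod_{p \mid q} \frac{1}{1 - \frac{1}{p}} \\
&= \sum_{q \le \sqrt{x}} \frac{\mu^2(q)}{q} \prod_{p \mid q} \left( 1 + \frac{1}{p} + \frac{1}{p^2} + \cdots \right) \\
&\ge \sum_{n \le \sqrt{x}} \frac{1}{n} = \frac{1}{2} \ln x + O(1).
\end{align*}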

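The last inequality in the above follows by first writing an integer $n \le \sqrt{x}$ as
\[
n = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_k^{\alpha_k} q_1 \cdots q_j,
\]
where the $p_i$ and $q_i$ are distinct primes and $\alpha_i \ge 2$. We then set
\[
n' = p_1 p_2 \cdots p_k q_1 \cdots q_j
\]
and
\[
n'' = p_1^{\alpha_1 - 1} \cdots p_k^{\alpha_k - 1},
\]
and observe that the term
\[
\frac{1}{n} = \frac{1}{n' n''}
\]
appears as a term of the form
\[
\frac{\mu^2(n')}{n'} \cdot \frac{1}{n''}
\]
when the products on the left hand side are expanded.
Hence,
\[
\pi(x) \ll \frac{x}{\ln x},
\]
which is Chebyshev's estimate.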

The next example illustrates the use of Selberg’s sieve in the study of
twin primes.

Example 8.2.2

Let $\pi_2(x)$ be the number of primes $p$ less than $x$ such that $p + 2$ is also prime. We will show that
\[
\pi_2(x) = O\left( \frac{x}{\ln^2 x} \right). \tag{8.2.1}
\]
As a result, we have
\[
\sum_{\substack{p \le x \\ p + 2 \text{ is a prime}}} \frac{1}{p} = O(1).
\]
The last conclusion follows by partial summation from the estimate
\[
\frac{\pi_2(x)}{x} + \int_2^x \frac{\pi_2(t)}{t^2} \, dt \ll 1 + \int_2^x \frac{1}{t \ln^2 t} \, dt \ll 1.
\]
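To prove (8.2.1), let $Q = \sqrt{x}$, $\mathcal{P} = \{ p \le \sqrt{x} \}$ and $A = \{ n \le x \}$. Let
\[
E = \{ n \in A \mid n \not\equiv 0 \pmod{p} \text{ and } n \not\equiv -2 \pmod{p} \text{ for all } p \in \mathcal{P} \}.
\]
In other words, $\Omega_p = \{0, -2\}$, so that $\gamma(p) = 2$ if $p \ne 2$ and $\gamma(2) = 1$. Note that $E$ contains every prime $r$ with $\sqrt{x} < r \le x$ for which $r + 2$ is also prime. By Selberg's sieve, we find that
\[
\pi_2(x) - \pi_2(\sqrt{x}) \le |E| \le \frac{x + 1 + 3x}{L} \ll \frac{x}{L},
\]
where
\[
L = \sum_{q \in \mathcal{Q}} \mu^2(q) \prod_{p \mid q} \frac{\gamma(p)}{p - \gamma(p)},
\]
with $\mathcal{Q}$ the set of positive integers $q \le \sqrt{x}$. Now,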
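for each squarefree $q$, every odd prime $p \mid q$ contributes the factor $\frac{2}{p-2} \ge \frac{2}{p-1}$, while the prime $2$ (when it divides $q$) contributes $\frac{1}{2-1} = \frac{1}{2} \cdot \frac{2}{2-1}$. Hence
\begin{align*}
\sum_{q \in \mathcal{Q}} \mu^2(q) \prod_{p \mid q} \frac{\gamma(p)}{p - \gamma(p)}
&\ge \frac{1}{2} \sum_{q \in \mathcal{Q}} \mu^2(q) \prod_{p \mid q} \frac{2}{p - 1} \\
&= \frac{1}{2} \sum_{q \in \mathcal{Q}} \frac{\mu^2(q) 2^{\omega(q)}}{q} \prod_{p \mid q} \left( 1 + \frac{1}{p} + \frac{1}{p^2} + \cdots \right).
\end{align*}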

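The sum
\[
\sum_{q \in \mathcal{Q}} \frac{\mu^2(q) 2^{\omega(q)}}{q}
\]
adds up terms for which $q$ is squarefree. But the sum
\[
\sum_{q \in \mathcal{Q}} \frac{\mu^2(q) 2^{\omega(q)}}{q} \prod_{p \mid q} \left( 1 + \frac{1}{p} + \frac{1}{p^2} + \cdots \right)
\]
is a sum of the type
\[
\sum_{q \in \mathcal{Q}} \frac{\mu^2(q) 2^{\omega(q)}}{q} \left( \sum_{n \in B} \frac{1}{n} \right),
\]
with
\[
B = \{ m \mid p \mid m \implies p \mid q \}.
\]
Therefore,
\[
\sum_{q \in \mathcal{Q}} \frac{\mu^2(q) 2^{\omega(q)}}{q} \sum_{n \in B} \frac{1}{n} \ge \sum_{n \in \mathcal{Q}} \frac{2^{\omega(n)}}{n}. \tag{8.2.2}
\]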

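The last inequality holds because an integer $n$ appearing on the right hand side can be written as $q \cdot q'$, where $q$ is the product of the distinct primes dividing $n$ (so that $q$ is squarefree and $\omega(q) = \omega(n)$) and the prime divisors of $q'$ are prime divisors of $q$. Therefore, the term $\frac{2^{\omega(n)}}{n}$ corresponding to $n$ can be written as
\[
\frac{2^{\omega(q)}}{q} \cdot \frac{1}{q'}
\]
and this term is present in the sum on the left hand side. Now,
\[
\sum_{n \in \mathcal{Q}} \frac{2^{\omega(n)}}{n} = \sum_{n \le \sqrt{x}} \frac{2^{\omega(n)}}{n}.
\]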

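Since
\[
\sum_{n \le y} 2^{\omega(n)} = \frac{6}{\pi^2} y \ln y + O(y),
\]
partial summation gives
\[
\sum_{n \le y} \frac{2^{\omega(n)}}{n} = \frac{3}{\pi^2} \ln^2 y + O(\ln y),
\]
and we conclude from (8.2.2), with $y = \sqrt{x}$, that
\[
L \gg \ln^2 x,
\]
which completes the proof of (8.2.1).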
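The bound (8.2.1) and the convergence of the sum of reciprocals of twin primes can be observed numerically. The Python sketch below is only an illustration. The names `prime_flags` and `twin_prime_data` are hypothetical, and for convenience it counts primes $p \le x$ rather than $p < x$, which does not affect the asymptotic statements. It returns $\pi_2(x)$, the partial sum $\sum 1/p$ over twin primes $p \le x$, and the ratio $\pi_2(x) \ln^2 x / x$, which (8.2.1) asserts is bounded.

```python
from math import isqrt, log


def prime_flags(n):
    """Return a list flags with flags[i] == True exactly when i is prime, 0 <= i <= n."""
    flags = [True] * (n + 1)
    flags[0:2] = [False] * min(2, n + 1)
    for i in range(2, isqrt(n) + 1):
        if flags[i]:
            for j in range(i * i, n + 1, i):
                flags[j] = False
    return flags


def twin_prime_data(x):
    """Return (count, reciprocal_sum, ratio) for the primes p <= x with p + 2 prime,
    where ratio = count * ln(x)^2 / x."""
    flags = prime_flags(x + 2)
    twins = [p for p in range(2, x + 1) if flags[p] and flags[p + 2]]
    count = len(twins)
    reciprocal_sum = sum(1.0 / p for p in twins)
    ratio = count * log(x) ** 2 / x
    return count, reciprocal_sum, ratio


for x in (100, 10_000, 1_000_000):
    print(x, twin_prime_data(x))
```

For $x = 100$ the primes counted are $3, 5, 11, 17, 29, 41, 59, 71$, so the count is $8$. The reciprocal sum grows very slowly as $x$ increases, consistent with its convergence.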

8.3 The Large Sieve

In this section, we introduce the Large Sieve. This will be used in the
next section to derive Selberg’s sieve.
Theorem 8.3.1
Let $x_1, x_2, \cdots, x_R$ be $\delta$-well spaced. That is to say, if
\[
\| x \| = \min_{n \in \mathbf{Z}} |x - n|,
\]
then for $k \ne j$,
\[
\| x_k - x_j \| \ge \delta.
\]
Let
\[
S(x) = \sum_{n=M+1}^{M+N} a_n e^{2\pi i n x}.
\]
Then
\[
\sum_{j=1}^{R} |S(x_j)|^2 \le \left( N + 1 + \frac{3}{\delta} \right) \sum_{n=M+1}^{M+N} |a_n|^2.
\]

We will prove Theorem 8.3.1 using Selberg's Lemma (see Lemma 8.3.2).
First, we need the definition of a Hermitian product. Let $H$ be a complex vector space. A Hermitian product is a function from $H \times H$ to $\mathbf{C}$ satisfying the following properties:

(a) For all $u, v, w \in H$, $(u + v, w) = (u, w) + (v, w)$ and $(u, v + w) = (u, v) + (u, w)$.
(b) For all $u, v \in H$ and $\alpha \in \mathbf{C}$, $(\alpha u, v) = \alpha (u, v)$ and $(u, \alpha v) = \overline{\alpha} (u, v)$.
(c) For all $u, v \in H$, $(u, v) = \overline{(v, u)}$.
(d) For all $u \in H$, $(u, u) \in \mathbf{R}$ and $(u, u) \ge 0$.

Lemma 8.3.2 Let $H$ be a complex vector space together with a Hermitian product $(\cdot, \cdot)$. Let $\phi_1, \phi_2, \cdots, \phi_R, f \in H$. Then
\[
\sum_{k=1}^{R} \frac{|(f, \phi_k)|^2}{\sum_{j=1}^{R} |(\phi_k, \phi_j)|} \le (f, f).
\]

Proof
For all $(\xi_k) \in \mathbf{C}^R$, we have
\[
\Big( f - \sum_k \xi_k \phi_k, \ f - \sum_k \xi_k \phi_k \Big) \ge 0.
\]

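Now
\begin{align*}
&\Big( f - \sum_k \xi_k \phi_k, \ f \Big) + \Big( f - \sum_k \xi_k \phi_k, \ - \sum_k \xi_k \phi_k \Big) \\
&\quad = (f, f) - \Big( \sum_k \xi_k \phi_k, f \Big) - \Big( f, \sum_k \xi_k \phi_k \Big) + \Big( \sum_k \xi_k \phi_k, \sum_j \xi_j \phi_j \Big) \\
&\quad = \| f \|^2 - \sum_k \xi_k (\phi_k, f) - \sum_k \overline{\xi_k} \, \overline{(\phi_k, f)} + \sum_{k,j} \xi_k \overline{\xi_j} (\phi_k, \phi_j) \\
&\quad = \| f \|^2 - 2 \mathrm{Re} \sum_k \xi_k (\phi_k, f) + \sum_{k,j} \xi_k \overline{\xi_j} (\phi_k, \phi_j).
\end{align*}
Next observe that
\[
\sum_{k,j} \xi_k \overline{\xi_j} (\phi_k, \phi_j)
\]
is real. This is because when $k = j$, the summand is real, and when $k \ne j$, we can pair the $(k, j)$-term with the $(j, k)$-term to obtain the real number
\[
2 \mathrm{Re} \, \xi_k \overline{\xi_j} (\phi_k, \phi_j).
\]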
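Hence,
\begin{align*}
\sum_{k,j} \xi_k \overline{\xi_j} (\phi_k, \phi_j) &\le \Big| \sum_{k,j} \xi_k \overline{\xi_j} (\phi_k, \phi_j) \Big| \\
&\le \sum_{k,j} \big| \xi_k \overline{\xi_j} (\phi_k, \phi_j) \big| \\
&\le \frac{1}{2} \sum_{k,j} \left( |\xi_k|^2 + |\xi_j|^2 \right) |(\phi_k, \phi_j)| \\
&= \sum_{k,j} |\xi_k|^2 |(\phi_k, \phi_j)|,
\end{align*}
where the last inequality follows from
\[
\frac{1}{2} \left( |\xi_k|^2 + |\xi_j|^2 \right) \ge |\xi_k \xi_j|, \tag{8.3.1}
\]
and the final equality uses $|(\phi_k, \phi_j)| = |(\phi_j, \phi_k)|$.
Note that (8.3.1) is a consequence of
\[
\left( |\xi_k| - |\xi_j| \right)^2 \ge 0.
\]
Therefore,
\[
(f, f) - 2 \mathrm{Re} \Big( \sum_k \xi_k (\phi_k, f) \Big) + \sum_{k,j} |\xi_k|^2 |(\phi_k, \phi_j)| \ge 0. \tag{8.3.2}
\]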
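Next, let
\[
\xi_k = \frac{(f, \phi_k)}{\sum_j |(\phi_k, \phi_j)|}.
\]
Then we may rewrite (8.3.2) as
\begin{align*}
&(f, f) - 2 \mathrm{Re} \Big( \sum_k \frac{|(\phi_k, f)|^2}{\sum_j |(\phi_k, \phi_j)|} \Big) + \sum_k \frac{|(\phi_k, f)|^2}{\big( \sum_j |(\phi_k, \phi_j)| \big)^2} \sum_j |(\phi_k, \phi_j)| \\
&\quad = \| f \|^2 - 2 \sum_k \frac{|(\phi_k, f)|^2}{\sum_j |(\phi_k, \phi_j)|} + \sum_k \frac{|(\phi_k, f)|^2}{\sum_j |(\phi_k, \phi_j)|} \ge 0.
\end{align*}
Therefore,
\[
(f, f) \ge \sum_{k=1}^{R} \frac{|(f, \phi_k)|^2}{\sum_{j=1}^{R} |(\phi_k, \phi_j)|}.
\]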
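Selberg's Lemma is easy to test numerically in the finite-dimensional space $\mathbf{C}^d$ with the standard Hermitian product $(u, v) = \sum_i u_i \overline{v_i}$. The Python sketch below is an illustration under that assumption; the helper names `inner`, `selberg_lemma_sides` and `random_vector` are not from the notes. It draws random vectors and compares the two sides of the inequality.

```python
import random


def inner(u, v):
    """Standard Hermitian product on C^d: linear in u, conjugate-linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def selberg_lemma_sides(f, phis):
    """Return (lhs, rhs) where
    lhs = sum_k |(f, phi_k)|^2 / sum_j |(phi_k, phi_j)| and rhs = (f, f)."""
    lhs = 0.0
    for phi_k in phis:
        denom = sum(abs(inner(phi_k, phi_j)) for phi_j in phis)
        lhs += abs(inner(f, phi_k)) ** 2 / denom
    rhs = inner(f, f).real
    return lhs, rhs


def random_vector(rng, dim):
    """A random vector in C^dim with independent Gaussian real and imaginary parts."""
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]


rng = random.Random(0)
f = random_vector(rng, 6)
phis = [random_vector(rng, 6) for _ in range(4)]
lhs, rhs = selberg_lemma_sides(f, phis)
print(lhs, rhs, lhs <= rhs)
```

When the $\phi_k$ are orthonormal, the denominators are all equal to $1$ and the lemma reduces to Bessel's inequality. The weights in the denominator compensate for the lack of orthogonality.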

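We are now ready to prove Theorem 8.3.1 in the following “symmetric” form:

Theorem 8.3.3
Let $x_1, x_2, \cdots, x_R$ be $\delta$-well spaced. That is to say, if
\[
\| x \| = \min_{n \in \mathbf{Z}} |x - n|,
\]
then $\| x_k - x_j \| \ge \delta$ for $k \ne j$. Then
\[
\sum_{j=1}^{R} |S(x_j)|^2 \le \left( 2N + 1 + \frac{3}{\delta} \right) \sum_{n=-N}^{N} |a_n|^2,
\]
where
\[
S(x) = \sum_{n=-N}^{N} a_n e^{2\pi i n x}.
\]

We will need an elementary trigonometric lemma.

Lemma 8.3.4 For $\sin x \ne 0$, we have
\[
\sum_{n=-N}^{N} (N - |n|) e^{2 i n x} = \frac{\sin^2 N x}{\sin^2 x}.
\]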
Proof
Note that
\[
\sum_{n=-N}^{N} (N - |n|) e^{2 i n x} = N - 2 \sum_{n=1}^{N} n \cos 2nx + 2N \sum_{n=1}^{N} \cos 2nx.
\]
But †
\[
\sum_{n=1}^{N} \cos 2nx = \frac{1}{2} \, \frac{\sin (2N+1)x - \sin x}{\sin x} \tag{8.3.3}
\]
and
\[
\sum_{n=1}^{N} \sin 2nx = \frac{1}{2} \, \frac{\cos x - \cos (2N+1)x}{\sin x}. \tag{8.3.4}
\]
From (8.3.3), we find that
\[
N + 2N \sum_{n=1}^{N} \cos 2nx = N \, \frac{\sin (2N+1)x}{\sin x}.
\]
Differentiating (8.3.4) with respect to $x$, we deduce that
\[
2 \sum_{n=1}^{N} n \cos 2nx = \frac{1}{2 \sin^2 x} \Big\{ \sin x \left( - \sin x + (2N+1) \sin (2N+1)x \right) - \cos^2 x + \cos x \cos (2N+1)x \Big\}.
\]
Hence,
\begin{align*}
N + 2N \sum_{n=1}^{N} \cos 2nx - 2 \sum_{n=1}^{N} n \cos 2nx
&= \frac{1}{2 \sin^2 x} \left( 1 - \sin (2N+1)x \sin x - \cos x \cos (2N+1)x \right) \\
&= \frac{1}{2 \sin^2 x} \left( 1 - \cos (2Nx) \right) \\
&= \frac{\sin^2 N x}{\sin^2 x}.
\end{align*}

† Use
\[
2 \sin x \sum_{n=1}^{N} \cos 2nx = \sum_{n=1}^{N} \left( \sin (2n+1)x - \sin (2n-1)x \right).
\]
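The identity of Lemma 8.3.4 can be confirmed numerically before it is used. The following Python sketch is an illustration only (the function names `fejer_lhs` and `fejer_rhs` are assumptions, not notation from the notes). It evaluates both sides for a few values of $N$ and $x$ with $\sin x \ne 0$.

```python
import cmath
import math


def fejer_lhs(N, x):
    """sum_{n=-N}^{N} (N - |n|) e^{2 i n x}, computed term by term."""
    return sum((N - abs(n)) * cmath.exp(2j * n * x) for n in range(-N, N + 1))


def fejer_rhs(N, x):
    """sin^2(N x) / sin^2(x), valid when sin(x) != 0."""
    return (math.sin(N * x) / math.sin(x)) ** 2


for N in (1, 3, 6):
    for x in (0.3, 1.1, 2.5):
        lhs = fejer_lhs(N, x)
        print(N, x, abs(lhs - fejer_rhs(N, x)))
```

The printed differences should be at the level of floating point rounding error. In particular the left hand side is real and non-negative, which is the property exploited in the construction of the vectors $\phi_j$.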
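Now, let $H$ be the space of sequences $\alpha = (\alpha(n))_{n \in \mathbf{Z}}$ such that
\[
\sum_n |\alpha(n)|^2 < \infty.
\]
Define
\[
(\alpha, \beta) = \sum_n \alpha(n) \overline{\beta(n)}.
\]
Let
\[
f(n) = \begin{cases} a_n & \text{if } |n| \le N, \\ 0 & \text{otherwise.} \end{cases}
\]
Let $\ell$ be a positive integer to be chosen later, and let
\[
\phi_j(n) = \begin{cases}
e^{-2\pi i n x_j} & \text{if } |n| \le N, \\[4pt]
\left( \dfrac{N + \ell - |n|}{\ell} \right)^{1/2} e^{-2\pi i n x_j} & \text{if } N < |n| \le N + \ell, \\[4pt]
0 & \text{otherwise.}
\end{cases}
\]
Clearly,
\[
\| f \|^2 = \sum_{n=-N}^{N} |a_n|^2
\]
and
\[
(f, \phi_j) = \sum_{n=-N}^{N} a_n e^{2\pi i n x_j}.
\]
Also,
\begin{align*}
(\phi_k, \phi_j) &= \sum_{n=-N}^{N} e^{-2\pi i n (x_k - x_j)} + \sum_{N < |n| \le N + \ell} \frac{N + \ell - |n|}{\ell} e^{-2\pi i n (x_k - x_j)} \\
&= \sum_{n=-N-\ell}^{N+\ell} \frac{N + \ell - |n|}{\ell} e^{-2\pi i n (x_k - x_j)} - \sum_{n=-N}^{N} \frac{N - |n|}{\ell} e^{-2\pi i n (x_k - x_j)} \\
&= \frac{1}{\ell} \left( \left( \frac{\sin (\pi (N + \ell)(x_k - x_j))}{\sin (\pi (x_k - x_j))} \right)^2 - \left( \frac{\sin (\pi N (x_k - x_j))}{\sin (\pi (x_k - x_j))} \right)^2 \right),
\end{align*}
by the previous Lemma. Therefore,
\[
(\phi_k, \phi_k) = \frac{1}{\ell} \left( (N + \ell)^2 - N^2 \right) = 2N + \ell,
\]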
and for $k \ne j$,
\[
|(\phi_k, \phi_j)| \le \frac{2}{\ell \, |\sin \pi (x_k - x_j)|^2}.
\]
If we use $|\sin \pi x| \ge 2|x|$ for all $|x| \le \frac{1}{2}$, then we find that
\[
|\sin^2 \pi (x_k - x_j)| = \sin^2 (\pi \| x_k - x_j \|) \ge (2 \| x_k - x_j \|)^2.
\]
By hypothesis, we have
\[
\| x_k - x_j \| \ge \delta
\]
if $k \ne j$. Hence, for $k \ne j$, we have
\[
|(\phi_k, \phi_j)| \le \frac{1}{2 \ell \| x_k - x_j \|^2}.
\]
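Now,
\begin{align*}
\sum_{j=1}^{R} |(\phi_k, \phi_j)| &= 2N + \ell + \sum_{j \ne k} |(\phi_k, \phi_j)| \\
&\le 2N + \ell + \frac{1}{2\ell} \sum_{j \ne k} \frac{1}{\| x_k - x_j \|^2} \\
&\le 2N + \ell + \frac{1}{2\ell} \sum_{s=1}^{\infty} \frac{1}{s^2 \delta^2} \cdot \big| \{ x_j \mid s\delta \le \| x_k - x_j \| < (s+1)\delta \} \big|.
\end{align*}
But there are at most two values $x_j$ in the set †
\[
\{ x_j \mid s\delta \le \| x_k - x_j \| < (s+1)\delta \}.
\]
Hence,
\[
\sum_{j=1}^{R} |(\phi_k, \phi_j)| \le 2N + \ell + \frac{1}{\ell \delta^2} \cdot \frac{\pi^2}{6} \le 2N + \ell + \frac{2}{\ell \delta^2}.
\]
Let
\[
\ell = \left[ \frac{1}{\delta} \right] + 1.
\]
Then
\[
\frac{1}{\delta} < \ell \le \frac{1}{\delta} + 1.
\]
Therefore,
\[
\sum_{j=1}^{R} |(\phi_k, \phi_j)| \le 2N + 1 + \frac{3}{\delta}.
\]

† Take the end points of $I_\delta$ to be two of these $x_j$'s.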
Substituting the above inequality into Selberg's Lemma,
\[
(f, f) \ge \sum_{k=1}^{R} \frac{|(f, \phi_k)|^2}{\sum_{j} |(\phi_k, \phi_j)|},
\]
we deduce that
\[
\sum_{n=-N}^{N} |a_n|^2 \ge \sum_{k=1}^{R} \frac{\Big| \sum_{n=-N}^{N} a_n e^{2\pi i n x_k} \Big|^2}{2N + 1 + \dfrac{3}{\delta}}.
\]
Rearranging, we complete the proof of Theorem 8.3.3.
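To deduce Theorem 8.3.1, we set $a_n = a'_{M+N+n+1}$. Then the sum ranges from $M + 1$ to $M + 2N + 1$ and
\[
S(x) = e^{2\pi i (-M-N-1) x} \sum_{n=M+1}^{M+2N+1} a'_n e^{2\pi i n x}.
\]
But
\[
|S(x)| = \Big| \sum_{n=M+1}^{M+2N+1} a'_n e^{2\pi i n x} \Big|.
\]
Hence, we obtain
\begin{align*}
\sum_{j=1}^{R} |S(x_j)|^2 &\le \left( 2N + 1 + \frac{3}{\delta} \right) \sum_{n=M+1}^{M+2N+1} |a'_n|^2 \\
&\le \left( 2N + 1 + 1 + \frac{3}{\delta} \right) \sum_{n=M+1}^{M+2N+1} |a'_n|^2.
\end{align*}
This implies that Theorem 8.3.1 is true when the number of terms in the sum is an odd positive integer. Next, in the above odd case, set
\[
a'_{M+2N+1} = 0.
\]
Then we find that
\[
\sum_{j=1}^{R} \Big| \sum_{n=M+1}^{M+2N} a'_n e^{2\pi i n x_j} \Big|^2 \le \left( 2N + 1 + \frac{3}{\delta} \right) \sum_{n=M+1}^{M+2N} |a'_n|^2
\]
and Theorem 8.3.1 is true when the number of terms is an even positive integer. This completes the proof of Theorem 8.3.1.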
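As a sanity check, the inequality of Theorem 8.3.1 can be tested on random data. The Python sketch below is an illustration under assumptions of our own choosing: the points are small perturbations of $j/R$, the coefficients are random complex numbers, and the helper names `dist_to_nearest_integer`, `spacing` and `large_sieve_sides` are hypothetical. It computes $\delta$ as the minimum of $\| x_k - x_j \|$ over $k \ne j$ and compares both sides.

```python
import cmath
import random


def dist_to_nearest_integer(x):
    """||x||, the distance from the real number x to the nearest integer."""
    return abs(x - round(x))


def spacing(points):
    """The largest delta for which the points (at least two) are delta-well spaced."""
    return min(
        dist_to_nearest_integer(points[k] - points[j])
        for k in range(len(points))
        for j in range(k)
    )


def large_sieve_sides(coeffs, M, points):
    """coeffs[i] is a_{M+1+i}. Returns (lhs, rhs) with
    lhs = sum_j |S(x_j)|^2 and rhs = (N + 1 + 3/delta) * sum_n |a_n|^2."""
    N = len(coeffs)
    delta = spacing(points)

    def S(x):
        return sum(
            a * cmath.exp(2j * cmath.pi * (M + 1 + i) * x)
            for i, a in enumerate(coeffs)
        )

    lhs = sum(abs(S(x)) ** 2 for x in points)
    rhs = (N + 1 + 3 / delta) * sum(abs(a) ** 2 for a in coeffs)
    return lhs, rhs


rng = random.Random(1)
R = 7
points = [j / R + rng.uniform(-0.02, 0.02) for j in range(R)]
coeffs = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(25)]
print(large_sieve_sides(coeffs, M=10, points=points))
```

In experiments of this kind the left hand side is typically well below the right hand side, since the constant $N + 1 + 3/\delta$ obtained in the proof above is not optimal.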
8.4 Farey sequence and Selberg’s Sieve
By a Farey sequence of order $n$, denoted $F_n$, we mean the set of reduced fractions in the interval from 0 to 1 whose denominators are less than or equal to $n$, arranged in ascending order of magnitude.
One fact about elements in a Farey sequence is that if $a/b < a'/b'$ are successive terms in $F_n$, then
\[
\frac{a'}{b'} - \frac{a}{b} = \frac{1}{b b'}.
\]
Therefore, for successive terms of $F_Q$,
\[
\left| \frac{a}{b} - \frac{a'}{b'} \right| = \frac{1}{b b'} \ge \frac{1}{Q^2}
\]
since $b, b' \le Q$. Therefore, the elements in the Farey sequence of order $Q$ are $1/Q^2$-well spaced.
Using the non-zero elements of the Farey sequence of order $Q$, we conclude from Theorem 8.3.1 that
\[
\sum_{q \le Q} \ \sum_{\substack{a=1 \\ (a,q)=1}}^{q} \left| S\left( \frac{a}{q} \right) \right|^2 \le \left( N + 1 + 3Q^2 \right) \sum_{n=M+1}^{M+N} |a_n|^2. \tag{8.4.1}
\]
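The spacing property of Farey fractions used to obtain (8.4.1) can be verified exactly with rational arithmetic. The Python sketch below is illustrative; the names `farey_nonzero`, `circle_distance` and `min_spacing` are assumptions for this example. It lists the non-zero Farey fractions of order $Q$ and computes the minimum of $\| x - y \|$ over distinct pairs.

```python
from fractions import Fraction
from math import gcd


def farey_nonzero(Q):
    """Reduced fractions a/q with 1 <= a <= q <= Q and (a, q) = 1, in ascending order."""
    return sorted(
        {Fraction(a, q) for q in range(1, Q + 1) for a in range(1, q + 1) if gcd(a, q) == 1}
    )


def circle_distance(x, y):
    """||x - y||, the distance from x - y to the nearest integer, computed exactly."""
    d = (x - y) % 1          # lies in [0, 1)
    return min(d, 1 - d)


def min_spacing(points):
    """Minimum of ||x - y|| over distinct pairs of points (needs at least two points)."""
    return min(
        circle_distance(points[k], points[j])
        for k in range(len(points))
        for j in range(k)
    )


fractions_5 = farey_nonzero(5)
print(len(fractions_5), min_spacing(fractions_5))  # 10 1/20
```

For $Q = 5$ there are $\phi(1) + \cdots + \phi(5) = 10$ fractions, and the smallest gap is $\frac{1}{4} - \frac{1}{5} = \frac{1}{20}$, which is at least $\frac{1}{25}$ as claimed.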

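To prove Selberg's Theorem, we will choose $a_n$ such that
\[
a_n = 0 \quad \text{whenever } n \notin E. \tag{8.4.2}
\]
We will show that it suffices to prove the following inequality:
\[
\Big| \sum_{n=M+1}^{M+N} a_n \Big|^2 \mu^2(q) \prod_{p \mid q} \frac{\gamma(p)}{p - \gamma(p)} \le \sum_{\substack{a=1 \\ (a,q)=1}}^{q} \left| S\left( \frac{a}{q} \right) \right|^2. \tag{8.4.3}
\]
Suppose (8.4.3) is true, and let
\[
a_n = \begin{cases} 1 & \text{if } n \in E, \\ 0 & \text{otherwise.} \end{cases}
\]
Summing (8.4.3) over $q \le Q$, we find that
\[
\Big| \sum_{n=M+1}^{M+N} a_n \Big|^2 \sum_{q \le Q} \Big( \mu^2(q) \prod_{p \mid q} \frac{\gamma(p)}{p - \gamma(p)} \Big) \le \sum_{q \le Q} \ \sum_{\substack{a=1 \\ (a,q)=1}}^{q} \left| S\left( \frac{a}{q} \right) \right|^2.
\]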
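Hence,
\begin{align*}
|E|^2 \Big( \sum_{q \in \mathcal{Q}} \mu^2(q) \prod_{p \mid q} \frac{\gamma(p)}{p - \gamma(p)} \Big) &\le \sum_{q \le Q} \ \sum_{\substack{a=1 \\ (a,q)=1}}^{q} \left| S\left( \frac{a}{q} \right) \right|^2 \\
&\le (N + 1 + 3Q^2) \sum_{n=M+1}^{M+N} |a_n|^2 \\
&\le (N + 1 + 3Q^2) |E|.
\end{align*}
Therefore,
\[
|E| \le \frac{N + 1 + 3Q^2}{L},
\]
where
\[
L = \sum_{q \in \mathcal{Q}} \mu^2(q) \prod_{p \mid q} \frac{\gamma(p)}{p - \gamma(p)}.
\]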

We now prove (8.4.3). First, we observe that if $q$ is not squarefree, then $\mu(q) = 0$ and (8.4.3) is true since its right hand side is non-negative. From now on, we may assume $q$ to be squarefree.
Let
\[
J(q) = \mu^2(q) \prod_{p \mid q} \frac{\gamma(p)}{p - \gamma(p)}.
\]

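We now rewrite (8.4.3) as
\[
\sum_{\substack{a=1 \\ (a,q)=1}}^{q} \left| S\left( \frac{a}{q} \right) \right|^2 \ge |S(0)|^2 J(q). \tag{8.4.4}
\]
Note that if we can establish (8.4.4), then by replacing $a_n$ by $a_n e^{2\pi i n \beta}$, we find that
\[
\sum_{\substack{a=1 \\ (a,q)=1}}^{q} \left| S\left( \frac{a}{q} + \beta \right) \right|^2 \ge |S(\beta)|^2 J(q). \tag{8.4.5}
\]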

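Next, suppose we have proved (8.4.4) for $q$ and $q'$ with $(q, q') = 1$. Then by the Chinese Remainder Theorem, (8.4.4) and (8.4.5), we find that
\begin{align*}
\sum_{\substack{c=1 \\ (c,qq')=1}}^{qq'} \left| S\left( \frac{c}{qq'} \right) \right|^2
&= \sum_{\substack{a=1 \\ (a,q)=1}}^{q} \ \sum_{\substack{b=1 \\ (b,q')=1}}^{q'} \left| S\left( \frac{a}{q} + \frac{b}{q'} \right) \right|^2 \\
&\ge \sum_{\substack{a=1 \\ (a,q)=1}}^{q} \left| S\left( \frac{a}{q} \right) \right|^2 J(q') \\
&\ge |S(0)|^2 J(q) J(q') = |S(0)|^2 J(qq').
\end{align*}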

The two observations above show that we would only need to prove
(8.4.4) for q = p where p is a prime.
Now, let $p$ be a prime and let
\[
Z(p, a) = \sum_{\substack{n=M+1 \\ n \equiv a \ (\mathrm{mod}\ p)}}^{M+N} a_n.
\]
We note that
\[
|Z(p, a)|^2 = \sum_{\substack{M+1 \le m, n \le M+N \\ m \equiv n \equiv a \ (\mathrm{mod}\ p)}} a_n \overline{a_m}. \tag{8.4.6}
\]
Furthermore, if $n \equiv a \pmod{p}$ and $a \in \Omega_p$, then $n \notin E$. This is because an element $n$ in $E$ must satisfy $n \not\equiv a \pmod{p}$ for all $a \in \Omega_p$. Hence,
\[
Z(p, a) = 0 \quad \text{if } a \in \Omega_p, \tag{8.4.7}
\]
since (8.4.2) implies that $a_n = 0$ whenever $n \notin E$.
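Now,
\begin{align*}
\sum_{a=0}^{p-1} \left| S\left( \frac{a}{p} \right) \right|^2
&= \sum_{a=0}^{p-1} \Big| \sum_{n=M+1}^{M+N} a_n e^{2\pi i n \frac{a}{p}} \Big|^2 \\
&= \sum_{a=0}^{p-1} \ \sum_{M+1 \le n, m \le M+N} a_n \overline{a_m} e^{2\pi i (n-m) \frac{a}{p}} \\
&= p \sum_{\substack{M+1 \le n, m \le M+N \\ n \equiv m \ (\mathrm{mod}\ p)}} a_n \overline{a_m} \\
&= p \sum_{a=1}^{p} \ \sum_{\substack{M+1 \le n, m \le M+N \\ m \equiv n \equiv a \ (\mathrm{mod}\ p)}} a_n \overline{a_m} \\
&= p \sum_{a=1}^{p} |Z(p, a)|^2,
\end{align*}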

where we have used (8.4.6) in the last equality. In other words, we have
\[
\sum_{a=0}^{p-1} \left| S\left( \frac{a}{p} \right) \right|^2 = p \sum_{a=1}^{p} |Z(p, a)|^2. \tag{8.4.8}
\]
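Identity (8.4.8) is a finite Parseval-type relation and can be checked numerically. The Python sketch below is an illustration with hypothetical names (`identity_sides`) and randomly chosen coefficients. It evaluates $\sum_{a=0}^{p-1} |S(a/p)|^2$ directly and compares it with $p \sum_a |Z(p, a)|^2$, where $Z(p, a)$ collects the coefficients $a_n$ with $n \equiv a \pmod{p}$.

```python
import cmath
import random


def identity_sides(coeffs, M, p):
    """coeffs[i] is a_{M+1+i}. Returns (lhs, rhs) with
    lhs = sum_{a=0}^{p-1} |S(a/p)|^2 and rhs = p * sum over residue classes of |Z(p, a)|^2."""

    def S(x):
        return sum(
            a_n * cmath.exp(2j * cmath.pi * (M + 1 + i) * x)
            for i, a_n in enumerate(coeffs)
        )

    lhs = sum(abs(S(a / p)) ** 2 for a in range(p))

    # Z[r] accumulates a_n over all n congruent to r modulo p.
    Z = [0j] * p
    for i, a_n in enumerate(coeffs):
        Z[(M + 1 + i) % p] += a_n
    rhs = p * sum(abs(z) ** 2 for z in Z)
    return lhs, rhs


rng = random.Random(2)
coeffs = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
print(identity_sides(coeffs, M=3, p=7))
```

The two values should agree up to rounding error. The residues are indexed here by $0, \ldots, p-1$ rather than $1, \ldots, p$, which describes the same set of residue classes.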

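Let
\[
\chi_a = \begin{cases} 1 & \text{if } a \notin \Omega_p, \\ 0 & \text{otherwise.} \end{cases}
\]
By Cauchy's inequality
\[
\Big| \sum a_n b_n \Big|^2 \le \sum |a_n|^2 \sum |b_n|^2,
\]
we find that
\[
\Big| \sum_{a=1}^{p} Z(p, a) \chi_a \Big|^2 \le (p - \gamma(p)) \sum_{a=1}^{p} |Z(p, a)|^2. \tag{8.4.9}
\]
But by (8.4.7),
\[
\Big| \sum_{a=1}^{p} Z(p, a) \chi_a \Big|^2 = |S(0)|^2. \tag{8.4.10}
\]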

Using (8.4.8), (8.4.10) and (8.4.9), we conclude that
\[
|S(0)|^2 \le \frac{p - \gamma(p)}{p} \left( \sum_{a=1}^{p-1} \left| S\left( \frac{a}{p} \right) \right|^2 + |S(0)|^2 \right).
\]
Simplifying, we find that
\[
\sum_{a=1}^{p-1} \left| S\left( \frac{a}{p} \right) \right|^2 \ge \frac{\gamma(p)}{p - \gamma(p)} |S(0)|^2,
\]
and this completes the proof of (8.4.4) for a prime number $p$. The proof of Theorem 8.2.1 is complete.
