Fast Integer Multiplication

Algorithms:
COMP3121/3821/9101/9801
School of Computer Science and Engineering

University of New South Wales
4. FAST LARGE INTEGER MULTIPLICATION
COMP3121/3821/9101/9801 1 / 1
Basics revisited: how do we multiply two numbers?
The primary school algorithm:
        X X X X      <- first input integer
      * X X X X      <- second input integer
      ---------
        X X X X      \
      X X X X         \  O(n^2) intermediate operations:
    X X X X           /  O(n^2) elementary multiplications
  X X X X            /   + O(n^2) elementary additions
---------------
X X X X X X X X      <- result of length 2n
Can we do it faster than in $n^2$ steps?
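The operation count of the primary school method can be made concrete with a minimal sketch. It is an illustration only: the names `school_multiply` and `to_int`, and the choice of little-endian digit lists, are assumptions made for this example and do not come from the lecture.

```python
def school_multiply(a_digits, b_digits, base=10):
    """Multiply two little-endian digit lists.

    Returns (product digits, number of elementary digit multiplications).
    """
    result = [0] * (len(a_digits) + len(b_digits))
    mults = 0
    for i, a in enumerate(a_digits):
        carry = 0
        for j, b in enumerate(b_digits):
            total = result[i + j] + a * b + carry   # one elementary multiplication
            result[i + j] = total % base
            carry = total // base
            mults += 1
        result[i + len(b_digits)] += carry          # this position is still empty
    return result, mults


def to_int(digits, base=10):
    """Convert a little-endian digit list back to an integer (for checking only)."""
    return sum(d * base**i for i, d in enumerate(digits))


product, mults = school_multiply([4, 3, 2, 1], [8, 7, 6, 5])   # 1234 * 5678
print(to_int(product), mults)
```

For two inputs of n digits each, the inner statement runs exactly n^2 times, which is the quadratic cost the following slides try to beat.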
The Karatsuba trick
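Take the two $n$-bit input numbers $A$ and $B$, and split each of them into two halves of $n/2$ bits:
$$A = A_1 2^{n/2} + A_0, \qquad B = B_1 2^{n/2} + B_0,$$
where $A_1$ = MoreSignificantPart($A$) consists of the upper $n/2$ bits of $A$ and $A_0$ = LessSignificantPart($A$) of the lower $n/2$ bits; likewise for $B$.
$AB$ can now be calculated as follows:
$$AB = A_1 B_1 2^n + (A_1 B_0 + A_0 B_1) 2^{n/2} + A_0 B_0$$
$$\phantom{AB} = A_1 B_1 2^n + ((A_1 + A_0)(B_1 + B_0) - A_1 B_1 - A_0 B_0) 2^{n/2} + A_0 B_0$$
The second form needs only three multiplications of (roughly) $n/2$-bit numbers: $A_1 B_1$, $A_0 B_0$ and $(A_1 + A_0)(B_1 + B_0)$.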
function Mult(A, B)
    if |A| = |B| = 1 then return AB
    else
        A1 ← MoreSignificantPart(A);
        A0 ← LessSignificantPart(A);
        B1 ← MoreSignificantPart(B);
        B0 ← LessSignificantPart(B);
        U ← A0 + A1;
        V ← B0 + B1;
        X ← Mult(A0, B0);
        W ← Mult(A1, B1);
        Y ← Mult(U, V);
        return W · 2^n + (Y − X − W) · 2^(n/2) + X
    end if
end function
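The pseudocode above can be turned into a short runnable sketch. This is one possible rendering, not the lecture's own code: the name `karatsuba`, the use of Python's arbitrary-precision `int`, and the choice to split at half the bit length of the longer operand are assumptions of this example.

```python
def karatsuba(a: int, b: int) -> int:
    """Multiply two non-negative integers with three recursive half-size products."""
    if a < 2 or b < 2:                      # base case: an operand of at most one bit
        return a * b
    half = max(a.bit_length(), b.bit_length()) // 2
    mask = (1 << half) - 1
    a1, a0 = a >> half, a & mask            # more / less significant parts of a
    b1, b0 = b >> half, b & mask            # more / less significant parts of b
    x = karatsuba(a0, b0)                   # X = A0 * B0
    w = karatsuba(a1, b1)                   # W = A1 * B1
    y = karatsuba(a0 + a1, b0 + b1)         # Y = (A0 + A1) * (B0 + B1)
    # W * 2^(2*half) + (Y - X - W) * 2^half + X, using shifts only
    return (w << (2 * half)) + ((y - x - w) << half) + x


print(karatsuba(1234, 5678) == 1234 * 5678)
```

The final line uses only shifts, additions and subtractions, so all multiplication work happens in the three recursive calls.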
The Karatsuba trick
How many steps does this take? (Remember, addition takes linear time!)
Recurrence: $T(n) = 3T(n/2) + cn$
$a = 3$; $b = 2$; $f(n) = cn$; $n^{\log_b a} = n^{\log_2 3}$
Since $1.5 < \log_2 3 < 1.6$ we have
$f(n) = cn = O(n^{\log_2 3 - \varepsilon})$ for any $0 < \varepsilon < 0.5$.
Thus, the first case of the Master Theorem applies.

Consequently,
$$T(n) = \Theta(n^{\log_2 3}) = O(n^{1.585})$$
without going through the messy calculations!
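As a sanity check on the Master Theorem result, the multiplication count alone can be unrolled directly. The sketch below is a simplification under stated assumptions: $n$ is a power of two and the one extra bit that the sums $A_0 + A_1$ and $B_0 + B_1$ may carry is ignored; the name `karatsuba_mults` is made up for this example.

```python
import math


def karatsuba_mults(n: int) -> int:
    """One-bit multiplications used on n-bit inputs (n a power of two): M(n) = 3 M(n/2)."""
    return 1 if n == 1 else 3 * karatsuba_mults(n // 2)


for n in (2, 16, 1024):
    # columns: n, Karatsuba count, n^log2(3), primary school count n^2
    print(n, karatsuba_mults(n), round(n ** math.log2(3)), n * n)
```

Unrolling gives $M(n) = 3^{\log_2 n} = n^{\log_2 3}$, matching the bound above.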
Generalizing Karatsuba’s algorithm
Can we do better if we break the numbers into more than two pieces?
Let's try breaking the numbers $A$, $B$ into 3 pieces; then with $k = n/3$ we obtain three blocks $A_2$, $A_1$, $A_0$ of $k$ bits each (from most to least significant), i.e.,
$$A = A_2 2^{2k} + A_1 2^k + A_0$$
$$B = B_2 2^{2k} + B_1 2^k + B_0$$
So,
$$AB = A_2 B_2 2^{4k} + (A_2 B_1 + A_1 B_2) 2^{3k} + (A_2 B_0 + A_1 B_1 + A_0 B_2) 2^{2k} + (A_1 B_0 + A_0 B_1) 2^k + A_0 B_0$$
The Karatsuba trick
In
$$AB = A_2 B_2 2^{4k} + (A_2 B_1 + A_1 B_2) 2^{3k} + (A_2 B_0 + A_1 B_1 + A_0 B_2) 2^{2k} + (A_1 B_0 + A_0 B_1) 2^k + A_0 B_0$$
we need only 5 coefficients:
$$C_4 = A_2 B_2$$
$$C_3 = A_2 B_1 + A_1 B_2$$
$$C_2 = A_2 B_0 + A_1 B_1 + A_0 B_2$$
$$C_1 = A_1 B_0 + A_0 B_1$$
$$C_0 = A_0 B_0$$
Can we get these with 5 multiplications only?

Should we perhaps look at
$$(A_2 + A_1 + A_0)(B_2 + B_1 + B_0) = A_0 B_0 + A_1 B_0 + A_2 B_0 + A_0 B_1 + A_1 B_1 + A_2 B_1 + A_0 B_2 + A_1 B_2 + A_2 B_2\ ?$$
It is not at all clear how to get $C_0, \dots, C_4$ with only 5 multiplications ...

The Karatsuba trick: slicing into 3 pieces
We now look for a method for getting these coefficients without any guesswork!
Let
$$A = A_2 2^{2k} + A_1 2^k + A_0$$
$$B = B_2 2^{2k} + B_1 2^k + B_0$$
We form the naturally corresponding polynomials:
$$P_A(x) = A_2 x^2 + A_1 x + A_0;$$
$$P_B(x) = B_2 x^2 + B_1 x + B_0.$$
Note that
$$A = A_2 (2^k)^2 + A_1 2^k + A_0 = P_A(2^k);$$
$$B = B_2 (2^k)^2 + B_1 2^k + B_0 = P_B(2^k).$$
If we somehow manage to compute the product polynomial
$$P_C(x) = P_A(x) P_B(x) = C_4 x^4 + C_3 x^3 + C_2 x^2 + C_1 x + C_0$$
with only 5 multiplications, we can then obtain the product of the numbers $A$ and $B$ simply as
$$A \cdot B = P_A(2^k) P_B(2^k) = P_C(2^k) = C_4 2^{4k} + C_3 2^{3k} + C_2 2^{2k} + C_1 2^k + C_0.$$
Note that the right-hand side involves only shifts and additions.
Since the product polynomial $P_C(x) = P_A(x) P_B(x)$ is of degree 4, we need 5 values to uniquely determine $P_C(x)$.
We choose the 5 integer values that are smallest in absolute value, i.e., $-2, -1, 0, 1, 2$.
Thus, we compute
$$P_A(-2),\ P_A(-1),\ P_A(0),\ P_A(1),\ P_A(2)$$
$$P_B(-2),\ P_B(-1),\ P_B(0),\ P_B(1),\ P_B(2)$$
For $P_A(x) = A_2 x^2 + A_1 x + A_0$ we have
$$P_A(-2) = A_2 (-2)^2 + A_1 (-2) + A_0 = 4A_2 - 2A_1 + A_0$$
$$P_A(-1) = A_2 (-1)^2 + A_1 (-1) + A_0 = A_2 - A_1 + A_0$$
$$P_A(0) = A_2 \cdot 0^2 + A_1 \cdot 0 + A_0 = A_0$$
$$P_A(1) = A_2 \cdot 1^2 + A_1 \cdot 1 + A_0 = A_2 + A_1 + A_0$$
$$P_A(2) = A_2 \cdot 2^2 + A_1 \cdot 2 + A_0 = 4A_2 + 2A_1 + A_0.$$
Similarly, for $P_B(x) = B_2 x^2 + B_1 x + B_0$ we have
$$P_B(-2) = B_2 (-2)^2 + B_1 (-2) + B_0 = 4B_2 - 2B_1 + B_0$$
$$P_B(-1) = B_2 (-1)^2 + B_1 (-1) + B_0 = B_2 - B_1 + B_0$$
$$P_B(0) = B_2 \cdot 0^2 + B_1 \cdot 0 + B_0 = B_0$$
$$P_B(1) = B_2 \cdot 1^2 + B_1 \cdot 1 + B_0 = B_2 + B_1 + B_0$$
$$P_B(2) = B_2 \cdot 2^2 + B_1 \cdot 2 + B_0 = 4B_2 + 2B_1 + B_0.$$
These evaluations involve only additions because $2A = A + A$; $4A = 2A + 2A$.

Having obtained $P_A(-2), P_A(-1), P_A(0), P_A(1), P_A(2)$ and $P_B(-2), P_B(-1), P_B(0), P_B(1), P_B(2)$, we can now obtain $P_C(-2), P_C(-1), P_C(0), P_C(1), P_C(2)$ with only 5 multiplications of large numbers:
$$P_C(-2) = P_A(-2) P_B(-2) = (A_0 - 2A_1 + 4A_2)(B_0 - 2B_1 + 4B_2)$$
$$P_C(-1) = P_A(-1) P_B(-1) = (A_0 - A_1 + A_2)(B_0 - B_1 + B_2)$$
$$P_C(0) = P_A(0) P_B(0) = A_0 B_0$$
$$P_C(1) = P_A(1) P_B(1) = (A_0 + A_1 + A_2)(B_0 + B_1 + B_2)$$
$$P_C(2) = P_A(2) P_B(2) = (A_0 + 2A_1 + 4A_2)(B_0 + 2B_1 + 4B_2)$$
Thus, if we represent the product $P_C(x) = P_A(x) P_B(x)$ in coefficient form as $P_C(x) = C_4 x^4 + C_3 x^3 + C_2 x^2 + C_1 x + C_0$, we get
$$C_4 (-2)^4 + C_3 (-2)^3 + C_2 (-2)^2 + C_1 (-2) + C_0 = P_C(-2) = P_A(-2) P_B(-2)$$
$$C_4 (-1)^4 + C_3 (-1)^3 + C_2 (-1)^2 + C_1 (-1) + C_0 = P_C(-1) = P_A(-1) P_B(-1)$$
$$C_4 \cdot 0^4 + C_3 \cdot 0^3 + C_2 \cdot 0^2 + C_1 \cdot 0 + C_0 = P_C(0) = P_A(0) P_B(0)$$
$$C_4 \cdot 1^4 + C_3 \cdot 1^3 + C_2 \cdot 1^2 + C_1 \cdot 1 + C_0 = P_C(1) = P_A(1) P_B(1)$$
$$C_4 \cdot 2^4 + C_3 \cdot 2^3 + C_2 \cdot 2^2 + C_1 \cdot 2 + C_0 = P_C(2) = P_A(2) P_B(2).$$
Simplifying the left-hand sides we obtain
$$16C_4 - 8C_3 + 4C_2 - 2C_1 + C_0 = P_C(-2)$$
$$C_4 - C_3 + C_2 - C_1 + C_0 = P_C(-1)$$
$$C_0 = P_C(0)$$
$$C_4 + C_3 + C_2 + C_1 + C_0 = P_C(1)$$
$$16C_4 + 8C_3 + 4C_2 + 2C_1 + C_0 = P_C(2)$$
Solving this system of linear equations for $C_0, C_1, C_2, C_3, C_4$ we obtain
$$C_0 = P_C(0)$$
$$C_1 = \frac{P_C(-2)}{12} - \frac{2P_C(-1)}{3} + \frac{2P_C(1)}{3} - \frac{P_C(2)}{12}$$
$$C_2 = -\frac{P_C(-2)}{24} + \frac{2P_C(-1)}{3} - \frac{5P_C(0)}{4} + \frac{2P_C(1)}{3} - \frac{P_C(2)}{24}$$
$$C_3 = -\frac{P_C(-2)}{12} + \frac{P_C(-1)}{6} - \frac{P_C(1)}{6} + \frac{P_C(2)}{12}$$
$$C_4 = \frac{P_C(-2)}{24} - \frac{P_C(-1)}{6} + \frac{P_C(0)}{4} - \frac{P_C(1)}{6} + \frac{P_C(2)}{24}$$
Note that these expressions do not involve any multiplications of TWO large numbers and thus can be done in linear time.
With the coefficients $C_0, C_1, C_2, C_3, C_4$ obtained, we can now form the polynomial $P_C(x) = C_0 + C_1 x + C_2 x^2 + C_3 x^3 + C_4 x^4$.
We can now compute $P_C(2^k) = C_0 + C_1 2^k + C_2 2^{2k} + C_3 2^{3k} + C_4 2^{4k}$ in linear time, because computing $P_C(2^k)$ involves only binary shifts of the coefficients plus additions, which take $O(k)$ time.
Thus we have obtained $A \cdot B = P_A(2^k) P_B(2^k) = P_C(2^k)$ with only 5 multiplications!
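The five interpolation formulas can be checked on a small example with exact rational arithmetic. The sketch below is illustrative only: the function name `interpolate5` and the sample polynomials $3x^2 + 2x + 1$ and $6x^2 + 5x + 4$ are assumptions chosen for this check, with small integers standing in for the large slices.

```python
from fractions import Fraction as F


def interpolate5(p):
    """p maps each point m in {-2, ..., 2} to P_C(m); returns [C0, C1, C2, C3, C4]."""
    c0 = F(p[0])
    c1 = F(p[-2], 12) - F(2 * p[-1], 3) + F(2 * p[1], 3) - F(p[2], 12)
    c2 = -F(p[-2], 24) + F(2 * p[-1], 3) - F(5 * p[0], 4) + F(2 * p[1], 3) - F(p[2], 24)
    c3 = -F(p[-2], 12) + F(p[-1], 6) - F(p[1], 6) + F(p[2], 12)
    c4 = F(p[-2], 24) - F(p[-1], 6) + F(p[0], 4) - F(p[1], 6) + F(p[2], 24)
    return [c0, c1, c2, c3, c4]


# (3x^2 + 2x + 1)(6x^2 + 5x + 4) = 18x^4 + 27x^3 + 28x^2 + 13x + 4
values = {m: (3 * m * m + 2 * m + 1) * (6 * m * m + 5 * m + 4) for m in range(-2, 3)}
print([int(c) for c in interpolate5(values)])   # coefficients C0 .. C4
```

Only the five products stored in `values` multiply two "large" quantities; everything in `interpolate5` is scaling by small constants and adding.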
Here is the complete algorithm:
function Mult(A, B)
    if A and B are short enough then return AB computed directly
    obtain A0, A1, A2 and B0, B1, B2 such that
        A = A2·2^(2k) + A1·2^k + A0;  B = B2·2^(2k) + B1·2^k + B0;
    form polynomials PA(x) = A2 x^2 + A1 x + A0;  PB(x) = B2 x^2 + B1 x + B0;
    PA(−2) ← 4A2 − 2A1 + A0;    PB(−2) ← 4B2 − 2B1 + B0;
    PA(−1) ← A2 − A1 + A0;      PB(−1) ← B2 − B1 + B0;
    PA(0)  ← A0;                PB(0)  ← B0;
    PA(1)  ← A2 + A1 + A0;      PB(1)  ← B2 + B1 + B0;
    PA(2)  ← 4A2 + 2A1 + A0;    PB(2)  ← 4B2 + 2B1 + B0;
    PC(−2) ← Mult(PA(−2), PB(−2));  PC(−1) ← Mult(PA(−1), PB(−1));
    PC(0)  ← Mult(PA(0), PB(0));
    PC(1)  ← Mult(PA(1), PB(1));    PC(2)  ← Mult(PA(2), PB(2));
    C0 ← PC(0);
    C1 ← PC(−2)/12 − 2PC(−1)/3 + 2PC(1)/3 − PC(2)/12;
    C2 ← −PC(−2)/24 + 2PC(−1)/3 − 5PC(0)/4 + 2PC(1)/3 − PC(2)/24;
    C3 ← −PC(−2)/12 + PC(−1)/6 − PC(1)/6 + PC(2)/12;
    C4 ← PC(−2)/24 − PC(−1)/6 + PC(0)/4 − PC(1)/6 + PC(2)/24;
    form PC(x) = C4 x^4 + C3 x^3 + C2 x^2 + C1 x + C0; compute
        PC(2^k) = C4·2^(4k) + C3·2^(3k) + C2·2^(2k) + C1·2^k + C0
    return PC(2^k) = A · B.
end function
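A runnable sketch of the three-slice algorithm follows. It is a hedged illustration rather than a tuned implementation: the name `toom3`, the cutoff of $2^{16}$ below which numbers are multiplied directly, and the explicit sign handling (needed because evaluations at $-1$ and $-2$ can be negative) are all assumptions of this example.

```python
def toom3(a: int, b: int) -> int:
    """Multiply integers using five recursive products of about one-third size."""
    if abs(a) < 2**16 or abs(b) < 2**16:      # assumed cutoff: multiply directly
        return a * b
    sign = -1 if (a < 0) != (b < 0) else 1
    a, b = abs(a), abs(b)
    k = (max(a.bit_length(), b.bit_length()) + 2) // 3   # bits per slice
    mask = (1 << k) - 1
    a0, a1, a2 = a & mask, (a >> k) & mask, a >> (2 * k)
    b0, b1, b2 = b & mask, (b >> k) & mask, b >> (2 * k)

    def evaluate(c2, c1, c0):
        """Values of c2*x^2 + c1*x + c0 at x = -2, -1, 0, 1, 2."""
        return {-2: 4 * c2 - 2 * c1 + c0, -1: c2 - c1 + c0, 0: c0,
                1: c2 + c1 + c0, 2: 4 * c2 + 2 * c1 + c0}

    pa, pb = evaluate(a2, a1, a0), evaluate(b2, b1, b0)
    pc = {m: toom3(pa[m], pb[m]) for m in pa}            # the 5 large products

    # Interpolation formulas brought to common denominators; divisions are exact.
    c0 = pc[0]
    c1 = (pc[-2] - 8 * pc[-1] + 8 * pc[1] - pc[2]) // 12
    c2 = (-pc[-2] + 16 * pc[-1] - 30 * pc[0] + 16 * pc[1] - pc[2]) // 24
    c3 = (-pc[-2] + 2 * pc[-1] - 2 * pc[1] + pc[2]) // 12
    c4 = (pc[-2] - 4 * pc[-1] + 6 * pc[0] - 4 * pc[1] + pc[2]) // 24
    # P_C(2^k) by shifts and additions
    return sign * ((c4 << (4 * k)) + (c3 << (3 * k)) + (c2 << (2 * k)) + (c1 << k) + c0)


a, b = 3**200, 7**150 + 12345
print(toom3(a, b) == a * b)
```

Each evaluation is smaller than $7 \cdot 2^k$ in absolute value, so the recursive operands have at most $k + 3$ bits and the recursion terminates.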
How fast is this algorithm?
We have replaced a multiplication of two $n$-bit numbers with 5 multiplications of $n/3$-bit numbers, with an overhead of additions, shifts and the like, all doable in linear time $cn$; thus,
$$T(n) = 5T(n/3) + cn.$$
We now apply the Master Theorem:
we have $a = 5$, $b = 3$, so we consider $n^{\log_b a} = n^{\log_3 5} \approx n^{1.465}$.
Since $f(n) = cn = O(n^{\log_3 5 - \varepsilon})$ for any $0 < \varepsilon < 0.4$, the first case of the Master Theorem applies and we get
$$T(n) = \Theta(n^{\log_3 5}) = O(n^{1.47}).$$
Recall that the original Karatsuba algorithm runs in time
$$n^{\log_2 3} \approx n^{1.58} > n^{1.47}.$$
Thus, we got an asymptotically faster algorithm.
Then why not slice the numbers $A$ and $B$ into an even larger number of slices?
Maybe we can get an even faster algorithm?
The answer is, in a sense, BOTH yes and no, so let's see what happens if we slice the numbers into $n + 1$ equal slices...
The general case: slicing the input numbers A, B into n + 1 slices
From here on, $n + 1$ denotes the number of slices (so $n$ is no longer the number of bits).
For simplicity, let $A$, $B$ have $(n + 1)k$ bits ($k$ can be arbitrarily large).
Slice $A$, $B$ into $n + 1$ pieces $A_n, A_{n-1}, \dots, A_0$ and $B_n, B_{n-1}, \dots, B_0$ of $k$ bits each, $(n + 1)k$ bits in total:
$$A = A_n 2^{kn} + A_{n-1} 2^{k(n-1)} + \cdots + A_0$$
$$B = B_n 2^{kn} + B_{n-1} 2^{k(n-1)} + \cdots + B_0$$
We form the naturally corresponding polynomials:
$$P_A(x) = A_n x^n + A_{n-1} x^{n-1} + \cdots + A_0$$
$$P_B(x) = B_n x^n + B_{n-1} x^{n-1} + \cdots + B_0$$
As before, we have:
$$A = P_A(2^k); \quad B = P_B(2^k); \quad AB = P_A(2^k) P_B(2^k) = (P_A(x) \cdot P_B(x))\big|_{x = 2^k}$$
Since
$$AB = (P_A(x) \cdot P_B(x))\big|_{x = 2^k}$$
we adopt the following strategy:
we will first figure out how to multiply polynomials fast to obtain
$$P_C(x) = P_A(x) \cdot P_B(x);$$
then we evaluate $P_C(2^k)$.
Note that $P_C(x) = P_A(x) \cdot P_B(x)$ is of degree $2n$:
$$P_C(x) = \sum_{j=0}^{2n} C_j x^j$$
Example:
$$(a_3 x^3 + a_2 x^2 + a_1 x + a_0)(b_3 x^3 + b_2 x^2 + b_1 x + b_0) =$$
$$a_3 b_3 x^6 + (a_2 b_3 + a_3 b_2) x^5 + (a_1 b_3 + a_2 b_2 + a_3 b_1) x^4$$
$$+ (a_0 b_3 + a_1 b_2 + a_2 b_1 + a_3 b_0) x^3 + (a_0 b_2 + a_1 b_1 + a_2 b_0) x^2$$
$$+ (a_0 b_1 + a_1 b_0) x + a_0 b_0$$
In general: for
$$P_A(x) = A_n x^n + A_{n-1} x^{n-1} + \cdots + A_0$$
$$P_B(x) = B_n x^n + B_{n-1} x^{n-1} + \cdots + B_0$$
we have
$$P_A(x) \cdot P_B(x) = \sum_{j=0}^{2n} \Big( \sum_{i+k=j} A_i B_k \Big) x^j = \sum_{j=0}^{2n} C_j x^j$$
We need to find the coefficients $C_j = \sum_{i+k=j} A_i B_k$ without performing the $(n + 1)^2$ multiplications necessary to get all products of the form $A_i B_k$.
A VERY IMPORTANT DIGRESSION:
If you have two sequences $\vec{A} = (A_0, A_1, \dots, A_{n-1}, A_n)$ and $\vec{B} = (B_0, B_1, \dots, B_{m-1}, B_m)$, and if you form the two corresponding polynomials
$$P_A(x) = A_n x^n + A_{n-1} x^{n-1} + \dots + A_1 x + A_0$$
$$P_B(x) = B_m x^m + B_{m-1} x^{m-1} + \dots + B_1 x + B_0$$
and if you multiply these two polynomials to obtain their product
$$P_A(x) \cdot P_B(x) = \sum_{j=0}^{m+n} \Big( \sum_{i+k=j} A_i B_k \Big) x^j = \sum_{j=0}^{n+m} C_j x^j$$
then the sequence $\vec{C} = (C_0, C_1, \dots, C_{n+m})$ of the coefficients of the product polynomial, with these coefficients given by
$$C_j = \sum_{i+k=j} A_i B_k, \quad \text{for } 0 \le j \le n + m,$$
is extremely important and is called the LINEAR CONVOLUTION of the sequences $\vec{A}$ and $\vec{B}$, and is denoted by $\vec{C} = \vec{A} \star \vec{B}$.
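The definition of the linear convolution translates directly into code. The following is a minimal sketch of the direct (quadratic) computation; the function name `linear_convolution` and the use of plain Python lists indexed from the constant term upward are assumptions of this example.

```python
def linear_convolution(a, b):
    """Return C with C[j] = sum of a[i] * b[k] over all index pairs with i + k = j."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for k, bk in enumerate(b):
            c[i + k] += ai * bk
    return c


# (1 + 2x + 3x^2)(4 + 5x) = 4 + 13x + 22x^2 + 15x^3
print(linear_convolution([1, 2, 3], [4, 5]))
```

This direct method performs $(n + 1)(m + 1)$ multiplications, which is exactly the cost the rest of the lecture sets out to avoid.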
AN IMPORTANT DIGRESSION:
For example, if you have an audio signal and you want to emphasise the bass
sounds, you would pass the sequence of discrete samples of the signal through
a digital filter which amplifies the low frequencies more than the medium and
the high audio frequencies.
This is accomplished by computing the linear convolution of the sequence of
discrete samples of the signal with a sequence of values which correspond to
that filter, called the impulse response of the filter.
This means that the samples of the output sound are simply the coefficients of
the product of two polynomials:
1. the polynomial $P_A(x)$ whose coefficients $A_i$ are the samples of the input signal;
2. the polynomial $P_B(x)$ whose coefficients $B_k$ are the samples of the so-called impulse response of the filter (they depend on what kind of filtering you want to do).
Convolutions are the bread and butter of signal processing, and for that reason it is extremely important to find fast ways of multiplying two polynomials of possibly very large degrees.
In signal processing these degrees can be greater than 1000.
This is the main reason for us to study methods of fast computation of convolutions (aside from finding products of large integers, which is what we are doing at the moment).
Coefficient vs value representation of polynomials
Every polynomial $P_A(x)$ of degree $n$ is uniquely determined by its values at any $n + 1$ distinct input values $x_0, x_1, \dots, x_n$:
$$P_A(x) \leftrightarrow \{(x_0, P_A(x_0)), (x_1, P_A(x_1)), \dots, (x_n, P_A(x_n))\}$$
For $P_A(x) = A_n x^n + A_{n-1} x^{n-1} + \dots + A_0$, these values can be obtained via a matrix multiplication:
$$\begin{pmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^n \\
1 & x_1 & x_1^2 & \cdots & x_1^n \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^n
\end{pmatrix}
\begin{pmatrix} A_0 \\ A_1 \\ \vdots \\ A_n \end{pmatrix}
=
\begin{pmatrix} P_A(x_0) \\ P_A(x_1) \\ \vdots \\ P_A(x_n) \end{pmatrix} \tag{1}$$
It can be shown that if the $x_i$ are all distinct then this matrix is invertible.
Such a matrix is called the Vandermonde matrix.
Coefficient vs value representation of polynomials - ctd.
Thus, if the $x_i$ are all distinct, given any values $P_A(x_0), P_A(x_1), \dots, P_A(x_n)$ the coefficients $A_0, A_1, \dots, A_n$ of the polynomial $P_A(x)$ are uniquely determined:
$$\begin{pmatrix} A_0 \\ A_1 \\ \vdots \\ A_n \end{pmatrix}
=
\begin{pmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^n \\
1 & x_1 & x_1^2 & \cdots & x_1^n \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^n
\end{pmatrix}^{-1}
\begin{pmatrix} P_A(x_0) \\ P_A(x_1) \\ \vdots \\ P_A(x_n) \end{pmatrix} \tag{2}$$
Equations (1) and (2) show how we can convert between:
1. a representation of a polynomial $P_A(x)$ via its coefficients $A_n, A_{n-1}, \dots, A_0$, i.e. $P_A(x) = A_n x^n + \dots + A_1 x + A_0$
2. a representation of a polynomial $P_A(x)$ via its values
$$P_A(x) \leftrightarrow \{(x_0, P_A(x_0)), (x_1, P_A(x_1)), \dots, (x_n, P_A(x_n))\}$$
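Equations (1) and (2) can be tried out on a small example. The sketch below uses exact rational arithmetic; the names `vandermonde`, `evaluate` and `interpolate` are illustrative, and solving the system by Gauss-Jordan elimination (instead of forming the inverse matrix explicitly) is a choice made for this example, not something the slides prescribe.

```python
from fractions import Fraction


def vandermonde(xs):
    """One row (1, x, x^2, ..., x^n) for each point x in xs."""
    return [[Fraction(x) ** j for j in range(len(xs))] for x in xs]


def evaluate(coeffs, xs):
    """Equation (1): values = V * coeffs, with coeffs listed as A_0, ..., A_n."""
    return [sum(v * c for v, c in zip(row, coeffs)) for row in vandermonde(xs)]


def interpolate(xs, values):
    """Equation (2): recover A_0, ..., A_n by solving V * coeffs = values."""
    n = len(xs)
    rows = [row + [Fraction(v)] for row, v in zip(vandermonde(xs), values)]
    for col in range(n):
        # distinct points guarantee that a non-zero pivot exists
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [entry / rows[col][col] for entry in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [e - factor * p for e, p in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


coeffs = [5, -3, 2]                       # P_A(x) = 2x^2 - 3x + 5
points = [-1, 0, 1]
values = evaluate(coeffs, points)
print(values, interpolate(points, values))
```

Going from coefficients to values and back returns the original coefficients, as uniqueness of the representation requires.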
Coefficient vs value representation of polynomials- ctd.
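If we fix the inputs $x_0, x_1, \dots, x_n$ then converting between a representation of a polynomial $P_A(x)$ via its coefficients and a representation via its values at these points is done via the following two matrix multiplications, with matrices made up from constants:
$$\begin{pmatrix} P_A(x_0) \\ P_A(x_1) \\ \vdots \\ P_A(x_n) \end{pmatrix}
=
\begin{pmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^n \\
1 & x_1 & x_1^2 & \cdots & x_1^n \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^n
\end{pmatrix}
\begin{pmatrix} A_0 \\ A_1 \\ \vdots \\ A_n \end{pmatrix};$$
$$\begin{pmatrix} A_0 \\ A_1 \\ \vdots \\ A_n \end{pmatrix}
=
\begin{pmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^n \\
1 & x_1 & x_1^2 & \cdots & x_1^n \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^n
\end{pmatrix}^{-1}
\begin{pmatrix} P_A(x_0) \\ P_A(x_1) \\ \vdots \\ P_A(x_n) \end{pmatrix}.$$
Thus, for fixed input values $x_0, \dots, x_n$ (and hence fixed $n$), both matrices have constant size and constant entries, so this switch between the two kinds of representations is done in time linear in the number of bits of the coefficients!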
Our strategy to multiply polynomials fast:
1. Given two polynomials of degree at most $n$,
$$P_A(x) = A_n x^n + \dots + A_0; \quad P_B(x) = B_n x^n + \dots + B_0$$
convert them into value representation at $2n + 1$ distinct points $x_0, x_1, \dots, x_{2n}$:
$$P_A(x) \leftrightarrow \{(x_0, P_A(x_0)), (x_1, P_A(x_1)), \dots, (x_{2n}, P_A(x_{2n}))\}$$
$$P_B(x) \leftrightarrow \{(x_0, P_B(x_0)), (x_1, P_B(x_1)), \dots, (x_{2n}, P_B(x_{2n}))\}$$
Note: since the product of the two polynomials will be of degree $2n$ we need the values of $P_A(x)$ and $P_B(x)$ at $2n + 1$ points, rather than just $n + 1$ points!
2. Multiply these two polynomials point-wise, using $2n + 1$ multiplications only:
$$P_A(x) P_B(x) \leftrightarrow \{(x_0, \underbrace{P_A(x_0) P_B(x_0)}_{P_C(x_0)}), (x_1, \underbrace{P_A(x_1) P_B(x_1)}_{P_C(x_1)}), \dots, (x_{2n}, \underbrace{P_A(x_{2n}) P_B(x_{2n})}_{P_C(x_{2n})})\}$$
3. Convert this value representation of $P_C(x) = P_A(x) P_B(x)$ back to coefficient form
$$P_C(x) = C_{2n} x^{2n} + C_{2n-1} x^{2n-1} + \dots + C_1 x + C_0;$$
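The three steps above can be sketched end to end for small polynomials. This is an illustration under stated assumptions: the evaluation points are taken to be the integers $-n, \dots, n$, the conversion back to coefficients uses Lagrange interpolation with exact fractions (one of several ways to invert the Vandermonde system), and all function names are made up for this example.

```python
from fractions import Fraction


def poly_eval(coeffs, x):
    """Horner evaluation; coeffs are listed from the constant term upward."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def lagrange_coeffs(xs, ys):
    """Coefficients of the unique polynomial of degree < len(xs) through the points."""
    n = len(xs)
    total = [Fraction(0)] * n
    for i in range(n):
        basis = [Fraction(1)]                 # product of (x - xs[j]) over j != i
        denom = Fraction(1)
        for j in range(n):
            if j != i:
                basis = [Fraction(0)] + basis             # multiply by x ...
                for t in range(len(basis) - 1):
                    basis[t] -= xs[j] * basis[t + 1]      # ... and subtract xs[j] * old
                denom *= xs[i] - xs[j]
        for t in range(n):
            total[t] += ys[i] * basis[t] / denom
    return total


def multiply_via_values(a, b):
    """Step 1: evaluate at 2n+1 points; step 2: multiply point-wise; step 3: interpolate."""
    n = max(len(a), len(b)) - 1
    xs = list(range(-n, n + 1))
    values = [poly_eval(a, x) * poly_eval(b, x) for x in xs]   # 2n + 1 multiplications
    return [int(c) for c in lagrange_coeffs(xs, values)]


# (1 + 2x + 3x^2)(4 + 5x) = 4 + 13x + 22x^2 + 15x^3 (+ 0x^4)
print(multiply_via_values([1, 2, 3], [4, 5]))
```

Only the list comprehension building `values` multiplies two evaluated quantities; everything else is evaluation and interpolation with constants that depend on the chosen points.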
Fast multiplication of polynomials - continued
What values should we choose for $x_0, x_1, \dots, x_{2n}$?

Key idea: use the $2n + 1$ integer values that are smallest in absolute value!
$$\{-n, -(n - 1), \dots, -1, 0, 1, \dots, n - 1, n\}$$
So we find the values $P_A(m)$ and $P_B(m)$ for all $m$ such that $-n \le m \le n$.
Remember that $n + 1$ is the number of slices into which we split the input numbers $A$, $B$.
Multiplication of a large number $A$ with $k$ bits by a constant integer $d$ can be done in time linear in $k$ because it is reducible to $d - 1$ additions:
$$d \cdot A = \underbrace{A + A + \dots + A}_{d}$$
Thus, all the values
$$P_A(m) = A_n m^n + A_{n-1} m^{n-1} + \cdots + A_0, \quad -n \le m \le n,$$
$$P_B(m) = B_n m^n + B_{n-1} m^{n-1} + \cdots + B_0, \quad -n \le m \le n,$$
can be found in time linear in the number of bits of the input numbers!
Fast multiplication of polynomials - ctd.
We now perform $2n + 1$ multiplications of large numbers to obtain
$$P_A(-n) P_B(-n),\ \dots,\ P_A(-1) P_B(-1),\ P_A(0) P_B(0),\ P_A(1) P_B(1),\ \dots,\ P_A(n) P_B(n)$$
For $P_C(x) = P_A(x) P_B(x)$ these products are $2n + 1$ values of $P_C(x)$:
$$P_C(-n) = P_A(-n) P_B(-n),\ \dots,\ P_C(0) = P_A(0) P_B(0),\ \dots,\ P_C(n) = P_A(n) P_B(n)$$
Let $C_0, C_1, \dots, C_{2n}$ be the coefficients of the product polynomial $P_C(x)$, i.e., let
$$P_C(x) = C_{2n} x^{2n} + C_{2n-1} x^{2n-1} + \cdots + C_0.$$
We now have:
$$C_{2n} (-n)^{2n} + C_{2n-1} (-n)^{2n-1} + \cdots + C_0 = P_C(-n)$$
$$C_{2n} (-(n-1))^{2n} + C_{2n-1} (-(n-1))^{2n-1} + \cdots + C_0 = P_C(-(n-1))$$
$$\vdots$$
$$C_{2n} (n-1)^{2n} + C_{2n-1} (n-1)^{2n-1} + \cdots + C_0 = P_C(n-1)$$
$$C_{2n} n^{2n} + C_{2n-1} n^{2n-1} + \cdots + C_0 = P_C(n)$$
Fast multiplication of polynomials - ctd.
This is just a system of linear equations that can be solved for $C_0, C_1, \dots, C_{2n}$:
$$\begin{pmatrix}
1 & -n & (-n)^2 & \cdots & (-n)^{2n} \\
1 & -(n-1) & (-(n-1))^2 & \cdots & (-(n-1))^{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & n & n^2 & \cdots & n^{2n}
\end{pmatrix}
\begin{pmatrix} C_0 \\ C_1 \\ \vdots \\ C_{2n} \end{pmatrix}
=
\begin{pmatrix} P_C(-n) \\ P_C(-(n-1)) \\ \vdots \\ P_C(n) \end{pmatrix},$$
i.e., we can obtain $C_0, C_1, \dots, C_{2n}$ as
$$\begin{pmatrix} C_0 \\ C_1 \\ \vdots \\ C_{2n} \end{pmatrix}
=
\begin{pmatrix}
1 & -n & (-n)^2 & \cdots & (-n)^{2n} \\
1 & -(n-1) & (-(n-1))^2 & \cdots & (-(n-1))^{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & n & n^2 & \cdots & n^{2n}
\end{pmatrix}^{-1}
\begin{pmatrix} P_C(-n) \\ P_C(-(n-1)) \\ \vdots \\ P_C(n) \end{pmatrix}.$$
But the inverse matrix also involves only constants which depend only on $n$;
thus the coefficients $C_i$ can be obtained in linear time.
So here is the algorithm we have just described:
function Mult(n, A, B)
    if |A| = |B| = 1 then return AB
    else
        obtain n + 1 slices A0, A1, ..., An and B0, B1, ..., Bn such that
            A = An·2^(nk) + A(n−1)·2^((n−1)k) + ... + A0
            B = Bn·2^(nk) + B(n−1)·2^((n−1)k) + ... + B0
        form polynomials
            PA(x) = An x^n + A(n−1) x^(n−1) + ... + A0
            PB(x) = Bn x^n + B(n−1) x^(n−1) + ... + B0
        for m = −n to m = n do
            compute PA(m) and PB(m);
            PC(m) ← Mult(n, PA(m), PB(m))
        end for
        compute C0, C1, ..., C(2n) via
            (C0, C1, ..., C(2n))^T = V^(−1) · (PC(−n), PC(−(n−1)), ..., PC(n))^T
            where V is the (2n+1) × (2n+1) matrix whose row for each m = −n, ..., n
            is (1, m, m^2, ..., m^(2n))
        form PC(x) = C(2n) x^(2n) + ... + C0 and compute PC(2^k)
        return PC(2^k) = A · B
    end if
end function
How fast is our algorithm?
It is easy to see that the values of the two polynomials we are multiplying have at most $k + s$ bits, where $s$ is a constant which depends on $n$ but does NOT depend on $k$:
$$P_A(m) = A_n m^n + A_{n-1} m^{n-1} + \cdots + A_0, \quad -n \le m \le n.$$
This is because each $A_i$ has $k$ bits and is therefore smaller than $2^k$; thus
$$|P_A(m)| \le n^n (|A_n| + |A_{n-1}| + \cdots + |A_0|) < n^n (n + 1)\, 2^k,$$
so that $s$ needs to be only about $\log_2(n^n (n + 1))$.
Thus, we have reduced a multiplication of two $(n + 1)k$-bit numbers to $2n + 1$ multiplications of $(k + s)$-bit numbers plus a linear overhead (of additions, splitting the numbers, etc.).
So we get the following recurrence for the complexity of Mult(A, B):
$$T((n + 1)k) = (2n + 1)\, T(k + s) + c\,k$$
Let $N = (n + 1)k$. Then
$$T(N) = \underbrace{(2n + 1)}_{a}\, T\Big( \frac{N}{\underbrace{n + 1}_{b}} + s \Big) + \frac{c}{n + 1} N$$
Since $s$ is constant, its impact can be neglected.
How fast is our algorithm?
Since $\log_b a = \log_{n+1}(2n + 1) > 1$, we can choose a small $\varepsilon > 0$ such that also $\log_b a - \varepsilon > 1$.
Consequently, for such an $\varepsilon$ we would have
$$f(N) = \frac{c}{n + 1} N = O\big(N^{\log_b a - \varepsilon}\big).$$
Thus, with $a = 2n + 1$ and $b = n + 1$ the first case of the Master Theorem applies; so we get:
$$T(N) = \Theta\big(N^{\log_b a}\big) = \Theta\big(N^{\log_{n+1}(2n+1)}\big)$$
Note that
$$N^{\log_{n+1}(2n+1)} < N^{\log_{n+1} 2(n+1)} = N^{\log_{n+1} 2 + \log_{n+1}(n+1)} = N^{1 + \log_{n+1} 2} = N^{1 + \frac{1}{\log_2(n+1)}}$$
Thus, by choosing a sufficiently large $n$, we can get a run time arbitrarily close to linear time!
How large does $n$ have to be in order to get an algorithm which runs in time $N^{1.1}$?
$$N^{1.1} = N^{1 + \frac{1}{\log_2(n+1)}} \ \Rightarrow\ \frac{1}{\log_2(n + 1)} = \frac{1}{10} \ \Rightarrow\ n + 1 = 2^{10}$$
Thus, we would have to slice the input numbers into $2^{10} = 1024$ pieces!!
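To see how slowly the exponent approaches 1, one can tabulate $\log_{n+1}(2n + 1)$ for a few slice counts. This is a small numeric illustration; the helper name `exponent` and the sampled values of $n$ are choices made for this example.

```python
import math


def exponent(n: int) -> float:
    """Exponent log_{n+1}(2n+1) of the running time when slicing into n + 1 pieces."""
    return math.log(2 * n + 1) / math.log(n + 1)


for n in (1, 2, 3, 7, 1023):
    print(f"{n + 1:5d} slices: N^{exponent(n):.4f}")
```

Two slices give Karatsuba's $\log_2 3$, three slices give $\log_3 5$, and 1024 slices are needed before the exponent drops just below 1.1.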
We would have to evaluate polynomials $P_A(x)$ and $P_B(x)$, both of degree $n$, at values up to $n$.
However, $n = 2^{10} - 1 = 1023$, so evaluating $P_A(n) = A_n n^n + \dots + A_0$ involves multiplication of $A_n$ with $n^n = 1023^{1023} \approx 1.27 \times 10^{3079}$.
Thus, while evaluations of $P_A(x)$ and $P_B(x)$ for $x = -n, \dots, n$ can theoretically all be done in time linear in the number of bits $N$ of the inputs, $T(N) = cN$, the constant $c$ is absolutely humongous.
Consequently, slicing the input numbers into more than just a few slices results in a hopelessly slow algorithm, despite the fact that the asymptotic bounds improve as we increase the number of slices!
The moral is: In practice, asymptotic estimates are useless if the sizes of the constants hidden by the O-notation are not estimated and found to be reasonably small!!!
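The size of the constant can be checked directly with arbitrary-precision integers. The sketch below assumes $n + 1 = 2^{10}$ slices, so $n = 1023$, and simply measures the largest power that multiplies a slice when evaluating $P_A(n)$; the variable names are illustrative.

```python
n = 2**10 - 1                  # n + 1 = 1024 slices
power = n ** n                 # the factor multiplying A_n in P_A(n)
digits = len(str(power))
print(f"n^n has {digits} decimal digits and {power.bit_length()} bits")
```

A factor of more than ten thousand bits, applied by repeated addition, is what makes the hidden constant impractical.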
Crucial question: Are there numbers $x_0, x_1, \dots, x_n$ such that the size of $x_i^n$ does not grow uncontrollably?
Answer: YES; they are the complex numbers $z_i$ lying on the unit circle, i.e., such that $|z_i| = 1$!
This motivates us to consider values of polynomials at inputs which are
equally spaced complex numbers all lying on the unit circle.
The sequence of such values is called the discrete Fourier transform
(DFT) of the sequence of the coefficients of the polynomial being
evaluated.
We will present a very fast algorithm for computing these values, called
the Fast Fourier Transform, abbreviated as FFT.
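The key property of points on the unit circle, and the values that make up the DFT, can be illustrated with a short sketch. Note that this is the naive quadratic-time evaluation, not the FFT itself; the names `roots_of_unity` and `naive_dft`, and the sign convention $e^{2\pi i j / n}$ for the points, are assumptions of this example.

```python
import cmath


def roots_of_unity(n: int):
    """The n equally spaced points e^(2*pi*i*j/n), j = 0..n-1, on the unit circle."""
    return [cmath.exp(2j * cmath.pi * j / n) for j in range(n)]


def naive_dft(coeffs):
    """Values of the polynomial with these coefficients at the len(coeffs) roots of unity."""
    n = len(coeffs)
    return [sum(c * z**t for t, c in enumerate(coeffs)) for z in roots_of_unity(n)]


# Powers of a point on the unit circle stay on the unit circle.
z = roots_of_unity(8)[3]
print(abs(z**1000))

# 1 + x + x^2 + x^3 evaluated at 1, i, -1, -i
print([round(abs(v), 9) for v in naive_dft([1, 1, 1, 1])])
```

Because every power of such a point has absolute value 1, the evaluations never blow up the way $n^n$ does for integer points.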
The Fast Fourier Transform is the most executed algorithm today
and is thus arguably the most important algorithm of all.
Every mobile phone performs thousands of FFT runs each second, for
example to compress your speech signal or to compress images taken by
your camera, to mention just a few uses of the FFT.
After we study the FFT we will have a guest lecture by a Dolby engineer
to demonstrate to you some cool applications of FFT.
