Algorithms:
COMP3121/3821/9101/9801
Basics revisited: how do we multiply two numbers?
The Karatsuba trick
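Take the two input numbers A and B, each n bits long, and split them into two halves of n/2 bits each:
$$A = A_1 2^{n/2} + A_0, \qquad B = B_1 2^{n/2} + B_0$$
$A_1$ = MoreSignificantPart(A); $A_0$ = LessSignificantPart(A);
$B_1$ = MoreSignificantPart(B); $B_0$ = LessSignificantPart(B).
Then
$$AB = A_1B_1\,2^{n} + (A_1B_0 + A_0B_1)\,2^{n/2} + A_0B_0,$$
and the middle coefficient can be obtained with a single extra multiplication, because
$$A_1B_0 + A_0B_1 = (A_1 + A_0)(B_1 + B_0) - A_1B_1 - A_0B_0.$$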
function Mult(A, B)
    if |A| = |B| = 1 then return AB
    else
        A1 ← MoreSignificantPart(A);
        A0 ← LessSignificantPart(A);
        B1 ← MoreSignificantPart(B);
        B0 ← LessSignificantPart(B);
        U ← A0 + A1;
        V ← B0 + B1;
        X ← Mult(A0, B0);
        W ← Mult(A1, B1);
        Y ← Mult(U, V);
        return W · 2^n + (Y − X − W) · 2^(n/2) + X
    end if
end function
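The pseudocode above can be sketched in Python as follows. This is a minimal sketch rather than a reference implementation: the name `karatsuba`, the use of Python's built-in integers and bit shifts for the split, and the base-case threshold of 16 (instead of single-bit operands) are assumptions made for illustration.

```python
def karatsuba(a: int, b: int) -> int:
    """Multiply non-negative integers a and b using three recursive products."""
    # Assumed base case: small operands are multiplied directly.
    if a < 16 or b < 16:
        return a * b
    # `half` plays the role of n/2: both numbers are split at the same bit.
    half = max(a.bit_length(), b.bit_length()) // 2
    mask = (1 << half) - 1
    a1, a0 = a >> half, a & mask   # more / less significant part of A
    b1, b0 = b >> half, b & mask   # more / less significant part of B
    x = karatsuba(a0, b0)              # X = A0 * B0
    w = karatsuba(a1, b1)              # W = A1 * B1
    y = karatsuba(a0 + a1, b0 + b1)    # Y = (A0 + A1) * (B0 + B1)
    # W * 2^n + (Y - X - W) * 2^(n/2) + X, with shifts instead of powers of 2
    return (w << (2 * half)) + ((y - x - w) << half) + x
```

Only the three calls to `karatsuba` multiply large numbers; everything else is additions and shifts, which take linear time.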
How many steps does this take? (Remember, addition takes linear time!)
Recurrence: $T(n) = 3\,T(n/2) + c\,n$
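As a worked step, the recurrence can be unrolled as follows. This is the standard calculation, sketched under the assumptions that n is a power of two and that T(1) is a constant.

```latex
\begin{align*}
T(n) &= 3\,T(n/2) + cn \\
     &= 3^2\,T(n/4) + cn\left(1 + \tfrac{3}{2}\right) \\
     &= \dots \\
     &= 3^{\log_2 n}\,T(1) + cn \sum_{i=0}^{\log_2 n - 1} \left(\tfrac{3}{2}\right)^{i} \\
     &= n^{\log_2 3}\,T(1) + 2c\left(n^{\log_2 3} - n\right) \\
     &= \Theta\!\left(n^{\log_2 3}\right) \approx \Theta\!\left(n^{1.585}\right)
\end{align*}
```

The step from the sum to the closed form uses $3^{\log_2 n} = n^{\log_2 3}$ and the geometric series $\sum_{i=0}^{m-1} (3/2)^i = 2\left((3/2)^m - 1\right)$.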
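Now slice the numbers into three pieces of k bits each:
$$A = A_2 2^{2k} + A_1 2^{k} + A_0$$
$$B = B_2 2^{2k} + B_1 2^{k} + B_0$$
So,
$$AB = A_2B_2\,2^{4k} + (A_2B_1 + A_1B_2)\,2^{3k} + (A_2B_0 + A_1B_1 + A_0B_2)\,2^{2k} + (A_1B_0 + A_0B_1)\,2^{k} + A_0B_0$$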
$$C_4 = A_2B_2$$
$$C_3 = A_2B_1 + A_1B_2$$
$$C_2 = A_2B_0 + A_1B_1 + A_0B_2$$
$$C_1 = A_1B_0 + A_0B_1$$
$$C_0 = A_0B_0$$
$$(A_2 + A_1 + A_0)(B_2 + B_1 + B_0) = A_0B_0 + A_1B_0 + A_2B_0 + A_0B_1 + A_1B_1 + A_2B_1 + A_0B_2 + A_1B_2 + A_2B_2 \ \ ???$$
We now look for a method for getting these coefficients without any
guesswork!
Let
$$A = A_2 2^{2k} + A_1 2^{k} + A_0$$
$$B = B_2 2^{2k} + B_1 2^{k} + B_0$$
$$P_A(x) = A_2x^2 + A_1x + A_0;$$
$$P_B(x) = B_2x^2 + B_1x + B_0.$$
Note that $A = P_A(2^k)$ and $B = P_B(2^k)$.
The Karatsuba trick: slicing into 3 pieces
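If we manage to compute somehow the product polynomial
$$P_C(x) = P_A(x)P_B(x) = C_4x^4 + C_3x^3 + C_2x^2 + C_1x + C_0,$$
then we can obtain the product as $A \cdot B = P_A(2^k)P_B(2^k) = P_C(2^k)$.
Since $P_C(x)$ has degree 4, it is uniquely determined by its values at 5 distinct points; we use the five integers $-2, -1, 0, 1, 2$.
Thus, we compute
$$P_A(-2),\ P_A(-1),\ P_A(0),\ P_A(1),\ P_A(2)$$
$$P_B(-2),\ P_B(-1),\ P_B(0),\ P_B(1),\ P_B(2)$$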
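For $P_A(x) = A_2x^2 + A_1x + A_0$ we have
$$P_A(-2) = 4A_2 - 2A_1 + A_0$$
$$P_A(-1) = A_2 - A_1 + A_0$$
$$P_A(0) = A_0$$
$$P_A(1) = A_2 + A_1 + A_0$$
$$P_A(2) = 4A_2 + 2A_1 + A_0$$
and similarly for $P_B(x)$. Five multiplications of large numbers then give the values $P_C(m) = P_A(m)P_B(m)$ for $m = -2, -1, 0, 1, 2$.
Thus, if we represent the product $C(x) = P_A(x)P_B(x)$ in the coefficient form as $C(x) = C_4x^4 + C_3x^3 + C_2x^2 + C_1x + C_0$ we get
$$16C_4 - 8C_3 + 4C_2 - 2C_1 + C_0 = P_C(-2)$$
$$C_4 - C_3 + C_2 - C_1 + C_0 = P_C(-1)$$
$$C_0 = P_C(0)$$
$$C_4 + C_3 + C_2 + C_1 + C_0 = P_C(1)$$
$$16C_4 + 8C_3 + 4C_2 + 2C_1 + C_0 = P_C(2)$$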
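Solving this system of linear equations for $C_0, C_1, C_2, C_3, C_4$ we obtain
$$C_0 = P_C(0)$$
$$C_1 = \frac{P_C(-2)}{12} - \frac{2P_C(-1)}{3} + \frac{2P_C(1)}{3} - \frac{P_C(2)}{12}$$
$$C_2 = -\frac{P_C(-2)}{24} + \frac{2P_C(-1)}{3} - \frac{5P_C(0)}{4} + \frac{2P_C(1)}{3} - \frac{P_C(2)}{24}$$
$$C_3 = -\frac{P_C(-2)}{12} + \frac{P_C(-1)}{6} - \frac{P_C(1)}{6} + \frac{P_C(2)}{12}$$
$$C_4 = \frac{P_C(-2)}{24} - \frac{P_C(-1)}{6} + \frac{P_C(0)}{4} - \frac{P_C(1)}{6} + \frac{P_C(2)}{24}$$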
Note that these expressions do not involve any multiplications of TWO large
numbers and thus can be done in linear time.
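The five formulas can be checked mechanically. The sketch below evaluates an arbitrarily chosen degree-4 polynomial at the points −2, …, 2 and recovers its coefficients with exactly these formulas; the helper name `interpolate5` and the test polynomial are illustrative assumptions, and exact rational arithmetic from Python's `fractions` module is used to avoid rounding.

```python
from fractions import Fraction as F

def interpolate5(v):
    """v maps each m in {-2, -1, 0, 1, 2} to P_C(m); returns [C0, C1, C2, C3, C4]."""
    c0 = F(v[0])
    c1 = F(v[-2], 12) - F(2 * v[-1], 3) + F(2 * v[1], 3) - F(v[2], 12)
    c2 = (-F(v[-2], 24) + F(2 * v[-1], 3) - F(5 * v[0], 4)
          + F(2 * v[1], 3) - F(v[2], 24))
    c3 = -F(v[-2], 12) + F(v[-1], 6) - F(v[1], 6) + F(v[2], 12)
    c4 = F(v[-2], 24) - F(v[-1], 6) + F(v[0], 4) - F(v[1], 6) + F(v[2], 24)
    return [c0, c1, c2, c3, c4]

# Arbitrary test polynomial, coefficients listed as C0..C4 (an assumption).
coeffs = [7, -3, 0, 5, 2]
values = {m: sum(c * m ** i for i, c in enumerate(coeffs)) for m in range(-2, 3)}
recovered = interpolate5(values)
```

Each formula only divides a large number by a small constant and adds the results, which is why no multiplication of two large numbers is involved.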
With the coefficients $C_0, C_1, C_2, C_3, C_4$ obtained, we can now form the polynomial $P_C(x) = C_0 + C_1x + C_2x^2 + C_3x^3 + C_4x^4$.
We can now compute $P_C(2^k) = C_0 + C_12^{k} + C_22^{2k} + C_32^{3k} + C_42^{4k}$ in linear time, because computing $P_C(2^k)$ involves only binary shifts of the coefficients plus O(k) additions.
Thus we have obtained $A \cdot B = P_A(2^k)P_B(2^k) = P_C(2^k)$ with only 5 multiplications!
Here is the complete algorithm:
function Mult(A, B)
    obtain A0, A1, A2 and B0, B1, B2 such that A = A2 · 2^(2k) + A1 · 2^k + A0; B = B2 · 2^(2k) + B1 · 2^k + B0;
    PA(−2) ← 4A2 − 2A1 + A0;    PB(−2) ← 4B2 − 2B1 + B0;
    PA(−1) ← A2 − A1 + A0;      PB(−1) ← B2 − B1 + B0;
    PA(0) ← A0;                 PB(0) ← B0;
    PA(1) ← A2 + A1 + A0;       PB(1) ← B2 + B1 + B0;
    PA(2) ← 4A2 + 2A1 + A0;     PB(2) ← 4B2 + 2B1 + B0;
    PC(m) ← Mult(PA(m), PB(m)) for m = −2, −1, 0, 1, 2;
    C0 ← PC(0);
    C1 ← PC(−2)/12 − 2PC(−1)/3 + 2PC(1)/3 − PC(2)/12;
    C2 ← −PC(−2)/24 + 2PC(−1)/3 − 5PC(0)/4 + 2PC(1)/3 − PC(2)/24;
    C3 ← −PC(−2)/12 + PC(−1)/6 − PC(1)/6 + PC(2)/12;
    C4 ← PC(−2)/24 − PC(−1)/6 + PC(0)/4 − PC(1)/6 + PC(2)/24;
    form PC(x) = C4 x^4 + C3 x^3 + C2 x^2 + C1 x + C0; compute PC(2^k) = C4 · 2^(4k) + C3 · 2^(3k) + C2 · 2^(2k) + C1 · 2^k + C0;
    return PC(2^k) = A · B.
end function
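A minimal Python sketch of this three-slice algorithm follows. The name `toom3`, the base-case threshold of 64, and the sign handling (values at negative points can be negative, so the recursion works on magnitudes) are assumptions of the sketch and not part of the pseudocode. The interpolation formulas are brought to the common denominator 24 so that only exact integer division is needed.

```python
def toom3(a: int, b: int) -> int:
    """Multiply integers a and b with five recursive products."""
    if a < 0 or b < 0:
        # P_A(-1), P_A(-2) may be negative: recurse on magnitudes, fix the sign.
        sign = -1 if (a < 0) != (b < 0) else 1
        return sign * toom3(abs(a), abs(b))
    if a < 64 or b < 64:
        return a * b  # assumed base case for small operands
    # k bits per slice, chosen so that three slices cover the longer input.
    k = max(a.bit_length(), b.bit_length()) // 3 + 1
    mask = (1 << k) - 1
    a0, a1, a2 = a & mask, (a >> k) & mask, a >> (2 * k)
    b0, b1, b2 = b & mask, (b >> k) & mask, b >> (2 * k)
    v = {}
    for m in (-2, -1, 0, 1, 2):
        pa = a2 * m * m + a1 * m + a0   # P_A(m): small constant multiples only
        pb = b2 * m * m + b1 * m + b0   # P_B(m)
        v[m] = toom3(pa, pb)            # P_C(m) = P_A(m) * P_B(m)
    # Interpolation formulas over the common denominator 24 (divisions are exact).
    c0 = v[0]
    c1 = (2 * v[-2] - 16 * v[-1] + 16 * v[1] - 2 * v[2]) // 24
    c2 = (-v[-2] + 16 * v[-1] - 30 * v[0] + 16 * v[1] - v[2]) // 24
    c3 = (-2 * v[-2] + 4 * v[-1] - 4 * v[1] + 2 * v[2]) // 24
    c4 = (v[-2] - 4 * v[-1] + 6 * v[0] - 4 * v[1] + v[2]) // 24
    # P_C(2^k), computed with shifts and additions.
    return c0 + (c1 << k) + (c2 << (2 * k)) + (c3 << (3 * k)) + (c4 << (4 * k))
```

The loop makes exactly five recursive calls, each on numbers only a few bits longer than one slice.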
The Karatsuba trick: slicing into 3 pieces
Then why not slice numbers A and B into an even larger number of slices?
Maybe we can get an even faster algorithm?
The answer is, in a sense, BOTH yes and no, so let's see what happens if
we slice numbers into n + 1 many equal slices...
Generalizing Karatsuba’s algorithm
The general case - slicing the input numbers A, B into n + 1 many slices
[Figure: A divided into n + 1 slices $A_n, A_{n-1}, \dots, A_0$; each slice has k bits, (n + 1)k bits in total.]
We form the naturally corresponding polynomials:
$$P_A(x) = A_nx^n + A_{n-1}x^{n-1} + \dots + A_0$$
$$P_B(x) = B_nx^n + B_{n-1}x^{n-1} + \dots + B_0$$
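As before, we have:
$$A = P_A(2^k); \quad B = P_B(2^k); \quad AB = P_A(2^k)P_B(2^k) = \left(P_A(x) \cdot P_B(x)\right)\big|_{x=2^k}$$
Since
$$AB = \left(P_A(x) \cdot P_B(x)\right)\big|_{x=2^k}$$
we adopt the following strategy:
we will first figure out how to multiply polynomials fast to obtain $P_C(x) = P_A(x) \cdot P_B(x)$; then we evaluate $P_C(2^k)$.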
Example:
$$(a_3x^3 + a_2x^2 + a_1x + a_0)(b_3x^3 + b_2x^2 + b_1x + b_0) =$$
$$a_3b_3x^6 + (a_2b_3 + a_3b_2)x^5 + (a_1b_3 + a_2b_2 + a_3b_1)x^4 + (a_0b_3 + a_1b_2 + a_2b_1 + a_3b_0)x^3 + (a_0b_2 + a_1b_1 + a_2b_0)x^2 + (a_0b_1 + a_1b_0)x + a_0b_0$$
In general: for
$$P_A(x) = A_nx^n + A_{n-1}x^{n-1} + \dots + A_0$$
$$P_B(x) = B_nx^n + B_{n-1}x^{n-1} + \dots + B_0$$
we have
$$P_A(x) \cdot P_B(x) = \sum_{j=0}^{2n} \Big( \sum_{i+k=j} A_iB_k \Big) x^j = \sum_{j=0}^{2n} C_jx^j$$
We need to find the coefficients $C_j = \sum_{i+k=j} A_iB_k$ without performing the $(n+1)^2$ multiplications necessary to get all products of the form $A_iB_k$.
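For comparison, here is the direct method that the slides want to avoid, as a short Python sketch. The function name `poly_mul_naive` and the list representation (index i holds the coefficient of x^i) are assumptions made for illustration.

```python
def poly_mul_naive(a, b):
    """Coefficients of P_A(x) * P_B(x); a[i] and b[k] are the coefficients of x^i, x^k."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for k, bk in enumerate(b):
            c[i + k] += ai * bk   # the product A_i * B_k contributes to C_{i+k}
    return c

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
product = poly_mul_naive([1, 2], [3, 4])
```

For two polynomials of degree n the inner statement runs $(n+1)^2$ times, one multiplication of slices per pair (i, k).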
A VERY IMPORTANT DIGRESSION:
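If you have two sequences $\vec{A} = (A_0, A_1, \dots, A_{n-1}, A_n)$ and $\vec{B} = (B_0, B_1, \dots, B_{m-1}, B_m)$, and if you form the two corresponding polynomials
$$P_A(x) = A_nx^n + A_{n-1}x^{n-1} + \dots + A_1x + A_0$$
$$P_B(x) = B_mx^m + B_{m-1}x^{m-1} + \dots + B_1x + B_0$$
then the coefficients of the product $P_A(x) \cdot P_B(x)$ are $C_j = \sum_{i+k=j} A_iB_k$ for $0 \le j \le n + m$; the sequence $(C_0, C_1, \dots, C_{n+m})$ is the linear convolution of $\vec{A}$ and $\vec{B}$.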
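$$P_A(x) \leftrightarrow \{(x_0, P_A(x_0)), (x_1, P_A(x_1)), \dots, (x_n, P_A(x_n))\}$$
The values $P_A(x_0), \dots, P_A(x_n)$ are obtained from the coefficients $A_0, \dots, A_n$ by a matrix multiplication:
$$\begin{pmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n \\ 1 & x_1 & x_1^2 & \cdots & x_1^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{pmatrix} \begin{pmatrix} A_0 \\ A_1 \\ \vdots \\ A_n \end{pmatrix} = \begin{pmatrix} P_A(x_0) \\ P_A(x_1) \\ \vdots \\ P_A(x_n) \end{pmatrix} \qquad (1)$$
It can be shown that if the $x_i$ are all distinct then this matrix is invertible.
Such a matrix is called the Vandermonde matrix.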
Coefficient vs value representation of polynomials - ctd.
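Thus, if the $x_i$ are all distinct, given any values $P_A(x_0), P_A(x_1), \dots, P_A(x_n)$ the coefficients $A_0, A_1, \dots, A_n$ of the polynomial $P_A(x)$ are uniquely determined:
$$\begin{pmatrix} A_0 \\ A_1 \\ \vdots \\ A_n \end{pmatrix} = \begin{pmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n \\ 1 & x_1 & x_1^2 & \cdots & x_1^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{pmatrix}^{-1} \begin{pmatrix} P_A(x_0) \\ P_A(x_1) \\ \vdots \\ P_A(x_n) \end{pmatrix} \qquad (2)$$
$$P_A(x) \leftrightarrow \{(x_0, P_A(x_0)), (x_1, P_A(x_1)), \dots, (x_n, P_A(x_n))\}$$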
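For fixed inputs $x_0, \dots, x_n$, both conversions are multiplications by a constant matrix:
$$\begin{pmatrix} P_A(x_0) \\ P_A(x_1) \\ \vdots \\ P_A(x_n) \end{pmatrix} = \begin{pmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n \\ 1 & x_1 & x_1^2 & \cdots & x_1^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{pmatrix} \begin{pmatrix} A_0 \\ A_1 \\ \vdots \\ A_n \end{pmatrix};$$
$$\begin{pmatrix} A_0 \\ A_1 \\ \vdots \\ A_n \end{pmatrix} = \begin{pmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n \\ 1 & x_1 & x_1^2 & \cdots & x_1^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{pmatrix}^{-1} \begin{pmatrix} P_A(x_0) \\ P_A(x_1) \\ \vdots \\ P_A(x_n) \end{pmatrix}.$$
Thus, for fixed input values $x_0, \dots, x_n$ this switch between the two kinds of representations is done in time linear in the number of bits of the coefficients: the matrix entries are constants, so each conversion needs only a fixed number of additions and multiplications by constants.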
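The two conversions can be tried out with exact rational arithmetic. In the sketch below the helper names `vandermonde`, `inverse` and `apply`, the choice of points −2, …, 2, and the sample coefficients are all illustrative assumptions; the inverse is computed by plain Gauss-Jordan elimination, which is fine for a small fixed matrix.

```python
from fractions import Fraction

def vandermonde(xs):
    """Row i is (1, x_i, x_i^2, ..., x_i^n)."""
    return [[Fraction(x) ** j for j in range(len(xs))] for x in xs]

def inverse(matrix):
    """Gauss-Jordan inversion over exact rationals (matrix assumed invertible)."""
    n = len(matrix)
    aug = [row[:] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [e / p for e in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [e - f * q for e, q in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def apply(matrix, vector):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

xs = [-2, -1, 0, 1, 2]           # fixed, distinct evaluation points
V = vandermonde(xs)
V_inv = inverse(V)               # depends only on xs, not on the polynomial
coeffs = [3, 0, -1, 4, 2]        # A_0 .. A_4 of a sample polynomial
values = apply(V, coeffs)        # coefficient form -> value form
recovered = apply(V_inv, values) # value form -> coefficient form
```

For these five points, row 1 of `V_inv` is (1/12, −2/3, 0, 2/3, −1/12), which is the formula for $C_1$ from the three-slice algorithm.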
Our strategy to multiply polynomials fast:
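1. Given two polynomials of degree at most n,
$$P_A(x) = A_nx^n + \dots + A_0; \qquad P_B(x) = B_nx^n + \dots + B_0$$
convert them into value representation at 2n + 1 distinct points $x_0, x_1, \dots, x_{2n}$:
$$P_A(x) \leftrightarrow \{(x_0, P_A(x_0)), (x_1, P_A(x_1)), \dots, (x_{2n}, P_A(x_{2n}))\}$$
$$P_B(x) \leftrightarrow \{(x_0, P_B(x_0)), (x_1, P_B(x_1)), \dots, (x_{2n}, P_B(x_{2n}))\}$$
2. Multiply them point by point, using 2n + 1 multiplications: $P_C(x_i) = P_A(x_i)P_B(x_i)$ for $0 \le i \le 2n$.
3. Convert these values of $P_C(x)$ back to coefficient form.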
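As the 2n + 1 points we take the integers $-n, \dots, n$; so we find the values $P_A(m)$ and $P_B(m)$ for all m such that $-n \le m \le n$.
Remember that n + 1 is the number of slices into which we split the input numbers A, B.
Multiplication of a large number with k bits by a constant integer d can be done in time linear in k because it is reducible to d − 1 additions:
$$d \cdot A = \underbrace{A + A + \dots + A}_{d}$$
Thus, all the values $P_A(m)$ and $P_B(m)$, $-n \le m \le n$, can be found in time linear in the number of bits of the input numbers.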
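The evaluation at the small integer points can be sketched as follows. The function name `evaluate_at_small_points` and the use of Horner's rule are assumptions of this sketch; the point is only that every multiplication has a small constant m as one factor.

```python
def evaluate_at_small_points(slices, n):
    """slices = [A_0, ..., A_n]; returns {m: P_A(m)} for -n <= m <= n."""
    values = {}
    for m in range(-n, n + 1):
        acc = 0
        for coeff in reversed(slices):   # Horner's rule: ((A_n*m + A_{n-1})*m + ...)
            acc = acc * m + coeff        # multiplication by the small constant m only
        values[m] = acc
    return values

# P_A(x) = 3x^2 + 2x + 1 evaluated at -2, -1, 0, 1, 2
vals = evaluate_at_small_points([1, 2, 3], 2)
```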
Fast multiplication of polynomials - ctd.
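We now perform 2n + 1 multiplications of large numbers to obtain
$$P_A(-n)P_B(-n),\ \dots,\ P_A(-1)P_B(-1),\ P_A(0)P_B(0),\ P_A(1)P_B(1),\ \dots,\ P_A(n)P_B(n)$$
For $P_C(x) = P_A(x)P_B(x)$ these products are 2n + 1 many values of $P_C(x)$:
$$P_C(m) = P_A(m)P_B(m), \qquad -n \le m \le n$$
Let $C_0, C_1, \dots, C_{2n}$ be the coefficients of the product polynomial $P_C(x)$, i.e., let
$$P_C(x) = C_{2n}x^{2n} + C_{2n-1}x^{2n-1} + \dots + C_0$$
We now have, for every m such that $-n \le m \le n$:
$$C_{2n}m^{2n} + C_{2n-1}m^{2n-1} + \dots + C_1m + C_0 = P_C(m)$$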
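This is just a system of linear equations that can be solved for $C_0, C_1, \dots, C_{2n}$:
$$\begin{pmatrix} 1 & -n & (-n)^2 & \cdots & (-n)^{2n} \\ 1 & -(n-1) & (-(n-1))^2 & \cdots & (-(n-1))^{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & n & n^2 & \cdots & n^{2n} \end{pmatrix} \begin{pmatrix} C_0 \\ C_1 \\ \vdots \\ C_{2n} \end{pmatrix} = \begin{pmatrix} P_C(-n) \\ P_C(-(n-1)) \\ \vdots \\ P_C(n) \end{pmatrix},$$
i.e., the coefficients are obtained by multiplying the vector of values by the inverse of this matrix.
The inverse matrix also involves only constants that depend on n only;
thus the coefficients $C_i$ can be obtained in linear time.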
So here is the algorithm we have just described:
function Mult(n, A, B)
    if |A| = |B| = 1 then return AB
    else
        obtain n + 1 slices A_0, A_1, ..., A_n and B_0, B_1, ..., B_n such that
            A = A_n · 2^(nk) + A_{n−1} · 2^((n−1)k) + ... + A_0
            B = B_n · 2^(nk) + B_{n−1} · 2^((n−1)k) + ... + B_0
        form polynomials
            PA(x) = A_n x^n + A_{n−1} x^(n−1) + ... + A_0
            PB(x) = B_n x^n + B_{n−1} x^(n−1) + ... + B_0
        for m = −n to m = n do
            compute PA(m) and PB(m);
            PC(m) ← Mult(n, PA(m), PB(m))
        end for
        compute C_0, C_1, ..., C_{2n} via
            (C_0, C_1, ..., C_{2n})^T = V^(−1) · (PC(−n), PC(−(n−1)), ..., PC(n))^T,
            where V is the (2n + 1) × (2n + 1) matrix whose row for the point m is
            (1, m, m^2, ..., m^(2n)), for m = −n, −(n−1), ..., n
        form PC(x) = C_{2n} x^(2n) + ... + C_0 and compute PC(2^k);
        return PC(2^k) = A · B
    end if
end function
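A compact Python sketch of the general algorithm is given below, under several assumptions that are not in the pseudocode: the names `mult` and `solve_vandermonde`, the fallback to built-in multiplication for inputs of at most 64 bits (or whenever slicing would not shrink the operands), sign handling for negative values, and solving the linear system afresh in every call. A real implementation would precompute the inverse matrix once, since it depends on n only.

```python
from fractions import Fraction

def solve_vandermonde(points, values):
    """Return [C_0, C_1, ...] with sum(C_j * x**j) == value at every point."""
    size = len(points)
    rows = [[Fraction(x) ** j for j in range(size)] + [Fraction(v)]
            for x, v in zip(points, values)]
    for col in range(size):                      # Gauss-Jordan elimination
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [e / p for e in rows[col]]
        for r in range(size):
            if r != col:
                f = rows[r][col]
                rows[r] = [e - f * q for e, q in zip(rows[r], rows[col])]
    return [row[size] for row in rows]

def mult(n: int, a: int, b: int) -> int:
    """Multiply integers a and b by slicing each into n + 1 pieces (n >= 1)."""
    if a < 0 or b < 0:
        sign = -1 if (a < 0) != (b < 0) else 1
        return sign * mult(n, abs(a), abs(b))
    bits = max(a.bit_length(), b.bit_length())
    k = bits // (n + 1) + 1                      # bits per slice
    growth = ((n + 1) * n ** n).bit_length()     # extra bits of P_A(m) beyond k
    if bits <= 64 or k + growth >= bits:
        return a * b                             # assumed base case
    mask = (1 << k) - 1
    a_slices = [(a >> (i * k)) & mask for i in range(n + 1)]
    b_slices = [(b >> (i * k)) & mask for i in range(n + 1)]
    points = list(range(-n, n + 1))
    values = []
    for m in points:
        pa = sum(s * m ** i for i, s in enumerate(a_slices))   # P_A(m)
        pb = sum(s * m ** i for i, s in enumerate(b_slices))   # P_B(m)
        values.append(mult(n, pa, pb))                         # P_C(m)
    coeffs = solve_vandermonde(points, values)                 # C_0 .. C_2n
    return sum(int(c) << (j * k) for j, c in enumerate(coeffs))  # P_C(2^k)
```

For example, `mult(3, x, y)` slices each input into four pieces and makes seven recursive calls per level.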
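The values $P_A(m)$ and $P_B(m)$ are only slightly longer than a single slice. This is because each $A_i$ is smaller than $2^k$, since each $A_i$ has k bits; thus for $-n \le m \le n$ we have $|P_A(m)| \le \sum_{i=0}^{n} A_i|m|^i < (n+1)\,n^n\,2^k$. So these values have at most k + s bits, where s is a constant which depends on n but not on k.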
How fast is our algorithm?
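The algorithm makes a = 2n + 1 recursive calls on numbers of roughly N/(n + 1) bits each, where N = (n + 1)k is the number of bits of the inputs, and does only linear work otherwise:
$$T(N) = (2n+1)\,T\!\left(\frac{N}{n+1}\right) + cN$$
Since $\log_b a = \log_{n+1}(2n+1) > 1$, we can choose a small $\varepsilon > 0$ such that also $\log_b a - \varepsilon > 1$.
Thus, with a = 2n + 1 and b = n + 1 the first case of the Master Theorem applies;
so we get:
$$T(N) = \Theta\!\left(N^{\log_b a}\right) = \Theta\!\left(N^{\log_{n+1}(2n+1)}\right)$$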
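To get a feeling for this bound, the exponent can be tabulated for a few values of n. The helper name `exponent` and the sample values of n are assumptions chosen for illustration.

```python
import math

def exponent(n: int) -> float:
    """log_{n+1}(2n+1): the exponent obtained when slicing into n + 1 pieces."""
    return math.log(2 * n + 1) / math.log(n + 1)

# n = 1 is the two-slice (Karatsuba) case, about 1.585;
# n = 2 is the three-slice case, about 1.465.
table = {n: round(exponent(n), 3) for n in (1, 2, 3, 4, 9, 1023)}
```

The exponent decreases toward 1 as n grows, but slowly: it only drops below 1.1 once n is in the hundreds.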
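Note that
$$N^{\log_{n+1}(2n+1)} < N^{\log_{n+1} 2(n+1)} = N^{1 + \log_{n+1} 2} = N^{1 + \frac{1}{\log_2(n+1)}}$$
How large does n have to be in order to get an algorithm which runs
in time $N^{1.1}$?
$$N^{1.1} = N^{1 + \frac{1}{\log_2(n+1)}} \;\rightarrow\; \frac{1}{\log_2(n+1)} = \frac{1}{10} \;\rightarrow\; n + 1 = 2^{10}$$
Thus, we would have to slice the input numbers into $2^{10} = 1024$ pieces!!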
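We would have to evaluate polynomials $P_A(x)$ and $P_B(x)$, both of degree
n, at values up to n. Evaluating at m = n involves multiplying the slice $A_n$ by the constant $n^n$, which for n = 1023 is astronomically large, so the constants hidden by the asymptotic notation become enormous.
Consequently, slicing the input numbers into more than just a few slices
results in a hopelessly slow algorithm, despite the fact that the
asymptotic bounds improve as we increase the number of slices!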
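The size of the constants involved can be checked directly. This small sketch computes the largest power that appears when a degree-n polynomial is evaluated at m = n, for the n = 1023 of the example above; the variable names are illustrative.

```python
n = 1023
largest_power = n ** n                 # multiplier of A_n when evaluating at m = n
bits = largest_power.bit_length()      # length of that single constant in bits
```

Here `bits` comes out above 10,000, so this one "constant" is already longer than many of the numbers one would want to multiply.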
Crucial question: Are there numbers $x_0, x_1, \dots, x_n$ such that the size
of $x_i^n$ does not grow uncontrollably?
Answer: YES; they are the complex numbers $z_i$ lying on the unit circle,
i.e., such that $|z_i| = 1$!
This motivates us to consider values of polynomials at inputs which are
equally spaced complex numbers all lying on the unit circle.
The sequence of such values is called the discrete Fourier transform
(DFT) of the sequence of the coefficients of the polynomial being
evaluated.
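As a preview, the definition can be written out directly in Python. This is a sketch of the plain definition, taking quadratically many steps, and not the FFT itself; the names `roots_of_unity` and `dft`, and the choice of the roots $e^{2\pi i j/N}$ (some texts use the opposite sign in the exponent), are assumptions.

```python
import cmath

def roots_of_unity(size: int):
    """The `size` equally spaced points on the unit circle, starting at 1."""
    return [cmath.exp(2j * cmath.pi * i / size) for i in range(size)]

def dft(coeffs):
    """Values of the polynomial with the given coefficients at the roots of unity."""
    return [sum(c * z ** j for j, c in enumerate(coeffs))
            for z in roots_of_unity(len(coeffs))]

# 1 + x + x^2 + x^3 at the points 1, i, -1, -i (up to rounding): 4, 0, 0, 0
vals = dft([1, 1, 1, 1])
```

Because every evaluation point z satisfies |z| = 1, the powers of z stay on the unit circle, in contrast to the integer powers $m^{2n}$ that appear when evaluating at the integers −n, …, n.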
We will present a very fast algorithm for computing these values, called
the Fast Fourier Transform, abbreviated as FFT.
The Fast Fourier Transform is the most executed algorithm today
and is thus arguably the most important algorithm of all.
Every mobile phone performs thousands of FFT runs each second, for
example to compress your speech signal or to compress images taken by
your camera, to mention just a few uses of the FFT.
After we study the FFT we will have a guest lecture by a Dolby engineer
to demonstrate to you some cool applications of FFT.