Karatsuba Algorithm
Evangelos Kyritsis and Kiamal Pekmestzi
Microprocessors and Digital Systems Lab (MicroLab)
School of Electrical and Computer Engineering, National Technical University of Athens (NTUA), Athens, Greece
evkiritsis@gmail.com, pekmes@cs.ntua.gr
Abstract
In this work, we present an efficient implementation of a programmable Finite Impulse Response (FIR)
filter based on the Karatsuba Multiplication Algorithm (KMA). The filter circuit uses a parallel,
Modified Booth (MB) pre-encoded, Carry-Save (CS) Wallace tree multiplier as a building block. The KMA
is a fast divide-and-conquer algorithm for the multiplication of large numbers. As a result, the
proposed circuit is more efficient in speed, area and power than the conventional FIR filter
architecture. Simulations of transposed-form FIR filters, implemented with standard cells in a Faraday
90nm technology, show an average reduction of about 15% in delay, 9% in area and 17% in power.
Building Blocks
A. Sub-filter
The three sub-filters are in transposed form.
The sub-filters are composed of multipliers (MU), 4:2 CSA Wallace trees and the required delay
units (D).
Both the intermediate products (the MU outputs) and the final result of each sub-filter are in
Carry-Save form.
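As an illustration of what "transposed form" means for a sub-filter, the following is a minimal behavioural sketch in Python. It is not the authors' circuit: it uses ordinary integer addition instead of Carry-Save arithmetic, and the function names `fir_transposed` and `fir_direct` are hypothetical. In the transposed form every multiplier sees the current input sample, and the delay units sit between the adder stages.

```python
def fir_transposed(h, x):
    """Transposed-form FIR: every tap multiplies the current input sample."""
    m = len(h)
    regs = [0] * (m - 1)  # delay units (D) placed between the adder stages
    y = []
    for sample in x:
        products = [c * sample for c in h]  # one multiplier (MU) per tap
        # The output is the first product plus the value held in the first register.
        y.append(products[0] + (regs[0] if regs else 0))
        # Each register loads its tap product plus the old value of the next register.
        regs = [
            products[i + 1] + (regs[i + 1] if i + 1 < m - 1 else 0)
            for i in range(m - 1)
        ]
    return y


def fir_direct(h, x):
    """Reference: y(n) = sum over k of h(k) * x(n - k), with x(n) = 0 for n < 0."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


# Both forms should produce the same output sequence.
assert fir_transposed([3, -1, 4, 2], [1, 2, -3, 5, 0, 7]) == \
    fir_direct([3, -1, 4, 2], [1, 2, -3, 5, 0, 7])
```

The four-tap coefficient list in the usage line is an arbitrary example, not data from the poster.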
[Figure: block diagram of a transposed-form sub-filter with input x(n), coefficients h0 to h3, four multipliers (MU), three 4:2 CSA stages and output y(n)]
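To make the Carry-Save form mentioned for the sub-filters more concrete, here is a small arithmetic model in Python. It is a sketch under stated assumptions: operands are non-negative integers, the 4:2 stage is modelled as two cascaded 3:2 carry-save adders (a common construction, not necessarily the cell used in the poster), and the names `csa_3to2` and `compress_4to2` are hypothetical. The point is that a value is kept as a (sum, carry) pair and no carry is propagated until the two words are finally added.

```python
def csa_3to2(a, b, c):
    """Carry-save adder: reduce three operands to a (sum, carry) pair."""
    s = a ^ b ^ c                               # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1  # majority bits, moved up one position
    return s, carry


def compress_4to2(a, b, c, d):
    """Reduce four operands to a (sum, carry) pair with two 3:2 stages."""
    s1, c1 = csa_3to2(a, b, c)
    return csa_3to2(s1, c1, d)


# The pair represents the same value as the plain sum of the four inputs.
s, c = compress_4to2(13, 7, 250, 99)
assert s + c == 13 + 7 + 250 + 99
```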
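Karatsuba Formula
Let us consider two numbers $a$ and $b$ of $2N$ bits. Each number can be divided into two sub-words of $N$
bits, as follows:
$$a = a_H 2^N + a_L \quad \text{and} \quad b = b_H 2^N + b_L$$
The product then requires three sub-word multiplications instead of four:
$$a b = a_H b_H 2^{2N} + \left[(a_H + a_L)(b_H + b_L) - a_H b_H - a_L b_L\right] 2^N + a_L b_L = (2^{2N} - 2^N)\, a_H b_H + 2^N (a_H + a_L)(b_H + b_L) + (1 - 2^N)\, a_L b_L$$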
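The Karatsuba identity can be checked numerically with a short Python sketch. The helper names `split` and `karatsuba_once` are hypothetical, and the sketch applies a single level of splitting rather than the full recursive algorithm. Python's arithmetic right shift and mask are assumed for the split, which keeps the identity valid for negative (two's complement) operands as well.

```python
def split(value, n):
    """Split value into (high, low) so that value == high * 2**n + low."""
    return value >> n, value & ((1 << n) - 1)


def karatsuba_once(a, b, n):
    """One Karatsuba step: three sub-word products instead of four."""
    a_h, a_l = split(a, n)
    b_h, b_l = split(b, n)
    p_h = a_h * b_h                  # product of the high sub-words
    p_l = a_l * b_l                  # product of the low sub-words
    p_s = (a_h + a_l) * (b_h + b_l)  # product of the sub-word sums
    # Usual form: the middle term is recovered by two subtractions.
    usual = (p_h << (2 * n)) + ((p_s - p_h - p_l) << n) + p_l
    # Grouped form: each product is scaled by a constant built from shifts.
    w = 1 << n
    grouped = (w * w - w) * p_h + w * p_s + (1 - w) * p_l
    assert usual == grouped
    return usual


assert karatsuba_once(0xBEEF, 0x1234, 8) == 0xBEEF * 0x1234
assert karatsuba_once(-1234, 567, 8) == -1234 * 567
```

The grouped form needs only constant scalings of the three products, which is why it maps naturally onto shifters and adders.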
B. Multiplier
[Figure: multiplier block diagram with MB encoding (sign, one and two signals), partial product generator, correction terms (only input carries) and CSA Wallace tree]
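The role of the sign, one and two signals can be illustrated with a behavioural Python sketch of radix-4 Modified Booth recoding. This is an assumption-laden model, not the poster's circuit: the multiplier operand is an even-width two's complement integer, the function names `mb_encode` and `mb_multiply` are hypothetical, and negative partial products are formed with signed integers rather than with the invert-and-add-one correction terms a hardware design would use.

```python
def mb_encode(b, n_bits):
    """Return one (sign, one, two) triple per radix-4 digit of multiplier b."""
    assert n_bits % 2 == 0
    assert -(1 << (n_bits - 1)) <= b < (1 << (n_bits - 1))

    def bit(j):
        # Bit j of b in two's complement; the bit below position 0 is 0.
        return (b >> j) & 1 if j >= 0 else 0

    signals = []
    for i in range(n_bits // 2):
        hi, mid, lo = bit(2 * i + 1), bit(2 * i), bit(2 * i - 1)
        one = mid ^ lo               # digit magnitude is 1
        two = (hi ^ mid) & (1 - one) # digit magnitude is 2
        signals.append((hi, one, two))  # hi acts as the sign signal
    return signals


def mb_multiply(a, b, n_bits):
    """Multiply a by b using n_bits / 2 Booth-encoded partial products."""
    total = 0
    for i, (sign, one, two) in enumerate(mb_encode(b, n_bits)):
        magnitude = one * a + two * 2 * a       # 0, a or 2a
        partial = -magnitude if sign else magnitude
        total += partial << (2 * i)             # weight 4**i
    return total


assert mb_multiply(57, -113, 8) == 57 * -113
```

The encoding halves the number of partial products (four for an 8-bit multiplier), which shortens the Wallace tree that sums them.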
Proposed Architecture
The Karatsuba formula is applied in order to split the original filter into three sub-filters of reduced
dynamic range, working in parallel:
$$y_H(n) = \sum_{k=0}^{M-1} h_H(k)\, x_H(n-k)$$
$$y_L(n) = \sum_{k=0}^{M-1} h_L(k)\, x_L(n-k)$$
$$y_S(n) = \sum_{k=0}^{M-1} \left[h_H(k) + h_L(k)\right] \left[x_H(n-k) + x_L(n-k)\right]$$
where $h(k) = h_H(k)\, 2^N + h_L(k)$ are the $M$ coefficients and $x(n) = x_H(n)\, 2^N + x_L(n)$ is the
input, each split into two sub-words of $N$ bits.
Results
The proposed architecture and the conventional FIR filter were synthesized with the Faraday
90nm technology library. The following tools were used for synthesis and simulation: Synopsys
Design Compiler, PrimeTime, PrimePower and ModelSim. The synthesis constraints were set for
optimal results, without preserving the hierarchy of the designs. Power consumption was estimated
by full timing simulations. The area and power measurements were obtained at the maximum clock
frequency at which the conventional filter could be synthesized.
[Figure: delay comparison. Bar labels (Karatsuba vs conventional): 0.82 vs 0.98 (-16%) at 16 bits and 1.08 vs 1.28 (-16%) at 32 bits.]
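The required adders and hardwired shifters implement the input and output conversions.
The original filter output is rebuilt from the three sub-results by applying the Karatsuba
formula:
$$y(n) = (2^{2N} - 2^N)\, y_H(n) + 2^N\, y_S(n) + (1 - 2^N)\, y_L(n)$$
where $y_H(n)$, $y_L(n)$ and $y_S(n)$ are the outputs of the sub-filters for the high sub-words, the low
sub-words and the sub-word sums, respectively.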
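The decomposition into three sub-filters and the reconstruction of the output can be checked with a behavioural Python sketch. It rests on several assumptions: plain integer arithmetic instead of Carry-Save hardware, a direct-form reference filter for brevity, Python's arithmetic shift for splitting signed words, and hypothetical names (`fir`, `split`, `karatsuba_fir`). The coefficient and sample values are arbitrary 16-bit examples, not data from the poster.

```python
def fir(h, x):
    """Reference FIR: y(n) = sum over k of h(k) * x(n - k), x(n) = 0 for n < 0."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def split(value, n):
    """Split value into (high, low) so that value == high * 2**n + low."""
    return value >> n, value & ((1 << n) - 1)


def karatsuba_fir(h, x, n):
    """Compute the FIR output from three sub-filters on n-bit sub-words."""
    h_hi, h_lo = zip(*(split(c, n) for c in h))
    x_hi, x_lo = zip(*(split(s, n) for s in x))
    # Input conversion: one adder per coefficient and per sample.
    h_sum = [a + b for a, b in zip(h_hi, h_lo)]
    x_sum = [a + b for a, b in zip(x_hi, x_lo)]
    # The three sub-filters are independent and could run in parallel.
    y_h = fir(h_hi, x_hi)
    y_l = fir(h_lo, x_lo)
    y_s = fir(h_sum, x_sum)
    # Output conversion: constant scalings (shifts) and additions only.
    w = 1 << n
    return [
        (w * w - w) * yh + w * ys + (1 - w) * yl
        for yh, ys, yl in zip(y_h, y_s, y_l)
    ]


h = [1000, -2345, 777, 12]
x = [30000, -1, 255, -32768, 4096]
assert karatsuba_fir(h, x, 8) == fir(h, x)
```

Because filtering is linear, the Karatsuba identity for a single product carries over to the whole sum of products, so the conversion adders are needed once per input and output sample rather than once per tap.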
[Figure: delay comparison. Bar labels (Karatsuba vs conventional): 0.9 vs 1.07 (-16%) at 16 bits and 1.37 vs 1.55 (-12%) at 32 bits.]
[Figure: Area (mm2), two charts. Bar labels (Karatsuba vs conventional): 0.145 vs 0.164 (-11%) at 16 bits and 0.409 vs 0.46 (-11%) at 32 bits; 0.283 vs 0.301 (-6%) at 16 bits and 0.86 vs 0.924 (-7%) at 32 bits.]
[Figure: Power (mW), two charts. Bar labels (Karatsuba vs conventional): 106.2 vs 132.9 (-20%) at 16 bits and 213.7 vs 264.6 at 32 bits; 197.1 vs 222.1 (-11%) at 16 bits and 550.2 vs 659.6 (-17%) at 32 bits.]
Conclusions
We have presented an efficient FIR filter based on the Karatsuba formula. The architecture we have
designed is composed of three sub-filters of reduced dynamic range. A parallel, MB pre-encoded, CS
Wallace tree multiplier was designed and used as a building block. We chose CS arithmetic in
order to enhance delay savings. The proposed Karatsuba design shows improved performance, a
smaller circuit area and lower power consumption compared with the conventional transposed FIR
filter.