I. INTRODUCTION
Tree (PPST), and finally due to the Final Adder [9]. Of these,
the dominant components of the multiplier delay are due to
the PPST and the final adder. The relative delay due to the
PPG is small. Therefore, a significant improvement in the
speed of the multiplier can be achieved by reducing the delay
in the PPST and the final adder stages of the multiplier. In this
work, the delay introduced by the PPST is reduced by
reducing the partial products in two independent, parallel
structures. The proposed hybrid final adder then computes the
final product faster than a CLA-based final adder.
This paper is structured as follows: Sections II and III
describe the design of the parallel structures for the PPST and
the design of the hybrid final adder structure, respectively.
Section IV reports the ASIC implementation details and the
simulation results. Finally, Section V summarizes the
analysis. Throughout the paper, it is assumed that the
multiplier and the multiplicand have the same number of bits.
This work was carried out at the Integrated Circuit Design Laboratories,
VIT University, Vellore, India.
B. Ramkumar is with the School of Electronics Engineering, VIT
University, Vellore (email: ramkumar.b@vit.ac.in).
V. Sreedeep is with the School of Electronics Engineering, VIT
University, Vellore (email: v.sreedeep@gmail.com).
Harish M Kittur is with the School of Electronics Engineering, VIT
University, Vellore (email: kittur@vit.ac.in).
Fig. 1. Partitioning the partial products: (a) Partial product array
diagram for 8*8 multiplier, (b) An Alternative Representation, (c)
Partitioned structure of multiplier showing part0 and part1.
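The partitioning in Fig. 1(c) can be illustrated with a minimal sketch. It assumes that part0 collects the partial-product bits of the low-order columns (i + j < n) and part1 the remaining columns; this split is an assumption chosen because it reproduces the bit ranges p0[10:0] and p1[15:8] used for the 8-bit case, and the function name is hypothetical. The two sums are computed independently, as the two reduction trees would be.

```python
def partial_product_parts(a: int, b: int, n: int = 8):
    """Sum the partial-product bits a_i AND b_j in two independent groups.

    Assumption: part0 holds the columns with i + j < n, part1 the rest.
    """
    part0 = 0
    part1 = 0
    for i in range(n):
        for j in range(n):
            bit = ((a >> i) & 1) & ((b >> j) & 1)
            if i + j < n:
                part0 += bit << (i + j)   # low-order columns 0 .. n-1
            else:
                part1 += bit << (i + j)   # high-order columns n .. 2n-2
    return part0, part1


p0, p1 = partial_product_parts(0xB7, 0x5D)
assert p0 + p1 == 0xB7 * 0x5D   # the two parts add up to the product
assert p0 < (1 << 11)           # part0 fits in p0[10:0]
assert p1 % (1 << 8) == 0       # part1 only occupies p1[15:8]
```

Under this assumption the two parts overlap only in bits [10:8], which is where the final adder has to perform a true addition.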
[Figure: reduction of the two partial-product parts with half adders (HA) and full adders (FA), panels (a) and (b). The surviving output labels are p0 to p7 and p0[8] to p0[10] for the first part, and p1[8] to p1[15] for the second part.]
TABLE I
REGULAR DADDA MULTIPLIER WITH CLA
Multiplier (N by N)   Area (μm²)   Delay (ns)   Power (μW)
8 by 8                8,428        3.40         6.32
16 by 16              29,169       4.71         33.09
32 by 32              105,237      5.92         210.50
64 by 64              397,146      7.54         925.92
TABLE II
PARTITIONED DADDA MULTIPLIER WITH CLA
Multiplier (N by N)   Area (μm²)   Delay (ns)   Power (μW)
8 by 8                8,957        3.51         6.85
16 by 16              30,241       4.61         35.22
32 by 32              107,362      5.47         218.76
64 by 64              386,629      6.94         952.59
B.
The bits p0[10:8] and p1[10:8] are added using a 3-bit CLA,
which produces p[10:8]. To obtain the remaining bits p[15:11],
p1[15:11] is applied to the input of a 5-bit MBEC, which
produces two partial results: p1[15:11] itself for a Cin of 0,
and the 5-bit BEC output for a Cin of 1. Depending on the
Cout of the CLA (c[10]), the mux selects the final p[15:11]
without having to ripple the carry through p1[15:11].
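The behaviour of this 8-bit final adder can be sketched at the word level. The sketch below is illustrative only: the function name is hypothetical, the 3-bit adder is modelled with ordinary integer addition rather than a CLA, and it assumes that p[7:0] is taken directly from p0[7:0] because p1 has no bits below bit 8.

```python
def final_add_8bit(p0: int, p1: int) -> int:
    """Combine p0[10:0] and p1[15:8] into the 16-bit product p[15:0]."""
    low = p0 & 0xFF                               # p[7:0], assumed to pass through
    mid = ((p0 >> 8) & 0x7) + ((p1 >> 8) & 0x7)   # 3-bit addition on bits [10:8]
    c10 = mid >> 3                                # carry out of the 3-bit adder
    b = (p1 >> 11) & 0x1F                         # p1[15:11], the MBEC input
    bec = (b + 1) & 0x1F                          # 5-bit BEC output (input plus one)
    high = bec if c10 else b                      # 10:5 mux selected by c[10]
    return (high << 11) | ((mid & 0x7) << 8) | low


# Example: p0 = 0x7FF and p1 = 0x0100 force a carry into the MBEC block.
assert final_add_8bit(0x7FF, 0x0100) == 0x7FF + 0x0100
```

Both candidates for p[15:11] are available before c[10] arrives, so only the mux delay is added once the carry is known.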
The 8-bit multiplier uses a single 5-bit MBEC in the final
adder. Multipliers with larger word sizes, however, require
multiple MBECs, each of which takes its selection input from
the carry output of the preceding MBEC. To generate this
carry output, an additional block called MBECWC (MBEC With
Carry) is developed. The detailed structures of the 5-bit BEC
without carry (BEC) and with carry (BECWC) are shown in
Fig. 6(a) and Fig. 6(b). The BEC takes n inputs and generates
n outputs; the BECWC takes n inputs and generates n+1
outputs, the extra output being the carry that serves as the
selection input of the next-stage mux in the final adder of the
16, 32 and 64-bit multipliers. The function tables of the BEC
and BECWC are shown in Table III.
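A bit-level sketch of the two converters follows. It uses the common XOR/AND formulation of an incrementer (x0 = NOT b0, x1 = b1 XOR b0, x2 = b2 XOR (b0 AND b1), and so on); whether this matches the exact gate arrangement of Fig. 6 is an assumption, and the helper names are hypothetical. The outputs agree with the function table of the 5-bit BEC and BECWC.

```python
def bec(bits):
    """n-bit BEC. bits is [b0, b1, ...] (LSB first); returns [x0, x1, ...]."""
    out = []
    lower_all_ones = 1                    # AND of all lower-order input bits
    for b in bits:
        out.append(b ^ lower_all_ones)    # flip b when every lower bit is 1
        lower_all_ones &= b
    return out


def becwc(bits):
    """n-bit BECWC: the BEC outputs plus a carry output (n + 1 outputs)."""
    carry = 1
    for b in bits:
        carry &= b                        # carry = b0 AND b1 AND ... AND b(n-1)
    return carry, bec(bits)


def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


# Last row of the function table: input 11111 gives x = 00000 with carry 1.
assert becwc(to_bits(0b11111, 5)) == (1, [0, 0, 0, 0, 0])
# First row: input 00000 gives x = 00001 with carry 0.
assert from_bits(bec(to_bits(0b00000, 5))) == 0b00001
```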
[Block diagram of the hybrid final adder for the 8-bit multiplier: p1[15:11] drives the MBEC, which consists of a 5-bit BEC followed by a 10:5 mux; p1[10:8] enters the adder stage, and the outputs p[10:8] and p[7:0] are shown.]
TABLE III
FUNCTION TABLE OF 5-BIT BEC & BECWC
Input     BEC without carry   BEC with carry
b[4:0]    x[4:0]              cy  x[4:0]
00000     00001               0   00001
00001     00010               0   00010
00010     00011               0   00011
00011     00100               0   00100
00100     00101               0   00101
 ...       ...                     ...
11011     11100               0   11100
11100     11101               0   11101
11101     11110               0   11110
11110     11111               0   11111
11111     00000               1   00000
Fig. 6. The 5-bit Binary to Excess-1 Code Converter: (a) BEC (without
carry), (b) BECWC (with carry).
Block labels recoverable from the panels of Fig. 7:
(a) 16-bit multiplier: 4-bit RCA on p0[19:16] and p1[19:16], giving p[19:16] and c[19]; 4-bit BECWC with 10:5 mux on p1[23:20], giving c[23] and p[23:20]; 8-bit BEC with 16:8 mux on p1[31:24], giving p[31:24]; low-order output p[15:0].
(b) 32-bit multiplier: 5-bit RCA on p0[36:32] and p1[36:32], giving p[36:32] and c[36]; 4-bit BECWC with 10:5 mux on p1[40:37], giving c[40] and p[40:37]; 8-bit BECWC with 18:9 mux on p1[48:41], giving c[48] and p[48:41]; 15-bit BEC with 30:15 mux on p1[63:49], giving p[63:49]; low-order output p[31:0].
(c) 64-bit multiplier: 6-bit RCA on p0[69:64] and p1[69:64], giving p[69:64] and c[69]; 4-bit BECWC with 10:5 mux on p1[73:70], giving c[73] and p[73:70]; 8-bit BECWC with 18:9 mux on p1[81:74], giving c[81] and p[81:74]; 16-bit BECWC with 34:17 mux on p1[97:82], giving c[97] and p[97:82]; 30-bit BEC with 60:30 mux on p1[127:98], giving p[127:98]; low-order output p[63:0].
Fig. 7. Variable block hybrid final adder: (a) For 16-bit multiplier, (b) For 32-bit multiplier, (c) For 64-bit multiplier.
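The chaining of blocks in Fig. 7 can be sketched at the word level as follows. The block widths are read from the block labels of the figure (an RCA for the overlapping bits, then BECWC blocks of growing width, then a final BEC); the function and table names are hypothetical, and integer arithmetic stands in for the actual adder and converter circuits.

```python
# Block widths read from the labels of Fig. 7: (RCA width, [BEC/BECWC widths]).
FIG7_BLOCKS = {
    16: (4, [4, 8]),
    32: (5, [4, 8, 15]),
    64: (6, [4, 8, 16, 30]),
}


def hybrid_final_add(p0, p1, n, rca_width, bec_widths):
    """Add p0 and p1 for an n-bit multiplier with a variable-block final adder.

    Assumes p0 < 2**(n + rca_width) and that p1 has no bits below bit n.
    """
    result = p0 & ((1 << n) - 1)          # low n bits come from p0 alone
    lo = n
    mask = (1 << rca_width) - 1
    s = ((p0 >> lo) & mask) + ((p1 >> lo) & mask)   # RCA on the overlapping bits
    result |= (s & mask) << lo
    carry = s >> rca_width                # select input of the first mux
    lo += rca_width
    for w in bec_widths:
        m = (1 << w) - 1
        b = (p1 >> lo) & m                # slice of p1 feeding this block
        selected = (b + 1) if carry else b    # mux: converter output or bypass
        result |= (selected & m) << lo
        carry = selected >> w             # carry output; unused after the last block
        lo += w
    return result


rca, becs = FIG7_BLOCKS[16]
assert hybrid_final_add(0xFFFFF, 0x000F0000, 16, rca, becs) == 0xFFFFF + 0x000F0000
```

Each block sees its select signal after one mux delay of the previous block, which is consistent with the block width growing toward the most significant bits in the figure.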
TABLE IV
PARTITIONED DADDA MULTIPLIER WITH CLA
Multiplier (N by N)   Area (μm²)   Delay (ns)   Power (μW)
8 by 8                8,957        3.51         6.85
16 by 16              30,241       4.61         35.22
32 by 32              107,362      5.47         218.76
64 by 64              386,629      6.94         952.59

TABLE V
PARTITIONED DADDA MULTIPLIER WITH HYBRID ADDER
Multiplier (N by N)   Area (μm²)   Delay (ns)   Power (μW)
8 by 8                9,144        3.38         7.07
16 by 16              30,577       4.13         35.99
32 by 32              107,491      4.71         221.01
64 by 64              381,776      5.51         966.45
V. RESULT SUMMARY
Table VI compares Table I (regular Dadda multiplier with
CLA) with Table V (partitioned multiplier with hybrid adder)
and expresses the differences as percentages. The area of the
regular Dadda multiplier is slightly smaller than that of the
proposed multiplier for the 8, 16 and 32-bit word sizes, by
8.5%, 4.8% and 2.1% respectively. The area overhead of the
proposed multiplier therefore decreases steadily with
increasing word size, and for the 64-bit multiplier the
proposed design is the smaller of the two, by 3.8%.
The power consumption of the regular Dadda multiplier is
11.8% lower than that of the proposed multiplier for the 8-bit
word size. With increasing word size the difference in power
requirement between the proposed and the regular Dadda
multiplier decreases, and Table VI lists a difference of only
2.21% for the 64-bit multiplier.
The delay values show that the proposed multiplier is always
faster than the regular Dadda multiplier, and that the
percentage reduction in delay increases with word size. The
speed enhancement is most significant for the 64-bit
multiplier, where the delay of the proposed design is 26.91%
lower than that of the regular Dadda multiplier.
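The percentages can be recomputed from the area, delay and power columns of Tables I and V. The sketch below assumes the convention (regular - proposed) / regular x 100, which gives negative values where the regular design is smaller or consumes less and positive values where the proposed design is better; the names are illustrative. Recomputed values need not match the printed table entries digit for digit.

```python
# (area, delay in ns, power) as printed in Table I and Table V.
REGULAR = {
    8: (8428, 3.40, 6.32),
    16: (29169, 4.71, 33.09),
    32: (105237, 5.92, 210.50),
    64: (397146, 7.54, 925.92),
}
PROPOSED = {
    8: (9144, 3.38, 7.07),
    16: (30577, 4.13, 35.99),
    32: (107491, 4.71, 221.01),
    64: (381776, 5.51, 966.45),
}


def percent_difference(regular, proposed):
    """Difference of the proposed design relative to the regular one, in percent."""
    return 100.0 * (regular - proposed) / regular


def compare(n):
    """Return (area %, delay %, power %) for the n by n multiplier."""
    return tuple(percent_difference(r, p)
                 for r, p in zip(REGULAR[n], PROPOSED[n]))


for n in sorted(REGULAR):
    area, delay, power = compare(n)
    print(f"{n} by {n}: area {area:+.2f}%  delay {delay:+.2f}%  power {power:+.2f}%")
```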
VI. CONCLUSION
TABLE VI
PERFORMANCE OF THE REGULAR WITH REFERENCE TO THE PROPOSED
DADDA MULTIPLIER
Multiplier (N by N)   Area (%)   Delay (%)   Power (%)
8 by 8                -8.5       +0.5        -11.8
16 by 16              -4.8       +12.21      -8.76
32 by 32              -2.1       +20.40      -4.99
64 by 64              +3.8       +26.91      -2.21