
Low-power-delay-product radix-4 8 × 8 Booth multiplier in CMOS

H. Xue✉, R. Patel, N.V.V.K. Boppana and S. Ren

ELECTRONICS LETTERS 22nd March 2018 Vol. 54 No. 6 pp. 344–346

The quest continues for microelectronic implementations with higher throughput and reduced power consumption, particularly for digital signal processing, graphics processing unit and portable CPU applications. This Letter focuses on the digital multiplier circuit, which is a key component in determining the power-delay product for numerous battery-powered applications. A radix-4 8 × 8 Booth multiplier is proposed and implemented using four stages with a unique optimised stage-1 architecture. In stage-1, the adder/subtractor is replaced with a novel binary-to-2’s complement converter and a 2:1 MUX. This reduces the delay and power consumption of that stage by 7.08 and 49.46%, respectively, compared to the other stages. The proposed design is implemented in 90 nm CMOS technology with a 1.2 V supply to demonstrate its performance.

Introduction: Booth multipliers are widely used in virtually every application requiring high-performance digital multiplication, including arithmetic logic units, graphics processing units and digital signal processing [1]. Booth encoding recodes the multiplier word into a radix-4 scheme that halves the number of partial products, with a corresponding reduction in the number of adders needed in hardware. The recoding can be done in real time with a relatively small increase in hardware. The reduced number of additions has therefore made the Booth multiplier a popular choice for high-performance multipliers while reducing power and area requirements, which are the most important characteristics in designing integrated circuits [2]. This Letter proposes design improvements to the conventional Booth multiplier that decrease the power-delay product (PDP). A radix-4 8 × 8 Booth multiplier incorporating the proposed design changes is developed in 90 nm CMOS technology for performance comparisons.

Conventional Booth multiplier: The key components of the conventional Booth multiplier are shown in Fig. 1a for a radix-4 8 × 8 implementation. Compared to a traditional multiplier, the number of partial products is reduced from 8 to 4. Each stage uses a 15-bit adder/subtractor, together with the radix-4 Booth encoder and 9-, 11-, 13- and 15-bit 3:1 MUXes. The stages are virtually identical, and each stage depends on receiving the result of the previous stage.

[Fig. 1 Radix-4 8 × 8 Booth multiplier: a Conventional; b Proposed (product bits P0P1, P2P3 and P4P5 from stages 1–3, P6–P14 from stage 4)]
[Fig. 2 Gate-level implementation of 9-bit B2C (inputs A0–A8, outputs S0–S8): a Conventional; b Modified with NAND and NOR gate; c Modified with four-input NAND gate]

Proposed radix-4 8 × 8 Booth multiplier: The proposed Booth multiplier design is shown in Fig. 1b. It has four stages, in which a unique stage-1 is the major optimisation, while relatively small modifications are made to the other stages to simplify the circuit structure. In Fig. 1a, the adder/subtractor in stage-1 has no real second input (its second input is a 15-bit ‘0’). Stage-1 is therefore implemented using a novel 9-bit binary-to-2’s complement converter (B2C) and a 9-bit 2:1 MUX rather than a 15-bit adder/subtractor. (Details of the novel B2C design and its performance are given below.) Also, the implicit bit y−1 below the least significant bit (LSB) of y is always ‘0’ in stage-1, so the three-input encoder in Fig. 1a can be simplified to a two-input encoder. The inputs to stage-1 are the 8-bit multiplicand x and the two LSBs of the multiplier y. The two LSBs of y are applied to the two-input encoder. The encoder controls the 3:1 MUX to select 0, x or 2x, and controls the 2:1 MUX to select either the output of the 3:1 MUX or the output of the B2C. The two LSBs of the stage-1 output are delivered directly to the two LSBs of the final product P, rather than being pushed down and added to 0 in subsequent stages. Therefore, the shifter in Fig. 1a can be eliminated, and the remaining 7 bits of the stage-1 output and x can be processed in the standard way in the next stage.

Stages 2, 3 and 4 are implemented using the same architecture, with a radix-4 Booth encoder, a 9-bit 3:1 MUX and a 9-bit adder/subtractor. As in stage-1, the corresponding 3 bits of y are applied to the Booth encoder, which controls the 3:1 MUX to select 0, x or 2x as one input of the adder/subtractor. The other input of the adder/subtractor is the output of the previous stage. Similarly, the two LSBs of the stage-2 and stage-3 outputs are delivered directly as the next two bits of the final product P. Therefore, the binary number computed in the adder/subtractor remains 9 bits wide, rather than growing by 2 bits in each stage. The output of stage-4 provides the remaining 9 bits of the final product P, as shown in Fig. 1b.

The 9-bit adder/subtractor is implemented as a square-root carry-select adder (SQRT CSA) to reduce delay and area. The SQRT CSA is built from pairs of ripple carry adders (RCAs), one computed with Cin = 0 and the other with Cin = 1, and the output is selected using a MUX. The first stage of the 9-bit SQRT CSA consists of two 1-bit full adders with Cin = 0 and Cin = 1, respectively. Similarly, the second, third and fourth stages of the adder consist of 2-, 3- and 4-bit RCAs with Cin = 0 and Cin = 1, respectively.

Optimised B2C: A novel high-speed low-power B2C is presented in this section. In 2’s complement representation, positive numbers are simply represented as themselves, while negative numbers are
represented by the 2’s complement of their absolute value. The 2’s complement of an n-bit number is equivalent to taking the 1’s complement (inverting each bit) and then adding 1 [3]. The 2’s complement of a binary number is formalised as

$$
\begin{aligned}
S_0 &= A_0\\
S_1 &= \overline{A_1} \oplus \overline{A_0}\\
S_2 &= \overline{A_2} \oplus \left(\overline{A_1}\cdot\overline{A_0}\right)\\
&\;\;\vdots\\
S_n &= \overline{A_n} \oplus \left(\overline{A_{n-1}}\cdot\overline{A_{n-2}}\cdots\overline{A_0}\right)
\end{aligned}
\tag{1}
$$

where $(A_n \ldots A_2 A_1 A_0)$ is the input binary number and $(S_n \ldots S_2 S_1 S_0)$ is its 2’s complement. For the 9-bit B2C, $(S_8 S_7 \ldots S_0) = \overline{A_8 A_7 \ldots A_0} + 1$. Based on (1), a conventional 9-bit B2C circuit can be designed as in Fig. 2a. Its critical path, highlighted in Fig. 2a, consists of one inverter, seven AND gates and one XOR gate.

To eliminate the inverters contained in the AND gates on the critical path, NAND and NOR gates are used to replace the AND gates. (A static CMOS AND gate is a NAND gate followed by an inverter.) An optimised version of (1) is then given in (2). It is obtained by applying De Morgan’s law, $\overline{A}\cdot\overline{B} = \overline{A + B}$, and the XOR identity $\overline{A} \oplus \overline{B} = A \oplus B$:

$$
\begin{aligned}
S_0 &= A_0\\
S_1 &= A_1 \oplus A_0\\
S_2 &= \overline{A_2} \oplus \left(\overline{A_1}\cdot\overline{A_0}\right) = A_2 \oplus \overline{\overline{A_1}\cdot\overline{A_0}}\\
S_3 &= \overline{A_3} \oplus \left(\overline{A_2}\cdot\overline{A_1}\cdot\overline{A_0}\right) = \overline{A_3} \oplus \overline{A_2 + \overline{\overline{A_1}\cdot\overline{A_0}}}\\
&\;\;\vdots\\
S_{n-1} &= \overline{A_{n-1}} \oplus \overline{A_{n-2} + \overline{\overline{A_{n-3}}\cdot\overline{A_{n-4} + \cdots \overline{\overline{A_1}\cdot\overline{A_0}}}}}\\
S_n &= A_n \oplus \overline{\overline{A_{n-1}}\cdot\overline{A_{n-2} + \overline{\overline{A_{n-3}}\cdot\cdots \overline{\overline{A_1}\cdot\overline{A_0}}}}}
\end{aligned}
\tag{2}
$$

where n is even, so n − 1 is odd. The modified 9-bit B2C is illustrated in Fig. 2b. Its highlighted critical path (one inverter, four NAND gates, three NOR gates and one XOR gate) contains no AND gates. Cadence schematic simulation results for the 9-bit B2C are shown in Table 1. The worst-case PDP of the modified B2C in Fig. 2b (17.854 fJ) is 9.2% lower than that of the conventional B2C in Fig. 2a (19.653 fJ).

Table 1: Performance of 9-bit B2C

                                        Delay (ps)   Power (µW)   PDP (fJ)
Conventional                              465.81        42.19      19.653
Modified with NAND and NOR gate           453.96        39.33      17.854
Modified with four-input NAND gate        246.26        40.01       9.853

To further reduce the delay of the critical path, four-input NAND gates replace some of the two-input NAND gates, as shown in Fig. 2c. The four-input NAND gates do not have to wait for the two preceding gates. Gate 3 does not have to wait for gates 1 and 2, and gate 6 does not have to wait for gates 4 and 5, to propagate the signal along the critical path. The highlighted critical path in Fig. 2c consists of one inverter, two four-input NAND gates, one NOR gate and one XOR gate. Compared to the circuit in Fig. 2b, the number of gates on the critical path is reduced from 9 to 5. A four-input NAND gate has a longer pull-down path than a two-input gate. However, its delay can still be brought close to that of a two-input gate by transistor size optimisation. As shown in Table 1, the PDP of the B2C in Fig. 2c (9.853 fJ) is 44.8% lower than that of the B2C in Fig. 2b (17.854 fJ). The B2C with four-input NAND gates consumes slightly more power than the B2C with NAND and NOR gates, owing to the larger size of the four-input gates. The modified B2C with four-input NAND gates in Fig. 2c is used in the proposed Booth multiplier.

Final design performance: The proposed radix-4 8 × 8 Booth multiplier is implemented in IBM 90 nm CMOS technology with a 1.2 V power supply. The maximum operating frequency of the proposed Booth multiplier is 1.22 GHz. The worst-case delay is 1.04 ns, with a power consumption of 435.9 µW at 500 MHz. Table 2 compares this work with the conventional Booth multiplier and a recent state-of-the-art design [4]. To achieve delay and power efficiency, the authors of [4] used a modified Booth encoder. They also simplified the partial-product generation circuit by removing unnecessary components.

Table 2: Performance comparison of radix-4 8 × 8 Booth multipliers in 90 nm CMOS technology

                  Delay (ns)   Power (µW)   PDP (fJ)
Conventional         2.21        521.2        1152
Qian^a [4]           2.08        405.5         834
This work            1.04        435.9         453

^a Data from the reference are normalised to 90 nm CMOS technology with a 1.2 V supply voltage using the formulas given in [5]:

$$
t_{d,\mathrm{norm}} = t_d \times \frac{90\,\mathrm{nm}}{\mathrm{tech.}}, \qquad
E_{\mathrm{norm}} = E \times \frac{90\,\mathrm{nm}}{\mathrm{tech.}} \times \left(\frac{1.2\,\mathrm{V}}{V_{\mathrm{tech.}}}\right)^{2}
$$

As seen in Table 2, both the circuit delay and the power consumption of the proposed design are improved compared to the conventional design. The proposed design consumes more power than the referenced work, but its PDP is superior.

Conclusion: This Letter presented a low-PDP radix-4 8 × 8 Booth multiplier. A novel binary-to-2’s complement converter is incorporated so that the adder/subtractor in stage-1 can be replaced, reducing the power and delay of the Booth multiplier. Additional modifications are included to further simplify the structure of the Booth multiplier. Cadence simulation results in 90 nm CMOS technology indicate that the PDP of the proposed design is improved compared to both the conventional design and a recent state-of-the-art design.

© The Institution of Engineering and Technology 2018
Submitted: 25 October 2017 E-first: 13 February 2018
doi: 10.1049/el.2017.3996
One or more of the Figures in this Letter are available in colour online.

H. Xue, R. Patel, N.V.V.K. Boppana and S. Ren (Wright State University, Dayton OH 45435, USA)
✉ E-mail: xue.10@wright.edu

References

1 Jiang, H., Han, J., Qiao, F., et al.: ‘Approximate radix-8 Booth multipliers for low-power and high-performance operation’, Trans. Comput., 2016, 65, (8), pp. 2638–2644, doi: 10.1109/TC.2015.2493547
2 Xue, H., and Ren, S.: ‘Low power-delay-product dynamic CMOS circuit design techniques’, Electron. Lett., 2017, 53, (5), pp. 302–304, doi: 10.1049/el.2016.4173
3 Chattopadhyay, T., and Gayen, D.: ‘All-optical 2’s complement number conversion scheme without binary addition’, Optoelectronics, 2017, 11, (1), pp. 1–7, doi: 10.1049/iet-opt.2015.0087
4 Qian, L., Wang, C., Liu, W., et al.: ‘Design and evaluation of an approximate Wallace-Booth multiplier’, IEEE Int. Symp. Circuits and Systems (ISCAS), Montreal, QC, Canada, May 2016, pp. 1974–1977
5 Chuang, P., Sachdev, M., and Gaudet, V.: ‘A 167-ps 2.34-mW single-cycle 64-bit binary tree comparator with constant-delay logic in 65-nm CMOS’, Trans. Circuits Syst., 2014, 61, (1), pp. 160–171, doi: 10.1109/TCSI.2013.2268591
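The radix-4 Booth recoding described in the Introduction and the stage organisation of the proposed multiplier in Fig. 1b can be cross-checked behaviourally. In Fig. 1b, two product bits are emitted per stage and the running sum is shifted rather than widened. The Python sketch below models only the arithmetic, not the authors' circuit. It assumes signed 8-bit 2's complement operands and abstracts the 3:1 MUX, B2C and 2:1 MUX into a multiplication by the Booth digit. The function names booth_digits and booth_multiply are hypothetical, and hardware bit widths are not modelled.

```python
def booth_digits(y: int, n_bits: int = 8) -> list[int]:
    """Radix-4 Booth digits of a signed n_bits-wide multiplier y, least significant first.

    Digit i is y[2i-1] + y[2i] - 2*y[2i+1], with the implicit bit y[-1] = 0,
    so every digit lies in {-2, -1, 0, +1, +2}.
    """
    u = y & ((1 << n_bits) - 1)              # 2's complement bit pattern of y
    digits = []
    prev = 0                                 # y[-1] = 0, hence the two-input encoder in stage-1
    for i in range(0, n_bits, 2):
        lo, hi = (u >> i) & 1, (u >> (i + 1)) & 1
        digits.append(prev + lo - 2 * hi)
        prev = hi
    return digits


def booth_multiply(x: int, y: int) -> int:
    """Signed 8 x 8 product from four radix-4 stages, emitting two product bits per stage."""
    acc = 0                                  # running partial sum (Python int, width not modelled)
    low_bits = 0                             # product bits already delivered to P
    digits = booth_digits(y)
    for stage, d in enumerate(digits):
        acc += d * x                         # add 0, +-x or +-2x to the previous stage output
        if stage < len(digits) - 1:
            low_bits |= (acc & 0b11) << (2 * stage)   # two LSBs go straight to P
            acc >>= 2                        # arithmetic shift: the rest feeds the next stage
    return acc * 4 ** (len(digits) - 1) + low_bits   # last stage supplies the upper bits


# Usage: -128 * -128 exercises the extreme signed operands.
print(booth_multiply(-128, -128))  # 16384
```

The recombination is exact because Python's `>>` on a negative integer rounds toward minus infinity, and `& 0b11` returns the two low bits of its 2's complement pattern. At every stage the emitted bits plus four times the shifted remainder equal the unshifted sum.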

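The carry-select principle behind the 9-bit SQRT CSA used in stages 2–4 can be sketched in Python. Each block computes its sum twice, with carry-in 0 and 1, and the incoming carry only drives a 2:1 selection. This is a behavioural sketch under stated assumptions. The block partition is a hypothetical parameter (defaulting here to 2-, 3- and 4-bit blocks covering 9 bits), so it is not necessarily the partition of the Letter. Subtraction is assumed to be a + ~b + 1, as in a standard adder/subtractor. The function names are illustrative.

```python
def ripple_carry_add(a: int, b: int, cin: int, width: int) -> tuple[int, int]:
    """Ripple carry adder: returns (sum, carry_out) for width-bit operands."""
    s, c = 0, cin
    for k in range(width):
        ak, bk = (a >> k) & 1, (b >> k) & 1
        s |= (ak ^ bk ^ c) << k                  # full-adder sum bit
        c = (ak & bk) | (c & (ak ^ bk))          # full-adder carry out
    return s, c


def carry_select_add(a: int, b: int, cin: int, block_sizes=(2, 3, 4)) -> tuple[int, int]:
    """Carry-select adder over blocks whose sizes sum to the operand width."""
    total, carry, offset = 0, cin, 0
    for size in block_sizes:
        mask = (1 << size) - 1
        ab, bb = (a >> offset) & mask, (b >> offset) & mask
        s0, c0 = ripple_carry_add(ab, bb, 0, size)   # RCA precomputed with Cin = 0
        s1, c1 = ripple_carry_add(ab, bb, 1, size)   # RCA precomputed with Cin = 1
        s, carry = (s1, c1) if carry else (s0, c0)   # MUX driven by the real carry
        total |= s << offset
        offset += size
    return total, carry


def add_sub(a: int, b: int, subtract: bool, block_sizes=(2, 3, 4)) -> int:
    """Adder/subtractor: a - b is computed as a + ~b + 1 within the adder width."""
    width = sum(block_sizes)
    mask = (1 << width) - 1
    if subtract:
        b = ~b & mask                            # invert b; the +1 enters as carry-in
    s, _ = carry_select_add(a & mask, b & mask, int(subtract), block_sizes)
    return s


print(add_sub(5, 9, subtract=True))  # 508, i.e. -4 in 9-bit 2's complement
```

Only the MUX chain sits on the carry path between blocks, which is why the sum delay grows with the number of blocks rather than with the full operand width.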

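Equations (1) and (2) of the Optimised B2C section can be checked exhaustively for the 9-bit case against the reference value (−A) mod 2⁹. The Python sketch below models each gate as a function on 0/1 integers, without timing. The AND-chain form follows (1), and the alternating NAND/NOR form follows (2). The helper names (to_bits, nand, b2c_eq1, b2c_eq2, and so on) are hypothetical.

```python
N_BITS = 9  # width of the B2C in Fig. 2


def to_bits(a: int) -> list[int]:
    """LSB-first bit list A_0 .. A_8 of a 9-bit value."""
    return [(a >> k) & 1 for k in range(N_BITS)]


def from_bits(s: list[int]) -> int:
    return sum(bit << k for k, bit in enumerate(s))


def nand(a: int, b: int) -> int:
    return 1 - (a & b)


def nor(a: int, b: int) -> int:
    return 1 - (a | b)


def b2c_eq1(a: int) -> int:
    """Equation (1): S_0 = A_0, S_i = not(A_i) XOR (not(A_{i-1}) AND ... AND not(A_0))."""
    A = to_bits(a)
    S = [A[0]]
    chain = 1 - A[0]                  # running AND of the inverted lower bits
    for i in range(1, N_BITS):
        S.append((1 - A[i]) ^ chain)
        chain &= 1 - A[i]             # one more AND gate in the ripple chain
    return from_bits(S)


def b2c_eq2(a: int) -> int:
    """Equation (2): the AND chain rewritten as alternating NAND and NOR gates."""
    A = to_bits(a)
    S = [A[0], A[1] ^ A[0]]           # S_0 = A_0, S_1 = A_1 XOR A_0
    g = nand(1 - A[1], 1 - A[0])      # complement of not(A_1) AND not(A_0)
    for i in range(2, N_BITS):
        if i % 2 == 0:                # g holds the complemented chain: S_i = A_i XOR g
            S.append(A[i] ^ g)
            g = nor(A[i], g)          # NOR stage restores the true chain value
        else:                         # g holds the true chain: S_i = not(A_i) XOR g
            S.append((1 - A[i]) ^ g)
            g = nand(1 - A[i], g)     # NAND stage yields the complemented chain
    return from_bits(S)


print(b2c_eq2(3))  # 509, the 9-bit pattern of -3
```

In b2c_eq2, the value of g consumed by S_8 has passed through the initial NAND, then NOR, NAND, NOR, NAND, NOR and NAND stages. That is four NAND and three NOR gates after the single input inverter, matching the critical-path composition stated for Fig. 2b.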