
A Low-Power 16 × 16-b Parallel Multiplier Utilizing Pass-Transistor Logic

C. F. Law, S. S. Rofail, and K. S. Yeo

Abstract: This paper describes a low-power 16 × 16-b parallel very large scale integration (VLSI) multiplier, designed and fabricated using a 0.8-µm double-metal double-poly BiCMOS process. To achieve low-power operation, the multiplier was designed mainly with pass-transistor (PT) logic circuits. The inherent nonfull-swing nature of PT logic circuits was fully exploited without significantly compromising the speed of the overall circuit implementation. New circuit implementations for the partial-product generator and the partial-product addition circuitry have been proposed, simulated, and fabricated. Experimental results showed that the worst case multiplication time of the test chip is 10.4 ns at a supply voltage of 3.3 V, and the average power dissipation is 38 mW at a frequency of 10 MHz.

Index Terms: Low-power VLSI design, parallel multipliers, pass-transistor logic.

I. INTRODUCTION

[...] parallel multiplication unit to carry out high-speed mathematical operations. In many situations, the multiplier lies directly in the critical path, resulting in an extremely high demand on its speed. In the past, considerable effort was put into designing multipliers with higher speed and throughput, which resulted in fast multipliers that can operate with a delay time as low as 4.1 ns [1]. However, with the increasing importance of the power issue due to the portability and reliability concerns of electronic devices [2], recent work has started to look into circuit design techniques that will lower the power dissipation of multipliers [3]–[5].

This paper describes the design and fabrication of a 16 × 16-b parallel multiplier, based on a 0.8-µm BiCMOS process, for low-power applications. Pass-transistor (PT) logic is chosen to implement most of the logic functions within our multiplier. Emerging as an attractive replacement for conventional static CMOS logic, especially in the design of arithmetic macros, PT logic requires fewer devices to implement basic logic functions in an arithmetic operation, such as the XOR function. This translates into lower input gate capacitance and power dissipation compared to conventional static CMOS [2]. In the PT circuit implementations reported so far [6]–[9], transmission-gate (TG) design techniques, which provide full voltage swings, were widely adopted. In this paper, we present several circuits that fully exploit the inherent nonfull-swing (NFS) nature of PT logic within our multiplier to achieve low-power operation. Various proposed and reported circuit implementations of the partial-product generator (PPG) and partial-product adder (PPA) are discussed in Sections II and III, respectively. Section IV presents the experimental measurements of the test chip. All circuit simulations are based on a 0.8-µm double-metal double-poly BiCMOS process and were carried out on the HSPICE simulator.

The authors are with the Division of Circuits and Systems, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798 (e-mail: eksyeo@ntu.edu.sg).

Publisher Item Identifier S 0018-9200(99)08225-6.

TABLE I
CODING OF [...] IN A STANDARD PPG IMPLEMENTATION

II. PARTIAL-PRODUCT GENERATOR (PPG)

To date, the most widely adopted technique for partial-product generation in large multipliers (16-b and above) is the modified Booth's algorithm (MBA). The main attraction of MBA is that instead of generating n partial-products for an n-b multiplication, it generates only half of that. According to MBA, a signed binary number in its two's-complement form can be partitioned into overlapping groups of three bits. By coding each of these groups, an n-b signed binary number can be represented as a sum of n/2 signed digits. As each signed digit takes the possible values of 0, ±1, and ±2, the required partial-products are all power-of-two multiples of the multiplicand (X), which are readily available.
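The recoding described above can be checked numerically. The following Python sketch is illustrative only: the function names `booth_radix4_digits` and `booth_multiply` and the default width of 16 bits are assumptions made for this example and do not come from the original design. It recodes an n-b two's-complement multiplier into n/2 signed digits by scanning overlapping three-bit groups, then forms the product as a sum of shifted multiples of the multiplicand.

```python
def booth_radix4_digits(y: int, n: int = 16) -> list[int]:
    """Recode an n-bit two's-complement integer into n/2 digits in {-2..2}.

    Digit i is computed from the overlapping group (y[2i+1], y[2i], y[2i-1]),
    with y[-1] taken as 0, so that y == sum(d[i] * 4**i).
    """
    if n % 2 != 0:
        raise ValueError("n must be even")
    if not -(1 << (n - 1)) <= y < (1 << (n - 1)):
        raise ValueError("y does not fit in n bits")
    bits = y & ((1 << n) - 1)           # two's-complement bit pattern of y
    digits = []
    prev = 0                             # y[-1], the implicit bit below the LSB
    for i in range(n // 2):
        lo = (bits >> (2 * i)) & 1       # y[2i]
        hi = (bits >> (2 * i + 1)) & 1   # y[2i+1]
        digits.append(-2 * hi + lo + prev)
        prev = hi                        # top bit overlaps into the next group
    return digits


def booth_multiply(x: int, y: int, n: int = 16) -> int:
    """Multiply x by y by summing n/2 partial-products of 0, ±x, or ±2x."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_radix4_digits(y, n)))
```

For a 16-b multiplier the recoding returns eight digits, which corresponds to the eight partial-products that the rest of the design has to add.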

The standard PPG circuit implementation requires five control bits, each representing a 0, +X, −X, +2X, or −2X operation. The truth table for the control bits is shown in Table I. When implemented in full CMOS, the encoders exhibit only moderate performance [4]. To improve performance, complementary PT logic (CPL) family cells have been used in [4]. Although a significant improvement in power dissipation (30%) has been reported, the CPL encoder requires 122 transistors, a 150% increase compared to the CMOS encoder (48 transistors), and provides only a 6% improvement in speed. We present a PT Booth's encoder (Fig. 1) which outperforms both the CMOS and CPL implementations in terms of power, speed, and transistor count. Denote the three bits of a group, from most to least significant, by $y_{2i+1}$, $y_{2i}$, and $y_{2i-1}$. From Table I, it is obvious that the control bit for 0 is high when $y_{2i+1}$, $y_{2i}$, and $y_{2i-1}$ are the same:

$$C_{0} = \overline{y_{2i+1} \oplus y_{2i}} \cdot \overline{y_{2i} \oplus y_{2i-1}} \qquad (1)$$

The control bit for +X is high when $y_{2i}$ and $y_{2i-1}$ are different, provided $y_{2i+1}$ is low. The same is true for −X, except that $y_{2i+1}$ must now be high. The expressions for these control bits are

$$C_{+X} = \overline{y_{2i+1}} \cdot (y_{2i} \oplus y_{2i-1}) \qquad (2)$$

$$C_{-X} = y_{2i+1} \cdot (y_{2i} \oplus y_{2i-1}) \qquad (3)$$

Clearly, the XOR of $y_{2i}$ and $y_{2i-1}$ should be performed to generate all three control bits. A PT XOR-XNOR pair carries out this operation, and the results (XOR and XNOR) are fed simultaneously into three PT AND-NAND pairs to generate the respective control bits. The control bits for +2X and −2X are generated using the conventional CPL logic style and therefore will not be discussed. The proposed circuit was compared with the CMOS and CPL circuits, and the results are shown in Table II. The proposed encoder outperforms the CMOS implementation by 21% in speed and over 50% in power dissipation, with approximately the same transistor count. When compared to the CPL encoder, our circuit is faster by 15% and achieves about 50% improvement in power and transistor count.

Fig. 1. Proposed circuit implementations for control bits (a) 0 and (b) +X and −X.

TABLE II
IMPROVEMENT OF PROPOSED ENCODER OVER CMOS AND CPL ENCODERS

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 10, OCTOBER 1999
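The control-bit conditions stated in the text (all three bits equal for 0; the two lower bits different with the upper bit low for +X or high for −X) can be written as a small truth-table model. This is a behavioral sketch, not the pass-transistor circuit of Fig. 1. The argument names `hi`, `mid`, and `lo` are hypothetical labels for the three bits of a group, and the ±2X conditions are taken from the standard radix-4 Booth table rather than from the source, which does not spell them out.

```python
def booth_control_bits(hi: int, mid: int, lo: int) -> dict[str, int]:
    """Return the five one-hot control bits for one three-bit Booth group.

    hi, mid, lo are the group's bits from most to least significant
    (hypothetical names; the original symbols were lost in extraction).
    """
    diff = mid ^ lo    # shared XOR of the two lower bits
    same = diff ^ 1    # its complement (XNOR)
    return {
        "0": same & ((hi ^ mid) ^ 1),        # all three bits equal
        "+X": diff & (hi ^ 1),               # lower bits differ, upper bit low
        "-X": diff & hi,                     # lower bits differ, upper bit high
        "+2X": (hi ^ 1) & mid & lo,          # standard Booth table: 011
        "-2X": hi & (mid ^ 1) & (lo ^ 1),    # standard Booth table: 100
    }
```

The single `diff` term feeds the 0, +X, and −X outputs, which mirrors the observation that one XOR-XNOR pair is enough to derive all three of these control bits.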

III. PARTIAL-PRODUCT ADDER (PPA)

[...] regular adder array, which has a regular structure and is easy to lay out. However, it suffers from poor speed performance and power wastage due to spurious transitions. For large multipliers, another approach, the Wallace reduction technique, is usually used. This approach leads to much better speed performance because of the high level of parallelism in the Wallace tree-adder, which is constructed using multiple-input compressors that can sum several partial-products concurrently. The second approach was adopted in our circuit implementation to obtain the best speed performance possible, while PT logic circuits with NFS nodes were used to reduce the power of the Wallace tree-adder.

In a 16 × 16-b multiplier utilizing the MBA, there are eight partial-products to be added up. Thus, the 4-2 compressor was chosen as the basic building block of the PPA. It receives five input bits of the same weight (four data inputs and a carry input), compresses them, and generates three output bits (a sum and two carries). Various circuit implementations of the 4-2 compressor have been reported. Full CMOS implementations usually suffer from high transistor count and input gate capacitance, leading to only moderate speed and power performance. The pseudo-CMOS implementation proposed in [10] (using a 0.5-µm CMOS technology) [...] transistor count of the basic building gate (an XOR gate), and an improvement of 12.5% in speed over the full CMOS circuit has been reported. To further simplify the design, a PT-multiplexer-based circuit comprising only TGs, using a 0.25-µm CMOS technology, was proposed in [6]. Using this technique, a multiplication time of 4.4 ns has been reported.

TABLE III
COMPARISONS OF WALLACE TREE-ADDERS WITH DIFFERENT 4-2 COMPRESSOR IMPLEMENTATIONS

Clearly, considerable efforts have been directed toward simplifying the design of the compressors and improving the speed of the Wallace tree-adder. Its power dissipation, however, was never a major consideration. We present a 4-2 compressor circuit design (Fig. 2) which is an improved version of the one proposed in [6], requiring fewer transistors and consuming less power. The proposed design takes full advantage of the NFS nodes that are inherent in PT logic circuits. As shown in Fig. 2, it consists of two types of PT multiplexers, one providing NFS outputs (NFS MUX), and the other providing full-swing (FS) outputs using two PMOS pull-up devices (FS MUX). For each 4-2 compressor, only four internal nodes and the sum output are pulled up to the supply voltage ($V_{DD}$) for logic high, while the rest reach only approximately one n-channel threshold voltage below it ($V_{DD} - V_{Tn}$). Among the NFS nodes are both the output carry signals and internal nodes [...] to form the PPA. Since the outputs of the compressors in the first stage drive the inputs of those in the second stage, the NFS carry output of the first stage must be used to drive one of the second-stage inputs that can accept an NFS logic high, while the sum output, being an FS node, can be used to drive any of the second-stage compressor's inputs. With this technique, about 50% of the nodes within the PPA are nonfull swing. Furthermore, as only two of the 4-2 compressor's inputs require full voltage swing, only four of the eight partial-products generated by the PPG are required to achieve full swing. This leads to significant power reduction for the multiplier.
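The arithmetic role of a 4-2 compressor, five same-weight inputs reduced to one sum bit and two carry bits of double weight, can be modeled with multiplexers and XORs. The sketch below uses a generic textbook multiplexer formulation. It is not the NFS/FS circuit of Fig. 2, and the names `mux`, `compressor_4_2`, `i1` to `i4`, and `cin` are assumptions made for illustration. Voltage swing is not modeled.

```python
def mux(sel: int, a: int, b: int) -> int:
    """Two-input multiplexer: returns a when sel is 0 and b when sel is 1."""
    return b if sel else a


def compressor_4_2(i1: int, i2: int, i3: int, i4: int, cin: int) -> tuple[int, int, int]:
    """Behavioral 4-2 compressor.

    Returns (s, carry, cout) such that
    i1 + i2 + i3 + i4 + cin == s + 2 * (carry + cout).
    cout does not depend on cin, so carries do not ripple along a row.
    """
    x12 = i1 ^ i2
    x34 = i3 ^ i4
    x = x12 ^ x34
    cout = mux(x12, i1, i3)    # majority of i1, i2, i3
    carry = mux(x, i4, cin)    # majority of (i1 ^ i2 ^ i3), i4, cin
    s = x ^ cin                # parity of all five inputs
    return s, carry, cout
```

With eight partial-products, two such compressors per bit column reduce eight rows to four, and a further compressor in a second stage reduces four rows to two, which is why two 4-2 compressors appear in series on the critical path.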

The PPA was implemented using the various 4-2 compressors discussed above, and comparisons in terms of speed, power, power-delay product, and transistor count were made at 3.3 V. The simulation results are shown in Table III. When compared to the pseudo-CMOS implementation, the proposed implementation achieved significant improvements in delay, power, and transistor count. The shorter delay of the proposed implementation (2.2 ns) is due to its much shorter critical path (three PT multiplexers) compared to the pseudo-CMOS implementation (two [...]-channel XOR gates and two CMOS complex gates). The presence of NFS nodes and a 48% cut in the transistor count have led to a 44% improvement in power dissipation. Significant improvements over the TG implementation in terms of power (62%) and transistor count (37.5%) were also obtained. Although the lower current drive capability of NFS nodes, as compared to FS nodes, has caused the proposed implementation to suffer a 16% decrease in speed, the power-delay product still improved by over 50%. In conclusion, the low-power and low-transistor-count characteristics of the improved PT 4-2 compressor are very useful in the design of a low-power high-performance PPA with relatively small circuit area.
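The power-delay claim relative to the TG implementation can be cross-checked from the two percentages quoted above. The short Python sketch below assumes that "a 16% decrease in speed" means a 16% longer delay, which is an interpretation and not something the text states. The function name `relative_power_delay` is hypothetical.

```python
def relative_power_delay(power_saving: float, delay_increase: float) -> float:
    """Power-delay product of the new design relative to the reference (1.0).

    power_saving:   fractional reduction in power (0.62 for 62%).
    delay_increase: fractional increase in delay (0.16 for 16%, assumed).
    """
    return (1.0 - power_saving) * (1.0 + delay_increase)


ratio = relative_power_delay(power_saving=0.62, delay_increase=0.16)
improvement = 1.0 - ratio   # fraction by which the power-delay product drops
print(f"relative PDP = {ratio:.3f}, improvement = {improvement:.1%}")
```

Under this assumption the relative power-delay product is about 0.44, an improvement of roughly 56%, which is consistent with the stated "over 50%".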

IV. EXPERIMENTAL RESULTS

The multiplier was fabricated on a test chip using a 0.8-µm double-metal double-poly BiCMOS process. To measure its worst case multiplication time, input test patterns are applied to trigger its critical path, which includes a Booth's encoder, a control-line buffer, a partial-product selector, two 4-2 compressors, a half-adder, and the 32-b two-operand carry-select adder, with carry propagation from the fourteenth to the highest (thirty-first) bit position. One such pattern is shown in Fig. 3. The worst case (rise) delay is measured to be 10.4 ns. The average power dissipation of the test chip, inclusive of the multiplier core, input/output pads, output multiplexers, and testing circuitry, with no probes at the outputs, is 38 mW. The multiplication time and power dissipation of the fabricated device were measured for the supply range of 2.5 V to 5 V, and the results are compared with some of the reported multipliers of the same width, as shown in Fig. 4. At 3.3 V, the multipliers reported in [11] and [12], which used 0.6- and 0.5-µm CMOS technologies, respectively, achieved, as expected, better multiplication times than our work. Our multiplier, however, provides a significant saving in power. At 10 MHz, its power dissipation is less than half that of [11] and even less when compared to [12]. A similar observation is made at 4 V when our multiplier [...] 0.5-µm CMOS process, where over 50% reduction in power is obtained. The characteristics of the fabricated device are shown in Table IV.

Fig. 4. Worst case multiplication time and average power dissipation at 10 MHz of the test chip for supply range 2.5 to 5 V as compared to other reported multipliers of the same width.

TABLE IV
CHARACTERISTICS OF FABRICATED 16 × 16-b MULTIPLIER

V. CONCLUSION

We have presented several low-power PT circuit techniques for parallel multiplication. Taking full advantage of the low-transistor-count and NFS nature of PT logic, we have successfully implemented low-power circuit blocks which serve as basic building units within a 16 × 16-b multiplier, including a new Booth's encoder and a modified 4-2 compressor. Experimental measurements on the fabricated multiplier and comparisons with other reported multipliers have verified its low-power characteristics. The total power dissipation of the test chip at 3.3 V is 38 mW at 10 MHz, with a worst case multiplication time of 10.4 ns.
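As a rough figure of merit implied by these numbers, the average energy per multiplication can be estimated by dividing power by operating frequency. This assumes one multiplication per cycle at 10 MHz, which the text does not state explicitly. Because the 38 mW covers the whole test chip (pads and test circuitry included), it is an upper bound for the multiplier core. The symbol $E_{\mathrm{op}}$ is introduced here for illustration only.

```latex
E_{\mathrm{op}} \approx \frac{P_{\mathrm{avg}}}{f}
  = \frac{38 \times 10^{-3}\ \mathrm{W}}{10 \times 10^{6}\ \mathrm{s^{-1}}}
  = 3.8 \times 10^{-9}\ \mathrm{J}
  = 3.8\ \mathrm{nJ\ per\ multiplication}
```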

REFERENCES

[1] A. Inoue, R. Ohe, S. Kashiwakuura, S. Mitarai, T. Tsuru, T. Izawa, and G. Goto, "A 4.1 ns compact 54 × 54-b multiplier utilizing sign select Booth encoders," in Proc. Int. Solid-State Circuits Conf., 1997, pp. 416–417.

[2] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-power CMOS digital design," IEEE J. Solid-State Circuits, vol. 27, pp. 473–483, Apr. 1992.

[3] R. Fried, "Minimizing energy dissipation in high-speed multipliers," in IEEE Int. Symp. Low-Power Electronics and Design, Dig. Tech. Papers, 1997, pp. 214–219.

[4] I. S. Abu-Khater, A. Bellaouar, and M. I. Elmasry, "Circuit techniques for CMOS low-power high-performance multipliers," IEEE J. Solid-State Circuits, vol. 31, pp. 1535–1546, Oct. 1996.

[5] K. H. Cheng and L. Y. Yee, "A 1.2 V CMOS multiplier using low-power current-sensing complementary pass-transistor logic," in Proc. IEEE Int. Conf. Electronics, Circuits, Systems, 1996, pp. 1037–1040.

[6] N. Ohkubo, M. Suzuki, T. Shinbo, T. Yamanaka, A. Shimizu, K. Sasaki, and Y. Nakagome, "A 4.4 ns CMOS 54 × 54-b multiplier using [...] 251–257, Mar. 1995.

[7] K. Yano, T. Yamanaka, T. Nishida, M. Saito, K. Shimohigashi, and A. Shimizu, "A 3.8 ns 16 × 16-b multiplier using complementary pass-transistor logic," IEEE J. Solid-State Circuits, vol. 25, pp. 388–395, Apr. 1990.

[8] H. Hara, T. Sakurai, T. Nagamatsu, K. Seta, H. Momose, Y. Niitsu, H. Miyakawa, K. Matsuda, Y. Watenabe, F. Sano, and A. Chiba, "0.5 µm 3.3 V BiCMOS standard cells with 32-kilobyte cache and ten-port register file," IEEE J. Solid-State Circuits, vol. 27, pp. 1579–1584, Mar. 1992.

[9] A. Rothermel, B. J. Hosticka, G. Troster, and J. Arndt, "Realization of transmission-gate conditional-sum (TGCS) adders with low latency time," IEEE J. Solid-State Circuits, vol. 24, pp. 558–561, June 1989.

[10] [...] K. Hashimoto, H. Hayashida, and K. Maeguichi, "A 10 ns 54 × 54-b parallel structured full array multiplier with 0.5 µm CMOS technology," IEEE J. Solid-State Circuits, vol. 26, pp. 600–605, Apr. 1991.

[11] Y. Oowaki, K. Numata, K. Tsuchiya, K. Tsuda, H. Takato, N. Takenouchi, A. Nitayama, Y. Kobayashi, M. Chiba, S. Watanabe, K. Ohuchi, and A. Hojo, "A sub-10 ns 16 × 16 multiplier using 0.6 µm CMOS technology," IEEE J. Solid-State Circuits, vol. SSC-22, pp. 762–766, May 1987.

[12] R. Sharma, A. D. Lopez, J. A. Michejda, S. J. Hilleniue, J. M. Andrews, and A. J. Studwell, "A 6.75 ns 16 × 16-b multiplier in single-level-metal CMOS technology," IEEE J. Solid-State Circuits, vol. 24, pp. 922–927, Apr. 1989.
