
Faculty of Engineering

Computer Arithmetic
06-88-555
Fall 2009

Project report
Submitted
By
ABHINEET SINGH
103257615

Low power 4-2 and 5-2 Compressor


Report for Computer Arithmetic course project
Abhineet Singh
Faculty of Engineering
University of Windsor
Abstract
This report introduces the logical effort technique, which is used to design fast CMOS circuits.
As an application, the technique is applied to different architectures of low-power, high-order
compressors, namely 4-2 and 5-2 compressor units implemented using static CMOS gates. These
compressors are building blocks for binary multipliers. The delay, power consumption and area
of these structures are compared using simulation results in a standard CMOS process. The
compressor presented is 12% faster and consumes 37% less power.

Introduction
CMOS logic gates are the basic building blocks of arithmetic circuits. The delay through a gate
depends on its size and its load. Logical effort is a technique that gives insight into how to
size CMOS logic gates to reach the minimum achievable delay.

Multiplication is a basic arithmetic operation that is important in many microprocessor and
digital signal processing (DSP) applications. Microprocessors use multipliers within their
arithmetic logic units, and DSP systems require multipliers to implement algorithms such as
convolution and filtering. In most systems the multiplier lies directly on the critical path, so
the demand for high-speed multipliers is continuously increasing. Compressors are widely used in
high-speed multiplication and multi-operand addition. Pass-transistor logic is emerging as an
attractive replacement for conventional static CMOS logic, especially in the design of
arithmetic units such as multipliers. It requires fewer transistors to implement basic logic
functions, which translates into lower input gate capacitance and lower power dissipation than
conventional CMOS.

In this project, the logical effort technique is discussed first, with some supporting examples.
Then a few common architectures for 4-2 and 5-2 compressors are sized using logical effort to
obtain the minimum possible delay, and their power consumption is simulated.

Logical Effort

The delay of a logic gate has two components: a fixed part called the parasitic delay, p, and a
load-dependent part called the effort delay or stage effort, f. The total delay can then be
written as [1]:

d = f + p    (1)

The effort delay is the product of two factors: the logical effort, g, which captures the
properties of the logic gate that matter for sizing, and the electrical effort, h, which
characterizes the load:

f = gh    (2)

The electrical effort is defined as the ratio of the load capacitance C_out to the input
capacitance C_in of the gate:

h = C_out / C_in    (3)

The delay d given by (1) is a unitless number. Multiplying it by τ, the delay of an inverter
driving an identical inverter, converts it to absolute time. The value of τ depends only on the
process, so for a given gate the absolute delay is:

delay = d ⋅ τ    (4)

where

d = gh + p    (5)

Table 1 summarizes the logical effort of some static CMOS gates. These values assume that the
worst-case resistances of the pull-up and pull-down networks are identical to those of a
minimum-size inverter.


                 Number of inputs
Gate type    1      2      3      n
Inverter     1      -      -      -
NAND         -      4/3    5/3    (n+2)/3
NOR          -      5/3    7/3    (2n+1)/3
MUX          -      2      2      2
XOR          -      4      12     -

Table 1. Logical effort of static CMOS gates (γ = 2).

The other parameter that contributes to the total delay of a gate is the parasitic delay, p. It
is independent of both the gate size and the load capacitance, and arises from the source and
drain capacitances at the output node of the gate. As an estimate, assuming a simple layout, the
parasitic delay of an n-input NAND, NOR and MUX is n, n and 2n times that of an inverter,
respectively.
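The closed-form entries of Table 1 and the parasitic delay estimates above can be collected into two small helper functions. This is only a sketch: the function names and the gate labels ("INV", "NAND", "NOR", "MUX") are chosen here for illustration, and XOR is left out because Table 1 gives no general formula for it.

```python
from fractions import Fraction

def logical_effort(gate: str, n: int = 1) -> Fraction:
    """Logical effort g of an n-input static CMOS gate, per Table 1 (gamma = 2)."""
    if gate == "INV":
        return Fraction(1)
    if gate == "NAND":
        return Fraction(n + 2, 3)
    if gate == "NOR":
        return Fraction(2 * n + 1, 3)
    if gate == "MUX":
        return Fraction(2)
    raise ValueError(f"no closed-form logical effort tabulated for {gate!r}")

def parasitic_delay(gate: str, n: int = 1, p_inv: int = 1) -> int:
    """Estimated parasitic delay p in units of the inverter parasitic delay."""
    if gate == "INV":
        return p_inv
    if gate in ("NAND", "NOR"):
        return n * p_inv
    if gate == "MUX":
        return 2 * n * p_inv
    raise ValueError(f"no parasitic delay estimate for {gate!r}")
```

Exact fractions are used so that values such as 4/3 and 5/3 compare equal to the table entries without rounding.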

A simple example shows how (5) is used to calculate delay. As shown in Fig. 1, assume that a
4-input NOR gate drives three identical gates. To estimate the delay of the NOR gate we need
the values of g, h and p. From the discussion above, a 4-input NOR gate has g = 9/3 = 3 and
p = 4·pinv; assuming pinv = 1, p = 4. The gate drives a load capacitance three times larger than
its own input capacitance, so h = 3. Using (5) gives d = 3×3 + 4 = 13. If τ = 50 ps in this
process, the total delay of the gate is 13 × 50 ps = 650 ps.
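The same calculation can be written as a short script. This is a minimal sketch of equations (4) and (5); the function name `gate_delay` is an illustrative choice, and the numbers are the ones used in the NOR example.

```python
def gate_delay(g: float, h: float, p: float, tau_ps: float) -> float:
    """Absolute gate delay in picoseconds: (g*h + p) * tau, from (4) and (5)."""
    return (g * h + p) * tau_ps

# 4-input NOR gate driving three identical gates
g_nor4 = (2 * 4 + 1) / 3   # logical effort of an n-input NOR, n = 4
h = 3                      # load is three times the gate's input capacitance
p = 4                      # parasitic delay, assuming pinv = 1

delay_ps = gate_delay(g_nor4, h, p, tau_ps=50.0)
print(delay_ps)  # 650.0
```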

(Figure 1) A NOR gate driving three identical gates


Partial Product Accumulation

Compressors are the fundamental building blocks used to accumulate the partial products during
multiplication. Improving the power efficiency of these blocks can therefore lead to a
significant reduction in the power consumed by the entire multiplier. The compressors are
combined to form a Wallace tree or a Dadda tree structure. A Wallace tree is an adder tree
designed for minimum propagation delay: rather than completely adding the partial products in
pairs, as a ripple adder tree does, it sums all the bits of the same weight in a merged tree.
A Dadda tree is a generalized form of the Wallace tree. It needs fewer adders than a Wallace
tree, but its interconnections are more irregular, which makes it more difficult to lay out in
a VLSI design [4].

Previously, full adders (3:2 compressors) were used for accumulation. A 3:2 compressor combines
three equally weighted bits to produce two bits: one (the carry) with weight n+1 and the other
(the sum) with weight n. Each layer of the tree therefore reduces the number of vectors by a
factor of 3:2.

(Figure 2) A 3:2 compressor (full adder)
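The behaviour of a 3:2 compressor, and the 3:2 reduction per layer, can be checked with a short bit-level model. This is only an illustrative sketch: `full_adder` and `levels_3to2` are names introduced here, and the level count assumes that every complete group of three rows in a layer is compressed to two while leftover rows pass through unchanged.

```python
from itertools import product
from typing import Tuple

def full_adder(a: int, b: int, c: int) -> Tuple[int, int]:
    """3:2 compressor: three bits of weight n -> (sum of weight n, carry of weight n+1)."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry

# The two outputs always encode the number of ones among the three inputs.
for a, b, c in product((0, 1), repeat=3):
    s, carry = full_adder(a, b, c)
    assert a + b + c == s + 2 * carry

def levels_3to2(rows: int) -> int:
    """Number of 3:2 layers needed to reduce `rows` operand vectors to two."""
    levels = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3   # each full group of 3 rows becomes 2
        levels += 1
    return levels
```

For example, `levels_3to2(8)` follows the sequence 8, 6, 4, 3, 2 and so returns 4.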

Different CMOS 4-2 compressor circuits

In arithmetic circuits, 4-2 and 5-2 compressors are widely used to lower the latency of the
partial product accumulation stage of high-speed multipliers. The 4-2 compressor is also a
popular element for building regularly structured Wallace trees with low complexity, because of
its regular interconnection [2], [3]. Several 4-2 compressor circuits have been proposed for
low-power applications [4]-[6]. In this project, three different 4-2 compressor structures were
selected and sized using the logical effort technique, and their minimum achievable delays were
compared.

Figure 3 shows the block diagram of a 4:2 compressor. It has five inputs, including a carry-in
from the neighbouring cell one bit position lower in significance, and three outputs, including
a carry-out to the cell one bit position higher in significance. A 4:2 compressor can be built
from two 3:2 compressors in series, which gives a critical path delay of 4 XORs.

(Figure 3) A 4:2 compressor
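The construction from two 3:2 compressors in series can be modelled at bit level as follows. This is a sketch for illustration only; the signal names (`x1` to `x4`, `cin`, `cout`) are assumptions made here and are not taken from the figure.

```python
from itertools import product
from typing import Tuple

def full_adder(a: int, b: int, c: int) -> Tuple[int, int]:
    """3:2 compressor returning (sum, carry)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)

def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> Tuple[int, int, int]:
    """4:2 compressor built from two full adders; returns (sum, carry, cout)."""
    s1, cout = full_adder(x1, x2, x3)   # cout does not depend on cin
    s, carry = full_adder(s1, x4, cin)  # second stage absorbs x4 and the carry-in
    return s, carry, cout

def check_4_2() -> bool:
    """True if sum + 2*(carry + cout) equals the number of ones for all 32 inputs."""
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(*bits)
        if sum(bits) != s + 2 * (carry + cout):
            return False
    return True
```

Because `cout` is produced by the first full adder alone, a carry-in never ripples through to the carry-out of the same cell, which is what allows a row of such cells to operate in parallel.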


In [4], a structure consisting of four XOR gates and two multiplexers is introduced (shown in
Fig. 3). A somewhat similar structure with a different XOR implementation, shown in Figure 4,
is used in [5]. A third configuration, introduced in [6] and shown in Figure 5, uses only
multiplexers. This implementation is better: it has a critical path delay of 3 XORs, one XOR
less than two 3:2 compressors in series [4].
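A structure with four XOR gates and two multiplexers can be sketched at bit level as shown below. The exact wiring is not given in this report, so the connections here are an assumption based on the usual XOR-MUX formulation of a 4:2 compressor; the names `mux` and `compressor_4_2_xor_mux` are illustrative.

```python
from itertools import product
from typing import Tuple

def mux(sel: int, when_1: int, when_0: int) -> int:
    """2:1 multiplexer: returns when_1 if sel is 1, otherwise when_0."""
    return when_1 if sel else when_0

def compressor_4_2_xor_mux(x1: int, x2: int, x3: int, x4: int, cin: int) -> Tuple[int, int, int]:
    """Assumed XOR-MUX 4:2 compressor; returns (sum, carry, cout)."""
    a = x1 ^ x2              # XOR 1
    b = x3 ^ x4              # XOR 2, in parallel with XOR 1
    c = a ^ b                # XOR 3
    s = c ^ cin              # XOR 4; the longest path x1 -> a -> c -> s crosses 3 XORs
    cout = mux(a, x3, x1)    # majority of x1, x2, x3
    carry = mux(c, cin, x4)  # majority of (x1^x2^x3), x4, cin
    return s, carry, cout

def matches_bit_count() -> bool:
    """True if the outputs encode the number of ones for all 32 input patterns."""
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2_xor_mux(*bits)
        if sum(bits) != s + 2 * (carry + cout):
            return False
    return True
```

Computing `x1 ^ x2` and `x3 ^ x4` in parallel is what shortens the longest path to three XOR delays in this arrangement.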

(Figure 4)    (Figure 5)

Figure 6 shows the block diagram of a 5:2 compressor. It has five direct inputs and two
additional carry-in bits from the neighbouring cell one bit position lower in significance. It
has four outputs: two carry-out bits passed to the cell one bit position higher in
significance, a carry bit that also has one bit higher significance, and the sum bit. A 5:2
compressor can be built from three 3:2 compressors, which gives a critical path delay of
6 XORs.


(Figure 6) A 5:2 compressor

An implementation of the 5:2 compressor was proposed in [4]; Figure 7 shows its block diagram.
At first glance this implementation appears to have a critical path delay of 4 XORs, but a
detailed analysis reveals that the critical path consists of 3 XORs plus one other gate delay.


(Figure 7) 5:2 compressor, implementation 1

Figure 8 shows the block diagram of the second implementation of the 5:2 compressor. It is
derived in a similar fashion to the structures shown in Figure 4 and Figure 5. It essentially
consists of a 4:2 compressor followed by a 3:2 compressor and has a critical path delay of
5 XORs. As the simulation results show, although this circuit has a delay of 5 XORs it is
faster and more power efficient than the implementation in Figure 7. The detailed equations
are given below.

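Sum = a ⊕ b ⊕ c ⊕ d ⊕ e ⊕ x1in ⊕ x2in

Carry = (a ⊕ b ⊕ c ⊕ d ⊕ e ⊕ x1in) · x2in + ¬(a ⊕ b ⊕ c ⊕ d ⊕ e ⊕ x1in) · e

x1out = (a ⊕ b) · c + ¬(a ⊕ b) · a

x2out = (a ⊕ b ⊕ c ⊕ d) · x1in + ¬(a ⊕ b ⊕ c ⊕ d) · d

Here ¬ denotes the logical complement: each of the last three expressions is a 2:1 multiplexer
whose second term is selected when the XOR result is 0.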


(Figure 8) 5:2 compressor, implementation 2
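The equations of the second implementation can be checked exhaustively with a short script. This sketch assumes that the second product term of Carry, x1out and x2out is gated by the complement of the XOR term, as in a 2:1 multiplexer; without the complement the outputs would not encode the bit count. The function names are illustrative.

```python
from itertools import product
from typing import Tuple

def compressor_5_2(a: int, b: int, c: int, d: int, e: int,
                   x1in: int, x2in: int) -> Tuple[int, int, int, int]:
    """Bit-level model of the 5:2 equations; returns (sum, carry, x1out, x2out)."""
    ab = a ^ b
    abcd = ab ^ c ^ d
    sel = abcd ^ e ^ x1in                 # a^b^c^d^e^x1in
    total = sel ^ x2in                    # Sum
    carry = (sel & x2in) | ((1 - sel) & e)
    x1out = (ab & c) | ((1 - ab) & a)
    x2out = (abcd & x1in) | ((1 - abcd) & d)
    return total, carry, x1out, x2out

def encodes_bit_count() -> bool:
    """True if Sum + 2*(Carry + x1out + x2out) equals the number of ones for all 128 inputs."""
    for bits in product((0, 1), repeat=7):
        total, carry, x1out, x2out = compressor_5_2(*bits)
        if sum(bits) != total + 2 * (carry + x1out + x2out):
            return False
    return True
```

Note that `x1out` depends only on a, b and c, and `x2out` does not depend on `x2in`, so neither carry-out depends on the corresponding carry-in of the same cell.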

Simulation results
The structures shown in Figures 3 to 8 were implemented at transistor level in a standard
0.35 μm CMOS technology and simulated in HSPICE. First, each circuit was simulated with all
transistors at minimum size; then the transistor sizes were determined with the logical effort
technique to achieve minimum delay. The worst-case delay was simulated for all configurations.
Power consumption was also simulated at 500 MHz for a single data pattern. Table 2 summarizes
the delay and power consumption achieved before and after applying logical effort. A
surprising result is that power consumption also decreases after logical-effort-based
transistor sizing.


Structure           Time delay (ns)    Power consumption (10^-4 W)
Implementation 1    0.736              8.362
Implementation 2    0.600              4.875
Implementation 3    0.392              4.901
Implementation 4    0.338              2.718

Table 2: Simulation results for the different implementations of the 4:2 compressor
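The delay and power columns of Table 2 can be combined into a power-delay product (PDP) for each implementation. The sketch below only restates the table values and multiplies them; the dictionary and function names are illustrative.

```python
# (delay in ns, power in units of 1e-4 W), copied from Table 2
TABLE_2 = {
    "Implementation 1": (0.736, 8.362),
    "Implementation 2": (0.600, 4.875),
    "Implementation 3": (0.392, 4.901),
    "Implementation 4": (0.338, 2.718),
}

def power_delay_product(delay_ns: float, power_1e4_w: float) -> float:
    """Power-delay product in joules."""
    return (delay_ns * 1e-9) * (power_1e4_w * 1e-4)

pdp = {name: power_delay_product(*row) for name, row in TABLE_2.items()}
lowest = min(pdp, key=pdp.get)

for name, value in pdp.items():
    print(f"{name}: {value:.3e} J")
```

With these values `lowest` is "Implementation 4", which has both the smallest delay and the smallest power in the table.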

Structure           Time delay (ns)    Power consumption (10^-4 W)
Implementation 1    0.540              8.360
Implementation 2    0.477              4.635

Table 3: Comparison between the implementations of the 5:2 compressor

Conclusion
In this report the concept of logical effort has been introduced from first principles and, as
an application, applied to three different CMOS configurations of 4-2 and 5-2 compressors. The
simulation results show a reduction in delay and power consumption after applying logical
effort. A 5:2 compressor circuit that improves both delay and power consumption has been
presented. It achieves an 11.67% improvement in speed and a 37.02% improvement in power
consumption, for an overall improvement of 44.37% in the power-delay product.
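The speed figure and the power-delay figure can be reproduced with a few lines of arithmetic. In this sketch the speed improvement is computed from the two delays in Table 3, the power improvement is taken as reported in the text, and the two are combined multiplicatively; the variable and function names are illustrative.

```python
def relative_improvement(reference: float, improved: float) -> float:
    """Fractional reduction of `improved` relative to `reference`."""
    return (reference - improved) / reference

# Delays of the two 5:2 implementations from Table 3, in ns
speed_gain = relative_improvement(0.540, 0.477)

# Power improvement as reported in the text (assumed, not recomputed here)
power_gain = 0.3702

# The power-delay product scales with both factors
pdp_gain = 1 - (1 - speed_gain) * (1 - power_gain)

print(f"speed: {speed_gain:.2%}, power-delay product: {pdp_gain:.2%}")
# speed: 11.67%, power-delay product: 44.37%
```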

References

[1] I. Sutherland, B. Sproull and D. Harris, Logical Effort, Morgan Kaufmann Publishers, 1999.
[2] D. Radhakrishnan and A. P. Preethy, “Low-power CMOS pass logic 4-2 compressor for high speed multiplication,” in Proc. 43rd IEEE Midwest Symp. Circuits and Systems, vol. 3, pp. 1296-1298, 2000.
[3] Z. Wang, G. A. Jullien and W. C. Miller, “A new design technique for column compression multipliers,” IEEE Trans. Comput., vol. 44, pp. 962-970, Aug. 1995.
[4] K. Prasad and K. K. Parhi, “Low-power 4-2 and 5-2 compressors,” in Proc. Asilomar Conf. Signals, Systems and Computers, vol. 1, pp. 129-133, Nov. 2001.
[5] G. Goto, T. Sato, M. Nakajima and T. Sukemura, “A 54×54 regularly structured tree multiplier,” IEEE JSSC, vol. 27, no. 9, pp. 1229-1235, Sep. 1992.
[6] N. Ohkubo et al., “A 4.4 ns CMOS 54×54 multiplier using pass-transistor multiplexer,” IEEE JSSC, vol. 30, no. 3, pp. 251-257, Mar. 1995.
