Computer Arithmetic
06-88-555
Fall 2009
Project report
Submitted
By
ABHINEET SINGH
103257615
fast CMOS circuits. As an application, this technique has been applied to different architectures
of low-power, high-order compressors such as 4-2 and 5-2 compressor units, which are
implemented using static CMOS gates. These compressors are building blocks for binary
multipliers. A comparison of delay, power consumption and area for these structures is
presented according to simulation results in a standard CMOS process. The compressor presented
Introduction
CMOS logic gates are the basic building blocks of arithmetic circuits. The delay through these gates
is related to their sizes and their loads. Logical effort is a technique that gives insight into the
proper sizing of CMOS logic gates to achieve the minimum delay. On the other hand,
multiplication is a basic arithmetic operation that is important in many microprocessor and
digital signal processing applications. Microprocessors use multipliers within their arithmetic
logic units, and digital signal processing systems require multipliers to implement DSP algorithms
such as convolution and filtering. In most systems the multiplier lies directly on the critical
path, so the demand for high-speed multipliers is continuously increasing. In high-speed
multiplication and multi-operand addition, compressors have been widely used. Pass-transistor
logic is emerging as an attractive replacement for conventional static CMOS logic,
especially in the design of arithmetic units such as multipliers. Pass-transistor logic requires fewer
transistors to implement basic logic functions, which translates into lower input gate
capacitance and lower power dissipation compared to conventional CMOS. In this project,
the logical effort technique is first discussed and some supporting examples are given. Then
a few common architectures for 4-2 and 5-2 compressors are sized using logical effort to get the
Logical Effort
The delay of a logic gate consists of two components: a fixed part called the parasitic delay, p, and a
load-dependent part called the effort delay or stage delay, f. The total delay can then be
written as [1]:
d = f + p (1)
The effort delay can be divided into two parts: the logical effort, g, which is related to the properties
of the logic gate from a sizing point of view, and the electrical effort, h, which characterizes the load.
f = gh (2)
The electrical effort can be defined as the ratio of load capacitance Cout and the input capacitance of
driving an identical inverter (τ). This value depends only on the process, and for a given gate the total
where d = gh + p (5)
Table 1 summarizes the logical effort of some static CMOS gates. These values assume
that the worst-case resistances in the pull-up and pull-down networks are identical to a
Gate Type    Number of inputs
             1      2      3      n
Inverter     1
NAND                4/3    5/3    (n+2)/3
NOR                 5/3    7/3    (2n+1)/3
MUX                 2      2      2
XOR                 4      12
Another parameter that contributes to the total delay of the gate is the parasitic delay (p). This
parameter is independent of sizing and of the load capacitance. It is related to the source and drain
capacitances of the transistors driving the output. As an estimate, assuming a simple layout for the gates, the
parasitic delay of an n-input NAND, NOR and MUX is n, n and 2n times that of an inverter, respectively.
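The Table 1 formulas and the parasitic delay estimates above can be collected in a small lookup, sketched here in Python. The function names `logical_effort` and `parasitic_delay` are illustrative and not part of the report, and the XOR entries are only the two values listed in the table. No parasitic delay is given for XOR in the text, so that case is left unsupported.

```python
from fractions import Fraction

# Logical effort of 2- and 3-input XOR gates, copied from Table 1.
XOR_LOGICAL_EFFORT = {2: Fraction(4), 3: Fraction(12)}


def logical_effort(gate, n=1):
    """Logical effort g of an n-input static CMOS gate, following Table 1."""
    gate = gate.lower()
    if gate == "inverter":
        return Fraction(1)
    if gate == "nand":
        return Fraction(n + 2, 3)
    if gate == "nor":
        return Fraction(2 * n + 1, 3)
    if gate == "mux":
        return Fraction(2)
    if gate == "xor":
        if n not in XOR_LOGICAL_EFFORT:
            raise ValueError("Table 1 lists XOR only for 2 and 3 inputs")
        return XOR_LOGICAL_EFFORT[n]
    raise ValueError(f"unknown gate type: {gate}")


def parasitic_delay(gate, n=1, p_inv=1):
    """Parasitic delay p as a multiple of the inverter parasitic delay p_inv."""
    gate = gate.lower()
    if gate == "inverter":
        return p_inv
    if gate in ("nand", "nor"):
        return n * p_inv
    if gate == "mux":
        return 2 * n * p_inv
    raise ValueError(f"no parasitic delay estimate given for: {gate}")
```

Exact fractions are used so that values such as 4/3 and 7/3 match the table entries without rounding.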
We can now give a simple example of how (5) is used to calculate the delay. As shown in
Fig. 1, assume that a 4-input NOR gate drives three identical gates. To estimate the delay of
the NOR gate, we need the values of g, h and p. From the previous discussion, for a
4-input NOR gate g = 9/3 = 3 and p = 4pinv, which gives p = 4 assuming pinv = 1. The gate drives a capacitance
three times larger than its own input capacitance, so h = 3. Using (5) gives d = 3×3 + 4 = 13. If
τ = 50 ps in this process, the total delay of the gate is 650 ps.
(Figure 1) A 4-input NOR gate driving three identical gates, delay = d
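The arithmetic of this example can be reproduced with a few lines of Python. This is only a sketch of equation (5): the helper name `stage_delay` is an assumption made for illustration, and pinv = 1 and τ = 50 ps are the values assumed in the text.

```python
from fractions import Fraction


def stage_delay(g, h, p):
    """Normalized gate delay d = g*h + p, in units of tau (equation 5)."""
    return g * h + p


n_inputs = 4
g = Fraction(2 * n_inputs + 1, 3)  # logical effort of an n-input NOR: (2n+1)/3
p = n_inputs * 1                   # parasitic delay n * pinv, with pinv = 1
h = 3                              # the gate drives three identical gates

d = stage_delay(g, h, p)           # normalized delay
tau_ps = 50                        # process-dependent delay unit, in ps
delay_ps = d * tau_ps              # absolute delay in ps

print(d, delay_ps)  # prints: 13 650
```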
Compressors are the fundamental building blocks used for accumulating the partial products during
the multiplication process. Therefore, improving the power efficiency of these architectures can lead
to significant savings in the power consumed by the entire multiplier. The compressors are
combined to form a Wallace tree or a Dadda tree structure. A Wallace tree is an implementation of
an adder tree designed for minimum propagation delay; it does not completely add the partial
products in pairs as a ripple adder tree does. A Dadda tree is a generalized form of the Wallace tree adder.
The number of adders needed in a Dadda tree is smaller than in a Wallace tree, but the overall interconnections
are more irregular in a Dadda tree, making it more difficult to lay out in a VLSI design [4].
Previously, full adders, or 3:2 compressors, were used for accumulation, in which 3 equally
weighted bits are combined to produce two bits: one (the carry) with weight n+1 and the other
(the sum) with weight n. Each layer of the tree therefore reduces the number of vectors by a factor of
3:2.
(Figure 2) A 3:2 Compressor, Full Adder
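As a bit-level illustration of the 3:2 compressor described above, the following Python sketch models a full adder and checks that the three input bits are preserved as sum + 2·carry. The second function estimates how many 3:2 layers are needed to reduce a given number of vectors to two. It assumes that rows which do not fill a complete group of three are passed unchanged to the next layer, which is a common convention but is not stated in the report. Both function names are illustrative.

```python
from itertools import product


def compressor_3_2(a, b, c):
    """Full adder: three bits of weight n in, sum (weight n) and carry (weight n+1) out."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry


# Exhaustive check: the arithmetic value of the inputs is preserved.
for a, b, c in product((0, 1), repeat=3):
    s, carry = compressor_3_2(a, b, c)
    assert a + b + c == s + 2 * carry


def reduction_layers_3_2(rows):
    """Number of 3:2 layers needed to reduce `rows` vectors to two.

    Assumption: each layer turns every full group of three rows into two,
    and leftover rows (rows % 3) pass through unchanged.
    """
    layers = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3
        layers += 1
    return layers
```

Under this assumption, for example, 54 partial-product rows need `reduction_layers_3_2(54)` = 9 layers of 3:2 compressors.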
In arithmetic circuits, 4-2 and 5-2 compressors have been widely used to lower the latency of
the partial product accumulation stage in high-speed multipliers. The 4-2 compressor is also a
popular element in the construction of regularly structured Wallace trees with low complexity
because of its regular interconnection [2], [3]. Several 4-2 compressor circuits have been proposed
for low power applications [4]-[6]. In this project three different 4-2 compressor structures have been
selected and sized using the logical effort technique, and their minimum achievable delays
have been compared. Figure 3 shows the block diagram of a 4:2 compressor. It has five inputs,
including the carry-in from the neighbouring cell of one bit lower significance, and three
outputs, including a carry-out to the cell of one bit greater significance. A 4:2 compressor can be built using
3:2 compressors. It consists of two 3:2 compressors in series, which gives a critical path delay of 4
XORs.
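The construction from two 3:2 compressors in series can be sketched at the bit level as follows. This is a behavioural model only, not one of the transistor-level circuits compared in this report; the signal names `x1`..`x4`, `cin`, `cout` are chosen here for illustration. The check confirms that the five input bits equal sum + 2·(carry + cout), and that `cout` does not depend on `cin`.

```python
from itertools import product


def full_adder(a, b, c):
    """3:2 compressor: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def compressor_4_2(x1, x2, x3, x4, cin):
    """4:2 compressor built from two full adders in series.

    Returns (sum, carry, cout); carry and cout both have weight n+1.
    """
    s1, cout = full_adder(x1, x2, x3)   # first stage, independent of cin
    s, carry = full_adder(s1, x4, cin)  # second stage absorbs x4 and cin
    return s, carry, cout


# Exhaustive check over all 32 input combinations.
for x1, x2, x3, x4, cin in product((0, 1), repeat=5):
    s, carry, cout = compressor_4_2(x1, x2, x3, x4, cin)
    assert x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout)
    # cout is the same whichever value cin takes.
    assert cout == compressor_4_2(x1, x2, x3, x4, 1 - cin)[2]
```

Each full adder contributes two XORs to the sum path, which is where the 4-XOR critical path of this arrangement comes from.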
somewhat similar structure with a different XOR implementation, shown in Figure 4, is used in
[5]. A third configuration is introduced in [6] and shown in Figure 5. It uses only multiplexers. This
implementation is better and has a critical path delay of 3 XORs, hence reducing the critical
Figure 4 Figure 5
Figure 6 shows the block diagram of a 5:2 compressor. It has 5 direct inputs and 2 additional
carry-in bits from a neighbouring cell of one bit lower significance. It has four outputs:
two of them are carry-out bits to the cell of one bit greater significance, one is the carry bit of
one bit greater significance, and the last is the sum bit. A 5:2 compressor can be built using three 3:2 compressors, in
An implementation of the 5:2 compressor was proposed in [4]. Figure 7 shows the block diagram of
this 5:2 compressor implementation. At first glance, it can be assumed that this
implementation would have a critical path delay of 4 XORs. But a detailed analysis reveals that
Figure 8 shows a block diagram of the second implementation of the 5:2 compressor. This
implementation is derived in a similar fashion to the ones shown in Figure 4 and Figure 5. It
basically consists of a 4:2 compressor followed by a 3:2 compressor and has a critical path delay
of 5 XORs. As will be shown below, although this circuit has a delay of 5 XORs, it is faster and
more power efficient than the implementation in Figure 7. The detailed equations are
written below.
x1out = (a ⊕ b)·c + ¬(a ⊕ b)·a
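The carry equation above can be checked at the bit level. The sketch below assumes the usual multiplexer form of a full-adder carry, in which the second term is selected by the complement of (a ⊕ b), and confirms that it equals the majority function. It then uses that carry in a generic cascade of three 3:2 compressors to model a 5:2 compressor. This cascade is a behavioural illustration only, with signal names chosen here, and is not claimed to match the exact structure of Figure 7 or Figure 8.

```python
from itertools import product


def mux_carry(a, b, c):
    """Carry in multiplexer form: select c when a XOR b is 1, else a."""
    sel = a ^ b
    return (sel & c) | ((sel ^ 1) & a)


def majority(a, b, c):
    """Reference carry of a full adder."""
    return (a & b) | (b & c) | (a & c)


def full_adder(a, b, c):
    """3:2 compressor using the multiplexer-form carry."""
    return a ^ b ^ c, mux_carry(a, b, c)


def compressor_5_2(x1, x2, x3, x4, x5, cin1, cin2):
    """5:2 compressor modelled as three full adders in series.

    Returns (sum, carry, cout1, cout2); all three carries have weight n+1.
    """
    s1, cout1 = full_adder(x1, x2, x3)
    s2, cout2 = full_adder(s1, x4, cin1)
    s, carry = full_adder(s2, x5, cin2)
    return s, carry, cout1, cout2


# The multiplexer form agrees with the majority function for all inputs.
for a, b, c in product((0, 1), repeat=3):
    assert mux_carry(a, b, c) == majority(a, b, c)

# The seven input bits are preserved as sum + 2*(carry + cout1 + cout2).
for bits in product((0, 1), repeat=7):
    s, carry, cout1, cout2 = compressor_5_2(*bits)
    assert sum(bits) == s + 2 * (carry + cout1 + cout2)
```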
Simulation results
The structures shown in Figures 3 to 8 are implemented at transistor level in a 0.35 μm
standard CMOS technology and simulated in HSPICE. First, all circuits are simulated with minimum-size
transistors; then transistor sizes are determined using the logical effort technique to achieve minimum delay. For all
configurations, the worst-case delay is simulated. The power consumption of the structures at 500 MHz
for a unique data pattern is also simulated. Table 2 summarizes the achieved delay and power
consumption before and after using logical effort. The surprising result is that power consumption
Conclusion
In this report the concept of logical effort has been discussed from the basics and, as an application,
it has been applied to three different CMOS configurations of 4:2 and 5:2 compressors. The simulation
results show a reduction in delay and power consumption after applying logical effort. A new
circuit for a 5:2 compressor is presented that is better in both delay and power consumption. It achieves an 11.67% improvement
in speed and a 37.02% improvement in power consumption. Hence obtaining an overall improvement
References
[1] I. Sutherland, B. Sproull and D. Harris, Logical Effort, Morgan Kaufmann Publishers, 1999.
[2] D. Radhakrishnan and A. P. Preethy, “Low-power CMOS pass logic 4-2 compressor for high
speed multiplication,” in Proc. 43rd IEEE Midwest Symp. Circuits and Systems, vol. 3,
pp. 1296-1298, 2000.
[3] Z. Wang, G. A. Jullien and W. C. Miller, “A new design technique for column compression
multipliers,” IEEE Trans. Comput., vol. 44, pp. 962-970, Aug. 1995.
[4] K. Prasad and K. K. Parhi, “Low-power 4-2 and 5-2 compressors,” in Proc. Asilomar Conf. Signals,
Systems and Computers, vol. 1, pp. 129-133, Nov. 2001.
[5] G. Goto, T. Sato, M. Nakajima and T. Sukemura, “A 54×54 regularly structured tree multiplier,”
IEEE J. Solid-State Circuits, vol. 27, no. 9, pp. 1229-1235, Sep. 1992.
[6] N. Ohkubo et al., “A 4.4 ns CMOS 54×54 multiplier using pass-transistor multiplexer,” IEEE
J. Solid-State Circuits, vol. 30, no. 3, pp. 251-257, Mar. 1995.