Scaling and Subsystem Design Process: Taranath H B

SCALING AND SUBSYSTEM DESIGN PROCESS
2019 Module 3
Taranath H B
CMOS subsystem design: Architectural issues. Switch logic. Gate logic. Design examples – Combinational logic.
Clocked circuits. Other considerations.
SCALING OF MOS CIRCUITS
VLSI fabrication technology is still evolving, leading to smaller line widths and feature sizes and to higher
packing density of circuitry on a chip. Scaling down the feature size generally improves performance, so it is
important to understand the effects of scaling.
Microelectronic technology may be characterized using the following figures of merit:
- Minimum feature size
- Number of gates on one chip
- Power dissipation
- Maximum operational frequency
- Die size
- Production cost
Many of these figures of merit (FOM) can be improved by shrinking the dimensions of transistors and
interconnections.
Scaling models and scaling factors

Scaling is the proportional adjustment of the dimensions of an electronic device while maintaining its electrical
properties, resulting in a device either larger or smaller than the unscaled device.
The most commonly used models are the constant electric field scaling model and the constant voltage scaling
model. In the ideal model, dimensions and voltage scale together by the same scale factor.
Device scaling is modeled in terms of two generic scaling factors, $1/\alpha$ and $1/\beta$:
- $1/\alpha$: scaling factor for the linear dimensions, both horizontal and vertical.
- $1/\beta$: scaling factor for the supply voltage $V_{DD}$ and the gate oxide thickness $D$.
Gate Area $A_g$
$A_g = L \cdot W$
where $L$ is the channel length and $W$ is the channel width. Both are scaled by $1/\alpha$, so $A_g$ is scaled
by $1/\alpha^2$.
Gate Capacitance per unit area $C_o$ or $C_{ox}$

$C_{ox} = \dfrac{\varepsilon_{ox}}{D}$
where $\varepsilon_{ox}$ is the permittivity of the gate oxide and $D$ is the gate oxide thickness, which is scaled
by $1/\beta$. Thus $C_{ox}$ is scaled by $\dfrac{1}{1/\beta} = \beta$.
Gate Capacitance $C_g$
$C_g = C_o \cdot L \cdot W$
Thus $C_g$ is scaled by $\beta \cdot \dfrac{1}{\alpha^2} = \dfrac{\beta}{\alpha^2}$.
Parasitic Capacitance $C_x$
$C_x$ is proportional to $\dfrac{A_x}{d}$
where $d$ is the depletion width around the source or drain, which is scaled by $1/\alpha$, and $A_x$ is the area
of the depletion region around the source and drain, which is scaled by $1/\alpha^2$.
Thus $C_x$ is scaled by $\dfrac{1}{\alpha^2} \cdot \dfrac{1}{1/\alpha} = \dfrac{1}{\alpha}$.
Carrier Density in channel $Q_{on}$

$Q_{on} = C_o \cdot V_{gs}$
where $Q_{on}$ is the average charge per unit area in the channel in the 'on' state. Note that $C_o$ is scaled by
$\beta$ and $V_{gs}$ is scaled by $1/\beta$. Thus $Q_{on}$ is scaled by $\beta \cdot \dfrac{1}{\beta} = 1$.
Channel resistance $R_{on}$

$R_{on} = \dfrac{L}{W} \cdot \dfrac{1}{Q_{on}\,\mu}$
where $\mu$ is the carrier mobility in the channel and is assumed constant.
Thus $R_{on}$ is scaled by $\dfrac{1/\alpha}{1/\alpha} \cdot \dfrac{1}{1 \cdot 1} = 1$.
Gate Delay $T_d$
$T_d$ is proportional to $R_{on} \cdot C_g$
Thus $T_d$ is scaled by $1 \cdot \dfrac{\beta}{\alpha^2} = \dfrac{\beta}{\alpha^2}$.
Maximum Operating Frequency $f_o$

$f_o = \dfrac{W}{L} \cdot \dfrac{\mu\, C_o\, V_{DD}}{C_g}$
Equivalently, $f_o$ is inversely proportional to the delay $T_d$. Thus $f_o$ is scaled by $\dfrac{\alpha^2}{\beta}$.
Saturation current $I_{dss}$

$I_{dss} = \dfrac{\mu\, C_o}{2} \cdot \dfrac{W}{L} \cdot (V_{gs} - V_t)^2$
Noting that both $V_{gs}$ and $V_t$ are scaled by $1/\beta$, $I_{dss}$ is scaled by
$\beta \cdot \dfrac{1}{\beta^2} = \dfrac{1}{\beta}$.
Current Density $J$
$J = \dfrac{I_{dss}}{A}$
where $A$ is the cross-sectional area of the channel in the 'on' state, which is scaled by $1/\alpha^2$. So $J$ is
scaled by $\dfrac{1/\beta}{1/\alpha^2} = \dfrac{\alpha^2}{\beta}$.
Switching Energy per Gate $E_g$

$E_g = \dfrac{1}{2}\, C_g\, (V_{DD})^2$
So $E_g$ is scaled by $\dfrac{\beta}{\alpha^2} \cdot \dfrac{1}{\beta^2} = \dfrac{1}{\alpha^2 \beta}$.
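Power Dissipation per Gate $P_g$

$P_g$ comprises two components such that
$P_g = P_{gs} + P_{gd}$
where the static component is
$P_{gs} = \dfrac{(V_{DD})^2}{R_{on}}$
and the dynamic component is
$P_{gd} = E_g \cdot f_o$
Both $P_{gs}$ and $P_{gd}$ are scaled by $1/\beta^2$: the static term because $V_{DD}$ is scaled by $1/\beta$ and
$R_{on}$ by 1, the dynamic term because $\dfrac{1}{\alpha^2 \beta} \cdot \dfrac{\alpha^2}{\beta} = \dfrac{1}{\beta^2}$.
So $P_g$ is scaled by $1/\beta^2$.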
Power Dissipation per unit area $P_a$

$P_a = \dfrac{P_g}{A_g}$
So $P_a$ is scaled by $\dfrac{1/\beta^2}{1/\alpha^2} = \dfrac{\alpha^2}{\beta^2}$.
Power-Speed Product $P_T$
$P_T = P_g \cdot T_d$
So $P_T$ is scaled by $\dfrac{1}{\beta^2} \cdot \dfrac{\beta}{\alpha^2} = \dfrac{1}{\alpha^2 \beta}$.
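The scaling factors derived in this section can be collected in a short Python sketch. The function name and dictionary keys are illustrative and not part of the notes. The sketch also assumes the usual reading of the two models: constant electric field scaling takes β = α, and constant voltage scaling takes β = 1.

```python
from fractions import Fraction


def scaling_factors(alpha, beta):
    """Return the factor by which each parameter is multiplied when
    dimensions are scaled by 1/alpha and voltages by 1/beta."""
    a, b = Fraction(alpha), Fraction(beta)
    return {
        "Ag": 1 / a**2,         # gate area
        "Co": b,                # gate capacitance per unit area
        "Cg": b / a**2,         # gate capacitance
        "Cx": 1 / a,            # parasitic capacitance
        "Qon": Fraction(1),     # carrier density in channel
        "Ron": Fraction(1),     # channel resistance
        "Td": b / a**2,         # gate delay
        "fo": a**2 / b,         # maximum operating frequency
        "Idss": 1 / b,          # saturation current
        "J": a**2 / b,          # current density
        "Eg": 1 / (a**2 * b),   # switching energy per gate
        "Pg": 1 / b**2,         # power dissipation per gate
        "Pa": a**2 / b**2,      # power dissipation per unit area
        "PT": 1 / (a**2 * b),   # power-speed product
    }


# Assumed model settings for a 2x shrink (alpha = 2).
constant_field = scaling_factors(alpha=2, beta=2)    # beta = alpha
constant_voltage = scaling_factors(alpha=2, beta=1)  # beta = 1

for name in constant_field:
    print(f"{name:5s} field: {constant_field[name]!s:5s} voltage: {constant_voltage[name]!s}")
```

Under these assumptions a 2x shrink halves the gate delay with constant field scaling and quarters it with constant voltage scaling, at the cost of a fourfold rise in power per unit area in the constant voltage case.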
SUBSYSTEM DESIGN PROCESS

Q: What is in it for me? Is it going to be worthwhile investing the time to learn?
Only a little time is required to learn the fundamentals of VLSI design, which the Mead and Conway methodology
brought within the scope of the ordinary electronics engineer.
VLSI design technology offers better ways of tackling some problems, providing a way of designing and realizing
systems that are large and complex. It also provides an understanding of IC technology. Some general
considerations are:
1. Lower unit cost – compared to other approaches to the same system requirements
2. Higher reliability – high levels of system integration reduce the number of interconnections, which reduces
delay and power loss
3. Lower power dissipation, lower weight and lower volume compared with other approaches
4. Better performance – particularly in terms of speed-power product
5. Enhanced repeatability – there are fewer processes to control when the whole system is realized as a single
unit
6. The possibility of reduced design/development periods if suitable design procedures and design aids
are available
Some problems associated with VLSI design are:
1. How to design large, complex systems in a reasonable time and with reasonable effort
2. The nature of the architecture best suited to take full advantage of VLSI and the technology
3. The testability of large, complex systems once implemented in silicon
Probable solutions to problems 1 and 3 are:
- Approach the design in a top-down manner with adequate CAD tools. Partition the system sensibly,
aiming for simple interconnection between subsystems and high regularity
- Allocate a significant proportion (e.g. 30%) of the total chip area to test and diagnostic facilities
- When choosing the architecture, give communication the highest priority, since interconnections can take
up as much as 40-50% of the chip area
To represent a design, several approaches may be used at different stages of the design process, such as:
- Conventional circuit symbols
- Logic symbols
- Stick diagrams
- Mixture of logic symbols and stick diagrams
- Mask layouts
- Architectural block diagrams
- Floor Plans
The General Arrangement of a 4-bit Arithmetic Processor

Let us choose a 4-bit microprocessor as a design example, because it is suitable for illustrating the design and
interconnection of common architectural blocks. At this stage we will consider the design of the data path only.
Basic Digital Processor Structure

The data path comprises a unit which processes data applied at one port and presents its output at a second
port. The two ports may be combined as a single bidirectional port if storage facilities exist in the data path.
Communication strategy for data path

Decomposing the data path into a block diagram shows the main subunits. At this point it is useful to anticipate
a possible floor plan of the chip and its mask layout.
Subunits and basic interconnections for data path

Next, a decision must be made about the nature of the bus architecture linking the subunits. The options are a
one-bus, two-bus or three-bus architecture.
One-bus Architecture
Sequence:
1. The first operand is moved from the registers to the ALU, where it is stored
2. The second operand is moved from the registers to the ALU; the computation is done and the result is stored
in the ALU
3. The result is passed through the shifter to the registers
Two-bus Architecture
Sequence:
1. The two operands (A and B) are sent from the registers to the ALU and operated upon, and the result (S) is
stored in the ALU
2. The result is passed through the shifter and stored in the registers
Three-bus Architecture
Sequence:
The two operands (A and B) are sent from the registers, operated upon, and the shifted result (S) is returned to
another register, all in the same clock period.
The proposed processor comprises a register array in which 4-bit numbers can be stored, either from the
input/output port or from the output of the ALU via a shifter.
Data connections between the I/O port, ALU and shifter must be in the form of 4-bit buses. Each of the blocks
must also be suitably connected to control lines so that its function may be defined for any of a range of
possible operations.
Tentative Floor Plan for 4-Bit Datapath

The floor plan indicates a possible relative disposition of the blocks and an acceptable and sensible
interconnection strategy.
The Design of a 4-Bit Shifter

Any general-purpose n-bit shifter should be able to shift incoming data by up to n-1 places in the right-shift or
left-shift direction. Further, all shifts are made on an 'end-around' basis, so that any bit shifted out at one
end of the word is shifted in at the other end. This greatly eases the problem of right shift versus left shift,
since one can always be expressed as the other.
Consider, for a 4-bit word:
1-bit shift right ≡ 3-bit shift left
2-bit shift right ≡ 2-bit shift left
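
The equivalence between right and left end-around shifts can be checked with a few lines of Python. This is only an illustrative sketch: the function names are made up here, and a word is represented as a plain list of bits.

```python
def rotate_right(bits, k):
    """End-around right shift: bits leaving the right end re-enter on the left."""
    n = len(bits)
    k %= n
    return bits[n - k:] + bits[:n - k]


def rotate_left(bits, k):
    """End-around left shift: bits leaving the left end re-enter on the right."""
    n = len(bits)
    k %= n
    return bits[k:] + bits[:k]


word = [1, 0, 1, 1]
for k in range(len(word)):
    # A k-bit right shift equals an (n - k)-bit left shift for an n-bit word.
    assert rotate_right(word, k) == rotate_left(word, len(word) - k)
```

Because of this equivalence, a shifter that supports shifts of 0 to n-1 places in one direction already covers every shift in the other direction.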

The shifter must have:
- Input from a four-line parallel data bus
- Four output lines for the shifted data
- A means of transferring input data to the output lines with any shift from zero to three bits
To meet these requirements, the design should take best advantage of the available technology, such as the
switch-like MOS pass transistor or transmission gate.
A solution that meets these requirements emerges from switch-contact-based switching networks: the crossbar
switch.
4X4 Crossbar Switch
This arrangement can be adapted by coupling the switch gates together in groups of four, forming four separate
groups corresponding to shifts of zero, one, two and three bits.
The inter-bus switches have their gate inputs connected in a staircase fashion in groups of four, and these
groups form the four shift control inputs.
A barrel shifter connects the input lines representing a word to a group of output lines, with the required shift
determined by its control inputs (Sh0, Sh1, Sh2, Sh3). The control inputs also determine the direction of the
shift. For an n-bit input word, shifts of 0 to n-1 bit positions can be implemented.
4X4 Barrel Shifter
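
A behavioural sketch of the barrel shifter, written as a grid of switches grouped by shift amount, may make the staircase grouping easier to see. The function name, the one-hot list used for the control inputs and the choice of shift direction are assumptions made for illustration.

```python
def barrel_shift(inputs, sh):
    """Model an n x n barrel shifter.

    inputs: list of n bits on the input bus.
    sh:     one-hot list of n shift control lines (sh[0] = no shift).
    """
    n = len(inputs)
    if len(sh) != n or sum(sh) != 1:
        raise ValueError("exactly one shift control line must be active")
    outputs = [0] * n
    for i in range(n):        # input bus line
        for j in range(n):    # output bus line
            # The switch joining input i to output j belongs to the group
            # controlled by sh[(j - i) mod n], giving the staircase pattern.
            if sh[(j - i) % n]:
                outputs[j] = inputs[i]
    return outputs


print(barrel_shift([1, 0, 0, 0], [0, 1, 0, 0]))  # input bit 0 appears on output 1
```

The double loop visits n x n = 16 switch positions for the 4-bit case, all instances of one pass-transistor cell, and each control line closes exactly four of them.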
Regularity
Regularity should be as high as possible to minimize the design effort required for any system.
Regularity = (Total number of transistors on chip) / (Number of transistor circuits that must be designed in detail)
The denominator of the expression is greatly reduced if the whole chip, or large parts of it, can be fabricated
from a few standard cells.
E.g. for the 4X4 barrel shifter, Regularity = 16/1 = 16
Design of ALU Subsystem

The heart of the ALU is a 4-bit adder circuit, and it is this which we will actually design. Later it can be
adapted to subtract and to perform logical operations.
4-bit data path for processor
Design of a 4-bit Adder

For any column k there are three inputs: the corresponding bits of the input numbers, $A_k$ and $B_k$, and the
previous carry, or carry in, $C_{k-1}$. There are two outputs, the sum $S_k$ and a new carry $C_k$.
Truth table for column k of any adder
Inputs              Outputs
Ak   Bk   Ck-1      Sk   Ck
0    0    0         0    0
0    1    0         1    0
1    0    0         1    0
1    1    0         0    1
0    0    1         1    0
0    1    1         0    1
1    0    1         0    1
1    1    1         1    1
From the table,
Sum, $S_k = \bar{A}_k \cdot B_k \cdot \bar{C}_{k-1} + A_k \cdot \bar{B}_k \cdot \bar{C}_{k-1} + \bar{A}_k \cdot \bar{B}_k \cdot C_{k-1} + A_k \cdot B_k \cdot C_{k-1}$
Representing this in terms of the half adder sum
$H_k = \bar{A}_k \cdot B_k + A_k \cdot \bar{B}_k$
the sum expression can be rewritten as
$S_k = H_k \cdot \bar{C}_{k-1} + \bar{H}_k \cdot C_{k-1}$
and the new carry,
$C_k = A_k \cdot B_k \cdot \bar{C}_{k-1} + \bar{A}_k \cdot B_k \cdot C_{k-1} + A_k \cdot \bar{B}_k \cdot C_{k-1} + A_k \cdot B_k \cdot C_{k-1}$
can be expressed as
$C_k = A_k \cdot B_k + H_k \cdot C_{k-1}$
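
As a quick check that the half adder form is equivalent to the sum-of-products form read from the truth table, both can be evaluated over all eight input combinations. The sketch below is illustrative only; the function names are not from the notes, and complements are written as `1 - x`.

```python
from itertools import product


def minterm_form(a, b, c):
    """Sum and carry written directly from the rows of the truth table."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    s = (na & b & nc) | (a & nb & nc) | (na & nb & c) | (a & b & c)
    carry = (a & b & nc) | (na & b & c) | (a & nb & c) | (a & b & c)
    return s, carry


def half_adder_form(a, b, c):
    """Sum and carry written in terms of the half adder sum Hk."""
    h = ((1 - a) & b) | (a & (1 - b))
    s = (h & (1 - c)) | ((1 - h) & c)
    carry = (a & b) | (h & c)
    return s, carry


def forms_agree():
    """True if both forms give the same outputs for every input row."""
    return all(minterm_form(*row) == half_adder_form(*row)
               for row in product((0, 1), repeat=3))


print(forms_agree())
```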
Adder element requirements

The adder requirements may be stated as
If $A_k = B_k$ then $S_k = C_{k-1}$ else $S_k = \bar{C}_{k-1}$
and for the carry $C_k$,
If $A_k = B_k$ then $C_k = A_k = B_k$ else $C_k = C_{k-1}$
A standard Adder element

A 1-bit adder element may be represented as a block with inputs $A_k$, $B_k$ and $C_{k-1}$ and outputs $S_k$ and $C_k$.
n such elements are cascaded to form an n-bit adder.
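
The adder requirements and the cascading of elements can be sketched behaviourally as follows. The function names and the least-significant-bit-first list representation are assumptions made for this illustration.

```python
def adder_element(a, b, c_in):
    """One adder column, written directly from the stated requirements."""
    if a == b:
        return c_in, a          # sum = carry in, new carry = Ak (= Bk)
    return 1 - c_in, c_in       # sum = inverted carry in, carry passes through


def ripple_add(a_bits, b_bits, c_in=0):
    """Cascade one element per bit; bit lists are least significant bit first."""
    sums, carry = [], c_in
    for a, b in zip(a_bits, b_bits):
        s, carry = adder_element(a, b, carry)
        sums.append(s)
    return sums, carry


# 3 + 5 = 8 with 4-bit operands, LSB first.
print(ripple_add([1, 1, 0, 0], [1, 0, 1, 0]))
```

Each element needs the carry from the element below it, so in this arrangement the carry ripples through all n stages before the most significant sum bit is valid.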
Implementing ALU functions with Adder

The ALU must be able to add and subtract two binary numbers and to perform logical operations such as AND, OR,
Equality and EX-OR. Subtraction can be performed by taking the 2's complement of the number to be subtracted
and then performing an addition.
The adder equations are
Sum $S_k = H_k \cdot \bar{C}_{k-1} + \bar{H}_k \cdot C_{k-1}$
Carry $C_k = A_k \cdot B_k + H_k \cdot C_{k-1}$
where $H_k = \bar{A}_k \cdot B_k + A_k \cdot \bar{B}_k$
Consider the sum output.
If the previous carry is at logic 0, then $C_{k-1} = 0$ and $\bar{C}_{k-1} = 1$:
$S_k = H_k \cdot 1 + \bar{H}_k \cdot 0$
$S_k = H_k = \bar{A}_k \cdot B_k + A_k \cdot \bar{B}_k$ (EX-OR operation)
If the previous carry is at logic 1, then $C_{k-1} = 1$ and $\bar{C}_{k-1} = 0$:
$S_k = H_k \cdot 0 + \bar{H}_k \cdot 1$
$S_k = \bar{H}_k = A_k \cdot B_k + \bar{A}_k \cdot \bar{B}_k$ (EX-NOR operation)

Consider the carry output.
If the previous carry is at logic 0:
$C_k = A_k \cdot B_k + H_k \cdot 0$
$C_k = A_k \cdot B_k$ (AND operation)
If the previous carry is at logic 1:
$C_k = A_k \cdot B_k + H_k \cdot 1$
which simplifies to $C_k = A_k + B_k$ (OR operation)
So the adder element can be used for implementing both arithmetic and logical functions.
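
The four logical functions obtained by forcing the carry input can be tabulated with a small sketch. The helper names below are illustrative, not part of the notes.

```python
from itertools import product


def adder_element(a, b, c_in):
    """One adder column from the Hk-based sum and carry equations."""
    h = ((1 - a) & b) | (a & (1 - b))              # half adder sum Hk
    s = (h & (1 - c_in)) | ((1 - h) & c_in)        # sum Sk
    c = (a & b) | (h & c_in)                       # new carry Ck
    return s, c


def logic_functions(a, b):
    """Logical operations read off the adder with the carry in forced."""
    s0, c0 = adder_element(a, b, 0)   # carry in held at logic 0
    s1, c1 = adder_element(a, b, 1)   # carry in held at logic 1
    return {"XOR": s0, "AND": c0, "XNOR": s1, "OR": c1}


for a, b in product((0, 1), repeat=2):
    print(a, b, logic_functions(a, b))
```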
1-bit Adder Element
4-bit ALU
Further considerations of Adders

In order to broaden the scope of the discussion, consider commonly used alternative forms of the adder
equations. The equations used so far are
Sum $S_k = H_k \cdot \bar{C}_{k-1} + \bar{H}_k \cdot C_{k-1}$
Carry $C_k = A_k \cdot B_k + H_k \cdot C_{k-1}$
where $H_k = \bar{A}_k \cdot B_k + A_k \cdot \bar{B}_k$
The carry may be expressed in terms of the previous carry $C_{k-1}$ with a propagate signal $P_k$ and a generate
signal $G_k$:
$P_k\ (= H_k) = A_k \oplus B_k$
$G_k = A_k \cdot B_k$
New carry $C_k = P_k \cdot C_{k-1} + G_k$
Propagate / Generate Logic
Inputs              Outputs
Ak   Bk   Ck-1      Sk   Ck
0    0    0         0    0
0    1    0         1    0
1    0    0         1    0
1    1    0         0    1
0    0    1         1    0
0    1    1         0    1
1    0    1         0    1
1    1    1         1    1
The sum can also be expressed in terms of the carry in $C_{k-1}$ and the carry out $C_k$ together with $A_k$ and $B_k$:
$S_k = \bar{C}_k \cdot (A_k + B_k + C_{k-1}) + A_k \cdot B_k \cdot C_{k-1}$

Share Logic
Inputs              Outputs
Ak   Bk   Ck-1      Sk   Ck
0    0    0         0    0
0    1    0         1    0
1    0    0         1    0
1    1    0         0    1
0    0    1         1    0
0    1    1         0    1
1    0    1         0    1
1    1    1         1    1
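
Both alternative forms can be checked against the truth table with a short sketch: the carry is formed from the propagate and generate signals, and the sum then reuses that carry. The function name is illustrative.

```python
from itertools import product


def adder_shared_logic(a, b, c_in):
    """Carry from propagate/generate signals, sum from the shared-logic form."""
    p = a ^ b                       # propagate Pk (= Hk)
    g = a & b                       # generate Gk
    c_out = (p & c_in) | g          # Ck = Pk.Ck-1 + Gk
    # Sk = NOT(Ck).(Ak + Bk + Ck-1) + Ak.Bk.Ck-1, reusing the carry just formed
    s = ((1 - c_out) & (a | b | c_in)) | (a & b & c_in)
    return s, c_out


for a, b, c_in in product((0, 1), repeat=3):
    s, c_out = adder_shared_logic(a, b, c_in)
    print(a, b, c_in, s, c_out)
```

The printed rows should reproduce the truth table for column k, which is what allows the sum logic to share the carry circuit.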
Carry Look-ahead Adder

Here we try to obtain $C_k$ earlier than the time $k \cdot T_c$ needed for the carry to pass through k stages
(taking $T_c$ as the carry delay of one stage) by computing each $C_k$ separately using single-stage CMOS logic.
$C_k = A_k \cdot B_k + H_k \cdot C_{k-1}$
Noting that $H_k = \bar{A}_k \cdot B_k + A_k \cdot \bar{B}_k$, the expression can be rearranged into the form
$C_k = A_k \cdot B_k + (A_k + B_k) \cdot C_{k-1}$
Thus, allowing for an input carry $C_{in}$,
$C_0 = A_0 \cdot B_0 + (A_0 + B_0) \cdot C_{in}$
and
$C_1 = A_1 \cdot B_1 + (A_1 + B_1) \cdot C_0$
may be written as
$C_1 = A_1 \cdot B_1 + (A_1 + B_1) \cdot A_0 \cdot B_0 + (A_1 + B_1) \cdot (A_0 + B_0) \cdot C_{in}$
Similarly,
$C_2 = A_2 \cdot B_2 + (A_2 + B_2) \cdot A_1 \cdot B_1 + (A_2 + B_2) \cdot (A_1 + B_1) \cdot A_0 \cdot B_0 + (A_2 + B_2) \cdot (A_1 + B_1) \cdot (A_0 + B_0) \cdot C_{in}$
$C_3 = A_3 \cdot B_3 + (A_3 + B_3) \cdot A_2 \cdot B_2 + (A_3 + B_3) \cdot (A_2 + B_2) \cdot A_1 \cdot B_1 + (A_3 + B_3) \cdot (A_2 + B_2) \cdot (A_1 + B_1) \cdot A_0 \cdot B_0 + (A_3 + B_3) \cdot (A_2 + B_2) \cdot (A_1 + B_1) \cdot (A_0 + B_0) \cdot C_{in}$
Although these expressions become very lengthy as the bit significance increases, each expression is only three
logic levels deep, so the delay in forming the carry is constant irrespective of bit position.
The carry look-ahead expressions can be written in terms of the generate $G_k$ and propagate $P_k$ signals:
$C_0 = G_0 + P_0 \cdot C_{in}$
$C_1 = G_1 + P_1 \cdot G_0 + P_1 \cdot P_0 \cdot C_{in}$
$C_2 = G_2 + P_2 \cdot G_1 + P_2 \cdot P_1 \cdot G_0 + P_2 \cdot P_1 \cdot P_0 \cdot C_{in}$
$C_3 = G_3 + P_3 \cdot G_2 + P_3 \cdot P_2 \cdot G_1 + P_3 \cdot P_2 \cdot P_1 \cdot G_0 + P_3 \cdot P_2 \cdot P_1 \cdot P_0 \cdot C_{in}$
Further algebraic manipulation allows the expressions to be written as
$C_0 = G_0 + P_0 \cdot C_{in}$
$C_1 = G_1 + P_1 \cdot (G_0 + P_0 \cdot C_{in})$
$C_2 = G_2 + P_2 \cdot (G_1 + P_1 \cdot (G_0 + P_0 \cdot C_{in}))$
$C_3 = G_3 + P_3 \cdot (G_2 + P_2 \cdot (G_1 + P_1 \cdot (G_0 + P_0 \cdot C_{in})))$
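
The expanded look-ahead expressions can be evaluated directly from the inputs, without waiting for lower carries, as in the following sketch. The function name and the least-significant-bit-first lists are assumptions made for illustration; the propagate signal is taken as the EX-OR of the inputs.

```python
def lookahead_carries(a_bits, b_bits, c_in=0):
    """Compute every carry Ck from the expanded sum-of-products expression.

    Bit lists are least significant bit first. Each carry is formed only
    from Gk, Pk and Cin, never from a previously computed carry.
    """
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate signals Gk
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate signals Pk
    carries = []
    for k in range(len(g)):
        c = 0
        prefix = 1                   # running product Pk . Pk-1 ... Pj+1
        for j in range(k, -1, -1):
            c |= prefix & g[j]       # term Pk ... Pj+1 . Gj
            prefix &= p[j]
        c |= prefix & c_in           # final term Pk ... P0 . Cin
        carries.append(c)
    return carries


# 7 + 1: the carry generated at bit 0 propagates through bits 1 and 2.
print(lookahead_carries([1, 1, 1, 0], [1, 0, 0, 0]))
```

The inner loop mirrors the growing number of product terms in C0 to C3. In hardware these terms are evaluated in parallel, which is where both the speed and the extra area come from.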
Can the logic for C0, C1, C2 and C3 be reused within the same circuit?

- The internal node points of the circuit look like C0, C1 and C2, but they are not static and may be in a
floating condition, hence separate logic should be made for each carry.
Carry look-ahead adder compared to ripple-carry adder

- Faster, but delay is still linear with respect to the number of bits
- Larger area, for P and G generation
- A carry generation circuit is needed at each bit position
- No reuse of logic is possible
Manchester Carry Chain

It tries to reuse logic by precharging each carry position.
Manchester Carry Chain Adder
Compared to the carry look-ahead adder, the Manchester carry chain:

- Consumes less area, since logic is reused for the intermediate carry signals
- Can be of any length, but since propagation is in series a buffer is needed every 4 bits
- Is good for up to 16 bits
- Is slowed down if the sum is computed using the carry chain
Manchester Carry Chain Element
Carry Skip Adder

In a ripple-carry adder, if the input bits A and B differ at every bit position, the input carry is propagated
at all bit positions and never generated. The addition is then only completed after the carry has propagated
along the entire adder.
Carry skip adders take advantage of both generation and propagation of the carry signal. The adder is divided
into blocks, and a special circuit detects the condition in which the A and B bits differ at all bit positions
in a block. Its output is called the block propagation signal. If it is 1, the carry signal entering the block
can bypass it and be transmitted through a multiplexer to the next block.
- A carry skip adder is faster than a ripple adder, but its delay is still linear
- It requires more hardware, i.e. more area
- It is not worthwhile for small adders (N < 8)
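
A behavioural sketch of a carry skip adder is given below. It models only the logic, not the timing; the block size of 4, the function names and the least-significant-bit-first lists are assumptions made for illustration.

```python
def ripple_block(a_bits, b_bits, c_in):
    """Ripple-carry addition of one block; bit lists are LSB first."""
    sums, carry = [], c_in
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return sums, carry


def carry_skip_add(a_bits, b_bits, c_in=0, block=4):
    """Add two words block by block with a skip multiplexer on each carry."""
    sums, carry = [], c_in
    for i in range(0, len(a_bits), block):
        a_blk, b_blk = a_bits[i:i + block], b_bits[i:i + block]
        blk_sums, ripple_carry = ripple_block(a_blk, b_blk, carry)
        # Block propagation signal: A and B differ at every position in the block.
        block_propagate = all(a ^ b for a, b in zip(a_blk, b_blk))
        # Multiplexer: bypass the block with the incoming carry when it propagates.
        carry = carry if block_propagate else ripple_carry
        sums.extend(blk_sums)
    return sums, carry
```

When the block propagation signal is 1, the rippled carry out equals the incoming carry anyway, so the multiplexer changes only when the carry becomes available, not its value.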
Carry Select Adder

It is also referred to as a conditional sum adder. The adder is divided into blocks, and each block is composed
of two adders, one with a logical 0 carry in and the other with a logical 1 carry in. The sum and carry out
generated are then selected by the actual carry in, which comes from the carry out of the previous block.
- Cost: mainly the area overhead of the additional carry path and multiplexer
- Delay: sub-linear, which can still be beaten
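
The selection scheme can be sketched behaviourally as follows. As before, this models the logic only; the block size of 4, the function names and the least-significant-bit-first lists are illustrative assumptions.

```python
def ripple_block(a_bits, b_bits, c_in):
    """Ripple-carry addition of one block; bit lists are LSB first."""
    sums, carry = [], c_in
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return sums, carry


def carry_select_add(a_bits, b_bits, c_in=0, block=4):
    """Each block precomputes both results; the real carry in selects one."""
    sums, carry = [], c_in
    for i in range(0, len(a_bits), block):
        a_blk, b_blk = a_bits[i:i + block], b_bits[i:i + block]
        sums0, carry0 = ripple_block(a_blk, b_blk, 0)   # adder with carry in = 0
        sums1, carry1 = ripple_block(a_blk, b_blk, 1)   # adder with carry in = 1
        # Multiplexer driven by the carry out of the previous block.
        if carry:
            sums.extend(sums1)
            carry = carry1
        else:
            sums.extend(sums0)
            carry = carry0
    return sums, carry
```

In hardware the two adders of every block work in parallel, so once the actual carry arrives only the multiplexer delay is added per block. The duplicated adder is the area overhead noted above.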
