
SCALING AND SUBSYSTEM DESIGN PROCESS

2019 Module 3
Taranath H B

CMOS subsystem design: Architectural issues. Switch logic. Gate logic. Design examples: Combinational logic. Clocked circuits. Other considerations.

SCALING OF MOS CIRCUITS
VLSI fabrication technology is still evolving, leading to smaller line widths and feature sizes and to higher packing density of circuitry on a chip. Scaling down the feature size generally improves performance, so it is important to understand the effects of scaling.
Microelectronic technology may be characterized using the following figures of merit:
- Minimum feature size
- Number of gates on one chip
- Power dissipation
- Maximum operational frequency
- Die size
- Production cost
Many of these figures of merit (FOM) can be improved by shrinking the dimensions of transistors and interconnections.

Scaling models and scaling factors

Scaling is the proportional adjustment of the dimensions of an electronic device while maintaining its electrical properties, which results in a device either larger or smaller than the unscaled device.
The most commonly used models are the constant electric field scaling model and the constant voltage scaling model. In the ideal model, dimensions and voltage scale together by the same scale factor.
Device scaling is modeled in terms of two generic scaling factors, $1/\alpha$ and $1/\beta$:
- $1/\alpha$: scaling factor for all linear dimensions, both horizontal and vertical.
- $1/\beta$: scaling factor for the supply voltage $V_{DD}$ and the gate oxide thickness $D$.

Gate Area $A_g$
$$A_g = L \cdot W$$
where $L$ is the channel length and $W$ is the channel width, both scaled by $1/\alpha$. Thus $A_g$ is scaled by $1/\alpha^2$.

Gate Capacitance per unit area $C_o$ or $C_{ox}$
$$C_{ox} = \frac{\varepsilon_{ox}}{D}$$
where $\varepsilon_{ox}$ is the permittivity of the gate oxide and $D$ is the gate oxide thickness, which is scaled by $1/\beta$. Thus $C_{ox}$ is scaled by $\frac{1}{1/\beta} = \beta$.

Gate Capacitance $C_g$
$$C_g = C_o \cdot L \cdot W$$
Thus $C_g$ is scaled by $\beta \cdot \frac{1}{\alpha^2} = \frac{\beta}{\alpha^2}$.

Parasitic Capacitance $C_x$
$$C_x \propto \frac{A_x}{d}$$
where $d$ is the depletion width around the source or drain, which is scaled by $1/\alpha$, and $A_x$ is the area of the depletion region around the source and drain, which is scaled by $1/\alpha^2$.
Thus $C_x$ is scaled by $\frac{1}{\alpha^2} \cdot \frac{1}{1/\alpha} = \frac{1}{\alpha}$.

Carrier Density in channel $Q_{on}$
$$Q_{on} = C_o \cdot V_{gs}$$
where $Q_{on}$ is the average charge per unit area in the channel in the 'on' state. Note that $C_o$ is scaled by $\beta$ and $V_{gs}$ is scaled by $1/\beta$. Thus $Q_{on}$ is scaled by 1.

Channel resistance $R_{on}$
$$R_{on} = \frac{L}{W} \cdot \frac{1}{Q_{on}\,\mu}$$
where $\mu$ is the carrier mobility in the channel and is assumed constant.
Thus $R_{on}$ is scaled by $\frac{1/\alpha}{1/\alpha} \cdot \frac{1}{1} = 1$.

Gate Delay $T_d$
$$T_d \propto R_{on} \cdot C_g$$
Thus $T_d$ is scaled by $1 \cdot \frac{\beta}{\alpha^2} = \frac{\beta}{\alpha^2}$.

Maximum Operating Frequency $f_o$
$$f_o = \frac{W}{L} \cdot \frac{\mu\, C_o\, V_{DD}}{C_g}$$
Equivalently, $f_o$ is inversely proportional to the delay $T_d$. Thus $f_o$ is scaled by $\frac{\alpha^2}{\beta}$.

Saturation current $I_{dss}$
$$I_{dss} = \frac{\mu\, C_o}{2} \cdot \frac{W}{L} \cdot (V_{gs} - V_t)^2$$
Noting that both $V_{gs}$ and $V_t$ are scaled by $1/\beta$, $I_{dss}$ is scaled by $\beta \cdot \frac{1}{\beta^2} = \frac{1}{\beta}$.

Current Density $J$
$$J = \frac{I_{dss}}{A}$$
where $A$ is the cross-sectional area of the channel in the 'on' state, which is scaled by $1/\alpha^2$. So $J$ is scaled by $\frac{1/\beta}{1/\alpha^2} = \frac{\alpha^2}{\beta}$.

Switching Energy per Gate $E_g$
$$E_g = \frac{1}{2}\, C_g\, V_{DD}^2$$
So $E_g$ is scaled by $\frac{\beta}{\alpha^2} \cdot \frac{1}{\beta^2} = \frac{1}{\alpha^2 \beta}$.

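Power Dissipation per Gate $P_g$

$P_g$ comprises two components such that
$$P_g = P_{gs} + P_{gd}$$
where the static component is
$$P_{gs} = \frac{V_{DD}^2}{R_{on}}$$
and the dynamic component is
$$P_{gd} = E_g\, f_o$$
$P_{gs}$ is scaled by $\frac{1/\beta^2}{1} = \frac{1}{\beta^2}$ and $P_{gd}$ is scaled by $\frac{1}{\alpha^2 \beta} \cdot \frac{\alpha^2}{\beta} = \frac{1}{\beta^2}$. So $P_g$ is scaled by $\frac{1}{\beta^2}$.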

Power Dissipation per unit area $P_a$
$$P_a = \frac{P_g}{A_g}$$
So $P_a$ is scaled by $\frac{1/\beta^2}{1/\alpha^2} = \frac{\alpha^2}{\beta^2}$.

Power-Speed Product $P_T$
$$P_T = P_g \cdot T_d$$
So $P_T$ is scaled by $\frac{1}{\beta^2} \cdot \frac{\beta}{\alpha^2} = \frac{1}{\alpha^2 \beta}$.
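
The scaling factors derived above can be cross-checked numerically. The following is a minimal sketch, not part of the original notes: the function name `scaling_factors` and the dictionary keys are illustrative, and the choices $\beta = \alpha$ for constant electric field scaling and $\beta = 1$ for constant voltage scaling are the usual textbook assignments, stated here as assumptions.

```python
from fractions import Fraction

def scaling_factors(alpha, beta):
    """Derive each scaling factor from the primitive 1/alpha and 1/beta factors."""
    a, b = Fraction(alpha), Fraction(beta)
    L = W = d = 1 / a            # channel length, channel width, depletion width
    D = V = 1 / b                # gate oxide thickness; VDD, Vgs and Vt
    Ag = L * W                   # gate area
    Cox = 1 / D                  # gate capacitance per unit area
    Cg = Cox * L * W             # gate capacitance
    Cx = (1 / a**2) / d          # parasitic capacitance, Ax / d
    Qon = Cox * V                # carrier density in channel
    Ron = (L / W) / Qon          # channel resistance (mobility constant)
    Td = Ron * Cg                # gate delay
    fo = 1 / Td                  # maximum operating frequency
    Idss = Cox * (W / L) * V**2  # saturation current
    J = Idss / (1 / a**2)        # current density
    Eg = Cg * V**2               # switching energy per gate
    Pgs = V**2 / Ron             # static power per gate
    Pgd = Eg * fo                # dynamic power per gate
    Pa = Pgs / Ag                # power per unit area
    PT = Pgs * Td                # power-speed product
    return {"Ag": Ag, "Cox": Cox, "Cg": Cg, "Cx": Cx, "Qon": Qon, "Ron": Ron,
            "Td": Td, "fo": fo, "Idss": Idss, "J": J, "Eg": Eg,
            "Pgs": Pgs, "Pgd": Pgd, "Pa": Pa, "PT": PT}

# Assumed model parameters: constant field uses beta = alpha,
# constant voltage uses beta = 1.
constant_field = scaling_factors(2, 2)
constant_voltage = scaling_factors(2, 1)
for name in ("Td", "Pgs", "Pa"):
    print(name, constant_field[name], constant_voltage[name])
```

Exact fractions are used so that each result can be compared directly with the closed-form factors, for example $\beta/\alpha^2$ for $T_d$.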


SUBSYSTEM DESIGN PROCESS


Q: What is in it for me? Is it going to be worthwhile investing the time to learn?
A little time is required to learn the fundamentals of VLSI design, which the Mead and Conway methodology brought within the scope of the ordinary electronics engineer.
VLSI design technology offers better ways of tackling some problems by making it possible to design and realize systems that are large and complex. It also provides an understanding of IC technology. Some general considerations are:
1. Lower Unit cost – compared to other approaches to system requirements
2. Higher reliability – High levels of system integration will reduce the interconnections – which reduces
delay and power loss.
3. Lower power dissipation, lower weight and lower volume compared with other approaches
4. Better performance – particularly in terms of speed power product
5. Enhanced repeatability – fewer process controls for whole system since it is realized using same
design/unit
6. The possibility of reduced design/development periods if suitable design procedures and design aids
are available
Some problems associated with VLSI design are:
1. How to design large, complex systems in a reasonable time and with reasonable effort
2. The nature of the architecture best suited to take full advantage of VLSI and the technology
3. The testability of large, complex systems once implemented in silicon
Probable solutions to these problems are:
- Approach the design in a top-down manner with adequate CAD tools. Partition the system sensibly, aiming for simple interconnection between subsystems and high regularity.
- Allocate a significant proportion (e.g. 30%) of the total chip area to test and diagnostic facilities.
- When choosing the architecture, give communication the highest priority, since interconnections can take up as much as 40-50% of the chip area.
To represent a design, several approaches may be used at different stages of the design process, such as:
- Conventional circuit symbols
- Logic symbols
- Stick diagrams
- Mixture of logic symbols and stick diagrams
- Mask layouts
- Architectural block diagrams
- Floor Plans


The General Arrangement of a 4-bit Arithmetic Processor


Let us choose a 4-bit microprocessor as a design example, because it is suitable for illustrating the design and interconnection of common architectural blocks. At this stage we consider the design of the data path only.

Basic Digital Processor Structure


The data path comprises a unit which processes data applied at one port and presents its output at a second port. The two ports may be combined as a single bidirectional port if storage facilities exist in the data path.

Communication strategy for data path


Decomposing the data path into a block diagram shows the main subunits. At this stage it is also useful to anticipate a possible floor plan of the chip and its mask layout.

Subunits and basic interconnections for data path


Next, a decision must be made about the nature of the bus architecture linking the subunits. The choices are a one-bus, two-bus or three-bus architecture.


One-bus Architecture

Sequence:
1. The first operand is moved from the registers to the ALU, where it is stored.
2. The second operand is moved from the registers to the ALU; the computation is done and the result is stored in the ALU.
3. The result is passed through the shifter to the registers.
Two-bus Architecture

Sequence:
1. The two operands (A and B) are sent from the registers to the ALU and operated upon, and the result (S) is stored in the ALU.
2. The result is passed through the shifter and stored in the registers.

Three-bus Architecture


Sequence:
The two operands (A and B) are sent from the registers and operated upon, and the shifted result (S) is returned to another register, all in the same clock period.
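
The three sequences can be compared with a small behavioural model. This is only a sketch under stated assumptions: the function `execute`, the register-list representation and the use of addition as the ALU operation are hypothetical, the shifter is modelled as a zero-bit shift, and the returned count is the number of transfer steps listed in each sequence rather than a measured clock count.

```python
def execute(n_buses, regs, a, b, dst):
    """Perform regs[dst] = (regs[a] + regs[b]) mod 16 and return the steps used."""
    if n_buses == 1:
        alu_a = regs[a]                      # step 1: first operand to ALU, stored
        alu_s = (alu_a + regs[b]) & 0xF      # step 2: second operand, compute, store
        regs[dst] = alu_s                    # step 3: result via shifter to registers
        return 3
    if n_buses == 2:
        alu_s = (regs[a] + regs[b]) & 0xF    # step 1: both operands sent, result stored
        regs[dst] = alu_s                    # step 2: result via shifter to registers
        return 2
    if n_buses == 3:
        regs[dst] = (regs[a] + regs[b]) & 0xF  # operands out and result back together
        return 1
    raise ValueError("n_buses must be 1, 2 or 3")

for n in (1, 2, 3):
    regs = [3, 5, 0, 0]
    steps = execute(n, regs, 0, 1, 2)
    print(n, "bus(es):", steps, "step(s), result", regs[2])
```

The model makes the trade-off explicit: more buses mean fewer steps per operation, at the cost of more interconnection.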

The proposed processor comprises a register array in which 4-bit numbers can be stored, either from the input/output port or from the output of the ALU via a shifter.
Data connections between the I/O port, ALU and shifter must be in the form of 4-bit buses, and each block must be suitably connected to control lines so that its function may be defined for the whole range of possible operations.

Tentative Floor Plan for 4-Bit Datapath


The above floor plan indicates a possible relative disposition of the blocks and a sensible, acceptable interconnection strategy.


The Design of a 4-Bit Shifter


Any general-purpose n-bit shifter should be able to shift incoming data by up to n-1 places in either the right-shift or the left-shift direction. If all shifts are made on an 'end-around' basis, so that any bit shifted out at one end of the word is shifted in at the other end, the problem of right shift versus left shift is greatly eased.
Consider for 4-bit word,

1-bit shift right ≃ 3-bit shift left

2-bit shift right ≃ 2-bit shift left


The shifter must have:
- Input from a four-line parallel data bus
- Four output lines for the shifted data
- A means of transferring data from input to output with any shift from zero to three bits
To meet these requirements, the design should take best advantage of the available technology, such as the switch-like MOS pass transistor or transmission gate.
A solution which meets these requirements emerges from switch-contact-based switching networks: the crossbar switch.

4X4 Crossbar Switch


An adaptation of this arrangement recognizes that the switch gates can be coupled together in groups of four, forming four separate groups corresponding to shifts of zero, one, two and three bits.
The inter-bus switches have their gate inputs connected in a staircase fashion in groups of four, and these groups form the four shift control inputs.
The barrel shifter connects the input lines representing a word to a group of output lines, with the required shift determined by its control inputs (Sh0, Sh1, Sh2, Sh3). The control inputs also determine the direction of the shift.
For an n-bit input word, shifts of 0 to n-1 bit positions can be implemented.
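
The behaviour of the barrel shifter can be sketched in a few lines. This is an illustrative model only: the names `barrel_shift` and `shift_other_way` are hypothetical, the word is held as a list of bits, and which index direction counts as 'left' or 'right' is an assumption. Each active control line Sh_k stands for one staircase group of pass transistors connecting input line i to output line (i + k) mod n.

```python
def barrel_shift(word, sh):
    """Route a word through a barrel shifter with one-hot control lines.

    word: list of n bits.
    sh:   one-hot list [Sh0, Sh1, ...]; Sh_k selects an end-around shift of k.
    """
    n = len(word)
    if len(sh) != n or sum(sh) != 1:
        raise ValueError("exactly one shift control line must be active")
    k = sh.index(1)
    out = [0] * n
    for i in range(n):
        # pass transistor between input line i and output line (i + k) mod n
        out[(i + k) % n] = word[i]
    return out

def shift_other_way(word, k):
    """End-around shift by k positions in the opposite direction."""
    n = len(word)
    return [word[(i + k) % n] for i in range(n)]

word = [1, 0, 1, 1]
print(barrel_shift(word, [0, 1, 0, 0]))   # shift by one position
print(shift_other_way(word, 3))           # same result: three positions the other way
```

With end-around shifting, a shift of k in one direction equals a shift of n - k in the other, which is why a single set of control lines covers both directions.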

4X4 Barrel Shifter

Regularity
It is a qualitative parameter and should be as high as possible to minimize the design effort required for any system.
$$\text{Regularity} = \frac{\text{Total number of transistors on chip}}{\text{Number of transistor circuits that must be designed in detail}}$$
The denominator of the expression will obviously be greatly reduced if the whole chip, or large parts of it, can be fabricated from a few standard cells.
E.g. for the 4X4 barrel shifter, Regularity $= \frac{16}{1} = 16$.


Design of ALU Subsystem


The heart of the ALU is a 4-bit adder circuit, and it is this which we will actually design. Later it can be adapted to subtract and to perform logical operations.

4-bit data path for processor

Design of a 4-bit Adder


For any column k there are three inputs: the corresponding bits of the input numbers, $A_k$ and $B_k$, and the previous carry, or carry-in, $C_{k-1}$. There are two outputs: the sum $S_k$ and a new carry $C_k$.
Truth table for column k of any adder
Inputs            | Outputs
Ak   Bk   Ck-1    | Sk   Ck
0    0    0       | 0    0
0    1    0       | 1    0
1    0    0       | 1    0
1    1    0       | 0    1
0    0    1       | 1    0
0    1    1       | 0    1
1    0    1       | 0    1
1    1    1       | 1    1
From the table, the sum is
$$S_k = \overline{A_k}\, B_k\, \overline{C_{k-1}} + A_k\, \overline{B_k}\, \overline{C_{k-1}} + \overline{A_k}\, \overline{B_k}\, C_{k-1} + A_k\, B_k\, C_{k-1}$$


Representing this in terms of the half adder sum
$$H_k = \overline{A_k}\, B_k + A_k\, \overline{B_k}$$
the sum expression can be rewritten as
$$S_k = H_k\, \overline{C_{k-1}} + \overline{H_k}\, C_{k-1}$$
and the new carry is
$$C_k = A_k\, B_k\, \overline{C_{k-1}} + \overline{A_k}\, B_k\, C_{k-1} + A_k\, \overline{B_k}\, C_{k-1} + A_k\, B_k\, C_{k-1}$$
which can be expressed as
$$C_k = A_k\, B_k + H_k\, C_{k-1}$$
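
The half-adder form of the equations can be checked exhaustively against binary addition. The sketch below is illustrative; the function name `adder_column` is hypothetical, and complements are written as `1 - x` on 0/1 integers.

```python
from itertools import product

def adder_column(a, b, c_in):
    """Sum and carry for one adder column, written in terms of H_k."""
    h = ((1 - a) & b) | (a & (1 - b))          # H_k = A'B + AB'
    s = (h & (1 - c_in)) | ((1 - h) & c_in)    # S_k = H C' + H' C
    c = (a & b) | (h & c_in)                   # C_k = AB + H C
    return s, c

# Check every row of the truth table: 2*C_k + S_k must equal A_k + B_k + C_{k-1}.
for a, b, c_in in product((0, 1), repeat=3):
    s, c = adder_column(a, b, c_in)
    assert 2 * c + s == a + b + c_in
    print(a, b, c_in, "->", s, c)
```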

Adder element requirements


The adder requirements may be stated as:

If $A_k = B_k$ then $S_k = C_{k-1}$, else $S_k = \overline{C_{k-1}}$.

And for the carry $C_k$:

If $A_k = B_k$ then $C_k = A_k = B_k$, else $C_k = C_{k-1}$.

A standard Adder element


A 1-bit adder element may be represented as a standard block, and n such elements are cascaded to form an n-bit adder.
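
The 'if equal, else' statement of the adder requirements maps directly onto code, and cascading the element gives a ripple-carry adder. This is a minimal sketch with hypothetical names (`adder_element`, `ripple_add`, `to_bits`, `from_bits`); bit lists are assumed to be least significant bit first.

```python
def adder_element(a, b, c_in):
    """One adder element, following the stated requirements."""
    if a == b:
        return c_in, a            # S_k = C_{k-1}, C_k = A_k = B_k
    return 1 - c_in, c_in         # S_k = NOT C_{k-1}, C_k = C_{k-1}

def ripple_add(a_bits, b_bits, c_in=0):
    """Cascade n adder elements; bit lists are LSB first."""
    s_bits, carry = [], c_in
    for a, b in zip(a_bits, b_bits):
        s, carry = adder_element(a, b, carry)
        s_bits.append(s)
    return s_bits, carry

def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

s_bits, c_out = ripple_add(to_bits(7, 4), to_bits(5, 4))
print(from_bits(s_bits), c_out)   # 7 + 5 = 12, no carry out of 4 bits
```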

Implementing ALU functions with Adder


The ALU must be able to add and subtract two binary numbers and to perform logical operations such as AND, OR, equality (EX-NOR) and EX-OR. Subtraction can be performed by taking the 2's complement of the number to be subtracted and then performing an addition.
The adder equations are
$$S_k = H_k\, \overline{C_{k-1}} + \overline{H_k}\, C_{k-1}$$
$$C_k = A_k\, B_k + H_k\, C_{k-1}$$
where $H_k = \overline{A_k}\, B_k + A_k\, \overline{B_k}$.
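Consider the sum output.
If the previous carry is at logic 0, then $C_{k-1} = 0$ and $\overline{C_{k-1}} = 1$, so
$$S_k = H_k \cdot 1 + \overline{H_k} \cdot 0$$
$$S_k = H_k = \overline{A_k}\, B_k + A_k\, \overline{B_k} \quad \text{(EX-OR operation)}$$
If the previous carry is at logic 1, then $C_{k-1} = 1$ and $\overline{C_{k-1}} = 0$, so
$$S_k = H_k \cdot 0 + \overline{H_k} \cdot 1$$
$$S_k = \overline{H_k} = A_k\, B_k + \overline{A_k}\, \overline{B_k} \quad \text{(EX-NOR operation)}$$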


Now consider the carry output.
If the previous carry is at logic 0, then $C_{k-1} = 0$, so
$$C_k = A_k\, B_k + H_k \cdot 0 = A_k\, B_k \quad \text{(AND operation)}$$
If the previous carry is at logic 1, then $C_{k-1} = 1$, so
$$C_k = A_k\, B_k + H_k \cdot 1 = A_k\, B_k + \overline{A_k}\, B_k + A_k\, \overline{B_k} = A_k + B_k \quad \text{(OR operation)}$$
So, the adder element can be used to implement both arithmetic and logical functions.
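
The four logical operations obtained by forcing the carry-in can be confirmed with a short check. This is a sketch only; the names `adder_element` and `logic_ops` are hypothetical and the element is modelled purely at the Boolean level.

```python
def adder_element(a, b, c_in):
    """Adder element using S = H C' + H' C and C = AB + H C."""
    h = a ^ b
    s = (h & (1 - c_in)) | ((1 - h) & c_in)
    c = (a & b) | (h & c_in)
    return s, c

def logic_ops(a, b):
    """Logical functions available from one element by forcing the carry-in."""
    xor_out, and_out = adder_element(a, b, 0)    # carry-in held at logic 0
    xnor_out, or_out = adder_element(a, b, 1)    # carry-in held at logic 1
    return {"XOR": xor_out, "AND": and_out, "XNOR": xnor_out, "OR": or_out}

for a in (0, 1):
    for b in (0, 1):
        print(a, b, logic_ops(a, b))
```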


1-bit Adder Element

4-bit ALU

Further considerations of Adders


In order to broaden the scope of the discussion, consider some commonly used alternative forms of the adder equations
$$S_k = H_k\, \overline{C_{k-1}} + \overline{H_k}\, C_{k-1}$$
$$C_k = A_k\, B_k + H_k\, C_{k-1}$$
where $H_k = \overline{A_k}\, B_k + A_k\, \overline{B_k}$.
The carry can be expressed in terms of the previous carry $C_{k-1}$ with a propagate signal $P_k$ and a generate signal $G_k$:
$$P_k = H_k = A_k \oplus B_k$$
$$G_k = A_k\, B_k$$
$$C_k = P_k\, C_{k-1} + G_k$$
Propagate / Generate Logic

The sum can also be expressed in terms of the carry-in $C_{k-1}$ and carry-out $C_k$ signals together with $A_k$ and $B_k$:
$$S_k = \overline{C_k}\, (A_k + B_k + C_{k-1}) + A_k\, B_k\, C_{k-1}$$
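
Both alternative forms can be checked against ordinary addition. The sketch below uses hypothetical function names (`adder_pg`, `adder_shared`); in the propagate/generate version the sum is taken as $P_k \oplus C_{k-1}$, which follows from the sum equation with $H_k = P_k$.

```python
from itertools import product

def adder_pg(a, b, c_in):
    """Propagate/generate form: C = P C_in + G, S = P xor C_in."""
    p = a ^ b
    g = a & b
    return p ^ c_in, (p & c_in) | g

def adder_shared(a, b, c_in):
    """Shared-logic form: the sum reuses the carry-out signal."""
    c_out = (a & b) | ((a ^ b) & c_in)
    s = ((1 - c_out) & (a | b | c_in)) | (a & b & c_in)
    return s, c_out

for a, b, c_in in product((0, 1), repeat=3):
    assert adder_pg(a, b, c_in) == adder_shared(a, b, c_in)
    s, c_out = adder_shared(a, b, c_in)
    assert 2 * c_out + s == a + b + c_in
print("both forms agree with binary addition")
```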


Share Logic


Carry – look ahead Adder


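Here we try to obtain $C_k$ earlier than the ripple delay of $k \cdot T_c$ (taking $T_c$ as the carry delay of one stage): instead of passing the carry through k stages, $C_k$ is computed separately using one-stage CMOS logic.
$$C_k = A_k\, B_k + H_k\, C_{k-1}$$
Noting that $H_k = \overline{A_k}\, B_k + A_k\, \overline{B_k}$, the expression can be rearranged into the form
$$C_k = A_k\, B_k + (A_k + B_k)\, C_{k-1}$$
Thus,
$$C_0 = A_0 B_0 + (A_0 + B_0)\, C_{in}$$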


This allows for an input carry. Next,
$$C_1 = A_1 B_1 + (A_1 + B_1)\, C_0$$
may be written as
$$C_1 = A_1 B_1 + (A_1 + B_1) A_0 B_0 + (A_1 + B_1)(A_0 + B_0)\, C_{in}$$
Similarly,
$$C_2 = A_2 B_2 + (A_2 + B_2) A_1 B_1 + (A_2 + B_2)(A_1 + B_1) A_0 B_0 + (A_2 + B_2)(A_1 + B_1)(A_0 + B_0)\, C_{in}$$
$$C_3 = A_3 B_3 + (A_3 + B_3) A_2 B_2 + (A_3 + B_3)(A_2 + B_2) A_1 B_1 + (A_3 + B_3)(A_2 + B_2)(A_1 + B_1) A_0 B_0 + (A_3 + B_3)(A_2 + B_2)(A_1 + B_1)(A_0 + B_0)\, C_{in}$$
Although these expressions become very lengthy as the bit significance increases, each expression is only three logic levels deep, so the delay in forming the carry is constant irrespective of bit position.
We can write the carry look-ahead expressions in terms of the generate $G_k$ and propagate $P_k$ signals:
$$C_0 = G_0 + P_0\, C_{in}$$
$$C_1 = G_1 + P_1 G_0 + P_1 P_0\, C_{in}$$
$$C_2 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0\, C_{in}$$
$$C_3 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0\, C_{in}$$
Further algebraic manipulation allows the expressions to be written as
$$C_0 = G_0 + P_0\, C_{in}$$
$$C_1 = G_1 + P_1 (G_0 + P_0\, C_{in})$$
$$C_2 = G_2 + P_2 (G_1 + P_1 (G_0 + P_0\, C_{in}))$$
$$C_3 = G_3 + P_3 (G_2 + P_2 (G_1 + P_1 (G_0 + P_0\, C_{in})))$$
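
The flat look-ahead expressions can be evaluated directly and compared with a rippled carry. This is an illustrative sketch: the function names are hypothetical, bit lists are assumed LSB first, and $P_k$ is taken as $A_k \oplus B_k$ as defined earlier (the OR form $A_k + B_k$ gives the same carries).

```python
def all_of(bits):
    """AND of a list of bits."""
    return int(all(bits))

def lookahead_carries(a_bits, b_bits, c_in):
    """Compute every C_k directly from G, P and C_in as a flat sum of products."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    carries = []
    for k in range(len(g)):
        terms = [g[k]]                                     # G_k
        for j in range(k - 1, -1, -1):
            terms.append(g[j] & all_of(p[j + 1:k + 1]))    # P_k ... P_{j+1} G_j
        terms.append(c_in & all_of(p[:k + 1]))             # P_k ... P_0 C_in
        carries.append(int(any(terms)))
    return carries

def ripple_carries(a_bits, b_bits, c_in):
    """Reference: carries produced one stage after another."""
    carries, c = [], c_in
    for a, b in zip(a_bits, b_bits):
        c = (a & b) | ((a ^ b) & c)
        carries.append(c)
    return carries

def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]

print(lookahead_carries(to_bits(7, 4), to_bits(5, 4), 0))
print(ripple_carries(to_bits(7, 4), to_bits(5, 4), 0))
```

In the look-ahead version no carry depends on another computed carry, which is the point of the scheme; the price is the growing number of product terms per bit position.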


Can the logic for C0, C1, C2 and C3 be shared in the same circuit?

- The internal node points look like C0, C1 and C2, but they are not static and may be left floating, so separate logic should be built for each carry.


Carry look-ahead adder compared to ripple-carry adder


- Faster but delay still linear with respect to number of bits
- Larger area – P and G Generation
- Carry Generation circuit – each bit position
- No reuse of logic possible

Manchester Carry Chain


It tries to reuse logic by precharging each carry position.

Manchester Carry Chain Adder


Compared to the carry look-ahead adder:

- Consumes less area, since logic is reused for the intermediate carry signals
- The carry chain can be of any length, but because propagation is in series, a buffer is used every 4 bits
- Good for up to 16 bits
- Computing the sum using the carry chain will slow down the Manchester carry chain (MCC)

Manchester Carry Chain Element

Carry Skip Adder


In a ripple carry adder, if the input bits A and B differ in every bit position, the input carry is propagated at all bit positions and never generated. The addition is thus only completed after the carry has propagated along the entire adder.
Carry skip adders take advantage of both generation and propagation of the carry signal. A special circuit is used to detect the condition in which the A and B bits differ in all bit positions of a block.
Its output is called the block propagation signal. If it is 1, the carry signal entering the block can bypass it and be transmitted through a multiplexer to the next block.


- The carry skip adder is faster than the ripple adder, but computation time is still linear
- It requires more hardware, i.e. more area
- It is not worthwhile for small adders (N < 8)
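
The block propagation signal and the bypass multiplexer described above can be modelled functionally. This is a sketch under assumptions: the names `carry_skip_add`, `to_bits` and `from_bits` are hypothetical, the block size of 4 is an arbitrary choice, and the model shows only the logic, not the timing advantage.

```python
def carry_skip_add(a_bits, b_bits, c_in=0, block=4):
    """Functional model of a carry skip adder; bit lists are LSB first."""
    s_bits, carry = [], c_in
    for start in range(0, len(a_bits), block):
        a_blk = a_bits[start:start + block]
        b_blk = b_bits[start:start + block]
        block_in = carry
        # block propagation signal: A and B differ in every position of the block
        block_p = all(a != b for a, b in zip(a_blk, b_blk))
        ripple = block_in
        for a, b in zip(a_blk, b_blk):
            s_bits.append(a ^ b ^ ripple)
            ripple = (a & b) | ((a ^ b) & ripple)
        # multiplexer: bypass the block when it only propagates
        carry = block_in if block_p else ripple
    return s_bits, carry

def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

s_bits, c_out = carry_skip_add(to_bits(0b10100101, 8), to_bits(0b01011010, 8), 1)
print(from_bits(s_bits), c_out)   # every bit differs, so the carry skips both blocks
```

When the block propagation signal is 1 the rippled carry equals the block's carry-in anyway, so the multiplexer changes only when the carry becomes available, not its value.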

Carry Select Adder


It is also referred to as a conditional sum adder. The adder is divided into blocks, and each block is composed of two adders, one with a logical 0 carry-in and the other with a logical 1 carry-in. The sum and carry-out generated are then selected by the actual carry-in, which comes from the carry-out of the previous block.

- The main cost is the area overhead: an additional carry path and a multiplexer.


- Delay – Sub linear, which can be beaten
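
The selection scheme can be sketched as follows. The names `ripple_block`, `carry_select_add`, `to_bits` and `from_bits` are hypothetical, bit lists are assumed LSB first, and the block size of 4 is an arbitrary choice; each block is simply added twice, once per assumed carry-in, and the real carry-in picks the result.

```python
def ripple_block(a_blk, b_blk, c_in):
    """Plain ripple addition of one block."""
    s_blk, carry = [], c_in
    for a, b in zip(a_blk, b_blk):
        s_blk.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return s_blk, carry

def carry_select_add(a_bits, b_bits, c_in=0, block=4):
    """Functional model of a carry select adder; bit lists are LSB first."""
    s_bits, carry = [], c_in
    for start in range(0, len(a_bits), block):
        a_blk = a_bits[start:start + block]
        b_blk = b_bits[start:start + block]
        s0, c0 = ripple_block(a_blk, b_blk, 0)   # adder assuming carry-in = 0
        s1, c1 = ripple_block(a_blk, b_blk, 1)   # adder assuming carry-in = 1
        # multiplexer driven by the actual carry-in from the previous block
        s_blk, carry = (s1, c1) if carry else (s0, c0)
        s_bits.extend(s_blk)
    return s_bits, carry

def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

s_bits, c_out = carry_select_add(to_bits(200, 8), to_bits(100, 8))
print(from_bits(s_bits), c_out)   # 300 = 44 with a carry out of 8 bits
```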
