MEL G623
ANU GUPTA, EEE
BITS Pilani, Pilani Campus
Introduction
HANDOUT
http://nalanda.bits-pilani.ac.in/
VLSI DESIGN
Digital system engineering
Engineering problems of composing circuits into systems are
only briefly touched upon
Early parallel transmission schemes were often much faster than serial schemes (more wires =
more data, faster), but at the cost of added hardware complexity (more wires, more complicated
transmitters and receivers). Parallel transmission protocols are now mainly reserved for
applications like a CPU bus or links between IC devices that are physically very close to each other,
usually within a few centimeters.
Serial data transmission is much more common in new communication protocols due to a
reduction in the I/O pin count, hence a reduction in cost.
Common serial protocols include SPI and I2C. Perhaps surprisingly, serial links can be clocked
at much higher rates than parallel ones, which tends to outweigh the primary
advantage of parallel transmission.
Serial protocols are used for longer-distance communication, ranging from shared external
devices like a digital camera to global networks or even interplanetary communication for space
probes. Some recent CPU bus architectures use serial methods as well.
Parallel I/O can work, but only by applying significant engineering resources.
Stringent specifications in the PCI-X standard for rise and fall times, drive strengths,
path delays and skews, for example, have proven so expensive that it has been
adopted only in high-end applications such as computer servers.
EMI
Cost
Bit Rate
Communication between two blocks where a common clock is
applied to both the transmission and reception blocks.
A copy of the clock is sent along with the data. The output time
of the forwarded clock is adjusted so that the clock
transitions in the middle of the bit cell.
If any cables or logic modules have a delay longer than one clock
period, the system will only operate over certain (often narrow)
windows of clock frequencies.
The clock signal arrives earlier than the data, because the data is usually delayed.
Hence the global clock is usually not centered on the eye of the signals
it is sampling, and this convention tends to be less tolerant of timing noise.
Data rate
The rise (transition) time of the transmitter determines how fast a new
symbol can be put on the line.
Timing noise: uncertainty in the arrival time of the signal at the
receiver, and uncertainty in the timing of the transmit and receive clocks,
must be compensated by widening the cell to allow for the worst-case
timing plus a margin.
This active control can greatly decrease the timing uncertainty in a system
and hence increase the maximum data rate if pipeline timing is employed.
A 400 MHz clock is distributed to the transmitter and receiver from a master
clock generator over transmission lines that are matched to 100 ps. A
single-stage buffer at the clock generator, B1, introduces a timing
uncertainty of 100 ps, and the four-stage on-chip clock generator, B4,
adds 400 ps of timing uncertainty.
So either a ternary signal or two binary signals are required to encode
an aperiodic binary event.
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
open-loop, global-clock
synchronous system
Constraints:
The system operates at 400 Mb/s (bit-cell time tbit = 2.5 ns)
over twire = 6.25 ns (2.5 bits) of cable.
The cycle time must be large enough to account for uncertainty, aperture, and
rise time.
It is the transition time component of the timing budget that is traded off against
voltage margin when fitting a margin rectangle into the eye opening.
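A rough sketch of this timing budget for the 400 Mb/s example. Only the bit rate, the wire delay and the three uncertainty figures come from the slides; the rise time `T_RISE_PS` and aperture `T_APERTURE_PS` are made-up placeholder values, and simply adding the three quoted uncertainties is a simplification for illustration, not necessarily the budget used in the lecture.

```python
# Timing-budget sketch for the open-loop, global-clock example.
# All times are in picoseconds.

BIT_RATE_HZ = 400e6            # from the slides: 400 Mb/s
T_WIRE_PS = 6250               # from the slides: 6.25 ns of cable
UNCERTAINTY_PS = {             # figures quoted in the slides
    "line mismatch": 100,
    "buffer B1": 100,
    "on-chip clock generator B4": 400,
}
T_RISE_PS = 500                # ASSUMPTION: placeholder rise time
T_APERTURE_PS = 300            # ASSUMPTION: placeholder receiver aperture

t_bit_ps = 1e12 / BIT_RATE_HZ                    # bit-cell time
bits_in_flight = T_WIRE_PS / t_bit_ps            # cable length in bit cells
t_uncertainty_ps = sum(UNCERTAINTY_PS.values())  # naive sum of the quoted terms

# The bit cell must cover uncertainty, aperture and rise time.
t_required_ps = t_uncertainty_ps + T_APERTURE_PS + T_RISE_PS
net_margin_ps = t_bit_ps - t_required_ps

print(f"t_bit = {t_bit_ps:.0f} ps, cable holds {bits_in_flight} bit cells")
print(f"required = {t_required_ps} ps, net margin = {net_margin_ps:.0f} ps")
```

A negative `net_margin_ps` would mean the chosen bit rate cannot be met with these numbers.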
Min. time required
In the absence of timing noise,
overlapping constraints
Closed-Loop Timing: Measure and Cancel Skew
A further advantage is that it will work at any clock rate as long as the total
timing uncertainty is less than the gross timing margin.
Approach B, adv.:
Thus, Approach B does not require matched lines for clock distribution.
The DET flip-flops sample their inputs on both edges of the clock.
Timing nomenclature
Delay and Transition Times
Other definitions
Relative phase between two periodic signals
If the frequencies of A and B differ by $\Delta f_{AB} = |f_A - f_B| \ll f_A$, then for short periods they can be treated as if
they had the same frequency.
Over longer periods, however, the slow drift of phase must be accounted for.
the aperture time is specified from the 10% or 90% point of the
waveform.
BITS Pilani, Pilani Campus
Edge triggered FF
The aperture offset time, tao, is the time from the center of the aperture to
the rising edge of the clock.
With a latch, however, the aperture and delay timing properties are
referenced to different edges of the clock, and an additional delay
property is referenced to the data input.
Signalling
Encoding Signals/ Events
Over time the signal carries a stream of symbols, one after the
other, in sequence.
Return-to-Zero (RZ)
Non-return-to-Zero (NRZ) Signaling
Ternary Signaling
Dual-Rail Signaling
Clocked Signaling
That is, all of the signals change in response to the same source of
events.
Ternary
The three states of the logic signal are needed to encode
the 3 possibilities:
staying in the current state denotes continuing the current symbol,
changing to the higher of the two other states signals the start of a new 1 symbol,
changing to the lower of the two other states denotes starting a new 0 symbol.
Advantage: a single wire is required.
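The three rules above can be sketched as a small encoder and decoder. The state numbering (0, 1, 2), the initial state and the function names are illustrative assumptions, not part of the slides.

```python
def encode_ternary(bits, state=0):
    """Map a bit sequence to a sequence of ternary line states (0, 1, 2)."""
    out = []
    for bit in bits:
        others = sorted(s for s in (0, 1, 2) if s != state)
        # Higher of the two other states starts a 1, lower starts a 0.
        state = others[1] if bit else others[0]
        out.append(state)
    return out


def decode_ternary(states, state=0):
    """Recover the bits from successive line states."""
    bits = []
    for new in states:
        if new == state:
            continue  # no transition: the current symbol continues
        others = sorted(s for s in (0, 1, 2) if s != state)
        bits.append(1 if new == others[1] else 0)
        state = new
    return bits


line = encode_ternary([1, 0, 1, 1, 0])
print(line, decode_ternary(line))
```

Every new symbol produces a transition, so repeated bits stay distinguishable on one wire without a separate clock.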
NRZ: only the time interval between the crossings of the two thresholds U0 and U1
distinguishes these two cases.
RZ: because the signal returns to zero following each event, each transition crosses only
a single threshold.
There is no ambiguity or timing uncertainty associated with crossing multiple thresholds in
a single transition; however, it does require twice the number of transitions as ternary NRZ
signaling.
All that is required in this case is that transitions on the signal be frequent
enough to keep the transmit and receive time bases from drifting too far apart.
drift and the two time bases are matched to f = 100 kHz.
For this link, N is 1000. The link can run for 1,000 bit cells without a
transition before the phase drifts by more than 100 ps.
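A minimal check of the figure N = 1000. The 100 ps drift tolerance and the 100 kHz mismatch are from the slide; the 1 Gb/s bit rate (with the clock frequency equal to the bit rate) is an assumption, since the start of the example did not survive in this text. It is chosen because it reproduces the stated result.

```python
from fractions import Fraction

BIT_RATE_HZ = 1_000_000_000   # ASSUMPTION: 1 Gb/s link, not stated in the surviving text
DELTA_F_HZ = 100_000          # from the slide: time bases matched to 100 kHz
MAX_DRIFT_PS = 100            # from the slide: tolerated phase drift

t_bit_ps = Fraction(10**12, BIT_RATE_HZ)                 # bit-cell time in ps
# Phase drift accumulated over one bit cell: t_bit * (delta_f / f).
drift_per_cell_ps = t_bit_ps * Fraction(DELTA_F_HZ, BIT_RATE_HZ)
n_cells = Fraction(MAX_DRIFT_PS) / drift_per_cell_ps     # cells until tolerance is used up

print(f"drift per bit cell = {float(drift_per_cell_ps)} ps, N = {int(n_cells)}")
```

Exact fractions are used so that the result is not disturbed by floating-point rounding.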
transitions to synchronize
Bit stuffing
To ensure that a transition is present at least once
every N bit cells, in the worst case we must
consume 1/N of the channel bandwidth by inserting a transition cell
that carries no data after every N - 1 data bits.
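A minimal sketch of this worst-case scheme: a fixed-position transition cell after every N - 1 data bits. The function names are hypothetical, and practical links usually stuff only when the data itself lacks transitions; this version simply illustrates the 1/N overhead.

```python
def stuff(data_bits, n):
    """Insert one transition cell after every n-1 data bits (n >= 2)."""
    out = []
    for i, bit in enumerate(data_bits, start=1):
        out.append(bit)
        if i % (n - 1) == 0:
            # The complement of the previous cell guarantees a transition.
            out.append(1 - bit)
    return out


def unstuff(cells, n):
    """Drop every n-th cell, which carries no data."""
    return [c for i, c in enumerate(cells, start=1) if i % n != 0]


cells = stuff([0, 0, 0, 0, 0, 0], n=4)
print(cells, unstuff(cells, n=4))
```

With n = 4, six data bits occupy eight cells, so one quarter of the channel carries no data.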
Such a clock is easily generated by delaying the clock from the previous
stage.
The nominal delay of the stages affects latency but not throughput.
[Figure: timing parameters of a register-to-register path: clock arrival times tCLK1 and tCLK2; tc-q, tlogic; tc-q,cd, tlogic,cd; tsu, thold.]
Clock Non-idealities
Clock skew
Spatial variation in arrival time of a clock transition.
It is caused by mismatches in clock path or clock load
It can be positive or negative depending upon routing
direction and position of clock source
Clock skew does not result in clock period variation but
only in phase shift
[Figure: three registers R1, R2, R3 (D/Q) in series from In, with combinational logic between them and delay elements in the clock path; panel (a) shows positive skew.]
Positive Skew
[Figure: CLK1 and CLK2 waveforms with clock edges numbered 1 to 4; the clock period TCLK and the hold time th are marked.]
Impact of positive clock skew
[Figure: R1, combinational logic, R2 (fed from In), annotated with tc-q, tlogic, tc-q,cd, tlogic,cd, tsu and thold.]
Race condition
Hold time constraint: $t_{hold} + \delta < t_{c\text{-}q,cd} + t_{logic,cd}$, where $\delta$ is the clock skew.
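A small sketch of how skew enters the two register-to-register checks. It assumes the usual textbook forms, setup: T + δ ≥ tc-q + tlogic + tsu, and hold: thold + δ < tc-q,cd + tlogic,cd, with δ the skew of the receiving clock relative to the launching clock. All numeric values are made-up picosecond figures.

```python
def check_skew_timing(t_clk, skew, t_cq, t_logic, t_su,
                      t_hold, t_cq_cd, t_logic_cd):
    """Return (setup_ok, hold_ok) for a register-to-register path."""
    # Setup: data launched at R1 must settle before R2's next (skewed) edge.
    setup_ok = t_clk + skew >= t_cq + t_logic + t_su
    # Hold: the fastest new data must not reach R2 before its hold window closes.
    hold_ok = t_hold + skew < t_cq_cd + t_logic_cd
    return setup_ok, hold_ok


# ASSUMPTION: illustrative delays in ps.
path = dict(t_clk=1000, t_cq=100, t_logic=850, t_su=50,
            t_hold=50, t_cq_cd=30, t_logic_cd=40)

for skew in (0, 100, -100):
    print(skew, check_skew_timing(skew=skew, **path))
```

In this example positive skew relaxes the setup check but causes a hold violation (a race), while negative skew does the opposite.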
Negative Skew
[Figure: CLK1 and CLK2 waveforms with clock edges numbered 1 to 4; the clock period TCLK is marked.]
[Figure: R1, combinational logic, R2, annotated with tc-q, tlogic, tc-q,cd, tlogic,cd, tsu and thold.]
[Figure: clock jitter (labels: In, tjitter, tsu, thold).]
single wire
edge triggered FF
Min. cycle time
with clk. timing uncertainty
We denote the minimum time for output j to make its first transition in
response to a transition on input i as the contamination delay tcdji. It is
the minimum over all input states, s.
This method results in very robust timing that is largely insensitive to skew;
it does require that logic be partitioned across the two phases, and signals
from the two partitions cannot be intermingled.
Performance Similar to
Slack borrowing
Enhanced performance due to flexible timing, yet no
design changes
Possible for logic block to utilize time that is left over from
the previous logic block.
Total logic delay can be more than one clock cycle
Reg. based vs. latch based: example
Reg.
latch
Less Tclk
Level-Sensitive Pipeline
Timing
A starts changing with the rising edge of phase i,
whereas B is sampled by the falling edge
of phase i+1.
Time borrowing
Time borrowing can be performed across clock cycles as
well as between the two phases of one cycle.
In general, a two-phase system will operate properly if--
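Drawback
Clk signals to be planned carefully
Register based design, Tclk = 125 ns
assume register has zero delays
[Figure: register-based pipeline In, R1, R2, R3, Out clocked by Clk, with combinational logic blocks (Clb) A to D between nodes a to e; block delays of 75 ns and 50 ns are marked; T = 125 ns.]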
Latch based design, case 1 (bad design)
L1: HI, L2: LO, Tclk = 100 ns
[Figure: Clk waveform (hi/lo) with nodes b, c, d, e; a 50 ns interval and 'Wait' and 'idle' periods are marked.]
Latch based design, case 2
L1: HI, L2: LO, Tclk = 100 ns
max slack: 1/2 Tclk
[Figure: Clk waveform.]
Latch based design, case 3
L1: LO, L2: HI, Tclk = 100 ns
[Figure: Clk waveform (hi/lo) with nodes b, c, d, e; a 50 ns interval, 'SLACK' and 'Borrow' are marked.]
Latch based design, case 4
L1: LO, L2: HI, Tclk = 100 ns
[Figure: Clk waveform (hi/lo) with a 50 ns interval and node b; the case is marked 'WRONG'.]
Maximum slack possible
Max time that can be borrowed is 0.5 Tclk,
so max logic delay in one cycle can be 1.5 Tclk.
But for n stages the overall delay would be n Tclk.
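A simplified sketch of these borrowing limits. The model is an assumption made for illustration: each stage nominally gets one Tclk, may start late by whatever the previous stage borrowed, and may itself borrow at most 0.5 Tclk from the next stage. The function name and the example delays are hypothetical.

```python
def borrow_schedule(stage_delays, t_clk):
    """Return the time each stage borrows from the next, or None if infeasible."""
    max_borrow = t_clk / 2
    borrowed = 0
    schedule = []
    for delay in stage_delays:
        # A stage starts late by whatever the previous stage borrowed.
        borrowed = max(0, borrowed + delay - t_clk)
        if borrowed > max_borrow:
            return None  # more than half a cycle would be needed
        schedule.append(borrowed)
    return schedule


# Delays in ns with Tclk = 100 ns.
print(borrow_schedule([150, 50, 100], 100))   # one stage uses the full 1.5 Tclk
print(borrow_schedule([120, 120, 60], 100))   # borrowing chained across stages
print(borrow_schedule([160, 40], 100))        # needs 60 ns, so it is rejected
```

In the first two examples the three stages still add up to 3 Tclk overall, which is the point of the last line above: borrowing redistributes time between stages but does not create more of it.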
Drawbacks
We have to use a two-phase clocking scheme.
Glitches: power dissipation increases.
Single-Phase or Zero Nonoverlap Clocking
Pipelines With Feedback
Asynchronous systems--
Self timed approach
Sync. systems
Asynch. design: meeting constraints
Adv.: the next block can start computation as soon as the previous block has finished.
REQUEST, ACKNOWLEDGE: logical ordering
Self timed system
Handshake protocol
Handshaking: synchronize by mutual agreement
Implementation of HS protocol: 2-phase (no return to zero)
[Figure: SENDER and RECEIVER connected by Req, Ack and Data. (b) Timing diagram over cycle 1 and cycle 2, showing the sender's action and the receiver's action on Req, Data and Ack.]
2 phase protocol (NRZ)
Data change, then request, then data acceptance, then acknowledge
Data change, then request, then acknowledge, then data acceptance
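A behavioural sketch of the first ordering (data change, request, data acceptance, acknowledge). In 2-phase signaling every transition on Req or Ack, rising or falling, counts as an event. The function name and the event log format are hypothetical, and the sender and receiver are modelled sequentially rather than as concurrent processes.

```python
def two_phase_transfer(words):
    """Simulate a sender and receiver exchanging words with a 2-phase handshake."""
    req = ack = 0                  # both wires start at the same level (idle)
    received, events = [], []
    for word in words:
        data = word                # 1. sender changes the data
        req ^= 1                   # 2. sender toggles Req
        events.append(("req", req))
        assert req != ack          # Req differs from Ack: a request is pending
        received.append(data)      # 3. receiver accepts the data
        ack ^= 1                   # 4. receiver toggles Ack
        events.append(("ack", ack))
    return received, events


received, events = two_phase_transfer([7, 8, 9])
print(received)
print(events)
```

Each word costs exactly one transition on each wire, with no return-to-zero phase in between.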
[Figure: timing diagram of Ack and Data over Cycle 1 and Cycle 2.]
Dual rail coding
1 bit of information is coded using two wires.
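A minimal sketch of one common dual-rail convention. The slides do not give the code table, so the assignment used here is an assumption: (1, 0) for logic 1, (0, 1) for logic 0, (0, 0) as the "no data" spacer, and (1, 1) as illegal.

```python
NULL = (0, 0)   # ASSUMPTION: spacer code meaning "no valid data yet"


def dual_rail_encode(bit):
    """Return (wire_for_1, wire_for_0) for one bit."""
    return (1, 0) if bit else (0, 1)


def dual_rail_decode(wires):
    """Return 0 or 1 for valid data, None for the spacer; reject (1, 1)."""
    if wires == NULL:
        return None
    if wires == (1, 1):
        raise ValueError("illegal dual-rail code (1, 1)")
    return 1 if wires == (1, 0) else 0


for bit in (0, 1):
    print(bit, dual_rail_encode(bit), dual_rail_decode(dual_rail_encode(bit)))
```

Because validity is visible on the wires themselves, a receiver can detect completion without a separate request line.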
Here, data is clocked into the align stages by the input
request signal and clocked out by the output acknowledge
signal.
Dual rail
Bundled data
2-phase bundled data (micro-pipelines)
Capture-Pass transition-controlled latch
Transitions on C and P alternate
Micropipelines
Elegant, no RZ overhead
But implementation (latches and other control circuits) is complex
PERFORMANCE PARAMETERS
CLOCK LOAD
NO OF TRANSISTORS
CLOCKING SCHEME
Latch versus Register
Latch: stores data when clock is low/HIGH
Register: stores data when clock rises
[Figure: latch and register symbols (D, Q, Clk) with Clk, D and Q waveforms for each.]
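The difference between the two can be sketched with a simple behavioural model. A positive (transparent-high) latch is assumed here; the function names and the sample waveforms are illustrative.

```python
def positive_latch(clk, d, q):
    """Level-sensitive: transparent while clk is 1, holds while clk is 0."""
    return d if clk else q


def simulate(clk_wave, d_wave):
    """Return (latch outputs, register outputs) sample by sample."""
    q_latch = q_reg = 0
    prev_clk = 0
    latch_out, reg_out = [], []
    for clk, d in zip(clk_wave, d_wave):
        q_latch = positive_latch(clk, d, q_latch)
        if clk == 1 and prev_clk == 0:   # rising edge: register samples D
            q_reg = d
        prev_clk = clk
        latch_out.append(q_latch)
        reg_out.append(q_reg)
    return latch_out, reg_out


clk = [0, 1, 1, 0, 0, 1, 1, 0]
d   = [1, 1, 0, 0, 1, 0, 1, 1]
latch_out, reg_out = simulate(clk, d)
print(latch_out)
print(reg_out)
```

The latch output follows D for as long as the clock is high, whereas the register output changes only at the rising edges.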
Storage Mechanisms
Static
Dynamic (charge-based)
[Figure: latch symbol with D, Q and CLK.]
Static: Mux-Based Latch (1)
$Q = \overline{CLK} \cdot Q + CLK \cdot D$
CLK LOAD: 4, 2-PHASE CLOCKING, 10 TRANSISTORS
Mux-Based Latch (2): LESS CLK LOAD
CLK LOAD: 2, 2-PHASE CLOCKING, 6 TRANSISTORS
Mux-Based Latch (3): LESS CLK LOAD, Vt DEGRADATION
Non-overlapping clocks
NMOS only
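Master-Slave (Edge-Triggered) Register
[Figure: two multiplexer-based latches in series, labeled Master and Slave; D drives the master, QM is the master output, Q the slave output; waveforms of CLK, D, QM and Q.]
Master-Slave Register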
Multiplexer-based latch pair
[Figure: transmission-gate master-slave register: inverters I1 to I6, transmission gates T1 to T4, input D, internal node QM, output Q, clock CLK.]
TIMING METRICS
T setup = I1 + T1 + I3 + I2
T CLK-Q = T3 + I6
T hold = ~0
EXACT VALUES CAN BE OBTAINED
THROUGH SIMULATION
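A small sketch of how these sums are evaluated. The component delays are made-up placeholder values in picoseconds, as is the logic delay; the minimum clock period uses the usual relation T ≥ tclk-q + tlogic + tsetup, which is an assumption not stated on this slide.

```python
# ASSUMPTION: placeholder propagation delays in picoseconds.
T_INV_PS = {"I1": 40, "I2": 40, "I3": 40, "I6": 40}   # inverters
T_TG_PS = {"T1": 30, "T3": 30}                        # transmission gates

# Sums taken from the slide.
t_setup_ps = T_INV_PS["I1"] + T_TG_PS["T1"] + T_INV_PS["I3"] + T_INV_PS["I2"]
t_clk_q_ps = T_TG_PS["T3"] + T_INV_PS["I6"]
t_hold_ps = 0

T_LOGIC_PS = 500                                      # ASSUMPTION: logic between registers
t_clk_min_ps = t_clk_q_ps + T_LOGIC_PS + t_setup_ps

print(f"setup = {t_setup_ps} ps, clk-q = {t_clk_q_ps} ps, hold = {t_hold_ps} ps")
print(f"minimum clock period = {t_clk_min_ps} ps")
```

As the slide notes, exact values come from circuit simulation; a sum of this kind is only a first estimate.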
Reduced Clock Load Master-Slave Register
SIZING IMPORTANT: REVERSE CONDUCTION
I2 must be weak
When slave is on, reverse conduction is possible
TIMING METRICS
T setup = T1 + I1
T CLK-Q = T2 + I3
T hold = ~0 (or T1)
EXACT VALUES CAN BE OBTAINED
THROUGH SIMULATION
[Figure: (a) Schematic diagram (nodes D, A, X, B, Q; clocks CLK and its complement). (b) Overlapping clock pairs.]
Non overlapping phases
TIMING METRICS
T setup = T1 + I1
T CLK-Q = T2 + I3
T hold = ~0 (or T1)
EXACT VALUES CAN BE OBTAINED
THROUGH SIMULATION
Overpowering the Feedback Loop: Cross-Coupled Pairs
NOR-based set-reset

S  R  Q  Q'
0  0  Q  Q'
1  0  1  0
0  1  0  1
1  1  0  0   (forbidden state)

Cross-Coupled NAND
Added clock
[Figure: cross-coupled NANDs with added clock; transistors M1 to M8 between VDD and ground, inputs S, R and CLK, outputs Q and Q'.]
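The NOR-based set-reset behaviour can be reproduced with a small gate-level sketch. The two cross-coupled NOR gates are evaluated one after the other until the outputs stop changing; that evaluation order and the function names are modelling assumptions, and the model says nothing about the real circuit's behaviour when it leaves the forbidden state.

```python
def nor(a, b):
    return int(not (a or b))


def sr_nor_latch(s, r, q, qb):
    """Settle two cross-coupled NOR gates for inputs S, R from state (Q, Q')."""
    for _ in range(4):                 # a few passes are enough to settle
        new_q = nor(r, qb)
        new_qb = nor(s, new_q)
        if (new_q, new_qb) == (q, qb):
            break
        q, qb = new_q, new_qb
    return q, qb


state = (0, 1)
for s, r in [(1, 0), (0, 0), (0, 1), (0, 0), (1, 1)]:
    state = sr_nor_latch(s, r, *state)
    print(f"S={s} R={r} -> Q={state[0]} Q'={state[1]}")
```

The last line shows the forbidden input: both outputs go to 0, so Q and Q' are no longer complementary.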
Dynamic registers
TIMING METRICS
T setup = T1
T CLK-Q = I1 + T2 + I2
T hold = ~0 (or T1)
EXACT VALUES CAN BE OBTAINED THROUGH SIMULATION
IN OVERLAP--
OVERLAPS
Other Latches/Registers: C²MOS
[Figure: C²MOS register; transistors M1 to M8, clocked by CLK and its complement, internal node X with capacitances CL1 and CL2, input D, output Q.]
Insensitive to Clock-Overlap
Dual edge registers
Single phase clock
Latches/Registers: TSPC
Including Logic in TSPC
[Figure: TSPC latch stages with embedded logic: a PUN/PDN stage and an example with inputs In1 and In2, output Q.]
Reduced complexity
TSPC Register
[Figure: TSPC register; transistors M1 to M9 in three stages between VDD and ground, clocked by CLK, internal nodes X and Y, input D, output Q.]
Pulse-Triggered Latches
An Alternative Approach
Ways to design an edge-triggered sequential cell:
Master-slave latches (L1 and L2 in series)
Pulse-triggered latch (single latch L)
Pulsed register: avoids race, single latch
[Figure: pulsed register; transistors M1 to M6, MP and MN, clock CLK and glitch clock CLKG, internal node X, input D, output Q. Panel (c): glitch clock.]
Pulsed Latches
[Figure: pulsed latch; transistors M1 to M6 and P1 to P3, clocks CLK and CLKD, internal node x, input D, output Q.]
Sense amplifier based register