Beruflich Dokumente
Kultur Dokumente
Inhaltsverzeichnis
1 Introduction 4
3 MOS Transistor 13
3.1 Structure and Operation of the MOS Transistor . . . . . . . . . . . . . . . 13
3.2 Threshold Voltage of the MOS Transistor . . . . . . . . . . . . . . . . . . 14
3.2.1 Intrinsic Carrier Concentration . . . . . . . . . . . . . . . . . . . . 15
3.3 First-Order Current-Voltage Characteristics . . . . . . . . . . . . . . . . . 18
3.4 Derivation of Velocity-Saturated Current Equations . . . . . . . . . . . . 20
3.4.1 Effect of High Fields . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.5 Capacitances of the MOS transistor . . . . . . . . . . . . . . . . . . . . . 23
2
6.3 Improving Delay Calculations with Input Slope . . . . . . . . . . . . . . . 70
6.4 Gate Sizing for Optimal Path Delay . . . . . . . . . . . . . . . . . . . . . 74
6.5 Inverter Chain Delay Optimization - FO4 Delay . . . . . . . . . . . . . . . 75
6.6 Path Optimization Using Logical Effort . . . . . . . . . . . . . . . . . . . 84
6.6.1 Understanding Logical Effort . . . . . . . . . . . . . . . . . . . . . 86
3
1 Introduction
• addresses the design of digital circuits
• train you to think like a circuit designer
• achive a good balance among speed, power consumption, reliability
• digital integrated circuit design in 0.18 µm and 0.13 µm
• system designer
• register transfer level (RTL) language
• logic design (gate level)
• circuit level (transistor schematic)
• geometric layout level (GDS-II)
MOS/CMOS
DSM Devices
4
• capacitive coupling (dünne, nahe beieinander liegende Leitungen → aufgestellte
Platten → Kapazität zwischen Leitungen)
• inductive coupling (Leitungen sind in Relation zum Bauteil lang)
• electromigration (hoher Stromfluss → Material wird in eine bestimmte Richtung
abgetragen → Leitungen werden abgetragen)
• antenna effects (Schleife wird zur Antenne: muss verhindert werden)
• Clock Skew (Verschiedene Gates erhalten das Clock Signal zu unterschiedlichen
Zeitpunkten aufgrund von (tlw. obigen) Effekten an den Leitungsbahnen)
5
XOR XNOR
F
F A
A B
B
A B F A B F
0 0 0 0 0 1
0 1 1 0 1 0
1 0 1 1 0 0
1 1 0 1 1 1
F = A ⊕ B = AB ∨ AB
F = AS ∨ BS
CLK
clock
Qa C D Qn
0 0 - 0
C D Qn
1 0 - 1
0 - Qa
- 0→1 0 0
1 - Qa
- 0→1 1 1
0→1 0 0
0 1 - 0
0→1 1 1
1 1 - 1
1 → 0 - Qa
0 1→0 - 0 Zusammengefasst zur
1 1→0 - 1 Reduced Excitation Table
Excitation Table
(Schaltfolgetabelle)
-“heisst Don’t Care“.
” ”
6
Anmerkung zu De Morgan
De Morgan sollte nicht auf Schaltungen angewendet werden, da diese unverständlich
werden:
vorher nachher
≥1 &
3-input-NOR
& ≥1
3-input-NAND
VDD out
.. output VDD
input .
VSS
0
1
in
VDD
2 VDD
• Input impedance → ∞
High Input Impedance → Little Loading on Driving Signal
• Output impedance → 0
Low Output Impedance → Large Currents → stable Output Voltage
• Power Consumption ≈ 0
• No Time Delay (Latency)
7
output[V]
Amp
VDD
VDD VDD VoH -1 1
VoH
1 1
ViH
ViL
0 0
VoL
0 VoL 1 -1
0 0 input[V]
ViL ViH
(the input voltages ViL and ViH are defined by the points at which the magnitude of
the slope of the voltage transfer characteristic is 1)
8
Vin
tHL tLH
Inverter
VoH
90%
50%
tPHL tPLH
10%
VoL
tr tf
tcycle
tPLH
50%
tPHL
tcycle
tcycle
9
– memory (je nach Gebrauch; eher selten)
– clock (arbeitet immer)
– analog blocks (je nach Gebrauch)
• P = I · VDD
where I is the current flowing from VDD to Gnd
• static and dynamic power
Ptotal = Pstatic + Pdynamic
• Pstatic = (IDC + Ileakage ) · VDD
Verbrauch, wenn das Gate nicht switcht
• Pdynamic = C · VDD2 ·f (↑ f → ↑ C)
Verbrauch, wenn das Gate switcht
Treiberfähigkeit C ⇒ balance of power and delay
• Höhere Frequenz verbraucht mehr Leistung
A popular missconception is that good circuit design includes new and innovative
circuit topologies. Instead:
10
2.7 MOS Transistor Structure and Operation
• behaves like a switch
• can be used to build up complex logic functions:
INV, BUFFER, NOR, NAND, MUX, D-FF
NMOS
p-Substrat
Substrat/Bulk
VDD Vin 0 1
PU 1 *
PMOS leitet sperrt
PMOS (pull up) PD * 0
Vin Vout
NMOS sperrt leitet
Vout 1∨∗=1 0∨∗=0
NMOS (pull down)
* = undefined / hochohmig
VDD
RL
Steady-State-Power
VDD − VoL Vin = high Vout = low
IDC =
RL IDC
⇒ CMOS exhibits low standby power
11
2.8 Deep Submicron (DSM) Interconnect
• to provide interconnects between gates
• to route the power supply and ground to all gates
• capacitance to ground
• interconnect issues now dominate the
– performance (→ Timing)
– reliability (→ Electromigration)
– power distribution
• Possible Solution: Buffer insertion (um Leitungslänge in Griff zu bekommen)
Vin
VDD
12
• magic triangle
quality
cost time
• most important design spec today:
– time-to-market
– time-to-volume
• Increasingly important specs in Europe (to differentiate from competition):
Safe, Secure (Datensicherheit) & Reliable Circuits (SSRC)
The Callenges Ahead
– Understanding
– Knowledge
– Creativity
– Reuse in other disciplines
3 MOS Transistor
• metal-oxide-semiconductor field-effect transistor
• MOS devices (Übergang Metall → Oxid), pn junctions, device capacitances
– threshold voltage
– current equations
– capacitance models
• enhancement-mode devices (turn on voltage > 0V, Anreicherung)
• depletion-mode devices (turn on voltage > VT , Verarmung)
13
G S
G B
D
S
W
D
p+ L p+
kein Strom hier
Raumladungszone
n-Substrat
B
Some Facts:
1.1eV
Ei Intrinsic Fermi Level
qΦFp
EFp Actual Fermi Level
Ev Valence band
(p-type semiconductor)
14
3.2.1 Intrinsic Carrier Concentration1
Intrinsische Ladungsträgerkonzentration (Silizium)
1
ni = 1.45 · 1010 (2.1)
cm3
Im intrinsischen Fall (also undotiert) gilt:
p=n
p-Halbleiter n-Halbleiter
p ≈ NA ni n ≈ ND ni
n2i n2i
n= NA p= ND
Dotierung verschiebt das tatsächlichen Fermi-Niveau EF um das elektrostatische Po-
tential ΦF (built-in potential)
kT ni
ΦFp = ln (< 0) (2.3a)
q p
kT n
ΦFn = ln (> 0) (2.3b)
q ni
Konstanten:
kT 300K
VTh = ≈ 26mV
q
V As
k = 1.38 · 10−23
K
−19
q = 1.602 · 10 As
1
Eigenleitungsträgerdichte
2
Keine Verwendung von Metall mehr aufgrund von Herstellungsschwierigkeiten im niedrigen µm
Bereich
15
Flatband condition
G D S ΦEV
E0
W
qΦG Qox
qΦC
tox EF G + EC
n+ L n+
VFB qΦGC + Ei
EFp
+ EV
p
tox
QB
Qox
Ec
Ei
ΦF
EF G EF
ΦS 2|ΦF |
EV
VT
Xd
tox
Große Fläche links: Isolator
Kleine Fläche in der Mitte: Inversion Layer (mobile charge)
Große Fläche rechts: Depletion Layer (exposed fixed charge)
16
Weite der Raumladungszone
1
2εSi |ΦS − ΦF | 2
Xd = (2.7)
qNA
VSB
B
Threshold Voltage VT
In Starke Inversion, wenn gilt: ΦS = −ΦF → ΦS − ΦF = −|2ΦF |
QB
VT = VFB − 2ΦF − (2.10)
Cox
QB0 Qox QB − QB0
= ΦGC − 2ΦF − − − (2.11)
Cox Cox Cox
| {z }
=VT0
p p
= VT0 +γ VSB + |2ΦF | − |2ΦF |
17
Herleitung für γ:
√
QB − QB0 2qNA εSi p p
− = | − 2ΦF | + VSB − |ΦS − ΦF |
Cox C
√ ox
2qNA εSi p p
nmos VSB > 0
= VSB + |2ΦF | − |2ΦF |
Cox pmos VSB < 0
| {z }
→γ
Zusammenfassung:
G VBS
S
S
VSB G
D
B
VSB falsch gepolt, hebt auch die Vorspannung der pn-Übergänge in Sperrrichtung auf.
⇒ keine Inversion (kein Transistor mehr)
n+ n+
y
Größer, da größerer
Spannungsabfall
p-Substrat
VSB
18
• 0 ≤ V (y) ≤ VDS
• gate-to-channel-voltage: VGS − V (y)
• induced charge per unit area:
IDS = Qn · v · W (2.14)
v = µE (carrier velocity) (2.15)
cm/s
µ (carrier mobility)
V /cm
dV (y)
E=
dy
IDS = Cox (VGS − V (y) − VT ) · µn E · W
⇔ IDS dy = W µn Cox (VGS − V (y) − VT ) dV
µn εox
mit k 0 = = µn Cox (process transconductance parameter) (2.16)
tox
ZL VDS
Z
⇔ IDS dy = W k 0 (VGS − V − VT )dV (2.17a)
0 0
W V2
⇔ IDS = k 0 ((VGS − VT )VDS − DS )
L 2
W
mit k = k 0 (device transconductance parameter)
L
Linear Region of Operation (only valid up to the pinch-off point):
k 2
IDS = (2(VGS − VT )VDS − VDS ) (2.17b)
2
Pinch-off Point (Saturation Voltage):
VDsat = VGS − VT (2.18)
VDsat in (2.17b):
Saturation Region of Operation:
k
IDS = (VGS − VT )2 (2.19)
2
19
IDS
VDS = VGS − VT
VGS
K 2
IDS = (2(VSG − |VTp |)VSD − VSD )
2
K
IDS = (VSG − |VTp |)2
2
• values of VT can be adjusted with body-bias VBS
• L is usually selected as the minimum value possible
• only practical degree of freedom is the selection of W
• Bisherige Formeln okay bei long- • Aber relativ ungenau bei DSM Devi-
channel devices (> 1µm) ces (< 0.18µm)
• IDS ∼ VGS (is more linear) • VDsat is smaller (0.6V in 0.18µm)
1.8
IDS
1.35 VGS
0.9
Ausgangskennlinienfeld
VGS
0.5V
20
3.4.1 Effect of High Fields
Horizontal Field (between Drain and Source)
VDS V
Ey = (≈ 105 )
L cm
weil jetzt: µE 6= v
21
Current Equations for Velocity-Saturated Devices
W
IDS = Ey
· Cox (VGS − VT − V (y)) µe Ey
1+ Ec
Linear Region:
W µe Cox VDS
IDS = VGS − VT − · VDS (2.25)
L (1 + EVDS ) 2
c ·L
Ec µe
IDS = W Cox (VGS − VT − VDS )vsat mit vsat = (2.27)
2
Durch Gleichsetzen von (2.25) und (2.27) erhält man:
(VGS − VT ) · Ec L
VDSat = (2.28)
(VGS − VT ) + Ec L
• VDSat is always lower than the first-order model (which was VDsat = VGS − VT )
• devices saturate faster and deliver less current
(VGS − VT )2
IDS = W vsat Cox Saturation region (2.29)
(VGS − VT ) + Ec L
long-channel short-channel
Ec L VGS − VT Ec L VGS − VT
W
IDS = 2L µe Cox (VGS − VT )2 IDS = W vsat Cox (VGS − VT )
(short-channel ab 0.18/0.13)
Alpha-Power Law
W
IDS = KS (VGS − VT )α (2.30a)
L
W
IDS = KL (VGS − VT )VDS (2.30b)
L
KS
VDsat = (VGS − VT )α−1 (2.31)
KL
22
Summary for DSM Devices
(VGS − VT ) · Ec L
VDS ≥ ⇒ Saturation
(VGS − VT ) + Ec L
(VGS − VT ) · Ec L
VDS < ⇒ Linear
(VGS − VT ) + Ec L
(VGS − VT )2
IDS = W vsat Cox
(VGS − VT ) + Ec L
Cjc
Csb Cdb
L
23
C
Cg
Cgb
1
C
2 g
Cgs Cgd
Acc Sat Linear
VGS
VFB Depletion VT VDS + VT
Cutoff
εox
CG = W LCox = W L = W Cg (2.34)
tox
εox fF
Cg = L = 1.6
tox µm
Cg ist praktisch eine Konstante, da L und tox üblicherweise zusammen skaliert werden
pn Junction Cap
• Csb , Csd
• Cjc
Leakage Current:
Vj
ID = IS e Vth − 1 (2.35)
24
Breite der Raumladungszone (depletion zone)
s
2εSi 1 1
Xd = ΦB +
q NA ND
Gespeicherte Ladung
Y
L
n+ n+
STI STI
Ab = W Y
Asw = W xj
Cjb Ab Cjsw Asw
CJ = mj + mjsw (2.41)
1 − ΦVBb
J VJ
1 − ΦBsw
bei einem abrupten pn Übergang:
Cjb (Ab + Asw )
≈ mj (2.42)
1 − ΦVBJ
VJ ∈ [−VDD , 0], Nenner ∈ [1, 2]
25
In digital circuitry, the voltage switches from high-to-low and vice-versa.
⇒ Equivalent, voltage-independent capacitance Ceq :
Qj (V2 ) − Qj (V1 ) ∆Q
Ceq = =
V2 − V1 ∆V
ZV2 ZV2
V −m
∆Q = C(V )dV = Cjb 1 − dV
ΦB
V1 V1
" #
V2 1−m V1 1−m
Cjb ΦB
Ceq = − 1− − 1−
(V2 − V1 )(1 − m) ΦB ΦB
bei einem abrupten pn Übergang gilt m = 0.5 und damit:
1
Ceq −2ΦB2 1 1
Keq = = ((ΦB − V2 ) 2 − (ΦB − V1 ) 2 ) (2.43)
Cjb V2 − V1
CJ = Keq (Cjb W Y + Cjb xj W )
= Keq Cjb (Y + xj )W (2.44)
= Cj W (2.45)
Overlap Cap:
LD
26
– the value of one or more nodes in a dynamic circuitry is based on stored
charge on a capacitor
• periodic clock signal
– combinational logic
– sequential logic
– to load elements, so called transmission gates, or transfer gates
• static inverters
– VTC, voltage transfer characteristics
– noise margin
– inverter configurations
– delay models
0
VDD
Gain = ∞ 2
1
Gain = 0
Vin logic 0
VDD VDD Input Range Ideal VTC of an Inverter
2
high gain region is required by all useful logic gates to regenerate high and low logic
values if there is noise in the system.
low gain regions exhibits large noise immunity.
VDD VoH
1
1 VouH
ViH
X (unknown) X (unknown)
ViL
0 VouL
0
0 VoL
Practical VTC
27
• attenuating noise in low gain regions (damping noise out of a inverter chain)
ViL
VoH
valid logic 0 1
VouH
a) Noiseless System
• largest noise level vn in a single stage that allowes subsequent stages to recover
their proper value (the regenerative property)
3
Umgebung ?
28
VoL VoH + VoH − vn ? ?
-+
vn
b) System with single stage
noise of magnitude vn
Vo (even)
Vout Vin Vi (odd)
VoH even
odd
VS
VoL V
Vin Vout oL Vi (even)
Vo (odd)
VS VoH
VoH − vn
SSNMH = VoH − VS VS 6= VT
SSNML = VS − VoL
Vout
(ViL , VouH )
VoH
-1
(ViH , VouL )
VoL -1
Vin
NML NMH VoH
29
4.3 Resistive Load Inverter Design
Vin
VDD ID
VDD −1
VDD
RL
RL
VGS
= Vin (VS , VS )
D out
ID
−1
CT
G VoL
VDS Vout
VOL VDD = Vout VT ViL ViH VDD
= VOH
VDD − VoL
IDS = = IR (4.4)
RL
VoH = VDD (4.5)
VoL : IR = IDS (Iin ) (4.6a)
VDD − VoL WN µCox 2 1
= V
2(VoH − VT )VoL − VoL (4.6b)
RL LN (1 + E L )
oL 2
c
2 1 2VDD
VoL −2 + VDD − VT VoL + =0 (4.6c)
kRL kRL
Lösen per Mitternachtsformel : ax2 + bc + c = 0
√
−b (±) b2 − 4ac
x1,2 =
2a
√ x
1 − x = 1 − für |x| ≤ 1
2
−b (±)b − 4ac
2b ac c
x1,2 = =− =−
2a ba b
2VDD 1
x1,2 = −
kRL −2 1 + V − V
kRL DD T
VDD
= = VoL (4.6d)
1 + kRL (VDD − VT )
30
Discussion of 4.6d
VoL ↓ by W
L ↑ or RL ↑ (at the expense of area)
RL ↑⇒ tr ↑, k ↑⇒ tf ↓
VoL ↓ und k ↑⇒ Power ↑ Power = I · V = VRDD
L
VDD
RL ↑⇒ Power ↓ ultimately: Noise margin traded off by te choise of K an RL
Bestimmung von ViL , ViH mit δV
δVin = −1
out
Vin = ViL
Vout ≤ VDD
IR = IDS (sat) (4.8a)
VDD − Vout W vsat Cox (Vin − VT )2
= (4.8)
−
RL (V
in VT ) + Ec L
µEc
mit vsat =
2
W
und k = µn Cox
L
1 δVout 1
= k(Vin − VT ) - , Faktor 2 ?
RL δVin RL
1
ViL = VT + = NML (4.8b)
kRL
Bewertung
NML ↑⇐ (K ↓ and RL ↓) ⇒ VoL ↑
| {z }
not desirable
ViL and VoL shift in the same direction
RL and WL will most likely be controlled by VoL (VoL specified by spec)
31
KRL
VouH = VDD = (ViL − VT )2
2
Vin = ViH ⇒ linear region
IDS (lin) = IR
2
W µn Cox Vout VDD − Vout
(Vin − V )V
T out − = (4.8c)
L 1 + VEout
L
2 R
c
1
VIH = VT + 2Vout − (Vin = VIH ) (4.8d)
KRL
(4.8d) in (4.8c) ⇒ (4.8e) ∧ (4.8d)
KRL 2 2Vout
(3Vout − = VDD −
Vout
2 kRL
2 2VDD
Vout = (in 4.8d)
2KRL
r
8VDD 1
VIH = VT + − (4.8f)
3kRL kRL
32
VDD
ID Vout
(W/L)I V VGS = VoH
oH VoH (ViL , VouH )
ID
Vout
load line
Vin (W/L)L VoH = VDD −VT (VoH ) (ViH , VouL )
VDS VoL
= Vout Vin
VoL VDD
Kinvert k0 ( W
L )I (W
L )I
KR = = 0 W = W (4.10)
Kload k ( L )L ( L )L
• VoL ≈ 5% · VDD
• Threshold Voltage
p p
VTL = VT0 + γ VSB + 2|ΦF | − 2|ΦF | (4.11)
33
4.4.2 Linear Enhancement Load
• Connect the Gate of the load transistor to a dc voltage VGG > VDD to increase VoH
VGG VDD
ID Vout
VGS = VoH
VoH VoH
= VDD
ID
Vout
Vin
VDS VoL
= Vout Vin
VoL VDD VTViL ViH VDD
=VoH
a) enhancement- b) drain characteristic c) VTC
loaded inverter, line- and load line
ar load
34
IDS
S
WP III
G B LP
D IDP =IDN IV II
Vin Vout
D
WN
G B LN
S Vout
VDD
2
• subthreshold currents
• leakage currents
⇒ quiescent power dissipation [nW]
35
Bereich III
√
χ(VS − VTN ) =( ±) (VDD − VS − |VTP |)
v
u WN
uE L
χ = t CN N WP
ECP LP
s
W N µn
= (4.15)
W P µp
2Vsat
EC =
µe
χ ↑ ⇒ VS ↓
VDD
VoL =0 Vin
VTN ViL VDD −|VTP |
0 (V ) = −1.
An der Stelle (ViL , VouH ) ist die Steigung Vout in
KN
(ViL − VTN ) = − (ViL − VTN ) − 2 (VDD − Vout )
| {z } KP | {z }
x |{z} y
α
+ (VDD − (VTN + |VTP |)) (4.17’)
| {z }
z
2
χ =α (4.15)
aus (4.16) :
1 KN 1
(ViL − VTN )2 = (VDD − Vout ) [−(VDD − Vout )
2 KP 2
+ 2(VDD − (VTN + |VTP |)) − 2(ViL − VTN )] (4.16’)
36
damit :
2y = z − x(1 + α) (4.17”)
αx2 = y(2z − 2x − y) (4.16”)
(4.17”) in (4.16”) :
1 1 x
αx2 = (z − x(1 + α))(2z − 2x − z + (1 + α))
2 2 2
1
= (z − x(1 + α))(3z − x(3 − a))
4
4αx2 = 3z 2 + x2 (3 + 2α − α2 ) − xz(3 + 3α + 3 − α)
3z 2 = x2 (α2 + 2α − 3) + 2xz(3 + α) (1)
2 2
3z = x (α − 1)(α + 3) + 2xz(α + 3) (1’)
KN = KP ⇒ α = 1
|VTP | = VTN ⇒ z = VDD − 2VTN
⇒ symmetrische VTC (ViL + ViH = VDD )
aus (1’) folgt :
3
x= z
8
2 3 1 3
ViL = VDD + VTN − VTN · 2 = VDD + VTN
8 8 4 2
1 5
ViH = VDD − ViL = VDD − VTN
4 2
37
KN = KP ⇒ α = 1
z = VDD − (VTN + |VTP |)
ViL + ViH = 2VS
VDD − (VTN + |VTP |)
VS = VTN +
2
ViL + ViH = 2VTN + VDD − VTN − |VTP | = VDD + VTN − |VTP |
⇒ skewed VTC (ViL + ViH = VDD + VTN − |VTP |)
aus (1’) folgt :
3
x= z
8
3 3 3 1
ViL = VTN + VDD − VTN − |VTP | = (3VDD + 5VTN − 3|VTP |)
8 8 8 8
1
ViH = VDD + VTN + |VTP | − ViL = (5VDD + 3VTN − 5|VTP |)
8
aus (1’) :
x2 (a + 3)(a − 1) + 2x(3 + a)z = 3z 2
2 2
3z 2
2 z z z
x + 2x + = +
(a − 1) a−1 (a − 1)(a + 3) a−1
2 2 2
z 3z (a − 1) + z (a + 3)
x+ =
a−1 (a − 1)2 (a + 3)
2
2z a
=
a−1 a+3
r
z 2z a
x=− (±)
a−1 a−1 a+3
r
z a
= 2 −1
a−1 a+3
Bereich IV
Vout KN 1
Vout (ViH − VTN ) − = (VDD − |VTP | − ViH )2 (4.18)
2 KP 2
KP KP
ViH 1 + = 2Vout + VTN + (VDD − |VTP |) (4.19)
KN KN
38
Vout
VDD
VoL =0 Vin
ViH VDD −|VTP |
KP
−(VDD − |VTP | − ViH ) = 2 Vout − (VDD − (VTN + |VTP |)) + (VDD − |VTP | − ViH )
|{z} | {z } KN | {z }
y z |{z} x
α
z = 2y + x(1 + α) (4.19”)
2
ax = y(−2x + 2z − y) (4.18”)
q
a
2 a+3 −1 0
l’Hopital =
a−1 0
a=11 1 a+3−a
⇒ 2 q
2 a (a + 3)2
a+3
3 1 3
= 3(a + 3)− 2 a− 2 =
8
39
4.5 Layout design of CMOS Inverter
• Layers: Metal1, Poly, Contact, n+, p+, n-well, p-well
Gnd D VDD
p-well n-well
Metal1
Poly
G
• well plugs
• metal-to-poly (not shown here)
• 4 contacts
• L, W, Area Source, Area Drain, Perimeter Source, Perimeter Drain
A Vout Vout
40
⇒ ratioed inverter (VoL )
power dissipation during output low
smaller area for multi-img gates
• timing
• power
• area
• noise margins
• is ratioless
• and standby power is small
41
V
Vin Vout
tPLH
tPHL
t
Reff =RP
Vout
Reff =RN CL Vout
CL
Falling Case:
Reff = RN
t
−R
Vout (t) = VDD e N CL (4.22a)
Rising Case:
Reff = RP
− t
Vout (t) = VDD 1 − e RP CL (4.22b)
τ = 0.69Reff CL (4.22c)
42
W
Valid for 0.35µm, 0.18µm and 0.13µm: = L
LN
RN = Reqn · (4.24a)
WN
LP
RP = Reqp · (4.24b)
WP
W
= 1 ⇔ (RN = 12.5kΩ) ∧ (RP = 30kΩ)
L
W 1
= ⇔ (RN = 6.25kΩ) ∧ (RP = 15kΩ)
L 2
In En Out
0 0 Z
En 0 1 1
1 0 Z
out
1 1 0
En out out out
En En
* ∧ ∪ ∧ * = 0 1
In In
In
43
∗ VTC
∗ transistor sizing
∗ 1st order timing characteristics
∗ power
– Construction of sequential logic gates: FlipFlop, Latch We will examine their:
∗ constuction using conventional CMOS or Pseudo-NMOS
∗ operation, limits and timing constraints
∗ set-up (vorher) and hold (nachher) times for D-type FlipFlops / Latches
• Power & Timing are the main design specifications for digital integrated circuits
– Usually, one is traded off against the other
∗ Reducing circuit activity
∗ Reducing switching power (operation voltage)
∗ Minimizing glitches4
∗ Equalizing rise & fall times
∗ Reducing Subthreshold leakage
• Design metrics (to compare reduction of power and delay)
– power-delay-product
⇒ Comparison of single gate configurations that use the same supply voltage
and drive the same load (Vergleich von Gattern)
– energy-delay-product
⇒ comparison of two different designs that perform the same function (Ver-
gleich von Designs)
B
F =A∨B F = AB
A
44
• Multiple fanin NOR (stack of series-connected PMOS devices)
• Multiple fanin NAND (stack of series-connected NMOS devices)
2W A 4W
VDD 2W
F
B 4W
CL F
2W A 2W
CL
A F W
W CL B 2W W
• The ratio of Wn /Wp depends primarily on the desired VoL (for pseudo NMOS
inverter)
VDD
VDD Wp VDD
F
Wp A 3Wn CL Wp
F F
Wn 3Wn Wn Wn Wn
A CL B CL
A B C
C 3Wn
45
3Wn
F 3Wn
A Ln F
B Ln 3Ln
C Ln Poly
Diffusion
Poly
W
Pseudo-NMOS are sized for both VoL and timing when we choose the L ratios for the
individual devices
1 1 1 1
= + + (5.1, series stack, all devices on)
Weq W1 W2 W3
W1 W2 W3
⇒ Weq =
W1 W2 + W2 W3 + W1 W3
Weq = W1 + W2 + W3 (5.2, parallel stack, all devices on)
Of course, we design the NOR gate assuming that only one pull-down device is on at
a time, since this is the worst case.
For example:
46
1
2
& 1
2
≥1
5 5
2W
8W Wn
8W
≥1
&
& 1 & 1
&
& 1
≥1
& 1 &
47
CG
CG
CG
CG
Typically, gates are designed to drive a specific number of fanouts to deliver a speci-
fic target delay. Infact, different cells are designed to drive a different number of loads.
Ultimately, the choice of the width W is based on this fanout ratio (Chapter 6).
Vout
VDD VDD
Inv
2W
one
2W both inputs tied
F input
only
CL
A 2W
B 2W Vin
VTC for NAND2 gate
48
Vout
VDD
Inv
A 4W
both
one input only
B 4W inputs
F tied
CL
W
W Vin
VTC for NOR2 gate
In any case, the single-input switching case is usually taken as the VTC of the multiple
input gate.
The NOR has a higher switching threshold than the NAND, and therefore SSNML is
higher and SSNMH is lower for the NOR gate as compared to the NAND gate.
F =(A ∨ B) ∧ C = A ∨ B ∨ C =
A ∧ B ∨ C =: P U (A, B, C) (OAI)
F =(A ∨ B) ∧ C =: P D(A, B, C) (AOI)
F =P D ∨ P U
VDD
F = PD ∨ PU
C
B A
49
Complex CMOS gate implementation of (A ∨ B) ∧ C
PU vs P D:
• dual in operation“
”
• component-wise complement“
”
f = a ∧ b:
a b f f PD PD PU
0 0 0 1 1 0 *
0 1 0 1 1 0 *
1 0 0 1 1 0 *
1 1 1 0 * * 1
Damit:
a a
0 0 * *
f = PD ∨ PU = ∨
b 0 * b * 1
1. Möglichkeit: 2. Möglichkeit:
PD → f =a∨b PU → f =a∨b
P D = a ∨ b P D(a, b) PU = a ∧ b P U (a, b)
PU = a ∧ b P U (a, b) P D = a ∨ b P D(a, b)
f = a ∨ b ∨ (a ∧ b) = a ∧ b
VDD
A
B PU p-complex = PU
C
F = PD ∨ PU dual + complement
A
B PD n-complex = PD
C
50
5.4 XOR and XNOR Gates
fXOR = ab ∨ ab = P U fXNOR = ab ∨ ab = P U
ab ∨ ab = P D ab ∨ ab = P D
VDD VDD
a b a b
b a b a
f f
a a a a
b b b b
a ∼ a ∼
b f b f
f = as ∧ bs = P D
VDD
a b
a 1 s s
f = PD ∨ PU
f
b 0 a s
s
b s
51
5.6 FlipFlops and Latches (sequential circuits)
• feedback loops (history) (positive, regeneration)
Outputs are dependent on the inputs and the preceding values of outputs (e.g.
counters, registers)
• bistable circuit (two stable operating points)
– Latches, level-sensitive or transparent (e.g. enabled by clock)
– FF, edge-triggered (discrete point of time)
Vout
Vout A
2 Inv1
C Inv2
1 B
Vin
Vin
VoL VoH
52
5.6.2 SR Latch (set-reset latch, active high, 2 NORs)
≥1
S R Q1 Q2 S Q2
hold 0 0 Q2 Q1
reset 0 1 0 1
set 1 0 1 0
≥1
1 1 0 0 Q1
R
S Q1 S Q
• S → Q: 2 NOR delays
• S → Q: 1 NOR delay
• R → Q: 1 NOR delay
R Q2 R Q
• R → Q: 2 NOR delays
∗ = [II]
c) symbol (active high inputs) d) Transitions
KBS: Pin-oriented description
S Q1 S Q
• S → Q: 1 NAND delay
• S → Q: 2 NAND delays
• R → Q: 2 NAND delays
R Q2 R Q
• R → Q: 1 NAND delay
∗ = [II]
c) symbol (active low inputs) d) Transitions
53
5.6.4 JK-FlipFlop
• active high voltage levels
• clocked (not edge-triggered)
Jn Kn Qn+1
J Q
0 0 Qn &
0 1 0 J S Q Q CK
1 0 1 CK &
1 1 Qn K R Q Q K Q
& &
J S Q S Q Q
CK & &
K R Q R Q Q
CK
Master Slave
tp > tp (Master)
• Slave is a level-sensitive circuit (ones catching)
• any spike or glitch on the J input line will cause the master latch to be set
→ Keep 1 state of clock short
→ edge-triggered flip-flop
Jn Kn Qn+1
J Q
0 0 Qn & &
0 1 0 J S Q Q CK
1 0 1 CK & &
1 1 Qn K R Q Q K Q
54
For proper functionality, the values on the J and K inputs must be held stable for a
finite length of time before the clock edge (set-up time).
... after the clock edge arrives (hold time, > 0ns).
The device usually has additional inputs for direct set and reset so that the outputs can
be initialized to the desired settings.
D-Latch operation
IN
IN DQ OUT
CLK
CK
OUT
CLK t
D-Flop operation
IN DQ OUT IN DQ DQ OUT IN
CK CK CK CLK
CLK
OUT
t
CLK
Sample-and-hold circuit: The sampling is done when the clock edge hits the flop.
There are are number of important timing characteristics associated with flops and lat-
ches.
55
D
will work won’t work
May work
CLK
Thold
Tsetup Tclk-q
Tsetup : The time that the incoming data must be stable before the clock arrives
Thold : The length of time that the data remains stable after the clock arrives for proper
operation
uncertainty interval: may work
clock-to-Q delay: Tclk-q
overhead: Tsetup + Tclk-q
When designing flop, we should try to minimize Tsetup , Thold and Tclk-q
CLK
Thold
Tsetup
Q
Td-q
56
& ≥1
D Q = Q2
CK
& ≥1
Q = Q1
Q2 := D ∧ CK ∨ Q1 P D2 (D, CK, Q1 ) = D ∧ CK ∨ Q1
P U2 (D, CK, Q1 ) = (D ∨ CK) ∧ Q1
Q1 := D ∧ CK ∨ Q2 P D1 (D, CK, Q2 ) = D ∧ CK ∨ Q2
P U1 (D, CK, Q2 ) = (D ∨ CK) ∧ Q2
VDD VDD
Q1 Q2
Q2 Q1
D CLK
CLK Q1 Q2 D
57
• Power is due to current flowing from the supply to ground
→ add up all sources of current flow from VDD to Gnd and multiply it by the
potential difference between the two supply rails
P = Id · VDD
• Dynamic Power due to
– capacitance switching
– short-circuit power due to crowbar“ current during switching (VDD - Gnd)
”
– glitches in the output waveforms
• Static Power due to
– leakage currents (subthreshold, junction reverse-bias current)
– dc standby current (e.g. Pseudo-NMOS)
dV CL ∆Vswing
ID, avg = C = = CL VDD favg (5.8)
dt ∆t
2
Pswitching = ID,avg · VDD = CL ∆Vswing favg VDD = CL VDD favg (5.9)
⇒ Keep CL small
⇒ Reduce the swing (VDD )
⇒ Reduce the switching frequency favg
The approach used to reduce power will be dependent on the design style used:
• timing constrains
• area constraints
• and other tradeoffs
favg :
CLK (40%)
∆t
2 #toggles/2
Pswitching =α07→1 CL VDD fCLK α07→1 = (5.10)
#clock cycles
58
crowbar“current: |VGS | > |VT |, VTN < Vin < VDD − |VTP |
”
∆tsc = ∆tscr + ∆tscf
VDD
VDD −|VTP |
ISC
Vin Vout
VTN
CL
∆tscr
VDD
VDD −|VTP |
Vin Vout
VTN ISC
CL
∆tscf
In effect, there is a tradeoff between dynamic power of the previous gate and short-circuit
power on the next gate.
⇒ input and output edge rates sharp and about equal
Glitches tend to propagate through the fanout gates and cause unintended transisi-
tons in subsequent stages, increasing the power dissipation even further.
59
⇒ Signals should be made to arrive at roughly the same time at all gate inputs
Keep things balances (glitch free) (path vs gate delay)
A: Area of Junctions
Js : current density
(bottom of the junction, channel-facing sidewall, pA-nA range)
2 IDC VDD
P = αCVDD fCLK + Ileak VDD + (5.18, Pseudo-NMOS)
2
Geteilt durch 2: dc current flowing half the time
60
5.9 Power and Delay Tradeoffs
• equalizing the input arrival times will reduce glitches
• equalizing the edge rates will minimize short-circuit currents
Power-delay-product (PDP)
1
PDP = Pavg · tp 2tp = (5.19)
f
2 1
= CVDD f·
2f
2
CVDD
= (5.20)
2
(energy to perform a specific operation - per toggle -)
Z ∞
EC = ic (t)vout (t)dt
Z0 ∞
dvout
= C vout (t)dt
dt
Z0 ∞
= Cvout (t)dvout
0
1 2
= CVDD
2
Limitations:
VDD ↓ ⇒ tp ↓
C ↓ ⇒ tp ↓
Energy-Delay-Product
EDP = P DP · tp (5.21)
New formulation for tp:
dV
I=C
dt
∆V
∆t = C (5.22)
I
C∆V CVDD
tp = ≈
Isat K2 (VDD − VT )
61
K2 : constant that depends on device sizes
C 2 VDD
3
EDP = (5.23)
2K2 (VDD − VT )
Energy
(PDP)
Lower EDP
c
a
d
b
1
delay
Energy
Energy + Delay
Normalized
values
Delay
VDD
optimum EDP:
∆EDP
=0
∆VDD
∗ 3
VDD = VT
2
Used to set the supply voltage for a given technology, at least!
5.10 Summary
For a series stack with three devices, the width of a single equivalent device is
W1 W2 W3
Weq = (series stack)
W1 W2 + W2 W3 + W1 W3
For a parallel set of devices, the equiavalent width with all devices turned on is
62
Dynamic and static power
2
Pdynamic = αCL VDD fCLK
Pstatic = (Isub + Ipn )VDD
2 IDC VDD
P = αCVDD fCLK + Ileak VDD +
2
Energy-Delay-Product:
C 2 VDD
3
EDP = PDP · tp = P · tp · tp =
2K2 (VDD − VT )
which means:
critical path:
63
50%
a & b ≥1
a b
tp
VS 50%
VS
a b
tp tp
Negative delay indicates a slow gate in the path
50% definition is the most practical and intuitive reference point for propagation delay
90% 90%
10% 10%
Rise time Fall time
VoH
Iout VoL ILH IHL
Vin Vout VoL → VDD
2
VoH → VDD
2
VoH
CL CL CL
VoL
a) b) tPHL c) tPLH
64
dV
I = CL
dt
∆v
∆t ≈ CL
IDS
CL VDD /2
tPLH =
ILH
CL VDD /2
tPHL =
IHL
Average propagation delay:
tPLH + tPHL
tp =
2
0.13µm
CL VDD /2
tPHL = = 0.7RN CL
IDsat,n
VDD /2
RN = (6.4b)
0.7IDsat,n
(1.2 − 0.4)2.4
VDsat = = 0.6V (PMOS)
(1.2 − 0.4) + 2.4
VSD ∈ [1.2; 0.6]V ⇒ VSD > VDsat ⇒ saturation of PMOS
CL VDD /2
tPHL = = 0.7RP CL
IDsat,p
VDD /2
RP = (6.5b)
0.7IDsat,p
Reqn = 12.5kΩ/
L
RN = Reqn ( )n
W
Reqp = 30kΩ/
L
RP = Reqp ( )p
W
works well for timing calculation in digital, should not be used for any other purpose!
⇒ In reality, the on-resistance is nonlinear and its value depends on the applied voltages.
65
6.1.1 Gate Sizing Revisited - Velocity Saturation Effects
I12 >I0
I0
2W CL
W CL M1
M0 2W
M2
IDS IDS
VGS1 =VDD
M1
VGS =VDD I12 M2
I0 M0 VGS2 =VDD − VDS1
VDS VDS
VDD VDD VDS1 VDD VDD
2 2
Stacked devices with velocity saturation (for long channel devices quadratic depen-
dence)
To equalize the delays the stacked devices can be reduced by roughly 20-25%
For a pair of long channel devices in the series stack, the discharge current would actually
be lower than the single transistor case (current drops off quadratically)
2W 3.2W
VDD 2W
3.2W
2W 1.6W
W
W 1.6W W
66
6.2 Detailed Load Capacitance Calculation
• self-loading capacitance
• interconnect (or wire) capacitance
• fanout capacitance
X
Cfanout = CG
67
VDD
COL
S
B
CGp D
Vin Vout
D
B
CGn S
VSS
CGSp CSBp
CGSn CSBn
COL Vout D
2COL
Vin
2COL
68
Cself = CDBn + CDBp + 2COL + 2COL
= Cjn Wn + Cjp Wp + 2Cox (Wn + Wp )
= Ceff (Wn + Wp )
fF fF fF
Ceff = Cj + 2Col ≈ 0.5 + 2(0.25) =1
µm µm µm
VDD
A M4
A=0→1→0
x Shared S/D CDB4
x
B M3 CSB3
B=0
CDB3
F
F
CDB1
CDB2
M1 M2
Shared S/D
A B
a) b)
Self-capacitances for a NOR gate
2
tPHL : Cself = CDB1 + CDB2 + CDB3 + (CSB3 + CDB4 ) (6.11)
3
2
= CDB12 + CDB3 + CSDB34
3
Do not double count shared regions!“
”
The actual delay depends on the order in which inputs switch.
In a series stack, the delay increases as the late arriving input is further from the
output.
Late arriving signal is C: CL + Cx + Cy
B: CL + Cx
A: CL
There are a number of ways to design around this delay difference.
For example:
MC > MB > MA
The correspondingly larger caps act to offset the advantages of progressive sizing!
fF
Cwire = Cint Lw = 0.2 · wire length
µm
VDD
iPMOS
iout
Vin Vout
iNMOS CL
KCL at output
dVout
Iout = −CL = INMOS − IPMOS
dt
All three currents are a function of Vin and Vout (Großsignalverhalten vs Kleinsignalver-
halten, GBS vs KBS)
⇒ contour map
70
Vout
A B
VDD
discharging current
iout = Imax
charging current
D C
Vin
iout = Imax iout = 0 VDD
Simplified inverter output current as a function of Vout and Vin
VDD VDD
A B A B
VDD VDD
Decreasing
Input Slop
D C D C
VDD VDD
a) Step input trajectory b) Ramp input trajectory
71
Example: Delay Calculation with Step Input
tPHL, step
dVout VDD /2
Iout = Imax = CL = CL ⇔
dt tPHL, step
VDD /2
tPHL, step = CL
Imax
72
Example: Delay Calculation with Ramp Input
dVout
Iout = −CL ⇔
Z Z dt
Iout dt = − CL dVout
Z tr /2 Z tPHL, ramp Z VDD /2
t
Imax dt + Imax dt = CL dVout
0 tr /2 tr /2 0
Imax t2 tr /2
tPHL, ramp
VDD
+ I t
max
= CL
tr /2 2 0
tr /2 2
tr VDD
Imax + Imax (tPHL, ramp − tr /2) = CL f
4 2
tr CL VDD /2
tPHL, ramp = +
4 Imax
tr
tPHL, ramp = + tPHL, step with tr ∼ = 2tPLH, step
4
2tPLH, step ! 2tPHL, step
tPHL, ramp = + tPHL, step = + tPHL, step
4 4
tin
tramp = ∆tramp + tstep = + tstep
2
1
mit tin = 0.7RC ≈ 0.3RC
2
tstep = 0.7RC
⇒ tramp ≈ RC
X
total delay ≈ Ri Ci
i
P
Product of on-resistance and load capacitance
73
Example 6.7 Optimal PMOS Device Size For Inverter Chain
Inverter sizing can be performed
Wp Wp Wp Wp (0.18µm)
4λ 4λ 4λ 4λ
a)
Reffp 30kΩ WP
= = 2.4 ⇔ = 2.4
Reffn 12.5kΩ WN
WP = 2.4 · 4λ ≈ 10λ (0.18µm, λ = 100nm)
fF
Ceff = 1 Cself = Ceff (WN + WP ) = 1.4f F
µm
fF
Cg = 2 Cfanout = Cg (WN + WP ) = 2.8f F
µm
delay of the chain
2λ
τtotal = 4tPHL = 4Reff Cload = 4(12.5kΩ) 4.2f F = 105ps
4λ
b) page 275: τtotal = 97.6ps
Designers typically use a 2:1 ratio since it is a reasonable compromise between minimum
delay and equal rise/fall times.
Cload Cload
a) b)
Sizing inverters to drive a large capacitive load
74
Cload
Cin Cload
VDD
Vin Vout
Cin Cself Cout
75
Delay for inverter driving a load
tdelay = Reff (Cout + Cself )
Cout Cself
= Reff Cin +
Cin Cin
Cout
= τinv + γinv
Cin
Cself
γinv =
Cin
Cout
fanout ratio (electrical effort): f =
Cin
0.13µm:
fF
τinv = 3Reqn Cg Ln = 3 · 12.5kΩ · · 0.1µm = 7.5ps
µm
Cself Ceff 3W 1f F/µm
γinv = = = = 0.5
Cin Cg 3W 2f F/µm
0.18µm:
τinv = 15ps
γinv = 0.5
N
X Cj+1
total delay = τinv + γinv
Cj
j=1
X Wj+1
= τinv + γinv
Wj
j
Wj Wj+1
Dj = τinv + γinv + τinv + γinv
Wj-1 Wj
δDj 1 Wj+1
= τinv − τinv =0
δWj Wj1 Wj2
p
Wj = Wj+1 · Wj-1
1 f f2 f N −2 f N −2
76
Series chain of inverters
ln(Cload /Cin )
f N Cin = Cload ⇒ N = (6.21)
ln(f )
Cj
gate delay = τinv + γinv
Cj1
Cj
total delay = N · τinv + γinv (6.22)
Cj-1
ln(Cload /Cin )
= · τinv (f + γinv ) (6.23)
ln(f )
Optimal value of f lies in the range of 2.5 to 4 (e). Typically, we use f = 4, the fanout
of 4 delay, or FO4 delay.
&
&
&
j−1 j j+1
C
X j+1
total delay = τnand +γnand
Cj
j | {z }
f ≈4
C
X j+1
total delay = τnor +γnor
Cj
j | {z }
f ≈4
Ln
τnand = Reff Cin = Reqn 4Wn Cg = 4Reqn Cg Ln
Wn
Ln
τnor = Reff Cin = Reqn 5Wn Cg = 5Reqn Cg Ln
Wn
&
≥1
j j+1 j+2
Cj+1 Cj+2 Cj+3
total delay = τnand + γnand + τinv + γinv + τnor + γnor
Cj Cj+1 Cj+2
77
Cj+1 Cj+2
Dj+1 = τnand + γnand + τinv + γinv
Cj Cj+1
δDj+1 1 Cj+2
= τnand − τinv 2 = 0
δCj+1 Cj Cj+1
τnand FOj = τinv FOj+1
j j+1 j+2
&
≥1
2f F
200f F
Cload 200f F
τnor = 5Reqn Cg Ln = 18.2Reqn Cg Ln
Cj+2 Cj+2
Cj+2 = 55f F
Cj+2 55f F
τinv = 3Reqn Cg Ln = 18.2Reqn Cg Ln
Cj+1 Cj+1
Cj+2 = 9.1f F
Cj+1 9.1f F
τnor = 4Reqn Cg Ln = 18.2Reqn Cg Ln
Cin Cin
Cj+2 = 2f F
78
Optimizing Paths with Logical Effort
intrinsic time constant of a gate
logical effort (LE) =
τinv
LEinv = 1
τnand
LEnand =
τinv
τnor
LEnor =
τinv
total delay τnand Cj+1 τinv Cj+2 τnor Cj+3
= + γnand + + γinv + + γnor
τinv τinv Cj τinv Cj+1 τinv Cj+2
⇔
Wobei:
τgate
LE = (logical effort)
τinv
Cj+1
FO = (fanout, electrical effort)
Cj
P = LE · γgate (parasitic)
τinv = 3Reqn Cg Ln
τnand = 4Reqn Cg Ln
τnor = 5Reqn Cg Ln
LEinv = 1
4
LEnand =
3
5
LEnor =
3
We can also use another approach to compute LE:
79
a) Set the delays of the inverter and the gate to be the same; then take the ratio
of the input capacitances, or
b) Set the input capacitances to be same; then take the delay ratio.
VDD VDD VDD
2 A 4
VDD 2
B 4
2 A 2
A 1
1 B 2 1
4 5
LE = 1 LE = 3 LE = 3
3 12
2 A 5
3
VDD 2
12
B 5
2 3
A 2
3
A 5
3 3
1 B 2 5
4 5
LE = 1 LE = 3 LE = 3
1
RN ∼
( 43 Reff )Cout
4 WN
LEnand = =
Reff Cout 3
5
( Reff )Cout 5
LEnor = 3 =
Reff Cout 3
80
In effect, we are taking the ratio of the driving resistance of gates that have equal
input capaciatances.
A lower LE is better than a higher LE. So when we have a choice, we should use
NAND gates over NOR gates.
Type 1 input 2 3 4
INV 1 - - -
4 5 6
NAND - 3 3 3
5 7 9
NOR - 3 3 3
2. How do we compute the parasitic term (P) for any type of logic gate?
P is technology- and gate-dependent
0
Cself
P = LE · γ = LE · 0
Cin
fF
Ceff · 3W Ceff 1 µm 1
Pinv = LE = LE = 1 · fF =
Cg · 3W Cg 2 µm 2
4 Ceff 2W + 2W + 2W 4 1 6
Pnand = · = · · =1
3 Cg 2W + 2W 3 2 4
5 Ceff W + 4W + 4W 5 1 9
Pnor = · = · · = 1.5
3 Cg W + 4W 3 2 5
NAND3
5 1 3+3+3+2+2 13
Pnand = · · =
3 2 3+2 6
NAND4
6 1 4+4+4+4+2+2 20
Pnand = · · =
3 2 4+2 6
NOR3
7 1 6+6+6+1+1 20
Pnor = · · =
3 2 6+1 6
NOR4
9 1 8+8+8+8+1+1 34
Pnor = · · =
3 2 8+1 6
81
Table of parasitic terms for simple gates
Type 1 input 2 3 4
INV 0.5 - - -
13 20
NAND - 1 6 6
NOR - 1.5 206
34
6
(accounting for shared S/D regions in the layout), bzw genauer:
VDD
0→1
1 2W
1→0
2W
< 0.8
2W
GND
4 1 1
Pnand = · · · (0.5 + 0.5 + 0.25)2+
3 2 2+2
2 5
+ (0.5 + 0.25)2 + (0.5 + 0.25) 2 + 0.5 · · 2
3 6
1 5
= 2.5 + 1.5 + 1 +
6 6
5.8
= ≈1
6
82
VDD
4W
> 0.4
4W
0→1
0 1W
1→0
GND
5 1 1
Pnor = · · · (0.5 + 0.5 + 0.25)1+
3 2 4+1
2 5
+ (0.5 + 0.25)4 + (0.5 + 0.25) 4 + 0.5 4
3 6
1 2
= 1.25 + 3 + 2 + 1 +
6 3
7.9
= ≈ 1.5
6
NAND3
5 1 1
Pnand = · (0.5 · 3 + 0.25 · 2)2 + (0.5 + 0.25)3+
3 22+3
2 2 5
+ (0.5 + 0.25 · 2) · · 6 + (0.5 + 0.25) · · 3 + 0.5 · · 3
3 3 6
11
= ≈2
6
83
NAND4
6 1 1
Pnand = · (0.5 · 3 + 0.25 · 3)2 + (0.5 + 0.25)4+
3 22+4
2 2 5
+ (0.5 + 0.25 · 2) · · 4 · 2 + (0.5 + 0.25) · · 4 + 0.5 · · 4
3 3 6
17.5
= ≈3
6
NOR3
7 1 1
Pnor = · (0.5 · 3 + 0.25 · 2)1 + (0.5 + 0.25)6+
3 26+1
2 2 5
+ (0.5 + 0.25 · 2) · · 6 + (0.5 + 0.25) · · 6 + 0.5 · · 6
3 3 6
16
= ≈3
6
NOR4
9 1 1
Pnor = · (0.5 · 3 + 0.25 · 3)1 + (0.5 + 0.25)8+
3 28+1
2 2 5
+ (0.5 + 0.25 · 2) · · 8 · 2 + (0.5 + 0.25) · · 8 + 0.5 · · 8
3 3 6
26.25
= ≈ 4.5
6
We need to equalize the LE·FO compontents of the delay for all gates:
84
Cload
total path effort = LEnand · LEinv · LEnor
Cin
4 5 200
= ·1· · = 222.2
3
√ 3 2
3
stage effort = 222.2 = 6
85
6.6.1 Understanding Logical Effort
5.5
Normalized Delay: D
4.5
NA 2
2
R
ND
NO
3.5
V
IN
2.5
Effort D = LE· FO +P
delay
1.5
0.5
Parasitic delay
0 1 2 3 4 5 6
Electrical Effort: FO = CCout
in
VDD
The rising and falling cases must be handled separately for this gate.
Method 1: Set the delay equal to that of the reference inverter and take
1 the ratio of input capacitances
A Falling case Rising case 2
1 Cin |gate 1+1
LEF = Cin |inv = 1+2 = 3 2
LER = 2+2 = 4
2
1+2 3
Method 2: Use the definition of logical effort (Set input capacitances equal, then
take LE)
(Reff Cin )|gate
LE =
(Reff Cin )|inv
Falling case Rising case
(2Reqn )(2Cg W ) 2 (2Reqn )(2Cg W ) 4
LEF = Reqn (3Cg W ) = 3 LER = Reqn (3Cg W ) = 3
average logical effort:
2
+ 43 3
=1 LE =
2
if required, otherwise falling and rising case separately.
86
Example 6.13: Path Optimization Using Logical Effort
X Y
≥1 Z
&
10
20
Y
total path effort = (LE · FO)
X 5 Y 4 Z 20
=1 · · ·1
10 3 X 3 Y Z
20 20 400
= · =
9 10 90
1
400 4
optimal stage effort: SE∗ = = 1.45
90
total path delay: D = 4SE∗ + 2Pinv + Pnor + Pnand
= 4 · 1.45 + 1.0 + 1.5 + 1 = 9.3
The optimal value of the total path delay is known before the gate sizes have even been
determined!
Cout
SE∗ = LE × FO = LE ·
Cin
Cout
Cin = LE working backwards
SE∗
20
Z =1· = 13.8
1.45
4 Z
Y = · = 12.7
3 1.45
5 Y
X= · = 14.5
3 1.45
14.5
Cin = 1 · = 10
1.45
The first inverter size can be verified by the last equation, which is consistent with the
specified input capacitance value.
87
90
15
90
15
Branching Factor:
Sideload
Y Z
&
&
4.5
X Y
1 & Z
&
&
4.5
P Z
&
4.5
88
3
4
logical effort: LEp =
3
Cout
electrical effort: FOp = = 4.5
Cin
branching effort: BEp = 2 · 3 = 6
path effort: PE = LEp · BEp · FOp = 64
optimal stage effort: SE∗ = (PE)1/3 = 4
Delay (normalized): D = N · SE∗ + Parasitics = 3 · 4 + 3 · 1 = 15
LE
Cin = BE · Cout ∗
SE
4/3
z = 1 · 4.5 = 1.5
4
4/3
y = 3 · 1.5 = 1.5
4
4/3
x = 2 · 1.5 =1
4
W X Y
1 A B
fixed sideload
Without sideload and FO4 sizing rules:
w = 1, x = 4, y = 16 ⇒ Minimum delay
First two stages:
x
A+y
D1 = LEinv + γinv + LEinv + γinv
w x
δD2 1 B y B
= − 2 =0⇔ =
δy x y x y
89
We can use the two equations to iterate the solution:
√ √
y 2 = BX, y = BX = 64 · 4 = 16
p
x2 = (A + y)w, x = (A + y)w = 4.9
(y = 17.7) ⇒ (x = 5.1)
(y = 18) ⇒ (x = 5)
W=1 X=5
b) + c)
1 8 16
Slightly different but muh easier to carry out, hand calcuation using this method com-
pared to the iterative approach.
Logical effort is a useful tool for back of the envelope timing optimization:
• Allows quickly to determine the optimal delay for a given path, and
• the corresponding device sizes to achieve this delay
90
• it provides insight into the key factors in the delay of logic paths
There are many other aspects of logical effort that were not covered in this chapter
(See Sutherland, 1999).
• CMOS
– dissipates far less power
– more area due to complementary PMOS and their size requirements to achieve
equal rise / fall delays (2:1)
• Pseudo-NMOS
– require ratioed devices to set the desired value of VoL (1:4)
– asymmetric rise and fall times
Key paramters are area, timing and power. All nodes of a static gate have direct paths
to VDD or Gnd. So long as the power remains on, the value of the output node is held
indefinitely.
Dynamic Gates
Basic Concepts
• Pass transistor
• Charge sharing
• Feedthrough
• Charge leakage
Pass Transistor
91
• 3 terminals: input, output, control
• NMOS has trouble to pass VDD
• PMOS has trouble to pass 0V
VDD
S D VDD
VDD
S D 0V In Out
0V
In Out
a) entladen b) laden
VDD
D S |VTP |
0V
D S VDD −|VTN | In Out
VDD
In Out
c) laden d) entladen
0V
D S High-Z
0V
D S High-Z In Out
VDD
In Out
VDD
e) f)
VDD
VDD −VTN
VDD VDD
VDD VDD −2VTN
S D VDD VDD −3VTN
VDD −VTN
a) b) c) Kaskode
NMOS pass gate configurations (ignoring body effect)
92
3|VTP |
S D
|VTP | |VTP |
2|VTP |
|VTP |
a) b) c)
PMOS pass gate configurations (ignoring body effect)
The designers should avoid using the output of a pass transistor to drive the gate of
another pass transistor.
VDD
0V
Cf Vout
Vin D S
Cgnd
Clock feedtrough
The charges associated with the two capacitors Cf and Cgnd redistribute to maintain
equality (displacement current).
The same effect would be observed at the drain node if it were isolated from the rest of
the circuit.
V1 V1
Cf Cf
V2 V2
Cgnd Cgnd
93
which is a high-Z node.
Cf (V1 − V2 ) = Cgnd V2
Cf V1
V2 =
Cf + Cgnd
Cf
∆V2 = ∆V1 (7.3)
Cf + Cgnd
Bootstrapping is usually associtated with increasing a node voltage above VDD using
active devices. We use the term here more generally to refer to any increase in the
voltage due to series capacitor effects.
Cf Vout Cf Vout
1.2V D S 0V D S
Cgnd Cgnd
Initial:
p √
Vout = VDD − VTN (Vout ) = 1.2 − (0.4 + 0.2( 0.88 + Vout − 0.88) = 0.73V
94
7.2 Charge Sharing
Two isolated nodes with different voltage levels are suddenly connected together when
a pass transistor turns on.
0 0→VDD
V1 V2 V∗ V∗
C1 C2 C1 C2
a) Before b) After
Qtotal = C1 V1 + C2 V2 = (C1 + C2 V ∗ )
C1 V1 + C2 V2
V∗ = (7.6)
C1 + C2
To reduce charge sharing (V ∗ < V2 ), C2 should be made much larger than C1 .
V1 = VDD − VTN
∗
Achtung! V > VDD − VTN ⇒ Q2 = Qtotal − C1 (VDD − VTN )
V1 6= V2 after charge sharing
95
Example:
0→VDD
V1 V2
C1 C2
a) C1 = 100f F c) C1 = 20f F
C2 = 20f F C2 = 100f F
V1 = 0V V1 = 0V
V2 = 1.2V V2 = 1.2V
C1 V1 + C2 V2 ⇒ V ∗ = 1.0V
⇒V∗ = = 0.2V
C1 + C2
b) C1 = 20f F V1 cannot rise above 0.8V
C2 = 20f F
Q1 = 0.8V · 20f F = 16f C
V1 = 0V
Q2 = (120 − 16)f C = 104f C
V2 = 1.2V
104f F
⇒ V ∗ = 0.6V V2 = = 1.04V
100f F
• reverse-bias leakage current from the drain junction and subthreshold current
• noise injection from neighboring wires
• α-particles as a leading cause of so-called soft errors
C C
A B ⇔ A B
C
C
The cost of this feature is not only an extra transistor, but also an extra inverter since
each transistor requires a complementary control input (4 transistors).
96
VDD VDD
S D D S
0V VDD −VTN
0V VDD →0V VDD 0V→VDD
In Out In Out
D S S D
|VTP | VDD
a) b)
Transmitting low and high signals (0V, VDD )
The degree of capacitive feedthrough is reduced.
CMOS transmission gate logic can be used to reduce the number of transistors needed
to implement certain logic functions.
S S A B F
0 Z B B
B 1 A Z A
Multiplexer Configuration - Analog
Transmit
S F = AS ∨ BS
A
S F
mit S=A
0 B B
⇒ F = AB ∨ AB
1 B
A F
97
A
mit S= A
B
⇒ F = AB ∨ AB
A F
B
b) XOR Gate
S0 S0 S1 S1
A S = (S1 S0 )
S
2
B
A 00
B 01
Out C 10 MX Out
D 11
C
2
S
D
a) Two-level MUX
98
SA SB SC SD
SA SB SC SD
S = (D,C,B,A)
A S
4
A 0001
B B 0010
C 0100 MX Out
D 1000
C
4
S
D
b) single-level MUX
one-of-four MUX
B B B
OFF OFF High- OFF Charge
Conflict
OFF OFF Impedance OFF sharing
C C C
D D D
ON OFF ON
ON a) ON b) OFF c)
99
A
B A
VDD VDD
A
F=A∨B F = AB
B
B A
B
a) b)
OR and AND functions using CMOS transmission gates (AA)
F = AB ∨ ABC ∨ AC
B A F A F
B
C
B C
C
A A
B
a) b)
F = AB ∨ AB
100
A B F A F A F B F A F
0 0 0 0 * 0 B 0 A 0 *
0 1 1 = 0 * + 0 B = 1 * + 0 B
1 0 1 1 B 1 * 0 A 1 *
1 1 0 1 B 1 * 1 * 1 B
SA AA AA AA AA
Kombinatorik MUX MUX MUX INV
A
B
B
A & B
B ≥1 A F A F
F
&
A B
B B
B
A
F = AB + AB
B
A B F1
0 0 A
F1 = AB A F1 0 1 Z
1 0 A
1 1 Z
B
B
A B F2
0 0 Z
A F2
F2 = AB + AB 0 1 B
1 0 Z
1 1 B
B
A B F1 F2 F F
0 0 A Z A 0
F = F1 + F2 0 1 Z B B 1
1 0 A Z A 1
1 1 Z B B 0
101
7.2.1 CMOS Transmission Gate Delay
C
RN
A B ⇒ ⇒
C1 RP C2 C1 RTG C2
C
RC model for CMOS transmission gate
The role of the TG is to transmit signals from input to output under gate control.
VDS
Ron ≈
Ids
output: VDD → 0V
IDsat VGS
Vout
RN =
IDsat,n
Vout
RN =
VDS IDlin,n
VDD
Vout
R RP =
IDsat,p
RN
RP
Vout
|VTP | VDD
output: 0V → VDD
102
R
RN
VDD − Vout
RP =
IDsat,p
RP
VDD − Vout
RP =
IDlin,p
VDD − Vout
RN =
IDsat,n
RP ||RN
Vout
VDD VDD
−VTN
L
⇒ RTG = Reqn
W
C
C
1 1
C W
2 g
C W
2 g
C
C
OFF state ON state
103
With the complete RC model defined as above, it is now possible to compute the path
delay of circuits containing transmission gates.
However, accurate delay calculation must involve the characteristics of the driver and
load of the transmission gate.
VDD VDD
2W W 2W
a)
W W W
VDD
In
b)
Rinv RTG
out
Cinv CTG1 CTG2 Cload
c)
R1 R2
In out
C1 C2
104
7.2.2 Logical Effort with CMOS Transmission Gates
VDD
sel
2W W R
A Out
W R
W Cload
sel
This is another type of tristate driver which performs a similar function to the trisate
inverter seen in Section 4.9 of Chapter 4. Here, we have three inputs: A, sel, sel
3W Cg (2R)
LE input A = =2
3W Cg R
W Cg (2R) 2
LE input sel = =
3W Cg R 3
VDD
sel
2W 3W R
A Out
R
W 3W 3 Cload
sel
LE computation with a 3X transmission gate
3W (4R/3) 4
LE input A = =
3W R 3
3W (4R/3) 4
LE input sel = =
3W R 3
input A experiences a smaller output resistance without any change in its input capaci-
tance, while input sel experiences an increase of three times its input capacitance but
not a reduction of three times its output resistance.
105
CLK CLK CLK
D Q D Q D Q
C2 C1 C2 C1
CLK CLK
a) b) c)
CLK
Driver INV1 INV2
D Q Q
Qout
TG1
CLK CLK
TG2
CLK
When the circuit is in transparent mode, TG2 is off and the value of D passes through
to Qout after a delay of TG1 and two inverters.
While this circuit removes the problem of dynamic storage it introduces another small
problem. When the CLK signal goes high, we note that there is a delay before CLK goes
low. In that situation, the n-channel device of TG2 is still on. If D is different from the
previously stored Q, then there will be a conflict at node Q for a short duration. TG1 is
attempting to apply a new a value at Q while TG2 is reciculating the old value through
the loop. The problem can be resolved with proper sizing of the INV2, TG2, TG1 and
the gate that is driving D. The basic idea is to ensure that the forward path is stronger
than the feedback path.
106
VDD
M1
CLK
M2
X Q Q
D
M3
M4
TG2
CLK
Q
Q Q
D
CLK
TG1
• charge sharing
• feedthrough
• charge leakage
• single-event upsets (kurzer Glitch durch α-Teilchen Einstrahlung ⇒ kann dynami-
sches Bauteil kurzzeitig logisch verndern; findet man nie wieder)
vs static logic:
107
Out Out
A ⇒ A static current
B CMOS B Pseudo-NMOS
Φprecharge
Out Out
A A
⇒ ⇒ CLK
B B
CLK 8λ
A B C
8λ 8λ 8λ
8λ
In reality, the CLK signal arrives first so A, B and C can be considered as late arriving
signals. In that case, the transistors can be reduced to half the size to deliver the same
delay as the inverter.
108
VDD
CLK MP
Out
CL
NMOS
complex
MN
VDD VDD
CLK CLK
1
1
1
1
109
VDD VDD
CLK CLK
1
0
0
0
Φ
Y1 Y2 Y3
F1 F2 F3
The clock, Φ, should only be low until all inverter outputs are low. When Φ goes
high, we enter the evaluation phase where the internal nodes (labeled Y1 , Y2 , Y3 ) will fall
like dominos in order from left to right, assuming there is a path to Gnd through their
respective n-complexes.
The clock must remain high long enough for logic to propagate through the entire chain.
110
⇒ VS of inv is higher (skewed)
⇒ VS of gate is lower (≈ VTN )
⇒ Therefore, a domino stage will actually switch earlier than a regular gte
There is also a power savings in domino logic. Only those gates that discharge to Gnd
will dissipate power. There will be no power dissipation due to crowbar current since
a direct path between VDD and Gnd is not allowed in this type of logic. In addition,
glitches can be effectively removed since all inputs switch from low to high.
It can onl be used to create noninverting functions.
⇒ An inverter cannot be implemented in domino logic
c
Sum
X = a ⊕ b = ab ∨ ab
b
a X = a ⊕ b = ab ∨ ab
Sum = c ⊕ X = cX ∨ cX
b b b b c c
A 16λ
CLK 8λ
B 16λ
A 8λ B 8λ
4λ
4λ 8λ
111
5
LEnor =
3
Cin,gate 8λ 2
LEdyn nor = = =
Cin,inv 12λ 3
r
2
LEinv = 1 ⇒ LEavg = · 1 ≈ 0.8
3
Therefore, the domino stage is better in terms of its overdrive capability and inout
capacitive loading.
⇒ Only one NMOS device is driven in the domino case
⇒ PD does not fight with the PU, VS ≈ VTN
Once lost, it cannot be recovered and the circuit ceases to function correctly.
CLK
Out
In Cout IN
X
CLK
goes to high
0 CX OUT
skew gate
Y
X
The degree of charge sharing depends on the two capacitances Cout and Cx .
Cout
V∗ = VDD
Cx + Cout
If they are equal, then the output valve will be reduced to half its original value (i.e.
0.5VDD ).
There is obviously a design tradeoff between timing and noise tolerance when sizing the
112
inverter.
Minimum
weak Long
CLK
CLK
a)
Minimizing the effects of charge sharing using Keepers.
Keepers or baby-sitters: weak PMOS ⇒ W L is small
Discussion: siehe Buch Seite 344-345
CLK
Out Out
F F
Structure of
dual-rail domi-
no logic
113
CLK
ab ab
a
a b
b
Dual-rail domino
AND/NAND function
VDD
CLK CLK
ab ab
a
a b
b
Dual-rail domino
CLK
with Keeper devices
114
Delay
A A
Φ
a) Precharge circuit b) Postcharge circuit
The delay line is implemented as a series of inverters. The width of propagating pulses
must be controlled carefully or else there may be contention between NMOS and PMOS
devices or, even worse, oscillations may occur.
For example, if A is high for more than twice the delay around the loop, the circuit will
oscillate. Therefore, special care must be take to ensure correct timing.
Literatur
[1] David Hodges, Horace Jackson, and Resve Saleh. Analysis and Design of Digital
Integrated Circuits. McGraw-Hill Science/Engineering/Math, 2003.
115