Vladimir Stojanović
Power-performance system optimization
Complex, many levels of hierarchy and variables
Individual components
Flops & latches (power and timing critical)
System level
Interfaces (Digital, Analog and Mixed-Signal)
[Figure: flops (D/Q, Clk) around logic blocks A and B at the system level; transmitter, channel, and receiver at the interface; supply and threshold labels Vdd1 to Vdd5 and Vth1 to Vth3]
Look at sub-problem: links
[Figure: transmitter, channel, receiver]
What makes it challenging
[Figure: high-speed link chip]
Outline
Show system level optimization for links
Create a framework to evaluate trade-offs
Backplane environment
[Figure: signal path through package, package via, line card trace, and backplane via; on-chip parasitic (termination resistance and device loading capacitance)]
Line attenuation
Reflections from stubs (vias)
Backplane channel
Loss is variable
Same backplane
Different lengths
Different stubs (Top vs. Bot)
Attenuation is large
>30dB @ 3GHz
But is that bad?
Required signal amplitude set by noise
[Figure: attenuation [dB] (0 to -60) vs. frequency [GHz] (0 to 10) for 9" FR4; 26" FR4; 9" FR4, via stub; 26" FR4, via stub]
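To put the >30dB figure in perspective, the sketch below converts attenuation in dB into the fraction of the transmitted voltage swing that reaches the receiver. The 500 mV transmit swing is an assumed, illustrative value and does not come from the slides.

```python
def db_to_amplitude_ratio(attenuation_db: float) -> float:
    """Voltage ratio left after the given attenuation (20*log10 convention)."""
    return 10 ** (-attenuation_db / 20)


TX_SWING_MV = 500.0  # assumed transmit swing, for illustration only

for loss_db in (10, 20, 30):
    ratio = db_to_amplitude_ratio(loss_db)
    print(f"{loss_db} dB loss: {ratio:.4f} of the swing, "
          f"{TX_SWING_MV * ratio:.1f} mV at the receiver")
```

Whether the remaining amplitude is "bad" depends on the noise it has to clear, which is the point the slide makes.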
Inter-symbol interference (ISI)
Channel is low pass
Our nice short pulse gets spread out
Dispersion: short latency (skin-effect, dielectric loss)
Reflections: long latency (impedance mismatches: connectors, via stubs, device parasitics, package)
[Figure: pulse response vs. time (0 to 3 ns), Tsymbol=160ps]
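A minimal sketch of how a spread-out pulse turns into ISI: sample the pulse response once per symbol, then superpose shifted copies weighted by the transmitted symbols. The cursor values below are hypothetical and are not read off the plotted pulse response.

```python
def received_sample(cursors, main_index, symbols, n):
    """Received voltage at symbol n for symbol-spaced pulse-response samples.

    cursors[main_index] is the main cursor; later entries are post-cursors
    (ISI from earlier symbols), earlier entries are pre-cursors.
    """
    total = 0.0
    for k, c in enumerate(cursors):
        j = n - (k - main_index)  # index of the symbol seen through cursor k
        if 0 <= j < len(symbols):
            total += c * symbols[j]
    return total


def worst_case_eye(cursors, main_index):
    """Main cursor minus the sum of all ISI magnitudes (peak distortion)."""
    isi = sum(abs(c) for i, c in enumerate(cursors) if i != main_index)
    return cursors[main_index] - isi


# Hypothetical symbol-spaced samples: one pre-cursor, main, three post-cursors.
CURSORS = [0.05, 0.6, 0.3, 0.15, 0.05]
print("worst-case eye opening:", worst_case_eye(CURSORS, 1))
```

With these assumed values the ISI magnitudes almost cancel the main cursor, so one unlucky symbol pattern nearly closes the eye.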
ISI
[Figure: amplitude (0 to 1) vs. symbol time (0 to 18), with an error marked]
The right sub-system model
Problem with current models
Gaussian distributions
Works well near mean
Often way off at tails
e.g. ISI distribution is bounded
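The tail mismatch can be illustrated with a small sketch: enumerate the exact ISI distribution for equiprobable +/-1 symbols and compare its tail with a Gaussian of the same variance. The residual cursor values are hypothetical.

```python
import itertools
import math


def isi_tail_exact(cursors, threshold):
    """P(ISI > threshold) for equiprobable +/-1 symbols, by enumeration."""
    hits = 0
    total = 0
    for signs in itertools.product((-1, 1), repeat=len(cursors)):
        total += 1
        if sum(s * c for s, c in zip(signs, cursors)) > threshold:
            hits += 1
    return hits / total


def isi_tail_gaussian(cursors, threshold):
    """Same tail probability if ISI is modeled as Gaussian with equal variance."""
    sigma = math.sqrt(sum(c * c for c in cursors))
    return 0.5 * math.erfc(threshold / (sigma * math.sqrt(2)))


RESIDUAL = [0.1, 0.07, 0.05, 0.03]     # hypothetical residual ISI cursors
bound = sum(abs(c) for c in RESIDUAL)  # ISI can never exceed this value
print("bound:", bound)
print("exact tail beyond 0.3:", isi_tail_exact(RESIDUAL, 0.3))
print("Gaussian tail beyond 0.3:", isi_tail_gaussian(RESIDUAL, 0.3))
```

The exact distribution has no mass beyond the bound, while the Gaussian model still assigns it a nonzero probability, which is the kind of error that matters at low BER targets.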
Effect of timing noise
Need to map from time to voltage
[Figure: jittered sampling vs. ideal sampling; the voltage noise when the receiver clock is off; bk noise]
Decompose output into ideal and noise
Noise is a pair of pulses at the front and end of each symbol
Width of each pulse is equal to the jitter
Approximate with deltas on bandlimited channels
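The decomposition can be sketched for a single rectangular symbol: subtracting the ideal symbol from the jittered one leaves narrow pulses at the two edges, and each pulse can be replaced by a delta of equal area. The 160 ps symbol time is the Tsymbol quoted earlier in the deck; the edge offsets and the 1 ps grid are assumed values for illustration.

```python
def rect(t, start, end):
    """Unit rectangle on [start, end)."""
    return 1.0 if start <= t < end else 0.0


def jitter_noise(t, t_symbol, jitter_front, jitter_end, amplitude=1.0):
    """Jittered symbol minus ideal symbol at time t (same units as t_symbol)."""
    ideal = amplitude * rect(t, 0.0, t_symbol)
    jittered = amplitude * rect(t, jitter_front, t_symbol + jitter_end)
    return jittered - ideal


def equivalent_deltas(jitter_front, jitter_end, amplitude=1.0):
    """Areas of the deltas that replace the front and end noise pulses."""
    return (-amplitude * jitter_front, amplitude * jitter_end)


T_SYMBOL_PS = 160.0
FRONT_PS, END_PS = 5.0, -3.0  # assumed edge displacements (late start, early end)

# Integrate the noise waveform on a 1 ps grid and compare with the delta areas.
area = sum(jitter_noise(float(t), T_SYMBOL_PS, FRONT_PS, END_PS)
           for t in range(-20, 200))
print("noise area:", area, "delta areas:", equivalent_deltas(FRONT_PS, END_PS))
```

The delta approximation keeps only the area of each noise pulse, which is reasonable when the channel bandwidth is too low to resolve a pulse as narrow as the jitter.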
Jitter effect on voltage noise
Transmitter jitter
High frequency (cycle-cycle) jitter is bad
Changes the energy (area) of the symbol
No correlation of noise sources that sum
Low frequency jitter is less bad
Effectively shifts waveform
Correlated noise gives partial cancellation
Receive jitter
Modeled by shift of transmit sequence
Same as low frequency transmitter jitter
Bandwidth of the jitter is critical
It sets the magnitude of the noise created
Jitter source from PLL clocks
[Figure: PLL block diagram with RefClk, phase detector (Kpd), charge pump (Icp), loop filter (R, C), VCO (Kvco/s), clock buffer, and divider N; plot vs. frequency [Hz] (10^5 to 10^10) with a curve labeled "from VCO supply"]
Noise sources
Reference clock phase noise
VCO supply noise
Clock buffer supply noise
M. Mansuri and C.-K. K. Yang, "Jitter optimization based on phase-locked loop design parameters," IEEE Journal of Solid-State Circuits, Nov. 2002.
2x Oversampled bang-bang CDR
[Figure: data samples dn-1 and dn with edge sample en between them; the "en (late)" case is marked]
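A minimal sketch of the early/late decision in a 2x oversampled bang-bang loop, assuming the common Alexander-style rule: with a data transition between dn-1 and dn, an edge sample that already equals dn means the clock is late. The function name and the +1/-1/0 vote encoding are illustrative choices, not taken from the slides.

```python
def bang_bang_vote(d_prev: int, edge: int, d_curr: int) -> int:
    """Return +1 if the clock is late, -1 if early, 0 if there is no transition.

    d_prev and d_curr are consecutive data samples; edge is the sample taken
    halfway between them.
    """
    if d_prev == d_curr:
        return 0  # no transition, so the edge sample carries no phase information
    return 1 if edge == d_curr else -1


def net_phase_vote(data, edges):
    """Sum of votes over a sequence; edges[i] lies between data[i] and data[i+1]."""
    return sum(bang_bang_vote(data[i], edges[i], data[i + 1])
               for i in range(len(edges)))


data = [0, 1, 1, 0, 1]
edges = [1, 1, 0, 1]  # every edge sample already shows the new bit
print("net vote:", net_phase_vote(data, edges))
```

A positive net vote would tell the loop to advance the sampling phase; only the sign of the phase error is available, which is what makes the loop bang-bang.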
[Figure: attenuation [dB] (down to -100) vs. frequency [GHz] (0 to 20) for 26" NELCO, no stub and 26" FR4, via stub]
Capacity with link-specific noise
[Figure: capacity [Gb/s] vs. log10(clipping probability) (-25 to 0), one panel for NELCO and one for FR4; curve labeled thermal noise]
Exclusively baseband
Biggest problem is ISI
Starting to use equalization
Thinking about multi-level modulation
Constrained by speed and power
Large number of links on a chip
Baseband links - removing ISI
Linear transmit equalizer
Decision-feedback equalizer
[Figure: Tx data through anticausal and causal taps, channel, sampled data, deadband, feedback taps, and TapSel logic]
J. Zerbe et al., "Design, Equalization and Clock Recovery for a 2.5-10Gb/s 2-PAM/4-PAM Backplane Transceiver Cell," IEEE Journal of Solid-State Circuits, Dec. 2003.
Transmit equalization – headroom constraint
Peak power constraint
[Figure: attenuation [dB] (0 to -25) vs. frequency [GHz] (0 to 2.5), unequalized and equalized curves]
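The headroom constraint can be sketched as follows: the transmitter cannot exceed its peak swing, so the equalizer taps are scaled until the sum of their magnitudes is 1. Equalization then flattens the response by attenuating low frequencies instead of boosting high ones. The two-tap filter is a hypothetical example.

```python
def scale_to_peak(taps):
    """Scale FIR taps so the sum of magnitudes is 1 (peak output constraint)."""
    total = sum(abs(w) for w in taps)
    return [w / total for w in taps]


def dc_gain(taps):
    """Filter gain at zero frequency."""
    return sum(taps)


def nyquist_gain(taps):
    """Filter gain magnitude at half the symbol rate (alternating pattern)."""
    return abs(sum(w * (-1) ** k for k, w in enumerate(taps)))


taps = scale_to_peak([1.0, -0.5])  # hypothetical main tap and one post-cursor tap
print("scaled taps:", taps)
print("DC gain:", dc_gain(taps), "Nyquist gain:", nyquist_gain(taps))
```

In this example the Nyquist gain stays at the full swing while the DC gain drops to about one third, so the equalized curve meets the unequalized one only at high frequency.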
Optimization example:
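Power constrained linear precoding
$$\mathrm{MSE}(w, g) = E_a\left(1 - 2 g\, w^T P \mathbf{1}_\Delta + g^2\, w^T P P^T w\right) + g^2 \sigma^2$$
$$\mathrm{SINR}_{\mathrm{unbiased}}(w) = \frac{E_a \left(w^T P \mathbf{1}_\Delta\right)^2}{E_a\, w^T P \left(I - \mathbf{1}_\Delta \mathbf{1}_\Delta^T\right)\left(I - \mathbf{1}_\Delta \mathbf{1}_\Delta^T\right)^T P^T w + \sigma^2}$$
$$\underset{w}{\text{maximize}} \quad \frac{0.5\, d_{\min}\, w^T P \mathbf{1}_\Delta - V_{\mathrm{peak}} \left\lVert w^T P I_{PD} \right\rVert_1 - \mathrm{offset}}{\left(E_a\, w^T P \left(I - \mathbf{1}_\Delta \mathbf{1}_\Delta^T - I_{PD}\right)\left(I - \mathbf{1}_\Delta \mathbf{1}_\Delta^T - I_{PD}\right)^T P^T w + \sigma^2\right)^{1/2}} \quad \text{s.t. } \lVert w \rVert_1 \le 1$$
$$\sigma^2 = w^T S_0^{TX} w + w^T S_0^{RX} w + \sigma^2_{\mathrm{thermal}}$$
Minimize BER
Residual dispersion into peak distortion
Reflections into mean distortion
Includes all link-specific noise sources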
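A minimal sketch of evaluating the unbiased SINR objective for a candidate tap vector, assuming that the product of the taps with the channel matrix P is simply the equalized symbol-spaced pulse response, that the selector vector picks the main cursor, and that every other sample counts as ISI. The pulse response, tap values, and noise variance are hypothetical.

```python
def equalized_pulse(w, p):
    """Convolve transmit taps w with the symbol-spaced pulse response p."""
    h = [0.0] * (len(w) + len(p) - 1)
    for i, wi in enumerate(w):
        for j, pj in enumerate(p):
            h[i + j] += wi * pj
    return h


def sinr_unbiased(w, p, delta, e_a=1.0, sigma2=0.0):
    """Signal power at cursor delta over ISI power plus noise variance."""
    h = equalized_pulse(w, p)
    signal = e_a * h[delta] ** 2
    isi = e_a * sum(hk ** 2 for k, hk in enumerate(h) if k != delta)
    return signal / (isi + sigma2)


PULSE = [0.1, 1.0, 0.4, 0.15]  # hypothetical channel: pre-cursor, main, two post-cursors
NOISE_VAR = 0.001              # hypothetical total noise variance

print("unequalized SINR:", sinr_unbiased([1.0], PULSE, 1, sigma2=NOISE_VAR))
# Two-tap equalizer that respects the peak constraint sum(|w|) <= 1.
print("equalized SINR:", sinr_unbiased([0.7, -0.3], PULSE, 1, sigma2=NOISE_VAR))
```

The equalized taps lower the main cursor but remove far more ISI, so the ratio improves even under the peak constraint. A full optimizer would search over w; this sketch only evaluates the objective.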
Including feedback equalization
Feedback equalization (DFE)
Subtracts error from input
No attenuation
Problem with DFE
Need to know interfering bits
ISI must be causal
Problem: latency in the decision circuit
Receive latency + DAC settling < bit time
Can increase allowable time by loop unrolling
Receive next bit before the previous is resolved
[Figure: amplitude (0 to 1) vs. symbol time (0 to 18) with feedback equalization]
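A minimal sketch of decision feedback on 2-PAM (+/-1) symbols: the ISI caused by already-decided bits is subtracted from each new sample before slicing. The post-cursor values are hypothetical and deliberately large enough that a plain slicer makes errors.

```python
def channel(bits, post_cursors):
    """Received samples for +/-1 bits through a channel with causal ISI only."""
    out = []
    for n, b in enumerate(bits):
        x = float(b)
        for k, c in enumerate(post_cursors, start=1):
            if n - k >= 0:
                x += c * bits[n - k]
        out.append(x)
    return out


def dfe_receive(samples, feedback_taps):
    """Slice each sample after subtracting ISI from the previous decisions."""
    decisions = []
    for x in samples:
        isi = 0.0
        for k, tap in enumerate(feedback_taps, start=1):
            if len(decisions) >= k:
                isi += tap * decisions[-k]
        decisions.append(1 if x - isi >= 0 else -1)
    return decisions


bits = [1, 1, -1, 1, -1, -1, 1, 1]
taps = [0.6, 0.5]  # hypothetical post-cursor ISI
rx = channel(bits, taps)
print("plain slicer:", [1 if x >= 0 else -1 for x in rx])
print("with DFE:    ", dfe_receive(rx, taps))
```

Each decision must be available before the next sample is sliced, which is the latency constraint listed on the slide.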
1 bit loop unrolling
[Figure: 2PAM signal constellation at the receiver with one post-cursor tap; two slicers on xn resolve dn | dn-1=1 and dn | dn-1=0, and dn-1 selects between them; flops clocked by dClk]
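A minimal sketch of one-tap loop unrolling for 2-PAM, assuming levels of +/-1 and a single post-cursor of size alpha: both possible decisions are computed in parallel with thresholds at +alpha and -alpha, and the previous bit only selects one of them. The value of alpha and the initial previous bit are assumptions.

```python
def unrolled_dfe(samples, alpha, first_prev=-1):
    """One-tap DFE with the feedback loop unrolled into two slicers and a mux."""
    decisions = []
    prev = first_prev
    for x in samples:
        d_if_prev_one = 1 if x >= alpha else -1    # slicer used when d[n-1] = 1
        d_if_prev_zero = 1 if x >= -alpha else -1  # slicer used when d[n-1] = 0
        prev = d_if_prev_one if prev == 1 else d_if_prev_zero
        decisions.append(prev)
    return decisions


ALPHA = 0.4  # hypothetical post-cursor ISI
bits = [1, -1, -1, 1, 1, -1, 1, -1]
# Received samples x[n] = d[n] + ALPHA * d[n-1], with no symbol before the first.
rx = [b + (ALPHA * bits[n - 1] if n > 0 else 0.0) for n, b in enumerate(bits)]
print(unrolled_dfe(rx, ALPHA))
```

The slicers no longer wait for a subtraction to settle; only the final select depends on the previous bit, which relaxes the timing of the feedback path.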
Comparison with Gaussian model
[Figure: two panels, "Cumulative ISI distribution" and "Impact on CDR phase"; axes include margin [mV]; annotation: 9% Tsymbol]
Voltage margin
Min. distance between the receiver threshold and contours with same BER
Pulse amplitude modulation
[Figure: 2-PAM levels labeled 1 and 0; 4-PAM levels labeled 00, 01, 11, 10]
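A minimal sketch of the 4-PAM mapping, using the level order shown on the slide (00, 01, 11, 10) so that neighboring levels differ in one bit. The normalized amplitudes +3, +1, -1, -3 and the assumption that 00 is the highest level are illustrative choices.

```python
# Bit pair to level, in the order 00, 01, 11, 10; amplitudes are assumed.
GRAY_PAM4 = {(0, 0): 3, (0, 1): 1, (1, 1): -1, (1, 0): -3}


def pam4_modulate(bits):
    """Map an even-length bit list to 4-PAM levels, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("4-PAM needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def pam4_demodulate(samples):
    """Pick the nearest level for each sample and return the bit stream."""
    inverse = {level: pair for pair, level in GRAY_PAM4.items()}
    out = []
    for x in samples:
        nearest = min(inverse, key=lambda level: abs(level - x))
        out.extend(inverse[nearest])
    return out


bits = [0, 0, 0, 1, 1, 1, 1, 0]
levels = pam4_modulate(bits)
print(levels, pam4_demodulate([lv + 0.4 for lv in levels]))
```

Two bits per symbol halve the symbol rate for a given data rate, at the cost of level spacing that is one third of the 2-PAM spacing for the same peak swing.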
Multi-level: Offset and jitter are crucial
[Figure: data rate [Gb/s] vs. symbol rate [Gs/s] (0 to 20), three panels: thermal noise; thermal noise + offset; thermal noise + offset + jitter]
Full ISI compensation too costly
[Figure: data rate [Gb/s], three panels: thermal noise; thermal noise + offset; thermal noise + offset + jitter]
Outline
Show system level optimization for links
Create a framework to evaluate trade-offs
Fully adaptive dual-mode link
PAM2/PAM4, 2-10Gb/s, 0.13µm, 40mW/Gb
[Figure: TX, RX, PLL, and CDR blocks; clocks aClk, dClk, eClk; tap updates and edge updates]
Adaptive sampler
Generates the error signal at reference level
Monitors the link
Adjustable voltage and time reference
On-chip sampling scope
Can replace any other sampler - calibration
Dual-loop adaptive algorithm
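Data level reference loop
$$dLev_{n+1} = dLev_n - step_{dataLev}\,\mathrm{sign}(e_n), \quad \hat{x}_n > 0$$
[Figure: reference level moving from dLev_init through dLev_mid to dLev_end as the initial peak-to-peak error shrinks; loop inputs Sign(e_n) and Sign(x̂_n)]
Equalizer loop
$$w_{n+1} = w_n + step_w\,\mathrm{sign}(e_n)\,\mathrm{sign}(\hat{x}_n)$$
Scale the equalizer to meet the Tx output constraint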
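A minimal simulation sketch of a dual-loop sign-sign update of this kind, under several assumptions that are not stated on the slide: the error is taken as e = dLev - x (reference minus received sample), updates happen only for positive data, decisions are error-free, and the main tap is recomputed after every update so that the tap magnitudes sum to 1. The channel pulse response, step sizes, and initial values are hypothetical.

```python
import random


def sign(v):
    return 1 if v >= 0 else -1


def convolve(w, p):
    h = [0.0] * (len(w) + len(p) - 1)
    for i, wi in enumerate(w):
        for j, pj in enumerate(p):
            h[i + j] += wi * pj
    return h


def isi_ratio(w, p, cursor):
    """Sum of ISI magnitudes divided by the main cursor of the equalized pulse."""
    h = convolve(w, p)
    return sum(abs(v) for k, v in enumerate(h) if k != cursor) / abs(h[cursor])


def adapt(p, p_main, n_symbols=20000, step_w=0.002, step_dlev=0.002, seed=1):
    rng = random.Random(seed)
    w = [0.0, 1.0, 0.0, 0.0]  # pre1, main, post1, post2
    w_main = 1
    d_lev = 0.5               # assumed initial reference level
    delay = w_main + p_main   # symbols between a sent bit and its cursor sample
    a = [rng.choice((-1, 1)) for _ in range(n_symbols)]
    span = len(w) + len(p) - 1
    for n in range(span, n_symbols):
        h = convolve(w, p)
        x = sum(h[j] * a[n - j] for j in range(span))
        if a[n - delay] < 0:
            continue          # the reference level is compared only for positive data
        e = sign(d_lev - x)
        d_lev -= step_dlev * e                       # data level reference loop
        for k in range(len(w)):
            if k != w_main:
                w[k] += step_w * e * a[n - p_main - k]  # equalizer loop
        # Assumed scaling rule: main tap takes whatever headroom is left.
        w[w_main] = 1.0 - sum(abs(v) for i, v in enumerate(w) if i != w_main)
    return w, d_lev


PULSE = [0.1, 1.0, 0.4, 0.15]  # hypothetical channel, main cursor at index 1
taps, level = adapt(PULSE, p_main=1)
print("taps:", [round(t, 3) for t in taps], "dLev:", round(level, 3))
print("ISI ratio before:", isi_ratio([0.0, 1.0, 0.0, 0.0], PULSE, 2))
print("ISI ratio after: ", isi_ratio(taps, PULSE, 2))
```

Only signs of the error and data are needed, so each update is a fixed step up or down; the taps settle with a small dither around values that cancel the ISI positions they can reach.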
Dual loop convergence – 4 tap example
PAM2, 5Gb/s, 4 taps Tx equalization
[Figure: two panels vs. number of updates (0 to 200); tap traces labeled main tap, pre1, post1, post2]
Hardware re-use: Dual-mode receiver
[Figure: receiver with slicers on input x at thresh(+), 0, and thresh(-), clocked by dClk; outputs lsb(+), msb, lsb(-); muxes controlled by prDFE enable select the mode]
Modes: PAM4; PAM2; PAM2 with loop-unrolled DFE tap
Leverage multi-level properties of signals in loop-unrolling
Improvements with loop-unrolling
[Figure: panels (a) and (b), voltage vs. time (0 to 4000 ps); traces labeled unequalized, fully transmit equalized, and transmit equalized with one tap DFE]
Signal as seen by the receiver (on-chip scope)
Model and measurements
[Figure: log10(BER) (0 to -14) vs. voltage margin [mV] (80 to -80)]
Bridging the gap: Multi-tone link
Multi-tone data rates with thermal noise: Nelco 64Gb/s, FR4 38Gb/s
[Figure: #bits/Hz (0 to 10) vs. frequency [GHz] (0 to 14)]
[Figure: data0 and data1 streams passing through LPF and BPF filters at the transmitter and receiver]