High Performance Multi-Channel High-Speed I/O Circuits

ACSP · Analog Circuits And Signal Processing
Taehyoun Oh
Ramesh Harjani
Analog Circuits and Signal Processing
Series Editors
Mohammed Ismail, The Ohio State University
Mohamad Sawan, École Polytechnique de Montréal
For further volumes: http://www.springer.com/series/7381
Taehyoun Oh
Department of Electronic Engineering
Kwangwoon University
Seoul, South Korea

Ramesh Harjani
Department of ECE
University of Minnesota
Minneapolis, USA
ISSN 1872-082X ISSN 2197-1854 (electronic)

ISBN 978-1-4614-4962-1 ISBN 978-1-4614-4963-8 (eBook)
DOI 10.1007/978-1-4614-4963-8
Springer New York Heidelberg Dordrecht London
Library of Congress Control Number: 2013945793
© Springer Science+Business Media New York 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed. Exempted from this legal reservation are brief
excerpts in connection with reviews or scholarly analysis or material supplied specifically for the
purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the
work. Duplication of this publication or parts thereof is permitted only under the provisions of
the Copyright Law of the Publisher’s location, in its current version, and permission for use must
always be obtained from Springer. Permissions for use may be obtained through RightsLink at the
Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for
any errors or omissions that may be made. The publisher makes no warranty, express or implied, with
respect to the material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

Preface
The demand for higher throughput, combined with the finite number of I/Os in modern microprocessors, has increased the need for higher data rates per pin, which in turn increases crosstalk noise and power consumption. The duration of a crosstalk pulse response is similar to that of the forward signal pulse and can span several unit intervals (UIs) in state-of-the-art I/Os. We propose a passive receiver-side analog-IIR differentiator as a solution to this issue. Continuous-time crosstalk cancellation (XTC) removes a continuous-time phenomenon using a continuous-time technique, which results in power-efficient handling of the high-frequency crosstalk signal. As technology scales, high-density passive components with low minimum values become available, so our crosstalk cancellation approach is likely to scale well.
In contrast to prior crosstalk cancellation work, we reutilize the normally wasted crosstalk signal to improve the signal bandwidth and the final signal-to-noise ratio. Based on a concrete understanding of signal interactions across multiple lanes, we extend our design to an arbitrary number of channels. Two prototype test chips have been designed and are presented in this book to validate our approach. The first, a 2-channel prototype, was fabricated in a 130 nm CMOS process; measurements at 5 Gb/s show that, compared with operation without XTC, the horizontal and vertical openings of the eye diagram improve by 67 % UI and 58.2 %, respectively. The XTC portion occupies 0.03 mm² and consumes 2.8 mW/Gbps/lane. The second, a 4-channel chip, was fabricated in a 65 nm CMOS process and achieves a 3× reduction in power (0.96 mW/Gbps/lane) together with four-channel XTC capability. Measured results at 12 Gb/s show that the vertical eye opening improves by 26.4 % and the jitter is reduced by 37.5 %, on average across all four channels.
We also describe an efficient automatic calibration algorithm under development for this high-speed XTC signaling technique. Channel spacings and channel-loss characteristics vary with PCB manufacturing, so the receiver must adapt to them. To cope with this variation, we propose an adaptive XTC algorithm that operates simultaneously with the automatic gain control (AGC) and adaptive decision feedback equalization (DFE) loops used for channel-ISI mitigation.
Acknowledgments
The fear of the Lord is the beginning of wisdom, and knowledge of the Holy One
is understanding (Proverbs 9:10).
First, I would like to thank my Lord, Jesus Christ, who saved my life by sacrificing himself upon the cross and has given me the strength and wisdom to cope with all the challenges throughout this research. He has provided me with all my essential needs during this time and blessed me by letting me meet the precious people around me.
I have great admiration for my advisor, Professor Ramesh Harjani, for his extensive knowledge in the field of circuit design and his passion for research. I could neither have come up with any ideas nor written any papers without his advice at every step. I hope to learn from his enthusiasm and to continue my research as he does. I also want to express my deepest personal gratitude for his unchanging support, both when I did well and when I did not. I cannot forget that, from the very beginning of my time in the lab, when I did not yet have sufficient knowledge in the field, he waited patiently for a long time for me to become accustomed to the research project. He motivated me to find intuitive solutions to the issues of the project and mentored me to examine problems in depth.
I could not have done any of this work without the support of my better half, Eunsun. She has endured every difficult moment with me. Thank you for staying with me; I will love you for the rest of my life as Jesus loves the church. I want to thank my parents for their financial help over four years and for assisting my wife and me.
Most of the research in this book was supported by a Semiconductor Research Corporation grant at the University of Minnesota. During the revision of this book, the author was supported by a Research Grant of Kwangwoon University in 2013.
Prof. Taehyoun Oh
Department of Electronic Engineering
Kwangwoon University
Seoul, Republic of Korea
Contents
1 Introduction
1.1 Low Impedance Microstrip-Line FEXT Model
1.2 Predicting Eye-Diagram Properties from the Pulse Response
1.3 Single-Ended Versus Differential Signaling
References
2 2 × 6 Gb/s MIMO Crosstalk Cancellation and Signal Reutilization Scheme in 130 nm CMOS Process
2.1 MIMO-XTCR Architecture
2.2 2 × 2 MIMO-XTCR Prototype Implementation
2.2.1 2 × 2 MIMO-XTCR in Single-Ended I/Os
2.2.2 2 × 2 MIMO-XTCR in Differential I/Os
2.3 Measurement Results
2.3.1 2 × 2 MIMO-XTCR Gain Calibration: Single Input Signal
2.3.2 2 × 2 MIMO-XTCR Measurement Results: Two Independent Input Signals
2.4 Conclusions
References
3 4 × 12 Gb/s MIMO Crosstalk Cancellation and Signal Reutilization Receiver in 65 nm CMOS Process
3.1 Characteristic of Far-End Crosstalk and Proposed Channel Architecture
3.1.1 Factors for Crosstalk Strength
3.1.2 Proposed Channel Architecture for Multi-Lane Single-Ended I/Os
3.2 Proposed Low Power XTCR Analog Front-End for Multi-Lanes
3.2.1 XTCR on Multi-Lanes (≥4)
3.2.2 Prototype Low Power Analog Front-End Circuit Design
3.3 Verifying Crosstalk Cancellation Using Multi-Lane Signals
3.3.1 XTCR Gain Calibration
3.3.2 Measurement Verification for Practical Application
3.4 Conclusions
References
4 Adaptive XTCR, AGC, and Adaptive DFE Loop
4.1 Understanding Crosstalk Behavior
4.2 Adaptive XTC
4.3 Automatic Gain Control and Adaptive Decision Feedback Equalization
4.4 Combining the Adaptation of the XTC, AGC and DFE Coefficients
4.5 Conclusions
References
5 Research Summary and Contributions
Appendix A: Noise Analysis
Appendix B: Issues of Applying Consecutive 2 × 2 XTCR on Multi-Lane I/Os (≥4)
Appendix C: Transmitter-Side Discrete-Time FIR XTC Filter Versus Receiver-Side Analog-IIR XTC Filter
Appendix D: Line Mismatch Sensitivity
Appendix E: Input Matching for 4 × 4 XTCR Receiver Test Bench
Appendix F: Bandwidth Improvement by Technology Scaling

Chapter 1
Introduction
Rapid advances in CMOS technology continue to increase on-chip clock speeds exponentially, while the high-speed I/Os that connect chips remain a performance bottleneck for the system. Finite channel bandwidth generates intersymbol interference (ISI) and reduces the amplitude of the received signal, thus degrading the SNR. Techniques to mitigate ISI in single wireline channels have been used for more than a decade, and systems operating at 20 Gb/s and above have recently been published [1–6]. With increased speeds, I/Os become more vulnerable to electromagnetic interference (crosstalk) from adjacent channels. Channel equalization in the form of pre-emphasis, which is often used to tackle ISI, unintentionally increases crosstalk by boosting the high-frequency signal components. A number of crosstalk cancellation (XTC) techniques have been proposed to remove the effects of crosstalk [7–13]. Jitter equalization [7, 8], staggered I/Os [9, 10] and amplitude XTC with a finite-impulse-response (FIR) filter at the transmitter [11] can reduce the effects of crosstalk to a limited extent. However, these schemes increase current consumption and have little or no impact on channel spacing. Current crosstalk cancellation techniques do not adapt well to I/O systems where power efficiency is critical [14]. In most high-speed links, crosstalk is avoided by maintaining sufficient distance between channels and/or shielding the channels at a higher cost, by choosing differential I/Os at the expense of doubled power, I/O pins and board area, or by running closely spaced parallel channels at lower data rates. The techniques developed in this book rely on two distinct characteristics. First, we regard crosstalk as a signal component that can be reutilized to increase SNR, in contrast to other XTC schemes, which simply treat crosstalk as noise to be removed. Second, the continuous-time multiple-input multiple-output XTC (MIMO-XTC) introduced here cancels a continuous-time phenomenon using a continuous-time technique. This results in very efficient handling and cancellation of the high-frequency crosstalk signal.
T. Oh and R. Harjani, High Performance Multi-Channel High-Speed I/O Circuits, 1

Analog Circuits and Signal Processing, DOI: 10.1007/978-1-4614-4963-8_1,
© Springer Science+Business Media New York 2014
The rest of the chapter is organized as follows. Section 1.1 introduces the far-end crosstalk (FEXT) channel model for single-ended I/Os. Section 1.2 presents an intuitive technique for analyzing the pulse response in terms of eye-diagram properties. Section 1.3 compares the crosstalk characteristics of single-ended and differential I/Os.
1.1 Low Impedance Microstrip-Line FEXT Model
Far-end crosstalk (FEXT) in transmission lines is the signal energy that couples between two closely spaced channels and appears at the far (receiving) end. When an active signal is transmitted on one of the channels, as illustrated in Fig. 1.1a and b, the far end of the adjacent channel receives the coupled FEXT signal. If the adjacent channel transmits another independent signal in the same direction, it receives both its own original signal and the FEXT coupled from its neighbor. Since these two signals are uncorrelated, the FEXT degrades the horizontal and vertical eye opening of the original signal and is normally treated as noise. In a homogeneous channel, such as a strip-line, the inductive and capacitive couplings are well balanced and the FEXT becomes negligible [15]. In an inhomogeneous channel such as a micro-strip line, on the other hand, significant crosstalk energy couples through the asymmetrical field [7]. However, micro-strip lines have a cost advantage owing to their ease of implementation, so most interfaces are manufactured using micro-strip lines. As PCBs are required to accommodate more and more channels in a limited board area for higher data throughput, the physical spacing between channels shrinks and crosstalk is rapidly becoming the dominant factor affecting signal integrity.
Longer channel lengths and reduced channel spacings result in a larger coupling coefficient, so more crosstalk is transferred onto the adjacent channel [7, 15]. Transmitted signals with sharper transitions couple more easily because of the high-pass characteristic of the crosstalk channel. Figure 1.1a presents the physical parameters used to formulate the crosstalk model for single-ended I/Os: $L$ is the channel length, $W$ is the channel width and $d$ is the center-line distance between channels. When the aggressor signal $V_{\mathrm{in}}(\omega)$ is transmitted on closely spaced channels, the FEXT signal $V_{\mathrm{FEXT}}(\omega)$ appears at the output of the adjacent channel as

$$V_{\mathrm{FEXT}}(\omega) = -j\omega\tau H(\omega)V_{\mathrm{in}}(\omega) = -j\omega\frac{u}{d^{k}}H(\omega)V_{\mathrm{in}}(\omega) \tag{1.1}$$

where $\tau\,(= u/d^{k})$ is the forward coupling strength and $u$ is a function of the channel length $L$ and the channel height. For simplicity of analysis, we assume a fixed channel length, a fixed channel height and a constant transition time for the transmitted signal. The crosstalk diminishes approximately by a factor of $d^{k}$, where, for single-ended I/Os, the nominal value of $k$ is between 1 and 2, depending on channel conditions [15, 16]. In most low-impedance micro-strip lines used in portable electronics, the inductive coupling component is dominant and the FEXT pulse response is approximately the negative derivative of the channel pulse response $h(t)$, scaled by $\tau$ [7, 11, 15].
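As a rough numerical illustration of Eq. (1.1), the Python sketch below models the inductively dominated FEXT pulse as $f(t) \approx -\tau\,dh/dt$ with $\tau = u/d^{k}$. It is a sketch under stated assumptions, not a channel model from this book: the Gaussian pulse shape, the time unit (one UI), the values of $u$, $d$ and $k$, and the helper names are all hypothetical.

```python
import math

def coupling_strength(u, d, k):
    """Forward coupling strength tau = u / d**k from Eq. (1.1)."""
    return u / d ** k

def gaussian_pulse(t, sigma=0.5):
    """Hypothetical channel pulse response h(t): unit peak at t = 0, t in UI."""
    return math.exp(-t * t / (2.0 * sigma * sigma))

def fext_pulse(h, t, tau, dt=1e-5):
    """Inductive-dominant FEXT pulse f(t) ~ -tau * dh/dt (central difference)."""
    return tau * (h(t - dt) - h(t + dt)) / (2.0 * dt)

if __name__ == "__main__":
    # Doubling the spacing d scales tau by 1/2**k.
    for k in (1, 2):
        ratio = coupling_strength(0.2, 2.0, k) / coupling_strength(0.2, 1.0, k)
        print(f"k={k}: doubling d scales tau by {ratio:.2f}")
    # FEXT is negative before the data peak, zero at it, positive after it.
    tau = coupling_strength(0.2, 1.0, 1)
    for t in (-0.5, 0.0, 0.5):
        print(f"f({t:+.1f} UI) = {fext_pulse(gaussian_pulse, t, tau):+.4f}")
```

The sign pattern (negative before the peak, zero at the peak, positive after it) is what places the largest FEXT samples half a UI away from the data eye center, which Sect. 1.2 discusses next.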
1.2 Predicting Eye-Diagram Properties from the Pulse Response
In this section, we discuss the limitations of the FIR XTC filter previously proposed in [11]. The channel ISI tails, spaced at the bit period $T_b$, and the FEXT pulse-response tails, sampled at half-period offsets ($T_b/2$) as shown in Fig. 1.1b, are directly related to the vertical eye opening and the jitter of the eye diagram. An algorithm to determine the crosstalk cancellation tap weights has been suggested in [11]. However, the FEXT tails $f_{-1/2}$ and $f_{1/2}$, located half a bit period away from the eye center, significantly limit the efficacy of XTC schemes with integer-spaced FIR taps.
The eye-diagram performance can be evaluated analytically, directly from the shape of a single pulse response. Experimentally, this single pulse response can be conveniently measured with a pulse generator and a digital sampling oscilloscope. We propose a method to calculate the peak-to-peak jitter and the vertical eye opening from a single output pulse. As shown in Fig. 1.1a and b, when a single pulse at the appropriate data rate is transmitted, we obtain the pulse responses of the data and of the crosstalk at the ends of the channels, denoted $h(t)$ and $f(t)$, respectively. The data eye diagram is the convolution of the pulse response $h(t)$ with the impulse train $\sum_n x[n]\delta(t - nT_b)$, and can be obtained by repeatedly folding the result at integer multiples of the symbol period. If we assume that $h(t)$ is the response of a causal channel, then $h_{-1} = h_{-2} = \cdots = h_{-m} = 0$ for all $m > 0$. Further, to simplify the analysis, we assume that the ISI tail is limited to $h_1$ and $h_2$ and is zero for all $m > 2$. Then, the maximum data eye opening in the worst case can be calculated directly as $h_0 - (h_1 + h_2)$. The maximum magnitude of the vertical signal is $h_0 + h_1 + h_2$, which exceeds the maximum pulse amplitude ($h_0$) by $h_1 + h_2$ because of the randomness of the data. The maximum horizontal eye opening, $T_{ho}$, is the time interval between the two points at which the data pulse crosses the level $(h_0 + h_1 + h_2)/2$. As a result, the peak-to-peak jitter is $T_b - T_{ho}$. In a highly lossy channel, or at higher data rates, $h_0$ decreases while the number and magnitude of the ISI tails increase, so the eye opening $h_0 - (h_1 + h_2)$ shrinks. At the same time, as the vertical eye opening decreases, the two crossing points move closer together and the horizontal eye interval shrinks.

Fig. 1.1 Pulse responses of NRZ signal and FEXT in single-ended I/Os: (a) closely spaced single-ended I/Os, (b) pulse responses, (c) eye patterns
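The single-pulse procedure above can be sketched in Python as follows. This is a minimal illustration, not the authors' tool: it assumes a uniformly sampled pulse response with an integer number of samples per UI, a main cursor at the largest sample, ISI limited to two post-cursors, and linear interpolation between samples. The function name and the pulse values are hypothetical.

```python
def eye_from_pulse(pulse, samples_per_ui, tb=1.0):
    """Return (vertical_eye, t_ho, jitter_pp) from one sampled pulse response.

    Assumes h0 is the largest sample, ISI is limited to h1 and h2 (one and two
    UIs after h0), and T_ho is measured between the two points where the pulse
    crosses (h0 + h1 + h2) / 2.
    """
    dt = tb / samples_per_ui
    i0 = max(range(len(pulse)), key=lambda i: pulse[i])  # main cursor h0

    def cursor(n):
        i = i0 + n * samples_per_ui
        return pulse[i] if i < len(pulse) else 0.0

    h0, h1, h2 = cursor(0), cursor(1), cursor(2)
    vertical_eye = h0 - (h1 + h2)  # worst-case vertical opening
    level = (h0 + h1 + h2) / 2.0   # crossing level for the horizontal eye

    # Walk left from the peak to the first sample at or above the level.
    i = i0
    while i > 0 and pulse[i - 1] >= level:
        i -= 1
    if i == 0:
        raise ValueError("pulse never rises through the crossing level")
    t_left = (i - 1) + (level - pulse[i - 1]) / (pulse[i] - pulse[i - 1])

    # Walk right from the peak to the last sample at or above the level.
    j = i0
    while j + 1 < len(pulse) and pulse[j + 1] >= level:
        j += 1
    if j + 1 == len(pulse):
        raise ValueError("pulse never falls through the crossing level")
    t_right = j + (pulse[j] - level) / (pulse[j] - pulse[j + 1])

    t_ho = (t_right - t_left) * dt
    return vertical_eye, t_ho, tb - t_ho

if __name__ == "__main__":
    # Hypothetical pulse sampled 4 times per UI (illustrative values only).
    pulse = [0.0, 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.35, 0.2, 0.15, 0.1, 0.05, 0.0]
    v_eye, t_ho, jitter = eye_from_pulse(pulse, samples_per_ui=4)
    # prints: vertical eye 0.80, T_ho 0.80 UI, jitter p-p 0.20 UI
    print(f"vertical eye {v_eye:.2f}, T_ho {t_ho:.2f} UI, jitter p-p {jitter:.2f} UI")
```

With a measured pulse, the same function would be fed oscilloscope samples; a finer sampling grid reduces the interpolation error in $T_{ho}$.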
In the case of the crosstalk pulse response, shown in Fig. 1.1b, the maximum amplitude occurs approximately midway between two data eye centers. The crosstalk reaches its maximum amplitude during the data transition, typically $T_b/2$ away from the data eye center, and crosses zero when the data eye is at its peak (i.e., $f_0 = 0$). The maximum crosstalk amplitude in the FEXT eye diagram, $|f_{-1/2}| + f_{1/2} + f_{3/2} + f_{5/2}$, occurs $\pm T_b/2$ away from the center of the data eye, as shown in Fig. 1.1d. On the other hand, the integer crosstalk tails $f_1$ and $f_2$ align with the center of the data eye. If uncorrelated data and FEXT are combined, representing crosstalk coupling into an adjacent channel, the vertical eye opening is reduced by $2(f_1 + f_2)$ in the worst case and becomes $h_0 - (h_1 + h_2) - 2(f_1 + f_2)$. Here, we assume that the delays of the two channels are matched. More generally, the maximum vertical eye opening of a channel with both ISI and FEXT can be written as

$$V_{\mathrm{eye\text{-}max}} = h_0 - \sum_{n \neq 0} h_n - 2\sum_{n \neq 0} f_n \tag{1.2}$$

where the sums run over the integer-spaced ISI tails $h_n$ and FEXT tails $f_n$.
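Equation (1.2) reduces to simple arithmetic once the integer-spaced tails are known. The short Python sketch below evaluates it for hypothetical tail values (not measured data), taking each tail as a magnitude so that it closes the eye in the worst case, and assuming delay-matched channels as in the text.

```python
def veye_max(h0, isi_tails, fext_tails):
    """Worst-case vertical eye from Eq. (1.2): h0 - sum|h_n| - 2 * sum|f_n|.

    isi_tails:  integer-spaced ISI tails h_n (n != 0)
    fext_tails: integer-spaced FEXT tails f_n (n != 0); f_0 = 0 for matched delays
    """
    return h0 - sum(abs(h) for h in isi_tails) - 2 * sum(abs(f) for f in fext_tails)

# Hypothetical example: h0 = 1.0, (h1, h2) = (0.2, 0.1), (f1, f2) = (0.05, 0.02)
isi_only = veye_max(1.0, [0.2, 0.1], [])
with_fext = veye_max(1.0, [0.2, 0.1], [0.05, 0.02])
print(f"ISI only: {isi_only:.2f}, ISI + FEXT: {with_fext:.2f}")
```

The factor of two on the FEXT term means that even small integer crosstalk tails cost twice their size in vertical margin.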
The FEXT tails $f_{-1/2}$ and $f_{1/2}$ occur during the data transitions and disturb the timing. The magnitudes of $f_{-1/2}$ and $f_{1/2}$ are larger than those of $f_1$ and $f_2$; therefore, most of the crosstalk energy normally appears as jitter. How crosstalk affects timing variation is described in detail in [7, 8]. Additionally, when a per-pin de-skew scheme is applied, as shown in Fig. 1.2, $V_{\mathrm{eye\text{-}max}}$ is further reduced because the timing of the FEXT center ($f_0$) no longer aligns with that of the data center ($h_0$). A simulation of the performance degradation in situations where a per-pin de-skew scheme may be required is presented in Sect. 2.3.
Similar degradation by the $f_{-1/2}$ and $f_{1/2}$ crosstalk occurs even for lower-speed signals, where "lower speed" refers to the overall data rate. Note that at such lower rates, ISI tails are small when the channel loss is limited. For example, in the low-speed pulse response illustrated in Fig. 1.3, the integer ISI tails ($h_{-1}$, $h_1$) are zero and do not influence the amplitude of the following symbol at the data center. In this example, the integer crosstalk tails ($f_{-1}$, $f_0$, $f_1$) are also zero, and the crosstalk tails $f_{-1/2}$ and $f_{1/2}$ only increase the timing variation. For both low-speed and high-speed signals, the large amplitude of $f_{-1/2}$ and $f_{1/2}$ limits the benefits of crosstalk cancellation schemes with integer FIR filters, which have maximum effect only at the center of the eye. Fractional taps are required to remove the non-integer crosstalk tails, which increases the necessary clock speed and power. Sham et al. [11] propose a zero-forcing algorithm that optimizes the FIR tap weights for maximum crosstalk cancellation; their design consumes 25 mW/Gbps/lane to drive the digital taps at sufficient speed.

Fig. 1.2 Skewed FEXT signal in a transmitter per-pin de-skew scheme
Fig. 1.3 NRZ signal pulse response h(t) and FEXT pulse response f(t) (low data rate versus high data rate)
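To see why integer-spaced FIR taps miss most of the FEXT energy, the Python sketch below samples an analytic FEXT pulse, $f(t) = -\tau\,dh/dt$ for a hypothetical Gaussian channel pulse $h(t)$, at integer and half-integer UI offsets from the data center. The Gaussian shape, its width and $\tau = 1$ are illustrative assumptions; a real design would use the measured $f(t)$.

```python
import math

SIGMA_UI = 0.5  # hypothetical pulse width (standard deviation) in UI

def fext(t, tau=1.0, sigma=SIGMA_UI):
    """-tau * dh/dt for h(t) = exp(-t^2 / (2 sigma^2)); t measured in UI from h0."""
    return tau * t / sigma ** 2 * math.exp(-t * t / (2 * sigma ** 2))

integer_taps = {n: fext(n) for n in (-2, -1, 0, 1, 2)}        # seen by integer FIR taps
half_taps = {n + 0.5: fext(n + 0.5) for n in (-2, -1, 0, 1)}  # f_{-3/2} ... f_{3/2}

energy_int = sum(v * v for v in integer_taps.values())
energy_half = sum(v * v for v in half_taps.values())
print("f_0 =", integer_taps[0])
print(f"|f_-1/2| = {abs(half_taps[-0.5]):.3f}, |f_1| = {abs(integer_taps[1]):.3f}")
print(f"half-integer / integer tap energy = {energy_half / energy_int:.1f}")
```

For this assumed pulse, most of the sampled FEXT energy sits at the half-integer offsets, which an integer-spaced FIR filter cannot address without fractional taps.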
Moreover, FIR-based XTC is preferably implemented on the transmitter side, because the timing information is available there at the same time as the signal that generates the crosstalk. A typical FIR filter with discrete taps at the receiver samples the signal at the timing of $h_0$ for maximum SNR. However, as the crosstalk pulse response in Fig. 1.1b shows, $f_{-1/2}$ is received before $h_0$. Therefore, the sampled signal is not yet available when $f_{-1/2}$ disturbs the transition timing in the adjacent channel, and a receiver-side XTC implementation with FIR taps becomes ineffective. A transversal FIR filter that uses continuous-time analog delays may be used instead of a sampled-data FIR filter to avoid both the limited availability of sampled signals from adjacent channels and the need for fractional taps. In addition, adding transmitter taps to generate the crosstalk cancellation signal on top of the pre-emphasis reduces the overall output swing of the transmitter and thus the absolute vertical eye opening of the received signal [11].
1.3 Single-Ended Versus Differential Signaling
The power spectrum of an NRZ signal coupled to an adjacent channel can be expressed as

$$S_{\mathrm{FEXT}}(\omega) = T_b\,\tau^{2}\omega^{2}\,|H(\omega)|^{2}\left[\frac{\sin(\omega T_b/2)}{\omega T_b/2}\right]^{2} \tag{1.3}$$

which is the product of the squared magnitude of the FEXT channel transfer function, $|-j\omega\tau H(\omega)|^{2}$, and the NRZ data spectrum, $T_b[\sin(\omega T_b/2)/(\omega T_b/2)]^{2}$. The power spectra of 2 Gb/s and 5 Gb/s NRZ signals and the FEXT channel transfer function for $d = 2W$ have been measured and are shown in Fig. 1.4a. A PCB trace with a length of 16 inches, a channel width of 120 mils and a channel height of 62.5 mils was used for this set of measurements. The left-hand y-axis shows the NRZ spectrum in dBm, while the right-hand y-axis shows the scattering parameter ($S_{21}$) of the FEXT transfer function. Notice that as the data rate increases, a larger portion of the signal energy moves towards frequencies where the crosstalk gain is higher, so more NRZ signal energy couples to the adjacent channel as crosstalk. The power spectrum of the NRZ signal coupled to the adjacent channel, i.e., the FEXT signal we wish to cancel, is shown in Fig. 1.4b.
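The trend in Fig. 1.4 can be reproduced qualitatively from Eq. (1.3). The Python sketch below integrates the FEXT spectrum and the NRZ spectrum numerically and reports their ratio for two data rates. The first-order low-pass model for $|H(\omega)|^{2}$, its corner frequency `fc`, the value of `tau` and the integration limits are hypothetical choices, not the measured 16-inch PCB trace, so only the trend is meaningful, not the absolute numbers.

```python
import math

def nrz_psd(f, tb):
    """NRZ data spectrum Tb * [sin(w*Tb/2) / (w*Tb/2)]^2 at frequency f in Hz."""
    x = math.pi * f * tb  # equals w * Tb / 2
    sinc = 1.0 if x == 0.0 else math.sin(x) / x
    return tb * sinc * sinc

def fext_psd(f, tb, tau, fc):
    """Eq. (1.3) with an assumed first-order low-pass channel |H|^2 = 1/(1+(f/fc)^2)."""
    w = 2.0 * math.pi * f
    h_mag2 = 1.0 / (1.0 + (f / fc) ** 2)
    return (tau * w) ** 2 * h_mag2 * nrz_psd(f, tb)

def coupled_fraction(bit_rate, tau=10e-12, fc=3e9, f_max=100e9, steps=100_000):
    """Ratio of coupled FEXT power to transmitted NRZ power over 0..f_max (midpoint rule)."""
    tb = 1.0 / bit_rate
    df = f_max / steps
    p_nrz = p_fext = 0.0
    for i in range(steps):
        f = (i + 0.5) * df
        p_nrz += nrz_psd(f, tb) * df
        p_fext += fext_psd(f, tb, tau, fc) * df
    return p_fext / p_nrz

if __name__ == "__main__":
    for rate in (2e9, 5e9):
        print(f"{rate / 1e9:.0f} Gb/s: coupled fraction = {coupled_fraction(rate):.4f}")
```

Because $\tau^{2}\omega^{2}|H(\omega)|^{2}$ rises with frequency in this model, spreading the NRZ spectrum to higher frequencies (a higher bit rate) increases the coupled fraction, which mirrors the measured behavior.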
Fig. 1.4 (a) 2 and 5 Gb/s NRZ spectra and FEXT transfer function; (b) coupled NRZ spectrum
Fig. 1.5 Differential I/O crosstalk model
These measurements suggest that, to maximize throughput while avoiding crosstalk degradation, data should be distributed over a large number of parallel lines operating at lower speeds (<2 Gb/s), where the crosstalk strength is minimal. Traditionally, most CPU-to-memory interfaces follow this guidance. However, the overall throughput in this case is limited by the number of channels and I/O pins, the lower per-channel speed and the available PCB area. An alternative is to run the system at a higher speed using differential signaling, which is less susceptible to crosstalk. Nevertheless, at such speeds, the low signal gain and long ISI tails of the received NRZ signals place challenging design constraints on the clock and data recovery circuits. Also, the additional power needed for the I/O makes the differential solution less attractive.
It is instructive to derive the crosstalk equations for differential I/Os as well. Figure 1.5 shows the physical parameters we will use to describe the coupled crosstalk signal. Here, $a$ is the ratio of the spacing between the two lines of a differential pair to the channel width, so the two lines are $aW$ apart. A small $a$ reduces the bandwidth and the characteristic impedance of the differential channel, whereas a large $a$ weakens the shielding effect provided by the waveguide and reduces the common-mode rejection ratio (CMRR) benefit. A typical range for $a$ is 1–1.5 [17]. $D$ is the distance between the two differential pairs. The aggressor signal $V_{\mathrm{in}}(\omega)$ is applied at the differential input nodes, and the outputs of the adjacent differential channel receive the FEXT signals $V^{+}_{\mathrm{FEXT}}(\omega)$ and $V^{-}_{\mathrm{FEXT}}(\omega)$. As discussed earlier, the inductive crosstalk coupling component is dominant in most micro-strip lines. Each differential output contains a superposition of the inductively coupled signals originating from $V_{\mathrm{in}}(\omega)/2$ and $-V_{\mathrm{in}}(\omega)/2$. Using Eq. (1.1), $V^{+}_{\mathrm{FEXT}}(\omega)$ and $V^{-}_{\mathrm{FEXT}}(\omega)$ can be expressed as
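$$V^{+}_{\mathrm{FEXT}}(\omega) = j\omega\frac{u}{(D-aW)^{k}}H(\omega)\frac{V_{\mathrm{in}}(\omega)}{2} - j\omega\frac{u}{D^{k}}H(\omega)\frac{V_{\mathrm{in}}(\omega)}{2} \tag{1.4}$$

$$V^{-}_{\mathrm{FEXT}}(\omega) = j\omega\frac{u}{D^{k}}H(\omega)\frac{V_{\mathrm{in}}(\omega)}{2} - j\omega\frac{u}{(D+aW)^{k}}H(\omega)\frac{V_{\mathrm{in}}(\omega)}{2} \tag{1.5}$$

For better crosstalk rejection, $D$ is usually much larger than $aW$ in differential I/Os. Using the approximations $(1 - aW/D)^{k} \approx 1 - kaW/D$ and $(1 + aW/D)^{k} \approx 1 + kaW/D$, valid for $D \gg aW$, the FEXT signal seen at the differential output node is

$$V_{\mathrm{FEXT}}(\omega) = V^{+}_{\mathrm{FEXT}}(\omega) - V^{-}_{\mathrm{FEXT}}(\omega) \approx j\omega\frac{u k^{2}a^{2}W^{2}}{D^{k}\left(D^{2} - k^{2}a^{2}W^{2}\right)}H(\omega)V_{\mathrm{in}}(\omega) \approx j\omega\frac{u k^{2}a^{2}W^{2}}{D^{k+2}}H(\omega)V_{\mathrm{in}}(\omega) \tag{1.6}$$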
This equation illustrates three important features of differential signaling. First, in contrast to single-ended I/Os, the crosstalk signal in differential I/Os has positive polarity even though each individual line is inductively coupled. Second, as the spacing $aW$ within a differential pair decreases, the FEXT diminishes owing to the increased shielding effect; however, this can only be achieved at the cost of reduced channel bandwidth and reduced characteristic impedance. Finally, assuming that $d$ in Eq. (1.1) equals $D$ in Eq. (1.6), the differential crosstalk magnitude equals the single-ended magnitude scaled by $k^{2}a^{2}W^{2}/D^{2}$, a factor much smaller than one, because of the shielding effect just described. Intuitively, the signals coupled from the two complementary aggressor lines, which lie at slightly different distances, largely cancel each other in the adjacent channel. The value of $k$ depends on the board type [see Eq. (1.1)]. For the channels presented in [11, 16], $k = 1$, so the crosstalk strength in differential I/Os decreases as $\sim 1/D^{3}$, whereas the crosstalk in a single-ended channel decreases as $\sim 1/D$. In other words, differential I/Os are more immune to crosstalk degradation than single-ended I/Os at higher data rates. In addition, differential I/O circuits have a better power supply rejection ratio (PSRR). However, the MIMO-XTC scheme presented here minimizes the crosstalk in single-ended I/Os, allowing reductions in transmitter driver power, I/O pin count and channel PCB area, provided that other system constraints, such as PSRR, allow for this change.
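A quick numerical check of Eqs. (1.4) to (1.6) is sketched below in Python. It drops the common factor $j\omega H(\omega)V_{\mathrm{in}}(\omega)$, compares the exact difference of Eqs. (1.4) and (1.5) with the final approximation in Eq. (1.6), and compares both with the single-ended coupling $u/d^{k}$ of Eq. (1.1) for $d = D$. The values of $u$, $aW$ and $D$ are arbitrary illustrative numbers in consistent length units.

```python
def single_ended(u, d, k):
    """|tau| = u / d**k from Eq. (1.1)."""
    return u / d ** k

def differential_exact(u, D, aW, k):
    """V+ - V- from Eqs. (1.4) and (1.5), common factor j*w*H*Vin removed."""
    v_plus = 0.5 * u / (D - aW) ** k - 0.5 * u / D ** k
    v_minus = 0.5 * u / D ** k - 0.5 * u / (D + aW) ** k
    return v_plus - v_minus

def differential_approx(u, D, aW, k):
    """Final form of Eq. (1.6): u * k^2 * (aW)^2 / D**(k + 2)."""
    return u * k ** 2 * aW ** 2 / D ** (k + 2)

if __name__ == "__main__":
    u, aW, k = 1.0, 1.0, 1
    for D in (5.0, 10.0, 20.0):
        exact = differential_exact(u, D, aW, k)
        approx = differential_approx(u, D, aW, k)
        ratio = exact / single_ended(u, D, k)
        print(f"D={D:>4}: exact={exact:.2e}, Eq.(1.6)={approx:.2e}, diff/single={ratio:.4f}")
```

For $k = 1$ the middle expression of Eq. (1.6) is exact, so the small gap between the exact and approximate values comes only from dropping $a^{2}W^{2}$ in $D^{2} - a^{2}W^{2}$; the differential-to-single-ended ratio tracks $a^{2}W^{2}/D^{2}$, consistent with the $1/D^{3}$ versus $1/D$ scaling.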
References
1. A. Momtaz, M.M. Green, An 80mW 40Gb/s 7-Tap T/2-spaced FFE in 65nm CMOS, IEEE
ISSCC, Feb 2009, pp. 364–365
2. H. Wang, J. Lee, A 21-Gb/s 87-mW transceiver with FFE/DFE/Analog equalizer in 65-nm
CMOS technology, IEEE JSSC, Apr 2010, pp. 909–920
3. J. Lee, K.-C. Wu, A 20Gb/s full-rate linear CDR circuit with automatic frequency acquisition,
IEEE ISSCC, Feb 2009, pp. 366–367
4. K.-C. Wu, J. Lee, A 2x25Gb/s deserializer with 2:5 DMUX for 100Gb/s ethernet applications,
IEEE ISSCC, Feb 2010, pp. 374–375
5. S.A. Ibrahim, B. Razavi, A 20Gb/s 40mW equalizer in 90nm CMOS technology, IEEE ISSCC,
Feb 2010, pp. 170–171
6. C.-F. Liao, S.-I. Liu, A 40Gb/s CMOS serial-link receiver with adaptive equalization and
clock/data recovery, IEEE JSSC, Nov 2008, pp. 2492–2502
7. J.F. Buckwalter, A. Hajimiri, Cancellation of crosstalk-induced jitter, IEEE JSSC, Mar 2006,
pp. 621–631
8. H.-K. Jung, K. Lee, J.-S. Kim, J.-J. Lee, J.-Y. Sim, H.-J. Park, A 4 Gb/s 3-bit parallel transmitter
with the crosstalk-induced jitter compensation using TX data timing control, IEEE JSSC, Nov
2009, pp. 2891–2900
9. K.-J. Sham, R. Harjani, I/O staggering for low-power jitter reduction, European Microwave
Conference, Aug 2008, pp. 1226–1229
10. K.-I. Oh, L.-S. Kim, K.-I. Park, Y.-H. Jun, K. Kim, A 5-Gb/s/pin transceiver for DDR memory
interface with a crosstalk suppression scheme, IEEE CICC, Sep 2008, pp. 639–642
11. K.-J. Sham, M.R. Ahmadi, S.B.G. Talbot, R. Harjani, FEXT crosstalk cancellation for high-
speed serial link design, IEEE CICC, Sep 2006, pp. 405–408
12. C. Pelard, E. Gebara, A.J. Kim, M.G. Vrazel, F. Bien, Y. Hur, M. Maeng, S. Chandramouli,
C. Chun, S. Bajekal, S.E. Ralph, B. Schmukler, V.M. Hietala, J. Laskar, Realization of multigi-
gabit channel equalization and crosstalk cancellation integrated circuits, IEEE JSSC, Oct 2004,
pp. 1659–1669
13. Y. Hur, M. Maeng, C. Chun, F. Bien, H. Kim, S. Chandramouli, E. Gebara, J. Laskar, Equal-
ization and near-end crosstalk (NEXT) noise cancellation for 20-Gb/s 4-PAM backplane serial
I/O interconnections, IEEE JSSC, Jan 2005, pp. 246–255
14. G. Balamurugan, R. Moony, A scalable 5–15 Gbps, 14–75 mW low-power I/O transceiver in
65nm CMOS, IEEE JSSC, Apr 2008, pp. 1010–1019
15. F.D. Mbairi, H. Hesselbom, High-frequency transmission lines crosstalk reduction using spac-
ing rules, IEEE TCPT, Sep 2008, pp. 601–610
16. W.R. Eisenstadt, D.E. Bockelman, Common and differential crosstalk characterization on the
silicon substrate, IEEE MGWL, Jan 1999, pp. 25–27
17. Micron, DDR2-533 memory design guide for two-DIMM unbuffered systems, Technical note,
2003, p. 14
Chapter 2
2 × 6 Gb/s MIMO Crosstalk Cancellation and Signal Reutilization Scheme in 130 nm CMOS Process
In this chapter, a continuous-time multiple-input multiple-output crosstalk cancellation and reutilization (MIMO-XTCR) architecture operating at 2–6 Gb/s is presented. The performance of the XTCR equalizer has been measured for various FR4 channel spacings and data rates. The crosstalk energy reutilization (XTR) technique handles crosstalk efficiently and achieves high signal integrity even in severe crosstalk environments where crosstalk had completely closed the data eye. Measurement results show that the peak-to-peak jitter and the vertical opening of the eye diagram improve by 67 % UI and 58.2 %, respectively, which is the best known improvement to date. The MIMO-XTCR portion occupies 0.03 mm² and consumes 2.8 mW/Gbps/lane, which is 2 times lower than previously proposed XTC schemes.
Section 2.1 proposes a new 2 × 2 MIMO-XTCR algorithm. A CMOS proto-
type implementation of the 2 × 2 MIMO-XTCR for single-ended I/Os is presented
in Sect. 2.2. Measurement results are shown for the prototype circuit in Sect. 2.3.
Section 2.4 summarizes the measurement results.
2.1 MIMO-XTCR Architecture
Our approach uses a continuous-time, receiver-side crosstalk cancellation scheme that is applicable to single-ended I/Os. The proposed algorithm is shown in Fig. 2.1. $X_1(\omega)$ and $X_2(\omega)$ are the frequency-domain representations of the transmitted signals, $x[n]\,\delta(t - nT_b)$, on channel 1 and channel 2. $H(\omega)$ is the channel transfer function and $-j\omega\tau H(\omega)$ is the FEXT. As stated earlier, the FEXT is the negative derivative of the channel transfer function multiplied by $\tau$ [see (1.1)]. The received signals $Y_1(\omega)$ and $Y_2(\omega)$ contain both the primary signal with the channel ISI and the FEXT of the adjacent channel's signal. The behavior of this two-channel system can be expressed in matrix form as (2.1).

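$$
\begin{bmatrix} Y_1(\omega) \\ Y_2(\omega) \end{bmatrix}
=
\begin{bmatrix} H(\omega) & -j\omega\tau H(\omega) \\ -j\omega\tau H(\omega) & H(\omega) \end{bmatrix}
\begin{bmatrix} X_1(\omega) \\ X_2(\omega) \end{bmatrix}
\tag{2.1}
$$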

Fig. 2.1 2 × 2 MIMO-XTCR architecture
In our algorithm, each received signal, $Y_1(\omega)$ or $Y_2(\omega)$, is differentiated (multiplied by $j\omega$), scaled by a gain $\beta$, and added to the adjacent channel's received signal. When the gain is adjusted to $\beta = \tau$, the off-diagonal terms of the resulting matrix in (2.2) become zero and the crosstalk is cancelled. The equalized signals then contain no component of the adjacent channel's signal, and the two channels become independent, as shown in (2.3).

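$$
\begin{aligned}
\begin{bmatrix} Z_1(\omega) \\ Z_2(\omega) \end{bmatrix}
&=
\begin{bmatrix} G & j\omega\beta G \\ j\omega\beta G & G \end{bmatrix}
\begin{bmatrix} Y_1(\omega) \\ Y_2(\omega) \end{bmatrix} \\
&=
\begin{bmatrix} (1 + \beta\tau\omega^2)\,G H(\omega) & j\omega(\beta - \tau)\,G H(\omega) \\ j\omega(\beta - \tau)\,G H(\omega) & (1 + \beta\tau\omega^2)\,G H(\omega) \end{bmatrix}
\begin{bmatrix} X_1(\omega) \\ X_2(\omega) \end{bmatrix}
\end{aligned}
\tag{2.2}
$$

$$
\begin{bmatrix} Z_1(\omega) \\ Z_2(\omega) \end{bmatrix}
=
\begin{bmatrix} (1 + \tau^2\omega^2)\,G H(\omega) & 0 \\ 0 & (1 + \tau^2\omega^2)\,G H(\omega) \end{bmatrix}
\begin{bmatrix} X_1(\omega) \\ X_2(\omega) \end{bmatrix}
\quad \text{when } \beta = \tau
\tag{2.3}
$$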
where $G$ is the primary signal path gain. An additional term, $\tau^2\omega^2 G H(\omega)$, now appears in the diagonal terms. This second-derivative term boosts the high-frequency signal content by $\tau^2\omega^2 G$ and counteracts the channel loss in $H(\omega)$. This additional signal (the MIMO or XTR signal) is discarded in previous schemes, which treat crosstalk only as a noise component to be cancelled. In our MIMO algorithm, the crosstalk signal energy is reutilized to reduce the jitter and increase the vertical eye opening while mitigating the ISI tails. Differentiating the received signal yields both the crosstalk cancellation signal and a MIMO (XTR) signal for the adjacent channel. For clarity, the signal path without differentiation is designated the forward signal path and the path with differentiation the differentiation signal path, as indicated in Fig. 2.1.
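The cancellation condition in (2.1)–(2.3) can be checked numerically. The sketch below multiplies the receiver matrix by the channel matrix at one frequency, treating $H(\omega)$ and $G$ as scalar placeholders (an assumption for illustration; the helper names `matmul2` and `xtcr_matrix` are hypothetical). With $\beta = \tau$ the off-diagonal term vanishes and the diagonal gain equals $1 + \tau^2\omega^2$; a mistuned $\beta$ leaves a residual $j\omega(\beta - \tau)$ term.

```python
import math


def matmul2(a, b):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def xtcr_matrix(omega, tau, beta, g=1.0, h=1.0):
    """Overall transfer matrix from (X1, X2) to (Z1, Z2).

    channel  : (2.1), with H(omega) replaced by the scalar placeholder h
    receiver : forward gain G on the diagonal, j*omega*beta*G off-diagonal
    """
    channel = [[h, -1j * omega * tau * h],
               [-1j * omega * tau * h, h]]
    receiver = [[g, 1j * omega * beta * g],
                [1j * omega * beta * g, g]]
    return matmul2(receiver, channel)


omega = 2 * math.pi * 2.5e9     # Nyquist frequency of 5 Gb/s NRZ
tau = 23e-12                    # one of the FEXT strengths plotted in Fig. 2.2

tuned = xtcr_matrix(omega, tau, beta=tau)
detuned = xtcr_matrix(omega, tau, beta=0.5 * tau)
print("off-diagonal, beta = tau  :", abs(tuned[0][1]))
print("off-diagonal, beta = tau/2:", abs(detuned[0][1]))
print("diagonal gain:", tuned[0][0].real, "vs 1 + (tau*omega)^2 =", 1 + (tau * omega) ** 2)
```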
Fig. 2.2 Bandwidth improvement by MIMO signals
A continuous-time, high-frequency-boosting linear equalizer stage follows this architecture to compensate for any remaining ISI. In our test setup, the linear equalizer compensates for only a small part of the channel ISI. In channels with large crosstalk, the MIMO boosting can significantly ease the burden on any follow-on channel equalizer: it reduces both the number of stages needed in analog equalizers and the number of taps needed in a decision feedback equalizer. The power saved by the MIMO (XTR) signal is analyzed and quantified in Chap. 4. Figure 2.2 shows the frequency-domain impact of MIMO-XTCR for two strengths of the MIMO signal ($\tau$ = 15 ps and $\tau$ = 23 ps). For channels with stronger crosstalk, the MIMO crosstalk cancellation and reutilization scheme yields a larger bandwidth improvement. This implies that high signal integrity can be achieved even in severe crosstalk environments where the crosstalk completely closes the horizontal and vertical eye opening.
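The trend in Fig. 2.2 follows from the diagonal factor $1 + \tau^2\omega^2$ in (2.3). The sketch below evaluates only that factor, in dB, for the two $\tau$ values of Fig. 2.2; it deliberately ignores $H(\omega)$, $G$, and the follow-on equalizer, so it shows the shape of the extra boost rather than the full curves in the figure. The function name `mimo_boost_db` is hypothetical.

```python
import math


def mimo_boost_db(freq_hz, tau_s):
    """Extra gain from the diagonal factor (1 + tau^2 * omega^2) in (2.3), in dB."""
    omega = 2 * math.pi * freq_hz
    return 20 * math.log10(1 + (tau_s * omega) ** 2)


for tau_ps in (15, 23):                 # the two MIMO strengths of Fig. 2.2
    boosts = [mimo_boost_db(f_ghz * 1e9, tau_ps * 1e-12) for f_ghz in (1.0, 2.5, 5.0)]
    print(f"tau = {tau_ps} ps:", ", ".join(f"{b:.2f} dB" for b in boosts))
```

The boost is zero at DC and grows with both frequency and $\tau$, which is why stronger crosstalk (larger $\tau$) produces the larger bandwidth improvement described above.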
However, three fine points deserve discussion here. First, an increased $\tau$ cannot substitute for channel ISI equalization, since the receiver must operate over a range of crosstalk and MIMO signal strengths. Second, wireline channels tend to be dominated by crosstalk and ISI (by over 20 dB) rather than by random noise, so increasing the high-frequency response does not significantly degrade the signal-to-noise ratio but does improve the signal-to-distortion ratio. Appendix A shows a noise simulation with a realistic white noise level. Third, when a per-pin de-skew scheme is used, the primary signal and the coupled FEXT signal will not align perfectly, as depicted in Fig. 1.2. The FEXT and its cancellation signal, and likewise the forward and MIMO (XTR) signals, will then be skewed even with an optimal $\beta$, which erodes some of the benefits of both the crosstalk cancellation and the MIMO improvement discussed here. However, the degradation is limited and depends on where in the channel most of the skew accumulates. Further details on per-pin skew are discussed in Sect. 2.3.
2.2 2 × 2 MIMO-XTCR Prototype Implementation
2.2.1 2 × 2 MIMO-XTCR in Single-Ended I/Os
A 5 Gb/s prototype circuit for the proposed 2 × 2 MIMO-XTCR scheme for single-ended I/Os has been implemented in a 130 nm CMOS technology using passive differentiators, 180° phase shifters, pseudo-differential pairs, and analog equalizers, as shown in Fig. 2.3. Replica circuits for the differentiators and the 180° phase shifters are added to the corresponding stages of each path to equalize the phase delays.
Fig. 2.3 2 × 2 MIMO-XTCR circuit for single-ended I/Os
Fig. 2.4 Input network performance in the frequency domain. a Input network for forward and differentiation signal path; b frequency responses; c measured S11 amplitude from IC prototype

Figure 2.4a, b shows the passive input network and its magnitude and phase response. Implementing the differentiator as a continuous-time capacitor-resistor (CR) filter keeps both power and area low. The transfer function of the CR passive differentiator, $sRC/(1 + sRC)$, differs from that of the replica circuit, $1/(1 + sRC)$, by exactly a factor of $sRC$, so the only difference between the two paths is a differentiation. As shown in Fig. 1.4a, b, the frequency corners of a 5 Gb/s NRZ signal and of its FEXT are around 2.5 GHz, so the low-pass RC replica with a pole at 10 GHz does not affect the gain of the primary signal path. The value of $1/RC$ must be large enough that the replica's pole does not attenuate the signal of interest. However, the MIMO-XTCR signal gain is, to first order, directly proportional to $RC$, so $1/RC$ cannot be too large. In this design, the magnitude loss of the RC filter with its pole at 10 GHz ($\omega_0$), evaluated at the 2.5 GHz signal frequency ($\omega_c$), is $-20\log_{10}\lvert 1/(1 + j\omega_c/\omega_0)\rvert = 0.26$ dB. The pole frequency is chosen by weighing this trade-off between the MIMO-XTCR signal gain and the loss caused by the replica circuit's pole. Tuning resistors can adjust the pole frequency and reduce any mismatch in the RC time constant between the differentiator and its replica circuit caused by process variation.
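The RC trade-off above can be reproduced with ideal first-order transfer functions. The sketch below assumes perfectly matched R and C in the differentiator and its replica (so their ratio is exactly $sRC$) and expresses $sRC$ through the 10 GHz pole frequency; the helper names `s_rc`, `replica`, and `differentiator` are hypothetical.

```python
import math

F_POLE = 10e9   # pole of the RC replica, 1/(2*pi*R*C), from the text
F_SIG = 2.5e9   # Nyquist frequency of a 5 Gb/s NRZ signal


def s_rc(f, f_pole=F_POLE):
    """s*R*C evaluated at s = j*2*pi*f, written in terms of the pole frequency."""
    return 1j * f / f_pole


def replica(f, f_pole=F_POLE):
    """Forward-path replica low-pass: 1 / (1 + sRC)."""
    return 1 / (1 + s_rc(f, f_pole))


def differentiator(f, f_pole=F_POLE):
    """Differentiation-path CR high-pass: sRC / (1 + sRC)."""
    x = s_rc(f, f_pole)
    return x / (1 + x)


loss_db = -20 * math.log10(abs(replica(F_SIG)))
ratio = differentiator(F_SIG) / replica(F_SIG)   # ideally exactly sRC
print(f"replica loss at 2.5 GHz: {loss_db:.2f} dB")
print(f"differentiator / replica = {ratio:.3f}, sRC = {s_rc(F_SIG):.3f}")
```

Lowering `f_pole` (a larger RC) raises the differentiator output, and with it the MIMO-XTCR signal gain, but also increases the replica's loss at 2.5 GHz; this is the trade-off the pole placement balances.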
From the two receiver inputs to the linear equalizer input, the phase delays of the forward and differentiation signal paths are matched so that they differ by exactly 90°, as stated earlier for our MIMO-XTCR scheme. Any per-pin skew would detract from this matched condition; for the prototype, however, we assume matched channel paths. Results for per-pin de-skew degradation are presented in Sect. 2.3. For broadband 50 Ω receiver input matching, the impedance of the passive RC-CR network ($Z_x$) should be large, as indicated in Fig. 2.4a. In our case, $Z_x$ is approximately (80 − j156) Ω, and the measured input S11 remains below −15 dB for the signals of interest up to 2.5 GHz, as can be seen in Fig. 2.4c.
Fig. 2.5 Signal adder plus analog equalizer circuit design options: a current domain adder + linear equalizer; b pseudo-differential pair + linear equalizer
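Why $Z_x$ must be large can be seen from a simple reflection calculation. The sketch below assumes an ideal 50 Ω on-chip termination in parallel with $Z_x$, a 50 Ω reference, and that the quoted (80 − j156) Ω applies at the frequency of interest (the text does not state the frequency); pad and package parasitics are ignored. The helper names are hypothetical.

```python
import math


def parallel(z1, z2):
    """Impedance of z1 in parallel with z2."""
    return z1 * z2 / (z1 + z2)


def s11_db(z_load, z0=50.0):
    """|S11| in dB of a load z_load referenced to z0 (z_load must differ from z0)."""
    gamma = (z_load - z0) / (z_load + z0)
    return 20 * math.log10(abs(gamma))


R_TERM = 50.0        # on-chip termination, assumed to be an ideal resistor
Z_X = 80 - 156j      # RC-CR input network impedance quoted in the text (ohms)

for label, zx in (("Z_x as quoted", Z_X), ("Z_x halved", Z_X / 2)):
    z_in = parallel(R_TERM, zx)
    print(f"{label}: Z_in = {z_in:.1f} ohm, |S11| = {s11_db(z_in):.1f} dB")
```

Under these assumptions the quoted $Z_x$ gives roughly −17.5 dB, consistent with the measured S11 staying below −15 dB, while halving $Z_x$ degrades the match to worse than −15 dB.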
The forward and differentiation path signals are combined after differentiation, and the remaining ISI is removed by the follow-on linear equalizer, as shown in Fig. 2.1. The combination of the adder and the linear equalizer must provide two features. First, because the forward signal from the lossy channel and the FEXT coupled from the adjacent channel both pass through the forward signal path and are random and uncorrelated, the adder must handle a large input swing of about 500 mVpp. Second, the received single-ended signal must be converted to a differential signal at some point to improve the PSRR.
Figure 2.5 shows two potential circuit combinations for the high-speed signal adder plus analog equalizer. In the circuit of Fig. 2.5a, a current-domain adder with skewed gain control combines the signals from the two paths, and single-to-differential conversion is achieved by a passive low-pass filter at the linear equalizer input at the cost of a 6 dB gain loss. A gain difference of 10 dB between the forward-signal-side amplifier and the differentiation-signal-side amplifier can cover a wide range of FEXT strengths for 2W distance, while the forward-signal-side amplifier's gain only needs to avoid signal loss. To meet these requirements, the output swing at the shared summing node of the first stage must be reduced, which narrows the output swing and limits the dynamic range available to the gain control. The circuit in Fig. 2.5b circumvents this issue by adding pseudo-differential circuits. A 180° phase shifter flips the sign of either the primary FEXT signal path or the MIMO-XTCR signal path, so the FEXT signal and its cancellation signal appear as a common-mode signal that is suppressed by the CMRR of the differential analog equalizers that follow. Although the 180° phase shifter and its replica circuit cost approximately 6 dB of loss, the gain control amplifiers of the two paths now drive separate loads, so dynamic range need not be traded off. The order of the phase shifter stage and the pseudo-differential gain control stage can be swapped; with the arrangement in Fig. 2.5b, the gain control amplifier sees a smaller input swing and operates in a more linear region. The linear equalizers that follow both eliminate the common-mode FEXT signal and boost the high-frequency signals to remove the residual ISI. Their frequency-dependent gain is given by (2.4), from [1].
Fig. 2.6 Simulated linear equalizer frequency response
$$
|A_{AE}(\omega)| = \left| \frac{g_m R_L\,(1 + j\omega R_S C_S)}{1 + \dfrac{g_m R_S}{2} + j\omega R_S C_S} \right|
\tag{2.4}
$$
The zero frequency, $1/(2\pi R_S C_S)$, is calibrated by adjusting $C_S$ and $R_S$ according to both the channel loss and the amount of MIMO high-frequency boosting. For 2W distance, most of the high-frequency loss is compensated by the MIMO (XTR) signals, so the burden of high-frequency boosting in the following stages is significantly reduced. Figure 2.6 shows the simulated frequency response of the linear equalizer for variations of $C_S$ and $R_S$ [2]. Instead of a varactor, $C_S$ can be implemented with a fixed minimum-value capacitor or with the parasitic capacitance at the source to obtain a higher zero frequency, at the expense of one degree of freedom for calibration: both the zero frequency and the DC gain then depend only on $R_S$ [3].
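The effect of $R_S$ and $C_S$ on the equalizer response can be explored by evaluating (2.4) directly. This sketch assumes the standard source-degenerated differential-pair form with $g_m R_S/2$ in the denominator (consistent with the statement that the DC gain depends on $R_S$); the device values `GM`, `R_L`, and the $R_S$/$C_S$ pairs are hypothetical and chosen only for illustration, not taken from the prototype.

```python
import math


def ctle_gain(f, gm, r_l, r_s, c_s):
    """|A_AE| from (2.4) for a source-degenerated differential pair."""
    jw = 1j * 2 * math.pi * f
    num = gm * r_l * (1 + jw * r_s * c_s)
    den = 1 + gm * r_s / 2 + jw * r_s * c_s
    return abs(num / den)


# Hypothetical device values, for illustration only.
GM, R_L = 20e-3, 250.0
for r_s, c_s in ((200.0, 200e-15), (400.0, 200e-15), (400.0, 100e-15)):
    f_zero = 1 / (2 * math.pi * r_s * c_s)          # zero of (2.4)
    f_pole = (1 + GM * r_s / 2) * f_zero            # pole of (2.4)
    dc = ctle_gain(0.0, GM, R_L, r_s, c_s)
    boost_db = 20 * math.log10(ctle_gain(2.5e9, GM, R_L, r_s, c_s) / dc)
    print(f"R_S={r_s:.0f} ohm, C_S={c_s * 1e15:.0f} fF: zero {f_zero / 1e9:.2f} GHz, "
          f"pole {f_pole / 1e9:.2f} GHz, DC gain {dc:.2f}, boost at 2.5 GHz {boost_db:.1f} dB")
```

Increasing $R_S$ lowers the DC gain and raises the peaking ($1 + g_m R_S/2$), while changing $C_S$ moves only the zero and pole; fixing $C_S$ therefore leaves both the zero and the DC gain controlled by $R_S$ alone, as noted above.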
The prototype 2 × 2 MIMO-XTCR circuit occupies 0.03 mm²; its die photograph is shown in Fig. 2.7. The physical conditions of channel 1 and channel 2 are expected to be symmetric, so the outputs of channel 1 and channel 2 are normally identical. In our prototype, we added two extra linear equalizer stages in channel 2 to increase flexibility, so that both low-loss and high-loss channels can be handled optimally. For our particular test conditions, a single linear equalizer stage in channel 1 was sufficient to compensate for the residual ISI once the high-frequency boosting from the MIMO (XTR) signal was included. The MIMO-XTCR circuit draws 11.3 mA at 3W distance and 14.3 mA at 2W distance from a 1.2 V supply, regardless of signal speed. The magnitude of the FEXT depends mostly on the channel distance rather than on the data rate, so the gain of the pseudo-differential amplifier needs to be adjusted only according to the channel distance. The linear equalizer consumes 5.3 mA.
Fig. 2.7 Die photograph
Fig. 2.8 2 × 2 MIMO crosstalk cancellation scheme for differential I/Os
2.2.2 2 × 2 MIMO-XTCR in Differential I/Os
Figure 2.8 shows a potential extension of the MIMO-XTCR scheme to differential I/Os. Note that, contrary to the connection in Fig. 1.5, the polarity of the differential signals at the inputs of the two independent channels is swapped so that the crosstalk pulse response on the adjacent differential channel is a negative derivative. With this connection, the crosstalk energy can be used beneficially through our MIMO-XTCR scheme. The differential implementation improves the PSRR and eliminates the need for the 180° phase shifters. However, the channel distance may not be as small as in the single-ended case, because differential I/Os take up at least twice the board area. In addition, since the crosstalk strength in differential I/Os falls off as $1/D^3$, as derived in (1.6), the MIMO bandwidth improvement is not likely to be significant.
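The difference between the two scaling laws is easy to quantify. The sketch below compares how much the crosstalk amplitude falls when the spacing grows from 2W to 3W, assuming amplitude proportional to $1/D$ for single-ended lines (Sect. 3.1.1) and to $1/D^3$ for differential lines (1.6). Only the scaling laws are modeled; proportionality constants and close-spacing non-linearities are ignored, and `xt_change_db` is a hypothetical helper.

```python
import math


def xt_change_db(d_from, d_to, exponent):
    """Change in crosstalk amplitude, in dB, when spacing goes from d_from to
    d_to, assuming amplitude proportional to 1 / D**exponent."""
    return 20 * math.log10((d_from / d_to) ** exponent)


for label, n in (("single-ended, ~1/D", 1), ("differential, ~1/D^3", 3)):
    print(f"{label}: 2W -> 3W changes crosstalk by {xt_change_db(2, 3, n):+.1f} dB")
```

The same spacing change removes about 3.5 dB of single-ended crosstalk but about 10.6 dB of differential crosstalk, which is why far less crosstalk energy (and hence less MIMO boost) remains available in differential channels.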
2.3 Measurement Results
2.3.1 2 × 2 MIMO-XTCR Gain Calibration: Single Input Signal
The receiver proposed in Fig. 2.3 has been implemented and measured in a single-ended I/O environment. A 16-inch FR4 board with a channel loss of 9 dB at 2.5 GHz is used for this test. A Centellax TG2P1A generates a random NRZ signal with a PRBS pattern length of $2^{31} - 1$ on a single-ended channel, and the signal at the receiver output is measured with an Infiniium DSO81204A oscilloscope. The FEXT cancellation and MIMO improvement were observed directly in our test setup after proper termination and equalization. By transmitting a single NRZ signal into either the channel 2 input or the channel 1 input, the FEXT cancellation and MIMO improvement operations can be observed separately, as shown in the test setup in Fig. 2.9; in both cases only the measured output of channel 1 is used. A 5 Gb/s NRZ signal and FR4 channels with 2W distance were used in this calibration example.
When the input to channel 1 is off and only channel 2 is driven, path A carries the crosstalk coupled into the channel 1 receiver, and path B carries the crosstalk cancellation signal derived from the received channel 2 signal. When the gains of the two paths are adjusted correctly, the sum of path A and path B ideally contains no signal. Their eye diagrams are presented in Fig. 2.9. The gain of the pseudo-differential pair has 3-bit resolution and was optimized manually to minimize the amplitude of the sum of these two signals (path A and path B). Figure 2.10 shows the residual FEXT energy as a function of the gain control value. Both over-cancellation and under-cancellation increase the residual FEXT level and degrade the SNR. In this test, the data rate is 5 Gb/s and the channel distance is 2W. For this channel condition, the pseudo-differential gain control code [101] yields the minimum achievable residual error, approximately 51.5 mVrms. The residual FEXT is largest at the data transitions and smallest at the center of the data eye, so it detracts only minimally from the eye center.
Fig. 2.9 Calibrating FEXT cancellation gain and MIMO improvement (100 mV, 100 ps/division)
Fig. 2.10 Pulse response of residual FEXT depending on gain control (β). a Gain control and residual FEXT; b optimal gain control point
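The manual gain search can be pictured as a sweep over the eight 3-bit codes. The sketch below models path A as $-\tau\,dy/dt$ and path B as $+\beta\,dy/dt$ of the received channel 2 waveform, following (2.1), so their sum is $(\beta - \tau)\,dy/dt$. Everything else is an assumption for illustration: the first-order low-pass "channel", the linear code-to-$\beta$ mapping, and the values of `TAU` and `BETA_STEP` are hypothetical and are not meant to reproduce the measured optimum code.

```python
import math
import random


def received_waveform(n_bits=200, samples_per_bit=16, alpha=0.2, seed=1):
    """Hypothetical channel-2 signal at the receiver: random NRZ (+/-0.5 V)
    through a first-order low-pass that stands in for the channel ISI."""
    rng = random.Random(seed)
    y, out = 0.0, []
    for _ in range(n_bits):
        level = rng.choice((-0.5, 0.5))
        for _ in range(samples_per_bit):
            y += alpha * (level - y)
            out.append(y)
    return out


def residual_rms(code, wave, dt, tau, beta_step):
    """RMS of path A + path B with only channel 2 driven.

    Path A (FEXT into channel 1) = -tau * dy/dt and path B (cancellation)
    = +beta * dy/dt, so the sum is (beta - tau) * dy/dt.  The mapping
    beta = code * beta_step is a hypothetical 3-bit gain law.
    """
    beta = code * beta_step
    slope = [(b - a) / dt for a, b in zip(wave, wave[1:])]
    return math.sqrt(sum(((beta - tau) * s) ** 2 for s in slope) / len(slope))


DT = 200e-12 / 16       # 5 Gb/s NRZ, 16 samples per 200 ps UI
TAU = 21e-12            # assumed FEXT coefficient of the 2W channel pair
BETA_STEP = 5e-12       # assumed gain step per code

wave = received_waveform()
scores = {code: residual_rms(code, wave, DT, TAU, BETA_STEP) for code in range(8)}
best = min(scores, key=scores.get)
print(f"best gain code: {best:03b}")
```

Because the residual is proportional to $dy/dt$, it concentrates at the data transitions, and it grows on both sides of the best code, mirroring the over- and under-cancellation behavior in Fig. 2.10.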
With the gain value that minimizes the residual FEXT, the MIMO bandwidth improvement was observed through paths C and D in Fig. 2.9. Path C carries the forward signal with channel ISI. Path D carries the second-derivative signal, which couples to the adjacent channel as crosstalk and returns through the differentiation path in our algorithm. When the signals from these two paths are added, the MIMO (XTR) signal reduces the jitter horizontally and increases the data eye opening vertically by mitigating channel ISI, as shown in the eye diagram of path C plus path D in Fig. 2.9. The parameters of the follow-on linear equalizer were set to remove any residual ISI. This calibration procedure could be performed in parallel for each pair of channels.
2.3.2 2 × 2 MIMO-XTCR Measurement Results: Two Independent Input Signals
During the calibration phase discussed above, only one signal was used. For the measurements in this section, two independent NRZ signals are transmitted on channel 1 and channel 2, and the eye diagrams of the received signals with crosstalk (before MIMO-XTCR) and of the equalized signals (after MIMO-XTCR) have been measured for channel distances of 2W and 3W at different data rates, as shown in Fig. 2.11. To observe how the FEXT degradation varies with the degree of ISI, eye diagrams are measured at several data rates (2, 5, and 6 Gb/s). As the channel distance decreases and the signal frequency increases, the FEXT impact becomes more severe and closes the eye completely, as shown in Fig. 2.11d, e. In the pulse response of NRZ signals at higher data rates, the main tap ($h_0$) is lower and the ISI tails ($h_1, h_2, \ldots$) last longer and have higher amplitude, as explained in Sect. 1.2. The data eyes are therefore more susceptible to FEXT degradation because of the reduced vertical eye opening. For example, with the 5 Gb/s NRZ signal at 2W distance, the vertical and horizontal eye openings are 60 mV/821 mV (7.3 %) and 30 ps/200 ps (15 %UI). Under the same conditions, our MIMO crosstalk cancellation and signal reutilization scheme reduces the peak-to-peak jitter by 67 %UI (from 85 %UI to 18 %UI) and increases the normalized vertical eye opening by 58.2 % (from 7.3 % to 65.5 % of the eye height).
Table 2.1 summarizes the measurement results before and after MIMO-XTCR for various channel distances and data rates. Here, $J_{p-p}$ is the peak-to-peak jitter, $E$ is the vertical eye opening at the center ($h_0 - h_1 - h_2 - \cdots - f_1 - f_2 - \cdots$), and $H$ is the overall height of the data eye ($h_0 + h_1 + h_2 + \cdots$); these quantities are indicated in Fig. 2.11c. The case of a 6 Gb/s NRZ signal with 2W distance is not presented because the gain of the pseudo-differential pair in our scheme was insufficient to generate an adequate crosstalk cancellation signal at this frequency. At 2 Gb/s, MIMO-XTCR reduces jitter more than it increases the eye opening, because at this low frequency the crosstalk degrades only the data transition timing and not the vertical eye opening.

Table 2.1 Measurement results (before and after MIMO-XTCR)

Data rate  Channel distance  Condition  Jitter p–p      Vertical eye E/H       Jitter p–p / eye
(Gb/s)     (mil)                        (ps / %UI)      (mV / mV / %)          improvement
2          360 (3W)          Before     47 / 9.5 %UI    612 / 865 / 70.7 %     0.9 %↓ / 0.9 %↑
                             After      43 / 8.6 %UI    385 / 538 / 71.6 %
2          240 (2W)          Before     135 / 27.0 %UI  594 / 800 / 74.3 %     19.0 %↓ / 0.9 %↑
                             After      40 / 8.0 %UI    340 / 452 / 75.2 %
5          360 (3W)          Before     81 / 40.2 %UI   314 / 815 / 38.5 %     13.2 %↓ / 11.4 %↑
                             After      54 / 27.0 %UI   223 / 447 / 49.9 %
5          240 (2W)          Before     170 / 85.0 %UI  60 / 821 / 7.3 %       67.0 %↓ / 58.2 %↑
                             After      36 / 18.0 %UI   306 / 467 / 65.5 %
6          360 (3W)          Before     83 / 49.8 %UI   206 / 810 / 25.4 %     28.2 %↓ / 12.4 %↑
                             After      36 / 21.6 %UI   200 / 529 / 37.8 %

Fig. 2.11 Measurement results for 2, 5, and 6 Gb/s NRZ signals at 3W and 2W distances (left-side graphs: 200 mV, 100 ps/division; right-side graphs: 100 mV, 100 ps/division). a 2 Gb/s, 360 mil (3W) distance; b 2 Gb/s, 240 mil (2W) distance; c 5 Gb/s, 360 mil (3W) distance; d 5 Gb/s, 240 mil (2W) distance; e 6 Gb/s, 360 mil (3W) distance
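The improvement column of Table 2.1 can be recomputed from the raw values. The sketch below reads it as a difference of normalized quantities (jitter in %UI with UI = 1/data rate, and E/H in percent), which is an interpretation rather than a stated definition; it reproduces the listed 67.0/58.2 and 28.2/12.4 entries. The helper names are hypothetical.

```python
def eye_metrics(rate_gbps, jitter_ps, e_mv, h_mv):
    """Normalize raw eye measurements to the units used in Table 2.1."""
    ui_ps = 1000.0 / rate_gbps                        # unit interval in ps
    return 100 * jitter_ps / ui_ps, 100 * e_mv / h_mv   # (jitter %UI, E/H %)


def improvement(rate_gbps, before, after):
    """Improvement as a difference of normalized values (percentage points).

    before/after are (jitter_ps, E_mV, H_mV) tuples taken from Table 2.1.
    """
    j0, v0 = eye_metrics(rate_gbps, *before)
    j1, v1 = eye_metrics(rate_gbps, *after)
    return j0 - j1, v1 - v0


rows = {
    "5 Gb/s, 240 mil (2W)": (5, (170, 60, 821), (36, 306, 467)),
    "6 Gb/s, 360 mil (3W)": (6, (83, 206, 810), (36, 200, 529)),
}
for name, (rate, before, after) in rows.items():
    dj, dv = improvement(rate, before, after)
    print(f"{name}: jitter -{dj:.1f} %UI, E/H +{dv:.1f} points")
```

Recomputing the remaining rows agrees with the table to within a few tenths of a point (for example, 47 ps at 2 Gb/s gives 9.4 %UI against the listed 9.5 %UI).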
Any FEXT cancellation scheme must also contend with transmitter per-pin de-skew, which adjusts the signal timing in channels of unequal length.
Table 2.2 Simulated performance degradation of the proposed MIMO-XTCR scheme due to per-pin skew (line length 16 in, width 120 mil, height 62.5 mil, distance 2W, 5 Gb/s NRZ signal)

Td (ps), per-pin skew in Fig. 1.2   Jp–p (ps)        Vertical eye↓ (mVpp)
0                                   32 (16 %UI)      320/480
2                                   32 (16 %UI)      311/480 (1.9 %↓)
5                                   32 (16 %UI)      300/480 (4.2 %↓)
10                                  33 (16.5 %UI)    271/480 (10.2 %↓)
20                                  34 (17 %UI)      265/480 (11.5 %↓)
40                                  38 (19 %UI)      180/480 (29.2 %↓)
60                                  50 (25 %UI)      135/480 (38.5 %↓)
Table 2.3 Comparison of FEXT cancellation schemes

Reference, signaling      Process (nm)  Speed (Gb/s)  Jitter↓/eye↑ (%UI)/(%mV)  Power (mW/Gbps/lane)  Area (mm²)
[4], differential         130           10            28/0                      8                     0.024 (XTC)
[5], single-ended         180           5             10.4/11                   6                     0.035 (RX)
[6], differential         180           12.8          14.5/15                   25                    3.75 (TX)
This work, single-ended   130           5             67.0/58.2                 2.8                   0.03 (XTC)
A length mismatch physically close to the transmitter outputs has a limited effect, because the coupled signal will be aligned with the forward signal by the end of the channel. A length mismatch near the receiver input, however, directly skews the timing of the crosstalk, reducing both the jitter margin and the vertical eye opening even after XTCR. In general, the skew mismatch is likely to be distributed along the channel, and its effect will lie somewhere between these two cases. In Table 2.2, we assume that all of the per-pin skew due to length mismatch is lumped at the receiver input (an unlikely worst case) and show the simulated performance degradation of the proposed MIMO-XTCR scheme. In Fig. 1.2, $T_d$ is the amount of timing skew. As an approximate rule of thumb, a quarter-inch mismatch at the receiver input is expected to skew the timing between the received signals by 40 ps. As the table shows, although the impact of per-pin skew grows progressively, the proposed MIMO-XTCR scheme still performs relatively well even with a quarter-inch mismatch.
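Table 2.2 can be related back to physical length mismatch with the rule of thumb above. The sketch below converts skew to mils using 40 ps per quarter inch (160 ps/inch, derived from the text's rule of thumb and assuming all skew is lumped at the receiver as in Table 2.2) and recomputes the degradation column as the drop in E/H in percentage points relative to the zero-skew case, an interpretation that reproduces the listed values. The helper names are hypothetical.

```python
SKEW_PS_PER_INCH = 40 / 0.25     # rule of thumb from the text: 1/4 inch -> 40 ps


def skew_ps(mismatch_inch):
    """Timing skew for a length mismatch lumped at the receiver input."""
    return mismatch_inch * SKEW_PS_PER_INCH


def eye_drop_points(e_mv, e0_mv=320.0, h_mv=480.0):
    """Drop in E/H, in percentage points, relative to the zero-skew case."""
    return 100 * (e0_mv - e_mv) / h_mv


# Simulated vertical eye openings E (mV) from Table 2.2, keyed by Td in ps.
TABLE_2_2 = {0: 320, 2: 311, 5: 300, 10: 271, 20: 265, 40: 180, 60: 135}

for td, e in TABLE_2_2.items():
    mils = 1000 * td / SKEW_PS_PER_INCH
    print(f"Td = {td:2d} ps (~{mils:3.0f} mil): eye drop {eye_drop_points(e):5.2f} points")
```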
Prior crosstalk cancellation techniques have been implemented at the receiver end or at the transmitter end for various channel conditions and data rates. Table 2.3 compares the MIMO-XTCR technique with these prior schemes. Differential signaling is used in [4] and [6]. In [5], staggered I/O cancellation is realized using both the transmitter and the receiver. The MIMO-XTCR technique achieves the largest eye improvement with the lowest power consumption. As technology scaling advances and data rates rise, the corner frequency of the MIMO-XTCR circuits will need to increase further, but the scheme should be equally effective at higher frequencies.
The proposed continuous-time MIMO FEXT cancellation and signal reutilization scheme has three advantages over the previous discrete-time digital schemes [4–6]. First, the simple CR differentiator and wideband pseudo-differential amplifiers generate the crosstalk cancellation signal with low power, whereas discrete-time schemes normally require several high-speed fractional taps that consume significant power. Second, the simple circuit implementation allows a lower chip area than prior architectures [5, 6]; the MIMO FEXT cancellation block spans only two stages for single-ended I/Os and one stage for differential I/Os. Third, as the spacing between channels decreases, the signal integrity improves, owing to the additional high-frequency boosting from the MIMO (XTR) signal energy that is available for reuse. Operation is possible even at extremely narrow channel distances, where the FEXT completely closes the eye opening, as shown in Fig. 2.11d, e. The additional bandwidth improvement also eases the burden on the following ISI equalizer. In conclusion, MIMO-XTCR allows us both to save board area and to increase signal integrity at higher data rates.
2.4 Conclusions
This chapter presents an efficient continuous-time architecture that cancels and reutilizes the crosstalk signal energy to improve the SNR of NRZ signals. Slight modifications of this scheme are applicable to environments where the unwanted crosstalk signal is proportional to the first-order derivative, such as pad-to-pad and pin-to-pin coupling [7]. The measurement results demonstrate the validity of the MIMO-XTCR algorithm and the efficiency of the continuous-time XTCR implementation. The MIMO-XTCR portion has low power consumption (2.8 mW/Gbps/lane) and low area (0.03 mm²). For the prototype 2 × 2 MIMO-XTCR scheme, the peak-to-peak jitter is reduced by 67 %UI and the vertical eye opening improves by 58.2 % at 5 Gb/s, which to the best of our knowledge is the largest performance improvement for any crosstalk cancellation technique. Because of crosstalk, single-ended I/Os have slowly been losing ground to differential I/Os as the go-to technology for high performance. However, single-ended I/Os with MIMO-XTCR retain the I/O pin, pad, and power advantages of single-ended designs while allowing speeds comparable to those of differential I/Os, although PSRR considerations may require converting the single-ended signals to differential signals as early as possible within the chip. Compared with low-speed single-ended I/Os, MIMO-XTCR-enhanced single-ended I/Os allow closer channel spacing, resulting in further board area savings. Finally, beyond its crosstalk cancellation role, the MIMO (XTR) signal improves the SNR and reduces the burden on follow-on equalization circuits.
Acknowledgments This research was supported in part by a grant from the Semiconductor
Research Corporation. The authors thank Frank O’Mahony, Bryan Casper and others at Intel Circuits
Research Laboratory, Mahmoud Reza Ahmadi at AMD and Brett Hardy at LSI for technical help
with this project. The authors also thank the anonymous reviewers for their constructive comments
and feedback.
References
1. M.R. Ahmadi, A. Amirkhany, R. Harjani, A 5 Gbps 0.13 µm CMOS pilot-based clock and data
recovery scheme for high-speed links. IEEE J. Solid-State Circuits 45, 1533–1541 (2010)
2. J.-S. Choi, D.-K. Jeong, M.-S. Hwang, A 0.18-µm CMOS 3.5-Gb/s continuous-time adaptive
cable equalizer using enhanced low-frequency gain control method. IEEE J. Solid-State Circuits
39, 419–425 (2004)
3. K. Fukuda, T. Saito, A 12.3 mW 12.5 Gb/s complete transceiver in 65 nm CMOS, IEEE ISSCC,
pp. 368–369, Feb. 2010
4. J.F. Buckwalter, A. Hajimiri, Cancellation of crosstalk-induced jitter. IEEE J. Solid-State Circuits
41, 621–631 (2006)
5. K.-I. Oh, L.-S. Kim, K.-I. Park, Y.-H. Jun, K. Kim, A 5-Gb/s/pin transceiver for DDR memory
interface with a crosstalk suppression scheme, IEEE CICC, pp. 639–642, Sep. 2008
6. K.-J. Sham, M.R. Ahmadi, S.B.G. Talbot, R. Harjani, FEXT crosstalk cancellation for high-speed serial link design, IEEE CICC, pp. 405–408, Sep. 2006
7. C. Pfeil, BGA breakout challenges. PCB Fabrication Magazine, pp. 10–13, Oct. 2007
Chapter 3
4 × 12 Gb/s MIMO Crosstalk Cancellation and Signal Reutilization Receiver in 65 nm CMOS Process
The demand for higher throughput, combined with the finite number of I/Os, has increased the need for higher data rates per pin [1–6]. Unfortunately, higher data rates usually bring increased inter-symbol interference and crosstalk noise, which demand more power for signal amplification and equalization. Conventionally, crosstalk has been handled by board-level techniques, such as maintaining sufficient distance between channels or shielding the signal channels. Choosing differential I/Os instead of single-ended I/Os can reduce crosstalk, but at the cost of doubling the number of I/O pads and increasing power consumption.
On the other hand, few on-chip crosstalk cancellation (XTC) circuits have been developed to handle multiple channels simultaneously. Figure 3.1 presents a stylized view of input and output signal waveforms for high-speed data transmission over parallel lanes. Each receiver-side input contains two additional crosstalk signals coupled from the two adjacent aggressor channels. The issue of multiple aggressors (the error signal problem) has not been addressed well in the prior art [7–10]. Our findings suggest that crosstalk issues are only adequately addressed when at least four channels are considered, particularly for single-ended channels. In this chapter, we extend the two-channel XTCR scheme of [11] to support an arbitrary number of channels, and we verify the extension with a prototype four-channel XTCR design operating at 12 Gb/s per pin (twice the speed of [11]) on single-ended I/Os for CPU-to-memory interface applications. All crosstalk signals among the four channels are tracked, and the resulting interference is handled appropriately to improve signal integrity.
The rest of the chapter is organized as follows. Section 3.1 describes some of the characteristics of FEXT in multiple single-ended I/O channels that our scheme targets. The proposed XTCR concept and a circuit implementation viable for four-channel single-ended I/Os are presented in Sect. 3.2. Section 3.3 presents measurement results for this prototype analog front-end, and Sect. 3.4 summarizes this work.

Fig. 3.1 The major noise sources in parallel high-speed single-ended I/Os (channels 1 to 5; the received signal in channel 3 is the sum of the forward signal in channel 3 with ISI and the crosstalk from channels 2 and 4; each crosstalk signal follows the derivative d/dt of the received NRZ signal and is zero at the peak-amplitude timing of the received signal)
3.1 Characteristic of Far-End Crosstalk and Proposed Channel Architecture
3.1.1 Factors for Crosstalk Strength
Figure 3.2 shows the peak crosstalk amplitudes, simulated with Agilent ADS, produced by a step transition on an aggressor line for various channel lengths and channel spacings. The transition time of the aggressor signal is fixed at 20 ps, as illustrated at the top of Fig. 3.2. As the signal propagates through the channel, the transition edge becomes smoother because of channel loss, and the growth of crosstalk strength saturates for longer channels. In the bottom graph of Fig. 3.2, the peak crosstalk amplitude increases less rapidly for channel lengths above 11 inches than for channel lengths below 11 inches. This crossover length is a function of the speed of the transition edge: for a sharper edge, the saturation point moves to a longer length. Because the coupling between the lines is primarily inductive for typical impedances [11], crosstalk is proportional to the negative derivative of the signal transition. Owing to this derivative property, most of the crosstalk is generated at the transmitter end of the channel, near the driver, where the data transition is sharp. Therefore, the crosstalk strength is significant even for short-channel I/Os such as CPU-to-memory interfaces. Note that the y-axis in Fig. 3.2 is logarithmic and the curves rise by a regular step for every 0.5 W reduction in spacing, supporting the observation that crosstalk amplitude in single-ended I/Os is inversely proportional to the channel distance (D).

Fig. 3.2 Crosstalk amplitude versus channel length and channel spacing (top: 1 V aggressor step with a 20 ps transition time, line width W, spacing D, length L, peak crosstalk ΔXT ∝ 1/D, with non-linear behavior due to close spacing; bottom: peak crosstalk amplitude ΔXT in single-ended lines, 10 mV to 500 mV on a log scale, versus channel length L from 2 to 20 inches for D = 2W to 6W in 0.5W steps, with a sharp-edge region at short lengths and a smooth-edge region at long lengths)
In summary, the factors that influence the strength of the coupled crosstalk are the channel spacing, the channel length and the transition time of the aggressor signal. Because the crosstalk is the negative derivative of the forward-path transfer function, it behaves like a high-pass filter [11]. A data signal with a sharp edge carries a larger share of its spectral energy at high frequencies, so more of the aggressor energy couples to the victim line. For a longer channel, the coupling accumulates over a longer time because the coupled section between the channels is longer, and this is why the crosstalk increases with length. However, once the channel loss has reduced the signal slope sufficiently, the crosstalk amplitude increases very little. The distance between channels is the most critical factor affecting the crosstalk coupling coefficient.
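The derivative relationship above can be illustrated numerically. The sketch below assumes a raised-cosine aggressor edge and models the far-end crosstalk as −k·dv/dt, where k is a hypothetical lumped coupling constant; neither the edge shape nor k is taken from the measured channels. It compares the peak crosstalk for a 20 ps edge and a slower, hypothetical 40 ps edge.

```python
import math


def raised_cosine_edge(t, t_rise, v_swing=1.0):
    """Aggressor voltage for a 0 -> v_swing raised-cosine transition lasting t_rise seconds."""
    if t <= 0.0:
        return 0.0
    if t >= t_rise:
        return v_swing
    return 0.5 * v_swing * (1.0 - math.cos(math.pi * t / t_rise))


def peak_fext(t_rise, k=1e-12, dt=0.1e-12, v_swing=1.0):
    """Peak |FEXT| when FEXT(t) = -k * dv/dt.

    k (in seconds) is a hypothetical lumped coupling constant; dv/dt is
    approximated by a forward difference on a uniform time grid.
    """
    peak = 0.0
    n_steps = int(2 * t_rise / dt) + 2
    for i in range(n_steps):
        t = i * dt
        dvdt = (raised_cosine_edge(t + dt, t_rise, v_swing)
                - raised_cosine_edge(t, t_rise, v_swing)) / dt
        peak = max(peak, abs(-k * dvdt))
    return peak


sharp = peak_fext(20e-12)   # 20 ps edge, as in Fig. 3.2
smooth = peak_fext(40e-12)  # hypothetical slower edge
print(f"peak FEXT, 20 ps edge: {sharp * 1e3:.1f} mV")
print(f"peak FEXT, 40 ps edge: {smooth * 1e3:.1f} mV")
print(f"sharp/smooth ratio: {sharp / smooth:.2f}")
```

In this model, halving the edge rate halves the peak crosstalk, which mirrors the high-pass behavior described above; the length dependence and the loss-induced saturation are not modelled.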
3.1.2 Proposed Channel Architecture for Multi-Lane Single-Ended I/Os
In this section we discuss the printed circuit board (PCB) channel architecture for multi-lane single-ended data transmission, supported by crosstalk measurement results. The design of the integrated crosstalk cancellation and signal reutilization analog front-end IC is discussed in Sect. 3.2. Figure 3.3 (top) shows the final proposed channel spacing for the multi-lane single-ended I/Os: 2W within a bundle, 3W between bundles and 2W within the next bundle. This spacing is similar to, but more compact than, that of differential lines (Fig. 3.3, bottom: 2W, 5W, 2W). Because our channels are single-ended, however, the same number of traces carries twice as many data channels. We intentionally bundle channels 1 and 2 (2W) and channels 3 and 4 (2W) closely to take advantage of the increased high-frequency bandwidth that the XTCR technique provides [11]. Conversely, we increase the bundle-to-bundle spacing between channels 2 and 3 (3W) to reduce the residual error term at the XTCR analog front-end to a reasonable level. Appendix B mathematically derives the origin of this error signal when previous two-channel XTC schemes are applied to four or more channels. Since the strength of the error signal is inversely proportional to the spacing between bundles, the error can be reduced sufficiently by trading off bundle-to-bundle spacing.
While differential I/Os provide a limited shielding advantage against crosstalk, in our proposed single-ended I/Os the XTCR analog front-end minimizes the crosstalk and reutilizes its energy for higher signal throughput, provided that the PSRR of the channel and the analog front-end allows this change. The
final goal of this research is to achieve twice the data rate of differential I/Os over PCB traces while occupying less physical channel area than differential I/Os. The trade-offs between the proposed multi-lane single-ended I/O scheme and conventional multi-lane differential I/O schemes are listed on the left-hand side of Fig. 3.3.

Fig. 3.3 Physical description of the proposed single-ended multi-lane (top) and the conventional differential multi-lane channel architecture (bottom). [Figure: top, proposed XTCR single-ended multi-lanes: TX1–TX4 drive X1(ω)–X4(ω) over 6-inch traces with 2W/3W/2W spacing into RX1–RX4 (outputs Y1(ω)–Y4(ω)), grouped into XTCR pairs; the forward signal H(ω)X2(ω) produces 1st adjacent crosstalk (2W) −jωβH(ω)X2(ω), 2nd adjacent crosstalk (3W) −jωδH(ω)X2(ω) and non-adjacent crosstalk (5W) −jωζH(ω)X2(ω); listed properties: 1x data transmission per physical lane, higher I/O and board efficiency, crosstalk cancellation, XTCR high-frequency boosting. Bottom, conventional differential multi-lanes: four differential pairs TX1–TX4 (2W within a pair, 5W between pairs); listed properties: 0.5x data transmission per physical lane, lower I/O and board efficiency, crosstalk shielding, better PSRR.]
Six-inch PCB traces with the spacings described in Fig. 3.4a were used as our channels and to test the XTCR analog front-end of Sect. 3.2. The measured insertion loss of the 4 channels (11 dB at 6 GHz) and the crosstalk between them are shown in the frequency domain in Fig. 3.4b. Note that the crosstalk from non-adjacent channels (between channels 1 and 3, and between channels 2 and 4) is about 25 dB below the forward signal at 6 GHz and can be ignored. For the measurement in Fig. 3.4c, we transmit a 500 mVpp signal on channel 2 only and observe the eye diagrams at the outputs of channels 1–4. The forward signal (channel 2), the 2W-distance FEXT (channel 1), the 3W-distance FEXT (channel 3) and the 5W-distance FEXT (channel 4) are presented. Compared with the amplitudes of the 2W FEXT (276 mVpp) and the 3W FEXT (129 mVpp), the 5W FEXT amplitude is only 39 mVpp, which is close to the white noise amplitude in
our measurement environment. We therefore handle only the adjacent crosstalk (2W or 3W FEXT) and ignore the 5W FEXT from channels that are not directly adjacent in the analysis that follows.

Fig. 3.4 Channel description (a), measured channel loss and FEXT in the frequency domain (b), measured eye-diagrams of a forward signal and FEXT at 12 Gb/s in the time domain (c). [Figure: (a) channel loss and crosstalk description of the proposed single-ended I/Os, channels 1–4 with alternating 2W/3W spacing; with X2(ω) on channel 2, the forward signal is H(ω)X2(ω), the 2W FEXT on channel 1 is −jωβH(ω)X2(ω), the 3W FEXT on channel 3 is −jωδH(ω)X2(ω) and the 5W FEXT on channel 4 is −jωζH(ω)X2(ω). (b) measured insertion loss and crosstalk, amplitude (dB, 0 to −60) versus frequency (0 to 12 GHz), with the crosstalk decreasing with spacing roughly as 1/D. (c) measured time-domain eyes: forward signal 496 mVpp, 2W FEXT 276 mVpp, 3W FEXT 129 mVpp, 5W FEXT 39 mVpp.]

Fig. 3.5 Measured timing skew between the cursor of the forward signal and the zero-crossing of the crosstalk in the matched channels. [Figure: forward-signal and crosstalk pulse responses in channels with matched length; the skew Td between the forward-signal cursor and the crosstalk zero crossing is 3.3 ps (2W), 7.3 ps (3W) and 20.5 ps (5W).]
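The measured eye amplitudes of Fig. 3.4c can be restated as levels relative to the forward signal. The short sketch below does that conversion with 20·log10 of the amplitude ratio, taking the 496 mVpp forward-signal amplitude from the figure as the reference; it is a bookkeeping aid, not part of the measurement flow.

```python
import math

FORWARD_MVPP = 496.0  # forward-signal amplitude on channel 2, Fig. 3.4c
FEXT_MVPP = {         # measured FEXT amplitudes, Fig. 3.4c
    "2W (channel 1)": 276.0,
    "3W (channel 3)": 129.0,
    "5W (channel 4)": 39.0,
}


def db_relative(amplitude, reference):
    """Voltage-amplitude ratio in dB (20*log10)."""
    return 20.0 * math.log10(amplitude / reference)


levels = {name: db_relative(a, FORWARD_MVPP) for name, a in FEXT_MVPP.items()}
for name, level in levels.items():
    print(f"{name}: {level:+.1f} dB relative to the forward signal")
```

In this time-domain view the 5W crosstalk sits roughly 22 dB below the forward signal, which is in line with neglecting it.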
Next, we consider the impact of channel spacing on the crosstalk arrival time and the ensuing impact on our XTCR scheme. Figure 3.5 shows the arrival-time mismatch between the forward signal and the crosstalk signal for different channel spacings. In our crosstalk cancellation architecture, discussed in Sect. 3.2, we assume that the peak of the forward signal pulse and the zero crossing of the crosstalk pulse are aligned in time. In reality the crosstalk arrives slightly later because of a difference in flight time, even between channels with physically matched lengths, as illustrated at the top of Fig. 3.5. The timing mismatch between the forward signal and the crosstalk in channels with 2W, 3W and 5W spacing is measured as 3.3, 7.3 and 20.5 ps, respectively. This misalignment of the arrival times can affect the performance of the XTCR and equalization circuits in our analog front-end. However, the crosstalk in channels with a larger timing mismatch, i.e., a larger separation, tends to be weaker and therefore less problematic. For example, the 5W FEXT is 25 dB below the forward signal at the Nyquist frequency, as shown in the frequency-domain graph in Fig. 3.4b. For the crosstalk signals that cannot be ignored (2W and 3W FEXT), the timing mismatch is only 3 to 7 ps. Although this arrival-time mismatch can potentially degrade the XTCR performance, its impact is limited because we pair up the two channels with the closest inter-channel distance (2W).
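The effect of the arrival-time mismatch can be estimated with a simple first-order model, which is our own illustration and not the authors' analysis. The sketch assumes a Gaussian received pulse with a hypothetical width σ = 40 ps, a crosstalk pulse equal to −β times its derivative delayed by Td, and a cancellation path that adds +β times the undelayed derivative. It reports the peak residual as a fraction of the peak uncancelled crosstalk for the three measured skews. To first order the residual is β·Td·p''(t), so it grows roughly linearly with Td; the absolute percentages depend strongly on the assumed σ and ignore the single-pole differentiator and the LE, so only the trend is meaningful.

```python
import math

SIGMA = 40e-12  # hypothetical Gaussian pulse width; not a measured value


def dpulse(t, sigma=SIGMA):
    """Time derivative of the unit-peak Gaussian pulse exp(-t^2 / (2 sigma^2))."""
    return -t / sigma ** 2 * math.exp(-t * t / (2.0 * sigma * sigma))


def residual_fraction(td, beta=1.0, sigma=SIGMA, dt=0.1e-12):
    """Peak residual after cancellation divided by the peak uncancelled crosstalk.

    Crosstalk arriving td late:                        -beta * p'(t - td)
    Cancellation signal (aligned to the forward signal): +beta * p'(t)
    """
    n = int(12.0 * sigma / dt) + 1
    grid = [-6.0 * sigma + i * dt for i in range(n)]
    peak_xt = max(abs(-beta * dpulse(t - td, sigma)) for t in grid)
    peak_res = max(abs(-beta * dpulse(t - td, sigma) + beta * dpulse(t, sigma)) for t in grid)
    return peak_res / peak_xt


for td_ps in (0.0, 3.3, 7.3, 20.5):  # ideal case plus the measured skews of Fig. 3.5
    frac = residual_fraction(td_ps * 1e-12)
    print(f"Td = {td_ps:4.1f} ps -> residual = {100 * frac:5.1f} % of the crosstalk")
```

The larger relative residual at 20.5 ps matters little in practice because it applies to the 5W crosstalk, which is already about 25 dB below the forward signal.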
3.2 Proposed Low Power XTCR Analog Front-End for Multi-Lanes
Far-end crosstalk (FEXT) cancellation can be attempted at either the receiver side or the transmitter side. The efficiency of a transmitter-side discrete-time FIR XTC filter versus a receiver-side analog-IIR XTC filter is conceptually compared in Appendix C. This chapter focuses on the receiver-side analog-IIR XTC implementation because of its low complexity and low power. To generate the crosstalk cancellation signal at the receiver side, the received signals of the two adjacent channels must also be known. The received signal of any channel is not crosstalk-free: it contains both the forward signal and crosstalk signals, as illustrated in Fig. 3.1. We have previously identified the second-derivative crosstalk problem and the need to track the forward signals along with the crosstalk signals in multi-lane parallel I/Os [11]. In Sect. 3.2.1, we track the signal and crosstalk for an 8-channel XTCR scheme to show that the scheme can be extended to an arbitrarily large number of parallel I/Os. Our prototype, however, is limited to 4-channel XTCR by pad and die area limitations. Circuit-level details of the prototype, implemented in 65 nm CMOS, are presented in Sect. 3.2.2.
3.2.1 XTCR on Multi-Lanes (≥4)
Figure 3.6 shows block-level diagrams of the channel and the XTCR analog front-end. Assume that 8 independent single-ended NRZ signals (X1–X4, XA–XD) are applied to the 8 channels. The signals pass through PCB traces with their associated ISI and crosstalk coupling, and the analog front-end equalizes both the crosstalk and the channel ISI. The matrix representation in Fig. 3.7 shows the transmitted signals (X1–X4, XA–XD), the channels (H, the diagonal elements), the received signals (Y1–Y4, YA–YD) and the crosstalk terms (−jωβH and −jωδH, which are derivatives of the channel response [11]). The top right-hand matrix in Fig. 3.7 shows that YA and YD have one aggressor each, while Y1–Y4, YB and YC have two aggressors each. G is the combined gain of the adder and the linear equalizer (LE).

We equalize the crosstalk by differentiating the received signals of the adjacent channels and adding them with the appropriate gains (β, δ), as shown in the top left-hand matrix in Fig. 3.7. Z1–Z4 and ZA–ZD then appear at the analog front-end output. For example, the channel 2 output signal (Z2) is free of crosstalk from its 2 adjacent channels and contains additional, beneficially reutilized crosstalk energy that boosts the high-frequency gain by (ω²β² + ω²δ²)GHX2. However, error terms (ω²βδGHXB + ω²βδGHX4) also appear at the channel 2 output. We therefore set the spacing between channels B and 1 and between channels 2 and 3 to 3W, which is still much smaller (and more demanding) than the ≥5W spacing currently used in differential I/Os to avoid crosstalk. Pairing every two channels into a bundle and maintaining sufficient distance between bundles allows the error terms to approach negligible levels. The throughput can then be increased by up to 2X, because the number of data channels doubles compared with conventional differential I/O data transmission. We implemented only 4-channel (channels 1–4) XTCR in this work because of the limited number of I/O pads on the die, so ω²βδGHXB, one of the 2 error terms (ω²βδGHXB + ω²βδGHX4) at the channel 2 output, is not included in our measurements. However, 4 channels address all the issues likely to arise in any system with 4, 8 or more channels. The signal components covered in this chapter are shown inside the dotted box in the bottom matrix of Fig. 3.7. In summary, within a bundle (channels A–B, 1–2, 3–4 or C–D) we use XTCR both to reduce crosstalk and to increase signal bandwidth. Between bundles (channels B–1, 2–3 and 4–C) we only suppress crosstalk using XTC, and the remaining error terms are negligible.

Fig. 3.6 Proposed FEXT channel and XTCR analog front-end block diagram. [Figure: FEXT channel model with channels A, B, 1–4, C and D, each with response H(ω), coupled to their neighbors by 2W FEXT −jωβH(ω) or 3W FEXT −jωδH(ω); in the XTCR analog front-end, each received signal Y(ω) is combined with the jωβ- or jωδ-differentiated signals of its neighbors in an adder followed by an LE (combined gain G) to produce Z(ω); channels 1–4 form the implemented 4-channel XTCR.]

Fig. 3.7 Frequency-domain matrix representation of the crosstalk channel and the XTCR analog front-end. The FEXT model in the proposed channel architecture is

$$
\begin{pmatrix} Y_A\\ Y_B\\ Y_1\\ Y_2\\ Y_3\\ Y_4\\ Y_C\\ Y_D \end{pmatrix}
=
\begin{pmatrix}
H & -j\omega\beta H & 0 & 0 & 0 & 0 & 0 & 0\\
-j\omega\beta H & H & -j\omega\delta H & 0 & 0 & 0 & 0 & 0\\
0 & -j\omega\delta H & H & -j\omega\beta H & 0 & 0 & 0 & 0\\
0 & 0 & -j\omega\beta H & H & -j\omega\delta H & 0 & 0 & 0\\
0 & 0 & 0 & -j\omega\delta H & H & -j\omega\beta H & 0 & 0\\
0 & 0 & 0 & 0 & -j\omega\beta H & H & -j\omega\delta H & 0\\
0 & 0 & 0 & 0 & 0 & -j\omega\delta H & H & -j\omega\beta H\\
0 & 0 & 0 & 0 & 0 & 0 & -j\omega\beta H & H
\end{pmatrix}
\begin{pmatrix} X_A\\ X_B\\ X_1\\ X_2\\ X_3\\ X_4\\ X_C\\ X_D \end{pmatrix},
$$

the proposed XTCR analog front-end acting on the received signals is

$$
\begin{pmatrix} Z_A\\ Z_B\\ Z_1\\ Z_2\\ Z_3\\ Z_4\\ Z_C\\ Z_D \end{pmatrix}
=
\begin{pmatrix}
G & j\omega\beta G & 0 & 0 & 0 & 0 & 0 & 0\\
j\omega\beta G & G & j\omega\delta G & 0 & 0 & 0 & 0 & 0\\
0 & j\omega\delta G & G & j\omega\beta G & 0 & 0 & 0 & 0\\
0 & 0 & j\omega\beta G & G & j\omega\delta G & 0 & 0 & 0\\
0 & 0 & 0 & j\omega\delta G & G & j\omega\beta G & 0 & 0\\
0 & 0 & 0 & 0 & j\omega\beta G & G & j\omega\delta G & 0\\
0 & 0 & 0 & 0 & 0 & j\omega\delta G & G & j\omega\beta G\\
0 & 0 & 0 & 0 & 0 & 0 & j\omega\beta G & G
\end{pmatrix}
\begin{pmatrix} Y_A\\ Y_B\\ Y_1\\ Y_2\\ Y_3\\ Y_4\\ Y_C\\ Y_D \end{pmatrix},
$$

and the final resultant matrix of analog front-end outputs, in which the off-diagonal ω²βδGH entries are the error signals remaining after XTCR (a dotted box in the original figure marks the signals covered in this work), is

$$
\begin{pmatrix} Z_A\\ Z_B\\ Z_1\\ Z_2\\ Z_3\\ Z_4\\ Z_C\\ Z_D \end{pmatrix}
=
\begin{pmatrix}
(1+\omega^2\beta^2)GH & 0 & \omega^2\beta\delta GH & 0 & 0 & 0 & 0 & 0\\
0 & (1+\omega^2\beta^2+\omega^2\delta^2)GH & 0 & \omega^2\beta\delta GH & 0 & 0 & 0 & 0\\
\omega^2\beta\delta GH & 0 & (1+\omega^2\beta^2+\omega^2\delta^2)GH & 0 & \omega^2\beta\delta GH & 0 & 0 & 0\\
0 & \omega^2\beta\delta GH & 0 & (1+\omega^2\beta^2+\omega^2\delta^2)GH & 0 & \omega^2\beta\delta GH & 0 & 0\\
0 & 0 & \omega^2\beta\delta GH & 0 & (1+\omega^2\beta^2+\omega^2\delta^2)GH & 0 & \omega^2\beta\delta GH & 0\\
0 & 0 & 0 & \omega^2\beta\delta GH & 0 & (1+\omega^2\beta^2+\omega^2\delta^2)GH & 0 & \omega^2\beta\delta GH\\
0 & 0 & 0 & 0 & \omega^2\beta\delta GH & 0 & (1+\omega^2\beta^2+\omega^2\delta^2)GH & 0\\
0 & 0 & 0 & 0 & 0 & \omega^2\beta\delta GH & 0 & (1+\omega^2\beta^2)GH
\end{pmatrix}
\begin{pmatrix} X_A\\ X_B\\ X_1\\ X_2\\ X_3\\ X_4\\ X_C\\ X_D \end{pmatrix}
$$
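The final resultant matrix of Fig. 3.7 can be cross-checked numerically. The sketch multiplies the front-end matrix by the FEXT channel matrix for the A, B, 1–4, C, D ordering, using arbitrary test values for ω, β, δ, H and G (assumptions chosen only to exercise the algebra). It confirms that the first off-diagonal crosstalk terms cancel, that the diagonal gains are (1 + ω²β²)GH at the two ends and (1 + ω²β² + ω²δ²)GH elsewhere, and that ω²βδGH error terms remain two positions away.

```python
# Channel order A, B, 1, 2, 3, 4, C, D. Neighbouring channels alternate between
# 2W spacing inside a bundle (coefficient beta) and 3W between bundles (delta).
LABELS = ["A", "B", "1", "2", "3", "4", "C", "D"]


def build_matrices(n, w, beta, delta, h, g):
    """Return (M, F): FEXT channel Y = M X and XTCR front-end Z = F Y."""
    m = [[0j] * n for _ in range(n)]
    f = [[0j] * n for _ in range(n)]
    for k in range(n):
        m[k][k] = h
        f[k][k] = g
    for k in range(n - 1):
        c = beta if k % 2 == 0 else delta  # link k connects channel k and k + 1
        m[k][k + 1] = m[k + 1][k] = -1j * w * c * h  # FEXT term: -j w c H
        f[k][k + 1] = f[k + 1][k] = 1j * w * c * g   # cancellation term: +j w c G
    return m, f


def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


# Arbitrary test values, chosen only to exercise the algebra.
W, BETA, DELTA, H, G = 2.0, 0.3, 0.1, 0.8 - 0.2j, 1.5
M, F = build_matrices(len(LABELS), W, BETA, DELTA, H, G)
R = matmul(F, M)  # Z = F M X

for label, row in zip(LABELS, R):
    print(f"Z{label}:", " ".join(f"{abs(x):6.3f}" for x in row))
```

Because both F and M are tridiagonal, their product can only have entries up to two positions off the diagonal, which is why the error terms appear exactly there.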
3.2.2 Prototype Low Power Analog Front-End Circuit Design
Figure 3.8 shows the circuit implementation of the proposed XTCR analog front-end. One improvement over the architecture of our previous work [11] is that the single-ended-to-differential converters (SDCs) are now placed at the very beginning of the analog front-end for better PSRR. At 100 MHz, the simulated PSRR of the overall architecture is 7.6 dB, 5 dB higher than that of the previous scheme [11]. The phase delays of the crosstalk signal (path 1) and the XTC signal (path 2) are equalized by adding a low-pass compensator, 1/(1 + jωRC), to path 1 to match the differentiator, jωRC/(1 + jωRC), in path 2. Both have the same pole frequency (10 GHz), so the phase difference between the two paths is 90° at all frequencies, as shown in the left inset of Fig. 3.8. Differentiating the crosstalk signal (from the adjacent channel) can amplify high-frequency circuit noise. Our time-domain measurements show 39 mVppd of white noise, 233 mVppd of crosstalk and 500–600 mVppd of NRZ signal at the analog front-end output. The white noise level is thus 23 dB below the signal level (SNR) even after XTCR, whereas the crosstalk level is only about 7.5 dB below the signal level (SIR). Our XTCR scheme suppresses the crosstalk by roughly 21.3 dB, as shown in Fig. 3.14. Eliminating the crosstalk therefore improves the SNIR far more than the slight boost of high-frequency circuit noise caused by XTCR degrades it.

Fig. 3.8 Proposed 4 × 4 XTCR analog front-end circuit diagram. [Figure: each of channels 1–4 has a 50 Ω input termination (Vi1–Vi4), an SDC, VGAs built from 1X, 1X, 2X and 4X cells switched by b0–b2, an adder and an LE producing Vo1–Vo4; inset: differentiator (path 2) and compensator (path 1) with ωo = 1/RC, whose amplitude responses cross at ωo and whose phase responses differ by 90°, forming the crosstalk cancellation signal.]
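The 90° phase relationship between the two paths follows directly from the two transfer functions. The sketch below evaluates 1/(1 + jωRC) and jωRC/(1 + jωRC) with the 10 GHz pole quoted in the text and checks that their phase difference is 90° at every frequency, while their magnitude ratio is ωRC = f/fp, i.e., path 2 is a scaled derivative of path 1. The list of test frequencies is an arbitrary choice.

```python
import cmath
import math

F_POLE = 10e9                       # common pole frequency from the text
RC = 1.0 / (2.0 * math.pi * F_POLE)


def compensator(f):
    """Path 1 low-pass compensator 1 / (1 + j w RC)."""
    jwrc = 1j * 2.0 * math.pi * f * RC
    return 1.0 / (1.0 + jwrc)


def differentiator(f):
    """Path 2 first-order differentiator j w RC / (1 + j w RC)."""
    jwrc = 1j * 2.0 * math.pi * f * RC
    return jwrc / (1.0 + jwrc)


def phase_difference_deg(f):
    """Phase of path 2 minus phase of path 1, in degrees."""
    return math.degrees(cmath.phase(differentiator(f)) - cmath.phase(compensator(f)))


for f in (100e6, 1e9, 6e9, 10e9, 20e9):
    ratio = abs(differentiator(f)) / abs(compensator(f))
    print(f"{f / 1e9:5.1f} GHz: phase difference {phase_difference_deg(f):6.2f} deg, "
          f"|path 2| / |path 1| = {ratio:.3f}")
```

Without the compensator, the differentiator alone would drift from 90° toward 0° above its pole, so the cancellation signal would no longer line up with the crosstalk it is meant to cancel.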
The circuit diagram of the SDC and the associated AC simulation result are shown in Fig. 3.9. The input and outputs of the SDC are ac-coupled, and the coupling capacitances are chosen to be large enough (2.4 pF) for the SDC to pass a 2⁷ − 1 PRBS pattern, giving wide bandwidth at the low-frequency end. The DC voltages of the SDC outputs, annotated as VBP and VBN on the left-hand side of Fig. 3.9, are 297 mV and 803 mV, respectively. The drain impedance is designed to match the source impedance, ZL = RL || ZX, where ZX is the impedance of the RCCR differentiation and compensation passive network. The gains of the SDC from the input (VIN) to the source (VP) and drain (VN) nodes can be expressed as

$$\frac{V_P(\omega)}{V_{IN}(\omega)} = \frac{g_m Z_L}{1 + g_m Z_L}, \qquad \frac{V_N(\omega)}{V_{IN}(\omega)} = -\frac{g_m Z_L}{1 + g_m Z_L} \tag{3.1}$$

Equation (3.1) shows that the magnitude of the SDC gain from the input to the drain (VN) node is identical to the input-to-source gain (VP), although the two have opposite polarity. The right-hand graph of Fig. 3.9 shows the simulated magnitudes of |VP/VIN|, |VN/VIN| and |(VP − VN)/VIN|.

Fig. 3.9 Single-to-differential converter circuit diagram and simulation results. [Figure: left, SDC with input vIN, 5.4 mA bias current, RL = 55 Ω loads, 2.4 pF coupling capacitors, bias voltages VBP and VBN, and outputs vP and vN, each loaded by ZX; right, simulated gain (dB, 0 to −20) versus frequency (10 MHz to 10 GHz) of 20 log|VP/VIN|, 20 log|VN/VIN| and 20 log|(VP − VN)/VIN|, with the differential gain 6 dB above the single-ended gains.]
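Equation (3.1) implies that the differential output is always 6 dB above either single-ended output, whatever the values of gm and ZL. The sketch below evaluates Eq. (3.1) for one hypothetical operating point (gm = 0.04 S, ZL = 25 Ω, neither taken from the design) and reports the three gains plotted in Fig. 3.9.

```python
import math


def sdc_gains(gm, z_l):
    """Eq. (3.1): V_P/V_IN and V_N/V_IN for equal source and drain loads Z_L."""
    a = gm * z_l / (1.0 + gm * z_l)
    return a, -a


def to_db(x):
    """Voltage gain magnitude in dB."""
    return 20.0 * math.log10(abs(x))


GM = 0.04   # hypothetical transconductance (S)
Z_L = 25.0  # hypothetical load R_L || Z_X (ohm)

vp, vn = sdc_gains(GM, Z_L)
vd = vp - vn  # differential gain (V_P - V_N) / V_IN

print(f"20log|VP/VIN|      = {to_db(vp):6.2f} dB")
print(f"20log|VN/VIN|      = {to_db(vn):6.2f} dB")
print(f"20log|(VP-VN)/VIN| = {to_db(vd):6.2f} dB")
print(f"differential advantage = {to_db(vd) - to_db(vp):.2f} dB")
```

Since the two outputs have equal magnitude and opposite sign, their difference is exactly twice either one, i.e., 20·log10(2) ≈ 6.02 dB higher.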
Figure 3.10 shows the VGA circuit and its simulation result. This VGA allows us to control the cancellation signal over a wide range, unlike [12], which can only attenuate the output of the passive CR filter. Depending on the required gain, the current mirrors are switched in or out, and the variable gain covers the range from 0.4 to 1.5 V/V. The input overdrive voltage (input swing range) stays constant for all gain settings because the ratio of the gm-cell size to its DC current is kept the same, as shown in the circuit diagram in Fig. 3.10. In addition, the output DC margin is designed to remain sufficient when the maximum gain is selected and all transistors are turned on. All 4 current mirrors shown in Fig. 3.10 can be shut off to block the signal path when required during the calibration process.
Figure 3.11 presents the circuit diagram of the adder and its simulation result. The current-mode signal adder combines 2–3 high-speed signal branches, each carrying a large current. Because these 2–3 large DC currents all flow into one output node, the structure at the top of Fig. 3.11 suffers an excessive DC voltage drop, and the adder output DC voltage is too low to support sufficient voltage swing. To prevent this excessive DC voltage drop at the output, current-bleeding PMOS transistors are used. They also contribute additional gmp and reduce large-signal distortion thanks to their push-pull characteristics. The detailed connections of the coupling capacitors, omitted from Fig. 3.8 for clarity, are shown at the bottom of Fig. 3.11. Instead of using NMOS bleeding circuits with the output loads connected to ground, we chose PMOS bleeding circuits because, for the same power consumption, the adder gain (gmn + gmp) is higher owing to the higher mobility of the NMOS devices. Standard two-stage differential linear equalizers remove any ISI remaining after XTCR, with controlled high-frequency boosting. The die photo of the 4-channel XTCR analog front-end is shown in Fig. 3.12. The XTCR occupies 0.036 mm² of chip area per lane.

Fig. 3.10 The VGA structure used in the forward and the differentiation paths. [Figure: four switchable differential gm cells of relative size 1x, 1x, 2x and 4x, biased by currents I, I, 2I and 4I and driven by the RCCR differentiation and compensation passive network; simulated gain (V/V) versus frequency (100 MHz to 10 GHz), with a gain range of 0.4 to 1.5 V/V.]

Fig. 3.11 Circuit diagram of the adder circuit. [Figure: top, NMOS input devices (VP1–VP3, VN1–VN3), each carrying current I, share the output nodes and bias resistors, causing an excessive DC voltage drop; bottom, PMOS bleeding devices (gmp) added alongside the NMOS inputs (gmn). Annotation: PMOS bleeding reduces the DC voltage drop, contributes additional gmp and reduces large-signal distortion due to push-pull characteristics.]
Fig. 3.12 Die photograph for XTCR analog front-end (XTCR adder occupies 0.036 mm²/lane). [Photo: four SDC, XTCR and LE blocks.]
3.3 Verifying Crosstalk Cancellation Using Multi-Lane Signals
3.3.1 XTCR Gain Calibration
Six-inch PCB traces with 24-inch coaxial connection cables were used to test the prototype. To find the optimal XTC gain between, for example, channels 1 and 2, we transmit a single 500 mVp-p 10 Gb/s NRZ signal on channel 1 only and calibrate the differentiation-path gain that yields the minimum residual crosstalk, as shown in the right-hand graph of Fig. 3.13. If the VGA gain in the differentiation path is not properly adjusted, the crosstalk is either over- or under-compensated, and the residual crosstalk remains at a higher amplitude. The graph at the bottom right of Fig. 3.13 plots the residual crosstalk amplitude after XTC versus the VGA gain setting of the differentiation path. Figure 3.14 shows that the crosstalk coupled from channel 1 to channel 2, with an initial amplitude of 233 mVppd, is optimally reduced to 59 mVppd at the analog front-end output for the gain setting [b2 b1 b0] = [011]. After calibrating the XTC gain for the channel 1 to channel 2 crosstalk, only 20 mV of the residual noise is deterministic, showing that the crosstalk is effectively suppressed (by a factor of 10 or more). All other differentiation VGA gains in channels 1–4 were adjusted in a similar manner.
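The calibration described above is an exhaustive search over the eight 3-bit differentiation-path VGA codes. The sketch below mimics that search with a hypothetical residual model: it assumes that the VGA gain steps linearly from 0.4 to 1.5 V/V across codes [000] to [111] (the text gives only the end points), that the residual crosstalk is proportional to the relative gain error |g − g_opt|/g_opt plus a fixed noise floor, and that g_opt = 0.85 V/V, an arbitrary value. Only the structure of the search, not the numbers, reflects the chapter.

```python
GAIN_MIN, GAIN_MAX = 0.4, 1.5                 # VGA range quoted in the text (V/V)
CODES = [format(c, "03b") for c in range(8)]  # [b2 b1 b0] from "000" to "111"


def vga_gain(code):
    """Hypothetical linear map from a 3-bit code to VGA gain."""
    return GAIN_MIN + (GAIN_MAX - GAIN_MIN) * int(code, 2) / 7.0


def residual_mvppd(code, g_opt=0.85, xt_mvppd=233.0, noise_mvppd=39.0):
    """Hypothetical residual model: uncancelled crosstalk fraction plus a noise floor.

    Under-compensation (gain too low) and over-compensation (gain too high)
    both leave |gain - g_opt| / g_opt of the original crosstalk behind.
    """
    mismatch = abs(vga_gain(code) - g_opt) / g_opt
    return xt_mvppd * mismatch + noise_mvppd


def calibrate(codes=CODES):
    """Exhaustive search: return the code with the smallest residual crosstalk."""
    return min(codes, key=residual_mvppd)


best = calibrate()
for code in CODES:
    marker = "  <- selected" if code == best else ""
    print(f"[{code}] gain {vga_gain(code):.3f} V/V, residual {residual_mvppd(code):6.1f} mVppd{marker}")
```

With only eight codes, an exhaustive sweep is cheap and avoids any assumption that the residual is a smooth function of the code.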
The measured eye diagrams at 10 Gb/s of each component in the matrix are shown at the bottom of Fig. 3.15. The high-frequency boosting of the LE is adjusted to handle any ISI that remains after the XTCR high-frequency boosting. A secondary error term (ω²βδHGX4) decreases as the coupling coefficient between channels 2 and 3 (δ) decreases, i.e., as their spacing increases. To reduce this secondary error term, the distance between channels 2 and 3 was set to 2.5W, 3W and 3.5W, and the measured amplitudes of the residual error signal fall to 87, 73 and 42 mVppd, respectively, as illustrated in Fig. 3.16. This secondary error term is not addressed in previous work [11–13], where only the crosstalk between 2 channels is handled. However, generalized XTC with multiple channels will invariably produce this error term, which is handled here. We are able to handle 4 channels in this prototype
(and verified up to 8 channels in simulation) by separating the bundles by 3W, so that the error-term gain (∼δ) becomes negligible.

Fig. 3.13 Crosstalk cancellation gain calibration. [Figure: channel 1 (Vin1) passes through an SDC, VGA (path 1) and LE to Vout1, while the differentiation-path VGA (path 2) feeding the channel 2 adder is swept; on-chip 50 Ω terminations; eye diagrams at 200 mV/div and 50 ps/div show under-compensation at [000], optimal compensation at [011] and over-compensation at [111] ([000] is the minimum VGA gain setting).]

Fig. 3.14 Crosstalk cancellation performance at channel 2 output (path 1 and path 2 in Fig. 3.13). [Figure: (a) crosstalk signal eye diagram and vertical crossing-point histogram (path 1), 233 mVppd; (b) crosstalk signal eye diagram and vertical crossing-point histogram after XTC (path 1 + path 2), 59 mVppd in total, of which 20 mVppd is deterministic noise and the remainder random noise.]

Fig. 3.15 Measured signal components at 10 Gb/s of the crosstalk channel with XTCR analog front-end. The 4-channel FEXT channel model and XTCR analog front-end are

$$
\begin{pmatrix} Y_1\\ Y_2\\ Y_3\\ Y_4 \end{pmatrix}
=
\begin{pmatrix}
H & -j\omega\beta H & 0 & 0\\
-j\omega\beta H & H & -j\omega\delta H & 0\\
0 & -j\omega\delta H & H & -j\omega\beta H\\
0 & 0 & -j\omega\beta H & H
\end{pmatrix}
\begin{pmatrix} X_1\\ X_2\\ X_3\\ X_4 \end{pmatrix},
\qquad
\begin{pmatrix} Z_1\\ Z_2\\ Z_3\\ Z_4 \end{pmatrix}
=
\begin{pmatrix}
G & j\omega\beta G & 0 & 0\\
j\omega\beta G & G & j\omega\delta G & 0\\
0 & j\omega\delta G & G & j\omega\beta G\\
0 & 0 & j\omega\beta G & G
\end{pmatrix}
\begin{pmatrix} Y_1\\ Y_2\\ Y_3\\ Y_4 \end{pmatrix},
$$

giving the final resultant matrix of analog front-end outputs

$$
\begin{pmatrix} Z_1\\ Z_2\\ Z_3\\ Z_4 \end{pmatrix}
=
\begin{pmatrix}
(1+\omega^2\beta^2)GH & 0 & \omega^2\beta\delta GH & 0\\
0 & (1+\omega^2\beta^2+\omega^2\delta^2)GH & 0 & \omega^2\beta\delta GH\\
\omega^2\beta\delta GH & 0 & (1+\omega^2\beta^2+\omega^2\delta^2)GH & 0\\
0 & \omega^2\beta\delta GH & 0 & (1+\omega^2\beta^2)GH
\end{pmatrix}
\begin{pmatrix} X_1\\ X_2\\ X_3\\ X_4 \end{pmatrix}.
$$

[Measured eye diagrams of the channel 2 components: forward signal HGX2, 91/525 mVppd; XTR signal (ω²β² + ω²δ²)HGX2, 209 mVppd; compensated signal (1 + ω²β² + ω²δ²)HGX2, 247/521 mVppd; error signal ω²βδHGX4, 73 mVppd.] G represents the gain of the adder and LE; G is optimized to handle the ISI that remains after the XTR signal partially compensates the channel loss.

Fig. 3.16 Measured error signal amplitudes (ω²βδHGX4) for various spacings between channel 2 and 3 (bundle-to-bundle) at 10 Gb/s. During the test, the XTC gain between channel 2 and 3 (δ) is optimally adjusted to minimize any residual crosstalk. [Figure: with 2W spacing within each bundle, a bundle-to-bundle spacing of 2.5W (total width 6.5W) gives 87 mVppd, 3W (7W) gives 73 mVppd and 3.5W (7.5W) gives 42 mVppd.]
3.3.2 Measurement Verification for Practical Application
Whereas the tests in Sect. 3.3.1 (Figs. 3.13, 3.14, 3.15 and 3.16) used a signal on only one lane at a time for calibration, in this section we transmit 4 independent signals on 4 separate, closely spaced channels, with crosstalk cancellation and signal reutilization, to demonstrate 4-channel high-speed parallel I/O data transmission. Due to the limited number of I/Os in our prototype, only one error term is included in this measurement for the two channels at the edge, although with an arbitrarily large number of I/O channels there would be two error terms for every channel except the two outermost ones. The error signal amplitude for a 3W distance between bundles is 73 mVppd, as shown in Fig. 3.16, and the random noise of our measurement environment at the analog front-end output is roughly 39 mVppd. Therefore, adding one more error term is expected to degrade the eye opening by around 30–40 mVppd (deterministic noise).
Physical channel parameters used for the measurement of practical parallel high-speed data transmission are described at the top of Fig. 3.3. Figure 3.17 shows the cable connections for the equipment used to test this channel. Three aggressor inputs come from the AWG, while the victim signal input is generated by the BERT scope so that it can measure the BER. In the figure, the channel 2 input is fed from the BERT, and the channel 2 output is also monitored by the BERT. The same connection was repeated for the BER tests on channels 1, 3 and 4. Using a Tektronix arbitrary waveform generator (AWG) 7000B, multiple independent PRBS7 NRZ data streams at 8 and 12 Gb/s were applied to the closely spaced multi-lanes, and a Tektronix BERT scope 17500C was used to monitor the eyes and BER contours of the channel 1–4 analog front-end outputs while XTCR was switched off and on. Figure 3.18 shows the measured eye diagrams of the channel 1–4 analog front-end outputs. The measured BER contours of channels 1–4 (XTCR on) at 12 Gb/s are shown in Fig. 3.19. At 12 Gb/s, all channel eyes are completely closed without XTCR. After XTCR is turned on, the 4 channel eye openings show an average improvement of 37.5 %UI horizontally and 26.4 % vertically at a BER of 10⁻⁸. Error-free zones with a BER of 10⁻¹² have been achieved for all channels. The SDC, XTCR (VGAs and adder) and LE consume 5.9, 11.5 and 3.9 mW/lane, respectively, from a 1.1 V supply. The BER tests were run for 5 min each, which gives a confidence level of 95 %. The BER bathtub curves shown in Fig. 3.20 demonstrate reliable data transmission after the crosstalk is canceled in closely spaced single-ended channels by our XTCR analog front-end. The improved eye opening and other performance parameters are summarized and compared with prior work in Table 3.1.

Fig. 3.17 A 6-inch FR4 trace and 24-inch coaxial connection cables per channel are used for the test. Four independent PRBS7 data streams (3 from the AWG and 1 from the BERT scope) are transmitted onto the signal channels, and the BER contours and bathtub curves are monitored. [Figure: a Tektronix AWG 7000B generates 3 independent single-ended 2⁷ − 1 PRBS signals and a Tektronix BERT scope 17500C generates the fourth; each channel consists of 18″ + 3″ + 3″ SMA cables and 4″ + 2″ FR4 traces (6″ FR4 trace and 24″ cable in total), with input and output matching around the 4-channel XTCR receiver.]

Fig. 3.18 Channels 1–4 eye-performance measurement results for XTCR off/on at 8 Gb/s (top) and 12 Gb/s (bottom); the performance improvements shown in Table 3.1 are based on the BER contours at 12 Gb/s with XTCR off/on shown in the 3rd and 4th rows.

Fig. 3.19 Measured BER contours for 12 Gb/s when XTCR is on, from the eye-diagrams in the 4th row of Fig. 3.18; the measured maximum amplitude corresponds to 100 % of the y-axis in each BER contour. For XTCR off, all channel 1–4 eyes at 12 Gb/s are completely closed and are not shown.

Fig. 3.20 Measured bathtub curves (channels 1–4, XTCR on) at 8 Gb/s (left) and 12 Gb/s (right) from the eye-diagrams in the 4th row of Fig. 3.18. [Plot: bit error rate from 10⁻⁴ to 10⁻¹² for ch1–ch4.]
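The 95 % confidence quoted for 5-minute BER tests can be sanity-checked with the standard zero-error confidence bound CL = 1 − exp(−N·BER), where N is the number of bits tested. The sketch assumes that the 95 % figure refers to the 10⁻¹² BER level at 12 Gb/s and that no errors were observed; the text states neither assumption explicitly.

```python
import math


def bits_for_confidence(ber_target, confidence):
    """Error-free bits needed to claim BER < ber_target at the given confidence.

    Uses the zero-error Poisson bound CL = 1 - exp(-N * BER).
    """
    return -math.log(1.0 - confidence) / ber_target


def confidence_after(test_seconds, bit_rate, ber_target):
    """Confidence reached after an error-free test of test_seconds at bit_rate."""
    return 1.0 - math.exp(-test_seconds * bit_rate * ber_target)


BIT_RATE = 12e9     # 12 Gb/s
BER_TARGET = 1e-12  # assumed target: the error-free zone quoted in the text

n_bits = bits_for_confidence(BER_TARGET, 0.95)
print(f"95 % confidence needs {n_bits:.2e} error-free bits, "
      f"i.e. {n_bits / BIT_RATE:.0f} s at 12 Gb/s")
print(f"a 5 min error-free run gives "
      f"{100 * confidence_after(300, BIT_RATE, BER_TARGET):.1f} % confidence")
```

About 3 × 10¹² error-free bits, roughly 250 s at 12 Gb/s, are needed, so a 5-minute run is consistent with the quoted confidence level under these assumptions.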
Table 3.1 Comparison of 4 × 12 Gb/s XTCR schemes with prior art

Reference                      [11]*           [12]*           [13]*           This work
XTC type                       Rx analog-IIR   Rx passive SC   TX FIR          Rx analog-IIR
I/O type                       Single-ended    Differential    Single-ended    Single-ended
Multi-channel #                2               2               2               4
Data rate (Gb/s)               6               12.5            7               12
XTC power (pJ/bit/lane)        2.4             0.033           N/A             Ch.1 0.85, Ch.2 1.07, Ch.3 1.07, Ch.4 0.85; Avg. 0.96
Horizontal eye increase        28              23.5            4.2             Ch.1 41.4, Ch.2 35.8, Ch.3 35.7, Ch.4 37; Avg. 37.5
  (%UI, BER < 10⁻⁸)**
Vertical eye increase          12.4            N/A             N/A             Ch.1 34.3, Ch.2 28, Ch.3 26.1, Ch.4 17.2; Avg. 26.4
  (%, BER < 10⁻⁸)**
XTC area (mm²/lane)            0.03            N/A             N/A             0.036

(*) Performance numbers for only one channel output were presented. (**) Eye improvement results using only the XTC circuit are compared; pre-emphasis is not considered.
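Several entries of Table 3.1 can be cross-checked against numbers stated in the text. Power in mW divided by data rate in Gb/s is numerically energy in pJ/bit, so the 11.5 mW/lane XTCR power at 12 Gb/s should match the average XTC energy, and the per-channel eye improvements should average to the quoted values. The sketch performs only this arithmetic; it is not a reported measurement.

```python
XTCR_POWER_MW = 11.5   # XTCR (VGAs + adder) power per lane, from the text
DATA_RATE_GBPS = 12.0

# Per-channel entries of Table 3.1 (this work), channels 1-4
ENERGY_PJ = [0.85, 1.07, 1.07, 0.85]
H_EYE_UI = [41.4, 35.8, 35.7, 37.0]
V_EYE_PCT = [34.3, 28.0, 26.1, 17.2]


def pj_per_bit(power_mw, rate_gbps):
    """mW divided by Gb/s is numerically pJ/bit."""
    return power_mw / rate_gbps


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


print(f"11.5 mW at 12 Gb/s         -> {pj_per_bit(XTCR_POWER_MW, DATA_RATE_GBPS):.3f} pJ/bit")
print(f"mean per-channel energy    -> {mean(ENERGY_PJ):.3f} pJ/bit")
print(f"mean horizontal eye gain   -> {mean(H_EYE_UI):.2f} %UI")
print(f"mean vertical eye gain     -> {mean(V_EYE_PCT):.2f} %")
```

The averaged energy matches 11.5 mW divided by 12 Gb/s to within rounding, and the averaged eye improvements round to the 37.5 %UI and 26.4 % listed in the table.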
3.4 Conclusions
Prior work on crosstalk cancellation has focused on crosstalk between only 2 channels, assuming that the technique extends to any number of channels. However, as shown in this chapter, this is not necessarily the case for closely spaced single-ended multi-channels (≥4). The residual crosstalk error signal for multiple lanes was identified for the first time and handled efficiently in terms of complexity and power. The implementation can be extended beyond 4 channels as long as the spacing between bundles is slightly larger than the spacing between channels within a bundle. While prior XTC techniques [11–13] have presented only single-channel outputs, this work shows the signal integrity improvement for all 4 channels. The XTCR occupies 0.036 mm² of chip area per lane and consumes 0.96 pJ/bit/lane. To the best of our knowledge, this work shows the largest eye improvement at 12 Gb/s.
Acknowledgments This research was supported in part by the Semiconductor Research Corporation under grant #2008-HC-1836-090768 at the University of Minnesota.
References
1. A. Momtaz, M.M. Green, An 80mW 40Gb/s 7-Tap T/2-Spaced FFE in 65nm CMOS. IEEE
ISSCC, 364–365 (Feb. 2009)
2. H. Wang, J. Lee, A 21-Gb/s 87-mW transceiver with FFE/DFE/Analog equalizer in 65-nm
CMOS technology. IEEE JSSC, 909–920 (April 2010)
3. J. Lee, K.-C. Wu, A 20Gb/s full-rate linear CDR circuit with automatic frequency acquisition.
IEEE ISSCC, 366–367 (Feb. 2009)
4. K.-C. Wu, J. Lee, A 2x25Gb/s deserializer with 2:5 DMUX for 100Gb/s ethernet applications.
IEEE ISSCC, 374–375 (Feb. 2010)
5. S.A. Ibrahim, B. Razavi, A 20Gb/s 40mW equalizer in 90nm CMOS technology. IEEE ISSCC,
170–171 (Feb. 2010)
6. C.-F. Liao, S.-I. Liu, A 40Gb/s CMOS serial-link receiver with adaptive equalization and
clock/data recovery. IEEE JSSC, 2492–2502 (Nov. 2008)
7. J.F. Buckwalter, A. Hajimiri, Cancellation of crosstalk-induced jitter. IEEE JSSC, 621–631
(Mar. 2006)
8. H.-K. Jung, S.-M. Lee, J.-Y. Sim, H.-J. Park, A slew-rate controlled transmitter to compensate for the crosstalk-induced jitter of coupled microstrip lines. IEEE CICC (Sept. 2010)
9. J.-H. Bae, Y.-S. Sohn, S.-J. Bae, K.-I. Park, J.-S. Choi, Y.-H. Jun, J.-Y. Sim, H.-J. Park, A
crosstalk-and-ISI equalizing receiver in 2-drop single-ended SSTL memory channel. IEEE
CICC (Sept. 2010)
10. H.-K. Jung, K. Lee, J.-S. Kim, J.-J. Lee, J.-Y. Sim, H.-J. Park, A 4 Gb/s 3-bit parallel transmitter
with the crosstalk-induced jitter compensation using TX data timing control. IEEE JSSC,
2891–2900 (Nov. 2009)
11. T. Oh, R. Harjani, A 6-Gb/s MIMO crosstalk cancellation scheme for high-speed I/Os. IEEE
JSSC, 1843–1856 (Aug. 2011)
12. M. Honarvar, A. Emami-Neyestanak, A 15Gb/s 0.5mW/Gb/s 2-tap DFE receiver with far-end
crosstalk cancellation. IEEE ISSCC, 446–447 (Feb. 2011)
13. S.-J. Bae, Y.-S. Sohn, T.-Y. Oh, A 40nm 2Gb 7Gb/s/pin GDDR5 SDRAM with a programmable
DQ ordering crosstalk equalizer and adjustable clock-tracking BW. IEEE ISSCC, 498–499
(Feb. 2011)
Chapter 4
Adaptive XTCR, AGC, and Adaptive DFE Loop
The hardware performance of an ISI equalizer and XTC depends on the adaptive calibration algorithm used for the XTC combined with the AGC and the DFE. In this chapter we describe an automatic calibration algorithm that focuses on high-speed crosstalk cancellation and crosstalk reutilization techniques for single-ended I/Os [1]. The crosstalk signal between channels in multiple lanes strongly depends on the channel spacing of the PCB lanes and on proximity effects caused by pads and bond wires. Recently developed low-power crosstalk cancellation and crosstalk reutilization schemes achieve high signal integrity even in severe crosstalk environments, where the data eye would previously have been completely closed [1]. The power-efficient adaptive techniques described here are intended for use with the hardware circuits and techniques described in a previous study [1].
Adaptive algorithms that find the optimal ISI compensation for channel loss in single data transmission channels have been actively investigated [2, 3]. Reference [3] presents an adaptive sign–sign LMS algorithm implemented in silicon that mitigates ISI only. Reference [4] shows a transmitter-side FEXT cancellation scheme using fractional clock timing, and [5] shows an adaptive receiver-side XTC scheme for far-end crosstalk, but its XTC logic operates on transitions only and the range of crosstalk strengths that can be canceled is limited. In [6], an initial bring-up sequence to find optimal Tx XTC coefficients was shown, but the integration of the XTC and FFE tap coefficients was not explicitly defined. Integrating adaptive XTC into existing AGC and adaptive equalization optimization techniques remains a largely unsolved problem. We propose a new adaptive optimization technique that is able to find a global optimum for XTC, AGC, and DFE. In this chapter, ideal clock recovery timing is assumed and we focus on the joint adaptation of the XTC and the AGC & DFE, although the clock recovery loop can slightly affect the performance of our scheme.
The rest of this chapter is organized as follows. Section 4.1 introduces crosstalk behavior, and Sect. 4.2 presents our adaptive XTC algorithm, an extension of our basic XTC scheme [1], which is then validated via simulations. In Sect. 4.2, for simplicity, we first ignore the much smaller reutilized crosstalk signal during the initial adaptive XTC analysis [1]. Section 4.3 describes the AGC and adaptive DFE loop based on a sign–sign LMS algorithm. Section 4.4 presents the integrated adaptive algorithms that combine XTC, AGC and DFE. We validate the convergence properties of the tap coefficients and show the improved performance that results from incorporating the reutilized crosstalk signal in the DFE tap coefficients.
Fig. 4.1 Simulated eye degradation as a function of channel spacing at 12 Gb/s
Fig. 4.2 Positive and negative crosstalk impact on the data transition of the forward signal
4.1 Understanding Crosstalk Behavior
The amount of crosstalk is a strong function of the channel-to-channel spacing. Figure 4.1 shows the simulated eye degradation for various channel spacings. When the spacing is extremely small, the eye is completely closed, as shown in the fourth row on the bottom RHS of Fig. 4.1.
Figure 4.2 shows an eye diagram of an NRZ signal with crosstalk and a magnified view of the transition timing. Two types of crosstalk couple onto the data transition: positive crosstalk and negative crosstalk. If we place a slicer triggered by a clock at the middle of a transition, the resulting digital value indicates whether the crosstalk has a positive or a negative impact, depending on the type of data transition in the adjacent channel. The logical relationship between the forward signal and positive or negative crosstalk is described in the table on the bottom RHS of Fig. 4.2. Regardless of the direction of the data transition of the forward signal, the direction of the shift caused by crosstalk is a function only of the direction of the data transition in the adjacent channel.
Fig. 4.3 Proposed adaptive XTC architecture
4.2 Adaptive XTC
The block diagram in Fig. 4.3 shows the proposed adaptive XTC architecture. The single-to-differential converter (SDC) has unity gain. A slight change from the XTC receiver shown in [7] is that the VGAs and adders are merged into a single stage and the XTC gain is adjusted via the signal addition ratio. Designing the output of the current-mode CMOS differential pair to be part of the adder with a constant DC current, as shown in Fig. 4.6, allows us to avoid bypass capacitors at the output of the adder. This enables a direct connection to the input gate of the amplifier in the following AGC and equalization stage, reducing the parasitic capacitance and improving speed. After the XTC operation in the adder, the amplitude of the resulting signal can vary depending on the addition ratio. This amplitude variation is compensated by the AGC and adaptive DFE that follow, so as to maintain a constant peak-to-peak amplitude at the output of the equalization stage. For simplicity, we assume that the adaptive equalization is properly set to compensate for any channel loss, and that the signal before the slicers suffers from crosstalk only. The AGC and adaptive DFE loop is also assumed to work independently of the adaptive XTC loop, although both share a single slicer. In Sect. 4.4, we validate this assumption.
Figure 4.4 shows the simulated eye diagram of a signal after ISI equalization in channel 1. Since the ISI equalizer cannot remove the crosstalk, the impact of crosstalk on timing jitter is still visible. Alternatively, a DFE can mitigate the ISI without increasing crosstalk noise, but the crosstalk-induced jitter remains unchanged and reduces the horizontal eye margin. A data slicer triggered by the recovered clock makes decisions on the equalized signal. In parallel, an edge slicer samples the data signal at the transition timing and detects whether the crosstalk is likely to have a positive or a negative impact, as illustrated in Fig. 4.2. The detected digital signals feed an adaptive XTC loop. CML-to-CMOS circuits convert the differential signals at the slicer outputs into digital ones. A combinational logic block generates 'UP' or 'DN' pulses depending on the sign of the residual crosstalk after XTC. An integrator updates the XTC gain by integrating the 'UP' and 'DN' pulses. The digital delay block and combinational logic are similar to the phase detectors in a clock recovery circuit and can be shared to save power.
Fig. 4.4 Blocks used for data and edge slicing after appropriate ISI equalization
Table 4.1 The combinational logic required for updating the XTC gain for channel 1
x1[t0] ⊕ x1[t1]   x2[t0]   x2[t1]   x1[t0.5]   Diagnosis            UP   DN
1                 0        1        0          Under compensation   1    0
1                 0        1        1          Over compensation    0    1
1                 1        0        0          Over compensation    0    1
1                 1        0        1          Under compensation   1    0
Otherwise                                      No update            0    0
Assuming that the recovered 0° clock provides a rising edge at the data-eye center (t0 or t1) for the differential data slicer, the differential edge slicer is triggered by a 180° clock and samples the data edge, generating a digital signal x[t0.5]. If the detected differential data signal at the rising edge of the 180° clock is larger than 0, which implies a positive residual crosstalk impact, the edge slicer and CML-to-CMOS blocks output '1', and vice versa. The following digital delay block holds x[t0.5] for the required duration, and x[t0], x[t0.5] and x[t1] are sent concurrently to the combinational logic block, which uses Table 4.1 to decide whether we are overcompensating or undercompensating for the crosstalk.
Table 4.1 was created based on the logical relationship shown in Fig. 4.2. Using x1[t0] and x1[t1] from channel 1, and x2[t0] and x2[t1] from channel 2, we can predict whether the data transitions will contain positive or negative residual crosstalk, as stated in Fig. 4.2. If the detected digital signal x1[t0.5] (or x2[t0.5]) is identical to the predicted digital signal, the XTC is undercompensated and the combinational logic generates an 'UP' to increase the XTC signal gain, and vice versa. In essence, this adaptive loop tries to force the sampled value before the differential slicer at the data transition timing to zero.
Fig. 4.5 Simulation of the adaptation of control voltage for XTC gain
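The decision rule of Table 4.1 can be written as a small function. This is a sketch: the name `xtc_update_ch1` is hypothetical, all inputs are slicer bits (0 or 1), and the observation that the predicted edge bit equals x2[t0] is read off the four table rows rather than stated explicitly in the text.

```python
def xtc_update_ch1(x1_t0: int, x1_t1: int, x1_t05: int,
                   x2_t0: int, x2_t1: int) -> tuple:
    """Return (UP, DN) for the channel-1 XTC gain following Table 4.1.

    x1_t0, x1_t1: channel-1 data bits at adjacent eye centers t0 and t1
    x1_t05:       channel-1 edge-slicer bit at t0.5
    x2_t0, x2_t1: channel-2 data bits at t0 and t1
    """
    # Update only when both channels transition within the same symbol.
    if not ((x1_t0 ^ x1_t1) and (x2_t0 ^ x2_t1)):
        return (0, 0)  # "Otherwise: no update"
    # Edge bit expected while crosstalk is still present (from the table rows).
    predicted = x2_t0
    if x1_t05 == predicted:
        return (1, 0)  # residual crosstalk visible: under compensation, raise gain
    return (0, 1)      # residual sign flipped: over compensation, lower gain


# Reproduce the four rows of Table 4.1 (channel 1 toggles 0 -> 1 each time).
for x2_t0, x2_t1, x1_t05 in [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1)]:
    print(x2_t0, x2_t1, x1_t05, xtc_update_ch1(0, 1, x1_t05, x2_t0, x2_t1))
```

Only the parity x1[t0] ⊕ x1[t1] of the forward channel enters the rule, which mirrors the statement that the crosstalk shift depends only on the adjacent channel's transition direction.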
The top image in Fig. 4.5 shows the steps used by the low-frequency feedback loop to adjust the adaptive XTC gain. The detection block and combinational logic block generate 'UP' or 'DN' pulses, which are integrated by a charge pump to update the control voltage for the XTC gain. In the frequency domain, the loop gain of this feedback system can be expressed as
$$\text{Loop gain} = 0.25\,\frac{A\,I_s}{sC} \qquad (4.1)$$
where $I_s$ is the charge pump current and $C$ is the integration capacitance, as shown on the bottom LHS of Fig. 4.5. Likewise, $A$ is the AGC and adaptive equalization gain, which in this section we assume to be unity for simplicity. The multiplication factor of 0.25 arises because the detection logic updates the 'UP' or 'DN' values only when transitions in channel 1 and channel 2 occur at the same time; for random data each channel transitions with probability 1/2, so this event has probability 1/4. In practice, this factor varies depending on the patterns in channels 1 and 2. Since this feedback loop has only a single pole at the origin, it is always stable with a 90° phase margin.
The voltage step $\Delta V$ in $V_{CONT}$ at each symbol duration is $A I_s T_b / C$. We have simulated the adaptive XTC algorithm for the crosstalk setup shown in the third row of eye diagrams in Fig. 4.1. In our 12 Gb/s application, we set $T_b$ to 83.3 ps. The other parameters are $A$ = 1, $I_s$ = 50 µA and $C$ = 1 pF. The difference in the control voltage between its initial value and the optimal value to which it converges is 757 mV. Based on these settings, the theoretical convergence time of the loop can be calculated as
$$\text{Convergence time} = \frac{757\ \text{mV}\cdot T_b}{0.25\,\Delta V} = 60.1\ \text{ns} \qquad (4.2)$$
Compared with the simulated value of 62 ns, there is a 3.2 % error because of the non-ideal random probability distribution of the data pattern. By increasing the charge pump current or reducing the capacitor size, the voltage step can be increased to reduce the convergence time. However, the control voltage then exhibits larger perturbations during steady-state operation due to the larger voltage step, and this translates directly into increased residual crosstalk noise after XTC. To avoid this trade-off, an additional counter that holds the control voltage for a fixed time after convergence is reached can be used [8].
Fig. 4.6 Ratio control in the high-speed XTC adder, current-based gm control (top right), and DAC-based control (bottom)
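Eq. (4.2) and the 0.25 update probability can be checked with a short behavioral model. This is an idealization assumed for illustration, not the circuit of Fig. 4.5: the residual-crosstalk sign is taken to be simply the sign of (optimum − $V_{CONT}$), and all names are hypothetical. With the stated parameters, $\Delta V$ is about 4.17 mV and (4.2) evaluates to roughly 60.6 ns, close to the 60.1 ns quoted; the simulated value varies with the random pattern, as discussed above.

```python
import numpy as np

# Values quoted in the text for the 12 Gb/s example
T_B = 83.3e-12     # symbol period Tb (s)
I_S = 50e-6        # charge-pump current Is (A)
C_INT = 1e-12      # integration capacitance C (F)
A_GAIN = 1.0       # AGC/equalizer gain A, taken as unity
V_SPAN = 0.757     # distance from initial to optimal V_CONT (V)
P_UPDATE = 0.25    # probability that both channels toggle in one symbol


def step_size() -> float:
    """Control-voltage step per UP/DN pulse: A * Is * Tb / C."""
    return A_GAIN * I_S * T_B / C_INT


def theoretical_convergence_time() -> float:
    """Eq. (4.2): V_SPAN * Tb / (0.25 * dV)."""
    return V_SPAN * T_B / (P_UPDATE * step_size())


def simulate_loop(n_symbols: int = 4000, seed: int = 0):
    """Bang-bang model of the XTC loop on two random NRZ channels.

    A pulse is issued only when both channels toggle; its direction is
    idealized as sign(V_SPAN - v), i.e. the edge slicer is assumed to
    always report the true sign of the residual crosstalk.
    """
    rng = np.random.default_rng(seed)
    x1 = rng.integers(0, 2, n_symbols + 1)
    x2 = rng.integers(0, 2, n_symbols + 1)
    dv = step_size()
    v = 0.0
    trace = np.empty(n_symbols)
    n_updates = 0
    for k in range(n_symbols):
        if x1[k] != x1[k + 1] and x2[k] != x2[k + 1]:
            n_updates += 1
            v += dv if v < V_SPAN else -dv   # UP if undercompensated, else DN
        trace[k] = v
    return trace, n_updates / n_symbols


trace, frac = simulate_loop()
first = int(np.argmax(trace >= V_SPAN))   # first symbol at or above the optimum
print(f"dV = {step_size() * 1e3:.3f} mV per pulse")
print(f"theory (4.2): {theoretical_convergence_time() * 1e9:.1f} ns")
print(f"update rate {frac:.3f}, simulated: {first * T_B * 1e9:.1f} ns")
```

After convergence the model dithers by one step around the optimum, which is the steady-state perturbation that grows when $I_s$ is raised or $C$ is reduced.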
Figure 4.6 shows the high-speed signal adder circuit. The circuit on the top RHS shows an implementation of the adder block with analog control. We control the ratio between the DC currents of the forward-path amplifier and the differential-path amplifier, which in turn varies the gm ratio. The two transconductance paths share a common load, so the forward and crosstalk signal addition gains can be varied through the gm ratio. Since the DC current flowing through the load is constant regardless of the current ratio (as is the overall gain), the DC values at the output nodes remain constant and AC coupling capacitors to the next stage can be avoided. Although this current-control methodology enables fine gain tuning, the input overdrive of the MOSFETs varies with the DC current in each branch. For a large XTC signal that requires higher amplification, the amplifier allots a higher gm and thus a larger input overdrive to the cancellation branch, which limits the signal swing at the other input due to its reduced current, gm, and gate overdrive. To solve this issue, we can implement this block using DAC-based current-source switching, as shown in the bottom circuit of Fig. 4.6. The circuit shows 3 bits of control but can easily be extended to higher resolution. Figure 4.7 shows the simulation results for our adaptive XTC technique for various crosstalk strengths; each crosstalk strength is inversely proportional to the channel-to-channel spacing.
The range of $V_{CONT}$ is 0–1 V, and the overall gain of the adder, G, is set to 4 in our simulation. When $V_{CONT}$ is increased, a larger crosstalk cancellation signal is added while the gain of the forward signal (with its crosstalk) decreases. As shown in Fig. 4.7, for a larger injected crosstalk, the adaptive XTC loop converges $V_{CONT}$ to a larger value but takes slightly longer to do so, because the settling slope is constant. The injected crosstalk strengths in this simulation are equivalent to those of Fig. 4.1. Note that the final converged value, α/(1−α), is proportional to the peak-to-peak amplitude of the crosstalk signal.
Fig. 4.7 Simulated XTC adder adaptation results for different crosstalk levels, the impact on eye openings and the final $V_{CONT}$ values
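The proportionality noted above can be seen from a simple linear model of the adder. This is a sketch under assumptions not stated in the text: crosstalk at the victim input is modeled as $k\,\dot V_2(t)$, the cancellation path delivers $c\,\dot V_2(t)$ with a fixed, polarity-inverted gain $c$, and the reutilized crosstalk term is neglected, as in this section.

```latex
\begin{aligned}
y(t) &= G(1-\alpha)\,\bigl[V_1(t) + k\,\dot V_2(t)\bigr] - G\alpha\,c\,\dot V_2(t) \\
     &= G(1-\alpha)\,V_1(t) + G\,\bigl[(1-\alpha)k - \alpha c\bigr]\,\dot V_2(t) \\
(1-\alpha)k = \alpha c \;&\Longrightarrow\; \frac{\alpha}{1-\alpha} = \frac{k}{c},
\qquad G(1-\alpha) = \frac{G\,c}{c+k}
\end{aligned}
```

Under this model the converged ratio α/(1−α) scales linearly with the crosstalk coefficient k, and the forward gain G(1−α) falls as k grows, consistent with the smaller post-XTC forward amplitude seen in Fig. 4.7.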
Throughout this section, we have avoided including the reutilized crosstalk signal in the adaptive XTC loop analysis. The reutilized crosstalk signal does not affect the decision of the edge slicer, because it is the derivative of the crosstalk signal [1, 7, 10] and its timing behavior is similar to that of the forward signal. At the instant of maximum crosstalk amplitude during the data transition, when the edge slicer decides between a positive and a negative crosstalk impact, the reutilized crosstalk signal is theoretically zero, so the decision is not affected. For a larger crosstalk strength, the amplitude of the forward signal after XTC adaptation tends to become smaller, as shown in Fig. 4.7. The AGC and adaptive DFE blocks compensate for this variation in forward signal strength to produce a constant signal amplitude.
In Sect. 4.4 we integrate the various adaptive loops (XTC, AGC and adaptive DFE) and present the successful operation of the integrated scheme.
4.3 Automatic Gain Control and Adaptive Decision Feedback Equalization
A decision feedback equalizer is a discrete-time scheme in which the sampling clock provides an edge at the maximum pulse-response amplitude (cursor timing). An ideal pulse at the transmitter becomes a pulse response with ISI tails after passing through the dispersive channel, as shown in the LHS picture in Fig. 4.8. The RHS picture in Fig. 4.8 shows our algorithm for the AGC and adaptive DFE in the discrete domain. For our simulation we assume that the ISI tail lasts for only two symbol periods (h1, h2). The goal of the AGC and adaptive DFE block is to generate an ISI-free signal with a constant amplitude B at the node before the slicer, where the signal is indicated as z[k] in the figure. On the RHS of Fig. 4.8, the slicer generates one of two digital values, 1 or −1 (differential), depending on its decision.
The received signal r[k] results from the convolution of the transmitted sequence x[k] with [h[0] h[1] h[2]] (i.e., the cursor and ISI tails of the channel). This signal is amplified by a gain control value A[k] and equalized by the DFE loop. The equalized signal is z[k] = A[k]r[k] − c1[k]x̂[k−1] − c2[k]x̂[k−2], where x̂[k−1] and x̂[k−2] are the digital values previously chosen by the slicer, which are −1 or 1. All the signals described, except for x[k] and x̂[k], are discrete-time values that can take on any real number. r[k] and z[k] are also assumed to be differential signals.
For the adaptation of the AGC and DFE tap weights, we use a least mean square (LMS) algorithm. As the gain control and equalization proceed iteratively, the adaptive variables A[k], c1[k] and c2[k] converge to the optimal values where the squared error e²[k] reaches a minimum. The error signal can be expressed as
$$e[k] = z[k] - B\hat{x}[k] = A[k]r[k] - c_1[k]\hat{x}[k-1] - c_2[k]\hat{x}[k-2] - B\hat{x}[k] \qquad (4.3)$$
The LMS algorithm is based on gradient descent and moves in the direction in which the error becomes smaller. At every symbol period, each adaptive variable is updated with the negative partial derivative of e²[k], i.e., the negative gradient of e²[k] in the space spanned by A[k], c1[k] and c2[k] (e²[k] = f(A[k], c1[k], c2[k])).
Fig. 4.8 Discrete-time AGC and DFE adaptation loop
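$$A[k+1] = A[k] - \mu\frac{\partial e^2[k]}{\partial A[k]} = A[k] - 2\mu\, r[k]\,e[k] \qquad (4.4)$$
$$c_1[k+1] = c_1[k] - \mu\frac{\partial e^2[k]}{\partial c_1[k]} = c_1[k] + 2\mu\,\hat{x}[k-1]\,e[k] \qquad (4.5)$$
$$c_2[k+1] = c_2[k] - \mu\frac{\partial e^2[k]}{\partial c_2[k]} = c_2[k] + 2\mu\,\hat{x}[k-2]\,e[k] \qquad (4.6)$$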
where μ is the step size that sets the update rate. A larger μ results in a quicker settling time but normally degrades the stability of the loop. After sufficient iterations, the amplitude of e[k] approaches zero and all the adaptive variables converge. However, it is difficult to implement the product terms r[k]e[k] in (4.4), x̂[k−1]e[k] in (4.5) and x̂[k−2]e[k] in (4.6) using analog circuits. The most critical quantity at every iteration of the LMS algorithm is the sign of the gradient for each adaptive variable, which can be obtained from the partial derivative of the error signal. As long as each variable moves in the correct direction, it will eventually reach the optimal point where e²[k] is minimized. This modified version of LMS is called sign–sign LMS and is much easier to implement in hardware [11]. Moreover, r[k] in (4.4) is replaced by x̂[k] in (4.7), because the latency of calculating e[k] from r[k] becomes problematic in actual hardware.
$$A[k+1] = A[k] - 2\mu\,\mathrm{sign}(\hat{x}[k])\,\mathrm{sign}(e[k]) \qquad (4.7)$$
$$c_1[k+1] = c_1[k] + 2\mu\,\mathrm{sign}(\hat{x}[k-1])\,\mathrm{sign}(e[k]) \qquad (4.8)$$
$$c_2[k+1] = c_2[k] + 2\mu\,\mathrm{sign}(\hat{x}[k-2])\,\mathrm{sign}(e[k]) \qquad (4.9)$$
Fig. 4.9 Convergence of AGC gain and the DFE coefficients [LMS (left) and sign–sign LMS (right)]
After sufficient iterations, the amplitude of e[k] approaches zero, and the products sign(x̂[k])sign(e[k]), sign(x̂[k−1])sign(e[k]) and sign(x̂[k−2])sign(e[k]) toggle between 1 and −1 with equal probability. Because μ is small, the adaptive variables A[k], c1[k] and c2[k] do not change significantly as these signs toggle, and they settle to their final values. In Fig. 4.9 we show the convergence of the adaptive coefficients for the LMS and the sign–sign LMS algorithms in Matlab. For this set of simulations, the cursor, first ISI tap and second ISI tap (h0, h1 and h2 in Fig. 4.8) are set to 0.5, 0.2 and 0.1 V, respectively. The received pulse response with ISI is convolved with the data sequence x[k], whose digital values are 1 or −1. The target amplitude of the equalized signal z[k] is 0.5 Vppd (B = 0.25 V in Fig. 4.8). The AGC gain A[k] converges to 0.5 to meet the target amplitude, and because of this reduced forward signal value, the DFE tap coefficients converge to the ISI taps scaled by A, i.e., A·h1 = 0.1 V and A·h2 = 0.05 V.
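The Fig. 4.9 setup can be approximated in a few lines of Python. This is a sketch, not the authors' Matlab code: the step sizes, the symbol count, the initial values (A = 1, c1 = c2 = 0) and the function name `adapt` are assumptions; only h0, h1, h2 and B come from the text. The DFE is decision-directed, feeding back past slicer decisions as in Fig. 4.8.

```python
import numpy as np

H = np.array([0.5, 0.2, 0.1])  # cursor h0 and ISI taps h1, h2 (V), as in Fig. 4.9
B = 0.25                       # target amplitude: z[k] -> +/-B (0.5 Vppd)


def adapt(n_symbols=20000, mu=1e-2, sign_sign=False, seed=0):
    """Decision-directed AGC + 2-tap DFE adaptation.

    sign_sign=False uses the LMS updates (4.4)-(4.6);
    sign_sign=True uses the sign-sign updates (4.7)-(4.9).
    Returns an array of shape (n_symbols, 3) holding A, c1, c2 per symbol.
    """
    rng = np.random.default_rng(seed)
    x = rng.choice([-1.0, 1.0], size=n_symbols)   # transmitted NRZ data
    r = np.convolve(x, H)[:n_symbols]              # r[k] = sum_j h[j] x[k-j]
    a, c1, c2 = 1.0, 0.0, 0.0                      # assumed initial values
    xh1 = xh2 = 0.0                                # no decisions before k = 0
    hist = np.empty((n_symbols, 3))
    for k in range(n_symbols):
        z = a * r[k] - c1 * xh1 - c2 * xh2         # equalized sample z[k]
        xh = 1.0 if z >= 0 else -1.0               # slicer decision x^[k]
        e = z - B * xh                             # error, eq. (4.3)
        if sign_sign:
            s = np.sign(e)
            a -= 2 * mu * np.sign(xh) * s          # eq. (4.7)
            c1 += 2 * mu * np.sign(xh1) * s        # eq. (4.8)
            c2 += 2 * mu * np.sign(xh2) * s        # eq. (4.9)
        else:
            a -= 2 * mu * r[k] * e                 # eq. (4.4)
            c1 += 2 * mu * xh1 * e                 # eq. (4.5)
            c2 += 2 * mu * xh2 * e                 # eq. (4.6)
        xh2, xh1 = xh1, xh                         # shift the decision history
        hist[k] = (a, c1, c2)
    return hist


lms = adapt()
ss = adapt(mu=5e-4, sign_sign=True)
print("LMS       A, c1, c2:", lms[-1].round(3))
print("sign-sign A, c1, c2:", ss[-2000:].mean(axis=0).round(3))
```

The LMS run settles exactly on (0.5, 0.1, 0.05) because this noiseless channel admits a zero-error solution, while the sign–sign run dithers by ±2μ around the same point; a smaller μ would narrow that dither at the cost of a slower transient.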
When implementing the algorithm in hardware, multiplying B by x̂[k] and comparing the product with z[k] to generate sign(e[k]) at every symbol, as shown on the LHS in Fig. 4.10, can quickly become a speed bottleneck. The circuit diagram on the RHS of Fig. 4.10 shows an implementation that circumvents this problem. The equalized signal z_p[k] is the positive half of the differential signal z[k] and can be compared with reference voltages B/2 and −B/2 using differential slicers. Using combinational logic, sign(e[k]) in (4.7), (4.8) and (4.9), which is −1 or 1, can be represented by an implementable digital value of 0 or 1, indicated as e_i[k] in Fig. 4.10. The logic gates are based on the assumption that when the data x_i[k] is 1, the error signal e_i[k] is 1 if and only if the detected digital values p_i[k] and n_i[k] of the top two slicers are both 1. In addition, when x_i[k] is 0, e_i[k] is 1 only if the detected digital value n_i[k] of the middle slicer is 1. e_ib[k] is the inverted digital value of e_i[k]. x̂_i[k] is the single-ended (0 or 1) digital representation of x̂[k], which takes values of 1 or −1.
Fig. 4.10 High-speed circuit implementation that compares z[k] and Bx̂[k] and generates the sign(e[k]) term
Table 4.2 Combinational logic table for the sign–sign LMS algorithm
Adaptive coefficient   Charge pump node   Logic expression
A[k]                   UP                 e_i[k] ⊕ x_i[k]
                       DN                 complement of UP
c1[k]                  UP                 e_i[k] ⊙ x_i[k−1] (XNOR)
                       DN                 complement of UP
c2[k]                  UP                 e_i[k] ⊙ x_i[k−2] (XNOR)
                       DN                 complement of UP
Using the digital value of the error sign, e_i[k], and the detected digital data values x̂_i[k−2], x̂_i[k−1] and x̂_i[k], we can create an 'UP' or a 'DN' signal for a charge pump that integrates −sign(x̂[k])sign(e[k]), sign(x̂[k−1])sign(e[k]) and sign(x̂[k−2])sign(e[k]), as in (4.7), (4.8) and (4.9), at a rate set by μ. This rate is determined by the integration current and capacitor, as in the adaptive XTC case in Fig. 4.5. Table 4.2 shows the minimized combinational logic required for the integrators of each adaptive coefficient.
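The error-sign detection of Fig. 4.10 and the logic of Table 4.2 can be checked against the sign–sign updates with the sketch below. The names `error_bit` and `charge_pump_bits` are hypothetical, and the slicer thresholds assume that the differential signal satisfies z = 2·z_p, so that comparing z_p with ±B/2 is equivalent to comparing z with ±B.

```python
from itertools import product


def error_bit(z: float, x_bit: int, b: float) -> int:
    """Return e_i[k] = 1 when e[k] = z[k] - B*x^[k] is positive, else 0.

    Assumptions: z_p = z/2 is the positive half of the differential z,
    p_i compares z_p with +B/2 and n_i compares z_p with -B/2.
    x_bit is the single-ended decision: 1 for x^ = +1, 0 for x^ = -1.
    """
    z_p = z / 2
    p_i = int(z_p > b / 2)     # top slicer
    n_i = int(z_p > -b / 2)    # middle slicer
    return (x_bit & p_i) | ((1 - x_bit) & n_i)


def charge_pump_bits(e_i: int, x_k: int, x_k1: int, x_k2: int) -> dict:
    """(UP, DN) per coefficient following Table 4.2; DN is the complement of UP."""
    up = {
        "A": e_i ^ x_k,            # XOR: A rises when sign(x^[k]) and sign(e[k]) differ
        "c1": 1 - (e_i ^ x_k1),    # XNOR: c1 rises when sign(x^[k-1]) = sign(e[k])
        "c2": 1 - (e_i ^ x_k2),    # XNOR: c2 rises when sign(x^[k-2]) = sign(e[k])
    }
    return {name: (u, 1 - u) for name, u in up.items()}


# Compare the slicer-based error bit with the direct sign of e[k] (B = 0.25 V).
for z in (-0.6, -0.3, -0.2, -0.05, 0.05, 0.2, 0.3, 0.6):
    x_bit = int(z >= 0)
    x_hat = 1 if x_bit else -1
    print(z, error_bit(z, x_bit, 0.25), int(z - 0.25 * x_hat > 0))

print(charge_pump_bits(1, 0, 1, 0))
```

Because UP and DN are always complementary here, there is no 'no update' state, which is why the RC-integrator simplification described next applies.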
The timing-path difference between e_i[k] and x̂_i[k], x̂_i[k−1] or x̂_i[k−2] can be compensated by adding an even number of inverters or a transmission gate. Since the charge pump is always in either UP or DN mode, the charge pumps can be replaced by RC integrators to reduce power consumption, unlike in the adaptive XTC case of Table 4.1, which requires an additional 'no update' state.
4.4 Combining the Adaptation of the XTC, AGC and DFE Coefficients
In Sects. 4.2 and 4.3, we proposed techniques for the adaptation of the XTC coef-
ficients and described a fundamental sign–sign LMS algorithm to implement the
adaptation of the coefficients for the AGC and the DFE. In this section, we demon-
strate independent adaptation of the two integrated loops and validate it via behavioral
simulations in Verilog-A.
Figure 4.11 shows the pulse responses of a forward signal and a crosstalk signal (bottom left) and the resulting eye diagram (bottom right). Since the forward signal and crosstalk are independent of each other, the crosstalk acts as a noise term, and its largest impact occurs at the data transition timing of the forward signal. As illustrated in Sect. 4.2, the sampling time for the adaptive XTC loop is at the half-UI point between integer timings, where crosstalk noise is the dominant factor. On the other hand, in most symbol-rate equalization schemes, the sampling time for the adaptive equalization loop is at an integer UI, where the vertical eye opening is largest (cursor timing). At this timing, channel ISI is the main contributor to voltage noise and the noise caused by crosstalk is typically negligible. Using these two independent sampling times for the XTC adaptation loop and the AGC and DFE adaptation loop allows us to integrate them with minimal interaction.
Fig. 4.11 Timing for the forward and crosstalk signals. Forward signal is dominant at integer UI intervals and crosstalk is dominant at the half UI points
Fig. 4.12 Integrated adaptive XTC and AGC and DFE architecture for channels with both ISI and crosstalk (simulation I) and conventional adaptive AGC and DFE architecture for channels with only ISI (simulation II). Simulation results are shown in Figs. 4.13 and 4.14
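The separation of the two sampling instants can be illustrated numerically. As an assumption for illustration only, the aggressor is an NRZ 0 → 1 edge with a Gaussian-filtered transition of spread T_EDGE = 0.15 UI centered at the half-UI point, and the crosstalk is modeled as k times the derivative of the aggressor waveform, consistent with the differentiation attributed to crosstalk coupling in this chapter. The names and the edge shape are hypothetical.

```python
import math

UI = 1.0
T_EDGE = 0.15 * UI   # hypothetical Gaussian edge spread (sigma)
T_CROSS = 0.5 * UI   # aggressor transition placed midway between cursors


def aggressor(t: float) -> float:
    """Aggressor 0 -> 1 NRZ transition with a Gaussian-filtered edge."""
    return 0.5 * (1.0 + math.erf((t - T_CROSS) / (math.sqrt(2.0) * T_EDGE)))


def crosstalk(t: float, k: float = 1.0) -> float:
    """Crosstalk modeled as k * d/dt aggressor(t) (analytic derivative)."""
    g = math.exp(-((t - T_CROSS) ** 2) / (2.0 * T_EDGE ** 2))
    return k * g / (math.sqrt(2.0 * math.pi) * T_EDGE)


for t in (0.0, 0.5 * UI, 1.0 * UI):
    print(f"t = {t:.1f} UI  aggressor = {aggressor(t):.3f}  crosstalk = {crosstalk(t):.4f}")
```

Under this model the crosstalk at the integer-UI cursor instants is below 1 % of its half-UI peak, so the data sampler used by the AGC and DFE loop sees mostly ISI while the edge sampler used by the XTC loop sees mostly crosstalk.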
The integrated adaptive XTC architecture with ISI equalization is shown as simulation I at the top of Fig. 4.12. The data sampler and edge (data transition) sampler are triggered by time-interleaved clocks. The XTC loop uses the digital output signals from both the data and edge samplers, while the AGC and DFE loop uses the digital output signal of the data sampler and the signal just before the sampler, via an error detector block. The VGA gain (A) and the 3-tap DFE coefficients (C1, C2, C3) are adaptively adjusted using the error signals. Three nodes are used for signal observation in the simulation, as marked by the grey boxes in Fig. 4.12. The received signal at point X is single-ended and includes both ISI and crosstalk from an adjacent channel; it is converted to a differential signal by a single-to-differential converter (SDC) circuit with unity gain. The XTC adder block combines the forward signal path and the differentiated adjacent signal path with a ratio of G(1 − α) to Gα. The overall gain G is set to 4. After XTC, the signal at point Y is crosstalk-free but suffers from signal amplitude variation and ISI tails, as discussed in Sect. 4.2. The variation in signal amplitude comes from the variation in crosstalk strength and the resulting addition ratio of the XTC adder, the degree of channel loss, and the reutilized signal strength. The AGC and DFE combination creates an ISI-free NRZ signal with a constant signal level at point Z in Fig. 4.12, using the sign–sign LMS algorithm presented in Sect. 4.3.
Next, we quantify the signal improvement due to the reutilized crosstalk signal, a signal that would normally have been discarded in traditional XTC [9]. In simulation II (bottom) in Fig. 4.12, we use channels with an insertion loss equivalent to that in simulation I, except that they are crosstalk-free. The strength of the reutilized signal depends on the amount of crosstalk. To compare the reutilized and non-reutilized cases, simulation II uses the same XTC addition ratio that simulation I reached at a given crosstalk strength, but disables the crosstalk cancellation and crosstalk reutilization signal path (differentiation path) in the XTC. In other words, for simulation II, the crosstalk is removed from the channel and the crosstalk cancellation signal is disabled, but the forward signal gain is maintained as in simulation I. Thus the node after the XTC block contains the forward signal plus the reutilized crosstalk signal in simulation I, and the same forward signal without the reutilized crosstalk signal in simulation II.
Figures 4.13 and 4.14 present simulation results for a 12 Gb/s adaptive XTC and AGC & DFE system. Three channels with insertion losses of −15.7, −17.7 and −19.7 dB are used, each with three crosstalk strengths of 60, 120 and 180 mVpp. The transmitted signal is an NRZ signal with an amplitude of 500 mVpp. The eye diagrams at points X, Y and Z are observed, and their overall signal amplitudes and vertical eye openings at the sampling point are shown in the first three columns. The fourth column shows the converging values of the AGC gain (A) and the XTC addition ratio (α) in simulations I and II. Here, α in simulation II is fixed to the equivalent value from simulation I. The fifth column shows the values of the DFE coefficients (C1, C2, C3) in simulations I and II. All the final converged values are summarized in the table at the bottom of Fig. 4.14, along with the power savings that result from using the reutilized crosstalk signal.
The power reduction due to the reutilized crosstalk signal is estimated for a 65 nm CMOS process, assuming the differential pair structure shown on the LHS of Fig. 4.15, which usually forms the basis of high-speed AGC and DFE circuits. Here, the load $C_L$ (100 fF) is a typical parasitic. For realistic comparisons, we include the self-loading (Cgm) of the MOSFET that needs to be driven. Here, 'a' is the ratio of the gate capacitance to the drain capacitance (typically 20 %). However, the self-loading capacitance of a MOSFET sized to provide a gain of 0–4 in our simulation ranges from 0 to 30 fF, and the flat-gain bandwidth of the amplifier, dominated by the $R_L C_L$ time constant, is constant. The value $R_L$ = 134 Ω results in a 1-dB loss at the Nyquist frequency:
$$20\log_{10}\sqrt{1 + R_L^2 C_L^2\,\omega_{BW}^2} = 1\ \text{dB}$$
where $\omega_{BW} = 2\pi \cdot 6\times 10^{9}$ rad/s for a 12 Gb/s NRZ signal.
Fig. 4.13 Eye-diagrams at points X, Y and Z of simulation I in Fig. 4.12 at convergent state and adaptation of each of the coefficients (A, α, C1, C2, C3) for insertion loss of −15.7 and −17.7 dB at Nyquist rates in simulation I and II
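The load value follows from solving the 1-dB condition for $R_L$. A short check, using $C_L$ = 100 fF and the 6 GHz Nyquist frequency from the text; the function name `load_resistance` is hypothetical.

```python
import math

C_L = 100e-15     # load capacitance (F)
F_NYQ = 6e9       # Nyquist frequency for 12 Gb/s NRZ (Hz)
LOSS_DB = 1.0     # allowed single-pole roll-off at Nyquist (dB)


def load_resistance(c_l: float = C_L, f: float = F_NYQ, loss_db: float = LOSS_DB) -> float:
    """Solve 20*log10(sqrt(1 + (R*C*w)^2)) = loss_db for R.

    20*log10(sqrt(1 + x^2)) = 10*log10(1 + x^2), so x = sqrt(10^(loss/10) - 1).
    """
    w = 2 * math.pi * f
    return math.sqrt(10 ** (loss_db / 10) - 1) / (c_l * w)


print(f"R_L = {load_resistance():.1f} ohm")
```

The result, about 135 Ω, agrees closely with the 134 Ω used for the reference differential pair.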
In the simulation shown in Fig. 4.15, we maintain the overdrive voltage ($V_{OV}$) at 100 mV and increase the aspect ratio W/L and the DC current of the device proportionally. The plot on the RHS of Fig. 4.15 shows the voltage gain achieved for various DC current levels. The gain A is almost linearly proportional to the power ($2I_D$). However, due to the non-ideal operation of the short-channel device (i.e., velocity saturation), the plot shows slight non-linearity. The LHS graph of Fig. 4.16 shows the target gain A versus the required current $2I_D$ when the input voltage swing (100 mV overdrive) and bandwidth (6 GHz) of the MOSFET are held constant. Because of this non-linearity, a larger gain A requires slightly more current for the same gain increment ΔA. The RHS graph of Fig. 4.16 plots the AC gain in the frequency domain for various values of A. In Figs. 4.13 and 4.14, the final converged VGA gain (A) and DFE coefficients (C1, C2, C3) in simulation I are smaller than those in simulation II due to the beneficial addition of the reutilized crosstalk signal. Based on the LHS graph of Fig. 4.16, the differences in the final VGA gain (ΔA) and DFE coefficients (ΔC1, ΔC2, ΔC3) are converted into the corresponding current values, and the total current savings are summarized in the table at the bottom of Fig. 4.14.
Fig. 4.14 Eye-diagrams at point X, Y and Z of simulation I in the convergent state and adaptation of each coefficient (A, α, C1, C2, C3) for insertion losses of −19.7 dB in simulation I and II. The final converged values of C1, C2, C3 are summarized and the power consumption saved by using the reutilized crosstalk signal is quantified via circuit simulation, for the circuit shown in Fig. 4.15
Fig. 4.15 A reference differential pair used to estimate the power for our design in 65 nm CMOS
Fig. 4.16 The required power 2I D versus the target voltage gain ‘A’ (left) and AC gain in the
frequency domain with various gain settings of ‘A’ for constant overdrive and bandwidth (right)
When the reutilized crosstalk signal is used, power is saved in both the AGC gain stage and the DFE taps. The savings increase with increasing crosstalk strength and increasing signal loss.
Figure 4.17 shows the pulse responses of the forward signal and the reutilized crosstalk signal. The reutilized signal is the crosstalk signal that is returned to the forward path through the differentiation in the XTC circuit. This double differentiation, first by the crosstalk coupling and second during generation of the XTC signal, makes the forward and reutilized signals fully correlated. Because they are correlated, the reutilized crosstalk signal reshapes the forward pulse response in a way that widens the vertical eye opening, as shown in Fig. 4.17. The main benefits are an increased cursor (h0) and a reduced first ISI tail (h1), due to the reutilized crosstalk components (m0 and m1). The increased cursor translates directly into a lower AGC gain requirement and thus lower power consumption. For stronger reutilized crosstalk, the final converged AGC gain is significantly reduced, as shown in the fourth column of Figs. 4.13 and 4.14, while the equalized signal maintains a constant target amplitude.
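The effect of the extra cursor (m0) and the reduced first tail (m1) can be quantified with a peak-distortion estimate of the worst-case eye, taken directly from the sampled pulse response. This Python sketch uses hypothetical pulse samples, not the values of Fig. 4.17, ignores pre-cursor ISI and equalization, and assumes the AGC scales the cursor to a fixed target amplitude; the function names are illustrative.

```python
def worst_case_eye(pulse):
    """Peak-distortion estimate of the vertical eye height for +/-1 symbols.

    pulse[0] is the cursor and pulse[1:] are post-cursor ISI samples
    (pre-cursors are ignored in this simplified sketch).
    """
    cursor, tails = pulse[0], pulse[1:]
    return 2.0 * (cursor - sum(abs(t) for t in tails))


def agc_gain_for_target(pulse, target=1.0):
    """AGC gain that scales the cursor to a fixed target amplitude."""
    return target / pulse[0]


# Hypothetical sampled pulse responses (not taken from Fig. 4.17).
h = [0.50, 0.20, 0.08, 0.03]       # forward signal: h0, h1, h2, h3
m = [0.06, -0.05, 0.0, 0.0]        # reutilized crosstalk: adds to h0, cancels part of h1
combined = [a + b for a, b in zip(h, m)]

print(round(worst_case_eye(h), 3), round(worst_case_eye(combined), 3))   # 0.38 0.6
print(round(agc_gain_for_target(h), 3), round(agc_gain_for_target(combined), 3))
```

With these numbers, raising the cursor and trimming h1 both opens the worst-case eye and lowers the AGC gain needed to reach the target.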
Fig. 4.17 A vertical eye-opening improvement by adding a reutilized crosstalk signal
The graphs in Fig. 4.18 summarize the simulation results of Figs. 4.13 and 4.14. Graph (a) plots the final converged XTC ratio (α) for various crosstalk strengths. For stronger crosstalk, the XTC allots more gain to the crosstalk cancellation path (α). For larger insertion loss, α increases because the forward signal amplitude becomes smaller relative to the crosstalk strength, so a larger addition ratio (α) is required. Graph (b) shows the final converged AGC gain (A) and the first DFE tap coefficient (C1) for various insertion losses. A larger insertion loss requires a higher AGC gain to meet the constant target amplitude and larger DFE tap coefficients to cope with larger ISI tails. Interestingly, the larger reutilized crosstalk signal reduces the required DFE tap coefficients, and the final value of C1 is smaller for higher crosstalk.
Graph (c) shows the power saved on the AGC only, and graph (d) shows the total power saved (%) for the AGC and DFE taps relative to the total power consumed by the AGC and DFE taps in simulation I. For higher crosstalk and increased insertion loss, the improvement from the reutilized crosstalk signal is larger and more power is saved. For example, for −19.7 dB of insertion loss and 180 mVp-p of crosstalk, the AGC and DFE current is reduced by 35% of their 4 mA consumption (about 1.4 mA). Clearly, the absolute power values will depend on the circuit and technology used for implementation. Additionally, the total power reduction will depend on the power consumed by the rest of the circuits. Nevertheless, the reutilized crosstalk signal offers potential power savings. An intuitive way to think about this is that we are increasing the SNR by reutilizing signal energy that would normally have been lost.
Fig. 4.18 Adaptive XTC, AGC and DFE simulation results (panels a to d)
4.5 Conclusions
A new adaptive algorithm for XTC is proposed and shown to work in conjunction with adaptive AGC and DFE schemes. Transition filtering detectors identify undercompensation or overcompensation from the sampled value in the middle of the data transition. Through a low-speed control loop, the XTC gain adapts optimally to the strength of the crosstalk. The range of input signal amplitudes after XTC is handled by the adaptive AGC and DFE stages. LMS algorithms are used to adapt the AGC and DFE and are integrated with our adaptive crosstalk cancellation scheme. Because the adaptive loops use different timings, the adaptive XTC and the adaptive AGC and DFE run independently, which makes integrating the two adaptive loops feasible. The benefit of the reutilized crosstalk energy has been quantified by comparing the final coefficient values of the AGC and the DFE. The reutilized crosstalk signal contributes to the cursor and DFE taps, allowing smaller adapted values and significant power savings.
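One way to picture the transition-filtered, low-speed XTC loop is a sign-sign update driven by mid-transition samples. The Python sketch below is a behavioral model under stated assumptions, not the book's detector circuit: the residual crosstalk at the middle of an aggressor transition is modeled as (α − β)·s, where α is the XTC gain, β the unknown coupling and s the aggressor's transition direction; only UIs in which the victim is steady and the aggressor switches are used; and the gain is updated once per block of 32 such events by majority vote, to emulate a low-speed control loop. All parameter values are illustrative.

```python
import random


def adapt_xtc_gain(beta=0.3, mu=0.005, n_ui=40000, block=32, noise=0.05, seed=1):
    """Sign-sign adaptation of a hypothetical XTC gain `alpha` toward the coupling `beta`.

    Simplified per-UI model (an assumption, not the book's circuit): at the middle
    of an aggressor transition, the victim sample carries a residual crosstalk of
    (alpha - beta) * s, where s = +1/-1 is the aggressor's transition direction.
    """
    rng = random.Random(seed)
    alpha = 0.0
    d1_prev = d2_prev = 1
    votes = count = 0
    for _ in range(n_ui):
        d1 = rng.choice((-1, 1))            # victim symbol
        d2 = rng.choice((-1, 1))            # aggressor symbol
        s = (d2 - d2_prev) // 2             # aggressor transition: -1, 0 or +1
        # Transition filter: use only UIs where the victim is steady and the aggressor switches.
        if d1 == d1_prev and s != 0:
            z = d1 + (alpha - beta) * s + rng.gauss(0.0, noise)
            e = z - (1 if z > 0 else -1)    # deviation from the sliced level
            votes += (1 if e > 0 else -1) * s
            count += 1
            if count == block:              # low-speed loop: one update per block
                if votes > 0:
                    alpha -= mu             # overcompensation
                elif votes < 0:
                    alpha += mu             # undercompensation
                votes = count = 0
        d1_prev, d2_prev = d1, d2
    return alpha


print(round(adapt_xtc_gain(), 2))   # expected to settle near beta = 0.3
```

The block-wise majority vote is one simple way to slow the loop down relative to the data rate; the book's actual loop timing may differ.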
Acknowledgments The present research was supported in part by the Semiconductor Research
Corporation under grant #2008-HC-1836-090768 at the University of Minnesota and has been
partially conducted under a Research Grant from Kwangwoon University in 2013.
References
1. T. Oh, R. Harjani, A 6-Gb/s MIMO crosstalk cancellation scheme for high-speed I/Os.
IEEE JSSC 46, 1843–1856 (2011)
2. V. Stojanovic, A. Ho, B.W. Garlepp, F. Chen, J. Wei, G. Tsang, E. Alon, R.T. Kollipara,
C.W. Werner, J.L. Zerbe, M.A. Horowitz, Autonomous dual-mode (PAM2/4) serial link transceiver with
adaptive equalization and data recovery. IEEE JSSC 40, 1012–1026 (2005)
3. N. Krishnapura, M. Barazande-Pour, Q. Chaudhry, J. Khoury, K. Lakshmikumar, A. Aggarwal,
A 5 Gb/s NRZ transceiver with adaptive equalization for backplane transmission, in IEEE
ISSCC (2005), pp. 60–61
Crosstalk-and-ISI equalizing receiver in 2-drop single-ended SSTL memory channel, IEEE
CICC (2010), pp. 19–22
5. S.-K. Lee, H. Ha, H.-J. Park, J.-Y. Sim, A 5 Gb/s single-ended parallel receiver with adaptive
FEXT cancellation, in IEEE ISSCC (2012), pp. 140–142
6. S.-Y. Kao, S.-I. Liu, A 7.5 Gb/s one-tap-FFE transmitter with adaptive far-end crosstalk can-
cellation using duty cycle detection. IEEE JSSC 48, 391–404 (2013)
7. T. Oh, R. Harjani, 4×12 Gb/s 0.96 pJ/b/lane analog-IIR crosstalk cancellation and
signal reutilization receiver for single-ended I/Os in 65 nm CMOS, in IEEE VLSI (2012)
8. Y.-C. Huang, S.-I. Liu, A 6 Gb/s receiver with 32.7 dB adaptive DFE-IIR equalization, in IEEE
ISSCC (2011), pp. 356–357
9. T. Oh, R. Harjani, A 5 Gb/s 2×2 MIMO crosstalk cancellation scheme for high-speed I/Os, in
IEEE CICC (2010)
10. T. Oh, R. Harjani, A 12 Gb/s multi-channel I/O using MIMO crosstalk cancellation and signal
reutilization in 65 nm CMOS. IEEE JSSC 48(6), 1383–1397 (2013)
11. D.-J. Park, Y.-W. Kim, Convergence analysis of sign-sign LMS algorithm for adaptive filters
with correlated Gaussian data, in ICASSP 40, pp. 1380–1383 (1995)
Chapter 5
Research Summary and Contributions
A new approach to handling crosstalk is provided. While previous approaches have simply treated crosstalk as noise to be cancelled, we treat it as a deterministic and predictable signal that depends strongly on the shape of the aggressor signal. Careful observation of the characteristics of a crosstalk signal led us to the idea of suppressing the destructive crosstalk energy while reusing the constructive crosstalk energy. We propose a technique that makes these two mechanisms happen simultaneously (MIMO) and is hardware-friendly in terms of power.
The major distinction of our approach compared to existing XTC techniques is that the crosstalk cancellation signal is generated with an IIR filter. The crosstalk tail from a single aggressor pulse typically lasts several unit intervals (UI), and UI times have been shrinking in recent years in pursuit of higher throughput. The flip-flops needed to hold the high-speed signal for several UIs, and the wiring of the required connections, increase hardware complexity and power. We instead create the crosstalk cancellation signal with a passive differentiator. Silicon measurements, shown in Fig. 3.14, verify that the generated crosstalk cancellation signal and the crosstalk signal are almost identical. Using a passive IIR differentiator to create the XTC signal leads to a low-power hardware implementation. We implement XTC in the receiver front-end rather than the transmitter driver, because an additional circuit at the receiver typically consumes less power. In addition, we developed an implementation method to compensate for the timing mismatch introduced by adding a differentiator, as shown in Fig. 3.8. This timing mismatch in the XTC signal path has not been sufficiently addressed in previous approaches.
In recent XTC research, XTC between only two channels has been the main focus, for several reasons. First, XTC between two channels has been assumed, without proof, to extend to an arbitrarily large number of lanes, because repeating two-channel XTC seems applicable to multiple lanes (≥4). However, as illustrated in Fig. B.1, applying XTC between each pair of adjacent channels in multi-lane I/Os (≥4) generates an error signal whose penalty is not outweighed by the benefits of the additional XTC scheme in existing ISI-equalization-based I/O architectures. Second, a low-power XTC technique was not available, and extending XTC circuits to more than 4 channels did not look feasible because of the large area of multiple XTC blocks. As a result, an approach that handles crosstalk in multiple lanes (≥4) and extends XTC techniques to more than 4 channels had not been achieved. Through the crosstalk measurements shown in Figs. 3.4 and 3.5, we identified that crosstalk between two adjacent channels is the most significant noise source. The error-signal issue for multiple lanes was identified for the first time. We propose new differential-like single-ended I/Os, together with XTC techniques for them, that extend to an arbitrarily large number of lanes and avoid the error-signal issue. This technique can be applied to existing channels with the physical dimensions of differential I/Os, and the data rate can be up to twice that of a typical differential I/O, because two independent data signals are transmitted over a single differential pair.
Fig. 5.1 The conventional experiment versus proposed experiment to test XTC schemes
Experiments in current XTC research have been confined to eye-diagram improvement at a single channel output, as shown in the LHS picture of Fig. 5.1. This improvement is assumed to generalize to multiple lanes, which has not yet been sufficiently proven. In our research, we monitor the outputs of all channels and fairly show the improvement from XTC schemes when multiple independent data streams are transmitted over closely spaced channels, as in practical multi-lane parallel I/Os. By tracking the behavior of all signals, we found that the MIMO (XTR) signal improves the signal bandwidth when it is added to a forward signal. The MIMO signal can be used to reduce the burden on the equalization circuits that mitigate channel ISI (fewer LE stages or taps) or to push current equalization circuits to higher speeds. Pairing two wires that carry independent signals at closer spacing results in higher signal bandwidth, as long as the generated XTC signal can cover the large crosstalk amplitude caused by the closely spaced aggressor.
An adaptive algorithm that converges the crosstalk cancellation gain optimally and automatically is proposed and verified via circuit simulations. Because of their similarity, the detection circuits that judge insufficient or excessive XTC compensation can be merged into the detection circuits of existing clock recovery. The adaptive XTC loop copes with channel spacing variations arising from PCB manufacturing. In theory, the MIMO signal does not affect the XTC adaptation loop. We integrate the adaptive XTCR loop with existing AGC and LMS DFE loops and show that the integration can be achieved with independent detection timings. By comparing the converged AGC gain and DFE tap coefficients with and without the MIMO signal, we quantify the power saved by the MIMO signal.
XTC in single-ended I/Os is more challenging than in differential I/Os because of the larger crosstalk strength and worse PSRR. Through silicon measurement, we have shown that transmitting multiple data streams over closely spaced parallel single-ended I/Os is feasible. However, the proposed technique applies not only to single-ended I/Os but also to differential I/Os, as illustrated in Fig. 2.8. The crosstalk cancellation research presented here gives designers more options when designing wireline channels. Channels can be brought closer together, reducing the board area they consume. By mitigating crosstalk noise, the speed of a single wire can be pushed higher. If the number of I/Os is limited by channel proximity on the PCB, which is a major factor in the degree of crosstalk, the number of I/Os can be increased. The overall throughput of chip-to-chip data transmission can thus be significantly increased.
An intuitive analysis of the worst-case eye diagram is provided. Section 1.2 shows how to predict the worst-case eye opening directly from the pulse response of a forward signal or a crosstalk pulse. The crosstalk amplitude in differential I/Os is compared mathematically with that of single-ended I/Os in Sect. 1.3. We developed a new method to represent the operation of multiple independent signals in multiple channels and XTC circuits in a frequency-domain matrix form, as illustrated throughout Chaps. 2 and 3. Calibration techniques to optimally adjust the XTC gain are presented in Figs. 2.9, 2.10, and 3.13. A new, reliable measurement method to fairly evaluate the eye improvement from an XTC scheme is developed, as shown in Fig. 3.18. A sign-sign LMS algorithm that can optimally adjust the AGC gain and DFE tap coefficients is explained based on gradient descent theory. Using the AGC and adaptive DFE tap coefficients, we have developed a way to evaluate the improvement from the MIMO signal in the power domain.
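A sign-sign LMS loop of the kind described can be sketched in a few lines. This Python sketch assumes an open eye, a hypothetical pulse response h = (1.0, 0.4, 0.2, 0.1), a unit target amplitude and a fixed step μ, and it replaces both the error and the data in the gradient-descent update of e² by their signs. At convergence the AGC gain approaches target/h[0], and the DFE taps approach the AGC-scaled post-cursor ISI, A·h[1:], up to a dither of a few step sizes.

```python
import random


def ss_lms_agc_dfe(h=(1.0, 0.4, 0.2, 0.1), target=1.0, mu=0.002, n=40000, seed=0):
    """Sign-sign LMS adaptation of an AGC gain and a 3-tap DFE (illustrative sketch).

    h is a hypothetical sampled pulse response: h[0] is the cursor and h[1:] the
    post-cursor ISI that the DFE taps should cancel.
    """
    rng = random.Random(seed)
    agc = 0.5
    taps = [0.0] * (len(h) - 1)
    sym = [1] * len(h)               # transmitted symbols, most recent first
    dec = [1] * (len(h) - 1)         # past decisions fed back to the DFE
    for _ in range(n):
        sym = [rng.choice((-1, 1))] + sym[:-1]
        x = sum(hk * dk for hk, dk in zip(h, sym))            # channel output sample
        y = agc * x - sum(c * d for c, d in zip(taps, dec))   # AGC + DFE feedback
        d_hat = 1 if y >= 0 else -1
        e = y - target * d_hat                                # error vs. target level
        sgn_e = (e > 0) - (e < 0)
        # Gradient descent on e^2 with error and data replaced by their signs.
        agc -= mu * sgn_e * d_hat            # de/d(agc) = x, and sign(x) ~ d_hat
        taps = [c + mu * sgn_e * d for c, d in zip(taps, dec)]
        dec = [d_hat] + dec[:-1]
    return agc, taps


agc, taps = ss_lms_agc_dfe()
print(round(agc, 2), [round(c, 2) for c in taps])   # expected near 1.0 and [0.4, 0.2, 0.1]
```

Sign-sign updates need only comparators and up/down counters in hardware, which is one reason they are popular for adaptive AGC and DFE loops.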
Appendix A
Noise Analysis
In this section we compare the noise performance of a conventional analog equalizer and the MIMO-XTC scheme. Figure A.1 shows the noise simulation setup for the two schemes. For this simulation we assume that four independent random noise sources ($N_{T1}$, $N_{T2}$, $N_{R1}$, $N_{R2}$) at the transmitter outputs and at the receiver inputs degrade SNR. The signal, crosstalk and noise energy at points A, B, C, and D are summarized in Fig. A.2. The simulation test setup used for these results includes an 800 mVp-p NRZ signal at 5 Gb/s at the transmitter output, a 16-in. channel length, a width of 120 mils, a height of 62.5 mils, and a separation of 240 mils between channels. These are similar to the measured responses shown in Fig. 2.11d.
For these crosstalk-dominant channels, a conventional analog equalizer scheme is designed for comparison purposes, as shown in Fig. A.1a. The gain and pole location in $A_{AEC}(\omega)$ are set to give the same signal swing as the MIMO-XTC scheme in Fig. A.1b; thus,

$$\int_0^{\omega_C} |A_{AEC}(\omega)H(\omega)X(\omega)|^2\,d\omega = \int_0^{\omega_C} |G\,A_{AE}(\omega)(1+\tau^2\omega^2)H(\omega)X(\omega)|^2\,d\omega.$$

Note that the signal energy at points C and D in Fig. A.2 is matched at 311 mVrms. The corner frequency used for simulation purposes ($\omega_C$) is 1 THz. The output signal and crosstalk are $A_{AEC}(\omega)H(\omega)X_1(\omega) - j\omega\tau A_{AEC}(\omega)H(\omega)X_2(\omega)$ for channel 1 and $-j\omega\tau A_{AEC}(\omega)H(\omega)X_1(\omega) + A_{AEC}(\omega)H(\omega)X_2(\omega)$ for channel 2. The crosstalk energy is amplified by $A_{AEC}(\omega)$ and increases from 163 to 182 mVrms, as shown in Fig. A.2.
The noise at the receiver input (point B in Fig. A.1a), including coupling, can be expressed as $H(\omega)N_{T1} - j\omega\tau H(\omega)N_{T2} + N_{R1}$ for channel 1 and $-j\omega\tau H(\omega)N_{T1} + H(\omega)N_{T2} + N_{R2}$ for channel 2. When the noise passes through a conventional analog equalizer system, the output noise ($N_{AEC}$) is

$$\begin{bmatrix} N_{AEC1} \\ N_{AEC2} \end{bmatrix} = \begin{bmatrix} A_{AEC}(\omega)\left(H(\omega)N_{T1} - j\omega\tau H(\omega)N_{T2} + N_{R1}\right) \\ A_{AEC}(\omega)\left(-j\omega\tau H(\omega)N_{T1} + H(\omega)N_{T2} + N_{R2}\right) \end{bmatrix} \tag{A.1}$$
Fig. A.1 Noise simulation setup in a severe crosstalk environment. a A conventional analog equalizer scheme. b A MIMO-XTC scheme
Fig. A.2 Signal, crosstalk and noise energy (mVrms) and BER performance

The output signal for MIMO-XTC, which can be obtained by appending an analog equalizer stage to (2.3), is $A_{AE}(\omega)Z(\omega) = A_{AE}(\omega)(1+\tau^2\omega^2)G\,H(\omega)X(\omega)$ for both channels. Here, $X(\omega)$ is the transmitted signal. The crosstalk component is cancelled and reused as a MIMO signal in our MIMO-XTC. The signal energy at point D in Fig. A.1b includes this beneficial MIMO energy. On the other hand, the noise component $N_M$ at point D can be expressed as
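$$\begin{aligned}
\begin{bmatrix} N_{M1} \\ N_{M2} \end{bmatrix}
&= \begin{bmatrix} A_{AE}(\omega) & 0 \\ 0 & A_{AE}(\omega) \end{bmatrix}
\begin{bmatrix} G & j\omega\beta G \\ j\omega\beta G & G \end{bmatrix}
\begin{bmatrix} H(\omega)N_{T1} - j\omega\tau H(\omega)N_{T2} + N_{R1} \\ -j\omega\tau H(\omega)N_{T1} + H(\omega)N_{T2} + N_{R2} \end{bmatrix} \\
&= \begin{bmatrix} G\,A_{AE}(\omega)\left[H(\omega)(1+\tau^2\omega^2)N_{T1} + N_{R1} + j\omega\tau H(\omega)N_{R2}\right] \\ G\,A_{AE}(\omega)\left[H(\omega)(1+\tau^2\omega^2)N_{T2} + j\omega\tau H(\omega)N_{R1} + N_{R2}\right] \end{bmatrix}
\end{aligned} \tag{A.2}$$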
We can see that the noise at the transmitter outputs does not appear in the adjacent channel even though the channels are coupled: the MIMO-XTC scheme cancels the coupled noise between channels, much as it cancels crosstalk. However, the noise from the receiver input of the adjacent channel is coupled through the differentiator and increases the overall noise level. In Fig. A.2, the noise energy at point D increases from 5.5 to 16 mVrms because of this additional receiver-input noise from the adjacent channel. As expected, the MIMO-XTC scheme has a noise penalty compared with the conventional analog equalizer. However, the residual crosstalk in the conventional analog equalizer scheme significantly degrades SNDR in this severe crosstalk environment, so the overall noise plus crosstalk is considerably smaller for MIMO-XTC. The BER can be evaluated at each point from the eye opening (E, H in Fig. 2.11c) and the noise standard deviation $\sigma_n$, where $\sigma_n^2 = V_{noise}^2 + V_{crosstalk}^2$ [1]:

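$$\begin{aligned}
P_e \approx\; & 0.0284\,\frac{\sigma_n}{(H-E)/2}\left[Q\!\left(\frac{E}{2\sigma_n}\right) - Q\!\left(\frac{H}{2\sigma_n}\right)\right]
+ 0.0118\left[\frac{E}{2\sigma_n}\exp\!\left(-\frac{E^2}{8\sigma_n^2}\right) - \frac{H}{2\sigma_n}\exp\!\left(-\frac{H^2}{8\sigma_n^2}\right)\right] \\
& - 0.1023\,\frac{\sigma_n}{(H-E)/2}\left[\exp\!\left(-\frac{E^2}{8\sigma_n^2}\right) - \frac{H}{2\sigma_n}\exp\!\left(-\frac{H^2}{8\sigma_n^2}\right)\right]
\end{aligned} \tag{A.3}$$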
The BERs at points B, C and D are approximately $10^{-11}$, $10^{-2}$, and $10^{-24}$, respectively. For this simulation, the noise bandwidth of the sources $N_{T1}$, $N_{T2}$, $N_{R1}$ and $N_{R2}$ was limited to 100 GHz [1], and a standard deviation of 5 mVrms was used [2]. Figure A.2 shows the noise and signal energies at points A, B, C, and D in Fig. A.1. The noise and signal powers at each point have been estimated in mVrms by numerically evaluating the following:
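• Point A
$$V_{noise}^2 = \int_0^{\omega_C} |N_T|^2\,d\omega$$
$$V_{crosstalk}^2 = 0$$
$$V_{signal}^2 = \int_0^{\omega_C} |X(\omega)|^2\,d\omega$$
• Point B
$$V_{noise}^2 = \int_0^{\omega_C} \left(|H(\omega)N_T|^2 + |-j\omega\tau H(\omega)N_T|^2 + |N_R|^2\right) d\omega$$
$$V_{crosstalk}^2 = \int_0^{\omega_C} |-j\omega\tau H(\omega)X(\omega)|^2\,d\omega$$
$$V_{signal}^2 = \int_0^{\omega_C} |H(\omega)X(\omega)|^2\,d\omega$$
• Point C
$$V_{noise}^2 = \int_0^{\omega_C} \left(|A_{AEC}(\omega)H(\omega)N_T|^2 + |-j\omega\tau A_{AEC}(\omega)H(\omega)N_T|^2 + |A_{AEC}(\omega)N_R|^2\right) d\omega$$
$$V_{crosstalk}^2 = \int_0^{\omega_C} |-j\omega\tau A_{AEC}(\omega)H(\omega)X(\omega)|^2\,d\omega$$
$$V_{signal}^2 = \int_0^{\omega_C} |A_{AEC}(\omega)H(\omega)X(\omega)|^2\,d\omega$$
• Point D
$$V_{noise}^2 = \int_0^{\omega_C} \left(|G A_{AE}(\omega)H(\omega)(1+\tau^2\omega^2)N_T|^2 + |G A_{AE}(\omega)N_R|^2 + |j\omega\tau G A_{AE}(\omega)H(\omega)N_R|^2\right) d\omega$$
$$V_{crosstalk}^2 \approx 0$$
$$V_{signal}^2 = \int_0^{\omega_C} |G A_{AE}(\omega)H(\omega)(1+\tau^2\omega^2)X(\omega)|^2\,d\omega$$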
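Integrals of this form can be evaluated with any standard quadrature rule. The following Python sketch uses the trapezoidal rule on a hypothetical normalized first-order channel $H(\omega) = 1/(1 + j\omega/\omega_p)$ and a flat unit transmit spectrum; `W_P`, `TAU`, `W_C` and the unit gains `G`, `A_AE` are assumptions chosen only to illustrate the procedure, so the printed numbers are not the mVrms values of Fig. A.2.

```python
import math


def rms_from_spectrum(mag_sq, w_c, n=20000):
    """sqrt(integral_0^w_c mag_sq(w) dw) via the trapezoidal rule, following the
    appendix's convention of reporting each energy as an rms value."""
    step = w_c / n
    total = 0.5 * (mag_sq(0.0) + mag_sq(w_c))
    for k in range(1, n):
        total += mag_sq(k * step)
    return math.sqrt(total * step)


# Hypothetical normalized models (not the channel of Fig. A.1):
W_P = 1.0          # pole of a first-order channel H(w) = 1 / (1 + j w / W_P)
TAU = 0.3          # coupling time constant tau
W_C = 5.0          # integration limit omega_C
G = A_AE = 1.0     # unit front-end gains


def h_sq(w):
    """|H(w)|^2 for the first-order channel model."""
    return 1.0 / (1.0 + (w / W_P) ** 2)


# Flat unit transmit spectrum |X(w)|^2 = 1.
v_signal_b = rms_from_spectrum(h_sq, W_C)
v_xtalk_b = rms_from_spectrum(lambda w: (w * TAU) ** 2 * h_sq(w), W_C)
v_signal_d = rms_from_spectrum(
    lambda w: (G * A_AE * (1 + TAU**2 * w**2)) ** 2 * h_sq(w), W_C)
print(round(v_signal_b, 3), round(v_xtalk_b, 3), round(v_signal_d, 3))
```

Even with unit gains, the $(1+\tau^2\omega^2)$ factor makes the point-D signal larger than the point-B signal, which is the MIMO contribution discussed above.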
Appendix B
Issues of Applying Consecutive 2 × 2 XTCR on
Multi-Lane I/Os (≥4)
Figure B.1 shows the result of applying the 2 × 2 crosstalk cancellation scheme developed in a previous study [3] to a set of parallel I/Os (≥4). The top LHS sub-figure shows the physical realization. The bottom LHS sub-figure presents the frequency-domain model of the channel, including far-end crosstalk (FEXT), together with a model of the 2 × 2 XTCR analog front-end design [3]. Here, H(ω) is the channel transfer function, and −jωβH(ω) is the crosstalk model for a micro-strip line, where β is the coupling coefficient between channels. X1–X8 are the independent transmitted NRZ signals, and Y1–Y8 are the received signals at the channel outputs. G represents the gain of an adder with an LE. Z1–Z8 are the final output signals after the XTCR process. The received signals in channels 2–7 (Y2–Y7) each contain two crosstalk signals, one from each neighbor, as presented in the top RHS matrix; this represents the nominal case for an arbitrarily large number of parallel I/Os. The bottom RHS matrix shows the result of the XTCR analog front-ends in the frequency domain. In the final output signal after XTCR, each channel has two additional error terms (ω²β²HGX) that are uncorrelated with the forward signal and act as noise. Unfortunately, the amplitude of these error terms is not negligible for reasonable coupling coefficient values (β). The problem can be avoided by spacing the channels far enough apart that the error terms can be ignored, but this increases the area and potentially the routing length of each channel, and thus limits the I/O speed. This is a common issue in all crosstalk cancellation schemes that target multiple parallel I/Os (≥4), but so far it has largely been ignored.
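The error terms can be checked numerically at a single frequency. This Python sketch assumes that each lane sees −jωβH coupling from each immediate neighbor and that each XTCR front-end adds jωβ times each neighbor's received signal before the gain G; the lane count and the β, H and G values are arbitrary test values. For interior lanes it reproduces Z_i = GH[(1 + 2ω²β²)X_i + ω²β²(X_{i−2} + X_{i+2})], i.e., the two residual ω²β²HGX terms come from the lanes two positions away.

```python
import random


def xtcr_outputs(x, w, beta, h, g):
    """One-frequency model of N coupled lanes with a per-lane XTCR front-end.

    Assumptions (illustrative): FEXT from each neighbour is -j*w*beta*H, and each
    front-end adds j*w*beta times each neighbour's received signal, scaled by g.
    """
    n = len(x)
    jwb = 1j * w * beta

    def neighbours(v, i):
        # Sum of the immediate neighbours; edge lanes have only one.
        return (v[i - 1] if i > 0 else 0) + (v[i + 1] if i < n - 1 else 0)

    y = [h * x[i] - jwb * h * neighbours(x, i) for i in range(n)]   # received signals
    return [g * (y[i] + jwb * neighbours(y, i)) for i in range(n)]  # after XTCR


rng = random.Random(7)
x = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(8)]
w, beta, h, g = 2.0, 0.1, 0.8 - 0.3j, 1.5
z = xtcr_outputs(x, w, beta, h, g)

a = (w * beta) ** 2
for i in range(2, 6):   # lanes 3-6 (1-based): all lanes within two positions exist
    predicted = g * h * ((1 + 2 * a) * x[i] + a * (x[i - 2] + x[i + 2]))
    print(i + 1, abs(z[i] - predicted) < 1e-12)
```

The residual terms scale as ω²β², so they grow with frequency and coupling, which is why tighter channel spacing makes the issue worse.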

Fig. B.1 Application of 2 × 2 XTCR analog front-ends in multi-lane parallel I/Os (left) and the
frequency domain matrix representation of multiple 2 × 2 XTCR (right)
Appendix C
Transmitter-Side Discrete-Time FIR XTC Filter
Versus Receiver-Side Analog-IIR XTC Filter
Figure C.1 shows two widely used voltage-mode XTC architectures. For simplicity, we focus on the signal flow for crosstalk cancellation in channel 1 only, and ignore secondary effects such as crosstalk reutilization at this point. Discrete-time FIR XTC schemes, such as the one shown in Fig. C.1a, are usually realized at the transmitter side, because the digital data stream and clock information are available there. The adjacent channel's digital signal (channel 2) is appropriately delayed by flip-flops and added with the correct FIR coefficients at the drivers to generate an XTC signal [4]. The crosstalk cancellation then occurs in the channels, as shown in the top RHS of Fig. C.1a. For better signal integrity in multi-lane (≥4) I/Os, the I/O receiver must deal with both channel ISI and all the independent crosstalk signals. So, although FIR filters can generate arbitrary filter shapes, generating the crosstalk signal with FIR taps requires accounting for both the shape of the channel ISI and its derivative, which becomes significantly more complex and power hungry. Receiver-side analog-IIR crosstalk cancellation architectures, on the other hand, take the adjacent channel's received signal (channel 2), differentiate it (d/dt), multiply it by an appropriate gain (β) to generate an XTC signal, and add it to the signal line (channel 1), as shown in Fig. C.1b.
The dotted red boxes in Fig. C.1a, b show additional details of the two methods for generating the crosstalk cancellation signal. The first thing to note is that the transmitter FIR XTC scheme requires fractional taps. The crosstalk signal is proportional to the derivative of the signal, so the maximum crosstalk amplitude occurs at the data transitions rather than in the middle of the eye opening. This means that the XTC FIR filter must operate at twice the clock speed, or use interleaved clocks, as illustrated in the waveform inside the dotted box of Fig. C.1a. Generating signals at twice the clock speed requires higher bandwidth or more complex hardware, and consumes more power. In addition, the transmitter driver is one of the more power-hungry blocks in the overall I/O system, and the additional drivers needed to create sharp edges for an XTC signal require large driving capability, which further increases power consumption. When pre-emphasis is used to mitigate ISI, the crosstalk becomes even sharper, and the XTC signal needs a very sharp edge, which implies high speed and high power.
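The need for fractional taps can be seen in a short discrete-time experiment. This Python sketch assumes an oversampled NRZ aggressor with linear edges (produced by a 4-sample moving average) and models FEXT as −β times the sample-to-sample derivative; all parameter values and helper names are illustrative. The crosstalk is exactly zero at every mid-UI sample and peaks on the edge that starts at a UI boundary, so a UI-spaced filter aligned to the eye center cannot place energy where the crosstalk actually is.

```python
def nrz(bits, spu=16):
    """Oversampled NRZ waveform, `spu` samples per UI."""
    return [float(b) for b in bits for _ in range(spu)]


def smooth(x, n=4):
    """Causal n-sample moving average: turns each NRZ step into a linear edge."""
    out, acc = [], 0.0
    for k, v in enumerate(x):
        acc += v
        if k >= n:
            acc -= x[k - n]
        out.append(acc / min(k + 1, n))
    return out


def fext(aggressor, beta=0.2):
    """Crosstalk modeled as -beta times the sample-to-sample derivative."""
    return [0.0] + [-beta * (b - a) for a, b in zip(aggressor, aggressor[1:])]


SPU = 16
bits = [1, 1, -1, 1, -1, -1, 1, 1]
xt = fext(smooth(nrz(bits, SPU)))
centers = [xt[m * SPU + SPU // 2] for m in range(len(bits))]   # mid-UI samples
peak = max(range(len(xt)), key=lambda k: abs(xt[k]))
print(centers)        # all zero: crosstalk vanishes at the eye centre
print(peak % SPU)     # small offset: the peak sits on an edge at a UI boundary
```

Sharper edges (a shorter moving average, or pre-emphasis) concentrate the same crosstalk into fewer samples, which is the "sharper XTC edge" problem described in the text.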

Fig. C.1 Comparison of architectures and comparison of XTC signal generation for transmitter FIR XTC and receiver IIR XTC. a Signal flow in an XTC architecture with a transmitter FIR filter. b Signal flow in an XTC architecture with a receiver IIR filter
There are additional complications with incorporating the XTC signal at the transmitter driver. First, the driver for the XTC signal consumes part of the voltage margin of the overall driver, which is already limited by the pre-emphasis FIR. This forces a trade-off between the achievable vertical eye opening and XTC performance. Second, routing the complicated delay lines in the multi-tap FIR filters used for XTC can limit bandwidth. Third, adapting the tap weights of the XTC scheme is challenging, because twice as many FIR taps must be adapted as in the pre-emphasis FIR filter, with one set for each adjacent channel. In principle, as long as the tap DACs have sufficient resolution and control range, an accurate XTC signal can be generated over a wide range of crosstalk strengths. However, all the DAC weights then have to be adjusted optimally and automatically. Further, the residual crosstalk after XTC is known only at the receiver side, so some I/Os must be dedicated to sending this information back to the transmitter side, albeit at lower speeds.
The receiver-side analog-IIR XTC scheme uses the received signal from the adjacent line (channel 2), as shown in the dotted red box of Fig. C.1b. For closely spaced channels, with the multiple distributed poles and zeros inherent in a transmission line, the crosstalk signal at the far-end receiver side (FEXT) is close to an ideal negative differentiation [5–8]. An analog passive differentiation block can therefore emulate the XTC signal, and after proper gain control and addition with a negative sign, the crosstalk can be cancelled. However, the analog-IIR XTC filter can only differentiate the waveform and is less flexible than the transmitter-side FIR XTC. There are likely to be some deviations from ideal differentiation due to transmission nulls caused by discontinuities, and any discrepancy from ideal differentiation leaves some residual crosstalk. However, we find that in most micro-strip lines the majority of the crosstalk signal follows the ideal derivative and the residual error is fairly small. So the additional flexibility of the discrete-time transmitter FIR XTC scheme mainly adds to the complexity of the calibration process and tends to be power hungry, while not providing significantly more benefit.
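How closely a passive network can approximate the ideal differentiator can be estimated with a first-order RC high-pass, H(jω) = jωτ/(1 + jωτ), which behaves like jωτ well below its corner frequency. This network is an assumption for illustration (the text does not specify the exact passive circuit), and τ = 5 ps is a hypothetical value. The relative error against jωτ works out to ωτ/√(1 + ω²τ²), so it stays small only while the corner frequency sits well above the data's Nyquist frequency.

```python
import math


def rc_highpass(w, tau):
    """Passive RC high-pass H(jw) = j w tau / (1 + j w tau), with tau = RC."""
    s = 1j * w * tau
    return s / (1 + s)


def relative_error(w, tau):
    """|H_RC - j w tau| / |j w tau| against an ideal differentiator j w tau."""
    ideal = 1j * w * tau
    return abs(rc_highpass(w, tau) - ideal) / abs(ideal)


# Hypothetical numbers: tau = 5 ps, evaluated at the 6 GHz Nyquist frequency of 12 Gb/s NRZ.
tau = 5e-12
w = 2 * math.pi * 6e9
print(round(relative_error(w, tau), 3))
```

In this simple model the error also carries a phase lag, so the residual shows up both as an amplitude deviation and as a small timing offset of the XTC signal.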
Appendix D
Line Mismatch Sensitivity
In this appendix, we consider the impact of line mismatch on the eye opening when using the IIR-based XTCR technique introduced earlier. The top LHS of Fig. D.1 shows the setup used to identify the sensitivity of the eye opening to channel length mismatch. In this set of simulations, we vary the length mismatch and observe the eye diagram at the channel 2 output. The channel 2 analog front-end output contains four components: the forward signal ($GHX_2(\omega)$), the reutilized crosstalk ($(\omega^2\beta^2 + \omega^2\delta^2)GHX_2(\omega)$), the residual crosstalk after XTC, and an error signal ($\omega^2\beta\delta GHX_4(\omega)$). The error signal is not sensitive to the line mismatch because it is independent of the forward signal in channel 2 and behaves like noise. However, the reutilized crosstalk signal and the residual crosstalk after XTC are affected by the line mismatch.
The wavelength of a 6 GHz signal, which is the Nyquist frequency of a 12 Gb/s NRZ signal, can be calculated as

$$\lambda = \frac{c}{f\sqrt{\varepsilon_r}} = \frac{3\cdot 10^8}{6\cdot 10^9 \cdot \sqrt{4.4}} \approx 937\ \text{(mil)} \tag{D.1}$$
Therefore, a 56 mil line mismatch causes a 10 ps time delay (56/937 of the 166.7 ps period at 6 GHz). The eye diagrams on the RHS of Fig. D.1 show the simulated eye as the time delay is increased in 10 ps steps. With no mismatch, the vertical eye opening was 210/628 mVppd, which is 33% of the maximum peak-to-peak differential amplitude of the NRZ signal. With a 40 ps delay mismatch, the vertical eye opening is reduced to 85.4/816 mVppd because of the increased residual crosstalk and the timing mismatch of the 2nd-derivative signal relative to the forward signal. The bottom LHS graph in Fig. D.1 summarizes the vertical eye-opening results and their percentages of the overall eye amplitude for various time delay mismatches. A 56 mil (≈1.4 mm) line mismatch reduces the eye opening from 210 mV to 156 mV. However, these results are for the worst-case location of the mismatch, i.e., at the receiver input.
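The arithmetic behind (D.1) and the 56 mil ≈ 10 ps figure is easy to reproduce. This Python sketch takes c = 3·10^8 m/s and ε_r = 4.4 from (D.1); with these inputs the wavelength evaluates to about 938 mil, so the 937 mil quoted in (D.1) reflects rounding. The helper names are illustrative.

```python
import math

C0 = 3e8          # speed of light in m/s, as used in (D.1)
MIL = 25.4e-6     # one mil in metres


def wavelength_mil(f_hz, eps_r):
    """Guided wavelength c / (f * sqrt(eps_r)), expressed in mils."""
    return C0 / (f_hz * math.sqrt(eps_r)) / MIL


def mismatch_delay_ps(length_mil, eps_r):
    """Extra propagation delay (ps) of a line that is `length_mil` longer."""
    velocity = C0 / math.sqrt(eps_r)
    return length_mil * MIL / velocity * 1e12


print(round(wavelength_mil(6e9, 4.4)))        # about 938 mil
print(round(mismatch_delay_ps(56, 4.4), 1))   # about 9.9 ps, i.e. roughly 10 ps
```

Because the delay is linear in length, each additional 10 ps step in Fig. D.1 corresponds to roughly another 56 mil of mismatch.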

Fig. D.1 Eye degradation at the channel 2 analog front-end output caused by line mismatch at the channel output (panels a to e)
Appendix E
Input Matching for 4 × 4 XTCR Receiver
Test Bench
See input matching in Fig. E.1.
Fig. E.1 Input matching circuits and simulated reflections (S11 , S22 ) and voltage gains (Vout /Vin )

Appendix F
Bandwidth Improvement by Technology
Scaling
This appendix presents simulation results that show how MOSFET performance improves as technology scaling advances. The significant advantages are the reduction of the overlap capacitance between the gate and the source and the increase in current driving capability. Figure F.1 presents the definition of the driving capability. As derived in the equation, the current gain $i_{out}/i_{in}$ is proportional to $g_m/C_g$, which is approximately $3\mu_n V_{OV}/(2L^2)$. As the channel length L is scaled down, the current driving capability of a MOSFET is therefore expected to increase as $1/L^2$. As the frequency of the input AC current increases, the current gain $i_{out}/i_{in}$ falls to unity; we define this frequency as the transit frequency. Since both the current gain and the transit frequency are proportional to $g_m/C_g$, the transit frequency is typically used to represent the driving capability of a technology.
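The long-channel expression for ω_T can be used to see the 1/L² trend directly. This Python sketch assumes a fixed, hypothetical mobility for all three nodes and the 100 mV overdrive used in Fig. F.2. Because it ignores velocity saturation and other short-channel effects, the absolute values it prints for the smaller nodes are likely much higher than what real short-channel devices achieve; only the ratio between nodes is meant to illustrate the scaling.

```python
import math


def omega_t(mu_n, v_ov, length):
    """Long-channel estimate omega_T = g_m / C_g ~= 3 * mu_n * V_OV / (2 * L**2)."""
    return 3.0 * mu_n * v_ov / (2.0 * length ** 2)


MU_N = 0.04    # hypothetical electron mobility in m^2/(V*s), i.e. 400 cm^2/(V*s)
V_OV = 0.1     # 100 mV overdrive, as in the simulations of Fig. F.2

for node_nm in (130, 65, 32):
    wt = omega_t(MU_N, V_OV, node_nm * 1e-9)
    print(node_nm, "nm:", round(wt / (2 * math.pi) / 1e9), "GHz")

print(round(omega_t(MU_N, V_OV, 32e-9) / omega_t(MU_N, V_OV, 130e-9), 2))  # (130/32)^2 = 16.5
```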
The graph on the left of Fig. F.2 shows the simulated $i_{out}/i_{in}$ for 130 nm, 65 nm, and 32 nm CMOS technologies, with the transit frequency of each technology marked in the graph. Since the overdrive voltage affects the transit frequency, we set it to 100 mV for all simulation conditions. The graph on the right of Fig. F.2 shows $g_m/C_g$ for various power levels. In this simulation, we maintain a constant overdrive voltage (100 mV) but increase the MOSFET size so that the DC current through the device ranges from 1 to 5 mA for each technology.
Although we vary the aspect ratio of the MOSFET (W/L), the value of $g_m/C_g$ does not change and depends strongly on technology scaling (L). These trends support the bottom-right equation of Fig. F.1, showing that the transit frequency depends mostly on the mobility ($\mu_n$), the overdrive voltage ($V_{OV}$) and the channel length scaling ($L^2$).
Fig. F.1 Description of the definition of driving capability and ωT
Fig. F.2 ωT simulation results with constant 100 mV overdrive voltage at each technology (left) and gm/Cg plots for various currents and MOSFET aspect ratios with constant 100 mV overdrive voltage (right)
Fig. F.3 Bandwidth (cutoff frequency) improvement with the technology scaling
Although the technology scales, the parasitics from routing metals do not. For the simulations of the differential pair shown at the top of Fig. F.3, we increase the MOSFET power (1–5 mA) for 130 nm and 32 nm while maintaining a constant overdrive voltage (100 mV) and assuming a constant parasitic capacitance (100 fF). The graphs at the bottom of Fig. F.3 show AC gain plots for various MOSFET power levels. For 32 nm CMOS, Cg varies from 6.6 to 33 fF in our simulation setting, and Cd is 10–20% of Cg. As a result, the 1-dB cutoff frequency is dominated by the R_L C_L product and does not vary. On the other hand, Cg for 130 nm CMOS ranges from 20.5 to 102.9 fF, as illustrated in the graph on the right of Fig. F.2, and affects the cutoff frequency.
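The argument that the cutoff frequency is dominated by R_L C_L can be sketched with a single-pole load. This Python sketch assumes C_L = 100 fF (the routing parasitic used for Fig. F.3) plus a drain capacitance equal to 15% of Cg (the midpoint of the 10–20% quoted above); R_L = 500 Ω is hypothetical, and the real circuit may include other capacitances. Using the Cg ranges quoted for each node, the relative drop in the 1-dB cutoff across the power sweep comes out several times larger for 130 nm than for 32 nm, which is consistent with the trend described.

```python
import math

C_PAR = 100e-15                    # constant routing parasitic, as assumed for Fig. F.3
R_L = 500.0                        # hypothetical load resistance (ohms)
CD_FRACTION = 0.15                 # hypothetical Cd = 15% of Cg (text: 10-20%)
K_1DB = math.sqrt(10 ** 0.1 - 1)   # single pole is 1 dB down at w*R*C ~= 0.5088


def f_1db(cg):
    """1-dB cutoff (Hz) of a single-pole load with C_L = C_PAR + Cd."""
    c_load = C_PAR + CD_FRACTION * cg
    return K_1DB / (2 * math.pi * R_L * c_load)


def spread(cg_min, cg_max):
    """Relative drop in cutoff as Cg grows from cg_min to cg_max."""
    return 1 - f_1db(cg_max) / f_1db(cg_min)


print("32 nm :", round(100 * spread(6.6e-15, 33e-15), 1), "%")
print("130 nm:", round(100 * spread(20.5e-15, 102.9e-15), 1), "%")
```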
References
1. S. Gondi, B. Razavi, Equalization and clock and data recovery techniques for 10-Gb/s CMOS
serial-link receivers. IEEE J. Solid-State Circuits 42, 1999–2011 (2007)
2. B.K. Casper, M. Haycock, R. Mooney, An accurate and efficient analysis method for multi-
Gb/s chip-to-chip signaling schemes, in IEEE VLSI, pp. 54–57, Aug. 2002
3. T. Oh, R. Harjani, A 5 Gb/s 2 × 2 MIMO crosstalk cancellation scheme for high-speed I/Os,
in IEEE CICC, Sep. 2010
speed serial link design, in IEEE CICC, pp. 405–408, Sep. 2006
5. J.F. Buckwalter, A. Hajimiri, Cancellation of crosstalk-induced jitter. IEEE J. Solid-State
Circuits 41, 621–631 (2006)
6. H.-K. Jung, S.-M. Lee, J.-Y. Sim, H.-J. Park, A slew-rate controlled transmitter to compensate
for the crosstalk-induced jitter of coupled microstrip lines, in IEEE CICC, Sep. 2010
crosstalk-and-ISI equalizing receiver in 2-drop single-ended SSTL memory channel, in IEEE
CICC, Sep. 2010
8. H.-K. Jung, K. Lee, J.-S. Kim, J,-J. Lee, J.-Y. Sim, H.-J. Park, A 4 Gb/s 3-bit parallel transmitter
with the crosstalk-induced jitter compensation using TX data timing control. IEEE J. Solid-
State Circuits 44, 2891–2900 (2009)

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

2 2 × 6 Gb/s MIMO Crosstalk Cancellation and Signal
3 4 × 12 Gb/s MIMO Crosstalk Cancellation and Signal
3.3 Verifying Crosstalk Cancellation Using Multi-Lane Signals
4 Adaptive XTCR, AGC, and Adaptive DFE Loop
5 Research Summary and Contributions
Appendix A: Noise Analysis
Appendix B: Issues of Applying Consecutive 2 × 2 XTCR
Appendix C: Transmitter-Side Discrete-Time FIR XTC Filter
Appendix D: Line Mismatch Sensitivity
Appendix E: Input Matching for 4 × 4 XTCR Receiver
Appendix F: Bandwidth Improvement by Technology Scaling

T. Oh and R. Harjani, High Performance Multi-Channel High-Speed I/O Circuits, 1

1.1 Low Impedance Microstrip-Line FEXT Model

where τ (= u/d k ) is the forward coupling strength and u is a function of channel

1.2 Predicting Eye-Diagram Properties from the Pulse Response

channel, then $h_{-m} = \cdots = h_{-2} = h_{-1} = 0,\ \forall m > 0$. Further, for simplification
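Section 1.2 predicts eye-diagram properties from the pulse response. The following is a minimal sketch of the common peak-distortion estimate under the causality assumption stated above: there are no pre-cursors, so the main cursor is the first sample. For ±1 NRZ symbols, the worst-case data pattern lines every post-cursor up against the main cursor. This gives a vertical eye opening of 2(h_0 - Σ_{k≥1}|h_k|). The pulse samples and function names are hypothetical, and the book's own derivation may differ in detail.

```python
def worst_case_eye(pulse, cursor=0):
    """Peak-distortion vertical eye opening for +/-1 NRZ symbols.

    pulse[k] is the pulse response sampled once per UI; pulse[cursor] is the
    main cursor h0 and every other sample is treated as ISI.
    """
    h0 = pulse[cursor]
    isi = sum(abs(h) for k, h in enumerate(pulse) if k != cursor)
    return 2.0 * (h0 - isi)


def worst_case_pattern(pulse, cursor=0):
    """Symbols that pull a transmitted +1 furthest down.

    Entry k is the symbol sent k UIs before the cursor symbol, so it is
    weighted by pulse[k] at the sampling instant.
    """
    return [1 if k == cursor else (-1 if h > 0 else 1) for k, h in enumerate(pulse)]


# Hypothetical causal pulse response: h0 first, then post-cursors h1, h2, ...
pulse = [0.60, 0.20, 0.10, 0.05, -0.03]
pattern = worst_case_pattern(pulse)
worst_sample = sum(b * h for b, h in zip(pattern, pulse))

print("worst-case pattern:", pattern)
print(f"worst-case sample for a '1': {worst_sample:.2f}")
print(f"vertical eye opening: {worst_case_eye(pulse):.2f}")
```

The worst-case sample for a transmitted '1' equals half the eye opening, which is why the pulse response alone is enough to bound the eye.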

Fig. 1.2 Skewed FEXT signal in a transmitter per-pin de-skew scheme

1.3 Single-Ended Versus Differential Signaling

Fig. 1.5 Differential I/O


In this chapter, a continuous-time multiple-input and multiple-output crosstalk can-

2.1 MIMO-XTCR Architecture

Our approach utilizes a continuous-time receiver-side crosstalk cancellation scheme


Fig. 2.1 2 × 2 MIMO-XTCR architecture

Fig. 2.2 Bandwidth improve-

A continuous-time high-frequency boosting linear equalizer stage follows the

2.2 2 × 2 MIMO-XTCR Prototype Implementation

2.2.1 2 × 2 MIMO-XTCR in Single-Ended I/Os

Fig. 2.3 2 × 2 MIMO-XTCR circuit for single-ended I/Os

Fig. 2.6 Simulated linear equalizer frequency response

The frequency of the zero, $1/(2\pi R_S C_S)$, is calibrated by changing $C_S$ and $R_S$
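The zero at $1/(2\pi R_S C_S)$ is the usual calibration knob of a source-degenerated continuous-time linear equalizer. The following is a minimal sketch assuming the textbook source-degenerated differential pair: the degeneration resistor $R_S$ and capacitor $C_S$ sit between the two sources, and the load is $R_L \parallel C_L$. In this model $H(s) = (g_m/C_L)(s + \omega_z)/((s + \omega_{p1})(s + \omega_{p2}))$. The circuit in Fig. 2.3 may differ, and all component values and function names are hypothetical.

```python
import math


def zero_frequency(r_s, c_s):
    """Zero of the source-degenerated stage, 1/(2*pi*R_S*C_S), in Hz."""
    return 1.0 / (2.0 * math.pi * r_s * c_s)


def ctle_response(f, gm, r_s, c_s, r_l, c_l):
    """Complex gain of a source-degenerated differential pair (textbook model).

    wz = 1/(R_S C_S), wp1 = (1 + gm R_S/2)/(R_S C_S), wp2 = 1/(R_L C_L).
    """
    s = 2j * math.pi * f
    wz = 1.0 / (r_s * c_s)
    wp1 = (1.0 + gm * r_s / 2.0) / (r_s * c_s)
    wp2 = 1.0 / (r_l * c_l)
    return (gm / c_l) * (s + wz) / ((s + wp1) * (s + wp2))


# Hypothetical component values (not from the text)
GM, R_S, C_S, R_L, C_L = 20e-3, 200.0, 500e-15, 300.0, 50e-15

print(f"zero at {zero_frequency(R_S, C_S) / 1e9:.2f} GHz")
for f in (0.0, 1e9, 2e9, 5e9, 10e9):
    gain_db = 20 * math.log10(abs(ctle_response(f, GM, R_S, C_S, R_L, C_L)))
    print(f"{f / 1e9:5.1f} GHz: {gain_db:6.2f} dB")
```

In this model, increasing $C_S$ or $R_S$ lowers the zero and moves the boost toward lower frequencies. $R_S$ also sets the DC gain $g_m R_L/(1 + g_m R_S/2)$, so the two knobs are not fully independent.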

Fig. 2.7 Die photograph

Fig. 2.8 2 × 2 MIMO crosstalk cancellation scheme for differential I/Os

2.2.2 2 × 2 MIMO-XTCR in Differential I/Os

Figure 2.8 shows a potential extension of the MIMO-XTCR scheme to differential

negative derivative crosstalk pulse response on the adjacent differential channel.

2.3 Measurement Results

2.3.1 2 × 2 MIMO-XTCR Gain Calibration: Single Input Signal

2.3.2 2 × 2 MIMO-XTCR Measurement Results: Two Independent

Table 2.1 Measurement results (before and after MIMO-XTCR)

Table 2.3 Comparison of FEXT cancellation schemes

The proposed continuous-time MIMO FEXT cancellation and signal reutilization

This chapter presents an efficient continuous-time architecture to cancel and reutilize


[Figure: Channels 1 to 5 with forward signal and crosstalk; transmitted NRZ signal h(t); received NRZ signal (channel inter-symbol interference); crosstalk signal (zero at the peak-amplitude timing of the received signal)]
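The figure labels above note that the crosstalk signal is zero at the peak-amplitude timing of the received signal. The following is a minimal sketch of why. It assumes the common far-end crosstalk approximation, in which the coupled noise is proportional to the negative time derivative of the aggressor's received waveform, scaled by a coupling strength τ. The Gaussian pulse shape, τ = 5 ps, and the function names are hypothetical choices for illustration.

```python
import math


def received_pulse(t, t_peak=100e-12, width=40e-12):
    """Hypothetical smooth received pulse (Gaussian) with unit peak amplitude."""
    return math.exp(-((t - t_peak) / width) ** 2)


def fext_from_aggressor(v, dt, tau):
    """Derivative-type FEXT x(t) = -tau * dv/dt.

    Uses central differences inside the record and one-sided differences at the ends.
    """
    n = len(v)
    x = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        x.append(-tau * (v[hi] - v[lo]) / ((hi - lo) * dt))
    return x


DT = 1e-12                               # 1 ps time step
TAU = 5e-12                              # hypothetical coupling strength (s)
t = [i * DT for i in range(201)]         # 0 to 200 ps
v = [received_pulse(ti) for ti in t]     # aggressor's received pulse
x = fext_from_aggressor(v, DT, TAU)      # crosstalk induced on the victim

i_peak = max(range(len(v)), key=v.__getitem__)
print(f"pulse peak at {t[i_peak] * 1e12:.0f} ps")
print(f"crosstalk at pulse peak: {x[i_peak]:.2e} V")
print(f"largest crosstalk magnitude: {max(abs(c) for c in x):.3f} V")
```

In this idealized model, a victim receiver that samples at the aggressor's pulse peak sees almost no far-end crosstalk. Any sampling offset from that instant picks up the steep derivative-shaped crosstalk on either side.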

3.1 Characteristic of Far-End Crosstalk and Proposed Channel

3.1.1 Factors for Crosstalk Strength

that crosstalk amplitude in single-ended I/Os is inversely proportional to the channel

3.1.2 Proposed Channel Architecture for Multi-Lane

• XTCR high-frequency boosting