
HIGH SPEED INTEGRATED
CIRCUIT TECHNOLOGY,
TOWARDS 100 GHz LOGIC
SELECTED TOPICS IN ELECTRONICS AND SYSTEMS

Editor-in-Chief: M. S. Shur

Published
Vol. 4: Compound Semiconductor Electronics: The Age of Maturity
ed. M. Shur
Vol. 5: High Performance Design Automation for Multichip Modules and Packages
ed. J. Cho and co-ed. P. D. Franzon
Vol. 6: Low Power VLSI Design and Technology
eds. G. Yeap and F. Najm
Vol. 7: Current Trends in Optical Amplifiers and Their Applications
ed. T. P. Lee
Vol. 8: Current Research and Developments in Optical Fiber Communications
in China
eds. Q.-M. Wang and T. P. Lee
Vol. 9: Signal Compression: Coding of Speech, Audio, Text, Image and Video
ed. N. Jayant
Vol. 10: Emerging Optoelectronic Technologies and Applications
ed. Y.-H. Lo
Vol. 11: High Speed Semiconductor Lasers
ed. S. A. Gurevich
Vol. 12: Current Research on Optical Materials, Devices and Systems in Taiwan
eds. S. Chi and T. P. Lee
Vol. 13: High Speed Circuits for Lightwave Communications
ed. K.-C. Wang
Vol. 14: Quantum-Based Electronics and Devices
eds. M. Dutta and M. A. Stroscio
Vol. 15: Silicon and Beyond
eds. M. S. Shur and T. A. Fjeldly
Vol. 16: Advances in Semiconductor Lasers and Applications to Optoelectronics
eds. M. Dutta and M. A. Stroscio
Vol. 17: Frontiers in Electronics: From Materials to Systems
eds. Y. S. Park, S. Luryi, M. S. Shur, J. M. Xu and A. Zaslavsky
Vol. 18: Sensitive Skin
eds. V. Lumelsky, Michael S. Shur and S. Wagner
Vol. 19: Advances in Surface Acoustic Wave Technology, Systems and
Applications (Two volumes), volume 1
eds. C. C. W. Ruppel and T. A. Fjeldly
Vol. 20: Advances in Surface Acoustic Wave Technology, Systems and
Applications (Two volumes), volume 2
eds. C. C. W. Ruppel and T. A. Fjeldly
Selected Topics in Electronics and Systems - Vol. 21

HIGH SPEED INTEGRATED
CIRCUIT TECHNOLOGY,
TOWARDS 100 GHz LOGIC

Editor

Mark Rodwell
University of California, Santa Barbara, USA

World Scientific
Singapore • New Jersey • London • Hong Kong
Published by
World Scientific Publishing Co. Pte. Ltd.
P O Box 128, Farrer Road, Singapore 912805
USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

British Library Cataloguing-in-Publication Data


A catalogue record for this book is available from the British Library.

HIGH SPEED INTEGRATED CIRCUIT TECHNOLOGY, TOWARDS 100 GHz LOGIC


Copyright © 2001 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or
mechanical, including photocopying, recording or any information storage and retrieval system now known or to
be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center,
Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from
the publisher.

ISBN 981-02-4638-2

Printed in Singapore.
Preface:
High Speed Integrated Circuit Technology,
Towards 100 GHz Logic

M.J.W. Rodwell
Department of Electrical and Computer Engineering, University of California
Santa Barbara, CA, 93106, U.S.A.

This issue of the Journal addresses recent work in very high speed digital elec-
tronics. After a period of slow progress in the late 1980's and early 1990's, clock
rates for small-scale semiconductor integrated circuits have increased quickly in the
past five years, and demonstration of small-scale ICs operating at a 100 GHz clock
appears to be imminent.
There are important applications, both commercial and military. The recent,
explosive, growth of voice and data communications promises the largest market.
10 Gb/s time-division-multiplexed optical fiber data transmission systems are now
available, and ICs are now being quickly developed for 40 Gb/s. Optical fibers
can certainly support yet larger bandwidth. As the following papers illustrate, IC
operation above 40 GHz poses no particular fundamental difficulty, and chip sets
for 100 and perhaps 160 Gb/s rates will in time become available.
There are also military applications for GHz mixed-signal ICs. Military radar
and communications systems use direct digital frequency synthesis and digital-
analog converters in transmitters, and analog-digital converters in receivers. The
application stipulates very high dynamic range with the highest obtainable band-
width. Oversampling is used extensively to increase dynamic range, and the re-
quired clock rates can quickly approach 100 GHz.
Logic speed is a function of device carrier transport physics, device scaling,
and intelligent design of circuit and system architecture. The papers, all invited,
describe electron device, circuit, and system design, in a variety of semiconductor
-and superconductor- technologies. The papers are organized in an order in rough
correspondence to the present volume of manufacturing.
Fukaishi et al report circuit and system design for a 5 Gb/s serial link fabricated
in 0.25 µm CMOS. Architecture is key to performance; fast tree demultiplexers are
used, with subsequent frequency conversion at lower speeds providing the correct
demultiplexed word sizes. Migration of CMOS serial links to 0.13 µm poses a serious
competitive threat to today's 10 Gb/s SiGe chip sets.
Wurzer et al and Washio et al report high speed Si/SiGe bipolar transistors and
digital integrated circuits. The transistors are aggressively scaled, with ~0.1-0.2
µm critical features, and parasitics are greatly reduced through polysilicon extrinsic
contacts. Circuits use double-buffered ECL operating at very high current density.
Both papers report very fast digital circuits, and ICs for 40 Gb/s transmission.
GaAs-based HBTs offer significant advantages over Silicon in terms of basic
transport physics, but have generally lagged behind Si/SiGe technology in
submicron scaling. Oka et al describe highly-scaled InGaP/GaAs HBTs with emitter
dimensions as small as 0.25 µm and fmax as high as 255 GHz, and digital ICs
operating to 39.5 GHz.
Using InP HEMTs, the NTT group has demonstrated numerous ICs operating
at 40-80 Gb/s rates. Representing this work, Enoki et al address feasibility of
100 Gb/s-class fiber transmission ICs. Considering key circuit blocks within the
transceiver, relationships are developed for circuit bandwidth as a function of
dominant transistor parasitics, for both HEMT and HBT circuits. Very wideband and
high-saturation-power unipolar photodiodes are also demonstrated, together with
their integration with 40 Gb/s decision circuits.
HBTs fabricated on InP substrates can exploit carrier velocities and mobilities
several times those available in Si/SiGe. Yet, InP-based HBTs have generally been
much less aggressively scaled than their Silicon counterparts. Rodwell et al report
an analysis of the scaling strategy required to obtain both wideband digital and mm-
wave ICs. A substrate transfer process is also reported, in which transistors and
moderate-scale ICs have been fabricated. The process allows aggressive scaling of
the collector-base junction parasitics. Fields et al report several scaling generations
of InAlAs/InGaAs HBTs fabricated in a highly manufacturable mesa process. A
static frequency divider is reported operating at a record 72.8 GHz clock frequency.
Gutierrez-Aitken et al report an InP HBT process with a cantilever-base process step
for aggressive reduction in the collector capacitance. Impressive scales of integration
are demonstrated in this high speed process with demonstration of a functional
3000-HBT direct digital frequency synthesis IC.
Superconductors still retain their position as the fastest digital technology. Bunyk
et al review the physics of Josephson junctions and the principles of operation and
design of rapid single flux quantum (RSFQ) logic. Prospects for immediate and
longer-term applications are examined. Brock reviews RSFQ superconducting logic
from the perspective of circuits and systems, including historical development. The
paper provides an extensive summary and description of key superconducting IC
results, including high-resolution ADCs and DACs, PLLs, memory, packaging, and
cryogenics.
CONTENTS

Preface
M. J. W. Rodwell

High-Speed and High-Data-Bandwidth Transmitter and Receiver for
Multi-Channel Serial Data Communication with CMOS Technology
M. Fukaishi, K. Nakamura, and M. Yotsuyanagi

High-Performance Si and SiGe Bipolar Technologies and Circuits
M. Wurzer, T. F. Meister, J. Bock, H. Schafer, K. Aufinger, S. Boguth,
H. Knapp, M. Rest, R. Schreiter, and L. Treitinger

Self-Aligned Si BJT/SiGe HBT Technology and Its Application to
High-Speed Circuits
K. Washio

Small-Scale InGaP/GaAs Heterojunction Bipolar Transistors for
High-Speed and Low-Power Integrated-Circuit Applications
T. Oka, K. Hirata, H. Suzuki, K. Ouchi, H. Uchiyama, T. Taniguchi,
K. Mochizuki, and T. Nakamura

Prospects of InP-Based IC Technologies for 100-Gbit/s-Class Lightwave
Communications Systems
T. Enoki, E. Sano, and T. Ishibashi

Scaling of InGaAs/InAlAs HBTs for High Speed Mixed-Signal and
mm-Wave ICs
M. J. W. Rodwell, M. Urteaga, Y. Betser, T. Mathew, P. Krishnan,
D. Scott, S. Jaganathan, D. Mensa, J. Guthrie, R. Pullela, Q. Lee,
B. Agarwal, U. Bhattacharya, S. Long, S. C. Martin, and R. P. Smith

Progress Toward 100 GHz Logic in InP HBT IC Technology
C. H. Fields, M. Sokolich, S. Thomas, K. Elliot, and J. Jensen

Cantilevered Base InP DHBT for High Speed Digital Applications
A. L. Gutierrez-Aitken, E. N. Kaneshiro, J. H. Matsui, D. J. Sawdai,
J. K. Notthoff, P. T. Chin, and A. K. Oki

RSFQ Technology: Physics and Devices
P. Bunyk, K. Likharev, and D. Zinoviev

RSFQ Technology: Circuits and Systems
D. K. Brock
International Journal of High Speed Electronics and Systems, Vol. 11, No. 1 (2001) 1-33
© World Scientific Publishing Company

HIGH-SPEED AND HIGH-DATA-BANDWIDTH TRANSMITTER
AND RECEIVER FOR MULTI-CHANNEL SERIAL DATA
COMMUNICATION WITH CMOS TECHNOLOGY

MUNEO FUKAISHI, KAZUYUKI NAKAMURA,
and MICHIO YOTSUYANAGI

System Devices and Fundamental Research
Silicon Systems Research Laboratories, NEC Corporation,
1120, Shimokuzawa, Sagamihara, Kanagawa 229-1198, JAPAN

This paper briefly reviews recent research on CMOS gigahertz-rate communication
circuits and design innovations for overcoming device performance limitations. A
multi-channel transmitter and receiver chip set operating at 5 Gb/s has been
developed using 0.25-µm CMOS technology. To achieve high-speed operation, the
chip set features: (1) a tree-type demultiplexer and frequency conversion architecture,
(2) a self-aligning phase detector for the clock and data recovery circuit, and (3) a fully
pipelined 8-bit to 10-bit encoder. The features contributing to the achievement of
high-data bandwidth for multi-channel transmission include circuits for
compensating for the phase difference between multiple receiver chips and for the
frequency difference between the system clocks of the transmitter and receiver chips.
These techniques for high-speed operation and multi-channel transmission are
supported by the high level of integration possible with CMOS technology compared
with non-CMOS technology.

1. Introduction

Demand has been increasing for high-speed serial data link systems using
high-speed large-scale integrated circuits (LSIs) with speeds in the gigahertz
range. Such high-speed LSIs have conventionally been achieved by using Si
bipolar transistors or compound semiconductor transistors, such as GaAs or
SiGe. So far, they have only been applied to telecommunication systems.
However, the demand for gigahertz LSIs is increasing rapidly in consumer
applications (data communication systems), for example gigabit ethernet,
links between processor boards and/or computers, and communication
between computers and peripheral devices, because of the great activity in the
multimedia market. While high-speed data transmission rates are important in
these applications, it is also important to keep the circuit area small, power
consumption low, and fabrication cost low for these consumer applications.
CMOS technology has recently been used to develop such high-speed LSIs
because the performance of CMOS devices increases as miniaturization of the
CMOS gate length progresses [1-18]. This miniaturization also leads to a lower
supply voltage. In the deep-sub-micron region, the cutoff frequency is as high
as several tens of gigahertz, while the switching power consumed is very low,
on the order of 0.1 mW/GHz/gate. These basic characteristics are
motivating a redesign of various communication building blocks previously
implemented with non-CMOS technologies. The advantage of CMOS
technology over non-CMOS technology is that its high level of integration
allows low-speed logic functions, such as encoder circuits or protocol
controllers, to be integrated on the same chip as the high-speed circuits.
Eliminating off-chip interconnections between them will drastically reduce
power consumption at the interface. CMOS circuits are therefore useful for
keeping power and cost requirements low compared with using bipolar
transistors or compound semiconductor devices. Although CMOS device
performance increases with miniaturization, design innovations that get the
most out of the devices are still needed in order to overcome the device
performance limitations.
This paper describes approaches to overcome the conventional CMOS
limitations. Section 2 describes the trends of CMOS high-speed
communication LSIs. Section 3 describes ordinary device performance
limitations and how to obtain high-speed operation beyond these limitations
by using CMOS circuits. Section 4 describes circuit designs for the clock and
data recovery circuit and demultiplexer for high-speed circuit technology, 8-
bit to 10-bit (8B10B) encoder and word alignment logic for high-speed logic
circuits, and multi-channel transmission techniques for high-data-bandwidth
transmission. Experimental results for 5-Gb/s multi-channel transmitter and
receiver chip sets with 0.25-µm CMOS are presented in section 5. Finally,
high-speed LSI design issues and future prospects are discussed in section 6.

2. Trends of CMOS High-Speed Communication LSIs

Figure 1 shows the trends of high-speed serial data communication LSIs with
CMOS technology, and Table 1 lists the recent work on gigahertz-rate
transceivers with CMOS circuits. The transmission data rate increases in
proportion to the gate length reduction: operating speeds of 1 and 3 Gb/s have
been achieved with 0.8-0.5-µm and 0.15-µm CMOS by using the conventional
CMOS architecture, or single architecture, which uses clock signals having
the same frequency as the data rate; i.e. a 3-GHz clock for 3-Gb/s data. The
maximum transmitted-data rate is determined by the device performance, and
will be only about 4 or 5 Gb/s for 0.1-µm CMOS. In contrast, the parallel
architecture overcomes device performance limitations through the use of
multiple, different-phase clocks. As the number of multi-phase clocks
increases, the clock frequency can be decreased. For example, a 4-Gb/s data
rate is achieved with 0.25-µm CMOS, and 6-Gb/s with 0.18-µm CMOS. The
maximum transmitted-data rate will reach 10 Gb/s with 0.1-µm CMOS. The
figure and table indicate that the data rate for the maximum operating
frequency depends on the circuit design rather than on the design rule or the
gate length. The key circuit techniques featured in these studies can be
summarized as follows: current-mode operation for the high-speed blocks,
double rail flip-flop [2], utilization of multiple-phase clocks,
utilization of both rising and falling clock edges (two-phase clocks),
oversampling techniques [4-5, 10], pre-emphasis for driving long cables [12, 15],
and asynchronous operation [13, 16]. These techniques are supported by the
advantage of CMOS technology over non-CMOS technology in its high level
of integration, which allows a wide variety of circuit designs. Innovative
circuits can therefore overcome the device performance limitations, and the
data rate for the maximum operating frequency depends on the circuit design
rather than on the device performance.

Fig. 1. Trends of high-speed serial data communication LSIs (horizontal axis: design rule in µm, from 0.8 to 0.1).

Table 1: CMOS serial data transceiver studies.

Design Rule     Function                          Data Rate      VCO Frequency             Ref.
(Effective)
0.8 µm          8:1 MUX + PLL                     2.5 Gb/s       312.5 MHz                 [4]
                1:8 DEMUX + CDR*                                 × 8 Phases
0.6 µm          8:1 MUX + PLL                     4 Gb/s         500 MHz                   [8]
                1:8 DEMUX + CDR*                                 × 8 Phases
0.5 µm          20:1 MUX + PLL                    1.0625 Gb/s    106.25 MHz                [5]
                1:20 DEMUX + CDR*                                × 10 Phases
0.5 µm          10:1 MUX + PLL                    1.25 Gb/s      250 MHz                   [6]
                1:10 DEMUX + CDR*                                × 5 Phases
0.8 (0.45) µm   20:1 Fiber Channel                1.0625 Gb/s    265.5 MHz                 [1]
                Transceiver**                                    × 4 Phases (TX)
                                                                 531 MHz
                                                                 Both Edges (RX)
0.4 µm          10:1 MUX + PLL                    10 Gb/s        500 MHz                   [12]
                (4 Levels)                                       × 10 Phases
0.35 µm         8:1 Fiber Channel                 1.5 Gb/s       375 MHz                   [10]
                Transceiver**                     × 3            × 4 Phases
0.28 µm         10:1 MUX + PLL                    3.5 Gb/s       350 MHz                   [14]
                1:10 DEMUX + CDR*                                × 10 Phases
0.25 µm         4:1 MUX                           3 Gb/s         -                         [2]
                1:4 DEMUX
0.25 µm         32:1 Fiber Channel                4.25 Gb/s      2.125 GHz                 [13]
                Transceiver**                                    Both Edges
0.25 µm         32:1 Fiber Channel                5 Gb/s         2.5 GHz                   This Work
                Transceiver**                                    Both Edges                [16]
0.25 µm         10:1 MUX + DLL                    4 Gb/s         -                         [15]
                1:10 DEMUX + CDR*
0.18 µm         1:8 DEMUX + CDR*                  6 Gb/s         3 GHz                     [11]
                                                                 Both Edges
0.18 µm         CDR*                              10 Gb/s        5 GHz                     [18]
                                                                 Both Edges
0.18 µm         1:8 DEMUX                         10 Gb/s        -                         [17]
0.15 µm         8:1 MUX                           3 Gb/s         -                         [3]
0.15 µm         1:8 DEMUX + CDR*                  2.4 Gb/s       1.2 GHz                   [7]
                                                                 Both Edges
0.15 µm         Pre AMP + AGC + 1:8               2.4 Gb/s       1.2 GHz                   [9]
                DEMUX + CDR*                                     Both Edges

*: CDR: Clock and Data Recovery PLL
**: Fiber Channel Transceiver includes 8B10B encoder and 10B8B decoder.
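
The relation between the VCO frequency column and the data rate column of Table 1 can be checked with a short calculation. The sketch below is only illustrative: the function name and its parameters are hypothetical, and it assumes that each clock phase (or each clock edge) samples one symbol and that the "4 Levels" entry carries two bits per symbol.

```python
def serial_data_rate_gbps(vco_mhz, phases=1, both_edges=False, bits_per_symbol=1):
    """Estimate the serial data rate in Gb/s from the clocking scheme.

    Assumption: one symbol is sampled per clock phase, and twice as many
    when both the rising and falling edges are used.
    """
    samples_per_cycle = phases * (2 if both_edges else 1)
    return vco_mhz * samples_per_cycle * bits_per_symbol / 1000.0


# Rows taken from Table 1.
print(serial_data_rate_gbps(312.5, phases=8))                      # 0.8 um, ref. [4]
print(serial_data_rate_gbps(2500.0, both_edges=True))              # 0.25 um, this work
print(serial_data_rate_gbps(500.0, phases=10, bits_per_symbol=2))  # 0.4 um, 4 levels
```

Under these assumptions the three calls reproduce the 2.5-Gb/s, 5-Gb/s, and 10-Gb/s entries of the table.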

Fig. 2. 0.25-µm CMOS inverter maximum clock frequency versus fan-out.

3. High-Speed and Multi-Channel Serial Data Transceiver

3.1. Architecture for high-speed LSIs

As the data rate increases, circuit operating speed comes to be limited by
device performance. The dependence of the maximum clock frequency of a
0.25-µm CMOS inverter on fan-out is shown in Fig. 2. The operating speed of
the CMOS inverter, indicated by the maximum clock frequency, decreases as
fan-out increases. With 0.25-µm CMOS technology, the operating speed for a
fan-out of 2 is about 3.5 GHz, excluding the influence of interconnect wire
capacitance. When the wire capacitance is considered, the operating speed
degrades further. This device performance limitation makes it impossible either to
generate a 5-GHz clock or to distribute a 5-GHz clock signal with
conventional 0.25-µm CMOS circuits.
Other key circuits for the achievement of high-speed transceiver LSIs
include the serializer in the last stage of the transmitter and the deserializer in
the first stage of the receiver because these circuits operate at the highest
speed. Figure 3 contrasts a conventional deserializer design, a shift-register
type demultiplexer (DEMUX) (Fig. 3(a)), with two designs intended to
overcome conventional device limitations: a multi-phase type DEMUX (Fig.
3(b)) [4, 8] and a tree-type DEMUX (Fig. 3(c)) [7].
Although the shift register type architecture in (a) can handle arbitrary
parallel data, the maximum operating speed of this circuit is limited by device
performance: only 3.0-Gb/s operation will be achievable even with 0.15-µm
CMOS [3]. This is because this architecture needs a clock signal of the same
frequency as the transmitted data rate (e.g., a 5-GHz clock signal is necessary
for a 5-Gb/s DEMUX). The parallel architecture shown in Fig. 3(b) overcomes
device limitations through the use of multiple, different-phase clocks [4, 8]. In
this parallel architecture, which can also handle arbitrary parallel data, the
clock frequency decreases with increasing number of different-phase clocks,
but this approach requires highly precise clock phase control, which is
difficult to achieve. A phase difference of only 200 ps, for example, must be
maintained for 5-Gb/s operation. The use of parallel circuitry also increases
input capacitance in the receiver and, as a result of increased circuit volume,
also increases the power consumption of the transceiver as a whole.
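
The 200-ps figure follows from the bit period at 5 Gb/s. As a rough sketch (the helper name is hypothetical, and it assumes adjacent clock phases are spaced by exactly one bit period), the clock frequency and phase spacing of a multi-phase design can be estimated as follows:

```python
def multiphase_clock(data_rate_gbps, n_phases):
    """Return (clock frequency in MHz, phase spacing in ps).

    Assumption: each of the n_phases clocks samples every n-th bit, so the
    clock runs at data_rate / n_phases and adjacent phases are one bit
    period apart.
    """
    clock_mhz = data_rate_gbps * 1000.0 / n_phases
    phase_spacing_ps = 1000.0 / data_rate_gbps  # one bit period
    return clock_mhz, phase_spacing_ps


clock_mhz, spacing_ps = multiphase_clock(5.0, n_phases=10)
print(clock_mhz, spacing_ps)  # a 1:10 multi-phase DEMUX at 5 Gb/s
```

For a 1:10 multi-phase DEMUX at 5 Gb/s this gives a 500-MHz clock with ten phases spaced 200 ps apart, which illustrates why the phase control has to be so precise.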
The tree-type DEMUX architecture (c) overcomes device limitations by
using both rising and falling clock edges. As a result, a tree-type DEMUX is
able to operate at half the speed of the data rate. Although it is necessary to
drive a 2.5-GHz clock, which is nearly the maximum driving speed for
0.25-µm CMOS, the small number of devices operating at high speed in the
tree-type architecture makes it more suitable than the other two architectures
for high-speed operation with low power consumption. This conventional tree-
type architecture, however, has two serious drawbacks. One is that it requires
the distribution of precisely controlled different-frequency clock signals to its
respective 1:2 DEMUXs from a clock generator block. The other drawback is
that the tree-type architecture converts serial data only into 2^N-bit parallel data,
which makes it unsuitable for application to the 10-channel
multiplexer/demultiplexer (MUX/DEMUX) necessary in ANSI Fiber Channel
designs. The Fiber Channel standard is widely used for the physical layer on
which the high-speed serial communication LSIs are based.
In response, first of all, to the clock-control drawback, we have developed
a 1:2 DEMUX module that does not require precisely controlled clock
distribution. It generates not only the output data but also an optimized clock
for the next stage. An asynchronous tree-type 1:8 DEMUX is obtained simply
by connecting such 1:2 DEMUX modules.
In response to the 2^N-bit conversion drawback, we have developed an 8-bit to
10-bit parallel-to-parallel frequency conversion circuit, which makes it
possible to enjoy the benefits of a tree-type architecture while still meeting
Fiber Channel standards. We have also developed comma detection and word
alignment logic because the tree-type DEMUX and our frequency conversion
circuit are unable by themselves to identify word boundaries correctly. This
tree-type DEMUX and frequency conversion architecture is supported by the
high integration level of CMOS technology because this architecture needs
many more transistors than the conventional architecture.
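
The data path of a tree-type DEMUX can be illustrated with a purely behavioral model. This is a sketch under stated assumptions, not the circuit described here: the function names are hypothetical, each 1:2 stage is modeled as splitting its input into the bits sampled on rising and on falling clock edges, and the asynchronous clock hand-off between stages is not modeled.

```python
def demux_1to2(bits):
    """Model one 1:2 DEMUX: rising-edge samples and falling-edge samples."""
    return bits[0::2], bits[1::2]


def tree_demux(bits, stages=3):
    """Cascade 1:2 DEMUX stages; three stages give a 1:8 DEMUX."""
    streams = [list(bits)]
    for _ in range(stages):
        next_streams = []
        for stream in streams:
            rising, falling = demux_1to2(stream)
            next_streams += [rising, falling]
        streams = next_streams
    return streams


def bit_reverse(index, width):
    """Reverse the binary digits of index (the tree emits bit-reversed order)."""
    return int(format(index, f"0{width}b")[::-1], 2)


def parallel_words(bits, stages=3):
    """Reorder the tree outputs so that each word lists bits in arrival order."""
    streams = tree_demux(bits, stages)
    ordered = [streams[bit_reverse(k, stages)] for k in range(2 ** stages)]
    return list(zip(*ordered))


print(parallel_words(list(range(16))))
```

In this model each stage halves the bit rate of its outputs, so three stages turn one 5-Gb/s stream into eight 625-Mb/s streams. The model also shows why only power-of-two word widths come out of the tree, which is the drawback that the frequency-conversion circuit addresses.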

Fig. 3. Block diagram of (a) shift register type 1:10 DEMUX, (b) multi-phase type 1:10
DEMUX, and (c) tree-type 1:8 DEMUX.

Other key circuits for achieving high-speed transceiver LSIs are the clock
and data recovery (CDR) circuit and logic circuits such as the 8B10B encoder.
The CDR circuit is used in the front stage of the receiver, so it operates at the
highest speed in the receiver. The operating speed of the 8B10B encoder reaches
500 MHz in the 5-Gb/s transceiver. We therefore developed a binary self-aligning
phase detector for a high-speed CDR circuit and a 500-MHz fully pipelined
8B10B encoder by using 0.25-µm CMOS technology.

3.2. Architecture for multi-channel transmission

Data transmission speeds in data communication systems have recently been
increasing rapidly. In particular, the communication speed between
a personal computer and a flat panel display has increased as display sizes
have increased because the data bandwidth required is proportional to the total
number of pixels in the panel [10]. The digital display interface for next-
generation ultra-high-resolution flat panel displays (3200x2400 pixels) needs
to have a bandwidth of 16 Gb/s. A conventional interface connects the graphic
controller to the panel controller directly through numerous parallel cables,
and the effects of electromagnetic interference (EMI) increase drastically as
the transmitted data bandwidth increases. An effective way to reduce the
severity of these EMI effects is to make the number of transmission cables as
small as possible; however, this can only be done if the output signals of the
graphic controller are serialized. Low power consumption and low cost are
also essential for consumer applications, and CMOS circuits are particularly
useful for keeping power and cost requirements low compared with bipolar
transistors or compound semiconductor devices. Although CMOS technology
has recently been used in high-speed LSIs [1-18], the CMOS circuits presently
available cannot operate at the 16-GHz frequency required for next-generation
panel interfaces. The maximum operating speed of CMOS circuits is, for
example, 5 Gb/s using 0.25-µm CMOS [13]. Therefore, multi-channel
transmission, which can achieve an aggregate bandwidth of 16 Gb/s, is
necessary in order to obtain the required bandwidth. Although multi-channel
transmission increases not only interface bandwidth but also the total power
and number of transistors or chip area, CMOS circuits can suppress both total
power and area to about 1/10 of those of bipolar circuits [20]. Multi-channel
transmission using CMOS circuits is therefore an effective approach for
increasing aggregate bandwidth while keeping power and cost requirements
low. Although the aggregate bandwidth can be increased by using multi-
channel transmission, there are system problems that must be solved when
developing multi-channel TX and RX LSIs. These problems are caused by
poor synchronization, and have two origins. One is the phase difference
between multiple RX chips due to the data skew caused by transmission cables
of various lengths. The other is the frequency difference between the TX and
RX system clocks (the PC and the peripheral devices each have their own
clock source). However, because CMOS circuits can integrate a large number
of transistors, many function blocks can be implemented in a single chip. We
can therefore integrate the solutions for high-speed and multi-channel
transmission, such as an encoder for increasing the serial-data transmission
accuracy and an elastic buffer for compensating for phase and frequency
differences between multiple receiver chips or between the system clocks of the
transmitter and receiver chips.
We developed techniques for compensating for the phase and frequency
differences, in order to obtain a multi-channel transmitter (TX) and receiver
(RX) chip set implemented in 0.25-µm CMOS technology.

3.3. System, transmitter, and receiver architecture

3.3.1. System Architecture

The panel interface system consists of a graphic controller, transmitters
(TXs), receivers (RXs), and a panel controller as shown in Fig. 4. The graphic
controller output is 128-bit-wide data at 125 Mb/s. The bandwidth of the data
is 16 Gb/s, and it actually reaches 20 Gb/s because an 8B10B encoder is used
to increase serial-data transmission accuracy. The 20-Gb/s bandwidth required
for the interface is obtained by multi-channel transmission using four 5-Gb/s
LSIs. The reason for using 5-Gb/s LSIs is that the operating
speed of CMOS circuits is at most 5 Gb/s with 0.25-µm CMOS [13]. One TX
LSI chip changes 32-bit-wide parallel data into 5-Gb/s serial data. Four serial
data streams made from the 128-bit parallel data are synchronized with the
transmitter system clock (CLKTX). The data transmitted through the four
coaxial cables must be deserialized by the RX chips before reaching the panel
controller because the panel controller has to receive 128-bit synchronous
data. The RX LSI changes each serial data stream back into 32-bit parallel
data. The 128-bit-wide output data from the four RX chips is synchronized
with the receiver system clock (CLKRX).
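
The bandwidth figures in this subsection can be reproduced with a few lines of arithmetic. The sketch below uses a hypothetical helper name; the only assumption beyond the text is that the 8B10B code adds exactly 10/8 overhead and that the load is split evenly over the channels.

```python
def panel_link_budget(width_bits=128, word_rate_mbps=125, channels=4):
    """Return (payload Gb/s, encoded line Gb/s, encoded Gb/s per channel)."""
    payload_gbps = width_bits * word_rate_mbps / 1000.0
    line_gbps = payload_gbps * 10 / 8  # 8B10B sends 10 bits per 8 data bits
    per_channel_gbps = line_gbps / channels
    return payload_gbps, line_gbps, per_channel_gbps


print(panel_link_budget())
```

With the default values the helper returns 16 Gb/s of payload, 20 Gb/s after encoding, and 5 Gb/s per channel, matching the four 5-Gb/s LSIs described above.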

Fig. 4. Block diagram of panel interface system.


3.3.2. Transmitter architecture

Figure 5 shows a block diagram of the transmitter, which has a 32-bit-wide
125-Mb/s CMOS-level interface. A 32:8 pre-multiplexer (Pre-MUX) converts
32-bit parallel input data into 8-bit-wide 500-Mb/s data. The converted data is
then encoded to 10-bit-wide 500-Mb/s data by an 8-bit-to-10-bit (8B10B)
encoder, which guarantees DC-balancing and a maximum run-length of five
for Fiber Channels [21]. The 8B10B encoder can therefore increase both the
accuracy of serial transmission and the operating margin of the receiver. The
encoded 10-bit 500-Mb/s data is then serialized into 5-Gb/s serial data by the
10:1 MUX. This MUX consists of a 10-bit-to-8-bit parallel-to-parallel
frequency-conversion circuit and a tree-type 8:1 MUX [13]. The tree-type MUX
is much more suitable for high-speed operations than a MUX with the
conventional shift-register-type architecture. This is because the flip-flops
used in the shift-register-type MUX must be operated using the target-
frequency clock signal (e.g., a 5-GHz clock signal is necessary for a 5-Gb/s
MUX). The maximum operating speed of this MUX is limited by device
performance (e.g., circuits with 0.25-µm CMOS can operate up to 2.5 GHz).
Moreover, the shift-register-type MUX needs many more circuits, such as
flip-flops, operating at high speed, so high-speed clock distribution over a
large area is needed; however, such a distribution is difficult to achieve and
limits the operating speed of the MUX. In contrast, the tree-type
architecture overcomes device limitations by using both the rising and falling
clock edges, and it also overcomes the difficulty with clock distribution
because of the small number of high-speed operating devices: for example,
only one 2:1 MUX module, placed in the last stage of the 8:1 MUX, operates at
the fastest speed. The frequency-conversion circuit makes it possible to take
advantage of the tree-type architecture, which overcomes device limitations,
while still meeting the Fiber Channel Standard that specifies a 10-bit
serializer. The serialized data is output by a differential data driver circuit
based on an nMOS open-drain buffer composed of a differential inverter
circuit using current mode logic (CML). A frequency-synthesis PLL generates
2.5-GHz differential clock signals from a 125-MHz reference signal (CLKTX).
The serial data is processed by using the rising edges of the 2.5-GHz
differential clock signals. This is because the delay time of the flip-flop
differs slightly for operations using the rising and falling edges. Although this
difference can be negligible in low-speed operation, it degrades the eye
diagram of serial data in high-speed LSIs. This difference creates jitter in the
serial data output.
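
The 10-bit-to-8-bit frequency conversion can be pictured as a regrouping of bits: four 10-bit words at 500 Mb/s carry the same 40 bits as five 8-bit words at 625 Mb/s. The following behavioral sketch uses a hypothetical `repack` function and assumes that bits keep their transmission order across word boundaries; it says nothing about how the actual circuit is built.

```python
def repack(words, out_width):
    """Regroup a sequence of bit tuples into words of out_width bits.

    Assumption: bit order is preserved, so only the word boundaries move.
    """
    bits = [bit for word in words for bit in word]
    if len(bits) % out_width:
        raise ValueError("total bit count is not a multiple of out_width")
    return [tuple(bits[i:i + out_width]) for i in range(0, len(bits), out_width)]


# Four hypothetical 10-bit code words (values chosen only for illustration).
words10 = [tuple((w >> k) & 1 for k in range(10)) for w in (0x155, 0x2AA, 0x0F0, 0x30F)]
words8 = repack(words10, 8)

print(len(words10), "x 10 bits ->", len(words8), "x 8 bits")
assert 10 * 500 == 8 * 625  # word width x word rate (Mb/s) is conserved
```

Applying `repack(words8, 10)` restores the original 10-bit words, which corresponds to the 8-bit-to-10-bit conversion on the receiver side.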

3.3.3. Receiver architecture

The direction of data flow in the receiver is opposite to that in the transmitter
(Fig. 6). Differential 5-Gb/s serial input data is received by a
differential input buffer (data receiver) and converted into a single
CMOS-level signal. Like the data driver in the transmitter, the data receiver
is based on a CML circuit. The high-speed serial data must be deserialized into
10-bit data so that it can be decoded by the 10B8B decoder for Fiber Channel.
As in the transmitter, to obtain high-speed operation, the 1:10
demultiplexer (DEMUX) consists of a clock and data recovery core (CDR
core), a 2:8 asynchronous tree-type DEMUX, and an 8-bit-to-10-bit
frequency-conversion circuit. 13 The tree-type DEMUX and frequency-conversion
architecture provides high-speed operation, as in the TX chip.
The CDR core converts the serial data into 2-bit 2.5-Gb/s parallel data, and
the asynchronous tree-type DEMUX deserializes the 2-bit parallel data into
8-bit 625-Mb/s parallel data. The deserialized 8-bit data is then converted into
10-bit 500-Mb/s data by the frequency-conversion circuit. This data is next
processed by the comma-detection and word-alignment logic, because neither
the asynchronous DEMUX nor the frequency-conversion circuit can align
word boundaries. The 10-bit word-aligned data is then decoded by
the 10B8B decoder, and the 8-bit decoded data is converted into 32-bit
125-Mb/s parallel data by an 8:32 Post-DEMUX. The elastic buffer synchronizes
the converted 32-bit data with the RX system clock (CLKRX). The clock signal
in the receiver is generated from CLKRX by the PLL.
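
As a quick consistency check on the receiver data path described above, the
aggregate rate (bus width times per-line rate) should stay at 5 Gb/s up to the
10B8B decoder, which removes two coding bits per word. The sketch below is
only bookkeeping; the stage labels are informal names taken from the text, not
signal names from the design.

```python
# Bookkeeping sketch of the receiver data path.
# Each entry: (informal stage label, bus width in bits, per-line rate in Mb/s).
# The numbers come from the text; the labels are illustrative only.
STAGES = [
    ("serial input", 1, 5000),
    ("CDR core output", 2, 2500),
    ("tree-type DEMUX output", 8, 625),
    ("frequency-conversion output", 10, 500),
    ("10B8B decoder output", 8, 500),
    ("Post-DEMUX output", 32, 125),
]


def aggregate_rate(width_bits, line_rate_mbps):
    """Aggregate throughput in Mb/s of a parallel bus."""
    return width_bits * line_rate_mbps


rates = {name: aggregate_rate(width, rate) for name, width, rate in STAGES}

for name, rate in rates.items():
    print(f"{name:30s} {rate:5d} Mb/s")
```

The coded stages all carry 5000 Mb/s, and the decoded stages carry 4000 Mb/s,
which is the 8/10 payload fraction of the 8B10B code.
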

Fig. 5. Block diagram of transmitter.
12 M. Fukaishi, K. Nakamura & M. Yotsuyanagi

Fig. 6. Block diagram of receiver.

3.4. Circuit Design

CML circuits can generally operate at higher speed than CMOS circuits
because they do not use pMOS transistors, which are slower than nMOS
transistors, as drivers. Although the operating speed of MOS transistors
increases as the gate length is reduced, the supply voltage is also reduced
(e.g., 2.5 V for 0.25-µm CMOS, 1.5 V for 0.15-µm CMOS, and 1.2 V for 0.1-µm
CMOS). At such low supply voltages, CML circuits using fine-gate-length MOS
transistors cannot operate because the transistors do not operate in the
saturation region. Therefore, we adopt
CMOS circuits using nMOS and pMOS transistors for the digital operating
blocks. CML circuits, in contrast, are used for the analog operating blocks, for
example the voltage-controlled oscillator (VCO) in the PLL, the phase
interpolators (PIs) in the CDR, and the I/O circuits. This is because noise
immunity is important for these analog circuits, and CML offers better noise
immunity than CMOS.

3.4.1. Clock and data recovery

Figure 7 shows the block diagram of the CDR circuit, which consists of a
frequency-synthesis PLL and the CDR core. The voltage-controlled oscillator
(VCO), a component block of the PLL, consists of six stages of the
CML-type delay cell (Fig. 8). 22 The PLL therefore generates pure 2.5-GHz
4-phase clock signals (differential clocks of 0 and 90 degrees: CLK0, CLK0B,
CLK90, CLK90B) from the 125-MHz CLKRX signal. The CDR core consists
of two phase interpolators (PIs), a phase detector (PD), and a PI controller.

Fig. 7. Block diagram of clock and data recovery (CDR) circuit.

Fig. 8. Voltage-controlled oscillator (VCO) and delay buffer.


The PI, which is an analog multiplier, receives two complementary clock
signals (φ, θ) from the PLL through the PI controller and generates the
interpolated clock signal. 23 The PI controller selects the input clock signals
for the PIs from the 4-phase clock signals of the PLL and generates digital
weight codes according to the PD results. The resolution depends on the
number of digital weight codes selected by the PI controller and is set at
1/16 of the 90-degree clock signal; the interpolation step is about 5.6 degrees.
Figure 9 shows the simulated transfer function of the PI, namely, phase shift
versus the digital weight code selected by the PI controller. The
input clock signals for the PIs change at code no. 0 or 16. In domain A of
Fig. 9, the PI receives signals CLK0 and CLK90 and generates an interpolated
clock signal between 0 and 90 degrees. In domain B, the PI controller changes
CLK0 to CLK0B, which is the clock signal with a 180-degree phase difference
from CLK0, and the PI generates an interpolated signal between 90 and 180
degrees. In domain C, CLK90 changes to CLK90B, which is a 270-degree
clock signal, and a clock signal between 180 and 270 degrees is generated.
Finally, in domain D, CLK0B changes back to CLK0 and the PI outputs a
clock signal between 270 and 360 degrees. The PI therefore has unlimited
phase shift capability. Figure 9 also shows that the maximum interpolation
step is about 8 degrees, which corresponds to 9 psec of jitter, or 0.02 unit
interval (UI), for a 2.5-GHz clock signal. This step is negligible for data link
applications.
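
The domain switching described above can be illustrated with an idealized,
perfectly linear model of the interpolator. This is only a sketch under
stated assumptions: the weight code is taken to be the weight (out of 16) on
the CLK90/CLK90B input, the function names are hypothetical, and the real
transfer function in Fig. 9 is not perfectly linear (its largest step is about
8 degrees rather than 5.6 degrees).

```python
STEPS_PER_DOMAIN = 16
NOMINAL_STEP_DEG = 90.0 / STEPS_PER_DOMAIN  # 5.625 degrees

# Clock pair selected by the PI controller in each domain, as given in the text.
DOMAIN_CLOCKS = {
    "A": ("CLK0", "CLK90"),
    "B": ("CLK0B", "CLK90"),
    "C": ("CLK0B", "CLK90B"),
    "D": ("CLK0", "CLK90B"),
}
CLOCK_PHASE_DEG = {"CLK0": 0.0, "CLK90": 90.0, "CLK0B": 180.0, "CLK90B": 270.0}


def ideal_phase(domain, code):
    """Ideal output phase in degrees for a domain and a weight code (0..16).

    Assumption: `code` is the weight on the second (90/270-degree) clock.
    """
    if not 0 <= code <= STEPS_PER_DOMAIN:
        raise ValueError("code must be in 0..16")
    first, second = DOMAIN_CLOCKS[domain]
    p_first = CLOCK_PHASE_DEG[first]
    p_second = CLOCK_PHASE_DEG[second]
    if domain == "D":
        p_first = 360.0  # treat CLK0 as 360 degrees so the phase keeps rising
    weight = code / STEPS_PER_DOMAIN
    return ((1.0 - weight) * p_first + weight * p_second) % 360.0


def step_to_ps(step_deg, clock_hz):
    """Convert a phase step in degrees into time (psec) for a given clock."""
    return step_deg / 360.0 / clock_hz * 1e12


print(ideal_phase("A", 1), ideal_phase("B", 16), ideal_phase("D", 0))
print(round(step_to_ps(8.0, 2.5e9), 2), "ps for an 8-degree step at 2.5 GHz")
```

Because the code sweeps 0 to 16 in domain A, 16 to 0 in domain B, and so on,
the phase advances monotonically through all four domains and wraps at 360
degrees, which is what gives the PI its unlimited phase range.
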

Fig. 9. Simulated phase-interpolator transfer function (phase shift, 0 to 360
degrees, versus code no.).



The operating speed of the receiver is limited by that of the CDR, because
the CDR is the fastest circuit in the receiver. In particular, the PD is the most
critical block in the CDR because it operates at the highest speed. A
self-alignment PD is suitable for high-speed operation because it can also
serve as the root module of the tree-type DEMUX. Two types of self-alignment
PD have been developed. One is the linear type, such as the Hogge type, which
outputs the phase difference between the data and the clock signal as an
analog value. 24 The other is the binary type, such as the Alexander type,
which outputs the phase difference as a digital signal. 25 The binary
self-alignment PD is more suitable for high-speed operation because, unlike
the linear-type PD, it does not need to handle narrow pulses, which CMOS
circuits cannot handle. The binary PD, however, increases jitter because of
its "bang-bang" operation. 26 This "bang-bang" behavior is suppressed by an
Up/Down counter circuit. The Up/Down counter, which consists of an
8-bit-long shift register, counts up when it receives an up signal from the
PD and down when it receives a down signal. Counting continues until the
difference between the total numbers of up and down signals reaches 8.
Because the Up/Down counter operates as an integrating circuit, it behaves as
a low-pass filter. The Up/Down counter must operate at over 2.5 GHz in
5-Gb/s RXs because the PD results (the up and down signals) arrive at
2.5 Gb/s. However, 2.5-GHz synchronous logic operations, such as up/down
counting, are extremely difficult to implement in 0.25-µm CMOS technology
because 2.5 GHz is almost the maximum speed of the flip-flops.
Asynchronous circuits such as the asynchronous 1:2 DEMUX, in contrast, can
operate at 2.5 GHz. We therefore developed the binary
self-alignment PD with parallel output shown in Fig. 10. This PD compares
the phase of the input data with CLK0_PI/CLK90_PI from the PIs. It
outputs 2.5-Gb/s Pre-Up and Pre-Down signals and the deserialized data, D0 and
D1; this PD is therefore also used as the root module of the 1:8 tree-type
DEMUX. The Pre-Up/Down signals are converted into 2-bit, half-rate
Up/Down signals (Up[1:0], Down[1:0]) by the asynchronous 1:2 DEMUX.
Although a more complex Up/Down counter is required to handle four signals
(two Up and two Down signals), this self-alignment PD with parallel output
halves the operating speed required of the Up/Down counter.
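
The low-pass behavior of the Up/Down counter can be sketched behaviorally as
follows. This is a minimal model, not the shift-register circuit itself: it
assumes that once the up/down difference reaches 8 the counter emits one
phase-adjustment step and restarts from zero, which the text does not state
explicitly, and the function name and data layout are hypothetical.

```python
def filter_pd_decisions(decisions, threshold=8):
    """Integrate half-rate PD decisions and emit phase-adjustment steps.

    decisions: iterable of (up_bits, down_bits), one pair per half-rate
               cycle, where each element is a tuple of two 0/1 values
               standing for Up[1:0] and Down[1:0].
    Returns a list of +1 / -1 adjustment steps (assumed PI code updates).
    """
    count = 0
    adjustments = []
    for up_bits, down_bits in decisions:
        count += sum(up_bits) - sum(down_bits)
        if count >= threshold:
            adjustments.append(+1)
            count = 0  # assumption: restart after issuing a step
        elif count <= -threshold:
            adjustments.append(-1)
            count = 0
    return adjustments


# Pure bang-bang dither: one up and one down per cycle cancel out.
dither = [((1, 0), (0, 1))] * 100
# Consistent early/late indication: two ups per cycle.
drift = [((1, 1), (0, 0))] * 8

print(filter_pd_decisions(dither))  # no adjustment steps
print(filter_pd_decisions(drift))   # one step per four cycles
```

Balanced up/down activity never reaches the threshold, so it is filtered out,
while a sustained imbalance produces a step once every eight net decisions.
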
Fig. 10. Block diagram of a binary self-alignment phase detector with parallel output
(MSM: Master-Slave-Master latch).

3.4.2. Demultiplexer

Figure 11(a) is a block diagram of the 1:2 DEMUX module, which does not
require precisely controlled clock signal distribution; Figure 11(b) is its
timing diagram. The module contains a clock divider for the next stage, as
well as a D flip-flop (D-F/F) and a Master-Slave-Master flip-flop
(MSM-F/F) for data. The module's clock runs at half the input
data rate (e.g., a 2.5-GHz clock for 5-Gb/s data) because the module
uses both the rising and falling clock edges. The D-F/F outputs the odd
data stream at the rising edges of the clock, and the Master-Slave latches in
the MSM-F/F latch the even data stream at the falling edges of the clock. The
latched data is then output at the rising edges of the clock by the second
Master latch in the MSM-F/F. In this way, the two-bit output data, D0 and D1,
is synchronized with the rising edges of the input clock. The divided clock
CLK/2 is generated at the falling edges of the input clock. Without the delay
circuit located after the clock divider, CLK/2 would nominally change at a
point close to the center of the output data D0/D1, but as the operating speed
approaches the Gb/s range, the internal delay in the F/Fs can no longer be
ignored: it would reduce both the operating margin and the speed of the 1:2
DEMUX module. With the delay circuit, which adjusts the timing between
D0/D1 and CLK/2, the CLK/2 edge for each next stage is set precisely
at the center of each D0/D1 eye. That is, the 1:2 DEMUX module
generates optimized timing between the divided clock and the data for the
next-stage DEMUX modules.


Figure 12 is a block diagram of an asynchronous tree-type 1:8 DEMUX
composed of the 1:2 DEMUX modules. Unlike the conventional scheme
shown in Fig. 3(c), 7 it is not necessary here to distribute external clock
signals from the clock generation block to the respective DEMUX modules,
because the clock for each module is the CLK/2 output of the previous module.
Nor is it necessary to control the timing between input data and clock,
because the timing has already been optimized by the previous module. This
asynchronous tree-type DEMUX is obtained simply by connecting 1:2 DEMUX
modules, and it suffers no clock distribution or clock skew problems, even
when applied to high-speed LSIs.
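
The data ordering produced by cascading 1:2 modules can be checked with a
small behavioral sketch. It ignores clocks and delays entirely and only models
which serial bit ends up on which output. The assumption that D0 carries the
first bit of each pair is made here for illustration; the function names are
hypothetical.

```python
def demux_1to2(bits):
    """Split a stream into two half-rate streams (D0, D1).

    Assumption: D0 takes bits 0, 2, 4, ... and D1 takes bits 1, 3, 5, ...
    """
    return bits[0::2], bits[1::2]


def demux_tree(bits, stages=3):
    """Cascade 1:2 modules; each output of one stage feeds a module in the next."""
    streams = [bits]
    for _ in range(stages):
        streams = [half for stream in streams for half in demux_1to2(stream)]
    return streams


serial = list(range(16))  # label each serial bit by its arrival index
outputs = demux_tree(serial)
print([stream[0] for stream in outputs])
```

With three stages, the outputs read from top to bottom of the tree carry
serial bits 0, 4, 2, 6, 1, 5, 3, 7, which matches the D0, D4, D2, D6, D1, D5,
D3, D7 labeling of the outputs in Fig. 12.
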

3.4.3. Frequency conversion circuit

Our design contains two parallel-to-parallel frequency conversion circuits:
625-MHz × 8-bit to 500-MHz × 10-bit for the receiver (illustrated with a
timing diagram in Fig. 13) and 500-MHz × 10-bit to 625-MHz × 8-bit for the
transmitter. Each circuit's core is a least common multiple (LCM) register
with input/output selectors. To simplify the input/output selector structure, the
LCM register is given a 40-bit capacity, because 40 bits is the least common
multiple of the 8-bit and 10-bit data widths. The input/output
selectors shown in Fig. 13(a) operate using multiple-phase clocks. The 1:5
input selector, containing an 8-bit input register, operates with five clocks
(Φ0 - Φ4), each having a 1.6-nsec phase difference. 8-bit input data is first
stored in the 8-bit input register in time with a 625-MHz clock signal (Φ625)
and then stored into the LCM register, in a location determined by which
multi-phase clock (Φ0 - Φ4) is active at that point. The 4:1 output selector,
containing four tri-state select buffers and a 10-bit output register, operates
with four select signals (φ0 - φ3), each having a 2-nsec phase difference, for
the respective tri-state select buffers. The tri-state select buffers read 10-bit
data from the LCM register in order (φ0 - φ3) and write the data to the 10-bit
output register in time with a 500-MHz clock signal (φ500). In this way, the
1:5 input selector operates at a 1.6-nsec period × 5 steps, while the 4:1 output
selector operates at 2 nsec × 4 steps. To avoid read/write conflicts in the
LCM register, the timing difference between the input and output selectors is
set to 180 degrees, which guarantees the maximum timing margin for the
frequency conversion circuit. (The structure of the 10-bit to 8-bit frequency
conversion circuit for the transmitter is fundamentally the same as that
illustrated for the receiver.)

Fig. 11. (a) 1:2 DEMUX module block diagram and (b) timing diagram.

Fig. 12. Asynchronous tree-type 1:8 DEMUX block diagram (outputs, top to
bottom: D0, D4, D2, D6, D1, D5, D3, D7).
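
The regrouping performed by the 40-bit LCM register in the receiver can be
sketched as follows. This is a data-only illustration: it does not model the
multi-phase clocks or the 180-degree offset between writing and reading, the
bit order inside the register is an assumption, and the function name is
hypothetical.

```python
from math import gcd

IN_WIDTH, OUT_WIDTH = 8, 10
LCM_BITS = IN_WIDTH * OUT_WIDTH // gcd(IN_WIDTH, OUT_WIDTH)  # 40 bits
WORDS_IN_PER_FILL = LCM_BITS // IN_WIDTH                     # 5 writes (1:5 selector)


def convert_8_to_10(words_in):
    """Regroup 8-bit words (as bit strings) into 10-bit words.

    Incomplete groups of fewer than five input words are left unconverted.
    """
    words_out = []
    usable = len(words_in) - len(words_in) % WORDS_IN_PER_FILL
    for start in range(0, usable, WORDS_IN_PER_FILL):
        # 1:5 input selector: five consecutive 8-bit words fill the register.
        lcm_register = "".join(words_in[start:start + WORDS_IN_PER_FILL])
        # 4:1 output selector: read the register back as four 10-bit words.
        for offset in range(0, LCM_BITS, OUT_WIDTH):
            words_out.append(lcm_register[offset:offset + OUT_WIDTH])
    return words_out


serial = "0011111010" * 4                       # four 10-bit code words
words_in = [serial[i:i + IN_WIDTH] for i in range(0, LCM_BITS, IN_WIDTH)]
print(words_in)
print(convert_8_to_10(words_in))
```

Five writes of 8 bits and four reads of 10 bits both cover exactly 40 bits,
which is why the 1.6-nsec × 5 and 2-nsec × 4 sequences take the same 8 nsec.
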

3.4.4. Comma detection and word alignment logic

Figure 14 is a simple block diagram of the comma detection and word
alignment logic. Word boundaries are identified by detecting the comma
signals (0011111010 or 1100000101) that Fiber Channel uses to
delimit the start of information packets. Current 10-bit data is first stored in
the upper register (d10 - d19). When new 10-bit data arrives, the previous
10-bit data is shifted to the lower register (d0 - d9). As indicated by the
shaded area, a comma signal may extend over both previous and current data
because the tree-type DEMUX and the frequency conversion circuit can neither
identify comma signals nor align word boundaries. In checking for the
presence of a comma signal, the comma detector examines only the 19 register
bits d0 - d18; it omits d19 from the check. This avoids the possibility
of detecting the same comma signal twice. (Specifically, if d19 were also
checked, a non-overlapping comma signal, i.e., one
occurring precisely in d10 - d19, would be detected first there, in the upper
register, and then again after it had been shifted to the lower register.)
A block diagram of the comma detector is shown in Fig. 15. It consists of
ten comma detection blocks (CDBs), a 10-input OR gate, and ten registers for
select signals. Each CDB compares its 10-bit input data with the comma signals
(0011111010 or 1100000101). When one of the CDBs detects a comma signal,
the 10-input OR gate sends a trigger signal to the ten registers, which causes
the comma detector to generate a select signal. (If, for example, the comma
signal were that shown by the shaded area in Fig. 15 (d[15:6]), the select
signal would be Sel[6].) Subsequent data can then be extracted through the
word alignment cell (a tri-state buffer) from the selected registers (d6 - d15).
The comma detection/word alignment circuit operates within one clock cycle at
500 MHz, and the latency of the word alignment logic is only three clock cycles.
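
The window search performed by the ten CDBs can be sketched as below. The
register is modeled as a 20-character bit string with index k standing for
register bit dk, and the comma pattern is assumed to be stored in ascending
register order (its first bit at the lowest index), which is how the example
in Fig. 14 appears to be drawn. The function names are hypothetical.

```python
COMMAS = ("0011111010", "1100000101")


def detect_comma(d):
    """Return the select index k if d[k]..d[k+9] holds a comma, else None.

    d: string of 20 bits, d[0] = d0 (oldest) ... d[19] = d19 (newest).
    Only windows starting at k = 0..9 are examined, so d19 is never checked.
    """
    if len(d) != 20:
        raise ValueError("expected 20 register bits")
    for k in range(10):  # one comma detection block per window
        if d[k:k + 10] in COMMAS:
            return k
    return None


def align(d, sel):
    """Extract the 10-bit word selected by Sel[sel]."""
    return d[sel:sel + 10]


straddling = "000000" + "0011111010" + "0000"   # comma in d6..d15
upper_only = "0000000000" + "0011111010"        # comma exactly in d10..d19
after_shift = "0011111010" + "0000000000"       # same comma, one word later

print(detect_comma(straddling), detect_comma(upper_only), detect_comma(after_shift))
```

A comma sitting exactly in d10 - d19 is not reported until it has shifted into
d0 - d9, so each comma is detected once, as the text explains.
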
Fig. 13. (a) Block diagram of 625-MHz × 8-b to 500-MHz × 10-b frequency conversion in the
receiver and (b) timing diagram.

Fig. 14. Block diagram of comma detection and word alignment logic.

Fig. 15. Comma detector block diagram.

3.4.5. 8B10B encoder

The use of 8B10B encoded data increases not only the transmission accuracy
but also the operating margin in the receiver, because
the 8B10B encoder improves the DC balance of the serial data by using
running disparity control. The 5-Gb/s serial transceiver requires the 8B10B
encoder to operate at 500 MHz. A semi-pipelined architecture that
performs the 5B6B and 3B4B operations in the rising-edge and falling-edge
periods of the clock, respectively, has been proposed, as shown in
Fig. 16. 2 ' Since this
architecture requires running disparity (RD) information to be passed
between the 5B6B and 3B4B operations, the RD operation must be
completed within half a clock cycle, that is, within 1 nsec at 500 MHz.
Consequently, the semi-pipelined encoder cannot reach 500-MHz operation in
0.25-µm CMOS. To overcome this speed limitation,
we developed a fully pipelined 2-stage 8B10B
encoder, as shown in Fig. 17. The first stage, consisting of Pre-5B6B and
Pre-3B4B encoders, does not require RD information. It must, however, output
not only the pre-encoded results but also two flags: one showing whether the
code has an alternate code, and the other showing whether the code is a
DC-balanced code. The second stage determines the final output code according
to the information provided by the 5B6B and 3B4B pre-encoders and the current
RD information. The second stage is also responsible for the operation for
D.x.7 and D/K.y.7. The delay time of the first stage is 1.2 nsec and that of the
second stage is 1.3 nsec. As a result, fully pipelined 8B10B operation is
achieved at 750 MHz.
We have also developed a fully pipelined 2-stage 10B8B decoder for the
receiver. As in the 8B10B encoder, the first pipeline stage of the 10B8B
decoder, which consists of the Pre-6B5B and Pre-4B3B decoders, does not
require RD information for running disparity error checks. The second stage
checks for running disparity errors according to both the pre-decoded
information from the first stage and the current running disparity
information. The fully pipelined 10B8B decoder operates at the same speed as
the 8B10B encoder.
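
The division of work between the two pipeline stages can be sketched for a
single sub-block as follows. This is a simplified illustration, not the
authors' circuit: it assumes the pre-encoder emits the code used for negative
running disparity, that the alternate code is the bitwise complement, and it
omits the special handling of D.x.7 and D/K.y.7. The function and constant
names are hypothetical.

```python
RD_MINUS, RD_PLUS = -1, +1


def second_stage(pre_code, width, has_alternate, is_balanced, rd):
    """Pick the final sub-block code and the next running disparity.

    pre_code:      pre-encoded sub-block (assumed to be the RD- variant)
    width:         sub-block width in bits (6 for 5B6B, 4 for 3B4B)
    has_alternate: flag from the first stage, True if an alternate code exists
    is_balanced:   flag from the first stage, True if ones and zeros are equal
    rd:            current running disparity, RD_MINUS or RD_PLUS
    """
    mask = (1 << width) - 1
    if has_alternate and rd == RD_PLUS:
        code = ~pre_code & mask  # assumption: alternate = complement
    else:
        code = pre_code
    next_rd = rd if is_balanced else -rd  # unbalanced codes flip the disparity
    return code, next_rd


# D.0 in the 5B6B sub-block: unbalanced, with an alternate code.
code, rd = second_stage(0b100111, 6, True, False, RD_MINUS)
print(format(code, "06b"), rd)
code, rd = second_stage(0b100111, 6, True, False, RD_PLUS)
print(format(code, "06b"), rd)
```

Only this small selection step depends on the running disparity, which is why
moving the table lookups into an RD-independent first stage shortens the
critical path.
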

3.4.6. Phase and frequency compensation technique

Figure 18 shows a timing chart of four RX chips to explain the phase
compensation. A phase difference between the multiple RX chips is caused by
the data skew in the transmission lines and by the internal-clock-state
difference. The transmission line data skew is simply the input data skew
between multiple chips (I in Fig. 6). The clock state difference between the
multiple chips is caused by the tree-type DEMUX. It is a result of using an
asynchronous tree-type DEMUX and frequency conversion architecture as the
1:10 DEMUX circuit in order to obtain high-speed operation. 13 Although the
DEMUX can operate at high speed, neither the clock timing nor the timing of
the DEMUX output data is always the same across chips, because the 1:8
tree-type DEMUX does not align word boundaries and does not reset the clock
of each chip to the same time. The output signals of the word-alignment logic
may therefore vary by the 8-bit time of 5-Gb/s transmission, or 1.6 nsec (II).
Moreover, the variation between multiple RX chips may increase when the
serial data are skewed (II). This variation transfers to the output of the 8:32
Post-DEMUX because the 10B8B decoder and the Post-DEMUX operate
according to a 500-MHz clock signal from the 1:10 DEMUX (III). The elastic
buffer compensates for data variation within 0.5 clocks at 125 MHz, or 4 nsec,
and outputs data whose phases differ within this range at the same timing (IV),
synchronized with CLKRX (V). As a result, the maximum skew tolerance
between the serial lines for all four channels to maintain synchronous
operation is 12 serial-clock bit times (2.4 nsec = 4 nsec - 1.6 nsec), which is
equivalent to a coaxial cable length of about 50 cm.

Fig. 16. Block diagram of a semi-pipelined 8B10B encoder.

Fig. 17. Block diagram of a fully pipelined 8B10B encoder (pre-encoder stage:
1.2 ns; RD operation stage: 1.3 ns).

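
The skew budget quoted above follows from simple arithmetic, reproduced here
in picoseconds to avoid rounding. The cable propagation velocity of 20 cm per
nsec (roughly two thirds of the speed of light) is an assumption introduced
for this sketch; the text states only the resulting length of about 50 cm.

```python
BIT_TIME_PS = 200                      # one bit at 5 Gb/s
SYSTEM_CLOCK_PERIOD_PS = 8000          # 125-MHz system clock (CLKRX)

elastic_range_ps = SYSTEM_CLOCK_PERIOD_PS // 2      # 0.5 clock = 4 nsec
demux_variation_ps = 8 * BIT_TIME_PS                # 8-bit time = 1.6 nsec
skew_tolerance_ps = elastic_range_ps - demux_variation_ps
skew_tolerance_bits = skew_tolerance_ps // BIT_TIME_PS

ASSUMED_VELOCITY_CM_PER_NS = 20.0      # assumption, not given in the text
cable_length_cm = skew_tolerance_ps / 1000 * ASSUMED_VELOCITY_CM_PER_NS

print(skew_tolerance_ps, "ps =", skew_tolerance_bits, "bit times")
print(cable_length_cm, "cm of cable at the assumed velocity")
```
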
Figure 19 shows the 32-bit parallel-data timing chart for the RX. It helps
explain the compensation of the frequency difference between the clocks of
the TX and the RX systems. The elastic buffer also accommodates this
frequency difference by using the 32-bit Fiber Channel Idle character (K.28.5
D21.4 D21.5 D21.5) that is transmitted between packets. The read selector in
the elastic buffer is reset when Idle characters are transmitted. As a result,
the elastic buffer removes or inserts Idle characters according to both the
difference between the frequencies of the TX and RX system clocks and the phase
difference between the timing of the write and read selectors. When CLKRX
is slower than CLKTX (f_CLKRX < f_CLKTX), an Idle character is removed, as
shown in Fig. 19. On the other hand, when CLKRX is faster than CLKTX
(f_CLKRX > f_CLKTX), an Idle character is inserted between packets (Fig. 19).
The maximum compensatable frequency difference (Δf) depends on the packet
length (L) and is given by Δf [ppm] = 0.5 × 10^6 / L. For example, Δf for the
maximum packet length given in the Fiber Channel Standard (L: 2148 words)
is 233 ppm.
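
The relation between packet length and tolerable clock mismatch can be
evaluated directly. The function below simply codes the formula given above;
its name is hypothetical, and the shorter packet length in the usage example
is an arbitrary illustrative value, not a figure from the standard.

```python
def max_frequency_difference_ppm(packet_length_words):
    """Maximum compensatable TX/RX clock difference in ppm for packet length L."""
    if packet_length_words <= 0:
        raise ValueError("packet length must be positive")
    return 0.5 * 1e6 / packet_length_words


print(round(max_frequency_difference_ppm(2148)))   # maximum packet length in the text
print(round(max_frequency_difference_ppm(512)))    # arbitrary shorter packet
```

Shorter packets give the elastic buffer more frequent opportunities to insert
or remove an Idle character, so the tolerable frequency difference grows in
inverse proportion to L.
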

Fig. 18. Timing chart for phase-difference compensation in RXs: (I) DIN of
chips A to D, (II) word-alignment output (10b), (III) Post-DEMUX output (32b),
(IV) elastic buffer output (32b), and (V) reference clock (CLKRX).

Fig. 19. Timing chart for frequency-difference compensation.

4. Experimental Results

The developed transmitter and receiver chips were fabricated using a 0.25-µm
triple-metal CMOS process. The MOS transistor characteristics are listed in
Table 2. The microphotographs in Fig. 20 show the chips and the TX and RX cores.
The die size, determined by I/O pad requirements, is 4 × 6 mm. The core of the
TX occupies an area of 1.1 × 1.2 mm, and that of the RX occupies an area of
1.3 × 1.3 mm. The high-speed blocks, such as the 10:1 MUX/DEMUX, PLL, and
CDR, are designed to have fan-outs of less than two, because
the maximum operating speed of inverter circuits with a fan-out of
two is about 2.5 GHz, including the influence of wire capacitance. Distortion
of the input waveform of the high-speed signals was avoided by using
50-Ω on-chip termination.
The chips were mounted directly on a PC board to measure chip
performance. The high-speed signal lines on the PC board were designed to
have an impedance of 50 Ω, with a coplanar structure. Figure 21 shows an output
waveform of data transmitted at 5 Gb/s, measured using 1-m
coaxial cables with 26-GHz bandwidth. Error-free operation was obtained for
random 32-bit 125-Mb/s parallel input data at a supply voltage of 2.5 V. We
also measured the TX performance using AWG20 coaxial cables, whose
bandwidth is about 2.4 GHz, and demonstrated successful transmission. A
transmission test using 10-m AWG20 coaxial cables from the TX to the RX
chip was also successful. A BER of 10^-12, a figure that depends on measurement
time, was obtained with both the 26-GHz and the 2.4-GHz bandwidth coaxial cables.
When a cable with lower bandwidth is used, the output buffer must be improved,
for example by using a pre-emphasis buffer. 5, 12, 15
Figure 22 shows the waveforms of two of the four receiver chips in the
transmission test for achieving 20-Gb/s aggregate bandwidth. They include the
5-Gb/s serial input data, output waveforms, elastic buffer clock, and internal
recovered clock. The figure also shows the waveforms corresponding to
different skew values of the input data. These results demonstrate that the
receiver operated successfully and that the RXs output the retimed
data at the same timing even when the serial input data signals were
skewed and the internal recovered clock timing differed between the two chips.
The power consumption of the TX chip is 500 mW and that of the RX chip is
750 mW at 5 Gb/s with a 2.5-V supply. Figure 23 shows the high-speed 2.5-GHz
clock output waveform from the frequency synthesis PLL, a 125-MHz
recovered byte clock, and the timing jitter. The peak-to-peak jitter of the
high-speed clock is about 24 psec, or 0.12 UI; that of the byte clock is about
200 psec, or 0.02 UI. The byte clock has more jitter than the high-speed
clock, but, more importantly, it enables the RX chip to operate correctly (as
shown in Fig. 22). For data communications, this amount of jitter is
negligible.

Table 2: MOS transistor characteristics.

                                   nMOS          pMOS
Gate oxide thickness               6 nm          6 nm
Gate length                        0.25 µm       0.25 µm
Threshold voltage                  0.2 V         -0.2 V
Saturation current                 560 µA/µm     -260 µA/µm
(|Vds|, |Vgs| = 2.5 V)

Fig. 20. Chip and core circuits microphotographs: transmitter (left) and receiver (right).

In our multi-channel transmission scheme, the interface bandwidth
increases in proportion to the number of chips. The aggregate bandwidth
reaches 20 Gb/s when the number of chips is four. In principle, there is no
limit to the number of chips in this multi-channel chip set. However, as the
number of chips increases, the total power consumption and the area of the
multiple chips on the PC board increase. In particular, the need for a large
area makes it difficult to distribute both data and system clock signals
while keeping data and clock skew low. Since the phase-difference compensation
using the elastic buffer is designed to work within half a cycle of the receiver
system clock, clock skew between the inputs of multiple receiver chips
reduces the range in which phase can be compensated. Therefore, the total
length of the coaxial cable and the signal line on the PC board must be short,
that is, within the compensatable range of 50 cm. As for the line length
of the system clock, since the wavelength of the 125-MHz system clock is
over 1.5 m, system clock skew on the PC board causes no problems for a
limited number of chips. In practice, it is the total power that limits the
number of chips used.
Fig. 21. Measured waveform: transmitted-data eye diagram at 5-Gb/s operation
(200 mV/div, 50 ps/div).

Fig. 22. Measured waveforms: skewed input data and retimed output data in RX chips
(2 ns/div).

Fig. 23. Measured waveforms and timing jitter: frequency synthesis PLL clock (left;
jitter = 23.6 psec P-P, 0.12 UI) and byte-recovered clock (right; jitter =
196 psec P-P, 0.02 UI).

5. Discussion

Let us consider the relationship between the maximum operating frequency
and the gate length. The maximum frequency is inversely proportional to the
gate length, or design rule, as shown in Fig. 1. By simply extrapolating the
fitted line, we can expect to reach 5-Gb/s operation with 0.1-µm CMOS
technology when using a single architecture, that is, one that operates with a
clock signal of the same frequency as the transmitted data rate. A more radical
design is clearly needed to overcome the conventional circuit limitations
and improve circuit performance further.
Our approach is to use the asynchronous tree-type DEMUX and frequency
conversion, which utilizes both clock edges (as demonstrated in Sect. 3). 16
This has achieved twice the performance of the conventional design, as shown
in Fig. 1. This circuit technique is independent of the design rule, so it is
always effective at doubling the processing speed. The maximum transmitted data
rate, for example, will reach 10 Gb/s with 0.1-µm CMOS. The oversampling
technique with multi-phase clocks, in which sophisticated decision control
logic can operate at a lower frequency, is also an effective approach for
getting the most out of CMOS characteristics. 1, 4-6, 8, 10, 12, 14 For
example, 4-Gb/s operation can be achieved using 500-MHz, 8-phase clock signals
in 0.5-µm CMOS technology. Our approach using both clock edges is one type of
multi-phase architecture, and it overcomes the difficulty of clock
distribution that requires precise clock phase control. These architectures for
high-speed operation with lower-speed devices are supported by the high level
of integration of CMOS technology compared with non-CMOS technology.
Another possibility is to introduce current-mode logic to the final 2:1 stage,
which is the fastest operating block, or to all circuits. 1, 17, 18 This approach
makes it possible to retain the low-power advantage of CMOS while still
achieving high-speed operation, and it is also supported by the high level of
integration of CMOS.
In addition, we should point out an inherent advantage of CMOS
implementation: the availability of high-performance p-channel devices
(pMOS). It should be remembered that both PNP bipolar transistors and
p-channel GaAs FETs perform very poorly compared with their n-type
counterparts because of low carrier-drift mobility. Applying pMOS
transistors in various parts of a circuit adds design flexibility not only for
biasing but also for high-speed operation. It is effective for low-voltage
circuits, which are necessary in deep-submicron CMOS technology.
On the other hand, several design issues must be solved on the road to
high-speed LSIs. As supply voltage and design rules are reduced in
accordance with device scaling, "signal integrity" problems become serious.
One of the major drawbacks of CMOS push-pull operation is the large
switching noise induced by sudden current flows on power and ground lines.
The switching noise on the power and ground lines, for example, reaches about
100 mV. 27 This digital switching noise can cause delay time variations or
logic errors. Low-impedance power and ground distribution is thus
more important than in bipolar and GaAs technologies. This problem is
especially important when integrating low-voltage-swing circuits, such as
current-mode logic, with large-scale digital circuits. On-chip decoupling, or
bypass, capacitors would help to suppress the noise. The silicon substrate
itself is another source of digital switching noise. The substrate noise, for
example, reaches about 50 mV. 27 Analog circuits, such as the PLL and
preamplifier, are very sensitive to this noise. Wider line spacing and
effective electronic shielding will be necessary, together with
high-noise-immunity design. 9

6. Conclusion

This paper has reviewed recent research on CMOS gigahertz-data-rate
communication LSIs, and it has described design innovations to overcome
device performance limitations. We have developed a high-data-bandwidth
digital interface for ultra-high-resolution flat panel displays. The interface
consists of a multi-channel transmitter and receiver chip set that operates at 5
Gb/s. Its key features for high-speed operation are a tree-type demultiplexer
and frequency conversion architecture, a self-alignment phase detector for
the clock and data recovery circuit, and a fully pipelined 8-bit to 10-bit encoder.
The tree-type and frequency conversion architecture makes it possible to keep
the benefits of a tree-type architecture while still meeting Fiber Channel
standards. The self-alignment phase detector optimizes the timing between
transmitted input data and the receiver clock signal, so timing can be designed
easily and operating speed increases. The fully pipelined encoding operation
reaches 750 MHz with 0.25-µm CMOS technology. The multi-channel chip set
is obtained by using two compensation techniques. One compensates for phase
difference between multiple receiver chips. Therefore, in principle, there are
no limits to the number of chips that can be used concurrently. The other
technique compensates for differences in the frequencies of the transmitter
and receiver chips, because personal computers and peripheral devices each
have their own clock source. These solutions for high-speed operation and
multi-channel transmission are supported by the advantage of CMOS
technology over non-CMOS technology in its high level of integration. The
0.25-µm CMOS technology therefore helps to integrate many function
blocks into a single chip. One chip set with a 2.5-V power supply provides a
5-Gb/s bandwidth, so an aggregate bandwidth of 20 Gb/s can be obtained by
using four chip sets. Based on these results, we discussed high-speed LSI
design issues and future prospects.
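
The bandwidth arithmetic in this summary can be written out as a short Python sketch. The function names are illustrative and not part of the original work. Treating the 5-Gb/s figure as the coded line rate, so that the 8-bit to 10-bit encoding leaves 8/10 of it for payload, is an assumption; the text does not say whether 5 Gb/s refers to coded or uncoded data.

```python
def aggregate_bandwidth_gbps(per_chip_set_gbps: float, chip_sets: int) -> float:
    """Total bandwidth of several chip sets operated in parallel."""
    return per_chip_set_gbps * chip_sets


def payload_rate_gbps(line_rate_gbps: float, data_bits: int = 8, code_bits: int = 10) -> float:
    """Payload rate of a block code that maps data_bits onto code_bits."""
    return line_rate_gbps * data_bits / code_bits


total = aggregate_bandwidth_gbps(5.0, 4)  # four 5-Gb/s chip sets, as in the text
payload = payload_rate_gbps(total)        # assumption: 5 Gb/s is the coded line rate
print(f"aggregate: {total:.0f} Gb/s, payload after 8B/10B: {payload:.0f} Gb/s")
```

If the 5-Gb/s figure already refers to uncoded data, the second function does not apply and the aggregate payload is simply the 20 Gb/s quoted in the text.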
With deep-submicron CMOS technologies, gigahertz-rate operation of each
communication building block is possible. We can overcome the conventional
circuit limitations and improve the circuit performance further by applying the
radical circuit design techniques reviewed in this paper.

Acknowledgments

The authors would like to thank Messrs. Yasuhiko Iwamono, Masaharu Sato,
Yutaka Tsutsui, Yoshinori Hirota, and Yoetsu Nakazawa for their excellent
device fabrication; Messrs. Tetsuya Enomoto, Takashi Ae, Masaki Ishida, and
Hideki Heiuchi for their generous help with the chip layout; and Messrs. Akio
Tajima and Hidenori Ikeno and Drs. Haoya Henmi and Hiroshi Hayama for their
valuable assistance in system design. The authors also wish to express
appreciation to Messrs. Tadahiko Horiuchi and Syuji Kishi and Drs. Masakazu
Yamashina, Takao Nishitani, Masao Fukuma, and Hiroyuki Abe for their
continuous encouragement throughout this work.
32 M. Fukaishi, K. Nakamura & M. Yotsuyanagi

References

1. J. F. Ewen, A. X. Widmer, M. Soyuer, K. R. Wrenner, B. Parker, and H. A.
Ainspan, "Single-chip 1062 Mbaud CMOS transceiver for serial data
communication", in ISSCC Dig. Tech. Papers 38 (1995) pp. 32-33.
2. S. Yasuda, Y. Ohtomo, M. Ino, Y. Kado, and T. Tsuchiya, "3-Gb/s CMOS 1:4
MUX and DEMUX ICs", IEICE Trans. Electron. E78-C (1995) pp. 1746-1752.
3. M. Kurisu, M. Kaneko, T. Suzaki, A. Tanabe, M. Togo, A. Furukawa, T. Tamura,
K. Nakajima, and K. Yoshida, "2.8-Gb/s 176-mW byte-interleaved and 3.0-Gb/s
118-mW bit-interleaved 8:1 multiplexers with 0.15-µm CMOS technology", IEEE
J. Solid-State Circuits 31 (1996) pp. 2024-2029.
4. C.-K. K. Yang and M. A. Horowitz, "A 0.8-µm CMOS 2.5-Gb/s oversampling
receiver and transmitter for serial links", IEEE J. Solid-State Circuits 31 (1996)
pp. 2015-2023.
5. A. Fiedler, R. Mactaggart, J. Welch, and S. Krishnan, "A 1.0625 Gbps transceiver
with 2x-oversampling and transmit signal pre-emphasis", in ISSCC Dig. Tech.
Papers 40 (1997) pp. 238-239.
6. D. Chen and M. O. Baker, "A 1.25 Gb/s, 460 mW CMOS transceiver for serial
data communication", in ISSCC Dig. Tech. Papers 40 (1997) pp. 242-243.
7. M. Soda, H. Tezuka, S. Shioiri, A. Tanabe, A. Furukawa, M. Togo, T. Tamura, and
K. Yoshida, "A 2.4-Gb/s CMOS clock recovering 1:8 demultiplexer", 1997
Symposium on VLSI Circuits Digest (1997) pp. 69-70.
8. C.-K. K. Yang, R. Farjad-Rad, and M. A. Horowitz, "A 0.5-µm CMOS 4.0-Gb/s
serial link transceiver with data recovery using oversampling", IEEE J. Solid-
State Circuits 33 (1998) pp. 713-722.
9. A. Tanabe, M. Soda, Y. Nakahara, A. Furukawa, T. Tamura, and K. Yoshida, "A
single chip 2.4 Gb/s CMOS optical receiver IC with low substrate crosstalk
preamplifier", in ISSCC Dig. Tech. Papers 41 (1998) pp. 304-305.
10. K. Lee, S. Kim, D-K. Jeong, G. Kim, B. Kim, V. D. Costa, and D. Lee, "A jitter-
tolerant 4.5 Gb/s CMOS interconnect for digital display", in ISSCC Dig. Tech.
Papers 41 (1998) pp. 310-311.
11. K. Nakamura, M. Fukaishi, H. Abiko, A. Matsumoto, and M. Yotsuyanagi, "A 6
Gbps CMOS phase detecting DEMUX module using half-frequency clock", 1998
Symposium on VLSI Circuits Digest (1998) pp. 196-197.
12. R. Farjad-Rad, C.-K. K. Yang, M. Horowitz, and T. Lee, "A 0.4-µm CMOS 10-Gb/s
4-PAM pre-emphasis serial link transmitter", 1998 Symposium on VLSI
Circuits Digest (1998) pp. 198-199.
13. M. Fukaishi, K. Nakamura, M. Sato, Y. Tsutsui, S. Kishi, and M. Yotsuyanagi, "A
4.25-Gb/s CMOS fiber channel transceiver with asynchronous tree-type
demultiplexer and frequency conversion architecture", IEEE J. Solid-State
Circuits 33 (1998) pp. 2139-2147.
14. R. Gu, J. M. Tran, H-C. Lin, A-L. Yee, and M. Izzard, "A 0.5 - 3.5 Gb/s low-
power low-jitter serial data CMOS transceiver", in ISSCC Dig. Tech. Papers 42
(1999) pp. 352-353.
15. M-J. E. Lee, W. Dally, and P. Chiang, "A 90 mW 4 Gb/s equalized I/O circuit with
input offset cancellation", in ISSCC Dig. Tech. Papers 43 (2000) pp. 252-253.
16. M. Fukaishi, K. Nakamura, H. Heiuchi, Y. Hirota, Y. Nakazawa, H. Ikeno, H.
Hayama, and M. Yotsuyanagi, "A 20 Gb/s CMOS multi-channel transmitter and
receiver chip set for ultra-high resolution digital display", in ISSCC Dig. Tech.
Papers 43 (2000) pp. 260-261.
17. A. Tanabe, M. Umetani, I. Fujiwara, T. Ogura, K. Kataoka, M. Okihara, H.
Sakuraba, T. Endoh, and F. Masuoka, "A 10 Gb/s demultiplexer IC in 0.18 µm
CMOS using current mode logic with tolerance to the threshold voltage
fluctuation", in ISSCC Dig. Tech. Papers 43 (2000) pp. 62-63.
18. J. Savoj and B. Razavi, "A 10-Gb/s CMOS clock and data recovery circuit", 2000
Symposium on VLSI Circuits Digest (2000) pp. 136-139.
19. A. Tanabe, M. Togo, M. Soda, H. Tezuka, T. Suzaki, A. Furukawa, and K. Emura,
"High performance CMOS for GHz communication IC", 1996 Symposium on VLSI
Tech. Digest (1996) pp. 134-135.
20. R. C. Walker, K-C. Hsieh, T. A. Knotts, and C-S. Yen, "A 10Gb/s Si-Bipolar
TX/RX Chipset for Computer Data Transmission", in ISSCC Dig. Tech. Papers 41
(1998) pp. 302-303.
21. A. X. Widmer and P. A. Franaszek, "A DC-Balanced, Partitioned-Block, 8B/10B
Transmission Code", IBM J. Res. Develop. 27 (1983) pp. 440-451.
22. J. G. Maneatis and M. A. Horowitz, "Precise Delay Generation Using Coupled
Oscillators", IEEE J. Solid-State Circuits 28 (1993) pp. 1273-1282.
23. S. Sidiropoulos and M. Horowitz, "A Semidigital Dual Delay-Locked Loop",
IEEE J. Solid-State Circuits 32 (1997) pp. 1683-1692.
24. C. R. Hogge, Jr., "A Self Correcting Clock Recovery Circuit", IEEE Trans.
Electron Devices ED-32 (1985) pp. 2704-2706.
25. J. D. H. Alexander, "Clock Recovery from Random Binary Signals", Electronics
Letters 11 (1975) pp. 541-542.
26. M. Rau, T. Oberst, R. Lares, A. Rothermel, R. Schweer, and N. Menoux,
"Clock/Data Recovery PLL Using Half-Frequency Clock", IEEE J. Solid-State
Circuits 32 (1997) pp. 1156-1159.
27. N. K. Verghese, T. J. Schmerbeck, and D. J. Allstot, Simulation Techniques and
Solutions for Mixed-Signal Coupling in Integrated Circuits, Kluwer Academic
Publishers, 1995.
International Journal of High Speed Electronics and Systems, Vol. 11, No. 1 (2001) 35-76
© World Scientific Publishing Company

HIGH-PERFORMANCE Si AND SiGe
BIPOLAR TECHNOLOGIES AND CIRCUITS

MARTIN WURZER
Corporate Research, Infineon Technologies AG, Otto-Hahn-Ring 6
D-81730 Munich, Germany
also with Institut für Nachrichtentechnik und Hochfrequenztechnik,
Technische Universität Wien, A-1040 Vienna, Austria

THOMAS F. MEISTER, JOSEF BÖCK, HERBERT SCHÄFER,
KLAUS AUFINGER, SABINE BOGUTH
Wireless Products, Infineon Technologies AG, Otto-Hahn-Ring 6
D-81730 Munich, Germany

HERBERT KNAPP, MIRJANA REST, RENATE SCHREITER, LUDWIG TREITINGER
Corporate Research, Infineon Technologies AG, Otto-Hahn-Ring 6
D-81730 Munich, Germany

In this paper we present Si and SiGe bipolar technologies and circuits suited for present
and future high-performance communication systems. The silicon bipolar technology
described has an implanted base and, without any increase in process complexity compared
with current production technologies, achieves transit frequencies of 52 GHz and maximum
oscillation frequencies of 65 GHz. The transistors of the described epitaxial
SiGe-base technologies exhibit transit frequencies of 81 GHz and maximum oscillation
frequencies of 95 GHz. Measurement results of circuits realized in these technologies for
low-power and high-speed applications are presented: a 43 GHz low-power dynamic
frequency divider, a 23 GHz monolithically integrated oscillator, and a 40 Gb/s clock and
data recovery (CDR) circuit realized in the pure silicon bipolar technology, and a 53 GHz
static frequency divider, a 79 GHz dynamic frequency divider and a 20 GHz / 27 mW
dual-modulus prescaler in the SiGe technology.

1. Introduction
The wish to connect everyone at any time and any place by speech and data
transmission requires the possibility of global communications. The necessary
infrastructure for global communications is broad, and includes technologies and
circuits related to all types of networks, both local and wide area, wired and wireless,
for high-speed point-to-point and point-to-multipoint communications. Important
elements in these networks are high-speed ICs at reasonable cost. The strong
economic power of this market is the driving force for the steady improvement of
the high-frequency performance of RF technologies. Attractive candidates are Si
and SiGe bipolar technologies. Self-aligned double-polysilicon transistors with
implanted base and LOCOS isolation are used in most of today's production
technologies. Recent results have proven that there is still a high performance potential for
this well-established concept (e.g. 1,2). The performance improvement is achieved
by lateral scaling of the devices and by vertical scaling of the base dopant profile.
The main advantage of this technology is the availability of all the process modules
in existing production environments, which reduces cost. For this reason it is
important to investigate the performance limits.
In recent years a lot of effort has also been spent on the development of epitaxial
SiGe base technologies (e.g. 3,4,5,6,7). Additionally, advanced isolation
techniques like shallow and deep trench or SOI substrates are often used (e.g. 8,9,10,11).
Furthermore, sophisticated process tools like self-aligned metal base electrodes can
be applied to increase the performance. Using these techniques, impressive record
device parameters and circuit performance have been achieved, e.g. a maximum
oscillation frequency of 163 GHz 12, ring oscillator gate delays of 5.5 ps and a 67 GHz
static frequency divider 12,13. However, several of these process tools are usually
not available in existing production environments and their use leads to a
significant increase in process complexity. This performance improvement is therefore
accompanied by an increase in manufacturing costs.
For this reason it is interesting to evaluate which technology gives the best
performance/cost ratio by realizing circuits in Si and SiGe for the targeted
application.
This work reports on the device fabrication, transistor performance and circuit
results of an advanced implanted base silicon bipolar technology, and an advanced
SiGe bipolar technology which are both fully compatible with standard CMOS
production environments, except for the epitaxial growth of the SiGe base.

2. High-Performance Si Bipolar Technology and Circuits

2.1. Technology Description

2.1.1. Device Fabrication


The fabrication of the devices is based on a self-aligned double-polysilicon bipolar
technology presented in 14, which has been qualified for production of the SIEGET
45 microwave transistors 15. The transistor performance has been further improved
by carefully scaling the lateral dimensions of the devices.
The fabrication starts with the implantation of a highly As-doped buried layer.
The maximum doping concentration of about 5 × 10¹⁹ cm⁻³ is just below the critical
limit before the onset of As segregation, which would lead to transient enhanced
diffusion during base formation. The resulting sheet resistance of the buried layer
is only 14 Ω/□, which enables a low collector resistance. A 0.9 µm thick epitaxial
layer is grown. The isolation consists of a pn junction formed by boron implantation
and LOCOS oxide. The isolation is optimized for low capacitances (i.e. low power
consumption and high operation frequencies) and not for high packing density.
Using a relatively large transistor-transistor pitch of 4 µm, reasonable values for
the collector-substrate capacitance are achieved without using any expensive deep
or shallow trench isolation. We believe that the slightly smaller packing density
in comparison to trench-isolated technologies is not a real problem for typical RF
applications, because the area consumption of such circuits is usually not determined
by the active devices but by the pad configuration and/or passives like capacitors
or spiral inductors. The LOCOS oxide is 600 nm thick, which gives small values for
the parasitic base-collector capacitance.
After the implantation of a collector sinker a 250 nm thick polysilicon layer
for the base contact and a 250 nm thick CVD oxide are deposited. This stack is
patterned using 0.4 µm lithography to define the emitter regions of the devices.
The minimum overlaps used in the whole fabrication process are 0.2 µm. Thus, the
requirements for lithography are relaxed and the technology can be manufactured
in standard CMOS production lines.
A key feature of this technology is the base doping technique which leads to
very steep base profiles without the use of epitaxial base deposition. A low-energy
(5 keV BF₂) implantation is used as dopant source. However, the final shape of the
base doping profile is determined not only by the implantation conditions but also by
subsequent diffusion using RTP. A first RTP step is used to anneal the implantation
damage and to minimize transient enhanced diffusion. Then a relatively high thermal
budget can be used in a second RTP step to optimize the base doping profile.
During this treatment the excess boron concentration at the surface is diffused out
and a plateau in the base profile with a maximum boron concentration of about
5 × 10¹⁸ cm⁻³ and a steep slope towards the collector side of the devices are
obtained (Figure 1). A selectively implanted collector with a doping level of about
2 × 10¹⁷ cm⁻³ is used to increase the transit frequency and the collector current
density of the transistors. An L-shaped spacer is formed for emitter-base isolation.
The spacer width is carefully adjusted to minimize the base link resistance while
maintaining low emitter-base leakage. The final effective emitter width is 0.2 µm.
To prevent a lack of emitter dopant in small transistors, which could lead to
narrow-emitter effects like a reduced transit frequency 16, an in-situ doped
emitter-polysilicon layer is used for emitter doping 17. The final doping profile of the devices
is shown in Figure 2. The emitter-base doping technique is highly reproducible from
wafer to wafer and homogeneous within a wafer. This is confirmed by wafer maps
of the intrinsic base sheet resistance (Figure 3). At a base width of only 50 nm a
mean value for the base sheet resistance of 12.3 kΩ/□ at VBE = 0 V with a standard
deviation across the wafer of only 5.1% is achieved. This demonstrates that
high-quality (i.e. thin, steep and homogeneous) base doping profiles can be fabricated
using conventional ion implantation. Figure 4 shows an SEM cross-section of a
fabricated transistor with an effective emitter width of 0.2 µm.

[Figure: boron concentration C (cm⁻³) vs. depth (nm); curves: as implanted, after anneal of implantation damage, before emitter drive-in.]

Fig. 1. Boron profile after various steps of base formation (SIMS).

[Figure: dopant concentration C (cm⁻³) vs. depth (µm).]

Fig. 2. Final doping profile of the transistors (SIMS).

The fabrication is completed by a four-level aluminum metallization which uses
CMP for planarization, and contact holes and vias filled with tungsten. This is
helpful for the realization of complex circuits and on-chip inductors.

RBi [kΩ/□]

13.4 12.2 12.9
13.8 12.8 12.6 12.0 12.3 12.6 12.5
13.8 13.0 10.8 11.9 12.1 12.0 (row also contains a test field)
13.5 13.0 12.6 11.6 11.4 12.1 12.4 12.3 12.5
13.3 12.9 12.8 12.0 11.8 12.0 12.1 12.0 12.0
13.2 13.0 13.2 12.4 12.1 12.5 12.5 12.0 12.0
12.0 12.3 [test field] 11.0 11.5 11.8 11.8 11.7 11.8
12.2 12.2 11.5 11.6 11.8 11.8 11.9
11.9 11.4 11.6 12.3 11.8

Fig. 3. Wafer map of the intrinsic base sheet resistance RBi; mean value: 12.3 kΩ/□, standard
deviation: 5.1 %.
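
The statistics quoted in the caption of Fig. 3 can be cross-checked from the map itself. The Python sketch below assumes that the 63 values were transcribed correctly from the figure and that the quoted standard deviation is the population standard deviation relative to the mean. Both are assumptions, since the paper does not state how the statistics were computed.

```python
from statistics import mean, pstdev

# Intrinsic base sheet resistance in kOhm/sq, transcribed row by row from Fig. 3.
# Test fields carry no value and are omitted. The transcription is an assumption.
R_BI = [
    [13.4, 12.2, 12.9],
    [13.8, 12.8, 12.6, 12.0, 12.3, 12.6, 12.5],
    [13.8, 13.0, 10.8, 11.9, 12.1, 12.0],
    [13.5, 13.0, 12.6, 11.6, 11.4, 12.1, 12.4, 12.3, 12.5],
    [13.3, 12.9, 12.8, 12.0, 11.8, 12.0, 12.1, 12.0, 12.0],
    [13.2, 13.0, 13.2, 12.4, 12.1, 12.5, 12.5, 12.0, 12.0],
    [12.0, 12.3, 11.0, 11.5, 11.8, 11.8, 11.7, 11.8],
    [12.2, 12.2, 11.5, 11.6, 11.8, 11.8, 11.9],
    [11.9, 11.4, 11.6, 12.3, 11.8],
]

values = [v for row in R_BI for v in row]
avg = mean(values)
rel_sigma_pct = 100.0 * pstdev(values) / avg  # population sigma relative to the mean
print(f"n = {len(values)}, mean = {avg:.2f} kOhm/sq, sigma = {rel_sigma_pct:.2f} %")
```

With the values as transcribed, the result agrees with the quoted 12.3 kΩ/□ and 5.1 % to within rounding.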

Fig. 4. SEM cross-section of a transistor with 0.2 µm effective emitter width.


In general the technology is very similar to current production technologies used
by several companies, but offers higher performance. Only standard production
tools are used and no change of the transistor concept is necessary. Therefore, this
technology is highly manufacturable with high yield and low cost.

2.1.2. Transistor Performance


Typical Gummel characteristics of transistors with an emitter area of 0.2 × 2.8 µm²
are shown in Figure 5. Ideal behaviour down to base currents of about 1 nA is
observed. The current gain β is nearly constant over four decades of current at a
value of 140. Figure 6 depicts the output characteristics of the devices. The Early
voltage is 23 V at an intrinsic base sheet resistance of 12 kΩ/□. This demonstrates
that the small base width has not been realized at the expense of an unacceptably
small base Gummel number, which would lead to a high intrinsic base resistance
and poor linearity.

Fig. 5. Typical Gummel characteristics.

[Figure: collector current IC (µA) vs. collector-emitter voltage VCE (V).]

Fig. 6. Output characteristics.

[Figure: fT (GHz) vs. IC (mA) for AE = 0.2 × 2.8 µm² at VBC = 0 V and VBC = -1 V; peak value 52 GHz.]

Fig. 7. Transit frequency fT vs. collector current IC.

On-wafer S-parameter measurements are used to evaluate the high-frequency
performance of the devices. To eliminate the influence of pads and metal lines, OPEN
and SHORT de-embedding is used for the characterization of small devices 18. The
measured transit frequency fT of the transistors is shown in Figure 7. At a
base-collector voltage VBC of 0 V a transit frequency of 50 GHz is obtained; at VBC = -1 V
the transit frequency is 52 GHz. The maximum of the transit frequency occurs at
a collector current density of about 2 mA/µm². In Figure 8 the maximum transit
frequency is shown as a function of the emitter width wE. The transit frequency is
independent of the transistor geometry and no reduction of fT for small transistor
widths is observed. This is due to the use of in-situ doped emitter polysilicon, which
enables the same emitter doping profile for all transistor sizes down to the minimum
width of 0.2 µm. The maximum oscillation frequency fmax has been determined
by extrapolating the maximum available gain (MAG) with a slope of -20 dB per
decade of frequency. The maximum oscillation frequency is 56 GHz at VBC = 0 V
and 65 GHz at VBC = -2 V (Figure 9). In Figure 10 the measured gains
are shown as functions of frequency. Power gains of 25, 21, and 18 dB are achieved
at frequencies of 3, 6, and 10 GHz, respectively.
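
The -20 dB per decade extrapolation used to determine fmax amounts to drawing a straight line on a gain (dB) versus log-frequency plot and reading off where it crosses 0 dB. A minimal Python sketch follows. The function name and the example point (10 dB at 20 GHz) are hypothetical and are not measured values from this work. The sketch also assumes that the chosen point already lies in the region where the gain rolls off at -20 dB per decade.

```python
def extrapolate_unity_gain_ghz(freq_ghz: float, gain_db: float,
                               slope_db_per_decade: float = -20.0) -> float:
    """Frequency at which a straight line through (freq, gain) reaches 0 dB.

    The line is drawn on a gain-in-dB versus log10(frequency) plot.
    """
    if slope_db_per_decade >= 0:
        raise ValueError("slope must be negative for a roll-off")
    decades_to_unity = gain_db / -slope_db_per_decade
    return freq_ghz * 10.0 ** decades_to_unity


# Hypothetical example point: 10 dB of MAG measured at 20 GHz.
f_max = extrapolate_unity_gain_ghz(20.0, 10.0)
print(f"extrapolated fmax = {f_max:.1f} GHz")
```

The same function applies to the current gain when extrapolating fT, as long as the measurement point lies on the -20 dB per decade part of the curve.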

[Figure: fT (GHz) vs. emitter width wE (µm) at VBC = -1 V; values range from 52 GHz (min.) to 54 GHz (max.).]

Fig. 8. Maximum transit frequency fT,max vs. emitter width wE for constant emitter length
lE = 2.8 µm.

[Figure: fmax (GHz) vs. IC (mA) for AE = 0.2 × 2.8 µm² at VBC = -2 V; peak value 65 GHz.]

Fig. 9. Maximum oscillation frequency fmax vs. collector current IC.



[Figure: gain (dB), MSG/MAG, vs. frequency (GHz) for A = 0.2 × 10 µm² at a bias of -2 V; marked frequencies: 68 GHz and 49 GHz.]

Fig. 10. Gain vs. frequency characteristics.

Figure 11 depicts the minimum noise figure Fmin of microwave transistors with
six emitter stripes as a function of collector current for three different frequencies.
At 3, 6, and 10 GHz the minimum noise figures are 0.7, 1.3, and 1.7 dB, respectively
and the associated gains are 17, 14, and 11 dB. These values demonstrate that these
transistors are well suited for analog applications up to at least 6 GHz.
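
For readers who prefer linear quantities, the minimum noise figures above can be converted into noise factors and equivalent input noise temperatures. The sketch below uses the standard definitions F = 10^(NF/10) and Te = T0 (F - 1) with the conventional reference temperature T0 = 290 K. These conversions are textbook relations and are not taken from the paper.

```python
T0_KELVIN = 290.0  # conventional reference temperature for noise figure


def noise_factor(nf_db: float) -> float:
    """Linear noise factor F from a noise figure in dB."""
    return 10.0 ** (nf_db / 10.0)


def noise_temperature_k(nf_db: float, t0: float = T0_KELVIN) -> float:
    """Equivalent input noise temperature Te = T0 * (F - 1)."""
    return t0 * (noise_factor(nf_db) - 1.0)


# Minimum noise figures quoted in the text at 3, 6, and 10 GHz.
for f_ghz, nf_db in [(3, 0.7), (6, 1.3), (10, 1.7)]:
    print(f"{f_ghz:>2} GHz: F = {noise_factor(nf_db):.3f}, "
          f"Te = {noise_temperature_k(nf_db):.0f} K")
```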

[Figure: Fmin (dB) vs. IC (mA) for A = 6 × (0.2 × 20 µm²) at VBC = -1 V.]

Fig. 11. Minimum noise figure Fmin of microwave transistors vs. collector current IC for different
frequencies.

Table 1 summarizes the most important transistor parameters. The quality of
the base doping technique is reflected in the high transit frequency of 52 GHz at
reasonable values for the intrinsic base sheet resistance of 12 kΩ/□ and the Early
voltage of 23 V. The capacitances (measured at zero bias voltage) are small in
view of the simple isolation technique. Further improvements for implanted base
technologies should be feasible if advanced isolation schemes like shallow and deep
trench are used. This should result in even higher operation frequencies as well as
lower power consumption. The breakdown voltages (measured at I = 10 µA) indicate
that the collector doping level has not been chosen too high and that the width of
the spacers is appropriate for emitter-base isolation.
In general, the data demonstrate that this technology has not been developed
to achieve single record transistor parameters, but to find a balanced compromise
between all parameters. This enables the realization of high-performance circuits
for analog and digital applications.
Table 1. Important transistor parameters.

AE        0.2 × 2.8 µm²
β         140
RBi       12 kΩ/□
VEarly    23 V
BVEBO     2.8 V
BVCBO     11.5 V
BVCEO     2.7 V
CEB       8.8 fF
CBC       6.4 fF
CCS       15.2 fF
fT        52 GHz
fmax      65 GHz
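
Two derived quantities help to put the values of Table 1 in perspective. The Python sketch below computes the total emitter-to-collector delay from tau = 1/(2 pi fT) and the base resistance implied by the first-order relation fmax ≈ sqrt(fT / (8 pi RB CBC)). Both relations are textbook approximations and are not taken from the paper. Using the total zero-bias CBC of Table 1 in the fmax relation is a further assumption, so the resulting RB is only an order-of-magnitude estimate.

```python
import math

F_T_HZ = 52e9      # transit frequency from Table 1
F_MAX_HZ = 65e9    # maximum oscillation frequency from Table 1
C_BC_F = 6.4e-15   # zero-bias base-collector capacitance from Table 1


def transit_delay_s(f_t_hz: float) -> float:
    """Total emitter-to-collector delay, tau = 1 / (2 * pi * fT)."""
    return 1.0 / (2.0 * math.pi * f_t_hz)


def implied_base_resistance_ohm(f_t_hz: float, f_max_hz: float, c_bc_f: float) -> float:
    """RB from fmax ~ sqrt(fT / (8 * pi * RB * CBC)); a first-order estimate only."""
    return f_t_hz / (8.0 * math.pi * c_bc_f * f_max_hz ** 2)


tau_ps = transit_delay_s(F_T_HZ) * 1e12
r_b = implied_base_resistance_ohm(F_T_HZ, F_MAX_HZ, C_BC_F)
print(f"tau = {tau_ps:.2f} ps, implied RB = {r_b:.0f} ohm")
```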

2.2. Circuit Results

2.2.1. Ring Oscillators


ECL ring oscillators have been fabricated to evaluate the digital high-speed switch-
ing potential of the technology. The circuits have 75 stages consisting of differential
ECL gates. The ring oscillators are driven with a differential voltage swing of
400 mV and a switching current of 1.0 mA per gate. The minimum measured ECL
gate delay is 11.5 ps. In Figure 12 a wafer map of the minimum gate delay is shown.
A mean value of 12.0 ps is obtained with a standard deviation across the wafer of
only 3.2%.

tD [ps]

11.7 11.7 11.7
11.6 11.5 11.7 11.7 11.7 11.7 11.6
11.5 11.7 [test field] 11.9 11.9 11.9 11.9
11.9 11.7 11.8 12.0 12.0 12.1 12.2 12.2 12.4
11.8 11.7 11.8 11.9 12.1 12.3 12.3 12.3 12.6
11.7 11.8 12.0 12.1 12.2 12.3 12.5 12.9 12.9
11.7 11.8 [test field] 12.2 12.2 12.4 12.6 12.9 12.9
11.6 11.9 12.1 12.1 12.2 12.6 12.9
11.6 11.7 12.2 11.9 12.2

Fig. 12. Wafer map of the ring oscillator gate delay time tD; mean value: 12.0 ps, standard
deviation: 3.2%.

All of the tested ring oscillators with 300 transistors each were functional (the
two test fields are empty chips which are used for thickness measurements and
SIMS analysis). This demonstrates that high yield for typical RF circuits is to be
expected. The small standard deviation of the gate delay indicates that not only the
base resistance but also all other important transistor characteristics like the transit
time or the capacitances have excellent homogeneity over the wafer. The measured
gate delay characteristics represent the state of the art for non-epitaxial base bipolar
technologies. The low-power capability of the technology has been evaluated by
fabricating CML ring oscillators with an emitter area of only 0.2 × 0.3 µm². The
circuits have 75 stages and a differential voltage swing of 400 mV. The circuits are
driven at low current densities at which the gate delay is mainly determined by the
capacitances of the transistors. At a supply voltage of 1.8 V and a current per gate
of 26 µA a gate delay of 111 ps is measured. This results in a very low value for the
power-delay product of only 5.2 fJ.
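
The power-delay product quoted above follows directly from the supply voltage, the current per gate, and the gate delay. The Python sketch below reproduces that arithmetic. It also evaluates the oscillation frequency of an N-stage ring oscillator using the common relation f = 1/(2 N tD). That relation is an assumption here, since the paper does not state how the gate delay was extracted from the measured frequency.

```python
def power_delay_product_fj(supply_v: float, current_a: float, delay_s: float) -> float:
    """Power-delay product per gate in femtojoules."""
    return supply_v * current_a * delay_s * 1e15


def ring_oscillator_frequency_mhz(stages: int, gate_delay_s: float) -> float:
    """Oscillation frequency f = 1 / (2 * N * tD) of an N-stage ring oscillator."""
    return 1.0 / (2.0 * stages * gate_delay_s) / 1e6


pdp_fj = power_delay_product_fj(1.8, 26e-6, 111e-12)   # CML low-power oscillator
f_ecl_mhz = ring_oscillator_frequency_mhz(75, 11.5e-12)  # ECL oscillator, minimum delay
print(f"power-delay product = {pdp_fj:.2f} fJ")
print(f"75-stage ECL ring at 11.5 ps per gate: about {f_ecl_mhz:.0f} MHz")
```

The first result rounds to the 5.2 fJ given in the text.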

2.2.2. Low-Power Dynamic Frequency Divider

High-speed frequency dividers are critical building blocks in a variety of
applications ranging from clock generators to microwave receivers. The maximum
operating frequency for these systems is often limited by the speed of the frequency
divider. Conventional static frequency dividers use master-slave flip-flops to achieve
frequency division (see Section 3.2.2). These circuits operate over a wide frequency
range with their lower frequency limit determined by the slew rate of the input
signal. The upper frequency limit is caused by the gate delay of the master and slave
latches in the flip-flop. Dynamic frequency dividers can operate at much higher
input frequencies than static dividers. However, they have a limited operating
frequency range with a lower limit typically at one half to one third of the maximum
frequency. Dynamic frequency dividers not only have higher maximum frequencies
than static dividers but, for a given input frequency, they also consume less power than
static dividers.

[Figure: input, dynamic divide-by-2 stage, buffer, static divide-by-16 stage, buffer, output.]

Fig. 13. Dynamic frequency divider block diagram.

We have designed a low-power dynamic divider operating from frequencies below
20 GHz to over 40 GHz. The circuit has a divide ratio of 32. It consists of a dynamic
divide-by-two input stage followed by a static divide-by-16 stage (Figure 13). It would be
possible to implement a multi-stage divider consisting exclusively of dynamic divider
stages 19. However, the operating frequency range of each of the successive stages
would have to correspond exactly to the output frequency range of the previous
stage in order not to decrease the overall operating frequency range of the circuit.
Since the output frequency of the first divider stage is low enough to be handled
by a low-power flip-flop, we chose a four-bit asynchronous static divider as the second
part of the circuit.

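[Figure: the input (f1) drives a mixer whose output (f1 ± f2) passes through a low-pass filter and an amplifier; the amplified signal (f1 - f2) is the output and is fed back to the mixer as f2.]

Fig. 14. Regenerative divider principle.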

The dynamic divider in the input stage uses regenerative frequency division.
Figure 14 shows the operating principle of this divider type. It consists of a mixer,
a low-pass filter, and an amplifier 20. The input signal with a frequency f1 is applied
to one port of the mixer. Assuming an ideal mixer and a local oscillator signal with
a frequency f2, only the sum and difference frequencies f1 ± f2 will appear at the
mixer output. The low-pass filter suppresses the frequency f1 + f2. The signal with
the frequency f1 - f2 is amplified and serves as output signal of the divider as well
as local oscillator signal for the mixer. To obtain stable operation f1 - f2 has to
be equal to f2. This leads to the desired frequency division of f2 = f1/2. The
maximum operating frequency of the divider is determined by the loop gain, which
has to be higher than unity for divider operation. The lower limit is reached when
the low-pass filter no longer suppresses the signal at f1 + f2.
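
The frequency bookkeeping of the regenerative divider can be checked numerically. The Python sketch below models the mixer as an ideal multiplier of two unit-amplitude cosines, which is an idealization and not the transistor-level circuit, and confirms that its output contains only the difference and sum frequencies. It then solves the locking condition f1 - f2 = f2. The input frequency of 40 GHz is an illustrative choice.

```python
import math


def mixer_output(f1: float, f2: float, t: float) -> float:
    """Ideal multiplying mixer: product of two unit-amplitude cosines."""
    return math.cos(2 * math.pi * f1 * t) * math.cos(2 * math.pi * f2 * t)


def sum_and_difference(f1: float, f2: float, t: float) -> float:
    """The same signal written as difference- and sum-frequency components."""
    return 0.5 * (math.cos(2 * math.pi * (f1 - f2) * t)
                  + math.cos(2 * math.pi * (f1 + f2) * t))


def locked_output_frequency(f1: float) -> float:
    """Solve the stability condition f1 - f2 = f2 for the fed-back frequency f2."""
    return f1 / 2.0


f1 = 40.0                         # GHz, illustrative input frequency
f2 = locked_output_frequency(f1)  # GHz, divider output and local oscillator
times_ns = [k * 0.001 for k in range(200)]  # frequencies in GHz pair with times in ns
max_err = max(abs(mixer_output(f1, f2, t) - sum_and_difference(f1, f2, t))
              for t in times_ns)
print(f"f2 = {f2} GHz, difference = {f1 - f2} GHz, sum = {f1 + f2} GHz")
print(f"largest deviation from the product-to-sum identity: {max_err:.1e}")
```

In the locked state the difference component sits at f2 and the sum component at 3 f2, which is the component the low-pass characteristic has to suppress.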

Fig. 15. Circuit diagram of the low-power regenerative divider.

Figure 15 shows the implementation of the dynamic divider. An active
double-balanced mixer is used because it provides suppression of the input signals.
Furthermore, its conversion gain makes it possible to omit the amplifier shown in Figure 14. Since
the conversion gain of the mixer drops at higher frequencies, no additional low-pass
filter is required.

Fig. 16. Low-power dynamic divider chip photograph (size: 550 × 450 µm²).

The amplitude of the output signal of the regenerative divider varies with varying
input signals. For this reason the regenerative divider stage is followed by a limiting
amplifier which provides a constant input signal for the four-stage asynchronous
divider. Each of these four stages consists of a master-slave flip-flop with feedback
from the inverted output to the data input. Figure 16 shows the chip photograph
of the dynamic divider circuit. The chip measures 550 × 450 µm².
The low-power dynamic frequency divider operates with supply voltages from
3.6 V to 5 V. With a 3.6 V supply the circuit draws 58 mA and operates up to
36 GHz. With a 5 V supply a maximum operating frequency of 43 GHz is achieved.
This compares well with the state of the art 21. The supply current in this case is
71 mA. The input sensitivity of the circuit is shown in Figure 17. It was measured on
wafer with a single-ended input signal. The output voltage swing is 2 × 200 mVpp.
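
The two operating points given above can be turned into power consumption and output frequency with a few lines of arithmetic. The Python sketch below does this. The class and field names are illustrative, and the power figures are simply supply voltage times supply current; they are not quoted in the paper.

```python
from dataclasses import dataclass

DIVIDE_RATIO = 2 * 16  # dynamic divide-by-two followed by a static divide-by-16


@dataclass
class OperatingPoint:
    supply_v: float       # magnitude of the supply voltage in volts
    current_ma: float     # supply current in milliamperes
    f_in_max_ghz: float   # maximum input frequency in gigahertz

    @property
    def power_mw(self) -> float:
        """DC power consumption, volts times milliamperes."""
        return self.supply_v * self.current_ma

    @property
    def f_out_ghz(self) -> float:
        """Output frequency at the maximum input frequency."""
        return self.f_in_max_ghz / DIVIDE_RATIO


points = [OperatingPoint(3.6, 58.0, 36.0), OperatingPoint(5.0, 71.0, 43.0)]
for p in points:
    print(f"{p.supply_v} V: {p.power_mw:.0f} mW, "
          f"{p.f_in_max_ghz:.0f} GHz in -> {p.f_out_ghz:.3f} GHz out")
```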

[Figure: input power Pin (dBm) vs. input frequency f (GHz); fin,max = 36 GHz at VEE = -3.6 V and fin,max = 43 GHz at VEE = -5.0 V.]

Fig. 17. Input sensitivity of the low-power dynamic frequency divider.

2.2.3. Monolithically Integrated Oscillator


During recent years a large number of monolithically integrated oscillators have
been published. In most cases these circuits use on-chip spiral inductors in their
resonant circuits. The majority of the monolithic oscillators in silicon bipolar or
CMOS technologies published so far address the mobile communications market at
frequencies around 2 GHz 22,23,24. Monolithic oscillators operating at frequencies
higher than 10 GHz have relied on III-V semiconductors 25 or SiGe heterojunction
bipolar transistors 26. Oscillators operating at 10 GHz or 20 GHz are attractive for
optical communications systems, e.g. for clock and data recovery circuits. In this
section we present the first monolithically integrated oscillator for the frequency
range of 19.5 GHz to 23 GHz. The circuit uses a cross-coupled differential amplifier
as its active element 27. Two identical resonant circuits act as load for the differential
amplifier. They use on-chip spiral inductors with an inductance of 300 pH and a
quality factor Q of six. The spiral inductors can be seen in the chip photograph
(Figure 18). The circuit does not use varactors and is tuned by varying the operating
current of the oscillator core. An additional output buffer provides isolation between
the resonator and the off-chip load.
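
To get a feeling for the resonator values, the total capacitance that resonates with the 300 pH inductor can be estimated from f = 1/(2 pi sqrt(L C)). The Python sketch below treats each resonant circuit as an ideal parallel LC tank, which is an assumption; the paper gives neither the tank capacitance nor the circuit-level details of the tuning. The equivalent parallel resistance Rp = Q ω L likewise assumes that the tank loss is dominated by the inductor.

```python
import math

L_HENRY = 300e-12  # spiral inductor value given in the text
Q_INDUCTOR = 6.0   # inductor quality factor given in the text


def tank_capacitance_ff(f_hz: float, l_h: float = L_HENRY) -> float:
    """Capacitance in fF that resonates with l_h at f_hz (ideal LC tank)."""
    omega = 2.0 * math.pi * f_hz
    return 1.0 / (omega ** 2 * l_h) * 1e15


def parallel_resistance_ohm(f_hz: float, l_h: float = L_HENRY,
                            q: float = Q_INDUCTOR) -> float:
    """Equivalent parallel tank resistance Rp = Q * omega * L."""
    return q * 2.0 * math.pi * f_hz * l_h


c_low_ff = tank_capacitance_ff(19.5e9)   # lower end of the tuning range
c_high_ff = tank_capacitance_ff(23.0e9)  # upper end of the tuning range
r_p = parallel_resistance_ohm(23.0e9)
print(f"C = {c_low_ff:.0f} fF at 19.5 GHz, {c_high_ff:.0f} fF at 23 GHz")
print(f"Rp = {r_p:.0f} ohm at 23 GHz")
```

Under these assumptions, tuning from 23 GHz down to 19.5 GHz corresponds to an increase of the effective tank capacitance by roughly 60 fF.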

Fig. 18. Monolithic oscillator chip photograph (size: 560 × 330 µm²).

The oscillator operates with supply voltages ranging from 3.3 V to 5 V. It can
be tuned over a frequency range of 19.5 GHz to 23 GHz. Figure 19 shows the
output spectrum at 23 GHz. The output power of about — 12dBm is determined
by the operating current of the output buffer. The phase noise of the oscillator is
—91 dBc/Hz at a frequency offset of 1 MHz.

[Figure 19 spectrum analyzer display: center frequency 23.0000 GHz, span 100.0 MHz,
reference level 0 dBm, 10 dB/div, sweep time 250 ms]

Fig. 19. Output spectrum of the monolithic oscillator.



2.2.4. Clock and Data Recovery


Figure 20 shows the block diagram of a 40 Gb/s electrical time-division multiplexing
(ETDM) fiber-optic link. The time-division multiplexer combines several data
channels into a single high-speed data stream. The E/O block converts the data
from electrical to optical signals by modulating the light of a semiconductor laser
diode with an external modulator. On the receiving side the O/E conversion is
performed by a photodiode followed by a transimpedance amplifier. The resulting bit
stream is fed into the clock and data recovery unit, whose task is to synchronize the
local oscillator to the phase of the incoming data and to retime the data. In contrast
to 10 Gb/s systems, the decision function is now performed by a demultiplexer. This
requires a DMUX with excellent retiming capability combined with a high input
sensitivity 28,29.

[Figure 20 block diagram. Transmitter: input data channels, multiplexer with clock,
modulator driver, optical fiber. Receiver: transimpedance amplifier, DMUX & clock
recovery, output data channels.]

Fig. 20. Block diagram of a fiber-optic link.

It has been shown that basic digital functions like MUX and DMUX for 40 Gb/s
optical-fiber TDM systems can be realized in silicon bipolar technology 28. Clock
and data recovery circuits in a silicon technology, however, have so far only been
demonstrated for 20 Gb/s 30. With more sophisticated SiGe or III-V technologies
40 Gb/s has been achieved 31,32,33,34. Some of these solutions are hybrid. All these
realizations are based either on high-Q filters or on PLLs. The advantage of the
first concept is its easy implementation. Its disadvantages are that the temperature
and frequency variation of the filter group delay makes the sampling time difficult
to control, that the high-Q filter is difficult to integrate, and that narrow pulses
require a high fT. The major advantages of the second approach are that the phase
between the extracted clock and the received data is locked, and that it can be
implemented as a monolithic integrated circuit. In the following we describe a clock
and data recovery circuit for 40 Gb/s in this production-near silicon bipolar
technology 35.
Figure 21a) shows the concept of the CDR used in the fiber-optic link in more
detail. The main processing blocks are the demultiplexer, consisting of two
master-slave D-flip-flops (DFF1, DFF2) in parallel, and an additional master-slave
D-flip-flop (DFF3), which forms the phase detector together with DFF2 and the XOR
gate. All these functions are integrated on a single chip. The fixed 90° phase shifter,
voltage-controlled oscillator (VCO), and loop filter have been realized externally
with commercially available components.

[Figure 21: a) block diagram with data input Din, signals D2, D3, C2, C3, XORout and
CLK; b) timing diagram for the locked condition, clock early, and clock late]

Fig. 21. CDR circuit: a) block diagram and b) timing diagram.

Figure 21b) shows the timing diagram. The incoming 40 Gb/s data signal is
applied to flip-flops DFF1, DFF2 and DFF3. DFF1 is clocked by the inverted clock,
DFF2 by CLK, and DFF3 by the 90° delayed clock signal. As a result the input is
sampled near the middle of a bit and at each following potential transition. If a
transition is present, the phase relationship between data and clock can be deduced
to be early or late: if the mid-bit clock CLK is too early, DFF3 samples the same bit
as DFF2; if it is too late, DFF3 samples the following bit. Under locked conditions
DFF3 samples at the edge of the data eye. The XOR gate compares the output samples
of DFF2 and DFF3, and the result is fed to the loop filter. The output signal of the
loop filter serves as the control signal of the VCO. The advantages of this concept
are that all components operate at half the data rate and that the input is
demultiplexed at the same time.
The disadvantage is that the input signal has to drive three DFFs in parallel.
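The early/late decision described above can be illustrated with a small behavioral model. This is a minimal sketch under idealized assumptions, not the authors' circuit: the data is an ideal NRZ waveform with instantaneous transitions, time is measured in unit intervals (UI, one bit period), and the function names are hypothetical. Because the clock runs at half the data rate, a 90° clock delay corresponds to half a bit period.

```python
import math

def sample_nrz(bits, t_ui):
    """Ideal NRZ waveform: return the bit whose unit interval contains t_ui."""
    return bits[math.floor(t_ui)]

def xor_phase_detector(bits, clock_offset_ui):
    """Return the XOR of the DFF2 and DFF3 samples for every second bit.

    clock_offset_ui is the displacement of the DFF2 sampling instant from the
    ideal mid-bit position in UI (negative = clock early). The model is only
    meaningful for |clock_offset_ui| < 0.5.
    """
    outputs = []
    # The clock runs at half the data rate, so DFF2 samples every second bit.
    for n in range(0, len(bits) - 1, 2):
        t_mid = n + 0.5 + clock_offset_ui   # DFF2: nominally mid-bit
        t_edge = t_mid + 0.5                # DFF3: 90 deg of the half-rate clock later
        d2 = sample_nrz(bits, t_mid)
        d3 = sample_nrz(bits, t_edge)
        outputs.append(d2 ^ d3)
    return outputs

def mean_output(bits, clock_offset_ui):
    """Average XOR output, i.e. what an ideal loop filter would see."""
    out = xor_phase_detector(bits, clock_offset_ui)
    return sum(out) / len(out)

pattern = [0, 1] * 8                  # alternating data: a transition after every bit
print(mean_output(pattern, -0.25))    # clock early: DFF3 still sees the same bit
print(mean_output(pattern, +0.25))    # clock late: DFF3 already sees the next bit
```

In this idealized model the XOR output is always 0 when the clock is early and equals 1 at every data transition when the clock is late, so its average moves monotonically with the phase error. In the real circuit the loop settles where DFF3 samples on the data edge and the two outcomes occur equally often.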
The circuit is designed for a single supply voltage of -5 V. The circuit principles
used can be seen in the circuit diagram of a master-slave D-flip-flop (MS-DFF) shown
in Figure 22.

[Figure 22 circuit diagram: master and slave latches with data input, clock input and
data output]

Fig. 22. Circuit diagram of the master-slave D-flip-flop.

The well-proven emitter-emitter coupled logic (E²CL) is used, with emitter
followers at the inputs and current switches at the outputs 36. The series gating
between clock and data signals enables differential operation with low voltage
swings (ΔV ≈ 400 mVpp), resulting in higher speed and lower power
consumption. Furthermore, differential operation reduces time jitter and crosstalk
and offers good common-mode suppression compared to single-ended operation 36.
Cascaded emitter followers are used for level shifting and impedance transformation
between the various current switches. Multiple emitter followers improve the
decoupling capability and increase the collector-base voltage of the current-switch
transistors. This allows for smaller transistors and thus lower base-collector
capacitances 36. On-chip matching resistors (50 Ω) at all data inputs are used in
order to reduce jitter introduced by reflections and instabilities 28,37. The main aim
of this design was to reach the required speed rather than low power consumption. All
transistor sizes are individually optimized with respect to the function of the
transistor in the circuit. Special attention was given to the on-chip wiring. The
lines on the chip were classified as 'critical' or 'uncritical'. For example, the
lines driven by emitter followers are critical because they support ringing, while
the lines driven by current switches are uncritical 36. The critical lines are then
shortened at the cost of the uncritical ones. The longer signal lines are realized as
microstrip lines (with the lowest metallization layer as a ground plane), mainly to
improve simulation accuracy. This leads to the layout shown in Figure 23.
[Figure 23 chip micrograph with pads labeled D2, D3, C3 and XORout]

Fig. 23. Chip micrograph (chip size: 900 × 900 µm²).

For measurements the clock and data recovery IC has been mounted on a 15 mil
ceramic substrate (εr = 9.9) using conventional bonding techniques. Special care has
been taken to minimize the length of the bond wires by positioning the surface of
the chip on the same level as the signal, ground, and supply lines of the mounting
substrate. Due to differential operation a pair of lines is needed for each clock and
data signal to connect the chip with the environment. Therefore, a corresponding
number of connectors is necessary. The minimum distance between them determines
the minimum size of the test fixture. To avoid additional delay lines, the lines for
the signals C2 and C3 and for D1, D2 and D3 (see Figure 21), respectively, must have
the same length. To achieve a compact layout, these lines are realized as coupled
microstrip lines. At the input Din grounded coplanar lines are used, which show lower
dispersion than microstrip lines. The realized test fixture is shown in Figure 24. It
measures 70 × 70 mm².

Fig. 24. Photograph of the package (package size: 70 × 70 mm²).

Random pulse pattern generators for driving the circuit at the required data rate
of 40 Gb/s are not commercially available. A pattern generator has therefore been
built from basic high-speed ICs 28,38. Four 10 Gb/s pseudo-random bit sequences
(sequence length 2⁷ - 1) are multiplexed into a 40 Gb/s nonreturn-to-zero (NRZ) signal.
The clock and data recovery IC operates from a single supply voltage of -5 V and
consumes 1.6 W. No additional cooling was applied.
Figure 25 shows the 40 Gb/s input signal to the CDR circuit. In order to demonstrate
the input sensitivity of the circuit, the eye opening is artificially reduced. In
Figure 26 an eye diagram of the well regenerated and demultiplexed data signal
is shown. Figure 27 shows the 20 GHz transmitter clock (top) and the recovered
clock (bottom). The jitter histogram of the extracted clock in the time domain is
displayed in Figure 28. The measured rms time jitter as observed on the sampling
oscilloscope is about 0.8 ps.
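The construction of the test signal (four pseudo-random tributaries multiplexed into one serial stream) can be sketched as follows. The source only states the sequence length of 2⁷ - 1; the feedback taps of the shift register and the quarter-period offsets between the four tributaries are assumptions chosen for illustration, and all names are hypothetical.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a length 2**7 - 1 PRBS with a 7-bit Fibonacci LFSR.

    The feedback taps (stages 7 and 6) are an assumption; the source only
    states the sequence length.
    """
    if seed & 0x7F == 0:
        raise ValueError("seed must be non-zero")
    state = seed & 0x7F
    out = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # XOR of stages 7 and 6
        out.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return out

def mux_4to1(channels):
    """Bit-interleave four equal-length channels into one serial stream."""
    return [bit for group in zip(*channels) for bit in group]

PERIOD = 2 ** 7 - 1
base = prbs7(2 * PERIOD)
# Hypothetical choice: offset the four 10 Gb/s tributaries by a quarter period each.
channels = [base[k * (PERIOD // 4): k * (PERIOD // 4) + PERIOD] for k in range(4)]
serial = mux_4to1(channels)           # one period of the 40 Gb/s test pattern
print(len(serial), "bits per multiplexed pattern period")
```

The bit period at 40 Gb/s is 25 ps, so the measured rms jitter of 0.8 ps quoted above corresponds to about 3% of a unit interval.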
This completes the presentation of circuits useful as building blocks in high-speed
data transmission. The presented results are state-of-the-art for silicon bipolar
technologies.

[Figure 25 eye diagram; time scale marker: 20 ps]

Fig. 25. Eye diagram of the 40 Gb/s input data signal Din (eye opening is artificially reduced).

[Figure 26 eye diagram; time scale marker: 20 ps]

Fig. 26. Eye diagram of the 20 Gb/s data signal at the output D2 of the 1:2 demultiplexer.

[Figure 27 oscilloscope traces; scale markers: 150 mV (each trace), 20 ps]

Fig. 27. Transmitter clock (top) and recovered clock (bottom).

[Figure 28 histogram; annotation: 2σ = 1.6 ps; scale markers: 25 mV, 10 ps]

Fig. 28. Jitter histogram of the recovered clock.

3. High-Performance SiGe Technology and Circuits

3.1. Technology Description

3.1.1. SiGe Device Fabrication


A schematic cross-section of the SiGe npn transistor is shown in Figure 29.
The isolation, which uses a selectively grown collector region, has been described
in 7,39. Of major importance is the SiGe base, which has been integrated into
a double-polysilicon self-aligned emitter-base configuration by means of selective
epitaxial growth 7,39,40,41. This device structure provides small capacitances and
small parasitic series resistances and therefore complements the benefits of the
integrated SiGe base layer. Moreover, this device configuration exhibits a
quasi-self-aligned base-collector structure, which serves to reduce the
base-collector capacitance.

[Figure 29 schematic; labels: base, emitter and collector contacts, p+-polysilicon,
CVD oxide, LOCOS, buried layer, collector]

Fig. 29. Schematic cross-section of the fabricated SiGe HBTs.

The emitter-base process flow, which is closely related to that of the implanted-base
double-polysilicon transistor, is outlined in Figure 30.
For the definition of the emitter region a nitride/oxide/p+-polysilicon sandwich
is patterned on a 100 nm thick CVD oxide layer. By applying a pedestal collector
implantation the doping level of the active collector has been adjusted to about
2 × 10¹⁷ cm⁻³ in order to achieve high switching current levels in the fabricated
high-speed circuits. After the formation of thin nitride spacers the CVD oxide is wet
etched underneath the p+-polysilicon base electrodes in order to create self-aligned
p+-polysilicon overhangs of about 0.1 µm. The graded SiGe base and the
boron-doped silicon cap were grown in a single-wafer reactor which is equipped with
a silicon and a polysilicon chamber. The selective SiGe base deposition has been
obtained with a H2/SiH2Cl2/HCl/GeH4/B2H6 chemistry at a pressure of 6 Torr.
Figure 31 shows the SIMS dopant profile of the integrated SiGe base layer after the
epitaxial deposition process. At the collector side a boron spike with a maximum
boron concentration of 2 × 10¹⁹ cm⁻³ was grown in order to reduce the intrinsic base
sheet resistance. At the emitter side the base is more lightly doped, at
5 × 10¹⁸ cm⁻³, in order to obtain a small emitter-base capacitance and to maintain a
high emitter-base breakdown voltage. In this base profile the maximum germanium
content is 15%, graded linearly over 20 nm across the neutral base. After
base deposition the thin nitride spacers were stripped in phosphoric acid and device
processing was continued by forming a spacer of only 100 nm in thickness for the
final separation of the emitter from the extrinsic base. To prevent a reduction of
the cut-off frequency at narrow emitter widths, an in-situ arsenic-doped emitter
polysilicon has been deposited at a temperature of 550 °C using a H2/SiH6/AsH3
chemistry. The emitter has been diffused out of this n+-polysilicon layer by rapid
thermal annealing at a temperature of 1025 °C for 6 s. After the emitter drive-in,
device processing is completed by applying a four-level metallization scheme with
tungsten-filled contact and via holes.

[Figure 30 schematic cross-sections a) to c); labeled layers: nitride, p+-poly,
CVD oxide, SiGe base, n+ poly]

Fig. 30. Emitter-base fabrication steps: a) self-aligned formation of p+-polysilicon
overhangs, b) selective SiGe base deposition and nitride spacer removal, c) spacer and
emitter formation.

An SEM cross-section through the final emitter-base configuration is shown in
Figure 32. The fabricated transistors exhibit an emitter mask size of 0.5 µm and an
effective emitter width of 0.25 µm. The main advantage of this device configuration
is the combination of a low-resistivity SiGe base layer with the self-aligned formation
of p+-polysilicon overlaps, which reduces the base-collector capacitance.
[Figure 31 plot: boron concentration (log scale, 1E17 to 1E20) and germanium content
(0 to 20) vs. depth (0 to 120 nm)]

Fig. 31. Profile of the SiGe base layer after epitaxy (SIMS).

Fig. 32. SEM cross-section of the final emitter-base configuration for a transistor with
an effective emitter width of 0.25 µm.

3.1.2. Transistor Performance


Figure 33 shows the Gummel characteristics of transistors with an effective emitter
area of 0.25 × 2.8 µm². At a pinched base sheet resistance of 4.5 kΩ/□ a current
gain of 200 has been obtained. Despite the boron peak of 2 × 10¹⁹ cm⁻³ at the
collector side, ideal Gummel characteristics are observed. This indicates that the
boron peak is well separated from the emitter and that no boron outdiffusion has
occurred. Additionally, the Gummel characteristics of multi-transistor arrays with
7000 transistors are shown in Figure 33. The typical yield of these transistor arrays
is 85%. The output characteristics of the SiGe HBTs are shown in Figure 34.
The collector-emitter breakdown voltage BVCEO is 2.5 V. The steeply graded
SiGe profile with a germanium fraction of 15% at the collector side provides an
Early voltage higher than 200 V. This leads to a current gain/Early voltage product
in excess of 40000.

Fig. 33. Typical Gummel characteristics of single transistors with 0.25 × 2.8 µm² emitter area and
multi-emitter arrays with 7000 transistors.

Figure 35 gives a wafer map of the intrinsic base sheet resistance at an emitter-base
bias of VBE = 0 V. The mean value of the intrinsic base sheet resistance is
4.5 kΩ/□ and the standard deviation over a wafer is typically 12%. In normal
operation with VBE > 0 V the sheet resistance is even lower, and since the effective
emitter width is only 0.25 µm the resulting intrinsic base resistance is very small.
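A rough estimate shows why a narrow emitter keeps the intrinsic base resistance small even at a sheet resistance of several kΩ/□. The sketch below uses the common textbook approximation for a stripe emitter at low injection, R = Rsheet · W / (3 · L) for a base contact on one side and Rsheet · W / (12 · L) for contacts on both sides. This formula is an assumption brought in for illustration and is not given in the text; using the zero-bias sheet resistance makes the result an upper estimate.

```python
def intrinsic_base_resistance(r_sheet, w_e, l_e, base_contacts=1):
    """Textbook low-injection estimate of the intrinsic base resistance.

    r_sheet: intrinsic base sheet resistance (ohm/square)
    w_e, l_e: effective emitter width and length (same unit)
    base_contacts: 1 (contact on one side) or 2 (contacts on both sides)
    """
    if base_contacts not in (1, 2):
        raise ValueError("base_contacts must be 1 or 2")
    divisor = 3.0 if base_contacts == 1 else 12.0
    return r_sheet * w_e / (divisor * l_e)

R_SHEET = 4500.0   # ohm/square at VBE = 0 V, from the text

# 0.25 x 2.8 um^2 emitter with one base contact
print(f"{intrinsic_base_resistance(R_SHEET, 0.25, 2.8, 1):.0f} ohm")
# 0.25 x 9.8 um^2 emitter with two base contacts
print(f"{intrinsic_base_resistance(R_SHEET, 0.25, 9.8, 2):.1f} ohm")
```

Under these assumptions the 0.25 × 2.8 µm² device has an intrinsic base resistance of about 134 Ω, and the larger two-contact device about 10 Ω, before the extrinsic base and contact resistances are added.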
[Figure 34 plot: collector current IC (mA) vs. collector-emitter voltage VCE for
IB = 5, 10, 15, 20, 25 µA; AE = 0.25 × 2.8 µm²]

Fig. 34. Typical output characteristics.

RBi [kΩ/□]
5.56 5.04 5.68
6.24 5.00 4.52 4.40 4.48 4.64 5.00
4.92 4.36 [illegible] 3.68 4.12 4.20 4.40
5.36 4.44 4.04 3.80 4.04 4.20 4.12 4.16 4.64
5.08 4.36 4.04 4.20 4.48 4.52 4.28 4.16 4.48
5.08 4.21 3.92 4.04 4.40 4.44 4.24 4.16 4.52
5.60 4.44 [illegible] 3.52 4.08 4.12 4.08 4.20 4.76
5.04 4.12 4.04 4.20 4.28 4.40 4.68
5.32 4.96 5.00 5.08 5.16

Fig. 35. Wafer map of intrinsic base sheet resistance RBi; mean value: 4.5 kΩ/□, standard
deviation: 12%.

The cut-off frequency fT has been obtained from measured S-parameters,
using OPEN and SHORT structures for de-embedding 18. The dependence
of the cut-off frequency on the collector current is shown in Figure 36 for various
base-collector voltages. Devices with an effective emitter area of 0.25 × 2.8 µm²
and one base contact exhibit a peak cut-off frequency of 81 GHz at VBC = -1 V. The
slightly decreasing cut-off frequency with increasing reverse base-collector voltage
indicates a large contribution of the collector part to the total transit time and that
the emitter and base transit times are effectively reduced by the steeply graded SiGe
base layer. Owing to the in-situ arsenic-doped emitter polysilicon layer, no
cut-off frequency reduction due to narrow-emitter effects has been observed down
to the smallest emitter width of 0.25 µm.
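For orientation, the peak cut-off frequency can be converted into a total emitter-to-collector delay using the standard relation between the two quantities. The symbol \tau_{ec} is introduced here for illustration and does not appear in the original text.

```latex
\tau_{ec} = \frac{1}{2\pi f_T}
          = \frac{1}{2\pi \cdot 81\,\mathrm{GHz}}
          \approx 1.96\,\mathrm{ps}
```

This total of about 2 ps is shared between the emitter, base and collector contributions discussed above.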

[Figure 36 plot: fT (GHz, 0 to 100) vs. collector current IC (mA)]

Fig. 36. Transit frequency fT vs. collector current IC for transistors with an effective emitter area
of 0.25 × 2.8 µm².

The maximum oscillation frequency has been determined on devices with an
effective emitter area of 0.25 × 9.8 µm² and two base contacts. This is a typical
transistor configuration used in high-speed circuits. Figure 37 shows the frequency
dependence of the small-signal current gain βAC and the maximum stable gain
(MSG) or the maximum available gain (MAG), respectively. The maximum oscillation
frequency fmax is extrapolated from the maximum available gain with a slope of
-20 dB per decade at frequencies between 25 and 30 GHz. The maximum oscillation
frequency is 95 GHz at a reverse base-collector voltage of 2 V. The dependence of
the maximum oscillation frequency on collector current is depicted in Figure 38 for
three different base-collector voltages.
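The extrapolation procedure described above amounts to following a -20 dB/decade line from a measured gain point down to 0 dB. A minimal sketch is given below; the function name is hypothetical, and the gain value of 10 dB at 30 GHz is an illustrative number, not a measured value from the chapter.

```python
def extrapolate_fmax(f_meas_hz, gain_db, slope_db_per_decade=20.0):
    """Frequency at which a gain falling at slope_db_per_decade reaches 0 dB.

    f_meas_hz: frequency of the measured gain point
    gain_db: measured power gain (MAG) at that frequency, in dB
    """
    decades_to_unity = gain_db / slope_db_per_decade
    return f_meas_hz * 10.0 ** decades_to_unity

# Illustrative (not measured) point: 10 dB of gain at 30 GHz
f_max = extrapolate_fmax(30e9, 10.0)
print(f"extrapolated fmax = {f_max / 1e9:.1f} GHz")
```

In practice the extrapolation would be applied to several measured points within the 25 to 30 GHz window and the results averaged, which reduces the influence of measurement noise on a single point.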
[Figure 37 plot: gain (dB, 0 to 40) vs. frequency (GHz, log scale, 10 to 100); curves:
MSG/MAG and |h21|; AE = 0.25 × 9.8 µm², base-collector bias 2 V; extrapolated
fmax = 95 GHz]

Fig. 37. Gain vs. frequency for transistors with an effective emitter area of 0.25 × 9.8 µm² and
two base contacts.

Fig. 38. Maximum oscillation frequency fmax vs. collector current IC for transistors with an
effective emitter area of 0.25 × 9.8 µm² and two base contacts.

The most important transistor parameters are summarized in Table 2 for devices
with an effective emitter area of 0.25 × 2.8 µm² and one base contact. The breakdown
voltages have been determined at a current of 10 µA. In spite of a collector
doping level of 2 × 10¹⁷ cm⁻³, which has been employed to achieve high switching
current densities in high-speed circuits, the quasi-self-alignment of the
base-collector structure results in a base-collector capacitance of only 5.6 fF.
Furthermore, an emitter-base capacitance of only 8.2 fF and a relatively high
emitter-base breakdown voltage of 3.2 V indicate that the boron peak in the base
profile is well separated from the emitter and that no boron outdiffusion has occurred.

Table 2. Important transistor parameters.

AE        0.25 × 2.8 µm²
β         230
RBi       4.5 kΩ/□
VEarly    > 200 V
BVEBO     3.2 V
BVCBO     9.5 V
BVCEO     2.5 V
CEB       8.2 fF
CBC       5.6 fF
CCS       6.8 fF
fT        81 GHz

For low-power applications very small transistor geometries are of particular
interest. Therefore the cut-off frequency and maximum oscillation frequency have
been measured for the transistor with the minimum effective emitter area of
0.25 × 0.5 µm². This device reaches a peak cut-off frequency of 67 GHz (Figure 39)
and a maximum oscillation frequency of 80 GHz (Figure 40) at a current of 300 µA.
Even at a current as low as 28 µA these devices reach a cut-off frequency of 25 GHz.
This technology is therefore also well suited for low-power applications.

[Figure 39 plot: fT (GHz, 0 to 100) vs. collector current IC (mA) for several
base-collector voltages; AE = 0.25 × 0.5 µm²; peak fT = 67 GHz]

Fig. 39. Transit frequency fT vs. collector current IC for minimum size transistors (effective
emitter area: 0.25 × 0.5 µm²).

[Figure 40 plot: fmax (GHz, 0 to 100) vs. collector current IC (mA) for bias voltages
of 0 V, -1 V and -2 V; AE = 0.25 × 0.5 µm²; peak fmax = 80 GHz]

Fig. 40. Maximum oscillation frequency fmax vs. collector current IC for transistors with an
effective emitter area of 0.25 × 0.5 µm².

3.2. Circuit Results

3.2.1. Ring Oscillator


The basic switching performance has been evaluated on CML ring oscillators
with 75 stages. The ring oscillators were operated with a differential voltage
swing of 400 mV at a supply voltage of 2 V. The dependence of the gate delay time
on the switching current is shown in Figure 41 for devices with an emitter area of
0.25 × 2.8 µm². At a switching current of 2 mA per gate the minimum gate delay
time of 8.0 ps is reached. The minimum power-delay product of the CML ring
oscillators is 9 fJ. Figure 42 shows a wafer map of the minimum gate delay time τD.
The mean value is 8.0 ps and the standard deviation over a wafer is only 1.4%.
Comparable homogeneity was obtained for the maximum operating frequencies of static
frequency dividers.
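The relation between the measured oscillation frequency of a ring oscillator and the gate delay can be sketched as follows. The formula f = 1 / (2 · N · τD) is the usual ring oscillator relation and is assumed here; the text does not spell it out. The power estimate assumes that the full switching current of a gate flows from the 2 V supply, which is also an assumption.

```python
def ring_oscillator_frequency(n_stages, gate_delay_s):
    """Oscillation frequency of an N-stage ring oscillator with gate delay tau."""
    return 1.0 / (2.0 * n_stages * gate_delay_s)

def power_delay_product(v_supply, i_gate, gate_delay_s):
    """Energy per switching event, assuming the gate draws i_gate from v_supply."""
    return v_supply * i_gate * gate_delay_s

N_STAGES = 75        # from the text
TAU_MIN = 8.0e-12    # minimum gate delay from the text (s)

f_osc = ring_oscillator_frequency(N_STAGES, TAU_MIN)
pdp = power_delay_product(2.0, 2e-3, TAU_MIN)
print(f"f_osc = {f_osc / 1e6:.0f} MHz, PDP at 2 mA = {pdp * 1e15:.0f} fJ")
```

With these assumptions a 75-stage ring with 8.0 ps gate delay oscillates at about 833 MHz, which is easy to measure on-wafer. The power-delay product at the speed-optimized operating point (2 mA) comes out at about 32 fJ, so the quoted minimum of 9 fJ must belong to a different, presumably lower, switching current where the delay is longer.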

[Figure 41 plot: gate delay τD (ps, log scale) vs. switching current IG (mA, log scale,
0.1 to 10); AE = 0.25 × 2.8 µm²]

Fig. 41. CML ring oscillator characteristics.

τD [ps]
8.18 8.42 8.23
8.11 8.07 8.04 8.06 8.07 8.17 8.23
8.04 8.03 [illegible] 7.92 7.97 8.04 8.08
8.06 8.00 defect 7.97 7.85 7.91 8.00 8.00 8.11
8.00 7.97 7.88 7.88 7.92 7.91 7.91 7.98 8.08
7.97 7.94 7.91 7.87 7.87 7.87 7.89 7.97 8.06
7.95 7.97 test 7.87 7.88 7.88 7.97 7.97 8.08
7.94 8.04 7.89 7.88 7.92 7.95 8.00
7.92 7.87 7.87 7.92 8.01

Fig. 42. Wafer map of ring oscillator minimum gate delay time τD. The mean value is 8.0 ps and
the standard deviation is 1.4%.

3.2.2. 2:1 Static Frequency Divider


Besides the applications described in Section 2.2.2, frequency dividers (along with
ring oscillators) are also used in on-wafer measurements to evaluate the speed
performance of IC technologies. To date impressive results have been achieved
with realizations in different technologies: 66 GHz with InAlAs/InGaAs
transferred-substrate HBTs 42 and §7 GHz with SiGe HBTs 13. These measurements
have been performed on-wafer. The best published value for a mounted static
frequency divider is 45.2 GHz 43, achieved in an InAlAs/InGaAs/InP
HEMT technology. Measurement results on mounted chips are of interest
from the application point of view. In the following we present both
types of results.
Figure 43 shows the block diagram of the 2:1 static frequency divider. The
circuit consists of a clock input stage, a master-slave D-flip-flop, and an output
buffer. The internal dividing function is obtained by connecting the inverted slave
output to the master input. A chip micrograph is depicted in Figure 44. It measures
450 × 550 µm².

Fig. 43. Block diagram of the 2:1 static frequency divider.

Fig. 44. Chip micrograph (size: 450 × 550 µm²).

The divider draws 122 mA from a single supply voltage VEE of -6.3 V. The
divider core consumes 303 mW. The following measurements have been performed
single-ended with a sinusoidal input signal.
Figure 45 shows the input sensitivity measured on-wafer vs. input frequency. The
divider operates up to a frequency of 53 GHz. Figure 46 shows the input (top)
and output (bottom) transient signals at this input frequency. The single-ended
output voltage swing at an external 50 Ω load is measured to be 240 mV.

[Figure 45 plot: minimum input power Pin (dBm) vs. input frequency f (GHz, 0 to 55)]

Fig. 45. Single-ended input sensitivity.

[Figure 46 oscilloscope traces; scale markers: 150 mV, 100 mV, 20 ps]

Fig. 46. Single-ended input (top) and output (bottom) signal at 53 GHz input frequency measured
on-wafer.

To evaluate the performance of the circuit at module level, chips have been
mounted on 15 mil ceramic substrates (εr = 9.9). The ceramic substrates
(Figure 47) measure 30 × 30 mm². Special care has been taken to optimize the signal
transmission from the signal generator to the input pads of the chip. To minimize
the length of the bond wires, the chip is positioned so that its surface is on the
same level as the signal, ground and supply lines on the ceramic substrate. No
additional cooling has been applied. The measured input sensitivity is depicted in
Figure 48. The highest sensitivity is reached around 42 GHz. The maximum operating
frequency is 49.6 GHz. For comparison, the best published value for mounted static
frequency dividers in any technology is 45.2 GHz 43.

Fig. 47. Photograph of the test fixture (size: 30 × 30 mm²).

[Figure 48 plot: minimum input power Pin (dBm) vs. input frequency f (GHz, 0 to 50)]

Fig. 48. Measured single-ended input sensitivity for mounted chips.

3.2.3. Dynamic Frequency Divider


In this section we present a dynamic frequency divider optimized for maximum
operating speed. The circuit has a divide ratio of two, which is realized by using
a single regenerative divider stage 44. The circuit is similar to the first divider
stage of the low-power dynamic divider presented in a previous section (Figure 15).
However, it uses larger transistors and higher operating currents. An additional
emitter follower stage helps to extend the loop bandwidth and thereby the maximum
operating frequency (Figure 49). The circuit has a differential input. Both input
pins are connected to the bias voltage via on-chip 50 Ω resistors.

[Figure 49 circuit diagram; labels: VCC, input, buffer]

Fig. 49. Circuit diagram of the high-speed regenerative divider.

The output voltage of the regenerative divider is applied to a two-stage limiting
amplifier. This amplifier provides a constant divider output amplitude
independent of the input signal.
The performance of the dynamic frequency divider was evaluated by on-wafer
measurements with single-ended input signals. The complementary input of the
circuit was left unconnected. The circuit operates up to a maximum frequency of
79.2 GHz. This is close to the fT of 81 GHz of the transistors used in the divider
and shows that the divider makes full use of the potential of the technology.
Figure 50 shows the minimum input signal power required by the divider. The circuit
operates over a wide frequency range, with a minimum operating frequency of 26.6 GHz.
Figure 51 shows the single-ended input and output signals of the divider at an input
frequency of 79 GHz. The single-ended output voltage swing is 240 mVpp. With a supply
voltage of 7.5 V the circuit draws 143 mA. Approximately half of this current is
consumed by the divider stage and the rest by the output buffer.
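The figures quoted for the dynamic divider can be combined into a few derived quantities. The sketch below only does arithmetic on the values given in the text; the helper names are illustrative.

```python
def dc_power(v_supply, i_supply):
    """DC power drawn from a supply (magnitude of the supply voltage is used)."""
    return abs(v_supply) * i_supply

def operating_range_ratio(f_min_hz, f_max_hz):
    """Ratio of maximum to minimum input frequency of the divider."""
    return f_max_hz / f_min_hz

F_MIN, F_MAX, F_T = 26.6e9, 79.2e9, 81e9   # values from the text

print(f"total power       : {dc_power(7.5, 0.143):.2f} W")
print(f"operating range   : {operating_range_ratio(F_MIN, F_MAX):.2f} : 1")
print(f"f_max relative fT : {F_MAX / F_T:.2f}")
```

The divider thus draws about 1.07 W in total, covers an input frequency range of almost 3:1, and reaches about 98% of the transistor cut-off frequency.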
[Figure 50 plot: minimum input power Pin (dBm) vs. input frequency f (GHz, 0 to 90)]

Fig. 50. Input sensitivity of the dynamic frequency divider.

[Figure 51 oscilloscope traces; scale markers: 100 mV, 20 ps]

Fig. 51. Input (top) and output (bottom) signal of the dynamic frequency divider (fin = 79 GHz).

3.2.4. Low-Power Dual-Modulus Prescaler

State-of-the-art SiGe technologies have shown their potential for high-speed
circuits operating above 40 GHz 12,13,44,45. For applications requiring lower
operating frequencies these technologies allow the design of very low power circuits.
As an example a dual-modulus prescaler for frequencies up to 20 GHz is presented in
this section.

Dual-modulus prescalers are frequency dividers which can be switched between
two different divide ratios. They are widely used in frequency synthesizers.
Figure 52 shows the block diagram of a frequency synthesizer using a prescaler. The
output frequency of the voltage-controlled oscillator (VCO) is applied to a
programmable frequency divider. Programmable dividers for frequencies below 100 MHz
are readily available, whereas programmable dividers for higher frequencies can be
realized only with the penalty of high power consumption. For this reason prescalers
are commonly used to divide the frequency of the VCO signal down to
frequencies below 100 MHz. The use of fixed-modulus prescalers requires a
corresponding reduction of the reference frequency of the PLL and therefore
deteriorates the PLL performance. Dual-modulus prescalers, on the other hand, extend
the frequency range of programmable dividers without these drawbacks. The circuit
shown in Figure 52 uses two low-speed programmable dividers (divide-by-N and
divide-by-A) and a dual-modulus prescaler with selectable divide ratios of P and
P + 1. The resulting output frequency of the synthesizer is fVCO = (N·P + A)·fREF.
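The expression for the output frequency can be checked by counting VCO cycles. The sketch below assumes the usual pulse-swallow arrangement, in which the prescaler divides by P + 1 for the first A of its output pulses and by P for the remaining N - A pulses of each cycle of the divide-by-N counter. This operating sequence is an assumption (the text gives only the resulting formula), and the example values for N, A and the reference frequency are hypothetical.

```python
def vco_cycles_per_reference_cycle(n, a, p):
    """Count VCO cycles during one full cycle of the divide-by-N counter.

    Assumes the prescaler divides by p + 1 for the first a output pulses
    and by p for the remaining n - a pulses (requires 0 <= a <= n).
    """
    if not 0 <= a <= n:
        raise ValueError("the swallow count a must satisfy 0 <= a <= n")
    cycles = 0
    for pulse in range(n):
        cycles += p + 1 if pulse < a else p
    return cycles

def synthesizer_frequency(n, a, p, f_ref_hz):
    """Closed-form output frequency f_VCO = (N * P + A) * f_REF."""
    return (n * p + a) * f_ref_hz

# Hypothetical example: P = 256, N = 78, A = 32, 1 MHz reference
print(vco_cycles_per_reference_cycle(78, 32, 256))        # VCO cycles per reference cycle
print(synthesizer_frequency(78, 32, 256, 1e6) / 1e9, "GHz")
```

Changing A by one changes the output frequency by exactly one reference frequency step, which is why the dual-modulus approach preserves the channel spacing without lowering the reference frequency.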

[Figure 52 block diagram: reference frequency fREF, phase detector, loop filter, VCO,
output frequency; feedback path through the dual-modulus prescaler (÷P / ÷(P+1)) and
the frequency divider ÷N, with modulus control]

Fig. 52. Frequency synthesizer block diagram.

[Figure 53 block diagram: synchronous divider ÷4/5 with input and modulus control,
followed by an asynchronous divider ÷64 driving the output]

Fig. 53. Dual-modulus prescaler block diagram.

The prescaler is realized using a synchronous divide-by-four/divide-by-five input
stage and a six-bit asynchronous divider (Figure 53). When the modulus control
input MC is held HIGH, the input stage divides by four and the overall divide ratio
of the prescaler is 256. When MC is LOW and the outputs of all six stages of the
asynchronous divider are LOW as well, the input stage divides by five. The overall
divide ratio is then 1·5 + 63·4 = 257.
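The two divide ratios can be reproduced by stepping through the 64 states of the asynchronous divider and adding up the input cycles consumed by the synchronous stage in each state. This is a behavioral sketch with hypothetical names, not a model of the actual gate-level implementation.

```python
def prescaler_divide_ratio(mc_high, async_stages=6):
    """Input cycles per output cycle of the dual-modulus prescaler.

    The synchronous input stage divides by five only when the modulus
    control is LOW and all outputs of the asynchronous divider are LOW;
    otherwise it divides by four.
    """
    n_states = 2 ** async_stages          # 64 states of the six-bit divider
    total_input_cycles = 0
    for state in range(n_states):
        all_outputs_low = (state == 0)
        if (not mc_high) and all_outputs_low:
            total_input_cycles += 5       # one extra input cycle is swallowed
        else:
            total_input_cycles += 4
    return total_input_cycles

print(prescaler_divide_ratio(mc_high=True))    # MC HIGH
print(prescaler_divide_ratio(mc_high=False))   # MC LOW
```

The model shows that exactly one of the 64 cycles of the input stage is lengthened by one input period when MC is LOW, which gives the ratios 256 and 257 stated above.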
The circuit is implemented using differential current-mode logic (CML) with a
voltage swing of 200 mVpp 46. This low voltage swing allows for low-power operation
while still providing sufficient noise margin. Figure 54 shows a photograph of the
chip (chip size: 550 × 450 µm²).

Fig. 54. Dual-modulus prescaler chip photograph (size: 550 × 450 µm²).

[Figure 55 plot: minimum and maximum input power Pin (dBm) vs. input frequency f (GHz)]

Fig. 55. Prescaler input sensitivity (VCC = 2.3 V).

The circuit operates with supply voltages from 2 V to 5 V. With a supply voltage
of 2.3 V the circuit consumes 27 mW and operates up to a maximum input frequency
of 20 GHz. Its power consumption is the lowest reported to date for these frequencies
47,48. Figure 55 shows the measured input sensitivity of the prescaler. The two
traces indicate the minimum and maximum input power for proper operation. The
measurements were performed with chips mounted on microstrip test boards.

4. Summary

We have presented measurement results on circuits realized in production-near Si
and SiGe bipolar technologies. The results, made possible by careful circuit design
in combination with a well-balanced technology development, demonstrate the
high-speed and low-power potential of these technologies. Due to their attractive
performance/cost ratio, electronic components in Si and SiGe technologies are
promising candidates for applications in future wireless and high-speed optical fiber
communication systems.

Acknowledgements

Part of this work was supported by the European Union within the Esprit project
23229 (Beta).

References

1. A. Pruijmboom, D. Szmyd, R. Brock, R. Wall, N. Morris, K. Fong, and F.
Jovenin, "QUBiC3: A 0.5 µm BiCMOS Production Technology, with fT = 30 GHz,
fmax = 60 GHz and High-Quality Passive Components for Wireless Telecommunication
Applications," in Proceedings of the IEEE Bipolar Circuits and Technology Meeting,
Minneapolis, MN, USA, Sept. 1998, pp. 120-123.
2. J. Bock, T. F. Meister, H. Knapp, K. Aufinger, M. Wurzer, R. Gabl, M. Pohl, S.
Boguth, M. Franosch, and L. Treitinger, "0.5 µm/60 GHz fmax Implanted Base Si
Bipolar Technology," in Proceedings of the IEEE Bipolar Circuits and Technology
Meeting, Minneapolis, MN, USA, Sept. 1998, pp. 160-163.
3. E. F. Crabbe, J. H. Comfort, W. Lee, J. D. Cressler, B. S. Meyerson, J. Y. Megdanis,
C. Sun, and J. M. C. Stork, "73-GHz Self-Aligned SiGe-Base Bipolar Transistors with
Phosphorus-Doped Polysilicon Emitter," IEEE Electron Device Letters 13 (1992) 259-
261.
4. F. Sato, T. Tatsumi, T. Hashimoto, and T. Tashiro, "A Super Self-Aligned Selectively
Grown SiGe Base (SSSB) Bipolar Transistor fabricated by Cold-Wall Type UHV/CVD
Technology," IEEE Transactions on Electron Devices 41 (1994) 1373-1378.
5. K. Washio, E. Ohue, K. Oda, M. Tanabe, H. Shimamoto, and T. Onai, "A Selective-
Epitaxial SiGe HBT with SMI Electrodes Featuring 9.3-ps ECL-Gate Delay," in IEEE
IEDM Digest of Technical Papers, Washington, DC, USA, Dec. 1997, pp. 795-798.
6. A. Schüppen, U. Erben, H. Gruhle, H. Kibbel, H. Schumacher, and U. Konig,
"Enhanced SiGe Heterojunction Bipolar Transistors with 160 GHz-fmax," in IEEE IEDM
Digest of Technical Papers, Washington, DC, USA, Dec. 1995, pp. 743-746.
7. T. F. Meister, H. Schafer, M. Franosch, W. Molzer, K. Aufinger, U. Scheler, C. Walz,
M. Stolz, S. Boguth, and J. Bock, "SiGe Base Bipolar Technology with 74 GHz fmax
and 11 ps gate delay," in IEEE IEDM Digest of Technical Papers, Washington, DC,
USA, Dec. 1995, pp. 739-742.
8. E. Ohue, Y. Kiyota, T. Onai, M. Tanabe, and K. Washio, "100-GHz fT Homojunction
Bipolar Technology," in Symposium on Very Large Scale Integrated Technology Digest
of Technical Papers, Honolulu, HI, USA, June 1996, pp. 106-107.
9. M. Ugajin, J. Kodate, Y. Kobayashi, S. Konaka, and T. Sakai, "Very-High fT and fmax
Silicon Bipolar Transistors using Ultra-High-Performance Super Self-Aligned Process
Technology for Low-Energy and Ultra-High-Speed LSI's," in IEEE IEDM Digest of
Technical Papers, Washington, DC, USA, Dec. 1995, pp. 735-738.
10. D. C. Ahlgren, M. Gilbert, D. Greenberg, S. J. Jeng, J. Malinowski, D. Nguyen-Ngoc, K.
Schonenberg, R. Stein, K. Groves, K. Walter, G. Hueckel, D. Colavito, G. Freeman, D.
Sunderland, D. L. Harame, and B. Meyerson, "Manufacturability Demonstration of an
Integrated SiGe HBT Technology for the Analog and Wireless Marketplace," in IEEE
IEDM Digest of Technical Papers, San Francisco, CA, USA, Dec. 1996, pp. 859-862.
11. K. Washio, E. Ohue, K. Oda, M. Tanabe, H. Shimamoto, and T. Onai, "95 GHz fT Self-
Aligned Selective Epitaxial SiGe HBT with SMI Electrodes," in IEEE ISSCC Technical
Digest, San Francisco, CA, USA, Feb. 1998, pp. 312-313.
12. K. Washio, E. Ohue, K. Oda, R. Hayami, M. Tanabe, H. Shimamoto, T. Harada, and
M. Kondo, "82 GHz Dynamic Frequency Divider in 5.5 ps ECL SiGe HBTs," in IEEE
ISSCC Digest of Technical Papers, San Francisco, CA, USA, Febr. 2000, pp. 210-211.
13. K. Washio, R. Hayami, E. Ohue, K. Oda, M. Tanabe, H. Shimamoto, and M. Kondo,
"67-GHz Static Frequency Divider Using 0.2-µm Self-Aligned SiGe HBTs," in Radio
Frequency Integrated Circuits Symposium, Boston, MA, USA, June 2000, pp. 31-34.
14. J. Bock, A. Felder, T. F. Meister, M. Franosch, K. Aufinger, M. Wurzer, R. Schreiter,
S. Boguth, and L. Treitinger, "50 GHz Implanted Base Silicon Bipolar Technology
with 35 GHz Static Frequency Divider," in Symposium on VLSI Technology Digest of
Technical Papers, Honolulu, HI, USA, June 1996, pp. 108-109.
15. Infineon, BFP 520, data sheet, Infineon Technologies AG, Munich, 1999.
16. J. N. Burghartz, J. Y. C. Sun, C. L. Stanis, S. R. Mader, and W. J. D., "Identification
of Perimeter Depletion and Emitter Plug Effects in Deep-Submicrometer, Shallow-
Junction Polysilicon Emitter Bipolar Transistors," IEEE Transactions on Electron De-
vices 39 (1992) 1477-1489.
17. J. Bock, M. Franosch, H. Schafer, H. von Philipsborn, and J. Popp, "In-situ Doped
Emitter-Polysilicon for 0.5 µm Silicon Bipolar Technology," in Proceedings of the
European Solid State Device Research Conference, The Hague, Netherlands, Sept. 1995,
pp. 421-424.
18. M. C. A. M. Koolen, J. A. M. Geelen, and M. P. J. G. Versleijen, "An Improved
De-embedding Technique for On-wafer High-frequency Characterization," in Proceedings
of the IEEE Bipolar Circuits and Technology Meeting, Minneapolis, MN, USA, Sept.
1991, pp. 188-191.
19. R. H. Derksen and H.-M. Rein, "7.3-GHz Dynamic Frequency Dividers Monolithically
Integrated in a Standard Bipolar Technology," IEEE Transactions on Microwave The-
ory and Techniques 36 (1988) 537-541.
20. R. L. Miller, "Fractional-Frequency Generators Utilizing Regenerative Modulation,"
Proceedings of the I.R.E. 27 (1939) 446-457.
21. K. Washio, E. Ohue, M. Tanabe, and T. Onai, "Self-Aligned Metal/IDP Si Bipolar
Technology with 12-ps ECL and 45-GHz Dynamic Frequency Divider," IEEE Transac-
tions on Electron Devices 44 (1997) 2078-2082.
22. M. Soyuer, K. A. Jenkins, J. N. Burghartz, H. A. Ainspan, F. J. Canora, S. Ponnapalli,
J. F. Ewen, and W. E. Pence, "A 2.4-GHz Silicon Bipolar Oscillator with integrated
Resonator," IEEE Journal of Solid-State Circuits 31 (1996) 268-270.
23. B. Jansen, K. Negus, and D. Lee, "Silicon Bipolar VCO Family for 1.1 to 2.2 GHz
with Fully-Integrated Tank and Tuning Circuits," in IEEE ISSCC Digest of Technical
Papers, San Francisco, CA, USA, Febr. 1997, pp. 392-393.
24. J. Craninckx and M. Steyaert, "A 1.8-GHz Low-Phase-Noise CMOS VCO Using Opti-
mized Hollow Spiral Inductors," IEEE Journal of Solid-State Circuits 32 (1997) 736-
744.
25. Z. G. Wang, M. Berroth, A. Thiede, M. Rieger-Motzer, T. Jakobus, A. Hulsmann, K.
Kohler, and B. Raynor, "40 GHz monolithically-integrated fully-balanced VCO using
0.3 µm HEMTs," IEE Electronics Letters 33 (1997) 422-423.
26. C. N. Rheinfelder, F. Beißwanger, J. Gerdes, F. J. Schmückle, K. M. Strohm, J.-F. Luy,
and W. Heinrich, "A Coplanar 38-GHz SiGe MMIC Oscillator," IEEE Microwave and
Guided Wave Letters 11 (1996) 398-400.
27. H. Knapp, H.-D. Wohlmuth, J. Bock, and A. Scholtz, "A 22 GHz monolithically in-
tegrated oscillator in silicon bipolar technology," IEE Electronics Letters 35 (1999)
438-439.
28. A. Felder, M. Möller, J. Popp, J. Bock, and H.-M. Rein, "46 Gb/s DEMUX, 50 Gb/s
MUX, and 30 GHz Static Frequency Divider in Silicon Bipolar Technology," IEEE
Journal of Solid-State Circuits 31 (1996) 481-486.
29. A. Felder, M. Moller, M. Wurzer, M. Rest, and H.-M. Rein, "60 Gb/s regenerating
demultiplexer in SiGe bipolar technology," IEE Electronics Letters 33 (1997) 1984-
1986.
30. W. Bogner, U. Fischer, E. Gottwald, and E. Müllner, "20 Gbit/s TDM nonrepeatered
transmission over 198 km DSF using Si-bipolar IC for demultiplexing and clock recov-
ery," in Proceedings of European Conference on Optical Communication, Oslo, Norway,
Sept. 1996, pp. 203-206.
31. W. Bogner, E. Gottwald, A. Schopflin, and C.-J. Weiske, "20 Gbit/s unrepeatered opti-
cal transmission over 148 km by electrical time division multiplexing and demultiplex-
ing," IEE Electronics Letters 33 (1997) 2136-2137.
32. R. Yu, R. Pierson, P. Zampardi, K. Runge, A. Campana, D. Meeker, K. C. Wang, A.
Petersen, and J. Bowers, "Packaged clock recovery integrated circuits for 40GBit/s
optical communication link," in GaAs IC Symposium Technical Digest, Orlando, FL,
USA, Nov. 1996, pp. 129-132.
33. M. Mokhtari, T. Swahn, R. H. Walden, W. E. Stanchina, M. Kardos, T. Juhola, G.
Schuppener, H. Tenhunen, and T. Lewin, "InP-HBT chip-set for 40-Gb/s fiber optical
communication systems operational at 3 V," IEEE Journal of Solid-State Circuits 32
(1997) 1371-1383.
34. M. Lang, Z.-G. Wang, Z. Lao, M. Schlechtweg, M. Thiede, M. Rieger-Motzer, M. Sedler,
W. Bronner, G. Kaufel, K. Kohler, A. Hulsmann, and B. Raynor, "20-40 Gb/s 0.2-µm
GaAs HEMT chip set for optical data receiver," IEEE Journal of Solid-State Circuits
32 (1997) 1384-1393.
35. M. Wurzer, J. Bock, W. Zirwas, H. Knapp, F. Schumann, A. Felder, and L. Treitinger,
"40 Gb/s Integrated Clock and Data Recovery Circuit in a Silicon Bipolar Technology,"
in Proceedings of the IEEE Bipolar Circuits and Technology Meeting, Minneapolis, MN,
USA, Sept. 1998, pp. 136-139.
36. H.-M. Rein and M. Moller, "Design considerations for very-high-speed Si-bipolar IC's
operating up to 50 Gb/s," IEEE Journal of Solid-State Circuits 31 (1996) 1076-1090.
37. J. Hauenschild and H.-M. Rein, "Influence of transmission-line interconnections
between Gbit/s ICs on time jitter and instabilities," IEEE Journal of Solid-State Circuits
25 (1990) 763-766.
38. M. Moller, H.-M. Rein, A. Felder, and T. F. Meister, "60 Gb/s time division multiplexer
in SiGe-bipolar technology with special regard to mounting and measuring technique,"
IEE Electronics Letters 33 (1997) 679-680.
39. T. F. Meister, R. Stengl, H. W. Meul, R. Weil, P. Packan, A. Felder, H. Klose, R.
Schreiter, J. Popp, H.-M. Rein, and L. Treitinger, "Sub-20ps silicon bipolar technol-
ogy using selective epitaxial growth," in IEEE IEDM Digest of Technical Papers, San
Francisco, CA, USA, Dec. 1992, pp. 401-404.
40. F. Sato, T. Hashimoto, T. Tatsumi, H. Kitahata, and T. Tashiro, "Sub-20psec ECL
Circuits with 50 GHz fmax Self-aligned SiGe HBTs," in IEEE IEDM Digest of Technical
Papers, San Francisco, CA, USA, Dec. 1992, pp. 397-400.
41. K. Washio, M. Kondo, E. Ohue, K. Oda, M. Tanabe, H. Shimamoto, and T. Harada,
"A 0.2-µm Self-Aligned SiGe HBT Featuring 107-GHz fmax and 6.7 ps ECL," in IEEE
IEDM Digest of Technical Papers, Washington, DC, USA, Dec. 1999, pp. 795-798.
42. Q. Lee, D. Mensa, J. Guthrie, S. Jaganathan, T. Mathew, Y. Betser, S. Krishnan,
S. Ceran, and M. J. W. Rodwell, "66 GHz Static Frequency Divider in Transferred-
substrate HBT Technology," in IEEE Radio Frequency Integrated Circuits Symposium
Digest of Papers, Anaheim, CA, USA, June 1999, pp. 87-90.
43. T. Otsuji, M. Yoneyama, K. Murata, Y. Imai, T. Enoki, and Y. Umeda, "2-46.5 GHz
Quasi-static 2:1 Frequency Divider IC using InAlAs/InGaAs/InP HEMTs," IEE Elec-
tronics Letters 33 (1997) 1376-1377.
44. H. Knapp, T. F. Meister, M. Wurzer, D. Zoschg, K. Aufinger, and L. Treitinger, "A
79 GHz Dynamic Frequency Divider in SiGe Bipolar Technology," in IEEE ISSCC
Digest of Technical Papers, San Francisco, CA, USA, Febr. 2000, pp. 208-209.
45. M. Wurzer, T. F. Meister, H. Knapp, K. Aufinger, R. Schreiter, S. Boguth, and L.
Treitinger, "53 GHz Static Frequency Divider in a Si/SiGe Bipolar Technology," in
IEEE ISSCC Digest of Technical Papers, San Francisco, CA, USA, Febr. 2000, pp. 206-
207.
46. H. Knapp, T. F. Meister, M. Wurzer, K. Aufinger, S. Boguth, and L. Treitinger, "A Low
Power 20 GHz SiGe Dual-Modulus Prescaler," in 2000 IEEE International Microwave
Symposium, Boston, MA, USA, June 2000, pp. 731-734.
47. T. Maeda, S. Wada, M. Tokushima, M. Ishikawa, J. Yamazaki, and M. Fujii, "An
Ultralow-Power-Consumption, High-Speed, GaAs 256/258 Dual-Modulus Prescaler,"
IEEE Journal of Solid-State Circuits 34 (1999) 212-218.
48. S. Wada, T. Maeda, J. Tokushima, M. Yamazaki, M. Ishikawa, and M. Fujii, "A
27 GHz/151 mW GaAs 256/258 Dual Modulus Prescaler IC with 0.1 µm Double-Deck-Shaped
(DDS) Gate E/D-HJFETs," in IEEE GaAs IC Symposium Technical Digest,
Atlanta, GA, USA, Nov. 1998, pp. 125-128.
International Journal of High Speed Electronics and Systems, Vol. 11, No. 1 (2001) 77-114
© World Scientific Publishing Company

SELF-ALIGNED Si BJT/SiGe HBT TECHNOLOGY AND
ITS APPLICATION TO HIGH-SPEED CIRCUITS

KATSUYOSHI WASHIO
Central Research Laboratory, Hitachi Ltd.
1-280 Higashi-Koigakubo, Kokubunji, Tokyo 185-8601, Japan

In a Si bipolar transistor (BJT) and a SiGe heterojunction bipolar transistor (HBT), self-aligned struc-
tures help to improve high-speed and high-frequency characteristics. These structures are used to re-
duce parasitic capacitance and resistance, and thus maximize the transistor's intrinsic performance. In
addition to generally used process technology, selective metal deposition to form electrodes and selec-
tive epitaxial growth of Si/SiGe multilayers are applied in the fabrication process. To improve the
intrinsic speed, the cutoff frequency, a shallow diffusion process for Si BJTs and a graded-Ge profile
SiGe-base layer for SiGe HBTs are used. These also enable a high maximum oscillation frequency and
a small gate delay in the emitter-coupled logic through the synergistic effect of the self-aligned struc-
ture. Both high-speed digital circuits — frequency dividers up to millimeter-wave bands and a multi-
plexer/demultiplexer for optical-fiber-links — and high-frequency analog circuits for optical-fiber-links
— a preamplifier, an automatic gain control amplifier, a limiting amplifier, and a decision circuit —
have been implemented by applying the self-aligned Si BJTs and/or SiGe HBTs.

1. Introduction
To meet the rapidly growing demand for an improved information infrastructure, the capac-
ity of the backbone transmission network must be expanded. Ultra-high-speed monolithic
integrated circuits (IC) are key components for optical-fiber-link systems. Therefore, sev-
eral ultra-high-speed Si bipolar transistors (BJTs) and SiGe heterojunction bipolar transis-
tors (HBTs) have been developed, and the feasibility of Si BJT and SiGe HBT technologies,
which enable a data rate of 20 or 40 Gb/s, has been investigated.1-7
To improve the operating speed of a Si BJT, the following approaches have been used. To
reduce the parasitic capacitance, a self-aligned structure with poly-Si emitter and base
electrodes has been used. To shorten the base transit time, shallow emitter/base profiles
formed by various diffusion processes have been applied. To reduce the base resistance,
poly-Si electrodes combined with self-aligned transistor structures have been effective
because of their low resistance and the short distance between the emitter and the base
electrode. However, the base resistance has not been greatly reduced, because finer
fabrication technology has increased the resistance of the poly-Si electrode surrounding
the emitter, and the link base resistance has become the dominant component of the base
resistance.8 A Si BJT with a self-aligned stacked metal/in-situ doped poly-Si (SMI) base
electrode is one solution to this problem, and it provides both low base resistance and
low collector capacitance. By using such Si BJTs, high-speed circuits (for example, a
12-ps-delay emitter-coupled-logic (ECL) gate and a 45-GHz dynamic frequency divider) have
been obtained even with an implanted base.9,10
The SiGe HBT is an attractive candidate for achieving a fast base transit time; an ECL gate
delay below 15 ps 11,12 and a cutoff frequency of over 100 GHz 13,14 have been reported.
This technology is well suited to high-speed circuits. However, to enable ultra-high-speed
operation, a fast base transit time with a SiGe base needs to be achieved simultaneously
with low base resistance and low parasitic capacitance to improve the operating
speed of analog and digital circuits. That is, the SiGe HBT should be combined with low-
base-resistance and low-parasitic-capacitance techniques, such as SMI base electrodes and
the self-aligned transistor structure. Using this approach, a self-aligned selective-epitaxial-
growth (SEG) SiGe HBT with SMI electrodes, having a high cutoff frequency and maxi-
mum oscillation frequency of about 100 GHz and a below-10-ps ECL gate delay, has been
developed.15 This technology was applied in ICs for optical-fiber-link systems including a
static frequency divider with a maximum operating frequency of up to 50 GHz, a time-
division multiplexer and demultiplexer operating at 40 Gb/s, a preamplifier with a band-
width of 35 GHz, an automatic-gain-control amplifier core with a bandwidth of 32 GHz,
and a decision circuit operating at 40 Gb/s.6,7
Furthermore, for future optical data communication systems and microwave/millimeter-
wave systems, both high-speed operation and sophisticated functions are simultaneously
required. Therefore, ultra-high-speed transistors fully compatible with the CMOS process
are essential.16 SiGe HBTs compatible with CMOS can be fabricated by the well-estab-
lished Si process, so they are the most promising candidate to meet this requirement. Thus,
a self-aligned SEG SiGe HBT that has shallow-trench and dual-deep-trench isolations and
Ti-salicide electrodes has been developed.17,18 This HBT is fabricated on a 200-mm wafer
line and the fabrication process is almost completely compatible with the 0.2-µm bipolar-CMOS
technology that is applied to a fast-cache memory chip.19 These SiGe HBTs provide
a 122-GHz cutoff frequency, a 163-GHz maximum oscillation frequency, and an ECL gate
delay time of 5.5 ps. To satisfy the specifications for MMICs, passive elements (an MIM
capacitor and a high-Q inductor) are fabricated on the same chip by using four-level inter-
connects. A static frequency divider and a dynamic frequency divider with a maximum
operating frequency of up to 67 GHz 20 and up to 82.4 GHz, respectively, have been devel-
oped. A preamplifier with a 45-GHz bandwidth, a limiting amplifier with a 49-GHz band-
width, and a 40-Gb/s 1:4 high-sensitivity demultiplexer combined with a decision circuit
for practical use in a 40-Gb/s optical receiver have also been developed.21 These excellent
capabilities also indicate that SiGe HBTs will play a major role in future millimeter-wave
systems.

2. Index for High-Speed Characteristics of a Bipolar Transistor


To develop a high-speed bipolar transistor successfully, the development should follow an
appropriate index. For bipolar digital circuits, the gate delay time of an ECL circuit is the
most popular index of operating speed. On the other hand, for bipolar analog circuits, the
maximum oscillation frequency, one of the transistor characteristics used to indicate high-
frequency performance, is generally used. In this section, the relationship between ECL
gate delay and transistor parameters, and the relationship between maximum oscillation
frequency and analog circuit performance are explained.

2.1. ECL gate delay


The progress made in reducing the gate-delay time in ECL circuits for high-speed BJTs is
reviewed in Fig. 1. From the beginning of the 1980s, the ECL gate delay has been short-
ened through improved self-aligned structures, as shown by the open circles. The shortest
delay of 12 ps was achieved in 1996. However, that seems to have been the limit on higher
speed in Si BJTs. Therefore, from the start of the 1990s, results achieved with SiGe HBTs
are shown (solid circles). Optimization of the SiGe-base profile and application of a self-
aligned structure enabled a delay of 5.5 ps in 2000; the shortest yet reported. The ECL gate
delay will probably reach less than 5 ps within the next few years.

[Plot: ECL gate delay (logarithmic axis) versus year, 1985 to 2005; open circles: Si BJT, solid circles: SiGe HBT.]
Fig. 1. Progress of gate-delay time in ECL circuits for high-speed BJTs.

The delay time of an ECL gate, tpd, is approximately given as a function of parasitic
capacitance Cp, base transit time τ, and base resistance rb, as shown in the equation

$$t_{pd} \approx a_1 \frac{C_p}{I_{CS}} + a_2 \tau + a_3 r_b I_{CS} \qquad (1)$$

where ICS is the switching current, and a1, a2, and a3 are coefficients depending on the
transistor parameters. Here, Cp is mainly composed of the capacitances of the collector,
the substrate, and the load resistor. Figure 2 shows the asymptotic analysis of each term
as a function of ICS. The term related to Cp decreases as current increases, the τ term
increases at high injection (due to the Kirk effect), and the rb term increases with
current. To improve the operating speed in ECL circuits, lower parasitic capacitance, a
faster base transit time, and lower base resistance should be achieved simultaneously.
That is, the transistor parameters and ECL gate delay should be improved, as shown in
Fig. 2, from the solid lines to the dashed lines (optimized parameters).
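
As a rough numerical illustration of Eq. (1), the following Python sketch evaluates the delay and the switching current that minimizes it when a1, a2, a3, Cp, τ, and rb are treated as constants. All numerical values are hypothetical placeholders chosen for illustration, not measured data, and the increase of τ at high injection (Kirk effect) is not modeled.

```python
import math


def ecl_gate_delay(i_cs, c_p, tau, r_b, a1, a2, a3):
    """Eq. (1): t_pd ~ a1*Cp/Ics + a2*tau + a3*rb*Ics (all quantities in SI units)."""
    return a1 * c_p / i_cs + a2 * tau + a3 * r_b * i_cs


def optimum_switching_current(c_p, r_b, a1, a3):
    """Current at which d(t_pd)/d(Ics) = 0, i.e. Ics = sqrt(a1*Cp / (a3*rb))."""
    return math.sqrt(a1 * c_p / (a3 * r_b))


# Hypothetical example values (assumptions, not taken from the text).
C_P = 20e-15   # parasitic capacitance, F
TAU = 2e-12    # base transit time, s
R_B = 100.0    # base resistance, ohm
A1, A2, A3 = 0.3, 2.0, 1e-11  # coefficients; units chosen so each term is a time

i_opt = optimum_switching_current(C_P, R_B, A1, A3)
t_min = ecl_gate_delay(i_opt, C_P, TAU, R_B, A1, A2, A3)
print(f"optimum Ics = {i_opt * 1e3:.2f} mA, minimum delay = {t_min * 1e12:.2f} ps")
```

At the optimum the Cp term and the rb term are equal, so lowering either Cp or rb lowers the minimum delay, while the τ term sets a floor that only a faster base can reduce.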
Fig. 2. Asymptotic analysis of ECL gate delay time tpd as a function of the switching current ICS. Here, tpd is
approximately given as a function of parasitic capacitance Cp, base transit time τ, and base resistance rb.

2.2. Maximum oscillation frequency


Maximum oscillation frequency fmax, the frequency at which the unilateral gain becomes
unity, is approximately given by the equation

$$f_{max} = \sqrt{\frac{f_T}{8 \pi C_{jC} r_b}} \qquad (2)$$

where fT is the cutoff frequency, CjC is the collector capacitance, and rb is the base
resistance. Because some of the transistor parameters used in Eqs. (1) and (2) are the
same, fmax has a dependence on transistor characteristics similar to that of the ECL gate
delay. However, other parasitics, e.g. the parasitic capacitance of the substrate, load
resistor, and interconnects, affect the ECL gate delay, so there is a difference in the
dependence on operating current (switching current for ECL gate delay and collector
current for fmax). Therefore, if fmax is used as an index for digital circuits, the effect
of other parasitics and the current dependence should be taken into account. By measuring
the S-parameters of a transistor, fmax can be obtained directly from the unilateral gain.
The unilateral gain is the forward power gain in a feedback amplifier, so it is a suitable
index for analog circuits. The relationship between fmax and the operating frequency of
analog circuits, represented by the transmission speed of the optical-fiber-link systems
in which they are used, is shown in Fig. 3. From this, it can be seen that the operating
frequency available to analog circuits used in optical-fiber links is about one-fourth of
fmax; that is, an fmax of 10, 40, and 160 GHz is needed for transmission speeds of
2.5, 10, and 40 Gb/s, respectively.
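
The two relations in this section can be sketched in a few lines of Python. The function names are illustrative, the sample transistor values are hypothetical round numbers rather than data from this chapter, and the factor of four is the approximate ratio read from Fig. 3, not an exact design rule.

```python
import math


def f_max(f_t, c_jc, r_b):
    """Eq. (2): fmax = sqrt(fT / (8*pi*CjC*rb)), with fT in Hz, CjC in F, rb in ohm."""
    return math.sqrt(f_t / (8.0 * math.pi * c_jc * r_b))


def required_f_max_ghz(bit_rate_gbps, ratio=4.0):
    """Rule of thumb from Fig. 3: fmax (GHz) needed is about 4x the bit rate (Gb/s)."""
    return ratio * bit_rate_gbps


# Hypothetical transistor: fT = 50 GHz, CjC = 4 fF, rb = 100 ohm (assumed values).
fmax_hz = f_max(50e9, 4e-15, 100.0)
print(f"fmax = {fmax_hz / 1e9:.1f} GHz")

for rate in (2.5, 10.0, 40.0):
    print(f"{rate} Gb/s needs fmax of about {required_f_max_ghz(rate):.0f} GHz")
```

Because fmax varies as the inverse square root of CjC and rb, halving the base resistance raises fmax by only about 41%, which is one reason low rb and low CjC have to be pursued together.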

3. Si Bipolar Transistor
A shallow diffusion process to improve the intrinsic speed, that is, the cutoff frequency, is
important for high-speed Si bipolar transistors. Self-aligned transistor structures also
contribute to high-speed and high-frequency characteristics. Such structures are essential to
enable intrinsic speed that is not greatly degraded by parasitic capacitance or resistance. In
this section, a vapor-phase shallow boron diffusion process for intrinsic base formation, and
a self-aligned transistor structure and fabrication process for selective metal deposition to
form electrodes, are described.

[Plot: transmission speed (2.5, 10, and 40 Gb/s) versus fmax (GHz).]
Fig. 3. Relationship between fmax and the operating frequency of analog circuits (represented by the
transmission speed of optical-fiber-link systems in which they are used).

3.1. Shallow boron diffusion process for a thin base


As explained in Section 2, cutoff frequency fT is one of the most important transistor
parameters. To achieve high fT, a shallow and high-concentration base should be formed to
reduce the carrier transit time without allowing base punchthrough. For this purpose,
lamp-heated rapid vapor-phase doping (RVD) has been developed.22

3.1.1. RVD equipment and process


RVD is a kind of gas-source diffusion in which a hydrogen carrier gas and B2H6 source gas
are used for p-type doping. Doping experiments were carried out using equipment that
included a lamp annealing system (Fig. 4). Before doping, the wafer was treated by a
conventional method that involved chemical cleaning, HF dipping, and rinsing with
de-ionized water to remove native oxide. After the wafer was loaded, the atmosphere was
changed from nitrogen to hydrogen. B2H6 gas was then introduced into the chamber with
hydrogen carrier gas at room temperature. After the gases were distributed uniformly in
the chamber, the wafer was rapidly heated for boron doping. The doping conditions are
summarized in Fig. 4. The low B2H6 concentration in the chamber provided a suitable
surface boron concentration without segregation. In RVD, the impurity is introduced from
the vapor phase into the Si without any native oxide. Thus, the impurity concentration can
be controlled by the B2H6 gas flow rate, rather than by the solid solubility, which permits
a wide range of concentration. Also, a short doping time, for example, 1 min., is adequate
for making shallow junctions.

H2 flow rate                 10 l/min
B2H6/H2 (0.1%) flow rate     40, 50 ml/min
Doping temperature           900°C
Doping pressure              atmospheric
Doping time                  60 s

Fig. 4. RVD equipment and boron doping conditions.
Typical RVD boron profiles, formed with a doping temperature of 900°C and time of 60
s, are shown in Fig. 5. The boron concentrations at the surface were 5 × 10¹⁸ and 1.2 × 10¹⁹
cm⁻³, and the diffusion depths at which the concentration reached 10¹⁸ cm⁻³ were less than
20 nm. The activation efficiency of the boron doped layers, estimated by using Irvin's
curve, and the measured sheet resistances were evaluated and it was found that the boron
atoms were fully activated immediately after doping without extra annealing. Thus, the
advantages of RVD are that it reduces the thermal budget and forms shallow junctions.
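
For a feel of the magnitudes involved, the sketch below estimates the sheet resistance of a fully activated boron layer from Rs = 1/(q μ N t). It assumes a uniform (box) profile and a constant hole mobility of 50 cm²/V·s; both are simplifying assumptions made here for illustration, whereas the evaluation described above relied on Irvin's curve and measured profiles.

```python
Q_E = 1.602176634e-19  # elementary charge, C


def sheet_resistance_box(conc_cm3, depth_nm, mobility_cm2_vs=50.0):
    """Sheet resistance (ohm/sq) of a uniformly doped, fully activated layer.

    conc_cm3: active dopant concentration in cm^-3
    depth_nm: layer thickness in nm
    mobility_cm2_vs: assumed constant hole mobility in cm^2/(V s)
    """
    depth_cm = depth_nm * 1e-7  # 1 nm = 1e-7 cm
    return 1.0 / (Q_E * mobility_cm2_vs * conc_cm3 * depth_cm)


# Surface concentrations quoted for the two RVD profiles, with a 20-nm box depth.
for conc in (5e18, 1.2e19):
    rs = sheet_resistance_box(conc, 20.0)
    print(f"N = {conc:.1e} cm^-3, t = 20 nm -> Rs of roughly {rs / 1e3:.1f} kohm/sq")
```

Under these assumptions the result is of the order of several kΩ/sq. Since Rs scales as 1/(N t), a shallower base keeps the same sheet resistance only if the concentration is raised in proportion.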

[Plot: boron concentration versus depth (nm), 0 to 100 nm.]
Fig. 5. Typical boron profiles formed by RVD. Doping temperature was 900°C and doping time was 60 s.
B2H6 gas flow rates were 40 and 50 ml/min.

[Plot: sheet resistance versus diffusion depth (nm) for RVD and for BF2 I/I.]
Fig. 6. Relation between the diffusion depth and sheet resistance. Diffusion depth was defined as the point
where the boron concentration was 10¹⁸ cm⁻³. Activation annealing after BF2 ion implantation (I/I) was at 950°C
for 10 s.

3.1.2. Electrical characteristics


The relationship between the diffusion depth and sheet resistance is shown in Fig. 6. The
diffusion depth was defined as the point where the boron concentration was 10¹⁸ cm⁻³. Data
from BF2 ion implantation (I/I) after activation annealing at 950°C for 10 s are also shown.
The advantage of RVD over BF2 implantation is clear. With BF2 I/I, the diffusion depth of
a boron-doped layer with a few kΩ/sq. sheet resistance is 40 to 50 nm. With RVD, it is less
than 30 nm. Therefore, RVD should enable us to make a shallower and higher-concentration
base than we can with BF2 I/I.
The ultra-shallow base resulted in a 100-GHz fT when the emitter area was 0.6 × 10 µm
(Fig. 7).23 Such characteristics are difficult to obtain with conventional base formation
processes such as ion implantation. Thus, RVD is an effective way to fabricate Si bipolar
transistors that offer high fT.

[Plot: fT (GHz), 0 to 120, versus JC (mA/µm²), 0.01 to 10 on a logarithmic axis; AE = 0.6 × 10 µm.]
Fig. 7. Dependence of cutoff frequency (fT) on collector current density (JC). The emitter area (AE) was 0.6 × 10
µm.

3.2. Self-aligned Si bipolar transistor with stacked metal/in-situ doped poly-Si electrodes
As shown in Fig. 2, base resistance, base transit time, and parasitic capacitance should be
simultaneously reduced to improve the operating speed in ECL circuits. Poly-Si electrodes
combined with self-aligned transistor structures are an effective means of reducing base
resistance, through their low resistance and short space between the emitter and base elec-
trode. Base resistance has not been greatly reduced, though, because finer fabrication tech-
nology has led to increased link base resistance. Moreover, as vertical scaling has im-
proved, the poly-Si base electrode has become thinner and its sheet resistance has risen.
Thus, optimization of the above three transistor parameters is not sufficient. From this
point of view, the reduction of base resistance, while maintaining low parasitic capacitance,
is a significant step towards obtaining a high-speed ECL gate.
For this reason, a self-aligned stacked metal/in-situ doped poly-Si (IDP) (referred to as
SMI) technology was developed.9 The stacked metal/IDP base electrode is formed in a self-aligned
manner through selective deposition of tungsten. In SMI technology, there is no
heat treatment - used when making transistors employing salicide - which could cause
unwanted diffusion of the base dopants, and only emitter drive-in annealing is required
after the base formation. Thus, shallow intrinsic and link base profiles can be easily pro-
duced. This SMI technology, with its small thermal budget, has enabled reduced base resis-
tance while keeping collector capacitance low — essential to achieve both a high maximum
oscillation frequency and a high cutoff frequency to improve the operating speed in both
analog and digital bipolar circuits.

3.2.1. Device structure and fabrication process


A schematic cross-section of a transistor fabricated using SMI technology is shown in Fig.
8. A self-aligned stacked tungsten/in-situ boron-doped poly-Si (IBDP) film with a boron
concentration of 10²¹ cm⁻³ is used as the base electrode. The self-aligned emitter and the
metal base electrode were kept 0.1 µm apart by using a Si3N4 side-spacer, and the width of
the link base window was 0.2 µm. The sheet resistance of the base electrode was only 2 Ω/sq.,
which is about 1/50 that of a p+ poly-Si film of the same thickness, and the contact
resistance between the tungsten and the highly concentrated IBDP was 20 Ω·µm², so the
base resistance was reduced.

[Cross-section labels: IPDP, tungsten, base, emitter, collector.]
Fig. 8. A schematic cross-section of a transistor fabricated using SMI technology. A self-aligned stacked
tungsten/in-situ boron-doped poly-Si (IBDP) film is used as the base electrode. The n+ emitter was formed by
diffusion of phosphorus from in-situ phosphorus-doped poly-Si (IPDP).

The fabrication process flow is shown in Fig. 9. Formation of the n+ buried layer (by Sb
diffusion), a 0.4-µm-thick epitaxial layer, the LOCOS, and the U-groove isolation were
followed by deposition of SiO2, undoped poly-Si, 20-nm-thick SiO2, and IBDP films, which
were patterned for the emitter area. The p intrinsic base was formed by BF2+ implantation at
10 keV through 5-nm-thick oxide and annealing at 850°C for 10 min. The p+ link base was
simultaneously formed by diffusion from the IBDP (Fig. 9(a)). Following that, a 100-nm-thick
Si3N4 side-spacer to isolate the emitter from the base was formed. The n+ emitter was
formed by diffusion of phosphorus from the 120-nm-thick in-situ phosphorus-doped poly-Si
(IPDP) at 950°C for 30 s. The base width was 90 nm and the emitter depth was 20 nm
(Fig. 9(b)). The small thermal budget after IBDP deposition enabled the link base to have a
junction depth shallower than that of the intrinsic base, which reduced the collector
capacitance. After the emitter poly-Si was covered with SiO2, the undoped poly-Si was etched
off by isotropic dry etching (Fig. 9(c)). Then, 150-nm-thick tungsten was selectively deposited
onto the IBDP to refill the space under the emitter overhang (Fig. 9(d)). At this point, the
self-aligned stacked metal/IBDP base electrode was fabricated. Finally, metallization of
three layers, using blanket W/TiN for the first and aluminum for the second and third metal
layers, was carried out. Because only emitter drive-in annealing is required after base
formation, SMI technology can be easily combined with other shallow base formation processes;
for example, those using various methods of diffusion or thin Si and/or SiGe epitaxial growth.

Fig. 9. Fabrication process flow of a transistor with SMI electrode: (a) after the p intrinsic base and the p+ link
base were formed, (b) the n+ emitter was formed, (c) the undoped poly-Si was etched off, and (d) tungsten was
selectively deposited onto the IBDP to refill the space under the emitter overhang.

To investigate the usefulness of SMI technology and the importance of lower base resistance,
three types of base electrode were fabricated (Fig. 10).10 The base electrode was
constructed with only an IBDP layer in sample (a), with a partially stacked metal/IBDP
(P-SMI) layer in sample (b), and with a fully stacked metal/IBDP (F-SMI) layer in sample (c).
The sheet resistance of the 200-nm-thick IBDP in sample (a), which is used in conventional
transistors, was 100 Ω/sq., and that of the 50-nm-thick IBDP in samples (b) and (c) was 400
Ω/sq. In the P-SMI sample (b), the metal was intentionally not stacked on a 0.4-µm-wide IBDP
section, so that its base resistance could be compared with that of the F-SMI sample (c).

Fig. 10. Three types of base electrode: (a) only IBDP, (b) partially stacked metal/IBDP (P-SMI), (c) fully
stacked metal/IBDP (F-SMI).

During the thermal oxidation to create a 5-nm-thick SiO2 layer before the intrinsic base
implantation, the SiO2 on the IBDP became thicker at the periphery of the emitter window
than it did farther away from the emitter. Thus, the P-SMI base electrode was formed by
using a brief HF-dip etch to remove the thinner area of SiO2 while leaving the thicker
area.

3.2.2. Transistor performance


Base resistance and maximum oscillation frequency fmax as a function of collector current
for a 0.2 x 2 µm effective emitter area at a 2-V collector-to-emitter bias voltage are shown
in Fig. 11 for the three types of base electrodes. Most other transistor characteristics -
emitter resistance rE of about 25 Ω, collector-base capacitance CjC of about 4 fF, emitter-base
capacitance CjE of about 3 fF, substrate capacitance of about 4 fF (capacitances were
measured at zero bias), and maximum cutoff frequency fT of about 50 GHz - were nearly
the same, as shown in Table 1, because both the intrinsic vertical profiles with the
ion-implanted base and the transistor structures were identical except for the base electrode.
Moreover, there was no excess base current observed in the SMI base electrode transistors
compared with an IBDP base electrode transistor. This indicates that the metal on the IBDP
was far enough away to prevent an electron current flowing into the base. The collector
resistance rC for the P-SMI and F-SMI base electrode samples was about half that of the
IBDP base electrode sample. This is because the tungsten film was deposited on the collector
poly-Si pad simultaneously with the tungsten film on the base IBDP in the SMI transistors
(Fig. 8) and the tungsten/poly-Si contact area was much larger than the contact hole
area.

Fig. 11. Dependence of base resistance and maximum oscillation frequency fmax on the collector current for a
0.2 x 2 µm effective emitter area at a 2-V collector-to-emitter bias voltage.

Table 1. Typical transistor characteristics with a 0.2 x 2 µm effective emitter area.
              IBDP    P-SMI   F-SMI
rC (Ω)        42      25      24
rE (Ω)        26      26      27
rb (Ω)        310     360     120
CjC (fF)      3.6     4.0     4.9
CjE (fF)      3.1     3.2     3.4
CS (fF)       4.8     4.1     3.8
fT (GHz)      52      47      50
fmax (GHz)    56      47      73

The base resistance, obtained by s-parameter measurement using a network analyzer,
was 310 Ω for the IBDP base electrode sample, 360 Ω for the P-SMI base electrode sample,
and 120 Ω for the F-SMI base electrode sample, at the peak-fT collector current of about
1 mA. The base resistance of the P-SMI base electrode sample was higher than that of the
conventional IBDP base electrode sample because of the higher sheet resistance (400 Ω/sq.)
of the thin IBDP film on which there was no stacked tungsten. This indicates that the
stacked metal/IBDP base electrode is only effective if the metal film is formed in a
self-aligned manner on the IBDP film (in our case by using selective CVD). If this is done,
SMI technology is clearly an extremely effective means of reducing base resistance.
The maximum oscillation frequency fmax was close to inversely proportional to the square root
of the base resistance because all samples had nearly the same cutoff frequency and collector
capacitance. The peak fmax was 56 GHz for the IBDP base electrode sample, 47 GHz for
the P-SMI base electrode sample, and 73 GHz for the F-SMI base electrode sample, at a
collector current of 0.7 mA to 1.2 mA. The high fmax in the F-SMI base electrode sample
was due to the reduction of the base resistance while keeping the collector capacitance low.
Consequently, in the F-SMI base electrode transistor, in comparison to the IBDP base electrode
transistor, the base resistance was reduced to about 40% and the maximum oscillation
frequency was increased by about 30%. Therefore, SMI technology is very suitable for
high-frequency analog circuits.
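
The link between base resistance and fmax can be checked against the Table 1 values with the
common first-order estimate fmax ≈ sqrt(fT / (8π rb CjC)). The following is a minimal sketch
under stated assumptions: the formula is a textbook approximation that the text does not quote,
and the zero-bias CjC values are used, so the absolute estimates come out lower than the
measured fmax. Only the ratios between samples are meaningful here.

```python
import math

# Table 1 values for the 0.2 x 2 um emitter: fT (Hz), base resistance (ohm),
# zero-bias collector-base capacitance (F), and measured peak fmax (Hz).
TABLE_1 = {
    "IBDP": (52e9, 310.0, 3.6e-15, 56e9),
    "P-SMI": (47e9, 360.0, 4.0e-15, 47e9),
    "F-SMI": (50e9, 120.0, 4.9e-15, 73e9),
}


def f_max_estimate(f_t, r_b, c_jc):
    """First-order estimate (assumed model): fmax ~ sqrt(fT / (8*pi*rb*CjC))."""
    return math.sqrt(f_t / (8.0 * math.pi * r_b * c_jc))


# Estimated fmax for each base electrode type.
estimates = {
    name: f_max_estimate(f_t, r_b, c_jc)
    for name, (f_t, r_b, c_jc, _measured) in TABLE_1.items()
}

# Compare estimated and measured fmax, both normalized to the IBDP sample.
for name, (_f_t, _r_b, _c_jc, measured) in TABLE_1.items():
    est_ratio = estimates[name] / estimates["IBDP"]
    meas_ratio = measured / TABLE_1["IBDP"][3]
    print(f"{name:6s} estimated ratio {est_ratio:.2f}, measured ratio {meas_ratio:.2f}")
```

With these inputs the estimated F-SMI/IBDP ratio is about 1.35 against a measured 73/56 ≈ 1.30,
and the estimated P-SMI/IBDP ratio is about 0.84 against a measured 47/56 ≈ 0.84.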

3.2.3. Circuit performance


The dependence of the gate delay time on the switching current measured in 51-stage
differential ECL ring oscillators with a fan-in and a fan-out of 1 for a 0.2 x 2 µm effective
emitter area at a single-ended voltage swing of 250 mV and a supply voltage of 3.2 V is
shown in Fig. 12. The effect of lower base resistance was most noticeable in the high
switching current region, because the product of the base resistance and diffusion
capacitance was dominant. Ultra-high-speed operation with a 12-ps minimum gate delay time
was observed at a switching current of 0.94 mA in the fully stacked metal/IBDP base electrode
sample, compared with a gate delay time of 14.3 ps in the IBDP base electrode sample
and 15.6 ps in the partially stacked metal/IBDP base electrode sample. This clearly
indicates that the reduction of base resistance when using SMI technology also makes it very
suitable for high-speed digital circuits.

Fig. 12. Dependence of the gate delay time on the switching current measured in 51-stage differential ECL ring
oscillators with a fan-in and a fan-out of 1 for a 0.2 x 2 µm effective emitter area at a voltage swing of 250 mV
and a supply voltage of 3.2 V.
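
Gate delay in a ring oscillator is conventionally extracted from the oscillation frequency as
tpd = 1 / (2 N f), where N is the number of stages. The text reports only the delays, so the
sketch below works backwards to the oscillation frequencies they imply. The function names are
illustrative, and the relation is the standard ring-oscillator one rather than a formula quoted
in the text.

```python
def gate_delay_s(n_stages, f_osc_hz):
    """Per-gate delay from ring-oscillator frequency: tpd = 1 / (2 * N * f)."""
    return 1.0 / (2.0 * n_stages * f_osc_hz)


def oscillation_frequency_hz(n_stages, t_pd_s):
    """Inverse relation: the frequency a ring of N gates with delay tpd runs at."""
    return 1.0 / (2.0 * n_stages * t_pd_s)


N_STAGES = 51  # stage count quoted for the ring oscillators in Fig. 12

# Minimum gate delays quoted in the text for each base electrode type.
delays_ps = {"F-SMI": 12.0, "IBDP": 14.3, "P-SMI": 15.6}

for name, t_ps in delays_ps.items():
    f_hz = oscillation_frequency_hz(N_STAGES, t_ps * 1e-12)
    print(f"{name:6s} {t_ps:5.1f} ps/gate -> about {f_hz / 1e6:.0f} MHz")
```

For the 12-ps F-SMI result this gives an oscillation frequency of roughly 817 MHz.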


To demonstrate the high-speed characteristics of an SMI transistor when applied to digital
circuits, two 1/8 frequency dividers, one dynamic and the other static, were measured.
The first stage of the dynamic frequency divider consisted of a dynamic T-flip-flop
(D-TFF). The D-TFF circuit was based on the regenerative frequency division principle.
The internal single-ended voltage swing for the D-TFF was 500 mV, which was optimized
taking into account the gain cutoff frequency of the Gilbert multiplier. The other circuits -
the master-slave TFFs (MS-TFFs) of the second and third stages, the internal buffers, and
the input/output buffers - were the same as in the static divider. The internal single-ended
voltage swing of 250 mV for the MS-TFF was selected by calculating the gate delay time of
a single flip-flop biased to operate as an inverter. To increase the operating speed, two
emitter followers were used in each stage after the flip-flops and the internal buffers. The
emitter size of each emitter-follower transistor was optimized to reduce the loading of the
flip-flops. The internal buffers, with a single-ended voltage swing of 250 mV, were
conventional differential amplifiers. They were used to reshape the output signal of the
flip-flops and for the lowpass filter. The input buffer consisted of three emitter followers with
a 50-Ω matching resistor on the chip. The output buffer consisted of a differential amplifier
with a 100-Ω output resistor.
The single-ended input signal and the output waveforms of the dynamic and static 1/8
frequency dividers, measured on-wafer with a microwave probing station, are shown in
Fig. 13. Maximum operating frequencies as high as 45.2 GHz for the dynamic divider and
as high as 28 GHz for the static divider were observed. The lower limit of the operating
frequency for the dynamic divider was 14 GHz, about one-third of the maximum operating
frequency, as predicted from the underlying principle. The minimum input power of the
static divider was less than 10 dBm up to 20 GHz. The power consumption of the D-TFF
and each MS-TFF was 46 mW and 90 mW, respectively, at a supply voltage of 5 V. The
SMI technology produces a shallow link base which provides low collector capacitance
with low base resistance, so both high-speed and low-power performance are achieved, even
when using a conventional ion-implanted base.
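
The divider numbers can be cross-checked with a few lines of arithmetic. The sketch below is
an idealized illustration, not the authors' analysis: it assumes a regenerative divide-by-two
stage whose loop must pass fin/2 and reject 3fin/2, which for a fixed loop cutoff gives an
input range whose lower edge is one third of its upper edge. Function names are hypothetical.

```python
def divided_output_hz(f_in_hz, ratio=8):
    """Output frequency of a 1/ratio frequency divider."""
    return f_in_hz / ratio


def regenerative_input_range_hz(f_in_max_hz):
    """Idealized input range of a regenerative divide-by-two stage.

    Assumption: the loop passes fin/2 and rejects 3*fin/2, so a loop cutoff fc
    allows 2*fc/3 < fin < 2*fc, i.e. the lower edge is one third of the upper.
    """
    return f_in_max_hz / 3.0, f_in_max_hz


dynamic_out = divided_output_hz(45.2e9)   # dynamic 1/8 divider at its maximum input
static_out = divided_output_hz(28e9)      # static 1/8 divider at its maximum input
f_low, f_high = regenerative_input_range_hz(45.2e9)

print(f"dynamic divider output: {dynamic_out / 1e9:.2f} GHz")
print(f"static divider output:  {static_out / 1e9:.2f} GHz")
print(f"ideal dynamic input range: {f_low / 1e9:.1f} to {f_high / 1e9:.1f} GHz")
```

The outputs match the 5.65-GHz and 3.5-GHz waveforms in Fig. 13, and the idealized lower limit
of about 15.1 GHz is close to the measured 14 GHz.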

Fig. 13. The single-ended input signal and the output waveforms of the dynamic and static 1/8 frequency
dividers. The 45.2-GHz input and 5.65-GHz output signals are for the dynamic frequency divider (a), and the
28-GHz input and 3.5-GHz output signals are for the static frequency divider (b).

4. SiGe HBT — Feasibility Study for 40-Gb/s Optical-Fiber-Links


A self-aligned selective-epitaxial-growth (SEG) SiGe-base HBT with self-aligned stacked
metal/IDP (SMI) electrodes has been developed, and its ultra-high-speed performance was
investigated. In this section, selective epitaxial growth of a Si/SiGe multilayer by UHV/
CVD, the fabrication process and characteristics of a self-aligned SEG SiGe HBT, and its
application to various ICs for optical-fiber-link systems are described.

4.1. Selective epitaxial growth of the SiGe layer and potential of the SiGe HBT
The Si/SiGe multilayer was formed by selective epitaxial growth using a UHV/CVD sys-
tem. In this section, the epitaxial growth equipment and the experimental conditions and
characteristics of a SiGe HBT with a conventional double poly-Si structure are described.

4.1.1. UHV/CVD system and experimental conditions


The UHV/CVD system for selective epitaxial growth of the Si and SiGe layers is shown in
Fig. 14. In this system, the reaction chamber was evacuated by a turbo-molecular pump
(TMP) to reduce the partial pressure of contaminant gases, such as H2O and oxygen. The
base pressure of the chamber was about 1×10⁻⁶ Pa. A load-lock chamber was used to prevent
the reaction chamber from being exposed to the air. The wafer was set on the susceptor
and heated by an RF induction coil. To prevent deposition of Si on the chamber and
metal contamination, the temperature of the stainless-steel wall was kept at 5°C by
circulating thermal oil. The bypass line and conductance valve were used for high-pressure
H2 pre-cleaning.

Fig. 14. UHV/CVD system for selective epitaxial growth of Si and SiGe layers.

The wafer surface was hydrogen-terminated by dipping in an HF solution. However,
that termination was not perfect, so high-pressure H2 pre-cleaning for the low-temperature
epitaxial growth was also used. Contaminants such as oxygen and carbon on the wafer
surface were completely removed after the high-pressure H2 pre-cleaning.24
The selective epitaxial growth was done using only Si2H6 and GeH4 diluted with H2. The
doping source was B2H6 diluted with H2. The temperature for the epitaxial growth of Si and
SiGe was 600 and 575°C, respectively. The growth rate of Si and SiGe was 2.9 nm/min and
7.8 nm/min, respectively.25 An SEM bird's-eye view of selective epitaxial growth of SiGe is
shown in Fig. 15. SiGe was grown on the Si surface selectively with respect to only SiO2,
and a poly-SiGe layer was grown on Si3N4. This was due to a difference in the incubation
times before starting growth on SiO2 and on Si3N4. Therefore, to obtain selective growth of
the SiGe layer, the whole surface except for the epitaxial growth area should be covered
with SiO2.

Fig. 15. An SEM bird's-eye view of selective epitaxial growth of SiGe. The SiGe epitaxial layer was grown
selectively with respect to only SiO2.

4.1.2. High-frequency characteristics in a graded-Ge-profile SiGe HBT


The cutoff frequency fT as a function of the collector current IC for a conventional double
poly-Si structure SiGe HBT with a 25% graded Ge profile is shown in Fig. 16. A maximum
cutoff frequency of about 130 GHz was achieved at a collector current of 6 mA.14 As Fig.
16 also shows, the peak fT increased with increasing maximum Ge content in the graded
Ge profile. The peak fT [...] Ge content. This suggests that the acceleration of electrons by the
drift field in a base layer with a graded profile is an effective way to increase the cutoff
frequency. The base transit time can be reduced to 0.51 ps with a 25% graded Ge profile
because of the highly efficient electron injection from the emitter to the base and the high
drift field in the base due to the optimized SiGe-HBT profile.
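
To put the 0.51-ps base transit time in context, the cutoff frequency can be converted into a
total emitter-to-collector delay with the standard relation τEC = 1 / (2π fT). The sketch below
is illustrative only: the text does not give a breakdown of the remaining delay components, so
the code reports just the share taken by the quoted base transit time.

```python
import math


def total_delay_s(f_t_hz):
    """Total emitter-to-collector delay implied by fT: tau_EC = 1 / (2 * pi * fT)."""
    return 1.0 / (2.0 * math.pi * f_t_hz)


F_T_PEAK_HZ = 130e9          # peak fT quoted for the 25% graded Ge profile
TAU_BASE_S = 0.51e-12        # base transit time quoted in the text

tau_ec = total_delay_s(F_T_PEAK_HZ)
base_share = TAU_BASE_S / tau_ec

print(f"total delay at peak fT: {tau_ec * 1e12:.2f} ps")
print(f"base transit share:     {base_share * 100:.0f} %")
```

At 130 GHz the total delay is about 1.22 ps, so the base transit time accounts for roughly
40% of it under this simple relation.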

Fig. 16. Cutoff frequency fT as a function of collector current IC for a conventional double poly-Si structure
SiGe HBT with a 25% graded Ge profile, and the peak fT dependence on maximum Ge content. A maximum
cutoff frequency of about 130 GHz was achieved at a collector current of 6 mA for a 0.35 x 3.55 µm effective
emitter area.

4.2. Self-aligned SEG SiGe HBT


The fabrication process and device technologies used to make a high-speed self-aligned
selective-epitaxial-growth (SEG) SiGe-base HBT, and its potential application in the ICs of
optical-fiber-link systems are described in this section. A SiGe base self-aligned to the
emitter to reduce collector capacitance was selectively grown by using a UHV/CVD sys-
tem. A self-aligned stacked metal/in-situ doped poly-Si electrode technology enabled low
parasitic resistance, and allowed the intrinsic base profile to be kept shallow, so it is well-
suited to a SiGe-base HBT. A wide insulator refilled trench was introduced to reduce sub-
strate capacitance.

4.2.1. Device structure and fabrication process


A schematic cross-sectional view of a self-aligned SEG SiGe-base HBT with SMI electrodes
is shown in Fig. 17.15 Three key features enable high-speed operation in this transistor.
First, to achieve a high cutoff frequency with low parasitic capacitance, a SiGe base
self-aligned to the emitter was formed by SEG. A TEM cross-sectional view of the intrinsic
SEG Si/SiGe region is shown in Fig. 18. The impurity profile of the intrinsic region is
shown in Fig. 19. The SEG layer consisted of a 20-nm-thick Si cap, a 10-nm-thick dual-graded
Ge-profile (from 0 to 10% over 5 nm and from 10 to 15% over 5 nm) Si1-xGex layer, a
40-nm-thick Si0.85Ge0.15 layer, and a 10-nm-thick Ge-retrograded Si1-xGex layer. The measured
Ge profile agrees with our intended Ge profile (the dotted line in Fig. 19). A 20-nm-thick
10[...] cm⁻³ boron-doped Si1-xGex layer was formed as the intrinsic base, which was 30 nm
wide. Double selective implantation of phosphorus in the undoped SiGe and a 0.15-µm-thick
Si layer increased the collector doping level to about 10¹⁸ cm⁻³. The shallow emitter junction
(20 nm deep) was formed by diffusion from the IPDP into the Si cap at 900°C for 30 s.

Fig. 18. A TEM cross-section of the intrinsic region of the self-aligned SEG SiGe-base HBT. The emitter is
0.14 µm wide.

Fig. 19. Impurity and Ge profile of the intrinsic region. A dotted line shows the designed Ge profile.

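
The designed SEG layer stack can be written as a piecewise-linear Ge profile. The following is
a minimal sketch under stated assumptions: depth is measured from the top of the Si cap, each
graded segment is taken as linear, and the 10-nm retrograded layer is assumed to fall linearly
from 15% to 0%, which the text does not state explicitly. The function name is hypothetical.

```python
def ge_fraction(depth_nm):
    """Designed Ge mole fraction x of Si(1-x)Ge(x) at a depth below the SEG surface.

    Layer thicknesses follow the text; the linear shape of each graded
    segment and the retrograde end point of 0% are assumptions.
    """
    if depth_nm < 0.0 or depth_nm > 80.0:
        raise ValueError("depth is outside the 80-nm SEG layer stack")
    if depth_nm <= 20.0:                      # 20-nm Si cap
        return 0.0
    if depth_nm <= 25.0:                      # first grade: 0 to 10% over 5 nm
        return 0.10 * (depth_nm - 20.0) / 5.0
    if depth_nm <= 30.0:                      # second grade: 10 to 15% over 5 nm
        return 0.10 + 0.05 * (depth_nm - 25.0) / 5.0
    if depth_nm <= 70.0:                      # 40-nm Si0.85Ge0.15 plateau
        return 0.15
    return 0.15 * (80.0 - depth_nm) / 10.0    # assumed linear retrograde to 0%


for d in (0, 20, 22.5, 25, 30, 50, 70, 75, 80):
    print(f"depth {d:5.1f} nm -> Ge {ge_fraction(d) * 100:5.2f} %")
```

A table like this is convenient for overlaying the designed profile on a measured one, as the
dotted line in Fig. 19 does.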
The self-aligned SiGe base structure was fabricated as follows (Fig. 20). After the n+
buried layer was formed by Sb diffusion, a 0.15-µm epitaxial layer and trench and
wedge-shaped isolations were formed.26 Seven films (thick CVD-SiO2, undoped poly-Si, thin
SiO2, IBDP, Si3N4, thin poly-Si, and thin SiO2) were then deposited. First, the upper four
films were patterned for the emitter area and an n collector with a phosphorus concentration
of 7×10¹⁷ cm⁻³ was formed. Following that, an 80-nm-thick CVD-SiO2 side-spacer was formed
to isolate the emitter from the base. After the lower three films were etched off,
the wafer was inserted in a UHV/CVD chamber and contamination on the Si surface was
removed by H2 cleaning at a partial H2 pressure of 1300 Pa at 850°C. The 0.54-µm-wide
SiGe base and the Si-cap multilayer self-aligned to the 0.14-µm-wide emitter were
selectively grown by using a UHV/CVD system with Si2H6, GeH4, and B2H6 source gases at
575°C for the SiGe and 600°C for the Si. The whole surface except the base area was
covered with SiO2 because SiGe was grown selectively with respect to only SiO2. The
poly-Si/SiGe base contact simultaneously formed with the Si/SiGe intrinsic base was grown
on the buffer poly-Si and beneath the IBDP. The intrinsic selective-epitaxial-growth
Si/SiGe base was appropriately connected with the extrinsic IBDP base electrode, and this
self-aligned structure effectively reduced collector capacitance. The buffer poly-Si between
the IBDP and the single-crystal Si surface provided a good link between the intrinsic and
extrinsic bases. This is called a poly-Si-assisted self-aligned SEG (PASS) structure.
To reduce the parasitic resistances of the base, emitter, and collector, tungsten films were
selectively stacked in a self-aligned manner on IBDP as the base electrode and on IPDP as
the emitter and collector electrodes. Especially in the case of the narrow emitter, the metal
film deposited at the bottom of the emitter poly-Si (120 nm thick and phosphorus-doped to
4×10²⁰ cm⁻³) was very effective in reducing the emitter resistance. The sheet resistance of the
base electrode, constructed with a tungsten/IBDP film, was 2 Ω/sq., which is about 1/50
that of a p+ poly-Si film of the same thickness. The contact resistance between the tungsten
and IBDP was 20 Ω·µm². The base resistance was therefore effectively reduced. The
self-aligned emitter and metal base electrode were kept 0.07 µm apart by using a Si3N4 and SiO2
side-spacer.
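
The effect of the sheet and contact resistance figures can be illustrated with the usual
relations R = Rs · L / W for a film strip and R = ρc / A for a contact. The sketch below is
purely illustrative: the strip geometry (0.5 µm long, 1.5 µm wide) is a hypothetical example
and not a dimension from the text, and the 100 Ω/sq. poly-Si value is simply 50 times the
quoted 2 Ω/sq.

```python
def strip_resistance_ohm(sheet_ohm_per_sq, length_um, width_um):
    """Resistance of a film strip: sheet resistance times the number of squares."""
    return sheet_ohm_per_sq * length_um / width_um


def contact_resistance_ohm(rho_c_ohm_um2, area_um2):
    """Resistance of a contact with specific contact resistance rho_c over an area."""
    return rho_c_ohm_um2 / area_um2


RS_TUNGSTEN_IBDP = 2.0                 # ohm/sq., quoted for the tungsten/IBDP film
RS_POLY = 50.0 * RS_TUNGSTEN_IBDP      # ohm/sq., p+ poly-Si of the same thickness
RHO_C = 20.0                           # ohm*um^2, tungsten-to-IBDP contact

# Hypothetical strip geometry, chosen only to make the comparison concrete.
LENGTH_UM, WIDTH_UM = 0.5, 1.5

r_stacked = strip_resistance_ohm(RS_TUNGSTEN_IBDP, LENGTH_UM, WIDTH_UM)
r_poly = strip_resistance_ohm(RS_POLY, LENGTH_UM, WIDTH_UM)
r_contact = contact_resistance_ohm(RHO_C, LENGTH_UM * WIDTH_UM)

print(f"stacked tungsten/IBDP strip: {r_stacked:.2f} ohm")
print(f"poly-Si strip, same shape:   {r_poly:.1f} ohm")
print(f"tungsten/IBDP contact:       {r_contact:.1f} ohm")
```

Whatever geometry is chosen, the strip resistance ratio stays at 50, which is the point the
sheet resistance comparison makes.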
Finally, a 2-µm-wide and 4-µm-deep BPSG/SiO2 refilled trench was introduced to reduce
the substrate capacitance. SiO2 was laid down on the sidewalls, and the remainder of
the trench was filled with BPSG by annealing reflow and wet-etching. The low dielectric
constant of BPSG/SiO2 and the wide trench reduced the substrate capacitance, especially
that of the sidewall element. A conventional 0.6-µm-wide poly-Si/SiO2 refilled trench had
a substrate capacitance of 3.8 fF at zero bias and 3.3 fF at a reverse bias. The substrate
capacitance of the BPSG/SiO2 refilled trench was 1.5 fF at zero bias, with a minimum of 0.6
fF. The sidewall capacitance was 0.027 fF/µm, only about 13% of the conventional value.
Substrate capacitance is generally connected at the output node in bipolar circuits. Reducing
this capacitance was therefore an effective way to achieve high-speed operation.

Fig. 20. Process flow to fabricate the self-aligned SEG SiGe HBT. The SiGe layer was grown selectively with
respect to only SiO2.

4.2.2. Transistor performance


The transistors with an emitter area of 0.14 x 1.5 µm exhibited good I-V performance, as
shown by the Gummel plot and the IC-VCE characteristics in Fig. 21. A high current gain of
720 with a base-recombination current below 100 pA was obtained. This indicates there
were no defects created in, or relaxation of, the strained Si/SiGe multilayer during the
thermal cycle after low-temperature epitaxial growth. (This can also be confirmed by
inspection of the TEM cross-sectional view in Fig. 18.) The high Early voltage VA, more than 100
V at a collector current of 1 mA, indicates that the collector current was determined by the
drift field created by bandgap grading. The observed negative resistance at a higher collector
current was due to self-heating, because current gain decreases with increased temperature,
as can be seen in the typical characteristics of a true emitter/base heterojunction.

Fig. 21. Gummel plot and IC-VCE characteristics for a typical transistor with an emitter area of 0.14 x 1.5 µm.

The cutoff frequency fT and maximum oscillation frequency fmax of these transistors were
95 GHz and 97 GHz, respectively, at a collector-to-emitter bias voltage of 2 V and a collector
current of 2 mA (Fig. 22). These attractive high-frequency characteristics were attributed
to the well-balanced transistor characteristics - a fast forward-transit time of 1.18 ps
and low collector capacitance of 3.6 fF - enabled by using the SEG SiGe base. The frequency
dependence of the magnitude of h21 and unilateral gain U at VCE of 2 V and IC of 2
mA, obtained from high-frequency s-parameter measurements up to 110 GHz, is shown in
Fig. 23. Because U deviated from the -20 dB/dec dependence on frequency above
40 GHz, fmax was estimated by extrapolating U from the frequency range below 40 GHz.
Typical characteristics of a transistor with an emitter area of 0.14 x 1.5 µm are listed in Table 2.
The emitter resistance was 50 Ω despite the narrow emitter (0.14 µm wide), and this low
resistance was attributed to the stacked tungsten/IDP emitter electrode.
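
The extrapolation of U along a -20 dB/dec slope can be sketched in a few lines. On that slope
U(f) in dB equals 20·log10(fmax / f), so each point gives fmax = f · 10^(U/20). The sketch
below is an illustration of the procedure, not the authors' fitting routine, and the sample
points are synthetic values generated from an assumed 97-GHz fmax rather than measured data.

```python
import math


def f_max_from_point(f_hz, u_db):
    """fmax implied by one (frequency, U) point on a -20 dB/dec roll-off."""
    return f_hz * 10.0 ** (u_db / 20.0)


def f_max_extrapolated(points):
    """Fit a fixed -20 dB/dec line to (f_hz, u_db) points and return its 0-dB crossing.

    With the slope fixed, the least-squares intercept in the log domain is the
    mean of log10 of the single-point estimates (their geometric mean).
    """
    logs = [math.log10(f_max_from_point(f_hz, u_db)) for f_hz, u_db in points]
    return 10.0 ** (sum(logs) / len(logs))


# Synthetic points below 40 GHz, generated from an ideal roll-off (not measured data).
ASSUMED_F_MAX_HZ = 97e9
synthetic_points = [
    (f_hz, 20.0 * math.log10(ASSUMED_F_MAX_HZ / f_hz))
    for f_hz in (5e9, 10e9, 20e9, 40e9)
]

print(f"extrapolated fmax: {f_max_extrapolated(synthetic_points) / 1e9:.1f} GHz")
```

Restricting the fit to points below 40 GHz mirrors the text: above that frequency U no longer
follows the ideal slope, so including those points would bias the crossing.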

Fig. 22. Cutoff frequency (fT) and maximum oscillation frequency (fmax) of the transistors with an emitter area of
0.14 x 1.5 µm as a function of collector current.

Fig. 23. Frequency dependence of the magnitude of h21 and unilateral gain U at VCE of 2 V and IC of 2 mA from
high-frequency s-parameter measurements up to 110 GHz.

Table 2. Typical characteristics of a transistor with an emitter area of 0.14 x 1.5 µm.

BV[...]   2.0 V      rE      50 Ω      CjC   3.6 fF
VA        >100 V     [...]   210 Ω

The dependence of the gate delay time on the switching current, measured in 45-stage
differential ECL ring oscillators with a fan-in and a fan-out of 1 at a single-ended
logic-swing voltage of 250 mV and a supply voltage of 3.5 V, is shown in Fig. 24. The measured
minimum gate-delay time for the transistors was 8.0 ps at a switching current of 1.5 mA. A
faster gate-delay time of 7.7 ps has been achieved in a CML (current-mode-logic) circuit.26
The ultra-fast performance of the ECL gate is derived from the fully-self-aligned SEG SiGe
base structure with its fast forward-transit time and low collector capacitance, the low
parasitic resistance SMI electrodes, and the low substrate capacitance BPSG/SiO2-refilled trench.
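
Two quantities follow directly from the quoted operating point. In a differential current
switch the single-ended swing equals the switching current times the load resistance, and the
energy drawn per gate delay is power times delay. The sketch below is a rough illustration
under stated assumptions: it counts only the current-switch core (switching current times
supply voltage) and ignores the emitter-follower currents, so the energy is a lower bound
rather than a figure from the text.

```python
def load_resistance_ohm(swing_v, switching_current_a):
    """Load resistor that gives the stated single-ended swing: R = Vswing / Isw."""
    return swing_v / switching_current_a


def core_energy_j(switching_current_a, supply_v, gate_delay_s):
    """Energy drawn by the current-switch core during one gate delay.

    Assumption: core power = Isw * Vsupply; emitter-follower current is ignored.
    """
    return switching_current_a * supply_v * gate_delay_s


SWING_V = 0.250        # single-ended logic swing
I_SW_A = 1.5e-3        # switching current at the minimum gate delay
V_SUPPLY = 3.5         # supply voltage
T_PD_S = 8.0e-12       # minimum measured gate delay

r_load = load_resistance_ohm(SWING_V, I_SW_A)
energy = core_energy_j(I_SW_A, V_SUPPLY, T_PD_S)

print(f"load resistance: {r_load:.0f} ohm")
print(f"core energy per gate delay: {energy * 1e15:.0f} fJ")
```

At 1.5 mA the 250-mV swing corresponds to a load of about 167 Ω, and the core draws about
42 fJ per 8-ps gate delay.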

Fig. 24. Dependence of the gate delay time on the switching current measured in 45-stage differential ECL ring
oscillators with a fan-in and a fan-out of 1 at a single-ended logic swing voltage of 250 mV and a supply voltage
of 3.5 V for transistors with an emitter area of 0.14 x 1.5 µm.

4.3. Circuit performance of a 40-Gb/s transmitter and receiver chipset


As applications of these SiGe HBTs, ICs for optical-fiber-link systems have been developed.
A block diagram of a transmitter and a receiver for an optical-fiber-link communication
system is shown in Fig. 25. The developed IC chipset is shown as the shaded blocks.
They include a time-division multiplexer (MUX) in the transmitter, and a preamplifier, an
automatic-gain-control (AGC) amplifier, a decision circuit, a demultiplexer (DEMUX), and
a frequency divider in the receiver.

Fig. 25. Block diagram of a transmitter and a receiver for an optical-fiber-link communication system.

4.3.1. Frequency divider


A block diagram, a circuit photomicrograph of the master-slave T-type flip-flop and an
internal buffer, and the single-ended 50-GHz input and output waveforms (measured
on-wafer) of a 1/8 static frequency divider are shown in Fig. 26. The MS-TFF with the internal
buffer occupied 150 x 60 µm. It was laid out as symmetrically as possible. The input buffer
consisted of three emitter followers with a 50-Ω matching resistor on the chip. The output
buffer consisted of a differential amplifier with a 100-Ω output resistor. The emitter length
of an upper four-quadrant switching transistor and a lower switching transistor was 1.5 µm
in the MS-TFF. The internal single-ended voltage swing for the MS-TFF was optimized to
250 mV by calculating the gate delay time of a single flip-flop biased to operate as an
inverter. To obtain higher operating speed, two emitter followers were used in each stage
after the flip-flops. The emitter lengths were optimized to 2.5 and 3.5 µm to reduce the
loading of the flip-flops. The internal buffers, with a single-ended voltage swing of 250
mV, were conventional differential amplifiers. They were used to reshape the output signal
of the flip-flop and were also used for the lowpass filter. The power consumption of the
MS-TFF was 119 mW at a supply voltage of -5.5 V.

Fig. 26. Block diagram, a circuit photomicrograph of the master-slave T-type flip-flop and an internal buffer,
and single-ended 50-GHz input and the output waveforms (measured on-wafer) of a 1/8 static frequency divider
at a supply voltage of -5.5 V. The MS-TFF with the internal buffer occupies 150 x 60 µm.

4.3.2. 2:1 MUX


A block diagram constructed with basic circuit core modules, the eye diagrams of 20-Gb/s
input and 40-Gb/s outputs from the selector and the DFF, and a chip photomicrograph of a
2:1 MUX are shown in Fig. 27. The core circuits - a selector for multiplexing, the DFF for
retiming, an internal buffer, data/clock input with an on-chip 50-Ω matching resistor, and a
data output buffer - were designed in the same way as for the frequency divider. A measured
data rate of 40 Gb/s was obtained for the MUX with a DFF for retiming by a clock at 40 GHz.
The total power consumption was 870 mW at a supply voltage of -5 V on a 1.1 x
1.2-mm chip.

Fig. 27. Block diagram constructed with basic circuit core modules, the eye diagrams of 20-Gb/s input and
40-Gb/s outputs from the selector and the DFF, and a chip photomicrograph of a 2:1 MUX.

4.3.3. 1:2 DEMUX


A block diagram, two 20-Gb/s output eye diagrams, and a chip photomicrograph of a 1:2
DEMUX are shown in Fig. 28. This DEMUX was constructed with the modular approach
also used for the 2:1 MUX. A 2:1 MUX to generate 40-Gb/s data was provided on the chip,
because the maximum input data rate in the measurement setup was only 20 Gb/s. The core
DFF circuits were the same as in the MUX. The total power consumption was 1 W at a
supply voltage of -5 V on a 1.1 x 1.35-mm chip.
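
At the bit level, a 2:1 MUX interleaves two 20-Gb/s lanes into one 40-Gb/s stream and a 1:2
DEMUX splits it back. The sketch below models only that logical behavior, not the selector and
DFF circuits themselves. The function names and the choice of which lane is sent first are
assumptions made for illustration.

```python
def mux_2to1(lane_a, lane_b):
    """Interleave two equal-length bit lanes into one stream (lane A bit first)."""
    if len(lane_a) != len(lane_b):
        raise ValueError("both input lanes must carry the same number of bits")
    stream = []
    for bit_a, bit_b in zip(lane_a, lane_b):
        stream.extend((bit_a, bit_b))
    return stream


def demux_1to2(stream):
    """Split a serial stream back into its even-indexed and odd-indexed lanes."""
    return stream[0::2], stream[1::2]


def bit_period_ps(rate_gbps):
    """Bit period in picoseconds for a data rate given in Gb/s."""
    return 1000.0 / rate_gbps


lane_a = [1, 0, 1, 1]
lane_b = [0, 0, 1, 0]
serial = mux_2to1(lane_a, lane_b)
out_a, out_b = demux_1to2(serial)

print("serial stream:", serial)
print("recovered lanes:", out_a, out_b)
print("bit period at 20 Gb/s:", bit_period_ps(20), "ps; at 40 Gb/s:", bit_period_ps(40), "ps")
```

This also shows why the DEMUX test chip carries its own MUX: two 20-Gb/s sources are enough to
build the 40-Gb/s stream that the DEMUX then has to separate again.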

4.3.4. Preamplifier
A circuit diagram, a chip photomicrograph, and the measured frequency response of the
transimpedance of a preamplifier IC are shown in Fig. 29. To improve the preamplifier
bandwidth, a common-base transistor QCB was introduced in front of the transimpedance
amplifier, because the large time constant caused by the parasitic capacitance CPD of the
photodiode at the input node is the dominant factor in the frequency response. QCB separated
the photodiode capacitance CPD from the transimpedance amplifier and reduced the parasitic
capacitance to CjC. At the same time, QCB provided a lower and stable impedance at the input
node. This improved the bandwidth of the transimpedance amplifier. The SiGe-base HBT
provided very low collector capacitance, so the common-base transistor was an effective
way to widen the bandwidth. From the measured frequency response of the
transimpedance, a 24-GHz bandwidth was obtained in a conventional preamplifier. On the
other hand, a preamplifier with a common-base transistor achieved a transimpedance of
48.7 dBΩ and a bandwidth of 35 GHz. The bandwidth was improved by more than 40% compared to
that of a conventional preamplifier. The chip area was 0.74 x 0.74 mm, supply voltages
were 8 V and -5 V, and the power dissipation was 0.27 W.
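
The transimpedance figure is easier to interpret in ohms. A value in dBΩ is 20·log10 of the
transimpedance in ohms, so 48.7 dBΩ corresponds to a few hundred ohms. In the sketch below the
100-µA photocurrent is a hypothetical input chosen for illustration and is not a value from the
text.

```python
import math


def dbohm_to_ohm(z_dbohm):
    """Convert a transimpedance in dB-ohm to ohms: Z = 10 ** (dB / 20)."""
    return 10.0 ** (z_dbohm / 20.0)


def ohm_to_dbohm(z_ohm):
    """Convert a transimpedance in ohms to dB-ohm: dB = 20 * log10(Z)."""
    return 20.0 * math.log10(z_ohm)


Z_T_DBOHM = 48.7                 # transimpedance quoted for the preamplifier
PHOTOCURRENT_A = 100e-6          # hypothetical input photocurrent

z_t_ohm = dbohm_to_ohm(Z_T_DBOHM)
v_out = z_t_ohm * PHOTOCURRENT_A

print(f"transimpedance: {z_t_ohm:.0f} ohm")
print(f"output for 100 uA input: {v_out * 1e3:.1f} mV")
```

The quoted 48.7 dBΩ is therefore a transimpedance of about 272 Ω.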

Fig. 29. Circuit diagram, a chip photomicrograph, and measured frequency response of the transimpedance of a
preamplifier IC.

4.3.5. AGC amplifier


A block diagram, a circuit, a chip photomicrograph, and the measured frequency response
of gain of an AGC amplifier core IC are shown in Fig. 30. The AGC amplifier core
consisted of an input buffer, two variable gain stages A1 and A2, a constant gain amplifier A3,
and an output buffer. The second variable gain stage A2 was composed of an upper
four-quadrant multiplier used for signal amplification, a lower multiplier used for gain control, a
transimpedance amplifier as a load circuit, and an emitter follower. To achieve a wide
bandwidth of over 30 GHz, a transimpedance amplifier was introduced as an active load
circuit in every amplifier stage. To suppress the dependence of bandwidth on gain, a
peaking capacitor CP in a transistor pair in the upper multiplier was used because the transistor
pair is the dominant factor in the frequency response at the minimum gain. The designed
peaking capacitance was 30 fF to enable a wide bandwidth with little deviation for stable
operation over a wide dynamic range of gain. The bandwidth stayed between 31.6 GHz and
32.7 GHz, with a flat frequency response, over a gain range from -6 dB to 13 dB.
The chip size was 0.95 x 1.08 mm, the supply voltage was -7.5 V, and the
power dissipation was 0.73 W.

Fig. 30. Block diagram, circuit (schematic of A2), a chip photomicrograph, and measured frequency response of
gain of an AGC amplifier core IC.

4.3.6. Decision circuit


A block diagram, an eye diagram at 40 Gb/s, and a chip photomicrograph of a decision
circuit are shown in Fig. 31. To generate a bit stream at a data rate of 40 Gb/s, a 2:1 MUX
was introduced in front of the decision circuit. The decision circuit consisted of a
preamplifier, a master-slave DFF, a post amplifier, and a clock input stage. To realize both
high-speed operation and high sensitivity, a wide-bandwidth amplifier was used as a
preamplifier. A well-opened eye diagram of a 350-mVpp swing at a data rate of 40 Gb/s was
obtained. The chip area was 1.14 x 1.08 mm, the supply voltage was -6.5 V, and the power
dissipation was 1.1 W.

Fig. 31. Block diagram, eye diagram at 40 Gb/s, and a chip photomicrograph of the decision circuit.

4.3.7. Summary of IC performance


The performance of the test IC chipset for 40-Gb/s optical-fiber-link systems fabricated by
using SiGe HBTs to investigate their feasibility is shown in Table 3. These excellent results
indicate that self-aligned SEG SiGe-base HBT technology, which offers high reliability and
cost-effectiveness, will play an important role in future optical-fiber-link systems operating
at a data rate of 40 Gb/s for global communication applications.

Table 3. Performance of the test IC chipset for 40-Gb/s optical-fiber-link systems fabricated by using SiGe
HBTs.
Circuit                     Max. Speed / Bandwidth    Remarks
Multiplexer                 40 Gb/s                   2:1, 40 GHz DFF
Preamplifier                35.1 GHz                  ZT = 48.7 dBΩ
AGC Amplifier               31.6-32.7 GHz             Dynamic Range = 19 dB
Decision Circuit            40 Gb/s
Demultiplexer               40 Gb/s                   1:2, 40 GHz DFF
Static Frequency Divider    50 GHz

5. SiGe HBT — in Combination with CMOS


For future optical data communication systems and microwave/millimeter-wave systems,
both high-speed operation and more sophisticated functions are required. Therefore,
ultra-high-speed transistors fully compatible with the CMOS process will be essential. SiGe
HBTs with a below-10-ps ECL gate delay and a cutoff frequency of about 100 GHz (described
in Section 4) are the most promising candidates to meet these requirements, because
they can be fabricated by a well-established Si process compatible with CMOS.
In this section, a 0.2-µm self-aligned SEG SiGe HBT, with shallow-trench and
dual-deep-trench isolations and Ti-salicide electrodes, is described.17,18 The fabrication
process, except the SEG, is almost completely compatible with the 0.2-µm bipolar-CMOS
technology that is applied to a fast-cache memory chip, and the SiGe HBTs are fabricated on
a 200-mm wafer line. The SiGe HBTs exhibit a peak cutoff frequency of 122 GHz, a peak
maximum oscillation frequency of 163 GHz, and an ECL gate delay time of 5.5 ps. Four-level
interconnects, including MIM capacitors, are formed by chemical mechanical polishing
(CMP).

5.1. Device structure and fabrication process


An SEM cross-sectional view of a 0.2-µm self-aligned SEG SiGe HBT is shown in Fig. 32.
An enlarged view of the active region, the key part of the SiGe HBT, is also shown. The
0.6-µm-wide Si-cap/SiGe-base multilayer self-aligned to the 0.2-µm-wide emitter was
selectively grown by UHV/CVD. To provide a good link between the intrinsic and extrinsic
bases, a poly-Si-assisted self-aligned SEG (PASS) structure was applied.15 In the PASS
structure, a poly-SiGe base contact, formed simultaneously with the SiGe intrinsic base,
was grown around the buffer poly-Si and beneath the base poly-Si. This self-aligned
active-region structure enabled both low collector capacitance and low base resistance.
Furthermore, to reduce the parasitic capacitances of the collector and substrate,
respectively, shallow-trench (0.4 µm deep) and dual 0.6-µm-wide deep-trench (3 µm deep)
isolations were used. To reduce the parasitic resistance of all electrodes, Ti-salicide
layers, with a sheet resistance of 3 Ω/sq. and a contact resistance of about 25 Ω·µm²,
were formed.
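To give a feel for what the quoted sheet and contact resistances mean for a real layout, the following minimal sketch converts them into ohms. The strip and contact dimensions are hypothetical values chosen only for illustration; they do not come from the chapter.

```python
SHEET_RES_OHM_PER_SQ = 3.0   # Ti-salicide sheet resistance, from the text
CONTACT_RES_OHM_UM2 = 25.0   # specific contact resistance, from the text


def strip_resistance_ohm(sheet_res, length_um, width_um):
    """Resistance of a uniform strip: sheet resistance times the number of squares."""
    return sheet_res * length_um / width_um


def contact_resistance_ohm(specific_res, area_um2):
    """Resistance of a contact: specific contact resistance divided by contact area."""
    return specific_res / area_um2


# Hypothetical geometry (assumption): a 2 x 0.5 µm salicided strip
# and a 0.5 x 0.5 µm contact.
strip = strip_resistance_ohm(SHEET_RES_OHM_PER_SQ, length_um=2.0, width_um=0.5)
contact = contact_resistance_ohm(CONTACT_RES_OHM_UM2, area_um2=0.5 * 0.5)

print(f"strip: {strip:.1f} ohm, contact: {contact:.1f} ohm")
```

Because the contact term scales inversely with area, small contacts can dominate the series resistance even when the salicide sheet resistance is low.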

Fig. 32. SEM cross-sectional view of a 0.2-µm self-aligned SEG SiGe HBT with an enlarged view of the active
region, the key part of the SiGe HBT. A 0.6-µm-wide Si-cap/SiGe-base multilayer self-aligned to a 0.2-µm-wide
emitter was selectively grown by UHV/CVD. A poly-Si-assisted self-aligned SEG (PASS) structure, shallow-trench
and dual-deep-trench isolations, and Ti-salicide electrodes were applied.

The process steps for fabricating the self-aligned SEG SiGe HBT were as follows (Fig.
33). An n+ buried layer (BL) was formed by ion implantation and diffusion of Sb, then a
0.3-µm-thick epitaxial layer was formed. Next, the shallow- and dual-deep-trench
isolations were filled with SiO2 and then planarized by CMP. Next, the intrinsic region was
covered by four deposited films consisting of poly-Si, Si3N4, poly-Si, and SiO2.
The multilayer of Si3N4, poly-Si, and SiO2 was used to form the PASS structure. After
a poly-Si film, which acted as a resistor, was deposited on the shallow-trench isolation in
the field region, a SiO2 film was deposited to cover the resistor and a window in this film
was opened at the intrinsic region. An amorphous Si film for the base poly-Si and a SiO2
film were then deposited and a window was opened in the intrinsic region. The first SIC
(SIC1) was then formed by phosphorus-ion implantation through the multilayer into the Si
epitaxial layer. After that, a SiO2 film was deposited and remained on the sidewall of the
window (Fig. 33(a)).

Fig. 33. Process steps for fabricating the self-aligned SEG SiGe HBT.

The Si3N4 film was then deposited and also remained on the sidewall of the window. At
that time, the top film of the multilayer, the Si3N4, was etched. After that, the remaining
multilayer of poly-Si and SiO2 was selectively etched. The Si3N4 film on the sidewall was
then removed and the window in the top film of the multilayer, the Si3N4, was enlarged by wet
side-etching. Next, the Si-cap/SiGe-base multilayer was selectively grown by UHV/CVD
using Si2H6, GeH4, and B2H6 source gases at 550°C for the SiGe layer and a Si2H6 source at
580°C for the Si layer. The SEG layer consisted of a 20-nm-thick Si cap, 10-nm-thick
dual-graded Ge-profile (graded from 0 to 10% and from 10 to 15%) Si1-xGex, 45-nm-thick
Si0.85Ge0.15, and 10-nm-thick Ge-retrograded (from 15% to 0) Si1-xGex. A 15-nm-thick,
2 x 10¹⁹-cm⁻³ boron-doped Si1-xGex layer was formed as the intrinsic base in the SEG layer.
The poly-SiGe base contact was formed simultaneously with the intrinsic SEG, and this
formation was assisted by the middle film of the multilayer, the poly-Si. Next, phosphorus
ions were implanted into a lower part of the selectively grown Si/SiGe layer to form the
second SIC (SIC2). Double selective implantation of phosphorus in the 0.3-µm-thick Si
epitaxial layer and then in the undoped SiGe increased the collector-doping level to about
7 x 10¹⁷ cm⁻³ (Fig. 33(b)).
Thin SiO2 and in-situ phosphorus-doped poly-Si (IDP) layers were deposited and the
IDP film remained on the sidewall. After an emitter window was formed, a second IDP
layer was deposited. A shallow emitter with a junction depth of about 20 nm was formed by
diffusion from the IDP layers into the Si cap at 900°C for 30 s (Fig. 33(c)). Next, Ti-salicide
layers on all electrodes of the emitter, base, collector, and contact of the poly-Si resistor
were formed simultaneously in a self-aligned manner. The SiGe HBT was fabricated on a
200-mm wafer line and the process (except the SEG) to fabricate the SiGe HBTs was almost
the same as the 0.2-µm bipolar-CMOS process, so the process is completely compatible
with the BiCMOS technology.
A cross-sectional SEM view of a four-level metal layer structure with an MIM capacitor
is shown in Fig. 34. All the metal layers were multilayers of Al, Ti, and TiN. CMP was used
to planarize the W of the 0.5-µm-wide contact plugs and the 0.6 x 0.6 µm via holes, and to
planarize the plasma-SiO2 interlayer insulators. A concave MIM capacitor with a capacitance
of 0.7 fF/µm² was formed between the first- and second-level metals by using 50-nm-thick
plasma SiO2 as an insulator.
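The quoted capacitance density can be cross-checked with the parallel-plate formula C/A = ε0·εr/d. The sketch below is only a plausibility check: the relative permittivity of 3.9 is an assumed textbook value for SiO2 (plasma-deposited oxide may differ), and the 100-fF target capacitor is a hypothetical example.

```python
EPS0_F_PER_M = 8.854e-12  # vacuum permittivity


def capacitance_density_ff_per_um2(rel_permittivity, thickness_nm):
    """Parallel-plate capacitance per unit area, in fF/µm²."""
    farad_per_m2 = rel_permittivity * EPS0_F_PER_M / (thickness_nm * 1e-9)
    return farad_per_m2 * 1e3  # 1 F/m² = 1e15 fF / 1e12 µm² = 1e3 fF/µm²


def plate_area_um2(capacitance_ff, density_ff_per_um2):
    """Plate area needed for a given capacitance at a given density."""
    return capacitance_ff / density_ff_per_um2


# Assumption: eps_r = 3.9 for the 50-nm plasma SiO2 insulator.
density = capacitance_density_ff_per_um2(rel_permittivity=3.9, thickness_nm=50.0)

# Hypothetical 100-fF capacitor at the quoted density of 0.7 fF/µm².
area = plate_area_um2(capacitance_ff=100.0, density_ff_per_um2=0.7)

print(f"estimated density: {density:.2f} fF/um^2, area for 100 fF: {area:.0f} um^2")
```

Under that assumption the estimate lands close to the 0.7 fF/µm² reported in the text.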

Fig. 34. Cross-sectional SEM view of a four-level metal layer structure with an MIM capacitor. The MIM
capacitor was formed between the first- and second-level metals.

The process steps for fabricating the MIM capacitor were as follows (Fig. 35). After the
first-level metal was formed and the first interlayer insulator planarized, a W plug was
formed on the first-level metal. Then, through etching of the first interlayer insulator, a
window reaching the first-level metal was opened. Next, a two-layer film of 80-nm-thick
TiN and plasma SiO2 was deposited and then removed, except in the area covering the
capacitor window. At that time, the surface of the W plug was exposed (Fig. 35(a)). The
second-level metal was then deposited and patterned (Fig. 35(b)). Concave MIM capaci-
tors with corners tend to have low capacitance and are not considered the optimal shape
compared with planar MIMs. However, by optimizing the dry-etching conditions to open
the window, nearly the same capacitance and a sufficiently high breakdown voltage were
achieved. This concave MIM, unlike planar MIMs, doesn't require vias, thus the series
resistance of the top plate (the second metal) is low and the Q value is high. Therefore, this
concave MIM is preferable for applications such as resonators and tank circuits.

Fig. 35. Process steps for fabricating the MIM capacitor.

5.2. Transistor characteristics


Typical transistor characteristics of a SiGe HBT with an emitter area of 0.2 x 2 µm are
summarized in Table 4. The SiGe HBT exhibited good I-V performance with a high current
gain of 1400. The base-recombination current was below 10 pA and the HBT yield, measured
from 4000 parallel-connected transistors, was more than 99.9993%. The collector-to-emitter
breakdown voltage with a series resistance of a few hundred ohms connected at the
base (BVCER), which is normal for high-speed digital circuits, was fairly high at 3 V. The
Early voltage was about 50 V at a collector current of 1 mA, and the negative resistance at
a higher collector current was caused by self-heating of the HBTs. The low base resistance

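Table 4. Typical transistor characteristics of a SiGe HBT with an emitter area of 0.2 x 2 µm.

Parameter                               Value
Emitter area, AE                        0.2 x 2 µm
Current gain, hFE                       1400
BVCER                                   3.0 V
BVCBO                                   6.3 V
Base resistance, RB                     90 Ω
Collector capacitance                   3.6 fF
Substrate capacitance, Csub             1.8 fF
Cutoff frequency, fT                    122 GHz
Maximum oscillation frequency, fmax     163 GHz
ECL gate delay, tpd                     5.5 ps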
(90 Ω), in spite of the short emitter length (2 µm), can be attributed to the highly doped
SiGe base and Ti-salicide base electrode. The low collector capacitance of 3.6 fF and low
substrate capacitance of 1.8 fF were attributed, respectively, to the SiGe HBT's PASS
structure and shallow- and deep-trench isolations. The peak cutoff frequencies and maximum
oscillation frequencies of two SiGe HBTs were 122 and 157 GHz and 122 and 163 GHz,
respectively, for emitter areas of 0.2 x 1 and 0.2 x 2 µm (Fig. 36). The fmax of 163 GHz can
be attributed to the high cutoff frequency arising from the shallow SiGe base, low base
resistance, and low collector capacitance. The fT and fmax were measured from the frequency
dependence of the magnitude of h21 and unilateral gain, respectively, derived from
S-parameter measurements up to 110 GHz. The dependence of the gate delay time on the
switching current in 53-stage differential ECL ring oscillators for a single-ended voltage
swing of 250 mV is shown in Fig. 37. An ECL gate delay time of 5.5 ps was measured at a
switching current of 2 mA. The ultra-fast performance of the ECL gate was attributed to the
low collector capacitance and high fT and fmax of the fully self-aligned SiGe HBT structure,
the low parasitic resistance of the Ti-salicide electrodes, and the low substrate capacitance
enabled by the shallow-trench and dual-deep-trench isolations. This high-speed performance,
together with the process compatibility with CMOS, indicates that self-aligned SEG SiGe
HBT technology is particularly promising for optical-fiber-link tele- and data-communication
systems and for microwave/millimeter-wave systems.
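As a rough illustration of how a gate delay is extracted from a ring oscillator, the sketch below uses the standard relation f = 1/(2·N·tpd) for an N-stage ring. The chapter does not state the measured oscillation frequency, so the frequency computed here is an inferred value under that assumption, not a reported result.

```python
def ring_oscillator_frequency_hz(stages, gate_delay_s):
    """Oscillation frequency of an N-stage ring: one period is 2 * N gate delays."""
    return 1.0 / (2 * stages * gate_delay_s)


def gate_delay_from_frequency_s(stages, frequency_hz):
    """Invert the same relation to recover the per-gate delay."""
    return 1.0 / (2 * stages * frequency_hz)


STAGES = 53            # from the text
GATE_DELAY_S = 5.5e-12  # from the text

freq = ring_oscillator_frequency_hz(STAGES, GATE_DELAY_S)
delay = gate_delay_from_frequency_s(STAGES, freq)

print(f"expected oscillation frequency: {freq / 1e9:.2f} GHz")
print(f"recovered gate delay: {delay * 1e12:.1f} ps")
```

A long ring brings the oscillation frequency down to a range that is easy to measure, which is why a picosecond-scale delay can be determined with conventional instruments.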

Fig. 36. Cutoff frequencies and maximum oscillation frequencies as a function of collector current for the two
SiGe HBTs with emitter areas of 0.2 x 1 and 0.2 x 2 µm. The peak cutoff frequencies and maximum oscillation
frequencies of two SiGe HBTs were 122 and 157 GHz and 122 and 163 GHz, respectively, for emitter areas of
0.2 x 1 and 0.2 x 2 µm.

Fig. 37. Dependence of the gate delay time on the switching current in 53-stage differential ECL ring oscilla-
tors for a single-ended voltage swing of 250 mV. An ECL gate delay time of 5.5 ps was measured at a
switching current of 2 mA.

6. IC Chipset for Practical Use in a 40-Gb/s Optical Receiver


To realize 40-Gb/s optical transmission systems, compound semiconductor technologies
have been used to develop a 37-GHz-bandwidth preamplifier for a 25-fF-capacitance
photodiode using a GaAs PHEMT,28 an 18-dB-gain limiting amplifier at 40 GHz using an AlInAs/
GaInAs HBT,29 and a 40-Gb/s 1:4 demultiplexer using a GaAs HEMT.30 However, for a 40-Gb/s
system a photodiode of about 50-fF capacitance should be used for the preamplifier,
the gain of the limiting amplifier must be increased to 30 dB, and a byte-synchronizing
function will be needed in the demultiplexer. Furthermore, to enable widespread commercial
use, the cost of 40-Gb/s systems must be reduced. Therefore, SiGe HBTs have been
used to develop ICs for practical use in a 40-Gb/s optical receiver: a preamplifier that can be
applied to a large-capacitance photodiode and is robust enough to withstand conventional
assembly, a limiting amplifier with a wide-bandwidth circuit configuration which can be
extended as a circuit core to the AGC amplifier and decision circuit, and a high-sensitivity
demultiplexer with a byte-synchronization function.21

6.1. Preamplifier
A schematic, the transimpedance gain, and a chip photomicrograph of the preamplifier are
shown in Fig. 38. The common-base input stage provides a low input capacitance (equal to
the collector capacitance of an HBT) instead of the large Cin of the photodiode and IC
pads.31 In this configuration, the bandwidth, having a strong dependence on the time
constants of the emitter and the collector nodes, is very sensitive to the emitter area of
transistor QCB (that is, it is sensitive to parasitic capacitance). Therefore, an emitter small
enough to ensure a bandwidth of more than 40 GHz regardless of the variation in Cin caused by
manufacturing deviations was used. The transimpedance gain was calculated from the
S-parameters, taking into account Cin within a range from 50 fF to 125 fF. The -3-dB bandwidth
was 49 and 40 GHz for Cin of 50 and 125 fF, respectively, with a transimpedance gain of
50.2 dBΩ. The bandwidth was improved by 76 to 62% compared to that of a conventional
transimpedance amplifier. This indicates that the developed preamplifier will be able to
perform acceptably even when mass-produced. The power supply voltages of the
preamplifier, VCC and VEE, were 5.0 and -5.2 V, respectively, and the power consumption was
300 mW on a 1.2 x 1.8-mm chip.
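A transimpedance gain quoted in dBΩ can be converted to ohms with ZT = 10^(dBΩ/20). The following minimal sketch applies that conversion to the 50.2-dBΩ figure; the 100-µA photocurrent is a hypothetical input chosen only to show the resulting output voltage.

```python
def dbohm_to_ohm(gain_dbohm):
    """Convert a transimpedance gain from dBΩ (20*log10 of ohms) to ohms."""
    return 10 ** (gain_dbohm / 20.0)


def output_voltage_v(photocurrent_a, transimpedance_ohm):
    """Small-signal output voltage of an ideal transimpedance stage."""
    return photocurrent_a * transimpedance_ohm


z_t = dbohm_to_ohm(50.2)                 # gain from the text
v_out = output_voltage_v(100e-6, z_t)    # hypothetical 100-µA photocurrent

print(f"ZT = {z_t:.0f} ohm, output for 100 uA: {v_out * 1e3:.1f} mV")
```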

Fig. 38. Schematic (insert), the transimpedance gain, and a chip photomicrograph of the preamplifier.

6.2. Limiting amplifier


A schematic of the limiting amplifier, a chip photomicrograph, and the dependence of
output power and phase deviation (in the saturation region) on input power under 40-GHz
limiting amplifier operation are shown in Fig. 39. The three constant-gain stages (A1-A3)
with a transimpedance circuit (Q3-Q6, RF, and RL) as an active load were used to
simultaneously achieve a wide bandwidth and a high voltage gain.32 To obtain a sharp rising
output waveform, biasing resistors were added to prevent the cut-off operation of Q5 (and Q6)
when the transconductance stage (Q1, Q2, and RE) operated in the saturation region. The
31.9-dB gain at 40 GHz gives a high input sensitivity, which makes the limiting amplifier
easy to use in a clock-extraction circuit. The wide bandwidth of up to 49 GHz, together with
the high gain, indicates that it can be used as a core circuit, such as in a decision circuit.
The minimum saturation input power was sufficiently low, -30 dBm, and the phase deviation
within an input power range from -30 to -10 dBm was less than 70 degrees. The waveform of a
500-mVpp output voltage with little distortion at 40 GHz was made possible by the biasing
resistors. The power supply voltage VEE was -7.5 V and the power consumption was 1.38 W
on a 1.2 x 1.8-mm chip.
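The input powers and gain above can be translated into voltages with the usual dBm and dB conversions. The sketch below assumes a sinusoidal signal in a 50-Ω system, which the chapter does not state explicitly for this measurement, so the resulting amplitudes are indicative only.

```python
import math


def dbm_to_watts(power_dbm):
    """Convert power in dBm to watts."""
    return 1e-3 * 10 ** (power_dbm / 10.0)


def sine_vpp(power_dbm, load_ohm=50.0):
    """Peak-to-peak voltage of a sine wave delivering the given power to the load."""
    v_rms = math.sqrt(dbm_to_watts(power_dbm) * load_ohm)
    return 2.0 * math.sqrt(2.0) * v_rms


def db_to_voltage_ratio(gain_db):
    """Convert a voltage gain in dB to a linear ratio."""
    return 10 ** (gain_db / 20.0)


vpp_min = sine_vpp(-30.0)              # minimum saturation input power from the text
vpp_max = sine_vpp(-10.0)              # upper end of the quoted input range
gain = db_to_voltage_ratio(31.9)       # small-signal gain at 40 GHz from the text

print(f"-30 dBm -> {vpp_min * 1e3:.0f} mVpp, -10 dBm -> {vpp_max * 1e3:.0f} mVpp")
print(f"31.9 dB -> voltage gain of about {gain:.1f}")
```

Under these assumptions the amplifier limits for inputs of roughly 20 mVpp and above, a 10:1 range of input amplitude.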

Fig. 39. Schematic of the limiting amplifier, a chip photomicrograph, and the dependence of output power and
phase deviation (in the saturation region) on input power under 40-GHz limiting amplifier operation.

6.3. 1:4 DEMUX
A schematic, the output patterns for a known 40-Gb/s data input, and a chip photomicro-
graph of a 1:4 high-sensitivity demultiplexer (HS-DEMUX) combined with a decision cir-
cuit are shown in Fig. 40. The integration with a decision circuit reduced the total power
consumption and the number of system components. Also, the clock-timing adjustment
between these two circuits became easier because the delay-time variation caused by pack-
aging deviation was eliminated. In the decision circuit, the cascoded amplifier was de-
signed to change the output-voltage level of the wide-bandwidth amplifier to the proper
level for the following master-slave D-flip-flop (MSFF). The 1:4 DEMUX consists of
three 1:2 DEMUXs and a clock-distribution circuit (CDC). In the CDC, byte-synchroniza-
tion is enabled by bit-rotation;33 this synchronization is indispensable for practical use. The
CDC is composed of phase-shifters, static frequency dividers, 2-bit counters, and exclusive
ORs. The CDC provides 1/2 clock (CKA) and 1/4 clock (CKB) signals. The phase rela-
tionship between these clocks is controlled by the four states of the two exclusive OR's
output, which are determined from the bit-rotation signal via two 2-bit counters in series.
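The tree structure of the 1:4 DEMUX and the role of bit rotation can be illustrated with a purely logical model. In this sketch, `rotation` is a hypothetical stand-in for the bit-rotation control: it is modelled as skipping input bits so that the 4-bit word boundary moves, which is an assumption about the behaviour and not a description of the actual CDC circuit. The frame word and test pattern are likewise arbitrary.

```python
def demux_1to2(bits):
    """Split a serial stream into two half-rate streams (even and odd positions)."""
    return bits[0::2], bits[1::2]


def demux_1to4(bits, rotation=0):
    """1:4 demultiplexer built from three 1:2 demultiplexers.

    `rotation` (0-3) skips that many input bits, shifting the word boundary.
    """
    shifted = bits[rotation:]
    shifted = shifted[: len(shifted) - len(shifted) % 4]  # keep whole 4-bit words
    even, odd = demux_1to2(shifted)   # first stage, clocked at half the bit rate
    ch0, ch2 = demux_1to2(even)       # second stage, clocked at a quarter rate
    ch1, ch3 = demux_1to2(odd)
    return [ch0, ch1, ch2, ch3]


def find_rotation(bits, frame_word):
    """Return the first rotation whose first output word equals frame_word."""
    for rotation in range(4):
        channels = demux_1to4(bits, rotation)
        if not channels[0]:
            continue  # stream too short for this rotation
        if [ch[0] for ch in channels] == frame_word:
            return rotation
    return None


stream = [0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1]  # arbitrary test pattern
print(find_rotation(stream, [1, 1, 0, 1]))
```

Without such a rotation mechanism the four outputs would carry the right bits but with an arbitrary word alignment, which is why the text calls the synchronization indispensable for practical use.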

Dynamic FD ICs have also been developed: two types of analog dynamic FDs (ADFDs)
with operating frequencies of 57-64 GHz with GaAs HEMTs36 and 75 GHz with InP
HEMTs,37 and a digital dynamic FD (DDFD) operating at 39-63.5 GHz with InP HEMTs.38
Over the last decade, FD ICs have been mainly based on III-V compound semiconductor
devices. However, for millimeter-wave systems to penetrate the consumer electronics
market, low-cost monolithic ICs are essential because of their availability and ease of use. In
this section, two types of static FDs with maximum operating frequencies of up to 60 and
67 GHz,18,20 and a dynamic FD with a maximum operating frequency of up to 82.4 GHz,18
all made using SiGe HBTs, are described.

7.1 Static frequency dividers


Two types of 1/4 static frequency divider (SFD) have been developed. One consists of two
conventional master-slave toggle flip-flops (MS-TFF); the other uses an MS-TFF with a fast
latch-to-track transition (called pre-tracking). A block diagram of a 1/4 SFD and the
circuit of a pre-tracking (PT) MS-TFF are shown in Fig. 41. The divider consists of a
50-Ω-terminated three-emitter-follower input buffer, a pre-tracking MS-TFF based on a fast
comparator technique39 as the first divide-by-two stage, a conventional MS-TFF as the second
divide-by-two stage, internal buffers to reform the output signal of each MS-TFF, and an
output buffer driving 50-Ω lines. In the PT MS-TFF, to extend the operating frequency, a
load resistor was separated into two resistors, RL1 and RL2, and collector nodes of the
latching pair transistors were connected at the node between the two resistors. Therefore, the
low output level of the tracking pair rises to a level equal to the product of the PT ratio
(defined as RL1/(RL1+RL2)) and the total logic swing voltage during the latching phase.
Also, the logic swing voltage of the latching pair is reduced to the same level. As a result,
the response of the collector and base nodes of the upper quadrant multiplier becomes faster;
hence, the latch-to-track transition time is reduced. The internal single-ended voltage swing
for the MS-TFF was optimized to 250 mV by calculating the gate delay time of a single
flip-flop biased to operate as an inverter, and the typical total logic swing voltage of the
PT MS-TFF was also set at the same value. To obtain higher operating speed, two emitter
followers were used in each stage after the flip-flops. The emitter lengths of each emitter
follower transistor were optimized to 1.5 and 3 µm in the master stage and 1.5 and 4 µm in
the slave stage to reduce the loading of the flip-flops.
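The pre-tracking arithmetic can be made concrete with a few lines of code. The sketch below evaluates the PT ratio RL1/(RL1+RL2) and the resulting rise of the low output level for the 250-mV total swing given in the text. The resistor values are hypothetical, chosen only so that the ratio equals the 0.41 used later in the measurements.

```python
def pt_ratio(rl1_ohm, rl2_ohm):
    """Pre-tracking ratio as defined in the text: RL1 / (RL1 + RL2)."""
    return rl1_ohm / (rl1_ohm + rl2_ohm)


def raised_low_level_mv(ratio, total_swing_mv):
    """Rise of the tracking pair's low level during the latching phase."""
    return ratio * total_swing_mv


# Hypothetical resistor split (assumption) giving a PT ratio of 0.41.
ratio = pt_ratio(rl1_ohm=41.0, rl2_ohm=59.0)

# Total logic swing of 250 mV, from the text.
rise_mv = raised_low_level_mv(ratio, total_swing_mv=250.0)

print(f"PT ratio = {ratio:.2f}, low level raised by {rise_mv:.1f} mV")
```

With these numbers the tracking pair starts its next transition about 100 mV closer to the switching point, which is the mechanism behind the shorter latch-to-track transition time.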

Fig. 41. Block diagram of a 1/4 SFD and the circuit of a pre-tracking master-slave toggle flip-flop.

The input sensitivities of a 1/4 PT-SFD with a PT ratio of 0.41 and a conventional SFD
at a switching current ICS of 1.28 mA are shown in Fig. 42. The SFDs were measured
on-wafer with 67-GHz micro-coaxial probes. The sinusoidal clock input was driven
single-ended and the second differential input was terminated at 50 Ω. The PT-SFD with a PT
ratio of 0.41 operated about 11% faster than the conventional SFD. The 16.75-GHz
divided-by-four output waveform for a 67-GHz input in the PT-SFD at a PT ratio of 0.41 and
the 15-GHz divided-by-four output waveform for a 60-GHz input in the conventional SFD
are also shown in Fig. 42. The power consumptions of the PT MS-TFF and the conventional
MS-TFF were 175 and 162 mW, respectively, at a supply voltage of -5.2 V.
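The divide-by-four behaviour of two cascaded toggle flip-flops can be checked with an idealized logic-level model. This sketch ignores all analog effects (it does not model the master-slave latches or the pre-tracking load), so it only illustrates why the output frequencies are one quarter of the input frequencies quoted above.

```python
def toggle_flip_flop(clock):
    """Ideal divide-by-two stage: the output toggles on each rising clock edge."""
    state, previous, output = 0, 0, []
    for level in clock:
        if previous == 0 and level == 1:
            state ^= 1
        output.append(state)
        previous = level
    return output


def rising_edges(signal):
    """Count 0 -> 1 transitions in a sampled logic signal."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a == 0 and b == 1)


clock = [0, 1] * 8                  # eight input clock cycles
half = toggle_flip_flop(clock)      # first divide-by-two stage
quarter = toggle_flip_flop(half)    # second divide-by-two stage

print(rising_edges(clock), rising_edges(half), rising_edges(quarter))

# Output frequencies for the two input frequencies given in the text.
for f_in_ghz in (67.0, 60.0):
    print(f"{f_in_ghz} GHz in -> {f_in_ghz / 4} GHz out")
```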

[Fig. 42 plot: input sensitivity versus frequency (GHz) for the PT-SFD (PT ratio 0.41) and the conventional
SFD; output waveform of the PT-SFD at fin = 67 GHz, 20 ps/div., 100 mV/div.]

Fig. 42. Input sensitivities of a 1/4 PT-SFD with a PT ratio of 0.41 and a conventional SFD at a switching
current of 1.28 mA. Also shown is a 16.75-GHz divided-by-four output waveform for a 67-GHz input in the
PT-SFD at a PT ratio of 0.41, and a 15-GHz divided-by-four output waveform for a 60-GHz input in the
conventional SFD.

7.2 Dynamic frequency divider


The 1/4 dynamic frequency divider (DFD) consists of a 50-Ω-terminated three-emitter-follower
input buffer, a dynamic toggle flip-flop (D-TFF) based on the regenerative frequency
division principle as the first divide-by-two stage, a static MS-TFF as the second
divide-by-two stage, internal buffers to reform the output signal of each TFF, and an output
buffer driving 50-Ω lines. The internal single-ended voltage swing for the D-TFF was
designed to be 500 mV, which was optimized by taking into account the gain cutoff
frequency of the Gilbert multiplier. The single-ended voltage swing of the MS-TFF is 250
mV. To increase the operating speed, two emitter followers were used and the emitter size
of each emitter follower transistor was optimized to reduce the loading of the flip-flops.
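The regenerative frequency division principle mentioned above can be summarized in a short worked relation. This is the textbook picture of a mixer with its output fed back to one input, given here as an assumption about the general principle; the chapter does not derive it. The mixer produces the sum and difference of the input and output frequencies, the loop passes only the difference component, and in steady state that component must equal the output frequency itself:

```latex
\begin{align}
  f_{\text{mix}} &= f_{\text{in}} \pm f_{\text{out}}, \\
  f_{\text{out}} &= f_{\text{in}} - f_{\text{out}}
    \;\Longrightarrow\; f_{\text{out}} = \tfrac{1}{2} f_{\text{in}}, \\
  f_{\text{out},\,1/4} &= \tfrac{1}{4} f_{\text{in}}
    = \frac{82.4\ \text{GHz}}{4} = 20.6\ \text{GHz}.
\end{align}
```

The last line applies the subsequent static divide-by-two stage to the maximum input frequency of 82.4 GHz, which gives the 20.6-GHz output reported for the 1/4 DFD.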
The input sensitivity of a 1/4 DFD, up to its maximum operating frequency of 82.4 GHz,
is shown in Fig. 43. The operational bandwidth of the DFD was as broad as 50 GHz
(32-82 GHz) without tuning. The divided-by-four output waveform measured on-wafer is also
shown in Fig. 43. The power consumption of the D-TFF was only 40 mW at a supply
voltage of -5.2 V. Chip micrographs of the 1/4 DFD, zooming in to the main circuit region
in two steps, are shown in Fig. 44. All circuit elements (transistors, resistors, and even
interconnects) are laid out symmetrically, as can be seen in the two chip micrographs on the
right (the upper micrograph was taken after the first-level metal was applied). The DC bias
terminals are connected via MIM capacitors to obtain a stable voltage supply. The D-TFF
occupies only 40 x 50 µm and the main circuit region is 420 x 150 µm.

[Fig. 43 plot: input sensitivity versus input frequency (GHz), 20 to 100 GHz; output waveform of the dynamic
divider at fin = 82.4 GHz, hor.: 20 ps/div., ver.: 100 mV/div.]

Fig. 43. Input sensitivity of a 1/4 dynamic frequency divider, up to its maximum operating frequency of 82.4
GHz, and a 20.6-GHz divided-by-four output waveform measured on-wafer.

[Fig. 44 panel labels: 0.97 x 1.1 mm, 420 x 150 µm, 40 x 50 µm]
Fig. 44. Chip micrographs of the 1/4 dynamic frequency divider, zooming in to the main circuit region in two
steps. The DC bias terminals are connected via MIM capacitors to obtain a stable voltage supply. The D-TFF
was 40 x 50 µm and the main circuit region was 420 x 150 µm.

8. Conclusions
Recent advances in Si BJT/SiGe HBT technologies have gone a long way towards making
these transistors ready for application in high-speed circuits for future optical-fiber-links
and millimeter-wave systems. For Si BJTs, the shallow boron diffusion RVD process has
been used to obtain a shorter base transit time, and an SMI base electrode has been used to
reduce the base resistance. This has enabled a high cutoff frequency of 100 GHz and high-
speed circuits (e.g., a 12-ps-delay ECL gate and a 45-GHz dynamic frequency divider). For
40-Gb/s optical-fiber-links, a self-aligned SEG SiGe HBT with SMI electrodes that has a
high cutoff frequency, a maximum oscillation frequency of about 100 GHz, and a below-
10-ps ECL gate delay, has been developed. This technology was applied in ICs for optical-
fiber-link systems: a static frequency divider with a maximum operating frequency of up to
50 GHz, a time-division multiplexer and demultiplexer operating at 40 Gb/s, a preamplifier
with a bandwidth of 35 GHz, an AGC amplifier core with a bandwidth of 32 GHz, and a
decision circuit operating at 40 Gb/s. Furthermore, to provide both high-speed operation
and sophisticated functions, SiGe HBTs fully compatible with the CMOS process have
been developed. They provide a 122-GHz cutoff frequency, a 163-GHz maximum oscilla-
tion frequency, and an ECL gate delay time of 5.5 ps. A static frequency divider and a
dynamic frequency divider with maximum operating frequencies of up to 67 GHz and up to
82.4 GHz, respectively, have been demonstrated. A preamplifier with a 45-GHz band-
width, a limiting amplifier with a 49-GHz bandwidth, and a 40-Gb/s 1:4 high-sensitivity
demultiplexer combined with a decision circuit for practical use in a 40-Gb/s optical re-
ceiver have also been developed. The excellent performance of these devices shows that
the SiGe HBT will play a major role in future millimeter-wave systems.

Acknowledgments
I would like to express my sincere thanks to Dr. Katsuki Miyauchi, Dr. Masanobu Miyao,
Dr. Osamu Kanehisa, Dr. Koichi Seki, Katsutaka Kimura, Dr. Akio Anzai, Dr. Yasushi Hatta,
and Takashi Harada for their advice and encouragement. I also thank Tokuo Kure and the
staff at the Central Research Laboratory pilot line for the wafer processing, and thank Dr.
Takahiro Onai, Dr. Yukihiro Kiyota, Katsuya Oda, Eiji Ohue, Dr. Masao Kondo, Hiromi
Shimamoto, Masamichi Tanabe, Reiko Hayami, Toru Masuda, Dr. Ken-ichi Ohhata,
Nobuhiro Shiramizu, Fumihiko Arakawa, and Koji Mikami for their extensive contribu-
tions through this work.

References
1. M. Soda, H. Tezuka, F. Sato, T. Hashimoto, S. Nakamura, T. Tatsumi, T. Suzuki, T. Tashiro, "Si-
Analog ICs for 20 Gb/s Optical Receiver," in ISSCC Dig. Tech. Pap., pp. 170-171, 1994.
2. A. Felder, M. Moller, J. Popp, J. Bock, and H. -M. Rein, "46 Gb/s DEMUX, 50 Gb/s MUX, and
30 GHz Static Frequency Divider in Silicon Bipolar Technology," IEEE J. Solid-State Circuits,
vol. 31, no. 4, Apr. 1996, pp. 481-486.
3. H. -M. Rein and M. Moller, "Design Consideration for Very-High-Speed Si-Bipolar ICs Operat-
ing up to 50 Gb/s," IEEE J. Solid-State Circuits, vol. 31, no. 8, Aug. 1996, pp. 1076-1090.
4. M. Wurzer, T. F. Meister, H. Schafer, H. Knapp, J. Bock, R. Stengl, K. Aufinger, M. Franosch, M.
Rest, M. Moller, H. -M. Rein, and A. Felder, "42GHz Static Frequency Divider in a Si/SiGe
Bipolar Technology," in ISSCC Dig. Tech. Pap., pp. 122-123, 1997.
5. M. Moller, H.-M. Rein, A. Felder, and T. F. Meister, "60 Gbit/s Time-Division Multiplexer in
SiGe-Bipolar Technology with Special Regard to Mounting and Measurement Technique", Elec-
tron. Lett., vol. 33, pp. 679-680, 1997.
6. K. Washio, E. Ohue, K. Oda, M. Tanabe, H. Shimamoto, and T. Onai, "95 GHz fT Self-Aligned
Selective Epitaxial SiGe HBT with SMI Electrodes," in ISSCC Dig. Tech. Pap., pp. 312-313,
1998.
7. T. Masuda, K. Ohhata, E. Ohue, K. Oda, M. Tanabe, H. Shimamoto, T. Onai, and K. Washio, "40
Gb/s Analog IC Chipset for Optical Receiver using SiGe HBTs," in ISSCC Dig. Tech. Pap., pp.
314-315, 1998.
8. M. Tanabe, H. Shimamoto, T. Onai, and K. Washio, "Simplified Distribution Base Resistance
Model in Self-Aligned Bipolar Transistors," IEICE Trans. Electron., vol. E-79-C, no. 2, pp. 165-
171, 1996.
9. T. Onai, E. Ohue, M. Tanabe, and K. Washio, "Self-Aligned Metal/IDP Bipolar Technology Fea-
turing 14 ps/70 GHz," in IEDM Tech. Dig., pp. 699-702, 1995.
10. K. Washio, E. Ohue, M. Tanabe, and T. Onai, "Self-Aligned metal/IDP Si Bipolar Technology
with 12-ps ECL and 45-GHz Dynamic Frequency Divider," in Proc. ESSDERC'96, Bologna,
Sept. 1996, pp. 807-810.
11. T. F. Meister, H. Schafer, M. Franosch, W. Molzer, K. Aufinger, U. Schler, C. Walz, M. Stolz, S.
Boguth, and J. Bock, "SiGe Base Bipolar Technology with 74 GHz fmax and 11 ps Gate Delay," in
IEDM Tech. Dig., pp. 739-742, 1995.
12. A. Pruijmboom, D. Terpstra, C. E. Timmering, W. B. de Boer, M. J. J. Theunissen, J. W. Slotboom,
R. J. E. Hueting, and J. J. E. M. Hageraats, "Selective-Epitaxial Base Technology with 14 ps
ECL-Gate Delay, for Low Power Wide-Band Communication Systems," in IEDM Tech. Dig., pp.
747-750, 1995.
13. E. F. Crabbe, B. S. Meyerson, J. M. C. Stork, and D. L. Harame, "Vertical Profile Optimization of
Very High Frequency Epitaxial Si- and SiGe-Base Bipolar Transistors," in IEDM Tech. Dig., pp.
83-86, 1993.
14. K. Oda, E. Ohue, M. Tanabe, H. Shimamoto, T. Onai, and K. Washio, "130-GHz fT SiGe HBT
Technology," in IEDM Tech. Dig., pp. 791-794, 1997.
15. K. Washio, E. Ohue, K. Oda, M. Tanabe, H. Shimamoto, and T. Onai, "A Selective-Epitaxial
SiGe HBT with SMI Electrodes Featuring 9.3-ps ECL-Gate Delay," in IEDM Tech. Dig., pp.
795-798, 1997.
16. S. Subbanna, D. Ahlgren, D. Harame, and B. Meyerson, "How SiGe Evolved into a Manufactur-
able Semiconductor Production Process," in ISSCC Dig. Tech. Pap., pp. 66-67, 1999.
17. K. Washio, M. Kondo, H. Shimamoto, M. Tanabe, E. Ohue, R. Hayami, K. Oda, and T. Harada,
"A 0.2-µm Self-Aligned SiGe HBT Featuring 107-GHz fmax and 6.7-ps ECL," in IEDM Tech.
Dig., pp. 557-560, 1999.
18. K. Washio, E. Ohue, K. Oda, R. Hayami, M. Tanabe, H. Shimamoto, T. Harada, and M. Kondo,
"82 GHz Dynamic Frequency Divider in 5.5 ps ECL SiGe HBTs," in ISSCC Dig. Tech. Pap., pp.
210-211, 2000.
19. T. Hashimoto, T. Kikuchi, K. Watanabe, N. Ohashi, T. Saito, H. Yamaguchi, S. Wada, N. Natsuaki,
M. Kondo, S. Kondo, Y. Homma, N. Owada, and T. Ikeda, "A 0.2-µm Bipolar-CMOS Technology
on Bonded SOI with Copper Metallization for Ultra High-Speed Processors," in IEDM Tech.
Dig., pp. 209-212, 1998.
20. K. Washio, R. Hayami, E. Ohue, K. Oda, M. Tanabe, H. Shimamoto, and M. Kondo, "67-GHz
Static Frequency Divider Using 0.2-µm Self-Aligned SiGe HBTs," in IEEE MTT-S Radio
Frequency Integrated Circuits Symp., pp. 31-34, 2000.
21. T. Masuda, K. Ohhata, F. Arakawa, N. Shiramizu, E. Ohue, K. Oda, R. Hayami, M. Tanabe, H.
Shimamoto, M. Kondo, T. Harada, and K. Washio, "45GHz Transimpedance 32 dB Limiting
Amplifier and 40 Gb/s 1:4 High-Sensitivity Demultiplexer with Decision Circuit Using SiGe
HBTs for 40 Gb/s Optical Receiver," in ISSCC Dig. Tech. Pap., pp. 60-61, 2000.
22. Y. Kiyota, T. Onai, T. Nakamura, T. Inada, and Y. Hirano, "Ultra-Thin Base Si Bipolar Transistor
Using Rapid Vapor-Phase Direct Doping (RVD)," IEEE Trans. Electron Devices, vol. 39, pp.
2077-2081, 1992.
23. Y. Kiyota, E. Ohue, T. Onai, K. Washio, M. Tanabe, and T. Inada, "Lamp-Heated Rapid Vapor-
Phase Doping Technology for 100-GHz Si Bipolar Transistors," in Proc. BCTM, pp.
173-176, 1996.
24. K. Oda and Y. Kiyota, "H2 Cleaning of Silicon Wafers Before Low-Temperature Epitaxial Growth
by Ultra High Vacuum Chemical Vapor Deposition," J. Electrochem. Soc., vol. 143, p. 2361,
1996.
25. K. Oda and Y. Kiyota, "Selectivity Dependence on Ge Composition in Si1-xGex Low-Temperature
Epitaxial Growth," MRS 1997 Spring Meeting.
26. M. Kondo, K. Oda, E. Ohue, H. Shimamoto, M. Tanabe, T. Onai, and K. Washio, "Ultra-Low-
Power and High-Speed SiGe Base Bipolar Transistors for Wireless Telecommunication Systems,"
IEEE Trans. Electron Devices, vol. 45, no. 6, pp. 1287-1294, June 1998.
27. E. Ohue, K. Oda, R. Hayami, and K. Washio, "A 7.7-ps CML Using Selective-Epitaxial SiGe
HBTs," in Proc. BCTM, pp. 97-100, 1998.
28. E. Legros, S. Vuye, L. Giraudet, and C. Joly, "High-Sensitivity 40 Gbit/s Photoreceiver Using
GaAs PHEMT Distributed Amplifiers", Electron. Lett., 25th June 1998, vol. 34, No. 13, p. 1351.
29. M. Mokhtari, T. Swahn, R. H. Walden, W. E. Stanchina, M. Kardos, T. Juhola, G. Schuppener, H.
Tenhunen, and T. Lewin, "InP-HBT Chip-set for 40 Gb/s Fiber Optical Communication Systems
Operational at 3 V", IEEE J. of Solid-State Circuits, vol. 32, No. 9, pp. 1371-1383, September
1997.
30. M. Lang, Z.-G. Wang, Z. Lao, M. Schlechtweg, A. Thiede, M. R.-Motzer, M. Sedler, W. Bronner,
G. Kaufel, K. Kohler, A. Hulsmann, and B. Raynor, "20-40 Gb/s 0.2-µm GaAs HEMT Chip Set
for Optical Data Receiver", IEEE J. of Solid-State Circuits, vol. 32, No. 9, pp. 1384-1393,
September 1997.
31. T. Vanisri et al., "Integrated High Frequency Low-Noise Current-Mode Optical Transimpedance
Preamplifiers: Theory and Practice", IEEE J. Solid-State Circuits, vol. 30, No. 6, pp. 677-685,
June 1995.
32. M. Moller, H.-M. Rein, and H. Wernz, "13 Gb/s Si-Bipolar AGC Amplifier IC with High Gain
and Wide Dynamic Range for Optical-Fiber Receivers", IEEE J. Solid-State Circuits, vol. 29,
No. 7, pp. 815-822, July 1994.
33. Z. Lao, U. Langmann, J. N. Albers, E. Schlag, and D. Clawin, "Si Bipolar 14 Gb/s 1:4-
Demultiplexer IC for System Applications", IEEE J. Solid-State Circuits, vol. 31, No. 1, pp. 54-
60, January 1996.
34. Q. Lee, D. Mensa, J. Guthrie, S. Jaganathan, T. Mathew, Y. Betser, S. Krishnan, S. Ceran, and M.
J. W. Rodwell, "66 GHz Static Frequency Divider in Transferred-substrate HBT Technology," in
IEEE MTT-S Radio Frequency Integrated Circuits Symp., June 1999.
35. H. Nakajima, T. Ishibashi, E. Sano, M. Ida, S. Tamahata, and Y. Ishii, "InP-Based High-Speed
Electronics," in IEDM Tech. Dig., pp. 771-774, 1999.
36. J.-C. Sarkissian, M. Camiade, P. Savary, A. Suarez, R. Quere, and J. Obregon, "A 60-GHz HEMT-
MMIC Analog Frequency Divider by Two," IEEE J. Solid-State Circuits, vol. 30, pp. 1062-1067,
Oct. 1995.
37. C. J. Madden, D. R. Snook, R. L. Van Tuyl, M. V. Le, and L. D. Nguyen, "A Novel 75 GHz InP
HEMT Dynamic Divider," in IEEE GaAs IC Symp. Dig., pp. 137-140, Oct. 1996.
38. Y. Umeda, K. Osafune, T. Enoki, H. Yokoyama, Y. Ishii, and Y. Imamura, "Over-60-GHz Design
Technology for an SCFL Dynamic Frequency Divider Using InP-Based HEMT's," IEEE Trans.
Microwave Theory Tech., vol. 46, pp. 1209-1214, Sept. 1998.
39. B. Peetz, B. D. Hamilton, and J. Kang, "An 8-bit 250 Megasample per Second Analog-to-Digital
Converter: Operation Without a Sample and Hold," IEEE J. Solid-State Circuits, vol. SC-21, no.
6, pp. 997-1002, Dec. 1986.
International Journal of High Speed Electronics and Systems, Vol. 11, No. 1 (2001) 115-136
© World Scientific Publishing Company

SMALL-SCALE InGaP/GaAs HETEROJUNCTION BIPOLAR TRANSISTORS
FOR HIGH-SPEED AND LOW-POWER
INTEGRATED-CIRCUIT APPLICATIONS

TOHRU OKA, KOJI HIRATA*, HIDEYUKI SUZUKI, KIYOSHI OUCHI,
HIROYUKI UCHIYAMA, TAKAFUMI TANIGUCHI,
KAZUHIRO MOCHIZUKI and TOHRU NAKAMURA**
Central Research Laboratory, Hitachi, Ltd., *Hitachi ULSI Systems Co.,
Kokubunji, Tokyo 185-8601, Japan.
**Department of Electronics and Electrical Engineering, Hosei University,
Koganei, Tokyo 184-8584, Japan

Small-scale InGaP/GaAs heterojunction bipolar transistors (HBTs) with high-speed as well as
low-current operation are demonstrated. To reduce the emitter size S_E and the base-collector
capacitance C_BC simultaneously, the HBTs are fabricated by using WSi/Ti as the base electrode
and by burying SiO2 in the extrinsic collector region. WSi/Ti metals simplify and facilitate
processing to fabricate small base electrodes, and the buried SiO2 reduces the parasitic C_BC
under the base electrode. A cutoff frequency f_T of 156 GHz and a maximum oscillation frequency
f_max of 255 GHz were obtained at a collector current I_C of 3.5 mA for the HBT with S_E of
0.5 µm × 4.5 µm, and f_T of 114 GHz and f_max of 230 GHz were obtained at I_C of 0.9 mA for the
HBT with S_E of 0.25 µm × 1.5 µm. A 1/8 static frequency divider operated at a maximum toggle
frequency of 39.5 GHz with a power consumption per flip-flop of 190 mW. A transimpedance
amplifier provides a gain of 46.5 dBΩ with a bandwidth of 41.6 GHz at a power consumption of
150 mW. These results indicate the great potential of our HBTs for high-speed, low-power
integrated-circuit applications.

1. Introduction

Recent technological advances in communication systems, such as microwave/millimeter-wave
wireless communication systems and large-capacity optical-fiber communication systems,
require ultra-high-speed devices to handle the huge volume of information. Heterojunction
bipolar transistors (HBTs) are among the most promising devices for these high-speed
applications because of their advantages over metal semiconductor field effect transistors
(MESFETs) and high electron mobility transistors (HEMTs): a much higher transconductance,
better threshold voltage control, a higher output resistance, a higher current driving
capability, and lower phase noise. Furthermore, HBTs operate at higher power densities than
MESFETs and HEMTs, which allows for a smaller die size for a given power requirement. In
addition, the HBT fabrication process is less complex, particularly in terms of lithographic
requirements, so it results in higher yields and therefore lower costs.


GaAs-based HBTs have distinctive characteristics compared to Si bipolar devices and
InP-based HBTs. Compared to Si bipolar devices, GaAs HBTs have higher cutoff
frequencies at a similar vertical device scale, which is basically associated with the higher
electron mobility and drift velocities in III-V materials [1]. Compared to InP HBTs, on the
other hand, the process technology of GaAs devices is much more mature, and large wafers can
be produced with higher yields and lower costs. Moreover, GaAs HBTs have a higher
breakdown voltage than Si bipolar transistors and InP HBTs and, thus, are suitable for
integrated circuits (ICs) with large output voltages.
To date, GaAs-based HBTs with a high cutoff frequency f_T over 100 GHz and a high
maximum oscillation frequency f_max over 200 GHz have been reported [2-4]. However, the
device dimensions of these HBTs are considerably larger than those of self-aligned,
high-speed Si bipolar devices [5-7]. This causes large power dissipation and thermal-management
problems when they are applied in highly integrated chips such as those used in high-speed
communications. Although GaAs HBTs with emitter areas equal to or smaller than 1 µm² have
been reported [8-10], their high-frequency characteristics are limited by parasitic
capacitance resulting from a large base-collector junction area, whose relative contribution
increases as the emitter size is scaled down. Simultaneous reduction of both emitter size
and parasitic capacitance is therefore the key to achieving high-speed, low-power IC
operation with GaAs-based HBTs.
To reduce the parasitic capacitance of the base-collector junction, ion implantation of
protons or oxygen into the region under the extrinsic base is extensively used [11, 12]. The
implanted ions compensate the relatively low doping of the collector and thus fully
deplete the extrinsic collector. However, the dielectric constant of the implanted regions
is still as large as that of GaAs, and the reduction of the parasitic capacitance is limited.
Furthermore, ion implantation increases the base resistance [13], which also degrades the
high-frequency characteristics.
In this paper, we demonstrate small-scale InGaP/GaAs HBTs capable of high-speed
as well as low-current operation [14, 15]. To reduce the parasitic capacitance under the base
electrode, we developed a planarization and etchback process to bury thick SiO2 in the
extrinsic base-collector region. To simplify the fabrication process, we developed base
electrodes with high process controllability by using WSi and Ti. These technologies
enable us to reduce both emitter size and parasitic capacitance simultaneously, so that
the device can operate at a high frequency and a low collector current. We applied the
small-scale HBTs to 1/8 static frequency dividers and transimpedance amplifiers and
investigated the capability of our HBTs for high-speed, low-power integrated circuits.

2. Device Design

2.1. Conventional HBT

A schematic cross-section of a conventional HBT structure is shown in Fig. 1. A certain
amount of the extrinsic base-collector junction area must be used to connect the wiring to
the base electrode, and this area is not reduced when the device dimensions are scaled
down. We analyzed the delay-time components of our conventional HBTs to identify the
factor limiting high-frequency performance. The total delay time τ_EC is approximately
expressed by

$$\tau_{EC} = \frac{1}{2\pi f_T} = \frac{kT}{qI_C}\,(C_{EB} + C_{BC}) + \tau_B + \tau_C + (R_{EE} + R_C)\,C_{BC}, \qquad (1)$$

where kT/q is the thermal voltage, I_C is the collector current, C_EB is the emitter-base
capacitance, C_BC is the base-collector capacitance, τ_B is the base transit time, τ_C is the
collector transit time, R_EE is the emitter resistance, and R_C is the collector resistance.
Figure 2 shows the dependence of delay time on emitter size for conventional HBTs with a base
thickness of 30 nm and a collector thickness of 200 nm at a collector current density of
1 × 10⁵ A/cm². Since the intrinsic transit time τ_F = τ_B + τ_C is determined by the thickness
of the neutral base and the collector depletion region, τ_F is almost constant for all device
sizes. In contrast, both the emitter charging time τ_E = (kT/qI_C)(C_EB + C_BC) and the
collector charging time τ_CC = (R_EE + R_C)C_BC increase as the emitter size is reduced. This
results from the increase in the parasitic component of C_BC, which makes τ_E and τ_CC the
dominant factors in the total delay time of small devices. We thus concluded that the
parasitic capacitance of the base-collector junction should be reduced to improve the
high-frequency performance of small-scale HBTs.

Fig. 1. Schematic cross-section of a conventional HBT structure.

Fig. 2. Dependence of delay time on emitter size of conventional HBTs.
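As an illustration of how the terms of Eq. (1) combine, the following minimal Python sketch evaluates τ_E, τ_CC, τ_EC, and f_T for a given set of parameters. The function name, the 300 K temperature, and the numerical values in the example call are assumptions chosen only for illustration; they are not measured data from this work.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant (J/K)
Q_E = 1.602176634e-19    # elementary charge (C)


def delay_components(i_c, c_eb, c_bc, r_ee, r_c, tau_f, temp_k=300.0):
    """Evaluate the terms of Eq. (1). All arguments are in SI units.

    tau_f is the intrinsic transit time tau_B + tau_C.
    temp_k = 300 K is an assumed junction temperature.
    """
    v_t = K_B * temp_k / Q_E                  # thermal voltage kT/q (V)
    tau_e = (v_t / i_c) * (c_eb + c_bc)       # emitter charging time (s)
    tau_cc = (r_ee + r_c) * c_bc              # collector charging time (s)
    tau_ec = tau_e + tau_f + tau_cc           # total delay time (s)
    f_t = 1.0 / (2.0 * math.pi * tau_ec)      # cutoff frequency (Hz)
    return {"tau_e": tau_e, "tau_cc": tau_cc, "tau_ec": tau_ec, "f_t": f_t}


# Hypothetical device parameters, for illustration only.
result = delay_components(
    i_c=2e-3, c_eb=10e-15, c_bc=20e-15, r_ee=10.0, r_c=8.0, tau_f=0.65e-12
)
for name, value in result.items():
    print(f"{name}: {value:.3e}")
```

Because C_BC enters both τ_E and τ_CC, reducing it shortens two of the three delay terms at once, which is the motivation for the structure described in the next section.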

2.2. Developed HBT with Buried SiO2

A schematic cross-section of our developed small-scale HBT is shown in Fig. 3. The
structure of the developed HBT has the following two features. One is that the extrinsic
base-collector region is buried with SiO2. The buried SiO2 reduces the parasitic
capacitance under the base electrode because the dielectric constant of SiO2 is about 1/3
of that of GaAs [16]. Furthermore, the width of the base contact is reduced to 0.3 µm by
using a self-aligning process. The narrow width also reduces the base-collector
capacitance because it decreases the base-collector junction area. The other feature is that
WSi/Ti metals are used as the base electrode. These materials have definite advantages
over conventional gold-based electrode metals: both WSi and Ti can be deposited
by sputtering with good step coverage and selectively patterned on GaAs and
SiO2 by using reactive ion etching (RIE). These properties simplify and
facilitate the fabrication of HBT base electrodes with a narrow base contact and
buried SiO2. In addition, insertion of a thin Ti film between WSi and p-type GaAs further
reduces the contact resistance compared to using WSi only [17, 18]. This suppresses the large
increase in the base resistance that occurs when the width of the base contact is reduced.
These features enable us to reduce both emitter size and base-collector capacitance
simultaneously, leading to high-speed operation of GaAs HBTs at low collector currents.

Fig. 3. Schematic cross-section of our developed small-scale HBT with a WSi/Ti base electrode and buried
SiO2.

3. Characterization and Optimization of Base Contact

Before fabricating small-scale devices, we investigated the characteristics of WSi and
WSi/Ti metals as the base electrode. We also estimated the optimum width of the base
contact in order to design HBTs with both high f_T and high f_max.

3.1. Contact Resistance of WSi and WSi/Ti

We evaluated the specific contact resistance ρ_c between GaAs and WSi or WSi/Ti ohmic
metals by using transmission line model (TLM) measurements [19, 20]. WSi (300 nm)/Ti (5
nm) or WSi (300 nm) films were deposited by sputtering on 30-nm-thick p-GaAs layers
identical to actual HBT base layers. The composition ratio x of WSix was 0.3, and the sheet
resistance of the 300-nm-thick WSi was approximately 7 Ω/square.
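For readers unfamiliar with TLM extraction, the sketch below shows the usual procedure in Python: the resistance measured between adjacent pads is fitted linearly against the pad spacing, the slope gives the sheet resistance, and the intercept gives twice the contact resistance. It assumes pads much longer than the transfer length, and all numbers (pad width, spacings, sheet resistance) are hypothetical values used to generate synthetic data; they are not measurements from this work.

```python
import math


def tlm_extract(spacings_cm, resistances_ohm, pad_width_cm):
    """Least-squares fit of R(d) = (R_sheet / W) * d + 2 * R_contact.

    Returns (R_sheet in ohm/sq, R_contact in ohm, L_T in cm, rho_c in ohm*cm^2).
    Assumes contact pads much longer than the transfer length L_T.
    """
    n = len(spacings_cm)
    mean_d = sum(spacings_cm) / n
    mean_r = sum(resistances_ohm) / n
    sxx = sum((d - mean_d) ** 2 for d in spacings_cm)
    sxy = sum((d - mean_d) * (r - mean_r)
              for d, r in zip(spacings_cm, resistances_ohm))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_d
    r_sheet = slope * pad_width_cm
    r_contact = intercept / 2.0
    l_transfer = r_contact * pad_width_cm / r_sheet
    rho_c = r_sheet * l_transfer ** 2
    return r_sheet, r_contact, l_transfer, rho_c


# Synthetic data built from assumed (hypothetical) values.
R_SHEET_TRUE = 350.0     # ohm/square, assumed base sheet resistance
RHO_C_TRUE = 3e-7        # ohm*cm^2, same order as the WSi/Ti result
PAD_WIDTH_CM = 100e-4    # 100 um wide pads, assumed
l_t_true = math.sqrt(RHO_C_TRUE / R_SHEET_TRUE)
r_c_true = R_SHEET_TRUE * l_t_true / PAD_WIDTH_CM
spacings = [2e-4, 4e-4, 8e-4, 16e-4]   # pad gaps in cm
resistances = [R_SHEET_TRUE * d / PAD_WIDTH_CM + 2.0 * r_c_true for d in spacings]

r_sheet, r_contact, l_transfer, rho_c = tlm_extract(spacings, resistances, PAD_WIDTH_CM)
print(f"R_sheet = {r_sheet:.1f} ohm/sq, rho_c = {rho_c:.2e} ohm*cm^2")
```

With noise-free synthetic data the fit simply recovers the assumed values; with real measurements the scatter of the points around the fitted line indicates the uncertainty of ρ_c.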
Specific contact resistance ρ_c as a function of base carrier concentration N_A is shown
in Fig. 4. The dashed lines are the theoretical curves of the relationship between ρ_c and
N_A based on a tunneling model [21]. We calculated these curves for various
potential barrier heights φ_B, assuming that the effective tunneling hole mass is 0.1 m,
where m is the free electron mass. The ρ_c of WSi fits the theoretical curve for φ_B of 0.8
eV at lower carrier concentrations, and it deviates toward higher φ_B as the carrier
concentration increases. The ρ_c of WSi/Ti, in contrast, agrees well with the curve for φ_B
of 0.6 eV at any carrier concentration. All these ρ_c values are about one order of magnitude
lower than those of WSi. At a carrier concentration of 1 × 10²⁰ cm⁻³, for example, ρ_c is
reduced from 2 × 10⁻⁶ Ω·cm² to 3 × 10⁻⁷ Ω·cm². This result indicates that Ti film insertion
effectively decreases the potential barrier height and thus reduces the contact resistance.

Fig. 4. Specific contact resistance of WSi and WSi/Ti as a function of carrier concentration in p-GaAs layers.

3.2. Optimization of Base Contact Width

Narrowing the width of the base contact W_BC decreases the base-collector capacitance
C_BC and thus increases both f_T and f_max. However, it simultaneously increases the base
resistance R_B, which decreases f_max. Therefore, W_BC has an optimum value for achieving
both high f_T and high f_max. We thus estimated the dependence of the product of R_B and
C_BC on W_BC.
Figure 5 shows the calculated product of R_B and C_BC of a device with an emitter size
S_E of 0.5 µm × 5 µm and a base doping concentration of 1 × 10²⁰ cm⁻³. The calculation
was carried out by assuming that the device structure is as described in the next section.
The product for the case of the WSi base electrode reaches its minimum at W_BC of 0.55
µm and rapidly increases when W_BC is below 0.4 µm. On the other hand, the product for
the WSi/Ti electrode is about 1/2 to 1/3 of that for the WSi electrode. The
product shows its minimum at W_BC of 0.25 µm, and it does not increase markedly
even when W_BC is less than 0.2 µm. These results are due to the extremely low
ρ_c of WSi/Ti and suggest that W_BC can be reduced to less than 0.4 µm without greatly
increasing the base resistance. Although the optimum value of W_BC changes slightly
according to emitter size, our developed HBTs would have both high f_T and high f_max at
W_BC of around 0.2 to 0.3 µm.

Fig. 5. Calculated product of base resistance R_B and base-collector capacitance C_BC as a function of base
contact width W_BC (S_E = 0.5 µm × 5 µm; WSi: ρ_c = 2 × 10⁻⁶ Ω·cm²; WSi/Ti: ρ_c = 3 × 10⁻⁷ Ω·cm²).
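A rough feel for why the lower ρ_c permits a narrower contact can be obtained from the standard transmission-line expression for a contact of finite width, R = (√(ρ_c R_sh)/L) coth(W_BC/L_T), with transfer length L_T = √(ρ_c/R_sh). The Python sketch below evaluates only this contact term; it is not the calculation used for Fig. 5. The base sheet resistance of 350 Ω/square and the 5-µm contact length are assumed values, not data from this work.

```python
import math

R_SHEET_ASSUMED = 350.0      # ohm/square, hypothetical base sheet resistance
CONTACT_LENGTH_CM = 5e-4     # 5 um, taken equal to the emitter length (assumption)
RHO_C = {"WSi": 2e-6, "WSi/Ti": 3e-7}   # ohm*cm^2, values quoted in Section 3.1


def transfer_length_cm(rho_c, r_sheet):
    """Transfer length L_T = sqrt(rho_c / R_sheet)."""
    return math.sqrt(rho_c / r_sheet)


def contact_resistance_ohm(rho_c, r_sheet, width_cm, length_cm):
    """Resistance of one contact stripe of finite width (coth = 1 / tanh)."""
    l_t = transfer_length_cm(rho_c, r_sheet)
    return math.sqrt(rho_c * r_sheet) / length_cm / math.tanh(width_cm / l_t)


lt_um = {name: transfer_length_cm(rho, R_SHEET_ASSUMED) * 1e4
         for name, rho in RHO_C.items()}
rc_at_0p3 = {name: contact_resistance_ohm(rho, R_SHEET_ASSUMED, 0.3e-4, CONTACT_LENGTH_CM)
             for name, rho in RHO_C.items()}

for name in RHO_C:
    print(f"{name}: L_T = {lt_um[name]:.2f} um, "
          f"R_contact(W_BC = 0.3 um) = {rc_at_0p3[name]:.0f} ohm per side")
```

Under these assumptions the contact resistance rises steeply once W_BC falls well below L_T, so a shorter transfer length moves the onset of that rise to narrower contacts, in qualitative agreement with the trend described above.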

4. Fabrication Process

The epitaxial layers of the fabricated small-scale HBTs were grown on a semi-insulating
(100) GaAs substrate by gas-source molecular beam epitaxy (MBE) [22]. The group III
sources were elemental Ga and In, and the group V sources were cracked arsine (AsH3)
and phosphine (PH3). The n- and p-type dopant sources were solid Si and carbon
tetrabromide (CBr4), respectively.
The parameters of the epitaxial layer structure are listed in Table 1. A triple emitter-cap
layer was used to reduce the emitter resistance: 50-nm-thick InGaAs doped to 4 × 10¹⁹
cm⁻³, 100-nm-thick GaAs doped to 8 × 10¹⁸ cm⁻³, and 50-nm-thick InGaP doped to 8 ×
10¹⁸ cm⁻³. The emitter layer was 100-nm-thick InGaP doped to 5 × 10¹⁷ cm⁻³. The p-GaAs
base layer was highly doped to 1 × 10²⁰ cm⁻³ to reduce the contact resistance with the
WSi/Ti base electrode, and the thickness was 30 nm to obtain appropriate current gain at
the high doping level. The n-GaAs collector layer was 200 nm thick and doped to 2 ×
10¹⁶ cm⁻³. The n-GaAs subcollector layer was 800 nm thick and doped to 8 × 10¹⁸ cm⁻³.
The relatively thick subcollector layer was used in order to bury the thick SiO2 in the
extrinsic collector.
Table 1. Epitaxial layer structure parameters of fabricated HBTs.

Layer          Material    Doping (cm⁻³)    Thickness (nm)
Emitter-cap    n-InGaAs    4 × 10¹⁹         50
               n-GaAs      8 × 10¹⁸         100
               n-InGaP     8 × 10¹⁸         50
Emitter        n-InGaP     5 × 10¹⁷         100
Base           p-GaAs      1 × 10²⁰         30
Collector      n-GaAs      2 × 10¹⁶         200
Subcollector   n-GaAs      8 × 10¹⁸         800

The main steps in the fabrication process are schematically shown in Fig. 6. Device
fabrication starts with W/WSi deposition on the InGaAs emitter-cap layer by RF
sputtering. These metals are formed into a non-alloyed emitter electrode by RIE using
CHF3 and SF6. By using this electrode as a mask, the InGaAs and GaAs layers are etched
by using Cl2/CH4 electron cyclotron resonance (ECR) plasma (Fig. 6(a)).
The next step involves formation of a 0.4-µm-thick SiO2 sidewall around the emitter
electrode. By using this sidewall as a mask, the base and collector layers are etched in a
self-aligned manner by ECR plasma. F+ ions are then implanted into the subcollector to
isolate each device, and the outsides of the base and collector mesas are buried with SiO2
by using a planarization and etchback process (Fig. 6(b)).
The InGaP layers are etched by the ECR plasma, leaving approximately 50 nm
unetched so that the plasma does not damage the base surface. The remaining InGaP
layer is removed by selective wet-chemical etching using a solution of dilute HCl. During
this etching, the side of the emitter layer is hardly etched because of the orientation
dependence of the etching rate. A 0.1-µm-thick SiO2 sidewall is then formed to avoid a short
between the emitter and the base electrode (Fig. 6(c)). This results in a base contact
width of about 0.3 µm, which is the proper width to obtain both high f_T and high f_max, as
explained in the previous section.
WSi/Ti is deposited by RF sputtering and etched by RIE to define the base electrode
(Fig. 6(d)). Then, AuGe/W/Ni/Au/Mo is deposited on the subcollector as a collector
electrode and alloyed at 350°C for 30 min. Subsequently, the WSi on the emitter
electrode is selectively etched by RIE with CF4 by using a photoresist as a mask, and the
Ti on the emitter electrode is removed by dilute HF (Fig. 6(e)).
After the devices are passivated and planarized with SiO2 and spin-on-glass, WSiN
resistors with 100 Ω/square sheet resistance and the first level of metallization are
fabricated (Fig. 6(f)). This metallization forms most of the probe pads, resistor contacts,
capacitor bottom plates, interconnect wiring, and transmission lines.
To serve as interconnect crossovers for the second wiring level, the first level of
metallization is passivated and planarized with SiO2 and spin-on-glass. SiN is then
deposited and etched away in unwanted regions in order to form MIM capacitors. Finally,
Au is evaporated to form the second level of metallization and the top plate of the MIM
capacitors.

Fig. 6. Fabrication steps (a)-(f) for the HBT with a WSi/Ti base electrode and buried SiO2.

Fig. 7. SEM cross-section of fabricated HBT with a WSi/Ti base electrode and buried SiO2.

A cross-sectional SEM photograph of the fabricated HBT is shown in Fig. 7. The
base-contact width is 0.3 µm and the sidewall thickness is about 0.1 µm. The final
thickness of the buried SiO2 is 0.4-0.5 µm. The WSi/Ti base electrode covers both the
narrow base-contact surface and the SiO2 buried in the extrinsic collector regions,
indicating that WSi and Ti are useful materials for fabricating base electrodes of
small-scale HBTs with buried SiO2.

5. Device Performance

5.1. DC Characteristics

DC characteristics of a fabricated HBT with an emitter size S_E of 0.25 µm × 1.5 µm are
shown in Fig. 8. Figure 8(a) shows the common-emitter I_C-V_CE characteristics. The
small-signal current gain h_fe is 28, and a collector-emitter breakdown voltage BV_CEO of
9.6 V is attained. The offset voltage is about 0.3 V. This relatively large offset voltage is
probably attributable to the relatively large ratio of the base-collector junction area to the
emitter area as well as to the difference in turn-on voltages of the base-emitter and
base-collector junctions. Figure 8(b) shows a Gummel plot at a base-collector bias voltage
V_BC of 0 V. A DC current gain h_FE of 30 is achieved at a collector current density J_C of
1 × 10⁵ A/cm², and it decreases at J_C above 2 × 10⁵ A/cm² because of the base push-out
effect at high current. The relatively large current gain for the small-emitter device is
attributed to the low surface recombination velocity of InGaP, which suppresses the emitter
size effect on current gain [23, 24]. The ideality factors of the collector and the base
currents are 1.0 and 1.6, respectively.

Fig. 8. DC characteristics of fabricated HBT with an emitter size S_E of 0.25 µm × 1.5 µm: (a) common-emitter
I_C-V_CE characteristics (I_B = 10 µA/step); (b) Gummel plot.

5.2. Microwave Characteristics

We investigated the high-frequency characteristics of the fabricated HBTs by using
on-wafer S-parameter measurements with an HP85107A network analyzer system and
cascade microwave probes. The measurements were carried out in the frequency range of
100 MHz to 40 GHz. The pad parasitics were de-embedded using the method presented
by Costa et al. [25]
Figure 9 shows the frequency dependence of the small-signal current gain |h21|², unilateral
power gain UG, and maximum stable gain MSG for an HBT with an emitter size S_E of
0.5 µm × 4.5 µm. The collector-emitter bias voltage V_CE was 1.5 V and the collector
current I_C was 3.5 mA. The f_T and f_max as estimated with -20 dB/decade extrapolations
from |h21|² and UG were 156 GHz and 255 GHz, respectively.
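Since the measurement range ends at 40 GHz, both figures of merit rest on extrapolation. The following Python sketch shows the arithmetic of a -20 dB/decade extrapolation from a single gain point to the unity-gain frequency. The gain value used in the example is not a measured datum; it is back-calculated from the reported f_T purely to illustrate the procedure, and the function name is hypothetical.

```python
import math


def extrapolate_unity_gain(freq_hz, gain_db, slope_db_per_decade=-20.0):
    """Frequency at which a gain falling at the given slope reaches 0 dB."""
    decades_to_unity = gain_db / -slope_db_per_decade
    return freq_hz * 10.0 ** decades_to_unity


# Illustrative point: the |h21|^2 gain (in dB) that a device with f_T = 156 GHz
# would show at 40 GHz if it followed an ideal -20 dB/decade roll-off.
F_MEAS_HZ = 40e9
gain_at_40ghz_db = 20.0 * math.log10(156e9 / F_MEAS_HZ)

f_t_est = extrapolate_unity_gain(F_MEAS_HZ, gain_at_40ghz_db)
print(f"gain at 40 GHz: {gain_at_40ghz_db:.2f} dB -> f_T = {f_t_est / 1e9:.1f} GHz")
```

In practice the extrapolation is taken from the portion of the measured curve that actually follows the -20 dB/decade slope, so the result depends on how well that region is identified.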

Fig. 9. Frequency dependence of small-signal current gain |h21|², unilateral gain UG, and maximum stable
gain MSG of fabricated HBT with S_E of 0.5 µm × 4.5 µm. The measurements were done at V_CE = 1.5 V and
I_C = 3.5 mA.

Figure 10 shows the dependence of f_T and f_max on collector current for a fabricated HBT
with S_E of 0.5 µm × 4.5 µm at V_CE of 1.5 V. The figure also plots the results for the
conventional HBT (S_E = 1.2 µm × 3.4 µm) without SiO2 buried under the base
electrode [24]. The fabricated HBT operates at much higher f_T and f_max at lower collector
currents than the conventional HBT, demonstrating that the reduction of C_BC effectively
improves the high-frequency performance of small-scale devices. The fabricated HBT
exhibits a peak f_T of 156 GHz at I_C of 3.5 mA and a peak f_max of 260 GHz at I_C of 3.2
mA. This result indicates the great potential of the developed HBTs for high-speed and
low-power applications.
The product of f_T and BV_CEO is one of the figures of merit of high-speed bipolar
devices because reducing the collector thickness increases f_T but decreases BV_CEO. The
product of f_T and BV_CEO of the developed HBT is 1.5 THz·V, which is five times larger
than that of Si bipolar devices and larger than that of high-performance InP-based
HBTs [26-28]. This result suggests that the developed HBTs are also suitable for ICs with
large output voltages.

Fig. 10. Dependence of (a) f_T and (b) f_max on collector current of fabricated HBT with S_E of 0.5 µm × 4.5 µm
and conventional HBT with S_E of 1.2 µm × 3.4 µm.

Figure 11 shows the frequency dependence of |h21|², UG, and MSG for an HBT with S_E of
0.25 µm × 1.5 µm at V_CE of 1.5 V and I_C of 0.9 mA. The f_T and f_max as estimated with -20
dB/decade extrapolations from |h21|² and UG were 114 GHz and 230 GHz, respectively.
Figure 12 shows the dependence of f_T and f_max on I_C for HBTs with S_E of 0.25 µm × 1.5 µm
at V_CE of 1.5 V. Although the high-frequency performance is degraded compared to that of
the larger-emitter device, a peak f_T of 114 GHz and a peak f_max of 230 GHz are
achieved at I_C of only 0.9 mA. Furthermore, the device operates at f_T as high as 50 GHz
and f_max as high as 135 GHz at I_C as low as 0.1 mA. These excellent high-frequency
characteristics at low collector currents are due to the simultaneous reduction of the
emitter size and base-collector capacitance.

Fig. 11. Frequency dependence of small-signal current gain |h21|², unilateral gain UG, and maximum stable
gain MSG of fabricated HBT with S_E of 0.25 µm × 1.5 µm. The measurements were done at V_CE = 1.5 V and
I_C = 0.9 mA.

Fig. 12. Dependence of f_T and f_max on collector current of fabricated HBT with S_E of 0.25 µm × 1.5 µm at
V_CE of 1.5 V.

Typical device characteristics of the developed HBTs and the conventional HBT are
summarized in Table 2. The C_BC of the HBT with S_E of 0.5 µm × 4.5 µm is reduced to
6.7 fF, which is about 1/4 of that of the conventional HBT, whereas S_E is about 1/2.
As a result, both τ_E and τ_CC are greatly decreased. However, the high-frequency
performance of the smaller device is degraded compared to that of the larger device. This is
because C_BC is not reduced in proportion to emitter size, whereas R_EE increases in
inverse proportion to emitter size. The large R_EE therefore still produces a significant
parasitic delay time, even though the parasitic component of C_BC is small. We thus conclude
that reducing the emitter resistance will further improve the high-frequency performance of
much smaller devices.

Table 2. Typical device characteristics of developed HBTs and conventional HBT.

Emitter size (µm²)                     0.5 × 4.5     0.25 × 1.5    1.2 × 3.4
                                       (developed)   (developed)   (conventional)
current gain h_FE                      35            30            40
emitter resistance R_EE (Ω)            15            90            6.6
base resistance R_B (Ω)                29            53            25
collector resistance R_C (Ω)           6.3           11            8.1
emitter capacitance C_EB (fF)          6.2           1.8           13.8
collector capacitance C_BC (fF)        6.7           3.5           25.8
f_T peak (GHz)                         156           114           115
f_max peak (GHz)                       260           230           159
emitter charging time τ_E (ps)         0.23          0.39          0.35
intrinsic transit time τ_F (ps)        0.65          0.65          0.65
collector charging time τ_CC (ps)      0.14          0.35          0.38
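The entries of Table 2 can be cross-checked against Eq. (1). The short Python sketch below recomputes τ_CC = (R_EE + R_C)C_BC from the listed resistances and capacitance, and recomputes f_T from the sum of the three listed delay components. The dictionary keys and function name are illustrative; the numbers are those listed in the table.

```python
import math

# (R_EE ohm, R_C ohm, C_BC fF, tau_E ps, tau_F ps, tau_CC ps, peak f_T GHz), from Table 2
TABLE_2 = {
    "developed 0.5 x 4.5": (15.0, 6.3, 6.7, 0.23, 0.65, 0.14, 156.0),
    "developed 0.25 x 1.5": (90.0, 11.0, 3.5, 0.39, 0.65, 0.35, 114.0),
    "conventional 1.2 x 3.4": (6.6, 8.1, 25.8, 0.35, 0.65, 0.38, 115.0),
}


def check_device(r_ee, r_c, c_bc_ff, tau_e_ps, tau_f_ps, tau_cc_ps, f_t_ghz):
    """Return (tau_CC recomputed in ps, f_T recomputed in GHz)."""
    tau_cc_calc_ps = (r_ee + r_c) * c_bc_ff * 1e-3      # ohm * fF = fs, so scale to ps
    tau_ec_ps = tau_e_ps + tau_f_ps + tau_cc_ps         # total delay of Eq. (1)
    f_t_calc_ghz = 1e3 / (2.0 * math.pi * tau_ec_ps)    # 1/ps = THz, so scale to GHz
    return tau_cc_calc_ps, f_t_calc_ghz


results = {name: check_device(*row) for name, row in TABLE_2.items()}
for name, (tau_cc_calc, f_t_calc) in results.items():
    print(f"{name}: tau_CC = {tau_cc_calc:.2f} ps, f_T = {f_t_calc:.1f} GHz")
```

The recomputed values agree with the tabulated τ_CC and peak f_T to within rounding, and the sixfold larger R_EE of the smallest device is what keeps its τ_CC comparable to that of the conventional HBT despite the much smaller C_BC.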

6. Circuit Applications

To investigate the capability of our developed HBTs as elements of high-speed and low-
power integrated circuits, we applied our HBT technology to fabricate 1/8 static
frequency dividers as digital circuits and transimpedance amplifiers as analog circuits.

6.1. Static Frequency Divider

A block diagram of a 1/8 static frequency divider is shown in Fig. 13. The divider
consists of an input buffer composed by emitter followers with 50-Q on-chip resistors,
three stages of divide-by-two master-slave T-type flip-flop (MS-T-FF) with an internal
buffer in series, and an output buffer consisting of a differential amplifier. Each T-FF is
constructed with series-gated emitter-coupled logic (ECL) with an internal single-ended
voltage swing of 400 mV. The supply voltage is -6.5 V. Figure 14 shows a chip
micrograph of the fabricated frequency divider. The divider includes 187 transistors with

o- Input
Jt •>. MS Inter^^. MS Interf s ^ M s InterT^v^. Output
buffer.
fer^— T-FF buffer, T-FF buffer/^ y-FF buffer^" buffer.

Fig. 13. Block diagram of a 1/8 static frequency divider.


130 T. Oka et al.

;; H i -

Fig. 14. Chip micrograph of a 1/8 staticfrequencydivider.

SE of 0.5 µm x 4.5 µm. The size of the chip is 0.9 mm x 1.8 mm, and the MS-T-FF with
the internal buffer occupies 210 µm x 440 µm.
The measured minimum input peak-to-peak voltage Vp-p versus the input frequency of
the frequency divider is shown in Fig. 15. The divider was driven by a single-ended input
signal and reliable operation was guaranteed above the minimum input voltage. In the
frequency range of 3-38 GHz, the minimum input voltage is less than 500 mVp-p. The

Fig. 15. Measured minimum input voltage versus input frequency of a 1/8 static frequency divider.

self-oscillation frequency is about 24 GHz, and the divider operates up to 39.5 GHz. The
maximum toggle frequency is lower than that of previously reported SiGe HBTs in spite
of the higher fT and fmax of our HBTs.29 This is probably due to interconnect delay owing
to the relatively loose circuit layout, as shown in Fig. 14. Therefore, considerably
higher-speed operation should be achievable by optimizing the layout design. Figure 16
shows the measured input and output waveforms at the highest operating frequency. The
power consumption per flip-flop is 190 mW, which is about 2/3 of that of a previously
reported 40-GHz static frequency divider consisting of GaAs HBTs.3
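
The divide-by-eight behaviour of three cascaded divide-by-two stages can be illustrated with a small behavioural sketch. It is not a model of the ECL circuit: each stage is idealised as a toggle flip-flop that changes state on a rising edge of its input, and the function names are hypothetical.

```python
def count_output_cycles(n_input_edges, stages=3):
    """Count rising edges at the output of cascaded toggle flip-flops.

    Each idealised stage toggles on a rising edge of its input, and its
    own 0 -> 1 transition acts as the rising edge for the next stage.
    """
    state = [0] * stages
    output_rising_edges = 0
    for _ in range(n_input_edges):
        edge = True  # a rising edge arrives at the first stage
        for i in range(stages):
            if not edge:
                break
            state[i] ^= 1            # toggle on the rising edge
            edge = state[i] == 1     # only a 0 -> 1 change clocks the next stage
            if i == stages - 1 and edge:
                output_rising_edges += 1
    return output_rising_edges

def output_frequency_ghz(input_ghz, stages=3):
    """Output frequency of an ideal divide-by-2**stages chain."""
    return input_ghz / (2 ** stages)

print(count_output_cycles(800))       # 800 input cycles give 100 output cycles
print(output_frequency_ghz(39.5))     # 4.9375 GHz at the 39.5-GHz maximum input
```

At the reported 39.5-GHz maximum input frequency, an ideal 1/8 divider would therefore deliver an output near 4.94 GHz.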


Fig. 16. Measured (a) input and (b) output waveforms of a 1/8 static frequency divider at 39.5 GHz.

6.2. Transimpedance Amplifier

A circuit diagram of a transimpedance amplifier is shown in Fig. 17. We designed and
fabricated feedback-type amplifiers with a basic common-emitter configuration. The
circuit consists of a transimpedance gain stage and an impedance matching stage for the
output. To ensure good termination of the output, a resistor is inserted in series between
the emitter follower and the output pad. Two transistor sizes (Q1 and Q3: 0.5 µm x 9.5
µm; Q2: 0.5 µm x 4.5 µm) are used. The transimpedance gain stage has a feedback
resistance RF of 500 Ω and a load resistance RL of 1000 Ω. Supply voltages VC1, VC2, and
VE are 11 V, 3.2 V, and -1.5 V, respectively. Figure 18 shows a chip micrograph of the
fabricated transimpedance amplifier. A 50-Ω coplanar waveguide transmission line is
utilized to connect the resistor used for the termination of the output pad and to achieve
good return-loss characteristics. Inputs are directly connected to the input pad in order to
avoid inductance and capacitance of the interconnect lines. The chip size is 1.0 mm x 1.4
mm.

Fig. 17. Circuit diagram of a transimpedance amplifier.

Fig. 18. Chip micrograph of a transimpedance amplifier.



The amplifier was tested on wafer by using RF probe heads in the frequency range of
45 MHz to 50 GHz. The dependence of transimpedance gain ZT and output return-loss
characteristics S22 on the frequency of the transimpedance amplifier is shown in Fig. 19.
The transimpedance characteristics were calculated from the measured S parameters. The
amplifier has a transimpedance gain of 46.5 dBΩ with a 3-dB bandwidth of 41.6 GHz,
and its output return loss is below -10 dB at frequencies up to 42.0 GHz. A
gain-bandwidth product of 8.8 THz·Ω is attained. The power consumption is 150 mW,
which is less than half that of the same type of amplifier previously reported.31 These
results indicate that our HBTs are very promising for producing high-speed ICs with low
power dissipation.
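
The quoted gain-bandwidth product follows from the two measured figures once the gain is converted from dBΩ to ohms. A minimal sketch, assuming the standard 20·log10 definition of dBΩ (the helper names are illustrative):

```python
def dbohm_to_ohm(gain_dbohm):
    """Convert a transimpedance gain from dB-ohm to ohm (20*log10 convention)."""
    return 10 ** (gain_dbohm / 20)

def gain_bandwidth_thz_ohm(gain_dbohm, bandwidth_ghz):
    """Gain-bandwidth product in THz*ohm."""
    return dbohm_to_ohm(gain_dbohm) * bandwidth_ghz / 1e3

z_t = dbohm_to_ohm(46.5)                    # roughly 211 ohm
gbw = gain_bandwidth_thz_ohm(46.5, 41.6)    # roughly 8.8 THz*ohm
print(f"ZT = {z_t:.0f} ohm, GBW = {gbw:.2f} THz*ohm")
```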


Fig. 19. Frequency dependence of transimpedance gain and output return-loss characteristics of a
transimpedance amplifier.

7. Summary

We have developed small-scale InGaP/GaAs HBT technology for high-speed and low-
current operation. The simultaneous reduction of the emitter size and the base-collector
capacitance was achieved in the HBTs by using WSi/Ti as the base electrode and by
burying SiO2 in the extrinsic base-collector region under the base electrode, leading to
high-speed as well as low-current operation of GaAs HBTs. We obtained an fT of 156
GHz and an fmax of 255 GHz for an HBT with SE of 0.5 µm x 4.5 µm at IC of 3.5 mA, and
an fT of 114 GHz and an fmax of 230 GHz for an HBT with SE of 0.25 µm x 1.5 µm at IC
of 0.9 mA. A 1/8 static frequency divider using the HBTs operated at a maximum
operation frequency of 39.5 GHz with a power consumption per flip-flop of 190 mW. A
transimpedance amplifier has a 46.5-dBΩ gain with a 41.6-GHz bandwidth, and a power
consumption of 150 mW. These results indicate the great potential of our developed
HBTs for high-speed and low-power integrated-circuit applications. We believe vertical
scaling of epitaxial layers and optimization of circuit design will enable us to realize
even higher-speed integrated circuits using GaAs HBTs.

Acknowledgment

The authors would like to thank Mr. I. Ohbu and the process staff at the Hitachi Central
Research Laboratory for their contributions to the process development and the device
fabrication.

References

1. H. Kroemer, "Heterostructure bipolar transistors and integrated circuits", Proc. IEEE, 70,
(1982) 13-25.
2. T. Ishibashi, H. Nakajima, H. Ito, S. Yamahata, and Y. Matsuoka, "Suppressed base-widening
in AlGaAs/GaAs ballistic collection transistors", in 48th Annual Device Research Conf., 1990,
VIIB-3.
3. H. Simawaki, Y. Amamiya, N. Furuhata, and K. Honjo, "High-fmax AlGaAs/InGaAs and
AlGaAs/GaAs HBTs fabricated with MOMBE selective growth in extrinsic base regions", in
51st Annual Device Research Conf., 1993, IVA-6.
4. M. Yanagihara, H. Sakai, Y. Ota, M. Tanabe, K. Inoue, and A. Tamura, "253-GHz fmax
AlGaAs/GaAs HBT with Ni/Ti/Pt/Ti/Pt-contact and L-shaped base electrode", in Tech. Dig.
IEEE IEDM, 1995, pp. 807-810.
5. T. Uchino, T. Shiba, T. Kikuchi, Y. Tamaki, A. Watanabe, Y. Kiyota, and M. Honda, "15-ps
ECL/74-GHz fT Si bipolar technology", in Tech. Dig. IEEE IEDM, 1993, pp. 67-70.
6. M. Ugajin, J. Kodate, Y. Kobayashi, S. Konaka, and T. Sakai, "Very-high fT and fmax Si
bipolar transistors using ultra-high-performance super self-aligned process technology for
low-energy and ultra-high-speed LSI's", in Tech. Dig. IEEE IEDM, 1995, pp. 735-738.
7. K. Washio, M. Kondo, E. Ohue, K. Oda, R. Hayami, M. Tanabe, H. Shimamoto, and T.
Harada, "A 0.2-µm self-aligned SiGe HBT featuring 107-GHz fmax and 6.7-ps ECL", in Tech.
Dig. IEEE IEDM, 1999, pp. 557-560.
8. K. Nagata, O. Nakajima, T. Nittono, Y. Yamauchi, and T. Ishibashi, "A new self-alignment
technology using bridged base electrode for small-scaled AlGaAs/GaAs HBTs", IEEE Trans.
Electron Devices 39 (1992) 1786-1792.
9. W. S. Lee, T. Enoki, S. Yamahata, Y. Matsuoka, and T. Ishibashi, "Submicrometer self-
aligned AlGaAs/GaAs heterojunction bipolar transistor process suitable for digital
applications", IEEE Trans. Electron Devices 39 (1992) 2694-2700.

10. Y. Ueda, N. Hayama, and K. Honjo, "Submicron-square emitter AlGaAs/GaAs HBTs with
AlGaAs hetero-guardring", IEEE Electron Device Lett. 15 (1994) 66-68.
11. D. C. D'Avanzo, "Proton isolation for GaAs integrated circuits", IEEE Trans. Electron
Devices 29 (1982) 1051-1059.
12. P. M. Asbeck, D. L. Miller, R. J. Anderson, and F. H. Eisen, "GaAs/AlGaAs heterojunction
bipolar transistors with buried oxygen-implanted isolation layers", IEEE Electron Device Lett.
5 (1984) 310-312.
13. M.-C. Ho, R. A. Johnson, W. J. Ho, M. F. Chang, and P. M. Asbeck, "High-performance low-
base-collector capacitance AlGaAs/GaAs heterojunction bipolar transistors fabricated by deep
ion implantation", IEEE Electron Device Lett. 16 (1995) 512-514.
14. T. Oka, K. Hirata, K. Ouchi, H. Uchiyama, K. Mochizuki, and T. Nakamura, "InGaP/GaAs
HBT's with high-speed and low-current operation fabricated using WSi/Ti as the base
electrode and burying SiO2 in the extrinsic collector", in Tech. Dig. IEEE IEDM, 1997, pp.
739-742.
15. T. Oka, K. Hirata, K. Ouchi, H. Uchiyama, K. Mochizuki, and T. Nakamura, "Small-scaled
InGaP/GaAs HBT's with WSi/Ti base electrode and buried SiO2", IEEE Trans. Electron
Devices 45 (1998) 2276-2282.
16. S. M. Sze, Physics of Semiconductor Devices, John Wiley & Sons, Inc., New York, 1981, pp.
850-852.
17. T. Oka, K. Ouchi, K. Mochizuki, and T. Nakamura, "A WSi base electrode and a heavily-
doped thin base layer for high-speed and low-power InGaP/GaAs HBTs", Japan. J. Appl.
Phys. 36 (1997) 1804-1806.
18. T. Oka, K. Ouchi, H. Uchiyama, T. Taniguchi, K. Mochizuki, and T. Nakamura, "High-speed
InGaP/GaAs heterojunction bipolar transistors with buried SiO2 using WSi as the base
electrode", IEEE Electron Device Lett. 18 (1997) 154-156.
19. H. Murrmann, and D. Widmann, "Current crowding on metal contacts to planar devices",
IEEE Trans. Electron Devices 16 (1969) 1022-1024.
20. H. H. Berger, "Models for contacts to planar devices", Solid-State Electron. 15 (1972) 145-
158.
21. A. Y. C. Yu, "Electron tunneling and contact resistance of metal-silicon contact barriers,"
Solid-State Electron. 13 (1970) 239-247.
22. K. Ouchi, T. Mishima, K. Mochizuki, T. Oka, and T. Tanoue, "Fully strained heavily carbon-
doped GaAs grown by gas-source molecular beam epitaxy using carbon tetrabromide and its
application to InGaP/GaAs heterojunction bipolar transistors", Japan. J. Appl. Phys. 36 (1997)
1866-1868.
23. O. Nakajima, K. Nagata, H. Ito, T. Ishibashi, and T. Sugeta, "Emitter-base junction size effect
on current gain Hfe of AlGaAs/GaAs heterojunction bipolar transistors", Japan. J. Appl. Phys.
24 (1985) L596-L598.
24. T. Oka, K. Ouchi, K. Mochizuki, and T. Nakamura, "High-speed InGaP/GaAs HBTs with fmax
of 159 GHz", Solid-State Electron. 41 (1997) 1611-1614.
25. D. Costa, W. U. Liu, and J. S. Harris, Jr., "Direct extraction of the AlGaAs/GaAs
heterojunction bipolar transistor small-signal equivalent circuit", IEEE Trans. Electron
Devices 38 (1991) 2018-2024.
26. L. E. Larson, "Silicon bipolar transistor design and modeling for microwave integrated circuit
applications", in Proc. IEEE BCTM, 1996, pp. 142-148.
27. S. Yamahata, K. Kurishima, H. Ito, and Y. Matsuoka, "Over-220-GHz-fT-and-fmax InP/InGaAs
double-heterojunction bipolar transistors with a new hexagonal-shaped emitter", in GaAs IC
Symp. Tech. Dig., 1995, pp. 163-166.
28. M. Rodwell, Q. Lee, D. Mensa, R. Pullela, J. Guthrie, S. C. Martin, R. P. Smith, S.
Jaganathan, T. Mathew, B. Agarwal, and S. Long, "48 GHz digital ICs using transferred-
substrate HBTs", in GaAs IC Symp. Tech. Dig., 1998, pp. 113-116.
29. K. Washio, E. Ohue, K. Oda, M. Tanabe, H. Shimamoto, and T. Onai, "95 GHz fT self-aligned
selective epitaxial SiGe HBT with SMI electrodes," in ISSCC Digest of Tech. Papers, 1998,
pp. 312-313.
30. Y. Amamiya, T. Niwa, N. Nagano, M. Mamada, Y. Suzuki, and H. Simawaki, "40-GHz
frequency dividers with reduced power dissipation fabricated using high-speed small-emitter-
area AlGaAs/InGaAs HBTs," in Tech. Dig. IEEE GaAs IC Symp., 1998, pp. 121-124.
31. Y. Suzuki, H. Simawaki, Y. Amamiya, and K. Fukuchi, "An HBT preamplifier for 40-Gb/s
optical transmission systems," in Tech. Dig. IEEE GaAs IC Symp., 1996, pp. 203-206.
International Journal of High Speed Electronics and Systems, Vol. 11, No. 1 (2001) 137-158
© World Scientific Publishing Company

PROSPECTS OF InP-BASED IC TECHNOLOGIES
FOR 100-GBIT/S-CLASS LIGHTWAVE COMMUNICATIONS
SYSTEMS

TAKATOMO ENOKI, EIICHI SANO, AND TADAO ISHIBASHI


NTT Photonics Laboratories
3-1, Morinosato Wakamiya, Atsugi-shi, Kanagawa 243-0198, Japan

This paper describes the device technologies that will be instrumental in
achieving 100-Gbit/s-class ultrahigh-speed lightwave communications systems
and clarifies the device technology issues we must address before we can make
such systems a reality. At the interface between optical and electrical units,
their monolithic integration is necessary, and a uni-traveling carrier photodiode
(UTC-PD) enables us to simplify the interface circuits. In high-speed circuits,
InP-based HEMT's and HBT's will still be key devices. Further scaling down
of HEMT's and reduction of their parasitic capacitance are the major subjects for
achieving 100-Gbit/s operation. For HBT's, reduction of the product of feedback
capacitance and base resistance is necessary. Estimations of operation speed
for basic circuits in lightwave communications systems are summarized to set
targets for designing device structures.

1. Introduction

Recently, developments in lightwave communications technology have enabled the
commercial use of 10-Gbit/s systems. The recent growth of the Internet, however, has
been tremendous, and the number of Internet hosts has increased 1000-fold in the last
decade. A vast increase in transmission capacity is still needed. The experimental
development of next-generation cost-effective time division multiplexing (TDM) and
wavelength division multiplexing (WDM) optical fiber networks has begun.2,3 TDM
systems operating beyond 40 Gbit/s, combined with WDM, offer promising solutions for
over-Tbit/s systems, and the research target is now moving to 100-Gbit/s-class operation.
One of the major issues in implementing such ultrahigh speed TDM systems is the
development of ultrahigh speed electronic ICs.
In such development, it is very important that device researchers clarify and
demonstrate how fast ICs can operate because this information lets us judge the limit of
electrical TDM networks and assess the cost-effectiveness of TDM and WDM networks.
Several ICs that operate over 50-Gbit/s using InP HEMT's, InP HBT's, and SiGe bipolars
have been reported.5"7'M InP HEMT's and InP HBT's consist of InP-based ternary and
quaternary materials, such as InGaAs, InAlAs, InGaAsP. These material systems are
basically indispensable for photonic devices in communications systems because the

138 T. Enoki, E. Sano & T. Ishibashi

Fig. 1. Optical transmitter and receiver configurations: (a) built from separate circuit blocks; (b) advanced hybrid IC and monolithic receiver.
Abbreviations: PD: photodetector; Pre: preamplifier; Dist: distributor; Dec: decision IC;
LD: laser diode; DEMUX: demultiplexer; MOD: optical modulator; Diff: differentiator;
EDFA: fiber amplifier; Rec: rectifier; DRV: driver; Res: microwave resonator;
PLL: phase locked loop; Limit: limiting amplifier; MUX: multiplexer; FD: frequency divider;
AOC: automatic offset controller.

bandgap energy of In0.53Ga0.47As, which is lattice-matched to InP substrate, corresponds
to the wavelength that gives the minimum propagation loss in optical fiber and the other
materials act as barrier layers for carriers or cladding layers for light. The variety of
bandgap energy and conduction-band discontinuity (ΔEC) also gives us wide freedom in
bandgap engineering of electron devices as well as optical devices. Moreover, the
typical electron mobility of In0.53Ga0.47As is about double that of GaAs and the ohmic
contact resistance to n-InGaAs is lower than that to GaAs due to the lower Schottky
barrier height and the higher saturation doping density of donors. These features
motivate us to make high-speed electron devices and monolithically integrated OEICs
using InP-based material systems. This paper discusses the potential of InP-based
devices for 100-Gbit/s-class IC's.

2. Lightwave Communications Systems

Figure 1 depicts the basic transmitter and receiver configurations for lightwave
communications systems. In the early R&D stage, transmitters and receivers tend to be
InP-Based IC Technologies for Lightwave Communications Systems 139

Fig. 2. Trend in maximum clock frequency for lightwave communications ICs over time.

constructed with several small-scale integrated circuits (SSIs), each of which corresponds
to a block in Fig. 1(a). Figure 2 shows the trend in maximum clock frequency for the
D-type flip-flop (D-F/F), which is the most important circuit element for lightwave
communications ICs. The data indicate that the maximum clock frequency has
increased by a factor of five over the last decade and that it will reach 100 Gbit/s by
2004. To keep up with this increase, however, we must solve some problems related to
device, circuit, and packaging technologies.8
In device technology, it is still necessary to improve the high frequency performance
and current drivability of transistors in circuits in order to increase maximum operation
speed of flip-flop circuits, although InP-based HEMT and HBT have achieved record
performance in discrete semiconductor devices. Some approaches to doing so will be
discussed in the following section. In circuit technology, highly integrated circuits, such
as monolithic clock & data recovery circuits (CDR), should be targeted for commercial
products from the point of view of power dissipation, reliability, size, and cost. Figure 2
also shows the maximum clock frequency for monolithic CDR's. It has rapidly
increased and is now 20 Gbit/s.8
Constructing 100-Gbit/s transmitters and receivers with an SSI hybrid-integration
method is not realistic because commercially available coaxial connectors are restricted
to 65 GHz. 100-Gbit/s transmitters and receivers should integrate circuits monolithically
as much as possible. Figure 1(b) shows 100-Gbit/s-class transmitter and
receiver configurations. As will be described later, we have a solution for the receiver:
construct it with a uni-traveling carrier photodiode (UTC-PD) and HEMT's or HBT's. Since it is very
Fig. 3. Dependence of current gain cutoff frequency on gate length for FET's.

difficult to monolithically integrate an optical modulator with HEMTs or HBTs, advanced
optoelectronic hybrid IC technologies are necessary for 100-Gbit/s transmitters.

3. Status of High Electron Mobility Transistors (HEMTs)

Figure 3 summarizes current gain cutoff frequencies (fT's) for various kinds of FET's.
The gate length in this figure is the physical length of the gate footprint for GaAs and
InP-based FET's and the effective gate length for Si-MOSFET's,9 which was estimated
by subtracting the lateral diffusion length of the source/drain n+-region under the gate.
Among them, high electron mobility transistors (HEMT's) consisting of an
In0.52Al0.48As/In0.53Ga0.47As heterostructure, in other words, InP-based HEMT's, have
achieved record fT's for a wide range of Lg's. Especially below the 0.1-µm region, fT's
for InP-based HEMT's are remarkably higher than those for GaAs-based devices. This
is due to the higher saturation velocity of electrons, the lower parasitic resistance, the
better carrier transport properties, and the improved carrier confinement.
In applications for high-speed logic circuits, fT should be kept as high as possible
from off to on states. It is useful to use a two-region model for HEMT's in order to
clarify the relationship between the device parameters and the fT. Here we define µ as a
low-field electron mobility and vs as a saturation velocity and assume that the electron
velocity is expressed by

Fig. 4. Dependence of electron drift velocity on electric field strength.

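$$
v_d =
\begin{cases}
\dfrac{\mu F}{1 + \dfrac{\mu - \mu_s}{v_s}\,F} & (F \le F_s) \\[2ex]
v_s & (F > F_s)
\end{cases}
\qquad (1)
$$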

where µs is an electron mobility for high electric field and defines the electric field
strength (Fs) where velocity saturation of electrons takes place. Figure 4 depicts the
dependence of vd on F with typical parameters in comparison with experimental data
measured by time-of-flight methods.10 Assuming the above electron transport, we can
derive the drain current (Ids), transconductance (gm), and the delay time (τd) when the
velocity saturation takes place at the drain end of the gate.
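
A minimal sketch of a two-region velocity-field model of the kind described above. It assumes the low-field branch vd = µF / (1 + (µ - µs)F/vs) joined to vd = vs at Fs = vs/µs, which is one reading of the model; the mobility values are illustrative assumptions, and only vs = 2.7x10^7 cm/s is a value used elsewhere in this paper.

```python
def drift_velocity(field, mu, mu_s, v_s):
    """Two-region drift velocity in cm/s.

    field : electric field strength (V/cm)
    mu    : low-field mobility (cm^2/Vs)
    mu_s  : high-field mobility that sets F_s = v_s / mu_s (cm^2/Vs)
    v_s   : saturation velocity (cm/s)
    """
    f_s = v_s / mu_s                      # field at which saturation sets in
    if field > f_s:
        return v_s                        # saturated region
    return mu * field / (1.0 + (mu - mu_s) * field / v_s)

MU = 1.0e4      # assumed low-field mobility, cm^2/Vs
MU_S = 2.5e3    # assumed high-field mobility (mu / mu_s = 4), cm^2/Vs
V_S = 2.7e7     # saturation velocity, cm/s

for f in (1e3, 5e3, 1.08e4, 2e4):
    print(f"F = {f:8.0f} V/cm -> vd = {drift_velocity(f, MU, MU_S, V_S):.3e} cm/s")
```

With these numbers the two branches meet at Fs = 10.8 kV/cm, so the velocity is continuous, and setting mu_s equal to mu recovers the constant-mobility case vd = µF up to saturation.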

v
,K-v*\M
M g
' * - , M- 1 + - (2)
d +M {d + M)lds
+^-{l-M)

(3)
dl•ds vs 3M(l + M)'
M
Offl
d+Ad
(4)
,, fi, ( l - M ) 2 '

Fig. 5. Dependence of current-gain cutoff frequency on drain current
(calculated for µ = 3000, 6000, and 10000 cm²/Vs; measured for Lg = 0.03, 0.05, 0.1, 0.2, and 0.5 µm).

where ε is the permittivity of the barrier layer, Δd the thickness of two-dimensional
electron gas, Vgs the gate-to-source voltage, Vth the threshold voltage. As can be seen
from eq. (2), when a constant mobility (µs = µ) is assumed, M means the ratio of the
actual current to the imaginary current, which is calculated by assuming that all electrons
at the source travel under the gate at the saturation velocity.
From an equivalent circuit model of an FET,

$$
f_T = \frac{1}{2\pi\left[\dfrac{C_{gs} + C_{gd}}{g_m}\,\{1 + g_d(R_s + R_d)\} + C_{gd}(R_s + R_d)\right]}
\qquad (6)
$$

where Cgs and Cgd are the gate-source and gate-drain capacitance, and Rs and Rd the
source and drain resistances. Here, we consider an ideal case where Cgd results from a
fringe capacitance (πε/2) of the gate depletion layer. In this case,

ji(d+Adjl+M2)
fr = 2JTJJT, .{l + grf(K,+K(,)} + Y f a + * J (7)
2\>M

Figure 5 shows the dependence of fT on Ids calculated by assuming vs of 2.7x10^7 cm/s,
gm/gd = 20, Rs+Rd = 0.4 Ωmm, and µ/µs = 4. Although the maximum fT does not depend on µ,
fT at a low current density significantly depends on it. Higher µ results in higher fT in
the low drain current density region. In the same figure, the measured dependences of
fT on Ids for 0.03-, 0.05-, 0.1-, 0.2-, and 0.5-µm-gate devices are also plotted.
Although the measured maximum fT's are consistent with the calculated one, the
measured dependence for a shorter-gate-length device is much steeper than the
calculation. The models used for electron transport and charge control are very simple,
and there is considerable room for improvement. The good agreement for the devices with a
gate length longer than 0.2 µm, however, strongly suggests that the so-called
short-channel effects, such as threshold voltage shift (ΔVth), gm compression, and drain
Fig. 6. Dependence of threshold voltage shift on gate length for various barrier thickness

conductance increase, result in the steeper dependence of fT. Thus, for high-speed logic
operation, shortening Lg, increasing µ, and reducing the short-channel effects are
important.
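
To make the role of the parasitic resistances concrete, the sketch below evaluates the common equivalent-circuit expression fT = 1 / (2π[((Cgs + Cgd)/gm)(1 + gd(Rs + Rd)) + Cgd(Rs + Rd)]) with quantities normalized to gate width. The values gm/gd = 20 and Rs + Rd = 0.4 Ωmm follow the text; gm and the two capacitances are hypothetical numbers chosen only for illustration.

```python
import math

def f_t_hz(gm, gd, c_gs, c_gd, r_s, r_d):
    """Current-gain cutoff frequency (Hz) from a small-signal FET model.

    gm, gd     : transconductance and drain conductance (S/mm)
    c_gs, c_gd : gate-source and gate-drain capacitance (F/mm)
    r_s, r_d   : source and drain resistance (ohm*mm)
    """
    r_total = r_s + r_d
    tau = (c_gs + c_gd) / gm * (1.0 + gd * r_total) + c_gd * r_total
    return 1.0 / (2.0 * math.pi * tau)

GM = 1.5               # assumed transconductance, S/mm
GD = GM / 20           # gm/gd = 20, as assumed in the text
C_GS = 0.6e-12         # assumed gate-source capacitance, F/mm
C_GD = 0.1e-12         # assumed gate-drain capacitance, F/mm

with_parasitics = f_t_hz(GM, GD, C_GS, C_GD, 0.2, 0.2)   # Rs + Rd = 0.4 ohm*mm
intrinsic = f_t_hz(GM, GD, C_GS, C_GD, 0.0, 0.0)
print(f"fT with Rs+Rd = 0.4 ohm*mm: {with_parasitics / 1e9:.0f} GHz")
print(f"fT without parasitic resistance: {intrinsic / 1e9:.0f} GHz")
```

For these assumed values the parasitic resistances lower fT by roughly ten percent relative to gm / (2π(Cgs + Cgd)).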
When shortening Lg, scaling down the channel, which means reducing Lg and d at
the same time, is the most important rule for improving the performance and suppressing
the short-channel effects. Figure 6 shows the dependence of ΔVth on Lg for InP-based
HEMT's.11-13 The measured data are for devices with a d of 12 nm, and the curves were
obtained from two-dimensional device simulations. Below 0.1 µm, ΔVth strongly
depends on Lg, and this suggests difficulty in controlling Vth in this region. To ease the
Vth control, a thinner barrier should be introduced. Although the gate length has been
shortened to 0.03 µm and fT exceeds 300 GHz as shown in Fig. 3, not enough attention
has been paid to the threshold voltage control in making digital ICs. A very thin barrier
of 8 nm will reduce ΔVth to -0.55 V for Lg of 0.03 µm.

4. IC Performance and Device Parameters

4.1 Propagation gate delay of HEMT-IC


In designing high-speed devices, it is very important to clarify what parameters affect the
circuit speed. To do this, analytical expressions that predict the propagation delays in
the basic inverters are very helpful. Source-coupled FET logic (SCFL) and
emitter-coupled logic (ECL) designs are widely used in high-speed digital ICs because of
their advantageous series-gate configuration. In this section, we discuss to what extent
InP-based devices can meet the requirements for 100-Gbit/s ICs.
The propagation gate delay for an SCFL inverter with a gain of Ga is given by14

C^ + RL(ced+Cds)+RLRg(gm)Csd+R[c ) +
' \ 1 I D T, 8 *> 8S 8" '
is-) rt
1 + RL8*

l-g„ ML+R,)
Csd{RL+Rg)+C RL+RS +
gm + 28d
C, + Cds + 2xgd
om,max od

C,=Cgs+{l + Ga)Ced, G. 78, (8)


! + *,£„ 8„

where Cgs, Cgd, and Cds are source-gate, gate-drain, and drain-source capacitances, and RL
is load resistance, Rg gate resistance, <gm> the average extrinsic transconductance in the
logic swing, gm,max the maximum extrinsic transconductance, gd drain conductance, and τ
the phase delay of transconductance.
Fig. 7. Calculated τpd contours in a graph of gm and fT.
(a) Cp = πε/2 + 80 fF/mm, Cf = 2.5 fF; (b) reduced parasitic capacitances (Cp = πε/2 + 40 fF/mm, Cf = 1.25 fF).

By combining eq. (6) and eq. (8), τpd can be calculated for fT and gm,max as
independent parameters, assuming other parasitic parameters. Increasing fT and gm,max
corresponds to shortening gate length and making the channel shallower, respectively.
Table 1 summarizes the parameters and Fig. 7 shows calculated τpd contours in a plot of
fT vs. gm. In the table, Cp is the parasitic capacitance for Cgs and Cgd and is proportional
to the gate width. For example, Cp is an overlap capacitance of a T-shaped gate. Cf is
a fringe capacitance of Cgs and Cgd and is assumed to be a constant. In Fig. 7(a), Cp and
Cf are the values for the state-of-the-art HEMT's developed at NTT.15 For 100-Gbit/s
operation, τpd should be lower than 3 ps/gate, and the contours in Fig. 7(a) mean that it is
necessary to develop HEMT's with a gm of over 1.7 S/mm and an fT of over 340 GHz in
circuits, which are close to the record performance for discrete devices. Although a
dynamic circuit configuration can relax the requirements to some extent, novel
technologies, which enable us to monolithically integrate such high-performance
HEMT's, should be developed. On the other hand, the effects of Cp and Cf dominate the
circuit performance. In Fig. 7(b), Cp and Cf are assumed to be half the values for Fig.
7(a). The gm and fT requirements to achieve 3 ps/gate become more realistic. In this
case, HEMT's with a gm of over 1.4 S/mm and an fT of over 280 GHz will be very
promising devices for achieving 100-Gbit/s IC's. At the same time, however, Fig. 7 shows
a strong dependence of τpd on Cp and Cf. Sophisticated device processes that reduce
parasitic capacitance and accurate characterization of FET's are required to design
100-GHz-class logic IC's.

Table 1. Parameters for calculation of τpd in Fig. 7.

Parameter   Value        Parameter   Value
Ga          2            gd          gm,max/20
Wg          20 µm        Cgd         (πε/2 + Cp)Wg + Cf
Rg          0.67 Ω       Cds         14 fF/mm
Rs          0.15 Ωmm     Cp          80 fF/mm or 40 fF/mm
Rd          0.23 Ωmm     Cf          2.5 fF or 1.2 fF
V           0.8

In addition to the gate delay of circuits, we should take interconnection delay into
account. The equivalent circuit model for an interconnection line depends on its length and
the wavelength of the signal. In order to estimate the interconnection delay, we directly
measured gate delays of SCFL inverters with a line intentionally inserted in the signal path.
Figure 8(a) shows the measured dependence of the gate delay on the power dissipation.
"1L", "2L", "TFMS", and "CPW" stand for "1st-level line", "2nd-level line", "thin-film
microstrip line", and "coplanar waveguide", respectively, and their structures are depicted
in Fig. 8(b). The line length is 100 µm and the gate delays of the inverters without the
intentionally inserted line are also plotted as reference. BCB (benzocyclobutene) is
used as a low permittivity insulating layer between 1st and 2nd-level lines. Figure 8(a)
Fig. 8. Gate delay times for SCFL inverters with intentionally inserted lines.

clearly shows that the interconnection delay depends on the structure of the line. The
2nd-level line has the smallest propagation delay of about 2 ps/mm and is suitable for
high-speed interconnection. The TFMS line has a relatively small propagation delay
of about 4 ps/mm and is suitable for designing the characteristic impedance of the line.
Based on these data and Fig. 7(b), the required device performance for a total gate delay of
3 ps/gate is a gm of over 1.5 S/mm and an fT of over 300 GHz.
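
The interconnection contribution can be put in proportion with a short calculation. The sketch below uses the quoted propagation delays (about 2 ps/mm for the 2nd-level line and 4 ps/mm for the TFMS line), the 100-µm test-line length, and the 3 ps/gate target; treating the line delay as simply additive to the gate delay is an assumption made for illustration.

```python
def line_delay_ps(length_um, delay_ps_per_mm):
    """Propagation delay in ps of a line of given length (µm)."""
    return length_um / 1000.0 * delay_ps_per_mm

GATE_BUDGET_PS = 3.0   # target total delay per gate for 100-Gbit/s operation

lines = {"2nd-level line": 2.0, "TFMS line": 4.0}   # ps/mm, from the text
for name, ps_per_mm in lines.items():
    delay = line_delay_ps(100.0, ps_per_mm)
    share = 100.0 * delay / GATE_BUDGET_PS
    print(f"{name}: {delay:.1f} ps per 100 µm ({share:.0f}% of the 3-ps budget)")
```

Even a 100-µm line therefore consumes several percent of the per-gate budget, which is consistent with the tighter gm and fT targets stated above.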

4.2 Bandwidth of basic HBT-ICs and HBT parameters


In this subsection, we will clarify the bipolar transistor performance needed in order to
achieve 100-Gbit/s operation in the transmitter and receiver circuits shown in Fig. 1.
The phase locked loop (PLL) is not critical, because a 100-GHz voltage-controlled
oscillator (VCO), a key component in the PLL, has been obtained using InP-based
HBTs.16 A 60-Gbit/s operation has been realized for a 2:1 MUX using SiGe HBTs.17 A
slight improvement in the device performance will result in 100-Gbit/s operation. We
therefore investigate the maximum operating speed of the D-F/Fs. The propagation delay τpd
for an emitter-coupled logic (ECL) inverter is given by

^=^f\1 +
~yRsCBC(2 + ^yRL{CBC+CBCj2 + ^ (9)

where τf is the forward transit time, RB the base resistance, RL the load resistance, CBC the
internal base-to-collector capacitance, and CBCex the external base-to-collector
capacitance.18 rD is the large-signal version of the differential emitter resistance, given by
0.15/IC, where IC is the collector current.19 Equation (9) gives a value 18% smaller than
that obtained by SPICE simulation for our InP-based HBT. The propagation delay
depends on the collector current and has a minimum value given by
Fig. 9. Maximum operating speed contours for a MS D-F/F.
Device performances denoted by a-k are reported in Ref. 20-30.

/2+015
T
pdmin ~Xf +
^B^-BC 2 + - F '- + 2lzfRB{CE + CBCJ V,. (10)
0.15 0.15 V,

where VL is the logic swing. A circuit simulation revealed that the maximum operating
speed of a master-slave (MS) D-F/F is given by ll(33rpdmi„). Figure 9 shows the
maximum operating speed contours for MS D-F/Fs along with HBT performances
reported in Refs. 20-30. In Fig. 9, CBC in the transferred-substrate HBT denoted by e is
replaced by (CBC + CBCex)/2, which gives a good approximation for HBTs with CBCex much
larger than CBC. It is evident from Fig. 9 that the InP material system is most promising.
It is anticipated from Eq. (9) that the device performance required for 100-Gbit/s
operation in a modulator driver is more severe than that for the D-F/F shown in Fig. 9 when the
ordinary current switch configuration is used. Figure 10 shows a possible driver circuit
using cascode and emitter-feedback configurations. Straightforward calculation for the
equivalent circuit of the driver gives the time constant

1+- + RBCBC —— + RLCBC. (11)


r +R
rD+RE D E
InP-Based IC Technologies for Lightwave Communications Systems 149

[Figure 10 schematic labels: 50 Ω; input from D-F/F.]

Fig. 10. Circuit configuration for optical modulator driver.

[Figure 11 plot: contours labeled by RB for VDR = 2 V and VDR = 3 V, with RL = 50 Ω, plotted against τf (ps).]

Fig. 11. 70-GHz-bandwidth contours for optical modulator drivers.

Here we neglected CBCex for simplicity. We assume that the 3-dB bandwidth
f3dB = 1/(2πτ) of the driver is 70 % of the rate frequency. Figure 11 shows 70-GHz
bandwidth contours in a graph of τf and CBC for driving voltages VDR of 2 and 3 V.
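
The bandwidth assumption above translates into a budget for the driver time constant. The following is a minimal sketch of that arithmetic; it assumes the "rate frequency" for 100 Gbit/s is numerically 100 GHz, and the function and variable names are illustrative rather than taken from the text.

```python
import math

def required_time_constant(bit_rate_hz, bandwidth_fraction):
    """Largest time constant (s) that still meets f_3dB = fraction * rate,
    using the single-time-constant relation f_3dB = 1 / (2 * pi * tau)."""
    f_3db = bandwidth_fraction * bit_rate_hz
    return 1.0 / (2.0 * math.pi * f_3db)

# 100-Gbit/s driver with a 3-dB bandwidth of 70 % of the rate frequency (70 GHz)
tau_driver = required_time_constant(100e9, 0.70)
print(f"driver time constant budget: {tau_driver * 1e12:.2f} ps")
```

Any combination of τf, RB, RL, and CBC whose driver time constant exceeds this budget falls outside the 70-GHz contours of Fig. 11.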
Figure 12 shows a typical preamplifier and limiting amplifier configuration.8 An
automatic offset controller (AOC) ensures correct differential operation by producing
half of the peak photocurrent. A limiting amplifier is required in order to absorb the
variation in the optical power and produce a constant output voltage swing. The same
procedure as used for the inverter, but neglecting CBCex, gives the minimum time
constants for the preamplifier and limiting amplifier as
[Figure 12 schematic labels: UTC-PD input, preamplifier, limiting amplifier, AOC, VEE, output to D-F/F and PLL.]
Fig. 12. Circuit configurations of preamplifier and limiting amplifier.

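\[ \tau_{\min} = \tau_f + (2R_B + R_F)\,C_{BC} + 2\sqrt{\tau_f (R_B + R_F)\,C_{BC}} \tag{13} \]

and

\[ \tau_{\min} = \tau_f + 3R_B C_{BC} + 2\sqrt{2A\,\tau_f R_B C_{BC}} \tag{14} \]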

respectively. Here A is the voltage gain of the limiting amplifier. Assuming a
minimum detectable optical power of 0.3 mW, a UTC-PD responsivity of 0.5 A/W, RF of
100 Ω, and A of 10 gives an output voltage swing of 0.3 VPP with a dynamic range of 20
dB. We also assume the 3-dB bandwidth for the combination of the preamplifier and
limiting amplifier is 60 % of the rate frequency, which means that the bandwidth for each
amplifier is 93 GHz for 100 Gbit/s. Figure 13 shows 93-GHz bandwidth contours in a
graph of τf and CBC for the preamplifier and limiting amplifier. Finally, a 100-Gbit/s
DEMUX can be obtained using HBTs that produce a 100-Gbit/s D-F/F.
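
The step from a 60-GHz combined bandwidth to 93 GHz per amplifier is not derived in the text. One way to reproduce the number, offered here only as a sketch, is to assume that the preamplifier and limiting amplifier behave as two identical, non-interacting single-pole stages; the bandwidth of n such stages shrinks by the factor sqrt(2^(1/n) - 1). The function name is illustrative.

```python
import math

def per_stage_bandwidth(total_bw_hz, n_stages):
    """Bandwidth each stage needs so that n identical single-pole stages
    in cascade have the given overall 3-dB bandwidth (an assumption)."""
    shrink = math.sqrt(2.0 ** (1.0 / n_stages) - 1.0)
    return total_bw_hz / shrink

# Combined bandwidth of 60 % of the rate frequency at 100 Gbit/s, two stages
each_stage = per_stage_bandwidth(60e9, 2)
print(f"required bandwidth per amplifier: {each_stage / 1e9:.1f} GHz")
```

Under this assumption the result rounds to the 93 GHz quoted above.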
The considerations described above indicate that HBTs with τf of 0.5 ps, RB of 20 Ω,
and CBC of 1.6 fF can be used to produce 100-Gbit/s-class transmitter and receiver ICs.
These values roughly correspond to an fT of 200 GHz and an fmax of 500 GHz and can be
achieved by introducing a graded base and reducing the lateral dimensions in the HBT.
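
The correspondence between these device values and the quoted cutoff frequencies can be checked with the widely used approximation fmax ≈ sqrt(fT / (8π RB CBC)). The text does not say which relation the authors used, so the sketch below is an assumption, and its names are illustrative.

```python
import math

def f_max(f_t_hz, r_b_ohm, c_bc_farad):
    """Common approximation f_max = sqrt(f_T / (8 * pi * R_B * C_BC))."""
    return math.sqrt(f_t_hz / (8.0 * math.pi * r_b_ohm * c_bc_farad))

f_t = 200e9       # Hz, from the text
r_b = 20.0        # ohm, from the text
c_bc = 1.6e-15    # F, from the text
tau_f = 0.5e-12   # s, from the text

fmax_estimate = f_max(f_t, r_b, c_bc)
# Upper bound on f_T if the forward transit time were the only delay
ft_transit_limit = 1.0 / (2.0 * math.pi * tau_f)

print(f"f_max estimate: {fmax_estimate / 1e9:.0f} GHz")
print(f"transit-time-limited f_T: {ft_transit_limit / 1e9:.0f} GHz")
```

The transit-limited bound lies above 200 GHz, which is consistent with the quoted fT once charging delays are included.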

Fig. 13. 93-GHz-bandwidth contours for preamplifiers and limiting amplifiers.

5. Photonic Interface for High-Speed Electronics Using Uni-Traveling-Carrier Photodiodes

The difficulty of receiving ultrahigh bit-rate optical signals is related to electrical
post-amplification. The gain-bandwidth characteristics of a broadband amplifier, for
example, are limited, depending on the high-frequency performance of transistors.
Multiple-stage connections of amplifier circuits to yield enough output, which is required
for driving other electronics, can also degrade the bandwidth. In addition, there are
serious problems associated with the device packaging and interconnections between
amplifier chips or modules. In contrast, much higher gains and bandwidths are
available in optical amplification. When the output level of the photodetector is high
enough, a simple combination of an optical amplifier and such a photodetector can
replace the conventional photoreceiver function. The bandwidth of such a
photoreceiver (photonic driver) is only limited by the photodetector performance because
of the THz bandwidth of optical amplifiers. The uni-traveling-carrier photodiode
(UTC-PD) is a long-wavelength photodiode that achieves high output and fast response
simultaneously and is suitable for the photonic driver applications in a bit-rate range
higher than around 40 Gbit/s.
[Figure 14 diagram labels: diffusion block layer, p-contact, light absorption layer, carrier collection layer (wide-gap), V.B.]

Fig. 14. Schematic band diagram for a UTC-PD.

The UTC-PD structure (illustrated in Fig. 14) basically consists of a p-type (neutral)
photoabsorption layer, a widegap (depleted) carrier collection layer and a diffusion block
layer.31 Electrons photogenerated in the absorption layer diffuse or drift into the
collection layer. Here, the diffusion block layer provides electrons with unidirectional
motion. Though both electron and hole currents exist in the absorption layer, only the
electrons are active carriers. This is because the holes are the majority carriers and their
current responds to electron current by their collective motion within the dielectric
relaxation time (on the order of 100 fs). The response speed of a UTC-PD is a function
of electron traveling time over the structure just like in the case of a heterostructure
bipolar transistor. In a representative design with similar absorption and collection layer
thickness values, the absorption layer traveling time (τA) dominates the frequency
response (f3dB ≈ 1/(2πτA)). When electron transport is diffusive, τA is
approximated by

\[ \tau_A = \frac{W_A^2}{3D_e} + \frac{W_A}{v_{th}} \tag{15} \]

where WA is the absorption layer thickness, De is the diffusivity of minority electrons,
and vth the thermionic emission velocity of an electron.32,33
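
To illustrate how Eq. (15) ties bandwidth to the absorption layer thickness, the sketch below evaluates τA and f3dB ≈ 1/(2πτA) for a few thicknesses. The values of De and vth are placeholders chosen only for illustration; they are not given in the text, and the function names are hypothetical.

```python
import math

def absorption_transit_time(w_a_cm, d_e_cm2_s, v_th_cm_s):
    """tau_A = W_A^2 / (3 * D_e) + W_A / v_th, following Eq. (15)."""
    return w_a_cm ** 2 / (3.0 * d_e_cm2_s) + w_a_cm / v_th_cm_s

def f_3db(tau_a_s):
    """Transit-limited bandwidth, f_3dB ~ 1 / (2 * pi * tau_A)."""
    return 1.0 / (2.0 * math.pi * tau_a_s)

D_E = 100.0    # cm^2/s, ASSUMED minority-electron diffusivity (illustrative)
V_TH = 2.5e7   # cm/s, ASSUMED thermionic emission velocity (illustrative)

for w_a_nm in (200.0, 100.0, 50.0):
    tau_a = absorption_transit_time(w_a_nm * 1e-7, D_E, V_TH)
    print(f"W_A = {w_a_nm:5.0f} nm: tau_A = {tau_a * 1e12:.2f} ps, "
          f"f_3dB ~ {f_3db(tau_a) / 1e9:.0f} GHz")
```

The diffusive term scales with the square of WA, so thinning the absorption layer pays off strongly until the WA/vth term becomes comparable.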
The UTC-PD has a significant advantage over pin-PDs in terms of the output
saturation current (or linearity). Generally, the output saturation is initiated by the field
modulation in the depletion layer due to the increase of carrier space charges with
[Figure 15 plot: output versus time (0-40 ps) for RL = 25 Ω and S = 13 μm². Legend entries: Pin = 1.0 pJ/pulse (Vb = -1.25 V) and Pin = 0.13 pJ/pulse (Vb = -0.75 V); f3dB = 174 GHz with FWHM = 1.80 ps, and f3dB = 220 GHz with FWHM = 1.44 ps.]
Fig. 15. Pulse photoresponse waveform for UTC-PD.

increasing current density. High carrier velocity is therefore very important for reducing this
space charge effect. The uni-traveling-carrier nature of the device allows us to make the best
use of the electron velocity overshoot (v_electron ≈ 4 × 10^7 cm/s in InP). Namely, since the
UTC-PD operation does not use holes, the onset of the space charge effect is much
delayed compared to the case of a pin-PD, where holes are the dominant space charges (v_hole ≈
5 × 10^6 cm/s in InGaAs). An output peak current of 80 mApp has been achieved even
while maintaining a high bandwidth f3dB of 115 GHz.34 This peak current is about one
order of magnitude higher than that for a pin-PD with a similar bandwidth. The
UTC-PD design also offers a high f3dB by reducing WA without an increase of junction
capacitance. Figure 15 shows pulse photoresponse curves for an InP/InGaAs UTC-PD
(at 1.55-μm wavelength) measured by the electro-optic sampling technique.35 An f3dB of
235 GHz was obtained with an output current of 8 mApp for an absorption layer width of
86 nm and a collection layer width of 230 nm. With some penalty in f3dB, an output
current as high as 30 mApp could be achieved for the same device.
Monolithic integration of UTC-PDs into high-speed HEMT circuits and RTD
(resonant tunneling diode) circuits36 has been done. An InP-based HEMT D-FF circuit37
with a photonic interface has also been fabricated as shown in Fig. 16.38 Good
bit-error-rate characteristics with an input sensitivity of -27.5 dBm were obtained.39
Eliminating the electronic amplifiers at the input made it possible to
reduce chip size and decrease the power consumption of the OEIC.

Fig. 16. Monolithic integration of a UTC-PD and a decision circuit using 0.1-μm-gate InP-based HEMTs.
(a) Microphotograph of an optical-input D-type flip-flop circuit (2 × 2 mm).
(b) Eye diagram of optical input (upper) and electrical output (lower) signals of the optical-input D-type flip-flop
circuit.

6. Summary

The prospects of 100-Gbit/s-class ICs for future lightwave communications systems have
been discussed. The main subjects for achieving such high-speed lightwave
communications systems are high-speed operation of digital ICs and the interface
between optical devices and electrical ICs. Device parameters for achieving
100-Gbit/s-class ICs have been clarified based on state-of-the-art technologies, such as
InP-based HEMTs and HBTs.
By further scaling down HEMTs and reducing their parasitic capacitance, we will
be able to achieve a propagation gate delay below 3 ps/gate, which is necessary for
100-Gbit/s ICs. HEMTs with a gm of over 1.5 S/mm and an fT of over 300 GHz are
promising, and such performance has already been reported for discrete devices with a gate
length below 0.05 μm. Integration technologies for such high-performance,
short-channel devices should be developed.
HBTs with τf of 0.5 ps, RB of 20 Ω, and CBC of 1.6 fF can be used to produce
100-Gbit/s-class transmitter and receiver ICs. These values roughly correspond to an fT
of 200 GHz and an fmax of 500 GHz and can be achieved by introducing a graded base and
reducing the lateral dimensions in the HBT.
A photonic interface for high-speed electronics using the UTC-PD is a promising way
to relax the difficulties in packaging technologies for the 100-Gbit/s region. Monolithic
integration of UTC-PDs and high-speed HEMTs has been done, and over-40-Gbit/s
operation of an optical-input decision circuit has been confirmed using state-of-the-art
technology.
The concrete issues that must be addressed in order to meet the requirements
discussed here are accurate device modeling, reduction of parasitic components, and
monolithic integration of highly functional circuits. The InP-based technologies
described in this paper should provide realistic solutions for achieving 100-Gbit/s-class
lightwave communications systems.

Acknowledgements
The authors acknowledge Y. Ishii, H. Toba, and K. Sato for their continuous support and
encouragement throughout this work. They also thank Y. Miyamoto, H. Ito, Y. Yamane,
T. Kobayashi, K. Kurishima, K. Murata, H. Yokoyama, and H. Kitabayashi for their
fruitful discussions.

References
1. "Internet Domain Survey", available at http://www.nw.com/zone/host-count-history/,
Jan. 1999.
2. Y. Yano, T. Ono, K. Fukuchi, T. Itoh, H. Yamazaki, M. Yamaguchi, and K. Emura,
"2.6 terabit/s WDM transmission experiment using optical duobinary coding", Proc.
ECOC'96, 1996, vol. 5, pp. 3-6.
3. S. Kawanishi, H. Tanaka, K. Uchiyama, I. Shake, and K. Mori, "3 Tbit/s (160 Gbit/s x
19 ch) OTDM/WDM transmission experiment", OFC Tech. Digest, 1999, PD.
1/1-1/3.
4. T. N. Nielsen, A. J. Stentz, K. Rottwitt, D. S. Vengsarkar, L. Hsu, P. B. Hansen, J. H.
Park, K. S. Feder, T. A. Strasser, S. Cabot, S. Stulz, C. K. Kan, A. F. Judy, J. Sulhoff,
S. Y. Park, and L. E. Nelson, "3.28 Tb/s (82 x 40 Gb/s) transmission over 3 x 100 km
nonzero-dispersion fiber using dual C- and L-band hybrid Raman/Erbium-doped
inline amplifiers", OFC Tech. Digest, Postdeadline Paper, 2000, PD23/1-23-3.
5. T. Otsuji, K. Murata, T. Enoki, and Y. Umeda, "An 80-Gbit/s multiplexer IC using
InAlAs/InGaAs/InP HEMT's", IEEE J. of Solid-State Circ. Vol. 33 No. 9 (1998) pp.
1321-1327.
6. H. Nakajima, T. Ishibashi, E. Sano, M. Ida, S. Yamahata, and Y. Ishii, "InP-based
high-speed electronics", IEDM Tech. Digest, 1999, pp. 771-774.
7. A. Gutierrez-Aitken, E. Kaneshiro, B. Tang, J. Notthoff, P. Chin, D. Streit, and A.
Oki, "69 GHz frequency divider with a cantilevered base InP DHBT", IEDM Tech.
Digest, 1999, pp. 779-782.
8. E. Sano, K. Hagimoto, and Y. Ishii, "Present status and future prospects of
high-speed lightwave ICs based on InP," Intl. J. High Speed Electronics and Systems
vol. 9 (1998) pp. 567-593.
9. H. S. Momose, E. Morifuji, T. Yoshitomi, T. Ohguro, M. Saito, T. Morimoto, Y.
Katsumata, and H. Iwai, "High-frequency AC characteristics of 1.5 nm gate oxide
MOSFETs", IEDM Tech. Digest, 1996, pp. 105-108.
10. N. Shigekawa, T. Furuta, and K. Arai, "Time-of-flight measurements of electron
velocity in an In0.52Al0.48As/In0.53Ga0.47As double heterostructure", Appl. Phys. Lett.
Vol. 57 No. 1 (1990) pp. 67-69.
11. T. Enoki, M. Tomizawa, Y. Umeda, and Y. Ishii, "0.05-μm-Gate InAlAs/InGaAs high
electron mobility transistor and reduction of its short-channel effects", Jpn. J. Appl.
Phys. Vol. 33 (1994) pp. 798-803.
12. T. Suemitsu, T. Ishii, H. Yokoyama, Y. Umeda, T. Enoki, Y. Ishii, and T. Tamamura,
"30-nm-gate InAlAs/InGaAs HEMTs lattice-matched to InP", IEDM Tech. Digest,
1998, pp. 223-226.
13. T. Enoki, H. Ito, K. Ikuta, Y. Umeda, and Y. Ishii, "0.1-μm InAlAs/InGaAs HEMTs
with an InP-recess-etch stopper grown by MOCVD", Microwave and Optical Tech.
Lett. Vol. 11 No. 3 (1996) pp. 135-139.
14. T. Enoki, H. Yokoyama, Y. Umeda, and T. Otsuji, "Ultrahigh-speed integrated
circuits using InP-based HEMTs", Jpn. J. Appl. Phys. Vol. 37 (1998) pp. 1359-1364.
15. E. Sano and K. Murata, "An analytical delay expression for source-coupled FET
logic (SCFL) inverters", IEEE Trans. on Electron Devices Vol. 42 No. 4 (1995) pp.
785-786.
16. Y. Baeyens, R. Pullella, C. Dorschky, J.-P. Mattia, R. Kopf, H.-S. Tsai, G. Georgiou,
R. Hamm, Y. C. Wang, Q. Lee, and Y.-K. Chen, "Compact differential InP-based
HBT VCOs with a wide tuning range at W-band," IEEE MTT-S IMS Dig., Boston,
MA, June 2000, pp. 349-352.
17. H. -M. Rein, "Si and SiGe bipolar ICs for 10 to 40 Gb/s optical-fiber TDM links,"
Intl. J. High Speed Electronics and Systems Vol. 9 (1998) pp. 347-383.
18. E. Sano, Y. Matsuoka and T. Ishibashi, "Device figure-of-merits for high-speed
digital ICs and baseband amplifiers," IEICE Trans. Electron. Vol. E78-C No. 9
(1995) pp. 1182-1188.
19. P. K. Tien, "Propagation delay in high speed silicon bipolar and GaAs HBT digital
circuits," Intl. J. High Speed Electronics and Systems Vol. 1 (1990) pp. 101-124.
20. S. Yamahata, H. Nakajima, M. Ida, H. Niiyama, N. Watanabe, E. Sano, and Y. Ishii,
"Reliable carbon-doped InP/InGaAs HBTs technology for low-power 40-GHz static
frequency divider," Intl. Conf. Solid State Devices and Materials, Tokyo, Japan, Sept.
1999, pp. 570-571.
21. S. Yamahata, K. Kurishima, H. Ito, and Y. Matsuoka, "Over-220-GHz-fT-and-fmax
InP/InGaAs double-heterojunction bipolar transistors with a new hexagonal-shaped
emitter," IEEE GaAs IC Symposium, San Diego, CA, Oct. 1995, pp. 163-166.
22. S. Yamahata, K. Kurishima, H. Nakajima, T. Kobayashi, and Y. Matsuoka,
"Ultra-high fmax and fT InP/InGaAs double-heterojunction bipolar transistors with
step-graded InGaAsP collector," IEEE GaAs IC Symposium, Philadelphia, PA, Oct.
1994, pp. 345-348.
23. Y. Matsuoka, S. Yamahata, K. Kurishima, and H. Ito, "Ultrahigh-speed InP/InGaAs
double-heterojunction bipolar transistors and analyses of their operation," Jpn. J.
Appl. Phys. Vol. 35 (1996) pp. 5646-5654.
24. Q. Lee, S. C. Martin, D. Mensa, R. P. Smith, J. Guthrie, S. Jaganathan, T. Mathew, S.
Krishnan, S. Ceran, and M. J. W. Rodwell, "Submicron transferred-substrate
heterojunction bipolar transistors with greater than 800 GHz fmax," IPRM'99, Davos,
Switzerland, May 1999, pp. 175-178.
25. H. Shigematsu, T. Iwai, Y. Matsumiya, H. Ohnishi, O. Ueda, and T. Fujii, "Ultrahigh
fT and fmax new self-alignment InP/InGaAs HBT's," IEEE Electron Device Lett., Vol.
16 (1995) pp. 55-57.
26. H. -F. Chau and Y. -C. Kao, "High fmax InAlAs/InGaAs heterojunction bipolar
transistors," Tech. Dig. IEDM, Washington, DC, Dec, 1993, pp. 783-786.
27. T. Oka, K. Hirata, K. Ouchi, H. Uchiyama, K. Mochizuki, and T.
Nakamura, "Small-scaled InGaP/GaAs HBTs with WSi/Ti base electrode and buried
SiO2", IEEE Trans. on Electron Devices Vol. 45 No. 11 (1998) pp. 2276-2282.
28. H. Shimawaki, Y. Amamiya, N. Furuhata, and K. Honjo, "High-fmax AlGaAs/InGaAs
and AlGaAs/GaAs HBT's with p+/p regrown base contacts," IEEE Trans. Electron
Devices Vol. 42 No. 10 (1995) pp. 1735-1744.
29. K. Washio, E. Ohue, K. Oda, R. Hayami, M. Tanabe, H. Shimamoto, T. Harada, and
M. Kondo, "82GHz dynamic frequency divider in 5.5 ps ECL SiGe HBTs," Dig.
Tech. Papers ISSCC, San Francisco, Feb. 2000, pp. 210-211.
30. A. Schuppen, U. Erben, A. Gruhle, H. Kibbel, H. Schumacher, and U. Konig,
"Enhanced SiGe heterojunction bipolar transistors with 160 GHz- fmax," Tech. Dig.
IEDM, Washington, D. C., Dec. 1995, pp. 743-746.
31. T. Ishibashi, N. Shimizu, S. Kodama, H. Ito, T. Nagatsuma, and T. Furuta,
"Uni-traveling-carrier photodiodes," Tech. Dig. Ultrafast Electronics and
Optoelectronics (1997 OSA Spring Topical Meeting), 1997, pp. 166-168/UWA2-1.
32. T. Ishibashi, S. Kodama, N. Shimizu, and T. Furuta, "High-speed response of
uni-traveling-carrier photodiodes," Jpn. J. Appl. Phys. Vol. 36 (1997) pp. 6263-6268.
33. T. Ishibashi, H. Fushimi, T. Furuta, and H. Ito, "Uni-traveling-carrier photodiodes for
electromagnetic wave generation," 1999 IEEE 7th Int. Conf. on Terahertz
Electronics Proc., 1999, pp. 36-39.
34. N. Shimizu, N. Watanabe, T. Furuta, and T. Ishibashi, "InP/InGaAs uni-traveling-carrier
photodiode with improved 3-dB bandwidth of over 150 GHz," IEEE Photonic
Technology Lett. Vol. 10 (1997) p. 412.
35. H. Ito, T. Furuta, S. Kodama, N. Watanabe, and T. Ishibashi, "InP/InGaAs
uni-traveling-carrier photodiode with 220 GHz bandwidth," Electron. Lett. Vol. 35
No. 18 (1999) pp. 1556-1557.
36. T. Akeyoshi, N. Shimizu, J. Osaka, M. Yamamoto, T. Ishibashi, K. Sano, K. Murata,
and E. Sano, "Optoelectronic logic gate monolithically integrating resonant tunneling
diode and uni-traveling-carrier photo diode", Proc. of Indium Phosphide and Related
Materials, 1998, pp.423-426.
37. K. Murata, T. Otsuji, and Y. Yamane, "45 Gbit/s decision IC module using
InAlAs/InGaAs/InP HEMTs", Electron. Lett., Vol. 35, pp. 1379-1380,1999.
38. H. Kitabayashi, Y. Umeda, T. Furuta, N. Watanabe, T. Akeyoshi, Y. Yamane, K.
Murata, N. Shimizu, and Y. Ishii, "Monolithic integration technology using
InP-based HEMTs and uni-traveling-carrier-photodiode for over 40 Gbit/s digital
OEIC", Proc. of Indium Phosphide and Related Materials, 1999, pp. 329-332.
39. N. Shimizu, K. Murata, A. Hirano, Y. Miyamoto, H. Kitabayashi, Y. Umeda, T.
Akeyoshi, T. Furuta, and N. Watanabe, "40 Gbit/s monolithic digital OEIC composed
of uni-traveling-carrier photodiode and InP HEMTs", Electron. Lett., Vol. 36, No. 14,
pp. 1219-1220, 2000.
International Journal of High Speed Electronics and Systems, Vol. 11, No. 1 (2001) 159-215
© World Scientific Publishing Company

SCALING OF InGaAs/InAlAs HBTs FOR
HIGH SPEED MIXED-SIGNAL AND mm-WAVE ICs

M. J. W. RODWELL, M. URTEAGA, Y. BETSER*, T. MATHEW, P. KRISHNAN,
D. SCOTT, S. JAGANATHAN, D. MENSA, J. GUTHRIE†, R. PULLELA‡,
Q. LEE§, B. AGARWAL¶, U. BHATTACHARYA‖, and S. LONG
Department of Electrical and Computer Engineering, University of California,
Santa Barbara, CA 93106, USA

S. C. MARTIN and R. P. SMITH**
NASA Jet Propulsion Labs, California Institute of Technology,
Pasadena, CA, USA

High bandwidths are obtained with heterojunction bipolar transistors by thinning the
base and collector layers, increasing emitter current density, decreasing emitter contact
resistivity, and reducing the emitter and collector junction widths. In mesa HBTs,
minimum dimensions required for the base contact impose a minimum width for the
collector junction, frustrating device scaling. Narrow collector junctions can be obtained
by using substrate transfer processes, or -if contact resistivity is greatly reduced - by
reducing the width of the base Ohmic contacts in a mesa structure. HBTs with submicron
collector junctions exhibit extremely high fmax and high gains in mm-wave ICs. Logic
gate delays are primarily set by depletion-layer charging times, and neither fT nor fmax
is indicative of logic speed. For high speed logic, epitaxial layers must be thinned, emitter
and collector junction widths reduced, current density increased, and emitter parasitic
resistance decreased. Transferred-substrate HBTs have obtained 21 dB unilateral power
gain at 100 GHz. If extrapolated at -20 dB/decade, the power gain cutoff frequency fmax
is 1.1 THz. Transferred-substrate HBTs have obtained 295 GHz fT. Demonstrated ICs
include lumped and distributed amplifiers with bandwidths to 85 GHz, 66 GHz
master-slave flip-flops, and 18 GHz clock rate Δ-Σ ADCs.

1. Introduction
Research in wide bandwidth heterojunction bipolar transistors (HBTs) 1,2 is driven
by applications in high-frequency communications and radar. In optical fiber
communications, integrated circuits for 40 Gb/s transmission are now in development
*Y. Betser is now with Anadigics Corp., Israel.
†J. Guthrie is now with Nortel Networks, Inc.
‡D. Mensa, R. Pullela, and S. Jaganathan are now with Gtran, Inc.
§Q. Lee is now with Lucent Technologies.
¶B. Agarwal is now with Conexant Corp.
‖U. Bhattacharya is now with Intel Corp.
**R. P. Smith is now with Cree Research, Inc.

160 M. J. W. Rodwell et al.

5,6. Emergence of 160 Gb/s transmission equipment in the near future must rely
on a timely and substantial improvement in the bandwidth of semiconductor
electronics. 160 Gb/s fiber transmission will require amplifiers with flat gain and linear
phase over a ~DC-110 GHz bandwidth and master-slave latches 3 (used in decision
circuits, multiplexers, and phase-lock loops) operable at 80 GHz or 160 GHz clock
frequency.
A second set of driving applications is wideband, high-resolution analog-digital
converters, digital-analog converters, and direct digital frequency synthesizers 8.
Increased bandwidths of these mixed-signal ICs will increase the bandwidth and
frequency agility of military radar and communications systems 4. In ADCs and
DACs, very high resolution is obtained using oversampling techniques 7,9, with
clock frequencies ~100× the signal bandwidths. In high resolution ADCs, to
avoid metastability errors in latched comparators driven by small input signals,
the circuit time constants must be much smaller than the periods of the clock
signals employed. Similar design constraints apply to high-resolution DACs. High
resolution ADCs and DACs consequently require transistor bandwidths 10^2:1 to
10^4:1 larger than the signal frequencies involved. Transistors with several hundred
GHz fT and fmax would enable high-resolution microwave mixed-signal ICs.
A third driving application is in monolithic millimeter-wave integrated circuits
(MIMICs). In microwave and millimeter-wave receivers, the low-noise RF
preamplifier, several stages of amplification, and frequency conversion (a mixer) are
typically implemented as small-scale monolithic circuits. Similar MIMICs are used
in the transmitter. The operating frequency is set by the application, but
progressive improvements in transistor bandwidths permit the evolution of radar and
communications ICs to progressively higher frequencies. A transistor with a 1 THz
power-gain cutoff frequency would provide useful gain over the full 30-300 GHz
millimeter-wave band. This would permit e.g. digital radio links with
millimeter-wave carrier frequencies and 1-10 Gb/s channel capacities. Until recently, III-V
high-electron-mobility field-effect-transistors (HEMTs) have shown fmax superior
to that of HBTs, and have dominated in MIMICs. With recent work on scaling of
HBTs to submicron dimensions 43, HBT power-gain cutoff frequencies now exceed
those of HEMTs, and HBTs can compete for application in MIMICs.
In high-speed digital and mixed-signal applications, III-V HBTs must compete
with their silicon counterparts. The primary advantage of III-V HBTs is superior
bandwidth, and the primary disadvantage the relative immaturity of the technology,
with consequently higher cost and lower scales of integration. There are several
factors contributing to the superior bandwidth of III-V HBTs. For HBTs grown on
GaAs or InP substrates, available lattice-matched materials allow use of an emitter
whose bandgap energy is much larger than that of the base *. This allows the base
doping to be increased to the limits of incorporation in growth, ~10^20/cm^3, and
results in very low base sheet resistance. 600 Ω/square sheet resistance and 0.15 ps
base transit time is readily obtained in a Be-doped InGaAs base of 400 Å thickness.
In contrast, constraints of allowable lattice mismatch in Si/SiGe HBTs limit the
InGaAs/InAlAs HBTs for High Speed Mixed-Signal and rara-Wave ICs 161

allowable Ge:Si alloy ratio. The emitter-base bandgap energy difference is then
much smaller than in III-V HBTs, and base dopings are consequently lower. 4-8
kΩ/square base sheet resistivity is typical of SiGe HBTs 13. High electron velocities
are a second significant advantage of III-V HBTs. In InAlAs/InGaAs HBTs with
0.2-0.3 μm collector thickness, effective collector electron velocities exceed 4 × 10^7
cm/s, approximately 4:1 higher than observed in Si. This high electron velocity
results in high current-gain cutoff frequencies.
With the exception of transferred-substrate HBTs (discussed subsequently), best
reported results of InP-based HBTs include 225 GHz fT and 300 GHz fmax 35,14.
Si/SiGe HBTs 10,11 have obtained 156 GHz fT. Thus, despite the advantages of
III-V HBTs provided by superior materials properties, Si bipolar junction transistors
(BJTs) and Si/SiGe HBTs remain highly competitive. The high bandwidths of
Si/SiGe HBTs arise in part from aggressive submicron scaling. In devices with
0.14 μm emitter-base junction widths, 92 GHz fT and 108 GHz fmax have been
reported 12. Self-aligned polysilicon contacts reduce both the parasitic
collector-base capacitance and the base resistance. In marked contrast to the aggressive
submicron scaling and aggressive parasitic reduction employed in Si/SiGe HBTs,
III-V HBTs are typically fabricated with 1-2 μm emitter junction widths and 3-5
μm collector-base junction widths. This is remarkable in an era when commodity
microprocessors are available with tens of millions of transistors at 0.13 μm gate
lengths. Deep submicron scaling will improve the bandwidth of III-V heterojunction
bipolar transistors, and is critical to their continued success.
To obtain improved HBT bandwidths by scaling, transit times are reduced by
decreasing the thicknesses of the base and collector epitaxial layers. Important
RC charging times are reduced by laterally scaling the base and collector junction
widths. Most significant among several limits to HBT submicron scaling is the
extrinsic (parasitic) collector-base junction lying under the base Ohmic contacts.
The required minimum size for the base Ohmic contacts places a lower limit on
the size of the collector-base junction, preventing submicron junction scaling. We
have developed a substrate transfer process which allows fabrication of HBTs with
submicron emitter-base and collector-base junctions lying on opposing sides of the
base epitaxial layer. With this device, fmax increases rapidly with scaling. With
transferred-substrate HBTs, 1.1 THz extrapolated power-gain cutoff frequencies and
295 GHz current-gain cutoff frequencies have been obtained. Further improvements
in fT require further epitaxial scaling, together with increased operating current
density and greatly improved emitter parasitic resistance.
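
The extrapolated 1.1 THz figure follows from the measurement quoted in the abstract (21 dB unilateral power gain at 100 GHz) and an assumed -20 dB/decade gain roll-off. A minimal sketch of that extrapolation, with an illustrative function name:

```python
def extrapolate_fmax(gain_db, f_meas_hz, slope_db_per_decade=-20.0):
    """Frequency at which the gain reaches 0 dB, assuming it falls at a
    constant slope (in dB per decade) above the measurement frequency."""
    decades_to_unity = gain_db / -slope_db_per_decade
    return f_meas_hz * 10.0 ** decades_to_unity

# 21 dB unilateral power gain measured at 100 GHz (values from the abstract)
fmax_extrapolated = extrapolate_fmax(21.0, 100e9)
print(f"extrapolated f_max: {fmax_extrapolated / 1e12:.2f} THz")
```

The result depends entirely on the assumed slope, which is why the text describes the 1.1 THz value as extrapolated rather than measured.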

2. HBT scaling

In HBTs, thinning the base and collector epitaxial layers reduces the carrier transit
times but increases the base resistance and the collector-base capacitance. These
can be subsequently reduced by reducing the lithographically-defined widths of the
emitter-base and collector-base junctions. To simultaneously obtain both high fT
and high fmax, device epitaxial and lithographic dimensions must be concurrently
[Figure 1 drawing labels: base contact pad, base contact, collector contact, base, collector, subcollector, SI substrate.]

Figure 1: Plan and cross-section of a typical mesa HBT. The emitter-base junction
has width We, length Le and area Ae = LeWe, while the collector-base junction
has width Wc, length Lc and area Ac = LcWc.

scaled. Below we examine the limits to HBT scaling.

Figure 1 shows a simplified cross-section of a mesa HBT. To form the transistor,
the emitter, base, and collector layers are first grown by molecular-beam epitaxy
(MBE) or metal-organic chemical vapor deposition (MOCVD) on a semi-insulating
substrate. The HBT junctions are formed by a series of patterned etches, and
contacts are formed by depositing metal. This results in a device structure where the
collector-base junction must lie under the full area of the base Ohmic contacts.
There is also a parasitic collector-base junction lying under the area of the base
contact pad. In this device structure, the collector-base junction must be
substantially larger than the emitter dimensions. At the sides of the emitter stripe, the
base Ohmic contact must be at least one Ohmic contact transfer length Lcontact wide in
order to obtain low contact resistance. In an InGaAs-base HBT with 400 Å base
thickness and 5 × 10^19/cm^3 doping, Lcontact ≈ 0.4 μm. Lithographic alignment
tolerances between emitter and collector also constrain the minimum collector-base
junction dimensions. Dependent upon the process minimum feature size and the
length of the emitter stripe, the base contact pad area can contribute as much as
50 % of the total collector-base capacitance.

2.1. Factors determining fT


Before examining scaling for high cutoff frequencies, relevant HBT parameters must
first be calculated. The current-gain cutoff frequency is

\[ \frac{1}{2\pi f_T} = \tau_b + \tau_c + \frac{kT}{qI_c}\left(C_{je} + C_{cb}\right) + \left(R_{ex} + R_c\right)C_{cb}, \tag{1} \]

where Rex and Rc are the parasitic emitter and collector resistances, Ccb is the
collector junction capacitance, and Ic the collector current.
First examine the base transit time τb. If a linear grading of the base semiconductor
bandgap energy with position is used to reduce τb, then 15

\[ \tau_b = \frac{T_b^2}{D_n}\left(\frac{kT}{\Delta E}\right) - \frac{T_b^2}{D_n}\left(\frac{kT}{\Delta E}\right)^2\left(1 - e^{-\Delta E/kT}\right) + \frac{T_b}{v_{exit}}\left(\frac{kT}{\Delta E}\right)\left(1 - e^{-\Delta E/kT}\right), \tag{2} \]

where ΔE is the grading in the base bandgap energy and Tb the base thickness.
The base exit velocity vexit is of the order of (kT/m*)^(1/2) for an ungraded base 15,
and is somewhat larger with base bandgap grading. Dn is the base minority carrier
diffusivity and m* the electron effective mass. Equation 2 is derived from the
drift-diffusion relationship, and is accurate only if the predicted τb is large in comparison
with the momentum relaxation time τm = Dn m*/kT 16. Using the parameters of
an InGaAs base at 5 × 10^19/cm^3 doping (Dn = 40 cm^2/s, vexit ~ 3 × 10^7 cm/s,
τm = 35 fs), we note that 52 meV bandgap grading is sufficient to reduce τb by ~2:1.
For a thick base layer or a large vexit, τb ∝ Tb^2; with InGaAs base layers below
~400 Å thickness, the exit velocity term in eqn. 2 adds a significant correction.
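
The ~2:1 reduction quoted above can be checked numerically from Eq. (2). The sketch below uses the parameters given in the text (Tb = 400 Å, Dn = 40 cm²/s, vexit = 3 × 10^7 cm/s, ΔE = 52 meV); the 300 K temperature, the ungraded limit Tb²/(2Dn) + Tb/vexit used for comparison, and the function name are assumptions of this example.

```python
import math

K_B_EV = 8.617e-5   # Boltzmann constant in eV/K
TEMP_K = 300.0      # ASSUMED lattice temperature

def base_transit_time(t_b_cm, d_n, v_exit, delta_e_ev, temp_k=TEMP_K):
    """Base transit time (s) for a linearly graded base, following Eq. (2).
    For delta_e_ev == 0 the ungraded limit Tb^2/(2 Dn) + Tb/v_exit is used."""
    if delta_e_ev == 0.0:
        return t_b_cm ** 2 / (2.0 * d_n) + t_b_cm / v_exit
    r = K_B_EV * temp_k / delta_e_ev          # kT / delta_E
    decay = 1.0 - math.exp(-1.0 / r)          # 1 - exp(-delta_E / kT)
    diffusion = t_b_cm ** 2 / d_n
    return diffusion * r - diffusion * r ** 2 * decay + (t_b_cm / v_exit) * r * decay

T_B = 400e-8  # 400 Angstrom expressed in cm
tau_ungraded = base_transit_time(T_B, 40.0, 3e7, 0.0)
tau_graded = base_transit_time(T_B, 40.0, 3e7, 0.052)
print(f"ungraded: {tau_ungraded * 1e12:.3f} ps, graded: {tau_graded * 1e12:.3f} ps, "
      f"ratio: {tau_ungraded / tau_graded:.2f}")
```

With these inputs the graded-base transit time comes out close to half the ungraded value, in line with the statement in the text.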
The collector transit time $\tau_c$ is the mean delay of the collector displacement
current, and is given by [17, 18]

$$\tau_c = \int_0^{T_c} \frac{(1 - x/T_c)}{v(x)}\,dx \equiv \frac{T_c}{2v_{eff}}, \tag{3}$$

where $v(x)$ is the position-dependent electron velocity in the collector drift region
and $v_{eff}$ an effective electron velocity. $\tau_c$ is most strongly dependent upon the
electron velocity in the proximity of the base, and becomes progressively less sensitive
to the electron velocity as the electron passes through the collector [18]. At low
collector-base bias voltages, electrons must traverse a significant fraction of the
collector drift region before acquiring sufficient kinetic energy (0.55 eV for InGaAs [19],
0.6 eV for InP [20]) to undergo $\Gamma$-L scattering [17, 18], and $v(x)$ is fortuitously highest
near the base. In thin InGaAs or InP layers, $v_{eff} = 3$-$5 \times 10^7$ cm/s. For scaling
analysis, we will take $\tau_c \propto T_c$.
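The weighting $(1 - x/T_c)$ in eqn. 3 is what makes $\tau_c$ most sensitive to the velocity near the base. A minimal numerical sketch follows; the two-step velocity profiles and their values are hypothetical and chosen only to make the weighting visible, and the function names are not from the original.

```python
def collector_transit_time(t_c, velocity, n=2000):
    """Midpoint-rule evaluation of eqn. 3: integral of (1 - x/T_c)/v(x) over 0..T_c."""
    dx = t_c / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        total += (1.0 - x / t_c) / velocity(x) * dx
    return total


T_C = 0.2e-4                 # 0.2 um collector, in cm
V_FAST, V_SLOW = 5e7, 1e7    # cm/s, hypothetical two-level velocity profile


def fast_near_base(x):
    # High velocity in the half of the collector adjacent to the base.
    return V_FAST if x < T_C / 2 else V_SLOW


def slow_near_base(x):
    # Same two velocities, but with the slow region adjacent to the base.
    return V_SLOW if x < T_C / 2 else V_FAST


tau_fast_first = collector_transit_time(T_C, fast_near_base)
tau_slow_first = collector_transit_time(T_C, slow_near_base)

for name, tau in (("fast near base", tau_fast_first), ("slow near base", tau_slow_first)):
    v_eff = T_C / (2.0 * tau)  # effective velocity as defined in eqn. 3
    print(f"{name}: tau_c = {tau * 1e12:.2f} ps, v_eff = {v_eff:.2e} cm/s")
```

Both profiles have the same velocities over the same distances, yet the one that is fast near the base gives the shorter $\tau_c$ and the higher $v_{eff}$.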
In InAlAs/InGaAs HBTs with $T_b \approx 400$ Å and $T_c \approx 0.2\ \mu$m, $f_\tau \approx 250$ GHz, and
the RC charging terms in eqn. 1 comprise 35% of the total forward delay. These
terms must be considered in detail.
First consider the charging time $(kT/qI_c)C_{cb}$. This term has a major impact
upon digital circuit delay (section 3.1) and is reduced by increasing the collector
current density to limits set by collector space-charge screening (the Kirk effect [21]).
If the collector doping $N_d$ is chosen so as to obtain a fully-depleted collector at zero
bias current and the applied $V_{cb}$, we must have

$$V_{cb} + \phi = qN_dT_c^2/2\epsilon, \tag{4}$$

while base pushout occurs at a current density $J_{max}$ satisfying

$$V_{cb} + \phi = \left(J_{max}/v_{sat} - qN_d\right)T_c^2/2\epsilon, \tag{5}$$

hence the maximum collector current before base pushout is

$$I_{c,max} = A_e\left(V_{cb} + \phi\right)4\epsilon v_{sat}/T_c^2 \propto A_e/T_c^2, \tag{6}$$

where $v_{sat}$ is an (assumed) uniform electron velocity within the collector. With
undoped collectors, $I_{c,max}$ is 2:1 smaller than in eqn. 6. The collector capacitance is
$C_{cb} = \epsilon A_c/T_c$. With the HBT biased at $I_{c,max} \propto 1/T_c^2$, $(kT/qI_c)C_{cb} \propto T_c(A_c/A_e)$.
This delay term is thus minimized by scaling (reducing $T_c$), but bias current densities
must increase in proportion to the square of the desired fractional improvement in
$f_\tau$.
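The scaling stated in eqns. 4 to 6 can be illustrated with a short calculation. This is a sketch under stated assumptions: the relative permittivity (12.7), the bias term $V_{cb} + \phi = 1$ V, the velocity $3 \times 10^7$ cm/s and the area ratio are illustrative values, and the function names are hypothetical.

```python
EPS0 = 8.854e-12   # vacuum permittivity, F/m
Q = 1.602e-19      # electron charge, C
KT_Q = 0.0259      # thermal voltage in V, assuming ~300 K


def kirk_limit(t_c, v_bias, v_sat, eps_r=12.7):
    """Return (N_d, J_max) in SI units from eqns. 4 and 6.

    t_c in m, v_bias = V_cb + phi in V, v_sat in m/s. eps_r is an assumed value.
    """
    eps = eps_r * EPS0
    n_d = 2.0 * eps * v_bias / (Q * t_c**2)        # eqn. 4 solved for N_d
    j_max = 4.0 * eps * v_sat * v_bias / t_c**2    # eqn. 6 divided by A_e
    return n_d, j_max


def collector_charging_time(t_c, v_bias, v_sat, area_ratio, eps_r=12.7):
    """(kT/q I_c) C_cb with the device biased at I_c,max; area_ratio = A_c/A_e."""
    _, j_max = kirk_limit(t_c, v_bias, v_sat, eps_r)
    c_cb_per_emitter_area = eps_r * EPS0 / t_c * area_ratio
    return KT_Q / j_max * c_cb_per_emitter_area


for t_c in (0.2e-6, 0.1e-6):
    n_d, j_max = kirk_limit(t_c, 1.0, 3e5)
    delay = collector_charging_time(t_c, 1.0, 3e5, area_ratio=2.3)
    print(f"T_c = {t_c * 1e6:.1f} um: N_d = {n_d * 1e-6:.2e} /cm^3, "
          f"J_max = {j_max * 1e-9:.2f} mA/um^2, delay = {delay * 1e15:.1f} fs")
```

Halving $T_c$ quadruples the required current density while only halving the charging time, which is the trade-off described in the paragraph above.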
The emitter charging time ($C_{je}[kT/qI_c]$ in eqn. 1) is a significant determinant
of $f_\tau$, and also plays a major role in ECL logic delay (section 3.1). If we were to
assume that Cje were simply a depletion capacitance, it would be reasonable to
expect that this charging time could be minimized simply by making the emitter-
base depletion region very thick, by use of very low emitter doping, combined with
a thick bandgap grading region in the base-emitter heterojunction. Clearly, this
approach must fail somehow in the limit of very large depletion thicknesses. We
must examine design of the emitter-base junction in detail to determine the limits
to the emitter-base depletion thickness, and to understand how the junction design
must be modified as the transistor is scaled for increased device bandwidth.
In order to support a high emitter current density without a substantial potential
drop in the emitter-base depletion layer, a high electron density $n(x)$ must be
present within the emitter-base junction. In high speed HBTs the thickness $T_{eb}$
of the emitter-base depletion layer must then be small if significant charge storage
effects are to be avoided. Figure 2 shows a band diagram of the base-emitter depletion
region. $n(x) = N_c\exp[-q(E_c(x) - E_{fn}(x))/kT]$, where $N_c$ is the conduction
band effective density of states, $E_c(x)$ is the conduction-band energy and $E_{fn}(x)$
the electron quasi-Fermi level.

Figure 2: Band diagram of the HBT emitter-base junction. If the base-emitter
junction thickness $T_{eb}$ is excessive, HBT performance will be degraded by either
stored charge or by excessive potential drops in the depletion layer.

An arbitrary conduction-band profile $E_c(x)$ can be
obtained through combined bandgap grading and doping. Under modulation of $V_{be}$,
$dn(x)/dV_{be} = n(x)(q/kT)(x/T_{eb})$. The ideality factor $N$ is defined by the relationship
$I_c \propto e^{qV_{be}/NkT}$. Gradients in $E_{fn}$ in the emitter-base depletion region result in
$N$ greater than unity, with

$$N = 1 + \frac{1}{q}\frac{d(\Delta E_{fn})}{dV_{be}}. \tag{7}$$
In the base-emitter depletion region, $dE_{fn}/dx = -J/\mu_{n,eb}n(x)$, while in the base
$J_n = qn(T_{eb})D_n/T_b\Gamma$. Here, $\mu_{n,eb}$ is the electron mobility in the junction (due to
the low doping in the grade, this mobility is significantly larger than that of the
base) and $\Gamma = kT/\Delta E - \left(kT/\Delta E - D_n/v_{exit}T_b\right)e^{-\Delta E/kT}$ is a factor involving the
base bandgap grading ($\Gamma \approx 1$ for an ungraded base). Combining these relationships,
the ideality factor is

$$N = 1 + \frac{\mu_n}{\mu_{n,eb}}\,\frac{T_{eb}}{\Gamma T_b}\int_0^1 \frac{n(T_{eb})}{n(\zeta T_{eb})}\,(1 - \zeta)\,d\zeta, \tag{8}$$

where $\zeta = x/T_{eb}$ is a normalized position variable, and $\mu_n$ is the electron mobility
in the base. To obtain a low ideality factor, $T_{eb}/T_b$ must not be large, and the
electron density $n(x)$ in the junction must be kept high. Unless $T_{eb}/T_b$ is kept
small, the high $n(x)$ will result in significant charge storage. Using methods similar
to those used to derive the collector transit time [17, 18] (eqn. 3),

$$C_{je}/A_e = \epsilon/T_{eb} + \frac{d}{dV_{be}}\int_0^{T_{eb}} (x/T_{eb})\,qn(x)\,dx. \tag{9}$$

The term $(kT/qI_c)C_{je}$ in eqn. 1 can then be written as

$$(kT/qI_c)C_{je} = \left(\frac{kT}{qJ_e}\right)\left(\frac{\epsilon}{T_{eb}}\right) + \frac{\Gamma T_{eb}T_b}{D_n}\int_0^1 \frac{n(\zeta T_{eb})}{n(T_{eb})}\,\zeta^2\,d\zeta. \tag{10}$$

The first term in eqn. 10 results from the depletion-layer capacitance, and is minimized
using high bias current densities $J_e = I_e/A_e$; the second term reflects storage
of mobile electron charge within the depletion layer, and is minimized by reducing
$T_{eb}T_b$.
In eqn. 1, the delay term $R_{ex}C_{cb}$ is a major limit to HBT scaling for high $f_\tau$.
Further, $R_{ex}$ contributes significantly to ECL logic delay. Because of the relative
sizes of the emitter and collector Ohmic contacts, in a well-designed submicron
HBT, $R_c$ is 4:1 to 10:1 smaller than $R_{ex}$ and $R_cC_{cb}$ can be neglected in a first
analysis. $R_{ex}$ must first be calculated. The emitter layer structure of a typical
HBT (fig. 3) contains a heavily-doped and narrow-bandgap contact ("cap") layer,
and a heavily-doped N++ wide-bandgap emitter layer. A portion of the emitter
layer may be more lightly (N+) doped for reduced junction capacitance, and may
be of several hundred Å thickness to avoid dopant diffusion from the N++ layer
into the emitter-base junction.

Figure 3: Cross-section of the emitter layers within a typical HBT, comprising a
heavily-doped semiconductor contact ("cap") layer, a low-resistance N++ emitter
layer, and the N+ emitter. Lateral depletion of the N+ emitter can be significant
in submicron devices.

If heterointerfaces are properly graded to avoid
conduction-band barriers between layers, the parasitic emitter resistance is

$$R_{ex} = \frac{\rho_{c,e}}{L_eW_{e,contact}} + \frac{\rho_{cap}T_{cap}}{L_eW_{e,contact}} + \frac{\rho_{e2}T_{e2}}{L_eW_{e,junct}} + \frac{\rho_{e1}T_{e1}}{L_eW_e}, \tag{11}$$

where $\rho_{c,e}$ is the emitter specific Ohmic contact resistivity, and $\rho_{cap}$, $\rho_{e2}$, and $\rho_{e1}$
are the bulk resistivities of the cap, N++, and N+ emitter layers. For submicron
emitters, the junction width $W_{e,junct}$ is significantly smaller than the contact width
$W_{e,contact}$ due to lateral undercutting of the emitter during etching of the emitter-base
junction, and the electrically-active emitter width $W_e$ can be significantly
smaller than $W_{e,junct}$ because of the presence of surface (edge) depletion regions of
width $(2\epsilon\phi/qN_{e1})^{1/2}$, where $N_{e1}$ is the N+ layer doping and $\phi$ is the bandbending
due to pinning of the Fermi energy at the surface. For simplicity in scaling analysis,
we will approximate

$$R_{ex} \approx \rho_e/A_e, \tag{12}$$

where $\rho_e$ is a fitted parameter, approximately $50\ \Omega\text{-}\mu\mathrm{m}^2$ for submicron InAlAs/InGaAs
HBTs fabricated to date at UCSB. In InAlAs/InGaAs HBTs we have fabricated,
$\rho_{c,e} = 20\ \Omega\text{-}\mu\mathrm{m}^2$ when InGaAs contacts at $10^{19}/\mathrm{cm}^3$ doping are employed, and
$\rho_{c,e} = 4\ \Omega\text{-}\mu\mathrm{m}^2$ for contacts to InAs layers at $2 \times 10^{19}/\mathrm{cm}^3$ doping. The $\rho_{e1}T_{e1} =
5.5\ \Omega\text{-}\mu\mathrm{m}^2$ resistance of the N+ InAlAs layer ($8 \times 10^{17}/\mathrm{cm}^3$ doping, 700 Å thickness)
is significant in submicron devices for which $W_e$ is 2:1 to 4:1 smaller than
$W_{e,contact}$. To avoid such emitter size effects, deep submicron HBTs should use
$\gg 10^{18}/\mathrm{cm}^3$ emitter doping.
The $R_{ex}C_{cb}$ charging time can now be examined. Since $C_{cb} = \epsilon A_c/T_c$,

$$R_{ex}C_{cb} = \left(\frac{\rho_e\epsilon}{T_c}\right)\left(\frac{A_c}{A_e}\right) = 28\ \mathrm{fs} \times \left(\frac{A_c}{A_e}\right) \tag{13}$$

if $\rho_e = 50\ \Omega\text{-}\mu\mathrm{m}^2$ and $T_c = 0.2\ \mu$m. This is a significant delay. In HBTs we have
fabricated with 275 GHz peak $f_\tau$, the substrate transfer process allows $A_c/A_e$ to
be kept small at 2.3:1, yet $R_{ex}C_{cb}$ still constitutes 11% of the total $1/2\pi f_\tau = 0.58$
ps forward delay. In mesa HBTs (fig. 1) $A_c/A_e$ is often larger than 2.3:1 and
hence $R_{ex}C_{cb}$ will contribute a larger delay. Because $R_{ex}C_{cb} \propto 1/T_c$, thinning the
collector to reduce $\tau_c$ also increases $R_{ex}C_{cb}$.
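The numbers in this paragraph can be reproduced directly from eqn. 13. The sketch below assumes a relative permittivity of 12.7, a value chosen here only because it reproduces the quoted 28 fs; the function name is hypothetical.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity, F/m


def rex_ccb(rho_e_ohm_um2, t_c_um, area_ratio, eps_r=12.7):
    """Eqn. 13: R_ex * C_cb in seconds.

    rho_e in ohm*um^2, t_c in um, area_ratio = A_c/A_e. eps_r is an assumed value.
    """
    rho_e = rho_e_ohm_um2 * 1e-12  # ohm*m^2
    t_c = t_c_um * 1e-6            # m
    return rho_e * eps_r * EPS0 / t_c * area_ratio


per_unit_ratio = rex_ccb(50.0, 0.2, 1.0)       # coefficient of (A_c/A_e)
delay = rex_ccb(50.0, 0.2, 2.3)                # transferred-substrate device
forward_delay = 1.0 / (2.0 * math.pi * 275e9)  # 1/(2 pi f_tau) at 275 GHz
fraction = delay / forward_delay

print(f"R_ex*C_cb per unit A_c/A_e: {per_unit_ratio * 1e15:.1f} fs")
print(f"forward delay: {forward_delay * 1e12:.2f} ps")
print(f"R_ex*C_cb share of forward delay: {fraction * 100:.0f}%")
```

Re-running with a thinner collector shows the $1/T_c$ growth of this term directly.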
To increase HBT current gain cutoff frequencies, the base and collector layers
must be thinned and the bias current density increased. Thinning the collector
increases $R_{ex}C_{cb}$, imposing a limit to scaling. Limits to bias current density imposed
by device reliability, and loss in breakdown voltage with reduced collector thickness,
are two further potential limits to scaling. Finally, unless the device structure of
fig. 1 is laterally scaled, vertical HBT scaling for increased $f_\tau$ will result in reduced
power-gain cutoff frequencies $f_{max}$.

2.2. Lithographic scaling for high fmax


Regardless of the value of $f_\tau$, transistors cannot provide power gain at frequencies
above $f_{max}$. Independent of $f_\tau$, $f_{max}$ defines the maximum usable frequency of a
transistor in either narrowband reactively-tuned or broadband distributed circuits
[22]. In more general analog and digital circuits (section 3.1), all transistor parasitics
play a significant role. The $f_\tau$ and $f_{max}$ of a transistor are then cited to give a
first-order summary of the device transit delays and of the magnitude of its dominant
parasitics.
In an HBT with base resistance $R_{bb}$ and collector capacitance $C_{cb}$, the power-gain
cutoff frequency is approximately $f_{max} \approx (f_\tau/8\pi R_{bb}C_{cbi})^{1/2}$. The base-collector
junction is a distributed network, and $R_{bb}C_{cbi}$ represents an effective, weighted time
constant.
The base resistance $R_{bb}$ (fig. 1) is composed of the sum of contact resistance
$R_{b,cont}$, base-emitter gap resistance $R_{gap}$, and spreading resistance under the emitter
$R_{spread}$. With base sheet resistance $\rho_s$, and specific (vertical) contact access resistance
$\rho_c$, we have

$$\begin{aligned}
R_{bb} &= R_{b,cont} + R_{gap} + R_{spread} \\
R_{b,cont} &= \sqrt{\rho_s\rho_c}/2L_e \\
R_{gap} &= \rho_sW_{eb}/2L_e \\
R_{spread} &= \rho_sW_e/12L_e.
\end{aligned} \tag{14}$$
To compute $f_{max}$, we must find $C_{cbi}$. Because the base-collector junction parasitics
are distributed, calculation of $R_{bb}C_{cbi}$ is complex, and will be deferred until
section 2.3. As a first (and very rough) approximation, we will first compute $R_{bb}C_{cb}$,
i.e. the product of the base resistance and the full capacitance $C_{cb} = \epsilon A_c/T_c$ of the
collector-base junction,

$$R_{bb}C_{cb} = \left(\sqrt{\rho_s\rho_c} + \rho_sW_{eb}\right)\left(\frac{\epsilon W_c}{2T_c}\right) + \frac{\rho_s\epsilon W_cW_e}{12T_c}. \tag{15}$$

Consider the influence of device scaling on the time constant $R_{bb}C_{cb}$. Decreasing the
base thickness to reduce $\tau_b$ increases the base sheet resistance $\rho_s$, increasing $R_{bb}C_{cb}$.
Decreasing the collector thickness $T_c$ to reduce $\tau_c$ directly increases $R_{bb}C_{cb}$, as is
shown explicitly in eqn. 15.
Low $R_{bb}C_{cb}$, and consequently high $f_{max}$, is obtained by scaling the emitter
and collector junction widths $W_e$ and $W_c$ to submicron dimensions. Reducing the
emitter width $W_e$ alone reduces towards zero the component of $R_{bb}C_{cb}$ associated
with the base spreading resistance (the second term in eqn. 15). In the normal
triple-mesa HBT (fig. 1), the base Ohmic contacts must be at least one contact
transfer length ($L_{contact} = (\rho_c/\rho_s)^{1/2}$) wide, setting a minimum collector junction width
$W_c$. The component of $R_{bb}C_{cb}$ associated with the base contact resistance (the first
term in eqn. 15) has a minimum value, independent of lithographic limits. Consequently,
$f_{max}$ does not increase rapidly with scaling. Given this minimum $R_{bb}C_{cb}$,
attempts to obtain high $f_\tau$ by thinning the collector have resulted in decreased
$f_{max}$, frustrating efforts to improve HBT bandwidths.
If the parasitic collector-base junction is eliminated, $f_{max}$ will instead increase
rapidly with scaling. The collector-base junction need only be present where current
flows, i.e. under the emitter. We have fabricated such a device (figure 4) using
substrate transfer processes. The emitter and collector junctions can be of equal
width, hence $W_c = W_e$. The base-collector time constant becomes

$$R_{bb}C_{cb} = \left(\sqrt{\rho_s\rho_c} + \rho_sW_{eb}\right)\left(\frac{\epsilon W_e}{2T_c}\right) + \frac{\rho_s\epsilon W_e^2}{12T_c}. \tag{16}$$

With submicron scaling of the emitter and collector junction widths, the first term
in eqn. 16 dominates, and $f_{max}$ increases as the inverse square root of the process
minimum feature size.
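The contrast between eqns. 15 and 16 can be made concrete with a rough calculation. In the sketch below, the sheet resistance, contact resistivity, gap width and permittivity are hypothetical illustrative values, not parameters of the fabricated devices, and the mesa device is assumed to carry base contacts exactly one transfer length wide on each side of the emitter. Because it uses the full collector capacitance, it inherits the pessimism of the lumped estimate.

```python
import math

EPS = 12.7 * 8.854e-18  # permittivity in F/um; eps_r = 12.7 is an assumed value

RHO_S = 500.0   # base sheet resistance, ohm/square (hypothetical)
RHO_C = 100.0   # base specific contact resistivity, ohm*um^2 (hypothetical)
W_EB = 0.25     # emitter to base-contact gap, um (hypothetical)
T_C = 0.2       # collector thickness, um
F_TAU = 250e9   # Hz, held fixed because the epitaxy is not changed


def rbb_ccb(w_e, w_c):
    """Eqn. 15: R_bb * C_cb in seconds, stripe widths in um."""
    contact_and_gap = (math.sqrt(RHO_S * RHO_C) + RHO_S * W_EB) * EPS * w_c / (2.0 * T_C)
    spreading = RHO_S * EPS * w_c * w_e / (12.0 * T_C)
    return contact_and_gap + spreading


def f_max(w_e, w_c):
    """Lumped estimate f_max = sqrt(f_tau / (8 pi R_bb C_cb))."""
    return math.sqrt(F_TAU / (8.0 * math.pi * rbb_ccb(w_e, w_c)))


def mesa_collector_width(w_e):
    """Assumed mesa layout: emitter, two gaps, two contacts of one transfer length."""
    l_contact = math.sqrt(RHO_C / RHO_S)
    return w_e + 2.0 * W_EB + 2.0 * l_contact


for w_e in (0.8, 0.4, 0.2, 0.1):
    f_mesa = f_max(w_e, mesa_collector_width(w_e))
    f_ts = f_max(w_e, w_e)  # eqn. 16: W_c = W_e
    print(f"W_e = {w_e:.1f} um: mesa {f_mesa / 1e9:.0f} GHz, "
          f"transferred-substrate {f_ts / 1e9:.0f} GHz")
```

In this sketch a 4:1 reduction in emitter width roughly doubles the transferred-substrate estimate, while the mesa estimate changes only slightly because $W_c$ is pinned by the contact width.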
Figure 4: Cross-section of an idealized HBT with the collector-base junction lying
only under the emitter ($W_C = W_E = W$). Such device structures can be formed using
substrate transfer processes.

2.3. Secondary Effects in $f_{max}$


The formulas developed above are highly simplified and significantly underestimate
the HBT $f_{max}$. Two significant corrections must be applied. First, the simple
lumped RC model of the base-collector junction must be re-examined. Secondly,
differential space-charge effects substantially reduce the collector-base capacitance
under high-current conditions.
The HBT base-collector network is distributed, and is represented by the model
of fig. 5. Using a small grid spacing, we have entered the resulting network into a
microwave circuit simulator (HP-EESOF [23]) to calculate, without approximation,
the HBT $f_{max}$. Alternatively, analytic expressions for $f_{max}$ can be developed from
hand analysis of the distributed network of fig. 5. Among these is the model of
Vaidyanathan and Pulfrey [24], which provides good physical insight. The model of
reference [24] is derived for a triple-mesa HBT; the authors of [25] have recently
generalized the model to the case of transferred-substrate and lateral-etched-undercut
collector [30] HBTs. We describe the Vaidyanathan / Pulfrey model below, and
examine its predicted performance for HBTs with submicron emitter and collector
junction widths.
Referring to fig. 5, define three capacitances. $C_{cb,e} = \epsilon L_eW_e/T_c$ is the capacitance
of the collector junction lying under the emitter. $C_{cb,gap} = 2\epsilon L_eW_{eb}/T_c$ is
the capacitance of the collector junction lying under the gap between the emitter
and the base contact. $C_{cb,ext} = 2\epsilon L_eW_{cb}/T_c$ is the capacitance of the collector lying
under the base Ohmic contacts. Components of the base resistance are as defined
in eqn. 14.
The collector-base capacitance under the emitter stripe $C_{cb,e}$ is charged through
a resistance $(R_{b,cont} + R_{gap} + R_{spread})$. The collector-base capacitance under the gap
between the emitter and the base Ohmic contacts is charged through a resistance
$(R_{b,cont} + R_{gap}/2)$.

Figure 5: Distributed model of the HBT base-collector junction for accurate calculation
of $R_{bb}C_{cbi}$. With mesh spacing $\Delta x$, $\Delta G = L_e\Delta x/\rho_c$, $\Delta R = \rho_s\Delta x/L_e$, and
$\Delta C = \epsilon L_e\Delta x/T_c$.

The charging time constant associated with the collector-base junction capacitance
$C_{cb,ext}$ lying under the base Ohmics requires more detailed scrutiny. $C_{cb,ext}$
can be charged by currents passing vertically through the base Ohmic contact above
it; this path has a resistance $R_{b,cont,i} = \rho_c/2L_eW_{cb}$. Alternatively, $C_{cb,ext}$ can
be charged by currents passing laterally from the base contact region lying outside
the perimeter of the collector contact; this path has a resistance $R_{b,cont,o} =
(\rho_s\rho_c)^{1/2}\coth((W_b - W_{cb})/L_{contact})$, where $L_{contact} = (\rho_c/\rho_s)^{1/2}$ is the base Ohmic
contact transfer length.
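In the limit of zero collector series resistance, Vaidyanathan and Pulfrey's model
[24, 25] reduces to

$$f_{max} = \sqrt{\frac{f_\tau'}{8\pi\tau_{cb}}}, \tag{17}$$

where

$$\frac{1}{2\pi f_\tau'} = \tau_b + \tau_c + \frac{kT}{qI_c}\left(C_{je} + C_{cb}\right), \tag{18}$$

and

$$\tau_{cb} = C_{cb,e}\left(R_{b,cont} + R_{gap} + R_{spread}\right) + C_{cb,gap}\left(R_{b,cont} + R_{gap}/2\right) + \left(R_{b,cont,o} \,\|\, R_{b,cont,i}\right)C_{cb,ext}. \tag{19}$$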

Examining figure 5, the external collector capacitance $C_{cb,ext}$ is not charged
through the resistances $R_{gap}$ and $R_{spread}$. It is pessimistic to calculate $f_{max}$
as $(f_\tau/8\pi R_{bb}C_{cb})^{1/2}$ in which the collector-base time constant includes the full
collector-base capacitance. As indicated by Vaidyanathan and Pulfrey's model (eqn.
17), the external collector capacitance $C_{cb,ext}$ is in fact charged through a smaller
associated resistance $(R_{b,cont,o} \,\|\, R_{b,cont,i})$. This model shows extremely good agreement
with finite-element analysis (fig. 6).

Figure 6: Comparison of $f_{max}$ computed from a finite element model with
Vaidyanathan and Pulfrey's model (Eqn. 17) and a model using the total collector
junction capacitance (Eqn. 15). Except for $W_c$, the modeled HBT is that of
figure 19, and has $W_e = 0.4\ \mu$m. [Plot: $f_{max}$ versus collector junction width in $\mu$m;
curves for finite element analysis, Vaidyanathan and Pulfrey, and $(f_\tau/8\pi R_{bb}C_{cb,total})^{1/2}$.]

Figure 7: Lithographic scaling of transferred-substrate and mesa HBTs. $f_{max}$ is
calculated using fig. 5's finite-element model of the collector-base junction. Except
for $W_c$ and $W_e$, the HBT parameters are taken from the device of fig. 19. Current
density and epitaxial layer thicknesses are held constant, resulting in constant $f_\tau$.
[Plot: $f_{max}$ versus emitter width in microns; curves for the transferred-substrate HBT
and the mesa HBT; annotation: 0.5 $\mu$m base Ohmics.]
Figure 7 compares the $f_{max}$ of mesa and transferred-substrate HBTs, computed
using the finite-element model. For the transferred-substrate device, $f_{max}$ increases
rapidly with deep submicron scaling. Experimentally, we observe a more rapid
variation of $f_{max}$ with collector width than is shown in fig. 6, and fig. 7 predicts
a higher $f_{max}$ than is experimentally observed for mesa HBTs. Series resistance in
the base metallization and collector series resistance [24] (not modeled above, and not
present in Schottky-collector transferred-substrate HBTs) are possible explanations
for the discrepancy.
At high collector current densities, differential space-charge effects in the collector
space-charge region result in $C_{cb}$ smaller than $\epsilon A_c/T_c$, and increase the HBT
$f_{max}$. The effect was predicted by Camnitz and Moll [27], and first experimentally
observed by Betser and Ritter [26]. Similar effects have been observed in MESFETs
[28]. In III-V materials at high fields, electron velocity $v(\mathcal{E})$ decreases with increasing
electric field. To a first approximation, $1/v(\mathcal{E}) \approx \kappa_0 + \kappa_1\mathcal{E}$. Modulating the collector
voltage $V_{cb}$ modulates the collector transit time $\tau_c$ (eqn. 3), and partially modulates
the space-charge in the collector drift region. This modulated space-charge
partially screens the base from modulations in the collector applied field, and $C_{cb,e}$
is reduced to

$$C_{cb,e} = \frac{\epsilon A_e}{T_c} - \frac{\kappa_1J_cA_e}{2}\left[1 - \frac{\kappa_1J_cT_c}{6\epsilon}\right]. \tag{20}$$

The quadratic dependence upon $J_c$ results from internal collector field redistribution
in the presence of the collector space-charge [27]. Current spreads laterally during
transport through the collector, flowing through a region of width ~ $(W_e + T_c)$. The
differential space charge effect strongly reduces the collector junction capacitance in
regions below and adjacent to the emitter stripe. It thus has the strongest impact
upon $f_{max}$ in devices with minimal excess collector capacitance. Experimental data
confirming $C_{cb}$ cancellation will be shown in section 4.2. Capacitance cancellation
is not instantaneous, but instead arises after a delay proportional to $\tau_c$; HBT power
gain must therefore vary at −40 dB/decade for frequencies above ~ $1/2\pi\tau_c$. The
effect can produce a ~ 2:1 increase in $f_{max}$, hence a large increase in the attainable
gain of tuned millimeter-wave amplifiers. In contrast, in digital circuits (section
3.1), many delay terms are significant, and a 2:1 reduction in $C_{cbi}$ would produce
only a ~ 12% decrease in gate delay.
Figure 8: HBT with an undercut collector-base junction formed by selective etching
of the InP collector in HCl. [Labels: emitter, base contact, InGaAs base, undercut
collector junction, collector, collector contact, InGaAs subcollector, InP subcollector,
SI substrate.]

Figure 9: Hybrid-$\pi$ small-signal HBT equivalent circuit. $C_{be,diff} = g_m(\tau_b + \tau_c)$. The
element $C_{cbi}$ does not represent capacitance of that fraction of the collector junction
lying under the emitter, but is instead a parameter adjusted to obtain the correct
$f_{max}$.

2.4. HBT equivalent circuit model

The HBT base-collector network is distributed, and accurate expressions for $f_{max}$
are complex. Computer simulation of complex circuits requires a compact device
model. Under small-signal operation, the Gummel-Poon model used in SPICE reduces
to the simple hybrid-$\pi$ model of figure 9. For this model, $f_{max} = (f_\tau/8\pi R_{bb}C_{cbi})^{1/2}$.
It should be emphasized that $C_{cbi}$ corresponds to no particular physical area in the
collector-base junction. Specifically, $C_{cbi}$ is not equal to $C_{cb,e}$, the capacitance of that
fraction of the collector junction which lies under the emitter. Instead, in this model
$R_{bb}$ is given by eqn. 14, $(C_{cbx} + C_{cbi}) = \epsilon A_c/T_c$, and the intrinsic collector-base
capacitance is set to $C_{cbi} = \tau_{cb}/R_{bb}$, where $\tau_{cb}$ is given by eqn. 19. Thus $C_{cbi}$ is
defined so that the simplified model predicts the correct device $f_{max}$. To
correctly model common-base and emitter-follower input impedance at $f \approx f_\tau$, the
transconductance element must have an associated delay of ~ $(\tau_c + \zeta\tau_b)$, where the
factor $\zeta \approx 0.1$-$0.2$ is dependent upon the degree of base bandgap grading.

2.5. High fmax HBT designs

To obtain simultaneously high values of $f_\tau$ and $f_{max}$, the emitter and collector stripe
widths must both be scaled. The substrate transfer process is an extremely aggressive
method of reducing the parasitic extrinsic collector-base junctions, and requires
a substantial departure from typical fabrication processes. There are alternatives
requiring less radical processing. With GaAs/AlGaAs HBTs [29], deep proton implantation
can reduce the extrinsic collector capacitance. The extrinsic collector
junction can be undercut using selective wet chemical etches (fig. 8) [30, 31]. Collector
capacitance under the base contact pad can be reduced using dielectric spacer
layers [32]. Alternatively, $R_{bb}$ can be reduced by regrowing, prior to base contact
deposition, thick extrinsic P+ contact regions on the exposed base surface [33, 34].
Finally, low $R_{bb}C_{cbi}$ can be obtained in mesa HBTs by reducing the size of the base
Ohmic contacts. Using a CBr4 doping source, we have grown by MBE InGaAs
base layers with $> 10^{20}/\mathrm{cm}^3$ carbon (P-type) doping. At such doping levels, $\rho_c$ and
hence the transfer length $L_{contact} = (\rho_c/\rho_s)^{1/2}$ are greatly reduced. The width of
the base Ohmic contacts can be accordingly reduced.

3. HBT Digital Integrated Circuits

The $f_\tau$ and $f_{max}$ of scaled InP-based HBTs are significantly higher than those of Si/SiGe HBTs.
Consequently, tuned and broadband amplifiers using InP-based HBTs show substantially
higher bandwidths than those implemented in Si/SiGe [36, 37, 38]. Yet, in
digital circuits the two competing technologies have held a rough parity for the past 3-4
years. Since analog/digital mixed-signal ICs (fiber optic transmission ICs, ADCs,
DACs) are major HBT applications, we must examine in detail the relationship
between logic gate delay and HBT design and scaling. The reader is also referred
to gate delay analyses by Sano et al. [49], and Enoki et al. [50]. General methods of
digital circuit delay analysis are discussed in Hodges and Jackson [51].
We compute below, as a function of HBT parameters, the maximum clock rate
of an ECL master-slave (M/S) latch. M/S latches serve as timing control elements
in digital ICs, as latched comparators in ADCs, and as decision circuits in fiber
optic receivers. To benchmark their maximum clock frequency, M/S latches are
configured as 2:1 static frequency dividers. It is important to distinguish
the maximum clock frequency of M/S latches configured as static dividers from
that of dynamic 2:1 frequency dividers, which operate at significantly higher clock
frequencies, but have more restricted applications.

3.1. Digital delay analysis

A schematic diagram of an ECL M/S latch is shown in fig. 10. The master latch has
input stage Q1-4 and latch Q5-8, while the slave latch has input stage Q13-16 and
latch Q17-20. The clock current is steered by Q9-12 and Q21-24. In our designs,
signals between gates are routed on the collector nodes, using 100 $\Omega$ transmission
lines terminated at sending and receiving ends in 100 $\Omega$.

Figure 10: ECL master-slave flip-flop. The current sources are implemented
with current mirrors. Except where marked, all resistors and transmission-line
impedances are 100 $\Omega$. Dotted lines indicate connections for a static 2:1 frequency
divider.
A logic voltage swing $\Delta V_L$ must be specified. Gate delay will vary with $\Delta V_L$,
but a minimum $\Delta V_L$ is necessary for adequate DC noise margin, hence proper logic
operation. In order for the differential pair Q3-4 to properly steer the current of
Q9, the difference in the internal $V_{be}$ of the two transistors should be several times
$kT/q$. As a first assumption, we set $\Delta V_{be,int} \approx 6kT/q$; this results in an $e^6$:1 ratio
between the currents in the on and off states. In the presence of parasitic emitter
resistance $R_{ex}$, the logic swing required for at least an $e^6$:1 current switching ratio
is

$$\Delta V_L > 6kT/q + I_0R_{ex} = 6kT/q + J_0\rho_e, \tag{21}$$

where $I_0$ is the switched current, $J_0$ the emitter current density and $\rho_e$ the emitter
resistance normalized to a unit emitter junction area.
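As a quick illustration of eqn. 21, the minimum swing can be evaluated for a range of current densities. The sketch assumes room temperature ($kT/q \approx 25.9$ mV) and takes $\rho_e = 50\ \Omega\text{-}\mu\mathrm{m}^2$ from section 2.1; the current densities and the function name are hypothetical.

```python
import math

KT_Q = 0.0259  # thermal voltage in V, assuming ~300 K


def min_logic_swing(j0_ma_per_um2, rho_e_ohm_um2, n_kt=6.0):
    """Eqn. 21: minimum logic swing in volts.

    j0 in mA/um^2, rho_e in ohm*um^2; n_kt is the internal V_be difference in units of kT/q.
    """
    return n_kt * KT_Q + j0_ma_per_um2 * 1e-3 * rho_e_ohm_um2


switching_ratio = math.exp(6.0)  # on/off current ratio for a 6 kT/q internal swing
print(f"current switching ratio: {switching_ratio:.0f}:1")

for j0 in (0.5, 1.0, 2.0):  # mA/um^2, illustrative operating points
    swing = min_logic_swing(j0, 50.0)
    print(f"J_0 = {j0:.1f} mA/um^2: minimum swing = {swing * 1e3:.0f} mV")
```

The resistive term grows linearly with $J_0$, so at high current densities it, rather than the $6kT/q$ term, sets the minimum swing.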
We compute, approximately, the gate delay using the charge control method,
adding the charging times of each node associated with the signal path. The node
charging time from the initial state to the (50%) switching point is $\Delta t \approx \Delta Q/2I$,
where $\Delta Q$ is the switched charge, and $I$ the charging current [51, 52]. This is
equivalent to analysis of a linearized version of the digital circuit, in which node
impedances are modeled by $R = \Delta V/\Delta I$, $C = \Delta Q/\Delta V$, and $g_m = \Delta I_c/\Delta V_{be}$ [51].
Gain effects varying to second order in $(j\omega)$ in the circuit transfer function are
neglected; this simplifying assumption introduces significant error by ignoring the
effect of emitter-follower ringing.

Figure 11: Simplified HBT common-base (T) equivalent circuit model used in the
logic delay analysis. Note that $\tau_b$ is modeled as a diffusion capacitance while $\tau_c$ is
modeled as a transport delay.

We assume a current density $J_0$ in the upper-level current-switch HBTs and a
current density $J_0/2$ for all emitter followers and for the lower-level clock-steering
current-switch HBTs. The upper-level differential current-switch transistors have
emitter areas $A_{e,cs}$, the lower-level (clock switching) current-switch transistors have
emitter areas $2A_{e,cs}$, and the emitter followers have emitter areas $A_{e,ef}$. The
currents that flow in these devices are therefore $I_0 = J_0A_{e,cs}$, $I_0 = (J_0/2)(2A_{e,cs})$,
and $I_{0,E} = (J_0/2)A_{e,ef}$ respectively. The base-emitter voltage in the on-state is
denoted as $V_{be,on}$. For simplicity, we assume a digital voltage swing $\Delta V_L$ at all
upper-level collector nodes, although it is known that decreased M/S latch delay
can be obtained by using smaller switched currents (hence smaller $\Delta V_L$) during
operation of the positive-feedback latch.
The large-signal base-emitter depletion capacitance is defined as $C_{je} = c_{je}A_e$,
where the average capacitance per unit emitter area is

$$c_{je} = \frac{\Delta Q}{\Delta V} = \frac{1}{\Delta V}\int_{V_{be,on}-\Delta V}^{V_{be,on}} c_{je}(V)\,dV. \tag{22}$$

The base-collector capacitance $C_{cb} = c_{cb}A_e$ is taken as proportional to the emitter
area (thus assuming a fixed emitter-collector area ratio). The collector-base junction
is fully depleted and operates at current densities below that causing base pushout.
In the hybrid-$\pi$ model (fig. 9), the large-signal base-emitter diffusion capacitance is
$C_{be,diff} = I_0(\tau_b + \tau_c)/\Delta V$. For common-base switching paths the T-model (fig. 11)
is employed; for that model, the base-emitter junction has a small-signal diffusion
capacitance $C_{b,diff} = g_m\tau_b$, while under large signal drive the capacitance becomes
$\tau_bI_c/\Delta V_L$.
Assume that the bases of Q1 and Q17 are at a logic high while the bases of Q2
and Q18 are at a logic low. At $t = 0$ the clock rises from low to high. The clock
differential pair Q10/Q12 changes state, establishing after a propagation delay a
collector current $I_0$ in Q10. The current $I_0$ then charges the capacitances at the
emitter node of Q3, driving the node negative until Q3 turns on (fig. 12a). After
a propagation delay through Q3, $I_0$ is established as a collector current for Q3.
The current $I_0$ then charges the capacitances at the collector node of Q3, driving
the node negative with a charging time resulting from the node capacitances (fig.
12b). Finally (fig. 12c), the emitter followers (Q5,6,13,14) charge/discharge the
base-emitter junctions of the master-stage latch current-steering pair (Q7,8) and the
slave-stage input current-steering pair (Q15,16), with delay arising both from the
emitter-followers and from the (Q7,8,15,16) base-emitter junction charging through
$R_{bb}$. Once this sequence is complete, the clock can change states and the sequence
repeats itself in the slave stage. Note that the delays associated with the clock
differential pairs occur both in the master and in the slave and therefore, to a first
approximation, do not affect the maximum clock frequency.
We first calculate the switching delay at the emitter of Q3 (fig. 12a). Q10 is
turned on at $t = 0$. Q10 and Q3 are in series, and have equal on-state and off-state
emitter currents. The logic voltage swings at the base-emitter junctions of Q10
and Q3 must therefore be identical, with $V_{be,on} - V_{be,off} = \Delta V_L = I_0R_L$ for both
transistors. Further, note that over the logic transition $C_{je3}$ sees a voltage swing of
$\Delta V_L - I_0R_{ex3} = I_0(R_L - R_{ex3})$, while $C_{cb10}$ sees a voltage swing of $2\Delta V_L$. There
is interconnect capacitance $C_{m1}$ at the node; further, Q3 and Q4 have substrate
capacitances $C_{s,3}$ and $C_{s,4}$. The node charging time is

$$\begin{aligned}
T_{Q3emitter} &= \Delta V_L\left(\frac{C_{s3} + C_{s4} + C_{m1} + 2C_{cb10} + C_{je4}}{2I_0}\right) + \left(\Delta V_L - I_0R_{ex,3}\right)\left(\frac{C_{je3}}{2I_0}\right) + \tau_b \\
T_{Q3emitter} &= \Delta V_L\left(\frac{2C_s/A_{e,cs} + C_{m1}/A_{e,cs} + 4c_{cb} + c_{je}}{2J_0}\right) + \left(\Delta V_L - J_0\rho_e\right)\left(\frac{c_{je}}{2J_0}\right) + \tau_b,
\end{aligned} \tag{23}$$

where the latter form is written using currents, capacitances and resistances normalized
to a unit HBT emitter junction area ($\rho_e = R_{ex}A_e$, $r_{bb} = R_{bb}A_e$).
Second, we calculate the delay between the emitter and collector of Q3 (fig. 12b).
Q3 operates in common-base mode, and its $g_m$ element has delay $\tau_c$. Capacitances
$C_{cb8}$ and $C_{cb3}$ undergo a $2\Delta V_L$ voltage swing; other capacitances undergo a swing
of $\Delta V_L$. Adding collector transmission-line bus delay $\tau_{bus}$ (fig. 10), the node delay
at the collector of Q3 is

$$T_{Q3coll} = \tau_c + \tau_{bus} + \Delta V_L\left(\frac{2C_{cb8} + 2C_{cb3} + C_{cb5} + C_{cb13}}{2I_0}\right)$$
Figure 12: Equivalent circuit of the nodes in the signal path for calculating the M/S
latch delay. Charging of the emitter node of Q3 (a). Charging of the collector node
of Q3 (b). Charging of the base of the switching transistor Q15 (c).

Table 1: Delay coefficients $a_{ij}$, found by hand analysis, assuming gate delay of form
$T_{gate} = 1/2f_{clock} = \sum a_{ij}r_ic_j$.

AVL/J0
Cje

1
Ccbx

6
Ccbi

6 1
Cs
fe TfJp
AVL
1
Th„,Jn
AVL
1
kT/qJo 0.5 1 1 0.5
- o r0 0.5 0
Pe -0.25 0.5 0.5 0.5 0 0.5 0
rbb 0.5 0 1 0 0 0.5 0

$$
\tau_{Q3,\mathrm{coll}} = \tau_c + \tau_{bus}
  + \Delta V_L \, \frac{(4 + 2A_{e,ef}/A_{e,cs})\, c_{cb}}{2 J_0} \tag{24}
$$

Finally, we calculate the delay between the voltage transition at the collector
of Q3 and the base (internal to R_bb) of Q15 (fig. 12c). In the figure, the emitter
follower is represented by a T-model and the current-steering device by a partial
hybrid-π model. The emitter followers Q6 and Q14, simultaneously undergoing a
negative-going transition, are explicitly assumed to remain on during the switching
event; this stipulates a minimum (A_e,ef/A_e,cs) area ratio, and always-on operation
must be verified during design. The bias current in Q7,8 and Q15,16 must be
examined carefully for this calculation. Under maximum-clock-rate operation, the
base voltages of Q7 and Q15 change states only slightly before an emitter current
is established in these transistors through the turn-on of Q12 and Q24 at the next
clock high-low transition. A hand calculation here can only be approximate; we will
take the emitter current of Q7 and Q15 to be I_0 during the base voltage transition.
The delay is

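$$
\tau_{Q13/15} = \frac{1}{2} \left( \frac{kT}{q I_{0,E}} + R_{ex13} \right)
  \left( C_s + 2C_{cb15} + C_{je15} + \frac{\tau_f I_0}{\Delta V_L} \right)
  + \frac{1}{2} R_{bb15} \left( C_{je15} + 2C_{cbi,15} + \frac{\tau_f I_0}{\Delta V_L} \right)
$$

$$
\tau_{Q13/15} = \frac{1}{2} \left( \frac{2kT}{q J_0} + \rho_e \right) \frac{A_{e,cs}}{A_{e,ef}}
  \left( \frac{C_s}{A_{e,cs}} + 2c_{cb} + c_{je} + \frac{\tau_f J_0}{\Delta V_L} \right)
  + \frac{1}{2} r_{bb} \left( c_{je} + 2c_{cbi} + \frac{\tau_f J_0}{\Delta V_L} \right) . \tag{25}
$$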

The total gate delay is then

$$
\frac{1}{2 f_{clock}} = \tau_{total} = \tau_{Q3,\mathrm{emitter}} + \tau_{Q3,\mathrm{coll}} + \tau_{Q13/15} \tag{26}
$$

and the maximum clock frequency f_clock is determined.
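The hand analysis of eqs. 23-26 can be sketched numerically. The following is a minimal sketch, not the authors' code: it assumes the area-normalized forms of the three node delays, reads the last term of eq. 23 as the base transit time τ_b, and takes τ_f = τ_b + τ_c. The class and function names, and the unit choices, are illustrative assumptions.

```python
from dataclasses import dataclass

KT_OVER_Q = 0.02585  # thermal voltage kT/q near 300 K, in volts


@dataclass
class NormalizedHbt:
    """Area-normalized HBT parameters; all values are supplied by the caller."""
    dv_l: float     # logic swing dV_L, V
    j0: float       # current-switch emitter current density J_0, A/cm^2
    rho_e: float    # emitter resistivity rho_e = R_ex*A_e, ohm*cm^2
    r_bb: float     # normalized base resistance r_bb = R_bb*A_e, ohm*cm^2
    c_je: float     # emitter junction capacitance, F/cm^2
    c_cb: float     # total collector-base capacitance, F/cm^2
    c_cbi: float    # intrinsic part of c_cb, F/cm^2
    c_s: float      # substrate capacitance C_s/A_e,cs, F/cm^2
    c_m1: float     # interconnect capacitance C_m1/A_e,cs, F/cm^2
    tau_b: float    # base transit time, s
    tau_c: float    # collector transit time, s
    tau_bus: float  # collector bus delay, s
    area_ratio: float = 2.0  # A_e,ef / A_e,cs


def tau_q3_emitter(p: NormalizedHbt) -> float:
    """Normalized form of eq. 23 (last term assumed to be tau_b)."""
    node = p.dv_l * (2 * p.c_s + p.c_m1 + 4 * p.c_cb + p.c_je) / (2 * p.j0)
    return node + (p.dv_l - p.j0 * p.rho_e) * p.c_je / (2 * p.j0) + p.tau_b


def tau_q3_collector(p: NormalizedHbt) -> float:
    """Normalized form of eq. 24."""
    return p.tau_c + p.tau_bus + p.dv_l * (4 + 2 * p.area_ratio) * p.c_cb / (2 * p.j0)


def tau_q13_15(p: NormalizedHbt) -> float:
    """Normalized form of eq. 25, with tau_f = tau_b + tau_c."""
    c_diff = (p.tau_b + p.tau_c) * p.j0 / p.dv_l
    follower = 0.5 * (2 * KT_OVER_Q / p.j0 + p.rho_e) / p.area_ratio * (
        p.c_s + 2 * p.c_cb + p.c_je + c_diff)
    base = 0.5 * p.r_bb * (p.c_je + 2 * p.c_cbi + c_diff)
    return follower + base


def max_clock_hz(p: NormalizedHbt) -> float:
    """Eq. 26: 1/(2 f_clock) equals the sum of the three node delays."""
    total = tau_q3_emitter(p) + tau_q3_collector(p) + tau_q13_15(p)
    return 1.0 / (2.0 * total)
```

With every parameter except c_cb set to zero and an area ratio of 2, the first two delays sum to 6 c_cb ΔV_L/J_0, which reproduces the coefficient of 6 listed for c_cbx and c_cbi in the hand-analysis table.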


Both hand analysis and SPICE simulations indicate that f_clock exhibits a broad
maximum as a function of the ratio of emitter-follower to current-switch emitter
areas, with A_e,ef/A_e,cs ≈ 2 being optimum. We subsequently assume this ratio.
Results of the hand calculations are summarized in table 1. Note that because
the logic swing ΔV_L is large in comparison with kT/q, terms in kT/qJ_0 will be
substantially smaller than terms in ΔV_L/J_0. To somewhat simplify the tabulations,
terms in kT/qJ_0 were combined in subsequent tables with those in ΔV_L/J_0. The
delays τ_f and τ_bus are written as effective capacitances (τJ_0/ΔV_L), in order to
represent the delay in the form T_gate = 1/(2 f_clock) = Σ a_ij r_i c_j.

Table 2: Delay coefficients a_ij, found by SPICE, assuming gate delay of the form
T_gate = 1/(2 f_clock) = Σ a_ij r_i c_j.

            c_je    c_cbx   c_cbi   c_s    [illegible]   τ_f J_0/ΔV_L   τ_bus J_0/ΔV_L
ΔV_L/J_0    0.8     4.3     4.3     1.9    0.7           1.6            1
ρ_e         -0.1    15      15      2.1    6             0.2            0
r_bb        0.7     2.2     5.2     0      0.1           0.7            0
A large set of SPICE simulations were performed of M/S latch maximum toggle
rate, using circuit models of µm-scale and submicron-emitter transferred-substrate
HBTs. To the extent that the gate delay can be approximated by first-order delay
terms, T_gate = 1/(2 f_clock) = Σ a_ij r_i c_j, the delay coefficients a_ij can be found by
varying the HBT model parameters in the simulations. Tables 2 and 3 show the
results of this analysis for the HBT of fig. 28. Given the many simplifications
involved in the hand analysis, the correlation between hand analysis and simulation
is reasonable, except that in hand analysis smaller coefficients are found for the
terms r_ex c_cbx, r_ex c_cbi, r_bb c_cbx, and r_bb c_cbi, terms which in the simulations
are found to collectively contribute 29% of the latch delay.
Tables 1, 2, and 3 provide important points regarding HBT design for fast
logic. In modern InP-based HBTs, τ_f is relatively small, and (for present UCSB
HBTs) contributes only ~20% of total gate delay, an amount comparable to the
delay contributed by C_je, and much smaller than the ~40% contributed by terms
associated with (c_cbx + c_cbi). Low base and collector transit times, hence high f_T,
are not in themselves indicative of high speed logic operation. This is because under
logic operation, the change in base+collector stored charge is

$$
\Delta Q_{b,c} = \left( \frac{\tau_f I_0}{\Delta V_L} \right) \Delta V_L = C_{ls} \Delta V_L \tag{27}
$$

while under small-signal operation

$$
\delta Q_{b,c} = g_m \tau_f \, \delta V_{be} = \left( \frac{\tau_f I_0}{kT/q} \right) \delta V_{be} = C_{ss} \, \delta V_{be} \tag{28}
$$

Under logic operation, the base-emitter diffusion capacitance associated with τ_f is
reduced in proportion to the ratio of logic swing ΔV_L to kT/q, a ratio of typically
10:1. In contrast, under logic operation the capacitances C_je and C_cb must be
provided with charge C_je ΔV_L and C_cb ΔV_L.
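The comparison between eqs. 27 and 28 can be checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the transit time, current and logic swing used in the example are illustrative values, not device data from the text.

```python
KT_OVER_Q = 0.02585  # thermal voltage kT/q near 300 K, in volts


def logic_diffusion_capacitance(tau_f: float, i0: float, dv_l: float) -> float:
    """Effective large-signal capacitance of eq. 27: tau_f * I_0 / dV_L."""
    return tau_f * i0 / dv_l


def small_signal_diffusion_capacitance(tau_f: float, i0: float) -> float:
    """Small-signal capacitance of eq. 28: g_m * tau_f, with g_m = I_0 / (kT/q)."""
    return tau_f * i0 / KT_OVER_Q


# Illustrative values only: tau_f = 0.5 ps, I_0 = 1 mA, dV_L = 300 mV.
c_logic = logic_diffusion_capacitance(0.5e-12, 1e-3, 0.3)
c_small = small_signal_diffusion_capacitance(0.5e-12, 1e-3)
ratio = c_small / c_logic  # equals dV_L / (kT/q), independent of tau_f and I_0
```

The ratio depends only on the logic swing, which is why a transistor with a given f_T can look better or worse in logic depending on how small ΔV_L can be made.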
Table 3: Delay components, found by SPICE, as a fraction of a total 4.9 ps latch
delay, for the HBT of fig. 28. All emitter followers and the lower-level current
switch devices operate at 10^5 A/cm^2 current density, with the upper-level current
switches operating at 2·10^5 A/cm^2. The logic swing is ΔV_L = 200 mV.

            c_je   c_cbx   c_cbi   c_s   [illegible]   τ_f J_0/ΔV_L   τ_bus J_0/ΔV_L   total
ΔV_L/J_0    6%     7%      5%      4%    1%            11%            10%              44%
ρ_e         0%     9%      7%      2%    3%            1%             0%               21%
r_bb        12%    1%      12%     0%    0%            10%            0%               35%
total       18%    16%     23%     6%    5%            22%            10%              100%

Examining the delay components in terms of real (r_bb, r_ex) and equivalent
(ΔV_L/J_0) resistances through which the depletion and diffusion capacitances are
charged, we are faced with a significant discrepancy between hand and computer
analyses. In either analysis, ΔV_L/J_0 is dominant. A key conclusion is that high
current densities are essential for fast HBT logic circuits. If the HBT is operated at
a current density limited by the Kirk effect (eqn. 6), then the delay terms associated
with charging the collector-base capacitance,
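$$
\frac{C_{cb} \Delta V_L}{I_0} = \frac{\epsilon A_c}{T_c} \frac{\Delta V_L}{I_0}
  = \frac{A_c}{A_e} \, \frac{\Delta V_L}{(V_{cb} + \phi)} \, \frac{T_c}{4 v_{sat}} \tag{29}
$$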
are minimized through use of thin collector layers. Delay associated with r_bb is
also significant. Finally, note that while simulations associate 22% of the net delay
with R_ex, this underestimates its effect; because adequate noise margin demands
ΔV_L > 6kT/q + J_0 ρ_e (eq. 21), reducing delay terms associated with ΔV_L/J_0 through
increased current density demands simultaneous improvements in ρ_e.
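The dependence of the collector-base charging time on collector thickness (eq. 29) can be illustrated as follows. This is a sketch only: the function name is hypothetical, and the saturation velocity and the value of V_cb + φ in the example are assumed for illustration rather than taken from the text.

```python
def kirk_limited_ccb_delay(dv_l: float, v_cb_plus_phi: float, t_c: float,
                           v_sat: float, area_ratio: float = 1.0) -> float:
    """Eq. 29: C_cb*dV_L/I_0 = (A_c/A_e) * dV_L/(V_cb + phi) * T_c/(4*v_sat).

    Lengths in meters, velocities in m/s, voltages in volts; returns seconds.
    """
    return area_ratio * dv_l / v_cb_plus_phi * t_c / (4.0 * v_sat)


# Assumed example: dV_L = 0.2 V, V_cb + phi = 1.0 V, v_sat = 3e5 m/s, A_c = A_e.
thick = kirk_limited_ccb_delay(0.2, 1.0, 3000e-10, 3e5)  # 3000 angstrom collector
thin = kirk_limited_ccb_delay(0.2, 1.0, 1500e-10, 3e5)   # 1500 angstrom collector
```

Because the Kirk-limited current density rises as 1/T_c^2 while C_cb rises only as 1/T_c, the charging time falls linearly with collector thickness when the device is biased at its current limit.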

3.2. Scaling for high speed logic


As examined in sections 2.2 and 2.3, lithographic scaling of the emitter and
collector junction widths progressively increases f_max if the parasitic collector-base
junction is eliminated. If the lithographic dimensions are scaled while holding the
base and collector epitaxial layer thicknesses constant, f_max increases rapidly while
f_T remains relatively constant. While such a device will produce gain at very high
frequencies in reactively-tuned MIMICs, broadband analog circuits require
simultaneous high values of f_T and f_max.
In analyzing HBT logic speed (section 3.1), it is found that ~10-15 equivalent
RC delay terms are significant. In order to improve logic speed, all significant HBT
capacitances and transit delays must be reduced. We now examine the scaling of
HBT parameters required to increase bandwidth by a factor of γ:1, using simplified
expressions for HBT parameters in order to more clearly show the dominant trends.
To ensure that bandwidth increases by γ:1 for all circuits, digital and analog, using
the scaled HBT, all transit times and all capacitances in figure 9 must be reduced
by γ:1, while maintaining constant all resistances, the transconductance, and the
collector bias current I_c. Explicitly, I_c ∝ γ^0 and g_m ∝ γ^0.
The base-emitter diffusion capacitance is

$$
C_{be,diff} = g_m (\tau_b + \tau_c) = (q I_c / kT)(\kappa_2 T_b^2 + \kappa_3 T_c) . \tag{30}
$$

Here the terms κ_i represent parameters which do not change with scaling. To obtain
C_be,diff ∝ γ^-1 with fixed I_c, we must set τ_b ∝ γ^-1 and τ_c ∝ γ^-1. This requires
T_b ∝ γ^-1/2 and T_c ∝ γ^-1.
An immediately apparent limit to collector scaling is loss of collector breakdown
voltage. An AlInAs/GaInAs HBT with a 0.2 µm InGaAs collector thickness
exhibits V_br,ceo ≈ 1.5 V at 10^5 A/cm^2 bias. Semiconductors with higher products
(E_max v_sat) of breakdown field and electron velocity mitigate this limit; HBTs with
InP collectors [35] exhibit τ_c comparable to devices with InGaAs collectors but have
~5:1 increased breakdown. Regardless of the collector thickness, impact ionization
cannot occur for V_ce less than the bandgap of the collector semiconductor.
Further, unless the collector bandgap is small or the collector much thinner than 1000
Å, Zener tunneling currents will also be small for bias voltages below the collector
bandgap energy. Even with 1000-Å collector layers, an InP/GaInAs/InP DHBT
will exhibit V_br,ceo > 1.2 V, sufficient for current-mode logic. While important
in power amplifiers and in mixed-signal (medium-voltage) ICs, loss of breakdown
voltage may not pose a serious limit to the scaling of InP-collector DHBTs for
low-voltage, high-speed logic.
The capacitance C_je is given by

$$
C_{je} = C_{je1} + C_{je2} = \kappa_4 L_e W_e / T_{eb} + \kappa_5 T_{eb} T_b I_c . \tag{31}
$$

Analysis of the partitioning of C_cb between C_cb,x and C_cb,i is complex (section
2.3), and in this section we therefore restrict the analysis to HBTs in which C_cb,x
is zero (L_c ≈ L_e and W_c ≈ W_e) and C_cb,i = C_cb. Such HBTs include
transferred-substrate (figure 4) and undercut-mesa devices (figure 8), and mesa devices
having very high base doping and hence requiring only a very small base Ohmic contact
width. C_cb then scales as

$$
C_{cb} = \epsilon W_c L_c / T_c \approx \epsilon W_e L_e / T_c . \tag{32}
$$

Because T_c ∝ γ^-1, to obtain C_cb ∝ γ^-1 we must set W_c L_c ∝ γ^-2 and hence
W_e L_e ∝ γ^-2.
The base resistance R_bb is the sum of the terms (eqn. 14) R_b,cont, R_gap and
R_spread. Correct scaling of C_cb requires that W_e L_e ∝ γ^-2. It is desired that R_bb vary
negligibly with scaling; we show here that this is obtained by setting W_e ∝ W_c ∝ γ^-2
and L_e ≈ L_c ∝ γ^0. The base contact resistance term R_b,cont = κ_6/(L_e √T_b) is
proportional to γ^1/4, while R_spread = κ_7 W_e/(L_e T_b) ∝ γ^-3/2. If we scale W_eb ∝ γ^-1,
then R_gap = κ_8 W_eb/(L_e T_b) ∝ γ^-1/2. While the contact resistance term R_b,cont, the
dominant term in R_bb for submicron devices, increases (∝ γ^1/4) slowly with scaling,
the rapid decrease in R_gap and R_spread results in a total R_bb showing only a very
slow increase with scaling.
To obtain C_je2 ∝ γ^-1 we must set T_eb ∝ γ^-1/2. This results in C_je1 ∝ γ^-3/2,
improving more rapidly than required for a γ:1 scaling in transistor bandwidth.
The collector series resistance R_c is zero in transferred-substrate HBTs using
Schottky collector contacts. In undercut-mesa devices, R_c has a geometric
dependence similar to that of R_bb, and also varies only minimally with scaling.
Scaling thus requires that the emitter and collector stripe widths W_e and W_c
be proportional to γ^-2, and that the emitter and collector stripe lengths L_e and
L_c be independent of scaling. Because the collector current is constant (I_c ∝ γ^0),
the emitter current density increases quadratically with the desired improvement
in transistor bandwidth (J_e ∝ γ^2), as does the transistor's operating power density
(P/A_e = J_e V_ce ∝ γ^2). Limits to bias current density imposed by reliability
concerns and dissipated power density are thus major impediments to scaling for high
bandwidth.
The emitter resistance R_ex = ρ_e/(W_e L_e) presents a major impediment to scaling.
With W_e L_e ∝ γ^-2, in order to maintain the desired constant R_ex the aggregate
emitter resistivity ρ_e ∝ γ^-2 must improve in proportion to the square of the
intended improvement in HBT bandwidth. This will require substantial increases in
emitter doping over those now typically used in HBTs, and use of low-resistivity
(e.g. InAs) semiconductor contact layers.
The collector-emitter resistance is R_ce = V_A/I_c, where the Early voltage is V_A =
qN_a T_b T_c/ε and N_a is the base doping. From these relationships R_ce ∝ γ^-3/2, and
does not scale as desired. Fortunately, for an HBT with T_b = 300 Å, T_c = 0.2 µm,
and N_a = 5·10^19/cm^3 (a device with 275 GHz f_T), V_A ≈ 500 V. A γ = 10:1
scaling for a target 2750 GHz f_T would still result in V_A = 16 V, which is
acceptably large. In HBTs, degradation of R_ce through base-width modulation is
not a significant impediment to scaling.
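The 16 V figure follows directly from the γ^-3/2 dependence of V_A. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def scaled_early_voltage(v_a: float, gamma: float) -> float:
    """V_A = q*N_a*T_b*T_c/eps; with T_b ~ gamma^-1/2 and T_c ~ gamma^-1 at fixed
    base doping, V_A scales as gamma^-3/2."""
    return v_a * gamma ** -1.5


v_a_scaled = scaled_early_voltage(500.0, 10.0)  # roughly 15.8 V, quoted as 16 V
```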
In scaling the device, we have set W_e ∝ γ^-2 and L_e ∝ γ^0. If all other widths
and lengths in the device layout are scaled in the same proportions, then the HBT
area, and the area of a given circuit, are proportional to γ^-2. The average wire
length within the circuit is proportional to the square root of the IC area, and
hence is proportional to γ^-1. Wiring delays, whether transmission-line delays or
C_wire ΔV/ΔI charging times, thus also scale correctly. Because of the fixed bulk
metal resistivity, interconnect parasitic series resistance does not scale correctly,
increasing as γ^2.
In scaled HBTs, base current is dominated by surface recombination and by
currents conducted on the surface between the base-emitter junction and the base
Ohmic contact. Consequently, I_b ∝ n(T_eb) L_e. Because I_c ∝ L_e W_e n(T_eb)/T_b,
β ∝ W_e/T_b. With the scaling laws above, β ∝ γ^-3/2. Current gain decreases
rapidly with scaling, and reduction of surface recombination and surface conduction
is critical in deep submicron devices.
Finally, we reconsider scaling of the mesa HBT. For mesa HBTs, base and
collector thickness, emitter and collector junction widths, emitter contact
resistivity, and current density must all scale as discussed above for undercut-mesa and
transferred-substrate HBTs. In particular, the base-collector junction width must
still scale as γ^-2. For a normal triple-mesa device, this then requires that the
widths W_b of the base Ohmic contacts (fig. 1) scale as γ^-2, while maintaining a
fixed R_b,cont = (ρ_s ρ_c)^1/2 (1/2L_e) coth(W_b/L_contact). This can be accomplished by a
combined reduction of both ρ_s and ρ_c, and hence a general analysis is exceedingly
complex. As a limiting case, with a highly scaled HBT, W_c must be very small,
and hence W_b will be much less than L_contact. In this case R_b,cont ≈ ρ_c/(2 L_e W_b),
and hence constant R_b,cont requires that the base Ohmic contact resistivity scale
as ρ_c ∝ γ^-2. Transferred-substrate and narrow-mesa HBTs do not require this
improvement of base contact resistivity with scaling.

Table 4: Scaling laws for HBTs; required proportional change in key relevant HBT
physical parameters in order to obtain a γ:1 increase in bandwidth in an arbitrary
circuit. Additionally, for mesa HBTs, but not transferred-substrate or undercut-mesa
devices, the base contact resistivity ρ_c must scale as γ^-2.

parameter                              symbol              scaling law
collector depletion layer thickness    T_c                 γ^-1
base epitaxial layer thickness         T_b                 γ^-1/2
emitter-base junction width            W_e                 γ^-2
collector-base junction width          W_c                 γ^-2
emitter-base depletion thickness       T_eb                γ^-1/2
emitter parasitic resistivity          ρ_e = R_ex A_e      γ^-2
emitter junction area                  A_e = W_e L_e       γ^-2
emitter current                        I_e                 γ^0
emitter current density                J_e                 γ^2
bias and signal voltages               V_CE, v_ce, v_be    γ^0
average interconnect length            L_wire              γ^-1
circuit area                           -                   γ^-2
device power density                   -                   γ^2
circuit power density                  -                   γ^2

To simultaneously increase HBT bandwidth in general circuits by γ:1, emitter
and collector junction widths must vary as γ^-2 while maintaining constant junction
lengths. Base thickness must vary as γ^-1/2 and collector thickness as γ^-1. Emitter
current density and transistor and IC power density all increase in proportion to
γ^2. The emitter contact structure must improve in proportion to γ^2. Power
dissipation, reliability under high-current operation, required improvements in surface
recombination velocity, and the required quality of the emitter Ohmic contact are
the most significant impediments to scaling. These relationships are summarized in
table 4.
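The bookkeeping behind these scaling laws can be checked mechanically by adding exponents of γ. The sketch below is an illustration, not part of the original analysis: the dictionary layout and function names are assumptions, the expressions for each derived quantity follow eqs. 31 and 32 and the text of this section, and L_c is taken equal to L_e.

```python
from fractions import Fraction

# Exponent n in (parameter proportional to gamma**n) for the primary parameters.
SCALING_EXPONENTS = {
    "T_c": Fraction(-1), "T_b": Fraction(-1, 2), "T_eb": Fraction(-1, 2),
    "W_e": Fraction(-2), "W_c": Fraction(-2), "L_e": Fraction(0),
    "rho_e": Fraction(-2), "I_c": Fraction(0),
}


def derived_exponents(e: dict) -> dict:
    """Exponents of quantities built from the primary parameters (L_c = L_e assumed)."""
    area = e["W_e"] + e["L_e"]
    return {
        "A_e": area,
        "J_e": e["I_c"] - area,                    # I_c / A_e
        "C_cb": e["W_c"] + e["L_e"] - e["T_c"],    # eps * W_c * L_c / T_c
        "C_je1": area - e["T_eb"],                 # kappa_4 * L_e * W_e / T_eb
        "C_je2": e["T_eb"] + e["T_b"] + e["I_c"],  # kappa_5 * T_eb * T_b * I_c
        "R_ex": e["rho_e"] - area,                 # rho_e / (W_e * L_e)
        "beta": e["W_e"] - e["T_b"],               # W_e / T_b
    }


def scale(value: float, exponent: Fraction, gamma: float) -> float:
    """Apply a gamma**exponent scaling law to a starting value."""
    return value * gamma ** float(exponent)


derived = derived_exponents(SCALING_EXPONENTS)
```

Under these assumptions the capacitances C_cb and C_je2 fall as γ^-1, R_ex stays constant, and the current gain β falls as γ^-3/2, matching the statements in the text.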

Table 5: SPICE simulation results of flip-flop clock speed as a function of transistor
design. Interconnect capacitance and delay are not considered. An asterisk marks a
parameter changed from the previous design.

Emitter               Collector                                      Base                  Clock
width   parasitic     width   thickness   current     material      thickness   doping
        resistance                        density
(µm)    (Ω·µm^2)      (µm)    (Å)         (A/cm^2)                  (Å)         (cm^-3)    (GHz)
1.2     50            1.8     3000        1.0E+05     InGaAs        400         4E19 Be    115
0.7*    50            1.5*    3000        1.0E+05     InGaAs        400         4E19 Be    125
0.7     50            1.5     3000        1.0E+05     InGaAs        300*        4E19 Be    128
0.7     50            0.8*    3000        1.0E+05     InGaAs        300         4E19 Be    159
0.35*   50            0.45*   3000        1.0E+05     InGaAs        300         4E19 Be    176
0.35    50            0.45    3000        1.0E+05     InGaAs        300         1E20 C*    182
0.35    25*           0.45    2120*       2.0E+05*    InP*          300         1E20 C     250
0.35    12.5*         0.45    1500*       4.0E+05*    InP           300         1E20 C     285

3.3. Design projections for > 200 GHz logic


Following the design rules above, a scaling study of high speed M/S latches was
pursued. Based upon measured parameters of tested HBTs, an HBT SPICE model
was developed in which model elements (depletion capacitances, contact resistances,
and carrier transit times) were calculated as a function of lithographic dimensions
and layer thicknesses. ECL master-slave flip-flops were then simulated for maximum
clock frequency. The results (table 5) start with the HBT design of fig. 28, and
show progressive increases in clock rate as the emitter and collector stripe widths
are reduced, base and collector layers thinned, the current density increased, and
the emitter contact resistivity reduced. Thin collector layers are here required not
primarily for low τ_c, but primarily so as to increase (eq. 6) the current density at
base pushout, and hence decrease C_cb ΔV_L/I_0 (eq. 29).
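The last three designs in table 5 can be compared against the scaling laws of section 3.2. The sketch below, with hypothetical names, back-computes the γ implied by each changed parameter between successive rows, assuming T_c ∝ γ^-1, J ∝ γ^2 and ρ_e ∝ γ^-2; the junction widths are held fixed in these rows, so only these three parameters are examined.

```python
import math

# (rho_e [ohm*um^2], T_c [angstrom], J [A/cm^2]) for the last three rows of table 5.
ROWS = [(50.0, 3000.0, 1.0e5), (25.0, 2120.0, 2.0e5), (12.5, 1500.0, 4.0e5)]


def implied_gamma(prev: tuple, new: tuple) -> tuple:
    """Gamma implied by T_c ~ 1/gamma, J ~ gamma^2 and rho_e ~ 1/gamma^2."""
    from_t_c = prev[1] / new[1]
    from_j = math.sqrt(new[2] / prev[2])
    from_rho_e = math.sqrt(prev[0] / new[0])
    return (from_t_c, from_j, from_rho_e)


steps = [implied_gamma(a, b) for a, b in zip(ROWS, ROWS[1:])]
```

Each step corresponds to γ close to √2 for all three parameters, so these rows appear to apply the collector-thickness, current-density and emitter-resistivity laws together.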

4. Transferred-substrate HBTs

Wide HBT bandwidths are obtained by scaling. In scaling for high f_T, significant
limits include high power density and high current density, demands for very low
emitter parasitic resistance, and the collapse of f_max due to the extrinsic
collector-base junction. Using substrate transfer processes, this extrinsic junction can be
reduced in size or eliminated. This permits aggressive lithographic scaling
without epitaxial scaling, for greatly increased f_max at constant f_T. Alternatively, if
high values of both f_T and f_max are sought, simultaneous lithographic and epitaxial
scaling is required; with the extrinsic C_cb eliminated, operation at high current
density and reduction of the emitter resistance are the key requirements for further
scaling.

Figure 13: Band diagram, under bias, of a typical device. (Horizontal axis: distance, Å;
labeled regions: emitter, collector depletion region, Schottky collector.)

4.1. Growth and fabrication

The epitaxial layer structure is described by its band diagram (fig. 13). The
InGaAs base is typically 300-400 Å thick, has 2kT bandgap grading, and is Be-doped
at 5·10^19/cm^3. The InGaAs collector is 2000-3000 Å thick. A collector N+
pulse-doped layer placed 400 Å from the base delays the onset of base push-out
at high collector current densities. Although such pulse-doped layers have been
used as electron launchers [39] in GaAs-based HBTs, our experimental data show
no significant effect of the launcher upon τ_c for InGaAs-collector HBTs.
Devices typically use Schottky collector contacts [40], although HBTs with N+
subcollector layers (Ohmic-collector devices) have also been fabricated. While
Ohmic-collector devices have non-zero collector series resistance, hence lower f_max
[24], the 0.2 V barrier present in the Schottky-collector device increases the V_ce
required to suppress base push-out at high current densities. Ohmic-collector devices
thus show higher f_max under the low-V_ce conditions associated with current-mode
logic (CML). Schottky-collector devices are used for emitter-coupled logic (ECL),
where the operating V_ce is higher.

Figure 14: Transferred-substrate HBT process flow. (Panels: 1) Normal emitter, base
processes; deposit silicon nitride insulator. 2) Coat with BCB polymer; etch vias.)
Figure 14 shows the process flow. Standard fabrication processes [42] define the
emitter-base junction, the base mesa, polyimide planarization, and the emitter
contacts. The substrate transfer process commences with deposition of the PECVD
Si3N4 insulator layer and the benzocyclobutene (BCB) transmission-line dielectric
(5 µm thickness). Thermal and electrical vias are etched in the BCB. The wafer
is electroplated to metallize the vias and to form the ground plane. The wafer is
then solder-bonded to a GaAs carrier substrate. The InP substrate is removed in
HCl and Schottky collectors are deposited, completing the process. Fig. 15 shows
a detailed device cross section.
For the emitter-base junction, deep submicron scaling requires tight control of
lateral undercutting during the base contact recess etch. To form the emitter,
reactive-ion etching in CH4/H2/Ar, monitored with a HeNe laser, first removes
the N+ GaInAs emitter contact layer. A HCl/HBr/acetic selective wet etch then
removes the AlInAs emitter, stopping on the AlInAs/GaInAs emitter-base grade.
By etching at 10 °C, the etch rate is slowed, and a controlled emitter undercut is
formed. The undercut both narrows the emitter and serves (as normal) to define
the liftoff edge in the self-aligned base contact deposition. A timed nonselective wet
citric/H3PO4/H2O2 etch then removes the base-emitter grade. Etch selectivity
in both the RIE and HCl/HBr/acetic etches aids in etch-depth control, and we
are able to reproducibly etch ~100 Å into the base without use of surface contact
resistance probing as a process monitor. Figure 16 shows the cross-section of a
0.15-µm emitter-base junction.

Figure 15: Schematic cross-section of a transferred-substrate HBT. (Labeled elements:
Schottky collector contact, BCB, gold thermal via, gold ground plane, solder bond,
GaAs substrate; legend: emitter, base, collector, polyimide, metal, Si3N4.)

Figure 16: Cross-section of emitter-base junction. The 0.5 µm emitter metal was
defined with a projection lithography system.

Figure 17: Transferred-substrate HBT defined by contact lithography.

Figure 18: E-beam HBT: test structure with 0.15 µm emitter-base junction (a),
and 0.4 µm Schottky collector stripe (b).
In defining submicron collector-base junctions, use of the Schottky-collector
contact eliminates the need for an etch of similar precision through an N+ collector
Ohmic contact layer. The collector junction is defined by the stripe width of the
deposited metal. Subsequent to collector deposition, a self-aligned wet etch of ~1000
Å depth removes the collector junction sidewalls (eliminating fringing fields) and
reduces the collector junction width by ~2000 Å. The step, intended to reduce
C_cb, generally provides a greater increase in f_max than would be expected from the
observed reduction in collector junction width.
Given the unusual features of the substrate transfer process, IC yield is a
significant concern. The transistors and ICs reported here have all been developed by a
team whose average size over time is approximately 12 Ph.D. students, working
in a university cleanroom, and responsible for all aspects of technology,
including crystal growth, processing, IC design, and testing. It is therefore difficult to
separate yield difficulties inherent to the substrate transfer process from yield
difficulties associated with the limited manpower available to address process control and
the limited quality of university cleanroom equipment. Process failures do result
from failure of the substrate transfer steps (failure of solder adhesion, and failure,
for unknown causes, of the substrate removal selective wet etch), but process
failures arise equally in HBT fabrication steps unrelated to substrate transfer.
Significant among these are excessive undercut in the emitter-base junction etch,
failure of the emitter-base RIE or selective wet etches, emitter-base short-circuits
forming during base contact liftoff, liftoff failures in interconnect metals, poor
adhesion of resistor metal, and variation of resistor sheet resistivity. Given the resources
available to a larger industrial group, various process difficulties, whether
associated with or independent of substrate transfer, could be addressed. We believe the
most serious fundamental difficulties are with the solder bonding and with the small
wafer expansion after bonding (below), which most probably results from
mechanical creep of the solder under exposure to stress and temperature cycles. Solder
bonding also is presently limited to small wafer sizes (quarters of 50 mm wafers).
More dimensionally stable alternatives, possibly spin-on glasses, should be found
for both the solder and the BCB dielectric.
Presently the largest working ICs fabricated in the process are 150-HBT ADCs
and 250-HBT binary adders. The most significant process difficulty is dimensional
change of the wafer during substrate transfer. Presently wafers show 3·10^-4
fractional expansion after transfer, resulting in ±0.5 µm misregistration (during
collector lithography) at the edges of the stepper exposure field if a 3 mm reticle
is employed. We presently adjust the dimensions of the collector mask as a
correction. At the expense of increased effort during collector lithography, a smaller
exposure reticle size can be used for the collector lithography than for the steps
preceding substrate transfer. The relative sizes of the emitter and collector junctions
are determined by lithographic alignment tolerances, and the collector stripe width
must exceed the emitter stripe width by twice the lithographic alignment tolerance.
Our electron-beam lithography system can align to 0.1 µm registration, and our
projection lithography system aligns to 0.1-0.3 µm registration, depending on the
time since maintenance. Modern projection lithography systems are much better:
0.35-µm-resolution steppers have ~300 Å registration tolerance.
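The misregistration and alignment constraints above reduce to simple arithmetic. The sketch below uses hypothetical function names and assumes that the exposure field is aligned at its center, so that the largest displacement occurs half a field width away.

```python
def edge_misregistration_um(fractional_expansion: float, field_size_mm: float) -> float:
    """Displacement at the field edge relative to the field center, in micrometers."""
    return fractional_expansion * (field_size_mm / 2.0) * 1000.0


def min_collector_width_um(emitter_width_um: float, alignment_tolerance_um: float) -> float:
    """Collector stripe must exceed the emitter stripe by twice the alignment tolerance."""
    return emitter_width_um + 2.0 * alignment_tolerance_um


edge_shift = edge_misregistration_um(3e-4, 3.0)    # about 0.45 um, quoted as +/-0.5 um
collector_min = min_collector_width_um(0.4, 0.1)   # 0.4 um emitter, 0.1 um e-beam alignment
```

Halving the collector-lithography field size halves the edge displacement, which is the trade described in the text between exposure effort and registration.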

4.2. Device results

Transferred-substrate HBTs have been fabricated using contact lithography at 1-2
µm resolution, using a 0.5 µm stepper, and using electron-beam lithography. Fig.
17 shows a device defined by optical lithography [41]. Figure 18 shows HBT
emitter-base and collector-base junctions defined by electron-beam lithography.

Figure 19: Gains of a 0.4 µm × 6 µm emitter and 0.7 µm × 10 µm collector HBT
fabricated using electron-beam lithography. Theoretical -20 dB/decade (h21, U)
gain slopes are indicated. The device exhibits an extrapolated 1.08 THz f_max.
(Plot annotations: Mason's gain U, MSG; f_max = 1080 GHz, f_T = 204 GHz;
I_c = 6 mA, V_ce = 1.2 V; horizontal axis: frequency, GHz.)

Figure 20: Variation of transistor gains with frequency, computed from a hybrid-π
HBT model. Shown are the maximum available / maximum stable gains
(MAG/MSG) in common-emitter, common-base, and common-collector mode, and
Mason's invariant, U, the unilateral gain.

Figure 19 shows microwave gains for a deep submicron device fabricated using
electron-beam lithography, reported by Lee et al. [43]. The base and collector layers
are 400 Å and 3000 Å thick, while the emitter and collector junction dimensions
are 0.4 µm × 6 µm and 0.7 µm × 10 µm. Biased at V_ce = 1.2 V and I_c = 6 mA
(J_e = 2.5 × 10^5 A/cm^2), the device exhibits 204 GHz f_T. If extrapolated at -20
dB/decade, a 1080 GHz f_max is determined. We note, however, that such a 10:1
extrapolation must be treated with considerable caution.
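The sensitivity of such an extrapolation can be illustrated with a short sketch. The function names are hypothetical, and the code assumes only that U falls at exactly -20 dB/decade from the highest measured frequency to the 0 dB crossing.

```python
import math


def extrapolate_fmax(f_meas_hz: float, u_db: float) -> float:
    """Frequency at which U reaches 0 dB, assuming a -20 dB/decade slope from f_meas."""
    return f_meas_hz * 10.0 ** (u_db / 20.0)


def gain_needed_db(f_meas_hz: float, f_max_hz: float) -> float:
    """Unilateral gain at f_meas that extrapolates to a given f_max."""
    return 20.0 * math.log10(f_max_hz / f_meas_hz)


u_at_110 = gain_needed_db(110e9, 1080e9)                # a little under 20 dB
spread_db = 20.0 * math.log10(1.3e12 / 1.0e12)          # gain error spanning 1.0 to 1.3 THz
```

Under this assumption, a change of only about 2.3 dB in the measured gain at 110 GHz moves the extrapolated f_max across the whole 1.0 to 1.3 THz range, which is why the extrapolated value calls for caution.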
We have extrapolated Mason's invariant (unilateral) gain at -20 dB/decade to
determine the extrapolated f_max. Mason's gain [46] is invariant with respect to
embedding the device in a lossless reciprocal network, and consequently is independent
of pad inductive or capacitive parasitics and independent of the transistor
configuration (common-emitter vs. common-base). For HBTs well-modeled by a hybrid-π
equivalent circuit, Mason's gain conforms closely to a -20 dB/decade variation with
frequency (fig. 20). In marked contrast, the maximum available / maximum stable
gain is a function of the transistor configuration, and shows no fixed variation with
frequency. f_max is unique; at f = f_max the MAG/MSG and U are both 0 dB.
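For reference, Mason's unilateral gain can be computed from two-port admittance parameters with the standard expression U = |y21 − y12|^2 / (4 [Re(y11) Re(y22) − Re(y12) Re(y21)]). The sketch below is illustrative; the function names and the example admittances are assumptions, not measured data.

```python
import math


def mason_u(y11: complex, y12: complex, y21: complex, y22: complex) -> float:
    """Mason's unilateral power gain from the Y-parameters at one frequency."""
    numerator = abs(y21 - y12) ** 2
    denominator = 4.0 * (y11.real * y22.real - y12.real * y21.real)
    return numerator / denominator


def to_db(power_gain: float) -> float:
    """Convert a power gain ratio to decibels."""
    return 10.0 * math.log10(power_gain)


# Illustrative two-port: adding purely reactive shunt elements at input and output
# changes Im(y11) and Im(y22) only, and leaves U unchanged.
u_bare = mason_u(1 + 0j, 0j, 4 + 0j, 1 + 0j)
u_with_pads = mason_u(1 + 5j, 0j, 4 + 0j, 1 - 3j)
```

The unchanged result in the second case is a small instance of the invariance described above: lossless shunt pad parasitics do not alter U.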
Device gains were measured over 45 MHz-50 GHz and 75-110 GHz using a
microwave network analyzer and microwave wafer probes. To avoid uncorrectable
measurement errors (in S12, hence U) arising from variable probe-probe
electromagnetic coupling, the HBTs are separated from their probe pads by long on-wafer
50 Ω microstrip lines. On-wafer line-reflect-line calibration standards are used to
de-embed the transistor S-parameters [44]. Before extracting HBT power gains to
extrapolate f_T and f_max, it is essential to verify the on-wafer calibration through
measurement of known standards, to verify that the probe-probe parasitic coupling
(as measured from the S12 of an on-wafer open-circuit standard) is at least 15-20 dB
smaller than the measured transistor S12, and to ensure that the transistor's
measured S-parameters have a variation with frequency which conforms closely to that
of a hybrid-π model. In the 75-110 GHz band, with high-f_max (hence very low S12)
HBTs, we have found that these requirements cannot be met using
commercially-provided calibration substrates or with probe pads immediately adjacent to the
transistor under test. The on-wafer LRM method is required, and the probe-probe
separation must be at least 500 µm for all calibration test structures and for the
device under test. In addition to the 10:1 extrapolation to 1.08 THz f_max, the very
high power gain at 110 GHz also results in significant measurement variability, with
repeated calibrations at the same bias point giving extrapolated f_max varying from
1.0 to 1.3 THz.
We have recently acquired a 140-220 GHz network analyzer with on-wafer probes,
and are now developing methods to obtain precision HBT measurements in this
band. Preliminary HBT measurements on a recently-processed submicron HBT
wafer indicate ~10 dB unilateral power gain and maximum stable gain at 200
GHz (the device is potentially unstable even at this high frequency) [36]. We have
also recently demonstrated single-stage tuned HBT amplifiers at 185 GHz [36]; this
indicates significant HBT gain at 200 GHz. Given current measurement data, the
1.1 THz extrapolated f_max is presently best viewed simply as an extremely high
measured power gain at 100 GHz.
C_cb cancellation contributes substantially to the f_max obtained. At zero
current, C_cb,e = εA_e/T_c = 0.9 fF. The measured variation of f_T vs. V_ce (fig. 21)
indicates dτ_c/dV_ce ≈ 0.18 ps/V, predicting ~0.9 fF reduction in C_cb,e from I_c = 1
mA to I_c = 6 mA. The total collector-base capacitance C_cb is determined from the
measured variation with frequency of the imaginary part of the admittance
parameter Y_12, using |Im[Y_12]| = ωC_cb. The total C_cb determined from Y_12 (fig. 22)
shows a 0.64 fF decrease between 1 mA and 6 mA I_c. The measured variation in the
total C_cb primarily reflects variation in the capacitance C_cb,e. The reduction of
C_cb,e with bias current results in a rapid increase in f_max with bias (fig. 23).
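The quoted 0.9 fF prediction is consistent with multiplying dτ_c/dV_ce by the change in collector current. The sketch below shows that arithmetic together with the extraction of C_cb from the magnitude of Im[Y_12]; the function names are hypothetical, and the Y_12 value in the usage line is a made-up test point, not measured data.

```python
import math


def ccb_reduction_farads(dtau_c_dvce: float, delta_ic: float) -> float:
    """Estimated drop in C_cb,e: (d tau_c / d V_ce) * delta I_c, in farads."""
    return dtau_c_dvce * delta_ic


def ccb_from_y12(y12: complex, freq_hz: float) -> float:
    """Collector-base capacitance from |Im[Y12]| = omega * C_cb."""
    return abs(y12.imag) / (2.0 * math.pi * freq_hz)


predicted_drop = ccb_reduction_farads(0.18e-12, 6e-3 - 1e-3)  # 0.9 fF for 1 mA -> 6 mA

# Made-up check: a 2 fF capacitance at 10 GHz gives Y12 = -j*omega*C in a hybrid-pi model.
y12_example = complex(0.0, -2.0 * math.pi * 10e9 * 2e-15)
c_example = ccb_from_y12(y12_example, 10e9)
```

The predicted 0.9 fF change is somewhat larger than the 0.64 fF change extracted from Y_12, which is the comparison made in the text.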
Figure 24 shows the small-signal hybrid-7r model. The measured S-parameters
(fig. 25), /i2li and U, show good correlation with the hybrid-7r model, and the model
parameters are consistent with measured bulk and sheet resistivities and junction
capacitances. The HBT output conductance is dominated by Rcb, which represents
variation of collector-base leakage with bias. This is likely due to impact ionization.
Base-width modulation in HBTs is negligible, hence Rce is very large. Cbe,poiy is a
metal-polyimide-metal overlap capacitance between the emitter and base contacts
(fig. 15) which contributes an additional Cbe,poiy{Rex + kT/qIc) = 60 fs to the
Figure 21: Variation of fT and fmax with collector-emitter voltage
(J = 2.5 x 10^5 A/cm^2).

Figure 22: Collector-base capacitance extracted from Y12, vs. emitter current.
The measured capacitance decreases by 0.64 fF between 1 mA and 6 mA.

Figure 23: Variation of fT and fmax with emitter current density (Vce = 1.16 V).

transistor forward delay.
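The 60 fs figure can be reproduced from the equivalent-circuit values. This is a minimal sketch: it assumes Cbe,poly = 3.0 fF (our reading of the label in fig. 24), Rex = 15 Ω and Ic = 6 mA from the same figure, and a thermal voltage of 25.85 mV (room temperature, also an assumption).

```python
# Sketch: delay added by the emitter-base overlap capacitance,
# Cbe,poly * (Rex + kT/(q*Ic)).
# Assumed inputs: Cbe,poly = 3.0 fF as read from fig. 24, VT = 25.85 mV.

C_BE_POLY = 3.0e-15   # F (assumed reading of the figure label)
R_EX = 15.0           # ohm, fig. 24
I_C = 6e-3            # A, bias point of fig. 24
V_T = 0.02585         # V, kT/q near 300 K (assumption)

dynamic_emitter_resistance = V_T / I_C                  # ohm, kT/(q*Ic)
added_delay = C_BE_POLY * (R_EX + dynamic_emitter_resistance)

print(f"kT/qIc      = {dynamic_emitter_resistance:.2f} ohm")
print(f"added delay = {added_delay * 1e15:.1f} fs")
```

With these inputs the result is a little under 60 fs, consistent with the rounded value in the text.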


Neither contact lithography nor electron-beam lithography is suitable for
fabrication of large ICs. We have fabricated HBT ICs using a 0.5 μm projection
lithography system, and have obtained > 800 GHz fmax (fig. 26).
With the exception of reactively-tuned circuits, for which fmax is the sole
determinant of circuit bandwidth, circuit design generally requires high values for
both fT and fmax. Figure 27 [45] shows the forward delay of an HBT with 0.6 μm x 8
μm emitter and 2 μm x 12 μm collector junctions, a 400 Å thick base with 52 meV
bandgap grading, and a 2000 Å thick collector. The peak fT is 252 GHz, and RC
charging terms constitute 35% of the forward delay. Figure 28 shows RF gains for
a similar device with a thinner base, narrower emitter and collector junctions, and
increased (Je = 2.5 x 10^5 A/cm^2) current density [47]. The device exhibits
simultaneous 295 GHz fT and fmax. Significant terms in τec = 1/2πfT are τb + τc = 395 fs,
Cje/gm = 82 fs, Ccb/gm = 20 fs, and RexCcb = 39 fs. To obtain further increases in
fT, the collector must be thinned, current density further increased and the emitter
parasitic resistance improved.
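The delay budgets quoted for the two devices can be cross-checked against the stated fT values using τec = 1/2πfT. The following sketch uses the delay terms given in the text for the 295 GHz device and the values annotated in fig. 27 for the 252 GHz device; the function and dictionary names are illustrative.

```python
import math

# Sketch: sum the forward-delay terms and convert to fT = 1/(2*pi*tau_ec).
# Delay values (in fs) are those quoted in the text and annotated in fig. 27.

def ft_from_delays_ghz(delay_terms_fs):
    """Return fT in GHz for a dict of delay terms given in femtoseconds."""
    total_seconds = sum(delay_terms_fs.values()) * 1e-15
    return 1.0 / (2.0 * math.pi * total_seconds) / 1e9

def rc_fraction(delay_terms_fs, transit_key="tau_b + tau_c"):
    """Fraction of the total delay that is not base/collector transit time."""
    total = sum(delay_terms_fs.values())
    return (total - delay_terms_fs[transit_key]) / total

fig27_device = {"tau_b + tau_c": 410.0, "Rex*Ccb": 114.0,
                "Cje/gm": 65.0, "Ccb/gm": 45.0}
fig28_device = {"tau_b + tau_c": 395.0, "Cje/gm": 82.0,
                "Ccb/gm": 20.0, "Rex*Ccb": 39.0}

for name, terms in (("fig. 27 device", fig27_device),
                    ("fig. 28 device", fig28_device)):
    print(f"{name}: total = {sum(terms.values()):.0f} fs, "
          f"fT = {ft_from_delays_ghz(terms):.0f} GHz, "
          f"RC share = {rc_fraction(terms) * 100:.0f}%")
```

The sums give about 251 GHz and 297 GHz, close to the quoted 252 GHz and 295 GHz, and an RC share of about 35% for the first device.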
Device scaling also reduces D.C. current gain. Base current in narrow-emitter
InAlAs/InGaAs HBTs is predominantly due to conduction on the exposed InGaAs
base surface between the emitter mesa and the base Ohmic contact. β decreases
with emitter width, but increases as the base is thinned, as base bandgap grading
is increased, and (at the expense of fmax) as the emitter-base spacing is increased.

Figure 24: Device equivalent circuit model at Vce = 1.2 V and Ic = 6 mA.
Element values shown in the figure: rbb = 28 Ω, rbc = 28 kΩ, rex = 15 Ω,
Ccb,x = 1.82 fF, Ccb,i = 0.18 fF, Cbe,depl = 36 fF, Cbe,poly = 3.0 fF,
Cbe,diff = gm·τf = 111 fF, Cout = 1.0 fF, gm = Ic/VT = 0.231 S, τf = 0.48 ps,
with a controlled source gm·vbe·exp[-jω(0.16 ps)].

β > 50 has been obtained with 0.2 μm emitters (fig. 29). Using 0.7 μm emitters
and a 300 Å base thickness with 2kT grading, β ~ 200 is obtained.

4.3. Interconnects and thermal management


In developing an integrated circuit technology for microwave mixed-signal ICs,
~100 GHz digital logic, and 100-300 GHz monolithic transmitters and receivers,
significant issues in interconnects, packaging, and thermal management must also be
addressed. Wiring parasitics, including line capacitance per unit length, line
delay per unit length, ground via inductance, and parasitic ground return
inductance, must all be minimized. Ground via inductance (~12 pH, or j7.5 Ω at
100 GHz) in standard 100-μm-substrate microstrip MIMICs makes low-impedance
source/emitter grounding difficult in > 100 GHz ICs. The interconnects must have
low capacitance and low delay per unit length, and the wire lengths, hence
transistor spacings, must be small. Given that fast HBTs operate at ~10^5 A/cm^2
current density, efficient heat sinking is then essential. To provide predictable
performance, interconnects of more than a few ps length must have a controlled
characteristic impedance. To prevent circuit-circuit interaction through
ground-circuit common-lead inductance ("ground loops"), the IC technology must
provide an integral low-inductance, hence unbroken, ground plane for ground-return
connections. Ground-return inductance between the IC and package results in
"ground bounce" and hence interaction between the IC's input and output lines.
For ICs with top-

Figure 25: Measured 45 MHz-50 GHz and 75-110 GHz device S-parameters at
Vce = 1.2 V and Ic = 6 mA. The solid line represents S-parameters of the equivalent
circuit model (fig. 24).

Figure 26: SEM from emitter side of a stepper-defined HBT with a 0.2 μm x 6 μm
emitter.

Figure 27: HBT forward transit delay vs. inverse emitter current for an HBT with
a 2000 Å thick collector and a 400 Å thick base with 2kT bandgap grading. RC
charging terms are significant in determining fT. Values annotated in the figure:
τb + τc = 0.41 ps, RexCcb = 0.114 ps, Cje/gm = 0.065 ps, Ccb/gm = 0.045 ps,
total = 0.634 ps, fT = 252 GHz.

Figure 28: Measured RF gains for an HBT with a 300 Å base with 52 meV grading
and a 2000 Å collector. Values annotated in the figure: VCE = 1 V,
Jc = 1.5 mA/μm^2, fT = 295 GHz, fmax = 295 GHz.

Figure 29: Common-emitter characteristics for a device defined by optical projection
lithography. As a result of the 400 Å base with 2kT grading, β = 50 is obtained
even with a 0.2 μm emitter width. Values annotated in the figure: emitter
0.2 x 6 μm, collector 1.5 x 9 μm, Ib step 0.01 mA.

Figure 30: CML (a) and ECL (b) master-slave D-flip-flops.

Figure 31: High-speed master-slave flip-flop; key features of the circuit design and
physical layout. Labeled features: keep-alive bias currents, inductive load,
transmission-line bus, short signal path, emitter-follower damping, clock.

Figure 32: High speed master-slave flip-flop. The IC contains 70 HBTs.

Figure 33: Distributed amplifier in the transferred-substrate process. The amplifier
exhibits 11.5 dB gain and approximately 80 GHz bandwidth.

Figure 34: 11 dB gain, DC-50 GHz differential amplifier.

surface (coplanar-waveguide) ground connections and multiple input/output
connections, ground bounce between IC and package will prevent 100 GHz operation.
For an IC with Nsignal signal lines of impedance Z0, risetime ΔT, and voltage swing
Vsignal, and Nground grounding bond wires of inductance Lbond = 0.6 pH/μm · 300 μm,
the package-IC ground bounce is Vbounce = Vsignal·Nsignal·Lbond/(Nground·Z0·ΔT). For
ground bounce equal to 10% of the signal amplitudes, a 100-GHz clock rate IC
must have Nground/Nsignal = 5-10, and 80%-90% of the IC bond-pads must be
devoted to IC grounding. Reported 10 GHz clock rate ICs devote ~50% of IC pads
to grounding. For mixed-signal and communications ICs, signal coupling through
ground bounce must be much smaller than 10% of the digital I/O interface
levels. Consequently, common-lead inductance between the IC and package ground
systems must be made vanishingly small.
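The pad-count requirement follows directly from the ground-bounce expression. The sketch below solves it for Nground/Nsignal at a 10% bounce budget. The 50 Ω line impedance and the 4 ps and 7 ps risetimes are illustrative assumptions, not values stated in the text; only the bond-wire inductance (0.6 pH/μm over 300 μm) comes from the source.

```python
# Sketch: ground pads needed per signal pad for a given ground-bounce budget.
# From Vbounce/Vsignal = Nsignal*Lbond / (Nground*Z0*dT):
#   Nground/Nsignal = Lbond / (bounce_fraction * Z0 * dT)
# Z0 and the risetimes below are assumptions chosen for illustration.

L_BOND = 0.6e-12 * 300.0     # H: 0.6 pH/um times 300 um = 180 pH
Z0 = 50.0                    # ohm (assumed)
BOUNCE_FRACTION = 0.10       # 10% of signal amplitude

def required_ground_ratio(risetime_s, z0=Z0, fraction=BOUNCE_FRACTION,
                          l_bond=L_BOND):
    """Return Nground/Nsignal needed to hold bounce to the given fraction."""
    return l_bond / (fraction * z0 * risetime_s)

def ground_pad_share(ratio):
    """Share of all bond pads that are ground pads for a given ratio."""
    return ratio / (1.0 + ratio)

for risetime_ps in (4.0, 7.0):           # assumed risetimes
    ratio = required_ground_ratio(risetime_ps * 1e-12)
    print(f"risetime {risetime_ps:.0f} ps: Nground/Nsignal = {ratio:.1f}, "
          f"ground pads = {ground_pad_share(ratio) * 100:.0f}% of total")
```

For these assumed risetimes the ratio falls between roughly 5 and 9, which corresponds to about 84% to 90% of pads being grounds, in line with the range quoted above.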
In addition to wide bandwidth transistors, the substrate transfer process
provides thermal vias for HBT heatsinking, and microstrip transmission-line
interconnects on a low dielectric constant substrate (εr = 2.7) with vias, ground
plane, and 3 levels of interconnects. At 5 μm length, the grounding vias are 20:1
shorter than in typical 100-μm-substrate microstrip MIMICs, reducing ground via
inductance by over an order of magnitude. The process also incorporates NiCr
resistors and Si3N4 MIM capacitors.
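The impact of the shorter vias can be put in numbers using the reactance X = 2πfL. In this sketch the 12 pH value for a 100 μm via is taken from the text; scaling inductance linearly with via length is a simplifying assumption made here for illustration (the text states only that the reduction exceeds an order of magnitude).

```python
import math

# Sketch: reactance of a ground via at 100 GHz, X = 2*pi*f*L.
# Linear scaling of inductance with via length is an assumption.

def reactance_ohm(inductance_h, frequency_hz):
    return 2.0 * math.pi * frequency_hz * inductance_h

FREQUENCY = 100e9            # Hz
L_VIA_100UM = 12e-12         # H, standard 100 um substrate via (from the text)
LENGTH_RATIO = 20.0          # 100 um via vs. 5 um via

x_standard = reactance_ohm(L_VIA_100UM, FREQUENCY)
l_short_via = L_VIA_100UM / LENGTH_RATIO          # assumed linear scaling
x_short = reactance_ohm(l_short_via, FREQUENCY)

print(f"100 um via: {L_VIA_100UM * 1e12:.1f} pH -> j{x_standard:.2f} ohm")
print(f"5 um via:   {l_short_via * 1e12:.1f} pH -> j{x_short:.2f} ohm")
```

The first line reproduces the j7.5 Ω quoted for standard microstrip; under the linear-scaling assumption the 5 μm via presents well under 1 Ω at 100 GHz.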
Presently, thermal resistance is dominated by temperature gradients internal
to the transistor itself, arising from the low thermal conductivity of the InAlAs
emitter and InGaAs base and collector layers. Thus, allowable power per unit HBT
emitter area remains comparable to mesa HBTs. For power transferred-substrate

Figure 35: fT-doubler resistive feedback amplifier with 8.2 dB low-frequency gain
and a DC-80 GHz 3-dB-bandwidth.

Figure 36: Measured S-parameters of a single-stage Darlington feedback amplifier.
The amplifier exhibits 18 dB baseband gain, a 3-dB-bandwidth greater than 50
GHz, and greater than 400 GHz gain-bandwidth product.

Figure 37: Circuit diagram of a W-band medium-power amplifier in the
transferred-substrate HBT process.

HBTs, use of high-thermal-conductivity InP emitter and collector epitaxial layers
will greatly increase allowable power per unit HBT junction area. This is being
pursued. To tolerate high power densities, the NiCr resistors must have thermal
vias, which results in significant parasitic capacitance. Pull-up resistors in ECL do
not require the thermal via.

5. Integrated circuit results
As a first demonstration of digital ICs in the transferred-substrate process, we
fabricated ECL and CML master-slave flip-flops, configured as 2:1 static frequency
dividers [53]. Circuits were fabricated using contact lithography, producing devices
with 0.6 μm x 8 μm emitters and 1.6 μm x 12 μm collectors. The devices operate
at 1.25 mA/μm^2. The differential logic swing is 600 mV. The collector pull-up
resistors are 50 Ω, hence the divider outputs directly drive 50 Ω output lines
without buffering. For these initial designs, circuit design was entirely standard. The
CML divider uses series-gated master and slave latches. Emitter-follower buffers
are added to the CML clock and data ports to form the ECL divider. The ICs are
shown in fig. 30. The ICs operated at maximum clock frequencies of 47 GHz (CML)
and 48 GHz (ECL) and dissipated 380 mW (ECL) or 75 mW (CML) from a -5 V
supply.
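The quoted emitter size, current density, pull-up resistance, and logic swing are mutually consistent, as a quick check shows. The sketch assumes that the switched current equals the emitter area times the stated current density and that it is steered fully into one unloaded 50 Ω pull-up; both are simplifying assumptions rather than statements from the source.

```python
# Sketch: consistency check between device size, current density,
# pull-up resistance, and logic swing for the first-generation dividers.
# Assumptions: switched current = emitter area * current density, and the
# full current flows in one 50 ohm pull-up with no external load.

EMITTER_WIDTH_UM = 0.6
EMITTER_LENGTH_UM = 8.0
CURRENT_DENSITY_A_PER_UM2 = 1.25e-3    # 1.25 mA/um^2
PULL_UP_OHM = 50.0

emitter_area_um2 = EMITTER_WIDTH_UM * EMITTER_LENGTH_UM
switched_current_a = emitter_area_um2 * CURRENT_DENSITY_A_PER_UM2
single_ended_swing_v = switched_current_a * PULL_UP_OHM
differential_swing_v = 2.0 * single_ended_swing_v

print(f"emitter area       = {emitter_area_um2:.1f} um^2")
print(f"switched current   = {switched_current_a * 1e3:.1f} mA")
print(f"differential swing = {differential_swing_v * 1e3:.0f} mV")
```

Under these assumptions the result matches the 600 mV differential swing stated above.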
Improved master-slave flip-flop designs were fabricated using optical projection

Figure 38: W-band balanced medium-power amplifier. The amplifier has 7 dB gain
and produces 10.7 dBm saturated output power at 78 GHz.

Figure 39: 2-bit carry generation logic circuits, developed as components of a
microwave binary adder. The circuit contains 250 transistors.

Figure 40: Δ-Σ ADC fabricated in the transferred-substrate process. The IC
contains approximately 150 HBTs, and operates at 18 GHz clock rate.

lithography. These designs employed HBTs with 0.5 μm emitter and 1.5 μm collector
junction widths, with the devices operating at 2 x 10^5 A/cm^2 current density.
Critical interconnects between stages are implemented as short doubly-terminated
100 Ω transmission lines at the center of the IC. The terminations use a small
amount of series inductive peaking (fig. 31). Emitter-follower buffers increase logic
speed but can induce strong ringing; L-R networks provide shunt loading of
emitter-follower outputs and damp the emitter-follower pulse response. Keep-alive
currents of 1/6 the logic currents keep the input stages weakly biased to minimize
the input stage delays. The overall chip area is 1.0 x 0.4 mm, and the circuit
consists of 76 transistors (fig. 32). The flip-flop dissipates 812 mW from a -5 V
supply, and the output buffer dissipates 38 mW from a -2 V supply. Circuit
simulations, which included all significant device and interconnect parasitics,
predicted a 95 GHz maximum clock frequency when the latch is configured as a 2:1
static frequency divider. IC operation has been demonstrated to 66 GHz.
A number of high speed analog ICs have been fabricated in the transferred-
substrate HBT process. Among these are 80 GHz distributed amplifiers [37] (fig.
33), 50 GHz broadband differential amplifiers for optical fiber receivers [56] (fig. 34),
and broadband Darlington and fT-doubler resistive feedback amplifiers (fig. 35).
Figure 36 shows the measured gain vs. frequency of a Darlington resistive feedback
amplifier [57, 38]. Greater than 400 GHz gain-bandwidth product is obtained from a

Figure 41: Simplified circuit diagram of the Δ-Σ ADC.

single Darlington stage. Tuned mm-wave amplifiers have also been demonstrated
in the transferred-substrate process, including a 75 GHz amplifier [58] (figs. 38, 37)
and, recently, a 185-GHz tuned amplifier [36].
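The gain-bandwidth figures quoted for the broadband amplifiers can be compared on a common basis. This sketch converts the dB gain to a voltage ratio (10^(dB/20)) and multiplies by the 3-dB bandwidth; treating the quoted gains as voltage gains is an assumption, and the function name is illustrative.

```python
# Sketch: gain-bandwidth product from a gain in dB and a 3-dB bandwidth.
# Assumption: the quoted dB gains are converted as voltage ratios, 10**(dB/20).

def gain_bandwidth_ghz(gain_db, bandwidth_ghz):
    return (10.0 ** (gain_db / 20.0)) * bandwidth_ghz

amplifiers = {
    "Darlington feedback (fig. 36)": (18.0, 50.0),
    "distributed (fig. 33)": (11.5, 80.0),
    "fT-doubler feedback (fig. 35)": (8.2, 80.0),
    "differential (fig. 34)": (11.0, 50.0),
}

for name, (gain_db, bandwidth_ghz) in amplifiers.items():
    print(f"{name}: {gain_bandwidth_ghz(gain_db, bandwidth_ghz):.0f} GHz")
```

For the Darlington stage, 18 dB at exactly 50 GHz gives about 397 GHz, so a bandwidth slightly above 50 GHz is what places it beyond the 400 GHz quoted in the text.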
Larger digital and mixed-signal ICs have also been fabricated in the transferred-
substrate process. We have recently fabricated Δ-Σ ADCs†† in the technology (fig.
40, fig. 41) [59]. These ICs have operated at an 18 GHz clock rate. Figure 42 shows
the measured ADC signal/noise ratio and third-order distortion as a function of
input power under two-tone test conditions. At a 990 MHz signal frequency, a peak
signal/noise ratio of 120-125 dB (1 Hz) is obtained.
Larger digital circuits in development include sum and carry generation circuits
for pipelined adder-accumulators (fig. 39). These circuits use 4-level series-gated
current-steering logic and merged logic-latch circuits to obtain the equivalent of 2
AND, 2 OR, and 2 latching operations in a 50 ps clock period [60].

6. Conclusions
Bipolar integrated circuit bandwidths have increased tremendously since the first
demonstration of (bipolar) integrated circuits 40 years ago. Device, IC, and applica-
tion bandwidths will continue to increase. With MOS transistors and III-V HEMTs
(FETs), improved device bandwidths are obtained by lateral scaling (shorter gate
lengths) combined with vertical scaling (thinner gate-channel insulating barriers),
and progressive improvements in source/drain Ohmic contacts. With bipolar
transistors, improved bandwidths are obtained by vertical scaling (thinner base and
collector layers), combined with lateral scaling (narrower collector and emitter junc-
tions), increased current density, and progressive improvements in emitter Ohmic

††For Δ-Σ converters, the terms "modulator" and "ADC" are used synonymously in the literature.

Figure 42: Δ-Σ ADC noise floor and third-order distortion power as a function of
input power (Pin, dBm) for different signal frequencies under two-tone test
conditions. Two-tone frequency pairs shown: 149.9/150.1 MHz, 499.9/500.1 MHz,
and 989.9/990.1 MHz.


contacts. While III-V HBTs benefit from strong heterojunctions, high mobilities,
and high electron velocities, Si/SiGe bipolar transistors have been much more
aggressively scaled, both in lithographic dimensions and emitter current density.
Essential to the future success of III-V HBTs are submicron junction scaling and
greatly increased current densities.
While bipolar ICs are much smaller than CMOS VLSI ICs, clock frequencies are
much higher. In both technologies, thermal management and signal integrity are
major limits to performance. As bipolar technologies evolve towards complex ICs
operating at a 100 GHz clock, an increasing fraction of the total circuit connections
will be terminated transmission lines of controlled characteristic impedance and
minimal dielectric loading.

Acknowledgments

Work at UCSB was supported by the ONR under grants N0014-99-1-0041, N00014-
01-1-0065, N00014-01-1-0066, N00014-01-1-0024, N00014-98-1-0750, and N00014-
98-1-0830 (D. Purdy, D. VanVechten, M. Yoder, J. Zolper), by the AFOSR under
grant F4962096-1-0019 (H. Schlossberg), and by the ARO under the Quasi-Optical
MURI PC249806 (J. Harvey). JPL work was performed at the Center for Space
Microelectronics Technology, JPL, Caltech, and sponsored by the NASA office of
Space Science.

References

1. H. Kroemer, "Heterostructure Bipolar Transistors and Integrated Circuits", Proc.
IEEE, Vol. 70, No. 1, January 1982, pp. 13-25.
2. P. Asbeck, F. Chang, K.-C. Wang, G. Sullivan, and D. Cheung, "GaAs-based Het-
erojunction Bipolar Transistors for Very High Performance Electronic Circuits", Proc.
IEEE, vol. 81 (12), pp. 1709-1726, December 1993.
3. M. Sokolich, D. P. Docter, Y.K. Brown, A.R. Kamer, J.F. Jensen, W.E. Stanchina,
S. Thomas III, C. H. Fields, D. A. Ahmari, M. Lui, R. Martinez, J. Duvall, "A low
power 52.9 GHz static frequency divider in a manufacturable 180 GHz AlInAs/InGaAs
HBT IC technology", Technical Digest, IEEE GaAs IC Symposium, Nov. 1-4, 1998,
Atlanta, Ga., pp. 117-120.
4. P.K. Hughes, J.Y. Choe, J. Zolper, "Advanced Multifunctional RF system (AMRFS)",
Technical Digest (vol. XXV), Government Microcircuits Application Conference (GO-
MAC), Anaheim, CA., March 2000, pp. 194-197.
5. H. Suzuki, K. Watanabe, K. Ishikawa, H. Masuda, K. Ouchi, T. Tanoue and R. Takeyari,
"InP/InGaAs HBT ICs for 40 Gbit/s optical transmission systems", Technical Digest,
IEEE GaAs IC Symposium, 1997, pp. 215-218.
6. H. M. Rein, E. Gottwald and T. F. Meister, "Si-bipolar - a potential candidate for
high-speed electronics in 20 and 40 Gb/s TDM systems ?", Technical Digest, Ultrafast
Electronics and Optoelectronics Conference, 1997, pp. 118-120.
7. J.C. Candy and G.C. Temes, editors, Oversampling Delta-Sigma Data Converters,
IEEE press, 1992, Piscataway, N.J.
8. D. C. Larson, "High speed direct digital synthesis techniques and applications", Tech-
nical Digest IEEE GaAs IC Symposium, 1998, pp. 209-212.

9. J. Jensen, G. Raghavan, A. Cosand, R. Walden, "A 3.2 GHz second order sigma-delta
modulator implemented in the InP HBT technology", IEEE J. Solid-State Circuits, pp.
214-215, 1997.
10. S. Subbanna, J. Johnson, G. Freeman, R. Volant, R. Groves, D. Herman, B. Meyerson,
"Prospects for Silicon Germanium based technology for very high speed circuits", IEEE
MTT-S International Microwave Symposium, Boston, MA, June 2000.
11. E.F. Crabbe, B.S. Meyerson, D.L. Harame, J.M.C. Stork, A. Megdanis, J. Cotte, J.
Chu, M. Gilbert, C. Stanis, J. H. Comfort, G. L. Patton, S. Subbanna, "113-GHz fT
graded-base SiGe HBTs", 51st Device Research Conference, 1993; Abstract in IEEE
Transactions on Electron Devices, Vol. 40, p. 2100, 1993.
12. K. Ohhata, T. Masuda, E. Ohue, K. Washio, "Design of a 32.7-GHz bandwidth AGC
amplifier IC with wide dynamic range implemented with SiGe HBT", IEEE J. Solid-
State Circuits, Vol. 34, No. 9, Sept 1999, pp. 1291-1297.
13. E. F. Crabbe, J. H. Comfort, J. D. Cressler, J. Y.-C. Sun, and J. M.C. Stork, "High-Low
Polysilicon-Emitter SiGe-Base Bipolar Transistors", IEEE Electron Device Letters, Vol.
4, No. 10, October 1993, pp. 478-480
14. S. Yamahata, K. Kurishima, H. Ito and Y. Matsuoka, "Over-220-GHz-fT-and-fmax
InP/InGaAs double-heterojunction bipolar transistors with a new hexagonal-shaped
emitter", Technical Digest, IEEE GaAs IC Symposium, 1995, pp. 163-166.
15. H. Kroemer, "Two integral relations pertaining to the electron transport through a
bipolar transistor with a nonuniform energy gap in the base region", Solid State Elec-
tronics, vol. 28, pp. 1101-1103, 1985.
16. B.G. Streetman, Solid State Electronic Devices, third edition, Prentice-Hall, 1990.
17. S. Laux, W. Lee, "Collector signal delay in the presence of velocity overshoot", IEEE
Electron Device Letters, vol. 11, No. 4, pp. 174-176, 1990
18. T. Ishibashi, "Influence of electron velocity overshoot on collector transit times of
HBTs", IEEE Transactions on Electron Devices, vol. 37, no. 9, pp. 2103-2105, Septem-
ber 1990.
19. M. Littlejohn, K.W. Kim, H. Tian, "High-field transport in InGaAs and related het-
erostructures", in Properties of lattice-matched and strained Indium Gallium Ar-
senide, P. Bhattacharya, editor, INSPEC, 1993, London.
20. E.P.O'Reilly, "Band structure of InP: Overview", in Properties of Indium Phosphide,
INSPEC, 1991, London.
21. C.T. Kirk, "A theory of transistor cutoff frequency (fT) fall-off at high current density",
IEEE Transactions on Electron Devices, ED-9, p. 164, (1962).
22. M.J.W. Rodwell, S.T. Allen, R.Y. Yu, M.G. Case, M. Reddy, E. Carman, J. Pusl,
M. Kamegawa, Y. Konishi, and R. Pullela, "Active and Nonlinear Wave Propagation
Devices in Ultrafast Electronics and Optoelectronics", IEEE Proceedings, Vol. 82, No.
7, pp. 1037-1058, July 1994.
23. HP-EESOF Series IV microwave circuit simulation program. Hewlett-Packard Com-
pany, 3000 Hanover Street, Palo Alto, CA 94304, USA
24. M. Vaidyanathan and D. L. Pulfrey, "Extrapolated fmax of heterojunction bipolar
transistors", IEEE Transactions on Electron Devices, Vol. 46, No.2, February 1999.
25. M. Vaidyanathan and D. L. Pulfrey, private communication
26. Yoram Betser and Dan Ritter, "Reduction of the base collector capacitance in
InP/GaInAs heterojunction bipolar transistors due to electron velocity modulation",
IEEE Transactions on Electron Devices, vol. 46, no. 4, April 1999.
27. L. H. Camnitz and N. Moll, "An Analysis of the Cutoff-Frequency Behavior of
Microwave Heterojunction Bipolar Transistors", in Compound Semiconductor
Transistors, edited by S. Tiwari, pp. 21-45, IEEE Press, Piscataway, 1992.
28. R. W. H. Engelmann and C. A. Liechti, "Bias Dependence of GaAs and InP MESFET
Parameters", IEEE Transactions on Electron Devices, vol. ED-24, no. 11, pp. 1288-
1296, Nov. 1977.
29. M.-C Ho, R.A. Johnson, W. J. Ho, M.F. Chang, P. M. Asbeck, "High-performance
low-base-collector capacitance AlGaAs/GaAs heterojunction bipolar transistors
fabricated by deep ion implantation", IEEE Electron Device Letters, vol. 16, no. 11,
Nov. 1995, pp. 512-14.
30. W. Liu, D. Hill, H. F. Chau, J. Sweder, T. Nagle and J. Delany, "Laterally etched
undercut (LEU) technique to reduce base-collector capacitance in heterojunction
bipolar transistors", Technical Digest, IEEE GaAs IC Symposium, pp. 167-170, 1995.
31. A. Gutierrez-Aitken et al., 1999 International Electron Device Meeting, December,
Washington, DC.
32. T. Oka, K. Hirata, K. Ouchi, H. Uchiyama, K. Mochizuki, T. Nakamura, "Small-scaled
InGaP/GaAs HBTs with WSi/Ti base electrode and buried SiO2", IEEE Transactions
on Electron Devices, vol. 45, no. 11, Nov. 1998, pp. 2276-82.
33. H. Shimawaki, Y. Amamiya, N. Furuhata, K. Honjo, "High fmax AlGaAs/InGaAs
and AlGaAs/GaAs HBT's with p+/p Regrown Base Contacts", IEEE Transactions on
Electron Devices, Vol. 42, No. 10, October 1995, pp. 1735-1744.
34. T. Oka, K. Hirata, K. Ouchi, H. Uchiyama, T. Taniguchi, K. Mochizuki, T. Nakamura,
"Advanced Performance of Small-Scaled InGaP/GaAs HBTs with fT over 150 GHz
and fmax over 250 GHz", in Proceedings, 1998 IEEE International Electron Device
Meeting, December 6-9, San Francisco, pp. 653-656.
35. Y. Matsuoka, S. Yamahata, K. Kurishima and H. Ito, "Ultrahigh-speed InP/InGaAs
Double-Heterostructure Bipolar Transistors and Analysis of Their Operation",
Japanese Journal of Applied Physics, vol. 35, pp. 5646-5654, 1996.
36. M. Urteaga, D. Scott, T. Mathew, S. Krishnan, Y. Wei, M.J.W. Rodwell, "185 GHz
Monolithic Amplifier in InGaAs/InAlAs Transferred-Substrate HBT Technology",
submitted to the 2001 MTT-S International Microwave Symposium, Dec. 2000.
37. S. Krishnan, S. Jaganathan, T. Mathew, Y. Wei, M.J.W. Rodwell, "Broadband HBT
amplifiers", 2000 IEEE Cornell Conference on High Speed Electronics.
38. S. Krishnan, D. Mensa, J. Guthrie, S. Jaganathan, T. Mathew, R. Girish, Y. Wei
and M.J.W. Rodwell, "Broadband lumped HBT amplifiers", IEE Electronics Letters,
Vol. 36, No. 5, pp. 466-7.
39. S. Yamahata, Y. Matsuoka, T. Ishibashi, "Ultrahigh-speed AlGaAs/GaAs ballistic
collection transistors using carbon as a p-type dopant", Electronics Letters, Vol. 29,
No. 22, 28 October 1993, pp. 1996-1997.
40. R. P. Smith, S.T. Allen, M. Reddy, S.C. Martin, J. Liu, R. E. Muller, and M.J.W.
Rodwell, "0.1 μm Schottky-Collector AlAs/GaAs Resonant Tunneling Diodes", IEEE
Electron Device Letters, Vol. 15, No. 8, August 1994.
41. Q. Lee, B. Agarwal, R. Pullela, D. Mensa, J. Guthrie, L. Samoska, M. Rodwell, "A >
400 GHz fmax transferred-substrate heterojunction bipolar transistor IC technology",
IEEE Electron Device Letters, vol. 19, pp. 77-79, 1998.
42. W. E. Stanchina et al., "An InP-based HBT fab for High-Speed Digital, Analog,
Mixed-Signal, and Optoelectronic ICs", Technical Digest, GaAs IC Symposium, 1995,
pp. 31-34.
43. Q. Lee, S.C. Martin, D. Mensa, R.P. Smith, J. Guthrie, S. Jaganathan, Y. Betser,
T. Mathew, S. Krishnan, L. Samoska, and M.J.W. Rodwell, "Submicron transferred-
substrate heterojunction bipolar transistors with greater than 1 THz fmax",
postdeadline paper, 1999 IEEE Device Research Conference, June, Santa Barbara, CA.
44. Q. Lee, S.C. Martin, D. Mensa, R.P. Smith, J. Guthrie, and M.J.W. Rodwell,
"Submicron transferred-substrate heterojunction bipolar transistors", IEEE Electron
Device Letters, Vol. 20, No. 8, August 1999, pp. 396-398.

45. D. Mensa, Q. Lee, J. Guthrie, S. Jaganathan, and M.J.W. Rodwell, "Transferred-
substrate HBTs with 250 GHz current-gain cutoff frequency", Proceedings, 1998
International Electron Device Meeting, San Francisco, December.
46. S. J. Mason, "Power gain in feedback amplifier", IRE Trans. Circuit Theory, vol. CT-1,
pp. 20-25, 1954.
47. Y. Betser, D. Mensa, S. Jaganathan, T. Mathew and M. Rodwell, "InAlAs/InGaAs
HBTs with Simultaneously High values of Ft and Fmax for mixed analog/digital
applications", to be published, IEEE Electron Device Letters, submitted July 2000.
48. J. Guthrie, D. Mensa, T. Mathew, Q. Lee, S. Krishnan, S. Jaganathan, S. Cerhan, Y.
Betser, M.J.W. Rodwell, "A 50 mm Copper/Polymer Substrate HBT Technology for >
100 GHz MIMICs", 1999 IEEE Conference on InP and Related Materials, May, Davos,
Switzerland.
49. E. Sano, Y. Matsuoka, T. Ishibashi, "Device Figures-of-Merit for High Speed Digital
ICs and Baseband Amplifiers", IEICE Transactions on Electronics, E78-C (1995) pp.
1182-1188.
50. T. Enoki, E. Sano, T. Ishibashi, "Prospects of InP-based IC technologies for 100 Gb/s-
class lightwave communications systems", International Journal of High Speed
Electronics and Systems, this issue.
51. D.A. Hodges and H.G. Jackson, Analysis and Design of Digital Integrated Circuits,
2nd Edition, McGraw-Hill, 1983, ISBN 0-07-029153-5.
52. P.K. Tien, "Propagation delay in high speed silicon bipolar and GaAs HBT digital
circuits", International Journal of High Speed Electronics and Systems, 1(1) pp. 101-
124, 1990.
53. R. Pullela, D. Mensa, B. Agarwal, J. Guthrie, M. Rodwell, "47 GHz static frequency
divider in Ultrafast transferred-substrate heterojunction bipolar transistor technology",
1998 Conference on InP and Related Materials, May, Tsukuba, Japan.
54. Q. Lee, D. Mensa, J. Guthrie, S. Jaganathan, T. Mathew, S. Krishnan, S. Cerhan
and M.J.W. Rodwell, "66 GHz static frequency divider in transferred-substrate HBT
technology", 1999 IEEE RF/Microwave monolithic circuits symposium, June, Anaheim,
CA.
55. B. Agarwal, R. Pullela, Q. Lee, D. Mensa, J. Guthrie, M.J.W. Rodwell, "80 GHz
Distributed Amplifiers with transferred-substrate heterojunction bipolar transistors",
1998 IEEE MTT Microwave Symposium, June, Baltimore, Md.
56. B. Agarwal, Q. Lee, R. Pullela, D. Mensa, J. Guthrie, M.J.W. Rodwell, "A transferred-
substrate HBT wideband differential Amplifier to 50 GHz", IEEE Microwave and
Guided Wave Letters, June 1998.
57. D. Mensa, R. Pullela, Q. Lee, B. Agarwal, J. Guthrie, S. Jaganathan, M.J.W. Rodwell,
"Baseband amplifiers in the transferred-substrate HBT technology", 1998 IEEE GaAs
IC Symposium, Nov. 1-4, Atlanta, Ga.
58. J.R. Guthrie, M. Urteaga, D. Scott, D. Mensa, T. Mathew, Q. Lee, S. Krishnan, S.
Jaganathan, Y. Betser, M. Rodwell, "HBT MMIC 75 GHz power amplifiers", 2000
IEEE Conference on Indium Phosphide and Related Materials, May, Williamsburg,
Va.
59. S. Jaganathan, D. Mensa, T. Mathew, Y. Betser, S. Krishnan, Y. Wei, D. Scott, M.
Urteaga, M. Rodwell, "An 18 GHz clock rate continuous-time Δ-Σ modulator
implemented in InP transferred-substrate HBT technology", 2000 IEEE GaAs IC
Symposium, November, Seattle, Wa.
60. T. Mathew, S. Jaganathan, D. Scott, S. Krishnan, Y. Wei, M. Urteaga, M.J.W.
Rodwell, S. Long, "2 bit adder carry and sum logic circuits at 19 GHz clock frequency
in Transferred Substrate HBT technology", submitted to the 2001 IEEE Conference
on Indium Phosphide and Related Materials, May, Nara, Japan.
International Journal of High Speed Electronics and Systems, Vol. 11, No. 1 (2001) 217-243
© World Scientific Publishing Company

Progress toward 100 GHz Logic in InP HBT IC Technology


C.H. FIELDS, M. SOKOLICH, S. THOMAS, K. ELLIOT AND J. JENSEN
HRL Laboratories, LLC, Microelectronics Laboratory,
3011 Malibu Canyon Rd, Malibu, CA 90265, USA

Future wideband communications, mm-wave digital synthesis, and
digital beam-steering will benefit from digital operation at clock
frequencies between 50 and 100 GHz at reasonable power levels.
HRL* has developed InP-based HBT technology that is capable of
supporting these needs. We have demonstrated InP HBTs with cutoff
frequencies, fT, over 200 GHz and with fmax over 300 GHz as well as
fully static dividers operating at 72.8 GHz.

1. Introduction

In both wireline and wireless communications systems there is an increasing need for
circuitry capable of 40, 80 or 100 Gbps data rates [1]. Once received, signals can be
demultiplexed and processed at much lower data rates so the primary need is for small
and medium scale integrated circuits such as drivers, multiplexers and demultiplexers. In
analog to digital conversion applications, ΔΣ data converters can be used, which also
require modest complexity and extremely high clock rates to achieve the desired
oversampling ratios. In such circuits, sampling speed is traded off against circuit
complexity and circuit element matching. Finally, direct digital synthesis of microwave
signals requires clock rates of about 100 GHz. Thus 100GHz clock rates open up
techniques to microwave and high speed circuit designers that are currently applied only
at much lower frequencies in a few growing areas of interest. Access to an ultra-high
speed, medium scale integrated - large scale integrated (MSI-LSI) technology is thus an
essential capability for the system engineer in a variety of disciplines including radar,
communications, and remote sensing. InP based HBT IC technology is ideal for these
applications because only InP based HBTs have demonstrated clock rates in excess of
70GHz with MSI levels of circuit complexity.
In this paper we review the device and circuit issues that have enabled us to exceed 70
GHz clock rates and we look forward to what must be done to surpass 100 GHz clock
rates. The development trends indicate that 100 GHz clock rate circuit demonstration is
imminent.

2. Historical Evolution of High Speed Digital Circuits


Divider frequency has increased from 4 GHz to 72 GHz in the two decades since the first
demonstration of a III-V (GaAs) based static divider2. Figure 1 shows that the trend has
not been a smooth one for III-V based technology but has been fairly predictable for Si
based technology. The period of rapid advance in the mid to late 1980's roughly
corresponds to the U.S. defense buildup while the lull in the early 1990's corresponds to
the DoD procurement downturn. The smooth Si trend is likely to be more a function of
demand in commercial and industrial markets than to the fluctuations of the defense
market. The recent resurgence in the rate of increase of III-V speed, however, seems to
be more driven by the globally expanding communications market than by defense
expenditures which is an encouraging trend for III-V.

* ©2000 HRL Laboratories, LLC. All Rights Reserved

218 C. H. Fields et al.

At the current rate of increase, maximum clock rates will exceed 100 GHz sometime
in the next year. The next steps will be to increase circuit yield to allow increasing levels
of integration. Several issues need to be addressed to achieve manufacturable
technologies at these clock rates. The underlying device technology must have sufficient
cutoff frequency. Cutoff frequencies (both ft and fmax) of roughly twice the clock rate and
careful design should suffice. Device cutoff frequency of more than 300 GHz has been
demonstrated in InP HEMT and 250+ GHz devices exist in InP HBT so the cutoff
frequency appears to lead the circuit clock rates for medium to large scale ICs by at least
3-5 years. Equally important to achieving high clock rates are compactness of design, a
low permittivity dielectric and signal line shielding. Taken together these requirements
can best be achieved with a multiple level interconnect (at least 3 layers) and a readily
planarized dielectric such as polyimide or BCB. In addition, device performance and
matching must be maintained as devices are scaled and clock speeds increase. Well
matched devices result in larger noise margins and allow the differential logic gates to
operate with small signal swings.
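
The rule of thumb above (cutoff frequencies of roughly twice the clock rate) can be turned into a quick feasibility check. The Python sketch below is illustrative only: the function name is made up for this example, the factor of two is the rough guideline from the text rather than a hard limit, and the example values are of the order quoted in the abstract.

```python
def max_clock_ghz(ft_ghz, fmax_ghz, margin=2.0):
    """Estimate the clock rate a device could support, using the rule of
    thumb that both ft and fmax should be about `margin` times the clock."""
    return min(ft_ghz, fmax_ghz) / margin


# Example: ft = 200 GHz and fmax = 300 GHz; under this rule the lower of
# the two cutoff frequencies (here ft) sets the estimate.
print(max_clock_ghz(200.0, 300.0))
```

Under this simple rule the lower of the two cutoff frequencies is the limiting one, which is why matched ft and fmax are attractive for logic.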

[Plot: maximum static divider clock rate versus year, 1980 to 2000; legend: III-V HBT, Si BJT, SiGe HBT, FET/HEMT]

Figure 1. Historical trend in maximum clock-rate of static dividers. Points include
published accounts that claim the highest overall circuit speed or highest
speed in one of the listed technologies.

3. Semiconductor Material Properties and Growth

InP has very close parallels to GaAs but unlike GaAs, InP has the advantage of
developing in an environment in which capital equipment, fabrication and design know-
how and markets already exist. A key to the development of all the three terminal
devices in GaAs was the pace of concurrent development of quality materials. The first
GaAs MESFETs were built in 19673 but self-sustaining commercial insertion did not
occur until the mid 1980's constrained primarily by the lack of availability of high quality
substrates. HBTs in GaAs, first introduced in 1972, were not commercialized until the
late 1980's even though they are not as sensitive to the starting substrate material. The
pacing item in that case was development of MBE and MOCVD epitaxial growth
techniques. Compelling evidence that epitaxial growth paced HBTs comes in the form of
the GaAs based HEMT. The HEMT was first demonstrated in 1980, more than 8 years
after the HBT, but was commercialized somewhat earlier than the HBT because the
epitaxial growth techniques were applicable to both and the HEMT structure was
somewhat simpler. Recently GaAs HBT epitaxial wafer suppliers were so bold as to
claim that, once the epi wafers were delivered, the transistors had already been made and
the job of the semiconductor fab was merely to reveal and interconnect them. InP has
had no analog to the MESFET because the high quality growth techniques used for GaAs
are readily transferable to InP. This has allowed InP to skip the "chicken and egg" period
in which the technology can't move forward because substrates are unavailable and
substrate quality can't improve because the volume is too low.
Much of the performance advantage enjoyed by InP-based HBT integrated circuits is
related to properties inherent to the InP materials system. The InP materials system
refers to the common group of compounds that can be grown lattice matched to the InP
substrate, namely InGaAs, InAlAs, and to a lesser extent InGaAlAs and InGaAsP. The
most common configurations of InP-based HBTs employ InP or AlInAs emitter layers,
InGaAs base layers, and InGaAs or InP collector layers.
This suite of compounds offers numerous advantages over both GaAs based HBTs
and SiGe based HBTs. The much lower electron effective mass of InGaAs yields higher
electron mobility and results in shorter base transit times and correspondingly lower
diffusion capacitance. The InP-based materials exhibit lower surface recombination
velocities, which allow for higher gain and, more importantly, further device scaling.
The higher thermal conductivity of the InP substrate provides efficient dissipation of heat
for circuit and power amplifier applications. Larger conduction band intervalley
separation in InP and InGaAs can also produce shorter collector delay times. This is due
to the reduction in the inter-valley scattering which reduces the average electron velocity
in GaAs.
Perhaps the most important advantage of the InP HBT materials system is the superior
electron transport properties of both InP and InGaAs. Figure 2 shows velocity-field
curves for common collector materials used in the InP, GaAs, and SiGe HBT materials
systems. Both InGaAs and InP have significantly higher peak electron velocity than
GaAs and Si. In addition, InGaAs reaches its peak velocity at a lower electric field,
which offers further advantages for low voltage, low power circuit applications. While Si
has the highest saturated velocity, it occurs at a much higher electric field. Hence, SiGe
HBTs require much larger bias voltages to achieve high speed and suffer greatly from
increased power dissipation.
Other major advantages of the InP HBT materials system relate to the energy
bandgaps and offsets of the materials. Figure 3 shows an energy bandgap vs lattice
constant diagram for III-V semiconductors. The much smaller energy bandgap of
InGaAs (as compared to GaAs) leads to much lower base-emitter turn-on voltages and
hence significantly lower power supply requirements and thus reduced power dissipation.
In addition, the energy bandgap difference between an InGaAs base and an InP or InAlAs
emitter (as compared to a GaAs base and an AlGaAs or InGaP emitter) is larger and
results in higher injection efficiency and gain. Alternatively, device designers are able to
trade some gain for increased base doping levels to further improve high frequency
performance. Another advantage stems from the ability to bandgap engineer the HBT by
varying material compositions to grade or otherwise alter the energy bandgap profile.
Bandgap engineering is used to grade junctions for reducing and stabilizing base-emitter
turn-on voltages, reducing hot electron injection effects, transferring offset between the
conduction and valence bands, reducing resistance, launching electrons with energy
above the local thermal equilibrium, and eliminating conduction band electron barriers in
double heterostructure bipolar transistors (DHBTs). The GaAs materials system has
some ability for bandgap engineering, but the InP materials system has a much larger
range of accessible energy bandgaps and offsets and a wider range of materials and
compositions that can be used.

[Plot: electron velocity versus electric field (V cm^-1, 10^2 to 10^5) at T = 300 K; curves for InP, In0.53Ga0.47As, GaAs, and Si]

Figure 2. Velocity-field curves for important collector materials in the InP, GaAs, and
SiGe HBT materials systems

[Plot: energy bandgap versus lattice constant (Å, 5.5 to 6.5); labeled points include ZnS, ZnSe, ZnTe, CdSe, CdTe, AlP, AlSb, InP, and Si]

Figure 3. Energy bandgap-lattice constant diagram for the common III-V materials.

4. Fabrication of InP-based HBT Integrated Circuits

HRL Laboratories supports three versions of InP-based HBT IC technologies. The first
technology (G1), based on 2.0 µm minimum feature size, began development in the late
'80s and has been pursued extensively in the '90s. This technology provides transistors
with 80 GHz ft and 150 GHz fmax. The fabrication sequence permits the emitter metal to
be patterned with the minimum feature size allowed by the lithographic tools. The
second technology (G2), based on 1.0 µm minimum feature size, provides transistors with
150 GHz ft and 180 GHz fmax. The second generation device evolved from the first
generation device by scaling both vertical and horizontal dimensions of the original HBT
IC technology.5 The third technology, still in development, scales the emitter width
further to submicron minimum dimensions.
The GaInAs/AlInAs HBT material is grown in either a solid-source or gas-source
MBE system. The layers are grown on 3" semi-insulating InP substrates. InP collectors
are optional to improve breakdown voltage of the transistors. A schematic of the layer
structure is shown in figure 4 below. The base is doped with Be and the emitter and
collector layers are doped with Si. The presence of the nine-period graded superlattice
has been shown to improve device reliability through prevention of Be diffusion.6

Layer             Material            Doping
Contact           GaInAs              n+
Emitter Contact   AlInAs              n+
Emitter           AlInAs              n
Graded SL         GaInAs/AlInAs SL    n / p++
Spacer            GaInAs              p
Base              GaInAs              p+
Collector         GaInAs              n-
Subcollector      GaInAs              n+
Buffer            GaInAs
SI InP Substrate

Figure 4. Typical layer structure of the AlInAs/InGaAs transistors in this work.
Generation 1 and 2 (G1 and G2) processes differ only in the thickness of
base and collector layers.

All three processes make use of polyimide planarization, two levels of metal
interconnect, TaN thin film resistors, and metal-insulator-metal capacitors. A third level
of metal is optional depending on the circuit complexity. The basic process is extendable
to include integration with photodiodes, resonant tunneling diodes, Schottky diodes and
high electron mobility transistors.
Fabrication of InP-based HBTs is similar to fabrication of GaAs-based HBTs. As
GaAs-based HBTs have reached the status of commercial production, the fabrication is
well understood and similarities to InP-based HBT fabrication will not be dealt with here.
However, there are three major differences in InP-based HBTs that have implications for
the process technology. First, the device layers are made of different materials and so
different etch chemistries must be used. Second, InP HBTs are more readily scaled due
to the lower surface recombination velocity. Finally, GaAs HBTs are typically isolated
by implantation of the collector, while the InP based HBTs are mesa isolated. This is due
to the fact that the InGaAs collector commonly used for the InP-based process does not
become isolating after implant due to the low energy bandgap. Therefore, the collector
and subcollector regions of the InP-based HBT must be removed by etching.
This results in what is referred to as the "triple-mesa" structure as the HBT consists of
an emitter mesa, a base-collector mesa, and a subcollector mesa.7 As the GaAs HBT has
only the emitter mesa, the InP-based HBT mesa is much taller. This topography, which is
typically taller than that in GaAs processing, complicates device fabrication and makes
subsequent planarization of the device much more difficult. As these three fabrication
issues are somewhat unique to InP-based HBTs, we will discuss each in greater detail in
the following sections.

Emitter-Base Mesa Fabrication


HBT fabrication starts with the definition of the emitter by Ti/Pt/Au/Ti metalization. The
emitter area is defined by a combination of dry and wet etching. Wet etching is desired
to provide an undercut profile for the subsequent self-aligned Ti/Pt/Au base metal
deposition. Excessive undercutting can be eliminated by including an anisotropic dry
etch.
Is there a limit to the minimum emitter dimension for future generations of InP-based
HBTs? Lee et al. reported on an e-beam written HBT that had a 0.5 µm wide emitter.8
Undercut of the emitter resulted in an actual emitter width of only 0.4 µm. The focus of
the paper was on the reduction of base-collector capacitance. Due to e-beam patterning of
the collector and novel transferred-substrate processing, extremely high fmax was
achieved. SiGe HBTs are being reported with effective emitter widths of 0.18 µm.9
However, these processes make extensive use of polysilicon layers, which is not
transferrable to the InP-based HBT process. As InGaAs/AlInAs HBTs follow the trend
to smaller emitter dimensions, the emitter-base mesa fabrication must receive additional
attention.
Niwa et al., have shown advantages in ft that can be achieved by increasing the emitter
doping level.10 As the emitter size is reduced, the contribution of the emitter junction
capacitance is reduced; however, the emitter resistance increases. By reducing Re by
75%, they were able to improve ft from 51 GHz to 112 GHz for a 0.6x4.6 µm2 device. It
should be pointed out that the larger size device did have higher ft of 140 GHz. The main
motivation for reducing emitter size is not improved ft, but rather is reduction in power-
consumption. A secondary benefit of reducing the emitter size is reduction in the base-
collector area that reduces capacitance and improves fmax.

Base-Collector Mesa Fabrication


After base metal deposition, the base and collector layers are etched to form a mesa.
Formation of the base-collector mesa is important to the device characteristics as this
helps determine the base-collector capacitance, shown to be a determining factor in
device toggle rate. There are several ways currently in use to etch the base-collector
mesa. Kopf et al. have used dry etching to form the mesa.11 As the device dimensions
shrink, dry etching that provides a vertical profile and excellent control of lateral
dimensions is desirable. An InP collector is used in their device. A BCl3/N2 plasma was
chosen to etch the base and collector. As there is little to no selectivity for this system,
some form of in situ monitoring is desired. They chose optical emission spectroscopy
and made use of an As emission line. The subcollector and base are both InGaAs, so the
As emission decreases when the etch has gone through the InGaAs base into the InP
collector, and increases once the InGaAs sub-collector is reached. The etch is stopped
once the As in the sub-collector is detected.
Both InP and InGaAs have been studied for use as the subcollector material. Even
though the InP has better thermal characteristics, an InGaAs subcollector is often chosen
for ease of processing as the layers can easily be delineated. Additionally, having an
InGaAs subcollector on an InP substrate enables a selective etch to the substrate while
maintaining uniformity in etch depth across the wafer. The emitter sizes for these devices
were either 2x4 µm2 or 1.2x3 µm2. By scaling to the smaller emitter size, Cbc was reduced
from 25 fF to 12 fF, allowing fmax to increase from 155 GHz to 200 GHz.
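
As a rough consistency check on these numbers, one can use the common approximation fmax = sqrt(ft / (8 pi Rbb Cbc)), which implies that fmax scales as 1/sqrt(Cbc) when ft and the base resistance stay fixed. That fixed-ft, fixed-Rbb condition is an assumption made here purely for illustration; it is not reported for these devices, and the function name below is hypothetical.

```python
import math


def scaled_fmax(fmax_old_ghz, cbc_old_ff, cbc_new_ff):
    """Scale fmax as 1/sqrt(Cbc), assuming ft and Rbb do not change."""
    return fmax_old_ghz * math.sqrt(cbc_old_ff / cbc_new_ff)


# Values quoted in the text: Cbc reduced from 25 fF to 12 fF,
# fmax increased from 155 GHz to a reported 200 GHz.
predicted = scaled_fmax(155.0, 25.0, 12.0)
print(f"predicted fmax: {predicted:.0f} GHz (reported: 200 GHz)")
```

The idealized estimate comes out somewhat above the reported 200 GHz, which suggests that ft and the base resistance do not stay constant when the emitter is scaled.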
Another approach to reduce the base-collector capacitance is to transfer the epitaxial
material to another substrate and process the collector with the bottom-side up. This is
referred to as the transferred-substrate approach. The subcollector normally present is
removed and the device is interconnected by a Schottky contact. A Cbc of 3.2 fF is
extracted as a result of the reduction in the collector area.
A third method is to continue using wet etching to form the mesa. With the use of an
InP collector, InGaAs base and subcollector, it is possible to overetch the collector with
an InP etch that is selective to InGaAs. Gutierrez et al., have used this method to increase
Fmax to 263 GHz.12
A final approach is the lateral scaling of device dimensions.13 The length of the base
contact around the periphery of the emitter can be optimized to obtain the best base-
resistance/base-collector capacitance tradeoff as evidenced by fmax. In comparison to our
previously reported baseline transistors, the modified layout alone resulted in a 0.38
picosecond decrease in the total transistor delay, TF, for thin (2000 Å) collector
structures.14
The use of modern lithographic equipment benefits the HBT fabrication both in the
reduction of the minimum printable critical dimension (CD) as well as improving the
overlay accuracy. The overlay accuracy, λ, determines how closely features can be
spaced. These advantages can be seen for generic layouts shown in figure 5. This figure
shows the effects of reductions in CD and λ (both by a factor of 2) in the base-collector
mesa area. A 4 times reduction in area is obtained for this example. The reduction in
minimum feature size and overlay tolerance greatly reduces the base-collector area
without requiring drastic process modifications.
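
The factor-of-four area reduction follows from simple geometry: if each side of the mesa is built from multiples of the critical dimension and the overlay tolerance, halving both halves each side. The sketch below uses a hypothetical layout (the multipliers and the starting dimensions are assumptions for illustration, not the design rules of any process described here).

```python
def mesa_area(cd_um, overlay_um, n_cd_w=2, n_ov_w=3, n_cd_l=1, n_ov_l=2):
    """Area of a hypothetical base-collector mesa whose width and length
    are sums of multiples of the critical dimension and overlay tolerance."""
    width = n_cd_w * cd_um + n_ov_w * overlay_um
    length = n_cd_l * cd_um + n_ov_l * overlay_um
    return width * length


old_area = mesa_area(cd_um=2.0, overlay_um=1.0)   # old design rules (assumed)
new_area = mesa_area(cd_um=1.0, overlay_um=0.5)   # both reduced by a factor of 2
print(old_area, new_area, old_area / new_area)
```

Any layout whose dimensions are linear in CD and overlay tolerance gives the same ratio, regardless of the particular multipliers chosen.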

Collector Mesa Fabrication and Planarization


Collector isolation follows the base-collector mesa formation. This forms the third mesa
in the triple-mesa structure. InP-based HBTs benefit from selective etches available
between GaInAs used for the subcollector and the InP substrate. This allows one to stop
precisely at the device-substrate interface and ensure uniformity across the wafer.
Following HBT device fabrication, the resistors and capacitors are fabricated and then the
wafers are covered with polyimide. Figure 6 shows how the emitter metal extends
through the polyimide to make contact with the second level of metal interconnect. Other
approaches include use of spin-on-glass or BCB as the planarizing agent.15
One major benefit from scaling the transistor laterally is the vertical scaling that is
done to prevent parasitic resistances from dominating the transistor speed. The epitaxial
thickness of the 1st generation transistor was approximately twice that of the 2nd
generation transistor. This reduction in thickness has several advantages. First, the
growth time is reduced significantly. This is important for consumer applications where
the wafer quantities are much greater. Second, the non-planarity is reduced allowing
features to be spaced closer. Reducing the mesa height has also been shown to improve
yield.11

[Layout sketches under old and new design rules; labels: Emitter, Base Via, Base-Collector Mesa, CD1]

Figure 5. Comparison of layouts using old design rules and new design rules. Reduction
in minimum CD and improved overlay accuracy allow 4 times reduction in
base-collector mesa area.

Figure 6. SEM image of emitter metal extending through polyimide to make connection
to the second level metal interconnect.

5. Device Geometry and Critical Parasitics


The basic triple-mesa HBT device structure outlined in section 4 utilizes a base
contact that is self-aligned to the emitter. No other feature of the device is self-aligned in
either the first or second generation technology. The self-aligned base results in a ~0.15
µm spacing from the edge of the base metal to the emitter/base junction. Just as with a field
effect device (both CMOS and MESFET) there is only one true critical layer to layer
registration and that is the dimension that defines the access resistance: the base
resistance in the case of the HBT and the source resistance in the case of the FET. Self-
aligning other features is neither necessary nor desirable as it tends to put unnecessary
constraints on other device features such as epitaxial layer thickness. The collector
contact is thus optically aligned to the base. A second generation HBT is shown in figure
7 below which depicts the three terminals and typical layer-layer registration.

[Cross-section image; labels: Emitter, Collector]

Figure 7. Focused Ion Beam cross-section of 1 µm AlInAs/GaInAs HBT

The three-mesa process results in some overlap of base and collector resulting in
unwanted extrinsic capacitance. This extrinsic capacitance is reduced to about 4fF in the
second generation process and does not impact device or circuit performance markedly at
the 1 µm minimum feature size. This component of capacitance does not scale with
emitter geometry. The base metal ring around the emitter must be maintained at a
minimum width to assure low contact resistance and therefore low overall base
resistance. This ring of base metal also contributes to an "extrinsic" capacitance but one
that scales with the emitter length. Agarwal16 has demonstrated an alternative method to
minimize parasitics but it involves considerably more process complexity and a non-
standard, transferred-substrate process. Still it shows promise; the highest fmax HBT, at
over 1 THz, has been demonstrated with this method17. Others have used undercutting of
the collector to improve transistor performance without significant changes to the
conventional process18. Still others have supported the base contact on a dielectric
instead of the semiconductor to minimize the capacitance19. Some method to eliminate
the capacitance associated with supporting the base contact needs to be employed to
create a truly scalable HBT technology to deep submicron dimensions. However, the
methods published thus far clearly reduce producibility as no large circuits have been
demonstrated in such reduced capacitance processes.

6. Device Characterization
DC
A typical Gummel plot for the G1 process is shown in figure 8 and an Ic-Vce
characteristic plot is shown in figure 9. The primary differences between GaAs/AlGaAs HBTs
and AlInAs/GaInAs HBTs can be illustrated with these two plots. GaAs-based single
heterostructure bipolar transistors (SHBTs) have significantly larger offset voltages than
InP-based devices. A typical GaAs HBT has an offset of 200-300 mV while the
comparable InP device has an offset of 50-150 mV. The offset voltage arises because
the two p-n junctions (base/collector and base/emitter) have dissimilar materials and
dissimilar areas and therefore dissimilar current-voltage dependence. The differences are
not as great in the AlInAs/GaInAs HBT as in the AlGaAs/GaAs HBT. Because both
emitter and base are rather heavily doped in an HBT, the GaAs-based HBTs also have a
much larger Vbe turn-on than InP-based HBTs. The turn-on voltage is nearly the full
bandgap of the base material. The ~0.7 V turn-on of AlInAs/GaInAs results in silicon-
like logic gate designs and lower power supply voltages. The one difference from Si or
SiGe in terms of DC characteristics is that the InP-based HBT current gain, β, increases
with increasing current density well beyond the safe operating region of the device.

[Plot; horizontal axis: Vb (mV)]

Figure 8. Forward Gummel plot of a Baseline (G1) 2x5 µm2 device. The plot shows the
results of a two-terminal Gummel measurement with Vbc = 0.

Figure 9. Plot of the forward IV characteristics of a Baseline (G1) 2x5 µm2 device. The
data was taken under constant base current drive.

ft, fmax
Unity current gain cutoff frequencies, ft, as high as 225 GHz have been demonstrated in
our second generation (G2) process with large emitter areas (1.5x8 µm2) but small
parasitic elements. Matched ft and fmax of about 180 GHz are obtained for smaller
transistors (1x3 µm2). Devices as small as 0.25x0.75 µm2 (effective emitter area) have
been fabricated operating at ft = 160 GHz, fmax = 250 GHz in our G2+ process. This
enhanced version of the basic G2 process takes advantage of new lithographic
capabilities only, with no change to epitaxial material or basic process. Device selection
for high-speed logic circuits is based on a tradeoff of speed and power dissipation.
Larger area transistors have higher cutoff frequency but significantly higher power
consumption. Most G2 circuit designs are based on 1x3 µm2 devices, which offer 93% of
the cutoff frequency of long emitter structures (96% for the G2+). It is essential to
optimize an HBT process such that devices close to the minimum lithographic dimension
in both length and width give optimum performance in a logic gate. Such optimization
results in the high speed but also low power device that is essential for the fabrication of
large scale circuits. A circuit designer cannot afford a single logic gate dissipating
hundreds of mW of power even in relatively simple circuits like a 4:1 MUX. Scaling of ft
and fmax with emitter size is plotted in figure 10. Performance characteristics for a
number of device structures differing only in base and collector layer thickness are listed
in Table I.
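
The speed-versus-power tradeoff can be made concrete with a small calculation. If transistors of different sizes are biased at the same collector current density, the current, and hence the power drawn by a current-steering stage, scales with emitter area. In the sketch below the current density (1 mA/µm2) and supply voltage (3 V) are illustrative assumptions, not values from this work, and the function name is hypothetical.

```python
def gate_power_mw(emitter_w_um, emitter_l_um, jc_ma_per_um2=1.0, v_supply=3.0):
    """Power of one current-steering stage biased at a fixed collector
    current density. Both default bias values are assumed for illustration."""
    area_um2 = emitter_w_um * emitter_l_um
    tail_current_ma = jc_ma_per_um2 * area_um2
    return tail_current_ma * v_supply  # mA * V = mW


small = gate_power_mw(1.0, 3.0)   # 1x3 um2 device
large = gate_power_mw(1.5, 8.0)   # 1.5x8 um2 device
print(f"1x3: {small:.0f} mW, 1.5x8: {large:.0f} mW, ratio {large / small:.1f}")
```

Under these assumptions the larger device costs four times the power, which is the kind of penalty that has to be weighed against the few percent of cutoff frequency given up by the smaller device.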

[Plot: ft and fmax (GHz, 100 to 200) versus emitter length (µm, 0 to 6)]

Figure 10. Cutoff frequency (ft, fmax) scaling for 1 µm emitter stripe-width HBT.

TABLE I - EPITAXIAL STRUCTURE AND CORRESPONDING KEY
PERFORMANCE CHARACTERISTICS

Sample  Collector   Base        Max. Divider        Vbe 3σ  ft      fmax    Yield
        Width (Å)   Width (Å)   Toggle Rate (GHz)   (mV)    (GHz)   (GHz)   (Arb.)
A1      2000        500         47.0                5       158     205     34
B1      4000        500         52.9                5       134     243     26
A2      2000        300         49.5                10      215     193     24
B2      4000        300         48.0                9       149     257     20

7. Device Scaling

Bipolar transistor feature sizes have been reduced less aggressively into the submicron
regime than HEMT because they benefit less from horizontal scaling than do field effect
devices. On the other hand, performance gains can be disappointing as the intrinsic
device is scaled vertically without careful management of extrinsic device parasitics. For
example, using a constant 2 µm design rule layout while scaling the collector from 700 nm
to 200 nm (reducing the collector transit time by more than a factor of 3) results in less
than a 2x improvement in ft (ft = 150 GHz) and no improvement in fmax. The equations
commonly used to describe ft and fmax are:

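$$\frac{1}{2\pi f_t} = \tau_F + \frac{kT}{q I_C}\left(C_{be} + C_{bc}\right) + \tau_{RC} \qquad (1)$$

$$f_{max} = \sqrt{\frac{f_t}{8\pi R_{bb} C_{bc}}} \qquad (2)$$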

where τF is the sum of the base and collector delay and τRC is a sum of parasitic RC
charging terms independent of collector current. The delay time scales favorably as the
collector and base thicknesses are reduced. Base transit time is 0.12 ps faster for a 30 nm
ungraded base than for a 50 nm base. For thick base transistors (>100 nm), the base transit
time scales as the square of the base thickness while for thin base devices the scaling is
closer to linear. Transit time in the collector is at least linear with collector thickness. In
the presence of ballistic effects thinner collectors actually have much faster effective
transit velocity20. On the other hand, the RC charging terms in (1) and (2) (including Rbb
Cbc) scale unfavorably as epitaxial layers are thinned because both resistances and
capacitances increase under those conditions.
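
Equations (1) and (2) are easy to evaluate numerically, which helps in seeing how the three delay terms trade off. The following Python sketch is illustrative only: the function names are made up for this example, and the parameter values (transit delay, collector current, capacitances, base resistance) are round-number assumptions rather than measured values from this work.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def ft_hz(tau_f, i_c, c_be, c_bc, tau_rc, temp_k=300.0):
    """Equation (1): 1/(2*pi*ft) = tau_F + (kT/(q*Ic))*(Cbe + Cbc) + tau_RC."""
    v_t = K_B * temp_k / Q_E  # thermal voltage kT/q, in volts
    total_delay = tau_f + (v_t / i_c) * (c_be + c_bc) + tau_rc
    return 1.0 / (2.0 * math.pi * total_delay)


def fmax_hz(ft, r_bb, c_bc):
    """Equation (2): fmax = sqrt(ft / (8*pi*Rbb*Cbc))."""
    return math.sqrt(ft / (8.0 * math.pi * r_bb * c_bc))


# Hypothetical device: 0.5 ps transit delay, 3 mA collector current,
# 10 fF each for Cbe and Cbc, and 0.2 ps of parasitic RC delay.
ft = ft_hz(tau_f=0.5e-12, i_c=3e-3, c_be=10e-15, c_bc=10e-15, tau_rc=0.2e-12)
# Hypothetical 20 ohm base resistance and 5 fF of Cbc charged through it.
fmax = fmax_hz(ft, r_bb=20.0, c_bc=5e-15)
print(f"ft = {ft / 1e9:.0f} GHz, fmax = {fmax / 1e9:.0f} GHz")
```

Raising the collector current shrinks only the middle term of (1), so at high bias the result is set by the transit delay and the current-independent RC term.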
To obtain the full benefit of vertical (epitaxial layer) scaling we must concurrently
scale horizontally. For example, scaling only in the horizontal direction on a 700nm
collector HBT results in no significant improvement in ft because the τRC term in (1) is
small. Scaling only the epitaxial layers (the collector from 700 nm to 200 nm and the base
from 50 to 30 nm) on a 2 µm design rule transistor layout can result in a near doubling of
ft. In this case the first term in (1) is reduced but the third term increases. Combining the
vertical scaling with horizontal scaling to exploit a 1 µm minimum design rule to reduce
parasitics (without scaling the intrinsic transistor) results in an additional 35%
improvement in ft. Typical model parameters for nominal 1, 3 and 5 µm emitter stripe
length HBT in the G2 process are listed in Table II showing the scaling of various device
elements.

TABLE II
TYPICAL HBT MODEL PARAMETERS

Parameter         Symbol   1x1 µm    1x3     1x5
Current Gain      β        40        40      40
CE Breakdown      BVceo    2.9 V     2.9     2.9
Base Resistance   RB       173 Ω     67      34
Collector Res.    RC       13.7 Ω    10.4    10.4
Emitter Res.      RE       23.4 Ω    6.8     3.6
CB Capacitance    CJC      9.3 fF    13.1    17.0
EB Capacitance    CJE      2.0 fF    8.7     16.5
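
One way to read Table II is to form the product of base resistance and collector-base capacitance for each emitter length. The short sketch below does this with the tabulated RB and CJC values; the dictionary layout and function name are choices made for this example, and using the full CJC with the full RB is a simplification, so the products should be read as a trend rather than as the exact charging time that enters fmax.

```python
# Table II values keyed by nominal emitter length in um (1 um stripe width):
# (RB in ohms, CJC in fF)
table_ii = {1: (173.0, 9.3), 3: (67.0, 13.1), 5: (34.0, 17.0)}


def rb_cjc_ps(r_b_ohm, c_jc_ff):
    """Return the RB*CJC product in picoseconds (ohm * fF = fs)."""
    return r_b_ohm * c_jc_ff * 1e-3


products = {}
for length_um, (r_b, c_jc) in table_ii.items():
    products[length_um] = rb_cjc_ps(r_b, c_jc)
    print(f"1x{length_um} um: RB*CJC = {products[length_um]:.2f} ps")
```

The product falls as the emitter lengthens because RB drops faster than CJC rises.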

8. Integrated Circuit Process

Although high performance devices are a necessary condition for a high performance
process, alone they are not sufficient. Before we can establish that InP is a suitable
substrate for medium or large scale integrated circuits we must first review some of its
basic properties.
Like GaAs, InP is semi-insulating with an intrinsic carrier concentration at room
temperature of 1.2x10^7 cm^-3. The relative dielectric constant at 12.6 is very close to that
of GaAs, 12.9, so that transmission line designs are very similar. The thermal
conductivity of InP is substantially higher than that of GaAs resulting in better heat
sinking on power transistors or high density circuits. Handling of InP through the IC
process is similar to GaAs; standard wafer thickness and density are similar, so most
equipment used for GaAs can be readily converted for use with InP. There is no physical
basis for additional susceptibility to breakage. Any differences can be attributed to the
relative maturity of substrate growth technology. Our experience with breakage on 3"
wafers has been similar to our experience with 3" GaAs. InP substrates are still
considerably more expensive than GaAs (currently at least four times the price). Again,
this is primarily an experience curve effect and not a fundamental, sustainable difference.
As discussed in section 4, the HRL HBT IC process includes passive components:
Si3N4 capacitors with capacitance per unit area of 300 fF/mm2, TaN thin film resistors
with resistance of 50 Ω/square and transmission lines and spiral inductors with excellent
RF properties. Most of the surface processing (lithography, metal and dielectric
deposition and etching) has its origins in work done on GaAs and is still very similar21.
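
The sheet resistance quoted for the TaN film converts directly into a drawn resistor geometry, since resistance is the sheet resistance times the number of squares (length divided by width). The sketch below shows the arithmetic; the function name and the example dimensions are hypothetical, and contact resistance is ignored.

```python
def tan_resistor_ohms(length_um, width_um, sheet_ohm_per_sq=50.0):
    """Resistance of a thin film resistor from its drawn length and width,
    using the 50 ohm/square TaN sheet resistance quoted in the text."""
    squares = length_um / width_um
    return sheet_ohm_per_sq * squares


# Example: a 20 um long, 5 um wide resistor is 4 squares.
print(tan_resistor_ohms(20.0, 5.0))
```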

9. Device Matching
A key advantage of bipolar transistors (both HBT and BJT) is that the physics of the
base-emitter p-n junction determines the turn-on voltage. In field-effect devices:
MESFETs, HEMTs and MOSFETs, the specifics of doping, implant depth or oxide
thickness determine the threshold voltage. Thus, in general and especially in III-V
compound semiconductors, the turn-on uniformity for bipolar transistors is superior to
that for field effect devices. This is true of both long-range (wafer) uniformity and short
range (pair match) uniformity. A typical result of pair match uniformity for InP HBT is
shown in figure 11. Typical field-effect technologies report threshold standard deviation
no better than about 10 mV; here the standard deviation of Vbe turn-on is less than 2 mV.
Differential logic gates with closely matched device pairs will operate correctly even at
very small input logic swings. This is essential for high frequency operation where signal
attenuation, particularly at the input, can be severe.
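
To see why pair matching matters at small logic swings, consider an ideal bipolar differential pair, whose tail current divides as a tanh function of the input voltage minus any Vbe offset between the two transistors. The sketch below uses that textbook idealization. The 100 mV swing and the use of a 3-sigma offset are illustrative assumptions, not design values from this work, and the 10 mV case simply shows what the same pair would do with a larger offset.

```python
import math

V_T = 0.02585  # thermal voltage kT/q near 300 K, in volts


def steered_fraction(v_in, v_offset=0.0, v_t=V_T):
    """Fraction of the tail current carried by the 'on' transistor of an
    ideal bipolar differential pair with input v_in and Vbe offset v_offset."""
    return 0.5 * (1.0 + math.tanh((v_in - v_offset) / (2.0 * v_t)))


swing = 0.100  # assumed differential input swing, volts
for sigma_mv in (2.0, 10.0):              # standard deviations quoted in the text
    worst_offset = 3.0 * sigma_mv * 1e-3  # 3-sigma offset, volts
    frac = steered_fraction(swing, worst_offset)
    print(f"sigma = {sigma_mv:4.1f} mV -> {100 * frac:.1f}% of tail current steered")
```

The smaller the offset, the more completely a given input swing switches the pair, which leaves more margin when the input signal is attenuated.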

[Histogram: horizontal axis Vbe Difference, Adjacent HBT (mV), -10 to 10; vertical axis 0 to 60]

Fig. 11 - Distribution of the difference in measured Vbe (at high current density) between
adjacent (50 µm spaced) 1x3 µm HBT.

10. HBT Device Modeling

HBT RF Performance
As discussed in section 4, the baseline HBT process at HRL L.L.C. produces devices
with a unity current gain cutoff frequency, ft, of over 80 GHz and fmax greater than 150 GHz. This process
utilized a 2.0 µm minimum CD. A second-generation device with scaled vertical and
lateral dimensions (1.0 µm minimum CD) has an ft over 150 GHz and an fmax of greater
than 180 GHz. There appear to be further advantages in both the vertical and lateral
scaling of this 2nd generation device. Lateral scaling improves device performance
through the reduction in the magnitude of the base-emitter and base-collector
capacitances.
Although a BJT and an HBT are both bipolar devices, the differences between them
mean that it is not straightforward to use standard BJT SPICE models to simulate the
performance of HBT devices and circuits.

Base Current
As an example of the differences between HBTs and BJTs we look at base currents
for the two devices. Many advanced InP HBT devices are fabricated using mesa
isolation. This method of fabrication leads to different current component contributions
than in a typical silicon BJT, which is fabricated primarily below the semiconductor surface
using implanted and diffused dopants.
The base current in a mesa-type HBT consists of several components: (1) recombination
current in the emitter-base space-charge region, ISCR; (2) surface recombination at the
extrinsic emitter and base sidewalls, IRS; (3) recombination in the quasi-neutral base, IRB;
and (4) injection current from the base into the emitter, IRE. These current components
can be expressed by the following equation:22

$$I_B = I_{SCR} + I_{RS} + I_{RB} + I_{RE}$$

$$I_B = I_1 \exp(V_{BE}/2V_t) + I_2 \exp(V_{BE}/2V_t) + I_n(X_2)(1-\alpha) + I_p(X_1)\exp(-V_B/V_t) \qquad (3)$$

where I1 and I2 are constants for the recombination currents ISCR and IRS; In(X2) is the
electron current at the edge of the quasi-neutral base; Ip(X1) is the hole current at the edge
of the quasi-neutral emitter; α is the base transport factor; and VB is the valence-band
barrier potential across the emitter-base junction.
This equation neglects leakage currents along the periphery of the base-emitter
and base-collector junctions. In fact, at low values of Vbe, this leakage term is the
dominant contributor to the base current. However, at bias values typical of forward
Gummel plots, this term is negligible. At the bias voltages used in typical circuits, the
three current components IRB, IRS, and ISCR are all important contributors to the total base
current. At these bias levels, the term IRB is proportional to the collector current and
therefore exhibits a Vt-like I-V characteristic. Likewise, IRE becomes significant at larger
values of Vbe and exhibits a slope close to Vt. The base current component ISCR arises
from recombination in the space-charge region. From the traditional Shockley-Read-Hall
analysis and equation (3) above, this process exhibits a slope of 2Vt. These three currents
result in a combined ideality factor of n≈1.5 for GaAs HBTs [1] and n≈1.3 for InP HBTs.
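
The way components with slopes of Vt and 2Vt blend into an intermediate ideality factor can be sketched as follows. If each component varies as exp(Vbe/(n_k Vt)), the local slope of ln(IB) gives 1/n_eff as the current-weighted average of 1/n_k. The component magnitudes below are hypothetical, and this is an illustrative calculation rather than the extraction procedure used by the authors.

```python
def effective_ideality(components):
    """Local ideality factor of a sum of exponential current components.

    components: list of (current, ideality) pairs evaluated at one bias point.
    Uses 1/n_eff = sum(I_k / n_k) / sum(I_k).
    """
    total = sum(current for current, _ in components)
    weighted = sum(current / ideality for current, ideality in components)
    return total / weighted


# Hypothetical split: equal parts n=1 (Vt-like) and n=2 (2Vt-like) current.
equal_split = effective_ideality([(1.0, 1.0), (1.0, 2.0)])
print(f"effective ideality for an equal split: {equal_split:.3f}")
```

Because the weights change with bias, the effective ideality is itself bias dependent; a single extracted value summarizes a particular bias range.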
Accurate simulations of the large signal characteristics of HBTs are possible using
the traditional Gummel-Poon model with a base current ideality factor extracted from
measured data. Figure 12 shows the forward Gummel plot of a 1x3 µm² HBT device
fabricated at HRL that shows a base ideality factor n=1.23. The ideality factor of the
collector current (nf in the SPICE BJT model) was 1.06. The simulated data (dashed
lines) matches the measured data (solid lines) well across a large range of bias conditions.
The forward I-V curves are shown in figure 13, where again the simulated and measured
data are superimposed. This figure shows excellent agreement between the measured and
simulated data across this bias region. The I-V data was limited to a collector current
density of 4 mA/µm² and a power density of 4 mW/µm².
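
A common way to extract an ideality factor from Gummel data is a straight-line fit of ln(I) against Vbe over the exponential region, with n = 1/(slope x Vt). The sketch below is an assumption about the general approach, not the authors' tool flow; the data are synthetic, generated with n = 1.23 purely to check that the fit recovers it, and the saturation current is hypothetical.

```python
import math

VT = 0.02585  # thermal voltage near 300 K, in volts


def fit_ideality(v_be, current, vt=VT):
    """Least-squares slope of ln(I) vs Vbe, converted to ideality n = 1/(slope*vt)."""
    count = len(v_be)
    log_i = [math.log(i) for i in current]
    mean_v = sum(v_be) / count
    mean_l = sum(log_i) / count
    s_vv = sum((v - mean_v) ** 2 for v in v_be)
    s_vl = sum((v - mean_v) * (l - mean_l) for v, l in zip(v_be, log_i))
    slope = s_vl / s_vv
    return 1.0 / (slope * vt)


# Synthetic base-current data (hypothetical saturation current of 1 fA).
v_demo = [0.50 + 0.02 * k for k in range(11)]
i_demo = [1e-15 * math.exp(v / (1.23 * VT)) for v in v_demo]
n_fit = fit_ideality(v_demo, i_demo)
print(f"fitted ideality factor: {n_fit:.3f}")
```

On measured data the fit window matters: leakage at low Vbe and series resistance at high Vbe both bend the curve away from a single exponential.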


Figure 12. Forward Gummel plot of a 1x5 µm² device. The plot shows the results of a
two-terminal Gummel measurement with Vbc=0. The measured data is plotted
as solid lines and the simulated data as dashed lines.

Figure 13. Plot of the forward I-V characteristics of a 1x5 µm² device. The data was taken
under constant base current drive. The measured data is plotted as solid lines
and the simulated data as dashed lines.

Another example of a difference between HBTs and BJTs that is a concern of the
modeling engineer is base-width modulation, otherwise known as the Early effect. The
HBT devices at HRL utilize extremely highly doped bases. This high doping ensures that
base-width modulation is negligible in these devices. Looking at the data in figure 13, the
modeling engineer might be tempted to use the SPICE term VAF to model the increase in
collector current at higher values of Vce. The use of VAF here is incorrect, since this
increased current is a result of avalanche multiplication and changes in the device
temperature.
Small signal simulations of HBT performance are accomplished using the hybrid-π
model depicted in figure 14. Careful fitting of the model parameters to the measured data
can yield adequate fits across the bias and frequency range of the measured data.
However, it is often difficult to obtain good agreement with measurement at substantially
higher frequencies.

Figure 14. Small-signal equivalent circuit model of the HBT devices used for simulation.

An important aspect of modeling high frequency performance is the modeling of
on-wafer parasitic elements. The parasitic inductances and capacitances are difficult to
model without a rigorous electromagnetic solution. Therefore, their values are
determined here by estimation and fitting parameters to measured data.
Figure 15. S-parameters vs. frequency for 7 base current bias levels for a 1x5 µm² HBT
from 0.5 GHz to 26.5 GHz: (a) s11, (b) s12, (c) s21, and (d) s22. The measured
data is plotted as solid lines and the simulated data as dashed lines.

A plot of measured and simulated unity current gain frequency versus collector current
(ft vs. Ic) is shown in figure 16 below. Again the measured and simulated data show excellent
agreement across all values of collector current bias. The measured value of ft is
extrapolated from the h21 data assuming a slope of -20 dB/decade above the highest
measured frequency (26.5 GHz). Figure 17 shows the measured and simulated values of
h21 plotted versus frequency. The plot shows seven curves representing the different
values of base current drive.
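
The extrapolation described above amounts to a one-line calculation: on a -20 dB/decade roll-off, |h21| is inversely proportional to frequency, so ft is the measurement frequency multiplied by the linear current gain at that frequency. The sketch below assumes a single-pole roll-off and uses a hypothetical gain reading; it is not measured data from this work.

```python
import math


def extrapolate_ft(f_meas_hz, h21_db):
    """Extrapolate ft from |h21| in dB at f_meas, assuming -20 dB/decade."""
    return f_meas_hz * 10.0 ** (h21_db / 20.0)


# Hypothetical reading: |h21| = 6 (about 15.6 dB) at the 26.5 GHz band edge.
h21_db_demo = 20.0 * math.log10(6.0)
ft_demo = extrapolate_ft(26.5e9, h21_db_demo)
print(f"extrapolated ft: {ft_demo / 1e9:.1f} GHz")
```

The result is only as good as the single-pole assumption; if the device has not yet reached the -20 dB/decade region at the band edge, the extrapolated ft will be in error.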
Figure 15 shows simulated (dashed) and measured (solid) s-parameters for a 1x5 µm²
HBT device. The data is taken in the frequency range from 0.5 to 26.5 GHz and at 7 values
of base current bias. The simulated data in the figures matches the measured data well
across all values of bias and frequency.


Figure 16. Plot of ft vs. Ic for a 1x5 µm² HBT device. The simulated data is plotted as
dashed lines and the solid lines represent the measured data.

Figure 17. Simulated and measured h21 vs. frequency (log scale) of a 1x5 µm² HBT device.
The simulated data is plotted as dashed lines while the solid lines represent
the measured data.

11. High Speed Circuit Design and Performance Benchmarking

Static Divider Performance

[Figure 18: oscilloscope traces labeled Input and Output; time scale 50 ps/div.]

Figure 18. Frequency synthesizer output (left) and sampling oscilloscope input and
output (right) of the 62.7 GHz divide-by-16 circuit. The vertical scale of the
output trace is 0.2 V/div and the input is on an arbitrary scale, since the input
signal depends upon cable losses and the input bandwidth of the test equipment.

The basic divide-by-16 circuit design includes an input buffer, four master-slave CML
flip-flops buffered by emitter followers, a bias circuit and an output buffer configured as
standard cells. SPICE simulations performed prior to fabrication were found to be
accurate, predicting the AC and DC performance within 15% and 5%, respectively. The
design was optimized for current densities of 2x10^5 A/cm². Only a single value of load
resistor was selected based on these simulation results. We anticipate that optimization of
the load resistance could result in faster circuits. Figure 18 shows the measured frequency
synthesizer output and sampling oscilloscope input and output of the divide-by-16 circuit
operating at 62.7 GHz. The input frequency was generated with a frequency synthesizer
and the output verified using a spectrum analyzer.

A sensitivity analysis using SPICE indicated that the design performance was 6x more
sensitive to Cbc than to any other electrical parameter. Apart from Cbc, we found that only an
increase in current density would improve performance. As a result, an effort was made to
optimize Cbc for selected circuits without changing the overall process flow. We also
note that higher simulated ft and fmax values did not always result in faster circuits, and
in fact sometimes resulted in slower circuits. The main reason for this is that ft and fmax
describe unity fan-in and fan-out device characteristics, whereas circuits typically have
larger nodal transistor capacitance values. For example, a thicker collector decreases ft
but may provide better circuit performance because Cbc is reduced. The results of the
sensitivity analysis with respect to CJC (the SPICE zero-bias base-collector capacitance
parameter) are summarized in figure 19, which plots the
relative value of CJC versus predicted maximum divider frequency. Although divider
performance is more sensitive to CJC than to any other parameter for these devices, the
relationship is non-linear.

Figure 19. Predicted static divider performance from SPICE modeling as a function of the
relative CJC value, with all other SPICE parameters fixed.

The base-collector capacitance of the conventional transistor suffers from a large
parasitic contribution from the base contact. By changing the geometry of the layers
defining this contact, we were able to substantially reduce the contribution from this
parasitic component without modifying the process flow. The same circuit layout was
replicated on a mask set with the improved transistor design and fabricated so that
different transistor designs could be evaluated side-by-side.
Both performance and yield of dividers were significantly higher on the circuit layouts
that utilized modified transistors with lower Cbc. In addition, we also observed an
improvement of the cutoff frequency ft from 155 GHz to 177 GHz and of fmax from 190
GHz to 235 GHz due to the reduction in Cbc (figures 21 and 20, respectively). This
measurement of fmax has not been optimized as a function of collector bias voltage.

[Figure 20 plot; legend: prior transistor design, reduced Cbc design; horizontal axis: Current Density (A/cm²).]

Figure 20. Improved fmax obtained by changes to the transistor design to reduce Cbc.

[Figure 21 plot; legend: prior transistor design, reduced Cbc design; horizontal axis: Current Density (A/cm²), 1.00E+03 to 1.00E+06.]

Figure 21. Improved ft obtained by changes to the transistor design to reduce Cbc.

We obtain optimal performance when base doping produces a base sheet resistance
less than 500 ohms/square. Below this value, we found that performance was less
sensitive to the base resistance. We also note that the extrinsic contribution to Cbc can
cause the measured ft to be sensitive to the base resistance.
The divider operation was well behaved up to the best-case maximum operating
frequency of 62.7 GHz with the reduced Cbc divider. This compares to a 53
GHz best-case result with the baseline layout. High yield above 50 GHz was obtained,
with average performance of 55.6 GHz and a standard deviation of 2.7 GHz for the
reduced Cbc structure. Although power supply voltages were optimized to obtain 62.7
GHz operation, operation at greater than 50 GHz was routinely observed at a fixed supply
voltage of 3 V.
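
As a rough consistency check on the quoted statistics, one can ask what share of dividers would exceed 50 GHz if the maximum operating frequency were normally distributed with the stated mean and standard deviation. The normal distribution is an assumption made here for illustration; the text does not state the shape of the distribution or how non-functional circuits were counted.

```python
import math


def fraction_above(threshold, mean, std_dev):
    """Fraction of a normal population lying above the threshold."""
    z = (threshold - mean) / std_dev
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Mean and standard deviation quoted for the reduced-Cbc dividers, in GHz.
share = fraction_above(50.0, 55.6, 2.7)
print(f"estimated share above 50 GHz: {100 * share:.1f}%")
```

This estimate describes only the spread among functioning dividers; overall circuit yield also depends on defects that prevent operation altogether.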

[Figure 22 plot: Max. Freq. vs. Input Voltage; horizontal axis: Maximum Frequency.]

Figure 22. Required input voltage for proper operation of the divider design. The design
is static over a broad range of operation. Operation at frequencies lower than
100 MHz has been obtained.

Comparison with previously published results for InP-based dividers is difficult
because the published designs have not always been completely static.18 It is well
recognized that higher performance may be obtained with dynamic designs. The HRL
divide-by-16 design reported here has been operated over a wide range of frequencies at a
variety of input levels, as shown in Figure 22, demonstrating that it is fully static. We have not
investigated in detail the lowest possible frequency of operation, but we have routinely
tested the circuit at frequencies as low as 50 MHz using high slew rate digital sources.
We have also performed SPICE simulations of a dynamic variation on the basic
divider design to evaluate and compare performance to other results.23 The SPICE
simulation agrees with the static divider measured result as shown in Figure 23 (left). The
dynamic modification operates in simulation to a substantially higher frequency of 80
GHz as shown in Figure 23 (right). Hence, higher performance could be obtained with a
dynamic design. However, fully static designs are preferred for many applications
because they have lower phase noise and operate over a much broader range of
conditions.


Figure 23. SPICE simulation of the output spectrum of fully static divider (left) with a 62
GHz input and dynamic divider (right) with an 80 GHz input using the same
device models and a similar circuit topology.

The improved phase noise of a static master-slave design is related to the dual
sampling of the master-slave flip-flop typically used in such designs. Since the sampling
apertures of the master and slave differ in phase by 180 degrees and by design do not
overlap, changes in the timing edge do not easily propagate around the divider feedback
loop. In dynamic designs, the sampling apertures of the master and slave overlap, and
any delay variation due to noise sources can more easily propagate around the
loop. The dynamic design is more realistically viewed as a phase-locked ring oscillator
with low Q.

Performance Benchmarking

Digital circuits in any high-speed technology are typically benchmarked by the
performance of static frequency dividers. A static frequency divider is usually composed
of two D flip-flops, the master and the slave, and represents the minimal representative
unit that can be built and evaluated easily. Performance of a static divider is a recognized
figure of merit for digital integrated circuit processes because a static frequency divider
uses the same basic flip-flop elements found in more complex sequential circuits.24
Although power dissipation of a frequency divider alone is rarely a limiting factor in
the system designs in which it is used, power is an important benchmark of the
technology because it can limit the integration level and therefore the available
functionality of more complex circuits. This is especially true at the highest operating
rates, where power dissipation per gate is on the order of hundreds of mW for some
competing technologies.
For a number of years we have used a divider designed around a conventional CML D
flip-flop as a benchmark of our evolving process. The circuit diagram of this basic
master-slave flip-flop has been reported previously.24 The most recent high-speed divider used
our G2+ transistors with 198 GHz ft and was designed with load and current source
resistances of 35 ohms. The bias current for each divider stage in the chain was 9 mA, or
3 mA/µm², which corresponds to a current density of 3x10^5 A/cm². The only modification
to the circuit over previously reported results was minor optimization of the load
resistance and the insertion of the G2+ HBT in place of the G2 HBT.5 The interconnect
wiring was not changed, which provided a direct comparison of the performance
improvement available from the reduction of device parasitics alone.
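
The unit conversion behind the quoted current density, and the logic swing implied by the bias values, can be checked with a few lines. The sketch assumes that the full 9 mA stage current is steered through a single 35 ohm load, which is the usual CML idealization but is not stated explicitly in the text; the function names are illustrative.

```python
UM2_PER_CM2 = 1.0e8  # 1 cm^2 = 1e8 um^2


def ma_per_um2_to_a_per_cm2(j_ma_per_um2):
    """Convert a current density from mA/um^2 to A/cm^2."""
    return j_ma_per_um2 * 1.0e-3 * UM2_PER_CM2


def load_swing_volts(i_stage_ma, r_load_ohm):
    """Voltage drop when the full stage current flows through one load resistor."""
    return i_stage_ma * 1.0e-3 * r_load_ohm


print(f"3 mA/um^2 = {ma_per_um2_to_a_per_cm2(3.0):.1e} A/cm^2")
print(f"9 mA into 35 ohm = {1e3 * load_swing_volts(9.0, 35.0):.0f} mV")
```

The same conversion shows that 9 mA at 3 mA/µm² corresponds to an emitter area of 3 µm².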

[Figure 24 plot; horizontal axis: Input Frequency (GHz), 0 to 80; annotation: Synthesizer Output Limits.]

Fig. 24. Sensitivity to input power level of 72.8 GHz divider. Maximum available output
power of synthesizer is 9 dBm.

A plot of divider sensitivity to input signal is shown in figure 24. The divider operated
with 0 dBm up to 63 GHz without taking account of cable losses. Taking losses into
account, the divider operated up to approximately 67 GHz at 0 dBm. A maximum toggle rate
of 72.8 GHz was achieved with input power of less than 5 dBm at the input buffer
(8.6 dBm from the signal generator). The comparable result with the previous G2 HBT is
53 GHz. Figure 24 is important from a benchmarking perspective because it establishes
that the circuit operates over the entire frequency range from dc to the maximum toggle
rate (minimum frequency of test was 100 MHz). A straightforward dynamic divider using
a single differential amplifier fed back to a mixer should result in frequency division from
about ft/3 to ft, so showing frequency division over a limited frequency range does not
establish performance of a flip-flop that is widely applicable in sequential circuits. In a
static divider we would expect a maximum toggle rate of ft/2; in this case we obtained a
toggle rate of 37% of ft, which is similar to our previous result and indicates that, in the
presence of device and interconnect parasitics, the delay elements have complex
relationships. In automated probing of dividers at a fixed power supply voltage of 3.1 V
we obtained excellent uniformity of the toggle rate, as shown in figure 25. This is a
hallmark of HBT CML circuits.
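
The ratios quoted in this paragraph follow directly from the 72.8 GHz toggle rate and the 198 GHz ft given earlier for the G2+ transistors. The short sketch below reproduces them; the helper names are illustrative only.

```python
def toggle_fraction(f_toggle_ghz, ft_ghz):
    """Maximum toggle rate expressed as a fraction of ft."""
    return f_toggle_ghz / ft_ghz


def dynamic_divider_range(ft_ghz):
    """Approximate operating range (ft/3 to ft) cited for a simple dynamic divider."""
    return (ft_ghz / 3.0, ft_ghz)


FT_GHZ = 198.0       # ft of the G2+ transistors
F_TOGGLE_GHZ = 72.8  # measured maximum toggle rate

print(f"toggle rate / ft = {100 * toggle_fraction(F_TOGGLE_GHZ, FT_GHZ):.0f}%")
print(f"ft/2 expectation = {FT_GHZ / 2.0:.0f} GHz")
print("dynamic divider range (GHz):", dynamic_divider_range(FT_GHZ))
```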
[Figure 25 histogram; horizontal axis: Maximum Divider Toggle Rate (GHz).]

Figure 25. Distribution of maximum toggle rates on-wafer at a constant power supply
voltage of 3.1 V.


Figure 26. Spectrum analyzer display showing the divided output signal at 9.1GHz
corresponding to a 72.8GHz input.

Because there is wide interest in 40 Gb/s logic circuits for communications, there is
interest in a high speed digital technology with adequate process and design margin at
40 GHz or 20 GHz clock rates. To address circuit applications at 20 GHz we designed a
low power version of our CML benchmark divider with 0.5x4 µm² transistors in the
flip-flop and 0.5x2 µm² HBTs in the input buffer. Changes to the circuit design were minimal,
although we did take advantage of the compactness of the smaller HBT to reduce the
interconnect length. Load resistors were chosen for a good speed/power tradeoff and the
resulting bias current density in the HBT was reduced to 5x10^4 A/cm². The maximum toggle rate in
the low power circuit was 36 GHz with a 3.1 V supply. The low power divider stage
dissipated 6.9 mW per flip-flop. This is an effective power-delay product of 24 fJ if we
assume 2 gate delays per flip-flop and an equivalent complexity of 4 logic gates (the
effective number of logic gates is only important in comparing to ring oscillators, since
divider comparisons are unambiguous). This power-delay product is less than one-tenth
of that available in competing materials systems27,28,29 for circuits operating above 20
GHz (Figure 27).
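
One reading of the stated assumptions that reproduces the 24 fJ figure is sketched below: the gate delay is taken as the toggle period divided by the number of gate delays per flip-flop, and the power per gate as the flip-flop power divided by the equivalent gate count. This interpretation is an assumption; the text gives the inputs and the result but not the intermediate arithmetic.

```python
def power_delay_product_fj(p_ff_mw, f_toggle_ghz, delays_per_ff=2, gates_per_ff=4):
    """Effective power-delay product per gate, in femtojoules."""
    gate_delay_s = 1.0 / (delays_per_ff * f_toggle_ghz * 1.0e9)
    power_per_gate_w = p_ff_mw * 1.0e-3 / gates_per_ff
    return power_per_gate_w * gate_delay_s * 1.0e15


# 6.9 mW per flip-flop at a 36 GHz maximum toggle rate, as quoted in the text.
pdp = power_delay_product_fj(6.9, 36.0)
print(f"power-delay product: {pdp:.1f} fJ")
```

The intermediate values under this reading are a gate delay of about 13.9 ps and about 1.7 mW per equivalent gate.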

[Figure 27 plot; horizontal axis: Power Per Flip-Flop, mW (1 to 1000); contours of constant power*delay product (fJ): 15, 30, 60, 120, 240, 480, 960.]

Figure 27. Flip-flop toggle rate vs. power dissipation for a number of dividers reported in
the literature.

Further evolutionary improvements to the device structure and layout are possible that
should provide the 50% added improvement needed to reach 100 GHz clock rates. In
particular, the reduction of device layer thickness, now that lateral dimensions have been
reduced, can lead to an ft of 300 GHz. This high ft will come at the expense of some
breakdown voltage and so will be useful for only a limited set of circuit applications.

12. How Close Are We?


How close are we to demonstrating a 100 GHz divider? The flip answer is that we are
73% of the way there. The fastest static divider thus far demonstrated has a toggle rate of
72.8 GHz. This latest advancement was achieved without special attention to the circuit
design and so there appears to be substantial room for improvement through design alone.
Design, however, tends to lag process advancement by many months. This is the amount
of time designers really need to incorporate all of the real effects of a process
advancement. Two years ago we demonstrated a 53GHz divider which was the fastest at
the time. The HBTs used in that divider were just recently incorporated into an advanced
divider operating at 64GHz. Many of our improvements can be attributed entirely to
careful design techniques and, in particular, the use of microwave design tools to
optimize input match and interconnect distributed parasitics. If designers can squeeze
another 20% improvement in performance out of our current state-of-the-art HBT, as
simulations clearly show, then we should expect close to 100 GHz performance sometime
in the next 18 months from careful divider redesign alone.

13. Developments to Watch For


Developments to watch for in the next several years include:

1. Merging fine line lithography and DHBT material to produce high performance
HBT. This is probably the only way to realize usable 300 GHz HBT and ensure the
ability to design 100 GHz clock-rate circuits.
2. Improvements in base doping leading to 300 Ω/square sheet resistance with a 200 Å
base, perhaps brought about by the incorporation of carbon as the base dopant.
3. Processes with 3 and even 4 levels of interconnect to manage interconnect parasitics.
Continued research on low permittivity dielectrics and other strategies to reduce
capacitive coupling of interconnect lines.
4. Extensive use of mm-wave design techniques to realize digital circuits.
5. Commercialization of 40 Gbps circuits. A few strategic parts in production will
accelerate the development of 100 GHz circuits as researchers look forward to the
next generation of communications circuits.
6. Large scale integrated circuits in InP. These will be the test bed for true performance
improvements in the HBT. Device enhancements that are closely followed by circuit
demonstrations are the ones to watch.
7. Advanced high speed packaging techniques that can combine multiple components
in different material structures on the same substrate. System on a chip will take on
a different meaning from the one understood today as InP HEMT, InP HBT and
SiGe BiCMOS are all combined on the same low cost substrate.
8. Bootstrapping of test capability as high speed circuits are quickly turned around to be
used to test the next generation of high speed circuits.

14. Summary

HRL has been at the forefront of InP HBT research and development since the late
1980's following a long period of leadership in GaAs research beginning in the mid
1970's. During this period we have consistently demonstrated some of the highest
performance devices and circuits as well as many firsts in device and circuit
implementation including: switches, dividers, oscillators, multiplexers, and sigma-delta
modulators. The level of maturity of InP HBT IC technology makes it a strong
candidate to support the needs of future communications requiring 100GHz logic.

Acknowledgements
We gratefully acknowledge the support of the Air Force Research Labs (contract #
F33615-96-C-1924) and the Office of Naval Research (contract # N00014-98-C-0081).
The authors also wish to thank B. Doty for the SEM and FIB work as well as A. Arthur,
Y. Brown, J. Duvall, W. Hoefer, C. Hooper, H. Karatnicki, R. Martinez, and M. Montes
for their help in wafer processing.

References
1
M. Nakamura, "Challenges in Semiconductor Technology for Multi-Megabit Network Services", ISSCC
Proceedings, pp. 16-20, February 1998.
2
R. Van Tuyl, C. Liechti, R. Lee, E. Gowen, " GaAs MESFET logic with 4 GHz clock rate," IEEE J. Solid-
State Circuits, Vol. SC-12, pp. 485-496, Oct. 1977.
3
W.W. Hooper and W. I. Lehrer, "An epitaxial GaAs field-effect transistor," Proc. IEEE, Vol. 55, pp. 1237-
1238, July 1967.
4
J.C.C. Fan, "Heterostructure Device Wafer Manufacturing for Telecom Applications", GaAs MANTECH,
pp. 193-196, April 1999.
5
W.E. Stanchina, J.F. Jensen, R.H. Walden, M. Hafizi, H.-C. Sun, T. Liu, G. Raghavan, K.E. Elliott, M.
Kardos, A.E. Schmitz, Y.K. Brown, M.E. Montes and M. Yung, "An InP-Based HBT Fab for High-Speed
Digital, Analog, Mixed-Signal, and Optoelectronic ICs," GaAs IC Symposium, pp. 31-35, 1996.
6
M. Hafizi, "HBT IC Manufacturability and Reliability," Solid-State Electronics Vol. 41, No. 10, pp. 1591-
1598, 1997.
7
W. E. Stanchina, D. B. Rensch, J. F. Jensen, U. K. Mishra, T. V. Kargodorian, M. P. Pierce, and Y. K. Allen,
"Processing Techniques for the Fabrication of High Speed AlInAs/GaInAs HBT Circuits," SOTAPOCS
Proc., 1990.
8
Q. Lee, S. C. Martin, D. Mensa, R. P. Smith, J. Guthrie, and M. J. Rodwell, "Submicron Transferred-
Substrate Heterojunction Bipolar Transistors," IEEE Electron Device Letters, Vol. 20, pp. 396-398, 1999.
9
G. Freeman, D. Ahlgren, D. R. Greenberg, R. Groves, F. Huang, G. Hugo, B. Jagannathan, S. J. Jeng, J.
Johnson, K. Schonenberg, K. Stein, R. Volant, and S. Subbanna, "A 0.18 µm 90 GHz fT SiGe HBT
BiCMOS, ASIC-Compatible, Copper Interconnect Technology for RF and Microwave Applications," IEDM
Technical Digest, pp. 569-572, 1999.
10
T. Niwa, et al., "High-fT AlGaAs/InGaAs HBTs with reduced emitter resistance for low-power-consumption,
high-speed ICs," 25th ISCS Proc., Nara, Japan, pp. 309-312, 1998.
11
R. F. Kopf, R. A. Hamm, Y.-C. Wang, R. W. Ryan, A. Tate, M. A. Melendes, R. Pullela, Y.-K. Chen, and J.
Thevin, "Dry-Etch Fabrication of Reduced Area InGaAs/InP DHBT Devices for High Speed Circuit
Applications," J. Electronic Materials, Vol. 29, pp. 222-224, 2000.
12
A. Gutierrez-Aitken, et al., "69 GHz Frequency Divider with a Cantilevered Base InP DHBT," IEDM Tech.
Digest, pp. 779-782, 1999.
13
S. Thomas III, C. H. Fields, M. Sokolich, K. Kiziloglu, and D. Chow, "Fabrication of InP-based HBT
integrated circuits," IPRM conf. proc., pp. 286-289, 2000.
14
M. Sokolich, D.P. Docter, Y.K. Brown, A.R. Kramer, J.F. Jensen, W.E. Stanchina, S. Thomas III, C.H.
Fields, D. A. Ahmari, M. Lui, R. Martinez, and J. Duvall, "A Low Power 52.9 GHz Static Divider
Implemented in a Manufacturable 180 GHz AlInAs/InGaAs HBT IC Technology", Proceedings of the 20th
Annual IEEE GaAs IC Symposium, Atlanta GA, pp. 117-120, Nov. 1-4, 1998.
15
R. F. Kopf, R. A. Hamm, R. J. Malik, R. W. Ryan, J. Burm, A. Tate, Y.-K. Chen, G. Georgiou, D. V. Lang,
M. Geva, and F. Ren, "Novel Fabrication of C-Doped Base InGaAs/InP DHBT Structures for High Speed
Circuit Applications," Solid-State Electronics Vol. 42, pp. 2239-2250, 1998.

16
R. Pullela, B. Agarwal, Q. Lee, D. Mensa, J. Guthrie, L. Samoska, M. Rodwell, "Ultrafast Transferred-
Substrate Heterojunction Bipolar Transistors ICs for high-speed fiber-optic transmission", Optical Fiber
Communication Conference and Exhibit, OFC '98, Technical Digest, pp. 314-315, 1998.
17
Q. Lee, S.C. Martin, D. Mensa, R.P. Smith, J. Guthrie, S. Jaganathan, Y. Betser, T. Mathew, S. Krishnan, L.
Samoska, and M. Rodwell, "Submicron Transferred-Substrate Heterojunction Bipolar Transistors with
Greater than 1 THz fmax", Device Research Conference, June 1999.
18
Tang, B.; Notthoff, J.; Gutierrez-Aitken, A.; Kaneshiro, E.; Chin, P.; Oki, A., "InP DHBT 68 GHz frequency
divider", GaAs IC Symposium, pp. 193 -196, 1999.
19
T. Oka, K. Hirata, K. Ouchi, H. Uchiyama, K. Mochizuki, T. Nakamura, "Small-Scaled InGaP/GaAs HBT's
with WSi/Ti Base Electrode and Buried SiO2," IEEE T-ED, Vol. 45, No. 11, pp. 2276-2282, 1998.
20
T. Ishibashi, "Influence of Electron Velocity Overshoot on Collector Transit Times of HBT's," Trans. El. Dev.,
Vol. 37, No. 9, pp. 2103-2105, 1990.
21
Modern GaAs Processing Techniques, Artech House Microwave Library, Ralph E. Williams - Editor,
August 1990.
22
Principles and Analysis of AlGaAs/GaAs Heterojunction Bipolar Transistors, Artech House, Juin H.
Liou, 1996.
23
K. Elliot, S. Thomas III, Y. Brown, A. Kramer, M. Sokolich, M. Lui, and D. Hitko, "An Improved
Performance 62.7GHz Low Power Divide-by-16 InP-Based HBT Circuit," IPRM conf. proc., 2000.
24
J.F. Jensen, M. Hafizi, W.E. Stanchina, R.A. Metzger, and D.B. Rensch, "39.5-GHz Static Frequency
Divider Implemented in AlInAs/GaInAs HBT Technology", GaAs IC Symposium Tech. Dig., pp. 101-104,
1992.
25
H. Knapp, T. Meister, M. Wurzer, D. Zoschg, K. Aufinger, L. Treitinger, "A 79GHz Dynamic Frequency
Divider in SiGe Bipolar Technology", ISSCC Proceedings, pp. 208-209, 2000.
26
M. Sokolich, C. Fields, G. Raghavan, D.A. Hitko, M. Lui, D.P. Docter, Y.K. Brown, M.G. Case, A.R.
Kramer, J. A. Henige, J.F. Jensen, "Optimizing InP HBT Technology for 50 GHz Clock-rate MSI Circuits",
IPRM Proceedings, pp. 195-198, 1999.
27
K. Washio, E. Ohue, K. Oda, M. Tanabe, H. Shimamoto, T. Onai, "95GHz ft Self-Aligned Selective Epitaxial
SiGe HBT with SMI Electrodes", ISSCC Proceedings, pp. 312-313, 1998.
28
M. Wurzer, T. F. Meister, H. Knapp, K. Aufinger, R. Schreiter, S. Boguth, L. Treitinger, "53GHz Static
Frequency Divider in a Si/SiGe Bipolar Technology", Solid-State Circuits Conference, pp. 206-207, 2000.
29
Y. Yamauchi, O. Nakajima, K. Nagata, H. Ito and T. Ishibashi, "A 34.8 GHz 1/4 Static Frequency Divider
using AlGaAs/GaAs HBTs", GaAs IC Symposium, pp. 121-124, 1989.
International Journal of High Speed Electronics and Systems, Vol. 11, No. 1 (2001) 245-256
© World Scientific Publishing Company

CANTILEVERED BASE InP DHBT FOR HIGH
SPEED DIGITAL APPLICATIONS
AUGUSTO L. GUTIERREZ-AITKEN, ERIC N. KANESHIRO,
JAMES H. MATSUI, DONALD J. SAWDAI, JOHANNES K.
NOTTHOFF, PATRICK T. CHIN, AND AARON K. OKI

Semiconductors Product Center


TRW Space & Electronics Group
One Space Park
Redondo Beach, CA 90278, U.S.A.

High speed digital logic is essential in diverse applications such as
optical communication, frequency synthesizers, and analog-digital
conversion. Current research efforts indicate that technologies
utilizing heterojunction bipolar transistors (HBT) are the preferred
approach for systems operating at clock frequencies of 40 GHz and
above (1-6). In this paper we report a novel InAlAs/InGaAs/InP
double-HBT (DHBT) with a cantilevered base layer and undercut
collector. We fabricated and demonstrated an 80 GHz 2:1 digital
frequency divider, and a 5 GHz 8-bit phase/7-bit magnitude Direct
Digital Synthesizer (DDS) chip with approximately 3000 transistors
using this technology.

1. Introduction

In an HBT device, the outside dimensions of the base contact metal determine the
base-collector junction area. However, the collector region that carries most of the
collector current is the area immediately under the emitter. Therefore, the areas
under the base contact metal can be removed with minimal effect on current-driving
capability. Removing the excess collector material under the base layer greatly reduces
the base-collector capacitance (CBC) (7). Since the base layer remains under the base
contact metal, the undercut process has little effect on the base contact and access
resistances. When this excess material is removed from the collector layer, a cantilevered
base HBT with an undercut collector is created, as illustrated in Fig. 1. The reduction in CBC
is proportional to the amount of collector undercut and to the difference between the dielectric
constants of the undercut and non-undercut areas. As a result of the reduction of CBC, the
fMAX and fT of the device are increased.
If the collector layer is totally depleted, the total CBC of the device shown in Fig. 1
is

$$C_{BC} = C_i + 2C_p = \varepsilon_0 \varepsilon_s \frac{(1-\alpha)A}{W_C} + \varepsilon_0 \varepsilon_u \frac{\alpha A}{W_C} \qquad (1)$$


Figure 1. Cantilevered base and undercut collector HBT. (Labels in the drawing: base metal, sub-collector, S.I. InP substrate.)

where Ci is the capacitance of the non-undercut area, 2Cp is the capacitance of the
undercut area, ε0 is the permittivity of vacuum, εs is the dielectric constant of the
collector material, εu is the dielectric constant of the undercut area, WC is the
base-collector depletion width, A is the base layer area, and α is the undercut factor, defined as
the fraction of the base layer area that is undercut, α = A_UNDERCUT/A.
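
As a rough illustration of how the undercut factor enters the capacitance, the sketch below evaluates the two parallel-plate terms of Eq. (1) and the resulting ratio CBC(α)/CBC(0). It assumes both regions share the same depletion width; the dielectric constants are the values quoted later in the paper (12.5 and 1), while the base area and depletion width are hypothetical numbers chosen only to give a femtofarad-scale result.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def base_collector_capacitance(area_m2, depletion_width_m, alpha,
                               eps_s=12.5, eps_u=1.0):
    """Parallel-plate sum of the non-undercut and undercut regions, Eq. (1)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    c_non_undercut = EPS0 * eps_s * (1.0 - alpha) * area_m2 / depletion_width_m
    c_undercut = EPS0 * eps_u * alpha * area_m2 / depletion_width_m
    return c_non_undercut + c_undercut


def capacitance_ratio(alpha, eps_s=12.5, eps_u=1.0):
    """C_BC(alpha) / C_BC(0); area and depletion width cancel out."""
    return 1.0 - alpha * (1.0 - eps_u / eps_s)


# Hypothetical geometry (not from the paper): 30 um^2 base area, 400 nm depletion width.
AREA_M2 = 30e-12
WIDTH_M = 400e-9

for alpha in (0.0, 0.35, 0.7):
    c_bc = base_collector_capacitance(AREA_M2, WIDTH_M, alpha)
    print(f"alpha = {alpha:4.2f}: C_BC = {c_bc * 1e15:5.2f} fF, "
          f"ratio = {capacitance_ratio(alpha):5.3f}")
```

Because the ratio depends only on α and the two dielectric constants, it can be evaluated without knowing the device geometry.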

2. Cantilevered DHBT Structure and Process

The DHBT structure was grown in a solid-source molecular beam epitaxy (MBE)
system equipped with a valved phosphorus cracker. More details concerning the growth
can be found in (8). The structure is grown on a semi-insulating InP substrate and
consists of an n+ InP sub-collector, an n- InP collector, a linearly graded n- InGaAlAs
layer at the collector-base interface to minimize the current blocking due to the
InGaAs/InP conduction band discontinuity, a p+ InGaAs base (or InGaAlAs graded
base), an n InAlAs emitter, and an n+ InGaAs cap layer.

The fabrication process starts with the definition of the emitters, followed by a self-
aligned base metal. Then, the base and the collector layers are etched sequentially with
selective wet etching solutions. It is at this step that the undercut in the collector layer is
formed. Fig. 2 shows an SEM photograph of the cantilevered base HBT cross-section.

After the undercut structure is created, the sub-collector is etched down to the
substrate for device isolation. Next, the collector ohmic metal is evaporated and the
devices are passivated with SiN. Finally, a 100 Ω/sq thin film NiCr resistor and two
layers of gold interconnect metal with airbridge crossovers are used to form the circuits.

Figure 2. SEM photograph of a cantilevered base HBT. (Label in the photograph: cantilevered base layer.)

3. Device Performance

Several groups of wafers with different DHBT structures were fabricated with the
cantilevered base process. Wafers with standard no-undercut devices were also processed
in some of the lots for control purposes. Excellent DC and RF performance was
obtained. Fig. 3 shows the measured I-V output characteristics of both types of devices
for a 1.5 × 10 μm² emitter area HBT with 80 nm base and 700 nm collector with a DC
gain (β) of ~25. The base currents were slightly reduced for the standard DHBT to
differentiate the curves. β values over 100 were achieved for thinner and graded base layers.
The devices demonstrate nearly ideal DC characteristics with low offset voltage and
output conductance. Fig. 4 shows fMAX and fT as a function of emitter current for a 1 ×
10 μm² emitter DHBT with 40 nm graded base and 400 nm collector biased at VCB =
1.25 V.

Figure 3. DC output characteristics for cantilevered and standard HBTs. (Legend: cantilevered DHBT, standard DHBT; horizontal axis: VCE (V).)



Figure 4. Measured fT and fMAX vs. emitter current of a 1 × 10 μm² emitter area
HBT with 40 nm graded base and 400 nm collector. (Horizontal axis: IE (mA).)

The peak fMAX and fT are 263 and 127 GHz, respectively. fT and fMAX were
extrapolated from s-parameter measurements up to 50 GHz of h21 and U, respectively.
The amount of undercut in the collector is well controlled with good uniformity across
the 3 inch wafer. Fig. 5 depicts an fMAX comparison between a cantilevered base device
and a standard, no-undercut device biased at JE = 1 mA/μm² and VCB = 0.75 V. The
figure shows fMAX at four different sites across the wafer. An average improvement of 25
GHz is demonstrated for the cantilevered base type device.

Figure 5. Measured fMAX of a 1.5 × 10 μm² emitter area undercut and no-undercut
HBT with 80 nm graded base and 700 nm collector. Similar results were obtained
for devices with 400 nm collector thickness. (Curves: cantilevered base HBT, non-cantilevered base HBT; horizontal axis: wafer site, Sites 1 to 4.)

Using, as a first approximation, the well-known equation for fMAX,

$$f_{MAX} = \sqrt{\frac{f_T}{8\pi R_B C_{BC}}} \qquad (2)$$

and Equation (1) with εs = 12.5 for InP and εu = 1 for SiN, the 25 GHz increase in fMAX
translates to an undercut factor α of 0.74. This value of α matches very well with α = 0.7
derived from device layout and processing considerations assuming an optimum amount
of undercut.
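
The way an fMAX change maps onto an undercut factor can be written out explicitly. The following is a sketch under two assumptions that the text does not state: that Eq. (1) is a sum of two parallel-plate terms over the same depletion width, and that fT and RB are unchanged by the undercut, so that fMAX scales as the inverse square root of CBC.

```latex
\frac{C_{BC}(\alpha)}{C_{BC}(0)} = 1 - \alpha\left(1 - \frac{\varepsilon_u}{\varepsilon_s}\right),
\qquad
\frac{f_{MAX}(\alpha)}{f_{MAX}(0)} = \left[\,1 - \alpha\left(1 - \frac{\varepsilon_u}{\varepsilon_s}\right)\right]^{-1/2}
\quad\Longrightarrow\quad
\alpha = \frac{1 - \bigl[f_{MAX}(0)/f_{MAX}(\alpha)\bigr]^{2}}{1 - \varepsilon_u/\varepsilon_s}
```

Evaluating the last expression requires the absolute fMAX values of the two device types, which are given only graphically in Fig. 5, so no numerical substitution is attempted here.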

A wafer lot with a thinner collector DHBT structure was also fabricated using the
cantilevered base process to observe the effect of the CBC change on fT. A reduction of CBC
translates into a reduction of the base-collector capacitance charging time (τC)
component of the emitter-to-collector total transit time (τEC). Table 1 shows the peak fT
and fMAX obtained for increasing undercut amounts for a 1.5 × 10 μm² emitter device
with 40 nm graded base and 200 nm collector. The peak fT increased by 22 GHz from
small undercut to large undercut amounts.

Table 1
Measured peak fT and fMAX for increasing amount of undercut
for a DHBT with 40 nm graded base and 200 nm collector

Undercut    Peak fT (GHz)    Peak fMAX (GHz)
Small       173              95
Medium      186              99
Large       195              109

We also investigated the further reduction of CBC by using an "elevated collector"
structure illustrated in Fig. 6. In this structure, the collector and all the layers above it
are elevated to increase the distance from the base to sub-collector (WT). The actual
collector thickness (WC) remains unchanged. In this manner, the contribution of Cp to
CBC, see Eq. (1) and Fig. 1, is reduced without increasing the carrier transit time in the
collector.

We fabricated devices on a wafer lot with an elevated collector DHBT structure
using the cantilevered base process. Two wafers with standard collector (non-elevated
collector) were included in the lot as control wafers. All the wafers in the lot were
processed together with the same undercut factor. The structures had 40 nm graded base
and 400 nm collector layers. The elevated collector structure had WT = 800 nm. Fig. 7
shows the results of measurements of fT and fMAX as a function of emitter current for a
1.5 × 10 μm² emitter area device. As expected, there was minimal change in fT. Both

structures have the same collector thickness and the collector is thick enough for τEC to
be dominated by transit time and not by base-collector capacitance charging time (τC).
Conversely, the peak fMAX increased by 24 GHz, from 192 to 216 GHz, due to the
reduction of CBC.

Figure 6. Elevated collector cantilevered base DHBT structure. (Label in the drawing: base metal.)

Figure 7. Measured fT and fMAX for cantilevered base with and without elevated
collector structures. (Legend: standard collector, elevated collector.)

4. Divider Circuit Design and Performance

Several circuits were designed and fabricated with the cantilevered base DHBT
process. Two examples of these are a frequency divider circuit and a direct digital
synthesizer circuit (DDS). Both circuits used an epilayer structure with 40 nm thick base
layer and 400 nm thick collector layer. Fig. 8 shows the schematic of the divider circuit.

The divider consists of a current mode logic (CML) master-slave flip-flop with the
inverting output connected back to the input. The single-ended input clock is
AC-coupled, with an internal voltage generator providing the appropriate bias level. The
differential output is buffered through cascaded emitter followers and AC-coupled, with
a series resistor providing an output match. The key to this circuit's operation is a
configuration and physical layout that minimize the interconnect delay by keeping
the switching signal trace as short as possible.

Figure 8. Divider schematic.

The circuit is fabricated using 1.5 × 4 μm² emitter area devices, biased at a peak
current density of 2×10^5 A/cm². The overall chip size is 800 × 575 μm² and contains 17
transistors. Fig. 9 shows a photo of the fabricated IC.

Figure 9. Divider IC photo.



The divider was tested on wafer using an RF probe station and an HP8565E 50
GHz spectrum analyzer to monitor the output. As demonstrated in Fig. 10, the divider
operated at input frequencies up to 80.2 GHz. The divider draws 38 mA from a single
-2.6 V supply, dissipating 99 mW.

Figure 10. Output spectrum of the divider circuit with an input clock frequency
of 80.2 GHz and an output frequency of 40.1 GHz. The frequency span in the
figure is 1 MHz.

The sensitivity plot of the divider is shown in Figure 11. In this figure the results
are a combination of two different chips. The circuit design and topology of the two
chips, one operating from 55 to 75 GHz and the other from 75 to 80 GHz, are exactly the
same. The difference between the two chips is that the divider operating at high
frequency was fabricated using a higher speed HBT device. A V-band test set
was used for the measurements up to 75 GHz and a W-band test set for the
measurements above 75 GHz. The vertical axis is not calibrated for cable and probe
losses.

Figure 11. Divider sensitivity. (Plot regions: V-band set, W-band set; horizontal axis: frequency (GHz), 50 to 85.)

In the divider design, two key features are used to extend the frequency of operation
beyond traditional static flip-flop implementations. Inductive peaking is used in the
loads to increase the high frequency gain, allowing use of lower value resistive loads.
The reactive load also provides some phase lead, effectively reducing the propagation
delays at high frequencies. Inductive peaking has been shown to extend the operating
frequency of digital circuits without necessarily impacting low frequency operation (9).
By implementing the inductors as coupled transmission lines, the high coupling
efficiency attained allows short electrical lengths (< 100 μm) and significantly improves
high frequency performance.

The second key feature is a patented scheme for asymmetric current switching that
allows the ratio of track pair to latch pair current to be varied through sharing of current
between the master and slave flip-flop and the use of emitter degeneration resistors in
the clock latch path. By increasing the track pair current relative to the latch current,
the track dynamic characteristics such as setup time and propagation delay can be
improved while preserving the basic static nature of the latching circuit. This is similar
to the "HLO-FF" approach in (10), which separates the track and latch pair currents,
resulting in up to 30% improvement in switching speeds, but with some increase in the
minimum operating clock frequency of the flip-flop if the ratio is made too high. Our
approach differs from the "HLO-FF" approach in that a single current source is shared
between master and slave, which considerably simplifies the layout, and an emitter
degeneration resistor is used instead of scaling device geometries, which allows us to
finely adjust the ratio of currents and use minimum device geometries throughout the
design.

This scheme offers advantages over dynamic circuit techniques such as replacing
the positive feedback latch pair with a negative feedback pair ("super-dynamic" flip-
flop) (9), or RF techniques such as injection locking a broadly tunable oscillator (11).
The frequency range of operation can be easily traded off with the maximum frequency,
and it retains the basic functional characteristics of static dividers such as ease of
integration, small size, and robustness to component variations.

5. Direct Digital Synthesizer (DDS) Design and Performance

A DDS circuit is a monolithic integrated circuit incorporating two main functions, a
Digital Sinewave Generator (DSG) and a Digital-to-Analog Converter (DAC). Wide
bandwidth, high linearity DDS systems have inherent advantages over conventional RF
approaches because of their time, phase, and frequency agility and precision. This level
of control for high speed applications requires very high speed digital clock rates and the
DDS performance is ultimately limited by the characteristics of the device technology.

In order to achieve the ultra fast digital clock rates required, the InP DHBT devices need
to approach near ideal bandwidth and linearity performance. Fig. 12 shows a picture of
the fabricated 8-bit phase/7-bit magnitude DDS chip with approximately 3000
cantilevered base transistors.

Figure 12. DDS chip photo.

The DDS chip was also tested and characterized on wafer using an RF probe
station. Fig. 13 shows a simulated (a) and a measured (b) sine wave output with a
frequency control word of 1. The input clock frequency is 5 GHz and the synthesized
output sine wave frequency is 19.5 MHz.
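
The quoted output frequency follows from the usual phase-accumulator relation f_out = FCW × f_clk / 2^N. The sketch below is a generic behavioral model, not the authors' circuit: it assumes the accumulator width N equals the 8-bit phase word, and it interprets "7-bit magnitude" as a signed output with a 7-bit magnitude (full scale ±127). Both function names are hypothetical.

```python
import math


def dds_output_frequency(fcw, f_clk_hz, phase_bits=8):
    """Output frequency of an ideal phase-accumulator DDS."""
    return fcw * f_clk_hz / (1 << phase_bits)


def dds_samples(fcw, n_samples, phase_bits=8, magnitude_bits=7):
    """Quantized sine samples produced by stepping the phase accumulator."""
    modulus = 1 << phase_bits                 # accumulator wraps at 2^N
    full_scale = (1 << magnitude_bits) - 1    # assumed sign + 7-bit magnitude
    phase = 0
    samples = []
    for _ in range(n_samples):
        samples.append(round(full_scale * math.sin(2 * math.pi * phase / modulus)))
        phase = (phase + fcw) % modulus
    return samples


F_CLK_HZ = 5e9  # clock frequency quoted in the text

for fcw in (1, 127):
    f_out = dds_output_frequency(fcw, F_CLK_HZ)
    print(f"FCW = {fcw:3d}: f_out = {f_out / 1e6:.2f} MHz")
```

Under these assumptions a frequency control word of 1 gives 5 GHz / 256, about 19.5 MHz, and a control word of 127 gives an output just below half the clock frequency.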

Figure 13. Simulated (a) and measured (b) output sine wave of the DDS circuit with an
input clock frequency of 5 GHz and a frequency control word of 1. The output has a
frequency of 19.5 MHz.

Fig. 14 shows a simulated (a) and a measured (b) sine wave output under Nyquist
condition with a frequency control word of 127 and an input clock frequency of 5 GHz.

Figure 14. Simulated (a) and measured (b) output sine wave of the DDS circuit under
Nyquist condition with an input clock frequency of 5 GHz and a frequency control
word of 127.

6. Summary
We developed a novel cantilevered base and undercut collector InP DHBT technology
that demonstrated improvement on RF performance as a result of the reduction of the
base-collector junction capacitance. The devices demonstrate excellent DC
characteristics with low offset voltage and output conductance. A peak fT of 195 GHz
was obtained for a structure with a 200 nm collector layer and a peak fMAX of 263 GHz
was obtained for a structure with a 400 nm collector layer. Using this technology, we demonstrated a 2:1
digital frequency divider operating at 80.1 GHz and dissipating only 99 mW. We also
demonstrated a fully operational DDS chip with more than 3000 transistors operating up
to 5 GHz of clock frequency. To the best of our knowledge, this represents a world
record and demonstrates the potential of the InP HBT for ultra-high speed logic.

Acknowledgment

This work is supported by ONR (Max Yoder, Daniel Purdy) under contract No.
N00014-98-C-0111.
References

1. M. Case, S. Knorr, L. Larson, D. Rench, D. Harame, B. Meyerson, and S. Rosenbaun,
"A 23 GHz Static 1/128 frequency divider implemented in a manufacturable Si/SiGe
HBT process", IEEE Bipolar/BiCMOS Circuits and Technology Meeting, 1995, pp.
121-124.
2. Y. Yamauchi, O. Nakajima, K. Nagata, H. Ito, and T. Ishibashi, "A 34.8 GHz 1/4
static frequency divider using AlGaAs/GaAs HBTs", IEEE GaAs IC Symposium
Tech. Dig., 1989, pp. 121-124.
3. J.F. Jensen, M. Hafizi, W.E. Stanchina, R.A. Metzger, and D.B. Rench, "39.5 GHz
static frequency divider implemented in AlInAs/GalnAs HBT technology", IEEE
GaAs IC Symposium Tech. Dig., 1992, pp. 101-104.
4. Y. Amamiya, T. Niwa, N. Nagano, M. Mamada, Y. Suzuki, and H. Shimawaki, "40
GHz frequency dividers with reduced power dissipation fabricated using high-speed
small-emitter area AlGaAs/InGaAs HBTs", IEEE GaAs IC Symposium Tech. Dig.,
pp. 121-124, 1998.
5. M. Sokolich, D.P. Docter, Y.K. Brown, A.R. Kramer, J.F. Jensen, W.E. Stanchina,
S. Thomas III, C.H. Fields, D.A. Ahmari, M. Lui, R. Martinez, and J. Duval, "A low
power 52.9 GHz static divider implemented in a manufacturable 180 GHz
AlInAs/InGaAs HBT IC technology", IEEE GaAs IC Symposium Tech. Dig., 1998,
pp. 117-120.
6. Q. Lee, D. Mensa, J. Guthrie, S. Jaganathan, T. Mathew, Y. Betser, S. Krishnan, S.
Ceran, and M.J.L. Rodwell, "66 GHz static frequency divider in transferred-substrate
HBT technology", IEEE RF IC Symposium Tech. Dig., 1999, pp. 87-90.
7. W. Liu, D. Hill, H-F. Chau, J. Sweder, T. Nagle, and J. Delaney, "Laterally etched
undercut (LEU) technique to reduce base-collector capacitances in heterojunction
bipolar transistors", IEEE GaAs IC Symposium Tech. Dig., 1995, pp. 167-170.
8. T.P. Chin, A.L. Gutierrez-Aitken, J. Cowles, A-C. Han, E.N. Kaneshiro, T.R. Block,
and D.C. Streit, "Growth and characterization of InAlAs/InGaAs/InP heterojunction
bipolar transistors by valved phosphorus cracker", 7th North American Molecular
Beam Epitaxy Conf., 1998.
9. T. Otsuji, M. Yoneyama, K. Murata, E. Sano, "A Super-Dynamic Flip-Flop Circuit
for Broadband Applications up to 24 Gbit/s Utilizing Production-Level 0.2 μm GaAs
MESFETs", IEEE GaAs IC Symp. Tech. Dig., pp. 145-148, 1996.
10. K. Murata, T. Otsuji, M. Ohhata, M. Togashi, E. Sano, and M. Suzuki, "A Novel
High-speed Latching Operation Flip-Flop (HLO-FF) Circuit and Its Application to a
19 Gb/s Decision Circuit Using 0.2 μm GaAs MESFET", IEEE GaAs IC Symp.
Tech. Dig., pp. 193-196, 1994.
11. C. Madden, D. Snook, R. Van Tuyl, M. Le, L. Nguyen, "A Novel 75 GHz InP HEMT
Dynamic Divider", IEEE GaAs IC Symp. Tech. Dig., pp. 137-140, 1996.
International Journal of High Speed Electronics and Systems, Vol. 11, No. 1 (2001) 257-305
© World Scientific Publishing Company

RSFQ TECHNOLOGY: PHYSICS AND DEVICES

PAUL BUNYK, KONSTANTIN LIKHAREV, and DMITRY ZINOVIEV


State University of New York at Stony Brook
Stony Brook, NY 11794-3800, U.S.A.

Rapid Single-Flux-Quantum (RSFQ) logic, based on the representation of digital bits
by single quanta of magnetic flux in superconducting loops, may combine
several-hundred-GHz speed with extremely low power dissipation (close to 10^-18 Joule/bit) and
very simple fabrication technology. The drawbacks of this technology include the
necessity of deep (liquid-helium-level) cooling of RSFQ circuits and the rudimentary
level of the currently available fabrication and testing facilities. The objective of this
paper is to review RSFQ device physics and also discuss in brief the prospects of
future development of this technology in the light of the tradeoff between its
advantages and handicaps.

1. Introduction
The most authoritative industrial forecast, the International Technology Roadmap for
Semiconductors (ITRS),1 predicts that the current spectacular growth in density of
semiconductor digital integrated circuits ("Moore's Law") will continue for at least one
more decade, increasing the integration scale by almost three orders of magnitude by the
end of that period. The outlook for speed is, however, entirely different: the document
predicts that the exponential growth of microprocessor clock frequency which was
typical for the last two decades, will slow to a crawl right after the recent crossing of the
1 GHz frontier. If the anticipated mass production of integrated circuits with sub-100-
nm features does not materialize because of skyrocketing fabrication costs, the
prospects for multi-GHz operation of the mainstream CMOS logic circuits will seem
even more bleak.
In order to explore alternative ways to overcome the integrated circuit speed
saturation, it is important to recognize that this problem has virtually nothing to do with
the intrinsic switching speed of semiconductor transistors. The intrinsic switching time
of a modern silicon MOSFET is below 10 ps.2 This does not mean, however, that such
transistors may enable digital integrated circuits with a 10-ps-scale clock cycle. In fact,
a dominant share of the much longer, 1-ns-scale clock cycle in modern integrated
circuits is spent on recharging the interconnect capacitances (C) by the on-currents (I) of
logic gate transistors. (The relative contribution of the gate capacitance to C is almost
negligible.)1,3 The most apparent way to speed up the recharging process is to increase
the output current I, e.g., by using transistors with wider channels. This, however,

immediately leads to growth of the dynamic power consumption. A convenient measure
of this power is its average density (power per unit area)

$$P_0 = C_0 V^2 f_c, \qquad (1)$$

where V is the logic swing (typically close to the power supply voltage VDD), fc is the
clock frequency, and C0 is the effective total interconnect capacitance per unit area. The
latter parameter is virtually independent of the active device parameters (for current
interconnect technologies, C0 ~ 10^-8 F/cm²).4 As a result, P0 is very large even now:
modern 1-GHz microprocessors burn above 50 watts on a 1-cm²-scale chip area.5 This
power density should be compared to ~0.1 W/cm² power flux from direct sunlight or
~10 W/cm² from a kitchen hot plate. Removal of such enormous power from a chip
without overheating it presents a very serious technical challenge. The ITRS1 foresees
that no more than 185 W will be lifted from a chip even by the year 2014, indicating
that there are virtually no known reserves in this direction. In addition, C0 will continue to
grow due to the increase of the wiring level number (see, e.g., Fig. 7 of Ref. 4).
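
To make the scaling in Eq. (1) concrete, the short sketch below evaluates the power density at several clock frequencies. The capacitance per unit area is the value quoted in the text; the 2 V logic swing is an assumption chosen for illustration, since the text only says that V is close to VDD.

```python
def dynamic_power_density(c0_f_per_cm2, swing_v, clock_hz):
    """Eq. (1): P0 = C0 * V^2 * fc, returned in W/cm^2."""
    return c0_f_per_cm2 * swing_v ** 2 * clock_hz


C0_F_PER_CM2 = 1e-8   # quoted in the text for current interconnect technologies
SWING_V = 2.0         # assumed logic swing, not given in the text

for clock_hz in (1e9, 10e9, 100e9):
    p0 = dynamic_power_density(C0_F_PER_CM2, SWING_V, clock_hz)
    print(f"fc = {clock_hz / 1e9:5.0f} GHz -> P0 = {p0:7.0f} W/cm^2")
```

With these inputs the 1 GHz case lands in the tens of W/cm², the same order as the figure quoted for contemporary microprocessors, and a hundredfold increase in clock frequency at fixed swing raises the density by the same factor.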
The only visible path to the increase of speed in semiconductor digital devices, while
keeping power within acceptable limits, is to decrease the power supply voltage VDD
while keeping the on-current I fixed. In CMOS circuits, this reduction requires an
increase of the ratio I/VDD, i.e. the transistor transconductance gm, via shortening the
MOSFET gate. This is a very arduous (and increasingly expensive) process which will
require the commercial introduction of radically new patterning techniques to
implement minimum features smaller than ~100 nm. Even if this size reduction is
implemented, the speed improvements will be rather marginal.1
The problem of heat removal is not limited to CMOS circuits. Novel semiconductor
devices like heterojunction bipolar transistors6 or resonant tunneling diodes7
demonstrate even more remarkable speed of internal switching (beyond 100 GHz), due
to their low internal parasitics. However, they do obey the lower power bound
expressed by Eq. (1) and have voltage swings V comparable to those of the CMOS
transistors. As a result, an attempt to use them in faster VLSI circuits would lead to the
same heat-limited performance saturation.
A remarkable opportunity to solve the speed saturation problem is provided by
superconductor integrated circuits which can operate above 100 GHz. The goal of this
paper is to give a brief review of this opportunity and discuss prospects and problems of
the so-called Rapid Single-Flux-Quantum (RSFQ) logic which is currently the focus of
work in this field. A review of current and possible near term future applications of this
technology from the system point of view is given by D. Brock in another article of this
special issue. Somewhat more technical, albeit already somewhat outdated, reviews may
be found in Refs. 8, 9; for popular accounts of RSFQ technology, see Refs. 10, 11.
Current research work in this field may be followed conveniently via proceedings of the
biennial Applied Superconductivity Conferences, which are being published in special
issues of the IEEE Transactions on Applied Superconductivity, and via Web home
pages of several groups.12

2. Superconductor Digital Electronics

2.1. Superconductor transmission lines

The main advantage of superconductor digital circuits may be not in active devices, but
in interconnects. Superconductors have very low ac loss below the "gap" frequency

$$f_g = 2\Delta(T)/h, \qquad (2)$$

where Δ(T) is the superconductor energy gap (at T < Tc/2, Δ(T) = 1.76 kBTc, where Tc is
its critical temperature).13,14 For the superconductor most practical for integrated circuit
fabrication, niobium, Tc is close to 9 K (degrees Kelvin) and fg to 700 GHz.
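
A quick numerical check of the gap frequency, as a sketch using the low-temperature relation Δ = 1.76 kB Tc from the text and standard values of the physical constants; the function name is hypothetical.

```python
K_B = 1.380649e-23     # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34  # Planck constant, J*s


def gap_frequency_hz(t_c_kelvin, gap_ratio=1.76):
    """Gap frequency f_g = 2*Delta/h with Delta = gap_ratio * k_B * T_c."""
    delta_joule = gap_ratio * K_B * t_c_kelvin
    return 2.0 * delta_joule / H_PLANCK


for t_c in (9.0, 9.2):
    print(f"Tc = {t_c:.1f} K -> f_g = {gap_frequency_hz(t_c) / 1e9:.0f} GHz")
```

For a critical temperature near 9 K this gives roughly 0.66 to 0.68 THz, in line with the "close to 700 GHz" figure in the text.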
As a result, on-chip superconducting transmission lines (Fig. 1) with very thin
dielectric layers, with d ~ 0.1 μm, have very low attenuation for picosecond signals, and
may be used for transfer of picosecond waveforms over any on-chip distances.15,16
Since insulation thickness for such lines may be much smaller than the strip width w, the
electromagnetic field is very well localized in narrow gap(s) between the strips and
ground(s), so that such interconnects also have very low crosstalk.17 In order to
implement such low crosstalk in semiconductor transistor technology at comparable
frequencies, one would need to use ground planes with similarly small distance d from
the strips. However, if implemented with normal metals, such interconnects would have
very high attenuation, since d would be comparable to the skin depth δ(ω). In
superconductors, however, the dissipative field penetration by skin depth δ(ω) is
replaced with non-dissipative, frequency-independent field penetration by the so-called
London depth λ, about 0.1 μm for niobium thin films.

Fig. 1. In superconductor electronics, on-chip interconnects such as (a) microstrips and (b) striplines with
submicron dielectric layers may have negligible attenuation and dispersion for picosecond waveform
transfer over distances of a few cm. Lines with min[d, d', λ] ≪ w also have virtually negligible crosstalk
even if the wiring pitch is close to the strip width w.

2.2. Josephson junctions

The second important component in the arsenal of superconductor electronics is the
Josephson junction, a two-terminal device which physically is just a weak contact
between two superconductor electrodes.13,14,18 For present practice, the most important
type of such a contact is a "niobium-trilayer" (Nb/AlOx/Nb) tunnel junction19,20 in which
the weak contact between two niobium thin-film electrodes is provided by tunneling
through a ~1-nm-thick layer of aluminum oxide.
Josephson junctions feature very unusual dynamics18 because of the macroscopic
quantum nature of charge carriers (Cooper pairs) in superconductors.13,14 In contrast to
Fermi particles (single electrons and holes) in normal metals and semiconductors,
Cooper pairs have integer spin and hence obey the Bose statistics. As a result, they form
a coherent condensate which may be described with a single wavefunction

$$\psi(\mathbf{r}, t) = |\psi| \exp\{i\varphi(\mathbf{r}, t)\}. \qquad (2)$$

The wavefunction amplitude |ψ|, which is proportional to the square root of the
Cooper pair density, is almost constant inside Josephson junction electrodes, but the
wavefunction phase φ exhibits fascinating dynamics. Indeed, plugging Eq. (2) with |ψ| =
const into the Schrödinger equation

$$i\hbar\,\partial\psi/\partial t = H\psi, \qquad (3)$$

for superconductors in equilibrium when the Hamiltonian operator H is just the Cooper
pair energy E = 2eμ + const (where μ is the local Fermi level, i.e. the electrochemical
potential), we get the following equation for the phase evolution:

$$\partial\varphi(\mathbf{r}, t)/\partial t = -(2e/\hbar)\,\mu(\mathbf{r}, t). \qquad (4)$$

Subtracting Eqs. (4) written for two arbitrarily fixed points inside superconductor
electrodes, we get the fundamental relation between the phase drop ϕ ≡ φ1 − φ2 and
voltage drop V = μ2 − μ1 between these points:

$$d\phi/dt = (2e/\hbar)\,V(t). \qquad (5)$$

The last equation has been experimentally confirmed to be accurate to at least the
15th decimal place and is presently used in the legal definition of the volt.
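
The step from the wavefunction and the Schrödinger equation to the phase-voltage relation can be spelled out as follows. This is a sketch that treats the amplitude as constant and, as the text does, lets the additive constant in the pair energy drop out when the two points are subtracted.

```latex
\begin{aligned}
i\hbar\,\frac{\partial}{\partial t}\Bigl(|\psi|\,e^{i\varphi}\Bigr)
  &= -\hbar\,\frac{\partial\varphi}{\partial t}\,|\psi|\,e^{i\varphi}
   = E\,|\psi|\,e^{i\varphi}
  \quad\Longrightarrow\quad
  \frac{\partial\varphi}{\partial t} = -\frac{E}{\hbar}
   = -\frac{2e}{\hbar}\,\mu - \frac{\mathrm{const}}{\hbar}, \\[4pt]
\frac{d\phi}{dt}
  &= \frac{\partial\varphi_1}{\partial t} - \frac{\partial\varphi_2}{\partial t}
   = \frac{2e}{\hbar}\,(\mu_2 - \mu_1)
   = \frac{2e}{\hbar}\,V .
\end{aligned}
```

One consequence is that a constant voltage makes the phase drop advance at the frequency f = (2e/h)V, which is about 483.6 GHz per millivolt.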
The spatial and temporal dynamics of the phase φ(r, t) affects all properties of
superconductors; in particular, it determines the flow of Cooper pairs ("supercurrent")
in a Josephson junction:13,14,18

$$I_s = I_c \sin\phi. \qquad (6)$$

Here ϕ is the phase drop across the junction, while Ic is its "critical current" which
depends on its area and barrier transparency.
According to Eqs. (5), (6), for small perturbations of the supercurrent Is a Josephson
junction behaves as a (nonlinear) dynamic inductance

$$L_J \equiv \frac{\partial V}{\partial (dI_s/dt)} = \frac{L_c}{\cos\phi}, \qquad (7a)$$

where

$$L_c \equiv \frac{\hbar}{2eI_c}. \qquad (7b)$$
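
For a sense of scale, the sketch below evaluates the Josephson inductance of Eqs. (7a) and (7b). The 100 μA critical current is a hypothetical value picked for illustration, not a number from the text, and the function name is likewise an assumption.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J*s
E_CHARGE = 1.602176634e-19  # elementary charge, C


def josephson_inductance(i_c_amp, phase_rad=0.0):
    """Small-signal inductance L_J = L_c / cos(phi), with L_c = hbar / (2 e I_c)."""
    l_c = HBAR / (2.0 * E_CHARGE * i_c_amp)
    return l_c / math.cos(phase_rad)


I_C_AMP = 100e-6  # hypothetical critical current, 100 uA

for phase in (0.0, math.pi / 4, math.pi / 3):
    l_j = josephson_inductance(I_C_AMP, phase)
    print(f"phi = {phase:5.3f} rad -> L_J = {l_j * 1e12:5.2f} pH")
```

The inductance is a few picohenries at zero phase and grows as the phase drop approaches π/2, where cos ϕ vanishes and the small-signal picture breaks down.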

However, for large signals the Josephson junction dynamics is substantially more
complex. For a fair description of this dynamics, one should take into account three
other components of the current through the junction:

$$I(t) = I_c \sin\phi + C\,dV/dt + V/R + I_f(t), \qquad (8)$$

where C is the junction capacitance and R its "normal resistance". (Generally, 1/R is the
sum of a nonlinear "quasiparticle" conductance G(V) of the junction itself and a linear
conductance of an external shunt, but in present-day circuits the latter term dominates,
so that R may be considered constant.) The (typically small) term If(t) gives the
Langevin description of current noise in the normal resistance R. For externally-shunted
junctions, a fair description of this noise is given by the Nyquist formula which may be
written as either

$$\langle I_f^2 \rangle = (4k_BT/R)\,\Delta f, \qquad (9a)$$

or

$$\langle I_f(t)\,I_f(t') \rangle = (2k_BT/R)\,\delta(t - t'). \qquad (9b)$$

The system of equations (6), (8) gives an implicit relation between the current and
voltage in Josephson junctions. Its analysis18 shows that these junctions allow
generation of various picosecond waveforms. Moreover, due to the fundamental relation
(5) the junctions may recover weak incoming pulses, restoring their waveforms to a
nominal value.
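
The implicit current-voltage relation can be explored numerically. The sketch below integrates Eqs. (5) and (8) for a dc-biased junction in normalized units, with the noise term omitted. The normalization is an assumption introduced here: time in units of ħ/(2eIcR), voltage in units of IcR, bias current in units of Ic, and the capacitance expressed through the dimensionless parameter β_c = (2e/ħ)IcR²C (commonly called the Stewart-McCumber parameter). A simple fixed-step Euler scheme is used for brevity.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge, C
H_PLANCK = 6.62607015e-34   # Planck constant, J*s


def average_voltage(i_bias, beta_c=0.0, dt=2e-3, t_settle=100.0, t_avg=300.0):
    """Time-averaged voltage <V>/(Ic*R) of a shunted junction at dc bias I/Ic."""
    phi = 0.0   # phase drop across the junction
    v = 0.0     # normalized voltage, d(phi)/d(tau)
    n_settle = round(t_settle / dt)
    n_avg = round(t_avg / dt)
    phi_start = 0.0
    for step in range(n_settle + n_avg):
        if step == n_settle:
            phi_start = phi          # start averaging after the transient
        if beta_c > 0.0:
            # beta_c * dv/dtau = i - sin(phi) - v   (capacitive term kept)
            v += dt * (i_bias - math.sin(phi) - v) / beta_c
        else:
            # negligible capacitance: v = i - sin(phi)
            v = i_bias - math.sin(phi)
        phi += dt * v                # Eq. (5) in normalized form
    return (phi - phi_start) / (n_avg * dt)


def josephson_frequency_hz(voltage_v):
    """Phase rotation frequency (2e/h) * V that follows from Eq. (5)."""
    return 2.0 * E_CHARGE * voltage_v / H_PLANCK


for i_bias in (0.5, 1.5, 3.0):
    print(f"I/Ic = {i_bias:3.1f} -> <V>/(Ic R) = {average_voltage(i_bias):.3f}")
```

Below the critical current the phase settles and the average voltage vanishes; above it the phase runs continuously, and in the zero-capacitance limit the result can be compared with the closed-form value sqrt((I/Ic)² − 1). With IcR of about 1 mV, as quoted in the text, the corresponding oscillation frequency from `josephson_frequency_hz` is several hundred GHz.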
In addition to that, the effective impedance |Z| ~ R of a Josephson junction may be
matched to that of on-chip superconductor interconnect lines (Fig. 1), ensuring effective
insertion of generated signals into the line and reception of signals incident from the
line. Also, these devices operate with low signal voltages (V ~ IcR ~ 1 mV) and as a
result their scale of power dissipation, P ~ V² Re(Z⁻¹) ~ Ic²R, is extremely low, typically
of the order of 1 μW. Moreover, junctions may be in their superconducting state, with
no dissipation at all, most of the time, so that the average power dissipation may be
reduced well below 1 μW per logic gate even at 100-GHz-scale frequencies.
Combined in one device, all these features enable extremely fast digital signal
processing together with very low dissipation. As a result, device integration and chip
packaging can be extremely dense, saving more time on signal propagation delays in
multi-chip systems. Finally, in contrast to most advanced semiconductor devices, the
fabrication technology of niobium-based integrated circuits is very simple. Though
these circuits are usually formed on the readily available, standard silicon wafers, they
do not require any silicon processing (like ion implantation or high-temperature
diffusion), but rather just deposition of several metallization layers including
superconductor interconnects and one or two normal-metal layers for Josephson
junction shunting and biasing (Fig. 2).

2.3. Latching logics

The recognition of the advantages of superconductor integrated circuits has motivated
several attempts to develop a practical Josephson junction digital technology, among
them, the large-scale IBM effort in the USA (1969 - 1983)21 and the MITI project in
Japan (1981-1990).22 These projects should be credited for several important
contributions to superconductor electronics. However, they were terminated without
commercialization of the technology because the achieved circuit speed (clock
frequency ~ 1 GHz by 1990) was only marginally higher than that of the contemporary
semiconductor transistor circuits, and could hardly justify the necessary helium cooling.
The main factor limiting the speed was the unfortunate choice of so-called latching (or
"voltage-state") circuitry based on the properties of unshunted Josephson tunnel
junctions.

Fig. 2. A commercially available 10-level niobium-based process for superconductor integrated circuits
includes 4 superconductor metal layers M for wiring, a device definition layer I1A for Josephson junctions, a
resistor layer R2, a gold layer R3 for the contact pads, and three sets of vias to connect the conducting layers.
Figure courtesy of HYPRES, Inc.

As can be readily shown13,14,18 from Eqs. (5) and (8), such junctions, when biased
with a dc current within the range $-I_c < I < +I_c$, have two different states: a
superconducting state with vanishing voltage drop $V$ across the junction, and a resistive
state with $|V| \approx V_g = 2\Delta(T)/e$ (for niobium-trilayer junctions, $V_g \approx 2.8$ mV). In latching
logic, the superconducting state is used to denote binary "0", while the resistive state
represents binary "1". Here comes the problem: switching from "0" to "1" may be rather
fast, a few picoseconds for junctions with high critical current density $j_c = I_c/A$ (of a few
kA/cm$^2$). However, the reciprocal switching ("1" → "0") is much more complex18 and
should be long, of the order of one nanosecond, to avoid errors. Recently, several
relatively simple latching circuits have been tested at clock frequencies of a few
GHz;24,25 this speed is, however, considerably (by a factor of 10 or so) slower than that
of modern RSFQ circuits (see below) at the same level of fabrication. This is
essentially the price which was paid for an attempt to mimic the information
representation by dc voltage, which is the only option in semiconductor electronics, but
is very unnatural for superconductors with their macroscopic quantum dynamics.

From the practical point of view, another problem of latching logic was even more
formidable. Most latching devices must be driven by an external clock signal which also
provides the necessary power. The total current needed to run an LSI circuit could reach
many amperes, and feeding integrated circuits with such huge currents at multi-GHz
frequencies would create severe crosstalk between the off-chip segments of ac power
and signal lines.26

2.4. SFQ logics

An alternative approach to using superconductors for computing is based on their natural
property13,14 to quantize the magnetic flux $\Phi = \int B_n\,dA$ through any closed superconducting
loop in multiples of the flux quantum $\Phi_0$. Indeed, let us plug in Eq. (5) written for two
end points of an almost closed loop into Faraday's induction law for this loop:

$$ d\Phi/dt = V. \qquad (10) $$

Integration of the resulting equation over time yields the relation between the magnetic
flux and Josephson phase difference:

$$ \phi = 2\pi\Phi/\Phi_0, \qquad (11) $$

where the fundamental constant combination

$$ \Phi_0 = h/2e \approx 2\times10^{-15}\ \mathrm{Wb} \qquad (12) $$

is called the magnetic flux quantum. (Due to the relation (11), the variable $\Phi(\mathbf{r}, t) =
(\Phi_0/2\pi)\,\phi(\mathbf{r}, t)$ is frequently referred to as "flux" in a given point of the circuit, even if it
does not belong to any specific superconductor loop.)
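As a quick numerical check of Eq. (12), the flux quantum can be evaluated from the exact SI values of the Planck constant and the elementary charge. This is only an illustrative sketch; the names `PLANCK_H`, `ELEMENTARY_CHARGE` and `PHI_0` are our own and do not come from the text.

```python
# Exact SI values of the defining constants (2019 SI redefinition).
PLANCK_H = 6.62607015e-34            # Planck constant, J*s
ELEMENTARY_CHARGE = 1.602176634e-19  # elementary charge, C

# Magnetic flux quantum, Eq. (12): Phi_0 = h / 2e.
PHI_0 = PLANCK_H / (2.0 * ELEMENTARY_CHARGE)

print(f"Phi_0 = {PHI_0:.4e} Wb")
# 1 Wb = 1 V*s, so 1e-15 Wb = 1 mV*ps, the unit natural for picosecond pulses.
print(f"Phi_0 = {PHI_0 * 1e15:.3f} mV*ps")
```

The second line of output expresses the same constant in mV·ps, which is the form in which it reappears as the area of an SFQ pulse.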
Now, closing the ends of the loop we have to require that the wavefunctions in these
two (now identical) points coincide, apart from a possible phase difference that is a multiple of $2\pi$.
Then Eq. (11) immediately yields the flux quantization:

$$ \Phi = n\Phi_0, \qquad n = 0, \pm1, \pm2, \ldots \qquad (13) $$

Since 1961, this prediction has been repeatedly verified experimentally with high
accuracy.
Evidently, digital information can be coded by certain values of the integer $n$; for
example, the flux states with $n = 0$ and $n = 1$ may be used to represent binary zero and
one, respectively. If a superconducting loop is made of a bulk superconductor,
switching between the different flux states requires the suppression and restoration of
superconductivity in at least some cross-section of the loop; the latter process would
take too much time (~100 ps for niobium). However, if the loop is interrupted with a
Josephson junction, switching may be performed much faster (for niobium-based
junctions, in a fraction of a picosecond).

Let us consider the simplest, but representative, single-flux-quantum (SFQ) circuit
with just one Josephson junction (Fig. 3), usually called the SQUID (standing for
Superconducting QUantum Interference Device). In order to describe it, one should
combine Eqs. (8) and (11) with the usual equation for the total magnetic flux through the
loop:

$$ \Phi = \Phi_{ex} - LI, \qquad (14) $$

where $L$ is the loop inductance, and $\Phi_{ex}$ is the external magnetic flux. (In practical SFQ
circuits, it is frequently more convenient to create this flux by passing an external
current $I_{ex}$ through a part of the loop: $\Phi_{ex} = MI_{ex}$, where $M$ is the inductance of this part;
see Fig. 3a.) Neglecting the small noise $I_F(t)$ for a while, we get the following simple
stationary relation between the external and total magnetic flux:

$$ \phi + l\sin\phi = \phi_{ex}, \qquad (15) $$

where we have used the convenient definitions

$$ l = 2\pi LI_c/\Phi_0, \qquad \phi_{ex} = 2\pi\Phi_{ex}/\Phi_0. \qquad (16) $$

[Figure 3: (a) a superconductor (e.g., Nb) loop of inductance $L$, fed by the external current $I_{ex}$ and closed by a Josephson junction (e.g., Nb/AlOx/Nb); (b) voltage $V(t)$ and phase $\phi(t)$ versus time.]

Fig. 3. (a) The simplest SFQ circuit ("SQUID"), which may serve as a generator of single SFQ pulses, and (b)
dynamics of its switching at the moment when the externally applied flux induced by the slowly changing
current $I_{ex}$ reaches its threshold value (see Fig. 4). In panel (b), time is in units of $\tau_0$ (defined by Eq. (17)
below) and voltage in units of $I_cR$. The inductive parameter $l$ equals $2\pi$; the shunting parameter $\beta_c$ (defined by Eq. (18)
below) equals 1.

Equation (15) shows that if the $LI_c$ product is small ($l < 1$), the Josephson phase $\phi$
(and hence the total magnetic flux $\Phi$ through the loop) is a unique function of $\phi_{ex}$, i.e.
the applied flux. This means that the insertion of a Josephson junction with small
critical current suppresses the flux quantization effect described by Eq. (13), since the
loop is virtually broken by the Josephson junction. Such loops are called "non-quantizing".

However, if the product $LI_c$ is large enough compared to $\Phi_0$ ($l > 1$), the Josephson
phase difference $\phi$, and hence the total magnetic flux $\Phi$ and persistent supercurrent $I =
I_c\sin\phi$, may have several stable stationary states, $\phi_n \approx 2\pi n$, i.e. $\Phi_n \approx n\Phi_0$, for the same
external field $\Phi_{ex}$ (see Fig. 4). This means that the insertion of a Josephson junction with
a sufficiently large critical current retains the flux quantization effect (the loop is
"quantizing"), but modifies it. In particular, the difference between the neighboring
values of $\Phi$ is somewhat smaller than $\Phi_0$, but if $l$ is not very close to 1, this reduction is
small. The Josephson junction also limits the number of stable flux states to $N \sim l/\pi$; in
typical RSFQ circuits, $l$ is close to $2\pi$ (i.e., $LI_c \approx \Phi_0$), and one can conveniently work
with only two flux states. Moreover, by fixing the dc flux bias at $\Phi_0/2$ (see the dashed
vertical line in Fig. 4), these two states ($n = 0$ and $n = 1$) may be "equilibrated", i.e.
provided with equal energy and stability.

Fig. 4. RF SQUID: total magnetic flux $\Phi$ as a function of applied flux $\Phi_{ex}$ (plotted as $\Phi_{ex}/\Phi_0$), as given by Eq. (15) for the $LI_c$
product typical for "quantizing" loops in RSFQ circuits ($l = 2\pi$; i.e. $LI_c = \Phi_0 > \Phi_0/2\pi$). Arrows indicate flux-
state switching induced by a slow change of the external field; for dynamics, see Fig. 3b. Solid points show
two stable states at the equilibrating value $\Phi_{ex} = \Phi_0/2$.
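The distinction between non-quantizing and quantizing loops can be illustrated by solving the stationary relation (15) numerically. The sketch below is our own illustration, not the authors' procedure: it brackets the roots of phi + l sin(phi) = phi_ex on a grid, refines them by bisection, and classifies each root with the standard rf-SQUID stability condition 1 + l cos(phi) > 0, which is an assumption here since the text does not state it. The function name `stationary_states` and the grid size are hypothetical choices.

```python
import math


def stationary_states(l, phi_ex, n_grid=20001):
    """Solve phi + l*sin(phi) = phi_ex; return a list of (phi, is_stable)."""
    def f(phi):
        return phi + l * math.sin(phi) - phi_ex

    # Since |sin| <= 1, every root lies within a distance l of phi_ex.
    lo, hi = phi_ex - l - 1.0, phi_ex + l + 1.0
    step = (hi - lo) / n_grid
    states = []
    a, fa = lo, f(lo)
    for i in range(1, n_grid + 1):
        b = lo + i * step
        fb = f(b)
        if fa * fb < 0.0 or fb == 0.0:
            left, right, f_left = a, b, fa
            for _ in range(60):                 # plain bisection
                mid = 0.5 * (left + right)
                f_mid = f(mid)
                if f_left * f_mid <= 0.0:
                    right = mid
                else:
                    left, f_left = mid, f_mid
            root = 0.5 * (left + right)
            # Assumed stability test: d(phi_ex)/d(phi) = 1 + l*cos(phi) > 0.
            states.append((root, 1.0 + l * math.cos(root) > 0.0))
        a, fa = b, fb
    return states


for l in (0.5, 2.0 * math.pi):
    states = stationary_states(l, math.pi)      # flux bias Phi_ex = Phi_0 / 2
    stable = [phi / (2.0 * math.pi) for phi, ok in states if ok]
    print(f"l = {l:.2f}: {len(states)} stationary state(s), "
          f"stable at Phi/Phi_0 = {[round(x, 3) for x in stable]}")
```

For l = 0.5 the loop has a single state, while for l = 2π at the equilibrating bias the sketch finds three stationary states, of which the two outer ones are stable and lie symmetrically about Φ_0/2, slightly inside 0 and Φ_0.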

Switching between the two states may be achieved by changing $\Phi_{ex}$ (i.e., via
changing $I_{ex}$; see Fig. 3). At large values of inductance ($l \gg 1$) this switching may be
conveniently understood as the result of the Josephson junction current exceeding its
critical current $I_c$. Any current beyond this value cannot be carried by the supercurrent (6)
alone, so that eventually the "normal" current $I_N = V/R$ should pick up the difference;
since according to the basic Eq. (5), $V \propto d\phi/dt$, the Josephson phase difference should
start moving beyond the critical value $\pi/2$, decreasing $I_S$ and leading to the further
growth of $I_N$, $V$, and $d\phi/dt$. This positive-feedback (exponential) growth of the phase
change speed ends only when the phase has come close to its initial value ($0 < \phi < \pi/2$)
plus $2\pi$, i.e. when the $2\pi$ leap of the phase has been performed (see Figs. 3b and 4).
As follows from Eqs. (5) and (8), the time of the phase leap is of the order of a few
units of

$$ \tau = \max[\tau_0, RC], \qquad \tau_0 = L_c/R = \hbar/2eI_cR = \Phi_0/2\pi I_cR. \qquad (17) $$

In order to have non-oscillatory transient dynamics, with the unambiguous selection
of the final state (and hence avoid the dynamics complications which have killed the
latching logics), the time constants should be related as $RC < \tau_0$; this relation may be
expressed as

$$ \beta_c = RC/\tau_0 = (2\pi/\Phi_0)\,I_cR^2C < 1, \quad \text{or} \quad R < (\Phi_0/2\pi I_cC)^{1/2}. \qquad (18) $$

For stand-alone tunnel junctions this condition is not satisfied (unless their critical current
density is extremely high, see Sec. 4.8 below); this is why in SFQ circuits the junctions
are externally shunted with thin-film resistors (Fig. 5) to decrease the total value of $R$.

[Figure 5: layout of a shunted junction, with the Nb/AlOx/Nb trilayer labeled.]

Fig. 5. Typical layout of an externally shunted Josephson junction assembly. The whole circuit is fabricated
over a common, mostly unpatterned ground plane (not shown). Figure courtesy of HYPRES, Inc.

The shunt resistance is selected to provide just the critical damping ($\beta_c \approx 1$), since
further reduction of $R$ would only slow down the switching. At this value of $\beta_c$,
the transient time constant may be expressed as

$$ \tau_0 = RC = (\Phi_0C/2\pi I_c)^{1/2}. \qquad (19) $$

For tunnel junctions, $\tau_0$ does not formally depend on the junction area $A$ (since both $C$
and $I_c$ are proportional to $A$), but only on the ratio of the critical current density $j_c =
I_c/A$ to the specific capacitance $C_0 = C/A$. The density $j_c$ depends exponentially on the
tunnel barrier thickness and hence may be readily adjusted to a desirable level within
very broad limits, by changing the aluminum oxidation time and oxygen pressure. Its
practical choice is determined by two important limitations on the junction critical
current.
On one hand, thermal fluctuations (9) may toggle an SFQ circuit from one state into
another spontaneously. Detailed analyses18 show that in order to make the
corresponding bit error rate $\Gamma$ sufficiently low, $I_c$ should satisfy the relation

$$ I_c > 3I_T\ln(1/2\pi\Gamma\tau_0), \qquad (20a) $$

where $I_T$ is the current equivalent of the thermal fluctuation scale $k_BT$:

$$ I_T = (2\pi/\Phi_0)\,k_BT. \qquad (20b) $$

For the usual temperature of operation of niobium-based circuits ($T$ = 4-5 K), $I_T$ is about
0.2 µA, so that for a reasonably low bit error rate, $\Gamma\tau_0 \sim 10^{-30}$, $I_c$ should not be less than
50 µA. As will be discussed in Sec. 3.3 below, during the junction switching the effect
of fluctuations is even more dangerous, so that $I_c$ should be even somewhat larger,
above ~100 µA.
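The thermal current scale of Eq. (20b) is easy to evaluate directly. The sketch below (our own illustration; the function name `thermal_current` is hypothetical) computes it for a few temperatures in the quoted 4-5 K range and shows how many thermal units the quoted critical current thresholds of 50 µA and 100 µA correspond to.

```python
import math

BOLTZMANN_K = 1.380649e-23                          # J/K, exact SI value
PHI_0 = 6.62607015e-34 / (2.0 * 1.602176634e-19)    # flux quantum h/2e, Wb


def thermal_current(temperature_k):
    """Current equivalent of k_B*T, Eq. (20b): I_T = (2*pi/Phi_0) * k_B * T."""
    return 2.0 * math.pi * BOLTZMANN_K * temperature_k / PHI_0


for t in (4.0, 4.2, 5.0):
    i_t = thermal_current(t)
    print(f"T = {t:.1f} K: I_T = {i_t * 1e6:.3f} uA, "
          f"50 uA = {50e-6 / i_t:.0f} I_T, 100 uA = {100e-6 / i_t:.0f} I_T")
```

The result, a little under 0.2 µA, agrees with the figure quoted in the text; the critical currents used in practice are therefore several hundred times the thermal scale.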
On the other hand, excessively large $I_c$'s lead to excessive power dissipation (and
eventually may result in local overheating). Indeed, in the stationary state Josephson
junctions do not dissipate energy, because at $\phi(t)$ = const the voltage $V$ across the junction
vanishes completely; see Eq. (5). The power $P = IV$ is finite (i.e., energy is dissipated)
only during the transient, leading to the energy loss

$$ \Delta E = \int IV\,dt = (\Phi_0/2\pi)\int I\,d\phi \approx (\Phi_0/2\pi)\,I_c\,2\pi = I_c\Phi_0 \qquad (21) $$

per SFQ switching event.
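To get a feeling for the scale of Eq. (21), the sketch below evaluates the switching energy for a critical current of 125 µA (the value the text adopts for the smallest junction), compares it with k_BT at 4.2 K, and converts it into an average power. The switching rate of 100 GHz, one event per junction per clock period, is a hypothetical worst-case assumption of ours, as are all function and variable names.

```python
BOLTZMANN_K = 1.380649e-23                          # J/K
PHI_0 = 6.62607015e-34 / (2.0 * 1.602176634e-19)    # flux quantum h/2e, Wb


def switching_energy(i_c):
    """Energy dissipated per 2*pi phase leap, Eq. (21): dE ~ I_c * Phi_0."""
    return i_c * PHI_0


i_c = 125e-6                       # A, smallest-junction critical current
energy = switching_energy(i_c)
thermal = BOLTZMANN_K * 4.2        # J, thermal scale at T = 4.2 K
rate = 100e9                       # 1/s, assumed switching events per second

print(f"dE        = {energy:.2e} J per switching event")
print(f"dE / k_BT = {energy / thermal:.0f}")
print(f"P_avg     = {energy * rate * 1e6:.3f} uW per junction at 100 GHz")
```

Even under this pessimistic rate assumption the average dissipation per junction stays far below 1 µW, while the energy per event remains thousands of times larger than the thermal scale.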


A reasonable compromise between the fluctuation limitations and power growth is to
have $I_c$ of the smallest junction about 125 µA (for this value, Eq. (21) gives $\Delta E \approx
2\times10^{-19}$ joule per junction switching event) and hence the critical current density

$$ j_c \approx I_c/A_{min}, \qquad (22) $$

where $A_{min}$ is the smallest junction area available at the given patterning technology.
Table 1 lists the values of $\tau_0$ and other key parameters of SFQ circuits for several
values of $A_{min}$. (This scaling has been confirmed in numerous experiments.) The table
shows that superconductivity allows a natural implementation of very fast bistable
devices with extremely low power consumption. Now, a major problem is how to pass
the information about the flux state (or equivalently about the Josephson phase
difference $\phi$) of one loop to other similar cells. (Again, $V = 0$ for each of these states, so
that usual interconnects would carry no information at all.)
There have been two approaches to this problem, which may be called static and
dynamic SFQ, respectively. In the former approach, the information is passed quasi-
statically via superconducting wires. Unfortunately, the unavoidable inductance L of a
wire causes a Josephson phase drop along it:

$$ \Delta\phi = (2\pi/\Phi_0)\int V(t)\,dt = (2\pi/\Phi_0)\,LI, \qquad (23) $$

similar to the voltage drop along normal-metal wires. As a result, only a fraction of the
initial phase signal reaches the destination; hence, a "phase amplifier" is needed. Such a
device, named the Parametric Quantron, was suggested33 in 1976. Its simplest version is
similar to the SQUID shown in Fig. 3a, but includes a Josephson junction whose critical
current Ic can be externally controlled. (Either a long junction or two or more lumped
junctions in parallel may be used for this control.18) Modulation of Ic by an external
clock signal may create a local Josephson phase and energy gain, thus enabling control
of the final state by a relatively small input signal from similar cells.
Parametric-Quantron-based circuits may have several interesting properties,
including the ability to process digital information reversibly,34 with energy dissipation
per bit well below both the apparent "thermodynamic limit" $k_BT\ln 2$ and the apparent
"quantum limit" $\sim\hbar/\tau$.35 Unfortunately, detailed analyses36-41 have shown that the critical
parameter margins of practical Parametric Quantron circuits are rather low. Moreover,
the lack of long passive interconnects makes the information transfer over few-mm-scale
distances prohibitively slow.

Table 1. Scaling of niobium-trilayer junctions with $\beta_c$ = 1.

Fabrication technology                                    HYPRES23   TRW31   Assumed in COOL   Deep-submicron
                                                                             core design32     (see Sec. 5.3)
Minimum junction size $F = A_{min}^{1/2}$ (µm)            3.5        1.75    0.8               0.3
Critical current density $j_c$ (kA/cm$^2$)                1          4       20                150
Specific capacitance $C_0$ (µF/cm$^2$)                    5          6       7                 8
Voltage scale $I_cR$ (mV)                                 0.3        0.5     1.1               2.0
Time scale $\tau_0 = (\Phi_0C_0/2\pi j_c)^{1/2}$ (ps)     1.1        0.7     0.3               0.17
Power scale $I_c^2R$ (µW)                                 0.04       0.07    0.14              0.25
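The scaling behind Table 1 can be sketched in a few lines. The code below is our own illustration: it takes the tabulated critical current densities and specific capacitances, applies the time-scale formula from the table together with the voltage scale I_cR = Φ_0/2πτ_0 that follows from Eq. (17), and assumes a critical current of 125 µA for the shunt resistance and the power scale. All function and variable names are hypothetical. The results reproduce the trend of the table and land within roughly 25% of the printed entries, not exactly on them, so the sketch should be read as a consistency check rather than a reconstruction of how the table was produced.

```python
import math

PHI_0 = 6.62607015e-34 / (2.0 * 1.602176634e-19)    # flux quantum h/2e, Wb


def time_scale(jc_ka_per_cm2, c0_uf_per_cm2):
    """tau_0 = sqrt(Phi_0 * C_0 / (2*pi*j_c)) at critical damping, in seconds."""
    jc = jc_ka_per_cm2 * 1e3       # A/cm^2
    c0 = c0_uf_per_cm2 * 1e-6      # F/cm^2
    return math.sqrt(PHI_0 * c0 / (2.0 * math.pi * jc))


def voltage_scale(tau_0):
    """I_c*R = Phi_0 / (2*pi*tau_0), from Eq. (17), in volts."""
    return PHI_0 / (2.0 * math.pi * tau_0)


I_C = 125e-6                       # A, assumed critical current
PROCESSES = [                      # (label, j_c in kA/cm^2, C_0 in uF/cm^2)
    ("HYPRES", 1.0, 5.0),
    ("TRW", 4.0, 6.0),
    ("COOL core design", 20.0, 7.0),
    ("Deep-submicron", 150.0, 8.0),
]

for name, jc, c0 in PROCESSES:
    tau_0 = time_scale(jc, c0)
    v = voltage_scale(tau_0)
    print(f"{name:18s} tau_0 = {tau_0 * 1e12:5.2f} ps   I_cR = {v * 1e3:4.2f} mV   "
          f"R = {v / I_C:5.2f} ohm   I_c^2 R = {I_C * v * 1e6:5.3f} uW")
```

The junction area drops out of the time scale, as stated in the text: only the ratio of the critical current density to the specific capacitance enters `time_scale`.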

2.5. Dynamic SFQ: The basic idea

Currently, we are in the midst of a new attempt to develop a competitive superconductor
digital technology, using dynamic SFQ devices. The basic idea of these devices is to use
transient dynamics for information transfer. Indeed, according to Faraday's induction
law $V = d\Phi/dt$, during the switching between the neighboring flux states (Fig. 4) a short
voltage pulse is formed across the junction (Fig. 3b). Since for SFQ circuits the flux
change is quantized ($\Delta\Phi = \Phi_0$), so is the pulse area:

$$ \int V(t)\,dt = \Phi_0 \approx 2\ \mathrm{mV\cdot ps}. \qquad (24) $$

For typical critically shunted Josephson junctions, the FWHM switching time is of
the order of $4\tau_0$ (Fig. 3b), i.e. a few picoseconds, so the amplitude of the pulse, $V_{max} \approx
\Phi_0/4\tau_0 \approx 1.5\,I_cR$, is of the order of a millivolt; for more specific numbers, see Table 1.
In dynamic single-flux-quantum circuits these "SFQ pulses" are passed to other
devices along either passive superconductor transmission lines (Fig. 1) or, if
current/power gain is needed, active Josephson transmission lines (see Sec. 3.1 below).
Dynamic SFQ circuits are very attractive because the pulses can be naturally generated,
reproduced/recovered, memorized and processed with simple SFQ devices whose speed
is much higher, and energy dissipation much smaller, than that of the latching logic.
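The pulse amplitude estimate quoted above follows from the quantized pulse area in one step. As a rough sketch (assuming, for illustration only, that the pulse is replaced by a rectangle of width $4\tau_0$ carrying the full area $\Phi_0$), one may substitute $\Phi_0 = 2\pi I_cR\,\tau_0$ from Eq. (17):

```latex
\begin{align}
V_{\max} \approx \frac{\Phi_0}{4\tau_0}
         = \frac{2\pi I_c R\,\tau_0}{4\tau_0}
         = \frac{\pi}{2}\, I_c R
         \approx 1.57\, I_c R .
\end{align}
```

With the voltage scales $I_cR$ of 0.3 to 2.0 mV listed in Table 1 this gives amplitudes from about 0.5 mV to about 3 mV, i.e. of the order of a millivolt.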
Another feature which distinguishes dynamic SFQ circuits from other logics using
two-terminal devices is the pulse nature of the signals. For such picosecond signals,
even an inductance of a few pH may provide substantial isolation between the circuit input
and output. For more usual signals such as voltage steps in semiconductor electronics,
three-terminal devices like transistors are essential to provide sufficient isolation.
In contrast, RSFQ circuits with their return-to-zero signals may be quite robust despite
using just two-terminal devices, Josephson junctions, eliminating the need for
superconductor transistors.
Historically, some prototype dynamic SFQ circuits were discussed by several
authors since the mid-1970s.42-48 It was only in 1985-86, however, that a complete
family of dynamic SFQ logic circuits, broadly known as RSFQ, was suggested.49,50 The
first simple devices of this family were experimentally implemented in 1986-1990; see
the early review.8 Since 1991, the RSFQ idea has been adopted by several groups in the
United States and other countries, and its development has started to progress rapidly.
At the last Applied Superconductivity Conference in Virginia Beach, VA (September
2000) almost 100 papers on various aspects of RSFQ technology were presented by
more than 15 groups from all over the globe. However, to our knowledge, no RSFQ-
based system has been commercialized at the time of this writing.

3. RSFQ Devices

Since the variety of RSFQ devices developed by now is quite large, for this review we
have picked a limited subset which is currently used in our FLUX
project.51 The goal of this project is the demonstration of the first RSFQ general-
purpose microprocessor; as a result, the subset is quite representative.
Though nominal values of circuit parameters may be readily estimated from
the similarity between any particular closed loop inside an RSFQ circuit and the simple
SQUID (Fig. 3a), quality design requires their intensive numerical optimization (see
Sec. 4.7 below) using a specific design criterion. The parameters cited below are the
result of such optimization for the broadest noise margins when operating at a relatively
modest clock frequency $f_c = (75\tau_0)^{-1}$. Other optimization criteria, more relevant to
different design goals, may lead to somewhat different parameter values and even
different schematics. Description of some other RSFQ devices can be found in a Web-
browsable library52 and original literature.
All RSFQ devices may be divided into 2 groups:
- asynchronous components with no internal memory, which generate an
output SFQ pulse immediately upon the arrival of an input pulse, and
- synchronous (clocked) devices with internal memory, where the generation of
an output pulse may be delayed substantially after the arrival of data SFQ pulse(s), until
the arrival of one more SFQ pulse playing the role of the clock.

3.1. Asynchronous components

3.1.1. Josephson transmission line

Figure 5 shows a few segments of the simplest active RSFQ component, the Josephson
Transmission Line (JTL), which had been repeatedly discussed in literature long before
the full RSFQ family was suggested in 1985.

Fig. 5. Josephson transmission line: (a) schematic of a 4-stage fragment and (b) typical layout of a 2-stage
fragment. Nominal parameters: $I_{c1} = I_{c2} = \ldots = I_c$ = 250 µA, $I_1 = I_2 = \ldots = I_{DC}$ = 175 µA, $L_2 = L_3 = \ldots = L$ = 4.0
pH ($LI_c \approx 0.5\,\Phi_0$). For finite-length JTL fragments, the edge inductances ($L_1$ and $L_5$) are half of the internal
inductance $L$. Figure (b) courtesy of HYPRES, Inc.

In the initial state of the line, the equal dc supply currents $I_1 = I_2 = \ldots = I_{DC} \approx 0.7\,I_c$
feed the Josephson junctions, creating, in accordance with Eq. (6), equal dc phase drops $\phi_1 =
\phi_2 = \ldots = \arcsin(I_{DC}/I_c) \approx \pi/3$ across each junction. Now, let an SFQ pulse propagate
from the left to the right in Fig. 5a. There are two alternative but equivalent languages
which describe the dynamics of the JTL (or any other RSFQ device).
In the phase-current language, the SFQ pulse propagating from input (A) to output
(B) induces a $2\pi$ leap of the Josephson phase difference $\phi$ (similar to those discussed
above in the context of the simple SQUID loop) across one junction of the line at a
time. For brevity, we say that the corresponding junction "switches", although the
voltage across the junction vanishes both before and after this brief event; see Eq. (5).
Let us start from the moment when junction $J_1$ switches. For the loop comprising
that junction, inductance $L_2$ and junction $J_2$, this event is equivalent to the insertion of
additional external flux $\Delta\Phi = \Phi_0$.53 This increase causes an immediate increase of the
current in the loop by $\Delta I = \Phi_0/(L_{J1} + L_2 + L_{J2})$, where $L_J$ are the effective inductances of the
Josephson junctions; see Eq. (7). In the JTL, the segment inductances $L_2 = L_3 = \ldots = L$
are made low: $LI_c < \Phi_0$ (non-quantizing loops). Thus the new value of current through
$J_2$, which is the sum of $\Delta I$ and the dc current $I_{DC} \approx 0.7\,I_c$, exceeds $I_c$. As a result, this
junction switches just like its predecessor, with a delay $\tau_D \approx 4\tau_0$. Now $\phi_2 = \pi/3 + 2\pi$, i.e.
the difference between $\phi_1$ and $\phi_2$ is small again, while the difference between $\phi_2$ and $\phi_3$ is
close to $2\pi$. For the loop $J_1$-$L_2$-$J_2$ this means a reduction of the effective external flux by
$\Phi_0$, with all the junction currents below $I_c$, and the cell becomes dormant, but for the
next loop ($J_2$-$L_3$-$J_3$) it means the increase of flux by $\Phi_0$, and the beginning of the similar
switching process in $J_3$.
We see that the extra flux quantum propagates along the JTL, with a delay of $\tau_D \approx
4\tau_0$ per cell, as if it were crossing the Josephson junctions one by one. The latter description
corresponds to the magnetic flux language of description of RSFQ circuits; below we
will use whichever language is more convenient in the particular situation.
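The current step that triggers the next junction of the JTL can be estimated numerically from the nominal parameters quoted for the line (critical current 250 µA, dc bias 175 µA, segment inductance 4.0 pH). The sketch below is a linearized estimate of ours: it assumes the standard small-signal Josephson inductance, Φ_0/(2π I_c cos φ) at the dc bias point, as the effective junction inductance, which may differ in detail from the expression the authors refer to as Eq. (7). The function names are hypothetical, and the real switching dynamics are of course nonlinear.

```python
import math

PHI_0 = 6.62607015e-34 / (2.0 * 1.602176634e-19)    # flux quantum h/2e, Wb


def josephson_inductance(i_c, i_bias):
    """Assumed small-signal inductance of a junction biased at i_bias < i_c."""
    phase = math.asin(i_bias / i_c)
    return PHI_0 / (2.0 * math.pi * i_c * math.cos(phase))


def jtl_current_step(i_c, i_bias, l_segment):
    """Loop current increase after one flux quantum enters a JTL cell."""
    l_j = josephson_inductance(i_c, i_bias)
    return PHI_0 / (l_j + l_segment + l_j)


def jtl_delay(n_stages, tau_0):
    """Total delay of n stages at about 4*tau_0 per stage."""
    return 4.0 * n_stages * tau_0


i_c, i_dc, l_seg = 250e-6, 175e-6, 4.0e-12           # nominal JTL parameters
delta_i = jtl_current_step(i_c, i_dc, l_seg)
print(f"L_J        = {josephson_inductance(i_c, i_dc) * 1e12:.2f} pH")
print(f"delta I    = {delta_i * 1e6:.0f} uA")
print(f"I_DC + dI  = {(i_dc + delta_i) * 1e6:.0f} uA  (I_c = {i_c * 1e6:.0f} uA)")
print(f"10 stages  = {jtl_delay(10, 0.7e-12) * 1e12:.0f} ps for tau_0 = 0.7 ps")
```

Within this estimate the bias current plus the current step comfortably exceeds the critical current, which is the condition for the next junction to switch. The 0.7 ps used in the last line is only an example value of the time scale.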
Notice that the amplitude of the SFQ pulse is the same on each junction, despite the
dissipation of energy $\Delta E \approx I_c\Phi_0$ (21) during each switching event. (The necessary
energy is picked up from the dc power supply providing dc currents $I_1, I_2, \ldots$.) This
recovery/amplification of the SFQ pulse in the JTL (and all other RSFQ devices) is due
to the fundamental quantization of flux and hence of the SFQ pulse area; see Eq. (24).
However, this quantization leaves free the current scale of the SFQ pulse (and hence its
impedance and energy scales), which may be regulated by the choice of $I_c$ (at fixed $LI_c$
product and $I_{DC}/I_c$ ratio).54 The choice of $I_c$ indicated in Fig. 5 corresponds to the
standard I/O impedance in the FLUX microprocessor.

3.1.2. Splitter and merger

Two straightforward generalizations of the JTL are the SFQ pulse splitter (Fig. 6) and
merger (Fig. 7). These devices are necessary in particular to complement RSFQ gates
with their limited fan-in and fan-out; see Sec. 3.3. (In the future, the fan-out may be
increased by incorporation of these components in the gates; this goal seems feasible,
but still has to be achieved.)
In the splitter, switching of junction $J_0$ causes the $\Phi_0$-increase of effective flux
applied to both branches $L_4$-$J_1$ and $L_5$-$J_2$. The resulting current increase $\Delta I$ adds to the
dc bias current and exceeds the critical currents of $J_1$ and $J_2$. As a result, both these
junctions switch virtually simultaneously, supplying SFQ pulses to the corresponding
outputs ($Q_1$ and $Q_2$).
In the reciprocal, merger circuit (Fig. 7) an input pulse (arriving, say, from input A)
switches junction $J_4$, first causing the $\Phi_0$-increase of the external flux applied to the
non-quantizing four-junction loop $J_0$-$J_1$-$J_2$-$J_4$. In junctions $J_0$ and $J_1$ the resulting current
increase $\Delta I$ adds to the dc bias current $I_{DC}$ (in $J_2$, $\Delta I$ and $I_{DC}$ have opposite directions)
and exceeds the critical current of $J_1$. (Although $I_{c0} = I_{c1}$, the former junction is
inductively shunted by the input circuit connected to B.) $J_1$ switches, applying an
additional flux to the loop consisting of $J_3$, $L_2$, $L_1$, and a parallel connection of two
junction pairs $J_0$-$J_1$ and $J_2$-$J_4$, and hence increasing the current through $J_3$ (which in fact
starts switching immediately after $J_4$). This increment accelerates the switching of $J_3$,
which forms the output SFQ pulse at Q.
If the input pulse arrives at B, the device dynamics is similar. If two SFQ pulses
arrive at A and B almost simultaneously, either one or two of them pass to the circuit
output. In the latter case, one of these pulses can be readily decimated with a special
circuit at the input of the next latch. With this addition, the device shown in Fig. 7 may
be used as an asynchronous OR gate.56

Fig. 6. SFQ pulse splitter ("fork"). Nominal parameters: $I_{c0}$ = 250 µA, $I_{c1} = I_{c2}$ = 163 µA, $I_0$ = 400 µA,
$L_1$ = 1.98 pH, $L_2$ = 1.68 pH, $L_3$ = 0.84 pH, $L_4 = L_5$ = 0.79 pH.

Fig. 7. SFQ pulse merger (asynchronous OR gate). Nominal parameters: junction critical currents 144 µA (three
junctions) and 150 µA (two junctions), dc bias current 313 µA, inductances 1.97 pH (three), 0.53 pH and 4.47 pH.

3.1.3. Transmitter and receiver for passive transmission lines

Figures 8 and 9 show two components of a transceiver for passive superconductor
transmission lines (cf. Fig. 1). Both circuits are essentially short JTL segments with
Josephson junction critical currents changing along the line. This provides a reasonable
matching of the effective circuit I/O impedance for this particular RSFQ device family,
$R = R_u/2$ = 1.9 Ω (where $R_u \equiv \Phi_0/2\pi I_u\tau_0$ = 3.8 Ω), to a higher impedance $Z$ = 4.6 Ω of
the transmission lines. This transformation allows narrower lines and thus considerable
chip real estate savings. (A small series resistor $R_1$ in the receiver prevents undesirable
dc interaction between the transmitter and receiver: without the resistor, a supercurrent
could flow along the superconductor strip.)

3.1.4. Asynchronous component characterization

Table 2 gives a summary of the most important time parameters of the components
described above, listed in units of $\tau_0$; see Eq. (17).
Parameter $\tau_D$ is the full time delay of the SFQ pulse measured, e.g., from the instant
of maximum of current through the input terminal to that in the output terminal.
Parameter $\alpha$, which is defined as

$$ \alpha^2 = \sum_i I_{ci}^2\,(\partial\tau_D/\partial I_{ci})^2, \qquad (25) $$

characterizes the sensitivity of the delay $\tau_D$ to random, independent variations of critical
currents $I_{ci}$ of all Josephson junctions of the circuit. (We will need it for the discussion
of junction parameter spread effects in Sec. 3.3.)

Fig. 8. SFQ pulse transmitter. Nominal parameters: $I_{c1}$ = 175 µA, $I_{c2}$ = 125 µA, $I_1$ = 212 µA,
$L_1$ = 1.97 pH, $L_2$ = 1.58 pH, $L_3$ = 3.95 pH, $L_4$ = 0.66 pH.

Fig. 9. SFQ pulse receiver. Nominal parameters: $I_{c1}$ = 125 µA (besides the explicitly shown resistors, this
junction is unshunted), $I_{c2}$ = 175 µA, $I_1$ = 212 µA, $L_1$ = 1.79 pH, $L_2$ = 3.95 pH, $L_3$ = 1.58 pH, $L_4$ = 1.97
pH, $R_1$ = 0.71 Ω, $R_2$ = 7.4 Ω.

Table 2. Speed and jitter parameters of the key asynchronous RSFQ devices (in units of $\tau_0$). Merger
parameters are for the case of well separated input pulses; if the pulses are simultaneous the delay is lower.

Component        Pulse delay $\tau_D$   Delay sensitivity $\alpha$   Jitter $\delta t$
JTL (1 stage)    4.0                    5.9                          0.065
Splitter         9.5                    7.0                          0.11
Merger           9.0                    9.2                          0.17
Transmitter      3.7                    3.2                          0.09
Receiver         6.0                    7.6                          0.11

The last of the listed parameters is the r.m.s. fluctuation ("jitter") $\delta t$ of the time delay
due to the thermal fluctuations. It has been calculated for the usual operation
temperature $T$ = 4.2 K, using a method similar to, but somewhat more precise than, that
discussed in detail in Refs. 55, 56. (In fact, these methods are strictly valid for $\beta_c \ll 1$,
while for practical circuits with $\beta_c = 1$ they give a slightly exaggerated jitter estimate.)
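The delay and jitter columns of Table 2 can be combined into a simple timing budget for a chain of asynchronous components. The sketch below is our own illustration: it adds the delays linearly and the jitters in quadrature, the latter under the assumption (ours, not stated in the text) that the thermal jitter contributions of different components are statistically independent. The dictionary `TABLE_2`, the function `path_timing`, the example path and the time scale of 0.7 ps are all hypothetical choices.

```python
import math

# (pulse delay, r.m.s. jitter) in units of tau_0, copied from Table 2.
TABLE_2 = {
    "jtl": (4.0, 0.065),
    "splitter": (9.5, 0.11),
    "merger": (9.0, 0.17),
    "transmitter": (3.7, 0.09),
    "receiver": (6.0, 0.11),
}


def path_timing(components):
    """Return (total delay, total r.m.s. jitter) of a chain, in units of tau_0."""
    delay = sum(TABLE_2[name][0] for name in components)
    # Independent contributions are assumed, so the variances add.
    jitter = math.sqrt(sum(TABLE_2[name][1] ** 2 for name in components))
    return delay, jitter


path = ["jtl", "jtl", "splitter", "transmitter", "receiver"]
delay, jitter = path_timing(path)
tau_0 = 0.7e-12                                   # s, example time scale
print(f"delay  = {delay:.1f} tau_0 = {delay * tau_0 * 1e12:.1f} ps")
print(f"jitter = {jitter:.3f} tau_0 = {jitter * tau_0 * 1e15:.0f} fs")
```

Because the jitters add in quadrature while the delays add linearly, the relative timing uncertainty of a long chain decreases as the chain grows, provided the independence assumption holds.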

3.2. Latches

In contrast to the asynchronous devices, which work very similarly to the usual combinational
logic in semiconductor digital electronics, clocked devices have internal memory and
should be formally treated as finite state machines.57 We will start the review of these
devices with the simplest species, latches (flip-flops).

3.2.1. D flip-flop

Figure 10 shows the simplest RSFQ latch, the D flip-flop, built around a
quantizing loop $J_1$-$L_2$-$L_3$-$J_2$ which may be in either of two equilibrated flux states, "0"
and "1". Since $L_2 > L_3$, in the initial state "0" the dc supply current $I_0$ flows to the
ground mostly through junction $J_2$, creating a sub-critical Josephson phase drop $\phi_2 \approx \pi/3$
across this junction, while the phase drop across $J_1$ is small. This is why when an SFQ
pulse arrives from the data terminal D (passing through a buffer stage $L_7$-$J_5$-$L_5$ which is
similar to one JTL stage and provides impedance matching) it switches junction $J_2$,
inserting the extra external flux $\Delta\Phi = \Phi_0$ into the quantizing loop. However, since the
loop inductance is large ($LI_{c1} \approx \Phi_0$), the resulting clockwise current $\Delta I \approx
\Phi_0/(L_{J1} + L_2 + L_3 + L_{J2}) \sim \Phi_0/L$ is insufficient to bring the total current through $J_1$ above its
critical current value $I_{c1}$. Hence the circuit now rests in its other stationary flux
state, "1". In this state the persistent current $\Delta I$ circulates in the quantizing loop
clockwise; in $J_2$ it subtracts from the initial dc bias current, making this junction almost
unbiased ($\phi_2 \ll \pi/2$). On the contrary, in $J_1$ the persistent current adds to the initial
dc bias, creating a subcritical phase drop $\phi_1 \approx \pi/3$. This is why when an SFQ pulse
arrives at the clock input C (via a buffer stage $L_6$-$J_6$-$L_4$), it switches junction $J_1$ rather
than $J_2$. As a result of this switching, an output SFQ pulse is formed across junction $J_1$,
while the flip-flop returns to its initial flux state "0". The buffer stage $L_1$-$J_0$-$L_0$ passes the
generated SFQ pulse to the output terminal Q.
In the magnetic flux language, a flux quantum incident from input D enters
the quantizing loop and is stored there until it is released by the clock pulse and is
free to propagate to the next RSFQ circuit.
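Viewed as a finite state machine, the D flip-flop described above is very small. The following behavioural sketch is our own abstraction and ignores all circuit dynamics: a data pulse stores a flux quantum, and a clock pulse releases it as an output pulse if one is stored. The class and method names are hypothetical, and the handling of a second data pulse arriving while the loop is already in state "1" (here: no change) is an assumption of ours.

```python
class DFlipFlop:
    """Behavioural sketch: one stored flux quantum, released by the clock."""

    def __init__(self):
        self.state = 0                 # flux state of the quantizing loop

    def data_pulse(self):
        # Assumption: a data pulse arriving in state "1" changes nothing.
        self.state = 1

    def clock_pulse(self):
        """Return True if an SFQ pulse is emitted at output Q."""
        emitted = self.state == 1
        self.state = 0                 # the loop is back in state "0"
        return emitted


def run(sequence):
    """Feed a string of 'D' (data) and 'C' (clock) pulses; collect clock results."""
    flip_flop = DFlipFlop()
    outputs = []
    for pulse in sequence:
        if pulse == "D":
            flip_flop.data_pulse()
        elif pulse == "C":
            outputs.append(flip_flop.clock_pulse())
        else:
            raise ValueError(f"unknown pulse type: {pulse!r}")
    return outputs


print(run("DCCDDC"))   # [True, False, True]
```

A clock pulse with no preceding data pulse yields no output, which corresponds to reading out a binary "0".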

Fig. 10. D flip-flop. Nominal parameters: junction critical currents 276 µA, 268 µA and 269 µA, and 250 µA for the
buffer junctions (including $I_{c5}$); dc bias currents 128 µA, 175 µA and 220 µA; $L_1$ = 3.08 pH, $L_2$ = 6.58 pH,
$L_3$ = 1.32 pH, $L_5$ = 0.66 pH, and three further inductances of 1.97 pH each. Here and below, line
indicates a large (quantizing) inductance.

Fig. 11. D2 flip-flop. Nominal parameters: junction critical currents 250 µA (including $I_{c5}$ and $I_{c7}$) and, for two
junctions, 259 µA; dc bias currents 161 µA (two sources) and 296 µA; $L_0 = L_5 = L_9$ = 1.97 pH, $L_1 = L_{10}$ = 1.31 pH,
$L_2 = L_{11}$ = 1.47 pH, $L_7$ = 2.52 pH, and one further inductance of 1.71 pH.

Notice that the clock pulse, generated by the buffer junction $J_6$, is applied to
junctions $J_3$ and $J_1$ in series. In the "regular" operation cycle described above, $J_3$ is dc
biased less than $J_1$ and is not switched. However, if the clock pulse arrives when the
flip-flop is in its "0" state (which happens if no data pulse arrives between two
sequential clock pulses), $J_1$ is only weakly biased, and the clock switches junction $J_3$
instead, without any consequence for the quantizing loop. In the magnetic language, the
clock pulse flux quantum drops out of the circuit across junction $J_3$.

3.2.2. D2 flip-flop

Figure 11 shows a more complex latch called the D2 flip-flop.58 This device still has only
one quantizing loop, formed by junction $J_5$, inductances $L_6$ and $L_7$ in series, and a
parallel connection of two junction pairs, $J_4$ and $J_1$, and $J_6$ and $J_9$. This loop
may reside in one of two flux states, "0" and "1". As in the D flip-flop,
in the initial "0" state the input junction $J_5$ is sub-critically biased, so that the SFQ
pulse arriving from data input A switches this junction, and the whole quantizing loop,
into the opposite flux state "1". In this state, branches $J_4$-$J_1$ and $J_6$-$J_9$ carry sub-critical
currents (in Fig. 11, directed up for junctions $J_4$ and $J_1$, and down for junctions $J_6$ and
$J_9$). Junction $J_3$ protects the quantizing loop from the effect of a possible second data
pulse during the same clock cycle, like $J_3$ does with the "extra" clock pulse in the D
flip-flop described above (Fig. 10).
If now a clock SFQ pulse arrives, e.g., from input T0, it passes the buffer stage L0-Jo-
Lx and then switches junction J\. (If the flip-flop were in state "0", junction J6 would be
unbiased, and J2 would be switched instead, dropping the input flux quantum out of the
circuit.) As a result of this switching, an SFQ pulse is sent to output terminal Q0, and the
effective flux applied to the non-quantizing loop is increased by Φ0. Simultaneously,
current through junction J6 increases beyond IC6, and it is switched, completing the
transient process. (Notice that the whole process of the four-junction loop switching is
similar to that in the merger which was discussed in Sec. 3.1.2 above - see Fig. 7.)
If the clock pulse arrives from T1 rather than from T0, the transient process is
similar, leading to the sequential switching of junctions J8 and J9, with the output SFQ
pulse sent to Q1 and the quantizing loop returned to its initial state "0". Thus the D2
flip-flop sends the trapped data (if it has arrived after the previous clock pulse) to the
output corresponding to the clock input.
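
The input/output behaviour described above can be summarized in a small behavioural sketch. The Python class below is an illustration only, not a circuit model: the class and method names are hypothetical, and it tracks just the logical flux state of the quantizing loop, ignoring junction dynamics and timing.

```python
class D2FlipFlop:
    """Behavioural sketch of the D2 flip-flop: one stored bit, two clock inputs."""

    def __init__(self):
        self.stored = False  # flux state "0" of the quantizing loop

    def data(self):
        # A data pulse sets the loop to "1"; a second data pulse in the
        # same clock cycle is rejected and leaves the state unchanged.
        self.stored = True

    def clock(self, which):
        """Apply a clock pulse to T0 (which=0) or T1 (which=1).

        Returns the index of the output (0 for Q0, 1 for Q1) that emits
        an SFQ pulse, or None if no data was trapped.
        """
        if which not in (0, 1):
            raise ValueError("which must be 0 (T0) or 1 (T1)")
        if not self.stored:
            return None          # the clock flux quantum drops out of the circuit
        self.stored = False      # the loop returns to state "0"
        return which


ff = D2FlipFlop()
ff.data()
first = ff.clock(1)   # the trapped bit is delivered to Q1
second = ff.clock(0)  # nothing is stored any more
```

The single boolean mirrors the fact that the device has only one quantizing loop; the choice of output is made by the clock input, not by the data.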

3.2.3. TxRS flip-flop

This device (Fig. 12) is one more truncation of the B flip-flop.60 Its quantizing loop is
formed by inductances L7 and is and branches of two non-quantizing four-junction
loops {J\-J(, in parallel with Jg-Ju, and J3-J7 in parallel with J9-J\4). Each loop operates
like that in the merger (Fig. 7) or the D2 flip-flop (Fig. 11). For example, if an SFQ
pulse arrives from terminal Si, the right non-quantizing loop lets a flux quantum into the
quantizing loop by switching junctions J3 and then J9. The reset may be achieved by
applying a pulse to input S0; this leads to the flux quantum extraction through the left
non-quantizing loop by successive switching of J] andJg. (The former switching also
leads to formation of an SFQ pulse at the destructive output terminal QD.)
Another way to switch this flip-flop is to feed its toggle input T with a pulse; in this
case the quantizing loop always switches to the state opposite to the one it had before
the pulse. Every second toggle generates an output pulse at output Q.
A slight modification of the device (the addition of one more output) turns it into a so-
called T1 flip-flop, which may be quite useful for application in decimation filters61 and
as the main component of a single-bit full adder60. In the FLUX project we, however,
use the TxRS flip-flop only as a basic cell of the clock controller (similar to that
described in Ref. 62, but with the additional option of running a pre-set number of
cycles).

Fig. 12. TxRSflip-flop.Nominal parameters: Ico = 264 uA, Ia = 228 uA, ICz = 250 nA, la = 216 uA, 1& =
375 pA, fo = 266 (jA, fa = 210 (LA, /c7= 126 ytA, Ia = 276 uA, / CT = 146 pA, /Cio = 280 \sA, Icu = Zeis =
250 uA, ICn = 226 jiA, / c , 4 = 229 uA, 7Ci5 = 280 uA, h = 219 uA, /| = 156 uA, 72 = 293 pA, 73 = 130 uA, 74
= h = 125 (lA, 75 = 340 pA, 77 = 293 (lA, i 0 = i 3 = i6= i 9 = Lu = 1.98 pH, L, = 1.24 pH, L2 = 5.28 pH, L4
= 5.97 pH, Z.5 = 1.76 pH, I 7 = 3.16 pH, Ls = 1.13 pH, £,o= 2.43 pH, I „ = 1.08 pH, I, 2 = 1.10 pH, Lu =
1.13 pH.

3.2.4. NDRO memory cell

Any latch is essentially a single-bit memory cell, but in the flip-flops considered
above the information readout is always destructive. Figure 13 shows a more complex
cell enabling non-destructive readout (NDRO) of the stored bit.63 The cell is built
around the quantizing loop J2-L2-L3-J9 which is switched from "0" to "1" by a pulse
from input SET1 and switched back by a pulse from SET0, similarly to the D flip-flop
discussed above. (The latter process yields an output pulse at the auxiliary output Q0.)
The NDRO readout is enabled by an additional circuit including a series connection
of two additional Josephson junctions J3 and J7 which is "nested" on the quantizing
loop. If the loop is in state "0", the Josephson phase φn in the nesting point (between
inductances L2 and L3) is small, and thus all the junctions of the string have small phase
drops and hence carry little supercurrent. (Additional dc current I5 makes the phase
drop across J3 slightly negative.) As a result, if an SFQ pulse arrives at terminal RD and
is applied to junctions J5 and J7 in series, the former junction is switched, and no output
signal is developed (as required from function READ 0). However, if the quantizing
loop has been switched into flux state "1", the Josephson phase drop across J9 is close
to 2π. As a result, phase φn (which is close to the mean of φ2 and φ9) is close to π.
This phase drop is divided between junctions J3 and J7, so that the latter two junctions
are now sub-critically biased. As a result, the pulse from RD switches J7 rather than J5,
developing an SFQ pulse at the NDRO output Q. The transient is completed by
switching J3 in the opposite direction, so that the final value of φn and hence the
quantizing loop state are not affected by the NDRO process.
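
The difference between destructive and non-destructive readout can be illustrated with a minimal behavioural sketch. The Python class below is hypothetical (names `NdroCell`, `set1`, `set0`, `read` are ours) and models only the logical state of the quantizing loop. It assumes that the auxiliary output Q0 emits a pulse only when a stored "1" is being erased, which the text implies but does not state explicitly.

```python
class NdroCell:
    """Behavioural sketch of the NDRO cell: set, reset and non-destructive read."""

    def __init__(self):
        self.bit = 0  # flux state of the quantizing loop

    def set1(self):
        # Pulse at the input that switches the loop from "0" to "1".
        self.bit = 1

    def set0(self):
        # Pulse at the input that switches the loop back to "0".
        # Assumption: the auxiliary output Q0 pulses only if a "1" was stored.
        emitted = self.bit == 1
        self.bit = 0
        return emitted

    def read(self):
        # Pulse at RD: output Q pulses if the loop holds "1";
        # the stored state is left untouched.
        return self.bit == 1


cell = NdroCell()
cell.set1()
reads = [cell.read(), cell.read()]  # repeated reads return the same bit
aux = cell.set0()                   # erasing the "1" pulses Q0
after = cell.read()
```

In contrast to the flip-flops of the previous subsections, `read` here has no side effect on the state.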

Fig. 13. NDRO memory cell. Nominal


parameters: /co = 235 P-A, Ici = 291
uA, la = 259 uA; la = 126 |JA, h =
350 uA, Ic6 = 318 uA, la = 235 (tA,
la, = 303 uA, Io = 375 uA, /o = 134
uA, /i = 129 uA, /2 = h = 127 uA, h =
140 uA, /4 = 251 uA, h = 163 (lA, io =
o-~~ 1.52 pH, U = i 5 = L-, = i,o = 1.98 pH,
£2 = 2.68 pH, U = 1.68 pH, U = 4.31
pH, £6 = 1 08 pH, U = 2.65 pH, L9 =

3.3. Clocked gates

3.3.1. Standard RSFQ logic

Before discussing particular logic gates, a signaling protocol in RSFQ circuits should be
clearly defined. It should differ from that in the ordinary combinational logic accepted
in semiconductor electronics, because of two inter-related factors:
- "return-to-zero" nature of SFQ pulses, and
- natural internal memory of quantizing SFQ loops.
Most RSFQ circuits implemented so far have been based on the standard RSFQ
protocol49 illustrated schematically in Fig. 14a.
Fig. 14. (a) The standard RSFQ protocol and (b) a typical clocked gate. Timing parameters shown in (a) are
discussed in detail in the text below.

In this system, a signal in a data line is treated as binary "1" if it carries an SFQ pulse
within the given clock period - see signal D1. On the contrary, the absence of the pulse
during this time interval (see signal D2) is understood as binary "0". More generally,
any RSFQ circuit using the orthodox protocol may be considered as a connection of
asynchronous components and clocked gates ("elementary cells"8 or "logic latches").64
Such a gate has a few (typically two) internal states and functionally may be considered
as an explicit or implicit integration of combinational logic and a latch. Input SFQ
pulses change the state of the latch which stores this information until the arrival of the
clock pulse. This pulse triggers output signal(s) and resets the cell into its initial state.
For example, within this protocol, the D flip-flop (see Sec. 3.2.1 above) may be called a
"clocked YES" gate. Let us discuss the implementation of a few other logic functions.

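The protocol described above can be made concrete with a short behavioural sketch. The Python code below is an illustration under simplifying assumptions: pulse arrival times are plain numbers in arbitrary units, the names `decode` and `ClockedYes` are hypothetical, and a data pulse coinciding exactly with a clock pulse is (arbitrarily) assigned to the period that the clock pulse opens.

```python
from bisect import bisect_left


def decode(data_times, clock_times):
    """Return one bit per clock period: 1 if a data pulse fell inside it."""
    data = sorted(data_times)
    bits = []
    for start, end in zip(clock_times, clock_times[1:]):
        i = bisect_left(data, start)  # first data pulse at or after this clock
        bits.append(1 if i < len(data) and data[i] < end else 0)
    return bits


class ClockedYes:
    """D flip-flop viewed as a 'clocked YES' gate: a latch read out by the clock."""

    def __init__(self):
        self.state = 0

    def data(self):
        self.state = 1  # the data pulse is stored in the latch

    def clock(self):
        out, self.state = self.state, 0  # the clock triggers the output and resets the cell
        return out


bits = decode([1.5, 3.2], [0, 1, 2, 3, 4])

gate = ClockedYes()
gate.data()
outputs = [gate.clock(), gate.clock()]
```

The second clock pulse finds the latch empty, which is exactly how the absence of a data pulse within a clock period is read as binary "0".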
3.3.2. Inverter

The RSFQ inverter8,69 (Fig. 15) is built around a quantizing loop J2-L3-J3-L2 which may
be set ("0"→"1" switched) by a data pulse arriving from D, very much as in the devices
described above. However, the quantizing loop is now not directly connected to the
common ground, but is separated from it by an additional junction (J1). As a result,
when a clock pulse arrives from terminal C, it is applied to J2 and J1 in series. If the
quantizing loop is in its flux state "1", junction J2 is sub-critically biased, and the clock
switches it, producing an SFQ pulse across it, but no pulse at the gate output Q. Such an
output pulse only appears if by the arrival of the clock pulse the quantizing loop was in
its "0" state, i.e., if no data pulse(s) has arrived at the device input since the previous
clock pulse (which has reset the loop into the "0" state).

Fig. 15. Clocked inverter. Nominal
parameters: ha = 295 |iA, /n = 268
MA, In = 235 uA, In = 141 uA, la
= 129 uA, la = 269 pA /«, = 248
HA, In = 146 |0A, In = 125 uA, /, =
130 |lA, h = 191 |iA, I3 = 126 ilA,
h = 215 uA, L„ = L7 = L% = 1.97
pH, Lx = 3.08 pH, Li = 0.58 pH, i 3
= 6.34 pH, U = 1.66 pH, i 5 = 1.63
pH, L6 = 0.66 pH.

3.3.3. XOR gate

Figure 16 shows a clocked XOR gate8'70'71. This device is very much similar to the
merger (Fig. 7), but the main loop (including junctions J6 and J5, inductance Z,7, and two
parallel branches L6-Ji-L4-L2 and L\\-J\srLu-J9) is now quantizing. If the loop has been
reset to its initial state "0", junctions J2 and J9 are both sub-critically biased, so that an
SFQ pulse incident from either input A or input B switches the loop to the opposite flux
state "1" (e.g., in the case of a pulse from A, by switching sequentially junctions J2 and J10,
exactly as was described in Sec. 3.1.2). This switching adds the persistent current ΔI to
the dc current flowing through junction J6, and biases it sub-critically. If only one data pulse
has arrived before the clock pulse C, the latter pulse switches J6 rather than
J4, forming an output SFQ pulse at terminal Q.
However, if both pulses A and B arrive before C, the second of these pulses
increases the persistent current in the quantizing loop beyond the critical current of
junction J5, and this junction switches, letting the extra flux out of the quantizing loop.
As a result, the gate returns to state "0", with junction J6 virtually unbiased, so that the
clock pulse switches J 4 rather than J6, and no output pulse is formed at output Q. The
same happens if no data pulses have arrived during the clock period, so that the full
truth table of XOR function is faithfully implemented.

Fig. 16. Clocked XOR. Nominal parameters:/CD = /CI = Ia-ta~ 250 UA,/C2 = /CT = 27I (xA,/c3 = /cio =
293 uA, / C 4=210(iA,/c5 = 214uA,/c6=255uA, h = I* = 206 uA, /, = / 5 = 119 \iA,li = 255 uA, h =
236 uA, La = L2 = Lm = i 12 =198 pH, U = 0.29 pH, i 3 = La = 3.98 pH, i 4 = £ M = 1.43 pH, £ 5 = 1.47 pH,
L6 = Li i = 4.23 pH, L7 = 0.53 pH, i 8 = 4.02 pH, L, = 1.33 pH.

3.3.4. AND gate

Finally, Figure 17 shows a clocked AND gate.70 Its inputs are fed into two similar D
flip-flops (cf. Fig. 10) that operate exactly as was described in Sec. 3.2 above. Clock
pulse passes through the buffer stage Ln-J$-L(, and is applied simultaneously to output
junctions of both D flip-flops. If both data bits A and B have arrived by that time,
junctions Jx and Ju switch simultaneously and provide large enough current to force
switching of junction J8 and the formation of the output pulse first across that junction
and then, after passing the buffer stage Lg-L9-J9-Lw, at output terminal Q.
On the other hand, if only one input pulse (say, A) has arrived before the clock, the
additional current through J8 is not sufficient for switching of that junction, and J 4 is
switched instead, letting the additional flux quantum out of the circuit, with no output
pulse formation. Finally, if no data pulse has arrived during the given clock period,
neither of junctions Ju Jn is switched (J3 and Je are switched instead by the clock
pulse), and no output pulse is formed either.
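
At the functional level, the inverter, XOR and AND gates described in this section all follow the same "latch plus logic" pattern, which the following Python sketch tries to capture. It is not a model of the junction dynamics: the class `ClockedGate` and the helper `run` are hypothetical names, and the sketch assumes that repeated pulses on the same input within one clock period count once.

```python
class ClockedGate:
    """Functional sketch of a clocked RSFQ gate: inputs are latched, the clock evaluates."""

    def __init__(self, inputs, func):
        self.inputs = tuple(inputs)
        self.func = func
        self.arrived = set()

    def data(self, name):
        if name not in self.inputs:
            raise KeyError(name)
        self.arrived.add(name)  # the internal state remembers the pulse

    def clock(self):
        bits = [name in self.arrived for name in self.inputs]
        self.arrived.clear()    # the clock pulse resets the cell
        return int(self.func(*bits))


def run(gate, pulses):
    """Feed the listed data pulses during one clock period, then clock the gate."""
    for name in pulses:
        gate.data(name)
    return gate.clock()


inverter = ClockedGate(["D"], lambda d: not d)
xor_gate = ClockedGate(["A", "B"], lambda a, b: a != b)
and_gate = ClockedGate(["A", "B"], lambda a, b: a and b)

cases = ([], ["A"], ["B"], ["A", "B"])
xor_table = [run(xor_gate, p) for p in cases]
and_table = [run(and_gate, p) for p in cases]
not_table = [run(inverter, p) for p in ([], ["D"])]
```

Note that the inverter produces an output pulse precisely when no data pulse has arrived, which is only meaningful because the clock defines the period in which the absence of a pulse is observed.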

3.3.5. Clocked component characteristics

As evident from Fig. 14, clocked flip-flops and logic gates cannot be fully characterized
(as asynchronous components are) by just the delay tD between the input (in this case clock)
pulse and the output pulse; at least three more time constants have to be included:
- the minimum value of the interval τDC between the last of the data pulses (DN in
Fig. 14a) and the next clock pulse, at which the device operates correctly,
- the minimum value of the interval τCD between the clock pulse and the first
data pulse (D1 in Fig. 14a), and
- the minimum value(s) of the interval(s) τDD between the data pulses.
(Notice that in all single-input-bit gates, such as the inverter, the data-to-data interval is
not defined. Moreover, some two-input-bit gates, like the AND and XOR discussed
above, may operate at an arbitrary data-to-data interval τDD. In these cases, in all the
forthcoming formulas (τDD)min should be set to 0.)

Fig. 17. Clocked AND. Nominal parameters: Ia> = k\ l = 199 uA, lCi = /cu = 266 uA, Ia = lew = 234 uA;
la = /C6 = 231 MA, ICA = /c7 = /CS = 190 (iA, /C5 = / « = 250 pA, /0 = 73 = 171 uA, /, = 278 (lA, h = 238
HA, lo = 1)3 = 1.37 pH, U = i n = 1.00 pH, i 2 = Lis = 4.34 pH, I 3 = U = I , , = 0.72 pH, i 3 . = I 1 2 = 0.99
oH. U = 2.03 pH. Ls = 0.62 oH. £, = 0.69 oH. £« = 2.25 oH. £o = 0.80 DH. Lm = 1.88 DH.

Table 3 shows the timing parameters for the clocked devices discussed above. Since
the sum τCD + τDD + τDC defines the clock period, its minimum value determines the
maximum clock frequency

$$ (f_c)_{\max} = 1/(\tau_{CD} + \tau_{DD} + \tau_{DC})_{\min} \qquad (26) $$

of the gate in the absence of fluctuations.

Table 3. Basic speed parameters of RSFQ clocked devices (flip-flops and logic gates) in units of τ0. Notice
that in some gates including AND and XOR, the data-to-data interval τDD may be arbitrarily small, while in
single-bit circuits like inverter and simple flip-flops, this interval is not defined. For NDRO cell, this
parameter depends on the data signal order; the first value is for SET preceding RESET.

Device         tD     α      δt      (τDC)min   (τCD)min   (τDD)min

D flip-flop    11.0   11.5   0.11    10         11         n/a
D2 flip-flop   14.1   12.0   0.15    14         14         0
NDRO cell      14.5   15.3   0.12    26         38         17/26
Inverter       16.0   15.1   0.19    12         17         n/a
XOR            14.4   13.0   0.14    12         12         0
AND            22.7   14.0   0.145   0          27         0
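
As a worked illustration of Eq. (26), the short Python sketch below sums the three minimum intervals of Table 3 to obtain the noise-free minimum clock period of each device in units of τ0. The assignment of the last three table columns to (τDC)min, (τCD)min and (τDD)min is our reading of the table header and should be treated as an assumption; undefined data-to-data intervals are set to 0, as the text prescribes.

```python
# ((tau_DC)min, (tau_CD)min, (tau_DD)min) in units of tau0; None = not defined.
TABLE3 = {
    "D flip-flop": (10, 11, None),
    "D2 flip-flop": (14, 14, 0),
    "NDRO cell": (26, 38, 17),  # tau_DD value for SET preceding RESET
    "Inverter": (12, 17, None),
    "XOR": (12, 12, 0),
    "AND": (0, 27, 0),
}


def min_period(device):
    """Minimum clock period from Eq. (26), in units of tau0."""
    dc, cd, dd = TABLE3[device]
    return dc + cd + (dd if dd is not None else 0)


periods = {name: min_period(name) for name in TABLE3}
# The maximum clock frequency in units of 1/tau0 is the reciprocal of the period.
max_freq = {name: 1.0 / t for name, t in periods.items()}
```

Under this reading, the simple flip-flops and gates need roughly 20 to 30 τ0 per clock cycle, while the NDRO cell is markedly slower.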

The main effect of thermal fluctuations If(t) is the finite probability p of "decision"
errors occurring at the moment of SFQ pulse arrival at the cell, in addition to the
"storage" errors which occur during the passive waiting time.55 (The latter errors are
characterized by a rate Γ - see Eq. (21) and its discussion.)
A unified description of both types of errors may be achieved in terms of
degradation of RSFQ device noise margins.55 Even if fluctuation-induced errors are
negligible, an RSFQ cell operates correctly only within a certain range of each
parameter, including notably the Josephson junction critical currents which are the most
sensitive parameters in RSFQ circuit fabrication technologies. In the device set
described in this paper, the noise margins are at least ±35% if each Ic is varied
individually and about ±30% if all critical currents are changed simultaneously and
proportionally.72 As the clock frequency approaches a certain maximum value, the
margins shrink - see solid lines in Fig. 18.

Fig. 18. Typical operation window of an RSFQ circuit (schematically). Solid lines: boundaries in the absence
of fluctuations; dotted lines: levels of a fixed bit error rate. Dashed arrows explain the definition of the
maximal clock frequency for a particular choice of/Q (and other parameters) in the presence of fluctuations.

For a typical RSFQ device and a given bit error rate, storage errors decrease the
parameter margins region slightly from one side, while decision errors cause a
considerably larger degradation of the operation region from another side (Fig. 18). For
fc ≪ (fc)max and a small deviation ΔI of Ic from the deterministic boundary of the
operation region, the decision error probability does not depend on clock frequency and
may be described by Gaussian statistics:

$$ dp/dI_c = (2\pi)^{-1/2} (\delta I_D)^{-1} \exp\{-(\Delta I)^2 / 2(\delta I_D)^2\}, \qquad (27a) $$

i.e.

$$ p = (1/2)\,[1 - \mathrm{erf}(\Delta I / (\sqrt{2}\,\delta I_D))], \qquad (27b) $$

with a frequency-independent r.m.s. fluctuation δID. An analytical theory for the gray
zone width δID due to thermal and quantum fluctuations has been developed73 and
confirmed experimentally74,75 only for a simple circuit (a balanced comparator) which
may serve as a model of the RSFQ decision-making component. Nevertheless, this
model is in semi-quantitative agreement with results of numerical modeling76,77 and
experimental studies55,78-81 of various gates. It shows that for thermal fluctuations

6ID~Q2K/n)mITV2Icm, (27c)

where K is a dimensionless parameter depending on the SFQ pulse shape. (For the usual
RSFQ design style described in the above device examples, K = 0.2, but in the case of
necessity this parameter may be reduced to K^,, ~ fcxJ2 using additional shunting of
some Josephson junctions.)
For niobium-based implementations of RSFQ logic, the typical values of Ic are close
to 150 µA (see Figs. 5-17 above), so that for the operation at the liquid helium
temperature T = 4.2 K (IT = 0.17 µA) and K ≈ 0.2, δID = 5 µA, so that in the middle of
the parameter range (ΔI/Ic = 35%) the decision error probability at low frequencies
given by Eq. (27) is reasonably low (p 2 1020). Nevertheless, any further decrease of
critical currents or increase of operation temperature makes the decision error rate
unacceptably high for most digital applications. (This explains, in particular, our choice
of the current unit /„ = 125 u\A which is essentially the lowest value of critical current
we use in our designs.) In particular, this effect excludes operation of RSFQ circuits
based on high-temperature superconductors at temperatures above ~10 K - for details,
see Sec. 4.5 of Ref. 9.
The bit error rate grows as the clock frequency approaches its maximum
deterministic value (fc)max. In this case the decision error probability may be expressed as
the sum of those due to violations of each of the critical intervals shown in Fig. 14a:

$$ p = p_{DC} + p_{CD} + p_{DD} = \sum_{i = DC,\,CD,\,DD} (1/2)\,[1 - \mathrm{erf}(\Delta\tau_i / (\sqrt{2}\,\delta\tau_i))], \qquad (28) $$

where Δτi are the timing noise margins

$$ \Delta\tau_i \equiv \tau_i - (\tau_i)_{\min}. \qquad (29) $$

Parameters δτi in Eq. (28) have the physical sense of the r.m.s. jitter of the time
intervals between the pulses arriving at the decision-making part of the RSFQ gate:
last-datum-to-clock, clock-to-first-datum, and first-datum-to-last-datum, respectively.
Equation (28) justifies the special name "timing errors"55 for decision errors in this
region. If an RSFQ circuit should operate as fast as possible, timing errors become the
major factor limiting the circuit parameter margins.
With the usual requirement of a very low bit error rate, Eq. (28) can be simplified
using the well-known asymptotic expansion for the reciprocal error function: if p = (1/2)
[1 - erf(x/√2)], and p ≪ 1, then

$$ x \approx x_a(p) \equiv \{2 \ln [2\pi^{1/2}\, p\, \ln^{1/2}(p^{-1})]^{-1}\}^{1/2}. \qquad (30) $$

For a modest value p = 10^-23 (corresponding to a 6-month average interval between
errors of a 100-thousand-gate chip), xa(p) ≈ 10, while even a slight increase of this
number, to say xa(p) = 11, gives a much lower error rate p ≈ 2×10^-28 corresponding,
e.g., to a similar reliability of a large, 5-billion-gate computer, even without any circuit
redundancy.
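
The numbers quoted above can be checked with a few lines of Python. The sketch below assumes that Eq. (30) has the form x_a(p) = {2 ln[2π^(1/2) p ln^(1/2)(1/p)]^(-1)}^(1/2), which is what one obtains from the leading asymptotic term of the Gaussian tail; the function names are ours, and the "exact" value is found by plain bisection on the complementary error function.

```python
import math


def tail_probability(x):
    """p = (1/2)[1 - erf(x / sqrt(2))], written with erfc for accuracy at small p."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def x_asymptotic(p):
    """Asymptotic inverse of tail_probability for p << 1 (assumed form of Eq. (30))."""
    inner = 2.0 * math.sqrt(math.pi) * p * math.sqrt(math.log(1.0 / p))
    return math.sqrt(2.0 * math.log(1.0 / inner))


def x_exact(p, lo=0.0, hi=40.0):
    """Numerical inverse of tail_probability by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tail_probability(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


xa = x_asymptotic(1e-23)   # close to 10
xe = x_exact(1e-23)        # numerical reference value
p11 = tail_probability(11.0)
```

For p = 10^-23 the asymptotic and numerical values agree to within a few hundredths, and x = 11 indeed corresponds to an error probability of about 2×10^-28.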

Noise margins Δτ may be degraded further by random deviations of the real circuit
parameters (notably of critical currents Ic) from the optimal values, due to fabrication
process imperfections. In contrast to the fluctuations discussed above, these deviations
do not change in time and rarely follow the Gaussian distribution (27a,b) exactly.
This distribution may be used, however, for a crude estimate of these effects. Assuming
that the critical current deviations are independent, and using the definition (25) of the
sensitivity parameter α, the r.m.s. (time-independent) variation of a circuit component's
time delay may be presented as

$$ d\tau = \alpha\, dI_c / I_c, \qquad (31) $$

where dIc is the r.m.s. critical current spread. The maximum deviation on a chip with N
similar components, which needs to be fabricated with a yield of (1 - q), may be
estimated using Eq. (30): dτmax = xa(q/N) α dIc/Ic. For example, if the desired fabrication
yield is 80% (q = 0.2), for a 5,000-gate chip we get xa(q/N) ≈ 4, while for a
5-million-gate chip, xa(q/N) ≈ 5.
Assuming a similar rate p for all three types of timing errors, we may now write the
following requirement for the noise margins:

$$ \Delta\tau_{DC} > x_a(p/3)\,\delta\tau_{DC} + x_a(q/N)\,\alpha_{DC}\, dI_c/I_c, \quad \Delta\tau_{CD} > x_a(p/3)\,\delta\tau_{CD} + x_a(q/N)\,\alpha_{CD}\, dI_c/I_c, \qquad (32a) $$

$$ \Delta\tau_{DD} > x_a(p/3)\,\delta\tau_{DD} + x_a(q/N)\,\alpha_{DD}\, dI_c/I_c, \qquad (32b) $$

so that the minimum clock period increases, in comparison with Eq. (26), by the sum of
the right hand parts of Eqs. (32). (Again, in all single-input-bit gates and some
two-input-bit gates, the data-to-data interval τDD may be arbitrary. In all these cases, pDD = 0
and ΔτDD = 0, so the restriction expressed by Eq. (32b) should be ignored, and factors
p/3 in Eq. (32a) replaced by p/2.)
Even for the imperfect present-day fabrication technologies, the relative r.m.s.
spread dIc/Ic is as low as 1 to 2%. For this case, Table 3 and Eq. (32) show that noise
margins are consumed more by the thermally-induced jitter (with xa(p/2) ≈ 10) than by
the fabrication spreads (with xa(q/N) ≈ 5). Another conclusion which might be made
from these data is that the pulse jitter and fabrication-induced deviation introduced
by clocked gates are both much smaller than (τCD)min and (τDC)min, so that the clock
frequency decrease enforced by timing errors seems to be relatively small. In practical
circuits, however, a very substantial (and usually dominant) contribution to jitter is
provided by asynchronous circuit components, in particular the clock distribution
circuits. This issue will be discussed in Sec. 4.2 below.
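
To see how the right-hand sides of Eq. (32) add to the clock period, consider the following rough Python sketch. It is only an order-of-magnitude illustration: it takes the multipliers 10 and 5 quoted above, and it assumes (our simplification, not a statement of the text) that the jitter and sensitivity of an isolated gate, read from Table 3 as δt and α, apply equally to each of its timing intervals. All function names are hypothetical.

```python
def required_margin(x_p, jitter, x_q, alpha, rel_spread):
    """Right-hand side of Eq. (32) for one timing interval, in units of tau0."""
    return x_p * jitter + x_q * alpha * rel_spread


def min_period(intervals, x_p, x_q, rel_spread):
    """Noise-free minimum period plus the margins required by Eq. (32).

    intervals: list of (tau_min, jitter, alpha) tuples, one per timing interval.
    """
    return sum(
        tau_min + required_margin(x_p, jitter, x_q, alpha, rel_spread)
        for tau_min, jitter, alpha in intervals
    )


# Inverter from Table 3 (single-input gate, so only the DC and CD intervals count).
# Assumed reading of the table: (tau_DC)min = 12, (tau_CD)min = 17,
# jitter = 0.19 tau0, sensitivity alpha = 15.1.
inverter = [(12, 0.19, 15.1), (17, 0.19, 15.1)]

margin = required_margin(10, 0.19, 5, 15.1, 0.015)   # one interval, 1.5% spread
period = min_period(inverter, x_p=10, x_q=5, rel_spread=0.015)
```

With these inputs each interval needs about 3τ0 of margin, so the 29τ0 noise-free period of the inverter grows to about 35τ0, which is consistent with the remark that the gate-level contribution is relatively small.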

3.4. I/O interface components

3.4.1. Input stage (DC/SFQ converter)

The single-junction circuit shown in Fig. 3a may serve as a rudimentary input stage
("DC/SFQ converter"), but its operation may be improved69 by using two additional
Josephson junctions - J0 and J2 in Fig. 19. Junction J0 allows the quantizing loop
(L0-J0-J1) to be reset to its initial state "0" when the input current Iex(t) is ramped down,
without disturbing junction J1 (which is left exclusively for positive switching events,
Δφ > 0), while junction J2 together with inductors L1, L2 forms an output buffer stage,
bringing the output impedance to the nominal value used in this particular logic set.

3.4.2. Output stage (SFQ/DC converter)

When the ultrafast processing of digital information in an RSFQ circuit is completed,
the results may be transferred to the usual (non-return-to-zero) form using an "SFQ/DC
converter" shown in Fig. 20. It is based on a 2-junction quantizing loop (J3-L4-L5-J5),
similar to those used in other RSFQ flip-flops and logic gates. Input SFQ pulses, after
passing a buffer stage (L0-J0-L1-L2), are fed into each arm of the loop and ensure
toggling of its flux state ("0"→"1"→"0"...) by every pulse. (This part of the converter is
essentially an RSFQ T flip-flop, a device similar to, but simpler than, the TxRS
flip-flop described in Sec. 3.2.)82
Fig. 19. DC/SFQ converter. IC\ = la. = 125 µA, la = 162 µA, 0 < Iex(t) < 250 µA, h = 275 µA, L0 = 8.41 pH,
L1 = L2 = 1.32 pH, L3 = L4 = 1.97 pH.

Fig. 20. SFQ/DC converter. Ic0 = 212 µA, Ic1 = 288 µA, Ic2 = 156 µA, Ic3 = 138 µA, Ic4 = 125 µA, Ic5 = 350 µA,
Ic6 = 163 µA, I0 = 106 µA, I1 = 150 µA, I2 = 181 µA, L0 = 1.98 pH, L1 = L2 = 0.66 pH, L3 = 0.79 pH,
L4 = 1.58 pH, L5 = 2.89 pH.

Two readout junctions J4 and J6 are nested on the flip-flop very much like in the
NDRO cell (see Fig. 13 and its discussion in Sec. 3.2), but now the bias current I2 is
higher. As a result, when the flip-flop is in the state "1" with Φ = Φ0, and the Josephson
phase in the nesting point is close to π, I2 exceeds the maximum supercurrent which can
be transferred from the current injection point. Hence junctions J4 and J6 have to carry
part of the dc current in the form of normal current through their resistances R,
providing a finite output dc voltage Vout ≈ I2R/2. On the contrary, when the flip-flop is
in state "0", junctions J4 and J6 stay superconducting and Vout = 0.

The resulting voltage signal, with a swing of several hundred microvolts, is sufficient
to be transferred from the cryostat using a copper cable and amplified to the standard
semiconductor transistor level by inexpensive room-temperature semiconductor
amplifiers. Such a simple output interface can have a bandwidth of at least 100 megabits
per second per channel.30 This rate may be increased to at least 1 gigabit per second,
and quite possibly to ~10 Gbps using an additional on-chip, Josephson-amplifier to a
few-millivolt level. Such an amplifier may be based either on a latching (e.g., HUFFLE-
type) circuit30 or a non-latching, multi-junction output stage.

4. RSFQ Technology Development: Problems Real and Imaginary

4.1. Connectivity

There are at least 3 ways to connect RSFQ components:
- If the components may be laid out directly next to each other, they may be
connected directly (if designed properly, see below).
- If the distance between two components is not negligible, they may be
connected with active lines - JTLs (Fig. 5). These lines may be arbitrarily long, but
introduce considerable signal delay - about 4τ0 per stage - and substantial jitter (Table
2). For example, in the 1.75-µm technology,33 the signal speed in these lines is about 10
µm/ps. Besides that, these lines are relatively wide (for the just mentioned technology,
about 25 µm), introduce considerable jitter (see Table 1) and require substantial dc
current supply and as a rule three metallization levels (including the ground plane - see
Fig. 5).
- Due to the factors listed above, passive, superconducting transmission lines
(Fig. 1) should be used for all long-distance connections - practically, any connections
longer than the combined transceiver length (for the 1.75-µm technology, about 50 µm).
These lines feature much faster signal propagation speed,

$$ v \approx c\,[d/((d + 2\lambda)\,\varepsilon)]^{1/2}, \qquad (33) $$

where c is the speed of light, d and ε are the insulation layer thickness and dielectric
constant, respectively. In typical cases, v is close to 100 µm/ps, i.e. an order of
magnitude higher than in a JTL. Additional advantages of the passive line include the
virtual absence of jitter (besides that of transceiver circuits), relatively small width
(down to ~10 µm in a 1.75-µm technology), and lower metallization layer consumption
(down to two superconductor layers besides the line crossing points). The drawback of
these lines is the necessity to use transceivers (Fig. 8 and 9) which consume chip area of
the order of that of a typical gate and introduce additional latency. Our recent design
experience51 indicates the need for the integration of the transceivers with RSFQ gates,
a task that is certainly doable but still has to be completed.
Regardless of the interconnect choice, RSFQ components should be designed in a
way allowing their direct connection either to each other, or to transmission line
transceivers, without parameter re-optimization. All the circuits presented in this paper,
which feature the necessary I/O buffer stages, do satisfy this important condition.
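
The speed advantage of passive lines can be illustrated numerically. The Python sketch below evaluates Eq. (33), read as v ≈ c[d/((d + 2λ)ε)]^(1/2), and compares link delays for the two interconnect types. The parameter values (d = 0.2 µm, λ = 0.09 µm, ε = 4) and the 5 ps transceiver latency are illustrative assumptions chosen by us, not values given in the text; λ is taken to be the magnetic field penetration depth of the superconductor, which this passage does not define.

```python
import math

C_UM_PER_PS = 299.792458  # speed of light in micrometres per picosecond


def ptl_speed(d_um, lambda_um, eps):
    """Signal speed in a passive superconducting line, Eq. (33), in um/ps."""
    return C_UM_PER_PS * math.sqrt(d_um / ((d_um + 2.0 * lambda_um) * eps))


def jtl_delay_ps(length_um, speed_um_per_ps=10.0):
    """Delay of an active (JTL) line at the ~10 um/ps quoted for the 1.75-um process."""
    return length_um / speed_um_per_ps


def ptl_delay_ps(length_um, speed_um_per_ps, transceiver_ps):
    """Delay of a passive line plus a hypothetical fixed transceiver latency."""
    return transceiver_ps + length_um / speed_um_per_ps


v = ptl_speed(d_um=0.2, lambda_um=0.09, eps=4.0)   # illustrative parameters
jtl = jtl_delay_ps(1000.0)                          # 1 mm link through a JTL
ptl = ptl_delay_ps(1000.0, 100.0, transceiver_ps=5.0)
```

With these assumed parameters the line speed comes out slightly above 100 µm/ps, and a 1 mm passive link is several times faster than a JTL of the same length even after the transceiver latency is added.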

4.2. Timing and jitter

For any 100-GHz-scale digital technology, clock distribution and other timing issues are
extremely important since even a-few-ps jitter in the data and/or clock distribution path
may lead to unacceptable bit error rate. This is why various clock distribution schemes
for RSFQ circuits have been discussed in detail in several papers.8,56'88 Requirements for
such schemes may be formulated using a very general sketch of an RSFQ circuit
fragment shown in Fig. 21.

Fig. 21. General scheme of timing of a two-bit RSFQ gate F, showing three racing loops (data-to-data, clock-
to-data, and data-to-next-clock).

Let the two data bits D' and D'' be initially stored in clocked devices L' and L''.
(Each of them may be either just a latch or a logic gate.) Clock pulses, following with a
period fc^-1, originate (or are split) at some point C. After passing through generally different
chains of asynchronous components (e.g., splitters) with time delays t'i, t''i, and ti,
respectively, this pulse triggers the latches L', L'', and finally gate F. Signals from the
latches generally also pass a few asynchronous components (with time delays τ'i and τ''i,
respectively) before landing at gate F. At
the ideal choice of time delays of each circuit, in the absence of fluctuations, these
delays should be related as (Fig. 14a)

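$$ \sum_{i=1}^{R} t''_i + \sum_{i=1}^{N} \tau''_i + (\tau_{DC})_{\min} = \sum_{i=1}^{Q} t_i, $$

$$ \sum_{i=1}^{Q} t_i + (\tau_{CD})_{\min} = \sum_{i=1}^{P} t'_i + \sum_{i=1}^{M} \tau'_i + f_c^{-1}, \qquad (34) $$

$$ \sum_{i=1}^{P} t'_i + \sum_{i=1}^{M} \tau'_i + (\tau_{DD})_{\min} = \sum_{i=1}^{R} t''_i + \sum_{i=1}^{N} \tau''_i. $$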

(Generally, these sums should include the input and output delays of the clocked
components; however, in the typical case when at least some of the asynchronous
component numbers N, M, P, Q, and R are large, those contributions are minor.) In this
case, the highest clock frequency given by Eq. (26) is achieved.
However, in the real world, the mutual jitter of signal and data pulses grows as they
propagate through the circuit components. Assuming that the thermal fluctuations in
Josephson junctions of the circuit are independent (as estimates show they should be),
for the full r.m.s. jitters we get

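$$ (\delta\tau_{DC})^2 = \sum_{i=1}^{R} (\delta t''_i)^2 + \sum_{i=1}^{N} (\delta\tau''_i)^2 + \sum_{i=1}^{Q} (\delta t_i)^2, $$

$$ (\delta\tau_{CD})^2 = \sum_{i=1}^{Q} (\delta t_i)^2 + \sum_{i=1}^{P} (\delta t'_i)^2 + \sum_{i=1}^{M} (\delta\tau'_i)^2, \qquad (35) $$

$$ (\delta\tau_{DD})^2 = \sum_{i=1}^{P} (\delta t'_i)^2 + \sum_{i=1}^{M} (\delta\tau'_i)^2 + \sum_{i=1}^{R} (\delta t''_i)^2 + \sum_{i=1}^{N} (\delta\tau''_i)^2. $$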

We see that the jitter values scale crudely as (R+N+Q)^1/2 δt, (Q+P+M)^1/2 δt, and
(P+M+R+N)^1/2 δt, respectively, where δt ~ 0.1τ0 is the average jitter of a typical
asynchronous component - see Table 2. (These estimates are exact if all the
asynchronous stages are similar.)
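
The root-sum-square accumulation of independent jitter contributions can be sketched in a few lines of Python. The code below assumes identical stages with r.m.s. jitter `dt` and uses the stage counts as they enter Eq. (35) in our reading: R, N and Q for the data-to-clock interval, Q, P and M for the clock-to-data interval, and P, M, R and N for the data-to-data interval. Function names and the example numbers are hypothetical.

```python
import math


def rss(values):
    """Root-sum-square of independent r.m.s. contributions."""
    return math.sqrt(sum(v * v for v in values))


def interval_jitters(dt, N, M, P, Q, R):
    """R.m.s. jitter of the three timing intervals for identical stages."""
    return {
        "DC": dt * math.sqrt(R + N + Q),
        "CD": dt * math.sqrt(Q + P + M),
        "DD": dt * math.sqrt(P + M + R + N),
    }


# Example: four stages in every path, each with jitter 0.1 (in units of tau0).
jit = interval_jitters(0.1, N=4, M=4, P=4, Q=4, R=4)
explicit = rss([0.1] * 12)  # same as the "DC" entry, summed stage by stage
```

Because the contributions add in quadrature, doubling the number of stages in a path increases the jitter only by a factor of √2.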
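A similar analysis is valid for the time-independent random variations of timing
intervals, introduced by imperfect fabrication - see Sec. 3.3 above. It means that αCD, αDC
(and possibly αDD) participating in Eq. (32) should be calculated as

$$ (\alpha_{DC})^2 = \sum_{i=1}^{R} (\alpha''_i)^2 + \sum_{i=1}^{N} (\alpha''_i)^2 + \sum_{i=1}^{Q} (\alpha_i)^2, $$

$$ (\alpha_{CD})^2 = \sum_{i=1}^{Q} (\alpha_i)^2 + \sum_{i=1}^{P} (\alpha'_i)^2 + \sum_{i=1}^{M} (\alpha'_i)^2, \qquad (36) $$

$$ (\alpha_{DD})^2 = \sum_{i=1}^{P} (\alpha'_i)^2 + \sum_{i=1}^{M} (\alpha'_i)^2 + \sum_{i=1}^{R} (\alpha''_i)^2 + \sum_{i=1}^{N} (\alpha''_i)^2, $$

and scale as (R+N+Q)^1/2 α, (Q+P+M)^1/2 α, and (P+M+R+N)^1/2 α, respectively, where
α is the sensitivity parameter for each component, as defined by Eq. (25). In each pair of
sums over identically primed sensitivities, the first runs over the clock-path stages and the
second over the data-path stages.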
The jitter scaling imposes substantial restrictions on RSFQ circuit design. For
example, an attempt to implement timing of a 64-bit-wide array of RSFQ gates
performing a parallel calculation, by m = 64 sequential splitters of a master clock signal,
would lead to an r.m.s. jitter (relative to the clock source) of about m^1/2 (δt)splitter = 8
(δt)splitter ≈ 0.9τ0. An almost similar jitter will accumulate in the 64 stages of logic - see
the 3rd numeric column of Table 3, so that the net r.m.s. jitter will be close to 2τ0. Multiplying it
by xa(p/2) ≈ 10, we see from Eq. (32) that this factor alone adds as much as ~20τ0, i.e.,
more than 2 typical gate delays, to the clock period. Such performance degradation
may be unacceptable in many cases. This is why in our current project51 we use trees for
clock distribution, thus replacing the factor m^1/2 by the much smaller factor (log2 m)^1/2 in
the total r.m.s. jitter.
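
The gain from tree-based clock distribution is easy to quantify. The Python sketch below compares the r.m.s. jitter at the last output of a linear chain of m splitters with that at a leaf of a binary splitter tree with m leaves, assuming identical and independent splitters. The per-splitter jitter is inferred from the figures quoted above (8 splitter jitters ≈ 0.9τ0); the function names are ours.

```python
import math


def chain_jitter(m, dt):
    """Jitter after m sequential splitters (worst-case output of a linear chain)."""
    return math.sqrt(m) * dt


def tree_jitter(m, dt):
    """Jitter at a leaf of a binary splitter tree serving m outputs."""
    depth = math.ceil(math.log2(m))  # number of splitters between root and leaf
    return math.sqrt(depth) * dt


dt_splitter = 0.9 / 8  # in units of tau0, inferred from 8 * dt ~ 0.9 tau0
chain = chain_jitter(64, dt_splitter)
tree = tree_jitter(64, dt_splitter)
```

For m = 64 the tree has only 6 levels, so the clock jitter drops from about 0.9τ0 to below 0.3τ0.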
Quality design of RSFQ circuits requires using Eqs. (32), (36) for a more exact
calculation of parameter margin degradation (and hence of the maximum clock
frequency) than the simple estimate given above. So far, to our knowledge, such a
calculation has been carried out for only a few circuits, including notably a pipelined
parallel fixed-point adder.56 The results show, for example, that the requirement of a 10"
bit error rate increases the minimum clock cycle from a noise-free value of $22\tau_0$ to as
much as $46\tau_0$. An additional 1.5% spread of the Josephson junction currents (with a
requirement of a high, 99% circuit fabrication yield) increases the period by an additional
$8\tau_0$, and thus reduces the maximum clock frequency to about 30 GHz for a 1.75-μm
technology or 60 GHz for a 0.8-μm technology.
We believe the maximum clock frequency may be similar for all RSFQ-based fixed-
point and floating-point functional units. However, our preliminary estimates show51
that branch condition handling in general-purpose microprocessors may require
somewhat longer clock periods (up to 100x0) unless innovative architecture solutions,
taking into account the peculiarities of RSFQ logic, are used.

4.3. The "memory problem"

The NDRO memory cell shown in Fig. 13 allows very fast (a-few-picosecond) read and
write operations. However, this cell is rather bulky: it requires, with buffer stages, 10
Josephson junctions per bit; its layout in a 4-metal-layer technology (Fig. 2) takes a chip
area of about 1,000 $F^2$, where F is the minimum junction size. As a result, such cells are
quite suitable for logic registers, but impractical for even relatively small on-chip
memories (say, L1 caches).
A much more practical solution for those memories is to use the compact, four-
Josephson-junction, "flux-transition" memory cells developed by NEC for latching
logic.89 The most important drawback of this memory is its relatively long access time,
limited by the double time of flight of signals along the word and bit lines through the
memory cell matrix. For a 3×3 mm² matrix, this time is about 120 ps - much longer
than for the NDRO cell, but still much shorter than for SRAM-based semiconductor
cache memories.
The flux-transition memory may be adjusted for operation with RSFQ logic by
replacing:
- the readout circuit (dc SQUID) with a similar circuit using shunted Josephson
junctions,
- the ac-powered line drivers with dc-powered HUFFLE-type drivers, and
- the latching decode logic with a pipelined RSFQ decoder.32
Preliminary experiments90 show that with just 5 metal layers the memory cell area
may be close to ~200 $F^2$. Estimates show that with two more wiring levels the area
may be reduced to ~100 $F^2$. This means that with the modest 0.8-μm technology, a
1 Mb/cm² density is achievable, while the introduction of a deep-submicron fabrication
technology may give 16 Mb chips - for more on this, see Sec. 5.3 below.
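The density estimate above follows from simple area arithmetic, sketched below. The sketch assumes that cells tile the chip with no overhead for decoders, drivers or wiring channels, so it gives an upper bound; the function name is illustrative.

```python
def bits_per_cm2(cell_area_in_f2, feature_um):
    """Upper bound on memory density for a cell of given area (in units of F^2).

    cell_area_in_f2: cell area expressed in squares of the minimum feature size F
    feature_um: minimum feature size F in micrometers
    """
    cell_area_um2 = cell_area_in_f2 * feature_um ** 2
    um2_per_cm2 = 1e8  # 1 cm^2 = 1e8 um^2
    return um2_per_cm2 / cell_area_um2

# Flux-transition cell, ~100 F^2, in the 0.8-um technology
print(bits_per_cm2(100, 0.8))    # about 1.6e6 bits per cm^2
# NDRO cell with buffers, ~1,000 F^2, same technology
print(bits_per_cm2(1000, 0.8))   # about 1.6e5 bits per cm^2
```

The first figure is consistent with the quoted ~1 Mb/cm² once peripheral circuitry is allowed for, and the second shows why the NDRO cell is about ten times less dense.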
These estimates show that the much exaggerated "superconductor memory problem"
(the fact that only very small memories have been implemented so far) has hardly
anything to do with physics or technology of superconductor integrated circuits: since
the early 1980s, funding for work in this direction has been practically unavailable in
the United States because of some strange twist of administrative wisdom.

4.4. Magnetic flux trapping

Another frequently cited problem of superconductor electronics is flux quantum
trapping in the superconducting ground plane. The physical reason for this effect is that
magnetic flux quanta may exist not only in superconducting loops (Sec. 2.4), but also in
continuous superconductors (especially thin superconducting films) where they take the
form of so-called Abrikosov vortices.13,14 The vortex in a thin film is virtually axially
symmetric, with its axis perpendicular to the film plane. It can be imagined as a bundle of
magnetic field lines, with the total flux equal to $\Phi_0$ - see Eq. (12). The bundle's radius
is limited by the persistent supercurrent circulating around it, to
$\lambda_\perp \sim \max[\lambda, 2\lambda^2/t]$, where t is the film thickness. For a typical Nb ground plane,
$\lambda_\perp \sim 0.1$ μm. The magnetic field penetration into a continuous superconductor becomes
possible because the flux-shielding supercurrent increases toward the vortex center,
eventually becoming so large that it suppresses the film superconductivity in a central
spot with a radius close to the so-called coherence distance $\xi$. For typical Nb ground
plane films, $\xi$ is somewhat smaller than 0.1 μm.
The Abrikosov vortex has a positive self-energy

$$E_0 \approx \frac{\Phi_0^2}{4\pi\mu_0\lambda_\perp}\,\ln\frac{\lambda_\perp}{\xi}. \qquad (37)$$

In a typical Nb film, $E_0$ is quite substantial, of the order of $10^{-17}$ joule, i.e. about $10^5$ K
in temperature units. This is why thermally-induced self-nucleation of vortices at
temperatures below the critical temperature $T_c$ is virtually impossible. However, if a
superconductor integrated circuit is being cooled from room temperature to $T < T_c$ in a
substantial magnetic field B, unavoidable small variations of $T_c$ of the ground plane film
cause superconductivity to arise in random spots first. Merging, these spots form
superconducting loops encircling magnetic flux lines and preventing their escape from
the film. As a result, even as temperature drops well below $T_c$, the film is left with
residual flux lines in the form of quantized vortices with 2D density $n \sim B/\Phi_0$, trapped on
intentional and occasional inhomogeneities of the film ("pinning centers"). If even one
vortex happens to sit too close to a loop of an RSFQ circuit, the magnetic field of the
vortex may offset the bias flux in the loop and disturb the circuit operation.
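To give a feeling for the numbers, the relation between the cooling field and the trapped vortex density (one flux quantum per vortex, n ~ B/Φ0) can be evaluated directly. This is only a sketch: the 50-μT value used for an unshielded ambient (Earth-scale) field is an assumption made here for illustration, and the function name is hypothetical.

```python
PLANCK_H = 6.62607015e-34           # Planck constant, J*s
ELECTRON_CHARGE = 1.602176634e-19   # elementary charge, C
FLUX_QUANTUM = PLANCK_H / (2 * ELECTRON_CHARGE)  # Phi_0 = h/2e, about 2.07e-15 Wb

def vortex_density_per_cm2(field_tesla):
    """Trapped vortex density n ~ B/Phi_0, converted from m^-2 to cm^-2."""
    per_m2 = field_tesla / FLUX_QUANTUM
    return per_m2 / 1e4

# Assumed unshielded ambient field of about 50 microtesla
print(vortex_density_per_cm2(50e-6))  # millions of vortices per cm^2
```

Because the density is simply proportional to B, every order of magnitude gained in magnetic shielding removes an order of magnitude of trapped vortices.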
The problem may be solved by a combination of two measures. First, the external
magnetic field may be reduced to a few nanotesla using a simple system of degaussed
magnetic shields, thus decreasing the trapped vortex density to ~10 cm⁻². Additionally,
holes are patterned in free areas of the ground plane, close to all RSFQ gates. (These
holes should not protrude under the superconductor interconnects, to avoid signal
propagation disturbances.) A nearby Abrikosov vortex is attracted by such a hole, and
tends to slip into it at the moment of its formation ($T \approx T_c$), especially if chip cooling
through the critical point is carried out slowly, at a rate of the order of a few K/s. Holes
in the form of "moats" surrounding each RSFQ gate or SFQ memory cell work best,92,93
but cutting out nearly all free parts of the ground plane also gives acceptable results.
The rule of thumb is to have at least one hole of size $a \gg \lambda_\perp$ at a distance of not more than
a few micrometers from each RSFQ circuit loop.94
For present-day RSFQ circuits, with their relatively low integration scale, the
described combination of methods works quite well. It remains to be seen whether the
currently accepted procedures are sufficient for degaussing of future VLSI RSFQ chips,
but the authors feel that, if necessary, each of the components of the procedure may be
improved considerably.

4.5. DC current recycling

A real problem awaiting a solution is the dc power current recycling. While the dc
current necessary for powering a single RSFQ device is quite modest, of the order of
100 μA per Josephson junction (see Figs. 5-17), the total current necessary for powering
a VLSI RSFQ circuit may be much higher than the value which can be comfortably
passed into a helium cryostat by simple copper leads (a few amperes per lead). Hence,
the dc current has to be "recycled", i.e. used for powering several fragments of the
circuit. For this purpose, the fragments should be connected in series for the dc current,
excluding the usual (galvanic) means of signal transfer between them.
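The scale of the problem is easy to estimate. The sketch below multiplies the per-junction bias current quoted above by a junction count and asks into how many serially biased fragments the circuit would have to be split so that a single lead carries the whole supply. The 3-A lead limit is an assumed reading of "a few amperes", the 30-million-junction count is the VLSI figure used later in Sec. 5.2.3, and the function names are illustrative.

```python
BIAS_PER_JUNCTION_UA = 100      # dc bias per Josephson junction, microamperes
LEAD_LIMIT_UA = 3_000_000       # assumed comfortable current per lead: 3 A

def total_bias_current_ua(junction_count):
    """Total dc bias current if all junctions are fed in parallel (microamperes)."""
    return junction_count * BIAS_PER_JUNCTION_UA

def series_fragments_needed(junction_count):
    """Number of equal fragments to bias in series so one lead carries the supply."""
    total = total_bias_current_ua(junction_count)
    return -(-total // LEAD_LIMIT_UA)  # ceiling division on integers

junctions = 30_000_000
print(total_bias_current_ua(junctions) / 1e6)  # total current in amperes
print(series_fragments_needed(junctions))      # fragments sharing one 3-A supply
```

With these assumptions a fully parallel supply would need thousands of amperes, which is why serial biasing of circuit fragments is required.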
Apparently, the problem may be solved by SFQ pulse transfer through
superconducting thin-film transformers. Such transformers, which are already used in
some SFQ circuits (see, e.g., Ref. 89, 90), may be formed by a couple of overlapped
short superconducting strips. The issues to be addressed in this way include possible
excitation of parasitic low-frequency oscillations in tank circuits formed by the
transformer inductances and large capacitances between the circuit fragments. Probably,
these resonances may be successfully damped by special thin film resistors.

4.6. I/O issues

As was discussed in Sec. 3.4, existing DC/SFQ and SFQ/DC converters, complemented
with superconductor drivers, may allow RSFQ circuits to be interfaced with a
semiconductor electronics environment at frequencies up to ~10 GHz, i.e. about the
highest frequency attainable in such an environment. The thermal load imposed on the
helium-level cryosystem by high-frequency I/O channels based on copper cables may be
quite modest (of the order of 100 μW per channel), and may present a serious challenge
only for extremely-large-scale systems with millions of I/O channels.31
Several groups have reported successful experiments95-98 on optical interfacing
between superconductor chips and room-temperature devices. Unfortunately, for the
output channels the thermal load includes considerable power dissipation in the amplifiers
which are necessary to boost the signal energy from ~$10^{-18}$ J/bit in RSFQ circuits to
~$10^{-12}$ J/bit in optical channels. Input channels do not have this problem and may be quite
simple,97 but for systems with a comparable number of room temperature inputs and
outputs, it hardly makes sense to employ two different I/O technologies. As a result,
electric cables, possibly using high-temperature superconductor wires between helium
and nitrogen stages to reduce the thermal load, may be the best I/O option for RSFQ
systems.
Another important direction of recent work in superconductor electronics was the
development of fast communications channels between superconducting chips. Though
present-day experimental results are still in the range of a few GHz,99-103 analyses104-106
and the first experimental results107 show that the prospects are good for implementing
superconductor multichip modules (MCMs), using for example multi-flux-quantum104
or even single-flux-quantum105-107 pulses transmitted over superconducting microstrip
lines on silicon-based MCMs. In the former, more conservative, design the bandwidth
may reach ~30 Gbps per channel, while in the latter case it may be as high as that of the
on-chip RSFQ circuitry (~100 Gbps).

4.7. Design and testing tools

During the past decade, there was rapid progress in the development of software tools
for computer-aided design of RSFQ circuits. For example, our Stony Brook team has
developed such tools as the Josephson junction circuit simulator PSCAN,108,109 the circuit
optimizer COWBOY,109 and the quasi-2D inductance matrix calculator LMETER110 with
a back-annotator to PSCAN, called LM2SCH. Some other RSFQ groups have created
their own design tools. A useful review of these tools can be found in Ref. 111 and on
the Web.112 These tools are still insufficient for VLSI RSFQ circuit design; in
particular, good layout synthesis tools still have to be developed.
Testing tools also need additional development. As a typical example of the present
state-of-the-art, our group has developed an automated multi-channel circuit tester
OCTOPUX which can perform measurements of a diced chip at a rate of up to 300
kHz.113 The developed software support of this system allows relatively sophisticated
tests of RSFQ circuits; for example, statistics of parameter spreads and thermal noise
may be studied automatically with good accuracy using a special RSFQ circuit with
only a few contact pads.114 RSFQ circuit testing at multi-10-GHz frequencies may be
carried out using special RSFQ on-chip testers.115-117 (Because of the unique speed of
RSFQ devices, their testing by high-speed room-temperature equipment, which can only
be extended to a few GHz, is hardly worth the effort.) Still, tools for comprehensive
testing of RSFQ chips before wafer dicing have to be developed.

4.8. Submicron RSFQ technology

Future VLSI RSFQ technology requires deep-submicron (e.g., 0.3-μm) Josephson
junctions which involve several new issues. Indeed, Eq. (22) shows that as the junction
area is reduced to ~0.3×0.3 μm², $j_c$ exceeds 100 kA/cm². At this stage, junction physics and
RSFQ circuit scaling become somewhat different from those of a-few-micron junctions
described above, due to two factors.
(1) In tunnel Josephson junctions, the intrinsic "normal" quasiparticle conductance $G_N$
is proportional to $I_c$, so that the $I_cR_N$ product is constant:

$$(I_cR)_{\max} = \alpha(T/T_c)\,\Delta(T)/e, \qquad \alpha(0) \sim 1. \qquad (38)$$

(Within the classical BCS theory of direct tunneling between superconductors,13
$\alpha(0) = \pi/2$, though in practical niobium-trilayer junctions this constant may be some 30%
lower, apparently due to "proximity effects" at the interface between niobium and the
unoxidized fraction of the aluminum layer.) Plugging Eqs. (22) and (38) into Eq. (18)
(with the experimental values of the specific capacitance listed in Table 1) we see that at
$j_c \gtrsim 100$ kA/cm² the junctions become naturally overdamped: their intrinsic value of $\beta_c$
becomes less than 1, so that the junctions may be used in RSFQ circuits even without
any external shunting.118 As is evident from Fig. 5, this allows the circuit density to be
increased quite dramatically: according to estimates,121 by a factor of ~3 in terms of the
minimum junction area, if an adequate number (~8) of superconductor layers is used. In
this case the RSFQ IC density becomes comparable with that of CMOS circuits with the
same F and same functionality, while retaining much higher speed and simpler
fabrication technology.
(2) Experiments122,123 have shown that transport properties of niobium-trilayer
Josephson junctions with $j_c \geq 10$ kA/cm² differ considerably from those calculated from
the theory of direct tunneling.119 Until recently, it was feared that these deviations were
due to rare microshorts of the aluminum oxide layer (which is extremely thin, below 1
nm, in these high-$j_c$ junctions). This could mean that the junctions were inherently
irreproducible. However, recent experiments123 have shown a reasonably small on-chip
spread of $I_c$ for junctions with $j_c$ as high as 210 kA/cm². Moreover, in a very recent
work124 properties of these junctions were quantitatively explained (Fig. 22) by the so-called
multiple-Andreev-reflection (MAR) theory of the Josephson effect,125-127 taking into
account a rather general statistical distribution147,148 of the electron mode
transparencies. This "MBSB" distribution may be interpreted as a result of resonant
tunneling via random localized electron states in a disordered aluminum oxide
barrier.124 If this conclusion is correct, it will mean that high-$j_c$ Josephson junctions may
be inherently very reproducible (with the r.m.s. critical current spreads below 1% even
for deep-submicron junctions), giving every hope for the possibility of high-yield
fabrication of VLSI RSFQ circuits.

[Figure 22: two I-V panels, (a) 0.5×0.5 μm² and (b) 1×1 μm² junctions, with $j_c$ = 210 kA/cm² in both cases; horizontal axis: Voltage (mV), from 0 to 2.5.]

Fig. 22. DC I-V curves for two Josephson junction samples of different area, with the critical
current suppressed by a magnetic field, in order to reveal the details of quasiparticle transfer.
Solid lines: experimental results141. Dashed lines: MAR theory using the "MBSB"
distribution147,148 of transparencies. Dotted lines: MAR theory using an alternative, "Dorokhov"
distribution. In the absence of the magnetic field, the junctions exhibit critical current whose
temperature dependence is also very well described by the MAR theory. (After Ref. 126.)

Despite these advances and hopes, much research remains to be done in this field.
For example, there are still no RSFQ circuit simulators (like PSCAN) which would
adequately describe the specific dynamics of self-shunted, high-$j_c$ junctions. (The
dynamics is only semi-qualitatively described by Eq. (8) used in existing simulators.)
Also, the decision bit error rate for these junctions is determined by a combination of
quantum and shot noise rather than by thermal fluctuations. In fact, Eqs. (17) and (38) show
that the time scale $\tau_0$ for the self-shunted junctions ($R = R_N$) approaches the fundamental
value

$$(\tau_0)_{\min} = \hbar/2\alpha(0)\Delta(0) \approx 0.2\,\hbar/k_BT_c, \qquad (39)$$

about 0.17 ps for niobium - see the last column of Table 1. (By the way, this brings the
highest operation frequency of a simple RSFQ device, a digital frequency divider based
on the T flip-flop, close to 800 GHz. This prediction was confirmed in recent
experiments130 where 770 GHz operation was demonstrated using 210-kA/cm²
junctions.) For the typical operation temperature $T = 4.2$ K $\approx 0.5T_c$ this value of $\tau_0$
brings the junctions beyond (though not too far from) the thermal-to-quantum
fluctuation crossover131

$$\hbar/\tau_0 = 2\pi k_BT_0. \qquad (40)$$
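These two estimates can be checked numerically. The sketch below reads Eq. (39) as τ0 ≈ 0.2ħ/(k_B T_c) and Eq. (40) as ħ/τ0 = 2πk_B T0; the critical temperature of niobium is taken as 9.2 K, which is an assumption introduced here rather than a value quoted in the text, and the function names are illustrative.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J*s
K_B = 1.380649e-23       # Boltzmann constant, J/K
TC_NIOBIUM = 9.2         # assumed critical temperature of Nb, K

def min_time_scale(tc_kelvin):
    """Fundamental junction time scale, tau_0 ~ 0.2 * hbar / (k_B * T_c)."""
    return 0.2 * HBAR / (K_B * tc_kelvin)

def crossover_temperature(tau0_seconds):
    """Thermal-to-quantum crossover temperature from hbar/tau_0 = 2*pi*k_B*T_0."""
    return HBAR / (2 * math.pi * K_B * tau0_seconds)

tau0 = min_time_scale(TC_NIOBIUM)
t0 = crossover_temperature(tau0)
print(tau0 * 1e12)  # time scale in picoseconds, close to 0.17
print(t0)           # crossover temperature in kelvin, above 4.2 K
```

The crossover temperature comes out above the 4.2-K operating point, which is what places self-shunted niobium junctions on the quantum side of the crossover.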

A good theoretical understanding of decision errors due to quantum fluctuations,73
confirmed experimentally,75 exists only within a simple "Resistively Shunted Junction"
(RSJ) model of Josephson junction dynamics described by Eq. (8) with constant R and
thermal-equilibrium fluctuation sources,18 which ignores the shot noise and the MAR
dynamic peculiarities. This theory indicates that the bit error rate in the quantum limit
may be fairly well estimated from that in the classical limit with the replacement
$T \to (\pi\kappa/2)^{1/2}T_0$, where $T_0$ is the crossover temperature given by Eq. (40). Since for
unshunted niobium junctions at 4.2 K the $T_0/T$ ratio is about 1.5, and the noise margin
degradation scales as $T^{1/2}$ (see Eq. (27c)), the r.m.s. jitter due to quantum fluctuations in
unshunted high-$j_c$ junctions ($\kappa$ = 0.2) should be about 60% larger than that due to the
thermal fluctuations (see Sec. 3 and 4). For the best studied case of the integer adder56 this
means that the maximum clock frequency of its operation with low bit error rate should
be about 100 GHz (instead of the 130 GHz which could be naively anticipated from the
$\tau_0$ scaling). It may be expected that the quantitative analysis of fluctuations in high-$j_c$
junctions with MAR transport will give a close result.
Using this assumption, requirements for the reproducibility of the high-$j_c$ junction
fabrication technology may be formulated. As follows from the data in Tables 2 and 3, for
nearly all RSFQ components the $\delta t/\tau_0$ ratio is close to 0.01 for thermal fluctuations at
4.2 K. According to Eq. (32) with $x_\alpha(p/3)/x_\alpha(q/N) = 10/5 = 2$, this means that the clock
frequency degradation due to the quantum fluctuations is crudely equivalent to a
1.6 × 2 × 0.01 ≈ 3% r.m.s. spread of the critical currents. For 0.3-μm junctions this means
that in order to avoid additional speed degradation at high yield, the junction linear size
should be reproduced with a 3σ spread below about 15 nm. Such accuracy has
already been achieved in modern photolithography,1 although its applicability to Josephson
junction fabrication still has to be confirmed experimentally. (In experiments,141
junctions were defined by direct e-beam writing which is too slow for practical VLSI
fabrication.)
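The step from a critical current spread to a linewidth tolerance can be made explicit. The sketch assumes that the critical current is proportional to the junction area of a square junction at fixed critical current density, so that a small relative error in linear size produces twice that relative error in critical current; this proportionality and the function name are assumptions made for the illustration.

```python
def linear_size_tolerance_3sigma_nm(ic_rms_spread, junction_size_nm):
    """3-sigma linear size tolerance matching a given r.m.s. critical current spread.

    Assumes I_c ~ L^2 (square junction, fixed j_c), so dI_c/I_c = 2 * dL/L.
    """
    rms_relative_size_error = ic_rms_spread / 2.0
    rms_size_error_nm = rms_relative_size_error * junction_size_nm
    return 3.0 * rms_size_error_nm

# 3% r.m.s. critical current spread, 0.3-um (300-nm) junction
print(linear_size_tolerance_3sigma_nm(0.03, 300.0))  # roughly 13.5 nm
```

The result, about 13.5 nm, agrees with the figure of about 15 nm quoted above.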

5. Future Prospects

5.1. Immediate opportunities

The main practical drawback of the niobium-based RSFQ circuits is the necessity of
cooling them to helium temperatures (4 to 5 K). Currently, closed-cycle refrigerators
for this temperature range are somewhat costly (~$30,000) and bulky (~100 kg), though
their inconvenience relative to other fluid-based refrigeration systems is frequently
exaggerated. Recent rapid progress in cryocooler technology indicates that the cost per
unit may be reduced to below ~$1,000 when they are produced in volume.132
Nevertheless, the necessity of deep refrigeration imposes hard conditions on RSFQ
technology applications to practical digital electronics. Crudely speaking, there is no
hope of using this technology for any application which may be implemented using the
mainstream, room-temperature CMOS ICs.
However, for several important military and commercial applications the necessity
of helium cooling of RSFQ circuits may be more than compensated by their
unparalleled speed, even if they are implemented using currently existing (or slightly
upgraded) fabrication technology. What follows is a very brief review of these
applications.

5.1.1. Analog-to-digital converters

The first successes in this area61,133,134 and analysis of possible improvements135 allow
us to believe that there are good prospects for the implementation, within the next few
years, of unique RSFQ ADCs with, e.g., a 16-bit signal-to-noise ratio and 100-MHz
analog signal bandwidth. This is considerably better than what has been achieved with
the best semiconductor ADCs.136 Hopefully, this advantage will be sufficient for the
practical introduction of RSFQ ADCs in radar and wireless communication systems, in
particular in software defined radio.137

5.1.2. Digital-to-analog converters

There are very good prospects for the extension of the recent progress in this area138-140
to develop in the next few years, for example, a multi-chip 20-bit DAC with settling
time below 1 μs, accuracy better than 0.001 ppm, and output voltage approaching 1
volt. These converters may serve, in particular, as ac voltage calibrators in metrological
systems, with performance much higher than that of alternative devices,141 and at a
lower cost, since the RSFQ DAC may use a simple and cheap MHz-range rf reference
source rather than a complex picosecond pulse sequence synthesizer. This simplicity
may allow the DACs to compete with traditional Josephson standards of dc voltage,
which require expensive, high power sources of stable multi-GHz reference signals.
Other possible applications of RSFQ DACs include arbitrary waveform generation in
radars and secure communication systems.

5.1.3. Digital SQUIDs

In contrast to present-day analog SQUIDs, with slew rates below ~$10^6$ flux quanta per
second,142 their digital counterparts143 will use superfast on-chip feedback providing
slew rates beyond $10^{10}\,\Phi_0$/s. This feature may allow electronic subtraction of
interference and, as a result, operation of SQUIDs without external magnetic shields,
which now are a major component of system cost. When implemented, digital SQUIDs
may rapidly replace their analog counterparts in most application areas, and help, in
particular, to move biomedical applications of these devices144 from research centers to
medical clinics.

5.1.4. Digital autocorrelators

These devices can combine unprecedented bandwidth with very small size and power
consumption. The first 16-channel prototype of such a correlator has already been
tested.117 However, the number of channels of such systems still has to be increased to
values of practical interest for radio astronomy and other applications (1,024 channels
and beyond). For this, current fabrication technology should be improved, at least
modestly, to allow a higher integration scale, at least ~100K Josephson junctions per
chip.

5.1.5. Pseudo-random signal circuits

Some circuits of this class, including pseudo-random number generators, modulators
and demodulators, may be relatively simple (hundreds of elementary cells) and thus
implemented using the current niobium-trilayer technology, while providing a decisive
speed advantage over the semiconductor competition in spread-spectrum
communication systems, e.g., in 3G wireless communication systems like CDMA. The
first RSFQ pseudo-random generators have already been designed, fabricated and

5.2. Long term prospects

The circuits and systems listed above, as interesting and important as they may be,
nevertheless occupy hardly more than just narrow niches in the immense electronics
market. With the transfer to a submicron, VLSI RSFQ technology (see Sec. 4.8 above),
many other applications will become possible. These applications include notably:

5.2.1. Ultrafast digital switching

Preliminary studies indicate148 that RSFQ circuits can be used for the implementation
of digital switching cores combining unparalleled speed with very low power
consumption and (as a result) high circuit density. For example, a 128×128-channel
self-routing Batcher-banyan core for a 424-bit (ATM) packet payload, implemented in a
0.8-μm technology, could provide throughput close to 100 Gbps per channel, dissipate
about 10 mW of power and fit on a single 1×1 cm² chip. To our knowledge, no
semiconductor, electro-optical, or fully-optical system could provide comparable
performance. The traditional switches for digital communications, however, include
large memory components, mostly to search for the physical address of the packet
inside the switch using the destination address carried in the packet header. The
feasibility of their implementation using RSFQ technology, or the feasibility of new
switch architectures, still has to be explored.

5.2.2. Digital signal processing

RSFQ technology seems uniquely suited for several types of digital signal and image
processing including motion estimation, digital Fourier and cosine transforms, etc., for
applications in communication systems and high-definition digital television. As an
illustration of the possible speed of such processing, an RSFQ fixed-point 32-bit
multiplier would be able to provide throughput close to 60 billion operations per
second (gigaops)56 in comparison with just a few gigaops for modern CMOS DSPs.
The estimated power consumption of a floating-point RSFQ DSP is close to 50 μW per
gigaflops, the number to be compared to approximately 1 W per gigaflops for the best
prospective CMOS DSP-based systems such as IBM's Blue Gene.149 Several RSFQ
blocks important for DSP applications have already been designed.56,68,71,150

5.2.3. High-performance general-purpose computing

According to the ITRS,1 by the year 2006 high-performance microprocessors may
reach a clock frequency of 2 to 3.5 GHz, and a microprocessor assembly featuring up to
200 million transistors may be placed on a single ~500-mm² chip dissipating up to 160
watts. The peak performance of such a multiprocessor CMOS chip can be crudely
estimated as 10 to 100 gigaflops. Notice that this estimate is based on a very optimistic
assumption of a 70-nm CMOS fabrication technology, for which there are still "no known
solutions".1
On the other hand, preliminary design work32,151-153 shows that an RSFQ
microprocessor using a much more conservative, 0.3-μm fabrication technology to place
just about 30 million Josephson junctions on a chip of comparable area, and operating
at a clock frequency of about 90 GHz, would be able to provide a peak performance of
approximately 2,000 gigaflops, while dissipating power below 1 watt. This dramatic
advantage may be used on at least two system size scales:

(a) Unique petaflops-scale systems. To achieve a peak performance of 1
petaflops will take 10 to 100 thousand of the advanced CMOS chips discussed above, with a
total power consumption of the order of 10 MW. The management of power of such
proportions would take a sizeable building. The significant (microsecond-scale) latency
of inter-processor communication in a system of such a physical size would make the
system stall for programs where inter-processor communication is a large enough
fraction of the computation process. The problems associated with semiconductor
processors have stimulated a search for alternative approaches to petaflops-scale
computing, in particular, the Hybrid Technology MultiThreaded architecture (HTMT)
project based on the use of RSFQ technology for most number crunching and
inter-processor communications - see Fig. 23.
Our preliminary design work on the RSFQ "COOL" core32,151-153 for the HTMT
computer system indicates that the 1-petaflops peak performance might be reached with
just 500 logic chips (plus about 2,000 fast superconductor memory chips), with
aggregate power dissipation in the core below 1 kW. Though removal of such power
from the cryostat would require a large-scale closed-cycle cryocooler (helium
recondenser) consuming about 300 kW, this is still considerably less than what would
be required for a CMOS-based system. Even more important, the cryocooler would be
remote, making it possible to compact the RSFQ core into a 1-m³ volume. As a result, the
simulated average latency of the inter-processor communication network (including both
switching delays and signal time-of-flight) is as low as 20 ns,153 apparently enabling the
system as a whole to sustain a sub-petaflops performance on many real-life computer
programs.
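The chip counts and power figures in this comparison follow from the per-chip estimates quoted above (10 to 100 gigaflops and up to 160 W per CMOS chip, approximately 2,000 gigaflops per RSFQ chip). The short sketch below reproduces the arithmetic; it considers peak performance only and ignores memory, interconnect and cooling overheads, and its names are illustrative.

```python
PETAFLOPS = 10**15   # target peak performance, flop/s
GIGAFLOPS = 10**9

def chips_needed(per_chip_flops):
    """Smallest whole number of chips whose summed peak performance reaches 1 petaflops."""
    return -(-PETAFLOPS // per_chip_flops)  # ceiling division on integers

rsfq_chips = chips_needed(2_000 * GIGAFLOPS)
cmos_chips_low = chips_needed(100 * GIGAFLOPS)   # optimistic CMOS chip
cmos_chips_high = chips_needed(10 * GIGAFLOPS)   # pessimistic CMOS chip

CMOS_CHIP_POWER_W = 160
cmos_power_mw = (cmos_chips_low * CMOS_CHIP_POWER_W / 1e6,
                 cmos_chips_high * CMOS_CHIP_POWER_W / 1e6)

print(rsfq_chips)                        # RSFQ logic chips
print(cmos_chips_low, cmos_chips_high)   # range of CMOS chip counts
print(cmos_power_mw)                     # CMOS power range, megawatts
```

The CMOS power range of 1.6 to 16 MW brackets the "order of 10 MW" figure, while the RSFQ count matches the 500 logic chips of the COOL core estimate.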

(b) "Personal" teraflops-scale computing (PeT). A much larger potential market (up
to $10B/yr worldwide) may exist for high-performance desktop-scale systems (personal
workstations and corporate servers) with just a few RSFQ VLSI chips, flip-chip-mounted
on a single superconductor-wired MCM. Estimates151 show that a PeT
computer would be able to sustain a few-teraflops-scale performance, while its cost, at a
production volume in the 100,000s per year, may be below $100K. This would provide at
least an order-of-magnitude price-to-performance advantage over semiconductor
competition.

Fig. 23. Side view of the HTMT petaflops computer room: a conceptual design. (Picture courtesy of J.
Morookian and L. Bergman, Jet Propulsion Laboratory.)

6. Conclusion

Assuming that the problems outlined in Sec. 4 are successfully and promptly solved, we
may place crude year tags on the RSFQ technology levels characterized in Table 1, thus
obtaining the expected RSFQ learning curve (Fig. 24). This plot is compared with the
ITRS predictions for the mainstream CMOS technology.1

[Figure 24: clock frequency (100 MHz to 1 THz, logarithmic scale) versus year (1996 to 2011) for Nb RSFQ (the authors' forecast) and Si CMOS (the ITRS 1999 forecast1 and historic trend).]

Fig. 24. An optimistic version of the expected progress of clock frequency of the high-performance
semiconductor and superconductor LSI circuits. The numbers near the points show the necessary minimum
feature size and (for RSFQ) the anticipated integration scale. Dashed lines on the right indicate the ranges
where forecasts seem rather uncertain. Dotted lines on the left show the CMOS historic trend.

It is probably evident that the RSFQ speed advantage is so great that even if the time
tags in this (quite subjective) forecast are somewhat misplaced, the potential value of
RSFQ as possibly the fastest practical digital technology can hardly be questioned. After
the transfer to deep-submicron design rules and multiple wiring levels, the RSFQ logic
circuits may also be the densest (for the given patterning technology level). In addition,
a breakthrough in the technology of Josephson junctions based on high-temperature
superconducting (HTS) materials155,156 may make it possible for RSFQ systems to
operate at even higher speed, though probably not at much higher temperatures.
We believe that with a relatively modest government and/or industrial effort, RSFQ
could be established as the leading digital technology for high-performance computing,
wireless communications and precise instrumentation. However, if this support does not
arrive very soon, the current momentum may be lost, and then it will take much more
time and money to revive this technology when its remarkable advantages are finally
broadly recognized.

Acknowledgments

Fruitful discussions with numerous colleagues, and valuable comments by T.
Claeson, M. Dorojevets and V. Semenov are gratefully acknowledged. The authors are
grateful to D. Brock (HYPRES), and J. Morookian and L. Bergman (JPL) for their kind
permission to use previously unpublished figures. This work was supported in part by
DoD and NASA via JPL.

References
1. The International Technology Roadmap for Semiconductors, 1999 Version, 2000 Update, available
on the Web at public.itrs.net/.
2. See, e.g., C. Wann, F. Assaderaghi, and Y. Taur, "High-performance 0.07-micrometer CMOS with
9.5 ps Gate Delay and 150 GHz fT", IEEE El. Dev. Lett. 18 (1997) 625-627.
3. G. A. Sai-Halasz, "Performance trends in high-end processors", Proc. of IEEE 83 (1995) 20-36.
4. S. Borkar, "Design challenges for technology scaling", IEEE Micro (1999) 23-29.
5. A fine collection of modern microprocessor specifications may be found on the Web at
www.geek.com/procspec/procspec.htm
6. G. Raghavan, M. Sokolich, and W. E. Stanchina, "Indium phosphide ICs unleash the
high-frequency spectrum", IEEE Spectrum 37 (2000) No. 7, 47-52.
7. S. L. Rommel, T. E. Dillon, M. W. Dashiell, H. Feng, J. Kolodzey, P. R. Berger, P. E. Thompson,
K. D. Hobart, R. Lake, A. C. Seabaugh, G. Klimeck, and D. K. Blanks, "Room temperature
operation of epitaxially grown Si/Si0.5Ge0.5/Si resonant interband tunneling diodes", Appl. Phys.
Lett. 73 (1998) 2191-2193.
8. K. K. Likharev and V. K. Semenov, "RSFQ logic/memory family: A new Josephson-junction
digital technology for sub-terahertz-clock-frequency digital systems", IEEE Trans. on Appl.
Supercond. 1 (1991) 3-28.
9. K. K. Likharev, "Superconductor devices for ultrafast computing", in: H. Weinstock (ed.)
Applications of Superconductivity, Kluwer, Dordrecht, 2000, pp. 247-294.
10. K. Likharev, "Superconductors speed up computation", Phys. World (1997) No. 5, 39-43.
11. D. K. Brock, E. Track, and J. M. Rowell, "Superconductor ICs: the 100-GHz second generation",
IEEE Spectrum 37 (2000) No. 12, 40-46.
12. In particular, the Stony Brook RSFQ group page gamayun.physics.sunysb.edu/RSFQ/RSFQ.html
has links to most of these sites.
13. See, e.g., M. Tinkham, Introduction to Superconductivity, 2nd ed., McGraw-Hill, New York, 1996.
14. T. Van Duzer and C. W. Turner, Principles of Superconducting Circuits, Elsevier, New York,
1981.
15. R. L. Kautz, "Picosecond pulses on superconducting striplines", J. Appl. Phys. 49 (1978) 308-314.
16. S. V. Polonsky, V. K. Semenov, and D. F. Schneider, "Transmission of single-flux-quantum pulses
along superconducting microstrip lines", IEEE Trans. on Appl. Supercond. 3 (1993) 2598-2600.
17. M. Currie, R. Sobolewski, and T. Y. Hsiang, "High-frequency crosstalk in superconductor
microstrip waveguide interconnects", ibid. 9 (1999) 3602-3605.
18. K. K. Likharev, Dynamics of Josephson Junctions and Circuits, Gordon and Breach, New York,
1986.
19. M. Gurvitch, M. A. Washington, and H. A. Huggins, "High refractory Josephson tunnel junctions
utilizing thin aluminum layers", Appl. Phys. Lett. 42 (1983) 472-475.
20. S. Hasuo, T. Imamura, and N. Fujimaki, "Recent advances in Josephson junction devices", Fujitsu
Techn. J. 24 (1988) 284-292; Y. Tarutani, M. Hirano, and U. Kawabe, "Niobium-based integrated
circuit technologies", Proc. IEEE 77 (1989) 1164-1176.
21. "Josephson computer technology: An IBM research project", IBM J. Res. Devel. 24 (1980) No. 5.
22. S. Hasuo, S. Kotani, A. Inoue, and N. Fujimaki, "High speed Josephson processor technology",
IEEE Trans. on Magn. 27 (1991) 2602-2609.
23. "HYPRES Design Rules", available from HYPRES, Inc., 175 Clearbrook Rd., Elmsford, NY
10523, U.S.A., and from the Web site www.hypres.com.
24. M. Jeffery, W. Perold, and T. Van Duzer, "Superconducting complementary output switching logic
operating at 5-10 Gbps", Appl. Phys. Lett. 69 (1996) 2746-2749.
25. Y. Hashimoto, S. Yorozu, H. Numata, M. Koike, M. Tanaka, and S. Tahara, "High-speed testing of
Josephson logic circuits by an on-chip signal-pattern generator", in: Ext. Abst. of Int. Supercond.
Electronics Conf., PTB, Berlin, 1997, pp. 269-271.
26. This problem is avoided in dc-powered latching circuits of the HUFFLE family. Originally proposed
long ago27, these devices were later improved substantially28,29 so that operation of single gates was
demonstrated at frequencies up to 6 GHz. These circuits share, however, one more drawback of
latching circuits: relatively high power consumption. Their power consumption at a few GHz is of the
order of 10 µW per gate, i.e., at least two orders of magnitude higher than in RSFQ technology. As a
result, HUFFLE devices are not a match for RSFQ in basic logic circuits; however, they may be
quite useful for some auxiliary functions, e.g., as amplifiers in the superconductor/semiconductor
electronics interfaces30 and as drivers in superconductor memories32. Indeed, since the round-trip
time-of-flight through, say, a 1-cm-long memory drive line is about 200 ps, the limited speed of the
latching circuits is not such a big problem as it is in logic. Notice that there have also been several
other directions of superconductor digital electronics, including dual-rail voltage-state logic, the
closely related "SAIL" devices, and Josephson field-effect transistors, which eventually ran into a
dead end; for their critical review, see, e.g., Ref. 9.
27. A. F. Hebard, S. S. Pei, L. N. Dunkleberger, and T. A. Fulton, "A DC powered Josephson flip-flop",
IEEE Trans. on Magn. 15 (1979) 408-411.
28. Y. Hatano, H. Nagaishi, S. Yano, K. Nakahara, H. Yamada, S. Kominami, and M. Hirano, "An all
DC-powered Josephson logic circuit", IEEE J. of Solid State Circuits 26 (1991) 1123-1132.
29. H. Hasegawa, H. Nagaishi, S. Kominami, H. Yamada, and T. Nishino, "A DC-powered Josephson
logic family that uses hybrid unlatching flip-flop logic elements (HUFFLES)", IEEE Trans. on
Appl. Supercond. 5 (1995) 3504-3510.
30. D. F. Schneider, J.-C. Lin, S. V. Polonsky, V. K. Semenov, and C. A. Hamilton, "Broadband
interfacing of superconducting digital systems to room temperature electronics", ibid. 5 (1995)
3152-3155.
31. L. Abelson, Q. P. Herr, G. L. Kerber, M. Leung, and S. Tighe, "Manufacturability of
superconductor electronics for a petaflops-scale computer", IEEE Trans. on Appl. Supercond. 9
(1999) 3202-3207.
32. M. Dorojevets, P. Bunyk, D. Zinoviev, and K. Likharev, "COOL-0: Design of an RSFQ subsystem
for petaflops computing", ibid. 9 (1999) 3606-3614.
33. K. K. Likharev, "Dynamics of some single-flux-quantum devices. I. Parametric Quantron", IEEE
Trans. on Magn. 13 (1976) 242-244.
34. C. Bennett, "Logical reversibility of computation", IBM J. Res. Devel. 17 (1973) 525-532.
35. K. Likharev, "Classical and quantum limitations on energy consumption at computation", Int. J.
Theor. Phys. 21 (1982) 311-326.
36. K. K. Likharev, S. V. Rylov, and V. K. Semenov, "Reversible conveyor computation in array of
parametric quantrons", IEEE Trans. Magn. 21 (1985) 947-950.
37. S. V. Rylov and V. K. Semenov, "Superconductor quantum interferometers as elements with
controllable, sign changeable inductance, and their use in parametric quantrons", Sov.
Microelectronics 17 (1988) 109-116.
38. K.F. Loe and E. Goto, "Analysis of flux input and output Josephson pair device", IEEE Trans.
Magn. 21 (1985) 884-887.
39. M. Hosoya, W. Hioe, J. Casas, R. Kamikawai, Y. Harada, Y. Wada, H. Nakane, R. Suda and E.
Goto, "Flux quantum parametron: A single quantum flux device for Josephson supercomputer",
IEEE Trans. on Appl. Supercond. 1 (1991) 77-89.
40. M. Hosoya and W. Hioe, "Margin analysis of quantum parametron logic gates", ibid. 3 (1993)
3022-3028.
41. R. Suda, R. Kamikawai, Y. Wada, W. Hioe, M. Hosoya, and E. Goto, "QFP wiring problem -
Introduction and analytical considerations", IEEE Trans. on CAD/ICAS 13 (1994) 48-56.
42. K. K. Likharev, "Properties of a superconducting ring closed with a weak link as a device with
several stable states", Radio Eng. and Electron. Phys. 19 (1974) No. 7, 109-115.
43. K. Nakajima, Y. Onodera, and Y. Ogawa, "Logic design of Josephson network", J. Appl. Phys. 47
(1976) 1620-1627.
44. K. Nakajima and Y. Onodera, "Logic design of Josephson network - II", ibid. 49 (1978)
2958-2963.
45. J. P. Hurrell and A. H. Silver, "SQUID digital electronics", in: B. S. Deaver Jr. et al. (eds.), Future
Trends in Superconductive Electronics, AIP, New York, pp. 437-447.
46. J. P. Hurrell, D. C. Pridmore-Brown, and A. H. Silver, "A/D conversion with unlatched SQUIDs",
IEEE Trans. Electron. Dev. 27 (1980) 1887-1896.
47. K. Nakajima, G. Oya, and Y. Sawada, "Fluxoid motion in phase mode Josephson switching
system", IEEE Trans. on Magn. 19 (1983) 1201-1204.
48. G. Oya, M. Yamashita, and Y. Sawada, "Single flux quantum 4JL-interferometer operated in the
phase mode", ibid. 21 (1985) 880-883.
49. K. K. Likharev, O. A. Mukhanov, and V. K. Semenov, "Resistive single flux quantum logic for the
Josephson-junction technology", in: H. Hahlbohm and H. Lübbig (eds.) SQUID'85, W. de Gruyter,
Berlin, 1985, pp. 1103-1108.
50. K. K. Likharev, O. A. Mukhanov, and V. K. Semenov, "Ultimate performance of RSFQ logic
circuits", IEEE Trans. on Magn. 23 (1987) 759-762.
51. M. Dorojevets, P. Bunyk, and D. Zinoviev, "FLUX Project: Design of a 20-GHz 16-bit
Ultrapipelined Processor Prototype Based on 1.75-µm LTS RSFQ Technology", Report #1EK04,
2000 Applied Superconductivity Conference, to be published in IEEE Trans. on Appl. Supercond.
11 (2001) No. 2.
52. P. Bunyk, A. Rylyakov, K. Likharev, P. Litskevitch, and D. Zinoviev, SUNY RSFQ Cell Library,
available on the Web at gamayun.physics.sunysb.edu/RSFQ/Lib/.
53. For a single-junction loop (Fig. 3) this equivalence is clear from Eq. (11). A very similar equation
may be written for any loop of an RSFQ device.
54. If necessary, SFQ pulse current and energy may be boosted by an exponential ramp-up of the
junction critical currents along the array - see Fig. 9 and its discussion.
55. A. V. Rylyakov and K. K. Likharev, "Pulse jitter and timing errors in RSFQ circuits", IEEE Trans.
on Appl. Supercond. 9 (1999) 3539-3544.
56. P. Bunyk and P. Litskevich, "Case study in RSFQ design: Fast pipelined parallel adder", ibid. 9
(1999) 3714-3720.
57. See, e.g., C. H. Roth, Fundamentals of Logic Design, West Publishing Co., St. Paul, MN, 1985.
58. This device was suggested by one of the authors.59 It may be also considered as a truncation of a
more general device, the "B flip flop", suggested earlier.60
59. D. Zinoviev, "Design and partial implementation of RSFQ-based Batcher-banyan switch and
support tools", PhD thesis, SUNY at Stony Brook, Aug. 1997.
60. S. Polonsky, V. Semenov, and A. Kirichenko, "Single-flux-quantum B flip flop and its possible
applications", IEEE Trans. on Appl. Supercond. 4 (1994) 9-18.
61. J. C. Lin, V. K. Semenov, and K. K. Likharev, "Design of an SFQ-counting analog-to-digital
converter", ibid. 5 (1995) 2252-2259.
62. J.-C. Lin and V. Semenov, "Timing circuits for RSFQ digital systems", ibid. 5 (1995) 3472-3477.
63. O. A. Mukhanov, S. V. Rylov, V. K. Semenov, and S. V. Vyshensky, "RSFQ logic arithmetic",
IEEE Trans. on Magn. 25 (1989) 857-860.
64. Several different approaches to RSFQ computing have been suggested. For example, in the dual-rail
approach65-68 two lines are used to carry exactly one SFQ pulse during each clock period: binary
information is coded according to which line carries this pulse. This approach may be stretched to
implement completely asynchronous ("delay-insensitive") circuits66,68 which fire outputs as soon as
the later of the input pulses has arrived. Preliminary comparison of sizable circuits56,68 indicates that
delay-insensitive circuits require more hardware (measured in, say, the number of Josephson
junctions) for the same function, though they provide some improvement in the maximum
processing rate (throughput), while their processing delay (latency) is almost the same as in
architectures with an explicit local clock.
65. Z. J. Deng, N. Yoshikawa, S. R. Whiteley, and T. Van Duzer, "Data-driven self-timed RSFQ digital
integrated circuits and systems", IEEE Trans. on Appl. Supercond. 7 (1997) 3634-3637.
66. P. Patra, S. Polonsky, and D. S. Fussell, "Delay insensitive logic for RSFQ superconductor
technology", in: Proc. of 3rd Int. Symp. on Adv. Res. in Asynchronous Circ. and Syst. (Async97),
IEEE Comp. Soc., Los Alamitos, CA, pp. 42-53.
67. A. V. Rylyakov and S. V. Polonsky, "All-digital 1-bit RSFQ autocorrelator for radioastronomy
applications: Design and experimental results", IEEE Trans. on Appl. Supercond. 8 (1998) 14-19.
68. Y. Kameda, S. V. Polonsky, M. Maezawa, and T. Nanya, "Self-timed parallel adders based on DI
RSFQ primitives", IEEE Trans. on Appl. Supercond. 9 (1999) 4040-4045.
69. A. Kidiyarova-Shevchenko, A. Kirichenko, S. Polonsky, and P. Shevchenko, "New elements of the
RSFQ logic memory (Part 2)", in: Ext. Abstr. of the 3rd Int. Supercond. Electron. Conf., Glasgow,
UK, 1991, pp. 200-203.
70. S. V. Polonsky, V. K. Semenov, P. I. Bunyk, A. F. Kirichenko, A. Yu. Kidiyarova-Shevchenko, O.
A. Mukhanov, P. N. Shevchenko, D. F. Schneider, D. Yu. Zinoviev, and K. K. Likharev, "New
RSFQ circuits", IEEE Trans. on Appl. Supercond. 3 (1993) 2566-2567.
71. S. V. Polonsky, J. C. Lin, and A. V. Rylyakov, "RSFQ arithmetic blocks for DSP applications",
ibid. 5 (1995) 2823-2826.
72. A bipolar power supply may increase the margins to approximately ±40%.
73. T. V. Filippov, "The quantum dissipation properties of a Josephson balanced comparator", Rus.
Microelectron. 25 (1996) 250-256.
74. T. V. Filippov, Yu. A. Polyakov, V. K. Semenov, and K. K. Likharev, "Signal resolution of RSFQ
comparators", IEEE Trans. on Appl. Supercond. 5 (1995) 2240-2243.
75. V. K. Semenov, T. V. Filippov, Yu. A. Polyakov, and K. K. Likharev, "SFQ balanced comparators
at a finite sampling rate", ibid. 7 (1997) 3617-3621.
76. J. Satchell, "Stochastic simulation of SFQ logic", ibid. 7 (1997) 3315-3318.
77. M. Jeffery, P. Y. Xie, S. R. Whiteley, and T. Van Duzer, "Monte Carlo and thermal noise analysis
of ultra-high-speed high temperature superconductor digital circuits", ibid. 9 (1999) 4095-4098.
78. B. Ruck, Y. Chong, R. Dittmann, A. Engelhardt, B. Oelze, E. Sodtke, W. E. Booij, and M. G.
Blamire, "Measurement of the error rate of single flux quantum circuits with high temperature
superconductors", ibid. 9 (1999) 3850-3853.
79. E. J. Dean, P. D. Dresselhaus, J. X. Przybysz, A. H. Miklich, A. H. Worsham, and S. V. Polonsky,
"Bit error rate measurements for GHz code generator circuits", ibid. 9 (1999) 3598-3601.
80. Q. P. Herr, M. W. Johnson, and M. J. Feldman, "Temperature-dependent bit-error rate of a clocked
superconducting digital circuit", ibid. 9 (1999) 3594-3597.
81. A. M. Herr, M. J. Feldman, and M. Bocko, "Timing jitter and bit errors in a 64-bit circular shift
register", IEEE Trans. on Appl. Supercond. 9 (1999) 3721-3724.
82. Preliminary versions of this device were suggested45 and implemented83 well before the invention of
the full RSFQ logic set.
83. C. A. Hamilton and F. L. Lloyd, "100 GHz binary counter based on the dc SQUIDs", IEEE
Electron. Dev. Lett. 3 (1982) 335-338.
84. In fact, the output part of the SFQ/DC converter is just a galvanically-coupled version of a
superconductor quantum magnetometer ("dc SQUID"13,14); more usual, magnetically-coupled
versions of the SFQ/DC converter are also possible.
85. J. X. Przybysz, J. D. McCambridge, P. D. Dresselhaus, A. H. Worsham, and E. J. Dean,
"Dewar-to-dewar data transfer at GHz rates", IEEE Trans. on Appl. Supercond. 9 (1999) 2981-2984.
86. R. D. Sandell, J. W. Spargo, and M. Leung, "High data rate switch with amplifier chip", ibid. 9
(1999) 2985-2988.
87. O. A. Mukhanov, S. V. Rylov, D. V. Gaidarenko, N. B. Dubash, and V. V. Borzenets, "Josephson
output interfaces for RSFQ circuits", ibid. 7 (1997) 2826-2831.
88. K. Gaj, E. G. Friedman, and M. J. Feldman, "Timing of multi-gigahertz RSFQ digital circuits", J. of
VLSI Signal Processing Systems 16 (1997) 247-276.
89. S. Tahara, I. Ishida, S. Nagasawa, M. Hidaka, H. Tsuge, and Y. Wada, "4-Kbit Josephson
nondestructive read-out RAM operated at 580 ps and 6.7 mW", IEEE Trans. on Magn. 27 (1991)
2626-2633.
90. S. Nagasawa, S. Tahara, H. Numata, and A. Tsuchida, "Miniaturized vortex transitional
Josephson memory cell by a vertical integrated device structure", IEEE Trans. on Appl. Supercond.
4 (1994) 19-24.
91. The trapping may happen in other layers of a superconductor integrated circuit as well, but since
these layers are usually patterned into narrow interconnecting wires, the effect is much less
probable.
92. S. Bermon and T. Gheewala, "Moat-guarded Josephson SQUIDs", IEEE Trans. Magn. 19 (1983)
1160-1164.
93. S. Nagasawa, H. Numata, C. Kato, and S. Tahara, "Evaluation of trapped magnetic flux for
Josephson 4-Kbit RAMs", in Ext. Abstr. of Int. Supercond. Electron. Conf., Nagoya, Japan, 1995,
pp. 192-194.
94. Spectacular experimental images of flux quanta trapped on random pinning centers and intentional
holes may be found in the paper M. Jeffery, T. Van Duzer, J. Kirtley, and M. B. Ketchen, "Magnetic
imaging of moat-guarded superconducting electronic circuits", Appl. Phys. Lett. 67 (1995) 1769-
1771.
95. K. Nakahara, H. Nagaishi, H. Hasegawa, S. Kominami, H. Yamada, and T. Nishino, "Optical input-
output interface system for Josephson-junction integrated circuits", ibid. 4 (1994) 223-227.
96. L. A. Bunz, R. Robertazzi, and S. Rylov, "An optically coupled superconducting analog to digital
converter", ibid. 7 (1997) 2972-2974.
97. J. F. Bulzacchelli, H.-S. Lee, A. Sotiris, J. A. Misewich, and M. B. Ketchen, "Optoelectronic
clocking system for testing RSFQ circuits up to 20 GHz", ibid. 7 (1997) 3301-3306.
98. D. Gupta, D. V. Gaidarenko, and S. V. Rylov, "A 16-bit analog-to-digital converter module with
optical output", ibid. 9 (1999) 3030-3033.
99. S. Tanahashi, T. Kubo, K. Kawabata, R. Jikuhara, G. Kaji, M. Terasawa, H. Nakagawa, M. Aoyagi,
I. Kurosawa, and S. Takada, "Superconductor wiring in multi-chip module for Josephson LSI
circuits", Jpn. J. Appl. Phys. 32, pt. 2 (1993) L898-L900.
100. T. Ogashiwa, H. Nakagawa, H. Akimoto, H. Shigyo, and S. Takada, "Flip-chip bonding using
superconducting solder bump", Jpn. J. Appl. Phys. 34, pt. 1 (1995) 4043-4046.
101. R. D. Sandell, G. Akerling, and A. D. Smith, "Multichip packaging for high-speed superconducting
circuits", IEEE Trans. on Appl. Supercond. 5 (1995) 3160-3163.
102. B. J. Dalrymple, M. Leung, R. D. Sandell, and J. Spargo, "Multi-Gb/s operation of flipped chip
MVTL circuits", ibid. 7 (1997) 2693-2696.
103. J. X. Przybysz, D. L. Miller, S. S. Martinet, J. H. Kang, A. H. Worsham, and M. L. Farich,
"Interface circuits for chip-to-chip data transfer at GHz rates", ibid. 7 (1997) 2657-2660.
104. S. V. Polonsky and D. F. Schneider, "Toward broadband communications between RSFQ chips",
ibid. 7 (1997) 2818-2821.
105. M. Maezawa, H. Yamamori, and A. Shoji, "A novel approach to chip-to-chip communications
using single flux quantum pulse", ibid. 9 (1999) 4049-4052.
106. H. Toepfer, T. Lingel, F. H. Uhlman, and M. Aoyagi, "Numerical studies of interchip pulse
transmission for complex RSFQ systems", ibid. 9 (1999) 3725-3728.
107. M. Maezawa, H. Yamamori, and A. Shoji, "Chip-to-chip communication using a single flux quantum
pulse", ibid. 10 (2000) 1603-1605.
108. S. V. Polonsky, V. K. Semenov, and P. N. Shevchenko, "PSCAN - Personal superconductor circuit
analyzer", Supercond. Sci. Technol. 4 (1991) 667-669.
109. S. Polonsky, P. Shevchenko, A. Kirichenko, D. Zinoviev, and A. Rylyakov, "PSCAN'96: New
software for simulation and optimization of complex RSFQ circuits", IEEE Trans. on Appl.
Supercond. 7 (1997) 2685-2689.
110. P. I. Bunyk and S. V. Rylov, "Automated calculation of mutual inductance matrices of multilayer
superconductor integrated circuits", in Abstr. of Int. Supercond. Electronics Conf., NIST, Boulder,
CO (1993); LMETER is available at http://gamayun.physics.sunysb.edu/~paul/lmeter/lmeter.html
under the conditions of the GNU General Public License (GPL).
111. K. Gaj, Q. P. Herr, V. Adler, A. Krasniewski, E. G. Friedman, and M. J. Feldman, "Tools for the
computer-aided design of multigigahertz superconducting digital circuits", IEEE Trans. on Appl.
Supercond. 9 (1999) 18-38.
112. K. Gaj, "Survey of SDE Design Tools", http://benry.ee.rochester.edu:8080/~sde/cad/survey.html.
113. D. Zinoviev and Yu. Polyakov, "Octopux: An advanced automated setup for testing superconductor
circuits", IEEE Trans. on Appl. Supercond. 7 (1997) 3240-3243.
114. V. K. Semenov, Yu. A. Polyakov, and W. Chao, "Extraction of impacts of fabrication spreads and
thermal noise on operation of superconducting digital circuits", ibid. 9 (1999) 4030-4033.
115. A. F. Kirichenko, O. A. Mukhanov, and A. I. Ryzhikh, "Advanced on-chip test technology for
RSFQ circuits", ibid. 7 (1997) 3438-3441.
116. Q. P. Herr, K. Gaj, A. M. Herr, N. Vukovic, C. A. Mancini, M. Bocko, and M. J. Feldman, "High
speed testing of a four-bit RSFQ decimation digital filter", ibid. 7 (1997) 2975-2978.
117. A. V. Rylyakov, D. F. Schneider, and Yu. A. Polyakov, "A fully integrated 16-channel RSFQ
autocorrelator operating at 11 GHz", ibid. 9 (1999) 3623-3627.
118. Other Josephson junctions which may be overdamped without external shunting include
superconductor microbridges119 and double-tunnel-barrier junctions with normal-metal interlayer
(so-called SISIS structures)120. However, in order to reach substantial values of the IcR product (~ 1
mV, cf. Table 1), and also suppress the strong temperature sensitivity of microbridges, they should
have a length below 100 nm, reproducible with a 3σ spread of about 5 nm, which imposes
very hard demands on patterning technology. In order to reach the same goal using SISIS junctions,
their critical current density should be increased to about 20 kA/cm2. This is already close to the level
necessary for the usual single junctions, which should be inherently more reproducible.
119. For a general review, see K. Likharev, "Superconducting weak links", Rev. Mod. Phys. 51 (1979)
101-160.
120. A. Brinkman, D. Gassel, A. A. Golubov, M. Yu. Kupriyanov, M. Siegel, and H. Rogalla,
"Double-barrier Josephson junctions: Theory and experiment", Report at the 2000 Applied
Superconductivity Conference, to be published in IEEE Trans. on Appl. Supercond. 11 (2001)
No. 2.
121. Y. Naveh, D. Averin, and K. Likharev, "Physics of high-jc Josephson junctions and prospects of
their RSFQ VLSI applications", Report #4EL03, 2000 Applied Superconductivity Conference; to be
published in IEEE Trans. on Appl. Supercond. 11 (2001) No. 2.
122. A. W. Kleinsasser, R. E. Miller, W. H. Mallison, and G. D. Arnold, "Observation of multiple
Andreev reflections in superconducting tunnel junctions", Phys. Rev. Lett. 72 (1994) 1738-1741.
123. V. Patel and J. E. Lukens, "Self-shunted Nb/AlOx/Nb Josephson junctions", IEEE Trans. on Appl.
Supercond. 9 (1999) 3247-3250.
124. Y. Naveh, V. Patel, D. V. Averin, K. K. Likharev, and J. E. Lukens, "Universal distribution of
transparencies in highly conductive Nb/AlOx/Nb junctions", Phys. Rev. Lett. 85 (2000) 5404.
125. A. Furusaki and M. Tsukada, "A unified theory of clean Josephson junctions", Physica B 165&166
(1990) 967-968.
126. C. W. J. Beenakker and H. van Houten, "Josephson current through a superconducting point contact
shorter than the coherence length", Phys. Rev. Lett. 66 (1991) 3056-3059.
127. D. Averin and A. Bardas, "Josephson effect in a single quantum channel", Phys. Rev. Lett. 75
(1995) 1831-1834.
128. J. A. Melsen and C. W. J. Beenakker, "Reflectionless tunneling through a double-barrier junction",
Physica B 203 (1994) 219.
129. K. M. Schep and G. E. W. Bauer, "Transport through dirty interfaces", Phys. Rev. Lett. 78 (1997)
3015.
130. W. Chen, A. V. Rylyakov, V. Patel, J. E. Lukens, and K. K. Likharev, "Rapid single flux quantum
T-flip-flop operating at 770 GHz", IEEE Trans. on Appl. Supercond. 9 (1999) 3212-3215.
131. See, e.g., A. O. Caldeira and A. Leggett, "Quantum tunneling in a dissipative system", Ann. Phys.
149 (1983) 374.
132. See, e.g., Proceedings of the 5K Cryocooler Workshop, July 24-25, 1995, Elmsford, NY, available
from HYPRES, Inc., phone 914-592-1190.
133. S. V. Rylov, D. K. Brock, D. V. Gaidarenko, A. F. Kirichenko, J. M. Vogt, and V. K. Semenov,
"High-resolution ADC using modulation-demodulation architecture", IEEE Trans. on Appl.
Supercond. 9 (1999) 3016-3019.
134. V. K. Semenov, Yu. A. Polyakov, and T. V. Filippov, "Superconductor delta ADC with on-chip
decimation filter", ibid. 9 (1999) 3026-3029.
135. O. Mukhanov, D. Brock, W. Li, D. Gupta, J. Vogt, V. Semenov, T. Filippov, Y. Polyakov,
"Superconductive High Resolution ADC", Report #2EK01, 2000 Applied Superconductivity
Conference, to be published in IEEE Trans. on Appl. Supercond. 11 (2001), No. 2.
136. R. H. Walden, "Analog-to-digital converter survey and analysis", IEEE J. on Sel. Areas of
Commun. 17 (1999) 539-550; see also Web site www.hrl.com/TECHLABS/micro/ADC/adc.html.
137. E. B. Wikborg, V. K. Semenov, and K. K. Likharev, "RSFQ front end for a software radio receiver",
IEEE Trans. on Appl. Supercond. 9 (1999) 3615-3618.
138. V. K. Semenov, P. N. Shevchenko, and Yu. A. Polyakov, "Digital-to-analog converter based on
processing of SFQ pulses", in Ext. Abstr. of Int. Supercond. Electron. Conf., PTB, Berlin, 1997, pp.
320-322.
139. H. Sasaki, S. Kiryu, F. Hirayama, T. Kikuchi, M. Maezawa, A. Shoji, and S. Polonsky,
"RSFQ-based D/A converter for AC voltage standard", IEEE Trans. on Appl. Supercond. 9 (1999)
3561-3564.
140. V. K. Semenov, Yu. A. Polyakov, and E. Wikborg, "Flux multiplier and its metrology applications",
Report #2EI13, 2000 Applied Superconductivity Conference; to be published in IEEE Trans. on
Appl. Supercond. 11 (2001), No. 2.
141. S. P. Benz, C. A. Hamilton, T. E. Harvey, L. A. Christian, and J. X. Przybysz, "Pulse-driven
Josephson digital/analog converter", IEEE Trans. on Appl. Supercond. 8 (1998) 42-47.
142. See, e.g., J. Clarke, "Low- and high-Tc SQUIDs and some applications", in: H. Weinstock (ed.)
Applications of Superconductivity, Kluwer, Dordrecht, 2000, pp. 1-60.
143. V. K. Semenov and Yu. A. Polyakov, "Fully integrated digital SQUID", in Ext. Abstr. of Int.
Supercond. Electron. Conf., PTB, Berlin, pp. 329-331.
144. J. Vrba, "Multichannel SQUID biomagnetic systems", in: H. Weinstock (ed.) Applications of
Superconductivity, Kluwer, Dordrecht, 2000, pp. 61-138.
145. A. Kidiyarova-Shevchenko and D. Zinoviev, "RSFQ pseudo random generator and its possible
applications", IEEE Trans. on Appl. Supercond. 5 (1995) 2820-2822.
146. J. Kang, A. H. Worsham, and J. Przybysz, "4.6 GHz SFQ shift register and SFQ pseudorandom bit
sequence generator", ibid. 5 (1995) 2827-2830.
147. P. D. Dresselhaus, E. J. Dean, A. H. Worsham, J. X. Przybysz, and S. V. Polonsky, "Modulation
and demodulation of 2 GHz pseudo random binary sequence using SFQ digital circuits", ibid. 9
(1999) 3585-3589.
148. D. Yu. Zinoviev and K. K. Likharev, "Feasibility study of RSFQ-based self-routing nonblocking
switches", IEEE Trans. on Appl. Supercond. 7 (1997) 3155-3163.
149. D. Clark, "Blue Gene and the race toward petaflops capacity", IEEE Concurrency 8 (2000) 5-9.
150. O. A. Mukhanov and A. F. Kirichenko, "Implementation of a FFT radix 2 butterfly using serial
RSFQ multiplier-adders", IEEE Trans. on Appl. Supercond. 5 (1995) 2461-2464.
151. M. Dorojevets, P. Bunyk, D. Zinoviev, and K. Likharev, "RSFQ Computing: The Quest for
Petaflops", in: Future Trends in Microelectronics, S. Luryi, J. Xu, and A. Zaslavsky (eds.), Wiley,
New York, 1999, pp. 193-206.
152. M. Dorojevets, P. Bunyk, D. Zinoviev, and K. Likharev, "Superconductor Electronic Devices for
Petaflops Computing", FED Journal 10 (2000) 3-14.
153. M. Dorojevets, P. Bunyk, D. Zinoviev, and K. Likharev, "COOL-1: The Next Step in RSFQ
Computer Design", Physica B 280 (2000) 495-496.
154. L. Wittie, D. Zinoviev, G. Sazaklis, and K. Likharev, "CNET: RSFQ switching network for
petaflops-scale computing", IEEE Trans. on Appl. Supercond. 9 (1999) 4034-4039.
155. For a review, see, e.g., M. Yu. Kupriyanov and K. K. Likharev, "Josephson effect in
high-temperature superconductors", Sov. Phys. Usp. 33 (1990) 340-364.
156. To our knowledge, the most reproducible technology of HTS junction fabrication was described by
B. D. Hunt, M. G. Forrester, J. Talvacchio, and R. M. Young, "High-resistance HTS edge Josephson
junctions for digital circuits", IEEE Trans. on Appl. Supercond. 9 (1999) 3362-3365. The achieved
critical current spreads are still insufficient for VLSI applications.
International Journal of High Speed Electronics and Systems, Vol. 11, No. 1 (2001) 307-362
© World Scientific Publishing Company

RSFQ TECHNOLOGY: CIRCUITS AND SYSTEMS

DARREN K. BROCK
HYPRES, Inc.
175 Clearbrook Road, Elmsford, NY 10523-1101, USA

Rapid Single-Flux-Quantum (RSFQ) logic is a superconductor IC technology that, with
only a modest number of researchers worldwide, has produced some of the world's highest
performance digital and mixed-signal circuits. This achievement is due, in part, to a
constellation of characteristics that manifest themselves at the circuit level - namely, high-
speed digital logic at low-power, ideal interconnects, quantum accuracy, scalability, and
simplicity of fabrication. Translating these advantages to the system level requires an
understanding of the I/O, synchronization, and packaging issues associated with
a cryogenic technology. The objective of this paper is to review the status of current RSFQ
circuit-level infrastructure components and their potential impact on system-level
applications.

1. Introduction

As the RF and digital domains converge1, entirely new strategies are needed to enable the
innovative applications that will drive tomorrow's electronics industry. The ability to
deploy 100+ GHz mixed-signal systems will usher in a telecommunications and
computer revolution. Specific areas to benefit include:
• The wireless communications industry - Given the insatiable "thirst for
bandwidth" in digital telecommunications, future data converters and digital signal
processors will be required to deliver greatly increased performance to meet the
connectivity demands of governments, businesses, and consumers.
• The defense/government market - The never-ending drive for militaries and
governments to do "more with less" is resulting in a concentrated push to deploy multi-
function, dynamically reconfigurable systems. Such systems will rely on flexible, ultra-
fast, digital technologies, and replace, consolidate, and expand the capability of existing
dedicated analog systems for radar, electronic warfare, and other surveillance
applications.
• The hyper-computer business - Demand for access to intensive computation for
weather prediction, non-invasive geo-physical exploration of natural resources, global
economic modeling, intensive data mining, and other applications already exceeds the
abilities of modern supercomputers and networks. Ever-faster processing capabilities,
ultra-low latency memories, and ultra-high throughput network switches will be needed.

For traditional three-terminal semiconductor transistor devices, a cutoff frequency f_max
approaching 1 THz is needed to achieve a throughput on the order of 100 Gb/s for small
application specific ICs (ASICs).2 Such performance requirements are beginning to reach
the limits of the physical properties of semiconductors.3,4
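
The two figures quoted above (an f_max approaching 1 THz for a throughput of about 100 Gb/s) imply a device-speed-to-throughput ratio of roughly ten. The sketch below simply applies that ratio to other data rates. The factor of 10 is inferred from those two numbers only, and the function name and sample rates are illustrative assumptions, not values taken from Ref. 2.

```python
def required_fmax_hz(throughput_bps: float, ratio: float = 10.0) -> float:
    """Estimate the transistor f_max needed for a given ASIC throughput.

    `ratio` is an assumed rule-of-thumb factor (f_max / bit rate); the
    default of 10 is only what the text's 1 THz vs. 100 Gb/s figures imply.
    """
    if throughput_bps <= 0 or ratio <= 0:
        raise ValueError("throughput and ratio must be positive")
    return throughput_bps * ratio


if __name__ == "__main__":
    # Illustrative data rates in Gb/s (hypothetical sample points).
    for gbps in (10, 40, 100):
        fmax = required_fmax_hz(gbps * 1e9)
        print(f"{gbps:>4} Gb/s  ->  f_max of about {fmax / 1e12:.2f} THz")
```

Under this assumed ratio, every doubling of the target data rate doubles the required f_max, which is why throughputs near 100 Gb/s push semiconductor devices toward the terahertz range.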
Further, it has been noted that the rate of innovation in semiconductor materials and
devices has slowed dramatically, and that virtually no improvement in device speed is
anticipated beyond the next five years.5 Ultra-exotic or "speculative" technologies such
as quantum computing6, DNA7 and molecular computing8 represent a paradigm shift
unthinkable until, at the earliest, the middle of the century. To sustain the historical
performance growth in the electronics industry, a radically new IC technology - one that
is scalable and addresses the problems of both device speed and interconnect delay -
must be identified and nurtured, while keeping cost in mind. RSFQ technology, based on
low-temperature superconductors9, could be the answer.
Rapid Single Flux Quantum (RSFQ) logic10 is an IC technology with the potential to
leapfrog the performance of traditional silicon and III-V compound semiconductors. ICs
with sub-micron RSFQ static digital frequency dividers have already been fabricated and
operated in university laboratories at over 750 Gb/s.11 These achievements represent
faster demonstrated electronic circuit speeds than any other technology has predicted to
date, even through computer simulations. Prototype RSFQ circuits made with modest
research-grade 2-3 µm linewidth niobium (Nb) fabrication processes include those
shown in Table 1.

Table 1: Representative RSFQ circuits demonstrated in 2-3 µm Nb technologies.

Circuit type                                                     Circuit metric(s)
Toggle flip-flop                                                 144 GHz
2-bit counter                                                    120 GHz
4-bit shift register                                             66 GHz
1-kbit shift register                                            19 GHz
6-bit flash analog-to-digital converter (ADC)                    3 ENOB (a) at 20 GHz
6-bit transient digitizer with 6x32 bit on-chip memory buffer    16 GS/s
14-bit high-resolution ADC (2 MHz)                               14 ENOB (a), -100 dBc SFDR (b)
19-bit digital-to-analog converter                               fully functional at low speed
1:8 demultiplexer (synchronous)                                  20 Gb/s
1:2 demultiplexer (asynchronous)                                 95 Gb/s
1-bit half-adder                                                 23 GHz
2-bit full-adder                                                 13 GHz
SxN bit serial multiplier                                        16 GHz
14-bit digital comb filter                                       20 GHz
128-bit autocorrelator                                           16 GHz
Time-to-digital converter                                        31 GHz

(a) ENOB = effective number of bits
(b) SFDR = spurious-free dynamic range

The RSFQ technology also has a clear path to extend performance. Unlike
semiconductor devices, the speed of RSFQ ICs comes from inherent physical
RSFQ Technology: Circuits and Systems 309

phenomena, not ultra-small scaling. This means that existing lithography techniques can
be employed, and more importantly, existing equipment can fabricate circuitry that
surpasses conventional limits of performance. Because RSFQ logic uses the lossless
ballistic transmission of digital data "fluxons" near the speed of light, the wire-up
nightmare that silicon designers face is substantially reduced. This scenario also allows
the full speed potential of individual gates to be realized.
Other features of this technology that make it suitable for growth into the traditional
market include its compatibility with existing IC packaging techniques. These include
compatibility with optical (fiber) signal input and output, a maturing multi-chip module
(MCM) technology with multi-Gb/s digital data transfer between chips, and simple
interface circuits to convert to and from both ECL logic and CMOS logic levels.
RSFQ integrated circuits are made with standard semiconductor manufacturing
equipment; however, there are many fewer mask layers (typically about 10) and the
actual processing involves much less complex depositions.12,13 Because RSFQ logic is an
all thin-film technology, there are no doping profiles to calculate, no high-temperature
drive-ins, no epitaxial growths or chemical-vapor depositions. These differences translate
directly into reduced costs in the large-scale manufacture of RSFQ electronics.
System-on-a-chip (SOC) architectures, containing both front-end analog circuitry, as
well as digital processing blocks, are fundamental to enabling tomorrow's 100 GHz
applications. This configuration presents extraordinary difficulties for semiconductors,
due to "crosstalk" - problems of interference between the analog and digital sections of
the same chip. Because of the unique reliance on single quanta of magnetic flux to
convey information, RSFQ circuits are inherently more immune to this sort of crosstalk.
1.1 Development of the Rapid Single Flux Quantum IC Technology
The existence of the magnetic flux quantum in a superconductor circuit was first
predicted in 1950 by Fritz London.14 In 1961, Bascom Deaver and William Fairbank
were able to experimentally prove this theory in their laboratories at Stanford University
(Palo Alto, CA).15 The postulation and subsequent discovery of an active superconductor
device—the Josephson junction16—in 1962 quickly led to a number of efforts to exploit
the speed and power advantages of a circuit made from a resistanceless (superconductive)
material; however, virtually all of these ideas sought to copy the voltage-level output of
the transistor, and its predecessor, the vacuum tube.17 Known as Josephson "latching
logics", these schemes drove several development efforts, perhaps the most notable being
the superconducting supercomputer program that ran from 1969 to 1983 at IBM (Yorktown
Heights, NY).18
Ironically, about the same time IBM terminated its Josephson computer project,
innovative solutions were being found to design problems that had plagued the latching
logic approaches. As a result, modern RSFQ logic as a digital superconductor technology
differs from this original latching junction work in three fundamental ways: junction
material, logic convention, and packaging. These advances were a result of work in a
number of laboratories and countries and are summarized in the following sections.
1.1.1 Use of refractory metals - niobium


Nb RSFQ ICs tolerate thermal and mechanical environments well, unlike previous Pb-
alloy Josephson circuits, which did not endure the environmental stresses of ordinary use.
By the early 1980s, it had become clear that a better material system and more rugged
Josephson IC process was needed. At the Sperry Research Center (Sudbury, MA), Harry
Kroger, Larry Smith, and Don Jillie introduced what is now known as the "trilayer
process", a method of creating reproducible superconducting tunnel barriers across a
wafer.19 Using this trilayer approach, a stable Nb/Al Josephson junction process was
developed at Bell Laboratories (Murray Hill, NJ) by Michael Gurvitch, John Rowell and
others.20 This Sperry/Bell Labs "Nb/Al trilayer process" is now used universally to make
Nb RSFQ ICs and other superconductor devices, such as sensors that measure the
magnetic fields from the heart and brain. The next step in the development of the modern
RSFQ fab process was made in Japan, where, in a Ministry of International Trade and
Industry (MITI) funded project, researchers at Fujitsu, Hitachi, NEC, and Japan's
Electrotechnical Laboratory (ETL) showed how the Nb/Al trilayers, which initially were
used in the US to make only small numbers of junctions, could be developed into a
complete fabrication process for complex Josephson ICs.21,22,23 The junction count in
such ICs is, in principle, limited only by the available lithography and wafer size. Since
the 1980s there has also been progress, particularly at TRW (Redondo Beach, CA), and at
ETL (Japan) in replacing the Nb/Al trilayer process with one that uses NbN as the
superconductor, allowing operation at somewhat higher temperatures (~10 K);24 however,
the majority of R&D in the field worldwide remains based on the Nb/Al process.
1.1.2 Magnetic flux quanta as binary data
RSFQ logic does not try to mimic semiconductor voltage-level logic, as did the
Josephson latching-logic schemes (which incidentally suffered from the same speed
restrictions of a few GHz that today's semiconductor digital VLSI logics are
encountering).
During the mid-1980s, a group of researchers at Moscow State University (Moscow,
Russia), including Konstantin Likharev, Oleg Mukhanov and Vasili Semenov, invented a
new Josephson junction logic family tailor-made to access the ultimate potential of
superconductors. This new superconductor digital logic family became known as Rapid
Single Flux Quantum (RSFQ) logic.25,26 (see also Ref. 10) This approach relies on
another intrinsic property of superconductors (apart from the loss of resistance below a
critical temperature Tc), namely that within a closed section of superconductor material
any magnetic flux present can exist only in discrete amounts that are multiples of the
magnetic flux quantum, $\Phi_0 = h/2e \approx 2.07 \times 10^{-15}$ Wb, where h is Planck's
constant and e is the electron charge (i.e., it is said to be quantized).27 When a flux
quantum moves in an electrical circuit, it is manifested as a fast voltage pulse - an
"SFQ pulse" - with an integrated amplitude of $\int V(t)\,dt = \Phi_0 \approx 2.07$ mV·ps.
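As a quick numerical check of the two figures quoted above, the following minimal sketch computes the flux quantum from h and e and converts it to mV·ps. The rectangular-pulse helper `sfq_pulse_height` is a hypothetical illustration (real SFQ pulses are not rectangular), so it gives only an order-of-magnitude pulse height.

```python
# Exact SI values of the defining constants (2019 SI redefinition).
PLANCK_H = 6.62607015e-34      # J*s
ELECTRON_E = 1.602176634e-19   # C

# Magnetic flux quantum Phi_0 = h / 2e, in webers (1 Wb = 1 V*s).
PHI_0 = PLANCK_H / (2.0 * ELECTRON_E)


def sfq_pulse_height(width_s: float) -> float:
    """Mean height (V) of a pulse of the given width whose area is Phi_0.

    Assumes an idealized rectangular pulse; this is an illustration only.
    """
    return PHI_0 / width_s


if __name__ == "__main__":
    print(f"Phi_0 = {PHI_0:.4e} Wb")
    # 1 V*s = 1e3 mV * 1e12 ps = 1e15 mV*ps
    print(f"Phi_0 = {PHI_0 * 1e15:.3f} mV*ps")
    print(f"2 ps wide pulse: roughly {sfq_pulse_height(2e-12) * 1e3:.2f} mV")
```

The fixed pulse area is what makes the pulse a convenient carrier of one bit: a shorter pulse is simply taller.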
Use and manipulation of single quanta of magnetic flux, in superconducting devices
called "flux shuttles", was first demonstrated in 1971 by Philip Anderson, Robert Dynes,
and Ted Fulton of Bell Laboratories (Murray Hill, NJ).28 Another such logic scheme was
proposed in 1978 by John Hurrell and Arnold Silver at the Aerospace Corporation (Los
Angeles, CA).29 At the same time in Japan, a group at Tohoku University was developing
what would become known as "phase-mode logic".30,31 Although all are based on SFQ
pulses, none of these approaches adequately defined a convenient manner for coding
binary data onto and extracting it from the magnetic flux quanta in the circuit. A primary
contribution of the RSFQ approach was to define a convention for the logical
representation of single flux quantum "1"s and "0"s. This made RSFQ a synchronous
(clocked) logic, departing from the traditional combinatorial Boolean logic style. Since
the mid-1980s, a number of other logic families utilizing SFQ pulses have been
proposed32,33,34, including dual-rail or "delay-insensitive" gates. Nevertheless, the RSFQ
convention for performing digital and mixed-signal logic with superconductors has now
become the dominant approach among researchers around the world.
1.1.3 Availability of closed-cycle refrigeration
RSFQ circuits also no longer have to rely solely on liquid-helium cooling, as did their
previous voltage-state counterparts. Modern commercial closed-cycle cryogenic
refrigerators ("cryocoolers") make packaging RSFQ chips manageable and comparatively
inexpensive. Such refrigerators can cool the niobium superconductor chips to their
operating temperature of 4-5 K. Although this cooling is often accomplished with liquid
helium (LHe) in research laboratories, cryocoolers are almost always the best packaging
choice for commercial or military systems. Over the past decade, the reliability and
efficiency of such cryocoolers have improved, while their size and cost have decreased.35 In
fact, it is now possible to purchase a cryocooler that reaches 4-5 K, and fits in the lower
half of a standard 19 in. instrument rack, for about US$20,000.36 Further improvements
and cost reductions, particularly of pulse-tube refrigerators37, seem likely.
1.2 Contents of this review
This article gives a general overview of the RSFQ technology at the circuit and system
level. It is intended to serve as a companion paper to "RSFQ Technology: Physics and
Devices" by Paul Bunyk, Konstantin Likharev, and Dmitry Zinoviev (also in this issue),
which covers, in detail, the theoretical underpinnings, basic operation, and possible future
directions of RSFQ technology. This review is confined to work in low-temperature
superconductor (LTS) development of RSFQ circuits in Nb/Al trilayer material systems.
Recent advances in device/circuit research on RSFQ and digital gates in high-temperature
superconductor (HTS) materials can be found in Ref. 38. Work in NbN/MgO and
NbN/AlN material systems is covered in Refs. 22 and 39 respectively. Reviews of the
status of the commercialization of superconducting electronics in general can also be
found in the literature.40,41,42
After originating at Moscow State University (Moscow, Russia), the majority of LTS
RSFQ circuit work was performed, over roughly the last 10 years, at the following US
institutions: HYPRES, Inc. (Elmsford, NY); Northrop Grumman (Baltimore, MD); TRW
(Redondo Beach, CA); NIST (Boulder, CO); Conductus (Sunnyvale, CA); University of
Rochester (Rochester, NY); Stanford University (Palo Alto, CA); UC Berkeley
(Berkeley, CA); MIT Lincoln Laboratory (Lexington, MA); and SUNY Stony Brook
(Stony Brook, NY).
More recently, a number of major institutions in Japan have stepped up their
involvement in the design and development of Nb RSFQ circuits and systems. These
include: NEC Fundamental Research Laboratories (Tsukuba, Ibaraki); Hitachi Advanced
Research Laboratory (Kokubunji, Tokyo); Fujitsu Laboratories Ltd. (Atsugi-shi,
Kanagawa); Tohoku University (Aoba-ku, Sendai); Nagoya University (Chikusa-ku,
Nagoya); the University of Tokyo (Meguro-ku, Tokyo); Yokohama National University
(Hodogaya-ku, Yokohama); the Electrotechnical Laboratory (ETL) (Tsukuba, Ibaraki);
and the Superconductivity Research Laboratory (SRL) of the International
Superconductivity Technology Center (ISTEC) (Koto-ku, Tokyo).
In Europe, the focus has continued to be more on HTS materials; however, LTS
RSFQ work is also ongoing at the Technische Universität Ilmenau (Ilmenau, Germany),
the Physikalisch-Technische Bundesanstalt (PTB) (Braunschweig, Germany), Chalmers
tekniska högskola (Göteborg, Sweden) and (on a small level) at Ericsson Components
AB (Stockholm, Sweden).
This article is intended to outline the current state of a number of basic circuit-level
"building blocks" and techniques, developed by the above groups and others, which
could be applied towards creating fully-functional RSFQ-based systems.43 Section 2
covers advances in data conversion between analog and digital formats. Section 3
introduces digital signal processing core components such as adders and multipliers and
other infrastructure components, such as clocks and data buffers. Section 4 focuses on
clock and data input/output techniques, which become especially important due to both
the high-speed and cryogenic nature of the technology. Finally, section 5 examines
several RSFQ applications and their system-level benefits, and gives a summary of
promising development areas.
2. RSFQ Data Converters
The quantum accurate periodic transfer function of the Superconducting Quantum
Interference Device (SQUID) makes superconductor circuits an obvious choice for data
conversion from a continuous-time to discrete-time format.44 Likewise, when a quantized
SFQ pulse is used to represent digital data, the AC Josephson effect24 (i.e. frequency-to-
voltage relationship)45 gives access to a method for directly transferring back into the
analog domain. RSFQ analog-to-digital converters (ADCs) have been one of the best-
funded research areas in the field, since they offer both high-speed and high-fidelity
performance. Digital-to-analog converters (DACs) have also received considerable
attention, since the basis for their operation is already commercially employed to define
the SI (Système International) unit of the Volt.46 Several of the major efforts in these
areas are discussed in the following sections.
2.1 Analog-to-digital converters
Interest in RSFQ ADCs stems partly from potential dual-use applications, both in
civilian markets, such as software-defined radio, and in defense radar and EW systems.
The fact that the performance of semiconductor ADCs has improved at an average rate of
only ~1.5 bits per 6 years is a further stimulus. The performance of traditional
semiconductor ADCs has been summarized by Robert Walden of HRL Laboratories
(Malibu, CA)47, and is plotted in Fig. 1. The overlaid data points for two fully functional
RSFQ ADCs on this plot show demonstrated performance that is already comparable
with the very best semiconductor devices. A number of different RSFQ ADC approaches
have received attention, each design typically focussed on stressing different performance
parameters. Among them are the HYPRES "high-resolution" and "flash" architectures, as
well as Northrop-Grumman's Σ-Δ configuration.
[Fig. 1 plot: horizontal axis "Nyquist Sample Rate", 10 kS/s to 100 GS/s; legend: demonstrated superconductor ADCs, state-of-the-art semiconductor ADCs.]

Fig. 1. Performance regions for RSFQ ADCs with diamonds indicating demonstrated circuits.
Filled dot semiconductor data used with permission of Bob Walden and © 1999 HRL Laboratories,
LLC. All Rights Reserved.

2.1.1 Phase-modulation/demodulation ADC
Fig. 2 shows a block diagram of the HYPRES RSFQ "high-resolution" ADC based on a
phase-modulation/demodulation architecture.48 The circuit consists of two major parts: a
differential-code front-end quantizer and a digital decimation low-pass filter. The front
end is composed of an analog phase modulator and a digital phase demodulator. The phase
modulator consists of a single-junction SQUID, biased by a DC voltage from a special
voltage source, which is stabilized by an internal clock frequency. The phase
demodulator consists of a time-interleaved bank of race arbiters (SYNC) followed by a
thermometer-to-binary encoder (DEC).

Fig. 2 (left) Block diagram of a phase modulation/demodulation ADC. (right) Chip photo.
The front-end phase quantizer49 operates as follows: a DC voltage generator
continuously pumps magnetic flux into the superconducting inductive loop of the single-
junction SQUID at a stabilized rate of 1/2 flux quantum per clock period. Inside the loop,
this linearly growing flux adds to the signal flux coupled in from the input current. After
the total magnetic flux in the superconducting loop reaches a certain threshold, the
Josephson junction of the interferometer switches and releases a single quantum of
magnetic flux from the loop (i.e., an SFQ pulse). Thus, for a constant input signal, the
junction simply generates a periodic SFQ pulse train at half the clock frequency. When
the input signal increases, however, the total flux within the loop grows more quickly,
making the frequency of the output pulse train increase. Similarly, when the input signal
decreases, the flux within the loop grows more slowly, and, hence, the frequency of the
output pulse train decreases. This mechanism, in fact, produces a continuous linear
analog phase modulation of the train with 2π of phase corresponding to a single flux
quantum worth of input signal coupled into the phase quantizer.50
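The quantizer behavior described above can be illustrated with a coarse behavioral model. This is only a sketch under stated assumptions: flux is measured in units of one flux quantum, the release threshold is taken as exactly one flux quantum, and the loop is examined once per clock period (the real circuit emits pulses at continuously varying times, which is what carries the phase information). The function name `phase_quantizer` is hypothetical.

```python
def phase_quantizer(signal_flux, pump_per_clock=0.5):
    """Clock-by-clock behavioral model of the front-end phase quantizer.

    signal_flux: input flux coupled into the loop at each clock period,
                 in units of the flux quantum (an assumed normalization).
    Returns a list with 1 where an SFQ pulse is released, else 0.
    """
    pulses = []
    released = 0  # flux quanta already released from the loop
    for k, phi_in in enumerate(signal_flux, start=1):
        # Pumped flux grows linearly; signal flux adds to it.
        total = pump_per_clock * k + phi_in - released
        if total >= 1.0:          # assumed threshold: one flux quantum
            released += 1         # junction switches, one quantum leaves
            pulses.append(1)
        else:
            pulses.append(0)
    return pulses


if __name__ == "__main__":
    print("constant input:", phase_quantizer([0.0] * 8))
    print("rising input:  ", phase_quantizer([0.25 * k for k in range(1, 9)]))
    print("falling input: ", phase_quantizer([-0.25 * k for k in range(1, 9)]))
```

In this model a constant input yields one pulse every second clock period, a rising input raises the pulse rate, and a falling input lowers it, matching the qualitative description in the text.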
Demodulation of the phase-modulated pulse train is performed using a device called a
synchronizer. This synchronizer (sometimes called a "race arbiter") is a specialized
RSFQ shift register in which each cell stores the data pulse directed to it, releasing it only
when a clock pulse arrives. Several such cells (or "channels") are cascaded in order to
eliminate unresolved racing conditions; however, even under racing conditions, the
synchronizer never drops pulses, so the DC component of the signal never drifts. A single
synchronizer channel provides digitization of the phase to a resolution, or Least Significant
Bit (LSB) value, of π, up to an input signal slew rate of ±1 LSB per clock period. Using a
bank of N synchronizer channels, uniformly interleaved in time within one clock period,
increases this phase resolution to LSB = π/N and limits the input signal slew rate to ±N
LSBs per clock period. In order to obtain a binary differential code from the thermometer
code outputs of the synchronizer bank, the encoder block adds up these outputs and
subtracts N/2 each clock period.
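The encoder step at the end of this paragraph is simple arithmetic, sketched below. The function name `differential_code` and the representation of the synchronizer outputs as a list of 0/1 values are assumptions made for illustration; the sketch also assumes an even number of channels so that N/2 is an integer.

```python
def differential_code(sync_outputs):
    """Thermometer-to-binary encoding for one clock period.

    sync_outputs: list of N values (0 or 1), one per synchronizer channel.
    Returns the signed differential code: (number of pulses) - N/2.
    """
    n = len(sync_outputs)
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even number of channels")
    return sum(sync_outputs) - n // 2


if __name__ == "__main__":
    # With a constant input, half of the channels see a pulse on average,
    # so the differential code is zero.
    print(differential_code([1, 1, 0, 0]))
```

Subtracting N/2 removes the offset contributed by the DC flux pump, so the code is zero for an unchanging input and signed for rising or falling inputs.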

Fig. 3 (left) This oscilloscope photograph shows the operation of a 10-bit version of the
high-resolution ADC. This chip was operated up to 900 MS/s with a 64:1 decimation filter. (right)
Measurement of a 14-bit ADC performance at 175 MS/s (11.2 GHz clock frequency with 1:64
decimation ratio) using an 8K-point FFT spectrum. For 50 MHz input sinewave: ENOB = 8.9 bits,
SINAD = 55.3 dB, SFDR = -74.3 dBc (12.3 SFDR bits).
The differential code from the output of the front end is passed to a digital decimation
low-pass filter (DSP), which uses a standard CIC (cascaded integrator-comb) architecture
[Hogenauer51] with two integration stages and one differentiation stage. The first
integration stage simply restores the signal from differential code, while the second one
provides the first-order low-pass filtering.
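A minimal software sketch of such a filter, with two integration stages running at the clock rate and one differentiation (comb) stage running at the decimated rate, is given below. It is an illustration of the generic CIC structure, not the on-chip implementation: the name `cic_decimate` is hypothetical, and Python's unbounded integers stand in for the fixed-width modular arithmetic that hardware would use.

```python
def cic_decimate(diff_codes, ratio):
    """Two integrators, decimation by `ratio`, then one differentiator."""
    acc1 = 0   # first integrator: restores the signal from differential code
    acc2 = 0   # second integrator: first-order low-pass filtering
    prev = 0   # previous decimated sample, used by the comb stage
    out = []
    for i, d in enumerate(diff_codes, start=1):
        acc1 += d
        acc2 += acc1
        if i % ratio == 0:           # keep one sample out of every `ratio`
            out.append(acc2 - prev)  # comb (differentiation) stage
            prev = acc2
    return out


if __name__ == "__main__":
    # A step to level 3 appears in differential code as a single 3
    # followed by zeros.
    print(cic_decimate([3] + [0] * 7, ratio=4))
```

For a constant restored signal the output equals the signal level multiplied by the decimation ratio, which is the expected DC gain of a moving sum over `ratio` samples.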
The dynamic resolution or Effective Number of Bits (ENOB) of this ADC is
determined by the input signal bandwidth (BW), the internal clock frequency f_clk, and the
number of synchronizer channels N and is given by
$\mathrm{ENOB} = \log_2\left(\frac{N f_{clk}}{\pi\,BW}\right) + \frac{1}{2}\log_2\left(\frac{f_{clk}}{2\,BW}\right)$.52 (1)

The first term in this formula accounts for the slew rate limit, while the second one comes
from standard oversampling gain. Here, the BW is assumed to be half the output
sampling rate (i.e. at the Nyquist limit). Therefore, (1) gives a bandwidth-to-resolution
tradeoff ratio of 1.5 bits per octave. Output spectra of this ADC for both single-tone and
two-tone tests are shown in Figs. 3 and 4. Additional details can be found in Refs. 53, 54,
and 55.
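The 1.5 bits per octave tradeoff can be checked numerically. The sketch below reads Eq. (1) as ENOB = log2(N f_clk / (π BW)) + (1/2) log2(f_clk / (2 BW)), which is the form consistent with the two terms described above; the function name `enob` and the example values of N, f_clk and BW are hypothetical.

```python
import math


def enob(n_channels: int, f_clk_hz: float, bw_hz: float) -> float:
    """ENOB according to Eq. (1), as read in the lead-in above."""
    slew_term = math.log2(n_channels * f_clk_hz / (math.pi * bw_hz))
    oversampling_term = 0.5 * math.log2(f_clk_hz / (2.0 * bw_hz))
    return slew_term + oversampling_term


if __name__ == "__main__":
    # Hypothetical operating point, chosen only for illustration.
    n, f_clk = 4, 20e9
    for bw in (10e6, 5e6, 2.5e6):
        print(f"BW = {bw / 1e6:5.1f} MHz -> ENOB = {enob(n, f_clk, bw):.2f} bits")
```

Each halving of the bandwidth adds 1 bit through the slew-rate term and 0.5 bit through the oversampling term.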
This ADC design is especially linear, because the quantization thresholds are set by a
ratio of fundamental constants (h/2e) in the SQUID in the front-end. This leads to an
enhanced spurious-free dynamic range (SFDR) in comparison to semiconductor ADCs,
whose thresholds are set by the matching of device characteristics.

Fig. 4 Measured spectra: (left) ADC output for a 10 MHz tone (unfiltered). Note the input tone
harmonics; (right) Two-tone ADC test with 8 MHz and 10 MHz signals (filtered). For this initial
test, the ADC does not show significant 3rd-order intermodulation products.

2.1.2 "Flash" wideband parallel ADC


Fig. 5 shows a circuit diagram and photo of the HYPRES RSFQ flash ADC, which is
optimized for high bandwidth and clock rate. The flash ADC uses a periodic comparator
architecture requiring only N comparators to digitize N-bit data (unlike semiconductor
flash schemes which require $2^N - 1$ comparators). In the flash ADC approach, the input
signal is delivered to the linear array of N comparators via an R/2R resistive divider
ladder, such that each successive comparator in the array receives half of the remaining
input signal. This configuration results in a parallel N-bit Gray-code56 output at the end of
each sample interval.
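Why N periodic comparators fed by successively halved copies of the input produce an N-bit Gray code can be seen in an idealized model. The sketch below assumes each comparator has a square-wave transfer function with a particular phase offset, and that the input is expressed in LSB units; both are assumptions chosen for illustration, and the names `periodic_comparator` and `flash_sample` are hypothetical.

```python
import math


def periodic_comparator(x_lsb: float, bit: int) -> int:
    """Idealized periodic comparator for one bit.

    The divider ladder halves the signal for each successive comparator,
    so comparator `bit` sees x / 2**bit. Its output is modeled as a
    square wave in that scaled input (assumed phase offset of one unit).
    """
    scaled = x_lsb / (2 ** bit)
    return math.floor((scaled + 1.0) / 2.0) % 2


def flash_sample(x_lsb: float, n_bits: int) -> int:
    """Combine the N comparator outputs into one N-bit code word."""
    code = 0
    for bit in range(n_bits):
        code |= periodic_comparator(x_lsb, bit) << bit
    return code


if __name__ == "__main__":
    for level in range(8):
        print(level, format(flash_sample(level, 3), "03b"))
```

In this model adjacent input levels differ in exactly one output bit, which is the defining property of a Gray code, and only N comparators are needed because each one crosses its threshold many times over the input range.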

Fig. 5 (left) Circuit schematic of the RSFQ quantizer of the flash ADC. (right) Flash ADC chip
with 2 complete 6-bit flash converters (one in each corner). Each ADC contains 6 pairs of RSFQ
quantizers, a start-stop acquisition logic block, and a 6x32 bit FIFO memory buffer.
The flash comparator works as follows: as the input signal increases, more flux is
coupled into the quantizer loop. The Meissner effect causes a counteracting circulating
current to be induced, which adds to the current bias flowing through the sampling
junction. When interrogated by a clock SFQ pulse, the sampling junction will redirect
this flux quantum to the output latch, corresponding to a sampled value of "1". In
contrast, as the input signal decreases, less flux is coupled into the quantizer loop,
causing less current to add to the sampling junction bias. If a sampling SFQ pulse arrives
from the clock in this state, the floating buffer junction connected to the sampling
junction will release the pulse; therefore, there will be no signal sent to the output latch.
This behavior corresponds to a sample value of "0".
The RSFQ flash comparator design employs a two-spoke SQUID wheel (i.e., a
two-leaf phase tree)57,58 containing two quantizing junctions. When the phase between these
junctions is adjusted to equal π (i.e., 1/2 flux quantum), the combination acts as if it were a
single very small junction. This reduces the effective LI product of the comparator
circuit, helping to linearize its dynamic performance. A 1-pH shunt inductor in parallel
with the input transformer inductance helps to further reduce the L/R time constant of the
circuit. A small feedback resistor can even be used to further linearize the comparator's
dynamics at high frequencies. Fig. 6 shows "beat frequency" signal reconstructions from
a 5-bit raw Gray-code RSFQ flash ADC. The onset of dynamic distortion can be seen as
the input frequency increases. The bandwidth of this front end has been demonstrated in
beat-frequency tests up to 30 GHz.

Fig. 6 A panel of oscillographs showing the performance of the flash ADC plotted in Fig. 1, (from
left to right) 5 bits at 4 GHz, 4 bits at 12 GHz and 3 bits at 20 GHz.
The 1-cm² digitizer chip (shown in Fig. 5) contains two 6-bit transient digitizers. This
circuit is a six-bit version of the ADC used in the beat-frequency tests; however, for each
bit, an acquisition switch set and a 32-stage acquisition shift register were attached to the
comparator's clock and data output ports. Using this design, single-shot pulses with
risetimes < 100 ps were digitized (see Fig. 7). Here, the sampled data reveals structures that
suggest ringing which is not visible on the sampling scope reference. The bandwidth of this
ADC test chip has been shown to exceed 10 GHz (the 3 dB point is at ~16 GHz).
When comparator threshold distortions are present, there is a marked decrease in
ADC resolution. Architectural improvements, including redundant comparators and real-
time digital error-correction logic, can restore much of this lost performance. By
interleaving several identical comparator thresholds, using XOR logic to combine pairs of
thresholds, then XOR-ing those thresholds, new thresholds for additional bits of
resolution can be synthesized.59 In addition, the MSB comparators receive the smallest
fraction of signal current, resulting in the widest "gray zone" near the threshold. By
interleaving the thresholds of two comparators per bit, and using real-time RSFQ digital
logic, the comparator furthest away from its gray zone can be chosen. The algorithm uses
the output from the previous bit to choose the correct comparator, hence the name
"look-back" logic.60 Fig. 8 (left) shows a prototype Flash ADC chip with these additional
features.
[Fig. 7 plots: horizontal axes labeled "Sample Number".]

Fig. 7 Reconstructed sampled data from an RSFQ Flash ADC digitizer. (left) A periodic analog
pulse signal captured on a sampling oscilloscope; (middle) 8 GS/s single-shot data; (right) 16 GS/s
single-shot data.

Fig. 8 (left) The advanced Flash ADC architecture with interleaving and lookback logic. (right) A
"ping-pong" time-interleaved configuration for a 40 GS/s bunch profiler.
Very high frequency semiconductor ADCs often use a "ping-pong" architecture, in
which two separate ADCs work in tandem, sampling the input signal 180° out of phase in
order to effectively double the sampling frequency. This approach also works with
RSFQ ADCs. Fig. 8(right) shows a set of twin Flash ADCs arranged to form a 40 GS/s
"bunch profiler" for application in high-energy physics experiments for the US
Department of Energy. Although quite different at the device level, this design
demonstrates the ability of RSFQ circuits to re-use existing architectures from traditional
IC design. The flash ADC could also be used as a front-end component for a number of
envisioned applications, including transient digitizers, channelized wideband receivers,
and digital beamforming receivers.
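The "ping-pong" idea can be sketched in a few lines: two converters sample the same signal at the same rate, offset by half a sample period, and their outputs are merged alternately. The helper names `sample` and `ping_pong_interleave`, the ideal (unquantized) samplers, and the 3 GHz test tone are all assumptions for illustration only.

```python
import math


def sample(signal, rate_hz, count, delay_s=0.0):
    """Ideal sampler: evaluate `signal` at `count` instants spaced 1/rate apart."""
    return [signal(delay_s + i / rate_hz) for i in range(count)]


def ping_pong_interleave(samples_a, samples_b):
    """Merge two sample streams alternately: a0, b0, a1, b1, ..."""
    merged = []
    for a, b in zip(samples_a, samples_b):
        merged.extend((a, b))
    return merged


if __name__ == "__main__":
    tone = lambda t: math.sin(2.0 * math.pi * 3e9 * t)  # hypothetical 3 GHz input
    rate = 20e9                                         # each ADC: 20 GS/s
    adc_a = sample(tone, rate, 8)
    adc_b = sample(tone, rate, 8, delay_s=0.5 / rate)   # 180 degrees out of phase
    combined = ping_pong_interleave(adc_a, adc_b)
    print(len(combined), "samples, equivalent to 40 GS/s")
```

With ideal samplers the merged stream is identical to a single converter running at twice the rate; in practice, gain and timing mismatch between the two converters limits how well this holds.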
2.1.3 Sigma-Delta ADC
Semiconductor ADCs have used an architecture known as a Σ-Δ modulator for some
time.61 RSFQ technology can also be used to implement this type of ADC. The traditional
Σ-Δ approach uses an op-amp to add up successive digital samples (the Σ operation) and
then subtract the generated digital word from the total input signal (the Δ operation). The
modulated digital signal is then sent to an integrator (low-pass digital filter) where the
Nyquist-rate digital words are generated. Fig. 9 shows the circuit photo and schematic of
a single flux quantum Σ-Δ modulator design from Northrop-Grumman (Baltimore,
MD).62,63,64 The modulator uses a superconductive inductor at the input to integrate the
applied signal voltage and then a single junction quantizer to digitize the integrated
current. The Δ-feedback is in the form of SFQ pulses. The circuit essentially acts as an
electrical servo, which balances the input signal with feedback pulses that represent the
bits out. One advantage of using SFQ pulses as this feedback is that each pulse is
identically repeatable and accurate.

Fig. 9 (left) Σ-Δ modulator circuit photo. (right) Σ-Δ modulator circuit diagram. (Photo courtesy J.
Przybysz, Northrop-Grumman.)
The circuit is clocked by sampling pulses generated by a 1.28 GHz room-temperature
source and sharpened by a pulse buffer before being applied to the modulator. Each time
the sum of the sampling pulse current, the junction bias, and the Σ-inductor current exceeds
the critical current of the sampling junction, the quantizer produces an SFQ Δ-feedback
pulse and reduces the current in the inductor by $\Phi_0/L$. The Fourier transform of the ADC
output digits in Fig. 10 shows the reduction in quantization noise at low frequencies near
the signal frequency.
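The servo action described above can be illustrated with a first-order behavioral model: the inductor integrates the input voltage, and each feedback pulse removes one flux quantum's worth of current. This is a simplified sketch, not the Northrop-Grumman circuit: the sampling-pulse current, junction bias and critical current are lumped into a single hypothetical `threshold_a`, and the inductance value is an arbitrary assumption.

```python
PHI_0 = 2.067833848e-15  # magnetic flux quantum, Wb


def sfq_sigma_delta(v_in, f_clk_hz, inductance_h, threshold_a=0.0):
    """First-order behavioral model of an SFQ sigma-delta modulator.

    v_in: input voltage samples (V), one per clock period.
    Returns the output bit stream (1 = SFQ feedback pulse produced).
    """
    dt = 1.0 / f_clk_hz
    current = 0.0
    bits = []
    for v in v_in:
        current += v * dt / inductance_h       # inductor integrates: dI = V dt / L
        if current > threshold_a:              # lumped switching condition (assumed)
            current -= PHI_0 / inductance_h    # feedback pulse removes Phi_0 / L
            bits.append(1)
        else:
            bits.append(0)
    return bits


if __name__ == "__main__":
    f_clk = 1.28e9                 # clock rate quoted in the text
    bits = sfq_sigma_delta([1e-6] * 20000, f_clk, inductance_h=100e-12)
    density = sum(bits) / len(bits)
    print(f"pulse density = {density:.4f}")
    print(f"recovered input = {density * f_clk * PHI_0 * 1e6:.4f} uV")
```

Because every feedback pulse carries exactly one flux quantum, the average pulse rate multiplied by the flux quantum tracks the average input voltage; in this unipolar model the usable input range is 0 to f_clk times the flux quantum.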

Fig. 10 Northrop Grumman second-order Δ-Σ modulator data FFT (horizontal axis: frequency in MHz)
showing second-order noise shaping with the circuit in Fig. 9. (Photo courtesy J. Przybysz)65

2.2 Digital-to-analog converters


A number of different programmable Josephson voltage standards have been
proposed.66,67 All of these designs are essentially digital-to-analog converters (DACs)
based on the properties of flux quantization. The fact that an SFQ DAC uses the same
fundamental physics that define the unit of the Volt has some profound consequences.
For instance, any instantaneous voltage generated by the DAC will be precise to the
accuracy of the definition of the Volt. Further, every waveform cycle generated will be
exactly the same, with quantum precision. The intrinsically small time constants
associated with Josephson junctions may make it possible to extend such performance to
many GHz, although very large arrays of JJs may be necessary to achieve useful levels of
output voltages.
2.2.1 Voltage-multiplier Josephson DAC
Vasili Semenov at SUNY Stony Brook first suggested an RSFQ DAC design based
around a voltage multiplier (VM) block.68 This DAC (seen in Fig. 11) uses each bit of an
yV-bit RSFQ digital word to drive an RSFQ digital-to-frequency converter (DFC)
(sometimes noted as "SD"). A DFC is designed to output a stream of SFQ pulses at a
frequency that is proportional to its reference clock frequency, only when the bit value at
its input is "1". By arranging a series of N DFCs with reference frequencies fN that
decrease as 2'N, one can effectively create a binary-weighted set. By switching different
DFCs in and out of the series with the digital input word, any of 2N combinations can be
chosen. The VM is an inductively-coupled SQUID chain used to transform the DFC
streams of flux quanta into time-averaged voltages, then sum them, creating a
corresponding output voltage with vV-bit resolution. This arrangement constitutes a
programmable Voltage Standard, since the output voltage is derived directly from the
input word and the Josephson frequency-to-voltage relation. By updating the N-b\X input
word periodically, at a rate slower than the slowest DFC reference frequency, one creates
a DAC. The voltage at the output of the DAC during a single sampling period is given by
$V_{out} = M \Phi_0 f_0$, where $f_0$ is a readout or sampling clock frequency, and $M$ is the total number
of SFQ pulses driven through the VM by all the DFCs. The LSB of the output voltage is
$n \Phi_0 f_0$, where $n$ is the number of stages in the smallest stage of the VM, and $f_0$ is again
the output sample rate. The output dynamic range is $2^N$ LSBs, where $N$ is the resolution of
the DAC in bits.
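
The output relation above can be checked numerically. The following is a minimal sketch, not the published design: it assumes that each code step adds exactly one LSB (n flux quanta per sampling period), and the function names and the example values (one VM stage, 1 GHz sample rate, 8 bits) are illustrative choices rather than figures from the chip in Fig. 11.

```python
# Sketch of the voltage-multiplier DAC output relation V_out = M * Phi0 * f0.
# Function names and the example parameters are hypothetical.

PHI0 = 2.067833848e-15  # magnetic flux quantum h/2e, in webers (volt-seconds)


def vm_dac_lsb(n_stages: int, f_sample_hz: float) -> float:
    """LSB voltage: n flux quanta per sampling period, LSB = n * Phi0 * f0."""
    return n_stages * PHI0 * f_sample_hz


def vm_dac_output(code: int, n_bits: int, n_stages: int, f_sample_hz: float) -> float:
    """Output voltage for a digital code, assuming M = code * n pulses per sample."""
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code does not fit in n_bits")
    return code * vm_dac_lsb(n_stages, f_sample_hz)


lsb = vm_dac_lsb(n_stages=1, f_sample_hz=1e9)          # about 2.07 microvolts
full_scale = vm_dac_output(255, 8, 1, 1e9)             # top code of an 8-bit word
dynamic_range_lsbs = 2 ** 8                            # 2^N levels for N = 8
```

Because the LSB depends only on the flux quantum, the stage count, and the sample clock, the step size inherits the accuracy of the clock frequency, which is the point made in the text about a programmable voltage standard.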

Fig. 11. (left) DAC chip; (right) DAC block diagram. (Figures courtesy of V. Semenov, SUNY
Stony Brook.)
Fig. 11 shows a chip photo and block diagram of the voltage multiplier DAC. Many
bits of dynamic range are possible, because the initial reference clock can be very high.
For instance, the chip in Fig. 11 is a 22-bit DAC. Experimental results of an 8-bit design
are shown in Fig. 12. The differential non-linearity of the DAC is excellent (< 0.1 LSB).
With the proper microwave engineering of the VMs, a multi-GHz output rate (effective
bandwidth) could be achieved while maintaining significant dynamic range. The update
clock and output clock are synchronized to prevent spikes during code transitions.
Fig. 12 (left) Deviation of measured quantization levels from the expected uniform positions,
(right) Dependence of output voltage of the DAC on applied digital code. Plot labels recovered
from the figure: "Accuracy of 8 bit RSFQ DAC"; horizontal axes "Digital Code" and "Input code".
(Figures courtesy of V. Semenov, SUNY Stony Brook.)

2.2.2 Pulse-driven Josephson DAC


The National Institute of Standards and Technology (NIST) in Boulder, CO, has also been
developing a Josephson DAC which exploits single flux quantum pulses to create a
programmable or "AC" voltage standard.69 It is shown in Fig. 13. To operate, a desired
tone (or any other periodic waveform) is first synthesized. A Σ-Δ modulator algorithm is
implemented in a computer program to generate a set of digital samples of this given
periodic waveform S(t), which is stored in the memory of a semiconductor digital code
generator. This data, S(i), is used to generate an output digital pulse code, S_D(t), in which
the density of ones is proportional to the magnitude of the original given signal.
Naturally, the sequence S(i) will contain harmonics of the desired waveform (i.e.
quantization noise), which are characteristic of the sampling process. By using a Σ-Δ
modulation approach, however, this quantization noise can be substantially reduced or
"pushed out-of-band". If the digital code generator were ideal, S_D could simply be
low-pass filtered and used as the analog waveform; however, real digital code generators have
both amplitude and phase distortion - this is where using flux quantization helps.
It has been shown that the Josephson frequency-to-voltage relation holds for
pulse-driven Josephson systems, just as for sinewave-driven systems, as long as the frequency
of these pulses is below the junction plasma frequency.70 In this case, S_D is applied to a JJ
array at a few GHz. The instantaneous oscillation frequency across the array is then
simply proportional to the time-averaged voltage or density of ones (i.e. density of digital
code generator pulses) being driven across it. The JJ array serves to quantize the time
integral of these voltage pulses in units of Φ0, which cleans up many of the non-idealities in S_D.
The resulting output S_J can then be applied to an analog low-pass filter to recover a much
higher fidelity output signal S'(t). Fig. 13 (right) shows the output spectrum of this DAC
generating a 23.4 kHz sinewave. The second harmonic is clearly suppressed to about -75
dBc. In the absence of noise or jitter, this should improve to below -100 dBc. The output
voltage swing of the generated waveform is proportional to the number of junctions in the
array. A 1000-junction array could generate waveforms a few millivolts peak-to-peak, so
much larger single-chip arrays would presumably be needed for most applications.
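
To illustrate how a pulse-density code tracks a waveform, the sketch below uses a first-order modulator, the simplest member of the Σ-Δ family; the modulator order actually used at NIST is not stated here, so this is an assumption made for illustration. The helper names, the 4096-sample sine, and the 2 GHz pulse rate are likewise hypothetical, and the array voltage assumes one flux quantum per junction per pulse.

```python
import math

# Sketch: first-order sigma-delta pulse-density coding and the array voltage scale.
# Names and example values are illustrative assumptions, not the NIST implementation.

PHI0 = 2.067833848e-15  # magnetic flux quantum h/2e, in webers


def first_order_sigma_delta(samples):
    """Convert samples in [0, 1] into a 0/1 stream whose density of ones follows them."""
    accumulator = 0.0
    bits = []
    for x in samples:
        accumulator += x
        if accumulator >= 1.0:      # emit a pulse and subtract one quantum
            bits.append(1)
            accumulator -= 1.0
        else:
            bits.append(0)
    return bits


def array_voltage(n_junctions: int, pulse_rate_hz: float) -> float:
    """Time-averaged voltage if every junction passes one flux quantum per pulse."""
    return n_junctions * PHI0 * pulse_rate_hz


n = 4096
samples = [0.5 + 0.4 * math.sin(2 * math.pi * k / n) for k in range(n)]
bits = first_order_sigma_delta(samples)
v_full_density = array_voltage(1000, 2e9)   # 1000 junctions driven at 2 GHz
```

The running count of ones never differs from the running sum of the input by more than one pulse, which is the sense in which the density of ones is proportional to the signal. The full-density voltage comes out at roughly 4 mV, in line with the "few millivolts" quoted for a 1000-junction array.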

Fig. 13 (left) Block diagram and (right) measurement results of pulse-driven Josephson DAC being
developed at NIST for AC voltage standards. (Figures courtesy of S. Benz, NIST)71

3. RSFQ DSP and Infrastructure Blocks


Basic digital processing functions are performed in RSFQ as in any other logic family.
For carrying out the manipulation of bits, adders, multipliers, clocks, registers,
interconnects, and data buffers are all needed, with active Josephson transmission lines
(JTLs) to serve as delays for time synchronization72. Other blocks include parallel-to-
serial converters, clock generators, and phase-locking devices. The diversity of
components that have been (or are being) developed has brought RSFQ technology to the
point where the design of larger systems with more functionality can be reasonably
considered. Several key blocks for general DSP applications are covered in this section.
The discussion of the applications themselves is reserved for section 5.1.
3.1 Digital clocks and phase-locked loops
The naturally high frequency of RSFQ circuits requires that digital clocks of equally high
frequency be generated and distributed throughout the chips themselves. For "low-speed"
circuits (i.e. < 20 GHz) it is often convenient to use an external synthesized source to
create a sinewave clock trigger that can be sent to the chip via high-speed coaxial cable;
however, when useful systems are considered, a more practical method of clock
generation is required. Further, the necessity of having very fast RSFQ circuits operate
with traditional semiconductor-based instrumentation leads to a requirement not only
for an on-chip RSFQ clock, but also for it to be embedded in a full phase-locked loop
(PLL) to create a complete useful RSFQ subsystem.
3.1.1 On-chip clock sources
The characteristics of several SFQ clock sources are summarized in Table 2. These
include single over-biased junctions (which do not have particularly narrow linewidths
due to thermal noise), JJ arrays (which require special care to ensure coherent radiation),
JTL oscillator rings (which are easy to design, but offer only limited tunability), and
so-called "long" Josephson junctions (which can be operated in either flux-flow or
resonant modes).
Table 2. A comparison of different single flux quantum
clock sources

Oscillator Type                  Quality Factor     Frequency
Single junction                  10^2 - 10^3        > 100 GHz
Junction array                   10^3 - 10^5        > 100 GHz
Long junction (flux flow mode)   10^3               > 100 GHz
JTL oscillator ring              10^3 - 5x10^3      10-20 GHz
Long junction (resonant mode)    10^5 - 10^6        1-100 GHz

High quality room-temperature clock sources at these high frequencies, which could
be used to externally clock the circuits, are extremely expensive, difficult to package, and
suffer from strongly increased timing jitter when transmitted over long distances. Increased
jitter would prove detrimental to the performance of any digital circuit. For these
high-speed integrated circuits, a low-jitter, on-chip clock technology is needed. An on-chip
clock technology consists of four key building blocks: a stable, low-jitter master clock
generator; a clock distribution network; a clock decimator and selector to generate and
select a sub-harmonic of the master clock frequency; and a phase-locked loop (PLL) to
synchronize the on-chip master clock with a stable external clock to provide long-term
phase stability.

Fig. 14 Measurements of a 56.0 GHz LJJ resonator SFQ clock pulse train digitally divided by 256 for
display in (left) time domain, showing a period 256 T = 4.52 ns, and (right) frequency domain,
with f/256 = 221 MHz.
An example of a high-quality on-chip SFQ clock is shown in Fig. 14. The long
Josephson junction (LJJ) oscillator in resonant mode offers a compromise between
tunability and stability. By biasing on the Nth zero-field step (ZFS) of an LJJ (N = 7 in
Fig. 14), a train of SFQ clock pulses can be generated at the resonant frequency set by the
LJJ's size, or alternatively, at any higher harmonic of this characteristic frequency.74
Digital frequency dividers can also be used to access sub-harmonics of the clock pulse
train.
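
The digital division used to reach sub-harmonics can be pictured as a chain of toggle flip-flops, each passing on one pulse for every two it receives. The sketch below is a behavioral model only; the class and function names are hypothetical, and the choice of emitting the output pulse on the 1-to-0 transition is an assumption.

```python
# Behavioral sketch of a toggle flip-flop (TFF) divider chain.
# Names are hypothetical; real RSFQ TFF cells are pulse-driven circuits, not objects.


class ToggleFlipFlop:
    def __init__(self):
        self.state = 0

    def pulse(self) -> bool:
        """Toggle the stored state; report an output pulse on the 1 -> 0 transition."""
        self.state ^= 1
        return self.state == 0


def divide_pulses(n_input_pulses: int, n_stages: int) -> int:
    """Count pulses leaving a chain of n_stages TFFs (division by 2**n_stages)."""
    chain = [ToggleFlipFlop() for _ in range(n_stages)]
    output_pulses = 0
    for _ in range(n_input_pulses):
        for tff in chain:
            if not tff.pulse():
                break               # this stage absorbed the pulse
        else:
            output_pulses += 1      # pulse made it through every stage
    return output_pulses


pulses_out = divide_pulses(1024, 8)      # eight stages divide by 256
divided_frequency_hz = 1.0 / 4.52e-9     # frequency for the 4.52 ns period of Fig. 14
```

Eight stages give the division by 256 used for display in Fig. 14, and a 4.52 ns divided period corresponds to about 221 MHz.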
Other approaches have also been employed to generate a SFQ clock on chip, while
maintaining synchronization with room temperature electronics. One approach employed
an SFQ clock coupled to a resonant length of microstrip transmission line, so that the
clock becomes "locked" at the resonant frequencies of the microstrip line.75 Another
variation employed a frequency doubling technique in which two SFQ pulses are
produced for every one that is input, so that several cascaded doublers can bring the
frequency up to the range of interest.76 The first of these is a narrow-band approach,
suitable for specific frequencies, when tuning is not needed. The second is somewhat
limited in range and lacks a feedback mechanism for long-term stabilization.
3.1.2 Phase-locked loops
A more traditional approach to synchronization - the PLL - is shown in Fig. 15.77
The layout on the left is an RSFQ PLL suitable for use with any standard RSFQ circuit.
The basic phase-locked loop design is that of a classical digital PLL,78 in which the phase
detector and the frequency divider are digital, but the loop filter and voltage-controlled
oscillator (VCO) are analog. Here, the VCO is an over-biased Josephson junction and the clock
output is a train of SFQ pulses. Although certainly not the highest Q resonator, it is
nevertheless a suitable (and simple) method for this proof-of-principle design.
In this circuit, the fast output of the VCO is phase locked to a slower reference
signal generated at room temperature by first passing through a frequency divider (a
toggle flip-flop counter) and then being compared against the reference signal in the
phase detector. The phase detector is a device that compares the phases of the two SFQ
pulse trains at its inputs and issues an analog feedback signal proportional to the
difference in phase between the signals. This output signal is then filtered by a loop filter
(it may also be amplified or attenuated as necessary) and fed back to the VCO bias in
order to minimize the phase difference between the two signals. Fig. 15 (right) shows an
on-chip phase-locked loop (PLL) operating at 50 GHz and locking to a reference
frequency over a range of 1.5 GHz (±1.5%). The pull-in and locking ranges were
indistinguishable.
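
The feedback action described above can be sketched with a phase-domain model. This is a first-order loop (the phase error retunes the VCO directly, with no separate loop-filter dynamics), which is a simplification of the circuit in Fig. 15. The divide ratio of 4096 is inferred from 50 GHz / 12.2 MHz and is an assumption, as are the free-running frequency, the loop gain, and all names.

```python
# Phase-domain sketch of a first-order PLL with a frequency divider in the loop.
# All parameter values and names are illustrative assumptions.


def simulate_pll(f_ref_hz, divide_ratio, f_free_hz, gain_hz_per_cycle, n_steps):
    """Step the loop once per reference period; return final VCO frequency and phase error."""
    dt = 1.0 / f_ref_hz
    phase_err = 0.0                  # reference phase minus divided-VCO phase, in cycles
    f_vco = f_free_hz
    for _ in range(n_steps):
        # The phase detector output shifts the VCO bias, and hence its frequency.
        f_vco = f_free_hz + gain_hz_per_cycle * phase_err
        # The phase error grows while the divided VCO runs slower than the reference.
        phase_err += (f_ref_hz - f_vco / divide_ratio) * dt
    return f_vco, phase_err


DIVIDE_RATIO = 4096                          # assumed: 50 GHz / 4096 is about 12.2 MHz
f_ref = 50e9 / DIVIDE_RATIO
f_locked, static_phase_err = simulate_pll(
    f_ref_hz=f_ref,
    divide_ratio=DIVIDE_RATIO,
    f_free_hz=49.5e9,                        # VCO starts 1% below the target
    gain_hz_per_cycle=4.096e9,
    n_steps=400,
)
```

In this model the VCO settles at the divide ratio times the reference frequency, and a constant phase offset remains to hold the VCO away from its free-running frequency. A first-order loop always shows such a static offset; its size here depends on the assumed gain and says nothing about the 90 degree offset reported in Fig. 15.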

Fig. 15. (left) Layout of an RSFQ PLL; (right) PLL demonstration at 50 GHz VCO frequency.
Traces show phase matching at 12.2 MHz between reference clock and frequency divider output,
and phase detector output. The two signals are out of phase by 90 degrees.

3.2 RAM and FIFO buffers

3.2.1 Random access memory


Because it is composed of a ratio of fundamental constants, the magnetic flux quantum
(Φ0), stored as a persistent current in a superconductor, is an excellent choice for a unit of
data storage for large memories. The ability of such flux quanta to be both stored and
transferred with virtually no power dissipation allows the development of various large-
scale circuits with on-chip memory, connecting them into deeply pipelined devices.
While RSFQ logic designs successfully use these features, RAM designs have
historically not fully exploited this ability.

The most successful superconductor RAM implementation thus far is one from
NEC. The design approach used in the NEC memories combines SFQ memory cells with
AC-powered voltage-state Josephson periphery circuits (readout, decoder, etc.).
Unfortunately, the requirement of large external AC power limits the clock cycle to about
1 GHz, making the RAM throughput insufficient to match the much faster, DC-powered,
RSFQ logic family.80 Recently, however, new DC-powered SFQ RAM approaches have
been pursued that may help resolve this issue.81,82
One new DC-powered Cryogenic RAM or "CRAM" consists of SFQ memory cell
arrays, dc/SFQ decoders, current drivers, sensing gates, and block multiplexors and
demultiplexers. The general structure of the RAM is shown in Fig. 16. In order to
increase throughput, a 16-Kbit RAM chip is divided into four 4-Kbit blocks. Each block
comprises a row-accessible 128 x 32-bit matrix, where each row contains one 32-bit
word. A block demultiplexer distributes input data between blocks, where a decoder
converts address and/or data into DC currents on microstrip lines, which propagate near
the speed of light, to operate on rows of SFQ memory cells. In a READ operation, a 32-bit
RSFQ word appears across the block output and is funneled to the CRAM output by a
block multiplexor.
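
The organization just described (four blocks of 128 rows, each row one 32-bit word) can be written down as an address map. The sketch below assumes that consecutive word addresses are interleaved across blocks, with the two low-order address bits selecting the block; the actual CRAM address assignment is not given in the text, so this mapping and the names are hypothetical.

```python
# Sketch of a word-address map for a 16-Kbit memory built from four 4-Kbit blocks.
# The interleaved block selection is an assumption made for illustration.

N_BLOCKS = 4
ROWS_PER_BLOCK = 128
WORD_BITS = 32

TOTAL_WORDS = N_BLOCKS * ROWS_PER_BLOCK
TOTAL_BITS = TOTAL_WORDS * WORD_BITS


def decode_word_address(address: int) -> tuple:
    """Split a word address into (block index, row index within the block)."""
    if not 0 <= address < TOTAL_WORDS:
        raise ValueError("word address out of range")
    block = address % N_BLOCKS       # low-order bits pick the block
    row = address // N_BLOCKS        # remaining bits pick the row
    return block, row
```

With this interleaving, four consecutive words land in four different blocks, which is one way a block demultiplexer could keep all blocks busy to raise throughput.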

[Fig. 16 (left) diagram labels: 32-bit input data word; block address; block demultiplexer; address and W/R.]

Fig. 16. (left) Block diagram of the CRAM combining four blocks of memory arrays, a block
decoder, Y-decoders, select line drivers, sense gates, and output block multiplexor; (right) Single
block of the CRAM including an address decoder and drivers, a 32x128 memory cell array, X-
drivers, and output sense dc-to-SFQ converter.

Fig. 16 (right) shows a more detailed schematic of a single 4-Kbit memory block.
Access to individual SFQ memory cells is provided via both magnetically coupled and
direct microstrip lines. The sign of the currents depends on whether READ or WRITE
operations are selected. Each WRITE1 operation is preceded by an erase or WRITE0 to
reset the cell. The SFQ memory cell itself (see Fig. 17) is based on a modified version of
the NEC vortex transition (VT) memory cell,83,84 with non-destructive readout and
current control. If the read-out SQUID switches to the resistive state during a READ
operation, the Y-column sense read-out cell (dc-SFQ) transforms the DC signal into an
RSFQ bit.

[Fig. 17 schematic labels: read-out SQUID; storage SQUID; READ (+X select); WRITE1.]

SFQ RAM Cell Access Table

Operation    Select Lines    Access
WRITE1       +X +Y           Bit
WRITE0       -X              Word
READ         +X              Word

Fig. 17. An SFQ memory cell with DC-powered row-accessible selection.
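
The access rules (word-wide erase, bit-wise WRITE1, word-wide non-destructive READ) imply a fixed sequence for storing a word. The sketch below models one memory block behaviorally; the class and method names are hypothetical and select-line currents are reduced to method calls.

```python
# Behavioral sketch of row-accessible storage following the cell access table:
# WRITE0 acts on a whole word, WRITE1 on a single bit, READ on a whole word.
# Class and method names are hypothetical.


class CramBlock:
    def __init__(self, rows: int = 128, word_bits: int = 32):
        self.cells = [[0] * word_bits for _ in range(rows)]

    def write0(self, row: int) -> None:
        """-X select: erase every cell of the addressed row."""
        self.cells[row] = [0] * len(self.cells[row])

    def write1(self, row: int, col: int) -> None:
        """+X and +Y select: set one cell."""
        self.cells[row][col] = 1

    def read(self, row: int) -> list:
        """+X select: non-destructive readout of the whole word."""
        return list(self.cells[row])

    def store_word(self, row: int, bits: list) -> None:
        """Erase the row first, then write the ones, as each WRITE1 follows a WRITE0."""
        self.write0(row)
        for col, bit in enumerate(bits):
            if bit:
                self.write1(row, col)


block = CramBlock()
word = [1, 0, 1, 1] + [0] * 28
block.store_word(3, word)
first_read = block.read(3)
second_read = block.read(3)      # unchanged, because readout is non-destructive
```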

3.2.2 FIFO memory buffers


First-In First-Out (FIFO) memory buffers (of the type used in section 2.1.2) can be
created using basic shift register cells. The first RSFQ shift register was a simple chain of
RS flip-flops successively clocked by an SFQ pulse on a Josephson transmission line
(JTL) traveling opposite to the data stream. This simple design was later extended, with an
additional clock JTL, to build a reversible shift register.85 The subsequent merging of the
clock JTL and data RS flip-flop into a single cell resulted in a junction-efficient shift
register design with only two junctions per bit, often called a "2-JJ" design, shown in Fig.
18.86 All these designs have a constraint between the delay of the clock pulse $\tau_{clk}$ and the
delay of the data shift $\tau_{shift}$ to avoid race conditions. Specifically, the clock delay $\tau_{clk}$
must be negative (i.e. the clock and data must travel in opposite directions), and its magnitude
must be greater than $\tau_{shift}$: $-\tau_{clk} > \tau_{shift}$.87
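
A loose software analogy shows why the clock must run against the data. In the sketch below a register is updated in place, one cell at a time, in the order the clock pulse would visit the cells. The function names are hypothetical, and the second function is a deliberately broken ordering that stands in for a clock travelling with the data and arriving before each shift has completed.

```python
# Analogy for counterflow clocking: visit order decides whether bits survive a shift.
# Function names are hypothetical; real timing is set by JTL delays, not loop order.


def shift_counterflow(cells: list, data_in: int) -> int:
    """Clock visits the output end first, so each cell is read before it is overwritten."""
    data_out = cells[-1]
    for i in range(len(cells) - 1, 0, -1):
        cells[i] = cells[i - 1]
    cells[0] = data_in
    return data_out


def shift_with_race(cells: list, data_in: int) -> int:
    """Clock visits the input end first, so the new bit overwrites everything downstream."""
    data_out = cells[-1]
    cells[0] = data_in
    for i in range(1, len(cells)):
        cells[i] = cells[i - 1]
    return data_out


good = [1, 0, 0, 0]
shift_counterflow(good, 0)       # the stored "1" moves exactly one cell

raced = [1, 0, 0, 0]
shift_with_race(raced, 0)        # the stored "1" is lost
```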
Fig. 19 shows the measurements of 256x1-bit and 4x1-bit 2-JJ shift registers at 12.36
Gb/s and 60 Gb/s, respectively.88 In Fig. 19 (left) the envelope of a directly triggered
measurement of the shifted sequence (1110011) at 6.18 GHz is shown in trace (a).
By increasing the dc/SFQ clock trigger offset, the input clock frequency can be doubled. The
consequence is seen in (b), where the output envelope is moved to a new position in time
(by half the delay of the shift register). Specifically, $\tfrac{1}{2} \times (6.18\ \mathrm{GHz})^{-1} \times 256\ \mathrm{bits} = 20.7$ ns,
corresponding to operation at the doubled frequency of 12.36 GHz. Trace (c) corresponds
to a leap to the position at which the dc/SFQ converter generates the clock pulse train. Similar
measurements of a 2-JJ 32-bit shift register yielded proper operation at a 31.8 GHz clock
frequency.
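
The envelope shift quoted above follows from the transit time of a bit through the register, which is the number of cells divided by the clock rate. A short numerical check, with a hypothetical helper name:

```python
# Check of the envelope shift when the shift-register clock is doubled.


def shift_register_delay_s(n_bits: int, clock_hz: float) -> float:
    """Time for one bit to traverse an n_bits shift register, one cell per clock."""
    return n_bits / clock_hz


delay_at_6_18 = shift_register_delay_s(256, 6.18e9)      # about 41.4 ns
delay_at_12_36 = shift_register_delay_s(256, 12.36e9)    # about 20.7 ns
envelope_shift_s = delay_at_6_18 - delay_at_12_36        # half the original delay
```

Doubling the clock halves the transit time, so the envelope moves by half the original delay, about 20.7 ns for 256 bits.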

Fig. 18 The most common "2-JJ" RSFQ shift register design, which features reversible operation as
well as a minimum number of junctions per cell. Clock distribution junctions are on the top of the
circuit, while data storage registers lie across the bottom.

[Fig. 19 plot labels: <V> on clock JTL; digital SR output.]

Fig. 19 Shift register measurements. (left) Triggered clock test of a 256-bit SR with a data pattern
of (11100111) at 885 MHz for clock frequencies of: (a) 6.18 GHz, (b) 12.36 GHz, and (c) >2.4 mA
(clock pulse train generation). (right) Non-triggered clock test of a 4-bit SR showing correct
operation to ~124 μV (~60 GHz). Vertical scale 9.7 GHz/div (frequency = <voltage>/(h/2e)).
Horizontal scale 50 μA/div.
In an un-triggered high-frequency test, a ~1-200 Hz triangle wave (sweep) current is
applied to the input of a dc/SFQ converter (the generator of the clock pulse train), and the
time-averaged DC voltages on both the clock JTL and the output of the shift register are
measured, with the data signal frequency adjusted in order to observe a few data
envelopes in each sweep period. Fig. 19 (right) shows the high frequency un-triggered
measurement of a 4-bit buffered shift register using this method. When operating
correctly, the digital SR output voltage is the same as the voltage on the clock JTL
whenever the data value is a "1". The digital SR output voltage is zero when the data
value is "0". In this measurement, beyond ~124 μV (or ~60 GHz according to the Josephson
voltage-to-frequency relation), the output no longer follows the clock. This sets the
maximum operational frequency. Even these speeds are still lower than the simulated
maximum frequency of 75 GHz. This is due to: the maximum frequency of the input
dc/SFQ converter (~65 GHz); underbiasing of the shift register due to the small
parameter margins of some cells; and the resonances in the DC bias lines.
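
The conversion between the measured time-averaged voltage and the pulse rate uses the Josephson relation f = V/(h/2e). The sketch below applies it to the two values quoted for Fig. 19; the function names are hypothetical and the constants are the exact SI values of h and e.

```python
# Josephson voltage-to-frequency conversion, f = V / (h / 2e).

PLANCK_H = 6.62607015e-34            # J s (exact in the SI)
ELEMENTARY_CHARGE = 1.602176634e-19  # C (exact in the SI)
PHI0 = PLANCK_H / (2 * ELEMENTARY_CHARGE)   # magnetic flux quantum, about 2.068e-15 Wb


def josephson_frequency_hz(voltage_v: float) -> float:
    """SFQ pulse rate corresponding to a time-averaged DC voltage."""
    return voltage_v / PHI0


def josephson_voltage_v(frequency_hz: float) -> float:
    """Time-averaged DC voltage corresponding to an SFQ pulse rate."""
    return frequency_hz * PHI0


f_limit_hz = josephson_frequency_hz(124e-6)       # about 60 GHz
f_per_division_hz = josephson_frequency_hz(20e-6) # about 9.7 GHz for a 20 microvolt step
```

The second value assumes that the 9.7 GHz/div vertical scale corresponds to 20 μV per division, which is what the relation gives.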
3.3 Adders and multipliers
In general, the throughput of a computational process can be increased at the expense of
latency by using a pipelined architecture. Such pipelining normally requires additional
registers, which in a semiconductor technology can more than double the size of the
circuit. In RSFQ logic, however, register storage is built into the gate itself. This inherent
memory state (i.e., storage of single flux quanta) makes it possible to build a basic set of
useful cells from Toggle and Reset-Set (T and RS) flip-flops and their modifications.
Such modifications can include adding destructive and non-destructive readouts (DROs
and NDROs), as well as fan-in buffers (often called "confluence buffers" or "mergers"),
and fan-out cells (typically called "splitters"). Using this technique, basic cells have been
developed to execute many common digital signal processing (DSP) elementary
operations, including Addition, Accumulation, and Multiplication. Details of the design
and experimental evaluation of such cells have been extensively described in the
literature10, therefore only a few examples will be given here.
3.3.1 Basic DSP gates
Since all these cells have an internal memory, they provide their own registers to
synchronize and effectively pipeline their inputs and/or outputs. For a basic latching data
register, an RS flip-flop or destructive read-out (DRO) cell as in Fig. 20 can be used.
Combining a T flip-flop with a DRO and confluence buffer performs the function of half
or full addition (FA-cell).89 As another example, a D flip-flop with non-destructive
read-out (NDRO or NR-cell) and confluence buffer can be used to realize the function of
accumulation as shown in Fig. 21.90,91
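
The difference between destructive and non-destructive readout is easiest to see in a behavioral model. The sketch below treats a stored flux quantum as a Boolean flag and an SFQ pulse as a method call; class and method names are hypothetical and no circuit-level behavior is modeled.

```python
# Behavioral sketch of DRO and NDRO storage cells.
# A stored flux quantum is a flag; an SFQ pulse is a method call. Names are hypothetical.


class DRO:
    """Destructive read-out cell: the clock pulse reads the bit and clears it."""

    def __init__(self):
        self.stored = False

    def data(self) -> None:
        self.stored = True

    def clock(self) -> bool:
        out = self.stored
        self.stored = False
        return out


class NDRO:
    """Non-destructive read-out cell: reading leaves the stored bit in place."""

    def __init__(self):
        self.stored = False

    def set(self) -> None:
        self.stored = True

    def reset(self) -> None:
        self.stored = False

    def read(self) -> bool:
        return self.stored


dro = DRO()
dro.data()
dro_reads = [dro.clock(), dro.clock()]      # second read finds the cell empty

ndro = NDRO()
ndro.set()
ndro_reads = [ndro.read(), ndro.read()]     # both reads see the stored bit
```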

Fig. 20 (left) Circuit and (right) block diagrams for a D flip-flop used as a destructive readout (DRO) cell.
To take advantage of the speed (and hence throughput) of RSFQ, serial math
approaches have been explored. For implementation of serial addition, an entire
carry-save serial adder (CSSA-cell), as seen in Fig. 22, has even been designed. This cell is an
excellent demonstration of the capabilities of a "non-Boolean" design approach to RSFQ.92
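
Functionally, a carry-save serial adder takes one bit of each operand per clock, LSB first, emits the sum bit, and keeps the carry for the next clock. The sketch below models only that function; the class and helper names are hypothetical and the internal structure of the CSSA cell in Fig. 22 is not represented.

```python
# Functional sketch of bit-serial addition with a saved carry (LSB first).
# Names are hypothetical; this is not a circuit model of the CSSA cell.


class CarrySaveSerialAdder:
    def __init__(self):
        self.carry = 0

    def clear(self) -> None:
        self.carry = 0

    def clock(self, a_bit: int, b_bit: int) -> int:
        """Consume one bit of each operand, store the carry, return the sum bit."""
        total = a_bit + b_bit + self.carry
        self.carry = total >> 1
        return total & 1


def serial_add(x: int, y: int, n_bits: int) -> int:
    """Add two n_bits-wide numbers serially; one extra clock flushes the final carry."""
    adder = CarrySaveSerialAdder()
    result = 0
    for i in range(n_bits + 1):
        sum_bit = adder.clock((x >> i) & 1, (y >> i) & 1)
        result |= sum_bit << i
    return result
```

One adder cell therefore handles operands of any width, at the cost of one clock per bit, which is the trade the text refers to as serial math.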

Fig. 21 (left) Circuit and (right) logical diagrams for a non-destructive read-out cell (D flip-flop with AND, NDRO).

Fig. 22 (left) Circuit and (right) logical diagrams for the carry-save serial adder (CSSA).
The implementation of these gates is detailed in section 3.3.3. Another set of basic
RSFQ cells has also been successfully developed and reported on.93 Furthermore, a
traditional approach of composing DSP elementary functions using RSFQ Boolean cells
and flip-flops has been described as well.94
3.3.2 The B flip-flop
Success in the design and demonstration of RSFQ functional blocks, such as those in
section 3.3.1, was accompanied by development of a "universal" RSFQ cell. The idea of
the "bi" flip-flop or B flip-flop (shown in Fig. 23) was to take advantage of the inherent
memory capability of RSFQ logic to form a template cell, from which a number of useful
functions could be derived simply by connecting different cell inputs and/or outputs, with
shorting or opening of different branches.95 The B flip-flop has 4 inputs and 6 outputs
altogether. Input SFQ pulses can change the internal state ("1" or "0") of the cell by
introducing a flux quantum into the center, shared interferometer. This internal state,
along with additional input pulses, can then define logical outputs with reference to
different points in the cell. Major modes include RS2, T2 and T-RS configurations,96
which utilize an input toggle of the memory state, a persistent set-until-reset state, or a
combination of both. Possible logical operations include an NDRO cell, a toggle flip-flop with
synchronous destructive readout (T1-cell), a full adder, or even an asynchronous
(reversible) up-down counter. This template also serves as the basis for the gates
described in section 4.1.

Fig. 23. An approach to a template RSFQ gate, the bi (B) flip-flop, here configured in RS mode.
Logical expressions at the outputs indicate the dependence on the inputs (output labels
recovered from the figure: (S1 ∨ S2) ∧ 0 and (R1 ∨ R2) ∧ 1).

3.3.3 Serial vs. parallel architectures


As previously mentioned, the high speed and naturally pipelined architecture of RSFQ
logic makes it possible to consider both parallel and serial approaches to compute-
intensive functions. Specifically, multi-rate processing, in the form of serial math, is one
method to take advantage of the very high clock speeds available, while minimizing the
total number of devices needed for a given task.
The most common DSP functional primitive is X = WY + Z. In order to perform this
function, both multipliers and adders are required. Fig. 24 shows single-bit modules for
both serial and parallel implementations of multipliers. These modules can be designed
using the basic set of RSFQ cells described above; then interconnected with JTLs to
provide SFQ pulse transmission and to set the necessary delays within and among them.
For example, a serial multiplier module which dissipates only 13 μW has been
constructed using 48 Josephson junctions (JJs).97 The parallel multiplier module (PMM),
also shown in Fig. 24, employs a total of 67 JJs and dissipates only 19 μW of power
experimentally at 15 GHz.98

Fig. 24 Block diagrams of (left) an RSFQ serial multiplier and (right) single bit slice of an RSFQ
parallel multiplier. (Block labels recovered from the diagrams: CLK, NR, DR, CSSA, FA, SUM, CARRY.)

The serial multiplier in Fig. 24 uses three elementary RSFQ DSP cells: destructive
readout registers (DROs or DRs), NDROs or NR cells, and carry-save serial adders
(CSSAs). After clearing the internal carry bits of the CSSA, N-bit operand B is loaded
(either serially or in parallel) into the top N-bit DROs where it will be held for the
multiplication cycle. If successive multiplies by the same coefficient value are required,
operand B need only be loaded once. An N-bit operand A is then shifted into the DR
register from the right, LSB first. After the first clock, the LSB of the result product is
available at the sum output of the rightmost CSSA. Operand A is shifted N times and then
padded with zeros and shifted N more times to produce a 2N-bit product shifted out to the
right from the rightmost CSSA. At the end of each clock period, the serial multiplier
delivers each consecutive bit of the 2N-bit product (A x B) at the output terminal. The
partial single-bit multiplications are performed by the AND cells, while the CSSAs sum
up the partial products. The whole multiplication takes 2N clock periods; however,
loading of the next B operand can take place during the last N periods of the previous
multiplication cycle. This is a highly efficient method of multiplication and forms the
basis of the FFT processor described in section 5.1.7.
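
The sequence above (hold B, shift A in LSB first, pad with N zeros, read one product bit per clock) can be sketched as a generic carry-save add-shift multiplier. This is a textbook behavioral model under stated assumptions, not the cell-level design of Fig. 24: each of the N cells adds its partial product (an AND of the current A bit with its B bit), the sum bit latched by its left-hand neighbor on the previous clock, and its own saved carry. All names are hypothetical.

```python
# Behavioral sketch of a bit-serial carry-save multiplier: 2N clocks, LSB first.
# A generic add-shift model; not the published RSFQ cell arrangement.


def serial_multiply(a: int, b: int, n_bits: int) -> list:
    """Return the 2N product bits of a * b, least significant bit first."""
    b_bits = [(b >> i) & 1 for i in range(n_bits)]   # operand B, held for the whole cycle
    sums = [0] * (n_bits + 1)     # sums[i]: bit latched by cell i; sums[n_bits] is always 0
    carries = [0] * n_bits        # carry saved inside each cell
    product_bits = []
    for t in range(2 * n_bits):
        a_bit = (a >> t) & 1 if t < n_bits else 0    # zero padding after N shifts
        new_sums = [0] * (n_bits + 1)
        for i in range(n_bits):
            total = (a_bit & b_bits[i]) + sums[i + 1] + carries[i]
            new_sums[i] = total & 1
            carries[i] = total >> 1
        sums = new_sums
        product_bits.append(sums[0])                 # one product bit per clock
    return product_bits


def bits_to_int(bits_lsb_first: list) -> int:
    """Reassemble an integer from LSB-first bits."""
    return sum(bit << i for i, bit in enumerate(bits_lsb_first))


product = serial_multiply(255, 3, 8)
```

For the 8-bit operands 255 and 3 the model yields the 16-bit LSB-first stream 1011111101000000, that is 765, which has the same form as the output pattern listed in the caption of Fig. 25.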
Built-in self-test (BIST) circuitry is necessary to perform the GHz rate digital testing
necessary for this cell. The placement of FIFO memory buffers at the input and output of
the cell under test (CUT) provides a method of loading in a test vector at low-frequency,
then switching to a high-speed signal, which clocks the CUT and stores the output in
another on-chip buffer, where it is again available for low-frequency readout. Such high-
speed tests are shown in Fig. 25. "

Fig. 25. (left) High-speed operation of an 8-bit serial multiplier with shift register test system at 6.3
GHz and 2.5 MHz load/off-load clock. Inputs: W = (11111111), Y = (11000000); output
YW = (1011111101000000); (right) Correct operation (WX + Y + Z) of a parallel multiplier
module with shift-register test system at 5.3 GHz (load/off-load clock is 2.5 MHz) for
W = X = (0000101), Y = (0110101), and Z = (1010001). The outputs for both are in NRZ format.
A consequence of the high speed of RSFQ logic is that additions and multiplications
can be implemented as either parallel or serial operations, without much reduction in
performance. If both operations are bit-wise pipelined (i.e. no carry propagation) then
circuit complexity can be directly traded for execution time. For example, as previously
mentioned, an N-bit x M-bit multiplication requires N x M addition operations; however,
these can be accomplished as either M additions over N clock cycles, or by N x M
additions in only one clock cycle.* The parallel implementation of a 16-bit
multiply-accumulator requires 256 full adders; however, a serial implementation requires only 33
Carry Save Adders (CSA). With an internal clock rate of 32 GHz, it would perform the
operations at a throughput of 1 GHz. An intermediate trade-off could also be designed,
such as using 64 adders over 4 clock periods. This tradeoff example gives a good feel for
the design flexibility enabled by the high-speed bit rates possible in RSFQ.
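
The trade between adder count and clock cycles can be enumerated directly. The sketch below lists every way of splitting the N x M single-bit additions evenly over clock cycles and converts cycles per result into throughput. The names are hypothetical, and the figure of 32 cycles per result for the serial case is an assumption based on the 2N clock periods of the serial multiplier with N = 16.

```python
# Sketch of the hardware/time trade-off for bit-wise pipelined multiplication.
# Names are hypothetical; the 32-cycle serial figure is an assumption (2N with N = 16).


def adder_time_tradeoffs(n_bits: int, m_bits: int) -> list:
    """List (adders, clock cycles) pairs whose product equals the N x M additions."""
    total_additions = n_bits * m_bits
    return [
        (adders, total_additions // adders)
        for adders in range(1, total_additions + 1)
        if total_additions % adders == 0
    ]


def throughput_hz(clock_hz: float, cycles_per_result: int) -> float:
    """Results per second when each result occupies cycles_per_result clocks."""
    return clock_hz / cycles_per_result


options = adder_time_tradeoffs(16, 16)
serial_throughput = throughput_hz(32e9, 32)     # 1 GHz, as quoted in the text
```

The list contains both the fully parallel case (256 adders, 1 cycle) and the intermediate case mentioned in the text (64 adders, 4 cycles).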
4. RSFQ Chip I/O Approaches
An extra level of complexity exists at the interface between RSFQ circuits and
off-the-shelf components. This is due to the high aggregate data rates involved, the cryogenic
nature of the technology, and the low-level signals of RSFQ. One method for bringing
data onto or taking data off an RSFQ chip is to perform time-division multiplexing /
demultiplexing. Such "muxing / demuxing" of data can be done synchronously or
asynchronously, depending on whether clock recovery is required between data source
and destination. The low-level superconductor digital signals can be somewhat amplified
to a point where custom semiconductor digital amplifiers can provide output at standard
emitter-coupled logic (ECL) levels. Further, direct high-speed data I/O may be possible
using on-chip electro-optic techniques. In many cases, partitioning of an RSFQ system
may involve the use of a superconductor multi-chip module, which can allow direct data
transmission between chips at tens of Gb/s. Ultimately, it is the cryopackage containing
the RSFQ chip or MCM that actually interfaces with the cooling apparatus and I/O path,
which sets the final cooling requirement.100
4.1 Multiplexors and demultiplexors
The B flip-flop with joined inputs (or T2 flip-flop), as described in section 3.3.2, can be
used as an RSFQ dual-rail (asynchronous) demultiplexer. The circuit in Fig. 26 (left) has
two input lines - one for input zeros "In 0" and one for input ones "In 1". The ones are
applied to the input of one internal T flip-flop and the zeros are applied to the input of the
other internal T flip-flop. In this configuration, the direct and inverted outputs of each T
flip-flop form the dual-rail (asynchronous) demultiplexed output "Out A and B".
Fig. 26 (left) The schematic of a T2-mode B flip-flop configured as a dual-rail 1:2 demultiplexer
and; (right) schematic of an asynchronous multiplexor cell of similar design. (Terminal labels
recovered from the schematics: In "0", In "1", Out "0", Out "1", InA, InB, OutA, OutB, Reset.)

Fig. 27 (left) Functional test of the T2 demultiplexer cell showing correct operation. All outputs are
NRZ data. (right) High-speed test results of the T2 demultiplexer cell. (Horiz. axis units - mA;
Vert. axis units - mV.) One trace is the time-averaged voltage input to "In 0". The other is a trace
of twice the time-averaged voltage on "OutA 0". Correct operation is seen to the point where the
two traces diverge. Input test pattern is (00011011) which is correctly shuffled between OutA and
OutB up to 95 Gb/s.
In order to determine the maximum working speed of this demultiplexer, a
time-averaged DC I-V characteristic is measured. Scanning a current on the input In 0, the dc
voltage response on the In 0 and OutA 0 terminals can be monitored. Fig. 27 (right)
shows the I-V curve from this experiment. For convenience, the voltage on the OutA 0
terminal was multiplied by two. As seen, the traces coincide up to 1.9 mV, which means
the maximum working frequency is 95 Gb/s, according to the Josephson
frequency-to-voltage relation. Of course, this is not a complete test for such a circuit, but it does give a
good estimate of the performance limit of the demultiplexor.
A schematic of an asynchronous dual-rail 2:1 multiplexor is shown in Fig. 26 (right).
This cell is also based on the B flip-flop, with output terminals serving as inputs and vice
versa. The input inductances serve as buffers for incoming data. In the initial state,
junctions J1 and J8 are not biased, causing data arriving at input B to be stored in the L2
or L4 buffer and to wait for data arrival on input A. Meanwhile, junctions J4 and J12 are
biased, which allows data (either "0" or "1") arriving at input A to pass through the
inductor L3 (or L5). The multiplexor internal state toggles with a delay provided by shunt
resistor R, allowing the data captured in input buffers to be released. The multiplexor
alternates between the two incoming data streams, doubling the output data rate. If
desired, these multiplexor cells can be connected into a tree comprising a 2^N:1 multiplexor
with no data transfer rate reduction. This scheme also enables parallel-to-serial
conversion of dual-rail data at extremely high frequencies. Simulations predict good
operation up to data rates of 60 Gb/s.
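
At the logical level, the toggling demultiplexer and multiplexer simply alternate between two streams. The sketch below models that alternation for single-rail bit lists; the function names are hypothetical, the choice of which output receives the first bit is an assumption, and dual-rail encoding and timing are not represented.

```python
# Logical sketch of toggle-based 1:2 demultiplexing and 2:1 multiplexing.
# Names are hypothetical; dual-rail signaling and delays are not modeled.


def demux_1to2(bits: list) -> tuple:
    """Send successive input bits alternately to output A and output B."""
    out_a, out_b = [], []
    to_a = True                      # internal state, toggled by every input bit
    for bit in bits:
        (out_a if to_a else out_b).append(bit)
        to_a = not to_a
    return out_a, out_b


def mux_2to1(in_a: list, in_b: list) -> list:
    """Interleave two equal-rate streams into one stream at twice the rate."""
    merged = []
    for a_bit, b_bit in zip(in_a, in_b):
        merged.extend((a_bit, b_bit))
    return merged


pattern = [0, 0, 0, 1, 1, 0, 1, 1]   # the test pattern quoted for Fig. 27
stream_a, stream_b = demux_1to2(pattern)
round_trip = mux_2to1(stream_a, stream_b)
```

Feeding the two demultiplexed streams back through the multiplexer restores the original pattern, and cascading such stages gives the 2^N-way trees mentioned in the text.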

Fig. 28 1:8 demux operating at 20 Gb/s.


For synchronous operation, another shift-and-dump (serial-to-parallel converter)
architecture can be used for demultiplexing. Fig. 28 shows a fabricated 1:8 demux circuit
of this type. For eight clock cycles, a serial bit is loaded into the demux cell; the ninth
clock dumps these bits onto eight separate output lines, clears the internal demux
registers, and begins a repeat of the process. The 8:1 decimated clock is also made
available at the output in order to synchronize the data flow between cryogenic and room
temperature circuits. Fig. 28 (right) shows a BIST test (as described in section 3.3.3) for a
serial data stream (bottom trace) that is sent to the demux cell at 20 Gb/s and correctly
distributed among eight parallel output lines (topmost traces), reducing the data
rate to 2.5 Gb/s per channel.
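The shift-and-dump principle can be sketched behaviourally as follows. This is a minimal model, not the fabricated circuit: it assumes an ideal register that dumps and clears as soon as it holds eight bits, and the function name `shift_and_dump` is hypothetical.

```python
def shift_and_dump(serial_bits, width=8):
    """Ideal 1:width serial-to-parallel converter.

    Returns the list of dumped words and the resulting per-channel streams.
    """
    register = []
    words = []
    for bit in serial_bits:
        register.append(bit)            # shift one serial bit in per clock
        if len(register) == width:      # register full: dump, then clear
            words.append(tuple(register))
            register = []
    # Bits left in a partially filled register are never dumped.
    channels = [[word[i] for word in words] for i in range(width)]
    return words, channels


serial_rate_gbps = 20.0
per_channel_rate_gbps = serial_rate_gbps / 8

words, channels = shift_and_dump(list(range(16)))
print(words)
print(channels[0], per_channel_rate_gbps)
```

Each output line carries every eighth input bit, which is why a 20 Gb/s serial stream appears as eight streams of 2.5 Gb/s.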

4.2 Digital output drivers


For electrical connection outside RSFQ chips, the digital data, in the form of SFQ pulses,
must be translated into voltage swings suitable for processing by standard semiconductor
circuits. The SFQ/dc converter cell10 is one way of starting this process; however, this
method only provides a ~250 µV NRZ voltage swing for each true bit at the chip pad.
So-called "high-voltage driver" (HVD) blocks (similar to the type shown in Fig. 30) have
therefore been developed to convert RSFQ data/clock signals into ~2.5 mV NRZ voltage
swings at the chip pad.101,102,103

Fig. 29. (left) Output of 2.5 kA/cm2 asynchronous voltage driver circuit for 1 Gb/s, 4 Gb/s and 8
Gb/s input patterns and (right) output eye diagram of (a) 1 kA/cm2 and (b) 2.5 kA/cm2 driver
circuits for 4 Gb/s PRBS input. (Photos courtesy of Conductus, Inc.)
An HVD output driver fabricated with a critical current density of Jc = 2.5 kA/cm2 can
operate well up to 8 Gb/s, with rise and fall times of about 100 ps. The outputs of the
high-voltage driver circuit for input pulse patterns at 1 Gb/s, 4 Gb/s and 8 Gb/s are shown
in Fig. 29. Note that the 2.2 mV output amplitude at 8 Gb/s is less than the 3 mV
amplitude measured at 1 Gb/s. A large part of the decreased output amplitude at higher
frequencies is due to increasing loss in the measurement probe. The output eye diagrams
for a 4 Gb/s pseudo-random binary sequence (PRBS) pattern input are also shown in
Fig. 29. Here, the performance of the circuit is gauged by the size of the eye opening,
which improves with larger SNR, smaller phase noise, and smaller rise and fall times.
The eye opening represents the region, in voltage and time, of error-free operation.
Clearly, the 2.5 kA/cm2 circuit has a larger eye opening, primarily due to faster rise and
fall times, which are about 50% of those for the 1 kA/cm2 circuit.

Fig. 30 Mask layout of an RSFQ high-voltage driver (HVD) for amplifying NRZ output voltage
swings for measurement off-chip. SFQ pulses are amplified by the input JTL on the right and used
to trigger a voltage across the SQUID stack in the center for output across the resistor at the left.

4.3 Electrical and optical signal conversion


Although RSFQ circuits are capable of internal operation at tens of GHz, their application
is limited by the output drive capability. For data communication applications that require
high-speed outputs, such as switches and transmitters, the usable bandwidth of an RSFQ
circuit is limited by the output interface. The RSFQ signal must be amplified on chip at
low temperature to a level large enough to be sensed, with a low error rate, by low-noise,
wideband semiconductor amplifiers. Once amplified, the signal can be used,
for example, to drive a laser diode or optical modulator for fiber-optic communication
links. Approaches under investigation are introduced in the following sections.
4.3.1 Conversion between RSFQ and Emitter-coupled logic (ECL) signals
In order to be compatible with standard off-the-shelf data acquisition systems, most
RSFQ output interfaces rely on conversion to ECL representation of ones and zeros.104,105
Fig. 31 (left) shows SFQ-to-ECL interface boards capable of handling up to 16 channels
of superconductor data at rates up to 1 GS/s (see Fig. 32).

Fig. 31 (left) ECL interface in a VME cage, consisting of one clock board and four 4-channel data
receiver boards; (right) On-site interface of an RSFQ ADC and a semiconductor DRFM in a VME
cage.
The design consists of amplifiers, followed by a discriminator circuit and level shifters
(DC restore) to provide standard ECL data waveforms. Whereas previous boards utilized
differential outputs, the 1 GS/s modules operate with single-ended clock and data signals
from the chip. Boards have been fabricated and operated in both VME and VXI standards to
allow compatibility with various existing high-performance electronics systems such as
the DRFM in Fig. 31 (right).

Fig. 32 Output-board amplifier test showing a pseudo-random bit sequence of 3 mV data at 1 Gb/s
to simulate the output signals from an RSFQ chip (left) and the eye-diagram of the 1 V amplified
digital stream ready for interface to room-temperature electronics (right).

4.3.2 Conversion between optical signals and RSFQ


Interfacing to low-temperature RSFQ devices presents a number of challenges, because
the traditional method of connecting to high-frequency (multi-GHz) devices (co-axial
cables) introduces a significant heat load into the system. The large physical size of such
cables may also limit the total number that may be fit into a given location. If the co-axial
cable is more than a few meters long, substantial attenuation due to skin effect losses at
higher frequencies may become a problem as well. In comparison, optical fibers have low
thermal conductivity, are immune to electromagnetic interference (EMI), are physically
smaller and lighter than standard cable, and are not subject to cross talk through ground
loops. For these reasons, several groups have been investigating techniques of adapting
optical fiber technology to connect both the input and output signals to/from
superconductor and RSFQ circuits.106,107
Optical-to-Electrical Conversion - The coupling of optical signals (either analog or
digital) into electrical stimulus for an RSFQ circuit can be accomplished using either PIN
diodes or MSM diodes. MSM diodes are easily fabricated in a standard Nb
superconductor IC fabrication process, since Si substrates are typically used as mechanical
supports for the thin-film Josephson circuits. Alternatively, PIN diodes (made from
InGaAs or GaAs) can be used either in bare chip form, or completely packaged with fiber
optic connectors. InGaAs photodiodes have been operated at up to 4 GHz at 4.2 K and Si
MSM diodes have been run up to 6 GHz at these temperatures. Beneficially, the
performance of these devices increases as the operating temperature is reduced. In either
case, care must be taken in effecting the transition from the optical device to co-planar
waveguide in order to make the electrical connection to the RSFQ circuit. In the case of
off-the-shelf photodiodes, their packaging capacitance will limit the overall speed of the
system.

Fig. 33 (left) Schematic of a typical optically-coupled RSFQ system. (Photo courtesy Conductus,
Inc.) (right) Low-speed RSFQ shift register (of the type described in section 3.2.2) operating with
both fiber-optic input and fiber-optic output. An integrated MSM diode is used at the input (O/E)
and a laser diode is modulated by the circuit's amplified voltage output (E/O).
Electrical-to-Optical Conversion - To optically couple the RSFQ output signal from
the cryogenic environment to room temperature, several methods have been studied.
These include: the use of electro-optic crystals (such as Lithium Niobate or Lithium
Tantalate)108,109 to phase modulate an incoming light beam, Mach-Zehnder modulators, or
the direct modulation of laser diodes or light emitting diodes. Laser diodes are generally
preferred to light emitting diodes on the basis of their electrical-optical slope efficiency;
however, off-the-shelf laser diodes often show signs of carrier freeze-out when cooled to
temperatures near 5 K.
4.4 Multi-chip modules and cryopackages
Whereas many semiconductor chips are bonded into individual packages suitable for
placement on a PCB, RSFQ systems can be thought of as being interconnected with each
other on a superconducting multi-chip module that resides in a "cryopackage". It is this
cryopackage that a system-user mounts on a small refrigerator in order to apply cooling
and power, and make all I/O connections.
4.4.1 Multi-chip modules
Building on work done in the late 1970s at IBM110, the idea of a superconducting
multi-chip module (MCM) was revived in the 1980s in Japan111,112,113 and independently at
TRW114,115 in the 1990s. Today, these processes have matured and are being refined to
yield even smaller inductances in the contacts.116,117 In fact, the bandwidth of 25-µm
bump bonds is expected to exceed 100 GHz. Different substrate-conductor combinations
have also been explored, including Nb-on-Si, Cu-on-Si, and Cu-on-Ceramic, but Nb-on-Si
remains most popular because of the ability to fabricate the MCM "carriers" in the same
process as the Nb chips themselves. The Cu-on-Si and Cu-on-Ceramic combinations, while
offering low resistance at 5 K, are not superconductive.
Because of the data speeds involved, it is impractical to consider transferring bits between
RSFQ chips via co-axial cables in different parts of the same system. Instead, a
superconductor (Nb-on-Si) MCM can be thought of as a dual of the PCB for
semiconductor ICs; however, the superconductor MCM maintains the lossless,
dispersionless transmission of the data between chips. Further, the Nb-on-Si arrangement
offers the ability of creating active circuitry in the MCM itself. This approach allows the
distribution of global clocks throughout the system, such as the LJJ oscillator and PLL in
section 3.1, and enables the possibility of both synchronous and asynchronous system
operation.
Fig. 34 shows the low-temperature solder-bump technique for attaching RSFQ chips
onto MCM carriers. Typical bumps are 100 µm round pads comprising ~35 nm of Ti as
an adhesion/barrier layer, 400 nm of Pd as the solder-contact layer, and 50 nm of Au to
prevent oxidation of the Pd. The solder is InSn, which reflows at 118°C. After reflow, the
bumps collapse to a height of about 5 µm, yielding an inductance of ~10 pH for the
contact. Fig. 35 shows the appearance of a full MCM (left) with two chips on a large
carrier and a close-up of arrays of test bumps (right).
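To give a feeling for what a contact inductance of about 10 pH means at the frequencies of interest, the series reactance X = 2πfL can be evaluated directly. The sketch below assumes the bump behaves as an ideal lumped series inductor, which ignores pad capacitance and the ground-return geometry; the function name and the chosen frequencies are illustrative only.

```python
import math


def inductive_reactance_ohm(inductance_h, frequency_hz):
    """Reactance of an ideal inductor, X = 2 * pi * f * L, in ohms."""
    return 2 * math.pi * frequency_hz * inductance_h


BUMP_INDUCTANCE_H = 10e-12   # ~10 pH per contact, as quoted in the text

for f_ghz in (10, 50, 100):
    x_ohm = inductive_reactance_ohm(BUMP_INDUCTANCE_H, f_ghz * 1e9)
    print(f"{f_ghz:>3} GHz: {x_ohm:.2f} ohm")
```

Under this simple model the reactance stays at a few ohms even at 100 GHz, which is small compared with typical transmission-line impedances.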

[Fig. 34 drawing labels: chip or substrate; In-Sn solder; metal adhesion layer.]

Fig. 34. MCM bump-bonding / solder re-flow process of TRW for superconductor ICs.

Fig. 35. (left) Flip-chip die attach type superconductor MCM with 2 chips on a carrier. (Photo
courtesy J. Spargo, TRW) (right) InSn solder bumps scanning electron micrograph. (Photo courtesy
A. Smith, TRW)
Essential for an effective design of chip-to-chip transmitter and receiver circuits is an
accurate understanding of the intervening electrical pathway. Propagation of signals on-
chip is well characterized up to the off-chip transition. Along the surface of a
superconductor chip, signals follow a nearly transverse electromagnetic (TEM) mode
along microstrip wiring. Currents along the wiring structures are mirrored by return
currents confined below the wiring. Dispersion is low, even to hundreds of GHz. In
contrast, the transition off-chip requires separate, widely separated connections for
signals and ground returns. Parasitic reactances in the transition greatly affect the
dispersion of transiting signals. Effective bandwidths of the transitions typically range up
to tens of GHz. One method of reducing the connection parasitics is to keep the physical
distances of the non-microstrip pathways to an absolute minimum. The flip-chip die
attach geometry excels over wire-bonding and other die attach schemes in this regard.
Although most superconductor MCM digital signal transfer schemes involve the
boosting of the RSFQ signal before transmission and then recovering the SFQ pulse at
the receiver, experiments have shown the direct transfer of SFQ pulses between chips on
an MCM using very low-inductance bumps.118
4.4.2 Cryopackages
Whether using a single-chip or MCM configuration, the RSFQ circuitry must be mounted
in a cryopackage before mounting it on the cooling surface. In the last 10 years, much has
been learned about this art.119,120 For reliable operation, RSFQ circuits require ambient
magnetic fields of less than ~1 mG at the surface of the chip. Typically, cryopackages
employ double-walled "high-µ metal" shields to provide this screening, along with a
vacuum-insulated heat radiation shield. This configuration can result in a system that will
reject external magnetic fields up to ~10 G and provide good thermal isolation from
higher temperatures. Since the operation of RSFQ logic itself depends on magnetic fields, it
is possible that, as the chip is cooled below the superconducting critical temperature,
stray magnetic flux quanta could become trapped (i.e. Abrikosov vortices24) within a
closed superconducting region in the chip ground plane. A package "defluxing" heater
provides a means of getting rid of any trapped flux by raising the temperature of the chip
above Tc for a short time. Ideally, a self-check procedure should monitor this effect at
cool-down time and deflux the RSFQ circuits as a part of normal diagnostics.121
Fig. 36 shows a demo122 cryopackage on a cryocooler produced during an ATP
program on network switching for the US Department of Commerce.123 This package
required a mounting stage at 4-5 K with a maximum power dissipation of 300 mW and a
temperature stability of < 100 mK. The minimum surface area was 3.25 in. in diameter.
This package maintained an ambient magnetic field < 0.5 mG, with clearance to
accommodate 12 co-axial cables of 0.085 in. OD. The second stage lifted a maximum power of
10-20 W with a minimum surface area of 20 in2 and a temperature stability of 1 K.
Primary concerns in the design were magnetic and electrical noise.

[Fig. 36 schematic labels: magnetic shield (4.5 K); radiation shield (80 K); magnetic shield (295 K); vacuum can (295 K).]

Fig. 36 (left) ATP program HYPRES/Boreas cryopackage/cryocooler showing all three CCR
stages. The digital network switch system tested up to 4 Gb/s I/O. (right) Schematic of the CCR
configuration showing the multiple temperature stages and shielding. (Photos courtesy of
Conductus, Inc.)
Fig. 37(left) shows another cryopackage, designed specifically for both electrical and
optical connections. This unit has been used with the high-resolution ADC described in
section 2.1.1. The full ADC parallel output word was first serialized using a "shift and
dump" type multiplexor on-chip (i.e. parallel-to-serial converter) and then used to
modulate a serial data stream onto an optical fiber for transfer to room temperature
electronics.124,125 Magnetic and thermal shields are shown also. Fig. 37(right) shows a
similar package with 6 high-speed (>10 GHz) SMA connectors for high-speed electrical
I/O. In either case, the package would be heat sunk to a cold finger on a CCR or
alternatively could simply be immersion-cooled in liquid helium for development and
reliability-testing.
Some applications, such as the DAC described in section 2.2.1, can benefit from the
use of multiple chips, but do not necessarily require the full performance of an MCM
process. The package in Fig. 38 accommodates up to 20 RSFQ chips of 0.5 cm2 each with
relatively low-speed (500 MHz) interconnects. Triple high-µ metal shielding and flexible
ribbon cable connectors make this design both robust and simple.

Fig. 37 (left) A cryopackage fitted for optical output with 3 high-speed lines and 18 low-speed lines
and a mounting fixture for an output laser diode, (right) A similar cryopackage with 6 high-speed
lines and 18 low-speed lines with dimensions: 5.75 in. x 2.1 in. X 1.1 in.

Fig. 38 Low-speed multi-chip package being developed for a Josephson AC voltage standard
(DAC).

5. RSFQ Digital Signal Processing Applications


Many applications for LTS circuits have been introduced and studied.126,127 Some
emphasizing the digital aspect of RSFQ have reached quite mature stages of
development.128 Recently, substantial funding has been made available for investigation
of petaFLOPS-level computing. This application has been detailed in several places,129,130
and is therefore not covered here. Other possible applications and their cryocooler
requirements are reviewed in the following sections.
5.1 Application examples

5.1.1 Pseudo-random binary sequence generators


For spread-spectrum communications protocols, it is necessary for the receiver to
synchronize its internal code generator with the incoming signal before receiving data.
The speed of RSFQ can allow this synchronization to take place in only a few clock
cycles, eliminating the need to send sync signals back and forth between transmitter and
receiver by providing pseudo-random binary sequence generators.131 Northrop Grumman
has made good progress on this application.132,133,134
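For readers unfamiliar with such generators, a pseudo-random binary sequence is conventionally produced by a linear feedback shift register (LFSR). The sketch below is a generic software illustration, not the cited RSFQ design: it uses the common PRBS7 polynomial x^7 + x^6 + 1, which is an assumption made here for brevity, and the function name `prbs7` is hypothetical.

```python
def prbs7(seed=0x02, nbits=127):
    """Generate nbits of a PRBS7 sequence (polynomial x^7 + x^6 + 1)."""
    if not 0 < seed < 128:
        raise ValueError("seed must be a non-zero 7-bit value")
    state = seed
    out = []
    for _ in range(nbits):
        # Feedback bit is the XOR of the two most significant register bits.
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        # Shift left by one, insert the feedback bit, keep 7 bits.
        state = ((state << 1) | new_bit) & 0x7F
        out.append(new_bit)
    return out


sequence = prbs7(nbits=254)
print(sequence[:16])
```

A receiver running an identical register reproduces the same sequence once its state matches the transmitter's, which is the synchronization step described above.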
5.1.2 Network switches
The transmission capacity of a single mode optical fiber is more than 1 Tb/s and the
speed of RSFQ is well matched to the bandwidth of optical fiber. Although photons (with
their low energy) are a good match for sending signals, they interact only weakly, making
it difficult to perform fully-optical switching. The optical I/O capability of RSFQ (see
section 4.3.2) creates the possibility of constructing a full optical network switch.123 A 2-
node SFQ crossbar switch has been demonstrated with both data streams over 16 Gb/s at
Northrop Grumman, while a 16x16 design was demonstrated at 3 Gb/s by TRW.135 The
ATP program discussed in section 4.4.2 is another good example. Unlike purely optical
switching fabrics, RSFQ data switches can be rapidly reconfigured. For instance, for the
Asynchronous Transfer Mode (ATM) packet length of 53 bytes (i.e. 424 bits), it is
necessary for an OC-192 data switch to configure the crosspoints every 42 ns.136 While
optical crosspoints are not fast enough for this, it is well within the capability of today's
RSFQ circuits.137 In fact, one analysis has shown that a 128x128 channel self-routing
Batcher-Banyan switching core implemented in a 0.8-µm RSFQ technology could
provide throughput close to 100 Gbit/s per channel, dissipate about 10 mW of power, and
fit on a single 1-cm2 die.138
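The 42 ns reconfiguration interval quoted above follows from the cell length and the line rate. The short calculation below reproduces it, assuming the nominal SONET OC-192 line rate of 9.95328 Gb/s (192 × 51.84 Mb/s); the constant names are illustrative.

```python
ATM_CELL_BYTES = 53
OC192_LINE_RATE_BPS = 192 * 51.84e6   # nominal OC-192 line rate (assumed)

cell_bits = ATM_CELL_BYTES * 8
cell_time_ns = cell_bits / OC192_LINE_RATE_BPS * 1e9

print(f"{cell_bits} bits per cell, one cell every {cell_time_ns:.1f} ns")
```

The crosspoints must therefore be reconfigurable within one cell time of roughly 42 to 43 ns.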
5.1.3 Software Defined Radio
Considerable interest has arisen in the idea of enabling software-defined radios (SDR)139
with RSFQ technology.140 The SDR concept relies on the digitization of communications
waveforms as close to the antenna as possible, with subsequent hardware and software
DSP to sort out the signals. For semiconductor ADCs, one or two stages of down-
conversion and filtering are typically needed in order to bandlimit a single channel and
move it to the baseband before it can be handled digitally. Using one or a combination of
the approaches discussed in section 2.1, however, an entire band or several bands could
be digitized at once, skipping one or all the mixing stages, while preserving signal
fidelity. The ability to feed the digitized waveforms directly into an RSFQ digital pre-
filter before handing off baseband signals to the control hardware might also enable
dynamically programmable or software reconfigurable systems in which a single
platform can simultaneously satisfy the requirements of communications, radar, and
electronic warfare applications.
5.1.4 Digital RF Memory
Militaries are moving toward digital multi-function systems (as a replacement for
dedicated RF systems) in order to reduce cost and increase flexibility. A digital RF
Memory (DRFM) is an Electronic Counter-measure (ECM) system designed to digitize
wideband waveforms, store them in a cache memory, and apply time-delays and
frequency-shifts to the data before uploading them to a direct digital synthesizer (DDS)
for retransmission. The purpose of this is to put up false or distorted signals that will be
misinterpreted by enemy receivers. Naturally, in order to appear realistic, the linearity of
all DRFM components should be greater than the accuracy of the targeted receivers.
Further, there is a requirement that all processing take place in near-realtime. RSFQ
technology has demonstrated all the necessary components, including ADC, DAC, RAM,
DSP, etc, for this application.

[Fig. 39 diagram labels: superconductive ADC; gallium arsenide DAC; input and reconstructed output (29 MHz pulse rate); system demonstrated up to 160 MS/s.]

Fig. 39. Digital RF memory test at KOR Electronics (San Diego, CA) showing how an existing
system can be retrofitted with an RSFQ chip without loss of performance.
As a first step toward the realization of an all-RSFQ DRFM, HYPRES (Elmsford,
NY) has worked with KOR Electronics (San Diego, CA) to take a traditional DRFM and
replace only the ADC portion with the RSFQ ADC described in section 2.1.1. Fig. 39
shows the measurement setup for such tests. As input, a 500 ps wide pulse was applied to
the ADC at a rate of 29 MHz. The digitized waveform was then passed directly from the
superconductor ADC to a fast semiconductor memory and immediately uploaded to a
GaAs DAC. Power spectra before ADC digitization and after conversion back to analog
are shown. Performance was limited by the DAC fidelity. Although rigorous
measurements are not yet complete, this demonstration proves the validity of replacing a
single system section with an RSFQ component.
5.1.5 Time-to-digital converters
Another area where RSFQ circuits are particularly well suited is for the instrumentation
of high-energy physics (HEP) experiments.141 In such applications, LHe is typically used
to cool superconducting magnets, creating a natural place for cooling the circuits.
Because RSFQ circuits are inherently radiation-hard142,143, they perform well in
high-radiation HEP environments. Further, such circuits offer better sensitivity for the
measurement of weak detector signals and can reduce SNR degradation by performing
the digitization and multiplexing of many detector signals before the transition to room
temperature. A Time-to-Digital Converter (TDC) is one such instrument that can be made
using RSFQ technology. A TDC is essentially an electronic stopwatch, able to determine
the absolute and relative time difference between events (or "hits") and report them as
digital numbers.

Fig. 40. (left) 8-channel TDC chip and (right) block diagram of a single channel.
Fig. 40 shows the layout of an 8-channel TDC chip (left) and block diagram of a
single-channel (right). Each TDC channel consists of a 14-bit RSFQ counter based on T
flip-flops with destructive readouts, a 9-hit (i.e. 9-word) shift register-based FIFO
memory, and a parallel-to-serial converter with output driver. To facilitate both external
user control and data output from the TDC, a complete VXI-bus user interface with
LabVIEW control software has also been constructed. A detailed design of the RSFQ
multi-hit TDC circuit is described in Refs. 144 and 145. Interestingly, the binary counter
is the only component required to operate at the maximum GHz rate. The FIFO buffer is
used to store multiple time stamps associated with different input hits and the parallel-to-
serial converter provides a serial mode of data readout. An extra "valid bit" register has
even been added to provide tagging capabilities in order to distinguish between valid
information and time references, blank time stamps, etc. As shown by the measurement
in Fig. 41 (left), the applied interval between the first and the second hits is 0.2000 µs and
the clock frequency is 33 GHz. The 13-bit binary data output is (1100110111111), which
is 6591 in decimal representation. Thus, 6591 × (1/33 GHz) = 0.1997 µs ≈ 0.2 µs ± 2.5 ps,
as dictated by the design. Further applications are discussed in Ref. 146. Fig. 41 (right)
shows a prototype control station for the TDC instrument currently under development by
HYPRES (Elmsford, NY) for the US Department of Energy's Fermilab (Batavia, IL).
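The conversion from the counter output to a time interval in the measurement above is a single multiplication, reproduced in the sketch below. The function name `decode_interval` is hypothetical, and the code assumes the output word is written most significant bit first, as the decimal value 6591 quoted in the text implies.

```python
def decode_interval(code_bits, clock_hz):
    """Convert a binary counter reading into (count, interval in seconds)."""
    count = int(code_bits, 2)          # MSB-first binary string to integer
    return count, count / clock_hz     # one count per clock period


count, interval_s = decode_interval("1100110111111", 33e9)
print(count, f"{interval_s * 1e6:.4f} us")
```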

Fig. 41 (left) High-speed testing of the digital (coarse) part of the TDC showing 2.5 ps resolution
and (right) TDC user interface showing a VXI crate with interface cards and controlling PC with
LabVIEW acquisition, control, and display software.
5.1.6 Digital signal autocorrelators
Signal autocorrelators perform a correlation function on a spectrum to detect the presence
of periodic signals, even when the signal strength is very weak. This is based on the
principle that random noise (non-periodic) averages down and signals (periodic) are
brought out. Unfortunately, typical semiconductor-based systems can take milliseconds to
seconds to reveal the presence of some covert communications waveforms or to receive
spread-spectrum communications signals in areas with significant clutter. RSFQ
autocorrelators offer two important advantages over other technologies. First, the
hardware complexity of a broadband digital correlator decreases rapidly with clock
speed. From this consideration, RSFQ digital correlators running at 20 GHz clock speeds
definitely outperform the existing analog correlators when the level of ~100 stages (or
"lags") is reached. With ~1000 lags they will beat the digital as well as the (very complex
and costly) analog-digital systems. Another advantage of an RSFQ correlator arises in
space-borne applications where reduction in power dissipation could be of crucial
importance. Power dissipated by an RSFQ correlator can be reduced to below 1 µW (at
4.2 K) per channel for clock speeds up to 20 GHz. Even with a very inefficient
cryocooler (say 0.1%) this translates into only 1 mW per channel at room temperature
and is two orders of magnitude better than semiconductor correlators.
The basic correlation process consists of a standard sum of products calculation.
Given two data vectors, the corresponding elements from each vector are multiplied and
then all the products are summed to a single result. One of the vectors is then shifted by a
single element and the multiply and sum process is repeated to produce the next result
data point. This process is illustrated in Fig. 42(right). A 16-channel, 4 GHz bandwidth,
double oversampling (16 GHz clock), RSFQ autocorrelator of this type has been
demonstrated.144,147 The active area of the correlator IC is shown in Fig. 42(left).
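The shift, multiply, and sum procedure just described can be written out in a few lines. The sketch below is a software illustration only. The first function is the plain sum of products; the second assumes 1-bit samples and replaces each multiplication by an XOR, a simplification chosen here for illustration and not taken from the chip's documented arithmetic. Both function names are hypothetical.

```python
def autocorrelate(samples, num_lags):
    """Sum-of-products autocorrelation for lags 0 .. num_lags - 1."""
    n = len(samples)
    result = []
    for lag in range(num_lags):
        acc = 0
        for i in range(n - lag):
            acc += samples[i] * samples[i + lag]   # multiply, then accumulate
        result.append(acc)
    return result


def autocorrelate_1bit(bits, num_lags):
    """Same result for 1-bit data (bit 1 -> +1, bit 0 -> -1) using XOR."""
    n = len(bits)
    result = []
    for lag in range(num_lags):
        mismatches = sum(bits[i] ^ bits[i + lag] for i in range(n - lag))
        # Each match contributes +1 and each mismatch contributes -1.
        result.append((n - lag) - 2 * mismatches)
    return result


bits = [1, 1, 0, 0] * 4                  # periodic 1-bit signal, period 4
samples = [2 * b - 1 for b in bits]      # map bits to +1 / -1
print(autocorrelate(samples, 5))
print(autocorrelate_1bit(bits, 5))
```

The periodic input produces large positive values at lags equal to a multiple of its period and large negative values half a period away, which is how a periodic signal is brought out.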

Fig. 42 (left) Mask layout for autocorrelator. (right) Block diagram of autocorrelator, with blocks
labeled 1-bit comparator, shift register, and bitwise XOR.

5.1.7 Fast Fourier Transform Engines


For many digital signal processing applications, especially in SIGINT (signal intercept)
and Comms (communications), a transform into the frequency domain allows the use of
significantly faster algorithms for discriminating, selecting, and identifying waveforms.
Unfortunately, the calculation of the Fourier transform of a signal (even with the Fast
Fourier Transform [FFT] method) is very compute-intensive, and grows more so as finer
frequency resolution is desired. Fig. 43 shows an RSFQ Decimation-in-Time (DIT)
Radix-2 Butterfly integrated circuit, which is the basic cell needed to implement the
32-point Fast Fourier Transform (FFT) in a parallel data flow architecture.97 The radix-2
butterfly circuit uses serial RSFQ math and consists of four single bit-wide serial
multipliers and eight carry-save serial adders of the type outlined in section 3.3.3. The
circuit with 16-bit word-length employs only 3400 JJs, occupies an area of 3 J mm x 2.0
mm, and dissipates less than 1.1 mW power.

Fig. 43. (left) 2 Radix-2 16-bit multipliers on a 0.3 cm2 chip, (right) Full operation of a radix-2
butterfly with 5-bit word length. Outputs are RTZ with LSBs first. Inputs are WIm = (11111), YIm =
(01110), WRe = (11011), YRe = (10101), Xne « -Xi. = (1011000000), X ta - -XRe - (0101000000). Note,
no subtracters are used in the circuit.

A DIT radix-2 butterfly operation requires one complex (Re & Im) multiply and two
complex additions. A purely real implementation requires four real multipliers and six
real adders to compute:
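\[
\begin{aligned}
X'_{Re} &= X_{Re} + (Y_{Re} W_{Re} - Y_{Im} W_{Im}), & X'_{Im} &= X_{Im} + (Y_{Im} W_{Re} + Y_{Re} W_{Im}),\\
Y'_{Re} &= X_{Re} - (Y_{Re} W_{Re} - Y_{Im} W_{Im}), & Y'_{Im} &= X_{Im} - (Y_{Im} W_{Re} + Y_{Re} W_{Im}).
\end{aligned}
\tag{3}
\]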
Fig. 43(right) shows a demonstration of full functional operation of the radix-2 butterfly
chip with a 5-bit word length. To multiply, N x 1-bit serial multipliers are used. To
add/subtract, 1-bit carry-save serial adders (CSSA) are used.
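The operation count can be checked with a direct software transcription of the butterfly, taking the outputs in the standard radix-2 DIT form X' = X + W·Y and Y' = X − W·Y. The sketch below works on ordinary integers rather than bit-serial words, so it says nothing about the serial multipliers or carry-save adders themselves; the function name `butterfly_real` is hypothetical.

```python
def butterfly_real(x_re, x_im, y_re, y_im, w_re, w_im):
    """Radix-2 DIT butterfly using only real arithmetic.

    Returns (X'_Re, X'_Im, Y'_Re, Y'_Im) for X' = X + W*Y and Y' = X - W*Y.
    """
    # Four real multiplications form the partial products of W*Y.
    p1 = y_re * w_re
    p2 = y_im * w_im
    p3 = y_im * w_re
    p4 = y_re * w_im
    # Two real additions combine them into the complex product.
    t_re = p1 - p2
    t_im = p3 + p4
    # Four more real additions produce the two complex outputs.
    return (x_re + t_re, x_im + t_im, x_re - t_re, x_im - t_im)


# Cross-check against Python's built-in complex arithmetic.
x, y, w = complex(3, 4), complex(1, -2), complex(2, 1)
outputs = butterfly_real(x.real, x.imag, y.real, y.imag, w.real, w.imag)
print(outputs, x + w * y, x - w * y)
```

The function body contains four multiplications and six additions or subtractions, matching the count given for a purely real implementation.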
5.2 Cryogenic refrigerator requirements
RSFQ circuitry must be cooled for operation. The temperature of operation is normally
selected to be ~½ of the material's superconducting transition "critical temperature" (Tc).
For operating temperatures below ½ Tc, superconducting parameters are not strongly
sensitive to small variations in temperature. Above ½ Tc, Josephson junction device
operation becomes very sensitive to thermal fluctuations, resulting in greatly reduced
parameter operating margins.148 For Nb (Tc = 9.23 K), operating at the boiling point of
LHe at one atmosphere (4.2 K) meets this requirement. A closed-cycle refrigerator with
a 5 K final stage temperature is also a suitable platform.149 At these temperatures, RSFQ
circuits require ~10 to 100 mK stability.
Cooling of commercial RSFQ superconducting electronics requires low cost, closed
cycle refrigeration. Logistics inhibit the widespread use of LHe, except in a laboratory
setting. Space-based (satellite) applications of RSFQ require cryocoolers of the highest
reliability, lowest input power, and smallest weight and size, while marginal capital cost,
relative to that of the electronics subsystem, is the paramount concern for other
applications. A cost-reliability goal for commercial cryocoolers has been suggested at
$1K to $10K for a unit with a 2-10 year lifetime.150
Refrigeration power required for an RSFQ system will likely be a few watts at most
and the dominant heat load will be heat leaks by radiation and conduction from the
surrounding interface electronics, rather than the RSFQ chips themselves. Consequently,
the refrigeration power to go from 300 to 77 K is not much less (~30%) than to go from
room temperature to 5 K. However, input power requirements are about 600-2000 W for
the LTS RSFQ temperature range, for 0.5 W of refrigeration power, due to the Carnot
and actual thermal cycle efficiencies.
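The role of the Carnot limit in these input-power figures can be made explicit. The sketch below assumes a 300 K heat-rejection temperature and uses the ideal relation that lifting 1 W at a cold temperature Tc requires at least (Th − Tc)/Tc watts of input; the function names are hypothetical, and the resulting percentages are illustrative rather than measured values.

```python
def carnot_input_per_watt(t_cold_k, t_hot_k=300.0):
    """Ideal (Carnot) input power per watt of heat lifted at t_cold_k."""
    return (t_hot_k - t_cold_k) / t_cold_k


def fraction_of_carnot(input_w, lift_w, t_cold_k, t_hot_k=300.0):
    """Ratio of the ideal input power to the actual input power."""
    ideal_input_w = lift_w * carnot_input_per_watt(t_cold_k, t_hot_k)
    return ideal_input_w / input_w


print(f"ideal input per watt at 5 K:  {carnot_input_per_watt(5.0):.1f} W")
print(f"ideal input per watt at 77 K: {carnot_input_per_watt(77.0):.1f} W")
for input_w in (600.0, 2000.0):
    pct = 100 * fraction_of_carnot(input_w, 0.5, 5.0)
    print(f"{input_w:.0f} W input for 0.5 W lift: {pct:.1f}% of Carnot")
```

Under these assumptions the quoted 600-2000 W range corresponds to coolers running at a few percent of the Carnot efficiency.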
Ultimately, it is the thermal conductivity of the I/O and the active device dissipation
that determine the requirements of the cryocooler. For a typical RSFQ ADC system, a
refrigerator might be needed that can lift a minimum of 250 mW at 5 K; ideally 350
mW. This cooling budget consists of 100 mW for the RSFQ chip and 70 mW for the I/O
heat load from a 40 K stage to the 5 K stage. With a 50% to 100% design margin, this
results in 250 mW to 340 mW total power lift.
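The cooling budget above is a simple sum with a margin applied, as the following sketch shows. The constant and function names are illustrative.

```python
CHIP_DISSIPATION_MW = 100   # RSFQ chip
IO_HEAT_LOAD_MW = 70        # I/O heat leak from the 40 K stage to the 5 K stage


def required_lift_mw(margin_fraction):
    """Total heat lift needed at 5 K for a given design margin."""
    base_mw = CHIP_DISSIPATION_MW + IO_HEAT_LOAD_MW
    return base_mw * (1 + margin_fraction)


for margin in (0.5, 1.0):
    print(f"{margin:.0%} margin: {required_lift_mw(margin):.0f} mW")
```

The 50% case evaluates to 255 mW, which the text rounds to 250 mW.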

5.2.1 Product platforms


Most refrigerator systems have two separate mechanical components: a cold head and a
compressor. The cold head is where the RSFQ circuits are mounted. It can be compact
and dissipates only a small amount of power. The compressor contains a motor driven
compressor stage and (in some cases) oil particle traps, and dissipates almost all of the
required power. It is important to note that the power required by the compressor is for
operating a motor, and thus can be primary unregulated power. The fundamentals of CCR
operation can be found in Ref. 151.
Since extending the lifetime of existing cryocoolers directly impacts cost, and since
moving parts limit long-term reliability or lifetime, the most dramatic small cryocooler
improvements are in the form of pulse tube refrigerators with only one moving part.152
The pulse tube is second only to Stirling refrigerators in efficiency. Stirling coolers have
gained widespread acceptance for use in infrared sensor cooling. First in reliability and
second in current production today are Gifford-McMahon (GM) refrigerators, which are
used as cryopumps on vacuum systems. While these GM production units typically reach
12-15 K, high heat-capacity materials, such as Er3Ni, can be used instead of Pb in a
regenerative heat exchanger to yield 4-5 K operation, without resorting to the previously
required separate Joule-Thomson (JT) final stage.153 Such units are expandable to lift
several watts at the 5 K level by using a higher-power compressor.154
With regard to CCR reliability, MTBFs of 80,000 hours are achievable with GM and
JT cryocoolers, although some periodic maintenance may be required (e.g., oil adsorber
replacement) perhaps every 18-24 months. Theoretical lifetimes for free-piston Stirling
cryocoolers using gas bearings and/or flexure bearings or pulse-tube coolers are also of
this magnitude, but do not yet have operating histories to support such numbers.119 Fig.
44 illustrates the myriad of platforms which can accommodate a CCR-packaged RSFQ
system. Although not likely to be found on handheld devices any time soon, solutions
do exist for the majority of fixed-site and mobile platforms, with the possibility of
dismounted (portable) systems in the future.
The practicality of RSFQ-based applications is linked to the ability to package and
power such systems. Various options exist for active (closed-cycle refrigerator or CCR)
cryocooling to the necessary 4-5 Kelvin temperatures. Today's cooling options can be
loosely grouped into two classes: "Commercial" and "Developing". Examples are given
in the following sections.
RSFQ Technology: Circuits and Systems 351

Fig. 44 Platforms capable of fielding cryogenic electronics: ground installations, surface ships, submarines, airborne platforms, tractor-trailers, satellites, UAVs, fighters, and missiles.

5.2.2 Commercial Coolers


Near-term RSFQ systems can make use of commercially available cryocoolers (i.e., no
cooler development work is needed). Instead, work would focus on system integration
issues, input/output, and robustness.
The Leybold "CoolPower 4.2LAB": Fig. 45 (left) shows this compact (19 in. rack-mountable)
cooler, which provides a full 0.75 W at 5 K, drawing 2 kW at 220 V (single-phase)
input power. It is air-cooled and requires one standard maintenance every 24
months. This 2-stage Gifford-McMahon (GM) cycle CCR is used for the HYPRES
primary voltage standard product and is currently deployed, in the field, as a platform for
superconductor ICs. The <100 kg unit has not been optimized for size and might be
reduced in overall profile without undue impact on performance.

Fig. 45 (left) A 19 in. rack-mountable Leybold "4.2LAB" 2-stage Gifford-McMahon cryocooler
delivering 1/4 W of lift at 4 K for 2 kW input power; (right) A CTI single stage Gifford-McMahon
cryocooler delivering 60 W lift at 60 K with a no-load temperature of 45 K.
The CTI "Cryodyne" Refrigerator: Fig. 45 (right) shows a single stage GM cooler
based on the CTI M350 and 8F Cryopumps which delivers 6 W of lift at 60 K using 600
W from the wall.119 A second stage might be added to the cold-head to achieve the
required specifications, perhaps even with the same compressor. Many of these units are
currently deployed in the field as a platform for Conductus superconductor HTS filters
and have demonstrated excellent reliability statistics. At only 60 lbs, this cooler easily
mounts into a standard half-ATR rack and is gaining acceptance within the wireless base-station
community as a communications platform. Other 4-5 K commercial cryocoolers
are available from many different manufacturers including APD Cryogenics, Sumitomo,
Toshiba, Mitsubishi, etc. Improved high-capacity versions (suitable for ground base or
large shipboard installations) have recently entered the market as well.

5.2.3 Developing Coolers

The reduction in form-factor necessary to realize a single-person-portable or "backpack"-sized
system stems from currently demonstrated (but not yet commercial) technology.
There are several promising candidates for 4-5 K RSFQ cooling, although none has yet
been demonstrated with the necessary specifications.
The TRW 3503 Pulse-tube cooler:155 Fig. 46 (left) shows this ultra-rugged unit which
contains no moving parts at the cold-head and has been space-qualified. (Two units are
currently deployed in satellites.) The unit provides 0.3 W of lift at 35 K for 82 W into the
compressor with a 300 K sink temperature; weight = 12.1 kg, size = 341 mm x 200 mm x
498 mm. Additional study into the use of rare-earth regenerators and/or a JT expansion
stage in the design is the focus of work needed to explore the possibility of 5 K operation.
The clever use of reciprocating flexure bearings in the Stirling compressor allows for
very little net vibration in the unit, a good match for the requirements of satellite systems.

Fig. 46 (left) Space-qualified TRW 3503 Pulse-tube cryocooler with Stirling compressor reaches
35 K today; (right) Example of a full-custom "MMS 50-80K" class split-Stirling cooler for space
applications which reaches 4.5 K.
The Matra Marconi Space "MMS 50-80K" class of split-Stirling cryocoolers:156 Fig.
46(right) shows another custom unit designed for ultra-long life applications for the
European Space Agency (ESA). The 18 kg unit shown can be pushed down to provide
0.11 W of lift at 4.5 K from a compressor power budget of 145 W. Further optimization
has been suggested to increase the lift capacity; however, the cost-to-produce may remain
very high, even in large quantities.
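The four coolers described above can be put on a common footing by comparing input power per watt of lift and the fraction of Carnot efficiency each achieves. The sketch below uses the lift, temperature, and input-power figures from the body text; the 300 K rejection temperature is stated only for the TRW unit and is assumed here for the others, and the class and field names are illustrative.

```python
from dataclasses import dataclass

T_REJECT_K = 300.0  # assumed heat-rejection temperature for every unit


@dataclass(frozen=True)
class Cooler:
    name: str
    lift_w: float     # refrigeration power at the cold stage (W)
    t_cold_k: float   # cold-stage temperature (K)
    input_w: float    # compressor / wall-plug input power (W)

    @property
    def specific_power(self) -> float:
        """Input watts required per watt of lift."""
        return self.input_w / self.lift_w

    @property
    def fraction_of_carnot(self) -> float:
        """Ideal (Carnot) input power divided by actual input power."""
        ideal_w = self.lift_w * (T_REJECT_K - self.t_cold_k) / self.t_cold_k
        return ideal_w / self.input_w


# Figures as quoted in the body text for each unit.
COOLERS = [
    Cooler("Leybold 4.2LAB (2-stage GM)", 0.75, 5.0, 2000.0),
    Cooler("CTI single-stage GM", 6.0, 60.0, 600.0),
    Cooler("TRW 3503 pulse tube", 0.3, 35.0, 82.0),
    Cooler("MMS 50-80K split-Stirling", 0.11, 4.5, 145.0),
]

if __name__ == "__main__":
    for c in COOLERS:
        print(f"{c.name:30s} {c.specific_power:8.0f} W/W "
              f"{100.0 * c.fraction_of_carnot:5.1f}% of Carnot")
```

Specific power alone favors the warmer units, since lifting heat from 35-60 K is intrinsically cheaper than from 4-5 K; the fraction-of-Carnot column removes that temperature dependence and gives a fairer comparison of the machines themselves.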
The recent introduction of off-the-shelf 5 K CCRs, such as the Leybold unit in section
5.2.2, has already resulted in Nb superconductor IC-based products being brought to
market. The HYPRES Primary Josephson Voltage Standard (see Fig. 47), previously
available only in a LHe-dewar format, is now being sold as a fully self-contained unit
based on this CCR. Besides the long-term savings in cryogens for the user, the upgraded
unit now requires less maintenance. Further reductions in the size-weight-power profile
of CCRs will undoubtedly open up further markets as outlined in section 5.1. The lack of
application pull has slowed this development more than the technical challenges. Simply
put, if there is a market for cryogenic refrigeration, it could soon be available with the
same reliability and cost/performance as household refrigerators.

Fig. 47 Availability of off-the-shelf cryogenic refrigerators has made superconductor IC-based
products much more user-friendly. This fully self-contained commercial Primary Voltage Standard
system from HYPRES uses an Nb trilayer superconductor IC with an array of over 20,000 JJs to
reproduce the Système International (SI) unit of the Volt for metrology applications in any lab.

6. Conclusion
As we move closer to the centennial celebration of the 1911 discovery of the
phenomenon of superconductivity, the prospect of practical and useful applications of
superconductor microelectronics is at its most promising. Fifty years of superconductor
research into first analog and then digital circuits has developed in the shadow of a
semiconductor-fueled "information revolution" which has invigorated economies and
shaped cultures. Exploiting the massive investment in semiconductor integrated circuit
processing equipment, design techniques, and application niches, the very demand for
"100 GHz-level" performance created by semiconductors may, in fact, only be fulfilled
by superconductors. RSFQ technology could be the key.
RSFQ data converters are the fastest and most sensitive ever demonstrated. With a
full cadre of clocking, logic, and memory approaches under development, this versatile
technology could conceivably merge the Digital and RF domains once and for all.
Moreover, as quantum computation, optical and biological systems, or other new
technologies mature, RSFQ may be the only approach with the speed and integration
scale to bridge the gap between traditional electronic data and future formats.
Notably, the inclusion of the cooling and interface components does require that
cryogenics experts be part of the RSFQ system design team. System form factor,
refrigeration power, and operating temperature variations need to be well defined; heat
leaks and magnetic shielding are key constraints for system packaging. But these are
solvable issues. In the end, RSFQ system cryogenics will be not so much a technological
issue as a psychological issue for the prospective user - it represents a true paradigm
shift in the definition of an electronic system. The desire for 100 GHz performance,
however, if great enough, can surmount even this barrier.
Acknowledgements
Thanks to all who contributed data, text, figures, and/or assisted with proofreading,
including: John Rowell, Deepnarayan Gupta, Oleg Mukhanov, Alex Kirichenko and Alan
Kadin of HYPRES; Konstantin Likharev, Paul Bunyk, Dmitry Zinoviev, and Vasili
Semenov of SUNY Stony Brook; Dale Durand, Andy Smith, and John Spargo of TRW;
John Przybysz and Don Miller of Northrop Grumman; Sam Benz of NIST; and
Yongming Zhang of Conductus.
References

1. J. Richey, "The future of high-speed electronics: RF and Digital electronics will converge to
the same domain," plenary address at Commercialization of Cryoelectronics Technologies in
Microelectronics, San Francisco, CA-Feb. 19, 1999.
2. The best research-grade bipolar transistor amplifier and/or digital frequency divider designs
reach an operational frequency ~4x below the unity power-gain bandwidth point, f_max.
Historically, performance of commercial parts has been closer to 7x below f_max. See also: E.
Sano, Y. Matsuoka, and T. Ishibashi, "Device figure-of-merits for high-speed digital ICs and
baseband amplifiers," IEICE Trans. Electron. E78-C (1995) 1182-1188.
3. M. Schultz, "The end of the road for silicon?" Nature 399 (1999) 729-730.
4. P. Packman, "Pushing the limits," Sci. 285 (1999) 2079-2081.
5. Semiconductor Industry Association, "1999 International Technology Roadmap for
Semiconductors," [Online] http://www.itrs.net/1999_SIA_Roadmap/Home.htm.
6. C. Bennett, "Quantum information and computation," Phys. Today 48 (1995) 24-30.
7. L. Adleman, "Computing with DNA," Sci. Amer. 279 (1998) 54-61.
8. M. Reed, "Molecular-scale electronics," Proc. IEEE 87 (1999) 652-658.
9. For a concise review of superconductivity see: J. Schrieffer and M. Tinkham,
"Superconductivity," Rev. Mod. Phys. 71 (1999) S313-S317.
10. General reviews of RSFQ fundamentals are found in: K. Likharev and V. Semenov, "RSFQ
logic/memory family: A new Josephson-junction digital technology for sub-Terahertz-clock-
frequency digital systems," IEEE Trans. Appl. Supercond. 1 (1991) 3-28; K. Likharev,
"Rapid single flux quantum logic" in H. Weinstock and R. Ralston (eds.), The New
Superconducting Electronics. Kluwer, Dordrecht (1993) 423-452; K. Likharev,
"Superconductor devices for ultrafast computing," in H. Weinstock (ed.) Applications of
Superconductivity. Kluwer, Dordrecht, (2000) 247-294; and P. Bunyk, K. Likharev, and
D. Zinoviev, "RSFQ Technology: Physics and Devices," this issue.
11. W. Chen, A. Rylyakov, V. Patel, J. Lukens, and K. Likharev, "Superconductor digital
frequency divider operating up to 750 GHz," Appl. Phys. Lett. 73 (1998) 2817-2819; and W.
Chen, A. Rylyakov, V. Patel, J. Lukens, and K. Likharev, "Rapid single flux quantum T-flip
flop operating up to 770 GHz," IEEE Trans. Appl. Supercond. 9 (1999) 3212-3215.
12. HYPRES, Inc., "HYPRES Nb Design Rules rev. 017", [Online] http://www.hypres.com/.
13. L. Ya, C. Berry, R. Drake, et al., "An all-niobium eight level process for small and medium
scale applications," IEEE Trans. Mag. 23 (1987) 1476-1497.
14. F. London, Superfluids Vol. 1 Macroscopic Theory of Superconductivity. Dover, New York
(1950).
15. B. Deaver and W. Fairbank, "Experimental evidence for quantized flux in superconducting
cylinders," Phys. Rev. Lett. 7 (1961) 43-46.
16. B. Josephson, "Possible new effects in superconductive tunneling," Phys. Lett. 1 (1962) 251-
253.
17. See various articles on latching logic in the special issues of IEEE Trans. Elect. Dev. 27 (Oct.
1980) and Proc. IEEE 77 (Aug. 1989).
18. "Josephson computer technology: An IBM research project," in Special Issue of IBM J. Res.
Dev. 24 (Mar. 1980). A MITI program also ran in Japan from 1981-1990 (see also Ref. 23).
19. H. Kroger, L. Smith, and D. Jille, "Selective anodization process for fabricating Josephson
tunnel junctions," Appl. Phys. Lett. 39 (1981) 2180-2182.
20. M. Gurvitch, M. A. Washington, and H. A. Huggins, "High quality refractory Josephson
tunnel junctions utilizing thin aluminum layers," Appl. Phys. Lett. 42 (1983) 472-475.
21. S. Hasuo, T. Imamura and N. Fujimaki "Recent advances in Josephson junctions devices,"
Fujitsu Tech. J. 24 (1988) 284-292.
22. Y. Tarytani, M. Hirado, and U. Kawabe, "Niobium-based integrated circuit technologies,"
Proc. IEEE 77 (1989) 1164-1176.
23. S. Hasuo, S. Kotani, A. Inoue, and N. Fujimaki, "High speed Josephson processor
technology", IEEE Trans. Mag. 27 (1991) 2602-2609.
24. G. Kerber, L. Abelson, R. Elmadjian, et al., "An improved NbN integrated circuit process
featuring thick NbN ground plane and lower parasitic circuit inductance," IEEE Trans. Appl.
Supercond. 7 (1997) 2638-2641.
25. K.K. Likharev, O.A. Mukhanov, and V.K. Semenov, "Resistive single flux quantum logic for
the Josephson-junction technology," in H. Hahlbohm and H. Lübbig (eds.), SQUID'85. W.
deGruyter, Berlin, (1985) 1103-1108.
26. K. Likharev, O. Mukhanov, V. Semenov, "Ultimate performance of RSFQ logic circuits,"
IEEE Trans. Mag. 23 (1987) 759-762.
27. Good textbooks covering the basics of superconductor circuits include: T. Orlando, and K.
Delin, Foundations of Applied Superconductivity, Addison-Wesley, Reading MA. (1991);
M. Tinkham, Introduction to Superconductivity, 2nd ed., McGraw-Hill, New York, (1996); T.
Van Duzer and C. Turner, Principles of Superconductive Devices 2nd ed. (1999); and A.M.
Kadin, Introduction to Superconducting Circuits. Wiley Interscience, New York (1999).
28. First introduced in P. Anderson, R. Dynes, and T. Fulton, "Josephson flux quantum shuttles,"
Bull. Am. Phys. Soc. 16 (1971) 399; and then expounded upon in T. Fulton, R. Dynes, and P.
Anderson, "The flux shuttle - A Josephson shift register employing single flux quanta,"
Proc. IEEE 61 (1973) 28-35.
29. J. P. Hurrell and A. H. Silver, "SQUID digital electronics", in: B.S. Deaver Jr. et al. (eds.),
Future Trends in Superconductive Electronics. AIP, New York, (1978) 437-447.
30. K. Nakajima, Y. Onodera, and Y. Ogawa, "Logic design of Josephson network," J. Appl.
Phys. 47 (1976) 1620-1627.
31. K. Nakajima and Y. Onodera, "Logic design of Josephson network - II," J. Appl. Phys. 49
(1978) 2958-2963.
32. J. Deng, S. Whiteley, and T. VanDuzer, "Data-driven self-timing of RSFQ digital integrated
circuits and systems," IEEE Trans. Appl. Supercond. 7 (1997) 3634-3637.
33. N. Yoshikawa, H. Tago, and K. Yoneyama, "Design considerations for data-driven self-timed
RSFQ adder circuits," IEICE Trans. Electron. E81-C (1998) 1618-1626.
34. K. Nakajima, Y. Mizugaki, T. Onomi, and T. Yamashita, "Fluxoid-type logic circuits," in H.
Ohta and C. Ishii (eds.), Physics and Application of Mesoscopic Josephson Junctions. Phys.
Soc. Jap., Tokyo (1999) 267-288.
35. "Present status, future prospects and market potential for 4-5 K cryocoolers" in Proceedings
of the HYPRES 5 K Cryocooler Workshop July 24-25, (1995), available from HYPRES Inc.
36. See e.g. Leybold Lab4.2 2-stage GM unit.
37. R. Radebaugh, "Development of the pulse tube refrigerator as an efficient and reliable
cryocooler," Proc. Inst. Refrig. (1999) 1-16.
38. J. Yoshida, "Recent progress of high-temperature superconductor Josephson junction
technology for digital circuit applications," IEICE Trans. Electron. E83-C (2000) 49-59.
39. H. Terai and Z. Wang, "All-NbN single flux quantum circuits based on NbN/AlN/NbN
tunnel junctions," IEICE Trans. Electron. E83-C (2000) 69-74.
40. A. Braginski, "Superconducting Electronics Coming to Market," IEEE Trans. Appl.
Supercond. 9 (1999) 2825-2836.
41. J. Rowell, "Recommended Directions of Research and Development in Superconducting
Electronics," IEEE Trans. Appl. Supercond. 9 (1999) 2837-2848.
42. C. Rosner, "Emerging 21st century markets and outlook for applied superconducting
products," in P. Kittel (ed.), Advances in Cryogenic Engineering 43A (1998) 1-23.
43. K. Gaj, Q. Herr, V. Adler, D. Brock, E. Friedman, and M. Feldman "Towards a systematic
design methodology for large multi-gigahertz rapid single flux quantum circuits," IEEE
Trans. Appl. Supercond. 9 (1999) 4591-4606.
44. G. Lee and D. Petersen, "Superconductive A/D converters," Proc. IEEE 77 (1989) 1264-
1273.
45. When using fluxons, Φ0, as data bits, a time-averaged voltage measurement serves as a direct
measurement of the bit-rate according to the relation <V> = Φ0/sec, where the measurement
accuracy is determined by the uncertainty limit on the voltage measurement apparatus.
46. C. Hamilton, C. Burroughs, and S. Benz, "Josephson voltage standard - A review," IEEE
Trans. Appl. Supercond. 7 (1997) 3756-3761.
47. R. Walden, "Analog-to-digital converter survey and analysis," IEEE J. Selec. Areas Comm.
17 (1999) 539-550.
48. S. Rylov, D. Brock, D. Gaidarenko, A. Kirichenko, J. Vogt, and V. Semenov, "High
resolution ADC using phase modulation-demodulation architecture," IEEE Trans. Appl.
Supercond. 9 (1999) 3016-3019.
49. S. Rylov, "Novel architecture for flux quantizing ADCs," Extend. Abs. of 4th Intl. Supercond.
Electr. Conf. Boulder, CO (1993) 112-113.
50. S. Rylov, "Analysis of high-performance counter-type A/D converters using RSFQ
logic/memory elements," IEEE Trans. Mag. 27 (1991) 2431-2434.
51. E. Hogenauer, "An economical class of digital filters for decimation and interpolation,"
IEEE Trans. Acoust., Speech, and Sig. Proc. 29 (1981) 155-162.
52. S. Rylov and R. Robertazzi, "Superconducting high-resolution A/D converter based on phase
modulation and multi-channel timing arbitration," IEEE Trans. Appl. Supercond. 5 (1995)
2260-2263.
53. S. Rylov, L. Bunz, D. Gaidarenko, M. Fisher, R. Robertazzi, and O. Mukhanov, "High
resolution ADC system," IEEE Trans. Appl. Supercond. 7 (1997) 2649-2652.
54. O. Mukhanov, D. Brock, A. Kirichenko, et al., "Progress in the Development of a
Superconductive High-Resolution ADC," Extend. Abs of 7th Intl. Supercond. Electr. Conf.
Berkeley, CA (1999) 13-16.
55. D. Brock, O. Mukhanov, W. Li, et al., "A Dynamically programmable ADC for multifunction
digital receivers," Gov. Microcir. App. Conf. Dig. Pap 25 (2000) 217-220.
56. P. Bradley, "A 6-bit Josephson flash A/D converter with GHz input bandwidth," IEEE Trans.
Appl. Supercond. 3 (1993) 2550-2557.
57. S. Rylov, V. Semenov, and K. Likharev, "Josephson junction A/D converters using
differential coding," IEEE Trans. Mag. 23 (1987) 735-378.
58. G. Lee and H. Ko, "Phase tree: a periodic, fractional flux quantum vernier for high-speed
interpolation of A/D converters," IEEE Trans. Appl. Supercond. 3 (1993) 3001-3004.
59. S. B. Kaplan, S. V. Rylov and P.D. Bradley, "Real-time digital error correction for flash
analog-to-digital converter," IEEE Trans. Appl. Supercond. 7 (1997) 2822-2825.
60. C. Anderson, "Josephson look-back analog to digital converter," IEEE Trans. Appl.
Supercond. 3 (1993) 2769-2773.
61. J. Candy and G. Temes (eds.), Oversampling Delta-Sigma Data Converters. IEEE Press,
Piscataway, NJ (1992).
62. J. Przybysz, D. Miller, E. Naviasky, and J. Kang, "Josephson sigma-delta modulator for high
dynamic range A/D conversion," IEEE Trans. Appl. Supercond. 3 (1993) 2732-2735.
63. J. Przybysz, D. Miller, and E. Naviasky, "Two-loop modulator for sigma-delta analog-to-
digital converter," IEEE Trans. Appl. Supercond. 5 (1995) 2248-2251.
64. D. Miller, J. Przybysz, H. Worsham, and E. Dean "Flux quantum sigma-delta analog-to-
digital converters for rf signals," IEEE Trans. Appl. Supercond. 9 (1999) 4026-4029.
65. D. Miller, J. Przybysz, H. Worsham, and A. Miklich, "Superconducting sigma-delta analog-
to-digital converters," Extend. Abstr. of 6th Intl. Supercond. Elec. Conf, Berlin, Germany
(1997) 38-40.
66. S. Benz, C. Hamilton, and C. Burroughs "Operating margins for a superconducting voltage
waveform synthesizer," Extend. Abs of 7th Inter. Supercond. Electr. Conf. Berkeley, CA
(1999) 115-117.
67. H. Sasaki, S. Kiryu, F. Hirayama, et al., "RSFQ-based D/A converter for AC voltage
standard," IEEE Trans. Appl. Supercond. 9 (1999) 3561-3564.
68. V.K. Semenov, "Digital to analog conversion based on processing of the SFQ pulses," IEEE
Trans. Appl. Supercond. 3 (1993) 2637-2640.
69. C.A. Hamilton, "Josephson voltage standard based on single-flux-quantum voltage
multipliers," IEEE Trans. Appl. Supercond. 2 (1992) 139-142; and S.P. Benz, Appl. Phys.
Lett. 67 (1995) 2714.
70. S.P. Benz and C.A. Hamilton, "A pulse-driven programmable Josephson voltage standard,"
Appl. Phys. Lett. 68 (1996) 3171-3173.
71. S. Benz, C. Hamilton, C. Burroughs, et al., "Pulse-driven Josephson digital/analog
converter," IEEE Trans. Appl. Supercond. 8 (1998) 42-47.
72. K. Gaj, E. Friedman, and M. Feldman, "Timing of multi-Gigahertz RSFQ digital circuits," J.
VLSI Sig. Proc. Sys. 16 (1997) 2826-2831.
73. C. Mancini and M. Bocko, "Short-term stability of RSFQ ring oscillators," IEEE Trans.
Appl. Supercond. 9 (1999) 3545-3548.
74. D. Gupta and Y. Zhang, "On-Chip Clock Technology for Ultrafast Digital Superconducting
Electronics," Appl. Phys. Lett. 76 (2000) 3819-3821.
75. Y.M. Zhang, V. Borzenets, V.K. Kaplunenko, and N.B. Dubash, "Underdamped long
Josephson junction coupled to overdamped single-flux-quantum circuits," Appl. Phys. Lett.,
71 (1997) 1-3.
76. Y. Zhang and D. Gupta, "Low-jitter on-chip clock for RSFQ circuit applications," Exten.
Abs. of 7th Intl. Supercond. Elec. Conf. Berkeley, CA (1999) 88-90.
77. D. Brock and M. Pambianchi, "A 60 GHz monolithic RSFQ digital phase-locked loop," 2000
Intl. Microwave Symp. Dig. Vol. 1 (2000) TU4D-3.
78. See for instance: R.E. Best, Phase-Locked Loops: Design, Simulation, and Applications.
McGraw-Hill, New York (1997) 91.
79. S. Nagasawa, Y. Hashimoto, H. Numata, and S. Tahara, "High-frequency clock operation of
Josephson 256-word x 16-bit RAMs," IEEE Trans. Appl. Supercond. 9 (1999) 3708-3713.
80. S. Nagasawa, H. Hasegawa, T. Hasimoto, et al. "Design of a 16 K-bit superconducting
latching/SFQ hybrid RAM," Extend. Abs of 7th Intl. Supercond. Electr. Conf. Berkeley, CA
(1999) 365-367.
81. Q. Herr and L. Eaton, "Towards a 16 kilobit, sub-nanosecond Josephson RAM," Extend. Abs
of 7th Inter. Supercond. Electr. Conf. Berkeley, CA (1999) 362-364.
82. A. Kirichenko, O. Mukhanov, and D. Brock, "A single flux quantum cryogenic random
access memory," Extend. Abs of 7th Inter. Supercond. Electr. Conf. Berkeley, CA (1999)
124-127.
83. S. Nagasawa, Y. Hashimoto, H. Numata, and S. Tahara, "A 380ps 9.5mW Josephson 4-Kbit
RAM operating at a high bit yield," IEEE Trans. Appl. Supercond. 5 (1995) 2447.
84. S. Tahara, I. Ishida, Y. Ajisawa, and Y. Wada, "Experimental vortex transition
nondestructive read-out Josephson memory cell," J. Appl. Phys. 65 (1989) 851-856.
85. O.A. Mukhanov, "Rapid Single Flux Quantum (RSFQ) shift register family," IEEE Trans.
Appl. Supercond. 3 (1993) 2578-2581.
86. O. Mukhanov, S. Polonsky, V. Semenov, "New Elements of the RSFQ logic family," IEEE
Trans. Mag. 27 (1991) 2435-2438.
87. P.-F. Yuh, "Testing a 4-b shift register at 11 GHz," IEEE Trans. Appl. Supercond. 2 (1992),
101-105.
88. O.A. Mukhanov, "RSFQ 1024-bit shift register for acquisition memory," IEEE Trans. Appl.
Supercond. 3 (1993) 3102-3113.
89. O. Mukhanov, "Design and test of RSFQ full adders," Exten. Abs of 4th Intl. Supercond.
Electr. Conf, Boulder, CO (1993) 19-20.
90. O. Mukhanov, S. Rylov, V. Semenov, and S. Vyshenskii, "RSFQ logic arithmetic," IEEE
Trans. Mag. 25 (1989) 857-860.
91. S. Kaplan and O. Mukhanov, "Operation of a superconductive demultiplexer using rapid
single flux quantum (RSFQ) technology," IEEE Trans. Appl. Supercond. 5 (1995) 2853-2856.
92. A. Kirichenko and O. Mukhanov, "Implementation of novel 'push-forward' RSFQ carry-save
serial adders," IEEE Trans. Appl. Supercond. 5 (1995) 3010-3013.
93. S. Polonsky, J. Lin, and A. Rylyakov, "RSFQ arithmetic for DSP applications," IEEE Trans.
Appl. Supercond. 5 (1995) 2823-2826.
94. S. Martinet, D. Brock, M. Feldman, and M. Bocko, "Functional Testing of RSFQ Circuits,"
IEEE Trans. Appl. Supercond. 5 (1995) 3006-3010.
95. S. Polonsky, V. Semenov, and A. Kirichenko, "RSFQ Bi-flip-flop and its possible
applications," Exten. Abs. 4th Intl. Supercond. Electr. Conf. Boulder, CO (1993) 106-107.
96. S. Polonsky, V. Semenov, and A. Kirichenko, "Single flux quantum B flip-flop and its
possible applications," IEEE Trans. Appl. Supercond. 4 (1994) 9-18.
97. O. A. Mukhanov and A. F. Kirichenko, "Implementation of a FFT radix 2 butterfly using
serial RSFQ multiplier-adders," IEEE Trans. Appl. Supercond. 5 (1995) 2461-2464.
98. A. Kirichenko, O. Mukhanov, and A. Ryzhikh, "Advanced on-chip test technology for
RSFQ circuits," IEEE Trans. Appl. Supercond. 7 (1997) 3438-3441.
99. A. Pance, J.S. Martens, A. Barfknecht, J.E. Fleischman, K.E. Kihlstrom, and S.R. Whiteley,
"High performance RSFQ shift register for the 10 GHz hybrid superconducting digital
system," Exten. Abs. 4th Intl. Supercond. Electr. Conf., Boulder, (1993) 104-105.
100. T. Hendricks, M. Bruns, and E. Hershberg, "Thermal transport and electrical dissipation in
ultra-low thermal load, multi-Gigahertz I/O cables for superconducting microelectronics," in
P. Kittel (ed.), Advances in Cryogenic Engineering Vol. 41A. Plenum Press, New York
(1996) 1761.
101. O.A. Mukhanov, S.V. Rylov, V.K. Semenov, and S.V. Vyshenskii, "Recent development of
Rapid Single Flux Quantum (RSFQ) logic digital devices," Exten. Abs. of 2nd Intl. Supercond.
Electr. Conf. Tokyo, (1989) 557-560.
102. S. Rylov, "DC-powered high-voltage driver for RSFQ logic family," Exten. Abs. 4th Intl.
Supercond. Electr. Conf, Boulder, CO (1993) 110-111.
103. J. Przybysz, D. Miller, S. Martinet, et al., "Interface circuits for chip-to-chip data transfer at
GHz rates," IEEE Trans. Appl. Supercond. 7 (1997) 2657-2660.
104. R. Koch, P. Ostertag, E. Crocoll, et al., "A NRZ-Output amplifier for RSFQ Circuits," IEEE
Trans. Appl. Supercond. 9 (1999) 3549-3552.
105. D.F. Schneider, J.C. Lin, S.V. Polonsky, and V.K. Semenov, "Broadband interfacing of
superconducting digital systems to room temperature electronics," IEEE Trans. Appl.
Supercond. 5 (1995) 3152-3155.
106. B. Van Seghbroeck, "Optical data communications between Josephson junction circuits and
room-temperature electronics," IEEE Trans. Appl. Supercond. 3 (1993) 2881-2884.
107. L. Bunz, E. Track, S. Rylov, et al., "Fiber-optic input and output for superconducting
circuits," Proc. SPIE 2160 (1994) 229-235.
108. M. Currie, R. Sobolewski, and T. Hsiang, "Subpicosecond measurements of the response of
Josephson transmission lines to large current pulses," IEEE Trans. Appl. Supercond. 9 (1999)
3531-3534.
109. M. Currie, R. Sobolewski, and T. Hsiang, "High-frequency crosstalk in superconducting
microstrip waveguide interconnects," IEEE Trans. Appl. Supercond. 9 (1999) 3602-3605.
110. H. Jones and D. Herrell "The characteristics of chip-to-chip signal propagation in a package
suitable for superconducting circuits," IBM J. Res. Dev. 24 (1980) 172-177.
111. T. Ogashiwa, H. Nakagawa, H. Akimoto, et al. "New flip-chip bonding technology for
superconducting IC," Jap. J. Appl. Phys. 31 (1992) L36-L38.
112. S. Tanahasi, T. Kubo, K. Kawabata, et al. "Superconducting wiring in multi-chip module for
Josephson LSI circuits," Jap. J. Appl. Phys. 32 (1993) L898-L900.
113. T. Ogashiwa, H. Nakagawa, H. Akimoto, et al. "Flip-chip bonding using superconducting
solder bump," Jap. J. Appl. Phys. 34 (1995) 4043-4046.
114. R. Sandell, G. Akerling, and A. Smith "Multi-chip packaging for high-speed superconductive
circuits," IEEE Trans. Appl. Supercond. 5 (1995) 3160-3163.
115. L. Abelson, R. Elmadjian, G. Kerber, and A. Smith "Superconductive multi-chip module
process for high speed digital applications," IEEE Trans. Appl. Supercond. 7 (1997) 2627-
2629.
116. M. Aoyagi, H. Nakagawa, H. Sato, et al. "Superconducting flip-chip bonding method with a
µm-scale gap," Extend. Abs of 7th Inter. Supercond. Electr. Conf. Berkeley, CA (1999)
323-325.
117. M. Maezawa, H. Yamamori, and A. Shoji "A novel approach to chip-to-chip communication
using a single flux quantum pulse," IEEE Trans. Appl. Supercond. 9 (1999) 4049-4052.
118. M. Maezawa, H. Yamamori, and A. Shoji, "Chip-to-chip communication using a single flux
quantum pulse," IEEE Trans. Appl. Supercond. 10 (2000) 1603-1605.
119. G. Lehmann, J. Ramsden, J. Sochor, and G. Beek, "Cryopackaging for real world products,"
in P. Kittel (ed.), Advances in Cryogenic Engineering Vol. 43A. Plenum Press, New York
(1998)865-24.
120. T. Clynne, "Packaging and integration issues for cryoelectronic and superconductor
materials," in P. Kittel (ed.), Advances in Cryogenic Engineering Vol. 43A. Plenum Press,
New York (1998) 871-880.
121. D. Gaidarenko and R. Robertazzi "High performance packaging system for superconducting
electronics," IEEE Trans. Appl. Supercond. 9 (1999) 3668-3671.
122. E. Hershberg, T. Hendricks, and D. Patelzick, "A fully functional closed cycle cryosystem,
that uses less than one watt of refrigeration at 4.5 K, for a 2.5 GHz per channel, 128x128
channel superconducting switch," in P. Kittel (ed.), Advances in Cryogenic Engineering Vol.
43, Plenum Press, New York (1998) 881-888.
123. N. Dubash, V. Borzenets, Y. Zhang, et al., "System Demonstration of a multigigabit network
switch," IEEE Trans. Micro. Theor. Techni. 48 (2000) 1209-1215.
124. D. Gupta, D. V. Gaidarenko, and S. V. Rylov, "A 16-bit Serial Analog-to-Digital Converter
Module with Optical Output," IEEE Trans. Appl. Supercond. 9 (1999) 3030-3033.
125. D. Gupta, S. V. Rylov, and D. V. Gaidarenko, "High-resolution superconductive serial analog-
to-digital converter for on-focal plane data conversion," in B. Pain (ed.), Infrared Readout
Electronics IV. SPIE, Bellingham, WA, 3360 (1998) 28-39.
126. T. Van Duzer, "Superconductor Electronics, 1986-1996," IEEE Trans. Appl. Supercond. 7
(1997) 98-111.
127. A. Silver, "Superconductivity in Electronics," IEEE Trans. Appl. Supercond. 7 (1997) 69-79.
128. M. Feldman, "Digital applications of Josephson junctions," in H. Ohta and C. Ishii (eds.),
Physics and Application of Mesoscopic Josephson Junctions. Phys. Soc. Jap., Tokyo (1999)
289-304.
129. M. Dorojevets, P. Bunyk, D. Zinoviev and K. Likharev "COOL-0: Design of an RSFQ
subsystem for petaflops computing," IEEE Trans. Appl. Supercond. 9 (1999) 3606-3614.
130. L. Wittie, D. Zinoviev, G. Sazaklis, and K. Likharev "CNET: Design of an RSFQ switching
network for petaflops-scale computing," IEEE Trans. Appl. Supercond. 9 (1999) 4034-4039.
131. A. Kidiyarova-Shevchenko and D. Zinoviev, "RSFQ pseudo random generator and its
possible applications," IEEE Trans. Appl. Supercond. 5 (1995) 2820-2822.
132. J. Kang, H. Worsham, and J. Przybysz, "4.6 GHz shift register and SFQ pseudorandom bit
sequence generator," IEEE Trans. Appl. Supercond. 5 (1995) 2827-2830.
133. J. Kang, J. Przybysz, S. Martinet, et al., "3.69 GHz single flux quantum pseudorandom bit
sequence generator fabricated with Nb/AlOx/Nb," IEEE Trans. Appl. Supercond. 7 (1997)
2673-2676.
134. P. Dresselhaus, E. Dean, H. Worsham, et al., "Modulation and demodulation of 2 GHz
pseudo random binary sequence using SFQ digital circuits," IEEE Trans. Appl. Supercond. 9
(1999) 3585-3589.
135. R. Sandell, J. Spargo, and M. Leung, "High data rate switch with amplifier chip," IEEE
Trans. Appl. Supercond. 9 (1999) 2985-2988.
136. J. Przybysz, Sr., "Applications of Josephson electronics in digital systems," in H. Weinstock
(ed.), Applications of Superconductivity, Kluwer Academic Publishers, Dordrecht (2000)
229-246.
137. H. Worsham, A. Miklich, D. Miller, J. Kang, and J. Przybysz, "Single flux quantum circuits
for 2.5 Gbps data switching," IEEE Trans. Appl. Supercond. 7 (1997) 2476-2479.
138. D. Zinoviev and K. Likharev, "Feasibility study of RSFQ based self-routing nonblocking
digital switches," IEEE Trans. Appl. Supercond. 7 (1997) 3155-3163.
139. For a good overview, see the special issues on Software Defined Radio of: IEEE J. Selec.
Areas in Comm. 17 (April 1999); IEEE Comm. Mag. 33 (May 1995) and IEICE Trans.
Comm. E83-B (June 2000).
140. E. Wikborg, V. Semenov, and K. Likharev, "RSFQ front-end for a software radio receiver,"
IEEE Trans. Appl. Supercond. 9 (1999) 3615-3618.
141. S. Pagano, V. Palmieri, A. Esposito, O. Mukhanov, and S. Rylov, "First realization of a
tracking detector for high energy physics experiments based on Josephson digital readout
circuitry," IEEE Trans. Appl. Supercond. 9 (1999) 3628-3631.
142. S. Pagano, R. Cristiano, L. Frunzio, V. Palmieri, et al., "Effect of intense proton radiation on
properties of Josephson junctions," IEEE Trans. Appl. Supercond. 7 (1997) 2917-2920.
143. S. Pagano, L. Frunzio, R. Cristiano, O. Mukhanov, et al., "Radiation hardness of Josephson
junctions and superconductive digital devices," Ext. Abs. of 6th Intl. Supercond. Electr.
Conf. Berlin, Germany (1997) 269-271.
144. O. Mukhanov and S. Rylov, "Time-to-digital converters based on RSFQ digital counters,"
IEEE Trans. Appl. Supercond. 7 (1997) 2669-2672.
145. O. Mukhanov, A. Kirichenko, J. Vogt, and M. Pambianchi, "A superconductive multi-hit
time digitizer," IEEE Trans. Appl. Supercond. 9 (1999) 3619-3622.
146. D. Brock, O. Mukhanov, and A. Rylyakov, "Advanced Receiver Components: A 50-ps
Resolution Multi-Hit Time-to-Digital Converter and 4 GHz Digital Correlator/Autocorrelator,"
Gov. Microcir. App. Conf. Dig. Pap. 24 (1999) 416-419.
147. A. Rylyakov, D. Schneider, and Yu. Polyakov, "A fully integrated 16-channel RSFQ
autocorrelator operating at 11 GHz," IEEE Trans. Appl. Supercond. 9 (1999) 3623-3627.
148. A. Rylyakov and K. Likharev, "Pulse jitter and timing errors in RSFQ circuits," IEEE Trans.
Appl. Supercond. 9 (1999) 3539-3544.
149. H.J.M. ter Brake, "Cryogenic Systems for Superconducting Devices," in H. Weinstock (ed.),
Applications of Superconductivity, Kluwer Academic Publishers, Dordrecht (2000).
150. Conference Proceedings of Superconducting Digital Circuits and Systems Vol. 1 & 2,
Institute for Technology and Strategic Research, George Washington University (1991).
151. G. Walker, Cryocoolers Part 1: Fundamentals, Plenum Press, New York (1983).
152. W. Burt and C. Chan, "New mid-size high efficiency pulse tube coolers," in R. Ross Jr. (ed.),
Cryocoolers 9, Plenum Press, New York (1997) 173-182.
153. H. Yoshimura, et al., "Helium liquefaction by a Gifford-McMahon cycle cryogenic
refrigerator," Rev. Sci. Instrum. 60 (1989) 3533-3536.
154. I. Takashi, N. Masashi, N. Kouki, and Y. Hideto, "Development of a 2W class 4 K Gifford-
McMahon cycle cryocooler," in R. Ross Jr. (ed.), Cryocoolers 9, Plenum Press, New York
(1997) 617-626.
155. E. Tward, C. Chan, J. Raab, and R. Orsini, "Miniature long-life space qualified pulse tube
cryocooler," SAE Technical Paper #941622, SAE International, Warrendale, PA (1994).
156. B. Jones, S. Scull, and C. Jewell, "The batch manufacture of Stirling-cycle coolers for space
applications including test, qualification, and integration issues," in R. Ross Jr. (ed.),
Cryocoolers 9, Plenum Press, New York (1997) 59-68.
ISBN 961 02 4638-21

www.worldscientific.com

