
Amir Zjajo

Brain-Machine
Interface
Circuits and Systems
Amir Zjajo
Delft University of Technology
Delft
The Netherlands

ISBN 978-3-319-31540-9 ISBN 978-3-319-31541-6 (eBook)


DOI 10.1007/978-3-319-31541-6

Library of Congress Control Number: 2016934192

© Springer International Publishing Switzerland 2016


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, express or implied, with respect to the material contained herein or for any errors
or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature


The registered company is Springer International Publishing AG Switzerland
To my son Viggo Alan and
my daughter Emma
Acknowledgements

The author acknowledges the contributions of Dr. Rene van Leuken of Delft
University of Technology, and Dr. Carlo Galuzzi of Maastricht University.

Contents

1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Brain-Machine Interface: Circuits and Systems . . . . . . . . . . . . . . . . 2
1.2 Remarks on Current Design Practice. . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4 Organization of the Book. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2 Neural Signal Conditioning Circuits. . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Power-Efficient Neural Signal Conditioning Circuit. . . . . . . . . . . . . 18
2.3 Operational Amplifiers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4 Experimental Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3 Neural Signal Quantization Circuits. . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2 Low-Power A/D Converter Architectures . . . . . . . . . . . . . . . . . . . . . 34
3.3 A/D Converter Building Blocks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.1 Sample and Hold Circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.2 Bootstrap Switch Circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.3.3 Operational Amplifier Circuit. . . . . . . . . . . . . . . . . . . . . . . . . 45
3.3.4 Latched Comparator Circuit. . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4 Voltage-Domain SAR A/D Conversion. . . . . . . . . . . . . . . . . . . . . . . 52
3.5 Current-Domain SAR A/D Conversion. . . . . . . . . . . . . . . . . . . . . . . 58
3.6 Time-Domain Two-Step A/D Conversion . . . . . . . . . . . . . . . . . . . . . 60
3.7 Experimental Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70


4 Neural Signal Classification Circuits. . . . . . . . . . . . . . . . . . . . . . . . . . . . 77


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.2 Spike Detector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.3 Spike Classifier. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.4 Experimental Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5 Brain-Machine Interface: System Optimization. . . . . . . . . . . . . . . . . . 95
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.2 Circuit Parameters Formulation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.2.1 Random Process Variability. . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.2.2 Noise in Neural Recording Interface. . . . . . . . . . . . . . . . . . . 101
5.3 Stochastic MNA for Process Variability Analysis . . . . . . . . . . . . . . . 102
5.4 Stochastic MNA for Noise Analysis. . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.5 PPA Optimization of Multichannel Neural Recording Interface. . . . 110
5.5.1 Power Optimization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.5.2 Power Per Area Optimization. . . . . . . . . . . . . . . . . . . . . . . . . 112
5.6 Experimental Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.1 Summary of the Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.2 Recommendations and Future Research . . . . . . . . . . . . . . . . . . . . . . 128
Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
About the Author

Amir Zjajo received the M.Sc. and DIC degrees from


the Imperial College London, London, UK, in 2000
and the Ph.D. degree from Eindhoven University of
Technology, Eindhoven, The Netherlands in 2010, all
in electrical engineering. In 2000, he joined Philips
Research Laboratories as a member of the research
staff in the Mixed-Signal Circuits and Systems Group.
From 2006 to 2009, he was with Corporate Research of
NXP Semiconductors as a Senior Research Scientist.
In 2009, he joined Delft University of Technology as a
Faculty member in the Circuit and Systems Group.
Dr. Zjajo has published more than 70 papers in refereed journals and conference
proceedings, and holds more than ten US patents granted or pending. He is the
author of the books Low-Voltage High-Resolution A/D Converters: Design, Test
and Calibration (Springer, 2011; Chinese translation 2015) and Stochastic Process
Variations in Deep-Submicron CMOS: Circuits and Algorithms (Springer, 2014).
He serves as a member of the Technical Program
Committee of IEEE Design, Automation and Test in Europe Conference, IEEE
International Symposium on Circuits and Systems, IEEE International Symposium
on VLSI, IEEE International Symposium on Nanoelectronic and Information
Systems, and IEEE International Conference on Embedded Computer Systems.
His research interests include power-efficient mixed-signal circuit and system
design for health and mobile applications and neuromorphic electronic circuits for
autonomous cognitive systems. Dr. Zjajo won the best paper award at BIODEVICES
2015 and DATE 2012.

Abbreviations

A/D Analog to Digital


ADC Analog-to-Digital Converter
ANN Artificial Neural Network
AP Action Potentials
BDF Backward Differentiation Formula
BMI Brain-Machine Interface
BSIM Berkeley Short-Channel IGFET Model
CAD Computer-Aided Design
CDF Cumulative Distribution Function
CMOS Complementary MOS
CMRR Common-Mode Rejection Ratio
D/A Digital to Analog
DAC Digital-to-Analog Converter
DAE Differential Algebraic Equations
DFT Discrete Fourier Transform
DIBL Drain-Induced Barrier Lowering
DNL Differential Nonlinearity
DR Dynamic Range
DSP Digital Signal Processor
DTFT Discrete Time Fourier Transform
EM Expectation Maximization
ENOB Effective Number of Bits
ERBF Exponential Radial Basis Function
ERBW Effective Resolution Bandwidth
FFT Fast Fourier Transform
GBW Gain-Bandwidth Product
IC Integrated Circuit
IEEE Institute of Electrical and Electronics Engineers
INL Integral Nonlinearity


ITDFT Inverse Time Discrete Fourier Transform


KCL Kirchhoff Current Law
KKT Karush-Kuhn-Tucker
LFP Local Field Potentials
LNA Low Noise Amplifier
LSB Least Significant Bit
MNA Modified Nodal Analysis
MOS Metal Oxide Semiconductor
MOSFET Metal-Oxide-Semiconductor Field-Effect Transistor
MSB Most Significant Bit
NA Nodal Analysis
NMOS Negative doped MOS
ODE Ordinary Differential Equation
OTA Operational Transconductance Amplifier
PDE Partial Differential Equation
PDF Probability Density Function
PGA Programmable Gain Amplifier
PMOS Positive doped MOS
PPA Power per Area
PSD Power Spectral Density
PSRR Power Supply Rejection Ratio
QP Quadratic Problem
QPO Quadratic Program Optimization
RBF Radial Basis Function
RTL Register Transfer Level
S/H Sample and Hold
SAR Successive Approximation Register
SC Switched Capacitor
SDE Stochastic Differential Equation
SFDR Spurious-Free Dynamic Range
SINAD Signal-to-Noise and Distortion
SNDR Signal-to-Noise plus Distortion Ratio
SNR Signal-to-Noise Ratio
SPICE Simulation Program with Integrated Circuit Emphasis
SRAM Static Random-Access Memory
STI Shallow Trench Isolation
SVD Singular Value Decomposition
SVM Support Vector Machine
T/D Time to Digital
T/H Track and Hold
TDC Time-to-Digital Converter
THD Total Harmonic Distortion
V/I Voltage to Current
VCCS Voltage-Controlled Current Sources

VGA Variable Gain Amplifier


VTC Voltage-to-Time Converter
WCD Worst Case Design
WSS Wide Sense Stationary
Symbols

a Elements of the incidence matrix A, bounds


A Amplitude, area, constant singular incidence matrix
Af Voltage gain of feedback amplifier
Afmb Mid-band gain of amplifier
b Number of circuit branches, vector of biases, bounds
Bi Number of output codes
B Bit, effective stage resolution
Bn Noise bandwidth
BW Bandwidth
ci Class to which the data xi from the input vector belongs
cxy Process correction factors depending upon the process maturity
C* Neyman-Pearson critical region
C Capacitance, covariance matrix
CC Compensation capacitance, cumulative coverage
Ceff Effective capacitance
CG Gate capacitance, input capacitance of the operational amplifier
CGS Gate-Source capacitance
Cin Input capacitance
CL Load capacitance
Cout Parasitic output capacitance
Cox Gate-oxide capacitance
Cpar Parasitic capacitance
Ctot Total load capacitance
CQ Function of the deterministic initial solution
C Autocorrelation matrix
C Symmetrical covariance matrix
di Location of transistor i on the die with respect to a point of origin
Di Multiplier of reference voltage
Dout Digital output
e Noise, error, scaling parameter of transistor current


eq Quantization error
e2 Noise power
E{.} Expected value
Econv Energy per conversion step
fclk Clock frequency
fin Input frequency
fp,n(di) Eigenfunctions of the covariance matrix
fS Sampling frequency
fsig Signal frequency
fspur Frequency of spurious tone
fT Transit frequency
f(x,t) Vector of noise intensities
FQ Function of the deterministic initial solution
g Conductance
gm Transconductance
Gi Interstage gain
Gm Transconductance
h Numerical integration stepsize, surface heat transfer coefficient
i Index, circuit node, transistor on the die
imax Number of iteration steps
I Current
Iamp Total amplifier current consumption
Idiff Diffusion current
ID Drain current
IDD Power supply current
Iref Reference current
j Index, circuit branch
J0 Jacobian of the initial data z0 evaluated at pi
k Boltzmann's constant, error correction coefficient, index
K Amplifier current gain, gain error correction coefficient
K(t) Variance-covariance matrix of ξ(t)
L Channel length
Li Low-rank Cholesky factors
L(θ|TX) Log-likelihood of parameter θ with respect to input set TX
m Index
M Number of terms, number of channels in BMI
n Index, number of circuit nodes, number of bits
N Number of bits
Naperture Aperture jitter limited resolution
P Power
p Process parameter
p(di,θ) Stochastic process corresponding to process parameter p
pX|Θ(x|θ) Gaussian mixture model
p* Process parameter deviations from their corresponding nominal values

p1 Dominant pole of amplifier


p2 Nondominant pole of amplifier
q Channel charge, circuit nodes, index, vector of state variables
r Circuit nodes, number of iterations
R Resistance
rds Output resistance of a transistor
Reff Effective thermal resistance
Ron Switch on-resistance
Rn-1 Process noise covariance
rout Amplifier output resistance
Si Silicon
Sn Output vector of temperatures at sensor locations
s Scaling parameter of transistor size, score
t Time
T Absolute temperature, transpose, time, transistor
tox Oxide thickness
tS Sampling time
vf Fractional part of the analog input signal
vn Input-referred noise of the amplifier
un Gaussian sensor noise
V Voltage
VCM Common-mode voltage
VDD Positive supply voltage
VDS Drain-source voltage
VDS,SAT Drain-source saturation voltage
VFS Full-scale voltage
VGS Gate-source voltage
Vin Input voltage
VLSB Voltage corresponding to the least significant bit
Voff Offset voltage
Vref Reference voltage
VT Threshold voltage
UT Thermal voltage
w Normal vector perpendicular to the hyperplane, weight
wi Cost of applying test stimuli performing test number i
W Channel width, Wiener process parameter vector, loss function
W*, L* Geometrical deformation due to manufacturing variations
x Vector of unknowns
xi Vectors of observations
x(t) Analog input signal
X Input, observability Gramian
y0 Arbitrary initial state of the circuit
y[k] Output digital signal
y Yield

Y Output, controllability Gramian


z0 Nominal voltages and currents
z(1−α) (1−α)-quantile of the standard normal distribution Z
z[k] Reconstructed output signal
Z Low rank Cholesky factor
α Neyman-Pearson significance level, weight vector of the training set
β Feedback factor, transistor current gain, bound
γ Noise excess factor, measurement correction factor, reference errors
i Iteration shift parameters
Relative mismatch
ε Error
Distributed random variable, forgetting factor
Random vector,
θ Die, unknown parameter vector, coefficients of mobility reduction
λp,n Eigenvalues of the covariance matrix
Converter transition code, subthreshold gate coupling coefficient
Threshold of significance level α, white noise process
Central value of the transition band
μ Carrier mobility, mean value, iteration step size
Fitting parameter estimated from the extracted data
Yield bound
ξ(t) Vector of independent Gaussian white noise sources
ξi Degree of misclassification of the data xi
ξn(θ) Vector of zero-mean uncorrelated Gaussian random variables
Correlation parameter reflecting the spatial scale of clustering
p Random vector accounting for device tolerances
σ Standard deviation
Un Measurement noise covariance
τ Time constant
Matrix of normal vectors
Set of all valid design variable vectors in design space
φ Clock phase, Mercer kernel
φT Thermal voltage at the actual temperature
Circuit performance function
r,f[.] Probability function
Relative deviation, yield constraint violation
r Boundaries of voltage of interest
Σ Covariance matrix
Ω Sampling space
Chapter 1
Introduction

Abstract  Continuous monitoring of physiological parameters (e.g., the monitoring
of stress and emotion, personal psychological analysis) enabled by brain-machine
interface (BMI) circuits is beneficial not only for chronic diseases, but also for the
detection of the onset of a medical condition and for preventive or therapeutic
measures. It is expected that the combination of ultra-low power sensor technology
and ultra-low power wireless communication technology will enable new biomedical
devices that will enhance our sensing ability and can also provide
prosthetic functions (e.g., cochlear implants, artificial retina, motor functions).
Practical multichannel BMI systems are combined with CMOS electronics for
long-term and reliable recording and conditioning of intra-cortical neural signals,
on-chip processing of the recorded neural data, and stimulation of the nervous
system in a closed-loop framework. To avoid the risk of infection, these systems are
implanted under the skin, while the recorded neural signals and the power required
for the implant operation are transmitted wirelessly. This migration, to allow
proximity between electrodes and circuitry and an increasing density in multichannel
electrode arrays, is, however, creating significant design challenges with respect to
circuit miniaturization and power dissipation reduction of the recording system.
Furthermore, the space to host the system is restricted to ensure minimal tissue
damage and tissue displacement during implantation. In this book, this design
problem is addressed at various abstraction levels, i.e., the circuit level and the
system level. The book therefore provides a broad view of the various solutions that
have to be used and their possible combination into very effective complementary
techniques.
Technology scaling, circuit topologies, architecture trends, (post-silicon) circuit
optimization algorithms, and a yield-constrained power-per-area minimization
framework specifically target the power-performance trade-off, from the spatial
resolution (i.e., number of channels), feasible wireless data bandwidth, and
information quality to the power delivered by implantable batteries.


1.1 Brain-Machine Interface: Circuits and Systems

The best way to predict the future is to invent it. Medicine in the twentieth century
relied primarily on pharmaceuticals that could chemically alter the action of neurons
or other cells in the body, but twenty-first century health care may be defined
more by electroceuticals: novel treatments that use pulses of electricity to
regulate the activity of neurons, or devices that interface directly with our nerves.
Systems such as brain-machine interfaces (BMI) detect the voltage changes in
the brain that occur when neurons fire to trigger a thought or an action, and they
translate those signals into digital information that is conveyed to a machine, e.g., a
prosthetic limb, a speech prosthesis, or a wheelchair.
Many promising technological advances are about to change our concept of
healthcare, as well as the provision of medical care. For example, telemedicine,
e-hospitals, and ubiquitous healthcare are enabled by emerging wireless broadband
communication technology. Although it first became mainstream in portable
devices such as notebook computers and smartphones, wireless communication
(e.g., wireless sensor networks, body sensor networks) is evolving toward wearable
and/or implantable solutions. The combination of two technologies, ultra-low
power sensor technology and ultra-low power wireless communication technology,
enables long-term continuous monitoring and feedback to medical professionals
wherever needed.
Neural prosthesis systems enable the interaction with neural cells either by
recording, to facilitate early diagnosis and predict intended behavior before under-
taking any preventive or corrective actions, or by stimulation, to prevent the onset
of detrimental neural activity. Monitoring the activity of a large population of neu-
rons in neurobiological tissue with high-density microelectrode arrays in multi-
channel implantable BMI is a prerequisite for understanding the cortical structures
and can lead to a better understanding of severe brain disorders, such as Alzheimer's
and Parkinson's diseases, epilepsy, and autism [1], or to the restoration of sensory
(e.g., hearing and vision) or motor (e.g., movement and speech) functions [2].
Metal-wire and micro-machined silicon neural probes, such as the Michigan
probe [3] or the Utah array [4], have aided the development of highly integrated
multichannel recording devices with large channel counts, enabling study of brain
activity and the complex processing performed by neural systems in vivo [5-7].
Several studies have demonstrated that the understanding of certain brain functions
can only be achieved by monitoring the electrical activity of large numbers of
individual neurons in multiple brain areas at the same time [8]. Consequently,
real-time acquisition from many parallel readout channels is needed both for the
successful implementation of neural prosthetic devices and for a better
understanding of fundamental neural circuits and connectivity patterns in the brain [9].
One of the main goals of the current neural probe technologies [10-21] is to
minimize the size of the implants while including as many recording sites as pos-
sible, with high spatial resolution. This enables the fabrication of devices that
match the feature size and density of neural circuits [22], and facilitates the spike

classification process [23, 24]. Because electrical recording from single neurons is
invasive, monitoring large numbers of neurons using large implanted devices
inevitably increases tissue damage; thus, there is a trade-off between the probe
size and the number of recording sites. Although existing neural probes can record
from many neurons, the limitations of interconnect technology constrain the
number of recording sites that can be routed out of the probe [8].
The study of highly localized neural activity requires, besides implantable
microelectrodes, electronic circuitry for accurately amplifying and conditioning
the signals detected at the recording sites. While neural probes have become more
compact and denser in order to monitor large populations of neurons, the inter-
facing electronic circuits have also become smaller and more capable of handling
large amounts of parallel recording channels. Some of the challenges in the design
of analog front-end circuits for neural recording are associated with the nature of
the neural signals. These signals have amplitudes on the order of a few μV to several
mV and frequency content spanning from dc to a few kHz. Local field potentials
(LFPs), representing the averaged activity of small sets of neurons surrounding the
recording sites, occupy the low-frequency range (~1-300 Hz). Action potentials
(APs), or spikes, representing single-cell activity, occupy the higher frequency
range (~300 Hz-10 kHz). Recording both LFPs and APs using
implanted electrodes yields the most informative signals for studying neuronal
communication and computation. Thus, according to the nature of a specific sig-
nal, the recording circuits have to be designed with sufficiently low input-referred
noise [i.e., to achieve a high signal-to-noise ratio (SNR)] and sufficient gain and
dynamic range.
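As a rough numerical sketch of this noise/gain/dynamic-range budget (the signal amplitude and input-referred noise figures below are illustrative assumptions, not values from the text):

```python
import math

# Illustrative numbers only (assumptions): a full-scale extracellular
# signal of 1 mV and an input-referred noise budget of 5 uVrms.
v_signal_max = 1e-3     # V, largest spike amplitude to capture
v_noise_in   = 5e-6     # Vrms, amplifier input-referred noise budget

# Achievable SNR at the front-end input, in dB.
snr_db = 20 * math.log10(v_signal_max / v_noise_in)

# Number of ADC bits beyond which quantization noise no longer matters
# (standard SNR = 6.02*N + 1.76 dB relation for an ideal quantizer).
n_bits = (snr_db - 1.76) / 6.02

print(f"front-end SNR : {snr_db:.1f} dB")   # ~46 dB
print(f"required ENOB : {n_bits:.1f} bits")
```

With these assumed numbers the noise floor, not the converter, sets the useful resolution at roughly 7-8 effective bits, which is consistent with the ~10-bit converters discussed later in the chapter.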
The raw data rates generated by simultaneous monitoring of hundreds or even
thousands of neurons are large [25]. When sampled at 32 kS/s with 10-bit precision,
100 electrodes generate a raw data rate of 32 Mb/s.
Communicating such volumes of neuronal data over battery-powered wireless
links, while maintaining reasonable battery life, is hardly possible with common
methods of low-power wireless communications. Evidently, some form of data
reduction or lossy data compression to reduce the raw waveform data capacity,
e.g., wavelet transform [26], must be applied. Alternatively, only significant fea-
tures of the neuronal signal could be extracted and the transmitted data could be
limited to those features only [8], which may lead to an order of magnitude reduc-
tion in the required data rate [27]. Additionally, if the neuronal spikes are sorted
on the chip [28], and mere notifications of spike events are transmitted to the host,
another order-of-magnitude reduction can be achieved. Adapting power-efficient
spike sorting algorithms for very-large-scale integration (VLSI) can lead to further
significant power savings, with only a limited accuracy loss [29, 30].
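The data-rate arithmetic above can be checked directly; the two successive reduction factors are taken loosely from the "order of magnitude" wording in the text, not from measured figures:

```python
# Raw data rate for the example in the text: 100 electrodes sampled at
# 32 kS/s with 10-bit precision, then two assumed order-of-magnitude
# reductions (feature extraction, then on-chip spike sorting).
channels   = 100
f_sample   = 32_000      # samples per second per channel
resolution = 10          # bits per sample

raw_rate_bps = channels * f_sample * resolution
print(f"raw rate            : {raw_rate_bps/1e6:.0f} Mb/s")  # 32 Mb/s

feature_rate_bps = raw_rate_bps / 10      # features only (~10x reduction)
event_rate_bps   = feature_rate_bps / 10  # spike-event notifications only
print(f"features only       : {feature_rate_bps/1e6:.1f} Mb/s")
print(f"sorted spike events : {event_rate_bps/1e6:.2f} Mb/s")
```

Even after both reductions, hundreds of kb/s must still fit within the wireless telemetry budget, which is why on-chip processing is emphasized throughout the book.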
The block diagram of an M-channel neural recording system is illustrated in
Fig. 1.1. With an increase in the range of applications and their functionalities,
neuroprosthetic devices are evolving into closed-loop control systems [31]
composed of a front-end neural recording interface and back-end neural signal
processing, containing features such as local field potential measurement circuits
[32] or spike detection circuits [33]. To avoid the risk of infection, these systems

Fig. 1.1  Block diagram of a brain-machine interface with M-channel front-end neural recording
interface (recording electrodes, low-noise amplifiers, band-pass filters, programmable-gain
amplifiers, SAR A/D converter) and back-end signal processing (DSP, D/A converter,
reconstruction filter, K-channel stimulator electrodes)

are implanted under the skin, while the recorded neural signals and the power
required for the implant operation are transmitted wirelessly. If a battery with an
energy capacity of 625 mAh at 1.5 V is used, a CMOS IC with 100 mW power
consumption can last for only nine and a half hours. Most implantable biomedical
devices, in contrast, should last more than 10 years, which limits the average
system power consumption (when using the same battery) to roughly 10 μW.
Proximity between electrodes and circuitry and the increasing density of
multichannel electrode arrays are creating significant design challenges with respect
to circuit miniaturization and power dissipation reduction of the recording system.
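A quick sanity check of the battery arithmetic above (a sketch; the 10-year figure assumes 365.25-day years):

```python
# Battery-life arithmetic from the text: a 625 mAh, 1.5 V battery.
capacity_mwh = 625 * 1.5           # 937.5 mWh of stored energy

# A 100 mW CMOS IC: lifetime in hours.
life_100mw_h = capacity_mwh / 100  # ~9.4 h, "nine and a half hours"

# Average power allowed for a 10-year implant lifetime on the same battery.
hours_10y = 10 * 365.25 * 24
p_avg_uw  = capacity_mwh / hours_10y * 1e3  # ~10.7 uW -> the ~10 uW budget

print(f"life at 100 mW : {life_100mw_h:.1f} h")   # ~9.4 h
print(f"10-year budget : {p_avg_uw:.1f} uW")
```

The four-orders-of-magnitude gap between 100 mW and ~10 μW is what drives the aggressive power optimization addressed in Chap. 5.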
Power density is limited to 0.8 mW/mm² [34] to prevent possible heat damage
to the tissue surrounding the device (limited power consumption also prolongs the
battery's longevity and avoids recurrent battery-replacement surgeries).
Furthermore, the space available to host the system is restricted to ensure minimal
tissue damage and tissue displacement during implantation.
The signal quality in the neural interface front-end, besides the specifics of the
electrode material and the electrode-tissue interface, is limited by the nature of the
bio-potential signal and its biological background noise, dictating system resource
constraints such as power, area, and bandwidth. The BMI architecture additionally
includes a micro-stimulation module to apply stimulation signals to the brain's
neural tissue. Currently, multi-electrode arrays contain 10-100s of electrodes and
are projected to double every seven years [35]. When a neuron fires an action
potential, the cell membrane becomes depolarized by the opening of voltage-controlled
ion channels, which leads to a flow of current both inside and outside
the neuron. Since the extracellular medium is resistive [36], the extracellular potential is
approximately proportional to the current across the neuron membrane [37]. The
membrane behaves roughly like an RC circuit, and most current flows through the
membrane capacitance [38].
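The dominance of the capacitive membrane current can be illustrated with a minimal RC calculation; the membrane resistance and capacitance below are illustrative textbook-scale assumptions (τ = RC ≈ 10 ms), not values from the text:

```python
import math

# Compare the capacitive and resistive admittances of a simple RC
# membrane model at spike-band frequencies. R_m and C_m are assumed,
# textbook-scale values for a small cell.
R_m = 100e6        # Ohm, membrane resistance
C_m = 100e-12      # F, membrane capacitance

f_spike = 1e3      # Hz, dominant spectral content of an action potential
w = 2 * math.pi * f_spike

# Ratio of capacitive to resistive current for the same membrane voltage.
ratio = (w * C_m) / (1 / R_m)
print(f"I_C/I_R at 1 kHz = {ratio:.0f}")  # >> 1: capacitance dominates
```

Because this ratio equals ωRC ≈ 63 at 1 kHz for the assumed values, the fast spike-band components of the membrane current indeed flow almost entirely through the membrane capacitance.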
The neural data acquired by the recording electrodes is conditioned using
analog circuits. The electrode is characterized by its charge density and impedance
characteristics (e.g., a 36 μm diameter probe (1000 μm²) may have a capacitance
of 200 pF, equivalent to 80 kΩ impedance at 10 kHz), which determine the
amount of noise added to the signal (e.g., 7 μVrms for a 10 kHz recording
bandwidth). As a result of the small amplitude of neural signals (typically ranging from
10 to 500 μV and containing data up to ~10 kHz) and the high impedance of
the electrode-tissue interface, low-noise amplification (LNA), band-pass filtering,

and programmable-gain amplification (PGA) of the neural signals are performed
before the signals can be digitized by an analog-to-digital converter. The amplifiers
offer high gain (an LNA gain of about 100 and a PGA gain in the range of 10-20)
without degrading the signal linearity. To keep the overall bandwidth constant
when the bias current of the gain stage is varied, a band-pass filter [39] is added at
the output of the LNA. The configurable A/D converter sets the numerical accuracy
of the subsequent spike processing part. A 100-channel, 10-bit digitization of raw
neural waveforms sampled at 32 kHz generates 32 Mb/s of data; the power costs of
signal conditioning, quantization, and wireless communication all scale with this
high data rate. Feature extraction and spike classification significantly reduce
the data requirements prior to transmission (in multichannel systems, the raw
data rate is substantially higher than the limited bandwidth of the wireless
telemetry). The A/D converter output, containing the time-multiplexed neural signals, is
fed to a back-end signal processing unit, which provides additional filtering and
executes spike detection [40]. After feature extraction and spike classification, the
relevant information is utilized for K-channel brain stimulation in a closed-loop
framework or, alternatively, transmitted to an outside receiver for offline
processing. The circuit is powered through wireless power transfer links to avoid
large-capacity batteries or skin-penetrating wires.
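A short sketch checking the electrode-impedance figure quoted above and the resulting signal swing at the converter input, assuming (as the text's gain figures suggest) an LNA gain of roughly 100 followed by a PGA gain of 10-20:

```python
import math

# Electrode impedance check: a 200 pF electrode capacitance at 10 kHz
# gives |Z| = 1/(2*pi*f*C) ~ 80 kOhm, as quoted in the text.
C_e = 200e-12                      # F
f   = 10e3                         # Hz
z_mag = 1 / (2 * math.pi * f * C_e)
print(f"|Z| at 10 kHz : {z_mag/1e3:.1f} kOhm")   # ~79.6 kOhm

# Gain chain: a large 500 uV extracellular spike through LNA (~100x)
# and PGA (10-20x) maps onto a ~0.5-1 V ADC input range.
v_in = 500e-6                      # V
for pga_gain in (10, 20):
    v_adc = v_in * 100 * pga_gain
    print(f"PGA x{pga_gain}: ADC input swing = {v_adc:.1f} V")
```

The ~80 kΩ source impedance is why the first stage must be a dedicated low-noise amplifier rather than a direct connection to the sampling network of the converter.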
The analog-to-digital interface circuit is keenly sensitive to technology scaling.
Achieving high linearity, high dynamic range, and high sampling speed
simultaneously under the low supply voltages of deep-submicron CMOS technology,
and with low power consumption, has thus far been extremely challenging.
The impact of random dopant fluctuation is exhibited through large VT variation
and accounts for most of the variations observed in analog circuits, where
systematic variation is small and random uncorrelated variation can cause mismatch
(the stochastic fluctuation of parameters between nominally identical devices is
often referred to as "matching") that results in reduced noise margins. In general,
to cope with the degradation in device properties, several design techniques have
been applied, starting with manual trimming in the early days, followed by analog
techniques such as chopper stabilization, auto-zeroing (correlated double sampling),
dynamic element matching, dynamic current mirrors, and current copiers.
Nowadays, digital signal-correction processing is exploited to compensate for
signal impairments created by analog device imperfections at both the block and
system level [41] (Fig. 1.2). System-level correction uses system knowledge to
improve or simplify block-level correction tasks. In contrast, block-level correction
refers to the improvement of the overall performance of a particular block
in the system. In mixed-signal blocks, due to additional digital post- or pre-processing,
the boundaries between analog and digital signal processing become blurred.
Because of the increasing analog/digital performance gap and the flexibility of
digital circuits, performance-supporting digital circuits are an intrinsic part of
mixed-signal and analog circuits. In this approach, integration density and long-term
storage are the attributes that create a resilient solution with better power and area
efficiency. Additionally, it allows us to break away
from the (speed-degrading) device area increase traditionally associated with the

Fig. 1.2 a Correction approach for mixed-signal and analog circuits, b mixed-signal solution (digital error estimation, analog error correction), c alternative mixed-signal scheme (error estimation and correction are done digitally)

demand for reduced circuit offset. Initial work on digital signal-correction process-
ing started in the early nineties, and focused on offset attenuation or dispersion.
The next priority became area scaling for analog functions, to keep up with the
pace at which digital cost-per-function was reducing [42]. Lately, the main focus
is on correcting analog device characteristics, which became impaired as a result
of aggressive feature size reduction and area scaling. However, efficient digital sig-
nal-correction processing of analog circuits is only possible if their analog behav-
ior is sufficiently well characterized. As a consequence, an appropriate model, as
well as its corresponding parameters, has to be identified. The model is based on a
priori knowledge about the system. The key parameters that influence the system
and their time behavior are typical examples. Nevertheless, in principle, the model
itself can be derived and modified adaptively, which is the central topic of adap-
tive control theory. The parameters of the model can be tuned during the fabrica-
tion of the chip or during its operation. Since fabrication-based correction methods
are limited, algorithms that adapt to a nonstationary environment during operation
have to be employed.
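As a concrete sketch of such an adaptive approach (hypothetical, not the book's implementation; the model, step size, and device values are assumed), a least-mean-squares loop can track the gain and offset of an analog front-end from a known reference signal during operation, which a one-time fabrication trim cannot do in a nonstationary environment:

```python
# Hypothetical LMS adaptation of a linear error model y = g*x + b,
# identifying the gain g and offset b of an analog block from reference data.
def lms_calibrate(x_ref, y_meas, mu=0.1, steps=5000):
    g, b = 1.0, 0.0                                 # initial model parameters
    for t in range(steps):
        x = x_ref[t % len(x_ref)]
        e = y_meas[t % len(y_meas)] - (g * x + b)   # prediction error
        g += mu * e * x                             # gradient-descent updates
        b += mu * e
    return g, b

# Device with an assumed true gain of 0.9 and offset of 0.02:
xs = [0.1, -0.2, 0.3, -0.1, 0.25]
ys = [0.9 * x + 0.02 for x in xs]
g, b = lms_calibrate(xs, ys)
print(f"estimated gain = {g:.3f}, offset = {b:.3f}")
```

Once g and b are identified, the digital post-processor can invert the model; if the environment drifts, the loop simply keeps tracking.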

1.2 Remarks on Current Design Practice

In this section, the most challenging design issues for analog circuits in deep sub-
micron technologies are reviewed, such as the degradation of analog performance
caused by the requirement for biasing at lower operating voltages, obtaining high
dynamic range with low supply voltages, and ensuring good matching for low
offset. Additionally, the subsequent remedies that improve the performance of
analog circuits and data converters by correcting or calibrating the static, and
possibly the dynamic, limitations through calibration techniques are briefly
discussed.

Fig. 1.3 a Trend of analog features in CMOS technologies. b Gain-bandwidth product versus drain current in two technological nodes

From an integration point of view, the analog electronics must be realized on
the same die as the digital core and consequently must cope with the CMOS evo-
lution dictated by the digital circuits. Technology scaling (Fig. 1.3a) offers a sig-
nificant lowering of the cost of digital logic and memory. To ensure a sufficient
lifetime for the digital circuitry and to keep power consumption at an acceptable
level, the dimension reduction is accompanied by a lowering of the nominal supply
voltages. Due to the reduction of the supply voltage, the available signal swing is
lowered, fundamentally limiting the achievable dynamic range at reasonable power
consumption levels. Additionally, lower supply voltages require biasing at lower
operating voltages, which results in worse transistor properties, and hence yields
circuits with lower performance. Achieving high linearity, high sampling speed,
and high dynamic range with low supply voltages and low power dissipation in
ultra-deep submicron CMOS technology is a major challenge. The key limitation of
analog circuits is that they operate with electrical variables, and not simply with
discrete numbers which, in circuit implementations, give rise to a beneficial noise
margin. On the contrary, the accuracy of analog circuits fundamentally relies on
matching between components, low noise, low offset, and low distortion.
With the reduction of the supply voltage, even if the number of stacked transistors
is kept at the minimum needed to ensure a suitable overdrive voltage for keeping
the transistors in saturation, the signal swing is low if high resolution is required.
A low supply voltage is also problematic for driving CMOS switches, especially
those connected to signal nodes, as the on-resistance can become very high or, in
the limit, the switch does not close at all in some interval of the input amplitude.
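The last point can be illustrated with a minimal square-law sketch (the supply, threshold, and device constant below are assumed values, not taken from the book): the on-resistance of an NMOS pass switch diverges as the input level approaches VDD − VT.

```python
# Illustrative sketch (assumed square-law device values): the on-resistance of
# an NMOS pass switch grows without bound as the input approaches VDD - VT,
# which is why low supplies are problematic for switches at signal nodes.
def switch_ron(vin, vdd=1.0, vt=0.35, beta=1e-3):   # beta = u*Cox*W/L [A/V^2]
    vov = vdd - vin - vt           # gate overdrive of the switch transistor
    if vov <= 0:
        return float('inf')        # the switch no longer closes for this input level
    return 1.0 / (beta * vov)      # triode-region channel resistance

for vin in (0.0, 0.3, 0.6, 0.7):
    print(f"Vin = {vin:.2f} V -> Ron = {switch_ron(vin):.0f} ohm")
```

This is the motivation for bootstrapped switches and complementary (transmission-gate) switches, which keep the overdrive roughly constant across the input range.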
In general, to achieve high-gain operation, high output impedance is neces-
sary, i.e., the drain current should vary only slightly with the applied VDS. With
transistor scaling, the drain asserts its influence more strongly due to the growing
proximity of the gate and drain connections, increasing the sensitivity of the drain
current to the drain voltage. The rapid degradation of the output resistance at gate
lengths below 0.1 μm and the saturation of gm reduce the device intrinsic gain
gm·ro.
As transistor size is reduced, the fields in the channel increase and the dopant
impurity levels increase. Both changes reduce the carrier mobility, and hence the
transconductance gm. Typically, the desired high transconductance value is obtained
at the cost of an increased bias current. However, for very short channels the car-
rier velocity quickly reaches the saturation limit, at which the transconductance
also saturates, becoming independent of gate length or bias, gm = WeffCoxvsat/2.
As channel lengths are reduced without a proportional reduction in drain voltage,
raising the electric field in the channel, the result is velocity saturation of the
carriers, limiting the current and the transconductance. A limited transconduct-
ance is problematic for analog design: obtaining high gain requires
wide transistors at the cost of increased parasitic capacitances and, con-
sequently, limitations in bandwidth and slew rate. Even using longer lengths,
obtaining gain with deep submicron technologies is not adequate; it is typi-
cally necessary to use cascode structures with stacks of transistors, or circuits with
positive feedback. As transistor dimension reduction continues, the intrinsic gain
keeps decreasing due to a lower output resistance as a result of drain-induced bar-
rier lowering and hot-carrier impact ionization. To make devices smaller, junction
design has become more complex, leading to higher doping levels, shallower junc-
tions, halo doping, etc., all to decrease drain-induced barrier lowering. To keep
these complex junctions in place, the annealing steps formerly used to remove
damage and electrically active defects must be curtailed, increasing junction leak-
age. Heavier doping is also associated with thinner depletion layers and more
recombination centers that result in increased leakage current, even without lat-
tice damage. In addition, gate leakage currents in very thin-oxide devices will set
an upper bound on the attainable effective output resistance via circuit techniques
(such as active cascode). Similarly, as scaling continues, the elevated drain-to-
source leakage in an off-switch can adversely affect the switch performance. If the
switch is driven by an amplifier, the leakage may lower the output resistance of the
amplifier, hence limiting its low-frequency gain.
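A quick numerical sketch (the gm and ro values below are assumed, for illustration only) shows why the cascode structures mentioned above are used to recover the lost intrinsic gain: stacking a cascode transistor multiplies the output resistance by roughly another factor of gm·ro, at the cost of voltage headroom.

```python
import math

# Hypothetical numbers: intrinsic gain gm*ro of a short-channel device versus
# the gain recovered by stacking a cascode, which multiplies the output
# resistance by roughly another gm*ro at the cost of voltage headroom.
gm = 1e-3        # transconductance, 1 mS (assumed)
ro = 20e3        # output resistance of a scaled device, 20 kOhm (assumed)

simple_gain = gm * ro               # single transistor
cascode_rout = gm * ro * ro         # cascode boosts Rout by about gm*ro
cascode_gain = gm * cascode_rout

print(f"simple: {20 * math.log10(simple_gain):.1f} dB, "
      f"cascode: {20 * math.log10(cascode_gain):.1f} dB")
```

With these assumed values the single device provides only about 26 dB of gain, while the cascode recovers about 52 dB; the trade-off is the extra VDS drop of the stacked device, which is exactly what shrinking supplies make hard to afford.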
Low distortion at quasi-dc frequencies is relevant for many analog circuits.
Typically, quasi-dc distortion may be due to the variation of the depletion layer
width along the channel, mobility reduction, velocity saturation, and nonlineari-
ties in the transistors' transconductances and output conductances, which
are heavily dependent on biasing, size, and technology, and typically see large volt-
age swings. With scaling, higher harmonic components may increase in amplitude
despite the smaller signal; the distortion increases significantly. At the circuit level,
the degraded quasi-dc performance can be compensated by techniques that boost gain,
such as (regulated) cascodes. These are, however, harder to fit within decreasing
supply voltages. Other solutions include a more aggressive reduction of signal
magnitude, which requires a higher power consumption to maintain SNR levels.
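The amplitude dependence of this distortion can be made explicit with a simple polynomial sketch (the model and coefficient values are assumed, not from the book): for a weakly nonlinear transconductor i = a1·v + a3·v³, the third-harmonic distortion is HD3 = (a3/a1)·A²/4, so halving the signal swing lowers HD3 by 12 dB, which is the reasoning behind trading swing for power.

```python
# Sketch with an assumed third-order polynomial nonlinearity:
# i = a1*v + a3*v^3 gives a third harmonic of amplitude (a3/4)*A^3,
# hence HD3 = (a3/a1) * A^2 / 4, growing with the square of the swing.
def hd3(a1, a3, amplitude):
    return (a3 / a1) * amplitude**2 / 4.0

a1, a3 = 1e-3, 2e-3           # assumed coefficients [A/V, A/V^3]
for amp in (0.05, 0.1, 0.2):  # halving the swing lowers HD3 by a factor of 4
    print(f"A = {amp} V -> HD3 = {hd3(a1, a3, amp):.2e}")
```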
The theoretically highest gain-bandwidth of an operational transconductance
amplifier (OTA) is essentially determined by the cutoff frequency of the transistor
(see Fig. 1.3b for an assessment of the GBW in two technological nodes). Assuming
that the kT/C noise limit establishes the value of the load capacitance, a large
transconductance is required to achieve the required SNR. Accordingly, the aspect
ratio necessary for the input differential pair must be fairly large, in the hundreds.
Similarly, since with scaling the gate oxide becomes thinner, the specific
capacitance Cox increases with the scaling factor. However, since the gate area
decreases as the square of the scaling factor, the gate-to-source and gate-to-drain
parasitic capacitances lower as the process is scaled. The coefficients for the
parasitic input and output capacitances, Cgs and Cgd, shown in Fig. 1.4a have been
obtained by simulation for conventional foundry processes under the assumption
that the overdrive voltage is 0.175 V. Similarly, with technology scaling the actual
junctions become shallower, roughly proportional to the technology feature size.
Also, the junction area roughly scales in proportion to the minimum gate length,
while the doping level increase does not significantly increase the capacitance per
area. Altogether this leads to a significantly reduced junction capacitance per gm
with newer technologies. Reducing transistor parasitic capacitance is desired;
however, the benefit is contrasted by the increased parasitic capacitance of the
interconnections (the capacitance of the wires connecting different parts of the
chip). With transistors becoming smaller and more transistors being placed on the
chip, interconnect capacitance is becoming a large percentage of the total
capacitance.
The global effect is that analog circuits do not benefit fully from scaling in
terms of speed, as the position of the nondominant poles is largely unchanged.
Additionally, with the reduced signal swing, the signal capacitance has to increase
proportionally to achieve the required SNR. By examining Fig. 1.4b, it can be seen
that the characteristic exhibits a convex curve and takes its highest value at a
certain sink current (region b).

Fig. 1.4 a Scaling of gate width and transistor capacitances. b Conversion frequency fc versus drain current for four technological nodes


In the region where the current is less than this value (region a), the conversion
frequency increases with an increase of the sink current. Similarly, in the region
where the current is higher than this value (region c), the conversion frequency
decreases with an increase of the sink current. There are two reasons for this
characteristic: in the low-current region, gm is proportional to the sink current,
and the parasitic capacitances are smaller than the signal capacitance. Around
the peak, at least one of the parasitic capacitances becomes equal to the signal
capacitance. In the region where the current is larger than that value, both
parasitic capacitances become larger than the signal capacitance, and the conver-
sion frequency decreases with an increase of the sink current.
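The kT/C argument above can be put into numbers with a back-of-the-envelope sketch (the swing, SNR target, and GBW below are assumed values, chosen only to illustrate the scaling): the noise budget fixes the load capacitance, and that capacitance in turn fixes the transconductance, and hence the bias current, that the OTA must provide.

```python
import math

# Back-of-the-envelope sketch: the kT/C limit sets the load capacitance for a
# target SNR and swing, and that capacitance sets the gm needed for a given
# OTA gain-bandwidth (GBW = gm / (2*pi*CL) for a single-stage OTA).
k, T = 1.380649e-23, 300.0     # Boltzmann constant [J/K], temperature [K]
v_swing_rms = 0.1              # assumed rms signal swing [V]
snr_db = 70.0                  # assumed target SNR
gbw_hz = 100e6                 # assumed target GBW

noise_rms = v_swing_rms / 10**(snr_db / 20)   # allowed sqrt(kT/C) noise
c_load = k * T / noise_rms**2                 # capacitance from the kT/C limit
gm = 2 * math.pi * gbw_hz * c_load            # required transconductance

print(f"CL = {c_load*1e12:.2f} pF, gm = {gm*1e3:.2f} mS")
```

With these assumptions the load works out to a few picofarads and the required gm to a few millisiemens, which is precisely why the input pair needs an aspect ratio in the hundreds, and why the reduced swing of scaled supplies (forcing a larger C for the same SNR) costs power.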
The offset of any analog circuit and the static accuracy of data converters criti-
cally depend on the matching between nominally identical devices. With transis-
tors becoming smaller, the number of atoms in the silicon that produce many of
the transistor's properties is becoming fewer, with the result that control of dopant
numbers and placement is more erratic. During chip manufacturing, random pro-
cess variations affect all transistor dimensions: length, width, junction depths,
oxide thickness, etc., and become a greater percentage of overall transistor size
as the transistor scales. The stochastic nature of physical and chemical fabrica-
tion steps causes a random error in electrical parameters that gives rise to a time-
independent difference between equally designed elements. The error typically
decreases as the area of the devices increases. Transistor matching properties are
improved with a thinner oxide [43]. Nevertheless, when the oxide thickness is
reduced to a few atomic layers, quantum effects will dominate and matching will
degrade. Since many circuit techniques exploit the equality of two components, it
is important to obtain, for a given process, the best matching, especially for critical
devices. Some of the rules that have to be followed to ensure good matching are:
firstly, devices to be matched should have the same structure and use the same
materials; secondly, the temperature of matched components should be the same,
i.e., the devices to be matched should be located on the same isotherm, which is
obtained by symmetrical placement with respect to the dissipative devices; thirdly,
the distance between matched devices should be minimal, for the maximum spatial
correlation of fluctuating physical parameters, and common-centroid geometries
should be used to cancel the gradient of parameters at the first order. Similarly,
the orientation of devices on chip should be the same to eliminate asymmetries
due to anisotropic fabrication steps, or to the anisotropy of the silicon itself, and
lastly, the surroundings in the layout, possibly improved by dummy structures,
should be the same to avoid border mismatches.
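The area dependence noted above is captured by the Pelgrom model [43], in which the standard deviation of the threshold-voltage difference of a matched pair scales with the inverse square root of the gate area. The sketch below uses an assumed technology coefficient A_VT (the value is illustrative, not from the book):

```python
import math

# Sketch of the Pelgrom mismatch model [43]: sigma(dVT) = A_VT / sqrt(W*L).
# The coefficient A_VT (in mV*um) is an assumed, technology-dependent value.
def sigma_dvt(w_um, l_um, a_vt_mv_um=3.5):
    return a_vt_mv_um / math.sqrt(w_um * l_um)   # sigma(dVT) in mV

for w, l in ((1.0, 0.06), (2.0, 0.12), (8.0, 0.48)):
    print(f"W/L = {w}/{l} um -> sigma(dVT) = {sigma_dvt(w, l):.2f} mV")
```

Quadrupling the gate area halves the threshold mismatch, which is the fundamental reason why low-offset analog circuits cannot simply ride the digital area-scaling curve, and why digital correction of mismatch is so attractive.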
The use of digital enhancing techniques in A/D converters (i.e., foreground,
background) reduces the need for expensive technologies with special fabrication
steps; a side advantage is that the cost of parts is reduced while maintaining good
yield, reliability, and long-term stability. Foreground calibration interrupts the
normal operation of the converter to perform the trimming of elements or the
mismatch measurement in a dedicated calibration cycle, normally performed at
power-on or during periods of inactivity of the circuit. Any miscalibration or sud-
den environmental changes, such as power supply or temperature, may make the
measured errors invalid. Therefore, for devices that operate for long periods it is
necessary to have periodic extra calibration cycles. The input switch restores the
data converter to normal operation after the mismatch measurement, and in every
conversion period the logic uses the output of the A/D converter to properly
address the memory that contains the correction quantity. In order to optimize the
memory size, the stored data should be of the minimum word-length, which depends
on the technology accuracy and the expected A/D linearity. The digital measure of
errors, which allows for calibration by digital signal processing, can be at the
element, block, or entire converter level. The calibration parameters are stored in
memories but, in contrast with the trimming case, the content of the memories is
frequently used, as it forms the input of the digital processor.
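The foreground scheme just described can be sketched in a few lines (a hypothetical toy model, not the book's implementation: the converter, its error, and the memory addressing are all assumed for illustration): a calibration cycle measures the code-dependent error into a small memory, and during normal conversions the raw output code addresses that memory to fetch the correction quantity.

```python
# Hypothetical sketch of foreground calibration with a correction memory:
# a dedicated cycle measures the error per raw output code, and normal
# conversions add the stored correction addressed by the raw code.
def measure_errors(adc, ideal, inputs):
    # Calibration cycle (normal operation interrupted): store the correction
    # needed, addressed by the raw code the converter actually produced.
    return {adc(x): ideal(x) - adc(x) for x in inputs}

def corrected(adc, memory, x):
    raw = adc(x)                      # normal conversion
    return raw + memory.get(raw, 0)   # digital correction via memory lookup

adc = lambda x: x + 1                 # toy converter with a +1 LSB offset error
ideal = lambda x: x
memory = measure_errors(adc, ideal, range(8))
print([corrected(adc, memory, x) for x in range(8)])  # -> [0, 1, 2, 3, 4, 5, 6, 7]
```

The memory word-length needed per code is set by the largest expected error, which is the point made above about minimizing the stored data to the technology accuracy and the expected linearity.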

Methods using background calibration work during the normal operation of the
converter, using extra circuitry that functions all the time, synchronously with
the converter. Often these circuits use hardware redundancy to perform
a background calibration on the fraction of the architecture that is not temporarily
in use. However, since the use of redundant hardware is effective but costs silicon
area and power consumption, other methods aim at obtaining the functionality by
borrowing a small fraction of the sampled-data circuit operation for performing the
self-calibration.

1.3 Motivation

Healthcare and health-assisting devices, as well as the medical care enabled by
these devices, will provide an unprecedented level of care during each person's life.
Continuous monitoring of physiological parameters (e.g., the monitoring of stress
and emotion, personal psychological analysis) enabled by BMI circuits is not only
beneficial for chronic diseases, but also for detection of the onset of a medical con-
dition and for preventive or therapeutic measures. Long-term data collection also
assists a more exact diagnosis. For non-chronic illnesses, it can assist the rehabili-
tation of patients. It is expected that these new biomedical devices will be able to
enhance our sensing ability, and can also provide prosthetic functions (e.g., coch-
lear implants, artificial retina, motor functions).
Practical multichannel BMI systems are combined with CMOS electronics for
long-term and reliable recording and conditioning of intra-cortical neural signals,
on-chip processing of the recorded neural data, and stimulation of the nervous sys-
tem in a closed-loop framework. To evade the risk of infection, these systems are
implanted under the skin, while the recorded neural signals and the power required
for the implant operation are transmitted wirelessly. This migration, to allow prox-
imity between electrodes and circuitry, and the increasing density of multichannel
electrode arrays are, however, creating significant design challenges with respect to
circuit miniaturization and power dissipation reduction of the recording system.
Power density is limited to 0.8 mW/mm² to prevent possible heat damage to the
tissue surrounding the device (subsequently, limited power consumption prolongs
the battery's longevity and evades recurrent battery-replacement surgeries).
Furthermore, the space to host the system is restricted to ensure minimal tissue
damage and tissue displacement during implantation.
In this book, this problem is addressed at various abstraction levels, i.e., circuit
level and system level. It therefore provides a broad view on the various solutions
that have to be used and their possible combination in very effective complemen-
tary techniques. Technology scaling, circuit topologies, architecture trends, (post-
silicon) circuit optimization algorithms and yield-constrained, power-per-area
minimization framework specifically target power-performance trade-off, from the
spatial resolution (i.e., number of channels), feasible wireless data bandwidth and
information quality to the delivered power of implantable batteries.

1.4 Organization of the Book

In Chap. 2, we present a low-power neural signal conditioning system with a
capacitive-feedback low-noise amplifier and a capacitive-attenuation band-pass
filter. The capacitive-feedback amplifier offers a low-offset and low-distortion
solution with an optimal power-noise trade-off. Similarly, the capacitive-attenuation
band-pass filter provides a wide tuning range and a low-power realization, while
allowing simple extension of the transconductor's linear range, and consequently,
ensuring low harmonic distortion. The low-noise amplifier and band-pass filter
circuits are realized in a 65 nm CMOS technology, and consume 1.15 μW and
390 nW, respectively. The fully differential low-noise amplifier achieves 40 dB
closed-loop gain and occupies an area of 0.04 mm². The input-referred noise is
3.1 μVrms over the operating bandwidth of 0.1-20 kHz. Distortion is below 2%
total harmonic distortion (THD) for typical extracellular neural signals (smaller
than 10 mV peak-to-peak). The capacitive-attenuation band-pass filter with
first-order slopes achieves 65 dB dynamic range, 210 mVrms at 2% THD, and
140 μVrms total integrated output noise.
In Chap. 3, we present several A/D converter realizations in the voltage, current,
and time domain, respectively, suitable for multichannel neural signal process-
ing, and we evaluate the trade-off between noise, speed, and power dissipation at
the circuit-architecture level. This approach provides the key insight required to
address the SNR, response time, and linearity of the physical electronic interface.
The voltage-domain SAR A/D converter combines the functionalities of a
programmable-gain stage and analog-to-digital conversion, occupies an area of
0.028 mm², and consumes 1.1 μW of power at a 100 kS/s sampling rate. The
current-mode successive approximation A/D converter is realized in a 65 nm
CMOS technology, and consumes less than 367 nW at 40 kS/s, corresponding to
a figure of merit of 14 fJ/conversion-step, while operating from a 1 V supply. A
time-based, programmable-gain A/D converter allows for an easily scalable and
power-efficient implantable biomedical recording system. The time-domain
converter circuit is realized in a 90 nm CMOS technology, operates at 640 kS/s,
occupies an area of 0.022 mm², and consumes less than 2.7 μW, corresponding
to a figure of merit of 6.2 fJ/conversion-step.
In Chap. 4, we present a 128-channel, programmable, neural spike classifier
based on nonlinear energy operator spike detection and multiclass kernel support
vector machine classification that is able to accurately identify overlapping neural
spikes even at low SNR. For efficient algorithm execution, we transform the mul-
ticlass problem with Kesler's construction and extend the iterative greedy optimi-
zation reduced set vectors approach with a cascaded method. The power-efficient,
multichannel clustering is achieved by a combination of several algorithm and
circuit techniques, namely, Kesler's transformation, a boosted cascade reduced
set vectors approach, two-stage pipeline processing units, power-scalable
kernels, register-bank memory, high-VT devices, and a near-threshold sup-
ply. The results obtained in a 65 nm CMOS technology show that efficient,
large-scale neural spike data classification can be obtained with a low-power (less
than 41 μW, corresponding to a power density of 15.5 μW/mm²), compact, and
low resource usage structure (31k logic gates resulting in a 2.64 mm² area).
In Chap. 5, we develop a yield-constrained, sequential power-per-area (PPA)
minimization framework based on a dual quadratic program that is applied to mul-
tivariable optimization in neural interface design under bounded process variation
influences. In the proposed algorithm, we create a sequence of minimizations of
the feasible PPA regions with iteratively generated low-dimensional subspaces,
while accounting for the impact of area scaling. With a two-step estimation flow,
the constrained multi-criteria optimization is converted into an optimization with
a single objective function, and repeated estimation of non-critical solutions is
evaded. Consequently, the yield constraint is only active as the optimization con-
cludes, eliminating the problem of overdesign in the worst-case approach. The PPA
assignment is interleaved, at any design point, with the configuration selection,
which optimally redistributes the overall index of circuit quality to minimize the
total PPA ratio. The proposed method can be used with any variability model and,
subsequently, any correlation model, and is not restricted by any particular perfor-
mance constraint. The experimental results, obtained on multichannel neural
recording interface circuits implemented in a 90 nm CMOS technology, demonstrate
power savings of up to 26% and area savings of up to 22%, without yield penalty.
In Chap. 6 the main conclusions are summarized and recommendations for fur-
ther research are presented.

References

1. G. Buzsáki, Large-scale recording of neuronal ensembles. Nat. Neurosci. 7, 446-451 (2004)
2. F.A. Mussa-Ivaldi, L.E. Miller, Brain-machine interfaces: computational demands and clinical needs meet basic neuroscience. Trends Neurosci. 26(6), 329-334 (2003)
3. Q. Bai, K.D. Wise, D.J. Anderson, A high-yield microassembly structure for three-dimensional microelectrode arrays. IEEE Trans. Biomed. Eng. 47(3), 281-289 (2000)
4. E.M. Maynard, C.T. Nordhausen, R. Normann, The Utah intracortical electrode array: a recording structure for potential brain-computer interfaces. Electroencephalogr. Clin. Neurophysiol. 102, 228-239 (1997)
5. A.B. Schwartz, Cortical neural prosthetics. Annu. Rev. Neurosci. 27, 487-507 (2004)
6. M. Nicolelis, Actions from thoughts. Nature 409, 403-407 (2001)
7. M. Black, M. Serruya, E. Bienenstock, Y. Gao, W. Wu, J. Donoghue, Connecting brains with machines: the neural control of 2D cursor movement, in Proceedings of the IEEE International Conference on Neural Engineering, pp. 580-583, 2003
8. G. Buzsáki, Large-scale recording of neuronal ensembles. Nat. Neurosci. 7(5), 446-451 (2004)
9. J. Csicsvari et al., Massively parallel recording of unit and local field potentials with silicon-based electrodes. J. Neurophysiol. 90(2), 1314-1323 (2003)
10. P.K. Campbell et al., A silicon-based, three-dimensional neural interface: manufacturing processes for an intracortical electrode array. IEEE Trans. Biomed. Eng. 38(8), 758-768 (1991)
11. R.H. Olsson, K.D. Wise, A three-dimensional neural recording microsystem with implantable data compression circuitry. IEEE J. Solid-State Circ. 40(12), 2796-2804 (2005)
12. R.H. Olsson et al., Band-tunable and multiplexed integrated circuits for simultaneous recording and stimulation with microelectrode arrays. IEEE Trans. Biomed. Eng. 52(7), 1303-1311 (2005)
13. T.J. Blanche, M.A. Spacek, J.F. Hetke, N.V. Swindale, Polytrodes: high-density silicon electrode arrays for large-scale multiunit recording. J. Neurophysiol. 93(5), 2987-3000 (2005)
14. R.J. Vetter et al., Development of a microscale implantable neural interface (MINI) probe system, in Proceedings of the International Conference of Engineering in Medicine and Biology Society, pp. 7341-7344, 2005
15. G.E. Perlin, K.D. Wise, An ultra compact integrated front end for wireless neural recording microsystems. J. Microelectromech. Syst. 19(6), 1409-1421 (2010)
16. P. Ruther et al., Compact wireless neural recording system for small animals using silicon-based probe arrays, in Proceedings of the International Conference of Engineering in Medicine and Biology Society, pp. 2284-2287, 2011
17. T. Torfs et al., Two-dimensional multi-channel neural probes with electronic depth control. IEEE Trans. Biomed. Circ. Syst. 5(5), 403-412 (2011)
18. U.G. Hofmann et al., A novel high channel-count system for acute multisite neuronal recordings. IEEE Trans. Biomed. Eng. 53(8), 1672-1677 (2006)
19. P. Norlin et al., A 32-site neural recording probe fabricated by DRIE of SOI substrates. J. Micromech. Microeng. 12(4), 414 (2002)
20. J. Du et al., Multiplexed, high density electrophysiology with nanofabricated neural probes. PLoS ONE 6(10), e26204 (2011)
21. K. Faligkas, L.B. Leene, T.G. Constandinou, A novel neural recording system utilising continuous time energy based compression, in Proceedings of the International Symposium on Circuits and Systems, pp. 3000-3003, 2015
22. J.T. Robinson, M. Jorgolli, H. Park, Nanowire electrodes for high-density stimulation and measurement of neural circuits. Front. Neural Circ. 7(38) (2013)
23. C.M. Gray, P.E. Maldonado, M. Wilson, B. McNaughton, Tetrodes markedly improve the reliability and yield of multiple single-unit isolation from multi-unit recordings in cat striate cortex. J. Neurosci. Methods 63(1-2), 43-54 (1995)
24. K.D. Harris, D.A. Henze, J. Csicsvari, H. Hirase, G. Buzsáki, Accuracy of tetrode spike separation as determined by simultaneous intracellular and extracellular measurements. J. Neurophysiol. 84(1), 401-414 (2000)
25. R.R. Harrison, A low-power integrated circuit for adaptive detection of action potentials in noisy signals, in Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 3325-3328, 2003
26. K. Oweiss, K. Thomson, D. Anderson, A systems approach for real-time data compression in advanced brain-machine interfaces, in Proceedings of the IEEE International Conference on Neural Engineering, pp. 62-65, 2005
27. Y. Perelman, R. Ginosar, An analog front-end for a multichannel neuronal recording system with spike and LFP separation. J. Neurosci. Methods 153, 21-26 (2006)
28. Z.S. Zumsteg et al., Power feasibility of implantable digital spike-sorting circuits for neural prosthetic systems, in Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 4237-4240, 2004
29. A. Zviagintsev, Y. Perelman, R. Ginosar, Low power architectures for spike sorting, in Proceedings of the IEEE International Conference on Neural Engineering, pp. 162-165, 2005
30. A. Zviagintsev, Y. Perelman, R. Ginosar, Low power spike detection and alignment algorithm, in Proceedings of the IEEE International Conference on Neural Engineering, pp. 317-320, 2005
31. B. Gosselin, Recent advances in neural recording microsystems. Sensors 11(5), 4572-4597 (2011)
32. R.R. Harrison, G. Santhanam, K.V. Shenoy, Local field potential measurement with low-power analog integrated circuit, in Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 2, pp. 4067-4070, 2004
33. R.R. Harrison et al., A low-power integrated circuit for a wireless 100-electrode neural recording system. IEEE J. Solid-State Circ. 42(1), 123-133 (2007)
34. S. Kim, R. Normann, R. Harrison, F. Solzbacher, Preliminary study of the thermal impact of a microelectrode array implanted in the brain, in Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 2986-2989, 2006
35. I.H. Stevenson, K.P. Kording, How advances in neural recording affect data analysis. Nat. Neurosci. 14(2), 139-142 (2011)
36. C.I. de Zeeuw et al., Spatiotemporal firing patterns in the cerebellum. Nat. Rev. Neurosci. 12(6), 327-344 (2011)
37. F. Kölbl et al., In vivo electrical characterization of deep brain electrode and impact on bio-amplifier design, in IEEE Biomedical Circuits and Systems Conference, pp. 210-213, 2010
38. A.C. West, J. Newman, Current distributions on recessed electrodes. J. Electrochem. Soc. 138(6), 1620-1625 (1991)
39. S.K. Arfin, Low power circuits and systems for wireless neural stimulation, PhD thesis, MIT, 2011
40. K.H. Kim, S.J. Kim, A wavelet-based method for action potential detection from extracellular neural signal recording with low signal-to-noise ratio. IEEE Trans. Biomed. Eng. 50, 999-1011 (2003)
41. K. Okada, S. Kousai (eds.), Digitally-Assisted Analog and RF CMOS Circuit Design for Software-Defined Radio (Springer, Berlin, 2011)
42. M. Verhelst, B. Murmann, Area scaling analysis of CMOS ADCs. IEEE Electron. Lett. 48(6), 314-315 (2012)
43. M. Pelgrom, A. Duinmaijer, A. Welbers, Matching properties of MOS transistors. IEEE J. Solid-State Circ. 24(5), 1433-1439 (1989)
Chapter 2
Neural Signal Conditioning Circuits

Abstract The increasing density and the miniaturization of the functional
blocks in these multi-electrode arrays present a significant circuit design chal-
lenge in terms of area, power, and the scalability, reliability, and expandability
of the recording system. In this chapter, we present a neural signal condition-
ing circuit for biomedical implantable devices, which includes low-noise signal
amplification and band-pass filtering. The circuit is realized in a 65 nm CMOS
technology, and consumes less than 1.5 μW. The fully differential low-noise
amplifier achieves 40 dB closed-loop gain, occupies an area of 0.04 mm², and
has an input-referred noise of 3.1 μVrms over the operating bandwidth of
0.1-20 kHz. The capacitive-attenuation band-pass filter with first-order slopes
achieves 65 dB dynamic range, 210 mVrms at 2% THD, and 140 μVrms total
integrated output noise.

2.1 Introduction

Minimally invasive monitoring of the electrical activity of specific brain areas
using implantable microsystems offers the promise of diagnosing brain diseases,
as well as detecting and identifying neural patterns which are specific to behavioral
phenomena. Neural pattern classification and recognition require simultaneous
recording from a large number of neurons (and recording the LFP and spike
signals simultaneously). This, however, leads to the requirement of large dynamic
range and signal bandwidth for the analog front-end. In the worst case, we assume
that spikes with an amplitude of tens of µV added on LFPs with amplitudes of
about 2 mV appear at the input of a recording channel. If an input-referred noise
of 2 µV is needed to meet the signal-to-noise ratio requirement of the spike sig-
nal, the dynamic range of the channel is around 60 dB, resulting in a 10-bit A/D
conversion. Additionally, this sampling has to be done fast enough to capture the
information in spikes, e.g. a 32 kHz sampling rate. For a neural recording device
with 100 channels this results in a data rate of 32 Mb/s. Furthermore, extensive

© Springer International Publishing Switzerland 2016
A. Zjajo, Brain-Machine Interface, DOI 10.1007/978-3-319-31541-6_2

recording in vivo demands complying with severe safety requirements. For exam-
ple, the maximum temperature increase due to the operation of the cortical implant
in any surrounding brain tissue should be kept at less than 1 °C [1].
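The dynamic-range and data-rate arithmetic above can be checked with a short calculation. The sketch below uses the figures quoted in the text (2 mV signal swing, 2 µV noise floor, 32 kHz sampling rate, 100 channels) and reproduces the 60 dB, 10-bit and 32 Mb/s results:

```python
import math

# Worst case quoted in the text: ~2 mV LFP swing with spikes riding on top,
# and a tolerable input-referred noise of 2 uV.
v_signal = 2e-3   # peak signal amplitude [V]
v_noise = 2e-6    # input-referred noise [V rms]

dynamic_range_db = 20 * math.log10(v_signal / v_noise)   # ~60 dB
adc_bits = math.ceil(dynamic_range_db / 6.02)            # ~6.02 dB per bit -> 10 bits

channels = 100
f_sample = 32e3   # per-channel sampling rate [Hz]
data_rate_bps = channels * f_sample * adc_bits           # aggregate raw data rate

print(f"{dynamic_range_db:.1f} dB, {adc_bits} bits, {data_rate_bps / 1e6:.0f} Mb/s")
```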
The limited total power budget imposes strict specifications on the circuit
design of the low-noise analog front-end and high-speed circuits in the wideband
wireless link, which transmits the recorded data to a base station located outside
the skull. The design constraints are more pronounced when the number of record-
ing sites increases to several hundred for typical multi-electrode arrays.
Front-end neural amplifiers are crucial building blocks in implantable cortical
microsystems. Low-power and low-noise operation, stable dc interface with the
sensors (microprobes), and small silicon area are the main design specifications of
these amplifiers. The power dissipation is dictated by the tolerable input-referred
thermal noise of the amplifier, where the trade-off is expressed in terms of noise
efficiency factor [2]. For an ideal thermal-noise-limited amplifier with a constant
bandwidth and supply voltage, the power of the amplifier scales as 1/vn², where vn
is the input-referred noise of the amplifier. This relationship shows the steep power
cost of achieving low-noise performance in an amplifier.
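To make the 1/vn² scaling concrete: halving the input-referred noise target quadruples the amplifier power at fixed bandwidth and supply. A minimal sketch (the noise values are illustrative, not design figures):

```python
def relative_power(v_noise, v_noise_ref):
    """Power of an ideal thermal-noise-limited amplifier at constant
    bandwidth and supply scales as 1/vn^2, relative to a reference design."""
    return (v_noise_ref / v_noise) ** 2

# Tightening the noise spec from 4 uVrms to 2 uVrms costs 4x the power;
# tightening it to 1 uVrms costs 16x.
print(relative_power(2e-6, 4e-6), relative_power(1e-6, 4e-6))
```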
In this chapter, we introduce a novel, low-power neural recording interface
system with a capacitive-feedback low noise amplifier and a capacitive-attenuation
band-pass filter. The capacitive-feedback amplifier offers a low-offset and low-
distortion solution with an optimal power-noise trade-off. Similarly, the capacitive-
attenuation band-pass filter provides wide tuning range and low-power realization,
while allowing simple extension of the transconductor's linear range, and conse-
quently, ensuring low harmonic distortion. The low noise amplifier and band-pass
filter circuits are realized in a 65 nm CMOS technology, and consume 1.15 µW
and 390 nW, respectively. The fully differential low-noise amplifier achieves
40 dB closed-loop gain, and occupies an area of 0.04 mm². Input-referred noise is
3.1 µVrms over the operating bandwidth of 0.1 Hz–20 kHz. Distortion is below 2% total
harmonic distortion (THD) for typical extracellular neural signals (smaller than
10 mV peak-to-peak). The capacitive-attenuation band-pass filter with first-order
slopes achieves 65 dB dynamic range, 210 mVrms at 2% THD and 140 µVrms
total integrated output noise.
The chapter is organized as follows: Sect. 2.2 focuses on the signal condition-
ing circuit details, while Sect. 2.3 offers a brief overview of operational amplifier
circuit concepts. The experimental results obtained are presented in Sect. 2.4. Finally,
Sect. 2.5 provides a summary and the main conclusions.

2.2 Power-Efficient Neural Signal Conditioning Circuit

The neural spikes, typically ranging from 10 to 500 µV and containing data up
to ~20 kHz, are amplified with the low noise neural amplifier (LNA) illustrated in
Fig. 2.1, where the Vref voltage designates the node connected to the reference elec-
trode. The amplifier A1 is designed based on an operational transconductance


Fig. 2.1 Schematic of the signal conditioning circuit including low noise amplifier, band-pass
filter and programmable gain amplifier

amplifier that generates a current proportional to the differential input voltage.
The amplifier has a capacitive feedback configuration, which is adapted from [3]
with minor modifications. Neural amplifiers typically employ two different feed-
back path structures to realize a high-pass filter, i.e. with two subthreshold-biased
transistors or with two diode-connected transistors. Two identical pairs of diode-
connected transistors, T1–T2 and T3–T4, act as high-value resistors Rh (>10¹² Ω)
and set the low-frequency high-pass cutoff of the amplifier at (2πRhCf)⁻¹ ≈ 0.5 Hz,
which blocks the dc offset induced by the electrode-tissue interface (typically around
1 V), and local field potentials (LFP), typically with 0.1–5 mV amplitude at
300 Hz and below. The mid-band gain Amb is set by Cin/Cf, and the low-pass cutoff
frequency is approximately placed at gm,in/(2πAmbCL), where gm,in is the transconduct-
ance of the input differential pair, and CL is the effective load capacitance of the
amplifier.
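The three relations above (mid-band gain Cin/Cf, high-pass corner set by Rh and Cf, and low-pass corner set by gm,in, Amb and CL) can be evaluated directly. In the sketch below, all component values are assumptions chosen only to land near the quoted 40 dB gain and 0.5 Hz high-pass corner; they are not the values of this design:

```python
import math

R_h = 6.4e12    # diode-connected pseudo-resistor [ohm] (assumed, >1e12)
C_f = 50e-15    # feedback capacitance [F] (assumed)
C_in = 5e-12    # input capacitance [F] (assumed) -> Cin/Cf = 100
g_m_in = 5e-6   # input-pair transconductance [S] (assumed)
C_L = 2e-12     # effective load capacitance [F] (assumed)

A_mb = C_in / C_f                           # mid-band gain (100x, i.e. 40 dB)
f_hp = 1 / (2 * math.pi * R_h * C_f)        # high-pass corner, ~0.5 Hz
f_lp = g_m_in / (2 * math.pi * A_mb * C_L)  # low-pass corner, a few kHz

print(f"gain {20 * math.log10(A_mb):.0f} dB, band {f_hp:.2f} Hz to {f_lp / 1e3:.1f} kHz")
```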
As neural recording involves the measurements of very small voltages, noise
can become a limiting factor in the system performance. The total noise at the
input of the neural interface is composed of the noise introduced by the electrodes
and the input-referred noise of the electronic circuitry.
The former is determined by the material of the electrodes, the impedance and
other characteristics of the electrode-electrolyte/tissue interface. The latter mainly
includes thermal and flicker noise of every component in the circuit. The noise of
the electronic system must be kept lower than the electrode noise (10–20 µVrms
[4]), so that it has a minor contribution to the overall noise. In a multi-stage sys-
tem, the noise of the first (input) stage has the largest effect on the circuit noise
due to the amplification of the following stages. Therefore, the design of the
input stage becomes critical and involves numerous trade-offs with other impor-
tant specifications such as power consumption and area. If the input stage is an
instrumentation amplifier, the ideal input-referred noise, assuming transistors in
the subthreshold region and a first-order frequency response, can be expressed as
Vrms,ni = [(4kT·UT·BW)/(2κItot)]^(1/2) [5], where k is the Boltzmann constant, T is the


Fig. 2.2 Folded cascode LNA circuit

absolute temperature, UT is the thermal voltage, κ is the subthreshold gate cou-
pling coefficient, Itot is the total supply current and BW is the −3 dB bandwidth of the
amplifier. Consequently, for a given bandwidth the noise is inversely proportional
to the square root of the supply current; hence, there exists a trade-off between
noise and power consumption (Appendix A-1).
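The expression can be evaluated numerically to show the square-root trade-off; the coupling coefficient κ and the supply currents below are illustrative assumptions:

```python
import math

k_B = 1.380649e-23               # Boltzmann constant [J/K]
T = 310.0                        # body temperature [K]
U_T = k_B * T / 1.602176634e-19  # thermal voltage, ~26.7 mV at 37 C
kappa = 0.7                      # subthreshold coupling coefficient (assumed)

def v_rms_ni(i_tot, bw):
    """Ideal input-referred noise sqrt(4*k*T*U_T*BW / (2*kappa*I_tot)) of a
    subthreshold instrumentation amplifier (first-order frequency response)."""
    return math.sqrt(4 * k_B * T * U_T * bw / (2 * kappa * i_tot))

# Quadrupling the supply current only halves the noise (sqrt dependence).
v_1uA = v_rms_ni(1e-6, 20e3)
v_4uA = v_rms_ni(4e-6, 20e3)
print(v_1uA, v_4uA, v_1uA / v_4uA)
```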
The implemented low-noise, low-power folded-cascode Gm circuit of the LNA is illus-
trated in Fig. 2.2. The topology is based on [6], where a current-splitting technique
[7], which enhances the drain resistance of both the input and bottom transistors without
any additional cascoding, is combined with the output-current scaling [5] tech-
nique to lower the OTA noise.
The noise contributions of the amplifier are minimized to be almost those of
only its two input transistors, due to the use of cascoded resistive loading rather
than current-source loads. The folded cascode Gm circuit realizes a wide input com-
mon-mode range and a relatively high open-loop gain within one stage. The input-
referred noise of the Gm circuit is reduced by increasing the gm of the input pair and
cascode devices, and by increasing the aspect ratio of the devices. The effect of the
last method, however, is partially canceled by the increase in the noise excess fac-
tor. When referred to the Gm input, thermal noise voltages of the transistors used
as current sources (and mirrors) are multiplied by the gm of the device itself and
divided by the gm of the input transistor, which suggests that maximizing the input
pair gm and minimizing the gm of the current sources (and mirrors) minimizes noise.
The transistors of the output stage have two constraints: first, the gm of the cascode
transistors T9–T12 must be high enough, in order to boost the output resistance of
the cascode, allowing a high enough dc gain. Secondly, the saturation voltage of
the active loads T5–T8 and T13–T16 must be maximized, in order to reduce the extra
noise contribution of the output stage. By making the cascode transistors larger

than the active loads, the gm of the cascode transistors is maximized, boosting the
dc gain, while their saturation voltage is reduced, allowing a larger saturation
voltage for the active loads without exceeding the voltage headroom. The bias
current of the LNA can be varied to adapt its noise per unit bandwidth.
To keep the overall bandwidth constant when the bias current of the gain stage
is varied, a band-pass filter [8] (Fig. 2.3) is added to the output of the LNA. The high
gain provided by the LNA stage alleviates the noise floor requirements of this band-
width-limiting stage. The total integrated output voltage noise of the filter depends
on the linear range of the transconductors Gm1 and Gm2 (Fig. 2.4), the ratio of the
attenuator capacitances A, and the unit capacitance C. The linear range of the Gm is
effectively improved by attenuating the input. In the high-pass stage, the signal is
attenuated by a factor of A+1 and the full capacitance of (A+1)C is then utilized
for filtering with Gm1. In the low-pass stage, a gain of A+1 is applied to signals
in the pass-band. A capacitance C/(A+1) is added in parallel with the attenuating
capacitances to increase the filtering capacitance.
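The attenuation bookkeeping above can be sketched: dividing the Gm input by A+1 extends the effective linear range by the same factor, while (A+1)C remains available as filtering capacitance. The ratio A, unit capacitance C and the transconductor's native linear range below are assumed values for illustration:

```python
A = 9                 # attenuator capacitance ratio (assumed)
C = 1e-12             # unit capacitance [F] (assumed)
v_lin_native = 20e-3  # native Gm linear range [V] (assumed)

attenuation = A + 1                     # input attenuation in the high-pass stage
v_lin_eff = v_lin_native * attenuation  # linear range seen at the filter input
C_filter = attenuation * C              # full (A+1)*C utilized for filtering with Gm1

print(v_lin_eff, C_filter)
```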

2.3 Operational Amplifiers

Operating on the edge of the performance envelope, op amps exhibit intense trade-
offs amongst the dynamic range, linearity, settling speed, stability, and power con-
sumption. As a result, accuracy and speed are often dictated by the performance of
these amplifiers.
Amplifiers with a single gain stage have high output impedance providing an
adequate dc gain, which can be further increased with gain boosting techniques.
Single-stage architecture offers large bandwidth and a good phase margin with


Fig. 2.3 Band-pass filter Gm1 cell




Fig. 2.4 Band-pass filter Gm2 cell

small power consumption. Furthermore, no frequency compensation is needed,
since the architecture is self-compensated (the dominant pole is determined by the
load capacitance), which makes the footprint on the silicon small. On the other
hand, the high output impedance is obtained by sacrificing the output voltage
swing, and the noise is rather high as a result of the number of noise-contributing
devices and the limited voltage headroom for current source biasing.
The simplest approach for a one-stage high-gain operational amplifier is the tel-
escopic cascode amplifier [9] of Fig. 2.5. With this architecture, a high open-loop
dc gain can be achieved, and it is capable of high speed when the closed-loop gain is
low. The number of current legs being only two, the power consumption is small.
The biggest disadvantage of a telescopic cascode amplifier is its low maximum
output swing, VDD − 5VDS,SAT, where VDD is the supply voltage and VDS,SAT is


Fig. 2.5 One-stage amplifiers: telescopic cascode




Fig. 2.6 One-stage amplifiers: folded cascode

the saturation voltage of a transistor. With this maximum possible output swing
the input common-mode range is zero. In practice, some input common-mode
range, which reduces the output swing, always has to be reserved so as to permit
inaccuracy and settling transients in the signal common-mode levels. The high-
speed capability of the amplifier is the result of the presence of only n-channel
transistors in the signal path and of relatively small capacitance at the source of
the cascode transistors. The gain-bandwidth product of the amplifier is given by
GBW = gm1/CL, where gm1 is the transconductance of transistor T1 and CL is the
load capacitance. Thus, the GBW is limited by the load capacitance.
Due to its simple topology and dimensioning, the telescopic cascode ampli-
fier is preferred if its output swing is large enough for the specific application. The
output signal swing of this architecture has been widened by driving the transis-
tors T7–T8 into the linear region [10]. In order to preserve the good common mode
rejection ratio and power supply rejection ratio properties of the topology, addi-
tional feedback circuits for compensation have been added to these variations. The
telescopic cascode amplifier has low current consumption, relatively high gain,
low noise and very fast operation. However, as it has five stacked transistors, the
topology is not suitable for low supply voltages.
The folded cascode amplifier topology [11] is shown in Fig. 2.6. The swing of
this design is constrained by its cascoded output stage. It provides a larger output
swing and input common-mode range than the telescopic amplifier with the same
dc gain and without major loss of speed. The output swing is VDD − 4VDS,SAT and
is not linked to the input common-mode range, which is VDD − VT − 2VDS,SAT.
The second pole of this amplifier is located at gm7/Cpar, where gm7 is the transcon-
ductance of T7 and Cpar is the sum of the parasitic capacitances from transistors
T1, T7 and T9 at the source node of transistor T7. The frequency response of this
amplifier is deteriorated from that of the telescopic cascode amplifier because of
the smaller transconductance of the p-channel device and the larger parasitic capac-
itance. To assure symmetrical slewing, the output stage current is usually made


Fig. 2.7 One-stage amplifiers: push-pull current-mirror amplifier with a cascode output stage

equal to that of the input stage. The GBW of the folded cascode amplifier is also
given by gm1/CL.
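Both single-stage topologies therefore share GBW = gm1/CL (gm1/(2πCL) in Hz), with the usable bandwidth bounded by the non-dominant pole, gm7/Cpar for the folded cascode. A numeric sketch with assumed device values:

```python
import math

g_m1 = 100e-6    # input-pair transconductance [S] (assumed)
C_L = 2e-12      # load capacitance [F] (assumed)
g_m7 = 150e-6    # cascode-device transconductance [S] (assumed)
C_par = 0.3e-12  # parasitic capacitance at the folding node [F] (assumed)

gbw = g_m1 / (2 * math.pi * C_L)     # unity-gain bandwidth [Hz], ~8 MHz here
f_p2 = g_m7 / (2 * math.pi * C_par)  # non-dominant (second) pole [Hz]

# For adequate phase margin, the non-dominant pole should sit well above GBW.
print(f"GBW {gbw / 1e6:.1f} MHz, p2/GBW = {f_p2 / gbw:.1f}")
```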
The open loop dc gain of amplifiers having cascode transistors can be boosted
by regulating the gate voltages of the cascode transistors [12]. The regulation is
realized by adding an extra gain stage, which reduces the feedback from the output
to the drain of the input transistors. In this way, the dc gain of the amplifier can be
increased by several orders of magnitude. The increase in power and chip area can
be kept very small with appropriate feedback amplifier architecture [12]. The cur-
rent consumption of the folded cascode is doubled compared to the telescopic cas-
code amplifier although the output voltage swing is increased since there are only
four stacked transistors. The noise of the folded cascode is slightly higher than in
the telescopic cascode as a result of the added noise from the current source tran-
sistors T9 and T10. In addition, the folded cascade has a slightly smaller dc gain
due to the parallel combination of the output resistance of transistors T1 and T9.
A push-pull current-mirror amplifier, shown in Fig. 2.7, has much better slew-
rate properties and a potentially larger bandwidth and dc gain than the folded cas-
code amplifier. The slew rate and dc gain depend on the current-mirror ratio K,
which is typically between one and three. However, a too large current-mirror ratio
increases the parasitic capacitance at the gates of the transistors T12 and T13, push-
ing the non-dominant pole to lower frequencies and limiting the achievable GBW.
The non-dominant pole of the current-mirror amplifier is much lower than that of
the folded cascode and telescopic amplifiers due to the larger parasitic
capacitance at the drains of the input transistors.
The noise and current consumption of the current-mirror amplifier are larger
than in the telescopic cascode amplifier or in the folded cascode amplifier. A cur-
rent-mirror amplifier with dynamic biasing [13] can be used to base the amplifier
biasing purely on its small-signal behavior, as the slew rate is not limited.
In dynamic biasing, the biasing current of the operational amplifier is controlled


Fig. 2.8 Two-stage amplifiers: Miller-compensated

on the basis of the differential input signal. With large differential input signals,
the biasing current is increased to speed up the output settling. Hence, no slew
rate limiting occurs, and the GBW requirement is relaxed. As the settling proceeds,
the input voltage decreases and the biasing current is reduced. The biasing current
needs to be kept only to a level that provides enough GBW for an adequate small-
signal performance. In addition to relaxed GBW requirements, the reduced static
current consumption makes the design of a high-dc gain amplifier easier. With
very low supply voltages, the use of the cascode output stages limits the avail-
able output signal swing considerably. Hence, two-stage operational amplifiers
are often used, in which the operational amplifier gain is divided into two stages,
where the latter stage is typically a common-source output stage. Unfortunately,
with the same power dissipation, the speed of the two-stage operational amplifiers
is typically lower than that of single-stage operational amplifiers.
Of the several alternative two-stage amplifiers, Fig. 2.8 shows a simple Miller-
compensated amplifier [14]. With all the transistors in the output stage of this ampli-
fier placed in the saturation region, it has an output swing of VDD − 2VDS,SAT. Since
the non-dominant pole, which arises from the output node, is determined domi-
nantly by an explicit load capacitance, the amplifier has a compromised frequency
response.
The gain-bandwidth product of a Miller-compensated amplifier is given approx-
imately by GBW = gm1/CC, where gm1 is the transconductance of T1. In general,
the open-loop dc gain of the basic configuration is not large enough for high-res-
olution applications. The gain can be enhanced by using cascoding, which has, how-
ever, a negative effect on the signal swing and bandwidth. Another drawback of
this architecture is a poor power supply rejection at high frequencies because of
the connection of VDD through the gate-source capacitance CGS5,6 of T5 and T6
and CC. The noise properties of the two-stage Miller-compensated operational


Fig. 2.9 Two-stage amplifiers: folded cascode amplifier with a common-source output stage and
Miller frequency compensation

amplifier are comparable to those of the telescopic cascode and better than those
of the folded cascode amplifier. The speed of a Miller-compensated amplifier is
determined by its pole-splitting capacitor CC. Usually, the position of this non-
dominant pole, which is located at the output of the two-stage amplifier, is lower
than that of either a folded-cascode or a telescopic amplifier.
Thus, in order to push this pole to higher frequencies, the second stage of the
amplifier requires higher currents resulting in increased power dissipation. Since
the first stage does not need to have a large output voltage swing, it can be a cas-
code stage, either a telescopic or a folded cascode. However, the current consump-
tion and transistor count are also increased. The advantages of the folded cascode
structure are a larger input common-mode range and the avoidance of level shift-
ing between the stages, while the telescopic stage can offer larger bandwidth and
lower thermal noise.
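The pole-splitting trade-off can be put in numbers: the Miller capacitor CC fixes GBW = gm1/(2πCC), while the output-stage transconductance must place the output pole a few times higher, which costs bias current. All values in the sketch are assumptions:

```python
import math

g_m1 = 50e-6      # first-stage transconductance [S] (assumed)
C_C = 1e-12       # Miller (pole-splitting) capacitor [F] (assumed)
g_m_out = 800e-6  # output-stage transconductance [S] (assumed)
C_L = 5e-12       # load capacitance [F] (assumed)

gbw = g_m1 / (2 * math.pi * C_C)      # set by the compensation capacitor
f_p2 = g_m_out / (2 * math.pi * C_L)  # non-dominant pole at the output node

# Keeping f_p2 roughly 3x above GBW preserves phase margin; pushing it higher
# requires more output-stage current, since g_m_out grows with bias current.
print(f"GBW {gbw / 1e6:.1f} MHz, p2/GBW = {f_p2 / gbw:.1f}")
```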
Figure 2.9 illustrates a folded cascode amplifier with a common-source output
stage and Miller compensation. The noise properties are comparable with those
of the folded cascode amplifier. If a cascode input stage is used, the lead-compen-
sation resistor can be merged with the cascode transistors. An example of this is
the folded cascode amplifier with a common-source output stage and Ahuja-style
compensation [15], shown in Fig. 2.10. The Ahuja-style compensated operational
amplifier is suitable for larger capacitive loads than the Miller-compensated one,
and it has a better power supply rejection, since the substrate noise coupling
through the gate-source capacitance of the output stage gain transistors is not
coupled directly through the pole-splitting capacitors to the operational amplifier
output [15].


Fig. 2.10 Two-stage amplifiers: folded cascode amplifier with a common-source output stage
and Ahuja-style frequency compensation

2.4 Experimental Results

Design simulations at the transistor level were performed at body temperature
(37 °C) in Cadence Virtuoso using an industrial hardware-calibrated TSMC 65 nm
CMOS technology. The analog circuits operate with a 1 V supply, while the
digital blocks operate at near-threshold from a 400 mV supply. The test dataset
(Fig. 2.11) is based on recordings from the human neocortex and basal ganglia.
The signal quality in the neural interface front-end, besides the specifics of the
electrode material and the electrode/tissue interface, is limited by the nature of the
bio-potential signal, dictating system resource constraints (power, size, bandwidth,
and thermal dissipation, i.e. to avoid tissue damage). When a neuron fires an action
potential, the cell membrane becomes depolarized by the opening of voltage-con-
trolled ion channels, leading to a flow of current both inside and outside the
neuron. The time-series representation of a neuron signal at the preamplifier's
input (Fig. 2.12) is composed of a spike burst plus additive Gaussian white noise
(grey area: 1000 randomly selected neural channel compartments; black area:
with the predicted bias from the estimated variance σ² filtered out).
Since the extracellular medium is resistive [16], the extracellular potential is approxi-
mately proportional to the current across the neuron membrane. The membrane
roughly behaves like an RC circuit, and most current flows through the membrane
capacitance. In a typical electrode-tissue interface, we rely on this current meas-
urement to sense the neural signals. Hence, by maintaining a constant current
density, the relative uncertainty of the current becomes inversely proportional
to the square root of the interface area. The electrode noise spectral density has an


Fig. 2.11 Test data set (the y axis is arbitrary): a raw signal after amplification, not corrected
for gain, b bandpass filtered signal (300–3000 Hz), and c detected spikes


Fig. 2.12 Statistical voltage trace of neuron cell activity; grey area: voltage traces from 1000
randomly selected neural channel compartments, black area: expected voltage trace

approximate dependence of −10 dB/dec at low frequencies. However, for
frequencies higher than 1–10 kHz, capacitances at the interface form the high-
frequency pole and shape both the signal and the noise spectrum; the noise is low-
pass filtered to the recording amplifier inputs.
Due to the small amplitude of neural signals and the high impedance of the
electrode-tissue interface, amplification and low-pass filtering of the extra-
cellular neural signals is performed before the signals can be digitized. An


Fig. 2.13 a Noise amplitude in time-domain at the output of the low-pass filter; b noise PSD at
the output of the low-pass filter

example of the time-domain noise estimation and the noise power spectral den-
sity at the output of the low-pass filter is illustrated in Fig. 2.13. The interface's
input equivalent noise voltage decreases as the gain across the amplifying stages
increases, i.e. the ratio of the square of the signal power over its noise variance
can be expressed as SNR = F²/(σ²neural + σ²electrode + Σi(Πj Gj)⁻¹σ²amp,i),
where F is the total signal power, σ²amp,i represents the variance of the noise
added by the i-th amplification stage with gains Gj, σ²electrode is the variance of
the electrode noise, and σ²neural is the variance of the biological neural noise.
The observed SNR of the system also increases as the system is isomorphically
scaled up, which suggests a fundamental trade-off between SNR and speed of
the system.
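The SNR budget can be evaluated directly. The sketch below refers each stage's amplifier noise back through the accumulated gain; the gains Gj are treated here as power gains (one reading of the product term in the expression), and every number is an illustrative assumption rather than a measured value:

```python
signal_power = 1.0e-9     # total signal power F^2 [V^2] (assumed)
var_neural = 2.5e-11      # sigma^2 of the biological neural noise (assumed)
var_electrode = 1.0e-10   # sigma^2 of the electrode noise (assumed)
gains = [1e4, 1e2]        # per-stage power gains G_j (assumed)
var_amp = [9e-12, 9e-10]  # sigma^2 added by each amplification stage (assumed)

noise = var_neural + var_electrode
cum_gain = 1.0
for g, v in zip(gains, var_amp):
    cum_gain *= g          # gain accumulated up to and including stage i
    noise += v / cum_gain  # stage noise referred back to the input

snr = signal_power / noise  # dominated here by electrode and neural noise
print(snr)
```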
The fully differential low-noise amplifier achieves 40 dB closed-loop gain, and
occupies an area of 0.04 mm². Input-referred noise is 3.1 µVrms over the oper-
ating bandwidth of 0.1 Hz–20 kHz. Distortion is below 2% total harmonic distortion
(THD) for typical extracellular neural signals (smaller than 10 mV peak-to-peak).
The common-mode rejection ratio (CMRR) and the power-supply rejection ratio
(PSRR) exceed 75 dB.
The capacitive-attenuation band-pass filter with first-order slopes achieves
65 dB dynamic range, 210 mVrms at 2% THD and 140 µVrms total inte-
grated output noise. The total harmonic distortion of the V/I converter is 0.04% at
20 kHz. Table 2.1 compares state-of-the-art neural recording systems to this
work.

Table 2.1 Neural interface comparison with prior art

  Interface          [17]     [18]      [19]       [20]     [this work]a
  Technology [µm]    0.18     0.13      0.18       0.065    0.065
  VDD [V]            0.45     1.2       1.8        1        1
  Gain [dB]          52       54–60     30–72      52.1     65
  INF [µVrms]        3.2      4.7       3.2        4.13     3.1
  Bandwidth [Hz]     10k      10–5k     300–6k     18.2k    20k
  P/channel [µW]     0.73     3.5       5.4        2.8      2.1
  A/channel [mm²]    0.2      0.09      0.08       0.042    0.036
  a Simulated data

2.5 Conclusions

Bio-electronic neural interfaces enable the interaction with neural cells by record-
ing, to facilitate early diagnosis and predict intended behavior before undertak-
ing any preventive or corrective actions, or by stimulation, to prevent the onset
of detrimental neural activity such as that resulting in tremor. Multi-channel
neural interfaces allow for spatial neural recording and stimulation at multiple
sites. To evade the risk of infection, these systems are implanted under the skin,
while the recorded neural signals and the power required for the implant opera-
tion are transmitted wirelessly. The maximum number of channels is constrained
by noise, area, bandwidth, power (which has to be supplied to the implant exter-
nally), thermal dissipation (i.e. to avoid necrosis of the tissue), and the scalability
and expandability of the recording system. Very frequently, an electrode records
the action potentials from multiple surrounding neurons. Subsequently, the ability
to differentiate spikes from noise is governed by both the discrepancies between
the noise-free spikes from each neuron and the signal-to-noise level of the record-
ing interface. After the waveform alignment, a feature extraction step character-
izes the detected spikes and represents each detected spike in a reduced dimensional
space. The feature extraction and spike classification significantly reduce the data
requirements prior to data transmission (in multi-channel systems, the raw data
rate is substantially higher than the limited bandwidth of the wireless telemetry).
In this chapter, we introduce a low-power neural signal conditioning circuit
with a capacitive-feedback low-noise amplifier and a capacitive-attenuation band-
pass filter. The capacitive-feedback amplifier offers a low-offset and low-distortion
solution with an optimal power-noise trade-off. Similarly, the capacitive-attenuation
band-pass filter provides wide tuning range and low-power realization, while
allowing simple extension of the transconductor's linear range, and consequently,
ensuring low harmonic distortion.

References

1. IEEE Standards Coordinating Committee, IEEE standard for safety levels with respect to
human exposure to radio frequency electromagnetic fields, 3 kHz to 300 GHz, C95.1-2005,
2006
2. M. Steyaert, W. Sansen, C. Zhongyuan, A micropower low-noise monolithic instrumentation
amplifier for medical purposes. IEEE J. Solid-State Circuits 22(6), 1163–1168 (1987)
3. R. Harrison, C. Charles, A low-power low-noise CMOS amplifier for neural recording appli-
cations. IEEE J. Solid-State Circuits 38(6), 958–965 (2003)
4. M.C. Chae, W. Liu, M. Sivaprakasam, Design optimization for integrated neural recording
systems. IEEE J. Solid-State Circuits 43(9), 1931–1939 (2008)
5. W. Wattanapanitch, M. Fee, R. Sarpeshkar, An energy-efficient micropower neural recording
amplifier. IEEE Trans. Biomed. Circuits Syst. 1(2), 136–147 (2007)
6. C. Qian, J. Parramon, E. Sanchez-Sinencio, A micropower low-noise neural recording front-
end circuit for epileptic seizure detection. IEEE J. Solid-State Circuits 46(6), 1392–1405
(2011)
7. F. Bahmani, E. Sánchez-Sinencio, A highly linear pseudo-differential transconductance, in
Proceedings of IEEE European Solid-State Circuits Conference, 2004, pp. 111–114
8. S.K. Arfin, Low power circuits and systems for wireless neural stimulation. PhD thesis,
Massachusetts Institute of Technology, 2011
9. G. Nicollini, P. Confalonieri, D. Senderowicz, A fully differential sample-and-hold circuit for
high-speed applications. IEEE J. Solid-State Circuits 24(5), 1461–1465 (1989)
10. K. Gulati, H.-S. Lee, A high-swing CMOS telescopic operational amplifier. IEEE J. Solid-
State Circuits 33(12), 2010–2019 (1998)
11. T.C. Choi, R.T. Kaneshiro, W. Brodersen, P.R. Gray, W.B. Jett, M. Wilcox, High-frequency
CMOS switched-capacitor filters for communications application. IEEE J. Solid-State
Circuits 18, 652–664 (1983)
12. K. Bult, G. Geelen, A fast-settling CMOS op amp for SC circuits with 90-dB DC gain. IEEE
J. Solid-State Circuits 25(6), 1379–1384 (1990)
13. R. Harjani, R. Heineke, F. Wang, An integrated low-voltage class AB CMOS OTA. IEEE J.
Solid-State Circuits 34(2), 134–142 (1999)
14. R. Hogervorst, J.H. Huijsing, Design of Low-Voltage Low-Power Operational Amplifier Cells
(Kluwer Academic Publishers, Dordrecht, 1999)
15. B.K. Ahuja, An improved frequency compensation technique for CMOS operational ampli-
fiers. IEEE J. Solid-State Circuits 18(6), 629–633 (1983)
16. C.I. de Zeeuw et al., Spatiotemporal firing patterns in the cerebellum. Nat. Rev. Neurosci.
12(6), 327–344 (2011)
17. D. Han et al., A 0.45 V 100-channel neural-recording IC with sub-µW/channel consumption
in 0.18 µm CMOS. IEEE Trans. Biomed. Circuits Syst. 7(6), 735–746 (2013)
18. K. Abdelhalim et al., 64-channel UWB wireless neural vector analyzer SoC with a closed-
loop phase synchrony-triggered neurostimulator. IEEE J. Solid-State Circuits 48(10),
2494–2510 (2013)
19. C.M. Lopez et al., An implantable 455-active-electrode 52-channel CMOS neural probe, in
IEEE International Solid-State Circuits Conference, pp. 288–289, 2013
20. K.A. Ng, Y.P. Xu, A multi-channel neural-recording amplifier system with 90 dB CMRR
employing CMOS-inverter-based OTAs with CMFB through supply rails in 65 nm CMOS, in
IEEE International Solid-State Circuits Conference, pp. 206–207, 2015
Chapter 3
Neural Signal Quantization Circuits

Abstract An integrated neural implant interfacing with the brain through biocompatible
electrodes provides high-yield cell recordings, large channel counts, and access to
spike data and/or field potentials with high signal-to-noise ratio. By increasing the
number of recording electrodes, spatially broad analysis can be performed that can
provide insights into how and why neuronal ensembles synchronize their activity.
In this chapter, we present several A/D converter realizations in voltage-, current-
and time-domain, respectively, suitable for multichannel neural signal-processing.
The voltage-domain SAR A/D converter combines the functionalities of a program-
mable-gain stage and analog-to-digital conversion, occupies an area of 0.028 mm²,
and consumes 1.1 µW of power at a 100 kS/s sampling rate. The current-mode
successive approximation A/D converter is realized in a 65 nm CMOS technology,
and consumes less than 367 nW at 40 kS/s, corresponding to a figure of merit
of 14 fJ/conversion-step, while operating from a 1 V supply. A time-based, pro-
grammable-gain A/D converter allows for an easily scalable, power-efficient,
implantable biomedical recording system. The time-domain converter circuit is
realized in a 90 nm CMOS technology, operates at 640 kS/s, occupies an area of
0.022 mm², and consumes less than 2.7 µW, corresponding to a figure of merit of
6.2 fJ/conversion-step.

3.1 Introduction

Bioelectronic interfaces allow the interaction with neural cells by both recording,
to facilitate early diagnosis and predict intended behavior before undertaking any
preventive or corrective actions [1], or stimulation devices, to prevent the onset
of detrimental neural activity such as that resulting in tremor. Monitoring large
scale neuronal activity and diagnosing neural disorders has been accelerated by
the fabrication of miniaturized microelectrode arrays, capable of simultaneously
recording neural signals from hundreds of channels [2]. By increasing the num-
ber of recording electrodes, spatially broad analysis of local field potentials can
be performed that can provide insights into how and why neuronal ensembles

© Springer International Publishing Switzerland 2016
A. Zjajo, Brain-Machine Interface, DOI 10.1007/978-3-319-31541-6_3

synchronize their activity. Studies on body motor systems have uncovered how
kinematic parameters of movement control are encoded in neuronal spike timestamps
[3] and inter-spike intervals [4]. Neurons produce spikes of nearly identical
amplitude near the soma, but the measured signal depends on the position of
the electrode relative to the cell. Additionally, the signal quality in the neural
interface front-end, besides the specifics of the electrode material and the electrode/
tissue interface, is limited by the nature of the bio-potential signal and its biological
background noise, dictating system resources. For any portable or implantable
device, microelectrode arrays require miniature electronics locally to amplify the
weak neural signals, filter out noise and out-of-band interference, and digitize the
result for transmission. Single-channel [5] or multichannel integrated neural amplifiers
and A/D converters provide the frontline interface between the recording electrodes
and the signal conditioning circuits, and thus face critical performance requirements.
In this chapter, we present several A/D converter realizations in voltage-, cur-
rent- and time-domain, respectively, suitable for multichannel neural signal-pro-
cessing, and we evaluate trade-off between noise, speed and power dissipation on
a circuit-architecture level. This approach provides key insight required to address
SNR, response time, and linearity of the physical electronic interface. The voltage-
domain SAR A/D converter combines the functionalities of a programmable-gain
stage and analog-to-digital conversion, occupies an area of 0.028 mm², and con-
sumes 1.1 µW of power at a 100 kS/s sampling rate. The current-mode successive
approximation A/D converter is realized in a 65 nm CMOS technology, and con-
sumes less than 367 nW at 40 kS/s, corresponding to a figure of merit of 14 fJ/con-
version-step, while operating from a 1 V supply. A time-based, programmable-gain
A/D converter allows for an easily scalable, power-efficient, implantable bio-
medical recording system. The time-domain converter circuit is realized in a 90 nm
CMOS technology, operates at 640 kS/s, occupies an area of 0.022 mm², and con-
sumes less than 2.7 µW, corresponding to a figure of merit of 6.2 fJ/conversion-step.
The chapter is organized as follows: Sect. 3.2 presents an overview of low-
power A/D converter architectures, while Sect. 3.3 analyses the main building
blocks of the A/D converter, namely the sample and hold circuit, the operational
amplifier, and the comparator. Section 3.4 focuses on the voltage-domain A/D conversion,
and the noise fluctuations on a circuit-architecture level. In Sect.3.5, the main build-
ing blocks of the current-domain ADC are evaluated. In Sect.3.6, the time-domain
A/D conversion, which utilizes a linear voltage-to-time converter (VTC) and a two-
step time-to-digital converter, is discussed. The experimental results obtained are presented
in Sect.3.7. Finally, Sect.3.8 provides a summary and the main conclusions.

3.2 Low-Power A/D Converter Architectures

Since the advent of digital signal processing, A/D converters have played a
very important role in interfacing the analog and digital worlds. They perform the
digitization of analog signals at a fixed time interval, which is generally specified
by the application. The A/D conversion process involves sampling the applied
analog input signal and quantizing it to its digital representation by comparing it to
reference voltages before further signal processing in subsequent digital systems.
Depending on how these functions are combined, different A/D converter architec-
tures can be implemented with different requirements on each function. To imple-
ment power-optimized A/D converter functions, it is important to understand the
performance limitations of each function before discussing system issues. In this
section, the concept of the basic A/D conversion process and the fundamental limi-
tation to the power dissipation of each key building block are presented.
Parallel (flash) A/D conversion is by far the fastest and conceptually simplest
conversion process [6–15], where the analog input is applied to one side of each com-
parator circuit and the other side is connected to the proper reference level between
zero and full scale. The threshold levels are usually generated by resistively dividing
one or more references into a series of equally spaced voltages, which are applied
to one input of each comparator. For n-bit resolution, 2^n − 1 comparators simulta-
neously evaluate the analog input and generate the digital output as a thermometer
code. Since a flash converter needs only one clock cycle per conversion, it is often
the fastest converter. On the other hand, the resolution of flash ADCs is limited by
circuit complexity, high power dissipation, and comparator and reference mismatch.
The complexity grows exponentially as the resolution increases. Consequently,
the power dissipation and the chip area increase exponentially with the resolution.
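The flash process above can be sketched behaviorally (an illustrative Python model, not a circuit description from the text): the resistor ladder becomes a list of equally spaced thresholds, and the 2^n − 1 comparator outputs form a thermometer code whose sum is the binary result.

```python
def flash_adc(vin, vref, n):
    """Behavioral n-bit flash ADC: 2**n - 1 comparators against a resistor ladder."""
    levels = [(k + 1) * vref / 2**n for k in range(2**n - 1)]  # equally spaced thresholds
    thermometer = [1 if vin > t else 0 for t in levels]        # all comparators fire in parallel
    return sum(thermometer)                                    # thermometer -> binary code

# 3-bit example: 7 comparators, 1.0 V full scale, thresholds in 0.125 V steps
code = flash_adc(0.40, 1.0, 3)
```

Doubling the resolution from n to n + 1 bits doubles the comparator count, which is the exponential hardware growth noted above.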
To reduce hardware complexity, power dissipation, and die area, and to increase
the resolution while maintaining high conversion rates, flash converters can be
extended to a two-step/multistep [16–24] or sub-ranging architecture [25–33]
(also called a series-parallel converter). Conceptually, these types of converters need
m·2^n comparators instead of the 2^(m·n) of a full flash implementation, assuming n1,
n2, …, nm are all equal to n. However, the conversion in a sub-ranging or two-step/multi-
step ADC does not occur instantaneously like a flash ADC, and the input has to
be held constant until the sub-quantizer finishes its conversion. Therefore, a sam-
ple and hold circuit is required to improve performance. The conversion process is
split into two steps as shown in Fig. 3.1. A simplified two-step A/D architecture and the

Fig. 3.1 Two-step A/D converter



Fig. 3.2 Simplified two-step A/D converter and corresponding T/D converter

corresponding two-step time-to-digital (T/D) converter are illustrated in Fig. 3.2.


The first A/D sub-converter performs a coarse conversion of the input signal. A
D/A converter is used to convert the digital output of the A/D sub-converter back
into the analog domain. The output of the D/A converter is then subtracted from
the analog input. The resulting signal, called the residue, is amplified and fed into
a second A/D sub-converter which takes over the fine conversion to full resolution
of the converter. The amplification between the two stages is not strictly necessary
but is nevertheless carried out in most cases. With the help of this ampli-
fying stage, the second A/D sub-converter can work with the same signal levels
as the first one, and therefore has the same accuracy requirements. At the end of
the conversion the digital outputs of both A/D sub-converters are summed up. By
using concurrent processing, the throughput of this architecture can sustain the
same rate as a flash A/D converter. However, the converted outputs have a latency
of two clock cycles due to the extra stage used to reduce the number of precision com-
parators. If the system can tolerate the latency of the converted signal, a two-step
converter is a lower-power, smaller-area alternative.
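The coarse/fine split can be expressed as a short behavioral model (illustrative Python; the 4 + 4-bit split and the 1 V reference are example values, not taken from the text):

```python
def two_step_adc(vin, vref, n1, n2):
    """Behavioral two-step ADC: coarse quantize, reconstruct with the DAC,
    subtract to form the residue, amplify it by 2**n1, then fine quantize."""
    coarse = min(int(vin / vref * 2**n1), 2**n1 - 1)      # first sub-ADC
    residue = vin - coarse * vref / 2**n1                 # input minus DAC output
    amplified = residue * 2**n1                           # inter-stage gain A = 2**n1
    fine = min(int(amplified / vref * 2**n2), 2**n2 - 1)  # second sub-ADC
    return (coarse << n2) | fine                          # combine both sub-codes

# 4+4-bit example with a 1 V reference: 0.3 V maps to code 76 (= int(0.3 * 256))
code = two_step_adc(0.3, 1.0, 4, 4)
```

The inter-stage gain of 2^n1 is what lets the second sub-converter work with the same signal levels, and hence the same accuracy requirements, as the first.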
The two-step architecture is equipped with a sample-and-hold (S/H) circuit in
front of the converter (Fig.3.1). This additional circuit is necessary because the input
signal has to be kept constant until the entire conversion (coarse and fine) is com-
pleted. By adding a second S/H circuit between the two converter stages, the conver-
sion speed of the two-step A/D converter can be significantly increased (Fig. 3.3).
In a first clock cycle the input sample and hold circuit samples the analog input
signal and holds the value until the first stage has finished its operation and the
outputs of the subtraction circuit and the amplifier have settled. In the next clock

cycle, the S/H circuit between the two stages holds the value of the amplified resi-
due. Therefore, the second stage is able to operate on that residue independently of
the first stage, which in turn can convert a new, more recent sample. The maximum
sampling frequency of the pipelined two-step converter is determined by the set-
tling time of the first stage only due to the independent operation of the two stages.
To generate the digital output for one sample, the output of the first stage has
to be delayed by one clock cycle by means of a shift register (SR) (Fig.3.3).
Although the sampling speed is increased by the pipelined operation, the delay
between the sampling of the analog input and the output of the corresponding digi-
tal value is still two clock cycles. For most applications, however, latency does not
play any role; only conversion speed is important. In most signal processing and tel-
ecommunications applications, the main delay is caused by digital signal process-
ing, so a latency of even more than two clock cycles is not critical.
The architecture as described above is not limited to two stages. Because the
inter-stage sample and hold circuit decouples the individual stages, there is no dif-
ference in conversion speed whether one single stage or an arbitrary number of
stages follow the first one. This leads to the general pipelined A/D converter archi-
tecture, as depicted in Fig. 3.4 [34–55]. Each stage consists of an S/H, an n-bit
flash A/D converter, a reconstruction D/A converter, a subtracter, and a residue

Fig. 3.3 Two-step converter with an additional sample and hold circuit and a shift register (SR)
to line up the stage output in time

Fig. 3.4 Multi-stage pipeline A/D converter architecture


amplifier. The conversion mechanism is similar to that of sub-ranging conversion in
each stage. Now the amplified residue is sampled by the next S/H, instead of being
fed to the following stage. All the n-bit digital outputs emerging from the quantizer
are combined as a final code by using the proper number of delay registers, combi-
nation logic, and digital error correction logic. Although this operation produces a
latency corresponding to the sub-conversion stage before generating a valid output
code, the conversion rate is determined by each stages conversion time, which is
dependent on the reconstruction D/A converter and residue amplifier settling time.
The multi-stage pipeline structure combines the advantages of high throughput
by flash converters with the low complexity, power dissipation, and input capaci-
tance of sub-ranging/multistep converters. The advantage of the pipelined A/D
converter architecture over the two-step converter is the freedom in the choice
of number of bits per stage. In principle, any number of bits per stage is possi-
ble, down to one single bit. It is even possible to implement a noninteger number
of bits such as 1.5 bit per stage by omitting the top comparator of the flash A/D
sub-converter used in the individual stages [56]. It is not necessary, although com-
mon, that the number of bits per stage is identical throughout the pipeline, but can
be chosen individually for each stage [3438]. The only real disadvantage of the
pipelined architecture is the increased latency. For an A/D converter with m stages,
the latency is m clock cycles. For architectures with a small number of bits per
stage, the latency can thus be ten to fourteen clock cycles or even more.
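A behavioral sketch of an m-stage pipeline (illustrative Python; the choice of four 2-bit stages is an arbitrary example) shows how the sub-codes are shifted together by the delay registers and how the latency equals the number of stages:

```python
def pipeline_adc(vin, vref, bits_per_stage):
    """Behavioral pipeline ADC: each stage resolves its bits and hands an
    amplified residue to the next; latency equals the number of stages."""
    code = 0
    residue = vin
    for n in bits_per_stage:                           # one stage per clock cycle
        d = min(int(residue / vref * 2**n), 2**n - 1)  # n-bit flash sub-ADC
        code = (code << n) | d                         # delay registers align sub-codes
        residue = (residue - d * vref / 2**n) * 2**n   # residue amplified by 2**n
    return code, len(bits_per_stage)                   # (final code, latency in cycles)

# four 2-bit stages resolve 8 bits in total; 0.3 V with a 1 V reference -> code 76
code, latency = pipeline_adc(0.3, 1.0, [2, 2, 2, 2])
```

The `bits_per_stage` list makes the per-stage resolution an independent choice, mirroring the freedom the pipelined architecture offers over the two-step converter.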
The throughput rate can be increased further by using a parallel architecture
[57–62] in a time-interleaved manner. The individual A/D converters therefore operate
on a much lower sampling rate than the entire converter, with the reduction in con-
version speed for each individual converter equal to the number of A/D converters in
parallel. The only building block that sees the full input signal bandwidth of the com-
posite converter is the sample-and-hold circuit of each A/D converter. Theoretically, the
conversion rate can be increased by the number of parallel paths, at the cost of a linear
increase in power consumption and large silicon area requirement. A second problem
associated with parallel A/D converters is path mismatch. During operation, the input
signal has to pass different paths from the input to the digital output. If all A/D con-
verters in parallel are identical, these paths are also identical. However, if offset, gain,
bandwidth, or timing mismatches occur between the individual converters, the path for the
input signal changes each time it is switched from one converter to another.
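The path-mismatch problem can be illustrated with a toy model (illustrative Python; the two-path offsets are invented for the example): with a DC input, mismatched per-channel offsets turn into a periodic error at the sub-converter switching rate.

```python
def interleaved_adc(samples, path_offsets):
    """Time-interleaved model: samples go round-robin to M sub-converters,
    each of which adds its own (ideally zero) offset to the result."""
    m = len(path_offsets)
    return [v + path_offsets[i % m] for i, v in enumerate(samples)]

# a DC input through two paths with a 10 mV offset mismatch: the output
# alternates between two values, i.e. a spurious tone at half the sample rate
out = interleaved_adc([0.5] * 6, [0.0, 0.01])
```

Gain, bandwidth, and timing mismatches behave analogously, each converting a fixed per-path error into a signal-dependent modulation at the interleaving rate.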
The successive approximation register (SAR) A/D conversion algorithm [63–73]
reduces the circuit complexity and power consumption by using a low conversion rate,
i.e., by allowing one clock period per bit (plus one for the input sampling). An n-bit

Fig. 3.5 Successive approximation A/D converter architecture

SAR A/D converter, illustrated in Fig. 3.5, typically consists of an S/H circuit followed
by a feedback loop composed of a comparator, a successive approximation register
(SAR) logic block, and an n-bit D/A converter.
The SAR logic captures the data from the comparator at each clock cycle, and
assembles the word driving the D/A converter bit by bit, from the most- to the
least-significant bit, according to the successive approximation algorithm: the D/A
converter generates a value representing half of the reference voltage. Subsequently,
the comparator determines whether the held signal value is over or under the output
value of the digital-to-analog converter and keeps or resets the MSB. The algorithm
proceeds in the same way for each successive bit until all n bits have been
determined. At the start of the next conversion, while the S/H circuit is sampling
the next input, the SAR provides the n-bit output and resets the registers. Offsets in
the S/H circuit or the comparator generate a shift of the conversion range; however,
this shift is identical for every code. The S/H circuit requires a low distortion figure
for relatively low sample periods. Additionally, the D/A converter has stringent
requirements, as it determines the overall circuit linearity and the conversion speed.
Due to the minimal number of analog blocks required, and the very simple digi-
tal logic needed to perform the complete conversion, SAR A/D converters are
usually chosen as the most efficient in terms of power consumption to digitize
biomedical signals.
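The bit-by-bit procedure described above maps directly onto a binary search (a minimal Python sketch, one comparator decision per clock cycle; the 8-bit, 1 V example values are assumptions for illustration):

```python
def sar_adc(vin, vref, n):
    """Behavioral n-bit SAR ADC: test each bit from MSB to LSB and keep it
    only if the DAC output is still at or below the held input."""
    code = 0
    for bit in reversed(range(n)):
        trial = code | (1 << bit)           # tentatively set the current bit
        if vin >= trial * vref / 2**n:      # comparator: held input vs. DAC output
            code = trial                    # keep the bit, otherwise reset it
    return code

# 8-bit example with a 1 V reference: 0.3 V -> code 76, i.e. int(0.3 * 256)
code = sar_adc(0.3, 1.0, 8)
```

Only one comparator, the DAC, and a small register are exercised per cycle, which is the structural reason for the architecture's power efficiency.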

3.3 A/D Converter Building Blocks

3.3.1 Sample and Hold Circuit

Inherent to the A/D conversion process is a sample-and-hold (S/H) circuit that
resides in the front-end of a converter. In addition to suffering from additive cir-
cuit noise and signal distortion just as the rest of the converter does, the S/H also
requires a precision time base to define the exact acquisition time of the input sig-
nal. The dynamic performance degradation of an ADC can often be attributed to
the deficiency of the S/H circuit (and the associated buffer amplifier). The main
function of an S/H circuit is to take samples of its input signal and hold its value
until the A/D converter can process the information. Typically, the samples are
taken at uniform time intervals; thus, the sampling rate (or clock rate) of the cir-
cuit can be determined. The operation of an S/H circuit can be divided into sample
mode (sometimes also referred to as acquisition mode) and hold mode, whose dura-
tions need not be equal. In sample mode, the output can either track the input, in
which case the circuit is often called a track and hold (T/H) circuit or it can be
reset to some fixed value. In hold mode an S/H circuit remembers the value of the
input signal at the sampling moment and thus it can be considered as an analog
memory cell. The basic circuit elements that can be employed as memories are
capacitors and inductors, of which the capacitors store the signal as a voltage (or
charge) and the inductors as a current. Since capacitors and switches with a high

Fig. 3.6 Switched capacitor S/H circuit configuration in sample phase: a circuit with separate CH and CF

Fig. 3.7 Switched capacitor S/H circuit configuration in sample phase: a circuit with one capacitor
off-resistance needed for a voltage memory are far easier to implement in a prac-
tical integrated circuit technology than inductors and switches with a very small
on-resistance required for a current memory, all sample and hold circuits are based
on voltage sampling with switched capacitor (SC) technique. S/H circuit archi-
tectures can roughly be divided into open-loop and closed-loop architectures. The
main difference between them is that in closed-loop architectures the capacitor, on
which the voltage is sampled, is enclosed in a feedback loop, at least in hold mode.
Although open-loop S/H architecture provide high-speed solution, its accuracy,
however, is limited by the harmonic distortion arising from the nonlinear gain of
the buffer amplifiers and the signal-dependent charge injection from the switch.
These problems are especially emphasized in CMOS technology. Enclosing
the sampling capacitor in the feedback loop reduces the effects of nonlinear para-
sitic capacitances and signal-dependent charge injection from the MOS switches.
Unfortunately, an inevitable consequence of the use of feedback is reduced speed.
Figures3.6, 3.7 and 3.8 illustrate three common configurations for closed-loop
switched-capacitor S/H circuits [56, 62–76]. For simplicity, single-ended configu-
rations are shown; however, in circuit implementation all would be fully differ-
ential. In a mixed-signal circuit such as A/D converters, fully differential analog
signals are preferred as a means of getting a better power supply rejection and
immunity to common mode noise. The operation needs two nonoverlapping clock
phasessampling, and holding, or transferring. Switch configurations shown
in Figs.3.6, 3.7 and 3.8 are for the sampling phase, while configurations shown
in Figs.3.9, 3.10, and 3.11 are for hold phase. In all cases, the basic operations
include sampling the signal on the sampling capacitor(s) CH and transferring the
signal charge onto the feedback capacitor CF by using an opamp in the feedback
configuration. In the configuration in Fig.3.6, which is often used as an integrator,

Fig. 3.8 Switched capacitor S/H circuit configuration in sample phase: a circuit with CF shared as a sampling capacitor

Fig. 3.9 Switched capacitor S/H circuit configuration in hold phase: a circuit with separate CH and CF

Fig. 3.10 Switched capacitor S/H circuit configuration in hold phase: a circuit with one capacitor

Fig. 3.11 Switched capacitor S/H circuit configuration in hold phase: a circuit with CF shared as a sampling capacitor

assuming an ideal opamp and switches, the opamp forces the sampled signal
charge on CH to transfer to CF.
If CH and CF are unequal, the signal charge transferred to CF develops a voltage at
the output of the opamp according to Vout = (CH/CF)·Vin. In this way, both S/H and
gain functions can be implemented within one SC circuit [75, 76].
In the configuration shown in Fig.3.7, only one capacitor is used as both sam-
pling capacitor and feedback capacitor. This configuration does not implement the
gain function, but it can achieve high speed because the feedback factor (the ratio
of the feedback capacitor to the total capacitance at the summing node) can be
much larger than that of the previous configuration, operating much closer to the
unity gain frequency of the amplifier. Furthermore, it does not have the capaci-
tor mismatch limitation as the other two configurations. Here, the sampling is
performed passively, i.e., it is done without the opamp, which makes signal acqui-
sition fast. In hold mode, the sampling capacitor is disconnected from the input
and put in a feedback loop around the opamp [56, 62].
Figure 3.8 shows another configuration which is a combined version of the
configurations in Figs.3.6 and 3.7. In this configuration, in the sampling phase,
the signal is sampled on both CH and CF, with the resulting transfer function
Vout = (1 + CH/CF)·Vin. In the next phase, the sampled charge in the sampling
capacitor is transferred to the feedback capacitor. As a result, the feedback capac-
itor has the transferred charge from the sampling capacitor as well as the input
signal charge. This configuration has a wider bandwidth in comparison to the con-
figuration shown in Fig. 3.6, although the feedback factor is comparable. Important
parameters in determining the bandwidth of the SC circuit are Gm (the transconduct-
ance of the opamp), the feedback factor β, and the output load capacitance. In all of these
three configurations, the bandwidth is given by 1/τ = βGm/CL, where CL is the
total capacitance seen at the opamp output. Since the S/H circuit uses an amplifier as
a buffer, the acquisition time will be a function of the amplifier's own specifications.
Similarly, the error tolerance at the output of the S/H is dependent on the ampli-
fier's offset, gain, and linearity. Once the hold command is issued, the S/H faces
fiers offset, gain, and linearity. Once the hold command is issued, the S/H faces
other errors. Pedestal error occurs as a result of charge injection and clock feed-
through. Part of the charge built up in the channel of the switch is distributed onto
the capacitor, thus slightly changing its voltage. Also, the clock couples onto the
capacitor via overlap capacitance between the gate and the source or drain.
Another error that occurs during the hold mode is called droop, which is related
to the leakage of current from the capacitor due to parasitic impedances and to
the leakage through the reverse-biased diode formed by the drain of the switch.
This diode leakage can be minimized by making the drain area as small as can be
tolerated. Although the input impedance to the amplifier is very large, the switch
has a finite off impedance through which leakage can occur. Current can also leak
through the substrate.
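The charge-transfer gain and the settling relation can be made concrete with a small numeric sketch (illustrative Python; the component values CH = 4 pF, CF = 1 pF, Gm = 100 µS, β = 0.2, CL = 2 pF are arbitrary assumptions, not taken from the text):

```python
def sc_stage(vin, ch, cf, gm, beta, cl):
    """Ideal SC gain stage: the opamp forces the sampled charge CH*Vin onto CF,
    so Vout = (CH/CF)*Vin; the settling time constant is tau = CL/(beta*Gm)."""
    vout = (ch / cf) * vin      # held output voltage after charge transfer
    tau = cl / (beta * gm)      # closed-loop bandwidth: 1/tau = beta*Gm/CL
    return vout, tau

# 4x gain stage example values
vout, tau = sc_stage(0.1, 4e-12, 1e-12, 100e-6, 0.2, 2e-12)
# vout ~ 0.4 V, tau ~ 100 ns; settling to 10-bit accuracy needs roughly 7*tau
```

The example also shows the noise/power trade-off discussed below: enlarging the capacitors lowers kT/C noise but lengthens τ unless Gm (and hence bias current) grows with them.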
A prominent drawback of a simple S/H is the on-resistance variation of the
input switch that introduces distortion. Technology scales the supply voltage
faster than the threshold voltage, which results in a larger on-resistance variation
in a switch. As a result, the bandwidth of the switch becomes increasingly signal

dependent. Clock bootstrapping was introduced to keep the switch gate-source
voltage constant (Sect. 3.3.2). Care must be exercised to ensure that the reliability of
the circuit is not compromised.
While the scaling of CMOS technology offers a potential for improvement on
the operating speed of mixed-signal circuits, the accompanying reduction in the
supply voltage and various short-channel effects create both fundamental and
practical limitations on the achievable gain, signal swing, and noise level of these
circuits, particularly under a low power constraint. In the sampling circuit, thermal
noise is produced due to finite resistance of a MOS transistor switch and is stored
in a sampling capacitor. As the sampling circuit cannot differentiate the noise from
the signal, part of this signal acquisition corresponds to the instantaneous value of
the noise at the moment the sampling takes place. In this context, when the sam-
ple is stored as charge on a capacitor, the root-mean-square (rms) total integrated
thermal noise voltage is vns 2 = kT/C , where kT is the thermal energy and C is
H H
the sampling capacitance. This is often referred to as the kT/C noise. No resist-
ance value at the expression is present, as the increase of thermal noise power
caused by increasing the resistance value is cancelled in turn by the decreasing
bandwidth. In the sampling process the kT/C noise usually comprises two major
contributionsthe channel noise of the switches and the amplifier noise. Since no
direct current is conducted by the switch right before a sampling takes place (the
bandwidth of the S/H circuit is assumed large and the circuit is assumed settled),
the 1/f noise is not of concern here; only the thermal noise contributes, which is a
function of the channel resistance that is weakly affected by the technology scal-
ing [77]. On the other hand, the amplifier output noise is in most cases dominated
by the channel noise of the input transistors, where the thermal noise and the 1/f
noise both contribute. Because the input transistors of the amplifier are usually
biased in saturation region to derive large transconductance (gm), impact ioniza-
tion and hot carrier effect tend to enhance their thermal noise level [7880]; the
1/f noise increases as well due to the reduced gate capacitance resulted from finer
lithography and therefore shorter minimum gate length. It follows that, as CMOS
technology scaling continues, amplifier increasingly becomes the dominant noise
source. Interestingly, the input-referred noise (the total integrated output noise as
well) still takes the form of kT/C with some correction factor 1, vns 2 = kT/C .
1 H
Thus a fundamental technique to reduce the noise level, or to increase the signal-
to-noise ratio of an S/H circuit, is to increase the size of the sampling capacitors.
The penalty associated with this technique is the increased power consumption as
larger capacitors demand larger charging/discharging current to keep up the sam-
pling speed.
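The kT/C relation is easy to evaluate numerically (a quick Python check; the 1 pF capacitor and 300 K temperature are example values):

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def ktc_noise_vrms(c_sample, temp_k=300.0):
    """rms thermal noise voltage sampled on a capacitor: sqrt(kT/C)."""
    return math.sqrt(K_B * temp_k / c_sample)

# 1 pF at 300 K gives about 64 uV rms; quadrupling the capacitor halves the
# noise voltage, at the cost of a larger charging current for the same speed
v_1pF = ktc_noise_vrms(1e-12)
v_4pF = ktc_noise_vrms(4e-12)
```

Because the noise voltage scales as 1/sqrt(C), each additional bit of SNR (6 dB) demands a 4x larger sampling capacitor, which is why this technique carries a direct power penalty.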

3.3.2 Bootstrap Switch Circuit

In standard CMOS technologies, the threshold voltage of MOS transistors does
not scale with the supply voltage, and this becomes a significant problem when
MOS transistors are used as switches at low voltages. When the signal ampli-
tudes are large, accuracy and signal bandwidth are limited by distortion, which
originates from the fact that the switch on-resistance is not constant but varies
as a function of the drain and source voltages. The on-resistance is expressed as
Ron = L/(µCoxW(VGS − VT)) if VDS is small. In the equation two different signal-
dependent terms can be identified. The first and dominant one is the gate-source
voltage VGS. The second is the dependency of the threshold voltage VT on the source-
bulk voltage. Although large transistor switches can be used for the worst-case VT design,
the switch parasitic capacitance can significantly overload the output of the circuit.
Therefore, increasing VGSVT is desirable to implement low on-resistance switch
without adding too much parasitic capacitance.
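The signal dependence of Ron, and the way a constant gate-source voltage removes it, can be sketched numerically (illustrative Python; VT = 0.4 V and the device factor β = µCoxW/L = 2 mA/V² are assumed example values):

```python
def switch_ron(vin, vgate, vt=0.4, beta=2e-3):
    """Triode on-resistance Ron = 1/(beta*(VGS - VT)), with beta = u*Cox*W/L.
    For a plain NMOS switch VGS = Vgate - Vin, so Ron varies with the signal."""
    vgs = vgate - vin
    if vgs <= vt:
        return float('inf')               # switch does not turn on
    return 1.0 / (beta * (vgs - vt))

# plain switch with a 1 V clock: Ron rises sharply as Vin approaches VDD - VT
r_low = switch_ron(0.0, 1.0)              # VGS = 1.0 V
r_high = switch_ron(0.5, 1.0)             # VGS = 0.5 V -> roughly 6x larger Ron
# gate tracking the input with a VDD offset keeps VGS, and hence Ron, constant
r_boot = switch_ron(0.5, 0.5 + 1.0)
```

The model makes the distortion mechanism explicit: the mid-scale sample sees several times the resistance of the low-scale sample, while the tracking-gate case restores a signal-independent bandwidth.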
Several methods allow an increase of this gate voltage drive. One method is to
reduce VT by including an extra low-threshold transistor in the process, although this
adds to process complexity. Another method is to increase VGS using one large
supply created from the chip supply to drive all switches on the chip, but potential
problems, including possible cross-talk to some sensitive nodes through the shared
supply and difficulty in estimating the total charge drain to drive all switches, render
this method impractical.
Another viable solution to avoid this major source of nonlinearity is to make the
switch gate-source voltage constant, by making the gate voltage track the source
voltage with an offset Voff_in, which is, at its maximum, equal to the supply volt-
age. This technique, which is implemented in this design, is called bootstrap-
ping [81]. In this case, the bootstrap circuit shown in Fig. 3.12 drives each switch
that uses the same clock, to avoid the problem of crosstalk through the clock line.
A Voff_in can be generated with a switched capacitor, which is pre-charged in
every clock cycle. During the clock phase when the transistor is nonconductive, the
switched capacitor is pre-charged to Voff_in. To turn the switch on, the capacitor

Fig. 3.12 Bootstrap circuit to boost the clock voltage (schematic not reproduced)


3.3 A/D Converter Building Blocks 45

is switched between the input voltage and the transistor gate. The capacitor values are chosen as small as possible for area considerations, but large enough to sufficiently charge the load to the desired voltage levels. The device sizes are chosen to create sufficiently fast rise and fall times at the load. The load consists of the gate capacitance of the switching device T10 and any parasitic capacitance due to interconnect between the bootstrap circuit and the switching device. Therefore, it is desirable in the layout to minimize the distance between the bootstrap circuit and the switch, or to insert shielding protection. When the switch T10 is on, its gate voltage VG is greater than the analog input signal Vin by a fixed difference of Voff_in = VDD. Although the absolute voltage applied to the gate may exceed VDD for a positive input signal, none of the terminal-to-terminal device voltages exceeds VDD. A single-phase clock clk turns the switch T10 on and off. During the off phase, clk is low, discharging the gate of the switch to ground through devices T11 and T12.
At the same time, VDD is applied by T3 and T7 across the capacitor-connected transistor T16, which acts as a battery across the gate and source during the on phase. T8 and T9 isolate the switch from the capacitance while it is charging. When clkn goes high, T6 pulls down the gate of T8, allowing charge from the battery capacitor to flow onto the gate of T10. This turns on both T9 and T10. T9 enables the gate of T10 to track the input voltage applied at the source of T10, shifted by VDD, keeping the gate-source voltage constant regardless of the input signal.
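The effect the bootstrap circuit achieves can be illustrated numerically. The sketch below (all device values are illustrative assumptions, not taken from this design) compares the on-resistance of a conventionally driven switch, where VGS = VDD − Vin varies with the input, with a bootstrapped switch, where VGS = VDD is constant:

```python
# Numeric sketch (assumed values) of why bootstrapping linearizes the switch:
# plain drive gives VGS = VDD - Vin, so R_on = L/(mu*Cox*W*(VGS - VT)) is
# signal dependent; a bootstrapped gate at Vin + VDD keeps VGS = VDD constant.
MU_COX_WL = 4e-3          # mu*Cox*(W/L) [A/V^2], assumed process/size figure
VT, VDD = 0.45, 1.2       # threshold and supply [V], assumed

def r_on(vgs):
    """Triode-region on-resistance of the MOS switch (small VDS)."""
    return 1.0 / (MU_COX_WL * (vgs - VT))

for vin in (0.0, 0.3, 0.6):
    plain = r_on(VDD - vin)   # conventional switch: gate tied to VDD
    boot = r_on(VDD)          # bootstrapped switch: gate at Vin + VDD
    print(f"Vin={vin:.1f} V: plain={plain:.0f} ohm, bootstrapped={boot:.0f} ohm")
```

The bootstrapped value stays at 1/(μCox(W/L)(VDD − VT)) for every in-range input, which removes the dominant signal-dependent term identified above.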

3.3.3 Operational Amplifier Circuit

The maximum speed and, to a large extent, the power consumption of the S/H are determined by the operational amplifier. In general, the amplifier's open-loop dc gain limits the settling accuracy of the amplifier output, while the bandwidth and slew rate of the amplifier determine the maximum clock frequency. The operational amplifiers in the S/H circuit have some unique requirements, the most important of which is the input impedance, which must be purely capacitive so as to guarantee the conservation of charge. Consequently, the operational amplifier input has to be either in the common-source or the source-follower configuration. Another characteristic feature of the S/H circuit is the load at the amplifier output, which is typically purely capacitive; as a result, the amplifier output impedance can be high. The benefit of driving solely capacitive loads is that no output voltage buffers are required. In addition, if all the amplifier internal nodes have low impedance, and only the output node has high impedance, the speed of the amplifier can be maximized. Unfortunately, an output stage with very high output impedance cannot usually provide a high signal swing.
The ultimate settling accuracy is limited by the finite amplifier dc gain. The exact settling error depends not only on the gain but also on the feedback factor of the circuit utilizing the amplifier. A very widely used method to improve the dc gain is based on local negative feedback [82–84]. In addition to this cascode

regulation, other techniques for increasing the dc gain have been proposed as well. Gain boosting with positive feedback has been investigated [85, 86]. In [87], dynamic biasing, where the opamp current is decreased toward the end of the settling phase, is used to increase the dc gain; it exploits the fact that current reduction lowers the transistor gDS, which increases the dc gain. By regulating the gate voltages of the cascode transistors with an extra gain stage [88], the dc gain of the amplifier can be increased by several orders of magnitude.
Besides the amplifier bandwidth, the settling time is limited by the fact that the amplifier can supply only a finite current to the load capacitor. Consequently, the output cannot change faster than the slew rate. When designing an amplifier, the load capacitor is known and the required slew rate SR = k·Vmax/TS can be calculated from the largest voltage step Vmax and the clock period TS. A commonly used rule of thumb suggests that one third of the settling time should be reserved for slewing, resulting in a k of six. The required slewing current is ISR = (k·Vmax·CL)/TS. It is linearly dependent on the clock frequency, while the current needed to obtain the amplifier bandwidth has a quadratic dependence. The opamp unity-gain frequency ω1 can be made larger by increasing gm,in, by means of making the transistors bigger; however, this does not necessarily imply a faster opamp. The parasitic capacitance is also increased, so the feedback factor β becomes smaller and the dominant pole ωp = βω1 is pushed towards lower frequencies. Therefore, a trade-off between the increase of gm,in and CG exists. This suggests that an optimum size for the input pair exists, which maximizes the transconductance of the opamp while avoiding making the input capacitance dominant in the feedback factor.
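As a quick sanity check of the slewing budget, the required current ISR = (k·Vmax·CL)/TS can be evaluated; the step size, load capacitance and clock rate below are illustrative assumptions, not design values:

```python
# Back-of-the-envelope slewing-current budget, I_SR = (k*Vmax*CL)/TS with
# k = 6 (one third of the settling time reserved for slewing). All numbers
# are assumed example values.
k = 6
v_max = 0.5        # largest output voltage step [V], assumed
c_load = 2e-12     # load capacitance [F], assumed
f_clk = 20e3       # clock frequency [Hz], assumed

t_s = 1.0 / f_clk
i_sr = k * v_max * c_load / t_s          # required slewing current
print(f"I_SR = {i_sr * 1e9:.0f} nA at f_clk = {f_clk / 1e3:.0f} kHz")

# Doubling the clock rate doubles I_SR (the linear dependence noted above):
i_sr_2x = k * v_max * c_load * (2 * f_clk)
print(f"I_SR = {i_sr_2x * 1e9:.0f} nA at f_clk = {2 * f_clk / 1e3:.0f} kHz")
```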
An overview of several single- and two-stage amplifiers is given in Sect. 2.3.

3.3.4 Latched Comparator Circuit

Because of their fast response, regenerative latches are used, almost without exception, as comparators for high-speed applications. An ideal latched comparator is composed of a preamplifier with infinite gain and a digital latch circuit. Since the amplifiers used in comparators need to be neither linear nor closed-loop, they can incorporate positive feedback to attain virtually infinite gain [89]. Because of this special architecture, the operation of a latched comparator can be divided into two stages: tracking and latching. In the tracking stage, the following dynamic latch circuit is disabled, and the input analog differential voltages are amplified by the preamplifier. In the latching stage, the preamplifier is disabled while the latch circuit regenerates the amplified differential signals into a pair of full-scale digital signals with a positive feedback mechanism and latches them at the outputs.
Depending on the type of latch employed, latched comparators can be divided into two groups: static [56, 90, 91], which have a constant current consumption during operation, and dynamic [92–94], which do not consume any static power. While dynamic latch circuits regenerate the difference signals, the large voltage

Fig. 3.13 Static latch comparators: [90] (schematic not reproduced)

variations on the regeneration nodes introduce large instantaneous currents. Through the parasitic gate-source and gate-drain capacitances of the transistors, these instantaneous currents are coupled to the comparator inputs, causing unacceptable disturbances; this is the so-called kickback noise. In A/D converters where a large number of comparators are switched on or off at the same time, the sum of the variations coming from the regeneration nodes may become unexpectedly large and directly result in a false quantization code output [95].
The static latched comparator from [90] is shown in Fig. 3.13. When the clock signal is high, T10 and T11 discharge the latch, formed with the cross-connected transistors T8–9, to the output nodes. When the latch signal goes low, the drain current difference between T6 and T7 appears as the output voltage difference. However, some delay is present in the circuit, since T8 and T11 have to wait until either side of the output voltage becomes larger than VT. Another drawback is that a static current flows in the comparators which are close to the threshold after the output is fully developed.
Assume the potential of the Voutp node is higher than that of the Voutn node. After a short period, T11 turns off and the potential of Voutp becomes VDD; however, since T8 is in the linear region, the static current from T6 drains during the regeneration period. Since the input transistors are isolated from the regeneration nodes through the current mirror, the kickback noise is reduced. However, the speed of the regeneration circuit is limited by the bias current, which makes this topology unsuitable for low-power high-speed applications.
Figure 3.14 illustrates the schematic of the comparator given in [56]. The circuit consists of a folded-cascode amplifier (T1–T7) where the load has been replaced by a current-triggered latch (T8–T10). When the latch signal is high (resetting period), transistor T10 shorts both latch outputs. In addition, the on-resistance R of T10 can give an extra gain at the latch output, Areset = (gm1,2·R)/(2 − gm8,9·R), which speeds up the regeneration process. However, the on-resistance R should be

Fig. 3.14 Static latch comparators: [56] (schematic not reproduced)

chosen such that gm8,9·R < 2, and should be small enough to reset the output at the clock rate. Since all transistors are in the active region, the latch can start regenerating right after the latch signal goes low. The one disadvantage of this scheme is the large kickback noise. The folding nodes (the drains of T4 and T5) have to jump up to VDD in every clock cycle, since the latch outputs swing full scale. Because of this, substantial amounts of kickback noise reach the inputs through the gate-drain capacitors of the input transistors T1 and T2 (CGD1, CGD2). To reduce the kickback noise, clamping diodes have been inserted at the output nodes [96].
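The reset-phase gain expression and its stability condition can be checked numerically; the transconductance and resistance values below are illustrative assumptions:

```python
# Sketch of the reset-phase gain of the current-triggered latch of [56]:
# A_reset = (gm1,2*R)/(2 - gm8,9*R), meaningful only while gm8,9*R < 2.
# All transconductances and resistances are assumed example values.
def a_reset(gm_in, gm_latch, r):
    if gm_latch * r >= 2:
        raise ValueError("gm8,9*R must stay below 2")
    return (gm_in * r) / (2 - gm_latch * r)

gm_in, gm_latch = 200e-6, 100e-6    # [S], assumed
for r in (5e3, 15e3):               # on-resistance of reset switch T10 [ohm]
    print(f"R = {r/1e3:.0f} kohm -> A_reset = {a_reset(gm_in, gm_latch, r):.2f}")
```

As gm8,9·R approaches 2 the reset gain grows, but R must still be small enough to reset the output at the clock rate, which is the trade-off stated above.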
The design of [91] is illustrated in Fig. 3.15. Here, when the latch signal is low (resetting period), the amplified input signal is stored at the gates of T8 and T9 and T12 shorts Voutp and Voutn together. When the latch signal goes high, the cross-coupled transistors T10 and T11 form a positive feedback latch. In addition, the positive feedback capacitors C1 and C2 boost the regeneration speed by switching T8 and T9 from an input-dependent current source during the resetting period to a cross-coupled latch during the regeneration period. Because of C1 and C2, T8–T11 work as a cross-coupled inverter pair, so the latch does not dissipate static power once it completes the regeneration period. However, a large amount of kickback noise flows through the positive feedback capacitors C1 and C2. The switches T6, T7 and T13 have been added to isolate the preamplifier from the latch. Consequently, a relatively large chip area is required due to the positive feedback capacitors (C1, C2), the isolation switches (T6, T7 and T13) and the complementary latch signals.
The concept of a dynamic comparator exhibits potential for low-power and small-area implementation and, in this context, is restricted to single-stage topologies without static power dissipation. A widely used dynamic comparator, based on a differential sensing amplifier and introduced in [92], is shown in Fig. 3.16. Transistors T1–4, biased in the linear region, adjust the threshold resistively, and above them transistors T5–12 form a latch. When the latch control signal is low, the

Fig. 3.15 Static latch comparators: [91] (schematic not reproduced)

Fig. 3.16 Dynamic latch comparator [92] (schematic not reproduced)

transistors T9 and T12 are conducting and T7 and T8 are cut off, which forces both differential outputs to VDD, and no current path exists between the supply voltages. Simultaneously, T10 and T11 are cut off and the transistors T5 and T6 conduct. This implies that T7 and T8 have a voltage of VDD across them. When the comparator is latched, T7 and T8 are turned on. Immediately after the regeneration moment, the gates of the transistors T5 and T6 are still at VDD and they enter saturation, amplifying the voltage difference between their sources. If all transistors T5–12 are assumed to be perfectly matched, the imbalance of the conductances of the left and right input branches, formed by T1–2 and T3–4, determines which of the outputs goes to VDD and which to 0 V. After a static situation is reached (Vclk is high), both branches are cut off and the outputs preserve their values until the comparator is reset again by switching Vclk to 0 V. The transistors T1–4 connected to the input

and reference are in the triode region and act like voltage-controlled resistors. The transconductance of the transistors T1–4 operating in the linear region is directly proportional to the drain-source voltage VDS1–4 of the corresponding transistor, while for the transistors T5–6 the transconductance is proportional to VGS5,6 − VT. At the beginning of the latching process, VDS1–4 ≈ 0 while VGS5,6 − VT ≈ VDD. Thus, gm5,6 >> gm1–4, which makes the matching of T5 and T6 dominant in determining the latching balance. As small transistors are preferred, offset voltages of a few hundred millivolts easily result. Mismatches in transistors T7–12 are attenuated by the gain of T5 and T6, which makes them less critical. To cope with the mismatch problem, the layout of the critical transistors must be drawn as symmetrically as possible. In addition to the mismatch sensitivity, the latch is also very sensitive to an asymmetry in the load capacitance. This can be avoided by adding an extra latch or inverters as a buffering stage after the comparator core outputs.
The resistive-divider dynamic comparator topology has one clear benefit, which is its low kickback noise. This results from the fact that the voltage variation at the drains of the input transistors T1–4 is very small. On the other hand, the speed and resolution of the topology are relatively poor because of the small gain of the transistors biased in the linear region.
A fully differential dynamic comparator based on two cross-coupled differential pairs with switched current sources, loaded with a CMOS latch, is shown in Fig. 3.17 [93]. The trip point of the comparator can be set by introducing imbalance between the source-coupled pairs. Because the dynamic current sources, together with the latch, are connected directly between the differential pairs and the supply voltage, the comparator does not dissipate dc power. When the comparator is inactive, the latch signal is low, which means that the current source transistors T5 and T6 are switched off and no current path between the supply voltages exists. Simultaneously, the p-channel switch transistors T9 and T12 reset the outputs by shorting them to VDD. The n-channel transistors T7 and T8 of the latch conduct

Fig. 3.17 Dynamic latch comparator [93] (schematic not reproduced)

and also force the drains of all the input transistors T1–4 to VDD, while the drain voltages of T5 and T6 depend on the comparator input voltages. When the clock signal is raised to VDD, the outputs are disconnected from the positive supply, the switching current sources T5 and T6 turn on, and T1–4 compare Vinp − Vinn with Vrefp − Vrefn. Since the latch devices T7–8 are conducting, the circuit regeneratively amplifies the voltage difference at the drains of the input pairs. The threshold voltage of the comparator is determined by the current division in the differential pairs and between the cross-coupled branches.
The threshold level of the comparator can be derived using large-signal current equations for the differential pairs. The effect of the mismatches of the other transistors T7–12 is not completely critical in this topology, because the input is amplified by T1–4 before T7–12 latch. The drains of the cross-coupled differential pairs are high-impedance nodes, and the transconductances of the threshold-voltage-determining transistors T1–4 are large. A drawback of the differential-pair dynamic comparator is its high kickback noise: large transients in the drain nodes of the input transistors are coupled to the input nodes through the parasitic gate-drain capacitances. However, there are techniques to reduce the kickback noise, e.g., by cross-coupling dummy transistors from the differential inputs to the drain nodes [97]. The differential-pair topology achieves high speed and resolution, which results from the built-in dynamic amplification.
Figure 3.18 illustrates the schematic of the dynamic latch given in [94]. The dynamic latch consists of pre-charge transistors T12 and T13, cross-coupled inverters T6–9, differential pair T10 and T11, and switch T14, which prevents static current flow during the resetting period. When the latch signal is low (resetting period), the drain voltages of T10–11 are VDD − VT, and their source voltage is VT below the latch input common-mode voltage. Therefore, once the latch signal goes high, the n-channel transistors T7,9–11 immediately enter the active region. Because each transistor in one of the cross-coupled inverters turns off, there is no static power dissipation from the latch once the latch outputs are fully developed.

Fig. 3.18 Dynamic latch comparators [94] (schematic not reproduced)

3.4 Voltage-Domain SAR A/D Conversion

Multi-channel, fully-differential designs allow for spatial neural recording and stimulation at multiple sites [98–100]. The maximum number of channels is constrained by noise, area, bandwidth, power [101] (which has to be supplied to the implant from outside), thermal dissipation (to avoid necrosis of the tissue even by a moderate heat flux [102]), and the scalability and expandability of the recording system.
The block diagram of a typical neural recording system architecture is illustrated in Fig. 3.19. To avoid the large capacitive D/A converter found in the SAR A/D converter, to lower the demands on the driving capabilities of the amplifier, and to relax the power, noise and cross-talk requirements, in the alternative architecture illustrated in Fig. 3.20 the programmable gain amplifier (PGA) and the ADC are combined and embedded in every recording channel. The programmable gain analog-to-digital converter (PG ADC) simultaneously implements signal acquisition and amplification, and data conversion.
As illustrated in Fig. 3.21 [103], the schematic incorporates a fully-differential operational transconductance amplifier (OTA), a comparator, and circuitry for control of the acquisition and amplification operation, set by the clock phases φs1, φs2 and φs3, and of the output generation and data conversion operation, controlled by the clock phases φ1 and φ2. The recorded signals are capacitively coupled to the input of the amplifier to reject the dc polarization. The differential input signal is sampled, amplified by the capacitance ratio (the gain GA is adjustable by implementing C3 as a programmable capacitor array, GA = C3/C4), and transferred to the integration capacitors C4 in the feedback loop of the OTA. In the data conversion operation, the differential signal stored in C4 is converted to the digital domain by successively adding or subtracting binary-scaled versions of the reference voltage to the integration
Fig. 3.19 Multichannel neural interfaces: multiplexing an ADC between M channels (block diagram not reproduced)
Fig. 3.20 Multichannel neural interfaces: an ADC per channel with serial interfacing (block diagram not reproduced)



Fig. 3.21 Schematic of programmable gain ADC (schematic not reproduced)

capacitors. Voltage addition or subtraction is implemented by means of the four cross-coupled switches controlled by the signals φ2p and φ2n. The digital representation of the output signal is then sequentially stored in the SAR register for further processing.
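The acquisition-amplification-conversion sequence described above can be modeled behaviorally. In the sketch below (the capacitor ratio, reference voltage and resolution are assumed example values, not the implemented ones), the input is first scaled by GA = C3/C4, and then binary-scaled fractions of the reference are added or subtracted according to the sign of the residue, one bit per cycle:

```python
# Behavioral sketch of the PG ADC: capacitive gain GA = C3/C4 followed by a
# successive-approximation conversion that adds/subtracts binary-scaled
# fractions of Vref on each cycle. All parameter values are illustrative.
def pg_adc(vin_diff, c3=8.0, c4=1.0, vref=1.0, nbits=8):
    v = (c3 / c4) * vin_diff          # acquisition + amplification phase
    code = 0
    for i in range(nbits):            # data-conversion phase, one bit per cycle
        step = vref / 2 ** (i + 1)    # binary-scaled reference fraction
        if v >= 0:
            code = (code << 1) | 1
            v -= step                 # subtract when the residue is positive
        else:
            code = code << 1
            v += step                 # add when the residue is negative
    return code

# 0.05 V differential input with a gain of 8 gives an offset-binary code:
print(pg_adc(0.05))
```

The result is an offset-binary representation of the amplified input over the ±Vref range, which matches the add-or-subtract switching described above.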
For discrete-time analog signal-processing circuits, analog signals are acquired and processed consecutively, and a sample of the signal is taken periodically, controlled by a clock signal. As the sampling circuit cannot differentiate the noise from the signal, part of this signal acquisition corresponds to the instantaneous value of the noise at the moment the sampling takes place. In this context, when the sample is stored as charge on a capacitor, the root-mean-square total integrated thermal noise voltage is √(kT/C4), where kT is the thermal energy. This noise usually comprises two major contributions: the channel noise of the switches, which is a function of the channel resistance, and the OTA noise.
The OTA output noise is in most cases dominated by the channel noise of the input transistors, to which both the thermal noise and the 1/f noise contribute. If the input transistors of the OTA are biased in the saturation region to derive a large transconductance gm, impact ionization and the hot carrier effect will enhance their thermal noise level [104]. Similarly, the 1/f noise increases as well, due to the reduced gate capacitance resulting from finer lithography and therefore shorter minimum gate length. As a consequence, an accurate consideration of the intrinsic noise sources in such a circuit should include the thermal noise of the switches and all amplifier noise. For a given speed requirement and signal swing, a two times reduction in noise voltage requires a four times increase in the sampling

Fig. 3.22 Maximum achievable SNR for different sampling capacitor values and resolutions (8 to 14 bit) (plot not reproduced)
Fig. 3.23 SNR versus power dissipation (8 to 14 bit) (plot not reproduced)

capacitance value and the OTA size. This means that the PG ADC circuit power quadruples for every additional bit resolved for a given speed requirement and supply voltage, as illustrated in Figs. 3.22 and 3.23. Notice that for small sampling capacitor values, thermal noise limits the SNR, while for a large sampling capacitor, the SNR is limited by the quantization noise and the curve flattens out. Improving the power efficiency beyond topological changes of the OTA and supply voltage reduction requires smart allocation of the biasing currents. Hence, techniques such as current reuse [105, 106], time multiplexing [4, 106], and adaptive duty-cycling of the entire analog front end [107, 108] can be used to improve power efficiency by exploiting the fact that neuron spikes are irregular and of low frequency.
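The trend of Fig. 3.22 follows directly from the two noise contributions, sampled kT/C noise and quantization noise; the sketch below reproduces it numerically (the full-scale amplitude is an assumed value):

```python
import math

# Sketch of the SNR trend of Fig. 3.22: for small C4 the sampled kT/C noise
# dominates; for large C4 the quantization noise takes over and the curve
# flattens. The full-scale amplitude is an assumed example value.
K_BOLTZMANN = 1.380649e-23
T = 300.0          # temperature [K]
VFS = 1.0          # full-scale (peak) signal amplitude [V], assumed

def snr_db(c4, nbits):
    p_signal = VFS ** 2 / 2.0                 # sine-wave signal power
    p_thermal = K_BOLTZMANN * T / c4          # sampled kT/C noise power
    lsb = 2.0 * VFS / 2 ** nbits
    p_quant = lsb ** 2 / 12.0                 # quantization noise power
    return 10.0 * math.log10(p_signal / (p_thermal + p_quant))

for c4 in (1e-15, 100e-15, 10e-12):
    print(f"C4 = {c4:.0e} F -> SNR = {snr_db(c4, 10):.1f} dB")
```

For a 10-bit setting the curve saturates near the ideal 6.02·N + 1.76 dB once C4 is large enough, which is the flattening noted above.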
Choosing the OTA bandwidth too high increases the noise and additionally
demands unnecessarily low on-resistance of the switches and thus large transistor
dimensions. The optimum time constant remains constant regardless of the circuit
size (or ID) because CL scales together with C4 and the parasitic capacitance Cp.
The choice of the hold capacitor value is a trade-off between noise requirements
on the one hand and speed and power consumption on the other hand.

Fig. 3.24 Closed-loop normalized time constant τ/τt versus hold capacitance C4 for bias currents from 10 µA to 1 mA; case for C4 = 3CL, CL = Cp. The time constant is normalized to the τt (= 1/ωt,intrinsic) of the device, which is approximately CG/gm (plot not reproduced)

The sampling action adds kT/C noise to the system, which can only be reduced by increasing the hold capacitance C4. A large capacitance, on the other hand, increases the load of the operational amplifier and thus decreases the speed for a given power. The OTA size and its bias current for a given speed requirement and minimum power dissipation are determined using τ-versus-C4 curves as in Fig. 3.24. Note that for low-frequency operation (where τ/τt is large), the COTA that achieves the minimum power dissipation for given settling time and noise requirements usually does not correspond to the minimum time constant point. This is a consequence of setting the C4/COTA ratio of the circuit to the minimum time constant point, which requires a larger COTA and results in a power increase and excessive bandwidth. Near the speed limit of the given technology (where the ratio τ/τt is small), however, the difference in power between the minimum power point and the minimum time constant point becomes smaller, as the stringent settling time requirement forces the C4/COTA ratio (Fig. 3.25) to be at its optimum value to achieve the maximum bandwidth.
The OTA in the PG ADC circuit has some unique requirements; the most important is the input impedance, which must be purely capacitive so as to guarantee the conservation of charge. Consequently, the OTA input has to be either in the common-source or the source-follower configuration.
Another characteristic feature is the load at the OTA output, which is typically purely capacitive, and as a result the OTA output impedance must be high. The benefit of driving solely capacitive loads is that no output voltage buffers are required. The implemented folded-cascode OTA is illustrated in Fig. 3.26. The input stage of the OTA is provided with two extra transistors, T10 and T11, in a common-source connection, having their gates connected to a desired reference common-mode voltage at the input and their drains connected to ground [88]. The advantage of this solution is that the common-mode range at the output is not restricted by a regulation circuit and can approach rail-to-rail behavior very closely. The transistors of the output stage have two constraints: the gm of the cascode transistors T5,6 must be high enough, in order to boost the output resistance

Fig. 3.25 Optimum gate capacitance COTA,opt versus hold capacitance C4 for different loading and parasitic conditions (CL = Cp from 0.5 pF to 1.5 pF) (plot not reproduced)

Fig. 3.26 OTA schematic (schematic not reproduced)

of the cascode, allowing a high enough dc gain; and the saturation voltage of the active loads T3,4 and T7,8 must be maximized, in order to reduce the extra noise contribution of the output stage. These considerations underline a trade-off between fitting the saturation voltage into the voltage headroom and minimizing the noise contribution. A good compromise is to make the cascode transistors larger than the active loads: in this way the gm of the cascode transistors is maximized, boosting the dc gain, while their saturation voltage is reduced, allowing a larger saturation voltage for the active loads without exceeding the voltage headroom.

In order to maximize the output SNR, CL must be maximized, which means that the bandwidth must be minimized. The input-referred noise of the OTA input pair is reduced by increasing gm, increasing the current, or increasing the aspect ratio of the devices. The effect of the last method, however, is partially canceled by the increase in the noise excess factor γ. When referred to the OTA input, the noise voltages of the current sources (or mirrors) in the first stages are multiplied by the gm of the device itself and divided by the gm of the input transistor, which again suggests that maximizing the input-pair gm minimizes noise. The noise can be further reduced by decreasing the gm of the current sources. Since the current is usually set by other requirements, the only possibility is to decrease the aspect ratio of the device. This leads to an increase in the gate overdrive voltage, which, as a positive side effect, also decreases γ. Increasing L to avoid short-channel effects is also possible, although with a constant aspect ratio it increases the parasitic capacitances.
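The referral rule above, where a current source's noise voltage is scaled by the ratio of its gm to the input-pair gm, can be written out directly; the spot-noise densities and transconductances below are illustrative assumptions:

```python
# Toy calculation (assumed spot-noise densities) of input-referred noise:
# the input pair contributes directly, while a current-source device's noise
# voltage is scaled by gm_source/gm_input before the powers are summed.
def input_referred_noise(vn_input, vn_source, gm_input, gm_source):
    return (vn_input ** 2 + (vn_source * gm_source / gm_input) ** 2) ** 0.5

vn = input_referred_noise(5e-9, 8e-9, 400e-6, 100e-6)   # [V/sqrt(Hz)]
print(f"input-referred density: {vn * 1e9:.2f} nV/sqrt(Hz)")

# Halving the current-source gm (smaller aspect ratio) shrinks its share:
vn_low = input_referred_noise(5e-9, 8e-9, 400e-6, 50e-6)
print(f"with halved source gm:  {vn_low * 1e9:.2f} nV/sqrt(Hz)")
```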
The dynamic latch illustrated in Fig. 3.27 consists of pre-charge transistors T14 and T17, cross-coupled inverters T12–13 and T15–16, differential pair T10 and T11, and switch T9, which prevents static current flow during the resetting period [94]. A large portion of the total comparator current is allocated to the input branches to boost the input gm. Similarly, the noise from the non-gain elements, i.e. the load transistors, is minimized by applying a small biasing current. Additionally, a small width and large length is chosen for their gate dimensions.
The converter utilizes synchronous SAR logic consisting of a cascaded multiple-input, n-bit shift register (Fig. 3.28) to generate the digital output code and the switch control signals for the D/A converter. The successive approximation algorithm starts with the activation of the MSB, while the other bits remain zero. As the conversion continues, the rest of the bits are successively activated. Each bit evaluates the state of the others and, as a function of the result, decides whether it has to be activated, keep its value, or take the value of the comparator [109]. The selection depends on the state of the register itself and the states of the following registers. As a result, the switching activity is not high and the leakage power dominates the total power. To reduce the leakage currents, several techniques are
Fig. 3.27 Comparator schematic (schematic not reproduced)

Fig. 3.28 (a) Multi-input SAR logic, (b) register (schematic not reproduced)

employed, including increased channel length, minimum transistor width, and replacing the gate transistors with stacked pairs [110].
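The register sequence described above (activate the MSB first, then let each bit keep or drop its value according to the comparator before the next bit is activated) can be sketched behaviorally; the ideal comparator model and the 8-bit resolution are assumptions for illustration:

```python
# Behavioral sketch of the successive-approximation sequence: the MSB is
# activated first; on each cycle the trial bit is kept or cleared according to
# the comparator decision, then the next bit is activated.
def sar_conversion(compare, nbits=8):
    """Return the list of trial codes; compare(code) is True while the DAC
    output does not exceed the input (ideal comparator model)."""
    code, trace = 0, []
    for bit in reversed(range(nbits)):
        code |= 1 << bit            # activate the current bit
        if not compare(code):
            code &= ~(1 << bit)     # comparator says too high: clear it
        trace.append(code)
    return trace

target = 100                        # hypothetical input level, in LSBs
trace = sar_conversion(lambda c: c <= target)
print(trace)                        # the last entry is the converted code
```

This is the n-cycle binary search that both the voltage-domain converter above and the current-domain converter of the next section perform, differing only in how the comparison is realized.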

3.5 Current-Domain SAR A/D Conversion

Current-mode converters offer high resource efficiency in terms of power and area [111–114]. In contrast to a voltage-mode charge-redistribution SAR A/D converter, the corresponding current-mode circuit has several intrinsic advantages, including tunable input impedance, wide bandwidth, and a low supply-voltage requirement. Additionally, only MOSFET devices are required for the logical and numerical operations, limiting the area requirements. The current-mode SAR A/D converter is implemented following the conventional architecture. The output digital code is generated by comparing the input current, provided through a current sample-and-hold circuit (S/H), with a reference current supplied by a binary current D/A converter (DAC). The comparison is performed in sequence for each bit of the selected resolution, adding up to n cycles per conversion (i.e., a binary search). The current comparison requires only injecting two currents into a single node and using the current flowing out of the node as the algebraic difference of the two input currents. Since most current-source implementations have high output impedance, the nodal voltage generated by the output current indicates the result of the comparison. In each cycle, the current comparator feeds back to the SAR logic, which adjusts the reference current generated by the current-mode D/A converter closer to the input value. The input dynamic range of the D/A converter is controlled by the biasing current. As a consequence, the power consumption of the DAC is directly proportional to the signal level and, accordingly, advantageous for low-energy neural signals.
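The n-cycle binary search described above can be sketched behaviorally. This is a minimal model assuming ideal current sources and an ideal comparator; the function name and parameter values are illustrative, not part of the designs in [111–114].

```python
def sar_current_adc(i_in, i_ref, n_bits=10):
    """Behavioral model of a current-mode SAR conversion.

    i_in  : held input current from the S/H stage
    i_ref : full-scale reference current of the binary DAC
    Returns the n-bit output code after n comparison cycles.
    """
    code = 0
    for bit in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit)                 # tentatively set the current bit
        i_dac = trial * i_ref / (1 << n_bits)     # binary-weighted DAC output current
        # Current comparison: inject I_S/H and -I_DAC into one node; the sign of
        # the difference (sensed as a nodal voltage) decides whether the bit stays.
        if i_in >= i_dac:
            code = trial
    return code

# One conversion: a mid-scale input resolves to the mid-scale code in 10 cycles.
assert sar_current_adc(0.5, 1.0) == 512
```

Each iteration tentatively sets one bit and the comparator decision keeps or clears it, so an n-bit conversion always completes in exactly n comparisons.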
The S/H circuit captures the input signal at the sampling instant and subsequently holds the signal value, which is then processed further in a current-based binary-search SAR loop. The schematic of the implemented circuit is illustrated in Fig. 3.29. The circuit is (pseudo-)differential, and only a single-ended version is shown. The sample-and-hold operation is performed by an analog switch, formed by transmission gate T4–T5, and a hold capacitor CH. In sample mode, the switch is turned on, and the gates of the current-mirror transistors T1

and T2 become connected. Precise current-mirroring operation occurs if the drain voltages of both transistors are the same. However, the accuracy of the basic current mirror formed by transistors T1 and T2 is limited, generating a signal-dependent current-conversion error ΔI. Consequently, two operational amplifiers are added [115], one at the input terminal (formed by transistor T3 and current source Ib1) and one at the output terminal (formed by transistor T6 and current source Ib2), to keep the input and output terminal voltages of the current-mirror circuit equal.
The current difference between the sample-and-hold output current IS/H and the D/A converter output current IDAC is integrated by the input gate capacitance of the first inverter stage T1–T4 of the inverter-cascade current comparator illustrated in Fig. 3.30. The first inverter in this circuit operates as an integrating current-to-voltage converter, while the second inverter T5–T6 restores the sign of the output voltage of the first inverter so that it is logically equivalent to the input current. The integrating nature of the comparator ensures that there is no inherent dc offset in the comparator. The inverter cascade thus provides a simple, small, and effective current-to-voltage converter/comparator.
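A first-order model of this integrating comparison, assuming a lumped gate capacitance and a fixed integration window (both illustrative values, not extracted from the design), is:

```python
def integrating_comparator(i_sh, i_dac, c_gate=10e-15, t_int=1e-6):
    """Sketch of the inverter-based integrating current comparator.

    The difference I_S/H - I_DAC is integrated on the input gate capacitance
    of the first inverter; the second inverter restores the polarity so the
    output is high when I_S/H exceeds I_DAC.  Component values are assumptions.
    """
    # First inverter: the difference current integrates on the gate capacitance,
    # producing an inverted voltage excursion proportional to (I_S/H - I_DAC).
    v_first = -(i_sh - i_dac) * t_int / c_gate
    # Second inverter restores the sign of the decision.
    return v_first < 0

assert integrating_comparator(2e-9, 1e-9) is True
```

Because the decision depends only on the sign of the integrated difference, no static offset term enters the comparison, matching the no-inherent-dc-offset property noted above.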

Fig. 3.29 Schematic of current-mode sample-and-hold circuit

Fig. 3.30 Schematic of inverter-cascade current-mode comparator circuit

Fig. 3.31 Current-mode D/A converter

The current-mode D/A converter circuit illustrated in Fig. 3.31 consists of a current-replication network, which generates weighted currents using cascoded current mirrors (T23–T41), and a current-switching network of differential pairs (T1–T20) controlled by the binary bits. The cascoded current sources are sized up according to the bit weight and are biased by the same bias voltages. Each weighted current source (and its cascode) is made of a number of LSB devices connected in parallel; the LSB device becomes the unit device. By partitioning the weighted devices into units, the unit devices can be positioned in a common-centroid arrangement to reduce the impact of matching-error gradients. This simple and compact implementation is able to reach very high conversion rates, limited only by the steepness of the data waveforms carrying the bits, by the maximum switching speed of the current switches, and by the technology itself. At nano-ampere bias levels, mismatch limits the linearity of the current-mode D/A converter, thus restricting the maximum resolution of the A/D converter [116]. To achieve 10-bit resolution, calibration as in [111] is employed.
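The unit-device construction of the weighted sources can be sketched as follows; the LSB current, bit count, and the Gaussian per-unit mismatch model are illustrative assumptions, not parameters of the fabricated DAC:

```python
import random

def current_dac(code, n_bits=10, i_lsb=1e-9, sigma=0.0):
    """Behavioral model of the binary-weighted current-steering DAC.

    Each weighted source is built from parallel unit (LSB) devices, so a bit
    of weight 2^k draws the sum of 2^k unit currents.  `sigma` adds a relative
    random mismatch per unit device to mimic nA-level matching limits.
    """
    total = 0.0
    for k in range(n_bits):
        if code & (1 << k):
            units = 1 << k                       # number of parallel unit devices
            for _ in range(units):
                total += i_lsb * (1.0 + random.gauss(0.0, sigma))
    return total

# Without mismatch (sigma = 0) the transfer is ideally linear in the code.
assert abs(current_dac(640) - 640e-9) < 1e-15
```

Building every weight from identical unit devices is what makes the common-centroid placement possible: the units of different bits can be interleaved so that linear process gradients cancel to first order.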

3.6 Time-Domain Two-Step A/D Conversion

Time-mode converters based on asynchronous ADCs [117], slope and integrating ADCs [118], or pulse-position modulation [119] provide high power and area efficiency. In the time-based methodology, conventional voltage and current variables are replaced by time differences between two rising edges, and logic circuits substitute for the large, power-hungry analog blocks. In deep-submicron CMOS, even with the supply-voltage reduction, time resolution improves due to the decreasing gate delay [120].
In the proposed design, a voltage signal is converted to a time-domain representation using a comparator-based switched-capacitor circuit [121] and a continuous-time comparator. To improve the power efficiency, the resulting time-domain information is converted to the corresponding digital code with a two-step time-to-digital converter (TDC), where fine quantization of the resulting residue is obtained with a folding Vernier converter. The implementation results in a 90 nm CMOS technology show that a significant gain in throughput, resource usage, and

Fig. 3.32 Block diagram of an ADC with two-step time-to-digital conversion; single-input version shown for clarity

power reduction (less than 2.7 µW, corresponding to a figure of merit of 6.2 fJ/conversion-step) can be obtained for large-scale neural spike data, with a simple and compact ADC structure that has minimal analog complexity.
The basic concept of the architecture, which utilizes a linear voltage-to-time converter (VTC) and a two-step time-to-digital converter, is illustrated in Fig. 3.32. The scheme is reconfigurable in terms of input gain (through the programmable capacitance C2), resolution (by controlling the number of performed iterations), and sampling frequency (through the frequency of the input clock). Once a configuration has been selected, the bias current is also dynamically controlled during the conversion operation to adapt to the reference voltage. A comparator-based switched-capacitor gain stage [121] eliminates the high-gain, high-speed operational amplifier from the design and does not require a stabilizing high-gain, high-speed feedback loop, reducing complexity and the associated stability-versus-bandwidth/power tradeoff. The VTC converts a sampled input voltage to a pulse whose duration is linearly proportional to the input voltage.
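To first order, the VTC behaves as a constant-slope ramp, so the pulse width is simply the sampled voltage divided by the ramp rate IX/C. The component values below are illustrative assumptions, not the sized devices of the design:

```python
def vtc_pulse_width(v_in, i_x=1e-6, c_total=1e-12):
    """First-order model of the comparator-based voltage-to-time converter.

    The current source I_X ramps the capacitor network at a constant slope
    I_X / C until the comparator detects the virtual-ground condition, so the
    pulse width grows linearly with the sampled input voltage v_in.
    """
    return v_in * c_total / i_x   # seconds; slope = I_X / C

# Doubling the input doubles the measured time interval (linearity check).
assert abs(vtc_pulse_width(0.4) - 2 * vtc_pulse_width(0.2)) < 1e-18
```

Any curvature in the ramp (a signal-dependent slope) maps directly into nonlinearity of the time output, which is why the ramp-rate variation of the realized circuit is discussed in the experimental results.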
During the charge-transfer phase, the current source IX1 turns on, charges the capacitor network consisting of C1 and C2, generates a constant voltage ramp on the output voltage Vo and, subsequently, causes the virtual-ground voltage VX to ramp simultaneously (Fig. 3.33a) via the capacitor divider. The voltages continue to ramp until the comparator detects the virtual-ground condition (VX = VCM) and turns off the current source. When the voltage at the sampling capacitor reaches the comparator threshold, the comparator output goes high. The time-to-digital converter measures the time interval tm from the start of the ramp until the crossover point of the ramp and the input signal, as illustrated in Fig. 3.33b, i.e., between the rising edge of the start signal and the comparator-generated stop signal. The time interval is measured by the TDC, which generates a corresponding digital output. The simplest TDC realization, a digital counter, requires a (very) high counter frequency to realize a high-resolution converter. Similarly, delay-line circuits, although more power efficient, necessitate a large number of stages to measure the required periods of time, significantly degrading the INL and effective resolution [122]. A TDC combining a low-frequency, low-power counter as a coarse

Fig. 3.33 a The output voltage ramps to the final value in the comparator-based switched-capacitor charge-transfer phase, b ADC timing signals, c input versus output voltage of the proposed ADC

quantizer, and a folding Vernier delay-line TDC as a fine quantizer, offers both a large dynamic range and power efficiency. The post-processed fine and coarse TDC output as a function of the input voltage is shown in Fig. 3.33c. The maximum and minimum values of the fine folding Vernier TDC correspond to one-half and one-and-a-half periods of the coarse time-to-digital converter, respectively, measured in fine TDC unit steps.
The circuit realization of a fully differential comparator with digitally programmable offset adjustment [123] is illustrated in Fig. 3.34. Transistors T5–T8 employ iterated-instance notation to designate five transistors placed in parallel. The widths of these devices are binary weighted to offer a programmable current gain, which creates an offset-programmable pre-amplifier that is employed for offset compensation. The continuous-time comparator at the output of the voltage-to-time converter consists of a differential amplifier followed by a common-source stage (Fig. 3.35). The input transistors operate in the subthreshold region for

Fig. 3.34 Differential comparator with digitally programmable offset adjustment

Fig. 3.35 Continuous-time comparator

reduced power consumption and to offer a larger input common-mode range and, consequently, an increased ramp dynamic range.
The coarse current source (Fig. 3.32) is a PMOS cascode that is controlled by a switch at the gate of the cascode transistor, and the fine current source is a single NMOS device with a series switch.
A coarse time quantizer, designed using a counter, measures the number of reference clock cycles. The fine-resolution quantization of the two-step time-to-digital converter corresponds to a folding Vernier delay TDC. The proposed architecture executes the time-to-digital conversion by counting transitions between the stop signal and the next reference clock rising edge after the stop signal. These transitions are enabled only during the measurement interval. The synchronizer block, which consists of three flip-flops in series, ensures that the coarse and fine time measurements are correctly aligned.
A folding Vernier delay TDC is easily scalable to different time resolutions and higher numbers of bits without increasing the area. The architecture achieves the minimum time resolution of a Vernier delay element (i.e., the basic inverter delay) and, due to the folding, offers an area-efficient solution. Instead of the 32-element delay line required for the regular Vernier architecture, the folding feature allows the same Vernier delay stages to be used repeatedly to measure the delay. Additionally, with the implemented dynamic control, we sequentially reduce the power required for each conversion.
The block level of the folding TDC is illustrated in Fig. 3.36. A simplified overview of the freeze Vernier delay-line architecture is shown in Fig. 3.37. In this design, only four thermal codes are generated in every cycle and, hence, in the worst case, the measurement cycle is repeated eight times, which is equivalent to a 32-bit thermal code with only four Vernier delay elements. The 4-bit thermal codes are converted into four pulses with the thermal-to-clock generator, which clock a 5-bit counter at the output of the TDC. For each thermal bit generated in the freeze Vernier delay line, a corresponding pulse is generated using a pulse generator. The distance between two pulses is controlled with current-starved inverters. For a rising input edge, the circuit generates a pulse whose width is determined by the NAND gate, the inverter, and the buffer. The enable signal, which decides whether the signals start/stop or v1_start/v1_stop continue into the next cycle, is generated using the signals vt4 and vp4.

Fig. 3.36 Block level of a folding Vernier delay time-to-digital converter

Fig. 3.37 Simplified overview of a freeze Vernier delay-line architecture

In the first conversion cycle, enable = 0 and vstart/vstop is selected for measurement; otherwise, v1_start/v1_stop is selected. The enable signal is switched from 1 to 0 when the rising edge of vstart/v1_start crosses the rising edge of vstop/v1_stop. This feature dynamically decides when the conversion is stopped; hence, the power per conversion is optimized based on the input. The TDC also offers feedback to the system with a ready signal (the inverted enable signal), indicating that it is ready for the next conversion. The 4-bit thermal code is generated with the freeze Vernier architecture [124]. In the conventional Vernier architecture, time-capture elements or early–late detectors (e.g., a D-register or an arbiter) impose a large load on the circuit. In the freeze Vernier TDC, the time capturing is instead performed by freezing the node voltages of the start line in a linear Vernier delay line, allowing a power- and area-efficient conversion. The freeze Vernier converter consists of inverters and current-enabled inverters only. Additionally, the circuit does not require any reset signal; it resets on the falling edge of the stop and the start

signal. The delays of the inverters in the freeze Vernier delay elements are controlled using the bias current, thus controlling the resolution of the TDC.
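The folded reuse of four delay elements over up to eight cycles can be modeled behaviorally. The stage delays below are illustrative assumptions chosen to give a 10 ps Vernier step; the real resolution is set by the bias currents:

```python
def folding_vernier(delta_t, d_start=95e-12, d_stop=105e-12,
                    elements=4, max_cycles=8):
    """Behavioral model of the folding Vernier fine quantizer.

    The start edge gains (d_stop - d_start) on the stop edge per element.
    Instead of a 32-element line, the same `elements` delay stages are reused
    for up to `max_cycles` cycles; each traversed element clocks the output
    counter once, and the conversion stops as soon as the edges cross.
    """
    lsb = d_stop - d_start            # Vernier resolution: 10 ps here
    count = 0
    gap = delta_t
    for _ in range(elements * max_cycles):   # folded: 4 elements x 8 cycles = 32
        if gap <= 0:                          # start edge has caught the stop edge
            break                             # early stop -> dynamic power saving
        gap -= lsb
        count += 1
    return count

# A 55 ps input gap needs 6 unit steps of 10 ps to be crossed.
assert folding_vernier(55e-12) == 6
```

The early exit models the enable-generation logic: small input residues terminate after few cycles, which is the input-dependent power saving claimed for the design.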

3.7 Experimental Results

Design simulations at the transistor level were performed at body temperature (37 °C) in Cadence Virtuoso using industrial hardware-calibrated TSMC 90 nm (voltage- and time-domain ADCs) and 65 nm (current-mode ADC) CMOS technologies. All voltage-domain PG ADC simulations were performed with a 1 V supply voltage. The spectral signature of the PG ADC is illustrated in Fig. 3.38.
The circuit offers a programmable amplification of 0–18 dB by digitally scaling the input capacitance C3. As shown in Fig. 3.39, the signal-to-noise and distortion ratio (SNDR), spurious-free dynamic range (SFDR), and total harmonic distortion (THD) remain constant at different gain settings. The THD in the range of 10–100 kS/s is above 54 dB for an fin of 10 kHz (Fig. 3.40). Within the bandwidth of up to 10 kHz, the SNDR is above 44 dB and the SFDR more than 57 dB. The degradation with a higher input signal is mainly due to the parasitic capacitance, clock
Fig. 3.38 Spectral signature of the programmable-gain voltage-domain SAR A/D converter (fin = 9.25 kHz, fS = 100 kS/s, SFDR = 57.6 dB, SNDR = 45.6 dB)

Fig. 3.39 SFDR, SNDR, and THD versus gain settings

Fig. 3.40 SFDR, SNDR, and THD versus sampling frequency with fin = 10 kHz and gain set to one

non-idealities, and substrate switching noise. The parasitic capacitance decreases the feedback factor, resulting in an increased settling time constant. Clock non-idealities such as jitter, the non-overlapping period, finite rise and fall times, and an unsymmetrical duty cycle are another reason for this degradation; the three latter errors reduce the time allocated for settling. The area of the PG ADC is 0.028 mm², and the circuit consumes 1.1 µW of power at a 100 kS/s sampling rate. Table 3.1 summarizes the performance, with the figure of merit calculated according to FoM = P/(2·fin·2^ENOB) [J/conversion-step] [125].
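The tabulated figures of merit can be checked directly from the reported power, sampling rate (fS = 2·fin at Nyquist), and ENOB; small deviations are rounding:

```python
def fom_joules_per_step(power_w, f_s, enob):
    """FoM = P / (f_s * 2^ENOB), with f_s = 2*f_in at Nyquist,
    in J/conversion-step as used for Table 3.1."""
    return power_w / (f_s * 2.0 ** enob)

# Reproducing the Table 3.1 entries from the reported power, fS, and ENOB:
fom_v = fom_joules_per_step(1.1e-6, 100e3, 7.2)   # voltage-mode PG ADC -> ~75 fJ
fom_i = fom_joules_per_step(0.37e-6, 40e3, 9.3)   # current-mode SAR    -> ~14 fJ
fom_t = fom_joules_per_step(2.7e-6, 640e3, 9.4)   # time-domain ADC     -> ~6.2 fJ
assert abs(fom_v - 75e-15) / 75e-15 < 0.05
```

This confirms that the three converters sit within rounding of the quoted 75, 14, and 6.2 fJ/conversion-step.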
The analog circuits in the current-domain A/D converter operate from a 1 V supply, while the digital blocks operate at near-threshold from a 400 mV supply. The spectral signature of the neural interface is illustrated in Fig. 3.41. As shown in Figs. 3.42 and 3.43, the SNDR, SFDR, and THD remain constant at different input and sampling frequencies, respectively. The variation across the slow–slow and fast–fast corners is 0.2 ENOB. The DNL/INL is 0.2/0.3 LSB, respectively. The current-mode SAR A/D converter consumes 367 nW (sample and hold 117 nW, comparator 37 nW, D/A converter 149 nW, and logic 64 nW). The specifications of the current-mode SAR A/D converter are summarized in Table 3.1.
The spectral signature of the time-domain ADC is illustrated in Fig. 3.44. The circuit offers a programmable amplification of 0–18 dB by digitally scaling the
Table 3.1 Performance summary

                     Voltage-mode  Current-mode  Time-mode
Technology [µm]      0.09          0.065         0.09
Resolution [bit]     8             10            10
VDD [V]              1             1             1
fS [kS/s]            100           40            640 / 40
ENOB                 7.2           9.3           9.4 / 9.5
FoM [fJ/conv-step]   75a           14            6.2 / 21
Power [µW]           1.1           0.37          2.7 / 1.6
Area [mm²]           0.028         0.012         0.022
a PGA + ADC

Fig. 3.41 Spectral signature of the current-domain SAR A/D converter (fin = 18.9 kHz, fS = 40 kS/s, SFDR = 64.7 dB, SNDR = 58.3 dB)

Fig. 3.42 SFDR, SNDR, and THD versus input frequency with fS = 20 kHz

Fig. 3.43 SFDR, SNDR, and THD versus sampling frequency with fin = 1 kHz

voltage-to-time converter. The SNDR, SFDR, and THD versus sampling and input frequency are illustrated in Figs. 3.45 and 3.46, respectively. The THD in the range of 40–640 kS/s is above 63 dB within the neural-activity bandwidth of up to 20 kHz; the SNDR is above 58 dB, and the SFDR more than 64 dB. The maximum simulated DNL is 0.6 LSB and the maximum simulated INL is 0.8 LSB. The variation

Fig. 3.44 Spectral signature of the time-domain A/D converter

Fig. 3.45 SFDR, SNDR, and THD versus sampling frequency with fin = 20 kHz and gain set to 18 dB

Fig. 3.46 SFDR, SNDR, and THD versus input frequency with fS = 640 kHz and gain set to 18 dB

across the slow–slow and fast–fast corners is 0.35 ENOB. The VTC is >9-bit linear across the 0.5 V input range. Consequently, the ramp-rate variation across the input range is limited to 10%, leading to a 400 µV nonlinear voltage variation across the output range. The reference clock frequency is 80 MHz and, subsequently, the counter realizes a 5-bit resolution over the 400 ns TDC input-time range. The ramp repetition frequency, i.e., the sampling frequency of the proposed ADC, is 640 kHz. The simulated

Table 3.2 Comparison with prior art

                     [100]  [106]  [126]  [117]  [119]  [111]a  [127]a  [128]  [129]a  [130]a
Technology [µm]      0.18   0.18   0.18   0.12   0.09   0.18    0.09    0.18   0.35    0.09
Type                 SAR    SAR    SAR    Time   Time   Current SAR     SAR    SAR
VDD [V]              0.45   1      1.8    1.2    1      1.2     1       1.8    3.3     0.5
fS [kS/s]            200    245    120    1000   1000   16      1000    50     16      1280
ENOB                 8.3    8.3    9.2    10     7.9    8       9.34    10.2   8.9     9.95
FoM [fJ/conv-step]   21     109    382    175    188    132     2.87    0.22   93      2.36
Power [µW]           1.35   8.4    27     180    14     0.45    1.79    13     3.06    3
Area [mm²]           NR     NR     NR     0.105  0.06   0.078b  NR      0.038  NR      0.048b
a Simulation data; NR = not reported
b Estimated

ENOB is 9.4 bits over the entire neural-spike input bandwidth. The total A/D converter consumes 2.7 µW when sampled at 640 kS/s, and 1.6 µW at 40 kS/s, respectively. The area of the folding Vernier TDC design sums up to 10.5 µm², the average resolution is 10.05 ps, it operates from a 0.4 V power supply, and it consumes 0.6 µW of power at a 640 kS/s sampling rate. Table 3.1 summarizes the performance, while Table 3.2 shows a comparison with prior art.

3.8 Conclusions

The high density of neurons in neurobiological tissue requires a large number of electrodes for an accurate representation of neural activity. To develop neural prostheses capable of interfacing with single neurons and neuronal networks, multichannel neural probes and electrodes need to be customized to the anatomy and morphology of the recording site. The increasing density and miniaturization of the functional blocks in these multielectrode arrays, however, present significant circuit-design challenges in terms of area, bandwidth, power, and the scalability, programmability, and expandability of the recording system. In this chapter, we present voltage-, current-, and time-domain analog-to-digital converters, evaluate the trade-off between noise, speed, and power dissipation, and characterize the noise fluctuations at the circuit-architecture level. This approach provides the key insight required to address the SNR, response time, and linearity of the physical electronic interface. The presented voltage-domain SAR A/D converter combines the functionalities of a programmable-gain stage and analog-to-digital conversion, occupies an area of 0.028 mm², and consumes 1.1 µW of power at a 100 kS/s sampling rate. The power consumption of the current-mode SAR ADC scales with the input current level, making the current-mode A/D converter suitable for low-energy signals, achieving a figure of merit of 14 fJ/conversion-step and a THD of 63.4 dB at a 40 kS/s sampling frequency. The circuit consumes only 0.37 µW and occupies an area of 0.012 mm² in a 65 nm CMOS technology. A time-based A/D converter consumes less than 2.7 µW of power when operating at a 640 kS/s sampling frequency. With 6.2 fJ/conversion-step, the circuit, realized in a 90 nm CMOS technology, exhibits one of the best FoMs reported and occupies an estimated area of only 0.022 mm².

References

1. M.A.L. Nicolelis, Actions from thoughts. Nature 409, 403–407 (2001)
2. U. Frey et al., An 11k-electrode 126-channel high-density micro-electrode array to interact with electrogenic cells, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 158–159, 2007
3. A.P. Georgopoulos, A.B. Schwartz, R.E. Kettner, Neuronal population coding of movement direction. Science 233(4771), 1416–1419 (1986)
4. C. Chae et al., A 128-channel 6 mW wireless neural recording IC with spike feature extraction and UWB transmitter. IEEE Trans. Neural Syst. Rehabil. Eng. 17(4), 312–321 (2009)
5. M. Yin, M. Ghovanloo, A low-noise preamplifier with adjustable gain and bandwidth for biopotential recording applications, in IEEE International Symposium on Circuits and Systems, pp. 321–324, 2007
6. J. Lin, B. Haroun, An embedded 0.8 V/480 µW 6b/22 MHz flash ADC in 0.13 µm digital CMOS process using nonlinear double-interpolation technique, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 244–246, 2002
7. K. Uyttenhove, M. Steyaert, A 1.8 V, 6-bit, 1.3 GHz CMOS flash ADC in 0.25 µm CMOS, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 455–458, 2002
8. X. Jiang, Z. Wang, M.F. Chang, A 2 GS/s 6 b ADC in 0.18-µm CMOS, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 322–323, 2003
9. C. Sandner, M. Clara, A. Santner, T. Hartig, F. Kuttner, A 6-bit, 1.2 GSps low-power flash-ADC in 0.13 µm digital CMOS, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 339–342, 2004
10. C.-C. Huang, J.-T. Wu, A background comparator calibration technique for flash analog-to-digital converters. IEEE Trans. Circuits Syst. I 52(9), 1732–1740 (2005)
11. O. Viitala, S. Lindfors, K. Halonen, A 5-bit 1-GS/s flash-ADC in 0.13-µm CMOS using active interpolation, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 412–415, 2006
12. S. Park, Y. Palaskas, M.P. Flynn, A 4-GS/s 4-bit flash ADC in 0.18-µm CMOS. IEEE J. Solid-State Circuits 42(9), 1865–1872 (2007)
13. J.-I. Kim et al., A 6-b 4.1-GS/s flash ADC with time-domain latch interpolation in 90-nm CMOS. IEEE J. Solid-State Circuits 48(6), 1429–1441 (2013)
14. A. Varzaghani et al., A 10.3-GS/s, 6-bit flash ADC for 10G Ethernet applications. IEEE J. Solid-State Circuits 48(12), 3038–3048 (2013)
15. J.-I. Kim et al., A 65 nm CMOS 7b 2 GS/s 20.7 mW flash ADC with cascaded latch interpolation. IEEE J. Solid-State Circuits 50(10), 2319–2330 (2015)
16. C. Moreland, F. Murden, M. Elliott, J. Young, M. Hensley, R. Stop, A 14-bit 100-MSample/s subranging ADC. IEEE J. Solid-State Circuits 35(7), 1791–1798 (2000)
17. P. Hui, M. Segami, M. Choi, C. Ling, A.A. Abidi, A 3.3-V 12-b 50-MS/s A/D converter in 0.6-µm CMOS with over 80-dB SFDR. IEEE J. Solid-State Circuits 35(12), 1769–1780 (2000)
18. M.-J. Choe, B.-S. Song, K. Bacrania, A 13-b 40-MSamples/s CMOS pipelined folding ADC with background offset trimming. IEEE J. Solid-State Circuits 35(6), 1781–1790 (2000)

19. H. van der Ploeg, G. Hoogzaad, H.A.H. Termeer, M. Vertregt, R.L.J. Roovers, A 2.5-V 12-b 54-MSample/s 0.25-µm CMOS ADC in 1-mm² with mixed-signal chopping and calibration. IEEE J. Solid-State Circuits 36(12), 1859–1867 (2001)
20. M. Clara, A. Wiesbauer, F. Kuttner, A 1.8 V fully embedded 10 b 160 MS/s two-step ADC in 0.18 µm CMOS, in Proceedings of IEEE Custom Integrated Circuits Conference, pp. 437–440, 2002
21. T.-C. Lin, J.-C. Wu, A two-step A/D converter in digital CMOS processes, in Proceedings of IEEE Asia-Pacific Conference on ASIC, pp. 177–180, 2002
22. A. Zjajo, H. van der Ploeg, M. Vertregt, A 1.8 V 100 mW 12-bits 80 MSample/s two-step ADC in 0.18-µm CMOS, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 241–244, 2003
23. N. Ning, F. Long, S.-Y. Wu, Y. Liu, G.-Q. Liu, Q. Yu, M.-H. Yang, An 8-bit 250 MSPS modified two-step ADC, in Proceedings of IEEE International Conference on Communications, Circuits and Systems, pp. 2197–2200, 2006
24. S. Hashemi, B. Razavi, A 7.1 mW 1 GS/s ADC with 48 dB SNDR at Nyquist rate. IEEE J. Solid-State Circuits 49(8), 1739–1750 (2014)
25. A. Wiesbauer, M. Clara, M. Harteneck, T. Potscher, C. Fleischhacker, G. Koder, C. Sandner, A fully integrated analog front-end macro for cable modem applications in 0.18-µm CMOS, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 245–248, 2001
26. R.C. Taft, M.R. Tursi, A 100-MS/s 8-b CMOS subranging ADC with sustained parametric performance from 3.8 V down to 2.2 V. IEEE J. Solid-State Circuits 36(3), 331–338 (2001)
27. J. Mulder, C.M. Ward, C.-H. Lin, D. Kruse, J.R. Westra, M. Lughtart, E. Arslan, R.J. van de Plassche, K. Bult, F.M.L. van der Goes, A 21-mW 8-b 125-MSample/s ADC in 0.09-mm² 0.13-µm CMOS. IEEE J. Solid-State Circuits 39(5), 2116–2125 (2004)
28. P.M. Figueiredo, P. Cardoso, A. Lopes, C. Fachada, N. Hamanishi, K. Tanabe, J. Vital, A 90 nm CMOS 1.2 V 6b 1 GS/s two-step subranging ADC, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 568–569, 2006
29. Y. Shimizu, S. Murayama, K. Kudoh, H. Yatsuda, A 30 mW 12b 40 MS/s subranging ADC with a high-gain offset-canceling positive-feedback amplifier in 90 nm digital CMOS, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 216–217, 2006
30. J. Huber, R.J. Chandler, A.A. Abidi, A 10b 160 MS/s 84 mW 1 V subranging ADC in 90 nm CMOS, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 454–455, 2007
31. C. Cheng, Y. Jiren, A 10-bit 500-MS/s 124-mW subranging folding ADC in 0.13 µm CMOS, in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 1709–1712, 2007
32. Y. Shimizu, S. Murayama, K. Kudoh, H. Yatsuda, A split-load interpolation-amplifier-array 300 MS/s 8b subranging ADC in 90 nm CMOS, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 552–553, 2008
33. K. Yoshioka et al., Dynamic architecture and frequency scaling in 0.8–1.2 GS/s 7b subranging ADC. IEEE J. Solid-State Circuits 50(4), 932–945 (2015)
34. D.A. Mercer, A 14-b, 2.5 MSPS pipelined ADC with on-chip EPROM. IEEE J. Solid-State Circuits 31(1), 70–76 (1996)
35. I. Opris, L. Lewicki, B. Wong, A single-ended 12-bit 20 MSample/s self-calibrating pipeline A/D converter. IEEE J. Solid-State Circuits 33(11), 1898–1903 (1998)
36. A.M. Abo, P.R. Gray, A 1.5-V, 10-bit, 14.3-MS/s CMOS pipeline analog-to-digital converter. IEEE J. Solid-State Circuits 34(5), 599–606 (1999)
37. H.-S. Chen, K. Bacrania, B.-S. Song, A 14b 20 MSample/s CMOS pipelined ADC, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 46–47, 2000
38. I. Mehr, L. Singer, A 55-mW, 10-bit, 40-MSample/s Nyquist-rate CMOS ADC. IEEE J. Solid-State Circuits 35(3), 70–76 (2000)
39. Y. Chiu, Inherently linear capacitor error-averaging techniques for pipelined A/D conversion, in IEEE Transactions on Circuits and Systems II, vol. 47, pp. 229–232, 2000

40. X. Wang, P.J. Hurst, S.H. Lewis, A 12-bit 20-MSample/s pipelined analog-to-digital converter with nested digital background calibration. IEEE J. Solid-State Circuits 39(11), 1799–1808 (2004)
41. D. Kurose, T. Ito, T. Ueno, T. Yamaji, T. Itakura, 55-mW 200-MSPS 10-bit pipeline ADCs for wireless receivers, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 527–530, 2005
42. C.T. Peach, A. Ravi, R. Bishop, K. Soumyanath, D.J. Allstot, A 9-b 400 MSample/s pipelined analog-to-digital converter in 90 nm CMOS, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 535–538, 2005
43. A.M.A. Ali, C. Dillon, R. Sneed, A.S. Morgan, S. Bardsley, J. Kornblum, L. Wu, A 14-bit 125 MS/s IF/RF sampling pipelined ADC with 100 dB SFDR and 50 fs jitter. IEEE J. Solid-State Circuits 41(8), 1846–1855 (2006)
44. M. Daito, H. Matsui, M. Ueda, K. Iizuka, A 14-bit 20-MS/s pipelined ADC with digital distortion calibration. IEEE J. Solid-State Circuits 41(11), 2417–2423 (2006)
45. T. Ito, D. Kurose, T. Ueno, T. Yamaji, T. Itakura, 55-mW 1.2-V 12-bit 100-MSPS pipeline ADCs for wireless receivers, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 540–543, 2006
46. J. Treichler, Q. Huang, T. Burger, A 10-bit ENOB 50-MS/s pipeline ADC in 130-nm CMOS at 1.2 V supply, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 552–555, 2006
47. I. Ahmed, D.A. Johns, An 11-bit 45 MS/s pipelined ADC with rapid calibration of DAC errors in a multi-bit pipeline stage, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 147–150, 2007
48. S.-C. Lee, Y.-D. Jeon, J.-K. Kwon, J. Kim, A 10-bit 205-MS/s 1.0-mm² 90-nm CMOS pipeline ADC for flat-panel display applications. IEEE J. Solid-State Circuits 42(12), 2688–2695 (2007)
49. J. Li, R. Leboeuf, M. Courcy, G. Manganaro, A 1.8 V 10b 210 MS/s CMOS pipelined ADC featuring 86 dB SFDR without calibration, in Proceedings of IEEE Custom Integrated Circuits Conference, pp. 317–320, 2007
50. M. Boulemnakher, E. Andre, J. Roux, F. Paillardet, A 1.2 V 4.5 mW 10b 100 MS/s pipeline ADC in a 65 nm CMOS, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 250–251, 2008
51. Y.-S. Shu, B.-S. Song, A 15-bit linear 20-MS/s pipelined ADC digitally calibrated with signal-dependent dithering. IEEE J. Solid-State Circuits 43(2), 342–350 (2008)
52. J. Shen, P.R. Kinget, A 0.5-V 8-bit 10-MS/s pipelined ADC in 90-nm CMOS. IEEE J. Solid-State Circuits 43(4), 1799–1808 (2008)
53. C.-J. Tseng, Y.-C. Hsieh, C.-H. Yang, H.-S. Chen, A 10-bit 200 MS/s capacitor-sharing pipeline ADC. IEEE Trans. Circuits Syst. I: Regul. Pap. 60(11), 2902–2910 (2013)
54. R. Sehgal, F. van der Goes, K. Bult, A 12 b 53 mW 195 MS/s pipeline ADC with 82 dB SFDR using split-ADC calibration. IEEE J. Solid-State Circuits 50(7), 1592–1603 (2015)
55. L. Yong, M.P. Flynn, A 100 MS/s 10.5 bit 2.46 mW comparator-less pipeline ADC using self-biased ring amplifiers. IEEE J. Solid-State Circuits 50(10), 2331–2341 (2015)
56. S.H. Lewis, H.S. Fetterman, G.F. Gross, R. Ramachandran, T.R. Viswanathan, A 10-b 20-MSample/s analog-to-digital converter, in IEEE Journal of Solid-State Circuits, vol. 27, no. 3, pp. 351–358, 1992
57. B. Xia, A. Valdes-Garcia, E. Sanchez-Sinencio, A configurable time-interleaved pipeline ADC for multi-standard wireless receivers, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 259–262, 2004
58. S.-C. Lee, G.-H. Kim, J.-K. Kwon, J. Kim, S.-H. Lee, Offset and dynamic gain-mismatch reduction techniques for 10b 200 MS/s parallel pipeline ADCs, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 531–534, 2005
59. S. Limotyrakis, S.D. Kulchycki, D.K. Su, B.A. Wooley, A 150-MS/s 8-b 71-mW CMOS time-interleaved ADC. IEEE J. Solid-State Circuits 40(5), 1057–1067 (2005)
References 73

60. C.-C. Hsu, F.-C. Huang, C.-Y. Shih, C.-C. Huang, Y.-H. Lin, C.-C. Lee, B. Razavi, An 11b
800MS/s time-interleaved ADC with digital background calibration, in IEEE International
Solid-State Circuits Conference Digest of Technical Papers, pp. 464465, 2007
61. Z.-M. Lee, C.-Y. Wang, J.-T. Wu, A CMOS 15-bit 125-MS/s time-interleaved ADC with
digital background calibration. IEEE J. Solid-State Circuits 42(10), 21492160 (2007)
62. C.-Y. Chen etal., A 12-bit 3 GS/s pipeline ADC with 0.4mm2 and 500 mW in 40nm digital
CMOS. IEEE J. Solid-State Circuits 47(4), 10131021 (2012)
63. J. Park, H.-J. Park, J.-W. Kim, S. Seo, P. Chung, A 1 mW 10-bit 500 kSps SAR A/D converter,
in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 581584, 2000
64. P. Confalonleri et. al., A 2.7 mW 1 MSps 10 b analog-to-digital converter with built-in
reference buffer and 1 LSB accuracy programmable input ranges, in Proceedings of IEEE
European Solid-State Circuits Conference, pp. 255258, 2004
65. N. Verma, A.P. Chandrakasan, An ultra low energy 12-bit rate-resolution scalable SAR ADC
for wireless sensor nodes. IEEE J. Solid-State Circuits 42(6), 11961205 (2007)
66. C.-C. Liu etal., A 10-bit 50-MS/ SAR ADC with a monotonic capacitor switching proce-
dure. IEEE J. Solid-State Circuits 45(4), 731740 (2010)
67. S. Shikata, R. Sekimoto, T. Kuroda, H. Ishikuro, A 0.5V 1.1 MS/sec 6.3 fJ/conversion-step
SAR-ADC with tri-level comparator in 40nm CMOS. IEEE J. Solid-State Circuits 47(4),
10221030 (2012)
68. Z. Dai, A. Bhide, A. Alvandpour, A 53-nW 9.1-ENOB 1-kS/s SAR ADC in 0.13-m CMOS
for medical implant devices. IEEE J. Solid-State Circuits 47(7), 15851593 (2012)
69. G.-Y. Huang etal., A 1-W 10-bit 200-kS/s SAR ADC with a bypass window for biomedical
applications. IEEE J. Solid-State Circuits 47(11), 27832795 (2012)
70. M. Yip, A.P. Chandrakasan, A resolution-reconfigurable 5-to-10-bit 0.4-to-1V power scalable
SAR ADC for sensor applications. IEEE J. Solid-State Circuits 48(6), 14531464 (2013)
71. P. Harpe, E. Cantatore, A. van Roermund, A 10b/12b 40 kS/s SAR ADC with data-driven
noise reduction achieving up to 10.1b ENOB at 2.2 fJ/conversion-step. IEEE J. Solid-State
Circuits 48(12), 30113018 (2013)
72. F.M. Yaul, A.P. Chandrakasan, A 10b SAR ADC with data-dependent energy reduction using
LSB-first successive approximation. IEEE J. Solid-State Circuits 49(12), 28252834 (2014)
73. J.-H. Tsai etal., A 0.003mm2 10 b 240 MS/s 0.7 mW SAR ADC in 28nm CMOS with dig-
ital error correction and correlated-reversed switching. IEEE J. Solid-State Circuits 50(6),
13821398 (2015)
74. B.-S. Song, M.F. Tompsett, K.R. Lakshmikumar, A 12 bit 1MHz capacitor error averaging
pipelined A/D converter. IEEE J. Solid-State Circuits 23(10), 13241333 (1988)
75. Y.-M. Lin, B. Kim, P.R. Gray, A 13-b 2.5-MHz self-calibrated pipelined A/D converter in
3-m CMOS. IEEE J. Solid-State Circuits 26(5), 628635 (1991)
76. C.S.G. Conroy, D.W. Cline, P.R. Gray, A high-speed parallel pipelined ADC technique in
CMOS, Proceedings of IEEE Symposium on VLSI Circuits, pp. 9697, 1992
77. B.-S. Song, M.F. Tompsett, K.R. Lakshmikumar, A 12 bit 1MHz capacitor error averaging
pipelined A/D. IEEE J. Solid-State Circuits 23(10), 13241333 (1988)
78. J.M. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuits: A Design
Perspective, 2nd edn. (Prentice Hall, New Jersey, 2003)
79. A.A. Abidi, High-frequency noise measurements on FETs with small dimensions. IEEE
Trans. Electron Devices 33(11), 18011805 (1986)
80. C. Enz, Y. Cheng, MOS transistor modeling for RF IC design. IEEE J. Solid-State Circuits
35(2), 186201 (2000)
81. A.M. Abo, P.R. Gray, A 1.5-V, 10-bit, 14.3-MS/s CMOS pipeline analog-to-digital converter.
IEEE J. Solid-State Circuits 34(5), 599606 (1999)
82. B.J. Hosticka, Improvement of the gain of MOS amplifiers. IEEE J. Solid-State Circuits
14(6), 11111114 (1979)
83. E. Sckinger, W. Guggenbhl, A High-swing, high-impedance MOS cascode circuit. IEEE
J. Solid-State Circuits 25(1), 289297 (1990)
74 3 Neural Signal Quantization Circuits

84. U. Gatti, F. Maloberti, G. Torelli, A novel CMOS linear transconductance cell for continuous-
time filters, in Proceedings of IEEE International Symposium on Circuits and Systems,
pp. 11731176, 1990
85. C.A. Laber, P.R. Gray, A positive-feedback transconductance amplifier with applications to
high frequency high Q CMOS switched capacitor filters. IEEE J. Solid-State Circuits 13(6),
13701378 (1988)
86. A.A. Abidi, An analysis of bootstrapped gain enhancement techniques. IEEE J. Solid-State
Circuits 22(6), 12001204 (1987)
87. B.J. Hosticka, Dynamic CMOS amplifiers. IEEE J. Solid-State Circuits 15(5), 881886
(1980)
88. K. Bult, G. Geelen, A fast-settling CMOS op amp for SC circuits with 90-dB DC gain.
IEEE J. Solid-State Circuits 25(6), 13791384 (1990)
89. R. Ockey, M. Syrzycki, Optimization of a latched comparator for high-speed analog-to-digital
converters, in IEEE Canadian Conference on Electrical and Computer Engineering, vol. 1,
pp. 403408, 1999
90. F. Murden, R. Gosser, 12b 50MSample/s two-stage A/D converter, in IEEE International
Solid-State Circuits Conference Digest of Technical Papers, pp. 278279, 1995
91. J. Robert, G.C. Temes, V. Valencic, R. Dessoulavy, D. Philippe, A 16-bit low-voltage CMOS
A/D converter. IEEE J. Solid-State Circuits 22(2), 157263 (1987)
92. T.B. Cho, P.R. Gray, A 10 b, 20 Msample/s, 35 mW pipeline A/D converter. IEEE J. Solid-
State Circuits 30(3), 166172 (1995)
93. L. Sumanen, M. Waltari, K. Halonen, A mismatch insensitive CMOS dynamic comparator
for pipeline A/D converters, in Proceedings of the IEEE International Conference on Circuits
and Systems, pp. 3235, 2000
94. T. Kobayashi, K. Nogami, T. Shirotori, Y. Fujimoto, A current-controlled latch sense ampli-
fier and a static power-saving input buffer for low-power architecture. IEEE J. Solid-State
Circuits 28(4), 523527 (1993)
95. P.M. Figueiredo, J.C. Vital, Low kickback noise techniques for CMOS latched comparators,
in IEEE International Symposium on Circuits and Systems, vol. 1, pp. 537540, 2004
96. B. Nauta, A.G.W. Venes, A 70-MS/s 110-mW 8-b CMOS folding and interpolating A/D
converter. IEEE J. Solid-State Circuits 30(12), 13021308 (1995)
97. J. Lin, B. Haroun, An embedded 0.8V/480W 6b/22MHz flash ADC in 0.13m digital
CMOS Process using nonlinear double-interpolation technique, in IEEE International Solid-
State Circuits Conference Digest of Technical Papers, pp. 244246, 2002
98. F. Shahrokhi etal., The 128-channel fully differential digital integrated neural recording and
stimulation interface. IEEE Trans. Biomed. Circuits Syst. 4(3), 149161 (2010)
99. H. Gao etal., HermesE: a 96-channel full data rate direct neural interface in 0.13um CMOS.
IEEE J. Solid-State Circuits 47(4), 10431055 (2012)
100. D. Han etal., A 0.45V 100-channel neural-recording IC with sub-W/channel comsumption
in 0.18m CMOS. IEEE Trans. Biomed. Circuits Syst. 7(6), 735746 (2013)
101. M.S. Chae, W. Liu, M. Sivaprakasham, Design optimization for integrated neural recording
systems. IEEE J. Solid-State Circuits 43(9), 19311939 (2008)
102. T.M. Seese, H. Harasaki, G.M. Saidel, C.R. Davies, Characterization of tissue morphology,
angiogenesis, and temperature in the adaptive response of muscle tissue to chronic heating.
Lab. Invest. 78(12), 15531562 (1998)
103. A. Rodrguez-Prez etal., A 64-channel inductively-powered neural recording sensor array,
in Proceedings of IEEE Biomedical Circuits and Systems Conference, pp. 228231, 2012
104. C. Enz, Y. Cheng, MOS transistor modeling for RF IC design. IEEE J. Solid-State Circuits
35(2), 186201 (2000)
105. S. Song etal., A 430nW 64nV/VHz current-reuse telescopic amplifier for neural recording
application, in Proceedings of IEEE Biomedical Circuits and Systems Conference, pp. 322325,
2013
106. X. Zou etal., A 100-channel 1-mW implantable neural recording IC. IEEE Trans. Circuits
Syst. I Regul. Pap. 60(10), 25842596 (2013)
References 75

107. J. Lee, H.-G. Rhew, D.R. Kipke, M.P. Flynn, A 64 channel programmable closed-loop neu-
rostimulator with 8 channel neural amplifier and logarithmic ADC. IEEE J. Solid-State
Circuits 45(9), 19351945 (2010)
108. K. Abdelhalim, R. Genov, CMOS DAC-sharing stimulator for neural recording and stimula-
tion arrays, in Proceedings of IEEE International Symposium on Circuits and Systems, pp.
17121715, 2011
109. A. Rossi, G. Fucilli, Nonredundant successive approximation register for A/D converters.
Electronic Lett. 32(12), 10551056 (1996)
110. S. Narendra, V. De, S. Borkar, D.A. Antoniadis, A.P. Chandrakasan, Full-chip subthreshold
leakage power prediction and reduction techniques for sub-0.18-m CMOS. IEEE J. Solid-
State Circuits 39(2), 501510 (2004)
111. B. Haaheim, T.G. Constandinou, A sub-1W, 16kHz Current-mode SAR-ADC for single-
neuron spike recording, in Proceedings of IEEE Biomedical Circuits and Systems Conference,
pp. 29572960, 2012
112. A. Agarwal, Y.B. Kim, S. Sonkusale, Low power current mode ADC for CMOS sensor IC,
in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 584587,
2005
113. R. Dlugosz, K. Iniewski, Ultra low power current-mode algorithmic analog-to-digital

converter implemented in 0.18m CMOS technology for wireless sensor network, in
Proceedings of IEEE International Conference on Mixed Design of Integrated Circuits and
Systems, pp. 401406, 2006
114. S. Al-Ahdab, R. Lotfi, W. Serdijn, A 1-V 225-nW 1kS/s current successive approximation
ADC for pacemakers, in Proceedings of IEEE International Conference on Ph.D. Research
in Microelectronics and Electronics, pp. 14, 2010
115. Y. Sugimoto, A 1.5-V current-mode CMOS sample-and-hold IC with 57-dB S/N at 20 MS/s
and 54-dB S/N at 30 MS/s. IEEE J. Solid-State Circuits 36(4), 696700 (2001)
116. B. Linares-Barranco, T. Serrano-Gotarredona, On the design and characterization of femto-
ampere current-mode circuits. IEEE J. Solid-State Circuits 38(8), 13531363 (2003)
117. E. Allier etal., 120nm low power asynchronous ADC, in Proceedings of IEEE

International Symposium on Low Power Electronic Design, pp. 6065, 2005
118. M. Park, M.H. Perrot, A single-slope 80MS/s ADC using two-step time-to-digital conversion,
in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 11251128,
2009
119. S. Naraghi, M. Courcy, M.P. Flynn, A 9-bit, 14W and 0.006mm2 pulse position modulation
ADC in 90nm digital CMOS. IEEE J. Solid-State Circuits 45(9), 18701880 (2010)
120. A.P. Chandrakasan etal., Technologies for ultradynamic voltage scaling. Proc. IEEE 98(2),
191214 (2010)
121. J.K. Fiorenza etal., Comparator-based switched-capacitor circuits for scaled CMOS tech-
nologies. IEEE J. Solid-State Circuits 41(12), 26582668 (2006)
122. J.P. Jansson, A. Mantyniemi, J. Kostamovaara, A CMOS time-to-digital converter with better
than 10ps single-shot precision. IEEE J. Solid-State Circuits 41(6), 12861296 (2006)
123. L. Brooks, H.-S. Lee, A 12b, 50 MS/s, fully differential zero-crosssing based pipelined
ADC. IEEE J. Solid-State Circuits 44(12), 33293343 (2009)
124. K. Blutman, J. Angevare, A. Zjajo, N. van der Meijs, A 0.1pJ freeze Vernier time-to-digital
converter in 65nm CMOS, in Proceedings of IEEE International Symposium on Circuits
and Systems, pp. 8588, 2014
125. R.H. Walden, Analog-to-digital converter survey and analysis. IEEE J. Sel. Areas Commun.
17, 539550 (1999)
126. C.M. Lopez etal., An implantable 455-active-electrode 52-channel CMOS neural probe.
IEEE J. Solid-State Circuits 49(1), 248261 (2014)
127. T. Rabuske etal., A self-calibrated 10-bit 1MSps SAR ADC with reduced-voltage charge-
sharing DAC, in Proceedings of IEEE International Symposium on Circuits and Systems,
pp. 24522455, 2013
76 3 Neural Signal Quantization Circuits

128. C. Gao etal., An ultra-low-power extended counting ADC for large scale sensor arrays, in
Proceedings of IEEE International Symposium on Circuits and Systems, pp. 8184, 2014
129. L. Zheng etal., An adaptive 16/64kHz, 9-bit SAR ADC with peak-aligned sampling
for neural spike recording, in IEEE International Symposium on Circuits and Systems,
pp. 23852388, 2014
130. Y.-W. Cheng, K.T. Tang, A 0.5-V 1.28-MS/s 10-bit SAR ADC with switching detect logic,
in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 293296,
2015
Chapter 4
Neural Signal Classification Circuits

Abstract  A robust, power- and area-efficient spike classifier, capable of accurate identification of neural spikes even at low SNR, is a prerequisite for a real-time, implantable, closed-loop, brain–machine interface. In this chapter, we propose an easily scalable, 128-channel, programmable, neural spike classifier based on nonlinear energy operator spike detection, and a boosted cascade, multiclass kernel support vector machine classification. For efficient algorithm execution, we transform the multiclass problem with Kesler's construction and extend the iterative greedy optimization reduced set vectors approach with a cascaded method. Since the obtained classification function is highly parallelizable, the problem is subdivided and parallel units are instantiated for the processing of each subproblem via energy-scalable kernels. After partitioning the data into disjoint subsets, we optimize the data separately with multiple SVMs. We construct cascades of such (partial) approximations and use them to obtain the modified objective function, which offers high accuracy, small kernel matrices, and low computational complexity. The power-efficient classification is obtained with a combination of algorithm and circuit techniques. The classifier, implemented in a 65 nm CMOS technology, consumes less than 41 µW of power and occupies an area of 2.64 mm².

4.1 Introduction

The high density of neurons in neurobiological tissue requires a large number of recording electrodes to be implanted into relevant cortical regions for an accurate representation of neural activity in freely moving subjects (e.g., for spatially broad analysis of neuronal synchronization), and to allow controllability of the location of the recording sites [1]. Monitoring the activity of a large number of neurons is a prerequisite for understanding the cortical structures and can lead to a better comprehension of severe brain disorders, such as Alzheimer's and Parkinson's diseases, epilepsy, autism, and psychiatric disorders [2], or to reestablishing sensory (e.g., vision, hearing) or motor (e.g., movement, speech) functions [3]. However, very frequently

© Springer International Publishing Switzerland 2016
A. Zjajo, Brain-Machine Interface, DOI 10.1007/978-3-319-31541-6_4

an electrode records the action potentials from multiple surrounding neurons (e.g., due to the background activity of other neurons, slight perturbations in the electrode position, or external electrical or mechanical interference), and the recorded waveforms/spikes consist of the superimposed potentials fired from these neurons. The ability to distinguish spikes from noise [4], and to distinguish spikes from different sources in the superimposed waveform, therefore depends on both the discrepancies between the noise-free spikes from each source and the signal-to-noise ratio (SNR) in the recording system.
potentials emitted by the neurons close to the electrode are detected, depending on
the SNR, either by voltage thresholding with respect to an estimation of the noise
amplitude in the signal or with a more advanced technique, such as continuous
wavelet transform [5]. After the waveform alignment, to simplify the classifica-
tion process, a feature extraction step, such as principal component analysis (PCA)
[6] or wavelet decomposition [7] characterizes detected spikes and represents each
detected spike in a reduced-dimensional space, i.e., for a spike consisting of n
sample points, the feature extraction method produces m variables (m<n), where
m is the number of features. Based on these features the spikes are classified into
m-dimensional clusters by k-means [8], expectation maximization (EM) [9], template matching [10], Bayesian clustering [11], or an artificial neural network (ANN), with each cluster corresponding to the spiking activity of a single neuron.
The support vector machine (SVM) has been introduced to bioinformatics and spike classification/sorting [12–14] because of its excellent generalization, sparse solution, and concurrent utilization of quadratic programming, which provides global optimization. This absence of local minima is a substantial difference from artificial neural network classifiers. Like ANN classifiers, applications
of SVMs to any classification problem require the determination of several user-
defined parameters, e.g., choice of an appropriate kernel and related parameters,
determination of regularization parameter (i.e., C) and an appropriate optimization
technique. Correspondingly, the SVM applies structural risk minimization instead of empirical risk minimization and efficiently addresses nonlinearity and the curse of dimensionality. However, the methods of [12–14] could neither identify multiclass neural spikes nor decompose overlapping neural spikes resulting from variable triggering of data collection (e.g., due to noise or other spike events leading to premature or delayed waveforms). Recording multiple spikes on a specific electrode can also create complex sums of neuron waveforms [15].
In this chapter, we present a 128-channel, programmable, neural spike classifier based on nonlinear energy operator spike detection and a multiclass kernel support vector machine classification that is able to accurately identify overlapping neural spikes even at low SNR. For efficient algorithm execution, we transform the multiclass problem with Kesler's construction and extend the iterative greedy optimization reduced set vectors approach with a cascaded method. The power-efficient, multichannel clustering is achieved by a combination of several algorithm and circuit techniques, namely, Kesler's transformation, a boosted cascade reduced set vectors approach, two-stage pipeline processing units, power-scalable kernels, a register-bank memory, high-VT devices, and a near-threshold supply.

The results obtained in a 65 nm CMOS technology show that an efficient, large-scale neural spike data classification can be obtained with a low-power (less than 41 µW, corresponding to a power density of 15.5 µW/mm²), compact, and low resource usage structure (31k logic gates resulting in a 2.64 mm² area).
This chapter is organized as follows: Sect. 4.2 focuses on the neural spike classifier and associated design decisions. In Sect. 4.3, SVM training and classification are described, and the iterative greedy optimization reduced set vectors approach is extended with a cascaded method, the boosted cascade. Section 4.4 elaborates experimental results. Finally, Sect. 4.5 provides a summary and the main conclusions.

4.2 Spike Detector

The data acquired by the recording electrodes in the 128-channel (8 × 16 arrangement) neural recording interface is conditioned using analog circuits, as illustrated in Fig. 4.1. Each channel consists of an electrode, a low-noise preamplifier (LNA), a band-pass filter, and a programmable gain postamplifier (PGA), while a 10-bit A/D converter (ADC) is shared by 16 postamplifiers through time multiplexing. The ADC output is fed to a backend signal processing unit, which provides additional filtering and executes spike sorting. Several previous spike sorting DSP realizations [16–18] have implemented spike detection and feature extraction; however, most spike sorting clustering algorithms, e.g., k-means and superparamagnetic clustering, are offline, unsupervised algorithms not usable for real-time data streams. In the proposed design, threshold crossings of a local energy measurement [19] are first used to detect spikes. A frequency-shaping filter significantly attenuates the low-frequency noise and helps differentiate similar spikes from different neurons. The feature extraction based on the maximum and minimum values of the spike waveforms' first derivatives [20] is employed due to its small computation and memory requirements while preserving a high information score. Neural spikes are classified with a multiclass support vector machine. The relevant information is then transmitted to an outside receiver through the transmitter or used for stimulation in a closed-loop framework.
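As an illustration of this processing chain, the sketch below (in Python; the function name `detect_and_extract` and the `threshold` and `window` parameters are ours, and the chapter's fixed-point hardware is not modeled) detects spikes with the nonlinear energy operator ψ[n] = x[n]² − x[n−1]·x[n+1] and extracts the max/min first-derivative features:

```python
import numpy as np

def detect_and_extract(x, threshold, window=32):
    """Nonlinear-energy-operator spike detection followed by max/min
    first-derivative feature extraction (illustrative sketch)."""
    # NEO (Teager energy): psi[n] = x[n]^2 - x[n-1] * x[n+1]
    psi = np.zeros_like(x, dtype=float)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]

    features = []
    n = 1
    while n < len(x) - window:
        if psi[n] > threshold:            # spike event reaches the threshold
            spike = x[n:n + window]       # slicing window holding the waveform
            d = np.diff(spike)            # first derivative of the waveform
            # feature vector: derivative extrema and the spike height
            features.append((d.max(), d.min(), spike.max() - spike.min()))
            n += window                   # advance past the detected spike
        else:
            n += 1
    return features
```

On a detected event the loop skips one window ahead, mimicking the slicing window; the alignment step and the frequency-shaping (Bessel) filter of the actual design are omitted for brevity.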
The 10-bit time-multiplexed neural data, sampled at 40 kS/s, is applied to the control unit (Fig. 4.2). A 4 kB instruction memory and an 8 kB data memory offer
Fig. 4.1 Block diagram of a brain–machine interface with N-channel front-end neural recording interface and backend signal processing

Fig. 4.2 The architecture of the backend signal processing

spike detection algorithm programmability and parameter set flexibility. The system control unit is loaded with 32 10-bit filter coefficients and a 16-bit threshold value. The spike detector algorithm calculates the energy function for waveforms inside a slicing window; when a spike event reaches the threshold, the spike data is stored and transferred for the alignment process and further feature extraction. The noise-shaping filter provides the spike waveforms' derivatives to identify the neurons' kernel signatures (including the positive and negative peaks of the spike derivative and the spike height). The filter coefficients are programmable through the coefficient register array. Consequently, a variety of noise profiles and spike widths can be precisely tuned. To attain minimal phase distortion, we utilize a Bessel filter structure. For real-time, high signal throughput, all spike-processing operations, including detection, filtering, and feature extraction, are performed in parallel.
The SRAM is implemented as a register-bank memory, since it can be scaled to subthreshold voltages (i.e., to reduce the leakage power). In contrast, a compiled SRAM has a limited read noise margin and, subsequently, cannot be scaled below 0.7 V.
The register-bank memories are organized as spike registers [16], as shown in Fig. 4.3. Each spike register module consists of 10-bit registers to save the spike waveforms, and a delay line for clock gating. The decoder enables sequential, clock-controlled selection of each spike sample S from a spike register. In each

Fig. 4.3 Selectively clocked register-bank memory

10-bit spike register, only 1-bit D flip-flops have an active clock. Accordingly, such a delay-line-based clock-gating arrangement reduces the redundant clock transitions and, subsequently, allows a 10-fold reduction in the clock-switching power (corresponding to a 32 % reduction in the total power consumed by the memory).
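As a back-of-envelope consistency check (our own arithmetic, not from the text): a 10-fold reduction of the clock-switching power that produces a 32 % drop in total memory power implies the clock network originally accounted for roughly 0.32/(1 − 1/10) ≈ 36 % of the memory power:

```python
# Hypothetical sanity check: fraction f_clock of total memory power
# consumed by the clock network, if removing 9/10 of it saves 32% overall.
f_clock = 0.32 / (1.0 - 1.0 / 10.0)
print(round(f_clock, 3))
```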

4.3 Spike Classifier

The support vector machine is a linear classifier in the parameter space; nevertheless, it becomes a nonlinear classifier as a result of the nonlinear mapping of the space of the input patterns into the high-dimensional feature space. The classifier operations can be combined to realize a variety of multiclass [21] and ensemble classifiers (e.g., classifier trees and adaptive boosting [22]). Instead of creating many binary classifiers to determine the class labels, we solve a multiclass problem directly [23] by modifying the binary class objective function and adding and constraining it for every class. The modified objective function allows simultaneous computation of the multiclass classification [24]. Let us consider labeled training spike trains of N data points $\{y_k^{(i)}, x_k\}_{k=1,\,i=1}^{k=N,\,i=m}$, where $x_k$ is the kth input pattern from the n-dimensional space $\mathbb{R}^n$ and $y_k^{(i)}$ denotes the output of the ith output unit for pattern k, i.e., an approach very similar to the ANN methodology. The m outputs can encode $q = 2^m$ different classes. The training procedure of the SVM corresponds to a convex optimization and amounts to solving a constrained quadratic optimization problem (QP); the solution found is, thus, guaranteed to be the unique global minimum of the objective function. To maximize the margin of y(x), ω and b are chosen such that they minimize ‖ω‖ subject to the optimization problem formulated as [25]

$$\min_{\omega_i,\, b_i,\, \xi_{k,i}} J_{LS}^{(m)}(\omega_i, b_i, \xi_{k,i}) = \frac{1}{2}\sum_{i=1}^{m}\left(\|\omega_i\|_2^2 + b_i^2\right) + C\sum_{k=1}^{N}\sum_{i=1}^{m}\xi_{k,i} \qquad (4.1)$$

subject to the constraints

$$\begin{aligned}
y_k^{(1)}\left[\omega_1^T \varphi_1(x_k) + b_1\right] &\geq 1 - \xi_{k,1}, \quad k = 1, \ldots, N\\
y_k^{(2)}\left[\omega_2^T \varphi_2(x_k) + b_2\right] &\geq 1 - \xi_{k,2}, \quad k = 1, \ldots, N\\
&\;\;\vdots\\
y_k^{(m)}\left[\omega_m^T \varphi_m(x_k) + b_m\right] &\geq 1 - \xi_{k,m}, \quad k = 1, \ldots, N
\end{aligned} \qquad (4.2)$$

where ω is a matrix of normal vectors (perpendicular to the hyperplane, e.g., defined by ω^T x + b = 0), b is the vector of biases, C > 0 is the regularization constant, and ξ is a vector of slack variables used to relax the inequalities for the case of nonseparable data. The sum Σ_{i,k} ξ_{k,i} is the cost function of spike trains whose distance to the hyperplane is less than the margin 1/‖ω‖. In [26] it is demonstrated that (4.1) is an acceptable formulation in terms of generalization errors even though an additional

term b²/2 is added to the objective. To solve the optimization problem, we use the Karush–Kuhn–Tucker theorem [27]. We add a dual set of variables, one for each constraint, and obtain the Lagrangian of the optimization problem (4.1)

$$L^{(m)}(\omega_i, b_i, \xi_{k,i};\, \alpha_{k,i}) = J_{LS}^{(m)} - \sum_{k=1}^{N} \alpha_{k,i}\left\{ y_k^{(i)}\left[\omega_i^T \varphi_i(x_k) + b_i\right] - 1 + \xi_{k,i} \right\} \qquad (4.3)$$

which gives as conditions for optimality

$$\begin{aligned}
\frac{\partial L^{(m)}}{\partial \omega_i} = 0 \;&\Rightarrow\; \omega_i = \sum_{k=1}^{N} \alpha_{k,i}\, y_k^{(i)}\, \varphi_i(x_k)\\
\frac{\partial L^{(m)}}{\partial b_i} = 0 \;&\Rightarrow\; \sum_{k=1}^{N} \alpha_{k,i}\, y_k^{(i)} = 0\\
\frac{\partial L^{(m)}}{\partial \xi_{k,i}} = 0 \;&\Rightarrow\; \alpha_{k,i} = C\,\xi_{k,i}
\end{aligned} \qquad (4.4)$$
for k = 1, …, N and i = 1, …, m. The offset of the hyperplane from the origin is determined by the parameter b/‖ω‖. The function φ(·) is a nonlinear function, which maps the input space into a higher dimensional space. To avoid working with the high-dimensional map φ, we instead choose a kernel function by defining the dot product in Hilbert space

$$\varphi(x)^T \varphi(x_k) = \psi(x, x_k) \qquad (4.5)$$
enabling us to treat nonlinear problems with principally linear techniques. Formally, ψ is a symmetric, positive semidefinite Mercer kernel; the only condition required is that the kernel satisfies a general positivity constraint [27]. To allow for mislabeled examples, a modified maximum margin technique is employed [28]. If there exists no hyperplane ω^T x + b = 0 that can divide the different classes, the objective function is penalized with nonzero slack variables ξ_i. The modified maximum margin technique then finds a hyperplane that separates the training set with a minimal number of errors, and the optimization becomes a trade-off between a large margin and a small error penalty. The maximum margin hyperplane, and consequently the classification task, is then only a function of the support vectors

$$\begin{aligned}
\max_{\alpha_k}\; Q_1(\alpha_k;\, \psi(x_k, x_l)) &= \sum_{k=1}^{N} \alpha_k - \frac{1}{2}\sum_{k,l=1}^{N} y_k\, y_l\, \psi(x_k, x_l)\, \alpha_k\, \alpha_l\\
\text{s.t.}\quad \alpha \in \mathbb{R}^{N},\;\; 0 \le \alpha_k \le C,\;\; k = 1, \ldots, N,\;\; \sum_{k=1}^{N} \alpha_k\, y_k &= 0
\end{aligned} \qquad (4.6)$$

where α_k are the weight vectors. The QP optimization task in (4.6) is solved efficiently using sequential minimal optimization, i.e., by constructing the optimal separating hyperplane for the full dataset [29]. Typically, many α_k go to zero during optimization, and the remaining x_k corresponding to those α_k > 0 are called support vectors. To simplify notation, we assume that all nonsupport vectors have been removed, so that N_x is now the number of support vectors, and α_k > 0 for all k. The resulting classification function f(x) in (4.6) has the following expansion:
$$f(x) = \mathrm{sgn}\left(\sum_{k=1}^{N_x} \alpha_k\, y_k\, \psi(x, x_k) + b\right) \qquad (4.7)$$

where the support vector machine classifier uses the sign of f(x) to assign a class label y to the object x [30]. The complexity of the computation of (4.7) scales with the number of support vectors. To simplify the kernel classifier trained by the SVM, we approximate Ψ = Σ_k α_k φ(x_k) in (4.7) by a reduced set expansion Ψ′ = Σ_k β_k φ(z_k), with reduced set vectors z_i ∈ R^n and weights β_k ∈ R, where the weight vectors β_k and the vectors z_i determine the reduced kernel expansion. The problem of finding the reduced kernel expansion can be stated as the optimization task

$$\begin{aligned}
\min_{\beta, z} \|\Psi - \Psi'\|^2 = \min_{\beta, z} \Bigg( &\sum_{k,l=1}^{N_x} \alpha_k\, \alpha_l\, \psi(x_k, x_l) + \sum_{k,l=1}^{N_z} \beta_k\, \beta_l\, \psi(z_k, z_l)\\
&\; - 2 \sum_{k=1}^{N_x} \sum_{l=1}^{N_z} \alpha_k\, \beta_l\, \psi(x_k, z_l) \Bigg)
\end{aligned} \qquad (4.8)$$

Although Ψ′ is not given explicitly, (4.8) can be computed (and minimized) in terms of the kernel, with the minimization carried out over both the z_k and β_k. The reduced set vectors z_k and the coefficients β_{l,k} for a classifier f_l(x) are solved by iterative greedy optimization [31]

$$f_l(x) = \mathrm{sgn}\left(\sum_{k=1}^{m} \beta_{l,k}\, \psi(x, z_k) + b\right), \quad l = 1, \ldots, N_z \qquad (4.9)$$

For a given complexity (i.e., number of reduced set vectors), the classifier provides the optimal greedy approximation of the full SVM decision boundary; the first one is the one which, using the objective function (4.8), is closest to the full SVM (4.7) when constrained to using only one reduced set vector.
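The reduced-set objective of (4.8) can be evaluated purely through kernel calls, which is what makes the greedy optimization tractable. A minimal Python sketch (function names are ours; an RBF kernel with an illustrative γ is assumed, and a real reduced-set method would minimize this quantity over β and Z):

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    # RBF kernel psi(a, b) = exp(-gamma * ||a - b||^2)
    return np.exp(-gamma * np.sum((a - b) ** 2))

def reduced_set_distance2(alpha, X, beta, Z, gamma=1.0):
    """||Psi - Psi'||^2 of Eq. (4.8), computed only through kernel
    evaluations: Psi = sum_k alpha_k phi(x_k), Psi' = sum_l beta_l phi(z_l)."""
    d = sum(ak * al * rbf(xk, xl, gamma)
            for ak, xk in zip(alpha, X) for al, xl in zip(alpha, X))
    d += sum(bk * bl * rbf(zk, zl, gamma)
             for bk, zk in zip(beta, Z) for bl, zl in zip(beta, Z))
    d -= 2.0 * sum(ak * bl * rbf(xk, zl, gamma)
                   for ak, xk in zip(alpha, X) for bl, zl in zip(beta, Z))
    return d
```

When the reduced set equals the full support vector set, the distance is zero; dropping vectors makes it strictly positive, which is the quantity the greedy step minimizes.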
The transformation from the multiclass SVM problem in (4.1) to the single-class problem is based on Kesler's construction [28, 30]. The resulting SVM classifier is composed of the set of discriminant functions, which are computed as

$$f_l(x) = \sum_{k} \sum_{m} \psi(x_k, x)\, \alpha_{km} \left( \delta(l, y_k) - \delta(l, m) \right) + b_l \qquad (4.10)$$

where the bias b_l, l ∈ K, is given by

$$b_l = \sum_{k} \sum_{m} \alpha_{km} \left( \delta(l, y_k) - \delta(l, m) \right) \qquad (4.11)$$

Fig. 4.4 (a) Cascaded SVM framework, (b) binary boosted cascade architecture
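A minimal sketch of the boosted cascade evaluation of Fig. 4.4b (Python; all names and the toy 1-D reduced-set stages are illustrative): each stage is a small reduced-set decision function, and a query is rejected as soon as any stage classifies it negative, so most inputs are resolved by the cheapest stages:

```python
import numpy as np

def stage(x, z, beta, b, gamma=1.0):
    # f_l(x) = sgn( sum_k beta_k * psi(x, z_k) + b ), RBF kernel psi
    s = sum(bk * np.exp(-gamma * (x - zk) ** 2) for bk, zk in zip(beta, z))
    return 1 if s + b >= 0.0 else -1

def cascade(x, stages):
    for z, beta, b in stages:            # cheapest stage first
        if stage(x, z, beta, b) < 0:     # negative result: evaluation stops
            return -1
    return 1                             # accepted by every stage

# toy stages: one reduced set vector, then two reduced set vectors
stages = [([0.0], [1.0], -0.5),
          ([0.0, 1.0], [1.0, 1.0], -0.9)]
```

An input far from the reduced set vectors is rejected by the first, single-vector stage and never reaches the larger expansions, which is the source of the cascade's average-case savings.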

Since the data x_k appear only in the form of dot products in the dual form, we can construct the dot product k(x_k, z_l) using the Kronecker delta, i.e., δ(k, l) = 1 for k = l and δ(k, l) = 0 for k ≠ l, and map it to a reproducing kernel Hilbert space such that the dot product obtains the same value as the kernel function k. This property allows us to configure the SVM classifier via various energy-scalable kernels [32] for finding nonlinear classifiers. For k(·,·) one typically has the following choices: k(x, x_k) = x_k^T x (linear SVM); k(x, x_k) = (x_k^T x + 1)^d (polynomial SVM of degree d); k(x, x_k) = tanh[κ x_k^T x − δ] (sigmoid SVM); k(x, x_k) = exp{−γ ‖x − x_k‖²} (radial basis function (RBF) SVM); k(x, x_k) = exp{−‖x − x_k‖/(2σ²)} (exponential radial basis function (ERBF) SVM); and k(x, x_k) = exp{−‖x − x_k‖²/(2σ²)} (Gaussian RBF SVM), where γ, κ, δ, and σ are positive real constants. The kernels yield increasing levels of strength (e.g., a false-alarm rate of 18 per day for the linear kernel decreases to 1.2 per day for the RBF kernel [33]). However, the required power for each kernel (from simulation on the CPU) varies by orders of magnitude.
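These kernel choices can be written directly as functions; a sketch (γ, κ, δ, σ, and d are free parameters, and the values below are illustrative defaults):

```python
import math

def k_linear(x, xk):
    # linear SVM kernel: plain dot product
    return sum(a * b for a, b in zip(x, xk))

def k_poly(x, xk, d=3):
    # polynomial SVM kernel of degree d
    return (k_linear(x, xk) + 1.0) ** d

def k_sigmoid(x, xk, kappa=1.0, delta=0.0):
    # sigmoid SVM kernel
    return math.tanh(kappa * k_linear(x, xk) - delta)

def k_rbf(x, xk, gamma=1.0):
    # radial basis function (RBF) SVM kernel
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, xk)))

def k_gauss(x, xk, sigma=1.0):
    # Gaussian RBF SVM kernel
    d2 = sum((a - b) ** 2 for a, b in zip(x, xk))
    return math.exp(-d2 / (2.0 * sigma ** 2))
```

Note that each RBF-type kernel evaluates to 1 at x = x_k, which is why its decision boundaries are local; that locality is also what drives its higher memory and energy cost.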
The complexity of the computation of (4.10) scales with the number of sup-
port vectors. To simplify the kernel classifier trained by the SVM, we extend the iterative greedy optimization reduced-set-vectors approach [31] with a boosted cascade classifier (Fig. 4.4). Accordingly, the reduced expansion is not evaluated at once,
but rather in a cascaded way, such that in most cases a very small number of sup-
port vectors are applied. The computation of classification function fl(x) involves
matrixvector operations, which are highly parallelizable. Therefore, the problem
is segmented into smaller ones and parallel units are instantiated for the processing
of each subproblem. Consider a set of reduced set vectors classification functions
where the lth function is an approximation with l vectors, chained into a sequence.
After partition of the data into disjoint subsets, we iteratively train the SVM on
subsets of the original dataset and combine support vectors of resulting models to
create new training sets [34, 35]. A query vector is then evaluated by every func-
tion in the cascade and if classified negative the evaluation stops
fc,l (x) = sgn(f1 (x))sgn(f2 (x)) . . . , (4.12)

where fc,l(x) is the cascade evaluation function of (4.10). In other words, we bias
each cascade level in a way that one of the binary decisions is very confident, while
the other is uncertain and propagates the data point to the next, more complex cas-
cade level. Biasing of the functions f is done by setting the parameter b to achieve a
desired accuracy of the function on an evaluation set. When a run through the cas-
cade is completed, we combine the remaining support vectors of the final model
with each subset from the first step of the first run. Frequently, a single pass through
the cascade produces satisfactory accuracy; however, if the global optimum is to be reached, the result of the last level is fed back into the first level to test it against the input vectors, i.e., whether any of the input vectors have to be incorporated into the optimization. If no input-layer support vector requires this, the cascade has converged to the global optimum; otherwise, an additional pass through the network is made.
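The early-exit evaluation described above can be sketched as follows; the stages stand in for the chained reduced-set classifiers, and their form here is illustrative:

```python
def cascade_classify(x, stages):
    """Evaluate the cascade stages in order; a negative (rejecting)
    decision at any level stops the evaluation, so easy inputs cost
    only a few kernel evaluations."""
    evaluations = 0
    for f in stages:
        evaluations += 1
        if f(x) < 0:          # confident rejection: stop early
            return -1, evaluations
    return +1, evaluations    # survived every level: accept
```

For example, with stages = [lambda x: x - 1.0, lambda x: x - 5.0], an input of 0.0 is rejected after one stage, while an input of 10.0 is accepted after evaluating both.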
The training data (td) in Fig.4.4 are split into subsets, and each one is eval-
uated individually for support vectors in the first layer [36]. Eliminating non-support vectors early from the classification thus significantly accelerates the SVM procedure. The scheme requires only modest communication from one layer to
the next, and a satisfactory accuracy is often obtained with a single pass through
the cascade. When passing through the cascade, merged support vectors are used
to test data d for violations of the KarushKuhnTucker (KKT) conditions [37]
(Fig. 4.5a). Violators are then combined with the support vectors for the next
iteration. The required arithmetic over the feature vectors (the elementwise operands as well as the SVM model parameters) is executed with a two-stage pipelined processing unit (Fig. 4.5b), structured to reduce glitch propagation. Flip-flops are inserted in
the pipeline to lessen the impact of active glitching [38], and to reduce the leakage
energy.
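A sketch of the KKT test applied to the merged set; the soft-margin conditions below relate each dual variable α_i to its margin y_i f(x_i), and the tolerance and data are illustrative:

```python
def kkt_violators(alphas, margins, C, tol=1e-3):
    """Indices i whose (alpha_i, y_i * f(x_i)) pair violates the
    soft-margin KKT conditions:
      alpha_i = 0      ->  y_i f(x_i) >= 1
      0 < alpha_i < C  ->  y_i f(x_i) == 1
      alpha_i = C      ->  y_i f(x_i) <= 1
    """
    bad = []
    for i, (a, m) in enumerate(zip(alphas, margins)):
        if a <= tol and m < 1.0 - tol:
            bad.append(i)                 # should be outside the margin
        elif tol < a < C - tol and abs(m - 1.0) > tol:
            bad.append(i)                 # should sit exactly on the margin
        elif a >= C - tol and m > 1.0 + tol:
            bad.append(i)                 # bound SV should be inside the margin
    return bad
```

Violators found this way are the points merged with the support vectors for the next iteration.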

Fig. 4.5 a A cascade with two input sets (d1 and d2 are each tested for KKT violations and merged with the support vectors sv(x1)–sv(x3)), b two-stage pipeline processing unit (SUB, MULT, and ADD/SUB stages separated by flip-flops, evaluating the kernel k(·) over sv(xi)[j] and x[j])

4.4 Experimental Results

Design simulations at the transistor level were performed at body temperature (37 °C) in Cadence Virtuoso using industrial hardware-calibrated TSMC 65 nm
CMOS technology. In the classifier design, most of the circuit is idle (zero
switching activities) at any clock cycle. Consequently, the leakage dominates the
power consumption. To minimize the leakage, the classifier is synthesized with
high-VT devices. For minimal power consumption, the circuit operates at near-
threshold (0.4V) supply. The test dataset is based on recordings from the human
neocortex and basal ganglia (Fig.4.6). The neural data was input to RTL simula-
tions to obtain switching activity estimates for the design. These estimates were
then annotated into the synthesis flow to obtain energy estimates for the digital
spike-classification module. Instead of thresholding the raw signal, we detect
spikes in a more reliable way using threshold crossings of a local energy measurement of the band-pass-filtered signal [5] (Fig. 4.7). The local energy threshold is set to the squared average standard deviation of the signal, as defined by the noise properties of the recording channel; this corresponds to the minimal SNR required to distinguish two neurons. Multiple single-unit spike trains
are extracted from extracellular neural signals recorded from microelectrodes,
and the information encoded in the spike trains is subsequently classified with
an RBF SVM kernel as an illustrative example (Fig. 4.7c). Each neuron action poten-
tial waveform is detected from a multiunit extracellular recording and assigned
to one specific unit according to their waveform features. Since this procedure
involves a substantial amount of error in the spike trains, and particularly when
the background noise level is high, we measured testing classification error,

Fig. 4.6 Spike detection from continuously acquired data, the y-axis is arbitrary; a top: raw signal after amplification, not corrected for gain, b middle: average square root of the power of the signal, with threshold (line) crossings of a local energy measurement using a running window of 1 ms, and c bottom: detected spikes
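The local-energy detection scheme of Fig. 4.6 can be sketched as follows, assuming a band-pass-filtered input; the window length and the threshold scale factor k are illustrative choices:

```python
def detect_spikes(signal, fs, win_ms=1.0, k=4.0):
    """Detect spikes from threshold crossings of a running-window local
    energy measure, rather than from the raw signal amplitude."""
    w = max(1, int(fs * win_ms / 1000.0))
    # local energy: moving average of the squared (band-passed) signal
    sq = [v * v for v in signal]
    energy = [sum(sq[max(0, i - w + 1): i + 1]) / w for i in range(len(sq))]
    # threshold derived from the statistics of the energy trace
    mean = sum(energy) / len(energy)
    var = sum((e - mean) ** 2 for e in energy) / len(energy)
    thr = mean + k * var ** 0.5
    # rising threshold crossings mark spike onsets
    return [i for i in range(1, len(energy))
            if energy[i] >= thr and energy[i - 1] < thr]
```

On a quiet trace with a single high-amplitude burst, only the burst onset is reported, which is the behavior the squared-energy threshold is meant to provide.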

Fig. 4.7 a Spike detection from continuously acquired data (band-pass-filtered signal, 300–3000 Hz), b detected spikes, c the SVM separation hypersurface for the RBF kernel (kernel parameters 5.12 and 1.72) with three different classes of spikes (© IEEE 2015)

training classification error, margin of the found hyperplane, and number of ker-
nel evaluations.
To improve the data structure from the numerical point of view, the system
in (4.12) is first preprocessed by reordering the nonzero patterns for bandwidth
reduction (Fig.4.8). Figure4.7c gives a three-class classification graphical illus-
tration, where the bold lines represent decision boundaries. For a correctly classified example x1, the slack variables satisfy ξ₁⁽¹⁾ = 0 and ξ₁⁽²⁾ = 0, i.e., no loss is counted, since both margin terms for the class pairs (1, 2) and (1, 3) are negative. On the other hand, for an example x2 that violates two margin bounds (both terms for the pairs (2, 2) and (2, 3) are positive), both methods generate a loss. The algorithm converges very fast in the first steps and slows down as the optimal solution is approached. However,

Fig. 4.8 Nonzero pattern before (left) and after (right) reordering

almost the same classification error rates were obtained for all the tolerance parameters ε ∈ {10⁻², 5·10⁻³, 10⁻³}, indicating that to find a good classifier we do not need the extremely precise solution with ε → 0. The SVM performance is sensitive to hyperparameter settings, e.g., the settings of the complexity parameter C and the kernel parameter σ for the Gaussian kernel. As a consequence,
hyperparameter tuning with grid search approach is performed before the final
model fit. More sophisticated methods for hyperparameter tuning are available
as well [39].
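An exhaustive grid search can be sketched as below; `fit_and_score` stands in for an actual SVM train-and-validate step and is a placeholder, not part of the original system:

```python
import itertools

def grid_search(fit_and_score, Cs, sigmas):
    """Exhaustive (C, sigma) grid search: fit_and_score(C, sigma) returns
    a validation accuracy; the best-scoring pair is kept for the final fit."""
    best = None
    for C, s in itertools.product(Cs, sigmas):
        acc = fit_and_score(C, s)
        if best is None or acc > best[0]:
            best = (acc, C, s)
    return best  # (accuracy, C, sigma)
```

Grids are usually logarithmically spaced in both C and σ, since the SVM's behavior changes over orders of magnitude rather than linearly.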
The SVM spike sorting performance has been summarized and benchmarked
(Fig. 4.9) versus four different, relatively computationally efficient methods for spike sorting: template matching, principal component analysis (PCA), Mahalanobis distance, and Euclidean distance. The performance is quantified using the effective accuracy, i.e., the fraction of classified spikes that are correctly classified
(excluding spike detection). The source of spike detection error is either the false
inclusion of a noise segment as a spike waveform or the false omission of spike
waveforms. These errors can be easily modeled by the addition or removal of
spikes at random positions in time, so that the desired percentage of error ratio is
obtained. In contrast, care should be taken in modeling spike classification errors,
since an error in one unit may or may not cause an error in another unit. In all
methods the suitable parameters are selected with which better classification per-
formance is obtained.
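Of the benchmarked methods, nearest-template classification under a Euclidean metric is the simplest; a sketch (the templates and metric choice are illustrative):

```python
def euclidean2(u, v):
    # squared Euclidean distance between two waveforms
    return sum((a - b) ** 2 for a, b in zip(u, v))

def template_match(waveform, templates, metric=euclidean2):
    """Assign a detected waveform to the unit whose mean template is
    nearest under the chosen distance metric."""
    return min(templates, key=lambda unit: metric(waveform, templates[unit]))
```

Swapping in a Mahalanobis distance only changes the `metric` argument; the assignment rule itself is identical, which is why these methods share their sensitivity to non-Gaussian noise and overlapping clusters.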
The SVM classifier consistently outperforms the benchmarked methods over the entire range of SNRs tested, although it only exceeds the Euclidean distance metric by a slight margin, reaching an asymptotic success rate of ~97%. The different SNRs in the BMI have been obtained by superimposing attenuated spike waveforms so as to mimic the background activity observed at the electrode. If we increase the SNR of the entire front-end brain–machine interface, the spike sorting accuracy increases by up to 4–5% (depending on the spike sorting method used).
Similarly, the accuracy of the spike sorting algorithm increases with A/D converter resolution, although it saturates beyond 5–6 bit resolution, ultimately

Fig. 4.9 a Effect of SNR on single spike sorting accuracy of the BMI system (methods compared: Mahalanobis, PCA, SVM, template matching, Euclidean; SNR 10–30 dB, accuracy 50–100 %), b effect of SNR on sorting accuracy of the BMI system for overlapping spikes of three classes (© IEEE 2015)

limited by the SNR. However, since the amplitude of the observed spike signals can typically vary by one order of magnitude, additional resolution (i.e., 2–3 bit) is needed if the amplification gain is fixed. Additionally, increasing the sam-
pling rate of A/D converter improves spike sorting accuracy, since this captures
finer features further differentiating the signals. The sorting accuracy of the spike
waveforms, which overlap at different sample points is illustrated in Fig.4.9b.
The correct classification rate of the proposed method is on average 4–8% larger than that of the other four methods. If the training data contain the spike waveforms appearing in the course of complex spike bursts, we first classify the other distorted spikes generated by the bursting neurons, before partially resolving the complex spike bursts themselves. The performance of the four other methods is lim-
ited if the distribution of the background noise is non-Gaussian or if the multiple
spike clusters are overlapped.
The estimation error varies with the number of spikes detected (Fig. 4.10a), and it reaches −60 dB with normalized distribution at around 700 spikes over the entire dataset. The convergence period is ~0.1 s assuming a firing rate of 20
spikes/s from three neurons. The number of support vectors required is partly gov-
erned by the complexity of the classification task. The kernels yield increasing

Fig. 4.10 a The error versus number of spikes (clusters 1–5; error in dB versus 10–1000 spikes), b energy per cycle versus number of support vectors for various SVM kernels (linear, MLP, polynomial, RBF), c log-normalized error in reduced set model order reduction versus number of support vectors (RBF, polynomial)

levels of strength; however, the required energy for each kernel varies by orders of
magnitude as illustrated in Fig.4.10b. As the SNR decreases more support vectors
are needed in order to define a more complex decision boundary. For our dataset, the number of support vectors required is reduced to within the range of 300–310 (Fig. 4.10c). The required cycle count (0.14 kcycles) and memory (0.2 kB) for the linear kernel, versus 4.86 kcycles and 6.7 kB for the RBF kernel, highlight the dependence of memory usage on the kernel.
The spike detection implementation comprises 31k logic gates, resulting in a 2.64 mm² area, and consumes only 41 μW of power from a 0.4 V supply voltage.

Table 4.1 Comparison with prior art

Parameter                 [16]    [17]   [18]   This workᵃ
Technology [nm]           65      90     65     65
Programmability           No      Yes    No     Yes
VDD [V]                   0.27    1      0.3    0.4
No. of channels           16      128    1      128
Power density [μW/mm²]    60.9    9.8    43.4   15.5
Power [μW]                75      87     2.17   41
Area [mm²]                1.23    8.9    0.05   2.64

ᵃSimulated data

The consumed power corresponds to a temperature increase of 0.11 °C (i.e., assuming the 0.029 °C/mW model [10]), which is ~9 times lower than the limit of the neural-implant safe range (<1 °C). In Table 4.1, we compare state-of-the-art spike sorting systems to this work.

4.5 Conclusions

The support vector machine has been introduced to bioinformatics and spike classification/sorting because of its excellent generalization, sparse solution, and use of quadratic programming. In this chapter, we propose a programmable neural spike classifier based on a multiclass kernel SVM for a 128-channel spike sorting system; it tracks the evolution of clusters in real time, offers high accuracy, and has low memory requirements and low computational complexity. The implementation results show that the spike classifier operates online, without compromising the required power and chip area, even in neural interfaces with a low SNR.

References

1. M.A. Lebedev, M.A.L. Nicolelis, Brain-machine interfaces: past, present and future. Trends Neurosci. 29(9), 536–546 (2006)
2. G. Buzsaki, Large-scale recording of neuronal ensembles. Nat. Neurosci. 7, 446–451 (2004)
3. F.A. Mussa-Ivaldi, L.E. Miller, Brain-machine interfaces: computational demands and clinical needs meet basic neuroscience. Trends Neurosci. 26(6), 329–334 (2003)
4. K.H. Lee, N. Verma, A low-power processor with configurable embedded machine-learning accelerators for high-order and adaptive analysis of medical-sensor signals. IEEE J. Solid-State Circuits 48(7), 1625–1637 (2013)
5. K.H. Kim, S.J. Kim, A wavelet-based method for action potential detection from extracellular neural signal recording with low signal-to-noise ratio. IEEE Trans. Biomed. Eng. 50, 999–1011 (2003)
6. D.A. Adamos, E.K. Kosmidis, G. Theophilidis, Performance evaluation of PCA-based spike sorting algorithms. Comput. Methods Programs Biomed. 91, 232–244 (2008)
7. R.Q. Quiroga, Z. Nadasdy, Y.B. Shaul, Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering. Neural Comput. 16, 1661–1687 (2004)
8. S. Takahashi, Y. Anzai, Y. Sakurai, A new approach to spike sorting for multi-neuronal activities recorded with a tetrode: how ICA can be practical. Neurosci. Res. 46, 265–272 (2003)
9. F. Wood, M. Fellows, J. Donoghue, M. Black, Automatic spike sorting for neural decoding, in Proceedings of the IEEE Conference on Engineering in Medicine and Biology Society, pp. 4009–4012, 2004
10. C. Vargas-Irwin, J.P. Donoghue, Automated spike sorting using density grid contour clustering and subtractive waveform decomposition. J. Neurosci. Methods 164(1), 1–18 (2007)
11. J. Dai, et al., Experimental study on neuronal spike sorting methods, in IEEE Future Generation Communication Networks Conference, pp. 230–233, 2008
12. R.J. Vogelstein, K. Murari, P.H. Thakur, G. Cauwenberghs, S. Chakrabartty, C. Diehl, Spike sorting with support vector machines, in Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 546–549, 2004
13. K.H. Kim, S.S. Kim, S.J. Kim, Advantage of support vector machine for neural spike train decoding under spike sorting errors, in Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 5280–5283, 2005
14. R. Boostani, B. Graimann, M.H. Moradi, G. Pfurtscheller, A comparison approach toward finding the best feature and classifier in cue-based BCI. Med. Biol. Eng. Comput. 45, 403–412 (2007)
15. G. Zouridakis, D.C. Tam, Identification of reliable spike templates in multi-unit extracellular recordings using fuzzy clustering. Comput. Methods Programs Biomed. 61(2), 91–98 (2000)
16. V. Karkare, S. Gibson, D. Markovic, A 75-μW, 16-channel neural spike-sorting processor with unsupervised clustering. IEEE J. Solid-State Circuits 48(9), 2230–2238 (2013)
17. T.C. Ma, T.C. Chen, L.G. Chen, Design and implementation of a low power spike detection processor for 128-channel spike sorting microsystem, in IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3889–3892, 2014
18. Z. Jiang, Q. Wang, M. Seok, A low power unsupervised spike sorting accelerator insensitive to clustering initialization in sub-optimal feature space, in IEEE Design Automation Conference, pp. 1–6, 2015
19. K.H. Kim, S.J. Kim, A wavelet-based method for action potential detection from extracellular neural signal recording with low signal-to-noise ratio. IEEE Trans. Biomed. Eng. 50(8), 999–1011 (2003)
20. T. Chen, et al., NEUSORT2.0: a multiple-channel neural signal processor with systolic array buffer and channel-interleaving processing schedule, in International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 5029–5032, 2008
21. E. Shih, J. Guttag, Reducing energy consumption of multi-channel mobile medical monitoring algorithms, in Proceedings of the International Workshop on Systems and Networking Support for Healthcare and Assisted Living Environments, no. 15, pp. 1–7, 2008
22. R.E. Schapire, A brief introduction to boosting, in Proceedings of the International Joint Conference on Artificial Intelligence, pp. 1401–1406, 1999
23. B. Schölkopf, A.J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond (The MIT Press, Cambridge, MA, 2002)
24. C.W. Hsu, C.J. Lin, A comparison of methods for multi-class support vector machines. IEEE Trans. Neural Networks 13, 415–425 (2002)
25. O. Mangasarian, D. Musicant, Successive overrelaxation for support vector machines. IEEE Trans. Neural Networks 10(5), 1032–1037 (1999)
26. C.W. Hsu, C.J. Lin, A simple decomposition method for support vector machines. Mach. Learn. 46, 291–314 (2002)
27. V.N. Vapnik, Statistical Learning Theory (Wiley, New York, 1998)
28. V. Franc, V. Hlavac, Multi-class support vector machine, in Proceedings of the IEEE International Conference on Pattern Recognition, vol. 2, pp. 236–239, 2002
29. J. Platt, Fast training of support vector machines using sequential minimal optimization, in Advances in Kernel Methods: Support Vector Learning (The MIT Press, Cambridge, MA, 1999)
30. R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification (Wiley, New York, 2000)
31. B. Schölkopf, P. Knirsch, A. Smola, C. Burges, Fast approximation of support vector kernel expansions, and an interpretation of clustering as approximation in feature spaces, in Mustererkennung 1998, ed. by P. Levi, M. Schanz, R.J. Ahler, F. May (Springer-Verlag, Berlin, Germany, 1998), pp. 124–132
32. H. Lee, S.Y. Kung, N. Verma, Improving kernel-energy tradeoffs for machine learning in implantable and wearable biomedical applications, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 1597–1600, 2011
33. Physionet. Available: http://www.physionet.org
34. C.J. Burges, Simplified support vector decision rules, in International Conference on Machine Learning, pp. 71–77, 1996
35. M. Rätsch, S. Romdhani, T. Vetter, Efficient face detection by a cascaded support vector machine expansion. Proc. R. Soc. Lond. Ser. A 460, 3283–3297 (2004)
36. H.P. Graf, et al., Parallel support vector machines: the cascade SVM, in Advances in Neural Information Processing Systems, pp. 521–528, 2004
37. R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification (Wiley, New York, 2000)
38. K.H. Lee, N. Verma, A low-power processor with configurable embedded machine-learning accelerators for high-order and adaptive analysis of medical-sensor signals. IEEE J. Solid-State Circuits 48(7), 1625–1637 (2013)
39. P. Koch, B. Bischl, O. Flasch, T. Bartz-Beielstein, W. Konen, On the tuning and evolution of support vector kernels. Evol. Intell. 5, 153–170 (2012)
Chapter 5
Brain–Machine Interface: System Optimization

Abstract To develop neural prostheses capable of interfacing with neuron cells and neural networks, multichannel probes and the electrodes need to be custom-
ized to the anatomy and morphology of the recording site. The increasing den-
sity and the miniaturization of the functional blocks in these multielectrode arrays,
however, presents significant circuit design challenge in terms of area, power,
and the scalability, reliability and expandability of the recording system. In this
chapter, we propose a novel method for power-per-area (PPA) optimization under yield constraints in a multichannel neural recording interface. Using a sequence of
minimizations with iteratively generated low-dimensional subspaces, our approach
renders consistently improved PPA ratio and imposes no restrictions on the distri-
bution of process parameters or how the data enters the constraints. The proposed
method can be used with any variability model and subsequently any correlation
model, and is not restricted by any particular performance constraint. The experi-
mental results, obtained on the multichannel neural recording interface circuits
implemented in CMOS 90 nm technology, demonstrate power savings of up to 26% and area savings of up to 22% without yield penalty.

5.1 Introduction

Neural prosthesis systems enable the interaction with neural cells either by record-
ing, to facilitate early diagnosis and predict intended behavior before undertaking
any preventive or corrective actions, or by stimulation, to prevent the onset of det-
rimental neural activity. Monitoring the activity of a large population of neurons
in neurobiological tissue with high-density microelectrode arrays in multichannel
implantable brainmachine interface (BMI) is a prerequisite for understanding the
cortical structures and can lead to a better understanding of severe brain disorders, such as Alzheimer's and Parkinson's diseases, epilepsy, and autism [1], or to reestablish sensory (e.g., hearing and vision) or motor (e.g., movement and speech) functions
[2]. Practical multichannel BMI systems are combined with CMOS electronics for


long-term and reliable recording and conditioning of intracortical neural signals [3], on-chip processing of the recorded neural data [4], and stimulating the nerv-
ous system in a closed-loop framework [5]. To evade the risk of infection, these
systems are implanted under the skin, while the recorded neural signals and the
power required for the implant operation is transmitted wirelessly. This migration,
to allow proximity between electrodes and circuitry, and the increasing density in
multichannel electrode arrays, are, however, creating significant design challenges
in respect to circuit miniaturization and power dissipation reduction of the record-
ing system. Power density is limited to 0.8mW/mm2 [6] to prevent possible heat
damage to the tissue surrounding the device (and subsequently, limited power
consumption prolongs the batterys longevity and evade recurrent battery replace-
ments surgeries). Furthermore, the space to host the system is restricted to ensure
minimal tissue damage and tissue displacement during implantation. As a conse-
quence, intrinsic circuit noise is often traded for low power and high density of
integration.
Technology scaling, circuit topologies, architecture trends, and (post-silicon)
circuit optimization algorithms specifically target power-performance trade-off,
from the spatial resolution (i.e., number of channels), feasible wireless data band-
width and information quality to the delivered power of implantable batteries.
Circuit topologies, such as current reuse [7], time multiplexing [8], sleep modes
[9], adaptive duty-cycling of the entire analog front-end [10], and adaptive system
bandwidth or resolution [11] can be used to improve power efficiency by exploit-
ing the fact that neural spikes are irregular and of low frequency. Circuit optimi-
zation approaches, such as analytical, based on sensitivities [12] and physical
[13] parameters offer guidelines for optimum power operation. The choice of the
nonlinear optimization techniques including system-level hierarchical optimiza-
tion [14], building-block-level optimization [15, 16], structured perceptron [17],
and geometric programming [18] is centered on the nonlinear relation among device
dimensions and their associated performance due to strong short-channel effects in
the nanometer CMOS technology.
In this chapter, we develop a yield constrained sequential power per area (PPA)
minimization framework [19] based on dual quadratic program that is applied to
multivariable optimization in neural interface design under bounded process varia-
tion influences. In the proposed algorithm, we create a sequence of minimizations
of the feasible PPA regions with iteratively generated low-dimensional subspaces,
while accounting for the impact of area scaling. With a two-step estimation flow,
the constrained multi-criteria optimization is converted into an optimization with
a single objective function, and repeated estimation of noncritical solutions are
evaded. Consequently, the yield constraint is only active as the optimization con-
cludes, eliminating the problem of overdesign in the worst-case approach. The
PPA assignment is interleaved, at any design point, with the configuration selec-
tion, which optimally redistributes the overall index of circuit quality to minimize
the total PPA ratio. The proposed method can be used with any variability model
and, subsequently, any correlation model, and is not restricted by any particular
performance constraint. The experimental results, obtained on the multichannel

neural recording interface circuits implemented in 90 nm CMOS technology, demonstrate power savings of up to 26% and area savings of up to 22%, without yield penalty.
This chapter treats static manufacturing variability and noise fluctuation as
stationary and nonstationary stochastic processes, respectively, and is organized as
follows: Sect.5.2 provides formulation of the circuit parameters and noise in neu-
ral interface front-end. Sections5.3 and 5.4 focus on the circuit parameters for-
mulation and associated process variability and noise, respectively. Section5.5
discusses PPA optimization under a yield constraint in neural recording interface
design. In Sect.5.6, characterization of the fundamental limits of the sensing pro-
cess, post-processing interface circuit, and PPA ratio optimization results obtained
on the prototype are presented. Finally, Sect.5.7 provides a summary and the main
conclusions.

5.2 Circuit Parameters Formulation

5.2.1 Random Process Variability

The availability of large datasets of process parameters obtained through parameter extraction allows the study and modeling of the variation and correlation
between process parameters, which is of crucial importance to obtain realistic
values of the modeled circuit unknowns. Typical procedures determine param-
eters sequentially and neglect the interactions between them and, as a result, the
fit of the model to measured data may be less than the optimum. In addition, the
parameters are obtained as they relate to a specific device and, consequently, they
correspond to different device sizes. The extraction procedures are also generally
specialized to a particular model, and considerable work is required to change or
improve these models.
For complicated IC models, parameter extraction can be formulated as an
optimization problem. The use of direct parameter extraction techniques instead
of optimization allows end-of-line compact model parameter determination. The
model equations are split up into functionally independent parts, and all param-
eters are solved using straightforward algebra without iterative procedures or
least squares fitting. With the constant downscaling of supply voltage, the moder-
ate inversion region becomes more and more important, and an accurate descrip-
tion of this region is thus essential. The threshold voltage-based models, such as
BSIM and MOS 9, make use of approximate expressions of the drain-source chan-
nel current IDS in the weak inversion region (i.e., subthreshold) and in the strong-
inversion region (i.e., well above threshold). These approximate equations are tied
together using a mathematical smoothing function, resulting in neither a physical
nor an accurate description of IDS in the moderate inversion region (i.e., around
threshold).

The major advantage of surface potential (defined as the electrostatic potential at the gate oxide/substrate interface with respect to the neutral bulk) over threshold-voltage-based models is that the surface potential model does not rely on the regional approach, and I-V and C-V characteristics in all operation regions are expressed/
evaluated using a set of unified formulas. In the surface potential-based model, the
channel current IDS is split up in a drift (Idrift) and a diffusion (Idiff) component,
which are a function of the gate bias VGB and the surface potential at the source
(s0) and the drain (sL) side. In this way, IDS can be accurately described using
one equation for all operating regions (i.e., weak, moderate, and strong-inversion).
The numerical progress has also removed a major concern in surface potential
modeling: the surface potential can be solved either in closed form (with limited accuracy) or iteratively, as with our use of the second-order Newton iterative method to improve the computational efficiency in MOS Model 11.
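As an illustration of the iterative route, the sketch below applies Newton iteration to a simplified implicit surface-potential relation (a depletion-region approximation, not the actual MOS Model 11 equation set; all parameter values are made up):

```python
import math

def surface_potential(vgb, vfb=-0.9, gamma=0.4, tol=1e-12, iters=50):
    """Newton solution of the implicit relation
        vgb - vfb = psi_s + gamma * sqrt(psi_s)
    where gamma is the body factor and vfb the flat-band voltage
    (a depletion-region simplification, chosen only to show the method)."""
    psi = max(1e-6, vgb - vfb)          # initial guess: ignore the sqrt term
    for _ in range(iters):
        f = psi + gamma * math.sqrt(psi) - (vgb - vfb)
        df = 1.0 + gamma / (2.0 * math.sqrt(psi))
        step = f / df
        psi -= step
        if psi <= 0.0:
            psi = 1e-9                  # keep the iterate in the valid domain
        if abs(step) < tol:
            break
    return psi
```

Because the residual is smooth and monotonic in ψs, the Newton iteration converges in a handful of steps, which is the property that makes the iterative formulation efficient for compact models.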
The fundamental notion for the study of spatial statistics is that of stochastic
(random) process defined as a collection of random variables on a set of tempo-
ral or spatial locations. Generally, a second-order stationary (wide-sense station-
ary, WSS) process model is employed, but other more strict criteria of stationarity
are possible. This model implies that the mean is constant and the covariance only
depends on the separation between any two points. In a second-order stationary
process only the first and second moments of the process remain invariant. The
covariance and correlation functions capture how the codependence of random
variables at different locations changes with the separation distance. These func-
tions are unambiguously defined only for stationary processes. For example, the
random process describing the behavior of the transistor length L is stationary only if there is no systematic spatial variation of the mean of L. If the process is
not stationary, the correlation function is not a reliable measure of codependence
and correlation. Once the systematic wafer-level and field-level dependencies are
removed, thereby making the process stationary, the true correlation is found to
be negligibly small. From a statistical modeling perspective, systematic variations
affect all transistors in a given circuit equally. Thus, systematic parametric varia-
tions can be represented by a deviation in the parameter mean of every transistor
in the circuit.
We model the manufactured values of the parameters p_i ∈ {p_1, …, p_m} for transistor i as a random variable

p_i = μ_p,i + σ_p(d_i)·ξ_p(d_i, θ)    (5.1)
where μ_p,i and σ_p(d_i) are the mean value and standard deviation of the parameter p_i, respectively, ξ_p(d_i, θ) is the stochastic process corresponding to parameter p, d_i denotes the location of transistor i on the die with respect to a point origin, and θ is the die on which the transistor lies. This reference point can be located, say, in the lower left corner of the die, or in the center, etc. A random process can be represented as a series expansion of some uncorrelated random variables involving a complete set of deterministic functions with corresponding random coefficients. A commonly used series involves spectral expansion [20], in which the random coefficients are uncorrelated only if the random process is assumed stationary and the length of the random process is infinite or periodic. The use of the Karhunen–Loève expansion [21] has generated interest because of its biorthogonal property, that is, both the deterministic basis functions and the corresponding random coefficients are orthogonal [22]; e.g., the orthogonal deterministic basis function and its magnitude are, respectively, the eigenfunction and eigenvalue of the covariance function. Assuming that p_i is a zero-mean Gaussian process and using the Karhunen–Loève expansion, p_i can be written in truncated form (for practical implementation) with a finite number of terms M as

p_i = μ_p,i + σ_p(d_i)·Σ_{n=1}^{M} √λ_p,n·ξ_p,n(θ)·f_p,n(d_i)    (5.2)

where {ξ_p,n(θ)} is a vector of zero-mean uncorrelated Gaussian random variables, and f_p,n(d_i) and λ_p,n are the eigenfunctions and eigenvalues of the covariance matrix C_p(d_1, d_2) (Fig. 5.1) of ξ_p(d_i, θ), controlled through a distance-based weight term, the measurement correction factor, the correlation parameter, and the process correction factors c_x and c_y.

Fig. 5.1 (a) Behavior of modeled covariance functions C_p using M = 5 for a/η_p = [1, …, 10], and (b) the model fitting on the available measurement data (© IEEE 2011)
Without loss of generality, consider for instance two transistors with given
threshold voltages. In our approach, their threshold voltages are modeled as sto-
chastic processes over the spatial domain of a die, thus making parameters of any
two transistors on the die two different correlated random variables. The value of M
is governed by the accuracy of the eigenpairs in representing the covariance func-
tion rather than the number of random variables. Unlike previous approaches, which
model the covariance of process parameters due to the random effect as a piece-
wise-linear model [23] or through modified Bessel functions of the second kind
[24], here the covariance is represented as a linearly decreasing exponential function

C_p(d_1, d_2) = ζ·(1 + κ·d_x,y)·e^−(c_x·|x_d1 − x_d2| + c_y·|y_d1 − y_d2|)/η_p    (5.3)

where ζ is a distance-based weight term, κ is the measurement correction factor for the two transistors located at Euclidean coordinates (x_1, y_1) and (x_2, y_2), respectively, and c_x and c_y are process correction factors depending upon the process maturity. For instance, in Fig. 5.1a, the process correction factor c_x,y = 0.001 relates to a very mature process, while c_x,y = 1 indicates a process in a ramp-up phase. The correlation parameter η_p, reflecting the spatial scale of clustering defined in [−a, a], regulates the decaying rate of the correlation function with respect to the distance (d_1, d_2) between the two transistors located at Euclidean coordinates (x_1, y_1) and (x_2, y_2). Physically, a lower a/η_p implies a highly correlated process, and hence a smaller number of random variables is needed to represent the random process and, correspondingly, a smaller number of terms in the Karhunen–Loève expansion. This means that for c_x,y = 0.001 and a/η_p = 1 the number of transistors that need to be sampled to assess a process parameter such as the threshold voltage is much smaller than the number required for c_x,y = 1 and a/η_p = 10, because of the high nonlinearity shown in the correlation function. To maintain a fixed difference between the theoretical value and the truncated form, M has to be increased when a increases at constant η_p; in other words, for a given M, the accuracy decreases as a/η_p increases. The eigenvalues λ_p,n and eigenfunctions f_p,n(·) are the solution of the homogeneous Fredholm integral equation of the second kind indexed on a bounded domain D. To find the numerical solution of the Fredholm integral, each eigenfunction is approximated by a linear combination of linearly decreasing exponential functions; the resulting approximation error is then minimized by the Galerkin method. One example of spatial correlation dependence and model fitting on the available measurement data through the Karhunen–Loève expansion is given in Fig. 5.1b. For comparison purposes, a grid-based spatial-correlation model is intuitively simple and easy to use; yet its limitations, due to the inherent accuracy-versus-efficiency trade-off, necessitate a more flexible approach, especially at short to mid-range distances [24].
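As a numerical sketch of the expansion in (5.2), the covariance function can be discretized on a grid of die locations and its eigenpairs computed directly. The grid size, correlation parameter, and the 10 mV threshold-voltage spread below are illustrative assumptions, and the plain exponential kernel stands in for (5.3) with the weight and measurement-correction terms set to unity.

```python
import numpy as np

# 8 x 8 grid of transistor locations on a unit die (assumed geometry)
n = 8
xs, ys = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
d = np.column_stack([xs.ravel(), ys.ravel()])          # 64 locations

cx = cy = 1.0      # process correction factors (assumed)
eta = 1.0          # correlation parameter (assumed)
C = np.exp(-(cx*np.abs(d[:, None, 0] - d[None, :, 0])
             + cy*np.abs(d[:, None, 1] - d[None, :, 1]))/eta)

# Eigenpairs of the discretized covariance (Karhunen-Loeve basis)
lam, f = np.linalg.eigh(C)
lam, f = lam[::-1], f[:, ::-1]                         # sort descending

M = 15                                                 # truncation order
energy = lam[:M].sum()/lam.sum()                       # fraction of variance kept

# Draw one sample of, e.g., threshold voltage over the die:
#   p_i = mu + sigma * sum_n sqrt(lam_n) * xi_n * f_n(d_i)
rng = np.random.default_rng(1)
xi = rng.standard_normal(M)
p = 0.45 + 0.01*(f[:, :M] @ (np.sqrt(lam[:M])*xi))     # mu = 450 mV, sigma = 10 mV
```

A highly correlated process concentrates the variance in the first few eigenvalues, so a small M already captures most of the spatial statistics, consistent with the a/η_p discussion above.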
We now introduce a model Δp = f(·), accounting for voltage and current shifts due to random manufacturing variations in transistor dimensions and process parameters, defined as

Δp = f(γ, W*, L*, p*)    (5.4)

where γ defines a fitting parameter estimated from the extracted data, W* and L* represent the geometrical deformation due to manufacturing variations, and p* models electrical parameter deviations from their corresponding nominal values, e.g., altered transconductance, threshold voltage, etc. (Appendix B).

5.2.2 Noise in Neural Recording Interface

In addition to process parameter variability, which sets the upper bound on the cir-
cuit design in terms of accuracy, linearity and timing, existence of noise associated
with fundamental processes represents an elementary limit on the performance of
electronic circuits.
Neural cell noise model: In the Hodgkin and Huxley framework, a neural channel's configuration is determined by the states of its constituent subunits, where each subunit can be either in an open or a closed state [25]. Adding a noise term ξ_x(V, t) (x = m, h, or n) to the deterministic ordinary differential equations (ODE) of Hodgkin and Huxley is consistent with the behavior of the Markov process for channel gating [26]. Such a process can be contracted to a Langevin description (via a Fokker–Planck equation) and expressed as a delta-correlated noise process ρ_neuron(t + τ, t) = 1/N_x·[α_x(1 − x) + β_x·x]·δ(τ), where N_x is the total number of neural channels, and the transition rates α_x(t) and β_x(t) are instantaneous functions of the membrane potential V(t). Dirac's delta function designates that the noise at different times is uncorrelated, and the variables m, h, and n represent the fraction of open subunits of different types, aggregated across the entire cell membrane. Subsequently, the neural channel noise is modeled as Brownian motion, i.e., as a Gauss-distributed nonstationary stochastic process with independent increments and heuristically fixed constant variance [27].
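A minimal sketch of this Langevin description, using the standard Hodgkin–Huxley potassium activation rates and an Euler–Maruyama discretization; the subunit count N and the clamped membrane potential are assumed values.

```python
import numpy as np

# Standard Hodgkin-Huxley potassium activation rates (1/ms), V in mV
def alpha_n(V):
    return 0.01*(V + 55.0)/(1.0 - np.exp(-(V + 55.0)/10.0))

def beta_n(V):
    return 0.125*np.exp(-(V + 65.0)/80.0)

V = -65.0                       # clamped membrane potential (assumed)
a, b = alpha_n(V), beta_n(V)
n_inf = a/(a + b)               # deterministic steady state (~0.318 at rest)

N = 1000                        # number of channel subunits (assumed)
dt = 0.01                       # time step (ms)
paths = 500                     # ensemble size
rng = np.random.default_rng(4)

# Euler-Maruyama for dn = (a(1-n) - b n) dt + sqrt((a(1-n) + b n)/N) dW
n = np.full(paths, n_inf)
for _ in range(5000):           # 50 ms of simulated time
    drift = a*(1.0 - n) - b*n
    diff = np.sqrt(np.maximum(a*(1.0 - n) + b*n, 0.0)/N)
    n = np.clip(n + drift*dt + diff*np.sqrt(dt)*rng.standard_normal(paths), 0.0, 1.0)
```

The ensemble mean stays at n_inf while the fluctuations shrink as 1/√N, i.e., channel noise vanishes in the limit of many channels.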
Electrode–tissue interface and signal-conditioning circuit noise model: In intracortical microelectrode recordings, biological (neural cell) noise mainly originates from the firing of several neurons in the tissue surrounding the recording microelectrode, while thermal noise levels are influenced by the electrode–tissue interface impedance at each individual recording site (as a result of the foreign body reaction) and by the recording bandwidth, i.e., a wider recording bandwidth increases thermal noise levels. The electrode–tissue interface noise includes the tissue/bulk thermal noise and the electrode–electrolyte interface noise. Tissue
noise is modeled as the thermal noise generated by the solution/spreading or tis-
sue/encapsulation resistance [28] and the electrode noise is the thermal noise
generated by the charge transfer resistor [29]. The noise of the signal condition-
ing electronic circuits is mainly determined by the thermal and flicker noise.
The most important types of electrical noise sources (thermal, shot, and flicker
noise) in passive elements and integrated-circuit devices have been investi-
gated extensively, and appropriate models derived [30] as stationary and in [31]
as nonstationary noise sources. We adapt model descriptions as defined in [31],
where thermal and shot noise are expressed as ρ_thermal(t + τ, t) = 2kTG(t)·δ(τ) and ρ_shot(t + τ, t) = q·I_D(t)·δ(τ), respectively, where k is Boltzmann's constant, T is the absolute temperature, G is the conductance, q is the electron charge, and I_D is the
current through the junction. These noise processes correspond to the current noise
sources, which are included in the models of the integrated-circuit devices.
A/D converter noise model: Sampled data systems operate on the series of dis-
crete-time samples taken at the end of the sampling period. Although the details of
the processing during each period result in nonstationary noise voltages and cur-
rents, the same operation is performed each clock cycle, leading to the same signal
statistics each clock cycle. Consequently, such stochastic process can be described
as wide-sense cyclostationary. The special case of a white noise input source is of
particular importance since the majority of the noise sources can be traced back
to white noise generated in circuit components. For a white noise step input, the
autocorrelation is a delta function, where Sxo is the one-sided white noise power
spectral density (PSD) of the underlying noise process. Using Parsevals theo-
rem, the variance of the output as a function of the autocorrelation simplifies to
ρ(t + τ, t) = ½·S_xo(t)·δ(τ) [32]. The one-sided noise PSD of the sampled output
can then be found from the sum of the filtered and shifted two-sided input noise
PSD Sx(f) [33]. Measurements of the output codes for a dc input signal to the A/D
converter can be used to obtain an input-referred noise PSD estimate, SADC(f). The
noise of the input sampler and the converter quantization noise add to the input-
referred noise PSD to give the total input noise PSD S_total(f) = S_sample(f) + S_ADC(f) + S_q(f), where S_sample(f) = (kT/C_s)/(f_s/2) is the noise PSD from the input sampler over the Nyquist range (0 ≤ f_Neuron ≤ f_s/2) and S_q(f) = (V²_LSB/12)/(f_s/2) is the A/D converter quantization noise.
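For concreteness, the total input-noise budget can be tabulated directly from these expressions; the capacitor size, resolution, reference, and the flat measured S_ADC below are assumed example values, not measured data.

```python
k, T = 1.380649e-23, 300.0   # Boltzmann constant, absolute temperature
Cs = 1e-12                   # sampling capacitor (assumed, 1 pF)
fs = 1e6                     # sampling rate (assumed, 1 MS/s)
Nbits, Vref = 10, 1.0        # resolution and reference (assumed)

Vlsb = Vref/2**Nbits
S_sample = (k*T/Cs)/(fs/2)           # sampler PSD over the Nyquist band (V^2/Hz)
S_q = (Vlsb**2/12.0)/(fs/2)          # quantization-noise PSD (V^2/Hz)
S_adc = 2e-14                        # input-referred converter PSD (assumed flat)

S_total = S_sample + S_adc + S_q     # total input noise PSD
v_rms = (S_total*(fs/2))**0.5        # integrated rms noise over 0..fs/2
```

Integrating each flat PSD over the Nyquist band recovers the familiar totals kT/C_s and V²_LSB/12; for these particular values the quantization term dominates the sampler term by more than an order of magnitude.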

5.3 Stochastic MNA for Process Variability Analysis

Device-variability limitations are fundamental issues for robust circuit design, and their evaluation has been the subject of numerous studies. Several models have been suggested for device variability [34–36] and, correspondingly, a number of computer-aided design (CAD) tools for statistical circuit simulation [37–42]. In general, a circuit design is optimized for parametric yield so that the majority of
manufactured circuits meet the performance specifications. The computational cost
and complexity of yield estimation, coupled with the iterative nature of the design
process, make yield maximization computationally prohibitive. As a result, circuit
designs are verified using models corresponding to a set of worst-case conditions
of the process parameters. Worst-case analysis refers to the process of determining
the values of the process parameters in these worst-case conditions and the cor-
responding worst-case circuit performance values. Worst-case analysis is very effi-
cient in terms of designer effort, and thus has become the most widely practiced
technique for statistical analysis and verification. Algorithms previously proposed
for worst-case tolerance analysis fall into four major categories: corner technique,
interval analysis, sensitivity-based vertex analysis, and Monte Carlo simulation.
The most common approach is the corners technique. In this approach, each
process parameter value that leads to the worst performance is chosen indepen-
dently. This method ignores the correlations among the process parameters, and the simultaneous setting of each process parameter to its extreme value results in simulation at the tails of the joint probability density of the process parameters.
Thus, the worst-case performance values obtained are extremely pessimistic.
Interval analysis is computationally efficient but leads to overestimated results,
i.e., the calculated response space encloses the actual response space, due to the
intractable interval expansion caused by dependency among interval operands.
Interval splitting techniques have been adopted to reduce the interval expan-
sion, but at the expense of computational complexity. Traditional vertex analysis
assumes that the worst-case parameter sets are located at the vertices of param-
eter space, thus the response space can be calculated by taking the union of circuit
simulation results at all possible vertices of parameter space. Given a circuit with
M uncertain parameters, this will result in a 2M simulation problem. To further
reduce the simulation complexity, sensitivity information computed at the nomi-
nal parameter condition is used to find the vertices that correspond to the worst
cases of circuit response. The Monte Carlo algorithm takes random combinations
of values chosen from within the range of each process parameter and repeatedly
performs circuit simulations. The result is an ensemble of responses from which
the statistical characteristics are estimated. Unfortunately, if the number of itera-
tions for the simulation is not very large, Monte Carlo simulation always underes-
timates the tolerance window. Accurately determining the bounds on the response
requires a large number of simulations, so the Monte Carlo method becomes very CPU-time-consuming for large chips. Other approaches
for statistical analysis of variation-affected circuits, such as the one based on the
Hermite polynomial chaos [43] or the response surface methodology, are able to
perform much faster than a Monte Carlo method, at the expense of a design-of-experiments preprocessing stage [44]. In this section, the circuits are described as
a set of stochastic differential equations (SDE) and Gaussian closure approxima-
tions are introduced to obtain a closed form of moment equations. Even if a ran-
dom variable is not strictly Gaussian, a second-order probabilistic characterization
yields sufficient information for most practical problems.
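The Monte Carlo flavor of parametric-yield estimation described above reduces to a few lines of code; the RC low-pass corner frequency, the 5% Gaussian parameter spreads, and the ±20% specification window below are illustrative assumptions, not design data.

```python
import numpy as np

rng = np.random.default_rng(2)
Nmc = 20000                       # number of Monte Carlo samples

# Nominal RC low-pass filter with 5% (1-sigma) Gaussian parameter spread
R = 1e3*(1.0 + 0.05*rng.standard_normal(Nmc))
C = 1e-9*(1.0 + 0.05*rng.standard_normal(Nmc))
fc = 1.0/(2.0*np.pi*R*C)          # per-sample cutoff frequency
f0 = 1.0/(2.0*np.pi*1e3*1e-9)     # nominal cutoff

# Parametric yield: fraction of samples meeting a +/-20% cutoff specification
yield_est = np.mean(np.abs(fc - f0)/f0 < 0.20)
```

The estimate carries a statistical error of order 1/√N, which is exactly why tight tolerance windows require very large sample counts, as noted above.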
Modern integrated circuits are often distinguished by a very high complex-
ity and a very high packing density. The numerical simulation of such circuits
requires modeling techniques that allow an automatic generation of network equa-
tions. Furthermore, the number of independent network variables describing the
network should be as small as possible. Circuit models have to meet two contradicting demands: they have to describe the physical behavior of a circuit as correctly as possible while being simple enough to keep computing time reasonably small.
The level of the models ranges from simple algebraic equations, over ordinary and
partial differential equations to Boltzmann and Schrödinger equations, depending
on the effects to be described. Due to the high number of network elements (up to
millions of elements) belonging to one circuit, one is restricted to relatively simple
models. In order to describe the physics as well as possible, so-called compact models represent the first choice in network simulation. Complex elements such as transistors are modeled by small circuits containing basic network elements described by algebraic equations and ODE only. The development of such replacement circuits forms its own research field and leads nowadays to transistor models with
more than 500 parameters. A well-established approach to meet both demands to
a certain extent is the description of the network by a graph with branches and
nodes. Branch currents, branch voltages and node potentials are introduced as
variables. The node potentials are defined as voltages with respect to one refer-
ence node, usually the ground node. The physical behavior of each network ele-
ment is modeled by a relation between its branch currents and its branch voltages.
In order to complete the network model, the topology of the elements has to be
taken into account. Assuming the electrical connections between the circuit ele-
ments to be ideally conducting and the nodes to be ideal and concentrated, the
topology can be described by Kirchhoff's laws (the sum of all branch currents
entering a node equals zero and the sum of all branch voltages in a loop equals
zero). In general, for time-domain analysis, modified nodal analysis (MNA) leads
to a nonlinear ODE or differential algebraic equation system which, in most cases,
is transformed into a nonlinear algebraic system by means of linear multi-step
integration methods [45, 46] and, at each integration step, a Newton-like method
is used to solve this nonlinear algebraic system (Appendix B). Therefore, from a
numerical point of view, the equations modeling a dynamic circuit are transformed
to equivalent linear equations at each iteration of the Newton method and at each
time instant of the time-domain analysis. Thus, we can say that the time-domain
analysis of a nonlinear dynamic circuit consists of the successive solutions of
many linear circuits approximating the original (nonlinear and dynamic) circuit at
specific operating points.
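As a minimal numerical illustration of modified nodal analysis, consider a hypothetical two-resistor divider driven by an independent voltage source; the voltage-source branch contributes one extra current unknown and one extra branch equation (all component values are arbitrary).

```python
import numpy as np

# Divider: Vs -> node 1 -> R1 -> node 2 -> R2 -> ground
R1, R2, Vs = 1e3, 2e3, 1.0

# Nodal conductance matrix for the resistive v-branches (ground eliminated)
G = np.array([[ 1.0/R1, -1.0/R1],
              [-1.0/R1,  1.0/R1 + 1.0/R2]])

# Incidence column of the voltage-source c-branch (connected to node 1)
Ac = np.array([[1.0],
               [0.0]])

# MNA system: node equations augmented with the branch equation Ac^T v = Vs
MNA = np.block([[G, Ac],
                [Ac.T, np.zeros((1, 1))]])
rhs = np.array([0.0, 0.0, Vs])

v1, v2, i_vs = np.linalg.solve(MNA, rhs)   # node voltages and source current
```

The appended current unknown is exactly the device current that conventional nodal analysis cannot express for a voltage source; here v2 = Vs·R2/(R1 + R2) = 2/3 V.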
Consider a linear circuit with N + 1 nodes and B voltage-controlled branches (two-terminal resistors, independent current sources, and voltage-controlled n-ports), the latter grouped in set B. We then introduce the source current vector i ∈ R^B and the branch conductance matrix G ∈ R^{B×B}. By assuming that the branches (one for each port) are ordered element by element, the matrix is block diagonal: each 1×1 block corresponds to the conductance of a one-port and in any case is nonzero, while n×n blocks correspond to the conductance matrices of voltage-controlled n-ports. More in detail, the diagonal entries of the n×n blocks can be zero and, in this case, the nonzero off-diagonal entries, on the same row or
column, correspond to voltage-controlled current sources (VCCSs). Now, consider
MNA and circuits embedding, besides voltage-controlled elements, independent
voltage sources, the remaining types of controlled sources and sources of process
variations. We split the set of branches B in two complementary subsets: BV of
voltage-controlled branches (v-branches) and BC of current-controlled branches
(c-branches).
Conventional nodal analysis (NA) is extended to MNA [46] as follows: currents
of c-branches are added as further unknowns and the corresponding branch equa-
tions are appended to the NA system. The N×B incidence matrix A can be partitioned as A = [A_v A_c], with A_v ∈ R^{N×B_v} and A_c ∈ R^{N×B_c}. As in conventional NA,
constitutive relations of v-branches are written, using the conductance submatrix G ∈ R^{B_v×B_v}, in the form

i_v = G·v_v    (5.5)

while the characteristics of the c-branches, including independent voltage sources and controlled sources except VCCSs, are represented by the implicit equation

B_c·v_c + R_c·i_c + F_c·γ = 0    (5.6)

where B_c, R_c, F_c ∈ R^{B_c×B_c}, v_c = (A_c^T·v) ∈ R^{B_c} [45], and γ ∈ R^{B_c} is a random vector accounting for device variations as defined in (5.4). These definitions are in agreement with those adopted in currently used simulators and suffice for a large variety of circuits. Note that, from a practical-use perspective, a user may only
be interested in voltage variations over a period of time or in the worst case in a
period of time. This information can be obtained once the variations in any given
time instance are known. Using the above notations, (5.5) and (5.6) can be written
in the compact form as

F(q̇, q, t) + B(q, t)·γ = 0    (5.7)

where q = [v_c i_v]^T is the vector of stochastic processes which represents the state variables (e.g., node voltages) of the circuit, and γ is a vector of wide-sense stationary processes. B(q, t) is an N×B_c matrix, the entries of which are functions of the state q and possibly of t. Every column of B(q, t) corresponds to a component of γ, and has normally
either one or two nonzero entries. The rows correspond to either a node equation
or a branch equation of an inductor or a voltage source. Equation (5.7) represents a system of nonlinear SDE, i.e., a system of stochastic algebraic and differential equations that describes the dynamics of the nonlinear circuit and reduces to the MNA equations when the random sources are set to zero. Solving (5.7)
means to determine the probability density function P of the random vector q(t)
at each time instant t. Formally, the probability density of the random variable q is
given as

P(q) = |J(q)|·N(h^−1(q) | m, Σ)    (5.8)

where |J(q)| is the determinant of the Jacobian matrix of the inverse transform h^−1(q), with h a nonlinear function of γ. However, generally it is not possible to handle this distribution directly, since it is non-Gaussian for all but linear h.
Therefore, it may be convenient to look for an approximation, which can be
found after partitioning the space of the stochastic source variables in a given
number of subdomains, and then solving the equation in each subdomain by
means of a piecewise-linear truncated Taylor approximation. If the subdomains
are small enough to consider the equation as linear in the range of variability of
, or that the nonlinearities in the subdomains are so smooth that they might be
considered as linear even for a wide range of , it is then possible to combine
the partial results and obtain the desired approximated solution to the original
problem.
Let x_0 = x(γ_0, t) be the generic point around which to linearize; with the change of variable Δ = x − x_0 = [(q − q_0)^T, (γ − γ_0)^T]^T, the first-order Taylor piecewise-linearization of (5.7) in x_0 yields

P(x_0)·Δ̇ + (K(x_0) + P′(x_0))·Δ = 0    (5.9)

where K(x) = ∂B(x) and P(x) = ∂F(x). Transient analysis requires only the solution of the deterministic version of (5.7), e.g., by means of a conventional circuit simulator, and of (5.9) with a method capable of dealing with linear SDE with stochasticity that enters only through the initial conditions. Since (5.9) is a linear homogeneous equation in Δ, its solution will always be proportional to Δ_0. We can rewrite (5.9) as

Δ̇(t) = E(x_0)·Δ + F(x_0)·γ_0    (5.10)
Equation (5.10) is a system of SDE which is linear in the narrow sense (the right-hand side is linear in Δ and the coefficient matrix for the vector of variation sources is independent of Δ) [47]. Since these stochastic processes have regular properties, they can be considered as a family of classical problems for the individual sample paths and be treated with the classical methods of the theory of linear SDE.
expanding every element of (t) with
m

i (t) = (t)( 0 ) = ij (t)j (5.11)
j=1

for m elements of a vector . As long as j(t) is obtained, the expression for (t) is
known, so that the covariance matrix of the solution can be written as

= T (5.12)

Defining aj(t)=(a1j, a2j, , anj)T, Fj(t)=(F1j, F2j, , Fnj)T, the requirement for
(t) is

j (t) = E(t)j + F(t) (5.13)

Equation(5.13) is an ODE, which can be solved by a fast numerical method.
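The decomposition in (5.11)–(5.13) can be checked numerically: integrating the deterministic coefficient ODE once per variation source reproduces, by superposition, the direct solution for any sample of γ. The 2×2 system matrices, the frozen source sample, and the source covariance below are arbitrary stable example values.

```python
import numpy as np

# Arbitrary stable example system: delta' = E delta + F gamma, gamma frozen in time
E = np.array([[-2.0, 0.5],
              [ 0.0, -1.0]])
F = np.array([[1.0, 0.0],
              [0.3, 1.0]])          # two variation sources

dt, steps = 1e-3, 5000
A = np.zeros((2, 2))                # columns a_j(t), with a_j(0) = 0
gamma = np.array([0.8, -0.5])       # one sample of the variation sources
delta = np.zeros(2)

for _ in range(steps):
    A = A + dt*(E @ A + F)                       # coefficient ODE, one column per source
    delta = delta + dt*(E @ delta + F @ gamma)   # direct forward-Euler solution

# Covariance of the solution from the coefficients, as in (5.12)
Sigma_gamma = np.diag([0.01, 0.04])              # assumed source covariance
Sigma_delta = A @ Sigma_gamma @ A.T
```

Because the same Euler recursion drives both A and delta, A @ gamma matches the direct solution to rounding error; the covariance then follows from the m coefficient vectors without any sampling.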

5.4 Stochastic MNA for Noise Analysis

In addition to device variability, which sets the limitations of circuit designs in terms of accuracy, linearity and timing, existence of electrical noise associated
with fundamental processes in integrated-circuit devices represents an elementary
limit on the performance of electronic circuits. The existence of electrical noise is
essentially due to the fact that electrical charge is not continuous, but is carried in
discrete amounts equal to the electron charge. The noise phenomena considered
here are caused by the small current and voltage fluctuations, such as thermal,
shot, and flicker noise, that are generated within the integrated-circuit devices
themselves.
The noise performance of a circuit can be analyzed in terms of the small-signal
equivalent circuits by considering each of the uncorrelated noise sources in turn
and separately computing their contribution at the output. A nonlinear circuit is
assumed to have time-invariant (dc) large-signal excitations and time-invariant
steady-state large-signal waveforms and that both the noise sources and the noise
at the output are wide-sense stationary stochastic processes. Subsequently, the
nonlinear circuit is linearized around the fixed operating point to obtain a linear
time-invariant network for noise analysis. Implementation of this method based on
the interreciprocal adjoint network concept [48] results in a very efficient com-
putational technique for noise analysis, which is available in almost every circuit
simulator. Unfortunately, this method is only applicable to circuits with fixed oper-
ating points and is not appropriate for noise simulation of circuits with changing
bias conditions.
In a noise simulation method that uses linear periodically time-varying trans-
formations [49, 50], a nonlinear circuit is assumed to have periodic large-signal
excitations and periodic steady-state large-signal waveforms and that both the
noise sources and the noise at the output are cyclostationary stochastic processes.
Afterward, the nonlinear circuit is linearized around the periodic steady-state oper-
ating point to obtain a linear periodically time-varying network for noise analysis.
Nevertheless, this noise analysis technique is applicable to only a limited class of
nonlinear circuits with periodic excitations.
Noise simulation in time-domain has traditionally been based on the Monte
Carlo technique [51], where the circuit with the noise sources is simulated using
numerous transient analyzes with different sample paths of the noise sources.
Consequently, the probabilistic characteristics of noise are then calculated using
the data obtained in these simulations. However, accurately determining the
noise content requires a large number of simulations, so the Monte Carlo method becomes very CPU-time-consuming for large chips.
Additionally, to accurately model shot and thermal noise sources, time-step in
transient analysis is limited to a very small value, making the simulation highly
inefficient.
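As a toy version of such a transient noise simulation, the thermal noise of a hypothetical RC network can be sampled with an Euler–Maruyama scheme and checked against the well-known kT/C limit; the component values, step size, and path count are assumed.

```python
import numpy as np

k, T = 1.380649e-23, 300.0      # Boltzmann constant, absolute temperature
R, C = 1e6, 1e-12               # assumed component values (1 MOhm, 1 pF)
tau = R*C

dt = tau/50.0                   # time step well below the circuit time constant
steps = 500                     # ~10 time constants of simulated time
paths = 20000                   # number of Monte Carlo sample paths
rng = np.random.default_rng(0)

# Langevin model: C dv/dt = -v/R + i_n(t), with thermal source autocorrelation 2kT/R
sigma = np.sqrt(2.0*k*T/R)/C    # noise scaling referred to dv/dt
v = np.zeros(paths)
for _ in range(steps):
    v += -(v/tau)*dt + sigma*np.sqrt(dt)*rng.standard_normal(paths)

var_est = v.var()               # stationary variance approaches kT/C
```

Reproducing kT/C (about (64 µV)² for a 1 pF capacitor) requires thousands of paths and a time step far below RC, which is exactly the cost argument made above.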
In this section, we treat the noise as a nonstationary stochastic process, and introduce an Itô system of SDE as a convenient way to represent such a process. Recognizing that, when backward Euler is applied, the variance-covariance matrix equation can be written in the continuous-time Lyapunov matrix form, we then provide a numerical solution to such a set of linear time-varying equations. We adapt the model descriptions as defined in [31], where thermal and shot
noise are expressed as delta-correlated noise processes having independent values
at every time point, modeled as modulated white noise processes. These noise pro-
cesses correspond to current noise sources which are included in the models of the
integrated-circuit devices. As numerical experiments suggest that both the conver-
gence and stability analyses of adaptive schemes for SDE extend to a number of
sophisticated methods which control different error measures, we follow the adap-
tation strategy, which can be viewed heuristically as a fixed time-step algorithm
applied to a time rescaled differential equation. Additionally, adaptation also con-
fers stability on algorithms constructed from explicit time-integrators, resulting in
better qualitative behavior than for fixed time-step counter-parts [52].
The inherent nature of a white noise process differs fundamentally from that of a wide-sense stationary stochastic process such as static manufacturing variability, and cannot be treated as an ODE using differential calculus similar to that in Sect. 5.3. The MNA formulation of the stochastic process that describes random influences which fluctuate rapidly and irregularly (i.e., white noise) can be written as

F(ṙ, r, t) + B(r, t)·ξ = 0    (5.14)
where r is the vector of stochastic processes which represents the state variables
(e.g., node voltages) of the circuit, ξ is a vector of white Gaussian processes, and B(r, t) is a state- and time-dependent modulation of the vector of noise sources.
Since the magnitude of the noise content in a signal is much smaller in comparison
to the magnitude of the signal itself in any functional circuit, a system of nonlinear
SDE described in (5.14) can be piecewise-linearized under similar assumptions as
noted in Sect. 5.3. Including the noise content description, (5.10) can be expressed in general form as

Δ̇(t) = E(t)·Δ(t) + F(t)·ξ(t)    (5.15)

where Δ = [(r − r_0)^T, (ξ − ξ_0)^T]^T. We will interpret (5.15) as an Itô system of SDE. Now rewriting (5.15) in the more natural differential form

dΔ(t) = E(t)·Δ(t)·dt + F(t)·dw    (5.16)

where we substituted dw(t) = ξ(t)·dt with a vector of Wiener processes w. If the functions E(t) and F(t) are measurable and bounded on the time interval of interest, there exists a unique solution for every initial value Δ(t_0) [47]. If Δ is a Gaussian stochastic process, then it is completely characterized by its mean and correlation function. From Itô's theorem on stochastic differentials,

d(Δ(t)·Δ^T(t)) = Δ(t)·d(Δ^T(t)) + d(Δ(t))·Δ^T(t) + F(t)·F^T(t)·dt    (5.17)

and expanding (5.17) with (5.16), noting that Δ and dw are uncorrelated, the variance-covariance matrix K(t) of Δ(t) with the initial value K(0) = E[Δ_0·Δ_0^T] can be expressed in differential Lyapunov matrix equation form as [47]

dK(t)/dt = E(t)·K(t) + K(t)·E^T(t) + F(t)·F^T(t)    (5.18)
Note that the mean of the noise variables is zero for most integrated circuits. In view of the symmetry of K(t), (5.18) represents a system of linear ODE
with time-varying coefficients. To obtain a numerical solution, (5.18) has to be
discretized in time using a suitable scheme, such as any linear multi-step method,
or a Runge-Kutta method. For circuit simulation, implicit linear multi-step meth-
ods, and especially the trapezoidal method and the backward differentiation for-
mula were found to be most suitable [53]. If backward Euler is applied to (5.18),
the differential Lyapunov matrix equation can be written in a special form referred
to as the continuous-time algebraic Lyapunov matrix equation
P_r·K(t_r) + K(t_r)·P_r^T + Q_r = 0    (5.19)
K(t) at time point tr is calculated by solving the system of linear equations in
(5.19). Such continuous-time Lyapunov equations have a unique solution K(t),
which is symmetric and positive semidefinite.
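The differential Lyapunov equation (5.18) and its algebraic steady state (5.19) can be illustrated on a small constant-coefficient example; the 2×2 matrices below are arbitrary stable values, and forward Euler is used purely for transparency, whereas the implicit schemes mentioned in the text would be preferred for stiff circuits.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Arbitrary stable example: dK/dt = E K + K E^T + F F^T
E = np.array([[-1.0, 0.2],
              [ 0.0, -0.5]])
F = np.array([[1.0],
              [0.3]])
Q = F @ F.T

K = np.zeros((2, 2))                  # K(0) = 0: noiseless initial condition
dt = 1e-3
for _ in range(40000):                # integrate far past the slowest time constant
    K = K + dt*(E @ K + K @ E.T + Q)

# The steady state satisfies the algebraic Lyapunov equation E K + K E^T + Q = 0
K_exact = solve_continuous_lyapunov(E, -Q)
```

The iterate stays symmetric and positive semidefinite at every step and converges to the unique algebraic solution, matching the uniqueness statement for (5.19).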
Several iterative techniques have been proposed for the solution of the algebraic Lyapunov matrix Eq. (5.19) arising in some specific problems where the matrix P_r is large and sparse [54–57], such as the Bartels–Stewart method [58], and Hammarling's method [47], which remains the standard reference for directly computing the Cholesky factor of the solution K(t_r) of (5.19) for small to medium systems. For the backward stability analysis of the Bartels–Stewart algorithm, see [59]. Extensions of these methods to generalized Lyapunov equations are described in [60]. In the Bartels–Stewart algorithm, first P_r is reduced to upper Hessenberg form by means of Householder transformations, and then the QR-algorithm is applied to the Hessenberg form to calculate the real Schur decomposition [61] of the matrix P_r, which transforms (5.19) to a triangular system that can be solved efficiently by forward or backward substitutions:
S = U T Pr U (5.20)
where the real Schur form S is upper quasi-triangular and U is orthonormal. Our
formulation for the real case utilizes a similar scheme. The transformation matri-
ces are accumulated at each step to form U [58]. If we now set

K̃ = U^T K(tr) U,  Q̃ = U^T Qr U    (5.21)

then (5.19) becomes

S K̃ + K̃ S^T = −Q̃    (5.22)
To find the unique solution, we partition (5.22) as

S = [S1 s; 0 λn],  K̃ = [K1 k; k^T knn],  Q̃ = [Q1 q; q^T qnn]    (5.23)

where S1, K1, Q1 ∈ R^((n−1)×(n−1)); s, k, q ∈ R^(n−1), and λn denotes the (n, n) entry of S. The system in (5.22) then gives three equations

(λn + λn)knn + qnn = 0    (5.24)

(S1 + λn I)k + q + knn s = 0    (5.25)

S1 K1 + K1 S1^T + Q1 + sk^T + ks^T = 0    (5.26)


110 5 BrainMachine Interface: System Optimization

knn can be obtained from (5.24) and substituted in (5.25) to solve for k. Once k is known, (5.26) becomes a Lyapunov equation which has the same structure as (5.22) but of order (n−1), as

S1 K1 + K1 S1^T = −Q1 − sk^T − ks^T    (5.27)

We can apply the same process to (5.27) until S1 is of order 1. Note that, under the condition λi + λj ≠ 0, i, j = 1,…, n, at the k-th step (k = 1, 2,…, n) of this process we obtain a unique solution vector of length (n+1−k) and a reduced triangular matrix equation of order (n−k). Since U is orthonormal, once (5.22) is solved for K̃, K(tr) can be computed using

K(tr) = U K̃ U^T    (5.28)
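With library routines for the Schur decomposition and the triangular Sylvester solve, the whole procedure (5.20)-(5.22), (5.28) fits in a few lines; a sketch standing in for the hand-coded back-substitution (5.23)-(5.27):

```python
import numpy as np
from scipy.linalg import schur, solve_sylvester

def lyap_bartels_stewart(Pr, Qr):
    """Solve Pr K + K Pr^T + Qr = 0 via the real Schur decomposition:
    S = U^T Pr U (5.20), transform Qr (5.21), solve the quasi-triangular
    system (5.22), and map the solution back with (5.28)."""
    S, U = schur(Pr, output='real')    # Pr = U S U^T, S quasi-triangular
    Qt = U.T @ Qr @ U                  # Q~ = U^T Qr U
    Kt = solve_sylvester(S, S.T, -Qt)  # S K~ + K~ S^T = -Q~
    return U @ Kt @ U.T                # K(tr) = U K~ U^T
```

The quasi-triangular structure of S is what makes the inner solve a cheap sequence of forward/backward substitutions.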
Large dense Lyapunov equations can be solved by sign-function-based techniques [61]. Krylov subspace methods, which are related to matrix polynomials, have been proposed [62] as well.
Relatively large sparse Lyapunov equations can be solved by iterative approaches, e.g., [63]. Here, we apply a low-rank version of the iterative method [64], which is related to rational matrix functions. The postulated iteration for the Lyapunov Eq. (5.19) is given by K(0) = 0 and

(Pr + pi In) K(i−1/2) = −Qr − K(i−1) (Pr^T − pi In)
(Pr + pi In) Ki^T = −Qr − K(i−1/2)^T (Pr^T − pi In)    (5.29)

for i = 1, 2,… This method generates a sequence of matrices Ki which often converges very fast toward the solution, provided that the iteration shift parameters pi are chosen (sub)optimally. For a more efficient implementation of the method, we replace the iterates by their Cholesky factors, i.e., Ki = Li Li^H, and reformulate the iteration in terms of the factors Li. The low-rank Cholesky factors Li are not uniquely determined; different ways to generate them exist [64].
Note that the number of iteration steps imax need not be fixed a priori. However, if the Lyapunov equation is to be solved as accurately as possible, correct results are usually achieved for values of the stopping criterion slightly larger than the machine precision.
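A dense-matrix sketch of the iteration (5.29), without the low-rank Cholesky-factor reformulation of [64]; the shifts pi are assumed to be negative reals suited to a stable Pr:

```python
import numpy as np

def adi_lyapunov(Pr, Qr, shifts):
    """ADI iteration (5.29) for Pr K + K Pr^T + Qr = 0, starting from K(0) = 0.
    Each sweep performs the half step and the full step with one shift p."""
    n = Pr.shape[0]
    I = np.eye(n)
    K = np.zeros((n, n))
    for p in shifts:
        # half step: (Pr + p I) K_{i-1/2} = -Qr - K_{i-1} (Pr^T - p I)
        K_half = np.linalg.solve(Pr + p * I, -Qr - K @ (Pr.T - p * I))
        # full step: (Pr + p I) K_i^T = -Qr - K_{i-1/2}^T (Pr^T - p I)
        K = np.linalg.solve(Pr + p * I, -Qr - K_half.T @ (Pr.T - p * I)).T
    return K
```

With shifts placed inside the spectrum of Pr, a handful of sweeps typically reduces the residual by many orders of magnitude.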

5.5 PPA Optimization of Multichannel Neural Recording Interface

5.5.1 Power Optimization

Random process variations have a major influence on the design parameters and yield of the manufactured circuits. We define yield as the percentage of manufactured circuits that meet all the specifications, considering process variations

(5.30)

where E{.} is the expected value, and each vector d has an upper and lower bound determined by the technological process variation pz with probability density function pdf(pz). The deterministic designable parameters dr, r = 1,…, m, e.g., bias voltages and currents, transistor widths and lengths, resistances, and capacitances, are denoted by the vector d ∈ D, where D is the designable parameter space. Let the total area of the circuit be Atotal = Σk (xk Ak), where Ak is the area of a transistor or a discrete component (resistor or capacitor), k is an index that runs over all transistors and discrete components in the circuit, and xk is the sizing factor (xk ≥ 1). The optimization problem can then be formulated as the search for a design point that minimizes the total power Ptotal over the deterministic designable parameters d with lower bounds aj and upper bounds bj, for 1 ≤ j ≤ m, in the design space D, subject to a minimum yield requirement y with bound

(5.31)

Let D(Ptotal) be the compact set of all valid design variable vectors d, such that
Ptotal(d)=Ptotal. The designable parameter space D is assumed to be compact,
which for all practical purposes is no real restriction when the problem has a
finite minimum. The main advantage of this approach is its generality: it imposes
no restrictions on the distribution of p and on how the data enters the constraints.
We can approximately subdivide the algorithm into two steps: the yield fulfillment and the objective-function optimization. If, as an approximation, we restrict D(Ptotal) to just the one-best derivation of Ptotal, then we obtain the structured perceptron algorithm [65]. As a consequence, given active constraints, including the optimum power budget and the minimum frequency of operation, (5.31) can be effectively solved by a sequence of minimizations of the feasible region with iteratively generated low-dimensional subspaces using a cutting-plane method [66].
The statistical yield constrained problems require mechanisms for quantifying the reliability associated with the resulting solution, and for bounding the true optimal value of the yield constrained problem (5.31). We define a reliable bound on the probability Prob{aj ≤ dj ≤ bj; 1 ≤ j ≤ m} as the random quantity

ŷ := arg max_{γ ∈ [0,1]} { γ : Σ_{r=0}^{ν̂} C(ν, r) γ^r (1−γ)^{ν−r} ≥ β }    (5.32)

where C(ν, r) denotes the binomial coefficient and 1−β is the required confidence level. Given a candidate solution d(Ptotal), the probability Prob(d) is estimated as ν̂/ν, where ν̂ is the number of times the yield constraint is violated and ν is the number of realizations pz ∈ {p1,…, pg}. Since the outlined procedure involves only the calculation of the quantities ŷ, it can be performed with a large sample size ν and, hence, the

feasibility of d can be evaluated with high reliability, provided that the bound is within realistic assumptions.
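The ν̂/ν estimate and its reliability bound amount to a binomial (Clopper-Pearson) confidence statement on a Monte-Carlo pass/fail count. A hedged sketch; `meets_specs` and `sample_process` are hypothetical placeholders for the spec check and the process-realization sampler:

```python
import numpy as np
from scipy.stats import beta

def yield_lower_bound(n_pass, n_trials, confidence=0.95):
    """One-sided Clopper-Pearson lower bound on the true yield, given
    n_pass spec-compliant circuits out of n_trials sampled realizations."""
    if n_pass == 0:
        return 0.0
    return float(beta.ppf(1.0 - confidence, n_pass, n_trials - n_pass + 1))

def estimate_yield(meets_specs, sample_process, n_trials, confidence=0.95):
    """Monte-Carlo yield: draw process realizations pz, count passes, and
    report both the point estimate and a reliable lower bound."""
    n_pass = sum(bool(meets_specs(sample_process())) for _ in range(n_trials))
    return n_pass / n_trials, yield_lower_bound(n_pass, n_trials, confidence)
```

Because only pass/fail counts are needed, the sample size ν can be made large without changing the cost per realization.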

5.5.2 Power Per Area Optimization

The power optimization problem involves varying the design point to optimize power, subject to constraints on other, secondary performance measures, and to designable parameter boundaries. With the metric PPA, we quantify the minimum power design that meets a targeted performance, while including the impact of area scaling. The PPA metric depends on the process and operating conditions, the circuit specification, and the technology's VT option. We can express this multi-criteria circuit performance optimization problem as

(5.33)

The PPA multi-criteria optimization problem is first translated into a min-max problem [67]

(5.34)

The PPA value, at any design point, is converted into a performance score s and, subsequently, the score s is utilized to compute an overall index of circuit quality, denoted by PPA(d; s), which is the objective function for the design optimization. Accordingly, the constrained multi-criteria optimization is converted into an optimization with a single objective function [67]. As a result, the general form of the optimization problem becomes

(5.35)
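The min-max translation can be illustrated generically with the epigraph reformulation min over (d, t) of t subject to f_i(d) ≤ t; a sketch under the assumption of smooth criteria, not the book's exact (5.34):

```python
import numpy as np
from scipy.optimize import minimize

def minmax_design(objectives, d0):
    """Epigraph trick for min_d max_i f_i(d): minimize the slack t subject
    to f_i(d) <= t for every criterion f_i."""
    d0 = np.asarray(d0, dtype=float)
    x0 = np.append(d0, max(f(d0) for f in objectives))  # [d, t] start point
    cons = [{'type': 'ineq', 'fun': lambda x, f=f: x[-1] - f(x[:-1])}
            for f in objectives]
    res = minimize(lambda x: x[-1], x0, constraints=cons, method='SLSQP')
    return res.x[:-1], res.x[-1]
```

At the optimum the active criteria are balanced, which is exactly the behavior a min-max formulation is meant to enforce.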

To start the optimization, a design metric is initially selected, based on the priority given to the power budget versus the performance function. If we assume that Δ(Ptotal, Ptotal,i) > 0 for i ∈ {1,…, N}, then the score s can be compactly written as a set of nonlinear constraints

(5.36)

where Ψ is a combined feature representation of a performance function in a given application. We replace each nonlinear inequality in (5.36) by |D|−1 linear inequalities

(5.37)

If the system of inequalities in (5.37) is feasible, typically more than one solution d is possible. For a unique solution, we select the d with ||d|| ≤ 1 for which s is uniformly different from the next closest score update. The score update is then expressed as a dual quadratic program (QP)

(5.38)

where η is the step size, α the Lagrange multiplier imposing the constraint for label d ≠ di, and h(d) are the feature vectors of a design variable vector d. To find the local maxima and minima, we repeatedly select a pair of derivatives of d and optimize their dual (Lagrange) variables α. The dual program formulation has two main advantages over the primal QP: since the dual program is determined only by inner products defined by Ψ, it allows the usage of kernel functions, and additionally, the constraint matrix of the dual program supports problem decomposition. At the end of the sequence, we average all the score vectors s obtained at each iteration, similar to the structured perceptron algorithm [65].
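The final averaging step mirrors the averaged structured perceptron of [65]; a toy sketch in which the candidate sets and the feature map are hypothetical stand-ins for the design-score machinery:

```python
import numpy as np

def averaged_structured_perceptron(samples, feature_fn, n_epochs=5):
    """Averaged structured perceptron (cf. [65]): score candidates with w,
    update on mistakes, and return the average of all weight iterates.
    samples: list of (candidate_list, gold_index) pairs (hypothetical)."""
    dim = feature_fn(samples[0][0][0]).shape[0]
    w = np.zeros(dim)
    w_sum = np.zeros(dim)
    n_iter = 0
    for _ in range(n_epochs):
        for candidates, gold in samples:
            scores = [w @ feature_fn(c) for c in candidates]
            pred = int(np.argmax(scores))
            if pred != gold:  # mistake-driven additive update
                w += feature_fn(candidates[gold]) - feature_fn(candidates[pred])
            w_sum += w
            n_iter += 1
    return w_sum / n_iter  # averaged weight vector
```

Averaging the iterates damps the oscillation of the last perceptron update and typically generalizes better than the final weight vector alone.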

5.6 Experimental Results

All the experimental results are carried out on a single-processor Ubuntu Linux 9.10 system with an Intel Core 2 Duo 2.66 GHz processor and 6 GB of memory. The circuit netlist is simulated in Cadence Spectre using 90 nm CMOS model files. The simulation data points are processed with a Perl script and fed back into the MATLAB code. The evaluated front-end neural recording interface is illustrated in Fig. 5.2. The test dataset (Fig. 5.3a) is based on recordings from the human neocortex and basal ganglia; however, the proposed optimization

Fig. 5.2 Schematic of the front-end neural recording interface including LNA, band-pass filter, PGA, and SAR A/D converter

framework is compatible with any Markov process deterministic neuron model. In Fig. 5.3b, we illustrate a statistical voltage trace of a neuron signal composed of a spike burst and biological noise.
The reduction of area for analog designs usually implies a trade-off, of which the most common is an increase in noise. Fortunately, the interface's input equivalent noise voltage decreases as the gain across the amplifying stages increases (Fig. 5.3c); e.g., the ratio of the signal power over the noise variance can be expressed as SNR = σF^2 / [σneural^2 + σelectrode^2 + Σi (Πj Gj)^−1 σamp,i^2], where σF^2 is the total signal power, σamp,i^2 represents the variance of the noise added by the i-th amplification stage with gains Gj, σelectrode^2 is the variance of the electrode noise, and σneural^2 is the variance of the biological neural noise. The lower bound on the speed of the SAR ADC is primarily a function of the technology's gate delay and the kT/C noise, multiplied by the number of SAR cycles necessary for one conversion. The maximum resolution in SNR-bits of a SAR (for a given value of an effective thermal resistance Reff, which sums together the effects of all noise sources, e.g., thermal, shot, 1/f, and input-referred noise) over the full-Nyquist band (0 ≤ fNeuron ≤ fs/2) is then expressed as Nnoise = log2 [ √( VFS^2 / (6kT fs Reff) ) ] − 1, where VFS is the full-scale input signal and fs is the sampling frequency. The accuracy of the neural spike classification in a backend signal processing unit directly increases with the A/D converter resolution, although it saturates beyond 5-6 bit resolution, ultimately limited by the SNR.
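A small numeric sketch of the two expressions above (one reading of the cascade formula, with stage-i noise divided by the cumulative preceding gain; all values illustrative):

```python
import math

k_B = 1.380649e-23  # Boltzmann constant [J/K]

def cascade_snr(sig_var, neural_var, electrode_var, amp_vars, gains):
    """Input-referred SNR of the recording chain: the noise of stage i is
    attenuated by the cumulative gain preceding it."""
    referred = neural_var + electrode_var
    g = 1.0
    for var, gain in zip(amp_vars, gains):
        referred += var / g   # stage-i noise divided by preceding gain
        g *= gain
    return sig_var / referred

def sar_noise_limited_bits(vfs, fs, r_eff, T=300.0):
    """Thermal-noise-limited SAR resolution:
    N = log2( sqrt(VFS^2 / (6 k T fs Reff)) ) - 1."""
    return math.log2(math.sqrt(vfs ** 2 / (6.0 * k_B * T * fs * r_eff))) - 1.0
```

The cascade form makes the text's point explicit: once the first-stage gain is large, later stages add almost nothing to the input-referred noise.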
However, since the amplitude of the observed spike signals can vary, typically, by one order of magnitude, additional resolution (i.e., 2-3 bit) is needed if the amplification gain is fixed. Additionally, increasing the sampling rate of the A/D converter improves the spike sorting accuracy, since this captures finer features further differentiating the signals. The PPA ratio differs for each design depending on circuit characteristics, such as power consumption, bandwidth, gain, linearity, etc. Closed-form symbolic expressions of the constraints and the objective are passed on to the optimization algorithm. Design heuristics are used to provide a good initial starting point. The total run-time of the optimization method is only tens
of seconds, and the number of iterations required to reach the stopping criterion never exceeds 6 throughout the entire simulated range (from 10^−3 to 10^−1).

Fig. 5.3 The test dataset, the y axis is arbitrary; a raw signal after amplification, not corrected for gain, b zoom-in of the raw signal, and c spectral signature of the SAR A/D converter two-tone test; black area: spectral content with nominal gain, gray area: spectra with 20 % gain reduction, equivalent to a 4 LSB loss in the dynamic range (© IEEE 2015)
The design trade-off exploration space for circuit area, sample frequency and
PPA is illustrated in Fig.5.4a. The area and sample frequency curves are plotted
for the worst-case design (WCD), and the proposed quadratic program optimized

approach (QPO).

Fig. 5.4 a Area, sampling frequency, and PPA trade-off for a neural recording channel optimized with quadratic programming (QPO) and worst-case design (WCD). The iso-PPA is shown as an overlay (© IEEE 2015), and b optimized PPA versus relative sampling frequency

The normalized PPA ratio of the design is represented at the
intersection with the area-sample frequency curves. For a given circuit area, the
optimized design obtains higher performance than the corresponding WCD. The
points lying on the lowest intersections are most power efficient for the given input
and output constraints, and represent the PPA curve of interest. With the same
yield constraints, the optimization produces uniformly better optimum signal band-
width curves for a given power. The improvement is determined by the underly-
ing structure of physical process variation. If the amount of uncorrelated variability
increases, i.e., the intra-chip variation increases in comparison with the chip-to-
chip variation, the feasible yield facilitated by optimization increases. Similarly, to
maintain a constant power efficiency as area is reduced, the circuit noise and the
current and voltage efficiencies need to be held constant. The power consumption
of the neural interface front-end increases linearly with sampling frequency.

Fig. 5.5 a Two-stage gm/ID versus constant gain (plain), constant area (plain hyperbolic), and constant current (dashed elliptic) contours, b normalized contours showing optimal power per area (PPA) versus relative gain (© IEEE 2015), and c normalized contours showing optimal power per area (PPA) versus relative area

Normalized contours showing the optimal PPA versus relative sampling frequency, the tolerance box for the design constraints involved, and the tolerance box with the optimal yield in the feasible region are shown in Fig. 5.4b. If the design variable variation can be controlled in such a way that the tolerance box is reduced to that of the inner optimal yield box, the yield increases to 100 %.

Table 5.1 Summary of the algorithm performance with 99 % yield

Design    | Area/channel [mm²]  | PPA              | Ptotal/channel [µW]          | SNR (100 Hz-10 kHz) [dB]
          | WCD      QPO (rel.) | WCD   QPO (rel.) | WCD (slow, nom, fast)   QPO  | WCD (slow, nom, fast)    QPO
LNA       | 0.096    0.86       | 1     0.86       | 7.12, 7.15, 7.16        0.81 | 57.44, 59.65, 61.22      1.18
LPF       | 0.052    0.78       | 1     0.82       | 8.64, 8.84, 8.94        0.74 | 56.23, 57.76, 58.44      1.21
HPF       | 0.066    0.85       | 1     0.84       | 5.47, 5.65, 5.71        0.82 | 55.86, 57.69, 58.55      1.19
PGA       | 0.058    0.91       | 1     0.92       | 9.56, 9.76, 9.82        0.79 | 58.54, 59.34, 60.26      1.23
SARcomp   | 0.036    0.86       | 1     0.91       | 3.14, 3.21, 3.24        0.83 | 55.46, 57.52, 58.21      1.24
SARDAC    | 0.074    0.92       | 1     0.96       | 3.56, 3.69, 3.72        0.87 | 57.21, 59.67, 60.93      1.19
SARlogic  | 0.042    0.81       | 1     0.87       | 4.52, 4.56, 4.57        0.81 | 61.94, 63.21, 64.32      1.25
Total     | 0.424    0.76       | 1     0.81       | 42.01, 42.86, 43.16     0.82 | 54.76, 56.21, 57.48      1.16
Average (relative) | 0.84       | 1     0.87       | 0.81                         | –

The constant power, area, and gain contours for the two gain stages are illustrated in Fig. 5.5a. The total area is shown as the hyperbolic-shaped contour, while the elliptic contours define the total current, IDtotal. A large transistor bias point (gm/ID) corresponds to more current and smaller transistors. Conversely, if we decrease the current, the gain (due to the larger gm/ID) and the total area increase. The plot in Fig. 5.5b illustrates the position of the optimal PPA versus the relative (given) gain. The power consumed in the neural interface gain stages increases proportionally with the gain increase.
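A back-of-the-envelope sketch of the two-stage trade-off; the current densities j1, j2 (amperes per unit width at the chosen gm/ID points) are assumed to come from a device characterization table:

```python
def two_stage_bias(gm1, gm2, gmid1, gmid2, j1, j2):
    """Drain currents and a width-based area proxy for two gain stages,
    from target transconductances and chosen gm/ID operating points."""
    id1, id2 = gm1 / gmid1, gm2 / gmid2   # I_D = gm / (gm/ID)
    total_current = id1 + id2             # elliptic contour quantity
    width_proxy = id1 / j1 + id2 / j2     # W = I_D / J, area proxy
    return total_current, width_proxy
```

Sweeping (gm/ID)1 against (gm/ID)2 with such a helper reproduces the contour picture of Fig. 5.5a: moving toward weaker inversion cuts current but inflates the width-based area proxy.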
Typically, the desired high gm is obtained at the cost of an increased bias current (increased power) or area (wide transistors). However, for very short channels the carrier velocity quickly reaches the saturation limit, at which point gm also saturates, becoming independent of gate length or bias. The intrinsic gain degradation can be alleviated with open-loop residue amplifiers [68], comparator-based switched-capacitor circuits [69], and correlated level shifting [70]. The plot in Fig. 5.5c
illustrates the position of the optimal PPA under the maximum-yield reference design point versus relative area. The offset and the static accuracy critically depend on the matching between nominally identical devices. This error, however, typically decreases as the area of the devices increases. Several rules exist [71] to ensure sufficient matching: the matched devices should have the same structure and surroundings in the layout, use the same materials, have the same orientation and temperature, and the distance between matched devices should be minimal.
In Table 5.1, the worst-case design (WCD) is compared with the optimization approach across the neural interface circuits. The QP-optimized circuits allow an area reduction ranging from 9 to 19 %, with 16 % on average, when designed for the maximum WCD frequency. When operating at the same frequency, the optimized total power is reduced by up to 21 %. The optimization space in symmetrical circuits is restricted and, consequently, the additional power saving obtained by optimization is limited, particularly at higher yield.
For a decreased yield of 95 % instead of 99 %, a higher power saving of up to 32 % on average can be achieved as a consequence of the larger optimization space (not shown in Table 5.1). Note that over-dimensioning in the case of a higher yield leads to a larger area and higher power consumption. As yield increases when tolerance
decreases, an agreeable trade-off needs to exist between the increase in yield and the cost of design and manufacturing. Consequently, continuous observation of process variation and thermal monitoring becomes a necessity [72]. The observed circuit's power consumption scales with its bandwidth and SNR. The limit on the dissipated power can be expressed as (8kT)·f(SNR), where f is an increasing function of the SNR [73]. Additionally, the interface input to the neural system is subject to external noise, which can be represented by an effective temperature. Reducing noise to improve signal processing requires larger numbers of receptors, channels, or neurons, requiring additional power resources [74].
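As a worked example of this scaling, take the often-quoted special case of f proportional to bandwidth times SNR (an assumption; the text leaves f general):

```python
k_B = 1.380649e-23  # Boltzmann constant [J/K]

def analog_power_floor(bandwidth_hz, snr_db, T=300.0):
    """Lower bound 8 k T * B * SNR on the power of an analog stage,
    for signal bandwidth B and SNR given in dB (assumed form of the limit)."""
    snr_lin = 10.0 ** (snr_db / 10.0)
    return 8.0 * k_B * T * bandwidth_hz * snr_lin
```

For a 10 kHz bandwidth at 60 dB SNR this floor is roughly 0.3 nW, orders of magnitude below practical amplifier budgets, which confirms that real neural recording designs sit far above the thermodynamic limit.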

5.7 Conclusions

Integrated neural implants interface with the brain using biocompatible elec-
trodes to provide high yield cell recordings, large channel counts, and access to
spike data and/or field potentials with high signal-to-noise ratio. Rapid advances in
computational capabilities, design tools, and biocompatible electrodes fabrication
techniques allow for the development of neural prostheses capable of interfacing
with single neurons and neuronal networks. The miniaturization of the functional
blocks in neural recording interface, however, presents significant circuit design
challenges in terms of noise, area, power, and the reliability of the recording sys-
tem. In this chapter, we develop a yield constrained sequential PPA minimization
framework that is applied to a multivariable optimization in a neural record-
ing interface. By limiting over-dimensioning of the circuit, the proposed method
achieves a consistently better PPA ratio over the entire range of neural recording interface circuits, with no loss of circuit performance. Our approach can be used
with any variability model and is not restricted to any particular performance
constraint. As the experimental results in 90 nm CMOS technology indicate, the suggested numerical methods provide accurate and efficient solutions of the PPA optimization problem, offering up to 26 % power savings and up to 22 % area reduction without yield penalties.

References

1. G. Buzsaki, Large-scale recording of neuronal ensembles. Nat. Neurosci. 7, 446451 (2004)


2. F.A. Mussa-Ivaldi, L.E. Miller, Brain-machine interfaces: Computational demands and clini-
cal needs meet basic neuroscience. Trends Neurosci. 26(6), 329334 (2003)
3. M. Mollazadeh, K. Murari, G. Cauwenberghs, N. Thakor, Micropower CMOS-integrated
low-noise amplification, filtering, and digitization of multimodal neuropotentials. IEEE
Trans. Biomed. Circ. Syst. 3(1), 110 (2009)
4. A.M. Sodagar etal., An implantable 64-channel wireless microsystem for single-unit neural
recording. IEEE J. Solid-State Circuits 44(9), 25912604 (2009)
5. B.K. Thurgood etal., A wireless integrated circuit for 100-channel charge-balanced neural
stimulation. IEEE Trans. Biomed. Circuits Syst. 3(6), 405414 (2009)
6. S. Kim, R. Normann, R. Harrison, F. Solzbacher, Preliminary study of the thermal impact
of a microelectrode array implanted in the brain, in Proceedings of IEEE International
Conference of Engineering in Medicine and Biology Society (2006), pp. 29862989
7. X. Zou etal., A 100-channel 1-mW implantable neural recording IC. IEEE Trans. Circuits
Syst. I. Regul. Pap. 60(10), 25842596 (2013)
8. C. Chae etal., A 128-channel 6 mw wireless neural recording IC with spike feature extrac-
tion and UWB transmitter. IEEE Trans. Neural Syst. Rehabil. Eng. 17(4), 312321 (2009)
9. R.F. Yazicioglu et al., A 200 µW eight-channel EEG acquisition ASIC for ambulatory EEG systems, in IEEE International Solid-State Circuits Conference Digest of Technical Papers (2008), pp. 164-165
10. J. Lee, H.-G. Rhew, D.R. Kipke, M.P. Flynn, A 64 channel programmable closed-loop neuro-
stimulator with 8 channel neural amplifier and logarithmic ADC. IEEE J. Solid-State Circuits
45(9), 19351945 (2010)
11. X.D. Zou etal., A 1-V 450-nW fully integrated programmable biomedical sensor interface
chip. IEEE J. Solid-State Circuits 44, 10671077 (2009)
12. R. Brodersen etal., Methods for true power minimization, in Proceedings of IEEE
International Conference on Computer-Aided Design (2002), pp. 3542
13. A. Bhavnagarwala, B. Austin, K. Bowman, J.D. Meindl, A minimum total power methodol-
ogy for projecting limits on CMOS GSI. IEEE Trans. Very Large Integration (VLSI) Syst.
8(6), 235251 (2000)
14. G. Yu, P. Li, Yield-aware hierarchical optimization of large analog integrated circuits, in
Proceedings of IEEE International Conference on Computer-Aided Design (2008), pp. 7984
15. F. Schenkel, etal., Mismatch analysis and direct yield optimization by specwise linearization
and feasibility-guided search, in Proceedings of IEEE Design Automation Conference, pp.
858863 (2001)
16. T. Mukherjee, L.R. Carley, R.A. Rutenbar, Efficient handling of operating range and manu-
facturing line variations in analog cell synthesis. IEEE Trans. Comput. Aided Des. Integr.
Circuits Syst. 19(8), 825839 (2000)
17. A. Zjajo, N. van der Meijs, R. van Leuken, Statistical power optimization of deep-submicron
digital CMOS circuits based on structured perceptron, in Proceedings of IEEE International
Conference on Integrated Circuits (2014), pp. 9598

18. S. Seth, B. Murmann, Design and optimization of continuous-time filters using geometric
programming, in Proceedings of IEEE International Symposium on Circuits and Systems
(2014), pp. 20892092
19. A. Zjajo, C. Galuzzi, R. van Leuken, Sequential power per area optimization of multichan-
nel neural recording interface based on dual quadratic programming, in Proceedings of IEEE
International Conference on Neural Engineering (2015), pp. 912
20. M. Grigoriu, On the spectral representation method in simulation. Probab. Eng. Mech. 8,
7590 (1993)
21. M. Loève, Probability Theory (D. Van Nostrand Company Inc., Princeton, 1960)
22. R. Ghanem, P.D. Spanos, Stochastic Finite Element: A Spectral Approach (Springer, Berlin,
1991)
23. P. Friedberg, Y. Cao, J. Cain, R. Wang, J. Rabaey, C. Spanos, Modeling within-die spatial
correlation effects for process-design co-optimization, in Proceedings of IEEE International
Symposium on Quality of Electronic Design (2005), pp. 516521
24. J. Xiong, V. Zolotov, L. He, Robust extraction of spatial correlation, in Proceedings of IEEE
International Symposium on Physical Design (2006), pp. 29
25. A. Hodgkin, A. Huxley, A quantitative description of membrane current and its application to
conduction and excitation in nerve. J. Physiol. 117, 500544 (1952)
26. R.F. Fox, Y.-N. Lu, Emergent collective behavior in large numbers of globally coupled inde-
pendently stochastic ion channels. Phys. Rev. E. 49, 34213431 (1994)
27. A. Saarinen, M.L. Linne, O. Yli-Harja, Stochastic differential equation model for cerebellar
granule cell excitability. PLoS Comput. Biol. 4(2), 111 (2008)
28. A.C. West, J. Newman, Current distributions on recessed electrodes. J. Electrochem. Soc.
138(6), 16201625 (1991)
29. Z. Yang, Q. Zhao, E. Keefer, W. Liu, Noise characterization, modeling, and reduction for in
vivo neural recording, in Advances in Neural Information Processing Systems (2010), pp.
21602168
30. P.R. Gray, R.G. Meyer, Analysis and Design of Analog Integrated Circuits (Wiley, New York,
1984)
31. A. Demir, E. Liu, A. Sangiovanni-Vincentelli, Time-domain non-Monte Carlo noise simu-
lation for nonlinear dynamic circuits with arbitrary excitations, in Proceedings of IEEE
International Conference on Computer-Aided Design (1994), pp. 598603
32. J.H. Fischer, Noise sources and calculation techniques for switched capacitor filters. IEEE J.
Solid-State Circuits 17(4), 742752 (1982)
33. T. Sepke, P. Holloway, C.G. Sodini, H.-S. Lee, Noise analysis for comparator-based circuits.
IEEE Trans. Circuits Syst. I 56(3), 541553 (2009)
34. C. Michael, M. Ismail, Statistical Modeling for Computer-Aided Design of MOS VLSI
Circuits (Kluwer, Boston, 1993)
35. H. Zhang, Y. Zhao, A. Doboli, ALAMO: an improved -space based methodology for
modeling process parameter variations in analog circuits, in Proceedings of IEEE Design,
Automation and Test in Europe Conference (2006), pp. 156161
36. M. Pelgrom, A. Duinmaijer, A. Welbers, Matching properties of MOS transistors. IEEE J.
Solid-State Circuits 24(5), 14331439 (1989)
37. R. López-Ahumada, R. Rodríguez-Macías, FASTEST: a tool for a complete and efficient statistical evaluation of analog circuits, dc analysis, in Analog Integrated Circuits and Signal Processing, vol 29, no 3 (Kluwer Academic Publishers, The Netherlands, 2001), pp. 201-212
38. G. Biagetti, S. Orcioni, C. Turchetti, P. Crippa, M. Alessandrini, SiSMA-a statistical simu-
lator for mismatch analysis of MOS ICs, in Proceedings of IEEE/ACM International
Conference on Computer-Aided Design (2002), pp. 490496
39. B. De Smedt, G. Gielen, WATSON: design space boundary exploration and model generation
for analogue and RF IC design. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 22(2),
213224 (2003)

40. B. Linares-Barranco, T. Serrano-Gotarredona, On an efficient CAD implementation of the distance term in Pelgrom's mismatch model. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 26(8), 1534-1538 (2007)
41. J. Kim, J. Ren, M.A. Horowitz, Stochastic steady-state and ac analyses of mixed-signal sys-
tems, in Proceedings of IEEE Design Automation Conference (2009), pp. 376381
42. A. Zjajo, J. Pineda de Gyvez, Analog automatic test pattern generation for quasi-static struc-
tural test. IEEE Trans. Very Large Scale Integr. VLSI Syst. 17(10), 13831391 (2009)
43. N. Mi, J. Fan, S.X.-D. Tan, Y. Cai, X. Hong, Statistical analysis of on-chip power delivery
networks considering lognormal leakage current variations with spatial correlation. IEEE
Trans. Circuits Syst. I. Regul. Pap. 55(7), 20642075 (2008)
44. E. Felt, S. Zanella, C. Guardiani, A. Sangiovanni-Vincentelli, Hierarchical statistical char-
acterization of mixed-signal circuits using behavioral modeling, in Proceedings of IEEE
International Conference on Computer-Aided Design (1996), pp. 374380
45. J. Vlach, K. Singhal, Computer Methods for Circuit Analysis and Design (Van Nostrand
Reinhold, New York, 1983)
46. L.O. Chua, C.A. Desoer, E.S. Kuh, Linear and Nonlinear Circuits (McGraw-Hill, New York,
1987)
47. L. Arnold, Stochastic Differential Equations: Theory and Application (Wiley, New York,
1974)
48. R. Rohrer, L. Nagel, R.G. Meyer, L. Weber, Computationally efficient electronic-circuit noise
calculations. IEEE J. Solid-State Circuits 6, 204213 (1971)
49. C.D. Hull, R.G. Meyer, A systematic approach to the analysis of noise in mixers. IEEE
Trans. Circuits Syst. I. Regul. Pap. 40, 909919 (1993)
50. M. Okumura, H. Tanimoto, T. Itakura, T. Sugawara, Numerical noise analysis for nonlinear
circuits with a periodic large signal excitation including cyclostationary noise sources. IEEE
Trans. Circuits Syst. I. Regul. Pap. 40, 581590 (1993). Sept
51. P. Bolcato, R. Poujois, A new approach for noise simulation in transient analysis, in
Proceedings of IEEE International Symposium on Circuits and Systems (1992)
52. J.-M. Sanz-Serna, Numerical ordinary differential equations versus dynamical systems,
in The Dynamics of Numerics and the Numerics of Dynamics, ed. by D.S. Broomhead, A.
Iserles (Clarendon Press, Oxford, 1992)
53. A. Sangiovanni-Vincentelli, Circuit simulation. in Computer Design Aids for VLSI Circuits
(Sijthoff and Noordhoff, The Netherlands, 1980)
54. P. Heydari, M. Pedram, Model-order reduction using variational balanced truncation with
spectral shaping. IEEE Trans. Circuits Syst. I. Regul. Pap. 53(4), 879891 (2006)
55. M. Di Marco, M. Forti, M. Grazzini, P. Nistri, L. Pancioni, Lyapunov method and conver-
gence of the full-range model of CNNs. IEEE Trans. Circuits Syst. I. Regul. Pap. 55(11),
35283541 (2008)
56. K.H. Lim, K.P. Seng, L.-M. Ang, S.W. Chin, Lyapunov theory-based multilayered neural net-
work. IEEE Trans. Circuits Syst. II Express Briefs 56(4), 305309 (2009)
57. X. Liu, Stability analysis of switched positive systems: a switched linear copositive
Lyapunov function method. IEEE Trans. Circuits Syst. II Express Briefs 56(5), 414418
(2009)
58. R.H. Bartels, G.W. Stewart, Solution of the matrix equation AX+XB=C. Commun.
Assoc. Comput. Mach. 15, 820826 (1972)
59. N.J. Higham, Perturbation theory and backward error for AX−XB=C. BIT Numer. Math.
33, 124136 (1993)
60. T. Penzl, Numerical solution of generalized Lyapunov equations. Adv. Comput. Math. 8,
33–48 (1998)
61. G.H. Golub, C.F. van Loan, Matrix Computations (Johns Hopkins University Press,
Baltimore, 1996)
62. I. Jaimoukha, E. Kasenally, Krylov subspace methods for solving large Lyapunov equations.
SIAM J. Numer. Anal. 31, 227–251 (1994)

63. E. Wachspress, Iterative solution of the Lyapunov matrix equation. Appl. Math. Lett. 1,
87–90 (1988)
64. J. Li, F. Wang, J. White, An efficient Lyapunov equation-based approach for generat-
ing reduced-order models of interconnect, in Proceedings of IEEE Design Automation
Conference (1999), pp. 1–6
65. Y. Freund, R.E. Schapire, Large margin classification using the perceptron algorithm. Mach.
Learn. 37, 277–296 (1999)
66. I. Tsochantaridis, T. Hofmann, T. Joachims, Y. Altun, Support vector machine learning for
interdependent and structured output spaces, in International Conference on Machine
Learning (2004), pp. 1–8
67. A. Dharchoudbury, S.M. Kang, Worst-case analysis and optimization of VLSI circuits perfor-
mances. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 14(4), 481–492 (1995)
68. B. Murmann, B.E. Boser, A 12-bit 75-MS/s pipelined ADC using open-loop residue amplifi-
cation. IEEE J. Solid-State Circuits 38(12), 2040–2050 (2003)
69. T. Sepke et al., Comparator-based switched-capacitor circuits for scaled CMOS technologies,
in IEEE International Solid-State Circuit Conference Digest of Technical Papers (2006), pp.
220–221
70. B.R. Gregoire, U.-K. Moon, An over-60dB true rail-to-rail performance using correlated
level shifting and an opamp with 30dB loop gain, in IEEE International Solid-State Circuit
Conference Digest of Technical Papers (2008), pp. 540–541
71. A. Zjajo, J. Pineda de Gyvez, Low-Power High-Resolution Analog to Digital Converters
(Springer, New York, 2011)
72. A. Zjajo, M.J. Barragan, J. Pineda de Gyvez, Low-power die-level process variation and tem-
perature monitors for yield analysis and optimization in deep-submicron CMOS. IEEE Trans.
Instrum. Meas. 61(8), 2212–2221 (2012)
73. E.A. Vittoz, Future of analog in the VLSI environment, in Proceedings of IEEE International
Symposium on Circuits and Systems (1990), pp. 1372–1375
74. J.E. Niven, S.B. Laughlin, Energy limitation as a selective pressure on the evolution of sen-
sory systems. J. Exp. Biol. 211(11), 1792–1804 (2008)
Chapter 6
Conclusions

Abstract  Healthcare and health-assisting devices, as well as the medical care
enabled by these devices, will provide an unprecedented level of care during each
person's life. Continuous monitoring of physiological parameters (e.g., the
monitoring of stress and emotion, personal psychological analysis) enabled by
brain–machine interface circuits is not only beneficial for chronic diseases, but
also for detecting the onset of a medical condition and applying preventive or
therapeutic measures. Long-term data collection also assists a more exact diagnosis.
For non-chronic illnesses, it can assist the rehabilitation of patients. It is expected
that these new biomedical devices will be able to enhance our sensing ability, and can
also provide prosthetic functions (e.g., cochlear implants, artificial retina, motor
functions). In this book, this problem is addressed at various abstraction levels,
i.e., the circuit level and the system level. The book therefore provides a broad view
on the various solutions that have to be used and their possible combination in very
effective complementary techniques.

6.1 Summary of the Results

Continuous monitoring of physiological parameters (e.g., the monitoring of stress


and emotion, personal psychological analysis) enabled by brain–machine interface
circuits is not only beneficial for chronic diseases, but also for detecting the onset
of a medical condition and applying preventive or therapeutic measures. It is expected
that the combination of ultralow-power sensor and ultralow-power wireless
communication technology will enable new biomedical devices that will be able to
enhance our sensing ability, and can also provide prosthetic functions (e.g., coch-
lear implants, artificial retina, motor functions). Minimally invasive monitoring
of the electrical activity of specific brain areas using implantable microsystems
offers the promise of diagnosing brain diseases, as well as detecting and identi-
fying neural patterns which are specific to behavioral phenomena. Practical
multi-channel BMI systems are combined with CMOS electronics for long-term

© Springer International Publishing Switzerland 2016
A. Zjajo, Brain-Machine Interface, DOI 10.1007/978-3-319-31541-6_6

and reliable recording and conditioning of intracortical neural signals, on-chip


processing of the recorded neural data, and stimulating the nervous system in a
closed-loop framework. To avoid the risk of infection, these systems are implanted
under the skin, while the recorded neural signals and the power required for the
implant's operation are transmitted wirelessly. This migration, which places the
circuitry in close proximity to the electrodes, together with the increasing density
of multi-channel electrode arrays, nevertheless creates significant design challenges
with respect to circuit miniaturization and power dissipation reduction of the
recording system.
Furthermore, the space to host the system is restricted to ensure minimal tissue
damage and tissue displacement during implantation.
In this book, this design problem is addressed at various abstraction levels,
i.e., circuit level and system level. It therefore provides a broad view on the vari-
ous solutions that have to be used and their possible combination in very effective
complementary techniques. Technology scaling, circuit topologies, architecture
trends, (post-silicon) circuit optimization algorithms, and a yield-constrained
power-per-area minimization framework specifically target the power–performance trade-off,
from the spatial resolution (i.e., number of channels), feasible wireless data band-
width and information quality to the delivered power of implantable batteries.
The limited total power budget imposes strict specifications on the circuit
design of the low-noise analog front-end and high-speed circuits in the wide-
band wireless link, which transmits the recorded data to a base station located
outside the skull. The design constraints are more pronounced when the number
of recording sites increases to several hundred for typical multielectrode arrays.
As described in Chap. 2, front-end neural amplifiers are crucial building blocks
in implantable cortical microsystems. Low-power and low-noise operation, stable
dc interface with the sensors (microprobes), and small silicon area are the main
design specifications of these amplifiers. The power dissipation is dictated by
the tolerable input-referred thermal noise of the amplifier, where the trade-off is
expressed in terms of noise efficiency factor. For an ideal thermal-noise-limited
amplifier with a constant bandwidth and supply voltage, the power of the ampli-
fier scales as 1/v2n where vn is the input-referred noise of the amplifier. This rela-
tionship shows the steep power cost of achieving low-noise performance in an
amplifier. We introduce a novel, low-power neural recording interface system
with a capacitive feedback low-noise amplifier and a capacitive attenuation band-
pass filter. The capacitive feedback amplifier offers a low-offset, low-distortion
solution with an optimal power–noise trade-off. Similarly, the capacitive attenua-
tion band-pass filter provides wide tuning range and low-power realization, while
allowing simple extension of the transconductor's linear range, and consequently,
ensuring low harmonic distortion. The low noise amplifier and band-pass filter
circuit are realized in a 65 nm CMOS technology, and consume 1.15 µW and
390 nW, respectively. The fully differential low-noise amplifier achieves 40 dB
closed-loop gain, and occupies an area of 0.04 mm². The input-referred noise is
3.1 µVrms over the 0.1–20 kHz operating bandwidth. Distortion is below 2% total
harmonic distortion (THD) for typical extracellular neural signals (smaller than
10 mV peak-to-peak). The capacitive attenuation band-pass filter with first-order

slopes achieves 65 dB dynamic range, 210 mVrms at 2% THD, and 140 µVrms
total integrated output noise.
For any portable or implantable device, microelectrode arrays require min-
iature electronics locally to amplify the weak neural signals, filter out noise and
out-of-band interference, and digitize the signals for transmission. Single-channel
or multi-channel integrated neural amplifiers and A/D converters provide the
front-line interface between the recording electrodes and the signal conditioning
circuits, and thus face critical performance requirements. In Chap. 3, we present
voltage-, current-, and time-domain analog-to-digital converters, and we evaluate
the trade-off between noise, speed, and power dissipation and characterize the
noise fluctuations at a
circuit-architecture level. This approach provides key insight required to address
SNR, response time, and linearity of the physical electronic interface. The presented
voltage-domain SAR A/D converter combines the functionalities of a programmable-gain
stage and analog-to-digital conversion, occupies an area of 0.028 mm²,
and consumes 1.1 µW of power at a 100 kS/s sampling rate. The power consumption
of the current-mode SAR ADC scales with the input current level, making
the current-mode A/D converter suitable for low-energy signals and achieving a
figure of merit of 14 fJ/conversion-step and a THD of 63.4 dB at a 40 kS/s sampling
frequency. The circuit consumes only 0.37 µW, and occupies an area of 0.012 mm²
in a 65 nm CMOS technology. A time-based A/D converter consumes less than
2.7 µW of power when operating at a 640 kS/s sampling frequency. With 6.2 fJ/conversion-step,
the circuit realized in 90 nm CMOS technology exhibits one of the
best FoM reported, and occupies an estimated area of only 0.022 mm².
Recording electrodes implanted into relevant cortical regions frequently pick up
action potentials from multiple surrounding neurons (e.g., due to the background
activity of other neurons, slight perturbations in electrode position, or external
electrical or mechanical interference). Consequently, the recorded
waveforms/spikes consist of the superimposed potentials fired from these neurons.
The ability to distinguish spikes from noise, and to distinguish spikes from differ-
ent sources from the superimposed waveform, therefore depends on both the dis-
crepancies between the noise-free spikes from each source and the signal-to-noise
level in the recording system. In Chap. 4, we present a 128-channel, programmable,
neural spike classifier based on nonlinear energy operator spike detection, and mul-
ticlass kernel support vector machine classification that is able to accurately identify
overlapping neural spikes even for low SNR. For efficient algorithm execution, we
transform the multiclass problem with Kesler's construction and extend the iterative
greedy optimization reduced set vectors approach with a cascaded method. The
power-efficient, multi-channel clustering is achieved by a combination of several
algorithm and circuit techniques, namely Kesler's transformation, a boosted-cascade
reduced set vectors approach, two-stage pipelined processing units, power-scalable
kernels, a register-bank memory, high-VT devices, and a near-threshold
supply.
supply. The results obtained in a 65nm CMOS technology show that an efficient,
large-scale neural spike data classification can be obtained with low power (less
than 41 µW, corresponding to a power density of 15.5 µW/mm²), in a compact,
low-resource-usage structure (31k logic gates resulting in a 2.64 mm² area).

System optimization, architecture trends, technology scaling, circuit topolo-


gies, and (post-silicon) circuit optimization algorithms specifically target the
power–performance trade-off, from the spatial resolution (i.e., number of channels),
feasible wireless data bandwidth, and information quality to the delivered power of
implantable batteries. In Chap. 5, we develop a yield-constrained sequential power-
per-area (PPA) minimization framework based on a dual quadratic program, which is
applied to multivariable optimization in neural interface design under bounded
process variation influences. In the proposed algorithm, we create a sequence of
minimizations of the feasible PPA regions with iteratively generated low-dimen-
sional subspaces, while accounting for the impact of area scaling. With a two-step
estimation flow, the constrained multi-criteria optimization is converted into an
optimization with a single objective function, and repeated estimation of noncritical
solutions is avoided. Consequently, the yield constraint is only active as the
optimization concludes, eliminating the problem of overdesign in the worst-case
approach. The PPA assignment is interleaved, at any design point, with the config-
uration selection, which optimally redistributes the overall index of circuit quality
to minimize the total PPA ratio. The proposed method can be used with any varia-
bility model and, subsequently, any correlation model, and is not restricted by any
particular performance constraint. The experimental results, obtained on the mul-
tichannel neural recording interface circuits implemented in 90 nm CMOS technology,
demonstrate power savings of up to 26% and area savings of up to 22%, without
yield penalty.

6.2 Recommendations and Future Research

The best way to predict the future is to invent it. Medicine in the twentieth century
relied primarily on pharmaceuticals that could chemically alter the action of neu-
rons or other cells in the body, but twenty-first century health care may be defined
more by electroceuticals: novel treatments that will use pulses of electricity to
regulate the activity of neurons, or devices that interface directly with our nerves.
Systems such as the brain–machine interface detect the voltage changes in the brain
that occur when neurons fire to trigger a thought or an action, and they translate
those signals into digital information that is conveyed to a machine, e.g., a prosthetic
limb, a speech prosthesis, or a wheelchair.
To help accomplish specific tasks, a hybrid BMI could be built that combines
brain signals with input from other sensors. Sensors exist, or are in the works, that
can observe eye movement, breath, sweat, gaze, facial expressions, heart rate,
muscle movements, and sleep patterns, as well as the ambient temperature and air
quality. For example, an eye-tracking sensor follows the subject's gaze to locate
the target object, and an ECoG sensor records brain activity while the subject reaches
toward that target. A computer analyzes the brain activity associated with the
subject's arm movement and sends a command to a robotic arm; with the help of a
depth sensor, the arm reaches out and grabs the object. If a prosthetic limb has

sensors that register when it touches an object, it could in principle send that
sensory feedback to a patient by stimulating the brain through the ECoG electrodes.
Consequently, a two-way communication between brain and prosthesis can be
used to help a user deftly control the limb.
What would it take to build a hybrid BMI? First, we need to improve our
recording hardware. Today's systems use only a few dozen electrodes on the cor-
tex; clearly, a much higher density of electrodes would produce a better signal.
We need a suite of sensors, possibly with a wearable gadget/clothing that moni-
tors, stimulates, and collects the data. To decipher neural activity not just in one
area but across large regions of the brain, signal analysis needs to improve. We
will need better spatial and temporal resolution to determine the exact sequence in
which groups of neurons across the cortex fire to produce a command or a thought.
Finally, and most importantly, we need novel circuit- to system-level tech-
niques to enhance the power efficiency of autonomous BMI systems and wire-
less sensors to ensure continued performance enhancements under a tight power
budget. Dramatic improvements in power efficiency can be obtained through sev-
eral principles:
• electronics is moving toward increasingly complex systems: meaningful circuit
solutions need to fit a system concept first;
• power efficiency comes from synergy: working cooperatively across levels of
abstraction leads to benefits that are largely greater than the sum of the single
benefits;
• exploring alternative signal processing circuits, e.g., time-based or current-based
processing, for power-efficient solutions; using digitally assisted analog circuit
and analog-assisted digital circuit techniques;
• power is a valuable currency, and needs to be continuously traded off against other
available commodities (performance, sample rate, resolution, signal quality, …);
• power needs to be truly scalable across voltage and time-varying specifications:
every time we can give up something, power needs to benefit from it;
• using power-efficient machine-learning techniques to recognize certain general
states of mind from EEG or ECoG recordings; using power-scalable kernels for
the classification of neural spikes;
• emerging technologies are a significant source of inspiration to look at the
future, and to learn new ways to use what exists; circuit and system integration
with emerging and post-CMOS technologies (TFET, SymFET, BiSFET);
• understanding, or at least measuring, are powerful tools to increase power efficiency
by avoiding pessimism and reducing design margin.
Additional design challenges posed by the increased system integration of a multi-
physical-domain hybrid bioelectronic interface, where not only analog and digital
electronics are integrated, but also mechanical, chemical, optical, and thermal
sensors become an integral part of the embedded system, need to be addressed
as well. The creation of a unified design environment, where the system definition
and its design partitioning across the different physical domains can be analyzed
and verified, remains a priority. In addition, non-functional constraints that

usually have particular impact on the successful operation of microelectronic sys-


tems, such as the power consumption, the die size, the reusability, etc., need to be
addressed. One of the main challenges is the codesign of the sensor/
stimulus component, the electronic subsystem, and the signal processing elements
together to analyze the interaction between the biological, chemical, electronic
and mechanical domain, and to understand and to optimize the integrated sys-
tem, as well as tight interaction in terms of control, calibration, and configurabil-
ity between the multi-domain subsystems. This new dimension of complexity in
multi-domain physical systems requires a global modeling, simulation and veri-
fication strategy, in which the design methodology and modeling approach for
multi-domain design should be revised. Although existing commercial tools offer
a modeling environment in the separate digital electronics and software domains,
there are severe limitations to extend these tools with new simulators or models of
computation to create multi-domain virtual prototypes.
Appendix

A.1. Power–Noise Amplifier Trade-off

The thermal current noise source in a MOS transistor can be modeled as

$\overline{i_{n,T}^2} = 4kT\gamma g_m$    (A.1)

where k is the Boltzmann constant, T is the absolute temperature, γ is the thermal
noise coefficient, κ is the subthreshold gate coupling coefficient (typically 0.6–0.7),
and gm is the transconductance. The thermal noise coefficient γ depends on the
effective mobility and channel length modulation [1]; it is 2/3 for older technology
nodes and between 0.6 and 1.3 for submicron technologies [2] in strong inversion,
and 1/(2κ) in weak inversion.
The thermal current noise in a resistor can be expressed as

$\overline{i_{n,R}^2} = \frac{4kT}{R}$    (A.2)
The input-referred thermal noise of the (single-transistor, common-source) amplifier
with a resistive load can be calculated as the output noise divided by the gain of
the amplifier

$\overline{v_{n,i}^2} = \frac{1}{g_m^2}\left(4kT\gamma g_m + \frac{4kT}{R}\right) = \frac{4kT\gamma}{g_m} + \frac{4kT}{g_m^2 R} \approx \frac{4kT\gamma}{g_m}$    (A.3)

assuming that gmR, the gain of the amplifier, is much greater than 1/γ, so that
1/(gmR) is negligible compared to γ when the amplifier has a high enough gain. The
total input-referred thermal noise of the amplifier can be calculated by integrating
the noise over the entire frequency range to be

$V_{rms,ni} = \sqrt{\frac{4kT\gamma}{g_m}\cdot\frac{\pi}{2}\cdot\frac{1}{2\pi RC}} = \sqrt{\frac{kT\gamma}{g_m RC}}$    (A.4)


In weak inversion, where a MOS transistor achieves its maximum gm/ID (ID being
the drain current of the transistor), we have γ = 1/(2κ) and gm = κItot/UT, where Itot
is the total current of the common-source amplifier. Consequently, we can express
the total input-referred thermal noise of the common-source amplifier with the
transistor operating in weak inversion as

$V_{rms,ni} = \frac{1}{\kappa}\sqrt{\frac{U_T kT}{2 I_{tot} RC}}$    (A.5)

Since the total power consumption is P = ItotVDD, we can express the total power
consumption of the amplifier as a function of the input-referred thermal noise as [3]

$P = \frac{1}{V_{rms,ni}^2}\cdot\frac{U_T kT}{2RC}\cdot\frac{V_{DD}}{\kappa^2}$    (A.6)

The previous equation illustrates the trade-off between the power consumption and
the total input-referred thermal noise of a subthreshold amplifier for a given supply
voltage and bandwidth (denoted by the RC product in this case). To reduce the
input-referred thermal noise by a factor of 2, the total power consumption must be
increased by a factor of 4. This relationship shows the steep power cost of achieving
low-noise performance in a thermal-noise-limited amplifier, even without taking
flicker noise into account.
The power–noise trade-off in the amplifier is aggravated if the transistor is operating
in strong inversion. In strong inversion, the transconductance gm is proportional
to √Itot. As a result, the total power consumption scales as 1/V⁴ni instead of
1/V²ni as in the subthreshold case.
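The trade-off of Eq. (A.6) can be illustrated numerically. The sketch below is a minimal example, with room-temperature constants and assumed values for κ, VDD, and the RC product (none of these values come from the text):

```python
import math

k = 1.380649e-23          # Boltzmann constant, J/K
T = 300.0                 # absolute temperature, K
UT = k * T / 1.602e-19    # thermal voltage kT/q, ~25.9 mV
kappa = 0.7               # subthreshold gate coupling coefficient (assumed)
VDD = 1.0                 # supply voltage, V (assumed)
RC = 1.0 / (2 * math.pi * 10e3)  # RC product for a ~10 kHz bandwidth (assumed)

def amp_power(v_rms_ni):
    """Total power of a subthreshold common-source amplifier, Eq. (A.6)."""
    return (1.0 / v_rms_ni**2) * (UT * k * T / (2 * RC)) * (VDD / kappa**2)

# Halving the input-referred noise quadruples the power:
p_5uV = amp_power(5e-6)
p_2p5uV = amp_power(2.5e-6)
print(p_2p5uV / p_5uV)  # -> 4.0
```

The 1/V²rms,ni dependence is what makes low-noise subthreshold front-ends so power-hungry.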

A.2. Power in Signal Conditioning Circuit

The minimum power consumption of an LNA is dictated by the input-referred
noise voltage (Vrms,in), and can be calculated as [4]

$P_{LNA} = V_{DD} I_{LNA} = V_{DD}\,\frac{(NEF)^2\, 4kT\, U_T\, BW_{LNA}}{V_{rms,in}^2}\cdot\frac{\pi}{2}$    (A.7)

where VDD is the supply voltage, k is the Boltzmann constant, T is the temperature
in Kelvins, BWLNA = fLP − fHP is the 3-dB bandwidth of the LNA, fLP and fHP are
the low-pass and high-pass corner frequencies, respectively, UT is the thermal
voltage (kT/q), and the noise efficiency factor NEF is defined as [3]

$NEF = V_{rms,in}\sqrt{\frac{2 I_{LNA}}{\pi\, 4kT\, U_T\, BW_{LNA}}}$    (A.8)

The total LNA output noise voltage should be less than the ADC quantization
noise

$G_{LNA}^2\, G_{PGA}^2\, V_{rms,in}^2 \le \frac{1}{12}LSB^2 = \frac{1}{12}\left(\frac{V_{DD}}{2^n}\right)^2$    (A.9)

where GLNA is the gain of the LNA, GPGA is the gain of the programmable gain
amplifier, LSB is the ADC least significant bit voltage value, and n is the resolu-
tion of the A/D converter. Combining (A.7) and (A.9), the minimum LNA power
consumption is expressed as

$P_{LNA} \ge \frac{24\pi\, 2^{2n}\, kT\, U_T\, BW_{LNA}}{V_{DD}}\, G_{LNA}^2\, G_{PGA}^2\, (NEF)^2$    (A.10)
The PGA drives the following ADC and must meet a slew-rate constraint. By setting
the time constant τ = tslew, where tslew = 1/(2fs) is the maximum allowable time
for slewing, the minimum required biasing current of the PGA (IPGA,slew = gmVeff) is

$I_{PGA,slew} = \frac{C_{L,PGA}\, G_{PGA}\, V_{eff}}{t_{slew}}$    (A.11)

where CL,PGA is the load capacitance of the PGA, Veff is the voltage swing of the A/D
converter, and fs is the sampling rate for one recording channel. Consequently, the
power consumption of the PGA is [4]

$P_{PGA} = 2 f_s\, C_{L,PGA}\, G_{PGA}\, V_{eff}\, V_{DD}$    (A.12)
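Equations (A.10) and (A.12) can be evaluated directly. The following sketch uses assumed, illustrative values for the resolution, bandwidth, gains, NEF, and PGA load (none are from the text):

```python
import math

k, T = 1.380649e-23, 300.0   # Boltzmann constant, temperature
UT = k * T / 1.602e-19       # thermal voltage kT/q
VDD = 1.0                    # supply voltage, V (assumed)

def lna_power_min(n, BW, G_lna, G_pga, NEF):
    """Minimum LNA power for noise below ADC quantization noise, Eq. (A.10)."""
    return (24 * math.pi * 2**(2 * n) * k * T * UT * BW / VDD
            * G_lna**2 * G_pga**2 * NEF**2)

def pga_power(fs, C_load, G_pga, V_eff):
    """PGA power under the slew-rate constraint, Eq. (A.12)."""
    return 2 * fs * C_load * G_pga * V_eff * VDD

# Illustrative 8-bit, 10 kHz-bandwidth channel (assumed values):
P_lna = lna_power_min(n=8, BW=10e3, G_lna=100.0, G_pga=4.0, NEF=2.5)
P_pga = pga_power(fs=100e3, C_load=1e-12, G_pga=4.0, V_eff=0.5)
```

Note the quadratic dependence on NEF: doubling the NEF quadruples the minimum LNA power.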

A.3. Power in Signal Quantization Circuit

In a sample-and-hold circuit, the sampling capacitor CS is typically chosen large
enough such that the sampling noise is comparable to, or at least not significantly
larger than, the converter's quantization noise. Assuming that the sampling noise
is designed to be equal to the quantization noise leads to the following minimum
value of CS

$C_S = 12kT\,\frac{2^{2n}}{V_{FS}^2}$    (A.13)

To charge this capacitor to VFS within one half period of the sampling frequency
fS, we need a current of I = 2fSCSVFS. Assuming that we have an ideal amplifier,
driving the capacitor leads to a minimum supply current for that amplifier. Further
assuming that the supply voltage of the amplifier is equal to VFS, we arrive at a
power dissipation of IVFS for the amplifier and, therefore, for the sampling process.
Combining these relationships gives a lower bound for the sampling power

$P_{SH} = 24kT\, f_S\, 2^{2n}$    (A.14)
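The bound of Eq. (A.14) depends only on resolution and sampling rate, which makes its scaling easy to check numerically; a minimal sketch:

```python
k, T = 1.380649e-23, 300.0  # Boltzmann constant, temperature

def sampling_power_bound(n_bits, fs):
    """Lower bound on sample-and-hold power, Eq. (A.14): 24*k*T*fs*2^(2n)."""
    return 24 * k * T * fs * 2**(2 * n_bits)

# The bound quadruples for every additional bit of resolution:
ratio = sampling_power_bound(9, 100e3) / sampling_power_bound(8, 100e3)
print(ratio)  # -> 4.0
```

This is the classic 4x-per-bit cost of Nyquist sampling in the thermal-noise limit.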

In the binary search algorithm, n steps are needed to complete one conversion, as
the DAC output gradually approaches the input voltage. The difference between the
input voltage and the DAC output at the i-th step can be expressed as

$\Delta V_I(i) = \left|V_{in} - \left(D_{n-1}\frac{V_{ref}}{2} + \cdots + D_{n-i}\frac{V_{ref}}{2^i}\right)\right|, \quad 1 \le i \le n$    (A.15)

where ΔVI is the input voltage difference, Vref is the reference voltage, and Dn is the
digital representation of the n-bit code. The comparator must resolve the output
digital code of the sub-ADC, converted into a voltage by the DAC in the transfer
phase, within the decision time td. Subsequently, the output voltage difference
required to make the comparison in the latch-based comparator can be expressed as

$\Delta V_{out} = A_V\, \Delta V_I\, \exp(t_d/\tau)$    (A.16)
where AV acts as a gain factor from the input to the initial imbalance of the latch
decision stage, τ = CL,comp/gm, and CL,comp and gm are the output load and
transconductance of the comparator, respectively. Assuming td = 1/(2nfs), the required gm is

$g_{m,comp} = \frac{C_{L,comp}}{t_d}\cdot\frac{1}{n}\sum_{K=1}^{n}\ln\left(\frac{V_{DD}}{A_V (V_{ref}/2^K)}\right) \approx 2 n f_s\, C_{L,comp}\left(\ln\frac{V_{DD}}{A_V V_{ref}} + \frac{n}{2}\ln 2\right)$    (A.17)
To identify the minimum power limit of the comparator, it is noted that its total
input-referred noise voltage has a fundamental kT/C limitation given by

$\overline{V_n^2} = 4\,\frac{kT}{C_{L,comp}}$    (A.18)

Equating the previous equation with the quantization noise $V_{FS}^2/(12\cdot 2^{2n})$ gives the
minimum capacitive load of the comparator

$C_{L,comp} = 48kT\,\frac{2^{2n}}{V_{FS}^2}$    (A.19)

where VFS is the full-scale voltage range. Substituting (A.19) in (A.17), the minimum
gm,comp and Icomp = gm,compVeff can be found. The power consumption of the
comparator is [4]

$P_{comp} = 96 n f_s\, kT\,\frac{2^{2n}}{V_{FS}^2}\, V_{eff}\, V_{DD}\left(\ln\frac{V_{DD}}{A_V V_{ref}} + \frac{n}{2}\ln 2\right)$    (A.20)

Driving the SAR logic capacitance within the sampling phase requires a current of
Ilogic = (ClogicVFS)/ts, which leads to the following minimum limit for the logic power

$P_{logic} = \alpha\, n f_s\, C_{logic}\, V_{DD}^2$    (A.21)
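Equations (A.20) and (A.21) can be evaluated as follows; the parameter values in the sketch (resolution, sampling rate, latch gain, logic activity, and capacitance) are illustrative assumptions, not values from the text:

```python
import math

k, T = 1.380649e-23, 300.0  # Boltzmann constant, temperature

def comparator_power(n, fs, VFS, Veff, VDD, AV, Vref):
    """Minimum latch-based comparator power of a SAR ADC, Eq. (A.20)."""
    return (96 * n * fs * k * T * 2**(2 * n) / VFS**2 * Veff * VDD
            * (math.log(VDD / (AV * Vref)) + (n / 2) * math.log(2)))

def logic_power(alpha, n, fs, C_logic, VDD):
    """Minimum SAR logic power, Eq. (A.21)."""
    return alpha * n * fs * C_logic * VDD**2

# Illustrative 8-bit, 100 kS/s converter (assumed values):
P_comp = comparator_power(n=8, fs=100e3, VFS=1.0, Veff=0.2, VDD=1.0, AV=8.0, Vref=1.0)
P_logic = logic_power(alpha=0.5, n=8, fs=100e3, C_logic=100e-15, VDD=1.0)
```

The 2^(2n) factor in (A.20) shows that, like the sample-and-hold, the comparator's noise-limited power grows by 4x per bit of resolution.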

where α is the total activity factor of the SAR logic. In a binary-weighted capacitor
array, each capacitor is realized as a multiple of a unit capacitor CU. The power
consumption of the DAC depends on the unit capacitance, the input signal swing, and the
employed switching approach. For a uniformly distributed input signal between
ground and the reference voltage, the average switching power per conversion for
n bits can be derived as [5, 6]

$P_{DAC} = \sum_{i=1}^{n} 2^{n+1-2i}\,(2^i - 1)\, C_U\, V_{ref}^2\, f_s$    (A.22)

The unit capacitor CU is usually determined by thermal noise and capacitor
mismatch. The thermal noise resulting from the sampling action of the input voltage
is given by kT/(2^n CU). In a Nyquist ADC, CU should be large enough so that this
thermal noise is less than the converter's quantization noise

$C_{U,n} = 12kT\,\frac{2^{n}}{V_{FS}^2}$    (A.23)

In mismatch-limited designs, a lower bound for the unit capacitor is

$C_{U,n} = 3\sigma_{max}\,(2^n - 1)\, K_\sigma^2\, K_C$    (A.24)

where Kσ is the mismatch parameter, KC is the capacitor density, and σmax is the
worst-case DNL variance.
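The DAC switching energy of Eq. (A.22) and the noise-limited unit capacitor of Eq. (A.23) can be computed directly; the sketch below uses an assumed 8-bit case with an assumed unit capacitance:

```python
k, T = 1.380649e-23, 300.0  # Boltzmann constant, temperature

def dac_switching_power(n, CU, Vref, fs):
    """Average switching power of a binary-weighted capacitive DAC, Eq. (A.22)."""
    return sum(2**(n + 1 - 2 * i) * (2**i - 1) * CU * Vref**2
               for i in range(1, n + 1)) * fs

def cu_noise_limited(n, VFS):
    """Thermal-noise lower bound on the unit capacitor, Eq. (A.23)."""
    return 12 * k * T * 2**n / VFS**2

# Illustrative 8-bit DAC with a 10 fF unit capacitor at 100 kS/s (assumed):
P_dac = dac_switching_power(n=8, CU=10e-15, Vref=1.0, fs=100e3)
CU_min = cu_noise_limited(n=8, VFS=1.0)
```

Since (A.22) is linear in CU, shrinking the unit capacitor down to the larger of the two bounds, (A.23) or (A.24), directly minimizes the DAC switching power.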

A.4. Noise Analysis of Programmable Gain SAR A/D


Converter

The input-referred noise vn (and the total integrated output noise as well) still takes
the form of kT/C with some correction factor α1,

$\overline{v_n^2} = \alpha_1\,\frac{kT}{C_4}$    (A.25)

A fundamental technique to reduce the noise level, or to increase the signal-to-noise


ratio of a programmable gain ADC, is to increase the size of the sampling capaci-
tors, by over-sampling or with calibration. However, for a fixed input bandwidth
specification, the penalty associated with these techniques is the increased power
consumption. Consequently, a fundamental trade-off exists between noise, speed,
and power dissipation. During the acquisition process, kT/C noise is sampled on the
capacitors C4 along with the input signal. To determine the total noise charge
sampled onto the capacitor network, the noise charge Qns is integrated over all
frequencies

$\overline{Q_{ns}^2} = \int_0^\infty \left|\frac{V_{ns}\,(C_4 + C_p + C_{OTA})}{1 + j\omega R_{on}\,(C_4 + C_p + C_{OTA})}\right|^2 d\omega = kT\,(C_4 + C_p + C_{OTA})$    (A.26)

where Ron is the resistance of the switch, Vns is the noise source, Cp is the parasitic
capacitance, and COTA is the input capacitance of the OTA. In the conversion mode,
the sampling capacitor C4, which now contains the signal value and the offset of the
OTA, is connected across the OTA. The total noise charge then causes an output
voltage of

$\overline{v_{ns(out)}^2} = \frac{\overline{Q_{ns}^2}}{C_4^2} = kT\,\frac{C_4 + C_p + C_{OTA}}{C_4^2} = \frac{1}{\beta}\cdot\frac{kT}{C_4}$    (A.27)

where β is the feedback factor. For a differential implementation of the circuit, the
noise power of the previous equation increases by a factor of 2, assuming no
correlation between the positive and negative sides, since uncorrelated noise adds
in power. Thus, the input-referred noise power, found by dividing the output
noise power by the square of the gain (GA = C3/C4), is given by

$\overline{v_{ns(in)}^2} = \frac{\overline{v_{ns(out)}^2}}{(G_A)^2} = \frac{1}{(G_A)^2\,\beta}\cdot\frac{kT}{C_4}$    (A.28)

The resistive channel of the MOS devices in the OTA also has thermal noise and
contributes to the input-referred noise of the PG ADC circuit. The noise power at the
output is found from

$\overline{v_{ns(out)}^2} = \int_0^\infty \left|H(j\omega)\right|^2\, \overline{i_{ns}^2}\, d\omega = \frac{kT\, G_m R_o}{C_{LT}\,(1 + G_m R_o)} \approx \frac{kT}{C_{LT}}$    (A.29)

where Ro is the output resistance and CLT is the total capacitive loading at the output

$C_{LT} = C_L + \left(C_p + C_{OTA}\right)$    (A.30)
The optimum gate capacitance of the OTA is proportional to the sampling capacitor,
COTA,opt = α3C4, where α3 is a circuit-dependent proportionality factor. The drain
current ID yields

$I_D = \frac{\alpha_1^2\, L^2\, \omega_1^2\, C_4}{\mu\,\alpha_3}$    (A.31)

where μ is the carrier mobility, Cox is the gate oxide capacitance, ω1 is the gain-
bandwidth product, and W and L are the channel width and length. Assuming
GmRo ≫ 1, and with the gain of the conversion operation GC = C2/C4, the input-referred
noise variance is

$\overline{v_{ns(in)}^2} = \frac{\alpha_2\, kT}{(G_C)^2\, C_{LT}}$    (A.32)
The noise from the acquisition and conversion modes can be added together to find
the total input-referred noise, assuming that the two noise sources are uncorrelated.
Using the results from (A.28) and (A.32), the total input-referred noise power for a
differential input is given by

$\overline{v_{ns(in)}^2} = \frac{2\alpha_2\, kT}{(G_C)^2\, C_{LT}} + \frac{2\, kT}{(G_A)^2\,\beta\, C_4} = 2kT\left(\frac{\alpha_2}{(G_C)^2 C_{LT}} + \frac{1}{(G_A)^2\beta C_4}\right)$    (A.33)

For a noise dominated by kT/C, the power consumption is found as

$P_{si} \approx I_D V_{DD} = \frac{\alpha_1^2\, L^2\, \omega_1^2\, SNR\cdot 8kT\, V_{DD}}{\mu\,\alpha_3\, V_{max}^2}$    (A.34)
B.1. MOS Transistor Model Uncertainty

The number of transistor process parameters that can vary is large. In previous
research aimed at optimizing the yield of integrated circuits [7, 8], the number of
parameters simulated was reduced by choosing parameters which are relatively
independent of each other, and which affect performance the most. The parameters
most frequently chosen are, for n- and p-channel transistors: threshold voltage at
zero back-bias for the reference transistor at the reference temperature VTOR, gain
factor for an infinite square transistor at the reference temperature βSQ, total length
and width variation ΔLvar and ΔWvar, oxide thickness tox, and bottom, sidewall, and
gate edge junction capacitance CJBR, CJSR, and CJGR, respectively. The variation in
absolute value of all these parameters must be considered, as well as the differences
between related elements, i.e., matching. The threshold voltage differences ΔVT and
current factor differences Δβ are the dominant sources underlying the drain-source
current or gate-source voltage mismatch for a matched pair of MOS transistors.
Transistor Threshold Voltage: Various factors affect the gate-source voltage at
which the channel becomes conductive such as the voltage difference between
the channel and the substrate required for the channel to exist, the work function
difference between the gate material and the substrate material, the voltage drop
across the thin oxide required for the depletion region, the voltage drop across the
thin oxide due to implanted charge at the surface of the silicon, the voltage drop
across the thin oxide due to unavoidable charge trapped in the thin oxide, etc.
In order for the channel to exist, the concentration of electron carriers in the
channel should equal the concentration of holes in the substrate (φ_S = 2φ_F);
the surface potential thus changes by a total of 2φ_F between the depletion and
strong-inversion cases. The threshold voltage is affected by the built-in Fermi
potential due to the different materials and doping concentrations used for the
gate material and the substrate material. The work function difference is given by

    φ_ms = φ_F,Sub − φ_F,Gate = (kT/q) ln(N_D N_A / n_i²)    (A.35)

The immobile negative charge in the depletion region, left behind after the mobile
holes are repelled, gives rise to a potential across the gate-oxide capacitance of
Q_B/C_ox, where

    Q_B = qN_A x_d = qN_A √(2ε_Si|2φ_F| / (qN_A)) = √(2qN_A ε_Si |2φ_F|)    (A.36)

and xd is the width of the depletion region. The amount of implanted charge at the
surface of the silicon is adjusted in order to realize the desired threshold voltage.
For the case in which the source-to-substrate voltage is increased, the effective
threshold voltage is increased, which is known as the body effect. The body effect
occurs because, as the source-bulk voltage, VSB, becomes larger, the depletion
region between the channel and the substrate becomes wider, and therefore more
immobile negative charge becomes uncovered. This increase in charge changes the
charge attracted under the gate. Specifically, QB becomes

    Q_B = √(2qN_A ε_Si (V_SB + |2φ_F|))    (A.37)
The unavoidable charge trapped in the thin oxide gives rise to a voltage drop
across the thin oxide, V_ox, given by

    V_ox = Q_ox/C_ox = qN_ox/C_ox    (A.38)

Incorporating all factors, the threshold voltage, V_T, is then given by

    V_T = φ_ms + 2φ_F + Q_B/C_ox − Q_ox/C_ox
        = φ_ms + 2φ_F − Q_ox/C_ox + (√(2qε_Si N_A)/C_ox) √(|2φ_F| + V_SB)    (A.39)
When the source is shorted to the substrate, V_SB = 0, and the zero-substrate-bias
threshold is defined as

    V_T0 = φ_ms + 2φ_F + (Q_B0 − Q_ox)/C_ox    (A.40)

with Q_B0 = √(2qN_A ε_Si |2φ_F|). The threshold voltage, V_T, can then be rewritten as

    V_T = V_T0 + γ(√(|2φ_F| + V_SB) − √|2φ_F|),    γ = √(2qε_Si N_A)/C_ox    (A.41)
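As a quick numeric illustration of the body effect in (A.41), the sketch below evaluates V_T against V_SB; the values of V_T0, γ, and |2φ_F| are assumed round numbers, not data from any particular process.

```python
import math

# Body-effect evaluation of Eq. (A.41).  VT0, gamma, and phi are assumed
# illustrative round numbers, not data from a particular process.
VT0 = 0.45        # zero-substrate-bias threshold voltage [V]
gamma = 0.40      # body factor sqrt(2*q*eps_Si*NA)/Cox [V^0.5]
phi = 0.70        # |2*phi_F| term [V]

def vt(vsb):
    """Threshold voltage at source-bulk reverse bias vsb, Eq. (A.41)."""
    return VT0 + gamma * (math.sqrt(phi + vsb) - math.sqrt(phi))
```

For V_SB = 0 the expression collapses to V_T0; increasing V_SB widens the depletion region and raises V_T monotonically.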

Advanced transistor models, such as MOST model 9 [9], define the threshold voltage as

    V_T = V_T0 + ΔV_T0 + ΔV_T1,    V_T0 = V_T0T + V_T0G + V_T0(M)    (A.42)

where V_T0 [V] is the threshold voltage at zero backbias for the actual transistor
at the actual temperature, V_T0T [V] the threshold temperature dependence, V_T0G [V]
the threshold geometrical dependence, and V_T0(M) [V] the matching deviation of the
threshold voltage. Due to the variation in the doping in the depletion region under
the gate, a two-factor body-effect model is needed to account for the increase in
threshold voltage with V_SB for ion-implanted transistors. The change in threshold
voltage for nonzero backbias is represented in the model as

    ΔV_T0 = K_0 (u_S − u_S0)                                            u_S < u_SX
    ΔV_T0 = K_0 [√(u_SX²(1 − K²/K_0²) + (K²/K_0²) u_S²) − u_S0]         u_S ≥ u_SX    (A.43)

    u_S = √(V_SB + φ_B),   u_S0 = √φ_B,   u_ST = √(V_SBT + φ_B),   u_SX = √(V_SBX + φ_B)    (A.44)
where the parameter V_SBX [V] is the backbias value at which the implanted layer
becomes fully depleted, K_0 [V^1/2] the low-backbias body factor for the actual
transistor, and K [V^1/2] the high-backbias body factor for the actual transistor.
For nonzero values of the drain bias, the drain depletion layer expands towards the
source and may affect the potential barrier between the source and channel regions,
especially for short-channel devices. This modulation of the potential barrier
between source and channel causes a reduction in the threshold voltage. In
subthreshold this dramatically increases the current and is referred to as
drain-induced barrier lowering (DIBL). Once an inversion layer has been formed at
higher values of gate bias, any increase of drain bias induces an additional
increase in inversion charge at the drain end of the channel. The drain bias still
has a small effect on the threshold voltage; this effect is most pronounced in the
output conductance in strong inversion and is referred to as static feedback. The
DIBL effect is modeled by the parameter γ_0 in the subthreshold region. This drain
bias voltage dependence is expressed by the first part of

    ΔV_T1 = γ_0 (V_GTX² / (V_GTX² + V_GT1²)) V_DS^η_DS + γ_1 (V_GT1² / (V_GTX² + V_GT1²)) V_DS    (A.45)

    V_GT1 = V_GS − V_T1   for V_GS ≥ V_T1,    V_GT1 = 0   for V_GS < V_T1,    V_GTX = √2/2    (A.46)

where γ_1 is the coefficient of the drain-induced threshold shift at large gate
drive for the actual transistor and η_DS the exponent of the V_DS dependence of γ_1 for the actual

transistor. The static feedback effect is modeled by γ_1; it can be interpreted as
another change of effective gate drive and is modeled by the second part of (A.45).
From first-order calculations and experimental results the exponent η_DS is found
to have a value of 0.6. In order to guarantee a smooth transition between
subthreshold and strong-inversion mode, the model constant V_GTX has been introduced.
Threshold voltage temperature dependence is defined as

    V_T0T = V_T0R + (T_A + ΔT_A − T_R) S_T;VT0    (A.47)

where V_T0R [V] is the threshold voltage at zero backbias for the reference transistor
at the reference temperature, T_A [°C] the ambient or circuit temperature, ΔT_A [°C]
the temperature offset of the device with respect to T_A, T_R [°C] the temperature at
which the parameters for the reference transistor have been determined, and S_T;VT0 [VK⁻¹] the
coefficient of the temperature dependence VT0. In small devices the threshold volt-
age usually is changed due to two effects. In short-channel devices depletion from
the source and drain junctions causes less gate charge to be required to turn on
the transistors. On the other hand in narrow-channel devices the extension of the
depletion layer under the isolation causes more gate charge to be required to form
a channel. Usually these effects can be modeled by geometrical preprocessing
rules:
     
1 1 1 1 1 1
VT 0G = SL;VT 0 + 2 SL2;VT 0 + SW ;VT 0
LE LER LE2 LER WE WER
(A.48)
where LE [m] is effective channel length of the transistor, WE [m] effective channel
width of the transistor, LER [m] effective channel length of the reference transis-
tor, WER [m] effective channel width of the reference transistor, SL;VT0 [Vm] coef-
ficient of the length dependence VT0, SL2;VT0 [Vm2] second coefficient of the length
dependence VT0, SW;VT0 [Vm] coefficient of the width dependence VT0. The individ-
ual transistor sigmas are square root of two smaller than the sigma for a pair. In
the definition of the individual transistor matching deviation stated in the process
block, switch mechanism and correction factor is added as well,

FS VT 0(AIntra) / 2
VT 0(M) = + FS VT 0(BIntra) / 2 (A.49)
We Le FC

where VT0(AIntra) and VT0(BIntra) are within-chip spread of VT0 [Vm], FS is a sort of
mechanism to switch between inter and intra die spread, for intra.die spread FS = 1,
otherwise is zero, and FC is correction for multiple transistors in parallel and units.
Transistor Current Gain: A single expression models the drain current in all
regions of operation of the MOST model 9:

    I_DS = β G_3 [V_GT3 V_DS1 − ((1 + δ_1)/2) V_DS1²] / {(1 + θ_1 V_GT1 + θ_2 (u_s − u_s0))(1 + θ_3 V_DS1)}    (A.50)

where

    δ_1 = (λ_1/u_s) [K + (K_0 − K) V_SBX² / (V_SBX² + (λ_2 V_GT1 + V_SB)²)]    (A.51)

    V_GT3 = 2mφ_T ln(1 + G_1)    (A.52)

    G_1 = exp(V_GT2/(2mφ_T)),    G_2 = 1 + α ln(1 + (V_DS − V_DS1)/V_P),
    G_3 = [1 − exp(−V_DS/φ_T) + G_1 G_2] / (1 + G_1)    (A.53)

    m = 1 + m_0 (u_s0/u_s1)^η_m    (A.54)

θ_1, θ_2, θ_3 are the coefficients of the mobility reduction due to the gate-induced
field, the backbias, and the lateral field, respectively, φ_T is the thermal voltage
at the actual temperature, ζ_1 the weak-inversion correction factor, λ_1 and λ_2 are
model constants, and V_P is the characteristic voltage of the channel-length
modulation. The parameter m_0 characterizes the subthreshold slope for V_BS = 0.
The gain factor is defined as
   
    β = (W_e/L_e) β_SQT F_old (1 + S_STI) [1 + F_S ((A_Δβ/√2) / (√(W_e L_e) F_C) + B_Δβ/√2)]    (A.55)
where β_SQT is the gain factor temperature dependence, S_STI the STI stress factor,
F_S the switching mechanism factor, F_C the correction factor for multiple transistors
in parallel and units, A_Δβ the area scaling factor, and B_Δβ a constant. The gain
factor temperature dependence is defined as

    β_SQT = β_SQ ((T_0 + T_R) / (T_0 + T_A + ΔT_A))^η_β    (A.56)

where η_β [−] is the exponent of the temperature dependence of the gain factor and
β_SQ [AV⁻²] is the gain factor for an infinite square transistor at the reference
temperature, defined as

    β_SQ = [(1 + 2Q)W_e + Q(W_x − W) − Q√((W_x − W)² + σ_W²)] /
           {W_e [L_e + (L_x − L) − √((L_x − L)² + σ_L²)] [1/β_BSQ + L_e (1/β_BSQS − 1/β_BSQ)]}    (A.57)

    β_BSQ = β_SQTR ((T_0 + T_R) / (T_0 + T_A + ΔT_A))^η_BSQ,
    β_BSQS = β_SQSTR ((T_0 + T_R) / (T_0 + T_A + ΔT_A))^η_BSQS    (A.58)

For devices in the ohmic region, (A.50) can be approximated by

    I_D ≈ β (V_GS − V_T − ½V_DS) V_DS / (1 + θ(V_GS − V_T))    (A.59)

and for saturated devices

    I_D ≈ (β/2) (V_GS − V_T)² / (1 + θ(V_GS − V_T))    (A.60)

The change in drain current can be calculated by

    ΔI_D = (∂I_D/∂β)Δβ + (∂I_D/∂V_T)ΔV_T + (∂I_D/∂θ)Δθ    (A.61)

leading to the drain current mismatch

    ΔI_D/I_D = Δβ/β − ς ΔV_T − ξ Δθ    (A.62)

where for the ohmic region

    ς_o = (1 + ½θV_DS) / [(V_GS − V_T − ½V_DS)(1 + θ(V_GS − V_T))],
    ξ_o = (V_GS − V_T) / (1 + θ(V_GS − V_T))    (A.63)

and for saturation

    ς_s = (2 + θ(V_GS − V_T)) / [(V_GS − V_T)(1 + θ(V_GS − V_T))],
    ξ_s = (V_GS − V_T) / (1 + θ(V_GS − V_T))    (A.64)
The standard deviation of the mismatch parameters is derived from

    σ²(ΔI_D/I_D) = σ²(Δβ/β) + ς² σ²(ΔV_T) + ξ² σ²(Δθ)
                   − 2ρ(Δβ/β, ΔV_T) ς σ(ΔV_T) σ(Δβ/β)
                   − 2ρ(Δβ/β, Δθ) ξ σ(Δθ) σ(Δβ/β)
                   + 2ρ(ΔV_T, Δθ) ς ξ σ(Δθ) σ(ΔV_T)    (A.65)

with [10]

    σ(ΔV_T) = (A_VT/√2) / √(W_eff L_eff) + B_VT/√2 + S_VT D    (A.66)



 
    σ(Δβ/β) = (A_β/√2) / √(W_eff L_eff) + B_β/√2 + S_β D    (A.67)

where W_eff is the effective gate-width and L_eff the effective gate-length, the
proportionality constants A_VT, S_VT, A_β, and S_β are technology-dependent factors,
D is the distance, and B_VT and B_β are constants. For widely spaced devices the
terms S_VT D and S_β D are included in the models for the random variations in the
two previous equations, but for typical device separations (<1 mm) and typical
device sizes this correction is small. Most mismatch characterization has been
performed on devices in strong inversion, in the saturation or linear region, but
some studies for devices operating in weak inversion have also been conducted.
Qualitatively, the behavior in all regions is very similar; ΔV_T and Δβ variations
are the dominant source of mismatch, and their matching scales with device area.
The effective mobility degradation mismatch term can be combined with the current
factor mismatch term, as both become significant in the same bias range (high gate
voltage). The correlation factor ρ(ΔV_T, Δβ/β) can be ignored as well, since the
correlation between ΔV_T and the other mismatch parameters remains low for both
small and large devices. The drain-source current error ΔI_D/I_D is important for
the voltage-biased pair. For the current-biased pair, the gate-source or
input-referred mismatch should be considered, whose expression can be derived
similarly. The change in gate-source voltage can be calculated by
   
    ΔV_GS = (∂V_GS/∂V_T)ΔV_T + (∂V_GS/∂β)Δβ    (A.68)

leading to the standard deviation of the mismatch parameters:

    σ²(ΔV_GS) = σ²(ΔV_T) + ζ² σ²(Δβ/β),    where ζ = (V_GS − V_T)/2    (A.69)

MOS transistor current matching or gate-source matching is bias-point dependent,
and for typical bias points, ΔV_T mismatch is the dominant error source for
drain-source current or gate-source voltage matching.
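The area scaling of (A.66), (A.67), and (A.69) can be tried out numerically; the matching coefficients A_VT and A_β below are assumed illustrative magnitudes (a few mV·μm and %·μm), not characterized values, and the constant and distance terms are omitted.

```python
import math

# Pelgrom-style evaluation of Eqs. (A.66), (A.67), (A.69); the B and S*D
# terms are dropped, and AVT, Abeta are assumed illustrative coefficients.
AVT = 4e-9       # threshold matching coefficient  [V*m]  (= 4 mV*um)
Abeta = 1e-8     # current-factor matching coeff.  [m]    (= 1 %*um)

def sigma_dvt(W, L):
    """sigma(dVT) of a pair, area term of Eq. (A.66)."""
    return AVT / math.sqrt(W * L)

def sigma_dbeta(W, L):
    """sigma(dbeta/beta) of a pair, area term of Eq. (A.67)."""
    return Abeta / math.sqrt(W * L)

def sigma_dvgs(W, L, vgt):
    """Gate-source mismatch of a current-biased pair, Eq. (A.69)."""
    zeta = vgt / 2.0                  # zeta = (VGS - VT)/2
    return math.sqrt(sigma_dvt(W, L) ** 2
                     + (zeta * sigma_dbeta(W, L)) ** 2)
```

Quadrupling the gate area halves σ(ΔV_T); at low gate drive the ΔV_T term dominates σ(ΔV_GS), in line with the discussion above.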
Transistor width W and length L: The electrical transistor length is determined
by the combination of physical polysilicon track width, spacer processing, and
mask, projection, and etch variations:

    L_e = L + ΔL_var = L + ΔL_PS − 2ΔL_overlap    (A.70)

where L_e is the effective electrical transistor channel length, determined by
linear-region MOS transistor measurements on several transistors with varying
length, L the drawn length of the polysilicon gate, ΔL_var the total length
variation, ΔL_PS the length variation due to mask, projection, lithographic, etch,
etc., variations, and ΔL_overlap the effective source/gate or drain/gate overlap per
side due to lateral diffusion. The
electrical transistor width is determined by the combination of physical active
region width and mask, projection, and etch variations:

    W_e = W + ΔW_var = W + ΔW_OD − 2ΔW_narrow    (A.71)

where W_e is the effective electrical transistor channel width, determined by
linear-region MOS transistor measurements on several transistors with varying
width, W the drawn width of the active region, ΔW_var the total width variation,
ΔW_OD the width variation due to mask, projection, lithographic, etch, etc.,
variations, and ΔW_narrow the diffusion width offset: the effective diffusion width
increase due to lateral diffusion of the n+ or p+ implantation.
Oxide thickness: The oxide thickness t_ox has impact on: the total capacitance from
the gate to ground, C_ox = ε_ox(W_e L_e)/t_ox; the gain factor β; S_L;θ1R, the
coefficient of the length dependence of θ_1R; θ_1R, the coefficient of the mobility
reduction due to the gate-induced field; the subthreshold behaviour through m_0R,
the factor of the subthreshold slope for the reference transistor at the reference
temperature; the overlap capacitances C_GD0 = W_E C_ol = W_E(ε_ox L_D)/t_ox and
C_GS0 = C_GD0; and the bulk factors K_0R (low-backbias body factor) and K_R
(high-backbias body factor).
Junction capacitances: The depletion-region capacitance is nonlinear and is formed
by: n+p, the n-channel source/drain to p-substrate junction; p+n, the p-channel
source/drain to n-well junction; and np, the n-well to p-substrate junction. The
depletion capacitance of a pn or np junction consists of a bottom, a sidewall, and
a gate-edge component. The capacitance of the bottom area A_B is given as

    C_JB = C_JBR A_B ((V_DBR − V_R) / V_DB)^P_B    (A.72)

where A_B [m²] is the diffusion area, V_R [V] the voltage at which the parameters
have been determined, V_DB [V] the diffusion voltage of the bottom area A_B,
V_DBR [V] the diffusion voltage of the bottom junction at T = T_R, and P_B [−] the
bottom-junction grading coefficient.
Similar formulations hold for the LOCOS-edge and the gate-edge components; one has
to replace the index B by S and G, and the area A_B by L_S and L_G. The capacitance
of the bottom component is derived as

    C_JBV = C_JB / (1 − V/V_DB)^P_B                          V < V_LB
    C_JBV = C_LB + C_LB P_B (V − V_LB) / (V_DB(1 − F_CB))    V ≥ V_LB    (A.73)

where

    C_LB = C_JB (1 − F_CB)^−P_B,    F_CB = 1 − ((1 + P_B)/3)^(1/P_B),    V_LB = F_CB V_DB    (A.74)

and V is the diode bias voltage. Similar expressions can be derived for the sidewall
component C_JSV and the gate-edge component C_JGV. The total diode depletion
capacitance can be described by

    C = C_JBV + C_JSV + C_JGV    (A.75)
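A sketch of the bottom-component model (A.73)-(A.74); the parameter values are assumed, and the point of the example is the continuous hand-over from the power-law branch to the linearized branch at V_LB.

```python
# Junction bottom-component capacitance, Eqs. (A.73)-(A.74).  CJB, VDB, PB
# are assumed illustrative values, not extracted model parameters.
CJB = 1e-15      # zero-bias bottom capacitance [F]
VDB = 0.8        # diffusion voltage [V]
PB = 0.4         # grading coefficient [-]

FCB = 1 - ((1 + PB) / 3) ** (1 / PB)    # hand-over factor, Eq. (A.74)
VLB = FCB * VDB                         # hand-over voltage
CLB = CJB * (1 - FCB) ** (-PB)          # capacitance at the hand-over point

def cjbv(V):
    """Bottom-junction capacitance versus diode bias V, Eq. (A.73)."""
    if V < VLB:
        return CJB / (1 - V / VDB) ** PB                        # power law
    return CLB + CLB * PB * (V - VLB) / (VDB * (1 - FCB))       # linear branch
```

The linear branch matches both the value and the slope of the power-law branch at V_LB, so the capacitance stays continuous and finite as V approaches V_DB.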

B.2. Resistor and Capacitor Model Uncertainty

Typical CMOS and BiCMOS technologies offer several different resistors, such
as diffusion n+/p+ resistors, n+/p+ poly resistors, and nwell resistor. Many fac-
tors in the fabrication of a resistor such as the fluctuations of the film thickness,
doping concentration, doping profile, and the dimension variation caused by the
photolithographic inaccuracies and nonuniform etch rates can display significant
variation in the sheet resistance. However, this is bearable as long as the device
matching properties are within the range the designs require. The fluctuations of
the resistance can be categorized into two groups: fluctuations that occur over the
whole device and scale with the device area, called area fluctuations, and
fluctuations that take place only along the edges of the device and therefore scale
with the periphery, called peripheral fluctuations. For a matched resistor pair
with width W and resistance R, the standard deviation of the random mismatch
between the resistors is
 
    σ_ΔR/R = (f_a + f_p/W) / √(W R)    (A.76)

where fa and fp are constants describing the contributions of area and periphery
fluctuations, respectively. In circuit applications, to achieve required matching,
resistors with width (at least 23 times) wider than minimum width should be
used. Also, resistors with higher resistance (longer length) at fixed width exhibit
larger mismatch. To achieve the desired matching, it is common practice to break a
long resistor (for high resistance) into shorter resistors in series. To model a
(poly-silicon) resistor, the following equation is used:

    R = R_sh L / (W + ΔW) + R_e / (W + ΔW)    (A.77)
where R_sh is the sheet resistance of the poly resistor, R_e the end resistance
coefficient, W and L the resistor width and length, and ΔW the resistor width
offset. The correlations between the standard deviations (σ) of the model
parameters and the standard deviation of the resistance are given by

    σ_R² = (∂R/∂R_sh)² σ²_Rsh + (∂R/∂R_e)² σ²_Re + (∂R/∂ΔW)² σ²_ΔW    (A.78)

    σ_R² = (L² / (W + ΔW)²) σ²_Rsh + (1 / (W + ΔW)²) σ²_Re
           + ((L R_sh + R_e)² / (W + ΔW)⁴) σ²_ΔW    (A.79)

To define the resistor matching,

    (σ_R/R)² = (L / (L R_sh + R_e))² σ²_Rsh + (1 / (L R_sh + R_e))² σ²_Re
               + (1 / (W + ΔW))² σ²_ΔW    (A.80)

    σ_Rsh = A_Rsh / √(WL),    σ_Re = A_Re / √W,    σ_ΔW = A_ΔW / √W    (A.81)
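A small sketch of (A.77) and (A.80), with assumed sheet resistance, end resistance, and spread values; only the structure of the model is meant to be illustrative, not any particular technology.

```python
import math

# Poly resistor model and spread, Eqs. (A.77) and (A.80).  All numbers are
# assumed illustrative process data.
Rsh = 300.0        # sheet resistance [ohm/sq]
Re = 50e-6         # end resistance coefficient [ohm*m]
dW = 0.02e-6       # width offset delta-W [m]
sig_Rsh, sig_Re, sig_dW = 3.0, 5e-6, 5e-9   # assumed standard deviations

def resistance(W, L):
    """Resistance of a poly resistor, Eq. (A.77)."""
    return Rsh * L / (W + dW) + Re / (W + dW)

def rel_sigma(W, L):
    """Relative resistance spread sigma_R/R, Eq. (A.80)."""
    t = L * Rsh + Re
    var = ((L / t) ** 2 * sig_Rsh ** 2
           + (1 / t) ** 2 * sig_Re ** 2
           + (1 / (W + dW)) ** 2 * sig_dW ** 2)
    return math.sqrt(var)
```

Widening the resistor suppresses the ΔW term, which is why the text recommends widths well above minimum for matching-critical resistors.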

Current CMOS technology provides various capacitance options, such as poly-to-


poly capacitors, metal-to-metal capacitors, MOS capacitors, and junction capaci-
tors. The integrated capacitors show significant variability due to the process
variation. For a MOS capacitor, the capacitance values are strongly dependent on
the change in oxide thickness and doping profile in the channel besides the varia-
tion in geometries.
Similar to the resistors, the matching behavior of capacitors depends on the random
mismatch due to periphery and area fluctuations, with a standard deviation

    σ_ΔC/C = (f_a + f_p / C^(1/4)) / √C    (A.82)

where f_a and f_p are factors describing the influence of the area and periphery
fluctuations, respectively. The contribution of the periphery components decreases
as the area (capacitance) increases. For very large capacitors, the area components
dominate and the random mismatch becomes inversely proportional to √C. A simple
capacitor mismatch model is given by

    σ²_ΔC/C = σ_p² + σ_a² + σ_d²,    σ_p = f_p / C^(3/4),    σ_a = f_a / C^(1/2),    σ_d = f_d D    (A.83)
where fp, fa, and fd are constants describing the influence of periphery, area, and
distance fluctuations. The periphery component models the effect of edge rough-
ness, and it is most significant for small capacitors, which have relatively large
amount of edge capacitance. The area component models the effect of short-range
dielectric thickness variations, and it is most significant for moderate size capaci-
tors. The distance component models the effect of global dielectric thickness vari-
ations across the wafer, and it becomes significant for large capacitors or widely
spaced capacitors.
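The three components of (A.83) can be combined in a few lines; the constants f_p, f_a, f_d below are assumed placeholders chosen only to show the trend with capacitance and spacing.

```python
import math

# Capacitor mismatch components of Eq. (A.83).  fp, fa, fd are assumed
# placeholder constants; C is taken in pF and the spacing D in um.
fp, fa, fd = 2e-3, 1e-3, 1e-6

def cap_mismatch(C, D=0.0):
    """Relative capacitor mismatch sigma(dC/C), Eq. (A.83)."""
    sp = fp / C ** 0.75   # periphery (edge roughness), dominates for small C
    sa = fa / C ** 0.5    # area (local dielectric thickness variation)
    sd = fd * D           # distance (global dielectric gradient)
    return math.sqrt(sp ** 2 + sa ** 2 + sd ** 2)
```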

B.3. Time-Domain Analysis

Modern analog circuit simulators use a modified form of nodal analysis [11, 12]
and Newton-Raphson iteration to solve a system of n nonlinear equations f_i in n
variables ξ_i. In general, the time-dependent behavior of a circuit containing
linear or nonlinear elements may be described as [13]

    q̇ − Eξ = 0,    q_0 = q(0)
    f(q, ξ, w, p, t) = 0    (A.84)

This notation assumes that the terminal equations for capacitors and inductors are
defined in terms of charges and fluxes, collected in q. The elements of matrix E

are either 1 or 0, and ξ represents the circuit variables (nodal voltages or branch
currents). All nonlinearities are incorporated in the algebraic system
f(q, ξ, w, p, t) = 0, so the differential equations q̇ − Eξ = 0 are linear. The
initial conditions are represented by q_0. Furthermore, w is a vector of
excitations, and p contains the circuit parameters, such as parameters of linear or
nonlinear components. An element of p may also be a (nonlinear) function of the
circuit parameters. It is assumed that for each p there is only one solution of ξ.
The dc solution is computed by solving the system
    Eξ_0 = 0
    f(q_0, ξ_0, w_0, p_i, 0) = 0    (A.85)

which is derived by setting q̇ = 0. The solution (q_0, ξ_0) is found by
Newton-Raphson iteration. In general, this technique finds the solution of a
nonlinear system f(ξ) = 0 by iteratively solving the Newton-Raphson equation

    J^k Δξ^k = −f(ξ^k)    (A.86)

where J^k is the Jacobian of f, with J^k_ij = ∂f_i/∂ξ_j. Iteration starts with an
estimate ξ^0. After Δξ^k is computed in the kth iteration, ξ^(k+1) is found as
ξ^(k+1) = ξ^k + Δξ^k and the next iteration starts. The iteration terminates when
Δξ^k is sufficiently
small. For (A.85), the Newton-Raphson equation is

    [ 0        −E     ] [ Δq_0 ]      [ −Eξ_0 ]
    [ ∂f/∂q   ∂f/∂ξ  ] [ Δξ_0 ]  = − [ f      ]    (A.87)

which is solved by iteration (for simplicity it is assumed that the excitations w
do not depend on p_j). This scheme is used in the dc operating point [11-13], dc
transfer curve, and even time-domain analysis; in the last case, the dependence
upon time is eliminated by approximating the differential equations by difference
equations [13]. Only frequency-domain (small-signal) analyses are significantly
different, because they require (for each frequency) the solution of a system of
simultaneous linear equations in the complex domain; this is often done by
separating the real and imaginary parts of coefficients and variables, and solving
a twice as large system of linear equations in the real domain.
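A minimal one-unknown illustration of the Newton-Raphson loop of (A.86), applied to a single-node circuit (voltage source, series resistor, diode to ground); the element values are assumed for the example.

```python
import math

# Newton-Raphson iteration of Eq. (A.86) on a one-unknown circuit: a 1 V
# source drives a 1 kohm resistor into a diode.  f(v) is the nodal current
# balance; element values are assumed for the example.
Vs, R, Is, phi_t = 1.0, 1e3, 1e-14, 0.025

def f(v):
    """Nodal equation: resistor current into the node plus diode current."""
    return (v - Vs) / R + Is * math.expm1(v / phi_t)

def jac(v):
    """Jacobian df/dv (a 1x1 'matrix' here)."""
    return 1.0 / R + (Is / phi_t) * math.exp(v / phi_t)

v, iters = 0.9, 0                 # initial estimate xi^0
while True:
    dv = -f(v) / jac(v)           # solve J^k * dxi^k = -f(xi^k)
    v += dv                       # xi^(k+1) = xi^k + dxi^k
    iters += 1
    if abs(dv) < 1e-12 or iters > 100:
        break
```

The iteration settles near the expected ~0.6 V diode operating point; far from the solution each step is limited to roughly φ_T, which is one reason practical simulators add damping and junction limiting on top of plain Newton-Raphson.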
The main computational effort of numerical circuit simulation in typical appli-
cations is thus devoted to: (i) evaluating the Jacobian J and the function f, and
then (ii) solving the system of linear equations. After the dc solution (q0, 0) is
obtained, the dc derivatives are computed. Differentiation of (A.85) with respect to
pj results in linear system
    [ 0        −E     ] [ ∂q_0/∂p_j ]      [ 0        ]
    [ ∂f/∂q   ∂f/∂ξ  ] [ ∂ξ_0/∂p_j ]  = − [ ∂f/∂p_j ]    (A.88)

Equation (A.88) can be solved efficiently by using the LU factorization [14] of
the Jacobian that was computed at the last iteration of (A.87). Now the derivatives

of (A.84) with respect to p_j are computed. Differentiation of (A.84) with respect
to p_j results in the linear, time-varying system

    d/dt(∂q/∂p_j) − E ∂ξ/∂p_j = 0,    ∂q/∂p_j(0) = ∂q_0/∂p_j
    (∂f/∂q)(∂q/∂p_j) + (∂f/∂ξ)(∂ξ/∂p_j) + ∂f/∂p_j = 0    (A.89)

At each time point the circuit derivatives are obtained by solving the previous
system of equations after the original system is solved. Suppose, for example, that
a kth-order backward differentiation formula (BDF) is used [15, 16], with the
corrector

    (q̇)_(n+k) = (1/Δt) Σ_(i=0..k) a_i q_(n+k−i)    (A.90)

where the coefficients a_i depend upon the order k of the BDF formula. After
substituting (A.90) into (A.84), the Newton-Raphson equation is derived as

    [ (a_0/Δt)I   −E     ] [ Δq_(n+k) ]      [ (1/Δt) Σ_(i=0..k) a_i q_(n+k−i) − Eξ_(n+k) ]
    [ ∂f/∂q      ∂f/∂ξ  ] [ Δξ_(n+k) ]  = − [ f(q_(n+k), ξ_(n+k), w_(n+k), p_j, t_(n+k))  ]    (A.91)
Iteration on this system provides the solution (q_(n+k), ξ_(n+k)). Substituting a
kth-order BDF formula in (A.89) gives the linear system

    [ (a_0/Δt)I   −E     ] [ ∂q/∂p_j|_(n+k) ]      [ (1/Δt) Σ_(i=1..k) a_i ∂q/∂p_j|_(n+k−i) ]
    [ ∂f/∂q      ∂f/∂ξ  ] [ ∂ξ/∂p_j|_(n+k) ]  = − [ ∂f/∂p_j|_(n+k)                          ]    (A.92)
Thus (A.91) and (A.92) have the same system matrix. The LU factorization of this
matrix is available after (A.91) is iteratively solved. Then a forward and backward
substitution solves (A.92). For each parameter the right-hand side of (A.92) is
different and the forward and backward substitution must be repeated. If the random
term ε(p, t), which models the tolerance effects, is nonzero, it is added to
equation (A.84) [17-21]:

    f(q, ξ, w, p, t) + ε(p, t) = 0    (A.93)

Solving this system means determining the probability density function of the
random vector ξ_p(t) at each time instant t. Consider two instants in time, t_1 and
t_2, with Δt_1 = t_1 − t_0 and Δt_2 = t_2 − t_0, where t_0 is a time that coincides
with the dc solution of the circuit performance function φ; Δt is assumed to
satisfy the criterion that the circuit performance function can be designated as
quasi-static. To make the problem manageable, the function can be linearized by a
first-order Taylor approximation, assuming that the magnitude of the random term p
is sufficiently small to consider the equation as linear in the range of
variability of p, or that the nonlinearities are so smooth that they may be
considered linear even for a wide range of p.
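As a sketch of the difference-equation idea above, here is a first-order BDF (backward Euler) step applied to a linear RC discharge; because the circuit is linear, each time step reduces to one linear equation. The element values are arbitrary.

```python
import math

# First-order BDF (backward Euler) transient, the k = 1 case of Eq. (A.90),
# applied to a linear RC discharge C*dv/dt + v/R = 0.  Values are arbitrary.
R, C = 1e3, 1e-6            # 1 kohm, 1 uF  ->  tau = R*C = 1 ms
dt, t_end = 1e-5, 5e-3      # 10 us steps over five time constants
v, t = 1.0, 0.0             # initial condition q0 = q(0)

while t < t_end - 1e-12:
    # (C/dt)*(v_next - v) + v_next/R = 0   ->   closed-form linear step
    v = v / (1.0 + dt / (R * C))
    t += dt

exact = math.exp(-t_end / (R * C))   # analytic solution for comparison
```

For a nonlinear circuit the per-step equation would not be solvable in closed form, and each step would invoke the Newton-Raphson iteration of (A.91).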

B.4. Parameter Extraction

Once the nominal parameter vector p0 is found for the nominal device, the param-
eter extraction of all device parameters pk of the transistors connected to particular
node n can be performed using a linear approximation to the model. Let
p = [p_1, p_2, …, p_n]^T ∈ R^n denote the parameter vector,
f = [f_1, f_2, …, f_m]^T ∈ R^m the performance vector,
z^k = [z_1^k, z_2^k, …, z_m^k]^T ∈ R^m the measured performance vector of the kth
device, and w a vector of excitations w = [w_1, w_2, …, w_l]^T ∈ R^l. Considering
Eq. (A.84),

    q̇ − Eξ = 0,    q_0 = q(0)
    f(q, ξ, w, p, t) = 0    (A.94)

the general model can be written. The measurements can only be made under certain
selected values of w; if the initial conditions q_0 are met, the model can simply
be denoted as

    f = f(p)    (A.95)
To extract a parameter vector p^k corresponding to the kth device,

    p^k = arg min_(p^k ∈ R^n) ||f(p^k) − z^k||    (A.96)

is found. The weighted sum of error squares for the kth device is formed as [13]

    ε(p^k) = ½ Σ_(i=1..m) w_i [f_i(p^k) − z_i^k]² = ½ [f(p^k) − z^k]^T W [f(p^k) − z^k]    (A.97)

If the circuit performance function is approximated as a linear function of p
around the mean value p̄,

    ξ = f(p) ≈ f(p̄) + J(p − p̄),    f(p^0 + Δp) ≈ f(p^0) + J(p^0)Δp    (A.98)

where J(p^0) is the Jacobian evaluated at p^0, a linear least-squares problem is
formed for the kth device [16] as

    min_(p^k ∈ R^n) ε(p^k) = ½ [J(p^0)Δp^k + f^0 − z^k]^T W [J(p^0)Δp^k + f^0 − z^k]    (A.99)

So, for the measured performance vector z^k of the kth device, an approximate
estimate of the model parameter vector of the kth device is obtained from

    p^k(0) = p^0 − Δp^k(0)    (A.100)

where

    Δp^k(0) = [J(p^0)^T W J(p^0)]^(−1) J(p^0)^T W (f^0 − z^k)    (A.101)
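The linearized extraction step (A.99)-(A.101) in a few lines of Python, with a made-up Jacobian and noiseless synthetic measurements; the 2-parameter case keeps the normal equations small enough to solve with Cramer's rule, standing in for the LU solve of the text. With exact linear data the estimate recovers the device parameters exactly.

```python
# Linearized parameter extraction, Eqs. (A.99)-(A.101): from the Jacobian J
# at the nominal point p0, nominal performances f0, and measurements z of
# one device, estimate that device's parameters.  All numbers are made up.
p0 = [1.0, 2.0]                            # nominal parameter vector
J = [[1.0, 0.5], [0.2, 1.5], [1.0, 1.0]]   # sensitivities df/dp at p0
w = [1.0, 1.0, 1.0]                        # diagonal measurement weights W

p_true = [1.05, 1.97]                      # the device's actual parameters
f0 = [3.0, 4.0, 5.0]                       # nominal performances f(p0)
# noiseless synthetic measurements: z = f0 + J (p_true - p0)
d = [p_true[0] - p0[0], p_true[1] - p0[1]]
z = [f0[i] + J[i][0] * d[0] + J[i][1] * d[1] for i in range(3)]

# normal equations  A dp = b  with  A = J^T W J,  b = J^T W (f0 - z)
A = [[sum(w[i] * J[i][r] * J[i][c] for i in range(3)) for c in range(2)]
     for r in range(2)]
b = [sum(w[i] * J[i][r] * (f0[i] - z[i]) for i in range(3)) for r in range(2)]

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
dp = [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
      (A[0][0] * b[1] - A[0][1] * b[0]) / det]      # Cramer's rule (2x2)
p_est = [p0[0] - dp[0], p0[1] - dp[1]]              # Eq. (A.100)
```

With measurement noise added to z, the same expression returns the weighted least-squares estimate instead of the exact parameters.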



B.5. Performance Function Correction

To model the influence of measurement errors on the estimated parameter variation,
consider a circuit with a response that is nonlinear in n parameters. Changes in
the n parameters are linearly related to the resulting circuit performance function
(node voltages, branch currents, …) if the parameter changes are small:

    Δφ = (∂φ/∂p) Δp    (A.102)

with Δφ = φ(p) − φ_0 and

    φ(p) = φ_0 + (∂φ/∂p)^T Δp + ½ Δp^T H Δp + …  ≈ φ_0 + Δφ    (A.103)

where H is the Hessian matrix [22], whose elements are the second-order derivatives

    h_ij = ∂²φ(p) / ∂p_i ∂p_j    (A.104)

Now define

    Δφ_r = ∇φ_r Δp_r + ε,    Δφ_r = [Δφ_1 … Δφ_k]^T    (A.105)

which is the relation between the measurement errors ε, the parameter deviations
Δp_r, and the observed circuit performance function Δφ_r.
Assume that Δφ_r is obtained from k measurements. Now an estimate for the parameter
deviations Δp_r must be obtained. According to the least-squares approximation
theorem [17], the least-squares estimate Δp̂_r of Δp_r minimizes the residual

    ||Δφ_r − ∇φ_r Δp̂_r||₂²    (A.106)

The least-squares approximation of Δp_r can be employed to find the influence of
measurement errors on the estimated parameter deviations by

    Δp̂_r = (∇φ_r^T ∇φ_r)^(−1) ∇φ_r^T Δφ_r    (A.107)
which may be obtained using the pseudo-inverse of ∇φ_r. As stated in [22], the
covariance matrix C_Δp̂r may be determined as

    C_Δp̂r = (∇φ_r^T ∇φ_r)^(−1)    (A.108)

This expression models the influence of measurement errors on the estimated
parameter variation. The magnitude of the ith diagonal element of C_Δp̂r indicates
the precision with which the value of the ith parameter can be estimated: a large
variance signifies low parameter testability. In this way a parameter is considered

testable if the variance of its estimated deviation is below a certain limit. The off-
diagonal elements of Cpr contain the parameter covariances.
If an accuracy check shows that the performance function extraction is not accurate
enough, a performance function correction is performed to refine the extraction.
The basic idea underlying performance function correction is to correct the errors
of performance function extraction, based on the given model and the knowledge
obtained from the previous stages, by an iterative process. Denoting

    φ^k_(i)(p) = φ_0 + Δφ^k_(i)    (A.109)

the extracted performance function vector for the kth device at the ith iteration,
performance function correction can be performed by finding the solution for the
transformation φ^k_(i+1) = F_i(φ^k_(i)) such that more accurate performance
function vectors can be extracted, subject to

    ||φ^k_(i+1) − φ^k(ξ*)|| < ||φ^k_(i) − φ^k(ξ*)||    (A.110)

where

    φ^k(ξ*) = arg min_(ξ ∈ R^n) φ^k(ξ)    (A.111)

is the ideal solution of the performance function. The error correction mapping F_i
is selected in the form of

    φ^k_(i+1)(p) = φ^k_(i)(p) + d_i(φ^k_(i))    (A.112)

where d_i is called the error correction function and needs to be constructed. The
dataset

    {d_i^k, φ^k_(i), k = 1, 2, …, K}    (A.113)

gives the information relating the errors due to inaccurate parameter extraction to
the extracted parameter values. A quadratic function is postulated to approximate
the error correction function
    d_t = Σ_(j=1..n) α_j Δp_j + Σ_(j=1..n) Σ_(l=1..n) α_jl Δp_j Δp_l,    t = 1, 2, …, n    (A.114)

where d = [d_1, d_2, …, d_n]^T, Δp = [Δp_1, Δp_2, …, Δp_n]^T, and α_j and α_jl are
the coefficients of the error correction function at the ith iteration. The
coefficients can be determined by fitting the equation to the dataset under the
least-squares criterion. Once the error correction function is established,
performance function correction is performed as

    φ^k_(i+1)(p) = φ^k_(i)(p) + Δφ^k_(i+1)    (A.115)

    Δφ^k_(i+1) = d_i(φ^k_(i))    (A.116)

B.6. Sample Size Estimation

The problem of statistical analysis consists in determining the statistical
properties of the random term ε(p, t), which models the tolerance effects:

    Δφ = φ(p, t) − f(q_0, ξ_0, w_0, p_i, 0)    (A.117)
as shown in Appendix B.3. In Monte Carlo analysis an ensemble of transfer curves is
calculated, from which the statistical characteristics are estimated. From
estimation theory it is known that the estimate for the mean

    μ̂ = (1/n) Σ_(i=1..n) φ_i    (A.118)

with confidence level β = 1 − α lies within the interval [23]

    μ̂ − z_(1−α/2) σ/√n ≤ μ ≤ μ̂ + z_(1−α/2) σ/√n    (A.119)

where z_(1−α/2) is the corresponding quantile of a N(0, 1) distributed random
variable. From this, with given interval width

    Δ = 2 z_(1−α/2) σ/√n    (A.120)

the necessary sample size n is obtained as

    n = (2 z_(1−α/2) σ/Δ)²    (A.121)

If, for example, a mean value has to be estimated with a relative error Δ/σ = 0.1
and a confidence level of β = 0.99 (z_(1−α/2) ≈ 2.5), the sample size is n = 2500.
Similarly, for the estimate of the variance

    σ̂² = (1/(n−1)) Σ_(i=1..n) (φ_i − μ̂)²    (A.122)

a necessary sample size of

    n = 2 (z_(1−α/2) σ²/Δ)²    (A.123)

is obtained, in order to ensure that the estimate σ̂² falls with probability β into
the interval

    σ̂² − Δ ≤ σ² ≤ σ̂² + Δ    (A.124)

For example, the required number of samples for an accuracy of Δ/σ² = 0.1 and
a confidence level of 0.99 is n = 1250.
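Both worked examples can be reproduced directly from (A.121) and (A.123):

```python
# Monte Carlo sample sizes from Eqs. (A.121) and (A.123), with
# z_{1-alpha/2} ~ 2.5 for a 0.99 confidence level, reproducing the two
# worked examples in the text.
z = 2.5

def n_for_mean(rel_err):
    """Samples for a mean estimate given Delta/sigma, Eq. (A.121)."""
    return (2 * z / rel_err) ** 2

def n_for_variance(rel_err):
    """Samples for a variance estimate given Delta/sigma^2, Eq. (A.123)."""
    return 2 * (z / rel_err) ** 2
```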

B.7. Frequency Domain Analysis

The behavior of the system (A.93) in the frequency domain,

    f(q_j, ξ_j, w_j, p_j, jω) + ε(p_j, jω) = 0    (A.125)

is described by a set of linear complex equations [13]

    T(p, jω) X(p, jω) = W(p, jω)    (A.126)

where T(p, jω) is the system matrix, X(p, jω) and W(p, jω) are the network and
source vectors, respectively, and ω is the frequency in radians per second. To
evaluate the sensitivity of the network vector X(p, jω) to the parameter p, the
previous equation is differentiated with respect to p to obtain

    ∂X(p, jω)/∂p = T^(−1)(p, jω) [∂W(p, jω)/∂p − (∂T(p, jω)/∂p) X(p, jω)]    (A.127)

The circuit performance function φ = f(p, jω) is obtained from φ = d^T X(p, jω)
using the adjoint or transpose method [24], where the vector d is a constant vector
that selects the circuit performance function. The derivatives of the circuit
performance function with respect to V_T and β are then computed from

    ∂φ(V_Ti, jω)/∂V_Ti = d^T T^(−1)(V_Ti, jω) [∂W(V_Ti, jω)/∂V_Ti − (∂T(V_Ti, jω)/∂V_Ti) X(V_Ti, jω)]    (A.128)

    ∂φ(β_i, jω)/∂β_i = d^T T^(−1)(β_i, jω) [∂W(β_i, jω)/∂β_i − (∂T(β_i, jω)/∂β_i) X(β_i, jω)]    (A.129)

The first-order derivatives of the magnitude of the circuit performance function
are computed from

    ∂|φ(V_Ti, jω)|/∂V_Ti = |φ(V_Ti, jω)| Re{(1/φ(V_Ti, jω)) ∂φ(V_Ti, jω)/∂V_Ti}    (A.130)

    ∂|φ(β_i, jω)|/∂β_i = |φ(β_i, jω)| Re{(1/φ(β_i, jω)) ∂φ(β_i, jω)/∂β_i}    (A.131)

where Re denotes the real part of the complex variable. The second-order
derivatives are calculated from

    ∂²|φ(V_Ti, jω)|/∂V_Ti² = |φ(V_Ti, jω)| [Re{(1/φ(V_Ti, jω)) ∂φ(V_Ti, jω)/∂V_Ti}]²
        + |φ(V_Ti, jω)| Re{(1/φ(V_Ti, jω)) ∂²φ(V_Ti, jω)/∂V_Ti²
                           − (1/φ(V_Ti, jω)²) (∂φ(V_Ti, jω)/∂V_Ti)²}    (A.132)

    ∂²|φ(β_i, jω)|/∂β_i² = |φ(β_i, jω)| [Re{(1/φ(β_i, jω)) ∂φ(β_i, jω)/∂β_i}]²
        + |φ(β_i, jω)| Re{(1/φ(β_i, jω)) ∂²φ(β_i, jω)/∂β_i²
                          − (1/φ(β_i, jω)²) (∂φ(β_i, jω)/∂β_i)²}    (A.133)
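The magnitude-derivative identity (A.130) can be sanity-checked numerically with any smooth complex function standing in for φ; the rational φ(x) below is an arbitrary stand-in, not a circuit function.

```python
# Numerical check of Eq. (A.130): d|phi|/dx = |phi| * Re{(1/phi) dphi/dx}.
# phi(x) is an arbitrary smooth complex stand-in, not a circuit function.
def phi(x):
    return (1 + 1j * x) / (2 + x)

def dphi(x):
    # analytic derivative of phi with respect to x
    return (1j * (2 + x) - (1 + 1j * x)) / (2 + x) ** 2

x0, h = 0.7, 1e-6
analytic = abs(phi(x0)) * (dphi(x0) / phi(x0)).real
numeric = (abs(phi(x0 + h)) - abs(phi(x0 - h))) / (2 * h)
```

The central finite difference of |φ| agrees with the identity to within the O(h²) discretization error.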
The circuit performance function φ(jω) can be approximated with the truncated
Taylor expansion as

    φ(jω) ≈ φ̄(jω) + J (ξ(jω) − ξ̄(jω))    (A.134)

where J is the R × MN Jacobian matrix of the transformation, whose generic ij
element is defined as

    [J]_ij = ∂φ_i(ξ, jω)/∂ξ(jω)_j |_(ξ = ξ̄),    i = 1, …, R,  j = 1, …, MN    (A.135)

The multivariate normal probability function can be found as

    P(φ) = (1/√((2π)^R |C_φ(jω)|)) exp(−½ (φ(jω) − φ̄(jω))^T C_φ^(−1)(jω) (φ(jω) − φ̄(jω)))    (A.136)

where the covariance matrix of the circuit performance function C_φ(jω) is
defined as

    C_φ(jω) = J(jω) C_p J(jω)^T    (A.137)

and the covariance matrix of the parameters is

       ⎡ C_p1p1  C_p1p2  … ⎤
C_p =  ⎢ C_p2p1  C_p2p2  … ⎥   (A.138)
       ⎣   …       …     … ⎦

where

[C_p1p1]_ij = 1/[(W_i L_i)(W_j L_j)] ∫_{x_i}^{x_i+L_i} ∫_{x_j}^{x_j+L_j} ∫_{y_i}^{y_i+W_i} ∫_{y_j}^{y_j+W_j} R_p1p1(x_A, y_A, x_B, y_B) σ_p1(x_A, y_A) σ_p1(x_B, y_B) dx_A dx_B dy_A dy_B   (A.139)

[C_p1p2]_ij = 1/[(W_i L_i)(W_j L_j)] ∫_{x_i}^{x_i+L_i} ∫_{x_j}^{x_j+L_j} ∫_{y_i}^{y_i+W_i} ∫_{y_j}^{y_j+W_j} R_p1p2(x_A, y_A, x_B, y_B) σ_p1(x_A, y_A) σ_p2(x_B, y_B) dx_A dx_B dy_A dy_B   (A.140)

and R_p1p1(x_A, y_A, x_B, y_B), the autocorrelation function of the stochastic process p_1,
is defined as the joint moment of the random variables p_1(x_A, y_A) and p_1(x_B, y_B), i.e.,
R_p1p1(x_A, y_A, x_B, y_B) = E{p_1(x_A, y_A) p_1(x_B, y_B)}, which is a function of x_A, y_A and x_B, y_B;
similarly, R_p1p2(x_A, y_A, x_B, y_B) = E{p_1(x_A, y_A) p_2(x_B, y_B)} is the cross-correlation function of
the stochastic processes p_1 and p_2. The experimental data show that threshold voltage
differences ΔV_T and current factor differences Δβ are the dominant sources
underlying the drain-source current or gate-source voltage mismatch of a matched
pair of MOS transistors.
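The linearized covariance propagation of (A.137), C_φ = J C_p Jᵀ, can be sketched with plain lists; the 2 × 3 Jacobian and the diagonal parameter covariance below are illustrative assumptions.

```python
# C_phi = J * C_p * J^T for two performances and three uncorrelated
# parameters (toy values).
J = [[1.0, 0.5, 0.0],
     [0.2, 1.0, 0.3]]
Cp = [[0.04, 0.0, 0.0],
      [0.0, 0.01, 0.0],
      [0.0, 0.0, 0.09]]

def matmul(A, B):
    # naive matrix product, sufficient for this sketch
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

Jt = [list(col) for col in zip(*J)]
C_phi = matmul(matmul(J, Cp), Jt)   # 2 x 2 performance covariance
```

The resulting C_phi is symmetric, as a covariance matrix must be.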
The covariance σ_pipj = 0 for i ≠ j if p_i and p_j are uncorrelated. Thus the covariance
matrix C_P of p_1, …, p_k, each with mean μ_pi and variance σ²_pi, is

C_{p1,…,pk} = diag(σ²_p1, …, σ²_pk)   (A.141)

In [10] these random differences for a single transistor, having a normal distribution
with zero mean and a variance dependent on the device area WL, are derived as

for i = j: [C_p1p1]_ij = σ²(ΔV_T) = A²_VT/(W_eff L_eff) + B²_VT + S²_VT D²;  for i ≠ j: [C_p1p1]_ij = 0   (A.142)

for i = j: [C_p2p2]_ij = σ²(Δβ/β) = A²_β/(W_eff L_eff) + B²_β + S²_β D²;  for i ≠ j: [C_p2p2]_ij = 0   (A.143)

where W_eff is the effective gate width and L_eff the effective gate length, the propor-
tionality constants A_VT, S_VT, A_β, and S_β are technology-dependent factors, D is the
distance between the device pair, and B_VT and B_β are constants.
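A minimal sketch of evaluating such a mismatch variance, assuming the Pelgrom-style form σ²(ΔV_T) = A²_VT/(W_eff L_eff) + B²_VT + S²_VT D²; the coefficient values are illustrative assumptions (A_VT in V·µm, dimensions in µm), not data from the text.

```python
import math

def sigma_dvt(weff, leff, D, AVT=4.0e-3, BVT=0.0, SVT=1.0e-6):
    # standard deviation of the threshold-voltage mismatch (assumed units)
    return math.sqrt(AVT ** 2 / (weff * leff) + BVT ** 2 + (SVT * D) ** 2)

# doubling the gate area lowers the area-limited mismatch by sqrt(2)
s1 = sigma_dvt(1.0, 1.0, 0.0)
s2 = sigma_dvt(2.0, 1.0, 0.0)
```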

Assuming the ac components are small variations around the dc component, the
frequency-analysis tolerance window follows from the first- and second-order
terms of the Taylor expansion of the circuit performance function φ = f(V_T(jω),
β(jω)) around the mean (ΔV_T = Δβ = 0); the mean and variance of the circuit
performance function can be estimated as

μ_φ = φ_0 + ½ Σ_{i=1}^{n} [(∂²|φ(V_Ti, jω)|/∂V_Ti²) σ²_VTi + (∂²|φ(β_i, jω)|/∂β_i²) σ²_βi]   (A.144)

σ²_φ = Σ_{i=1}^{n} [(∂|φ(V_Ti, jω)|/∂V_Ti)² σ²_VTi + (∂|φ(β_i, jω)|/∂β_i)² σ²_βi]   (A.145)

where n is the total number of transistors in the circuit and φ_0 is the mean of
φ = f(V_T(jω), β(jω)) over the local or global parametric variations.
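The moment-propagation formulas (A.144)-(A.145) can be exercised on a toy scalar performance; φ(V_T) = V_T² with V_T ~ N(0, σ²) is an assumed example for which the second-order mean term reproduces the exact value E[φ] = σ², while the first-order variance term vanishes at the expansion point.

```python
# Second-order mean and first-order variance propagation, toy example.
s2_vt = 0.01                        # parameter variance sigma_VT^2
phi0, dphi, d2phi = 0.0, 0.0, 2.0   # phi, phi', phi'' at VT = 0
mu_phi = phi0 + 0.5 * d2phi * s2_vt     # (A.144): exact for a quadratic
var_phi = (dphi ** 2) * s2_vt           # (A.145): zero at this point
```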

B.8. Discrimination Analysis

Derivation of an acceptable tolerance window is aggravated by the overlapping
regions in the measured values of the error-free and faulty circuits, which result
in ambiguity regions for fault detection. Let the one-dimensional measurement
spaces Φ_G and Φ_F denote the fault-free and faulty decision regions, and let
f(ψ_n|G) and f(ψ_n|F) denote the distributions of the measurement ψ_n under
fault-free and faulty conditions. Then,

α = P(ψ_n ∈ Φ_F|G) = ∫_{Φ_F} f_ψn(ψ_n|G) dψ_n
  = P(ψ̄ ≥ c | ψ̄ ~ N(μ_G, σ²/n)) = P(Z ≥ (c − μ_G)/(σ/√n))   (A.146)

β = P(ψ_n ∈ Φ_G|F) = ∫_{Φ_G} f_ψn(ψ_n|F) dψ_n
  = P(ψ̄ < c | ψ̄ ~ N(μ_F, σ²/n)) = P(Z < (c − μ_F)/(σ/√n))   (A.147)

where Z ~ N(0, 1) is the standard normal distribution, α denotes the probability
that a fault-free circuit is rejected, β denotes the probability that a faulty circuit
is accepted, and c is the critical constant of the critical region of the form

C = {(ψ_1, …, ψ_n): ψ̄ ≥ c}   (A.148)

and

P(G) = P(ψ_n ∈ Φ_G|G) = ∫_{Φ_G} f_ψn(ψ_n|G) dψ_n = 1 − ∫_{Φ_F} f_ψn(ψ_n|G) dψ_n = 1 − α   (A.149)

P(F) = P(ψ_n ∈ Φ_F|F) = ∫_{Φ_F} f_ψn(ψ_n|F) dψ_n = 1 − ∫_{Φ_G} f_ψn(ψ_n|F) dψ_n = 1 − β   (A.150)
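The error probabilities α and β of (A.146)-(A.147) can be evaluated with the erf-based standard normal CDF; the fault-free/faulty means, σ, n, and critical constant c below are illustrative assumptions.

```python
import math

def Phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu_G, mu_F, sigma, n, c = 1.0, 1.3, 0.5, 25, 1.15   # assumed values
alpha = 1.0 - Phi((c - mu_G) / (sigma / math.sqrt(n)))   # reject fault-free
beta = Phi((c - mu_F) / (sigma / math.sqrt(n)))          # accept faulty
```

Placing c midway between the two means makes α = β; moving c lowers one error probability only at the expense of the other.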

Recall that if ψ ~ N(μ, σ²), then Z = (ψ − μ)/σ ~ N(0, 1). In the present case,
the sample mean ψ̄ ~ N(μ, σ²/n), since the variable ψ is assumed to have
a normal distribution. Since α and β represent probabilities of events from the
same decision problem, they are not independent of each other or of the sample
size. Evidently, it would be desirable to have a decision process such that both α
and β are small. However, in general, a decrease in one type of error leads to an
increase in the other type for a fixed sample size. The only way to reduce both
types of errors simultaneously is to increase the sample size, which, however, is a
time-consuming process. The Neyman–Pearson test is a special case of the
Bayes test; it provides a workable solution when the a priori probabilities
are unknown, or when the Bayes average costs of making a decision are difficult to
evaluate or to set objectively. The Neyman–Pearson test is based on the critical region
C* ⊂ Ψ, where Ψ is the sample space of the test statistic,

C* = {(ψ_1, …, ψ_n): l(ψ_1, …, ψ_n|G, F) < λ}   (A.151)
which has the largest power (smallest β, the probability that a faulty circuit is accepted
when it is faulty) of all tests with significance level α. Introducing the Lagrange
multiplier λ to account for the constraint gives the following cost function J,
which must be maximized with respect to the test and λ,

J = 1 − β + λ(α_0 − α) = λα_0 + ∫_{Φ_G} [f_ψn(ψ_n|F) − λ f_ψn(ψ_n|G)] dψ_n   (A.152)

To maximize J by selecting the critical region Φ_G, we select ψ_n ∈ Φ_G such that the
integrand is positive. Thus Φ_G is given by

Φ_G = {ψ_n : f_ψn(ψ_n|F) − λ f_ψn(ψ_n|G) > 0}   (A.153)

The Neyman–Pearson test decision rule δ(ψ_n) can be written as a likelihood ratio
test

δ(ψ_n) = 1 (pass) if l(ψ_1, …, ψ_n|G, F) ≥ λ
         0 (fail) if l(ψ_1, …, ψ_n|G, F) < λ   (A.154)

Suppose ψ_1, …, ψ_n are independent and identically distributed N(μ, σ²) random
values of the power supply current. The likelihood ratio of independent and
identically distributed N(μ, σ²) random values of the power supply current, where
μ_F > μ_G, is given by

l(ψ_1, …, ψ_n) = exp[−(1/2σ²) Σ_{i=1}^{n} (ψ_i − μ_G)²] / exp[−(1/2σ²) Σ_{i=1}^{n} (ψ_i − μ_F)²]
             = exp{(1/2σ²) [Σ_{i=1}^{n} (ψ_i − μ_F)² − Σ_{i=1}^{n} (ψ_i − μ_G)²]}   (A.155)

Now,

Σ_{i=1}^{n} (ψ_i − μ_F)² − Σ_{i=1}^{n} (ψ_i − μ_G)² = n(μ_F² − μ_G²) − 2nψ̄(μ_F − μ_G)   (A.156)

Using the Neyman–Pearson lemma, the critical region of the most powerful test
of significance level α is

C = {(ψ_1, …, ψ_n): exp[(1/2σ²)(n(μ_F² − μ_G²) − 2nψ̄(μ_F − μ_G))] ≤ λ}
  = {(ψ_1, …, ψ_n): ψ̄ ≥ −σ² log λ/(n(μ_F − μ_G)) + (μ_F + μ_G)/2}
  = {(ψ_1, …, ψ_n): ψ̄ ≥ c}   (A.157)
For the test to be of significance level α,

α = P(ψ̄ ≥ c | ψ̄ ~ N(μ_G, σ²/n)) = P(Z ≥ (c − μ_G)/(σ/√n))  ⟹  c = μ_G + z(1−α) σ/√n   (A.158)

where P(Z < z(1−α)) = 1 − α, which can also be written as z(1−α) = Φ⁻¹(1 − α); z(1−α) is
the (1 − α)-quantile of Z, the standard normal distribution. This boundary for the
critical region guarantees, by the Neyman–Pearson lemma, the smallest value of β
obtainable for the given values of α and n. From the two previous equations, we can
see that the test T rejects for

T = (ψ̄ − μ_G)/(σ/√n) ≥ z(1−α)   (A.159)
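A self-contained sketch of the one-sided test (A.158)-(A.159), with the normal quantile z(1−α) obtained by bisecting the erf-based CDF rather than read from tables; the sample values are assumed for illustration.

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def z_quantile(q):
    # q-quantile of N(0, 1) by bisection; Phi is strictly increasing
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def reject(samples, mu_G, sigma, alpha):
    n = len(samples)
    T = (sum(samples) / n - mu_G) / (sigma / math.sqrt(n))
    return T >= z_quantile(1.0 - alpha)       # test of (A.159)

ok = reject([1.2] * 25, mu_G=1.0, sigma=0.5, alpha=0.05)   # T = 2.0
```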

Similarly, to construct a test for the two-sided alternative, one approach is to com-
bine the critical regions for testing the two one-sided alternatives. The two one-
sided tests form a critical region of

C = {(ψ_1, …, ψ_n): ψ̄ ≤ c_2 or ψ̄ ≥ c_1}   (A.160)

c_1 = μ_G + z(1 − α/2) σ/√n,   c_2 = μ_G − z(1 − α/2) σ/√n   (A.161)

Thus, the test T rejects for

T = (ψ̄ − μ_G)/(σ/√n) ≥ z(1 − α/2)  or  T = (ψ̄ − μ_G)/(σ/√n) ≤ −z(1 − α/2)   (A.162)

If the variance σ² is unknown, a critical region can be found as

C = {(ψ_1, …, ψ_n): t = (ψ̄ − μ_G)/(S/√n) ≥ c_1}   (A.163)

where t follows the t-distribution with n − 1 degrees of freedom and S² is the unbiased
estimator of the variance σ². The constant c_1 is chosen such that

α = P((ψ̄ − μ_G)/(S/√n) ≥ c_1 | (ψ̄ − μ_G)/(S/√n) ~ t_{n−1})   (A.164)

to give a test of significance α. The test T rejects for

T = (ψ̄ − μ_G)/(S/√n) ≥ t_{n−1,α}   (A.165)
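Computing the statistic of (A.165) requires only the sample mean and the unbiased standard deviation; the data vector and the tabulated critical value t_{9,0.05} ≈ 1.833 below are assumptions for illustration.

```python
import math

def t_statistic(samples, mu_G):
    # T = (mean - mu_G) / (S / sqrt(n)), S from the unbiased variance
    n = len(samples)
    mean = sum(samples) / n
    S = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    return (mean - mu_G) / (S / math.sqrt(n))

data = [1.1, 0.9, 1.2, 1.0, 1.3, 1.1, 0.8, 1.2, 1.0, 1.4]   # assumed samples
T = t_statistic(data, mu_G=1.0)
# against the tabulated t_{9, 0.05} ~ 1.833, this T does not reject
```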

A critical region for the two-sided alternative when the variance σ² is unknown is of
the form

C = {(ψ_1, …, ψ_n): t = (ψ̄ − μ_G)/(S/√n) ≤ c_2 or t ≥ c_1}   (A.166)

where c_1 and c_2 are chosen so that

α = P((ψ̄ − μ_G)/(S/√n) ≤ c_2 | (ψ̄ − μ_G)/(S/√n) ~ t_{n−1}) + P((ψ̄ − μ_G)/(S/√n) ≥ c_1 | (ψ̄ − μ_G)/(S/√n) ~ t_{n−1})   (A.167)

to give a test of significance α. The test T rejects for

T = (ψ̄ − μ_G)/(S/√n) ≤ −t_{n−1,α/2}  or  T = (ψ̄ − μ_G)/(S/√n) ≥ t_{n−1,α/2}   (A.168)

References

1. R.P. Jindal, Compact noise models for MOSFETs. IEEE Trans. Electron Devices 53(9),
2051–2061 (2006)
2. J. Ou, gm/ID based noise analysis for CMOS analog circuits, in Proceedings of IEEE
International Midwest Symposium on Circuits and Systems, pp. 1–4, 2011
3. W. Wattanapanitch, M. Fee, R. Sarpeshkar, An energy-efficient micropower neural recording
amplifier. IEEE Trans. Biomed. Circuits Syst. 1(2), 136–147 (2007)
4. M. Zamani, A. Demosthenous, Power optimization of neural frontend interfaces, in
Proceedings of IEEE International Symposium on Circuits and Systems, pp. 3008–3011,
2015
5. C.C. Liu et al., A 10-bit 50-MS/s SAR ADC with a monotonic capacitor switching proce-
dure. IEEE J. Solid-State Circuits 45(4), 731–740 (2010)
6. D. Zhang, C. Svensson, A. Alvandpour, Power consumption bounds for SAR ADCs, in
Proceedings of IEEE European Conference on Circuit Theory and Design, pp. 556–559,
2011
7. T. Yu, S. Kang, I. Hajj, T. Trick, Statistical modeling of VLSI circuit performances, in
Proceedings of IEEE International Conference on Computer-Aided Design, pp. 224–227,
1986
8. K. Krishna, S. Director, The linearized performance penalty (LPP) method for optimization
of parametric yield and its reliability. IEEE Trans. CAD Integr. Circuits Syst. 1557–1568
(1995)
9. MOS Model 9, available at http://www.nxp.com/models/mos-models/model-9.html
10. M. Pelgrom, A. Duinmaijer, A. Welbers, Matching properties of MOS transistors. IEEE J.
Solid-State Circuits 24(5), 1433–1439 (1989)
11. V. Litovski, M. Zwolinski, VLSI Circuit Simulation and Optimization (Kluwer Academic
Publishers, Dordrecht, 1997)
12. K. Kundert, Designer's Guide to Spice and Spectre (Kluwer Academic Publishers, Dordrecht,
1995)
13. J. Vlach, K. Singhal, Computer Methods for Circuit Analysis and Design (Van Nostrand
Reinhold, New York, 1983)
14. N. Higham, Accuracy and Stability of Numerical Algorithms (SIAM, Philadelphia, 1996)
15. W.J. McCalla, Fundamentals of Computer-Aided Circuit Simulation (Kluwer Academic
Publishers, Dordrecht, 1988)
16. F. Scheid, Schaum's Outline of Numerical Analysis (McGraw-Hill, New York, 1989)
17. E. Cheney, Introduction to Approximation Theory (American Mathematical Society,
Providence, 2000)
18. S. Director, R. Rohrer, The generalized adjoint network and network sensitivities. IEEE
Trans. Comput. Aided Des. 16(2), 318–323 (1969)
19. D. Hocevar, P. Yang, T. Trick, B. Epler, Transient sensitivity computation for MOSFET cir-
cuits. IEEE Trans. Comput. Aided Des. CAD-4, 609–620 (1985)
20. Y. Elcherif, P. Lin, Transient analysis and sensitivity computation in piecewise-linear circuits.
IEEE Trans. Circuits Syst. I 38, 1525–1533 (1991)
21. T. Nguyen, P. O'Brien, D. Winston, Transient sensitivity computation for transistor level
analysis and tuning, in Proceedings of IEEE International Conference on Computer-Aided
Design, pp. 120–123, 1999
22. K. Abadir, J. Magnus, Matrix Algebra (Cambridge University Press, Cambridge, 2005)
23. A. Papoulis, Probability, Random Variables, and Stochastic Processes (McGraw-Hill, New
York, 1991)
24. C. Gerald, Applied Numerical Analysis (Addison Wesley, Reading, 2003)
Index

A
Action potential, 3, 30, 78, 125
Adaptive boosting, 81
Adaptive duty-cycling, 54, 96
Ahuja-style frequency compensation, 27
Analog to digital converter, 5, 52, 69, 125
Artificial neural network, 78
Autocorrelation function, 102
Auxiliary amplifier, 5, 9, 13, 18, 21, 45, 48, 125

B
Band-limiting, 5, 21
Bartels–Stewart algorithm, 109
Bayesian clustering, 78
Boosting technique, 21
Bootstrap circuit, 44
Brain–Machine Interface, 2, 12, 88, 123

C
Channel leakage, 57
Cholesky factor, 109, 110
Circuit simulation, 102, 103, 108
Circuit yield, 1, 3, 7, 12, 14, 102, 110, 119, 124, 126
Classification, 3, 5, 13, 14, 17, 30, 78, 81–84, 86, 88, 89, 91, 114, 125, 127
Clock period, 38, 46
Coarse converter, 36, 62
Common-mode feedback, 20, 23, 26, 51, 55, 63
Common-mode rejection ratio, 23, 29, 40
Comparator, 34–36, 38, 39, 46–52, 57, 59–61, 66, 116
Comparing random variables, 98–100
Complementary MOS, 1, 4, 5, 7, 8, 12–14, 18, 27, 34, 40, 43, 50, 60, 65, 69, 96, 97, 113, 120, 123–126
Computer-aided design (CAD), 102
Continuous random variable, 108
Corner analysis, 102
Correlation
function, 98, 100
matrix, 106
of device parameters, 97, 103
spatial, 11, 100
Coupling capacitance, 52
Covariance, 98, 99, 106–108
Critical dimension, 11
Cross-coupled latch, 48

D
Differential algebraic equations, 104
Digital to analog converter, 39
Discrete-time integrator, 108
Discrete, 8, 53, 102, 106, 111
Distortion, 8, 9, 13, 18, 29, 30, 39, 40, 42, 44, 65, 80, 124
Drain-induced barrier lowering, 8
Dynamic latch, 46, 49–51, 57
Dynamic range, 3, 5, 6, 8, 13, 17, 18, 21, 29, 58, 62, 63, 65, 115, 125

E
Effective channel length, 58
Effective number of bits, 38, 63
Effective resolution bandwidth, 12, 25
Estimator, 157
Euclidean distance, 88
Expectation-maximization, 78


Exponential radial basis function, 84

F
Figure of merit, 13, 33, 34, 61, 66, 69, 125
Fine converter, 36
Fitting parameter, 101
Flash converters, 35, 38
Folded cascode amplifier, 24

G
Gain boosting, 21, 46
Gain-bandwidth product, 7, 23
Galerkin method, 100
Gate length, 8, 43, 53, 118

H
Hilbert space, 82, 84
Hot carrier effect, 43, 53
Hyperplane, 81–83, 87

I
Incidence matrix, 104
Integrated circuit, 40, 101, 102, 103, 108
Integrated-circuit, 101
Integrator, 40
Itô stochastic differential equations, 107, 108

J
Jacobian, 105

K
Karhunen–Loève expansion, 99, 100
Karush–Kuhn–Tucker conditions, 82, 85
Kernel, 13, 77, 78, 80, 82–84, 86, 87, 89, 91, 113, 125, 127
Kesler's construction, 13, 77, 78, 83, 125
K-means, 78
Kronecker delta, 84

L
Least significant bit, 39
Local field potentials, 19, 33
Low noise amplifier, 18, 19, 124
Low-noise amplifier, 13
Lyapunov equations, 109, 110

M
Mahalanobis distance, 88
Manufacturing variations, 100
Matching, 5, 6, 8, 10, 50, 60, 78, 88, 119
Matrix, 81, 84, 99, 104–110, 113
Measurement correction factor, 100
Mercer kernel, 82
Michigan probe, 2
Miller compensation, 26
Min-max problem, 112
Mobility, 8, 9
Modified nodal analysis, 104
Monte Carlo analysis, 102, 103, 107
MOSFET, 58

N
Neural spikes, 13, 18, 61, 69, 78, 79, 91, 114, 125, 127
Newton's method, 98, 104
Noise
bandwidth, 4, 5, 20, 21, 26, 29, 30, 52, 54, 57, 101
excess factor, 20, 57
margin, 5, 8, 80

O
Offset, 6, 10, 13, 18, 19, 30, 38, 39, 42, 44, 50, 59, 62, 82, 116, 124
Operational transconductance amplifier, 9, 18, 52
Ordinary differential equations, 101, 104, 106, 108

P
Parameter space, 81, 103, 111
Parameter vector, 147
Parametric yield, 102
Parametric yield optimization, 102
Pedestal voltage, 42
Phase margin, 21
Pipeline converters, 37, 38
Power per area, 14, 96, 97, 112, 117–119, 126
Principal component analysis, 78
Probability density function, 105, 111
Process variation, 11, 14, 96, 104, 110, 116, 119, 126
Programmable gain amplifier, 19, 52
Push-pull current mirror amplifier, 24

Q
Quadratic programming, 78, 91, 116
Quantizer, 35, 38, 62, 63

R
Random variability, 97
Random error, 11
Random gate length variability, 8, 116
Random intra-chip variability, 116
Random process, 11, 97–100
Random variables, 98, 100
Random vector, 105
Reliability, 11, 43, 111, 118
Residuals, 148
Runtime, 114

S
Sample and hold, 39, 58, 59, 66
Schur decomposition, 109
Sensors, 18, 124, 126, 127
Short-channel effects, 43, 96
Signal to noise and distortion ratio, 65
Signal-to-noise ratio, 3, 17, 33, 43, 119, 133
Significance level, 9
Slew rate, 8, 24, 45, 46
Spatial correlation, 100
Spike classifier, 13, 78, 79, 81, 91, 125
Spurious free dynamic range, 65
Standard deviation, 86, 98
Static latch, 47–49
Stationary random process, 98
Stochastic differential equations, 103, 105–108
Stochastic process, 98, 101, 102, 108
Subrange, 35
Substrate coupling, 26
Successive approximation register, 38, 39
Support vector machine, 13, 78, 79, 81, 83, 91, 125
Surface potential-based models, 98
Switched capacitor, 40, 41, 44, 118
System on chip, 1, 3, 9, 11, 12, 44, 96, 124
Systematic spatial variation, 98
Systematic variability, 5, 98

T
Telescopic cascode amplifier, 22–24, 26
Template matching, 78, 88
Threshold voltage, 42–44, 51, 80, 98, 100, 101
Threshold voltage-based models, 98
Time-interleaved systems, 38
Tolerance, 42, 102, 103, 117, 119
Total harmonic distortion, 13, 18, 29, 65, 124
Transconductor, 13, 18, 21, 30, 124
Transient analysis, 106, 107
Two-stage amplifier, 25–27, 46
Two-step converter, 36–38

U
Unbiased estimator, 157

V
Variable gain amplifier, 25
Vernier, 60, 62–64, 69
Very large-scale integrated circuit, 3
Voltage-to-time converter, 34, 61, 62, 67
Voltage variability, 2, 116, 126

W
Wafer, 98
Wide-sense stationary, 98
Wiener process, 108
Within-die, 103, 107
Worst-case design, 115, 119

Y
Yield, 3, 11, 12, 14, 84, 89, 96, 97, 103, 106, 110, 111, 116–120, 124, 126