Brain-Machine Interface
Circuits and Systems

Amir Zjajo
Delft University of Technology
Delft
The Netherlands
The author acknowledges the contributions of Dr. Rene van Leuken of Delft
University of Technology, and Dr. Carlo Galuzzi of Maastricht University.
Contents
1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Brain-Machine Interface: Circuits and Systems . . . . . . . . . . . . . . . . 2
1.2 Remarks on Current Design Practice. . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4 Organization of the Book. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2 Neural Signal Conditioning Circuits. . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Power-Efficient Neural Signal Conditioning Circuit. . . . . . . . . . . . . 18
2.3 Operational Amplifiers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4 Experimental Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3 Neural Signal Quantization Circuits. . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2 Low-Power A/D Converter Architectures . . . . . . . . . . . . . . . . . . . . . 34
3.3 A/D Converter Building Blocks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.1 Sample and Hold Circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.2 Bootstrap Switch Circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.3.3 Operational Amplifier Circuit. . . . . . . . . . . . . . . . . . . . . . . . . 45
3.3.4 Latched Comparator Circuit. . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4 Voltage-Domain SAR A/D Conversion. . . . . . . . . . . . . . . . . . . . . . . 52
3.5 Current-Domain SAR A/D Conversion. . . . . . . . . . . . . . . . . . . . . . . 58
3.6 Time-Domain Two-Step A/D Conversion . . . . . . . . . . . . . . . . . . . . . 60
3.7 Experimental Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Abbreviations

Symbols
eq Quantization error
e2 Noise power
E{.} Expected value
Econv Energy per conversion step
fclk Clock frequency
fin Input frequency
fp,n(di) Eigenfunctions of the covariance matrix
fS Sampling frequency
fsig Signal frequency
fspur Frequency of spurious tone
fT Transit frequency
f(x,t) Vector of noise intensities
FQ Function of the deterministic initial solution
g Conductance
gm Transconductance
Gi Interstage gain
Gm Transconductance
h Numerical integration step size, surface heat transfer coefficient
i Index, circuit node, transistor on the die
imax Number of iteration steps
I Current
Iamp Total amplifier current consumption
Idiff Diffusion current
ID Drain current
IDD Power supply current
Iref Reference current
j Index, circuit branch
J0 Jacobian of the initial data z0 evaluated at pi
k Boltzmann's constant, error correction coefficient, index
K Amplifier current gain, gain error correction coefficient
K(t) Variance-covariance matrix of (t)
L Channel length
Li Low-rank Cholesky factors
L(TX) Log-likelihood of parameter with respect to input set TX
m Index
M Number of terms, number of channels in BMI
n Index, number of circuit nodes, number of bits
N Number of bits
Naperture Aperture jitter limited resolution
P Power
p Process parameter
p(di,) Stochastic process corresponding to process parameter p
pX(x) Gaussian mixture model
p* Process parameter deviations from their corresponding nominal values
The best way to predict the future is to invent it. Medicine in the twentieth century relied primarily on pharmaceuticals that could chemically alter the action of neurons or other cells in the body, but twenty-first century health care may be defined more by electroceuticals: novel treatments that use pulses of electricity to regulate the activity of neurons, or devices that interface directly with our nerves.
Systems such as brain-machine interfaces (BMIs) detect the voltage changes in the brain that occur when neurons fire to trigger a thought or an action, and translate those signals into digital information that is conveyed to a machine, e.g., a prosthetic limb, a speech prosthesis, or a wheelchair.
Many promising technological advances are about to change our concept of healthcare, as well as the provision of medical care. For example, telemedicine, e-hospitals, and ubiquitous healthcare are enabled by emerging wireless broadband communication technology. While it initially became mainstream in portable devices such as notebook computers and smartphones, wireless communication (e.g., wireless sensor networks, body sensor networks) is evolving toward wearable and/or implantable solutions. The combination of two technologies, ultra-low-power sensor technology and ultra-low-power wireless communication technology, enables long-term continuous monitoring and feedback to medical professionals wherever needed.
Neural prosthesis systems enable interaction with neural cells either by recording, to facilitate early diagnosis and predict intended behavior before undertaking any preventive or corrective actions, or by stimulation, to prevent the onset of detrimental neural activity. Monitoring the activity of a large population of neurons in neurobiological tissue with high-density microelectrode arrays in a multichannel implantable BMI is a prerequisite for understanding cortical structures, and can lead to a better conception of severe brain disorders, such as Alzheimer's and Parkinson's diseases, epilepsy, and autism [1], or to re-establish sensory (e.g., hearing and vision) or motor (e.g., movement and speech) functions [2].
Metal-wire and micromachined silicon neural probes, such as the Michigan probe [3] or the Utah array [4], have aided the development of highly integrated multichannel recording devices with large channel counts, enabling the study of brain activity and the complex processing performed by neural systems in vivo [5-7]. Several studies have demonstrated that the understanding of certain brain functions can only be achieved by monitoring the electrical activity of large numbers of individual neurons in multiple brain areas at the same time [8]. Consequently, real-time acquisition from many parallel readout channels is needed both for the successful implementation of neural prosthetic devices and for a better understanding of fundamental neural circuits and connectivity patterns in the brain [9].
One of the main goals of current neural probe technologies [10-21] is to minimize the size of the implants while including as many recording sites as possible, with high spatial resolution. This enables the fabrication of devices that match the feature size and density of neural circuits [22], and facilitates the spike classification process [23, 24]. Because electrical recording from single neurons is invasive, monitoring large numbers of neurons using large implanted devices inevitably increases tissue damage; thus, there exists a trade-off between probe size and the number of recording sites. Although existing neural probes can record from many neurons, limitations in the interconnect technology constrain the number of recording sites that can be routed out of the probe [8].
The study of highly localized neural activity requires, besides implantable microelectrodes, electronic circuitry for accurately amplifying and conditioning the signals detected at the recording sites. While neural probes have become more compact and denser in order to monitor large populations of neurons, the interfacing electronic circuits have also become smaller and more capable of handling large numbers of parallel recording channels. Some of the challenges in the design of analog front-end circuits for neural recording are associated with the nature of the neural signals. These signals have amplitudes on the order of a few µV to several mV, and frequencies spanning from dc to a few kHz. Local field potentials (LFPs), representing averaged activity from small sets of neurons surrounding the recording sites, can be found in the low-frequency range (~1-300 Hz). On the other hand, action potentials (APs) or spikes, representing single-cell activity, are located in the higher frequency range (~300 Hz-10 kHz). Recording both LFPs and APs using implanted electrodes yields the most informative signals for studying neuronal communication and computation. Thus, according to the nature of a specific signal, the recording circuits have to be designed with sufficiently low input-referred noise [i.e., to achieve a high signal-to-noise ratio (SNR)] and sufficient gain and dynamic range.
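The LFP/AP band split described above can be sketched numerically. The fragment below is a minimal illustration assuming a 32 kS/s sampling rate, ideal band edges, and synthetic test tones; a real front-end performs this separation with analog filters.

```python
# Sketch: splitting a raw extracellular trace into the LFP band (~1-300 Hz)
# and AP band (~300 Hz-10 kHz) quoted in the text, using an ideal FFT-domain
# band mask. Sampling rate and test tones are illustrative assumptions.
import numpy as np

FS = 32_000  # sampling rate [S/s], assumed

def split_bands(raw, fs=FS):
    """Return (lfp, ap): ideally band-limited copies of a raw trace."""
    spec = np.fft.rfft(raw)
    f = np.fft.rfftfreq(len(raw), d=1.0 / fs)
    lfp = np.fft.irfft(spec * ((f >= 1) & (f < 300)), n=len(raw))
    ap = np.fft.irfft(spec * ((f >= 300) & (f <= 10_000)), n=len(raw))
    return lfp, ap

# One second of synthetic data: a 10 Hz "LFP" wave plus a 3 kHz "AP" tone.
t = np.arange(FS) / FS
raw = 100e-6 * np.sin(2 * np.pi * 10 * t) + 50e-6 * np.sin(2 * np.pi * 3000 * t)
lfp, ap = split_bands(raw)
```

Each output then carries only its own tone, which is the property the recording chain exploits when conditioning LFPs and APs separately.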
The raw data rates generated by simultaneous monitoring of hundreds or even thousands of neurons are large [25]. When sampled at 32 kS/s with 10-bit precision, 100 electrodes would generate a raw data rate of 32 Mb/s. Communicating such volumes of neuronal data over battery-powered wireless links, while maintaining reasonable battery life, is hardly possible with common methods of low-power wireless communication. Evidently, some form of data reduction or lossy data compression to reduce the raw waveform data capacity, e.g., the wavelet transform [26], must be applied. Alternatively, only significant features of the neuronal signal could be extracted, and the transmitted data could be limited to those features only [8], which may lead to an order-of-magnitude reduction in the required data rate [27]. Additionally, if the neuronal spikes are sorted on the chip [28], and mere notifications of spike events are transmitted to the host, another order-of-magnitude reduction can be achieved. Adapting power-efficient spike-sorting algorithms for very-large-scale integration (VLSI) can yield significant power savings, with only a limited accuracy loss [29, 30].
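The data-rate arithmetic above can be checked directly; in the sketch below, the two tenfold reduction factors for feature extraction and on-chip spike sorting are illustrative assumptions matching the order-of-magnitude claims in the text.

```python
# Back-of-the-envelope check of the quoted figures: 100 electrodes sampled
# at 32 kS/s with 10-bit precision, then two successive order-of-magnitude
# reductions (feature extraction, then spike-event notifications only).
N_CHANNELS = 100
FS = 32_000          # samples per second per channel
BITS = 10

raw_rate = N_CHANNELS * FS * BITS   # bits per second; 32 Mb/s as in the text
feature_rate = raw_rate // 10       # assumed ~10x gain from feature extraction
event_rate = feature_rate // 10     # assumed further ~10x from on-chip sorting

print(raw_rate / 1e6)               # 32.0 (Mb/s)
```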
Fig. 1.1 Block diagram of an M-channel neural recording system: recording electrodes, low-noise amplifiers (LNAs), band-pass filters, programmable-gain amplifiers, SAR A/D converter, digital signal processing (DSP) system, D/A converter, reconstruction filter, and stimulator electrodes

The block diagram of an M-channel neural recording system is illustrated in Fig. 1.1. With an increase in the range of applications and their functionalities, neuroprosthetic devices are evolving into closed-loop control systems [31], composed of a front-end neural recording interface and back-end neural signal processing, containing features such as local field potential measurement circuits [32] or spike detection circuits [33]. To evade the risk of infection, these systems
are implanted under the skin, while the recorded neural signals and the power required for the implant operation are transmitted wirelessly. If a battery with an energy capacity of 625 mAh at 1.5 V is used, a CMOS IC with 100 mW power consumption can only last for nine and a half hours. In contrast, most implantable biomedical devices should last more than 10 years, which limits the average system power consumption (when using the same battery) to roughly 10 µW. Proximity between electrodes and circuitry, and the increasing density of multichannel electrode arrays, create significant design challenges with respect to circuit miniaturization and power dissipation reduction of the recording system.
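The battery-life figures above follow from simple energy arithmetic; the sketch below converts the same 0.9375 Wh battery into both the quoted operating time at 100 mW and the average-power budget implied by a 10-year lifetime.

```python
# Reproducing the battery arithmetic from the text: a 625 mAh, 1.5 V battery
# powering a 100 mW CMOS IC, and the average power budget for >10 years.
CAPACITY_WH = 0.625 * 1.5               # 625 mAh at 1.5 V -> 0.9375 Wh

hours_at_100mw = CAPACITY_WH / 0.100    # ~9.4 h, "nine and a half hours"
ten_years_h = 10 * 365.25 * 24
budget_w = CAPACITY_WH / ten_years_h    # ~1.07e-5 W, i.e. roughly 10 uW
```

The ~10 µW result is what drives the aggressive power constraints on every block of the recording chain.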
Power density is limited to 0.8 mW/mm² [34] to prevent possible heat damage to the tissue surrounding the device (and, in addition, limited power consumption prolongs battery life and avoids recurrent battery-replacement surgeries). Furthermore, the space to host the system is restricted, to ensure minimal tissue damage and tissue displacement during implantation.
The signal quality in the neural interface front-end, besides the specifics of the electrode material and the electrode/tissue interface, is limited by the nature of the biopotential signal and its biological background noise, dictating system resource constraints such as power, area, and bandwidth. The BMI architecture additionally includes a microstimulation module to apply stimulation signals to the brain neural tissue. Currently, multielectrode arrays contain tens to hundreds of electrodes, a number projected to double every seven years [35]. When a neuron fires an action potential, the cell membrane becomes depolarized by the opening of voltage-controlled ion channels, which leads to a flow of current both inside and outside the neuron. Since the extracellular medium is resistive [36], the extracellular potential is approximately proportional to the current across the neuron membrane [37]. The membrane roughly behaves like an RC circuit, and most current flows through the membrane capacitance [38].
The neural data acquired by the recording electrodes are conditioned using analog circuits. The electrode is characterized by its charge density and impedance characteristics (e.g., a 36 µm diameter probe (~1000 µm²) may have a capacitance of 200 pF, equivalent to 80 kΩ impedance at 10 kHz), which determine the amount of noise added to the signal (e.g., 7 µVrms for a 10 kHz recording bandwidth). As a result of the small amplitude of neural signals (typically ranging from 10 to 500 µV and containing data up to ~10 kHz), and the high impedance of the electrode-tissue interface, low-noise amplification (LNA), band-pass filtering,
Fig. 1.2 a Correction approach for mixed-signal and analog circuits, b mixed-signal solution (digital error estimation, analog error correction), c alternative mixed-signal scheme (error estimation and correction are done digitally)
demand for reduced circuit offset. Initial work on digital signal-correction processing started in the early nineties and focused on offset attenuation or dispersion. The next priority became area scaling for analog functions, to keep up with the pace at which digital cost-per-function was falling [42]. Lately, the main focus is on correcting analog device characteristics, which have become impaired as a result of aggressive feature-size reduction and area scaling. However, efficient digital signal-correction processing of analog circuits is only possible if their analog behavior is sufficiently well characterized. As a consequence, an appropriate model, as well as its corresponding parameters, has to be identified. The model is based on a priori knowledge about the system; the key parameters that influence the system, and their behavior over time, are typical examples. Nevertheless, in principle, the model itself can be derived and modified adaptively, which is the central topic of adaptive control theory. The parameters of the model can be tuned during the fabrication of the chip or during its operation. Since fabrication-based correction methods are limited, algorithms that adapt to a non-stationary environment during operation have to be employed.
In this section, the most challenging design issues for analog circuits in deep-submicron technologies are reviewed, such as counteracting the degradation of analog performance caused by the requirement for biasing at lower operating voltages, obtaining high dynamic range with low supply voltages, and ensuring good matching for low offset. Additionally, the subsequent remedies to improve the performance of analog circuits and data converters, by correcting or calibrating the static and possibly the dynamic limitations through calibration techniques, are briefly discussed as well.
Fig. 1.3 a Trend of analog features in CMOS technologies: line width, supply voltage, and gain-bandwidth product (GBW) versus year. b Gain-bandwidth product versus drain current in two technological nodes (0.25 µm and 90 nm, for load capacitances CL of 100 fF and 200 fF)
Obtaining high dynamic range, with low supply voltages and low power dissipation, in ultra-deep-submicron CMOS technology is a major challenge. The key limitation of analog circuits is that they operate with electrical variables and not simply with discrete numbers, which, in circuit implementations, give rise to a beneficial noise margin. On the contrary, the accuracy of analog circuits fundamentally relies on matching between components, low noise, low offset, and low distortion.
With the reduction of the supply voltage, to ensure suitable overdrive voltage for keeping transistors in saturation, the signal swing is low if high resolution is required, even if the number of stacked transistors is kept at the minimum. Low voltage is also problematic for driving CMOS switches, especially the ones connected to signal nodes, as the on-resistance can become very high or, in the limit, the switch does not close at all in some interval of the input amplitude.
In general, to achieve high-gain operation, high output impedance is necessary, i.e., the drain current should vary only slightly with the applied VDS. With transistor scaling, the drain asserts its influence more strongly, due to the growing proximity of the gate and drain connections, and increases the sensitivity of the drain current to the drain voltage. The rapid degradation of the output resistance at gate lengths below 0.1 µm and the saturation of gm reduce the device intrinsic gain gm·ro.
As transistor size is reduced, the fields in the channel increase and the dopant impurity levels increase. Both changes reduce the carrier mobility, and hence the transconductance gm. Typically, the desired high transconductance value is obtained at the cost of an increased bias current. However, for very short channels the carrier velocity quickly reaches the saturation limit, at which the transconductance also saturates, becoming independent of gate length or bias: gm = WeffCoxvsat/2.
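As a quick numerical illustration of this limit, the velocity-saturated transconductance can be evaluated directly; the Cox and vsat values below are assumed, order-of-magnitude numbers for a deep-submicron node, not figures from the text.

```python
# Velocity-saturated transconductance, g_m = W_eff * C_ox * v_sat / 2, as
# given in the text: independent of gate length and bias current.
C_OX = 8e-3    # gate-oxide capacitance per unit area [F/m^2], assumed
V_SAT = 1e5    # carrier saturation velocity in silicon [m/s], assumed

def gm_velocity_saturated(width_m):
    """Transconductance limit for a given effective gate width in meters."""
    return width_m * C_OX * V_SAT / 2

gm_1um = gm_velocity_saturated(1e-6)   # ~0.4 mS for a 1 um wide device
```

Note that only the width enters the expression, which is why wide devices (with their parasitic-capacitance penalty) are the remaining lever for transconductance.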
As channel lengths are reduced without a proportional reduction in drain voltage, raising the electric field in the channel, the result is velocity saturation of the carriers, limiting the current and the transconductance. A limited transconductance is problematic for analog design: to obtain high gain it is necessary to use wide transistors, at the cost of increased parasitic capacitances and, consequently, limitations in bandwidth and slew rate. Even with longer channel lengths, obtaining gain in deep-submicron technologies is difficult; it is typically necessary to use cascode structures with stacks of transistors, or circuits with positive feedback. As transistor dimension reduction continues, the intrinsic gain keeps decreasing due to a lower output resistance, a result of drain-induced barrier lowering and hot-carrier impact ionization. To make devices smaller, junction design has become more complex, leading to higher doping levels, shallower junctions, halo doping, etc., all to decrease drain-induced barrier lowering. To keep these complex junctions in place, the annealing steps formerly used to remove damage and electrically active defects must be curtailed, increasing junction leakage. Heavier doping is also associated with thinner depletion layers and more recombination centers, which result in increased leakage current, even without lattice damage. In addition, gate leakage currents in very thin-oxide devices will set an upper bound on the effective output resistance attainable via circuit techniques.
Fig. 1.4 a Scaling of gate width and transistor capacitances (and transit frequency fT) versus gate length. b Conversion frequency fc versus drain current for four technological nodes (90 nm, 0.13 µm, 0.18 µm, and 0.25 µm), with regions a, b, and c indicated
In the region where the current is less than this value (region a), the conversion frequency increases with an increase of the sink current. Similarly, in the region where the current is higher than this value (region c), the conversion frequency decreases with an increase of the sink current. There are two reasons why this characteristic is exhibited: in the low-current region, gm is proportional to the sink current, and the parasitic capacitances are smaller than the signal capacitance. Around the peak, at least one of the parasitic capacitances becomes equal to the signal capacitance. In the region where the current is larger than that value, both parasitic capacitances become larger than the signal capacitance, and the conversion frequency decreases with an increase of the sink current.
The offset of any analog circuit and the static accuracy of data converters critically depend on the matching between nominally identical devices. With transistors becoming smaller, the number of atoms in the silicon that produce many of the transistor's properties is becoming fewer, with the result that control of dopant numbers and placement is more erratic. During chip manufacturing, random process variations affect all transistor dimensions: length, width, junction depths, oxide thickness, etc., and they become a greater percentage of the overall transistor size as the transistor scales. The stochastic nature of the physical and chemical fabrication steps causes a random error in electrical parameters that gives rise to a time-independent difference between identically designed elements. The error typically decreases with the square root of the device area. Transistor matching properties are improved with a thinner oxide [43]. Nevertheless, when the oxide thickness is reduced to a few atomic layers, quantum effects will dominate and matching will degrade. Since many circuit techniques exploit the equality of two components, it is important, for a given process, to obtain the best matching, especially for critical devices. Some of the rules that have to be followed to ensure good matching are: firstly, devices to be matched should have the same structure and use the same materials; secondly, the temperature of matched components should be the same, i.e., the devices to be matched should be located on the same isotherm, which is obtained by symmetrical placement with respect to the dissipative devices; thirdly, the distance between matched devices should be minimal, for maximum spatial correlation of fluctuating physical parameters, and common-centroid geometries should be used to cancel parameter gradients at the first order. Similarly, the orientation of devices on chip should be the same, to eliminate asymmetries due to anisotropic fabrication steps or to the anisotropy of the silicon itself; and lastly, the surroundings in the layout, possibly improved by dummy structures, should be the same, to avoid border mismatches.
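The area dependence of random mismatch can be made concrete with a small Monte-Carlo sketch of Pelgrom's law for threshold-voltage mismatch, sigma(dVT) = A_VT / sqrt(W*L) [43]; the mismatch coefficient A_VT and the device sizes below are assumed illustrative values.

```python
# Monte-Carlo sketch of the matching rule above: random mismatch between
# two nominally identical transistors shrinks with the square root of the
# gate area (Pelgrom's law). A_VT is an assumed value, not a measured one.
import numpy as np

A_VT = 3.5e-3 * 1e-6      # mismatch coefficient [V*m], ~3.5 mV*um assumed

def sigma_dvt(w_m, l_m):
    """Standard deviation of threshold mismatch for a W x L device."""
    return A_VT / np.sqrt(w_m * l_m)

rng = np.random.default_rng(0)
small = rng.normal(0, sigma_dvt(0.2e-6, 0.1e-6), 100_000)  # 0.2 x 0.1 um
large = rng.normal(0, sigma_dvt(0.8e-6, 0.4e-6), 100_000)  # 16x the area
```

A 16x larger gate area reduces the mismatch spread by a factor of four, which is why critical matched pairs are drawn much larger than minimum size.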
The use of digital enhancement techniques in A/D converters (i.e., foreground or background calibration) reduces the need for expensive technologies with special fabrication steps; a side advantage is that the cost of parts is reduced while maintaining good yield, reliability, and long-term stability. Foreground calibration interrupts the normal operation of the converter to perform the trimming of elements, or the mismatch measurement, in a dedicated calibration cycle, normally executed at power-on or during periods of inactivity of the circuit. Any miscalibration, or sudden environmental changes such as power supply or temperature, may make the measured errors invalid. Therefore, devices that operate for long periods require periodic extra calibration cycles. The input switch restores the data converter to normal operation after the mismatch measurement, and every conversion period the logic uses the output of the A/D converter to address the memory that contains the correction quantity. In order to optimize the memory size, the stored data should be of minimum wordlength, which depends on technology accuracy and the expected A/D linearity. The digital measurement of errors, which allows for calibration by digital signal processing, can be at the element, block, or entire-converter level. The calibration parameters are stored in memories but, in contrast with the trimming case, the content of the memories is frequently used, as it is the input of the digital processor.
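The foreground scheme just described can be sketched in a few lines: a dedicated cycle applies known codes, per-code corrections are written into a small memory, and during normal operation every raw output code addresses that memory. The 4-bit converter and its static error model (a hypothetical swap of two decision levels) are illustrative only, not from the text.

```python
# Minimal sketch of foreground calibration with a correction memory.
N_BITS = 4
CODES = 1 << N_BITS

def raw_adc(ideal_code):
    """Hypothetical imperfect converter: codes 6 and 7 are interchanged."""
    return {6: 7, 7: 6}.get(ideal_code, ideal_code)

# Foreground cycle (normal conversion interrupted): sweep known inputs and
# store the correction quantity for each raw output code.
correction = [0] * CODES
for ideal in range(CODES):
    raw = raw_adc(ideal)
    correction[raw] = ideal - raw

def corrected_adc(ideal_code):
    """Normal operation: the raw output addresses the correction memory."""
    raw = raw_adc(ideal_code)
    return raw + correction[raw]
```

After the calibration cycle, every code maps back to its ideal value; the memory wordlength needed is set by the largest correction that must be stored, as noted above.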
Methods using background calibration work during the normal operation of the converter, using extra circuitry that functions continuously and synchronously with the converter. Often these circuits use hardware redundancy to perform background calibration on the fraction of the architecture that is temporarily not in use. However, since the use of redundant hardware is effective but costs silicon area and power consumption, other methods aim at obtaining the same functionality by borrowing a small fraction of the sampled-data circuit operation to perform the self-calibration.
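One common way to calibrate without ever pausing the signal path, sketched below, is correlation against a known pseudo-random dither injected alongside the normal input; all numbers (the 1.05 gain error, dither amplitude, sample count) are illustrative assumptions, not values from the text.

```python
# Background-calibration sketch: a known +/-1 pseudo-random dither of small
# amplitude is added at the converter input during normal operation, and
# correlating the output with it estimates the unknown analog gain error,
# which is then divided out digitally.
import numpy as np

rng = np.random.default_rng(7)
TRUE_GAIN = 1.05                      # unknown analog gain error, assumed
DITHER = 0.05                         # known dither amplitude, assumed
N = 2_000_000                         # samples observed during operation

pn = rng.integers(0, 2, N) * 2 - 1    # known +/-1 pseudo-random sequence
x = rng.uniform(-1.0, 1.0, N)         # normal (unknown) input signal
y = TRUE_GAIN * (x + DITHER * pn)     # converter output seen digitally

# x is uncorrelated with pn, so E[y * pn] = TRUE_GAIN * DITHER.
g_hat = np.mean(y * pn) / DITHER
corrected = y / g_hat                 # digital gain correction
```

Because the estimate relies only on the statistics of the running signal, it tracks slow drifts in supply or temperature, which is exactly the weakness of a one-time foreground cycle.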
1.3 Motivation
large-scale neural spike data classification can be obtained with a low-power (less than 41 µW, corresponding to a power density of 15.5 µW/mm²), compact, low-resource-usage structure (31k logic gates, resulting in a 2.64 mm² area).
In Chap. 5, we develop a yield-constrained sequential power-per-area (PPA) minimization framework based on a dual quadratic program, which is applied to multivariable optimization in neural interface design under bounded process variation influences. In the proposed algorithm, we create a sequence of minimizations of the feasible PPA regions with iteratively generated low-dimensional subspaces, while accounting for the impact of area scaling. With a two-step estimation flow, the constrained multicriteria optimization is converted into an optimization with a single objective function, and repeated estimation of non-critical solutions is evaded. Consequently, the yield constraint is only active as the optimization concludes, eliminating the problem of overdesign in the worst-case approach. The PPA assignment is interleaved, at any design point, with the configuration selection, which optimally redistributes the overall index of circuit quality to minimize the total PPA ratio. The proposed method can be used with any variability model and, subsequently, any correlation model, and is not restricted by any particular performance constraint. The experimental results, obtained on the multichannel neural recording interface circuits implemented in 90 nm CMOS technology, demonstrate power savings of up to 26% and area savings of up to 22%, without yield penalty.
In Chap. 6, the main conclusions are summarized and recommendations for further research are presented.
References
12. R.H. Olsson et al., Band-tunable and multiplexed integrated circuits for simultaneous recording and stimulation with microelectrode arrays. IEEE Trans. Biomed. Eng. 52(7), 1303-1311 (2005)
13. T.J. Blanche, M.A. Spacek, J.F. Hetke, N.V. Swindale, Polytrodes: high-density silicon electrode arrays for large-scale multiunit recording. J. Neurophysiol. 93(5), 2987-3000 (2005)
14. R.J. Vetter et al., in Development of a Microscale Implantable Neural Interface (MINI) Probe System. Proceedings of International Conference of Engineering in Medicine and Biology Society, pp. 7341-7344, 2005
15. G.E. Perlin, K.D. Wise, An ultra compact integrated front end for wireless neural recording microsystems. J. Microelectromech. Syst. 19(6), 1409-1421 (2010)
16. P. Ruther et al., in Compact Wireless Neural Recording System for Small Animals using Silicon-Based Probe Arrays. Proceedings of International Conference of Engineering in Medicine and Biology Society, pp. 2284-2287, 2011
17. T. Torfs et al., Two-dimensional multichannel neural probes with electronic depth control. IEEE Trans. Biomed. Circ. Syst. 5(5), 403-412 (2011)
18. U.G. Hofmann et al., A novel high channel-count system for acute multisite neuronal recordings. IEEE Trans. Biomed. Eng. 53(8), 1672-1677 (2006)
19. P. Norlin et al., A 32-site neural recording probe fabricated by DRIE of SOI substrates. J. Micromech. Microeng. 12(4), 414 (2002)
20. J. Du et al., Multiplexed, high density electrophysiology with nanofabricated neural probes. PLoS ONE 6(10), e26204 (2011)
21. K. Faligkas, L.B. Leene, T.G. Constandinou, in A Novel Neural Recording System Utilising Continuous Time Energy Based Compression. Proceedings of International Symposium on Circuits and Systems, pp. 3000-3003, 2015
22. J.T. Robinson, M. Jorgolli, H. Park, Nanowire electrodes for high-density stimulation and measurement of neural circuits. Frontiers Neural Circ. 7(38), 2013
23. C.M. Gray, P.E. Maldonado, M. Wilson, B. McNaughton, Tetrodes markedly improve the reliability and yield of multiple single-unit isolation from multi-unit recordings in cat striate cortex. J. Neurosci. Methods 63(1-2), 43-54 (1995)
24. K.D. Harris, D.A. Henze, J. Csicsvari, H. Hirase, G. Buzsáki, Accuracy of tetrode spike separation as determined by simultaneous intracellular and extracellular measurements. J. Neurophysiol. 84(1), 401-414 (2000)
25. R.R. Harrison, in A Low-Power Integrated Circuit for Adaptive Detection of Action Potentials in Noisy Signals. Proceedings of Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 3325-3328, 2003
26. K. Oweiss, K. Thomson, D. Anderson, in A Systems Approach for Real-Time Data Compression in Advanced Brain-Machine Interfaces. Proceedings of IEEE International Conference on Neural Engineering, pp. 62-65, 2005
27. Y. Perelman, R. Ginosar, Analog frontend for multichannel neuronal recording system with spike and LFP separation. J. Neurosci. Methods 153, 21-26 (2006)
28. Z.S. Zumsteg et al., in Power Feasibility of Implantable Digital Spike-Sorting Circuits for Neural Prosthetic Systems. Proceedings of Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 4237-4240, 2004
29. A. Zviagintsev, Y. Perelman, R. Ginosar, in Low Power Architectures for Spike Sorting. Proceedings of IEEE International Conference on Neural Engineering, pp. 162-165, 2005
30. A. Zviagintsev, Y. Perelman, R. Ginosar, in Low Power Spike Detection and Alignment Algorithm. Proceedings of IEEE International Conference on Neural Engineering, pp. 317-320, 2005
31. B. Gosselin, Recent advances in neural recording microsystems. Sensors 11(5), 4572-4597 (2011)
32. R.R. Harrison, G. Santhanam, K.V. Shenoy, in Local Field Potential Measurement with Low-Power Analog Integrated Circuit. International Conference of IEEE Engineering in Medicine and Biology Society, vol. 2, pp. 4067-4070, 2004
33. R.R. Harrison et al., A low-power integrated circuit for a wireless 100-electrode neural recording system. IEEE J. Solid-State Circ. 42(1), 123-133 (2007)
34. S. Kim, R. Normann, R. Harrison, F. Solzbacher, in Preliminary Study of the Thermal Impact of a Microelectrode Array Implanted in the Brain. Proceedings of Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 2986-2989, 2006
35. I.H. Stevenson, K.P. Kording, How advances in neural recording affect data analysis. Nat. Neurosci. 14(2), 139-142 (2011)
36. C.I. de Zeeuw et al., Spatiotemporal firing patterns in the cerebellum. Nat. Rev. Neurosci. 12(6), 327-344 (2011)
37. F. Kölbl et al., in In Vivo Electrical Characterization of Deep Brain Electrode and Impact on Bioamplifier Design. IEEE Biomedical Circuits and Systems Conference, pp. 210-213, 2010
38. A.C. West, J. Newman, Current distributions on recessed electrodes. J. Electrochem. Soc. 138(6), 1620-1625 (1991)
39. S.K. Arfin, Low power circuits and systems for wireless neural stimulation, PhD Thesis, MIT, 2011
40. K.H. Kim, S.J. Kim, A wavelet-based method for action potential detection from extracellular neural signal recording with low signal-to-noise ratio. IEEE Trans. Biomed. Eng. 50, 999-1011 (2003)
41. K. Okada, S. Kousai (eds.), Digitally-Assisted Analog and RF CMOS Circuit Design for Software-Defined Radio (Springer Verlag GmbH, Berlin, 2011)
42. M. Verhelst, B. Murmann, Area scaling analysis of CMOS ADCs. IEEE Electron. Lett. 48(6), 314-315 (2012)
43. M. Pelgrom, A. Duinmaijer, A. Welbers, Matching properties of MOS transistors. IEEE J. Solid-State Circ. 24(5), 1433-1439 (1989)
Chapter 2
Neural Signal Conditioning Circuits
2.1 Introduction
Neural recording in vivo demands compliance with severe safety requirements. For example, the maximum temperature increase due to the operation of the cortical implant in any surrounding brain tissue should be kept below 1 °C [1].
The limited total power budget imposes strict specifications on the circuit design of the low-noise analog front-end and the high-speed circuits in the wideband wireless link, which transmits the recorded data to a base station located outside the skull. The design constraints become more pronounced as the number of recording sites increases to several hundred for typical multi-electrode arrays.
Front-end neural amplifiers are crucial building blocks in implantable cortical microsystems. Low-power and low-noise operation, a stable dc interface with the sensors (microprobes), and small silicon area are the main design specifications of these amplifiers. The power dissipation is dictated by the tolerable input-referred thermal noise of the amplifier, where the trade-off is expressed in terms of the noise efficiency factor [2]. For an ideal thermal-noise-limited amplifier with a constant bandwidth and supply voltage, the power of the amplifier scales as 1/v_n^2, where v_n is the input-referred noise of the amplifier. This relationship shows the steep power cost of achieving low-noise performance in an amplifier.
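As a numerical illustration of this trade-off, the noise efficiency factor of [2] can be evaluated directly. The sketch below is illustrative only: the bias currents, noise levels, and temperature are assumed values, not parameters of the design described in this chapter.

```python
import math

# Noise efficiency factor (NEF) after [2]:
#   NEF = v_ni,rms * sqrt(2 * I_tot / (pi * U_T * 4kT * BW))
# Constants and example values are assumptions for illustration.
k = 1.380649e-23                    # Boltzmann constant [J/K]
T = 310.0                           # tissue temperature [K]
U_T = k * T / 1.602176634e-19       # thermal voltage [V], ~26.7 mV

def nef(v_rms_in, i_tot, bw):
    """Noise efficiency factor of an amplifier."""
    return v_rms_in * math.sqrt(2.0 * i_tot / (math.pi * U_T * 4.0 * k * T * bw))

# At a fixed NEF, bandwidth and supply voltage, halving the input-referred
# noise requires four times the supply current:
print(nef(1.55e-6, 4e-6, 20e3) / nef(3.1e-6, 1e-6, 20e3))  # -> 1.0
```

The quadratic dependence of supply current on the noise target is exactly the 1/v_n^2 power scaling stated above.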
In this chapter, we introduce a novel, low-power neural recording interface system with a capacitive-feedback low-noise amplifier and a capacitive-attenuation band-pass filter. The capacitive-feedback amplifier offers a low-offset, low-distortion solution with an optimal power–noise trade-off. Similarly, the capacitive-attenuation band-pass filter provides a wide tuning range and a low-power realization, while allowing simple extension of the transconductors' linear range and, consequently, ensuring low harmonic distortion. The low-noise amplifier and band-pass filter circuit are realized in a 65 nm CMOS technology, and consume 1.15 μW and 390 nW, respectively. The fully differential low-noise amplifier achieves 40 dB closed-loop gain and occupies an area of 0.04 mm². The input-referred noise is 3.1 μVrms over the 0.1–20 kHz operating bandwidth. Distortion is below 2% total harmonic distortion (THD) for typical extracellular neural signals (smaller than 10 mV peak-to-peak). The capacitive-attenuation band-pass filter with first-order slopes achieves 65 dB dynamic range, 210 mVrms at 2% THD, and 140 μVrms total integrated output noise.
The chapter is organized as follows: Sect. 2.2 focuses on the signal conditioning circuit details, while Sect. 2.3 offers a brief overview of operational amplifier circuit concepts. The experimental results obtained are presented in Sect. 2.4. Finally, Sect. 2.5 provides a summary and the main conclusions.
The neural spikes, typically ranging from 10 to 500 μV and containing data up to ~20 kHz, are amplified with the low-noise neural amplifier (LNA) illustrated in Fig. 2.1, where the Vref voltage designates the node connected to the reference electrode. The amplifier A1 is designed based on an operational transconductance
2.2 Power-Efficient Neural Signal Conditioning Circuit

Fig. 2.1 Schematic of the signal conditioning circuit including low-noise amplifier, band-pass filter, and programmable-gain amplifier
than the active loads, the gm of the cascode transistors is maximized, boosting the dc gain, while their saturation voltage is reduced, allowing a larger saturation voltage for the active loads without exceeding the voltage headroom. The bias current of the LNA can be varied to adapt its noise per unit bandwidth.
To keep the overall bandwidth constant when the bias current of the gain stage is varied, a band-pass filter [8] (Fig. 2.3) is added at the output of the LNA. The high gain provided by the LNA stage alleviates the noise-floor requirements of this bandwidth-limiting stage. The total integrated output voltage noise of the filter depends on the linear range of the transconductors Gm1 and Gm2 (Fig. 2.4), the ratio of the attenuator capacitances A, and the unit capacitance C. The linear range of the Gm is effectively improved by attenuating the input. In the high-pass stage, the signal is attenuated by a factor of A + 1, and the full capacitance of (A + 1)C is then utilized for filtering with Gm1. In the low-pass stage, a gain of A + 1 is applied to signals in the passband. A capacitance C/(A + 1) is added in parallel with the attenuating capacitances to increase the filtering capacitance.
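The effect of the attenuator on a single Gm-C stage can be sketched numerically. The values of A, C, and Gm1 below are assumptions chosen for illustration, not the component values of the fabricated filter; the corner-frequency expression f_c = Gm/(2π·C_total) is the standard first-order Gm-C relation.

```python
import math

def gmc_corner(gm, c_total):
    """Corner frequency of a first-order Gm-C stage [Hz]."""
    return gm / (2.0 * math.pi * c_total)

A = 7        # attenuation ratio: input attenuated by A + 1 = 8 (assumed)
C = 1e-12    # unit capacitance [F] (assumed)
gm1 = 5e-12  # high-pass transconductor [S] (assumed)

# High-pass stage: the full (A+1)*C is used for filtering with Gm1 ...
f_hp = gmc_corner(gm1, (A + 1) * C)
# ... while the transconductor input sees a signal smaller by (A+1),
# extending its effective linear range by the same factor.
print(f"high-pass corner ~ {f_hp:.3f} Hz, linear range extended x{A + 1}")
```

The same unit-capacitance budget thus buys both a sub-hertz corner and an (A + 1)-fold extension of the linear range.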
2.3 Operational Amplifiers
Operating on the edge of the performance envelope, op amps exhibit intense trade-offs among dynamic range, linearity, settling speed, stability, and power consumption. As a result, accuracy and speed are often dictated by the performance of these amplifiers.
Amplifiers with a single gain stage have a high output impedance providing an adequate dc gain, which can be further increased with gain-boosting techniques. The single-stage architecture offers a large bandwidth and a good phase margin with
the saturation voltage of a transistor. With this maximum possible output swing, the input common-mode range is zero. In practice, some input common-mode range, which reduces the output swing, always has to be reserved to permit inaccuracy and settling transients in the signal common-mode levels. The high-speed capability of the amplifier is the result of the presence of only n-channel transistors in the signal path and of the relatively small capacitance at the source of the cascode transistors. The gain-bandwidth product of the amplifier is given by GBW = gm1/CL, where gm1 is the transconductance of transistors T1 and CL is the load capacitance. Thus, the GBW is limited by the load capacitance.
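A quick numerical reading of GBW = gm1/CL (values assumed for illustration):

```python
import math

# Gain-bandwidth product of the single-stage amplifier: GBW = gm1/CL
# in rad/s, i.e. gm1/(2*pi*CL) in Hz. gm1 and CL are assumed values.
def gbw_hz(gm1, cl):
    return gm1 / (2.0 * math.pi * cl)

gm1 = 200e-6  # input-pair transconductance [S]
CL = 2e-12    # load capacitance [F]
print(f"GBW ~ {gbw_hz(gm1, CL) / 1e6:.1f} MHz")
# Doubling CL halves the GBW: the load capacitance sets the speed.
```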
Due to its simple topology and dimensioning, the telescopic cascode amplifier is preferred if its output swing is large enough for the specific application. The output signal swing of this architecture has been widened by driving the transistors T7–T8 into the linear region [10]. In order to preserve the good common-mode rejection ratio and power-supply rejection ratio properties of the topology, additional feedback circuits for compensation have been added in these variations. The telescopic cascode amplifier has low current consumption, relatively high gain, low noise, and very fast operation. However, as it has five stacked transistors, the topology is not suitable for low supply voltages.
The folded cascode amplifier topology [11] is shown in Fig. 2.6. The swing of this design is constrained by its cascoded output stage. It provides a larger output swing and input common-mode range than the telescopic amplifier, with the same dc gain and without major loss of speed. The output swing is VDD − 4VDS,SAT and is not linked to the input common-mode range, which is VDD − VT − 2VDS,SAT. The second pole of this amplifier is located at gm7/Cpar, where gm7 is the transconductance of T7 and Cpar is the sum of the parasitic capacitances of transistors T1, T7, and T9 at the source node of transistor T7. The frequency response of this amplifier is degraded compared with that of the telescopic cascode amplifier because of the smaller transconductance of the p-channel device and the larger parasitic capacitance. To assure symmetrical slewing, the output-stage current is usually made
equal to that of the input stage. The GBW of the folded cascode amplifier is likewise given by gm1/CL.
The open-loop dc gain of amplifiers with cascode transistors can be boosted by regulating the gate voltages of the cascode transistors [12]. The regulation is realized by adding an extra gain stage, which reduces the feedback from the output to the drain of the input transistors. In this way, the dc gain of the amplifier can be increased by several orders of magnitude. The increase in power and chip area can be kept very small with an appropriate feedback amplifier architecture [12]. The current consumption of the folded cascode is doubled compared to the telescopic cascode amplifier, although the output voltage swing is increased since there are only four stacked transistors. The noise of the folded cascode is slightly higher than in the telescopic cascode as a result of the added noise from the current-source transistors T9 and T10. In addition, the folded cascode has a slightly smaller dc gain due to the parallel combination of the output resistances of transistors T1 and T9.
A push–pull current-mirror amplifier, shown in Fig. 2.7, has much better slew-rate properties and a potentially larger bandwidth and dc gain than the folded cascode amplifier. The slew rate and dc gain depend on the current-mirror ratio K, which is typically between one and three. However, a too large current-mirror ratio increases the parasitic capacitance at the gates of the transistors T12 and T13, pushing the non-dominant pole to lower frequencies and limiting the achievable GBW. The non-dominant pole of the current-mirror amplifier is much lower than that of the folded cascode and telescopic amplifiers due to the larger parasitic capacitance at the drains of the input transistors.
The noise and current consumption of the current-mirror amplifier are larger than in the telescopic cascode amplifier or in the folded cascode amplifier. A current-mirror amplifier with dynamic biasing [13] can be used to base the amplifier biasing purely on its small-signal behavior, since slewing is then not a limitation. In dynamic biasing, the biasing current of the operational amplifier is controlled
on the basis of the differential input signal. With large differential input signals, the biasing current is increased to speed up the output settling. Hence, no slew-rate limiting occurs, and the GBW requirement is relaxed. As the settling proceeds, the input voltage decreases and the biasing current is reduced. The biasing current needs to be kept only at a level that provides enough GBW for adequate small-signal performance. In addition to relaxed GBW requirements, the reduced static current consumption makes the design of a high-dc-gain amplifier easier. With very low supply voltages, the use of cascode output stages limits the available output signal swing considerably. Hence, two-stage operational amplifiers are often used, in which the operational amplifier gain is divided into two stages, where the latter stage is typically a common-source output stage. Unfortunately, with the same power dissipation, the speed of two-stage operational amplifiers is typically lower than that of single-stage operational amplifiers.
Of the several alternative two-stage amplifiers, Fig. 2.8 shows a simple Miller-compensated amplifier [14]. With all the transistors in the output stage of this amplifier placed in the saturation region, it has an output swing of VDD − VDS,SAT. Since the non-dominant pole, which arises from the output node, is determined dominantly by the explicit load capacitance, the amplifier has a compromised frequency response.
The gain-bandwidth product of a Miller-compensated amplifier is given approximately by GBW = gm1/CC, where gm1 is the transconductance of T1. In general, the open-loop dc gain of the basic configuration is not large enough for high-resolution applications. The gain can be enhanced by cascoding, which has, however, a negative effect on the signal swing and bandwidth. Another drawback of this architecture is a poor power-supply rejection at high frequencies because of the connection of VDD through the gate–source capacitance CGS5,6 of T5 and T6 and CC. The noise properties of the two-stage Miller-compensated operational
Fig. 2.9 Two-stage amplifier: folded cascode amplifier with a common-source output stage and Miller frequency compensation
amplifier are comparable to those of the telescopic cascode and better than those of the folded cascode amplifier. The speed of a Miller-compensated amplifier is determined by its pole-splitting capacitor CC. Usually, the position of the non-dominant pole, which is located at the output of the two-stage amplifier, is lower than that of either a folded cascode or a telescopic amplifier.
Thus, in order to push this pole to higher frequencies, the second stage of the amplifier requires higher currents, resulting in increased power dissipation. Since the first stage does not need to have a large output voltage swing, it can be a cascode stage, either a telescopic or a folded cascode. However, the current consumption and transistor count are then also increased. The advantages of the folded cascode structure are a larger input common-mode range and the avoidance of level shifting between the stages, while the telescopic stage can offer a larger bandwidth and lower thermal noise.
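The power cost of pushing the output pole upward can be sketched with a two-pole model. GBW = gm1/CC is taken from the text; the output-pole estimate gm_out/CL and all numerical values are illustrative assumptions.

```python
import math

def phase_margin_deg(gm1, cc, gm_out, cl):
    """Phase margin of a two-pole Miller-compensated amplifier model."""
    gbw = gm1 / (2.0 * math.pi * cc)        # unity-gain frequency [Hz]
    f_nd = gm_out / (2.0 * math.pi * cl)    # non-dominant (output) pole [Hz]
    return 90.0 - math.degrees(math.atan(gbw / f_nd))

# Doubling the second-stage transconductance (i.e. its bias current)
# moves the output pole up and buys phase margin:
print(phase_margin_deg(100e-6, 1e-12, 300e-6, 2e-12))  # ~56.3 deg
print(phase_margin_deg(100e-6, 1e-12, 600e-6, 2e-12))  # ~71.6 deg
```

The extra stability is bought directly with second-stage current, which is the power penalty discussed above.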
Figure 2.9 illustrates a folded cascode amplifier with a common-source output stage and Miller compensation. The noise properties are comparable with those of the folded cascode amplifier. If a cascode input stage is used, the lead-compensation resistor can be merged with the cascode transistors. An example of this is the folded cascode amplifier with a common-source output stage and Ahuja-style compensation [15] shown in Fig. 2.10. The Ahuja-style compensated operational amplifier is suitable for larger capacitive loads than the Miller-compensated one, and it has a better power-supply rejection, since the substrate noise coupling through the gate–source capacitance of the output-stage gain transistors is not coupled directly through the pole-splitting capacitors to the operational amplifier output [15].
2.4 Experimental Results
Fig. 2.11 Test data set (the y-axis is arbitrary): (a) raw signal after amplification, not corrected for gain; (b) band-pass filtered signal; (c) detected spikes
Fig. 2.12 Statistical voltage trace of neuron cell activity; grey area: voltage traces from 1000 randomly selected neural channel compartments; black area: expected voltage trace
Fig. 2.13 (a) Noise amplitude in the time domain at the output of the low-pass filter; (b) noise PSD at the output of the low-pass filter
An example of the time-domain noise estimation and noise power spectral density at the output of the low-pass filter is illustrated in Fig. 2.13. For frequencies higher than ~10 kHz, capacitances at the interface form the high-frequency pole and shape both the signal and the noise spectrum; the noise is low-pass filtered to the recording amplifier inputs. The interface's input-equivalent noise voltage decreases as the gain across the amplifying stages increases, i.e. the ratio of the signal power to the noise variance can be expressed as SNR = F^2/(σ_neural^2 + σ_electrode^2 + Σ_i (Π_j Gj)^−2 σ_amp,i^2), where F^2 is the total signal power, σ_amp,i^2 represents the variance of the noise added by the i-th amplification stage, Gj are the gains of the stages preceding it, σ_electrode^2 is the variance of the electrode noise, and σ_neural^2 is the variance of the biological neural noise. The observed SNR of the system also increases as the system is isomorphically scaled up, which suggests a fundamental trade-off between the SNR and the speed of the system.
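The cascade relation above can be checked numerically; the noise variances and gains below are illustrative assumptions, not measured values.

```python
# Input-referred SNR of a cascade of amplifying stages: the noise of
# each stage is referred to the input by the gain accumulated before
# it, so a high first-stage gain suppresses later-stage noise.
def cascade_snr(signal_power, var_neural, var_electrode, stage_vars, stage_gains):
    total_var = var_neural + var_electrode
    g_acc = 1.0
    for var_amp, gain in zip(stage_vars, stage_gains):
        total_var += var_amp / g_acc**2   # refer stage noise to the input
        g_acc *= gain
    return signal_power / total_var

# Same stages, different gain ordering: putting the high gain first wins.
high_first = cascade_snr(1e-8, 1e-12, 2e-12, [4e-12, 4e-12], [100, 10])
low_first = cascade_snr(1e-8, 1e-12, 2e-12, [4e-12, 4e-12], [10, 100])
print(high_first > low_first)  # -> True
```

This is why the high LNA gain ahead of the filter relaxes the noise requirements of all subsequent stages.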
The fully differential low-noise amplifier achieves 40 dB closed-loop gain and occupies an area of 0.04 mm². The input-referred noise is 3.1 μVrms over the 0.1–20 kHz operating bandwidth. Distortion is below 2% total harmonic distortion (THD) for typical extracellular neural signals (smaller than 10 mV peak-to-peak). The common-mode rejection ratio (CMRR) and the power-supply rejection ratio (PSRR) exceed 75 dB.
The capacitive-attenuation band-pass filter with first-order slopes achieves 65 dB dynamic range, 210 mVrms at 2% THD, and 140 μVrms total integrated output noise. The total harmonic distortion of the V/I converter is 0.04% at 20 kHz. Table 2.1 compares state-of-the-art neural recording systems to this work.
2.5 Conclusions
Bioelectronic neural interfaces enable interaction with neural cells by recording, to facilitate early diagnosis and predict intended behavior before undertaking any preventive or corrective actions, or by stimulation, to prevent the onset of detrimental neural activity such as that resulting in tremor. Multichannel neural interfaces allow for spatial neural recording and stimulation at multiple sites. To avoid the risk of infection, these systems are implanted under the skin, while the recorded neural signals and the power required for the implant operation are transmitted wirelessly. The maximum number of channels is constrained by noise, area, bandwidth, the power which has to be supplied to the implant externally, thermal dissipation (i.e. to avoid necrosis of the tissue), and the scalability and expandability of the recording system. Very frequently, an electrode records the action potentials from multiple surrounding neurons. Consequently, the ability to differentiate spikes from noise is governed both by the discrepancies between the noise-free spikes from each neuron and by the signal-to-noise level of the recording interface. After waveform alignment, a feature-extraction step characterizes the detected spikes and represents each detected spike in a reduced-dimensional space. Feature extraction and spike classification significantly reduce the data requirements prior to data transmission (in multichannel systems, the raw data rate is substantially higher than the limited bandwidth of the wireless telemetry).
In this chapter, we introduce a low-power neural signal conditioning circuit with a capacitive-feedback low-noise amplifier and a capacitive-attenuation band-pass filter. The capacitive-feedback amplifier offers a low-offset, low-distortion solution with an optimal power–noise trade-off. Similarly, the capacitive-attenuation band-pass filter provides a wide tuning range and a low-power realization, while allowing simple extension of the transconductors' linear range and, consequently, ensuring low harmonic distortion.
References
1. IEEE Standards Coordinating Committee, IEEE standard for safety levels with respect to human exposure to radio frequency electromagnetic fields, 3 kHz to 300 GHz, C95.1-2005, 2006
2. M. Steyaert, W. Sansen, C. Zhongyuan, A micropower low-noise monolithic instrumentation amplifier for medical purposes. IEEE J. Solid-State Circuits 22(6), 1163–1168 (1987)
3. R. Harrison, C. Charles, A low-power low-noise CMOS amplifier for neural recording applications. IEEE J. Solid-State Circuits 38(6), 958–965 (2003)
4. M.C. Chae, W. Liu, M. Sivaprakasam, Design optimization for integrated neural recording systems. IEEE J. Solid-State Circuits 43(9), 1931–1939 (2008)
5. W. Wattanapanitch, M. Fee, R. Sarpeshkar, An energy-efficient micropower neural recording amplifier. IEEE Trans. Biomed. Circuits Syst. 1(2), 136–147 (2007)
6. C. Qian, J. Parramon, E. Sanchez-Sinencio, A micropower low-noise neural recording front-end circuit for epileptic seizure detection. IEEE J. Solid-State Circuits 46(6), 1329–1405 (2011)
7. F. Bahmani, E. Sanchez-Sinencio, A highly linear pseudo-differential transconductance, in Proceedings of IEEE European Solid-State Circuits Conference, 2004, pp. 111–114
8. S.K. Arfin, Low power circuits and systems for wireless neural stimulation. PhD thesis, Massachusetts Institute of Technology, 2011
9. G. Nicollini, P. Confalonieri, D. Senderowicz, A fully differential sample-and-hold circuit for high-speed applications. IEEE J. Solid-State Circuits 24(5), 1461–1465 (1989)
10. K. Gulati, H.S. Lee, A high-swing CMOS telescopic operational amplifier. IEEE J. Solid-State Circuits 33(12), 2010–2019 (1998)
11. T.C. Choi, R.T. Kaneshiro, W. Brodersen, P.R. Gray, W.B. Jett, M. Wilcox, High-frequency CMOS switched-capacitor filters for communications application. IEEE J. Solid-State Circuits 18, 652–664 (1983)
12. K. Bult, G. Geelen, A fast-settling CMOS op amp for SC circuits with 90-dB DC gain. IEEE J. Solid-State Circuits 25(6), 1379–1384 (1990)
13. R. Harjani, R. Heineke, F. Wang, An integrated low-voltage class AB CMOS OTA. IEEE J. Solid-State Circuits 34(2), 134–142 (1999)
14. R. Hogervorst, J.H. Huijsing, Design of Low-Voltage Low-Power Operational Amplifier Cells (Kluwer Academic Publishers, Dordrecht, 1999)
15. B.K. Ahuja, An improved frequency compensation technique for CMOS operational amplifiers. IEEE J. Solid-State Circuits 18(6), 629–633 (1983)
16. C.I. de Zeeuw et al., Spatiotemporal firing patterns in the cerebellum. Nat. Rev. Neurosci. 12(6), 327–344 (2011)
17. D. Han et al., A 0.45 V 100-channel neural-recording IC with sub-μW/channel consumption in 0.18 μm CMOS. IEEE Trans. Biomed. Circuits Syst. 7(6), 735–746 (2013)
18. K. Abdelhalim et al., 64-channel UWB wireless neural vector analyzer SoC with a closed-loop phase synchrony-triggered neurostimulator. IEEE J. Solid-State Circuits 48(10), 2494–2510 (2013)
19. C.M. Lopez et al., An implantable 455-active-electrode 52-channel CMOS neural probe, in IEEE International Solid-State Circuits Conference, pp. 288–289, 2013
20. K.A. Ng, Y.P. Xu, A multi-channel neural-recording amplifier system with 90 dB CMRR employing CMOS-inverter-based OTAs with CMFB through supply rails in 65 nm CMOS, in IEEE International Solid-State Circuits Conference, pp. 206–207, 2015
Chapter 3
Neural Signal Quantization Circuits
Abstract An integrated neural implant interfacing with the brain through biocompatible electrodes provides high-yield cell recordings, large channel counts, and access to spike data and/or field potentials with a high signal-to-noise ratio. By increasing the number of recording electrodes, spatially broad analysis can be performed that can provide insights into how and why neuronal ensembles synchronize their activity. In this chapter, we present several A/D converter realizations in the voltage, current, and time domain, respectively, suitable for multichannel neural signal processing. The voltage-domain SAR A/D converter combines the functionalities of a programmable-gain stage and analog-to-digital conversion, occupies an area of 0.028 mm², and consumes 1.1 μW of power at a 100 kS/s sampling rate. The current-mode successive approximation A/D converter is realized in a 65 nm CMOS technology and consumes less than 367 nW at 40 kS/s, corresponding to a figure of merit of 14 fJ/conversion-step, while operating from a 1 V supply. A time-based, programmable-gain A/D converter allows for an easily scalable and power-efficient implantable biomedical recording system. The time-domain converter circuit is realized in a 90 nm CMOS technology, operates at 640 kS/s, occupies an area of 0.022 mm², and consumes less than 2.7 μW, corresponding to a figure of merit of 6.2 fJ/conversion-step.
3.1 Introduction
Bioelectronic interfaces allow interaction with neural cells by recording, to facilitate early diagnosis and predict intended behavior before undertaking any preventive or corrective actions [1], or by stimulation, to prevent the onset of detrimental neural activity such as that resulting in tremor. Monitoring large-scale neuronal activity and diagnosing neural disorders have been accelerated by the fabrication of miniaturized microelectrode arrays capable of simultaneously recording neural signals from hundreds of channels [2]. By increasing the number of recording electrodes, spatially broad analysis of local field potentials can be performed that can provide insights into how and why neuronal ensembles synchronize their activity. Studies on body motor systems have uncovered how kinematic parameters of movement control are encoded in neuronal spike time stamps [3] and interspike intervals [4]. Neurons produce spikes of nearly identical amplitude near the soma, but the measured signal depends on the position of the electrode relative to the cell. Additionally, the signal quality in the neural interface front-end, besides the specifics of the electrode material and the electrode/tissue interface, is limited by the nature of the biopotential signal and its biological background noise, dictating system resources. For any portable or implantable device, microelectrode arrays require miniature local electronics to amplify the weak neural signals, filter out noise and out-of-band interference, and digitize the result for transmission. Single-channel [5] or multichannel integrated neural amplifiers and A/D converters provide the front-line interface between the recording electrodes and the signal conditioning circuits, and thus face critical performance requirements.
In this chapter, we present several A/D converter realizations in the voltage, current, and time domain, respectively, suitable for multichannel neural signal processing, and we evaluate the trade-off between noise, speed, and power dissipation at the circuit-architecture level. This approach provides the key insight required to address the SNR, response time, and linearity of the physical electronic interface. The voltage-domain SAR A/D converter combines the functionalities of a programmable-gain stage and analog-to-digital conversion, occupies an area of 0.028 mm², and consumes 1.1 μW of power at a 100 kS/s sampling rate. The current-mode successive approximation A/D converter is realized in a 65 nm CMOS technology and consumes less than 367 nW at 40 kS/s, corresponding to a figure of merit of 14 fJ/conversion-step, while operating from a 1 V supply. A time-based, programmable-gain A/D converter allows for an easily scalable and power-efficient implantable biomedical recording system. The time-domain converter circuit is realized in a 90 nm CMOS technology, operates at 640 kS/s, occupies an area of 0.022 mm², and consumes less than 2.7 μW, corresponding to a figure of merit of 6.2 fJ/conversion-step.
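The quoted figures of merit can be cross-checked against the power and sampling-rate numbers. The FOM is assumed here to be the common P/(2^ENOB · fs) definition; the effective resolution computed below is an inference from the quoted numbers, not a specification stated in the text.

```python
import math

# Back out the effective number of bits implied by FOM = P / (2**ENOB * fs).
def implied_enob(power_w, fom_j, fs_hz):
    return math.log2(power_w / (fom_j * fs_hz))

# Current-mode SAR: 367 nW, 14 fJ/conversion-step at 40 kS/s
print(round(implied_enob(367e-9, 14e-15, 40e3), 1))    # ~9.4 bits
# Time-domain converter: 2.7 uW, 6.2 fJ/conversion-step at 640 kS/s
print(round(implied_enob(2.7e-6, 6.2e-15, 640e3), 1))  # ~9.4 bits
```

Both converters thus imply a comparable effective resolution; the time-domain design trades a 16x higher sampling rate against a 7x higher power.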
The chapter is organized as follows: Sect. 3.2 presents an overview of low-power A/D converter architectures, while in Sect. 3.3 the main building blocks of the A/D converter are analyzed, namely, the sample-and-hold circuit, the operational amplifier, and the comparator. Section 3.4 focuses on voltage-domain A/D conversion and the noise fluctuations at the circuit-architecture level. In Sect. 3.5, the main building blocks of the current-domain ADC are evaluated. In Sect. 3.6, the time-domain A/D conversion, which utilizes a linear voltage-to-time converter (VTC) and a two-step time-to-digital converter, is discussed. The experimental results obtained are presented in Sect. 3.7. Finally, Sect. 3.8 provides a summary and the main conclusions.
3.2 Low-Power A/D Converter Architectures

Since the advent of digital signal processing, A/D converters have played a very important role in interfacing the analog and digital worlds. They perform the digitization of analog signals at a fixed time period, which is generally specified
by the application. The A/D conversion process involves sampling the applied analog input signal and quantizing it to its digital representation by comparing it to reference voltages before further signal processing in subsequent digital systems. Depending on how these functions are combined, different A/D converter architectures can be implemented with different requirements on each function. To implement power-optimized A/D converter functions, it is important to understand the performance limitations of each function before discussing system issues. In this section, the concept of the basic A/D conversion process and the fundamental limitation to the power dissipation of each key building block are presented.
Parallel (flash) A/D conversion is by far the fastest and conceptually simplest conversion process [6–15]: the analog input is applied to one side of a comparator circuit and the other side is connected to the proper level of reference from zero to full scale. The threshold levels are usually generated by resistively dividing one or more references into a series of equally spaced voltages, which are applied to one input of each comparator. For n-bit resolution, 2^n − 1 comparators simultaneously evaluate the analog input and generate the digital output as a thermometer code. Since a flash converter needs only one clock cycle per conversion, it is often the fastest converter. On the other hand, the resolution of flash ADCs is limited by circuit complexity, high power dissipation, and comparator and reference mismatch. Its complexity grows exponentially as the resolution increases. Consequently, the power dissipation and the chip area increase exponentially with the resolution.
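The exponential growth in comparator count is easy to make concrete:

```python
# Flash converter hardware cost: 2**n - 1 comparators evaluate the
# input simultaneously, so complexity doubles with every added bit.
def flash_comparators(n_bits):
    return 2**n_bits - 1

for n in (4, 6, 8, 10):
    print(n, flash_comparators(n))
# 4 -> 15, 6 -> 63, 8 -> 255, 10 -> 1023
```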
To reduce hardware complexity, power dissipation, and die area, and to increase the resolution while maintaining high conversion rates, flash converters can be extended to a two-step/multi-step [16–24] or subranging [25–33] architecture (also called a series–parallel converter). Conceptually, these converters need m·2^n comparators instead of the 2^(mn) of a full flash implementation, assuming n1, n2, …, nm are all equal to n. However, the conversion in a subranging or two-step/multi-step ADC does not occur instantaneously as in a flash ADC, and the input has to be held constant until the sub-quantizer finishes its conversion. Therefore, a sample-and-hold circuit is required to improve performance. The conversion process is split into two steps as shown in Fig. 3.1.

Fig. 3.1 Simplified two-step A/D converter and two-step T/D converter architectures
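The two-step conversion can be sketched as a behavioral model with ideal elements; the bit split (n1 = n2 = 4) and the reference voltage are assumptions for illustration.

```python
# Behavioral two-step A/D conversion: coarse quantization, DAC
# reconstruction and subtraction, residue amplification by 2**n1,
# then fine quantization of the amplified residue (ideal elements).
def two_step_adc(vin, vref=1.0, n1=4, n2=4):
    coarse = min(int(vin / vref * 2**n1), 2**n1 - 1)       # upper bits
    residue = (vin - coarse * vref / 2**n1) * 2**n1        # amplified residue
    fine = min(int(residue / vref * 2**n2), 2**n2 - 1)     # lower bits
    return (coarse << n2) | fine                           # (n1+n2)-bit code

# An 8-bit result from two 4-bit flash stages, i.e. 2*(2**4 - 1) = 30
# comparators instead of the 255 a full 8-bit flash would need.
code = two_step_adc(0.7)
print(code, code / 2**8)  # -> 179 0.69921875
```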
cycle, the S/H circuit between the two stages holds the value of the amplified residue. Therefore, the second stage is able to operate on that residue independently of the first stage, which in turn can convert a new, more recent sample. Due to the independent operation of the two stages, the maximum sampling frequency of the pipelined two-step converter is determined by the settling time of the first stage only.
To generate the digital output for one sample, the output of the first stage has to be delayed by one clock cycle by means of a shift register (SR) (Fig. 3.3). Although the sampling speed is increased by the pipelined operation, the delay between the sampling of the analog input and the output of the corresponding digital value is still two clock cycles. For most applications, however, latency does not play any role; only conversion speed is important. In most signal processing and telecommunications applications, the main delay is caused by digital signal processing, so a latency of even more than two clock cycles is not critical.
The architecture as described above is not limited to two stages. Because the interstage sample and hold circuit decouples the individual stages, the conversion speed is the same whether a single stage or an arbitrary number of stages follows the first one. This leads to the general pipelined A/D converter architecture depicted in Fig. 3.4 [34–55]. Each stage consists of an S/H, an N-bit flash A/D converter, a reconstruction D/A converter, a subtracter, and a residue amplifier.
Fig. 3.3 Two-step converter with an additional sample and hold circuit and a shift register (SR) to line up the stage outputs in time
Fig. 3.4 General pipelined A/D converter architecture
Fig. 3.5 Successive approximation A/D converter architecture
The SAR A/D converter illustrated in Fig. 3.5 typically consists of an S/H circuit followed by a feedback loop composed of a comparator, a successive approximation register (SAR) logic block, and an n-bit D/A converter.

The SAR logic captures the data from the comparator at each clock cycle and assembles the word driving the D/A converter bit by bit, from the most to the least significant bit, according to the successive approximation algorithm: the D/A converter first generates a value representing half of the reference voltage. Subsequently, the comparator determines whether the held signal value is above or below the output value of the digital-to-analog converter, and the MSB is kept or reset accordingly. The algorithm proceeds in the same way, testing each successive bit until all n bits have been determined. At the start of the next conversion, while the S/H circuit is sampling the next input, the SAR provides the n-bit output and resets the registers. Offsets in the S/H circuit or the comparator shift the conversion range; however, this shift is identical for every code. The S/H circuit requires a low distortion figure for relatively low sample periods. Additionally, the D/A converter faces stringent requirements, as it determines the overall circuit linearity and the conversion speed.

Because a minimum number of analog blocks and only very simple digital logic are needed to perform the complete conversion, SAR A/D converters are usually the most power-efficient choice for digitizing biomedical signals.
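The successive approximation algorithm described above can be sketched in a few lines (an idealized model; the function name is ours): the bit under test is tentatively set, the D/A output is compared against the held input, and the bit is kept or reset accordingly.

```python
def sar_convert(vin: float, vref: float, n_bits: int) -> int:
    """Ideal successive approximation: test bits from MSB down to LSB."""
    code = 0
    for bit in range(n_bits - 1, -1, -1):
        code |= 1 << bit                    # tentatively set the bit under test
        vdac = code / 2 ** n_bits * vref    # ideal D/A converter output
        if vin < vdac:                      # comparator decision
            code &= ~(1 << bit)             # held input below DAC: reset the bit
    return code

# 0.3 V input, 1 V reference, 8 bits: 0.3 * 256 = 76.8 -> code 76
print(sar_convert(0.3, 1.0, 8))  # 76
```

The loop runs exactly n clock cycles per sample, which is why the analog hardware reduces to one comparator, one D/A converter and the S/H.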
Fig. 3.6 Switched capacitor S/H circuit configurations in sample phase: a circuit with separate CH and CF

Fig. 3.7 Switched capacitor S/H circuit configurations in sample phase: a circuit with one capacitor
Since capacitors and switches with the very large off-resistance needed for a voltage memory are far easier to implement in a practical integrated circuit technology than the inductors and switches with a very small on-resistance required for a current memory, all sample and hold circuits are based on voltage sampling with the switched capacitor (SC) technique. S/H circuit architectures can roughly be divided into open-loop and closed-loop architectures. The main difference between them is that in closed-loop architectures the capacitor on which the voltage is sampled is enclosed in a feedback loop, at least in hold mode. Although the open-loop S/H architecture provides a high-speed solution, its accuracy is limited by the harmonic distortion arising from the nonlinear gain of the buffer amplifiers and the signal-dependent charge injection from the switch. These problems are especially pronounced in a CMOS technology. Enclosing the sampling capacitor in the feedback loop reduces the effects of nonlinear parasitic capacitances and signal-dependent charge injection from the MOS switches. Unfortunately, an inevitable consequence of the use of feedback is reduced speed.
Figures 3.6, 3.7 and 3.8 illustrate three common configurations for closed-loop switched-capacitor S/H circuits [56, 62–76]. For simplicity, single-ended configurations are shown; in an actual circuit implementation all would be fully differential. In a mixed-signal circuit such as an A/D converter, fully differential analog signals are preferred as a means of obtaining better power supply rejection and immunity to common-mode noise. The operation needs two non-overlapping clock phases: sampling, and holding (or transferring). The switch configurations shown in Figs. 3.6, 3.7 and 3.8 are for the sampling phase, while the configurations shown in Figs. 3.9, 3.10, and 3.11 are for the hold phase. In all cases, the basic operations include sampling the signal on the sampling capacitor(s) CH and transferring the signal charge onto the feedback capacitor CF by using an opamp in the feedback configuration. In the configuration in Fig. 3.6, which is often used as an integrator,
Fig. 3.8 Switched capacitor S/H circuit configurations in sample phase: a circuit with CF shared as a sampling capacitor

Fig. 3.9 Switched capacitor S/H circuit configurations in hold phase: a circuit with separate CH and CF

Fig. 3.10 Switched capacitor S/H circuit configurations in hold phase: a circuit with one capacitor

Fig. 3.11 Switched capacitor S/H circuit configurations in hold phase: a circuit with CF shared as a sampling capacitor
assuming an ideal opamp and switches, the opamp forces the sampled signal charge on CH to transfer to CF. If CH and CF are not equal, the transferred signal charge develops an output voltage Vout = (CH/CF)·Vin. In this way, both S/H and gain functions can be implemented within one SC circuit [75, 76].
In the configuration shown in Fig. 3.7, only one capacitor is used as both sampling capacitor and feedback capacitor. This configuration does not implement the gain function, but it can achieve high speed because the feedback factor (the ratio of the feedback capacitor to the total capacitance at the summing node) can be much larger than that of the previous configuration, so the circuit operates much closer to the unity-gain frequency of the amplifier. Furthermore, it does not suffer from the capacitor mismatch limitation of the other two configurations. Here, the sampling is performed passively, i.e., without the opamp, which makes signal acquisition fast. In hold mode, the sampling capacitor is disconnected from the input and placed in a feedback loop around the opamp [56, 62].
Figure 3.8 shows another configuration, which is a combination of the configurations in Figs. 3.6 and 3.7. In this configuration, the signal is sampled on both CH and CF in the sampling phase, with the resulting transfer function Vout = (1 + (CH/CF))·Vin. In the next phase, the sampled charge on the sampling capacitor is transferred to the feedback capacitor. As a result, the feedback capacitor holds the charge transferred from the sampling capacitor as well as the input signal charge. This configuration has a wider bandwidth than the configuration shown in Fig. 3.6, although the feedback factor is comparable. The important parameters determining the bandwidth of an SC circuit are Gm (the transconductance of the opamp), the feedback factor β, and the output load capacitance. In all three configurations, the bandwidth is given by 1/τ = βGm/CL, where CL is the total capacitance seen at the opamp output. Since the S/H circuit uses the amplifier as a buffer, the acquisition time is a function of the amplifier's own specifications. Similarly, the error tolerance at the output of the S/H depends on the amplifier's offset, gain, and linearity. Once the hold command is issued, the S/H faces other errors. Pedestal error occurs as a result of charge injection and clock feedthrough: part of the charge built up in the channel of the switch is distributed onto the capacitor, slightly changing its voltage, and the clock couples onto the capacitor via the overlap capacitance between the gate and the source or drain.
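The bandwidth relation 1/τ = βGm/CL makes the speed advantage of the single-capacitor configuration explicit. A small numerical sketch (all component values are illustrative assumptions, not from a specific design):

```python
# Closed-loop bandwidth 1/tau = beta * Gm / CL for two S/H configurations.
# All values below are illustrative assumptions.

def feedback_factor(cf: float, c_summing: float) -> float:
    """Feedback capacitor over total capacitance at the summing node."""
    return cf / (cf + c_summing)

Gm = 1e-3         # opamp transconductance, S
CL = 1e-12        # total capacitance at the opamp output, F
Cpar = 0.2e-12    # summing-node parasitic capacitance, F
CH = CF = 1e-12   # sampling and feedback capacitors, F

# Fig. 3.6 style: CH also loads the summing node -> smaller beta
beta_separate = feedback_factor(CF, CH + Cpar)
# Fig. 3.7 style: the single capacitor is the feedback capacitor -> larger beta
beta_single = feedback_factor(CH, Cpar)

bw_separate = beta_separate * Gm / CL   # closed-loop bandwidth, rad/s
bw_single = beta_single * Gm / CL
print(bw_single > bw_separate)  # True: the one-capacitor circuit is faster
```

For the same Gm and CL, the one-capacitor circuit settles faster purely because its feedback factor is closer to unity.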
Another error that occurs during the hold mode is called droop, which is related to the leakage of current from the capacitor due to parasitic impedances and to the leakage through the reverse-biased diode formed by the drain of the switch. This diode leakage can be minimized by making the drain area as small as can be tolerated. Although the input impedance of the amplifier is very large, the switch has a finite off-impedance through which leakage can occur. Current can also leak through the substrate.
A prominent drawback of a simple S/H is the on-resistance variation of the input switch, which introduces distortion. Technology scaling reduces the supply voltage faster than the threshold voltage, which results in a larger on-resistance variation in a switch. As a result, the bandwidth of the switch becomes increasingly signal-dependent. MOS transistors are used as switches at low voltages. When the signal amplitudes are large, accuracy and signal bandwidth are limited by distortion, which originates from the fact that the switch on-resistance is not constant but varies as a function of the drain and source voltages. For small VDS, the on-resistance is expressed as Ron = L/(µCoxW(VGS − VT)). In this equation two different signal-dependent terms can be identified. The first and dominant one is the gate-source voltage VGS. The second is the dependency of the threshold voltage VT on the source-bulk voltage. Although large transistor switches can be used for a worst-case VT design, the switch parasitic capacitance can significantly load the output of the circuit. Therefore, increasing VGS − VT is desirable to implement a low on-resistance switch without adding too much parasitic capacitance.
Several methods allow this gate voltage drive to be increased. One method is to reduce VT by including an extra low-threshold transistor in the process, although this adds process complexity. Another method is to increase VGS using one large supply, created from the chip supply, to drive all switches on the chip; however, potential problems, including possible crosstalk to sensitive nodes through the shared supply and the difficulty of estimating the total charge drawn to drive all the switches, render this method unattractive.
Another viable solution that avoids this major source of nonlinearity is to keep the switch gate-source voltage constant by making the gate voltage track the source voltage with an offset Voff_in, which is, at its maximum, equal to the supply voltage. This technique, which is implemented in this design, is called bootstrapping [81]. In this case, the bootstrap circuit shown in Fig. 3.12 drives each switch that uses the same clock, to avoid the problem of crosstalk through the clock line. Voff_in can be generated with a switched capacitor, which is precharged in every clock cycle. During the clock phase when the transistor is non-conductive, the switched capacitor is precharged to Voff_in. To turn the switch on, the capacitor
is switched between the input voltage and the transistor gate. The capacitor value is chosen as small as possible for area reasons, but large enough to charge the load sufficiently to the desired voltage levels. The device sizes are chosen to create sufficiently fast rise and fall times at the load. The load consists of the gate capacitance of the switching device T10 and any parasitic capacitance of the interconnect between the bootstrap circuit and the switching device. Therefore, it is desirable in the layout to minimize the distance between the bootstrap circuit and the switch, or to insert shielding protection. When the switch T10 is on, its gate voltage VG is greater than the analog input signal Vin by a fixed difference of Voff_in = VDD. Although the absolute voltage applied to the gate may exceed VDD for a positive input signal, none of the terminal-to-terminal device voltages exceeds VDD. A single-phase clock clk turns the switch T10 on and off. During the off phase, clk is low, discharging the gate of the switch to ground through devices T11 and T12.
At the same time, VDD is applied by T3 and T7 across the capacitor-connected transistor T16, which acts as the battery across the gate and source during the on phase. T8 and T9 isolate the switch from the capacitance while it is charging. When clkn goes high, T6 pulls down the gate of T8, allowing charge from the battery capacitor to flow onto the gate of T10. This turns on both T9 and T10. T9 enables the gate of T10 to track the input voltage applied at the source of T10, shifted by VDD, keeping the gate-source voltage constant regardless of the input signal.
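The benefit of bootstrapping can be illustrated with the on-resistance expression above. In the sketch below (parameter values are assumed for illustration only), a conventional switch with its gate tied to VDD sees a signal-dependent VGS, while the bootstrapped switch keeps VGS = VDD:

```python
# Switch on-resistance R_on = 1 / (k * (V_GS - V_T)), with k lumping
# mu * Cox * W / L. Parameter values are illustrative assumptions.

def ron(vgs: float, vt: float = 0.4, k: float = 2e-3) -> float:
    """Triode-region on-resistance for a given gate-source voltage."""
    return 1.0 / (k * (vgs - vt))

VDD = 1.2
inputs = (0.0, 0.2, 0.4)  # sample input voltages, V

# Conventional switch: gate at VDD, so V_GS = VDD - Vin varies with the signal.
r_conventional = [ron(VDD - vin) for vin in inputs]
# Bootstrapped switch: the gate tracks the source with a VDD offset, V_GS = VDD.
r_bootstrapped = [ron(VDD) for _ in inputs]

print(max(r_conventional) > min(r_conventional))  # True: signal-dependent R_on
print(len(set(r_bootstrapped)) == 1)              # True: constant R_on
```

The constant on-resistance is exactly what removes the signal-dependent bandwidth, and hence the distortion, of the simple S/H.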
The maximum speed and, to a large extent, the power consumption of the S/H are determined by the operational amplifier. In general, the amplifier's open-loop dc gain limits the settling accuracy of the amplifier output, while the bandwidth and slew rate of the amplifier determine the maximum clock frequency. The operational amplifiers in an S/H circuit have some unique requirements, the most important of which concerns the input impedance, which must be purely capacitive so as to guarantee the conservation of charge. Consequently, the operational amplifier input has to be in either the common-source or the source-follower configuration. Another characteristic feature of the S/H circuit is the load at the amplifier output, which is typically purely capacitive; as a result, the amplifier output impedance can be high. The benefit of driving solely capacitive loads is that no output voltage buffers are required. In addition, if all the amplifier's internal nodes have low impedance and only the output node has high impedance, the speed of the amplifier can be maximized. Unfortunately, an output stage with very high output impedance usually cannot provide high signal swing.
The ultimate settling accuracy is limited by the finite amplifier dc gain. The exact settling error depends not only on the gain but also on the feedback factor of the circuit utilizing the amplifier. A very widely used method to improve the dc gain is based on local negative feedback [82–84]. In addition to this cascode regulation, other techniques for increasing the dc gain have been proposed as well. Gain boosting with positive feedback has been investigated [85, 86]. In [87], dynamic biasing, where the opamp current is decreased toward the end of the settling phase, is used to increase the dc gain. It exploits the fact that the current reduction lowers the transistor gDS, which increases the dc gain. By regulating the gate voltages of the cascode transistors [88] with an extra gain stage, the dc gain of the amplifier can be increased by several orders of magnitude.
Besides the amplifier bandwidth, the settling time is limited by the fact that the amplifier can supply only a finite current to the load capacitor. Consequently, the output cannot change faster than the slew rate. When designing an amplifier, the load capacitor is known, and the required slew rate SR = kVmax/TS can be calculated from the largest voltage step Vmax and the clock period TS. A commonly used rule of thumb suggests that one third of the settling time should be reserved for slewing, resulting in a k of six. The required slewing current is ISR = (kVmaxCL)/TS. It is linearly dependent on the clock frequency, while the current needed to obtain the amplifier bandwidth has a quadratic dependence. The opamp unity-gain frequency ω1 can be made larger by increasing gm,in by means of making the transistors bigger; however, this does not necessarily imply a faster opamp. The parasitic capacitance is also increased, so the feedback factor β becomes smaller and the dominant closed-loop pole ωp = βω1 is pushed towards lower frequencies. Therefore, a trade-off between the increase of gm,in and CG exists. This suggests that an optimum size for the input pair exists, which maximizes the effective transconductance of the opamp while avoiding making the input capacitance dominant in the feedback factor.
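The slewing-current budget can be evaluated directly from the rule of thumb above; the numerical values in this sketch are illustrative:

```python
def slewing_current(v_max: float, c_load: float, t_s: float, k: float = 6.0) -> float:
    """Required slewing current I_SR = k * Vmax * CL / TS.

    k = 6 reflects the rule of thumb of reserving one third of the
    settling time for slewing.
    """
    return k * v_max * c_load / t_s

# 1 V maximum step into 1 pF at a 1 us clock period (illustrative values):
i_sr = slewing_current(1.0, 1e-12, 1e-6)
print(i_sr)  # about 6 uA; doubles if the clock frequency doubles
```

The linear scaling with clock frequency is visible directly: halving TS doubles the required slewing current.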
An overview of several single- and two-stage amplifiers is given in Sect. 2.3.
Because of their fast response, regenerative latches are used, almost without exception, as comparators for high-speed applications. An ideal latched comparator is composed of a preamplifier with infinite gain and a digital latch circuit. Since the amplifiers used in comparators need be neither linear nor closed-loop, they can incorporate positive feedback to attain virtually infinite gain [89]. Because of this architecture, the operation of a latched comparator can be divided into two stages: tracking and latching. In the tracking stage, the following dynamic latch circuit is disabled, and the differential analog input voltages are amplified by the preamplifier. In the latching stage, while the preamplifier is disabled, the latch circuit regenerates the amplified differential signals into a pair of full-scale digital signals with a positive feedback mechanism and latches them at the output ends.
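The speed of the latching stage follows from the positive feedback. In a simple single-pole model (an approximation we adopt here for illustration, not taken from the text), the output difference grows exponentially with time constant τ ≈ C/gm, so the regeneration time depends only logarithmically on the initial imbalance:

```python
import math

def regeneration_time(v0: float, v_full: float, gm: float, c: float) -> float:
    """Time for the latch to grow an initial imbalance v0 to v_full.

    Single-pole positive-feedback model: t = tau * ln(v_full / v0),
    with tau ~ C / gm (illustrative assumption).
    """
    tau = c / gm
    return tau * math.log(v_full / v0)

# 1 mV initial difference regenerated to 1 V with gm = 1 mS, C = 50 fF:
t_reg = regeneration_time(1e-3, 1.0, 1e-3, 50e-15)
print(t_reg < 1e-9)  # True: sub-nanosecond regeneration for these values
```

The logarithmic dependence explains why even a small preamplified imbalance resolves quickly, and why metastability (v0 near zero) remains the worst case.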
Depending on the type of latch employed, latched comparators can be divided into two groups: static latches [56, 90, 91], which have a constant current consumption during operation, and dynamic latches [92–94], which do not consume any static power.
While dynamic latch circuits regenerate the difference signals, the large voltage
chosen such that gm8,9R < 2 and should be small enough to reset the output at the clock rate. Since all transistors are in the active region, the latch can start regenerating right after the latch signal goes low. The one disadvantage of this scheme is its large kickback noise. The folding nodes (the drains of T4 and T5) have to jump up to VDD in every clock cycle, since the latch output makes the full swing. Because of this, a substantial amount of kickback noise is injected into the inputs through the gate-drain capacitances of the input transistors T1 and T2 (CGD1, CGD2). To reduce the kickback noise, clamping diodes can be inserted at the output nodes [96].
Figure 3.15 illustrates the design presented in [91]. Here, when the latch signal is low (the resetting period), the amplified input signal is stored at the gates of T8 and T9, and T12 shorts Voutp and Voutn together. When the latch signal goes high, the cross-coupled transistors T10 and T11 form a positive feedback latch. In addition, the positive feedback capacitors C1 and C2 boost the regeneration speed by switching T8 and T9 from an input-dependent current source during the resetting period to a cross-coupled latch during the regeneration period. Because of C1 and C2, transistors T8–T11 work like cross-coupled inverters, so the latch does not dissipate static power once it completes the regeneration period. However, there is a large amount of kickback noise through the positive feedback capacitors C1 and C2. The switches T6, T7 and T13 have been added to isolate the preamplifier from the latch. Consequently, a relatively large chip area is required due to the positive feedback capacitors (C1, C2), the isolation switches (T6, T7 and T13) and the complementary latch signals.
The concept of a dynamic comparator offers potential for low-power and small-area implementation and, in this context, is restricted to single-stage topologies without static power dissipation. A widely used dynamic comparator based on a differential sensing amplifier, shown in Fig. 3.16, was introduced in [92]. Transistors T1–4, biased in the linear region, adjust the threshold resistively, and above them transistors T5–12 form a latch. When the latch control signal is low, the
transistors T9 and T12 conduct and T7 and T8 are cut off, which forces both differential outputs to VDD; no current path exists between the supply voltages. Simultaneously, T10 and T11 are cut off and transistors T5 and T6 conduct. This implies that T7 and T8 have a voltage of VDD across them. When the comparator is latched, T7 and T8 are turned on. Immediately after the regeneration moment, the gates of transistors T5 and T6 are still at VDD and they enter saturation, amplifying the voltage difference between their sources. If all transistors T5–12 are assumed to be perfectly matched, the imbalance of the conductances of the left and right input branches, formed by T1–2 and T3–4, determines which of the outputs goes to VDD and which to 0 V. After a static situation is reached (Vclk is high), both branches are cut off and the outputs preserve their values until the comparator is reset again by switching Vclk to 0 V. The transistors T1–4 connected to the input
and reference are in the triode region and act like voltage-controlled resistors. The transconductance of transistors T1–4, operating in the linear region, is directly proportional to the drain-source voltage VDS1–4 of the corresponding transistor, while for transistors T5–6 the transconductance is proportional to VGS5,6 − VT. At the beginning of the latching process, VDS1–4 ≈ 0 while VGS5,6 − VT ≈ VDD. Thus, gm5,6 ≫ gm1–4, which makes the matching of T5 and T6 dominant in determining the latching balance. As small transistors are preferred, offset voltages of a few hundred millivolts easily result. Mismatches in transistors T7–12 are attenuated by the gain of T5 and T6, which makes them less critical. To cope with the mismatch problem, the layout of the critical transistors must be drawn as symmetrically as possible. In addition to the mismatch sensitivity, the latch is also very sensitive to any asymmetry in the load capacitance. This can be avoided by adding an extra latch or inverters as a buffering stage after the comparator core outputs.
The resistive divider dynamic comparator topology has one clear benefit: its low kickback noise. This results from the fact that the voltage variation at the drains of the input transistors T1–4 is very small. On the other hand, the speed and resolution of the topology are relatively poor because of the small gain of the transistors biased in the linear region.
A fully differential dynamic comparator based on two cross-coupled differential pairs with switched current sources, loaded with a CMOS latch, is shown in Fig. 3.17 [93]. The trip point of the comparator can be set by introducing imbalance between the source-coupled pairs. Because the dynamic current sources, together with the latch, are connected directly between the differential pairs and the supply voltage, the comparator does not dissipate dc power. When the comparator is inactive, the latch signal is low, which means that the current source transistors T5 and T6 are switched off and no current path between the supply voltages exists. Simultaneously, the p-channel switch transistors T9 and T12 reset the outputs by shorting them to VDD. The n-channel transistors T7 and T8 of the latch conduct
and also force the drains of all the input transistors T1–4 to VDD, while the drain voltages of T5 and T6 depend on the comparator input voltages. When the clock signal is raised to VDD, the outputs are disconnected from the positive supply, the switching current sources T5 and T6 turn on, and T1–4 compare Vinp − Vinn with Vrefp − Vrefn. Since the latch devices T7–8 are conducting, the circuit regeneratively amplifies the voltage difference at the drains of the input pairs. The threshold voltage of the comparator is determined by the current division in the differential pairs and between the cross-coupled branches.
The threshold level of the comparator can be derived using the large-signal current equations for the differential pairs. The effect of mismatches of the other transistors T7–12 is not completely critical in this topology, because the input is amplified by T1–4 before T7–12 latch. The drains of the cross-coupled differential pairs are high-impedance nodes, and the transconductances of the threshold-voltage-determining transistors T1–4 are large. A drawback of the differential pair dynamic comparator is its high kickback noise: large transients in the drain nodes of the input transistors are coupled to the input nodes through the parasitic gate-drain capacitances. However, there are techniques to reduce the kickback noise, e.g., cross-coupling dummy transistors from the differential inputs to the drain nodes [97]. The differential pair topology achieves high speed and resolution, which results from the built-in dynamic amplification.
Figure 3.18 illustrates the schematic of the dynamic latch given in [94]. The dynamic latch consists of precharge transistors T12 and T13, cross-coupled inverters T6–9, a differential pair T10 and T11, and switch T14, which prevents static current flow during the resetting period. When the latch signal is low (the resetting period), the drain voltages of T10–11 are VDD − VT, and their source voltage is VT below the latch input common-mode voltage. Therefore, once the latch signal goes high, the n-channel transistors T7,9–11 immediately enter the active region. Because one transistor in each of the cross-coupled inverters turns off, there is no static power dissipation from the latch once the latch outputs are fully developed.
Fig. 3.19 Multichannel neural interfaces: channels Ch #1 to Ch #M (BP-LNA, PGA) multiplexed through an analog mux into a shared ADC

Fig. 3.20 Multichannel neural interfaces: an ADC per channel

Fig. 3.21 Fully differential switched-capacitor sampling network (C1–C4) of the ADC
Fig. 3.22 Maximum achievable SNR for different sampling capacitor values and resolutions

Fig. 3.23 Power dissipation for different sampling capacitor values and resolutions
capacitance value and the OTA size. This means that the PG ADC circuit power quadruples for every additional bit resolved, for a given speed requirement and supply voltage, as illustrated in Figs. 3.22 and 3.23. Notice that for small sampling capacitor values, thermal noise limits the SNR, while for a large sampling capacitor the SNR is limited by the quantization noise and the curve flattens out. Improving the power efficiency beyond topological changes of the OTA and supply voltage reduction requires smart allocation of the biasing currents. Hence, techniques such as current reuse [105, 106], time multiplexing [4, 106], and adaptive duty-cycling of the entire analog front end [107, 108] can be used to improve power efficiency by exploiting the fact that neuron spikes are irregular and of low frequency.
Choosing the OTA bandwidth too high increases the noise and additionally demands an unnecessarily low on-resistance of the switches and thus large transistor dimensions. The optimum time constant remains the same regardless of the circuit size (or ID), because CL scales together with C4 and the parasitic capacitance Cp. The choice of the hold capacitor value is a trade-off between noise requirements on the one hand and speed and power consumption on the other.
3.4 Voltage-Domain SAR A/D Conversion

Fig. 3.24 Closed-loop normalized time constant τ/τt versus hold capacitance C4 for different biasing conditions; case for C4 = 3CL, CL = Cp. The time constant is normalized to the τt (= 1/ωt,intrinsic) of the device, which is approximately CG/gm
The sampling action adds kT/C noise to the system, which can only be reduced by increasing the hold capacitance C4. A large capacitance, on the other hand, increases the load of the operational amplifier and thus decreases the speed for a given power. The OTA size and its bias current for a given speed requirement and minimum power dissipation are determined using τ-versus-C4 curves as in Fig. 3.24. Note that for low-frequency operation (where τ/τt is large), the COTA that achieves the minimum power dissipation for given settling time and noise requirements usually does not correspond to the minimum time constant point. This is because setting the C4/COTA ratio of the circuit to the minimum time constant point requires a larger COTA and results in a power increase and excessive bandwidth. Near the speed limit of the given technology (where the ratio τ/τt is small), however, the difference in power between the minimum power point and the minimum time constant point becomes smaller, as the stringent settling time requirement forces the C4/COTA ratio (Fig. 3.25) to be at its optimum value to achieve the maximum bandwidth.
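The shape of the SNR curves in Fig. 3.22 can be reproduced with a simple noise budget, assuming (our assumption, for illustration) that sampled kT/C noise and quantization noise add in power under a full-scale sine input:

```python
import math

kB, T = 1.380649e-23, 300.0  # Boltzmann constant (J/K), temperature (K)

def max_snr_db(c_sample: float, n_bits: int, v_fs: float = 1.0) -> float:
    """Peak SNR with quantization and sampled kT/C noise added in power."""
    p_signal = (v_fs / (2 * math.sqrt(2))) ** 2   # full-scale sine power
    p_quant = (v_fs / 2 ** n_bits) ** 2 / 12      # quantization noise power
    p_thermal = kB * T / c_sample                 # sampled kT/C noise power
    return 10 * math.log10(p_signal / (p_quant + p_thermal))

# Small capacitors are thermal-noise limited; the curve flattens once
# kT/C drops below the quantization floor:
print(max_snr_db(1e-15, 12) < max_snr_db(10e-12, 12))  # True
print(round(max_snr_db(10e-12, 12)))                   # 74, near 6.02*12 + 1.76
```

For 12 bits and a 10 pF hold capacitor the model approaches the ideal quantization-limited SNR, matching the flattening of the curves described above.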
The OTA in the PG ADC circuit has some unique requirements; the most important concerns the input impedance, which must be purely capacitive so as to guarantee the conservation of charge. Consequently, the OTA input has to be in either the common-source or the source-follower configuration. Another characteristic feature is the load at the OTA output, which is typically purely capacitive; as a result, the OTA output impedance must be high. The benefit of driving solely capacitive loads is that no output voltage buffers are required. The implemented folded-cascode OTA is illustrated in Fig. 3.26. The input stage of the OTA is provided with two extra transistors, T10 and T11, in a common-source connection, having their gates connected to a desired reference common-mode voltage at the input and their drains connected to ground [88]. The advantage of this solution is that the common-mode range at the output is not restricted by a regulation circuit and can very closely approach rail-to-rail behavior. The transistors of the output stage face two constraints: the gm of the cascode transistors T5,6 must be high enough, in order to boost the output resistance
Fig. 3.25 Optimum gate capacitance COTA,opt versus hold capacitance C4 for different loading and parasitic conditions

Fig. 3.26 OTA schematic
of the cascode, allowing a high enough dc gain, and the saturation voltage of the active loads T3,4 and T7,8 must be maximized, in order to reduce the extra noise contribution of the output stage. These considerations imply a trade-off between fitting the saturation voltages into the voltage headroom and minimizing the noise contribution. A good compromise is to make the cascode transistors larger than the active loads: in this way the gm of the cascode transistors is maximized, boosting the dc gain, while their saturation voltage is reduced, allowing a larger saturation voltage for the active loads without exceeding the voltage headroom.
3.4 VoltageDomain SAR A/D Conversion 57
In order to maximize the output SNR, CL must be maximized, which means that
the bandwidth must be minimized. The input-referred noise of the OTA input pair is
reduced by increasing the gm, increasing the current, or increasing the aspect ratio
of the devices. The effect of the last method, however, is partially canceled by the
increase in the noise excess factor. When referred to the OTA input, the noise voltages
of the current sources (or mirrors) in the first stages are multiplied by the gm
of the device itself and divided by the gm of the input transistor, which again suggests
that maximizing the input-pair gm minimizes the noise. It can be further reduced by
decreasing the gm of the current sources. Since the current is usually set by other
requirements, the only possibility is to decrease the aspect ratio of the device.
This leads to an increase in the gate overdrive voltage, which, as a positive side
effect, also decreases the noise excess factor. Increasing L to avoid short-channel effects is also possible,
although with a constant aspect ratio it increases the parasitic capacitances.
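The referral rule described above can be written compactly; in this sketch (an idealized small-signal relation with hypothetical numbers), a current-source noise voltage appears at the OTA input scaled by the ratio of the two transconductances:

```python
def referred_noise(v_src, gm_src, gm_in):
    """Current-source noise voltage referred to the OTA input."""
    return v_src * gm_src / gm_in  # multiplied by gm_src, divided by gm_in

# Assumed example values: a 10 nV source noise, with gm_src ten times
# smaller than the input-pair gm, contributes only 1 nV at the input.
v_eq = referred_noise(v_src=10e-9, gm_src=50e-6, gm_in=500e-6)
assert abs(v_eq - 1e-9) < 1e-15
```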
The dynamic latch illustrated in Fig. 3.27 consists of precharge transistors T14
and T17, cross-coupled inverters T12–13 and T15–16, differential pair T10 and T11, and
switch T9, which prevents static current flow during the reset period [94]. A large
portion of the total comparator current is allocated to the input branches to boost
the input gm. Similarly, the noise from the non-gain elements, i.e., the load transistors,
is minimized by applying a small biasing current; additionally, a small width and
a large length are chosen for their gate dimensions.
The converter utilizes synchronous SAR logic consisting of a cascaded multiple-input,
n-bit shift register (Fig. 3.28) to generate the digital output code and the
switch control signals for the D/A converter. The successive-approximation algorithm
starts with the activation of the MSB, while the other bits remain zero. As
the conversion continues, the remaining bits are successively activated. Each bit
evaluates the state of the others and, as a function of the result, decides whether it
has to be activated, keep its value, or take the value of the comparator [109]. The
selection depends on the state of the register itself and the states of the following
registers. As a result, the switching activity is not high and the leakage power
dominates the total power. To reduce the leakage currents, several techniques are employed.
Fig. 3.27 Comparator schematic
Fig. 3.28 Synchronous SAR logic based on a cascaded multiple-input shift register
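The successive-approximation search implemented by the SAR logic can be modeled behaviorally; the sketch below is an idealized model, not the register-level circuit of Fig. 3.28. It activates the MSB first and then keeps or clears each bit according to the comparator decision:

```python
def sar_convert(vin, vref, nbits=10):
    """Idealized SAR conversion of 0 <= vin < vref to an nbits code."""
    code = 0
    for bit in range(nbits - 1, -1, -1):    # MSB first
        trial = code | (1 << bit)           # activate the next bit
        vdac = vref * trial / (1 << nbits)  # ideal D/A converter output
        if vin >= vdac:                     # comparator decision
            code = trial                    # keep the bit
    return code                             # n cycles for n bits

# A half-scale input maps to the MSB-only code.
assert sar_convert(0.5, 1.0, nbits=10) == 512
```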
3.5 Current-Domain SAR A/D Conversion

The current-mode converters offer high resource efficiency in terms of power and
area [111–114]. In contrast to the voltage-mode charge-redistribution SAR A/D converter,
the corresponding current-mode circuit has several intrinsic advantages, including
tunable input impedances, wide bandwidth, and low supply-voltage requirements.
Additionally, only MOSFET devices are required for the logical and numerical operations,
limiting the area requirements. The current-mode SAR A/D converter is implemented
following the conventional architecture. The output digital code is generated
by comparing the input current, provided through a current sample-and-hold circuit
(S/H), with a reference current provided by a binary current D/A converter (DAC). The
comparison is performed in sequence for each bit of the selected resolution, adding
up to n cycles per conversion (i.e., a binary search). The current comparison requires
only injecting two currents into a single node and using the current that flows
out of the node as the algebraic difference of the two input currents. Since most
current-source implementations have high output impedance, the nodal voltage
generated by the output current indicates the result of the comparison. The current
comparator feeds back in each cycle to the SAR logic, adjusting the reference current
generated by the current-mode D/A converter closer to the input value. The input
dynamic range of the D/A converter is controlled by the biasing current. As a consequence,
the power consumption of the DAC is directly proportional to the signal
level, which is advantageous for low-energy neural signals.
An S/H circuit captures the input signal at the sampling instants and subsequently
holds the signal value, which is then further processed in a current-based binary-search
SAR loop. The schematic of the implemented circuit is illustrated
in Fig. 3.29. The circuit is (pseudo-)differential, and only a single-ended
version is shown. The sample-and-hold operation is performed using an analog
switch formed by transmission gate T4–5 and hold capacitor CH. In sample mode,
switch T4–5 is turned on, and the gates of the current-mirror circuit transistors T1,2 are driven by the input.
Fig. 3.29 Schematic of the current-mode sample-and-hold circuit
Fig. 3.30 Schematic of the inverter-cascade current-mode comparator circuit
The current-mode D/A converter circuit illustrated in Fig. 3.31 consists of a current-replication
network, which generates weighted currents using cascoded current
mirrors (T23–41), and a current-switching network of differential pairs (T1–20)
controlled by the binary bits. The cascode current sources are sized
according to the bit weight and are biased by the same bias voltages. Each weighted
current source (or cascode) is made of a number of LSB devices connected in
parallel (the LSB device becomes the unit device). By partitioning
the weighted devices into units, the unit devices can be positioned in a common-centroid
arrangement to reduce the impact of matching-error gradients. This simple
and compact implementation is able to reach very high conversion rates, limited
only by the steepness of the data waveforms carrying the bits, by the maximum
switching speed of the current switches, and by the technology itself.
At nanoampere bias levels, mismatch limits the linearity of the current-mode
D/A converter, thus restricting the maximum resolution of the A/D converter
[116]. To achieve a 10-bit resolution, calibration as in [111] is employed.
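The binary weighting by parallel unit devices can be expressed directly; in this idealized sketch (mismatch and finite output impedance are ignored, and the unit current is an assumed value), each bit simply switches 2^bit copies of the LSB device onto the output node:

```python
def current_dac(code, i_unit, nbits):
    """Ideal binary current DAC built from parallel LSB unit devices."""
    total = 0.0
    for bit in range(nbits):
        if code & (1 << bit):
            total += (1 << bit) * i_unit  # 2^bit unit devices in parallel
    return total

# Code 0b101 selects 4 + 1 = 5 unit currents.
assert current_dac(0b101, i_unit=1.0, nbits=3) == 5.0
```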
3.6 Time-Domain Two-Step A/D Conversion

The time-mode converters based on asynchronous ADCs [117], slope and integrating
ADCs [118], or pulse-position modulation [119] provide high power and
area efficiency. In the time-based methodology, conventional voltage and current variables
are replaced by the corresponding time differences between two rising edges, and
logic circuits substitute for the large and power-hungry
analog blocks. In deep-submicron CMOS devices, even with the supply-voltage
reduction, the time resolution increases due to the decrease of the gate delay [120].
In the proposed design, a voltage signal is converted to a time-domain representation
using a comparator-based switched-capacitor circuit [121] and a continuous-time
comparator. To improve the power efficiency, the resulting time-domain
information is converted to the corresponding digital code with a two-step time-to-digital
converter (TDC), where the fine quantization of the resulting residue is
obtained with a folding Vernier converter. The implementation results in a 90-nm
CMOS technology show that a significant gain in throughput, resource usage, and
Fig. 3.32 Block diagram of an ADC with two-step time-to-digital conversion; single-input version shown for clarity
power reduction (less than 2.7 µW, corresponding to a figure of merit of 6.2 fJ/conversion-step)
can be obtained for large-scale neural spike data, with a simple
and compact ADC structure that has minimal analog complexity.
The basic concept of the architecture, which utilizes a linear voltage-to-time
converter (VTC) and a two-step time-to-digital converter, is illustrated in
Fig. 3.32. The scheme is reconfigurable in terms of input gain (through the programmable
capacitance C2), resolution (by controlling the number of performed iterations),
and sampling frequency (through the frequency of the input clock). Once a configuration
has been selected, the bias current is also dynamically controlled during
the conversion operation to adapt to the reference voltage. A comparator-based,
switched-capacitor gain stage [121] eliminates the high-gain, high-speed operational
amplifier from the design and does not require a stabilized high-gain, high-speed
feedback loop, reducing complexity and the associated stability-versus-bandwidth/power
trade-off. The VTC converts a sampled input voltage to a pulse whose
duration is linearly proportional to the input voltage.
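The linearity of this voltage-to-time step follows from the constant ramp slope I/C; the sketch below is idealized, with assumed capacitance and current values rather than the actual C1/C2 and IX1 sizing, and shows the pulse width t = C·V/I scaling linearly with the sampled voltage:

```python
def vtc_pulse_width(v_in, c_total, i_ramp):
    """Time for a constant-current ramp on c_total to traverse v_in."""
    return c_total * v_in / i_ramp  # t = C * dV / I

t1 = vtc_pulse_width(0.1, 1e-12, 1e-6)  # 0.1 V on 1 pF at 1 uA -> 100 ns
t2 = vtc_pulse_width(0.2, 1e-12, 1e-6)  # doubling Vin doubles the pulse
assert abs(t1 - 100e-9) < 1e-12 and abs(t2 - 2 * t1) < 1e-12
```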
During the charge-transfer phase, the current source IX1 turns on, charges
the capacitor network consisting of C1 and C2, and generates a constant voltage
ramp at the output voltage Vo, which subsequently causes the virtual-ground voltage
VX to ramp simultaneously via the capacitive divider (Fig. 3.33a). The voltages
continue to ramp until the comparator detects the virtual-ground condition
(VX = VCM) and turns off the current source. When the voltage at the sampling
capacitor reaches the comparator threshold, the comparator output goes high. The
time-to-digital converter measures the time interval tm from the start of the ramp
until the crossover point of the ramp and the input signal, as illustrated in Fig. 3.33b,
i.e., between the rising edge of the start signal and the comparator-generated stop signal.
The time interval is measured by the TDC, which generates a corresponding digital
output. The simplest TDC realization, a digital counter, requires a (very)
high counter frequency to realize a high-resolution converter. Similarly, delay-line
circuits, although more power efficient, necessitate a large number of stages to
measure the required periods of time, significantly degrading the INL and the effective
resolution [122]. A TDC combining a low-frequency, low-power counter as a coarse
Fig. 3.33 a The output voltage ramps to its final value in the comparator-based switched-capacitor charge-transfer phase. b ADC timing signals. c Input versus output voltage of the proposed ADC
quantizer, and a folding Vernier delay-line TDC as a fine quantizer, offers both a
large dynamic range and power efficiency. The post-processed data of
the fine and coarse TDC outputs as a function of the input voltage are shown in Fig. 3.33c.
The maximum and minimum values of the fine folding Vernier TDC correspond to the
half and one-and-a-half periods of the coarse time-to-digital converter, respectively,
measured in increments of the fine TDC unit step size.
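The coarse/fine split can be modeled in a few lines; in this behavioral sketch (the reference period and fine step are assumed values, not the implemented timing), the counter quantizes the interval in reference periods and the folding Vernier stage resolves the remaining residue:

```python
def two_step_tdc(tm, t_ref, t_fine):
    """Split interval tm into coarse counts of t_ref and fine counts of t_fine."""
    coarse = int(tm // t_ref)      # low-frequency, low-power counter
    residue = tm - coarse * t_ref  # what the counter cannot resolve
    fine = int(residue // t_fine)  # folding Vernier fine quantizer
    return coarse, fine

# 137.3 ns measured with a 12.5 ns reference and a 0.5 ns fine step.
coarse, fine = two_step_tdc(137.3e-9, 12.5e-9, 0.5e-9)
assert coarse == 10 and fine == 24
```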
The circuit realization of a fully differential comparator with digitally programmable
offset adjustment [123] is illustrated in Fig. 3.34. Transistors T5–8
employ iterated-instance notation to designate five transistors placed in parallel. The
widths of these devices are binary weighted to offer a programmable current gain,
which creates an offset-programmable preamplifier that is employed for offset
compensation. The continuous-time comparator at the output of the voltage-to-time
converter consists of a differential amplifier followed by a common-source
stage (Fig. 3.35). The input transistors operate in the subthreshold region for
Fig. 3.34 Differential comparator with digitally programmable offset adjustment
Fig. 3.35 Continuous-time comparator
reduced power consumption and to offer a larger input common-mode range and,
consequently, an increased ramp dynamic range.
The coarse current source (Fig. 3.32) is a PMOS cascode that is controlled by
a switch at the gate of the cascode transistor, and the fine current source is a single
NMOS device with a series switch.
A coarse time quantizer, designed using a counter, measures the number of
reference-clock cycles. The fine-resolution quantization of the two-step time-to-digital
converter corresponds to a folding Vernier delay TDC. The proposed architecture
executes the time-to-digital conversion by counting transitions between the
stop signal and the next reference-clock rising edge after the stop signal. These transitions
are enabled only during the measurement interval. The synchronizer block,
which consists of three flip-flops in series, ensures that the coarse and fine time
measurements are correctly aligned.
A folding Vernier delay TDC is easily scalable to different time resolutions and
higher numbers of bits without increasing the area. The architecture achieves the minimum
time resolution of a Vernier delay element (i.e., a basic inverter delay) and, due to
the folding, offers an area-efficient solution. Instead of the 32-element delay line required
for the regular Vernier architecture, the folding feature allows the same Vernier delay
stages to be used repeatedly to measure the delay. Additionally, with the implemented
dynamic control, the power required for each conversion is sequentially reduced.
A block-level view of the folding TDC is illustrated in Fig. 3.36. A simplified overview
of the freeze Vernier delay-line architecture is shown in Fig. 3.37. In this design,
only four thermal codes are generated in every cycle, and, hence, in the worst case,
the measurement cycle is repeated eight times, which is equivalent to a 32-bit thermal
code with only four Vernier delay elements. The 4-bit thermal codes are converted
into four pulses with a thermal-to-clock generator and clock a 5-bit counter at
the output of the TDC. For each thermal bit generated in the freeze Vernier delay line,
a corresponding pulse is generated using a pulse generator. The distance between
two pulses is controlled with current-starved inverters. For a rising-edge input, the
circuit generates a pulse. The width of the pulse is determined by the NAND gate,
inverter, and buffer. The enable signal, which decides whether the signals start/stop
or v1_start/v1_stop continue into the next cycle, is generated using signals vt4 and vp4.
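Behaviorally, the folding operation can be sketched as follows (the stage delays are assumed values, and the real circuit freezes node voltages rather than iterating in software). The start edge gains (τstart − τstop) on the stop edge per stage, and folding lets the same four stages act as a 32-element line:

```python
def folding_vernier(dt, tau_start, tau_stop, stages=4, max_cycles=8):
    """Count Vernier steps until the start edge catches the stop edge."""
    step = tau_start - tau_stop       # time resolution per element
    count = 0
    for _cycle in range(max_cycles):  # fold: reuse the same stages
        for _stage in range(stages):
            if dt <= 0:               # start edge has caught up
                return count
            dt -= step                # one Vernier element traversed
            count += 1
    return count                      # saturates at stages * max_cycles

# A 0.93 ns interval at 50 ps resolution needs 19 steps.
assert folding_vernier(0.93e-9, 150e-12, 100e-12) == 19
```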
Fig. 3.36 Block diagram of the folding TDC
Fig. 3.37 Simplified overview of the freeze Vernier delay-line architecture
In the first conversion cycle, enable = 0 and vstart/vstop is selected for measurement;
otherwise, v1_start/v1_stop is selected. The enable signal is switched from 1
to 0 when the rising edge of vstart/v1_start crosses the rising edge of vstop/v1_stop.
This particular feature dynamically decides when the conversion is stopped; hence,
the power per conversion is optimized based on the input. The TDC also offers feedback
to the system with a ready signal (the inverted enable signal), indicating that
it is ready for the next conversion. The 4-bit thermal code is generated with the freeze
Vernier architecture [124]. In the conventional Vernier architecture, time-capture
elements or early-late detectors (e.g., a D-register or an arbiter) impose a large
load on the circuit. In the freeze Vernier TDC, the time capturing is instead performed
by freezing the node voltages of the start line in a linear Vernier delay line,
allowing a power- and area-efficient conversion. The freeze Vernier converter consists
of inverters and current-enabled inverters only. Additionally, the circuit does
not require any reset signal: it resets on the falling edge of the stop and start
signals. The delays of the inverters in the freeze Vernier delay elements are controlled
using the bias current, thus controlling the resolution of the TDC.
3.7 Experimental Results
Fig. 3.38 Spectral signature (SNDR = 45.6 dB)
Fig. 3.39 SNDR versus gain setting
Fig. 3.40 SFDR, SNDR, and THD versus sampling frequency with fin = 10 kHz
Fig. 3.41 Spectral signature of the current-domain SAR A/D converter, fin = 18.9 kHz, fS = 40 kS/s, SFDR = 64.7 dB, SNDR = 58.3 dB
Fig. 3.42 SFDR, SNDR, and THD versus input frequency
Fig. 3.43 SFDR, SNDR, and THD versus sampling frequency with fin = 1 kHz
voltage-to-time converter. SNDR, SFDR, and THD versus sampling and input frequency
are illustrated in Figs. 3.45 and 3.46, respectively. The THD in the range
of 40–640 kS/s is above 63 dB within the bandwidth of neural activity of up to
20 kHz; the SNDR is above 58 dB, and the SFDR more than 64 dB. The maximum simulated
DNL is 0.6 LSB and the maximum simulated INL is 0.8 LSB. Variation
Fig. 3.44 Spectral signature of the time-domain A/D converter
Fig. 3.45 SFDR, SNDR, and THD versus sampling frequency with fin = 20 kHz and gain set to 18 dB
Fig. 3.46 SFDR, SNDR, and THD versus input frequency with gain set to 18 dB
across the slow-slow and fast-fast corners is 0.35 ENOB. The VTC is >9-bit linear
across the 0.5 V input range.
Consequently, the ramp-rate variation across the input range is limited to 10%,
leading to a 400 µV nonlinear voltage variation across the output range. The reference
clock frequency is 80 MHz, and, subsequently, the counter realizes a 5-bit
resolution over the 400 ns TDC input time-signal range. The ramp repetition frequency,
i.e., the sampling frequency of the proposed ADC, is 640 kHz. The simulated
ENOB is 9.4 bits over the entire neural-spike input bandwidth. The total A/D
converter consumes 2.7 µW when sampled at 640 kS/s and 1.6 µW at 40 kS/s.
The area of the folding Vernier TDC design sums up to 10.5 µm²,
the average resolution is 10.05 ps, it operates at a power supply of 0.4 V, and it consumes
0.6 µW of power at a 640 kS/s sampling rate. Table 3.1 summarizes the performance,
while Table 3.2 shows a comparison with previous art.
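The quoted figure of merit can be cross-checked from these numbers with the standard Walden formula, FoM = P/(2^ENOB · fs):

```python
def walden_fom(power_w, enob_bits, f_sample_hz):
    """Walden figure of merit in J per conversion-step."""
    return power_w / (2.0 ** enob_bits * f_sample_hz)

# 2.7 uW at 640 kS/s with a 9.4-bit ENOB (the values reported above)
fom = walden_fom(2.7e-6, 9.4, 640e3)
assert 6.0e-15 < fom < 6.5e-15  # ~6.2 fJ/conversion-step
```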
3.8 Conclusions
converter consumes less than 2.7 µW of power when operating at a 640 kS/s sampling
frequency. With 6.2 fJ/conversion-step, the circuit, realized in 90 nm CMOS
technology, exhibits one of the best FoMs reported and occupies an estimated area
of only 0.022 mm².
References
19. H. van der Ploeg, G. Hoogzaad, H.A.H. Termeer, M. Vertregt, R.L.J. Roovers, A 2.5V 12b 54Msample/s 0.25µm CMOS ADC in 1mm² with mixed-signal chopping and calibration. IEEE J. Solid-State Circuits 36(12), 1859–1867 (2001)
20. M. Clara, A. Wiesbauer, F. Kuttner, A 1.8V fully embedded 10 b 160 MS/s two-step ADC in 0.18µm CMOS, in Proceedings of IEEE Custom Integrated Circuit Conference, pp. 437–440, 2002
21. T.C. Lin, J.C. Wu, A two-step A/D converter in digital CMOS processes, in Proceedings of IEEE Asia-Pacific Conference on ASIC, pp. 177–180, 2002
22. A. Zjajo, H. van der Ploeg, M. Vertregt, A 1.8V 100mW 12bits 80Msample/s two-step ADC in 0.18µm CMOS, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 241–244, 2003
23. N. Ning, F. Long, S.Y. Wu, Y. Liu, G.Q. Liu, Q. Yu, M.H. Yang, An 8Bit 250MSPS modified two-step ADC, in Proceedings of IEEE International Conference on Communications, Circuits and Systems, pp. 2197–2200, 2006
24. S. Hashemi, B. Razavi, A 7.1 mW 1 GS/s ADC with 48dB SNDR at Nyquist rate. IEEE J. Solid-State Circuits 49(8), 1739–1750 (2014)
25. A. Wiesbauer, M. Clara, M. Harteneck, T. Potscher, C. Fleischhacker, G. Koder, C. Sandner, A fully integrated analog front-end macro for cable modem applications in 0.18µm CMOS, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 245–248, 2001
26. R.C. Taft, M.R. Tursi, A 100MS/s 8b CMOS subranging ADC with sustained parametric performance from 3.8V down to 2.2 V. IEEE J. Solid-State Circuits 36(3), 331–338 (2001)
27. J. Mulder, C.M. Ward, C.H. Lin, D. Kruse, J.R. Westra, M. Lughtart, E. Arslan, R.J. van de Plassche, K. Bult, F.M.L. van der Goes, A 21mW 8b 125MSample/s ADC in 0.09mm² 0.13µm CMOS. IEEE J. Solid-State Circuits 39(5), 2116–2125 (2004)
28. P.M. Figueiredo, P. Cardoso, A. Lopes, C. Fachada, N. Hamanishi, K. Tanabe, J. Vital, A 90nm CMOS 1.2V 6b 1GS/s two-step subranging ADC, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 568–569, 2006
29. Y. Shimizu, S. Murayama, K. Kudoh, H. Yatsuda, A 30mW 12b 40MS/s subranging ADC with a high-gain offset-canceling positive-feedback amplifier in 90nm digital CMOS, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 216–217, 2006
30. J. Huber, R.J. Chandler, A.A. Abidi, A 10b 160MS/s 84mW 1V subranging ADC in 90nm CMOS, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 454–455, 2007
31. C. Cheng, Y. Jiren, A 10bit 500MS/s 124mW subranging folding ADC in 0.13µm CMOS, in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 1709–1712, 2007
32. Y. Shimizu, S. Murayama, K. Kudoh, H. Yatsuda, A split-load interpolation-amplifier-array 300MS/s 8b subranging ADC in 90nm CMOS, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 552–553, 2008
33. K. Yoshioka et al., Dynamic architecture and frequency scaling in 0.8–1.2 GS/s 7b subranging ADC. IEEE J. Solid-State Circuits 50(4), 932–945 (2015)
34. D.A. Mercer, A 14b, 2.5 MSPS pipelined ADC with on-chip EPROM. IEEE J. Solid-State Circuits 31(1), 70–76 (1996)
35. I. Opris, L. Lewicki, B. Wong, A single-ended 12bit 20 MSample/s self-calibrating pipeline A/D converter. IEEE J. Solid-State Circuits 33(11), 1898–1903 (1998)
36. A.M. Abo, P.R. Gray, A 1.5V, 10bit, 14.3MS/s CMOS pipeline analog-to-digital converter. IEEE J. Solid-State Circuits 34(5), 599–606 (1999)
37. H.S. Chen, K. Bacrania, B.S. Song, A 14b 20MSample/s CMOS pipelined ADC, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 46–47, 2000
38. I. Mehr, L. Singer, A 55mW, 10bit, 40Msample/s Nyquist-rate CMOS ADC. IEEE J. Solid-State Circuits 35(3), 70–76 (2000)
39. Y. Chiu, Inherently linear capacitor error-averaging techniques for pipelined A/D conversion, in IEEE Transactions on Circuits and Systems II, vol. 47, pp. 229–232, 2000
40. X. Wang, P.J. Hurst, S.H. Lewis, A 12bit 20Msample/s pipelined analog-to-digital converter with nested digital background calibration. IEEE J. Solid-State Circuits 39(11), 1799–1808 (2004)
41. D. Kurose, T. Ito, T. Ueno, T. Yamaji, T. Itakura, 55mW 200MSPS 10bit pipeline ADCs for wireless receivers, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 527–530, 2005
42. C.T. Peach, A. Ravi, R. Bishop, K. Soumyanath, D.J. Allstot, A 9b 400 Msample/s pipelined analog-to-digital converter in 90nm CMOS, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 535–538, 2005
43. A.M.A. Ali, C. Dillon, R. Sneed, A.S. Morgan, S. Bardsley, J. Kornblum, L. Wu, A 14bit 125 MS/s IF/RF sampling pipelined ADC with 100dB SFDR and 50fs jitter. IEEE J. Solid-State Circuits 41(8), 1846–1855 (2006)
44. M. Daito, H. Matsui, M. Ueda, K. Iizuka, A 14bit 20MS/s pipelined ADC with digital distortion calibration. IEEE J. Solid-State Circuits 41(11), 2417–2423 (2006)
45. T. Ito, D. Kurose, T. Ueno, T. Yamaji, T. Itakura, 55mW 1.2V 12bit 100MSPS pipeline ADCs for wireless receivers, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 540–543, 2006
46. J. Treichler, Q. Huang, T. Burger, A 10bit ENOB 50MS/s pipeline ADC in 130nm CMOS at 1.2V supply, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 552–555, 2006
47. I. Ahmed, D.A. Johns, An 11bit 45MS/s pipelined ADC with rapid calibration of DAC errors in a multibit pipeline stage, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 147–150, 2007
48. S.C. Lee, Y.D. Jeon, J.K. Kwon, J. Kim, A 10bit 205MS/s 1.0mm² 90nm CMOS pipeline ADC for flat panel display applications. IEEE J. Solid-State Circuits 42(12), 2688–2695 (2007)
49. J. Li, R. Leboeuf, M. Courcy, G. Manganaro, A 1.8V 10b 210MS/s CMOS pipelined ADC featuring 86dB SFDR without calibration, in Proceedings of IEEE Custom Integrated Circuits Conference, pp. 317–320, 2007
50. M. Boulemnakher, E. Andre, J. Roux, F. Paillardet, A 1.2V 4.5mW 10b 100MS/s pipeline ADC in a 65nm CMOS, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 250–251, 2008
51. Y.S. Shu, B.S. Song, A 15bit linear 20MS/s pipelined ADC digitally calibrated with signal-dependent dithering. IEEE J. Solid-State Circuits 43(2), 342–350 (2008)
52. J. Shen, P.R. Kinget, A 0.5V 8bit 10Ms/s pipelined ADC in 90nm CMOS. IEEE J. Solid-State Circuits 43(4), 1799–1808 (2008)
53. C.J. Tseng, Y.C. Hsieh, C.H. Yang, H.S. Chen, A 10bit 200 MS/s capacitor-sharing pipeline ADC. IEEE Trans. Circuits Syst. I: Regul. Pap. 60(11), 2902–2910 (2013)
54. R. Sehgal, F. van der Goes, K. Bult, A 12 b 53 mW 195 MS/s pipeline ADC with 82dB SFDR using split-ADC calibration. IEEE J. Solid-State Circuits 50(7), 1592–1603 (2015)
55. L. Yong, M.P. Flynn, A 100 MS/s 10.5 bit 2.46 mW comparator-less pipeline ADC using self-biased ring amplifiers. IEEE J. Solid-State Circuits 50(10), 2331–2341 (2015)
56. S.H. Lewis, H.S. Fetterman, G.F. Gross, R. Ramachandran, T.R. Viswanathan, A 10b 20Msample/s analog-to-digital converter, in IEEE Journal of Solid-State Circuits, vol. 27, no. 3, pp. 351–358, 1992
57. B. Xia, A. Valdes-Garcia, E. Sanchez-Sinencio, A configurable time-interleaved pipeline ADC for multistandard wireless receivers, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 259–262, 2004
58. S.C. Lee, G.H. Kim, J.K. Kwon, J. Kim, S.H. Lee, Offset and dynamic gain-mismatch reduction techniques for 10b 200Ms/s parallel pipeline ADCs, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 531–534, 2005
59. S. Limotyrakis, S.D. Kulchycki, D.K. Su, B.A. Wooley, A 150MS/s 8b 71mW CMOS time-interleaved ADC. IEEE J. Solid-State Circuits 40(5), 1057–1067 (2005)
60. C.C. Hsu, F.C. Huang, C.Y. Shih, C.C. Huang, Y.H. Lin, C.C. Lee, B. Razavi, An 11b 800MS/s time-interleaved ADC with digital background calibration, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 464–465, 2007
61. Z.M. Lee, C.Y. Wang, J.T. Wu, A CMOS 15bit 125MS/s time-interleaved ADC with digital background calibration. IEEE J. Solid-State Circuits 42(10), 2149–2160 (2007)
62. C.Y. Chen et al., A 12bit 3 GS/s pipeline ADC with 0.4mm² and 500 mW in 40nm digital CMOS. IEEE J. Solid-State Circuits 47(4), 1013–1021 (2012)
63. J. Park, H.J. Park, J.W. Kim, S. Seo, P. Chung, A 1 mW 10bit 500 kSps SAR A/D converter, in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 581–584, 2000
64. P. Confalonieri et al., A 2.7 mW 1 MSps 10 b analog-to-digital converter with built-in reference buffer and 1 LSB accuracy programmable input ranges, in Proceedings of IEEE European Solid-State Circuits Conference, pp. 255–258, 2004
65. N. Verma, A.P. Chandrakasan, An ultra low energy 12bit rate-resolution scalable SAR ADC for wireless sensor nodes. IEEE J. Solid-State Circuits 42(6), 1196–1205 (2007)
66. C.C. Liu et al., A 10bit 50MS/s SAR ADC with a monotonic capacitor switching procedure. IEEE J. Solid-State Circuits 45(4), 731–740 (2010)
67. S. Shikata, R. Sekimoto, T. Kuroda, H. Ishikuro, A 0.5V 1.1 MS/sec 6.3 fJ/conversion-step SAR-ADC with tri-level comparator in 40nm CMOS. IEEE J. Solid-State Circuits 47(4), 1022–1030 (2012)
68. Z. Dai, A. Bhide, A. Alvandpour, A 53nW 9.1-ENOB 1kS/s SAR ADC in 0.13µm CMOS for medical implant devices. IEEE J. Solid-State Circuits 47(7), 1585–1593 (2012)
69. G.Y. Huang et al., A 1µW 10bit 200kS/s SAR ADC with a bypass window for biomedical applications. IEEE J. Solid-State Circuits 47(11), 2783–2795 (2012)
70. M. Yip, A.P. Chandrakasan, A resolution-reconfigurable 5-to-10bit 0.4-to-1V power scalable SAR ADC for sensor applications. IEEE J. Solid-State Circuits 48(6), 1453–1464 (2013)
71. P. Harpe, E. Cantatore, A. van Roermund, A 10b/12b 40 kS/s SAR ADC with data-driven noise reduction achieving up to 10.1b ENOB at 2.2 fJ/conversion-step. IEEE J. Solid-State Circuits 48(12), 3011–3018 (2013)
72. F.M. Yaul, A.P. Chandrakasan, A 10b SAR ADC with data-dependent energy reduction using LSB-first successive approximation. IEEE J. Solid-State Circuits 49(12), 2825–2834 (2014)
73. J.H. Tsai et al., A 0.003mm² 10 b 240 MS/s 0.7 mW SAR ADC in 28nm CMOS with digital error correction and correlated-reversed switching. IEEE J. Solid-State Circuits 50(6), 1382–1398 (2015)
74. B.S. Song, M.F. Tompsett, K.R. Lakshmikumar, A 12 bit 1MHz capacitor error averaging pipelined A/D converter. IEEE J. Solid-State Circuits 23(10), 1324–1333 (1988)
75. Y.M. Lin, B. Kim, P.R. Gray, A 13b 2.5MHz self-calibrated pipelined A/D converter in 3µm CMOS. IEEE J. Solid-State Circuits 26(5), 628–635 (1991)
76. C.S.G. Conroy, D.W. Cline, P.R. Gray, A high-speed parallel pipelined ADC technique in CMOS, in Proceedings of IEEE Symposium on VLSI Circuits, pp. 96–97, 1992
77. B.S. Song, M.F. Tompsett, K.R. Lakshmikumar, A 12 bit 1MHz capacitor error averaging pipelined A/D. IEEE J. Solid-State Circuits 23(10), 1324–1333 (1988)
78. J.M. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd edn. (Prentice Hall, New Jersey, 2003)
79. A.A. Abidi, High-frequency noise measurements on FETs with small dimensions. IEEE Trans. Electron Devices 33(11), 1801–1805 (1986)
80. C. Enz, Y. Cheng, MOS transistor modeling for RF IC design. IEEE J. Solid-State Circuits 35(2), 186–201 (2000)
81. A.M. Abo, P.R. Gray, A 1.5V, 10bit, 14.3MS/s CMOS pipeline analog-to-digital converter. IEEE J. Solid-State Circuits 34(5), 599–606 (1999)
82. B.J. Hosticka, Improvement of the gain of MOS amplifiers. IEEE J. Solid-State Circuits 14(6), 1111–1114 (1979)
83. E. Säckinger, W. Guggenbühl, A high-swing, high-impedance MOS cascode circuit. IEEE J. Solid-State Circuits 25(1), 289–297 (1990)
84. U. Gatti, F. Maloberti, G. Torelli, A novel CMOS linear transconductance cell for continuous-time filters, in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 1173–1176, 1990
85. C.A. Laber, P.R. Gray, A positive-feedback transconductance amplifier with applications to high frequency high Q CMOS switched capacitor filters. IEEE J. Solid-State Circuits 23(6), 1370–1378 (1988)
86. A.A. Abidi, An analysis of bootstrapped gain enhancement techniques. IEEE J. Solid-State Circuits 22(6), 1200–1204 (1987)
87. B.J. Hosticka, Dynamic CMOS amplifiers. IEEE J. Solid-State Circuits 15(5), 881–886 (1980)
88. K. Bult, G. Geelen, A fast-settling CMOS op amp for SC circuits with 90dB DC gain. IEEE J. Solid-State Circuits 25(6), 1379–1384 (1990)
89. R. Ockey, M. Syrzycki, Optimization of a latched comparator for high-speed analog-to-digital converters, in IEEE Canadian Conference on Electrical and Computer Engineering, vol. 1, pp. 403–408, 1999
90. F. Murden, R. Gosser, 12b 50MSample/s two-stage A/D converter, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 278–279, 1995
91. J. Robert, G.C. Temes, V. Valencic, R. Dessoulavy, D. Philippe, A 16bit low-voltage CMOS A/D converter. IEEE J. Solid-State Circuits 22(2), 157–163 (1987)
92. T.B. Cho, P.R. Gray, A 10 b, 20 Msample/s, 35 mW pipeline A/D converter. IEEE J. Solid-State Circuits 30(3), 166–172 (1995)
93. L. Sumanen, M. Waltari, K. Halonen, A mismatch insensitive CMOS dynamic comparator for pipeline A/D converters, in Proceedings of the IEEE International Conference on Circuits and Systems, pp. 32–35, 2000
94. T. Kobayashi, K. Nogami, T. Shirotori, Y. Fujimoto, A current-controlled latch sense amplifier and a static power-saving input buffer for low-power architecture. IEEE J. Solid-State Circuits 28(4), 523–527 (1993)
95. P.M. Figueiredo, J.C. Vital, Low kickback noise techniques for CMOS latched comparators, in IEEE International Symposium on Circuits and Systems, vol. 1, pp. 537–540, 2004
96. B. Nauta, A.G.W. Venes, A 70MS/s 110mW 8b CMOS folding and interpolating A/D converter. IEEE J. Solid-State Circuits 30(12), 1302–1308 (1995)
97. J. Lin, B. Haroun, An embedded 0.8V/480µW 6b/22MHz flash ADC in 0.13µm digital CMOS process using nonlinear double-interpolation technique, in IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 244–246, 2002
98. F. Shahrokhi et al., The 128-channel fully differential digital integrated neural recording and stimulation interface. IEEE Trans. Biomed. Circuits Syst. 4(3), 149–161 (2010)
99. H. Gao et al., HermesE: a 96-channel full data rate direct neural interface in 0.13µm CMOS. IEEE J. Solid-State Circuits 47(4), 1043–1055 (2012)
100. D. Han et al., A 0.45V 100-channel neural-recording IC with sub-µW/channel consumption in 0.18µm CMOS. IEEE Trans. Biomed. Circuits Syst. 7(6), 735–746 (2013)
101. M.S. Chae, W. Liu, M. Sivaprakasam, Design optimization for integrated neural recording systems. IEEE J. Solid-State Circuits 43(9), 1931–1939 (2008)
102. T.M. Seese, H. Harasaki, G.M. Saidel, C.R. Davies, Characterization of tissue morphology, angiogenesis, and temperature in the adaptive response of muscle tissue to chronic heating. Lab. Invest. 78(12), 1553–1562 (1998)
103. A. Rodríguez-Pérez et al., A 64-channel inductively-powered neural recording sensor array, in Proceedings of IEEE Biomedical Circuits and Systems Conference, pp. 228–231, 2012
104. C. Enz, Y. Cheng, MOS transistor modeling for RF IC design. IEEE J. Solid-State Circuits 35(2), 186–201 (2000)
105. S. Song et al., A 430nW 64nV/√Hz current-reuse telescopic amplifier for neural recording applications, in Proceedings of IEEE Biomedical Circuits and Systems Conference, pp. 322–325, 2013
106. X. Zou et al., A 100-channel 1mW implantable neural recording IC. IEEE Trans. Circuits Syst. I Regul. Pap. 60(10), 2584–2596 (2013)
107. J. Lee, H.G. Rhew, D.R. Kipke, M.P. Flynn, A 64 channel programmable closedloop neu
rostimulator with 8 channel neural amplifier and logarithmic ADC. IEEE J. SolidState
Circuits 45(9), 19351945 (2010)
108. K. Abdelhalim, R. Genov, CMOS DACsharing stimulator for neural recording and stimula
tion arrays, in Proceedings of IEEE International Symposium on Circuits and Systems, pp.
17121715, 2011
109. A. Rossi, G. Fucilli, Nonredundant successive approximation register for A/D converters.
Electronic Lett. 32(12), 10551056 (1996)
110. S. Narendra, V. De, S. Borkar, D.A. Antoniadis, A.P. Chandrakasan, Fullchip subthreshold
leakage power prediction and reduction techniques for sub0.18m CMOS. IEEE J. Solid
State Circuits 39(2), 501510 (2004)
111. B. Haaheim, T.G. Constandinou, A sub1W, 16kHz Currentmode SARADC for single
neuron spike recording, in Proceedings of IEEE Biomedical Circuits and Systems Conference,
pp. 29572960, 2012
112. A. Agarwal, Y.B. Kim, S. Sonkusale, Low power current mode ADC for CMOS sensor IC,
in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 584587,
2005
113. R. Dlugosz, K. Iniewski, Ultra low power currentmode algorithmic analogtodigital
converter implemented in 0.18m CMOS technology for wireless sensor network, in
Proceedings of IEEE International Conference on Mixed Design of Integrated Circuits and
Systems, pp. 401406, 2006
114. S. AlAhdab, R. Lotfi, W. Serdijn, A 1V 225nW 1kS/s current successive approximation
ADC for pacemakers, in Proceedings of IEEE International Conference on Ph.D. Research
in Microelectronics and Electronics, pp. 14, 2010
115. Y. Sugimoto, A 1.5V currentmode CMOS sampleandhold IC with 57dB S/N at 20 MS/s
and 54dB S/N at 30 MS/s. IEEE J. SolidState Circuits 36(4), 696700 (2001)
116. B. LinaresBarranco, T. SerranoGotarredona, On the design and characterization of femto
ampere currentmode circuits. IEEE J. SolidState Circuits 38(8), 13531363 (2003)
117. E. Allier etal., 120nm low power asynchronous ADC, in Proceedings of IEEE
International Symposium on Low Power Electronic Design, pp. 6065, 2005
118. M. Park, M.H. Perrot, A singleslope 80MS/s ADC using twostep timetodigital conversion,
in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 11251128,
2009
119. S. Naraghi, M. Courcy, M.P. Flynn, A 9bit, 14W and 0.006mm2 pulse position modulation
ADC in 90nm digital CMOS. IEEE J. SolidState Circuits 45(9), 18701880 (2010)
120. A.P. Chandrakasan etal., Technologies for ultradynamic voltage scaling. Proc. IEEE 98(2),
191214 (2010)
121. J.K. Fiorenza etal., Comparatorbased switchedcapacitor circuits for scaled CMOS tech
nologies. IEEE J. SolidState Circuits 41(12), 26582668 (2006)
122. J.P. Jansson, A. Mantyniemi, J. Kostamovaara, A CMOS timetodigital converter with better
than 10ps singleshot precision. IEEE J. SolidState Circuits 41(6), 12861296 (2006)
123. L. Brooks, H.S. Lee, A 12b, 50 MS/s, fully differential zerocrosssing based pipelined
ADC. IEEE J. SolidState Circuits 44(12), 33293343 (2009)
124. K. Blutman, J. Angevare, A. Zjajo, N. van der Meijs, A 0.1pJ freeze Vernier timetodigital
converter in 65nm CMOS, in Proceedings of IEEE International Symposium on Circuits
and Systems, pp. 8588, 2014
125. R.H. Walden, Analogtodigital converter survey and analysis. IEEE J. Sel. Areas Commun.
17, 539550 (1999)
126. C.M. Lopez etal., An implantable 455activeelectrode 52channel CMOS neural probe.
IEEE J. SolidState Circuits 49(1), 248261 (2014)
127. T. Rabuske etal., A selfcalibrated 10bit 1MSps SAR ADC with reducedvoltage charge
sharing DAC, in Proceedings of IEEE International Symposium on Circuits and Systems,
pp. 24522455, 2013
76 3 Neural Signal Quantization Circuits
128. C. Gao etal., An ultralowpower extended counting ADC for large scale sensor arrays, in
Proceedings of IEEE International Symposium on Circuits and Systems, pp. 8184, 2014
129. L. Zheng etal., An adaptive 16/64kHz, 9bit SAR ADC with peakaligned sampling
for neural spike recording, in IEEE International Symposium on Circuits and Systems,
pp. 23852388, 2014
130. Y.W. Cheng, K.T. Tang, A 0.5V 1.28MS/s 10bit SAR ADC with switching detect logic,
in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 293296,
2015
Chapter 4
Neural Signal Classification Circuits
4.1 Introduction
An electrode records the action potentials from multiple surrounding neurons (e.g., due to the background activity of other neurons, slight perturbations in electrode position, or external electrical or mechanical interference), and the recorded waveform consists of the superimposed potentials fired from these neurons. The ability to distinguish spikes from noise [4], and to distinguish spikes from different sources within the superimposed waveform, therefore depends on both the discrepancies between the noise-free spikes from each source and the signal-to-noise ratio (SNR) of the recording system. The time occurrences of the action potentials emitted by the neurons close to the electrode are detected, depending on the SNR, either by voltage thresholding with respect to an estimate of the noise amplitude in the signal or with a more advanced technique, such as the continuous wavelet transform [5]. After waveform alignment, to simplify the classification process, a feature extraction step, such as principal component analysis (PCA) [6] or wavelet decomposition [7], characterizes detected spikes and represents each detected spike in a reduced-dimensional space, i.e., for a spike consisting of n sample points, the feature extraction method produces m variables (m < n), where m is the number of features. Based on these features, the spikes are classified into m-dimensional clusters by k-means [8], expectation maximization (EM) [9], template matching [10], Bayesian clustering [11], or an artificial neural network (ANN), with each cluster corresponding to the spiking activity of a single neuron.
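The detect, extract, and cluster chain described above can be sketched end-to-end. The following assembles robust-threshold detection, PCA feature extraction (m = 2), and k-means on a synthetic trace; the window length, threshold factor, and spike shapes are illustrative assumptions, not values from this chapter.

```python
import numpy as np

def detect_spikes(x, thr_factor=4.0, win=32):
    """Threshold detection against a robust, median-based noise estimate."""
    sigma = np.median(np.abs(x)) / 0.6745        # robust noise amplitude estimate
    idx = np.flatnonzero(x > thr_factor * sigma)
    spikes, last = [], -win
    for i in idx:                                # one aligned window per crossing
        if i - last > win and win // 2 <= i < len(x) - win // 2:
            last = i
            spikes.append(x[i - win // 2 : i + win // 2])
    return np.array(spikes)

def pca_features(spikes, m=2):
    """Project n-sample spikes onto the m leading principal components (m < n)."""
    centered = spikes - spikes.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:m].T

def kmeans(feat, k=2, iters=25, seed=0):
    """Plain k-means in the m-dimensional feature space."""
    rng = np.random.default_rng(seed)
    centers = feat[rng.choice(len(feat), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((feat[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = feat[labels == j].mean(axis=0)
    return labels

# synthetic trace: two spike shapes riding on Gaussian noise
rng = np.random.default_rng(1)
x = 0.05 * rng.standard_normal(24000)
t = np.arange(32)
shape_a = np.exp(-(t - 16) ** 2 / 8.0)           # narrow unit-height spike
shape_b = 0.6 * np.exp(-(t - 16) ** 2 / 40.0)    # wider, smaller spike
for pos in range(500, 23400, 1000):
    x[pos : pos + 32] += shape_a if (pos // 1000) % 2 else shape_b

spikes = detect_spikes(x)
labels = kmeans(pca_features(spikes, m=2), k=2)
```

A real front end would of course operate on quantized samples streamed from the A/D converter rather than on a full trace in memory.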
The support vector machine (SVM) has been introduced to bioinformatics and spike classification/sorting [12–14] because of its excellent generalization, sparse solution, and use of quadratic programming, which provides global optimization. This absence of local minima is a substantial difference from artificial neural network classifiers. Like ANN classifiers, applying an SVM to any classification problem requires the determination of several user-defined parameters, e.g., the choice of an appropriate kernel and related parameters, the determination of the regularization parameter (i.e., C), and an appropriate optimization technique. Correspondingly, the SVM applies structural risk minimization instead of empirical risk minimization, and efficiently handles nonlinear problems and the curse of dimensionality. However, the methods of [12–14] could not identify multiclass neural spikes, nor could they decompose overlapping neural spikes resulting from variable triggering of data collection (e.g., premature or delayed waveforms due to noise or other spike events). Recording multiple spikes on a specific electrode can also create complex sums of neuron waveforms [15].
In this chapter, we present a 128-channel, programmable, neural spike classifier based on nonlinear energy operator spike detection and multiclass kernel support vector machine classification, which accurately identifies overlapping neural spikes even at low SNR. For efficient algorithm execution, we transform the multiclass problem with Kesler's construction and extend the iterative greedy optimization reduced set vectors approach with a cascaded method. The power-efficient, multichannel clustering is achieved by a combination of several algorithm and circuit techniques, namely, Kesler's transformation, a boosted cascade reduced set vectors approach, two-stage pipelined processing units, power-scalable kernels, register-bank memory, high-VT devices, and a near-threshold supply.
The results obtained in a 65-nm CMOS technology show that efficient, large-scale neural spike data classification can be achieved with a low-power (less than 41 μW, corresponding to a power density of 15.5 μW/mm²), compact, low-resource-usage structure (31k logic gates, resulting in a 2.64-mm² area).
This chapter is organized as follows: Sect. 4.2 focuses on the neural spike classifier and associated design decisions. In Sect. 4.3, SVM training and classification are described, and the iterative greedy optimization reduced set vectors approach is extended with a cascaded method (boosted cascade). Section 4.4 elaborates experimental results. Finally, Sect. 4.5 provides a summary and the main conclusions.
4.2 Spike Detector
Fig. 4.1 Spike-sorting flow: previous art (spike detection, e.g., thresholding; feature extraction, e.g., PCA; classification, e.g., k-means; with off-chip training) versus the proposed system (energy-filter-based spike detection, max–min feature extraction, and multiclass SVM classification with on-chip training)

Fig. 4.2 System architecture: a 16-channel, time-multiplexed front end (low-noise amplifier, band-pass filter, programmable-gain amplifier, N:1 multiplexer, A/D converter) feeding a spike-detection decision unit (energy filter, threshold unit, noise-shaping filter), a max–min feature extractor, a multiclass SVM classifier, and a system control unit with FSM, instruction/data SRAM, arbiter, and ALU
The spike detector offers spike-detection algorithm programmability and parameter-set flexibility. The system control unit is loaded with 32 10-bit filter coefficients and a 16-bit threshold value. The spike detector algorithm calculates the energy function for waveforms inside a slicing window; when a spike event reaches the threshold, the spike data are stored and transferred for the alignment process and further feature extraction. The noise-shaping filter provides the spike waveform's derivatives to identify neurons' kernel signatures (including the positive and negative peaks of the spike derivative and the spike height). The filter coefficients are programmable through the coefficient register array; consequently, a variety of noise profiles and spike widths can be precisely accommodated. To attain minimal phase distortion, we utilize a Bessel filter structure. For real-time, high-throughput operation, all spike-processing operations, including detection, filtering, and feature extraction, are performed in parallel.
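The energy-function detection step can be sketched with the nonlinear energy operator (NEO) named in Sect. 4.1; the smoothing window and the threshold scaling below are illustrative assumptions, not the chip's programmed parameters.

```python
import numpy as np

def neo_detect(x, c_thr=8.0, win=8):
    """Nonlinear energy operator psi[n] = x[n]^2 - x[n-1]*x[n+1],
    smoothed over a short slicing window and compared to a scaled mean."""
    psi = x[1:-1] ** 2 - x[:-2] * x[2:]
    kernel = np.bartlett(win) / np.bartlett(win).sum()   # window smoothing
    energy = np.convolve(psi, kernel, mode="same")
    thr = c_thr * energy.mean()                          # adaptive threshold
    return np.flatnonzero(energy > thr) + 1              # indices into x

rng = np.random.default_rng(0)
x = 0.05 * rng.standard_normal(5000)
x[1000:1008] += np.hanning(8)           # one injected spike-like transient
hits = neo_detect(x)
```

Because the NEO weights the instantaneous energy by the local frequency, it rejects low-frequency drift better than plain amplitude thresholding.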
The SRAM is implemented as register-bank memory, since it can be scaled to subthreshold voltages (i.e., to reduce the leakage power). In contrast, compiled SRAM has a limited read noise margin and, consequently, cannot be scaled below 0.7 V.
The register-bank memories are organized as spike registers [16], as shown in Fig. 4.3. Each spike register module consists of 10-bit registers to save the spike waveforms, and a delay line for clock gating. The decoder enables sequential, clock-controlled selection of each spike sample S from a spike register. In each 10-bit spike register, only 1-bit D flip-flops have an active clock. Accordingly, such a delay-line-based clock-gating arrangement reduces redundant clock transitions and, consequently, allows a 10-fold reduction in the clock-switching power (corresponding to a 32% reduction in the total power consumed by the memory).

Fig. 4.3 Spike-register bank: spike registers 1…N with a write decoder, w_en/spk_out signals, clk_en clock gating, and addr_w/addr_r address inputs
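A quick back-of-the-envelope check of the quoted savings, assuming the non-clock portion of the memory power is unchanged by gating, backs out the implied share of the clock tree in the total memory power:

```python
# If a 10x cut in clock-switching power yields a 32% cut in total memory
# power, the clock tree must have drawn a fraction f of the total, where
#   f * (1 - 1/10) = 0.32   =>   f ~= 0.356 (about 36% of memory power).
reduction_factor = 10.0
total_saving = 0.32
clock_fraction = total_saving / (1.0 - 1.0 / reduction_factor)
remaining_power = 1.0 - total_saving   # normalized total power after gating
```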
4.3 Spike Classifier
The support vector machine is a linear classifier in the parameter space; nevertheless, it becomes a nonlinear classifier as a result of the nonlinear mapping of the space of the input patterns into a high-dimensional feature space. The classifier operations can be combined to realize a variety of multiclass [21] and ensemble classifiers (e.g., classifier trees and adaptive boosting [22]). Instead of creating many binary classifiers to determine the class labels, we solve the multiclass problem directly [23] by modifying the binary-class objective function and adding a constraint for every class. The modified objective function allows simultaneous computation of the multiclass classification [24]. Let us consider labeled training spike trains of $N$ data points $\{y_k^{(i)}, x_k\}_{k=1,\,i=1}^{k=N,\,i=m}$, where $x_k$ is the $k$th input pattern from the $n$-dimensional space $\mathbb{R}^n$ and $y_k^{(i)}$ denotes the output of the $i$th output unit for pattern $k$, i.e., an approach very similar to the ANN methodology. The $m$ outputs can encode $q = 2^m$ different classes. The training procedure of the SVM corresponds to a convex optimization and amounts to solving a constrained quadratic optimization problem (QP); the solution found is, thus, guaranteed to be the unique global minimum of the objective function. To maximize the margin of $y(x)$, the weights $\omega_i$ and biases $b_i$ are chosen such that they minimize the objective of the optimization problem formulated as [25]

$$\min_{\omega_i, b_i, \xi_{k,i}} J_{LS}^{(m)}(\omega_i, b_i, \xi_{k,i}) = \frac{1}{2}\sum_{i=1}^{m}\left(\|\omega_i\|_2^2 + b_i^2\right) + C\sum_{k=1}^{N}\sum_{i=1}^{m}\xi_{k,i} \qquad (4.1)$$
The term $b_i^2/2$ is added to the objective. To solve the optimization problem, we use the Karush–Kuhn–Tucker theorem [27]. We add a dual set of variables, one for each constraint, and obtain the Lagrangian of the optimization problem (4.1)

$$L^{(m)}(\omega_i, b_i, \xi_{k,i}; \alpha_{k,i}) = J_{LS}^{(m)} - \sum_{k=1}^{N} \alpha_{k,i}\left\{y_k^{(i)}\left[\omega_i^T \varphi_i(x_k) + b_i\right] - 1 + \xi_{k,i}\right\} \qquad (4.3)$$
for $k = 1,\dots,N$ and $i = 1,\dots,m$. The offset of the hyperplane from the origin is determined by the parameter $b/\|\omega\|$. The function $\varphi(\cdot)$ is a nonlinear function, which maps the input space into a higher-dimensional space. To avoid working with the high-dimensional map $\varphi$, we instead choose a kernel function $K$ by defining the dot product in Hilbert space

$$\varphi(x)^T \varphi(x_k) = K(x, x_k) \qquad (4.5)$$

enabling us to treat nonlinear problems with principally linear techniques. Formally, $K$ is a symmetric, positive semidefinite Mercer kernel; the only condition required is that the kernel satisfies a general positivity constraint [27].
To allow for mislabeled examples, a modified maximum-margin technique is employed [28]. If there exists no hyperplane $\omega^T x + b = 0$ that can divide the different classes, the objective function is penalized with nonzero slack variables $\xi_i$. The modified maximum-margin technique then finds a hyperplane that separates the training set with a minimal number of errors, and the optimization becomes a trade-off between a large margin and a small error penalty. The maximum-margin hyperplane, and consequently the classification task, is then only a function of the support vectors

$$\max_{\alpha_k} Q_1(\alpha_k; K(x_k, x_l)) = \sum_{k=1}^{N}\alpha_k - \frac{1}{2}\sum_{k,l=1}^{N} y_k y_l K(x_k, x_l)\,\alpha_k \alpha_l$$
$$\text{s.t.}\quad 0 \le \alpha_k \le C,\ \ k = 1,\dots,N, \qquad \sum_{k=1}^{N} \alpha_k y_k = 0 \qquad (4.6)$$

where $\alpha_k$ are the weights. The QP optimization task in (4.6) is solved efficiently using sequential minimal optimization, i.e., by constructing the optimal separating
hyperplane for the full dataset [29]. Typically, many $\alpha_k$ go to zero during optimization, and the remaining $x_k$ corresponding to those $\alpha_k > 0$ are called support vectors. To simplify notation, we assume that all non-support vectors have been removed, so that $N_x$ is now the number of support vectors, and $\alpha_k > 0$ for all $k$. The resulting classification function $f(x)$ has the following expansion:

$$f(x) = \mathrm{sgn}\left(\sum_{k=1}^{N_x} \alpha_k y_k K(x, x_k) + b\right) \qquad (4.7)$$

where the support vector machine classifier uses the sign of $f(x)$ to assign a class label $y$ to the object $x$ [30]. The complexity of the computation of (4.7) scales with the number of support vectors. To simplify the kernel classifier trained by the SVM, we approximate the expansion $\Psi = \sum_k \alpha_k \varphi(x_k)$ underlying (4.7) by a reduced expansion $\Psi' = \sum_k \beta_k \varphi(z_k)$, where the weights $\beta_k \in \mathbb{R}$ and the reduced set vectors $z_k$ determine the reduced kernel expansion. The problem of finding the reduced kernel expansion can be stated as the optimization task
$$\min_{\beta, z}\|\Psi - \Psi'\|^2 = \min_{\beta, z}\Bigg(\sum_{k,l=1}^{N_x}\alpha_k \alpha_l K(x_k, x_l) + \sum_{k,l=1}^{N_z}\beta_k \beta_l K(z_k, z_l) - 2\sum_{k=1}^{N_x}\sum_{l=1}^{N_z}\alpha_k \beta_l K(x_k, z_l)\Bigg) \qquad (4.8)$$
Although $\varphi$ is not given explicitly, (4.8) can be computed (and minimized) in terms of the kernel, and the minimization is carried out over both the $z_k$ and the $\beta_k$. The reduced set vectors $z_k$ and the coefficients $\beta_{l,k}$ for a classifier $f_l(x)$ are solved by iterative greedy optimization [31]

$$f_l(x) = \mathrm{sgn}\left(\sum_{k=1}^{m}\beta_{l,k} K(x, z_k) + b\right), \quad l = 1,\dots,N_z \qquad (4.9)$$

For a given complexity (i.e., number of reduced set vectors), the classifier provides the optimal greedy approximation of the full SVM decision boundary; the first one is the one which, using the objective function (4.8), is closest to the full SVM (4.7) constrained to using only one reduced set vector.
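The residual in (4.8) can be evaluated entirely through the kernel. The sketch below uses synthetic "support vectors" and a crude keep-the-largest-weights-then-refit rule in place of the iterative greedy optimization of [31]; the selection rule and all parameter values are illustrative assumptions.

```python
import numpy as np

def K(a, b, gamma=0.5):
    """Gaussian RBF kernel matrix between row-vector sets a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def residual_sq(alpha, x_sv, beta, z, gamma=0.5):
    """||Psi - Psi'||^2 of Eq. (4.8), computed purely through the kernel."""
    return (alpha @ K(x_sv, x_sv, gamma) @ alpha
            + beta @ K(z, z, gamma) @ beta
            - 2.0 * alpha @ K(x_sv, z, gamma) @ beta)

def f_full(x, alpha_y, x_sv, b, gamma=0.5):
    """Full SVM decision function of Eq. (4.7); alpha_y holds alpha_k*y_k."""
    return np.sign(K(x, x_sv, gamma) @ alpha_y + b)

rng = np.random.default_rng(0)
x_sv = rng.standard_normal((40, 2))          # stand-in support vectors
alpha = np.abs(rng.standard_normal(40))      # stand-in expansion weights

# toy "reduced set": keep the 10 largest-weight support vectors, then refit
# beta by least squares on the kernel system K_zz beta = K_zx alpha (the
# beta-optimal solution of (4.8) for fixed z)
keep = np.argsort(alpha)[-10:]
z = x_sv[keep]
beta = np.linalg.solve(K(z, z) + 1e-8 * np.eye(10), K(z, x_sv) @ alpha)

r_reduced = residual_sq(alpha, x_sv, beta, z)
r_naive = residual_sq(alpha, x_sv, alpha[keep], z)   # no refit: larger residual
```

Refitting β for a fixed set of z is a linear problem; the hard, nonconvex part of (4.8), which the greedy method addresses, is the choice of the z themselves.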
The transformation from the multiclass SVM problem in (4.1) to the single-class problem is based on Kesler's construction [28, 30]. The resulting SVM classifier is composed of a set of discriminant functions, which are computed as

$$f_l(x) = \sum_{k}\sum_{m} K(x_k, x)\,\alpha_k^m\left(\delta(l, y_k) - \delta(l, m)\right) + b_l \qquad (4.10)$$
Fig. 4.4 a Training flow: N neural signals pass through feature selection and a cascade classifier, with pre-processing and detection preceding classification; b boosted cascade of reduced set vectors: the training data td is split into eight subsets (td/8), whose support vectors sv(x1)…sv(x8) are merged pairwise through four layers (sv(x9)…sv(x12); sv(x13), sv(x14); sv(x15)) into the final result
Since the data $x_k$ appear only in the form of dot products in the dual form, we can construct the dot product $K(x_k, z_l)$ using the Kronecker delta, i.e., $\delta(k,l) = 1$ for $k = l$ and $\delta(k,l) = 0$ for $k \ne l$, and map it to a reproducing kernel Hilbert space such that the dot product obtains the same value as the function $K$. This property allows us to configure the SVM classifier via various energy-scalable kernels [32] for finding nonlinear classifiers. For $K(\cdot,\cdot)$ one typically has the following choices: $K(x, x_k) = x_k^T x$ (linear SVM); $K(x, x_k) = (x_k^T x + 1)^d$ (polynomial SVM of degree $d$); $K(x, x_k) = \tanh[\kappa\, x_k^T x]$ (sigmoid SVM); $K(x, x_k) = \exp\{-\gamma\|x - x_k\|^2\}$ (radial basis function (RBF) SVM); $K(x, x_k) = \exp\{-\|x - x_k\|/(2\sigma^2)\}$ (exponential radial basis function (ERBF) SVM); and $K(x, x_k) = \exp\{-\|x - x_k\|^2/(2\sigma^2)\}$ (Gaussian RBF SVM), where $d$, $\kappa$, $\gamma$, and $\sigma$ are positive real constants. The kernels yield increasing levels of strength (e.g., a false-alarm rate of 18 per day for the linear kernel decreases to 1.2 per day for the RBF kernel [33]). However, the required power for each kernel (from simulation of the CPU) varies by orders of magnitude.
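The kernel choices listed above can be written down directly; the parameter values (d, κ, γ, σ) below are illustrative assumptions, not the power-scalable settings used on the chip.

```python
import numpy as np

# the kernel menu of the text, evaluated on single patterns x, xk
def linear(x, xk):               return xk @ x
def polynomial(x, xk, d=3):      return (xk @ x + 1.0) ** d
def sigmoid(x, xk, kappa=0.1):   return np.tanh(kappa * (xk @ x))
def rbf(x, xk, gamma=0.5):       return np.exp(-gamma * np.sum((x - xk) ** 2))
def erbf(x, xk, sigma=1.0):      return np.exp(-np.linalg.norm(x - xk) / (2 * sigma**2))
def gaussian(x, xk, sigma=1.0):  return np.exp(-np.sum((x - xk) ** 2) / (2 * sigma**2))

x = np.array([1.0, 2.0]); xk = np.array([1.0, 2.0])
vals = [k(x, xk) for k in (linear, polynomial, sigmoid, rbf, erbf, gaussian)]
```

The operation count already hints at the energy gap: the linear kernel is a single dot product, while every RBF-family kernel adds a distance computation and an exponential per support vector.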
The complexity of the computation of (4.10) scales with the number of support vectors. To simplify the kernel classifier trained by the SVM, we extend the iterative greedy optimization reduced set vectors approach [31] with a boosted cascade classifier (Fig. 4.4). Accordingly, the reduced expansion is not evaluated at once, but rather in a cascaded way, such that in most cases only a very small number of support vectors are applied. The computation of the classification function $f_l(x)$ involves matrix–vector operations, which are highly parallelizable. Therefore, the problem is segmented into smaller ones and parallel units are instantiated for the processing of each subproblem. Consider a set of reduced set vector classification functions, where the $l$th function is an approximation with $l$ vectors, chained into a sequence. After partitioning the data into disjoint subsets, we iteratively train the SVM on subsets of the original dataset and combine the support vectors of the resulting models to create new training sets [34, 35]. A query vector is then evaluated by every function in the cascade, and if classified negative, the evaluation stops

$$f_{c,l}(x) = \mathrm{sgn}(f_1(x))\,\mathrm{sgn}(f_2(x))\cdots \qquad (4.12)$$
where $f_{c,l}(x)$ is the cascade evaluation function of (4.10). In other words, we bias each cascade level such that one of the binary decisions is very confident, while the other is uncertain and propagates the data point to the next, more complex cascade level. Biasing of the functions $f$ is done by setting the parameter $b$ to achieve a desired accuracy of the function on an evaluation set. When a run through the cascade is completed, we combine the remaining support vectors of the final model with each subset from the first step of the first run. Frequently, a single pass through the cascade produces satisfactory accuracy; however, if the global optimum is to be reached, the result of the last level is fed back into the first level to test whether any of the input vectors have to be incorporated into the optimization. If none of the input-layer support vectors require this, the cascade has converged to the global optimum; otherwise, it proceeds with an additional pass through the network.
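The cascade evaluation of (4.12) can be sketched as an early-exit chain; the toy stage functions and their biases below are illustrative assumptions, not reduced set vectors from the actual design.

```python
import numpy as np

def cascade_classify(x, stages):
    """Eq. (4.12): evaluate increasingly complex classifiers f_1, f_2, ...
    in order; a negative sign at any stage stops the evaluation early."""
    for f in stages:
        if np.sign(f(x)) < 0:
            return -1.0          # confident rejection by a cheap stage
    return 1.0                   # survived every stage

# toy stages of rising cost; each is biased (via its constant b) so that a
# "reject" is confident while uncertain points are passed onward
stages = [
    lambda x: x[0] + 2.0,                 # one reduced set vector, loose bias
    lambda x: x[0] + x[1] + 0.5,          # two vectors, tighter bias
    lambda x: x[0] * x[1] - 0.25,         # most expensive, final decision
]
labels = [cascade_classify(np.array(p), stages)
          for p in [(-3.0, 0.0), (1.0, 1.0), (0.2, 0.2)]]
```

Only inputs that survive the cheap early stages ever pay for the expensive final stage, which is the source of the power saving claimed for the boosted cascade.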
The training data (td) in Fig. 4.4 are split into subsets, and each one is evaluated individually for support vectors in the first layer [36]. Hence, eliminating non-support vectors early from the classification significantly accelerates the SVM procedure. The scheme requires only modest communication from one layer to the next, and satisfactory accuracy is often obtained with a single pass through the cascade. When passing through the cascade, merged support vectors are used to test the data d for violations of the Karush–Kuhn–Tucker (KKT) conditions [37] (Fig. 4.5a). Violators are then combined with the support vectors for the next iteration. The required arithmetic over feature vectors (the element-wise operands as well as the SVM model parameters) is executed with a two-stage pipelined processing unit (i.e., to reduce glitch propagation) (Fig. 4.5b). Flip-flops are inserted in the pipeline to lessen the impact of active glitching [38] and to reduce the leakage energy.
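The KKT test applied to the merged data can be sketched with the standard C-SVM optimality conditions; the tolerance and the toy values below are illustrative assumptions.

```python
import numpy as np

def kkt_violations(alpha, y, f_x, C, tol=1e-3):
    """Flag points violating the C-SVM KKT conditions:
       alpha = 0      ->  y*f(x) >= 1
       0 < alpha < C  ->  y*f(x) == 1
       alpha = C      ->  y*f(x) <= 1
    (standard optimality conditions, not specific to this implementation)."""
    m = y * f_x                                   # functional margin
    at_zero = alpha <= tol
    at_c = alpha >= C - tol
    inside = ~(at_zero | at_c)
    bad = ((at_zero & (m < 1 - tol))
           | (inside & (np.abs(m - 1) > tol))
           | (at_c & (m > 1 + tol)))
    return np.flatnonzero(bad)

alpha = np.array([0.0, 0.5, 1.0, 0.0])
y     = np.array([1.0, 1.0, -1.0, -1.0])
f_x   = np.array([1.4, 1.0, -0.7, -0.2])          # current decision values
viol = kkt_violations(alpha, y, f_x, C=1.0)       # only the last point violates
```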
Fig. 4.5 a A cascade with two input sets d1 and d2: each set is tested for KKT violations and merged with the support vectors sv(x1), sv(x2) into sv(x3); b two-stage pipelined processing unit (SUB, MULT, and ADD/SUB stages separated by flip-flops, producing f[j] for kernel evaluation k(·))
4.4 Experimental Results
Fig. 4.6 Spike detection from continuously acquired data (the y-axis is arbitrary); a top: raw signal after amplification, not corrected for gain; b middle: threshold (line) crossings of a local energy measurement with a running window of 1 ms; c bottom: detected spikes
Fig. 4.7 a Spike detection from continuously acquired data; b detected spikes; c the SVM separation hypersurface for the RBF kernel (C = 5.12, σ² = 1.72) with three different classes: spike 1, spike 2, and spike 3 (© IEEE 2015)
training classification error, the margin of the found hyperplane, and the number of kernel evaluations. To improve the data structure from a numerical point of view, the system in (4.12) is first preprocessed by reordering the nonzero patterns for bandwidth reduction (Fig. 4.8). Figure 4.7c gives a graphical illustration of three-class classification, where the bold lines represent decision boundaries. For a correctly classified example $x_1$, we have $\xi_1^{(1)} = 0$ and $\xi_1^{(2)} = 0$, i.e., no loss is counted, since both $\xi_{1,2}$ and $\xi_{1,3}$ are negative. On the other hand, for an example $x_2$ that violates two margin bounds ($\xi_{2,2}, \xi_{2,3} > 0$), both methods generate a loss. The algorithm converges very fast in the first steps and slows down as the optimal solution is approached. However,
Fig. 4.8 Nonzero pattern of the system before (left) and after (right) reordering for bandwidth reduction (matrix dimension ~800 × 800)
almost the same classification error rates were obtained for all the parameters $\varepsilon = [10^{-2}, 5\times10^{-3}, 10^{-3}]$, indicating that to find a good classifier we do not need an extremely precise solution with $\varepsilon \to 0$. The SVM performance is sensitive to hyperparameter settings, e.g., the settings of the complexity parameter $C$ and the kernel parameter $\sigma$ for the Gaussian kernel. As a consequence, hyperparameter tuning with a grid-search approach is performed before the final model fit. More sophisticated methods for hyperparameter tuning are available as well [39].
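The grid-search step over (C, σ) can be sketched as follows; the stand-in score function, with a known optimum planted at C = 10 and σ = 1, replaces the cross-validated classification accuracy a real tuning run would use.

```python
import itertools
import numpy as np

def grid_search(score, c_grid, sigma_grid):
    """Exhaustive search over the (C, sigma) grid; returns the best pair."""
    return max(itertools.product(c_grid, sigma_grid),
               key=lambda p: score(*p))

# stand-in score peaked at C=10, sigma=1 (illustrative assumption; a real
# run scores each grid point by cross-validated accuracy)
score = lambda C, sigma: -((np.log10(C) - 1.0) ** 2 + np.log10(sigma) ** 2)
best_c, best_sigma = grid_search(score, [0.1, 1, 10, 100], [0.1, 1, 10])
```

Grids are usually laid out logarithmically, as here, since C and σ act multiplicatively on the margin and kernel width.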
The SVM spike-sorting performance has been summarized and benchmarked (Fig. 4.9) against four different, relatively computationally efficient methods for spike sorting: template matching, principal component analysis, Mahalanobis distance, and Euclidean distance. The performance is quantified using the effective accuracy, i.e., spikes correctly classified versus total spikes classified (excluding spike detection). The source of spike-detection error is either the false inclusion of a noise segment as a spike waveform or the false omission of spike waveforms. These errors can be easily modeled by the addition or removal of spikes at random positions in time, so that the desired error ratio is obtained. In contrast, care should be taken in modeling spike-classification errors, since an error in one unit may or may not cause an error in another unit. For each method, the parameters yielding the best classification performance are selected.
The SVM classifier consistently outperforms the benchmarked methods over the entire range of SNRs tested, although it exceeds the Euclidean distance metric only by a slight margin, reaching an asymptotic success rate of ~97%. The different SNRs in the BMI have been obtained by superimposing attenuated spike waveforms so as to mimic the background activity observed at the electrode. If we increase the SNR of the entire front-end brain–machine interface, the spike-sorting accuracy increases by up to 45% (depending on the spike-sorting method used). Similarly, the accuracy of the spike-sorting algorithm increases with A/D converter resolution, although it saturates beyond 5–6 bit resolution, ultimately
Fig. 4.9 a Effect of SNR on single-spike sorting accuracy of the BMI system (Mahalanobis, PCA, SVM, template matching, and Euclidean methods); b effect of SNR on sorting accuracy for overlapping spikes of three classes (© IEEE 2015)
limited by the SNR. However, since the amplitude of the observed spike signals can typically vary by one order of magnitude, additional resolution (i.e., 2–3 bit) is needed if the amplification gain is fixed. Additionally, increasing the sampling rate of the A/D converter improves spike-sorting accuracy, since this captures finer features that further differentiate the signals. The sorting accuracy for spike waveforms that overlap at different sample points is illustrated in Fig. 4.9b. The correct classification rate of the proposed method is on average 48% larger than that of the other four methods. If the training data contain the spike waveforms appearing in the course of complex spike bursts, we first classify the other distorted spikes generated by the bursting neurons before partially resolving the problem of complex spike bursts. The performance of the four other methods is limited if the distribution of the background noise is non-Gaussian or if multiple spike clusters overlap.

The estimation error varies with the number of spikes detected (Fig. 4.10a), and it reaches −60 dB with a normalized distribution at around 700 spikes over the entire dataset. The convergence period is ~0.1 s, assuming a firing rate of 20 spikes/s from three neurons. The number of support vectors required is partly governed by the complexity of the classification task. The kernels yield increasing
Fig. 4.10 a The error versus number of spikes (clusters 1–5); b energy per cycle versus various SVM kernels (linear, MLP, polynomial, RBF); c log-normalized error in reduced set model-order reduction versus number of support vectors
levels of strength; however, the required energy for each kernel varies by orders of magnitude, as illustrated in Fig. 4.10b. As the SNR decreases, more support vectors are needed in order to define a more complex decision boundary. For our dataset, the number of support vectors required is reduced to within the range of 300–310 (Fig. 4.10c). The required cycle count (0.14 kcycles) and memory (0.2 kB) for the linear kernel, versus 4.86 kcycles and 6.7 kB for the RBF kernel, highlight the memory-usage dependence on the kernels. The spike-detection implementation comprises 31k logic gates, resulting in a 2.64-mm² area, and consumes only 41 μW of power from a 0.4-V supply voltage.
4.5 Conclusions
The support vector machine has been introduced to bioinformatics and spike classification/sorting because of its excellent generalization, sparse solution, and use of quadratic programming. In this chapter, we propose a programmable neural spike classifier based on a multiclass kernel SVM for a 128-channel spike-sorting system that tracks the evolution of clusters in real time, offers high accuracy, and has low memory requirements and low computational complexity. The implementation results show that the spike classifier operates online, without compromising on required power and chip area, even in neural interfaces with a low SNR.
Chapter 5
Brain-Machine Interface: System Optimization
5.1 Introduction
Neural prosthesis systems enable the interaction with neural cells either by recording, to facilitate early diagnosis and predict intended behavior before undertaking any preventive or corrective actions, or by stimulation, to prevent the onset of detrimental neural activity. Monitoring the activity of a large population of neurons in neurobiological tissue with high-density microelectrode arrays in a multichannel implantable brain-machine interface (BMI) is a prerequisite for understanding the cortical structures, and can lead to a better conception of severe brain disorders, such as Alzheimer's and Parkinson's diseases, epilepsy and autism [1], or to reestablish sensory (e.g., hearing and vision) or motor (e.g., movement and speech) functions [2]. Practical multichannel BMI systems are combined with CMOS electronics for

the length of the random process is infinite or periodic. The use of the Karhunen-Loève expansion [21] has generated interest because of its biorthogonal property, that is, both the deterministic basis functions and the corresponding random coefficients are orthogonal [22]; e.g., the orthogonal deterministic basis function and its magnitude are, respectively, the eigenfunction and eigenvalue of the covariance function. Assuming that p_i is a zero-mean Gaussian process and using the Karhunen-Loève expansion, p_i can be written in truncated form (for practical implementation) by a finite number of terms M as
$$ p_i = \mu_{p,i} + \sigma_p(d_i)\sum_{n=1}^{M}\sqrt{\lambda_{p,n}}\,\xi_{p,n}(\theta)\,f_{p,n}(d_i) \qquad (5.2) $$
Fig. 5.1 (a) Behavior of modeled covariance functions using M = 5 for parameter ratios [1, …, 10], and (b) the model fitting on the available measurement data (© IEEE 2011)
Without loss of generality, consider for instance two transistors with given threshold voltages. In our approach, their threshold voltages are modeled as stochastic processes over the spatial domain of a die, thus making the parameters of any two transistors on the die two different correlated random variables. The value of M is governed by the accuracy of the eigenpairs in representing the covariance function rather than by the number of random variables. Unlike previous approaches, which model the covariance of process parameters due to the random effect as a piecewise-linear model [23] or through modified Bessel functions of the second kind [24], here the covariance is represented as a linearly decreasing exponential function
$$ C_p(d_1, d_2) = \sigma_{x,y}^2\, e^{-\left(c_x \lvert d_{x1}-d_{x2}\rvert + c_y \lvert d_{y1}-d_{y2}\rvert\right)/\eta} \qquad (5.3) $$
where η defines a fitting parameter estimated from the extracted data, ΔW* and ΔL* represent the geometrical deformation due to manufacturing variations, and Δp* models electrical parameter deviations from their corresponding nominal values, e.g., altered transconductance, threshold voltage, etc. (Appendix B).
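The truncated expansion of (5.2) combined with an exponential covariance in the style of (5.3) can be sketched numerically. The following is a minimal illustration, not the chapter's model: the grid size, correlation constant, and truncation order are assumed values, and the discrete eigenpairs of the covariance matrix stand in for the continuous eigenfunctions and eigenvalues.

```python
# Sketch of a discrete, truncated Karhunen-Loeve construction (cf. (5.2))
# using an exponential covariance (cf. (5.3)). All constants are
# illustrative assumptions, not values from the chapter.
import numpy as np

rng = np.random.default_rng(0)
n = 64                                   # devices on a 1-D die axis
d = np.linspace(0.0, 1.0, n)             # normalized positions
sigma, c = 1.0, 5.0                      # process sigma, inverse corr. length

# Exponential (linearly decreasing in the exponent) covariance matrix
C = sigma**2 * np.exp(-c * np.abs(d[:, None] - d[None, :]))

# Discrete KL: eigenpairs of C play the role of (lambda_n, f_n)
lam, f = np.linalg.eigh(C)               # eigh returns ascending order
idx = np.argsort(lam)[::-1]              # largest eigenvalues first
lam, f = lam[idx], f[:, idx]

M = 5                                    # truncation order, as in Fig. 5.1
xi = rng.standard_normal(M)              # independent N(0,1) coefficients
p = f[:, :M] @ (np.sqrt(lam[:M]) * xi)   # one sample of the correlated field

# A few terms already capture most of the variance for a smooth covariance
captured = lam[:M].sum() / lam.sum()
print(f"variance captured by M={M} terms: {captured:.2%}")
```

The sample `p` is a spatially correlated zero-mean field over the die positions; neighboring devices receive similar parameter deviations, which is exactly the behavior the covariance model of (5.3) is meant to encode.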
In addition to process parameter variability, which sets the upper bound on the circuit design in terms of accuracy, linearity and timing, the existence of noise associated with fundamental processes represents an elementary limit on the performance of electronic circuits.
Neural cell noise model: In the Hodgkin and Huxley framework, a neural channel's configuration is determined by the states of its constituent subunits, where each subunit can be either in an open or closed state [25]. Adding a noise term ξ_x(V, t) (x = m, h, or n) to the deterministic ordinary differential equation (ODE) of Hodgkin and Huxley is consistent with the behavior of the Markov process for channel gating [26]. Such a process can be contracted to a Langevin description (via a Fokker-Planck equation) and expressed as delta-correlated noise processes λ_neuron(t+τ, t) = 1/N_x [α_x(1−x) + β_x x] δ(τ), where N_x is the total number of neural channels, and the transition rates α_x(t) and β_x(t) are instantaneous functions of the membrane potential V(t). Dirac's delta function designates that the noise at different times is uncorrelated, and the variables m, h, and n represent the aggregated fraction of open subunits of different types, aggregated across the entire cell membrane. Subsequently, the neural channel noise is modeled as Brownian motion, i.e., as a Gauss-distributed nonstationary stochastic process with independent increments and heuristically fixed constant variance [27].
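The Langevin description above can be sketched as a scalar Euler-Maruyama simulation of a single gating variable. This is an illustrative toy, not the chapter's model: the rates, channel counts, and time step are assumed constants, and the noise intensity follows the 1/N_x scaling of the delta-correlated process quoted in the text.

```python
# Sketch: Langevin (Euler-Maruyama) simulation of one HH gating variable,
#   dm/dt = alpha*(1-m) - beta*m + noise,
# with noise variance [alpha*(1-m) + beta*m]/N per the 1/N_x scaling above.
# Rates and channel counts are illustrative assumptions.
import math, random

random.seed(1)

def simulate_gate(alpha, beta, n_channels, dt=1e-5, steps=20000):
    m = alpha / (alpha + beta)          # start at the deterministic fixed point
    for _ in range(steps):
        drift = alpha * (1.0 - m) - beta * m
        var = (alpha * (1.0 - m) + beta * m) / n_channels  # 1/N noise scaling
        m += drift * dt + math.sqrt(max(var, 0.0) * dt) * random.gauss(0, 1)
        m = min(max(m, 0.0), 1.0)       # an open fraction stays in [0, 1]
    return m

# More channels -> smaller fluctuations around the deterministic fixed point
m_small = simulate_gate(alpha=1000.0, beta=4000.0, n_channels=100)
m_large = simulate_gate(alpha=1000.0, beta=4000.0, n_channels=100000)
print(m_small, m_large)
```

With these rates the deterministic steady state is m = α/(α+β) = 0.2; the simulated trajectories fluctuate around it with a standard deviation that shrinks as the assumed channel count N grows, which is the qualitative content of the 1/N_x noise term.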
Electrode-tissue interface and signal conditioning circuit noise model: In intracortical microelectrode recordings, biological (neural cell) noise mainly originates from the firing of several neurons in the tissue surrounding the recording microelectrode, while thermal noise levels are influenced by the electrode-tissue interface impedance at each individual recording site (as a result of the foreign body reaction) and by the recording bandwidth, i.e., a wider recording bandwidth increases thermal noise levels. The electrode-tissue interface noise includes the tissue/bulk thermal noise and the electrode-electrolyte interface noise. Tissue noise is modeled as the thermal noise generated by the solution/spreading or tissue/encapsulation resistance [28], and the electrode noise is the thermal noise generated by the charge transfer resistor [29]. The noise of the signal conditioning electronic circuits is mainly determined by the thermal and flicker noise. The most important types of electrical noise sources (thermal, shot, and flicker noise) in passive elements and integrated-circuit devices have been investigated extensively, and appropriate models derived [30] as stationary and in [31] as nonstationary noise sources. We adapt model descriptions as defined in [31], where thermal and shot noise are expressed as λ_thermal(t+τ, t) = 2kTG(t)δ(τ) and
Device variability effects are fundamental limitations for robust circuit design, and their evaluation has been the subject of numerous studies. Several models have been suggested for device variability [34–36] and, correspondingly, a number of computer-aided design (CAD) tools for statistical circuit simulation [37–42]. In general, a circuit design is optimized for parametric yield so that the majority of manufactured circuits meet the performance specifications. The computational cost and complexity of yield estimation, coupled with the iterative nature of the design process, make yield maximization computationally prohibitive. As a result, circuit designs are verified using models corresponding to a set of worst-case conditions of the process parameters. Worst-case analysis refers to the process of determining the values of the process parameters in these worst-case conditions and the corresponding worst-case circuit performance values. Worst-case analysis is very efficient in terms of designer effort, and thus has become the most widely practiced technique for statistical analysis and verification. Algorithms previously proposed for worst-case tolerance analysis fall into four major categories: the corner technique, interval analysis, sensitivity-based vertex analysis, and Monte Carlo simulation.
The most common approach is the corners technique. In this approach, each process parameter value that leads to the worst performance is chosen independently. This method ignores the correlations among the process parameters, and the simultaneous setting of each process parameter to its extreme value results in simulation at the tails of the joint probability density of the process parameters. Thus, the worst-case performance values obtained are extremely pessimistic. Interval analysis is computationally efficient but leads to overestimated results, i.e., the calculated response space encloses the actual response space, due to the intractable interval expansion caused by dependency among interval operands. Interval splitting techniques have been adopted to reduce the interval expansion, but at the expense of computational complexity. Traditional vertex analysis assumes that the worst-case parameter sets are located at the vertices of the parameter space, thus the response space can be calculated by taking the union of circuit simulation results at all possible vertices of the parameter space. Given a circuit with M uncertain parameters, this results in a 2^M simulation problem. To further reduce the simulation complexity, sensitivity information computed at the nominal parameter condition is used to find the vertices that correspond to the worst cases of circuit response. The Monte Carlo algorithm takes random combinations of values chosen from within the range of each process parameter and repeatedly performs circuit simulations. The result is an ensemble of responses from which the statistical characteristics are estimated. Unfortunately, if the number of iterations for the simulation is not very large, Monte Carlo simulation always underestimates the tolerance window. Accurately determining the bounds on the response requires a large number of simulations, so the Monte Carlo method becomes very CPU-time consuming for large chips. Other approaches for the statistical analysis of variation-affected circuits, such as the one based on Hermite polynomial chaos [43] or the response surface methodology, are able to perform much faster than a Monte Carlo method at the expense of a design-of-experiments preprocessing stage [44]. In this section, the circuits are described as a set of stochastic differential equations (SDE), and Gaussian closure approximations are introduced to obtain a closed form of the moment equations. Even if a random variable is not strictly Gaussian, a second-order probabilistic characterization yields sufficient information for most practical problems.
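The pessimism of independent corners relative to Monte Carlo can be sketched on a toy performance metric. The example below is illustrative only: the linear performance function, the correlation value, and the ±3σ corner convention are all assumptions, chosen to make the correlation effect visible.

```python
# Sketch contrasting the corners technique with Monte Carlo on a toy
# performance y = p1 + p2, where p1, p2 are strongly (negatively)
# correlated Gaussians. All numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
rho = -0.9                               # strong negative correlation
cov = np.array([[1.0, rho], [rho, 1.0]])

# Corner analysis: each parameter is set to +/-3 sigma independently,
# ignoring the correlation entirely.
corner_worst = 3.0 + 3.0

# Monte Carlo: sample the joint density and take an empirical 99.9% quantile
p = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
y = p.sum(axis=1)                        # std of y = sqrt(2 + 2*rho) ~ 0.45
mc_worst = np.quantile(y, 0.999)

print(f"corner bound: {corner_worst:.2f}, MC 99.9% bound: {mc_worst:.2f}")
```

Because the two parameters nearly cancel, the statistically meaningful worst case is far below the independent-corner bound, which sits deep in the tails of the joint density, exactly the failure mode of the corners technique described above.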
Modern integrated circuits are often distinguished by a very high complexity and a very high packing density. The numerical simulation of such circuits requires modeling techniques that allow an automatic generation of network equations. Furthermore, the number of independent network variables describing the network should be as small as possible. Circuit models have to meet two contradictory demands: they have to describe the physical behavior of a circuit as correctly as possible while being simple enough to keep computing time reasonably small. The level of the models ranges from simple algebraic equations, over ordinary and partial differential equations, to Boltzmann and Schrödinger equations, depending on the effects to be described. Due to the high number of network elements (up to millions of elements) belonging to one circuit, one is restricted to relatively simple models. In order to describe the physics as well as possible, so-called compact models represent the first choice in network simulation. Complex elements such as transistors are modeled by small circuits containing basic network elements described by algebraic equations and ODE only. The development of such replacement circuits forms its own research field and nowadays leads to transistor models with more than 500 parameters. A well-established approach to meet both demands to a certain extent is the description of the network by a graph with branches and nodes. Branch currents, branch voltages, and node potentials are introduced as variables. The node potentials are defined as voltages with respect to one reference node, usually the ground node. The physical behavior of each network element is modeled by a relation between its branch currents and its branch voltages. In order to complete the network model, the topology of the elements has to be taken into account. Assuming the electrical connections between the circuit elements to be ideally conducting and the nodes to be ideal and concentrated, the topology can be described by Kirchhoff's laws (the sum of all branch currents entering a node equals zero, and the sum of all branch voltages in a loop equals zero). In general, for time-domain analysis, modified nodal analysis (MNA) leads to a nonlinear ODE or differential algebraic equation system which, in most cases, is transformed into a nonlinear algebraic system by means of linear multistep integration methods [45, 46] and, at each integration step, a Newton-like method is used to solve this nonlinear algebraic system (Appendix B). Therefore, from a numerical point of view, the equations modeling a dynamic circuit are transformed into equivalent linear equations at each iteration of the Newton method and at each time instant of the time-domain analysis. Thus, we can say that the time-domain analysis of a nonlinear dynamic circuit consists of the successive solutions of many linear circuits approximating the original (nonlinear and dynamic) circuit at specific operating points.
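The MNA construction described above can be made concrete on a hypothetical two-node circuit: a voltage source (a c-branch, whose current becomes an extra unknown) feeding a resistive divider. The component values are illustrative, not from the chapter.

```python
# Sketch of MNA assembly for a toy linear circuit: voltage source Vs from
# ground to node a (its current i_V is an appended unknown), R1 from node a
# to node 1, R2 from node 1 to ground. Unknowns: x = [v_a, v_1, i_V].
# Component values are illustrative assumptions.
import numpy as np

R1, R2, Vs = 1000.0, 2000.0, 3.0

A = np.array([
    [ 1/R1, -1/R1,        -1.0],  # KCL at node a (source current enters)
    [-1/R1,  1/R1 + 1/R2,  0.0],  # KCL at node 1
    [ 1.0,   0.0,          0.0],  # appended branch equation: v_a = Vs
])
b = np.array([0.0, 0.0, Vs])

v_a, v_1, i_V = np.linalg.solve(A, b)
print(v_a, v_1, i_V)  # voltage divider: v_1 = Vs*R2/(R1+R2)
```

The appended row and column for `i_V` are exactly the MNA extension of conventional nodal analysis: the source current is an extra unknown, and its branch equation (v_a = Vs) is appended to the KCL system.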
Consider a linear circuit with N+1 nodes and B voltage-controlled branches (two-terminal resistors, independent current sources, and voltage-controlled n-ports), the latter grouped in the set B. We then introduce the source current vector i ∈ R^B and the branch conductance matrix G ∈ R^(B×B). By assuming that the branches (one for each port) are ordered element by element, the matrix is block diagonal: each 1×1 block corresponds to the conductance of a one-port and in any case is nonzero, while n×n blocks correspond to the conductance matrices of voltage-controlled n-ports. In more detail, the diagonal entries of the n×n blocks can be zero and, in this case, the nonzero off-diagonal entries, on the same row or column, correspond to voltage-controlled current sources (VCCSs). Now, consider MNA and circuits embedding, besides voltage-controlled elements, independent voltage sources, the remaining types of controlled sources, and sources of process variations. We split the set of branches B into two complementary subsets: B_V of voltage-controlled branches (v-branches) and B_C of current-controlled branches (c-branches).
Conventional nodal analysis (NA) is extended to MNA [46] as follows: the currents of c-branches are added as further unknowns and the corresponding branch equations are appended to the NA system. The N×B incidence matrix A can be partitioned as A = [A_v A_c], with A_v ∈ R^(N×B_v) and A_c ∈ R^(N×B_c). As in conventional NA,
Let x_0 = x(0, t) be the generic point around which to linearize, and with the change of variable ξ = x − x_0 = [(q − q_0)^T, (θ − θ_0)^T]^T, the first-order Taylor piecewise-linearization of (5.7) in x_0 yields

$$ P(x_0)\,\dot{\xi} + \left(K(x_0) + \dot{P}(x_0)\right)\xi = 0 \qquad (5.9) $$

where K(x) = ∂B(x), P(x) = ∂F(x). Transient analysis requires only the solution of the deterministic version of (5.7), e.g., by means of a conventional circuit simulator, and of (5.9) with a method capable of dealing with linear SDE with stochasticity that enters only through the initial conditions. Since (5.9) is a linear homogeneous equation in ξ, its solution will always be proportional to ξ_0. We can rewrite (5.9) as

$$ \dot{\xi}(x_0) = E(x_0)\,\xi + F(x_0)\,\theta \qquad (5.10) $$
Equation (5.10) is a system of SDE which is linear in the narrow sense (the right-hand side is linear in ξ and the coefficient matrix for the vector of variation sources is independent of ξ) [47]. Since these stochastic processes have regular properties, they can be considered as a family of classical problems for the individual sample paths and be treated with the classical methods of the theory of linear SDE. By expanding every element of ξ(t) with

$$ \xi_i(t) = \varphi(t)(\theta_0) = \sum_{j=1}^{m} \varphi_{ij}(t)\,\theta_j \qquad (5.11) $$

for m elements of a vector θ. As long as φ_j(t) is obtained, the expression for ξ(t) is known, so that the covariance matrix of the solution can be written as

$$ \Sigma_{\xi} = \Phi\, \Sigma_{\theta}\, \Phi^{T} \qquad (5.12) $$

Defining a_j(t) = (a_1j, a_2j, …, a_nj)^T, F_j(t) = (F_1j, F_2j, …, F_nj)^T, the requirement for φ(t) is
here are caused by the small current and voltage fluctuations, such as thermal, shot, and flicker noise, that are generated within the integrated-circuit devices themselves.
The noise performance of a circuit can be analyzed in terms of the small-signal equivalent circuits by considering each of the uncorrelated noise sources in turn and separately computing their contribution at the output. A nonlinear circuit is assumed to have time-invariant (dc) large-signal excitations and time-invariant steady-state large-signal waveforms, and both the noise sources and the noise at the output are assumed to be wide-sense stationary stochastic processes. Subsequently, the nonlinear circuit is linearized around the fixed operating point to obtain a linear time-invariant network for noise analysis. Implementation of this method based on the interreciprocal adjoint network concept [48] results in a very efficient computational technique for noise analysis, which is available in almost every circuit simulator. Unfortunately, this method is only applicable to circuits with fixed operating points and is not appropriate for noise simulation of circuits with changing bias conditions.
In a noise simulation method that uses linear periodically time-varying transformations [49, 50], a nonlinear circuit is assumed to have periodic large-signal excitations and periodic steady-state large-signal waveforms, and both the noise sources and the noise at the output are assumed to be cyclostationary stochastic processes. Afterward, the nonlinear circuit is linearized around the periodic steady-state operating point to obtain a linear periodically time-varying network for noise analysis. Nevertheless, this noise analysis technique is applicable to only a limited class of nonlinear circuits with periodic excitations.
Noise simulation in the time domain has traditionally been based on the Monte Carlo technique [51], where the circuit with the noise sources is simulated using numerous transient analyses with different sample paths of the noise sources. The probabilistic characteristics of the noise are then calculated using the data obtained in these simulations. However, accurately determining the noise content requires a large number of simulations, so the Monte Carlo method becomes very CPU-time consuming for large chips. Additionally, to accurately model shot and thermal noise sources, the time step in transient analysis is limited to a very small value, making the simulation highly inefficient.
In this section, we treat the noise as a nonstationary stochastic process, and introduce an Itô system of SDE as a convenient way to represent such a process. Recognizing that the variance-covariance matrix, when backward Euler is applied, can be written in the continuous-time Lyapunov matrix form, we then provide a numerical solution to such a set of linear time-varying equations. We adapt the model description as defined in [31], where thermal and shot noise are expressed as delta-correlated noise processes having independent values at every time point, modeled as modulated white noise processes. These noise processes correspond to current noise sources which are included in the models of the integrated-circuit devices. As numerical experiments suggest that both the convergence and stability analyses of adaptive schemes for SDE extend to a number of sophisticated methods which control different error measures, we follow the adaptation strategy, which can be viewed heuristically as a fixed time-step algorithm applied to a time-rescaled differential equation. Additionally, adaptation also confers stability on algorithms constructed from explicit time-integrators, resulting in better qualitative behavior than for fixed time-step counterparts [52].
The inherent nature of a white noise process differs fundamentally from a wide-sense stationary stochastic process such as static manufacturing variability, and cannot be treated as an ODE using differential calculus similar to that in Sect. 5.3. The MNA formulation of the stochastic process that describes random influences which fluctuate rapidly and irregularly (i.e., white noise) can be written as

$$ F(\dot{r}, r, t) + B(r, t)\,\xi = 0 \qquad (5.14) $$

where r is the vector of stochastic processes which represents the state variables (e.g., node voltages) of the circuit, ξ is a vector of white Gaussian processes, and B(r, t) is a state- and time-dependent modulation of the vector of noise sources. Since the magnitude of the noise content in a signal is much smaller than the magnitude of the signal itself in any functional circuit, the system of nonlinear SDE described in (5.14) can be piecewise-linearized under assumptions similar to those noted in Sect. 5.3. Including the noise content description, (2.10) can be expressed in general form as

$$ \dot{\zeta}(t) = E(t)\,\zeta + F(t)\,\xi \qquad (5.15) $$

where ζ = [(r − r_0)^T, (θ − θ_0)^T]^T. We will interpret (5.15) as an Itô system of SDE. Now rewriting (5.15) in the more natural differential form

$$ d\zeta(t) = E(t)\,\zeta\,dt + F(t)\,dw \qquad (5.16) $$

where we substituted dw(t) = ξ(t)dt with a vector of Wiener processes w. If the functions E(t) and F(t) are measurable and bounded on the time interval of interest, there exists a unique solution for every initial value ζ(t_0) [47]. If ζ is a Gaussian stochastic process, then it is completely characterized by its mean and correlation function. From Itô's theorem on stochastic differentials

$$ d\!\left(\zeta(t)\,\zeta^{T}(t)\right) = \zeta(t)\,d\zeta^{T}(t) + d\zeta(t)\,\zeta^{T}(t) + F(t)F^{T}(t)\,dt \qquad (5.17) $$

and expanding (5.17) with (5.16), noting that ζ and dw are uncorrelated, the variance-covariance matrix K(t) of ζ(t) with the initial value K(0) = E[ζζ^T] can be expressed in differential Lyapunov matrix equation form as [47]

$$ dK(t)/dt = E(t)K(t) + K(t)E^{T}(t) + F(t)F^{T}(t) \qquad (5.18) $$
Note that the mean of the noise variables is always zero for most integrated circuits. In view of the symmetry of K(t), (5.18) represents a system of linear ODE with time-varying coefficients. To obtain a numerical solution, (5.18) has to be discretized in time using a suitable scheme, such as any linear multistep method or a Runge-Kutta method. For circuit simulation, implicit linear multistep methods, and especially the trapezoidal method and the backward differentiation formula, were found to be most suitable [53]. If backward Euler is applied to (5.18), the differential Lyapunov matrix equation can be written in a special form referred to as the continuous-time algebraic Lyapunov matrix equation

$$ P_r K(t_r) + K(t_r)P_r^{T} + Q_r = 0 \qquad (5.19) $$

K(t) at time point t_r is calculated by solving the system of linear equations in (5.19). Such continuous-time Lyapunov equations have a unique solution K(t_r), which is symmetric and positive semidefinite.
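Solving an equation of the form (5.19) is a standard operation. The sketch below is illustrative, not the chapter's solver: it uses toy E and F matrices and SciPy's built-in continuous Lyapunov solver to obtain a variance-covariance matrix K satisfying E K + K E^T + F F^T = 0, the stationary form of (5.18).

```python
# Sketch: stationary variance-covariance of a linear SDE
# d(zeta) = E*zeta*dt + F*dw, found by solving the algebraic Lyapunov
# equation E K + K E^T + F F^T = 0 (cf. (5.18)-(5.19)).
# E and F are toy matrices, not values from the chapter.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

E = np.array([[-2.0, 1.0],
              [ 0.0, -3.0]])            # stable: eigenvalues -2, -3
F = np.array([[1.0, 0.0],
              [0.5, 1.0]])

# solve_continuous_lyapunov(a, q) solves a X + X a^H = q; here q = -F F^T
K = solve_continuous_lyapunov(E, -F @ F.T)

residual = E @ K + K @ E.T + F @ F.T
print(np.allclose(residual, 0.0), np.allclose(K, K.T))
```

As the text states for (5.19), the solution is symmetric and positive semidefinite, which can be checked directly from the eigenvalues of K.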
Several iterative techniques have been proposed for the solution of the algebraic Lyapunov matrix equation (5.19) arising in some specific problems where the matrix P_r is large and sparse [54–57], such as the Bartels-Stewart method [58] and Hammarling's method [47], which remains the principal reference for directly computing the Cholesky factor of the solution K(t_r) of (5.19) for small to medium systems. For the backward stability analysis of the Bartels-Stewart algorithm, see [59]. Extensions of these methods to generalized Lyapunov equations are described in [60]. In the Bartels-Stewart algorithm, P_r is first reduced to upper Hessenberg form by means of Householder transformations, and then the QR algorithm is applied to the Hessenberg form to calculate the real Schur decomposition [61], transforming (5.19) into a triangular system which can be solved efficiently by forward or backward substitutions of the matrix P_r

$$ S = U^{T} P_r U \qquad (5.20) $$

where the real Schur form S is upper quasi-triangular and U is orthonormal. Our formulation for the real case utilizes a similar scheme. The transformation matrices are accumulated at each step to form U [58]. If we now set

$$ \tilde{K} = U^{T} K(t_r) U, \qquad \tilde{Q} = U^{T} Q_r U \qquad (5.21) $$

where S_1, K_1, Q_1 ∈ R^((n−1)×(n−1)) and s, k, q ∈ R^(n−1), the system in (5.20) then gives three equations
$$ (s_{nn} + s_{nn})\,\tilde{k}_{nn} + \tilde{q}_{nn} = 0 \qquad (5.24) $$

k_nn can be obtained from (5.23) and set in (5.24) to solve for k. Once k is known, (5.25) becomes a Lyapunov equation which has the same structure as (5.22) but of order (n−1), as

$$ S_1 K_1 + K_1 S_1^{T} = -Q_1 - s k^{T} - k s^{T} \qquad (5.27) $$

We can apply the same process to (5.26) until S_1 is of order 1. Note that, under this condition, at the kth step (k = 1, 2, …, n) of this process we can obtain a unique solution vector of length (n+1−k) and a reduced triangular matrix equation of order (n−k). Since U is orthonormal, once (5.22) is solved for K̃, K(t_r) can be computed using

$$ K(t_r) = U \tilde{K} U^{T} \qquad (5.28) $$
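The reduce-solve-back-transform structure of (5.20)-(5.28) can be sketched compactly with SciPy primitives. This is an illustrative sketch on toy matrices, not the chapter's implementation: `schur` supplies the orthogonal reduction of (5.20), the quasi-triangular system is solved as a Sylvester equation, and (5.28) recovers the solution.

```python
# Sketch of the Bartels-Stewart reduction: S = U^T P U (real Schur form,
# cf. (5.20)), solve the quasi-triangular Lyapunov system, then
# back-transform with K = U K~ U^T (cf. (5.28)). Toy inputs only.
import numpy as np
from scipy.linalg import schur, solve_sylvester

P = np.array([[-1.0, 2.0],
              [ 0.5, -4.0]])            # stable toy matrix
Q = np.eye(2)

S, U = schur(P, output='real')          # P = U S U^T, U orthonormal
Qt = U.T @ Q @ U                        # transform the right-hand side
Kt = solve_sylvester(S, S.T, -Qt)       # S Kt + Kt S^T = -Qt
K = U @ Kt @ U.T                        # back-transform, (5.28)

print(np.allclose(P @ K + K @ P.T + Q, 0.0))
```

Because U is orthonormal, the residual of the original equation P K + K P^T + Q = 0 is exactly the back-transformed residual of the triangular system, which is why the reduction loses no accuracy beyond the Schur factorization itself.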
Large dense Lyapunov equations can be solved by sign-function-based techniques [61]. Krylov subspace methods, which are related to matrix polynomials, have been proposed [62] as well.
Relatively large sparse Lyapunov equations can be solved by iterative approaches, e.g., [63]. Here, we apply a low-rank version of the iterative method [64], which is related to rational matrix functions. The postulated iteration for the Lyapunov equation (5.19) is given by K(0) = 0 and is carried out for i = 1, 2, …. This method generates a sequence of matrices K_i which often converges very fast toward the solution, provided that the iteration shift parameters τ_i are chosen (sub)optimally. For a more efficient implementation of the method, we replace the iterates by their Cholesky factors, i.e., K_i = L_i L_i^H, and reformulate in terms of the factors L_i. The low-rank Cholesky factors L_i are not uniquely determined; different ways to generate them exist [64].
Note that the number of iteration steps i_max need not be fixed a priori. However, if the Lyapunov equation is to be solved as accurately as possible, correct results are usually achieved for values of the stopping criteria slightly larger than the machine precision.
5.5.1 Power Optimization
Random process variations have a major influence on the design parameters and yield of the manufactured circuits. We define yield as the percentage of manufactured circuits that meet all the specifications, considering process variations
(5.30)
where E{.} is the expected value, and each vector d has upper and lower bounds determined by the technological process variation p_z with probability density function pdf(p_z). The deterministic designable parameters d_r, r = 1, ..., m, e.g., bias voltages and currents, transistor widths and lengths, resistances, and capacitances, are denoted by the vector d in D, where D is the designable parameter space. Let the total area of the circuit be A_total = Sum_k (x_k.A_k), where A_k is the area of a transistor or a discrete component (resistor or capacitor), k is an index that runs over all transistors and discrete components in the circuit, and x_k is the sizing factor (x_k >= 1). The optimization problem can then be formulated as the search for a design point that minimizes the total power P_total over the deterministic designable parameters d, with lower bounds a_j and upper bounds b_j for 1 <= j <= m in the design space D, subject to a minimum yield requirement with bound y
min_{d in D} P_total(d)   subject to   Y(d) >= y,   a_j <= d_j <= b_j,   1 <= j <= m    (5.31)
Let D(P_total) be the compact set of all valid design variable vectors d such that P_total(d) = P_total. The designable parameter space D is assumed to be compact, which for all practical purposes is no real restriction when the problem has a finite minimum. The main advantage of this approach is its generality: it imposes no restrictions on the distribution of p and on how the data enters the constraints. We can approximately subdivide the algorithm into two steps: the yield fulfillment, and the objective function optimization. If, as an approximation, we restrict D(P_total) to just the one-best derivation of P_total, then we obtain the structured perceptron algorithm [65]. As a consequence, given active constraints, including the optimum power budget and minimum frequency of operation, (5.31) can be effectively solved by a sequence of minimizations of the feasible region with iteratively generated low-dimensional subspaces using a cutting-plane method [66].
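A minimal sketch of a yield-constrained power minimization in the spirit of (5.31); the two-parameter power model, the Gaussian performance model, and the 99% yield target are all invented for illustration, and a generic SQP solver stands in for the cutting-plane machinery of [66]. The chance constraint is made deterministic through the Gaussian quantile.

```python
from scipy.optimize import minimize
from scipy.stats import norm

# Toy models (assumed): power P(d) = d0 + 2*d1; performance gain = 3*d0 + d1 + w,
# with w ~ N(0, 1).  Yield Y(d) = Prob{gain >= g_min} = Phi(3*d0 + d1 - g_min).
g_min, y_req = 4.0, 0.99

def power(d):
    return d[0] + 2.0 * d[1]

def yield_margin(d):
    # Y(d) >= y_req  <=>  3*d0 + d1 - g_min >= Phi^{-1}(y_req)
    return (3.0 * d[0] + d[1] - g_min) - norm.ppf(y_req)

res = minimize(power, x0=[3.0, 0.5], bounds=[(0.0, 10.0), (0.0, 10.0)],
               constraints=[{'type': 'ineq', 'fun': yield_margin}], method='SLSQP')
```

The solution lands on the yield boundary with the entire budget spent on the cheaper parameter, i.e., d is approximately [(g_min + Phi^{-1}(y_req))/3, 0].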
The statistical yield-constrained problems require mechanisms for quantifying the reliability associated with the resulting solution, and for bounding the true optimal value of the yield-constrained problem (5.31). We define a reliable bound on the probability Prob{a_j <= d_j <= b_j; 1 <= j <= m} as the random quantity

theta^ := arg max{ theta in [0, 1] : Sum_{r=0}^{nu} C(N, r).theta^r.(1 - theta)^{N-r} >= beta }    (5.32)

where nu is the number of constraint violations observed in N samples and beta is the prescribed confidence level; the feasibility of d can then be evaluated with high reliability, provided that the bound is within realistic assumptions.
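Numerically, a bound of this binomial type can be evaluated by bisection. The sketch below assumes the bound takes the usual form (the largest violation probability theta whose binomial CDF at the observed violation count nu still exceeds a small confidence level beta); the sample counts are illustrative.

```python
from scipy.stats import binom

def reliable_upper_bound(N, nu, beta=1e-3):
    """Largest theta in [0, 1] with sum_{r=0}^{nu} C(N, r) theta^r (1-theta)^(N-r) >= beta,
    i.e. a (1 - beta)-reliable upper bound on the violation probability after
    observing nu violations in N Monte Carlo samples."""
    lo, hi = 0.0, 1.0
    for _ in range(100):               # bisection: binom.cdf decreases in theta
        mid = 0.5 * (lo + hi)
        if binom.cdf(nu, N, mid) >= beta:
            lo = mid                   # still feasible, push the bound up
        else:
            hi = mid
    return lo

theta_bar = reliable_upper_bound(N=1000, nu=3)
# the yield certified with reliability 1 - beta is then at least 1 - theta_bar
```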
The power optimization problem involves varying the design point to optimize power, subject to constraints on other, secondary performance measures and on designable parameter boundaries. With the metric PPA, we quantify the minimum-power design that meets a targeted performance, while including the impact of area scaling. The PPA metric depends on the process and operating conditions, the circuit specification, and the technology's V_T option. We can express this multicriteria circuit performance optimization problem as
(5.33)
(5.34)
The PPA value at any design point is converted into a performance score s; subsequently, the score s is utilized to compute an overall index of circuit quality, denoted PPA(d; s), which is the objective function for the design optimization. Accordingly, the constrained multicriteria optimization is converted into an optimization with a single objective function [67]. As a result, the general form of the optimization problem becomes
(5.35)
(5.36)
where Psi is a combined feature representation of a performance function in a given application. We replace each nonlinear inequality in (5.36) by D - 1 linear inequalities
(5.37)
If the system of inequalities in (5.37) is feasible, typically more than one solution d is possible. For a unique solution, we select d with ||d|| <= 1 for which s is uniformly different from the next closest score update. The score update is then expressed as a dual quadratic program (QP)
(5.38)
where eta is the step size, alpha is the Lagrange multiplier imposing the constraint for label d != d_i, and h(d) are the feature vectors of a design variable vector d. To find the local maxima and minima, we repeatedly select a pair of derivatives of d and optimize their dual (Lagrange) variables alpha. The dual program formulation has two main advantages over the primal QP: since the dual program is determined only by inner products defined by h, it allows the usage of kernel functions; additionally, the constraint matrix of the dual program supports problem decomposition. At the end of the sequence, we average all the score vectors s obtained at each iteration, similar to the structured perceptron algorithm [65].
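The kernel property of the dual formulation can be seen in a deliberately simplified analogue: an averaged dual (kernel) perceptron on a toy two-class problem. The RBF kernel, the XOR-style data, and all parameters are illustrative stand-ins for the chapter's design-scoring setup; the point is that the updates touch only dual variables, so the data enters purely through inner products, and the final scores are averaged over iterations as in [65].

```python
import numpy as np

def rbf(x, z, gamma=2.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def averaged_dual_perceptron(X, y, epochs=20, gamma=2.0):
    """Dual perceptron with score averaging: updates touch only the dual
    variables alpha, so data enters purely through the kernel k(x_i, x_j)."""
    n = len(X)
    K = np.array([[rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)])
    alpha = np.zeros(n)
    alpha_sum = np.zeros(n)
    for _ in range(epochs):
        for i in range(n):
            s = np.dot(alpha * y, K[:, i])
            pred = 1.0 if s >= 0 else -1.0
            if pred != y[i]:
                alpha[i] += 1.0       # dual (Lagrange-like) update
            alpha_sum += alpha        # accumulate for final averaging
    return alpha_sum / (epochs * n)

def score(alpha_avg, X, y, x, gamma=2.0):
    return sum(alpha_avg[i] * y[i] * rbf(X[i], x, gamma) for i in range(len(X)))

# XOR-like toy problem: not linearly separable, but separable with an RBF kernel
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., 1., 1., -1.])
alpha_avg = averaged_dual_perceptron(X, y)
preds = [1.0 if score(alpha_avg, X, y, x) >= 0 else -1.0 for x in X]
```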
5.6 Experimental Results
All the experimental results are carried out on a single-processor Ubuntu Linux 9.10 system with an Intel Core 2 Duo 2.66 GHz CPU and 6 GB of memory. The circuit netlist is simulated in Cadence Spectre using 90 nm CMOS model files. The simulation data points are processed with a Perl script and fed back into the MATLAB code. The evaluated front-end neural recording interface is illustrated in Fig. 5.2. The test dataset (Fig. 5.3a) is based on recordings from the human neocortex and basal ganglia; however, the proposed optimization
114 5 Brain-Machine Interface: System Optimization
Fig. 5.2 Schematic of the front-end neural recording interface including the LNA, bandpass filter, PGA, and SAR A/D converter
where F is the total signal power and sigma^2_amp,i represents the variance of the noise added by the i-th amplifier stage. The speed of the SAR ADC is primarily a function of the technology's gate delay and of the kT/C noise multiplied by the number of SAR cycles necessary for one conversion. The maximum resolution in SNR-bits of a SAR converter (for a given value of an effective thermal resistance R_eff, which sums together the effects of all noise sources, e.g., thermal, shot, 1/f, and input-referred noise) over the full-Nyquist band (0 <= f_Neuron <= f_s/2) is then expressed as N_noise = log2( sqrt( V_FS^2 / (6kT.f_s.R_eff) ) ) - 1, where V_FS is the full-scale input signal and f_s is the sampling frequency. The accuracy of the neural spike classification in a back-end signal processing unit directly increases with A/D converter resolution, although it saturates beyond 5-6 bit resolution, ultimately limited by the SNR.
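Plugging representative numbers into the resolution bound above gives a feel for the achievable SNR-bits; the full-scale swing, sampling rate, and effective thermal resistance below are assumed purely for illustration.

```python
import math

k = 1.380649e-23     # Boltzmann constant [J/K]
T = 310.0            # body temperature [K]
V_FS = 1.0           # full-scale input [V] (assumed)
f_s = 40e3           # sampling frequency [Hz] (assumed)
R_eff = 10e6         # effective thermal resistance [ohm] (assumed)

# N_noise = log2( sqrt( V_FS^2 / (6 k T f_s R_eff) ) ) - 1
N_noise = math.log2(math.sqrt(V_FS**2 / (6 * k * T * f_s * R_eff))) - 1
# roughly 12 bits for these values; lowering R_eff or f_s raises the bound
```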
However, since the amplitude of the observed spike signals typically varies by one order of magnitude, additional resolution (i.e., 2-3 bits) is needed if the amplification gain is fixed. Additionally, increasing the sampling rate of the A/D converter improves spike-sorting accuracy, since this captures finer features that further differentiate the signals. The PPA ratio differs for each design depending on circuit characteristics such as power consumption, bandwidth, gain, linearity, etc. Closed-form symbolic expressions of the constraints and the objective are passed on to the optimization algorithm. Design heuristics are used to provide a good initial starting point. The total runtime of the optimization method is only dozens
[Fig. 5.3: signal amplitude and membrane potential (mV) versus time (ms); spectral magnitude (dB) versus frequency (Hz)]
Fig. 5.3 The test dataset (the y axis is arbitrary): a (top) raw signal after amplification, not corrected for gain, b (bottom) zoom-in of the raw signal, and c spectral signature of the SAR A/D converter two-tone test; the black area is the spectral content with nominal gain, the gray area the spectra with 20% gain reduction, equivalent to a 4 LSB loss in the dynamic range (© IEEE 2015)
of seconds, and the number of iterations required to reach the stopping criterion never exceeds six throughout the entire simulated range (from 10^-3 to 10^-1).
The design tradeoff exploration space for circuit area, sample frequency, and PPA is illustrated in Fig. 5.4a. The area and sample frequency curves are plotted for the worst-case design (WCD) and the proposed quadratic-program-optimized
[Fig. 5.4 plots: relative area and PPA versus relative 1/fs, with tolerance box and optimal yield box annotations]
Fig. 5.4 a Area, sampling frequency, and PPA tradeoff for the neural recording channel optimized with quadratic programming (QPO) and worst-case design (WCD); the iso-PPA is shown as an overlay (© IEEE 2015), and b optimized PPA versus relative sampling frequency
approach (QPO). The normalized PPA ratio of the design is represented at the intersection with the area-sample frequency curves. For a given circuit area, the optimized design obtains higher performance than the corresponding WCD. The points lying on the lowest intersections are the most power efficient for the given input and output constraints, and represent the PPA curve of interest. With the same yield constraints, the optimization produces uniformly better optimum signal bandwidth curves for a given power. The improvement is determined by the underlying structure of physical process variation. If the amount of uncorrelated variability increases, i.e., the intra-chip variation increases in comparison with the chip-to-chip variation, the feasible yield facilitated by optimization increases. Similarly, to maintain a constant power efficiency as area is reduced, the circuit noise and the current and voltage efficiencies need to be held constant. The power consumption of the neural interface front-end increases linearly with sampling frequency.
[Fig. 5.5 plots: (gm/ID)2 versus (gm/ID)1 contours; P/Pref versus relative gain (power-gain tradeoff) and versus relative area (power-area tradeoff), with the optimal PPA marked]
Fig. 5.5 a Two-stage gm/ID versus constant gain (plain), constant area (plain hyperbolic), and constant current (dashed elliptic) contours, b normalized contours showing optimal power per area (PPA) versus relative gain (© IEEE 2015), and c normalized contours showing optimal power per area (PPA) versus relative area
The constant power, area, and gain contours for two gain stages are illustrated in Fig. 5.5a. The total area is shown as the hyperbolic-shaped contour, while elliptic contours define the total current I_D,total. A large transistor bias point (gm/ID) corresponds to more current and smaller transistors. In contrast, if we decrease the current, the gain (due to larger gm/ID) and the total area increase. The plot in Fig. 5.5b illustrates the position of the optimal PPA versus relative (given) gain. The power consumed in the neural interface gain stages increases proportionally with gain.
Typically, the desired high gm is obtained at the cost of an increased bias current (increased power) or area (wide transistors). However, for very short channels the carrier velocity quickly reaches the saturation limit, at which gm also saturates, becoming independent of gate length or bias. The intrinsic gain degradation can be alleviated with open-loop residue amplifiers [68], comparator-based switched-capacitor circuits [69], and correlated level shifting [70]. The plot in Fig. 5.5c illustrates the position of the optimal PPA under the maximum-yield reference design point versus relative area. The offset and the static accuracy critically depend on the matching between nominally identical devices. This error, however, typically decreases as the area of the devices increases. Several rules exist [71] to ensure sufficient matching: the matched devices should have the same structure and the same surroundings in the layout, use the same materials, have the same orientation and temperature, and the distance between matched devices should be minimal.
In Table 5.1, the worst-case design (WCD) is compared across the neural interface circuits with the optimization approach. The QP-optimized circuits allow a large area reduction when designed for the maximum WCD frequency, ranging from 9 to 19%, with 16% on average. When operating at the same frequency, the optimized total power is reduced by up to 21%. The optimization space in symmetrical circuits is restricted and, consequently, the additional power saving obtained by optimization is limited, particularly at higher yield.

For a decreased yield of 95% instead of 99%, higher power savings of up to 32% on average can be achieved as a consequence of a larger optimization space (not shown in Table 5.1). Note that overdimensioning in the case of higher yield leads to a larger area and higher power consumption. As yield increases when tolerance decreases, an agreeable tradeoff needs to exist between the increase in yield and the cost of design and manufacturing. Consequently, continuous observation of process variation and thermal monitoring becomes a necessity [72]. The observed circuit's power consumption scales with its bandwidth and SNR. The limit on dissipated power can be expressed as (8kT).f(SNR), where f is an increasing function of SNR [73]. Additionally, the interface input to the neural system is subject to external noise, which can be represented by an effective temperature. Reducing noise to improve signal processing requires larger numbers of receptors, channels, or neurons, requiring additional power resources [74].
5.7 Conclusions
Integrated neural implants interface with the brain using biocompatible electrodes to provide high-yield cell recordings, large channel counts, and access to spike data and/or field potentials with a high signal-to-noise ratio. Rapid advances in computational capabilities, design tools, and biocompatible electrode fabrication techniques allow for the development of neural prostheses capable of interfacing with single neurons and neuronal networks. The miniaturization of the functional blocks in a neural recording interface, however, presents significant circuit design challenges in terms of noise, area, power, and the reliability of the recording system. In this chapter, we develop a yield-constrained sequential PPA minimization framework that is applied to a multivariable optimization in a neural recording interface. By limiting overdimensioning of the circuit, the proposed method achieves a consistently better PPA ratio over the entire range of neural recording interface circuits, with no loss of circuit performance. Our approach can be used with any variability model and is not restricted to any particular performance constraint. As the experimental results in CMOS 90 nm technology indicate, the suggested numerical methods provide accurate and efficient solutions of the PPA optimization problem, offering up to 26% power savings and up to 22% area reduction, without yield penalties.
References
18. S. Seth, B. Murmann, Design and optimization of continuous-time filters using geometric programming, in Proceedings of IEEE International Symposium on Circuits and Systems (2014), pp. 2089-2092
19. A. Zjajo, C. Galuzzi, R. van Leuken, Sequential power per area optimization of multichannel neural recording interface based on dual quadratic programming, in Proceedings of IEEE International Conference on Neural Engineering (2015), pp. 9-12
20. M. Grigoriu, On the spectral representation method in simulation. Probab. Eng. Mech. 8, 75-90 (1993)
21. M. Loève, Probability Theory (D. Van Nostrand Company Inc., Princeton, 1960)
22. R. Ghanem, P.D. Spanos, Stochastic Finite Elements: A Spectral Approach (Springer, Berlin, 1991)
23. P. Friedberg, Y. Cao, J. Cain, R. Wang, J. Rabaey, C. Spanos, Modeling within-die spatial correlation effects for process-design co-optimization, in Proceedings of IEEE International Symposium on Quality of Electronic Design (2005), pp. 516-521
24. J. Xiong, V. Zolotov, L. He, Robust extraction of spatial correlation, in Proceedings of IEEE International Symposium on Physical Design (2006), pp. 2-9
25. A. Hodgkin, A. Huxley, A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117, 500-544 (1952)
26. R.F. Fox, Y.N. Lu, Emergent collective behavior in large numbers of globally coupled independently stochastic ion channels. Phys. Rev. E 49, 3421-3431 (1994)
27. A. Saarinen, M.-L. Linne, O. Yli-Harja, Stochastic differential equation model for cerebellar granule cell excitability. PLoS Comput. Biol. 4(2), 1-11 (2008)
28. A.C. West, J. Newman, Current distributions on recessed electrodes. J. Electrochem. Soc. 138(6), 1620-1625 (1991)
29. Z. Yang, Q. Zhao, E. Keefer, W. Liu, Noise characterization, modeling, and reduction for in vivo neural recording, in Advances in Neural Information Processing Systems (2010), pp. 2160-2168
30. P.R. Gray, R.G. Meyer, Analysis and Design of Analog Integrated Circuits (Wiley, New York, 1984)
31. A. Demir, E. Liu, A. Sangiovanni-Vincentelli, Time-domain non-Monte Carlo noise simulation for nonlinear dynamic circuits with arbitrary excitations, in Proceedings of IEEE International Conference on Computer-Aided Design (1994), pp. 598-603
32. J.H. Fischer, Noise sources and calculation techniques for switched capacitor filters. IEEE J. Solid-State Circuits 17(4), 742-752 (1982)
33. T. Sepke, P. Holloway, C.G. Sodini, H.-S. Lee, Noise analysis for comparator-based circuits. IEEE Trans. Circuits Syst. I 56(3), 541-553 (2009)
34. C. Michael, M. Ismail, Statistical Modeling for Computer-Aided Design of MOS VLSI Circuits (Kluwer, Boston, 1993)
35. H. Zhang, Y. Zhao, A. Doboli, ALAMO: an improved space-based methodology for modeling process parameter variations in analog circuits, in Proceedings of IEEE Design, Automation and Test in Europe Conference (2006), pp. 156-161
36. M. Pelgrom, A. Duinmaijer, A. Welbers, Matching properties of MOS transistors. IEEE J. Solid-State Circuits 24(5), 1433-1439 (1989)
37. R. López-Ahumada, R. Rodríguez-Macías, FASTEST: a tool for a complete and efficient statistical evaluation of analog circuits, dc analysis, in Analog Integrated Circuits and Signal Processing, vol. 29, no. 3 (Kluwer Academic Publishers, The Netherlands, 2001), pp. 201-212
38. G. Biagetti, S. Orcioni, C. Turchetti, P. Crippa, M. Alessandrini, SiSMA: a statistical simulator for mismatch analysis of MOS ICs, in Proceedings of IEEE/ACM International Conference on Computer-Aided Design (2002), pp. 490-496
39. B. De Smedt, G. Gielen, WATSON: design space boundary exploration and model generation for analogue and RF IC design. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 22(2), 213-224 (2003)
63. E. Wachspress, Iterative solution of the Lyapunov matrix equation. Appl. Math. Lett. 1, 87-90 (1988)
64. J. Li, F. Wang, J. White, An efficient Lyapunov equation-based approach for generating reduced-order models of interconnect, in Proceedings of IEEE Design Automation Conference (1999), pp. 1-6
65. Y. Freund, R.E. Schapire, Large margin classification using the perceptron algorithm. Mach. Learn. 37, 277-296 (1999)
66. I. Tsochantaridis, T. Hofmann, T. Joachims, Y. Altun, Support vector machine learning for interdependent and structured output spaces, in International Conference on Machine Learning (2004), pp. 1-8
67. A. Dharchoudhury, S.M. Kang, Worst-case analysis and optimization of VLSI circuit performances. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 14(4), 481-492 (1995)
68. B. Murmann, B.E. Boser, A 12-bit 75-MS/s pipelined ADC using open-loop residue amplification. IEEE J. Solid-State Circuits 38(12), 2040-2050 (2003)
69. T. Sepke et al., Comparator-based switched-capacitor circuits for scaled CMOS technologies, in IEEE International Solid-State Circuits Conference Digest of Technical Papers (2006), pp. 220-221
70. B.R. Gregoire, U.-K. Moon, An over-60 dB true rail-to-rail performance using correlated level shifting and an opamp with 30 dB loop gain, in IEEE International Solid-State Circuits Conference Digest of Technical Papers (2008), pp. 540-541
71. A. Zjajo, J. Pineda de Gyvez, Low-Power High-Resolution Analog to Digital Converters (Springer, New York, 2011)
72. A. Zjajo, M.J. Barragan, J. Pineda de Gyvez, Low-power die-level process variation and temperature monitors for yield analysis and optimization in deep-submicron CMOS. IEEE Trans. Instrum. Meas. 61(8), 2212-2221 (2012)
73. E.A. Vittoz, Future of analog in the VLSI environment, in Proceedings of IEEE International Symposium on Circuits and Systems (1990), pp. 1372-1375
74. J.E. Niven, S.B. Laughlin, Energy limitation as a selective pressure on the evolution of sensory systems. J. Exp. Biol. 211(11), 1792-1804 (2008)
Chapter 6
Conclusions
"The best way to predict the future is to invent it." Medicine in the twentieth century relied primarily on pharmaceuticals that could chemically alter the action of neurons or other cells in the body, but twenty-first century health care may be defined more by electroceuticals: novel treatments that will use pulses of electricity to regulate the activity of neurons, or devices that interface directly with our nerves. Systems such as brain-machine interfaces detect the voltage changes in the brain that occur when neurons fire to trigger a thought or an action, and they translate those signals into digital information that is conveyed to a machine, e.g., a prosthetic limb, a speech prosthesis, or a wheelchair.
To help accomplish specific tasks, a hybrid BMI could be built that combines brain signals with input from other sensors. Sensors exist, or are in the works, that can observe eye movement, breath, sweat, gaze, facial expressions, heart rate, muscle movements, and sleep patterns, as well as the ambient temperature and air quality. For example, an eye-tracking sensor follows the subject's gaze to locate the target object, and an ECoG sensor records brain activity while the subject reaches toward that target. A computer analyzes the brain activity associated with the subject's arm movement and sends a command to a robotic arm; with the help of a depth sensor, the arm reaches out and grabs the object. If a prosthetic limb has sensors that register when it touches an object, it could in principle send that sensory feedback to a patient by stimulating the brain through the ECoG electrodes. Consequently, two-way communication between brain and prosthesis can be used to help a user deftly control the limb.
6.2 Recommendations and Future Research 129
What would it take to build a hybrid BMI? First, we need to improve our recording hardware. Today's systems use only a few dozen electrodes on the cortex; clearly, a much higher density of electrodes would produce a better signal. We need a suite of sensors, possibly with a wearable gadget or clothing that monitors, stimulates, and collects the data. To decipher neural activity not just in one area but across large regions of the brain, signal analysis needs to improve. We will need better spatial and temporal resolution to determine the exact sequence in which groups of neurons across the cortex fire to produce a command or a thought. Finally, and most importantly, we need novel circuit- to system-level techniques to enhance the power efficiency of autonomous BMI systems and wireless sensors to ensure continued performance enhancements under a tight power budget. Dramatic improvements in power efficiency can be obtained through several principles:
- electronics is going toward increasingly complex systems: meaningful circuit solutions need to fit a system concept first;
- power efficiency comes from synergy: working cooperatively across levels of abstraction leads to benefits that are largely greater than the sum of the single benefits;
- exploring alternative signal processing circuits, e.g., time-based and current-based processing, for power-efficient solutions; using digitally assisted analog circuit and analog-assisted digital circuit techniques;
- power is a valuable currency, and needs to be continuously traded off with other available commodities (performance, sample rate, resolution, signal quality, ...);
- power needs to be truly scalable across voltage and time-varying specifications: every time we can give up something, power needs to benefit from it;
- using power-efficient machine-learning techniques to recognize certain general states of mind from EEG or ECoG recordings; using power-scalable kernels for the classification of neural spikes;
- emerging technologies are a significant source of inspiration to look at the future, and to learn new ways to use what exists; circuit and system integration with emerging and post-CMOS technologies (TFET, SymFET, BiSFET);
- understanding, or at least measuring, are powerful tools to increase power efficiency by avoiding pessimism and reducing design margin.
Additional design challenges posed by the increased system integration of a multi-physical-domain hybrid bioelectronic interface, where not only analog and digital electronics are integrated, but also mechanical, chemical, optical, and thermal sensors are becoming an integral part of the embedded system, need to be addressed as well. The creation of a unified design environment, where the system definition and its design partitioning across the different physical domains can be analyzed and verified, remains a priority. In addition, nonfunctional constraints that
v^2_in,T = 4kT.gamma/g_m    (A.1)

v^2_in,R = 4kT.R    (A.2)
The input-referred thermal noise of the (single-transistor, common-source) amplifier with resistive load can be calculated as the output noise divided by the gain of the amplifier

v^2_n,i = (1/g_m^2).(4kT.gamma.g_m + 4kT/R) = 4kT.gamma/g_m + 4kT/(g_m^2.R)    (A.3)

assuming that g_m.R, which is the gain of the amplifier, is much greater than 1/gamma; thus 1/(g_m.R) is negligible compared to gamma if the amplifier has a high enough gain. The total input-referred thermal noise of the amplifier can be calculated by integrating the noise over the entire frequency range to be
V_rms,ni = sqrt( (4kT.gamma/g_m).(1/(4RC)) ) = sqrt( kT.gamma/(g_m.R.C) )    (A.4)

Since the total power consumption is P = I_tot.V_DD, we can express the total power consumption of the amplifier as a function of the input-referred thermal noise as [3]

P = (1/V^2_rms,ni).( U_T.kT.V_DD / (2.kappa^2.R.C) )    (A.6)

where kappa is the subthreshold gate coupling coefficient.
The previous equation illustrates the tradeoff between the power consumption and the total input-referred thermal noise of a subthreshold amplifier for a given supply voltage and bandwidth (denoted by the RC product in this case). To reduce the input-referred thermal noise by a factor of 2, the total power consumption must be increased by a factor of 4. This relationship shows the steep power cost of achieving low-noise performance in a thermal-noise-limited amplifier, even without taking flicker noise into account.

The power-noise tradeoff in the amplifier is aggravated if the transistor is operating in strong inversion. In strong inversion, the transconductance g_m is proportional to sqrt(I_tot). As a result, the total power consumption scales as 1/V^4_ni instead of 1/V^2_ni as in the subthreshold case.
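The quadratic noise-power scaling can be checked numerically. The sketch below assumes a subthreshold relation of the form P = U_T.kT.V_DD/(2.kappa^2.V^2_rms,ni.RC); the supply, kappa, and bandwidth values are illustrative, and the exact constant is immaterial; only the factor-of-4 ratio matters here.

```python
import math

k = 1.380649e-23; q = 1.602176634e-19; T = 310.0
U_T = k * T / q                    # thermal voltage, ~26.7 mV at 310 K
V_DD = 1.0                         # supply [V] (assumed)
kappa = 0.7                        # subthreshold gate coupling (assumed)
RC = 1.0 / (2 * math.pi * 10e3)    # ~10 kHz bandwidth (assumed)

def power_for_noise(v_rms):
    # assumed form of (A.6): P = U_T kT V_DD / (2 kappa^2 v_rms^2 RC)
    return U_T * k * T * V_DD / (2 * kappa**2 * v_rms**2 * RC)

p1 = power_for_noise(10e-6)    # 10 uVrms input-referred noise
p2 = power_for_noise(5e-6)     # halving the noise...
ratio = p2 / p1                # ...costs 4x the power
```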
where V_DD is the supply voltage, k is the Boltzmann constant, T is the temperature in Kelvin, BW_LNA = f_LP - f_HP is the -3 dB bandwidth of the LNA, f_LP and f_HP are the low-pass and high-pass corner frequencies, respectively, U_T is the thermal voltage (kT/q), and the noise efficiency factor NEF is defined as [3]

NEF = V_rms,in.sqrt( 2.I_LNA / (pi.4kT.U_T.BW_LNA) )    (A.8)
Appendix 133
The total LNA output noise voltage should be less than the ADC quantization noise

G^2_LNA.G^2_PGA.V^2_rms,in <= (1/12).LSB^2 = (1/12).(V_DD/2^n)^2    (A.9)

where G_LNA is the gain of the LNA, G_PGA is the gain of the programmable-gain amplifier, LSB is the ADC least-significant-bit voltage value, and n is the resolution of the A/D converter. Combining (A.7) and (A.9), the minimum LNA power consumption is expressed as

P_LNA >= 24.pi.2^{2n}.kT.U_T.BW_LNA.G^2_LNA.G^2_PGA.(NEF)^2 / V_DD    (A.10)
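A numerical sketch of the LNA bound; this assumes the standard NEF definition (with its factor pi carried into the bound) and purely illustrative gain, bandwidth, NEF, and resolution values.

```python
import math

k = 1.380649e-23; q = 1.602176634e-19; T = 310.0
U_T = k * T / q
V_DD = 1.0                   # supply [V] (assumed)
BW_LNA = 10e3                # LNA bandwidth [Hz] (assumed)
G_LNA, G_PGA = 100.0, 10.0   # LNA and PGA gains (assumed)
NEF, n = 3.0, 8              # noise efficiency factor and ADC bits (assumed)

# P_LNA >= 24*pi * 2^(2n) * kT * U_T * BW_LNA * G_LNA^2 * G_PGA^2 * NEF^2 / V_DD
P_LNA_min = (24 * math.pi * 2**(2 * n) * k * T * U_T * BW_LNA
             * G_LNA**2 * G_PGA**2 * NEF**2 / V_DD)
# tens of microwatts for these values; note the steep 4x cost per extra bit
```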
The PGA drives the following ADC and must meet a slew-rate constraint. By setting the time constant tau = t_slew, where t_slew = 1/(2f_s) is the maximum allowable time for slewing, the minimum required biasing current of the PGA (I_PGA,slew = g_m.V_eff) is

I_PGA,slew = C_L,PGA.G_PGA.V_eff / t_slew    (A.11)

where C_L,PGA is the load capacitance of the PGA, V_eff is the voltage swing of the A/D converter, and f_s is the sampling rate of one recording channel. Consequently, the power consumption of the PGA is [4]

P_PGA = 2.f_s.C_L,PGA.G_PGA.V_eff.V_DD    (A.12)
The sampling capacitor C_S is bounded by the requirement that its kT/C_S sampling noise remain below the converter's quantization noise

C_S = 12kT.2^{2n} / V^2_FS    (A.13)
To charge this capacitor to V_FS within one half period of the sampling frequency f_S, we need a current of I = 2.f_S.C_S.V_FS. Assuming that we have an ideal amplifier, driving the capacitor leads to a minimum supply current for that amplifier. Further assuming that the supply voltage of the amplifier is equal to V_FS, we arrive at a power dissipation of I.V_FS for the amplifier and, therefore, for the sampling process. Combining these relationships gives a lower bound for the sampling power

P_SH = 24kT.f_S.2^{2n}    (A.14)
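The exponential cost of resolution in the sampling bound is easy to tabulate; the sampling rate and resolutions below are illustrative.

```python
k = 1.380649e-23   # Boltzmann constant [J/K]
T = 310.0          # temperature [K]

def sampling_power_bound(f_s, n):
    # P_SH = 24 kT f_s 2^(2n)   (A.14)
    return 24 * k * T * f_s * 2**(2 * n)

p_8bit = sampling_power_bound(40e3, 8)     # sub-nanowatt: a fundamental, not practical, floor
p_10bit = sampling_power_bound(40e3, 10)   # two extra bits -> 16x
ratio = p_10bit / p_8bit
```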
In the binary search algorithm, n steps are needed to complete one conversion, as the DAC output gradually approaches the input voltage. The DAC output voltage for the i-th step can be expressed as

V_DAC,out = dV_I(i) = -V_in + D_{n-1}.(V_ref/2) + ... + D_{n-i}.(V_ref/2^i),  1 <= i <= n    (A.15)

where dV_I is the input voltage difference, V_ref is the reference voltage, and D is the digital representation of the n-bit code. The comparator must determine the output digital code of the sub-ADC, converted into a voltage by the DAC for the transfer phase, within the decision time t_d. Subsequently, the output voltage difference required to make the comparison in the latch-based comparator can be expressed as

dV_out = A_V.dV_I.exp(t_d/tau)    (A.16)

where A_V acts as a gain factor from the input to the initial imbalance of the latch decision stage, tau = C_L,comp/g_m, and C_L,comp and g_m are the output load and transconductance of the comparator, respectively. Assuming t_d = 1/(2n.f_s), the required g_m is

g_m,comp = (C_L,comp/(n.t_d)).Sum_{K=1}^{n} ln( V_DD.2^K/(A_V.V_ref) ) = 2n.f_s.C_L,comp.( ln( V_DD/(A_V.V_ref) ) + ((n+1)/2).ln 2 )    (A.17)
To identify the minimum power limit of the comparator, it is noted that its total input-referred noise voltage has a fundamental kT/C limitation given by

V^2_n = 4kT / C_L,comp    (A.18)

Equating the previous equation with the quantization noise V^2_FS/(12.2^{2n}) gives the required load capacitance

C_L,comp = 48kT.2^{2n} / V^2_FS    (A.19)

where V_FS is the full-scale voltage range. Substituting (A.19) into (A.17), the minimum g_m,comp and I_comp = g_m,comp.V_eff can be found. The power consumption of the comparator is [4]

P_comp = 96.n.f_s.kT.(2^{2n}/V^2_FS).V_eff.V_DD.( ln( V_DD/(A_V.V_ref) ) + ((n+1)/2).ln 2 )    (A.20)
To drive the SAR logic capacitance within the sampling phase requires a current of I_logic = (C_logic.V_FS)/t_s, which leads to the following minimum limit for the logic power

P_logic = n.f_s.C_logic.V^2_DD    (A.21)
The unit capacitor C_U is usually determined by thermal noise and capacitor mismatch. The thermal noise resulting from the sampling action of the input voltage is given by kT/(2^n.C_U). In a Nyquist ADC, C_U should be large enough so that the thermal noise is less than the converter's quantization noise

C_U,n = 12kT.2^n / V^2_FS    (A.23)
The input-referred noise v_n (the total integrated output noise as well) still takes the form of kT/C with some correction factor alpha_1, where R_on is the resistance of the switch, V_ns is the noise source, C_p is the parasitic capacitance, and C_OTA is the input capacitance of the OTA. Then, in the conversion mode, the sampling capacitor C_4, which now contains the signal value and the offset of the OTA, is connected across the OTA. The total noise charge will cause an output voltage of

v^2_ns(out) = Q^2_ns/C^2_4 = kT.(C_4 + C_p + C_OTA)/C^2_4 = (1/beta).(kT/C_4)    (A.27)
where beta is the feedback factor. For a differential implementation of the circuit, the noise power of the previous equation increases by a factor of 2, assuming no correlation between the positive side and the negative side, since uncorrelated noise adds in power. Thus, the input-referred noise power, which is found by dividing the output noise power by the square of the gain (G_A = C_3/C_4), is given by

v^2_ns(in) = v^2_ns(out)/(G_A)^2 = (1/((G_A)^2.beta)).(kT/C_4)    (A.28)
The resistive channel of the MOS devices in the OTA also has thermal noise and contributes to the input-referred noise of the PGA circuit. The noise power at the output is found from

v^2_ns(out) = Integral_0^inf |H(j2.pi.f)|^2.i^2_ns.df = (kT/C_LT).( G_m.R_o/(1 + G_m.R_o) ) ~ kT/C_LT    (A.29)

where R_o is the output resistance and C_LT is the capacitance loading at the output

C_LT = C_L + C_p + C_OTA    (A.30)
The optimum gate capacitance of the OTA is proportional to the sampling capacitor, C_{OTA,opt} = α₃·C₄, where α₃ is a circuit-dependent proportionality factor. The drain current I_D then follows from (A.31) in terms of the carrier mobility μ, the gate-oxide capacitance C_ox and the sampling capacitor C₄, with α₁ the gain correction factor. In the conversion mode, the input-referred noise variance is

    v²_{ns(in)} = kT/((G_C·β)²·C_LT)     (A.32)
The noise from the acquisition and conversion modes can be added to find the total input-referred noise, assuming that the two noise sources are uncorrelated. Using the results from (A.28) and (A.32), the total input-referred noise power for differential input is given by

    v²_{ns(in)} = 2kT/((G_C·β)²·C_LT) + 2kT/((G_A·β)²·C₄) = 2·[1/((G_C·β)²·C_LT) + 1/((G_A·β)²·C₄)]·kT     (A.33)
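The two uncorrelated contributions of (A.33) add directly in power; a minimal sketch, with feedback factor, gains and capacitances chosen purely for illustration:

```python
from math import sqrt

k, T = 1.380649e-23, 300.0  # Boltzmann constant, temperature

def total_input_noise(beta, g_c, g_a, c_lt, c4):
    # Eq. (A.33): differential input-referred noise power, acquisition
    # and conversion contributions summed as uncorrelated sources
    return 2.0 * k * T * (1.0 / ((g_c * beta)**2 * c_lt)
                          + 1.0 / ((g_a * beta)**2 * c4))

# assumed values: beta = 0.8, both gains 4, 1 pF capacitances
v2 = total_input_noise(0.8, 4.0, 4.0, 1e-12, 1e-12)
print(sqrt(v2))  # rms input-referred noise voltage
```

Doubling either capacitor halves its kT/C term, as expected from the equation.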
The number of transistor process parameters that can vary is large. In previous research aimed at optimizing the yield of integrated circuits [7, 8], the number of simulated parameters was reduced by choosing parameters that are relatively independent of each other and that affect performance the most. The parameters most frequently chosen are, for n- and p-channel transistors: the threshold voltage at zero back-bias for the reference transistor at the reference temperature V_TOR, the gain factor for an infinite square transistor at the reference temperature β_SQ, the total length and width variations L_var and W_var, the oxide thickness t_ox, and the bottom, sidewall, and gate-edge junction capacitances C_JBR, C_JSR, and C_JGR, respectively. The variation in the absolute value of all these parameters must be considered, as well as the differences between related elements, i.e., matching. Threshold voltage differences ΔV_T and current factor differences Δβ/β are the dominant sources underlying the drain-source current or gate-source voltage mismatch of a matched pair of MOS transistors.
Transistor Threshold Voltage: Various factors affect the gate-source voltage at which the channel becomes conductive, such as the voltage difference between the channel and the substrate required for the channel to exist, the work-function difference between the gate material and the substrate material, the voltage drop across the thin oxide required for the depletion region, the voltage drop across the thin oxide due to implanted charge at the surface of the silicon, and the voltage drop across the thin oxide due to unavoidable charge trapped in it. For the channel to exist, the concentration of electron carriers in the channel should equal the concentration of holes in the substrate, φ_S = φ_F. The surface potential thus changes by a total of 2φ_F between the strong-inversion and depletion cases. The threshold voltage is affected by the built-in Fermi potential due to the different materials and doping concentrations used for the gate and substrate. The work-function difference is given by

    φ_ms = φ_F,Sub − φ_F,Gate = (kT/q)·ln(N_D·N_A/n_i²)     (A.35)
Immobile negative charge is left behind in the depletion region after the mobile holes are repelled. This charge gives rise to a potential across the gate-oxide capacitance of Q_B/C_ox, where

    Q_B = q·N_A·x_d = q·N_A·√(2·ε_Si·|2φ_F|/(q·N_A)) = √(2·q·N_A·ε_Si·|2φ_F|)     (A.36)

and x_d is the width of the depletion region. The amount of charge implanted at the surface of the silicon is adjusted to realize the desired threshold voltage.
When the source-to-substrate voltage is increased, the effective threshold voltage increases; this is known as the body effect. The body effect occurs because, as the source-bulk voltage V_SB becomes larger, the depletion region between the channel and the substrate becomes wider, so more immobile negative charge is uncovered. This increase in charge changes the charge attracted under the gate. Specifically, Q_B becomes

    Q_B = √(2·q·N_A·ε_Si·(V_SB + |2φ_F|))     (A.37)
The unavoidable charge trapped in the thin oxide gives rise to a voltage drop across the oxide, V_ox, given by

    V_ox = Q_ox/C_ox = q·N_ox/C_ox     (A.38)
Incorporating all these factors, the threshold voltage V_T is then given by

    V_T = φ_ms + 2φ_F + (Q_B + Q_ox)/C_ox
        = φ_ms + 2φ_F + (Q_B0 + Q_ox)/C_ox + (√(2·q·ε_Si·N_A)/C_ox)·(√(|2φ_F| + V_SB) − √(|2φ_F|))     (A.39)
When the source is shorted to the substrate, V_SB = 0, and the zero-substrate-bias threshold voltage is defined as

    V_T0 = φ_ms + 2φ_F + (Q_B0 + Q_ox)/C_ox     (A.40)
The threshold voltage V_T can then be rewritten as

    V_T = V_T0 + γ·(√(|2φ_F| + V_SB) − √(|2φ_F|)),  with  γ = √(2·q·ε_Si·N_A)/C_ox     (A.41)
Advanced transistor models, such as MOS Model 9 [9], define the threshold voltage as

    V_T = V_T0 + ΔV_T0 + ΔV_T1 = (V_T0T + V_T0G + V_T0(M)) + ΔV_T0 + ΔV_T1     (A.42)
where the threshold voltage at zero back-bias V_T0 [V] for the actual transistor at the actual temperature is defined by the geometrical model: V_T0T [V] is the threshold temperature dependence, V_T0G [V] the threshold geometry dependence and V_T0(M) [V] the matching deviation of the threshold voltage. Due to the variation of the doping in the depletion region under the gate, a two-factor body-effect model is needed to account for the increase in threshold voltage with V_SB for ion-implanted transistors. The change in threshold voltage for nonzero back-bias is represented in the model as

    ΔV_T0 = K0·(√u_S − √u_S0),  u_S < u_SX
    ΔV_T0 = K0·√(u_SX·(1 − K²/K0²) + (K²/K0²)·u_S) − K0·√u_S0,  u_SX ≤ u_S     (A.43)

    u_S = √(V_SB + φ_B),  u_S0 = √φ_B,  u_ST = √(V_SBT + φ_B),  u_SX = √(V_SBX + φ_B)     (A.44)
where the parameter V_SBX [V] is the back-bias value at which the implanted layer becomes fully depleted, K0 [V^{1/2}] is the low-back-bias body factor of the actual transistor and K [V^{1/2}] the high-back-bias body factor of the actual transistor. For nonzero values of the drain bias, the drain depletion layer expands towards the source and may affect the potential barrier between the source and channel regions, especially for short-channel devices. This modulation of the potential barrier between source and channel causes a reduction of the threshold voltage. In subthreshold this dramatically increases the current and is referred to as drain-induced barrier lowering (DIBL). Once an inversion layer has formed at higher values of gate bias, any increase of drain bias induces an additional increase of inversion charge at the drain end of the channel. The drain bias still has a small effect on the threshold voltage; this effect is most pronounced in the output conductance in strong inversion and is referred to as static feedback. The DIBL effect is modeled by the parameter γ00 in the subthreshold region. This drain-bias dependence is expressed by the first part of

    ΔV_T1 = γ00·(V_GTX²/(V_GTX² + V_GT1²))·V_DS + γ1·(V_GT1²/(V_GTX² + V_GT1²))·V_DS^{η_DS}     (A.45)

    V_GT1 = V_GS − V_T1 for V_GS ≥ V_T1;  V_GT1 = 0 for V_GS < V_T1;  V_GTX = √2/2     (A.46)

where γ1 is the coefficient of the drain-induced threshold shift for large gate drive for the actual transistor and η_DS the exponent of the V_DS dependence of γ1 for the actual transistor.
where V_T0(AIntra) and V_T0(BIntra) are the within-chip spreads of V_T0 [V·m], F_S is a switch between inter- and intra-die spread (F_S = 1 for intra-die spread and zero otherwise), and F_C is a correction for multiple transistors in parallel and for units.
Transistor Current Gain: A single expression models the drain current in all regions of operation in MOS Model 9:

    I_DS = β·G3·[V_GT3·V_DS1 − ((1 + δ1)/2)·V_DS1²] / [{1 + θ1·V_GT1 + θ2·(u_s − u_s0)}·(1 + θ3·V_DS1)]     (A.50)

where

    δ1 = (λ1/(2·u_s))·[K + (K0 − K)·V_SBX²/(V_SBX² + (λ2·V_GT1 + V_SB)²)]     (A.51)

    m = 1 + m0·(u_s0/u_s1)     (A.54)
θ1, θ2, θ3 are the coefficients of mobility reduction due to the gate-induced field, the back-bias and the lateral field, respectively, φ_T is the thermal voltage at the actual temperature, ζ1 the weak-inversion correction factor, λ1 and λ2 are model constants and V_P is the characteristic voltage of channel-length modulation. The parameter m0 characterizes the subthreshold slope for V_BS = 0. The gain factor is defined as

    β = β_SQT·(W_e/L_e)·F_old·(1 + Δ_SSTI)·(1 + [(A_β/√2)/√(W_e·L_e) + B_β/√2]·F_S/F_C)     (A.55)

where β_SQT is the gain-factor temperature dependence, Δ_SSTI the STI stress, F_S the switching-mechanism factor, F_C the correction factor for multiple transistors in parallel and units, A_β the area scaling factor and B_β a constant. The gain-factor temperature dependence is defined as

    β_SQT = β_SQ·(T0 + TR)/(T0 + TA + ΔTA)     (A.56)

    β_SQ = β_SQR·[(T0 + TR)/(T0 + TA + ΔTA)]^{η_BSQ},  β_SQS = β_SQSR·[(T0 + TR)/(T0 + TA + ΔTA)]^{η_BSQS}     (A.58)
In the triode and saturation regions, the drain current with mobility degradation is

    I_D = β·(V_GS − V_T − ½·V_DS)·V_DS/(1 + θ·(V_GS − V_T))     (A.59)

    I_D = (β/2)·(V_GS − V_T)²/(1 + θ·(V_GS − V_T))     (A.60)

from which the effective gain factor follows as

    β_o = I_D/[(V_GS − V_T − ½·V_DS)·V_DS] = β/(1 + θ·(V_GS − V_T))     (A.63)
    σ_{Δβ/β} = (A_β/√2)/√(W_eff·L_eff) + B_β/√2 + S_β·D     (A.67)

where W_eff is the effective gate width and L_eff the effective gate length, the proportionality constants A_VT, S_VT, A_β and S_β are technology-dependent factors, D is the device separation and B_VT and B_β are constants. For widely spaced devices, the terms S_VT·D and S_β·D are included in the models for the random variations in the two previous equations, but for typical device separations (<1 mm) and typical device sizes this correction is small. Most mismatch characterization has been performed on devices in strong inversion, in the saturation or linear region, but some studies for devices operating in weak inversion have also been conducted. Qualitatively, the behavior in all regions is very similar; ΔV_T and Δβ/β variations are the dominant sources of mismatch, and their matching scales with device area. The effective mobility-degradation mismatch term can be combined with the current-factor mismatch term, as both terms become significant in the same bias range (high gate voltage). The correlation factor ρ(ΔV_T, Δβ/β) can be ignored as well, since the correlation between σ(ΔV_T) and the other mismatch parameters remains low for both small and large devices. The drain-source current error ΔI_D/I_D is important for the voltage-biased pair. For the current-biased pair, the gate-source or input-referred mismatch should be considered, whose expression can be derived similarly to the drain-source current error. The change in gate-source voltage can be calculated from

    ΔV_GS = (∂V_GS/∂V_T)·ΔV_T + (∂V_GS/∂β)·Δβ     (A.68)
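The area-scaling mismatch law above (cf. (A.67) and the Pelgrom model [10]) is straightforward to script; the technology constants used here are assumptions chosen for illustration:

```python
from math import sqrt

def sigma_dvt(a_vt, b_vt, s_vt, w_eff, l_eff, d=0.0):
    # Mismatch law of Eq. (A.67)/(A.142): area term + constant + distance term
    return a_vt / sqrt(2.0) / sqrt(w_eff * l_eff) + b_vt / sqrt(2.0) + s_vt * d

# assumed technology: AVT = 3 mV*um, W and L in um, closely spaced pair
print(sigma_dvt(3e-3, 0.0, 0.0, 1.0, 1.0))  # ~2.1 mV for a 1 um x 1 um pair
print(sigma_dvt(3e-3, 0.0, 0.0, 2.0, 2.0))  # 4x the area: half the sigma
```

Quadrupling the gate area halves the standard deviation, which is the design lever the text describes.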
where A_B [m²] is the diffusion area, V_R [V] the voltage at which the parameters have been determined, V_DB [V] the diffusion voltage of the bottom area A_B, V_DBR [V] the diffusion voltage of the bottom junction at T = T_R and P_B the bottom-junction grading coefficient. Similar formulations hold for the LOCOS-edge and gate-edge components; one replaces the index B by S and G, and the area A_B by L_S and L_G. The capacitance of the bottom component is derived as

    C_JBV = C_JBR/(1 − V/V_DB)^{P_B},  V < V_LB
    C_JBV = C_LB + C_LB·P_B·(V − V_LB)/(V_DB·(1 − F_CB)),  V ≥ V_LB     (A.73)

where

    C_LB = C_JBR·(1 − F_CB)^{−P_B},  F_CB = 1 − (1/(1 + P_B))^{1/P_B},  V_LB = F_CB·V_DB     (A.74)

and V is the diode bias voltage. Similar expressions can be derived for the sidewall component C_JSV and the gate-edge component C_JGV. The total diode depletion capacitance can be described by

    C = C_JBV + C_JSV + C_JGV     (A.75)
Typical CMOS and BiCMOS technologies offer several different resistors, such as diffusion n+/p+ resistors, n+/p+ poly resistors, and n-well resistors. Many factors in the fabrication of a resistor, such as fluctuations of the film thickness, doping concentration and doping profile, and the dimension variation caused by photolithographic inaccuracies and nonuniform etch rates, can produce significant variation in the sheet resistance. This is bearable as long as the device matching properties stay within the range the designs require. The fluctuations of the resistance can be categorized into two groups: area fluctuations, which occur over the whole device and scale with the device area, and peripheral fluctuations, which take place only along the edges of the device and therefore scale with the periphery. For a matched resistor pair with width W and resistance R, the standard deviation of the random mismatch between the resistors is
    σ_{ΔR/R} = (f_a + f_p/W)/√(W·R)     (A.76)
where f_a and f_p are constants describing the contributions of area and periphery fluctuations, respectively. In circuit applications, to achieve the required matching, resistors wider (at least 2-3 times) than the minimum width should be used. Also, resistors with higher resistance (longer length) at fixed width exhibit larger mismatch. To achieve the desired matching, it is common practice to break a long resistor (needed for high resistance) into shorter resistors in series. To model a (polysilicon) resistor, the following equation is used

    R = R_sh·L/(W + ΔW) + R_e/(W + ΔW)     (A.77)
where R_sh is the sheet resistance of the poly resistor, R_e is the end-resistance coefficient, W and L are the resistor width and length, and ΔW is the resistor width offset. The relations between the standard deviations (σ) of the model parameters and the standard deviation of the resistance are

    σ_R² = (∂R/∂R_sh)²·σ²_{Rsh} + (∂R/∂R_e)²·σ²_{Re} + (∂R/∂ΔW)²·σ²_{ΔW}     (A.78)

    σ_R² = L²·σ²_{Rsh}/(W + ΔW)² + σ²_{Re}/(W + ΔW)² + (L·R_sh + R_e)²·σ²_{ΔW}/(W + ΔW)⁴     (A.79)

To define the resistor matching,

    (σ_R/R)² = [L/(L·R_sh + R_e)]²·σ²_{Rsh} + [1/(L·R_sh + R_e)]²·σ²_{Re} + [1/(W + ΔW)]²·σ²_{ΔW}     (A.80)
where

    σ_{Rsh} = A_{Rsh}/√(W·L),  σ_{Re} = A_{Re}/√W,  σ_{ΔW} = A_{ΔW}/√W     (A.81)
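The variance propagation of (A.79) can be checked numerically; the resistor geometry and parameter spreads below are illustrative assumptions:

```python
from math import sqrt

def resistance(l, w, r_sh, r_e, dw=0.0):
    # Eq. (A.77): sheet component plus end resistance
    return r_sh * l / (w + dw) + r_e / (w + dw)

def sigma_r(l, w, r_sh, r_e, s_rsh, s_re, s_dw, dw=0.0):
    # Eq. (A.79): first-order propagation of the Rsh, Re and dW spreads
    we = w + dw
    var = (l / we)**2 * s_rsh**2 + s_re**2 / we**2 \
        + ((l * r_sh + r_e) / we**2)**2 * s_dw**2
    return sqrt(var)

# assumed 10 squares of 200 ohm/sq poly, 1 um wide, 20 ohm end resistance
print(resistance(10.0, 1.0, 200.0, 20.0))            # nominal value
print(sigma_r(10.0, 1.0, 200.0, 20.0, 2.0, 1.0, 0.01))  # its standard deviation
```

With these numbers the width-offset spread contributes about as much variance as the sheet-resistance spread, illustrating why wide resistors match better.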
where f_a and f_p are factors describing the influence of area and periphery fluctuations, respectively. The contribution of the periphery components decreases as the area (capacitance) increases. For very large capacitors, the area components dominate and the random mismatch becomes inversely proportional to √C. A simple capacitor mismatch model is given by

    σ²_{ΔC/C} = σ_p² + σ_a² + σ_d²,  σ_p = f_p/C^{3/4},  σ_a = f_a/C^{1/2},  σ_d = f_d·d     (A.83)

where f_p, f_a, and f_d are constants describing the influence of periphery, area, and distance fluctuations. The periphery component models the effect of edge roughness and is most significant for small capacitors, which have a relatively large amount of edge capacitance. The area component models the effect of short-range dielectric thickness variations and is most significant for moderate-size capacitors. The distance component models the effect of global dielectric thickness variations across the wafer and becomes significant for large or widely spaced capacitors.
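The three components of (A.83) add in quadrature; a short sketch, with the constants f_p, f_a, f_d assumed for illustration only:

```python
def sigma_dc(c, f_p, f_a, f_d, dist):
    # Eq. (A.83): periphery, area and distance components in quadrature
    s_p = f_p / c**0.75   # edge roughness: dominates for small C
    s_a = f_a / c**0.5    # short-range dielectric thickness variation
    s_d = f_d * dist      # global dielectric gradient across the die
    return (s_p**2 + s_a**2 + s_d**2)**0.5

# assumed constants; C in fF, distance in um
for c in (10.0, 100.0, 1000.0):
    print(c, sigma_dc(c, 0.02, 0.05, 1e-6, 100.0))  # mismatch falls as C grows
```

Sweeping C shows the crossover the text describes: the periphery term fades fastest, then the area term, until the distance term sets the floor.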
Modern analog circuit simulators use a modified form of nodal analysis [11, 12] and Newton-Raphson iteration to solve a system of n nonlinear equations f_i in n variables p_i. In general, the time-dependent behavior of a circuit containing linear or nonlinear elements may be described as [13]

    dq/dt − E·ζ = 0,  q₀ = q(0)
    f(q, ζ, w, p, t) = 0     (A.84)

This notation assumes that the terminal equations for capacitors and inductors are defined in terms of charges and fluxes, collected in q. The elements of the matrix E are either 1 or 0, and ζ represents the circuit variables (nodal voltages or branch currents). All nonlinearities are incorporated in the algebraic system f(q, ζ, w, p, t) = 0, so the differential equations dq/dt − E·ζ = 0 are linear. The initial conditions are represented by q₀. Furthermore, w is a vector of excitations, and p contains the circuit parameters, such as parameters of linear or nonlinear components. An element of p may also be a (nonlinear) function of the circuit parameters. It is assumed that for each p there is only one solution ζ. The dc solution is computed by solving the system

    E·ζ₀ = 0
    f(q₀, ζ₀, w₀, p_i, 0) = 0     (A.85)

which is derived by setting dq/dt = 0. The solution (q₀, ζ₀) is found by Newton-Raphson iteration. In general, this technique finds the solution of a nonlinear system F(ζ) = 0 by iteratively solving the Newton-Raphson equation

    J^k·Δζ^k = −f(ζ^k)     (A.86)

where J^k is the Jacobian of f, with J^k_ij = ∂f_i/∂ζ_j. Iteration starts from an initial estimate ζ⁰ and proceeds until the update Δζ^k is sufficiently small.
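The iteration of (A.86) can be sketched in a few lines; the two-equation nonlinear system solved below is a toy assumption for illustration, not a circuit from the text:

```python
def newton_raphson(f, jacobian, x0, tol=1e-12, max_iter=50):
    # Solve F(x) = 0 by iterating J(x_k) dx = -F(x_k), as in Eq. (A.86)
    x = list(x0)
    for _ in range(max_iter):
        fx = f(x)
        if max(abs(v) for v in fx) < tol:
            break
        dx = solve(jacobian(x), [-v for v in fx])
        x = [xi + di for xi, di in zip(x, dx)]
    return x

def solve(a, b):
    # Gaussian elimination with partial pivoting for small dense systems
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

# toy nonlinear system: x0^2 + x1 - 3 = 0 and x0 - x1 = 0
sol = newton_raphson(
    lambda x: [x[0]**2 + x[1] - 3.0, x[0] - x[1]],
    lambda x: [[2.0 * x[0], 1.0], [1.0, -1.0]],
    [1.0, 1.0])
print(sol)
```

The quadratic convergence typical of Newton-Raphson means only a handful of iterations are needed from a reasonable starting estimate.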
At each time point, the circuit derivatives are obtained by solving the previous system of equations after the original system is solved. Suppose, for example, that a kth-order backward differentiation formula (BDF) is used [15, 16], with the corrector

    (dq/dt)_{n+k} = (1/Δt)·Σ_{i=0}^{k−1} a_i·q_{n+k−i}     (A.90)

where the coefficients a_i depend on the order k of the BDF formula. After substituting (A.90) into (A.84), the Newton-Raphson equation is derived as

    [ (a₀/Δt)·I  −E ; ∂f/∂q  ∂f/∂ζ ] · [ Δq ; Δζ ]_{n+k} = −[ (1/Δt)·Σ_{i=0}^{k−1} a_i·q_{n+k−i} − E·ζ_{n+k} ; f(q_{n+k}, ζ_{n+k}, w_{n+k}, p_j, t_{n+k}) ]     (A.91)

Iterating this system yields the solution (q_{n+k}, ζ_{n+k}). Substituting a kth-order BDF formula into (A.89) gives the linear system

    [ (a₀/Δt)·I  −E ; ∂f/∂q  ∂f/∂ζ ] · [ ∂q/∂p_j ; ∂ζ/∂p_j ]_{n+k} = −[ (1/Δt)·Σ_{i=1}^{k−1} a_i·(∂q/∂p_j)_{n+k−i} ; ∂f/∂p_j ]     (A.92)

Thus (A.91) and (A.92) have the same system matrix. The LU factorization of this matrix is available after (A.91) has been solved iteratively; a forward and backward substitution then solves (A.92). For each parameter the right-hand side of (A.92) is
different, and the forward and backward substitution must be repeated. If a random term η(p, t), which models the tolerance effects, is nonzero and added to equation (A.84) [17-21],

    f(q, ζ, w, p, t) + η(p, t) = 0     (A.93)

solving this system means determining the probability density function of the random vector p(t) at each time instant t. For two instants in time, t₁ and t₂, with Δt₁ = t₁ − t₀ and Δt₂ = t₂ − t₀, where t₀ is a time that coincides with the dc solution of the circuit performance function φ, Δt is assumed to satisfy the criterion that the circuit performance function can be designated as quasi-static. To make the problem manageable, the function can be linearized by a first-order Taylor approximation, assuming that the magnitude of the random term η_p is small enough to consider the equation linear over the range of variability of p, or that the nonlinearities are so smooth that they may be considered linear even for a wide range of p.
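The corrector (A.90) with k = 1 is the backward-Euler formula; as a minimal illustration of the transient loop it implies, the following solves an RC low-pass step response (the network values are assumptions for illustration):

```python
def backward_euler_rc(r, c, v_in, v0, dt, steps):
    # Backward Euler (BDF of order 1) on C*dv/dt = (v_in - v)/R,
    # solved implicitly for v_{n+1} at each time step
    v = v0
    out = [v]
    a = dt / (r * c)
    for _ in range(steps):
        v = (v + a * v_in) / (1.0 + a)
        out.append(v)
    return out

# assumed 1 kOhm / 1 uF network: step response approaches v_in = 1 V
wave = backward_euler_rc(1e3, 1e-6, 1.0, 0.0, 1e-4, 100)
print(wave[-1])
```

Because the update is implicit, the method stays stable even for time steps much larger than the RC time constant, which is why stiff circuit equations are integrated with BDF-type formulas.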
Appendix 149
Once the nominal parameter vector p₀ is found for the nominal device, the extraction of all device parameters p^k of the transistors connected to a particular node n can be performed using a linear approximation of the model. Let p = [p₁, p₂, …, p_n]^T ∈ R^n denote the parameter vector, f = [f₁, f₂, …, f_m]^T ∈ R^m the performance vector, z^k = [z₁^k, z₂^k, …, z_m^k]^T ∈ R^m the measured performance vector of the kth device, and w = [w₁, w₂, …, w_l]^T ∈ R^l a vector of excitations. Considering Eq. (A.84)

    dq/dt − E·ζ = 0,  q₀ = q(0)
    f(q, ζ, w, p, t) = 0     (A.94)

a general model can be written. The measurements can only be made under certain selected values of w, and if the initial conditions q₀ are met, the model can simply be denoted as

    f(p) = 0     (A.95)

To extract a parameter vector p^k corresponding to the kth device,

    p^k = arg min_{p^k ∈ R^n} ‖f(p^k) − z^k‖     (A.96)

is found. The weighted sum of squared errors for the kth device is formed as [13]

    ε(p^k) = ½·Σ_{i=1}^m w_i·[f_i(p^k) − z_i^k]² = ½·[f(p^k) − z^k]^T·W·[f(p^k) − z^k]     (A.97)
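For a single scalar parameter, the weighted cost of (A.97) can be minimized with any one-dimensional search; the sketch below uses a golden-section search, and the linear current-vs-gain "model" and measured values are assumptions chosen for illustration:

```python
def extract_parameter(f, zs, ws, p_lo, p_hi, iters=60):
    # Minimise the weighted sum of squared errors of Eq. (A.97)
    # over one parameter with a golden-section search
    gr = (5.0**0.5 - 1.0) / 2.0
    cost = lambda p: 0.5 * sum(w * (fi - z)**2
                               for w, fi, z in zip(ws, f(p), zs))
    a, b = p_lo, p_hi
    for _ in range(iters):
        c = b - gr * (b - a)
        d = a + gr * (b - a)
        if cost(c) < cost(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0

# toy model: two drain currents scale linearly with the gain factor beta
model = lambda beta: [beta * 0.04, beta * 0.09]
measured = [4.1e-4, 8.8e-4]   # hypothetical measurements of device k
beta_k = extract_parameter(model, measured, [1.0, 1.0], 0.0, 1.0)
print(beta_k)
```

For the full multi-parameter problem, the same cost is minimized with Gauss-Newton steps using the Hessian approximation described next.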
So, for the measured performance vector z^k of the kth device, an approximate estimate of the model parameter vector of the kth device is obtained from the resulting normal equations, where H is the Hessian matrix [22], whose elements are the second-order derivatives of ε(p^k). Defining

    Δr = Ψ_rr·Δp_r + ξ,  ξ = [ξ₁ … ξ_k]^T     (A.105)

the deviations Δp_r follow from minimizing

    ‖Δr − Ψ_rr·Δp_r‖²     (A.106)
A parameter is testable if the variance of its estimated deviation is below a certain limit. The off-diagonal elements of C_pr contain the parameter covariances.
If an accuracy check shows that the performance-function extraction is not accurate enough, a performance-function correction is performed to refine the extraction. The basic idea underlying performance-function correction is to correct the errors of the extraction based on the given model and the knowledge obtained from the previous stages by an iteration process. Denoting by

    φ^k_{(i)}(p) = φ₀ + Δφ^k_{(i)}     (A.109)

the extracted performance-function vector of the kth device at the ith iteration, the performance-function correction is found by determining the transformation φ^k_{(i+1)} = F_i(φ^k_{(i)}) such that more accurate performance-function vectors can be extracted, subject to

    ‖φ^k_{(i+1)} − φ^k(ζ)‖ < ‖φ^k_{(i)} − φ^k(ζ)‖     (A.110)

where

    φ^k(ζ) = arg min_{φ^k ∈ R^n} ε^k(φ^k)     (A.111)

is the ideal solution of the performance function. The error-correction mapping F_i is selected in the form

    φ^k_{(i+1)}(p) = φ^k_{(i)}(p) + d_i(φ^k_{(i)})     (A.112)

where

    {d_i^k, φ^k_{(i)}, k = 1, 2, …, K}     (A.113)

gives the information relating the errors due to inaccurate parameter extraction to the extracted parameter values. A quadratic function is postulated to approximate the error-correction function

    d_t = Σ_{j=1}^n α_{tj}·Δp_j + Σ_{j=1}^n Σ_{l=1}^n β_{tjl}·Δp_j·Δp_l,  t = 1, 2, …, n     (A.114)

    φ^k_{(i+1)} = φ^k_{(i)}(p) + d_i(φ^k_{(i)})     (A.116)
    n = (2·z_{1−α/2}·σ/Δφ̄)²     (A.121)

If, for example, a mean value has to be estimated with a relative error Δφ̄/σ = 0.1 and a confidence level of γ = 0.99 (z_{1−α/2} ≈ 2.5), the sample size is n = 2500. Similarly, for the estimate of the variance

    S² = (1/(n − 1))·Σ_{i=1}^n (φ_i − φ̄)²     (A.122)

in order to ensure that the estimate S² falls with probability γ into the interval

    S² − ΔS² ≤ σ² ≤ S² + ΔS²     (A.124)

For example, the required number of samples for an accuracy of 0.1 and a confidence level of 0.99 is n = 1250.
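Taking (A.121) in the form that reproduces the worked example (n = 2500 for z ≈ 2.5 and an error of 0.1), the sample-size bookkeeping is trivial to script; the 95 % case is an added assumption for comparison:

```python
def samples_for_mean(z_quantile, rel_error):
    # Eq. (A.121) as read here: n = (2*z/eps)^2, rounded to an integer
    return int(round((2.0 * z_quantile / rel_error)**2))

print(samples_for_mean(2.5, 0.1))   # the 99 % confidence example: 2500
print(samples_for_mean(1.96, 0.1))  # 95 % confidence needs fewer samples
```

The quadratic dependence on 1/ε is what makes high-accuracy Monte Carlo runs expensive: halving the tolerated error quadruples the sample count.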
    ∂φ(V_Ti, jω)/∂V_Ti = −d^T·Ψ^{−1}(V_Ti, jω)·[∂Ψ(V_Ti, jω)/∂V_Ti]·X(V_Ti, jω)     (A.128)

    ∂φ(β_i, jω)/∂β_i = −d^T·Ψ^{−1}(β_i, jω)·[∂Ψ(β_i, jω)/∂β_i]·X(β_i, jω)     (A.129)

The first-order derivatives of the magnitude of the circuit performance function are computed from

    ∂|φ(jω)|/∂V_Ti = (1/|φ(V_Ti, jω)|)·Re{φ*(V_Ti, jω)·∂φ(V_Ti, jω)/∂V_Ti}     (A.130)

    ∂|φ(jω)|/∂β_i = (1/|φ(β_i, jω)|)·Re{φ*(β_i, jω)·∂φ(β_i, jω)/∂β_i}     (A.131)

where Re denotes the real part of a complex variable and * the complex conjugate. The second-order derivatives are calculated from

    ∂²|φ(jω)|/∂V_Ti² = (1/|φ|)·[Re{φ*·∂²φ/∂V_Ti²} + |∂φ/∂V_Ti|²] − (1/|φ|)·(∂|φ|/∂V_Ti)²     (A.132)

    ∂²|φ(jω)|/∂β_i² = (1/|φ|)·[Re{φ*·∂²φ/∂β_i²} + |∂φ/∂β_i|²] − (1/|φ|)·(∂|φ|/∂β_i)²     (A.133)

The circuit performance function φ(jω) can be approximated by the truncated Taylor expansion

    φ(jω) ≈ φ₀(jω) + J_φ(jω)·Δp     (A.134)
where the covariance matrix of the circuit performance function C_φ(jω) is obtained by first-order propagation as

    C_φ(jω) = J_φ(jω)·C_p·J_φ^T(jω)     (A.136)

where
    C_{p1p1,ij} = (1/((W_i·L_i)·(W_j·L_j)))·∫_{x_i}^{x_i+L_i}∫_{x_j}^{x_j+L_j}∫_{y_i}^{y_i+W_i}∫_{y_j}^{y_j+W_j} R_{p1p1}(x_A, y_A, x_B, y_B)·σ_{p1}(x_A, y_A)·σ_{p1}(x_B, y_B) dx_A dx_B dy_A dy_B     (A.139)

    C_{p1p2,ij} = (1/((W_i·L_i)·(W_j·L_j)))·∫_{x_i}^{x_i+L_i}∫_{x_j}^{x_j+L_j}∫_{y_i}^{y_i+W_i}∫_{y_j}^{y_j+W_j} R_{p1p2}(x_A, y_A, x_B, y_B)·σ_{p1}(x_A, y_A)·σ_{p2}(x_B, y_B) dx_A dx_B dy_A dy_B     (A.140)

and R_{p1p1}(x_A, y_A, x_B, y_B), the autocorrelation function of the stochastic process p₁, is defined as the joint moment of the random variables p₁(x_A, y_A) and p₁(x_B, y_B), i.e., R_{p1p1}(x_A, y_A, x_B, y_B) = E{p₁(x_A, y_A)·p₁(x_B, y_B)}, which is a function of (x_A, y_A) and (x_B, y_B); R_{p1p2}(x_A, y_A, x_B, y_B) = E{p₁(x_A, y_A)·p₂(x_B, y_B)} is the cross-correlation function of the stochastic processes p₁ and p₂. The experimental data show that threshold voltage differences ΔV_T and current factor differences Δβ/β are the dominant sources underlying the drain-source current or gate-source voltage mismatch of a matched pair of MOS transistors.
The covariance σ_{pi pj} = 0 for i ≠ j if p_i and p_j are uncorrelated. Thus the covariance matrix C_P of p₁, …, p_k with means μ_{pi} and variances σ²_{pi} is

    C_{p1,…,pk} = diag(σ²_{p1}, …, σ²_{pk})     (A.141)

In [10] these random differences for a single transistor, having a normal distribution with zero mean and a variance dependent on the device area W·L, are derived as

    C_{p1p1,ij} = σ²_{ΔVT} for i = j, with σ_{ΔVT} = (A_VT/√2)/√(W_eff·L_eff) + B_VT/√2 + S_VT·D;  C_{p1p1,ij} = 0 for i ≠ j     (A.142)

    C_{p2p2,ij} = σ²_{Δβ/β} for i = j, with σ_{Δβ/β} = (A_β/√2)/√(W_eff·L_eff) + B_β/√2 + S_β·D;  C_{p2p2,ij} = 0 for i ≠ j     (A.143)

where W_eff is the effective gate width and L_eff the effective gate length, the proportionality constants A_VT, S_VT, A_β, and S_β are technology-dependent factors, D is the device separation, and B_VT and B_β are constants.
    σ²_φ = Σ_{i=1}^n [ (∂φ(V_Ti, jω)/∂V_Ti)²·σ²_{VTi} + (∂φ(β_i, jω)/∂β_i)²·σ²_{βi} ]     (A.145)

where n is the total number of transistors in the circuit and μ_φ is the mean of φ = f(V_T(jω), β(jω)) over the local or global parametric variations, and
    P(G) = P(φ_n ∈ Θ_G | G) = ∫_{Θ_G} f_{φn}(φ_n | G) dφ_n = 1 − α     (A.149)

    P(F) = P(φ_n ∈ Θ_F | F) = ∫_{Θ_F} f_{φn}(φ_n | F) dφ_n = 1 − β     (A.150)
Recall that if φ ~ N(μ, σ²), then Z = (φ − μ)/σ ~ N(0, 1). In the present case, the sample mean φ̄ ~ N(μ, σ²/n), since the variable φ is assumed to have a normal distribution. Since α and β represent probabilities of events from the same decision problem, they are not independent of each other or of the sample size. Evidently, it would be desirable to have a decision process such that both α and β are small. However, in general, a decrease in one type of error leads to an increase in the other for a fixed sample size. The only way to reduce both types of errors simultaneously is to increase the sample size, which is a time-consuming process. The Neyman-Pearson test is a special case of the Bayes test; it provides a workable solution when the a priori probabilities are unknown or when the Bayes average costs of making a decision are difficult to evaluate or set objectively. The Neyman-Pearson test is based on the critical region C* ⊂ Ω, where Ω is the sample space of the test statistic,

    C* = {(φ₁, …, φ_n) : l(φ₁, …, φ_n | G, F) ≥ λ}     (A.151)

which has the largest power (smallest probability that a faulty circuit is accepted when it is faulty) of all tests with significance level α. Introducing the Lagrange multiplier λ to account for the constraint gives the following cost function J, which must be maximized with respect to the test and λ,

    J = 1 − β + λ·(α₀ − α) = λ·α₀ + ∫_{Θ_G} [f_{φn}(φ_n | F) − λ·f_{φn}(φ_n | G)] dφ_n     (A.152)
Now,

    Σ_{i=1}^n (φ_i − μ_F)² − Σ_{i=1}^n (φ_i − μ_G)² = n·(μ_F² − μ_G²) − 2·n·φ̄·(μ_F − μ_G)     (A.156)

Using the Neyman-Pearson lemma, the critical region of the most powerful test of significance level α is

    C* = {φ₁, …, φ_n : exp(−(1/(2σ²))·[n·(μ_F² − μ_G²) − 2·n·φ̄·(μ_F − μ_G)]) ≥ λ}
       = {φ₁, …, φ_n : φ̄ ≥ (σ²/(n·(μ_F − μ_G)))·log λ + (μ_F + μ_G)/2}
       = {φ₁, …, φ_n : φ̄ ≥ λ′}     (A.157)
For the test to be of significance level α,

    P(φ̄ ≥ λ′ | φ̄ ~ N(μ_G, σ²/n)) = P(Z ≥ (λ′ − μ_G)/(σ/√n)) = α  ⟹  λ′ = μ_G + z_{(1−α)}·σ/√n     (A.158)

where P(Z < z_{(1−α)}) = 1 − α, which can also be written as Φ^{−1}(1 − α); z_{(1−α)} is the (1 − α)-quantile of Z, the standard normal distribution. This boundary of the critical region guarantees, by the Neyman-Pearson lemma, the smallest value of β obtainable for the given values of α and n. From the two previous equations, we can see that the test T rejects for

    T = (φ̄ − μ_G)/(σ/√n) ≥ z_{(1−α)}     (A.159)
Similarly, to construct a test for the two-sided alternative, one approach is to combine the critical regions for testing the two one-sided alternatives. The two one-sided tests form a critical region of

    C* = {(φ₁, …, φ_n) : φ̄ ≤ λ₂′, φ̄ ≥ λ₁′}     (A.160)

    λ₁′ = μ_G + z_{(1−α/2)}·σ/√n,  λ₂′ = μ_G − z_{(1−α/2)}·σ/√n     (A.161)

Thus, the test T rejects for

    T = (φ̄ − μ_G)/(σ/√n) ≥ z_{(1−α/2)}  or  T = (φ̄ − μ_G)/(σ/√n) ≤ −z_{(1−α/2)}     (A.162)
    T = (φ̄ − μ_G)/(S/√n) ≥ t_{n−1,α}     (A.165)

A critical region for the two-sided alternative when the variance σ² is unknown has the form

    C* = {(φ₁, …, φ_n) : t = (φ̄ − μ_G)/(S/√n) ≤ λ₂′, t ≥ λ₁′}     (A.166)

    T = (φ̄ − μ_G)/(S/√n) ≥ t_{n−1,α/2}  or  T = (φ̄ − μ_G)/(S/√n) ≤ −t_{n−1,α/2}     (A.168)
References
1. R.P. Jindal, Compact noise models for MOSFETs. IEEE Trans. Electron Devices 53(9), 2051-2061 (2006)
2. J. Ou, gm/ID based noise analysis for CMOS analog circuits, in Proceedings of IEEE International Midwest Symposium on Circuits and Systems, pp. 1-4, 2011
3. W. Wattanapanitch, M. Fee, R. Sarpeshkar, An energy-efficient micropower neural recording amplifier. IEEE Trans. Biomed. Circuits Syst. 1(2), 136-147 (2007)
4. M. Zamani, A. Demosthenous, Power optimization of neural frontend interfaces, in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 3008-3011, 2015
5. C.C. Liu et al., A 10-bit 50-MS/s SAR ADC with a monotonic capacitor switching procedure. IEEE J. Solid-State Circuits 45(4), 731-740 (2010)
6. D. Zhang, C. Svensson, A. Alvandpour, Power consumption bounds for SAR ADCs, in Proceedings of IEEE European Conference on Circuit Theory and Design, pp. 556-559, 2011
7. T. Yu, S. Kang, I. Hajj, T. Trick, Statistical modeling of VLSI circuit performances, in Proceedings of IEEE International Conference on Computer-Aided Design, pp. 224-227, 1986
8. K. Krishna, S. Director, The linearized performance penalty (LPP) method for optimization of parametric yield and its reliability. IEEE Trans. CAD Integr. Circuits Syst. 1557-1568 (1995)
9. MOS Model 9, available at http://www.nxp.com/models/mosmodels/model9.html
10. M. Pelgrom, A. Duinmaijer, A. Welbers, Matching properties of MOS transistors. IEEE J. Solid-State Circuits 24(5), 1433-1439 (1989)
11. V. Litovski, M. Zwolinski, VLSI Circuit Simulation and Optimization (Kluwer Academic Publishers, Dordrecht, 1997)
12. K. Kundert, Designer's Guide to Spice and Spectre (Kluwer Academic Publishers, Dordrecht, 1995)
13. J. Vlach, K. Singhal, Computer Methods for Circuit Analysis and Design (Van Nostrand Reinhold, New York, 1983)
14. N. Higham, Accuracy and Stability of Numerical Algorithms (SIAM, Philadelphia, 1996)
15. W.J. McCalla, Fundamentals of Computer-Aided Circuit Simulation (Kluwer Academic Publishers, Dordrecht, 1988)
16. F. Scheid, Schaum's Outline of Numerical Analysis (McGraw-Hill, New York, 1989)
17. E. Cheney, Introduction to Approximation Theory (American Mathematical Society, Providence, 2000)
18. S. Director, R. Rohrer, The generalized adjoint network and network sensitivities. IEEE Trans. Comput. Aided Des. 16(2), 318-323 (1969)
19. D. Hocevar, P. Yang, T. Trick, B. Epler, Transient sensitivity computation for MOSFET circuits. IEEE Trans. Comput. Aided Des. CAD-4, 609-620 (1985)
20. Y. Elcherif, P. Lin, Transient analysis and sensitivity computation in piecewise-linear circuits. IEEE Trans. Circuits Syst. I 38, 1525-1533 (1991)
21. T. Nguyen, P. O'Brien, D. Winston, Transient sensitivity computation for transistor level analysis and tuning, in Proceedings of IEEE International Conference on Computer-Aided Design, pp. 120-123, 1999
22. K. Abadir, J. Magnus, Matrix Algebra (Cambridge University Press, Cambridge, 2005)
23. A. Papoulis, Probability, Random Variables, and Stochastic Processes (McGraw-Hill, New York, 1991)
24. C. Gerald, Applied Numerical Analysis (Addison Wesley, Reading, 2003)
Index

K
Karhunen-Loeve expansion, 99, 100
Karush-Kuhn-Tucker conditions, 82, 85
Kernel, 13, 77, 78, 80, 82-84, 86, 87, 89, 91, 113, 125, 127
Kesler's construction, 13, 77, 78, 83, 125
K-means, 78
Kronecker delta, 84

L
Least significant bit, 39
Local field potentials, 19, 33
Low-noise amplifier, 13, 18, 19, 124
Lyapunov equations, 109, 110

P
Parameter space, 81, 103, 111
Parameter vector, 147
Parametric yield, 102
Parametric yield optimization, 102
Pedestal voltage, 42
Phase margin, 21
Pipeline converters, 37, 38
Power per area, 14, 96, 97, 112, 117-119, 126
Principal component analysis, 78
Probability density function, 105, 111
Process variation, 11, 14, 96, 104, 110, 116, 119, 126
Programmable gain amplifier, 19, 52
Push-pull current mirror amplifier, 24

R
Random error, 11
Random gate length variability, 8, 116
Random intra-chip variability, 116
Random process, 11, 97-100
Random variability, 97
Random variables, 98, 100
Random vector, 105
Reliability, 11, 43, 111, 118
Residuals, 148
Runtime, 114

S
Sample and hold, 39, 58, 59, 66
Schur decomposition, 109
Sensors, 18, 124, 126, 127
Short-channel effects, 43, 96
Signal-to-noise and distortion ratio, 65
Signal-to-noise ratio, 3, 17, 33, 43, 119, 133
Significance level, 9
Slew rate, 8, 24, 45, 46
Spatial correlation, 100
Spike classifier, 13, 78, 79, 81, 91, 125
Spurious-free dynamic range, 65
Standard deviation, 86, 98
Static latch, 47-49
Stationary random process, 98
Stochastic differential equations, 103, 105-108
Stochastic process, 98, 101, 102, 108
Subrange, 35
Substrate coupling, 26
Successive approximation register, 38, 39
Support vector machine, 13, 78, 79, 81, 83, 91, 125
Surface potential-based models, 98
Switched capacitor, 40, 41, 44, 118
System on chip, 1, 3, 9, 11, 12, 44, 96, 124

T
Telescopic cascode amplifier, 22-24, 26
Template matching, 78, 88
Threshold voltage, 42-44, 51, 80, 98, 100, 101
Threshold voltage-based models, 98
Time-interleaved systems, 38
Tolerance, 42, 102, 103, 117, 119
Total harmonic distortion, 13, 18, 29, 65, 124
Transconductor, 13, 18, 21, 30, 124
Transient analysis, 106, 107
Two-stage amplifier, 25-27, 46
Two-step converter, 36-38

U
Unbiased estimator, 157
Utah array, 2

V
Variable gain amplifier, 25
Vernier, 60, 62-64, 69
Very large-scale integrated circuit, 3
Voltage-to-time converter, 34, 61, 62, 67
Voltage variability, 2, 116, 126

W
Wafer, 98
Wide-sense stationary, 98
Wiener process, 108
Within-die, 103, 107
Worst-case design, 115, 119

Y
Yield, 3, 11, 12, 14, 84, 89, 96, 97, 103, 106, 110, 111, 116-120, 124, 126