Statistical Performance
Analysis and Modeling
Techniques for Nanometer
VLSI Designs
Ruijing Shen
Department of Electrical Engineering
University of California
Riverside, USA

Sheldon X.-D. Tan
Department of Electrical Engineering
University of California
Riverside, USA

Hao Yu
Department of Electrical and Electronic Engineering
Nanyang Technological University
50 Nanyang Avenue, Singapore
As VLSI technology scales into the nanometer regime, chip design engineering faces several challenges. One profound change in the chip design business is that engineers cannot realize the design precisely in the silicon chips. Chip performance, manufacture yield, and lifetime thereby cannot be determined accurately at the design stage. The main culprit is that many chip parameters, such as oxide thickness due to chemical and mechanical polish (CMP) and impurity density from doping fluctuations, cannot be determined or estimated precisely and thus become unpredictable at the device, circuit, and system levels. The so-called manufacturing process variations start to play an essential role, and their influence on performance, yield, and reliability becomes significant. As a result, variation-aware design methodologies and computer-aided design (CAD) tools are widely believed to be the key to mitigating the unpredictability challenges for 45 nm technologies and beyond. Variational characterization, modeling, and optimization hence have to be incorporated into each step of the design and verification processes to ensure reliable chips and profitable manufacture yields.
The book is divided into five parts. Part I introduces basic concepts and mathematical notation relevant to statistical analysis. Many established algorithms and theories, such as the Monte Carlo method, the spectral stochastic method, and the principal factor analysis method and its variants, are also introduced. Part II focuses on techniques for statistical full-chip power consumption analysis considering process variations. Chapter 3 reviews existing statistical leakage analysis methods, as leakage powers are more susceptible to process variations. Chapter 4 presents a gate-level leakage analysis method considering both inter-die and intra-die variations with spatial correlations using the spectral stochastic method. Chapter 5 solves similar problems to those in the previous chapter, but a more efficient, linear-time algorithm is presented based on a virtual grid modeling of process variations with spatial correlations. In Chap. 6, a statistical dynamic power analysis technique using the combined virtual grid and orthogonal polynomial methods is presented. In Chap. 7, a statistical total chip power estimation method is presented: a collocation-based spectral stochastic method is applied to obtain the variational total chip powers based on accurate SPICE simulation.
The contents of the book mainly come from the research works done in the Mixed-Signal Nanometer VLSI Research Lab (MSLAB) at the University of California at Riverside over the past several years. Some of the presented methods also come from the research of Dr. Hao Yu's research group at Nanyang Technological University, Singapore.
It is a pleasure to record our gratitude to the many Ph.D. students who have contributed to this book. They include Dr. Duo Li, Dr. Ning Mi, Dr. Zhigang Hao, and Mr. Fang Gong (UCLA) for some of their research works presented in this book. Special thanks are also given to Dr. Hai Wang, who helped to revise and proofread the final draft of this book.
Sheldon X.-D. Tan is grateful to his collaborator Prof. Yici Cai of Tsinghua University for the collaborative research works, which led to some of the presented works in this book. Sheldon X.-D. Tan is also indebted to Dr. Jinjun Xiong and Dr. Chandu Visweswariah of IBM for their insights into many important problems in industry, which inspired some of the works in this book.
The authors would like to thank both the National Science Foundation and the National Natural Science Foundation of China for their financial support of this book. Sheldon X.-D. Tan highly appreciates the consistent support of Dr. Sankar Basu of the National Science Foundation over the past 7 years. This book project is funded in part by NSF grant No. CCF-0448534; in part by NSF grants No. OISE-0623038, OISE-0929699, OISE-1051787, CCF-1116882, and OISE-1130402; and in part by the National Natural Science Foundation of China (NSFC) grant No. 60828008. We would also like to thank the UC Regent's Committee on Research Fellowship and the Faculty Fellowships from the University of California at Riverside for their support. Dr. Hao Yu would also like to acknowledge the funding support from NRF2010NRF-POC001-001, Tier-1-RG 26/10, and Tier-2-ARC 5/11 in Singapore.
Last but not least, Sheldon X.-D. Tan would like to thank his wife, Yan, and his daughters, Felicia and Leslay, for their understanding and support during the many hours it took to write this book. Ruijing Shen would like to express her deepest gratitude to her adviser, Prof. Sheldon X.-D. Tan, for his help, trust, and guidance.
Part I Fundamentals

1 Introduction  3
  1 Nanometer Chip Design in Uncertain World  3
    1.1 Causes of Variations  4
    1.2 Process Variation Classification and Modeling  6
    1.3 Process Variation Impacts  8
  2 Book Outline  8
    2.1 Statistical Full-Chip Power Analysis  9
    2.2 Variational On-Chip Power Delivery Network Analysis  10
    2.3 Statistical Interconnect Modeling and Extraction  11
    2.4 Statistical Analog and Yield Analysis and Optimization  12
  3 Summary  13
2 Fundamentals of Statistical Analysis  15
  1 Basic Concepts in Probability Theory  15
    1.1 Experiment, Sample Space, and Event  15
    1.2 Random Variable and Expectation  16
    1.3 Variance and Moments of Random Variable  17
    1.4 Distribution Functions  18
    1.5 Gaussian and Log-Normal Distributions  19
    1.6 Basic Concepts for Multiple Random Variables  20
  2 Multiple Random Variables and Variable Reduction  23
    2.1 Components of Covariance in Process Variation  23
    2.2 Random Variable Decoupling and Reduction  25
    2.3 Principal Factor Analysis Technique  26
    2.4 Weighted PFA Technique  26
    2.5 Principal Component Analysis Technique  27
  3 Statistical Analysis Approaches  28
    3.1 Monte Carlo Method  28
References  287
Index  299
List of Figures
Fig. 6.1 The dynamic power versus effective channel length for an AND2 gate in 45 nm technology (70 ps active pulse as partial swing, 130 ps active pulse as full swing). Reprinted with permission from [60] © 2010 IEEE  84
Fig. 6.2 A transition waveform example {E1, E2, ..., Em} for a node. Reprinted with permission from [60] © 2010 IEEE  86
Fig. 6.3 The flow of the presented algorithm  87
Fig. 6.4 The flow of building the sub-LUT  88
Fig. 15.2 Transient nominal x(0)(t) (a) and transient mismatch (α1(t)) (b) for one output of a CMOS comparator by the exact orthogonal PC and the isTPWL. Reprinted with permission from [52] © 2011 ACM  249
Fig. 15.3 Transient waveform comparison at the output of a diode chain: the transient nominal, the transient with mismatch by SiSMA (adding mismatch at ic only), and the transient with mismatch by the presented method (adding mismatch at the transient trajectory). Reprinted with permission from [52] © 2011 ACM  250
Fig. 15.4 Transient mismatch (α1(t), the time-varying standard deviation) comparison at the output of a BJT mixer with distributed substrate: the exact by OPC expansion, the macromodel by TPWL (order 45), and the macromodel by isTPWL (order 45). The waveform by isTPWL is visually identical to the exact OPC. Reprinted with permission from [52] © 2011 ACM  250
Fig. 15.5 (a) Comparison of the ratio of the waveform error by TPWL and by isTPWL under the same reduction order. (b) Comparison of the ratio of the reduction runtime by maniMOR and by isTPWL under the same reduction order. In both cases, isTPWL is used as the baseline. Reprinted with permission from [52] © 2011 ACM  251
List of Tables
Table 9.1 CPU time comparison of StoEKS and HPC with the Monte Carlo method. gi(t) = 0.1udi(t)  141
Table 9.2 Accuracy comparison of different methods, StoEKS, HPC, and MC. gi(t) = 0.1udi(t)  142
Table 9.3 Error comparison of StoEKS and HPC over Monte Carlo methods. gi(t) = 0.1udi(t)  142
As VLSI technology scales into the nanometer regime, chip design engineering faces several challenges in maintaining historical rates of performance improvement and capacity increase with CMOS technologies. One profound change in the chip design business is that engineers cannot put the design precisely into the silicon chips. Chip performance, manufacture yield, and lifetime thereby cannot be determined accurately at the design stage. The main culprit is that many chip parameters, such as oxide thickness due to chemical and mechanical polish (CMP) and impurity density from doping fluctuations, cannot be determined precisely and thus are unpredictable. The so-called manufacturing process variations start to play a big role, and their influence on the chip's performance, yield, and reliability becomes significant [16, 78, 121, 122, 170].
Traditional corner-based analysis and design approaches apply guard bands to account for parameter variations, which may lead to overly conservative designs. Such pessimism can lead to increased design efforts and prolonged time to market. Also, the worst case of a circuit does not always correspond to all parameters being at their worst or best process conditions, and it becomes extremely difficult to find such a worst case by simulating a limited number of corner cases.
As a result, it is imperative to develop new design methodologies that consider the impacts of various process and environmental uncertainties and elevated temperature on chip performance. Variational impacts have to be incorporated into every step of the design process to ensure reliable chips and profitable manufacture yields. The design methodologies and design tools from the system level down to the physical levels have to consider variability impacts on chip performance, which calls for new statistical optimization approaches for designing nanometer VLSI systems.
Performance modeling and analysis of nanometer VLSI systems in the presence of process-induced variation and uncertainty is one crucial problem facing IC chip designers and design tool developers. How to efficiently and accurately assess the impacts of process variations on circuit performances in the various physical design steps is critical for fast design closure, yield improvement, and cost reduction of VLSI design and fabrication processes. The design methodologies and design tools from the system level down to the physical levels have to embrace variability impacts on nanometer VLSI chips, which calls for statistical/stochastic approaches for designing 90 nm and beyond VLSI systems. The advantage and promise of statistical analysis is that the impact of parameter variations on a circuit is obtained simultaneously with less computing effort, and the impacts on yield can be properly understood and used for further optimization.
the threshold voltage shift. N/PBTI will also lead to an increased threshold voltage and decreased drain current and transconductance of devices. Electromigration will result in increased wire resistance and timing degradation of wires and can even lead to failure of the wires in the worst case. Those variations typically happen after chips have been used for a while and were studied in the past more as reliability issues than as variation problems. So in this book, we do not consider such aging- and reliability-related variations.
where δinter represents the inter-die variation. Typically, inter-die variations have simple distributions such as Gaussian. For a single parameter variation, the inter-die variation impact can be captured very easily, as all the devices in a die take the same values. In other words, under inter-die variation, if circuit performance metrics such as power, timing, and noise of all gates or devices are sensitive to the process parameters in a similar way, then the circuit performance can be analyzed at multiple process corners using deterministic analysis methods. However, if a number of inter-die process variations are considered and they are also correlated, the number of corner cases will grow exponentially with the number of process parameters.
Intra-die variations correspond to variability within a single chip. Intra-die variations may affect different devices differently on the same die, i.e., make some devices have smaller gate oxide thicknesses and others have larger
Fig. 1.3 The dishing and oxide erosion after the CMP process
transistor gate oxide thicknesses. In addition, intra-die variations may exhibit spatial
correlation due to proximity effects, i.e., it is more likely for devices located close
to each other to have similar characteristics than those placed far away.
Obviously, intra-die variations will typically involve a large number of variables, as each device may require a variable. As a result, statistical methods must be used, because the corner-based method would be too expensive in this case. Intra-die variation can be further classified into wafer-level variation, layout-dependent variation, and statistical variation [170] based on the sources of the variations. Wafer-level variation comes from lens aberration effects. Layout-dependent variation is caused by lithographic and etching processes such as CMP, optical proximity correction (OPC), and phase-shift masks (PSM). CMP may lead to variations in dimensions called dishing and oxide erosion. Figure 1.3 gives a cartoon illustration of the dishing and oxide erosion after the CMP process.
Optical proximity effects are layout dependent and will lead to different critical dimension (CD) variations depending on the neighboring layout of a pattern. Those layout-dependent variations typically are spatially correlated (they also have purely random components). Statistical variations come from random dopant variations, whose impacts were not significant in the past but become more visible as CD scales down. Those variations are purely random and not spatially correlated. However, their impact on performance tends to be limited due to the averaging effect in general.
In summary, we can model all the components of variation as follows:

p = p0 + δinter + δintra,    (1.2)

where p0 is the nominal parameter value, and δinter and δintra represent the inter-die variation and intra-die variation, respectively. In some works such as [13, 95, 170], δinter and δintra are both modeled as Gaussian random variables. In general, we will consider both the Gaussian and non-Gaussian cases.
For layout-dependent δintra, the value of parameter p located at (x, y) can be modeled as a location-dependent normally distributed random variable [101]:

p = p0 + δx + δy + ε,    (1.3)
where p0 is the mean value (nominal design parameter value) at (0, 0), and δx and δy stand for the gradients of the parameter indicating the spatial variations of p along the x and y directions, respectively. ε represents the random intra-chip variation. Due to spatial correlations in the intra-die variation [195], the vector of all random ε components across the chip has a correlated multivariate normal distribution, N(0, Σ), where Σ is the covariance matrix of the spatially correlated parameters. If the covariance matrix is the identity matrix, then there is no correlation among the variables.
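As a small illustration of this model (a sketch with made-up numbers, not data from the book), samples of the spatially correlated component ε ~ N(0, Σ) can be drawn by factoring Σ with a Cholesky decomposition; here Σ is built with a distance-based exponential correlation, a common assumption:

```python
import numpy as np

# Hypothetical 3-site example: covariance decays with the distance between sites.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]])
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
sigma2 = 0.01                          # assumed variance of the random component
Sigma = sigma2 * np.exp(-dist / 2.0)   # exponential spatial correlation (assumed)

rng = np.random.default_rng(0)
L = np.linalg.cholesky(Sigma)          # Sigma = L L^T
eps = L @ rng.standard_normal((3, 100000))   # eps ~ N(0, Sigma)

# p = p0 + delta_x + delta_y + eps, with made-up nominal value and gradients
p0, dx, dy = 1.0, 0.02, 0.01
p = p0 + dx * coords[:, 0] + dy * coords[:, 1] + eps.T

emp_cov = np.cov(eps)                  # approaches Sigma as the sample count grows
```

Devices that are close (sites 0 and 1) end up with strongly correlated parameter values, while the distant site 2 is nearly independent, mirroring the proximity effect described above.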
2 Book Outline
The book will present the latest developments for modeling and analysis of VLSI
systems in the presence of process variations at the nanometer scale. The authors
make no attempt to be comprehensive on the selected topics. Instead, we want to
Fig. 1.4 The comparison of circuit total power distribution of circuit c432 in the ISCAS'85 benchmark set (top) under random input vectors (with 0.5 input signal and transition probabilities) and (bottom) under a fixed input vector with effective channel length spatial correlations. Reprinted with permission from [62] © 2011 IEEE
provide some promising perspectives from the angle of new analysis algorithms to solve the existing problems with reduced design cycle and cost. We hope this book can help chip designers understand the potential and limitations of the existing design tools when improving their circuit design productivity, CAD developers implement the state-of-the-art techniques in their tools, CAD researchers develop better and new-generation algorithms, and students understand and master the emerging needs in this research area.
The book consists of five parts. Part I starts with a review of many fundamental statistical and stochastic mathematical concepts, presented in Chap. 2. We discuss random processes, correlation matrices, and the Monte Carlo (MC) method. We also review orthogonal polynomial chaos (PC) and the related spectral stochastic method, as well as principal factor analysis (PFA) and its variants for variable reduction.
Part II of this book focuses on the techniques for statistical full-chip power consumption analysis considering process variations. We will look at important aspects of statistical power analysis such as leakage power, dynamic power, and total power estimation techniques in different chapters.
Part III of the book deals with variational analysis of on-chip power grid (distribu-
tion) networks to assess the impacts of process variations on voltage drop noises and
power delivery integrity. This part consists of three chapters: Chaps. 8–10.
Chapter 8 introduces an efficient stochastic method for analyzing the voltage
drop variations of on-chip power grid networks, considering log-normal leakage
current variations with spatial correlation. The new analysis is based on the OPC representation of random processes. This method considers the effects of both wire variations and subthreshold leakage current variations, which are modeled as log-normally distributed random variables, on the power grid voltage variations. To consider the spatial correlation, an orthogonal decomposition is carried out to map the correlated random variables into independent variables.
Chapter 9 presents another stochastic method for solving similar problems to those presented in Chap. 8. The new method, called StoEKS, still applies Hermite orthogonal polynomials to represent the random variables in both power grid networks and input leakage currents. Different from other orthogonal polynomial-based stochastic simulation methods, however, the extended Krylov subspace (EKS) method is employed to compute variational responses from the augmented matrices consisting of the coefficients of Hermite polynomials. The new contribution of this method lies in the acceleration of the spectral stochastic method, using the EKS method to solve the variational circuit equations quickly. By using the reduction technique, the presented method partially mitigates the increased circuit-size problem associated with the augmented matrices from the Galerkin-based spectral stochastic method.
Chapter 10 gives a new approach to variational power grid analysis. The new approach, called ETBR for extended truncated balanced realization, is based on model order reduction techniques that reduce the circuit matrices before the simulation. Different from the (improved) extended Krylov subspace methods EKS/IEKS, ETBR performs fast truncated balanced realization on the response Gramian to reduce the original system. ETBR also avoids the problematic explicit moment representation of the input signals. Instead, it uses a spectrum representation of the input signals in the frequency domain computed by the fast Fourier transformation.
A further approach, called varETBR for variational ETBR, is very efficient and scalable for huge networks with a large number of variational variables. It is based on model order reduction techniques that reduce the circuit matrices before the variational simulation. It performs parameterized reduction on the original system using variation-bearing subspaces. varETBR calculates variational response Gramians by MC-based numerical integration, considering both system and input source variations, to generate the projection subspace. varETBR is very scalable with respect to the number of variables and is flexible for different variational distributions and ranges, as demonstrated in the experimental results. After the reduction, MC-based statistical simulation is performed on the reduced system, and the statistical responses of the original system are obtained thereafter.
In Part V of this book, we discuss the variational analysis of analog and mixed-signal circuits as well as yield analysis and optimization methods based on statistical performance analysis and modeling. We will present the performance bound analysis technique in the s-domain for linearized analog circuits (Chap. 14) and the stochastic mismatch analysis of analog circuits (Chap. 15). Chapter 16 shows a yield analysis and optimization technique, and Chap. 17 presents a voltage binning scheme.
Chapter 14 introduces a performance bound analysis of analog circuits consid-
ering process variations. The presented method applies a graph-based symbolic
analysis and affine interval arithmetic to derive the variational transfer functions of (linearized) analog circuits with variational coefficients in the form of intervals. Then the frequency response bounds (maximum and minimum) are obtained by analyzing a finite number of transfer functions given by the control-theoretic Kharitonov's polynomial functions, which can be computed very efficiently. We also show in this chapter that the response bounds given by Kharitonov's functions are conservative given the correlations among coefficient intervals in transfer functions.
Chapter 15 discusses a fast non-Monte Carlo (NMC) method to calculate the mismatch of analog circuits in the time domain. The local random mismatch is described by a noise source with an explicit dependence on geometric parameters and is further expanded by OPC. The resulting equation forms a stochastic differential algebraic equation (SDAE). To deal with large-scale problems, the SDAE is linearized at a number of snapshots along the nominal transient trajectory and, hence, is naturally embedded into trajectory-piecewise-linear (TPWL) macromodeling. The modeling is further improved with a novel incremental aggregation of subspaces identified at those snapshots.
Chapter 16 introduces a fast NMC method to capture physical-level stochastic variations for system-level yield estimation and optimization. Based on the orthogonal PC expansion concept, an efficient and true NMC mismatch analysis is developed to estimate the parametric yield. Moreover, this work further derives the stochastic sensitivity for yield within the framework of orthogonal polynomials. Using these sensitivities, a corresponding multiobjective optimization is developed to improve the yield rate and other performance merits simultaneously. As a result, the presented approach can automatically tune design parameters for a robust design.
Chapter 17 gives a yield optimization technique using the voltage binning method to improve the yield of chips. The voltage binning technique assigns different supply voltages to different chips in order to improve the yield. The chapter will introduce the valid voltage segment concept, which is determined by the timing and power constraints of chips. Then we show a formulation to predict the maximum number of bins required under the uniform binning scheme from the distribution of the length of the valid supply voltage segment. With this concept, an optimal binning scheme can be modeled as a set-cover problem. A greedy algorithm is developed to solve the resulting set-cover problem in an incremental way. The presented method also extends to ranged supply voltages for dynamic voltage scaling under different operation modes (like low-power and high-performance modes).
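The set-cover view can be sketched as follows: each candidate supply voltage covers the chips whose valid voltage segment contains it, and a greedy pass repeatedly picks the voltage covering the most uncovered chips. This is a toy illustration with hypothetical chips, segments, and levels, not the book's actual algorithm details:

```python
# Each chip has a valid supply-voltage segment [lo, hi] set by timing/power limits
# (all values below are made up for illustration).
chips = {"c1": (0.9, 1.0), "c2": (0.95, 1.1), "c3": (1.05, 1.2), "c4": (0.9, 1.15)}
levels = [0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2]   # candidate bin voltages

# A voltage level "covers" every chip whose valid segment contains it.
covers = {v: {c for c, (lo, hi) in chips.items() if lo <= v <= hi} for v in levels}

def greedy_binning(covers, universe):
    """Greedy set cover: repeatedly pick the level covering the most uncovered chips."""
    uncovered, bins = set(universe), []
    while uncovered:
        best = max(covers, key=lambda v: len(covers[v] & uncovered))
        if not covers[best] & uncovered:
            break          # some chip cannot be covered by any candidate level
        bins.append(best)
        uncovered -= covers[best]
    return bins

bins = greedy_binning(covers, chips)   # e.g., two bins suffice for this toy data
```

The greedy rule gives the classic logarithmic approximation guarantee for set cover, which is why it is a natural fit for an incremental binning heuristic.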
3 Summary
In this chapter, we first describe the motivations for the statistical and variational analysis and modeling of nanometer VLSI systems. We then briefly introduce all the chapters in the book, which are divided into five parts: introduction and fundamentals, statistical full-chip power analysis, variational power delivery network
To make this book self-contained, this chapter reviews relevant mathematical concepts used in this book. We first review basic probability and statistical concepts. Then we introduce mathematical notation for statistical processes with multiple variables, together with variable reduction methods. We will then go through some statistical analysis approaches such as the MC method and the spectral stochastic method. Finally, we will discuss some fast techniques to compute sums of random variables with log-normal distributions.
Usually, we are interested in some value associated with a random event rather than
the event itself. For example, in the experiment of tossing two dice, we only care
about the sum of the two dice, not the outcome of each die.
Definition 2.4. A random variable X on a sample space S is a real-valued function X : S → R.
Definition 2.5. A discrete random variable is a random variable that takes only a
finite or countably infinite number of values (arises from counting).
Definition 2.6. A continuous random variable is a random variable whose set of
assumed values is uncountable (arises from measurement).
Let X be a random variable and let a ∈ R. The event "X = a" represents the set {s ∈ S | X(s) = a}, and the probability of this event is written as

Pr(X = a) = Σ_{s ∈ S : X(s) = a} Pr(s).
Theorem 2.1 (Markov's inequality). For a random variable X that takes on only nonnegative values and for all a > 0, we have

Pr(X ≥ a) ≤ E[X] / a.

Proof. Let X be a random variable such that X ≥ 0 and let a > 0. Define a random variable I by

I = 1 if X ≥ a, and I = 0 otherwise,

where E[I] = Pr(I = 1) = Pr(X ≥ a) and

I ≤ X / a.    (2.1)

Taking expectations of both sides of (2.1) gives the inequality

E[I] = Pr(X ≥ a) ≤ E[X / a] = E[X] / a,

as required. ∎
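A quick Monte Carlo sanity check of the bound (illustrative only; an exponential random variable is used because it is nonnegative, and the sample size is an arbitrary choice):

```python
import random

random.seed(1)
n = 200_000
xs = [random.expovariate(1.0) for _ in range(n)]   # X >= 0 with E[X] = 1

a = 3.0
tail = sum(1 for x in xs if x >= a) / n   # empirical Pr(X >= a)
bound = (sum(xs) / n) / a                 # Markov bound E[X] / a

# For this distribution the true tail is exp(-3), about 0.05,
# comfortably below the distribution-free bound of about 1/3.
```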
Theorem 2.2 (Chebyshev's inequality). For any a > 0 and a random variable X, we have

Pr(|X − E[X]| ≥ a) ≤ Var[X] / a².
Proof. Note that |X − E[X]| ≥ a exactly when (X − E[X])² ≥ a², and the random variable (X − E[X])² ≥ 0. Use Markov's inequality and the definition of variance to obtain

Pr((X − E[X])² ≥ a²) ≤ E[(X − E[X])²] / a² = Var[X] / a²,

as required. ∎
Corollary 2.1. For any t > 1 and a random variable X, we have

Pr(|X − E[X]| ≥ t σ(X)) ≤ 1 / t²

and

Pr(|X − E[X]| ≥ t E[X]) ≤ Var[X] / (t² (E[X])²).

Proof. The results follow from the definitions of variance and standard deviation and Chebyshev's inequality. ∎
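The first form of the corollary can be checked numerically (illustrative only; for a standard normal X, σ(X) = 1, and the distribution-free bound 1/t² is far from tight):

```python
import random

random.seed(2)
n = 200_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]   # E[X] = 0, sigma(X) = 1

t = 2.0
tail = sum(1 for x in xs if abs(x) >= t) / n   # empirical Pr(|X - E[X]| >= t*sigma)
cheby = 1.0 / t**2                             # Chebyshev bound 1/t^2 = 0.25

# For a Gaussian the true two-sided tail at 2 sigma is about 0.0455,
# far below the 0.25 that the distribution-free bound allows.
```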
The CDF of the standard normal distribution is denoted by Φ(x) and can be computed as an integral of the PDF:

Φ(x) = (1 / √(2π)) ∫_{−∞}^{x} e^{−t²/2} dt = (1/2) [1 + erf(x / √2)],  x ∈ R,    (2.6)

where erf is the error function.
Definition 2.13. If X is normally distributed with mean μ and variance σ², then the exponential of X, Y = exp(X), follows a log-normal distribution. That is to say, a log-normal distribution is the probability distribution of a random variable whose logarithm is normally distributed.
The PDF and CDF of a log-normal distribution are as follows:

f(x; μ, σ) = (1 / (x σ √(2π))) e^{−(ln x − μ)² / (2σ²)},  x > 0,    (2.7)

F_X(x; μ, σ) = (1/2) + (1/2) erf((ln x − μ) / (σ √2)) = Φ((ln x − μ) / σ).    (2.8)
More details about the sum of multiple log-normal distributions are given in Sect. 4 of Chap. 2.
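Definition 2.13 and the CDF in (2.8) can be checked directly by exponentiating Gaussian samples (illustrative values of μ and σ; the mean formula E[Y] = exp(μ + σ²/2) is the standard log-normal result, not derived in this chapter):

```python
import math
import random

mu, sigma = 0.5, 0.8
random.seed(3)
n = 200_000
ys = [math.exp(random.gauss(mu, sigma)) for _ in range(n)]   # Y = exp(X) is log-normal

# CDF check at x = 2 against Eq. (2.8): F_Y(x) = Phi((ln x - mu)/sigma)
x = 2.0
emp_cdf = sum(1 for y in ys if y <= x) / n
phi = 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

# Mean check: E[Y] = exp(mu + sigma^2/2) for a log-normal variable
emp_mean = sum(ys) / n
exact_mean = math.exp(mu + sigma**2 / 2.0)
```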
Proof. We use induction on the number of random variables. For the base case, let X and Y be random variables. Use the law of total probability to get

E[X + Y] = Σ_i Σ_j (i + j) Pr((X = i) ∩ (Y = j))
         = Σ_i Σ_j i Pr((X = i) ∩ (Y = j)) + Σ_i Σ_j j Pr((X = i) ∩ (Y = j))
         = Σ_i i Σ_j Pr((X = i) ∩ (Y = j)) + Σ_j j Σ_i Pr((X = i) ∩ (Y = j))
         = Σ_i i Pr(X = i) + Σ_j j Pr(Y = j)
         = E[X] + E[Y]. ∎
Linearity of expectations holds for any collection of random variables, even if they are not independent. Furthermore, if Σ_{i=1}^{∞} E[|Xi|] converges, then it can be shown that

E[ Σ_{i=1}^{∞} Xi ] = Σ_{i=1}^{∞} E[Xi].
Similarly, for any constant c and random variable X, the linearity of expectations gives

E[cX] = c E[X].
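Linearity can be seen concretely with the two-dice example from the beginning of this section, by enumerating the sample space exactly:

```python
from itertools import product

# Enumerate the sample space of two fair dice exactly (no sampling needed).
outcomes = list(product(range(1, 7), repeat=2))
p = 1.0 / len(outcomes)   # uniform probability of each of the 36 outcomes

e_x = sum(x * p for x, _ in outcomes)          # E[X] = 3.5
e_y = sum(y * p for _, y in outcomes)          # E[Y] = 3.5
e_sum = sum((x + y) * p for x, y in outcomes)  # E[X + Y] = 7

# e_sum equals e_x + e_y, and the identity would hold even for dice
# whose outcomes were dependent.
```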
If X and Y are two random variables, their covariance is

Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[(Y − E[Y])(X − E[X])] = Cov(Y, X).
Theorem 2.4. For any two random variables X and Y, we have

Var[X + Y] = Var[X] + Var[Y] + 2 Cov(X, Y).

Proof. Use the linearity of expectations and the definitions of variance and covariance to obtain

Var[X + Y] = E[(X + Y − E[X + Y])²]
           = E[(X + Y − E[X] − E[Y])²]
           = E[(X − E[X])² + (Y − E[Y])² + 2(X − E[X])(Y − E[Y])]
           = E[(X − E[X])²] + E[(Y − E[Y])²] + 2 E[(X − E[X])(Y − E[Y])]
           = Var[X] + Var[Y] + 2 Cov(X, Y),

as required. ∎
Theorem 2.4 can be extended to a sum of any finite number of random variables. For a collection X1, ..., Xn of random variables, it can be shown that

Var[ Σ_i Xi ] = Σ_i Var[Xi] + 2 Σ_i Σ_{j>i} Cov(Xi, Xj).
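The finite-sum identity can be verified numerically (illustrative, with three correlated variables built as a linear mix of independent Gaussians; the mixing matrix is made up):

```python
import numpy as np

rng = np.random.default_rng(4)
# Three correlated variables from a lower-triangular mix of independent ones.
A = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])
X = A @ rng.standard_normal((3, 200_000))

lhs = X.sum(axis=0).var()                          # Var[X1 + X2 + X3]
C = np.cov(X)                                      # sample covariance matrix
rhs = np.trace(C) + 2.0 * np.sum(np.triu(C, k=1))  # sum of Var + 2 * sum_{j>i} Cov

# lhs and rhs agree up to sampling normalization; both are near the exact 5.58.
```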
Theorem 2.5. For any two independent random variables X and Y, we have

E[XY] = E[X] E[Y].

Proof. Let the indices i and j assume all values in the ranges of X and Y, respectively. As X and Y are independent random variables,

E[XY] = Σ_i Σ_j i j Pr((X = i) ∩ (Y = j))
      = Σ_i Σ_j i j Pr(X = i) Pr(Y = j)
      = [ Σ_i i Pr(X = i) ] [ Σ_j j Pr(Y = j) ]
      = E[X] E[Y],

as required. ∎
Corollary 2.2. For any independent random variables X and Y, we have

Cov(X, Y) = 0

and

Var[X + Y] = Var[X] + Var[Y].

Proof. As X and Y are independent, so are X − E[X] and Y − E[Y]. For any random variable Z, E[Z − E[Z]] = E[Z] − E[Z] = 0. Hence, by Theorem 2.5,

Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[X − E[X]] E[Y − E[Y]] = 0,

and the second claim then follows from Theorem 2.4, as required. □
Definition 2.15. For a collection of random variables X = X_1, …, X_n, the covariance matrix Ω of size n × n is defined as

Ω = [ Var(X_1)       Cov(X_1, X_2)  …  Cov(X_1, X_n)
      Cov(X_2, X_1)  Var(X_2)       …  Cov(X_2, X_n)
      ⋮               ⋮              ⋱  ⋮
      Cov(X_n, X_1)  Cov(X_n, X_2)  …  Var(X_n)      ].
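For illustration, the covariance matrix of Definition 2.15 can be estimated entry by entry and compared against a library routine (a sketch on synthetic data; the variables below are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# n = 3 correlated variables, rows = variables, columns = samples.
n, samples = 3, 100_000
a = rng.normal(size=(n, samples))
x = np.array([a[0], 0.8 * a[0] + 0.6 * a[1], a[2]])

# Estimate Omega entry by entry per Definition 2.15 ...
omega = np.empty((n, n))
for i in range(n):
    for j in range(n):
        omega[i, j] = np.mean((x[i] - x[i].mean()) * (x[j] - x[j].mean()))

# ... and compare with the library routine (biased, divide-by-N convention).
assert np.allclose(omega, np.cov(x, bias=True), atol=1e-10)
assert np.allclose(omega, omega.T)   # Cov(Xi, Xj) = Cov(Xj, Xi)
```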
In general, process variation can be classified into two categories [13]: inter-die and intra-die. Inter-die variations are variations from die to die, while intra-die variations correspond to variability within a single chip. Inter-die variations are global and, hence, affect all the devices on a chip in a similar fashion; for example, they can make the channel lengths of all the devices on the same chip smaller. Intra-die variations may affect devices on the same chip differently; for example, they can leave some devices with smaller gate oxide thicknesses and others with larger ones. Intra-die variations may exhibit spatial correlation; for example, devices located close to each other are more likely to have similar characteristics.
Fig. 2.1 An example of the grid-based model, in which gates (Gate1–Gate5) are placed in different grid squares of a chip

A parameter p under process variation can be decomposed as

p = p̄ + δ_inter + δ_intra,   (2.9)

where δ_inter and δ_intra represent the inter-die variation and intra-die variation, respectively. In general [13, 95, 169], δ_inter and δ_intra can be modeled as Gaussian random variables. In this chapter, we will discuss both Gaussian and non-Gaussian cases. Note that due to the global effect of inter-die variation, a single random variable δ_inter is used for all gates/grids in one chip.
For δ_intra, the value of a parameter p located at (x, y) can be modeled as a normally distributed random variable [101] dependent on location:

p = p̄ + δ_x·x + δ_y·y + ε,   (2.10)

where p̄ is the mean value (nominal design parameter value) at (0, 0), and δ_x and δ_y stand for the gradients of the parameter, indicating the spatial variations of p along the x and y directions, respectively. ε represents the random intra-chip variation. Due to spatial correlations in the intra-chip variation, the vector ε of all random components across the chip has a correlated multivariate normal distribution, ε ~ N(0, Σ), where Σ is the covariance matrix of the spatially correlated parameters.
A grid-based method is introduced in [13] to account for this correlation. In the grid-based method, the chip is partitioned into n_row × n_col = n grids, and the intra-die spatial correlation of parameters is modeled on this grid. Since devices close to each other are more likely to have similar characteristics than those placed far apart, grid-based methods assume a perfect correlation among the devices in the same grid, high correlations among those in close grids, and low to zero correlations between faraway grids. For example, in Fig. 2.1, Gate1 and Gate2 (whose sizes in the figure are exaggerated for clarity) are located in the same grid square, and hence their parameter variations, such as the variations of their gate channel length, are assumed to be always identical.
Gate1 and Gate3 lie in neighboring grids, and hence, their parameter variations
are not identical but highly correlated due to their spatial proximity. For example,
when Gate1 has a larger than nominal gate channel length, Gate3 is more likely
to have a larger than nominal gate channel length. On the other hand, Gate1 and
Gate4 are far away from each other; their parameters can be assumed as weakly
correlated or uncorrelated. For example, when Gate1 has a larger than nominal gate
channel length, the gate channel length for Gate4 may be either larger or smaller
than nominal.
With the grid-based model, we can use a single random variable p(x, y) to model a parameter variation in a single grid at location (x, y). As a result, n random variables are needed for each type of parameter, where each represents the value of a parameter in one of the n grids. In addition, we assume that correlation only exists among the same type of parameters in different grids. Note that this assumption is not critical and can easily be removed. For example, the gate length L for transistors in the i-th grid is correlated with those in nearby grids, but is uncorrelated with other parameters such as gate oxide thickness Tox in any grid, including the i-th grid
itself. For each type of parameter, a correlation matrix Σ of size n × n represents the spatial correlation of this parameter. Notice that the number of grid partitions needed is determined by the process, not the circuit, so we can apply the same correlation model to different designs under the same process.
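A minimal sketch of such a grid-based correlation structure follows (the grid size and the Gaussian-type decay function below are illustrative assumptions, not the book's calibrated model):

```python
import numpy as np

def grid_correlation(rows, cols, eta):
    """Correlation matrix for a rows x cols grid: perfect correlation inside a
    grid square, and a correlation that decays with distance between grid centers."""
    centers = [(r, c) for r in range(rows) for c in range(cols)]
    n = len(centers)
    corr = np.empty((n, n))
    for i, (ri, ci) in enumerate(centers):
        for j, (rj, cj) in enumerate(centers):
            d2 = (ri - rj) ** 2 + (ci - cj) ** 2
            corr[i, j] = np.exp(-d2 / eta**2)   # assumed exponential-type decay
    return corr

corr = grid_correlation(4, 4, eta=2.0)
assert np.allclose(np.diag(corr), 1.0)   # same grid: perfectly correlated
assert corr[0, 1] > corr[0, 3]           # nearby grids: higher correlation
```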
Due to correlation, the large number of random variables involved in VLSI design can be reduced. After the random variables are decoupled via their correlation, one may further reduce the cost of statistical analysis by the spectral stochastic method discussed in Sect. 3. Since the random variables are correlated, this correlation should be removed before the spectral stochastic method is used. In this part, we first present the theoretical basis for decoupling the correlation of random variables.
Proposition 2.1. For a set of zero-mean Gaussian-distributed variables η whose covariance matrix is Ω, if there is a matrix L satisfying Ω = LL^T, then η can be represented by a set of independent standard normal distributed variables ξ as η = Lξ.

Proof. For normal distributions, a linear transformation preserves the zero mean of the variables and yields another normal distribution. Thus, we only need to prove that the covariance matrix remains unchanged under the transformation. According to the definition of covariance,

cov(Lξ) = E[Lξ (Lξ)^T] = L E[ξξ^T] L^T.   (2.11)

Since the components of ξ are independent standard normal variables, E[ξξ^T] = I, and hence

cov(Lξ) = LL^T = Ω.   (2.12)
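Proposition 2.1 can be verified numerically: factor a covariance matrix Ω with Cholesky decomposition and check that samples of Lξ reproduce Ω (a sketch; the matrix below is made up):

```python
import numpy as np

rng = np.random.default_rng(2)

# Target covariance matrix Omega (symmetric positive definite).
omega = np.array([[4.0, 1.2, 0.8],
                  [1.2, 9.0, 2.0],
                  [0.8, 2.0, 1.0]])

# Omega = L L^T via Cholesky decomposition.
L = np.linalg.cholesky(omega)
assert np.allclose(L @ L.T, omega)

# Map independent standard normals xi through L.
xi = rng.standard_normal((3, 500_000))
eta = L @ xi

# The sample covariance of eta approaches Omega (cov(L xi) = L I L^T = Omega).
assert np.allclose(np.cov(eta), omega, atol=0.1)
```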
Note that the solution for decoupling is not unique. For example, Cholesky decomposition can be used to find L, since the covariance matrix Ω is always a positive semidefinite matrix. However, Cholesky decomposition cannot reduce the number of variables. PFA [74] can substitute for Cholesky decomposition when variable reduction is needed. Eigendecomposition of the covariance matrix yields

Ω = LL^T,  L = [√λ_1 e_1, …, √λ_n e_n],   (2.13)

where the λ_i are the eigenvalues in order of descending magnitude, and the e_i are the corresponding eigenvectors. PFA reduces the number of components in ξ by truncating L to its first k columns.
The error of PFA can be controlled through k:

err = (Σ_{i=k+1}^n λ_i) / (Σ_{i=1}^n λ_i),   (2.14)

where a bigger k leads to a more accurate result. PFA is efficient, especially when the correlation length is large. In our experiments, we set the correlation length to eight times the width of the wires. As a result, PFA can reduce the number of variables from 40 to 14 with an error of about 1% in an example with 20 parallel wires.
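A minimal sketch of PFA-style truncation with the error measure of (2.14) follows (the covariance below is synthetic, not the 20-wire example from the text):

```python
import numpy as np

# Synthetic smooth covariance with a long correlation length (fast-decaying spectrum).
n = 40
pts = np.arange(n)
omega = np.exp(-((pts[:, None] - pts[None, :]) ** 2) / (2 * 8.0**2))

# Eigendecomposition, eigenvalues sorted in descending order of magnitude.
lam, vec = np.linalg.eigh(omega)
order = np.argsort(lam)[::-1]
lam, vec = lam[order], vec[:, order]

# Keep the first k factors: L_k = [sqrt(lam_1) e_1, ..., sqrt(lam_k) e_k].
k = 14
Lk = vec[:, :k] * np.sqrt(np.clip(lam[:k], 0.0, None))

# Truncation error per (2.14): tail eigenvalue mass over total mass.
err = lam[k:].sum() / lam.sum()
assert err < 0.01                                 # about 1% or less at k = 14
assert np.allclose(Lk @ Lk.T, omega, atol=0.1)    # rank-k reconstruction is close
```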
One idea is to consider the importance of the outputs during the reduction process when using PFA. Recently, the weighted PFA (wPFA) technique has been used [204] to improve variable-reduction efficiency. If a weight w_i is defined for each physical variable η_i to reflect its impact on the output, then a set of new variables is formed:

η' = W η,   (2.15)

where W is a diagonal matrix of the weights. The error-controlling process is similar to (2.14) but uses the weighted eigenvalues λ'_i.
We first briefly review the concept of principal component analysis (PCA), which is used here to transform random variables with correlation into uncorrelated random variables [75].

Suppose that x is a vector of n random variables, x = [x_1, x_2, …, x_n]^T, with covariance matrix Ω and mean vector μ_x = [μ_x1, μ_x2, …, μ_xn]. To find the orthogonal random variables, we first calculate the eigenvalues and corresponding eigenvectors of Ω. Then, by ordering the eigenvectors in descending order of their eigenvalues, the orthogonal matrix A is obtained. Here, A is expressed as

A = [e_1^T, e_2^T, …, e_n^T]^T,   (2.18)

and

λ_i < λ_{i−1},  i = 2, 3, …, n.   (2.20)

With A, we can perform the transformation to get orthogonal random variables y = [y_1, y_2, …, y_n]^T by using

y = A(x − μ_x),   (2.21)

Since A is orthogonal,

A^{−1} = A^T,   (2.23)

and the original variables can be recovered as

x = A^T y + μ_x.   (2.24)
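The PCA steps (2.18)-(2.24) can be sketched directly on synthetic data (illustrative only; the correlated variables below are invented):

```python
import numpy as np

rng = np.random.default_rng(4)

# Correlated samples x (rows = variables), with nonzero means.
base = rng.standard_normal((3, 200_000))
x = np.array([2.0 + base[0],
              -1.0 + 0.9 * base[0] + 0.45 * base[1],
              5.0 + 3.0 * base[2]])
mu = x.mean(axis=1, keepdims=True)

# Eigenvectors of the covariance matrix, ordered by descending eigenvalue,
# stacked as rows to form the orthogonal matrix A of (2.18).
lam, vec = np.linalg.eigh(np.cov(x))
order = np.argsort(lam)[::-1]
A = vec[:, order].T

# y = A (x - mu): the transformed variables are uncorrelated (2.21).
y = A @ (x - mu)
cov_y = np.cov(y)
assert np.allclose(cov_y - np.diag(np.diag(cov_y)), 0.0, atol=1e-8)

# A is orthogonal (A^-1 = A^T), so x is recovered as x = A^T y + mu (2.24).
assert np.allclose(A.T @ y + mu, x)
```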
3 Statistical Analysis Approaches

Monte Carlo techniques [41] are usually used to estimate the value of a definite, finite-dimensional integral of the form

G = ∫_S g(X) f(X) dX,   (2.25)

where f(X) is a probability density function over the domain S. Drawing MC independent samples X_i from f(X) and averaging gives the estimator

G_MC = (1/MC) Σ_{i=1}^{MC} g(X_i).   (2.26)

The estimator G_MC above is a random variable. Its mean value is the integral G to be estimated, i.e., E(G_MC) = G, making it an unbiased estimator. The variance of G_MC is Var(G_MC) = σ²/MC, where σ² is the variance of the random variable g(X), given by

σ² = ∫_S g²(X) f(X) dX − G².   (2.27)
P(G − 1.96 σ/√MC ≤ G_MC ≤ G + 1.96 σ/√MC) ≈ 0.95,   (2.28)

which gives an asymptotic 95% confidence interval for the estimate; the interval width shrinks as 1/√MC.
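For example, estimating G = E[g(X)] for g(x) = x² with X ~ N(0, 1), whose true value is 1 (a sketch; the integrand is an arbitrary test function, not from the book):

```python
import numpy as np

rng = np.random.default_rng(5)

g = lambda x: x * x           # integrand g(X)
MC = 100_000                  # number of samples
x = rng.standard_normal(MC)   # samples drawn from the density f (standard normal)

# Unbiased estimator (2.26) and its standard error sigma / sqrt(MC).
vals = g(x)
g_mc = vals.mean()
stderr = vals.std(ddof=1) / np.sqrt(MC)

# 95% confidence interval per (2.28); the true value is E[X^2] = 1,
# and sigma = sqrt(Var(X^2)) = sqrt(2), so stderr is about 0.0045.
lo, hi = g_mc - 1.96 * stderr, g_mc + 1.96 * stderr
assert abs(g_mc - 1.0) < 0.05
assert 0.003 < stderr < 0.006
```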
One recent advance in fast statistical analysis is to apply stochastic orthogonal polynomial chaos (OPC) [187] to nanometer-scale integrated circuit analysis. Based on the Askey scheme [196], any stochastic random variable can be represented by OPC, and random variables with different probability distribution types are associated with different types of orthogonal polynomials.
Hermite polynomial chaos (Hermite PC or HPC) utilizes a series of orthogonal
polynomials (with respect to the Gaussian distribution) to facilitate stochastic
analysis [197]. These polynomials are used as the orthogonal base to decompose
a random process in a similar way that sine and cosine functions are used to
decompose a periodic signal in a Fourier series expansion. Note that for the Gaussian and log-normal distributions, Hermite polynomials are the best choice, as they lead to an exponential convergence rate [45]. For non-Gaussian and non-log-normal distributions, there are other orthogonal polynomials, such as Legendre for the uniform distribution, Charlier for the Poisson distribution, and Krawtchouk for the binomial distribution [44, 187].
For a random variable y(ξ) with limited variance, where ξ = [ξ_1, ξ_2, …, ξ_n] is a vector of zero-mean orthogonal Gaussian random variables, the random variable can be approximated by a truncated Hermite PC expansion as follows [45]:

y(ξ) = Σ_{k=0}^P a_k H_k^n(ξ),   (2.30)

where the a_k are deterministic coefficients and the H_k^n(ξ) are n-dimensional Hermite polynomials. Similarly, a random process v(t, ξ) can be expanded with time-dependent coefficients:

v(t, ξ) = Σ_{k=0}^P a_k(t) H_k^n(ξ).   (2.32)

For a single Gaussian variable ξ, the first one-dimensional Hermite polynomials are

H_0^1(ξ) = 1,  H_1^1(ξ) = ξ,  H_2^1(ξ) = ξ² − 1,  H_3^1(ξ) = ξ³ − 3ξ, ….   (2.33)
By the orthogonality of the Hermite polynomials, the coefficients are obtained by projection:

a_k = ⟨y(ξ), H_k(ξ)⟩ / ⟨H_k²(ξ)⟩,   (2.36)

a_k(t) = ⟨v(t, ξ), H_k(ξ)⟩ / ⟨H_k²(ξ)⟩,  ∀k ∈ {0, …, P},   (2.37)

where ⟨·, ·⟩ denotes the inner product with respect to the Gaussian measure.
Once we obtain the Hermite PC, we can calculate the mean and variance of the random variable y(ξ) by a one-time analysis as (one-Gaussian-variable case):

E(y(ξ)) = y_0,
Var(y(ξ)) = y_1² Var(ξ_1) + y_2² Var(ξ_1² − 1) = y_1² + 2y_2².   (2.38)
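For a single Gaussian variable, the projection formula (2.36) and the moment formulas (2.38) can be checked numerically; the sketch below uses y(ξ) = e^ξ as a test function and Gauss-Hermite quadrature for the inner products (an illustration, not the book's implementation):

```python
import numpy as np

# Gauss-Hermite quadrature adapted to the standard normal weight.
nodes, weights = np.polynomial.hermite_e.hermegauss(20)
weights = weights / weights.sum()   # normalize so sums give E[.] for N(0,1)

def expect(f):
    """E[f(xi)] for xi ~ N(0,1), via quadrature."""
    return np.sum(weights * f(nodes))

# Probabilists' Hermite polynomials H0..H2 from (2.33).
H = [lambda x: np.ones_like(x), lambda x: x, lambda x: x * x - 1.0]

# Test function and its projection coefficients a_k per (2.36).
y = np.exp
a = [expect(lambda x, Hk=Hk: y(x) * Hk(x)) / expect(lambda x, Hk=Hk: Hk(x) ** 2)
     for Hk in H]

# Known expansion of e^xi in Hermite polynomials: a_k = e^{1/2} / k!.
assert np.allclose(a, [np.e**0.5, np.e**0.5, np.e**0.5 / 2.0], rtol=1e-6)

# Mean and second-order variance per (2.38): E = a_0, Var ~ a_1^2 + 2 a_2^2.
mean, var2 = a[0], a[1] ** 2 + 2 * a[2] ** 2
assert abs(mean - np.exp(0.5)) < 1e-6
assert var2 < np.exp(2) - np.exp(1)   # the P = 2 truncation underestimates Var
```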
Similarly, for a random process v(t, ξ) (one-Gaussian-variable case), the mean and variance follow in the same way, with time-dependent coefficients y_k(t) in place of y_k.

One critical problem that remains is how to obtain the coefficients of the Hermite PC in (2.36) and (2.37) efficiently. There are two families of techniques for calculating these coefficients: collocation-based spectral stochastic methods and Galerkin-based spectral stochastic methods. In short, we refer to them in the later parts of the book as collocation-based and Galerkin-based methods.
In the collocation-based method, the inner products in (2.36) and (2.37) are evaluated numerically at a set of quadrature points and weights [126]. With this method, the number of quadrature points needed for n dimensions at level P is about (P + 1)^n, which is well known as the curse of dimensionality.
Smolyak quadrature [126], also known as sparse-grid quadrature, is an efficient method to reduce the number of quadrature points. Let us define a one-dimensional sparse-grid quadrature point set θ_P^1 = {θ_0, θ_1, …, θ_P}, which uses P + 1 points to achieve degree 2P + 1 of exactness. The sparse grid for an n-dimensional quadrature at degree P chooses points from the following set:

θ_P^n = ∪_{P+1 ≤ |i| ≤ P+n} (θ_{i_1}^1 × ⋯ × θ_{i_n}^1),   (2.41)

where |i| = Σ_{j=1}^n i_j. The corresponding weight is

w_{i_1…i_n}^{j_1…j_n} = (−1)^{P+n−|i|} C(n−1, n+P−|i|) Π_m w_{j_m}^{i_m},   (2.42)

where C(n−1, n+P−|i|) is the combinatorial number "n−1 choose n+P−|i|" and w_{j_m}^{i_m} is the weight of the corresponding one-dimensional quadrature point. It has been shown that interpolation on a Smolyak grid ensures a bound for the mean-square error [126]:

|E_P| = O(N_P^{−r} (log N_P)^{(r+1)(n−1)}),

where N_P is the number of quadrature points and r is the order of the maximum derivative that exists for the delay function. The number of quadrature points increases as O(n^P / P!).
It can be shown that a sparse grid of at least level P is required for an order-P representation. The reason is that the approximation contains order-P polynomials for both y(ξ) and H_j(ξ). Thus, there exist terms y(ξ)H_j(ξ) of order 2P, which require a sparse grid of at least level P with an exactness degree of 2P + 1. Therefore, level-1 and level-2 sparse grids are required for linear and quadratic models, respectively. The number of quadrature points is about 2n for the linear model and 2n² for the quadratic model. The computational cost is about the same as that of the Taylor-conversion method, while keeping the accuracy of the homogeneous chaos expansion.
In addition to the sparse-grid technique, we can also employ several accelerating techniques. First, when n is small, the number of quadrature points for the sparse grid may be larger than that of a direct tensor product of Gaussian quadrature. For example, if there are only two variables, the numbers of points are 5 and 15 for the level-1 and level-2 sparse grids, compared to 4 and 9 for the direct tensor product. In this case, the sparse grid will not be used. Second, the set of quadrature points (2.41) may contain the same point with different weights. For example, the level-2 sparse grid for three variables contains four instances of the point (0, 0, 0). Combining such points by summing their weights reduces the cost of evaluating y(ξ) at the quadrature points.
The Galerkin-based method is based on the principle of orthogonality: the best approximation of y(ξ) is obtained when the error Δ(ξ), defined as the difference between y(ξ) and its Hermite PC expansion, is orthogonal to every basis polynomial, i.e., ⟨Δ(ξ), H_k(ξ)⟩ = 0, where the H_k(ξ) are Hermite polynomials. In this way, we have transformed the stochastic analysis process into a deterministic one, in which we only need to compute the corresponding coefficients of the Hermite PC.
For illustration purposes, consider two Gaussian variables ξ = [ξ_1, ξ_2] and assume that the charge vector in panels can be written as a second-order (P = 2) Hermite PC:

y(ξ) = y_0 + y_1 ξ_1 + y_2 ξ_2 + y_3 (ξ_1² − 1) + y_4 (ξ_2² − 1) + y_5 ξ_1 ξ_2,   (2.45)

which will be solved by (2.44). Once the Hermite PC of y(ξ) is known, the mean and variance of y(ξ) can be evaluated trivially. For example, for one random variable, the mean and variance are calculated as

E(y(ξ)) = y_0,
Var(y(ξ)) = y_1² Var(ξ) + y_2² Var(ξ² − 1) = y_1² + 2y_2².   (2.46)
4 Sum of Log-Normal Random Variables

Let g(ξ) be a Gaussian random variable and l(ξ) be the random variable obtained by taking the exponential of g(ξ):

l(ξ) = e^{g(ξ)}.   (2.47)

For the log-normal random variable l(ξ), let the mean and the variance of g(ξ) be μ_g and σ_g²; then the mean and variance of l(ξ) are

μ_l = e^{μ_g + σ_g²/2},   (2.48)

σ_l² = e^{2μ_g + σ_g²} (e^{σ_g²} − 1),   (2.49)

respectively.
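The closed-form moments (2.48) and (2.49) can be verified by simulation (a sketch with arbitrary μ_g and σ_g):

```python
import math
import numpy as np

rng = np.random.default_rng(6)

mu_g, sigma_g = 0.3, 0.5

# Closed-form mean and variance of l = exp(g), g ~ N(mu_g, sigma_g^2).
mean_l = math.exp(mu_g + sigma_g**2 / 2.0)                               # (2.48)
var_l = math.exp(2 * mu_g + sigma_g**2) * (math.exp(sigma_g**2) - 1.0)   # (2.49)

# Monte Carlo check against samples of the log-normal itself.
l = np.exp(rng.normal(mu_g, sigma_g, 2_000_000))
assert abs(l.mean() - mean_l) / mean_l < 0.01
assert abs(l.var() - var_l) / var_l < 0.05
```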
A general Gaussian variable g(ξ) can always be represented in the following affine form:

g(ξ) = Σ_{i=0}^n ξ_i g_i,   (2.50)

where the ξ_i are orthogonal Gaussian variables, that is, ⟨ξ_i ξ_j⟩ = δ_ij, ⟨ξ_i⟩ = 0 for i ≥ 1, and ξ_0 = 1, and g_i is the coefficient of the individual Gaussian variable. Note that such a form can always be obtained by using the Karhunen-Loève orthogonal expansion method [45].
In our problem, we need to represent the log-normal random variable l(ξ) in the Hermite PC expansion form:

l(ξ) = Σ_{k=0}^P l_k H_k^n(ξ),   (2.51)

where l_0 = exp(μ_g + σ_g²/2). To find the other coefficients, we can apply (2.36) to l(ξ).
First, consider a single Gaussian variable; in this case, ξ = [ξ_1]. For the second-order Hermite PC (P = 2), following (2.54), we have

l(ξ) = l_0 (1 + σ_g ξ + (1/2) σ_g² (ξ² − 1)).   (2.55)

Hence, the desired Hermite PC coefficients, l_{0,1,2}, can be expressed as l_0, l_0 σ_g, and (1/2) l_0 σ_g², respectively.
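A quick check of the one-variable expansion (2.55) against samples of the exact log-normal (a sketch; σ_g = 0.2 is an arbitrary choice):

```python
import math
import numpy as np

rng = np.random.default_rng(7)

mu_g, sg = 0.0, 0.2
l0 = math.exp(mu_g + sg**2 / 2.0)

xi = rng.standard_normal(1_000_000)
exact = np.exp(mu_g + sg * xi)                              # the log-normal itself
hpc2 = l0 * (1.0 + sg * xi + 0.5 * sg**2 * (xi**2 - 1.0))   # expansion (2.55)

# The second-order expansion matches the mean and tracks samples closely.
assert abs(hpc2.mean() - exact.mean()) < 1e-3
rms_rel = np.sqrt(np.mean((hpc2 - exact) ** 2)) / exact.mean()
assert rms_rel < 0.01   # for sigma_g = 0.2, the P = 2 truncation error is small
```

For larger σ_g the truncation error of the P = 2 expansion grows roughly like σ_g³, so higher orders are needed.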
Next, consider two Gaussian variables:

g(ξ) = μ_g + σ_1 ξ_1 + σ_2 ξ_2.   (2.56)

Note that

⟨(ξ_i ξ_j − δ_ij)²⟩ = ⟨ξ_i² ξ_j²⟩ = ⟨ξ_i²⟩⟨ξ_j²⟩ = 1  for i ≠ j.

Therefore, the expansion of the log-normal random variable using second-order Hermite PCs can be expressed as

l(ξ) = l_0 (1 + σ_1 ξ_1 + σ_2 ξ_2 + (σ_1²/2)(ξ_1² − 1) + (σ_2²/2)(ξ_2² − 1) + σ_1 σ_2 ξ_1 ξ_2),   (2.57)

where

μ_l = l_0 = exp(μ_g + σ_1²/2 + σ_2²/2).
In the same way, for example with four Gaussian variables,

g(ξ) = μ_g + Σ_{i=1}^4 σ_i ξ_i,   (2.58)

and the second-order expansion is

l(ξ) = l_0 (1 + Σ_{i=1}^4 σ_i ξ_i + (1/2) Σ_{i=1}^4 σ_i² (ξ_i² − 1) + Σ_{i=1}^4 Σ_{j>i} σ_i σ_j ξ_i ξ_j + ⋯),   (2.59)

where

μ_l = l_0 = exp(μ_g + (1/2) Σ_{i=1}^4 σ_i²).

Hence, the desired Hermite PC coefficients can be read off from (2.59) above.
5 Summary

3 Traditional Statistical Leakage Power Analysis Methods

1 Introduction
Process-induced variability has a huge impact on circuit performance in sub-90 nm VLSI technologies [120]. This is particularly the case for leakage power, which has increased dramatically with technology scaling and is becoming the dominant component of chip power dissipation [71]. The dominant factors in leakage currents are the subthreshold leakage current Isub and the gate oxide leakage current Igate.
Subthreshold leakage currents increase rapidly with every technology generation (about a 5x to 10x increase per generation [24]) and are highly sensitive to threshold voltage Vth variations, owing to the exponential relationship between Isub and Vth. On the other hand, as the gate oxide thickness Tox scales down, Igate grows rapidly, as Igate has an exponential dependence on Tox.
Both leakage currents are highly sensitive to process variations due to the exponential relation between the leakage current and variational parameters such as the effective channel length. As process-induced variability becomes more pronounced in the deep submicron regime [120], leakage variations become more significant, and traditional worst-case-based approaches will lead to extremely pessimistic and expensive overdesigned solutions. Statistical estimation and analysis of leakage power considering process variability are critical in various chip design steps to improve design yield and robustness. From the leakage estimation model, we can obtain chip-level leakage statistics, such as the mean value and standard deviation, from process information, library information, and design information.
Many methods have been proposed for the statistical modeling of chip-level leakage current. Early work in [169] gives analytic expressions for the mean value and variance of the leakage currents of CMOS gates, considering only subthreshold leakage. The method in [119] provides simple analytic expressions for the leakage currents of the whole chip, considering global variations only. The method in [192] uses third-order Hermite polynomials without considering spatial correlations and only calculates the mean value of the full-chip leakage current. In [114], reverse-biased source/drain junction band-to-band tunneling (BTBT) leakage current is considered, in addition to the subthreshold leakage currents, for estimating the mean values and variances of the leakage currents of gates only. In [142], the PDFs of stacked CMOS gates and of the entire chip are derived considering both inter-die and intra-die variations. In [14], a hardware-based statistical model of dynamic switching power and static leakage power was presented, which was extracted from experiments in a predetermined process window.
Chip-level statistical leakage analysis (SLA) methods can be classified into different categories based on different criteria, as shown in Table 3.1. Our classification and survey may not be complete, as this is still an active research field and more efficient methods will be developed in the future. We will present in detail some recent important developments, such as the Monte Carlo method and the traditional grid-based method [13]. The gate-based spectral stochastic method [155] and the virtual-grid-based method will be introduced in Chap. 4 and Chap. 5, respectively. We remark that our limited coverage of the other methods, which are presented in minimal detail, does not diminish the value of their contributions.
This chapter is structured as follows. In Sect. 2, we discuss the static leakage
model for one gate/MOSFET, and then Sect. 3 gives the process variation models for
computing statistical information of full-chip leakage current. Section 4 presents the
recently proposed chip-level statistical leakage modeling and analysis works. The
chapter concludes with a summary and brief discussion of potential future research.
2 Static Leakage Modeling

Full-chip leakage current has two components: subthreshold leakage current and gate leakage current. Here we describe empirical models for both of them, based on the assumption that the leakage current under process variations can be approximated by log-normal distributions.
where a1 through a5 are the fitting coefficients for each unique input combination of a gate. We can then use a lookup table (LUT) to store the fitting parameters. For a k-input gate, the size of the LUT is 2^k x 10, as we have two equations for each input combination and each equation has 10 fitting parameters. In [13], only the dominant leakage states are kept, i.e., states with only one "off" transistor in a series transistor stack. However, with technology scaling down to 45 nm, this is no longer practical: Isub based on the model in (3.1) still has a large error compared with the simulation results. Hence, the authors in [155] keep all the states.
After choosing sampling points for L and Tox linearly within their 3σ regions and conducting SPICE simulation at each point, the subthreshold leakage current is stored as the original curve. We can then perform the curve-fitting process. Figures 3.1 and 3.2 show the curve-fitting results for Isub and Igate for the four input patterns of the AND2 gate. Here, 100 points are chosen linearly in the 3σ regions of L and Tox. These figures show that the fitted curves match the SPICE results very well, and that the currents in the four cases are comparable with each other. Since there is no "dominant state," all of them need to be considered.
Table 3.2 shows the errors relative to industry SPICE simulation results for the AND2 gate for Isub. Max Err. is the maximum error over the input combinations, and Avg Err. refers to the average error over all the input patterns. If we add more terms to (3.1), as shown in Table 3.2, we can reduce the errors from 8% to about 3%. After we obtain the analytic expression for each input combination, we take the average of the leakage currents over all the input combinations to arrive at the final analytic expression for each gate, in lieu of the dominant states used in [13]. Based on this model, the leakage current of one gate under process variation can be estimated by log-normal distributions. The average leakage of a gate can be computed as a weighted sum of the leakage under different input states,
Fig. 3.1 Subthreshold leakage currents (ln Isub) for four different input patterns in the AND2 gate under 45 nm technology: SPICE vs. curve fitting over the sample point index
Isub^avg = Σ_{j ∈ input states} P_j Isub,j,   (3.3)

Igate^avg = Σ_{j ∈ input states} P_j Igate,j,   (3.4)

Ileak,chip = Σ_{gates i=1,…,n} (Isub,i^avg + Igate,i^avg),   (3.5)

where P_j is the probability of input state j; Isub,j and Igate,j are the subthreshold leakage and the gate oxide leakage at input state j, respectively; and n is the total number of gates in the circuit. The interaction between these two leakage mechanisms is included in the total leakage estimation.
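Equations (3.3)-(3.5) amount to a probability-weighted average per gate followed by a sum over gates. A toy sketch (the leakage numbers and state probabilities below are invented for illustration, not real 45 nm data):

```python
# Per-gate leakage tables: input state -> (I_sub, I_gate), in arbitrary units.
# All numbers below are invented for illustration only.
gate_leakage = {
    "g1": {"00": (1.0, 0.2), "01": (0.6, 0.3), "10": (0.7, 0.25), "11": (0.2, 0.5)},
    "g2": {"00": (0.9, 0.1), "01": (0.5, 0.2), "10": (0.8, 0.15), "11": (0.3, 0.4)},
}
# State probabilities P_j (assumed uniform here).
p = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}

def avg_leakage(states):
    """(3.3)/(3.4): probability-weighted average subthreshold and gate leakage."""
    i_sub = sum(p[j] * isub for j, (isub, _) in states.items())
    i_gate = sum(p[j] * igate for j, (_, igate) in states.items())
    return i_sub, i_gate

# (3.5): full-chip leakage is the sum of the per-gate averages.
chip = sum(sum(avg_leakage(s)) for s in gate_leakage.values())
assert abs(chip - 1.775) < 1e-9
```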
Since all the leakage components can be approximated as log-normal distributions, we can simply sum up the distributions of the log-normals for all gates to get the full-chip leakage distribution. Note that there exist spatial correlations, and the leakage distributions of any two gates may be correlated. Therefore, the full-chip leakage current is calculated as a sum of correlated log-normals:

S = Σ_{i=1}^p e^{Y_i},   (3.6)

where each Y_i is a Gaussian random variable and p is the number of log-normal components.

Fig. 3.2 Gate oxide leakage currents (ln Igate) for four different input patterns in the AND2 gate under 45 nm technology: SPICE vs. curve fitting over the sample point index
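The behavior of such a sum can be explored by simulation (a sketch; the Gaussian exponents Y_i and their shared component below are synthetic assumptions):

```python
import numpy as np

rng = np.random.default_rng(8)

# p = 3 correlated Gaussian exponents Y_i with a common (inter-die-like) part.
p, samples = 3, 500_000
common = rng.normal(0.0, 0.3, samples)       # shared component
local = rng.normal(0.0, 0.2, (p, samples))   # per-component parts
mu = np.array([0.1, 0.0, -0.2])[:, None]
Y = mu + common + local                      # each Y_i ~ N(mu_i, 0.3^2 + 0.2^2)

# S = sum_i exp(Y_i): a sum of correlated log-normals as in (3.6).
S = np.exp(Y).sum(axis=0)

# E[S] is unchanged by the correlation (linearity of expectation) ...
mean_exact = np.exp(mu + (0.3**2 + 0.2**2) / 2.0).sum()
assert abs(S.mean() - mean_exact) / mean_exact < 0.01

# ... but positive correlation inflates Var[S] beyond the independent case.
var_indep = (np.exp(2 * mu + 0.13) * (np.exp(0.13) - 1.0)).sum()
assert S.var() > var_indep
```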
Fig. 3.3 MOSFET layout with source and drain, channel width W, effective channel length Leff, and layout parameters A and B
As in [96], the statistical model for the subthreshold leakage current is sometimes formulated at the level of a single MOSFET. Here, we only discuss the formulation developed for NMOS transistors; the method can be easily extended to PMOS transistors. In the following, Isub of one MOSFET is formulated, and the Leff model for nonrectilinear transistors is developed.
The leakage current of an ideal transistor can be expressed as a function of Leff [65]. The curve-fitted leakage model considering the narrow-width effect is shown in (3.7):

Isub = α_sub √(q ε_si N_cheff) (W² + α_W W) (Vds² + α_ds1 Vds + α_ds2) exp(α_L1 Leff² + α_L2 Leff)
       × [2 − exp(−A/A_0) − exp(−B/B_0)] [1 − exp(−Vds/V_T)] exp((Vgs − Vthlin)/(n V_T)),   (3.7)

where the α's are fitting parameters, ε_si is the dielectric constant of Si, N_cheff is the effective channel doping concentration, and A and B are the layout parameters shown in Fig. 3.3.
When high-k techniques are used to better insulate the gate from the channel for sub-65 nm technologies, the gate oxide tunneling effect is moderated and controlled [96]. In this case, Igate is less important than Isub.
A real gate structure under sub-90 nm technology has rough (nonrectilinear) edges, which can be translated into an equivalent single transistor with an effective gate channel length Leff. As shown in Fig. 3.4, a nonrectilinear gate can be divided into several subgate slices, each of which has its own length and shares the same characteristic width W0 along the width direction. In this way, the leakage current IG of one nonrectilinear gate can be approximated as the sum of the leakage currents of all the slices along the width direction:
Fig. 3.4 Procedure to derive the effective gate channel length model
IG = Σ_{j=1}^M I_j(L_j, W0) = I(Leff, W),   (3.8)

where W is the width of the gate and each slice is a regular gate. Under this framework, supposing we have M slices along the width direction, we have

μ_L = (1/M) Σ_{j=1}^M L_j,   (3.9)

σ_L = √((1/M) Σ_{j=1}^M (L_j − μ_L)²).   (3.10)

The Leff for the equivalent gate can then be calculated by

Leff = Lmin + α ln(W/W0).   (3.11)
3 Process Variational Models for Leakage Analysis

In this section, we present the process variation models used for computing variational leakage currents. Process variation occurs at different levels: the wafer level, the inter-die level, and the intra-die level. Furthermore, variations are caused by different sources such as lithography, materials, and aging [7]. Some of the variations are systematic, e.g., those caused by the lithography process [42, 129]; some are purely random, e.g., the doping density of impurities and edge roughness [7]. In this section, we first introduce the different kinds of process variations and then the process variational model for leakage analysis.
The main process parameter with a big impact on leakage current is the transistor threshold voltage Vth. Vth is observed to be most sensitive to the effective gate channel length L and the gate oxide thickness Tox. The ITRS [71] indicates that gate channel length variation is a primary factor in device parameter variation, and that the number of dopants in the channel results in an unacceptably large statistical variation of the threshold voltage. Therefore, we must consider the variations in L and Tox, since leakage current is most sensitive to these parameters [13]. To reflect reality, we model spatial correlations in the gate channel length, while the gate oxide thickness values of different gates are taken to be uncorrelated.
Table 3.3 lists an example of detailed parameters for gate channel length and gate oxide thickness variations under 45 nm technology. As indicated in the second column, we can decompose each parameter variation into inter-die and intra-die variations. We further decompose the intra-die variation into components with and without spatial correlation. In most cases, these variations can be modeled by Gaussian distributions [33, 178]. The total variance σ² is computed by summing up the variances of all the components, since a sum of independent Gaussian random variables is still Gaussian.
Under inter-die variation, if the leakage currents of all gates or devices are sensitive to the process parameters in similar ways, then the circuit performance can be analyzed at multiple process corners using deterministic analysis methods. However, statistical methods must be used to correctly predict the leakage if intra-die variations are involved. As leakage current varies exponentially with these parameters, simply using worst-case values for all parameters can result in exponentially larger leakage estimates than the values actually obtained, which is too inaccurate for practical use.
Electrical measurements of a full wafer show that the intra-die gate channel length variation has strong spatial correlation [42]. This implies that devices that are physically close to each other are more likely to be similar than those that are far apart. Therefore, the intra-die variation of gate channel lengths is modeled with such correlation. Several different models can represent this kind of spatial correlation. Take the exponential model [195], for instance:

ρ(r) = e^{−r²/η²},   (3.12)
where r is the distance between two panel centers and η is the correlation length. We note that the strong spatial correlation suggested by (3.12) has been exploited in [13] to speed up the calculation, where the full chip is divided into N grids and the correlated random variables are perfectly correlated within a grid. The strong spatial correlation is exploited naturally by the grid-based method together with PCA (for Gaussian distributions) or independent component analysis (for non-Gaussian distributions), which can transform the correlated random variables into a reduced number of independent ones. Details will be given in the next section. For the gate oxide thickness Tox, strong spatial correlation does not exist; therefore, we assume the Tox of different gates are uncorrelated.
The last column of Table 3.3 shows the standard deviation (σ) of each variation. For a Gaussian distribution, about 99.7% of the samples fall within ±3σ of the mean. According to [71], the physical gate channel length for high-performance logic in 45 nm technology will be 18 nm, and the physical variation should be controlled within ±12%. Therefore, we let 3σ be 12%, and a similar analysis can be done for Tox.
For a gate/module in a chip with gate channel length L and process variation ΔL, using our model parameters in Table 3.3, we have

L = μ_L + ΔL,  ΔL = ΔL_inter + ΔL_intra corr + ΔL_intra uncorr,   (3.13)

where μ_L is the nominal design parameter value, and ΔL_inter is constant for all gates in all grids since it is a global factor that applies to the entire chip. For one chip sample, we only need to generate it once. ΔL_intra corr is different for each gate or grid and has spatial correlation. Therefore, we generate one value for each gate/grid, and the spatial correlation follows the exponential model in (3.12), so that the correlation coefficient diminishes with the distance between any two gates/grids.
As for the gate oxide thickness Tox, using the model parameters in Table 3.3, we have

Tox = μ_ox + ΔTox,  ΔTox = ΔTox,inter + ΔTox,intra uncorr,   (3.14)

where μ_ox is the nominal design parameter value. For the same reason as for ΔL_inter, ΔTox,inter is constant for all gates in all grids. ΔTox,intra uncorr is different for different gates/grids but has no spatial correlation.
After the process variations are modeled as correlated distributions, we can apply
the PCA in Sect. 2.2 of Chap. 2 to decompose correlated Gaussian distributions into
independent ones. After PCA, the process variations (e.g., ΔV_{th}, ΔT_{ox}, and ΔL) of
each gate can be modeled as

X_{G,i} = V_{G,i} E,   (3.15)

where the vector X_{G,i} = [x_{G,i,1}, x_{G,i,2}, …]^T stands for the parameter
variations of the i-th gate. E = [ε_1, ε_2, …, ε_m]^T represents the random variables
for modeling both inter-die and intra-die variations of the entire die. Here
{ε_1, ε_2, …, ε_m} can be extracted by PCA; they are independent and follow the
standard Gaussian distribution (i.e., zero mean and unit standard deviation). m is
the total number of these random variables. For practical industrial designs, m is
typically large (e.g., 10^3–10^6). V_{G,i} captures the correlations among the random
variables.
When m is large, the size of V_{G,i} can be extremely large. However, X_{G,i}
only depends on the intra-die variations within its neighborhood, so V_{G,i} should be
quite sparse. In Sect. 4, the gate-based spectral stochastic method and the projection-
based method will use this sparsity property to reduce the computational cost in two
different ways.
Gate-based statistical leakage analysis typically starts from the leakage modeling
for one gate,
I_{leak,i} = f(E),   (3.16)

where I_{leak,i} represents the total leakage current of the i-th gate. Different models
can be chosen here to represent the relationship between E and I_{leak,i}. For example,
quadratic models are used to guarantee accuracy:

ln(I_{leak,i}) = E^T A_{leak,i} E + B_{leak,i}^T E + C_{leak,i},   (3.17)

where A_{leak,i} ∈ R^{m×m}, B_{leak,i} ∈ R^m, and C_{leak,i} ∈ R are the coefficients. More
details will be given in the next section.
Given the leakage models of all the individual gates, the full-chip leakage current
is the sum of the leakage currents of all gates on the chip:

I_{leak,Chip} = \sum_{i=1}^{n} I_{leak,i},   (3.18)

where n is the total number of gates in the chip. If we choose the quadratic model
in (3.17), then (3.18) implies that the full-chip leakage current is the sum of many
log-normal distributions. As mentioned before, this sum can be approximated as a
log-normal distribution [13]. Therefore, we can also use a quadratic model to
approximate the logarithm of the full-chip leakage:

ln(I_{leak,Chip}) = E^T A_{Chip} E + B_{Chip}^T E + C_{Chip},   (3.19)

where A_{Chip} ∈ R^{m×m}, B_{Chip} ∈ R^m, and C_{Chip} ∈ R are the coefficients. In (3.17) and
(3.19), the quadratic coefficient matrices A_{leak,i} and A_{Chip} can be extremely large when
capturing all the intra-die variations, which makes the quadratic modeling problem
extremely expensive in practical applications. Several approaches have been proposed
to reduce the size of the model; more details are given in the next section.
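Evaluating such a quadratic log-leakage model is straightforward; the sketch below assumes hypothetical fitted coefficients (A, B, C) and simply implements ln I = EᵀAE + BᵀE + C per gate, together with the full-chip sum in (3.18):

```python
import numpy as np

def leakage_quadratic(E, A, B, C):
    """Evaluate a quadratic model of the log-leakage,
    ln(I_leak) = E^T A E + B^T E + C, and return the leakage current
    itself. The coefficients are fitted offline; any values used here
    are purely illustrative."""
    return np.exp(E @ A @ E + B @ E + C)

def chip_leakage(E, gate_models):
    """Full-chip leakage as the sum over all per-gate models, as in (3.18)."""
    return sum(leakage_quadratic(E, A, B, C) for (A, B, C) in gate_models)
```

With A set to zero this reduces to a log-normal per gate, which is why the full-chip current behaves as a sum of (approximately) log-normal components.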
4 Full-Chip Leakage Modeling and Analysis Methods
Full-chip statistical leakage modeling and analysis methods can be classified into
different categories based on different criteria as shown in Fig. 3.1. In this section,
we will present in detail three important methods: the MC method, the traditional
grid-based method, and the projection-based method.
The Monte Carlo technique mentioned in Sect. 3.1 of Chap. 2 can be used to estimate
the leakage power at both the gate level and the chip level.
For the full-chip leakage current, I_{leak,Chip} plays the role of G in (2.25). If the sample
number M_C is large enough, we can obtain a sufficiently accurate result. However, for
full-chip leakage current analysis, the MC estimator is too expensive; a more efficient
method with good accuracy is needed.
Several techniques exist for improving the accuracy of Monte Carlo evaluation
of finite integrals. In these techniques, the goal is to construct an estimator with a
reduced variance for a given, fixed number of samples. In other words, the improved
estimator can provide the same accuracy as the standard Monte Carlo estimator,
while needing considerably fewer samples. This is desirable because computing the
value of g(X_i) is typically costly.
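A minimal sketch of the plain MC estimator makes the cost argument concrete: the standard error of the estimated mean shrinks only as 1/√M, so halving the error quadruples the number of (expensive) chip evaluations. The log-normal chip sampler in the usage example below is a stand-in for a real leakage evaluation:

```python
import numpy as np

def mc_leakage(sample_chip_leakage, M, rng):
    """Plain Monte Carlo: draw M chip samples and return the estimated
    mean leakage together with the estimator's standard error, which
    scales as sigma / sqrt(M)."""
    samples = np.array([sample_chip_leakage(rng) for _ in range(M)])
    mean = samples.mean()
    std_err = samples.std(ddof=1) / np.sqrt(M)
    return mean, std_err
```

Variance-reduction techniques aim to shrink `std_err` for the same M, which is exactly the improvement described in the text.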
Since the number of gates on an entire chip is very large and every gate has its
own variational parameters, the resulting number of random variables is very large.
For greater efficiency, the grid-based method partitions a chip into several grids and
assigns all the gates in one grid the same parameters.
A full-chip SLA method considering spatial correlations in the intra-die and
inter-die variations was proposed in [13]. This method introduces a grid-based
partitioning of the circuits to reduce the number of variables at some loss of accuracy. A
projection-based approach has been proposed in [95] to speed up the leakage anal-
ysis, where Krylov-subspace-based reduction has been performed on the coefficient
matrices of second-order expressions. This method assumes independent random
variables after a preprocessing step such as PCA. However, owing to the large
number of random variables involved (10^3 to 10^6), the PCA-based preprocessing can be
very expensive. The work in [65] proposes a linear-time method to compute
the mean and variance of full-chip leakage currents by exploiting the symmetry
of an existing exponential spatial correlation formula. The method only
considers subthreshold leakage, and it requires the chip cells and modules to be
partitioned into a regular grid with similar uniform fitting functions, which is
typically impractical. In this work, both the subthreshold leakage and the gate oxide
leakage of only the dominant input states are considered, as in (3.4). Here we consider
only intra-die variation of the parameters; the extension to inter-die variation is
straightforward, as shown at the end of this subsection.
As shown in (3.6), the total leakage current of a chip is the sum of correlated
leakage components, which can be approximated as a log-normal using Wilkinson's
method [2]. A sum of t log-normals, S = \sum_{i=1}^{t} e^{Y_i}, is approximated as the log-normal
e^Z, where Z ~ N(μ_z, σ_z). In Wilkinson's approach, the mean value and standard
deviation of Z are obtained by matching the first two moments, u_1 and u_2,
of \sum_{i=1}^{t} e^{Y_i} as follows:

u_1 = E(S) = e^{μ_z + σ_z^2/2} = \sum_{i=1}^{t} e^{μ_{y_i} + σ_{y_i}^2/2},   (3.20)

u_2 = E(S^2) = e^{2μ_z + 2σ_z^2} = \sum_{i=1}^{t} e^{2μ_{y_i} + 2σ_{y_i}^2}
      + 2 \sum_{i=1}^{t-1} \sum_{j=i+1}^{t} e^{μ_{y_i} + μ_{y_j}} e^{(σ_{y_i}^2 + σ_{y_j}^2 + 2 r_{ij} σ_{y_i} σ_{y_j})/2},   (3.21)

where r_{ij} is the correlation coefficient of Y_i and Y_j. Solving (3.20) and (3.21) for μ_z and σ_z
yields

μ_z = 2 ln u_1 − (1/2) ln u_2,   (3.22)

σ_z^2 = ln u_2 − 2 ln u_1.   (3.23)
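Wilkinson's moment matching (3.20)–(3.23) translates directly into code. The sketch below is a straightforward implementation; note the pair-by-pair double loop over correlated terms:

```python
import numpy as np

def wilkinson(mu_y, sigma_y, r):
    """Approximate S = sum_i exp(Y_i), Y_i ~ N(mu_y[i], sigma_y[i]),
    corr(Y_i, Y_j) = r[i, j], by a single log-normal exp(Z),
    Z ~ N(mu_z, sigma_z), by matching the first two moments."""
    mu_y, sigma_y = np.asarray(mu_y), np.asarray(sigma_y)
    t = len(mu_y)
    u1 = np.sum(np.exp(mu_y + sigma_y**2 / 2))                      # (3.20)
    # (3.21): E[S^2] = sum_i E[e^{2Y_i}] + 2 sum_{i<j} E[e^{Y_i+Y_j}]
    u2 = np.sum(np.exp(2 * mu_y + 2 * sigma_y**2))
    for i in range(t - 1):
        for j in range(i + 1, t):
            u2 += 2 * np.exp(mu_y[i] + mu_y[j]) * np.exp(
                (sigma_y[i]**2 + sigma_y[j]**2
                 + 2 * r[i][j] * sigma_y[i] * sigma_y[j]) / 2)
    mu_z = 2 * np.log(u1) - 0.5 * np.log(u2)                        # (3.22)
    sigma_z2 = np.log(u2) - 2 * np.log(u1)                          # (3.23)
    return mu_z, np.sqrt(sigma_z2)
```

The nested loop is the source of the quadratic cost discussed in the text: with N correlated components, computing u_2 takes O(N^2) work.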
From the above formulas, we can see that a pair-by-pair computation must be done
for all correlated pairs of variables, i.e., for all i, j such that r_{ij} ≠ 0,
which leads to a very high computation cost. First, the leakage currents of
different gates are correlated because of the spatial correlation of L. Second, the I_{sub}
and I_{gate} associated with the same NMOS transistor are correlated. Third, the I_{sub}
currents in the same transistor stack are also correlated. If there are N gates in the circuit,
the complexity of computing the sum is O(N^2), which is far from practical
for large circuits. Therefore, the grid-based method uses several approximations
to reduce the time complexity. In the grid-based method, gates in the same grid
have the same parameter values. For example, let I_{sub,i} be the subthreshold leakage
currents of Gate_i (i = 1, …, t) under the same input vector, and assume that these
gates are all in the same grid k. Then

I_{sub,i} = α_i e^{Y_0 + β_0 dL_k + β_1 dT_{ox,i}},   (3.24)

where α_i, β_0, and β_1 are the fitting coefficients. Since we assume that L is spatially
correlated and T_{ox} is uncorrelated, all of the I_{sub,i} in the same grid share the
same variable dL_k but have different dT_{ox} values. Then, the sum of the leakage terms
I_{sub,i} in grid k is given by

\sum_{i=1}^{t} I_{sub,i} = e^{Y_0 + β_0 dL_k} \sum_{i=1}^{t} α_i e^{β_1 dT_{ox,i}}.   (3.25)
Note that the second part of the above expression is a sum of independent log-
normal variables, which is a special case for the sum of correlated log-normal
variables. By using Wilkinson’s method, this can be computed in linear time.
Therefore, for gates of the same type with the same input state in the same grid,
the time complexity is only linear, and we can approximate the sum of leakage
of all gates by a log-normal variable which can be superposed in the original
expression. Similarly, Igate of different gates in the same grid can be calculated
through summation in linear time and can be approximated by a log-normal variable.
Now, if the chip is divided into n grids, the number of correlated leakage
components in each grid reduces to a small constant c determined by the cell library.
As a result, the total number of correlated log-normals to sum is no more than c·n.
In general, the number of grids is set to be substantially smaller than the number of
gates in the chip and can be regarded as a constant. Therefore, the complexity of
summing the log-normals in the grid-based method is reduced from O(N^2) to the
substantially smaller O(n^2).
As we discussed before, leakage currents of different gates are correlated due to
spatially correlated parameters such as transistor gate channel length. Furthermore,
Isub and Igate are correlated within the same gate. In addition, leakage currents under
different input vectors of the same gate are correlated because they are sensitive to
the same parameters of the gate, regardless of whether or not these are spatially
correlated. We must carefully predict the distribution of total leakage in the circuit,
and the correlations of these leakage currents must be correctly considered when
they are summed up.
As we mentioned before, the leakage currents that arise from the same leakage
mechanisms in the same grid from the same entry of the LUT are merged into
a single log-normally distributed leakage component to reduce the number of
correlated leakage components to sum. Let I_{1sum} and I_{2sum} be two merged sums,
which correspond to the subthreshold and gate oxide leakage components in
the same grid, respectively. These can be calculated as

I_{1sum} = e^{Y_{1,0} + β_0 dL} \sum_{i=1}^{t} α_i e^{β_1 dT_{ox,i}} = e^{Y_{1,0} + β_0 dL} e^{ξ},   (3.26)

I_{2sum} = e^{Y_{2,0} + β'_0 dL} \sum_{i=1}^{t'} α'_i e^{β'_1 dT'_{ox,i}} = e^{Y_{2,0} + β'_0 dL} e^{η},   (3.27)

where e^{ξ} and e^{η} are the log-normal approximations of the sums of independent log-normals,
\sum_{i=1}^{t} α_i e^{β_1 dT_{ox,i}} and \sum_{i=1}^{t'} α'_i e^{β'_1 dT'_{ox,i}}, in I_{1sum} and I_{2sum}, respectively, as
described in (3.25).
Note that \sum_{i=1}^{t} α_i e^{β_1 dT_{ox,i}} and \sum_{i=1}^{t'} α'_i e^{β'_1 dT'_{ox,i}} may be correlated, since the
same gate can contribute both subthreshold and gate leakage. Therefore, e^{ξ} and e^{η} are
correlated, and we need to derive the correlation between ξ and η. Since the T_{ox}
values of different gates are independent, we can easily compute the covariance
cov(\sum_{i=1}^{t} α_i e^{β_1 dT_{ox,i}}, \sum_{i=1}^{t'} α'_i e^{β'_1 dT'_{ox,i}}).
Since e^{ξ} and e^{η} are approximations of these two sums,
it is reasonable to assume that

cov(e^{ξ}, e^{η}) = cov( \sum_{i=1}^{t} α_i e^{β_1 dT_{ox,i}}, \sum_{i=1}^{t'} α'_i e^{β'_1 dT'_{ox,i}} ).   (3.31)

At the same time, the mean values and standard deviations of ξ and η are already
known from the approximations; therefore, cov(ξ, η) is easily computed.
We can extend the framework for statistical computation of full-chip leakage
considering spatial correlations in intra-die variations of parameters to handle inter-
die variation. For each type of parameter, a global random variable can be applied
to all gates in the circuit to model the inter-die effect. In addition, this framework
is general and can be used to predict the circuit leakage under other parameter
variations or other leakage components. However, if the Gaussian or log-normal
assumption does not hold, the grid-based method cannot be used to estimate the
full-chip leakage.
Recent work in [5] presents a unified approach for statistical timing and leakage
current analysis using quadratic polynomials. However, this method considers only
the long-channel effects and ignores the short-channel effects (ignoring channel-length
variables) in the gate leakage models. The coefficients of the orthogonal PC
at the gate level are computed directly via inner products evaluated by the efficient Smolyak
quadrature method. The method also tries to reduce the number of variables via
moment matching, which further speeds up the quadrature process at the
cost of larger errors.
The projection-based method computes the moments of the statistical
leakage via moment-matching techniques, which are well developed in the area of
interconnect model order reduction [177]. In the projection-based method, the quadratic
models in (3.17) and (3.19) are used to guarantee accuracy. Li et al. [97] proposed
a projection-based approach (PROBE) to reduce the quadratic modeling cost. In
a quadratic model, the main difficulty is computing all elements of the quadratic
coefficient matrix. Take A_{Chip} in (3.19), for example. In most
practical cases, A_{Chip} is numerically rank deficient. As a result, the full matrix A_{Chip} can be
approximated by a low-rank matrix Ã_{Chip} that minimizes ||A_{Chip} − Ã_{Chip}||_F.
Here, ||·||_F denotes the Frobenius norm, i.e., the square root of the sum of
the squares of all matrix elements. Li et al. [97] proved that the optimal rank-R
approximation is

Ã_{Chip} = \sum_{r=1}^{R} λ_{Chip,r} P_{Chip,r} P_{Chip,r}^T.   (3.32)
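For a symmetric matrix, the optimal rank-R approximation in the Frobenius norm keeps the R eigenpairs of largest eigenvalue magnitude (the Eckart–Young result behind (3.32)). The sketch below uses a dense eigendecomposition for clarity; it illustrates the approximation itself, not the PROBE algorithm, which avoids forming and factorizing the full matrix:

```python
import numpy as np

def best_rank_R(A, R):
    """Best rank-R Frobenius-norm approximation of a symmetric matrix A:
    keep the R eigenpairs of largest |eigenvalue|, mirroring the
    decomposition form in (3.32). Dense and illustrative only."""
    lam, P = np.linalg.eigh(A)                 # A = P diag(lam) P^T
    idx = np.argsort(np.abs(lam))[::-1][:R]    # dominant eigenvalues
    return sum(lam[r] * np.outer(P[:, r], P[:, r]) for r in idx)
```

The resulting error is exactly the root-sum-square of the discarded eigenvalues, which is why a numerically rank-deficient A_{Chip} can be compressed with little loss.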
5 Summary
The traditional methods reviewed in this chapter either suffer from high computing
costs (the MC method), work only for variations with strong spatial correlations
(the grid-based method), or make strong assumptions about the parameter variations
(no spatial correlation in the projection-based method).
In the following chapters, we show how those problems can be resolved or
mitigated. We will mainly present two statistical leakage analysis methods: the
spectral-stochastic-based method with variable reduction techniques and the virtual
grid-based approach.
Chapter 4
Statistical Leakage Power Analysis by Spectral
Stochastic Method
1 Introduction
orthogonal polynomials of all gates (their coefficients). The spatial correlations are
taken care of by PCA or ICA, and at the same time, the number of random variables
can also be substantially reduced in the presence of strong spatial correlations during
the decomposition process. Numerical examples on the PDWorkshop91 benchmarks
in a 45 nm technology show that the presented method is about 10× faster than
the recently presented method [13], with consistently better accuracy.
Input: standard cell library, netlist, placement information of the design, σ of L and T_{ox}
Output: analytic expression of the full-chip leakage currents in terms of Hermite
polynomials
1. Generate the fitting parameter matrices a_{sub} and a_{gate} of I_{sub} and I_{gate} in (3.1) and (3.2) for
each type of gate (after a SPICE run on each input pattern) (Sect. 2).
2. Perform PCA to transform and reduce the original parameter variables in ΔL into
independent random variables ΔL_k (Sect. 2.2).
3. Generate the level-2 Smolyak quadrature point set with the corresponding weights.
4. Calculate the coefficients of the Hermite polynomials of I_{sub,k} and I_{gate,k} for the final leakage
analytic expression of each gate using (4.9) and (4.10).
5. Calculate the analytic expression of the full-chip leakage current by simple polynomial
additions, and compute μ_{leakage}, σ_{leakage}, and the PDF and CDF of the leakage current if required.
the gate-level analytic leakage current expressions and covariances. The final part
(step 5) computes the final leakage expressions by simple polynomial additions and
calculates other statistical information.
where n is the total number of gates on the whole chip, and δL_{inter} and δT_{ox,inter}
represent the inter-die (global) variations. In total, we have 2n + 2 random variables.
There exist correlations among the ΔL of different gates, represented by the
covariance matrix cov(ΔL_i, ΔL_j) computed by (3.12).
The first step is to perform PCA on L to get a set of independent random variables
L′ = [L′_1, L′_2, …, L′_n], where L = P L′ and P = {p_{ij}} is the n × n principal
component coefficient matrix. In this process, singular value decomposition (SVD)
is applied to the covariance matrix, and the singular values are arranged in decreasing
order, which means that the elements of L′ are arranged in decreasing weight order.
The number of elements in L′ can then be reduced by keeping only the dominant
part of L′, [L′_1, L′_2, …, L′_k] (e.g., components whose weight is larger than 1%), where k
is the number of reduced random variables. Then every element L′_i of L′ can be
represented by an orthonormal Gaussian random variable ξ_i:

L′_i = μ_i + σ_i ξ_i,   (4.3)

where μ_i and σ_i are the mean value and standard deviation of L′_i, and L can be
represented as

\begin{pmatrix} L_1 \\ L_2 \\ \vdots \\ L_n \end{pmatrix} =
\begin{pmatrix} p_{11} & \cdots & p_{1k} \\ p_{21} & \cdots & p_{2k} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nk} \end{pmatrix}
\begin{pmatrix} \mu_1 + \sigma_1 \xi_1 \\ \mu_2 + \sigma_2 \xi_2 \\ \vdots \\ \mu_k + \sigma_k \xi_k \end{pmatrix} + \delta L_{inter}.   (4.4)
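This SVD-based reduction can be sketched numerically, assuming the covariance matrix is available explicitly; the 1% weight threshold is the illustrative figure from the text, and the names are hypothetical:

```python
import numpy as np

def pca_reduce(cov, weight_threshold=0.01):
    """Keep only the dominant principal components of a covariance
    matrix, as in the PCA step of the flow; the 1% weight threshold
    is the illustrative value mentioned in the text."""
    U, s, _ = np.linalg.svd(cov)      # singular values in decreasing order
    weights = s / s.sum()
    k = int(np.sum(weights > weight_threshold))
    # Columns of P map k independent N(0,1) variables back to the
    # correlated space: P @ P.T reproduces cov approximately.
    P = U[:, :k] * np.sqrt(s[:k])
    return P, k
```

For strongly correlated covariances the dominant weight concentrates in a few components, so k comes out far smaller than the number of original variables while the covariance is still reproduced accurately.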
For [T_{ox,1}, T_{ox,2}, …, T_{ox,n}], δL_{inter}, and δT_{ox,inter}, we can also use standard
Gaussian variables:

ΔT_{ox,j} = σ_{ox} ξ_{ox,j},  δL_{inter} = σ_{L,inter} ξ_{L,inter},  δT_{ox,inter} = σ_{ox,inter} ξ_{ox,inter},

where ξ_{ox,j}, ξ_{L,inter}, and ξ_{ox,inter} are independent orthonormal Gaussian random
variables. As a result, we can represent L and T_{ox} by k + n + 2 independent
orthonormal Gaussian random variables. Then I_{sub}(L, T_{ox}) and I_{gate}(L, T_{ox}) can be
modeled as I_{sub}(ξ) and I_{gate}(ξ), respectively.
However, among the k + n + 2 variables, only the k + 2 variables related to the
channel lengths and the inter-die variations are shared across gates; the n variables
T_{ox,i} of the individual gates are independent. As a result, for the j-th gate, we only
have k + 3 independent variables, and the corresponding variable vector is

ξ_{g,j} = [ξ_1, …, ξ_k, ξ_{L,inter}, ξ_{ox,inter}, ξ_{ox,j}]^T.
For each gate, we first need to represent the leakage currents in order-2 Hermite
polynomials, as shown below for both the subthreshold and gate leakage currents,
I_{sub}(ξ_{g,j}) and I_{gate}(ξ_{g,j}):

I_{sub}(ξ_{g,j}) = \sum_{i=0}^{P} I_{sub,i,j} H_i^2(ξ_{g,j}),   I_{gate}(ξ_{g,j}) = \sum_{i=0}^{P} I_{gate,i,j} H_i^2(ξ_{g,j}),   (4.8)

where the H_i^2(ξ_{g,j}) are order-2 Hermite polynomials. I_{sub,i,j} and I_{gate,i,j} are then
computed by the numerical Gaussian quadrature method discussed in Sect. 3.3 of
Chap. 2. Let S be the size of the Z-dimensional second-order (level-2) quadrature point
set, with Z = k + 3. Then I_{sub,i,j} and I_{gate,i,j} can be computed as follows:

I_{sub,i,j} = \sum_{l=1}^{S} I_{sub}(ξ_l) H_i^2(ξ_l) w_l / ⟨(H_i^2(ξ_{g,j}))^2⟩,   (4.9)

I_{gate,i,j} = \sum_{l=1}^{S} I_{gate}(ξ_l) H_i^2(ξ_l) w_l / ⟨(H_i^2(ξ_{g,j}))^2⟩,   (4.10)

where I_{sub}(ξ_l) and I_{gate}(ξ_l) are computed using (3.1) and (3.2).
As a result, the coefficients of the i-th Hermite polynomial at the j-th gate can be added
directly:

I_{leakage,i,j} = I_{sub,i,j} + I_{gate,i,j}.   (4.11)

After the leakage currents are calculated for each gate, we can proceed to compute
the leakage current of the whole chip as follows:

I_{leakage}(ξ) = \sum_{j=1}^{n} ( I_{sub}(ξ_{g,j}) + I_{gate}(ξ_{g,j}) ).   (4.12)

The summation is done for each coefficient of the Hermite polynomials; we then obtain
the analytic expression of the final leakage current in terms of the ξ.
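A one-dimensional analogue of the projection in (4.9)/(4.10) and of the subsequent moment extraction makes the mechanics concrete. The sketch below expands a single log-normal term f(ξ) = e^(0.1ξ) in probabilists' Hermite polynomials using Gauss–Hermite quadrature; the real flow is multidimensional and uses sparse Smolyak grids rather than this full 1-D rule:

```python
import numpy as np

def pc2_coeffs(f, npts=8):
    """Project f(xi), xi ~ N(0,1), onto the probabilists' Hermite
    polynomials He_0 = 1, He_1 = xi, He_2 = xi^2 - 1 using
    Gauss-Hermite quadrature (a 1-D analogue of (4.9)/(4.10))."""
    x, w = np.polynomial.hermite_e.hermegauss(npts)
    w = w / np.sqrt(2.0 * np.pi)        # normalize to E[.] under N(0,1)
    H = [np.ones_like(x), x, x**2 - 1.0]
    norms = [1.0, 1.0, 2.0]             # <He_i^2> = i!
    return [float(np.sum(f(x) * H[i] * w) / norms[i]) for i in range(3)]

def pc2_mean_var(a):
    """Mean is the constant coefficient; the variance sums
    a_i^2 * <He_i^2> over i >= 1."""
    return a[0], a[1]**2 * 1.0 + a[2]**2 * 2.0
```

For small variations the order-2 expansion already reproduces the exact log-normal mean and variance very closely, which is the accuracy argument behind the quadratic models.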
We can then obtain the mean value, variance, PDF, and CDF of the leakage
current very easily. For instance, the mean value of the full-chip leakage current is
the zeroth-order coefficient I_{leakage,0}, and the variance is \sum_{i=1}^{P} I_{leakage,i}^2 ⟨(H_i^2(ξ))^2⟩.
To analyze the time complexity, one typically does not count the precharacterization
cost of step 1 in Fig. 4.2. The PCA step (step 2), which essentially applies SVD to the
covariance matrix, costs O(nk^2) if we are only interested in the
first k dominant singular values; this is the case for strong spatial correlation.
In step 3, we need to compute the weights of the level-2, (k + 3)-dimensional
Smolyak quadrature point set. For a quadratic model with k + 3 variables, the number
of Smolyak quadrature points is about (k + 3)^2, so the time cost of generating the
point set is O((k + 3)^2).
In step 4, we need to call (3.1) and (3.2) S times for each gate, and each call
evaluates k + 3 variables in the Hermite polynomials. The computing cost
of this step is O(n(k + 3)S), where n is the number of gates. After the
leakage currents are computed for each gate, it takes O(n(k + 3)) to compute the
full-chip leakage current.
The total computing cost is O(nk^2 + (k + 3)^2 + n(k + 3)S + n(k + 3)). For second-
order Hermite polynomials, S ∝ k^2, so the time complexity becomes O(nk^3). If
k ≪ n (the case of strong spatial correlation), we end up with a time complexity
linear in the number of gates, O(n). In sub-90 nm VLSI technologies, the spatial
correlation is strong and becomes stronger as technology scales down, which ensures
that the method achieves this favorable time complexity.
3 Numerical Examples
The presented method has been implemented in Matlab 7.4.0. For comparison
purposes, we also implemented the grid-based method in [13] and the pure MC method.
All experiments were carried out on a Linux system with quad Intel Xeon
CPUs at 2.99 GHz and 16 GB of memory. The initial results of this chapter were
published in [155, 157].
The methods for full-chip statistical leakage estimation are tested on circuits in
the PDWorkshop91 benchmark set. The circuits are synthesized with Nangate Open
Cell Library, and the placement is from MCNC [106]. The technology parameters
come from the 45 nm FreePDK Base Kit and PTM models [139].
Table 4.1 shows the detailed parameters for the gate length and gate oxide thickness
variations. Here we choose two variation settings. The last column of Table 4.1
shows the standard deviation (σ) of each variation. The 3σ values of the parameter
variations for L and T_{ox} are set to 12% of the nominal parameter values, of which
inter-die variations constitute 20% and intra-die variations 80% (Case 1), or inter-die
variations constitute 50% and intra-die variations 50% (Case 2). The parameter L is
modeled as a sum of correlated sources of variation, and the gate oxide thickness T_{ox}
is modeled as an independent source of variation. The same framework can easily be
extended to include other parameters of variation. Both L and T_{ox} in each gate are
modeled as Gaussian random variables.
Fig. 4.3 Distribution of the total leakage currents of the presented method, the grid-based method,
and the MC method for circuit SC0 (process variation parameters set as Case 1). Reprinted with
permission from [157] c 2010 Elsevier
Table 4.2 Comparison of the mean values of full-chip leakage currents among the three methods

Circuit  Gate  Grid  Variation  μ of I_leak (μA)        Errors (%)
name     #     #     setting    MC     [13]   New       [13]   New
SC0      125   4     Case 1     1.84   1.75   1.82      4.67   0.84
                     Case 2     1.84   1.75   1.82      4.85   0.87
SC2      1888  16    Case 1     29.98  28.88  29.70     3.65   0.91
                     Case 2     30.02  28.89  29.75     3.77   0.89
SC5      6417  64    Case 1     107.9  103.6  107.2     3.93   0.65
                     Case 2     107.9  103.6  107.2     3.9    0.65
method in [13], which is grid based, and the presented method is much faster than
the MC method. On average, the presented method achieves about a 16× speedup over the
grid-based method in [13]. We note that the method in [13] becomes faster as
fewer grids are used, but this can lead to large errors even with strong
spatial correlations.
Table 4.3 Comparison of the standard deviations of full-chip leakage currents among the three methods

Circuit  Variation  σ of I_leak (μA)        Errors (%)
name     setting    MC      [13]    New     [13]    New
SC0      Case 1     0.495   0.668   0.524   35.0    5.77
         Case 2     0.632   0.726   0.689   14.9    9.04
SC2      Case 1     8.606   10.86   8.798   26.2    2.23
         Case 2     10.71   12.03   11.36   12.33   6.13
SC5      Case 1     26.19   41.36   25.11   57.9    4.12
         Case 2     26.19   41.36   25.11   57.9    4.12
4 Summary
In this chapter, we have presented a gate-based method for analyzing the full-chip
leakage current distribution of digital circuits. The method considers both intra-
die and inter-die variations with spatial correlations. The new method employs
orthogonal polynomials and the multidimensional Gaussian quadrature method to
represent and compute the variational leakage at the gate level, and it uses the orthogonal
decomposition to reduce the number of random variables by exploiting the strong
spatial correlations of intra-die variations. The resulting algorithm compares very
favorably with the existing grid-based method in terms of both CPU time and
accuracy: the presented method achieves about a 16× speedup over [13] with
consistently better accuracy.
Chapter 5
Linear Statistical Leakage Analysis by Virtual
Grid-Based Modeling
1 Introduction
chip levels. In cases of medium and strong correlation, the presented method also
works in linear time by properly sizing the grid cells so that both the locality of
correlation and the accuracy are preserved.
Furthermore, we bring forth a novel characterization of the standard cell library (SCL)
for statistical leakage information, with the following observations: (1) The set of neighbor
cells is usually small (about 10) and depends only on relative position, not on the
absolute position on the chip. (2) As proved later, the number of neighbor cells
involved in our model is not related to the strength (level) of the spatial correlation.
(3) The collocation-based method is applied, and the variational leakage of a gate is
represented in analytic form in terms of the virtual random variables, which
gives the complete distribution. (4) The gate-level leakage distribution is related only to
the type of gate in an SCL. This statistical leakage characterization can be stored in
a LUT, which only needs to be built once per SCL, and the full-chip leakage of
any chip can then be calculated easily by summing up the corresponding items in the LUT.
The main highlights of the presented algorithm are as follows:
1. We apply the virtual grid-based model for spatial correlation modeling in the
statistical leakage analysis, making the resulting algorithm linear time for the
first time for all the spatial correlation (weak or strong) cases.
2. A new characterization in SCL for statistical leakage analysis has been used.
The corresponding algorithm can accelerate full-chip statistical analysis for all
spatial correlation conditions (from weak to strong). To the best knowledge of
the authors, the presented approach is the first published algorithm which can
guarantee O.N / time complexity for all spatial correlation conditions.
3. In addition, an incremental algorithm has been applied. When a few local changes
are made, only a small circuit (including the changed gates) is involved in
the updating process. Our numerical examples show that the incremental analysis
can achieve a 10× further speedup compared with the library-enabled full-chip
analysis approach.
In addition to the main highlights, we also present a forward-looking way to
extend the presented method to handle runtime leakage analysis. In order to estimate
maximum runtime leakage, the input state under the maximum leakage input vector
needs to be chosen. For transient runtime leakage simulation, every time the
input vector changes, the input states of some gates on the chip are updated.
The incremental technique therefore makes efficient runtime leakage simulation
possible. More details are given in Sect. 4.6.
Numerical examples on the PDWorkshop91 benchmarks on a 45 nm technology
show that the presented method using novel characterization in SCL is on average
two orders of magnitude faster than the recently proposed method [13] with similar
accuracy. In weak-correlation situations, even greater speedups are observed. We remark
that the experiment in this chapter is based on idle-time leakage. However, the
linear-time algorithm can also be applied to runtime leakage by selecting different
input states under certain input vectors. Notice that glitch events are ignored in
the simplified discussion, which may cause estimation errors [99], and need to be
considered in the future work. More details are discussed in Sect. 4.6.
2 Virtual Grid-Based Spatial Correlation Model
The virtual grid-based model is based on the observation that the leakage current of
a gate in the presence of spatial correlation only correlates to its neighbor area. If
we can introduce a set of uncorrelated variables to model the localized correlation,
computing the leakage current of one gate can be done in a constant time by only
considering its neighbor area. Hence, total full-chip statistical leakage currents can
then be computed by simply adding all the gate leakage currents together in terms
of the virtual set of variables in linear time. Notice that the virtual random variables
in different grids are always independent, which is different from traditional grid-
based model. This idea was proposed recently for fast statistical timing analysis [15]
to address the computational efficient modeling for weak spatial correlation, which
is similar to the PCA-based approach [155], but with a different set of independent
variables.
Specifically, the chip area is still divided into a set of grid cells. When the spatial
correlation is weak enough to be ignored, the cell can become so small that one cell
only contains one gate. Then we introduce a “virtual” random variable for each cell
for one source of process variation.
These virtual random variables are independent and serve as the basis for the
statistical leakage current calculation in the presence of spatial correlation. Then we
can express the original physical random variable of a gate in a grid cell as a linear
combination of the virtual random variables of its own cell as well as its nearby
neighbors. Since the virtual random variable of each cell has a specific location on the
chip, such a location-dependent correlation model still retains the important spatial physical
meaning (in contrast to PCA-based models). The grid partition can be of any
shape. We use hexagonal grid cells [15] in this chapter since they have minimum
anisotropy for 2D space.
Here we define the distance between the centers of two directly neighboring grid cells
as the grid length d_c. Gates located in the same cell have strong correlation (larger
than a given threshold value ρ_high) and are assumed to have the same parameter
variations. The "spatial correlation distance" d_max is defined as the minimum
distance beyond which the spatial correlation between any two cells is sufficiently
small (smaller than a given threshold value ρ_low) for us to ignore it.
In this model, the j-th grid cell is associated with one virtual random variable
ξ_j ~ N(0, 1), which is independent of all other virtual random variables. ΔL_j
can then be expressed in terms of its k closest neighbor cells. We introduce the concept
of the correlation index neighbor set T(j) for cell j; the corresponding variable
vector collects the virtual random variables of the cells in T(j).
For example, with the hexagonal grid partition shown in Fig. 5.1, if T(i)
for each cell is defined as its closest k = 7 neighbor cells, then the ΔL located at
cell (x_i, y_i) can be represented as a linear combination of the seven virtual random
variables located in its neighbor set. Taking ΔL_1 in Fig. 5.1 for instance, we have
ΔL_1 = α_1 ξ_1 + α_2 ξ_2 + ⋯ + α_7 ξ_7.
This concept of virtual random variable helps to model the spatial correlation.
Two cells close to each other will share more common spatial random variables,
which means the correlation is strong. On the other hand, two cells physically far
away from each other will share fewer or no common spatial random variables. In
this way, the spatial correlation is modeled as a homogeneous and isotropic random
field, and the spatial correlation is related only to distance. That is to say, the spatial
correlation can be fully described by ρ(d) in (3.12); d_max is the distance beyond
which ρ(d) becomes small enough to be approximated as zero.
Since ρ(d) is only a function of distance, the number of unique distance values
between two correlated grid cells equals the number of unique element values in
the correlation matrix Ω_N. From Fig. 5.1, the spatial correlation distance equals the
distance between cell 1 and cell 10, d_max = √7 d_c, and there are only three unique
correlation distances, d_1 to d_3. Correspondingly, there are only three unique elements
in Ω_N, not counting two special values: 0 for d ≥ d_max and 1 for distances within
one cell.
Furthermore, the same correlation index can be used for all grid cells, and the coefficient α_k should be the same for the same distance because of the homogeneity and isotropy of the spatial correlation. For the cell marked 1 in Fig. 5.1, there are only two unique values among the seven coefficients, i.e., we set p_0 = α_1 and p_1 = α_i, i = 2, 3, ..., 7. In other words, we have

ΔL_1 = p_0 ξ_1 + p_1 (ξ_2 + ξ_3 + ... + ξ_7).  (5.3)

In this way, although there are seven random variables involved in the neighbor set, there are only two unknown coefficients left in the linear function in (5.3) due to the symmetry property of the hexagonal partition.
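With the symmetry reduction to p_0 and p_1, the fit collapses to a tiny system: one unit-variance constraint plus the target correlations. The sketch below uses a made-up overlap model for how the correlations at the unique distances depend on p_0 and p_1 (the real model follows the hexagonal geometry of Fig. 5.1); it only illustrates the shape of the solve:

```python
import numpy as np

# Unit variance: p0^2 + 6*p1^2 = 1.  Assumed (hypothetical) overlap model for
# the correlations at the two unique distances:
#   rho(d2) = p0*p1 + p1^2,   rho(d1) = 2*(p0*p1 + p1^2)
r = 0.25                                  # target rho(d2), made up

# Substituting p0 = (r - p1^2)/p1 into the variance constraint gives
#   7*x^2 - (1 + 2*r)*x + r^2 = 0   with x = p1^2  (derived by hand)
roots = np.roots([7.0, -(1.0 + 2.0 * r), r * r])
x = min(v.real for v in roots if abs(v.imag) < 1e-12 and v.real > 0)
p1 = float(np.sqrt(x))
p0 = float((r - x) / p1)
```

In general the system is overdetermined (more unique distances than unknown coefficients) and is solved in a least-squares sense, as the chapter's step 2 does.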
ΔL = P_{N,N} ξ,  (5.5)

where N is the number of grid cells and ξ = [ξ_1, ξ_2, ..., ξ_N]. According to (5.2), the correlation index set contains only k spatial random variables, which is a very small fraction of the total spatial random variables. As a result, P_{N,N} is a sparse matrix. Every gate is only concerned with k virtual random variables, which carry specific location information.
Fundamentally, the PCA-based method performs a similar process and has a similar transformation matrix between the original and new sets of variables:

ΔL = V_{n,n} ξ′.  (5.6)
In this section, we will present the new full-chip statistical leakage analysis method.
We first introduce the overall flow of the presented method and highlight the major
computing steps. The presented algorithm flow is summarized in Fig. 5.2.
The presented algorithm consists of three major parts. The first part (steps 1 and 2) is precharacterization. Step 1 builds the analytic leakage expressions (3.1) and (3.2) for each type of gate, which only needs to be done once for an SCL. Step 2 deals with a small-sized nonlinear overdetermined system, which can be solved with any least-squares optimization algorithm. The second part (step 3) generates a
small set of independent virtual random variables and builds the analytic leakage
current expressions and covariances for each gate on top of the new random
variables. The final part (step 4) computes the final full-chip leakage expressions by
simple polynomial additions. From the final expressions, we can calculate important
statistical information (like mean, variance, and even the whole distributions). In
the following, we briefly explain some important steps.
I_gate(ξ_{grid_j}) = Σ_{i=0}^{P} I_gate,i,j H_i(ξ_{grid_j}).  (5.8)
After the leakage currents are calculated for each gate, we can proceed to compute
the leakage current for the whole chip as follows:
I_chip(ξ) = Σ_{j=1}^{n} ( I_sub(ξ_{grid_j}) + I_gate(ξ_{grid_j}) ).  (5.9)
The summation is done for each coefficient of the Hermite polynomials. Then we obtain the analytic expression of the final leakage current in terms of ξ.
We can then obtain the mean value and variance of the full-chip leakage current very easily as follows:

μ_chip = I_chip,0,  (5.10)
σ_chip² = Σ_{i≥1} I_chip,i² ⟨H_i²⟩,  (5.11)

where I_chip,i is the leakage coefficient of the i-th Hermite polynomial of second order defined in (4.15). Since Hermite polynomials with orders higher than two contribute nothing to the mean value or standard deviation, second order is good enough for estimating μ_chip and σ_chip in (5.10) and (5.11).
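Given the Hermite coefficients, the statistics in (5.10)–(5.11) follow directly: for probabilists' Hermite polynomials He_i in one variable, ⟨He_i²⟩ = i!, so the mean is the zeroth coefficient and the variance is Σ_{i≥1} c_i² · i!. A one-variable sketch, cross-checked by Monte Carlo sampling (coefficient values are made up):

```python
import math
import numpy as np

def pc_mean_var(coeffs):
    """Mean/variance of sum_i c_i He_i(xi), xi ~ N(0, 1); <He_i^2> = i!."""
    mean = coeffs[0]
    var = sum(c * c * math.factorial(i) for i, c in enumerate(coeffs) if i > 0)
    return mean, var

c = [2.0, 0.5, 0.1]              # hypothetical PC coefficients
mean, var = pc_mean_var(c)       # mean = 2.0, var = 0.25*1! + 0.01*2! = 0.27

# Monte Carlo cross-check using He_0 = 1, He_1 = x, He_2 = x^2 - 1
xi = np.random.default_rng(0).standard_normal(200_000)
samples = c[0] + c[1] * xi + c[2] * (xi**2 - 1)
```

In the multi-dimensional case the norms are products of factorials over the multi-index, but the pattern (mean from the constant term, variance from the remaining squared coefficients) is the same.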
To analyze the time complexity, one typically does not count the precharacterization cost of step 1 in Fig. 5.2, and the time cost of step 2 is negligible compared to the following steps. In step 3, we need to compute the weights of the level-2 k-dimensional Smolyak quadrature point set. For a quadratic model with k + 3 variables, the number of Smolyak quadrature points is S = O(k²) based on the discussion in Sect. 3.1.
So the time cost for generating the Smolyak quadrature point set is O(k²). In step 4, we need to call (3.1) and (3.2) S times for each gate. In each call, we need to compute k + 3 variables in the Hermite polynomials. The computational cost for the two steps is O(n·k·S), where n is the number of gates. After the leakage currents are computed for each gate, it takes O(n(k + 3)) to compute the full-chip leakage current.
For the second-order Hermite polynomials, S ∝ k², and k is the number of grid cells in the correlated neighbor index set, which is a very small constant. As a result, the time complexity of our approach becomes linear, O(n).
The spatial correlation in (5.2) is related to the distance between two grid cells. As a result, the neighbor set T(i) represents relative, not absolute, locations. In other words, a local neighbor set T and a local set of variables ξ_loc = [ξ_1, ..., ξ_k] can be shared by all the gates in all the cells.
The local neighbor set T and the coefficients in (5.2) are determined by d_max/d_c. From the specific spatial correlation model in (3.12) (as shown in Fig. 5.3),

d_max = η √(−ln ρ_low),   d_c = η √(−ln ρ_high),  (5.12)
Fig. 5.3 Relation between ρ(d) = exp(−d²/η²) and d/η, with threshold ρ_high at d_c/η and threshold ρ_low at d_max/η
then the ratio of the spatial correlation distance d_max over the grid length d_c becomes

d_max / d_c = √( ln ρ_low / ln ρ_high ).  (5.13)

Once the threshold values ρ_high and ρ_low are set, d_max/d_c does not depend on the correlation length η. This means we can determine the grid length once we know the spatial correlation distance for a specific correlation formula, at the cost of controlled errors (set by ρ_high and ρ_low).
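Equations (5.12)–(5.13) are easy to sanity-check in code, including the η-independence of the ratio; the threshold values below are illustrative:

```python
import math

def d_max(eta, rho_low):
    # spatial correlation distance per (5.12), rho(d) = exp(-d^2/eta^2)
    return eta * math.sqrt(-math.log(rho_low))

def d_c(eta, rho_high):
    # grid length per (5.12)
    return eta * math.sqrt(-math.log(rho_high))

def grid_ratio(rho_high, rho_low):
    # d_max/d_c per (5.13); independent of the correlation length eta
    return math.sqrt(math.log(rho_low) / math.log(rho_high))

# e.g. rho_high = 0.9, rho_low = 0.1 gives a ratio of about 4.67,
# so the grid needs roughly that many cells across one correlation distance.
ratio = grid_ratio(0.9, 0.1)
```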
Furthermore, (5.13) shows that the strength of the spatial correlation (strong or weak) has no effect on T and the virtual random variables used in our model. At the same time, the fitting parameters of static leakage in (3.1) and (3.2) are only related to the types of gates in a library. As a result, the coefficients of the Hermite polynomials for the leakage of one gate are only functions of the type of the gate, ρ_high, and ρ_low. Therefore, a simple LUT can be used to store the coefficients of the Hermite polynomials of each type of gate in the library. In other words, we do not need to compute the coefficients of the Hermite polynomials for each gate; we just look them up from the table instead. This makes a big difference, as the time complexity is reduced from O(n) to O(N), where n is the number of gates and N is the number of grid cells on chip.
For the LUT, supposing Q is the number of Hermite polynomials involved and m is the number of gate types in the library, it includes two matrices as follows:

C_S = { I_sub,q,j },   C_G = { I_gate,q,j }.  (5.14)

Here I_sub,q,j represents the coefficient of H_q for the j-th kind of gate in the library for subthreshold leakage, and I_gate,q,j represents the coefficient of H_q for the j-th kind of gate for gate-oxide leakage. C_S and C_G are Q × m matrices. Notice
the table needs to be built only once and can be reused for different designs with different conditions of spatial correlation, since the new algorithm is independent of the spatial correlation length and the circuit design information. In this way, the LUT actually builds a new characterization in the SCL, which captures the statistical leakage behavior of each standard cell.
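The LUT of (5.14) is just two Q × m coefficient matrices indexed by Hermite-polynomial index and gate type; a minimal sketch with made-up coefficient values:

```python
import numpy as np

Q, m = 4, 3                      # hypothetical: 4 Hermite terms, 3 gate types
rng = np.random.default_rng(1)
CS = rng.random((Q, m))          # subthreshold-leakage coefficients (made up)
CG = rng.random((Q, m))          # gate-oxide-leakage coefficients (made up)

def gate_coeffs(gate_type):
    """Look up, instead of recomputing, the Hermite coefficients of one gate type."""
    return CS[:, gate_type], CG[:, gate_type]

isub_c, igate_c = gate_coeffs(2)
```

Each full-chip analysis then reduces to column lookups and additions, which is what makes the O(N) complexity possible.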
The enhanced new algorithm consists of two parts. The first part is precharacterization, as shown in Fig. 5.4. We build analytic leakage current expressions for each kind of gate on top of a small set of independent virtual random variables. For fixed values of ρ_high and ρ_low and one library, a new characterization is added to the SCL by building a LUT, which stores the coefficients of the Hermite polynomials of I_sub and I_gate for the leakage analytic expressions of each kind of gate. This process only
Fig. 5.5 The flow of the presented algorithm using statistical leakage characterization in SCL
needs to be done once for one library, given ρ_high and ρ_low. Besides, it involves only a small-sized nonlinear overdetermined problem, which can be solved quickly with any least-squares algorithm.
When we deal with full-chip statistical leakage analysis, the coefficients of local
Hermite polynomials in the neighbor grid cell set for each cell can be simply
calculated by the LUT. After transferring the local coefficients to corresponding
global positions, we can compute the final full-chip leakage expressions by simple
polynomial additions. From the resulting expression, we can calculate other statis-
tical information (like mean, variance, and even the whole distributions). The new
algorithm flow is summarized in Fig. 5.5. In the following, we briefly explain some
important steps.
G_{N×m} = { g_{i,j} },  (5.15)

where g_{i,j} represents the number of gates of the j-th kind in the library located in the i-th grid cell. Then the coefficients of the local Hermite polynomials in the neighbor set for all the cells on chip can be easily calculated with the LUT as follows:

I_sub,loc(i) = Σ_{j=1}^{m} g_{i,j} C_S(:, j),   I_gate,loc(i) = Σ_{j=1}^{m} g_{i,j} C_G(:, j).  (5.16)
In order to get the full-chip leakage current, the local coefficients need to be transferred to their corresponding global positions:

T(i) = (x_i, y_i) + T.  (5.17)
For the i-th grid cell, the local set of random variables ξ_loc should be transferred to the corresponding positions in T(i). Therefore, I_sub,loc and I_gate,loc can be transferred to the corresponding global coefficients based on the global virtual random variable set ξ. For example, the coefficient of ξ_i in the i-th cell is

I_sub(ξ_i) = Σ_{k : i ∈ T(k)} I_sub,loc( T(k)_{(x_k, y_k)} ).  (5.18)
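The transfer in (5.17)–(5.18) amounts to a scatter-add: each cell adds its local neighbor-set coefficients into the global slots of its actual neighbor cells. A sketch on a 1-D ring of cells, where the offsets and coefficient values are stand-ins for the 2-D hexagonal neighbor set T:

```python
import numpy as np

N, k = 8, 3                          # N grid cells, k-cell neighbor set
offsets = [-1, 0, 1]                 # stand-in for the 2-D neighbor offsets T
rng = np.random.default_rng(2)
local = rng.random((N, k))           # local coefficients per cell (hypothetical)

global_coeffs = np.zeros(N)          # accumulated coefficient of each global xi_i
for cell in range(N):
    for off, c in zip(offsets, local[cell]):
        # scatter-add the local coefficient into the global neighbor position
        global_coeffs[(cell + off) % N] += c
```

Note the scatter-add conserves the total: every local coefficient lands in exactly one global slot.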
Next, we can proceed to compute the leakage current of the whole chip as follows:

I_chip(ξ) = Σ_i ( I_sub(ξ_i) + I_gate(ξ_i) ).  (5.19)

The summation is done for each coefficient of the global Hermite polynomials to obtain the analytic expression of the final leakage current in terms of ξ. We can then obtain the mean value, variance, PDF, and CDF of the leakage current very easily.
For instance, the mean value and variance of the full-chip leakage current are

μ_chip = I_chip,0,  (5.20)
σ_chip² = Σ_{i≥1} I_chip,i² ⟨H_i²⟩,  (5.21)

where I_chip,i is the leakage coefficient of the i-th Hermite polynomial of second order defined in (4.15).
During leakage-aware circuit optimizations, a few small changes might be made to the circuit, but we do not want to compute the whole-chip leakage from scratch again. In this case, incremental analysis becomes necessary. In this section, we show how this can be done in our look-up-table-based framework.
For brevity, we only consider the case where one gate is changed. However, the presented incremental approach can easily be extended to handle a number of gates.
Assume one gate located in the i-th grid cell is changed (e.g., a j-th type of gate is replaced by a (j+1)-th type), resulting in

I_chip^new = I_chip^old − I_grid_i^old + I_grid_i^new,  (5.22)

where I_chip^new and I_chip^old denote the full-chip leakage currents after and before the change, respectively, and I_grid_i^old and I_grid_i^new are the leakage currents in the i-th grid cell before and after the change, respectively.
As defined in (5.15), g_{i,j} in the gate mapping matrix represents the number of gates of the j-th kind in the library located in the i-th cell on a chip. Therefore, we can quickly generate the new gate mapping matrix G^new by updating only two elements in G^old:

g_{i,j}^new = g_{i,j}^old − 1,
g_{i,j+1}^new = g_{i,j+1}^old + 1.  (5.23)

Correspondingly, the local coefficients of the i-th cell are updated by swapping the two LUT columns involved:

I_sub,loc^new(i) = I_sub,loc^old(i) − I_sub,j + I_sub,j+1,  (5.24)
I_gate,loc^new(i) = I_gate,loc^old(i) − I_gate,j + I_gate,j+1,  (5.25)

where I_sub,j/(j+1) and I_gate,j/(j+1) are the j-th/(j+1)-th columns in C_S and C_G, respectively.
Compared to the whole chip, this small circuit is much simpler and only contains a few terms. Therefore, updating the leakage distribution using (5.24) and (5.25) is much cheaper than the full-blown chip leakage analysis.
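Because the chip leakage is linear in the Hermite coefficients, swapping one gate only touches the two LUT columns involved. A sketch with hypothetical matrix values, checking the incremental result against a full recomputation:

```python
import numpy as np

# Hypothetical LUT and gate mapping matrix, per (5.15)
Q, m, N = 4, 3, 5                      # Hermite terms, gate types, grid cells
rng = np.random.default_rng(3)
CS = rng.random((Q, m))                # subthreshold LUT columns (made up)
G = rng.integers(1, 5, size=(N, m))    # gate counts per cell

chip = CS @ G.sum(axis=0)              # toy full-chip coefficient vector

# Replace one gate of type j by type j+1 in cell i, per (5.23):
i, j = 2, 0
G[i, j] -= 1
G[i, j + 1] += 1

# Incremental update: subtract and add the two LUT columns, no full re-sum
chip_inc = chip - CS[:, j] + CS[:, j + 1]
```

The update is O(Q) regardless of chip size, which is where the large speedups reported in the numerical examples come from.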
Considering the statistical leakage analysis of a certain chip, for each grid cell we need to compute a weighted sum over the m kinds of gates in this cell for every coefficient in the neighbor set (of size k). For a quadratic model with k variables, the number of coefficients is about S ≈ k². So the time cost for this step is O(k² · m · N), where N is the number of cells. Transferring the local coefficients to their global positions and summing them up costs O(N). Next, it takes O(N) to compute the full-chip leakage current. Since k and m are very small constant numbers, the time complexity of our approach becomes O(N).
The leakage current for each input combination obtained in Sect. 2 of Chap. 3 can be used to estimate the average leakage in standby mode (idle) as well as the time-variant leakage in active mode (runtime).
For idle leakage analysis, we take the average of the leakage currents over all the input combinations to arrive at an analytic expression for each gate as in (5.26), in lieu of the dominant states used in [13]. The reason for keeping all input states is that technology downscaling narrows the gap between leakage under dominant states and the others; considering only one state in leakage analysis would lead to large errors compared to the simulation results:
I_sub^avg = Σ_{i ∈ all input states} P_i I_sub,i,
I_gate^avg = Σ_{i ∈ all input states} P_i I_gate,i,  (5.26)
where P_i is the probability of input state i, and I_sub,i and I_gate,i are the subthreshold leakage and gate leakage values at input state i, respectively.
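Equation (5.26) is a probability-weighted average over all input states rather than dominant states only; the values below are illustrative (uniform input probabilities for a 2-input gate):

```python
import numpy as np

# Hypothetical per-input-state leakage values for one 2-input gate (4 states)
I_sub_states = np.array([1.0, 1.5, 2.0, 4.0])
P_states = np.array([0.25, 0.25, 0.25, 0.25])   # input-state probabilities

# (5.26): probability-weighted average over ALL input states
I_sub_avg = float(P_states @ I_sub_states)      # here 2.125
```

Note the dominant-state shortcut would pick a single entry (here, e.g., 4.0) and can be far from the weighted average once the state-to-state gap narrows.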
On the other hand, the runtime leakage might change when a new input vector is applied. By choosing the input state at the gate level under a certain input vector, the final analytic expression for the runtime leakage can be obtained. Notice that the size of the LUT for runtime leakage is larger than the one used in idle-time leakage analysis: for runtime leakage, the analytic expressions of all input patterns cannot be combined and have to be stored separately.
The presented statistical characterization in the SCL is fast enough to make runtime leakage estimation under a series of input vectors possible. More details on statistical runtime leakage analysis are given in the following part.
To obtain the maximum statistical runtime leakage, we follow the work in [38], which proposed a technique to accurately estimate the runtime maximum/minimum leakage vector considering both cell functionalities and process variations. One can first run the tool in [38] to obtain the input vector giving the maximum leakage power and then apply the presented SCL tool to obtain the maximum/minimum statistical leakage power under that input. The presented statistical leakage characterization in the SCL works as long as the input vector is given.
We note that glitch events also have an effect on the runtime leakage power, and ignoring glitching can cause an estimation error of approximately 5–20% depending on circuit topology [99]. However, glitches have not been considered in any existing statistical runtime leakage analysis work so far and will be investigated in the future.
Runtime leakage reduction techniques such as power gating [1] are widely applied in the design of mobile devices nowadays. Although the leakage power modeled in this chapter is idle-time leakage, the presented method can be extended to leakage computation under the runtime scenario with leakage reduction.
By shutting off idle blocks, power gating is an effective technique for saving leakage power. Following the runtime leakage model for power gating in [73], the variational part of the full-chip leakage can be estimated as

I_leak = (1 − W) Σ_{i ∈ all gates} I_i^gate,  (5.27)

where W is the empirical switching factor. And from [198], the leakage of a gate I^gate can be approximated as a single exponential function of its virtual ground voltage V_VG:

I^gate = I_0 e^{−K_gate V_VG},  (5.28)

where K_gate is the leakage reduction exponent and I_0 is the zero-V_VG leakage current.
Notice that both the switching factor W in (5.27) and the leakage reduction exponent K_gate in (5.28) are related only to the type of gate and involve no statistical factor. Therefore, the presented LUT approach works for both idle leakage and runtime leakage with power-gating activities.
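Equations (5.27)–(5.28) combine multiplicatively, as in the sketch below; the exponential form of (5.28) and all numeric values are assumptions for illustration:

```python
import math

def power_gated_leakage(gate_leakages, W, K_gate, V_vg):
    """Sketch of (5.27)-(5.28): scale each zero-V_VG leakage I_0 by
    exp(-K_gate * V_vg), then keep the (1 - W) non-switching fraction.
    The single-exponential form is our reading of [198]; treat it as
    an assumption, and all inputs here are hypothetical."""
    scaled = [I0 * math.exp(-K_gate * V_vg) for I0 in gate_leakages]
    return (1.0 - W) * sum(scaled)

# With V_vg = 0 the exponential factor is 1, leaving only the (1 - W) scaling.
baseline = power_gated_leakage([1.0, 2.0], W=0.3, K_gate=5.0, V_vg=0.0)
```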
5 Numerical Examples
The presented methods with and without the LUT have been implemented in Matlab 7.8.0. Since the leakage model for the method in [200] has to be purely log-normal (linear terms in the exponent parts), we did not choose it for comparison purposes.
All the experiments were carried out on a Linux system with quad Intel Xeon CPUs at 2.99 GHz and 16 GB of memory. The initial results of this chapter were published in [158, 159].
The methods for full-chip statistical leakage analysis were tested on circuits in the PDWorkshop91 benchmark set. The circuits were synthesized with the Nangate Open Cell Library [125], and the placement is from MCNC [106]. The technology parameters come from the 45 nm FreePDK Base Kit and PTM models [139]. According to [71], L and Tox for high-performance logic in 45 nm technology will be 18 nm and 1.8 nm, respectively, and the physical variation should be controlled within ±12%. So the 3σ values of the variations for L and Tox were set to 12% of the nominal values, of which inter-die variations constitute 20% and intra-die variations 80%. L is modeled as a sum of spatially correlated sources of variation, and Tox is modeled as an independent source of variation. The same framework can easily be extended to include other parameters of variation. Both L and Tox are modeled as Gaussian parameters. For the correlated L, the spatial correlation is modeled based on (3.12), and the partition follows Fig. 5.1. The test cases are given in Table 5.1 (all length units in μm), where test case "VLSI" is generated by duplicating SC2 as a unit block into a 16 × 16 array.
For comparison purposes, we performed MC simulations with 50,000 runs using (3.1) and (3.2), the method in [13] (which only considers the spatial correlation of neighboring grid cells), and the presented approaches on the benchmarks.
The comparison of the mean values and standard deviations of the full-chip leakage current is shown in Table 5.2, where New is the presented method. The average errors of the mean and standard deviation (σ) values for the new technique are 4.52% and 3.92%, respectively, while for the method in [13] the average errors of the mean value and σ are 4.12% and 3.83%, respectively. Table 5.2 shows the two algorithms have almost the same accuracy, and our method can handle both strong and weak spatial correlations by adjusting the grid size. For very large circuits
Table 5.3 CPU time comparison

Test case   MC        Method in [13]   New     LUT
Case1       83.14     2.96             0.10    0.023
Case2       87.09     13.16            0.14    0.036
Case3       828.42    26.24            0.86    0.033
Case4       869.12    74.50            0.87    0.609
Case5       7532.77   117.77           8.65    1.005
Case6       7873.54   490.84           10.67   7.191
Case7       –         –                2598    3.7313
such as Case 7, MC and the method in [13] run out of memory, but the presented method still works.
Table 5.3 compares the CPU times of MC, the method in [13], the presented method (New), and the presented method using statistical leakage characterization in the SCL (shortened to LUT). This table shows the presented new method, New, is much faster than the method in [13] and MC simulation. On average, the presented algorithm has about a 113× speedup over [13] and many orders of magnitude of speedup over the MC method. Moreover, the speed of our approach is not affected by the total number of grid cells: if the spatial correlation is strong, which means d_max is large, d_c can be increased at the same time without loss of accuracy, so the number of neighbor grid cells in T(i) will still be much smaller than the number of gates. The presented method is efficient and linear in both cases. Table 5.3 also shows the presented method gains further speedup with the LUT technique using statistical leakage characterization in the SCL.
For comparison purposes, one gate in each benchmark circuit is changed, and the presented incremental algorithm is applied to update the leakage value locally. Table 5.4 shows the computational cost of the incremental analysis and the speedup over the four leakage analysis methods in Table 5.3. Compared with the LUT approach (the fifth column in Table 5.3), the incremental analysis achieves a 13× to 3.1×10⁴× speedup. As discussed in Sect. 4.4, the minicircuit for updating only contains a small constant number of terms. Therefore, as the problem size increases further, we expect the incremental analysis to achieve even more speedup over the full leakage analysis.
6 Summary
In this chapter, we have presented a linear algorithm for the full-chip statistical analysis of leakage currents in the presence of any condition of spatial correlation (strong or weak). The new algorithm adopts a set of uncorrelated virtual variables over grid cells to represent the original physical random variables with spatial correlation, and the size of the grid cell is determined by the correlation length. As a result, each physical variable is always represented by virtual variables in a local neighbor set. Furthermore, a LUT is used to cache the statistical leakage information of each type of gate in the library to avoid computing the leakage for each gate instance. As a result, the full-chip leakage can be calculated with O(N) time complexity, where N is the number of grid cells on chip. The new method maintains linear complexity from strong to weak spatial correlation and imposes no limitation on the leakage current model or the variation model.
This chapter also presented an incremental analysis scheme to update the leakage distribution more efficiently when local changes are made to a circuit. Numerical examples show the presented method is about 1,000× faster than the recently proposed method [13] with similar accuracy and many orders of magnitude faster than the MC method. Numerical results show the presented incremental analysis can achieve further significant speedup over the full leakage analysis.
Chapter 6
Statistical Dynamic Power Estimation
Techniques
1 Introduction
It is well accepted that process-induced variability has huge impacts on circuit performance in sub-90 nm VLSI technologies. The variational nature of the process has to be assessed in various VLSI design steps to ensure robust circuit design. Process variations consist of inter-die variations, which affect all the devices on the same chip in the same way, and intra-die variations, which represent variations of parameters within the same chip. The latter include spatially correlated variations and purely independent or uncorrelated variations. Spatial correlation describes the phenomenon that devices close to each other are more likely to have similar characteristics than devices far apart. It has been shown that variations in practical chips in the nanometer range are spatially correlated [195]. A simple assumption of independence for the involved random variables can lead to significant errors.
One great challenge from aggressive technology scaling is the increasing power consumption, which has become a major issue in VLSI design. The variations in process parameters and timing delays result in variations in power consumption. Many statistical leakage power analysis methods have been proposed to handle both inter-die and intra-die process variations considering spatial correlation [13, 65, 155, 200]. However, the problem is far from solved for dynamic power estimation.
The dynamic power of a digital circuit is in general expressed as

P_dyn = (1/2) f_clk V_dd² Σ_{j=1}^{n} C_j S_j,  (6.1)

where n is the number of gates on chip, f_clk is the clock frequency, V_dd is the supply voltage, C_j is the sum of the load capacitance and the equivalent short-circuit capacitance at node j, and S_j is the switching activity of gate j. This expression, however, does
Fig. 6.1 The dynamic power versus effective channel length (Leff ratio 0.8–1.2) for an AND2 gate in 45 nm technology (70 ps active pulse as partial swing, 130 ps active pulse as full swing). Reprinted with permission from [60] © 2010 IEEE
not give the explicit impacts of the effective channel length (Leff) and gate oxide thickness (Tox) of the gate on the dynamic power. In [64], Leff and Tox are shown to have the most impact on gate dynamic power consumption. Figure 6.1 shows the dynamic power variations due to different effective channel lengths for an AND2 gate in 45 nm technology. It can be seen that the channel length of a gate has a significant impact on its dynamic power.
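For reference, the nominal formula (6.1) is a one-liner; the point of the chapter is that it hides the Leff/Tox dependence, which is what the statistical method adds. Numeric values below are illustrative:

```python
def dynamic_power(f_clk, vdd, C, S):
    """Nominal dynamic power per (6.1): 0.5 * f_clk * Vdd^2 * sum_j C_j * S_j.
    C and S are per-node capacitances and switching activities."""
    return 0.5 * f_clk * vdd**2 * sum(c * s for c, s in zip(C, S))

# e.g. 1 GHz clock, 1.0 V supply, two nodes with made-up C_j and S_j
p = dynamic_power(1e9, 1.0, [1e-15, 2e-15], [0.1, 0.2])
# p = 0.5 * 1e9 * (1e-16 + 4e-16) = 2.5e-7 W
```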
In this chapter, we develop a more efficient statistical dynamic power estimation method considering channel length variations with spatial correlation and gate oxide thickness variations, which are not considered in existing works. The presented dynamic power analysis method explicitly considers the spatial correlations and glitch width variations on a chip. The presented method [60] follows the segment-based statistical power analysis method [30], where dynamic power is estimated based on switching periods instead of switching events to accommodate glitch width variations. To consider the spatial correlation of the channel length, we set up a set of uncorrelated variables over virtual grids to represent the original physical random variables via fitting. In this way, the O(n²) time complexity for computing the variances can be reduced to linear-time complexity (n is the number of gates in the circuit). The algorithm works for both strong and weak correlations. Furthermore, a LUT is created to cache statistical information for each type of gate to avoid running SPICE repeatedly. The presented method has no restrictions on the models of statistical distributions for dynamic power. Numerical examples show that the presented method has a 300× speedup over the recently proposed method [30] and many orders of magnitude of speedup over the MC method.
2 Prior Works
Many works on dynamic power analysis have been proposed in the past. MC-based simulation was proposed in [10], where the circuit is simulated for a large number of input vectors to gather statistics for the average power. Later, probabilistic methods for power estimation were proposed and widely used [29, 48, 116, 117, 183] because statistical estimates can be obtained without time-consuming exhaustive simulation. In [117], the concept of probability waveforms is proposed to estimate the mean and variance of the current drawn by each circuit node. In [116], the notion of transition density is introduced, and transition densities are propagated through combinational logic modules without regard to their structure. However, the authors did not consider internal signal correlation; thus, the algorithm is only applicable to combinational circuits. Ghosh et al. [48] extended the transition density theory to consider sequential circuits via symbolic simulation to calculate the correlations between internal lines due to reconvergence. However, the performance of this algorithm is restricted by its memory space complexity. In [29, 183], the authors used tagged probabilistic simulation (TPS) to model the set of all possible events at the output of each circuit node, which is more efficient than [48] due to its effectiveness in computing the signal correlation. The work [48] is based on a zero-delay model, and the works [10, 116, 183] are based on a real delay model. However, all of them assume a fixed delay model, which is no longer true under process variation. At the same time, all the previous works only consider full-swing transitions, and partial-swing effects are not well accounted for.
Recently, several approaches have been proposed for fast statistical dynamic power estimation [4, 18, 30, 64, 66, 138]. Alexander et al. [4] proposed to consider delay variations and glitches for estimating dynamic power. With efficient simulation of input vectors, this algorithm has linear-time complexity, but the variation model is quite simple, as only minimum and maximum bounds for the delay are obtained, and partial swings are not considered. Pilli et al. [138] presented another approach, which divides the clock cycle into a number of time slots and computes the transition density for each slot, but only the mean value of the dynamic power can be estimated. In [66], the authors used supergates and timed Boolean functions to filter glitches and consider signal correlations due to reconvergent fan-outs, but failed to consider correlations involving placement information. Chou et al. [18] used a probabilistic delay model based on an MC simulation technique for dynamic power estimation but also lacked considerations involving placement information. Harish et al. [64] used a hybrid power model based on MC analysis; however, the method is only applied to a small two-stage two-input NAND gate, and for large circuits Monte Carlo simulation can be very time-consuming.
Fig. 6.2 A transition waveform example {E1, E2, ..., Em} for a node. Reprinted with permission from [60] © 2010 IEEE
Dinh et al. [30] recently proposed a method that is not based on the fixed-delay gate model, in order to consider the partial-swing effect as well as the effect of process variation. To accurately estimate the dynamic power in the presence of process variation, the work in [30] introduces the transition waveform concept, which is similar to the probability waveform [117] or tagged waveform [29] concepts except that the variance of the transition time is introduced. Specifically, a transition waveform consists of a set of transition events, each a triplet (p, t, δt), where p is the probability for the transition to occur, t is the mean time of the transition, and δt is the standard deviation of the transition time. Figure 6.2 shows an example of a transition waveform for a node.
The triplets are then propagated from the primary inputs to the primary outputs and are computed for every node. In addition to propagating the switching probabilities like traditional methods, this method also propagates the variances along the signal paths, which is done in a straightforward way based on second-order moment matching. Glitch filtering is also performed to ensure accuracy and reduce the number of switches during the propagation.
Unlike the traditional power estimation methods in [29, 117], which count the transition times (or their probabilities), i.e., the edges in the transition waveform, to estimate the dynamic power, the work in [30] proposed to count the transition segments (durations), which are pairs of transition events, to take into account the impacts of different glitch widths on the dynamic power consumption. For n transition events in a transition waveform, the number of segments is C(n,2) = n(n−1)/2, which increases the complexity of the computation compared to the edge-based method. Another implication is that the traditional edge-based power consumption formula (6.1) can no longer be used. As a result, a LUT is built from SPICE simulation results for different glitch widths. The total dynamic power for a gate is then the probability-weighted average dynamic power over all the switching segments, which is then summed up to compute the total chip dynamic power. However, this method does not consider spatial correlation, which can lead to significant errors and is the main issue to be addressed in this chapter.
3 The Presented New Statistical Dynamic Power Estimation Method
In this section, we present the new full-chip statistical dynamic power analysis method. The presented approach follows the segment-based power estimation method [30]. The presented algorithm propagates the triplet switching events from the primary inputs to the outputs. Then it computes the statistical dynamic power at each node based on orthogonal polynomial chaos and the virtual grid-based variables for channel length, which deal with spatial correlation as discussed in Sect. 3 of Chap. 3 and Sect. 2 of Chap. 5.
We first present the overall flow of the presented method in Fig. 6.3 and then
highlight the major computing steps later.
The dynamic power for one gate (under glitch width $W_g$ with variation and fixed
load capacitance $C_l$) can be represented by the Hermite polynomial expansion

$$P_{dyn,W_g,C_l}(\xi_{g,j}) = \sum_{q=0}^{Q} P_{dyn,q,j}\, H_q(\xi_{g,j}), \qquad (6.2)$$

where $\xi_l$ is a Smolyak quadrature sample. From the dynamic power LUT $P_{dyn} =
f(L, T_{ox}, W_g, C_l)$, we can interpolate $P_{dyn}(\xi_l)$, which is the dynamic power for
every Smolyak sampling point.
To compute the statistical gate power expression considering the glitch width
variations, we need to compute the probability of each switching segment assuming
that they follow the normal distribution:
$$\Pr(w = w_i) = \frac{1}{\sigma_w\sqrt{2\pi}} \exp\left(-\frac{(w_i - \mu_w)^2}{2\sigma_w^2}\right). \qquad (6.4)$$
The Hermite polynomial coefficients for (6.2) under glitch width $w_i$ and load
capacitance $C_l$ can be interpolated from the sub-LUT. For a gate index $j$ with the
transition waveform $(p_1, t_1, \bar{t}_1), (p_2, t_2, \bar{t}_2), \ldots, (p_M, t_M, \bar{t}_M)$, there are $M(M-1)/2$
segments. The resulting statistical power is the probabilistic addition of the power from
each segment (their Hermite polynomial expressions):
$$P_{dyn,C_l}(\xi_{g,k}) = \sum_{i=1}^{M-1} \sum_{j=i+1}^{M} \Pr(i,j)\, P_{dyn,C_l}(\xi_{g,k}, i, j), \qquad (6.5)$$
in which $P_{dyn,C_l}(\xi_{g,k}, i, j)$ is the dynamic power of gate $k$ caused by the switching
segment between transitions $E_i$ and $E_j$. $\Pr(i,j)$ is the probability that the
switching segment $(E_i, E_j)$ occurs, i.e., that there are transitions at both $E_i$ and $E_j$
and no transitions between them:
$$\Pr(i,j) = p_i\, p_j \prod_{k=i+1}^{j-1} (1 - p_k). \qquad (6.6)$$
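As a concrete illustration of (6.5) and (6.6), the segment probabilities and the probability-weighted gate power can be sketched as follows. The transition probabilities and per-segment powers below are made-up illustrative numbers, not values from the book's LUT characterization.

```python
# Sketch of the segment-probability computation of (6.5)-(6.6); the
# transition probabilities and per-segment powers are illustrative only.

def segment_prob(p, i, j):
    """Pr(i, j): transitions occur at events i and j, none in between (6.6)."""
    prob = p[i] * p[j]
    for k in range(i + 1, j):
        prob *= 1.0 - p[k]
    return prob

def gate_dynamic_power(p, seg_power):
    """Probability-weighted sum over all M(M-1)/2 switching segments (6.5)."""
    M = len(p)
    total = 0.0
    for i in range(M - 1):
        for j in range(i + 1, M):
            total += segment_prob(p, i, j) * seg_power[(i, j)]
    return total

p = [0.5, 0.25, 0.5]                                  # transition probabilities
seg_power = {(0, 1): 1.0, (0, 2): 2.0, (1, 2): 1.5}   # hypothetical LUT powers
print(gate_dynamic_power(p, seg_power))               # 0.6875
```

For three transitions there are three segments; the (0, 2) segment is weighted by the probability that the middle event does not switch, exactly as the product term in (6.6) prescribes.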
The dynamic power for each gate is calculated using (6.5). To compute the full-chip
dynamic powers, we also need to transfer the local coefficients to corresponding
global positions first. Then we can proceed to compute the dynamic power for the
whole chip very easily as

$$P_{dyn}^{total}(\xi) = \sum_{j=1}^{n} P_{dyn}(\xi_{g,j}). \qquad (6.7)$$

For instance, the mean value and variance for the full-chip dynamic power are

$$\mu_{P_{dyn}^{total}} = P_{dyn,0}, \qquad \sigma^2_{P_{dyn}^{total}} = \sum_{i=1}^{P} P_{dyn,i}^2\, \langle H_i^2(\xi)\rangle,$$

where $P_{dyn,i}$ is the power coefficient for the $i$th Hermite polynomial of second order
defined in (4.15).
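The way the mean and variance fall out of the Hermite PC coefficients is the standard property of an orthogonal expansion and can be sketched as follows; the coefficient values are illustrative, not taken from the book's benchmarks.

```python
# Mean and variance of a quantity expanded in (probabilists') Hermite PC:
# the mean is the zeroth coefficient; the variance sums c_q^2 * <H_q^2>.
# Coefficient values below are illustrative only.
import math

def pc_mean_var(coeffs, norms):
    """coeffs[q] multiplies H_q; norms[q] = <H_q^2> (with norms[0] == 1)."""
    mean = coeffs[0]
    var = sum(c * c * n for c, n in zip(coeffs[1:], norms[1:]))
    return mean, var

# One variable, second order: H_0 = 1, H_1 = xi, H_2 = xi^2 - 1,
# whose norms under the Gaussian measure are 1, 1, 2.
coeffs = [1.14, 0.08, 0.01]
mean, var = pc_mean_var(coeffs, [1.0, 1.0, 2.0])
print(mean, math.sqrt(var))
```

The same two lines give the mean and standard deviation reported for each benchmark once the global coefficients have been summed.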
4 Numerical Examples
The presented method and the segment-based analysis [30] have been implemented
in Matlab V7.8. The initial results of this chapter were published in [60].
The presented new method was tested on circuits in the ISCAS’89 benchmark
set. The circuits were synthesized with Nangate Open Cell Library under 45 nm
technology, and the placement is from UCLA/Umich Capo [145]. For comparison
purposes, we performed MC simulations (10,000 runs) considering spatial correla-
tion, the method in [30], and the presented method on the benchmark circuits. In our
MC implementation, similar to [30], we do not run SPICE on the original circuits,
as it is too time consuming on an ordinary computer. Instead, we compute the
results via interpolation from the characterization data computed from SPICE runs.
The 3σ range of L and Tox is set as 20%, of which inter-die variations constitute 20%
and intra-die variations 80%. L and Tox are modeled as Gaussian random variables. L
is modeled as a sum of spatially correlated sources of variations based on (3.12). Tox
is modeled as an independent source of spatial variation. The same framework can
be easily extended to include other parameters of variations.
The characterization data for each type of gate in SCL are collected using
HSPICE simulation. For each type of gate, we perform repeated simulations on
sampling points in the 3σ range of L, Tox, and input glitch width Wg for several
different load capacitances to obtain the gate dynamic powers and gate delays. The
table of characterization data will be used to interpolate the value of dynamic power
for each type of gate with different process parameters. We use 21 sample points for
glitch width, from 50 ps to 150 ps.
In transition waveform computation, the gate delays are obtained through the
table of characterization data, and the input signal probabilities are 0.5, with
switching probabilities of 0.75. The test cases are given in Table 6.1 (all length units
in µm). In the first column, s and w stand for strong and weak spatial correlations,
respectively.
The comparison results of mean values and standard deviations of full-chip
dynamic power are shown in Table 6.2, where MC Co represents Monte Carlo
Table 6.2 Statistical dynamic power analysis accuracy comparison against Monte Carlo

Mean value (mW)                              Errors (%)
Test case    Grid #   MC Co    [30]    New      [30]    New
s1196 (s)    27       1.14     1.19    1.14     3.82    0.49
s1196 (w)    294      1.14     1.19    1.14     3.98    0.41
s5378 (s)    93       6.09     6.24    5.98     2.46    1.85
s5378 (w)    1300     6.09     6.23    5.98     2.29    1.85
s9234 (s)    161      12.8     13.2    12.5     2.94    2.31
s9234 (w)    2358     12.8     13.1    12.5     2.78    2.14

Standard deviation (mW)                      Errors (%)
Test case    Grid #   MC Co    [30]      New      [30]     New
s1196 (s)    27       0.0912   0.00394   0.0845   95.68    7.33
s1196 (w)    294      0.0671   0.00395   0.0645   94.11    3.94
s5378 (s)    93       0.470    0.00877   0.435    98.13    7.61
s5378 (w)    1300     0.436    0.00891   0.412    97.96    5.68
s9234 (s)    161      0.964    0.0185    0.882    98.08    8.52
s9234 (w)    2358     0.894    0.0191    0.839    97.87    6.14
considering spatial correlation, and New is the presented method. The method
in [30] cannot consider spatial correlation, as it assumed that the powers of the
gates are independent Gaussian random variables. In the implementation of [30], we
assume the same variations for Leff and Tox but without spatial correlations. The
average errors for the mean and standard deviation (σ) values of the New technique
are 1.49% and 6.54% compared to MC Co, respectively, while for the method
in [30], the average errors for the mean value and σ are 3.04% and 96.97%, respectively.
As a result, not considering spatial correlations can lead to significant errors.
Furthermore, from the comparison between the mean and standard deviation of MC Co,
the average std/mean ratio is 7.21%, which means that spatial correlation in process
parameters has a significant impact on the distribution of dynamic power. The results
in Table 6.2 also show that our method can handle both strong and weak spatial
correlations by adjusting the grid size.
Table 6.3 compares the CPU times of the three methods, which shows that the New
method is much faster than the method in [30] and MC simulation. On average, the
presented technique has about 377× speedup over [30] and 5,123× speedup over the
MC method. In [30], the dynamic power of each gate needs to be interpolated from
the LUT due to different L, Tox, and glitch width values, so the complexity
is a linear function of the number of gates, O(n). In the New algorithm, however, only
the coefficients of the Hermite polynomials for each type of gate need to be computed,
and the overall complexity is a linear function of the number of grids, O(N).
5 Summary

Chapter 7
Statistical Total Power Estimation Techniques

1 Introduction
For digital CMOS circuits, the total power consumption is given by the following
formula:
$$P_{total} = P_{dyn} + P_{short} + P_{leakage}, \qquad (7.1)$$
in which Pdyn , Pshort , and Pleakage represent dynamic power, short-circuit power,
and leakage power, respectively. Most of the previous works on power estimation
either focus on dynamic power estimation [10, 28–30, 64, 116] or leakage power
estimation [13, 95, 158, 200]. As technology scales down to nanometer range, the
process-induced variability has huge impacts on the circuit performance [120].
Furthermore, many variational parameters in the practical chips in nanometer range
are spatially correlated, which makes the computations even more difficult [195],
and simple assumption of independence for involved random variables can lead to
significant errors.
Early research on power analysis mainly focused on dynamic power
analysis [10, 28, 29, 116]; the solutions range from the transition density-based
method [116] and the tagged probabilistic method [29] to the practical MC-based
methods [10, 28, 29]. Later on, designers realized that leakage power was becoming
more and more significant and is very sensitive to process variations. As a
result, full-chip leakage power estimation considering process variations under
spatial correlation has been intensively studied in the past [13, 95, 158, 200]; the
methods can be grid based [13, 158], projection based [95], or based on simplified
gate leakage models [200].
Although total power can be computed by simply adding the dynamic power and
leakage power (plus short-circuit power), practically, dynamic power and leakage
power are correlated. For instance, leakage power of a gate depends on its input
state, which depends on the primary inputs and timing of the circuits. Using
dominant state or average values is less accurate than the precise circuit-level
simulation under realistic testing input vectors. Under the process variations with
Fig. 7.1 The comparison of the circuit total power distribution of circuit c432 in the ISCAS'85
benchmark set (top) under random input vectors (with 0.5 input signal and transition probabilities)
and (bottom) under a fixed input vector with effective channel length spatial correlations. Reprinted
with permission from [62] © 2011 IEEE
spatial correlation, the dynamic power and leakage power are more correlated via
process parameters. As a result, traditional separate approaches will not be accurate.
Circuit-level total power estimation based on real testing vectors is more desirable.
Figure 7.1 shows the comparison of the circuit total power distribution of c432
from the ISCAS'85 benchmark. We show two power variations. The first (upper)
distribution is obtained under random input vectors. The second is obtained using a
fixed input vector but under process variations with spatial correlation. As can be seen,
the variance induced by process variations is comparable with the variance induced by
random input vectors. As a result, considering the impact of process variations on the
total chip power is important for early design solution exploration and post-layout
design sign-off validation.
Several works have been proposed to estimate dynamic power considering
process variations. Harish et al. [64] used a hybrid power model based on MC analysis,
but the method was only applied to a small two-stage two-input NAND gate. The work
in [4] used a variational delay model to obtain minimum and maximum delay bounds
in order to estimate the number of glitches and the dynamic power. The work in [30]
introduced a new method based on the transition waveform concept, where the
transition waveform is propagated through the circuit and the effect of partial swing
can be captured.
2 Review of the Monte Carlo-Based Power Estimation Method
in which $P$ is the probability and $\Phi(P_{dyn})$ is the CDF of the standard normal
distribution. Therefore, given the confidence level $(1-\alpha)$, it follows that

$$P\left\{\Phi_{\alpha/2} < \frac{P_T - m_{dyn}}{s_{dyn}/\sqrt{N}} \le \Phi_{1-\alpha/2}\right\} = 1 - \alpha. \qquad (7.5)$$
Equation (7.6) can be viewed as the stopping criterion when N , mdyn , and sdyn
satisfy it.
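A stopping rule of this kind can be sketched as follows: sampling continues until the confidence half-width on the running mean falls below a relative error tolerance. The "power sample" here is a toy Gaussian stand-in for an actual gate-level simulation run, and the tolerance values are illustrative.

```python
# Sketch of an MC stopping criterion in the spirit of (7.5): keep drawing
# samples until the (1 - alpha) confidence half-width on the running mean
# drops below a relative tolerance.  The sample function is a toy stand-in
# for a real power simulation run.
import random
import statistics

def mc_estimate(sample_fn, alpha_z=1.96, rel_tol=0.01, min_n=30, max_n=100000):
    samples = []
    while len(samples) < max_n:
        samples.append(sample_fn())
        n = len(samples)
        if n >= min_n:
            m = statistics.mean(samples)
            s = statistics.stdev(samples)
            # Stop when the CI half-width is within rel_tol of the mean.
            if alpha_z * s / (m * n ** 0.5) <= rel_tol:
                return m, n
    return statistics.mean(samples), len(samples)

random.seed(0)
mean, n = mc_estimate(lambda: random.gauss(6.0, 0.5))  # toy power in mW
print(mean, n)
```

With these settings a few hundred samples suffice; a tighter tolerance or a noisier power model increases the sample count as $1/\text{rel\_tol}^2$.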
Afterward, the works in [28, 29] further improved the efficiency of the MC-based
method. In [29], the author transformed the power estimation problem into a survey
sampling problem and applied stratified random sampling to improve the efficiency
of MC sampling. In [28], the author proposed two new sampling techniques,
module based and cluster based, which adapt stratification to further improve
the efficiency of the Monte Carlo-based techniques. However, all of these works
are based on gate-level logic simulation, as they only consider dynamic power. For
total power estimation and for estimating the impact of process variations, one needs
transistor-level simulations. As a result, improving the efficiency of the MC method
becomes crucial and will be addressed in this chapter.
In this section, we present the new chip-level statistical method for full-chip total
power estimation, called STEP. The method can consider both fixed input vectors
and random input vectors for power estimation. Power distribution considering
process variations under fixed input vectors is important because it can reveal the
power distribution for the maximum power, the minimum power, or the power due
to user-specified input vectors. This technique can be further applied to estimate the
distribution for maximum power dissipation [188]. Power distribution under random
input vectors is also important, as it can show the total power distribution caused by
random input vectors and process variations with spatial correlation. We first give
the overall flow of the presented method under a fixed input vector in Fig. 7.2 and
then highlight the major computing steps later. The flow of the presented method
considering random input vectors follows afterward. The spatial correlation
model is the same as in Sect. 3 of Chap. 3.
3 The Statistical Total Power Estimation Method
Fig. 7.2 The flow of the presented algorithm under a fixed input vector
The STEP method uses a commercial Fast-SPICE tool for accurate total power
simulation. It transforms the correlated variables into uncorrelated ones and
reduces the number of random variables using the PFA method [57]. Then it
computes the statistical total power based on Hermite polynomials and sparse grid
techniques [45].
$P_{tot,q}$ is then computed by the numerical Smolyak quadrature method. In this chapter,
we use second-order Hermite polynomials for statistical total power analysis,
and the number of Smolyak quadrature samples for $k$ random variables is $2k^2 + 3k + 1$. The
coefficient for the $q$th Hermite polynomial, $P_{tot,q}$, can be computed as follows:

$$P_{tot,q} = \sum_{l} P_{tot}(\xi_l)\, H_q(\xi_l)\, w_l \,/\, \langle H_q^2(\xi)\rangle, \qquad (7.8)$$
where $P_{tot,i}$ is the power coefficient for the $i$th Hermite polynomial of second order
defined in (4.15).
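The projection in (7.8) can be sketched with a one-dimensional Gauss-Hermite rule standing in for the Smolyak sparse grid; the principle, weighted sums of sampled responses divided by the polynomial norms, is the same. The test response is a known quadratic, so the exact coefficients are recoverable.

```python
# Projection of a sampled response onto Hermite PC coefficients as in (7.8),
# using a 1-D Gauss-Hermite (probabilists') rule in place of the Smolyak
# sparse grid used in the book.
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def pc_coeffs(response, order=2, npts=8):
    x, w = hermegauss(npts)          # weight exp(-x^2/2); sum(w) = sqrt(2*pi)
    w = w / np.sqrt(2.0 * np.pi)     # normalize to a probability measure
    H = [np.ones_like(x), x, x * x - 1.0]   # H_0, H_1, H_2
    norms = [1.0, 1.0, 2.0]                 # <H_q^2>
    y = response(x)
    return [float(np.sum(y * H[q] * w) / norms[q]) for q in range(order + 1)]

# P(xi) = 2 + 0.3*xi + 0.05*(xi^2 - 1) should give back [2, 0.3, 0.05].
c = pc_coeffs(lambda xi: 2 + 0.3 * xi + 0.05 * (xi * xi - 1))
print(c)
```

The sparse grid replaces the tensor product of such 1-D rules so that the sample count grows as $2k^2 + 3k + 1$ rather than exponentially in $k$.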
To consider more input vectors or the random input vectors used in traditional
dynamic power analysis, one simple way is to treat the input vector as one more
variational parameter in our statistical analysis framework. This strategy easily
fits into the simple MC-based method [10], as we just add one dimension to the
variable space. But for the spectral stochastic method, it is difficult to add this
variable into the existing space.
In probability theory, the PDF of a function of several random variables can
be calculated from the conditional PDF for a single random variable. Let $P_{total} =
g(U_{in}, L_{eff})$, in which $U_{in}$ is the variable of random input vectors and $L_{eff}$ is the
variable of the gates' effective channel length. The PDF of the total power $P_{total}$ can be
calculated by

$$f_{P_{total}}(p) = \int_{-\infty}^{\infty} f_{L_{eff}}(l\,|\,u)\, f_{U_{in}}(u)\, du, \qquad (7.11)$$
Fig. 7.3 The selected power points a, b, and c from the power distribution under random input
vectors. Reprinted with permission from [62] © 2011 IEEE
in which the PDF under random input vectors, $f_{U_{in}}(u)$, is obtained by the MC-based
method [10], and the conditional PDF $f_{L_{eff}}(l\,|\,U_{in}=u)$ under fixed input $u$ can
be obtained or interpolated from samples calculated by the fixed-input algorithm in
Fig. 7.2. Note that $u$ can be viewed as the power of the chip under input $u$.
We use the example in Fig. 7.3 to illustrate the presented method. In this figure,
we first compute the power distribution (solid line) with random input vectors only.
Then we select three input power points, a, b, c (with three corresponding input
vectors). At each input power point, we perform statistical power analysis
with process variations under the fixed input (using the corresponding input
vector). After this, we interpolate the power distributions for the other power points
for the final integration.
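The mixing integral (7.11) together with the three-point interpolation can be sketched numerically as follows. The conditional PDF is taken Gaussian, with its standard deviation linearly interpolated between three anchor powers mirroring the three STEP runs; all numeric values are illustrative.

```python
# Numeric sketch of (7.11): the total-power PDF under both random inputs and
# process variation is the input-vector PDF mixed with the conditional
# (process-variation) PDF.  Anchor values are illustrative only.
import math
import random

anchors = [(4.0, 0.2), (6.0, 0.3), (8.0, 0.4)]   # (power u, conditional std)

def cond_std(u):
    """Linearly interpolate the conditional std between the anchor powers."""
    (u0, s0), (u1, s1), (u2, s2) = anchors
    if u <= u1:
        t = (u - u0) / (u1 - u0)
        return s0 + t * (s1 - s0)
    t = (u - u1) / (u2 - u1)
    return s1 + t * (s2 - s1)

def normal_pdf(x, m, s):
    return math.exp(-((x - m) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def total_pdf(p, input_power_samples):
    """Monte Carlo approximation of the integral in (7.11)."""
    return sum(normal_pdf(p, u, cond_std(u)) for u in input_power_samples) / len(
        input_power_samples)

random.seed(1)
u_samples = [random.gauss(6.0, 0.8) for _ in range(2000)]  # input-vector powers
print(total_pdf(6.0, u_samples))
```

The resulting density is wider than either source alone, which is the qualitative effect seen when both input randomness and process variation are active.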
The flow of the presented analysis method under random input vectors is shown
in Fig. 7.4. The STEP algorithm computes the total power under random input
vectors using the MC-based method [10].
4 Numerical Examples
The presented method has been implemented in Matlab V7.8, and Cadence Ultrasim
7.0 was used for Fast-SPICE simulations. All the experiments were
carried out on a Linux system with quad Intel Xeon 3 GHz CPUs and 16 GB of
memory. The initial results of this chapter were published in [62].
The STEP method was tested on circuits in the ISCAS’85 benchmark set. The
circuits were synthesized with Nangate open cell library under 45 nm technology,
and the placement is obtained from UCLA/Umich Capo [145]. The test cases are
given in Table 7.1 (all length units in µm).
The effective channel length Leff is modeled as a sum of spatially correlated sources of
variations based on (3.12). The nominal value of Leff is 50 nm, and the 3σ range is
Fig. 7.4 The flow of the presented algorithm with random input vectors and process variations
set as 20%. The same framework can be easily extended to include other parameters
of variations.
Firstly, we use the MC-based method [10] to obtain the mean and standard
deviation (std) of each sample circuit under random input vectors. The input signal
and transition probabilities are 0.5, with a clock cycle of 180 ps. The simulation
time for each sample circuit is 10 clock cycles, and the error tolerance is 0.01.
Secondly, we observe the total power distribution for each sample circuit under
fixed input vector. For each sample circuit, one input vector is selected, and then
we run the MC simulations (10,000 runs) under process variations with spatial
correlation as well as our presented STEP method. The results are shown in
Table 7.2, in which MC Co and STEP mean the MC method considering process
variations with spatial correlation and the presented method, respectively. The
average errors for the mean and standard deviation of the STEP method are 2.90% and
6.00%, respectively. Figure 7.5 shows the total power distribution (PDF and CDF) of
circuit c880 under a fixed input. Table 7.3 gives the parameter values of the correlation
length, the reduced number of variables k, and the Fast-SPICE sample counts of
the two methods. Sampling time dominates the total simulation time for both MC
Fig. 7.5 The comparison of the total power distribution PDF and CDF between the STEP method
and the MC method for circuit c880 under a fixed input vector. Reprinted with permission
from [62] © 2011 IEEE
Co and the STEP methods, and the STEP method has a 78× speedup over the MC Co
method on average. More speedup can be gained for larger cases.
Thirdly, we compare the STEP method with the MC method under both random
input vectors and process variations with spatial correlation. We select three power
points from the total power distribution obtained by the MC-based method [10]
and get the corresponding input vectors. We performed the STEP method under
these three input vectors and obtained the corresponding means and standard
deviations, respectively. The (mean, std) samples for other power points with distinct
power values can be interpolated via the three samples.
Equation (7.11) is used to calculate the PDF of total power distribution under
both random input vectors and process variations with spatial correlation. The
results are shown in Table 7.4; MC Co, MC nCo, and STEP represent the MC
method considering process variations with spatial correlation, the MC method
without considering process variations with spatial correlation, and the presented
method, respectively. The average errors of the mean and the standard deviation of
our method compared with MC Co are 2.17% and 6.09%, respectively, while the
average errors of the mean and the standard deviation of MC nCo compared with MC
Co are 1.34% and 28.01%, respectively. The std error increases for larger test
cases.
Obviously, we can see that the MC method considering only random input
vectors fails to capture the true distribution when both input vectors and process
variations are considered. The parameter values of the correlation length and k are
the same as in Table 7.3. The difference is that we need to run STEP three times,
and the total sample numbers increase correspondingly. However, the STEP method
still has a 26× speedup over the MC method on average and remains accurate.
Figure 7.6 shows the power distribution comparison (PDF and CDF) of the STEP
method and the MC method under both random input vectors and process variations
with spatial correlation for circuit c880. We observe that the distribution of the total
power under a fixed input vector or under random input vectors is similar to a normal
distribution, as shown in Figs. 7.5 and 7.6; such a distribution justifies the use of
Hermite PC to represent the total power distributions.
Fig. 7.6 The comparison of the total power distribution PDF and CDF between the STEP method
and the Monte Carlo method for circuit c880 under random input vectors. Reprinted with
permission from [62] © 2011 IEEE
5 Summary
In this chapter, we have presented an efficient statistical total chip power estimation
method considering process variations with spatial correlation. The new method is
based on accurate circuit-level simulation under realistic testing input vectors to
obtain accurate total chip powers. To improve the estimation efficiency, an efficient
sampling-based approach has been applied using the OPC-based representation
and random variable transformation and reduction techniques. Numerical examples
show that the presented method is 78× faster than the MC method under a fixed input
vector and 26× faster than the MC method considering both random input vectors
and process variations with spatial correlation.
Part III
Variational On-Chip Power Delivery
Network Analysis
Chapter 8
Statistical Power Grid Analysis Considering
Log-Normal Leakage Current Variations
1 Introduction
As discussed in Part II, process-induced variability has huge impacts on chip leakage
currents, owing to the exponential relationship between subthreshold leakage
current $I_{sub}$ and threshold voltage $V_{th}$, as shown below [172]:

$$I_{sub} = I_{s0}\, e^{\frac{V_{gs}-V_{th}}{n V_T}} \left(1 - e^{-\frac{V_{ds}}{V_T}}\right), \qquad (8.1)$$
2 Previous Works
A number of research works have been proposed recently to address the voltage drop
variation issues in on-chip power delivery networks under process variations.
The voltage drop of power grid networks subject to the leakage current variations
was first studied in [39, 40]. This method assumes that the log-normal distribution
of the node voltage drop is caused by log-normal leakage current inputs and is based
on a localized MC (sampling) method to compute the variance of the node voltage
drop. However, this localized sampling method is limited to the static DC solution of
power grids modeled as resistor-only networks. Therefore, it can only compute the
responses to the standby leakage currents. However, dynamic leakage currents
become more significant, especially nowadays when sleep transistors are intensively
used for reducing leakage power. In [131, 169], impulse responses are used to
compute the means and variances of node voltage responses due to general current
variations. But this method needs to know the impulse responses from all the current
sources to all the nodes, which is expensive to compute for a large network. This
method also cannot consider the variations of the wires in the power grid networks.
Recently, a number of analysis approaches based on so-called spectral stochastic
analysis method have been proposed for analyzing interconnect and power grid
networks [46, 47, 108, 190]. This method is based on the OPC expansion of random
processes and the Galerkin theory to represent and solve for the stochastic responses
of statistical linear dynamic systems. The spectral stochastic method only needs to
solve for some coefficients of the orthogonal polynomials by using normal transient
simulation of the original circuits. Research work in [190] applied the spectral
stochastic method to the analysis of power grid networks.
3 Nominal Power Grid Network Model
The power grid networks in this chapter are modeled as RC networks with known
time-variant current sources, which can be obtained by gate-level logic simulations
of the circuits. Figure 8.1 shows the power grid models used in this chapter. For a
power grid (vs. the ground grid), some nodes having known voltage are modeled
as constant voltage sources. For C4 power grids, the known voltage nodes can be
internal nodes inside the power grid. Given the current source vector, u.t/, the node
voltages can be obtained by solving the following differential equations, which are
formulated using the modified nodal analysis (MNA) approach:
$$G v(t) + C \frac{dv(t)}{dt} = B u(t), \qquad (8.2)$$

where $G \in \mathbb{R}^{n\times n}$ is the conductance matrix, $C \in \mathbb{R}^{n\times n}$ is the matrix resulting from
storage elements, $v(t)$ is the vector of time-variant node voltages and branch currents
of voltage sources, $u(t)$ is the vector of independent sources, and $B$ is the input
selector matrix.
We remark that the proposed method can be directly applied to power grids
modeled as RLC/RLCK circuits. But inductive effects are still most visible at board
and package levels, and the recent power grid networks from IBM only consist of
resistance [123].
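A transient solve of the MNA system (8.2) can be sketched with backward Euler, $(G + C/h)\,v_{k+1} = (C/h)\,v_k + B\,u(t_{k+1})$; the two-node RC grid below is a toy stand-in for a real power grid, with made-up element values.

```python
# Minimal backward-Euler transient solve of the MNA system (8.2),
# G v + C dv/dt = B u(t), on a toy two-node RC grid.
import numpy as np

G = np.array([[2.0, -1.0],
              [-1.0, 2.0]])     # conductances (S); illustrative values
C = np.diag([1e-12, 1e-12])     # node capacitances (F)
B = np.eye(2)
h = 1e-12                       # time step (s)

def step(v, u_next):
    """One backward-Euler step: (G + C/h) v_{k+1} = (C/h) v_k + B u_{k+1}."""
    A = G + C / h
    rhs = (C / h) @ v + B @ u_next
    return np.linalg.solve(A, rhs)

v = np.zeros(2)
for _ in range(200):            # constant 1 mA injected at node 0
    v = step(v, np.array([1e-3, 0.0]))
print(v)                        # settles to the DC solution G^{-1} [1e-3, 0]
```

In the spectral stochastic method the same deterministic solver is reused once per Hermite PC coefficient, which is where the efficiency over sampling comes from.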
4 Problem Formulation
In this section, we present the modeling issue of leakage current under intra-die
variations for power grid networks. Note that in this case, the leakage current is a
random process instead of a random variable as in the full-chip leakage analysis in
the previous part of this book. After this, we present the problem that we try to solve.
The G and C matrices and the input currents I(t) depend on the circuit parameters,
such as metal wire width, length, and thickness on power grids, and transistor
parameters, such as channel length, width, gate oxide thickness, etc. Some previous
work assumes that all circuit parameters and current sources are treated as uncorrelated
Gaussian random variables [47]. In this chapter, we consider both power grid
wire variations and the log-normal leakage current variations caused by the channel
length variations, which are modeled as Gaussian (normal) variations [142].
Process variations can also be classified into inter-die (die-to-die) variations
and intra-die variations. In inter-die variations, all the parameters variations are
correlated. The worst-case corner can be easily found by setting the parameters
to their range limits (mean plus 3σ). The difficulty lies in the intra-die variations,
where the circuit parameters are uncorrelated or spatially correlated within a
die. Intra-die variations also consist of local and layout-dependent deterministic
components and random components, which are typically modeled as a multivariate
Gaussian process with some spatial correlations [12]. In this chapter, we first assume
we have a number of independent (uncorrelated) transformed orthonormal Gaussian
random variables $\xi_i,\ i = 1, \ldots, n$, which model the channel length
and device threshold voltage variations and other variations. Then, we consider
spatial correlation in the intra-die variation. We apply the PCA method in Sect. 2.2
of Chap. 2 to transform the correlated variables into uncorrelated variables before the
spectral statistical analysis.
Let $\Omega$ denote the sample space of the experimental or manufacturing outcomes.
For $\omega \in \Omega$, let $\xi^d(\omega) = [\xi_1^d(\omega), \ldots, \xi_r^d(\omega)]$ be a vector of $r$ Gaussian variables
representing the circuit parameters of interest. After the PCA operation, we obtain
the independent random variable vector $\xi = [\xi_1, \ldots, \xi_n]$. Notice that $n \le r$ in general.
Therefore, given the process variations, the MNA formulation (8.2) becomes

$$G(\xi)\, v(t) + C(\xi)\, \frac{dv(t)}{dt} = I(t, \xi). \qquad (8.3)$$
The variation in wire width and thickness will cause variation in the conductance
matrix $G(\xi)$ and capacitance matrix $C(\xi)$. The variations are more related to the back
end of the line (BEOL), as power grids are mainly metals at the top or middle layers.
The input current vector, $I(t, \xi)$, has both deterministic and random components.
In this chapter, to simplify our analysis, we assume the dynamic currents (power)
caused by circuit switching are still modeled as deterministic currents as we only
consider the leakage variations. Practically, the variations caused by the dynamic
power of circuits can be significant. But the voltage variations caused by the leakage
variations can be viewed as background noise, which can be considered together
with dynamic power-induced variations later.
To obtain the variational current sources $I(t, \xi)$, some library characterization
methods will be used to compute $I(t, \xi)$ once we know the effective channel
length Leff variations, threshold voltage (Vth) variations, and other variation sources
under different input patterns. With such a variation-aware cell library, we can more
accurately obtain $I(t, \xi)$ based on logic simulation of the whole chip
under some inputs.
Note that, from a practical-use perspective, a user may be interested only in voltage
variations over a period of time, or in the worst case within a period of time. That
information can be easily obtained once we know the variations at any given time
instance. In other words, the information we obtain here can be used to derive any
other information that is of interest to designers.
The problem we need to solve is to efficiently find the mean and variance of the
voltage v(t) at any node and at any time instance. A straightforward method is the
MC-based sampling method in Sect. 3.1 of Chap. 2: we randomly generate $G(\xi)$, $C(\xi)$,
and $I(t, \xi)$ based on the log-normal distribution, solve (8.3) in the time
domain for each sample, and compute the means and variances based on sufficient
samplings. Obviously, MC will be computationally expensive. However, MC gives
the most reliable results and is the most robust and flexible method.
Specifically, we expand the variational G and C around their mean values and
keep the first-order terms, as in [22, 102, 134]:

$$G(\xi) = G_0 + G_1 \xi_1 + G_2 \xi_2 + \cdots + G_M \xi_M, \qquad (8.4)$$
$$C(\xi) = C_0 + C_1 \xi_1 + C_2 \xi_2 + \cdots + C_M \xi_M.$$
We remark that the presented method can be trivially extended to second- and
higher-order terms [134]. The input current variation $i(t, \xi)$ follows the log-normal
distribution, as leakage variations are the dominant factors.
Note that the input current variation $i(\xi)$ is not a function of time, as we only model the
static leakage variations for simplicity of presentation. However, the presented
approach can be easily applied to time-variant variations with any distribution.
To simplify the presentation, we first assume that C and G are deterministic in (8.3).
We will remove this assumption later. In case $v(t, \xi)$ is an unknown random
process, as shown in Sect. 3.2 of Chap. 2 (with unknown distributions), like the node
voltages in (8.3), then the coefficients can be computed by using a Galerkin-based
method.

5 Statistical Power Grid Analysis Based on Hermite PC
By applying the Galerkin equation (2.44) and noting the orthogonality of
the various orders of Hermite PCs, we end up with the following equations:

$$G\, v_i(t) + C\, \frac{dv_i(t)}{dt} = I_i(t), \qquad (8.8)$$

where $i = 0, 1, 2, \ldots, P$.
For two independent Gaussian variables, we have

$$v(t, \xi) = v_0(t) + v_1(t)\,\xi_1 + v_2(t)\,\xi_2 + v_3(t)\,(\xi_1^2 - 1) + v_4(t)\,(\xi_2^2 - 1) + v_5(t)\,(\xi_1 \xi_2). \qquad (8.9)$$

Assuming that we have a similar second-order Hermite PC for the input leakage current $I(t, \xi)$,

$$I(t, \xi) = I_0(t) + I_1(t)\,\xi_1 + I_2(t)\,\xi_2 + I_3(t)\,(\xi_1^2 - 1) + I_4(t)\,(\xi_2^2 - 1) + I_5(t)\,(\xi_1 \xi_2), \qquad (8.10)$$
then (8.8) holds for $i = 0, \ldots, 5$. For more than two Gaussian variables,
we obtain similar results with more coefficients of the Hermite PCs to be solved
by using (8.8).
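The decoupling into the independent systems (8.8) rests on the mutual orthogonality of the six second-order basis functions for two Gaussian variables. This can be verified numerically with a tensor Gauss-Hermite (probabilists') rule, a sketch independent of any circuit data:

```python
# Orthogonality check of the six basis functions used in (8.9)-(8.10)
# under the two-dimensional Gaussian measure, via a tensor Gauss-Hermite
# (probabilists') quadrature rule.
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

x, w = hermegauss(8)
w = w / np.sqrt(2.0 * np.pi)                 # normalize the 1-D weights
X1, X2 = np.meshgrid(x, x)                   # tensor grid over (xi1, xi2)
W = np.outer(w, w)

basis = [np.ones_like(X1), X1, X2, X1**2 - 1, X2**2 - 1, X1 * X2]

# Gram matrix of inner products <b_i, b_j> under the Gaussian measure.
gram = np.array([[np.sum(bi * bj * W) for bj in basis] for bi in basis])
print(np.round(gram, 6))   # diagonal <H_q^2> = [1, 1, 1, 2, 2, 1], zeros off
```

The diagonal entries are exactly the norms that divide the Galerkin projections, and the vanishing off-diagonal entries are what make the coefficient equations in (8.8) decouple.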
Once we obtain the Hermite PC of v(t, ξ), we can obtain the mean and variance of v(t, ξ) by (2.39).
One critical problem remaining is how to obtain the Hermite PC (8.7) for a leakage current with log-normal distribution. Our method is based on Sect. 4 of Chap. 2, and we will show how it can be applied to our problems for one or more independent Gaussian variables.
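For a single Gaussian variable there is a known closed form (the one exploited in Sect. 4 of Chap. 2): a log-normal l = exp(μ + σξ) has Hermite PC coefficients l_k = e^{μ+σ²/2} σ^k/k! in the probabilists' Hermite basis, with ⟨He_k²⟩ = k!. The sketch below, with illustrative μ and σ, checks the truncated PC mean and variance against the exact log-normal moments:

```python
import math

# Hermite PC coefficients of a log-normal l = exp(mu + sigma*xi), xi ~ N(0,1):
# l_k = exp(mu + sigma^2/2) * sigma^k / k!  (probabilists' Hermite He_k).
mu, sigma, K = 0.0, 0.5, 12   # illustrative values; K is the truncation order

l0 = math.exp(mu + sigma ** 2 / 2)
coeffs = [l0 * sigma ** k / math.factorial(k) for k in range(K + 1)]

# PC mean and variance: <He_k^2> = k!.
mean_pc = coeffs[0]
var_pc = sum(c * c * math.factorial(k) for k, c in enumerate(coeffs) if k > 0)

# Exact log-normal moments for comparison.
mean_exact = math.exp(mu + sigma ** 2 / 2)
var_exact = math.exp(2 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1)
```

With K = 12 the truncated variance matches the exact value essentially to machine precision for this σ; for larger σ, more terms are needed.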
Once we have the Hermite PC representation of the leakage current sources I(t, ξ), the node voltages v(t, ξ) can be computed using (8.8), and their means and variances then follow trivially from (2.39).
114 8 Statistical Power Grid Analysis Considering Log-Normal Leakage Current Variations
Spatial correlations exist in the intra-die variations in different forms and have
been modeled for timing analysis [12, 121]. The general way to consider spatial
correlation is by means of mapping the correlated random variables into a set
of independent variables. This can be done by using some orthogonal mapping
techniques, such as PCA in Sect. 2.2 of Chap. 2. In this chapter, we also apply the PCA method in our spectral statistical analysis framework for power grid statistical analysis.
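The orthogonal mapping behind (8.11) can be sketched for two correlated V_th variables; the covariance numbers below are made up, and the 2×2 eigendecomposition is done in closed form rather than with a library call:

```python
import math

# Decorrelating two correlated Vth variables Phi into independent xi,
# xi = A(Phi - mu_Phi).  Illustrative 2x2 covariance (values are made up).
s1, s2, rho = 0.10, 0.20, 0.6
cov = [[s1 * s1, rho * s1 * s2],
       [rho * s1 * s2, s2 * s2]]

# Closed-form eigendecomposition of a symmetric 2x2 matrix.
a, b, c = cov[0][0], cov[0][1], cov[1][1]
disc = math.sqrt(((a - c) / 2) ** 2 + b * b)
lam1, lam2 = (a + c) / 2 + disc, (a + c) / 2 - disc

def unit(vx, vy):
    n = math.hypot(vx, vy)
    return (vx / n, vy / n)

e1 = unit(b, lam1 - a)   # unit eigenvector for lam1 (valid since b != 0)
e2 = unit(b, lam2 - a)   # unit eigenvector for lam2

# Rows of A are the eigenvectors; then A cov A^T = diag(lam1, lam2), i.e.
# the mapped variables xi_j are uncorrelated with variance lam_j.
A = [e1, e2]
def quad(r, s):  # (row r of A) * cov * (row s of A)^T
    return sum(A[r][i] * cov[i][j] * A[s][j]
               for i in range(2) for j in range(2))
```

The check that the off-diagonal entry of A cov Aᵀ vanishes is exactly the statement that the mapped variables are independent Gaussians, which is what the spectral framework requires.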
To consider intra-die variation in V_th, the chip is divided into n regions. Let Φ = [Φ_1, Φ_2, ..., Φ_n] be a random variable vector representing the variation of V_th in the different regions of the circuit. In other words, in the ith region, the leakage current I_sub_i = c e^{V_th(Φ_i)} follows the log-normal distribution, where Φ_i is a random variable with Gaussian distribution. μ_Φ = [μ_{Φ1}, μ_{Φ2}, ..., μ_{Φn}] is the mean vector of Φ, and C is the covariance matrix of Φ.
With PCA, we can get the corresponding uncorrelated random variables ξ = [ξ_1, ξ_2, ..., ξ_n] from
\[ \xi = A(\Phi - \mu_\Phi). \qquad (8.11) \]
Also, the original random variables can be expressed as
\[ \Phi_i = \sum_{j=1}^{n} a_{ij}\,\xi_j + \mu_{\Phi_i}, \qquad i = 1, 2, \ldots, n, \qquad (8.12) \]
where a_{ij} is the ith-row, jth-column element of the orthogonal mapping matrix defined in (2.21). ξ = [ξ_1, ξ_2, ..., ξ_n] is a vector of orthogonal Gaussian random variables; the mean of ξ_j is 0 and its variance is λ_j, j = 1, 2, ..., n. In terms of the normalized (unit-variance) Gaussian vector ξ̂ = [ξ̂_1, ξ̂_2, ..., ξ̂_n], Φ_i can be expressed as
\[ \Phi_i = \sum_{j=1}^{n} a_{ij}\sqrt{\lambda_j}\,\hat\xi_j + \mu_{\Phi_i}, \qquad i = 1, 2, \ldots, n, \qquad (8.14) \]
\[ \Phi_i = \sum_{j=1}^{n} g_j\,\hat\xi_j + \mu_{\Phi_i}. \qquad (8.15) \]
Here,
\[ g_j = a_{ij}\sqrt{\lambda_j}, \qquad j = 1, 2, \ldots, n. \qquad (8.16) \]
\[ G\,v(t) + C\,\frac{dv(t)}{dt} = I(t, \hat\xi). \qquad (8.17) \]
\[ G(\xi_g)\,v(t) + C(\xi_c)\,\frac{dv(t)}{dt} = I(\xi_I, t). \qquad (8.18) \]
The variations in width W and thickness T cause variations in the conductance matrix G and capacitance matrix C, while the variation in threshold voltage causes variation in the leakage currents I. Thus, the conductances and capacitances of the wires can be expressed as in [47]:
\[ G(\xi_g) = G_0 + G_1\xi_g, \qquad C(\xi_c) = C_0 + C_1\xi_c. \qquad (8.19) \]
\[ I(\xi_I) = e^{g(\xi_I)}, \qquad g(\xi_I) = \mu_I + \sigma_I\,\xi_I. \qquad (8.21) \]
Now the task is to compute the coefficients of the Hermite PC of the node voltage v(t, ξ). Applying the Galerkin equation (2.44), we only need to solve the following equations. With the distributions of ξ_g, ξ_c, and ξ_I, we can get the coefficients v(t) = [v_0(t), v_1(t), ..., v_9(t)]^T of the node voltage from
\[ \widetilde{G}\,v(t) + \widetilde{C}\,\frac{dv(t)}{dt} = \widetilde{I}(t), \qquad (8.25) \]
where
\[
\widetilde{G} =
\begin{bmatrix}
G_0 & G_1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
G_1 & G_0 & 0 & 0 & 2G_1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & G_0 & 0 & 0 & 0 & 0 & G_1 & 0 & 0\\
0 & 0 & 0 & G_0 & 0 & 0 & 0 & 0 & G_1 & 0\\
0 & G_1 & 0 & 0 & G_0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & G_0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & G_0 & 0 & 0 & 0\\
0 & 0 & G_1 & 0 & 0 & 0 & 0 & G_0 & 0 & 0\\
0 & 0 & 0 & G_1 & 0 & 0 & 0 & 0 & G_0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & G_0
\end{bmatrix},
\]
\[
\widetilde{C} =
\begin{bmatrix}
C_0 & 0 & C_1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & C_0 & 0 & 0 & 0 & 0 & 0 & C_1 & 0 & 0\\
C_1 & 0 & C_0 & 0 & 0 & 2C_1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & C_0 & 0 & 0 & 0 & 0 & 0 & C_1\\
0 & 0 & 0 & 0 & C_0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & C_1 & 0 & 0 & C_0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & C_0 & 0 & 0 & 0\\
0 & C_1 & 0 & 0 & 0 & 0 & 0 & C_0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & C_0 & 0\\
0 & 0 & 0 & C_1 & 0 & 0 & 0 & 0 & 0 & C_0
\end{bmatrix},
\]
\[
\widetilde{I}(t) = [\,I_0(t),\, 0,\, 0,\, I_1(t),\, 0,\, 0,\, I_2(t),\, 0,\, 0,\, 0\,]^T. \qquad (8.26)
\]
Knowing the Hermite PC coefficients of the node voltage v(t, ξ), it is easy to obtain the mean and variance of v(t, ξ), which describe the random characteristics of the node voltage in the given circuit.
We remark that the presented method leads to large circuit matrices, which adds computation cost. To mitigate this scalability problem for really large power grid circuits, we can apply partitioning strategies to compute the variational responses for each subcircuit, which will be small enough for efficient computation, as done in the existing work [17, 206].
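The entries of the augmented matrices come from the Galerkin triple products ⟨H_i H_j, H_k⟩, normalized by ⟨H_k²⟩. The sketch below assembles the one-variable, second-order sub-block (basis {1, ξ, ξ²−1}) with Gauss-Hermite quadrature; G0 and G1 are scalars standing in for the mean and first-order coefficient matrices, and the values are made up:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Augmented (Galerkin) conductance block for ONE Gaussian variable and
# second-order Hermite PC, basis {1, xi, xi^2 - 1}, built from numerically
# computed triple products <H_i H_j H_k> / <H_k^2>.
G0, G1 = 2.0, 0.3            # illustrative scalars standing in for matrices
Gcoef = [G0, G1, 0.0]        # G(xi) = G0 + G1*xi, no second-order term

# Gauss-Hermite(e) quadrature for the weight exp(-x^2/2); normalize so the
# weights form a probability measure.
x, w = hermegauss(20)
w = w / np.sqrt(2 * np.pi)

H = [np.ones_like(x), x, x ** 2 - 1]     # He_0, He_1, He_2 at the nodes
norm2 = [np.sum(w * h * h) for h in H]   # <H_k^2> = 1, 1, 2

P = 3
Gsts = np.zeros((P, P))
for k in range(P):
    for j in range(P):
        for i in range(P):
            Gsts[k, j] += Gcoef[i] * np.sum(w * H[i] * H[j] * H[k]) / norm2[k]
```

The result reproduces the sub-block pattern visible in the rows of (8.26) for the ξ_g terms: G0 on the diagonal, G1 couplings between adjacent orders, and the single 2G1 entry coupling ξ to ξ²−1.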
6 Numerical Examples
This section describes the simulation results of circuits with log-normal leakage
current distributions for a number of power grid networks. All the presented
methods have been implemented in Matlab, with sparse matrix techniques used throughout. All experiments were carried out on a Linux system with dual 3.06-GHz Intel Xeon CPUs and 1 GB of memory. The initial results of this
chapter were published in [108, 109].
The power grid circuits we test are RC mesh circuits based on values from industrial circuits, driven only by leakage currents, as we are only interested in the variations from the leakage currents. The resistor values are on the order of 10^{-2} Ω, and the capacitor values are on the order of 10^{-12} F.
We first compare the presented method with the simple Taylor expansion method for one and for multiple Gaussian variables.
For simplicity, we assume one Gaussian random variable g(ξ), which is expressed as
\[ g(\xi) = \mu_g + \sigma_g\,\xi, \qquad (8.27) \]
where ξ is a normalized Gaussian random variable with ⟨ξ⟩ = 0 and ⟨ξ²⟩ = 1.
The log-normal random variable l(ξ), obtained from g(ξ), is written as
\[ l(\xi) = e^{g(\xi)}. \qquad (8.28) \]
Expanding the exponential into a Taylor series and keeping all terms up to second order, we have
\[
l(\xi) = 1 + \sum_{i=0}^{1}\xi_i\,g_i + \frac{1}{2}\sum_{i=0}^{1}\sum_{j=0}^{1}\xi_i\xi_j\,g_i g_j + \cdots
\]
\[
= \Big(1 + \mu_g + \frac{1}{2}\sigma_g^2 + \frac{1}{2}\mu_g^2\Big) + (\sigma_g + \mu_g\sigma_g)\,\xi + \frac{1}{2}\sigma_g^2\,(\xi^2 - 1) + \cdots. \qquad (8.29)
\]
We observe that the second-order Taylor expansion (8.29) is similar to the second-order Hermite PC in (2.57). Hence, the Galerkin-based method can still be applied; we then use (8.8) to obtain the Hermite PC coefficients of the node voltage v(t, ξ) accordingly. We emphasize, however, that the polynomials generated by the Taylor expansion are in general not orthogonal with respect to Gaussian distributions and cannot be used with the Galerkin-based method, unless we keep only the first-order Taylor terms (with less accuracy). In that case, the resulting node voltage distribution is still Gaussian, which obviously is not correct.
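The accuracy gap can be checked numerically for a log-normal e^{σξ} (taking μ_g = 0): the exact, second-order Hermite PC, and re-orthogonalized second-order Taylor standard deviations all have closed forms under these truncations, and the Taylor error grows quickly with σ:

```python
import math

# Standard deviation of l = exp(sigma*xi), xi ~ N(0,1), from three models.
def stds(sigma):
    # Exact log-normal variance.
    var_exact = math.exp(sigma ** 2) * (math.exp(sigma ** 2) - 1)
    # Second-order Hermite PC: l_k = exp(sigma^2/2) sigma^k / k!,
    # var = sum_{k>=1} l_k^2 k! truncated at k = 2.
    l0sq = math.exp(sigma ** 2)
    var_hpc = l0sq * (sigma ** 2 + sigma ** 4 / 2)
    # Taylor, re-orthogonalized as in (8.29):
    # l ~ (1 + sigma^2/2) + sigma*xi + (sigma^2/2)(xi^2 - 1).
    var_taylor = sigma ** 2 + sigma ** 4 / 2
    return tuple(math.sqrt(v) for v in (var_exact, var_hpc, var_taylor))

for sigma in (0.1, 0.5, 0.9):
    ex, hpc, tay = stds(sigma)
    print(f"sigma={sigma}: exact={ex:.4f}  "
          f"HPC err={abs(hpc - ex) / ex:.2%}  "
          f"Taylor err={abs(tay - ex) / ex:.2%}")
```

At σ = 0.1 both approximations are essentially exact; at σ = 0.9 the Taylor error exceeds 20% while the Hermite PC error stays in single digits, mirroring the trend reported in Table 8.1.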
We note that the first-order Taylor expansion has been used in statistical timing analysis [12]. Within this limitation, the delay variations owing to interconnects and devices can be approximated, and the skew distributions can be computed easily as Gaussian processes.
To compare the two methods, we use the MC method to measure their accuracies in terms of standard deviation. For MC, we use 2,000 samples, which represents about 97.7% accuracy. The results are summarized in Table 8.1. In this table, σ_g is the standard deviation of the Gaussian variable (the random threshold voltage) in the log-normal current source; HPC and Taylor are the standard deviations from the Hermite PC method and the Taylor expansion method, respectively, reported as relative percentages against the MC method.
We can observe that when the variation of the current source increases, the Taylor expansion method results in significant errors compared to the MC method, while the presented method has smaller errors in all cases. This clearly shows the advantage of the presented method.
Figure 8.2 shows the node voltage distribution at one node of a ground network with 1,720 nodes. The MC results are obtained from 2,000 samples. The standard deviation of the log-normal current sources with one Gaussian variable is 0.1. The mean and 3σ values computed by the Hermite PC method are also marked in the figure and fit the MC results very well. Figure 8.3 shows the node voltages and their variations caused by the leakage currents from 0 ns to 126 ns. The selected circuit contains 64 nodes, with one Gaussian variable of standard deviation 0.06 in the current source. The blue solid lines are the mean, upper bound, and lower bound; the cyan lines are the node voltages of 2,000 MC runs. Most of the MC results lie between the upper and lower bounds.
Another observation is that when the standard deviation σ_g is small, the shape looks Gaussian, as in Fig. 8.2, but it is indeed log-normal. In the case of two random variables, one with a large and the other with a small standard deviation, the larger one dominates and the log-normal shape shows, as in Fig. 8.4.
To consider multiple random variables, we divide the circuit into several partitions. We first divide the circuit into two parts. Figure 8.4 shows the node voltage at one node, at a particular time instance, of a ground network with 336 nodes and two independent variables. The standard deviations of the two Gaussian variations are σ_{g1} = 0.5 and σ_{g2} = 0.1. The 3σ variations are also marked in the figure. Tables 8.2 and 8.3 show the speedup of the Hermite PC method over the MC method with 2,000 samples, considering one and two random variables, respectively.
Fig. 8.2 Distribution of the voltage in a given node with one Gaussian variable, σ_g = 0.1, at time 50 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE
Fig. 8.3 Distribution of the voltage caused by the leakage currents in a given node with one Gaussian variable, σ_g = 0.5, in the time interval from 0 ns to 126 ns. Reprinted with permission from [109] © 2008 IEEE
Fig. 8.4 Distribution of the voltage in a given node with two Gaussian variables, σ_{g1} = 0.1 and σ_{g2} = 0.5, at time 50 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE
Table 8.2 CPU time comparison with the Monte Carlo method, one random variable

Ckt         #node    p   n   MC (s)       #MC    HPC (s)   Speedup
gridrc 6    280      2   1   766.06       2000   1.0156    754.3
gridrc 12   3240     2   1   4389         2000   8.3281    527.0
gridrc 5    49600    2   1   2.3 × 10^5   2000   298.02    771.76
Table 8.3 CPU time comparison with the Monte Carlo method, two random variables

Ckt         #node     p   n   MC (s)        #MC    HPC (s)   Speedup
gridrc 3    280       2   2   1.05 × 10^3   2000   2.063     507.6
gridrc 5    49600     2   2   2.49 × 10^5   2000   445.6     558.7
gridrc 9    105996    2   2   6.11 × 10^5   2000   1141.8    535.1
In the two tables, #node is the number of nodes in the power grid circuit, p is the order of the Hermite PCs, and n is the number of independent Gaussian random variables. #MC is the number of samples used for the MC method. HPC and MC are the CPU times of the Hermite PC method and the MC method, respectively. The presented method is about two orders of magnitude faster than the MC method.
When more Gaussian variables are used for modeling intra-die variations, more Hermite PC coefficients need to be computed. Hence, the speedup will be smaller if the MC method uses the same number of samples, as shown for gridrc 12. Also, one
Φ₁ = ξ₁ + 0.5ξ₂,  Φ₂ = ξ₂ + 0.5ξ₁
Fig. 8.5 Correlated random variables setup in a ground circuit divided into two parts. Reprinted with permission from [109] © 2008 IEEE
observation is that the speedup depends on the sampling size of the MC method. The speedup of the presented method over the MC method depends on many factors, such as the order of the polynomials, the number of variables, etc. In general, the speedup should not have a clear relationship with circuit size. We still use 2,000 samples for MC, which represents about 97.7% accuracy (as the error in MC is roughly 1/√2000 for 2,000 samples).
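The sampling rule of thumb is worth making concrete; the 1% target in the sketch below is an arbitrary example:

```python
import math

# MC error rule of thumb: relative error ~ 1/sqrt(N).  For N = 2000 the
# error is about 2.2% (roughly "97.7% accuracy"); halving the error
# requires four times as many samples.
def mc_error(n):
    return 1.0 / math.sqrt(n)

err_2000 = mc_error(2000)                 # about 0.0224
n_for_1pct = math.ceil(1.0 / 0.01 ** 2)   # samples needed for ~1% error
```

This quadratic blow-up in sample count is exactly why the spectral methods in this chapter, whose cost does not depend on a sample count, win by orders of magnitude.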
To model intra-die variations with spatial correlations, we divide the power grid circuit into several parts. We first consider the circuit partitioned into two parts. In this case, we have two independent random current variables, ξ₁ and ξ₂. The correlated variables for the two parts are Φ₁ = ξ₁ + 0.5ξ₂ and Φ₂ = ξ₂ + 0.5ξ₁, respectively, as shown in Fig. 8.5.
Table 8.4 shows the error percentages of the mean and standard deviation for Monte Carlo versus HPC with PCA and for Monte Carlo versus HPC without PCA. As shown in the table, it is necessary to use PCA when spatial correlation is considered. Figure 8.6 shows the node voltage distribution of one node in a ground network with 336 nodes, using both the PCA and non-PCA methods.
For more accuracy, we divide the circuit into four parts, each correlated with its neighbors as shown in Fig. 8.7. Φ is the correlated random variable vector used in the circuit, and ζ = [ζ₁, ζ₂, ζ₃, ζ₄] are independent Gaussian random variables with standard deviations σ₁ = 0.1, σ₂ = 0.2, σ₃ = 0.1, and σ₄ = 0.5. Figure 8.8 shows the voltage distribution of a given node; the mean voltage and the worst-case voltages are given as solid lines. Figure 8.9 shows the voltage distribution of a circuit with 1,160 nodes, partitioned into 25 parts (five rows by five columns) with spatial correlation. The dashed blue lines are the mean, upper bound, and lower bound from the Hermite PC method, while the solid red lines are the mean, upper bound, and lower bound from 2,000 MC runs.
Fig. 8.6 Distribution of the voltage in a given node with two Gaussian variables with spatial correlation, at time 70 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE
φ₁ = ζ₁ + 0.5ζ₂ + 0.5ζ₃,  φ₂ = ζ₂ + 0.5ζ₁ + 0.5ζ₄,  φ₃ = ζ₃ + 0.5ζ₁ + 0.5ζ₄,  φ₄ = ζ₄ + 0.5ζ₂ + 0.5ζ₃
Fig. 8.7 Correlated random variables setup in a ground circuit divided into four parts. Reprinted with permission from [109] © 2008 IEEE
Note that the size of the ground networks we analyzed is mainly limited by the solving capacity of Matlab on a single Intel CPU Linux workstation. Given the long simulation time of large MC sampling runs, we limit the ground network size to about 3,000 nodes. Also note that, for more accurate modeling, more partitions of the circuit are needed, and thus more independent Gaussian variables, as shown in [12].
Figure 8.10 shows the node voltage distribution at one node of the ground circuit circuit5, which contains 280 nodes, considering variations in conductance, capacitance, and leakage current. The maximum 3σ variation is 10% in ξ_g, ξ_c, and ξ_I. In the figures, the solid lines are the mean voltage and worst-case voltages using the
Fig. 8.8 Distribution of the voltage in a given node with four Gaussian variables with spatial correlation, at time 30 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE
Fig. 8.9 Distribution of the voltage in a given node with the circuit partitioned 5 × 5 with spatial correlation, at time 30 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE
Fig. 8.10 Distribution of the voltage in a given node in circuit5 with variations in G, C, and I, at time 50 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE
HPC method. The histogram bars are the Monte Carlo results of 2,000 samples, and the dotted lines are the mean voltage and worst-case voltages of those samples. From the figures, we can see that the results obtained from the two methods match very well.
Table 8.5 shows the CPU speedup of the HPC method over the MC method. The Monte Carlo sample number is 3,500, and the presented method is about two orders of magnitude faster than the MC method when considering variations in conductances, capacitors, and voltage sources. The speedup becomes smaller for larger circuits because of the super-linear time complexity of the linear solver, as the augmented matrices in (8.26) grow faster than the individual matrices G_i and C_i. The presented method does not favor very large circuits; practically, this scalability problem can be mitigated by partitioning-based strategies [17].
7 Summary
In this chapter, we have presented a stochastic simulation method for fast estimation of the voltage variations caused by process-induced log-normal leakage current variations with spatial correlations. The presented analysis is based on the Hermite PC representation of random processes. We extended the existing Hermite PC-based power grid analysis method [47] by considering log-normal leakage distributions as well as spatial correlations; the new method considers both log-normal leakage distributions and wire variations at the same time. The numerical results show that the new method is more accurate than the Gaussian-only Hermite PC using the Taylor expansion for analyzing leakage current variations, and two orders of magnitude faster than MC methods with small variation errors. In the presence of spatial correlations, ignoring the correlations may lead to large errors, roughly 8-10% in our tested cases. Numerical examples show the correctness and high accuracy of the presented method: it leads to about 1% or less error in both mean and standard deviation and is about two orders of magnitude faster than MC methods.
Chapter 9
Statistical Power Grid Analysis by Stochastic
Extended Krylov Subspace Method
1 Introduction
In this chapter, we present a stochastic method, called StoEKS, for analyzing the voltage drop variations of on-chip power grid networks with log-normal leakage current variations. StoEKS still applies the spectral stochastic method to solve for the variational responses, but, different from the existing spectral-stochastic-based simulation methods, the EKS method [177, 191] is employed to compute the variational responses using the augmented matrices consisting of the coefficients of Hermite polynomials. Our work is inspired by a recent spectral-stochastic-based model order reduction method [214]. We apply this work to the variational analysis of on-chip power grid networks considering variational leakage currents with log-normal distribution.
Our contribution lies in accelerating the spectral stochastic method with the EKS method, to rapidly solve the variational circuit equations for the first time. By using the Krylov-subspace-based reduction technique, the new method partially mitigates the increased circuit-size problem associated with the augmented matrices of the Galerkin-based spectral stochastic method. We will show how the coefficients of the Hermite PCs are computed for the variational circuit matrices and for the current moments used in EKS with log-normal distributions. Numerical examples show that the presented StoEKS is about two orders of magnitude faster than the existing Hermite PC-based simulation method, with error similar to that of the MC method, and that StoEKS can analyze much larger circuits than the existing Hermite PC method on the same computation platform.
The variational power grid models and the problem we solve here are the same as in Chap. 8. The rest of this chapter is organized as follows: Sect. 3 reviews the orthogonal PC-based stochastic simulation method and the improved EKS method; Sect. 4 presents our new statistical power grid simulation method; Sect. 5 presents the numerical examples; and Sect. 6 concludes this chapter.
2 Problem Formulation
In this chapter, we assume that the variational current source u(t, ξ) in (8.3) consists of two components:
\[ u(t,\xi) = u_d(t) + u_v(t,\xi), \qquad (9.1) \]
where u_d(t) is the dynamic current vector from circuit switching, which is still modeled as deterministic, since we consider only the leakage variations, and u_v(t, ξ) is the variational leakage current vector, which is dominated by subthreshold leakage currents and may also change over time. u_v(t, ξ) follows the log-normal distribution.
The problem to solve is to efficiently find the mean and variance of the voltage v(t, ξ) at any node at any time instance, without using a time-consuming sampling-based method such as MC.
In this subsection, we briefly review the EKS method of [191] and [89] for fast computation of the responses of linear dynamic systems.
The EKS method uses a Krylov-like reduction to speed up the simulation process. Different from Krylov-based model order reduction, EKS performs the reduction considering both the system matrices and the input signals before the simulation (so the subspace is no longer a Krylov subspace); it is essentially a simulation approach using Krylov-subspace reduction. It assumes the input signals can be represented by piecewise linear (PWL) sources.
Let V = [v̂₁, v̂₂, ..., v̂ₖ] be an orthogonal basis for the moment subspace (m₀, m₁, ..., mₖ) of the input u(t). A high-level description of the EKS algorithm (Fig. 9.1) [191] follows.
Then the original circuit described by (8.2) can be reduced to a smaller system:
\[ \hat G\,z(t) + \hat C\,\frac{dz(t)}{dt} = \hat B\,u, \qquad (9.2) \]
where \hat G = V^T G V, \hat C = V^T C V, and \hat B = V^T B. After the reduced system (9.2) has been solved for the given input u(t), the solution z(t) can then be mapped back to the original space by v(t) = V z(t).
As EKS models a PWL source as a sum of delayed ramps in the Laplace domain, the resulting terms contain 1/s and 1/s² moments [191], while the traditional
3 Review of Extended Krylov Subspace Method 129
1   v̂₀ = α₀ v₀, where v₀ = G⁻¹ B u₀ and α₀ = 1/norm(v₀);
2   set h_s = 0;
3   for i = 1 : q − 1
4       v_i = G⁻¹ { (∏_{j=0}^{i−1} α_j) B u_i − C (v̂_{i−1} + α_{i−1} h_s) };
5       h_s = 0;
6       for j = 0 : i − 1
7           h = v̂_jᵀ v_i;
8           h_s = h_s + h v̂_j;
9       end
10      v̄_i = v_i − h_s;
11      α_i = 1/norm(v̄_i);
12      v̂_i = α_i v̄_i;
13  end
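The listed basis construction can be sketched on a tiny dense test system; G, C, B, and the input moments u_i below are made-up stand-ins, and only the orthonormality of the resulting basis is checked (the sign convention on the C term follows the standard moment recursion, an assumption about the garbling-prone listing above):

```python
import numpy as np

# Toy system: G well-conditioned, C diagonal, single input column B.
rng = np.random.default_rng(1)
m, q = 8, 4
G = 2.0 * np.eye(m) + 0.1 * rng.standard_normal((m, m))
C = 0.5 * np.eye(m)
B = rng.standard_normal(m)            # position vector (one source)
u = rng.standard_normal(q)            # input moments u_0 .. u_{q-1}

# Step 1: v0 = G^{-1} B u0, normalized.
v0 = np.linalg.solve(G, u[0] * B)
alpha = [1.0 / np.linalg.norm(v0)]
Vhat = [alpha[0] * v0]
hs = np.zeros(m)
for i in range(1, q):
    scale = np.prod(alpha)            # product of alpha_j for j < i
    vi = np.linalg.solve(G, scale * u[i] * B - C @ (Vhat[i - 1] + alpha[i - 1] * hs))
    # Gram-Schmidt against all previous basis vectors (lines 5-9).
    hs = np.zeros(m)
    for vj in Vhat:
        hs = hs + (vj @ vi) * vj
    vbar = vi - hs
    alpha.append(1.0 / np.linalg.norm(vbar))
    Vhat.append(alpha[-1] * vbar)

V = np.column_stack(Vhat)             # orthonormal basis, m x q
```

The projection V then reduces the m-dimensional system to a q-dimensional one as in (9.2), which is the entire source of the speedup.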
Krylov subspace starts from the 0th moment. Therefore, moment shifting must be performed in EKS, which causes complex computation and more errors. This problem is resolved in [89] by the IEKS algorithm, which shows that the moments of 1/s and 1/s² are zero for PWL input sources.
Assume that we want to obtain a single input source u_j(s) in the following moment form:
\[ u_j(s) = u_1 + u_2 s + u_3 s^2 + \cdots + u_L s^{L-1}. \]
A PWL source u_j(t) is represented by a series of value-time pairs (a_1, \tau_1), (a_2, \tau_2), ..., (a_{K+2}, \tau_{K+2}), and L moments need to be calculated. As proposed in [89], the mth moment of the current source u_j(t) in a current source vector u(s) can be calculated as
\[ u_{j,m} = a_1\alpha_1\beta_1^{(m)} - \frac{1}{m+1}\sum_{i=1}^{k}(\alpha_i - \alpha_{i+1})\,\beta_{i+1}^{(m+1)} - \frac{a_{K+2}}{m+1}\,\alpha_{K+1}\beta_{K+2}^{(m)}, \qquad m = 1, \ldots, L. \qquad (9.3) \]
Here,
\[ \beta_i^{(m)} = \frac{(-\tau_i)^m}{m!}, \qquad \alpha_i = \frac{a_{i+1} - a_i}{\tau_{i+1} - \tau_i}. \]
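The underlying moment picture is easy to cross-check numerically: for a time-limited source, u(s) = ∫ u(t) e^{−st} dt, so the mth Taylor coefficient in s is ∫ u(t)(−t)^m/m! dt. The sketch below checks this definition (not the closed form (9.3) itself) on a made-up triangle pulse:

```python
import math

# Triangle PWL pulse: 0 at t=0, peak a at t=tp, back to 0 at t=te.
a, tp, te = 2.0, 1.0, 3.0   # illustrative values

def u(t):
    if 0.0 <= t <= tp:
        return a * t / tp
    if tp < t <= te:
        return a * (te - t) / (te - tp)
    return 0.0

def moment(m, n=30000):
    # Trapezoidal rule on [0, te]; u(0) = u(te) = 0, so endpoints drop out.
    h = te / n
    s = sum(u(i * h) * (-(i * h)) ** m for i in range(1, n))
    return s * h / math.factorial(m)

m0 = moment(0)   # area under u(t): 0.5 * a * te = 3.0 for this pulse
m1 = moment(1)   # -integral of t*u(t) dt = -4.0 for this pulse
```

A closed form such as (9.3) computes the same quantities exactly from the PWL breakpoints, avoiding both the quadrature cost and its discretization error.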
The EKS/IEKS method, however, has its limitations. One major drawback is that the current sources have to be represented in explicit moment form, which may not be accurate or numerically stable when high-order moments are employed for high-frequency-rich current waveforms, owing to the well-known problem of explicit moment matching [136].
Recently, a more stable and accurate algorithm, called ETBR, has been proposed [93], based on a more accurate fast truncated balanced reduction method. It uses a frequency spectrum to represent the current sources and is thus more flexible and accurate. Since our contribution in this chapter is not about improving the EKS method, we simply use EKS as the baseline algorithm for StoEKS.
In this section, we present the new stochastic simulation algorithm, StoEKS, which is based on both the spectral stochastic method and the EKS method [191]. The main idea is to use the spectral stochastic method to convert the statistical simulation into a deterministic simulation problem, and then apply EKS to solve the converted problem.
First, we present the StoEKS algorithm flowchart, shown in Fig. 9.2. The algorithm starts with the variational G(ξ), C(ξ), and variational input source u(t, ξ). It applies the spectral stochastic method to convert the variational system (8.3) into a deterministic system, which consists of augmented matrices of G(ξ) and C(ξ) and the position matrix B in (8.3), with new unknowns. Then we generate the first L moments of the coefficients of the Hermite polynomials of the current sources, U_L, with log-normal distribution. Next, we apply EKS/IEKS to solve the obtained deterministic system for the response Z using the computed projection matrix V. After this, we recover the transient response of the original augmented system by v(t) = V z(t). Finally, we compute the mean and variance of any voltage node from v(t).
In the following subsections, we present detailed descriptions of some critical steps of the StoEKS algorithm.
We first show how we convert the variational circuit equation into a deterministic
one, which is suitable for EKS. Our work follows the recently presented stochastic
model order reduction (SMOR) method [214]. SMOR is based on Hermite PC and
the Krylov-based projection method.
4 The Stochastic Extended Krylov Subspace Method—StoEKS 131
We first assume that G(ξ), C(ξ), and u(t, ξ) in (8.3) are represented in Hermite PC form with a proper order P. Here, H_i(ξ) are the Hermite PC basis functions for G(ξ), C(ξ), and u(t, ξ), and P is the number of these basis functions, which depends on the number of random variables n and the expansion order p as in (2.31). G_i, C_i, and u_i are the Hermite polynomial coefficients of the conductances, capacitors, and current sources; G_0 and C_0 are the mean values of the conductances and capacitors, and G_i and C_i (i ≥ 1) are the variational parts.
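The count of basis functions follows the standard formula P = (n + p)!/(n! p!); a two-line check ties it to the cases used in this book:

```python
import math

# Number of Hermite PC basis functions for n random variables and
# expansion order p: P = (n + p)! / (n! p!), as in (2.31).
def num_basis(n, p):
    return math.comb(n + p, p)

# n=2, p=2 gives the six terms of (8.9); n=3, p=2 (xi_g, xi_c, xi_I)
# gives the 10 x 10 augmented matrices used later in this chapter.
```

Because P grows combinatorially in n and p, keeping the number of independent variables small (e.g., via PCA truncation) directly controls the augmented system size mP.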
Ideally, to obtain G and C in HPC format, i.e., to compute G_i and C_i from the width and length variables, one can use the spectral stochastic analysis method [86] (a fast MC-based method) or other extraction methods. For this chapter, we simply assume that such information is given, with
\[ G_i = a_i G_0, \qquad C_i = a_i C_0, \qquad i = 1, \ldots, P. \qquad (9.4) \]
\[
\sum_{i=0}^{P-1}\sum_{j=0}^{P-1} G_i v_j H_i H_j + s\sum_{i=0}^{P-1}\sum_{j=0}^{P-1} C_i v_j H_i H_j = u_d(t) + \sum_{i=0}^{P-1} u_i(t) H_i. \qquad (9.5)
\]
After performing the inner product with H_k on both sides of (9.5), it becomes
\[
\sum_{i=0}^{P-1}\sum_{j=0}^{P-1} G_i v_j \langle H_i H_j, H_k\rangle + s\sum_{i=0}^{P-1}\sum_{j=0}^{P-1} C_i v_j \langle H_i H_j, H_k\rangle = \sum_{i=0}^{P-1} u_i \langle H_i, H_k\rangle + \langle H_k, 1\rangle\,u_d(t), \qquad k = 0, 1, \ldots, P-1. \qquad (9.7)
\]
The inner products are constants that can be computed a priori and stored in a table for fast evaluation. Based on these P equations and the orthogonality of the Hermite polynomials, the equations can be written in matrix form as
\[
G_{sts} = \begin{bmatrix} G_{00} & \cdots & G_{0,P-1}\\ \vdots & \ddots & \vdots\\ G_{P-1,0} & \cdots & G_{P-1,P-1} \end{bmatrix}, \qquad
C_{sts} = \begin{bmatrix} C_{00} & \cdots & C_{0,P-1}\\ \vdots & \ddots & \vdots\\ C_{P-1,0} & \cdots & C_{P-1,P-1} \end{bmatrix},
\]
\[
u_{sts} = \begin{bmatrix} u_0(t) + u_d(t)\\ u_1(t)\\ \vdots\\ u_{P-1}(t) \end{bmatrix}, \qquad
V = \begin{bmatrix} V_0(t)\\ V_1(t)\\ \vdots\\ V_{P-1}(t) \end{bmatrix}, \qquad (9.11)
\]
\[
B_{sts} = \begin{bmatrix} B_0 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & B_{P-1} \end{bmatrix}, \qquad (9.12)
\]
with
\[
B_i = B, \qquad G_{kj} = \sum_{i=0}^{P-1} G_i \langle H_i H_j, H_k\rangle, \qquad C_{kj} = \sum_{i=0}^{P-1} C_i \langle H_i H_j, H_k\rangle,
\]
where G_{sts} ∈ R^{mP×mP}, C_{sts} ∈ R^{mP×mP}, B_{sts} ∈ R^{mP×lP}, m is the size of the original circuit, and P is the number of Hermite polynomials. In [214], a PRIMA-like reduction is performed on (9.10) to obtain the reduced variational system.
In this section, we show how to compute the Hermite coefficients of the variational leakage currents and their corresponding moments used in the augmented equation (9.10).
Let u_v^i(t, ξ) be the ith current in the current vector u_v(t, ξ) in (9.1), a function of the normalized Gaussian random variables ξ = [ξ_1, ξ_2, ..., ξ_n] and time t:
\[ u_v^i(t,\xi) = e^{g(t,\xi)} = e^{\sum_{j=0}^{n} g_j(t)\,\xi_j}. \qquad (9.13) \]
The leakage current sources therefore follow the log-normal distribution. We can then represent u_v^i(t, ξ) in Hermite PC expansion form:
\[
u_v^i(t,\xi) = \sum_{k=0}^{P} u_{v_k}^i(t)\,H_k^n(\xi)
= u_{v_0}^i(t)\left(1 + \sum_{i=1}^{n}\xi_i\,g_i(t) + \sum_{i=1}^{n}\sum_{j=1}^{n}\frac{(\xi_i\xi_j - \delta_{ij})}{\langle(\xi_i\xi_j - \delta_{ij})^2\rangle}\,g_i(t)g_j(t) + \cdots\right), \qquad (9.14)
\]
where
\[
u_{v_0}^i(t) = e^{\,g_0(t) + \frac{1}{2}\sum_{i=1}^{n} g_i(t)^2}, \qquad P = \sum_{k=0}^{p}\frac{(n-1+k)!}{k!\,(n-1)!}. \qquad (9.15)
\]
Given G_{sts}, C_{sts}, and u_{sts} in moment form, we can obtain the orthogonal basis V using the EKS algorithm. The reduced system obtained with this orthogonal basis V becomes
\[ \hat G_{sts}\,z(t) + \hat C_{sts}\,\frac{dz(t)}{dt} = \hat B_{sts}\,u_{sts}. \qquad (9.18) \]
The reduced system can be solved in the time domain by any standard integration algorithm, and its solution z(t) can then be projected back to the original space by ṽ(t) = V z(t).
By solving the augmented equation (9.10), we can obtain the mean and variance of any node voltage v(t) by
\[
E(v(t)) = E\!\left(v_0(t) + \sum_{i=1}^{P-1} v_i(t) H_i\right) = v_0(t),
\]
\[
\mathrm{var}(v(t)) = \mathrm{var}\!\left(v_0(t) + \sum_{i=1}^{P-1} v_i(t) H_i\right) = \sum_{i=1}^{P-1} v_i(t)^2\,\mathrm{var}(H_i).
\]
Further, the distribution of v(t) can be easily calculated from the properties of the Hermite PC and the distribution of ξ_1, ξ_2, ..., ξ_n. Figure 9.3 shows the StoEKS algorithm for given G_{sts}, C_{sts}, B_{sts}, and u_{sts}.
In the following, we consider a simple case with only three independent variables to illustrate the method. We assume three independent variables ξ_g, ξ_c, and ξ_I, associated with the matrices G and C and the input sources, respectively. We assume that the variational component u_v(t, ξ_I) in (9.1) follows the log-normal distribution, and the circuit equation becomes
\[ G(\xi_g)\,v(t) + C(\xi_c)\,\frac{dv(t)}{dt} = B\,u(t, \xi_I). \qquad (9.21) \]
The variations in width W and thickness T cause variations in the conductance matrix G and capacitance matrix C, while the variation in threshold voltage causes variation in the leakage currents u(t, ξ_I). Thus, the resulting system can be written as [47]
\[ G_{sts}\,v(t) + C_{sts}\,\frac{dv(t)}{dt} = B_{sts}\,u_{sts}(t), \qquad (9.23) \]
where
\[
G_{sts} =
\begin{bmatrix}
G_0 & G_1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
G_1 & G_0 & 0 & 0 & 2G_1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & G_0 & 0 & 0 & 0 & 0 & G_1 & 0 & 0\\
0 & 0 & 0 & G_0 & 0 & 0 & 0 & 0 & G_1 & 0\\
0 & G_1 & 0 & 0 & G_0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & G_0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & G_0 & 0 & 0 & 0\\
0 & 0 & G_1 & 0 & 0 & 0 & 0 & G_0 & 0 & 0\\
0 & 0 & 0 & G_1 & 0 & 0 & 0 & 0 & G_0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & G_0
\end{bmatrix},
\]
\[
C_{sts} =
\begin{bmatrix}
C_0 & 0 & C_1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & C_0 & 0 & 0 & 0 & 0 & 0 & C_1 & 0 & 0\\
C_1 & 0 & C_0 & 0 & 0 & 2C_1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & C_0 & 0 & 0 & 0 & 0 & 0 & C_1\\
0 & 0 & 0 & 0 & C_0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & C_1 & 0 & 0 & C_0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & C_0 & 0 & 0 & 0\\
0 & C_1 & 0 & 0 & 0 & 0 & 0 & C_0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & C_0 & 0\\
0 & 0 & 0 & C_1 & 0 & 0 & 0 & 0 & 0 & C_0
\end{bmatrix},
\]
\[
u_{sts}(t) = [\,u_0(t) + u_d(t),\, 0,\, 0,\, u_3(t),\, 0,\, 0,\, u_6(t),\, 0,\, 0,\, 0\,]^T.
\]
One observation is that although the augmented circuit matrices are much bigger than before, they are very sparse and consist of repeated coefficient matrices from the HPC. As a result, reduction techniques can significantly improve the simulation efficiency.
In this subsection, we analyze the computational costs of both the StoEKS and HPC methods and show the theoretical advantage of StoEKS over the non-reduction-based HPC method.
First, if the PCA operation is performed, which essentially uses SVD on the covariance matrix, its computational cost is O(ln²). Here, l is the number of original correlated random variables, and n is the number of dominant singular values retained, which is also the number of independent random variables after PCA. Since l is typically much smaller than the circuit size, the running time of PCA is not significant in the total cost.
After we transform the original circuit matrices into the augmented circuit
matrices in (9.10), which are still very sparse, the matrix sizes grow from m × m
to Pm × Pm, where P is the number of Hermite polynomials used. This number
depends on the Hermite polynomial order and the number of variables used, as
shown in (2.31).
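For reference, assuming (2.31) is the standard total-degree truncation, P is the binomial coefficient C(n + p, p) for n variables and order p, which is what makes the mP × mP augmented system grow quickly with the variable count:

```python
from math import comb

def num_pc_basis(n_vars: int, order: int) -> int:
    """Number of multivariate Hermite polynomials of total degree <= order
    in n_vars variables: P = (n_vars + order)! / (n_vars! * order!)."""
    return comb(n_vars + order, order)

# Second-order expansions for the variable counts used in this chapter:
sizes = {n: num_pc_basis(n, 2) for n in (3, 7, 11)}  # {3: 10, 7: 36, 11: 78}
```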
Typically, solving an n × n sparse linear system takes O(n^α) (typically 1 ≤ α ≤ 1.2
for sparse circuits), and matrix factorization takes O(n^β) (typically 1.1 ≤ β ≤ 1.5
for sparse circuits). For HPC, assuming that we need to compute w time steps
in transient analysis (taking w forward and backward substitutions after one LU
decomposition), the computing cost then is

    O(w(mP)^α + (mP)^β).   (9.24)
For StoEKS, we only need to take approximately q steps (after the one LU
decomposition), where q is the order of the reduced model, to compute the projection
matrix V. So the total computational cost is

    O(q(mP)^α + (mP)^β + mPq^2 + q^3 + wq^2),   (9.25)
without considering the cost of the PCA operations, O(ln^2), as we did not perform
PCA in our experiments. The last three terms are the costs of performing the
reduction (QR operation) and the transient simulation of the reduced circuit (whose
matrices are very dense) in the time domain. Since q ≪ w, the computing cost of
StoEKS can be significantly lower than that of HPC. The presented method can be
further improved by using the hierarchical EKS method [11].
5 Numerical Examples
This section describes the simulation results of circuits with both capacitance and
conductance variations and leakage current variation. The leakage current variation
follows log-normal distribution. The capacitance and conductance variations follow
Gaussian distribution.
All the presented methods have been implemented in Matlab 7.0. All the
experiments are carried out on a Dell PowerEdge 1900 workstation (running Linux)
with Intel quad-core Xeon CPUs at 2.99 GHz and 16 GB of memory.
To solve large circuits in Matlab, an external linear solver package UMFPACK [184]
has been used, which is linked with Matlab using Matlab mexFunction. The initial
results of this chapter were published in [110, 111].
As mentioned in Sect. 4 of Chap. 8, we assume that the random variables used
in the chapter for G and C and current sources are independent after the PCA
transformation.
First, we assume a time-variant leakage model, in which u_v^i(t, ξ) in (9.13)
is a function of time t, and further assume that g_i(t), the standard deviation,
is a fixed percentage, say 10%, of u_d(t) in (9.1), i.e., g_i(t) = 0.1 u_di(t), where
u_di(t) is the i-th component of the PWL current vector u_d(t).
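As a concrete illustration of this leakage model (a sketch with an arbitrary example current value, not the book's implementation), log-normal samples with a prescribed mean and a standard deviation equal to 10% of that mean can be drawn by matching the parameters of the underlying Gaussian:

```python
import numpy as np

def lognormal_samples(mean, sd, n, seed=0):
    """Draw n log-normal samples with the requested mean and standard
    deviation: sigma^2 = ln(1 + (sd/mean)^2), mu = ln(mean) - sigma^2/2."""
    rng = np.random.default_rng(seed)
    sigma2 = np.log1p((sd / mean) ** 2)
    mu = np.log(mean) - 0.5 * sigma2
    return rng.lognormal(mu, np.sqrt(sigma2), n)

# Example: a leakage current with mean 1 mA and s.d. of 10% of the mean
samples = lognormal_samples(1e-3, 1e-4, 200_000)
```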
Figures 9.4–9.6 show the results at one particular node under this configuration.
Figure 9.4 shows the node voltage distribution at one node of a ground network
with 280 nodes, considering variation in conductance, capacitance, and leakage
current (with three random variables). The standard deviation (s.d.) of the log-normal
current sources with one Gaussian variable is 0.1 u_di(t). The s.d. in
conductance and capacitance is also 0.1 of the mean. The mean and s.d. computed
by the Hermite PC method and by Hermite PC with EKS are also marked in the figure,
and they fit the MC results very well. In Fig. 9.4, the dotted lines are the mean and
s.d. calculated by MC. The solid lines are the mean and s.d. from the algorithm in [108],
which is named HPC. The dashed lines are the results from StoEKS. The MC
results are obtained from 3,000 samples. The reduced order for EKS is five, q = 5.
Fig. 9.4 Distribution of the voltage variations at a given node by StoEKS, HPC, and Monte
Carlo for a circuit with 280 nodes and three random variables. g_i(t) = 0.1 u_di(t). Reprinted with
permission from [110] © 2008 IEEE
Figure 9.5 shows the distribution at one node of a ground network with 2,640
nodes. The parameter g_i(t) is set to the same value as in the circuit with 280
nodes. The s.d. in conductance is 0.02, 0.05, and 0.1 of the mean for three variables.
The s.d. in capacitance is 0.02, 0.02, and 0.1 of the mean for three variables. There
are seven random variables in total. The dotted lines represent the MC results, and
the dashed lines represent the results given by StoEKS. From these two figures, we
can see only marginal differences between the three methods. The reduced
order for EKS is also five, q = 5.
Figure 9.6 shows the distribution at one node of a ground network with 280
nodes, but with a different variation setting of the parameters. The standard deviations in
conductance are set to 0.02, 0.02, 0.03, 0.05, and 0.05 of the mean for five variables,
respectively, i.e., their a_1 in (9.4) is set to those values. The standard deviations in
capacitance are set to 0.02, 0.03, 0.04, 0.05, and 0.05 of the mean for five variables,
respectively. The standard deviation of the log-normal current sources is 0.1
of the mean. There are 11 random variables in all, which makes it even harder for HPC to
compute the mean and s.d. of the circuit. The dotted lines represent the MC results,
and the dashed lines represent the results given by StoEKS. The reduced order for
EKS is ten.
Table 9.1 shows the speedup of the StoEKS and HPC methods over the MC method
under different numbers of random variables. In the table, #RV is the number of
140 9 Statistical Power Grid Analysis by Stochastic Extended Krylov...
Fig. 9.5 Distribution of the voltage variations at a given node by StoEKS, HPC, and MC for
a circuit with 2,640 nodes and seven random variables. g_i(t) = 0.1 u_di(t). Reprinted with
permission from [110] © 2008 IEEE
random variables used; there are 3, 7, and 11 random variables. The variation
setup with three random variables is the same as for the circuit in Fig. 9.4, the
setup with seven random variables matches the circuit in Fig. 9.5, and the setup
with 11 random variables matches the circuit in Fig. 9.6. The first speedup column
is the speedup of StoEKS over MC, and the second is the speedup of HPC over MC.
From the table, we observe that neither HPC nor MC can produce results in
reasonable time once the circuit becomes large enough, whereas StoEKS can
deliver all the results.
We remark that intra-die variations are typically strongly spatially correlated [16].
After a transformation such as PCA, the number of variables can be significantly
reduced. As a result, we do not assume a large number of variables in our examples.
Tables 9.2 and 9.3 show the mean and s.d. comparison of different methods over
the MC method for several circuits. Again, #RV is the number of random variables
used. Table 9.2 contains the values we obtain from different methods, and Table 9.3
presents the error comparison of StoEKS and HPC over Monte Carlo, respectively.
Fig. 9.6 Distribution of the voltage variations at a given node by StoEKS and MC for a circuit with
2,640 nodes and 11 random variables. g_i(t) = 0.1 u_di(t). Reprinted with permission from [110]
© 2008 IEEE
Table 9.1 CPU time comparison of StoEKS and HPC with the Monte Carlo method.
g_i(t) = 0.1 u_di(t)

#nodes      #RV   MC           StoEKS    Speedup   HPC [108]   Speedup
280           3   694.35         0.3      2314.5      2.37      292.97
280           7   671.46         2.37      283.31   227.94        2.94
280          11   684.88        24.26       28.23   914.34        0.74
2,640         3   5925.7         4.33     1368.5     55.35      107.1
2,640         7   5927.6        25.02      236.9   1952.2         3.04
2,640        11   6042.2       693.27        8.72      –           –
12,300        3   3.54 × 10^4   21.62     1637.4    298.84      118.5
12,300        7   3.30 × 10^4  151.71      217.65      –           –
119,600       3      –         258.21        –         –           –
119,600       7      –        2074.8         –         –           –
1,078,800     3      –        1830.4         –         –           –
Table 9.3 Error comparison of StoEKS and HPC over the Monte Carlo
method. g_i(t) = 0.1 u_di(t)

                 StoEKS %        HPC %           StoEKS %        HPC %
#nodes    #RV    error in mean   error in mean   error in s.d.   error in s.d.
280         3    0.19            0.28            3.14            3.10
2,640       3    1.23            1.05            4.31            4.51
12,300      3    0.10            0.08            2.95            2.98
280         7    0.063           0.17            1.12            1.54
2,640       7    0.076           0.11            4.18            4.60
12,300      7    0.23            –               0.23            –
280        11    0.42            0.21            0.18            0.52
2,640      11    0.18            –               0.30            –
Fig. 9.7 A PWL current source at a certain node. Reprinted with permission from [110]
© 2008 IEEE
Fig. 9.8 Distribution of the voltage variations at a given node by StoEKS, HPC, and Monte Carlo
for a circuit with 280 nodes and three random variables using the time-invariant leakage model.
g_i = 0.1 I_p. Reprinted with permission from [110] © 2008 IEEE
We can see that StoEKS shows only marginal differences from MC, while it is able to
simulate much larger circuits than the existing HPC method on the same platform.
Finally, we use a time-invariant leakage model, in which u_v^i(ξ) in (9.13) is not
a function of time t, and we further assume that g_i, the standard deviation, is a
fixed percentage of a constant current value in (9.1). In our test cases, we use the
peak current, I_p ≈ 41 mA as shown in Fig. 9.7, as the constant value.
Figure 9.8 shows the results in this configuration.
6 Summary
In this chapter, we have presented a fast stochastic method for analyzing the voltage
drop variations of on-chip power grid networks. The new method, called StoEKS,
applies HPC to represent the random variables in both power grid networks and
input leakage currents with log-normal distribution. This HPC method transforms
1 Introduction
The rest of this chapter is organized as follows: Sect. 2 reviews the EKS method and
fast balanced truncation methods. Our new variational analysis method, varETBR, is
presented in Sect. 3. Section 4 shows the experimental results, and Sect. 5 concludes
this chapter.
The truncated balanced realization (TBR)-based reduction method has two steps
in the reduction process: The balancing step transforms the states that can be
controlled and observed equally. The truncating step then throws away the weak
states, which usually leads to much smaller models. The major advantage of the
TBR method is its ability to give a deterministic global bound for the approximate
error as well as provide nearly optimal models in terms of errors and model sizes.
Given a system in a standard state-space form,

    x'(t) = A x(t) + B u(t),
    y(t)  = C x(t),   (10.1)

the controllability Gramian X and observability Gramian Y are obtained by solving
the Lyapunov equations

    A X + X A^T + B B^T = 0,
    A^T Y + Y A + C^T C = 0.   (10.2)

A balancing transformation T diagonalizes the product of the two Gramians,

    T^{-1} X Y T = Σ = diag(σ_1^2, σ_2^2, …, σ_n^2),   (10.3)

where T is the transformation matrix and the Hankel singular values of the
system, σ_k, are arranged in descending order. If we partition the matrices as

    [W_1, W_2]^T X Y [V_1, V_2] = [Σ_1 0; 0 Σ_2],   (10.4)

then truncating the weak states associated with Σ_2 yields the reduced system

    x'(t) = A_r x(t) + B_r u(t),
    y(t)  = C_r x(t).   (10.5)
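Steps (10.1)–(10.5) can be sketched in a few lines of Python. This is a simplified illustration, not the book's implementation: it projects through the dominant eigenvectors of XY instead of forming the full W/V partition, and the 6-state test system is made up:

```python
import numpy as np
from scipy.linalg import solve_lyapunov  # solves A X + X A^H = Q

def tbr(A, B, C, q):
    """Balanced-truncation sketch: solve the Lyapunov equations (10.2),
    diagonalize X*Y as in (10.3), and keep the q dominant directions."""
    X = solve_lyapunov(A, -B @ B.T)      # controllability Gramian
    Y = solve_lyapunov(A.T, -C.T @ C)    # observability Gramian
    lam, T = np.linalg.eig(X @ Y)        # eigenvalues are sigma_k^2
    keep = np.argsort(-lam.real)[:q]
    V = np.real(T[:, keep])              # dominant balancing directions
    Vi = np.linalg.pinv(V)
    return Vi @ A @ V, Vi @ B, C @ V     # reduced (A_r, B_r, C_r)

# Reduce a stable 6-state example to 2 states:
A = np.diag([-1.0, -2.0, -3.0, -4.0, -5.0, -6.0])
B = np.ones((6, 1))
C = np.ones((1, 6))
Ar, Br, Cr = tbr(A, B, C, 2)
```

For this symmetric example the reduced model preserves the DC gain -C A^{-1} B to within a few percent, reflecting the global error bound of balanced truncation mentioned above.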
2 Review of Fast Truncated Balanced Realization Methods 147
The TBR method generally suffers from high computational cost, as it needs to solve
the expensive Lyapunov equations (10.2). To mitigate this problem, fast TBR methods
[134, 196] have been proposed recently, which compute approximate
Gramians. The Poor Man's TBR method, or PMTBR [134], was proposed for
variational interconnect modeling.
Specifically, the Gramian X can be computed in the time domain as

    X = ∫_0^∞ e^{At} B B^T e^{A^T t} dt.   (10.6)

From Parseval's theorem, and the fact that the Laplace transform of e^{At} is
(sI − A)^{-1}, the Gramian X can also be computed in the frequency domain as

    X = ∫_{−∞}^{+∞} (jωI − A)^{-1} B B^T (jωI − A)^{-H} dω,   (10.7)

where the superscript H denotes the Hermitian transpose. Let ω_k be the k-th
sampling point. If we define

    z_k = (jω_k I − A)^{-1} B,   (10.8)

then, based on a numerical quadrature rule, X can be approximated as [134]

    X̂ = Σ_k w_k z_k z_k^H = Z W^2 Z^H,   (10.9)
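A minimal sketch of (10.8)–(10.9) follows; the frequency samples and uniform weights here are placeholders, whereas a real implementation would choose a proper quadrature rule:

```python
import numpy as np

def sampled_gramian(A, B, omegas, weights):
    """Quadrature approximation X_hat = sum_k w_k z_k z_k^H = Z W^2 Z^H,
    with z_k = (j*omega_k*I - A)^{-1} B as in (10.8)."""
    n = A.shape[0]
    cols = [np.sqrt(w) * np.linalg.solve(1j * om * np.eye(n) - A, B)
            for w, om in zip(weights, omegas)]
    Z = np.hstack(cols)          # Z absorbs W, so X_hat = Z Z^H
    return Z @ Z.conj().T

A = np.diag([-1.0, -10.0])
B = np.array([[1.0], [1.0]])
omegas = np.linspace(-50.0, 50.0, 101)
weights = np.full(101, 1.0)      # uniform placeholder weights
Xh = sampled_gramian(A, B, omegas, weights)
```

The columns of Z are exactly the sampled responses, so in practice the projection subspace is taken from Z directly rather than forming X̂ explicitly.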
In [134], PMTBR was extended to reduce interconnect circuits with variational
parameters. The idea is that the computation of the Gramian in (10.7) can be viewed as
computing the mean of (jωI − A)^{-1} B B^T (jωI − A)^{-H} with respect to the statistical
variable ω, the frequency. If there are more statistical parameters, the
Gramian can still be viewed as a mean computation, but over all the variables
(including the frequency variable).
In the fast TBR framework, computing the Gramian (10.7) is essentially a one-dimensional
integral with respect to the complex frequency ω. When multiple
variables with specific distributions are considered, a multidimensional integral with
respect to the random variables must be computed. As in PMTBR, the MC method is
still employed in variational TBR to compute this multidimensional integral.
One important observation in varPMTBR is that the number of samplings required
to build the subspace is much smaller than the number of general MC samplings
needed to achieve the same accuracy. As a result, varPMTBR is much faster than
the brute-force Monte Carlo method, and its cost is much less sensitive to the
number of random variables and to the variation ranges, which makes it much
more efficient than existing variational or parameterized model order reduction
methods [208].
In this section, we detail the presented varETBR method. The presented method
is based on the recently proposed ETBR method [93] for deterministic power grid
analysis using reduction techniques, which we review first.
For a linear system in (8.2), we first define the frequency-domain response
Gramian,

    X_r = ∫_{−∞}^{+∞} (jωC + G)^{-1} B u(jω) u^T(jω) B^T (jωC + G)^{-H} dω,   (10.11)

which is different from the Gramian concept in the traditional TBR-based reduction
framework. Notice that in the new Gramian definition, the input signals u(jω) are
considered. As a result, (jωC + G)^{-1} B u(jω) serves as the system response with
respect to the input signal u(jω), and the resulting X_r becomes the response Gramian.
3 The Presented Variational Analysis Method: varETBR 149
where Z_r is the matrix whose columns are the sampled responses
z_rk = (jω_k C + G)^{-1} B u(jω_k), and W is a diagonal matrix with diagonal
entries w_kk = √w_k, where w_k comes from the specific quadrature method used.
The projection matrix can be obtained by singular value decomposition (SVD)
of Zr . After this, we can reduce the original matrices into small ones and then
perform the transient analysis on the reduced circuit matrices. The ETBR algorithm
is summarized in Fig. 10.1.
Notice that we need the frequency response caused by the input signal u(jω_k)
in (10.12). This can be obtained by applying the FFT to the input signals in the time
domain. Using frequency-spectrum representations of the input signals is a significant
improvement over the EKS method, as we avoid the explicit moment representation
of the current sources, which is inaccurate for currents rich in high-frequency
components due to the well-known problems of explicit moment matching methods
[137]. Accuracy is also improved owing to the use of the fast balanced truncation
method for the reduction, which has global accuracy [112, 134].
150 10 Statistical Power Grid Analysis by Variational Subspace Method
Note that since we use a congruence transformation for the reduction, with
orthogonal columns in the projection matrix (obtained by an Arnoldi or Arnoldi-like
process), the reduced system must be stable. For simulation purposes, this is
sufficient. If all the observable ports are also the current source nodes, i.e., y(t) =
B^T v(t), where y(t) is the voltage vector at all observable ports, the reduced system
is also passive. It was also shown in [134] that the fast TBR method has time
complexity similar to multiple-point Krylov-subspace-based reduction methods. The
extended TBR method likewise has computation costs similar to the EKS method.
For a linear dynamic system formulated in state-space (MNA) form as in (8.2), if
the complex frequency jω is treated as a random variable with uniform distribution over
the frequency range, then the state responses V(jω) = (G + jωC)^{-1} B u(jω)
become random variables in the frequency domain. Their covariance matrix can be
computed as

    X_r = E{V(jω) V(jω)^T} = ∫_{−∞}^{+∞} V(jω) V(jω)^T dω,   (10.14)

where E{x} stands for the mean of the random variable x. X_r is defined
in (10.11). The response Gramian can thus essentially be viewed as the covariance matrix
associated with the state responses. X_r can also be interpreted as the mean of the function
P(jω) over jω uniformly distributed on [−∞, +∞].^1 The ETBR method
actually performs the PCA transformation of this random process with
uniform distribution.
Define P(jω) = V(jω) V(jω)^T. Now suppose that, in addition to the frequency variable
jω, P(jω, ξ) is also a function of the random variables ξ with probability density
^1 Practically, the frequency range of interest is always bounded.
    Ĝ(ξ) = V_r^T G_0 V_r + V_r^T G_1 V_r ξ_1 + ⋯ + V_r^T G_M V_r ξ_M,   (10.17)

    Ĉ(ξ) = V_r^T C_0 V_r + V_r^T C_1 V_r ξ_1 + ⋯ + V_r^T C_M V_r ξ_M.   (10.18)
The algorithm starts with the given power grid network and the number of samplings q
used for building the projection subspace. It then computes the variational responses

    z_rk = (s_k C(ξ_1^k, …, ξ_M^k) + G(ξ_1^k, …, ξ_M^k))^{-1} B u(s_k, ξ_1^k, …, ξ_M^k)

at randomly drawn sample points. We then perform the SVD on Z_r = [z_r1, z_r2, …, z_rq]
to construct the projection matrix. After the reduction, we perform the MC-based
statistical analysis to obtain the variational responses from v(t) = V_r v̂(t).
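The sampling loop just described can be sketched as follows. This is an illustration with made-up frequency range, matrices, and Gaussian parameters; `u_hat(s, xi)` stands in for the right-hand side B u(s, ξ):

```python
import numpy as np

def varetbr_basis(G_mats, C_mats, u_hat, q, fmax=1e9, seed=1):
    """Draw q joint samples of frequency s_k and parameters xi^k, solve
    (s_k C(xi) + G(xi)) z = B u for the variational response, and take
    the SVD of the collected responses to form the projection V_r."""
    rng = np.random.default_rng(seed)
    n = G_mats[0].shape[0]
    Z = np.zeros((n, q), dtype=complex)
    for k in range(q):
        s = 1j * rng.uniform(0.0, fmax)            # sampled frequency
        xi = rng.standard_normal(len(G_mats) - 1)  # Gaussian parameters
        G = G_mats[0] + sum(x * Gi for x, Gi in zip(xi, G_mats[1:]))
        C = C_mats[0] + sum(x * Ci for x, Ci in zip(xi, C_mats[1:]))
        Z[:, k] = np.linalg.solve(s * C + G, u_hat(s, xi)).ravel()
    U, _, _ = np.linalg.svd(np.hstack([Z.real, Z.imag]), full_matrices=False)
    return U[:, :q]   # orthonormal columns spanning the response subspace

# Toy 4-node example with one variational parameter:
G0 = 2.0 * np.eye(4)
G1 = np.diag([0.05, 0.1, 0.02, 0.08])
C0 = 1e-9 * np.eye(4)
C1 = 0.05e-9 * np.eye(4)
Vr = varetbr_basis([G0, G1], [C0, C1], lambda s, xi: np.ones(4), q=3)
```

Reduced matrices then follow (10.17)–(10.18), e.g. V_r^T G_i V_r for each coefficient matrix.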
We remark that in both Algorithm 10.1 and Algorithm 10.2, we perform MC-like
random sampling to obtain q sampling points over the (M + 1)-dimensional
space of the given frequency range and parameter spaces (for Algorithm 10.1, sampling
is over the frequency range only). We note that MC-based sampling
is also used in the PMTBR method [134].
Compared with existing approaches, varETBR offers several advantages.
First, since varETBR uses only MC sampling, it is easy to implement and
very general for dealing with different variation distributions and large variation
ranges. It is also amenable to parallel computing, as each sampling in the
frequency domain can be done in parallel. Second, it is very scalable for solving
large networks with a large number of variables, as reduction is performed. Third,
varETBR is more accurate over wide frequency bands, as it samples over the
frequency band (compared with the less accurate moment-matching-based EKS
method). Last, it avoids the explicit moment representation of the input signals,
leading to more accurate results than the EKS method when signals are rich in
high-frequency components.
4 Numerical Examples
The varETBR algorithm has been implemented using Matlab and tested on an Intel
quad-core workstation with 16 GB memory under Linux environment. The initial
results of this chapter were published in [91, 92].
All the benchmarks are real PG circuits from IBM provided by [123]; however, the
circuits in [123] are resistor-only. For transient analysis, we need to add
capacitors and transient input waveforms, so we modified the benchmark
circuits. First, we added a grounded capacitor at each node with a random value
on the order of pF. Second, we replaced the DC current sources in the benchmark
with PWL signals whose values are randomly generated from the original DC values.
We implemented a parser in Python to transform the SPICE-format benchmarks
into Matlab format.
The summary of our transient PG benchmarks is shown in Table 10.1. We use
MNA formulation to set up the circuit matrices. To efficiently solve PG circuits with
1.6 million nodes in Matlab, an external linear solver package UMFPACK [184] is
used, which is linked with Matlab using Matlab mexFunction.
We will compare varETBR with the MC method, first in accuracy and then
in CPU time. In all the test cases, the number of samples used for forming the
subspace in varETBR is 50, based on our experience. The reduced order is set to
p = 10, which is sufficiently accurate in practice. Here we set the variation range,
the ratio of the maximum variation value to the nominal value, to 10% and set the
number of variables to 6 (2 for G, 2 for C, and 2 for i). G(ξ) and C(ξ) follow
Gaussian distributions. i(t, ξ), which models the leakage variations [39], follows a
log-normal distribution.
varETBR is essentially a kind of reduced MC method. It inherits the merits of
MC methods, which are insensitive to the number of variables and can reflect the
real distribution very accurately given a sufficient number of samples. The main
disadvantage of MC is that it is too slow for large-scale circuits. varETBR
first reduces the circuit to a small size while maintaining sufficient accuracy, so
the MC simulation on the reduced circuit runs very fast. Note that the reduction
process is done only once during the simulation.
To verify the accuracy of our varETBR method, we show the results of
simulations on ibmpg1 (100 samples) and ibmpg6 (10 samples). Figures 10.3 and
10.4 show the results of varETBR and the pure MC method at the 1,000th node
(named n1 20583 11663 in SPICE format) of ibmpg1 and at the 1,000th node
(named n3 16800 9178400 in SPICE format) of ibmpg6, respectively. The circuit
equations in MC are solved by Matlab.
The absolute errors and relative errors of ibmpg1 and ibmpg6 are shown in
Figs. 10.5 and 10.6. We can see that the errors are very small and our varETBR is
Fig. 10.3 Transient waveform at the 1,000th node (n1 20583 11663) of ibmpg1 (p = 10, 100
samples). Reprinted with permission from [91] © 2010 Elsevier
Fig. 10.4 Transient waveform at the 1,000th node (n3 16800 9178400) of ibmpg6 (p = 10, 10
samples). Reprinted with permission from [91] © 2010 Elsevier
Fig. 10.5 Simulation errors of ibmpg1 (100 samples) and ibmpg6 (10 samples). Reprinted with
permission from [91] © 2010 Elsevier
Fig. 10.6 Relative errors of ibmpg1 (100 samples) and ibmpg6 (10 samples). Reprinted with
permission from [91] © 2010 Elsevier
very accurate. Note that the errors are influenced not only by the variations but also
by the reduced order; to increase accuracy, we may increase the reduced order.
In our tests, we set the reduced order to p = 10 for all the benchmarks.
Next, we compare accuracy with MC on the probability distributions,
including means and variances. Figure 10.7 shows the voltage distributions of both
varETBR and the original MC at the 1,000th node of ibmpg1 at t = 50 ns (with 200 time
steps between 0 ns and 200 ns in total); see also the simulation waveforms
at t = 50 ns in Fig. 10.3. Note that the results do not follow a Gaussian distribution,
since G(ξ) and C(ξ) follow Gaussian distributions while i(t, ξ) follows a log-normal
distribution. From Fig. 10.7, we can see that not only are the means and the variances
of varETBR and MC almost the same but so are their probability distributions.
Fig. 10.7 Voltage distribution at the 1,000th node of ibmpg1 (10,000 samples) at t = 50 ns.
Reprinted with permission from [91] © 2010 Elsevier
Finally, we compare the CPU times of varETBR and the pure Monte Carlo
method. To verify the efficiency of varETBR in both CPU time and memory, we
do not need to run many simulations for either method: we run 10 or 100 samples
for each benchmark, since the accuracy has already been shown. Although we run
only a small number of samples, the speedup remains the same. Table 10.2 shows
the actual CPU times of both varETBR (including FFT costs) and MC on the given
set of circuits. The number of sampling points in the reduction is q = 50, and the
reduced order is p = 10. Table 10.3 shows the projected CPU times of varETBR
(one-time reduction plus 10,000 simulations) and MC (10,000 samples).
In varETBR, the circuit model becomes much smaller after reduction, and the
reduction is performed only once. Therefore, the total time is much shorter than that of
Table 10.4 Relative errors for the mean of the max voltage drop of varETBR
compared with Monte Carlo at the 2,000th node of ibmpg1 (q = 50, p = 10,
10,000 samples) for different variation ranges and different numbers of
variables

                        Variation range
#Variables   var = 10%   var = 30%   var = 50%   var = 100%
M = 6        0.16%       0.08%       0.17%       0.21%
M = 9        0.16%       0.25%       0.08%       0.23%
M = 12       0.25%       0.07%       0.07%       0.28%
M = 15       0.15%       0.06%       0.05%       0.06%
Table 10.5 Relative errors for the variance of the max voltage drop of varETBR
compared with Monte Carlo at the 2,000th node of ibmpg1 (q = 50, p = 10,
10,000 samples) for different variation ranges and different numbers of
variables

                        Variation range
#Variables   var = 10%   var = 30%   var = 50%   var = 100%
M = 6        0.27%       1.54%       1.38%       1.73%
M = 9        0.25%       0.67%       1.32%       1.27%
M = 12       0.42%       0.07%       0.68%       1.41%
M = 15       0.18%       1.11%       0.67%       2.14%
MC (up to 1,960×). Basically, the bigger the original circuit, the larger the speedup
of varETBR. Compared to the total simulation time of the MC method, the reduction
time is negligible.
Note that, as shown in Table 10.2, we run the random simulation 10,000 times for
ibmpg1 to show the efficiency of our varETBR in practice.
It can be seen that varETBR is very scalable: in practice it is almost independent
of the variation range and the number of variables. One likely reason is that
varETBR already captures the dominant subspaces even for a small number of
samples (50 in our case), as explained in Sect. 3.
When we increase the variation range and the number of variables, the accuracy
of varETBR is almost unchanged. Tables 10.4 and 10.5 show the mean and variance
comparison between the two methods for 10,000 MC runs, where we increase the
number of variables from 6 to 15 and the variation range from 10% to 100%.
The tables show that varETBR is very insensitive to the number of variables and
Table 10.6 CPU times (s) comparison of StoEKS and varETBR (q = 50, p = 10)
with 10,000 samples for different numbers of variables

             M = 5               M = 7               M = 9
Test Ckts    StoEKS   varETBR    StoEKS   varETBR    StoEKS   varETBR
ibmpg1       165      1315       572      1338       3748     1326
ibmpg2       1458     1387       –        1351       –        1377
variation range for the given circuit ibmpg1, where simulations are run with 10,000
samples for both varETBR (q = 50, p = 10) and MC.
The variation range var is the ratio of the maximum variation value to the
nominal value, so var = 100% means the maximum variation value may be
as large as the nominal value.
From Tables 10.4 and 10.5, we observe that varETBR is essentially insensitive to
the number of variables and the variation range. Here we use the same sampling size
(q = 50) and reduced order (p = 10) for all combinations of variable count and
variation range, and the computational cost of varETBR is almost the same across
them. This is consistent with the observation in PMTBR [134]. One explanation
for this insensitivity is that the subspace obtained even with a small number of
samplings contains the dominant response Gramian subspaces over the wide
parameter and frequency ranges.
Finally, to demonstrate the efficiency of varETBR, we compare it with a recently
proposed similar approach, the StoEKS method, which employs Krylov subspace
reduction with orthogonal polynomials [111], on the same suite of IBM circuits.
Table 10.6 shows the comparison results, where "–" means an out-of-memory error.
StoEKS can only finish the smaller circuits ibmpg1 (30 k) and ibmpg2 (120 k), while
varETBR easily goes through all the benchmarks (up to 1.6 M nodes). The CPU
time of StoEKS increases rapidly with the variable count and eventually fails to
complete, whereas the CPU time of varETBR is independent of the number of variables
and depends only on the reduced order and the number of samples used in the reduced
MC simulation. Here we select reduced order p = 10 and 10,000 samples, which are
sufficient in practice to obtain an accurate probability distribution.
5 Summary
In this chapter, we have presented a new scalable statistical power grid analysis
approach based on ETBR reduction techniques. The new method, called varETBR,
performs reduction on the original system using variation-bearing subspaces before
MC statistical transient simulation. Unlike the varPMTBR method, both
system and input-source variations are considered in generating the projection
subspace, by sampling variational response Gramians to perform the reduction. As a
result, varETBR can reduce systems with many terminals like power grid networks
1 Introduction
It is well accepted that process-induced variability has a huge impact on
circuit performance in sub-100 nm VLSI technologies [120, 121]. Process variation
has to be assessed in the various VLSI design steps to ensure
robust circuit design. Process variations consist of systematic components, which
depend on patterns and other process parameters, and random components, which have
to be dealt with using stochastic approaches. Efficient capacitance extraction
approaches based on the boundary element method (BEM), such as FastCap [115],
HiCap [164], and PHiCap [199], have been proposed in the past. To consider the
variation impacts on interconnects, one has to consider the RLC extraction of the
three-dimensional structures modeling the interconnect conductors.
In this chapter, we investigate the impact of geometric variations on the extracted
capacitance.
Statistical extraction of capacitance considering process variations has been studied
recently, and several approaches have been proposed [74, 87, 207, 208, 210] under
different variational models. The method in [87] uses analytical formulas to consider the
variations in capacitance extraction, and it has only first-order accuracy. The FastSies
program considers the rough-surface effects of interconnect conductors [210];
it assumes only Gaussian distributions and has high computational costs. The method
in [74] combines hierarchical extraction and PFA to solve the statistical capacitance
extraction problem.
Recently, a capacitance extraction method using a collocation-based spectral
stochastic method was proposed [205, 208]. This approach is based on the Hermite
PC representation of the variational capacitance. It applies the numerical quadrature
(collocation) method to compute the coefficients of the extracted capacitance in
Hermite polynomial form, where the capacitance extraction process (solving
the potential coefficient matrices) is performed many times (once per sampling). One of
the major problems with this method is that many redundant operations are carried
out (such as the setup of the potential coefficient matrices for each sampling), which
the case for the hierarchical approach [164]: the number of panels (and thus the random
variables) can be considerably reduced, and the interactions between panels are
constant. These are areas for our future investigation.
2 Problem Formulation
For an m-conductor system, the capacitance extraction problem based on the BEM formulation is to solve the following integral equation [118]:
∫_S ρ(x_j) / |x_i − x_j| da_j = v(x_i),    (11.1)

where ρ(x_j) is the charge distribution on the surface at conductor j, v(x_i) is the potential at conductor i, and 1/|x_i − x_j| is the free-space Green's function.¹ da_j is a differential element of the surface S of conductor j, and x_i and x_j are position vectors. To solve for the capacitance from one conductor to the rest, we set that conductor's potential to one and the potentials of all other m − 1 conductors to zero. The resulting computed charges are the capacitances. The BEM divides the surfaces into N small panels and assumes a uniform charge distribution on each panel, which transforms (11.1) into a linear algebraic equation:
P q = v,    (11.2)
where P ∈ R^{N×N} is the potential coefficient matrix, q is the vector of panel charges, and v is the vector of preset panel potentials. By solving this linear equation, we can obtain all the panel charges (and thus the capacitance values). Each element of the potential coefficient matrix P is defined as
P_ij = (1/s_j) ∫_{S_j} G(x_i, x_j) da_j,    (11.3)

where G(x_i, x_j) = 1/|x_i − x_j| is the Green's function of a point source at x_j, S_j is the surface of panel j, and s_j is the area of panel j.
Process variations introduce conductor geometry variations: the sizes of the panels and the distances between panels become random variables. Here we assume each panel remains a two-dimensional surface. These variations make each element of the capacitance matrix follow some random distribution. The problem we need to solve is to derive this random distribution and then to
¹Note that the scale factor 1/(4πε₀) can be ignored here to simplify the notation; it is used in the implementation to give results in units of farads.
166 11 Statistical Capacitance Modeling and Extraction
effectively compute the mean and variance of the involved capacitances given the geometric randomness parameters.
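The nominal formulation in (11.1)–(11.3) can be illustrated with a small numerical sketch. The following Python fragment is a crude point-matching approximation, not the book's implementation: all geometry values are made up, the 1/(4πε₀) factor is dropped, and the self term uses the analytic center-point integral of 1/r over a square panel.

```python
import numpy as np

def plate_panels(z, n=4, side=1.0):
    """Centers and panel side for an n x n discretization of a square plate at height z."""
    a = side / n
    c = (np.arange(n) + 0.5) * a - side / 2
    xs, ys = np.meshgrid(c, c)
    pts = np.column_stack([xs.ravel(), ys.ravel(), np.full(n * n, z)])
    return pts, a

def capacitance(spacing):
    """Solve P q = v as in (11.2) for two parallel plates; return total plate charges."""
    top, a = plate_panels(spacing)
    bot, _ = plate_panels(0.0)
    pts = np.vstack([top, bot])
    N = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)               # placeholder, overwritten below
    P = 1.0 / dist                            # far-field entries of (11.3), point matching
    # self term: (1/s_j) * integral of 1/r over a square panel, evaluated at its center
    np.fill_diagonal(P, 4 * np.log(1 + np.sqrt(2)) / a)
    v = np.concatenate([np.ones(N // 2), np.zeros(N // 2)])  # drive plate 1 at 1 V
    q = np.linalg.solve(P, v)
    return q[:N // 2].sum(), q[N // 2:].sum()

C11, C21 = capacitance(0.5)   # self and coupling charges (normalized units)
```

As expected physically, the driven plate carries positive charge, the grounded plate carries induced negative charge, and the self term grows as the plates approach.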
In this chapter, we follow the variational model introduced in [74], where each point in panel i is disturbed by a vector n_i that has the same direction as the normal direction of panel i:

x_i′ = x_i + n_i,    (11.4)

where the length of n_i follows a Gaussian distribution, |n_i| ∼ N(0, σ²). If the value is negative, the direction of the perturbation is reversed. The correlation between the random perturbations on different panels is governed by an empirical formulation such as the exponential model [212]:

ρ(r) = e^(−r²/η²),    (11.5)

where r is the distance between two panel centers and η is the correlation length.
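Correlated panel perturbations obeying the exponential model (11.5) can be generated with a Cholesky factorization of the covariance matrix. A minimal sketch; the four collinear panel centers, σ = 0.1, and η = 2 are made-up values:

```python
import numpy as np

centers = np.array([[0.0, 0], [1, 0], [2, 0], [3, 0]])  # hypothetical panel centers
sigma, eta = 0.1, 2.0                                   # perturbation std and correlation length

# covariance: Cov(|n_i|, |n_j|) = sigma^2 * rho(r_ij), with rho(r) = exp(-r^2/eta^2) as in (11.5)
r = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
C = sigma**2 * np.exp(-r**2 / eta**2)

# draw correlated Gaussian perturbations via the Cholesky factor
L = np.linalg.cholesky(C)
xi = np.random.default_rng(1).standard_normal((4, 20000))
dn = L @ xi                                             # each column is one MC sample

emp = np.cov(dn)                                        # empirical covariance of the samples
```

The empirical covariance of the samples converges to the prescribed matrix C, which is exactly the input an MC capacitance run would consume.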
The most straightforward method is to use MC simulation to obtain the distributions, mean values, and variances of all those capacitances. But the MC method is extremely time consuming, as each sample run requires the formulation of a changed potential coefficient matrix P.
3 Presented Orthogonal PC-Based Extraction Method: StatCap
Here the charge q(ξ) in (11.2) is an unknown random variable vector (with a normal distribution), and the potential coefficient equation becomes

P(ξ) q(ξ) = v,    (11.6)

where both P(ξ) and q(ξ) are in Hermite PC form. The coefficients can then be computed by using the Galerkin-based method in Sect. 3.4 of Chap. 2. The principle of orthogonality states that the best approximation of v(ξ) is obtained when the error Δ(ξ), defined as

Δ(ξ) = P(ξ) q(ξ) − v,    (11.7)

is orthogonal to each basis polynomial:

⟨Δ(ξ), H_k(ξ)⟩ = 0,    (11.8)

where H_k(ξ) are the Hermite polynomials. In this way, we have transformed the stochastic analysis process into a deterministic form, where we only need to compute the corresponding coefficients of the Hermite PC.
For illustration purposes, consider two Gaussian variables ξ = [ξ_1, ξ_2]. Assuming the charge vector on the panels can be written as a second-order (p = 2) Hermite PC, we have

q(ξ) = q_0 + q_1 ξ_1 + q_2 ξ_2 + q_3 (ξ_1² − 1) + q_4 (ξ_2² − 1) + q_5 (ξ_1 ξ_2),    (11.9)
where G(x_i, x_j) is the free-space Green's function defined in (11.3).
Notice that if panel i and panel j are far away from each other (their distance is much larger than the panel size), we have the following approximation [74]:

P_ij ≈ G(x_i, x_j),  i ≠ j.    (11.12)
Suppose the variation of panel i can be written as n_i = δ_i n̂_i, where n̂_i is the unit normal vector of panel i and δ_i is the scalar variation. Then take the Taylor expansion:
G(x_i + n_i, x_j + n_j) = 1 / |x_i − x_j + n_i − n_j|    (11.13)
= 1/|x_i − x_j| + ∇(1/|x_i − x_j|) · (n_j − n_i) + O(|n_i − n_j|²),    (11.14)

where

∇(1/|x_i − x_j|) = r / |r|³,    (11.15)
r = x_i − x_j.    (11.16)
Now we first ignore the second-order terms to keep the variation in linear form. As a result, the potential coefficient matrix P can be written as
P ≈ P_0 + P_1 = [ G(x_i + n_i, x_j + n_j) ]_{i,j=1}^{n},    (11.17)

where

P_0 = [ G(x_i, x_j) ]_{i,j=1}^{n}

and P_1 is the matrix with zero diagonal entries and off-diagonal entries

(P_1)_{ij} = ∇G(x_i, x_j) · (n_j − n_i).
We can further write P_1 in the following form:

P_1 = V_1 N_1 J_1 − J_1 N_1 V_1,    (11.18)
where

J_1 = [ ∇G(x_i, x_j) ]_{i,j=1}^{n}  (with zero diagonal entries),
N_1 = diag(n̂_1, n̂_2, …, n̂_n),
V_1 = diag(δ_1, δ_2, …, δ_n),

where J_1 and N_1 are vector matrices and V_1 is a diagonal matrix.
To deal with spatial correlation, P_1 can be further expressed as a linear combination of the dominant and independent variables

ξ = [ξ_1, ξ_2, …, ξ_p],    (11.19)

so that

V_1 = Σ_{i=1}^{p} A_i ξ_i    (11.20)

and

P_1 = Σ_{i=1}^{p} P_{1i} ξ_i,    (11.21)

where

P_{1i} = A_i N_1 J_1 − J_1 N_1 A_i    (11.22)

and

A_i = diag(a_{1i}, a_{2i}, …, a_{ni}).    (11.23)
Once the potential coefficient matrix is represented in the affine form shown in (11.21), we are ready to solve for the coefficients by using the Galerkin-based method, which results in a larger system with augmented matrices and variables. Specifically, for p independent Gaussian random variables ξ = [ξ_1, …, ξ_p], there are K = 2p + p(p − 1)/2 first- and second-order Hermite polynomials. H_i(ξ), i = 1, …, K, denotes each Hermite polynomial, with H_1 = ξ_1, …, H_p = ξ_p. The vector of variational charge variables q(ξ) can then be written as
q(ξ) = q_0 + Σ_{i=1}^{K} q_i H_i(ξ),    (11.24)
where each q_i is a vector associated with one polynomial. The random linear equation can then be written as

P q = (P_0 + Σ_{i=1}^{p} P_{1i} H_i)(q_0 + Σ_{i=1}^{K} q_i H_i) = v.    (11.25)
Expanding the equation and performing the inner product with H_i on both sides, we can derive the new linear system

(W_0 ⊗ P_0 + Σ_{i=1}^{p} W_i ⊗ P_{1i}) Q = V,    (11.26)
and W_i is the matrix of triple inner products whose (l, m) entry is ⟨H_i H_l H_m⟩, l, m = 0, …, K:

W_i = [ ⟨H_i H_l H_m⟩ ]_{l,m=0}^{K}.    (11.28)
This sparsity saves memory significantly, as we do not need to actually perform the tensor products shown in (11.26). Instead, we can add all W_i together and expand each element of the resulting matrix by the specific P_{1i} during the solving process, since the W_i do not overlap at any element position.
As the original potential coefficient matrix is quite sparse and low rank, the augmented matrix is also low rank. As a result, the sparsity, low-rank, and symmetry properties can be exploited by iterative solvers to speed up the extraction process, as shown in the experimental results. In our implementation, the minimum residual conjugate gradient method [130] is used as the solver since the augmented system is symmetric.
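The entries ⟨H_i H_l H_m⟩ of W_i can be computed once, either from a closed-form lookup table or with Gauss–Hermite quadrature. A sketch for the two-variable, second-order basis ordered as in (11.9), using NumPy's probabilists' Hermite rule; the basis ordering and the `triple` helper are illustrative choices, not the book's code:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# probabilists' Gauss-Hermite nodes/weights; normalize so the weights sum to 1
x, w = hermegauss(8)
w = w / np.sqrt(2 * np.pi)

# second-order Hermite PC basis for two variables, ordered as in (11.9)
H = [lambda a, b: np.ones_like(a),
     lambda a, b: a,
     lambda a, b: b,
     lambda a, b: a * a - 1,
     lambda a, b: b * b - 1,
     lambda a, b: a * b]

X1, X2 = np.meshgrid(x, x)
W2 = np.outer(w, w)                       # 2-D tensor quadrature weights

def triple(i, l, m):
    """<H_i H_l H_m> = E[H_i H_l H_m] over independent standard normals."""
    return float(np.sum(W2 * H[i](X1, X2) * H[l](X1, X2) * H[m](X1, X2)))

W1 = np.array([[triple(1, l, m) for m in range(6)] for l in range(6)])
nonzeros = np.count_nonzero(np.abs(W1) > 1e-9)
```

With this ordering, W_1 has only six nonzero entries out of 36, which is the sparsity exploited by the augmented-system assembly above.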
4 Second-Order StatCap
Following the same procedure with the second-order Taylor terms retained, the potential coefficient matrix can be expanded as

P ≈ P_0 + Σ_{L=1}^{p} P_{1L} ξ_L + Σ_{L=1}^{p} P_{2L} (ξ_L² − 1) + Σ_{L_1 ≠ L_2} P_{2L_1,L_2} ξ_{L_1} ξ_{L_2},    (11.33)

where P_{2L} is the coefficient corresponding to the first type of second-order Hermite polynomial, ξ_L² − 1, and P_{2L_1,L_2} is the coefficient corresponding to the second type of second-order Hermite polynomial, ξ_{L_1} ξ_{L_2} (L_1 ≠ L_2).
P_{ij,1L} = a_{iL} ∂P_ij/∂n_i + a_{jL} ∂P_ij/∂n_j,    (11.34)

P_{ij,2L} = a_{iL}² ∂²P_ij/∂n_i² + a_{jL}² ∂²P_ij/∂n_j² + 2 a_{iL} a_{jL} ∂²P_ij/(∂n_j ∂n_i),    (11.35)

P_{ij,2L_1,L_2} = 2 a_{iL_1} a_{iL_2} ∂²P_ij/∂n_i² + 2 a_{jL_1} a_{jL_2} ∂²P_ij/∂n_j² + 2 (a_{iL_1} a_{jL_2} + a_{iL_2} a_{jL_1}) ∂²P_ij/(∂n_j ∂n_i).    (11.36)
Hence, we need to compute analytic expressions for the partial derivatives of Pij
to obtain the coefficients of Hermite polynomials. The details of the derivations for
computing the derivatives used in (11.34)–(11.36) can be found in the appendix
section.
Similar to Sect. 3, once the potential coefficient matrix is represented in the affine form shown in (11.33), we are ready to solve for the coefficients P_{1L}, P_{2L}, and P_{2L_1,L_2} by using the Galerkin-based method.
In this case, P in (11.33) is rewritten as

P = P_0 + Σ_{i=1}^{p} P_{1i} H_i + Σ_{i=p+1}^{K} P_{2i} H_i.    (11.37)
Expanding the equation and performing the inner product with H_i on both sides, we can derive a new linear system:

(W_0 ⊗ P_0 + Σ_{i=1}^{p} W_i ⊗ P_{1i} + Σ_{i=p+1}^{K} W_i ⊗ P_{2i}) Q = V,    (11.39)
where ⊗ is the tensor product, Q and V are the same as in (11.27), and W_i has the same definition as in (11.28).
Again, the matrix on the left-hand side of (11.39) is the augmented potential coefficient matrix for the second-order StatCap. Since the H_i are at most second-order polynomials, we can still use a lookup table (LUT) to calculate every element of W_i for any number of random variables.
Now we study the properties of the augmented potential coefficient matrix. We first review the features and observations made for the first-order StatCap. For W_i, which is a K × K matrix with K = p(p + 3)/2, the number of nonzero elements in W_i is shown in Table 11.1. From Table 11.1, we can see that the matrices W_i for i = 1, …, K are still very sparse. As a result, their tensor products with P_{1i} and P_{2i} still give rise to a sparse augmented matrix in (11.39).
For the four observations in Sect. 3 regarding the structure of W_i, i = p + 1, …, K, and the augmented matrix, we find that all the observations remain valid except Observation 2. As a result, all the efficient implementation and solving techniques mentioned at the end of Sect. 3 can be applied to the second-order method.
5 Numerical Examples
In this section, we compare the results of the presented first-order and second-order StatCap methods against the MC method and the SSCM method [208], which is based on the spectral stochastic collocation method. The StatCap methods have been implemented in Matlab 7.4.0. We use the minimum residual conjugate gradient method as the iterative solver. We also implemented the SSCM method in Matlab using the sparse grid package [81, 82]. We do not use any hierarchical algorithm to accelerate the calculation of the potential coefficient matrix for either StatCap or SSCM. Instead, we use the analytic formula in [194] to compute the potential coefficient matrices.
All the experiments were carried out on a Linux system with Intel quad-core Xeon CPUs running at 2.99 GHz with 16 GB of memory. The initial results of this chapter were published in [21, 156].
We test our algorithm on six cases. The specific running parameters for each case are summarized in Table 11.2. In Table 11.2, p is the number of dominant and independent random variables obtained through the PCA operation, and MC # is the number of MC runs. The 2 × 2 bus structure is shown in Fig. 11.1, and the three-layer metal plane structure is shown in Fig. 11.2. In all the experiments, we
set the standard deviation to 10% of the wire width and the correlation length η to 200% of the wire width.
First, we compare the CPU times of the four methods. The results are shown in Table 11.3. In the table, StatCap(1st/2nd) refers to the presented first- and second-order methods, respectively. SP(X) denotes the speedup of the first-order StatCap compared with MC or SSCM. All capacitance values are in picofarads.
It can be seen that both the first- and second-order StatCap are much faster than both SSCM and the MC method. For large cases, such as the 5 × 5 bus, MC and SSCM run out of memory, but StatCap still works well. For all the cases, StatCap delivers about two orders of magnitude speedup over SSCM and three orders of magnitude speedup over the MC method. Notice that both SSCM and StatCap use the same random variables after the PCA reduction.
Fig. 11.2 Three-layer metal planes. Reprinted with permission from [156] © 2010 IEEE
Table 11.3 CPU runtime (in seconds) comparison among MC, SSCM, and StatCap(1st/2nd)

Case (MC runs)                       MC           SSCM         StatCap(1st)  StatCap(2nd)  SP(MC)  SP(SSCM)
1 × 1 bus, MC(10,000)                2,764        49.35        1.55          3.59          1,783   32
2 × 2 bus, MC(6,000)                 63,059       2,315        122           190           517     19
Three-layer metal plane, MC(6,000)   16,437       387          4.11          6.67          3,999   94
3 × 3 bus, MC(6,000)                 2.2 × 10^5   7,860        408           857           534     19
4 × 4 bus, MC(6,000)                 –*           3.62 × 10^4  1,573         6,855         260     23
5 × 5 bus, MC(6,000)                 –*           –*           1.7 × 10^4    6.0 × 10^4    –       –

* – out of memory
We notice that both MC and SSCM need to recompute the potential coefficient matrices each time the geometry changes. This computation can be significant compared to the CPU time of solving the potential coefficient equations. This is one of the reasons that SSCM and MC are much slower than StatCap, in which the augmented system only needs to be set up once.
Also, SSCM uses the sparse grid scheme to reduce the number of collocation points needed to derive the coefficients of the OPC. But the number of collocation points is still on the order of O(m²) for second-order Hermite polynomials, where m is the number of variables. Thus, it requires O(m²) solutions for the different geometries. Our algorithm also considers second-order Hermite polynomials, but we only need to solve the augmented system once. The solving process can be further improved by using more advanced solvers or acceleration techniques.
Next, we perform the accuracy comparison. The statistics for the 1 × 1 bus case from the four algorithms are summarized in Tables 11.4 and 11.5 for the mean value and standard deviation, respectively. The parameter settings for each case are listed in Table 11.2. We make sure that SSCM and the first- and second-order StatCap use the same number of random variables after the PCA operations.
From these two tables, we can see that the first-order StatCap, the second-order StatCap, and SSCM give similar results for both the mean value and the standard deviation compared with the MC method. For all the other cases, the numbers of MC runs are as shown in Table 11.3, and similar experimental results are obtained. The maximum and average errors of the mean value and standard deviation for all the cases are shown in Tables 11.6 and 11.7. Compared to the MC method, the accuracy of the second-order StatCap is better than that of the first-order StatCap, while, from Table 11.3, the speed of the second-order StatCap remains of the same order as the first-order StatCap and is still much faster than SSCM and MC.
6 Additional Notes
First, we consider the scenario where panel i and panel j are far away from each other (their distance is much larger than the panel size). In this case, the approximations in (11.12) and (11.13) are still valid. From the free-space Green's function, we have (11.15) and (11.16) for the first-order Hermite polynomials, and we have the following for the second-order Hermite polynomials:
P_{ij,0} = 1/|x_i − x_j|,    (11.40)

∂P_ij/∂n_i = −(r · n̂_i)/|r|³,    (11.41)

∂P_ij/∂n_j = (r · n̂_j)/|r|³,    (11.42)

∂²P_ij/∂n_i² = 3(r · n̂_i)²/|r|⁵ − 1/|r|³,    (11.43)

∂²P_ij/∂n_j² = 3(r · n̂_j)²/|r|⁵ − 1/|r|³,    (11.44)

∂²P_ij/(∂n_j ∂n_i) = −3(r · n̂_j)(r · n̂_i)/|r|⁵.    (11.45)
Second, we consider the scenario where panel i and panel j are near each other (their distance is comparable to the panel size). In this case, the approximation in (11.12) is no longer accurate, and we must consider the general forms in (11.29) and (11.30).
Since panel i and panel j are perpendicular to n̂_i and n̂_j, respectively, for ∂P_ij/∂n_j and ∂²P_ij/∂n_j², with (11.29), we have
∂P_ij/∂n_j = ∂[(1/s_j) ∫_{S_j} G(x_i, x_j) da_j]/∂n_j
= ∂[(1/s_j) ∫_{S_j} 1/|x_i − x_j + n_i − n_j| da_j]/∂n_j
= (1/s_j) ∫_{S_j} ∂(1/|x_i − x_j + n_i − n_j|)/∂n_j da_j
= (1/s_j) ∫_{S_j} (r · n̂_j)/|r|³ da_j
= (n̂_j/s_j) · ∫_{S_j} (r/|r|³) da_j,    (11.46)
∂²P_ij/∂n_j² = ∂²[(1/s_j) ∫_{S_j} G(x_i, x_j) da_j]/∂n_j²
= (1/s_j) ∫_{S_j} ∂²(1/|x_i − x_j + n_i − n_j|)/∂n_j² da_j
= (1/s_j) ∫_{S_j} [3(r · n̂_j)²/|r|⁵ − 1/|r|³] da_j
= (3/s_j) ∫_{S_j} (r · n̂_j)²/|r|⁵ da_j − (1/s_j) ∫_{S_j} da_j/|r|³.    (11.47)
For ∂²P_ij/(∂n_j ∂n_i), we need to further consider two cases. First, when panel i and panel j are parallel, we have
Second, when panel i and panel j are not parallel, we arrive at
∂²P_ij/(∂n_j ∂n_i) = ∂(∂P_ij/∂n_i)/∂n_j
= ∂[ ((r · n̂_i)/s_i) ∫_{S_i} da_i/|r|³ ]/∂n_j
= ((r · n̂_i)/s_i) · ∂[ ∫_{S_i} da_i/|r|³ ]/∂n_j,    (11.51)

since r · n̂_i is constant over panel i and can be moved outside the integral.
Assume the conductors are rectangular geometries. Then two panels must be either parallel or perpendicular. Since panel i and panel j are not parallel, these two panels are perpendicular.
Without loss of generality, we assume that panel i is parallel to the xz-plane and panel j is parallel to the yz-plane. Then, it is easy to see that n̂_i = (0, 1, 0) and n̂_j = (1, 0, 0). Let u_kl, k, l ∈ {0, 1}, denote the four corners of panel i, with (x_ik, y_i, z_il) being the Cartesian coordinates of corner u_kl and (x_i, y_i, z_i) the center of gravity. Let t_kl, k, l ∈ {0, 1}, denote the four corners of panel j, with (x_j, y_jk, z_jl) being the Cartesian coordinates of corner t_kl and (x_j, y_j, z_j) the center of gravity.
After that, (11.51) can be further deduced as

∂²P_ij/(∂n_j ∂n_i) = ((y_j − y_i)/s_i) ∂[ ∫_{x_i0}^{x_i1} ∫_{z_i0}^{z_i1} dx dz / |r|³ ]/∂x_j
= ((y_j − y_i)/s_i) ∂[ ∫_{x_i0 − x_j}^{x_i1 − x_j} ∫_{z_i0}^{z_i1} dz dx / |r′|³ ]/∂x_j
= ((y_j − y_i)/s_i) ( ∫_{z_i0}^{z_i1} dz/|r_−|³ − ∫_{z_i0}^{z_i1} dz/|r_+|³ )
= ((y_j − y_i)/s_i) Σ_{k=0}^{1} Σ_{l=0}^{1} (−1)^{k+l+1} (z_il − z_j) / [ ((x_ik − x_j)² + (y_i − y_j)²) √((x_ik − x_j)² + (y_i − y_j)² + (z_il − z_j)²) ],    (11.52)
where

|r| = √((x − x_j)² + (y_i − y_j)² + (z − z_j)²),
|r′| = √(x² + (y_i − y_j)² + (z − z_j)²),
|r_+| = √((x_i1 − x_j)² + (y_i − y_j)² + (z − z_j)²),
|r_−| = √((x_i0 − x_j)² + (y_i − y_j)² + (z − z_j)²).
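The closed form in (11.52) rests on the elementary antiderivative ∫ dz/(a² + (z − z_j)²)^{3/2} = (z − z_j)/(a² √(a² + (z − z_j)²)) + const, which is easy to check numerically. A sketch; all limits and constants below are made-up values:

```python
import numpy as np
from scipy.integrate import quad

a, zj, z0, z1 = 0.7, 0.2, 0.0, 1.0   # hypothetical geometry constants

def antideriv(z):
    # antiderivative of 1/(a^2 + (z - zj)^2)^(3/2)
    return (z - zj) / (a**2 * np.sqrt(a**2 + (z - zj)**2))

closed = antideriv(z1) - antideriv(z0)
numeric, err = quad(lambda z: (a**2 + (z - zj)**2) ** -1.5, z0, z1)
```

The same antiderivative, evaluated at the four corner combinations with the alternating signs (−1)^{k+l+1}, yields exactly (11.52).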
7 Summary
12 Incremental Extraction of Variational Capacitance
1 Introduction
Since interconnect length and cross-sectional area are at different scales, variational capacitance extraction is quite different between the on-chip [21, 205, 209] and the off-chip [34, 210] cases. The on-chip interconnect variation from geometrical parameters, such as the width of one panel and the distance between two panels, is more dominant [21, 209] than the rough-surface effect seen in off-chip package traces. However, it is unknown how to leverage the stochastic process variation into the matrix-vector product (MVP) by the fast multipole method (FMM) [21, 34, 205, 209, 210]. Similar to dealing with stochastic analog mismatch for transistors [133], a cost-efficient full-chip extraction needs to explore an explicit relation between the stochastic variation and the geometrical parameters, such that the electrical property shows an explicit dependence on the geometrical parameters. Moreover, the expansion by OPC with different collocation schemes [21, 34, 187, 196, 209] always results in an augmented and dense system equation. This significantly increases the complexity when dealing with large-scale problems. The corresponding GMRES solver thereby needs to be designed in an incremental fashion to consider the update from the process variation. As a result, a scalable extraction algorithm similar to [77, 118, 163] is required to consider the process variation, with the new MVP and GMRES developed accordingly.
To address the aforementioned challenges, this chapter introduces a new technique [56], which contributes as follows. First, to reveal an explicit dependence on geometrical parameters, the potential interaction is represented by a number of geometrical moments (GMs). As such, the process variation can be further included by expanding the GMs with the use of orthogonal polynomial chaos (OPC); the expanded moments are called stochastic geometrical moments (SGMs) in this chapter. Next, with the use of the SGMs, the process variation can be incorporated into a modified FMM algorithm that evaluates the MVP in parallel. Finally, an incremental GMRES method is introduced to update the preconditioner under different variations. Such a parallel and incremental full-chip capacitance extraction considering stochastic variation is called piCAP. Parallel and incremental analyses are the two key features of piCAP.
The resulting potential coefficient matrix P is usually dense in the BEM method of Sect. 2 of Chap. 11. As such, directly solving (11.2) would be computationally expensive. FastCap [118] applies an iterative GMRES method [149] to solve (11.2). Instead of performing an expensive LU decomposition of the dense P, GMRES first forms a preconditioner W such that W⁻¹P has a smaller condition number than P, which can accelerate the convergence of iterative solvers [150]. Take left preconditioning as an example:

(W⁻¹P) q = W⁻¹ b.
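The effect of a preconditioner can be seen on a small made-up system. The sketch below uses a simple Jacobi (diagonal) preconditioner, which is not the spectral preconditioner developed later in the chapter, merely the cheapest illustration of why W⁻¹P converges faster:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# made-up "potential" matrix: small random couplings plus widely spread diagonal scales
P = 0.01 * rng.standard_normal((n, n))
np.fill_diagonal(P, np.logspace(0, 3, n))

W = np.diag(np.diag(P))                        # Jacobi preconditioner
cond_raw = np.linalg.cond(P)
cond_pre = np.linalg.cond(np.linalg.solve(W, P))  # condition number of W^-1 P
```

The preconditioned matrix is close to the identity, so its condition number collapses by orders of magnitude, which is what shortens the GMRES iteration count.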
The FMM was initially proposed to speed up the evaluation of long-range particle forces in the N-body problem [141, 193]. It can also be applied in iterative solvers to accelerate the calculation of the MVP [118]. Let us take the capacitance extraction problem as an example to introduce the operations in the FMM. In general, the FMM discretizes the conductor surface into panels and forms cubes of finite height, each containing a number of panels. Then, it builds a hierarchical oct-tree of cubes and evaluates the potential interaction P at different levels.
Fig. 12.1 Multipole operations within the FMM algorithm. Reprinted with permission from [56] © 2011 IEEE
Specifically, the FMM first assigns all panels to leaf cells/cubes and computes the multipole expansions (MEs) for all panels in each leaf cell. Then, the FMM calculates the multipole expansion of each parent cell using the expansions of its children cells (called M2M operations, in the upward pass). Next, the local field expansions of the parent cells can be obtained by adding the multipole expansions of well-separated parent cells at the same level (called M2L operations). After that, the FMM descends the tree structure to calculate the local field expansion of each panel based on the local expansion of its parent cell (called L2L operations, in the downward pass). All these operations are illustrated in Fig. 12.1.
To further speed up the evaluation of the MVP, the presented stochastic extraction performs a parallel evaluation of Pq with variations, which is discussed in Sect. 4, and uses an incremental preconditioner, which is discussed in Sect. 5. Both of these features depend on finding an explicit dependence between the stochastic process variation and the geometric parameters, which is discussed in Sect. 3.
3 Stochastic Geometrical Moment
With FMM, the complexity of MVP P q evaluation can be reduced to O.N / during
the GMRES iteration. Since the spatial decomposition in FMM is geometrically
dependent, it is helpful to express P using GMs with an explicit geometry
dependence. As a result, this can lead to an efficient recursive update (M2M, M2L, L2L) of P on the oct-tree. The geometry dependence is also a key property to preserve in the presence of stochastic variation. In this section, we first derive the geometrical moments and then expand them by stochastic orthogonal polynomials to calculate the potential interaction with variations.
Process variation includes global systematic variations and local random variations. This chapter focuses on local random variations, or stochastic variations, which are more difficult to handle. Note that although there are many variation sources, without loss of generality, this chapter considers two primary geometrical parameters with stochastic variation for the purpose of illustration: the panel distance (d) and the panel width (h). Due to local random variation, the width of a discretized panel, as well as the distance between panels, may show random deviations from the nominal value. Though there could exist a systematic correlation between d and h for each panel, PCA in Sect. 2.2 of Chap. 2 can first be applied to decouple the correlated parameters and hence potentially reduce the number of random variables. After the PCA for the global systematic variation, we focus on the more challenging part: the local random variation. With expansions in Cartesian coordinates, we can relate the potential interaction to the geometry parameters through GMs, which can then be extended to consider stochastic variations.
Let the center of an observer cube be r_0 and the center of a source cube be r_c. We assume that the distance between the ith source panel and r_c is a vector r:

r = r_x x̂ + r_y ŷ + r_z ẑ,  d = d_x x̂ + d_y ŷ + d_z ẑ,

with |d| = d.
In Cartesian coordinates (x, y, z), when the observer is outside the source region (d > r), a multipole expansion (ME) [9, 72] can be defined as

1/|r − d| = Σ_{p=0}^{∞} ((−1)^p / p!) (r · ∇)^p (1/d) = Σ_{p=0}^{∞} M_p = Σ_{p=0}^{∞} l_p(d) m_p(r),    (12.1)

where

l_0(d) = 1/d,  m_0(r) = 1,
l_1(d) = d_k/d³,  m_1(r) = r_k,
l_2(d) = 3 d_k d_l / d⁵,  m_2(r) = (3 r_k r_l − δ_kl r²)/6,
…,
l_p(d) = ∇ ⋯ ∇ (1/d)  (p-fold gradient),  m_p(r) = ((−1)^p/p!)(r ⋯ r)  (p-fold product),    (12.2)

with summation over the repeated Cartesian indices k and l.
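The truncated Cartesian expansion can be sanity-checked for a well-separated pair. A sketch, with the contracted order-1 and order-2 terms written out explicitly; the vectors r and d below are made-up values:

```python
import numpy as np

r = np.array([0.4, 0.2, 0.1])    # source panel offset from the source-cube center
d = np.array([5.0, 0.0, 0.0])    # source-cube center to observer (d > r)
dn = np.linalg.norm(d)

exact = 1.0 / np.linalg.norm(r - d)

t0 = 1.0 / dn                                          # l0*m0
t1 = (r @ d) / dn**3                                   # l1*m1, summed over k
t2 = (3 * (r @ d)**2 - (r @ r) * dn**2) / (2 * dn**5)  # l2*m2, summed over k, l

e0 = abs(exact - t0)
e1 = abs(exact - (t0 + t1))
e2 = abs(exact - (t0 + t1 + t2))
```

Each added order shrinks the error by roughly a factor of r/d, which is what makes the far-field evaluation cheap.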
When the expansion center is shifted by h (as in an M2M operation), the moments translate according to

m_p(r + h) = Σ_{q=0}^{p} (p! / (q!(p − q)!)) (h ⋯ h)_q m_{p−q}(r),    (12.3)

where (h ⋯ h)_q denotes the q-fold product of h.
Moreover, when the observer is inside the source region (d < r), a local expansion (LE) in Cartesian coordinates is obtained simply by exchanging the roles of the two vectors in (12.1):

1/|r − h| = Σ_{p=0}^{∞} L_p = Σ_{p=0}^{∞} m_p(h) l_p(r).    (12.4)
Also, when there is a spatial shift of the observer-cube center r_0, the shift of the moments l_p(r) can be derived similarly to (12.3).
Clearly, M_p and L_p and their spatial shifts show an explicit dependence on the panel width h and the panel distance d. For this reason, we call M_p and L_p geometrical moments (GMs). As such, we can also express the potential coefficient as

4πε_0 P(h, d) ≈ Σ_{p=0}^{∞} M_p  if d > r,  and  Σ_{p=0}^{∞} L_p  otherwise.    (12.5)
Moreover, assuming that the local random variations are described by two random variables, ξ_h for the panel width h and ξ_d for the panel distance d, the stochastic forms of M_p and L_p become

M̂_p(ξ_h, ξ_d) = M_p(h_0 + h_1 ξ_h, d_0 + d_1 ξ_d),  L̂_p(ξ_h, ξ_d) = L_p(h_0 + h_1 ξ_h, d_0 + d_1 ξ_d),    (12.6)

where h_0 and d_0 are the nominal values, and h_1 and d_1 define the perturbation range (% of nominal). Similarly, the stochastic potential interaction becomes P̂(ξ_h, ξ_d).
P = W_0 ⊗ P_0 + W_1 ⊗ P_1
= [ P_0 0 0 ; 0 P_0 0 ; 0 0 P_0 ] + [ 0 P_1 0 ; P_1 0 2P_1 ; 0 2P_1 0 ]
= [ P_0 P_1 0 ; P_1 P_0 2P_1 ; 0 2P_1 P_0 ].    (12.7)
E(q(ξ_d)) = q_0,
Var(q(ξ_d)) = q_1² Var(ξ_d) + q_2² Var(ξ_d² − 1) = q_1² + 2 q_2².
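These read-offs follow from the orthogonality of the Hermite basis (E[ξ] = E[ξ² − 1] = 0, Var(ξ) = 1, Var(ξ² − 1) = 2) and can be verified by sampling. A sketch with made-up coefficients:

```python
import numpy as np

q0, q1, q2 = 2.0, 0.3, 0.1            # hypothetical PC coefficients of one charge entry
xi = np.random.default_rng(4).standard_normal(400000)
q = q0 + q1 * xi + q2 * (xi**2 - 1)   # second-order Hermite PC in one variable

mean_mc, var_mc = q.mean(), q.var()   # Monte Carlo statistics
mean_pc, var_pc = q0, q1**2 + 2 * q2**2  # closed-form PC statistics
```

The PC statistics are thus available essentially for free once the coefficients are solved, with no sampling loop.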
4 Parallel Fast Multipole Method with SGM
Fig. 12.2 Sparsity structure of the augmented system matrix in (12.7) for a single-plate example
Note that under a BEM formulation, the expanded terms P_i are still dense. With a single-plate example, we show the structure of the augmented system in (12.7) in Fig. 12.2.
Considering that the dimension of P̂ is further augmented, the complexity of directly solving the augmented system (11.25) would be expensive. In the following, we present a parallel FMM to reduce the cost of MVP evaluations in Sect. 4 and an incremental preconditioner to reduce the cost of the GMRES evaluation in Sect. 5.
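The block structure in (12.7) can be reproduced directly with Kronecker products. A sketch with small made-up dense blocks; W_0 = I and the W_1 pattern are taken from (12.7):

```python
import numpy as np

rng = np.random.default_rng(3)
P0 = rng.standard_normal((4, 4))
P0 = P0 + P0.T                        # made-up symmetric nominal block
P1 = rng.standard_normal((4, 4))
P1 = P1 + P1.T                        # made-up symmetric variation block

W0 = np.eye(3)
W1 = np.array([[0.0, 1, 0],
               [1, 0, 2],
               [0, 2, 0]])

A = np.kron(W0, P0) + np.kron(W1, P1)  # augmented matrix of (12.7)
```

The (1,0) block of A is P1 and the (2,1) block is 2P1, exactly the pattern in (12.7), and A inherits the symmetry of its blocks.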
Fig. 12.3 The M2M operation in an upward pass to evaluate local interactions around sources
Panels are first partitioned over the oct-tree among processors to reduce the communication cost and balance the workload. In the following steps, the stochastic product PQ is evaluated in two passes: an upward pass for multipole expansions (MEs) and a downward pass for local expansions (LEs), both of which are illustrated in detail below.
The upward pass manages the computation of the source expansions, as illustrated in Fig. 12.3.
It accumulates the multipole-expanded near-field interactions starting from the bottom level (l = 0). For each child cube (leaf) without variation (nominal contribution to P_0) at the bottom level, it first evaluates the geometrical moments with (12.1) for all panels in that cube. If a panel experiences a variation ξ_d or ξ_h, it calculates P_i(ξ)·q (i ≠ 0, ξ = ξ_d, ξ_h) by adding the perturbation h_1ξ_h or d_1ξ_d to consider the different variation sources, and then evaluates the SGMs with (12.6).
After building the MEs for each panel, it traverses to the upper level to consider the contributions from parents, as shown in Fig. 12.3. The moment of a parent cube can be efficiently updated by summing the moments of its eight children via an M2M operation. Based on (12.3), the M2M operation translates the children's M̂_p into their parents'.
The M2M operations at different parents are performed in parallel since there is no data dependence. Each processor builds its own panels' SGMs while ignoring the existence of other processors.
The potential evaluation for the observer is managed during a downward pass. At the lth level (l > 0), two cubes are said to be adjacent if they have at least one common vertex. Two cubes are said to be well separated if they are not adjacent at level l but their parent cubes are adjacent at level l − 1. Otherwise, they are said to be far from each other. The list of all the well-separated cubes from one cube at level l is called the interaction list of that cube.
From the top level l = H − 1, interactions from the cubes on the interaction list of one cube are calculated by an M2L operation, one level at a time (the M2L operation at the top level is illustrated in Fig. 12.4). Assuming that a source-parent center r_c is changed to an observer-parent center r_0, this leads to an LE (12.4) using the ME (12.1) when exchanging r and d. As such, the M2L operation translates the source's M̂_p into the observer's L̂_p for a number of source-parents on the interaction list of one observer-parent at the same level. Due to the use of the interaction list, the M2L operations have a data dependence that introduces overhead for a parallel evaluation.
After the M2L operation, interactions are further recursively distributed down to the children from their parents by an L2L operation (the converse of the upward pass, shown in Fig. 12.5). Assume that the parent's center r_0 is changed to the child's center r_0′ by a constant shift h. Identically to the M2M update by (12.3), an L2L operation updates r by r′ = r + h for all children's L̂_k. In this stage, all processors can perform the same L2L operation at the same time on different data, which exploits the parallelism perfectly.
Finally, the FMM sums the L2L results for all leaves at the bottom level (l = 0) and tabulates the computed products P_i q_j (i, j = 0, 1, …, n). By summing the products in order, the FMM returns the product PQ in (11.25) for the next GMRES iteration.
The total runtime complexity of the parallel FMM using stochastic GMs can be estimated as O(N/B) + O(log_8 B) + C(N, B), where N is the total number of panels
Fig. 12.4 The M2L operation in a downward pass to evaluate interactions of a well-separated source cube and observer cube
Fig. 12.5 The L2L operation in a downward pass to sum all integrations
5 Incremental GMRES
The parallel FMM presented in Sect. 4 provides a fast MVP for the GMRES
iteration. As discussed in Sects. 2 and 3, another critical factor for a fast GMRES
is the construction of a good preconditioner. In this section, to improve the
Note that the shift value is chosen to lead to a better convergence. This method is called deflated power iteration. Moreover, as discussed below, the spectral preconditioner W can be easily updated in an incremental fashion.
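The deflated power iteration can be sketched as follows. This is an illustrative numpy version with made-up matrix sizes (hypothetical function name), not the ARPACK-based restarted Arnoldi the chapter actually uses:

```python
import numpy as np

def deflated_power_iteration(P, K, iters=500, seed=0):
    """Find the K dominant eigenpairs of a symmetric matrix P by power
    iteration, deflating each converged pair out of the operator."""
    rng = np.random.default_rng(seed)
    A = np.array(P, dtype=float)
    lams, vecs = [], []
    for _ in range(K):
        v = rng.standard_normal(A.shape[0])
        for _ in range(iters):
            w = A @ v
            v = w / np.linalg.norm(w)       # normalized power step
        lam = v @ A @ v                     # Rayleigh quotient
        lams.append(lam)
        vecs.append(v)
        A = A - lam * np.outer(v, v)        # deflation: shift this eigenvalue to ~0
    return np.array(lams), np.column_stack(vecs)
```

Each deflation step subtracts the converged eigenpair so the next power iteration converges to the next-largest eigenvalue.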
¹ http://www.caam.rice.edu/software/ARPACK/
While one Arnoldi iteration suffices to construct the preconditioner W for the nominal system with the potential matrix P^(0), it would be expensive to run another naive Arnoldi iteration to form a new preconditioner W′ for a new P′ with a δP updated from P^(1), …, P^(n). Instead, we show that W can be incrementally updated as follows.
If there is a perturbation δP in P, the perturbation δv_i of the i-th eigenvector v_i (i = 1, …, K) can be obtained from the first-order formula in [171]. Collecting the eigenvectors and eigenvalues as

V_K = [v_1, …, v_j, …, v_K],   D_K = diag[λ_1, …, λ_j, …, λ_K],

the updated preconditioner becomes

W′ = (I + V′_K (D′_K)^{-1} (V′_K)^T) = W + δW,   (12.13)

where

V′_K = V_K + δV_K,   D′_K = (V′_K)^T P V′_K.   (12.14)

After expanding V′_K by V_K and δV_K, the incremental change in the preconditioner W can be obtained as

δW = E_K − V_K D_K^{-1} F_K D_K^{-1} V_K^T,   (12.15)

where

E_K = δV_K D_K^{-1} V_K^T + (δV_K D_K^{-1} V_K^T)^T,   (12.16)

and

F_K = δV_K^T V_K D_K + (δV_K^T V_K D_K)^T.   (12.17)
Note that all the above inverse operations only deal with the diagonal matrix DK ,
and hence, the computational cost is low.
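Under the stated assumptions, the update (12.15)–(12.17) can be sketched in numpy on a random symmetric stand-in for the potential matrix; the eigenvector perturbation δV_K is computed here with the classical first-order perturbation formula (standing in for the reference [171]), and all sizes are made-up illustration values:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 30, 4
A = rng.standard_normal((n, n))
P = A @ A.T + n * np.eye(n)                    # symmetric stand-in "nominal" system
lam, vec = np.linalg.eigh(P)                   # ascending eigenvalues
VK, DK = vec[:, -K:], np.diag(lam[-K:])        # K dominant eigenpairs
W = np.eye(n) + VK @ np.linalg.inv(DK) @ VK.T  # spectral preconditioner, (12.13) form

B = rng.standard_normal((n, n))
dP = 1e-3 * (B + B.T)                          # small symmetric perturbation

# Classical first-order eigenvector perturbation for the K dominant vectors:
dVK = np.zeros_like(VK)
for a, i in enumerate(range(n - K, n)):
    for j in range(n):
        if j != i:
            dVK[:, a] += (vec[:, j] @ dP @ vec[:, i]) / (lam[i] - lam[j]) * vec[:, j]

DKinv = np.linalg.inv(DK)                      # inverse of a diagonal matrix: cheap
EK = dVK @ DKinv @ VK.T + (dVK @ DKinv @ VK.T).T   # (12.16)
FK = dVK.T @ VK @ DK + (dVK.T @ VK @ DK).T         # (12.17)
dW = EK - VK @ DKinv @ FK @ DKinv @ VK.T           # (12.15)
```

Note that δW is symmetric by construction, and the only inversion involved is that of the diagonal D_K.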
Since only one Arnoldi iteration is needed to construct the nominal spectral preconditioner W, it can then be efficiently updated whenever δP changes. For example, δP differs when one alters the perturbation range h₁ of the panel width or changes the variation type from panel width h to panel distance d. We call this deflated GMRES method with the incremental preconditioner the iGMRES method.
For our problem in (11.25), we first analyze an augmented nominal system with

W = diag[W, W, …, W],
P = diag[P^(0), P^(0), …, P^(0)],
D_K = diag[D_K, D_K, …, D_K],
V_K = diag[V_K, V_K, …, V_K],

which are all block diagonal with n blocks. Hence, there is only one preconditioning cost, from the nominal block P^(0). In addition, the variation contributes the perturbation matrix

δP = [  0        P_{0,1}   ⋯   P_{0,n}
        P_{1,0}   0        ⋯   P_{1,n}
        ⋮        ⋮        ⋱   ⋮
        P_{n,0}   P_{n,1}   ⋯   0      ].   (12.18)
6 piCAP Algorithm
The overall parallel extraction flow of piCAP is presented in Fig. 12.7. First, piCAP discretizes the conductor surfaces into small panels and builds a hierarchical oct-tree of cubes, which are distributed across many processors. Then, it sets the potential of a certain conductor j to 1 volt while the other conductors are grounded. After that, the spectral preconditioner W is built according to the variational system P and is updated partially for different variation sources. With this preconditioner, piCAP uses GMRES to solve the augmented linear system PQ = B iteratively until convergence. The parallel FMM described in Sect. 4 is then performed to provide the MVP PQ efficiently for GMRES. Finally, the variational capacitance C_ij is obtained by summing up the panel charges on conductor i.
As an example, consider the procedure for the panel distance d. With a first-order OPC expansion and the inner product, we obtain the following augmented potential coefficient matrix:
P = P^(0) + δP
  = [ P_0   0    0   ]   [ 0     P_1   0    ]
    [ 0     P_0  0   ] + [ P_1   0     2P_1 ]
    [ 0     0    P_0 ]   [ 0     2P_1  0    ]
  = [ P_0   P_1   0    ]
    [ P_1   P_0   2P_1 ]
    [ 0     2P_1  P_0  ].   (12.19)
Notice that the first-order OPC expansion is used here for illustration, and a
higher order expansion can provide more accurate variance information.
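The block structure of (12.19) is easy to assemble given the nominal block P_0 and the first-order block P_1. A minimal sketch (hypothetical function name, toy inputs):

```python
import numpy as np

def augmented_system(P0, P1):
    """Assemble the first-order OPC-expanded potential matrix of (12.19)
    from the nominal block P0 and the first-order block P1 (both m x m)."""
    Z = np.zeros_like(P0)
    return np.block([[P0,      P1,      Z],
                     [P1,      P0, 2 * P1],
                     [ Z,  2 * P1,     P0]])
```

The result is a 3m × 3m matrix that is symmetric whenever P_0 and P_1 are, with the perturbation blocks sitting on the off-diagonals as in (12.19).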
With the spectral preconditioner of Sect. 5, we can build W^(0) for P^(0) and δW for δP; the preconditioner for the augmented system is thus W = W^(0) + δW. Once the augmented system is solved, the mean and the variance of the panel charge follow directly from the expansion coefficients:

E(q(d)) = q_0,
Var(q(d)) = q_1² Var(d) + q_2² Var(d² − 1) = q_1² + 2q_2².
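The mean and variance formulas above can be sanity-checked with a quick Monte Carlo experiment; the coefficients here are made-up illustration values:

```python
import numpy as np

# MC check of the mean/variance of a second-order Hermite expansion
# q(d) = q0 + q1*d + q2*(d^2 - 1) with d ~ N(0, 1); coefficients invented.
q0, q1, q2 = 1.0, 0.3, 0.1
d = np.random.default_rng(0).standard_normal(500_000)
q = q0 + q1 * d + q2 * (d**2 - 1)
print(q.mean(), q.var())   # approaches q0 = 1.0 and q1**2 + 2*q2**2 = 0.11
```

The sample mean and variance converge to q_0 and q_1² + 2q_2², matching the orthogonality of the Hermite basis under the Gaussian measure.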
The above procedure can be similarly applied to calculate the variance and the mean for the geometrical parameter h. Clearly, the stochastic orthogonal expansion leads to an augmented system with perturbed blocks on the off-diagonal. This increases the computational cost of any GMRES method and remained an unresolved issue in previous applications of stochastic orthogonal polynomials [21, 34, 187, 209]. In addition, when the variation changes, the P matrix must be partially updated. Forming a new preconditioner for the augmented system (11.26) would therefore be expensive.
Based on (12.15), we can incrementally update the preconditioner W to account for a new variation P^(i) when changing the perturbation range of h_i or d_i. Moreover, we can also incrementally update W when changing the variation type from P^(i)(h) to P^(i)(d). This dramatically reduces the cost of applying the deflated GMRES during variational capacitance extraction. The same procedure can easily be extended to higher-order expansions with stochastic orthogonal polynomials.
The first memory bottleneck is the O(N²) storage requirement of the preconditioner matrix. For example, a second-order expanded system contains 3N variables, where N is the number of panels, which is expensive to maintain. Because each block P_{i,j} is a symmetric positive semidefinite matrix, we can prune small off-diagonal entries, store only half of them, and further apply the compressed sparse column (CSC) format to store the preconditioner matrix. This reduces the cost of building and storing the block-diagonal spectral preconditioner. The other memory bottleneck, for the MVP, is resolved by the intrinsic matrix-free property of the FMM, which exploits the tree hierarchy to speed up the MVP evaluation with a cost of O(N log N) for both memory and CPU time. Thus, the presented FMM using SGMs can be efficiently used for large-sized variational capacitance extraction.
The time complexity stems mainly from the first-time analysis of the preconditioner for the nominal system. A restarted Arnoldi method in ARPACK can be used to efficiently identify the first K eigenvalues, which significantly reduces the cost to O(N). As a result, the computational cost of forming the preconditioner is reduced even for the first-time analysis.
7 Numerical Examples
Based on the presented algorithm, the piCap program has been developed in C++ on Linux network servers with Xeon processors (2.4 GHz CPU and 2 GB memory). In this section, we first validate the accuracy of SGMs by comparing them with the MC integral. Then, we study the parallel runtime scalability when evaluating the potential interaction using the MVP with charge. In addition, the incremental GMRES preconditioner is verified against its nonincremental counterpart in terms of total runtime. Finally, the spectral preconditioner is validated by analyzing the spectrum of the potential coefficient matrix. The initial results of this chapter were published in [53].
Fig. 12.8 The discretized panel structure, with panel i, panel width h, and panel distance d (axes X, Y, Z in μm)
when compared with the exact values from the integration method. Thus, a higher-order OPC expansion can lead to a more accurate result, but with a higher computational expense due to the larger-scale system.
Next, the accuracy of the presented SGM-based method is verified on the same example in Fig. 12.8. To do so, we introduce a set of different random variation ranges with Gaussian distributions for the panel distance d and width h. For this example, the MC method is used to validate the accuracy of SGMs.
First, the MC method calculates the C_ij's 3,000 times, and each time a normally distributed variation is randomly introduced to the distance d. As such, we can evaluate the distribution, including the mean value μ and the standard deviation σ, of the variational capacitance.
Then, we introduce the same random variation to the geometric moments in (12.6) with the stochastic polynomial expansion. Because of the explicit dependence on the geometrical parameters according to (12.1), we can efficiently calculate the Ĉ_ij's. Table 12.3 shows the C_ij values and runtimes of the two approaches. The comparison in Table 12.3 shows that SGMs not only keep high accuracy, with an average error of 1.8%, but are also up to 347× faster than the MC method.
Moreover, Fig. 12.9 shows the C_ij distribution from MC (3,000 runs) while considering 10% panel distance variation with a Gaussian distribution. The mean and variance computed by piCAP are also marked in the figure with dashed lines, which fit the MC results very well.
Fig. 12.9 The C_ij distribution (in pF) from 3,000 MC runs under 10% panel distance variation; the mean μ and the μ ± 3σ bounds computed by piCAP are marked (vertical axis: number of occurrences)
In this part, we study the runtime scalability using a few large examples to show both the advantage of the parallel FMM for the MVP and the advantage of the deflated GMRES with incremental preconditioning.
The four large examples comprise 20, 40, 80, and 160 conductors, respectively. For the two-layer example with 20 conductors, each conductor is of size 1 × 1 × 25 μm (width × thickness × length), and piCap employs a uniform 3 × 3 × 50 discretization. Figure 12.10 shows its structure and surface discretization.
For each example, we use a different number of processors to calculate the MVP Pq by the parallel FMM. Here we assume that only d has a 10% perturbation range with a Gaussian distribution. As shown in Table 12.4, the runtime of the parallel MVP decreases evidently when more processors are involved. Due to the use of the complement interaction list, the latency of communication is largely reduced, and the runtime shows good scalability versus the number of processors. In fact, the dependent list can eliminate the major communication overhead and achieve a further 1.57× speedup with four processors. Moreover, the total MVP runtime with four processors is about 3× faster on average than the runtime with a single processor.
Fig. 12.10 The structure and discretization of the two-layer example with 20 conductors. Reprinted with permission from [56], © 2011 IEEE
Table 12.4 MVP runtime (s)/speedup comparison for four different examples
#Wire 20 40 80 160
#Panels 12,360 10,320 11,040 12,480
1 proc 0.737515/1.0 0.541515/1.0 0.605635/1.0 0.96831/1.0
2 procs 0.440821/1.7 0.426389/1.4 0.352113/1.7 0.572964/1.7
3 procs 0.36704/2.0 0.274881/2.0 0.301311/2.0 0.489045/2.0
4 procs 0.273408/2.7 0.19012/2.9 0.204606/3.0 0.340954/2.8
piCap has been used to analyze three different structures, shown in Fig. 12.11. The first is a plate of size 32 × 32 μm discretized into 16 × 16 panels. The other two examples are a cubic capacitor and a 2 × 2 bus crossover structure.
Fig. 12.11 Test structures: (a) plate, (b) cubic, and (c) crossover 2 × 2. Reprinted with permission from [56], © 2011 IEEE
For each example, we obtain the two stochastic equation systems in (12.19) by considering variations separately from the width h of each panel and from the centric distance d between two panels, both with 20% perturbation ranges from their nominal values, assumed to follow Gaussian distributions.
To demonstrate the effectiveness of the deflated GMRES with a spectral preconditioner, two algorithms are compared in Table 12.5. The baseline algorithm (column "diagonal prec.") constructs a simple preconditioner using the diagonal entries. As the fine mesh structure in the extraction usually introduces degenerate or small eigenvalues, such a preconditioning strategy within the traditional GMRES usually needs many more iterations to converge. In contrast, since the deflated GMRES employs the spectral preconditioner to shift the distribution of the nondominant eigenvalues, it accelerates the convergence of GMRES, leading to a reduced number of iterations. As shown in Table 12.5, the deflated GMRES consistently reduces the number of iterations by 3× on average.
Table 12.6 Total runtime (s) comparison for the two-layer 20-conductor example by different methods
                                     Total runtime (s)
w × t × l    #Panels   #Variables   Nonincremental   Incremental
3 × 3 × 7    2,040     6,120        419.438          81.375
3 × 3 × 15   3,960     11,880       3,375.205        208.266
3 × 3 × 24   6,120     18,360       –                504.202
3 × 3 × 60   14,760    44,280       –                7,584.674
The spectral preconditioner can shift the eigenvalue distribution to improve the convergence of GMRES. Therefore, in this section we compare the resulting spectrum with the nominal case and further verify the efficiency of the spectral preconditioner. We use a single plate as the experimental example, and the spectrum of the potential coefficient matrix P is calculated for the nominal and perturbed systems.
Fig. 12.12 The comparison of eigenvalue distributions (panel width as variation source): eigenvalues of the nominal, perturbed, and preconditioned perturbed systems plotted against their index on a log scale
First, we study the spectrum of the nominal system without variation, shown as plus signs in Fig. 12.12. It is obvious that the eigenvalues are not close to each other, which can lead to a large number of GMRES iterations.
We then introduce the panel width variation h to generate the perturbed system P(ξ)q(ξ) = v. Here we assume that h has a 20% perturbation range. The eigenvalue distribution of the perturbed system can change dramatically from the nominal case, as shown by the circles in Fig. 12.12, which disperse over a larger area. Therefore, to speed up the convergence, we construct a spectral preconditioner as described in Sect. 5 and apply it to the above perturbed system. The spectrum of the preconditioned perturbed system is shown as stars in Fig. 12.12. It can be observed that the preconditioned system has a more compact eigenvalue distribution, because the spectral preconditioner shifts the dispersed eigenvalues into a certain area.
Moreover, when the linear system is solved with an iterative solver such as GMRES, the convergence speed depends greatly on the eigenvalue distribution of the system matrix. With a more compact spectrum, the spectral preconditioner can dramatically accelerate the convergence of iGMRES in the presented method.
Fig. 12.13 The comparison of eigenvalue distributions (panel distance as variation source): eigenvalues of the nominal, perturbed, and preconditioned perturbed systems plotted against their index on a log scale
Similarly, we can introduce the panel distance variation d into the nominal system to obtain the perturbed system P(ξ)q(ξ) = v, where the distance d also has a 20% perturbation range.
We plot the spectrum of the perturbed system with distance variation as circles in Fig. 12.13. Compared with the spectrum in Fig. 12.12, we find that the panel width variation has more influence on the spectrum of the perturbed system than the panel distance variation does. With the spectral preconditioner, the spectrum becomes more compact, as shown by the stars in Fig. 12.13. In fact, all eigenvalues of the preconditioned perturbed system are close to 0.2, which yields a small condition number for the system matrix and thus fast convergence of GMRES.
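The link between a clustered spectrum and a small condition number can be illustrated directly; the basis and eigenvalue ranges below are made-up illustration values:

```python
import numpy as np

# Two matrices sharing a random orthonormal eigenbasis but with
# different eigenvalue spreads (invented numbers for illustration).
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((100, 100)))

dispersed = np.linspace(0.1, 100.0, 100)   # spread-out spectrum
clustered = 0.2 + 0.01 * rng.random(100)   # all eigenvalues near 0.2

A_dispersed = Q @ np.diag(dispersed) @ Q.T
A_clustered = Q @ np.diag(clustered) @ Q.T
print(np.linalg.cond(A_dispersed), np.linalg.cond(A_clustered))
# the clustered spectrum gives a condition number close to 1
```

A condition number near 1 is exactly the situation in which Krylov solvers such as GMRES converge in very few iterations.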
8 Summary
In this chapter, we introduced GMs to capture local random variations for full-chip capacitance extraction. Based on GMs, the stochastic capacitance can be calculated via OPC by the FMM in a parallel fashion. As such, the complexity of the MVP can be largely reduced when evaluating both nominal and stochastic values. Moreover, an incrementally preconditioned GMRES method was developed to handle different types of variation updates, with improved convergence by spectrum deflation.
1 Introduction
2 Problem Formulation
For a system with m conductors, we first divide all conductors into b filaments. The resistance and inductance of all filaments are stored in the matrices R_{b×b} and L_{b×b}, respectively, each with dimensions b × b. R is a diagonal matrix with diagonal elements

R_ii = l_i / (σ a_i),   (13.1)

where l_i is the length of filament i, σ is the conductivity, and a_i is the area of the cross section of filament i. The inductance matrix L is a dense matrix, and L_ij can be represented as in [76]:

L_ij = (μ / (4π a_i a_j)) ∫_{V_i} ∫_{V_j} (l_i · l_j) / ‖r − r′‖ dV_i dV_j,   (13.2)
The branch equation for filament i is then

Φ_A − Φ_B = R_ii I_i + jω Σ_j L_ij I_j,   (13.3)

where I_i and I_j are the currents inside filaments i and j, ω is the angular frequency, and Φ_A and Φ_B are the potentials at the end faces of the filament. Equation (13.3) can be written in matrix format as

(R + jωL) I_b = V_b.   (13.4)
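A toy assembly of the branch system (R + jωL)I_b = V_b can be sketched as follows; the conductivity, filament geometry, and partial-inductance entries are all made-up illustration values:

```python
import numpy as np

# Toy branch-impedance assembly for three filaments (invented data).
sigma = 5.8e7                           # conductivity (S/m), assumed copper
l = np.array([1e-4, 1e-4, 2e-4])        # filament lengths (m)
a = np.array([1e-12, 2e-12, 1e-12])     # cross-section areas (m^2)
R = np.diag(l / (sigma * a))            # diagonal resistance matrix, (13.1)

L = 1e-12 * np.array([[2.0, 0.5, 0.3],  # dense partial-inductance matrix
                      [0.5, 2.0, 0.5],  # (symmetric, values invented)
                      [0.3, 0.5, 4.0]])

omega = 2 * np.pi * 1e9                 # 1 GHz
Z = R + 1j * omega * L                  # branch impedance matrix, (13.4)
Vb = np.ones(3)
Ib = np.linalg.solve(Z, Vb)             # branch currents
```

Solving this complex linear system for I_b is the deterministic core that the statistical method below has to repeat for varied geometries.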
w′_i = w_i + n_{w,i},   (13.6)
h′_i = h_i + n_{h,i},   (13.7)

where each perturbation n_{x,i} follows a zero-mean Gaussian distribution N(0, σ²). The correlation between the random perturbations of each wire's width and height is governed by an empirical formulation such as the widely used exponential model

ρ(r) = e^{−r²/η²},   (13.8)

where r is the distance between two panel centers and η is the correlation length. The most straightforward method is to use an MC-based simulation to obtain the distribution, mean, and variance of all those inductances. Unfortunately, the MC method is extremely time consuming, and more efficient statistical approaches are needed.
212 13 Statistical Inductance Modeling and Extraction
In the inductance extraction problem, process variations exist in the width w and height h of the conductors, which makes each element of the inductance matrix (13.2) follow some random distribution. Solving this problem amounts to deriving the random distribution and then effectively computing the mean and variance of the inductance from the given geometric randomness parameters. As shown in (13.6) and (13.7), each filament i is modeled by two Gaussian random variables, n_{w,i} and n_{h,i}. Suppose there are n filaments; then the inductance extraction problem involves 2n Gaussian random variables with spatial correlation modeled as in (13.8). Even with sparse grid quadrature, the number of sampling points still grows quadratically with the number of variables. As a result, we should further reduce the number of variables by exploiting the spatial correlations of the given random width and height parameters of the wires.
We start with independent random variables as the input of the spectral stochastic method. Since the height and width variables of all wires are correlated, this correlation should be removed before using the spectral stochastic method. As proved in Sect. 2.3 of Chap. 2, the theoretical basis for decoupling the correlation of those variables is the Cholesky decomposition.

Proposition 13.1. For a set ξ of zero-mean Gaussian distributed variables whose covariance matrix is Ω_{2n×2n}, if there is a matrix L satisfying Ω = LL^T, then ξ can be represented via a set η of independent standard normally distributed variables as ξ = Lη.

Here the covariance matrix Ω_{2n×2n} contains the covariances between all the n_{w,i} and n_{h,i} for each filament, and Ω is always a positive semidefinite matrix due to the nature of covariance matrices. At the same time, PFA [74] can substitute for the Cholesky decomposition when variable reduction is needed. Eigendecomposition of Ω_{2n×2n} yields

Ω_{2n×2n} = LL^T,   L = [√λ_1 e_1, …, √λ_{2n} e_{2n}],   (13.9)
where the {λ_i} are eigenvalues in order of descending magnitude and the {e_i} are the corresponding eigenvectors. After PFA, the number of random variables involved in inductance extraction is reduced from 2n to k by truncating L to its first k columns.
The error of PFA can be controlled through k:

err = (Σ_{i=k+1}^{2n} λ_i) / (Σ_{i=1}^{2n} λ_i),   (13.10)

where a bigger k leads to a more accurate result. PFA is efficient, especially when the correlation length is large. In the experiments, we set the correlation length to eight times the width of the wires. As a result, PFA can reduce the number of variables from 40 to 14 with an error of about 1% in an example with 20 parallel wires.
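The chain (13.8)–(13.10) can be sketched in a few lines of numpy; the variable layout, η, and σ below are made-up illustration values, not the chapter's experimental setup:

```python
import numpy as np

# PFA variable-reduction sketch under the exponential model (13.8).
n = 20                                        # wires -> 2n correlated variables
eta, sigma = 8.0, 0.1                         # correlation length (um), std dev
pos = np.tile(np.arange(n, dtype=float), 2)   # toy 1-D coordinates for (n_w, n_h)
r = np.abs(pos[:, None] - pos[None, :])       # pairwise distances
Omega = sigma**2 * np.exp(-(r / eta) ** 2)    # covariance matrix via (13.8)

lam, E = np.linalg.eigh(Omega)
lam, E = lam[::-1], E[:, ::-1]                # descending eigenvalues, as in (13.9)
k = 6                                         # keep the k principal factors
L = E[:, :k] * np.sqrt(np.clip(lam[:k], 0.0, None))
err = lam[k:].sum() / lam.sum()               # truncation error, (13.10)
print(L.shape, err)
```

With all 2n factors kept, LL^T reproduces Ω exactly; truncating to k factors trades the error (13.10) for a much smaller sampling space.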
PFA for variable reduction considers only the spatial correlation between wires while ignoring the influence of the inductance itself. One idea is to consider the importance of the outputs during the reduction process. We follow the recently proposed wPFA technique to seek better variable-reduction efficiency [204]. If a weight is defined for each physical variable ξ_i to reflect its impact on the output, then a set of new variables is formed through the weight matrix W:

ξ̃ = Wξ.   (13.11)

The error-controlling process is similar to (13.10), but uses the weighted eigenvalues λ̃_i. For inductance extraction, we take the partial inductance of the deterministic structure as the weight, since this nominal structure reflects an approximate equality
After reviewing all the important pieces from related works in Chap. 2, we are now ready to present the new algorithm, statHenry. Figure 13.1 is a flowchart of the presented algorithm.
4 Numerical Examples
In this section, we compare the results of the statHenry method against the MC method and a simple method using HPC with the sparse grid technique but without variable reduction. statHenry has been implemented in Matlab 8.0. All experimental results were obtained on a computer with a 1.6 GHz Intel quad-core i7-720 and 4 GB of memory running the Microsoft Windows 7 Ultimate operating system. The version of FastHenry is 3.0 [76]. The initial results of this chapter were published in [63, 143].
For the experiments, we set up four test cases to examine the algorithm: 2, 5, 10, and 20 parallel wires, as shown in Fig. 13.2. In all four models, the wires have a width of 1 μm, a length of 6 μm, and a pitch of 1 μm between them. The unit of inductance in the experimental results is the picohenry (pH).
We set the standard deviation to 10% of the wire widths and heights and the correlation length to 8 μm to indicate a strong correlation.
First, we compare the accuracy of the three methods in terms of the mean and standard deviation of the loop/partial inductance. The results are summarized in Table 13.1 for the four test cases mentioned above. In each case, we report the results for the partial self-inductance of wire 1 (L11p) and the loop inductance between wires 1 and 2 (L12l). Columns 3–4 give the mean and standard deviation for the MC method, and columns 5–12 give the mean, the standard deviation, and their errors relative to the MC method for HPC and the presented method. Compared with the MC method, the average errors of the mean and standard deviation are 0.05% and 2.01% for the HPC method and 0.05% and 2.06% for the statHenry method, respectively. The MC results come from 10,000 FastHenry runs.
It can be seen that statHenry is very accurate for both the mean and the standard deviation compared with the HPC and MC methods. We observe that a 10% standard deviation for the width and height results in variations from 2.73% to 5.10% for the partial and loop inductances, which is significant for timing.
Next, we show the CPU time speedup of the presented method. The results are summarized in Table 13.2. It can be seen that statHenry can be about two orders of magnitude faster than the MC method. The average speedups of the HPC method and the statHenry method over MC are 54.1× and 349.7×, respectively. We notice that the speedup goes down with more wires. This is expected, as more wires lead to more variables even after variable reduction, and the number of samplings in the collocation method is O(m²) for second-order Hermite polynomials, where m is the number of variables. As a result, more samplings are needed to compute the coefficients, while MC uses a fixed number of samplings (10,000 for all cases).
Table 13.1 Accuracy comparison (mean and variance values of inductances) among MC, HPC,
and statHenry
Values (pH) Error
Wires Inductance MC HPC statHenry HPC (%) statHenry (%)
2 L11p Mean 2.851 2.850 2.850 0.02 0.03
std 0.080 0.078 0.078 2.31 2.47
2 L12l Mean 3.058 3.057 3.056 0.05 0.06
std 0.158 0.156 0.155 1.50 2.21
5 L11p Mean 2.849 2.851 2.851 0.08 0.07
std 0.078 0.078 0.078 0.86 0.24
5 L12l Mean 3.054 3.058 3.058 0.11 0.11
std 0.155 0.156 0.156 1.01 0.70
10 L11p Mean 2.852 2.853 2.853 0.01 0.02
std 0.079 0.078 0.078 1.23 1.37
10 L12l Mean 3.059 3.060 3.060 0.05 0.05
std 0.159 0.156 0.156 1.55 1.74
20 L11p Mean 2.852 2.853 2.853 0.03 0.03
std 0.081 0.078 0.078 3.74 3.82
20 L12l Mean 3.059 3.060 3.060 0.04 0.05
std 0.163 0.156 0.156 3.88 3.96
Table 13.2 CPU runtime comparison among MC, HPC, and statHenry
MC HPC Speedup statHenry Speedup
Wires Time (s) Time (s) (vs. MC) Time (s) (vs. MC)
2    5,394.4    32.6      165.4    9.8      550.4
5    7,442.8    192.5     38.7     12.6     589.1
10   8,333.5    893.7     9.3      42.5     195.9
20   13,698.3   4,532.9   3.0      215.8    63.5
Table 13.3 shows the reduction effects of PFA and wPFA for all the cases under the same errors. We can see that with the weighted wPFA, we achieve a smaller number of reduced variables and fewer quadrature points for sampling, and thus better efficiency for the entire extraction algorithm.
Finally, we study the variational impacts of partial and loop inductances under
different variabilities for width and height using statHenry and the MC method.
The variation statistics are summarized in Table 13.4. Here we report the results
for standard deviations from 10% to 30% for width and height for statHenry
Fig. 13.3 The loop inductance L12l distribution for the 10-parallel-wire case under 30% width and height variations (horizontal axis: loop inductance L12 in pH)
method and the MC method for the 10-parallel-wire case. As the variation due to process imperfections grows with advancing technology, we can see that the inductance variation will also grow. Considering a typical 3σ range for variation, a 30% standard deviation means that the width and height changes can reach 90% of their nominal values. It can be seen that with the increasing variations of width and height (from 10% to 30%), the std/mean of the partial inductance grows from 2.75% to 8.65%, while that of the loop inductance grows from 5.10% to 15.9%, which can significantly impact the noise and delay of the wires. The average errors of the mean and standard deviation of statHenry are 0.33% and 1.75% compared with MC over all variabilities of width and height. From this, we can see that the results of statHenry agree closely with MC under different variations.
Fig. 13.4 The partial inductance L11p distribution for the 10-parallel-wire case under 30% width and height variations (horizontal axis: partial inductance L11 in pH)
Figures 13.3 and 13.4 show the loop (for wire 1 and wire 2, L12l ) and partial
inductance distributions (for wire 1 itself, L11p ) under 30% deviations of width and
heights for the 10-parallel-wire case.
5 Summary
1 Introduction
Analog and mixed-signal circuits are very sensitive to process variations because many device matchings are required. This situation worsens as technology continues to scale to 90 nm and below, owing to the increasing process-induced variability [122, 148]. Transistor-level mismatch is the primary obstacle to reaching a high yield rate for analog designs in sub-90 nm technologies. For example, due to the inverse-square-root-law dependence on the transistor area, the mismatch of CMOS devices nearly doubles for each process generation below 90 nm [80, 104]. Since the traditional worst-case or corner-case analysis is too pessimistic and sacrifices speed, power, and area, the statistical approach [133] has become the trend for estimating the analog mismatch and performance variations. The variations in analog components comprise systematic (global spatial) variations and stochastic (local random) variations. In this chapter, we model both variations as parameter intervals on the components of analog circuits.
Analog circuit designers usually perform an MC analysis to analyze the stochastic mismatch and predict the variational responses of their designs under faults. As MC analysis requires a large number of repeated circuit simulations, its computational cost is high. Moreover, the pseudorandom generator in MC introduces numerical noise that may lead to errors. A more efficient variational analysis, one that can give the performance bounds, is highly desirable.
Bounding or worst-case analysis of analog circuits under parameter variations has been studied in the past for fault-driven testing and tolerance analysis of analog circuits [83, 162, 179]. The proposed approaches include sensitivity analysis [185], the sampling method [168], and interval arithmetic-based approaches [83, 140, 162, 179]. But the sensitivity-based method cannot give the worst case in general, and the sampling-based method is limited to a few variables. Interval arithmetic methods, in general, have had a reputation for being overly pessimistic. Recently, worst-case analysis of linearized analog circuits in the frequency domain was proposed [140], where Kharitonov's functions [79] were applied to obtain the performance bounds in the frequency domain, but no systematic method was proposed to obtain the variational transfer functions.
In this chapter, we propose a performance bound analysis algorithm for analog circuits considering process variations [61]. The presented method employs several techniques to compute the bounding responses of analog circuits in the frequency domain. First, it models the variations of component values as intervals measured from tested chips and manufacturing processes. Then it applies determinant decision diagram (DDD) graph-based symbolic analysis to derive the exact symbolic transfer functions from the linearized analog circuits. After this, affine interval arithmetic is applied to compute the variational transfer functions of the analog circuit, with variational coefficients in the form of intervals. Finally, the frequency response bounds (maximum and minimum) are obtained by evaluating a finite number of special transfer functions given by Kharitonov's theorem, which provides proved response bounds for interval polynomial functions in the frequency domain. We show that symbolic decancellation is critical for reducing the inherent pessimism of the affine interval analysis. We also show that the response bounds given by Kharitonov's functions are conservative, given the correlations among the coefficient intervals of the transfer functions. Numerical examples demonstrate that the presented method is more efficient than the MC method.
The rest of this chapter is organized as follows: Sect. 2 reviews interval arithmetic and affine arithmetic; Sect. 3 describes the presented performance bound analysis method; Sect. 4 shows the experimental results; and Sect. 5 summarizes this chapter.
Interval arithmetic was introduced by Moore in the 1960s [113] to solve range estimation under uncertainties. In interval arithmetic, a classical variable x is represented by an interval x̂ = [x⁻, x⁺] satisfying x⁻ ≤ x ≤ x⁺. However, interval arithmetic suffers from the overestimation problem, as it often yields an interval much wider than the exact range of the function. As an example, given x̂ = [−1, 1], the interval evaluation of x̂ − x̂ produces [−1 − 1, 1 − (−1)] = [−2, 2] instead of [0, 0], which is the actual range of that expression.
Affine arithmetic was proposed by Stolfi and de Figueiredo [25] to overcome the error explosion problem of standard interval analysis. In affine arithmetic, the affine form x̂ of a random variable x is given by

x̂ = x₀ + Σ_{i=1}^{n} x_i ε_i,   (14.1)
Returning to the previous example, if $x$ has the affine form $\hat{x} = 0 + \varepsilon_1$, then $\hat{x} - \hat{x} = \varepsilon_1 - \varepsilon_1 = 0$ gives the accurate result. Affine arithmetic multiplication is defined as

$$\hat{x}\hat{y} = x_0 y_0 + \sum_{i=1}^{n}\left(x_0 y_i + x_i y_0\right)\varepsilon_i + \mathrm{rad}(\hat{x})\,\mathrm{rad}(\hat{y})\,\varepsilon_{n+1}, \qquad (14.3)$$
in which $\varepsilon_{n+1}$ is a new noise symbol distinct from all the other noise symbols $\varepsilon_i$ ($i = 1, 2, \ldots, n$). Affine operations avoid the cancellation problem in addition and subtraction, but symbolic cancellation can still occur in multiplication: for instance, $\hat{x}\hat{y} - \hat{y}\hat{x}$ should equal 0, but the two multiplications generate two different $\varepsilon_{n+1}$ symbols, so the complete cancellation no longer happens.
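A minimal affine-form class illustrates (14.1) and (14.3); the data layout and the bookkeeping of fresh noise symbols below are our own assumptions, not the book's implementation. It reproduces both behaviors discussed above: exact cancellation under subtraction, and the residual fresh symbols left by multiplication.

```python
import itertools

_fresh = itertools.count(1000)   # generator of indices for new noise symbols

class Affine:
    """Affine form x0 + sum_i xi * eps_i, eps_i in [-1, 1] (Eq. 14.1)."""
    def __init__(self, x0, eps=None):
        self.x0 = x0
        self.eps = dict(eps or {})           # noise-symbol index -> coefficient

    def rad(self):
        return sum(abs(c) for c in self.eps.values())

    def __sub__(self, other):
        eps = dict(self.eps)
        for i, c in other.eps.items():
            eps[i] = eps.get(i, 0.0) - c
        return Affine(self.x0 - other.x0, eps)

    def __mul__(self, other):
        # Eq. (14.3): affine part plus one fresh symbol for the residue.
        eps = {i: self.x0 * c for i, c in other.eps.items()}
        for i, c in self.eps.items():
            eps[i] = eps.get(i, 0.0) + other.x0 * c
        eps[next(_fresh)] = self.rad() * other.rad()
        return Affine(self.x0 * other.x0, eps)

    def range(self):
        return (self.x0 - self.rad(), self.x0 + self.rad())

x = Affine(0.0, {1: 1.0})          # x = 0 + eps1
print((x - x).range())             # (0.0, 0.0): subtraction cancels exactly

y = Affine(1.0, {2: 1.0})
d = x * y - y * x                  # mathematically 0, but each product
print(d.range())                   # carries its own fresh symbol: (-2.0, 2.0)
```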
We first present the overall flow of the presented performance bound analysis algorithm in Fig. 14.1. Basically, the presented method consists of two major computing steps. The first step is to compute the variational transfer functions from the variational circuit parameters, which is done via a DDD-based symbolic analysis method and affine interval arithmetic (steps 1–3). Second, we compute the frequency response bounds via Kharitonov's functions, which requires only a few transfer function evaluations (step 4). Kharitonov's functions lead to proved upper and lower bounds on the frequency-domain responses of a variational transfer function. We will present the two major computing steps in the following sections.
In this section, we first provide a brief overview of DDD [160]. Next we show how
affine arithmetic can be applied to compute the variational transfer function.
224 14 Performance Bound Analysis of Variational Linearized Analog Circuits
(Figure: an example RC filter circuit with input current source I, resistors R1, R2, R3, and capacitors C1, C2, C3 at nodes 1–3.)
We view each entry in the circuit matrix as one distinct symbol and rewrite its system determinant on the left-hand side of Fig. 14.3. Its DDD representation is then shown on the right-hand side.
A DDD is a signed, rooted, directed acyclic graph with two terminal vertices, namely, the 0-terminal vertex and the 1-terminal vertex. Each nonterminal DDD vertex is labeled by a symbol in the determinant, denoted $a_i$ ($A$ to $G$ in Fig. 14.3), and a positive or negative sign, denoted $s(a_i)$. It originates two outgoing edges,
Fig. 14.3 A matrix determinant and its DDD representation. Reprinted with permission from [61]. © 2011 IEEE
called the 1-edge and the 0-edge. Each vertex $a_i$ represents a symbolic expression $D(a_i)$ defined recursively as

$$D(a_i) = a_i \cdot s(a_i) \cdot D_{a_i} + D_{\overline{a_i}}, \qquad (14.4)$$

where $D_{a_i}$ and $D_{\overline{a_i}}$ represent, respectively, the symbolic expressions of the vertices pointed to by the 1-edge and the 0-edge of $a_i$. The 1-terminal vertex represents expression
1, whereas the 0-terminal vertex represents expression 0. For example, vertex $E$ in Fig. 14.3 represents expression $E$, vertex $F$ represents expression $-EF$, and vertex $D$ represents expression $DG - FE$. We also say that a DDD vertex $D$ represents the expression defined in the DDD subgraph rooted at $D$.
A 1-path in a DDD corresponds to a product term in the original determinant. It is defined as a path from the root vertex ($A$ in our example) to the 1-terminal, including the symbols and signs of all the vertices that originate the 1-edges along the path. In our example, there exist three 1-paths, representing the three product terms $ADG$, $-AFE$, and $-CBG$. The root vertex represents the sum of these product terms. The size of a DDD, denoted $|DDD|$, is its number of vertices.
Once a DDD has been constructed, the numerical value of the determinant it represents can be computed by a depth-first traversal of the graph, applying (14.4) at each vertex; the time complexity is linear in the size of the graph (its number of vertices). This computing step is called Evaluate($D$), where $D$ is a DDD root. With proper vertex ordering and hierarchical approaches, DDDs can compute the transfer functions of large analog circuits very efficiently [160, 174].
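The Evaluate operation can be sketched as a memoized depth-first traversal applying (14.4) at each vertex; the vertex structure below is our transcription of the DDD in Fig. 14.3, and the result is checked against the direct determinant expansion $ADG - AFE - CBG$.

```python
# A nonterminal DDD vertex is a tuple (symbol, sign, one_child, zero_child);
# the terminals are the plain integers 1 and 0.

def evaluate(node, values, memo=None):
    """Eq. (14.4): D(a_i) = a_i * s(a_i) * D_{1-edge} + D_{0-edge}.
    Each vertex is visited once, so the cost is linear in |DDD|."""
    if memo is None:
        memo = {}
    if node in (0, 1):
        return float(node)
    if node not in memo:
        sym, sign, one, zero = node
        memo[node] = values[sym] * sign * evaluate(one, values, memo) \
                     + evaluate(zero, values, memo)
    return memo[node]

# DDD of Fig. 14.3 (signs s(F) = s(C) = -1, all others +1).
E_v = ('E', +1, 1, 0)
F_v = ('F', -1, E_v, 0)
G_v = ('G', +1, 1, 0)
D_v = ('D', +1, G_v, F_v)          # represents D*G - F*E
B_v = ('B', +1, G_v, 0)
C_v = ('C', -1, B_v, 0)
root = ('A', +1, D_v, C_v)         # represents A*(D*G - F*E) - C*B*G

vals = {'A': 2.0, 'B': 3.0, 'C': 5.0, 'D': 7.0, 'E': 11.0, 'F': 13.0, 'G': 17.0}
direct = vals['A'] * (vals['D'] * vals['G'] - vals['F'] * vals['E']) \
         - vals['C'] * vals['B'] * vals['G']
print(evaluate(root, vals), direct)   # both -303.0
```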
In order to compute the symbolic coefficients of the transfer function in different powers of $s$, the original DDD can be expanded into the s-expanded DDD [161]. Each coefficient of the transfer function is then represented by a coefficient DDD. The s-expanded DDD can be constructed from the complex DDD in time linear in the size of the original complex DDD [161].
The variational transfer function then takes the form

$$H(s) = \frac{N(s)}{D(s)} = \frac{\sum_{i=0}^{m} \hat{a}_i s^i}{\sum_{j=0}^{n} \hat{b}_j s^j}, \qquad (14.5)$$

where the coefficients $\hat{a}_i$ and $\hat{b}_j$ are all affine intervals. They can be computed by means of affine arithmetic [25]. Basically, the DDD Evaluate operation traverses the DDD in a depth-first manner and performs one multiplication and one addition at each vertex, as shown in (14.4); these two operations are now replaced by the addition and multiplication of affine arithmetic.
As mentioned before, interval and affine arithmetic operations are very sensitive to symbolic term cancellations, which are abundant in the DDD and the s-expanded DDD. It was shown that about 70–90% of the terms in the determinant of an MNA-formulated circuit matrix are canceling terms [175]. Notice that symbolic cancellation always happens, even in the presence of parameter variations.
In DDD evaluation, we have both addition and multiplication, as shown in (14.4). Cancellation can lead to large errors if not removed. For example, consider the two terms $\hat{x}\hat{y}\hat{z}$ and $\hat{z}\hat{y}(-\hat{x})$, and suppose $\hat{x} = 1 + \varepsilon_1$, $\hat{y} = 1 + \varepsilon_2$, $\hat{z} = 1 + \varepsilon_3$; then

$$\hat{x}\hat{y}\hat{z} + \hat{z}\hat{y}(-\hat{x}) = \varepsilon_4 + 3\varepsilon_5 - \varepsilon_6 - 3\varepsilon_7, \qquad (14.6)$$

although the exact sum is 0.
The affine arithmetic operations used in DDD evaluation are addition and multiplication. Affine addition is exact, as it does not introduce any new noise symbol. However, every affine multiplication (14.3) adds a new noise symbol $\varepsilon_{n+1}$, and this process degrades the accuracy of the affine bound compared with the real bound. In our implementation, we store the coefficients of the first-order as well as the second-order noise symbols, and add new noise symbols only for higher orders. The affine multiplication in (14.3) is changed to
$$\hat{x}\hat{y} = x_0 y_0 + \sum_{i=1}^{n}\left(x_0 y_i + x_i y_0\right)\varepsilon_i + \sum_{i=1}^{n} x_i y_i \varepsilon_i^2 + \sum_{i=1}^{n}\sum_{j=i+1}^{n}\left(x_i y_j + x_j y_i\right)\varepsilon_i \varepsilon_j, \qquad (14.7)$$
which is more accurate than the bound $[x_0 y_0 - \mathrm{rad}^2,\ x_0 y_0 + \mathrm{rad}^2]$ (with $\mathrm{rad}^2 = \mathrm{rad}(\hat{x})\,\mathrm{rad}(\hat{y})$) obtained by the original affine multiplication in (14.3). For other combinations of the values of $x^-, x^+, x_i, y^-, y^+, y_i$, the accuracy of affine multiplication can likewise be increased by considering the second-order noise symbols.
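A small sketch shows the benefit of keeping the second-order symbols. For $\hat{x} = \varepsilon_1$, the plain multiplication (14.3) turns $\hat{x}\hat{x}$ into a fresh symbol with range $[-1, 1]$, while the exact range of $\varepsilon_1^2$ is $[0, 1]$. The helper below bounds a form with first-order and diagonal second-order terms (the cross terms $\varepsilon_i\varepsilon_j$ kept by the implementation above are omitted here for brevity):

```python
# Range of x0 + sum_i a_i*eps_i + sum_i b_i*eps_i^2 over eps_i in [-1, 1].
# Per symbol, a_i*t + b_i*t^2 attains its extrema at t = -1, t = +1,
# or at the stationary point t = -a_i / (2*b_i) if it lies inside [-1, 1].

def quad_affine_range(x0, lin, quad):
    lo = hi = x0
    for a, b in zip(lin, quad):
        cands = [a * t + b * t * t for t in (-1.0, 1.0)]
        if b != 0.0 and -1.0 <= -a / (2 * b) <= 1.0:
            t = -a / (2 * b)
            cands.append(a * t + b * t * t)
        lo += min(cands)
        hi += max(cands)
    return lo, hi

# x*x with x = eps1: exact range [0, 1] instead of the fresh-symbol [-1, 1]
print(quad_affine_range(0.0, [0.0], [1.0]))   # (0.0, 1.0)
```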
Given a transfer function with variational coefficients, one can perform an MC-based approach to compute the variational responses in the frequency domain. A more efficient alternative is to use Kharitonov's functions, which are few in number but give proved bounds on the frequency-domain responses.
Kharitonov's seminal work, proposed in 1978 [79], was originally concerned with the stability of a polynomial (with real coefficients) under coefficient uncertainties (due to perturbations). He showed that one needs to verify only four special polynomials to ensure that all the variational polynomials are stable. Specifically, given a family of polynomials with real and variational coefficients,

$$P(s) = [a_0^-, a_0^+] + [a_1^-, a_1^+]\,s + \cdots + [a_n^-, a_n^+]\,s^n,$$

the four Kharitonov polynomials are

$$\begin{aligned}
Q_1(s) &= a_0^- + a_1^- s + a_2^+ s^2 + a_3^+ s^3 + a_4^- s^4 + \cdots,\\
Q_2(s) &= a_0^+ + a_1^+ s + a_2^- s^2 + a_3^- s^3 + a_4^+ s^4 + \cdots,\\
Q_3(s) &= a_0^+ + a_1^- s + a_2^- s^2 + a_3^+ s^3 + a_4^+ s^4 + \cdots,\\
Q_4(s) &= a_0^- + a_1^+ s + a_2^+ s^2 + a_3^- s^3 + a_4^- s^4 + \cdots.
\end{aligned}$$
Fig. 14.4 (a) Kharitonov's rectangle in state 8. (b) Kharitonov's rectangle for all nine states. Reprinted with permission from [61]. © 2011 IEEE
Specifically, in the complex frequency domain, the magnitude and phase response of Kharitonov's rectangle in the complex plane can be divided into nine states, as shown in Fig. 14.4b [90]. The corresponding maximum and minimum magnitude and phase for the nine states are shown in Table 14.1; for example,

$$P_{\max}(\omega) = \max\big(|Q_1(\omega)|, |Q_2(\omega)|, |Q_3(\omega)|, |Q_4(\omega)|\big), \qquad (14.19)$$

$$P_{\min}(\omega) = \min\big(|Q_1(\omega)|, |Q_2(\omega)|, |Q_3(\omega)|, |Q_4(\omega)|\big), \qquad (14.20)$$

$$\max \arg P(\omega) = \max\big(\arg Q_1(\omega), \arg Q_2(\omega), \arg Q_3(\omega), \arg Q_4(\omega)\big). \qquad (14.21)$$
In Table 14.1, $|P(j\omega)|$ and $\arg[P(j\omega)]$ denote the magnitude and phase of the polynomial $P(j\omega)$. Once the variational transfer function is obtained from (14.5), the coefficients can be converted from affine intervals to classical intervals as $\hat{a}_i = [a_i^-, a_i^+]$ and $\hat{b}_j = [b_j^-, b_j^+]$. Afterward, one can easily compute the upper and lower bounds of the transfer function.
Since the maximum and minimum magnitude and phase of the numerator $N(s)$ and the denominator $D(s)$ have only a few possible cases, shown in Table 14.1, it is straightforward to obtain the magnitude and phase bounds of $H(s)$, in contrast to large sampling-based MC simulations [90].
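The corner evaluations behind bounds such as (14.19) can be sketched as follows. For independent coefficient intervals, the value set $P(j\omega)$ is an axis-aligned rectangle whose corners are the values of the four Kharitonov polynomials, so the maximum magnitude is attained at a corner. The coefficient intervals and frequency below are hypothetical:

```python
import random

def rect_corners(lo, hi, w):
    """Corners of the value set of P(jw) for P(s) = sum_k [lo_k, hi_k] s^k.
    Even-order coefficients contribute to Re with alternating signs,
    odd-order ones to Im; the corners are Kharitonov's Q1..Q4 values."""
    re_lo = re_hi = im_lo = im_hi = 0.0
    for k, (l, h) in enumerate(zip(lo, hi)):
        c = (-1) ** (k // 2) * w ** k        # signed magnitude of (j*w)^k
        a, b = min(c * l, c * h), max(c * l, c * h)
        if k % 2 == 0:
            re_lo += a; re_hi += b
        else:
            im_lo += a; im_hi += b
    return [complex(re, im) for re in (re_lo, re_hi) for im in (im_lo, im_hi)]

def mag_max(lo, hi, w):
    # Eq. (14.19): the maximum magnitude over the rectangle is at a corner.
    return max(abs(c) for c in rect_corners(lo, hi, w))

lo = [0.8, 1.5, 0.9]; hi = [1.2, 2.5, 1.1]    # hypothetical intervals
w = 1.3
bound = mag_max(lo, hi, w)
random.seed(0)
for _ in range(1000):          # MC check: no sampled response exceeds the bound
    a = [random.uniform(l, h) for l, h in zip(lo, hi)]
    p = sum(ak * (1j * w) ** k for k, ak in enumerate(a))
    assert abs(p) <= bound + 1e-12
print(bound > 0)   # True
```

Note that the minimum magnitude is not always a corner value: when the rectangle crosses a coordinate axis, the minimum is the distance to that axis, which is precisely what the nine-state classification of Table 14.1 accounts for.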
It was shown that if the variational coefficients are uncorrelated and each coefficient of the numerator and denominator lies in a finite real interval, the magnitude and phase bounds are exact (real bounds) [90], i.e., each bound is attained by some function in the variational family. In our problem, however, each circuit parameter may contribute to several coefficients during the evaluation of the coefficient DDDs, and thus the variational coefficients are not independent.
However, a DDD can generate the dominant terms of each coefficient in different powers of $s$ by a shortest-path algorithm [176]. The parameters shared among the dominant terms can be removed from different coefficients to tighten the affine interval bounds and reduce the correlation between coefficients.
In the experimental section, we show that the bounds given by Kharitonov's theorem are conservative and indeed cover all the responses from the MC simulations.
4 Numerical Examples
The presented method has been implemented in C++, with the affine arithmetic part based on [43]. All the experiments were carried out on a Linux system with quad Intel Xeon 3 GHz CPUs and 16 GB of memory. The presented performance bound method was tested on two sample circuits: a CMOS low-pass filter (shown in Fig. 14.5) and a CMOS cascode op-amp circuit [154], where the small-signal model is used to model the MOSFET transistors. The initial results of this chapter were published in [61].
The complexity of the complex DDD and the s-expanded DDD after symbolic decancellation is shown in columns 1–7 of Table 14.3, in which NumP and DenP are the total numbers of product terms in the numerator and denominator of the transfer function, and $|DDD|$ is the size (number of vertices)
Fig. 14.5 (a) A low-pass filter. (b) A linear model of the op-amp in the low-pass filter. Reprinted with permission from [61]. © 2011 IEEE
Table 14.3 Summary of DDD information and performance of the presented method

            Complex DDD            s-Expanded DDD
Circuit     NumP   DenP   |DDD|    NumP    DenP     |DDD|
Low-pass    5      8      31       7       70       32
Cascode     76     216    153      4,143   13,239   561

            Number  Global          Local           Bound range        Speedup
Circuit     of ε    variation (%)   variation (%)   Mag (%)  Pha (%)   over MC
Low-pass    7       5               10              95.1     93.8      115
                    10              10              92.5     91.9      101
Cascode     30      5               10              83.9     84.3      77
                    10              10              81.1     80.2      68
of the DDD representing both the numerator and the denominator of the transfer
function. From the table, we can see that s-expanded DDDs are able to represent a
huge number of product terms with a relatively small number of vertices by means
of sharing among different coefficient DDDs.
First, we show that term decancellation is critical for improving the accuracy of the interval bounds in DDD evaluation with affine intervals. Table 14.2 shows the reduction of the coefficient affine radii due to term decancellation for the two example circuits during DDD evaluation under different sets of variations. Var, Num, and Den stand for process variation, numerator, and denominator, respectively. As can be seen from the table, the average radius reduction is 35.4% for numerators and 49.8% for denominators, and the reduction effect grows as the process variation increases. As a result, symbolic decancellation can indeed significantly reduce the pessimism of affine arithmetic.
Fig. 14.6 Bode diagram (magnitude and phase) of the CMOS low-pass filter. Reprinted with permission from [61]. © 2011 IEEE
Second, we present the performance of the presented method. For the low-pass filter example, we introduce three noise symbols $\varepsilon$ as the local variation sources for the VCCS, resistor, and capacitor inside the linear op-amp model shown in Fig. 14.5b, and another four noise symbols for the other devices of the filter as global variations. For the cascode op-amp example, we introduce three noise symbols for the VCCS, resistor, and capacitor inside the small-signal model of each MOSFET transistor as local variation sources, and another six noise symbols for the other devices in the op-amp as global variations. The total number of noise symbols for each test circuit is shown in the 8th column of Table 14.3. As a DDD expression is exactly symbolic and involves no approximation, it is provably accurate compared with SPICE (which uses the simple linearized device models). In the experiments, we compare the obtained results with Monte Carlo simulations using the DDD. We test the presented algorithm on the different global/local variation pairs shown in column 9. We introduce the bound range, defined as the average ratio of the bound obtained from the MC simulation to the bound given by the presented method.
Figures 14.6 and 14.7 show the comparisons between the presented method and the MC method under 10% global, 10% local variation and under 5% global, 10% local variation, in which Affine DDD denotes the presented method and Nominal denotes the response of the circuit without parameter variation. During all the simulations,
Fig. 14.7 Bode diagram of the CMOS cascode op-amp, comparing the Nominal response, the Affine DDD bounds, and the Monte Carlo results. Reprinted with permission from [61]. © 2011 IEEE
5 Summary
1 Introduction
nominal case, the nominal SDAE at dc can be linearized with the stochastic current source. The dc solution obtained from SiSMA is used as the initial condition (ic) for transient analysis. This assumption may not be accurate enough to describe the mismatch during transient simulation, as the stochastic current source is included only at dc. Another limitation is that SiSMA calculates the mismatch by extracting and analyzing a covariance matrix to avoid an expensive MC simulation. With thousands of devices, analyzing the covariance matrix becomes slow. Moreover, the computation is expensive for large-scale problems, since the entire circuit is analyzed twice. As a result, there is still a need for a faster transient mismatch analysis technique, which requires two improvements: a different NMC method and an efficient macromodel obtained by nonlinear model order reduction (MOR).
This chapter presents a fast NMC mismatch analysis, named the isTPWL method [202], which uses an incremental and stochastic TPWL macromodel. First, we introduce the transient mismatch model and its macromodeling, and then the way to linearize the SDAE along a series of snapshots on a nominal transient trajectory. After that, a stochastic current source (for mismatch) is added at each snapshot as a perturbation, which is more accurate than accounting for the mismatch through an ic condition [6]. We further show how to apply an improved TPWL model order reduction [58, 144, 181] to generate a stochastic nonlinear macromodel along the snapshots of the nominal transient trajectory, and then apply it for a fast transient mismatch analysis along the full transient trajectory. The presented approach applies incremental aggregation on local tangent subspaces linearized at snapshots. In this way, the applied technique can reduce the computational complexity of [58] and even improve the accuracy of [144].

The numerical examples show that the isTPWL method is 5× more accurate than the work in [144] and 20× faster than the work in [58] on average. Besides, the nonlinear macromodels reduce the runtime by up to 25× compared to the use of the full model during the mismatch analysis.
Next, in order to solve the SDAE efficiently and avoid MC iterations or the expensive covariance-matrix analysis [6], the stochastic variation is described by the spectral stochastic method based on orthogonal polynomial chaos (OPC), forming a corresponding SDAE [196]. The chapter presents a new method to apply OPC to nonlinear analog circuits during an NMC mismatch analysis. Numerical results show that, compared to the MC method, the presented method is 1,000 times faster with similar accuracy.
The rest of the chapter is organized as follows. Section 2 presents the background of the mismatch model and nonlinear model order reduction. Section 3 discusses transient mismatch analysis in the SDAE, including a perturbation analysis and an NMC analysis by OPC expansions. We develop an incremental and stochastic TPWL model order reduction for mismatch in Sect. 4. Numerical examples are given in Sect. 5, and Sect. 6 concludes the chapter.
2 Preliminary
A precise mismatch model and analysis are key to a robust analog circuit design. Similar to the two components of process variation, inter-die and intra-die, there are global and local components of mismatch. The global mismatch affects the whole chip in the same way, while the local mismatch is more complex and the most difficult to analyze; hence, it is the focus of this chapter.
The local mismatch depends on the variation of the process parameters. Pelgrom's model [133] is one of the most popular CMOS mismatch models; it relates the local mismatch variance of an electrical parameter (such as the channel current $I_d$) to geometrical parameters (such as the area $A$) by a geometrical dependence equation:

$$\sigma_{I_d} = \frac{\beta}{\sqrt{A}}, \qquad (15.1)$$
Based on the mismatch model in (15.2), an NMC transient mismatch analysis for a large number of transistors can be developed, as shown in Sect. 3.
Here we first discuss the nominal model for the nonlinear circuit and then expand it into the stochastic model. The nominal nonlinear circuit is described by the following differential algebraic equation (DAE):

$$f(x, \dot{x}, t) = B u(t), \qquad (15.3)$$
238 15 Stochastic Analog Mismatch Analysis
where $x$ ($\dot{x} = dx/dt$) are the state variables, which include nodal voltages and branch currents; $f(x, \dot{x}, t)$ describes the nonlinear $i$–$v$ relation; and $u(t)$ are the external sources, with a topology matrix $B$ describing how they are added into the circuit. The time cost of solving the MNA equations in (15.3) includes three
parts: device evaluation, matrix factorization, and time-step control and integration.
Among these three items, the runtime mainly comes from the matrix factorization when the circuit size is large or when devices are latent most of the time. Under this condition, model order reduction can be used to reduce the size of the circuit and thus the overall runtime. Therefore, model order reduction can also be applied in transient mismatch analysis as a powerful speedup tool.
The basic idea of model order reduction is to find a small-dimensioned subspace that can represent the original state space while preserving the system response, which is usually realized as a coordinate transformation. For linear circuits, the coordinate transformation can be described by a linear mapping:

$$z = V^T x, \qquad x = V z, \qquad (15.4)$$
we have

$$\dot{z} = \hat{f}(z, t) + \hat{B} u(t), \qquad \hat{f}(z, t) = \left. \frac{d\Phi}{dx}\, f(x, t) \right|_{x = \Phi^{-1}(z)}, \qquad \hat{B} = \frac{d\Phi}{dx} B, \qquad (15.8)$$

where $z = \Phi(x)$ denotes the (generally nonlinear) coordinate mapping.
The authors of [58] presented a work relating the above nonlinear mapping function to the TPWL method [144], which leads to a local two-dimensional (2D) projection [58]. The bright side is that such a local 2D-projection is constructed from local tangent subspaces, which maintains high accuracy. However, the time complexity becomes an issue: the local 2D-projection can be computationally expensive to project and store when the number of local tangent subspaces is large. On the other hand, the TPWL method [144] approximates the nonlinear mapping function by aggregating those local tangent subspaces using a global SVD, which results in a one-dimensional (1D) projection. Obviously, the global 1D-projection leads to a more efficient projection and less runtime. However, the accuracy of the TPWL model order reduction is limited, because the information in the dominant bases of each local tangent subspace is lost during the global SVD [58]. In Sect. 4, an incremental aggregation that balances speed and accuracy is introduced. In addition, the nonlinear model order reduction can be extended to consider the stochastic mismatch, as shown in Sect. 4.
It is difficult to add the stochastic mismatch into the state variable $x$ of (15.3) directly, since $f(x, \dot{x}, t)$ may not be differentiable. Therefore, we model the mismatch as a current source $i(x, \xi)$ added to the right-hand side of (15.3), similar to SiSMA [6]:

$$f(x, \dot{x}, t) = F\, i(x, \xi) + B u(t). \qquad (15.9)$$

Here, $F$ is the topology matrix describing how $i$ is connected into the circuit. Based on the BPV equation in (15.2), the stochastic current source $i$ has the following form:

$$i(x, \xi) = n(x) \sum_{l} g^{\beta}(p_l)\, \xi_l, \qquad (15.10)$$
where $\xi_l$ is a random variable with stochastic distribution $W(\xi_l)$ for the parameter $p_l$; $n(x)$ describes the biasing-dependent condition (depending on $x$, $\dot{x}$), provided by a nominal transient simulation; and $g^{\beta}(p_l)$ is a constant for the parameter $p_l$ in operating region $\beta$. Taking one CMOS transistor with respect to the parameter area $A$, for instance, $\xi_A$ is a Gaussian random variable, $g^{\beta}(A)$ is $\beta/\sqrt{A}$, and $n(x)$ becomes $I_d$. Generally speaking, $g^{\beta}(p_l)$ can be either derived from the analytical device equations or practically characterized from measurements [105].
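A quick numerical check of (15.10) for a single transistor (every device number below is invented for illustration): sampling the Gaussian $\xi_A$ should reproduce a mismatch-current standard deviation of $I_d \beta/\sqrt{A}$.

```python
import math, random

beta, A, Id = 0.05, 2.0, 1e-3       # hypothetical Pelgrom constant, area, bias
g = beta / math.sqrt(A)             # g_beta(A) = beta / sqrt(A)

random.seed(7)
# i = n(x) * g_beta(A) * xi_A with n(x) = Id and xi_A ~ N(0, 1), Eq. (15.10)
samples = [Id * g * random.gauss(0.0, 1.0) for _ in range(200000)]
mean = sum(samples) / len(samples)
std = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5

print(abs(std / (Id * g) - 1.0) < 0.02)   # True: sample std matches Id*beta/sqrt(A)
```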
In this chapter, we assume that the impact of the local mismatch is small, so (15.9) can be solved by treating the right-hand-side mismatch term as a perturbation to the nominal trajectory $x^{(0)}(t)$ of the circuit, where $x^{(0)}(t)$ is the nominal state variable, i.e., the solution of the nominal nonlinear circuit equation

$$f\!\left(x^{(0)}, \dot{x}^{(0)}, t\right) = B u(t). \qquad (15.11)$$
or

$$G\!\left(x^{(0)}, \dot{x}^{(0)}\right) x_m + C\!\left(x^{(0)}, \dot{x}^{(0)}\right) \dot{x}_m = F\, i\!\left(x^{(0)}, \xi\right), \qquad (15.13)$$

where

$$G\!\left(x^{(0)}, \dot{x}^{(0)}\right) = \left. \frac{\partial f(x, \dot{x}, t)}{\partial x} \right|_{x = x^{(0)},\, \dot{x} = \dot{x}^{(0)}}, \qquad
C\!\left(x^{(0)}, \dot{x}^{(0)}\right) = \left. \frac{\partial f(x, \dot{x}, t)}{\partial \dot{x}} \right|_{x = x^{(0)},\, \dot{x} = \dot{x}^{(0)}} \qquad (15.14)$$
are the linearized conductive and capacitive components stamped by the companion models in SPICE, and $x_m = x - x^{(0)}$ is the first-order perturbed mismatch response. Recall that $x^{(0)}(t)$ and $\dot{x}^{(0)}(t)$ are a number of time-dependent biasing points along the transient trajectory.
Performing Monte Carlo or correlation-based mismatch analysis can be very expensive. In this part, we therefore solve the perturbed SDAE (15.13) by expanding the random variable in OPC, using the spectral stochastic method of Sect. 3.2 of Chap. 2. Different process variations correspond to different orthogonal polynomials. In this chapter, we assume that the random process parameters for the local mismatch have a Gaussian distribution. Therefore, the corresponding Hermite polynomials (for one random variable) are used to construct the basis of the HPC expansion to calculate the mean and the variance of $x_m(t)$.
The first step is to expand the stochastic state variable $x_m(t)$ as

$$x_m(t) = \sum_{i} \alpha_i(t)\, \Phi_i(\xi), \qquad (15.16)$$

where $W(\xi)$ is the PDF of the random variable $\xi$ (used as the weight in the inner product defining the coefficients $\alpha_i$). We assume that all parameters involved here follow a Gaussian distribution.
Without loss of generality, for one random variable $\xi$ modeling one geometrical parameter $p$, it is easy to verify that (15.17) leads to

$$\alpha_0 = 0, \qquad \alpha_2 = 0,$$

$$G\!\left(x^{(0)}, \dot{x}^{(0)}\right) \alpha_1(t) + C\!\left(x^{(0)}, \dot{x}^{(0)}\right) \dot{\alpha}_1(t) = F\, n\!\left(x^{(0)}\right) g^{\beta}(p), \qquad (15.18)$$

where the Jacobians and the current source of mismatch are evaluated at the $k$th time instant along the nominal trajectory $x^{(0)}$.
In this part, we use one CMOS transistor as an example, modeled with the geometric parameter $A$ and the corresponding Gaussian random variable $\xi_A$; (15.18) becomes

$$\left(G_k + \frac{1}{h} C_k\right) \alpha_1(t_k) = \frac{1}{h} C_k\, \alpha_1(t_k - h) + \frac{\beta}{\sqrt{A}}\, (I_d)_k \qquad (15.22)$$

at the $k$th time step. Recall that $G_k$, $C_k$, and $(I_d)_k$ represent the nominal values of the conductance ($g_{ds}$), capacitance ($c_{ds}$), and channel current $I_d$ evaluated at $t_k$; $g^{\beta}(A)$ is $\beta/\sqrt{A}$, and $n(x)$ becomes $I_d$. Note that $\beta$ is the constant extracted from Pelgrom's model.
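The recursion in (15.22) is an ordinary backward-Euler time stepping for $\alpha_1(t)$; a scalar sketch (all element values and the bias waveform below are invented, not from the book):

```python
import math

g, c, h = 1e-3, 1e-9, 1e-6        # conductance g_ds, capacitance c_ds, step h
beta, A = 0.02, 4.0               # hypothetical Pelgrom constant and area
alpha1 = 0.0                      # no mismatch at the initial condition
var_trace = []
for k in range(200):
    # nominal channel current (I_d)_k along a made-up transient trajectory
    Id_k = 1e-3 * (1 + 0.1 * math.sin(2 * math.pi * k / 50))
    # Eq. (15.22): (G_k + C_k/h) * alpha1(t_k) = (C_k/h) * alpha1(t_k - h)
    #              + (beta / sqrt(A)) * (I_d)_k
    alpha1 = ((c / h) * alpha1 + (beta / math.sqrt(A)) * Id_k) / (g + c / h)
    var_trace.append(alpha1 ** 2)  # time-varying mismatch variance alpha1^2

print(0 < max(var_trace) < 1)      # True: the variance stays bounded here
```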
In this way, the transient mismatch voltage $x_m = \alpha_1(t) \Phi_1(\xi_A)$ of this transistor has a time-varying variance $\alpha_1(t)^2$, which can be solved from the above perturbation equation. In most cases, $\beta/\sqrt{A}$ is a few percent of the nominal channel current $I_d$. More importantly, we can simultaneously solve the transient mismatch vector using (15.18), with a generally characterized $g^{\beta}(p_l)$ from the BPV model [105], for thousands of transistors of different types.
For speedup purposes, we can take $K$ snapshots along a nominal transient trajectory instead of performing full simulations for the nominal transient and the transient mismatch. The subspaces, or macromodels, can then be found from the $K$ snapshots with respect to the right-hand-side nominal input and stochastic current source, respectively. Afterward, efficient transient analysis and transient mismatch estimation can be performed along the full transient trajectory using those macromodels. In the following, we first introduce an incremental TPWL method for the nominal transient that balances accuracy and efficiency when generating the macromodel, and then extend it to the incremental stochastic TPWL (isTPWL) to handle the stochastic mismatch.
4 Macromodeling for Mismatch Analysis
As discussed in Sect. 2, the first step of TPWL takes a number of snapshots along a typical transient trajectory and performs a local reduction at each linearized snapshot, or biasing point. The second step creates a global subspace from the sequence of linearized local subspaces obtained at those snapshots. We then apply a singular value decomposition (SVD) [51] to analyze the global subspace and construct a global projection matrix with weights. The linearized stochastic DAE (15.18) can be naturally reduced in the framework of the TPWL method, since the stochastic mismatch analysis in isTPWL is performed along the nominal trajectory $x^{(0)}$.

Suppose that there are $K$ snapshots $\{x_1^{(0)}, \ldots, x_K^{(0)}\}$ taken along the nominal trajectory $x^{(0)}$. The linearized SDAE at the $k$th snapshot should be
can be constructed locally. Here we use the subscript to denote the index of the snapshot and the superscript to denote the reduction order.

When the input vector is given (usually a set of typical inputs is used), we take $K$ snapshots $\{x_1^{(0)}, \ldots, x_K^{(0)}\}$ along a nominal transient trajectory $x^{(0)}(t)$ and linearize the DAE (15.3) at the $K$ snapshots (or biasing points), with the first snapshot $x_1$ taken at the ic point. The linearized DAE at the $k$th ($k = 1, \ldots, K$) snapshot is

$$G_k\!\left(x - x_k^{(0)}\right) + C_k\!\left(\dot{x} - \dot{x}_k^{(0)}\right) = \delta_k, \qquad \delta_k = B u(t_k) - f\!\left(x_k^{(0)}, \dot{x}_k^{(0)}, t_k\right), \qquad (15.26)$$

where $\delta_k$ represents the right-hand-side source and the "nonequilibrium" update. $x_k^{(0)}$ at the $k$th snapshot is contained in a subspace of moments $\{A_k, A_k R_k, A_k^2 R_k, \ldots\}$ expanded
$$x = \Phi^{-1}(z) = \sum_{k=1}^{K} w_k \left[x_k^{(0)} + V_k (z - z_k)\right] \qquad (15.29)$$

and

$$z = \Phi(x) = \sum_{k=1}^{K} w_k \left[z_k + V_k^T \left(x - x_k^{(0)}\right)\right], \qquad (15.30)$$

where $w_k$ ($\sum_{k=1}^{K} w_k = 1$) is the weighted kernel function, which depends on the distance between a point on the trajectory and a linearization point [144].
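The weighted kernel and the mapping (15.29) can be sketched as follows; the exponential distance kernel and its sharpness constant are assumptions in the spirit of [144], not necessarily the book's exact choice.

```python
import numpy as np

def weights(x, snapshots, beta=25.0):
    """Normalized kernel weights w_k: large near the closest snapshot,
    decaying with distance, and summing to one (sum_k w_k = 1)."""
    d = np.array([np.linalg.norm(x - xk) for xk in snapshots])
    w = np.exp(-beta * d / (d.min() + 1e-12))
    return w / w.sum()

rng = np.random.default_rng(1)
snapshots = [rng.standard_normal(4) for _ in range(5)]   # x_k^(0) points
x = snapshots[2] + 0.01 * rng.standard_normal(4)         # state near snapshot 2

w = weights(x, snapshots)
print(abs(w.sum() - 1.0) < 1e-9, int(np.argmax(w)))      # True 2
```

In (15.29), the reconstruction then interpolates the local linearizations with these weights, so the model transitions smoothly from one linearization point to the next.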
A nonlinear model order reduction is derived in terms of a local two-dimensional (2D) projection based on (15.8), (15.29), and (15.30) as follows:

$$\sum_{l=1}^{K} \sum_{k=1}^{K} w_l w_k \left[V_l^T G_k V_k \left(z - z_k^{(0)}\right) + V_l^T C_k V_k \left(\dot{z} - \dot{z}_k^{(0)}\right)\right] = \sum_{l=1}^{K} w_l V_l^T \delta_l, \qquad (15.31)$$
where we assume that all $V_k$'s are reduced to the same order $q'$. For circuits with a sharp transition (input) or strong nonlinearity (device), the numerical examples show that the number of sampled snapshots (or neighbors) has to be quite large to maintain good accuracy. As such, the computational runtime cost of the local 2D-projection (15.31) in [58] becomes prohibitive.
On the other hand, the TPWL method in [144] approximates the nonlinear mapping function by aggregating the local subspaces $V_k$ ($\in \mathbb{R}^{N \times q'}$) into a unified global subspace $\mathrm{span}\{V_1, V_2, \ldots, V_K\}$, which can be further compressed into a lower-dimensioned subspace $V$ ($\in \mathbb{R}^{N \times q}$, $q \ll N$) by an SVD, as follows:

$$\sum_{k=1}^{K} w_k \left[V^T G_k V \left(z - z_k^{(0)}\right) + V^T C_k V \left(\dot{z} - \dot{z}_k^{(0)}\right)\right] = \sum_{k=1}^{K} w_k V^T \delta_k. \qquad (15.33)$$
It is easy to see that such a global 1D-projection has a smaller projection time and storage than the local 2D-projection. However, the global 1D-projection usually requires a higher order $q$ to achieve an accuracy similar to that of the local 2D-projection with order $q'$ ($q' < q$) [58], since the dominant bases of the local $V_k$'s are interpolated away by the global aggregation.
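The global 1D aggregation amounts to one SVD over the stacked local bases; a sketch with illustrative sizes (the truncation order $q$ is a free choice):

```python
import numpy as np

rng = np.random.default_rng(0)
N, qp, K, q = 50, 3, 6, 8          # full order, local order q', snapshots, q
# Local tangent bases V_k (orthonormalized random stand-ins here).
Vk = [np.linalg.qr(rng.standard_normal((N, qp)))[0] for _ in range(K)]

# Stack span{V_1, ..., V_K} and keep the q dominant left singular vectors
# as the single global projection matrix V used in (15.33).
U, s, _ = np.linalg.svd(np.hstack(Vk), full_matrices=False)
V = U[:, :q]

print(V.shape)                      # (50, 8)
```

The truncation to $q$ columns is exactly where the dominant directions of individual $V_k$'s can be averaged away, which is the accuracy limitation the incremental aggregation addresses.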
The local 2D-projection (15.31) requires longer runtime and larger storage than the global 1D-projection (15.33), but it is more accurate than the global 1D-projection, which uses the single basis $V$. Therefore, we need a procedure that balances accuracy and efficiency.
The manifold $d\Phi/dx$ can be covered by the local tangent subspaces $\{V_1, V_2, \ldots, V_K\}$ along the trajectory, where each $V_k$ can be further composed of dominant bases of different orders, $\{v_k^1, v_k^2, \ldots, v_k^{q'}\}$. As such, an effective aggregation needs to consider the order, or dominance, of those bases. This motivates us to use those local tangent subspaces to decompose the spanned space first according to the order. In this way, (15.29) becomes
$$\begin{aligned}
x = \Phi^{-1}(z) &= \sum_{k=1}^{K} w_k x_k^{(0)} + \sum_{k=1}^{K} \sum_{p=1}^{q'} w_k v_k^{p} (z - z_k) \\
&= \sum_{k=1}^{K} w_k x_k^{(0)} + \sum_{p=1}^{q'} \sum_{k=1}^{K} v_k^{p} w_k (z - z_k) \\
&= \sum_{k=1}^{K} w_k x_k^{(0)} + \left[ v_1^{1} w_1 (z - z_1) + \cdots + v_K^{1} w_K (z - z_K) \right] \\
&\qquad + \cdots + \left[ v_1^{q'} w_1 (z - z_1) + \cdots + v_K^{q'} w_K (z - z_K) \right]. \qquad (15.34)
\end{aligned}$$
After that, we can form a global tangent subspace in the order of the dominant bases:

$$\mathrm{span}\{v_1^{1}, v_2^{1}, \ldots, v_K^{1}\}, \;\ldots,\; \mathrm{span}\{v_1^{q'}, v_2^{q'}, \ldots, v_K^{q'}\}. \qquad (15.35)$$
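The order-wise aggregation of (15.35) can be sketched as follows: every first-order (most dominant) local vector enters the global basis before any second-order vector, so truncation to $q$ columns discards the least dominant directions first. The Gram–Schmidt deflation used below to realize the spans is our assumption; the chapter prescribes only the ordering.

```python
import numpy as np

def incremental_basis(Vk_list, q):
    """Aggregate local bases order by order: span{v_1^1,...,v_K^1}, then
    span{v_1^2,...,v_K^2}, ..., keeping at most q orthonormal vectors."""
    qp = Vk_list[0].shape[1]
    basis = []
    for p in range(qp):                 # dominance order p = 1 ... q'
        for Vk in Vk_list:              # all K snapshots at this order
            v = Vk[:, p].copy()
            for b in basis:             # deflate against accepted vectors
                v -= (b @ v) * b
            n = np.linalg.norm(v)
            if n > 1e-10:
                basis.append(v / n)
            if len(basis) == q:
                return np.column_stack(basis)
    return np.column_stack(basis)

rng = np.random.default_rng(0)
Vk_list = [np.linalg.qr(rng.standard_normal((40, 3)))[0] for _ in range(5)]
V = incremental_basis(Vk_list, q=8)
print(V.shape)                          # (40, 8): 5 first-order vectors,
                                        # then 3 second-order ones
```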
After the incremental aggregation, we further extend the above discussion to build the TPWL macromodel for stochastic mismatch analysis. Instead of linearizing the DAE (15.3) directly, we similarly linearize the SDAE (15.18) at $K$ snapshots along the nominal trajectory and then construct the local tangent subspace $V_k$ in the same way.
5 Numerical Examples
Table 15.1 Scalability comparison of runtime and error for the exact model with MC, the exact model with OPC, and the isTPWL macromodel with OPC

Case  Circuit      # of nodes  # of steps  # of snapshots  # of orders
1     Diode chain  802         225         24              25
2     BJT mixer-1  238         135         25              25
3     BJT mixer-2  1,248       219         83              45
4     CMOS comp.   654         228         75              60

      Exact MC   Exact OPC             OPC+isTPWL
Case  Time (s)   Time (s)   Error (%)  Time (s)   Error (%)
1     520.1      0.53       0.41       0.02       0.43
2     338.0      0.34       0.29       0.02       0.36
3     348.0      0.20       0.18       0.04       0.24
4     412.1      0.39       0.41       0.08       0.62
In this part, we first compare the accuracy of the waveform of transient mismatch
between the MC method (1,000 iterations) and the exact orthogonal PC. After that,
we further compare the accuracy with the isTPWL macromodel. In addition, we
also compare the waveform of the transient mismatch with the waveform obtained by adding
the mismatch as one initial condition, similar to the setting of the SiSMA [6] technique.
Finally, the runtime and waveform error are summarized in Table 15.1.
The first example is a BJT-mixer circuit including an extracted distributed
inductor with 238 state variables. The waveforms are compared by solving the
perturbed SDAE (15.13) using the MC analysis and the OPC expansion,
respectively. We apply MC analysis with a Gaussian distribution 1,000 times at each
time step and calculate the time-varying standard deviation. It takes 338 s for the
transient mismatch by the MC analysis, and only 0.34 s (nearly 1,000 times
speedup) for the exact OPC expansion up to the second order, with error less than
0.29%. The two transient-mismatch waveforms obtained from the two
methods are virtually identical, as shown in Fig. 15.1.
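The MC reference used in this comparison can be sketched as follows; the exponential decay is a stand-in for the actual circuit solver, and all names and numbers are illustrative:

```python
import numpy as np

# Estimating the time-varying standard deviation of a transient response
# by Monte Carlo. decay_response is a hypothetical stand-in for solving
# the circuit equations with one parameter sample.
def decay_response(t, tau):
    return np.exp(-t / tau)

rng = np.random.default_rng(1)
t = np.linspace(0.0, 10e-9, 200)                # 200 time steps over 10 ns
tau_samples = 2e-9 * (1.0 + 0.05 * rng.standard_normal(1000))  # 1,000 draws
waves = np.array([decay_response(t, tau) for tau in tau_samples])
sigma_t = waves.std(axis=0)                     # time-varying std deviation
```

Each of the 1,000 draws requires a full transient solve, which is exactly why the one-shot OPC expansion is three orders of magnitude faster.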
Next, we show further speed improvement by macromodeling. The second
example is a CMOS comparator including an extracted power supply with 654 state
variables. Waveforms of the exact OPC and the one further reduced by isTPWL are
compared in this part. Figure 15.2a shows the comparison of the transient nominal,
while Fig. 15.2b shows the comparison of the transient mismatch. Here 75 snapshots
are used to generate the macromodel: we reduce the original model to a macromodel
with order 60. For a short transient with 228 time steps, it takes 0.39 s
for the exact OPC model and 0.08 s for the isTPWL (a five-times speedup). The error of
the waveforms analyzed by isTPWL is 0.62%.
We further compare the transient mismatch waveforms for different ways to
add the mismatch. The first is to add the stochastic mismatch only at the initial
condition (ic), as in the procedure used in SiSMA [6] (Fig. 15.3). The second is adding
Fig. 15.1 Transient mismatch (the time-varying standard deviation) comparison at the output of a
BJT mixer with distributed inductor: the exact by Monte Carlo and the exact by orthogonal PC
expansion. Reprinted with permission from [52]. © 2011 ACM
Fig. 15.2 Transient nominal $x^{(0)}(t)$ (a) and transient mismatch $\alpha_1(t)$ (b) for one output of a
CMOS comparator by the exact orthogonal PC and the isTPWL. Reprinted with permission from
[52]. © 2011 ACM
the stochastic mismatch during every time step as in the presented approach. In this
part, we use a diode chain with 802 state variables. Figure 15.4 shows one waveform
of the transient nominal, and two waveforms with mismatches added differently,
from which we can see that the waveform with mismatch added only at the initial
condition shows a nonnegligible difference.
Fig. 15.3 Transient waveform comparison at the output of a diode chain: the transient nominal, the
transient with mismatch by SiSMA (adding mismatch at the initial condition only), and the transient
with mismatch by the presented method (adding mismatch along the transient trajectory). Reprinted
with permission from [52]. © 2011 ACM
Fig. 15.4 Transient mismatch ($\alpha_1(t)$, the time-varying standard deviation) comparison at the output
of a BJT mixer with distributed substrate: the exact by OPC expansion, the macromodel by TPWL
(order 45), and the macromodel by isTPWL (order 45). The waveform by isTPWL is visually
identical to the exact OPC. Reprinted with permission from [52]. © 2011 ACM
Finally, Table 15.1 summarizes the runtime and error of four different analog/RF
circuits. In this table, the waveform error is defined as the relative difference
between the exact and the macromodel, and the runtime here is the total simulation
time. We find that the OPC expansion reduces the runtime by about 1,000 times yet maintains good accuracy.
Fig. 15.5 (a) Comparison of the ratio of the waveform error by TPWL and by isTPWL under the
same reduction order. (b) Comparison of the ratio of the reduction runtime by maniMOR and by
isTPWL under the same reduction order. In both cases, isTPWL is used as the baseline. Reprinted
with permission from [52]. © 2011 ACM
By isTPWL, we can improve the accuracy and runtime further, as shown in this
part. First, Fig. 15.4 presents the transient-mismatch waveform comparison for a
BJT mixer including the distributed substrate with 1,248 state variables in total. Here,
83 snapshots are used for both TPWL and isTPWL to reduce the original model
to a macromodel with order 45. We find that the waveform by isTPWL is
visually identical to the exact OPC expansion, but the waveform by TPWL [144]
shows a nonnegligible waveform error, 4.5 times larger than the one by isTPWL.
Figure 15.5 further summarizes the comparison by the four circuits used in the
previous section. Figure 15.5a is the comparison of the ratio (TPWL vs. isTPWL)
of errors in waveforms for simulated macromodels by TPWL [144] and by isTPWL
under the same model reduction order. Figure 15.5b shows the comparison of the
ratio (maniMOR vs. isTPWL) of the reduction time for reduced macromodels by
maniMOR [58], and by isTPWL under the same reduction order. In both of those
cases, isTPWL is used as the baseline when calculating the ratio. The numerical
examples show that the isTPWL method is 5 times more accurate than TPWL [144]
and is 20 times faster than maniMOR [58] on average, which clearly demonstrates
the advantage of using the incremental aggregation.
6 Summary
This chapter has presented a fast non-MC mismatch analysis. It models the
mismatch by a current source associated with a random variable and forms an SDAE.
The random variable in SDAE is expanded by OPC. This leads to an efficient
solution without using the MC or correlation analysis. Moreover, the SDAE has
been solved by an improved TPWL model order reduction, called isTPWL. An
incremental aggregation has been introduced to balance the efficiency and accuracy
when generating the macromodel. Numerical examples show that when compared to
the MC method, the presented method is 1,000 times faster with a similar accuracy.
Moreover, on average, the isTPWL method is 5 times more accurate than the work
in [144] and is 20 times faster than the work in [58]. In addition, the use of a reduced
macromodel reduces the runtime by up to 25 times when compared to the use of a
full model.
Chapter 16
Statistical Yield Analysis and Optimization
1 Introduction
2 Problem Formulations
We formulate the yield optimization problem in this chapter. This is based on the
observation that the parameter vector p can change the performance metric fm ,
such as delay and output swing, and further lead to the circuit failure that affects
the yield rate. In general, the parametric yield Y .p/ is defined as the percentage of
manufactured circuits that can satisfy the performance constraints.
To illustrate this, consider one output voltage that discharges from high to
low. Because the process variation can perturb the parameter vector $p$ away from
its nominal values, this leads to the transient variation (mismatch) waveforms
shown in Fig. 16.1.
Fig. 16.1 Transient output-voltage waveforms under process variations: curves that stay below
$v_{\mathrm{threshold}}$ at $t_{\max}$ are successes, the others failures
Fig. 16.2 Distribution of the output voltages at $t_{\max}$: samplings to the left of the performance
constraint fall in the successful region, those to the right in the failed region
This means that those curves below $v_{\mathrm{threshold}}$ at $t_{\max}$ correspond to successful samplings. In
addition, one can plot the distribution of output voltages at tmax shown in Fig. 16.2. It
is clear that samplings located at the left of the performance constraint are successes,
while those at the right are failures.
As such, the parametric yield can be defined as
\[
Y(p; t) = \int_{S} \mathrm{pdf}\big(f_m(p; t)\big)\, dS, \tag{16.2}
\]
where $S$ is the successful region and $\mathrm{pdf}(f_m(p; t))$ is the PDF of the performance
metric $f_m(p; t)$ of interest. With the parametric yield defined, one can optimize it
by tuning the parameters under stochastic variations. Meanwhile,
one needs to consider other performance merits, such as power and area, during the
optimization process.
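A minimal Monte Carlo reading of (16.2), with a purely hypothetical linear performance model, estimates the yield as the fraction of samples meeting the constraint:

```python
import numpy as np

rng = np.random.default_rng(2)

def f_m(p):
    # illustrative performance model, not from the book
    return 1.0 + 0.5 * p

p_samples = rng.standard_normal(100_000)   # zero-mean Gaussian variations
threshold = 1.5                            # performance constraint f_m <= 1.5
yield_mc = np.mean(f_m(p_samples) <= threshold)
```

Here the success region is the half-line $f_m \le 1.5$, so the sample mean of the indicator is exactly the integral of the performance PDF over $S$.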
Accordingly, the stochastic multiobjective optimization problem in this chapter can
be formulated in detail as follows:
\[
\begin{aligned}
\text{Maximize} \quad & Y(p), \\
\text{Minimize} \quad & p_c(p), \\
\text{Subject to} \quad & Y(p) \ge \bar{Y}, \\
& p_c(p) \le \bar{p}_c, \\
& F(p) \le F_{\max}, \\
& p_{\min} \le p_0 \le p_{\max}. 
\end{aligned}
\tag{16.3}
\]
Here, Y .p/ is the parametric yield associated with the parameter vector p and
pc .p/ is the power consumption. F .p/ denotes other performance metrics (such
as area $A$), which define the feasible design space. Moreover, $\bar{Y}$ and $\bar{p}_c$ are the
minimum yield rate and the maximum power consumption (or targeted values) that can
be accepted, respectively. In other words, the multiobjective optimization procedure
is to maximize $Y(p)$, which should be larger than $\bar{Y}$, and simultaneously minimize
$p_c(p)$, which should be smaller than $\bar{p}_c$. Meanwhile, other constraints defined by
$F(p)$ should be satisfied.
Moreover, $p$ is a vector of the process parameters with variations and can be
expressed as $p = p_0 + \delta p$. Here, $p_0$ is a vector of the nominal values assigned at
the design stage, and $\delta p$ consists of parameter variations with zero-mean Gaussian
distributions. In addition, all nominal values of the process parameters $p_0$ are assumed
to lie within the feasible parameter space $(p_{\min}, p_{\max})$ and can be tuned for a
better yield rate.
One effective solution to this optimization problem is the gradient-based approach, which
requires calculating sensitivities in the stochastic domain. As discussed
later, this chapter develops a stochastic sensitivity analysis, which can be embedded
into a sequential linear programming (SLP) framework to solve this optimization problem
efficiently.
3 Stochastic Variation Analysis for Yield Analysis

In this section, we show how to apply the OPC technique introduced in Sect. 3.2 of
Chap. 2 to analyze and estimate the yield.
In this section, we first review the existing works on mismatch analysis [6, 32, 105,
133]. Here we focus on the stochastic variation, also referred to as local mismatch. We
illustrate the stochastic variation analysis using MOS transistors in the following
section. A similar approach can be extended to other types of transistors by the
so-called propagation of variance (POV) method [32, 105].
The mismatch of one MOS transistor is usually modeled by Pelgrom's model
[133], which relates the local mismatch of one electrical parameter to the
geometrical parameters by
\[
\sigma_{\Delta P} = \frac{\beta}{\sqrt{WL}}, \tag{16.4}
\]
where $\beta$ is a fitting parameter.
To consider the local mismatch during circuit simulation without running Monte
Carlo, SiSMA [6] models the random local mismatch of a MOS transistor by a
stochastic noise current source $\eta$ coupled in parallel with the nominal drain current $I_D$;
$\eta$ can be expressed as
\[
\eta = I_D\, t_m(W, L)\, \lambda(x, y). \tag{16.5}
\]
Here, $I_D$ is determined by the operating region of the MOS transistor, and $t_m(W, L)$
accounts for the geometry of the device active area:
\[
t_m(W, L) = 1 + \frac{\beta}{\sqrt{WL}}, \tag{16.6}
\]
and $\lambda(x, y)$ refers to the sources of all the variations that depend on the device
position, which can include the spatial correlation [6]. Here, $\lambda(x, y) = 1$ because
all parameters are decoupled after the PCA.
Note that the random variable in the stochastic current source can be expanded
by the spectral stochastic method [187, 196]. For example, let us use the channel
length $L$ of one MOS transistor as the variation source. Assuming the variation of
$L$ is small, one can expand $t_m(W, L)$ around its nominal values $W_{(0)}$ and $L_{(0)}$ with a
Taylor expansion:
\[
\begin{aligned}
t_m(W, L) &= 1 + \frac{\beta}{\sqrt{WL}} \\
&\approx 1 + \beta \left[ \frac{1}{\sqrt{W_{(0)} L_{(0)}}} - \frac{L - L_{(0)}}{2\sqrt{W_{(0)} L_{(0)}^3}} \right] \\
&= 1 + \beta \left[ \frac{1}{\sqrt{W_{(0)} L_{(0)}}} - \frac{\xi}{2\sqrt{W_{(0)} L_{(0)}^3}} \right]. 
\end{aligned}
\tag{16.7}
\]
Here, $\xi$ is the random variable for the variation of the channel length $L$, and one can
describe $\xi$ by the OPC. Based on the Askey scheme [196], a Gaussian-distributed $\xi$
can be expanded using Hermite polynomials $\Phi_i$ ($i = 0, \ldots, n$) as
\[
\xi = \sum_{i=0}^{n} g_i' \Phi_i. \tag{16.8}
\]
Substituting this expansion into (16.5) and (16.7), the stochastic current source itself
becomes
\[
\eta = \sum_{i=0}^{n} g_i \Phi_i, \tag{16.9}
\]
where $g_i$ denotes the new expanded coefficients, now carrying the geometry
dependence.
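The first-order expansion in (16.7) can be checked numerically; the sketch below uses normalized units, and the values of $\beta$, $W_{(0)}$, and $L_{(0)}$ are illustrative assumptions:

```python
import numpy as np

# Illustrative normalized values, not from the book
beta, W0, L0 = 0.02, 1.0, 1.0

def tm_exact(L):
    return 1.0 + beta / np.sqrt(W0 * L)

def tm_taylor(L):
    # first-order expansion of t_m around (W0, L0), as in (16.7)
    return 1.0 + beta * (1.0 / np.sqrt(W0 * L0)
                         - (L - L0) / (2.0 * np.sqrt(W0 * L0**3)))

L = L0 * 1.01                  # a 1% channel-length perturbation
err = abs(tm_exact(L) - tm_taylor(L))
```

For a 1% perturbation the residual is second order in the perturbation, which is why the small-variation assumption in the text is needed.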
Knowing the expression of $\eta$ for one parameter variation source, multiple process
parameters $p_i$ ($i = 1, \ldots, m$) can be considered by a vector of stochastic current
sources $\eta(t)$.
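The Hermite expansion of (16.8) can be computed by projection with Gauss–Hermite quadrature, using $E[\Phi_i^2] = i!$ for probabilists' Hermite polynomials; the test function below is an arbitrary stand-in for a quantity depending on the Gaussian variable:

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def hermite_pc_coeffs(f, order, nquad=40):
    """Project f(xi), xi ~ N(0, 1), onto probabilists' Hermite polynomials."""
    x, w = hermegauss(nquad)           # nodes/weights for weight e^{-x^2/2}
    w = w / np.sqrt(2.0 * np.pi)       # normalize to the Gaussian measure
    g = []
    for i in range(order + 1):
        c = np.zeros(i + 1)
        c[i] = 1.0                     # select He_i
        g.append(np.sum(w * f(x) * hermeval(x, c)) / math.factorial(i))
    return np.array(g)

# f = 1 + 2x + x^2 = 2*He_0 + 2*He_1 + 1*He_2, so exact coefficients follow
g = hermite_pc_coeffs(lambda x: 1.0 + 2.0 * x + x**2, order=3)
```

The same projection yields the coefficients $g_i$ of (16.9) once the geometry factors of (16.5) through (16.7) are folded in.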
On the other hand, any integrated circuit is composed of passive and active
devices described by a number of terminal-branch equations. According to KCL,
one can obtain a differential algebraic equation (DAE) as below:
\[
\frac{d}{dt} q(x(t)) + f(x(t), t) + B u(t) = 0. \tag{16.10}
\]
Here, $x(t)$ is the vector of state variables consisting of node voltages and branch
currents, $q(x(t))$ contains the dynamic components such as charges and fluxes,
$f(x(t), t)$ describes the static components, and $u(t)$ denotes the input sources. $B$
describes how the sources connect into the circuit, which is determined by the circuit
topology.
Similar to [6], one can add $\eta(t)$, representing the mismatch, to the right-hand side of the
DAE:
\[
\frac{dq(x(t))}{dt} + f(x(t)) + B u(t) = T \eta(t), \tag{16.11}
\]
which describes the circuit and system under stochastic variations. Note that $T$ is the
topology matrix describing how $\eta(t)$ connects into the circuit, and one can have
\[
T \eta(t) = \sum_{i=1}^{m} T_{p_i} \eta_{p_i} \tag{16.12}
\]
for multiple parameters. Here, $\eta_{p_i}$ is the mismatch current source for the $i$th
parameter variation, which can be expanded using the OPC shown in (16.9).
In summary, we outline the overall algorithm flow in Algorithm 1. From this
flow, we observe that the optimization procedure involves several optimization
iterations. Each of the iterations contains three major steps: stochastic yield
estimation, stochastic sensitivity analysis, and stochastic yield optimization. The
last is achieved by tuning nominal parameters along the obtained gradient directions.
Notice that we take all design parameters as random variables; fixed parameters that
cannot be tuned can be removed from this procedure by parameter screening.
In this section, we discuss how to estimate the parametric yield and further
optimize it by tuning the parameters automatically. We first show how to
estimate the parametric yield with the stochastic variation (mismatch) $(\mu_{f_m}(t), \sigma_{f_m}(t))$
obtained from the above NMC mismatch analysis.
First, we construct the performance distribution at one time step $t_k$ from $(\mu_{f_m}(t_k),
\sigma_{f_m}(t_k))$, shown as the solid curve from $\mu - 3\sigma$ to $\mu + 3\sigma$ in Fig. 16.3. Then, the
performance constraint (16.13) is imposed.
With this constraint, the boundary separating the success region from the failure region can
be plotted as the straight line $h(p; t_k) = 0$ in Fig. 16.3.
As a result, the performance $f_m(t_k)$ located at the left of $h(p; t_k) = 0$ (shown
as the shaded region) satisfies the constraint in (16.13) and thus belongs to the
Fig. 16.3 Performance distribution at one time step: the constraint line $h(p; t) = 0$ separates the
success region (left, shaded) from the failure region
successful region $S_O$. Hence, the parametric yield can be estimated by the area
ratio
\[
Y(p) = \frac{S_O}{S_{f_m}}. \tag{16.14}
\]
When the entire region area is normalized as $S_{f_m} = 1$, $Y(p)$ becomes $S_O$ and is determined
by the integration below:
\[
Y(p) = \int_{S_O} \mathrm{pdf}\big(f_m(p; t_k)\big)\, dS = \int_{S_O} \mathrm{pdf}\big(\mu_{f_m}, \sigma_{f_m}\big)\, dS. \tag{16.15}
\]
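When the performance at $t_k$ is Gaussian with the moments $(\mu_{f_m}, \sigma_{f_m})$ from the NMC analysis and the success region is one-sided, the integral (16.15) reduces to a normal CDF; the numbers below are illustrative:

```python
from math import erf, sqrt

def gaussian_yield(mu_fm, sigma_fm, f_limit):
    """Yield = P(f_m <= f_limit) for a Gaussian performance distribution."""
    z = (f_limit - mu_fm) / sigma_fm
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# illustrative moments: constraint one sigma above the mean
Y = gaussian_yield(mu_fm=0.892, sigma_fm=0.0005, f_limit=0.8925)
```

With the constraint one standard deviation above the mean, the yield is about 84%, which matches the familiar one-sided Gaussian tail.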
To enhance the yield rate, most optimization engines need sensitivity information
to identify and further tune the critical parameters. However, with the
emerging process variations beyond 90 nm, traditional sensitivity analysis becomes
inefficient: it either uses the worst-case scenario or conducts MC simulations [88, 100,
153]. Therefore, an efficient NMC-based stochastic sensitivity analysis is needed
for this purpose. With all parameter variations calculated from the fast mismatch
analysis in Chap. 15, one can further explore the impact or contribution of the
parameter variation $\sigma_{p_i}$ to the performance variation $\sigma_{f_m}$, which can be utilized
in the optimization procedure for better performance merits. In this section, we
define the stochastic sensitivity as
\[
s_{p_i}(t) = \frac{\partial \sigma_{f_m}(\sigma_p; t)}{\partial \sigma_{p_i}}, \quad i = 1, \ldots, m, \tag{16.16}
\]
where $s_{p_i}(t)$ is the derivative of the performance variation $\sigma_{f_m}$ with respect to the
$i$th random parameter variable $\sigma_{p_i}$ at one time instant $t$. Depending on the problem
or circuit under study, the performance $f_m$ can be output voltage, period, or power,
and the parameter can be transistor width, length, or oxide thickness. Such a so-called
stochastic sensitivity can also be understood based on the POV relationship
[32, 105]:
\[
\sigma_{f_m}^2 = \sum_i \left( \frac{\partial f_m(\sigma_p; t)}{\partial p_i} \right)^2 \sigma_{p_i}^2. \tag{16.17}
\]
Here, $\sigma_{p_i}^2$ is the parameter variance and $\sigma_{f_m}^2$ is the performance variance.
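The POV relation (16.17) can be verified against Monte Carlo for a hypothetical linear performance model with independent Gaussian parameters; the coefficients and sigmas are illustrative:

```python
import numpy as np

# f_m = a1*p1 + a2*p2 with independent zero-mean Gaussian p_i
a = np.array([0.3, -0.7])            # sensitivities df_m/dp_i (illustrative)
sigma_p = np.array([0.02, 0.05])     # parameter standard deviations

# POV prediction: sigma_fm^2 = sum_i (a_i * sigma_pi)^2
sigma_fm_pov = np.sqrt(np.sum((a * sigma_p) ** 2))

# Monte Carlo cross-check
rng = np.random.default_rng(3)
p = rng.standard_normal((200_000, 2)) * sigma_p
sigma_fm_mc = (p @ a).std()
```

For a linear model POV is exact, so the two estimates agree up to Monte Carlo sampling noise.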
Note that the performance variation $\sigma_{f_m}$ is mainly determined by $\alpha_1$ [196] in
(16.15) at time step $t_k$, as derived in Sect. 3.3, while $\alpha_2$ has little impact on the
performance variation. As such, one can truncate the OPC expansion to first
order for the calculation of the mean and variance, and experiments show that the
first-order expansion provides adequate accuracy. Therefore, $\alpha_1$ is the dominant
moment for $\sigma_{f_m}$, while $\alpha_2$ can be truncated to simplify the calculation. Accordingly, we
have the following:
\[
\alpha_1(t_k) = c_1 + c_0\, T\, g(t_k), \tag{16.18}
\]
where
\[
c_0 = \left( G_{(0)}^{k} + \frac{1}{h} C_{(0)}^{k} \right)^{-1}, \qquad
c_1 = c_0\, \frac{1}{h}\, C_{(0)}^{k}\, \alpha_1(t_k - h).
\]
As such, one can further calculate the stochastic sensitivity
$\partial \sigma_{f_m}(\sigma_p; t) / \partial \sigma_{p_i}$ using
\[
s_{p_i}(t_k) = \frac{\partial \sigma_{f_m}(\sigma_p; t)}{\partial \sigma_{p_i}} = c_0\, T_{p_i}\, \frac{\partial g(t_k)}{\partial \sigma_{p_i}}, \tag{16.19}
\]
which can be utilized in any gradient-based optimization to improve the yield rate.
Next, we make use of the sensitivities $s_{p_i}$ to improve the parametric yield. Meanwhile, since
power is also a primary design concern, we treat power-consumption reduction as an
extra objective and solve the multiobjective optimization problem defined in Sect. 2.
Note that other performance merits can be treated as objectives of optimization
in a similar way. As such, by tuning nominal process parameters along gradient
directions, we enable more parameters containing process variations to satisfy the
performance constraints. This is an important feature for a robust design. In this
section, we address this requirement by sequential linear programming (SLP).
At the beginning of each optimization iteration, the nonlinear objective functions
$Y(p)$ and $p_c(p)$ can be approximated by linearization:
\[
\begin{aligned}
Y(p) &= Y\big(p^{(0)}\big) + \nabla_p Y\big(p^{(0)}\big)^{T} \big(p - p^{(0)}\big), \\
p_c(p) &= p_c\big(p^{(0)}\big) + \nabla_p p_c\big(p^{(0)}\big)^{T} \big(p - p^{(0)}\big), 
\end{aligned}
\tag{16.20}
\]
where $p^{(0)}$ represents the nominal design parameters while $p$ contains the process
variations of these parameters. Note that (16.20) is a first-order Taylor expansion of
the parametric yield $Y(p)$ defined in (16.15) and the power consumption $p_c(p)$ around
the nominal parameter point $p^{(0)}$. Thus, $\nabla_p Y(p^{(0)})$ is a vector consisting of
$\partial Y(\sigma_p)/\partial p_i$; the same holds for the power-consumption gradient $\nabla_p p_c(p^{(0)})$. Therefore, the
nonlinear objective functions can be transformed into a series of linear optimization
subproblems. The optimization terminates when the convergence criterion is
achieved.
As such, the stochastic multiobjective yield optimization problem in Sect. 2 can
be reformulated as
\[
\begin{aligned}
\text{Maximize} \quad & Y(p) = Y\big(p^{(0)}\big) + \nabla_p Y\big(p^{(0)}\big)^{T} \big(p - p^{(0)}\big), \\
\text{Minimize} \quad & p_c(p) = p_c\big(p^{(0)}\big) + \nabla_p p_c\big(p^{(0)}\big)^{T} \big(p - p^{(0)}\big), \\
\text{Subject to} \quad & Y(p) \ge \bar{Y}, \\
& p_c(p) \le \bar{p}_c, \\
& F(p) \le F_{\max}, \\
& p_{\min} \le p \le p_{\max}.
\end{aligned}
\]
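One SLP iteration can be sketched as follows; with only the box bounds active, maximizing the linearized yield moves each parameter along the sign of its gradient, damped here by a trust-region step. The function name, gradient values, and bounds are all illustrative assumptions:

```python
import numpy as np

def slp_step(p0, grad_Y, pmin, pmax, delta):
    """One linearized step: follow the yield gradient, stay in the box."""
    step = delta * np.sign(grad_Y)            # direction from dY/dp_i
    return np.clip(p0 + step, pmin, pmax)     # enforce pmin <= p <= pmax

p0 = np.array([1.0e-5, 1.0e-5, 3.0e-5])       # nominal widths (illustrative)
grad_Y = np.array([4.2e3, 2.9e3, -1.1e3])     # hypothetical dY/dp_i
p1 = slp_step(p0, grad_Y, pmin=5e-6, pmax=4e-5, delta=2e-6)
```

Repeating this step with refreshed gradients, and checking the yield and power constraints after each update, gives the sequence of linear subproblems described above.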
Fig. 16.4 Moving the performance distribution with mean $\mu_{f_m}(p_1)$ relative to the constraint line
$h(p; t) = 0$ to maximize the successful (shaded) region
\[
\frac{\partial Y(\sigma_p)}{\partial p_i}
= \int_{S_O} \frac{\partial\, \mathrm{pdf}\big(F(\sigma_p; t)\big)}{\partial p_i}\, dS
= \int_{S_O} \frac{\partial\, \mathrm{pdf}(F)}{\partial F}\, \frac{\partial F(\sigma_p; t)}{\partial p_i}\, dS. \tag{16.21}
\]
As a result, $\partial Y(\sigma_p)/\partial p_i$ can be obtained with $\partial F(\sigma_p; t)/\partial p_i$ calculated from the
stochastic sensitivity analysis. Note that the PDF of the performance variation and
the integral region $S_O$ are both given by the yield estimation in (16.15).
We illustrate the presented optimization procedure for the yield objective function
$Y(p)$ in Fig. 16.4. With the parametric yield estimated using the NMC
mismatch analysis, the distribution of the performance $f_m$ for the nominal parameters
$p_0$ can be plotted as a solid curve with mean value $\mu_{f_m}(p_0)$. With the
performance constraint $h(p; t) \le 0$ in (16.1), the shaded area located at the left of
the constraint line is the desired successful region. The yield optimization procedure
needs to move the performance distribution to the left so that the shaded area is
maximized. Therefore, the problem here is how to change the process parameters
$p$ in order to move the performance distribution for an enhanced yield rate.
Moreover, the power consumption can be estimated by
\[
p_c = V_{dd}\, \bar{i}_{V_{dd}},
\]
where $V_{dd}$ is the power supply voltage and $\bar{i}_{V_{dd}}$ is the average value of the current
through the voltage source. The power consumption optimization can be explained
with Fig. 16.5: the initial design generates the current $i_{V_{dd}}$ denoted by the
black curve, which leads to a high power consumption $p_c$.
Fig. 16.5 Current through the power supply (A) vs. time for the initial, middle, and optimal
designs
On the other hand, we perform the same procedure to optimize the power
consumption. As in (16.19), we calculate the sensitivity of the power consumption
w.r.t. the process parameters at the minimum of $i_{V_{dd}}$:
\[
\frac{\partial p_c(p)}{\partial p_i} = V_{dd} \left[ \frac{\partial i_{V_{dd}}}{\partial p_i} \right]_{i_{V_{dd}} = \mathrm{minimum}}. \tag{16.25}
\]
In this way, the total change to the process parameters is the weighted
summation below:
\[
\delta p_{\mathrm{total}} = \gamma_1\, \delta p_{\mathrm{yield}} + \gamma_2\, \delta p_{\mathrm{power}}, \qquad \gamma_1, \gamma_2 \in [0, 1), \tag{16.27}
\]
where $\gamma_1$ and $\gamma_2$ are the weights for yield and power consumption, respectively. The
weights can be updated dynamically, and the weight should be larger for the
performance merit that is farther from its target value.
Therefore, one can update $p$ with the new parameter $p_0 + \delta p_{\mathrm{total}}$. Moreover, the
NMC mismatch analysis is conducted to update the performance distribution, which
is denoted by the dashed curve in Fig. 16.4. With the updated parameters and
performance distribution, all performance constraints $F(p) \le F_{\max}$ are checked
for violations. If they are still satisfied, $p$ becomes the new design point, and this
procedure is repeated to enhance the yield rate.
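The weighted update of (16.27) with dynamically chosen weights can be sketched as follows; the weight rule (proportional to each merit's distance from its target) and all numbers are illustrative assumptions:

```python
import numpy as np

def combine_steps(dp_yield, dp_power, yield_gap, power_gap):
    """Weight each step by how far its merit is from the target."""
    gaps = np.array([yield_gap, power_gap], dtype=float)
    w = gaps / gaps.sum() if gaps.sum() > 0 else np.array([0.5, 0.5])
    return w[0] * dp_yield + w[1] * dp_power

dp_y = np.array([ 2e-6,  1e-6, -1e-6])   # hypothetical yield step
dp_p = np.array([-1e-6, -1e-6, -2e-6])   # hypothetical power step
# yield 45% away from target, power only 5% away -> yield dominates
dp_total = combine_steps(dp_y, dp_p, yield_gap=0.45, power_gap=0.05)
```

As the yield approaches its target over the iterations, its gap shrinks and the update automatically shifts emphasis toward power, which is the dynamic-weight behavior described in the text.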
4 Numerical Examples
The presented NMC algorithm has been implemented for NMC mismatch analysis,
yield estimation, and optimization in a Matlab-based circuit simulator. All
experiments are performed on a Linux server with a 2.4 GHz Xeon processor and 4 GB
of memory. In the experiments, we take the widths of the MOSFETs as the variable
process parameters. The initial results of this chapter were published in [52].
Note that the presented approach considers only design parameters such as the
channel width $W$, because the distribution of design parameters under process
variations can be shifted by tuning their nominal values. As such, more design
parameters with process variations can satisfy the performance constraints, and the
total yield rate can be enhanced, which is also needed for a robust design. Therefore,
parameters that are not tunable, such as the channel length $L$, are not considered in
the presented approach.
We first use an operational amplifier (OPAM) to validate the accuracy and
efficiency of the NMC mismatch analysis by comparing it with the MC simulations.
Then, a Schmitt trigger is used to verify the presented parametric yield estimation
and stochastic yield analysis. Next, we demonstrate the validity and efficiency of
the presented yield optimization flow using a six-transistor SRAM cell.
Fig. 16.6 Schematic of the operational amplifier (OPAM) with eight MOS transistors between
$V_{dd} = +5$ V and $V_{ss} = -5$ V
The OPAM, shown in Fig. 16.6, consists of eight MOS transistors. Their
widths are treated as stochastic variational parameters with Gaussian distributions
and a 10% random perturbation from their nominal values. Moreover, we consider
the matching design requirements for the input-pair devices, such as the same
nominal width ($W_{p1} = W_{p2}$, $W_{n3} = W_{n4}$, $W_{p5} = W_{p8}$) and the fixed width
ratio ($W_{n6} = k W_{n3}$).
We first introduce the width variations to all MOS transistors and perform 1,000
MC simulations with a high confidence level to find the variational trajectories
at the output node. Then, we apply the developed NMC mismatch analysis to the OPAM
and locate the boundaries ($\mu - 3\sigma$, $\mu + 3\sigma$) of the variational trajectories with a single
run of transient circuit simulation. The results are shown in Fig. 16.7, where
blue lines denote the MC simulations and the two black lines are the results from the
presented mismatch analysis. We observe that our approach captures the transient
stochastic variation (mismatch) as accurately as the MC result.
We further compare the accuracy and efficiency of the NMC mismatch analysis and
the MC method in Table 16.1. From this table, we can see that the NMC mismatch
analysis not only achieves 2% accuracy but also gains a 680× speedup over the MC
method.
We further consider the Schmitt trigger shown in Fig. 16.8 to demonstrate the
stochastic yield estimation. Similarly, we assume the widths of all MOSFETs
Fig. 16.7 NMC mismatch analysis vs. Monte Carlo for operational amplifier case
to have 10% variations from their nominal values and to conform to Gaussian
distributions. Moreover, we consider the lower switching threshold $V_{TL}$ to be the
performance metric of the parametric yield, which can be changed by the MOSFET
width variations. Thus, the performance constraint for the parametric yield is the
following: when the input is at $V_{TL} = 1.8$ V and the output is initially set to $V_{dd} = 5$ V,
the output voltage should be greater than 4.2 V.
First, we perform 1,000 MC simulations and compare them with the NMC
stochastic variation analysis, as shown in Fig. 16.9a. Then, the output distribution from
the MC simulation at the time step where the input equals 1.8 V is plotted in
Fig. 16.9b. The PDF estimated by the NMC mismatch analysis is compared
with the MC simulations in the same figure. We observe that the two distributions
coincide very well.
Then, the yield rate can be calculated efficiently with the PDF estimated from the NMC
mismatch analysis. We list the mean ($\mu$), standard deviation ($\sigma$), and
yield estimation results from the presented approach and those by MC simulations
in Table 16.2.
Fig. 16.8 Schematic of the Schmitt trigger (transistors Mp1–Mp3 and Mn1–Mn3)
With the accurate estimation of the output distribution, the presented method can
calculate the yield rate with 2.7% accuracy as well as a 756× speedup when compared
to the MC method.
More importantly, the NMC mismatch analysis has linear scalability because all
process variation sources can be modeled as additive mismatch current sources and
introduced into the right-hand side of the DAE system in (16.11).
Fig. 16.9 (a) NMC mismatch analysis vs. MC at the output of the Schmitt trigger. (b) Output
distributions from the NMC mismatch analysis and Monte Carlo
Table 16.3 Sensitivity of the output with respect to each MOSFET width variation $\sigma_{p_i}$

Parameter    Mn1 width   Mn2 width   Mn3 width
Sensitivity  2.4083e-4   2.4083e-4   4.8069e-3
Parameter    Mp1 width   Mp2 width   Mp3 width
Sensitivity  2.4692e-2   2.4692e-2   0
Table 16.4 Sensitivity of $v_{BL\_B}$ and power with respect to each MOSFET width variation $\sigma_{p_i}$

Parameter                 Mn1 width   Mn2 width   Mp6 width
Sensitivity ($v_{BL\_B}$)  1.3922e-3   2.0787e-3   7.0941e-2
Sensitivity (power)       3.7888e-4   5.7816e-4   5.8871e-4
Table 16.5 Comparison of different yield optimization algorithms for the SRAM cell

Parameter          First cut    Baseline    Single objective   Multiobjective
Mn1 width (m)      1e-5         2.872e-5    2.7841e-5          3.577e-5
Mn2 width (m)      1e-5         2.3282e-5   2.2537e-5          2.7341e-5
Mp6 width (m)      3e-5         1.5308e-5   1.6296e-5          9.7585e-6
Power (W)          1.0262e-5    3.0852e-5   1.2434e-5          1.0988e-5
Area (m²)          2.4e-11      2.81e-11    2.8e-11            2.88e-11
Yield (%)          49.32        94.23       95.49              95.31
Runtime (s)        2.42         32.384      27.226             15.21
Iterations         1            12          10                 6
The results from all optimization methods are shown in Table 16.5. From this
table, it can be observed that all methods improve the parametric yield to
around or even above 95% compared with the initial design, and the corresponding
nominal values can be used as better initial design parameters. Meanwhile, the area is smaller
than the maximum acceptable area criterion $A \le 1.2\, A_{\mathrm{initial}}$.
However, the optimal designs from the baseline (gravity-directed) method and the single-objective
optimization require 2.75 times and 21% more power consumption than the initial
design, respectively. The presented method leads to an optimal design with only 7%
more power. Therefore, the presented multiobjective optimization not only improves
the yield rate but also suppresses the power penalty simultaneously. Moreover, the presented
optimization procedure needs only six iterations to achieve the shown results,
within 15.21 s. Notice that the parametric yield $Y(p)$ can be further improved with
a higher target yield $\bar{Y}$ and more optimization iterations.
5 Summary
In this chapter, we have presented a fast NMC method to calculate mismatch in
the time domain with consideration of local random process variations. We model
the mismatch by a stochastic current source expanded by OPC. This leads to an
efficient solution for mismatch, and further for the parametric yield rate, without using
expensive MC simulations. In addition, we are the first to derive the stochastic
sensitivity of yield within the context of OPC. This leads to a multiobjective optimization
method that improves the yield rate and other performance merits simultaneously.
Numerical examples demonstrate that the presented NMC approach achieves up
to 2% accuracy with a 700× speedup when compared to Monte Carlo simulations.
Moreover, the presented multiobjective optimization improves the yield rate up
to 95.3% while optimizing other performance merits at the same time. The presented
approach assumes that the distribution type of the process variations is known
in advance.
Chapter 17
Voltage Binning Technique for Yield
Optimization
1 Introduction
Process-induced variability has huge impacts on the circuit performance and yield
in the nanometer VLSI technologies [71]. Indeed, the characteristics of devices and
interconnects are prone to increasing process variability as device geometries get
close to the size of atoms. The yield loss from process fluctuations is expected
to increase as transistor sizes scale down. As a result, improving yield
in the presence of process variations is critical to mitigating the huge impact of
process uncertainties.
Supply voltage adjustment can be used as a technique to reduce yield loss, which
is based on the fact that both chip performance and power consumption depend
on supply voltage. By increasing supply voltage, chip performance improves. Both
dynamic power and leakage power, however, will become worse at the same
time [182]. In contrast, a lower supply voltage will reduce the power consumption but
make the chip slower. In other words, faster chips usually have higher power
consumption and slower chips often come with lower power consumption. Therefore,
it is possible to reduce yield loss by adjusting supply voltage to make some failing
chips satisfy application constraints.
For yield enhancement, there are also different schemes for supply voltage
adjustment. In [182], the authors proposed an adaptive supply voltage method for
reducing impacts of parameter variations by assigning individual supply voltage to
each manufactured chip. This methodology can be very effective but it requires
significant effort in chip design and testing at many different supply voltages.
Recently, a new voltage binning technique has been proposed by the patent [85]
for yield optimization as an alternative to adaptive supply voltage. All
manufactured chips are divided into several bins, and a certain value of supply
voltage is assigned to each bin to make sure all chips in this bin can work under the
corresponding supply voltage. At the cost of small yield loss, this technique is much
more practical than the adaptive voltage supply. But only a general idea is given
in [85], without details of selecting optimal supply voltage levels. Another recent
work [213] provides a statistical technique of yield computation for different voltage
binning schemes. From results of statistical timing and variational power analysis,
the authors developed a combination of analytical and numerical techniques to
compute joint PDFs of chip yield as a function of inter-die variation in effective
gate length L, and solve the problem of computing optimal supply voltages for a
given binning scheme.
However, the method in [213] only works under several assumptions and approximations
that cause accuracy loss in both the yield analysis and the optimal voltage
binning scheme. The statistical model for both timing and power analysis used in
[213] is simplified by lumping all process variations other than the inter-die variation
in L into one random variable following a Gaussian distribution. Indeed, the intra-die
variations have a huge impact on performance and power consumption [3, 158], and
other process variations (gate oxide thickness, threshold voltage, etc.) have different
distributions and should not be reduced to a single Gaussian distribution.
Furthermore, this technique cannot predict the number of voltage bins needed under
certain yield requirement before solving the voltage binning problem.
In general, voltage binning for yield improvement becomes an emerging tech-
nique but with many unsolved issues. In this chapter, we present a new voltage
binning scheme to optimize yield. The presented method first computes the set of
working supply voltage segments under timing and power constraints from either
the measurement of real chips or MC-based SPICE simulations on a chip with
process variations. Then on top of the distribution of voltage segment lengths,
we propose a formula to predict the upper bound of bin number needed under
uniform binning scheme for the yield requirement. Furthermore, we frame the
voltage binning scheme as a set-cover problem in graph theory and solve it by a
greedy algorithm in an incremental way. The presented method is not limited by
the number or types of process variations involved, as it can be based on actual
measured results. Furthermore, the presented algorithm can be easily extended to
deal with a range of working supply voltages for dynamic voltage scaling under
different operation modes (like lower power and high-performance modes).
Numerical examples on a number of benchmarks under 45 nm technology show
that the presented method can correctly predict the upper bound on the number of
bins required. The optimal binning scheme can lead to significant savings in the
number of bins compared to the uniform scheme to achieve the same yield, with very
small CPU cost.
2 Problem Formulation
For a single supply voltage, the parametric chip yield is defined as the percentage
of manufactured chips satisfying the timing and power constraints. Specifically, we
compute the yield for a given voltage level by direct integration in the space of
process parameters:
Y = \int \cdots \int_{S > 0,\; P < P_{lim}} f(X_1, \ldots, X_n)\, dX_1 \cdots dX_n,   (17.1)
where f(X_1, X_2, \ldots, X_n) is the joint PDF of X_1 to X_n, which represent
the process variations. Also, there exists spatial correlation in the intra-die part
of the variation. The existing approach in [213] ignores the intra-die variation in
process parameters, which means only one random variable for inter-die variation is
considered, and all other variations except the inter-die variation in Leff are integrated
into one Gaussian random variable. In this way, the multi-dimensional integral
in (17.1) can be computed numerically as a two- or three-dimensional integral.
However, the spatial correlation can have significant impacts on both the statistical
timing and the statistical power of a circuit [12, 158], and thus on yield analysis as well.
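As a concrete illustration of (17.1), the yield integral can be estimated by Monte Carlo: draw process-parameter samples from their joint distribution and count the fraction that meets both constraints. The sketch below is a minimal example under invented assumptions; the linear slack and power models and the correlated Gaussian parameters are placeholders, not models from this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical performance models (placeholders): timing slack S(X) and
# power P(X) as linear functions of two process parameters, e.g. L and Tox.
def slack(x):
    return 0.1 - 0.5 * x[:, 0] - 0.2 * x[:, 1]   # S > 0 means timing is met

def power(x):
    return 1.0 + 0.8 * x[:, 0] + 0.3 * x[:, 1]   # arbitrary units

P_LIM = 1.5        # power constraint P < Plim
N = 100_000        # number of Monte Carlo samples

# Correlated Gaussian process parameters: samples of the joint PDF f in (17.1).
cov = 0.04 * np.array([[1.0, 0.4],
                       [0.4, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=N)

# Monte Carlo estimate of (17.1): the fraction of samples satisfying both
# the timing constraint (S > 0) and the power constraint (P < Plim).
Y = np.mean((slack(X) > 0) & (power(X) < P_LIM))
print(f"estimated yield: {Y:.3f}")
```

Spatial correlation enters here only through the covariance matrix; with measured data, the sampled parameter vectors X would be replaced by per-chip measurements.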
The voltage binning problem for a given number of bins k can then be formulated as
choosing the supply voltage levels that maximize the total yield:

Y_{k,opt} = \max_{V} Y(V),   (17.2)

where Y is the total yield under the voltage binning scheme with supply
voltage levels V = {V1, V2, ..., Vk}.
We would like to mention one special type of voltage binning in which we have
an infinite number of voltage bins with all possible voltage levels. This binning
scheme allows the supply voltage to be individually tailored for each chip to meet
the timing and power constraints. Obviously, the yield in this case is the maximum
possible yield, denoted Ymax, which is an upper bound on the yield of any
other voltage binning scheme. As a result, for the optimal solution, kopt should be the
minimum number of bins that makes Y_{k,opt} = Ymax.
3 The Presented Voltage Binning Method

In this section, we present a new voltage binning scheme, which not only gives a
good solution for a given set of voltage levels, but also computes the minimum
number of bins required. Figure 17.1 presents the overall flow of the presented
method and highlights the major computing steps. Basically, steps 1 and 2 compute
the valid voltage segment for each chip. Step 3 determines the voltage levels and
the chip assignments to the resulting bins. This is done by a greedy-based set-covering
method. In Fig. 17.1, S_left denotes the set of uncovered voltage segments
left in the complete set of valid voltage segments Sval. Vi is the i-th supply voltage
level, and chips assigned to Ui can meet both the power and timing constraints at
supply voltage Vi.
The algorithm in step 3 finds one voltage level at a time such that each level
covers as many chips as possible in a greedy fashion (a chip is covered if its valid
Vdd segment contains the given voltage level).

Fig. 17.1 The algorithm sketch of the presented new voltage binning method

Fig. 17.2 The delay and power change with supply voltage for C432

The algorithm stops when all the
chips are covered, and the number of levels found so far (kopt) is the minimum
number of bins that can reach the maximum possible yield Ymax. In the presented
algorithm, we can also provide a formula to predict the number of bins required
under the uniform binning scheme from the distribution of valid Vdd segment
lengths, which can serve as a guideline for the number of bins required.
For a chip, the working supply voltage range (segment) [Vlow, Vhigh] can actually be
considered as a knob to trade off the power and timing of the circuit.
As we know, supply voltage affects power consumption and timing performance
in opposite ways. Reducing the supply voltage decreases the dynamic power and
leakage power, which is often considered the most effective technique for low-power
design. On the other hand, the propagation delay increases as the supply voltage
decreases [186]. Figure 17.2 shows the mean delay and power consumption as
functions of supply voltage, which clearly exhibits these trends. As a result, given
the power consumption bound and the timing constraint for a chip, Vlow is mainly
decided by the timing constraint and Vhigh is mainly determined by the power
constraint. Since process variation leads to different timing performances and power
consumptions, the valid Vdd segment [Vlow, Vhigh] will be different for each chip.
As a result, the measured timing and total power data of a chip can be mapped onto
the corresponding working Vdd segment, which is step 1 in Fig. 17.1. For some
chips, we may have Vlow > Vhigh (an invalid segment), which means these chips
fail at any supply voltage. We therefore call them “bad” chips.
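Step 1 of Fig. 17.1 can be sketched as follows: given a chip's delay and power as functions of supply voltage, Vlow falls out of the timing constraint and Vhigh out of the power constraint. The curve shapes and the constraint values below are illustrative assumptions only.

```python
import numpy as np

def valid_vdd_segment(vdd, delay, power, t_max, p_max):
    """Return the working segment (Vlow, Vhigh) for one chip, or None.

    vdd:   sampled supply voltages in ascending order
    delay: delay at each vdd (decreases as vdd rises)
    power: power at each vdd (increases as vdd rises)
    """
    ok = (delay <= t_max) & (power <= p_max)
    if not ok.any():
        return None               # "bad" chip: fails at every supply voltage
    v_ok = vdd[ok]
    return (v_ok.min(), v_ok.max())

# Invented measurement data for one chip (monotone trends as in Fig. 17.2).
vdd = np.linspace(0.8, 1.4, 13)
delay = 0.3 / vdd                 # delay falls as Vdd rises
power = 150.0 * vdd ** 2          # power grows as Vdd rises
seg = valid_vdd_segment(vdd, delay, power, t_max=0.26, p_max=250.0)
print(seg)
```

With measured data, the delay and power arrays would come from testing each chip at the sampled supply voltages.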
Fig. 17.3 Valid Vdd segments between Vmin and Vmax covered by three voltage levels V1, V2, and V3
Suppose there are N sampling chips from testing, among which nbad are bad chips.
Obviously, the maximum yield achievable via the voltage binning scheme alone is

Y_{max} = \frac{N - n_{bad}}{N}.   (17.3)
We then define the set of valid segments Sval = {[Vlow, Vhigh]} by removing the
bad chips from the sampling set and only keeping the valid segments (step 2 in
Fig. 17.1). Then the voltage binning problem in (17.2) can be framed as
a set-cover problem. Take Fig. 17.3, for instance; there are nval = 13 horizontal
segments between Vmin and Vmax (each corresponds to a valid Vdd segment), and the
problem becomes using the minimum number of vertical lines to cover all the horizontal
segments. In this case, three voltage levels can cover all the Vdd segments of these
13 chips. We also notice that one chip can be covered by more than one voltage
level; in this case, it can be assigned to any voltage level that covers it. This problem
is well known in graph theory with known efficient solutions. The valid voltage
segment model has many benefits compared with other yield analysis models for
voltage binning:
1. The distribution of valid supply voltage segment lengths can provide information
about the minimum number of bins for uniform binning under a certain yield requirement
(e.g., to achieve 99% of Ymax; more details in Sect. 3.2).
2. The model can also be used when the allowed supply voltage level for one voltage
bin is an interval or a group of discrete values for a voltage scaling mechanism
instead of a scalar (details in Sect. 3.3).
The distribution of the valid Vdd segment length (defined as len = Vhigh − Vlow) can
guide yield optimization when there is a lower-bound requirement on yield,
and it works for both uniform binning and optimal binning.

Fig. 17.4 Histogram of the length of the valid supply voltage segment len for C432

Notice that the optimal
binning can always achieve an equal or better yield than the uniform binning. Actually,
the experimental results show that the number of bins needed for optimal voltage
binning is much smaller than the prediction from the distribution of len. Figure 17.4
shows the histogram of the valid supply voltage segment length len for testing circuit
C432, from which we can see that it is hard to tell what type of random variable it
follows. However, it is quite simple to obtain the numerical probability density function
(PDF) and CDF from the measured data of the testing samples, as well as the mean value
and standard deviation.
Suppose the yield requirement is Yreq and the allowed supply voltage for testing
lies in [Vmin, Vmax]. For the uniform voltage binning scheme, there are k bins, and the set
of supply voltage levels is V = {V1, V2, ..., Vk}. Since the voltage binning scheme
is uniform,

V_i - V_{i-1} = \Delta V = \text{const.}, \quad i = 2, 3, \ldots, k.   (17.4)
For the uniform voltage binning scheme, we have the following observations:
Observation 1. If there are k bins in [Vmin, Vmax], then

\Delta V = \frac{V_{max} - V_{min}}{k + 1}.   (17.5)
Observation 2. For a Vdd segment [Vlow, Vhigh] with length len = Vhigh − Vlow, if
len > \Delta V, there must exist at least one Vdd level in the set of supply voltage levels
V = {V1, V2, ..., Vk} that can cover [Vlow, Vhigh]. Now we have the following
result:
Proposition 17.1. For the yield requirement Yreq, the upper bound kup on the number
of voltage bins can be determined by

k_{up} = \left\lceil \frac{V_{max} - V_{min}}{F^{-1}(1 - Y_{req})} \right\rceil - 1,   (17.6)

where F is the CDF of the valid segment length len.
For the upper bound kup on the number of voltage bins, the corresponding uniform
spacing can be calculated by

\Delta V = \frac{V_{max} - V_{min}}{k_{up} + 1} \quad \text{(Observation 1)}.   (17.8)
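Observation 1 and Proposition 17.1 can be evaluated directly on an empirical sample of segment lengths, using the empirical quantile as F^{-1}. The len sample below is synthetic (an assumption for illustration); with real chips it would come from step 2.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic sample of valid-segment lengths len = Vhigh - Vlow; no particular
# distribution is assumed, only an empirical sample is needed.
lens = np.abs(rng.normal(loc=0.25, scale=0.1, size=5000))

V_MIN, V_MAX = 0.8, 1.4
Y_REQ = 0.95                      # required fraction of Ymax

# Empirical F^{-1}(1 - Yreq): the (1 - Yreq) quantile of len.
len_q = np.quantile(lens, 1.0 - Y_REQ)

# Proposition 17.1: kup = ceil((Vmax - Vmin) / F^{-1}(1 - Yreq)) - 1,
# paired with DeltaV = (Vmax - Vmin) / (k + 1) from Observation 1.
k_up = int(np.ceil((V_MAX - V_MIN) / len_q)) - 1
print(f"upper bound on bins for uniform binning: {k_up}")
```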
The whole voltage binning algorithm for yield analysis and optimization is given
in Fig. 17.1. After the yield analysis and optimization, the supply voltage levels V =
{V1, V2, ..., V_{kopt}} and the corresponding set of bins U = {U1, U2, ..., U_{kopt}} can
be calculated up to kopt, where Y_{k,opt} = Ymax already holds.
There are many algorithms for solving the set-cover problem in step 3. An exact
set-cover algorithm would give the globally optimal solution, but the decision version
of the set-covering problem is NP-complete. In this chapter, we therefore use a greedy
approximation algorithm, as shown in Fig. 17.5, which can easily be implemented to
run in polynomial time and achieves a good approximation of the optimal solution.

Fig. 17.5 The flow of the greedy algorithm for covering the most uncovered elements in S

Notice that the greedy approximation is not mandatory; any set-cover algorithm can be
used in step 3, so this is not a limitation of the presented valid supply voltage segment
model. The solution found by GREEDY-SET-COVER is at most a small constant factor
larger than the optimal one [19], which is already satisfactory as shown in the
experimental results. Besides, the greedy algorithm guarantees that each voltage level
covers the most segments corresponding to uncovered testing chips, which makes the
algorithm incremental. As a result, if only k − 1 bins are needed, we can stop the
computation at k − 1 instead of k, and when the designer later needs more voltage bins,
the computation does not have to be restarted from scratch. This incremental property
is very useful for circuit design, since when the number of bins increases from k − 1
to k, the existing k − 1 voltage levels remain the same.
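The incremental greedy selection in step 3 can be sketched as follows. One assumption is made for the sketch: candidate levels are restricted to segment endpoints, since any covering level can be slid to an endpoint of some segment without losing coverage. The sample segments are invented.

```python
def greedy_voltage_binning(segments, k_max=None):
    """Greedy set cover: pick supply voltage levels one at a time, each
    covering the largest number of still-uncovered valid Vdd segments.

    segments: list of (vlow, vhigh) pairs, one per good chip.
    Returns the chosen levels and the chip indices assigned to each bin.
    """
    uncovered = set(range(len(segments)))
    candidates = sorted({v for seg in segments for v in seg})
    levels, bins = [], []
    while uncovered and (k_max is None or len(levels) < k_max):
        # Greedy choice: the level stabbing the most uncovered segments.
        best = max(candidates,
                   key=lambda v: sum(segments[i][0] <= v <= segments[i][1]
                                     for i in uncovered))
        covered = {i for i in uncovered
                   if segments[i][0] <= best <= segments[i][1]}
        if not covered:
            break                 # remaining segments cannot be covered
        levels.append(best)
        bins.append(sorted(covered))
        uncovered -= covered      # incremental: earlier levels never change
    return levels, bins

# Five invented valid Vdd segments (vlow, vhigh).
segments = [(0.9, 1.1), (0.95, 1.2), (1.15, 1.3), (1.25, 1.4), (0.85, 1.0)]
levels, bins = greedy_voltage_binning(segments)
print(levels, bins)
```

Stopping early with k_max smaller than kopt returns the same first k_max levels, which is the incremental property noted above.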
We remark that the presented method can be easily extended to deal with a
group of discrete values Vg,1, Vg,2, ... for dynamic voltage scaling under different
operation modes instead of a single voltage. For example, the i-th supply voltage
level Vi may contain two discrete values, Vs and Vh, which are the supply voltages for
the power-saving mode and the high-performance mode, respectively (anything in between
also works for the selected chips). The set-cover algorithm in Fig. 17.5 then uses a
range Vg (defined by the user) to cover the voltage segments instead of a single voltage
level. Such an extension is very straightforward for the presented method.
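This extension reduces to an interval containment test: a chip can be assigned to a bin whose voltage range is Vg = [vg_lo, vg_hi] only if its whole valid segment contains that range. A minimal sketch with invented numbers:

```python
def covers_with_range(segment, vg_lo, vg_hi):
    """A chip supports the whole scaling range [vg_lo, vg_hi] iff its valid
    segment [vlow, vhigh] contains that range (interval containment)."""
    vlow, vhigh = segment
    return vlow <= vg_lo and vg_hi <= vhigh

# A chip valid on [0.9, 1.3] supports a power-saving/high-performance pair
# (0.95 V, 1.2 V); a chip valid only on [1.0, 1.15] does not.
print(covers_with_range((0.9, 1.3), 0.95, 1.2))
print(covers_with_range((1.0, 1.15), 0.95, 1.2))
```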
4 Numerical Examples
In this section, the presented voltage binning technique for yield analysis and
optimization is verified on circuits from the ISCAS'85 benchmark set with constraints
on timing performance and power consumption. The circuits were synthesized
with the Nangate Open Cell Library. The technology parameters come from the
45 nm FreePDK Base Kit and PTM models [139]. The presented method has been
implemented in Matlab 7.8.0. All the experiments were carried out on a Linux system
with quad Intel Xeon CPUs running at 2.99 GHz and 16 GB of memory.
Table 17.1 Predicted and actual number of bins needed under yield requirement

Circuit  Yreq  Predicted  Real for uni.  Real for opt.
C432     99%   25         23             4
         97%   10         9              3
         95%   7          6              3
C1908    99%   27         12             7
         97%   11         6              3
         95%   7          3              3
C2670    99%   8          4              3
         97%   5          3              2
         95%   3          2              1
C7552    99%   30         12             5
         97%   9          4              3
         95%   6          3              2
For each circuit in the benchmark set, 10,000 Monte Carlo samples are
generated from the process variations. In this chapter, the effective gate length L and
the gate oxide thickness Tox are considered the two main sources of process variations.
According to [71], the physical variation in L and Tox should be controlled within
±12%. So the 3σ values of the variations for L and Tox were set to 12% of the nominal
values, of which inter-die variations constitute 20% and intra-die variations, 80%. L
is modeled as a sum of spatially correlated sources of variation, and Tox is modeled
as an independent source of variation. The same framework can easily be extended
to include other variation parameters. Both L and Tox are modeled as Gaussian
parameters. For the correlated L, the spatial correlation was modeled based on the
exponential models [195].
The power and timing information as a function of supply voltage for each testing
chip is characterized using SPICE simulation. Under 45 nm technology, the typical
supply voltage range is 0.85 V–1.3625 V [69]. Hence, Vdd is varied between 0.8 V
and 1.4 V in this chapter, which is sufficient for 45 nm technology.
We remark that, in practice, the power and timing information can be obtained
from measurements. As a result, all sources of variability in transistors and
interconnects, including inter-die and intra-die variations with spatial correlations,
are considered automatically.
As mentioned in Sect. 3.2, the presented valid segment model can be used to
predict the number of bins needed under a yield requirement before voltage binning
optimization. Table 17.1 shows the comparison between the predicted number and
the actual number needed under the yield requirement for the testing chips. In this
table, Yreq is the lower-bound requirement for yield optimization (normalized
by Ymax). Column 3 is the predicted number of bins, and columns 4 and 5 are the
actual bin numbers found for the uniform and optimal voltage binning schemes,
respectively. This table validates the upper-bound formulation for the needed
number of bins in Sect. 3.2. From this table, we can see that the predicted value
is always an upper bound on the actual number of bins needed, so it can be applied as
a guide for the yield requirement in optimization. Table 17.1 also shows that the optimal
voltage binning scheme can significantly reduce the number of bins compared with
the uniform voltage binning scheme under the same yield requirement. When the yield
requirement is 99% of the optimal yield, the optimal voltage binning scheme
reduces the bin count by 52% on average.

Numerical examples for both the uniform and the optimal voltage binning schemes
with different numbers of bins are used to verify the presented voltage binning
technique. Table 17.2 shows the results, where Ymax is the maximum chip yield,
achieved when Vdd is adjusted individually for each manufactured chip, VB stands
for the voltage binning scheme used, and kopt is the minimum number of bins to
achieve Ymax.

Table 17.2 Yield under uniform and optimal voltage binning schemes (%)

Circuit  Ymax   VB    1 bin  2 bins  5 bins  10 bins  kopt
C432     96.66  Uni.  60.19  79.04   90.52   94.36    4,514
                Opt.  80.08  88.68   96.42   96.66    10
C1908    98.06  Uni.  71.80  91.46   95.20   97.04    437
                Opt.  89.18  92.88   97.18   98.06    21
C2670    90.15  Uni.  81.12  87.13   89.74   89.95    1,205
                Opt.  85.77  88.34   89.83   90.08    13
C7552    93.46  Uni.  73.94  86.38   91.40   92.34    1,254
                Opt.  87.22  90.30   92.64   93.26    18

From Table 17.2, we can see that the yield of the optimal VB always increases with
the number of bins, with Ymax as the upper bound, and that voltage binning can
significantly improve yield compared with a single supply voltage. Column 8 in
Table 17.2 shows that the number of bins needed to achieve Ymax in the optimal
voltage binning scheme is, on average, only 1.88% of the number needed in the
uniform scheme, which means the optimal voltage binning scheme is much more
economical for reaching the best possible yield.
Figure 17.6 compares the yields from uniform and optimal voltage binning
schemes with the number of bins from 1 to 10 for C432. This figure shows
that the optimal binning scheme always provides a higher yield than the uniform
binning scheme.

Fig. 17.6 Yield under uniform and optimal voltage binning schemes for C432

For the optimal voltage binning scheme, the yield increases more slowly as the bin
number grows, since the greedy algorithm covers the most chips with its first few
levels. A similar phenomenon is observed in the yield results for the other testing
circuits.
For very strict power or frequency constraints, voltage binning provides more
opportunities to improve yield. Figure 17.7 shows the changes in parametric yield
for C432 with and without voltage binning yield optimization as the frequency and
power consumption requirements change, where Pnorm is the normalized power
constraint and fnorm is the normalized frequency constraint. From this figure, we
can see that the parametric yield is sensitive to both performance and power constraints.
As a result, the yield can be substantially increased by binning the supply voltage into
a very small number of levels in the optimal voltage binning scheme. For example, without
the voltage binning technique, the yield falls to 0% when the constraints become
20% stricter, while the voltage binning technique keeps the yield as high as 80%
in the same situation.
Fig. 17.7 Maximum achievable yield as function of power and performance constraints for C2670

Table 17.3 compares the CPU times among different voltage binning schemes and
different numbers of bins. Since the inputs of the presented algorithm in Fig. 17.1
are, in practice, the measured data from real chips, the time cost of measuring the data
is not counted in the time cost of the voltage binning method. In this chapter, however,
the timing and power data are generated from SPICE simulation. There are three
steps in the presented method, as shown in Fig. 17.1. It is easy to see that the time
complexity of steps 1 and 2 is O(N) each, where N is the number of MC sample
points. From [19], step 3 runs within O(N^2 ln N) time. Therefore, the speed
of the voltage binning algorithm is not related to the size of the circuit. Table 17.3
confirms that the cost of the binning technique is insignificant even for the case of
10 bins, and that the time cost does not increase with the number of gates on the chip.
5 Summary

In this chapter, we have presented a voltage binning technique to improve the yield
of chips. First, a novel formulation has been introduced to predict the maximum
number of bins required under the uniform binning scheme from the distribution of
valid supply voltage segment lengths.
References
1. A. Abdollahi, F. Fallah, and M. Pedram, “Runtime mechanisms for leakage current reduction
in CMOS VLSI circuits,” in Proc. Int. Symp. on Low Power Electronics and Design (ISLPED),
Aug 2002, pp. 213–218.
2. A. Abu-Dayya and N. Beaulieu, “Comparison of methods of computing correlated lognormal
sum distributions and outages for digital wireless applications,” in Proc. IEEE Vehicular
Technology Conference, vol. 1, June 1994, pp. 175–179.
3. K. Agarwal, D. Blaauw, and V. Zolotov, “Statistical timing analysis for intra-die process
variations with spatial correlations,” in Proc. Int. Conf. on Computer Aided Design (ICCAD),
Nov 2003, pp. 900–907.
4. J. D. Alexander and V. D. Agrawal, “Algorithms for estimating number of glitches and
dynamic power in CMOS circuits with delay variations,” in IEEE Computer Society Annual
Symposium on VLSI, May 2009, pp. 127–132.
5. S. Bhardwaj, S. Vrudhula, and A. Goel, “A unified approach for full chip statistical timing and
leakage analysis of nanoscale circuits considering intradie process variations,” IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 10, pp. 1812–1825,
Oct 2008.
6. G. Biagetti, S. Orcioni, C. Turchetti, P. Crippa, and M. Alessandrini, “SiSMA: A tool for
efficient analysis of analog CMOS integrated circuits affected by device mismatch,” IEEE
TCAD, pp. 192–207, 2004.
7. S. Borkar, T. Karnik, and V. De, “Design and reliability challenges in nanometer technolo-
gies,” in Proc. Design Automation Conf. (DAC). IEEE Press, 2004, pp. 75–75.
8. S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De, “Parameter variations
and impact on circuits and microarchitecture,” in Proc. Design Automation Conf. (DAC).
IEEE Press, 2003, pp. 338–342.
9. C. Brau, Modern Problems In Classical Electrodynamics. Oxford Univ. Press, 2004.
10. R. Burch, F. Najm, P. Yang, and T. Trick, “A Monte Carlo approach for power estimation,”
IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 1, no. 1, pp. 63–71,
Mar 1993.
11. Y. Cao, Y. Lee, T. Chen, and C. C. Chen, “HiPRIME: hierarchical and passivity reserved
interconnect macromodeling engine for RLKC power delivery,” in Proc. Design Automation
Conf. (DAC), 2002, pp. 379–384.
12. H. Chang and S. Sapatnekar, “Statistical timing analysis under spatial correlations,” IEEE
Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 9,
pp. 1467–1482, Sept. 2005.
13. H. Chang and S. S. Sapatnekar, “Full-chip analysis of leakage power under process variations,
including spatial correlations,” in Proc. IEEE/ACM Design Automation Conference (DAC),
2005, pp. 523–528.
14. H. Chen, S. Neely, J. Xiong, V. Zolotov, and C. Visweswariah, “Statistical modeling and
analysis of static leakage and dynamic switching power,” in Power and Timing Modeling, Op-
timization and Simulation: 18th International Workshop, (PATMOS), Sep 2008, pp. 178–187.
15. R. Chen, L. Zhang, V. Zolotov, C. Visweswariah, and J. Xiong, “Static timing: back to
our roots,” in Proc. Asia South Pacific Design Automation Conf. (ASPDAC), Jan 2008,
pp. 310–315.
16. C. Chiang and J. Kawa, Design for Manufacturability. Springer, 2007.
17. E. Chiprout, “Fast flip-chip power grid analysis via locality and grid shells,” in Proc. Int.
Conf. on Computer Aided Design (ICCAD), Nov 2004, pp. 485–488.
18. T.-L. Chou and K. Roy, “Power estimation under uncertain delays,” Integr. Comput.-Aided
Eng., vol. 5, no. 2, pp. 107–116, Apr 1998.
19. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2nd ed.
MIT Press, 2001.
20. P. Cox, P. Yang, and O. Chatterjee, “Statistical modeling for efficient parametric yield
estimation of MOS VLSI circuits,” in IEEE Int. Electron Devices Meeting, 1983, pp. 391–398.
21. J. Cui, G. Chen, R. Shen, S. X.-D. Tan, W. Yu, and J. Tong, “Variational capacitance
modeling using orthogonal polynomial method,” in Proc. IEEE/ACM International Great
Lakes Symposium on VLSI, 2008, pp. 23–28.
22. L. Daniel, O. C. Siong, L. S. Chay, K. H. Lee, and J. White, “Multi-parameter moment-
matching model-reduction approach for generating geometrically parameterized interconnect
performance models,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and
Systems, vol. 23, no. 5, pp. 678–693, May 2004.
23. S. Dasgupta, “Kharitonov’s theorem revisited,” Systems & Control Letters, vol. 11, no. 5,
pp. 381–384, 1988.
24. V. De and S. Borkar, “Technology and design challenges for low power and high perfor-
mance,” in Proc. Int. Symp. on Low Power Electronics and Design (ISLPED), Aug 1999,
pp. 163–168.
25. L. H. de Figueiredo and J. Stolfi, “Self-validated numerical methods and applications,” in
Brazilian Mathematics Colloquium monographs, IMPA/CNPq, Rio de Janeiro, Brazil, 1997.
26. K. Deb, Multi-objective optimization using evolutionary algorithms. Wiley Publishing,
Hoboken, NJ, 2002.
27. A. Demir, E. Liu, and A. Sangiovanni-Vincentelli, “Time-domain non-Monte Carlo noise
simulation for nonlinear dynamic circuits with arbitrary excitations,” IEEE TCAD, pp. 493–
505, 1996.
28. C. Ding, C. Hsieh, and M. Pedram, “Improving the efficiency of Monte Carlo power
estimation VLSI,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 8, no. 5,
pp. 584–593, Oct 2000.
29. C. Ding, C. Tsui, and M. Pedram, “Gate-level power estimation using tagged probabilistic
simulation,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems,
vol. 17, no. 11, pp. 1099–1107, Nov 1998.
30. Q. Dinh, D. Chen, and M. D. Wong, “Dynamic power estimation for deep submicron circuits
with process variation,” in Proc. Asia South Pacific Design Automation Conf. (ASPDAC), Jan
2010, pp. 587–592.
31. S. W. Director, P. Feldmann, and K. Krishna, “Statistical integrated circuit design,” IEEE J.
of Solid State Circuits, pp. 193–202, 1993.
32. P. Drennan and C. McAndrew, “Understanding MOSFET mismatch for analog design,” IEEE
J. of Solid State Circuits, pp. 450–456, 2003.
33. S. G. Duvall, “Statistical circuit modeling and optimization,” in Intl. Workshop Statistical
Metrology, Jun 2000, pp. 56–63.
34. T. El-Moselhy and L. Daniel, “Stochastic integral equation solver for efficient variation-aware
interconnect extraction,” in Proc. ACM/IEEE Design Automation Conf. (DAC), 2008.
35. J. Fan, N. Mi, S. X.-D. Tan, Y. Cai, and X. Hong, “Statistical model order reduction for
interconnect circuits considering spatial correlations,” in Proc. Design, Automation and Test
In Europe. (DATE), 2007, pp. 1508–1513.
36. P. Feldmann and R. W. Freund, “Efficient linear circuit analysis by Pade approximation via
the Lanczos process,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and
Systems, vol. 14, no. 5, pp. 639–649, May 1995.
37. P. Feldmann and S. W. Director, “Improved methods for IC yield and quality optimization
using surface integrals,” in IEEE/ACM ICCAD, 1991, pp. 158–161.
38. R. Fernandes and R. Vemuri, “Accurate estimation of vector dependent leakage power in
presence of process variations,” in Proc. IEEE Int. Conf. on Computer Design (ICCD),
Oct 2009, pp. 451–458.
39. I. A. Ferzli and F. N. Najm, “Statistical estimation of leakage-induced power grid voltage
drop considering within-die process variations,” in Proc. IEEE/ACM Design Automation
Conference (DAC), 2003, pp. 865–859.
40. I. A. Ferzli and F. N. Najm, “Statistical verification of power grids considering process-
induced leakage current variations,” in Proc. Int. Conf. on Computer Aided Design (ICCAD),
2003, pp. 770–777.
41. G. F. Fishman, Monte Carlo, concepts, algorithms, and Applications. Springer, 1996.
42. P. Friedberg, Y. Cao, J. Cain, R. Wang, J. Rabaey, and C. Spanos, “Modeling within-die spatial
correlation effects for process design co-optimization,” in Proceedings of the 6th International
Symposium on Quality of Electronic Design, 2005, pp. 516–521.
43. O. Gay, D. Coeurjolly, and N. Hurst, “Libaffa: C++ affine arithmetic library for GNU/Linux,”
May 2005, http://savannah.nongnu.org/projects/libaa/.
44. R. Ghanem, “The nonlinear Gaussian spectrum of log-normal stochastic processes and
variables,” Journal of Applied Mechanics, vol. 66, pp. 964–973, December 1999.
45. R. G. Ghanem and P. D. Spanos, Stochastic Finite Elements: A Spectral Approach. Dover
Publications, 2003.
46. P. Ghanta, S. Vrudhula, and S. Bhardwaj, “Stochasic variational analysis of large power
grids considering intra-die correlations,” in Proc. IEEE/ACM Design Automation Conference
(DAC), July 2006, pp. 211–216.
47. P. Ghanta, S. Vrudhula, R. Panda, and J. Wang, “Stochastic power grid analysis considering
process variations,” in Proc. Design, Automation and Test In Europe. (DATE), vol. 2, 2005,
pp. 964–969.
48. A. Ghosh, S. Devadas, K. Keutzer, and J. White, “Estimation of average switching activity in
combinational and sequential circuits,” in Proc. IEEE/ACM Design Automation Conference
(DAC), June 1992, pp. 253–259.
49. L. Giraud, S. Gratton, and E. Martin, “Incremental spectral preconditioners for sequences of
linear systems,” Appl. Num. Math., pp. 1164–1180, 2007.
50. K. Glover, “All optimal Hankel-norm approximations of linear multi-variable systems and
their L∞ error bounds,” Int. J. Control, vol. 39, pp. 1115–1193, 1984.
51. G. H. Golub and C. V. Loan, Matrix Computations, 3rd ed. The Johns Hopkins University
Press, 1996.
52. F. Gong, X. Liu, H. Yu, S. X. Tan, and L. He, “A fast non-Monte-Carlo yield analysis and
optimization by stochastic orthogonal polynomials,” ACM Trans. on Design Automation of
Electronics Systems, 2012, in press.
53. F. Gong, H. Yu, and L. He, “Picap: a parallel and incremental capacitance extraction
considering stochastic process variation,” in Proc. ACM/IEEE Design Automation Conf.
(DAC), 2009, pp. 764–769.
54. F. Gong, H. Yu, and L. He, “Stochastic analog circuit behaviour modelling by point estimation
method,” in ACM International Symposium on Physical Design (ISPD), 2011.
290 References
55. F. Gong, H. Yu, Y. Shi, D. Kim, J. Ren, and L. He, “QuickYield: an efficient global-search
based parametric yield estimation with performance constraints,” in Proc. ACM/IEEE Design
Automation Conf. (DAC), 2010, pp. 392–397.
56. F. Gong, H. Yu, L. Wang, and L. He, “A parallel and incremental extraction of variational ca-
pacitance with stochastic geometric moments,” IEEE Trans. on Very Large Scale Integration
(VLSI) Systems, 2012, in press.
57. R. L. Gorsuch, Factor Analysis. Hillsdale, NJ, 1974.
58. C. J. Gu and J. Roychowdhury, “Model reduction via projection onto nonlinear manifolds,
with applications to analog circuits and biochemical systems,” in Proc. Int. Conf. on Computer
Aided Design (ICCAD), Nov 2008.
59. C. Gu and J. Roychowdhury, “An efficient, fully nonlinear, variability-aware non-Monte-
Carlo yield estimation procedure with applications to SRAM cells and ring oscillators,” in
Proc. Asia South Pacific Design Automation Conf., 2008, pp. 754–761.
60. Z. Hao, R. Shen, S. X.-D. Tan, B. Liu, G. Shi, and Y. Cai, “Statistical full-chip dynamic power
estimation considering spatial correlations,” in Proc. Int. Symposium. on Quality Electronic
Design (ISQED), March 2011, pp. 677–682.
61. Z. Hao, R. Shen, S. X.-D. Tan, and G. Shi, “Performance bound analysis of analog
circuits considering process variations,” in Proc. Design Automation Conf. (DAC), July 2011,
pp. 310–315.
62. Z. Hao, S. X.-D. Tan, and G. Shi, “An efficient statistical chip-level total power estimation
method considering process variations with spatial correlation,” in Proc. Int. Symposium. on
Quality Electronic Design (ISQED), March 2011, pp. 671–676.
63. Z. Hao, S. X.-D. Tan, E. Tlelo-Cuautle, J. Relles, C. Hu, W. Yu, Y. Cai, and G. Shi, “Statistical
extraction and modeling of inductance considering spatial correlation,” Analog Integr Circ Sig
Process, 2012, in press.
64. B. P. Harish, N. Bhat, and M. B. Patil, “Process variability-aware statistical hybrid modeling
of dynamic power dissipation in 65 nm CMOS designs,” in Proc. Int. Conf. on Computing:
Theory and Applications (ICCTA), Mar 2007, pp. 94–98.
65. K. R. Heloue, N. Azizi, and F. N. Najm, “Modeling and estimation of full-chip leakage current
considering within-die correlation,” in Proc. IEEE/ACM Design Automation Conference
(DAC), 2007, pp. 93–98.
66. F. Hu and V. D. Agrawal, “Enhanced dual-transition probabilistic power estimation with
selective supergate analysis,” in Proc. IEEE Int. Conf. on Computer Design (ICCD), Oct 2005,
pp. 366–372.
67. G. M. Huang, W. Dong, Y. Ho, and P. Li, “Tracing SRAM separatrix for dynamic noise margin
analysis under device mismatch,” in Proc. of IEEE Int. Behavioral Modeling and Simulation
Conf., 2007, pp. 6–10.
68. A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis. Wiley, 2001.
69. “Intel Pentium processor E5200 series specifications,” Intel Co., http://ark.intel.com/Product.
aspx?id=37212.
70. A. Iserles, A First Course in the Numerical Analysis of Differential Equations, 3rd ed.
Cambridge University Press, 1996.
71. “International technology roadmap for semiconductors (ITRS), 2010 update,” 2010, http://
public.itrs.net.
72. J. D. Jackson, Classical Electrodynamics. John Wiley and Sons, 1975.
73. H. Jiang, M. Marek-Sadowska, and S. R. Nassif, “Benefits and costs of power-gating
technique,” in Proc. IEEE Int. Conf. on Computer Design (ICCD), Oct 2005, pp. 559–566.
74. R. Jiang, W. Fu, J. M. Wang, V. Lin, and C. C.-P. Chen, “Efficient statistical capacitance
variability modeling with orthogonal principle factor analysis,” in Proc. Int. Conf. on
Computer Aided Design (ICCAD), 2005, pp. 683–690.
75. I. T. Jolliffe, Principal Component Analysis. Springer-Verlag, 1986.
76. M. Kamon, M. Tsuk, and J. White, “FastHenry: a multipole-accelerated 3D inductance
extraction program,” IEEE Trans. on Microwave Theory and Techniques, pp. 1750–1758,
Sept. 1994.
77. S. Kapur and D. Long, “IES3: A fast integral equation solver for efficient 3-dimensional
extraction,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 1997.
78. T. Karnik, S. Borkar, and V. De, “Sub-90 nm technologies: challenges and opportunities for
CAD,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), San Jose, CA, Nov 2002,
pp. 203–206.
79. V. L. Kharitonov, “Asymptotic stability of an equilibrium position of a family of systems of
linear differential equations,” Differentsial'nye Uravneniya, vol. 14, pp. 2086–2088, 1978.
80. J. Kim, K. Jones, and M. Horowitz, “Fast, non-Monte-Carlo estimation of transient perfor-
mance variation due to device mismatch,” in Proc. IEEE/ACM Design Automation Conference
(DAC), 2007.
81. A. Klimke, “Sparse Grid Interpolation Toolbox—user’s guide,” University of Stuttgart, Tech.
Rep. IANS report 2006/001, 2006.
82. A. Klimke and B. Wohlmuth, “Algorithm 847: spinterp: Piecewise multilinear hierarchical
sparse grid interpolation in MATLAB,” ACM Transactions on Mathematical Software,
vol. 31, no. 4, 2005.
83. L. Kolev, V. Mladenov, and S. Vladov, “Interval mathematics algorithms for tolerance
analysis,” IEEE Trans. on Circuits and Systems, vol. 35, no. 8, pp. 967–975, Aug 1988.
84. J. N. Kozhaya, S. R. Nassif, and F. N. Najm, “A multigrid-like technique for power grid
analysis,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 21,
no. 10, pp. 1148–1160, Oct 2002.
85. M. W. Kuemerle, S. K. Lichtensteiger, D. W. Douglas, and I. L. Wemple, “Integrated circuit
design closure method for selective voltage binning,” in U.S. Patent 7475366, Jan 2009.
86. Y. S. Kumar, J. Li, C. Talarico, and J. Wang, “A probabilistic collocation method based
statistical gate delay model considering process variations and multiple input switching,” in
Proc. Design, Automation and Test In Europe. (DATE), 2005, pp. 770–775.
87. A. Labun, “Rapid method to account for process variation in full-chip capacitance extraction,”
IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, pp. 941–
951, June 2004.
88. K. Lampaert, G. Gielen, and W. Sansen, “Direct performance-driven placement of mismatch-
sensitive analog circuits,” in Proc. IEEE/ACM Design Automation Conference (DAC), 1995,
pp. 445–449.
89. Y. Lee, Y. Cao, T. Chen, J. Wang, and C. Chen, “HiPRIME: Hierarchical and passivity
preserved interconnect macromodeling engine for RLKC power delivery,” IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 6, pp. 797–806, 2005.
90. A. Levkovich, E. Zeheb, and N. Cohen, “Frequency response envelopes of a family of
uncertain continuous-time systems,” IEEE Trans. on Circuits and Systems I: Fundamental
Theory and Applications, vol. 42, no. 3, pp. 156–165, Mar 1995.
91. D. Li and S. X.-D. Tan, “Statistical analysis of large on-chip power grid networks by
variational reduction scheme,” Integration, the VLSI Journal, vol. 43, no. 2, pp. 167–175,
April 2010.
92. D. Li, S. X.-D. Tan, G. Chen, and X. Zeng, “Statistical analysis of on-chip power grid
networks by variational extended truncated balanced realization method,” in Proc. Asia South
Pacific Design Automation Conf. (ASPDAC), Jan 2009, pp. 272–277.
93. D. Li, S. X.-D. Tan, and B. McGaughy, “ETBR: Extended truncated balanced realization
method for on-chip power grid network analysis,” in Proc. Design, Automation and Test In
Europe. (DATE), 2008, pp. 432–437.
94. D. Li, S. X.-D. Tan, E. H. Pacheco, and M. Tirumala, “Fast analysis of on-chip power grid
circuits by extended truncated balanced realization method,” IEICE Trans. on Fundamentals
of Electronics, Communications and Computer Science (IEICE), vol. E92-A, no. 12, pp. 3061–
3069, 2009.
95. P. Li and W. Shi, “Model order reduction of linear networks with massive ports via frequency-
dependent port packing,” in Proc. Design Automation Conf. (DAC), 2006, pp. 267–272.
96. T. Li, W. Zhang, and Z. Yu, “Full-chip leakage analysis in nano-scale technologies:
Mechanisms, variation sources, and verification,” in Proc. Design Automation Conf. (DAC),
June 2008, pp. 594–599.
97. X. Li, J. Le, L. Pileggi, and A. Strojwas, “Projection-based performance modeling for
inter/intra-die variations,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 2005,
pp. 721–727.
98. X. Li, J. Le, and L. T. Pileggi, “Projection-based statistical analysis of full-chip leakage power
with non-log-normal distributions,” in Proc. IEEE/ACM Design Automation Conference
(DAC), July 2006, pp. 103–108.
99. Y. Lin and D. Sylvester, “Runtime leakage power estimation technique for combinational
circuits,” in Proc. Asia South Pacific Design Automation Conf. (ASPDAC), Jan 2007,
pp. 660–665.
100. B. Liu, F. V. Fernandez, and G. Gielen, “An accurate and efficient yield optimization method
for analog circuits based on computing budget allocation and memetic search technique,” in
Proc. Design Automation and Test Conf. in Europe, 2010, pp. 1106–1111.
101. Y. Liu, S. Nassif, L. Pileggi, and A. Strojwas, “Impact of interconnect variations on the clock
skew of a gigahertz microprocessor,” in Proc. IEEE/ACM Design Automation Conference
(DAC), 2000, pp. 168–171.
102. Y. Liu, L. T. Pileggi, and A. J. Strojwas, “Model order-reduction of RC(L) interconnect
including variational analysis,” in Proc. IEEE/ACM Design Automation Conference (DAC),
1999, pp. 201–206.
103. R. Marler and J. Arora, “Survey of multi-objective optimization methods for engineering,”
Struct Multidisc Optim 26, pp. 369–395, 2004.
104. H. Masuda, S. Ohkawa, A. Kurokawa, and M. Aoki, “Challenge: Variability characterization
and modeling for 65- to 90-nm processes,” in Proc. IEEE Custom Integrated Circuits Conf.,
2005.
105. C. McAndrew, J. Bates, R. Ida, and P. Drennan, “Efficient statistical BJT modeling, why beta
is more than Ic/Ib,” in Proc. IEEE Bipolar/BiCMOS Circuits and Tech. Meeting, 1997.
106. “MCNC benchmark circuit placements,” http://vlsicad.ucsd.edu/GSRC/bookshelf/Slots/
nPlacement/.
107. N. Mi, J. Fan, and S. X.-D. Tan, “Simulation of power grid networks considering wires and
lognormal leakage current variations,” in Proc. IEEE International Workshop on Behavioral
Modeling and Simulation (BMAS), Sept. 2006, pp. 73–78.
108. N. Mi, J. Fan, and S. X.-D. Tan, “Statistical analysis of power grid networks considering
lognormal leakage current variations with spatial correlation,” in Proc. IEEE Int. Conf. on
Computer Design (ICCD), 2006, pp. 56–62.
109. N. Mi, J. Fan, S. X.-D. Tan, Y. Cai, and X. Hong, “Statistical analysis of on-chip power
delivery networks considering lognormal leakage current variations with spatial correlations,”
IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications, vol. 55, no. 7,
pp. 2064–2075, Aug 2008.
110. N. Mi, S. X.-D. Tan, Y. Cai, and X. Hong, “Fast variational analysis of on-chip power grids
by stochastic extended krylov subspace method,” IEEE Trans. on Computer-Aided Design of
Integrated Circuits and Systems, vol. 27, no. 11, pp. 1996–2006, 2008.
111. N. Mi, S. X.-D. Tan, P. Liu, J. Cui, Y. Cai, and X. Hong, “Stochastic extended Krylov
subspace method for variational analysis of on-chip power grid networks,” in Proc. Int. Conf.
on Computer Aided Design (ICCAD), 2007, pp. 48–53.
112. B. Moore, “Principal component analysis in linear systems: Controllability, observability,
and model reduction,” IEEE Trans. Automat. Contr., vol. 26, no. 1, pp. 17–32, 1981.
113. R. E. Moore, Interval Analysis. Prentice-Hall, 1966.
114. S. Mukhopadhyay and K. Roy, “Modeling and estimation of total leakage current in nano-
scaled CMOS devices considering the effect of parameter variation,” in Proc. Int. Symp. on
Low Power Electronics and Design (ISLPED), 2003, pp. 172–175.
115. K. Nabors and J. White, “FastCap: A multipole accelerated 3-D capacitance extraction
program,” IEEE TCAD, pp. 1447–1459, Nov 1991.
116. F. Najm, “Transition density: a new measure of activity in digital circuits,” IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, vol. 12, no. 2, pp. 310–323, Feb
1993.
117. F. Najm, R. Burch, P. Yang, and I. Hajj, “Probabilistic simulation for reliability analysis of
CMOS VLSI circuits,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and
Systems, vol. 9, no. 4, pp. 439–450, Apr 1990.
118. K. Nabors and J. White, “FastCap: a multipole accelerated 3D capacitance extraction
program,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems,
vol. 10, no. 11, pp. 1447–1459, 1991.
119. S. Narendra, V. De, S. Borkar, D. A. Antoniadis, and A. P. Chandrakasan, “Full-chip
subthreshold leakage power prediction and reduction techniques for sub-0.18-μm CMOS,”
IEEE J. Solid-State Circuits, vol. 39, no. 3, pp. 501–510, Mar 2004.
120. S. Nassif, “Delay variability: sources, impact and trends,” in Proc. IEEE Int. Solid-State
Circuits Conf., San Francisco, CA, Feb 2000, pp. 368–369.
121. S. Nassif, “Design for variability in DSM technologies,” in Proc. Int. Symposium. on Quality
Electronic Design (ISQED), San Jose, CA, Mar 2000, pp. 451–454.
122. S. R. Nassif, “Model to hardware correlation for nm-scale technologies,” in Proc. IEEE Inter-
national Workshop on Behavioral Modeling and Simulation (BMAS), Sept 2007, keynote
speech.
123. S. R. Nassif, “Power grid analysis benchmarks,” in Proc. Asia South Pacific Design Auto-
mation Conf. (ASPDAC), 2008, pp. 376–381.
124. S. R. Nassif and K. J. Nowka, “Physical design challenges beyond the 22 nm node,” in Proc.
ACM Int. Sym. Physical Design (ISPD), 2010, pp. 13–14.
125. “Nangate open cell library,” http://www.nangate.com/.
126. E. Novak and K. Ritter, “Simple cubature formulas with high polynomial exactness,”
Constructive Approximation, vol. 15, no. 4, pp. 499–522, Dec 1999.
127. A. Odabasioglu, M. Celik, and L. Pileggi, “PRIMA: Passive reduced-order interconnect
macro-modeling algorithm,” IEEE TCAD, pp. 645–654, 1998.
128. J. Oehm and K. Schumacher, “Quality assurance and upgrade of analog characteristics by fast
mismatch analysis option in network analysis environment,” IEEE J. of Solid State Circuits,
pp. 865–871, 1993.
129. M. Orshansky, L. Milor, and C. Hu, “Characterization of spatial intrafield gate CD variability,
its impact on circuit performance, and spatial mask-level correction,” IEEE Trans. on
Semiconductor Manufacturing, vol. 17, no. 1, pp. 2–11, Feb 2004.
130. C. C. Paige and M. A. Saunders, “Solution of sparse indefinite systems of linear equations,”
SIAM J. on Numerical Analysis, vol. 12, no. 4, pp. 617–629, September 1975.
131. S. Pant, D. Blaauw, V. Zolotov, S. Sundareswaran, and R. Panda, “A stochastic approach
to power grid analysis,” in Proc. IEEE/ACM Design Automation Conference (DAC), 2004,
pp. 171–176.
132. A. Papoulis and S. Pillai, Probability, Random Variables and Stochastic Processes. McGraw-
Hill, 2001.
133. M. Pelgrom, A. Duinmaijer, and A. Welbers, “Matching properties of MOS transistors,” IEEE
J. of Solid State Circuits, pp. 1433–1439, 1989.
134. J. R. Phillips and L. M. Silveira, “Poor man’s TBR: a simple model reduction scheme,” IEEE
Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 1, pp. 43–
55, 2005.
135. L. Pileggi, G. Keskin, X. Li, K. Mai, and J. Proesel, “Mismatch analysis and statistical design
at 65 nm and below,” in Proc. IEEE Custom Integrated Circuits Conf., 2008, pp. 9–12.
136. L. T. Pillage and R. A. Rohrer, “Asymptotic waveform evaluation for timing analysis,” IEEE
Trans. on Computer-Aided Design of Integrated Circuits and Systems, pp. 352–366, April
1990.
137. L. T. Pillage, R. A. Rohrer, and C. Visweswariah, Electronic Circuit and System Simulation
Methods. New York: McGraw-Hill, 1994.
138. S. Pilli and S. Sapatnekar, “Power estimation considering statistical IC parametric variations,”
in Proc. IEEE Int. Symp. on Circuits and Systems (ISCAS), vol. 3, June 1997, pp. 1524–1527.
139. “Predictive Technology Model,” http://www.eas.asu.edu/ptm/.
140. L. Qian, D. Zhou, S. Wang, and X. Zeng, “Worst case analysis of linear analog circuit
performance based on Kharitonov’s rectangle,” in Proc. IEEE Int. Conf. on Solid-State and
Integrated Circuit Technology (ICSICT), Nov 2010.
141. W. T. Rankin, III, “Efficient parallel implementations of multipole based n-body algorithms,”
Ph.D. dissertation, Duke University, Durham, NC, USA, 1999.
142. R. Rao, A. Srivastava, D. Blaauw, and D. Sylvester, “Statistical analysis of subthreshold
leakage current for VLSI circuits,” IEEE Trans. on Very Large Scale Integration (VLSI)
Systems, vol. 12, no. 2, pp. 131–139, Feb 2004.
143. J. Relles, M. Ngan, E. Tlelo-Cuautle, S. X.-D. Tan, C. Hu, W. Yu, and Y. Cai, “Statistical
extraction and modeling of 3D inductance with spatial correlation,” in Proc. IEEE Interna-
tional Workshop on Symbolic and Numerical Methods, Modeling and Applications to Circuit
Design, Oct 2010.
144. M. Rewienski and J. White, “A trajectory piecewise-linear approach to model order reduction
and fast simulation of nonlinear circuits and micromachined devices,” IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, vol. 22, no. 2, pp. 155–170,
Feb 2003.
145. J. Roy, S. Adya, D. Papa, and I. Markov, “Min-cut floorplacement,” IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, vol. 25, no. 7, pp. 1313–1326,
July 2006.
146. J. Roychowdhury, “Reduced-order modelling of time-varying systems,” in Proc. Asia South
Pacific Design Automation Conf. (ASPDAC), Jan 1999, pp. 53–56.
147. A. E. Ruehli, “Equivalent circuit models for three-dimensional multiconductor systems,”
IEEE Trans. on Microwave Theory and Techniques, pp. 216–220, 1974.
148. R. Rutenbar, “Next-generation design and EDA challenges,” in Proc. Asia South Pacific
Design Automation Conf. (ASPDAC), January 2007, keynote speech.
149. Y. Saad and M. H. Schultz, “GMRES: a generalized minimal residual algorithm for solving
nonsymmetric linear systems,” SIAM J. on Sci and Sta. Comp., pp. 856–869, 1986.
150. Y. Saad, Iterative methods for sparse linear systems. SIAM, 2003.
151. S. B. Samaan, “The impact of device parameter variations on the frequency and performance
of VLSI chips,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), ser. ICCAD ’04,
2004, pp. 343–346.
152. Y. Sawaragi, H. Nakayama, and T. Tanino, Theory of Multiobjective Optimization (vol. 176
of Mathematics in Science and Engineering). Orlando, FL: Academic Press, 1985.
153. F. Schenkel, M. Pronath, S. Zizala, R. Schwencker, H. Graeb, and K. Antreich, “Mismatch
analysis and direct yield optimization by specwise linearization and feasibility-guided
search,” in Proc. IEEE/ACM Design Automation Conference (DAC), 2001.
154. A. S. Sedra and K. C. Smith, Microelectronic Circuits. Oxford University Press, USA, 2009.
155. R. Shen, N. Mi, S. X.-D. Tan, Y. Cai, and X. Hong, “Statistical modeling and analysis of
chip-level leakage power by spectral stochastic method,” in Proc. Asia South Pacific Design
Automation Conf. (ASPDAC), Jan 2009, pp. 161–166.
156. R. Shen, S. X.-D. Tan, J. Cui, W. Yu, Y. Cai, and G. Chen, “Variational capacitance extraction
and modeling based on orthogonal polynomial method,” IEEE Trans. on Very Large Scale
Integration (VLSI) Systems, vol. 18, no. 11, pp. 1556–1565, 2010.
157. R. Shen, S. X.-D. Tan, N. Mi, and Y. Cai, “Statistical modeling and analysis of chip-level
leakage power by spectral stochastic method,” Integration, the VLSI Journal, vol. 43, no. 1,
pp. 156–165, January 2010.
158. R. Shen, S. X.-D. Tan, and J. Xiong, “A linear algorithm for full-chip statistical leakage power
analysis considering weak spatial correlation,” in Proc. Design Automation Conf. (DAC), Jun.
2010, pp. 481–486.
159. R. Shen, S. X.-D. Tan, and J. Xiong, “A linear statistical analysis for full-chip leakage power
with spatial correlation,” in Proc. IEEE/ACM International Great Lakes Symposium on VLSI
(GLSVLSI), May 2010, pp. 227–232.
160. C.-J. Shi and X.-D. Tan, “Canonical symbolic analysis of large analog circuits with determi-
nant decision diagrams,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and
Systems, vol. 19, no. 1, pp. 1–18, Jan 2000.
161. C.-J. Shi and X.-D. Tan, “Compact representation and efficient generation of s-expanded
symbolic network functions for computer-aided analog circuit design,” IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, vol. 20, no. 7, pp. 813–827, April
2001.
162. C.-J. R. Shi and M. W. Tian, “Simulation and sensitivity of linear analog circuits under
parameter variations by robust interval analysis,” ACM Trans. Des. Autom. Electron. Syst.,
vol. 4, pp. 280–312, July 1999.
163. W. Shi, J. Liu, N. Kakani, and T. Yu, “A fast hierarchical algorithm for 3-d capacitance
extraction,” in Proc. ACM/IEEE Design Automation Conf. (DAC), 1998.
164. W. Shi, J. Liu, N. Kakani, and T. Yu, “A fast hierarchical algorithm for 3-d capacitance
extraction,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems,
vol. 21, no. 3, pp. 330–336, March 2002.
165. R. W. Shonkwiler and L. Lefton, An introduction to parallel and vector scientific computing.
Cambridge University Press, 2006.
166. V. Simoncini and D. Szyld, “Recent computational developments in Krylov subspace methods
for linear systems,” Num. Lin. Alg. with Appl., pp. 1–59, 2007.
167. R. S. Soin and R. Spence, “Statistical exploration approach to design centering,” Proceedings
of the Institution of Electrical Engineers, pp. 260–269, 1980.
168. R. Spence and R. Soin, Tolerance Design of Electronic Circuits. Addison-Wesley, Reading,
MA., 1988.
169. A. Srivastava, R. Bai, D. Blaauw, and D. Sylvester, “Modeling and analysis of leakage power
considering within-die process variations,” in Proc. Int. Symp. on Low Power Electronics and
Design (ISLPED), Aug 2002, pp. 64–67.
170. A. Srivastava, D. Sylvester, and D. Blaauw, Statistical Analysis and Optimization for VLSI:
Timing and Power. Springer, 2005.
171. G. W. Stewart, Matrix Algorithms, VOL II. SIAM Publisher, 2001.
172. B. G. Streetman and S. Banerjee, Solid-State Electronic Devices, 5th ed. Prentice Hall, 2000.
173. E. Süli and D. Mayers, An Introduction to Numerical Analysis. Cambridge University Press, 2006.
174. S. X.-D. Tan, W. Guo, and Z. Qi, “Hierarchical approach to exact symbolic analysis of large
analog circuits,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems,
vol. 24, no. 8, pp. 1241–1250, August 2005.
175. S. X.-D. Tan and C.-J. Shi, “Efficient DDD-based interpretable symbolic characterization of
large analog circuits,” IEICE Trans. on Fundamentals of Electronics, Communications and
Computer Science (IEICE), vol. E86-A, no. 12, pp. 3112–3118, Dec 2003.
176. S. X.-D. Tan and C.-J. Shi, “Efficient approximation of symbolic expressions for analog
behavioral modeling and analysis,” IEEE Trans. on Computer-Aided Design of Integrated
Circuits and Systems, vol. 23, no. 6, pp. 907–918, June 2004.
177. S. X.-D. Tan and L. He, Advanced Model Order Reduction Techniques in VLSI Design.
Cambridge University Press, 2007.
178. R. Teodorescu, B. Greskamp, J. Nakano, S. R. Sarangi, A. Tiwari, and J. Torrellas, “A
model of parameter variation and resulting timing errors for microarchitects,” in Workshop
on Architectural Support for Gigascale Integration (ASGI), Jun 2007.
179. W. Tian, X.-T. Ling, and R.-W. Liu, “Novel methods for circuit worst-case tolerance analysis,”
IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications, vol. 43, no. 4,
pp. 272–278, Apr 1996.
180. S. Tiwary and R. Rutenbar, “Generation of yield-aware Pareto surfaces for hierarchical circuit
design space exploration,” in Proc. IEEE/ACM Design Automation Conference (DAC), 2006,
pp. 31–36.
181. S. K. Tiwary and R. A. Rutenbar, “Faster, parametric trajectory-based macromodels via
localized linear reductions,” in Proc. Int. Conf. on Computer Aided Design (ICCAD),
Nov 2006, pp. 876–883.
182. J. W. Tschanz, S. Narendra, R. Nair, and V. De, “Effectiveness of adaptive supply voltage and
body bias for reducing impact of parameter variations in low power and high performance
microprocessors,” IEEE J. Solid-State Circuits, vol. 38, no. 5, pp. 826–829, May 2003.
183. C.-Y. Tsui, M. Pedram, and A. Despain, “Efficient estimation of dynamic power consumption
under a real delay model,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), Nov 1993,
pp. 224–228.
184. “UMFPACK,” http://www.cise.ufl.edu/research/sparse/umfpack/.
185. J. Vlach and K. Singhal, Computer Methods for Circuit Analysis and Design. New York, NY:
Van Nostrand Reinhold, 1995.
186. M. Vratonjic, B. R. Zeydel, and V. G. Oklobdzija, “Circuit sizing and supply-voltage selection
for low-power digital circuit design,” in Power and Timing Modeling, Optimization and
Simulation: 18th International Workshop, (PATMOS), 2006, pp. 148–156.
187. S. Vrudhula, J. M. Wang, and P. Ghanta, “Hermite polynomial based interconnect analysis
in the presence of process variations,” IEEE Trans. on Computer-Aided Design of Integrated
Circuits and Systems, vol. 25, no. 10, 2006.
188. C.-Y. Wang and K. Roy, “Maximum power estimation for CMOS circuits using deterministic
and statistical approaches,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems,
vol. 6, no. 1, pp. 134–140, Mar 1998.
189. H. Wang, H. Yu, and S. X.-D. Tan, “Fast analysis of nontree-clock network considering
environmental uncertainty by parameterized and incremental macromodeling,” in Proc.
IEEE/ACM Asia South Pacific Design Automation Conf. (ASPDAC), 2009, pp. 379–384.
190. J. Wang, P. Ghanta, and S. Vrudhula, “Stochastic analysis of interconnect performance in the
presence of process variations,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), Nov
2004, pp. 880–886.
191. J. M. Wang and T. V. Nguyen, “Extended Krylov subspace method for reduced order analysis
of linear circuit with multiple sources,” in Proc. IEEE/ACM Design Automation Conference
(DAC), 2000, pp. 247–252.
192. J. M. Wang, B. Srinivas, D. Ma, C. C.-P. Chen, and J. Li, “System-level power and thermal
modeling and analysis by orthogonal polynomial based response surface approach (OPRS),”
in Proc. Int. Conf. on Computer Aided Design (ICCAD), Nov 2005, pp. 727–734.
193. M. S. Warren and J. K. Salmon, “A parallel hashed oct-tree n-body algorithm,” in Proceedings
of the 1993 ACM/IEEE conference on Supercomputing, ser. Supercomputing ’93, 1993,
pp. 12–21.
194. D. Wilton, S. Rao, A. Glisson, D. Schaubert, O. Al-Bundak, and C. Butler, “Potential integrals
for uniform and linear source distributions on polygonal and polyhedral domains,” IEEE
Trans. on Antennas and Propagation, vol. AP-32, no. 3, pp. 276–281, March 1984.
195. J. Xiong, V. Zolotov, and L. He, “Robust extraction of spatial correlation,” IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 4, 2007.
196. D. Xiu and G. Karniadakis, “The Wiener-Askey polynomial chaos for stochastic differential
equations,” SIAM J. Scientific Computing, vol. 24, no. 2, pp. 619–644, Oct 2002.
197. D. Xiu and G. Karniadakis, “Modeling uncertainty in flow simulations via generalized
polynomial chaos,” J. of Computational Physics, vol. 187, no. 1, pp. 137–167, May 2003.
198. H. Xu, R. Vemuri, and W. Jone, “Run-time active leakage reduction by power gating and
reverse body biasing: An energy view,” in Proc. IEEE Int. Conf. on Computer Design (ICCD),
Oct 2008, pp. 618–625.
199. S. Yan, V. Sarim, and W. Shi, “Sparse transformation and preconditioners for 3-d capacitance
extraction,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems,
vol. 24, no. 9, pp. 1420–1426, 2005.
200. Z. Ye and Z. Yu, “An efficient algorithm for modeling spatially-correlated process variation in
statistical full-chip leakage analysis,” in Proc. Int. Conf. on Computer Aided Design (ICCAD),
Nov 2009, pp. 295–301.
201. L. Ying, G. Biros, D. Zorin, and H. Langston, “A new parallel kernel-independent fast multi-
pole method,” in IEEE Conf. on High Performance Networking and Computing, 2003.
202. H. Yu, X. Liu, H. Wang, and S. X.-D. Tan, “A fast analog mismatch analysis by an incremental
and stochastic trajectory piecewise linear macromodel,” in Proc. Asia South Pacific Design
Automation Conf. (ASPDAC), Jan 2010, pp. 211–216.
203. H. Yu and S. X.-D. Tan, “Recent advance in computational prototyping for analysis of
high-performance analog/RF ICs,” in IEEE International Conf. on ASIC (ASICON), 2009,
pp. 760–764.
204. W. Yu, C. Hu, and W. Zhang, “Variational capacitance extraction of on-chip interconnects
based on continuous surface model,” in Proc. IEEE/ACM Design Automation Conference
(DAC), July 2009, pp. 758–763.
205. W. Zhang, W. Yu, Z. Wang, Z. Yu, R. Jiang, and J. Xiong, “An efficient method for chip-level
statistical capacitance extraction considering process variations with spatial correlation,” in
Proc. Design, Automation and Test In Europe. (DATE), Mar 2008, pp. 580–585.
206. M. Zhao, R. V. Panda, S. S. Sapatnekar, and D. Blaauw, “Hierarchical analysis of power
distribution networks,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and
Systems, vol. 21, no. 2, pp. 159–168, Feb 2002.
207. Y. Zhou, Z. Li, Y. Tian, W. Shi, and F. Liu, “A new methodology for interconnect parasitics
extraction considering photo-lithography effects,” in Proc. Asia South Pacific Design Automa-
tion Conf. (ASPDAC), Jan 2007, pp. 450–455.
208. H. Zhu, X. Zeng, W. Cai, J. Xue, and D. Zhou, “A sparse grid based spectral stochastic collo-
cation method for variations-aware capacitance extraction of interconnects under nanometer
process technology,” in Proc. Design, Automation and Test In Europe. (DATE), Mar 2007,
pp. 1514–1519.
209. Z. Zhu and J. Phillips, “Random sampling of moment graph: a stochastic Krylov-
reduction algorithm,” in Proc. Design, Automation and Test In Europe. (DATE), April 2007,
pp. 1502–1507.
210. Z. Zhu and J. White, “FastSies: a fast stochastic integral equation solver for modeling
the rough surface effect,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 2005,
pp. 675–682.
211. Z. Zhu, B. Song, and J. White, “Algorithms in FastImp: a fast and wideband impedance
extraction program for complicated 3-d geometries,” in Proc. Design Automation Conf.
(DAC). New York, NY, USA: ACM, 2003, pp. 712–717.
212. Z. Zhu, J. White, and A. Demir, “A stochastic integral equation method for modeling the
rough surface effect on interconnect capacitance,” in Proc. Int. Conf. on Computer Aided
Design (ICCAD), 2004, pp. 887–891.
213. V. Zolotov, C. Visweswariah, and J. Xiong, “Voltage binning under process variation,” in Proc.
Int. Conf. on Computer Aided Design (ICCAD), Nov 2009, pp. 425–432.
214. Y. Zou, Y. Cai, Q. Zhou, X. Hong, S. X.-D. Tan, and L. Kang, “Practical implementation of
stochastic parameterized model order reduction via Hermite polynomial chaos,” in Proc. Asia
South Pacific Design Automation Conf. (ASPDAC), Jan 2007, pp. 367–372.
Index

A
Adaptive voltage supply
  yield optimization, 273
Affine interval, 13
  performance bound analysis, 222
Arnoldi algorithm
  capacitance extraction, 194, 199
  power grid, 150
Askey scheme, 29
  yield analysis, 257
Augmented potential coefficient matrix
  capacitance extraction, 167

B
Balancing
  TBR, 146
Baseline
  yield, 271
BEM
  boundary element method, 163
  capacitance extraction, 165, 184
  inductance extraction, 209
BEOL
  back-end-of-the-line, 111
Bin voltage level
  yield, 275
Binning algorithm
  yield, 275
Block-Arnoldi orthonormalization, 243
BPV
  backward propagation of variance, 237
  mismatch, 242

C
CAD
  developers, 9
  inductance extraction, 209
Capacitance extraction, 163
Capacitance matrix
  power grid, 111
CDF
  cumulative distribution function, 19
Charge distribution
  capacitance extraction, 165
Chebyshev’s inequality, 17–18
Cholesky decomposition, 26
CMP, 3
Collocation-based method
  spectral stochastic method, 31
Collocation-based spectral stochastic method
  capacitance extraction, 163
  leakage analysis, 65
Conductance matrix, 110
Continuous random variable, 16
Corner-based, 3
Correlation index neighbor set
  statistical leakage analysis, 67
Covariance, 21
Covariance matrix, 8, 23, 25
  statistical leakage analysis, 43, 57
Critical dimension, 7

D
DAE
  differential-algebra-equation, 235
  yield, 258
DDD
  determinant decision diagram, 222
Decancellation
  performance bound analysis, 227
Delay
  dynamic power, 86
  inductance extraction, 217
  power grid, 107
  yield, 254
Deterministic current source, 134
Discrete probability distribution, 18
Discrete random variable, 16
Dishing, 7
Downward pass, 185
Dynamic current
  power grid, 128
Dynamic power, 10
  yield optimization, 273
Dynamic power analysis, 85

E
Effective channel length
  dynamic power analysis, 84
  power grid, 112
  statistical leakage analysis, 41
  yield, 257, 274
EKS, 11
  extended Krylov subspace, 127
Extended Krylov subspace method, 11
  power grid, 128
Electrical parameter, 256
Electromigration, 4
ETBR, 11
  extended truncated balanced realization, 11, 145
  power grid, 130, 148
Event, 15
Expectation, 16
Experiment, 15
Exponential correlation model
  capacitance extraction, 166
  inductance extraction, 211

F
Fast multipole method, 12
Filament current, 211
Filament voltage, 211
FMM
  fast-multipole-method, 183
Free space Green function, 168

G
Galerkin-based method, 33
  spectral stochastic method, 31
Galerkin-based spectral stochastic method, 11, 166
  capacitance extraction, 164, 166
  power grid, 113, 136
Gate oxide leakage
  statistical leakage analysis, 41
Gate oxide thickness
  statistical leakage analysis, 41
  dynamic power analysis, 84
Gaussian-Hermite quadrature
  fundamental, 31
Gaussian distribution, 19
Gaussian quadrature
  fundamental, 31
  inductance extraction, 212
  leakage analysis, 10
  statistical leakage analysis, 59
Gaussian
  capacitance extraction, 166
  dynamic power analysis, 90
  inductance extraction, 211
  mismatch, 241
  power grid, 111
  random variable, 7
  statistical leakage analysis, 58
  yield, 256
  yield optimization, 275
Geometric variation
  capacitance extraction, 166
  inductance extraction, 209
Geometrical parameter, 256
Glitch width variation
  dynamic power analysis, 89
Glitch
  dynamic power analysis, 86
Global aggregation, 245
GM
  geometrical moment, 186
GMRES
  capacitance extraction, 183
  general minimal residue, 164
Gradient-based yield optimization, 256
Gramian
  power grid, 145, 147
Greedy algorithm, 13
H
Hermite polynomials
  total power analysis, 10, 95
  yield, 257
HOC
  Hermite polynomial chaos, 33
Hot carrier injection, 4
HPC
  capacitance extraction, 163, 166
  Hermite polynomial chaos, 29
  inductance extraction, 214–215
  power grid, 115, 131
  statistical leakage analysis, 40
  total power analysis, 97

I
Idle leakage, 77
IEKS
  improved extended Krylov subspace methods, 11
IGMRES
  incremental GMRES, 195
Incremental aggregation, 246
Independent, 20
  capacitance extraction, 167
  power grid, 110
  statistical leakage analysis, 57, 67
Inductance extraction, 209
Inductance matrix, 210
Inner product
  capacitance extraction, 171
  mismatch, 241
  power grid, 132
Inter-die, 6
  fundamentals, 23
  power grid, 111
  statistical leakage analysis, 45, 57
  yield optimization, 275
Interval arithmetic
  performance bound analysis, 222
Intra-die, 6
  fundamentals, 23
  power grid, 111
  statistical leakage analysis, 45, 55
  yield optimization, 275
IsTPWL
  incremental stochastic TPWL, 236
  mismatch, 247

K
KCL
  Kirchhoff’s current law, 211
  yield, 258
Kharitonov’s functions, 13
  performance bound analysis, 222, 228
Kharitonov’s polynomials, 13
Krylov subspace
  capacitance extraction, 194

L
Layout dependent variation, 7
LE
  local expansion, 187
Leakage power, 39
  yield optimization, 273
Local tangent subspace
  mismatch, 244
Log-normal, 19
  power grid, 111, 134
  statistical leakage analysis, 41
Log-normal leakage current, 11
Look-up table, 10
  capacitance extraction, 171
  gate-based leakage analysis, 41
  LUT, 66
LU decomposition, 184
Lyapunov equation, 146

M
Macromodel
  mismatch, 242
ManiMOR
  mismatch, 247
Markov’s inequality, 17–18
Maximum possible yield, 276
MC
  capacitance extraction, 166
  dynamic power analysis, 90
  inductance extraction, 211
  mismatch, 235
  Monte Carlo, 28
  performance bound analysis, 221, 228
  power grid, 132, 151
  statistical leakage analysis, 49, 61
  total power analysis, 95
  yield, 253, 260, 282
ME
  multiple expansion, 186
Mean value, 16
  dynamic power analysis, 90
  inductance extraction, 211
  mismatch, 241
  power grid, 116
  statistical leakage analysis, 39, 58
  total power analysis, 100
  yield, 261
Mismatch, 235
  analog circuits, 13
  performance bound analysis, 221
  yield, 253
MNA
  modified nodal analysis, 111
  power grid, 115
Moment, 17
  power grid, 129
  statistical leakage analysis, 50
MOR
  mismatch, 236, 238
  model order reduction, 236
Multi-objective optimization, 262
Multivariate Gaussian process
  power grid, 111
Mutually independent, 20
MVP
  matrix-vector product, 183

N
NBTI, 4
NMC
  mismatch, 235
  non-Monte Carlo, 253
Non-Monte Carlo method, 13
  yield, 259

O
OPAMP
  operational amplifier, 265
Optical proximity correction, 7
Optimal binning scheme, 280
Ordinary differential equation
  ODE, 238
Orthogonal decomposition
  capacitance extraction, 12
  leakage analysis, 10
  power grids, 11
Orthogonal PC
  power grids, 11
Orthogonal polynomial chaos, 29, 158
  analog circuits, 13
  capacitance extraction, 166, 183, 188
  dynamic power analysis, 87
  leakage analysis, 55
  mismatch, 236, 240
  power grid, 108, 127
  statistical leakage analysis, 53
  yield, 257
  yield analysis and optimization, 13
Oxide erosion, 7

P
Panel-distance, 186
Panel-width, 186
Parametric yield, 254, 275
PBTI, 4
PCA
  capacitance extraction, 167, 186
  power grid, 111, 150
  principal component analysis, 27
  statistical leakage analysis, 49, 57, 67
  yield, 257
PDF
  mismatch, 241
  probability density function, 18
  total power analysis, 99
  yield, 255, 263
  yield optimization, 274
Pelgrom’s model
  mismatch, 237
  yield, 256
Performance bound analysis, 12, 222
Performance metric, 255
Perturbation
  mismatch, 240
Perturbed SDAE
  mismatch, 240
PFA
  principal factor analysis, 26
  total power analysis, 10, 95
Phase-shift mask, 7
PiCAP, 12
  parallel and incremental capacitance extraction, 183
PMTBR
  power grid, 147
Potential coefficient matrix
  capacitance extraction, 165
  second-order, 168
POV
  propagation of variation, 256
  yield, 261
Power constraint, 276
Power grid network, 109
Power grids, 10
Pre-set potential, 165
Preconditioner, 184
Primary conductor, 211
Principal factor analysis, 10
Process variation, 4, 23
  capacitance extraction, 163, 165, 183
  inductance extraction, 209
  performance bound analysis, 221
  statistical leakage analysis, 45
  total power analysis, 95
  yield, 253
Projection matrix, 147
PSD
  power spectral density, 235
PWL
  piece-wise linear, 128

Q
Quadrature points, 31
  statistical leakage analysis, 59

R
Random variable, 16
Random variable reduction, 12
RC network, 109
Response Gramian, 11, 148
RHS
  right-hand-side, 258
Run-time leakage, 77
  estimation, 77
  reduction, 79

S
Sample space, 15
  power grid, 111
Schmitt trigger, 265
SCL
  standard cell library, 66
Segment
  dynamic power analysis, 86
Set covering, 276
SGM
  stochastic geometric moment, 189
Single-objective yield optimization, 272
Singular value
  power grid, 146
Slack, 274
SLP
  sequential linear programming, 256
  yield, 262
Smolyak quadrature
  dynamic power analysis, 88
  fundamental, 32
  inductance extraction, 212
  statistical leakage analysis, 60
  total power analysis, 98
SMOR
  stochastic model order reduction, 130
Snapshot
  mismatch, 243
Sparse grid quadrature, 32
Sparse grid
  inductance extraction, 12, 214
  total power analysis, 10, 95
Spatial correlation, 8, 23
  capacitance extraction, 169
  leakage analysis, 10
  power grid, 111
  statistical leakage analysis, 46, 57, 67
  total power analysis, 95
  yield optimization, 275
Spectral-stochastic-based MOR
  power grid, 127
Spectral stochastic method
  leakage analysis, 10
  mismatch, 240
  power grid, 108
  statistical leakage analysis, 40
  total power analysis, 97
  yield, 257
SPICE
  dynamic power analysis, 86
  mismatch, 240
  total power analysis, 95