
Statistical Performance Analysis and Modeling

Techniques for Nanometer VLSI Designs


Ruijing Shen • Sheldon X.-D. Tan • Hao Yu

Statistical Performance
Analysis and Modeling
Techniques for Nanometer
VLSI Designs

Ruijing Shen
Department of Electrical Engineering
University of California
Riverside, USA

Sheldon X.-D. Tan
Department of Electrical Engineering
University of California
Riverside, USA

Hao Yu
Department of Electrical and Electronic Engineering
Nanyang Technological University
Nanyang Avenue 50, Singapore

ISBN 978-1-4614-0787-4 e-ISBN 978-1-4614-0788-1


DOI 10.1007/978-1-4614-0788-1
Springer New York Dordrecht Heidelberg London

Library of Congress Control Number: 2012931560

© Springer Science+Business Media, LLC 2012


All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York,
NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in
connection with any form of information storage and retrieval, electronic adaptation, computer software,
or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are
not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject
to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)


To our families
Preface

As VLSI technology scales into the nanometer regime, chip design engineering faces
several challenges. One profound change in the chip design business is that engineers
can no longer realize their designs precisely in silicon. Chip performance,
manufacturing yield, and lifetime therefore cannot be determined accurately at the
design stage. The main culprit is that many chip parameters, such as oxide thickness
due to chemical and mechanical polishing (CMP) and impurity density from doping
fluctuations, cannot be determined or estimated precisely and thus become
unpredictable at the device, circuit, and system levels. The so-called manufacturing
process variations start to play an essential role, and their influence on performance,
yield, and reliability becomes significant. As a result, variation-aware design
methodologies and computer-aided design (CAD) tools are widely believed to be the key
to mitigating these unpredictability challenges for 45 nm technologies and beyond.
Variational characterization, modeling, and optimization hence have to be incorporated
into each step of the design and verification processes to ensure reliable chips and
profitable manufacturing yields.
The book is divided into five parts. Part I introduces the basic concepts and
mathematical notations relevant to statistical analysis. Established algorithms
and theories such as the Monte Carlo method, the spectral stochastic method, and
the principal factor analysis method and its variants are also introduced. Part
II focuses on techniques for statistical full-chip power consumption analysis
considering process variations. Chapter 3 reviews existing statistical leakage
analysis methods, as leakage power is more susceptible to process variations.
Chapter 4 presents a gate-level leakage analysis method considering both inter-die
and intra-die variations with spatial correlations using the spectral stochastic
method. Chapter 5 solves a similar problem to that of the previous chapter, but a
more efficient, linear-time algorithm is presented based on a virtual grid modeling
of process variations with spatial correlations. In Chap. 6, a statistical dynamic
power analysis technique using the combined virtual grid and orthogonal polynomial
methods is presented. In Chap. 7, a statistical total chip power estimation method
is presented. A collocation-based spectral stochastic method is applied
to obtain the variational total chip power based on accurate SPICE simulation.
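(As a quick illustration of the Monte Carlo baseline that the later chapters improve upon, the short Python sketch below samples shared inter-die and independent intra-die channel-length deviations and tabulates a full-chip leakage distribution. It uses a purely hypothetical exponential gate-leakage model and made-up parameter values, not the models or data developed in this book.)

    import numpy as np

    # Hypothetical exponential leakage model (illustrative only, not the book's model):
    # leakage grows exponentially as the effective channel-length deviation dL shrinks.
    def gate_leakage(nominal, dL, sensitivity=8.0):
        return nominal * np.exp(-sensitivity * dL)

    rng = np.random.default_rng(0)
    n_samples, n_gates = 10000, 500
    nominal = rng.uniform(1e-9, 5e-9, size=n_gates)              # nominal per-gate leakage (A)
    dL_inter = rng.normal(0.0, 0.02, size=(n_samples, 1))         # inter-die (shared) component
    dL_intra = rng.normal(0.0, 0.01, size=(n_samples, n_gates))   # intra-die (per-gate) component
    total = gate_leakage(nominal, dL_inter + dL_intra).sum(axis=1)
    print("full-chip leakage: mean = %.3e A, std = %.3e A" % (total.mean(), total.std()))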

Part III emphasizes variational analysis of on-chip power grid networks under
process variations. Chapter 8 introduces an efficient stochastic method for
analyzing the voltage drop variations of on-chip power grid networks, considering
log-normal leakage current variations with spatial correlation. Chapter 9 presents
another stochastic method for solving a similar problem to that of the previous chapter,
but model order reduction is applied to improve the efficiency
of the simulation. Chapter 10 introduces a new approach to variational power
grid analysis, where model order reduction techniques and variational subspace
modeling are used to obtain the variational voltage drop responses.
Part IV of this book is concerned with statistical interconnect extraction and
modeling under process variations. Chapter 11 presents a statistical capacitance
extraction method using a Galerkin-based spectral stochastic method. Chapter 12
discusses a parallel and incremental solver for stochastic capacitance extraction.
Chapter 13 gives a statistical inductance extraction method based on a
collocation-based spectral stochastic method.
Part V of this book focuses on performance bound and statistical analysis
of nanometer analog/mixed-signal circuits and on yield analysis and optimization
based on statistical performance analysis and modeling. Chapter 14 presents a
performance bound analysis technique in the s-domain for linearized analog circuits
using symbolic and affine interval methods. Chapter 15 presents an efficient stochastic
mismatch analysis technique for analog circuits using a Galerkin-based spectral
stochastic method and nonlinear modeling. Chapter 16 shows a yield analysis and
optimization technique, and Chap. 17 describes a yield optimization algorithm based on
an improved voltage binning scheme.
The content of the book comes mainly from the recent publications of the authors.
Many of those original publications can be found at http://www.ee.ucr.edu/stan/
project/sts ana/main sts ana proj.htm. Future errata and updates for this book can
be found at http://www.ee.ucr.edu/stan/project/books/book11 springer.htm.

Riverside, CA, USA Ruijing Shen


Riverside, CA, USA Sheldon X.-D. Tan
Singapore, Singapore Hao Yu
Acknowledgment

The contents of the book mainly come from the research works done in the Mixed-
Signal Nanometer VLSI Research Lab (MSLAB) at the University of California at
Riverside over the past several years. Some of the presented methods also come from
the research of Dr. Hao Yu's group at Nanyang Technological University,
Singapore.
It is a pleasure to record our gratitude to the many Ph.D. students who have
contributed to this book, including Dr. Duo Li, Dr. Ning Mi, Dr. Zhigang Hao,
and Mr. Fang Gong (UCLA), for some of their research works presented in this book.
Special thanks also go to Dr. Hai Wang, who helped revise and proofread the
final draft of this book.
Sheldon X.-D. Tan is grateful to his collaborator Prof. Yici Cai of Tsinghua
University for the collaborative research works, which led to some of the works
presented in this book. Sheldon X.-D. Tan is also indebted to Dr. Jinjun Xiong and
Dr. Chandu Visweswariah of IBM for their insights into many important problems
in industry, which inspired some of the works in this book.
The authors would like to thank both the National Science Foundation and the
National Natural Science Foundation of China for their financial support of this
book. Sheldon X.-D. Tan highly appreciates the consistent support of Dr. Sankar
Basu of the National Science Foundation over the past 7 years. This book project is
funded in part by NSF grant No. CCF-0448534; in part by NSF grants No. OISE-0623038,
OISE-0929699, OISE-1051787, CCF-1116882, and OISE-1130402; and in part by the
National Natural Science Foundation of China (NSFC) grant No. 60828008. We would
also like to thank the UC Regents' Committee on Research Fellowship and the Faculty
Fellowships from the University of California at Riverside for their support.
Dr. Hao Yu would also like to acknowledge the funding support from
NRF2010NRF-POC001-001, Tier-1-RG 26/10, and Tier-2-ARC 5/11 in Singapore.
Last but not least, Sheldon X.-D. Tan would like to thank his wife, Yan, and his
daughters, Felicia and Leslay, for their understanding and support during the many hours
it took to write this book. Ruijing Shen would like to express her deepest gratitude to
her adviser, Prof. Sheldon X.-D. Tan, for his help, trust, and guidance. There are
wonders as well as frustrations in academic research; his kindness, insight, and
suggestions always kept her on the right path. A special word of thanks goes to all of
Ruijing's mentors at Tsinghua University (Prof. Xiangqing He, Prof. Xianlong
Hong, Prof. Changzheng Sun, et al.). They taught her about the world of electronics
(and much beyond). Finally, Ruijing Shen is extremely grateful to her husband,
Boyuan Yan, the whole family, and all her friends. She would like to thank them for
their constant support and encouragement during the writing of this manuscript.
Contents

Part I Fundamentals

1 Introduction .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3
1 Nanometer Chip Design in Uncertain World . . . . . . .. . . . . . . . . . . . . . . . . . . . 3
1.1 Causes of Variations .. . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 4
1.2 Process Variation Classification and Modeling . . . . . . . . . . . . . . . . . . 6
1.3 Process Variation Impacts .. . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 8
2 Book Outline .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 8
2.1 Statistical Full-Chip Power Analysis . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9
2.2 Variational On-Chip Power Delivery Network Analysis . . . . . . . . 10
2.3 Statistical Interconnect Modeling and Extraction .. . . . . . . . . . . . . . . 11
2.4 Statistical Analog and Yield Analysis and Optimization . . . . . . . . 12
3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 13
2 Fundamentals of Statistical Analysis . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 15
1 Basic Concepts in Probability Theory . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 15
1.1 Experiment, Sample Space, and Event . . . . . . .. . . . . . . . . . . . . . . . . . . . 15
1.2 Random Variable and Expectation .. . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 16
1.3 Variance and Moments of Random Variable .. . . . . . . . . . . . . . . . . . . . 17
1.4 Distribution Functions .. . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 18
1.5 Gaussian and Log-Normal Distributions . . . . .. . . . . . . . . . . . . . . . . . . . 19
1.6 Basic Concepts for Multiple Random Variables . . . . . . . . . . . . . . . . . 20
2 Multiple Random Variables and Variable Reduction.. . . . . . . . . . . . . . . . . . 23
2.1 Components of Covariance in Process Variation.. . . . . . . . . . . . . . . . 23
2.2 Random Variable Decoupling and Reduction .. . . . . . . . . . . . . . . . . . . 25
2.3 Principal Factor Analysis Technique . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 26
2.4 Weighted PFA Technique . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 26
2.5 Principal Component Analysis Technique . . .. . . . . . . . . . . . . . . . . . . . 27
3 Statistical Analysis Approaches.. . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 28
3.1 Monte Carlo Method . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 28

3.2 Spectral Stochastic Method Using Stochastic Orthogonal Polynomial Chaos . . . . . . . . . . . . . . . . . . . . . . . . 29
3.3 Collocation-Based Spectral Stochastic Method .. . . . . . . . . . . . . . . . . 31
3.4 Galerkin-Based Spectral Stochastic Method .. . . . . . . . . . . . . . . . . . . . 33
4 Sum of Log-Normal Random Variables . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 33
4.1 Hermite PC Representation of Log-Normal Variables.. . . . . . . . . . 34
4.2 Hermite PC Representation with One Gaussian Variable . . . . . . . 35
4.3 Hermite PC Representation of Two and More
Gaussian Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 35
5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 36

Part II Statistical Full-Chip Power Analysis

3 Traditional Statistical Leakage Power Analysis Methods . . . . . . . . . . . . . . 39


1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 39
2 Static Leakage Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 40
2.1 Gate-Based Static Leakage Model .. . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 41
2.2 MOSFET-Based Static Leakage Model.. . . . .. . . . . . . . . . . . . . . . . . . . 44
3 Process Variational Models for Leakage Analysis .. . . . . . . . . . . . . . . . . . . . 45
4 Full-Chip Leakage Modeling and Analysis Methods . . . . . . . . . . . . . . . . . . 49
4.1 Monte Carlo Method . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 49
4.2 Traditional Grid-Based Methods.. . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 49
4.3 Projection-Based Statistical Analysis Methods . . . . . . . . . . . . . . . . . . 53
5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 53
4 Statistical Leakage Power Analysis by Spectral Stochastic Method . . 55
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 55
2 Flow of Gate-Based Method . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 56
2.1 Random Variables Transformation and Reduction.. . . . . . . . . . . . . . 57
2.2 Computation of Full-Chip Leakage Currents . . . . . . . . . . . . . . . . . . . . 58
2.3 Time Complexity Analysis . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 60
3 Numerical Examples.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 60
4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 63
5 Linear Statistical Leakage Analysis by Virtual Grid-Based Modeling 65
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 65
2 Virtual Grid-Based Spatial Correlation Model . . . . .. . . . . . . . . . . . . . . . . . . . 67
3 Linear Chip-Level Leakage Power Analysis Method .. . . . . . . . . . . . . . . . . 69
3.1 Computing Gate Leakage by the Spectral Stochastic Method . . 70
3.2 Computation of Full-Chip Leakage Currents . . . . . . . . . . . . . . . . . . . . 71
3.3 Time Complexity Analysis . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 71
4 New Statistical Leakage Characterization in SCL .. . . . . . . . . . . . . . . . . . . . 72
4.1 Acceleration by Look-Up Table Approach .. .. . . . . . . . . . . . . . . . . . . . 72
4.2 Enhanced Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 73
4.3 Computation of Full-Chip Leakage Currents . . . . . . . . . . . . . . . . . . . . 75

4.4 Incremental Leakage Analysis . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 76


4.5 Time Complexity Analysis . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 77
4.6 Discussion of Extension to Statistical Runtime
Leakage Estimation.. . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 77
4.7 Discussion about Runtime Leakage Reduction Technique . . . . . . 79
5 Numerical Examples.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 79
5.1 Accuracy and CPU Time .. . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 80
5.2 Incremental Analysis . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 82
6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 82
6 Statistical Dynamic Power Estimation Techniques .. . . . . . . . . . . . . . . . . . . . 83
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 83
2 Prior Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 85
2.1 Existing Relevant Works . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 85
2.2 Segment-Based Power Estimation Method.. .. . . . . . . . . . . . . . . . . . . . 86
3 The Presented New Statistical Dynamic Power Estimation Method . . 87
3.1 Flow of the Presented Analysis Method .. . . . .. . . . . . . . . . . . . . . . . . . . 87
3.2 Acceleration by Building the Look-Up Table .. . . . . . . . . . . . . . . . . . . 88
3.3 Statistical Gate Power with Glitch Width Variation . . . . . . . . . . . . . 89
3.4 Computation of Full-Chip Dynamic Power . .. . . . . . . . . . . . . . . . . . . . 89
4 Numerical Examples.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 90
5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 92
7 Statistical Total Power Estimation Techniques . . . . . .. . . . . . . . . . . . . . . . . . . . 93
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 93
2 Review of the Monte Carlo-Based Power Estimation Method . . . . . . . . 95
3 The Statistical Total Power Estimation Method .. . .. . . . . . . . . . . . . . . . . . . . 96
3.1 Flow of the Presented Analysis Method Under Fixed
Input Vector .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 97
3.2 Computing Total Power by Orthogonal Polynomials .. . . . . . . . . . . 97
3.3 Flow of the Presented Analysis Method Under
Random Input Vectors .. . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 98
4 Numerical Examples.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 99
5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 103

Part III Variational On-Chip Power Delivery Network Analysis

8 Statistical Power Grid Analysis Considering Log-Normal Leakage Current Variations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 107
2 Previous Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 108
3 Nominal Power Grid Network Model . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 109
4 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 111

5 Statistical Power Grid Analysis Based on Hermite PC . . . . . . . . . . . . . . . . 112


5.1 Galerkin-Based Spectral Stochastic Method .. . . . . . . . . . . . . . . . . . . . 112
5.2 Spatial Correlation in Statistical Power Grid Analysis . . . . . . . . . . 114
5.3 Variations in Wires and Leakage Currents . . .. . . . . . . . . . . . . . . . . . . . 115
6 Numerical Examples.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 117
6.1 Comparison with Taylor Expansion Method .. . . . . . . . . . . . . . . . . . . . 118
6.2 Examples Without Spatial Correlation . . . . . . .. . . . . . . . . . . . . . . . . . . . 119
6.3 Examples with Spatial Correlation . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 122
6.4 Consideration of Variations in Both Wire and Currents . . . . . . . . . 123
7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 126
9 Statistical Power Grid Analysis by Stochastic Extended
Krylov Subspace Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 127
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 127
2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 128
3 Review of Extended Krylov Subspace Method . . . .. . . . . . . . . . . . . . . . . . . . 128
4 The Stochastic Extended Krylov Subspace Method—StoEKS .. . . . . . . 130
4.1 StoEKS Algorithm Flowchart.. . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 130
4.2 Generation of the Augmented Circuit Matrices . . . . . . . . . . . . . . . . . . 130
4.3 Computation of Hermite PCs of Current Moments
with Log-Normal Distribution . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 133
4.4 The StoEKS Algorithm . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 135
4.5 A Walk-Through Example . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 136
4.6 Computational Complexity Analysis . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 137
5 Numerical Examples.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 138
6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 143
10 Statistical Power Grid Analysis by Variational Subspace Method .. . . 145
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 145
2 Review of Fast Truncated Balanced Realization Methods .. . . . . . . . . . . . 146
2.1 Standard Truncated Balanced Realization Methods . . . . . . . . . . . . . 146
2.2 Fast and Approximate TBR Methods.. . . . . . . .. . . . . . . . . . . . . . . . . . . . 147
2.3 Statistical Reduction by Variational TBR . . . .. . . . . . . . . . . . . . . . . . . . 148
3 The Presented Variational Analysis Method: varETBR . . . . . . . . . . . . . . . 148
3.1 Extended Truncated Balanced Realization Scheme.. . . . . . . . . . . . . 148
3.2 The Presented Variational ETBR Method .. . .. . . . . . . . . . . . . . . . . . . . 150
4 Numerical Examples.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 152
5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 158

Part IV Statistical Interconnect Modeling and Extractions

11 Statistical Capacitance Modeling and Extraction.. .. . . . . . . . . . . . . . . . . . . . 163


1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 163
2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 165
3 Presented Orthogonal PC-Based Extraction Method: StatCap . . . . . . . . 166
3.1 Capacitance Extraction Using Galerkin-Based Method . . . . . . . . . 166

3.2 Expansion of Potential Coefficient Matrix . . .. . . . . . . . . . . . . . . . . . . . 167


3.3 Formulation of the Augmented System . . . . . .. . . . . . . . . . . . . . . . . . . . 170
4 Second-Order StatCap .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 171
4.1 Derivation of Analytic Second-Order Potential
Coefficient Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 172
4.2 Formulation of the Augmented System . . . . . .. . . . . . . . . . . . . . . . . . . . 173
5 Numerical Examples.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 174
6 Additional Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 177
7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 182
12 Incremental Extraction of Variational Capacitance . . . . . . . . . . . . . . . . . . . 183
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 183
2 Review of GMRES and FMM Algorithms . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 184
2.1 The GMRES Method .. . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 184
2.2 The Fast Multipole Method . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 184
3 Stochastic Geometrical Moment . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 185
3.1 Geometrical Moment .. . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 186
3.2 Orthogonal PC Expansion . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 188
4 Parallel Fast Multipole Method with SGM . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 189
4.1 Upward Pass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 190
4.2 Downward Pass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 191
4.3 Data Sharing and Communication .. . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 191
5 Incremental GMRES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 193
5.1 Deflated Power Iteration . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 194
5.2 Incremental Precondition.. . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 194
6 piCAP Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 196
6.1 Extraction Flow .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 196
6.2 Implementation Optimization .. . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 198
7 Numerical Examples.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 199
7.1 Accuracy Validation .. . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 199
7.2 Speed Validation .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 202
7.3 Eigenvalue Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 205
8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 207
13 Statistical Inductance Modeling and Extraction .. . .. . . . . . . . . . . . . . . . . . . . 209
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 209
2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 210
3 The Presented Statistical Inductance Extraction Method—statHenry. 212
3.1 Variable Decoupling and Reduction . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 212
3.2 Variable Reduction by Weighted PFA . . . . . . . .. . . . . . . . . . . . . . . . . . . . 213
3.3 Flow of statHenry Technique . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 214
4 Numerical Examples.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 214
5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 218

Part V Statistical Analog and Yield Analysis and Optimization Techniques

14 Performance Bound Analysis of Variational Linearized Analog Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 221
2 Review of Interval Arithmetic and Affine Arithmetic . . . . . . . . . . . . . . . . . 222
3 The Performance Bound Analysis Method Based
on Graph-based Symbolic Analysis . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 223
3.1 Variational Transfer Function Computation ... . . . . . . . . . . . . . . . . . . . 223
3.2 Performance Bound by Kharitonov’s Functions . . . . . . . . . . . . . . . . . 228
4 Numerical Examples.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 230
5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 233
15 Stochastic Analog Mismatch Analysis. . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 235
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 235
2 Preliminary .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 237
2.1 Review of Mismatch Model.. . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 237
2.2 Nonlinear Model Order Reduction . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 237
3 Stochastic Transient Mismatch Analysis . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 239
3.1 Stochastic Mismatch Current Model . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 239
3.2 Perturbation Analysis. . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 240
3.3 Non-Monte Carlo Analysis by Spectral Stochastic Method .. . . . 240
3.4 A CMOS Transistor Example .. . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 242
4 Macromodeling for Mismatch Analysis . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 242
4.1 Incremental Trajectory-Piecewise-Linear Modeling .. . . . . . . . . . . . 243
4.2 Stochastic Extension for Mismatch Analysis . . . . . . . . . . . . . . . . . . . . 246
5 Numerical Examples.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 247
5.1 Comparison of Mismatch Waveform-Error and Runtime . . . . . . . 248
5.2 Comparison of TPWL Macromodel . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 251
6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 252
16 Statistical Yield Analysis and Optimization . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 253
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 253
2 Problem Formulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 254
3 Stochastic Variation Analysis for Yield Analysis . .. . . . . . . . . . . . . . . . . . . . 256
3.1 Algorithm Overview.. . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 258
3.2 Stochastic Yield Estimation and Optimization .. . . . . . . . . . . . . . . . . . 259
3.3 Fast Yield Calculation .. . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 259
3.4 Stochastic Sensitivity Analysis . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 260
3.5 Multiobjective Optimization . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 262
4 Numerical Examples.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 265
4.1 NMC Mismatch for Yield Analysis . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 266
4.2 Stochastic Yield Estimation .. . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 266
4.3 Stochastic Sensitivity Analysis . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 268
4.4 Stochastic Yield Optimization . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 270
5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 272

17 Voltage Binning Technique for Yield Optimization .. . . . . . . . . . . . . . . . . . . . 273


1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 273
2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 274
2.1 Yield Estimation .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 274
2.2 Voltage Binning Problem . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 275
3 The Presented Voltage Binning Method . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 276
3.1 Voltage Binning Considering Valid Segment . . . . . . . . . . . . . . . . . . . . 277
3.2 Bin Number Prediction Under Given Yield Requirement . . . . . . . 278
3.3 Yield Analysis and Optimization . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 280
4 Numerical Examples.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 281
4.1 Setting of Process Variation .. . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 282
4.2 Prediction of Bin Numbers Under Yield Requirement . . . . . . . . . . 282
4.3 Comparison Between Uniform and Optimal Voltage
Binning Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 283
4.4 Sensitivity to Frequency and Power Constraints .. . . . . . . . . . . . . . . . 284
4.5 CPU Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 284
5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 285

References .. .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 287
Index . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 299
List of Figures

Fig. 1.1 OPC and PSM procedures in the manufacturing process . . . . . . . . . . . . 5


Fig. 1.2 Chemical and mechanical polishing (CMP) process . . . . . . . . . . . . . . . . 6
Fig. 1.3 The dishing and oxide erosion after the CMP process . . . . . . . . . . . . . . 7
Fig. 1.4 The comparison of circuit total power distribution
of circuit c432 in ISCAS’85 benchmark sets (top)
under random input vectors (with 0.5 input signal
and transition probabilities) and (bottom) under a
fixed input vector with effective channel length spatial
correlations. Reprinted with permission from [62]
© 2011 IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Fig. 2.1 Grid-based model for spatial correlations .. . . . . . .. . . . . . . . . . . . . . . . . . . . 24

Fig. 3.1 Subthreshold leakage currents for four different input patterns in AND2 gate under 45 nm technology . . . . . . . . . . . . . . . . . 42
Fig. 3.2 Gate oxide leakage currents for four different input
patterns in AND2 gate under 45 nm technology .. . . . . . . . . . . . . . . . . . . . 43
Fig. 3.3 Typical layout of a MOSFET .. . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 44
Fig. 3.4 Procedure to derive the effective gate channel length model . . . . . . . 45

Fig. 4.1 An example of a grid-based partition. Reprinted with permission from [157] © 2010 Elsevier . . . . . . . . . . . . . . . . . . . . . . . . 56
Fig. 4.2 The flow of the presented algorithm . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 57
Fig. 4.3 Distribution of the total leakage currents of the
presented method, the grid-based method, and the MC
method for circuit SC0 (process variation parameters
set as Case 1). Reprinted with permission from [157]
© 2010 Elsevier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62


Fig. 5.1 Location-dependent modeling with the T(i) of grid cell i defined as its seven neighbor cells. Reprinted with permission from [159] © 2010 IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Fig. 5.2 The flow of the presented algorithm . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 70
Fig. 5.3 Relation between ρ(d) and d/η . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Fig. 5.4 The flow of statistical leakage characterization in SCL . . . . . . . . . . . . . 74
Fig. 5.5 The flow of the presented algorithm using statistical
leakage characterization in SCL . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 74
Fig. 5.6 Simulation flow for full-chip runtime leakage .. .. . . . . . . . . . . . . . . . . . . . 78

Fig. 6.1 The dynamic power versus effective channel length for
an AND2 gate in 45 nm technology (70 ps active pulse
as partial swing, 130 ps active pulse as full swing).
Reprinted with permission from [60] © 2010 IEEE . . . . . . . . . . . . . . 84
Fig. 6.2 A transition waveform example {E1, E2, ..., Em} for a
node. Reprinted with permission from [60] © 2010 IEEE . . . . . . . . . 86
Fig. 6.3 The flow of the presented algorithm . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 87
Fig. 6.4 The flow of building the sub LUT . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 88

Fig. 7.1 The comparison of circuit total power distribution of circuit c432 in ISCAS'85 benchmark sets (top)
under random input vectors (with 0.5 input signal
and transition probabilities) and (bottom) under a
fixed input vector with effective channel length spatial
correlations. Reprinted with permission from [62]
© 2011 IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Fig. 7.2 The flow of the presented algorithm under a fixed input vector . . . . 97
Fig. 7.3 The selected power points a, b, and c from the power
distribution under random input vectors. Reprinted with
permission from [62] © 2011 IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Fig. 7.4 The flow of the presented algorithm with random input
vectors and process variations . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 100
Fig. 7.5 The comparison of total power distribution PDF and
CDF between STEP method and MC method for
circuit c880 under a fixed input vector. Reprinted with
permission from [62] © 2011 IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Fig. 7.6 The comparison of total power distribution PDF and
CDF between STEP method and Monte Carlo method
for circuit c880 under random input vector. Reprinted
with permission from [62] © 2011 IEEE . . . . . . . . . . . . . . . . . . . . . . . 103

Fig. 8.1 The power grid model used . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 110



Fig. 8.2 Distribution of the voltage in a given node with one Gaussian variable, σg = 0.1, at time 50 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE . . . . . . . . . . . . . . . . . . . . . . 120
Fig. 8.3 Distribution of the voltage caused by the leakage
currents in a given node with one Gaussian variable,
σg = 0.5, in the time instant from 0 ns to 126 ns.
Reprinted with permission from [109] © 2008 IEEE . . . . . . . . . . . . . . 120
Fig. 8.4 Distribution of the voltage in a given node with two
Gaussian variables, σg1 = 0.1 and σg2 = 0.5, at
time 50 ns when the total simulation time is 200 ns.
Reprinted with permission from [109] © 2008 IEEE . . . . . . . . . . . . . . 121
Fig. 8.5 Correlated random variables setup in ground circuit
divided into two parts. Reprinted with permission from
[109] © 2008 IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
Fig. 8.6 Distribution of the voltage in a given node with
two Gaussian variables with spatial correlation, at
time 70 ns when the total simulation time is 200 ns.
Reprinted with permission from [109] © 2008 IEEE . . . . . . . . . . . . . . 123
Fig. 8.7 Correlated random variables setup in ground circuit
divided into four parts. Reprinted with permission from
[109] © 2008 IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Fig. 8.8 Distribution of the voltage in a given node with
four Gaussian variables with spatial correlation, at
time 30 ns when the total simulation time is 200 ns.
Reprinted with permission from [109] © 2008 IEEE . . . . . . . . . . . . . . 124
Fig. 8.9 Distribution of the voltage in a given node with
the circuit partitioned into 5 × 5 parts with spatial correlation, at
time 30 ns when the total simulation time is 200 ns.
Reprinted with permission from [109] © 2008 IEEE . . . . . . . . . . . . . . 124
Fig. 8.10 Distribution of the voltage in a given node in circuit5
with variation on G,C,I, at time 50 ns when the total
simulation time is 200 ns. Reprinted with permission
from [109] © 2008 IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

Fig. 9.1 The EKS algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 129


Fig. 9.2 Flowchart of the StoEKS algorithm. Reprinted with
permission from [110] © 2008 IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Fig. 9.3 The StoEKS algorithm .. . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 135
Fig. 9.4 Distribution of the voltage variations in a given node by
StoEKS, HPC, and Monte Carlo of a circuit with 280
nodes with three random variables. σgi(t) = 0.1udi(t).
Reprinted with permission from [110] © 2008 IEEE . . . . . . . . . . . . . . 139

Fig. 9.5 Distribution of the voltage variations in a given node by StoEKS, HPC, and MC of a circuit with 2,640 nodes with seven random variables. σgi(t) = 0.1udi(t). Reprinted with permission from [110] © 2008 IEEE . . . . . . . . . . . . . . 140
Fig. 9.6 Distribution of the voltage variations in a given node by
StoEKS and MC of a circuit with 2,640 nodes with 11
random variables. σgi(t) = 0.1udi(t). Reprinted with
permission from [110] © 2008 IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
Fig. 9.7 A PWL current source at certain node. Reprinted with
permission from [110] © 2008 IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
Fig. 9.8 Distribution of the voltage variations in a given node
by StoEKS, HPC, and Monte Carlo of a circuit with
280 nodes with three random variables using the
time-invariant leakage model. σgi = 0.1Ip. Reprinted
with permission from [110] © 2008 IEEE . . . . . . . . . . . . . . . . . . . . . . 143

Fig. 10.1 Flow of ETBR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 149


Fig. 10.2 Flow of varETBR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 152
Fig. 10.3 Transient waveform at the 1,000th node
(n1 20583 11663) of ibmpg1 (p = 10, 100 samples).
Reprinted with permission from [91] © 2010 Elsevier . . . . . . . . . . . . 154
Fig. 10.4 Transient waveform at the 1,000th node
(n3 16800 9178400) of ibmpg6 (p = 10, 10 samples).
Reprinted with permission from [91] © 2010 Elsevier . . . . . . . . . . . . 154
Fig. 10.5 Simulation errors of ibmpg1 and ibmpg6. Reprinted
with permission from [91] © 2010 Elsevier . . . . . . . . . . . . . . . . . . . . . 155
Fig. 10.6 Relative errors of ibmpg1 and ibmpg6. Reprinted with
permission from [91] © 2010 Elsevier . . . . . . . . . . . . . . . . . . . . . . . . . . 155
Fig. 10.7 Voltage distribution at the 1,000th node of ibmpg1
(10,000 samples) when t = 50 ns. Reprinted with
permission from [91] © 2010 Elsevier . . . . . . . . . . . . . . . . . . . . . . . . . . 156

Fig. 11.1 A 2 × 2 bus. Reprinted with permission from [156] © 2010 IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
Fig. 11.2 Three-layer metal planes. Reprinted with permission
from [156] © 2010 IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176

Fig. 12.1 Multipole operations within the FMM algorithm. Reprinted with permission from [56] © 2011 IEEE . . . . . . . . . . . . . . . . . . . . . . . 185
Fig. 12.2 Structure of augmented system in piCAP . . . . . . . .. . . . . . . . . . . . . . . . . . . . 189
Fig. 12.3 The M2M operation in an upward pass to evaluate local
interactions around sources .. . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 190
Fig. 12.4 The M2L operation in a downward pass to evaluate
interactions of well-separated source cube and observer cube .. . . . . 192
Fig. 12.5 The L2L operation in a downward pass to sum all integrations . . . . 193

Fig. 12.6 Prefetch operation in M2L. Reprinted with permission from [56] © 2011 IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
Fig. 12.7 Stochastic capacitance extraction algorithm . . . . .. . . . . . . . . . . . . . . . . . . . 197
Fig. 12.8 Two distant panels in the same plane.. . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 200
Fig. 12.9 Distribution comparison between Monte Carlo and piCAP .. . . . . . . . 202
Fig. 12.10 The structure and discretization of two-layer example
with 20 conductors. Reprinted with permission from
[56] © 2011 IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Fig. 12.11 Test structures: (a) plate, (b) cubic, and (c) crossover
2 × 2. Reprinted with permission from [56] © 2011 IEEE . . . . . . . . . 204
Fig. 12.12 The comparison of eigenvalue distributions (panel
width as variation source).. . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 206
Fig. 12.13 The comparison of eigenvalue distributions (panel
distance as variation source) .. . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 207

Fig. 13.1 The statHenry algorithm .. . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 214


Fig. 13.2 Four test structures used for comparison .. . . . . . . .. . . . . . . . . . . . . . . . . . . . 215
Fig. 13.3 The loop inductance L12l distribution changes for the
10-parallel-wire case under 30% width and height variations . . . . . . 217
Fig. 13.4 The partial inductance L11p distribution changes for
the 10-parallel-wire case under 30% width and height variations . . 218

Fig. 14.1 The flow of the presented algorithm . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 224


Fig. 14.2 An example circuit. Reprinted with permission from
[61]. © 2011 IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
Fig. 14.3 A matrix determinant and its DDD representation.
Reprinted with permission from [61]. © 2011 IEEE . . . . . . . . . . . . . . 225
Fig. 14.4 (a) Kharitonov’s rectangle in state 8. (b) Kharitonov’s
rectangle for all nine states. Reprinted with permission
from [61]. © 2011 IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
Fig. 14.5 (a) A low-pass filter. (b) A linear model of the op-amp
in the low-pass filter. Reprinted with permission from
[61]. © 2011 IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
Fig. 14.6 Bode diagram of the CMOS low-pass filter. Reprinted
with permission from [61]. © 2011 IEEE . . . . . . . . . . . . . . . . . . . . . . . 232
Fig. 14.7 Bode diagram of the CMOS cascode op-amp. Reprinted
with permission from [61]. © 2011 IEEE . . . . . . . . . . . . . . . . . . . . . . . 233

Fig. 15.1 Transient mismatch (the time-varying standard deviation) comparison at output of a BJT mixer with distributed inductor: the exact by Monte Carlo and the exact by orthogonal PC expansion. Reprinted with permission from [52]. © 2011 ACM . . . . . . . . . . . . . . . . . . . . . . 249

Fig. 15.2 Transient nominal x(0)(t) (a) and transient mismatch
(α1(t)) (b) for one output of a CMOS comparator by
the exact orthogonal PC and the isTPWL. Reprinted
with permission from [52]. © 2011 ACM . . . . . . . . . . . . . . . . . . . . . . . 249
Fig. 15.3 Transient waveform comparison at output of a diode
chain: the transient nominal, the transient with
mismatch by SiSMA (adding mismatch at i c only),
the transient with mismatch by the presented method
(adding mismatch at transient trajectory). Reprinted
with permission from [52]. © 2011 ACM . . . . . . . . . . . . . . . . . . . . . . . 250
Fig. 15.4 Transient mismatch (α1(t), the time-varying standard
deviation) comparison at output of a BJT mixer with
distributed substrate: the exact by OPC expansion, the
macromodel by TPWL (order 45), and the macromodel
by isTPWL (order 45). The waveform by isTPWL is
visually identical to the exact OPC. Reprinted with
permission from [52]. © 2011 ACM . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
Fig. 15.5 (a) Comparison of the ratio of the waveform error by
TPWL and by isTPWL under the same reduction order.
(b) comparison of the ratio of the reduction runtime by
maniMOR and by isTPWL under the same reduction
order. In both cases, isTPWL is used as the baseline.
Reprinted with permission from [52]. © 2011 ACM . . . . . . . . . . . . . . 251

Fig. 16.1 Example of the stochastic transient variation or mismatch . . . . . . . . . 254


Fig. 16.2 Distribution of output voltage at tmax . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 255
Fig. 16.3 Parametric yield estimation based on orthogonal
PC-based stochastic variation analysis . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 260
Fig. 16.4 Stochastic yield optimization .. . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 263
Fig. 16.5 Power consumption optimization . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 264
Fig. 16.6 Schematic of operational amplifier .. . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 266
Fig. 16.7 NMC mismatch analysis vs. Monte Carlo for
operational amplifier case. . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 267
Fig. 16.8 Schematic of Schmitt trigger . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 268
Fig. 16.9 Comparison of Schmitt trigger example . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 269
Fig. 16.10 Schematic of SRAM 6-T cell . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 270
Fig. 16.11 Voltage distribution at BL B node .. . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 271
Fig. 16.12 NMC mismatch analysis vs. MC . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 271

Fig. 17.1 The algorithm sketch of the presented new voltage binning method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
Fig. 17.2 The delay and power change with supply voltage for C432 . . . . . . . . 277
Fig. 17.3 Valid voltage segment graph and the voltage binning solution . . . . . 278
Fig. 17.4 Histogram of the length of valid supply voltage segment
len for C432 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 279

Fig. 17.5 The flow of the greedy algorithm for covering most uncovered elements in S . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
Fig. 17.6 Yield under uniform and optimal voltage binning
schemes for C432 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 284
Fig. 17.7 Maximum achievable yield as function of power and
performance constraints for C2670 . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 285
List of Tables

Table 3.1 Different methods for full-chip SLA . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 40


Table 3.2 Relative errors by using different fitting formulas for
leakage currents of AND2 gate . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 43
Table 3.3 Process variation parameter breakdown for 45 nm technology.. . . 46

Table 4.1 Process variation parameter breakdown for 45 nm technology.. . . 61


Table 4.2 Comparison of the mean values of full-chip leakage
currents among three methods . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 62
Table 4.3 Comparison standard deviations of full-chip leakage
currents among three methods . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 63
Table 4.4 CPU time comparison among three methods . .. . . . . . . . . . . . . . . . . . . . 63

Table 5.1 Summary of test cases used in this chapter.. . . .. . . . . . . . . . . . . . . . . . . . 80


Table 5.2 Accuracy comparison of different methods based on
Monte Carlo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 81
Table 5.3 CPU time comparison .. . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 81
Table 5.4 Incremental leakage analysis cost . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 82

Table 6.1 Summary of benchmark circuits . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 91


Table 6.2 Statistical dynamic power analysis accuracy
comparison against Monte Carlo . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 91
Table 6.3 CPU time comparison .. . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 92

Table 7.1 Summary of benchmark circuits . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 100


Table 7.2 Total power distribution under fixed input vector .. . . . . . . . . . . . . . . . . 101
Table 7.3 Sampling number comparison under fixed input vector . . . . . . . . . . . 101
Table 7.4 Total power distribution comparison under random
input vector and spatial correlation . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 102

Table 8.1 Accuracy comparison between Hermite PC (HPC) and Taylor expansion . . . . . . . . . . . . . . . . 119
Table 8.2 CPU time comparison with the Monte Carlo method
of one random variable .. . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 121
Table 8.3 CPU time comparison with the Monte Carlo method
of two random variables .. . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 121
Table 8.4 Comparison between non-PCA and PCA against
Monte Carlo methods . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 122
Table 8.5 CPU time comparison with the MC method
considering variation in G,C,I . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 125

Table 9.1 CPU time comparison of StoEKS and HPC with the Monte Carlo method, $g_i(t) = 0.1u_{d_i}(t)$ . . . . . . . . . . . . . . . . 141
Table 9.2 Accuracy comparison of different methods, StoEKS, HPC, and MC, $g_i(t) = 0.1u_{d_i}(t)$ . . . . . . . . . . . . . . . . 142
Table 9.3 Error comparison of StoEKS and HPC over Monte Carlo methods, $g_i(t) = 0.1u_{d_i}(t)$ . . . . . . . . . . . . . . . . 142

Table 10.1 Power grid (PG) benchmarks . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 153


Table 10.2 CPU times (s) comparison of varETBR and Monte Carlo (q = 50, p = 10) . . . . . . . . . . . . . . . . 156
Table 10.3 Projected CPU times (s) comparison of varETBR and Monte Carlo (q = 50, p = 10, 10,000 samples) . . . . . . . . . . . . . . . . 157
Table 10.4 Relative errors for the mean of max voltage drop of varETBR compared with Monte Carlo on the 2,000th node of ibmpg1 (q = 50, p = 10, 10,000 samples) for different variation ranges and different numbers of variables . . . . . . . . . . . . . . . . 157
Table 10.5 Relative errors for the variance of max voltage drop of varETBR compared with Monte Carlo on the 2,000th node of ibmpg1 (q = 50, p = 10, 10,000 samples) for different variation ranges and different numbers of variables . . . . . . . . . . . . . . . . 157
Table 10.6 CPU times (s) comparison of StoEKS and varETBR (q = 50, p = 10) with 10,000 samples for different numbers of variables . . . . . . . . . . . . . . . . 158

Table 11.1 Number of nonzero element in Wi . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 174


Table 11.2 The test cases and the parameters setting . . . . . .. . . . . . . . . . . . . . . . . . . . 175
Table 11.3 CPU runtime (in seconds) comparison among MC,
SSCM, and StatCap(1st/2nd) . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 176
Table 11.4 Capacitance mean value comparison for the 1 × 1 bus . . . . . . . . . . . . . . . . 177
Table 11.5 Capacitance standard deviation comparison for the 1 × 1 bus . . . . . . . . . . . . . . . . 177
Table 11.6 Error comparison of capacitance mean values among
SSCM, and StatCap (first- and second-order) ... . . . . . . . . . . . . . . . . . . . 178
Table 11.7 Error comparison of capacitance standard deviations among SSCM, and StatCap (first- and second-order) . . . . . . . . . . . . . . . . 179

Table 12.1 Accuracy comparison of two orthogonal PC expansions .. . . . . . . . . 200


Table 12.2 Incremental analysis versus MC method .. . . . . .. . . . . . . . . . . . . . . . . . . . 201
Table 12.3 Accuracy and runtime(s) comparison between
MC(3,000), piCap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 201
Table 12.4 MVP runtime (s)/speedup comparison for four
different examples .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 203
Table 12.5 Runtime and iteration comparison for different examples.. . . . . . . . 204
Table 12.6 Total runtime(s) comparison for two-layer
20-conductor by different methods . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 205

Table 13.1 Accuracy comparison (mean and variance values of inductances) among MC, HPC, and statHenry . . . . . . . . . . . . . . . . 216
Table 13.2 CPU runtime comparison among MC, HPC, and statHenry . . . . . . 216
Table 13.3 Reduction effects of PFA and wPFA . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 216
Table 13.4 Variation impacts on inductances using statHenry.. . . . . . . . . . . . . . . . 217

Table 14.1 Extreme values of $|P(j\omega)|$ and $\mathrm{Arg}\,P(j\omega)$ for nine states . . . . . . . . . . . . . . . . 229
Table 14.2 Summary of coefficient radius reduction with cancellation . . . . . . . 231
Table 14.3 Summary of DDD information and performance of
the presented method .. . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 231

Table 15.1 Scalability comparison of runtime and error for the exact model with MC, the exact model with OPC, and the isTPWL macromodel with OPC . . . . . . . . . . . . . . . . 248

Table 16.1 Comparison of accuracy and runtime . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 267


Table 16.2 Comparison of accuracy and runtime . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 268
Table 16.3 Sensitivity of output with respect to each MOSFET
width variation pi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 269
Table 16.4 Sensitivity of vBL B and power with respect to each
MOSFET width variation pi . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 271
Table 16.5 Comparison of different yield optimization algorithms
for SRAM cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 272

Table 17.1 Predicted and actual number of bins needed under yield requirement . . . . . . . . . . . . . . . . 282
Table 17.2 Yield under uniform and optimal voltage binning
schemes (%) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 283
Table 17.3 CPU time comparison(s) . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 285
Part I
Fundamentals
Chapter 1
Introduction

1 Nanometer Chip Design in Uncertain World

As VLSI technology scales into the nanometer regime, chip design engineering
faces several challenges in maintaining the historical rates of performance improvement
and capacity increase with CMOS technologies. One profound change in the chip
design business is that engineers cannot realize a design precisely in the silicon
chips. Chip performance, manufacturing yield, and lifetime therefore become
unpredictable and cannot be determined accurately at the design stage. The
main culprit is that many chip parameters—such as oxide thickness due to chemical
and mechanical polish (CMP) and impurity density from doping fluctuations—
cannot be determined precisely and thus are unpredictable. The so-called manufacturing
process variations start to play a big role, and their influence on the chip's
performance, yield, and reliability becomes significant [16, 78, 121, 122, 170].
Traditional corner-based analysis and design approaches apply guard bands to
account for parameter variations, which may lead to overly conservative designs. Such
pessimism can lead to increased design effort and prolonged time to market. Moreover,
the true worst case of a circuit may not correspond to all parameters being at their worst
or best process conditions, and it becomes extremely difficult to find such a worst
case by simulating a limited number of corner cases.
As a result, it is imperative to develop new design methodologies to consider the
impacts of various process and environmental uncertainties and elevated temper-
ature on chip performance. Variational impacts have to be incorporated into every
step of design process to ensure the reliable chips and profitable manufacture yields.
The design methodologies and design tools from system level down to the physical
levels have to consider variability impacts on the chip performance, which calls for
new statistical optimization approaches for designing nanometer VLSI systems.
Performance modeling and analysis of nanometer VLSI systems in the presence
of process-induced variation and uncertainty is one of the crucial problems facing IC
chip designers and design tool developers. How to efficiently and accurately assess


the impacts of the process variations on circuit performances in the various physical
design steps is critical for fast design closure, yield improvement, cost reduction
of VLSI design, and fabrication processes. The design methodologies and design
tools from system level down to the physical levels have to embrace variability
impacts on the nanometer VLSI chips, which calls for statistical/stochastic-based
approaches for designing 90 nm and beyond VLSI systems. The advantage and
promise of statistical analysis is that the impact of parameter variations on a circuit
is obtained simultaneously with less computing effort, and the impact on yield can
be properly understood and used for further optimization.

1.1 Causes of Variations

To consider the impact of variations on the circuit performance, we should first


understand the sources of variations and how they affect circuit performances.
The first source is process-induced variation, i.e., the fluctuation of process
parameter values during the manufacturing process. These variations affect
the performance of devices and interconnects. For instance, chip leakage power
(especially subthreshold leakage power) is very sensitive to channel length varia-
tions owing to the exponential relationship between leakage current and effective
channel length. Process variation is caused by different sources such as lithography
(optical proximity correction, PSM), etching, CMP, doping process, etc. [16, 170].
Figure 1.1 gives cartoon illustrations for the optical proximity correction (OPC)
process (a) and phase-shift mask (b) procedures. Figure 1.2 shows the CMP
process. Some of the variations are systematic, e.g., those caused by the lithography
process [42, 129]. Others are purely random, e.g., the doping density of impurities,
edge roughness, etching, and CMP [7]. Process variations can occur at different
levels: wafer level, inter-die level, and intra-die level, and we will discuss this in
detail soon.
In addition to the process-induced variations, there are also variations from the
chip operational environments. These include temperature variations and power
supply variations, which will affect circuit timing and powers. A reduced power
supply will reduce the driving strength of the devices and, hence, degrades their
performance. The so-called power supply integrity now becomes a serious concern
for chip sign-off. On the other hand, increased temperature will lead to more leak-
age, which in turn will result in more heat generated and high on-chip temperature.
Such positive feedback can sometimes lead to thermal runaway and ultimate failure
of the devices. Further, both voltage supply degradation and temperature are subject
to process-induced variations as they are functions of chip power (dynamic, short,
and leakage), which are susceptible to process variations.
In addition to the variations mentioned above, chip performance also changes over
time due to aging and other physical reliability effects such as hot carrier injection,
negative/positive bias temperature instability (N/PBTI), and electromigration. Hot
carrier injection can trigger physical damage in the devices and cause
Fig. 1.1 OPC and PSM procedures in the manufacture process: (a) optical proximity correction (OPC) process; (b) phase-shift mask (PSM) process

threshold voltage shifts. N/PBTI will also lead to increased threshold
voltage and decreased drain current and transconductance of devices. Electromigration
will result in increased wire resistance and timing degradation of wires and can even
lead to failure of the wires in the worst case. Such variations typically appear after
chips have been in use for a while and have been studied more as reliability issues than
as variation problems in the past. Therefore, in this book, we do not consider such aging- and
reliability-related variations.

Fig. 1.2 Chemical and mechanical polishing (CMP) process

1.2 Process Variation Classification and Modeling

To facilitate the modeling and analysis, it is beneficial to classify the process


variations into different categories. In general, process variations can be classified
into the following categories [16, 170]: inter-die and intra-die. Inter-die variations
are the variations from die to die, wafer to wafer, and lot to lot. Those are typically
represented by a single variable for each die. As a result, inter-die variations are
global variables and affect all the devices on a chip in the same way, e.g., making the
transistor gate channel lengths of all the devices on the same chip smaller.
In this book, we can model parameter variation as follows:

$$\delta_{\mathrm{total}} = \delta_{\mathrm{inter}}, \qquad (1.1)$$

where $\delta_{\mathrm{inter}}$ represents the inter-die variation. Typically, inter-die variations have
simple distributions such as Gaussian. For a single parameter variation, inter-die
variation impact can be very easily captured as all the devices in a die take the same
values. In other words, under inter-die variation, if the circuit performance metrics
such as power, timing, and noises of all gates or devices are sensitive to the process
parameters in a similar way, then the circuit performance can be analyzed at multiple
process corners using deterministic analysis methods. However, if a number of inter-
die process variations are considered and they are also correlated, the corner cases
will grow exponentially with the increased number of process parameters.
Fig. 1.3 The dishing and oxide erosion after the CMP process

Intra-die variations correspond to variability within a single chip. Intra-die
variations may affect different devices differently on the same die, e.g., making
some devices have smaller gate oxide thicknesses and others have larger
gate oxide thicknesses. In addition, intra-die variations may exhibit spatial
correlation due to proximity effects, i.e., it is more likely for devices located close
to each other to have similar characteristics than those placed far away.
Obviously, intra-die variations will typically have a large number of variables as
each device may require a variable. As a result, statistical methods must be used
as the corner-based method will be too expensive in this case. Intra-die variation
can be further classified into wafer-level variation, layout-dependent variation,
and statistical variations [170] based on the sources of the variations. Wafer-level
variation comes from lens aberration effect. Layout-dependent variation is caused
by lithographic and etching processes such as CMP and OPC and phase-shift
masks (PSM). CMP may lead to variations in dimensions called dishing and oxide
erosion. Figure 1.3 gives a cartoon illustration of the dishing and oxide erosion after
the CMP process.
Optical proximity effects are layout dependent and will lead to different critical
dimension (CD) variations depending on the neighboring layout of a pattern. Those
layout-dependent variations typically are spatially correlated (they also have purely
random components). Statistical variations come from random dopant variations,
whose impact was not significant in the past but becomes more visible as CD scales
down. Those variations are purely random and not spatially correlated. However,
their impact on performance tends to be limited due to the averaging effect in general.
In summary, we can model all the components of variation as follows:

$$\delta_{\mathrm{total}} = \delta_{\mathrm{inter}} + \delta_{\mathrm{intra}}, \qquad (1.2)$$

where $\delta_{\mathrm{inter}}$ and $\delta_{\mathrm{intra}}$ represent the inter-die variation and intra-die variation,
respectively. In some works such as [13, 95, 170], $\delta_{\mathrm{inter}}$ and $\delta_{\mathrm{intra}}$ are both modeled
as Gaussian random variables. In general, we will consider both the Gaussian and
non-Gaussian cases.
For layout-dependent $\delta_{\mathrm{intra}}$, the value of a parameter $p$ located at $(x, y)$ can be
modeled as a location-dependent normally distributed random variable [101]:

$$p = \mu_p + \delta_x + \delta_y + \epsilon, \qquad (1.3)$$

where $\mu_p$ is the mean value (nominal design parameter value) at $(0, 0)$, and $\delta_x$
and $\delta_y$ stand for the gradients of the parameter indicating the spatial variations
of $p$ along the $x$ and $y$ directions, respectively. $\epsilon$ represents the random intra-chip
variation. Due to spatial correlations in the intra-die variation [195], the
vector of all random components across the chip $\epsilon$ has a correlated multivariate
normal distribution, $\epsilon \sim N(0, \Sigma)$, where $\Sigma$ is the covariance matrix of the spatially
correlated parameters. If the covariance matrix is the identity matrix, then there is no
correlation among the variables.
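As a concrete numerical illustration of (1.2) and (1.3), the short Python sketch below draws one inter-die sample shared by all devices and a spatially correlated intra-die sample per device. The device locations, standard deviations, and the exponential-decay form of the covariance matrix are illustrative assumptions only, not values from any particular process.

```python
# Minimal sketch of the variation model delta_total = delta_inter + delta_intra
# (Eqs. 1.2-1.3). All numerical values and the exponential covariance are
# assumptions made for illustration.
import numpy as np

rng = np.random.default_rng(1)
xy = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 3.0]])   # device locations (mm), hypothetical
sigma_inter, sigma_intra, corr_len = 2.0, 1.0, 1.0     # nm, nm, mm (assumed)

d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
Sigma = sigma_intra**2 * np.exp(-d / corr_len)         # covariance of eps ~ N(0, Sigma)

delta_inter = sigma_inter * rng.standard_normal()      # one sample shared by all devices
eps = rng.multivariate_normal(np.zeros(len(xy)), Sigma)  # correlated intra-die part
delta_total = delta_inter + eps
print(delta_total)   # nearby devices (rows 0 and 1) receive similar intra-die shifts
```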

1.3 Process Variation Impacts

In this section, we discuss the impact of the variations on the performance of a


circuit. We have discussed different variations and their sources in the previous
sections. It was shown that variations in device channel length have the largest
impacts on the device and circuit performances [151,170]. Channel length variations
consist of both inter-die and intra-die variations and have spatially correlated
components and purely random components. Channel length directly affects the
leakage current and the driving strength of a device.
It is well accepted that process variations have huge impacts on circuit timing,
power, yield, and reliability, and many studies have been done to assess their
impacts in the past decade. In 2003, Borkar from Intel Corporation showed in a
famous figure that the leakage current variation can be 20× for a 1.3× variation
in timing [8]. As a result, leakage analysis and estimation have been intensively
studied recently. Furthermore, our recent study shows that the total chip power variation
can also be significant, as glitch-related variation and other variation impacts on dynamic
power are considerable [60, 62].
total power distribution of c432 from ISCAS’85 benchmark. There are two power
variations. The first figure (upper) is obtained due to random input vectors. The
second is obtained using a fixed input vector but under process variations with
spatial correlation. As can be seen, the variance induced by process variations is
comparable with the variance induced by random input vectors, which is quite
significant.
In this book, we will have detailed studies to assess the impacts of process
variations on full-chip powers (leakage, dynamic, and total powers), interconnects
and their delays, voltage drops on power distribution networks, analog circuit
performances, and yields in the following chapters.

2 Book Outline

The book will present the latest developments for modeling and analysis of VLSI
systems in the presence of process variations at the nanometer scale. The authors
make no attempt to be comprehensive on the selected topics. Instead, we want to

Fig. 1.4 The comparison of circuit total power distribution of circuit c432 in the ISCAS'85 benchmark
set (top) under random input vectors (with 0.5 input signal and transition probabilities) and
(bottom) under a fixed input vector with effective channel length spatial correlations. Reprinted
with permission from [62] © 2011 IEEE

provide some promising perspectives from the angle of new analysis algorithms
to solve the existing problems with reduced design cycle and cost. We hope this
book can guide chip designers for understanding the potential and limitations of
the existing design tools when improving their circuit design productivity, CAD
developers for implementing the state-of-the-art techniques in their tools, CAD
researchers for developing better and new generation algorithms, and students for
understanding and mastering the emerging needs in the research.
The book consists of five parts. Part I starts with a review of fundamental
statistical and stochastic mathematical concepts in Chap. 2. We discuss
random processes, correlation matrices, and the Monte Carlo (MC) method. We also review
orthogonal polynomial chaos (PC) and the related spectral stochastic method,
as well as principal factor analysis (PFA) and its variants for variable reduction.

2.1 Statistical Full-Chip Power Analysis

Part II of this book focuses on the techniques for statistical full-chip power
consumption analysis considering process variations. We will look at important
aspects of statistical power analysis such as leakage powers, dynamic powers, and
total power estimation techniques in different chapters.

Chapter 3 gives the overall review of statistical leakage analysis problem


considering process variations with spatial correlations. The chapter discusses the
existing approaches and presents the pros and cons of those methods.
Chapter 4 presents a method for analyzing the full-chip leakage current distri-
butions. The method considers both intra-die and inter-die variations with spatial
correlations. The presented method employs the spectral stochastic method and
multidimensional Gaussian quadrature method to represent and compute variational
leakage at the gate level and uses the orthogonal decomposition to reduce the
number of random variables by exploiting the strong spatial correlations of intra-
die variations.
Chapter 5 gives a linear-time algorithm for full-chip statistical analysis of leakage
powers in the presence of general spatial correlation (strong or weak). The presented
algorithm adopts a set of uncorrelated virtual variables over grid cells to represent
the original physical random variables with spatial correlation, and the size of grid
cell is determined by the correlation length. A look-up table (LUT) is further applied
to cache the statistical leakage information of each type of gate in the library to avoid
computing leakage for each gate instance. As a result, the full-chip leakage can be
calculated with $O(N)$ time complexity, where $N$ is the number of grid cells on chip.
Chapter 6 proposes a statistical dynamic power estimation method considering
the spatial correlation in process variation. The chapter first shows that channel
length variations have significant impacts on the dynamic power of a gate. Like
leakage analysis, the virtual grid-based modeling is applied here to consider the
spatial correlations among gates. The segment-based statistical power method has
been used to deal with impacts of the glitch variations on dynamic powers. The
orthogonal polynomials of a statistical gate power are computed based on switching
segment probabilities. The total full-chip dynamic power expressions are then
computed by summing up resulting orthogonal polynomials (their coefficients).
Chapter 7 introduces an efficient statistical chip-level total power estimation
method considering process variations with spatial correlation. The new method
computes the total power via circuit-level simulation under realistic input testing
vectors. To consider the process variations with spatial correlation, the PFA
method is applied to transform the correlated variables into uncorrelated ones and
meanwhile reduce the number of resulting random variables. Afterward, Hermite
polynomials and sparse grid techniques are used to estimate total power distribution
in a sampling way.

2.2 Variational On-Chip Power Delivery Network Analysis

Part III of the book deals with variational analysis of on-chip power grid (distribu-
tion) networks to assess the impacts of process variations on voltage drop noises and
power delivery integrity. We have three chapters in the part: Chaps. 8–10.
Chapter 8 introduces an efficient stochastic method for analyzing the voltage
drop variations of on-chip power grid networks, considering log-normal leakage

current variations with spatial correlation. The new analysis is based on the OPC
representation of random processes. This method considers both wire variations
and subthreshold leakage current variations, which are modeled as log-normal
distribution random variables, on the power grid voltage variations. To consider the
spatial correlation, the orthogonal decomposition is carried to map the correlated
random variables into independent variables.
Chapter 9 presents another stochastic method for solving the similar problems
presented in Chap. 8. The new method, called StoEKS, still applies Hermite orthog-
onal polynomial to represent the random variables in both power grid networks
and input leakage currents. But different from the other orthogonal polynomial-
based stochastic simulation method, extended Krylov subspace (EKS) method is
employed to compute variational responses from the augmented matrices consisting
of the coefficients of Hermite polynomials. The new contributions of this method
lie in the acceleration of the spectral stochastic method using the EKS method to
fast solve the variational circuit equations. By using the reduction technique, the
presented method partially mitigates increased circuit-size problem associated with
the augmented matrices from the Galerkin-based spectral stochastic method.
Chapter 10 gives a new approach to variational power grid analysis. The new
approach, called ETBR for extended truncated balanced realization, is based on
model order reduction techniques to reduce the circuit matrices before the simula-
tion. Different from the (improved) extended Krylov subspace methods EKS/IEKS,
ETBR performs fast truncated balanced realization on response Gramian to reduce
the original system. ETBR also avoids the adverse explicit moment representation
of the input signals. Instead, it uses spectrum representation in frequency domain
for input signals by fast Fourier transformation.
The new algorithm is very efficient and scalable for huge networks with a large
number of variational variables. This approach, called varETBR for variational
ETBR, is based on model order reduction techniques to reduce the circuit matrices
before the variational simulation. It performs the parameterized reduction on the
original system using variation-bearing subspaces. varETBR calculates variational
response Gramians by MC-based numerical integration considering both system
and input source variations for generating the projection subspace. varETBR is very
scalable considering number of variables, and is flexible for different variational
distributions and ranges as demonstrated in experimental results. After the reduc-
tion, MC-based statistical simulation is performed on the reduced system, and the
statistical responses of the original system are obtained thereafter.

2.3 Statistical Interconnect Modeling and Extraction

Part IV of this book is concerned with statistical interconnect extraction and


modeling due to process variations. There are three chapters: Chaps. 11–13.

Chapter 11 introduces a statistical capacitance extraction method for interconnect


conductors considering process variations. The new method is called StatCap, where
orthogonal polynomials are used to represent the statistical processes. The chapter
shows how the variational potential coefficient matrix is represented in a first-order
form using Taylor expansion and orthogonal decomposition. Then an augmented
potential coefficient matrix, which consists of the coefficients of the polynomials,
is derived. After that, corresponding augmented system is solved to obtain the
variational capacitance values in the orthogonal polynomial form. Chapter 11
further extends StatCap to the second-order form to give more accurate results
without loss of efficiency compared to the linear models.
Chapter 12 presents a parallel and incremental solver for stochastic capacitance
extraction. Our overall extraction flow is called piCAP. The random geometrical
variation is described by stochastic geometrical moments (SGMs), which leads to
a densely augmented system equation. To efficiently extract the capacitance and
solve the system equation, a parallel fast multipole method (FMM) is derived
in the framework of stochastic GMs. This can efficiently estimate the stochastic
potential interaction and its matrix-vector product (MVP) with charge. Moreover,
a generalized minimal residual method with incremental update is developed to
calculate both the nominal value and the variance.
Chapter 13 presents a method for statistical inductance extraction and modeling
for interconnects considering process variations. The new method, called statHenry,
is based on the collocation-based spectral stochastic method. The coefficients of the
partial inductance orthogonal polynomial are computed via the collocation method
where a fast multidimensional Gaussian quadrature method is applied with sparse
grids. To further improve the efficiency of the presented method, a random variable
reduction scheme is used. Given the interconnect wire variation parameters, the
resulting method can derive the parameterized closed form of the inductance value.
The chapter will show that both partial and loop inductance variations can be
significant given the width and height variations. The presented approach can work
with any existing inductance extraction tool to extract the variational partial and
loop inductance or impedance.

2.4 Statistical Analog and Yield Analysis and Optimization

In Part V of this book, we discuss the variational analysis of analog and mixed-
signal circuits as well as the yield analysis and optimization methods based on
statistical performance analysis and modeling. We will present the performance
bound analysis technique in s-domain for linearized analog circuits (Chap. 14) and
the stochastic mismatch analysis of analog circuits (Chap. 15). Chapter 16 shows a
yield analysis and optimization technique, and Chap. 17, binning scheme.
Chapter 14 introduces a performance bound analysis of analog circuits consid-
ering process variations. The presented method applies a graph-based symbolic

analysis and affine interval arithmetic to derive the variational transfer functions
of analog circuits (linearized) with variational coefficients in forms of intervals.
Then the frequency response bounds (maximum and minimum) are obtained
by performing analysis of a finite number of transfer functions given by the
control-theoretic Kharitonov’s polynomial functions, which can be computed very
efficiently. We also show in this chapter that the response bounds given by the
Kharitonov’s functions are conservative given the correlations among coefficient
intervals in transfer functions.
Chapter 15 discusses a fast non-Monte Carlo (NMC) method to calculate mis-
match of analog circuits in time domain. The local random mismatch is described
by a noise source with an explicit dependence on geometric parameters and is
further expanded by OPC. The resulting equation forms a stochastic differential-algebraic
equation (SDAE). To deal with large-scale problems, the SDAE is linearized
at a number of snapshots along the nominal transient trajectory and, hence, is
naturally embedded into a trajectory-piecewise-linear (TPWL) macromodeling. The
modeling is further improved with a novel incremental aggregation of subspaces
identified at those snapshots.
Chapter 16 introduces a fast NMC method to capture physical-level stochastic
variations for system-level yield estimation and optimization. Based on the or-
thogonal PC expansion concept, an efficient and true NMC mismatch analysis is
developed to estimate the parametric yield. Moreover, this work further derives the
stochastic sensitivity for yield within the framework of orthogonal polynomials.
Using sensitivities, a corresponding multiobjective optimization is developed to
improve the yield rate and other performance merits, simultaneously. As a result, the
presented approach can automatically tune design parameters for a robust design.
Chapter 17 gives a yield optimization technique using voltage binning method
to improve yield of chips. Voltage binning technique tries to assign different supply
voltages to different chips in order to improve the yield. The chapter will introduce
the valid voltage segment concept, which is determined by the timing and power
constraints of chips. Then we show a formulation to predict the maximum number
of bins required under the uniform binning scheme from the distribution of length
of valid supply voltage segment. With this concept, an optimal binning scheme can
be modeled as a set-cover problem. A greedy algorithm is developed to solve the
resulting set-cover problem in an incremental way. The presented method is also
extendable to deal with the ranged supply voltages for dynamic voltage scaling
under different operation modes (like low power and high-performance modes).

3 Summary

In this chapter, we first describe the motivations for the statistical and variational
analysis and modeling of nanometer VLSI systems. We then briefly introduce
all the chapters in the book, which are divided into five parts: introduction and
fundamentals, statistical full-chip power analysis, variational power delivery network
analysis, statistical interconnect extraction and modeling, and performance bound
and statistical analysis for analog/mixed-signal circuits together with statistical yield
analysis and optimization.
Throughout the book, numerical examples are provided to shed light on the
discussed topics and to help the reader gain more insights into the discussed
methods. Our treatment of those topics does not mean to be comprehensive, but we
hope it can guide circuit designers and CAD developers to understand the important
impacts of variability and reliability on nanometer chips and limitations of their
existing tools. We hope this book helps readers to apply those techniques and to
develop new-generation CAD tools to design emerging nanometer VLSI systems.
Chapter 2
Fundamentals of Statistical Analysis

To make this book self-contained, this chapter reviews the relevant mathematical
concepts used throughout the book. We first review basic probability and statistical
concepts. Then we introduce the mathematical notation for statistical processes
with multiple variables and describe variable reduction methods. We then go through
statistical analysis approaches such as the MC method and the spectral
stochastic method. Finally, we discuss fast techniques for computing sums
of random variables with log-normal distributions.

1 Basic Concepts in Probability Theory

An understanding of probability theory is essential to statistical analysis. In this


section, we will explain some basic concepts in probability theory [132] first. More
details and other stochastic theories can be found in [132].

1.1 Experiment, Sample Space, and Event

Definition 2.1. An experiment is any process of observation or procedure that can


be repeated (theoretically) an infinite number of times and has a well-defined set of
possible outcomes.
Definition 2.2. A sample space is the set of all possible outcomes of an experiment.
Definition 2.3. An event is a subset of the sample space of an experiment.
Consider the following experiments as examples:
Example 1. Tossing a coin.
Sample space: $S = \{\text{head}, \text{tail}\}$ or
$S = \{0, 1\}$, where 0 represents a tail and 1 represents a head.

1.2 Random Variable and Expectation

Usually, we are interested in some value associated with a random event rather than
the event itself. For example, in the experiment of tossing two dice, we only care
about the sum of the two dice, not the outcome of each die.
Definition 2.4. A random variable $X$ on a sample space $S$ is a real-valued function $X: S \to \mathbb{R}$.
Definition 2.5. A discrete random variable is a random variable that takes only a
finite or countably infinite number of values (arises from counting).
Definition 2.6. A continuous random variable is a random variable whose set of
assumed values is uncountable (arises from measurement).
Let $X$ be a random variable and let $a \in \mathbb{R}$. The event "$X = a$" represents the set $\{s \in S \mid X(s) = a\}$, and the probability of this event is written as
$$\Pr(X = a) = \sum_{s \in S:\, X(s) = a} \Pr(s).$$

Example 2. Continuous random variable. A CPU is picked randomly from a group


of CPUs whose area should be 1 cm². Due to errors in the manufacturing process,
the area of a chip could vary from chip to chip in the range 0.9 cm² to 1.05 cm²,
excluding the latter.
Let $X$ denote the area of a selected chip. Possible outcomes: $0.9 \le X < 1.05$.
Example 3. Refer to the previous example. The area of a selected chip is a
continuous random variable. The following table gives the area in cm2 of 100 chips.
It lists the observed values of the continuous random variable, the corresponding
frequencies, and their probabilities.

Area $X$ (cm²)   Number of chips   $\Pr(a \le X < b)$
0.90–0.95        8                 0.08
0.95–1.00        57                0.57
1.00–1.05        35                0.35
Total            100               1.00

Definition 2.7. The expectation $E[X]$, or $\mu$, of a discrete random variable $X$ is
$$E[X] = \mu = \sum_i i \cdot \Pr(X = i),$$
where the sum is taken over all values in the range of $X$. If $\sum_i |i| \cdot \Pr(X = i)$
converges, then the expectation is finite. Otherwise, the expectation is said to
be unbounded.
$E(X)$ is also called the mean value of the probability distribution.
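To make Definition 2.7 concrete, the short Python sketch below computes the expectation of the sum of two fair dice, the discrete random variable mentioned earlier in this section; the exact-fraction arithmetic is only a convenience for this illustration.

```python
# Minimal sketch of Definition 2.7: E[X] = sum_i i * Pr(X = i) for the sum of
# two fair dice.
from itertools import product
from fractions import Fraction

pmf = {}
for d1, d2 in product(range(1, 7), repeat=2):
    s = d1 + d2
    pmf[s] = pmf.get(s, Fraction(0)) + Fraction(1, 36)

mean = sum(i * p for i, p in pmf.items())   # expectation of the sum
print(mean)                                 # prints 7
```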

1.3 Variance and Moments of Random Variable

Theorem 2.1. Markov’s inequality. For a random variable X that takes on only
nonnegative values and for all a > 0, we have

EŒX 
Pr.X  a/  :
a
Proof. Let X be a random variable such that X  0 and let a > 0. Define a random
variable I by (
1; if X  a,
I D
0; otherwise,
where EŒI  D Pr.I D 1/ D Pr.X  a/ and

X
I  : (2.1)
a
The expectations of both sides of (2.1) are given by the inequality
 
X EŒX 
EŒI  D Pr.X  a/  E D ;
a a

where we used Lemma 2.3. t


u
Definition 2.8. The $k$th moment of a random variable $X$ is $E[X^k]$. The variance of $X$ is
$$\mathrm{Var}[X] = E\left[(X - E[X])^2\right] = E\left[X^2 - 2X \cdot E[X] + (E[X])^2\right] = E[X^2] - 2E[X] \cdot E[X] + (E[X])^2 = E[X^2] - (E[X])^2,$$
and the standard deviation of $X$ is defined as
$$\sigma(X) = \sqrt{\mathrm{Var}[X]}.$$


Theorem 2.2. Chebyshev’s inequality. For any a > 0 and a random variable X ,
we have
VarŒX 
Pr .jX  EŒX j  a/  :
a2
18 2 Fundamentals of Statistical Analysis

Proof. Note that


 
Pr .jX  EŒX j  a/ D Pr .X  EŒX /2  a2

and the random variable .X  EŒX /2 > 0. Use Markov’s inequality and the
definition of variance to obtain
 
 2 2
 E .X  EŒX /2 VarŒX 
Pr .X  EŒX /  a  D
a2 a2
as required. t
u
Corollary 2.1. For any t > 1 and a random variable X , we have
  1
Pr jX  EŒX j  t  .X /  2
t
  VarŒX 
Pr jX  EŒX j  t  EŒX   2 :
t .EŒX /2

Proof. The results follow from the definitions of variance and standard deviation
and Chebyshev’s inequality. t
u
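As a quick numerical sanity check of Theorem 2.2, the sketch below compares the empirical tail probability of an arbitrary illustrative distribution (an exponential, chosen only because it has finite variance) against the Chebyshev bound $\mathrm{Var}[X]/a^2$.

```python
# Numerical check of Chebyshev's inequality for an illustrative distribution.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)   # any distribution with finite variance
mu, var = x.mean(), x.var()
for a in (2.0, 4.0, 8.0):
    empirical = np.mean(np.abs(x - mu) >= a)
    print(a, empirical, var / a**2)              # empirical tail <= Chebyshev bound
```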

1.4 Distribution Functions

Definition 2.9. A discrete probability distribution is a table (or a formula) listing


all possible values that a discrete variable can take on, together with the associated
probabilities.
Definition 2.10. The function $f(x)$ is called a probability density function (PDF) for the continuous random variable $X$ if
$$\int_a^b f(x)\,dx = \Pr(a \le X \le b) \qquad (2.2)$$
for any values of $a$ and $b$.

That is to say, the area under the curve of $f(x)$ between any two ordinates $x = a$ and $x = b$ is the probability that $X$ lies between $a$ and $b$.
It is easy to see that the total area under the PDF curve bounded by the $x$-axis is equal to 1:
$$\int_{-\infty}^{\infty} f(x)\,dx = 1. \qquad (2.3)$$

Definition 2.11. For a real-valued random variable $X$, the probability distribution is completely characterized by its cumulative distribution function (CDF):
$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \Pr[X \le x], \quad x \in \mathbb{R}, \qquad (2.4)$$
which describes the probability for the random variable to fall in the interval $(-\infty, x]$.

1.5 Gaussian and Log-Normal Distributions

Definition 2.12. A Gaussian distribution (also called a normal distribution) is denoted as $\mathcal{N}(\mu, \sigma^2)$, where, as usual, $\mu$ identifies the mean and $\sigma^2$ the variance. The PDF is defined as follows:
$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \qquad (2.5)$$
The CDF of the standard normal distribution is denoted by $\Phi(x)$ and can be computed as an integral of the PDF:
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right], \quad x \in \mathbb{R}, \qquad (2.6)$$
where $\mathrm{erf}$ is the error function.
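The closed form in (2.6) is easy to verify numerically. The short sketch below compares the erf-based expression against direct numerical integration of the PDF in (2.5); it assumes SciPy is available for the quadrature, which is only a convenience for this check.

```python
# Minimal check of (2.6): standard normal CDF via erf vs. integrating the PDF.
import math
from scipy.integrate import quad   # assumes SciPy is available

def phi_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi_pdf(t):
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

for x in (-1.0, 0.0, 1.96):
    numeric, _ = quad(phi_pdf, -10.0, x)   # integrate the PDF up to x
    print(x, phi_cdf(x), numeric)          # the two values agree
```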
Definition 2.13. If $X$ is distributed normally with mean $\mu$ and variance $\sigma^2$, then the exponential of $X$, $Y = \exp(X)$, follows a log-normal distribution. That is to say, a log-normal distribution is the probability distribution of a random variable whose logarithm is normally distributed.
The PDF and CDF of a log-normal distribution are as follows:
$$f(x; \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}, \quad x > 0, \qquad (2.7)$$
$$F_X(x; \mu, \sigma) = \frac{1}{2}\,\mathrm{erfc}\left(-\frac{\ln x - \mu}{\sigma\sqrt{2}}\right) = \Phi\left(\frac{\ln x - \mu}{\sigma}\right). \qquad (2.8)$$
More details about the sum of multiple log-normal random variables are given in Sect. 4 of Chap. 2.
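Definition 2.13 can also be seen empirically: exponentiating normal samples produces a variable whose empirical CDF matches (2.8). The parameter values below are arbitrary illustrative choices, and the erf-based expression is the equivalent form of the erfc/Φ formula in (2.8).

```python
# Minimal sketch of Definition 2.13: Y = exp(X) with X ~ N(mu, sigma^2) is
# log-normal, and its empirical CDF matches the analytical CDF of (2.8).
import numpy as np
from math import erf, sqrt, log

mu, sigma = 0.5, 0.25                              # illustrative parameters
rng = np.random.default_rng(0)
y = np.exp(rng.normal(mu, sigma, size=500_000))    # log-normal samples

def lognorm_cdf(x):
    return 0.5 * (1.0 + erf((log(x) - mu) / (sigma * sqrt(2.0))))

for x in (1.2, 1.6, 2.2):
    print(x, np.mean(y <= x), lognorm_cdf(x))      # empirical vs. analytical CDF
```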

1.6 Basic Concepts for Multiple Random Variables

Definition 2.14. Two random variables $X$ and $Y$ are independent if
$$\Pr((X = x) \cap (Y = y)) = \Pr(X = x) \cdot \Pr(Y = y)$$
for all $x, y \in \mathbb{R}$. Furthermore, the random variables $X_1, X_2, \ldots, X_k$ are mutually independent if for any subset $I \subseteq \{1, 2, \ldots, k\}$ and any values $x_i$ for $i \in I$, we have
$$\Pr\left(\bigcap_{i \in I} X_i = x_i\right) = \prod_{i \in I} \Pr(X_i = x_i).$$

Theorem 2.3. Linearity of expectations. Let $X_1, X_2, \ldots, X_n$ be a finite collection of discrete random variables with finite expectations. Then
$$E\left[\sum_i X_i\right] = \sum_i E[X_i].$$

Proof. We use induction on the number of random variables. For the base case, let $X$ and $Y$ be random variables. Use the law of total probability to get
$$\begin{aligned}
E[X + Y] &= \sum_i \sum_j (i + j) \cdot \Pr((X = i) \cap (Y = j)) \\
&= \sum_i \sum_j i \cdot \Pr((X = i) \cap (Y = j)) + \sum_i \sum_j j \cdot \Pr((X = i) \cap (Y = j)) \\
&= \sum_i i \sum_j \Pr((X = i) \cap (Y = j)) + \sum_j j \sum_i \Pr((X = i) \cap (Y = j)) \\
&= \sum_i i \cdot \Pr(X = i) + \sum_j j \cdot \Pr(Y = j) \\
&= E[X] + E[Y]. \qquad \square
\end{aligned}$$
Linearity of expectations holds for any collection of random variables, even if they are not independent. Furthermore, if $\sum_{i=1}^{\infty} E[|X_i|]$ converges, then it can be shown that
$$E\left[\sum_{i=1}^{\infty} X_i\right] = \sum_{i=1}^{\infty} E[X_i].$$

Lemma 2.1. Let $c$ be any constant and $X$ a random variable. Then
$$E[cX] = c \cdot E[X].$$
Proof. The case $c = 0$ is trivial. Suppose $c \ne 0$. Then
$$E[cX] = \sum_i i \cdot \Pr(cX = i) = c \sum_i (i/c) \cdot \Pr(X = i/c) = c \sum_k k \cdot \Pr(X = k) = c \cdot E[X]$$
as required. $\square$
If $X$ and $Y$ are two random variables, their covariance is
$$\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[(Y - E[Y])(X - E[X])] = \mathrm{Cov}(Y, X).$$

Theorem 2.4. For any two random variables $X$ and $Y$, we have
$$\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2 \cdot \mathrm{Cov}(X, Y).$$
Proof. Use the linearity of expectations, and the definitions of variance and covariance, to obtain
$$\begin{aligned}
\mathrm{Var}[X + Y] &= E\left[(X + Y - E[X + Y])^2\right] \\
&= E\left[(X + Y - E[X] - E[Y])^2\right] \\
&= E\left[(X - E[X])^2 + (Y - E[Y])^2 + 2(X - E[X])(Y - E[Y])\right] \\
&= E\left[(X - E[X])^2\right] + E\left[(Y - E[Y])^2\right] + 2 \cdot E\left[(X - E[X])(Y - E[Y])\right] \\
&= \mathrm{Var}[X] + \mathrm{Var}[Y] + 2 \cdot \mathrm{Cov}(X, Y)
\end{aligned}$$
as required. $\square$

Theorem 2.4 can be extended to a sum of any finite number of random variables. For a collection $X_1, \ldots, X_n$ of random variables, it can be shown that
$$\mathrm{Var}\left[\sum_i X_i\right] = \sum_i \mathrm{Var}[X_i] + 2 \sum_i \sum_{j > i} \mathrm{Cov}(X_i, X_j).$$

Theorem 2.5. For any two independent random variables $X$ and $Y$, we have
$$E[X \cdot Y] = E[X] \cdot E[Y].$$
Proof. Let the indices $i$ and $j$ assume all values in the ranges of $X$ and $Y$, respectively. As $X$ and $Y$ are independent random variables, then
$$\begin{aligned}
E[X \cdot Y] &= \sum_i \sum_j ij \cdot \Pr((X = i) \cap (Y = j)) \\
&= \sum_i \sum_j ij \cdot \Pr(X = i) \cdot \Pr(Y = j) \\
&= \left[\sum_i i \cdot \Pr(X = i)\right] \left[\sum_j j \cdot \Pr(Y = j)\right] \\
&= E[X] \cdot E[Y]
\end{aligned}$$
as required. $\square$
Corollary 2.2. For any independent random variables $X$ and $Y$, we have
$$\mathrm{Cov}(X, Y) = 0$$
and
$$\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y].$$
Proof. As $X$ and $Y$ are independent, then so are $X - E[X]$ and $Y - E[Y]$. For any random variable $Z$, we have
$$E[Z - E[Z]] = E[Z] - E[E[Z]] = 0.$$
Using Theorem 2.5, the covariance of $X$ and $Y$ is
$$\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[X - E[X]] \cdot E[Y - E[Y]] = 0.$$
Conclude via the latter equation and Theorem 2.4 that
$$\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2 \cdot \mathrm{Cov}(X, Y) = \mathrm{Var}[X] + \mathrm{Var}[Y]$$
as required. $\square$
Definition 2.15. For a collection of random variables $X = X_1, \ldots, X_n$, the covariance matrix $\Omega_{n \times n}$ is defined as
$$\Omega = \begin{pmatrix}
\mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_n) \\
\mathrm{Cov}(X_2, X_1) & \mathrm{Var}(X_2) & \cdots & \mathrm{Cov}(X_2, X_n) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(X_{n-1}, X_1) & \mathrm{Cov}(X_{n-1}, X_2) & \cdots & \mathrm{Cov}(X_{n-1}, X_n) \\
\mathrm{Cov}(X_n, X_1) & \mathrm{Cov}(X_n, X_2) & \cdots & \mathrm{Var}(X_n)
\end{pmatrix}.$$
When $X_1, \ldots, X_n$ are mutually independent random variables, it can be shown by induction that
$$\mathrm{Var}\left[\sum_i X_i\right] = \sum_i \mathrm{Var}[X_i],$$
and the covariance matrix is a diagonal matrix in this case.
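A small numerical illustration of Definition 2.15 and Corollary 2.2 follows; the three variables and their dependence structure are arbitrary choices made only for this sketch.

```python
# Minimal sketch: sample covariance matrix of correlated vs. independent variables.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.standard_normal(100_000)
x2 = 0.8 * x1 + 0.6 * rng.standard_normal(100_000)   # correlated with x1
x3 = rng.standard_normal(100_000)                    # independent of both

Omega = np.cov(np.vstack([x1, x2, x3]))
print(Omega.round(2))                                # Cov(x1,x2) ~ 0.8, Cov(.,x3) ~ 0
print(np.var(x1 + x3), np.var(x1) + np.var(x3))      # Var[X+Y] = Var[X]+Var[Y] when independent
```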

2 Multiple Random Variables and Variable Reduction

2.1 Components of Covariance in Process Variation

In general, process variation can be classified into two categories [13]: inter-die
and intra-die. Inter-die variations are variations from die to die, while intra-die
variations correspond to variability within a single chip. Inter-die variations are
global variables and, hence, affect all the devices on a chip in a similar fashion. For
example, they can cause the channel lengths of all the devices on the same chip to be smaller.
Intra-die variations may affect devices differently on the same chip. For example, they
can cause some devices to have smaller gate oxide thicknesses and others to have larger
gate oxide thicknesses. The intra-die variations may exhibit spatial correlation. For
example, it is more likely for devices located close to each other to have similar
characteristics.
Fig. 2.1 Grid-based model for spatial correlations

In general, we can model parameter variation as follows:

$$\delta_{\mathrm{total}} = \delta_{\mathrm{inter}} + \delta_{\mathrm{intra}}, \qquad (2.9)$$

where $\delta_{\mathrm{inter}}$ and $\delta_{\mathrm{intra}}$ represent the inter-die variation and intra-die variation, respectively. In general [13, 95, 169], $\delta_{\mathrm{inter}}$ and $\delta_{\mathrm{intra}}$ can be modeled as Gaussian random variables with normal distributions. In this chapter, we will discuss both the Gaussian and non-Gaussian cases. Note that, due to the global effect of inter-die variation, a single random variable $\delta_{\mathrm{inter}}$ is used for all gates/grids on one chip.
For $\delta_{\mathrm{intra}}$, the value of a parameter $p$ located at $(x, y)$ can be modeled as a location-dependent, normally distributed random variable [101]:

$$p = \mu_p + \delta_x + \delta_y + \epsilon, \qquad (2.10)$$

where $\mu_p$ is the mean value (nominal design parameter value) at $(0, 0)$, and $\delta_x$ and $\delta_y$ stand for gradients of the parameter indicating the spatial variations of $p$ along the $x$ and $y$ directions, respectively. $\epsilon$ represents the random intra-chip variation. Due to spatial correlations in the intra-chip variation, the vector of all random components across the chip $\epsilon$ has a correlated multivariate normal distribution, $\epsilon \sim N(0, \Sigma)$, where $\Sigma$ is the covariance matrix of the spatially correlated parameters.
A grid-based method is introduced in [13] to consider the correlation. In the grid-based method, the intra-die spatial correlation of parameters is partitioned into $\sqrt{n}_{\mathrm{row}} \times \sqrt{n}_{\mathrm{col}} = n$ grids. Since devices close to each other are more likely to have similar characteristics than those placed far away, grid-based methods assume a perfect correlation among the devices in the same grid, high correlations among those in close grids, and low to zero correlations in faraway grids. For example, in Fig. 2.1, Gate1 and Gate2 (whose sizes are drawn exaggeratedly large) are located in the same grid square, and hence, their parameter variations, such as the variations of their gate channel length, are assumed to be always identical.
Gate1 and Gate3 lie in neighboring grids, and hence, their parameter variations
are not identical but highly correlated due to their spatial proximity. For example,
when Gate1 has a larger than nominal gate channel length, Gate3 is more likely
to have a larger than nominal gate channel length. On the other hand, Gate1 and
Gate4 are far away from each other; their parameters can be assumed as weakly
correlated or uncorrelated. For example, when Gate1 has a larger than nominal gate
channel length, the gate channel length for Gate4 may be either larger or smaller
than nominal.
With the grid-based model, we can use a single random variable $p(x, y)$ to model a parameter variation in a single grid at location $(x, y)$. As a result, $n$ random variables are needed for each type of parameter, where each represents the value of a parameter in one of the $n$ grids. In addition, we assume that correlation only exists among the same type of parameters in different grids. Note that this assumption is not critical and can easily be removed. For example, gate length $L$ for transistors in the $i$th grid is correlated with those in nearby grids, but is uncorrelated with other parameters such as gate oxide thickness $T_{ox}$ in any grid including the $i$th grid itself. For each type of parameter, a correlation matrix $\Sigma$ of size $n \times n$ represents the spatial correlation of this parameter. Notice that the number of grid partitions needed is determined by the process, not the circuit. So we can apply the same correlation model to different designs under the same process.
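The following short sketch illustrates the grid-based model: one random variable per grid cell, gates mapped to cells share the same sample, and nearby cells are more strongly correlated. The grid size, pitch, gate-to-cell mapping, and the distance-based correlation function are all illustrative assumptions, since the text does not fix a specific correlation function here.

```python
# Sketch of the grid-based spatial correlation model: an n x n correlation
# matrix over grid-cell centers, one random sample per cell.
import numpy as np

m = 4                                    # sqrt(n): a 4 x 4 = 16-cell grid (assumed)
cell = 100.0                             # grid pitch in um (illustrative)
xs, ys = np.meshgrid(np.arange(m), np.arange(m))
centers = np.column_stack([xs.ravel(), ys.ravel()]) * cell

d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
corr_len = 200.0                         # assumed correlation length (um)
Sigma = np.exp(-d / corr_len)            # n x n spatial correlation matrix

rng = np.random.default_rng(0)
sample = rng.multivariate_normal(np.zeros(m * m), Sigma)  # one draw per grid cell
gate_cell = np.array([0, 0, 1, 15])      # hypothetical mapping of four gates to cells
gate_delta = sample[gate_cell]           # gates in cell 0 share an identical value
print(Sigma[0, 1].round(3), Sigma[0, -1].round(3))        # near vs. far-cell correlation
```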

2.2 Random Variable Decoupling and Reduction

Due to correlation, the large number of random variables involved in VLSI design can be reduced. After decoupling the correlated random variables, one may further reduce the cost of statistical analysis with the spectral stochastic method discussed in Sect. 3. Since the random variables are correlated, this correlation should be removed before using the spectral stochastic method. In this part, we first present the theoretical basis for decoupling the correlation of random variables.

Proposition 2.1. For a set of zero-mean Gaussian-distributed variables $\xi^*$ whose covariance matrix is $\Omega$, if there is a matrix $L$ satisfying $\Omega = LL^T$, then $\xi^*$ can be represented by a set of independent standard normal distributed variables $\xi$ as $\xi^* = L\xi$.

Proof. By the properties of the normal distribution, a linear transformation preserves the zero mean of the variables and yields another normal distribution. Thus, we only need to show that the transformation reproduces the covariance matrix. According to the definition of covariance,
$$\mathrm{cov}(L\xi) = E\left[L\xi (L\xi)^T\right] = L\, E\left[\xi \xi^T\right] L^T. \qquad (2.11)$$
Since $\xi$ follows the standard normal distribution, $E[\xi\xi^T]$ is the identity matrix, so
$$L\, E\left[\xi \xi^T\right] L^T = LL^T = \Omega. \qquad (2.12)$$

2.3 Principal Factor Analysis Technique

Note that the solution for decoupling is not unique. For example, Cholesky decomposition can be used to obtain $L$, since the covariance matrix $\Omega$ is always a positive semidefinite matrix. However, Cholesky decomposition cannot reduce the number of variables. PFA [74] can substitute for Cholesky decomposition when variable reduction is needed. Eigendecomposition of the covariance matrix yields
$$\Omega = LL^T, \quad L = \left[\sqrt{\lambda_1}\, e_1, \ldots, \sqrt{\lambda_n}\, e_n\right], \qquad (2.13)$$
where $\{\lambda_i\}$ are eigenvalues in order of descending magnitude, and $\{e_i\}$ are the corresponding eigenvectors. PFA reduces the number of components in $\xi$ by truncating $L$ to its first $k$ columns.
The error of PFA can be controlled by $k$:
$$\mathrm{err} = \frac{\sum_{i=k+1}^{n} \lambda_i}{\sum_{i=1}^{n} \lambda_i}, \qquad (2.14)$$
where a bigger $k$ leads to a more accurate result. PFA is efficient, especially when the correlation length is large. In our experiments, we set the correlation length to eight times the width of the wires. As a result, PFA can reduce the number of variables from 40 to 14 with an error of about 1% in an example with 20 parallel wires.
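A minimal PFA sketch following (2.13) and (2.14) is shown below: eigendecompose the covariance matrix, keep the $k$ dominant factors, and evaluate the truncation error. The 1-D positions and the exponential-decay covariance are assumptions made only for this illustration; $n = 40$ and $k = 14$ simply echo the example sizes mentioned above.

```python
# PFA sketch: truncate L = [sqrt(lam_i) e_i] to the k dominant factors.
import numpy as np

n, corr_len = 40, 8.0
d = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))   # assumed 1-D positions
Omega = np.exp(-d / corr_len)                               # example covariance matrix

lam, E = np.linalg.eigh(Omega)
idx = np.argsort(lam)[::-1]                                 # descending eigenvalues
lam, E = lam[idx], E[:, idx]

k = 14
L_k = E[:, :k] * np.sqrt(lam[:k])        # truncated L, first k columns of Eq. (2.13)
err = lam[k:].sum() / lam.sum()          # truncation error, Eq. (2.14)
print(k, err)

xi = np.random.default_rng(0).standard_normal(k)
xi_star = L_k @ xi                        # correlated variables rebuilt from k factors
```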

2.4 Weighted PFA Technique

One idea is to consider the importance of the outputs during the reduction process when using PFA. Recently, the weighted PFA (wPFA) technique has been used [204] to obtain better variable reduction efficiency.
If a weight is defined for each physical variable $\xi_i$ to reflect its impact on the output, then a set of new variables $\xi^*$ is formed:
$$\xi^* = W\xi, \qquad (2.15)$$
where $W = \mathrm{diag}(w_1, w_2, ..., w_n)$ is a diagonal matrix of weights. As a result, the covariance matrix of $\xi^*$, $\Omega(\xi^*)$, now contains the weight information, and performing PFA on $\Omega(\xi^*)$ leads to the weighted variable reduction. Specifically, we have
$$\Omega(\xi^*) = E\left[W\xi (W\xi)^T\right] = W \Omega(\xi) W^T, \qquad (2.16)$$
and denote its eigenvalues and eigenvectors by $\lambda_i^*$ and $e_i^*$. Then, the variables $\xi$ can be approximated by the linear combination of a set of independent dominant variables $\xi'$:
$$\xi = W^{-1}\xi^* \approx W^{-1} \sum_{i=1}^{k} \sqrt{\lambda_i^*}\, e_i^*\, \xi_i'. \qquad (2.17)$$
The error controlling process is similar to (2.14) but uses the weighted eigenvalues $\lambda_i^*$.

2.5 Principal Component Analysis Technique

We first briefly review the concept of principal component analysis (PCA), which is used here to transform random variables with correlation into uncorrelated random variables [75].
Suppose that $x$ is a vector of $n$ random variables, $x = [x_1, x_2, ..., x_n]^T$, with covariance matrix $\Omega$ and mean vector $\mu_x = [\mu_{x_1}, \mu_{x_2}, ..., \mu_{x_n}]$. To find the orthogonal random variables, we first calculate the eigenvalues and corresponding eigenvectors. Then, by ordering the eigenvectors in descending order of eigenvalues, the orthogonal matrix $A$ is obtained. Here, $A$ is expressed as
$$A = \left[e_1^T, e_2^T, ..., e_n^T\right]^T, \qquad (2.18)$$
where $e_i$ is the eigenvector corresponding to eigenvalue $\lambda_i$, which satisfies
$$\lambda_i e_i = \Omega e_i, \quad i = 1, 2, ..., n, \qquad (2.19)$$
and
$$\lambda_i < \lambda_{i-1}, \quad i = 2, 3, ..., n. \qquad (2.20)$$
With $A$, we can perform the transformation to get orthogonal random variables $y = [y_1, y_2, ..., y_n]^T$ by using
$$y = A(x - \mu_x), \qquad (2.21)$$
where $y_i$ is a random variable with Gaussian distribution. The mean $\mu_{y_i}$ is 0 and the standard deviation $\sigma_{y_i}$ is $\sqrt{\lambda_i}$ on the condition that [75]
$$e_i^T e_i = 1, \quad i = 1, 2, ..., n. \qquad (2.22)$$
Here, because of the orthogonal property of matrix $A$,
$$A^{-1} = A^T. \qquad (2.23)$$
To reconstruct the original random variables, we use the following equation:
$$x = A^T y + \mu_x. \qquad (2.24)$$
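A compact numerical sketch of (2.18)–(2.24) is given below; the covariance matrix and mean vector are arbitrary illustrative values. It verifies that the transformed variables are uncorrelated with variances equal to the eigenvalues, and that the original variables are recovered exactly.

```python
# PCA sketch: y = A(x - mu_x) decorrelates x; x = A^T y + mu_x reconstructs it.
import numpy as np

Omega = np.array([[1.0, 0.6, 0.2],
                  [0.6, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])          # illustrative covariance matrix
mu_x = np.array([10.0, 20.0, 30.0])          # illustrative mean vector

lam, E = np.linalg.eigh(Omega)
order = np.argsort(lam)[::-1]                # descending eigenvalues, Eq. (2.20)
lam, A = lam[order], E[:, order].T           # rows of A are unit eigenvectors, Eq. (2.18)

rng = np.random.default_rng(0)
x = rng.multivariate_normal(mu_x, Omega, size=200_000)
y = (x - mu_x) @ A.T                         # y = A (x - mu_x), Eq. (2.21)

print(np.cov(y.T).round(3))                  # ~diag(lam): y is uncorrelated
x_rec = y @ A + mu_x                         # x = A^T y + mu_x, Eq. (2.24)
print(np.allclose(x, x_rec))
```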

3 Statistical Analysis Approaches

3.1 Monte Carlo Method

Monte Carlo techniques [41] are usually used to estimate the value of a definite, finite-dimensional integral of the form
$$G = \int_S g(X) f(X)\,dX, \qquad (2.25)$$
where $S$ is a finite domain and $f(X)$ is a PDF over $X$, i.e., $f(X) \ge 0$ for all $X$ and $\int_S f(X)\,dX = 1$. We can accomplish the MC estimation of the value of $G$ by drawing a set of independent samples $X_1, X_2, ..., X_{MC}$ from $f(X)$ and by applying
$$G_{MC} = \frac{1}{MC} \sum_{i=1}^{MC} g(X_i). \qquad (2.26)$$
The estimator $G_{MC}$ above is a random variable. Its mean value is the integral $G$ to estimate, i.e., $E(G_{MC}) = G$, making it an unbiased estimator. The variance of $G_{MC}$ is $\mathrm{Var}(G_{MC}) = \sigma^2 / MC$, where $\sigma^2$ is the variance of the random variable $g(X)$, given by
$$\sigma^2 = \int_S g^2(X) f(X)\,dX - G^2. \qquad (2.27)$$
We can use the standard deviation of $G_{MC}$ to assess its accuracy in estimating $G$. If the sample number $MC$ is sufficiently large, then by the Central Limit Theorem, $\frac{G_{MC} - G}{\sigma/\sqrt{MC}}$ has an approximate standard normal distribution ($N(0, 1)$). Hence,
$$P\left(G - 1.96\frac{\sigma}{\sqrt{MC}} \le G_{MC} \le G + 1.96\frac{\sigma}{\sqrt{MC}}\right) \approx 0.95, \qquad (2.28)$$
where $P$ is the probability measure. Equation (2.28) shows that $G_{MC}$ will be in the interval $\left[G - 1.96\frac{\sigma}{\sqrt{MC}},\; G + 1.96\frac{\sigma}{\sqrt{MC}}\right]$ with 95% confidence. Thus, one can use the error measure
$$|\mathrm{Error}| \le \frac{2\sigma}{\sqrt{MC}} \qquad (2.29)$$
in order to assess the accuracy of the estimator.
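The sketch below applies (2.25)–(2.29) to a simple case; the integrand $g$ and the standard normal weight $f$ are illustrative choices for which the exact answer is known, so the confidence half-width can be checked directly.

```python
# Monte Carlo sketch: estimate G = E[g(X)] and report the 2*sigma/sqrt(MC)
# error measure of Eq. (2.29).
import numpy as np

g = lambda x: x**2 + np.sin(x)          # illustrative integrand
MC = 100_000
rng = np.random.default_rng(0)
X = rng.standard_normal(MC)             # samples drawn from f(X) = N(0, 1)

G_mc = g(X).mean()                      # Eq. (2.26)
sigma = g(X).std()                      # sample estimate of sigma in (2.27)
err_95 = 2.0 * sigma / np.sqrt(MC)      # Eq. (2.29): ~95% confidence half-width

print(G_mc, "+/-", err_95, "exact:", 1.0)   # E[X^2] = 1 and E[sin X] = 0 for N(0, 1)
```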

3.2 Spectral Stochastic Method Using Stochastic Orthogonal Polynomial Chaos

One recent advance in fast statistical analysis is to apply stochastic OPC [187] to the
nanometer-scale integrated circuit analysis. Based on the Askey scheme [196], any
stochastic random variable can be represented by an OPC expansion, and random
variables with different probability distribution types are associated with different
families of orthogonal polynomials.
Hermite polynomial chaos (Hermite PC or HPC) utilizes a series of orthogonal
polynomials (with respect to the Gaussian distribution) to facilitate stochastic
analysis [197]. These polynomials are used as the orthogonal base to decompose
a random process in a similar way that sine and cosine functions are used to
decompose a periodic signal in a Fourier series expansion. Note that for the
Gaussian and log-normal distributions, Hermite polynomials are the best choice as
they lead to an exponential convergence rate [45]. For non-Gaussian and non-log-
normal distributions, there are other orthogonal polynomials such as Legendre for
uniform distribution, Charlier for Poisson distribution, and Krawtchouk for binomial
distribution [44, 187].
For a random variable $y(\xi)$ with limited variance, where $\xi = [\xi_1, \xi_2, \ldots, \xi_n]$ is
a vector of zero-mean orthogonal Gaussian random variables, the random variable
can be approximated by a truncated Hermite PC expansion as follows [45]:

$y(\xi) = \sum_{k=0}^{P} a_k H_k^n(\xi),$  (2.30)

where $n$ is the number of independent random variables, the $H_k^n(\xi)$ are $n$-dimensional
Hermite polynomials, and the $a_k$ are the deterministic coefficients. The number of terms
$P$ is given by

$P = \sum_{k=0}^{p} \frac{(n-1+k)!}{k!\,(n-1)!},$  (2.31)

where p is the order of the Hermite PC.


Similarly, a random process $v(t, \xi)$ with limited variance can be approximated as

$v(t, \xi) = \sum_{k=0}^{P} a_k(t) H_k^n(\xi).$  (2.32)

If only one random variable/process is considered, the one-dimensional Hermite
polynomials are expressed as follows:

$H_0^1(\xi) = 1,\quad H_1^1(\xi) = \xi,\quad H_2^1(\xi) = \xi^2 - 1,\quad H_3^1(\xi) = \xi^3 - 3\xi,\;\ldots.$  (2.33)

Hermite polynomials are orthogonal with respect to the Gaussian weighted expectation
(the superscript $n$ is dropped for simple notation):

$\langle H_i(\xi), H_j(\xi) \rangle = \langle H_i^2(\xi) \rangle \delta_{ij},$  (2.34)

where $\delta_{ij}$ is the Kronecker delta and $\langle \cdot, \cdot \rangle$ denotes an inner product defined as
follows:

$\langle f(\xi), g(\xi) \rangle = \frac{1}{\sqrt{(2\pi)^n}} \int f(\xi)\, g(\xi)\, e^{-\frac{1}{2}\xi^T\xi}\, d\xi.$  (2.35)
.2/n
Similar to Fourier series, the coefficient $a_k$ for a random variable $y$ and $a_k(t)$ for a
random process $v(t)$ can be found by a projection operation onto the HPC basis:

$a_k = \frac{\langle y(\xi), H_k(\xi) \rangle}{\langle H_k^2(\xi) \rangle},$  (2.36)

$a_k(t) = \frac{\langle v(t,\xi), H_k(\xi) \rangle}{\langle H_k^2(\xi) \rangle}, \quad \forall k \in \{0, \ldots, P\}.$  (2.37)

Once we obtain the Hermite PC, we can calculate the mean and variance of the
random variable $y(\xi)$ by a one-time analysis as (one Gaussian variable case):

$E(y(\xi)) = y_0,$
$\mathrm{Var}(y(\xi)) = y_1^2\, \mathrm{Var}(\xi_1) + y_2^2\, \mathrm{Var}(\xi_1^2 - 1) = y_1^2 + 2 y_2^2.$  (2.38)

Similarly, for a random process $v(t,\xi)$ (one Gaussian variable case), the mean and
variance are as follows:

$E(v(t,\xi)) = v_0(t),$
$\mathrm{Var}(v(t,\xi)) = v_1^2(t)\, \mathrm{Var}(\xi_1) + v_2^2(t)\, \mathrm{Var}(\xi_1^2 - 1) = v_1^2(t) + 2 v_2^2(t).$  (2.39)

One critical problem that remains is how to obtain the coefficients of the Hermite
PC in (2.36) and (2.37) efficiently. There are two kinds of techniques to calculate
these coefficients: the collocation-based spectral stochastic method and the
Galerkin-based spectral stochastic method. In short, we refer to them in the later
parts of the book as the collocation-based and Galerkin-based methods.
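Before turning to the two methods, a minimal Python sketch (illustrative only; the PC coefficients are hypothetical) shows the one-dimensional Hermite polynomials of (2.33) and the mean/variance recovery of (2.38), using NumPy's probabilists' Hermite polynomials, which match (2.33).

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

# Probabilists' Hermite polynomials He_k match Eq. (2.33):
# He_0 = 1, He_1 = xi, He_2 = xi^2 - 1, He_3 = xi^3 - 3*xi
xi = np.random.default_rng(1).standard_normal(200000)

# Hypothetical second-order Hermite PC of a random variable y(xi):
y0, y1, y2 = 3.0, 0.5, 0.2
y = hermeval(xi, [y0, y1, y2])        # y = y0 + y1*He_1(xi) + y2*He_2(xi)

# Eq. (2.38): mean = y0, variance = y1^2 + 2*y2^2
print(y.mean(), "vs", y0)
print(y.var(), "vs", y1**2 + 2 * y2**2)
```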

3.3 Collocation-Based Spectral Stochastic Method

The collocation method is mainly based on computing the definite integral of a


function [70]. The Gaussian quadrature is the commonly used method. We can
compute the coefficients ak and ak .t/ in (2.36) and (2.37), respectively. We review
this method by using the Hermite polynomial shown below.
Our objective is to determine the numerical solution of the integral
$\langle y(\xi), H_k(\xi) \rangle$ ($y$ can be a random variable or a random process). In our problem,
this is a one-dimensional numerical quadrature problem based on Hermite
polynomials [70]. Thus, we have
$\langle y(\xi), H_k(\xi) \rangle = \frac{1}{\sqrt{2\pi}} \int y(\xi)\, H_k(\xi)\, e^{-\frac{1}{2}\xi^2}\, d\xi
\approx \sum_{i=0}^{P} y(\xi_i)\, H_k(\xi_i)\, w_i.$  (2.40)

Here we have only a single random variable $\xi$; $\xi_i$ and $w_i$ are the Gauss-Hermite
quadrature abscissas (quadrature points) and weights.
The quadrature rule states that if we select the roots of the P th Hermite
polynomial as the quadrature points, the quadrature is exact for all polynomials
of degree 2P  1 or less for (2.40). This is called (P  1)-level accuracy of the
Gaussian-Hermite quadrature.
For multiple random variables, a multidimensional quadrature is required. The
traditional way of computing a multidimensional quadrature is to use a direct
tensor product based on one-dimensional Gaussian Hermite quadrature abscissas

and weights [126]. With this method, the number of quadrature points needed for
$n$ dimensions at level $P$ is about $(P+1)^n$, which is well known as the curse of
dimensionality.
Smolyak quadrature [126], also known as sparse grid quadrature, is used as an
efficient method to reduce the number of quadrature points. Let us define a one-
dimensional sparse grid quadrature point set $\Theta_1^P = \{\xi_0, \xi_1, \ldots, \xi_P\}$, which uses
$P+1$ points to achieve degree $2P+1$ of exactness. The sparse grid for an $n$-
dimensional quadrature at degree $P$ chooses points from the following set:

$\Theta_n^P = \bigcup_{P+1 \le |i| \le P+n} \left( \Theta_1^{i_1} \times \cdots \times \Theta_1^{i_n} \right),$  (2.41)

where $|i| = \sum_{j=1}^{n} i_j$. The corresponding weight is

$w_{j_1 \ldots j_n}^{i_1 \ldots i_n} = (-1)^{P+n-|i|} \binom{n-1}{n+P-|i|} \prod_m w_{j_m}^{i_m},$  (2.42)

where $\binom{n-1}{n+P-|i|}$ is the combinatorial number and $w_{j_m}^{i_m}$ is the weight of the
corresponding quadrature point. It has been shown that interpolation on a Smolyak
grid ensures a bound for the mean-square error [126]

$|E_P| = O\!\left( N_P^{-r} (\log N_P)^{(r+1)(n-1)} \right),$

where $N_P$ is the number of quadrature points and $r$ is the order of the maximum
derivative that exists for the delay function. The number of quadrature points
increases as $O\!\left( \frac{n^P}{P!} \right)$.
It can be shown that a sparse grid at least with level P is required for an order P
representation. The reason is that the approximation contains order P polynomials
for both y./ and Hj ./. Thus, there exists y./Hj ./ with order 2P , which
requires a sparse grid of at least level P with an exactness degree of 2P C 1.
Therefore, level 1 and level 2 sparse grids are required for linear and quadratic
models, respectively. The number of quadrature points is about 2n for the linear
model and 2n2 for the quadratic model. The computational cost is about the same as
the Taylor-conversion method, while keeping the accuracy of homogeneous chaos
expansion.
In addition to the sparse grid technique, we can also employ several accelerating
techniques. Firstly, when n is too small, the number of quadrature points for sparse
grid may be larger than that of direct tensor product of a Gaussian quadrature. For
example, if there are only two variables, the number is 5 and 15 for level 1 and 2
sparse grid, compared to 4 and 9 for direct tensor product. In this case, the sparse
grid will not be used. Secondly, the set of quadrature points (2.41) may contain the
same points with different weights. For example, the level 2 sparse grid for three
variables contains four instances of the point (0,0,0). Combining these points by
summing the weights reduces the computational cost of y. i /.
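To illustrate the collocation (projection) computation of (2.36) via (2.40) in the one-dimensional case, here is a small Python sketch (illustrative only; the function $y(\xi)$ is hypothetical). It uses Gauss-Hermite quadrature for the probabilists' weight $e^{-\xi^2/2}$ and normalizes the weights by $\sqrt{2\pi}$; a full sparse-grid implementation for many variables is not shown.

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermegauss, hermeval

def hermite_pc_coeffs(y_func, order, n_quad):
    """Project y(xi) onto 1-D probabilists' Hermite polynomials, Eq. (2.40)."""
    xi, w = hermegauss(n_quad)            # nodes/weights for weight exp(-xi^2/2)
    w = w / np.sqrt(2.0 * np.pi)          # normalize so the weights sum to 1
    yvals = y_func(xi)
    coeffs = []
    for k in range(order + 1):
        hk = hermeval(xi, [0.0] * k + [1.0])   # He_k evaluated at the nodes
        coeffs.append(np.sum(yvals * hk * w) / factorial(k))  # <He_k^2> = k!
    return np.array(coeffs)

# Hypothetical y(xi): a log-normal-like response exp(0.3*xi)
a = hermite_pc_coeffs(lambda x: np.exp(0.3 * x), order=2, n_quad=8)
print(a)     # roughly a0 ~ exp(0.045), a1 ~ 0.3*a0, a2 ~ 0.045*a0
```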

3.4 Galerkin-Based Spectral Stochastic Method

The Galerkin-based method is based on the principle of orthogonality: the best
approximation $\hat{y}(\xi)$ of $y(\xi)$ is obtained when the error, $\Delta(\xi)$, defined as

$\Delta(\xi) = y(\xi) - \hat{y}(\xi),$  (2.43)

is orthogonal to the approximation. That is,

$\langle \Delta(\xi), H_k(\xi) \rangle = 0, \quad k = 0, 1, \ldots, P,$  (2.44)

where the $H_k(\xi)$ are Hermite polynomials. In this way, we have transformed the
stochastic analysis process into a deterministic one, in which we only need to
compute the corresponding coefficients of the Hermite PC.
   For illustration purposes, considering two Gaussian variables $\xi = [\xi_1, \xi_2]$ and
assuming that $y(\xi)$ can be written as a second-order ($p = 2$) Hermite PC, we have

$y(\xi) = y_0 + y_1 \xi_1 + y_2 \xi_2 + y_3 (\xi_1^2 - 1) + y_4 (\xi_2^2 - 1) + y_5 (\xi_1 \xi_2),$  (2.45)

which will be solved via (2.44). Once the Hermite PC of $y(\xi)$ is known, the mean
and variance of $y(\xi)$ can be evaluated trivially. For example, for one random
variable, the mean and variance are calculated as

$E(y(\xi)) = y_0,$
$\mathrm{Var}(y(\xi)) = y_1^2\, \mathrm{Var}(\xi) + y_2^2\, \mathrm{Var}(\xi^2 - 1) = y_1^2 + 2 y_2^2.$  (2.46)

To account for correlations among random variables, we apply the PCA of Sect. 2.5
to transform the correlated variables into a set of independent variables.

4 Sum of Log-Normal Random Variables

Leakage currents usually follow a log-normal distribution. Due to the
exponential convergence rate, Hermite PC can be used to represent log-normal
variables and the sum of log-normal variables [109].

4.1 Hermite PC Representation of Log-Normal Variables

Let $g(\xi)$ be a Gaussian random variable and $l(\xi)$ be the random variable obtained
by taking the exponential of $g(\xi)$,

$l(\xi) = e^{g(\xi)}, \qquad g(\xi) = \ln(l(\xi)).$  (2.47)

For the log-normal random variable $l(\xi)$, let the mean and variance of $g(\xi)$ be $\mu_g$
and $\sigma_g^2$; then the mean and variance of $l(\xi)$ are

$\mu_l = e^{\mu_g + \frac{\sigma_g^2}{2}},$  (2.48)

$\sigma_l^2 = e^{(2\mu_g + \sigma_g^2)} \left[ e^{\sigma_g^2} - 1 \right],$  (2.49)

respectively.
A general Gaussian variable $g(\xi)$ can always be represented in the following
affine form:

$g(\xi) = \sum_{i=0}^{n} \xi_i g_i,$  (2.50)

where the $\xi_i$ are orthogonal Gaussian variables, that is, $\langle \xi_i \xi_j \rangle = \delta_{ij}$, $\langle \xi_i \rangle = 0$, and
$\xi_0 = 1$, and $g_i$ is the coefficient of the individual Gaussian variable. Note that
such a form can always be obtained by using the Karhunen-Loeve orthogonal expansion
method [45].
In our problem, we need to represent the log-normal random variable $l(\xi)$ by
using the Hermite PC expansion form:

$l(\xi) = \sum_{k=0}^{P} l_k H_k^n(\xi),$  (2.51)

where $l_0 = \exp\!\left( \mu_g + \frac{\sigma_g^2}{2} \right)$. To find the other coefficients, we can apply (2.36) to
$l(\xi)$. Therefore, we have

$l_k = \frac{\langle l(\xi), H_k(\xi) \rangle}{\langle H_k^2(\xi) \rangle}, \quad \forall k \in \{0, \ldots, P\}.$  (2.52)

As was shown in [44], $l_k$ can be written as

$l_k = \frac{\langle H_k(\xi + g) \rangle}{\langle H_k^2(\xi) \rangle} \exp\!\left[ \mu_g + \frac{1}{2}\sum_{j=1}^{n} g_j^2 \right],$  (2.53)

where n is the number of independent Gaussian random variables.



The log-normal random variable can then be written as

$l(\xi) = l_0 \left( 1 + \sum_{i=1}^{n} \xi_i g_i + \sum_{i=1}^{n} \sum_{j=1}^{n} g_i g_j \frac{(\xi_i \xi_j - \delta_{ij})}{\langle (\xi_i \xi_j - \delta_{ij})^2 \rangle} + \cdots \right),$  (2.54)

where gi is defined in (2.50).

4.2 Hermite PC Representation with One Gaussian Variable

In this case, $\xi = [\xi_1]$. For the second-order Hermite PC ($P = 2$), following (2.54),
we have

$l(\xi) = l_0 \left( 1 + \sigma_g \xi_1 + \frac{1}{2} \sigma_g^2 (\xi_1^2 - 1) \right).$  (2.55)

Hence, the desired Hermite PC coefficients, $l_{0,1,2}$, can be expressed as $l_0$, $l_0 \sigma_g$,
and $\frac{1}{2} l_0 \sigma_g^2$, respectively.
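The second-order coefficients of (2.55) can be checked against the closed-form log-normal moments (2.48)-(2.49); the short Python sketch below does this for hypothetical values of $\mu_g$ and $\sigma_g$ (illustrative only).

```python
import numpy as np

# Hypothetical underlying Gaussian g ~ N(mu_g, sigma_g^2)
mu_g, sig_g = 0.0, 0.4

# Second-order Hermite PC coefficients of l = exp(g), Eq. (2.55)
l0 = np.exp(mu_g + sig_g**2 / 2.0)
l1 = l0 * sig_g
l2 = 0.5 * l0 * sig_g**2

# Mean/variance recovered from the PC coefficients, cf. Eq. (2.38)
pc_mean = l0
pc_var = l1**2 + 2 * l2**2

# Exact log-normal moments, Eqs. (2.48)-(2.49)
exact_mean = np.exp(mu_g + sig_g**2 / 2.0)
exact_var = np.exp(2 * mu_g + sig_g**2) * (np.exp(sig_g**2) - 1.0)

print(pc_mean, exact_mean)   # identical
print(pc_var, exact_var)     # close; second-order truncation error is small
```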

4.3 Hermite PC Representation of Two and More Gaussian Variables

For two random variables ($n = 2$), assume that $\xi = [\xi_1, \xi_2]$ is a normalized,
uncorrelated Gaussian random variable vector that represents the random variable $g(\xi)$:

$g(\xi) = \mu_g + \sigma_1 \xi_1 + \sigma_2 \xi_2.$  (2.56)
Note that, for $i \ne j$,

$\langle (\xi_i \xi_j - \delta_{ij})^2 \rangle = \langle \xi_i^2 \xi_j^2 \rangle = \langle \xi_i^2 \rangle \langle \xi_j^2 \rangle = 1.$

Therefore, the expansion of the log-normal random variable using second-order
Hermite PCs can be expressed as

$l(\xi) = l_0 \left[ 1 + \sigma_1 \xi_1 + \sigma_2 \xi_2 + \frac{\sigma_1^2}{2}(\xi_1^2 - 1) + \frac{\sigma_2^2}{2}(\xi_2^2 - 1) + 2\sigma_1\sigma_2\, \xi_1 \xi_2 \right],$  (2.57)

where

$\mu_l = l_0 = \exp\!\left( \mu_g + \frac{1}{2}\sigma_1^2 + \frac{1}{2}\sigma_2^2 \right).$

Hence, the desired Hermite PC coefficients, $l_{0,1,2,3,4,5}$, can be expressed as
$l_0$, $l_0\sigma_1$, $l_0\sigma_2$, $\frac{1}{2}l_0\sigma_1^2$, $\frac{1}{2}l_0\sigma_2^2$, and $2 l_0 \sigma_1 \sigma_2$, respectively.
   Similarly, for four Gaussian random variables, assume that
$\xi = [\xi_1, \xi_2, \xi_3, \xi_4]$ is a normalized, uncorrelated Gaussian random variable vector.
The random variable $g(\xi)$ can be expressed as

$g(\xi) = \mu_g + \sum_{i=1}^{4} \sigma_i \xi_i.$  (2.58)

As a result, the log-normal random variable $l(\xi)$ can be expressed as

$l(\xi) = l_0 \left( 1 + \sum_{i=1}^{4} \sigma_i \xi_i + \frac{1}{2} \sum_{i=1}^{4} \sigma_i^2 (\xi_i^2 - 1) + \sum_{i=1}^{4} \sum_{j=1,\, j \ne i}^{4} \sigma_i \sigma_j\, \xi_i \xi_j + \cdots \right),$  (2.59)

where

$\mu_l = l_0 = \exp\!\left( \mu_g + \frac{1}{2} \sum_{i=1}^{4} \sigma_i^2 \right).$

Hence, the desired Hermite PC coefficients can be read off from (2.59) above.

5 Summary

An understanding of preliminaries in probability theory is required for statistical
analysis and modeling of VLSI designs in the nanometer regime. In this chapter,
we introduced the relevant fundamentals employed in statistical analysis.
First, we presented the basic concepts and components such as mean, variance,
and covariance due to process variation. After that, we reviewed techniques for
the statistical variable decoupling and reduction in PFA/PCA analysis. We further
discussed the spectral stochastic analysis required for extraction, mismatch, and
yield analysis used in the later chapters. We also discussed different methods to
estimate the sum of random variables required for leakage current estimation.
Part II
Statistical Full-Chip Power Analysis
Chapter 3
Traditional Statistical Leakage Power
Analysis Methods

1 Introduction

Process-induced variability has a huge impact on circuit performance in sub-90 nm
VLSI technologies [120]. This is particularly the case for leakage power, which has
increased dramatically with technology scaling and is becoming the dominant source
of chip power dissipation [71].
Leakage power and its proportion of chip power dissipation have increased
dramatically with technology scaling [71]. The dominant factors in leakage currents
are the subthreshold leakage current Isub and the gate oxide leakage current Igate.
Subthreshold leakage currents increase rapidly with every technology generation
(about a 5x to 10x increase per generation [24]) and are highly sensitive to threshold
voltage Vth variations owing to the exponential relationship between Isub and Vth.
On the other hand, as the gate oxide thickness, Tox, scales down, Igate grows rapidly,
since Igate has an exponential dependence on Tox.
Both leakage currents are highly sensitive to process variations due to the
exponential relation between the leakage current and variational parameters like
effective channel lengths. As process-induced variability becomes more pronounced
in the deep submicro regime [120], leakage variations become more significant,
and traditional worst-case-based approaches will lead to extremely pessimistic and
expensive overdesigned solutions. Statistical estimation and analysis of leakage
powers considering process variability are critical in various chip design steps to
improve design yield and robustness. In the leakage estimation model, we can obtain
the chip-level leakage statistics such as the mean value and standard deviation from
process information, library information, and design information.
Many methods have been proposed for the statistical model of chip-level leakage
current. Early work in [169] gives the analytic expressions of mean value and
variance of leakage currents of CMOS gates considering only subthreshold leakage.
Table 3.1 Different methods for full-chip SLA

  Criteria               Categories
  Process variation      Inter-die; intra-die with or without spatial correlation
  Leakage distribution   Log-normal; non-log-normal
  Speedup method         MC; grid-based; gate-based; projection-based
  Leakage component      Isub; Igate
  Static leakage model   Gate-based; MOSFET-based

The method in [119] provides simple analytic expressions of leakage currents of the
whole chip considering global variations only. The method in [192] uses third-order
Hermite polynomials without considering spatial correlations and only calculates
the mean value of full-chip leakage current. In [114], reverse biased source/drain
junction BTBT (band-to-band tunneling) leakage current is considered, in addition
to the subthreshold leakage currents, for estimating the mean values and variances of
the leakage currents of gates only. In [142], the PDF of stacked CMOS gates and the
entire chip are derived considering both inter-die and intra-die variations. In [14],
a hardware-based statistical model of dynamic switching power and static leakage
power was presented, which was extracted from experiments in a predetermined
process window.
Chip-level SLA methods can be classified into different categories based on
different criteria, as shown in Table 3.1. Our classification and survey may not
be complete, as this is still an active research field and more efficient methods
will be developed in the future. We will present some recent important developments
in detail in this chapter, such as the Monte Carlo method and the traditional grid-
based method [13]. The gate-based spectral stochastic method [155] and the virtual
grid-based method will be introduced in Chap. 4 and Chap. 5, respectively. We
remark that our limited coverage of the other methods, which are presented in
minimal detail, does not diminish the value of their contributions.
This chapter is structured as follows. In Sect. 2, we discuss the static leakage
model for one gate/MOSFET, and then Sect. 3 gives the process variation models for
computing statistical information of full-chip leakage current. Section 4 presents the
recently proposed chip-level statistical leakage modeling and analysis works. The
chapter concludes with a summary and brief discussion of potential future research.

2 Static Leakage Modeling

Full-chip leakage current has two components: the subthreshold leakage current and
the gate leakage current. Here we describe empirical models for both of them, based
on the assumption that the leakage current under process variations follows a
log-normal distribution.

2.1 Gate-Based Static Leakage Model

The subthreshold leakage current, Isub , is exponentially dependent on the threshold


voltage, V th. V th is observed to be most sensitive to gate oxide thickness Tox and
effective gate channel length L due to short-channel effects. When the change in
L or Tox is small, each has an approximately exponential effect
on Isub, with the effect of Tox being relatively weak. For the gate oxide leakage
current, both channel length and oxide thickness have strong impacts on the leakage
currents, which are exponential functions of the two variables.
The leakage model is based on gates, as in [13] and [155]. We follow the
analytical expressions given in [13], which estimate the subthreshold leakage
currents and the gate oxide leakage currents as follows:

$I_{\mathrm{sub}} = e^{a_1 + a_2 L + a_3 L^2 + a_4 T_{ox}^{-1} + a_5 T_{ox}},$  (3.1)

$I_{\mathrm{gate}} = e^{a_1 + a_2 L + a_3 L^2 + a_4 T_{ox} + a_5 T_{ox}^2},$  (3.2)

where $a_1$ through $a_5$ are the fitting coefficients for each unique input combination
of a gate. A LUT can then be used to store the fitting parameters. For a $k$-input gate,
the size of the LUT is $2^k \times 10$, since each input combination has two equations with
five fitting parameters each. In [13], only the dominant leakage states are kept, i.e.,
states with only one "off" transistor in a series transistor stack. However, with
technology scaling down to 45 nm, this is no longer practical: Isub based on the
model in (3.1) with only the dominant states still has a large error compared to the
simulation results. Hence, the authors in [155] keep all the states.
After choosing sampling points for L and Tox linearly in their 3-sigma regions and
then conducting a SPICE simulation at each point, the subthreshold leakage current
is stored as the original curve. We can then perform the curve-fitting process.
Figures 3.1 and 3.2 show the curve-fitting results of Isub and Igate for the four input
patterns of the AND2 gate. Here, 100 points are chosen linearly in the 3-sigma regions
of L and Tox. These figures show that the fitted curves match the SPICE results very
well, and that the currents in the four cases are comparable with each other. Since
there is no "dominant state," all of them need to be considered.
Table 3.2 shows the errors of Isub for the AND2 gate compared with industrial SPICE
simulation results. Max Err. is the maximum error over the input combinations, and
Avg Err. refers to the average error over all the input patterns. If we add more terms
to (3.1), as shown in Table 3.2, we can reduce the errors from about 8% to about 3%.
After we obtain the analytic expression for each input combination, we take the
average of the leakage currents of all the input combinations to arrive at the final
analytic expression for each gate, in lieu of the dominant states used in [13].
Based on this model, the leakage current of one gate under process variation
can be estimated by log-normal distributions. The average leakage of a gate can be
computed as a weighted sum of leakage under different input states,

Fig. 3.1 Subthreshold leakage currents (SPICE vs. curve fitting, ln(Isub) in ln(nA) versus sample
point index) for the four different input patterns of the AND2 gate under 45 nm technology

$I_{\mathrm{sub}}^{\mathrm{avg}} = \sum_{j \in \text{input states}} P_j\, I_{\mathrm{sub},j},$  (3.3)

$I_{\mathrm{gate}}^{\mathrm{avg}} = \sum_{j \in \text{input states}} P_j\, I_{\mathrm{gate},j},$  (3.4)

$I_{\mathrm{leak,chip}} = \sum_{\text{all gates } i=1,\ldots,n} \left( I_{\mathrm{sub},i}^{\mathrm{avg}} + I_{\mathrm{gate},i}^{\mathrm{avg}} \right),$  (3.5)

where Pj is the probability of input state j ; Isub;j and Igate;j are the subthreshold
leakage and the gate oxide leakage at input state j , respectively. n is the total number
of gates in the circuit. The interaction between these two leakage mechanisms is
included in total leakage estimation.
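As a concrete illustration of (3.1)-(3.4), the Python sketch below (hypothetical coefficients and state probabilities, not values from the text) evaluates the fitted gate-level models and forms the probability-weighted average leakage of one gate.

```python
import numpy as np

def i_sub(L, Tox, a):
    """Subthreshold leakage model of Eq. (3.1); a = (a1,...,a5) for one input state."""
    return np.exp(a[0] + a[1]*L + a[2]*L**2 + a[3]/Tox + a[4]*Tox)

def i_gate(L, Tox, a):
    """Gate oxide leakage model of Eq. (3.2)."""
    return np.exp(a[0] + a[1]*L + a[2]*L**2 + a[3]*Tox + a[4]*Tox**2)

def avg_gate_leakage(L, Tox, lut_sub, lut_gate, probs):
    """Probability-weighted average over input states, Eqs. (3.3)-(3.4)."""
    isub = sum(p * i_sub(L, Tox, a) for p, a in zip(probs, lut_sub))
    igate = sum(p * i_gate(L, Tox, a) for p, a in zip(probs, lut_gate))
    return isub + igate

# Hypothetical LUT entries for a 2-input gate (4 input states) at nominal L, Tox (um)
L_nom, Tox_nom = 18e-3, 1.1e-3
lut_sub = [np.array([1.0, -20.0, 5.0, 1e-4, -50.0])] * 4
lut_gate = [np.array([0.5, -10.0, 2.0, -80.0, 3.0])] * 4
probs = [0.25, 0.25, 0.25, 0.25]      # equiprobable input states
print(avg_gate_leakage(L_nom, Tox_nom, lut_sub, lut_gate, probs))
```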
Since all the leakage components can be approximated as a log-normal distribu-
tion, we can simply sum up the distributions of the log-normals for all gates to get
the full-chip leakage distribution. Note that there exist spatial correlations, and the

Fig. 3.2 Gate oxide leakage currents (SPICE vs. curve fitting, ln(Igate) in ln(nA) versus sample
point index) for the four different input patterns of the AND2 gate under 45 nm technology

Table 3.2 Relative errors by using different fitting formulas for leakage currents of the AND2 gate

  Fitting components                      Max Err. (%)   Avg Err. (%)
  Original: L, L^2, Tox, Tox^{-1}         14.7           8.46
  Add Tox^2                               13.95          8.26
  Add Tox^2, Tox/L                        7.08           5.95
  Add Tox^2, Tox/L, L/Tox                 7.14           4.94
  Add Tox^2, Tox/L, L/Tox, Tox*L          3.67           3.49

leakage distributions of any two gates may be correlated. Therefore, the full-chip
leakage current is calculated as a sum of correlated log-normals:

$S = \sum_{i=1}^{p} e^{Y_i},$  (3.6)

where $p$ is the total number of log-normals to sum, $Y_i$ is a Gaussian random variable,
and $Y = [Y_1, Y_2, \ldots, Y_p]$ forms a multivariate normal distribution with covariance
matrix $\Sigma_Y$. The vector $Y$ is a function of $L$ and $T_{ox}$.

Fig. 3.3 Typical layout of a MOSFET, showing the gate, source, and drain, the width W, the
effective channel length Leff, and the layout parameters A and B
2.2 MOSFET-Based Static Leakage Model

As in [96], the statistical model for the subthreshold leakage current is sometimes
formulated at the MOSFET level. Here, we only discuss the formulation developed for
NMOS transistors; the method can be easily extended to PMOS transistors. We first
formulate Isub for one MOSFET and then develop the Leff model for a nonrectilinear
transistor.
   The leakage current of an ideal transistor can be expressed as a function of
Leff [65]. The curve-fitted leakage model considering the narrow-width effect is
shown in (3.7),

$I_{\mathrm{sub}} = \frac{\alpha_{\mathrm{sub}} \sqrt{q \epsilon_{si} N_{\mathrm{cheff}}}\, (W^2 + \alpha_W W)}
{(V_{ds}^2 + \alpha_{ds1} V_{ds} + \alpha_{ds2}) \exp(\alpha_{L1} L_{\mathrm{eff}}^2 + \alpha_{L2} L_{\mathrm{eff}})}
\left[ 2 - \exp\!\left(-\frac{A}{A_0}\right) - \exp\!\left(-\frac{B}{B_0}\right) \right]
\left[ 1 - \exp\!\left(-\frac{V_{ds}}{V_T}\right) \right] \exp\!\left( \frac{V_{gs} - V_{\mathrm{thlin}}}{n V_T} \right),$  (3.7)

where all the $\alpha$'s are fitting parameters, $\epsilon_{si}$ is the dielectric constant of Si, $N_{\mathrm{cheff}}$ is the
effective channel doping concentration, and $A$ and $B$ are layout parameters as shown
in Fig. 3.3.
When high-k dielectrics are used to better insulate the gate from the channel
in sub-65 nm technologies, the gate oxide tunneling effect is moderated and
controlled [96]. In this case, Igate is less important than Isub.
   A real gate structure in sub-90 nm technology has rough (nonrectilinear) edges,
which can be translated into an equivalent single transistor with effective
gate channel length Leff. As shown in Fig. 3.4, a nonrectilinear gate can be divided
into several slices of subgates, each of which has its own length and shares the same
characteristic width W0 along the width direction. In this way, the leakage current of
one nonrectilinear gate IG can be approximated as the sum of the leakage currents
of all the slices along the width direction:

Fig. 3.4 Procedure to derive the effective gate channel length model: a nonrectilinear gate of
width W is divided into slices of length Li and characteristic width W0, which are mapped to an
equivalent gate of length Leff

$I_G = \sum_{j=1}^{M} I_j(L_j, W_0) = I(L_{\mathrm{eff}}, W),$  (3.8)

where $W$ is the width of the gate and each slice is a regular gate. Under this framework,
supposing we have $M$ slices along the width direction, we have

$\mu = \frac{\sum_{j=1}^{M} L_j}{M},$  (3.9)

$\sigma = \sqrt{\frac{\sum_{j=1}^{M} (L_j - \mu)^2}{M}}.$  (3.10)

The $L_{\mathrm{eff}}$ for the equivalent gate can be calculated by

$L_{\mathrm{eff}} = L_{\min} + \alpha \ln\!\left( \frac{W}{W_0} \right),$  (3.11)

where $\alpha$ is the fitting parameter.


After we set up the Leff model, the equivalent Leff can be used in the compact
model for leakage current as shown in (3.7).
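The slice decomposition of (3.8)-(3.10) can be sketched in Python as follows (illustrative only; the per-slice current model i_slice and the slice lengths are hypothetical stand-ins for the compact model of (3.7)).

```python
import numpy as np

def i_slice(L_j, W0):
    """Hypothetical per-slice leakage model standing in for Eq. (3.7)."""
    return W0 * np.exp(-60.0 * L_j)      # decays exponentially with slice length

def nonrectilinear_gate_leakage(slice_lengths, W0):
    """Sum the slice currents along the width direction, Eq. (3.8),
    and report the slice-length statistics of Eqs. (3.9)-(3.10)."""
    L = np.asarray(slice_lengths)
    I_G = np.sum(i_slice(L, W0))
    mu = L.mean()                        # Eq. (3.9)
    sigma = L.std()                      # Eq. (3.10), population std
    return I_G, mu, sigma

# Hypothetical rough-edge gate: 10 slices around a 45 nm drawn length (in um)
lengths = 0.045 + 0.002 * np.random.default_rng(2).standard_normal(10)
print(nonrectilinear_gate_leakage(lengths, W0=0.01))
```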

3 Process Variational Models for Leakage Analysis

In this section, we present the process variation models used for computing
variational leakage currents. Process variation occurs at different levels: wafer
level, inter-die level, and intra-die level. Furthermore, variations are caused by
different sources such as lithography, materials, and aging [7]. Some of the
variations are systematic, e.g., those caused by the lithography process [42, 129].
Some are purely random, e.g., the doping density of impurities and edge
roughness [7]. We first introduce the different kinds of process variations, and then
the process variational model for leakage analysis.

Table 3.3 Process variation parameter breakdown for 45 nm technology

  Parameter                    Component                          σ² distribution   σ
  Gate length (L)              Inter-die                          20%               4% × 18 nm
                               Intra-die, spatially correlated    80%
  Gate oxide thickness (Tox)   Inter-die                          20%               4% × 1.8 nm
                               Intra-die, uncorrelated            80%
The main process parameter that has a big impact on leakage current is the
transistor threshold voltage Vth. Vth is observed to be the most sensitive to the
effective gate channel length L and the gate oxide thickness Tox. The ITRS [71]
indicates that gate channel length variation is a primary factor in device parameter
variation, and that the number of dopants in the channel results in an unacceptably
large statistical variation of the threshold voltage. Therefore, we must consider
the variations in L and Tox , since leakage current is most sensitive to these
parameters [13]. To reflect reality, we model spatial correlations in the gate channel
length, while the gate oxide thickness values for different gates are taken to be
uncorrelated.
Table 3.3 lists an example of detailed parameters for gate channel length and gate
oxide thickness variations under 45 nm technology. As indicated in the second column,
we can decompose each parameter variation into "inter-die" and "intra-die"
variations. For intra-die variation, we further decompose it into components with
and without spatial correlation. In most cases, these variations can be modeled by
Gaussian distributions [33, 178]. The total variance ($\sigma^2$) is computed by summing
up the variances of all components, since the sum of Gaussian distributions is still a
Gaussian distribution.
Under inter-die variation, if the leakage currents of all gates or devices are
sensitive to the process parameters in similar ways, then the circuit performance
can be analyzed at multiple process corners using deterministic analysis methods.
However, statistical methods must be used to correctly predict the leakage if
intra-die variations are involved. As leakage current varies exponentially with
these parameters, simple use of worst-case values for all parameters can result in
exponentially larger leakage estimates than the nominal values which are actually
obtained, which is too inaccurate to be used in practical cases.
Electrical measurements of a full wafer show that the intra-die gate channel
length variation has strong spatial correlation [42]. This implies that devices that
are physically close to each other are more likely to be similar than those that are far
apart. Therefore, the intra-die variation of gate channel lengths is modeled based on
such kind of correlation. There are several different models that can represent this
kind of spatial correlation. Take the exponential model [195] for instance,

$\rho(r) = e^{-r^2/\gamma^2},$  (3.12)

where $r$ is the distance between the centers of two gates or grids and $\gamma$ is the correlation length.
We notice that the strong spatial correlation suggested by (3.12) has been exploited
by [13] to speed up the calculation, where the full-chip is divided into N grids
and the correlated random variables are perfectly correlated in a grid. The strong
spatial correlation is exploited naturally by the grid-based method, or by PCA (for
Gaussian distributions) or independent component analysis (for non-Gaussian
distributions), which can transform the correlated random variables into a reduced
number of independent ones. Details will be given in the next section. For gate oxide thickness,
Tox , strong spatial correlation does not exist; therefore, we assume Tox of different
gates are uncorrelated.
The last column of Table 3.3 shows the standard deviation ($\sigma$) of each variation.
According to statistical theory for Gaussian distributions, about 99% of the samples
should fall within the range of $\pm 3\sigma$. According to [71], the physical gate channel
length for high-performance logic in 45 nm technology will be 18 nm, and the physical
variation should be controlled within $\pm 12\%$. Therefore, we let $3\sigma$ be 12%, and a
similar analysis can be done for Tox.
For a gate/module in a chip with gate channel length $L$ and process variation $\Delta L$,
using our model parameters in Table 3.3, we have

$L = \mu_L + \Delta L, \qquad \Delta L = \Delta L_{\mathrm{inter}} + \Delta L_{\mathrm{intra\_corr}},$  (3.13)

where $\mu_L$ is the nominal design parameter value, and $\Delta L_{\mathrm{inter}}$ is constant for all gates
in all grids since it is a global factor that applies to the entire chip. For one chip
sample, we only need to generate it once. $\Delta L_{\mathrm{intra\_corr}}$ is different for each gate
or each grid and has spatial correlation. Therefore, we generate one value for each
gate/grid, and the spatial correlation follows the exponential model in (3.12),
so that the correlation coefficient diminishes with the distance between any
two gates/grids.
As for the gate oxide thickness $T_{ox}$, using the model parameters in Table 3.3, we have

$T_{ox} = \mu_{ox} + \Delta T_{ox}, \qquad \Delta T_{ox} = \Delta T_{ox,\mathrm{inter}} + \Delta T_{ox,\mathrm{intra\_uncorr}},$  (3.14)

where $\mu_{ox}$ is the nominal design parameter value. For the same reason as $\Delta L_{\mathrm{inter}}$,
$\Delta T_{ox,\mathrm{inter}}$ is constant for all gates in all grids. $\Delta T_{ox,\mathrm{intra\_uncorr}}$ is different for
each gate/grid but does not have spatial correlation.
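The following Python sketch (illustrative; the gate coordinates, correlation length, and sigma values are hypothetical) builds the spatial correlation matrix of (3.12) for the intra-die correlated part of $\Delta L$ and draws one chip sample consistent with (3.13).

```python
import numpy as np

def spatial_corr_matrix(xy, gamma):
    """Exponential spatial correlation of Eq. (3.12) between gate centers."""
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    return np.exp(-d**2 / gamma**2)

rng = np.random.default_rng(3)
xy = rng.uniform(0.0, 100.0, size=(50, 2))     # hypothetical gate centers (um)
sigma_intra, sigma_inter = 0.8e-3, 0.4e-3      # hypothetical sigmas (um)

R = spatial_corr_matrix(xy, gamma=40.0)
cov_intra = sigma_intra**2 * R

# One chip sample: a single global inter-die offset plus a correlated intra-die part
dL_inter = sigma_inter * rng.standard_normal()
dL_intra = rng.multivariate_normal(np.zeros(len(xy)), cov_intra)
dL = dL_inter + dL_intra                       # Eq. (3.13) without the nominal value
print(dL[:5])
```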
After the process variations are modeled as correlated distributions, we can apply
the PCA of Sect. 2.5 of Chap. 2 to decompose correlated Gaussian distributions into
independent ones. After PCA, the process variations (e.g., $\Delta V_{th}$, $\Delta T_{ox}$, and $\Delta L$) of
each gate can be modeled as

$X_{G,i} = V_{G,i} E,$  (3.15)



where the vector $X_{G,i} = [\Delta x_{G,i,1}, \Delta x_{G,i,2}, \ldots]^T$ stands for the parameter
variations of the $i$th gate. $E = [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_m]^T$ represents the random variables
for modeling both inter-die and intra-die variations of the entire die. Here
$\{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_m\}$ can be extracted by PCA. They are independent and satisfy the
standard Gaussian distribution (i.e., zero mean and unit standard deviation). $m$ is
the total number of these random variables. For practical industry designs, $m$ is
typically large (e.g., $10^3 \sim 10^6$). $V_{G,i}$ captures the correlations among the random
variables.
When m is a large number, the size of VG;i can be extremely huge. However, XG;i
only depends on the intra-die variations within its neighborhood; so VG;i should be
quite sparse. In Sect. 4, the gate-based spectral stochastic method and the projection-
based method will use this sparsity property to reduce the computational cost in two
different ways.
Gate-based statistical leakage analysis typically starts from the leakage modeling
for one gate,

$I_{\mathrm{leak},i} = f(E),$  (3.16)

where $I_{\mathrm{leak},i}$ represents the total leakage current of the $i$th gate. Different models
can be chosen here to represent the relationship between $E$ and $I_{\mathrm{leak},i}$. For example,
quadratic models are used to guarantee accuracy:

$\log(I_{\mathrm{leak},i}) = E^T A_{\mathrm{leak},i} E + B_{\mathrm{leak},i}^T E + C_{\mathrm{leak},i},$  (3.17)

where $A_{\mathrm{leak},i} \in \mathbb{R}^{m \times m}$, $B_{\mathrm{leak},i} \in \mathbb{R}^{m}$, and $C_{\mathrm{leak},i} \in \mathbb{R}$ are the coefficients. More
details will be given in the next section.
Given the leakage models of all the individual gates, the full-chip leakage current
is the sum of the leakage currents of all the gates on the chip:

$I_{\mathrm{leak,Chip}} = I_{\mathrm{leak},1} + I_{\mathrm{leak},2} + \cdots + I_{\mathrm{leak},n},$  (3.18)

where $n$ is the total number of gates in a chip. If we choose the quadratic model
in (3.17), then (3.18) implies that the full-chip leakage current is the sum of many
log-normal distributions. As we mentioned before, it can be approximated as a
log-normal distribution [13]. Therefore, we can also use a quadratic model to
approximate the logarithm of the full-chip leakage:

$\log(I_{\mathrm{leak,Chip}}) = E^T A_{\mathrm{Chip}} E + B_{\mathrm{Chip}}^T E + C_{\mathrm{Chip}},$  (3.19)

where $A_{\mathrm{Chip}} \in \mathbb{R}^{m \times m}$, $B_{\mathrm{Chip}} \in \mathbb{R}^{m}$, and $C_{\mathrm{Chip}} \in \mathbb{R}$ are the coefficients. In (3.17) and
(3.19), the quadratic coefficient matrices $A_{\mathrm{leak},i}$ and $A_{\mathrm{Chip}}$ can be extremely large for
capturing all the intra-die variations, which makes the quadratic modeling problem
extremely expensive in practical applications. Several approaches have been proposed
to reduce the size of the model, with more details given in the next section.

4 Full-Chip Leakage Modeling and Analysis Methods

Full-chip statistical leakage modeling and analysis methods can be classified into
different categories based on different criteria, as shown in Table 3.1. In this section,
we will present in detail three important methods: the MC method, the traditional
grid-based method, and the projection-based method.

4.1 Monte Carlo Method

Monte Carlo technique mentioned in Sect. 3.1 of Chap. 2 can be used to estimate
the value of leakage power at gate level as well as chip level.
For full-chip leakage current, Ileak; Chip is G in (2.25). If the sample number M C
is large enough, then we can obtain a sufficiently accurate result. However, for full-
chip leakage current analysis, the MC estimator is too expensive. A more efficient
method with good accuracy is needed.
Several techniques exist for improving the accuracy of Monte Carlo evaluation
of finite integrals. In these techniques, the goal is to construct an estimator with a
reduced variance for a given, fixed number of samples. In other words, the improved
estimator can provide the same accuracy as the standard Monte Carlo estimator,
while needing considerably fewer samples. This is desirable because computing the
value of g.Xi / is typically costly.

4.2 Traditional Grid-Based Methods

Since the number of gates on an entire chip is very large and every gate has its
own variational parameters, the resulting number of random variables is very large.
For greater efficiency, the grid-based method partitions a chip into several grids and
assigns all the gates in one grid the same parameters.
A full-chip SLA method considering spatial correlations in the intra-die and
inter-die variations was proposed [13]. This method introduces a grid-based par-
titioning of the circuits to reduce the number of variables at a loss of accuracy. A
projection-based approach has been proposed in [95] to speed up the leakage anal-
ysis, where Krylov-subspace-based reduction has been performed on the coefficient
matrices of second-order expressions. This method assumes independent random
variables after a preprocessing step such as PCA. However, owing to the large
number of random variables involved ($10^3$ to $10^6$), the PCA-based preprocessing can be
very expensive. Work in [65] proposes a linear-time complexity method to compute
the mean and variance of full-chip leakage currents by exploiting the symmetric
property of one existing exponential spatial correlation formula. The method only
considers subthreshold leakage, and it requires the chip cells and modules to be

partitioned into a regular grid with similar uniform fitting functions, which is
typically impractical. In the grid-based method [13], both the subthreshold leakage and
the gate oxide leakage of only the dominant input states are considered in (3.4). Here we consider only intra-
die variation of parameters. The extension to handling inter-die variation is quite
obvious, as shown at the end of this subsection.
As shown in (3.6), the total leakage current of a chip is the sum of correlated
leakage components, which can be approximated as a log-normal using Wilkinson's
method [2]. A sum of $t$ log-normals, $S = \sum_{i=1}^{t} e^{Y_i}$, is approximated as the log-
normal $e^Z$, where $Z = N(\mu_z, \sigma_z)$. In Wilkinson's approach, the mean value and
standard deviation of $Z$ are obtained by matching the first two moments, $u_1$ and $u_2$,
of $\sum_{i=1}^{t} e^{Y_i}$ as follows:

$u_1 = E(S) = e^{\mu_z + \sigma_z^2/2} = \sum_{i=1}^{t} e^{\mu_{y_i} + \sigma_{y_i}^2/2},$  (3.20)

$u_2 = E(S^2) = e^{2\mu_z + 2\sigma_z^2} = \sum_{i=1}^{t} e^{2\mu_{y_i} + 2\sigma_{y_i}^2}
+ 2 \sum_{i=1}^{t-1} \sum_{j=i+1}^{t} e^{\mu_{y_i} + \mu_{y_j}}\, e^{(\sigma_{y_i}^2 + \sigma_{y_j}^2 + 2 r_{ij} \sigma_{y_i} \sigma_{y_j})/2},$  (3.21)

where $r_{ij}$ is the correlation coefficient of $Y_i$ and $Y_j$. Solving (3.20) and (3.21) for $\mu_z$ and
$\sigma_z$ yields

$\mu_z = 2 \ln u_1 - \frac{1}{2} \ln u_2,$  (3.22)

$\sigma_z^2 = \ln u_2 - 2 \ln u_1.$  (3.23)
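A minimal Python sketch of Wilkinson's moment matching (3.20)-(3.23) is shown below (illustrative; the example means, sigmas, and correlation matrix are hypothetical), together with an MC cross-check of the mean.

```python
import numpy as np

def wilkinson(mu, sigma, R):
    """Approximate S = sum_i exp(Y_i), Y ~ N(mu, diag(sigma) R diag(sigma)),
    by a single log-normal exp(Z), Z ~ N(mu_z, sigma_z^2); Eqs. (3.20)-(3.23)."""
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    # Full double sum of E[e^{Y_i+Y_j}]; its diagonal gives the "squared" terms,
    # so summing it reproduces u2 of Eq. (3.21).
    second = np.exp(mu[:, None] + mu[None, :]
                    + (sigma[:, None]**2 + sigma[None, :]**2
                       + 2.0 * R * np.outer(sigma, sigma)) / 2.0)
    u1 = np.sum(np.exp(mu + sigma**2 / 2.0))       # Eq. (3.20)
    u2 = np.sum(second)                            # Eq. (3.21)
    mu_z = 2.0 * np.log(u1) - 0.5 * np.log(u2)     # Eq. (3.22)
    var_z = np.log(u2) - 2.0 * np.log(u1)          # Eq. (3.23)
    return mu_z, np.sqrt(var_z)

# Hypothetical example: three correlated log-normal leakage components
mu = [0.0, 0.2, -0.1]
sigma = np.array([0.3, 0.25, 0.4])
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.6],
              [0.3, 0.6, 1.0]])
mu_z, sig_z = wilkinson(mu, sigma, R)

Y = np.random.default_rng(4).multivariate_normal(mu, np.outer(sigma, sigma) * R, 200000)
print(np.exp(Y).sum(axis=1).mean(), np.exp(mu_z + sig_z**2 / 2.0))
```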

From the above formulas, we can see that a pair-by-pair computation needs to be
done for all correlated pairs of variables, i.e., for all $i$, $j$ such that $r_{ij} \ne 0$.
This leads to a very expensive computation. First, leakage currents of
different gates are correlated because of the spatial correlation of L. Secondly, Isub
and Igate associated with the same NMOS transistor are correlated. Thirdly, Isub
in the same transistor stack are also correlated. If there are N gates in the circuit,
the complexity for computing the sum will be $O(N^2)$, which is far from practical
for large circuits. Therefore, the grid-based method uses several approximations
to reduce the time complexity. In the grid-based method, gates in the same grid
have the same parameter values. For example, let Isub;i be the subthreshold leakage
currents for Gatei (i D 1; : : : ; t) under the same input vector, and assume that these
gates are all in the same grid $k$. Then

$I_{\mathrm{sub},i} = \alpha_i\, e^{Y_i^0 + \beta_0\, dL_k + \beta_1\, dT_{ox,i}},$  (3.24)

where $\alpha_i$, $\beta_0$, and $\beta_1$ are the fitting coefficients. Since we assume that $L$ is spatially
correlated and $T_{ox}$ is uncorrelated, all of the $I_{\mathrm{sub},i}$ in the same grid should use the
same variable $dL_k$ and different $dT_{ox}$ values. Then, the sum of the leakage terms
$I_{\mathrm{sub},i}$ in grid $k$ is given by

$e^{\beta_0\, dL_k} \sum_{i=1}^{t} e^{Y_i^0} \cdot \alpha_i \cdot e^{\beta_1\, dT_{ox,i}}.$  (3.25)
i D1

Note that the second part of the above expression is a sum of independent log-
normal variables, which is a special case for the sum of correlated log-normal
variables. By using Wilkinson’s method, this can be computed in linear time.
Therefore, for gates of the same type with the same input state in the same grid,
the time complexity is only linear, and we can approximate the sum of leakage
of all gates by a log-normal variable which can be superposed in the original
expression. Similarly, Igate of different gates in the same grid can be calculated
through summation in linear time and can be approximated by a log-normal variable.
Now, if the chip is divided into n grids, we can reduce the number of correlated
leakage components in each grid to a small constant c in their library. As a result, the
total number of correlated log-normals to sum is no more than c  n. In general, the
number of grids is set to be substantially smaller than the number of gates in the chip,
which can be regarded as a constant number. Therefore, the complexity required
for the sum of log-normals in the grid-based method is reduced from $O(N^2)$ to a
substantially smaller constant $O(n^2)$.
As we discussed before, leakage currents of different gates are correlated due to
spatially correlated parameters such as transistor gate channel length. Furthermore,
Isub and Igate are correlated within the same gate. In addition, leakage currents under
different input vectors of the same gate are correlated because they are sensitive to
the same parameters of the gate, regardless of whether or not these are spatially
correlated. We must carefully predict the distribution of total leakage in the circuit,
and the correlations of these leakage currents must be correctly considered when
they are summed up.
As we mentioned before, the leakage currents that arise from the same leakage
mechanisms in the same grid from the same entry of the LUT are merged into
a single log-normally distributed leakage component to reduce the number of
correlated leakage components to sum. Let I1sum and I2sum be two merged sums,
which correspond to subthreshold leakage and gate oxide leakage components in
the same grid, respectively. These can be calculated as

$I_{1\mathrm{sum}} = e^{Y_1^0 + \beta_0\, dL} \sum_{i=1}^{t} \alpha_i\, e^{\beta_1\, dT_{ox,i}} = e^{Y_1^0 + \beta_0\, dL}\, e^{\lambda},$  (3.26)

$I_{2\mathrm{sum}} = e^{Y_2^0 + \beta_0'\, dL} \sum_{i=1}^{t'} \alpha_i'\, e^{\beta_1'\, dT_{ox,i}} = e^{Y_2^0 + \beta_0'\, dL}\, e^{\nu},$  (3.27)

where $e^{\lambda}$ and $e^{\nu}$ are the log-normal approximations of the sums of independent log-
normals, $\sum_{i=1}^{t} \alpha_i\, e^{\beta_1\, dT_{ox,i}}$ and $\sum_{i=1}^{t'} \alpha_i'\, e^{\beta_1'\, dT_{ox,i}}$, in $I_{1\mathrm{sum}}$ and $I_{2\mathrm{sum}}$, respectively, as
described in (3.25).
   Note that $\sum_{i=1}^{t} \alpha_i\, e^{\beta_1\, dT_{ox,i}}$ and $\sum_{i=1}^{t'} \alpha_i'\, e^{\beta_1'\, dT_{ox,i}}$ may be correlated, since the
same gate could have both subthreshold and gate leakage. Therefore, $e^{\lambda}$ and $e^{\nu}$ are
correlated, and we need to derive the correlation between $\lambda$ and $\nu$. Since the $T_{ox}$
values are independent in different gates, we can easily compute the correlation
$\mathrm{cov}\!\left( \sum_{i=1}^{t} \alpha_i\, e^{\beta_1\, dT_{ox,i}},\; \sum_{i=1}^{t'} \alpha_i'\, e^{\beta_1'\, dT_{ox,i}} \right)$ as

$\sum_i \alpha_i \alpha_i'\, e^{(\beta_i^2 + \beta_i'^2)\, \sigma_{T_{ox,i}}^2 / 2} \left( e^{\beta_i \beta_i' \sigma_{T_{ox,i}}^2} - 1 \right).$  (3.28)

The correlation between $e^{\lambda}$ and $e^{\nu}$ is then found as

$\mathrm{cov}(e^{\lambda}, e^{\nu}) = E(e^{\lambda + \nu}) - E(e^{\lambda}) E(e^{\nu})
= e^{\mu_{\lambda} + \mu_{\nu} + (\sigma_{\lambda}^2 + \sigma_{\nu}^2)/2} \left( e^{\mathrm{cov}(\lambda,\nu)} - 1 \right),$  (3.29)

where $\mu_{\lambda}/\mu_{\nu}$ and $\sigma_{\lambda}/\sigma_{\nu}$ are the mean value and standard deviation of $\lambda/\nu$,
respectively. Solving (3.29) for $\mathrm{cov}(\lambda, \nu)$, we have

$\mathrm{cov}(\lambda, \nu) = \log\!\left( 1 + \frac{\mathrm{cov}(e^{\lambda}, e^{\nu})}{e^{\mu_{\lambda} + \mu_{\nu} + (\sigma_{\lambda}^2 + \sigma_{\nu}^2)/2}} \right).$  (3.30)

Since $e^{\lambda}$ and $e^{\nu}$ are approximations of $\sum_{i=1}^{t} \alpha_i\, e^{\beta_1\, dT_{ox,i}}$ and $\sum_{i=1}^{t'} \alpha_i'\, e^{\beta_1'\, dT_{ox,i}}$,
respectively, it is reasonable to assume that

$\mathrm{cov}(e^{\lambda}, e^{\nu}) = \mathrm{cov}\!\left( \sum_{i=1}^{t} \alpha_i\, e^{\beta_1\, dT_{ox,i}},\; \sum_{i=1}^{t'} \alpha_i'\, e^{\beta_1'\, dT_{ox,i}} \right).$  (3.31)

At the same time, the mean values and standard deviations of $\lambda$ and $\nu$ are already
known from the approximations; therefore, $\mathrm{cov}(\lambda, \nu)$ can easily be computed.
We can extend the framework for statistical computation of full-chip leakage
considering spatial correlations in intra-die variations of parameters to handle inter-
die variation. For each type of parameter, a global random variable can be applied
to all gates in the circuit to model the inter-die effect. In addition, this framework
is general and can be used to predict the circuit leakage under other parameter
variations or other leakage components. However, if the Gaussian or log-normal
assumption does not hold, we cannot use the grid-based method to estimate the full-
chip leakage.

4.3 Projection-Based Statistical Analysis Methods

Recent work in [5] presents a unified approach for statistical timing and leakage
current analysis using quadratic polynomials. However, this method only considers
the long-channel effects and ignores the short-channel effects (ignoring channel
length variables) for the gate leakage models. The coefficients of the orthogonal PC
at the gate level are computed directly by inner products evaluated via the efficient
Smolyak quadrature method. The method also tries to reduce the number of variables
via the moment matching method, which further speeds up the quadrature process at
the cost of larger errors.
This projection-based method is used to compute the moments of statistical
leakages via moment matching techniques, which are well developed in the area of
interconnect model order reduction [177]. In the projection-based method, quadratic
models in (3.17) and (3.18) are used to guarantee accuracy. Li et al. [97] proposed
a projection-based approach (PROBE) to reduce the quadratic modeling cost. In
a quadratic model, we need to compute all elements of the quadratic coefficient
matrix, which is the main difficulty. Take Achip in (3.19), for example. In most
real cases, $A_{\mathrm{chip}}$ is rank deficient. As a result, the matrix $A_{\mathrm{chip}}$ can be
approximated by another low-rank matrix $\tilde{A}_{\mathrm{chip}}$ if $\|A_{\mathrm{chip}} - \tilde{A}_{\mathrm{chip}}\|_F$ is minimized.
Here, $\|\cdot\|_F$ denotes the Frobenius norm, which is the square root of the sum of
the squares of all matrix elements. Li et al. [97] proved that the optimal rank-$R$
approximation is

$\tilde{A}_{\mathrm{chip}} = \sum_{r=1}^{R} \lambda_{\mathrm{chip}_r} P_{\mathrm{chip}_r} P_{\mathrm{chip}_r}^T,$  (3.32)

where $\lambda_{\mathrm{chip}_r} \in \mathbb{R}$ and $P_{\mathrm{chip}_r} \in \mathbb{R}^{m}$ are the $r$th dominant eigenvalue and eigenvector
of the matrix $A_{\mathrm{chip}}$, respectively, and $m$ stands for the total number of random variables.
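A minimal NumPy sketch of the rank-R truncation in (3.32) is shown below (illustrative; the quadratic coefficient matrix here is a random symmetric stand-in, not an actual leakage model).

```python
import numpy as np

def rank_r_approx(A, R):
    """Optimal rank-R Frobenius-norm approximation of a symmetric matrix A,
    built from its R dominant eigenpairs as in Eq. (3.32)."""
    eigvals, eigvecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(eigvals))[::-1][:R]     # R dominant eigenvalues
    lam, P = eigvals[idx], eigvecs[:, idx]
    return (P * lam) @ P.T                          # sum_r lam_r p_r p_r^T

# Hypothetical small symmetric coefficient matrix standing in for A_chip
rng = np.random.default_rng(5)
B = rng.standard_normal((8, 3))
A = B @ B.T                                         # rank-3 by construction
A_tilde = rank_r_approx(A, R=3)
print(np.linalg.norm(A - A_tilde))                  # ~0: rank-3 is exact here
```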
The PROBE method proposed in [97] is efficient in handling $10^1 \sim 10^2$ random
variables. However, there are $10^3 \sim 10^6$ variables in a full-chip SLA. This led
Li et al. [98] to improve the projection-based analysis algorithm by exploring
the underlying sparse structure of the leakage analysis problem. Specifically, the
improved methodology includes (1) two-step iterative algorithm for quadratic
SLA modeling, (2) quadratic model compaction algorithm for leakage distribution
estimation, and (3) incremental analysis algorithm for locally updating the leakage
distribution.

5 Summary

In this chapter, we have presented the problem of statistical leakage analysis under
process variations with spatial correlations. We then discussed the existing approaches
and presented the pros and cons of those methods. All the existing approaches either

suffer from high computing costs (the MC method), can only work for variations
with strong spatial correlations (the grid-based method), or make strong assumptions
about parameter variations (no spatial correlation in the projection-based method).
In the following chapters, we show how those problems can be resolved or
mitigated. We will mainly present two statistical leakage analysis methods: the
spectral-stochastic-based method with variable reduction techniques and the virtual
grid-based approach.
Chapter 4
Statistical Leakage Power Analysis by Spectral
Stochastic Method

1 Introduction

In this chapter, we present a gate-based general full-chip leakage modeling and


analysis method [157]. The gate-based method starts with the process variational
parameters, such as the channel length variation, $\delta L$, and gate oxide thickness variation, $\delta T_{ox}$, and it
can derive the full-chip leakage current Ileak in terms of those variables directly
(or their corresponding transformed variables). Unlike existing grid-based methods,
which trade the accuracy for speedup, the presented method is gate-based method
and uses principal component analysis (PCA) to reduce the number of variables
with much less accuracy loss, assuming that the geometrical variables are Gaussian.
For non-Gaussian variables, independent component analysis (ICA) [68] can be
used. The presented method considers both inter-die and intra-die variations, and
it can work with various spatial correlations. The presented method becomes
linear under strong spatial correlations. Unlike the existing approaches [13, 65],
the presented method does not make any assumptions about the distributions of
final total leakage currents for both gates and chips and does not require any
grid-based partitioning of the chip. Compared with [5], the presented method
applies a more efficient multidimensional numerical quadrature method (vs. reduced
number of variables using inner products via moment matching), considers
more accurate leakage models, and presents more comprehensive comparisons with
other methods.
The presented method first fits both the subthreshold and gate oxide leakage
currents into analytic expressions in terms of parameter variables. We show that
by using more terms in the gate-level analytic models, we can achieve better
accuracy than [13]. Second, the presented method employs the OPC, which gives
the best representation for specific distributions [45] and is also called the spectral
stochastic method, to represent the variational gate leakages in an analytic form
in terms of the random variables. The step is achieved by using the numerical
Gaussian quadrature method, which is much faster than the MC method. The total
leakage currents are finally computed by simply summing up the resulting analytical


orthogonal polynomials of all gates (their coefficients). The spatial correlations are
taken care of by PCA or ICA, and at the same time, the number of random variables
can also be substantially reduced in the presence of strong spatial correlations during
the decomposition process. Numerical examples on the PDWorkshop91 benchmarks
on a 45 nm technology show that the presented method is about 10 times faster than
the recently presented method [13], with consistently better accuracy.

2 Flow of Gate-Based Method

To analyze the statistical model of chip-level leakage current, traditional methods


are grid-based. Since the number of gates on a whole chip is very large, and
every gate has its own variational parameters, it means that the number of random
variables is huge. So considering efficiency, the traditional methods partition a chip
to several grids and assume that all the gates in one grid have the same parameters
as mentioned in Sect. 4.2 of Chap. 3. However, this is not the real case. Take Fig. 4.1
as one example. Here the distance between Gate1 and Gate 2 is smaller than the
distance between Gate1 and Gate3. In the grid-based method, Gate1 is assumed to
have a strong correlation with Gate3 and a weak correlation with Gate2. But
actually, the situation is the opposite. In this section, we will present the full-chip
statistical leakage analysis method. This method is gate-based instead of grid-based,
yet it achieves better speed as well as better accuracy than the method in [13],
which is grid-based. Our algorithm is shown in Fig. 4.2. The presented algorithm
basically consists of three major parts. The first part (step 1) is precharacterization,
which builds the analytic leakage expressions (3.1) and (3.2) for each type of
gates. This step only needs to be done once for a standard cell library (SCL). The
second part (steps 2-4) generates a set of independent random variables and builds

Fig. 4.1 An example of a grid-based partition (Gate1 and Gate3 fall in the same grid, while the
closer Gate2 falls in a different grid). Reprinted with permission from [157] © 2010 Elsevier

Input: standard cell library, netlist, placement information of the design, $\sigma$ of $L$ and $T_{ox}$
Output: analytic expression of the full-chip leakage current in terms of Hermite polynomials

1. Generate fitting parameter matrices $a_{\mathrm{sub}}$ and $a_{\mathrm{gate}}$ of $I_{\mathrm{sub}}$ and $I_{\mathrm{gate}}$ in (3.1) and (3.2) for
   each type of gate (after a SPICE run on each input pattern) (Sect. 2).
2. Perform PCA to transform and reduce the original parameter variables in $L$ into a reduced
   set of independent random variables (Sect. 2.1).
3. Generate the Smolyak quadrature point set $\Theta_n^2$ with corresponding weights.
4. Calculate the coefficients of the Hermite polynomials of $I_{\mathrm{sub},k}$ and $I_{\mathrm{gate},k}$ for the final
   leakage analytic expression for each gate using (4.9) and (4.10).
5. Calculate the analytic expression of the full-chip leakage current by simple polynomial
   additions and calculate $\mu_{\mathrm{leakage}}$, $\sigma_{\mathrm{leakage}}$, PDF, and CDF of the leakage current if required.

Fig. 4.2 The flow of the presented algorithm

the gate-level analytic leakage current expressions and covariances. The final part
(step 5) computes the final leakage expressions by simple polynomial additions and
calculates other statistical information.

2.1 Random Variables Transformation and Reduction

In the presented gate-based approach, instead of using grid-based partitioning, as


in [13], to reduce the number of channel length variables in the presence of the
strong spatial correlation, we apply the PCA to reduce the number of random
variables. Our method starts with the following random variable vectors:

$L = [L_1, L_2, \ldots, L_n] + \delta L_{\mathrm{inter}},$  (4.1)

$T_{ox} = [T_{ox_1}, T_{ox_2}, \ldots, T_{ox_n}] + \delta T_{ox,\mathrm{inter}},$  (4.2)

where $n$ is the total number of gates on the whole chip, and $\delta L_{\mathrm{inter}}$ and $\delta T_{ox,\mathrm{inter}}$
represent the inter-die (global) variations. In total, we have $2n + 2$ random variables.
There exist correlations in $L$ among different gates, represented by the
covariance matrix $\mathrm{cov}(L_i, L_j)$ computed from (3.12).
The first step is to perform PCA on $L$ to get a set of independent random variables
$L' = [L_1', L_2', \ldots, L_n']$, where $L = PL'$ and $P = \{p_{ij}\}$ is the $n \times n$ principal
component coefficient matrix. In this process, singular value decomposition (SVD)
is used on the covariance matrix, and the singular values are arranged in decreasing
order, which means that the elements in $L'$ are arranged in decreasing weight order.
Then the number of elements in $L'$ can be reduced by considering only the dominant
part of $L'$, $[L_1', L_2', \ldots, L_k']$ (e.g., keeping elements whose weight is bigger than 1%), where $k$

is the number of reduced random variables. Then every element $L_i'$ in $L'$ can be
represented by an orthogonal Gaussian random variable $\xi_i$ with normal distribution:

$L_i' = \mu_i + \sigma_i \xi_i,$  (4.3)

where $\mu_i$ and $\sigma_i$ are the mean value and standard deviation of $L_i'$, and $L$ can be
represented as

$\begin{pmatrix} L_1 \\ L_2 \\ \vdots \\ L_n \end{pmatrix} =
\begin{pmatrix} \mu_{L_1} \\ \mu_{L_2} \\ \vdots \\ \mu_{L_n} \end{pmatrix} +
\begin{pmatrix} p_{11} & \cdots & p_{1k} \\ p_{21} & \cdots & p_{2k} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nk} \end{pmatrix}
\begin{pmatrix} \sigma_1 \xi_1 \\ \sigma_2 \xi_2 \\ \vdots \\ \sigma_k \xi_k \end{pmatrix} + \delta L_{\mathrm{inter}}.$  (4.4)

For $[T_{ox_1}, T_{ox_2}, \ldots, T_{ox_n}]$, $\delta L_{\mathrm{inter}}$, and $\delta T_{ox,\mathrm{inter}}$, we can also represent them by using
standard Gaussian variables as

$T_{ox,j} = \mu_{ox,j} + \sigma_{ox,j}\, \xi_{ox,j},$
$\delta L_{\mathrm{inter}} = \sigma_{L,\mathrm{inter}}\, \xi_{L,\mathrm{inter}},$
$\delta T_{ox,\mathrm{inter}} = \sigma_{ox,\mathrm{inter}}\, \xi_{ox,\mathrm{inter}},$  (4.5)

where $\xi_{ox,j}$, $\xi_{L,\mathrm{inter}}$, and $\xi_{ox,\mathrm{inter}}$ are independent orthonormal Gaussian random
variables. As a result, we can represent $L$ and $T_{ox}$ by $k + n + 2$ independent
orthonormal Gaussian random variables:

$\xi = [\xi_1, \xi_2, \ldots, \xi_{k+n+2}].$  (4.6)

Then $I_{\mathrm{sub}}(L, T_{ox})$ and $I_{\mathrm{gate}}(L, T_{ox})$ can be modeled as $I_{\mathrm{sub}}(\xi)$ and $I_{\mathrm{gate}}(\xi)$, respectively.
   However, among the $k + n + 2$ variables, only the $k + 2$ variables related to the
channel lengths are shared among gates. In other words, the $n$ variables $T_{ox,i}$ of the
individual gates are independent. As a result, for the $j$th gate, we only have $k + 3$
independent variables; the corresponding variable vector, $\xi_g = \{\xi_{g,j}\}$, is defined as

$\xi_{g,j} = [\xi_1, \ldots, \xi_k, \xi_{ox,j}, \xi_{L,\mathrm{inter}}, \xi_{ox,\mathrm{inter}}].$  (4.7)

2.2 Computation of Full-Chip Leakage Currents

For each gate, we first need to express the leakage currents in order-2 Hermite
polynomials, as shown below for both the subthreshold and gate leakage currents,
$I_{\mathrm{sub}}(\xi_{g,j})$ and $I_{\mathrm{gate}}(\xi_{g,j})$:

$I_{\mathrm{sub}}(\xi_{g,j}) = \sum_{i=0}^{P} I_{\mathrm{sub},i,j} H_i^2(\xi_{g,j}), \qquad
I_{\mathrm{gate}}(\xi_{g,j}) = \sum_{i=0}^{P} I_{\mathrm{gate},i,j} H_i^2(\xi_{g,j}),$  (4.8)

where the $H_i^2(\xi_{g,j})$ are order-2 Hermite polynomials. $I_{\mathrm{sub},i,j}$ and $I_{\mathrm{gate},i,j}$ are then
computed by the numerical Gaussian quadrature method discussed in Sect. 3.3 of
Chap. 2. Let $S$ be the size of the $Z$-dimensional second-order (level 2) quadrature
point set $\Theta_Z^2$, with $Z = k + 3$. Then $I_{\mathrm{sub},i,j}$ and $I_{\mathrm{gate},i,j}$ can be computed as follows:

$I_{\mathrm{sub},i,j} = \sum_{l=1}^{S} I_{\mathrm{sub}}(\xi_l)\, H_i^2(\xi_l)\, w_l \,/\, \langle H_i^2(\xi_{g,j}) \rangle,$  (4.9)

$I_{\mathrm{gate},i,j} = \sum_{l=1}^{S} I_{\mathrm{gate}}(\xi_l)\, H_i^2(\xi_l)\, w_l \,/\, \langle H_i^2(\xi_{g,j}) \rangle,$  (4.10)

where $I_{\mathrm{sub}}(\xi_l)$ and $I_{\mathrm{gate}}(\xi_l)$ are computed using (3.1) and (3.2).
   As a result, the coefficients for the $i$th Hermite polynomial at the $j$th gate can be
added directly as

$I_{\mathrm{leakage},i,j} = I_{\mathrm{sub},i,j} + I_{\mathrm{gate},i,j}.$  (4.11)
After the leakage currents are calculated for each gate, we can proceed to compute
the leakage current for the whole chip as follows:

$I_{\mathrm{leakage}}(\xi) = \sum_{j=1}^{n} \left( I_{\mathrm{sub}}(\xi_{g,j}) + I_{\mathrm{gate}}(\xi_{g,j}) \right).$  (4.12)

The summation is done for each coefficient of the Hermite polynomials. Then we
obtain the analytic expression of the final leakage current in terms of $\xi$.
We can then obtain the mean value, variance, PDF, and CDF of the leakage
current very easily. For instance, the mean value and variance for the full-chip
leakage current are

leakage D Ileakage; 0th ; (4.13)


X X
2 2 2
leakage D Ileakage; 1st C 2 Ileakage; 2nd; type1
X
2
C Ileakage; 2nd; type2 ; (4.14)

where Ileakage;i th is the leakage coefficient for i th Hermite polynomial of second


order defined as follows,

H0th ./ D 1; H1st ./ D i ;


H2nd; type1 ./ D i2  1; H2nd; type2 ./ D i j ; i ¤ j: (4.15)
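A minimal sketch (assuming NumPy, with the coefficients grouped as in (4.15)) of how the mean and variance in (4.13) and (4.14) follow from the summed second-order coefficients:

    import numpy as np

    def mean_var_from_hermite(c0, c1, c2_type1, c2_type2):
        # c0: zeroth-order coefficient; c1: first-order coefficients (one per xi_i);
        # c2_type1: coefficients of xi_i^2 - 1; c2_type2: coefficients of xi_i*xi_j, i != j.
        mean = c0
        var = np.sum(np.square(c1)) \
              + 2.0 * np.sum(np.square(c2_type1)) \
              + np.sum(np.square(c2_type2))
        return mean, var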
2.3 Time Complexity Analysis

To analyze the time complexity, one typically does not count the precharacterization cost of step 1 in Fig. 4.2. For the PCA step (step 2), which essentially performs an SVD on the covariance matrix, the computational cost is $O(nk^2)$ if we are only interested in the first $k$ dominant singular values. This is the case for strong spatial correlation.
In step 3, we need to compute the weights of the level-2 $(k+3)$-dimensional Smolyak quadrature point set. For a quadratic model with $k+3$ variables, the number of Smolyak quadrature points is about $(k+3)^2$, so the time cost for generating the Smolyak quadrature point set is $O((k+3)^2)$.
In step 4, we need to call (3.1) and (3.2) $S$ times for each gate. In each call, we need to evaluate $k+3$ variables in the Hermite polynomials. The computational cost of these two steps is $O(n(k+3)\cdot S)$, where $n$ is the number of gates. After the leakage currents are computed for each gate, it takes $O(n(k+3))$ to compute the full-chip leakage current.
The total computational cost is $O(nk^2 + (k+3)^2 + n(k+3)S + n(k+3))$. For second-order Hermite polynomials, $S \propto k^2$, so the time complexity becomes $O(nk^3)$. If $k \ll n$ (the case of strong spatial correlation), we end up with a linear-time complexity $O(n)$. In sub-90 nm VLSI technologies the spatial correlation is already strong, and it becomes stronger as technology scales down, so the method retains this favorable, near-linear time complexity.
3 Numerical Examples

The presented method has been implemented in Matlab 7.4.0. For comparison purposes, we also implemented the grid-based method in [13] and the pure MC method. All the experiments were carried out on a Linux system with quad Intel Xeon CPUs running at 2.99 GHz and 16 GB of memory. The initial results of this chapter were published in [155, 157].
The methods for full-chip statistical leakage estimation are tested on circuits in the PDWorkshop91 benchmark set. The circuits are synthesized with the Nangate Open Cell Library, and the placement is from MCNC [106]. The technology parameters come from the 45 nm FreePDK Base Kit and PTM models [139].
Table 4.1 shows the detailed parameters for the gate length and gate oxide thickness variations. Here we choose two sets of distributions; the last column of Table 4.1 shows the standard deviation ($\sigma$) of each variation. The $3\sigma$ values of the parameter variations for $L$ and $T_{ox}$ are set to 12% of the nominal parameter values, of which inter-die variations constitute 20% and intra-die variations 80% (Case 1), or inter-die variations constitute 50% and intra-die variations 50% (Case 2). The parameter $L$ is modeled as a sum of correlated sources of variation, and the gate oxide thickness $T_{ox}$ is modeled as an independent source of variation. The same framework can be easily extended to include other parameters of variation. Both $L$ and $T_{ox}$ in each gate are
Table 4.1 Process variation parameter breakdown for 45 nm technology

Case 1                                         Distribution   σ
Gate length (L)       Inter-die                20%            4% × 18 nm
                      Intra-die
                      (spatially correlated)   80%
Gate oxide            Inter-die                20%            4% × 1.8 nm
thickness (Tox)       Intra-die
                      (noncorrelated)          80%

Case 2                                         Distribution   σ
Gate length (L)       Inter-die                50%            4% × 18 nm
                      Intra-die
                      (spatially correlated)   50%
Gate oxide            Inter-die                50%            4% × 1.8 nm
thickness (Tox)       Intra-die
                      (noncorrelated)          50%
modeled as Gaussian parameters. For the correlated $L$, the spatial correlation is modeled by the exponential spatial correlation function in (3.12). For [13], we still partition the chip into a number of regular grids, and the numbers of grid partitions of the spatial correlation model used for the benchmarks are given in Table 4.2.
For comparison purposes, we perform MC simulations with 500,000 runs, the grid-based method in [13], and the presented method on the benchmarks. The large number of MC runs is used because the presented method is quite accurate, so an equally accurate reference is needed. Figure 4.3 shows the full-chip leakage current distribution (PDF and CDF) of circuit SC0 with 125 gates, considering variations in gate length and gate oxide thickness as in Table 4.1 for Case 1, and the spatial correlation of gate length. It shows that our method fits the MC results very well and is more accurate than [13]. Other test cases show similar comparison results. The comparisons of the mean values and standard deviations of the full-chip leakage currents are shown in Tables 4.2 and 4.3. For Case 1, the average errors in the mean value and standard deviation of the presented gate-based method are 0.8% and 4.04%, respectively, while for the grid-based method in [13] they are 4.08% and 39.7%, respectively. For Case 2, the average errors in the mean value and standard deviation of the presented gate-based method are 0.8% and 5.51%, respectively, while for the grid-based method in [13] they are 4.17% and 28.4%, respectively. The presented gate-based method is more accurate than the grid-based method, especially for the standard deviation. Since we use a 45 nm technology, while the results in [13] are based on a 100 nm technology, the error ranges are different (in [13], the average errors for the mean value and standard deviation are 1.3% and 4.1%). Results of the grid-based method in [13] will become worse as the technology scales down, since the dominant-state assumption no longer holds.
Table 4.4 also compares the CPU times of the three methods. From this table, we can see that even though our method is gate based, it is still faster than the
[Figure 4.3 comprises two panels, the probability density and the cumulative distribution of the full-chip leakage current (nA), each comparing Monte Carlo, our method, and the grid-based method.]
Fig. 4.3 Distribution of the total leakage currents of the presented method, the grid-based method, and the MC method for circuit SC0 (process variation parameters set as Case 1). Reprinted with permission from [157] © 2010 Elsevier
Table 4.2 Comparison of the mean values of full-chip leakage currents among three methods

                                                μ of Ileak (μA)           Errors (%)
Circuit   Gate #   Grid #   Variation setting   MC      [13]    New       [13]    New
SC0       125      4        Case 1              1.84    1.75    1.82      4.67    0.84
                            Case 2              1.84    1.75    1.82      4.85    0.87
SC2       1888     16       Case 1              29.98   28.88   29.70     3.65    0.91
                            Case 2              30.02   28.89   29.75     3.77    0.89
SC5       6417     64       Case 1              107.9   103.6   107.2     3.93    0.65
                            Case 2              107.9   103.6   107.2     3.9     0.65
method in [13], which is grid based. The presented method is also much faster than the MC method. On average, the presented method achieves about a 16× speedup over the grid-based method in [13]. We note that the method in [13] becomes faster when fewer grids are used, but this can lead to large errors even with strong spatial correlations.
Table 4.3 Comparison of standard deviations of full-chip leakage currents among three methods

                                 σ of Ileak (μA)            Errors (%)
Circuit   Variation setting      MC       [13]     New      [13]     New
SC0       Case 1                 0.495    0.668    0.524    35.0     5.77
          Case 2                 0.632    0.726    0.689    14.9     9.04
SC2       Case 1                 8.606    10.86    8.798    26.2     2.23
          Case 2                 10.71    12.03    11.36    12.33    6.13
SC5       Case 1                 26.19    41.36    25.11    57.9     4.12
          Case 2                 26.19    41.36    25.11    57.9     4.12
Table 4.4 CPU time comparison among three methods

                                 Cost time (s)                       Speedup of New (×)
Circuit   Variation setting      MC         [13]       New           over [13]   over MC
SC0       Case 1                 378.1      11.35      1.40          8.11        270.1
          Case 2                 358.6      7.47       1.41          5.30        254.33
SC2       Case 1                 1.35e4     168.51     18.79         30.6        718.5
          Case 2                 1.35e4     87.94      17.23         5.10        437.96
SC5       Case 1                 2.76e5     3335       121.2         27.52       2277
          Case 2                 2.06e5     7798.3     443.95        17.56       464.33
4 Summary

In this chapter, we have presented a gate-based method for analyzing the full-chip
leakage current distribution of digital circuit. The method considers both intra-
die and inter-die variations with spatial correlations. The new method employs
the orthogonal polynomials and multidimensional Gaussian quadrature method to
represent and compute variational leakage at the gate level and uses the orthogonal
decomposition to reduce the number of random variables by exploiting the strong
spatial correlations of intra-die variations. The resulting algorithm compares very
favorable with the existing grid-based method in terms of both CPU time and
accuracy. The presented method has about 16 speedup over [13] with constant
better accuracy.
Chapter 5
Linear Statistical Leakage Analysis by Virtual
Grid-Based Modeling

1 Introduction

When the spatial correlation is weak, the general approaches mentioned in Chaps. 3 and 4 do not work well, as the number of correlated variables cannot be reduced by much. Recently, an efficient method was proposed [200] to address this problem. The method is based on simplified gate leakage models and formulates the major computational tasks as matrix–vector multiplications via Taylor expansion. It then applies fast numerical methods such as the fast multipole method or the precorrected fast Fourier transform (FFT) method to compute the multiplications. However, this method assumes the gate-level leakage currents are purely log-normal, and the chip-level leakage is also approximated by a log-normal distribution, which is not the case, as we will show in this chapter. Also, it can only give the means and variances, not the complete distribution of the leakage power.
In this chapter, a linear statistical leakage analysis technique using a virtual grid-based model is presented [158, 159]. We start with a new linear-time algorithm for statistical leakage analysis in the presence of any spatial correlation (from no spatial correlation to the 100% correlated situation). The presented algorithm exploits the following property: the leakage current of a gate in the presence of spatial correlation is affected by process variations in its neighboring area. As a result, a gate's leakage current can be efficiently computed by considering its neighboring area in constant time. We adopt a recently introduced spatial correlation model in which a new set of location-dependent uncorrelated virtual variables is defined over grid cells to represent the original correlated random variables via fitting. To compute the statistical leakage current of a gate on the new set of variables, the collocation-based method is applied, and the variational gate leakages and total leakage currents are represented in an analytic form in terms of the random variables, which gives complete statistical information. The presented method considers both inter-die and intra-die variations and can work with any spatial correlation (strong or weak, as defined in Sect. 3). Unlike the existing approaches [13, 65], the presented method does not make any assumptions about the final distributions of the total leakage currents for both the gate and
chip levels. In the case of medium and strong correlations, the presented method also works in linear time by properly sizing the grid cells so that both the locality of correlation and the accuracy are preserved.
Furthermore, we bring forth a novel characterization of the standard cell library (SCL) for statistical leakage information, based on the following observations: (1) The set of neighbor cells is usually small (around 10) and depends only on the relative position, not the absolute position on the chip. (2) As proved later, the number of neighbor cells involved in our model is not related to the strength (level) of the spatial correlation. (3) The collocation-based method is applied, and the variational leakage of a gate is represented in an analytic form in terms of the virtual random variables, which gives the complete distribution. (4) The gate-level leakage distribution is related only to the type of gates in an SCL. This statistical leakage characterization can be stored in a look-up table (LUT), which only needs to be built once for an SCL, and the full-chip leakage of any chip can then be easily calculated by summing up the corresponding entries in the LUT.
The main highlights of the presented algorithm are as follows:
1. We apply the virtual grid-based model for spatial correlation modeling in statistical leakage analysis, making the resulting algorithm linear time, for the first time, for all spatial correlation (weak or strong) cases.
2. A new characterization in the SCL for statistical leakage analysis has been used. The corresponding algorithm can accelerate full-chip statistical analysis for all spatial correlation conditions (from weak to strong). To the best of the authors' knowledge, the presented approach is the first published algorithm that can guarantee O(N) time complexity for all spatial correlation conditions.
3. In addition, an incremental algorithm has been applied. When a few local changes are made, only a small circuit (including the changed gates) is involved in the updating process. Our numerical examples show that the incremental analysis can achieve a 10× further speedup compared with the library-enabled full-chip analysis approach.
In addition to the main highlights, we also present a forward-looking way to extend the presented method to handle runtime leakage analysis. In order to estimate the maximum runtime leakage, the input state under the maximum-leakage input vector needs to be chosen, while for transient runtime leakage simulation, every time the input vector changes, the input states of some gates on the chip are updated. Therefore, the incremental technique makes efficient runtime leakage simulation possible. More details are given in Sect. 4.6.
Numerical examples on the PDWorkshop91 benchmarks with a 45 nm technology show that the presented method using the novel characterization in the SCL is on average two orders of magnitude faster than the recently proposed method [13] with similar accuracy. For weak correlations, even more speedup can be observed. We remark that the experiments in this chapter are based on idle-time leakage; however, the linear-time algorithm can also be applied to runtime leakage by selecting different input states under certain input vectors. Notice that glitch events are ignored in this simplified discussion, which may cause estimation errors [99], and they need to be considered in future work. More details are discussed in Sect. 4.6.
2 Virtual Grid-Based Spatial Correlation Model

The virtual grid-based model is based on the observation that the leakage current of a gate in the presence of spatial correlation only correlates with its neighboring area. If we can introduce a set of uncorrelated variables to model the localized correlation, the leakage current of one gate can be computed in constant time by considering only its neighboring area. The total full-chip statistical leakage current can then be computed by simply adding all the gate leakage currents together in terms of the virtual set of variables in linear time. Notice that the virtual random variables in different grids are always independent, which is different from the traditional grid-based model. This idea was proposed recently for fast statistical timing analysis [15] to address computationally efficient modeling of weak spatial correlation; it is similar to the PCA-based approach [155], but with a different set of independent variables.
Specifically, the chip area is still divided into a set of grid cells. When the spatial correlation is weak enough to be ignored, the cells can become so small that each cell contains only one gate. We then introduce a "virtual" random variable for each cell for each source of process variation.
These virtual random variables are independent and will be the basis for the statistical leakage current calculation in the presence of spatial correlation. We can then express the original physical random variable of a gate in a grid cell as a linear combination of the virtual random variables of its own cell as well as its nearby neighbors. Since the virtual random variable in each cell has a specific location on the chip, such a location-dependent correlation model still retains the important spatial physical meaning (in contrast to PCA-based models). The grid partition can use cells of any shape. We use hexagonal grid cells [15] in this chapter since they have minimal anisotropy in 2D space.
Here we define the distance between the centers of two directly neighboring grid cells as the grid length $d_c$. Gates located in the same cell have strong correlation (larger than a given threshold value $\rho_{high}$) and are assumed to have the same parameter variations. The "spatial correlation distance" $d_{max}$ is defined as the minimum distance beyond which the spatial correlation between any two cells is sufficiently small (smaller than a given threshold value $\rho_{low}$) that we can ignore it.
In this model, the $j$th grid cell is associated with one virtual random variable $\xi_j \sim N(0,1)$, which is independent of all the other virtual random variables. $L_j$ can then be expressed in terms of the virtual variables of its $k$ closest neighbor cells. We introduce the concept of the correlation index neighbor set $T(j)$ for cell $j$; the corresponding variable vector, $\xi_{grid_j}$, is defined as
$$\xi_{grid_j} = [\xi_q,\ q \in T(j)] \qquad (5.1)$$
to model the spatial correlation of $L_j$ as
$$L_j = \sum_{q \in T(j)} \alpha_q\, \xi_q. \qquad (5.2)$$
Fig. 5.1 Location-dependent modeling with the T(i) of grid cell i defined as its seven neighbor cells; cells are numbered 1–10, and d_1, d_2, d_3, and d_max denote the distinct inter-cell distances. Reprinted with permission from [159] © 2010 IEEE
For example, a hexagonal grid partition is used as shown in Fig. 5.1. If $T(i)$ for each cell is defined as its closest $k = 7$ neighbor cells, then the $L$ located at cell $(x_i, y_i)$ can be represented as a linear combination of the seven virtual random variables located in its neighbor set. Taking $L_1$ in Fig. 5.1 for instance, we have $L_1 = \alpha_1\xi_1 + \alpha_2\xi_2 + \cdots + \alpha_7\xi_7$.
This concept of virtual random variables helps to model the spatial correlation. Two cells close to each other share more common spatial random variables, which means the correlation is strong. On the other hand, two cells physically far away from each other share fewer or no common spatial random variables. In this way, the spatial correlation is modeled as a homogeneous and isotropic random field, and the spatial correlation is related only to distance. That is to say, the spatial correlation can be fully described by $\rho(d)$ in (3.12), and $d_{max}$ is the distance beyond which $\rho(d)$ becomes small enough to be approximated as zero.
Since $\rho(d)$ is only a function of distance, the number of unique distance values between two correlated grid cells equals the number of unique element values in $\Omega_N$. From Fig. 5.1, the spatial correlation distance equals the distance between cell 1 and cell 10, which is $d_{max} = \sqrt{7}\,d_c$, and there are only three unique correlation distances, $d_1$ to $d_3$. Correspondingly, there are only three unique elements in $\Omega_N$, not counting two special values: 0 for $d \geq d_{max}$ and 1 for distances within one cell.
Furthermore, the same correlation index can be used for all grid cells, and the coefficient $\alpha_k$ should be the same for the same distance because of the homogeneity and isotropy of the spatial correlation. For the cell marked 1 in Fig. 5.1, we only have two unique values among the seven coefficients, i.e., we set $p_0 = \alpha_1$ and $p_1 = \alpha_i$, $i = 2, 3, \ldots, 7$. In other words, we have
$$L_1 = p_0\,\xi_1 + p_1\,(\xi_2 + \cdots + \xi_7). \qquad (5.3)$$
In this way, although there are seven random variables involved in the neighbor set, there are only two unknown coefficients left in the linear function (5.3), due to the symmetry of the hexagonal partition.
According to (3.12), a nonlinear overdetermined system can be built to determine the two unique values $p_0$ and $p_1$ as follows:
$$\rho(0) = E(L_1^2) = p_0^2 + 6p_1^2,$$
$$\rho(d_1) = E(L_1 L_2) = 2p_0 p_1 + 2p_1^2, \qquad (5.4)$$
$$\rho(d_2) = E(L_1 L_9) = 2p_1^2,$$
$$\rho(d_3) = E(L_1 L_8) = p_1^2.$$
The system in (5.4) can be solved by formulating it as a nonlinear least-squares optimization problem; a minimal fitting sketch is given at the end of this section. In matrix form, we can rewrite (5.2) for the whole chip as
$$L = P_{N,N}\,\xi, \qquad (5.5)$$
where $N$ is the number of grid cells and $\xi = [\xi_1, \xi_2, \ldots, \xi_N]$. According to (5.2), the correlation index set contains only $k$ spatial random variables, which is a very small fraction of the total number of spatial random variables. As a result, $P_{N,N}$ is a sparse matrix. Every gate is concerned with only $k$ virtual random variables, which have specific location information.
Fundamentally, the PCA-based method performs a similar process and has a similar transformation matrix between the original and the new set of variables:
$$L = V_{n,n}\,\xi, \qquad (5.6)$$
where $V_{n,n}$ is the transformation matrix obtained from the eigenvalue decomposition of the correlation matrix in PCA. The major difference is that $V_{n,n}$ is a dense matrix even though the original correlation matrix is sparse. This makes a huge difference, especially when the spatial correlation is weak, as the eigendecomposition takes almost $O(n^3)$ time to compute. The virtual independent spatial correlation model also works for the medium and strong correlation cases, as will be shown in the next section.
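To make the fitting of (5.4) concrete, the following minimal sketch (assuming SciPy's least_squares; the correlation values used for rho(0), rho(d_1), rho(d_2), and rho(d_3) are illustrative, not taken from the benchmarks) solves for p_0 and p_1:

    import numpy as np
    from scipy.optimize import least_squares

    # Illustrative target correlations rho(0), rho(d1), rho(d2), rho(d3).
    rho = np.array([1.0, 0.6, 0.2, 0.1])

    def residuals(p):
        p0, p1 = p
        # Residuals of the overdetermined system (5.4).
        return np.array([p0**2 + 6.0*p1**2     - rho[0],
                         2.0*p0*p1 + 2.0*p1**2 - rho[1],
                         2.0*p1**2             - rho[2],
                         p1**2                 - rho[3]])

    sol = least_squares(residuals, x0=[0.6, 0.3])
    p0_fit, p1_fit = sol.x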

3 Linear Chip-Level Leakage Power Analysis Method

In this section, we present the new full-chip statistical leakage analysis method. We first introduce the overall flow of the presented method and highlight the major computing steps. The presented algorithm flow is summarized in Fig. 5.2.
The presented algorithm consists of three major parts. The first part (steps 1 and 2) is precharacterization. Step 1 builds the analytic leakage expressions (3.1) and (3.2) for each type of gate, which only needs to be done once for an SCL. Step 2 deals with a small nonlinear overdetermined system, which can be solved with any least-squares optimization algorithm. The second part (step 3) generates a
Fig. 5.2 The flow of the presented algorithm

small set of independent virtual random variables and builds the analytic leakage current expressions and covariances for each gate on top of the new random variables. The final part (step 4) computes the final full-chip leakage expressions by simple polynomial additions. From the final expressions, we can calculate the important statistical information (such as the mean, the variance, and even the whole distribution). In the following, we briefly explain some important steps.
3.1 Computing Gate Leakage by the Spectral Stochastic Method
In the following, we use the orthogonal polynomial-based modeling approaches mentioned in Sect. 3.2 of Chap. 2. Note that for Gaussian and log-normal distributions, Hermite polynomials are the best choice, as they lead to an exponential convergence rate [45]. For non-Gaussian and non-log-normal distributions, there are other orthogonal polynomials, and the presented method can be extended to other distributions with different orthogonal polynomials.
In our problem, $y(\xi)$ in (2.30) is the leakage current of each gate and eventually of the full chip. For the $j$th gate, from (5.2), $L_j$ relates only to the $k$ independent virtual random variables in $T(j)$. Since $k$ is a small number, step 3 in Fig. 5.2 can be very efficient.
To compute the gate leakage current, we need to represent both $I_{sub}$ and $I_{gate}$ of each gate in second-order Hermite polynomials, respectively:
$$I_{sub}(\xi_{grid_j}) = \sum_{i=0}^{P} I_{sub,i,j}\, H_i(\xi_{grid_j}), \qquad (5.7)$$
$$I_{gate}(\xi_{grid_j}) = \sum_{i=0}^{P} I_{gate,i,j}\, H_i(\xi_{grid_j}), \qquad (5.8)$$
where the $H_i(\xi_{grid_j})$ are second-order Hermite polynomials defined as in (4.15), and $I_{sub,i,j}$ and $I_{gate,i,j}$ are computed by the numerical Smolyak quadrature method in (2.40).
Notice that the time complexity of computing the leakage of a gate is $O(k^2)$, and the number of involved independent random variables $k$ is very small compared to the total number of gates. The analytic expression is also a function of only those involved random variables.
3.2 Computation of Full-Chip Leakage Currents

After the leakage currents are calculated for each gate, we can proceed to compute the leakage current for the whole chip as follows:
$$I_{chip}(\xi) = \sum_{j=1}^{n} \big( I_{sub}(\xi_{grid_j}) + I_{gate}(\xi_{grid_j}) \big). \qquad (5.9)$$
The summation is done for each coefficient of the Hermite polynomials. We then obtain the analytic expression of the final leakage current in terms of $\xi$.
We can then obtain the mean value and variance of the full-chip leakage current very easily as follows:
$$\mu_{chip} = I_{chip,0th}, \qquad (5.10)$$
$$\sigma^2_{chip} = \sum I^2_{chip,1st} + 2\sum I^2_{chip,2nd,type1} + \sum I^2_{chip,2nd,type2}, \qquad (5.11)$$
where $I_{chip,ith}$ is the leakage coefficient for the $i$th Hermite polynomial of second order defined in (4.15). Since Hermite polynomials with orders higher than two have no contribution to the mean value or standard deviation, second order is good enough for estimating $\mu_{chip}$ and $\sigma_{chip}$ in (5.10) and (5.11).
3.3 Time Complexity Analysis

To analyze the time complexity, one typically does not count the precharacterization cost of step 1 in Fig. 5.2, and the time cost of step 2 is negligible compared to the following steps. In step 3, we need to compute the weights of the level-2 $k$-dimensional Smolyak quadrature point set. For a quadratic model with $k+3$ variables, the number of Smolyak quadrature points is $S \approx O(k^2)$, based on the discussion in Sect. 3.1.
So the time cost for generating the Smolyak quadrature point set is $O(k^2)$. In step 4, we need to call (3.1) and (3.2) $S$ times for each gate. In each call, we need to evaluate $k+3$ variables in the Hermite polynomials. The computational cost of these two steps is $O(nk\cdot S)$, where $n$ is the number of gates. After the leakage currents are computed for each gate, it takes $O(n(k+3))$ to compute the full-chip leakage current.
For the second-order Hermite polynomials, $S \propto k^2$, and $k$ is the number of grid cells in the correlated neighbor index set, which is a very small constant. As a result, the time complexity of our approach becomes linear, $O(n)$.
4 New Statistical Leakage Characterization in SCL

In this section, we present why a new characterization modeling statistical leakage can be added to the SCL and how it can be applied in our new full-chip statistical leakage analysis method.
4.1 Acceleration by Look-Up Table Approach

The spatial correlation in (5.2) is related to the distance between two grid cells. As a result, the neighbor set $T(i)$ represents a relative location, not an absolute location. In other words, a local neighbor set $T$ and a local set of variables $\xi_{loc} = [\xi_1, \ldots, \xi_k]$ can be shared by all the gates in all the cells.
The local neighbor set $T$ and the coefficients in (5.2) are determined by $d_{max}/d_c$. From the specific spatial correlation model in (3.12) (as shown in Fig. 5.3),
$$d_{max} = \eta\sqrt{-\ln(\rho_{low})}, \qquad d_c = \eta\sqrt{-\ln(\rho_{high})}, \qquad (5.12)$$
Fig. 5.3 Relation between ρ(d) = exp(−d²/η²) and d/η, with the thresholds ρ_high at d_c/η and ρ_low at d_max/η
then the ratio of the spatial correlation distance $d_{max}$ to the grid length $d_c$ becomes
$$d_{max}/d_c = \sqrt{\ln(\rho_{low})/\ln(\rho_{high})}. \qquad (5.13)$$
Once the threshold values $\rho_{high}$ and $\rho_{low}$ are set, $d_{max}/d_c$ is not related to the correlation length $\eta$. This means we can determine the grid length once we know the spatial correlation distance for a specific correlation formula, at the cost of controlled errors (set by $\rho_{high}$ and $\rho_{low}$).
Furthermore, (5.13) shows that the strength of the spatial correlation (strong or weak) has nothing to do with $T$ and the virtual random variables used in our model. At the same time, the fitting parameters of the static leakage in (3.1) and (3.2) are related only to the types of gates in a library. As a result, the coefficients of the Hermite polynomials for the leakage of one gate are functions only of the type of the gate, $\rho_{high}$, and $\rho_{low}$. Therefore, a simple LUT can be used to store the coefficients of the Hermite polynomials of each type of gate in the library. In other words, we do not need to compute the coefficients of the Hermite polynomials for each gate; we just look them up in the table instead. This makes a big difference, as the time complexity is reduced from O(n) to O(N), where n is the number of gates and N is the number of grid cells on the chip.
For the LUT, suppose $Q$ is the number of Hermite polynomials involved and $m$ is the number of gate types in the library; the LUT then consists of two matrices:
$$CS = \{I_{sub,q,j}\}, \qquad CG = \{I_{gate,q,j}\}. \qquad (5.14)$$
Here $I_{sub,q,j}$ is the coefficient of $H_q$ for the $j$th kind of gate in the library for the subthreshold leakage, and $I_{gate,q,j}$ is the coefficient of $H_q$ for the $j$th kind of gate in the library for the gate oxide leakage. $CS$ and $CG$ are $Q \times m$ matrices. Notice that the table needs to be built only once and can be reused for different designs with different conditions of spatial correlation, since the new algorithm is independent of the spatial correlation length $\eta$ and the circuit design information. In this way, the LUT builds a new characterization into the SCL, which captures the statistical leakage behavior of each standard cell.
4.2 Enhanced Algorithm

The enhanced new algorithm consists of two parts. The first part is precharacterization, as shown in Fig. 5.4. We build analytic leakage current expressions for each kind of gate on top of a small set of independent virtual random variables. For fixed values of $\rho_{high}$ and $\rho_{low}$ and one library, a new characterization is added to the SCL by building a LUT, which stores the coefficients of the Hermite polynomials of $I_{sub}$ and $I_{gate}$ in the analytic leakage expressions for each kind of gate. This process only
Fig. 5.4 The flow of statistical leakage characterization in SCL

Fig. 5.5 The flow of the presented algorithm using statistical leakage characterization in SCL

needs to be done once for one library, given $\rho_{high}$ and $\rho_{low}$. Besides, it involves a small nonlinear overdetermined problem, which can be solved quickly with any least-squares algorithm.
When we deal with full-chip statistical leakage analysis, the coefficients of the local Hermite polynomials in the neighbor grid cell set of each cell can simply be read from the LUT. After transferring the local coefficients to the corresponding global positions, we can compute the final full-chip leakage expressions by simple polynomial additions. From the resulting expression, we can calculate the other statistical information (such as the mean, the variance, and even the whole distribution). The new algorithm flow is summarized in Fig. 5.5. In the following, we briefly explain some important steps.
4.3 Computation of Full-Chip Leakage Currents

Here we define a gate mapping matrix as follows:
$$G_{N \times m} = \{g_{i,j}\}, \qquad (5.15)$$
where $g_{i,j}$ is the number of gates of the $j$th type in the library located in the $i$th grid cell. The coefficients of the local Hermite polynomials in the neighbor set for all the cells on the chip can then be easily calculated from the LUT as follows:
$$I_{sub,loc} = G \cdot CS^{T}, \qquad I_{gate,loc} = G \cdot CG^{T}. \qquad (5.16)$$
In order to get the full-chip leakage current, the local coefficients need to be transferred to their corresponding global positions:
$$T(i) = (x_i, y_i) + T. \qquad (5.17)$$
For the $i$th grid cell, the local set of random variables $\xi_{loc}$ is mapped to the corresponding positions in $T(i)$. Therefore, $I_{sub,loc}$ and $I_{gate,loc}$ can be transferred to the corresponding global coefficients based on the global virtual random variable set $\xi$. For example, the coefficient of $\xi_i$ in the $i$th cell is
$$I_{sub}(\xi_i) = \sum_{k,\ i \in T(k)} I_{sub,loc}\big(T(k)_{(x_k, y_k)}\big). \qquad (5.18)$$
Next, we can proceed to compute the leakage current of the whole chip as follows:
$$I_{chip}(\xi) = \sum \big( I_{sub}(\xi) + I_{gate}(\xi) \big). \qquad (5.19)$$
The summation is done for each coefficient of the global Hermite polynomials to obtain the analytic expression of the final leakage currents in terms of $\xi$. We can then obtain the mean value, variance, PDF, and CDF of the leakage current very easily. For instance, the mean value and variance of the full-chip leakage current are
$$\mu_{chip} = I_{chip,0th}, \qquad (5.20)$$
$$\sigma^2_{chip} = \sum I^2_{chip,1st} + 2\sum I^2_{chip,2nd,type1} + \sum I^2_{chip,2nd,type2}, \qquad (5.21)$$
where $I_{chip,ith}$ is the leakage coefficient for the $i$th Hermite polynomial of second order defined in (4.15).
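The following minimal sketch (assuming NumPy; the matrix sizes and values are illustrative placeholders) shows the core of (5.15)–(5.16), i.e., how the per-cell coefficient arrays fall out of one matrix product with the LUT:

    import numpy as np

    # Gate mapping matrix G (N cells x m gate types) and LUT matrices CS, CG
    # (Q Hermite coefficients x m gate types), per (5.14)-(5.15).
    N, m, Q = 3, 2, 4                      # toy sizes for illustration
    G = np.array([[5, 2], [0, 7], [3, 3]], dtype=float)
    CS = np.random.rand(Q, m) * 1e-9       # placeholder subthreshold-leakage coefficients
    CG = np.random.rand(Q, m) * 1e-10      # placeholder gate-leakage coefficients

    # Local coefficients of (5.16); row i holds the Q coefficients of grid cell i.
    I_sub_loc = G @ CS.T
    I_gate_loc = G @ CG.T

    # Mapping each row to the global positions of the cell's neighbor set T(i), as in
    # (5.17)-(5.18), and summing coefficient-wise then yields the full-chip expression (5.19).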
4.4 Incremental Leakage Analysis

During leakage-aware circuit optimization, a few small changes might be made to the circuit, but we do not want to recompute the whole-chip leakage from scratch. In this case, incremental analysis becomes necessary. In this section, we show how this can be done in our look-up-table-based framework.
For brevity, we only consider the case where one gate is changed. However, the presented incremental approach can easily be extended to handle a number of gates. Assume one gate located in the $i$th grid cell is changed (e.g., a gate of the $j$th type is replaced by one of the $(j+1)$th type), resulting in
$$I_{chip}^{new} = I_{chip}^{old} - I_{grid_i}^{old} + I_{grid_i}^{new}, \qquad (5.22)$$
where $I_{chip}^{new}$ and $I_{chip}^{old}$ denote the full-chip leakage currents after and before the change, respectively, and $I_{grid_i}^{old}$ and $I_{grid_i}^{new}$ are the leakage currents in the $i$th grid cell before and after the change, respectively.
As defined in (5.15), $g_{i,j}$ in the gate mapping matrix is the number of gates of the $j$th type located in the $i$th cell. Therefore, we can quickly generate the new gate mapping matrix $G^{new}$ by updating only two elements of $G^{old}$:
$$g_{i,j}^{new} = g_{i,j}^{old} - 1, \qquad g_{i,j+1}^{new} = g_{i,j+1}^{old} + 1. \qquad (5.23)$$
In the incremental analysis process, we can regard the updated part as a small circuit in which there is only one grid cell (the $i$th cell on the chip) and only two types of gates in the library (the $j$th and the $(j+1)$th). The gate mapping matrix of this update circuit is then
$$G^{update} = [1\ \ 1], \qquad (5.24)$$
and the LUTs in (5.14) used in the small circuit are only
$$CS^{update} = [I_{sub,j},\ I_{sub,j+1}], \qquad CG^{update} = [I_{gate,j},\ I_{gate,j+1}], \qquad (5.25)$$
where $I_{sub,j/(j+1)}$ and $I_{gate,j/(j+1)}$ are the $j/(j+1)$th columns of $CS$ and $CG$, respectively.
Compared to the whole chip, the small circuit is much simpler and contains only a few terms. Therefore, updating the leakage distribution using (5.24) and (5.25) is much cheaper than a full-blown chip leakage analysis.
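A minimal sketch of the incremental bookkeeping in (5.22)–(5.25) (assuming NumPy; for brevity, the local-to-global coefficient mapping of (5.17)–(5.18) for the affected cell is represented by a caller-supplied function and is not spelled out here):

    import numpy as np

    def incremental_update(I_chip, G, CS, CG, cell, old_type, new_type, map_to_global):
        # Local coefficient change of the one-cell 'update circuit' of (5.24)-(5.25):
        # remove one gate of old_type, add one gate of new_type, both read from the LUT.
        delta_loc = (CS[:, new_type] + CG[:, new_type]) \
                    - (CS[:, old_type] + CG[:, old_type])
        # Map the local delta onto the global coefficient positions of the cell's
        # neighbor set T(cell) and apply (5.22) coefficient-wise.
        I_chip_new = I_chip + map_to_global(delta_loc, cell)
        # Keep the gate mapping matrix consistent, as in (5.23).
        G[cell, old_type] -= 1
        G[cell, new_type] += 1
        return I_chip_new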
4.5 Time Complexity Analysis

Consider the statistical leakage analysis of a given chip. For each grid cell, we need to compute a weighted sum over the $m$ kinds of gates in this cell for every coefficient in the neighbor set (of size $k$). For a quadratic model with $k$ variables, the number of coefficients is about $S \approx k^2$, so the time cost of this step is $O(k^2 \cdot m \cdot N)$, where $N$ is the number of cells. Transferring the local coefficients to their global positions and summing them up costs $O(N)$. Next, it takes $O(N)$ to compute the full-chip leakage current. Since $k$ and $m$ are very small constants, the time complexity of our approach is therefore $O(N)$.
4.6 Discussion of Extension to Statistical Runtime Leakage Estimation
The leakage current for each input combination obtained in Sect. 2 of Chap. 3 can be used to estimate the average leakage in standby mode (idle) as well as the time-variant leakage in active mode (runtime).
For idle leakage analysis, we take the average of the leakage currents over all the input combinations to arrive at an analytic expression for each gate as in (5.26), in lieu of the dominant states used in [13]. The reason for keeping all input states is that technology downscaling narrows the gap between the leakage under the dominant states and the others; considering only one state in the leakage analysis would lead to large errors compared to the simulation results:
$$I_{sub}^{avg} = \sum_{i \in \text{all input states}} P_i\, I_{sub,i}, \qquad I_{gate}^{avg} = \sum_{i \in \text{all input states}} P_i\, I_{gate,i}, \qquad (5.26)$$
where $P_i$ is the probability of input state $i$, and $I_{sub,i}$ and $I_{gate,i}$ are the subthreshold leakage and gate leakage values at input state $i$, respectively.
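A minimal sketch of the averaging in (5.26) (assuming NumPy; the per-state coefficient arrays would come from the analytic gate expressions above):

    import numpy as np

    def average_leakage_coeffs(P, I_sub_states, I_gate_states):
        # P: probabilities of the input states (length s, summing to 1);
        # I_sub_states, I_gate_states: s x Q arrays of Hermite coefficients per state.
        I_sub_avg = P @ I_sub_states
        I_gate_avg = P @ I_gate_states
        return I_sub_avg, I_gate_avg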
On the other hand, the runtime leakage might change whenever a new input vector is applied. By choosing the input state at the gate level under a certain input vector, the final analytic expression for the runtime leakage can be obtained. Notice that the LUT for runtime leakage is larger than the one used in idle-time leakage analysis: for runtime leakage, the analytic expressions of all input patterns cannot be combined and have to be stored separately.
The presented statistical characterization in the SCL is fast enough to make runtime leakage estimation under a series of input vectors possible. More details on statistical runtime leakage analysis are given in the following part.
Fig. 5.6 Simulation flow for full-chip runtime leakage: perform statistical leakage analysis (SLA) on the given initial input vector and input states of all gates on the chip; whenever the input vector changes, update the runtime leakage behavior by incremental leakage analysis
Here we present a forward-looking way to extend the presented method to handle runtime leakage current estimation. In traditional power analysis, leakage was considered important only during idle time. However, as technology scales down, the growth of leakage power becomes significant even during runtime, for instance when computing the maximum power bound [38].
Runtime leakage, however, is input-signal dependent and changes each time the input signals change, which means it becomes time varying. As a result, runtime leakage analysis can take an extremely long time, as we need to perform the statistical analysis for each input vector along the time axis. Fortunately, with the novel statistical characterization in the SCL and the incremental approach discussed in Sect. 4.4, the leakage analysis at each cycle is fast enough to make runtime leakage estimation possible.
In the following, we show how to extend the presented statistical leakage method to handle runtime leakage analysis. First, given the initial input vector and the initial state of each gate on the chip, the initial leakage analysis can be done using the algorithm in Fig. 5.5. After that, every time the input vector changes, the input states of some gates on the chip are updated. Instead of computing the chip-level leakage from the very beginning, the incremental technique discussed in Sect. 4.4 can be applied here to update the runtime leakage information. The flow of the presented statistical analysis of runtime leakage is shown in Fig. 5.6.
One notable difference is that the gate-level leakage analytic expressions in (3.1) and (3.2) for all input states need to be stored for runtime leakage analysis, instead of the average value in (5.26) used for idle-time leakage analysis.
Second, sometimes the maximum statistical runtime leakage estimate is required instead of such transient leakage results. In fact, the maximum runtime leakage of a circuit can be greater than the minimum leakage by a few orders of magnitude [99]. Besides, the input vectors causing the maximum leakage current depend strongly on process variations due to the shrinking physical dimensions.
To obtain the maximum statistical runtime leakage, we follow the work in [38], which proposed a technique to accurately estimate the runtime maximum/minimum leakage vector considering both cell functionalities and process variations. One can first run the tool in [38] to obtain the input vector giving the maximum leakage power, and then apply the presented SCL tool to obtain the maximum/minimum statistical leakage power under that input. The presented statistical leakage characterization in the SCL works as long as the input vector is given.
We note that glitch events also affect the runtime leakage power, and ignoring glitching can cause an estimation error of approximately 5–20%, depending on the circuit topology [99]. However, glitches have not been considered in any existing statistical runtime leakage analysis work so far and will be investigated in the future.
4.7 Discussion about Runtime Leakage Reduction Technique

Runtime leakage reduction techniques such as power gating [1] are widely applied in the design of mobile devices nowadays. Although the leakage power model used in this chapter is for idle-time leakage, the presented method can be extended to leakage computation under the runtime scenario with leakage reduction.
By shutting off idle blocks, power gating is an effective technique for saving leakage power. Following the runtime leakage model for power gating in [73], the variational part of the full-chip leakage can be estimated as
$$I_{leak} = (1 - W) \sum_{i \in \text{all gates}} I_i^{gate}, \qquad (5.27)$$
where $W$ is the empirical switching factor. And from [198], the leakage of a gate, $I^{gate}$, can be approximated by a single exponential function of its virtual ground voltage $V_{VG}$:
$$I^{gate} \approx I_0\, e^{-K_{gate} V_{VG}}, \qquad (5.28)$$
where $K_{gate}$ is the leakage reduction exponent and $I_0$ is the zero-$V_{VG}$ leakage current. Notice that both the switching factor $W$ in (5.27) and the leakage reduction exponent $K_{gate}$ in (5.28) are related only to the type of gate and are not statistical factors. Therefore, the presented LUT approach works for both idle leakage and runtime leakage with power-gating activities.
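A minimal sketch of evaluating (5.27)–(5.28) for a given operating point (assuming NumPy; all inputs are illustrative placeholders, not calibrated data):

    import numpy as np

    def runtime_leakage_with_power_gating(I0, K_gate, V_vg, W):
        # Per-gate leakage I0_i * exp(-K_gate_i * V_vg_i) as in (5.28),
        # scaled by the empirical switching factor (1 - W) of (5.27).
        I_per_gate = I0 * np.exp(-K_gate * V_vg)
        return (1.0 - W) * np.sum(I_per_gate)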

5 Numerical Examples

The presented methods, with and without the LUT, have been implemented in Matlab 7.8.0. Since the leakage model for the method in [200] has to be purely log-normal (linear terms in the exponent), we did not choose it for comparison purposes.
Table 5.1 Summary of test cases used in this chapter

Circuit   Gate #    Area (μm × μm)       Test case   d_max (μm)   d_c (μm)   Grid #
SC0       125       1,459 × 1,350        Case 1      2,190        730        2 × 2
                                         Case 2      1,095        365        4 × 4
SC1       1,888     4,892 × 4,874        Case 3      1,896        612        8 × 8
                                         Case 4      918          328        16 × 16
SC2       6,417     10,092 × 10,466      Case 5      984          328        32 × 31
                                         Case 6      482          164        64 × 64
VLSI      2e6       SC2 × 256            Case 7      6,301        2,144      112 × 112
All the experiments were carried out on a Linux system with quad Intel Xeon CPUs running at 2.99 GHz and 16 GB of memory. The initial results of this chapter were published in [158, 159].
The methods for full-chip statistical leakage analysis were tested on circuits in the PDWorkshop91 benchmark set. The circuits were synthesized with the Nangate Open Cell Library [125], and the placement is from MCNC [106]. The technology parameters come from the 45 nm FreePDK Base Kit and PTM models [139].
According to [71], the nominal $L$ and $T_{ox}$ for high-performance logic in a 45 nm technology are 18 nm and 1.8 nm, respectively, and the physical variation should be controlled within ±12%. So the $3\sigma$ values of the variations for $L$ and $T_{ox}$ were set to 12% of the nominal values, of which inter-die variations constitute 20% and intra-die variations 80%. $L$ is modeled as a sum of spatially correlated sources of variation, and $T_{ox}$ is modeled as an independent source of variation. The same framework can easily be extended to include other parameters of variation. Both $L$ and $T_{ox}$ are modeled as Gaussian parameters. For the correlated $L$, the spatial correlation is modeled based on (3.12), and the grid partition follows Fig. 5.1. The test cases are given in Table 5.1 (all lengths in μm); test case "VLSI" is generated by duplicating SC2 as a unit block into a 16 × 16 array.
For comparison purposes, we performed MC simulations with 50,000 runs using (3.1) and (3.2), the method in [13] (considering only the spatial correlation of neighboring grid cells), and the presented approaches on the benchmarks.
5.1 Accuracy and CPU Time

The comparison of the mean values and standard deviations of the full-chip leakage current is shown in Table 5.2, where New denotes the presented method. The average errors in the mean and standard deviation (σ) of the new technique are 4.52% and 3.92%, respectively, while for the method in [13] the average errors in the mean value and σ are 4.12% and 3.83%, respectively. Table 5.2 shows that the two algorithms have almost the same accuracy, and that our method can handle both strong and weak spatial correlations by adjusting the grid size; for a very large circuit
Table 5.2 Accuracy comparison of different methods based on Monte Carlo

            Mean value (μA)                            Errors (%)
Test case   Grid #      MC      [13]     New           [13]    New
Case 1      2 × 2       3.311   3.105    3.169         6.20    4.28
Case 2      4 × 4       3.310   3.105    3.169         6.20    4.28
Case 3      8 × 8       30.04   28.88    30.46         3.85    1.38
Case 4      16 × 16     30.04   28.88    30.46         3.85    1.38
Case 5      32 × 32     191.6   179.0    182.7         6.59    4.65
Case 6      64 × 64     191.6   179.0    182.7         6.59    4.65
Case 7      112 × 112   –       –        2.6e4         –       –

            Standard deviation (μA)                    Errors (%)
Test case   Grid #      MC      [13]     New           [13]    New
Case 1      2 × 2       0.904   0.837    0.861         7.40    4.69
Case 2      4 × 4       0.594   0.547    0.548         7.91    7.74
Case 3      8 × 8       5.713   5.494    5.417         3.83    5.18
Case 4      16 × 16     5.307   5.400    5.067         1.75    4.52
Case 5      32 × 32     33.87   31.83    32.25         6.02    4.78
Case 6      64 × 64     33.20   30.27    29.34         8.83    11.63
Case 7      112 × 112   –       –        4.1e3         –       –

Table 5.3 CPU time Test case MC Method in [13] New LUT
comparison
Case1 83.14 2.96 0.10 0.023
Case2 87.09 13.16 0.14 0.036
Case3 828.42 26.24 0.86 0.033
Case4 869.12 74.50 0.87 0.609
Case5 7532.77 117.77 8.65 1.005
Case6 7873.54 490.84 10.67 7.191
Case7 – – 2598 3.7313

such as Case 7, MC and the method in [13] run out of memory, but the presented method still works.
Table 5.3 compares the CPU times of MC, the method in [13], the presented method (New), and the presented method using the statistical leakage characterization in the SCL (denoted LUT). This table shows that the presented new method, New, is much faster than the method in [13] and MC simulation. On average, the presented algorithm achieves about a 113× speedup over [13] and many orders of magnitude over the MC method. Moreover, the speed of our approach is not adversely affected by the total number of grid cells: if the spatial correlation is strong, which means $d_{max}$ is large, $d_c$ can be increased at the same time without loss of accuracy, so the number of neighbor grid cells in $T(i)$ will still be much smaller than the number of gates. The presented method is therefore efficient and linear in both cases. Table 5.3 also shows that the presented method gains further speedup with the LUT technique using the statistical leakage characterization in the SCL.
Table 5.4 Incremental leakage analysis cost

            Cost time (s)    Speedup over
Test case   Incremental      MC       [13]     New      LUT
Case 1      3.78e−4          2.2e5    2.7e4    265      53
Case 2      1.53e−4          5.7e5    8.1e4    915      157
Case 3      0.0026           3.2e5    3.7e4    331      13
Case 4      1.12e−4          7.8e6    6.7e5    7768     407
Case 5      0.0095           7.9e5    1.1e5    911      16
Case 6      2.77e−4          2.8e7    6.1e6    3.9e4    3.1e4
5.2 Incremental Analysis

For comparison purposes, one gate in each benchmark circuit is changed, and the presented incremental algorithm is applied to update the leakage value locally. Table 5.4 shows the computational cost of the incremental analysis and its speedup over the four leakage analysis methods in Table 5.3. Compared with the LUT approach (the fifth column in Table 5.3), the incremental analysis achieves a 13×–3.1e4× speedup. As discussed in Sect. 4.4, the mini-circuit used for updating contains only a small constant number of terms. Therefore, when the problem size increases further, we expect the incremental analysis to achieve even more speedup over the full leakage analysis.
6 Summary

In this chapter, we have presented a linear algorithm for full-chip statistical analysis of leakage currents in the presence of any condition of spatial correlation (strong or weak). The new algorithm adopts a set of uncorrelated virtual variables over grid cells to represent the original physical random variables with spatial correlation, and the size of the grid cells is determined by the correlation length. As a result, each physical variable is always represented by the virtual variables in a local neighbor set. Furthermore, a LUT is used to cache the statistical leakage information of each type of gate in the library to avoid computing the leakage for each gate instance. As a result, the full-chip leakage can be calculated with O(N) time complexity, where N is the number of grid cells on the chip. The new method maintains linear complexity from strong to weak spatial correlation and imposes no restriction on the leakage current model or variation model.
This chapter also presented an incremental analysis scheme to update the leakage distribution more efficiently when local changes are made to a circuit. Numerical examples show the presented method is about 1,000× faster than the recently proposed method [13] with similar accuracy and many orders of magnitude faster than the MC method. Numerical results also show that the presented incremental analysis can achieve further significant speedup over the full leakage analysis.
Chapter 6
Statistical Dynamic Power Estimation
Techniques

1 Introduction

It is well accepted that process-induced variability has a huge impact on circuit performance in sub-90 nm VLSI technologies. The variational nature of the process has to be assessed in various VLSI design steps to ensure robust circuit design. Process variations consist of inter-die variations, which affect all the devices on the same chip in the same way, and intra-die variations, which represent variations of parameters within the same chip. The latter include spatially correlated variations and purely independent or uncorrelated variations. Spatial correlation describes the phenomenon that devices close to each other are more likely to have similar characteristics than devices that are far apart. It has been shown that variations in practical chips in the nanometer range are spatially correlated [195]; simply assuming independence of the involved random variables can lead to significant errors.
One great challenge from aggressive technology scaling is the increasing power consumption, which has become a major issue in VLSI design, and the variations in process parameters and timing delays result in variations in power consumption. Many statistical leakage power analysis methods have been proposed to handle both inter-die and intra-die process variations considering spatial correlation [13, 65, 155, 200]. However, the problem is far from solved for dynamic power estimation.
Dynamic power for a digital circuit is in general expressed as follows:
$$P_{dyn} = \frac{1}{2} f_{clk} V_{dd}^2 \sum_{j=1}^{n} C_j S_j, \qquad (6.1)$$
where $n$ is the number of gates on the chip, $f_{clk}$ is the clock frequency, $V_{dd}$ is the supply voltage, $C_j$ is the sum of the load capacitance and the equivalent short-circuit capacitance at node $j$, and $S_j$ is the switching activity of gate $j$.
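For reference, (6.1) is straightforward to evaluate once the per-gate capacitances and switching activities are known; a minimal sketch (assuming NumPy, with illustrative values):

    import numpy as np

    def dynamic_power(f_clk, V_dd, C, S):
        # (6.1): 0.5 * f_clk * V_dd^2 * sum_j C_j * S_j
        return 0.5 * f_clk * V_dd**2 * np.sum(np.asarray(C) * np.asarray(S))

    # e.g., a 1 GHz clock, 1.0 V supply, two gates:
    P = dynamic_power(1e9, 1.0, [2e-15, 3e-15], [0.10, 0.25])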

Fig. 6.1 The dynamic power versus effective channel length (L_eff ratio 0.8–1.2) for an AND2 gate in 45 nm technology (70 ps active pulse as partial swing, 130 ps active pulse as full swing). Reprinted with permission from [60] © 2010 IEEE
This expression, however, does not give the explicit impact of the effective channel length ($L_{eff}$) and gate oxide thickness ($T_{ox}$) of a gate on its dynamic power. In the work of [64], $L_{eff}$ and $T_{ox}$ are shown to have the most impact on gate dynamic power consumption. Figure 6.1 shows the dynamic power variation due to different effective channel lengths for an AND2 gate in 45 nm technology. It can be seen that the channel length of a gate has a significant impact on its dynamic power.
In this chapter, we propose to develop a more efficient statistical dynamic power estimation method considering channel length variations with spatial correlation and gate oxide thickness variations, which are not considered in the existing works. The presented dynamic power analysis method explicitly considers the spatial correlations and glitch width variations on a chip. The presented method [60] follows the segment-based statistical power analysis method [30], where dynamic power is estimated based on switching periods instead of switching events to accommodate the glitch width variations. To consider the spatial correlation of the channel length, we set up a set of uncorrelated variables over virtual grids to represent the original physical random variables via fitting. In this way, the $O(n^2)$ time complexity for computing the variances is reduced to linear-time complexity ($n$ is the number of gates in the circuit). The algorithm works for both strong and weak correlations. Furthermore, a LUT is created to cache the statistical information for each type of gate to avoid running SPICE repeatedly. The presented method has no restrictions on the models of the statistical distributions of dynamic power. Numerical examples show that the presented method achieves a 300× speedup over the recently proposed method [30] and is many orders of magnitude faster than the MC method.
2 Prior Works

2.1 Existing Relevant Works

Many works on dynamic power analysis have been proposed in the past. MC-based simulation was proposed in [10], where the circuit is simulated for a large number of input vectors to gather statistics for the average power. Later, probabilistic methods for power estimation were proposed and widely used [29, 48, 116, 117, 183] because statistical estimates can be obtained without time-consuming exhaustive simulation. In [117], the concept of probability waveforms is proposed to estimate the mean and variance of the current drawn by each circuit node. In [116], the notion of transition density is introduced, and transition densities are propagated through combinational logic modules without regard to their structure. However, the author did not consider internal signal correlations; thus, the algorithm is only applicable to combinational circuits. Ghosh et al. [48] extended the transition density theory to sequential circuits via symbolic simulation to calculate the correlations between internal lines due to reconvergence. However, the performance of this algorithm is restricted by its memory space complexity. In [29, 183], the authors used tagged probabilistic simulation (TPS) to model the set of all possible events at the output of each circuit node, which is more efficient than [48] due to its effectiveness in computing the signal correlations. The work in [48] is based on a zero-delay model, and the works in [10, 116, 183] are based on a real-delay model. However, all of them assume a fixed delay model, which is no longer true under process variation. At the same time, all of the previous works consider only full-swing transitions, and partial-swing effects are not well accounted for.
Recently, several approaches have been proposed for fast statistical dynamic power estimation [4, 18, 30, 64, 66, 138]. Alexander et al. [4] proposed to consider the delay variations and glitches for estimating dynamic power. With efficient simulation of input vectors, this algorithm has linear-time complexity, but the variation model is quite simple, as only minimum and maximum bounds for the delay are obtained, and partial swings are not considered. Pilli et al. [138] presented another approach, which divides the clock cycle into a number of time slots and computes the transition density for each slot, but only the mean value of the dynamic power can be estimated. In [66], the authors used supergates and timed Boolean functions to filter glitches and consider signal correlations due to reconvergent fan-outs, but failed to consider correlations that include placement information. Chou et al. [18] used a probabilistic delay model based on an MC simulation technique for dynamic power estimation but also lacked consideration of placement information. Harish et al. [64] used a hybrid power model based on MC analysis; the method is only applied to a small two-stage two-input NAND gate, and for large circuits, Monte Carlo simulation can be very time consuming.

Fig. 6.2 A transition waveform example {E1, E2, ..., Em} for a node. Reprinted with permission from [60] © 2010 IEEE

2.2 Segment-Based Power Estimation Method

Dinh et al. [30] recently proposed a method not based on the fixed delay gate model
to consider the partial-swing effect as well as the effect of process variation.
To accurately estimate the dynamic power in the presence of process variation,
the work in [30] introduces the transition waveform concept, which is similar to the
probability waveform [117] or tagged waveform [29] concepts except that variance
of the transition time is introduced. Specifically, a transition waveform consists of a set
of transition events, each of which is a triplet (p, t, δt), where p is the probability for
the transition to occur, t is the mean time of the transition, and δt is the standard
deviation of the transition time. Figure 6.2 shows an example of a transition waveform
for a node.
The triplets are then propagated from the primary inputs to the primary outputs,
and they are computed for every node. In addition to propagating the switching
probabilities like traditional methods, this method also propagates the variances
along the signal paths, which is done in a straightforward way based on the second-
order moment matching. The glitch filtering is also performed to ensure accuracy
and reduce the number of switches during the propagation.
Unlike the traditional power estimation methods in [29, 117], which count the
transition times (or their probabilities), i.e., the edges in the transition waveform, to
estimate the dynamic power, the work in [30] proposed to count the transition
segments (durations), which are pairs of transition events, to take into account
the impacts of the different glitch widths on the dynamic power consumption. For
n transition events in a transition waveform, the number of segments is $C_n^2 = n(n-1)/2$,
which increases the complexity of the computation compared to the edge-based method.
Another implication is that the traditional edge-based power consumption
formula (6.1) can no longer be used. As a result, a LUT is built from the
SPICE simulation results for different glitch widths. The total dynamic power of a
gate is then the probability-weighted average dynamic power over all the switching
segments, which is then summed up to compute the total chip dynamic power.
However, this method does not consider spatial correlation, which can lead to
significant errors and is the main issue to be addressed in this chapter.

3 The Presented New Statistical Dynamic Power Estimation Method

3.1 Flow of the Presented Analysis Method

In this section, we present the new full-chip statistical dynamic power analysis
method. The presented approach follows the segment-based power estimation
method [30]. The presented algorithm propagates the triplet switching events from
primary input to the output. Then it computes the statistical dynamic power at each
node based on orthogonal polynomial chaos and virtual grid-based variables for
channel length to deal with spatial correlation discussed in Sect. 3 of Chaps. 3 and
2 of Chap. 5.
We first present the overall flow of the presented method in Fig. 6.3 and then
highlight the major computing steps later.
The dynamic power for one gate (under glitch width $W_g$ with variation and fixed
load capacitance $C_l$) can be presented by a Hermite polynomial expansion as
$$P_{\mathrm{dyn},W_g,C_l}(\xi_{g,j}) = \sum_{q=0}^{Q} P_{\mathrm{dyn},q,j}\, H_q(\xi_{g,j}). \qquad (6.2)$$

Fig. 6.3 The flow of the presented algorithm



Fig. 6.4 The flow of building the sub LUT

$P_{\mathrm{dyn},q,j}$ is then computed by the numerical Smolyak quadrature method. In this
chapter, we use second-order Hermite polynomials for statistical dynamic power
analysis. The coefficient of the $q$th Hermite polynomial at the $j$th gate, $P_{\mathrm{dyn},q,j}$, can be
computed as follows:
$$P_{\mathrm{dyn},q,j} = \sum_{l} P_{\mathrm{dyn}}(\xi_l)\, H_q(\xi_l)\, w_l \,/\, \langle H_q^2(\xi_{g,j})\rangle, \qquad (6.3)$$
where $\xi_l$ is the $l$th Smolyak quadrature sample and $w_l$ is its weight. From the dynamic power LUT
$P_{\mathrm{dyn}} = f(L, T_{ox}, W_g, C_l)$, we can interpolate $P_{\mathrm{dyn}}(\xi_l)$, which is the dynamic power at
every Smolyak sampling point.
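As an illustration of the projection in (6.3), the following minimal sketch computes second-order Hermite PC coefficients for a single Gaussian variable, using a plain 1-D Gauss-Hermite rule in place of the multidimensional Smolyak rule; the function gate_power() is a hypothetical stand-in for the LUT interpolation of the SPICE-characterized gate power.

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def gate_power(xi, p_nom=1e-6, sens=0.15):
    # hypothetical placeholder for the LUT interpolation P_dyn(xi_l)
    return p_nom * np.exp(sens * xi)

def hermite_pc_coeffs(order=2, n_quad=8):
    x, w = hermegauss(n_quad)            # probabilists' Gauss-Hermite rule
    w = w / np.sqrt(2.0 * np.pi)         # weights now sum to 1 (standard normal)
    coeffs = []
    for q in range(order + 1):
        c = np.zeros(q + 1); c[q] = 1.0  # coefficient vector selecting He_q
        Hq = hermeval(x, c)
        num = np.sum(gate_power(x) * Hq * w)   # quadrature for <P_dyn * H_q>
        den = np.sum(Hq * Hq * w)              # quadrature for <H_q^2> (= q!)
        coeffs.append(num / den)
    return np.array(coeffs)

print(hermite_pc_coeffs())               # [P_dyn,0  P_dyn,1  P_dyn,2]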

3.2 Acceleration by Building the Look-Up Table

Since we follow the segment-based power estimation method, we have to characterize
the powers from SPICE simulation with different sets of parameters. In the look-up table,
the power of a gate is a function of L and Tox as well as the glitch width Wg and the load
capacitance Cl: $P_{\mathrm{dyn}} = f(L, T_{ox}, W_g, C_l)$. We then perform
SPICE simulation on different sets of those four parameters to get the accurate data
and build the LUT.
On the other hand, we observe that the coefficients of the Hermite polynomials for the
dynamic power of one gate in (6.2) and (6.3) are only functions of the type of the
gate, high and low (defined in Sect. 2 of Chap. 5), and Wg and Cl. Therefore, another
sub LUT can be used to store the coefficients of the Hermite polynomials for each kind
of gate instead of computing the coefficients for each individual gate. The time complexity
thus reduces from the number of gates, O(n), to the number of grids, O(N). Figure 6.4
shows the flow of the sub LUT construction.

3.3 Statistical Gate Power with Glitch Width Variation

To compute the statistical gate power expression considering the glitch width
variations, we need to compute the probability of each switching segment, assuming
that they follow a normal distribution:
$$\Pr(w = w_i) = \frac{1}{\sigma_w\sqrt{2\pi}}\exp\left(-\frac{(w_i - \mu_w)^2}{2\sigma_w^2}\right). \qquad (6.4)$$

The Hermite polynomial coefficients of (6.2) under glitch width $w_i$ and load
capacitance $C_l$ can be interpolated from the sub LUT. For a gate with index $j$ and the
transition waveform $(p_1, t_1, \delta_{t_1}), (p_2, t_2, \delta_{t_2}), \ldots, (p_M, t_M, \delta_{t_M})$, there are $M(M-1)/2$
segments. The resulting statistical power is the probabilistic addition of the power (in its
Hermite polynomial expression) from each segment:
$$P_{\mathrm{dyn},C_l}(\xi_{g,k}) = \sum_{i=1}^{M-1}\sum_{j=i+1}^{M} Pr(i,j)\cdot P_{\mathrm{dyn},C_l}(\xi_{g,k}, i, j), \qquad (6.5)$$
in which $P_{\mathrm{dyn},C_l}(\xi_{g,k}, i, j)$ is the dynamic power of gate $k$ caused by the switching
segment between transitions $E_i$ and $E_j$. $Pr(i,j)$ is the probability that the
switching segment $(E_i, E_j)$ occurs, i.e., that there are transitions at both $E_i$ and $E_j$
and no transitions between $E_i$ and $E_j$:
$$Pr(i,j) = p_i \cdot p_j \cdot \prod_{k=i+1}^{j-1}(1 - p_k). \qquad (6.6)$$
In the following, we write $P_{\mathrm{dyn},C_l}(\xi_{g,k})$ as $P_{\mathrm{dyn}}(\xi_{g,k})$ without confusion.
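A minimal sketch of (6.5)-(6.6) is given below: it enumerates all M(M-1)/2 segments of a transition waveform, computes each segment probability Pr(i,j), and forms the probability-weighted sum. segment_power() is a hypothetical stand-in for the per-segment power obtained from the sub LUT (here a simple scalar model of the glitch width), so the sketch returns a nominal value rather than a full Hermite expansion.

import numpy as np

def seg_prob(p, i, j):
    # Pr(i, j): transitions at both E_i and E_j, none strictly in between, cf. (6.6)
    quiet = np.prod([1.0 - p[k] for k in range(i + 1, j)])
    return p[i] * p[j] * quiet

def segment_power(wave, i, j, energy_per_ps=1e-8):
    # hypothetical placeholder for the sub-LUT lookup of the segment power
    width = wave[j][1] - wave[i][1]          # mean glitch width (ps)
    return energy_per_ps * width

def gate_dynamic_power(wave):
    # probability-weighted sum over all switching segments, cf. (6.5)
    p = [e[0] for e in wave]
    M = len(wave)
    return sum(seg_prob(p, i, j) * segment_power(wave, i, j)
               for i in range(M - 1) for j in range(i + 1, M))

# transition waveform: (probability, mean time in ps, std of the transition time)
wave = [(0.4, 0.0, 5.0), (0.3, 60.0, 6.0), (0.2, 130.0, 7.0)]
print(gate_dynamic_power(wave))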

3.4 Computation of Full-Chip Dynamic Power

The dynamic power of each gate is calculated using (6.5). To compute the full-chip
dynamic power, we also need to transfer the local coefficients to the corresponding
global positions first. Then we can proceed to compute the dynamic power of the
whole chip as follows:
$$P_{\mathrm{dyn}}^{\mathrm{total}}(\boldsymbol{\xi}) = \sum_{j=1}^{n} P_{\mathrm{dyn}}(\xi_{g,j}). \qquad (6.7)$$

The summation is done for each coefficient of the global Hermite polynomials to
obtain the analytic expression of the final dynamic power in terms of $\boldsymbol{\xi}$. We can
then obtain the mean value, variance, PDF, and CDF of the full-chip dynamic power

very easily. For instance, the mean value and variance of the full-chip dynamic
power are
$$\mu_{\mathrm{total}} = P_{\mathrm{dyn},\,0th}, \qquad (6.8)$$
$$\sigma^2_{\mathrm{total}} = \sum P^2_{\mathrm{dyn},\,1st} + 2\sum P^2_{\mathrm{dyn},\,2nd,\,type1} + \sum P^2_{\mathrm{dyn},\,2nd,\,type2}, \qquad (6.9)$$
where $P_{\mathrm{dyn},\,ith}$ is the power coefficient of the $i$th Hermite polynomial of second order
defined in (4.15).
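The extraction in (6.8)-(6.9) amounts to reading off the zeroth-order coefficient and summing squared coefficients weighted by the norm of each basis polynomial (1 for $\xi_i$, 2 for $\xi_i^2-1$, and 1 for $\xi_i\xi_j$). A minimal sketch with hypothetical coefficient values is shown below.

import numpy as np

def mean_var_from_pc(c0, c_first, c_square, c_cross):
    # c_first:  coefficients of xi_i          (<H^2> = 1)
    # c_square: coefficients of xi_i^2 - 1    (<H^2> = 2)
    # c_cross:  coefficients of xi_i * xi_j   (<H^2> = 1)
    mean = c0
    var = (np.sum(np.square(c_first))
           + 2.0 * np.sum(np.square(c_square))
           + np.sum(np.square(c_cross)))
    return mean, var

# e.g. three grid variables (coefficient values are hypothetical, in watts)
m, v = mean_var_from_pc(1.14e-3,
                        [4e-5, 3e-5, 2e-5],
                        [5e-6, 4e-6, 3e-6],
                        [2e-6, 1e-6, 1e-6])
print(m, np.sqrt(v))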

4 Numerical Examples

The presented method and the segment-based analysis [30] have been implemented
in Matlab V7.8. The initial results of this chapter were published in [60].
The presented new method was tested on circuits in the ISCAS’89 benchmark
set. The circuits were synthesized with Nangate Open Cell Library under 45 nm
technology, and the placement is from UCLA/Umich Capo [145]. For comparison
purposes, we performed MC simulations (10,000 runs) considering spatial correla-
tion, the method in [30], and the presented method on the benchmark circuits. In our
MC implementation, similar to [30], we do not run SPICE on the original circuits,
as it is too time consuming on an ordinary computer. Instead, we compute the
results via interpolation from the characterization data computed from SPICE runs.
The 3σ range of L and Tox is set as 20%, of which inter-die variations constitute 20%
and intra-die variations 80%. L and Tox are modeled as Gaussian random variables. L
is modeled as sum of spatially correlated sources of variations based on (3.12). Tox
is modeled as an independent source of spatial variation. The same framework can
be easily extended to include other parameters of variations.
The characterization data for each type of gate in SCL are collected using
HSPICE simulation. For each type of gate, we perform repeated simulation on
sampling points in the 3σ range of L, Tox, and the input glitch width Wg for several
different load capacitances to obtain the gate dynamic powers and gate delays. The
table of characterization data will be used to interpolate the value of dynamic power
for each type of gate with different process parameters. We use 21 sample points for
glitch width, from 50 ps to 150 ps.
In transition waveform computation, the gate delays are obtained through the
table of characterization data, and the input signal probabilities are 0.5, with
switching probabilities of 0.75. The test cases are given in Table 6.1 (all length units
in μm). In the first column, s and w stand for strong and weak spatial correlations,
respectively.
The comparison results of the mean values and standard deviations of the full-chip
dynamic power are shown in Table 6.2.

Table 6.1 Summary of benchmark circuits

Test case    Gate #   Grid #   Area
s1196 (s)    529      27       95 × 90
s1196 (w)    529      294      95 × 90
s5378 (s)    2779     93       209.5 × 198
s5378 (w)    2779     1300     209.5 × 198
s9234 (s)    5597     161      278.5 × 270
s9234 (w)    5597     2358     278.5 × 270

Table 6.2 Statistical dynamic power analysis accuracy comparison against Monte Carlo

                       Mean value (mW)              Errors (%)
Test case    Grid #    MC Co    [30]     New        [30]    New
s1196 (s)    27        1.14     1.19     1.14       3.82    0.49
s1196 (w)    294       1.14     1.19     1.14       3.98    0.41
s5378 (s)    93        6.09     6.24     5.98       2.46    1.85
s5378 (w)    1300      6.09     6.23     5.98       2.29    1.85
s9234 (s)    161       12.8     13.2     12.5       2.94    2.31
s9234 (w)    2358      12.8     13.1     12.5       2.78    2.14

                       Standard deviation (mW)      Errors (%)
Test case    Grid #    MC Co    [30]      New       [30]    New
s1196 (s)    27        0.0912   0.00394   0.0845    95.68   7.33
s1196 (w)    294       0.0671   0.00395   0.0645    94.11   3.94
s5378 (s)    93        0.470    0.00877   0.435     98.13   7.61
s5378 (w)    1300      0.436    0.00891   0.412     97.96   5.68
s9234 (s)    161       0.964    0.0185    0.882     98.08   8.52
s9234 (w)    2358      0.894    0.0191    0.839     97.87   6.14

In Table 6.2, MC Co represents the Monte Carlo method considering spatial correlation, and New
is the presented method. The method in [30] cannot consider spatial correlation, as it
assumes that the powers of the gates are independent Gaussian random variables. In our
implementation of [30], we assume the same variations for Leff and Tox but without spatial
correlations. The average errors of the mean and standard deviation (σ) values of the New
technique compared to MC Co are 1.49% and 6.54%, respectively, while for the method
in [30], the average errors of the mean value and σ are 3.04% and 96.97%, respectively.
As a result, not considering spatial correlations can lead to significant errors.
Furthermore, from the comparison between the mean and standard deviation of MC Co,
the average std/mean ratio is 7.21%, which means that spatial correlation in the process
parameters has a significant impact on the distribution of dynamic power. The results in
Table 6.2 also show that our method can handle both strong and weak spatial correlations by
adjusting the grid size.
Table 6.3 compares the CPU times of the three methods, which shows that the New
method is much faster than both the method in [30] and MC simulation. On average, the
presented technique has about a 377x speedup over [30] and a 5,123x speedup over the
MC method. In [30], the dynamic power of each gate needs to be interpolated from
the LUT due to the different L, Tox, and glitch width variations, so the complexity
is a linear function of the number of gates, O(n); in the New algorithm, however, only
the coefficients of the Hermite polynomials for each type of gate need to be computed,
and the overall complexity is a linear function of the number of grids, O(N).

Table 6.3 CPU time comparison

                  CPU time (s)               Speedup over
Test case     MC Co     [30]     New       MC Co    [30]
s1196 (s)     1261      88       0.30      4242     296
s1196 (w)     1225      92       0.33      3743     281
s5378 (s)     7037      522      1.19      5927     440
s5378 (w)     6859      517      1.41      4874     367
s9234 (s)     14805     1062     2.11      7026     504
s9234 (w)     13978     1058     2.84      4927     373

5 Summary

In this chapter, we have presented a new statistical dynamic power estimation


method considering the spatial correlation in the presence of process variation.
The presented method considers the variational impacts of channel length on gate
dynamic powers. To consider the spatial correlation, it uses a spatial correlation
model where a new set of uncorrelated variables are defined over virtual grids to
represent the original physical random variables by least-square fitting. To compute
the statistical dynamic power of a gate on the new set of variables, the new method
applies the flexible OPC-based method, which can be applied to any gate models.
We adopted the segment-based statistical power method to consider the impacts
of glitch width variations on dynamic power. The total full-chip dynamic power
expression is then computed by summing up the resulting orthogonal polynomials
(their coefficients) on the new set of variables over all gates. Numerical results
on the ISCAS'89 benchmarks with 45 nm technology show that the presented method
has about a 300x speedup over the recently proposed segment-based statistical power
estimation method [30] and is many orders of magnitude faster than the MC method.
Chapter 7
Statistical Total Power Estimation Techniques

1 Introduction

For digital CMOS circuits, the total power consumption is given by the following
formula:
$$P_{\mathrm{total}} = P_{\mathrm{dyn}} + P_{\mathrm{short}} + P_{\mathrm{leakage}}, \qquad (7.1)$$
in which Pdyn , Pshort , and Pleakage represent dynamic power, short-circuit power,
and leakage power, respectively. Most of the previous works on power estimation
either focus on dynamic power estimation [10, 28–30, 64, 116] or leakage power
estimation [13, 95, 158, 200]. As technology scales down to nanometer range, the
process-induced variability has huge impacts on the circuit performance [120].
Furthermore, many variational parameters in the practical chips in nanometer range
are spatially correlated, which makes the computations even more difficult [195],
and simple assumption of independence for involved random variables can lead to
significant errors.
Early research on power analysis mainly focused on dynamic power
analysis [10, 28, 29, 116]; the solutions range from the transition density-based
method [116] and the tagged probabilistic method [29] to the practical MC-based
method [10, 28, 29]. Later on, designers realized that leakage power is becoming
more and more significant and is very sensitive to the process variations. As a
result, full-chip leakage power estimation considering process variations under
spatial correlation has been intensively studied in the past [13, 95, 158, 200]; the
methods can be grid based [13, 158], projection based [95], or based on a simplified
gate leakage model [200].
Although total power can be computed by simply adding the dynamic power and
leakage power (plus short-circuit power), practically, dynamic power and leakage
power are correlated. For instance, leakage power of a gate depends on its input
state, which depends on the primary inputs and timing of the circuits. Using
dominant state or average values is less accurate than the precise circuit-level
simulation under realistic testing input vectors. Under the process variations with


Fig. 7.1 The comparison of circuit total power distribution of circuit c432 in the ISCAS'85 benchmark set: (top) under random input vectors (with 0.5 input signal and transition probabilities) and (bottom) under a fixed input vector with effective channel length spatial correlations. Reprinted with permission from [62] © 2011 IEEE

spatial correlation, the dynamic power and leakage power are more correlated via
process parameters. As a result, traditional separate approaches will not be accurate.
Circuit-level total power estimation based on real testing vectors is more desirable.
Figure 7.1 shows the comparison of the circuit total power distributions of c432
from the ISCAS'85 benchmark. We show two power variations. The first distribution (top)
is obtained under random input vectors. The second is obtained using a fixed input
vector but under process variations with spatial correlation. As can be seen, the
variance induced by process variations is comparable with the variance induced by
random input vectors. As a result, considering process variation impacts on the total
chip power is important for early design solution exploration and post-layout design
sign-off validation.
Several works have been proposed to estimate the dynamic power considering
process variation. Harish et al. [64] used a hybrid power model based on MC analysis,
but the method is only applied to a small two-stage two-input NAND gate. The work
in [4] used a variation delay model to obtain minimum and maximum delay bounds
in order to estimate the number of glitches and dynamic power. The work in [30]
introduced a new method based on transition waveform concept, where transition
waveform is propagated through the circuit and the effect of partial swing could be

considered. However, none of these works consider the process-induced variations


with spatial correlation which can be significant (as shown in Fig. 7.1).
In this chapter, we present an efficient statistical chip-level total power estimation
(STEP) method [62] considering process variations under spatial correlation in
which both the dynamic power and the leakage power are included. To the best
knowledge of the authors, it is the first work toward statistical total power analysis.
The presented method uses the commercial Fast-SPICE tool (UltraSim) to obtain
the total chip power. To consider the process variations with spatial correlation, we
first apply the PFA method to transform the correlated variables into uncorrelated
ones and meanwhile reduce the number of resulting random variables. Afterward,
Hermite polynomials and sparse grid techniques are used to estimate the total power
distribution in a sampling fashion. Numerical examples show that the proposed method
is 78x faster than the MC method under a fixed input vector and 26x faster than
the MC method considering both random input vectors and process variations with
spatial correlation.

2 Review of the Monte Carlo-Based Power Estimation Method

In general, dynamic power Pdyn is expressed as in (6.1). Many previous works


about dynamic power estimation are based on (6.1); they can be MC-based
[10, 28, 29] or probabilistic based [29, 116]. The MC-based method is considered
more accurate than the probabilistic-based methods while not losing
much efficiency [10]. In the MC-based method, the switching activity Si in (6.1)
can be modeled as
$$S_i = \frac{n_i(T)}{T}, \qquad (7.2)$$
in which $n_i(T)$ is the number of transitions of node $i$ in the time interval $(-T/2, T/2]$.
The mean power $P_T$ is defined as
$$P_T = E\left[P_{\mathrm{dyn}}\right]. \qquad (7.3)$$

The key part in MC simulation is the stopping criterion. Suppose we need to


perform $N$ different simulations of the circuit, each of length $T$, and that the average and
standard deviation of the $N$ different $P_{\mathrm{dyn}}$ values are $m_{\mathrm{dyn}}$ and $s_{\mathrm{dyn}}$, respectively.
Therefore, we have
$$\lim_{N\to\infty} P\left\{\frac{P_T - m_{\mathrm{dyn}}}{s_{\mathrm{dyn}}/\sqrt{N}} \le P_{\mathrm{dyn}}\right\} = \Phi\left(P_{\mathrm{dyn}}\right), \qquad (7.4)$$

in which $P$ is the probability and $\Phi(P_{\mathrm{dyn}})$ is the CDF of the standard normal
distribution. Therefore, given the confidence level $(1-\alpha)$, it follows that
$$P\left\{\Phi_{\alpha/2} < \frac{P_T - m_{\mathrm{dyn}}}{s_{\mathrm{dyn}}/\sqrt{N}} \le \Phi_{1-\alpha/2}\right\} = 1 - \alpha. \qquad (7.5)$$
As $\Phi_{\alpha/2} = -\Phi_{1-\alpha/2}$, given a specified error tolerance $\epsilon$, (7.5) can be recast to
$$\frac{\left|P_T - m_{\mathrm{dyn}}\right|}{m_{\mathrm{dyn}}} \le \frac{\Phi_{1-\alpha/2}\, s_{\mathrm{dyn}}}{m_{\mathrm{dyn}}\sqrt{N}} \le \epsilon. \qquad (7.6)$$
Equation (7.6) can be viewed as the stopping criterion, which is met when $N$, $m_{\mathrm{dyn}}$, and $s_{\mathrm{dyn}}$
satisfy it.
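A minimal sketch of this stopping criterion is shown below: samples are accumulated until $\Phi_{1-\alpha/2}\, s_{\mathrm{dyn}}/(m_{\mathrm{dyn}}\sqrt{N})$ drops below the error tolerance. The function simulate_power() is a hypothetical stand-in for one gate-level simulation of length T.

import numpy as np

def simulate_power(rng):
    # hypothetical placeholder for one logic simulation of length T
    return 1e-3 * (1.0 + 0.1 * rng.standard_normal())

def mc_power(eps=0.01, z=1.96, n_min=30, seed=0):
    # z = Phi_{1-alpha/2}; 1.96 corresponds to a 95% confidence level
    rng = np.random.default_rng(seed)
    samples = []
    while True:
        samples.append(simulate_power(rng))
        n = len(samples)
        if n < n_min:
            continue
        m, s = np.mean(samples), np.std(samples, ddof=1)
        if z * s / (m * np.sqrt(n)) <= eps:      # stopping criterion (7.6)
            return m, s, n

print(mc_power())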
Afterward, the work in [28, 29] further improves the efficiency of MC-based
method. In [29], the author transforms the power estimation problem to a survey
sampling problem and applied stratified random sampling to improve the efficiency
of MC sampling. In [28], the author proposed two new sampling techniques,
module-based and cluster-based, which can adapt stratification to further improve
the efficiency of the Monte Carlo-based techniques. However, all of these works
are based on gate-level logic simulation, as they only consider dynamic power. For
total power estimation and for estimating the impacts of process variations, one needs
transistor-level simulation. As a result, improving the efficiency of the MC method
becomes crucial and will be addressed in this chapter.

3 The Statistical Total Power Estimation Method

In this section, we present the new chip-level statistical method for full-chip total power
estimation, called STEP. The method can consider both fixed input vectors
and random input vectors for power estimation. Power distribution considering
process variations under fixed input vectors is important because it can reveal the
power distribution for the maximum power, the minimum power, or the power due
to user-specified input vectors. This technique can be further applied to estimate the
distribution for maximum power dissipation [188]. Power distribution under random
input vectors is also important, as it can show the total power distribution caused by
random input vectors and process variations with spatial correlation. We first give
the overall flow of the presented method under a fixed input vector in Fig. 7.2 and
then highlight the major computing steps later. The flow of the presented method
considering random input vectors is followed afterward. The spatial correlation
model is the same as Sect. 3 of Chap. 3.

Fig. 7.2 The flow of the presented algorithm under a fixed input vector

3.1 Flow of the Presented Analysis Method Under Fixed Input Vector

The STEP method uses commercial Fast-SPICE tool for accurate total power
simulation. It transforms the correlated variables into uncorrelated ones and re-
duces the number of random variables using the PFA method [57]. Then it
computes the statistical total power based on Hermite polynomials and sparse grid
techniques [45].

3.2 Computing Total Power by Orthogonal Polynomials

Instead of using the MC method, a better approach is to use the spectral stochastic
method, which requires far fewer samples than standard MC for a small number
of variables, as discussed in Sect. 3.3 of Chap. 2.
In our problem, $x(\boldsymbol{\xi})$ will be the total power of the full chip, and $k$ is the number of
reduced variables obtained by performing the PFA method. The full-chip total power can be
presented by an HPC expansion as
$$P_{\mathrm{tot}}(\boldsymbol{\xi}) = \sum_{q=0}^{Q} P_{\mathrm{tot},q}\, H_q(\boldsymbol{\xi}). \qquad (7.7)$$

$P_{\mathrm{tot},q}$ is then computed by the numerical Smolyak quadrature method. In this chapter,
we use second-order Hermite polynomials for statistical total power analysis,
and the number of Smolyak quadrature samples for $k$ random variables is $2k^2 + 3k + 1$. The
coefficient of the $q$th Hermite polynomial, $P_{\mathrm{tot},q}$, can be computed as follows:
$$P_{\mathrm{tot},q} = \sum_{l} P_{\mathrm{tot}}(\boldsymbol{\xi}_l)\, H_q(\boldsymbol{\xi}_l)\, w_l \,/\, \langle H_q^2(\boldsymbol{\xi})\rangle, \qquad (7.8)$$
where $\boldsymbol{\xi}_l$ is the $l$th Smolyak quadrature sample and $w_l$ is its weight. As stated in Sect. 2.2 of Chap. 2, each
quadrature sample can be converted to a sample in terms of the original gate
effective channel length variables via $\boldsymbol{\delta} = L\boldsymbol{\xi}_l$. Thus, $P_{\mathrm{tot}}(\boldsymbol{\xi}_l)$ can be obtained by
running circuit simulation tools such as Fast-SPICE using the specified Leff obtained
from $\boldsymbol{\delta}$ for each gate.
After the coefficients of the analytic expression of the total power (7.7) are
obtained, we can then get the mean value, variance, PDF, and CDF of the full-chip
total power very easily. For instance, the mean value and variance of the full-chip
total power are
$$\mu_{\mathrm{tot}} = P_{\mathrm{tot},\,0th}, \qquad (7.9)$$
$$\sigma^2_{\mathrm{tot}} = \sum P^2_{\mathrm{tot},\,1st} + 2\sum P^2_{\mathrm{tot},\,2nd,\,type1} + \sum P^2_{\mathrm{tot},\,2nd,\,type2}, \qquad (7.10)$$
where $P_{\mathrm{tot},\,ith}$ is the power coefficient of the $i$th Hermite polynomial of second order
defined in (4.15).

3.3 Flow of the Presented Analysis Method Under Random Input Vectors

To consider more input vectors or the random input vectors used in traditional
dynamic power analysis, one simple way is to treat the input vector as one more
variational parameter in our statistical analysis framework. This strategy can easily be
fitted into the simple MC-based method [10], as we just add one dimension to the
variable space. But for the spectral stochastic method, it is difficult to add this variable
to the existing space.
In probability theory, the PDF of a function of several random variables can
be calculated from the conditional PDF for a single random variable. Let $P_{\mathrm{total}} =
g(U_{in}, L_{\mathrm{eff}})$, in which $U_{in}$ is the variable of the random input vectors and $L_{\mathrm{eff}}$ is the
variable of the gate effective channel lengths. The PDF of the total power $P_{\mathrm{total}}$ can be
calculated by
$$f_{P_{\mathrm{total}}}(p) = \int_{-\infty}^{\infty} f_{L_{\mathrm{eff}}}(l\,|\,u)\, f_{U_{in}}(u)\, du, \qquad (7.11)$$

Fig. 7.3 The selected power points a, b, and c from the power distribution under random input vectors. Reprinted with permission from [62] © 2011 IEEE

in which the PDF under random input vectors, $f_{U_{in}}(u)$, is obtained by the MC-based
method [10], and the conditional PDF $f_{L_{\mathrm{eff}}}(l\,|\,U_{in}=u)$ under a fixed input $u$ can
be obtained or interpolated from samples calculated by the fixed-input algorithm in
Fig. 7.2. Note that $u$ can be viewed as the power of the chip under input $u$.
We use the example in Fig. 7.3 to illustrate the presented method. In this figure,
we first compute the power distribution (solid line) with random input vectors only.
Then we select three input power points, $a, b, c$ (with three corresponding input
vectors). At each of the input power points, we perform statistical power analysis
with process variations under the fixed power input (using the corresponding input
vector). After this, we interpolate the power distributions for the other power points for
the final integration.
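The sketch below shows one way the mixture in (7.11) can be evaluated numerically under the assumptions of this section: the conditional PDF under a fixed input is taken as Gaussian, its (mean, std) is interpolated between the three anchor points a, b, and c, and the outer integral over the input-vector power is replaced by an average over the MC samples. All numbers are hypothetical.

import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def total_power_pdf(p_grid, u_samples, anchor_u, anchor_mu, anchor_std):
    # interpolate the conditional (mean, std) for each MC power sample u
    mu = np.interp(u_samples, anchor_u, anchor_mu)
    sd = np.interp(u_samples, anchor_u, anchor_std)
    # Monte Carlo estimate of (7.11): average of f_Leff(p | u) over f_Uin
    return np.mean(gauss_pdf(p_grid[:, None], mu[None, :], sd[None, :]), axis=1)

rng = np.random.default_rng(1)
u = 600e-6 + 50e-6 * rng.standard_normal(2000)   # MC powers under random inputs
p = np.linspace(4e-4, 8e-4, 200)
pdf = total_power_pdf(p, u, [500e-6, 600e-6, 700e-6],
                      [505e-6, 607e-6, 706e-6], [15e-6, 18e-6, 21e-6])
print(np.trapz(pdf, p))                          # should be close to 1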
The flow of the presented analysis method under random input vectors is shown
in Fig. 7.4. The STEP algorithm computes the total power under random input
vectors using the MC-based method [10].

4 Numerical Examples

The presented method has been implemented in Matlab V7.8, and Cadence Ultrasim
7.0 was used for Fast-SPICE simulations. All the experimental results have been
carried out in a Linux system with quad Intel Xeon CPUs with 3 GHz and 16 GB
memory. The initial results of this chapter were published in [62].
The STEP method was tested on circuits in the ISCAS’85 benchmark set. The
circuits were synthesized with Nangate open cell library under 45 nm technology,
and the placement is obtained from UCLA/Umich Capo [145]. The test cases are
given in Table 7.1 (all length units in μm).

Fig. 7.4 The flow of the presented algorithm with random input vectors and process variations

Table 7.1 Summary of benchmark circuits

Circuit    Gate #   Input #   Output #   Area
c432       242      36        7          55 × 48
c880       383      60        16         85 × 84
c1355      562      41        32         84 × 78
c1908      972      33        25         102 × 102
c3540      1705     50        22         141 × 144

Effective channel length Leff is modeled as a sum of spatially correlated sources of
variation based on (3.12). The nominal value of Leff is 50 nm, and the 3σ range is
set as 20%. The same framework can be easily extended to include other parameters
of variations.
Firstly, we use the MC-based method [10] to obtain the mean and standard
deviation (std) of each circuit sample under random input vectors. The input signal
and transition probabilities are 0.5, with a clock cycle of 180 ps. The simulation
time for each sample circuit is 10 clock cycles, and the error tolerance ε is 0.01.
Secondly, we observe the total power distribution of each sample circuit under
a fixed input vector. For each sample circuit, one input vector is selected, and then
we run the MC simulations (10,000 runs) under process variations with spatial
correlation as well as our presented STEP method. The results are shown in
Table 7.2, in which MC Co and STEP denote the MC method considering process
variations with spatial correlation and the presented method, respectively. The
average errors of the mean and standard deviation of the STEP method are 2.90% and
6.00%, respectively. Figure 7.5 shows the total power distribution (PDF and CDF) of
circuit c880 under a fixed input. Table 7.3 gives the parameter values of the correlation
length δ, the reduced number of variables k, and the Fast-SPICE sample counts of
the two methods.

Table 7.2 Total power distribution under fixed input vector

           Mean (μW)             Err      Std (μW)            Err
Circuit    MC Co      STEP       (%)      MC Co     STEP      (%)
c432       267.6      261.7      2.23     10.22     9.54      6.78
c880       606.9      610.5      0.59     19.88     18.09     9.02
c1355      785.6      799.4      1.76     40.51     43.25     6.77
c1908      1404.9     1294.4     7.86     76.15     79.73     4.71
c3540      2824.6     2766.8     2.05     268.5     261.2     2.73

Fig. 7.5 The comparison of total power distribution PDF and CDF between the STEP method and the MC method for circuit c880 under a fixed input vector. Reprinted with permission from [62] © 2011 IEEE

Table 7.3 Sampling number comparison under fixed input vector

                          Sample count           Speedup
Circuit    δ       k      MC Co     STEP         over
c432       50      6      10,000    91           110
c880       50      9      10,000    190          53
c1355      50      9      10,000    190          53
c1908      100     6      10,000    91           110
c3540      100     8      10,000    153          65

Sampling time dominates the total simulation time for both the MC Co and the STEP
methods, and the STEP method has a 78x speedup over the MC Co method on average.
More speedup can be gained for larger cases.
Thirdly, we compare the STEP method with the MC method under both random
input vectors and process variations with spatial correlation. We select three power

Table 7.4 Total power distribution comparison under random input vector and spatial correlation

           Mean (μW)                           Errors (%)
Circuit    MC Co      MC nCo     STEP          MC nCo    STEP
c432       299.9      299.9      312.7         0.01      4.26
c880       609.8      604.5      604.4         0.88      0.89
c1355      802.6      777.1      778.3         3.18      3.04
c1908      1375.1     1361.6     1361.3        0.98      0.99
c3540      2775.8     2821.7     2822.2        1.65      1.67

           Standard deviation (μW)             Errors (%)
Circuit    MC Co      MC nCo     STEP          MC nCo    STEP
c432       45.3       40.4       44.6          10.9      1.52
c880       57.1       51.5       56.5          9.76      0.95
c1355      56.3       30.2       60.5          46.4      7.45
c1908      115.5      79.4       128.5         31.3      11.3
c3540      309.3      180.4      280.8         41.7      9.21

points from the total power distribution obtained by the MC-based method [10]
and get the corresponding input vectors. We perform the STEP method under
these three input vectors and obtain the corresponding means and standard deviations,
respectively. The (mean, std) samples for other power points with different
power values can be interpolated from the three samples.
Equation (7.11) is used to calculate the PDF of total power distribution under
both random input vectors and process variations with spatial correlation. The
results are shown in Table 7.4; MC Co, MC nCo, and STEP represent the MC
method considering process variations with spatial correlation, the MC method
without considering process variations with spatial correlation, and the presented
method, respectively. The average errors of the mean and the standard deviation of
our method compared with MC Co are 2.17% and 6.09%, respectively, while the
average errors of the mean and the standard deviation of MC nCo compared with MC
Co are 1.34% and 28.01%, respectively. The error in the standard deviation increases
for the larger test cases.
Obviously, we can see that the MC method considering only random input
vectors fails to capture the true distribution when both input vectors and process
variations are considered. The parameter values of δ and k are the same as in
Table 7.3. The difference is that we need to run STEP three times, and the
total sample numbers increase correspondingly. However, the STEP method
still has a 26x speedup over the MC method on average and remains accurate.
Figure 7.6 shows the power distribution comparison (PDF and CDF) of the STEP
method and the MC method under both random input vectors and process variations
with spatial correlation for circuit c880. We observe that the distribution of the total
power under a fixed input vector or under random input vectors is similar to a normal
distribution, as shown in Figs. 7.5 and 7.6; such a distribution justifies the use of
Hermite PC to represent the total power distributions.

Fig. 7.6 The comparison of total power distribution PDF and CDF between the STEP method and the Monte Carlo method for circuit c880 under random input vectors. Reprinted with permission from [62] © 2011 IEEE

5 Summary

In this chapter, we have presented an efficient statistical total chip power estimation
method considering process variations with spatial correlation. The new method is
based on accurate circuit-level simulation under realistic testing input vectors to
obtain accurate total chip powers. To improve the estimation efficiency, efficient
sampling-based approach has been applied using the OPC-based representation
and random variable transformation and reduction techniques. Numerical examples
show that the presented method is 78x faster than the MC method under a fixed input
vector and 26x faster than the MC method considering both random input vectors
and process variations with spatial correlation.
Part III
Variational On-Chip Power Delivery
Network Analysis
Chapter 8
Statistical Power Grid Analysis Considering
Log-Normal Leakage Current Variations

1 Introduction

As discussed in Part II, process-induced variability has huge impacts on chip leakage
currents, owing to the exponential relationship between subthreshold leakage
current Isub and threshold voltage Vth as shown below [172],
$$I_{sub} = I_{s0}\, e^{\frac{V_{gs}-V_{th}}{n V_T}}\left(1 - e^{-\frac{V_{ds}}{V_T}}\right), \qquad (8.1)$$

where $I_{s0}$ is a constant related to the device characteristics, $V_T$ is the thermal
voltage, and $n$ is a constant. It was shown in [78] that leakage variations at 90 nm
can be 20x. Based on the ITRS [71], the leakage power accounts for more than
60% at 45 nm; this has many consequences for chip design, especially for the design
of the power grid. The grid will develop voltage drops at all nodes, and these drops are
correspondingly significant, with strong within-die components. The voltage drop
is unavoidable and manifests itself as background noise on the grid, which has an
impact on the circuit delay and operation.
Clearly, the leakage current has an exponential dependency on the threshold voltage
$V_{th}$. In the sequel, the leakage current mainly refers to the subthreshold
leakage current. Detailed analysis shows that $I_{sub}$ is also an exponential function
of the effective channel length Leff [142]. Actually, Leff is strongly correlated with
$V_{off}$, as $V_{off}$ variations are typically caused by Leff variations. So if we model $V_{th}$ or Leff
as random variables with Gaussian variations caused by the inter-die or intra-die
process variations, then the leakage currents will have a log-normal distribution,
as shown in [142]. On top of this, those random variables are spatially correlated
within a die, owing to the nature of many physical and chemical manufacturing
processes [120].
On-chip power grid analysis and designs have been intensively studied in the
past due to the increasing impacts of excessive voltage drops as technologies
scale [84, 191, 206]. Owing to the increasing impacts of leakage currents and their


variations on the circuit performances, especially on the on-chip power delivery


networks, a number of research works have been proposed recently to perform the
stochastic analysis of power grid networks under process-induced leakage current
variations. The voltage drop of power grid networks subject to the leakage current
variations was first studied in [39, 40]. This method assumes that the log-normal
distribution of the node voltage drop is caused by the log-normal leakage current
inputs and is based on a localized MC (sampling) method to compute the variance
of the node voltage drop. However, this localized sampling method is limited to
the static DC solution of power grids modeled as resistor-only networks. Therefore,
it can only compute the responses to the standby leakage currents. However, the
dynamic leakage currents become more significant, especially when the sleep
transistors are intensively used nowadays for reducing leakage powers. In [131,169],
impulse responses are used to compute the means and variances of node voltage
responses caused by general current variations. But this method needs to know the
impulse response from all the current sources to all the nodes, which is expensive
to compute for a large network. In [142], the PDF of leakage currents is computed
based on the Gaussian variations of channel lengths.

2 Previous Works

A number of research works have been proposed recently to address the voltage drop
variation issues in on-chip power delivery networks under process variations.
The voltage drop of power grid networks subject to the leakage current variations
was first studied in [39, 40]. This method assumes that the log-normal distribution
of the node voltage drop is caused by log-normal leakage current inputs and is based
on a localized MC (sampling) method to compute the variance of the node voltage
drop. However, this localized sampling method is limited to the static DC solution of
power grids modeled as resistor-only networks. Therefore, it can only compute the
responses to the standby leakage currents. However, the dynamic leakage currents
become more significant, especially when the sleep transistors are intensively used
nowadays for reducing leakage powers. In [131,169], impulse responses are used to
compute the means and variances of node voltage responses due to general current
variations. But this method needs to know the impulse responses from all the current
sources to all the nodes, which is expensive to compute for a large network. This
method also cannot consider the variations of the wires in the power grid networks.
Recently, a number of analysis approaches based on so-called spectral stochastic
analysis method have been proposed for analyzing interconnect and power grid
networks [46, 47, 108, 190]. This method is based on the OPC expansion of random
processes and the Galerkin theory to represent and solve for the stochastic responses
of statistical linear dynamic systems. The spectral stochastic method only needs to
solve for some coefficients of the orthogonal polynomials by using normal transient
simulation of the original circuits. Research work in [190] applied the spectral

stochastic method to compute the variational delay of interconnects. In [46, 47],


the spectral stochastic method has been applied to compute the voltage drop
variations caused by Gaussian-only variations in the power grid wires and input
currents (approximating them as Gaussian variations by using first-order Taylor
expansion). Intra-die variations can be considered in [46]. Recently, the authors
extended the spectral stochastic method by specifically considering the log-normal
leakage variations to solve for the variational voltage drops in on-chip power grid
networks [107, 108]. Spatial correlations were also considered in [109].
In this chapter, we apply the spectral statistical method to deal with leakage
current inputs with log-normal distributions and spatial correlations [108]. We
show how to represent a log-normal distribution in terms of Hermite polynomials,
assuming a Gaussian distribution of the threshold voltage $V_{th}$ in consideration of intra-die
variation. To consider the spatial correlation, we apply orthogonal decomposition
via PCA to map the correlated random variables into independent variables. To
the best knowledge of the authors, the presented method is the first method able
to perform statistical analysis on power grids with variational dynamic leakage
currents having log-normal distributions and spatial correlations. Experimental results
show that the presented method predicts the variances of the resulting log-normal-like
node voltage drops more accurately than the Taylor expansion-based Gaussian
approximation method.
Notice that we only consider leakage current inputs with log-normal distributions
in this chapter. For general current variations from the dynamic power of the
circuits, which typically can be modeled with a Gaussian distribution, existing work [47]
using Taylor series expansion has been explored. The voltage variations caused
by the dynamic power can be considered on top of the variations from the log-normal
leakage currents. We notice that similar works, which consider only leakage
variations, have been done before [39, 40].
We also remark that the Vdd drop will have an impact on the leakage currents, which
creates a negative feedback for the leakage current itself: an increasing Vdd drop leads
to a lower Vgs in (8.1), which leads to a smaller Isub. However, to consider this effect,
both the power grid and the signal circuits need to be simulated together, which will
be very expensive. Hence, practically, a two-step simulation approach is used where the
power grid and the signal circuits are simulated separately but in an iterative way to
consider the coupling between them. In light of this simulation methodology, the
presented method can be viewed as one step (the power grid simulation step)
in such a method.

3 Nominal Power Grid Network Model

The power grid networks in this chapter are modeled as RC networks with known
time-variant current sources, which can be obtained by gate-level logic simulations
of the circuits. Figure 8.1 shows the power grid models used in this chapter. For a

Fig. 8.1 The power grid model used

power grid (vs. the ground grid), some nodes having known voltage are modeled
as constant voltage sources. For C4 power grids, the known voltage nodes can be
internal nodes inside the power grid. Given the current source vector $u(t)$, the node
voltages can be obtained by solving the following differential equations, which are
formulated using the modified nodal analysis (MNA) approach:
$$G v(t) + C\frac{dv(t)}{dt} = B u(t), \qquad (8.2)$$
where $G \in \mathbb{R}^{n\times n}$ is the conductance matrix, $C \in \mathbb{R}^{n\times n}$ is the matrix resulting from
storage elements, $v(t)$ is the vector of time-variant node voltages and branch currents
of voltage sources, $u(t)$ is the vector of independent sources, and $B$ is the input
selector matrix.
We remark that the proposed method can be directly applied to power grids
modeled as RLC/RLCK circuits. But inductive effects are still most visible at board
and package levels, and the recent power grid networks from IBM only consist of
resistance [123].
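As a concrete illustration of the nominal model (8.2), the sketch below stamps a small RC mesh with MNA and integrates it with backward Euler. The grid size, element values, load currents, and the way the supply pad is modeled (a large conductance at one node) are all hypothetical; node voltages are interpreted as drops below Vdd so that load currents enter as positive right-hand-side terms.

import numpy as np

def build_grid(n=4, g=10.0, c=1e-12, g_pad=1e3):
    # n x n RC mesh: conductance g between neighbors, capacitance c to ground
    N = n * n
    G, C = np.zeros((N, N)), np.diag(np.full(N, c))
    def stamp(a, b):
        G[a, a] += g; G[b, b] += g; G[a, b] -= g; G[b, a] -= g
    for i in range(n):
        for j in range(n):
            k = i * n + j
            if j + 1 < n: stamp(k, k + 1)
            if i + 1 < n: stamp(k, k + n)
    G[0, 0] += g_pad                    # supply pad tied to one corner node
    return G, C

def transient(G, C, i_load, dt=1e-11, steps=200):
    # backward Euler on G v + C dv/dt = i_load(t); v is the drop below Vdd
    A = G + C / dt
    v = np.zeros(G.shape[0])
    history = []
    for k in range(steps):
        v = np.linalg.solve(A, C @ v / dt + i_load(k * dt))
        history.append(v.copy())
    return np.array(history)

G, C = build_grid()
drop = transient(G, C, lambda t: np.full(G.shape[0], 1e-4))
print(drop[-1].max())                   # worst steady-state IR drop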

4 Problem Formulation

In this section, we present the modeling of the leakage current under intra-die
variations for the power grid network. Note that, in this case, the leakage current is
a random process instead of a random variable as in the full-chip leakage analysis of the
earlier part of this book. After this, we formulate the problem that we try to solve.
The G and C matrices and the input currents I(t) depend on the circuit parameters,
such as metal wire width, length, and thickness on power grids, and transistor
parameters, such as channel length, width, gate oxide thickness, etc. Some previous
work assumes that all circuit parameters and current sources are treated as uncorre-
lated Gaussian random variables [47]. In this chapter, we consider both power grid
wire variations and the log-normal leakage current variations, caused by the channel
length variations, which are modeled as Gaussian (normal) variations [142].
Process variations can also be classified into inter-die (die-to-die) variations
and intra-die variations. In inter-die variations, all the parameter variations are
correlated. The worst-case corner can be easily found by setting the parameters
to their range limits (mean plus 3σ). The difficulty lies in the intra-die variations,
where the circuit parameters are not correlated or spatially correlated within a
die. Intra-die variations also consist of local and layout-dependent deterministic
components and random components, which typically are modeled as multivariate
Gaussian process with some spatial correlations [12]. In this chapter, we first assume
that we have a number of independent (uncorrelated) transformed orthonormal random
Gaussian variables $\xi_i(\omega),\ i = 1, \ldots, n$, which actually model the channel length
and device threshold voltage variations and other variations. Then, we consider the
spatial correlation in the intra-die variation. We apply the PCA method in Sect. 2.2
of Chap. 2 to transfer the correlated variables into uncorrelated variables before the
spectral statistical analysis.
Let $\Omega$ denote the sample space of the experimental or manufacturing outcomes.
For $\omega \in \Omega$, let $\boldsymbol{\xi}^d(\omega) = [\xi_1^d(\omega), \ldots, \xi_r^d(\omega)]$ be a vector of $r$ Gaussian variables
representing the circuit parameters of interest. After the PCA operation, we obtain the
independent random variable vector $\boldsymbol{\xi} = [\xi_1, \ldots, \xi_n]$. Notice that $n \le r$ in general.
Therefore, given the process variations, the MNA equation (8.2) becomes
$$G(\boldsymbol{\xi})v(t) + C(\boldsymbol{\xi})\frac{dv(t)}{dt} = I(t, \boldsymbol{\xi}(\omega)). \qquad (8.3)$$
The variation in wire width and thickness will cause variation in the conductance
matrix $G(\boldsymbol{\xi})$ and the capacitance matrix $C(\boldsymbol{\xi})$. The variations are more related to the back
end of the line (BEOL), as power grids are mainly metals at the top or middle layers.
The input current vector $I(t, \boldsymbol{\xi}(\omega))$ has both deterministic and random components.
In this chapter, to simplify our analysis, we assume the dynamic currents (power)
caused by circuit switching are still modeled as deterministic currents, as we only
consider the leakage variations.
power of circuits can be significant. But the voltage variations caused by the leakage
variations can be viewed as background noise, which can be considered together
with dynamic power-induced variations later.

To obtain the variational current sources $I(t, \boldsymbol{\xi})$, some library characterization
methods will be used to compute $I(t, \boldsymbol{\xi})$ once we know the effective channel
length Leff variations, the threshold voltage (Vth) variations, and the other variation sources
under different input patterns. With such a variation-aware cell library, we can more
accurately obtain $I(t, \boldsymbol{\xi})$ based on the logic simulation of the whole chip
under some inputs.
Note that, from a practical use perspective, a user may be interested only in the voltage
variations over a period of time or in the worst case over a period of time. That information
can be easily obtained once we know the variations at any given time instance.
In other words, the information we obtain here can be used to derive any other
information that is of interest to designers.
The problem we need to solve is to efficiently find the mean and variances of the
voltage v(t) at any node and at any time instance. A straightforward method is the MC-based
sampling method in Sect. 3.1 of Chap. 2. We randomly generate $G(\boldsymbol{\xi})$, $C(\boldsymbol{\xi})$,
and $I(t, \boldsymbol{\xi})$, which are based on the log-normal distribution; solve (8.3) in the time
domain for each sample; and compute the means and variances based on sufficient
samples. Obviously, MC will be computationally expensive. However, MC will
give the most reliable results and is the most robust and flexible method.
Specifically, we expand the variational G and C around their mean values and
keep the first-order terms as in [22, 102, 134]:
$$G(\boldsymbol{\xi}) = G_0 + G_1\xi_1 + G_2\xi_2 + \cdots + G_M\xi_M, \qquad (8.4)$$
$$C(\boldsymbol{\xi}) = C_0 + C_1\xi_1 + C_2\xi_2 + \cdots + C_M\xi_M.$$
We remark that the presented method can be trivially extended to the second- and
higher-order terms [134]. The input current variation $i(t,\xi)$ follows the log-normal
distribution, as leakage variations are the dominant factors:
$$i(\xi) = e^{g(\xi)}, \qquad g(\xi) = \mu + \sigma\xi. \qquad (8.5)$$
Note that the input current variation $i(\xi)$ is not a function of time, as we only model the
static leakage variations for simplicity of presentation. However, the presented
approach can be easily applied to time-variant variations with any distribution.

5 Statistical Power Grid Analysis Based on Hermite PC

5.1 Galerkin-Based Spectral Stochastic Method

To simplify the presentation, we first assume that C and G are deterministic in (8.3).
We will remove this assumption later. In the case that $v(t,\boldsymbol{\xi})$ is an unknown random
process, as shown in Sect. 3.2 of Chap. 2 (with an unknown distribution), like the node
voltages in (8.3), the coefficients can be computed by using a Galerkin-based
method. In this way, we transform the stochastic analysis into a deterministic
process, where we only need to compute the coefficients of the Hermite PC. Once
we obtain those coefficients, the mean and variance of the random variables can be
easily computed, as shown later in this section.
For illustration purposes, considering one Gaussian variable $\boldsymbol{\xi} = [\xi_1]$, we can
assume that the node voltage response can be written as a second-order ($p = 2$)
Hermite PC:
$$v(t,\xi) = v_0(t) + v_1(t)\xi_1 + v_2(t)(\xi_1^2 - 1). \qquad (8.6)$$
Assuming that the input leakage current sources can also be represented by a second-order
Hermite PC,
$$I(t,\xi) = I_0(t) + I_1(t)\xi_1 + I_2(t)(\xi_1^2 - 1). \qquad (8.7)$$
By applying the Galerkin equation (2.44) and noting the orthogonal property of
the various orders of Hermite PCs, we end up with the following equations:
$$G v_i(t) + C\frac{dv_i(t)}{dt} = I_i(t), \qquad (8.8)$$
where $i = 0, 1, 2, \ldots, P$.
For two independent Gaussian variables, we have
$$v(t,\boldsymbol{\xi}) = v_0(t) + v_1(t)\xi_1 + v_2(t)\xi_2 + v_3(t)(\xi_1^2 - 1) + v_4(t)(\xi_2^2 - 1) + v_5(t)\,\xi_1\xi_2. \qquad (8.9)$$
Assuming that we have a similar second-order Hermite PC for the input leakage current
$I(t,\boldsymbol{\xi})$,
$$I(t,\boldsymbol{\xi}) = I_0(t) + I_1(t)\xi_1 + I_2(t)\xi_2 + I_3(t)(\xi_1^2 - 1) + I_4(t)(\xi_2^2 - 1) + I_5(t)\,\xi_1\xi_2. \qquad (8.10)$$
Equation (8.8) is then valid with $i = 0, \ldots, 5$. For more (more than two) Gaussian variables,
we can obtain similar results with more coefficients of the Hermite PCs to be solved
by using (8.8).
Once we obtain the Hermite PC of $v(t,\boldsymbol{\xi})$, we can obtain the mean and variance
of $v(t,\boldsymbol{\xi})$ by (2.39).
One critical problem remaining so far is how to obtain the Hermite PC (8.7)
for the leakage current with a log-normal distribution. Our method is based on Sect. 4 of
Chap. 2, and we will show how it can be applied to solve our problem for one or
more independent Gaussian variables.
Once we have the Hermite PC representation of the leakage current sources
$I(t,\boldsymbol{\xi})$, the node voltages $v(t,\boldsymbol{\xi})$ can be computed by using (8.8), and the mean and
variance of $v(t,\boldsymbol{\xi})$ follow trivially from (2.39).
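The decoupling in (8.8) is easy to exercise for the static (DC) case: with a deterministic G, each Hermite coefficient of the node voltages is obtained from one deterministic solve driven by the corresponding coefficient of the log-normal leakage input. The 2x2 resistive grid and the leakage parameters below are hypothetical.

import numpy as np

# 2x2 resistive mesh; the pad conductance is folded into node 0
G = np.array([[ 30.0, -10.0, -10.0,   0.0],
              [-10.0,  20.0,   0.0, -10.0],
              [-10.0,   0.0,  20.0, -10.0],
              [  0.0, -10.0, -10.0,  20.0]])

mu, sigma = np.log(1e-4), 0.2                 # per-node log-normal leakage
I0 = np.exp(mu + 0.5 * sigma**2)              # Hermite coefficients of exp(mu + sigma*xi)
I_coeffs = [I0, I0 * sigma, 0.5 * I0 * sigma**2]

# one deterministic solve per coefficient, cf. (8.8) without the C dv/dt term
v_coeffs = [np.linalg.solve(G, np.full(4, c)) for c in I_coeffs]

mean_drop = v_coeffs[0]
var_drop = v_coeffs[1]**2 + 2.0 * v_coeffs[2]**2   # norms of {xi, xi^2 - 1}
print(mean_drop.max(), np.sqrt(var_drop).max())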

5.2 Spatial Correlation in Statistical Power Grid Analysis

Spatial correlations exist in the intra-die variations in different forms and have
been modeled for timing analysis [12, 121]. The general way to consider spatial
correlation is to map the correlated random variables into a set of independent variables.
This can be done by using orthogonal mapping techniques, such as the PCA in Sect. 2.2
of Chap. 2. In this chapter, we also apply the PCA method in our spectral statistical
analysis framework for power grid statistical analysis.
To consider the intra-die variation in $V_{th}$, the chip is divided into $n$ regions, and
$\Phi = [\Phi_1, \Phi_2, \ldots, \Phi_n]$ is assumed to be a random variable vector representing the variation of $V_{th}$
on the different parts of the circuit. In other words, in the $i$th region, the leakage current
$I_{sub_i} = c\,e^{V_{th}(\Phi_i)}$ follows the log-normal distribution. Here, $\Phi_i$ is a random variable
with a Gaussian distribution, $\mu_\Phi = [\mu_{\Phi_1}, \mu_{\Phi_2}, \ldots, \mu_{\Phi_n}]$ is the mean vector of $\Phi$, and
$C$ is the covariance matrix of $\Phi$.
With PCA, we can get the corresponding uncorrelated random variables $\boldsymbol{\xi} =
[\xi_1, \xi_2, \ldots, \xi_n]$ from the equation
$$\boldsymbol{\xi} = A(\Phi - \mu_\Phi). \qquad (8.11)$$
Also, the original random variables can be expressed as
$$\Phi_i = \sum_{j=1}^{n} a_{ij}\xi_j + \mu_{\Phi_i}, \quad i = 1, 2, \ldots, n, \qquad (8.12)$$
where $a_{ij}$ is the element in the $i$th row and $j$th column of the orthogonal mapping matrix
defined in (2.21). $\boldsymbol{\xi} = [\xi_1, \xi_2, \ldots, \xi_n]$ is a vector of orthogonal Gaussian random
variables; the mean of $\xi_j$ is 0 and its variance is $\lambda_j$, $j = 1, 2, \ldots, n$. The distribution
of $\xi_i$ can be written as
$$\xi_i = \mu_i + \sigma_i\,\hat{\xi}_i, \quad i = 1, 2, \ldots, n. \qquad (8.13)$$
$\hat{\boldsymbol{\xi}} = [\hat{\xi}_1, \hat{\xi}_2, \ldots, \hat{\xi}_n]$ is a vector of orthogonal standard normal Gaussian random variables.
$\Phi_i$ can be expressed with the normal random variables $\hat{\boldsymbol{\xi}} = [\hat{\xi}_1, \hat{\xi}_2, \ldots, \hat{\xi}_n]$:
$$\Phi_i = \sum_{j=1}^{n} a_{ij}\sqrt{\lambda_j}\,\hat{\xi}_j + \mu_{\Phi_i}, \quad i = 1, 2, \ldots, n. \qquad (8.14)$$

With (8.14), the leakage current can be expanded as a Hermite PC:
$$I(\Phi_i) \propto e^{\Phi_i} = e^{\sum_{j=1}^{n} g_j\hat{\xi}_j + \mu_{\Phi_i}}
= \mu_{I_i}\left(1 + \sum_{j=1}^{n}\hat{\xi}_j g_j + \sum_{j=1}^{n}\sum_{k=1}^{n}\frac{\hat{\xi}_j\hat{\xi}_k - \delta_{jk}}{\langle(\hat{\xi}_j\hat{\xi}_k - \delta_{jk})^2\rangle}\, g_j g_k + \cdots\right). \qquad (8.15)$$
Here,
$$g_j = a_{ij}\sqrt{\lambda_j}, \quad j = 1, 2, \ldots, n. \qquad (8.16)$$
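A minimal numerical sketch of the PCA mapping (8.11)-(8.14) and the resulting $g_j$ of (8.16) is given below: the covariance of the correlated Vth variables Φ is eigendecomposed, and Φ is rebuilt from uncorrelated standard normals as a check. The three-region mean vector and covariance are hypothetical.

import numpy as np

mu_phi = np.array([0.30, 0.30, 0.30])           # mean Vth per region (V)
corr = np.array([[1.0, 0.6, 0.3],
                 [0.6, 1.0, 0.6],
                 [0.3, 0.6, 1.0]])
cov = (0.02 ** 2) * corr                        # covariance matrix of Phi

lam, A = np.linalg.eigh(cov)                    # cov = A diag(lam) A^T
g = A * np.sqrt(lam)                            # g[i, j] = a_ij * sqrt(lambda_j), cf. (8.16)

# reconstruct Phi_i = sum_j g_ij * xi_hat_j + mu_i and verify its covariance
rng = np.random.default_rng(0)
xi_hat = rng.standard_normal((100000, 3))
phi = xi_hat @ g.T + mu_phi
print(np.round(np.cov(phi.T), 6))               # approximately equals cov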

Therefore, the MNA equation with the correlated random variables $\Phi$ in the current
sources can be expressed in terms of the uncorrelated random variables $\hat{\boldsymbol{\xi}}$ as follows:
$$G v(t) + C\frac{dv(t)}{dt} = I(t, \hat{\boldsymbol{\xi}}). \qquad (8.17)$$
With the orthogonal property of $\hat{\boldsymbol{\xi}}$, (8.17) can simply be solved by using (8.8),
$i = 1, 2, \ldots, P$.

5.3 Variations in Wires and Leakage Currents

In this section, we will consider variations in the width (W) and thickness (T) of the
power grid wires, as well as in the threshold voltage (Vth) of the active devices, which is
reflected in the leakage currents. Meanwhile, without loss of generality, these variations are
assumed to be independent of each other. As mentioned in [47], the MNA equation
for the ground circuit becomes
$$G(\xi_g)v(t) + C(\xi_c)\frac{dv(t)}{dt} = I(\xi_I, t). \qquad (8.18)$$
The variation in width W and thickness T will cause variation in the conductance
matrix G and the capacitance matrix C, while the variation in threshold voltage will cause
variation in the leakage currents I. Thus, the conductance and capacitance of the wires can
be expressed as in [47]:
$$G(\xi_g) = G_0 + G_1\xi_g,$$
$$C(\xi_c) = C_0 + C_1\xi_c. \qquad (8.19)$$

$G_0$, $C_0$ represent the deterministic components of the conductance and capacitance of
the wires. $G_1$, $C_1$ represent the sensitivity matrices of the conductance and capacitance.
$\xi_g$, $\xi_c$ are normalized Gaussian random variables representing the process variations in
the wire conductance and capacitance, respectively. As mentioned in the previous section,
the variation in the leakage current can be represented by a second-order
Hermite PC as in (2.55):
$$I(t,\xi_I) = I_0(t) + I_1(t)\xi_I + I_2(t)(\xi_I^2 - 1). \qquad (8.20)$$

Here, I is a normalized Gaussian distribution random variable representing


variation in threshold voltage. I.t; I / follows log-normal distribution as

I D eg.I / ;
g.I / D I C I I : (8.21)

As in previous part, the desired Hermite PC coefficients, I0;1;2 , can be expressed as


I0 ; I0 I ; and 12 I0 I2 respectively. I0 is the mean of leakage current source, which is
expressed as
 
1
I0 D exp I C I2 : (8.22)
2
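The closed-form coefficients above can be checked numerically. The sketch below (illustrative only; the values of μ_I and σ_I are made up) projects the log-normal source onto the first three probabilists' Hermite polynomials with Gauss–Hermite quadrature and compares against I0, I0 σ_I, and (1/2) I0 σ_I².

    import numpy as np
    from numpy.polynomial.hermite_e import hermegauss  # probabilists' Hermite quadrature

    # Illustrative parameters for one log-normal leakage source (made-up values)
    mu_I, sigma_I = -2.0, 0.3

    # Closed-form Hermite PC coefficients from (8.21)-(8.22)
    I0 = np.exp(mu_I + 0.5 * sigma_I**2)
    coeff_closed = [I0, I0 * sigma_I, 0.5 * I0 * sigma_I**2]

    # Numerical projection onto He_0 = 1, He_1 = xi, He_2 = xi^2 - 1
    x, w = hermegauss(40)                    # nodes/weights for weight exp(-x^2/2)
    w = w / np.sqrt(2 * np.pi)               # normalize to the standard Gaussian density
    I = np.exp(mu_I + sigma_I * x)
    basis = [np.ones_like(x), x, x**2 - 1]
    norms = [1.0, 1.0, 2.0]                  # <He_k^2> under the Gaussian measure
    coeff_num = [np.sum(w * I * b) / n for b, n in zip(basis, norms)]

    print(coeff_closed)
    print(coeff_num)   # should agree closely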
Considering the influence of ξ_g, ξ_c, and ξ_I, the node voltage is therefore expanded by a second-order Hermite PC as

    v(t, ξ) = v0(t) + v1(t) ξ_g + v2(t) ξ_c + v3(t) ξ_I
              + v4(t)(ξ_g² − 1) + v5(t)(ξ_c² − 1) + v6(t)(ξ_I² − 1)
              + v7(t) ξ_g ξ_c + v8(t) ξ_g ξ_I + v9(t) ξ_c ξ_I.        (8.23)

Now the task is to compute the coefficients of the Hermite PC of the node voltage v(t, ξ). Applying the Galerkin equation (2.44), we only need to solve the following equations:

    ⟨Δ(t, ξ), 1⟩ = 0,          ⟨Δ(t, ξ), ξ_g⟩ = 0,
    ⟨Δ(t, ξ), ξ_c⟩ = 0,        ⟨Δ(t, ξ), ξ_I⟩ = 0,
    ⟨Δ(t, ξ), ξ_g² − 1⟩ = 0,   ⟨Δ(t, ξ), ξ_c² − 1⟩ = 0,
    ⟨Δ(t, ξ), ξ_I² − 1⟩ = 0,   ⟨Δ(t, ξ), ξ_g ξ_c⟩ = 0,
    ⟨Δ(t, ξ), ξ_g ξ_I⟩ = 0,    ⟨Δ(t, ξ), ξ_c ξ_I⟩ = 0.                 (8.24)

With the distributions of ξ_g, ξ_c, and ξ_I, we can obtain the coefficients v(t) = [v0(t), v1(t), ..., v9(t)]^T of the node voltage from

    G̃ v(t) + C̃ dv(t)/dt = Ĩ(t),                                    (8.25)

where

    G̃ = [ G0   G1   0    0    0     0    0    0    0    0
          G1   G0   0    0    2G1   0    0    0    0    0
          0    0    G0   0    0     0    0    G1   0    0
          0    0    0    G0   0     0    0    0    G1   0
          0    G1   0    0    G0    0    0    0    0    0
          0    0    0    0    0     G0   0    0    0    0
          0    0    0    0    0     0    G0   0    0    0
          0    0    G1   0    0     0    0    G0   0    0
          0    0    0    G1   0     0    0    0    G0   0
          0    0    0    0    0     0    0    0    0    G0 ]

    C̃ = [ C0   0    C1   0    0     0    0    0    0    0
          0    C0   0    0    0     0    0    C1   0    0
          C1   0    C0   0    0     2C1  0    0    0    0
          0    0    0    C0   0     0    0    0    0    C1
          0    0    0    0    C0    0    0    0    0    0
          0    0    C1   0    0     C0   0    0    0    0
          0    0    0    0    0     0    C0   0    0    0
          0    C1   0    0    0     0    0    C0   0    0
          0    0    0    0    0     0    0    0    C0   0
          0    0    0    C1   0     0    0    0    0    C0 ]

    Ĩ(t) = [ I0(t), 0, 0, I1(t), 0, 0, I2(t), 0, 0, 0 ]^T.            (8.26)

Knowing the Hermite PC coefficients of the node voltage v(t, ξ), it is easy to obtain the mean and variance of v(t, ξ), which describe the random characteristics of the node voltages in the given circuit.
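As a small illustration of this step, the sketch below evaluates the mean, variance, and 3σ bounds of one node voltage at one time point from the second-order Hermite PC coefficients in (8.23). The coefficient values are made up for the example; the basis-function variances are 1 for the first-order and cross terms and 2 for the (ξ² − 1) terms.

    import numpy as np

    # Hypothetical HPC coefficients [v0, v1, ..., v9] of one node at one time point
    v = np.array([0.25, 0.012, 0.008, 0.020, 0.001, 0.0005, 0.002, 0.0008, 0.001, 0.0006])

    # Variances of the basis functions in (8.23)
    basis_var = np.array([0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 1.0, 1.0, 1.0])

    mean = v[0]
    sigma = np.sqrt(np.sum(v**2 * basis_var))
    print(f"mean = {mean:.4f} V, std = {sigma:.4f} V, "
          f"3-sigma bounds = [{mean - 3*sigma:.4f}, {mean + 3*sigma:.4f}] V")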
We remark that the presented method leads to large circuit matrices, which adds computational cost. To mitigate this scalability problem for very large power grid circuits, we can apply partitioning strategies and compute the variational responses for each subcircuit, which will then be small enough for efficient computation, as done in the existing work [17, 206].

6 Numerical Examples

This section describes the simulation results for a number of power grid networks with log-normal leakage current distributions. All the presented methods have been implemented in Matlab, using sparse matrix techniques. All experiments were carried out on a Linux system with dual Intel Xeon CPUs at 3.06 GHz and 1 GB of memory. The initial results of this chapter were published in [108, 109].
The power grid circuits we test are RC mesh circuits based on values from industrial circuits, and they are driven only by leakage currents since we are interested only in the variations from the leakage currents. The resistor values are on the order of 10^−2 Ω, and the capacitor values are on the order of 10^−12 F.

6.1 Comparison with Taylor Expansion Method

We first compare the presented method with the simple Taylor expansion method for one and more Gaussian variables.
For simplicity, we assume one Gaussian random variable g(ξ), expressed as

    g = μ_g + σ_g ξ,                                                 (8.27)

where ξ is a normalized Gaussian random variable with ⟨ξ⟩ = 0 and ⟨ξ²⟩ = 1. The log-normal random variable l(ξ) obtained from g(ξ) is written as

    l(ξ) = e^{g(ξ)} = exp(μ_g + σ_g ξ).                              (8.28)

Expanding the exponential into a Taylor series and keeping all terms up to second order, we have

    l(ξ) = 1 + Σ_i g_i ξ_i + (1/2) Σ_i Σ_j g_i g_j ξ_i ξ_j + ⋯
         = 1 + μ_g + (1/2) μ_g² + (1/2) σ_g² + (σ_g + μ_g σ_g) ξ
           + (1/2) σ_g² (ξ² − 1) + ⋯ .                               (8.29)

We observe that the second-order Taylor expansion, as shown in (8.29), is similar to the second-order Hermite PC in (2.57). Hence, the Galerkin-based method can still be applied, and we then use (8.8) to obtain the Hermite PC coefficients of the node voltage v(t, ξ) accordingly. We want to emphasize, however, that the polynomials generated by the Taylor expansion are in general not orthogonal with respect to Gaussian distributions and cannot be used with the Galerkin-based method, unless we keep only the first order of the Taylor expansion (with less accuracy). In that case, the resulting node voltage distribution is still Gaussian, which obviously is not correct.
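To make this comparison concrete at the level of a single log-normal source, the sketch below compares the standard deviation predicted by the second-order Hermite PC coefficients, the second-order Taylor expansion (8.29), and Monte Carlo sampling. It is illustrative only: the value of μ_g is an assumption, and the circuit-level numbers of Table 8.1 are not reproduced here.

    import numpy as np

    mu_g = 0.0                      # assumed mean of the Gaussian exponent (illustrative)
    rng = np.random.default_rng(0)

    for sigma_g in [0.01, 0.1, 0.3, 0.5, 0.7]:
        # Monte Carlo reference for l(xi) = exp(mu_g + sigma_g*xi)
        xi = rng.standard_normal(2000)
        std_mc = np.exp(mu_g + sigma_g * xi).std()

        # Second-order Hermite PC: l ~ I0*(1 + sigma*xi + sigma^2/2*(xi^2 - 1)), I0 = exp(mu + sigma^2/2)
        I0 = np.exp(mu_g + 0.5 * sigma_g**2)
        std_hpc = I0 * np.sqrt(sigma_g**2 + 0.5 * sigma_g**4)

        # Second-order Taylor (8.29): coefficients of xi and (xi^2 - 1)
        c1, c2 = sigma_g + mu_g * sigma_g, 0.5 * sigma_g**2
        std_taylor = np.sqrt(c1**2 + 2 * c2**2)

        print(f"sigma_g={sigma_g}: MC={std_mc:.4f}  HPC={std_hpc:.4f}  Taylor={std_taylor:.4f}")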
We note that the first-order Taylor expansion has been used in statistical timing analysis [12]. Within that limitation, delay variations due to interconnects and devices can be approximated, and skew distributions can then be computed easily as Gaussian processes.

Table 8.1 Accuracy comparison between Hermite PC (HPC) and Taylor expansion

    δ_g          0.01   0.1    0.3    0.5    0.7
    HPC (%)      3.19   1.88   2.07   5.5    2.92
    Taylor (%)   3.19   1.37   2.41   16.6   24.02

To compare the two methods, we use the MC method as the reference and measure the accuracy of both methods in terms of standard deviation. For MC, we use 2,000 samples, which corresponds to about 97.7% accuracy. The results are summarized in Table 8.1. In this table, δ_g is the standard deviation of the Gaussian threshold-voltage variable in the log-normal current source, HPC is the relative error (in percent) of the standard deviation from the Hermite PC method against the MC method, and Taylor is the corresponding relative error of the Taylor expansion method.
We can observe that when the variation of the current source increases, the Taylor expansion method results in significant errors compared to the MC method, while the presented method has small errors for all cases. This clearly shows the advantage of the presented method.

6.2 Examples Without Spatial Correlation

Figure 8.2 shows the node voltage distribution at one node of a ground network with 1,720 nodes. The MC results are obtained with 2,000 samples. The standard deviation of the log-normal current source with one Gaussian variable is 0.1. The mean and 3σ values computed by the Hermite PC method are also marked in the figure and fit very well with the MC results. Figure 8.3 shows the node voltages and their variations caused by the leakage currents from 0 ns to 126 ns. The selected circuit contains 64 nodes, with one Gaussian variable of standard deviation 0.06 in the current source. The blue solid lines are the mean, upper bound, and lower bound; the cyan lines are the node voltages of 2,000 MC runs. Most of the MC results lie between the upper and lower bounds.
Another observation is that when the standard deviation σ_g is small, the distribution looks Gaussian, as in Fig. 8.2, but it is in fact log-normal. In the case of two random variables, one with a large and the other with a small standard deviation, the larger one dominates and the log-normal shape becomes visible, as in Fig. 8.4.
To consider multiple random variables, we divide the circuit into several partitions. We first divide the circuit into two parts. Figure 8.4 shows the node voltage distribution at one node, at a particular time instant, of a ground network with 336 nodes and two independent variables. The standard deviations of the two Gaussian variations are σ_g1 = 0.5 and σ_g2 = 0.1. The 3σ variations are also marked in the figure.
Tables 8.2 and 8.3 show the speedup of the Hermite PC method over the MC method with 2,000 samples, considering one and two random variables, respectively.

[Figure: "Distribution of voltage at given node (one variable, σ = 0.1)"; x-axis: Voltage (volts), y-axis: Number of occurrences, with μ and μ ± 3δ marked.]
Fig. 8.2 Distribution of the voltage in a given node with one Gaussian variable, σ_g = 0.1, at time 50 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE

[Figure: "Comparison between Hermite PC and Monte Carlo"; x-axis: time (ns), y-axis: voltage (V), ×10⁻³ scale.]
Fig. 8.3 Distribution of the voltage caused by the leakage currents in a given node with one Gaussian variable, σ_g = 0.5, in the time interval from 0 ns to 126 ns. Reprinted with permission from [109] © 2008 IEEE

[Figure: "Distribution of voltage at given node (two variables, σ = 0.1 and 0.5)"; x-axis: Voltage (volts), y-axis: Number of occurrences, with μ and μ ± 3δ marked.]
Fig. 8.4 Distribution of the voltage in a given node with two Gaussian variables, σ_g1 = 0.1 and σ_g2 = 0.5, at time 50 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE

Table 8.2 CPU time comparison with the Monte Carlo method for one random variable

    Ckt        #node    p  n  MC (s)        #MC    HPC (s)   Speedup
    gridrc 6   280      2  1  766.06        2000   1.0156    754.3
    gridrc 12  3240     2  1  4389          2000   8.3281    527.0
    gridrc 5   49600    2  1  2.3 × 10^5    2000   298.02    771.76

Table 8.3 CPU time comparison with the Monte Carlo method for two random variables

    Ckt        #node    p  n  MC (s)        #MC    HPC (s)   Speedup
    gridrc 3   280      2  2  1.05 × 10^3   2000   2.063     507.6
    gridrc 5   49600    2  2  2.49 × 10^5   2000   445.6     558.7
    gridrc 9   105996   2  2  6.11 × 10^5   2000   1141.8    535.1

In the two tables, #node is the number of nodes in the power grid circuit, p is the order of the Hermite PCs, and n is the number of independent Gaussian random variables. #MC is the number of samples used for the MC method. HPC and MC are the CPU times of the Hermite PC method and the MC method, respectively. It can be seen that the presented method is about two orders of magnitude faster than the MC method.
When more Gaussian variables are used for modeling intra-die variations, more Hermite PC coefficients need to be computed. Hence, the speedup becomes smaller if the MC method uses the same number of samples, as shown for gridrc 12. Another observation is that the speedup depends on the sampling size of the MC method. The speedup of the presented method over the MC method depends on many factors, such as the order of the polynomials, the number of variables, etc. In general, the speedup should not have a clear relationship with the circuit size. We still use 2,000 samples for MC, which represents about 97.7% accuracy (as the error of MC is roughly 1/√2000 for 2,000 samples).

    Φ1 = ξ1 + 0.5 ξ2        Φ2 = ξ2 + 0.5 ξ1
Fig. 8.5 Correlated random variables setup in a ground circuit divided into two parts. Reprinted with permission from [109] © 2008 IEEE

Table 8.4 Comparison between non-PCA and PCA against Monte Carlo methods

                       Mean                    Std dev
    ckt   #nodes       Non-PCA    PCA          Non-PCA    PCA
                       % error    % error      % error    % error
    1     336          10.3       0.52         18.8       1.13
    2     645          8.27       0.59         11.4       1.16
    3     1160         10.8       0.50         2.6        0.73

6.3 Examples with Spatial Correlation

To model the intra-die variations with spatial correlations, we divide the power grid circuit into several parts. We first consider a circuit partitioned into two parts. In this case, we have two independent random current variables, ξ1 and ξ2. The correlated variables for the two parts are Φ1 = ξ1 + 0.5 ξ2 and Φ2 = ξ2 + 0.5 ξ1, respectively, as shown in Fig. 8.5.
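As a quick check of this two-partition setup (a sketch, assuming unit-variance ξ1 and ξ2; only the 0.5 coupling factor comes from the text), the covariance matrix of [Φ1, Φ2] and its principal components can be computed directly:

    import numpy as np

    # Phi1 = xi1 + 0.5*xi2, Phi2 = xi2 + 0.5*xi1, with xi1, xi2 independent standard Gaussians
    M = np.array([[1.0, 0.5],
                  [0.5, 1.0]])          # mixing matrix

    cov_phi = M @ M.T                   # covariance of [Phi1, Phi2] = [[1.25, 1.0], [1.0, 1.25]]
    lam, A = np.linalg.eigh(cov_phi)    # PCA: eigenvalues 0.25 and 2.25

    print(cov_phi)
    print(lam)                          # the dominant component carries most of the variance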
Table 8.4 shows the percentage errors of the mean and standard deviation of HPC with PCA and of HPC without PCA, both compared against Monte Carlo. As shown in the table, it is necessary to use PCA when spatial correlation is considered. Figure 8.6 shows the node voltage distribution of one node in a ground network with 336 nodes, using both the PCA and non-PCA methods.
To obtain more accuracy, we divide the circuit into four parts, where each part is correlated with its neighbors as shown in Fig. 8.7. Φ is the correlated random variable vector used in the circuit, and ζ = [ζ1, ζ2, ζ3, ζ4] are independent Gaussian random variables with standard deviations σ1 = 0.1, σ2 = 0.2, σ3 = 0.1, and σ4 = 0.5. Figure 8.8 shows the voltage distribution of a given node; the mean voltage and the worst-case voltages are given as the solid lines. Figure 8.9 shows the voltage distribution of a circuit with 1,160 nodes. The circuit is partitioned into 25 parts (five rows by five columns) with spatial correlation. The dashed blue lines are the mean, upper bound, and lower bound computed by Hermite PC, while the solid red lines are the mean, upper bound, and lower bound from MC with 2,000 samples.

[Figure: "Distribution of voltage considering spatial correlation (two variables)"; dotted line: Monte Carlo, solid line: HPC with PCA, dashed line: HPC without PCA; x-axis: Voltage (volts), y-axis: Number of occurrences, with μ and μ ± 3δ marked.]
Fig. 8.6 Distribution of the voltage in a given node with two Gaussian variables with spatial correlation, at time 70 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE

    φ1 = ζ1 + 0.5 ζ2 + 0.5 ζ3        φ3 = ζ3 + 0.5 ζ1 + 0.5 ζ4
    φ2 = ζ2 + 0.5 ζ1 + 0.5 ζ4        φ4 = ζ4 + 0.5 ζ2 + 0.5 ζ3
Fig. 8.7 Correlated random variables setup in a ground circuit divided into four parts. Reprinted with permission from [109] © 2008 IEEE

Note that the size of the ground networks we analyzed is mainly limited by the solving capacity of Matlab on a single Intel CPU Linux workstation. Given the long simulation times of large MC sampling runs, we limit the ground network size to about 3,000 nodes.
Also note that, for more accurate modeling, we need more partitions of the circuit, and thus more independent Gaussian variables are needed, as shown in [12].

6.4 Consideration of Variations in Both Wires and Currents

Figure 8.10 shows the node voltage distribution at one node of the ground circuit circuit5, which contains 280 nodes, considering variations in conductance, capacitance, and leakage current. The maximum 3δ variation is 10% in ξ_g, ξ_c, and ξ_I. In the figure, the solid lines are the mean voltage and the worst-case voltages computed by the HPC method.

[Figure: "Distribution of voltage considering spatial correlation (four variables)"; x-axis: Voltage (volts), y-axis: Number of occurrences, with μ and μ ± 3σ marked.]
Fig. 8.8 Distribution of the voltage in a given node with four Gaussian variables with spatial correlation, at time 30 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE

[Figure: "Distribution of voltage considering spatial correlation (5×5)"; dashed: HPC, solid: Monte Carlo; x-axis: Voltage (volts), y-axis: Number of occurrences, with μ and μ ± 3δ marked.]
Fig. 8.9 Distribution of the voltage in a given node with the circuit partitioned 5 × 5 with spatial correlation, at time 30 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE

[Figure: "Distribution of voltage considering variance in G, C, I"; dots: Monte Carlo, line: HPC; x-axis: Voltage (volts), y-axis: Number of occurrences, with μ and μ ± 3δ marked.]
Fig. 8.10 Distribution of the voltage in a given node in circuit5 with variation in G, C, and I, at time 50 ns when the total simulation time is 200 ns. Reprinted with permission from [109] © 2008 IEEE

Table 8.5 CPU time comparison with the MC method considering variation in G, C, I

    Ckt        # of nodes   MC (s)    HPC (s)   Speedup
    gridrc 6   280          1320.1    9.25      142.7
    gridrc 12  3,240        12183     141.4     86.2
    gridrc 62  9,964        63832     3261      19.6

The histogram bars are the Monte Carlo results of 2,000 samples, and the dotted lines are the mean voltage and worst-case voltages of those 2,000 samples. From the figure, we can see that the results obtained from the two methods match very well.
Table 8.5 shows the CPU speedup of the HPC method over the MC method. The sample number for Monte Carlo is 3,500, and we can see that the presented method is about two orders of magnitude faster than the MC method when considering variations in conductances, capacitors, and current sources. The speedup becomes smaller for larger circuits. This is because of the super-linear time complexity of the linear solver, as the augmented matrices in (8.26) grow faster than the individual matrices Gi and Ci. The presented method does not favor very large circuits. Practically, this scalability problem can be mitigated by using partitioning-based strategies [17].

7 Summary

In this chapter, we have presented a stochastic simulation method for fast estimation of the voltage variations caused by process-induced log-normal leakage current variations with spatial correlations. The presented analysis is based on the Hermite PC representation of random processes. We extended the existing Hermite PC-based power grid analysis method [47] by considering log-normal leakage distributions as well as spatial correlations. The new method considers both the log-normal leakage distribution and wire variations at the same time. The numerical results show that the new method is more accurate than the Gaussian-only Hermite PC using the Taylor expansion method for analyzing leakage current variations and two orders of magnitude faster than MC methods, with small errors. In the presence of spatial correlations, a method that does not consider them may lead to large errors, roughly 8–10% in our tested cases. Numerical examples show the correctness and high accuracy of the presented method: it leads to about 1% or less error in both mean and standard deviation and is about two orders of magnitude faster than MC methods.
Chapter 9
Statistical Power Grid Analysis by Stochastic
Extended Krylov Subspace Method

1 Introduction

In this chapter, we present a stochastic method for analyzing the voltage drop
variations of on-chip power grid networks with log-normal leakage current varia-
tions, which is called StoEKS and which still applies the spectral-stochastic-method
to solve for the variational responses. But different from the existing spectral-
stochastic-based simulation method, the EKS method [177, 191] is employed to
compute variational responses using the augmented matrices consisting of the
coefficients of Hermite polynomials. Our work is inspired by recent spectral-
stochastic-based model order reduction method [214]. We apply this work to the
variational analysis of on-chip power grid networks considering the variational
leakage currents with the log-normal distribution.
Our contribution lies in the acceleration of the spectral stochastic method
using the EKS method to fast solve the variational circuit equations for the first
time. By using the Krylov-subspace-based reduction technique, the new method
partially mitigates the increased circuit-size problem associated with the augmented
matrices from the Galerkin-based spectral stochastic method. We will show how the
coefficients of Hermite PCs are computed for variational circuit matrices and for the
current moments used in EKS with log-normal distribution. Numerical examples
show that the presented StoEKS is about two orders of magnitude faster than the
existing Hermite PC-based simulation method, having similar error compared with
MC method. StoEKS can analyze much larger circuits than the existing Hermite PC
method in the same computation platform.
The variational power grid models and problem we plan to solve here are
the same as in Chap. 8. The rest of this chapter is organized as the follows:
Sect. 3 reviews the orthogonal PC-based stochastic simulation method and the
improved EKS method. Section 4 presents our new statistical power grid simulation
method. Section 5 presents the numerical examples and Sect. 6 concludes this
chapter.


2 Problem Formulation

In this chapter, we assume that the variational current source in (8.3), u(t, ξ), consists of two components:

    u(t, ξ) = u_d(t) + u_v(t, ξ),                                    (9.1)

where u_d(t) is the dynamic current vector due to circuit switching, which is still modeled as deterministic since we only consider the leakage variations. u_v(t, ξ) is the variational leakage current vector, which is dominated by subthreshold leakage currents and may also change over time; u_v(t, ξ) follows the log-normal distribution.
The problem we need to solve is to efficiently find the mean and variance of the voltage v(t, ξ) at any node and any time instance without using a time-consuming sampling-based method such as MC.

3 Review of Extended Krylov Subspace Method

In this section, we briefly review the EKS method in [191] and [89] for fast computation of the responses of linear dynamic systems.
The EKS method uses a Krylov-like reduction method to speed up the simulation process. Different from Krylov-based model order reduction methods, EKS performs the reduction considering both the system matrices and the input signals before the simulation (so the subspace is no longer a Krylov subspace). It is essentially a simulation approach using the Krylov subspace reduction technique. It assumes that the input signals can be represented by piecewise linear (PWL) sources.
Let V = [v̂_1, v̂_2, ..., v̂_k] be an orthogonal basis for the moment subspace (m_0, m_1, ..., m_k) of the input u(t). Figure 9.1 gives a high-level description of the EKS algorithm [191].
The original circuit described by (8.2) can then be reduced to a smaller system:

    Ĝ z(t) + Ĉ dz(t)/dt = B̂ u,                                      (9.2)

where

    Ĝ = V^T G V,   Ĉ = V^T C V,   B̂ = V^T B,   v(t) = V z(t).

After the reduced system in (9.2) has been solved for the given input u(t), the solution z(t) can be mapped back to the original space by v(t) = V z(t).
As EKS models a PWL source as a sum of delayed ramps in the Laplace domain, the terms contain 1/s and 1/s² moments [191], while the traditional Krylov space starts from the 0th moment.

Input: G, C, B, u(t), and moment order q
Output: orthogonal basis V = {v̂_0, v̂_1, ..., v̂_{q-1}}

 1  v̂_0 = α_0 v_0, where v_0 = G^{-1} B u_0, α_0 = 1 / norm(v_0);
 2  set h_s = 0;
 3  for i = 1 : q-1
 4      v_i = G^{-1} { Π_{j=0}^{i-1} α_j · B u_i − C (v̂_{i-1} + α_{i-1} h_s) };
 5      h_s = 0;
 6      for j = 0 : i-1
 7          h = v̂_j^T v_i;
 8          h_s = h_s + h v̂_j;
 9      end
10      v̄_i = v_i − h_s;
11      α_i = 1 / norm(v̄_i);
12      v̂_i = α_i v̄_i
13  end

Fig. 9.1 The EKS algorithm
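For readers who want to experiment with the idea behind Fig. 9.1, the sketch below builds an orthonormal basis for the input-moment subspace using the unscaled moment recurrence m_0 = G^{-1} B u_0, m_i = G^{-1}(B u_i − C m_{i-1}) followed by modified Gram–Schmidt. It is a simplified variant, not the authors' exact scaled recurrence; the function name eks_like_basis and the dense LU factorization are choices made for this illustration only.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def eks_like_basis(G, C, B, U_moments, q):
        """Orthonormal basis for span{m_0, ..., m_{q-1}}, where m_0 = G^{-1} B u_0 and
        m_i = G^{-1} (B u_i - C m_{i-1}). A simplified, unscaled sketch in the spirit of
        the EKS recurrence of Fig. 9.1; U_moments[i] is the ith moment vector of the inputs."""
        lu, piv = lu_factor(G)                     # factor G once, reuse for every moment
        moments, V = [], []
        m = lu_solve((lu, piv), B @ U_moments[0])  # m_0
        moments.append(m)
        for i in range(1, q):
            m = lu_solve((lu, piv), B @ U_moments[i] - C @ moments[-1])
            moments.append(m)
        for m in moments:                          # modified Gram-Schmidt orthogonalization
            w = m.copy()
            for vhat in V:
                w -= (vhat @ w) * vhat
            nrm = np.linalg.norm(w)
            if nrm > 1e-12 * np.linalg.norm(m):
                V.append(w / nrm)
        return np.column_stack(V)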

Therefore, moment shifting must be performed in EKS, which causes more complex computation and additional errors. This problem is resolved by the IEKS algorithm in [89], which shows that the moments of 1/s and 1/s² are zero for PWL input sources.
Assume that we want to obtain a single input source u_j(s) in the following moment form:

    u_j(s) = u_1 + u_2 s + u_3 s² + ⋯ + u_L s^{L−1}.

A PWL source u_j(t) is represented by a series of value-time pairs (a_1, τ_1), (a_2, τ_2), ..., (a_{K+2}, τ_{K+2}), and L moments need to be calculated. As proposed in [89], the mth moment of the current source u_j(t) in a current source vector u(s) can be calculated as

    u_{j,m} = ( a_1/(m+1) − α_1 ) β_1^{(m)}
              − Σ_{i=1}^{K} (α_i − α_{i+1}) β_{i+1}^{(m+1)}
              − ( a_{K+2}/(m+1) − α_{K+1} ) β_{K+2}^{(m)},   m = 1, ..., L.   (9.3)

Here,

    β_i^{(m)} = (−τ_i)^m / m!,      α_i = (a_{i+1} − a_i) / (τ_{i+1} − τ_i).

The EKS/IEKS method, however, has its limitations. One major drawback is that the current sources have to be represented in explicit moment form, which may not be accurate or numerically stable when high-order moments are employed for current waveforms rich in high-frequency components, owing to the well-known problems of explicit moment matching [136].
Recently, a more stable and accurate algorithm, called ETBR, has been proposed [93], which is based on the more accurate fast truncated balanced reduction method. It uses a frequency spectrum to represent the current sources and is thus more flexible and accurate. Since our contribution in this chapter is not about improving the EKS method, we simply use EKS as the baseline algorithm for StoEKS.

4 The Stochastic Extended Krylov Subspace Method—StoEKS

In this section, we present the new stochastic simulation algorithm, StoEKS, which is based on both the spectral stochastic method and the EKS method [191]. The main idea is to use the spectral stochastic method to convert the statistical simulation problem into a deterministic simulation problem and then apply EKS to solve the converted problem.

4.1 StoEKS Algorithm Flowchart

First, we present the StoEKS algorithm flowchart, shown in Fig. 9.2. The algorithm starts with the variational G(ξ), C(ξ), and the variational input source u(t, ξ). It then applies the spectral stochastic method to convert the variational system (8.3) into a deterministic system, which consists of augmented matrices of G(ξ) and C(ξ) and the position matrix B in (8.3), with new unknowns. Next, we generate the first L moments of the coefficients of the Hermite polynomials of the current sources, U_L, with log-normal distribution. We then apply EKS/IEKS to solve the obtained deterministic system for the response z using the computed projection matrix V, and recover the transient response of the original augmented system by v(t) = V z(t). Finally, we compute the mean and variance of any node voltage from v(t).
In the following subsections, we present detailed descriptions of the critical steps of the StoEKS algorithm.

4.2 Generation of the Augmented Circuit Matrices

We first show how we convert the variational circuit equation into a deterministic
one, which is suitable for EKS. Our work follows the recently presented stochastic
model order reduction (SMOR) method [214]. SMOR is based on Hermite PC and
the Krylov-based projection method.

[Flowchart: (1) Given variances of G, C, u → (2) Get augmented system G_sts, C_sts, B_sts, u_sts → (3) Compute first L moments of u_sts by IEKS for every current source → (4) Obtain orthogonal basis V by IEKS on the augmented system → (5) Solve reduced system z(t) based on orthogonal basis V → (6) Project back to original circuit, x(t) = V z(t) → (7) Get mean and variance of the voltage of every node.]
Fig. 9.2 Flowchart of the StoEKS algorithm. Reprinted with permission from [110] © 2008 IEEE

We first assume that G(ξ), C(ξ), and u(t, ξ) in (8.3) are represented in Hermite PC form with a proper order P:

    G(ξ) = G0 + G1 H1(ξ) + G2 H2(ξ) + ⋯ + GP HP(ξ),
    C(ξ) = C0 + C1 H1(ξ) + C2 H2(ξ) + ⋯ + CP HP(ξ),
    u(t, ξ) = (u0(t) + u_d(t)) + u1(t) H1(ξ) + ⋯ + uP(t) HP(ξ).

Here, H_i(ξ) are the Hermite PC basis functions for G(ξ), C(ξ), and u(t, ξ), and P is the number of these basis functions, which depends on the number of random variables n and the expansion order p in (2.31). G_i, C_i, and u_i are the Hermite polynomial coefficients of the conductances, capacitances, and current sources. G0 and C0 are the mean values of the conductances and capacitances, and G_i and C_i are the variational parts.
Ideally, to obtain G and C in the HPC format, i.e., to compute G_i and C_i from the width and length variables, one can use the spectral stochastic analysis method [86], which is a fast MC method, or other extraction methods. For this chapter, we simply assume that such information is available. In detail, G_i and C_i are obtained as

    G_i = a_i · G0,      C_i = a_i · C0,      i = 1, ..., P,          (9.4)

where a_i is the variational percentage for H_i.
Substituting (9.4) into (8.3), the system equations become

    Σ_{i=0}^{P−1} Σ_{j=0}^{P−1} G_i v_j H_i H_j + s Σ_{i=0}^{P−1} Σ_{j=0}^{P−1} C_i v_j H_i H_j
        = u_d(t) + Σ_{i=0}^{P−1} u_i(t) H_i.                           (9.5)

Here, v_i are the coefficients of the Hermite polynomial expansion of the node voltages v(t, ξ):

    v(t, ξ) = v0(t) + v1(t) H1 + v2(t) H2 + ⋯ + v_{P−1}(t) H_{P−1}.      (9.6)

After taking the inner product with H_k on both sides of (9.5), it becomes

    Σ_{i=0}^{P−1} Σ_{j=0}^{P−1} G_i v_j ⟨H_i H_j, H_k⟩ + s Σ_{i=0}^{P−1} Σ_{j=0}^{P−1} C_i v_j ⟨H_i H_j, H_k⟩
        = Σ_{i=0}^{P−1} u_i ⟨H_i, H_k⟩ + ⟨H_k, 1⟩ u_d(t),   k = 0, 1, ..., P−1,   (9.7)

where ⟨H_i H_j, H_k⟩ is the inner product of H_i H_j and H_k. On the right-hand side (rhs) of (9.7), the inner product is calculated based on H_i and H_k. Notice that ⟨H_k, 1⟩ = 1 when k = 0 and ⟨H_k, 1⟩ = 0 when k ≠ 0. In general, the coefficients of H_i H_j are calculated in (9.5), and the inner product is defined as

    ⟨H_i H_j, H_k⟩ = ∫_{−∞}^{+∞} H_i H_j H_k dξ,                        (9.8)

considering the independence of the Hermite polynomials H_i, H_j, and H_k. The inner product of two polynomials is defined similarly:

    ⟨H_i, H_j⟩ = ∫_{−∞}^{+∞} H_i H_j dξ.                                (9.9)
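These inner products can be tabulated numerically. The sketch below precomputes the triple products ⟨H_i H_j H_k⟩ for the second-order, three-variable basis of the walk-through example with tensor-product Gauss–Hermite quadrature and then assembles the blocks G_kj = Σ_i G_i ⟨H_i H_j H_k⟩ of the augmented matrix described below. The Galerkin normalization by ⟨H_k²⟩ is included here as an assumption so that the blocks match the augmented matrices shown later in this section; the basis ordering, circuit size, and G0/G1 values are illustrative only.

    import numpy as np
    from itertools import product
    from numpy.polynomial.hermite_e import hermegauss

    # Second-order Hermite PC basis in 3 normalized Gaussian variables (P = 10 functions),
    # ordered: 1, x1, x2, x3, x1^2-1, x2^2-1, x3^2-1, x1*x2, x1*x3, x2*x3.
    basis = [
        lambda x: np.ones_like(x[0]),
        lambda x: x[0], lambda x: x[1], lambda x: x[2],
        lambda x: x[0]**2 - 1, lambda x: x[1]**2 - 1, lambda x: x[2]**2 - 1,
        lambda x: x[0]*x[1], lambda x: x[0]*x[2], lambda x: x[1]*x[2],
    ]
    P = len(basis)

    # Tensor-product Gauss-Hermite quadrature for expectations under the 3-D standard Gaussian
    x1d, w1d = hermegauss(8)
    w1d = w1d / np.sqrt(2 * np.pi)
    nodes = np.array(list(product(x1d, repeat=3))).T
    wts = np.prod(np.array(list(product(w1d, repeat=3))), axis=1)

    H = np.array([b(nodes) for b in basis])                      # basis evaluated at all nodes
    norm_k = np.array([np.sum(wts * H[k]**2) for k in range(P)]) # <H_k^2>
    triple = np.einsum('in,jn,kn,n->ijk', H, H, H, wts)          # <H_i H_j H_k>

    # Assemble G_kj = sum_i G_i <H_i H_j H_k> / <H_k^2> (only G_0, G_1 are nonzero here)
    m = 4                                                        # size of a toy original circuit
    G0 = np.diag([2.0, 2.0, 2.0, 2.0]); G1 = 0.1 * G0            # illustrative values
    Gi = {0: G0, 1: G1}
    G_sts = np.zeros((m * P, m * P))
    for k in range(P):
        for j in range(P):
            blk = sum(Gi[i] * triple[i, j, k] for i in Gi) / norm_k[k]
            G_sts[k*m:(k+1)*m, j*m:(j+1)*m] = blk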
The inner products are constants and can be computed a priori and stored in a table for fast computation. Based on the P equations and the orthogonal nature of the Hermite polynomials, these equations can be written in matrix form as

    (G_sts + s C_sts) V = B_sts u_sts,                               (9.10)

    G_sts = [ G_{00}      ⋯  G_{0,P−1}
              ⋮           ⋱  ⋮
              G_{P−1,0}   ⋯  G_{P−1,P−1} ],

    C_sts = [ C_{00}      ⋯  C_{0,P−1}
              ⋮           ⋱  ⋮
              C_{P−1,0}   ⋯  C_{P−1,P−1} ],

    u_sts = [ u0(t) + u_d(t),  u1(t),  ...,  u_{P−1}(t) ]^T,
    V     = [ V0(t),  V1(t),  ...,  V_{P−1}(t) ]^T,                   (9.11)

    B_sts = diag( B0, ..., B_{P−1} ),                                 (9.12)

    B_i = B,     G_{kj} = Σ_{i=0}^{P−1} G_i ⟨H_i H_j, H_k⟩,     C_{kj} = Σ_{i=0}^{P−1} C_i ⟨H_i H_j, H_k⟩,

where G_sts ∈ R^{mP×mP}, C_sts ∈ R^{mP×mP}, B_sts ∈ R^{mP×l}, m is the size of the original circuit, and P is the number of Hermite polynomials. In [214], a PRIMA-like reduction is performed on (9.10) to obtain the reduced variational system.

4.3 Computation of Hermite PCs of Current Moments with Log-Normal Distribution

In this section, we show how to compute the Hermite coefficients of the variational leakage currents and their corresponding moments used in the augmented equation (9.10).

Let u_v^i(t, ξ) be the ith current in the current vector u_v(t, ξ) in (9.1), which is a function of the normalized Gaussian random variables ξ = [ξ_1, ξ_2, ..., ξ_n] and time t:

    u_v^i(t, ξ) ∝ e^{g(t,ξ)} = e^{Σ_{j=0}^{n} g_j(t) ξ_j}.              (9.13)

The leakage current sources therefore follow the log-normal distribution. We can then express u_v^i(t, ξ) in Hermite PC expansion form:

    u_v^i(t, ξ) = Σ_{k=0}^{P} u_{vk}^i(t) H_k^n(ξ)
                = u_{v0}^i(t) ( 1 + Σ_{i=1}^{n} ξ_i g_i(t)
                  + Σ_{i=1}^{n} Σ_{j=1}^{n} [ (ξ_i ξ_j − δ_{ij}) / ⟨(ξ_i ξ_j − δ_{ij})²⟩ ] g_i(t) g_j(t) + ⋯ ),   (9.14)

where

    u_{v0}^i(t) = e^{g_0(t) + (1/2) Σ_{i=1}^{n} g_i(t)²},      P = Σ_{k=0}^{p} (n − 1 + k)! / ( k! (n − 1)! ).   (9.15)

Here, n is the number of random variables and p is the order of the Hermite PC expansion.
As a result, the variational source u(t, ξ) leads to the u_sts in (9.10):

    u_sts = [ u0(t)^T + u_d(t)^T,  u2(t)^T,  ...,  u_{P−1}(t)^T ]^T.     (9.16)

Note that u_d(t) is the deterministic current source vector.
In the EKS method, we need to compute the moments of the input sources in the frequency domain. Suppose (a_{i1}, τ_{i1}), (a_{i2}, τ_{i2}), ..., (a_{i,K+2}, τ_{i,K+2}) are the PWL value-time pairs for u_i(t) or u0(t) + u_d(t) in (9.16). Using (9.3), we can compute the first L moments for each u_i, i = 1, 2, ..., P, in (9.16), and we have

    u_i(s) = m_{u_{i1}} + m_{u_{i2}} s + ⋯ + m_{u_{iL}} s^{L−1},          (9.17)

where m_{u_{ik}} is the kth order moment vector of the Hermite PC coefficients for u_i. In this way, we can compute the moments of the Hermite PC coefficients for every current source.

Input: Augmented system G_sts, C_sts, B_sts, u_sts
Output: The HPC coefficients of the node voltages, v

1  Get the first L moments of u_sts for each current source.
2  Compute the orthogonal basis V of the subspace from (9.10).
3  Obtain the reduced system matrices Ĝ = V^T G_sts V, Ĉ = V^T C_sts V, B̂ = V^T B_sts.
4  Solve Ĝ z(t) + Ĉ dz(t)/dt = B̂ u_sts(t).
5  Project back to the original space: v(t) = V z(t).
6  Compute the variational values (mean, variance) of the specified nodes.

Fig. 9.3 The StoEKS algorithm

4.4 The StoEKS Algorithm

Given G_sts, C_sts, and u_sts in moment form, we can obtain the orthogonal basis V using the EKS algorithm. The reduced system is then obtained with this orthogonal basis V and becomes

    Ĝ_sts z(t) + Ĉ_sts dz(t)/dt = B̂_sts u_sts.                        (9.18)

Here,

    Ĝ_sts = V^T G_sts V,   Ĉ_sts = V^T C_sts V,   B̂_sts = V^T B_sts.    (9.19)

The reduced system can be solved in the time domain by any standard integration algorithm. The solution of the reduced system, z(t), can then be projected back to the original space by ṽ(t) = V z(t).
By solving the augmented equation (9.10), we can obtain the mean and variance of any node voltage v(t) as

    E(v(t)) = E( v0(t) + Σ_{i=1}^{P−1} v_i(t) H_i ) = v0,
    var(v(t)) = var( v0(t) + Σ_{i=1}^{P−1} v_i(t) H_i ) = Σ_{i=1}^{P−1} v_i(t)² var(H_i).

Further, the distribution of v(t) can also be easily calculated from the properties of the Hermite PC and the distributions of ξ_1, ξ_2, ..., ξ_N. Figure 9.3 shows the StoEKS algorithm for given G_sts, C_sts, B_sts, and u_sts.
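The sketch below is a minimal dense-matrix illustration of steps 3–6 of Fig. 9.3: it projects the augmented system, integrates the reduced equations with backward Euler as a standard integration method, projects back, and evaluates the mean and variance formulas above. The function name stoeks_solve, the input callback u_of_t, and the var_H vector are hypothetical conveniences for this example, not part of the original implementation.

    import numpy as np

    def stoeks_solve(G_sts, C_sts, B_sts, V, u_of_t, t_grid, var_H):
        """Steps 3-6 of Fig. 9.3 (a sketch): project, integrate with backward Euler,
        project back, and return per-node mean and variance over time.
        u_of_t(t) returns the stacked HPC input vector at time t; var_H holds var(H_i)."""
        Gh, Ch, Bh = V.T @ G_sts @ V, V.T @ C_sts @ V, V.T @ B_sts
        z = np.zeros(Gh.shape[0])
        P = len(var_H)                         # number of HPC basis functions (var_H[0] = 0 for H_0 = 1)
        m = G_sts.shape[0] // P                # original circuit size
        means, variances = [], []
        for k in range(1, len(t_grid)):
            h = t_grid[k] - t_grid[k - 1]
            # backward Euler: (Gh + Ch/h) z_k = Bh u(t_k) + (Ch/h) z_{k-1}
            z = np.linalg.solve(Gh + Ch / h, Bh @ u_of_t(t_grid[k]) + (Ch / h) @ z)
            v = (V @ z).reshape(P, m)          # HPC coefficients v_i(t_k) of all m nodes
            means.append(v[0])
            variances.append((v[1:]**2 * np.array(var_H)[1:, None]).sum(axis=0))
        return np.array(means), np.array(variances)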

4.5 A Walk-Through Example

In the following, we consider a simple case with only three independent variables to illustrate the method. We assume that there are three independent variables ξ_g, ξ_c, and ξ_I associated with the matrices G and C and the input sources, respectively, in the circuit.
We assume that the variational component in (9.1), u_v(t, ξ_I), follows the log-normal distribution:

    u_v(t, ξ_I) = e^{g(t, ξ_I)},      g(t, ξ_I) = μ_I(t) + σ_I(t) ξ_I.     (9.20)

Then equation (8.3) becomes

    G(ξ_g) v(t) + C(ξ_c) dv(t)/dt = B u(t, ξ_I).                        (9.21)

The variations in width W and thickness T cause variations in the conductance matrix G and the storage (capacitance) matrix C, while the variation in threshold voltage causes variations in the leakage currents u(t, ξ_I). Thus, the resulting system can be written as [47]

    G(ξ_g) = G0 + G1 ξ_g,      C(ξ_c) = C0 + C1 ξ_c.                    (9.22)

G0 and C0 represent the deterministic components of the conductance and capacitance of the wires, and G1 and C1 are the corresponding sensitivity matrices. ξ_g and ξ_c are random variables with normalized Gaussian distributions, representing the process variations of the wire conductances and capacitances, respectively. ξ_I is a normalized Gaussian random variable representing the variation in threshold voltage.
Using the Galerkin-based method as in [107] with second-order Hermite PCs, we end up solving the following equation:

    G_sts v(t) + C_sts dv(t)/dt = B_sts u_sts(t),                        (9.23)

where

    G_sts = [ G0   G1   0    0    0     0    0    0    0    0
              G1   G0   0    0    2G1   0    0    0    0    0
              0    0    G0   0    0     0    0    G1   0    0
              0    0    0    G0   0     0    0    0    G1   0
              0    G1   0    0    G0    0    0    0    0    0
              0    0    0    0    0     G0   0    0    0    0
              0    0    0    0    0     0    G0   0    0    0
              0    0    G1   0    0     0    0    G0   0    0
              0    0    0    G1   0     0    0    0    G0   0
              0    0    0    0    0     0    0    0    0    G0 ]

    C_sts = [ C0   0    C1   0    0     0    0    0    0    0
              0    C0   0    0    0     0    0    C1   0    0
              C1   0    C0   0    0     2C1  0    0    0    0
              0    0    0    C0   0     0    0    0    0    C1
              0    0    0    0    C0    0    0    0    0    0
              0    0    C1   0    0     C0   0    0    0    0
              0    0    0    0    0     0    C0   0    0    0
              0    C1   0    0    0     0    0    C0   0    0
              0    0    0    0    0     0    0    0    C0   0
              0    0    0    C1   0     0    0    0    0    C0 ]

    u_sts(t) = [ u0(t) + u_d(t), 0, 0, u3(t), 0, 0, u6(t), 0, 0, 0 ]^T.

One observation we have is that although the augmented circuit matrices are much
bigger than before, they are very sparse and also consist of repeated coefficient
matrices from the HPC. As a result, the reduction techniques can significantly
improve the simulation efficiency.

4.6 Computational Complexity Analysis

In this subsection, we analyze the computational costs of both the StoEKS and HPC methods and show the theoretical advantage of StoEKS over the non-reduction-based HPC method.
First, if the PCA operation is performed, which essentially applies SVD to the covariance matrix, its computational cost is O(l n²). Here, l is the number of original correlated random variables and n is the number of dominant singular values kept, which is also the number of independent random variables after PCA. Since l is typically much smaller than the circuit size, the running time of PCA is not significant in the total cost.
After we transform the original circuit matrices into the augmented circuit matrices in (9.10), which are still very sparse, the matrix sizes grow from m × m to Pm × Pm, where P is the number of Hermite polynomials used. This number depends on the Hermite polynomial order and the number of variables used, as shown in (2.31).
Typically, solving an n × n sparse linear system takes O(n^α) (typically 1 ≤ α ≤ 1.2 for sparse circuits), and matrix factorization takes O(n^β) (typically 1.1 ≤ β ≤ 1.5 for sparse circuits). For HPC, assuming that we need to compute w time steps in transient analysis (taking w forward and backward substitutions after one LU decomposition), the computing cost is

    O( w (mP)^α + (mP)^β ).                                            (9.24)

For StoEKS, we only need approximately q steps (after one LU decomposition), where q is the order of the reduced model, to compute the projection matrix V. So the total computational cost is

    O( q (mP)^α + (mP)^β + mP q² + q³ + w q² ),                         (9.25)

without counting the cost of the PCA operation (l n²), as we did not perform PCA in our experiments. The last three terms are the costs of performing the reduction (QR operation) and the transient simulation of the reduced circuit (which has very dense matrices) in the time domain. Since q ≪ w, the computing cost of StoEKS can be significantly lower than that of HPC. The presented method can be further improved by using the hierarchical EKS method [11].
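To give a feel for (9.24) versus (9.25), the toy calculation below plugs in illustrative values; all of the numbers are assumptions for this example, not measurements.

    # Illustrative cost comparison of (9.24) and (9.25); all parameter values are made up.
    m, P = 100_000, 10        # original circuit size, number of Hermite polynomials
    w, q = 2_000, 20          # transient time steps, reduced model order
    alpha, beta = 1.1, 1.3    # assumed sparse solve/factorization exponents

    cost_hpc = w * (m * P)**alpha + (m * P)**beta                                       # (9.24)
    cost_stoeks = q * (m * P)**alpha + (m * P)**beta + m * P * q**2 + q**3 + w * q**2   # (9.25)

    print(f"HPC    ~ {cost_hpc:.3e}")
    print(f"StoEKS ~ {cost_stoeks:.3e}  (ratio ~ {cost_hpc / cost_stoeks:.1f}x)")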

5 Numerical Examples

This section describes the simulation results for circuits with both capacitance and conductance variations and leakage current variations. The leakage current variation follows the log-normal distribution; the capacitance and conductance variations follow Gaussian distributions.
All the presented methods have been implemented in Matlab 7.0. All the experimental results were obtained on a Dell PowerEdge 1900 workstation (running Linux) with Intel quad-core Xeon CPUs at 2.99 GHz and 16 GB of memory. To solve large circuits in Matlab, the external linear solver package UMFPACK [184] has been used, linked with Matlab through the Matlab mexFunction interface. The initial results of this chapter were published in [110, 111].
As mentioned in Sect. 4 of Chap. 8, we assume that the random variables used in this chapter for G, C, and the current sources are independent after the PCA transformation.
First, we assume a time-variant leakage model, in which u_v^i(t, ξ) in (9.13) is a function of time t, and we further assume that g_j(t), the standard deviation, is a fixed percentage, say 10%, of u_d(t) in (9.1), i.e., g_i(t) = 0.1 u_di(t), where u_di(t) is the ith component of the PWL current u_d(t).
Figures 9.4–9.6 show the results at one particular node under this configuration. Figure 9.4 shows the node voltage distribution at one node of a ground network with 280 nodes, considering variations in conductance, capacitance, and leakage current (with three random variables). The standard deviation (s.d.) of the log-normal current sources with one Gaussian variable is 0.1 u_di(t). The s.d. of the conductance and capacitance variations is also 0.1 of the mean. The mean and s.d. computed by the Hermite PC method and by Hermite PC with EKS are also marked in the figure and fit very well with the MC results. In Fig. 9.4, the dotted lines are the mean and s.d. calculated by MC, the solid lines are the mean and s.d. from the algorithm of [108], denoted HPC, and the dashed lines are the results from StoEKS. The MC results are obtained with 3,000 samples. The reduced order for EKS is five, q = 5.
[Figure: "Comparison of voltage distribution among three methods with three RVs"; dashed: StoEKS, dotted: Monte Carlo, solid: HPC; x-axis: Voltage (volts), y-axis: Number of occurrences, with μ and μ ± 3δ marked.]
Fig. 9.4 Distribution of the voltage variations in a given node by StoEKS, HPC, and Monte Carlo for a circuit with 280 nodes and three random variables; g_i(t) = 0.1 u_di(t). Reprinted with permission from [110] © 2008 IEEE

Figure 9.5 shows the distribution at one node of a ground network with 2,640 nodes. The parameter g_i(t) is set to the same value as in the circuit with 280 nodes. The s.d. of the conductance variations are 0.02, 0.05, and 0.1 of the mean for three variables, and the s.d. of the capacitance variations are 0.02, 0.02, and 0.1 of the mean for three variables, so there are seven random variables in total. The dotted lines represent the MC results and the dashed lines the results given by StoEKS. From these two figures, we can see only marginal differences between the three methods. The reduced order for EKS is again five, q = 5.
Figure 9.6 shows the distribution at one node of a ground network with 280 nodes, but with a different variation setting. The standard deviations of the conductance variations are set to 0.02, 0.02, 0.03, 0.05, and 0.05 of the mean for five variables, respectively, i.e., their a_i in (9.4) are set to those values. The standard deviations of the capacitance variations are set to 0.02, 0.03, 0.04, 0.05, and 0.05 of the mean for five variables, respectively. The standard deviation of the log-normal current sources is 0.1 of the mean. There are 11 random variables in all, which makes it even harder for HPC to compute the mean and s.d. of the circuit. The dotted lines represent the MC results and the dashed lines the results given by StoEKS. The reduced order for EKS is ten.
Table 9.1 shows the speedup of the StoEKS and HPC methods over the MC method under different numbers of random variables. In the table, #RV is the number of random variables used.
[Figure: "Comparison of voltage distribution among three methods with seven RVs"; dashed: StoEKS, dotted: Monte Carlo, solid: HPC; x-axis: Voltage (volts), y-axis: Number of occurrences, with μ and μ ± 3δ marked.]
Fig. 9.5 Distribution of the voltage variations in a given node by StoEKS, HPC, and MC for a circuit with 2,640 nodes and seven random variables; g_i(t) = 0.1 u_di(t). Reprinted with permission from [110] © 2008 IEEE

There are 3, 7, and 11 random variables in the table. The variation setup for three random variables is the same as for the circuit used in Fig. 9.4, the setup for seven random variables is the same as for the circuit used in Fig. 9.5, and the setup for 11 random variables is the same as for the circuit used in Fig. 9.6. The first speedup column is the speedup of StoEKS over MC, and the second is the speedup of HPC over MC.
From the table, we observe that we cannot obtain results from HPC or MC in reasonable time when the circuit becomes large enough, while StoEKS can deliver all the results.
We remark that intra-die variations are typically strongly spatially correlated [16]. After a transformation such as PCA, the number of variables can be significantly reduced. As a result, we do not assume a large number of variables in our examples.
Tables 9.2 and 9.3 show the mean and s.d. comparison of the different methods against the MC method for several circuits. Again, #RV is the number of random variables used. Table 9.2 contains the values obtained from the different methods, and Table 9.3 presents the error comparison of StoEKS and HPC against Monte Carlo, respectively.
[Figure: "Comparison of voltage distribution between two methods with eleven RVs"; dashed: StoEKS, dotted: Monte Carlo; x-axis: Voltage (volts), y-axis: Number of occurrences, with μ and μ ± 3δ marked.]
Fig. 9.6 Distribution of the voltage variations in a given node by StoEKS and MC for a circuit with 2,640 nodes and 11 random variables; g_i(t) = 0.1 u_di(t). Reprinted with permission from [110] © 2008 IEEE

Table 9.1 CPU time comparison of StoEKS and HPC with the Monte Carlo method; g_i(t) = 0.1 u_di(t)

    #nodes      #RV   MC            StoEKS    Speedup   HPC [108]   Speedup
    280         3     694.35        0.3       2314.5    2.37        292.97
    280         7     671.46        2.37      283.31    227.94      2.94
    280         11    684.88        24.26     28.23     914.34      0.74
    2,640       3     5925.7        4.33      1368.5    55.35       107.1
    2,640       7     5927.6        25.02     236.9     1952.2      3.04
    2,640       11    6042.2        693.27    8.72      –           –
    12,300      3     3.54 × 10^4   21.62     1637.4    298.84      118.5
    12,300      7     3.30 × 10^4   151.71    217.65    –           –
    119,600     3     –             258.21    –         –           –
    119,600     7     –             2074.8    –         –           –
    1,078,800   3     –             1830.4    –         –           –

Table 9.2 Accuracy comparison of the different methods, StoEKS, HPC, and MC; g_i(t) = 0.1 u_di(t)

                       Mean                         Std dev
    #nodes    #RV      MC      StoEKS   HPC         MC       StoEKS   HPC
    280       3        0.047   0.047    0.047       0.0050   0.0048   0.0048
    2,640     3        0.39    0.39     0.39        0.048    0.046    0.046
    12,300    3        1.66    1.66     1.66        0.16     0.17     0.17
    280       7        0.047   0.047    0.047       0.0056   0.0055   0.0055
    2,640     7        0.39    0.39     0.39        0.048    0.046    0.046
    12,300    7        2.56    2.56     –           0.31     0.30     –
    280       11       0.047   0.047    0.047       0.0039   0.0039   0.0040
    2,640     11       0.39    0.39     –           0.033    0.033    –

Table 9.3 Error comparison of StoEKS and HPC against the Monte Carlo method; g_i(t) = 0.1 u_di(t)

    #nodes    #RV    StoEKS % error in μ   HPC % error in μ   StoEKS % error in σ   HPC % error in σ
    280       3      0.19                  0.28               3.14                  3.10
    2,640     3      1.23                  1.05               4.31                  4.51
    12,300    3      0.10                  0.08               2.95                  2.98
    280       7      0.063                 0.17               1.12                  1.54
    2,640     7      0.076                 0.11               4.18                  4.60
    12,300    7      0.23                  –                  0.23                  –
    280       11     0.42                  0.21               0.18                  0.52
    2,640     11     0.18                  –                  0.30                  –

[Figure: "A PWL current source at one node"; x-axis: time (s), ×10⁻⁷ scale; y-axis: current (Amps).]
Fig. 9.7 A PWL current source at a certain node. Reprinted with permission from [110] © 2008 IEEE
[Figure: "Comparison of voltage distribution among three methods with three RVs"; curves for StoEKS, HPC, and Monte Carlo; x-axis: Voltage (volts), y-axis: Number of occurrences, with μ and μ ± 3δ marked.]
Fig. 9.8 Distribution of the voltage variations in a given node by StoEKS, HPC, and Monte Carlo for a circuit with 280 nodes and three random variables, using the time-invariant leakage model; g_i = 0.1 I_p. Reprinted with permission from [110] © 2008 IEEE

We can see that StoEKS shows only marginal differences from MC, while it is able to simulate much larger circuits than the existing HPC method on the same platform.
Finally, we use a time-invariant leakage model, in which u_v^i(ξ) in (9.13) is not a function of time t, and we further assume that g_j, the standard deviation, is a fixed percentage of a constant current value in (9.1). In our test cases, we use the peak current, I_p ≈ 41 mA as shown in Fig. 9.7, as the constant value. Figure 9.8 shows the results in this configuration.

6 Summary

In this chapter, we have presented a fast stochastic method for analyzing the voltage drop variations of on-chip power grid networks. The new method, called StoEKS, applies HPC to represent the random variables in both the power grid networks and the input leakage currents with log-normal distribution. This HPC method transforms a statistical analysis problem into a deterministic analysis problem in which enlarged, augmented circuit matrices are created. The augmented circuit matrices consist of the coefficients of the Hermite polynomials representing both the variational parameters in the circuit matrices and the input sources. We then applied the EKS method to compute the variational responses from the augmented circuit equations. The presented method does not require any sampling operations, as used by collocation-based spectral stochastic analysis methods. Numerical examples have shown that the presented method is about two orders of magnitude faster than the existing Hermite PC-based simulation method and several more orders of magnitude faster than the MC method, with marginal errors. StoEKS also increases the analysis capacity of the statistical simulation methods based on the spectral stochastic method presented in Chap. 8.
Chapter 10
Statistical Power Grid Analysis by Variational
Subspace Method

1 Introduction

In this chapter, we present a novel scalable statistical simulation approach for large power grid network analysis considering process variations [92]. The new algorithm is very scalable for large networks with a large number of random variables. Our work is inspired by the recent work on variational model order reduction using a fast balanced truncation method (called the variational Poor Man's TBR method, or varPMTBR [134]). The new method, called varETBR, is based on the recently proposed ETBR method [93, 94]. To consider the variational parameters, we extend the concept of the response Gramian, which was used in ETBR to compute the reduction projection subspace, to a variational response Gramian. MC-based numerical integration is then employed to compute the resulting multidimensional integrals. Different from traditional reduction approaches, varETBR calculates the variational response Gramians, considering both system and input source variations, to generate the projection subspace. In this way, a much more efficient reduction can be performed for interconnects with massive terminals, like power grid networks [177]. Furthermore, the new method is based on the globally more accurate balanced truncation reduction method instead of the less accurate Krylov subspace method used in EKS/IEKS [89, 191]. After the reduction, MC-based statistical simulation is performed on the reduced system, and the statistical responses of the original system are obtained thereafter. varETBR only requires the simulation of the reduced circuit using any existing transient analysis method. It is insensitive to the number of variables and the variation ranges in terms of computing cost and accuracy, which makes it very general and scalable. Numerical results on a number of the IBM benchmark circuits [123], with up to 1.6 million nodes, show that varETBR can be up to 1,900× faster than the MC method and is much more scalable than the StoEKS method [110, 111]. varETBR can solve very large power grid networks with large numbers of random variables, large variation ranges, and different variational distributions.


The rest of this chapter is organized as follows: Sect. 2 reviews the EKS method and fast balanced truncation methods. Our new variational analysis method, varETBR, is presented in Sect. 3. Section 4 shows the experimental results, and Sect. 5 concludes this chapter.

2 Review of Fast Truncated Balanced Realization Methods

2.1 Standard Truncated Balanced Realization Methods

The truncated balanced realization (TBR)-based reduction method has two steps in the reduction process: the balancing step transforms the states so that they are equally controllable and observable, and the truncating step then discards the weak states, which usually leads to much smaller models. The major advantage of the TBR method is its ability to give a deterministic global bound on the approximation error as well as to provide nearly optimal models in terms of error and model size.
Given a system in standard state-space form,

    ẋ(t) = A x(t) + B u(t),
    y(t) = C x(t),                                                   (10.1)

where A ∈ R^{n×n}, B ∈ R^{n×p}, C ∈ R^{p×n}, and y(t), u(t) ∈ R^{p}, the controllable and observable Gramians are the unique symmetric positive definite solutions of the Lyapunov equations

    A X + X A^T + B B^T = 0,
    A^T Y + Y A + C^T C = 0.                                         (10.2)

Since the eigenvalues of the product X Y are invariant under a similarity transformation, we can perform a similarity transformation (A_b = T^{-1} A T, B_b = T^{-1} B, C_b = C T) to diagonalize the product X Y such that

    T^{-1} X Y T = Σ = diag(σ_1², σ_2², ..., σ_n²),                   (10.3)

where T is the transformation matrix and the Hankel singular values of the system, σ_k, are arranged in descending order. If we partition the matrices as

    [ W_1^T ]  X Y  [ V_1  V_2 ] = [ Σ_1  0
    [ W_2^T ]                       0    Σ_2 ],                       (10.4)

where Σ_1 = diag(σ_1², σ_2², ..., σ_r²) contains the first r largest eigenvalues of the Gramian product X Y, and W_1 and V_1 are the corresponding eigenvectors, then a reduced model can be obtained as

    ẋ(t) = A_r x(t) + B_r u(t),
    y(t) = C_r x(t),                                                  (10.5)

where A_r = W_1^T A V_1, B_r = W_1^T B, and C_r = C V_1. One of the most desirable features of the TBR method is its proved error bound: the error in the transfer function of the order-r approximation is bounded by 2 Σ_{k=r+1}^{N} σ_k [50, 112]. In the TBR procedure, the computational cost is dominated by solving the Lyapunov equations, with complexity O(n³), which makes it too expensive to apply to large problem sizes.
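For a small system, the standard TBR ingredients above can be computed directly with dense linear algebra. The sketch below (a minimal illustration; the function name hankel_singular_values and the tiny test system are hypothetical) solves the two Lyapunov equations in (10.2) with SciPy and returns the Hankel singular values σ_k = sqrt(eig(X Y)).

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov, eig

    def hankel_singular_values(A, B, C):
        """Solve the Lyapunov equations (10.2) and return the Hankel singular values.
        solve_continuous_lyapunov solves A P + P A^H = Q, hence the minus signs."""
        X = solve_continuous_lyapunov(A, -B @ B.T)      # controllability Gramian
        Y = solve_continuous_lyapunov(A.T, -C.T @ C)    # observability Gramian
        ev = np.real(eig(X @ Y, right=False))
        return np.sqrt(np.sort(np.abs(ev))[::-1])

    # Tiny stable example (values are illustrative)
    A = np.array([[-1.0, 0.1], [0.0, -2.0]])
    B = np.array([[1.0], [1.0]])
    C = np.array([[1.0, 0.5]])
    print(hankel_singular_values(A, B, C))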

2.2 Fast and Approximate TBR Methods

The TBR method generally suffers from high computational cost, as it needs to solve the expensive Lyapunov equations (10.2). To mitigate this problem, fast TBR methods [134, 196] have been proposed recently, which compute approximate Gramians. The Poor Man's TBR method, or PMTBR [134], was proposed for variational interconnect modeling.
Specifically, the Gramian X can be computed in the time domain as

    X = ∫_{0}^{∞} e^{At} B B^T e^{A^T t} dt.                          (10.6)

From Parseval's theorem, and the fact that the Laplace transform of e^{At} is (sI − A)^{-1}, the Gramian X can also be computed in the frequency domain as

    X = ∫_{−∞}^{+∞} (jωI − A)^{-1} B B^T (jωI − A)^{-H} dω,            (10.7)

where the superscript H denotes the Hermitian transpose. Let ω_k be the kth sampling point. If we define

    z_k = (jω_k I − A)^{-1} B,                                        (10.8)

then, based on a numerical quadrature rule, X can be approximated as [134]

    X̂ = Σ_k w_k z_k z_k^H = Z W² Z^H,                                 (10.9)

where Z = [z_1, z_2, ..., z_n] and W is a diagonal matrix with diagonal entries w_{kk} = √w_k, with w_k coming from a specific numerical quadrature method. Since X̂ is symmetric, it is orthogonally diagonalizable:

    V̂^T X̂ V̂ = [ V̂_1^T ]  X̂  [ V̂_1  V̂_2 ] = [ Σ̂_1  0
              [ V̂_2^T ]                      0    Σ̂_2 ],              (10.10)

where V̂^T V̂ = I. V̂ converges to the eigenspace of X, and the dominant eigenvectors V̂_1 can be used as the projection matrix in a model reduction approach (A_r = V̂_1^T A V̂_1, B_r = V̂_1^T B).

2.3 Statistical Reduction by Variational TBR

In [134], PMTBR was extended to reduce interconnect circuits with variational parameters. The idea is that the computation of the Gramian in (10.7) can be viewed as computing the mean of (jωI − A)^{-1} B B^T (jωI − A)^{-H} with respect to the statistical variable ω, the frequency. If there are more statistical parameters, the Gramian can still be viewed as a mean computation, but over all the variables (including the frequency variable).
In the fast TBR framework, computing the Gramian (10.7) is essentially a one-dimensional integral with respect to the complex frequency ω. When multiple variables with specific distributions are considered, a multidimensional integral with respect to the random variables has to be computed. As in PMTBR, the MC method is still employed in variational TBR to compute this multidimensional integral.
One important observation in varPMTBR is that the number of samples needed to build the subspace is much smaller than the number of general MC samples needed for the same accuracy. As a result, varPMTBR is much faster than the brute-force Monte Carlo method, and its cost is much less sensitive to the number of random variables and the variation ranges, which makes it much more efficient than existing variational or parameterized model order reduction methods [208].

3 The Presented Variational Analysis Method: varETBR

In this section, we detail the presented varETBR method. We first review the recently proposed ETBR method for deterministic power grid analysis based on reduction techniques.

3.1 Extended Truncated Balanced Realization Scheme

The presented method is based on the recently proposed ETBR method [93], which we review first. For a linear system as in (8.2), we define the frequency-domain response Gramian

    X_r = ∫_{−∞}^{+∞} (jωC + G)^{-1} B u(jω) u^T(jω) B^T (jωC + G)^{-H} dω,   (10.11)

which is different from the Gramian concept in the traditional TBR-based reduction framework. Notice that in this new Gramian definition, the input signals u(jω) are considered. As a result, (jωC + G)^{-1} B u(jω) serves as the system response with respect to the input signal u(jω), and the resulting X_r becomes a response Gramian.

Fig. 10.1 Flow of ETBR

To compute the response Gramian X_r quickly, we can use an MC-based method to estimate its numerical value, as done in [134]. Specifically, let ω_k be the kth sampling point over the frequency range. If we further define

    z_k^r = (jω_k C + G)^{-1} B u(jω_k),                              (10.12)

then X̂_r can be computed approximately by numerical quadrature methods:

    X̂_r = Σ_k w_k z_k^r (z_k^r)^H = Z_r W² Z_r^H,                      (10.13)

where Z_r is a matrix whose columns are the z_k^r and W is a diagonal matrix with diagonal entries w_{kk} = √w_k, with w_k coming from a specific quadrature method.
The projection matrix can be obtained by the singular value decomposition (SVD) of Z_r. After this, we can reduce the original matrices into small ones and then perform the transient analysis on the reduced circuit matrices. The ETBR algorithm is summarized in Fig. 10.1.
Notice that we need the frequency-domain input signal u(jω_k) in (10.12). This can be obtained by applying the FFT to the input signals in the time domain. Using frequency spectrum representations for the input signals is a significant improvement over the EKS method, as we avoid the explicit moment representation of the current sources, which is not accurate for currents rich in high-frequency components due to the well-known problems of explicit moment matching methods [137]. Accuracy is also improved owing to the use of the fast balanced truncation method for the reduction, which has global accuracy [112, 134].
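A compact sketch of this ETBR flow is given below (illustrative only, not the authors' implementation; the matrices G, C, B are assumed dense and small, u_t holds the time-domain input sources row-wise on a uniform grid with spacing dt, and the frequency samples are picked logarithmically for simplicity):

```python
import numpy as np

def etbr_reduce(G, C, B, u_t, dt, q, order):
    """ETBR sketch: sample the response Gramian (10.13) at q frequencies
    and build the projection matrix by SVD of Z_r."""
    U_f = np.fft.rfft(u_t, axis=1) * dt                 # input spectra (FFT)
    freqs = 2j * np.pi * np.fft.rfftfreq(u_t.shape[1], d=dt)
    # roughly logarithmically spaced frequency samples
    idx = np.unique(np.logspace(0, np.log10(len(freqs) - 1), q).astype(int))
    cols = [np.linalg.solve(s * C + G, B @ U_f[:, k])   # z_k^r in (10.12)
            for k, s in zip(idx, freqs[idx])]
    Zr = np.column_stack(cols)
    U, _, _ = np.linalg.svd(np.hstack([Zr.real, Zr.imag]), full_matrices=False)
    Vr = U[:, :order]
    return Vr, Vr.T @ G @ Vr, Vr.T @ C @ Vr, Vr.T @ B   # Vr, Gr, Cr, Br
```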

Note that because we use a congruence transformation with orthonormal columns in the projection matrix (obtained by an Arnoldi or Arnoldi-like process) for the reduction, the reduced system is guaranteed to be stable. For simulation purposes, this is sufficient. If all the observable ports are also the current source nodes, i.e., y(t) = B^T v(t), where y(t) is the voltage vector at all observable ports, the reduced system is also passive. It was also shown in [134] that the fast TBR method has time complexity similar to multiple-point Krylov-subspace-based reduction methods. The extended TBR method likewise has computational costs similar to those of the EKS method.

3.2 The Presented Variational ETBR Method

We first present a new statistical interpretation of the Gramian computation before introducing the presented method.

3.2.1 Statistical Interpretation of Gramian

For a linear dynamic system formulated in state-space (MNA) equations as in (8.2), if the complex frequency jω is treated as a random variable with uniform distribution over the frequency domain, then the state responses V(jω) = (G + jωC)^{-1} B u(jω) become random variables in the frequency domain. Their covariance matrix can be computed as

X_r = E\{V(j\omega) V(j\omega)^T\} = \int_{-\infty}^{+\infty} V(j\omega) V(j\omega)^T\, d\omega,    (10.14)

where E{x} stands for the mean of the random variable x, and X_r is defined in (10.11). The response Gramian can thus be viewed as the covariance matrix associated with the state responses. X_r can also be interpreted as the mean of the function P(jω) over the uniformly distributed random variable jω on [−∞, +∞].¹ The ETBR method actually performs the PCA transformation of the mentioned random process with uniform distribution.

3.2.2 Computation of Variational Response Gramian

Define P(jω) = V(jω)V(jω)^T. Now suppose that, in addition to the frequency variable jω, P(jω, λ) is also a function of a random variable λ with probability density f(λ).¹ The new variational response Gramian X_{vr} can be defined as

X_{vr} = \int_{\Omega_\lambda} \int_{-\infty}^{+\infty} f(\lambda)\, P(j\omega, \lambda)\, d\omega\, d\lambda = E\{P(j\omega, \lambda)\},    (10.15)

where Ω_λ is the domain of the variable λ with a specific distribution. Hence, X_{vr} is essentially the mean of P(jω, λ) with respect to both jω and λ. The concept can be extended to more random variables λ = [λ_1, λ_2, ..., λ_n], where each variable λ_i adds one more dimension of integration to the integral.

¹ Practically, the interesting frequency range is always bounded.
As a result, calculating the variational Gramian is equivalent to computing the multidimensional integral in (10.15), which can be evaluated by numerical quadrature methods. For one-dimensional integration, efficient methods such as the Gaussian quadrature rule [173] exist. For multidimensional integrals, quadrature points are created by taking tensor products of one-dimensional quadrature points; these, unfortunately, grow exponentially with the number of variables (dimensions) and make the integration intractable for practical problems [165].
Practically, established techniques like MC or quasi-MC are more amenable to computing such integrals [173], as their computational cost does not depend on the number of variables (integral dimensions). In this chapter, we apply the standard MC method to compute the variational Gramian X_{vr}. The MC estimation of (10.15) consists of sampling N random points x_i ∈ S, where S is the joint domain of the frequency and the other variables, from a uniform distribution, and then computing the estimate as

\hat{X}_{vr} = \frac{1}{N} \sum_{i=1}^{N} P(x_i).    (10.16)
The MC method has a slow convergence rate (1/\sqrt{N}) in general, although it can be improved to 1/N by quasi-MC methods. But as observed by Phillips [134], the projection subspace constructed from the sampled points actually converges much faster than the value of \hat{X}_{vr}. As we are concerned with the projection subspace rather than the actual numerical values of X_{vr}, we only need to draw a small number of samples, as shown in the experimental results. The varETBR algorithm flow is shown in Fig. 10.2, where \hat{G}(\lambda) = V_r^T G(\lambda) V_r and \hat{C}(\lambda) = V_r^T C(\lambda) V_r stand for

\hat{G}(\lambda) = V_r^T G_0 V_r + V_r^T G_1 V_r \lambda_1 + \cdots + V_r^T G_M V_r \lambda_M,    (10.17)
\hat{C}(\lambda) = V_r^T C_0 V_r + V_r^T C_1 V_r \lambda_1 + \cdots + V_r^T C_M V_r \lambda_M.    (10.18)

The algorithm starts with the given power grid network and the number of samplings q used for building the projection subspace. It then computes the variational responses z^r_k = [s_k C(\lambda_1^k, \ldots, \lambda_M^k) + G(\lambda_1^k, \ldots, \lambda_M^k)]^{-1} B\, u(s_k, \lambda_1^k, \ldots, \lambda_M^k) at randomly chosen sampling points. Next, we perform the SVD on Z_r = [z^r_1, z^r_2, \ldots, z^r_q] to construct the projection matrix. After the reduction, we perform the MC-based statistical analysis and obtain the variational responses from v(t) = V_r \hat{v}(t).
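The varETBR flow of Fig. 10.2 can be sketched as follows (again illustrative, not the authors' code; it assumes the affine models G(λ) = G_0 + Σ_i λ_i G_i and C(λ) = C_0 + Σ_i λ_i C_i are given as lists, and u_func is a hypothetical helper that returns the sampled input spectrum for one random point):

```python
import numpy as np

def varetbr_subspace(G_list, C_list, B, u_func, s_samples, lam_samples, order):
    """varETBR sketch: sample z_k^r over random (s, lambda) points, then SVD.
    G_list = [G0, G1, ..., GM], C_list = [C0, C1, ..., CM] (affine models)."""
    cols = []
    for s, lam in zip(s_samples, lam_samples):
        G = G_list[0] + sum(l * Gi for l, Gi in zip(lam, G_list[1:]))
        C = C_list[0] + sum(l * Ci for l, Ci in zip(lam, C_list[1:]))
        cols.append(np.linalg.solve(s * C + G, B @ u_func(s, lam)))
    Zr = np.column_stack(cols)
    U, _, _ = np.linalg.svd(np.hstack([Zr.real, Zr.imag]), full_matrices=False)
    Vr = U[:, :order]
    # reduced affine models, as in (10.17)-(10.18)
    Gr = [Vr.T @ Gi @ Vr for Gi in G_list]
    Cr = [Vr.T @ Ci @ Vr for Ci in C_list]
    return Vr, Gr, Cr, Vr.T @ B
```

The reduced MC stage then draws fresh λ samples, assembles \hat{G}(\lambda) and \hat{C}(\lambda) from the returned reduced matrices as in (10.17)-(10.18), runs the transient analysis on the small system, and maps the result back through v(t) = V_r \hat{v}(t).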

Fig. 10.2 Flow of varETBR

We remark that in both Algorithm 10.1 and Algorithm 10.2, we perform MC-like random sampling to obtain q sampling points over the (M+1)-dimensional space spanned by the given frequency range and parameter spaces (for Algorithm 10.1, sampling is performed over the frequency range only). We note that MC-based sampling is also used in the PMTBR method [134].
Compared with existing approaches, varETBR offers several advantages. First, since varETBR only uses MC sampling, it is easy to implement and very general in dealing with different variation distributions and large variation ranges. It is also amenable to parallel computing, as each sampling in the frequency domain can be done in parallel. Second, it is very scalable for solving large networks with a large number of variables, because reduction is performed. Third, varETBR is more accurate over wide-band frequency ranges, as it samples over the frequency band (in contrast with the less accurate moment-matching-based EKS method). Last, it avoids the explicit moment representation of the input signals, leading to more accurate results than the EKS method when the signals are rich in high-frequency components.

4 Numerical Examples

The varETBR algorithm has been implemented using Matlab and tested on an Intel
quad-core workstation with 16 GB memory under Linux environment. The initial
results of this chapter were published in [91, 92].

Table 10.1 Power grid (PG) benchmarks

Name        # of nodes    # of V sources    # of I sources
ibmpg1          30,638            14,308            10,774
ibmpg2         127,238               330            37,926
ibmpg3         851,584               955           201,054
ibmpg4         953,583               962           276,976
ibmpg5       1,079,310           539,087           540,800
ibmpg6       1,670,494           836,239           761,484

All the benchmarks are real PG circuits from IBM provided by [123], but the circuits in [123] are resistor-only circuits. For transient analysis, we need to add capacitors and transient input waveforms, so we modified the benchmark circuits. First, we added one grounded capacitor to each node, with a random value on the order of picofarads. Second, we replaced the DC current sources in the benchmarks with PWL signals whose values are randomly generated based on the original values in the DC benchmarks. We implemented a parser in Python to transform the SPICE-format benchmarks into Matlab format.
The summary of our transient PG benchmarks is shown in Table 10.1. We use the MNA formulation to set up the circuit matrices. To efficiently solve PG circuits with 1.6 million nodes in Matlab, the external linear solver package UMFPACK [184] is used, linked to Matlab through a mexFunction interface.
We will compare varETBR with the MC method, first in accuracy and then in CPU times. In all the test cases, the number of samples used for forming the subspace in varETBR is 50, based on our experience. The reduced order is set to p = 10, which is sufficiently accurate in practice. Here we set the variation range, defined as the ratio of the maximum variation value to the nominal value, to 10% and set the number of variables to 6 (2 for G, 2 for C, and 2 for i). G(λ) and C(λ) follow a Gaussian distribution; i(t, λ), which models the leakage variations [39], follows a log-normal distribution.
varETBR is essentially a kind of reduced MC method. It inherits the merits of MC methods, which are less sensitive to the number of variables and can reflect the real distribution very accurately given a sufficient number of samples. The main disadvantage of MC, however, is that it is too slow for large-scale circuits. varETBR first reduces the circuit to a small size while maintaining sufficient accuracy, so the MC simulation on the reduced circuit is very fast. Note that the reduction process is performed only once during the simulation.
To verify the accuracy of the varETBR method, we show the results of simulations on ibmpg1 (100 samples) and ibmpg6 (10 samples). Figures 10.3 and 10.4 show the results of varETBR and the pure MC method at the 1,000th node (named n1_20583_11663 in SPICE format) of ibmpg1 and at the 1,000th node (named n3_16800_9178400 in SPICE format) of ibmpg6, respectively. The circuit equations in MC are solved by Matlab.
The absolute errors and relative errors of ibmpg1 and ibmpg6 are shown in Figs. 10.5 and 10.6. We can see that the errors are very small and our varETBR is very accurate.

Fig. 10.3 Transient waveforms (voltage vs. time, varETBR and Monte Carlo) at the 1,000th node (n1_20583_11663) of ibmpg1 (p = 10, 100 samples). Reprinted with permission from [91] © 2010 Elsevier

Fig. 10.4 Transient waveforms (voltage vs. time, varETBR and Monte Carlo) at the 1,000th node (n3_16800_9178400) of ibmpg6 (p = 10, 10 samples). Reprinted with permission from [91] © 2010 Elsevier

Fig. 10.5 Simulation errors (absolute voltage error vs. time) of (a) ibmpg1 (100 samples) and (b) ibmpg6 (10 samples). Reprinted with permission from [91] © 2010 Elsevier

Fig. 10.6 Relative errors (percentage vs. time) of (a) ibmpg1 (100 samples) and (b) ibmpg6 (10 samples). Reprinted with permission from [91] © 2010 Elsevier

Note that the errors are influenced not only by the variations but also by the reduced order. To increase the accuracy, we may increase the reduced order; in our tests, we set the reduced order to p = 10 for all the benchmarks.
Next, we compare accuracy with MC on the probability distributions, including means and variances. Figure 10.7 shows the voltage distributions of both varETBR and the original MC at the 1,000th node of ibmpg1 at t = 50 ns (200 time steps between 0 ns and 200 ns in total); the corresponding simulation waveforms at t = 50 ns can be seen in Fig. 10.3. Note that the results do not follow a Gaussian distribution, since G(λ) and C(λ) follow a Gaussian distribution while i(t, λ) follows a log-normal distribution. From Fig. 10.7, we can see that not only the means and variances of varETBR and MC are almost the same, but so are their probability distributions.

Fig. 10.7 Voltage distribution (number of events vs. voltage, with μ−3σ, μ, and μ+3σ marked) for Monte Carlo and varETBR at the 1,000th node of ibmpg1 (10,000 samples) at t = 50 ns. Reprinted with permission from [91] © 2010 Elsevier

Table 10.2 CPU times (s) comparison of varETBR and Monte Carlo (q = 50, p = 10)

Test Ckts          varETBR Red. (s)   varETBR Sim. (s)   Monte Carlo Sim. (s)
ibmpg1 (100)                     23                 14                    739
ibmpg1 (10,000)                  23               1335                 70,719
ibmpg2 (10)                     115                1.4                    536
ibmpg3 (10)                    1879                1.5                   4973
ibmpg4 (10)                    2130                1.3                   5275
ibmpg5 (10)                    1439                1.3                   5130
ibmpg6 (10)                    1957                1.5                   6774

Finally, we compare the CPU times of varETBR and the pure Monte Carlo method. To verify the efficiency of varETBR in both CPU time and memory, we do not need to run many simulations for either varETBR or MC: since the accuracy has already been shown, running 10 or 100 samples per benchmark is enough to demonstrate the efficiency, and the speedup remains the same. Table 10.2 shows the actual CPU times of both varETBR (including FFT costs) and MC on the given set of circuits, with q = 50 sampling points in the reduction and reduced order p = 10. Table 10.3 shows the projected CPU times of varETBR (one-time reduction plus 10,000 simulations) and MC (10,000 samples).
In varETBR, the circuit model becomes much smaller after reduction, and the reduction needs to be performed only once. Therefore, the total time is much smaller than that of

Table 10.3 Projected CPU times (s) comparison of varETBR and Monte Carlo (q = 50, p = 10, 10,000 samples)

Test Ckts   varETBR (s)   Monte Carlo (s)   Speedup
ibmpg1             1358            70,719        53
ibmpg2             1515           536,000       354
ibmpg3             3379         4,973,000      1472
ibmpg4             3430         5,275,000      1538
ibmpg5             2739         5,130,000      1873
ibmpg6             3457         6,774,000      1960

Table 10.4 Relative errors for the mean of the max voltage drop of varETBR compared with Monte Carlo on the 2,000th node of ibmpg1 (q = 50, p = 10, 10,000 samples) for different variation ranges and numbers of variables

                                Variation range
#Variables   var = 10%   var = 30%   var = 50%   var = 100%
M = 6           0.16%       0.08%       0.17%       0.21%
M = 9           0.16%       0.25%       0.08%       0.23%
M = 12          0.25%       0.07%       0.07%       0.28%
M = 15          0.15%       0.06%       0.05%       0.06%

Table 10.5 Relative errors for the variance of the max voltage drop of varETBR compared with Monte Carlo on the 2,000th node of ibmpg1 (q = 50, p = 10, 10,000 samples) for different variation ranges and numbers of variables

                                Variation range
#Variables   var = 10%   var = 30%   var = 50%   var = 100%
M = 6           0.27%       1.54%       1.38%       1.73%
M = 9           0.25%       0.67%       1.32%       1.27%
M = 12          0.42%       0.07%       0.68%       1.41%
M = 15          0.18%       1.11%       0.67%       2.14%

MC (varETBR is up to 1,960× faster). Basically, the larger the original circuit, the larger the speedup of varETBR. Moreover, the reduction time is negligible compared with the total simulation time.
Note that we ran the random simulation 10,000 times for ibmpg1, as shown in Table 10.2, to show the efficiency of varETBR in practice.
It can be seen that varETBR is very scalable: in practice, it is almost independent of the variation range and the number of variables. One possible reason is that varETBR already captures the most dominant subspaces even with a small number of samples (50 in our case), as explained in Sect. 3.
When we increase the variation range and the number of variables, the accuracy of varETBR remains almost unchanged. Tables 10.4 and 10.5 show the mean and variance comparison between the two methods for 10 K MC runs, where we increase the number of variables from 6 to 15 and the variation range from 10% to 100%. The tables show that varETBR is very insensitive to the number of variables and

Table 10.6 CPU times (s) comparison of StoEKS and varETBR (q = 50, p = 10) with 10,000 samples for different numbers of variables

              M = 5                 M = 7                 M = 9
Test Ckts     StoEKS   varETBR      StoEKS   varETBR      StoEKS   varETBR
ibmpg1           165      1315         572      1338        3748      1326
ibmpg2          1458      1387           –      1351           –      1377

variation range for the given circuit ibmpg1, where simulations are run on 10,000 samples for both varETBR (q = 50, p = 10) and MC.
The variation range var is the ratio of the maximum variation value to the nominal value, so var = 100% means the maximum variation may be as large as the nominal value.
From Tables 10.4 and 10.5, we observe that varETBR is essentially insensitive to the number of variables and the variation range. Here we use the same sampling size (q = 50) and reduced order (p = 10) for all combinations of the number of variables and the variation range, and the computation cost of varETBR is almost the same for the different numbers of variables and variation ranges. This is consistent with the observation in PMTBR [134]. One explanation for this insensitivity is that the subspace obtained even with a small number of samplings already contains the dominant response Gramian subspaces over the wide parameter and frequency ranges.
Finally, to demonstrate the efficiency of varETBR, we compare it with a recently proposed similar approach, the StoEKS method, which employs Krylov subspace reduction with orthogonal polynomials [111], on the same suite of IBM circuits. Table 10.6 shows the comparison results, where "–" means an out-of-memory error. StoEKS can only finish the smaller circuits ibmpg1 (30 k) and ibmpg2 (120 k), while varETBR goes through all the benchmarks (up to 1.6 M nodes) easily. The CPU time of StoEKS increases rapidly with the variable count, and the method fails to complete as the count grows. For varETBR, the CPU time is independent of the number of variables and depends only on the reduced order and the number of samples used in the reduced MC simulation. Here we select the reduced order p = 10 and 10,000 samples, which are sufficient in practice to obtain an accurate probability distribution.

5 Summary

In this chapter, we have presented a new scalable statistical power grid analysis approach based on ETBR reduction techniques. The new method, called varETBR, performs reduction on the original system using variation-bearing subspaces before the MC statistical transient simulation. Different from the varPMTBR method, both system and input source variations are considered when generating the projection subspace: variational response Gramians are sampled to perform the reduction. As a result, varETBR can reduce systems with many terminals, like power grid networks, while preserving the variational information. After the reduction, MC-based statistical simulation is performed on the reduced system to obtain the statistical responses of the original system. Numerical examples show that varETBR can be about 1,900× faster than the MC method and scales to very large power grid networks with large numbers of random variables and large variation ranges. varETBR is also much more scalable than StoEKS [111] on the IBM benchmark circuits.
Part IV
Statistical Interconnect Modeling
and Extractions
Chapter 11
Statistical Capacitance Modeling and Extraction

1 Introduction

It is well accepted that process-induced variability has a huge impact on circuit performance in sub-100 nm VLSI technologies [120, 121]. Process variations have to be assessed in various VLSI design steps to ensure robust circuit design. They consist of both systematic variations, which depend on patterns and other process parameters, and random variations, which have to be dealt with using stochastic approaches. Efficient capacitance extraction approaches based on the boundary element method (BEM), such as FastCap [115], HiCap [164], and PHiCap [199], have been proposed in the past. To account for the variation impacts on the interconnects, one has to consider the RLC extraction of the three-dimensional structures modeling the interconnect conductors. In this chapter, we investigate the impacts of geometry variations on the extracted capacitance.
Statistical extraction of capacitance considering process variations has been studied recently, and several approaches have been proposed [74, 87, 207, 208, 210] under different variational models. The method in [87] uses analytical formulas to consider the variations in capacitance extraction, but it has only first-order accuracy. The FastSies program considers the rough-surface effects of the interconnect conductors [210]; it assumes only Gaussian distributions and has high computational costs. The method in [74] combines hierarchical extraction with PFA to solve the statistical capacitance extraction problem.
Recently, a capacitance extraction method using a collocation-based spectral stochastic method was proposed [205, 208]. This approach is based on the Hermite PC representation of the variational capacitance. It applies the numerical quadrature (collocation) method to compute the coefficients of the extracted capacitance in Hermite polynomial form, where the capacitance extraction process (solving the potential coefficient system) is performed many times (sampling). One of the major problems with this method is that many redundant operations are carried out (such as the setup of the potential coefficient matrices for each sampling, which

corresponds to solving one particular extraction problem). For second-order Hermite polynomials, the number of samplings is O(m²), where m is the number of variables. So if m is large, the approach loses its efficiency compared with the Monte Carlo method.
In this chapter, instead of using the numerical quadrature method, we use a different spectral stochastic method based on the Galerkin scheme. The Galerkin-based spectral stochastic method has been applied in the past to statistical interconnect modeling [35, 187] and to on-chip power grid analysis considering process variations [109-111]. The presented method, called StatCap [156], first transforms the original stochastic potential coefficient equations into a larger, deterministic system (via the Galerkin-based method) and then solves it using an iterative method. It avoids the less efficient sampling process of the existing collocation-based extraction approach. As a result, the potential coefficient equations and the corresponding augmented system only need to be set up once, versus many times in the collocation-based sampling method, which leads to a significant saving in CPU time. Also, the augmented potential coefficient system is sparse, symmetric, and low rank, which is further exploited by an iterative solver to gain extra speedup. To consider second-order effects, we derive the closed-form OPC for the capacitance integral equations directly in terms of the variational variables, without loss of speed compared with the linear model. Numerical examples show that the presented method, with first-order and second-order effects, can deliver two orders of magnitude speedup over the collocation-based spectral stochastic method and many orders of magnitude speedup over the MC method.
The highlights of the presented algorithm are as follows:
1. Proposing the Galerkin-based spectral stochastic method to solve the statistical capacitance extraction problem, where the Galerkin scheme (vs. the collocation method) is used to compute the coefficients of the capacitances.
2. Deriving the closed-form Hermite polynomial coefficients of the potential coefficient matrices in both first-order and second-order forms.
3. Studying the properties of the augmented matrix and showing that it is still quite sparse, low rank, and symmetric.
4. Solving the augmented systems by the minimum residual conjugate gradient method [130] to take advantage of the sparsity, low-rank, and symmetry properties of the augmented matrices.
5. Comparing with the existing statistical capacitance extraction methods based on the spectral stochastic collocation approach [208] and the MC method, and showing the superiority of the presented method.
We remark that we have put less emphasis on acceleration techniques for the extraction process, such as the multipole scheme [115], the hierarchical methods [164, 199], and more sophisticated iterative solvers such as the generalized minimal residual method (GMRES) [149], which are actually the key components of those methods. The reason is that this is not the area where our major contributions are made. We believe those existing acceleration techniques can significantly speed up the presented method, as they did for the deterministic problem. This is especially

the case for the hierarchical approach [164]: the number of panels (thus the random
variables) can be considerably reduced and the interactions between panels are
constant. These are the areas for our future investigations.

2 Problem Formulation

For an m-conductor system, the capacitance extraction problem based on the BEM formulation is to solve the following integral equation [118]:

\int_S \frac{\sigma(\vec{x}_j)}{|\vec{x}_i - \vec{x}_j|}\, da_j = v(\vec{x}_i),    (11.1)

where σ(\vec{x}_j) is the charge distribution on the surface of conductor j, v(\vec{x}_i) is the potential at conductor i, and 1/|\vec{x}_i - \vec{x}_j| is the free-space Green's function.¹ da_j is the surface area element on the surface S of conductor j, and \vec{x}_i and \vec{x}_j are point vectors. To solve for the capacitances from one conductor to the rest, we set that conductor's potential to one and the potentials of the other m − 1 conductors to zero; the resulting computed charges are the capacitances. The BEM method divides the surfaces into N small panels and assumes a uniform charge distribution on each panel, which transforms (11.1) into a linear algebraic equation:

P q = v,    (11.2)

where P ∈ R^{N×N} is the potential coefficient matrix, q is the vector of panel charges, and v is the preset potential on each panel. By solving the above linear equation, we can obtain all the panel charges (and thus the capacitance values). Each element of the potential coefficient matrix P is defined as

P_{ij} = \frac{1}{s_j} \int_{S_j} G(\vec{x}_i, \vec{x}_j)\, da_j,    (11.3)

where G(\vec{x}_i, \vec{x}_j) = 1/|\vec{x}_i - \vec{x}_j| is the Green's function of a point source at \vec{x}_j, S_j is the surface of panel j, and s_j is the area of panel j.
Process variations introduce conductor geometry variations, so the panel sizes and the distances between panels become random variables. Here we assume that each panel is still a two-dimensional surface. These variations make each element of the capacitance matrix follow some random distribution. The problem we need to solve is to derive this random distribution and then to
¹ Note that the scale factor 1/(4πε₀) is omitted here to simplify the notation; it is used in the implementation to give results in units of farads.

effectively compute the mean and variance of the involved capacitances given the geometry randomness parameters.
In this chapter, we follow the variational model introduced in [74], where each point on panel i is perturbed by a vector Δn_i along the normal direction of panel i:

\vec{x}_i' = \vec{x}_i + \Delta n_i,    (11.4)

where the length of Δn_i follows a Gaussian distribution, |Δn_i| ~ N(0, σ²); a negative value means the direction of the perturbation is reversed. The correlation between the random perturbations on different panels is governed by an empirical formulation such as the exponential model [212]:

\rho(r) = e^{-r^2/\eta^2},    (11.5)

where r is the distance between two panel centers and η is the correlation length.
The most straightforward approach is to use MC simulation to obtain the distributions, mean values, and variances of all the capacitances. But the MC method is extremely time consuming, as each sample run requires re-forming the changed potential coefficient matrix P.
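To make the variational model concrete, the sketch below (a simplified illustration, not the authors' code; the function name and arguments are hypothetical) builds the panel covariance from the exponential model (11.5) and extracts p dominant independent factors by an eigendecomposition, which is the PCA step used in the following sections:

```python
import numpy as np

def panel_variation_model(centers, sigma, eta, p):
    """centers: (N, 3) panel centers; sigma: perturbation std;
    eta: correlation length; p: number of dominant factors to keep."""
    diff = centers[:, None, :] - centers[None, :, :]
    r = np.linalg.norm(diff, axis=2)                 # pairwise panel distances
    cov = sigma ** 2 * np.exp(-(r ** 2) / eta ** 2)  # exponential model (11.5)
    # PCA: keep p dominant factors so that dn ~ A @ xi with xi ~ N(0, I_p)
    w, V = np.linalg.eigh(cov)
    idx = np.argsort(w)[::-1][:p]
    A = V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0)) # (N, p) coefficient matrix
    return A

# usage sketch: one correlated sample of panel perturbations
# A = panel_variation_model(centers, 0.01, 0.2, 10)
# dn = A @ np.random.randn(A.shape[1])
```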

3 Presented Orthogonal PC-Based Extraction


Method: StatCap

In this section, we present the new spectral-stochastic-based method, StatCap, which uses the OPC to represent the random variables starting from the geometry parameters.
In the presented method, we first represent the variational potential coefficient matrix P in a first-order form using a Taylor expansion. We then extend the method to handle second-order variations in Sect. 4.

3.1 Capacitance Extraction Using Galerkin-Based Method

Here the charge q(ξ) in (11.2) is an unknown random vector (with normal distribution), so the potential coefficient equation becomes

P(\xi)\, q(\xi) = v,    (11.6)

where both P(ξ) and q(ξ) are in Hermite PC form. The coefficients can then be computed using the Galerkin-based method of Sect. 3.4 of Chap. 2. The principle of orthogonality states that the best approximation of v(ξ) is obtained when the error Δ(ξ), defined as

\Delta(\xi) = P(\xi)\, q(\xi) - v,    (11.7)

is orthogonal to the approximation. That is,

\langle \Delta(\xi), H_k(\xi) \rangle = 0, \quad k = 0, 1, \ldots, P,    (11.8)

where the H_k(ξ) are Hermite polynomials. In this way, we have transformed the stochastic analysis into a deterministic form, where we only need to compute the corresponding coefficients of the Hermite PC.
For illustration purposes, consider two Gaussian variables ξ = [ξ_1, ξ_2] and assume the charge vector on the panels can be written as a second-order (p = 2) Hermite PC:

q(\xi) = q_0 + q_1 \xi_1 + q_2 \xi_2 + q_3 (\xi_1^2 - 1) + q_4 (\xi_2^2 - 1) + q_5\, \xi_1 \xi_2,    (11.9)

which will be solved using the augmented potential coefficient matrices discussed in Sect. 3.3. Once the Hermite PC of q(ξ) is known, the mean and variance of q(ξ) can be evaluated trivially. As an example, for one random variable, the mean and variance are calculated as

E(q(\xi)) = q_0,
\mathrm{Var}(q(\xi)) = q_1^2\, \mathrm{Var}(\xi) + q_2^2\, \mathrm{Var}(\xi^2 - 1) = q_1^2 + 2 q_2^2.    (11.10)

To account for correlations among the random variables, we apply PCA to transform the correlated variables into a set of independent variables.
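A quick numerical check of (11.10) (illustrative only) compares the closed-form mean and variance of a one-variable, second-order Hermite PC against a sampling estimate:

```python
import numpy as np

def hermite_pc_stats(q0, q1, q2, n_samples=200_000, seed=0):
    """Mean/variance of q(xi) = q0 + q1*xi + q2*(xi^2 - 1), xi ~ N(0, 1):
    closed form (11.10) versus a Monte Carlo estimate."""
    mean_a, var_a = q0, q1 ** 2 + 2.0 * q2 ** 2      # closed form (11.10)
    xi = np.random.default_rng(seed).standard_normal(n_samples)
    q = q0 + q1 * xi + q2 * (xi ** 2 - 1.0)
    return (mean_a, var_a), (q.mean(), q.var())

# e.g. hermite_pc_stats(1.0, 0.3, 0.1) -> ((1.0, 0.11), (~1.0, ~0.11))
```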

3.2 Expansion of Potential Coefficient Matrix

Specifically, each element in the potential coefficient matrix P can be expressed as

P_{ij} = \frac{1}{s_j} \int_{S_j} G(\vec{x}_i, \vec{x}_j)\, da_j,    (11.11)

where G(\vec{x}_i, \vec{x}_j) is the free-space Green's function defined in (11.3).
Notice that if panel i and panel j are far away from each other (their distance is much larger than the panel size), we can use the following approximation [74]:

P_{ij} \approx G(\vec{x}_i, \vec{x}_j), \quad i \neq j.    (11.12)

Suppose the variation of panel i can be written as Δn_i = δn_i \vec{n}_i, where \vec{n}_i is the unit normal vector of panel i and δn_i is the scalar variation. Then, taking a Taylor expansion

on the Green's function,

G(\vec{x}_i + \Delta n_i, \vec{x}_j + \Delta n_j) = \frac{1}{|\vec{x}_i - \vec{x}_j + \Delta n_i - \Delta n_j|}    (11.13)
= \frac{1}{|\vec{x}_i - \vec{x}_j|} + \nabla \frac{1}{|\vec{x}_i - \vec{x}_j|} \cdot (\Delta n_j - \Delta n_i) + O\big((\Delta n_i - \Delta n_j)^2\big).    (11.14)

From the free-space Green's function, we have

\nabla G(\vec{x}_i, \vec{x}_j) = \nabla \frac{1}{|\vec{x}_i - \vec{x}_j|} = \nabla \frac{1}{|\vec{r}|} = -\frac{\vec{r}}{|\vec{r}|^3},    (11.15)
\vec{r} = \vec{x}_i - \vec{x}_j.    (11.16)
Now we first ignore the second-order terms to keep the variation in linear form. As a result, the potential coefficient matrix P can be written as

P \approx P_0 + P_1 =
\begin{pmatrix}
G(\vec{x}_1 + \Delta n_1, \vec{x}_1 + \Delta n_1) & \cdots & G(\vec{x}_1 + \Delta n_1, \vec{x}_n + \Delta n_n) \\
G(\vec{x}_2 + \Delta n_2, \vec{x}_1 + \Delta n_1) & \cdots & G(\vec{x}_2 + \Delta n_2, \vec{x}_n + \Delta n_n) \\
\vdots & \ddots & \vdots \\
G(\vec{x}_n + \Delta n_n, \vec{x}_1 + \Delta n_1) & \cdots & G(\vec{x}_n + \Delta n_n, \vec{x}_n + \Delta n_n)
\end{pmatrix},    (11.17)

where

P_0 =
\begin{pmatrix}
G(\vec{x}_1, \vec{x}_1) & G(\vec{x}_1, \vec{x}_2) & \cdots & G(\vec{x}_1, \vec{x}_n) \\
G(\vec{x}_2, \vec{x}_1) & G(\vec{x}_2, \vec{x}_2) & \cdots & G(\vec{x}_2, \vec{x}_n) \\
\vdots & \vdots & \ddots & \vdots \\
G(\vec{x}_n, \vec{x}_1) & G(\vec{x}_n, \vec{x}_2) & \cdots & G(\vec{x}_n, \vec{x}_n)
\end{pmatrix}

P_1 =
\begin{pmatrix}
0 & \cdots & \nabla G(\vec{x}_1, \vec{x}_n) \cdot (\Delta n_n - \Delta n_1) \\
\nabla G(\vec{x}_2, \vec{x}_1) \cdot (\Delta n_1 - \Delta n_2) & \cdots & \nabla G(\vec{x}_2, \vec{x}_n) \cdot (\Delta n_n - \Delta n_2) \\
\vdots & \ddots & \vdots \\
\nabla G(\vec{x}_n, \vec{x}_1) \cdot (\Delta n_1 - \Delta n_n) & \cdots & 0
\end{pmatrix}

We can further write P_1 in the following form:

P_1 = V_1 \cdot N_1 \cdot J_1 - J_1 \cdot N_1 \cdot V_1,    (11.18)

J_1 =
\begin{pmatrix}
0 & \nabla G(\vec{x}_1, \vec{x}_2) & \cdots & \nabla G(\vec{x}_1, \vec{x}_n) \\
\nabla G(\vec{x}_2, \vec{x}_1) & 0 & \cdots & \nabla G(\vec{x}_2, \vec{x}_n) \\
\vdots & \vdots & \ddots & \vdots \\
\nabla G(\vec{x}_n, \vec{x}_1) & \cdots & \nabla G(\vec{x}_n, \vec{x}_{n-1}) & 0
\end{pmatrix}

N_1 =
\begin{pmatrix}
\vec{n}_1 & 0 & \cdots \\
0 & \vec{n}_2 & \cdots \\
\vdots & & \ddots \\
0 & \cdots & \vec{n}_n
\end{pmatrix}
\qquad
V_1 =
\begin{pmatrix}
\delta n_1 & 0 & \cdots \\
0 & \delta n_2 & \cdots \\
\vdots & & \ddots \\
0 & \cdots & \delta n_n
\end{pmatrix},

where J_1 and N_1 are matrices of vectors and V_1 is a diagonal matrix.
To deal with spatial correlation, P_1 can be further expressed as a linear combination of the dominant and independent variables

\xi = [\xi_1, \xi_2, \ldots, \xi_p]    (11.19)

through the PCA operation. As a result, V_1 can be further expressed as

V_1 =
\begin{pmatrix}
\sum_{i=1}^{p} a_{1i} \xi_i & 0 & \cdots \\
0 & \sum_{i=1}^{p} a_{2i} \xi_i & \cdots \\
\vdots & & \ddots \\
0 & \cdots & \sum_{i=1}^{p} a_{ni} \xi_i
\end{pmatrix}.    (11.20)

Finally, we can represent P_1 as

P_1 = \sum_i P_{1i}\, \xi_i,    (11.21)

where

P_{1i} = A_i \cdot N_1 \cdot J_1 - J_1 \cdot N_1 \cdot A_i    (11.22)

and

A_i =
\begin{pmatrix}
a_{1i} & 0 & \cdots & 0 \\
0 & a_{2i} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & a_{ni}
\end{pmatrix}.    (11.23)
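The assembly of P_0 and the coefficient matrices P_{1i} can be sketched as follows (illustrative far-field-only code, not the authors' implementation; self terms and near-field panels need the analytic panel integrals instead and are simply skipped here, and the sign convention follows the reconstruction of (11.22) above):

```python
import numpy as np

def first_order_potential(centers, normals, A):
    """Far-field sketch of P0 and the P_1i coefficient matrices in (11.21).
    centers: (N,3) panel centers, normals: (N,3) unit normals,
    A: (N,p) PCA coefficients from the variational model."""
    N, p = A.shape
    r = centers[:, None, :] - centers[None, :, :]        # r_ij = x_i - x_j
    dist = np.linalg.norm(r, axis=2)
    np.fill_diagonal(dist, np.inf)                       # skip self terms here
    P0 = 1.0 / dist                                      # far-field G(x_i, x_j)
    gradG = -r / dist[:, :, None] ** 3                   # (11.15)
    g_i = np.einsum('ijk,ik->ij', gradG, normals)        # gradG(i,j) . n_i
    g_j = np.einsum('ijk,jk->ij', gradG, normals)        # gradG(i,j) . n_j
    P1 = []
    for L in range(p):
        # element-wise form of A_L*N1*J1 - J1*N1*A_L in (11.22)
        # (assumed sign convention; diagonals are zero by construction)
        P1.append(A[:, L][:, None] * g_i - g_j * A[:, L][None, :])
    return P0, P1
```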

3.3 Formulation of the Augmented System

Once the potential coefficient matrix is represented in the affine form shown in (11.21), we are ready to apply the Galerkin-based method with the coefficient matrices P_{1i}, which results in a larger system with augmented matrices and variables.
Specifically, for p independent Gaussian random variables ξ = [ξ_1, ..., ξ_p], there are K = 2p + p(p−1)/2 first- and second-order Hermite polynomials. H_i(ξ), i = 1, ..., K, denotes each Hermite polynomial, with H_1 = ξ_1, ..., H_p = ξ_p. The vector of variational charge variables q(ξ) can then be written as

q(\xi) = q_0 + \sum_{i=1}^{K} q_i H_i(\xi),    (11.24)

where each q_i is a vector associated with one polynomial. The random linear equation can thus be written as

P q = \left( P_0 + \sum_{i=1}^{p} P_{1i} H_i \right) \left( q_0 + \sum_{i=1}^{K} q_i H_i \right) = v.    (11.25)

Expanding this equation and performing the inner product with H_i on both sides, we can derive the new linear system

\left( W_0 \otimes P_0 + \sum_{i=1}^{p} W_i \otimes P_{1i} \right) Q = V,    (11.26)

where ⊗ is the tensor product and

Q = \begin{pmatrix} q_0 \\ q_1 \\ \vdots \\ q_K \end{pmatrix}, \qquad V = \begin{pmatrix} v \\ 0 \\ \vdots \\ 0 \end{pmatrix},    (11.27)

and

W_i = \begin{pmatrix}
\langle H_i H_0 H_0 \rangle & \langle H_i H_0 H_1 \rangle & \cdots & \langle H_i H_0 H_K \rangle \\
\langle H_i H_1 H_0 \rangle & \langle H_i H_1 H_1 \rangle & \cdots & \langle H_i H_1 H_K \rangle \\
\vdots & \vdots & \langle H_i H_l H_m \rangle & \vdots \\
\langle H_i H_K H_0 \rangle & \langle H_i H_K H_1 \rangle & \cdots & \langle H_i H_K H_K \rangle
\end{pmatrix},    (11.28)

where hHi Hl Hm i represents the inner productPp of three Hermite polynomials Hi ,


Hl , and Hm . The matrix .W0 ˝ P0 C i D1 Wi ˝ P1i / in (11.26) is called
the augmented potential coefficient matrix. Since Hi are at most second-order
polynomials, we can quickly calculate every element in Wi with a LUT for any
number of random variables.
We remark that matrices Wi are very sparse due to the nature of the inner product.
As a result, their tensor products with P1i will also lead to the very sparse augmented
matrix in (11.26). As a result, we have the following observations regarding the
structure of the Wi and the augmented matrix:
1. Observation 1: W0 is a diagonal matrix.
2. Observation 2: For Wi matrices, i ¤ 0, all the diagonal elements are zero.
Pp 3: All Wi are symmetric and the resulting augmented matrix W0 ˝
3. Observation
P0 C i D1 Wi ˝ P1i is also symmetric.
4. Observation 4: If one element at position .l; m/ in Wi is not zero, i.e.,
Wi .l; m/ ¤ 0, then elements at the same position .l; m/ of Wj , j ¤ i , must be
zero. In other words,

Wi .l; m/ Wj .l; m/ D 0 when i ¤ j;


8 i; j D 1; : : : ; p and l; m D 1; : : : ; K:

Such sparse property can help save the memory significantly as we do not need
to actually perform the tensor product as shown in (11.26). Instead, we can add
all Wi together and expand each element in the resulting matrix by some specific
P1i during the solving process, as there is no overlap among Wi for any element
position.
As the original potential coefficient matrix is quite sparse, low rank, the
augmented matrix is also low rank. As a result, the sparsity, low rank, and symmetric
properties can be exploited by iterative solvers to speed up the extraction process as
shown in the experimental results. In our implementation, the minimum residue
conjugate gradient method [130] is used as the solver since the augmented system
is symmetric.
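A minimal dense sketch of this step is given below (illustrative only, not the authors' implementation; it ignores the memory-saving trick of expanding the W_i entries on the fly and simply forms the Kronecker products, and it assumes the P_0, P_{1i} matrices and the inner-product tables W_i are available):

```python
import numpy as np
from scipy.sparse.linalg import minres

def solve_augmented(P0, P1_list, W_list, v):
    """Assemble (W0 (x) P0 + sum_i Wi (x) P1i) Q = V, see (11.26)-(11.28),
    and solve it with MINRES (the augmented matrix is symmetric)."""
    n = P0.shape[0]
    K1 = W_list[0].shape[0]                  # number of Hermite-PC coefficients
    A = np.kron(W_list[0], P0)               # dense sketch; real code exploits sparsity
    for Wi, P1i in zip(W_list[1:], P1_list):
        A += np.kron(Wi, P1i)
    V = np.zeros(K1 * n)
    V[:n] = v                                # only the mean block is driven
    Q, info = minres(A, V)
    return Q.reshape(K1, n), info            # row k of the result holds q_k
```

In practice the augmented matrix is never formed explicitly; the non-overlap of the W_i entries (Observation 4) lets the solver expand each block on demand.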

4 Second-Order StatCap

In this section, we extend StatCap to consider second-order perturbations. We show the derivation of the potential coefficient matrix elements in second-order OPC form directly from the geometric variables; as a result, the second-order potential coefficient matrix can be computed very quickly. In our second-order StatCap, we consider both the far-field and the near-field cases when (11.11) is approximated.

4.1 Derivation of Analytic Second-Order Potential Coefficient Matrix

Each element in the potential coefficient matrix P can be expressed as

P_{ij} = \frac{1}{s_i s_j} \int_{S_i} \int_{S_j} G(\vec{x}_i, \vec{x}_j)\, da_i\, da_j
\approx \frac{1}{s_j} \int_{S_j} G(\vec{x}_i, \vec{x}_j)\, da_j    (11.29)
\approx \frac{1}{s_i} \int_{S_i} G(\vec{x}_i, \vec{x}_j)\, da_i,    (11.30)

where G(\vec{x}_i, \vec{x}_j) is the free-space Green's function defined in (11.3).
We assume the same definitions for Δn_i, δn_i, and \vec{n}_i as in Sect. 3. If we consider both the first-order and second-order terms, we have the following Taylor expansion of P_{ij}:

P_{ij}(\Delta n_i, \Delta n_j) = P_{ij,0} + \nabla P_{ij} \cdot \Delta n_i + \nabla P_{ij} \cdot \Delta n_j
+ \Delta n_j^T \nabla^2 P_{ij}\, \Delta n_j + \Delta n_i^T \nabla^2 P_{ij}\, \Delta n_i
+ 2\, \Delta n_j^T \nabla^2 P_{ij}\, \Delta n_i + O\big((\Delta n_i - \Delta n_j)^3\big)
\approx P_{ij,0} + \frac{\partial P_{ij}}{\partial n_i} \delta n_i + \frac{\partial P_{ij}}{\partial n_j} \delta n_j
+ \frac{\partial^2 P_{ij}}{\partial n_i^2} \delta n_i^2 + \frac{\partial^2 P_{ij}}{\partial n_j^2} \delta n_j^2
+ 2\, \frac{\partial^2 P_{ij}}{\partial n_i \partial n_j} \delta n_i\, \delta n_j.    (11.31)

To deal with the spatial correlation, Δn_i can be further expressed as a linear combination of the dominant and independent variables in (11.19) through the PCA operation. As a result,

\Delta n_i = \delta n_i\, \vec{n}_i = (a_{i1} \xi_1 + \cdots + a_{ip} \xi_p)\, \vec{n}_i,    (11.32)

where a_{iL} is defined as in (11.20). After that, P can be represented by a linear combination of Hermite polynomials:

P = P_0 + \sum_{L=1}^{p} P_{1L}\, \xi_L + \sum_{L=1}^{p} P_{2L}\, (\xi_L^2 - 1) + \sum_{L_1} \sum_{L_2 \neq L_1} P_{2L_1,L_2}\, \xi_{L_1} \xi_{L_2},    (11.33)

where P_{2L} is the coefficient corresponding to the first type of second-order Hermite polynomial, ξ_L² − 1, and P_{2L_1,L_2} is the coefficient corresponding to the second type of second-order Hermite polynomial, ξ_{L_1} ξ_{L_2} (L_1 ≠ L_2).

So for each element P_{ij} of P, the coefficients of the orthogonal polynomials can be computed as follows:

P_{ij,1L} = \frac{\partial P_{ij}}{\partial n_i} a_{iL} + \frac{\partial P_{ij}}{\partial n_j} a_{jL},    (11.34)

P_{ij,2L} = \frac{\partial^2 P_{ij}}{\partial n_i^2} a_{iL}^2 + \frac{\partial^2 P_{ij}}{\partial n_j^2} a_{jL}^2 + 2\, \frac{\partial^2 P_{ij}}{\partial n_j \partial n_i} a_{iL} a_{jL},    (11.35)

P_{ij,2L_1,L_2} = 2\, \frac{\partial^2 P_{ij}}{\partial n_i^2} a_{iL_1} a_{iL_2} + 2\, \frac{\partial^2 P_{ij}}{\partial n_j^2} a_{jL_1} a_{jL_2} + 2\, (a_{iL_1} a_{jL_2} + a_{iL_2} a_{jL_1}) \frac{\partial^2 P_{ij}}{\partial n_j \partial n_i}.    (11.36)

Hence, we need analytic expressions for the partial derivatives of P_{ij} to obtain the coefficients of the Hermite polynomials. The details of the derivations for computing the derivatives used in (11.34)-(11.36) can be found in the appendix section (Sect. 6).

4.2 Formulation of the Augmented System

Similar to Sect. 3, once the potential coefficient matrix is represented in the affine form shown in (11.33), we are ready to apply the Galerkin-based method with the coefficient matrices P_{1L}, P_{2L}, and P_{2L_1,L_2}.
In this case, P in (11.33) is rewritten as

P = P_0 + \sum_{i=1}^{p} P_{1i} H_i + \sum_{i=p+1}^{K} P_{2i} H_i.    (11.37)

So, after considering the first-order and second-order Hermite polynomials in P, the random linear equation can be written as

P q = \left( P_0 + \sum_{i=1}^{p} P_{1i} H_i + \sum_{i=p+1}^{K} P_{2i} H_i \right) \left( q_0 + \sum_{i=1}^{K} q_i H_i \right) = v.    (11.38)

Expanding this equation and performing the inner product with H_i on both sides, we can derive a new linear system:

\left( W_0 \otimes P_0 + \sum_{i=1}^{p} W_i \otimes P_{1i} + \sum_{i=p+1}^{K} W_i \otimes P_{2i} \right) Q = V,    (11.39)

Table 11.1 Number of nonzero elements in W_i

                 i = 0    1 ≤ i ≤ p    p+1 ≤ i ≤ 2p    2p+1 ≤ i ≤ K
# Nonzero            K       2p + 2           p + 3          2p + 4

where ⊗ is the tensor product, Q and V are the same as in (11.27), and W_i has the same definition as in (11.28).
Again, the matrix on the left-hand side of (11.39) is the augmented potential coefficient matrix for the second-order StatCap. Since the H_i are at most second-order polynomials, we can still use a LUT to calculate every element of W_i for any number of random variables.
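For reference, the entries ⟨H_i H_l H_m⟩ of such a LUT can be generated once with probabilists' Hermite polynomials and Gauss-Hermite quadrature; the snippet below is an illustrative one-dimensional generator (multivariate entries factor into products of such one-dimensional expectations), not the authors' implementation:

```python
import numpy as np
from numpy.polynomial import hermite_e as He

def triple_product(i, l, m, n_pts=20):
    """E[He_i(x) He_l(x) He_m(x)] for x ~ N(0,1), by Gauss-HermiteE quadrature."""
    x, w = He.hermegauss(n_pts)          # weight exp(-x^2/2); sum(w) = sqrt(2*pi)
    w = w / np.sqrt(2.0 * np.pi)         # normalize to the standard normal density
    vals = (He.hermeval(x, np.eye(i + 1)[i]) *
            He.hermeval(x, np.eye(l + 1)[l]) *
            He.hermeval(x, np.eye(m + 1)[m]))
    return float(np.dot(w, vals))

# sanity checks: <1,He1,He1> = 1, <He1,He1,He2> = 2, <He2,He2,He2> = 8
```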
Now we study the properties of the augmented potential coefficient matrix and revisit the features and observations made for the first-order StatCap.
For W_i, which is a K × K matrix with K = p(p+3)/2, the number of nonzero elements is shown in Table 11.1. From the table, we can see that the matrices W_i, i = 1, ..., K, are still very sparse. As a result, their tensor products with P_{1i} and P_{2i} still give rise to a sparse augmented matrix in (11.39).
For the four observations in Sect. 3 regarding the structure of W_i, i = p+1, ..., K, and the augmented matrix, we find that all the observations remain valid except for Observation 2. Consequently, all the efficient implementation and solving techniques mentioned at the end of Sect. 3 can be applied to the second-order method.

5 Numerical Examples

In this section, we compare the results of the presented first-order and second-order StatCap methods against the MC method and the SSCM method [208], which is based on the spectral stochastic collocation method. The StatCap methods have been implemented in Matlab 7.4.0, and we use the minimum residual conjugate gradient method as the iterative solver. We also implement the SSCM method in Matlab using the sparse grid package [81, 82]. We do not use any hierarchical algorithm to accelerate the calculation of the potential coefficient matrix for either StatCap or SSCM; instead, we use the analytic formula in [194] to compute the potential coefficient matrices.
All the experiments are carried out on a Linux system with Intel quad-core Xeon CPUs at 2.99 GHz and 16 GB of memory. The initial results of this chapter were published in [21, 156].
We test our algorithm on six test cases. The specific parameters for each test case are summarized in Table 11.2, where p is the number of dominant and independent random variables obtained through the PCA operation and MC # is the number of MC runs. The 2 × 2 bus is shown in Fig. 11.1, and the three-layer metal plane structure is shown in Fig. 11.2. In all the experiments, we

Table 11.2 The test cases and the parameter settings

            1×1 bus   2×2 bus   Three-layer   3×3 bus   4×4 bus   5×5 bus
Panel #          28       352            75       720     1,216     4,140
p                10        15             8        21        28        35
MC #         10,000     6,000         6,000     6,000     6,000     6,000

Fig. 11.1 A 2 × 2 bus. Reprinted with permission from [156] © 2010 IEEE

set the standard deviation to 10% of the wire width and the correlation length η to 200% of the wire width.
First, we compare the CPU times of the four methods. The results are shown in Table 11.3, where StatCap(1st/2nd) refers to the presented first- and second-order methods, respectively, and SP(X) is the speedup of the first-order StatCap compared with MC or SSCM. All capacitances are in picofarads.
It can be seen that both the first- and second-order StatCap are much faster than both SSCM and the MC method. For large test cases, such as the 5 × 5 bus, MC and SSCM run out of memory, while StatCap still works well. For all the cases, StatCap delivers about two orders of magnitude speedup over SSCM and three orders of magnitude speedup over the MC method. Notice that both SSCM and StatCap use the same random variables after the PCA reduction.

Fig. 11.2 Three-layer metal planes. Reprinted with permission from [156] © 2010 IEEE

Table 11.3 CPU runtime (in seconds) comparison among MC, SSCM, and StatCap(1st/2nd)

1 × 1 bus, MC(10,000)
MC          SSCM      StatCap(1st)   StatCap(2nd)   SP(MC)   SP(SSCM)
2,764 s     49.35 s   1.55 s         3.59 s          1,783         32

2 × 2 bus, MC(6,000)
MC          SSCM      StatCap(1st)   StatCap(2nd)   SP(MC)   SP(SSCM)
63,059 s    2,315 s   122 s          190 s             517         19

Three-layer metal plane, MC(6,000)
MC          SSCM      StatCap(1st)   StatCap(2nd)   SP(MC)   SP(SSCM)
16,437 s    387 s     4.11 s         6.67 s          3,999         94

3 × 3 bus, MC(6,000)
MC           SSCM      StatCap(1st)   StatCap(2nd)   SP(MC)   SP(SSCM)
2.2×10⁵ s    7,860 s   408 s          857 s             534         19

4 × 4 bus, MC(6,000)
MC     SSCM          StatCap(1st)   StatCap(2nd)   SP(MC)   SP(SSCM)
–*     3.62×10⁴ s    1,573 s        6,855 s           260         23

5 × 5 bus, MC(6,000)
MC     SSCM   StatCap(1st)   StatCap(2nd)   SP(MC)   SP(SSCM)
–*     –      1.7×10⁴ s      6.0×10⁴ s          –          –

* – out of memory

We notice that both MC and SSCM need to compute the potential coefficient
matrices each time the geometry changes. This computation can be significant
compared to the CPU time of solving potential coefficient equations. This is one

Table 11.4 Capacitance mean value comparison for the 1 × 1 bus

        MC       SSCM     StatCap(1st)   StatCap(2nd)
C11     135.92   135.90   136.58         136.21
C12      57.11    57.01    57.49          57.27
C21      57.11    57.02    57.49          57.27
C22     135.94   135.69   136.58         136.21

Table 11.5 Capacitance standard deviation comparison for the 1 × 1 bus

        MC     SSCM   StatCap(1st)   StatCap(2nd)
C11     2.42   2.49   3.13           2.63
C12     1.71   1.74   2.02           1.86
C21     1.72   1.71   2.02           1.86
C22     2.51   2.52   3.19           2.63

of the reasons that SSCM and MC are much slower than StatCap, in which the augmented system only needs to be set up once.
Also, SSCM uses the sparse grid scheme to reduce the number of collocation points needed to derive the OPC coefficients. But the number of collocation points is still on the order of O(m²) for second-order Hermite polynomials, where m is the number of variables; thus, it requires O(m²) solves for the different geometries. Our algorithm also considers second-order Hermite polynomials, but we only need to solve the augmented system once, and the solving process can be further improved with more advanced solvers or acceleration techniques.
Next, we perform the accuracy comparison. The statistics for the 1 × 1 bus case from the four algorithms are summarized in Tables 11.4 and 11.5 for the mean value and standard deviation, respectively. The parameter settings for each case are listed in Table 11.2. We make sure that SSCM and the first- and second-order StatCap use the same number of random variables after the PCA operations.
From these two tables, we can see that first-order StatCap, second-order StatCap, and SSCM give similar results for both the mean value and the standard deviation compared with the MC method. For all the other cases, the numbers of MC runs are as shown in Table 11.3, and similar experimental results are obtained. The maximum and average errors of the mean value and standard deviation for all the test cases are shown in Tables 11.6 and 11.7. Compared with the MC method, the accuracy of the second-order StatCap is better than that of the first-order StatCap, while, from Table 11.3, the speed of the second-order StatCap stays in the same order as the first-order StatCap and is still much faster than SSCM and MC.

6 Additional Notes

In this appendix section, we detail the derivations for computing derivatives in


(11.34)–(11.36).

Table 11.6 Error comparison of capacitance mean values among SSCM and StatCap (first- and second-order)

1 × 1 bus, MC(10,000) as standard
           SSCM    StatCap(1st)   StatCap(2nd)
Max err    0.19%   0.67%          0.28%
Avg err    0.14%   0.57%          0.24%

2 × 2 bus, MC(6,000) as standard
           SSCM    StatCap(1st)   StatCap(2nd)
Max err    0.32%   0.49%          1.19%
Avg err    0.15%   0.24%          0.89%

Three-layer metal plane, MC(6,000) as standard
           SSCM    StatCap(1st)   StatCap(2nd)
Max err    0.30%   1.84%          0.81%
Avg err    0.14%   0.90%          0.58%

3 × 3 bus, MC(6,000) as standard
           SSCM    StatCap(1st)   StatCap(2nd)
Max err    0.33%   0.81%          0.43%
Avg err    0.11%   0.58%          0.11%

4 × 4 bus, SSCM as standard
           SSCM    StatCap(1st)   StatCap(2nd)
Max err    0       0.76%          0.35%
Avg err    0       0.40%          0.09%

5 × 5 bus, StatCap(2nd) as standard
           SSCM    StatCap(1st)   StatCap(2nd)
Max err    –       0.59%          0
Avg err    –       0.28%          0

First, we consider the scenario where panel i and panel j are far away from each other (their distance is much larger than the panel size). In this case, the approximations in (11.12) and (11.13) are still valid. From the free-space Green's function, we have (11.15) and (11.16) for the first-order Hermite polynomials, and we have the following for the second-order Hermite polynomials:

P_{ij,0} = \frac{1}{|\vec{x}_i - \vec{x}_j|},    (11.40)

\frac{\partial P_{ij}}{\partial n_i} = -\frac{\vec{r}\cdot\vec{n}_i}{|\vec{r}|^3},    (11.41)

\frac{\partial P_{ij}}{\partial n_j} = \frac{\vec{r}\cdot\vec{n}_j}{|\vec{r}|^3},    (11.42)

\frac{\partial^2 P_{ij}}{\partial n_i^2} = \frac{3(\vec{r}\cdot\vec{n}_i)^2}{|\vec{r}|^5} - \frac{1}{|\vec{r}|^3},    (11.43)

Table 11.7 Error comparison of capacitance standard deviations among SSCM and StatCap (first- and second-order)

1 × 1 bus, MC(10,000) as standard
           SSCM     StatCap(1st)   StatCap(2nd)
Max err    2.48%    29.34%         8.77%
Avg err    2.29%    23.38%         7.91%

2 × 2 bus, MC(6,000) as standard
           SSCM     StatCap(1st)   StatCap(2nd)
Max err    14.28%   12.98%         25.99%
Avg err    6.11%    8.51%          6.04%

Three-layer metal plane, MC(6,000) as standard
           SSCM     StatCap(1st)   StatCap(2nd)
Max err    8.35%    16.26%         2.38%
Avg err    3.37%    5.06%          0.86%

3 × 3 bus, MC(6,000) as standard
           SSCM     StatCap(1st)   StatCap(2nd)
Max err    23.32%   21.39%         11.75%
Avg err    3.33%    10.35%         4.38%

4 × 4 bus, SSCM as standard
           SSCM     StatCap(1st)   StatCap(2nd)
Max err    0        25.7%          6.68%
Avg err    0        16.1%          3.89%

5 × 5 bus, StatCap(2nd) as standard
           SSCM     StatCap(1st)   StatCap(2nd)
Max err    –        17.5%          0
Avg err    –        7.92%          0

\frac{\partial^2 P_{ij}}{\partial n_j^2} = \frac{3(\vec{r}\cdot\vec{n}_j)^2}{|\vec{r}|^5} - \frac{1}{|\vec{r}|^3},    (11.44)

\frac{\partial^2 P_{ij}}{\partial n_j \partial n_i} = -\frac{3(\vec{r}\cdot\vec{n}_j)(\vec{r}\cdot\vec{n}_i)}{|\vec{r}|^5}.    (11.45)
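The far-field expressions above are easy to sanity-check numerically; the snippet below (an illustrative checker, not part of the presented method) differentiates the perturbed kernel 1/|x_i + δ_i n_i − x_j − δ_j n_j| by central finite differences, which can be compared against the closed forms (11.41)-(11.45) for well-separated panels:

```python
import numpy as np

def kernel(xi, xj, ni, nj, di, dj):
    """Perturbed far-field kernel 1/|x_i + d_i*n_i - x_j - d_j*n_j|."""
    return 1.0 / np.linalg.norm(xi + di * ni - xj - dj * nj)

def numeric_derivatives(xi, xj, ni, nj, h=1e-5):
    """Central finite-difference estimates of the partial derivatives of P_ij
    used in (11.34)-(11.36)."""
    f = lambda a, b: kernel(xi, xj, ni, nj, a, b)
    d_i  = (f(h, 0) - f(-h, 0)) / (2 * h)
    d_j  = (f(0, h) - f(0, -h)) / (2 * h)
    d_ii = (f(h, 0) - 2 * f(0, 0) + f(-h, 0)) / h**2
    d_jj = (f(0, h) - 2 * f(0, 0) + f(0, -h)) / h**2
    d_ij = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h**2)
    return d_i, d_j, d_ii, d_jj, d_ij
```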

Second, we consider the scenario where panel i and panel j are near each other (their distance is comparable with the panel size). In this case, the approximation in (11.12) is no longer accurate, and we must consider the general forms in (11.29) and (11.30).
Since panels i and j are perpendicular to \vec{n}_i and \vec{n}_j, respectively, for ∂P_{ij}/∂n_j and ∂²P_{ij}/∂n_j² we use (11.29) and obtain

\frac{\partial P_{ij}}{\partial n_j} \approx \frac{\partial}{\partial n_j}\left[\frac{1}{s_j}\int_{S_j} G(\vec{x}_i, \vec{x}_j)\, da_j\right]
= \frac{\partial}{\partial n_j}\left[\frac{1}{s_j}\int_{S_j} \frac{da_j}{|\vec{x}_i - \vec{x}_j + \Delta n_i - \Delta n_j|}\right]
= \frac{1}{s_j}\int_{S_j} \frac{\partial}{\partial n_j}\frac{1}{|\vec{x}_i - \vec{x}_j + \Delta n_i - \Delta n_j|}\, da_j
= \frac{1}{s_j}\int_{S_j} \frac{\vec{r}\cdot\vec{n}_j}{|\vec{r}|^3}\, da_j
= \frac{\vec{r}\cdot\vec{n}_j}{s_j}\int_{S_j} \frac{da_j}{|\vec{r}|^3},    (11.46)

\frac{\partial^2 P_{ij}}{\partial n_j^2} \approx \frac{\partial^2}{\partial n_j^2}\left[\frac{1}{s_j}\int_{S_j} G(\vec{x}_i, \vec{x}_j)\, da_j\right]
= \frac{1}{s_j}\int_{S_j} \frac{\partial^2}{\partial n_j^2}\frac{1}{|\vec{x}_i - \vec{x}_j + \Delta n_i - \Delta n_j|}\, da_j
= \frac{1}{s_j}\int_{S_j} \left[\frac{3(\vec{r}\cdot\vec{n}_j)^2}{|\vec{r}|^5} - \frac{1}{|\vec{r}|^3}\right] da_j
= \frac{3(\vec{r}\cdot\vec{n}_j)^2}{s_j}\int_{S_j} \frac{da_j}{|\vec{r}|^5} - \frac{1}{s_j}\int_{S_j} \frac{da_j}{|\vec{r}|^3}.    (11.47)

Similarly, with (11.30) we can further obtain

\frac{\partial P_{ij}}{\partial n_i} \approx \frac{\partial}{\partial n_i}\left[\frac{1}{s_i}\int_{S_i} G(\vec{x}_i, \vec{x}_j)\, da_i\right]
= -\frac{\vec{r}\cdot\vec{n}_i}{s_i}\int_{S_i} \frac{da_i}{|\vec{r}|^3},    (11.48)

\frac{\partial^2 P_{ij}}{\partial n_i^2} \approx \frac{\partial^2}{\partial n_i^2}\left[\frac{1}{s_i}\int_{S_i} G(\vec{x}_i, \vec{x}_j)\, da_i\right]
= \frac{3(\vec{r}\cdot\vec{n}_i)^2}{s_i}\int_{S_i} \frac{da_i}{|\vec{r}|^5} - \frac{1}{s_i}\int_{S_i} \frac{da_i}{|\vec{r}|^3}.    (11.49)

For ∂²P_{ij}/(∂n_j ∂n_i), we need to further consider two cases. First, when panel i and panel j are parallel, we have

\frac{\partial^2 P_{ij}}{\partial n_i^2} = \frac{\partial^2 P_{ij}}{\partial n_j^2} = -\frac{\partial^2 P_{ij}}{\partial n_j \partial n_i}.    (11.50)

Second, when panel i and panel j are not parallel, we arrive at

\frac{\partial^2 P_{ij}}{\partial n_j \partial n_i} = \frac{\partial}{\partial n_j}\frac{\partial P_{ij}}{\partial n_i}
= \frac{\partial}{\partial n_j}\left[-\frac{\vec{r}\cdot\vec{n}_i}{s_i}\int_{S_i} \frac{da_i}{|\vec{r}|^3}\right]
= -\frac{\vec{r}\cdot\vec{n}_i}{s_i}\, \frac{\partial}{\partial n_j}\int_{S_i} \frac{da_i}{|\vec{r}|^3}.    (11.51)

@2 P
ij
While for @nj n i
, we need to further consider two cases. First, when panel i and
panel j are in parallel, we have

@2 Pij @2 Pij @2 Pij


D D  : (11.50)
@ni 2 @nj 2 @nj ni

Second, we consider panel i and panel j are not in parallel. Then we arrive

@P
@2 Pij @ @niji
D
@nj ni @nj
 ! 
r  ni R
!
1
@  si Si ! dai s
j r j3
D
@nj
R 1
! ! @ dai
r  ni Si j ! r j3
D : (11.51)
si @nj

Assume the conductors are rectangular geometries; then two panels are either parallel or perpendicular. Since panel i and panel j are not parallel, these two panels must be perpendicular.
Without loss of generality, we assume that panel i is parallel to the xz-plane and panel j is parallel to the yz-plane. Then \vec{n}_i = (0, 1, 0) and \vec{n}_j = (1, 0, 0). Let u_{kl}, k, l ∈ {0, 1}, denote the four corners of panel i, with (x_{ik}, y_i, z_{il}) being the Cartesian coordinates of corner u_{kl} and (x_i, y_i, z_i) the panel's center of gravity. Let t_{kl}, k, l ∈ {0, 1}, denote the four corners of panel j, with (x_j, y_{jk}, z_{jl}) being the Cartesian coordinates of corner t_{kl} and (x_j, y_j, z_j) the panel's center of gravity.
With this, (11.51) can be further reduced to

\frac{\partial^2 P_{ij}}{\partial n_j \partial n_i}
= \frac{y_j - y_i}{s_i}\, \frac{\partial}{\partial x_j}\int_{x_{i0}}^{x_{i1}}\int_{z_{i0}}^{z_{i1}} \frac{dx\, dz}{|\vec{r}|^3}
= \frac{y_j - y_i}{s_i}\, \frac{\partial}{\partial x_j}\int_{x_{i0}-x_j}^{x_{i1}-x_j}\int_{z_{i0}}^{z_{i1}} \frac{dz}{|\vec{r}\,'|^3}\, dx
= \frac{y_j - y_i}{s_i}\left( \int_{z_{i0}}^{z_{i1}} \frac{dz}{|\vec{r}_-|^3} - \int_{z_{i0}}^{z_{i1}} \frac{dz}{|\vec{r}_+|^3} \right)
= \frac{y_j - y_i}{s_i} \sum_{k=0}^{1}\sum_{l=0}^{1} (-1)^{k+l+1}
\frac{z_{il} - z_j}{\big[(x_{ik}-x_j)^2 + (y_i-y_j)^2\big]\sqrt{(x_{ik}-x_j)^2 + (y_i-y_j)^2 + (z_{il}-z_j)^2}},    (11.52)

where

|\vec{r}| = \sqrt{(x - x_j)^2 + (y_i - y_j)^2 + (z - z_j)^2},
|\vec{r}\,'| = \sqrt{x^2 + (y_i - y_j)^2 + (z - z_j)^2},
|\vec{r}_+| = \sqrt{(x_{i1} - x_j)^2 + (y_i - y_j)^2 + (z - z_j)^2},
|\vec{r}_-| = \sqrt{(x_{i0} - x_j)^2 + (y_i - y_j)^2 + (z - z_j)^2}.

7 Summary

In this chapter, we have introduced a statistical capacitance extraction method, called StatCap, for three-dimensional interconnects considering process variations. The method uses orthogonal polynomials to represent the variational geometrical parameters in a deterministic way, and it considers both first-order and second-order variational effects. It avoids the sampling operations of the existing collocation-based spectral stochastic methods: instead, it solves an enlarged potential coefficient system to obtain the OPC coefficients of the capacitances. StatCap only needs to set up the augmented equation once and can exploit the sparsity and low-rank property to speed up the extraction process. The second-order StatCap accounts for second-order perturbation effects to generate more accurate quadratic variational capacitance models. Numerical examples show that our method is two orders of magnitude faster than the recently proposed statistical capacitance extraction method based on the spectral stochastic collocation method and many orders of magnitude faster than the MC method for several practical interconnect structures.
Chapter 12
Incremental Extraction of Variational
Capacitance

1 Introduction

Since the interconnect length and cross-sectional area are at different scales, variational capacitance extraction is quite different for on-chip [21, 205, 209] and off-chip [34, 210] structures. The on-chip interconnect variation arising from geometrical parameters, such as the width of one panel and the distance between two panels, is more dominant [21, 209] than the rough-surface effect seen in off-chip package traces. However, it is unknown how to incorporate the stochastic process variation into the matrix-vector product (MVP) computed by the fast multipole method (FMM) [21, 34, 205, 209, 210]. Similar to dealing with stochastic analog mismatch for transistors [133], a cost-efficient full-chip extraction needs an explicit relation between the stochastic variation and the geometrical parameters, so that the electrical property shows an explicit dependence on geometry. Moreover, the expansion by OPC with different collocation schemes [21, 34, 187, 196, 209] always results in an augmented and dense system equation, which significantly increases the complexity for large-scale problems. The corresponding GMRES solver therefore needs to be designed in an incremental fashion to account for updates from the process variation. As a result, a scalable extraction algorithm similar to [77, 118, 163] is required that considers the process variation, with a new MVP and GMRES developed accordingly.
To address the aforementioned challenges, this chapter introduces a new technique [56], whose contributions are as follows. First, to reveal an explicit dependence on geometrical parameters, the potential interaction is represented by a number of GMs. The process variation can then be included by expanding the GMs with orthogonal polynomial chaos (OPC); the results are called SGMs in this chapter. Next, with the use of the SGMs, the process variation can be incorporated into a modified FMM algorithm that evaluates the MVP in parallel. Finally, an incremental GMRES method is introduced to update the preconditioner under different variations. Such a parallel and incremental full-chip capacitance extraction considering the stochastic variation is called piCAP. Parallel and incremental analyses are the two

effective techniques in reducing computational cost. Experiments show that the presented method with stochastic polynomial expansion is hundreds of times faster than the MC-based method while maintaining similar accuracy. Moreover, the parallel MVP in the presented method is up to 3× faster than the serial method, and the incremental GMRES in the presented method is up to 15× faster than nonincremental GMRES methods.

2 Review of GMRES and FMM Algorithms

2.1 The GMRES Method

The resulting potential coefficient matrix P is usually dense in the BEM method
in Sect. 2 of Chap. 11. As such, directly solving (11.2) would be computationally
expensive. FastCap [118] applies an iterative GMRES method [149] to solve (11.2).
Instead of performing an expensive LU decomposition of the dense P, GMRES first
forms a preconditioner W such that $W^{-1}P$ has a smaller condition number than
P, which can accelerate the convergence of iterative solvers [150]. Take the left
preconditioning as an example:

$$(W^{-1}P)\,q = W^{-1}b.$$

Then, using either ME [118], low-rank approximation [77], or the hierarchical-
tree method [163] to efficiently evaluate the MVP $(W^{-1}P)\,q_i$ ($q_i$ is the solution
at the $i$-th iteration), the GMRES method minimizes the residual error

$$\min_{q_i} \; \|W^{-1}b - (W^{-1}P)\,q_i\|$$

iteratively until convergence.

Clearly, the use of GMRES requires a well-designed preconditioner and a fast
MVP. In fact, FMM is able to accelerate the evaluation of the MVP with $O(N)$ time
complexity, where N is the number of variables. We introduce FMM first in
what follows.
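As a concrete illustration of the preconditioned iteration above, the following minimal Python/SciPy sketch solves a small dense system $Pq = b$ with GMRES. It is not the book's solver: the matrix and right-hand side are random stand-ins, and a simple Jacobi (diagonal) inverse plays the role of $W^{-1}$.

```python
import numpy as np
from scipy.sparse.linalg import gmres, LinearOperator

# Illustrative dense "potential coefficient" matrix and right-hand side.
rng = np.random.default_rng(0)
n = 200
P = np.eye(n) + 0.01 * rng.standard_normal((n, n))
P = 0.5 * (P + P.T)                       # keep it symmetric, like a BEM matrix
b = np.ones(n)

# A simple stand-in for W^{-1}: the Jacobi (diagonal) inverse of P.
d_inv = 1.0 / np.diag(P)
W_inv = LinearOperator((n, n), matvec=lambda x: d_inv * x)

# GMRES minimizes the preconditioned residual over the Krylov subspace.
q, info = gmres(P, b, M=W_inv)
print("converged" if info == 0 else f"info={info}", np.linalg.norm(P @ q - b))
```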

2.2 The Fast Multipole Method

The FMM was initially proposed to speed up the evaluation of long-ranged particle
forces in the N-body problem [141, 193]. It can also be applied to iterative solvers
by accelerating the calculation of the MVP [118]. Let us take the capacitance extraction
problem as an example to introduce the operations in the FMM. In general, the
FMM discretizes the conductor surface into panels and forms a cube with a finite
height containing a number of panels. Then, it builds a hierarchical oct-tree of cubes
and evaluates the potential interaction P at different levels.

Fig. 12.1 Multipole operations within the FMM algorithm. Reprinted with permission from [56]
© 2011 IEEE

Specifically, the FMM first assigns all panels to leaf cells/cubes, and computes
the MEs for all panels in each leaf cell. Then, FMM calculates the multipole
expansion of each parent cell using the expansions of its children cells (called M2M
operations in upward pass). Next, the local field expansions of the parent cells can
be obtained by adding multipole expansions of well-separated parent cells at the
same levels (called M2L operations). After that, FMM descends the tree structure
to calculate the local field expansion of each panel based on the local expansion of
its parent cell (called L2L operations in the downward pass). All these operations are illustrated
in Fig. 12.1.
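The spatial decomposition that FMM builds on can be illustrated with the short sketch below, which assigns hypothetical panel centers to leaf cubes of a uniform bottom level of an oct-tree. This is only a minimal illustration, not the book's data structure.

```python
import numpy as np
from collections import defaultdict

def assign_to_leaf_cubes(centers, depth):
    """Map each panel center to the index of its leaf cube at the given oct-tree depth."""
    lo, hi = centers.min(axis=0), centers.max(axis=0)
    n_side = 2 ** depth                       # cubes per axis -> 8**depth cubes in total
    size = (hi - lo) / n_side
    idx = np.minimum(((centers - lo) / size).astype(int), n_side - 1)
    cubes = defaultdict(list)
    for panel, (i, j, k) in enumerate(idx):
        cubes[(i, j, k)].append(panel)
    return cubes

# Example: 1,000 random panel centers, oct-tree of depth 3 (512 leaf cubes).
centers = np.random.default_rng(1).uniform(size=(1000, 3))
leaves = assign_to_leaf_cubes(centers, depth=3)
print(len(leaves), "non-empty leaf cubes")
```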
In order to further speed up the evaluation of the MVP, the presented stochastic
extraction uses a parallel evaluation of P·q with variations, which is discussed in Sect. 4,
and an incremental preconditioner, which is discussed in Sect. 5. Both of these
features depend on finding an explicit dependence between the stochastic
process variation and the geometric parameters, which is discussed in Sect. 3.

3 Stochastic Geometrical Moment

With FMM, the complexity of evaluating the MVP P·q can be reduced to O(N) during
the GMRES iteration. Since the spatial decomposition in FMM is geometrically
dependent, it is helpful to express P using GMs with an explicit geometry
dependence. As a result, this can lead to an efficient recursive update (M2M, M2L,
L2L) of P on the oct-tree. The geometry dependence is also one key property to
preserve in the presence of the stochastic variation. In this section, we first derive the
geometrical moment and then expand it by stochastic orthogonal polynomials to
calculate the potential interaction with variations.

3.1 Geometrical Moment

Process variation includes global systematic variations and local random variations.
This chapter focuses on local random variations, or stochastic variations, which
are more difficult to handle. Note that although there are many variation sources,
without loss of generality, the chapter considers two primary geometrical parameters
with stochastic variation for the purpose of illustration: panel distance (d ) and panel
width (h). Due to the local random variation, the width of the discretized panel, as
well as the distance between panels, may show random deviations from the nominal
value. Though there could exist a systematic correlation between d and h for each
panel, PCA in Sect. 2.2 of Chap. 2 can be first applied to decouple those correlated
parameters, and hence, potentially reduce the number of random variables. After the
PCA for the global systematic variation, we focus on the more challenging part: the
local random variation. With expansions in Cartesian coordinates, we can relate the
potential interaction with the geometry parameter through GMs that can be extended
to consider stochastic variations.
Let the center of an observer cube be $\mathbf{r}_0$ and the center of a source cube be $\mathbf{r}_c$.
We assume that the distance between the $i$th source panel and $\mathbf{r}_c$ is a vector $\mathbf{r}$:

$$\mathbf{r} = r_x\vec{x} + r_y\vec{y} + r_z\vec{z}$$

with $|\mathbf{r}| = r$, and the distance between $\mathbf{r}_0$ and $\mathbf{r}_c$ is a vector $\mathbf{d}$:

$$\mathbf{d} = d_x\vec{x} + d_y\vec{y} + d_z\vec{z}$$

with $|\mathbf{d}| = d$.
In Cartesian coordinates $(x, y, z)$, when the observer is outside the source
region ($d > r$), a multipole expansion (ME) [9, 72] can be defined as

$$\frac{1}{|\mathbf{r}-\mathbf{d}|} = \sum_{p=0}^{\infty} \frac{(-1)^p}{p!}\,(\underbrace{\mathbf{r}\cdots\mathbf{r}}_{p}) \cdot \Big(\underbrace{\nabla\cdots\nabla}_{p}\,\frac{1}{d}\Big) = \sum_{p=0}^{\infty} M_p = \sum_{p=0}^{\infty} l_p(\mathbf{d})\, m_p(\mathbf{r}), \qquad (12.1)$$

by expanding $\mathbf{r}$ around $\mathbf{r}_c$, where

$$\begin{aligned}
l_0(\mathbf{d}) &= \frac{1}{d}, & m_0(\mathbf{r}) &= 1,\\
l_1(\mathbf{d}) &= \frac{d_k}{d^3}, & m_1(\mathbf{r}) &= r_k,\\
l_2(\mathbf{d}) &= \frac{3 d_k d_l}{d^5}, & m_2(\mathbf{r}) &= \frac{1}{6}\,(3 r_k r_l - \delta_{kl} r^2),\\
&\;\;\vdots\\
l_p(\mathbf{d}) &= \underbrace{\nabla\cdots\nabla}_{p}\,\frac{1}{d}, & m_p(\mathbf{r}) &= \frac{(-1)^p}{p!}\,(\underbrace{\mathbf{r}\cdots\mathbf{r}}_{p}). \qquad (12.2)
\end{aligned}$$

Note that $d_k, d_l$ are the coordinate components of the vector $\mathbf{d}$ in Cartesian coordinates, and
likewise $r_k, r_l$ for $\mathbf{r}$. $\nabla$ is the gradient operator taking spatial derivatives,
$\delta_{kl}$ is the Kronecker delta function, and $(\mathbf{r}\cdots\mathbf{r})$ and $(\nabla\cdots\nabla\,\frac{1}{d})$ are rank-$p$ tensors
with $x^\alpha y^\beta z^\gamma$ ($\alpha+\beta+\gamma = p$) components.
Assume that there is a spatial shift of the source-cube center $\mathbf{r}_c$, for example,
moving one child's center to its parent's center by $\mathbf{h}$ ($|\mathbf{h}| = c\,h$), where $c$ is a
constant and $h$ is the panel width. This leads to the following transformation for $m_p$
in (12.2):

$$m'_p = (\underbrace{(\mathbf{r}+\mathbf{h})\cdots(\mathbf{r}+\mathbf{h})}_{p}) = m_p + \sum_{q=1}^{p} \frac{1}{q!\,(p-q)!}\,(\underbrace{\mathbf{h}\cdots\mathbf{h}}_{q})\, m_{p-q}. \qquad (12.3)$$

Moreover, when the observer is inside the source region ($d < r$), a local
expansion (LE) in Cartesian coordinates is simply achieved by exchanging $\mathbf{d}$
and $\mathbf{h}$ in (12.1):

$$\frac{1}{|\mathbf{r}-\mathbf{h}|} = \sum_{p=0}^{\infty} L_p = \sum_{p=0}^{\infty} m_p(\mathbf{h})\, l_p(\mathbf{r}). \qquad (12.4)$$

Also, when there is a spatial shift of the observer-cube center $\mathbf{r}_0$, the shift of the
moments $l_p(\mathbf{r})$ can be derived similarly to (12.3).
Clearly, both $M_p$, $L_p$ and their spatial shifts show an explicit dependence on the
panel width $h$ and the panel distance $d$. For this reason, we call $M_p$ and $L_p$ GMs. As
such, we can also express the potential coefficient

$$4\pi\varepsilon_0\, P(h, d) \simeq \begin{cases} \sum_{p=0}^{\infty} M_p & \text{if } d > r,\\ \sum_{p=0}^{\infty} L_p & \text{otherwise}, \end{cases} \qquad (12.5)$$

as a geometry-dependent function $P(h, d)$ via GMs.



Moreover, assume that the local random variations are described by two random
variables: $\xi_h$ for the panel width $h$ and $\xi_d$ for the panel distance $d$. The stochastic
forms of $M_p$ and $L_p$ then become

$$\hat{M}_p(\xi_h, \xi_d) = M_p(h_0 + h_1\xi_h,\; d_0 + d_1\xi_d),$$
$$\hat{L}_p(\xi_h, \xi_d) = L_p(h_0 + h_1\xi_h,\; d_0 + d_1\xi_d), \qquad (12.6)$$

where $h_0$ and $d_0$ are the nominal values and $h_1$ and $d_1$ define the perturbation
ranges (% of nominal). Similarly, the stochastic potential interaction becomes
$\hat{P}(\xi_h, \xi_d)$.
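To make the geometry dependence concrete, the sketch below evaluates the truncated expansion of (12.1) using the low-order moments $l_p$, $m_p$ for $p \le 1$ and then re-evaluates it at perturbed values $h_0 + h_1\xi_h$ and $d_0 + d_1\xi_d$ as in (12.6). The numbers, directions, and the truncation at first order are purely illustrative assumptions.

```python
import numpy as np

def moments_up_to_order_1(r_vec, d_vec):
    """Truncated expansion of 1/|r - d|: M_0 = l_0(d) m_0(r), M_1 = sum_k l_1k(d) m_1k(r)."""
    d = np.linalg.norm(d_vec)
    M0 = 1.0 / d                                # l_0(d) * m_0(r)
    M1 = float(np.dot(d_vec, r_vec)) / d**3     # sum_k (d_k / d^3) * r_k
    return M0 + M1

# Nominal geometry (illustrative numbers): r scales with panel width h, d with panel distance.
h0, d0 = 1.0, 10.0
h1, d1 = 0.1 * h0, 0.1 * d0                     # 10% perturbation ranges
r_dir = np.array([1.0, 0.0, 0.0])
d_dir = np.array([0.0, 1.0, 0.0])

nominal = moments_up_to_order_1(h0 * r_dir, d0 * d_dir)

# Stochastic geometrical moments: re-evaluate at h0 + h1*xi_h, d0 + d1*xi_d, cf. (12.6).
xi_h, xi_d = 0.5, -1.2                          # one sample of the standard normal variables
perturbed = moments_up_to_order_1((h0 + h1 * xi_h) * r_dir, (d0 + d1 * xi_d) * d_dir)
print(nominal, perturbed)
```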

3.2 Orthogonal PC Expansion

By expanding the stochastic potential interaction $\hat{P}(\xi_h, \xi_d)$ with OPC, we can
further derive the SGMs similarly to Sect. 4 of Chap. 11.
We use $n = 1$ as an example to illustrate the general expression in Sect. 4 of
Chap. 11. First, the potential coefficient matrix $\hat{P}$ can be expanded with the first two
Hermite polynomials as

$$\hat{P}(\xi) = P_0\Phi_0(\xi) + P_1\Phi_1(\xi) = P_0 + P_1\xi.$$

Then, the $W_k$ ($k = 0, 1$) matrices become

$$W_0 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}, \qquad W_1 = \begin{pmatrix} 0 & 1 & 0\\ 1 & 0 & 2\\ 0 & 2 & 0 \end{pmatrix},$$

and the newly augmented coefficient system can be written as

$$P = W_0 \otimes P_0 + W_1 \otimes P_1 = \begin{pmatrix} P_0 & 0 & 0\\ 0 & P_0 & 0\\ 0 & 0 & P_0 \end{pmatrix} + \begin{pmatrix} 0 & P_1 & 0\\ P_1 & 0 & 2P_1\\ 0 & 2P_1 & 0 \end{pmatrix} = \begin{pmatrix} P_0 & P_1 & 0\\ P_1 & P_0 & 2P_1\\ 0 & 2P_1 & P_0 \end{pmatrix}. \qquad (12.7)$$

By solving for $q_0, q_1, \ldots, q_n$, the Hermite polynomial expansion of the charge
density can be obtained. In particular, the mean and the variance are given by

$$E(q(\xi_d)) = q_0,$$
$$\mathrm{Var}(q(\xi_d)) = q_1^2\,\mathrm{Var}(\xi_d) + q_2^2\,\mathrm{Var}(\xi_d^2 - 1) = q_1^2 + 2 q_2^2.$$
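A minimal sketch of assembling and solving the augmented system (12.7) with NumPy follows. The blocks $P_0$, $P_1$ and the right-hand side are small random stand-ins rather than actual BEM blocks, and the mean/variance expressions at the end are the ones given above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4                                          # number of panels (illustrative)
P0 = np.eye(n) + 0.05 * rng.standard_normal((n, n))   # nominal potential block
P1 = 0.01 * rng.standard_normal((n, n))               # first-order variation block
b = np.ones(n)

# Inner-product matrices for a 1-variable, second-order Hermite basis {1, xi, xi^2 - 1}.
W0 = np.eye(3)
W1 = np.array([[0., 1., 0.],
               [1., 0., 2.],
               [0., 2., 0.]])

# Augmented system P = W0 (x) P0 + W1 (x) P1; the right-hand side lives in the mean block.
P_aug = np.kron(W0, P0) + np.kron(W1, P1)
B = np.concatenate([b, np.zeros(2 * n)])

Q = np.linalg.solve(P_aug, B)
q0, q1, q2 = Q[:n], Q[n:2 * n], Q[2 * n:]
mean = q0
variance = q1**2 + 2.0 * q2**2                 # Var(q) = q1^2 + 2 q2^2, per the formulas above
print(mean, variance)
```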

Fig. 12.2 Structure of the augmented system in piCAP (matrix row index vs. matrix column index;
the blocks P0, P1, and 2P1 follow the pattern of (12.7))

Note that under a BEM formulation, the expanded terms $P_i$ are still dense. For a
single-plate example, Fig. 12.2 shows the structure of the augmented system in (12.7).
Considering that the dimension of $\hat{P}$ is further augmented, solving the
augmented system (11.25) would be expensive. In the following, we
present a parallel FMM to reduce the cost of MVP evaluations in Sect. 4 and an
incremental preconditioner to reduce the cost of GMRES evaluation in Sect. 5.

4 Parallel Fast Multipole Method with SGM

As discussed in Sect. 3, we need an efficient evaluation of the MVP P·Q for the
augmented and dense system (11.25). The block structure of the matrix blocks in
P can be utilized to simplify the evaluation of the MVP (P·Q). In the framework
of a parallel FMM, each product $P_{i,j}\cdot q_i$ ($q_i = q_0, q_1, \ldots, q_n$), i.e., the MVPs of
both the nominal values and their variations, can be efficiently evaluated at the block
level before being summed to obtain the final P·Q. Though the parallel FMM
has been discussed before, such as in [201], the extension to deal with stochastic
variation for capacitance extraction needs to be addressed in the context of SGMs.
In the following, we illustrate the parallel FMM considering the process variation.
The first step of a parallel FMM evaluation is to hierarchically subdivide space in
order to form clusters of panels. This is accomplished by using a tree structure
to represent each subdivision. We assume that there are N panels at the finest
(or bottom) level. Given a depth H, we build an oct-tree with $H = \lceil \log_8 (N/n) \rceil$ by
assigning n panels to one cube. In other words, there are $8^H$ cubes at the bottom level.
A parallel FMM further distributes a number of cubes onto different processors to
evaluate P. The decomposition of the tasks needs to minimize the communication

Fig. 12.3 The M2M operation in an upward pass to evaluate local interactions around sources
(the leaf source's center is shifted to its parent source's center)

cost and balance the workload. In the following steps, the stochastic P·Q is
evaluated in two passes: an upward pass for multipole expansions (MEs) and a
downward pass for local expansions (LEs), both of which are further illustrated
with details below.

4.1 Upward Pass

The upward pass manages the computation during the source expansion, which is
illustrated in Fig. 12.3.
It accumulates the multipole-expanded near-field interaction starting from the
bottom level ($l = 0$). For each child cube (leaf) without variation (nominal contribution
to $P_0$) at the bottom level, it first evaluates the geometrical moments
with (12.1) for all panels in that cube. If a panel experiences a variation $\xi_d$ or $\xi_h$,
it calculates $P_i(\xi)\cdot q$ ($i \neq 0$, $\xi = \xi_d, \xi_h$) by adding the perturbation $h_i\xi_h$ or $d_i\xi_d$ to
consider the different variation sources, and then evaluates the SGMs with (12.6).
After building the MEs for each panel, it traverses to the upper level to consider
the contribution from parents, as shown in Fig. 12.3. The moment of a parent cube
can be efficiently updated by summing the moments of its eight children via an

M2M operation. Based on (12.3), the M2M operation translates the children's $\hat{M}_p$ into their
parent's.
The M2M operations at different parents are performed in parallel since there is
no data dependence. Each processor builds its own panels’ SGMs while ignoring
the existence of other processors.

4.2 Downward Pass

The potential evaluation for the observer is managed during a downward pass. At
the $l$-th level ($l > 0$), two cubes are said to be adjacent if they have at least one common
vertex. Two cubes are said to be well separated if they are not adjacent at level $l$ but
their parent cubes are adjacent at level $l-1$. Otherwise, they are said to be far from
each other. The list of all the well-separated cubes of one cube at level $l$ is called
the interaction list of that cube.
From the top level $l = H-1$, interactions from the cubes on the interaction
list to one cube are calculated by an M2L operation at each level (the M2L operation
at the top level is illustrated in Fig. 12.4). Assume that a source-parent center
$\mathbf{r}_c$ is changed to an observer-parent's center $\mathbf{r}_0$; this leads to an LE (12.4) using the
ME (12.1) when exchanging $\mathbf{r}$ and $\mathbf{d}$. As such, the M2L operation translates the
source's $\hat{M}_p$ into the observer's $\hat{L}_p$ for a number of source parents on the interaction
list of one observer parent at the same level. Due to the use of the interaction list,
the M2L operations have a data dependence that introduces overhead for a parallel
evaluation.
After the M2L operation, interactions are further recursively distributed down to
the children from their parents by an L2L operation (the converse of the upward pass,
shown in Fig. 12.5). Assume that the parent's center $\mathbf{r}_0$ is changed to the child's
center $\mathbf{r}_0'$ by a constant $\mathbf{h}$. Identical to the M2M update by (12.3), an L2L operation
updates $\mathbf{r}$ by $\mathbf{r}' = \mathbf{r} + \mathbf{h}$ for all children's $\hat{L}_k$'s. In this stage, all processors can
perform the same operation at the same time on different data, which fully
exploits the parallelism.
Finally, the FMM sums the L2L results for all leaves at the bottom level ($l = 0$)
and tabulates the computed products $P_i\cdot q_j$ ($i, j = 0, 1, \ldots, n$). By summing up
the products in order, the FMM returns the product P·Q in (11.25) for the next
GMRES iteration.
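The block-level evaluation described above can be sketched as follows: rather than forming the augmented matrix, each block product $P_{i,j}\cdot q_j$ is evaluated independently (in the actual method, by FMM on separate processors) and the results are summed per block row. The dense blocks here are random stand-ins, not FMM-evaluated moments.

```python
import numpy as np

def augmented_mvp(blocks, q_blocks):
    """Evaluate the augmented MVP block-row by block-row: y_i = sum_j blocks[i][j] @ q_j.

    blocks[i][j] is either a matrix or None (a zero block); each nonzero product can
    be computed independently, which is what the parallel FMM exploits.
    """
    y = []
    for row in blocks:
        acc = np.zeros_like(q_blocks[0])
        for Pij, qj in zip(row, q_blocks):
            if Pij is not None:
                acc += Pij @ qj              # in piCAP this product would be an FMM evaluation
        y.append(acc)
    return y

rng = np.random.default_rng(3)
n = 5
P0 = np.eye(n) + 0.05 * rng.standard_normal((n, n))
P1 = 0.01 * rng.standard_normal((n, n))
blocks = [[P0, P1, None],
          [P1, P0, 2 * P1],
          [None, 2 * P1, P0]]                # block pattern of (12.7)/(12.19)
q_blocks = [rng.standard_normal(n) for _ in range(3)]
print(augmented_mvp(blocks, q_blocks))
```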

4.3 Data Sharing and Communication

The total runtime complexity of the parallel FMM using stochastic GMs can be estimated
as $O(N/B) + O(\log_8 B) + C(N, B)$, where N is the total number of panels

Fig. 12.4 The M2L operation in a downward pass to evaluate interactions of a well-separated source
cube and observer cube (the parent source's center is translated to the parent observer's center)

and B is the number of processors used. The term $C(N, B)$ represents the communication or
synchronization overhead.
Therefore, it is desired to minimize the overhead of data sharing and communi-
cation during a parallel evaluation. In the presented parallel FMM implementations,
the message-passing interface (MPI) is used for data communication and synchro-
nization between multiple processors. We notice that data dependency mainly comes
from the interaction list during M2L operations. In this operation, a local cube
needs to know the ME moments from cubes in its interaction list. To design a
task distribution with small latency between computation and communication, the
implementation uses a complement interaction list and prefetch operation.
As shown in Fig. 12.6, the complement interaction list (or dependency list) of
the cube under calculation records the cubes that require its ME moments (listed
within the shaded area). As such, the studied cube first anticipates which ME
moments will be needed by the dependent cubes (such as Cube 0, ..., Cube k
shown in Fig. 12.6). Then, it distributes the required ME moments to these cubes
prior to the computation. From the point of view of these dependent cubes, they can
“prefetch” the required ME moments and perform their own calculations without
stalls. Therefore, the communication overhead can be significantly reduced.

Fig. 12.5 The L2L operation in a downward pass to sum all integrations (the parent observer's
center is translated to the leaf observer's center)

Fig. 12.6 Prefetch operation in M2L: the dependency list of the cube under calculation contains
Cube 0, Cube 1, ..., Cube k. Reprinted with permission from [56] © 2011 IEEE

5 Incremental GMRES

The parallel FMM presented in Sect. 4 provides a fast MVP for the GMRES
iteration. As discussed in Sects. 2 and 3, another critical factor for a fast GMRES
is the construction of a good preconditioner. In this section, to improve the
convergence of the GMRES iteration, we first present a deflated power iteration.
Then, we introduce an incremental preconditioning scheme in the framework of the
deflated power iteration.

5.1 Deflated Power Iteration

The convergence of GMRES can be slow in the presence of degenerated small
eigenvalues of the potential matrix P, as is the case for most extraction problems
with fine meshes. Constructing a preconditioner W to shift the eigenvalue distri-
bution (spectrum) of the preconditioned matrix W·P can significantly improve the
convergence [49]. This is one of the so-called deflated GMRES methods [166].
To avoid fully decomposing P, an implicitly restarted Arnoldi method, as implemented
in ARPACK (http://www.caam.rice.edu/software/ARPACK/), can be applied to find its
first K eigenvalues $[\lambda_1, \ldots, \lambda_K]$ and the K-th-order Krylov subspace composed of
the first K eigenvectors $V_K = [v_1, \ldots, v_K]$, where

$$P V_K = V_K D_K, \qquad V_K^T V_K = I. \qquad (12.8)$$

Note that $D_K$ is a diagonal matrix composed of the first K eigenvalues:

$$D_K = V_K^T P V_K = \mathrm{diag}[\lambda_1, \ldots, \lambda_K]. \qquad (12.9)$$

Then, a corresponding spectrum preconditioner is formed:

$$W = I + \sigma\, (V_K D_K^{-1} V_K^T), \qquad (12.10)$$

which leads to a shifted eigenspectrum:

$$(W\cdot P)\,v_i = (\sigma + \lambda_i)\,v_i, \qquad i = 1, \ldots, K. \qquad (12.11)$$

Note that $\sigma$ is the shifting value that leads to a better convergence. This method
is called deflated power iteration. Moreover, as discussed below, the spectral
preconditioner W can be easily updated in an incremental fashion.
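The spectral preconditioner of (12.10) can be sketched with SciPy's ARPACK interface (eigsh) as below. The test matrix, its synthetic well-separated spectrum, and the shift value are illustrative choices, not the book's settings.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(4)
n, K, shift = 300, 8, 1.0

# Synthetic symmetric stand-in with K small, well-separated eigenvalues.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
vals = np.concatenate([np.linspace(0.05, 0.4, K), np.linspace(5.0, 50.0, n - K)])
P = Q @ np.diag(vals) @ Q.T

# K eigenpairs nearest zero via the implicitly restarted Arnoldi/Lanczos method (shift-invert).
lam, V = eigsh(P, k=K, sigma=0.0, which="LM")
D_inv = np.diag(1.0 / lam)

# Spectral preconditioner W = I + shift * V_K D_K^{-1} V_K^T, cf. (12.10).
W = np.eye(n) + shift * V @ D_inv @ V.T

# The deflated eigenvalues of W P are shifted to (shift + lambda_i), cf. (12.11).
for i in range(K):
    v = V[:, i]
    print(lam[i], v @ (W @ (P @ v)))          # approximately shift + lam[i]
```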

5.2 Incremental Precondition

The essence of the deflated GMRES is to form a preconditioner that shifts the
degenerated small eigenvalues. For a new P' with an update δP, the distribution
of the degenerated small eigenvalues changes accordingly. Therefore, given a
preconditioner W for the nominal system with the potential matrix $P^{(0)}$, it would
be expensive to run another naive Arnoldi iteration to form a new preconditioner W'
for a new P' with updates δP from $P^{(1)}, \ldots, P^{(n)}$. Instead, we show that W can be
incrementally updated as follows.
If there is a perturbation δP in P, the perturbation $\delta v_i$ of the $i$th eigenvector $v_i$
($i = 1, \ldots, K$) is given by [171]:

$$\delta v_i = V_i B_i^{-1} V_i^T\, \delta P\, v_i. \qquad (12.12)$$

Note that $V_i$ is the subspace composed of

$$[v_1, \ldots, v_j, \ldots, v_K],$$

and $B_i$ is the perturbed spectrum

$$\mathrm{diag}[\lambda_i - \lambda_1, \ldots, \lambda_i - \lambda_j, \ldots, \lambda_i - \lambda_K]$$

($j \neq i$; $i, j = 1, \ldots, K$). As a result, $\delta V_K$ can be obtained similarly for the K
eigenvectors.
Assume that the perturbed preconditioner is W':

$$W' = I + \sigma\, V_K' (D_K')^{-1} (V_K')^T = W + \delta W, \qquad (12.13)$$

where

$$V_K' = V_K + \delta V_K, \qquad D_K' = (V_K')^T P\, V_K'. \qquad (12.14)$$

After expanding $V_K'$ by $V_K$ and $\delta V_K$, the incremental change in the preconditioner
W can be obtained as

$$\delta W = \sigma\, (E_K - V_K D_K^{-1} F_K D_K^{-1} V_K^T), \qquad (12.15)$$

where

$$E_K = \delta V_K D_K^{-1} V_K^T + (\delta V_K D_K^{-1} V_K^T)^T, \qquad (12.16)$$

and

$$F_K = \delta V_K^T V_K D_K + (\delta V_K^T V_K D_K)^T. \qquad (12.17)$$

Note that all the above inverse operations only involve the diagonal matrix $D_K$,
and hence, the computational cost is low.
Since only one Arnoldi iteration is needed to construct the nominal spectral
preconditioner W, it can then be efficiently updated whenever δP changes. For example,
δP changes when one alters the perturbation range $h_1$ of the panel width or switches
the variation type from the panel width h to the panel distance d. We call this deflated
GMRES method with the incremental precondition an iGMRES method.
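The incremental update of (12.12)–(12.17), as reconstructed above, can be sketched as follows. The nominal matrix, the perturbation, and the shift are synthetic, and the final line only checks that W + δW tracks the preconditioner rebuilt from the updated subspace of (12.13)–(12.14) to first order.

```python
import numpy as np

rng = np.random.default_rng(5)
n, K, shift = 120, 6, 1.0

# Nominal symmetric matrix with K well-separated small eigenvalues (synthetic stand-in).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
vals = np.concatenate([np.linspace(0.1, 0.6, K), np.linspace(5.0, 50.0, n - K)])
P = Q @ np.diag(vals) @ Q.T

lam, V = np.linalg.eigh(P)
lam, V = lam[:K], V[:, :K]                    # K smallest eigenpairs of the nominal system
D_inv = np.diag(1.0 / lam)
W = np.eye(n) + shift * V @ D_inv @ V.T       # nominal spectral preconditioner, cf. (12.10)

dP = 1e-3 * rng.standard_normal((n, n))
dP = 0.5 * (dP + dP.T)                        # small symmetric perturbation of P

# Eigenvector perturbation within the retained subspace, cf. (12.12).
dV = np.zeros_like(V)
for i in range(K):
    others = [j for j in range(K) if j != i]
    Vi = V[:, others]
    Bi_inv = np.diag(1.0 / (lam[i] - lam[others]))
    dV[:, i] = Vi @ (Bi_inv @ (Vi.T @ (dP @ V[:, i])))

# Incremental change of the preconditioner, cf. (12.15)-(12.17).
E = dV @ D_inv @ V.T
E = E + E.T
F = dV.T @ V @ np.diag(lam)
F = F + F.T
dW = shift * (E - V @ D_inv @ F @ D_inv @ V.T)

# Compare with the preconditioner rebuilt from the updated subspace, cf. (12.13)-(12.14).
V2 = V + dV
D2 = V2.T @ P @ V2
W2 = np.eye(n) + shift * V2 @ np.linalg.inv(D2) @ V2.T
print(np.linalg.norm(W + dW - W2) / np.linalg.norm(W2))   # small: second order in dV
```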

For our problem in (11.25), we first analyze an augmented nominal system with (in block form)

$$W = \mathrm{diag}[W, W, \ldots, W], \qquad P = \mathrm{diag}[P^{(0)}, P^{(0)}, \ldots, P^{(0)}],$$
$$D_K = \mathrm{diag}[D_K, D_K, \ldots, D_K], \qquad V_K = \mathrm{diag}[V_K, V_K, \ldots, V_K],$$

which are all block diagonal with n blocks. Hence, there is only one preconditioning
cost, from the nominal block $P^{(0)}$. In addition, the variation contributes to the
perturbation matrix by

$$\delta P = \begin{pmatrix} 0 & P_{0,1} & \cdots & P_{0,n}\\ P_{1,0} & 0 & \cdots & P_{1,n}\\ \vdots & \vdots & \ddots & \vdots\\ P_{n,0} & P_{n,1} & \cdots & 0 \end{pmatrix}. \qquad (12.18)$$

6 piCAP Algorithm

We further discuss how to apply iGMRES to the presented stochastic capacitance


extraction in this part. For a full-chip extraction, simultaneously considering
variations from all kinds of geometrical parameters would significantly increase
the model complexity, if it is feasible at all. In this chapter, we study the stochastic variation
contributed by each parameter individually in an incremental fashion. Together
with the incremental GMRES discussed above, the computational cost can be
dramatically reduced for a large-scale extraction.

6.1 Extraction Flow

The overall parallel extraction flow in piCAP is presented in Fig. 12.7. First, piCAP
discretizes conductor surfaces into small panels, and builds a hierarchical oct-tree
of cubes, which will be distributed onto many processors. Then, it sets the potential
of a certain conductor j to 1 volt while all other conductors are grounded. After that,
the spectrum preconditioner W is built according to the variational system P and
updated partially for different variation sources. With this preconditioner, piCAP
uses GMRES to solve the augmented linear system P·Q = B iteratively until
convergence. The parallel FMM described in Sect. 4 is then invoked to provide the MVP
P·Q efficiently for GMRES. Finally, the variational capacitance $C_{ij}$ is
obtained by summing up the panel charges on conductor i.
As an example, consider the procedure for the panel distance d. With the first-order
OPC expansion and the inner product, we obtain the augmented potential
coefficient matrix below:

Fig. 12.7 Stochastic capacitance extraction algorithm

$$P = P^{(0)} + \delta P = \begin{pmatrix} P_0 & 0 & 0\\ 0 & P_0 & 0\\ 0 & 0 & P_0 \end{pmatrix} + \begin{pmatrix} 0 & P_1 & 0\\ P_1 & 0 & 2P_1\\ 0 & 2P_1 & 0 \end{pmatrix} = \begin{pmatrix} P_0 & P_1 & 0\\ P_1 & P_0 & 2P_1\\ 0 & 2P_1 & P_0 \end{pmatrix}. \qquad (12.19)$$

Notice that the first-order OPC expansion is used here for illustration, and a
higher order expansion can provide more accurate variance information.
With the spectrum precondition in Sect. 5, we can build $W^{(0)}$ for $P^{(0)}$ and δW
for δP. Thus, the preconditioner W for the augmented system can be written as

$$W = W^{(0)} + \delta W. \qquad (12.20)$$


Therefore, the preconditioned GMRES can be used to solve the linear system
P·Q = B with W as the preconditioner. In each iteration, the parallel FMM in
Sect. 4 is invoked to provide the MVP P·Q quickly. More specifically, FMM
first calculates the geometric moments for the potential coefficient $P_0$ in $P^{(0)}$ with (12.5).
Then, it introduces a perturbation of range $d_1$ (% of nominal) to the panel distance
d and recalculates the geometric moments for $P_1$ in δP according to (12.6). With
all geometric moments, FMM can evaluate $P^{(0)}$ and δP, and then return the final
MVP P·Q.
When GMRES converges, it yields the resultant vector $Q_d = [q_0, q_1, \ldots, q_n]^T$,
which contains the mean as well as the variance for the geometric parameter d via

$$E(q(\xi_d)) = q_0,$$
$$\mathrm{Var}(q(\xi_d)) = q_1^2\,\mathrm{Var}(\xi_d) + q_2^2\,\mathrm{Var}(\xi_d^2 - 1) = q_1^2 + 2 q_2^2.$$

The above procedure can be similarly applied to calculate the variance and the mean
for the geometrical parameter h. Clearly, the stochastic orthogonal expansion leads
to an augmented system with perturbed blocks in the off-diagonal. It increases the
computational cost for any GMRES method and remains an unresolved issue in the
previous applications of the stochastic orthogonal polynomial [21, 34, 187, 209].
In addition, when variation changes, the P matrix should be partially updated.
Forming a new preconditioner to consider the augmented (11.26) would therefore
be expensive.
Based on (12.15), we can do an incremental update of the preconditioner W to
consider a new variation $P^{(i)}$ when changing the perturbation range of $h_i$ or $d_i$.
Moreover, we can also make an incremental update of W when changing
the variation type from $P^{(i)}(h)$ to $P^{(i)}(d)$. This can dramatically reduce costs when
applying the deflated GMRES during the variational capacitance extraction. The
same procedure can be easily extended for high-order expansions with stochastic
orthogonal polynomials.

6.2 Implementation Optimization

The memory complexity of iGMRES limits its scalability to large-scale
problems; it generally comes from two parts: the memory consumption of the
preconditioner and that of the MVP. Moreover, the time complexity mainly stems from
the time-consuming LU and eigenvalue decompositions.

The first memory bottleneck is the $O(N^2)$ storage requirement of the
preconditioner matrix. For example, a second-order expanded system contains 3N
variables, where N is the number of panels, which is expensive to maintain. Because
each block $P_{i,j}$ is a symmetric positive semi-definite matrix, we can prune
small off-diagonal entries, store only half of them, and further apply a compressed
sparse column (CSC) format to store the preconditioner matrix. This reduces the
cost to build and store the block-diagonal spectral preconditioner. The other memory
bottleneck, for the MVP, is resolved by the intrinsic matrix-free property of FMM,
which exploits the tree hierarchy to speed up the MVP evaluation at a cost of
$O(N \log N)$ for both memory and CPU time. Thus, the presented FMM using SGMs
can be efficiently used for large-scale variational capacitance extraction.
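The pruning and CSC-storage idea can be sketched with SciPy as below; the threshold and the synthetic symmetric block are placeholders, not the values used in piCAP.

```python
import numpy as np
from scipy.sparse import csc_matrix, triu

rng = np.random.default_rng(6)
n = 500
B = rng.standard_normal((n, n)) * np.exp(-rng.uniform(0, 12, (n, n)))
B = 0.5 * (B + B.T)                           # symmetric block with many tiny entries

# Prune entries below a threshold, keep only the upper triangle (symmetry), store as CSC.
tol = 1e-6 * np.abs(B).max()
B_pruned = np.where(np.abs(B) >= tol, B, 0.0)
B_csc = csc_matrix(triu(B_pruned))

dense_bytes = B.nbytes
sparse_bytes = B_csc.data.nbytes + B_csc.indices.nbytes + B_csc.indptr.nbytes
print(f"stored {B_csc.nnz} of {n * n} entries; {sparse_bytes / dense_bytes:.1%} of dense storage")
```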
The time complexity stems mainly from the analysis of the preconditioner of the
nominal system the first time it is built. A restarted Arnoldi iteration in ARPACK can
be used to efficiently identify the first K eigenvalues, which significantly reduces
this cost to $O(N)$. As a result, the computational cost to form the preconditioner is
low even for the first construction.

7 Numerical Examples

Based on the presented algorithm, a piCAP prototype has been developed in
C++ on Linux network servers with Xeon processors (2.4 GHz CPU and 2 GB memory).
In this section, we first validate the accuracy of SGMs by comparing them with
the MC integral. Then, we study the parallel runtime scalability when evaluating the
potential interaction using the MVP with charge. In addition, the incremental GMRES
preconditioner is verified by comparing its total runtime with its nonincremental
counterpart. Finally, the spectral precondition is validated by analyzing the spectrum
of the potential coefficient matrix. The initial results of this chapter were published
in [53].

7.1 Accuracy Validation

To validate the accuracy of SGM by first-order and second-order expansions, we


use two distant square panels as shown in Fig. 12.8. The nominal center-to-center
distance d is d0 , and nominal panel width h is h0 .

7.1.1 Orthogonal PC Expansion

First, we compare the accuracy of the first-order and second-order OPC expansions
against the exact values from the integration method. The $C_{ij}$ between these two panels
is calculated with the different methods as listed in Table 12.1. It can be observed that
the second-order OPC expansion achieves higher accuracy than the first-order expansion

Fig. 12.8 Two distant panels in the same plane: panels i and j with width h and center-to-center
distance d (axes X, Y, Z in µm)
Table 12.1 Accuracy comparison of two orthogonal PC expansions

2 panels, d0 = 25 µm, h0 = 5 µm
           First-order orthogonal PC   Second-order orthogonal PC   Integration
Cij (fF)   2.7816                      2.777                        2.7769

2 panels, d0 = 15 µm, h0 = 2 µm
           First-order orthogonal PC   Second-order orthogonal PC   Integration
Cij (fF)   1.669                       1.6677                       1.6677

when compared with the exact values from the integration method. Thus, a higher-order OPC
expansion leads to a more accurate result, but at a higher computational expense
due to the larger-scale system.

7.1.2 Incremental Analysis

One possible concern is the accuracy of the incremental analysis, which considers
independent variation sources separately and combines their contributions to obtain
the total variational capacitance. In order to validate it, we first introduce the panel width
variation (a Gaussian distribution with perturbation range $h_1$) to panel j in Fig. 12.8
and calculate the variable capacitance distribution. Then, panel distance variation
d1 is added to panel j and the same procedure is conducted. As such, according
to incremental analysis, we can obtain the total capacitance as a superposition of
nominal capacitance and both variation contributions. Moreover, we introduce the
MC simulations (10,000 times) as the baseline where both variations are introduced
simultaneously. The comparison is shown in Table 12.2, and we can observe that
the results from incremental analysis can achieve high accuracy.
Actually, it would be ideal to consider all variations simultaneously, but the dimension
of the system can increase exponentially with the number of variations, and thus the
complexity becomes prohibitive. As a result, when the variation sources are independent,
it is possible and necessary to separate them by solving the problem with each
variation individually.

Table 12.2 Incremental analysis versus MC method

2 panels, d0 = 10 µm, h0 = 2 µm, d1 = 30% d0, h1 = 30% h0
            Incremental analysis (fF)   MC (fF)            Error (%)
Mean Cij    1.1115                      1.1137             0.19
Std Cij     0.11187                     0.11211            0.21

2 panels, d0 = 25 µm, h0 = 5 µm, d1 = 20% d0, h1 = 20% h0
            Incremental analysis (fF)   Monte Carlo (fF)   Error (%)
Mean Cij    2.7763                      2.7758             0.018
Std Cij     0.19477                     0.194              0.39

Table 12.3 Accuracy and runtime (s) comparison between MC (3,000 runs) and piCAP

2 panels, d0 = 7.07 µm, h0 = 1 µm, d1 = 20% d0
              MC        piCAP
Cij (fF)      0.3113    0.3056
Runtime (s)   2.6965    0.008486

2 panels, d0 = 11.31 µm, h0 = 1 µm, d1 = 10% d0
              MC        piCAP
Cij (fF)      0.3861    0.3824
Runtime (s)   2.694     0.007764

2 panels, d0 = 4.24 µm, h0 = 1 µm, d1 = 20% d0, h1 = 20% h0
              MC        piCAP
Cij (fF)      0.2498    0.2514
Runtime (s)   2.7929    0.008684

7.1.3 Stochastic Geometrical Moments

Next, the accuracy of the presented method based on SGMs is verified with the same
example in Fig. 12.8. To do so, we introduce a set of different random variation
ranges with Gaussian distributions for the distance d and width h. For this example,
the MC method is used to validate the accuracy of the SGMs.
First, the MC method calculates the $C_{ij}$'s 3,000 times, and each time a normally
distributed variation is introduced to the distance d randomly. As such, we can
evaluate the distribution, including the mean value μ and the standard deviation σ,
of the variational capacitance.
Then, we introduce the same random variation to the geometric moments in (12.6)
with the stochastic polynomial expansion. Because of the explicit dependence on the
geometrical parameters according to (12.1), we can efficiently calculate the $\hat{C}_{ij}$'s. Table 12.3
shows the $C_{ij}$ values and runtimes using the aforementioned two approaches. The
comparison in Table 12.3 shows that SGMs not only keep high accuracy, yielding
an average error of 1.8%, but can also be up to 347× faster than the MC
method.
Moreover, Fig. 12.9 shows the Cij distribution from MC (3,000 times), while
considering 10% panel distance variation with Gaussian distribution. Also, the mean
and variance computed by piCAP are marked in the figure with the dashed lines,
which fit very well with MC results.

Fig. 12.9 Distribution comparison between Monte Carlo and piCAP: histogram of Cij (pF) versus
number of occurrences, with the piCAP mean μ and μ ± 3σ marked by dashed lines

7.2 Speed Validation

In this part, we study the runtime scalability using a few large examples to show
both the advantage of the parallel FMM for MVP and the advantage of the deflated
GMRES with incremental preconditions.

7.2.1 Parallel Fast Multipole Method

The four large examples comprise 20, 40, 80, and 160 conductors, respec-
tively. For the two-layer example with 20 conductors, each conductor is of size
1 × 1 × 25 µm (width × thickness × length), and piCAP employs a uniform 3 × 3 × 50
discretization. Figure 12.10 shows its structure and surface discretization.
For each example, we use a different number of processors to calculate the MVP
P·q by the parallel FMM. Here we assume that only d has a 10% perturbation
range with a Gaussian distribution. As shown in Table 12.4, the runtime of the parallel
MVP decreases evidently when more processors are involved. Due to the use of the
complement interaction list, the latency of communication is largely reduced, and
the runtime shows good scalability versus the number of processors. In fact, the
dependency list can eliminate the major communication overhead and further achieve
a 1.57× speedup with four processors. Moreover, the total MVP runtime with four
processors is about 3× faster on average than the runtime with a single processor.

Fig. 12.10 The structure and discretization of the two-layer example with 20 conductors. Reprinted
with permission from [56] © 2011 IEEE

Table 12.4 MVP runtime (s)/speedup comparison for four different examples
#Wire 20 40 80 160
#Panels 12,360 10,320 11,040 12,480
1 proc 0.737515/1.0 0.541515/1.0 0.605635/1.0 0.96831/1.0
2 procs 0.440821/1.7 0.426389/1.4 0.352113/1.7 0.572964/1.7
3 procs 0.36704/2.0 0.274881/2.0 0.301311/2.0 0.489045/2.0
4 procs 0.273408/2.7 0.19012/2.9 0.204606/3.0 0.340954/2.8

It is worth mentioning that the MVP needs to be performed many times in an
iterative solver such as GMRES. Hence, even a small reduction of the MVP runtime
can have a substantial impact on the total runtime of the solution, especially when
the problem size increases rapidly.

7.2.2 Deflated GMRES

piCAP has been used to perform analysis for three different structures, as shown in
Fig. 12.11. The first is a plate of size 32 × 32 µm discretized into 16 × 16 panels.
The other two examples are a cubic capacitor and a 2 × 2 bus crossover structure.

Fig. 12.11 Test structures: (a) plate, (b) cubic, and (c) crossover 2 × 2. Reprinted with permission
from [56] © 2011 IEEE

Table 12.5 Runtime and iteration comparison for different examples

                                     Diagonal prec.        Spectral prec.
               #Panel   #Variable    # Iter   Time (s)     # Iter   Time (s)
Single plate   256      768          29       24.594       11       8.625
Cubic          864      2,592        32       49.59        11       19.394
Crossover      1,272    3,816        41       72.58        15       29.21

For each example, we obtain two stochastic equation systems of the form (12.19) by
considering variations separately from the width h of each panel and from the
center-to-center distance d between two panels, both with 20% perturbation ranges from their
nominal values and following Gaussian distributions.
To demonstrate the effectiveness of the deflated GMRES with a spectral pre-
conditioner, two different algorithms are compared in Table 12.5. The baseline
algorithm (column "diagonal prec.") constructs a simple preconditioner using
the diagonal entries. As the fine mesh structure in the extraction usually introduces
degenerated or small eigenvalues, such a preconditioning strategy within the tra-
ditional GMRES usually needs many more iterations to converge. In contrast, since
the deflated GMRES employs the spectral preconditioner to shift the distribution
of nondominant eigenvalues, it accelerates the convergence of GMRES, leading
to a reduced number of iterations. As shown in Table 12.5, the deflated GMRES
consistently reduces the number of iterations by 3× on average.

7.2.3 Incremental Preconditioner

With the spectral preconditioner, an incremental GMRES can be designed easily


to update the preconditioner when considering different stochastic variations. It
quite often happens that a change occurs in the perturbation range of one geometry
parameter or in the variation type from one geometry parameter to the other. As the

Table 12.6 Total runtime (s) comparison for the two-layer 20-conductor example by different methods

Discretization                        Total runtime (s)
w × t × l     #Panel    #Variable     Nonincremental   Incremental
3 × 3 × 7     2,040     6,120         419.438          81.375
3 × 3 × 15    3,960     11,880        3,375.205        208.266
3 × 3 × 24    6,120     18,360        –                504.202
3 × 3 × 60    14,760    44,280        –                7,584.674

system equation in (12.19) is augmented to be 3× larger than the nominal system, it
becomes computationally expensive to apply any nonincremental GMRES method
whenever there is a change in the variation. As shown by the experiments, the
incremental preconditioning in the deflated GMRES can reduce the computation
cost dramatically.
As described in Sect. 5, iGMRES needs to perform the preconditioning only once
for the nominal system and then updates the preconditioner with perturbations
from the matrix block $P^{(1)}$. In order to verify the efficiency of this incremental
preconditioner strategy, we apply two different perturbation ranges $h_1$ to the panels
of the two-layer 20-conductor example shown in Fig. 12.10. Then, we compare the total
runtime of iGMRES and GMRES, both with deflation. The results are shown
in Table 12.6.
From Table 12.6, we can see that a nonincremental approach needs to construct
its preconditioner whenever there is an update of the variations, which is very time
consuming. The presented iGMRES greatly reduces the CPU time spent on the con-
struction of the preconditioner by only updating the nominal spectral preconditioner
incrementally with (12.15). iGMRES shows a speedup of up to 15× over
nonincremental algorithms, and only iGMRES can finish all large-scale examples
with up to 14,760 panels.
Moreover, we investigate the speedup each technique brings to the overall
performance and find that the parallel MVP using FMM reduces the total runtime by
36% on average compared with its serial counterpart. Similarly, the spectral
preconditioner reduces the total runtime by 27% on average. In addition, applying the
incremental precondition reduces the total runtime by 21% on average. Thus,
the parallel MVP is the most effective mechanism among these techniques
for achieving speedup.

7.3 Eigenvalue Analysis

The spectral preconditioner can shift the eigenvalue distribution to improve the conver-
gence of GMRES. Therefore, in this section we compare the resultant spectrum with the
nominal case and further verify the efficiency of the spectral preconditioner.
We use a single plate as the experimental example, and the spectrum of the potential
coefficient matrix P is calculated for the nominal and perturbed systems.

Fig. 12.12 The comparison of eigenvalue distributions (panel width as the variation source):
eigenvalue versus eigenvalue index for the nominal system, the perturbed system, and the
preconditioned perturbed system

7.3.1 Perturbed System with Width as a Variation Source

First, we study the spectrum of the nominal system without variation, which is
shown with plus signs in Fig. 12.12. It is obvious that the eigenvalues are not close
to each other, which can lead to a large number of GMRES iterations.
We introduce the panel width variation $\xi_h$ to generate the perturbed system
$P(\xi)\cdot q(\xi) = v$. Here we assume that h has a 20% perturbation range. The eigenvalue
distribution of the perturbed system changes dramatically from the nominal case, as
shown with circle signs in Fig. 12.12, and disperses over a larger area. Therefore, in order
to speed up the convergence, we construct a spectral preconditioner as described
in Sect. 5 and apply it to the above perturbed system. The spectrum of
the preconditioned perturbed system is shown with star signs in Fig. 12.12. It can be
observed that the preconditioned system has a more compact eigenvalue distribution
because the spectral preconditioner shifts the dispersed eigenvalues toward a certain area.
Moreover, when the linear system is solved with an iterative solver such as
GMRES, the convergence speed depends greatly upon the eigenvalue distribution
of the system matrix. With a more compact spectrum, the spectral preconditioner
dramatically accelerates the convergence of iGMRES in the presented method.

Fig. 12.13 The comparison of eigenvalue distributions (panel distance as the variation source):
eigenvalue versus eigenvalue index for the nominal system, the perturbed system, and the
preconditioned perturbed system

7.3.2 Perturbed System with Distance as a Variation Source

Similarly, we can introduce the panel distance variation $\xi_d$ into the nominal system to
obtain the perturbed system $P(\xi)\cdot q(\xi) = v$. Again, the distance d has a 20% perturbation
range.
We plot the spectrum of the perturbed system with distance variation with circle
signs in Fig. 12.13. Compared with the spectrum in Fig. 12.12, we find that
the panel width variation has more influence on the spectrum of the perturbed system than
the panel distance variation does. With the spectral precondition, the spectrum becomes
more compact, as shown with star signs in Fig. 12.13. In fact, all eigenvalues
of the preconditioned perturbed system are close to 0.2, which results in a small
condition number of the system matrix and thus fast convergence of GMRES.

8 Summary

In this chapter, we introduced GMs to capture local random variations for full-chip
capacitance extraction. Based on the GMs, the stochastic capacitance can be
calculated via OPC by FMM in a parallel fashion. As such, the complexity of
the MVP for evaluating both nominal and stochastic values can be largely reduced.
Moreover, an incrementally preconditioned GMRES is developed to consider
different types of variation updates, with improved convergence by spectrum
deflation.

A number of experiments show that the presented approach is 347× faster
than the MC-based evaluation of variation with a similar accuracy, up to 3× faster
than the serial method for the MVP, and up to 15× faster than nonincremental GMRES
methods. In detail, the observed speedup of the presented approach is
twofold: the first part comes from the efficient parallel FMM, and the other from the non-
MC evaluation by OPC. The potential speedup of a parallel algorithm is given
by Amdahl's law. As FMM and OPC can be highly parallelized, the presented
extraction can thereby achieve significant speedups on parallel computing
platforms. Note, however, that the spectral precondition is not parallelized. For
example, the parallel MVP in FMM reduces the total runtime by 36% on average,
while the spectral precondition and the incremental evaluation reduce the total
runtime by 27% and 21% on average, respectively. As such, the parallel MVP
contributes the largest runtime reduction. Moreover, we have also investigated the benefit
of data sharing on the communication overhead during the parallel implementation.
The data-sharing technique, such as the use of the dependency list, can
eliminate the major communication overhead and achieve up to a 1.57× speedup
for the parallel MVP on four processors. Future work is planned to extend
the presented approach to deal with general capacitance extraction with non-
square-panel geometries.
Chapter 13
Statistical Inductance Modeling and Extraction

1 Introduction

A significant portion of process variations are purely random in nature [122]. As a


result, variation-aware design methodologies and statistical computer-aided design
(CAD) tools are widely believed to be the key to mitigating some of the challenges
for 45 nm technologies and beyond [122, 148]. Variational considerations have to
be incorporated into every step of the design and verification processes to ensure
reliable chips and profitable manufacturing yields.
In this chapter, we investigate the impact of geometric variations on the
extracted inductance (partial or loop). Parasitic extraction algorithms have been
intensively studied in the past to estimate the resistance, capacitance, inductance,
and susceptance of 3D interconnects [76, 118, 147, 211]. Many efficient algorithms
like the FastCap [118], FastHenry [76], and FastImp [211] were proposed, based
on using the BEM or volume discretization methods (for partial element equivalent
circuit (PEEC)-based inductance extraction [147]). In the nanometer regime, cir-
cuit layout will have significant variations, both systematic and random, coming
from the fabrication process. Much recent research work has been done under
different variational models for capacitance extraction while considering process
variations [74, 207, 208, 210]. However, less research has been done for variational
inductance extraction in the past.
We present a new statistical inductance extraction method called statHenry [143],
based on a spectral stochastic collocation scheme. This approach is based on the
Hermite PC representation of the variational inductance. statHenry applies the
collocation idea where the inductance extraction processes are performed many
times in predetermined sampling positions so that the coefficients of orthogonal
polynomials of variational inductance can be computed using the weighted least-
squares method. The number of samplings is $O(m^2)$, where m is the number of
variables for the second-order Hermite polynomials. If m is large, the approach will
lose its efficiency compared to the MC method. To mitigate this problem, a weighted
principal factor analysis (wPFA) method is performed to reduce the number of


variables by exploiting the spatial correlations of variational parameters. Numerical


examples show that the presented method is orders of magnitudes faster than the
MC method with very small errors for several practical interconnect structures. We
also show that typical variation for the width and height of wires (10–30%) can
cause significant variations to both partial and loop inductance.

2 Problem Formulation

For a system with m conductors, we first divide all conductors into b filaments. The
resistances and inductances of all filaments are stored, respectively, in the matrices $R_{b\times b}$
and $L_{b\times b}$, each of dimension $b\times b$. R is a diagonal matrix with diagonal
elements

$$R_{ii} = \frac{l_i}{\sigma a_i}, \qquad (13.1)$$

where $l_i$ is the length of filament i, σ is the conductivity, and $a_i$ is the cross-sectional
area of filament i. The inductance matrix L is a dense matrix; $L_{ij}$ can be
represented as in [76]:

$$L_{ij} = \frac{\mu}{4\pi a_i a_j} \int_{V_i}\int_{V_j} \frac{\mathbf{l}_i\cdot\mathbf{l}_j}{\|\mathbf{r}-\mathbf{r}'\|}\, dV_i\, dV_j, \qquad (13.2)$$

where μ is the permeability, $\mathbf{l}_i$ and $\mathbf{l}_j$ are unit vectors along the lengthwise directions
of filaments i and j, $\mathbf{r}$ is an arbitrary point in the filament, and $V_i$ and $V_j$
are the volumes of filaments i and j, respectively. Assuming magnetoquasistatic
electric fields, the inductance extraction problem is then finding the solution to the
discretized integral equation:

$$\frac{l_i}{\sigma a_i}\, I_i + j\omega \sum_{j=1}^{b} \left( \frac{\mu}{4\pi a_i a_j} \int_{V_i}\int_{V_j} \frac{\mathbf{l}_i\cdot\mathbf{l}_j}{\|\mathbf{r}-\mathbf{r}'\|}\, dV_i\, dV_j \right) I_j = \frac{1}{a_i} \int_{a_i} (\Phi_A - \Phi_B)\, dA, \qquad (13.3)$$

where $I_i$ and $I_j$ are the currents inside filaments i and j, ω is the angular
frequency, and $\Phi_A$ and $\Phi_B$ are the potentials at the end faces of the filament.
Equation (13.3) can be written in matrix form as

$$(R + j\omega L)\, I_b = V_b, \qquad (13.4)$$

where $I_b \in \mathbb{C}^b$ is the vector of the b filament currents and $V_b$ is a vector of dimension
b containing the filament voltages. We first solve for the inductance of one
conductor, which we call the primary conductor, and then the inductance
between it and all others, which we call the environmental conductors. To do
this, we set the voltages of the filaments in the primary conductor to unit voltage and
the voltages of all other filaments to zero. Therefore, $I_b$ can be calculated by solving
a system of linear equations together with the current conservation (Kirchhoff's
current law (KCL)) equation

$$M I_b = I_m \qquad (13.5)$$

on all the filaments, where M is an adjacency matrix for the filaments and $I_m$ collects
the currents of all m conductors. By repeating this process with each of the m
conductors as the primary conductor, we obtain the vectors $I_{m,i}$, $i = 1, \ldots, m$,
which form an $m\times m$ matrix $I_p = [I_{m,1}, I_{m,2}, \ldots, I_{m,m}]$. Since the voltages of
all primary conductors have been set to unit voltage, the resistance and
inductance can be obtained, respectively, from the real part and the imaginary part of
the inverse matrix of $I_p$.
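A compact sketch of this nominal extraction loop follows, using dense NumPy linear algebra on a tiny synthetic R, L, and filament-to-conductor incidence matrix M; all numbers are made up, FastHenry-style acceleration is absent, and the imaginary part of the inverse is divided by ω to report an inductance.

```python
import numpy as np

def extract_RL(R, L, M, omega):
    """Solve (R + j*omega*L) I_b = V_b for each conductor driven at 1 V, then invert I_p."""
    m, b = M.shape                              # m conductors, b filaments
    Z_fil = R + 1j * omega * L
    Ip = np.zeros((m, m), dtype=complex)
    for j in range(m):
        Vb = M[j, :].astype(complex)            # filaments of conductor j at 1 V, others at 0 V
        Ib = np.linalg.solve(Z_fil, Vb)
        Ip[:, j] = M @ Ib                       # conductor currents (KCL aggregation)
    Z = np.linalg.inv(Ip)                       # conductor-level impedance matrix
    return Z.real, Z.imag / omega               # resistance and inductance matrices

# Tiny illustrative example: 2 conductors, 2 filaments each.
R = np.diag([1.0, 1.0, 1.2, 1.2])
L = np.array([[2.0, 0.5, 0.3, 0.2],
              [0.5, 2.0, 0.2, 0.3],
              [0.3, 0.2, 2.0, 0.5],
              [0.2, 0.3, 0.5, 2.0]]) * 1e-12
M = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=float)
Reff, Leff = extract_RL(R, L, M, omega=2 * np.pi * 1e9)
print(Leff)
```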
Process variations affecting conductor geometry are reflected by changes in the
width w and height h of the conductors. We ignore the length of the wires as the
variations are typically insignificant compared to their magnitude. These variations
will make each element in the inductance matrix follow some kinds of random
distributions. Solving this problem is done by deriving the random distribution
and then effectively computing the mean and variance of the inductance with the
given geometric randomness parameters. In this chapter, we assume that the width and
height of each filament i are disturbed by random variables $n_{w,i}$ and $n_{h,i}$, which
gives

$$w_i' = w_i + n_{w,i}, \qquad (13.6)$$
$$h_i' = h_i + n_{h,i}, \qquad (13.7)$$

where each perturbation follows a Gaussian distribution $N(0, \sigma^2)$. The correlation
between the random perturbations of the wires' widths and heights is governed by an
empirical formulation such as the widely used exponential model:

$$\rho(r) = e^{-r^2/\eta^2}, \qquad (13.8)$$

where r is the distance between two panel centers and η is the correlation length. The
most straightforward method is to use an MC-based simulation to obtain the distribution,
mean, and variance of all those inductances. Unfortunately, the MC method is
extremely time consuming, and more efficient statistical approaches are needed.
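The variation model of (13.6)–(13.8) can be sketched as follows: build the exponential covariance for the 2n width/height variables from hypothetical filament coordinates, then draw correlated samples through the Cholesky factor of the covariance. Treating the width and height blocks as mutually independent is an assumption made only for this illustration.

```python
import numpy as np

def exponential_correlation(centers, eta):
    """rho(r) = exp(-r^2 / eta^2) between filament centers, cf. (13.8)."""
    diff = centers[:, None, :] - centers[None, :, :]
    r2 = np.sum(diff**2, axis=-1)
    return np.exp(-r2 / eta**2)

rng = np.random.default_rng(7)
n = 10                                         # filaments (illustrative)
centers = np.column_stack([np.arange(n) * 2.0, np.zeros(n), np.zeros(n)])  # um
sigma = 0.1                                    # 10% of a 1 um width/height

# One correlation block shared by the n width and n height variables (an assumption here).
rho = exponential_correlation(centers, eta=8.0)
cov = sigma**2 * np.kron(np.eye(2), rho)       # covariance of [n_w; n_h], size 2n x 2n

# Draw correlated perturbations: n = L z with cov = L L^T and z standard normal.
Lc = np.linalg.cholesky(cov + 1e-12 * np.eye(2 * n))
z = rng.standard_normal(2 * n)
n_w, n_h = np.split(Lc @ z, 2)
print(n_w, n_h)
```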

3 The Presented Statistical Inductance Extraction Method—statHenry

In this section, we present the new statistical inductance extraction method,
statHenry. The presented method is based on the spectral stochastic method, where the
integration in (2.36) is computed via an improved numerical quadrature method. It
relies on the efficient multidimensional numerical Gaussian and Smolyak quadrature
in Sect. 3.3 of Chap. 2 and the variable decoupling and reduction technique in
Sect. 2.2 of Chap. 2.

3.1 Variable Decoupling and Reduction

In the inductance extraction problem, process variations exist in the width w and height
h of the conductors, which make each element of the inductance matrix (13.2)
follow some kinds of random distributions. Solving this problem is done by deriving
the random distribution and then effectively computing the mean and variance of the
inductance with the given geometric randomness parameters. As shown in (13.6)
and (13.7), each filament i is modeled by two Gaussian random variables, nw;i
and nh;i . Suppose there are n filaments, then the inductance extraction problem
involves 2n Gaussian random variables with spatial correlation modeled as in (13.8).
Even with sparse grid quadrature, the number of sampling points still grows
quadratically with the number of variables. As a result, we should further reduce
the number of variables by exploiting the spatial correlations of the given random
width and height parameters of wires.
We start with independent random variables as the input of the spectral stochastic
method. Since the height and width variables of all wires are correlated, this
correlation should be removed before using the spectral stochastic method. As
proved in Sect. 2.3 of Chap. 2, the theoretical basis for decoupling the correlation
of those variables is the Cholesky decomposition.
Proposition 13.1. For a set of zero-mean Gaussian distributed variables ξ whose
covariance matrix is $\Omega_{2n\times 2n}$, if there is a matrix L satisfying $\Omega = LL^T$, then ξ
can be represented by a set of independent standard normally distributed variables
ζ as $\xi = L\zeta$.
Here the covariance matrix $\Omega_{2n\times 2n}$ contains the covariances between all the $n_{w,i}$
and $n_{h,i}$ of the filaments, and Ω is always a positive semidefinite matrix due to
the nature of a covariance matrix. At the same time, PFA [74] can substitute for the
Cholesky decomposition when variable reduction is needed. Eigendecomposition of
$\Omega_{2n\times 2n}$ yields

$$\Omega_{2n\times 2n} = LL^T, \qquad L = \left[\sqrt{\lambda_1}\, e_1, \ldots, \sqrt{\lambda_{2n}}\, e_{2n}\right], \qquad (13.9)$$

where the $\{\lambda_i\}$ are the eigenvalues in order of descending magnitude and the $\{e_i\}$ are
the corresponding eigenvectors. After PFA, the number of random variables involved
in the inductance extraction is reduced from 2n to k by truncating L to its first k
columns.
The error of PFA can be controlled by k:

$$\mathrm{err} = \frac{\sum_{i=k+1}^{2n} \lambda_i}{\sum_{i=1}^{2n} \lambda_i}, \qquad (13.10)$$

where a bigger k leads to a more accurate result. PFA is efficient, especially when the
correlation length is large. In the experiments, we set the correlation length to
eight times the width of the wires. As a result, PFA can reduce the number of variables
from 40 to 14 with an error of about 1% in an example with 20 parallel wires.
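A small sketch of the PFA truncation of (13.9)–(13.10) follows: eigendecompose the covariance, keep the k dominant factors needed for a target error, and report the truncation error. The toy covariance reuses the exponential model of (13.8) with made-up coordinates and correlation length.

```python
import numpy as np

def pfa(cov, err_tol=0.01):
    """Truncate the covariance to its k dominant eigen-factors, cf. (13.9)-(13.10)."""
    lam, E = np.linalg.eigh(cov)
    lam, E = lam[::-1], E[:, ::-1]             # descending eigenvalues
    tail = 1.0 - np.cumsum(lam) / lam.sum()    # err after keeping the first k factors
    k = 1
    while tail[k - 1] > err_tol:
        k += 1
    L_k = E[:, :k] * np.sqrt(lam[:k])          # xi ~ L_k @ zeta, with zeta in R^k
    return L_k, k, tail[k - 1]

# Toy 2n x 2n covariance with strong spatial correlation.
x = np.arange(20) * 1.0
rho = np.exp(-(x[:, None] - x[None, :])**2 / 8.0**2)
cov = 0.01 * np.kron(np.eye(2), rho)
L_k, k, err = pfa(cov)
print(f"reduced from {cov.shape[0]} to {k} variables, truncation error {err:.3%}")
```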

3.2 Variable Reduction by Weighted PFA

PFA for variable reduction considers only the spatial correlation between wires
while ignoring the influence of the inductance itself. One idea is to consider the
importance of the outputs during the reduction process. We follow the recently
proposed wPFA technique to seek better variable-reduction efficiency [204].
If a weight $w_i$ is defined for each physical variable $\xi_i$ to reflect its impact on the
output, then a set of new variables ξ' is formed:

$$\xi' = W\xi, \qquad (13.11)$$

where $W = \mathrm{diag}(w_1, w_2, \ldots, w_{2n})$ is a diagonal matrix of weights. As a result, the
covariance matrix of ξ', $\Sigma(\xi')$, now contains the weight information, and performing
PFA on $\Sigma_{2n\times 2n}(\xi')$ leads to the weighted variable reduction. Specifically, we
have

$$\Sigma_{2n\times 2n}(\xi') = E\left[W\xi\,(W\xi)^T\right] = W\,\Sigma_{2n\times 2n}(\xi)\,W^T \qquad (13.12)$$

and denote its eigenvalues and eigenvectors by $\lambda_i'$ and $e_i'$. Then, the variables
can be approximated by a linear combination of a set of independent dominant
variables ζ:

$$\xi = W^{-1}\xi' \approx W^{-1}\sum_{i=1}^{k} \sqrt{\lambda_i'}\, e_i'\, \zeta_i. \qquad (13.13)$$

The error-controlling process is similar to (13.10), but uses the weighted eigenvalues
$\lambda_i'$. For inductance extraction, we take the partial inductance of the deterministic
structure as the weight, since the inductance of this nominal structure is approximately equal to that

Fig. 13.1 The statHenry algorithm

of the variational structure. By performing wPFA on the same example with 20 parallel
wires, the 40 variables can now be reduced to 8, rather than 14 when using PFA
(more details are given in the experimental results).
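The weighted variant of (13.11)–(13.13) can be sketched in the same style; the weights here are arbitrary positive numbers standing in for nominal partial inductances, and the toy covariance is the one used in the previous PFA sketch.

```python
import numpy as np

def wpfa(cov, weights, err_tol=0.01):
    """Weighted PFA, cf. (13.11)-(13.13): truncate on W cov W^T, then map back with W^{-1}."""
    W = np.diag(weights)
    cov_w = W @ cov @ W.T                      # covariance of the weighted variables
    lam, E = np.linalg.eigh(cov_w)
    lam, E = lam[::-1], E[:, ::-1]             # descending weighted eigenvalues
    tail = 1.0 - np.cumsum(lam) / lam.sum()
    k = 1
    while tail[k - 1] > err_tol:
        k += 1
    # xi ~ W^{-1} sum_i sqrt(lam'_i) e'_i zeta_i, cf. (13.13)
    L_k = np.diag(1.0 / weights) @ (E[:, :k] * np.sqrt(lam[:k]))
    return L_k, k

x = np.arange(20) * 1.0
rho = np.exp(-(x[:, None] - x[None, :])**2 / 8.0**2)
cov = 0.01 * np.kron(np.eye(2), rho)
weights = 1.0 + np.linspace(0.0, 1.0, cov.shape[0])   # stand-in importance weights
L_k, k = wpfa(cov, weights)
print(f"wPFA keeps {k} variables")
```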

3.3 Flow of statHenry Technique

After explaining all the important pieces from related works in Chap. 2, we are now
ready to present the new algorithm—statHenry. Figure 13.1 is a flowchart of the
presented algorithm.

4 Numerical Examples

In this section, we compare the results of the statHenry method against the MC
method and a simple method using HPC with the sparse grid technique but without
variable reduction. The method statHenry has been implemented in Matlab 8.0. All
the experimental results were obtained using a computer with a 1.6 GHz Intel quad-
core i7-720 and 4 GB memory running Microsoft Windows 7 Ultimate operating
system. The version of FastHenry is 3.0 [76]. The initial results of this chapter were
published in [63, 143].
For the experiment, we set up four test cases to examine the algorithm: 2 parallel
wires, 5 parallel wires, 10 parallel wires, and 20 parallel wires as shown in Fig. 13.2.
In all four models, all of the wires have a width of 1 µm, a length of 6 µm, and a
pitch of 1 µm between them. The unit of the inductance in the experimental results is
picohenry (pH).

Fig. 13.2 Four test structures used for comparison

We set the standard deviation to 10% of the wire widths and wire heights and the
correlation length η to 8 µm to indicate a strong correlation.
First, we compare the accuracy of the three methods in terms of the mean
and standard deviations of loop/partial inductance. The results are summarized in
Table 13.1. In the table, we report the results from four test cases as mentioned.
In each case, we report the results for partial self-inductance on wire 1 (L11p ) and
loop inductance between wire 1 and 2 (L12l ). Columns 3–4 are the mean value
and standard deviation value for the MC method (MC). And columns 5–12 are the
mean value, standard deviation value, and their errors comparing with MC method
for HPC and the presented method. The average error of the mean and standard
deviation of HPC method is 0:05% and 2:01% compared with MC method while
that of statHenry method is 0:05% and 2:06%, respectively. The MC results come
from 10,000 FastHenry runs.
It can be seen that statHenry is very accurate for both mean and standard
deviation compared with the HP C method and MC method. We observe that a
10% standard deviation for the width and height results in variations from 2.73% to
5.10% for the partial and loop inductances, which is significant for timing.
Next, we show the CPU time speedup of the presented method. The results are
summarized in Table 13.2. It can be seen that statHenry can be about two orders
of magnitude faster than the MC method. The average speedups of the HPC and statHenry methods over the MC method are 54.1 and 349.7, respectively. We notice that with more wires, the speedup goes down. This is expected: more wires lead to more variables, even after variable reduction, and the number of samples in the collocation method is $O(m^2)$ for second-order Hermite polynomials, where $m$ is the number of variables. As a result, more samples are needed to compute the coefficients, whereas MC uses a fixed number of samples (10,000 for all cases).

Table 13.1 Accuracy comparison (mean and standard deviation of inductances) among MC, HPC, and statHenry

Wires  Inductance        MC (pH)   HPC (pH)  statHenry (pH)  HPC err (%)  statHenry err (%)
2      L11p    Mean      2.851     2.850     2.850           0.02         0.03
               Std       0.080     0.078     0.078           2.31         2.47
2      L12l    Mean      3.058     3.057     3.056           0.05         0.06
               Std       0.158     0.156     0.155           1.50         2.21
5      L11p    Mean      2.849     2.851     2.851           0.08         0.07
               Std       0.078     0.078     0.078           0.86         0.24
5      L12l    Mean      3.054     3.058     3.058           0.11         0.11
               Std       0.155     0.156     0.156           1.01         0.70
10     L11p    Mean      2.852     2.853     2.853           0.01         0.02
               Std       0.079     0.078     0.078           1.23         1.37
10     L12l    Mean      3.059     3.060     3.060           0.05         0.05
               Std       0.159     0.156     0.156           1.55         1.74
20     L11p    Mean      2.852     2.853     2.853           0.03         0.03
               Std       0.081     0.078     0.078           3.74         3.82
20     L12l    Mean      3.059     3.060     3.060           0.04         0.05
               Std       0.163     0.156     0.156           3.88         3.96

Table 13.2 CPU runtime comparison among MC, HPC, and statHenry

Wires  MC time (s)  HPC time (s)  Speedup (vs. MC)  statHenry time (s)  Speedup (vs. MC)
2      5394.4       32.6          165.4             9.8                 550.4
5      7442.8       192.5         38.7              12.6                589.1
10     8333.5       893.7         9.3               42.5                195.9
20     13698.3      4532.9        3.0               215.8               63.5

Table 13.3 Reduction effects of PFA and wPFA

        Original      PFA                   wPFA
Wires   Variables     Reduction   Points    Reduction   Points
2       4             4           45        2           15
5       10            4           45        2           15
10      20            6           91        4           45
20      40            14          435       8           153

Table 13.3 shows the reduction effects of PFA and wPFA for all the cases under the same error budget. We can see that with wPFA we obtain fewer reduced variables and fewer quadrature points for sampling, and thus better efficiency for the entire extraction algorithm.
Finally, we study the variational impacts of partial and loop inductances under
different variabilities for width and height using statHenry and the MC method.
The variation statistics are summarized in Table 13.4. Here we report the results for standard deviations from 10% to 30% of the width and height, for the statHenry and MC methods, for the 10-parallel-wire case.

Table 13.4 Variation impacts on inductances using statHenry

10 parallel wires, L11p (pH)
            Monte Carlo           statHenry             Error
Variation   Mean      Std         Mean      Std         Mean (%)   Std (%)
10%         2.852     0.079       2.853     0.078       0.02       1.37
20%         2.872     0.163       2.862     0.160       0.35       1.84
30%         2.890     0.245       2.879     0.249       0.36       1.45

10 parallel wires, L12l (pH)
            Monte Carlo           statHenry             Error
Variation   Mean      Std         Mean      Std         Mean (%)   Std (%)
10%         3.059     0.159       3.060     0.156       0.05       1.74
20%         3.097     0.325       3.078     0.319       0.61       1.84
30%         3.128     0.484       3.110     0.495       0.56       2.26


Fig. 13.3 The loop inductance L12l distribution changes for the 10-parallel-wire case under 30%
width and height variations

As process-induced variation grows with advancing technology, we can see that the inductance variation will also grow. Considering a typical $3\sigma$ range for variation, a 30%
standard deviation means that width and height changes can reach 90% of their
values. It can be seen that with increasing variations of width and height (from 10% to 30%), the std/mean ratio of the partial inductance grows from 2.75% to 8.65%, while that of the loop inductance grows from 5.10% to 15.9%, which can significantly impact
the noise and delay of the wires. The average error of mean and standard deviation
of statHenry is 0.33% and 1.75% compared with MC for all variabilities of width
and height. From this, we can see that the results of statHenry agree closely with
MC under different variations.


Fig. 13.4 The partial inductance L11p distribution changes for the 10-parallel-wire case under
30% width and height variations

Figures 13.3 and 13.4 show the loop (for wire 1 and wire 2, L12l ) and partial
inductance distributions (for wire 1 itself, L11p ) under 30% deviations of width and
heights for the 10-parallel-wire case.

5 Summary

In this chapter, we have presented a new statistical inductance extraction method,


called statHenry, for interconnects considering process variations with spatial
correlation. This new method is based on the collocation-based spectral stochastic
method where OPC is used to represent the variational geometrical parameters in
a deterministic way. Statistical inductance values are then computed using a fast
multidimensional Gaussian quadrature method with sparse grid technique. Then, to
further improve the efficiency of the presented method, a random variable reduction
scheme based on wPFA is applied. Numerical examples show that the presented
method is orders of magnitude faster than the MC method with very small errors for several practical interconnect structures. We also show that both partial and loop inductance variations can be significant for typical 10–30% standard deviations of the width and height of interconnect wires.
Part V
Statistical Analog and Yield Analysis
and Optimization Techniques
Chapter 14
Performance Bound Analysis of Variational
Linearized Analog Circuits

1 Introduction

Analog and mixed-signal circuits are very sensitive to process variations because many device matchings are required. This situation becomes worse as technology
continues to scale to 90 nm and below owing to the increasing process-induced
variability [122, 148]. Transistor-level mismatch is the primary obstacle to reach
a high yield rate for analog designs in sub-90 nm technologies. For example, due
to an inverse-square-root-law dependence with the transistor area, the mismatch of
CMOS devices nearly doubles for each process generation less than 90 nm [80,104].
Since the traditional worst-case or corner-case analysis is too pessimistic and sacrifices speed, power, and area, the statistical approach [133] has become a trend for estimating the analog mismatch and performance variations. The variations
in the analog components can come from systematic (or global spatial variation)
ones and stochastic (or local random variation) ones. In this chapter, we model both
variations as the parameter intervals on the components of analog circuits.
Analog circuit designers usually perform an MC analysis to analyze the stochastic mismatch and predict the variational responses of their designs under faults. As MC analysis requires a large number of repeated circuit simulations, its computational cost is high. Moreover, the pseudorandom generator in MC introduces numerical noise that may lead to errors. More efficient variational analysis, which can
give the performance bounds, is highly desirable.
Bounding or worst-case analysis of analog circuits under parameter variations
has been studied in the past for fault-driven testing and tolerance analysis of analog
circuits [83, 162, 179]. The proposed approaches include sensitivity analysis [185],
the sampling method [168], and interval arithmetic-based approaches [83, 140, 162,
179]. But the sensitivity-based method cannot give the worst case in general, and the sampling-based method is limited to a few variables. Interval arithmetic methods have historically had a reputation of being overly pessimistic. Recently, a worst-case
analysis of linearized analog circuits in frequency domain has been proposed [140],


where Kharitonov’s functions [79] were applied to obtain the performance bounds
in frequency domain, but no systematic method was proposed to obtain the variational transfer functions.
In this chapter, we propose a performance bound analysis algorithm of analog
circuits considering the process variations [61]. The presented method employs
several techniques to compute the bounding responses of analog circuits in the
frequency domain. First, the presented method models the variations of component values as intervals measured from tested chips and manufacturing processes. Then the presented method applies determinant decision diagram (DDD) graph-based
symbolic analysis to derive the exact symbolic transfer functions from linearized
analog circuits. After this, affine interval arithmetic is applied to compute the vari-
ational transfer functions of the analog circuit with variational coefficients in forms
of intervals. Finally, the frequency response bounds (maximum and minimum) are obtained by evaluating a finite number of special transfer functions given by Kharitonov's theorem, which provides proven response bounds for interval polynomial functions in the frequency domain. We show that symbolic decancellation is critical for reducing the inherent pessimism of the affine interval analysis. We also show that the response bounds given by the Kharitonov functions are conservative, given the correlations among the coefficient intervals in the transfer functions. Numerical examples demonstrate that the presented method is more efficient than the MC method.
The rest of this chapter is organized as follows: Sect. 2 gives a review on
interval arithmetic and affine arithmetic. The presented performance bound analysis
method is presented in Sect. 3. Section 4 shows the experimental results, and Sect. 5
summarizes this chapter.

2 Review of Interval Arithmetic and Affine Arithmetic

Interval arithmetic was introduced by Moore in the 1960s [113] to solve range estimation problems under uncertainty. In interval arithmetic, a classical variable $x$ is represented by an interval $\hat{x} = [x^-, x^+]$ which satisfies $x^- \le x \le x^+$. However, interval arithmetic suffers from the overestimation problem, as it often yields an interval that is much wider than the exact range of the function.
As an example, given $\hat{x} = [-1, 1]$, the interval evaluation of $\hat{x} - \hat{x}$ produces $[-1-1,\ 1-(-1)] = [-2, 2]$ instead of $[0, 0]$, which is the actual range of that expression.
Affine arithmetic was proposed by Stolfi and de Figueiredo [25] to overcome the error explosion problem of standard interval analysis. In affine arithmetic, the affine form $\hat{x}$ of a variable $x$ is given by
$$\hat{x} = x_0 + \sum_{i=1}^{n} x_i \varepsilon_i, \qquad (14.1)$$

in which each noise symbol $\varepsilon_i$ $(i = 1, 2, \ldots, n)$ is an independent component of the total uncertainty of $x$ satisfying $-1 \le \varepsilon_i \le 1$, the coefficient $x_i$ is the magnitude of $\varepsilon_i$, and $x_0$ is the central value of $\hat{x}$. The conversion from the affine interval $\hat{x}$ in (14.1) to a classical interval is straightforward: $\hat{x}$ corresponds to $[x_0 - \mathrm{rad}(\hat{x}),\ x_0 + \mathrm{rad}(\hat{x})]$, in which $\mathrm{rad}(\hat{x}) = \sum_{i=1}^{n} |x_i|$ is defined as the radius of the affine expression $\hat{x}$. The basic operations of addition and subtraction in affine arithmetic are defined by
$$\hat{x} \pm \hat{y} = (x_0 \pm y_0) + \sum_{i=1}^{n} (x_i \pm y_i)\varepsilon_i. \qquad (14.2)$$

Returning to the previous example, if $x$ has the affine form $\hat{x} = 0 + \varepsilon_1$, then $\hat{x} - \hat{x} = \varepsilon_1 - \varepsilon_1 = 0$ gives the accurate result. Affine arithmetic multiplication is defined as
$$\hat{x} \cdot \hat{y} = x_0 \cdot y_0 + \sum_{i=1}^{n} (x_0 \cdot y_i + x_i \cdot y_0)\varepsilon_i + \mathrm{rad}(\hat{x}) \cdot \mathrm{rad}(\hat{y}) \cdot \varepsilon_{n+1}, \qquad (14.3)$$

in which $\varepsilon_{n+1}$ is a new noise symbol that is distinct from all the other noise symbols $\varepsilon_i$ $(i = 1, 2, \ldots, n)$. We notice that affine operations avoid the cancellation problem in addition; for multiplication, however, the symbolic cancellation problem can still exist. For instance, $\hat{x}\cdot\hat{y} - \hat{y}\cdot\hat{x} = 0$, but the two products generate two different $\varepsilon_{n+1}$'s when the multiplications are performed first, so the complete cancellation no longer happens.
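To make the operations in (14.1)–(14.3) concrete, the following minimal Python sketch implements the affine form and reproduces the $\hat{x}-\hat{x}$ example. It is an illustration only, not the affine arithmetic library of [43] used in the experiments.

```python
import itertools

class Affine:
    """Minimal affine-arithmetic form x0 + sum_i xi*eps_i (eqs. 14.1-14.3)."""
    _fresh = itertools.count(1)          # generator of new noise-symbol ids

    def __init__(self, x0, terms=None):
        self.x0 = float(x0)              # central value
        self.terms = dict(terms or {})   # noise-symbol id -> coefficient

    def rad(self):                       # radius: sum of |coefficients|
        return sum(abs(c) for c in self.terms.values())

    def interval(self):                  # classical interval [x0-rad, x0+rad]
        return (self.x0 - self.rad(), self.x0 + self.rad())

    def __add__(self, y):                # (14.2), addition
        t = dict(self.terms)
        for i, c in y.terms.items():
            t[i] = t.get(i, 0.0) + c
        return Affine(self.x0 + y.x0, t)

    def __sub__(self, y):                # (14.2), subtraction
        t = dict(self.terms)
        for i, c in y.terms.items():
            t[i] = t.get(i, 0.0) - c
        return Affine(self.x0 - y.x0, t)

    def __mul__(self, y):                # (14.3): introduces a new noise symbol
        t = {i: self.x0 * c for i, c in y.terms.items()}
        for i, c in self.terms.items():
            t[i] = t.get(i, 0.0) + y.x0 * c
        t[('new', next(Affine._fresh))] = self.rad() * y.rad()
        return Affine(self.x0 * y.x0, t)

# x = 0 + eps_1: subtraction cancels exactly, unlike classical intervals
x = Affine(0.0, {1: 1.0})
print((x - x).interval())    # (0.0, 0.0)
print((x * x).interval())    # (-1.0, 1.0): pessimistic, the true range is [0, 1]
```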

3 The Performance Bound Analysis Method Based


on Graph-based Symbolic Analysis

We first present the whole algorithm flow of the presented performance bound
analysis algorithm in Fig. 14.1. Basically, the presented method consists of two
major computing steps. The first step is to compute the variational transfer functions
from the variational circuit parameters, which will be done via DDD-based symbolic
analysis method and affine interval arithmetic (steps 1–3). Second, we compute the
frequency response bounds via Kharitonov’s functions, which just require a few
transfer function evaluations (step 4). Kharitonov's functions lead to proven upper and lower bounds for the frequency-domain responses of a variational transfer function. We will present the two major computing steps in the following
sections.

3.1 Variational Transfer Function Computation

In this section, we first provide a brief overview of DDD [160]. Next we show how
affine arithmetic can be applied to compute the variational transfer function.

Fig. 14.1 The flow of the presented algorithm

Fig. 14.2 An example circuit. Reprinted with permission from [61]. © 2011 IEEE

3.1.1 Symbolic Analysis by Determinant Decision Diagrams

Determinant decision diagrams [160] are a compact and canonical graph-based representation of determinants. The concept is best illustrated using the simple RC filter circuit shown in Fig. 14.2.
Its system equations can be written as
$$
\begin{bmatrix}
\frac{1}{R_1} + sC_1 + \frac{1}{R_2} & -\frac{1}{R_2} & 0 \\
-\frac{1}{R_2} & \frac{1}{R_2} + sC_2 + \frac{1}{R_3} & -\frac{1}{R_3} \\
0 & -\frac{1}{R_3} & \frac{1}{R_3} + sC_3
\end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}
=
\begin{bmatrix} I \\ 0 \\ 0 \end{bmatrix}.
$$

We view each entry in the circuit matrix as one distinct symbol and rewrite its system
determinant on the left-hand side of Fig. 14.3. Its DDD representation is shown on the right-hand side.
A DDD is a signed, rooted, directed acyclic graph with two terminal nodes,
namely, the 0-terminal vertex and the 1-terminal vertex. Each nonterminal DDD
vertex is labeled by a symbol in the determinant, denoted by $a_i$ ($A$ to $G$ in Fig. 14.3), and a positive or negative sign, denoted by $s(a_i)$. It originates two outgoing edges,

Fig. 14.3 A matrix determinant and its DDD representation. Reprinted with permission from [61]. © 2011 IEEE

called the 1-edge and the 0-edge. Each vertex $a_i$ represents a symbolic expression $D(a_i)$ defined recursively as follows:
$$D(a_i) = a_i \cdot s(a_i) \cdot D_{a_i} + D_{\overline{a}_i}, \qquad (14.4)$$
where $D_{a_i}$ and $D_{\overline{a}_i}$ represent, respectively, the symbolic expressions of the nodes pointed to by the 1-edge and the 0-edge of $a_i$. The 1-terminal vertex represents expression
1, whereas the 0-terminal vertex represents expression 0. For example, vertex $E$ in Fig. 14.3 represents expression $E$, vertex $F$ represents expression $-FE$, and vertex $D$ represents expression $DG - FE$. We also say that a DDD vertex $D$ represents the expression defined by the DDD subgraph rooted at $D$.
A 1-path in a DDD corresponds to a product term in the original determinant; it is defined as a path from the root vertex ($A$ in our example) to the 1-terminal, including all symbols and signs of the nodes that originate the 1-edges along the path. In our example, there exist three 1-paths representing three product terms: $ADG$, $-AFE$, and $-CBG$. The root vertex represents the sum of these product terms. The size of a DDD is the number of DDD nodes, denoted by $|DDD|$.
Once a DDD has been constructed, the numerical value of the determinant it represents can be computed by performing a depth-first search of the graph and applying (14.4) at each node, whose time complexity is a linear function of the size of the graph (its number of nodes). This computing step is called Evaluate(D), where D is a DDD root. With proper node ordering and hierarchical approaches, DDDs can be very efficient for computing the transfer functions of large analog circuits [160, 174].
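A minimal sketch of the Evaluate(D) recursion in (14.4) is shown below. The node layout and the small 2×2 determinant used for the demonstration are illustrative only, not the DDD package of [160].

```python
class DDDNode:
    """A DDD vertex: symbol value, sign s(a_i), and 1-edge/0-edge children."""
    def __init__(self, value, sign, one_child, zero_child):
        self.value = value          # numerical value of the symbol a_i
        self.sign = sign            # +1 or -1, i.e. s(a_i)
        self.one = one_child        # node pointed to by the 1-edge
        self.zero = zero_child      # node pointed to by the 0-edge

ONE, ZERO = "1-terminal", "0-terminal"

def evaluate(node, memo=None):
    """Evaluate(D): depth-first application of (14.4); shared subgraphs are
    evaluated once (memoized), so the cost is linear in |DDD|."""
    if node is ONE:
        return 1.0
    if node is ZERO:
        return 0.0
    memo = {} if memo is None else memo
    if id(node) not in memo:
        memo[id(node)] = (node.value * node.sign * evaluate(node.one, memo)
                          + evaluate(node.zero, memo))
    return memo[id(node)]

# 2x2 example: det([[1, 2], [3, 4]]) = 1*4 - 2*3
d = DDDNode(4.0, +1, ONE, ZERO)
c = DDDNode(3.0, +1, ONE, ZERO)
b = DDDNode(2.0, -1, c, ZERO)
a = DDDNode(1.0, +1, d, b)
print(evaluate(a))   # -2.0
```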
In order to compute the symbolic coefficients of the transfer function in different
powers of s, the original DDD can be expanded to the s-expanded DDD [161].
By doing this, each coefficient of the transfer function is represented by a coefficient

DDD. The s-expanded DDD can be constructed from the complex DDD in linear
time in the size of the original complex DDD [161].

3.1.2 Variational Transfer Function


Assume that each circuit parameter $\hat{x}$ becomes an affine interval $\hat{x} = x_0 + \sum_{i=1}^{n} x_i \varepsilon_i$ due to process variations; we now want to compute the variational transfer functions. The resulting transfer functions take the following s-expanded rational form:
$$H(s) = \frac{N(s)}{D(s)} = \frac{\sum_{i=0}^{m} \hat{a}_i s^i}{\sum_{j=0}^{n} \hat{b}_j s^j}, \qquad (14.5)$$

where the coefficients $\hat{a}_i$ and $\hat{b}_j$ are all affine intervals. This can be computed by means
of affine arithmetic [25]. Basically, the DDD Evaluation operation traverses the
DDD in a depth-first style and performs one multiplication and one addition at each
node as shown in (14.4). Now the two operations will be replaced by the addition
and multiplication from affine arithmetic.

3.1.3 Symbolic Decancellation in DDD Evaluation Using


Affine Arithmetic

As mentioned before, interval and affine arithmetic operations are very sensitive to symbolic term cancellations, which are abundant in the DDD and the s-expanded DDD. It was shown that about 70–90% of the terms in the determinant of an MNA-formulated circuit matrix are canceling terms [175]. Notice that symbolic cancellation still happens even in the presence of parameter variations.
In DDD evaluation, we have both addition and multiplication, as shown in (14.4). Cancellation can lead to large errors if not removed. For example, consider the two terms $\hat{x}\cdot\hat{y}\cdot\hat{z}$ and $\hat{z}\cdot\hat{y}\cdot(-\hat{x})$, and suppose $\hat{x} = 1 + \varepsilon_1$, $\hat{y} = 1 + \varepsilon_2$, $\hat{z} = 1 + \varepsilon_3$. Then
$$\begin{aligned}
\hat{x}\cdot\hat{y}\cdot\hat{z} &= (1 + \varepsilon_1 + \varepsilon_2 + \varepsilon_4)\cdot\hat{z} \\
&= 1 + \varepsilon_1 + \varepsilon_2 + \varepsilon_3 + \varepsilon_4 + 3\varepsilon_5, \\
\hat{z}\cdot\hat{y}\cdot(-\hat{x}) &= (1 + \varepsilon_2 + \varepsilon_3 + \varepsilon_6)\cdot(-\hat{x}) \\
&= -1 - \varepsilon_1 - \varepsilon_2 - \varepsilon_3 - \varepsilon_6 - 3\varepsilon_7.
\end{aligned}$$

However, the addition of these two terms is
$$\hat{x}\cdot\hat{y}\cdot\hat{z} + \hat{z}\cdot\hat{y}\cdot(-\hat{x}) = \varepsilon_4 + 3\varepsilon_5 - \varepsilon_6 - 3\varepsilon_7, \qquad (14.6)$$

which should be 0. The reason is that in the affine multiplication defined in (14.3), the new noise symbol is actually a function of the original noise symbols $\varepsilon_i$ $(i = 1, 2, \ldots, n)$, but affine arithmetic assumes the new symbol is independent of the original ones. As a result, the symbolic canceling terms lead to inaccurate results, whose range can be as large as $[-8, 8]$ for (14.6).
Fortunately, we can perform the decancellation operation on coefficient DDDs
in the s-expanded DDDs in a very efficient way during or after the coefficient
DDD construction, so that the resulting coefficient DDD is cancellation free [175],
which can significantly improve the interval computation accuracy as shown in the
experimental results.

3.1.4 Increase the Accuracy of Affine Arithmetic by Considering


Second-Order Noise Symbols

The affine arithmetic operations used in DDD evaluation are addition and multiplication. Affine addition is accurate, as it does not introduce any new noise symbol. However, in the affine multiplication shown in (14.3), a new noise symbol $\varepsilon_{n+1}$ is added every time, and this process reduces the accuracy of the affine bound compared with the real bound. In our implementation, we store the coefficients of the first-order as well as the second-order noise symbols, and we only add a new noise symbol for higher orders. The affine multiplication in (14.3) is changed to
$$\hat{x}\hat{y} = x_0 y_0 + \sum_{i=1}^{n}(x_0 y_i + x_i y_0)\varepsilon_i + \sum_{i=1}^{n} x_i y_i \varepsilon_i^2 + \sum_{i=1}^{n}\sum_{j=i+1}^{n}(x_i y_j + x_j y_i)\varepsilon_i\varepsilon_j. \qquad (14.7)$$

For simplicity, assume $x^-, x^+, x_i, y^-, y^+, y_i > 0$ $(i = 0, 1, \ldots, n)$; then the bound of $\hat{x}\hat{y}$ in (14.7) is $[x_0 y_0 - \mathrm{rad}_1,\ x_0 y_0 + \mathrm{rad}_2]$, in which
$$\mathrm{rad}_1 = \sum_{i=1}^{n}(x_0 y_i + x_i y_0) - \sum_{i=1}^{n}\sum_{j=1}^{n} x_i y_j, \qquad (14.8)$$
$$\mathrm{rad}_2 = \sum_{i=1}^{n}(x_0 y_i + x_i y_0) + \sum_{i=1}^{n}\sum_{j=1}^{n} x_i y_j, \qquad (14.9)$$

which is more accurate than the bound Œx0 y0  rad2 ; x0 y0 C rad2  obtained by
original affine multiplication in (14.3). For other combinations of the values of
x  ; x C ; xi ; y  ; y C ; yi , the accuracy of affine multiplication can also be increased
accordingly via considering second-order noise symbols.
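Under the all-positive assumption stated above, the bound in (14.7)–(14.9) can be computed directly from the first-order coefficients, as the following sketch shows (an illustration, not the authors' implementation):

```python
def mul_second_order(x0, xs, y0, ys):
    """Second-order multiplication bound (14.7)-(14.9), assuming all central
    values and coefficients are positive; xs, ys are the first-order
    coefficients x_i, y_i."""
    lin = sum(x0 * yi + xi * y0 for xi, yi in zip(xs, ys))
    quad = sum(xi * yj for xi in xs for yj in ys)     # sum_i sum_j x_i y_j
    lo = x0 * y0 - (lin - quad)                       # x0*y0 - rad_1, (14.8)
    hi = x0 * y0 + (lin + quad)                       # x0*y0 + rad_2, (14.9)
    return lo, hi

# (2 + e1)(3 + e1): exact range is [2, 12]; the plain rule (14.3) gives [0, 12]
print(mul_second_order(2.0, [1.0], 3.0, [1.0]))       # (2.0, 12.0)
```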

3.2 Performance Bound by Kharitonov’s Functions

Given a transfer function with variational coefficients, one can perform an MC-based approach to compute the variational responses in the frequency domain. However, this can be done much more efficiently via Kharitonov's functions, which are few in number but give proven bounds on the responses in the frequency domain.
Kharitonov’s seminal work proposed in 1978 [79] was originally concerned
with the stability issues of a polynomial (with real coefficients) with coefficient
uncertainties (due to perturbations). He showed that one needs to verify only four
special polynomials to ensure that all the variational polynomials are stable.
Specifically, consider a family of polynomials with real, variational coefficients,
$$P(s) = p_0 + p_1 s + \cdots + p_n s^n, \qquad p_i^- \le p_i \le p_i^+, \quad i = 0, \ldots, n. \qquad (14.10)$$

Then the four special Kharitonov functions are
$$Q_1(j\omega) = P_e^{\min}(\omega) + jP_o^{\min}(\omega), \qquad (14.11)$$
$$Q_2(j\omega) = P_e^{\min}(\omega) + jP_o^{\max}(\omega), \qquad (14.12)$$
$$Q_3(j\omega) = P_e^{\max}(\omega) + jP_o^{\min}(\omega), \qquad (14.13)$$
$$Q_4(j\omega) = P_e^{\max}(\omega) + jP_o^{\max}(\omega), \qquad (14.14)$$
where
$$P_e^{\min}(\omega) = p_0^- - p_2^+\omega^2 + p_4^-\omega^4 - p_6^+\omega^6 + \cdots, \qquad (14.15)$$
$$P_e^{\max}(\omega) = p_0^+ - p_2^-\omega^2 + p_4^+\omega^4 - p_6^-\omega^6 + \cdots, \qquad (14.16)$$
$$P_o^{\min}(\omega) = p_1^-\omega - p_3^+\omega^3 + p_5^-\omega^5 - p_7^+\omega^7 + \cdots, \qquad (14.17)$$
$$P_o^{\max}(\omega) = p_1^+\omega - p_3^-\omega^3 + p_5^+\omega^5 - p_7^-\omega^7 + \cdots. \qquad (14.18)$$

One important observation is that the four special functions given by


Kharitonov’s theorem create a rectangle (called Dasgupta’s rectangle) [23] in
the response complex domain as shown in Fig. 14.4a, where the rectangle has
edges parallel to the real and imaginary axes. The four Kharitonov functions
(polynomials) correspond to the four corners of the rectangle.
Later, Levkovich et al. [90] showed that Kharitonov's theorem can be used to calculate the amplitude and phase envelopes of a family of interval rational transfer functions of continuous-time systems in the frequency domain. The results can be easily interpreted using Dasgupta's rectangle (also called Kharitonov's rectangle), which clearly shows the largest magnitude (the longest distance from the origin of the complex plane to a corner of the rectangle). The same can be derived for the smallest magnitude and for the bounds of the phase responses.

Fig. 14.4 (a) Kharitonov's rectangle in state 8. (b) Kharitonov's rectangle for all nine states. Reprinted with permission from [61]. © 2011 IEEE

Table 14.1 Extreme values of |P(jω)| and arg[P(jω)] for the nine states

State   Max |P(jω)|             Min |P(jω)|   Max arg[P(jω)]   Min arg[P(jω)]
1       Q1                      Q4? -- no

Specifically, in the complex frequency domain, the magnitude and phase response of Kharitonov's rectangle in the complex plane can be divided into nine states, as shown in Fig. 14.4b [90], and the corresponding maximum and minimum magnitude and phase for the nine states are shown in Table 14.1:
$$P_{\max}(\omega) = \max\bigl(|Q_1(\omega)|, |Q_2(\omega)|, |Q_3(\omega)|, |Q_4(\omega)|\bigr), \qquad (14.19)$$
$$P_{\min}(\omega) = \min\bigl(|Q_1(\omega)|, |Q_2(\omega)|, |Q_3(\omega)|, |Q_4(\omega)|, |P_e^{\min}|, |P_o^{\min}|, |P_e^{\max}|, |P_o^{\max}|, 0\bigr). \qquad (14.20)$$

Similar expressions give the phase envelopes; for example,
$$\max \arg[P(j\omega)] = \max\bigl(\arg[Q_1(\omega)], \arg[Q_2(\omega)], \arg[Q_3(\omega)], \arg[Q_4(\omega)]\bigr). \qquad (14.21)$$

In Table 14.1, $|P(j\omega)|$ and $\arg[P(j\omega)]$ are defined as the magnitude and phase of the polynomial $P(j\omega)$. Once the variational transfer function is obtained

from (14.5), the coefficients can be converted from affine intervals to classical intervals as $\hat{a}_i = [a_i^-, a_i^+]$ and $\hat{b}_j = [b_j^-, b_j^+]$. Afterward, one can compute the upper and lower bounds of the transfer function easily:
$$\max|H(s)| = \max|N(s)| / \min|D(s)|, \qquad (14.22)$$
$$\min|H(s)| = \min|N(s)| / \max|D(s)|, \qquad (14.23)$$
$$\max \arg[H(s)] = \max \arg[N(s)] - \min \arg[D(s)], \qquad (14.24)$$
$$\min \arg[H(s)] = \min \arg[N(s)] - \max \arg[D(s)]. \qquad (14.25)$$

Since the maximum and minimum magnitude and phase of the numerator $N(s)$ and denominator $D(s)$ have only a few possible cases, which are shown in Table 14.1, it is very straightforward to obtain the magnitude and phase bounds of $H(s)$, compared with large sampling-based MC simulations [90].
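A sketch of this evaluation is given below. The four Kharitonov polynomials follow the lower/upper coefficient pattern of (14.11)–(14.18), and the magnitude envelope of $H = N/D$ uses (14.22)–(14.23); for brevity the minimum is taken only over the four corner polynomials, whereas the exact minimum would also require the edge terms and the nine states of Table 14.1. The interface is illustrative, not the authors' implementation.

```python
import numpy as np

def kharitonov_polys(p_lo, p_hi):
    """The four Kharitonov corner polynomials from the coefficient intervals
    p_lo[i] <= p_i <= p_hi[i]; coefficient i multiplies s**i."""
    def pick(low_positions):
        return np.array([p_lo[i] if i % 4 in low_positions else p_hi[i]
                         for i in range(len(p_lo))], dtype=float)
    return [pick((0, 1)),   # Q1 = Pe_min + j*Po_min
            pick((0, 3)),   # Q2 = Pe_min + j*Po_max
            pick((1, 2)),   # Q3 = Pe_max + j*Po_min
            pick((2, 3))]   # Q4 = Pe_max + j*Po_max

def magnitude_bounds(num_lo, num_hi, den_lo, den_hi, omegas):
    """Conservative magnitude envelope of H = N/D via (14.19), (14.22)-(14.23)."""
    qn = kharitonov_polys(num_lo, num_hi)
    qd = kharitonov_polys(den_lo, den_hi)
    lower, upper = [], []
    for w in omegas:
        pw_n = (1j * w) ** np.arange(len(num_lo))
        pw_d = (1j * w) ** np.arange(len(den_lo))
        n_abs = [abs(q @ pw_n) for q in qn]
        d_abs = [abs(q @ pw_d) for q in qd]
        upper.append(max(n_abs) / min(d_abs))   # (14.22)
        lower.append(min(n_abs) / max(d_abs))   # (14.23)
    return np.array(lower), np.array(upper)
```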
It was shown that if all the variational coefficients are uncorrelated and the value of each coefficient in the numerator and denominator belongs to a finite real interval, the magnitude and phase bounds are exact [90], i.e., each bound is attained by one function in the variational function family. But in our problem, each circuit parameter may contribute to several coefficients during the evaluation of the coefficient DDDs, and thus the variational coefficients are not independent.
However, DDD can generate the dominant terms of each coefficient in different
powers of s by performing the shortest path algorithm [176]. The shared parameters
in the dominant terms can be removed from different coefficients to tighten the affine
interval bounds and reduce the correlation between coefficients.
In the experimental part, we show that the bounds given by Kharitonov's theorem are conservative and indeed cover all the responses from the MC simulations.

4 Numerical Examples

The presented method has been implemented in C++, and the affine arithmetic part is based on [43]. All the experimental results were carried out on a Linux system with quad Intel Xeon CPUs running at 3 GHz and 16 GB of memory. The presented performance bound method was tested on two sample circuits: one is a CMOS low-pass filter (shown in Fig. 14.5), and the other is a CMOS cascode op-amp circuit [154], where the small-signal model is used to model the MOSFET transistors. The initial results of
this chapter were published in [61].
The information about the complexity of the complex DDD and the s-expanded DDD after symbolic decancellation is shown in columns 1 to 7 of Table 14.3, in which NumP and DenP are the total numbers of product terms in the numerator and denominator of the transfer function, and |DDD| is the size (number of vertices)

Fig. 14.5 (a) A low-pass filter. (b) A linear model of the op-amp in the low-pass filter. Reprinted with permission from [61]. © 2011 IEEE

Table 14.2 Summary of coefficient radius reduction with cancellation

            Average               Maximum               Minimum
Var. (%)    Num. (%)   Den. (%)   Num. (%)   Den. (%)   Num. (%)   Den. (%)
5           23.2       35.2       36.8       51.7       2.0        25.7
10          36.9       52.0       54.5       66.6       3.9        41.4
15          45.9       61.9       64.8       73.6       5.8        51.6

Table 14.3 Summary of DDD information and performance of the presented method

            Complex DDD               s-Expanded DDD
Circuit     NumP    DenP    |DDD|     NumP     DenP      |DDD|
Low-pass    5       8       31        7        70        32
Cascode     76      216     153       4,143    13,239    561

            Number   Global          Local           Bound range           Speedup
Circuit     of ε     variation (%)   variation (%)   Mag (%)   Pha (%)     vs. MC
Low-pass    7        5               10              95.1      93.8        115
                     10              10              92.5      91.9        101
Cascode     30       5               10              83.9      84.3        77
                     10              10              81.1      80.2        68

of the DDD representing both the numerator and the denominator of the transfer
function. From the table, we can see that s-expanded DDDs are able to represent a
huge number of product terms with a relatively small number of vertices by means
of sharing among different coefficient DDDs.
First, we show that term decancellation is critical in improving the accuracy for
interval bounds in DDD evaluation using affine interval. Table 14.2 shows the effect
of coefficient affine radius reduction considering term decancellation for the given
two example circuits during the DDD evaluation under different sets of variations.
Var, Num, and Den represent process variation, numerator, and denominator, respectively. As can be seen from the table, the average radius reduction is 35.4% and 49.8% for numerators and denominators, respectively, and the reduction effect grows as the process variation increases. As a result, symbolic decancellation
can indeed significantly reduce the pessimism of affine arithmetic.


Fig. 14.6 Bode diagram of the CMOS low-pass filter. Reprinted with permission from [61].
c 2011 IEEE

Second, we present the performance of the presented method. For the low-pass filter example, we introduce three noise symbols $\varepsilon$ as the local variation sources for the VCCS, resistor, and capacitor inside the linear op-amp model shown in Fig. 14.5b, and another four noise symbols $\varepsilon$ for the other devices of the filter as global variations. For the cascode op-amp example, we introduce three noise symbols $\varepsilon$ for the VCCS, resistor, and capacitor inside the small-signal model of each MOSFET transistor as local variation sources, and another six noise symbols $\varepsilon$ for the other devices in the op-amp as global variations. The total number of noise symbols for each test circuit is shown in the 8th column of Table 14.3. As a DDD expression is exactly symbolic and does not involve any approximation, it is provably accurate compared with SPICE (which uses the simple linearized device models). In the experiments, we compare the obtained results with Monte Carlo simulations using DDD. We test the presented algorithm on different global/local variation pairs, as shown in column 9. We introduce the bound range, defined as the average ratio of the bound obtained from the MC simulation to the bound of the presented method.
Figures 14.6 and 14.7 show the comparison results for the presented method and the MC method under 10% global, 10% local variation and 5% global, 10% local variation, in which Affine DDD denotes the presented method and Nominal
is the response of the circuit without parameter variation. During all the simulations,


Fig. 14.7 Bode diagram of the CMOS cascode op-amp. Reprinted with permission from [61].
c 2011 IEEE

we found that the bound calculated by the Kharitonov functions in the presented method is always a conservative bound compared with MC. However, further investigation is needed to obtain tighter bounds using affine arithmetic. We chose 10,000 MC samples. The speedup of the presented method compared with MC is shown in column 12 of Table 14.3. The average speedup is 90× for the given circuits.

5 Summary

In this chapter, we have presented a performance bound analysis algorithm of


analog circuits considering process variations. The presented method applies a
graph-based symbolic analysis and affine interval arithmetic to derive the variational
transfer functions of linearized analog circuits with variational coefficients. Then the
frequency response bounds were obtained by using the Kharitonov’s polynomial
theorem. We have shown that symbolic decancellation is important and necessary to reduce pessimism in affine interval analysis. We also showed that the response
bound given by the Kharitonov’s functions is conservative given the correlations
among coefficient intervals in transfer functions. Numerical examples demonstrated
the effectiveness of the presented algorithm compared to the MC method.
Chapter 15
Stochastic Analog Mismatch Analysis

1 Introduction

For sub-90 nm technologies, transistor mismatch is one of the primary obstacles


to reach a high yield rate for analog designs. For example, mismatch of CMOS
devices nearly doubles for every process generation less than 90 nm [80,104] due to
an inverse-square-root-law dependence with the transistor area.
Similar to leakage analysis, the traditional worst-case-based analysis is too pessimistic and sacrifices speed, power, and area. Therefore, the statistical approach
[6, 80, 105, 128, 133] becomes a viable approach to estimate analog mismatch.
Analog circuit designers usually perform a MC analysis to analyze and predict the
statistical mismatch and functionality of VLSI designs. As MC analysis requires
a large number of repeated circuit simulations to achieve accurate results, its computational cost is extremely high. Besides, the MC pseudorandom generator introduces numerical noise that may lead to errors.
Recently, many non-Monte-Carlo (NMC) methods [6, 80, 128] were developed to analyze stochastic
mismatch in VLSI. The authors of [128] calculated dc sensitivities with respect to
small device-parameter perturbations and scaled them as desired mismatches while
[80] extended the above work by modeling dc mismatches as ac noise sources. In a
transient simulation, the mismatch is converted back from the power spectral density
(PSD) in the frequency domain. These NMC mismatch simulations can be much faster than the MC approaches, but the accuracy remains a concern.
Recently, the mismatch was studied within the framework of the stochastic differential algebraic equation (SDAE) in a method called SiSMA [6]. SiSMA is similar to dealing with transient noise [27]. Due to the random variables in the DAE, it is unknown whether the derivative is still continuous. Besides, the mismatch of the channel current in transistors is of top interest to designers. As a result, the mismatch was modeled as a stochastic current source in SiSMA, forming an SDAE.
Assuming the magnitude of the stochastic mismatch is much smaller than the


nominal case, the nominal SDAE at dc can be linearized with the stochastic current
source. The obtained dc solution from SiSMA is used as the initial condition (IC) for transient analysis. This assumption may not be accurate enough for describing the
mismatch during the transient simulation as the stochastic current source is only
included during dc. Another limitation is that SiSMA calculates the mismatch
by the extraction and analysis of a covariance matrix to avoid an expensive MC
simulation. When there are thousands of devices, it would be slow to analyze the
covariance matrix. Moreover, the computation is expensive for large-scale problems
since the entire circuit is analyzed twice. As a result, there is still a need for a faster transient mismatch analysis technique, which requires improvements in two respects: a different NMC method and an efficient macromodel obtained by nonlinear model order reduction (MOR).
This chapter presents a fast NMC mismatch analysis, named isTPWL method
[202], which uses an incremental and stochastic TPWL macromodel. First, we
introduce the transient mismatch model and its macromodeling in this chapter and
then the way to linearize the SDAE along a series of snapshots on a nominal transient trajectory. After that, a stochastic current source (for mismatch) is added at each snapshot as a perturbation, which is more accurate than considering the mismatch through an IC condition [6]. We further show how to apply an improved TPWL
model order reduction [58, 144, 181] to generate a stochastic nonlinear macromodel
along the snapshots of the nominal transient trajectory. After that, we apply it for a
fast transient mismatch analysis along the full transient trajectory. The presented
approach applies incremental aggregation on local tangent subspaces, linearized
at snapshots. In this way, the applied technique can reduce the computational
complexity of [58] and even improve the accuracy of [144].
The numerical examples show that the isTPWL method is 5 times more accurate than the work in [144] and is 20× faster than the work in [58] on average. Besides, the nonlinear macromodels reduce the runtime by up to 25× compared to the use of the full model during the mismatch analysis.
Next, in order to solve the SDAE efficiently and avoid applying MC iterations or
analyzing the expensive covariance matrix [6], the stochastic variation is described by the spectral stochastic method based on OPC, forming a corresponding SDAE [196].
The chapter presents a new method to apply OPC for nonlinear analog circuits
during an NMC mismatch analysis. Numerical results show that compared to the
MC method, the presented method is 1,000 times faster with a similar accuracy.
The rest of the chapter is organized in the following manner. In Sect. 2, the
background of the mismatch model and the nonlinear model order reduction are
presented. Section 3 discusses a transient mismatch analysis in SDAE, including a
perturbation analysis and an NMC analysis by the OPC expansion. We develop an incremental and stochastic TPWL model order reduction for mismatch in Sect. 4, and numerical examples are given in Sect. 5. Section 6 concludes and summarizes
the chapter.

2 Preliminary

2.1 Review of Mismatch Model

Precise mismatch model and analysis are the key to a robust analog circuit design.
Similar to the two components of process variation, inter-die and intra-die, there
are global and local components of mismatch. The global mismatch affects the whole chip in the same way, while the local mismatch is more complex and the most difficult to analyze; hence, it is the focus of this chapter.
The local mismatch depends on the variation of the process parameters. Pelgrom's model [133] is one of the most popular CMOS mismatch models; it relates the local mismatch variance of an electrical parameter (such as the channel current $I_d$) to geometrical parameters (such as the area $A$) through a geometrical dependence equation as follows:


$$\sigma_{I_d} = \frac{\sigma^{\beta}}{\sqrt{A}}, \qquad (15.1)$$
where $A = W \cdot L$ is the area for width $W$ and length $L$, and $\sigma^{\beta}$ is an extracted constant depending on the operating region $\beta$.
Considering process parameters other than the geometry, a more general-purpose mismatch model can be derived through the so-called backward propagation of variance (BPV) method [105] for other devices such as diodes and BJTs [105]. For example, the base current $I_b$ depends on the base current density, the emitter area, and the sheet resistance. The BPV model then relates the local mismatch of an electrical property $e$ to the process parameters $p_l$ through first-order sensitivities:
$$\Delta e = \sum_{l} \left(\frac{\partial e}{\partial p_l}\right) \Delta p_l. \qquad (15.2)$$

Based on the mismatch model in (15.2), an NMC transient mismatch analysis for a large number of transistors can be developed, as shown in Sect. 3.
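As a concrete illustration of (15.1) and (15.2), the small sketch below uses purely hypothetical numbers:

```python
import math

def pelgrom_sigma(sigma_beta, W, L):
    """Pelgrom-style mismatch sigma (15.1): sigma_Id = sigma_beta / sqrt(W*L);
    sigma_beta is the extracted constant for the operating region."""
    return sigma_beta / math.sqrt(W * L)

def bpv_delta(sensitivities, delta_params):
    """First-order BPV propagation (15.2): delta_e = sum_l (de/dp_l) * delta_p_l."""
    return sum(s * dp for s, dp in zip(sensitivities, delta_params))

# hypothetical numbers: 2% mismatch constant, a 1 um x 0.1 um device
print(pelgrom_sigma(0.02, 1.0, 0.1))   # relative sigma of the channel current
```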

2.2 Nonlinear Model Order Reduction

Here we discuss the nominal model for the nonlinear circuit first and then extend it to the stochastic model. The nominal nonlinear circuit is described by the following differential algebraic equation (DAE):
$$f(x, \dot{x}, t) = Bu(t), \qquad (15.3)$$

where $x$ ($\dot{x} = dx/dt$) are the state variables, which include nodal voltages and branch currents, $f(x, \dot{x}, t)$ describes the nonlinear $i$–$v$ relations, and $u(t)$ are the external sources with a topology matrix $B$ that describes how they are connected to the circuit. The cost of solving the MNA equations in (15.3) includes three parts: device evaluation, matrix factorization, and time-step control and integration. Among these three items, the runtime is dominated by the matrix factorization when the circuit size is large or when devices are latent most of the time. Under this condition, model order reduction can be used to reduce the size of the circuit and hence the overall runtime. Therefore, model order reduction can also be applied in transient mismatch analysis as a powerful speedup tool.
The basic idea of model order reduction is to find a low-dimensional subspace that can represent the original state space while preserving the system response, which is usually realized through a coordinate transformation. For linear circuits, the coordinate transformation can be described by a linear mapping as follows:
$$z = V^T x, \qquad x = Vz, \qquad (15.4)$$
where $V \in \mathbb{R}^{N \times q}$ ($q \ll N$) is a low-dimensional projection matrix. $V$ can be constructed from the first few dominant bases spanning a space of moments (or derivatives of transfer functions) [36, 127].
For nonlinear circuits, model order reduction is more complex, but many MOR techniques have been developed [58, 144, 146, 181] as well. Similar to MOR for linear circuits, a nonlinear mapping can be defined by a function $\phi$:
$$z = \phi(x), \qquad x = \phi^{-1}(z). \qquad (15.5)$$

Without loss of generality, we assume an ordinary differential equation (ODE) form for simplicity of illustration:
$$\dot{x} = f(x, t) + Bu(t) \qquad (15.6)$$
for the DAE in (15.3). Since
$$\dot{z} = \frac{d\phi}{dx}\frac{dx}{dt} = \frac{d\phi}{dx} f(x, t) + \frac{d\phi}{dx} B u(t), \qquad (15.7)$$
we have
$$\dot{z} = \hat{f}(z, t) + \hat{B}u(t), \qquad \hat{f}(z, t) = \left.\frac{d\phi}{dx} f(x, t)\right|_{x = \phi^{-1}(z)}, \qquad \hat{B} = \frac{d\phi}{dx} B. \qquad (15.8)$$

In this way, if a proper lower-dimensional mapping function $\phi$ can be found, the original nonlinear system can be reduced within a tangent subspace spanned by $d\phi/dx$ (also called the manifold).

The authors of [58] presented a work relating the above nonlinear mapping function $\phi$ to the TPWL method [144], which leads to a local two-dimensional (2D) projection [58]. The advantage is that such a local 2D projection is constructed from local tangent subspaces, which maintains a high accuracy. However, the time complexity is an issue: the local 2D projection can be computationally expensive to project and store when the number of local tangent subspaces is large.
On the other hand, the TPWL method [144] approximates the nonlinear mapping function $\phi$ by aggregating those local tangent subspaces with a global SVD. This global SVD results in a one-dimensional (1D) projection. Obviously, the global 1D projection leads to a more efficient projection and less runtime. However, the accuracy of the TPWL model order reduction is limited, because the information in the dominant bases of each local tangent subspace is lost during the global SVD [58]. In Sect. 4, an incremental aggregation that balances speed and accuracy is introduced. In addition, the nonlinear model order reduction is extended to consider the stochastic mismatch, as shown in Sect. 4.

3 Stochastic Transient Mismatch Analysis

3.1 Stochastic Mismatch Current Model

It is difficult to add the stochastic mismatch $\xi$ into the state variables $x$ of (15.3) directly, since $f(x, \dot{x}, \xi)$ may not be differentiable. Therefore, we model the mismatch as a current source $i(x, \xi)$ added to the right-hand side of (15.3), similar to SiSMA [6]:
$$f(x, \dot{x}, t) = F\, i(x, \xi) + Bu(t). \qquad (15.9)$$

Here, $F$ is the topology matrix describing how $i$ is connected into the circuit. Based on the BPV equation in (15.2), the stochastic current source $i$ has the following form:
$$i(x, \xi) = n(x) \sum_{l} g^{\beta}(p_l)\,\xi_l, \qquad (15.10)$$

where $\xi_l$ is a random variable associated with a stochastic distribution $W(\xi_l)$ for the parameter $p_l$, $n(x)$ describes the biasing-dependent condition (depending on $x$, $\dot{x}$) provided by a nominal transient simulation, and $g^{\beta}(p_l)$ is a constant for the parameter $p_l$ in operating region $\beta$. Taking one CMOS transistor with respect to the parameter area $A$, for instance, $\xi_A$ is a Gaussian random variable, $g^{\beta}(A)$ is $\sigma^{\beta}/\sqrt{A}$, and $n(x)$ becomes $I_d$. Generally speaking, $g^{\beta}(p_l)$ can be either derived from the analytical device equations or characterized from measurements [105].

3.2 Perturbation Analysis

In this chapter, we assume that the impact of the local mismatch is small, so (15.9) can be solved by treating the right-hand-side mismatch term as a perturbation to the nominal trajectory $x^{(0)}(t)$ of the circuit, where $x^{(0)}(t)$ is the nominal state variable, i.e., the solution of the nominal nonlinear circuit equation:
$$f\bigl(x^{(0)}, \dot{x}^{(0)}, t\bigr) = Bu(t). \qquad (15.11)$$

A first-order Taylor expansion of $f(x, \dot{x}, t)$ in (15.9) leads to the following equation:
$$f\bigl(x^{(0)}, \dot{x}^{(0)}, t\bigr) + \frac{\partial f(x, \dot{x}, t)}{\partial x}\bigl(x - x^{(0)}\bigr) + \frac{\partial f(x, \dot{x}, t)}{\partial \dot{x}}\bigl(\dot{x} - \dot{x}^{(0)}\bigr) = F\, i\bigl(x^{(0)}, \xi\bigr) + Bu(t), \qquad (15.12)$$
or
$$G\bigl(x^{(0)}, \dot{x}^{(0)}\bigr)\, x_m + C\bigl(x^{(0)}, \dot{x}^{(0)}\bigr)\, \dot{x}_m = F\, i\bigl(x^{(0)}, \xi\bigr), \qquad (15.13)$$
where
$$G\bigl(x^{(0)}, \dot{x}^{(0)}\bigr) = \left.\frac{\partial f(x, \dot{x}, t)}{\partial x}\right|_{x = x^{(0)},\ \dot{x} = \dot{x}^{(0)}}, \qquad C\bigl(x^{(0)}, \dot{x}^{(0)}\bigr) = \left.\frac{\partial f(x, \dot{x}, t)}{\partial \dot{x}}\right|_{x = x^{(0)},\ \dot{x} = \dot{x}^{(0)}} \qquad (15.14)$$

are the linearized conductive and capacitive components stamped by the companion models in SPICE, and $x_m = x - x^{(0)}$ is the first-order perturbed mismatch response. Recall that $x^{(0)}(t)$ and $\dot{x}^{(0)}(t)$ are time-dependent biasing points along the transient trajectory.

3.3 Non-Monte Carlo Analysis by Spectral Stochastic Method

Performing Monte Carlo or correlation-based mismatch analysis can be very expensive, so in this part we solve the perturbed SDAE (15.13), in which the random variable $\xi$ is handled through an OPC expansion using the spectral stochastic method of Sect. 3.2 of Chap. 2. Different process variations are related to different orthogonal polynomials. In this chapter, we assume that the random

process parameters for the local mismatch have a Gaussian distribution. Therefore, the corresponding Hermite polynomials (for one random variable)
$$\Phi(\xi) = [\Phi_1(\xi), \Phi_2(\xi), \Phi_3(\xi), \ldots]^T = [1,\ \xi,\ \xi^2 - 1,\ \ldots]^T \qquad (15.15)$$
are used to construct the basis of the HPC expansion to calculate the mean and the variance of $x_m(t)$.
The first step is to expand the stochastic state variable $x_m(t)$ as
$$x_m(t) = \sum_{i} \alpha_i(t)\,\Phi_i(\xi). \qquad (15.16)$$

Then, we take the inner product of the residual error
$$\Delta(\xi) = G\bigl(x^{(0)}, \dot{x}^{(0)}\bigr)\sum_{i}\alpha_i(t)\Phi_i(\xi) + C\bigl(x^{(0)}, \dot{x}^{(0)}\bigr)\sum_{i}\dot{\alpha}_i(t)\Phi_i(\xi) - F\, n\bigl(x^{(0)}\bigr)\sum_{l} g^{\beta}(p_l)\,\xi_l$$
with the orthogonal basis $\Phi_j(\xi)$, which results in
$$\bigl\langle \Delta(\xi), \Phi_j(\xi)\bigr\rangle = \int \Delta(\xi)\,\Phi_j(\xi)\,W(\xi)\,d\xi = 0, \qquad (15.17)$$
where $W(\xi)$ is the PDF of the random variable $\xi$. We assume all parameters involved here follow a Gaussian distribution.
Without loss of generality, for one random variable $\xi$ modeling one geometrical parameter $p$, it is easy to verify that (15.17) leads to
$$\alpha_0 = 0, \qquad \alpha_2 = 0,$$
$$G\bigl(x^{(0)}, \dot{x}^{(0)}\bigr)\,\alpha_1(t) + C\bigl(x^{(0)}, \dot{x}^{(0)}\bigr)\,\dot{\alpha}_1(t) = F\, n\bigl(x^{(0)}\bigr)\, g^{\beta}(p) \qquad (15.18)$$
with a second-order HPC expansion of $x_m(\xi)$. The corresponding variance is thereby given by
$$\mathrm{Var}\bigl\langle x_m(\xi)\bigr\rangle = \alpha_1^2\,\mathrm{Var}(\xi) + \alpha_2^2\,\mathrm{Var}(\xi^2 - 1) = \alpha_1^2. \qquad (15.19)$$

The first-order OPC coefficient $\alpha_1(t)$ in (15.18) can be solved by backward-Euler integration as follows:
$$\left(G_k + \frac{1}{h} C_k\right)\alpha_1(t_k) = \frac{1}{h} C_k\,\alpha_1(t_k - h) + F\, i_k, \qquad (15.20)$$
where
$$G_k = G\bigl(x_k^{(0)}, \dot{x}_k^{(0)}\bigr), \qquad C_k = C\bigl(x_k^{(0)}, \dot{x}_k^{(0)}\bigr), \qquad i_k = n\bigl(x_k^{(0)}\bigr)\sum_{l} g^{\beta}(p_l) \qquad (15.21)$$
are the Jacobians and the mismatch current source at the $k$th time instant along the nominal trajectory $x^{(0)}$.

It is easy to see that a naive application of the above perturbation-based mismatch analysis is still slow, since $G_k$, $C_k$, and $i_k$ have to be evaluated at every time step along the nominal trajectory. Therefore, in Sect. 4, only $K$ snapshots along the nominal trajectory are used in a macromodeling framework instead of linearizing along the full nominal trajectory.

3.4 A CMOS Transistor Example

In this part, we use one CMOS transistor as an example, modeled with a geometric parameter $A$ and the corresponding Gaussian random variable $\xi_A$; (15.18) becomes
$$\left(G_k + \frac{1}{h} C_k\right)\alpha_1(t_k) = \frac{1}{h} C_k\,\alpha_1(t_k - h) + \frac{\sigma^{\beta}}{\sqrt{A}}\,(I_d)_k \qquad (15.22)$$
at the $k$th time step. Recall that $G_k$, $C_k$, and $(I_d)_k$ represent the nominal values of the conductance ($g_{ds}$), capacitance ($c_{ds}$), and channel current $I_d$ evaluated at $t_k$; $g^{\beta}(A)$ is $\sigma^{\beta}/\sqrt{A}$, and $n(x)$ becomes $I_d$. Note that $\sigma^{\beta}$ is the extracted constant from Pelgrom's model.
In this way, the transient mismatch voltage $x_m = \alpha_1(t)\,\xi_A$ of this transistor has a time-varying variance $\alpha_1(t)^2$, which can be solved from the above perturbation equation. In most cases, $\sigma^{\beta}/\sqrt{A}$ is a few percent of the nominal channel current $I_d$. More importantly, with a generally characterized $g^{\beta}(p_l)$ from the BPV model [105], we can simultaneously solve the transient mismatch vector using (15.18) for thousands of transistors of different types.
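A minimal sketch of the backward-Euler sweep in (15.20)/(15.22) is given below. The Jacobians and mismatch currents would come from SPICE linearizations at the nominal time points; the interface shown here is illustrative only.

```python
import numpy as np

def solve_alpha1(G_list, C_list, i_list, h, F=None):
    """Backward-Euler sweep for the first-order OPC coefficient, eq. (15.20):
    (G_k + C_k/h) alpha1(t_k) = (C_k/h) alpha1(t_k - h) + F i_k.

    G_list, C_list : linearized Jacobians at each nominal time point
    i_list         : mismatch current vectors n(x_k) * sum_l g_beta(p_l)
    Returns alpha1 at every time step; Var(x_m) = alpha1**2 per (15.19)."""
    n = G_list[0].shape[0]
    F = np.eye(n) if F is None else F
    alpha1 = np.zeros(n)                      # alpha1(t_0 - h) = 0
    history = []
    for G_k, C_k, i_k in zip(G_list, C_list, i_list):
        A = G_k + C_k / h
        b = C_k @ alpha1 / h + F @ i_k
        alpha1 = np.linalg.solve(A, b)
        history.append(alpha1.copy())
    return np.array(history)
```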

4 Macromodeling for Mismatch Analysis

For speedup purposes, we can take $K$ snapshots along a nominal transient trajectory instead of performing a full simulation for the nominal transient and the transient mismatch. The subspaces or macromodels can then be found from the $K$ snapshots with respect to the right-hand-side nominal input and the stochastic current source, respectively. Afterward, efficient transient analysis and transient mismatch estimation can be performed along the full transient trajectory using those macromodels. In
tion can be performed along the full transient trajectory using those macromodels. In
the following part, we first introduce an incremental TPWL method for the nominal
transient to balance the accuracy and efficiency when generating the macromodel.
After that, we extend this approach to incremental stochastic TPWL (isTPWL) to
handle the stochastic mismatch.

4.1 Incremental Trajectory-Piecewise-Linear Modeling

As discussed in Sect. 2, the first step of TPWL takes a small number of snapshots along a typical transient trajectory and performs a local reduction at each linearized snapshot, or biasing point. The second step creates a global subspace from the sequence of linearized local subspaces obtained at those snapshots. Then we apply a singular value decomposition (SVD) [51] to analyze the global subspace and construct a global projection matrix with weights. The linearized stochastic DAE (15.18) can be naturally reduced in the framework of the TPWL method, since the stochastic mismatch analysis in isTPWL is performed along the nominal trajectory $x^{(0)}$.
Suppose that there are $K$ snapshots $\{x_1^{(0)}, \ldots, x_K^{(0)}\}$ taken along the nominal trajectory $x^{(0)}$. The linearized SDAE at the $k$th snapshot is
$$G_k\,\alpha_1(t) + C_k\,\dot{\alpha}_1(t) = F\, i_k. \qquad (15.23)$$

The above linearized subsystem in the frequency domain is contained in a subspace $\{A_k, A_kR_k, A_k^2R_k, \ldots\}$ composed of moments expanded at a frequency point $s_0$ using two moment matrices:
$$A_k = (G_k + s_0 C_k)^{-1} C_k, \qquad R_k = (G_k + s_0 C_k)^{-1} F. \qquad (15.24)$$

With the use of block-Arnoldi orthonormalization [127], a $q'$th-order projection matrix $V_k \in \mathbb{R}^{N \times q'}$,
$$V_k = \bigl[v_k^1, v_k^2, \ldots, v_k^{q'}\bigr], \qquad k = 1, \ldots, K, \qquad (15.25)$$
can be constructed locally. Here the subscript denotes the index of the snapshot, and the superscript denotes the index of the reduction order.

4.1.1 Local Tangent Subspace

When the input vector is given (usually a set of typical inputs is used), we take $K$ snapshots $\{x_1^{(0)}, \ldots, x_K^{(0)}\}$ along a nominal transient trajectory $x^{(0)}(t)$ and linearize the DAE (15.3) at the $K$ snapshots (or biasing points), with the first snapshot $x_1$ taken at the IC point. The linearized DAE at the $k$th $(k = 1, \ldots, K)$ snapshot is
$$G_k\bigl(x - x_k^{(0)}\bigr) + C_k\bigl(\dot{x} - \dot{x}_k^{(0)}\bigr) = \delta_k, \qquad \delta_k = Bu(t_k) - f\bigl(x_k^{(0)}, \dot{x}_k^{(0)}, t_k\bigr), \qquad (15.26)$$
where $\delta_k$ represents the right-hand-side source and the "nonequilibrium" update. The state $x_k^{(0)}$ at the $k$th snapshot is contained in a subspace of moments $\{A_k, A_kR_k, A_k^2R_k, \ldots\}$ expanded

at a frequency point $s_0$ in the frequency domain, where
$$A_k = (G_k + s_0 C_k)^{-1} C_k, \qquad R_k = (G_k + s_0 C_k)^{-1} \delta_k \qquad (15.27)$$
are the two moment matrices.
With the use of block-Arnoldi orthonormalization [127], a $q'$th-order projection matrix $V_k \in \mathbb{R}^{N \times q'}$ with $q'$ bases,
$$V_k = \bigl[v_k^1, v_k^2, \ldots, v_k^{q'}\bigr], \qquad (15.28)$$

can be constructed locally to represent that local subspace. We call $v_k^i$ $(k = 1, \ldots, K;\ i = 1, \ldots, q')$ the first $q'$ dominant bases of $V_k$, where the subscript and superscript denote the index of the local subspace and the order of the dominant basis, respectively. Block-Arnoldi orthonormalization finds a linear coordinate transformation $V_k$ which maintains $\|z - z_k^{(0)}\| \approx \|x - x_k^{(0)}\|$. Moreover, as discussed in the following part, the $V_k$'s span a subspace for $d\phi/dx$, the tangent (or manifold) of the mapping function $\phi$ introduced in Sect. 2. In this chapter, we call the space spanned by the $V_k$'s the local tangent subspace.

4.1.2 Local and Global Projection

One approach to approximating the nonlinear mapping function $\phi$ introduced in Sect. 2 was developed in [58]:
$$x = \phi^{-1}(z) \approx \sum_{k=1}^{K} w_k\Bigl[x_k^{(0)} + V_k\bigl(z - z_k\bigr)\Bigr] \qquad (15.29)$$
and
$$z = \phi(x) \approx \sum_{k=1}^{K} w_k\Bigl[z_k + V_k^T\bigl(x - x_k^{(0)}\bigr)\Bigr], \qquad (15.30)$$
where $w_k$ $\bigl(\sum_{k=1}^{K} w_k = 1\bigr)$ is the weighted kernel function. The weighted kernel function depends on the distance between a point on the trajectory and a linearization point [144].
A nonlinear model order reduction in terms of a local two-dimensional (2D) projection is derived from (15.8), (15.29), and (15.30) as follows:
$$\sum_{l=1}^{K}\sum_{k=1}^{K} w_l w_k\Bigl[V_l^T G_k V_k\bigl(z - z_k^{(0)}\bigr) + V_l^T C_k V_k\bigl(\dot{z} - \dot{z}_k^{(0)}\bigr)\Bigr] = \sum_{l=1}^{K} w_l V_l^T \delta_k, \qquad (15.31)$$

where we assume that all $V_k$'s are reduced to the same order $q'$. The number of sampled snapshots must be quite large to maintain high accuracy for circuits with sharp transitions (inputs) or strong nonlinearity (devices). For this kind of circuit, the numerical examples show that the number of sampled snapshots (or neighbors) has to be large to produce good accuracy. As such, the computational cost of the local 2D projection (15.31) in [58] becomes prohibitive.
On the other hand, the TPWL method in [144] approximates the nonlinear mapping function $\phi$ by aggregating the local subspaces $V_k \in \mathbb{R}^{N \times q'}$ into a unified global subspace $\mathrm{span}\{V_1, V_2, \ldots, V_K\}$, which can be further compressed into a lower-dimensional subspace $V \in \mathbb{R}^{N \times q}$ $(q \ll N)$ by an SVD as follows:
$$V = \mathrm{SVD}_q\bigl([V_1, V_2, \ldots, V_K]\bigr). \qquad (15.32)$$

This procedure is called global aggregation. Global aggregation generates a global one-dimensional (1D) projection by
$$\sum_{k=1}^{K} w_k\Bigl[V^T G_k V\bigl(z - z_k^{(0)}\bigr) + V^T C_k V\bigl(\dot{z} - \dot{z}_k^{(0)}\bigr)\Bigr] = \sum_{k=1}^{K} w_k V^T \delta_k. \qquad (15.33)$$

It is easy to see that such a global 1D projection has a smaller projection time and storage than the local 2D projection. However, the global 1D projection usually requires a higher order $q$ to achieve an accuracy similar to that of the local 2D projection of order $q'$ $(q' < q)$ [58], since the dominant bases of the local $V_k$'s are interpolated by the global aggregation.

4.1.3 Incremental Aggregation of Subspaces

The local 2D projection in (15.31) requires a longer runtime and larger storage than the global 1D projection (15.33). On the other hand, the local 2D projection (15.31) is more accurate than the global 1D projection (15.33) with $V$. Therefore, we need a procedure that balances accuracy and efficiency.
The manifold ddx
can be covered by the local tangent subspaces fV1 , V2 ,. . . ,VK g
along the trajectory, where each Vk can be further composed of different orders
q0
of dominant bases, fv1k ; v2k ; : : : ; vk g. As such, an effective aggregation needs to
consider the order or the dominance of those bases. This motivates us to use those
local tangent subspaces to decompose the space spanned first according to the order.
In this way, (15.29) becomes
x = \phi^{-1}(z) \approx \sum_{k=1}^{K} w_k x_k^{(0)} + \sum_{k=1}^{K} \sum_{p=1}^{q'} w_k v_k^p \left( z - z_k^{(0)} \right)
  = \sum_{k=1}^{K} w_k x_k^{(0)} + \sum_{p=1}^{q'} \sum_{k=1}^{K} v_k^p w_k \left( z - z_k^{(0)} \right)
  = \sum_{k=1}^{K} w_k x_k^{(0)} + \left[ v_1^1 w_1 \left( z - z_1^{(0)} \right) + \ldots + v_K^1 w_K \left( z - z_K^{(0)} \right) \right]
    + \ldots + \left[ v_1^{q'} w_1 \left( z - z_1^{(0)} \right) + \ldots + v_K^{q'} w_K \left( z - z_K^{(0)} \right) \right].   (15.34)
After that, we can form global tangent subspaces in the order of the dominant bases by

\mathrm{span}\left\{ v_1^1, v_2^1, \ldots, v_K^1 \right\}, \; \ldots, \; \mathrm{span}\left\{ v_1^{q'}, v_2^{q'}, \ldots, v_K^{q'} \right\}.   (15.35)
A global projection matrix $V$ is accordingly constructed in the fashion of an incremental aggregation. In this process, we first aggregate each global tangent subspace, order by order:

V_1 = \mathrm{SVD}_q\left( [v_1^1, \ldots, v_K^1] \right), \; \ldots, \; V_{q'} = \mathrm{SVD}_q\left( [v_1^{q'}, \ldots, v_K^{q'}] \right).   (15.36)

That is to say, we can identify a $V_p$ ($p = 1, \ldots, q'$) to represent the $p$-th order global tangent subspace. Then, the global projection matrix $V$ can be further aggregated as

V = \mathrm{SVD}_q\left( [V_1, V_2, \ldots, V_{q'}] \right)   (15.37)
by those global tangent subspaces in a descending order of dominance. As shown by the numerical examples, we can usually choose a much lower $q'$ ($q' \ll q$) for each local tangent subspace $V_k$, while the order $q$ depends on the number of snapshots. For circuits with a sharp transition (input waveform) or strong nonlinearity (device), the number of snapshots is large, and so is the order $q$.
The information of the dominant bases at low orders is preserved, as the local tangent subspaces are incrementally aggregated according to their ordered bases. As shown by the numerical examples, when compared to the previous TPWL method [144], this incremental aggregation results in a higher accuracy with a similar computational cost in projection time and memory storage. Another benefit of the presented incremental aggregation is that it can also consider more sampled biasing (linearization) points than the approach in [58], for which the computational cost of the local 2D projection would increase dramatically.
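As an illustration of (15.36) and (15.37), a minimal Python (numpy) sketch of the incremental aggregation is given below. It assumes that each local basis V_k is stored as an N x q' array with orthonormal columns ordered by dominance, and that the per-order aggregations use the same truncation order q as the final one, which is an assumption made for simplicity.

import numpy as np

def incremental_aggregation(V_list, q):
    """Incrementally aggregate local tangent subspaces into a global basis V.

    V_list : list of K local bases, each an (N x q') ndarray whose p-th column
             is the p-th dominant base v_k^p.
    q      : final reduced order of the global projection matrix.
    """
    q_local = V_list[0].shape[1]
    per_order = []
    for p in range(q_local):                        # (15.36): aggregate order by order
        Ap = np.column_stack([Vk[:, p] for Vk in V_list])
        U, _, _ = np.linalg.svd(Ap, full_matrices=False)
        per_order.append(U[:, :min(q, U.shape[1])])
    A = np.column_stack(per_order)                  # stacked in descending dominance
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :q]                                 # (15.37): final truncation

# Example: K = 4 snapshots, N = 50 states, q' = 3 local bases, global order q = 6
rng = np.random.default_rng(0)
V_list = [np.linalg.qr(rng.standard_normal((50, 3)))[0] for _ in range(4)]
print(incremental_aggregation(V_list, q=6).shape)   # (50, 6)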

4.2 Stochastic Extension for Mismatch Analysis

After the incremental aggregation, we further extend the above discussion to build the TPWL macromodel for stochastic mismatch analysis. Instead of linearizing the DAE in (15.3) directly, we similarly linearize the SDAE (15.18) at $K$ snapshots along the nominal trajectory, and then construct the local tangent subspace $V_k$ from

A_k^0 = (G_k + s_0 C_k)^{-1} C_k, \qquad R_k^0 = (G_k + s_0 C_k)^{-1} \delta_k^0.   (15.38)

Here $\delta_k^0$ is determined by the nonequilibrium correction associated with $F\, i_k$. After that, we can build the similarly incrementally aggregated mapping $V$ through (15.36) and (15.37).
Then, a set of weighted local macromodels constructs the global macromodel, where we use

\sum_{k=1}^{K} w_k \left[ V^T G_k V \alpha_1(t) + V^T C_k V \dot{\alpha}_1(t) - V^T F\, i_k \right] = 0   (15.39)

to calculate the transient mismatch. We call such a macromodeling technique the isTPWL method, which is sampled from $K$ snapshots. Using such a macromodel, we can then efficiently perform a transient mismatch analysis for the full trajectory.
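For illustration, a minimal Python sketch of one backward-Euler step of the reduced model (15.39) is given below. The reduced matrices, mismatch terms, and weights are assumed to be available from the macromodel construction; the sketch only shows how the small q x q linear system is assembled and solved at each time step.

import numpy as np

def istpwl_step(Gh, Ch, bh, w, alpha_prev, h):
    """One backward-Euler step of the weighted reduced macromodel (15.39).

    Gh, Ch : lists of reduced matrices V^T G_k V and V^T C_k V (each q x q)
    bh     : list of reduced mismatch terms V^T F i_k (each a length-q vector)
    w      : weights w_k evaluated at the current reduced state (sum to one)
    alpha_prev : alpha_1 at the previous time step; h : time-step size
    """
    q = Gh[0].shape[0]
    A = np.zeros((q, q))
    rhs = np.zeros(q)
    for wk, Gk, Ck, bk in zip(w, Gh, Ch, bh):
        # sum_k w_k (Ghat_k + Chat_k/h) alpha = sum_k w_k (bhat_k + Chat_k alpha_prev/h)
        A += wk * (Gk + Ck / h)
        rhs += wk * (bk + Ck @ alpha_prev / h)
    return np.linalg.solve(A, rhs)                  # alpha_1 at the current time step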

5 Numerical Examples

To show numerical examples of the presented method, a modernized SPICE3 (http://ngspice.sourceforge.net/) is used to generate the $K$ snapshots of a nominal trajectory and to extract the mismatch current model. The presented mismatch algorithm has been implemented in C and Matlab, where the OPC expansion, the backward-Euler integration, and the incremental and stochastic TPWL (isTPWL) are implemented in Matlab. The TPWL method and the maniMOR method are implemented exactly following the procedures described in [144] and [58], respectively, for comparison purposes. For instance, the state variables at the snapshots are added to provide richer information during the global aggregation, as in the TPWL method [144]. We implement the flow under MC analysis as the baseline with 1,000 iterations. The initial results of this chapter were published in [202].

All experimental results are measured on an Intel dual-core 2.0 GHz PC with 2 GB memory. We compare the accuracy and study the scalability of the presented method with four industrial analog/RF circuits. They contain different device types, such as diodes, BJTs, and CMOS transistors. The circuits also include the extracted parasitics, so that the matrix solve time is dominant. For the characterization of $g_\beta(p_l)$, we apply Pelgrom's model for CMOS transistors and the BPV model for diodes and BJTs. All of them result in about 10% variation from the nominal bias $n(x)$ (e.g., $I_d$ for a CMOS transistor). In addition, the waveform error is measured by taking the averaged difference of two waveforms. Three waveforms are measured at each time step: the transient nominal ($x^{(0)}(t)$), the transient mismatch ($\alpha_1(t)$, the time-varying standard deviation), and the transient ($x(t)$, the nominal plus the standard deviation).
Table 15.1 Scalability comparison of runtime and error for the exact model with MC, the exact model with OPC, and the isTPWL macromodel with OPC

Case | Circuit     | # of nodes | # of steps | # of snapshots | # of orders
1    | Diode chain | 802        | 225        | 24             | 25
2    | BJT mixer-1 | 238        | 135        | 25             | 25
3    | BJT mixer-2 | 1,248      | 219        | 83             | 45
4    | CMOS comp.  | 654        | 228        | 75             | 60

Case | MC Time (s) | Exact OPC Time (s) | Exact OPC Error (%) | OPC+isTPWL Time (s) | OPC+isTPWL Error (%)
1    | 520.1       | 0.53               | 0.41                | 0.02                | 0.43
2    | 338.0       | 0.34               | 0.29                | 0.02                | 0.36
3    | 348.0       | 0.20               | 0.18                | 0.04                | 0.24
4    | 412.1       | 0.39               | 0.41                | 0.08                | 0.62
5.1 Comparison of Mismatch Waveform-Error and Runtime

In this part, we first compare the accuracy of the transient mismatch waveform between the MC method (1,000 iterations) and the exact orthogonal PC. After that, we further compare the accuracy with the isTPWL macromodel. In addition, we also compare the waveform of the transient mismatch with the waveform obtained by adding the mismatch as an initial condition, similar to the setting in the SiSMA [6] technique. Finally, the runtime and waveform error are summarized in Table 15.1.
The first example is a BJT mixer circuit including an extracted distributed inductor, with 238 state variables. The waveforms are compared by solving the perturbed SDAE (15.13) using the MC analysis and the OPC expansion, respectively. We apply the MC analysis with a Gaussian distribution 1,000 times at each time step and calculate the time-varying standard deviation. It takes 348 s for the transient mismatch by the MC analysis, and only 0.20 s (more than a 1,000 times speedup) for the exact OPC expansion up to the second order, with an error of less than 0.18%. Clearly, the two transient mismatch waveforms obtained from the two methods are virtually identical, as shown in Fig. 15.1.
Next, we show further speed improvement by macromodeling. The second example is a CMOS comparator including an extracted power supply, with 654 state variables. The waveforms of the exact OPC and of the model further reduced by isTPWL are compared in this part. Figure 15.2a shows the comparison of the transient nominal, while Fig. 15.2b shows the comparison of the transient mismatch. Here, 75 snapshots are used to generate the macromodel: we reduce the original model to a macromodel of order 60. For a short transient with 228 time steps, it takes 0.39 s for the exact OPC and 0.08 s for the isTPWL macromodel (a five times speedup). The error of the waveforms analyzed by isTPWL is 0.62%.
We further compare the transient mismatch waveforms for different ways of adding the mismatch. The first is to add the stochastic mismatch only as the initial condition, like the procedure used in SiSMA [6] (Fig. 15.3). The second is adding
Fig. 15.1 Transient mismatch (the time-varying standard deviation) comparison at the output of a BJT mixer with distributed inductor: the exact by Monte Carlo and the exact by orthogonal PC expansion. Reprinted with permission from [52]. © 2011 ACM
Fig. 15.2 Transient nominal $x^{(0)}(t)$ (a) and transient mismatch ($\alpha_1(t)$) (b) for one output of a CMOS comparator by the exact orthogonal PC and the isTPWL. Reprinted with permission from [52]. © 2011 ACM
the stochastic mismatch at every time step, as in the presented approach. In this part, we use a diode chain with 802 state variables. Figure 15.3 shows one waveform of the transient nominal and two waveforms with the mismatch added in the two different ways, from which we can see that the waveform with the mismatch added only at the initial condition shows a nonnegligible difference.
Fig. 15.3 Transient waveform comparison at the output of a diode chain: the transient nominal, the transient with mismatch by SiSMA (adding mismatch at the initial condition only), and the transient with mismatch by the presented method (adding mismatch along the transient trajectory). Reprinted with permission from [52]. © 2011 ACM
Fig. 15.4 Transient mismatch ($\alpha_1(t)$, the time-varying standard deviation) comparison at the output of a BJT mixer with distributed substrate: the exact by OPC expansion, the macromodel by TPWL (order 45), and the macromodel by isTPWL (order 45). The waveform by isTPWL is visually identical to the exact OPC. Reprinted with permission from [52]. © 2011 ACM
Finally, Table 15.1 summarizes the runtime and error of four different analog/RF
circuits. In this table, the waveform error is defined as the relative difference
between the exact and the macromodel, and the runtime here is the total simulation
time. We find that the OPC expansion reduces the runtime by 1,000 times yet
Fig. 15.5 (a) Comparison of the ratio of the waveform error by TPWL and by isTPWL under the same reduction order. (b) Comparison of the ratio of the reduction runtime by maniMOR and by isTPWL under the same reduction order. In both cases, isTPWL is used as the baseline. Reprinted with permission from [52]. © 2011 ACM
with an error of 0.23% on average. Moreover, the macromodel by isTPWL further reduces the runtime by up to 25 times (diode chain) with an error of at most 0.43%. This demonstrates the efficiency and accuracy of the isTPWL method for transient mismatch analysis.

5.2 Comparison of TPWL Macromodel

As shown in this part, isTPWL further improves both accuracy and runtime. First, Fig. 15.4 presents the transient-mismatch waveform comparison for a BJT mixer including a distributed substrate, with 1,248 state variables in total. Here, 83 snapshots are used for both TPWL and isTPWL to reduce the original model to a macromodel of order 45. We find that the waveform by isTPWL is visually identical to the exact OPC expansion, whereas the waveform by TPWL [144] shows a nonnegligible error, 4.5 times larger than that by isTPWL.
Figure 15.5 further summarizes the comparison over the four circuits used in the previous section. Figure 15.5a compares the ratio (TPWL vs. isTPWL) of the waveform errors of the macromodels simulated by TPWL [144] and by isTPWL under the same model reduction order. Figure 15.5b compares the ratio (maniMOR vs. isTPWL) of the reduction time for the macromodels reduced by maniMOR [58] and by isTPWL under the same reduction order. In both cases, isTPWL is used as the baseline when calculating the ratio. The numerical examples show that the isTPWL method is 5 times more accurate than TPWL [144] and 20 times faster than maniMOR [58] on average, which clearly demonstrates the advantage of using the incremental aggregation.

6 Summary

This chapter has presented a fast non-MC mismatch analysis. It models the mismatch by a current source associated with a random variable and forms an SDAE. The random variable in the SDAE is expanded by OPC, which leads to an efficient solution without MC or correlation analysis. Moreover, the SDAE has been solved by an improved TPWL model order reduction, called isTPWL. An incremental aggregation has been introduced to balance efficiency and accuracy when generating the macromodel. Numerical examples show that, compared to the MC method, the presented method is 1,000 times faster with a similar accuracy. Moreover, on average, the isTPWL method is 5 times more accurate than the work in [144] and 20 times faster than the work in [58]. In addition, the use of a reduced macromodel reduces the runtime by up to 25 times compared to the use of a full model.
Chapter 16
Statistical Yield Analysis and Optimization

1 Introduction

A robust design beyond 90 nm is challenging due to process variations [6, 20, 31, 32, 37, 54, 55, 59, 67, 80, 88, 100, 105, 124, 133, 135, 153, 180, 187, 189, 203]. The sources of variation include etching, lithography, polishing, and stress. For example, the proximity effect caused by stress from shallow-trench isolation regions affects the stress in the channel of nearby transistors and therefore affects carrier mobility and threshold voltage. Process variation (or mismatch) significantly threatens not only the timing closure of digital circuits but also the functionality of analog circuits. To ensure robustness in terms of a high yield rate, in addition to performance, a fast engine for yield estimation and optimization is needed to verify designs beyond 90 nm. Note that there are two types of variations: systematic global variation and stochastic local variation. The stochastic variation, such as analog mismatch, is the most difficult one. One either performs thousands of MC (Monte Carlo) runs, consuming significant engineering resources, or uses pessimistic process corners provided by the foundry. Since corners are usually pessimistic for yield and MC is too expensive for verification, a stochastic engine with a non-Monte-Carlo (NMC) approach is required for yield estimation and optimization.
To ensure a robust design, the development of a fast variation (mismatch) analysis to estimate yield is the first priority. Many NMC methods have been developed recently for stochastic variation (mismatch) analysis, as discussed in Chap. 15.

Next, one needs to improve or optimize the yield by tuning parameters at nominal conditions to ensure a robust design. An efficient approach is to derive and employ the yield sensitivity with respect to design parameters. Unfortunately, it has been unknown how to calculate the stochastic sensitivity in the framework of the OPC [187, 196]. This chapter is the first to discuss stochastic sensitivity analysis under OPC, which can be effectively deployed in any gradient-based optimization such as sequential linear or quadratic programming. Moreover, it is necessary, even imperative, to optimize two or more objectives or performance merits simultaneously [26, 103, 152], such as maximizing the benefit and minimizing the expense. To do so, we formulate a stochastic optimization problem and develop a multiobjective optimization algorithm to improve the yield rate and other objectives simultaneously. As such, our OPC-sensitivity-based algorithm performs the optimization by changing the nominal point along gradient directions of the orthogonal PC-expanded SDAE [52]. Experiments show that the fast mismatch analysis can achieve up to a 700 times speedup while maintaining 2% accuracy; meanwhile, our optimization procedure can improve the yield rate to 95.5% and enhance other performance merits compared with existing methods.

2 Problem Formulations

We formulate the yield optimization problem in this chapter. The formulation is based on the observation that the parameter vector p can change a performance metric $f_m$, such as delay or output swing, and can thereby lead to circuit failures that affect the yield rate. In general, the parametric yield $Y(p)$ is defined as the percentage of manufactured circuits that satisfy the performance constraints.

To illustrate this, consider one output voltage that discharges from high to low. Because process variation can perturb the parameter vector p away from its nominal values, it leads to the transient variation (mismatch) waveforms shown in Fig. 16.1.

Fig. 16.1 Example of the stochastic transient variation or mismatch
Fig. 16.2 Distribution of output voltage at $t_{max}$
The performance constraint $h(p; t)$ in this case is

h(p; t) = f_m(t_{max}) - f_m^{threshold} \le 0.   (16.1)

This means that the curves below $v_{threshold}$ at $t_{max}$ correspond to successful samples. In addition, one can plot the distribution of the output voltages at $t_{max}$, shown in Fig. 16.2. It is clear that samples located to the left of the performance constraint are successes, while those to the right are failures.

As such, the parametric yield can be defined as

Y(p; t) = \int_{S} \mathrm{pdf}\left(f_m(p; t)\right) dS,   (16.2)

where $S$ is the successful region and $\mathrm{pdf}(f_m(p; t))$ is the PDF of the performance metric $f_m(p; t)$ of interest. With the parametric yield defined, one can optimize it by tuning the parameters under stochastic variations. Meanwhile, one needs to consider other performance merits, such as power and area, during the optimization process.
Accordingly, the stochastic multiobjective optimization problem in this chapter can be formulated as follows:

Maximize   Y(p),
Minimize   p_c(p),
Subject to Y(p) \ge \bar{Y},
           p_c(p) \le \bar{p}_c,
           F(p) \le F_{max},
           p_{min} \le p_0 \le p_{max}.   (16.3)
Here, $Y(p)$ is the parametric yield associated with the parameter vector p, and $p_c(p)$ is the power consumption. $F(p)$ denotes other performance metrics (such as the area A), which define the feasible design space. Moreover, $\bar{Y}$ and $\bar{p}_c$ are the minimum acceptable yield rate and the maximum acceptable power consumption (or targeted values), respectively. In other words, the multiobjective optimization procedure maximizes $Y(p)$, which should be larger than $\bar{Y}$, and simultaneously minimizes $p_c(p)$, which should be smaller than $\bar{p}_c$. Meanwhile, the other constraints defined by $F(p)$ should be satisfied.

Moreover, p is a vector of process parameters with variations and can be expressed as $p = p_0 + \delta p$, where $p_0$ is a vector of the nominal values assigned at the design stage, and $\delta p$ consists of parameter variations with zero-mean Gaussian distributions. In addition, all nominal values of the process parameters $p_0$ are assumed to lie within the feasible parameter space $(p_{min}, p_{max})$ and can be tuned for a better yield rate.

One effective solution to this optimization is the gradient-based approach, which requires the calculation of sensitivities in the stochastic domain. As discussed later, this chapter develops a stochastic sensitivity analysis, which can be embedded into a sequential linear programming (SLP) framework to solve this optimization problem efficiently.

3 Stochastic Variation Analysis for Yield Analysis

In this section, we show how to apply the OPC technique introduced in Sect. 3.2 of Chap. 2 to analyze and estimate the yield. We first review existing works on mismatch analysis [6, 32, 105, 133]. Here we focus on the stochastic variation, also referred to as local mismatch, and illustrate the stochastic variation analysis using MOS transistors. A similar approach can be extended to other types of transistors by the so-called propagation of variance (POV) method [32, 105].
The mismatch of a MOS transistor is usually modeled by Pelgrom's model [133], which relates the local mismatch variance of an electrical parameter to the geometrical parameters by

\sigma = \frac{A_\beta}{\sqrt{WL}},   (16.4)

where $A_\beta$ is the fitting parameter.
To consider the local mismatch during circuit simulation without running Monte Carlo, SiSMA [6] models the random local mismatch of a MOS transistor by a stochastic noise current source $\Gamma$, placed in parallel with the nominal drain current $I_D$. $\Gamma$ can be expressed as

\Gamma = I_D^{\beta}\, t_m(W, L)\, \lambda(x, y).   (16.5)

Here, $I_D^{\beta}$ is determined by the operating region of the MOS transistor; $t_m(W, L)$ accounts for the geometry of the device active area:

t_m(W, L) = 1 + \frac{A_\beta}{\sqrt{WL}},   (16.6)

and $\lambda(x, y)$ refers to the sources of all the variations that depend on the device position, which can include the spatial correlation [6]. Here, $\lambda(x, y) = 1$ because all parameters are decoupled after the PCA.
Note that the random variable in the stochastic current source can be expanded by the spectral stochastic method [187, 196]. For example, let us use the channel length $L$ of one MOS transistor as the variation source. Assuming the variation of $L$ is small, one can expand $t_m(W, L)$ around the nominal values $W_{(0)}$ and $L_{(0)}$ with a Taylor expansion:

t_m(W, L) = 1 + \frac{A_\beta}{\sqrt{WL}}
          \approx 1 + \frac{A_\beta}{\sqrt{W_{(0)}}} \left[ \frac{1}{\sqrt{L_{(0)}}} - \frac{1}{2\sqrt{L_{(0)}^{3}}} \left( L - L_{(0)} \right) \right]
          = 1 + \frac{A_\beta}{\sqrt{W_{(0)}}} \left[ \frac{1}{\sqrt{L_{(0)}}} - \frac{\xi}{2\sqrt{L_{(0)}^{3}}} \right].   (16.7)
Here, $\xi$ is the random variable for the variation of the channel length $L$. One can describe $\xi$ by OPC. Based on the Askey scheme [196], a Gaussian distribution of $\xi$ can be expanded using Hermite polynomials $\Phi_i$ ($i = 0, \ldots, n$) as

\xi = \sum_{i=0}^{n} g_i' \Phi_i,   (16.8)

where $g_i'$ is the OPC expansion coefficient.
As such, one can summarize the expression of the stochastic current source $\Gamma$ as

\Gamma = I_D^{\beta} \left[ 1 + \frac{A_\beta}{\sqrt{W_{(0)}}} \left( \frac{1}{\sqrt{L_{(0)}}} - \frac{1}{2\sqrt{L_{(0)}^{3}}} \sum_{i=1}^{n} g_i' \Phi_i \right) \right] = \sum_{i=0}^{n} g_i \Phi_i,   (16.9)

where the $g_i$ are the new expanded coefficients, now carrying the geometry dependence.
Knowing the expression of $\Gamma$ for one parameter variation source, multiple process parameters $p_i$ ($i = 1, \ldots, m$) can be considered by a vector of stochastic current sources $\Gamma(t)$.
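For illustration, a minimal Python sketch of the resulting first-order coefficients is given below. It assumes a zero-mean Gaussian channel-length variation with standard deviation sigma_L, so that the Hermite expansion (16.8) reduces to xi = sigma_L * Phi_1; substituting into (16.7) and (16.9) gives the coefficients g0 and g1. The numerical values in the example are hypothetical.

import math

def mismatch_pc_coeffs(Id_beta, A_beta, W0, L0, sigma_L):
    """First-order Hermite PC coefficients (g0, g1) of the mismatch source (16.9).

    Id_beta and A_beta follow the notation I_D^beta and A_beta of the text;
    W0 and L0 are the nominal width and length, sigma_L the std of L.
    """
    g0 = Id_beta * (1.0 + A_beta / math.sqrt(W0 * L0))                  # mean term
    g1 = -Id_beta * A_beta * sigma_L / (2.0 * math.sqrt(W0) * L0 ** 1.5)
    return g0, g1

# Purely illustrative numbers
print(mismatch_pc_coeffs(Id_beta=1e-4, A_beta=2e-8, W0=1e-6, L0=1e-7, sigma_L=5e-9))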
On the other hand, any integrated circuit is composed of passive and active devices described by a number of terminal-branch equations. According to KCL, one can obtain a differential algebraic equation (DAE) as below:

\frac{d}{dt} q(x(t)) + f(x(t), t) + B \cdot u(t) = 0.   (16.10)

Here, $x(t)$ is the vector of state variables consisting of node voltages and branch currents, $q(x(t), t)$ contains active components such as charges and fluxes, $f(x(t), t)$ describes passive components, and $u(t)$ denotes the input sources. $B$ describes how the sources connect into the circuit, which is determined by the circuit topology.
Similar to [6], one can add $\Gamma(t)$, representing the mismatch, to the right-hand side of the DAE:

\frac{dq(x(t))}{dt} + f(x(t)) + B \cdot u(t) = T \cdot \Gamma(t),   (16.11)

which describes the circuit and system under stochastic variations. Note that $T$ is the topology matrix describing how $\Gamma(t)$ connects into the circuit, and one has

T \cdot \Gamma(t) = \sum_{i=1}^{m} T_{p_i} \Gamma_{p_i}   (16.12)

for multiple parameters. Here, $\Gamma_{p_i}$ is the mismatch current source for the $i$-th parameter variation, which can be expanded using OPC as shown in (16.9).

3.1 Algorithm Overview

In summary, we outline the overall algorithm flow as in Algorithm (1). From this
flow, we observe that the optimization procedure involves several optimization
iterations. Each of the iterations contains three major steps: stochastic yield
estimation, stochastic sensitivity analysis, and stochastic yield optimization. The
last is achieved by tuning nominal parameters along the obtained gradient directions.
Notice that we take all design parameters as random variables; fixed parameters that
cannot be tuned can be removed from this procedure by parameter screening.

3.2 Stochastic Yield Estimation and Optimization

In this section, we discuss how to estimate the parametric yield and further optimize it by tuning the parameters automatically. As such, we first show how to estimate the parametric yield from the stochastic variation (mismatch) statistics ($\mu_{f_m;t}$, $\sigma_{f_m;t}$) obtained from the above NMC mismatch analysis.

3.3 Fast Yield Calculation

First, we construct the performance distribution at one time step $t_k$ from ($\mu_{f_m}(t_k)$, $\sigma_{f_m}(t_k)$), shown as the solid curve from $\mu - 3\sigma$ to $\mu + 3\sigma$ in Fig. 16.3. Then, the performance constraint is given as

h(p; t_k) = f_m(p; t_k) - f_m^{threshold} \le 0.   (16.13)

With this constraint, the boundary separating the success region from the failure region can be plotted as the straight line $h(p; t_k) = 0$ in the figure.

As a result, the performance $f_m(t_k)$ located to the left of $h(p; t_k) = 0$ (shown as the shaded region) satisfies the constraint in (16.13) and thus belongs to the
Fig. 16.3 Parametric yield estimation based on orthogonal PC-based stochastic variation analysis
successful region $S_O$. Hence, the parametric yield can be estimated by the area ratio

Y(p) = \frac{S_O}{S_{f_m}}.   (16.14)

When the entire region area is normalized to $S_{f_m} = 1$, $Y(p)$ becomes $S_O$ and is determined by the integration below:

Y(p) = \int_{S_O} \mathrm{pdf}\left(f_m(p; t_k)\right) dS = \int_{S_O} \mathrm{pdf}\left(\mu_{f_m}, \sigma_{f_m}\right) dS,   (16.15)

where $\mathrm{pdf}(f_m)$ is the probability distribution function (PDF) of the performance merit of interest, characterized by $\mu_{f_m}$ and $\sigma_{f_m}$ at the time step $t_k$.
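For illustration, a minimal Python sketch of this yield evaluation is given below, under the assumption that the performance at t_k is Gaussian with the mean and standard deviation provided by the first-order OPC expansion; the success region f_m <= f_m^threshold then reduces (16.15) to a Gaussian CDF evaluated at the constraint boundary. The example values are hypothetical.

import math

def parametric_yield(mu_fm, sigma_fm, fm_threshold):
    """Yield of (16.15) for a Gaussian performance metric under the constraint (16.13)."""
    z = (fm_threshold - mu_fm) / sigma_fm
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(parametric_yield(mu_fm=0.893, sigma_fm=0.0005, fm_threshold=0.8935))   # ~0.84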

3.4 Stochastic Sensitivity Analysis

In order to enhance the yield rate, most optimization engines need sensitivity information to identify and further tune the critical parameters. However, with the emerging process variations beyond 90 nm, traditional sensitivity analysis becomes inefficient: it either uses the worst-case scenario or conducts MC simulations [88, 100, 153]. Therefore, an efficient NMC-based stochastic sensitivity analysis is needed for this purpose. With all parameter variations calculated from the fast mismatch analysis in Chap. 15, one can further explore the impact, or contribution, of each parameter variation $\sigma_{p_i}$ on the performance variation $\sigma_{f_m}$. This can be utilized in the optimization procedure for better performance merits. In this section, we develop an approach to evaluate the sensitivity of the transient variation (mismatch) with respect to each parameter variation.
We start from the definition of the stochastic sensitivity, which expresses the relationship between the performance metric variation and each parameter variation. From now on, we write $f_m(t) = f_m(\xi_p; t)$ for illustration purposes and assume a random parameter vector $\xi_p$ ($\in \mathbb{R}^m$). As such, the stochastic sensitivity can be defined by

s_{p_i}(t) = \frac{\partial f_m(\xi_p; t)}{\partial p_i}, \quad i = 1, \ldots, m,   (16.16)
where $s_{p_i}(t)$ is the derivative of the performance variation $f_m$ with respect to the $i$-th random parameter variable $p_i$ at one time instant $t$. Depending on the problem or circuit under study, the performance $f_m$ can be an output voltage, a period, or a power, and the parameters can be transistor width, length, and oxide thickness. Such a so-called stochastic sensitivity can also be understood based on the POV relationship [32, 105]:

\sigma_{f_m}^2 = \sum_i \left( \frac{\partial f_m(\xi_p; t)}{\partial p_i} \right)^2 \sigma_{p_i}^2.   (16.17)

Here, $\sigma_{p_i}^2$ is the parameter variance and $\sigma_{f_m}^2$ is the performance variance.
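A one-line Python sketch of this variance propagation is given below; the sensitivity and sigma values in the example are hypothetical.

import numpy as np

def performance_sigma(sens, sigma_p):
    """POV relationship (16.17): sigma_fm = sqrt(sum_i (s_pi * sigma_pi)^2)."""
    sens, sigma_p = np.asarray(sens), np.asarray(sigma_p)
    return float(np.sqrt(np.sum((sens * sigma_p) ** 2)))

print(performance_sigma([2.47e-2, 2.47e-2, 4.81e-3], [1e-7, 1e-7, 1e-7]))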
Note that the performance variation $\sigma_{f_m}$ is mainly determined by $\alpha_1$ [196] in (16.15) at the time step $t_k$, as derived in Sect. 3.3, while $\alpha_2$ has little impact on the performance variation. As such, one can truncate the OPC expansion to first order for the calculation of the mean and variance, and experiments show that the first-order expansion provides adequate accuracy. Therefore, $\alpha_1$ is the dominant moment for $\sigma_{f_m}$, while $\alpha_2$ can be truncated to simplify the calculation. Therefore, we have the following:

\alpha_1(t_k) = c_1 + c_0 T \cdot g(t_k),   (16.18)

where

c_0 = \left( G_{(0)}^k + \frac{1}{h} C_{(0)}^k \right)^{-1}, \qquad c_1 = c_0 \left( \frac{1}{h} C_{(0)}^k \alpha_1(t_k - h) \right).
As such, one can further calculate the stochastic sensitivity $\partial f_m(\xi_p; t) / \partial p_i$ using

s_{p_i}(t_k) = \frac{\partial f_m(\xi_p; t)}{\partial p_i} = \left( c_0 T_{p_i} \right) \cdot \frac{\partial g(t_k)}{\partial p_i},   (16.19)

which can be utilized in any gradient-based optimization to improve the yield rate.

3.5 Multiobjective Optimization

Next, we make use of the sensitivities $s_{p_i}$ to improve the parametric yield. Meanwhile, since power is also a primary design concern, we treat power consumption reduction as an extra objective and solve the multiobjective optimization problem defined in Sect. 2. Note that other performance merits can be treated as optimization objectives in a similar way. By tuning the nominal process parameters along the gradient directions, we enable more parameters containing process variations to satisfy the performance constraints, which is an important feature for a robust design. In this section, we address this requirement with a sequential linear programming (SLP) approach.
At the beginning of each optimization iteration, the nonlinear objective functions $Y(p)$ and $p_c(p)$ can be approximated by linearization:

Y(p) = Y(p_{(0)}) + \nabla_p Y(p_{(0)})^T \left( p - p_{(0)} \right),
p_c(p) = p_c(p_{(0)}) + \nabla_p p_c(p_{(0)})^T \left( p - p_{(0)} \right),   (16.20)

where $p_{(0)}$ represents the nominal design parameters while p contains the process variations of these parameters. Note that (16.20) is a first-order Taylor expansion of the parametric yield $Y(p)$ defined in (16.15) and of the power consumption $p_c(p)$ around the nominal parameter point $p_{(0)}$. Thus, $\nabla_p Y(p_{(0)})$ is a vector consisting of $\partial Y(\xi_p) / \partial p_i$, and the same holds for the power consumption gradient $\nabla_p p_c(p_{(0)})$. Therefore, the nonlinear objective functions can be transformed into a series of linear optimization subproblems. The optimization terminates when the convergence criterion is achieved.
As such, the stochastic multiobjective yield optimization problem in Sect. 2 can be reformulated as

Maximize   Y(p) = Y(p_{(0)}) + \nabla_p Y(p_{(0)})^T \left( p - p_{(0)} \right),
Minimize   p_c(p) = p_c(p_{(0)}) + \nabla_p p_c(p_{(0)})^T \left( p - p_{(0)} \right),
Subject to Y(p) \ge \bar{Y},
           p_c(p) \le \bar{p}_c,
           F(p) \le F_{max},
           p_{min} \le p \le p_{max},

where $\delta p = p - p_0$ is the step size. Within each iteration, the sensitivity vectors $\nabla_p Y(p_{(0)})$ and $\nabla_p p_c(p_{(0)})$, as well as $\delta p$, should be updated.
However, the stochastic sensitivity analysis in Sect. 3.4 can only calculate $\partial F(\xi_p; t) / \partial p_i$ rather than $\partial Y(\xi_p) / \partial p_i$. To obtain $\partial Y(\xi_p) / \partial p_i$, we start from (16.15) with the following derivation:
Fig. 16.4 Stochastic yield optimization
\frac{\partial Y(\xi_p)}{\partial p_i} = \int_{S_O} \frac{\partial\, \mathrm{pdf}(F(\xi_p; t))}{\partial p_i} dS = \int_{S_O} \frac{\partial\, \mathrm{pdf}(F)}{\partial F} \cdot \frac{\partial F(\xi_p; t)}{\partial p_i} dS.   (16.21)

As a result, $\partial Y(\xi_p) / \partial p_i$ can be obtained from $\partial F(\xi_p; t) / \partial p_i$ calculated by the stochastic sensitivity analysis. Note that the PDF of the performance variation and the integration region $S_O$ are both given by the yield estimation in (16.15).
We illustrate the presented optimization procedure for the yield objective function $Y(p)$ through Fig. 16.4. With the parametric yield estimated using the NMC mismatch analysis, the distribution of the performance $f_m$ for the nominal parameters $p_0$ can be plotted as a solid curve, which has a mean value $\mu_{f_m}(p_0)$. With the performance constraint $h(p; t) \le 0$ in (16.1), the shaded area located to the left of the constraint line is the desired successful region. A yield optimization procedure needs to move the performance distribution to the left so that the shaded area is maximized. Therefore, the problem here is how to change the process parameters p in order to move the performance distribution for an enhanced yield rate.
Moreover, the power consumption can be estimated by

p_c(p) = \left| V_{dd} \cdot \bar{i}_{V_{dd}} \right|,   (16.22)

where $V_{dd}$ is the power supply voltage source and $\bar{i}_{V_{dd}}$ is the average value of the current through this voltage source. The power consumption optimization can be explained as shown in Fig. 16.5. The initial design generates the current $i_{V_{dd}}$ denoted as the black curve and leads to a high power consumption $p_c$.
Fig. 16.5 Power consumption optimization
According to (16.22), $p_c$ can be reduced by lowering the average magnitude of $i_{V_{dd}}$. To do so, we move the minimum point on the current trajectory closer to zero and obtain the optimal design with the minimum $p_c$, shown as the red curve in Fig. 16.5. As such, the power optimization requires us to change p in order to move the minimum point of $i_{V_{dd}}$ closer to zero for a smaller power consumption. To solve this problem, the parametric yield rate $Y(p_0)$ is first calculated from (16.15) and the performance distribution is constructed accordingly, similar to the one in Fig. 16.4. Then, the targeted yield rate $\bar{Y}$ is compared with $Y(p_0)$ by

\Delta Y(p_0) = \bar{Y} - Y(p_0).   (16.23)

Next, the NMC stochastic sensitivity analysis is performed to find $\partial F(\xi_p; t) / \partial p_i$, and thus $\partial Y(\xi_p) / \partial p_i$ in (16.21). As a result, with the first-order Taylor expansion in the SLP (16.20), one can determine the parameter increment $\delta p_{yield} = p - p_{(0)}$ needed to reach $Y(p) = \bar{Y}$ by

\delta p_{yield} = \frac{\bar{Y} - Y(p_{(0)})}{\nabla_p Y(p_{(0)})} = \frac{\Delta Y(p_{(0)})}{\nabla_p Y(p_{(0)})}.   (16.24)
On the other hand, we perform the same procedure to optimize the power consumption. As in (16.19), we calculate the sensitivity of the power consumption with respect to the process parameters at the point where $i_{V_{dd}}$ reaches its minimum value:

\frac{\partial p_c(p)}{\partial p_i} = - V_{dd} \cdot \left. \frac{\partial i_{V_{dd}}}{\partial p_i} \right|_{i_{V_{dd}} = \mathrm{minimum}}.   (16.25)
The corresponding parameter increment can be computed as

\delta p_{power} = \frac{\bar{p}_c - p_c(p_{(0)})}{\nabla_p p_c(p_{(0)})} = \frac{\Delta p_c(p_{(0)})}{\nabla_p p_c(p_{(0)})}.   (16.26)
In this way, the total change to the process parameters is the weighted summation below:

\delta p_{total} = \eta_1 \cdot \delta p_{yield} + \eta_2 \cdot \delta p_{power}, \qquad \eta_1, \eta_2 \in [0, 1),   (16.27)

where $\eta_1$ and $\eta_2$ are the weights for the yield and the power consumption, respectively. $\eta_1$ and $\eta_2$ can be updated dynamically, and the weight should be larger for the performance merit that is farther from its target value.
Therefore, one can update p with the new parameters $p_0 + \delta p_{total}$. Moreover, the NMC mismatch analysis is conducted to update the performance distribution, which is denoted by the dashed curve shown in Fig. 16.4. With the updated parameters and performance distribution, all performance constraints $F(p) \le F_{max}$ are checked for violations. If they are still satisfied, p becomes the new design point, and this procedure is repeated to further enhance the yield rate.
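For illustration, a minimal Python sketch of one such update iteration is given below. Mapping the scalar target gaps of (16.24) and (16.26) to a parameter step requires a convention when p is a vector; the sketch uses a least-norm gradient step, which is one possible reading, whereas the text's SLP solves a linear subproblem at each iteration. All numerical values are hypothetical.

import numpy as np

def multiobjective_update(p0, Y0, gradY, Y_target, pc0, grad_pc, pc_target,
                          eta1=0.6, eta2=0.4, p_min=None, p_max=None):
    """One weighted update of the nominal parameters, in the spirit of (16.24)-(16.27)."""
    gradY = np.asarray(gradY, dtype=float)
    grad_pc = np.asarray(grad_pc, dtype=float)
    # Gradient-direction steps sized to close the gap to each target
    dp_yield = (Y_target - Y0) * gradY / max(gradY @ gradY, 1e-30)          # cf. (16.24)
    dp_power = (pc_target - pc0) * grad_pc / max(grad_pc @ grad_pc, 1e-30)  # cf. (16.26)
    p_new = np.asarray(p0, dtype=float) + eta1 * dp_yield + eta2 * dp_power  # (16.27)
    if p_min is not None and p_max is not None:
        p_new = np.clip(p_new, p_min, p_max)        # keep p inside the feasible box
    return p_new

# Hypothetical example with two tunable widths
print(multiobjective_update(p0=[1e-5, 3e-5], Y0=0.49, gradY=[3.0e4, 2.0e4], Y_target=0.95,
                            pc0=1.03e-5, grad_pc=[0.4, 0.6], pc_target=1.0e-5))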

4 Numerical Examples

The presented NMC algorithms for mismatch analysis, yield estimation, and optimization have been implemented in a Matlab-based circuit simulator. All experiments are performed on a Linux server with a 2.4 GHz Xeon processor and 4 GB memory. In the experiments, we take the widths of the MOSFETs as the variational process parameters. The initial results of this chapter were published in [52].

Note that the presented approach only considers design parameters such as the channel width W, because the distribution of design parameters under process variations can be shifted by tuning their nominal values. As such, more design parameters with process variations can satisfy the performance constraints and the total yield rate can be enhanced, which is also needed for a robust design. Therefore, parameters that are not tunable, such as the channel length L, are not considered in the presented approach.
We first use an operational amplifier (OPAM) to validate the accuracy and
efficiency of the NMC mismatch analysis by comparing it with the MC simulations.
Then, a Schmitt trigger is used to verify the presented parametric yield estimation
and stochastic yield analysis. Next, we demonstrate the validity and efficiency of
the presented yield optimization flow using a six-transistor SRAM cell.
Fig. 16.6 Schematic of operational amplifier
4.1 NMC Mismatch for Yield Analysis

The OPAM is shown in Fig. 16.6 and consists of eight MOS transistors. Their widths are treated as stochastic variational parameters with Gaussian distributions and a 10% random perturbation from their nominal values. Moreover, we consider the matching design requirements for the input pair devices, such as equal nominal widths ($W_{p1} = W_{p2}$, $W_{n3} = W_{n4}$, $W_{p5} = W_{p8}$) and a fixed width ratio ($W_{n6} = k W_{n3}$).

We first introduce the width variations to all MOS transistors and perform 1,000 MC simulations with a high confidence level to find the variational trajectories at the output node. Then, we apply the developed NMC mismatch analysis to the OPAM and locate the boundaries ($\mu - 3\sigma$, $\mu + 3\sigma$) of the variational trajectories with a one-time run of transient circuit simulation. The results are shown in Fig. 16.7, where the blue lines denote the MC simulations and the two black lines are the results from the presented mismatch analysis. We observe that our approach captures the transient stochastic variation (mismatch) as accurately as the MC result.

We further compare the accuracy and efficiency of the NMC mismatch analysis and the MC method in Table 16.1. From this table, we can see that the NMC mismatch analysis not only achieves 2% accuracy but also gains a 680 times speedup over the MC method.

4.2 Stochastic Yield Estimation

We further consider the Schmitt trigger shown in Fig. 16.8 to demonstrate the
stochastic yield estimation. Similarly, we assume the widths of all MOSFETs
Fig. 16.7 NMC mismatch analysis vs. Monte Carlo for operational amplifier case

Table 16.1 Comparison of accuracy and runtime (operational amplifier example)

Metric                | Proposed | Monte Carlo
Runtime (seconds)     | 1.33     | 905.06
Mean value (μ) (volt) | 0.35493  | 0.34724
Std. value (σ) (volt) | 0.57032  | 0.56272
to have 10% variations from their nominal values and to follow Gaussian distributions. Moreover, we consider the lower switching threshold $V_{TL}$ to be the performance metric for the parametric yield, which can change due to MOSFET width variations. Thus, the performance constraint for the parametric yield is the following: when the input is at $V_{TL} = 1.8$ V and the output is initially set to $V_{dd} = 5$ V, the output $V_{OUH}$ should be greater than 4.2 V.

First, we perform 1,000 MC simulations and compare them with the NMC stochastic variation analysis, as shown in Fig. 16.9a. Then, the output distribution from the MC simulation at the time step where the input equals 1.8 V is plotted in Fig. 16.9b. The PDF estimated by the NMC mismatch analysis is compared with the MC simulations in the same figure. We can observe that the two distributions coincide very well.

The yield rate can then be calculated efficiently with the PDF estimated from the NMC mismatch analysis. We list the mean ($\mu$), standard deviation ($\sigma$), and yield estimation results from the presented approach and those from the MC simulations in Table 16.2.
Fig. 16.8 Schematic of Schmitt trigger

Table 16.2 Comparison of accuracy and runtime (Schmitt trigger example)

Metric                | Proposed | Monte Carlo
Runtime (seconds)     | 1.06     | 801.84
Mean value (μ) (volt) | 4.2043   | 4.1993
Std. value (σ) (volt) | 0.10487  | 0.094346
Yield rate            | 0.48357  | 0.47059
With the accurate estimation of the output distribution, the presented method can calculate the yield rate with 2.7% accuracy and a 756 times speedup when compared to the MC method.

More importantly, the NMC mismatch analysis has linear scalability because all process variation sources can be modeled as additive mismatch current sources and introduced into the right-hand side of the DAE system in (16.11).

4.3 Stochastic Sensitivity Analysis

Furthermore, we apply the presented stochastic sensitivity analysis to the Schmitt trigger example to find the contribution of each variation source to the output variation. Note that we are interested in the lower switching threshold $V_{TL}$, where the input increases from zero and the output decreases from $V_{dd}$. The sensitivities of the output voltage variation with respect to all MOSFET width variations $\sigma_{p_i}$ at the time step where the input equals 1.8 V are shown in Table 16.3. From this table, we can observe that the widths of the Mp1, Mp2, and Mn3 transistors are more critical than those of the other MOSFETs.
Fig. 16.9 Comparison of Schmitt trigger example: (a) NMC mismatch analysis vs. MC; (b) output distributions from NMC mismatch analysis and Monte Carlo

Table 16.3 Sensitivity of output with respect to each MOSFET width variation $\sigma_{p_i}$

Parameter   | Mn1 width | Mn2 width | Mn3 width
Sensitivity | 2.4083e-4 | 2.4083e-4 | 4.8069e-3
Parameter   | Mp1 width | Mp2 width | Mp3 width
Sensitivity | 2.4692e-2 | 2.4692e-2 | 0

Fig. 16.10 Schematic of SRAM 6-T cell
4.4 Stochastic Yield Optimization

To demonstrate the yield optimization using stochastic sensitivity analysis, we use a typical 6-T SRAM cell design, shown in Fig. 16.10. In this example, the performance merit is the access time of the SRAM, which is determined by the voltage difference between BL_B and BL. Initially, both BL_B and BL are precharged to $V_{dd}$, while Q_B stores zero and Q stores one. When reading the SRAM cell, BL_B starts to discharge from $V_{dd}$ and produces a voltage difference $\Delta V$ between it and BL. The time it takes BL_B to produce a large enough voltage difference $\Delta V_{th}$ is called the access time. If the access time is larger than the threshold at the time step $t_{threshold}$, an access time failure occurs. In the experiment, we assume $t_{threshold} = 0.04$ ns and $\Delta V_{th} = 0.1338$ V.
Similarly, all channel widths of the MOSFETs are considered as variational parameters that follow Gaussian distributions with a 12% perturbation from their nominal values. As such, when the access time differs from its nominal value due to variations in channel width, access time failures occur, and thus yield loss may happen. In order to enhance the yield, we first perform the NMC mismatch analysis to find the voltage distribution of BL_B at $t_{threshold}$, which is shown in Fig. 16.11. Also, as a baseline for comparison, we run 1,000 MC simulations to plot the variational transient waveforms of BL_B, which are shown in Fig. 16.12. This validates the accuracy of the NMC mismatch analysis.

Then, the sensitivity analysis developed in this chapter is used to find $\partial v_{BL\_B} / \partial p_i$ and $\partial \mathrm{power} / \partial p_i$, where $p_i$ is the width variation of the $i$-th MOS transistor and power denotes the variation of the power drawn from the supply voltage source. The results are shown in Table 16.4. From this table, we can see that only Mn1, Mn2, and Mp6 influence the access time and power variations; their nominal values can therefore be tuned to reduce access time failures for a better parametric yield rate and to lower the power consumption simultaneously.
Fig. 16.11 Voltage distribution at BL_B node

Fig. 16.12 NMC mismatch analysis vs. MC

Table 16.4 Sensitivity of $v_{BL\_B}$ and power with respect to each MOSFET width variation $\sigma_{p_i}$

Parameter            | Mn1 width | Mn2 width | Mp6 width
Sensitivity (v_BL_B) | 1.3922e-3 | 2.0787e-3 | 7.0941e-2
Sensitivity (power)  | 3.7888e-4 | 5.7816e-4 | 5.8871e-4
As a result, we apply the developed multiobjective yield optimization to improve the yield. For comparison purposes, two additional algorithms have been implemented:
1. Baseline, the generic gravity-directed method in [167], which moves the nominal parameters to the gravity center of the successful region
2. The single-objective optimization, which only improves the yield
Table 16.5 Comparison of different yield optimization algorithms for SRAM cell

Parameter         | First cut | Baseline  | Single objective | Multiobjective
Mn1 width (m)     | 1e-5      | 2.872e-5  | 2.7841e-5        | 3.577e-5
Mn2 width (m)     | 1e-5      | 2.3282e-5 | 2.2537e-5        | 2.7341e-5
Mp6 width (m)     | 3e-5      | 1.5308e-5 | 1.6296e-5        | 9.7585e-6
Power (W)         | 1.0262e-5 | 3.0852e-5 | 1.2434e-5        | 1.0988e-5
Area (m^2)        | 2.4e-11   | 2.81e-11  | 2.8e-11          | 2.88e-11
Yield (%)         | 49.32     | 94.23     | 95.49            | 95.31
Runtime (seconds) | 2.42      | 32.384    | 27.226           | 15.21
Iterations        | 1         | 12        | 10               | 6
The results from all optimization methods are shown in Table 16.5. From this table, it can be observed that all methods can improve the parametric yield to around or even more than 95% compared with the initial design; the corresponding nominal values can then be used as better initial design parameters. Meanwhile, the area remains smaller than the maximum acceptable area criterion $A \le 1.2 A_{initial}$.

However, the optimal designs from the baseline (gravity-directed) method and from the single-objective optimization require 2.75 times and 21% more power consumption than the initial design, respectively, whereas the presented method leads to an optimal design with only 7% more power. Therefore, the presented multiobjective optimization not only improves the yield rate but also suppresses the power penalty simultaneously. Moreover, the presented optimization procedure needs only six iterations to achieve the shown results, within 15.21 s. Notice that the parametric yield $Y(p)$ can be further improved with a higher target yield $\bar{Y}$ and more optimization iterations.

5 Summary

In this chapter, we have presented a fast NMC method to calculate the mismatch in the time domain with consideration of local random process variations. We model the mismatch by a stochastic current source expanded by OPC. This leads to an efficient solution for the mismatch, and further for the parametric yield rate, without using expensive MC simulations. In addition, we are the first to derive the stochastic sensitivity of the yield within the context of OPC. This leads to a multiobjective optimization method that improves the yield rate and other performance merits simultaneously. Numerical examples demonstrate that the presented NMC approach can achieve 2% accuracy with up to a 700 times speedup when compared to Monte Carlo simulations. Moreover, the presented multiobjective optimization can improve the yield rate up to 95.3% with other performance merits optimized at the same time. The presented approach assumes that the distribution type of the process variations is known in advance.
Chapter 17
Voltage Binning Technique for Yield Optimization

1 Introduction

Process-induced variability has huge impacts on circuit performance and yield in nanometer VLSI technologies [71]. Indeed, the characteristics of devices and interconnects are prone to increasing process variability as device geometries approach the size of atoms. The yield loss from process fluctuations is expected to increase as the transistor size scales down. As a result, improving yield in the presence of process variations is critical to mitigate the huge impact of process uncertainties.
Supply voltage adjustment can be used as a technique to reduce yield loss, which
is based on the fact that both chip performance and power consumption depend
on supply voltage. By increasing supply voltage, chip performance improves. Both
dynamic power and leakage power, however, will become worse at the same
time [182]. In contrast, lower supply voltage will reduce the power consumption but
make the chip slower. In other words, faster chips usually have higher power con-
sumption and slower chips often come with lower power consumption. Therefore,
it is possible to reduce yield loss by adjusting supply voltage to make some failing
chips satisfy application constraints.
For yield enhancement, there are different schemes for supply voltage adjustment. In [182], the authors proposed an adaptive supply voltage method for reducing the impact of parameter variations by assigning an individual supply voltage to each manufactured chip. This methodology can be very effective, but it requires significant effort in chip design and testing at many different supply voltages. Recently, a new voltage binning technique has been proposed in the patent [85] for yield optimization as an alternative to adaptive supply voltage. All manufactured chips are divided into several bins, and a certain supply voltage value is assigned to each bin to make sure all chips in that bin can work under the corresponding supply voltage. At the cost of a small yield loss, this technique is much more practical than the adaptive supply voltage. But only a general idea is given in [85], without details of selecting the optimal supply voltage levels. Another recent work [213] provides a statistical technique for computing the yield of different voltage binning schemes. From the results of statistical timing and variational power analysis, the authors developed a combination of analytical and numerical techniques to compute joint PDFs of chip yield as a function of the inter-die variation in effective gate length L, and solved the problem of computing optimal supply voltages for a given binning scheme.

However, the method in [213] only works under several assumptions and approximations that cause accuracy loss in both the yield analysis and the resulting voltage binning scheme. The statistical model for both timing and power analysis used in [213] is simplified by lumping all process variations other than the inter-die variation in L into one Gaussian random variable. Indeed, the intra-die variations have a huge impact on performance and power consumption [3, 158], and the other process variations (gate oxide thickness, threshold voltage, etc.) have different distributions and should not be simplified to a single Gaussian distribution. Furthermore, this technique cannot predict the number of voltage bins needed under a certain yield requirement before solving the voltage binning problem.
In general, voltage binning for yield improvement is an emerging technique, but with many unsolved issues. In this chapter, we present a new voltage binning scheme to optimize yield. The presented method first computes the set of working supply voltage segments under timing and power constraints from either the measurement of real chips or MC-based SPICE simulations of a chip with process variations. Then, on top of the distribution of voltage segment lengths, we propose a formula to predict the upper bound on the number of bins needed under the uniform binning scheme for a given yield requirement. Furthermore, we frame the voltage binning scheme as a set-cover problem in graph theory and solve it by a greedy algorithm in an incremental way. The presented method is not limited by the number or types of process variability involved, as it can be based on actual measured results. Furthermore, the presented algorithm can be easily extended to handle a range of working supply voltages for dynamic voltage scaling under different operation modes (such as low-power and high-performance modes).

Numerical examples on a number of benchmarks in a 45 nm technology show that the presented method can correctly predict the upper bound on the number of bins required. The optimal binning scheme can lead to significant savings in the number of bins compared to the uniform one to achieve the same yield, with very small CPU cost.

2 Problem Formulation

2.1 Yield Estimation

A “good” chip needs to satisfy two requirements:


(1) Timing slack is positive S > 0 under working frequency.
(2) Power does not exceed the limit P < Plim .
For a single supply voltage, the parametric chip yield is defined as the percentage of manufactured chips satisfying these constraints. Specifically, we compute the yield for a given voltage level by direct integration in the space of process parameters:

Y = \int \cdots \int_{S > 0,\, P < P_{lim}} f(X_1, \ldots, X_n)\, dX_1 \ldots dX_n,   (17.1)

where $f(X_1, X_2, \ldots, X_n)$ is the joint PDF of $X_1$ to $X_n$, which represent the process variations. Also, there exists spatial correlation in the intra-die part of the variation. The existing approach in [213] ignores the intra-die variation in the process parameters, which means that only one random variable for the inter-die variation is considered, and all other variations except the inter-die variation in $L_{eff}$ are lumped into one Gaussian random variable. In this way, the multi-dimensional integral in (17.1) can be evaluated numerically as a two- or three-dimensional integral. However, the spatial correlation can have significant impacts on both the statistical timing and the statistical power of a circuit [12, 158], and thus on the yield analysis as well.
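For illustration, a minimal Python sketch of a Monte Carlo evaluation of (17.1) is given below. The chip model used in the example is a hypothetical stand-in for SPICE simulation or silicon measurement; any number of (spatially correlated) process parameters can be drawn inside sample_chip.

import numpy as np

def mc_yield(sample_chip, n_samples, p_lim, rng=None):
    """Estimate the yield integral (17.1) as the fraction of sampled chips
    with positive slack S > 0 and power P < p_lim."""
    rng = rng or np.random.default_rng(0)
    good = 0
    for _ in range(n_samples):
        S, P = sample_chip(rng)
        if S > 0.0 and P < p_lim:
            good += 1
    return good / n_samples

def toy_chip(rng):
    # Hypothetical chip: slack and power driven by inter- and intra-die parts
    dL_inter = rng.normal(0.0, 1.0)
    dL_intra = rng.normal(0.0, 0.5)
    slack = 0.05 - 0.02 * (dL_inter + dL_intra)          # ns
    power = 300.0 + 40.0 * (dL_inter - 0.3 * dL_intra)   # uW
    return slack, power

print(mc_yield(toy_chip, n_samples=20000, p_lim=360.0))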

2.2 Voltage Binning Problem

We first define a voltage binning scheme as in [213].

Definition 17.1. A voltage binning scheme is a set of supply voltage levels $V = \{V_1, V_2, \ldots, V_k\}$; a set of corresponding bins $U = \{U_1, U_2, \ldots, U_k\}$, which is also a partitioning of all chips; and a binning algorithm B, which distributes the manufactured chips among the bins.

The binning algorithm B assigns chips to bins so that any chip in bin $U_i$ meets both the performance and power constraints at the supply voltage level $V_i$ corresponding to $U_i$. The yield loss is constituted by the chips that fail to be assigned to any bin in $U$.
The definition of a voltage binning scheme depends on two factors: the bin voltage levels $V$ and the binning algorithm B. Different binning algorithms will result in different yields even for the same bin voltage levels $V$. In the optimization process, however, the focus is on binning algorithms that produce the maximum possible yield. That is to say, under an optimal binning algorithm, there exists at least one voltage bin for any "good" chip (a chip that satisfies the performance and power constraints). In this way, the yield under the bin voltage levels $V$ reaches its maximum value.
Therefore, the problem of computing the optimal voltage binning scheme can be formulated as follows:

\max_{V} Y, \quad \mathrm{s.t.} \quad V_{min} \le V_i \le V_{max} \;\; \forall V_i \in V,   (17.2)

where $Y$ is the total yield under the optimal voltage binning scheme with supply voltage levels $V = \{V_1, V_2, \ldots, V_k\}$.

We would like to mention one special type of voltage binning in which we have an infinite number of voltage bins with all possible voltage levels. This binning scheme allows the supply voltage to be individually tailored for each chip to meet the timing and power constraints. It is obvious that the yield in this case is the maximum possible yield, denoted $Y_{max}$, which is an upper bound on the yield of any other voltage binning scheme. As a result, for the optimal solution, $k_{opt}$ should be the minimum number of bins that makes $Y_{k,opt} = Y_{max}$.

3 The Presented Voltage Binning Method

In this section, we present a new voltage binning scheme, which not only gives a good solution for a given set of voltage levels but also computes the minimum number of bins required. Figure 17.1 presents the overall flow of the presented method and highlights the major computing steps. Basically, steps 1 and 2 compute the valid voltage segment for each chip. Step 3 determines the voltage levels and the chip assignments to the resulting bins. This is done by a greedy-based set-covering method. In Fig. 17.1, $S_{left}$ denotes the set of uncovered voltage segments left in the complete set of valid voltage segments $S_{val}$. $V_i$ is the $i$-th supply voltage level, and chips assigned to $U_i$ can meet both the power and timing constraints at supply voltage $V_i$.
The algorithm in step 3 tries to find the voltage level one at a time such that it
can cover as many chips as possible in a greedy fashion (a chip is covered if its valid

Fig. 17.1 The algorithm sketch of the presented new voltage binning method
3 The Presented Voltage Binning Method 277

(Figure 17.2: two panels, mean delay (ns) versus supply voltage (V) and mean power (μW) versus supply voltage (V).)

Fig. 17.2 The delay and power change with supply voltage for C432

Vdd segment contains the given voltage level). The algorithm stops when all chips are covered, and the number of levels found so far (k_opt) is the minimum number of bins that can reach the maximum possible yield Y_max. The presented algorithm also provides a formulation to predict an upper bound on the number of bins required under the uniform binning scheme from the distribution of the valid Vdd segment length, which can serve as a guideline for the number of bins required.

3.1 Voltage Binning Considering Valid Segment

For a chip, the working supply voltage range (segment) [V_low, V_high] can be regarded as a knob for trading off the power and timing of the circuit. As is well known, the supply voltage affects power consumption and timing performance in opposite ways. Reducing the supply voltage decreases both dynamic and leakage power, which is often considered the most effective technique for low-power design. On the other hand, the propagation delay increases as the supply voltage decreases [186]. Figure 17.2 shows the mean delay and power consumption as functions of supply voltage, which clearly exhibit these trends. As a result, given a power consumption bound and a timing constraint for a chip, V_low is mainly decided by the timing constraint and V_high is mainly determined by the power constraint. Since process variation leads to different timing performances and power consumptions, the valid Vdd segment [V_low, V_high] is different for each chip. Consequently, the measured timing and total power data from a chip can be mapped onto a corresponding working Vdd segment, which is step 1 in Fig. 17.1. For some chips we may have V_low > V_high (an invalid segment), which means these chips fail at any supply voltage; we call them "bad" chips. A small sketch of this mapping is given below.
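
As a small illustration of steps 1 and 2, the mapping from one chip's measured (or simulated) delay and power curves to its valid segment can be sketched as follows. The function name, the toy delay and power models, and the constraint values are hypothetical and only illustrate the idea; they are not the chapter's data or implementation.

```python
# A minimal sketch of steps 1-2 in Fig. 17.1: mapping delay/power vs. Vdd
# data of one chip to its valid voltage segment [V_low, V_high].
# `vdd`, `delay`, `power`, `t_max`, `p_max` are hypothetical inputs.
import numpy as np

def valid_segment(vdd, delay, power, t_max, p_max):
    """Return (V_low, V_high), or None if the chip is 'bad'."""
    vdd = np.asarray(vdd)
    ok = (np.asarray(delay) <= t_max) & (np.asarray(power) <= p_max)
    if not ok.any():
        return None                      # fails at every tested Vdd
    v_ok = vdd[ok]
    return float(v_ok.min()), float(v_ok.max())

# Toy monotone models: delay decreases and power increases with Vdd, so the
# feasible voltages form a contiguous segment bounded by timing (below) and
# power (above).
vdd   = np.linspace(0.8, 1.4, 13)
delay = 0.24 / vdd
power = 180.0 * vdd**2
print(valid_segment(vdd, delay, power, t_max=0.25, p_max=300.0))
```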

Fig. 17.3 Valid voltage segment graph and the voltage binning solution (horizontal segments: valid Vdd ranges of individual chips between V_min and V_max; vertical lines: chosen voltage levels V_1, V_2, V_3)

Suppose there are N sample chips from testing, among which n_bad are bad chips. Obviously, the maximum possible yield achievable by voltage binning alone is

\[
Y_{\max} = (N - n_{bad})/N.
\tag{17.3}
\]

We then define the set of valid segments S_val = {[V_low, V_high]} by removing the bad chips from the sample set and keeping only the valid segments (step 2 in Fig. 17.1). The voltage binning problem in (17.2) can then be framed as a set-cover problem. Take Fig. 17.3, for instance; there are n_val = 13 horizontal segments between V_min and V_max (each corresponds to a valid Vdd segment), and the problem becomes using the minimum number of vertical lines to cover all the horizontal segments. In this case, three voltage levels cover the Vdd segments of all 13 chips. Note that one chip can be covered by more than one voltage level; in that case, it can be assigned to any voltage level contained in its segment. This problem is well known in graph theory and has efficient solutions (a small code sketch of this preprocessing step, building S_val and Y_max, is given after the list below). The valid voltage segment model has several benefits compared with other yield analysis models for voltage binning:
1. The distribution of the valid supply voltage segment length provides information about the number of bins needed for uniform binning under a given yield requirement (e.g., to achieve 99% of Y_max; more details in Sect. 3.2).
2. The model can also be used when the allowed supply voltage for one voltage bin is an interval or a group of discrete values for a voltage scaling mechanism, instead of a single scalar (details in Sect. 3.3).
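
For concreteness, the construction of S_val and of Y_max in (17.3) from the per-chip segments can be sketched as below. The helper name and the toy input are hypothetical; the per-chip segments are assumed to come from a routine like the `valid_segment` sketch above, with None marking a "bad" chip.

```python
# A minimal sketch (hypothetical helper) of building S_val and Y_max.
def build_sval(segments_or_none):
    s_val = [seg for seg in segments_or_none if seg is not None]
    n = len(segments_or_none)
    n_bad = segments_or_none.count(None)
    y_max = (n - n_bad) / n              # Eq. (17.3)
    return s_val, y_max

s_val, y_max = build_sval([(0.95, 1.2), None, (1.0, 1.3)])
print(len(s_val), y_max)                 # 2 valid segments, Y_max = 2/3
```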

3.2 Bin Number Prediction Under Given Yield Requirement

The distribution of the valid Vdd segment length (defined as len = V_high - V_low) can guide yield optimization when there is a lower-bound requirement on yield, and it works for both uniform and optimal binning. Notice that the optimal

(Figure 17.4: histogram of the number of sample chips versus the length of the valid Vdd range (V), with the mean value and the one-σ and two-σ points marked.)

Fig. 17.4 Histogram of the length of valid supply voltage segment len for C432

binning always achieves a yield equal to or better than that of uniform binning. In fact, the experimental results show that the number of bins needed for optimal voltage binning is much smaller than the prediction from the distribution of len. Figure 17.4 shows the histogram of the valid supply voltage segment length len for the test circuit C432, from which it is hard to tell what type of random variable it follows. However, it is straightforward to obtain the numerical probability density function (PDF) and CDF, as well as the mean value and standard deviation, from the measured data of the test samples.
Suppose the yield requirement is Y_req and the allowed supply voltages for testing lie in [V_min, V_max]. For the uniform voltage binning scheme there are k bins, and the set of supply voltage levels is V = {V_1, V_2, ..., V_k}. Since the binning scheme is uniform,

\[
V_i - V_{i-1} = \Delta V = \text{const.}, \qquad i = 2, 3, \ldots, k.
\tag{17.4}
\]

For the uniform voltage binning scheme, we have the following observations:

Observation 1. If there are k bins in [V_min, V_max], then

\[
\Delta V = (V_{\max} - V_{\min})/(k + 1).
\tag{17.5}
\]

Observation 2. For a Vdd segment [V_low, V_high] with length len = V_high - V_low, if len > ΔV, then there exists at least one Vdd level in the set of supply voltage levels V = {V_1, V_2, ..., V_k} that covers [V_low, V_high].

With these observations we have the following result:

Proposition 17.1. For the yield requirement Y_req, an upper bound k_up on the number of voltage bins can be determined by

\[
k_{up} = \frac{V_{\max} - V_{\min}}{F^{-1}(1 - Y_{req})} - 1,
\tag{17.6}
\]

where F^{-1}(·) is the inverse of the CDF F of len.


(17.6) basically says that the upper bound for the numbers of voltage bins in uniform
scheme can be predicted from the yield requirement and the distribution of len.
Proof Sketch for Proposition 17.1. If the chips covered via Observation 2 are to satisfy the yield requirement Y_req, the bin spacing ΔV must obey

\[
1 - F(\Delta V) \ge Y_{req} \quad \text{(Observation 2)}.
\tag{17.7}
\]

For the upper bound k_up on the number of voltage bins, the corresponding spacing follows from Observation 1:

\[
\Delta V = \frac{V_{\max} - V_{\min}}{k_{up} + 1} \quad \text{(Observation 1)}.
\tag{17.8}
\]

Combining (17.7) and (17.8) at equality gives

\[
Y_{req} = 1 - F(\Delta V) = 1 - F\!\left(\frac{V_{\max} - V_{\min}}{k_{up} + 1}\right),
\tag{17.9}
\]

which is an equivalent form of (17.6). Q.E.D.


Notice that the optimal binning always has a better or equal yield compared
to uniform binning using same number of bins. Therefore, if the uniform voltage
binning scheme with k bins already satisfies the yield requirement, k bins must be
enough for the optimal voltage binning scheme. So the histogram for the length of
valid Vdd segment can be used to estimate the upper bound for the number of bins
needed for a certain yield requirement for both uniform and optimal voltage binning
schemes. And this process can be done right after mapping measured power and
timing data to working Vdd segments.
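
The following is a minimal sketch, under assumed inputs, of the k_up prediction in (17.6). Here `np.quantile` plays the role of the empirical inverse CDF F^{-1}(1 - Y_req) of the measured segment lengths, and rounding up to an integer is our own choice; the sample data are synthetic.

```python
# A minimal sketch of predicting the upper bound on the number of uniform
# bins from the empirical distribution of len, following Eq. (17.6).
import math
import numpy as np

def predict_bin_upper_bound(seg_lengths, y_req, v_min=0.8, v_max=1.4):
    """seg_lengths: lengths len = V_high - V_low of the valid segments."""
    # Empirical inverse CDF of len evaluated at probability (1 - y_req).
    dv = np.quantile(np.asarray(seg_lengths), 1.0 - y_req)
    if dv <= 0.0:
        return math.inf                  # some chips would need an exact voltage
    return math.ceil((v_max - v_min) / dv - 1.0)

# Example with hypothetical measured segment lengths (in volts):
lens = np.random.default_rng(0).uniform(0.05, 0.6, size=10000)
print(predict_bin_upper_bound(lens, y_req=0.99))
```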

3.3 Yield Analysis and Optimization

The complete voltage binning algorithm for yield analysis and optimization is given in Fig. 17.1. After yield analysis and optimization, the supply voltage levels V = {V_1, V_2, ..., V_{k_opt}} and the corresponding set of bins U = {U_1, U_2, ..., U_{k_opt}} are computed up to k_opt, where Y_{k_opt} = Y_max.
There are many algorithms for solving the set-cover problem in step 3. By choosing an exact set-cover algorithm, the globally optimal solution can be obtained; however, the decision version of the set-covering problem is NP-complete. In this chapter, we use a greedy approximation algorithm, as shown in Fig. 17.5, which

Fig. 17.5 The flow of the greedy algorithm for covering the most uncovered elements in S

can easily be implemented to run in polynomial time and achieves a good approximation of the optimal solution. Note that the greedy approximation is not mandatory; any set-cover algorithm can be used in step 3, so this is not a limitation of the presented valid supply voltage segment model. The solution found by GREEDY-SET-COVER is at most a small constant factor larger than the optimum [19], which was found to be satisfactory in the experimental results. Moreover, the greedy algorithm guarantees that each voltage level covers the most segments corresponding to uncovered test chips, which makes the algorithm incremental. As a result, if only k - 1 bins are needed, the computation can stop at k - 1 instead of k, and when the designer later needs more voltage bins, the computation does not have to start all over again: when the number of bins increases from k - 1 to k, the existing k - 1 voltage levels remain the same. This incremental property of the voltage binning scheme is very useful in practice; a minimal sketch of the greedy covering step is given below.
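
The greedy covering step can be sketched as follows. This is our own minimal implementation of the strategy described above (at each iteration, pick the level that covers the most uncovered segments); restricting the candidate levels to segment endpoints and the helper name are our assumptions, not the chapter's implementation of GREEDY-SET-COVER in [19].

```python
# A minimal sketch of the greedy interval-covering step (step 3 in Fig. 17.1).
def greedy_voltage_binning(segments, max_bins=None):
    """segments: list of (V_low, V_high) valid segments, one per good chip.
    Returns the chosen voltage levels and the chip indices assigned to each bin."""
    uncovered = set(range(len(segments)))
    candidates = sorted({v for seg in segments for v in seg})
    levels, bins = [], []
    while uncovered and (max_bins is None or len(levels) < max_bins):
        # Pick the candidate level that covers the most uncovered segments.
        best_v, best_cov = None, set()
        for v in candidates:
            cov = {i for i in uncovered
                   if segments[i][0] <= v <= segments[i][1]}
            if len(cov) > len(best_cov):
                best_v, best_cov = v, cov
        if not best_cov:                 # defensive guard; endpoints always cover
            break
        levels.append(best_v)
        bins.append(sorted(best_cov))
        uncovered -= best_cov
    return levels, bins

# The loop is incremental: stopping after k-1 iterations gives the (k-1)-bin
# solution, and those k-1 levels are unchanged when a k-th bin is added.
levels, bins = greedy_voltage_binning([(0.9, 1.1), (1.0, 1.3), (1.2, 1.4)])
print(levels, [len(b) for b in bins])
```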
We remark that the presented method can easily be extended to deal with a group of discrete values V_{g,1}, V_{g,2}, ... for dynamic voltage scaling under different operation modes, instead of a single voltage. For example, the i-th supply voltage level V_i may contain two discrete values, V_s and V_h, which are the supply voltages for the power-saving mode and the high-performance mode, respectively (anything in between also works for the selected chips). The set-cover algorithm in Fig. 17.5 then uses a range V_g (defined by the user) to cover the voltage segments instead of a single voltage level, as sketched below. Such an extension is straightforward for the presented method.
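
Under the assumption that a chip is usable in such a bin only if its valid segment contains the bin's entire voltage range, the modified coverage test could look as follows; the helper name and exact semantics are our illustration, not the chapter's specification.

```python
# A hedged sketch of the coverage test when each bin carries a voltage range
# [V_s, V_h] (e.g., power-saving and high-performance modes) rather than a
# single level: a chip is covered only if its segment contains the whole range.
def covers_range(segment, v_s, v_h):
    v_low, v_high = segment
    return v_low <= v_s and v_h <= v_high

# In the greedy loop above, replace the single-level test
# `segments[i][0] <= v <= segments[i][1]` with
# `covers_range(segments[i], v, v + v_g)` for a user-defined width v_g.
print(covers_range((0.9, 1.3), 1.0, 1.1))   # True: segment contains [V_s, V_h]
```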

4 Numerical Examples

In this section, the presented voltage binning technique for yield analysis and optimization is verified on circuits from the ISCAS'85 benchmark set with constraints on timing performance and power consumption. The circuits were synthesized with the Nangate Open Cell Library. The technology parameters come from the 45 nm FreePDK Base Kit and PTM models [139]. The presented method has been implemented in Matlab 7.8.0. All experiments were carried out on a Linux system with quad Intel Xeon CPUs at 2.99 GHz and 16 GB of memory.

Table 17.1 Predicted and actual number of bins needed under yield requirement

Circuit   Yreq   Predicted   Real for uni.   Real for opt.
C432      99%    25          23              4
          97%    10          9               3
          95%    7           6               3
C1908     99%    27          12              7
          97%    11          6               3
          95%    7           3               3
C2670     99%    8           4               3
          97%    5           3               2
          95%    3           2               1
C7552     99%    30          12              5
          97%    9           4               3
          95%    6           3               2

4.1 Setting of Process Variation

For each circuit in the benchmark set, 10,000 Monte Carlo samples are generated from process variations. In this chapter, the effective gate length L and the gate oxide thickness Tox are considered as the two main sources of process variation. According to [71], the physical variation in L and Tox should be controlled within ±12%, so the 3σ values of the variations in L and Tox were set to 12% of their nominal values, of which inter-die variations constitute 20% and intra-die variations 80%. L is modeled as a sum of spatially correlated sources of variation, and Tox is modeled as an independent source of variation; the same framework can easily be extended to include other variation parameters. Both L and Tox are modeled as Gaussian parameters. For the correlated L, the spatial correlation was modeled with the exponential model [195]; one possible sampling sketch is shown below.
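
The following is a minimal sketch, not the authors' setup, of drawing correlated samples of L: an exponential spatial model ρ(d) = exp(-d/η) for the intra-die part plus a shared inter-die offset. The gate coordinates, the correlation length η, and the interpretation of the 20%/80% inter/intra split as a variance split are illustrative assumptions; Tox would be drawn independently.

```python
# A minimal sketch of spatially correlated gate-length sampling.
import numpy as np

def sample_gate_length(coords, l_nom, sigma_total, eta, n_samples, seed=0):
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    corr = np.exp(-d / eta)                        # exponential correlation model
    sigma_inter = np.sqrt(0.2) * sigma_total       # 20% of variance, inter-die
    sigma_intra = np.sqrt(0.8) * sigma_total       # 80%, spatially correlated
    chol = np.linalg.cholesky(corr + 1e-9 * np.eye(len(coords)))
    intra = sigma_intra * (chol @ rng.standard_normal((len(coords), n_samples)))
    inter = sigma_inter * rng.standard_normal(n_samples)   # shared per die
    return l_nom + intra + inter                   # shape: (num_gates, n_samples)

coords = np.random.default_rng(1).uniform(0.0, 100e-6, size=(50, 2))  # 50 gates
sigma = 0.12 / 3 * 45e-9                           # 3*sigma = 12% of nominal L
L = sample_gate_length(coords, 45e-9, sigma, eta=50e-6, n_samples=1000)
print(L.shape)
```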
The power and timing information of each test chip as a function of supply voltage is characterized by SPICE simulation. Under 45 nm technology, the typical supply voltage range is 0.85 V-1.3625 V [69]; accordingly, Vdd is varied between 0.8 V and 1.4 V in this chapter, which is sufficient for 45 nm technology.
We remark that, in practice, the power and timing information can be obtained from measurements. In that case, all sources of variability of transistors and interconnects, including inter-die and intra-die variations with spatial correlations, are taken into account automatically.

4.2 Prediction of Bin Numbers Under Yield Requirement

As mentioned in Sect. 3.2, the presented valid segment model can be used to predict the number of bins needed under a yield requirement before voltage binning optimization. Table 17.1 compares the predicted number with the actual number needed under the yield requirement for the test chips. In this

Table 17.2 Yield under uniform and optimal voltage binning schemes (%)

Circuit   Ymax    VB     1 bin   2 bins   5 bins   10 bins   kopt
C432      96.66   Uni.   60.19   79.04    90.52    94.36     4,514
                  Opt.   80.08   88.68    96.42    96.66     10
C1908     98.06   Uni.   71.80   91.46    95.20    97.04     437
                  Opt.   89.18   92.88    97.18    98.06     21
C2670     90.15   Uni.   81.12   87.13    89.74    89.95     1,205
                  Opt.   85.77   88.34    89.83    90.08     13
C7552     93.46   Uni.   73.94   86.38    91.40    92.34     1,254
                  Opt.   87.22   90.30    92.64    93.26     18

table, Y_req is the lower-bound requirement for yield optimization (normalized by Y_max). Column 3 is the predicted number of bins, and columns 4 and 5 are the actual bin numbers found for the uniform and optimal voltage binning schemes, respectively. The table validates the upper-bound formulation for the needed number of bins in Sect. 3.2: the predicted value is always an upper bound on the actual number of bins needed, and can therefore be applied as a guide for the yield requirement in optimization. Table 17.1 also shows that the optimal voltage binning scheme can significantly reduce the number of bins compared with the uniform voltage binning scheme under the same yield requirement. When the yield requirement is 99% of the optimal yield, the optimal voltage binning scheme reduces the bin count by 52% on average.

4.3 Comparison Between Uniform and Optimal Voltage Binning Schemes

Numerical examples for both the uniform and the optimal voltage binning schemes with different numbers of bins are used to verify the presented voltage binning technique. Table 17.2 shows the results, where Y_max is the maximum chip yield achievable when Vdd is adjusted individually for each manufactured chip, VB stands for the voltage binning scheme used, and k_opt is the minimum number of bins needed to achieve Y_max. From Table 17.2, we can see that the yield of optimal VB always increases with the number of bins, with Y_max as the upper bound, and that voltage binning can significantly improve yield compared with a single supply voltage. Column 8 in Table 17.2 shows that the number of bins needed to achieve Y_max in the optimal voltage binning scheme is, on average, only 1.88% of the number of bins needed in the uniform scheme, which means the optimal voltage binning scheme is much more economical for reaching the best possible yield.
Figure 17.6 compares the yields of the uniform and optimal voltage binning schemes with the number of bins ranging from 1 to 10 for C432. The figure shows that the optimal binning scheme always provides a higher yield than the uniform

(Figure 17.6: yield versus the number of voltage bins, from 1 to 10, for the optimal and uniform VB schemes.)

Fig. 17.6 Yield under uniform and optimal voltage binning schemes for C432

binning scheme. For the optimal voltage binning scheme, the yield increases more slowly as the bin number grows, since the greedy algorithm covers the most chips with the first levels. Similar behavior is observed for the other test circuits.

4.4 Sensitivity to Frequency and Power Constraints

For very strict power or frequency constraints, voltage binning provides more opportunity to improve yield. Figure 17.7 shows the changes in parametric yield for C432 with and without voltage binning yield optimization as the frequency and power consumption requirements change, where P_norm is the normalized power constraint and f_norm is the normalized frequency constraint. The figure shows that the parametric yield is sensitive to both the performance and the power constraint. As a result, yield can be substantially increased by binning the supply voltage into a small number of levels with the optimal voltage binning scheme. For example, without the voltage binning technique the yield falls to 0% when the constraints become 20% stricter, whereas the voltage binning technique keeps the yield as high as 80% under the same conditions.

4.5 CPU Times

Table 17.3 compares the CPU times among different voltage binning schemes and
different numbers of bins. Since the inputs of the presented algorithm in Fig. 17.1

Fig. 17.7 Maximum achievable yield as a function of power and performance constraints for C2670

Table 17.3 CPU time comparison (s)

Circuit   VB     1 bin    2 bins   5 bins   10 bins
C432      Uni.   0.0486   0.0571   0.0866   0.1374
          Opt.   0.0747   0.0786   0.0823   0.0827
C1908     Uni.   0.0551   0.0749   0.1237   0.2037
          Opt.   0.0804   0.0840   0.0874   0.0901
C2670     Uni.   0.0347   0.0371   0.0425   0.0504
          Opt.   0.0686   0.0696   0.0711   0.0704
C7552     Uni.   0.0476   0.0565   0.0925   0.1493
          Opt.   0.0775   0.0791   0.0802   0.0812

are, in practice, the measured data from real chips, the cost of measuring the data is not counted in the runtime of the voltage binning method; in this chapter, however, the timing and power data are generated by SPICE simulation. There are three steps in the presented method, as shown in Fig. 17.1. The time complexity of steps 1 and 2 is O(N) each, where N is the number of MC sample points, and from [19], step 3 runs in O(N^2 ln N) time. Therefore, the speed of the voltage binning algorithm is independent of circuit size. Table 17.3 confirms that the cost of the binning technique is insignificant even for the case of 10 bins, and that the runtime does not increase with the number of gates on the chip.

5 Summary

In this chapter, we have presented a voltage binning technique to improve chip yield. First, a novel formulation has been introduced to predict an upper bound on the number of bins required under the uniform binning scheme from the distribution of
the valid Vdd segment length. We then developed an approximation of the optimal binning scheme based on a greedy set-cover solution that minimizes the number of bins and keeps the corresponding voltage levels incremental. The presented method can also be extended to deal with a range of working supply voltages for dynamic voltage scaling. Numerical results on benchmarks in a 45 nm technology show that the presented method correctly predicts the upper bound on the number of bins required, and that the presented optimal binning scheme leads to significant savings in the number of bins, compared to the uniform scheme, to achieve the same yield at very small CPU cost.
References

1. A. Abdollahi, F. Fallah, and M. Pedram, “Runtime mechanisms for leakage current reduction
in CMOS VLSI circuits,” in Proc. Int. Symp. on Low Power Electronics and Design (ISLPED),
Aug 2002, pp. 213–218.
2. A. Abu-Dayya and N. Beaulieu, “Comparison of methods of computing correlated lognormal
sum distributions and outages for digital wireless applications,” in Proc. IEEE Vehicular
Technology Conference, vol. 1, June 1994, pp. 175–179.
3. K. Agarwal, D. Blaauw, and V. Zolotov, “Statistical timing analysis for intra-die process
variations with spatial correlations,” in Proc. Int. Conf. on Computer Aided Design (ICCAD),
Nov 2003, pp. 900–907.
4. J. D. Alexander and V. D. Agrawal, “Algorithms for estimating number of glitches and
dynamic power in CMOS circuits with delay variations,” in IEEE Computer Society Annual
Symposium on VLSI, May 2009, pp. 127–132.
5. S. Bhardwaj, S. Vrudhula, and A. Goel, “A unified approach for full chip statistical timing and
leakage analysis of nanoscale circuits considering intradie process variations,” IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 10, pp. 1812–1825,
Oct 2008.
6. G. Biagetti, S. Orcioni, C. Turchetti, P. Crippa, and M. Alessandrini, “SiSMA: A tool for
efficient analysis of analog CMOS integrated circuits affected by device mismatch,” IEEE
TCAD, pp. 192–207, 2004.
7. S. Borkar, T. Karnik, and V. De, “Design and reliability challenges in nanometer technolo-
gies,” in Proc. Design Automation Conf. (DAC). IEEE Press, 2004, pp. 75–75.
8. S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De, “Parameter variations
and impact on circuits and microarchitecture,” in Proc. Design Automation Conf. (DAC).
IEEE Press, 2003, pp. 338–342.
9. C. Brau, Modern Problems In Classical Electrodynamics. Oxford Univ. Press, 2004.
10. R. Burch, F. Najm, P. Yang, and T. Trick, “A Monte Carlo approach for power estimation,”
IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 1, no. 1, pp. 63–71,
Mar 1993.
11. Y. Cao, Y. Lee, T. Chen, and C. C. Chen, “HiPRIME: hierarchical and passivity reserved
interconnect macromodeling engine for RLKC power delivery,” in Proc. Design Automation
Conf. (DAC), 2002, pp. 379–384.
12. H. Chang and S. Sapatnekar, “Statistical timing analysis under spatial correlations,” IEEE
Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 9,
pp. 1467–1482, Sept. 2005.


13. H. Chang and S. S. Sapatnekar, “Full-chip analysis of leakage power under process variations,
including spatial correlations,” in Proc. IEEE/ACM Design Automation Conference (DAC),
2005, pp. 523–528.
14. H. Chen, S. Neely, J. Xiong, V. Zolotov, and C. Visweswariah, “Statistical modeling and
analysis of static leakage and dynamic switching power,” in Power and Timing Modeling, Op-
timization and Simulation: 18th International Workshop, (PATMOS), Sep 2008, pp. 178–187.
15. R. Chen, L. Zhang, V. Zolotov, C. Visweswariah, and J. Xiong, “Static timing: back to
our roots,” in Proc. Asia South Pacific Design Automation Conf. (ASPDAC), Jan 2008,
pp. 310–315.
16. C. Chiang and J. Kawa, Design for Manufacturability. Springer, 2007.
17. E. Chiprout, “Fast flip-chip power grid analysis via locality and grid shells,” in Proc. Int.
Conf. on Computer Aided Design (ICCAD), Nov 2004, pp. 485–488.
18. T.-L. Chou and K. Roy, “Power estimation under uncertain delays,” Integr. Comput.-Aided
Eng., vol. 5, no. 2, pp. 107–116, Apr 1998.
19. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2nd ed.
MIT Press, 2001.
20. P. Cox, P. Yang, and O. Chatterjee, “Statistical modeling for efficient parametric yield
estimation of MOS VLSI circuits,” in IEEE Int. Electron Devices Meeting, 1983, pp. 391–398.
21. J. Cui, G. Chen, R. Shen, S. X.-D. Tan, W. Yu, and J. Tong, “Variational capacitance
modeling using orthogonal polynomial method,” in Proc. IEEE/ACM International Great
Lakes Symposium on VLSI, 2008, pp. 23–28.
22. L. Daniel, O. C. Siong, L. S. Chay, K. H. Lee, and J. White, “Multi-parameter moment-
matching model-reduction approach for generating geometrically parameterized interconnect
performance models,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and
Systems, vol. 23, no. 5, pp. 678–693, May 2004.
23. S. Dasgupta, “Kharitonov’s theorem revisited,” Systems & Control Letters, vol. 11, no. 5,
pp. 381–384, 1988.
24. V. De and S. Borkar, “Technology and design challenges for low power and high perfor-
mance,” in Proc. Int. Symp. on Low Power Electronics and Design (ISLPED), Aug 1999,
pp. 163–168.
25. L. H. de Figueiredo and J. Stolfi, “Self-validated numerical methods and applications,” in
Brazilian Mathematics Colloquium monographs, IMPA/CNPq, Rio de Janeiro, Brazil, 1997.
26. K. Deb, Multi-objective optimization using evolutionary algorithms. Wiley Publishing,
Hoboken, NJ, 2002.
27. A. Demir, E. Liu, and A. Sangiovanni-Vincentelli, “Time-domain non-Monte Carlo noise
simulation for nonlinear dynamic circuits with arbitrary excitations,” IEEE TCAD, pp. 493–
505, 1996.
28. C. Ding, C. Hsieh, and M. Pedram, “Improving the efficiency of Monte Carlo power
estimation VLSI,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 8, no. 5,
pp. 584–593, Oct 2000.
29. C. Ding, C. Tsui, and M. Pedram, “Gate-level power estimation using tagged probabilistic
simulation,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems,
vol. 17, no. 11, pp. 1099–1107, Nov 1998.
30. Q. Dinh, D. Chen, and M. D. Wong, “Dynamic power estimation for deep submicron circuits
with process variation,” in Proc. Asia South Pacific Design Automation Conf. (ASPDAC), Jan
2010, pp. 587–592.
31. S. W. Director, P. Feldmann, and K. Krishna, “Statistical integrated circuit design,” IEEE J.
of Solid State Circuits, pp. 193–202, 1993.
32. P. Drennan and C. McAndrew, “Understanding MOSFET mismatch for analog design,” IEEE
J. of Solid State Circuits, pp. 450–456, 2003.
33. S. G. Duvall, “Statistical circuit modeling and optimization,” in Intl. Workshop Statistical
Metrology, Jun 2000, pp. 56–63.
34. T. El-Moselhy and L. Daniel, “Stochastic integral equation solver for efficient variation-aware
interconnect extraction,” in Proc. ACM/IEEE Design Automation Conf. (DAC), 2008.

35. J. Fan, N. Mi, S. X.-D. Tan, Y. Cai, and X. Hong, “Statistical model order reduction for
interconnect circuits considering spatial correlations,” in Proc. Design, Automation and Test
In Europe. (DATE), 2007, pp. 1508–1513.
36. P. Feldmann and R. W. Freund, “Efficient linear circuit analysis by Pade approximation via
the Lanczos process,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and
Systems, vol. 14, no. 5, pp. 639–649, May 1995.
37. P. Feldmann and S. W. Director, “Improved methods for IC yield and quality optimization
using surface integrals,” in IEEE/ACM ICCAD, 1991, pp. 158–161.
38. R. Fernandes and R. Vemuri, “Accurate estimation of vector dependent leakage power in
presence of process variations,” in Proc. IEEE Int. Conf. on Computer Design (ICCD),
Oct 2009, pp. 451–458.
39. I. A. Ferzli and F. N. Najm, “Statistical estimation of leakage-induced power grid voltage
drop considering within-die process variations,” in Proc. IEEE/ACM Design Automation
Conference (DAC), 2003, pp. 865–859.
40. I. A. Ferzli and F. N. Najm, “Statistical verification of power grids considering process-
induced leakage current variations,” in Proc. Int. Conf. on Computer Aided Design (ICCAD),
2003, pp. 770–777.
41. G. F. Fishman, Monte Carlo, concepts, algorithms, and Applications. Springer, 1996.
42. P. Friedberg, Y. Cao, J. Cain, R. Wang, J. Rabaey, and C. Spanos, “Modeling within-die spatial
correlation effects for process design co-optimization,” in Proceedings of the 6th International
Symposium on Quality of Electronic Design, 2005, pp. 516–521.
43. O. Gay, D. Coeurjolly, and N. Hurst, “Libaffa: CCC affine arithmetic library for gnu/linux,”
May 2005, http://savannah.nongnu.org/projects/libaa/.
44. R. Ghanem, “The nonlinear Gaussian spectrum of log-normal stochastic processes and
variables,” Journal of Applied Mechanics, vol. 66, pp. 964–973, December 1999.
45. R. G. Ghanem and P. D. Spanos, Stochastic Finite Elements: A Spectral Approach. Dover
Publications, 2003.
46. P. Ghanta, S. Vrudhula, and S. Bhardwaj, “Stochastic variational analysis of large power
grids considering intra-die correlations,” in Proc. IEEE/ACM Design Automation Conference
(DAC), July 2006, pp. 211–216.
47. P. Ghanta, S. Vrudhula, R. Panda, and J. Wang, “Stochastic power grid analysis considering
process variations,” in Proc. Design, Automation and Test In Europe. (DATE), vol. 2, 2005,
pp. 964–969.
48. A. Ghosh, S. Devadas, K. Keutzer, and J. White, “Estimation of average switching activity in
combinational and sequential circuits,” in Proc. IEEE/ACM Design Automation Conference
(DAC), June 1992, pp. 253–259.
49. L. Giraud, S. Gratton, and E. Martin, “Incremental spectral preconditioners for sequences of
linear systems,” Appl. Num. Math., pp. 1164–1180, 2007.
50. K. Glover, “All optimal Hankel-norm approximations of linear multi-variable systems and
their L∞ error bounds,” Int. J. Control, vol. 36, pp. 1115–1193, 1984.
51. G. H. Golub and C. V. Loan, Matrix Computations, 3rd ed. The Johns Hopkins University
Press, 1996.
52. F. Gong, X. Liu, H. Yu, S. X. Tan, and L. He, “A fast non-Monte-Carlo yield analysis and
optimization by stochastic orthogonal polynomials,” ACM Trans. on Design Automation of
Electronics Systems, 2012, in press.
53. F. Gong, H. Yu, and L. He, “Picap: a parallel and incremental capacitance extraction
considering stochastic process variation,” in Proc. ACM/IEEE Design Automation Conf.
(DAC), 2009, pp. 764–769.
54. F. Gong, H. Yu, and L. He, “Stochastic analog circuit behaviour modelling by point estimation
method,” in ACM International Symposium on Physical Design (ISPD), 2011.

55. F. Gong, H. Yu, Y. Shi, D. Kim, J. Ren, and L. He, “QuickYield: an efficient global-search
based parametric yield estimation with performance constraints,” in Proc. ACM/IEEE Design
Automation Conf. (DAC), 2010, pp. 392–397.
56. F. Gong, H. Yu, L. Wang, and L. He, “A parallel and incremental extraction of variational ca-
pacitance with stochastic geometric moments,” IEEE Trans. on Very Large Scale Integration
(VLSI) Systems, 2012, in press.
57. R. L. Gorsuch, Factor Analysis. Hillsdale, NJ, 1974.
58. C. J. Gu and J. Roychowdhury, “Model reduction via projection onto nonlinear manifolds,
with applications to analog circuits and biochemical systems,” in Proc. Int. Conf. on Computer
Aided Design (ICCAD), Nov 2008.
59. C. Gu and J. Roychowdhury, “An efficient, fully nonlinear, variability-aware non-Monte-
Carlo yield estimation procedure with applications to SRAM cells and ring oscillators,” in
Proc. Asia South Pacific Design Automation Conf., 2008, pp. 754–761.
60. Z. Hao, R. Shen, S. X.-D. Tan, B. Liu, G. Shi, and Y. Cai, “Statistical full-chip dynamic power
estimation considering spatial correlations,” in Proc. Int. Symposium. on Quality Electronic
Design (ISQED), March 2011, pp. 677–782.
61. Z. Hao, R. Shen, S. X.-D. Tan, and G. Shi, “Performance bound analysis of analog
circuits considering process variations,” in Proc. Design Automation Conf. (DAC), July 2011,
pp. 310–315.
62. Z. Hao, S. X.-D. Tan, and G. Shi, “An efficient statistical chip-level total power estimation
method considering process variations with spatial correlation,” in Proc. Int. Symposium. on
Quality Electronic Design (ISQED), March 2011, pp. 671–676.
63. Z. Hao, S. X.-D. Tan, E. Tlelo-Cuautle, J. Relles, C. Hu, W. Yu, Y. Cai, and G. Shi, “Statistical
extraction and modeling of inductance considering spatial correlation,” Analog Integr Circ Sig
Process, 2012, in press.
64. B. P. Harish, N. Bhat, and M. B. Patil, “Process variability-aware statistical hybrid modeling
of dynamic power dissipation in 65 nm CMOS designs,” in Proc. Int. Conf. on Computing:
Theory and Applications (ICCTA), Mar 2007, pp. 94–98.
65. K. R. Heloue, N. Azizi, and F. N. Najm, “Modeling and estimation of full-chip leakage current
considering within-die correlation,” in Proc. IEEE/ACM Design Automation Conference
(DAC), 2007, pp. 93–98.
66. F. Hu and V. D. Agrawal, “Enhanced dual-transition probabilistic power estimation with
selective supergate analysis,” in Proc. IEEE Int. Conf. on Computer Design (ICCD), Oct 2005,
pp. 366–372.
67. G. M. Huang, W. Dong, Y. Ho, and P. Li, “Tracing SRAM separatrix for dynamic noise margin
analysis under device mismatch,” in Proc. of IEEE Int. Behavioral Modeling and Simulation
Conf., 2007, pp. 6–10.
68. A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis. Wiley, 2001.
69. “Intel pentium processor e5200 series specifications,” Intel Co., http://ark.intel.com/Product.
aspx?id=37212.
70. A. Iserles, A First Course in the Numerical Analysis of Differential Equations, 3rd ed.
Cambridge University, 1996.
71. “International technology roadmap for semiconductors (ITRS), 2010 update,” 2010, http://
public.itrs.net.
72. J. D. Jackson, Classical Electrodynamics. John Wiley and Sons, 1975.
73. H. Jiang, M. Marek-Sadowska, and S. R. Nassif, “Benefits and costs of power-gating
technique,” in Proc. IEEE Int. Conf. on Computer Design (ICCD), Oct 2005, pp. 559–566.
74. R. Jiang, W. Fu, J. M. Wang, V. Lin, and C. C.-P. Chen, “Efficient statistical capacitance
variability modeling with orthogonal principle factor analysis,” in Proc. Int. Conf. on
Computer Aided Design (ICCAD), 2005, pp. 683–690.
75. I. T. Jolliffe, Principal Component Analysis. Springer-Verlag, 1986.
76. M. Kamon, M. Tsuk, and J. White, “FastHenry: a multipole-accelerated 3D inductance
extraction program,” IEEE Trans. on Microwave Theory and Techniques, pp. 1750–1758,
Sept. 1994.

77. S. Kapur and D. Long, “IES3: A fast integral equation solver for efficient 3-dimensional
extraction,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 1997.
78. T. Karnik, S. Borkar, and V. De, “Sub-90 nm technologies-challenges and opportunities for
CAD,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), San Jose, CA, Nov 2002,
pp. 203–206.
79. V. L. Kharitonov, “Asymptotic stability of an equilibrium position of a family of systems of
linear differential equations,” Differential. Uravnen., vol. 14, pp. 2086–2088, 1978.
80. J. Kim, K. Jones, and M. Horowitz, “Fast, non-Monte-Carlo estimation of transient perfor-
mance variation due to device mismatch,” in Proc. IEEE/ACM Design Automation Conference
(DAC), 2007.
81. A. Klimke, “Sparse Grid Interpolation Toolbox—user’s guide,” University of Stuttgart, Tech.
Rep. IANS report 2006/001, 2006.
82. A. Klimke and B. Wohlmuth, “Algorithm 847: spinterp: Piecewise multilinear hierarchical
sparse grid interpolation in MATLAB,” ACM Transactions on Mathematical Software,
vol. 31, no. 4, 2005.
83. L. Kolev, V. Mladenov, and S. Vladov, “Interval mathematics algorithms for tolerance
analysis,” IEEE Trans. on Circuits and Systems, vol. 35, no. 8, pp. 967–975, Aug 1988.
84. J. N. Kozhaya, S. R. Nassif, and F. N. Najm, “A multigrid-like technique for power grid
analysis,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 21,
no. 10, pp. 1148–1160, Oct 2002.
85. M. W. Kuemerle, S. K. Lichtensteiger, D. W. Douglas, and I. L. Wemple, “Integrated circuit
design closure method for selective voltage binning,” in U.S. Patent 7475366, Jan 2009.
86. Y. S. Kumar, J. Li, C. Talarico, and J. Wang, “A probabilistic collocation method based
statistical gate delay model considering process variations and multiple input switching,” in
Proc. Design, Automation and Test In Europe. (DATE), 2005, pp. 770–775.
87. A. Labun, “Rapid method to account for process variation in full-chip capacitance extraction,”
IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, pp. 941–
951, June 2004.
88. K. Lampaert, G. Gielen, and W. Sansen, “Direct performance-driven placement of mismatch-
sensitive analog circuits,” in Proc. IEEE/ACM Design Automation Conference (DAC), 1995,
pp. 445–449.
89. Y. Lee, Y. Cao, T. Chen, J. Wang, and C. Chen, “HiPRIME: Hierarchical and passivity
preserved interconnect macromodeling engine for RLKC power delivery,” IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 6, pp. 797–806, 2005.
90. A. Levkovich, E. Zeheb, and N. Cohen, “Frequency response envelopes of a family of
uncertain continuous-time systems,” IEEE Trans. on Circuits and Systems I: Fundamental
Theory and Applications, vol. 42, no. 3, pp. 156–165, Mar 1995.
91. D. Li and S. X.-D. Tan, “Statistical analysis of large on-chip power grid networks by
variational reduction scheme,” Integration, the VLSI Journal, vol. 43, no. 2, pp. 167–175,
April 2010.
92. D. Li, S. X.-D. Tan, G. Chen, and X. Zeng, “Statistical analysis of on-chip power grid
networks by variational extended truncated balanced realization method,” in Proc. Asia South
Pacific Design Automation Conf. (ASPDAC), Jan 2009, pp. 272–277.
93. D. Li, S. X.-D. Tan, and B. McGaughy, “ETBR: Extended truncated balanced realization
method for on-chip power grid network analysis,” in Proc. Design, Automation and Test In
Europe. (DATE), 2008, pp. 432–437.
94. D. Li, S. X.-D. Tan, E. H. Pacheco, and M. Tirumala, “Fast analysis of on-chip power grid
circuits by extended truncated balanced realization method,” IEICE Trans. on Fundamentals
of Electronics, Communications and Computer Science(IEICE), vol. E92-A, no. 12, pp. 3061–
3069, 2009.
95. P. Li and W. Shi, “Model order reduction of linear networks with massive ports via frequency-
dependent port packing,” in Proc. Design Automation Conf. (DAC), 2006, pp. 267–272.

96. T. Li, W. Zhang, and Z. Yu, “Full-chip leakage analysis in nano-scale technologies:
Mechanisms, variation sources, and verification,” in Proc. Design Automation Conf. (DAC),
June 2008, pp. 594–599.
97. X. Li, J. Le, L. Pileggi, and A. Strojwas, “Projection-based performance modeling for
inter/intra-die variations,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 2005,
pp. 721–727.
98. X. Li, J. Le, and L. T. Pileggi, “Projection-based statistical analysis of full-chip leakage power
with non-log-normal distributions,” in Proc. IEEE/ACM Design Automation Conference
(DAC), July 2006, pp. 103–108.
99. Y. Lin and D. Sylvester, “Runtime leakage power estimation technique for combinational
circuits,” in Proc. Asia South Pacific Design Automation Conf. (ASPDAC), Jan 2007,
pp. 660–665.
100. B. Liu, F. V. Fernandez, and G. Gielen, “An accurate and efficient yield optimization method
for analog circuits based on computing budget allocation and memetic search technique,” in
Proc. Design Automation and Test Conf. in Europe, 2010, pp. 1106–1111.
101. Y. Liu, S. Nassif, L. Pileggi, and A. Strojwas, “Impact of interconnect variations on the clock
skew of a gigahertz microprocessor,” in Proc. IEEE/ACM Design Automation Conference
(DAC), 2000, pp. 168–171.
102. Y. Liu, L. T. Pileggi, and A. J. Strojwas, “Model order-reduction of rc(l) interconnect
including variational analysis,” in DAC ’99: Proceedings of the 36th ACM/IEEE conference
on Design automation, 1999, pp. 201–206.
103. R. Marler and J. Arora, “Survey of multi-objective optimization methods for engineering,”
Struct Multidisc Optim 26, pp. 369–395, 2004.
104. H. Masuda, S. Ohkawa, A. Kurokawa, and M. Aoki, “Challenge: Variability characterization
and modeling for 65- to 90-nm processes,” in Proc. IEEE Custom Integrated Circuits Conf.,
2005.
105. C. McAndrew, J. Bates, R. Ida, and P. Drennan, “Efficient statistical BJT modeling, why beta
is more than ic/ib,” in Proc. IEEE Bipolar/BiCMOS Circuits and Tech. Meeting, 1997.
106. “MCNC benchmark circuit placements,” http://vlsicad.ucsd.edu/GSRC/bookshelf/Slots/
nPlacement/.
107. N. Mi, J. Fan, and S. X.-D. Tan, “Simulation of power grid networks considering wires and
lognormal leakage current variations,” in Proc. IEEE International Workshop on Behavioral
Modeling and Simulation (BMAS), Sept. 2006, pp. 73–78.
108. N. Mi, J. Fan, and S. X.-D. Tan, “Statistical analysis of power grid networks considering
lognormal leakage current variations with spatial correlation,” in Proc. IEEE Int. Conf. on
Computer Design (ICCD), 2006, pp. 56–62.
109. N. Mi, J. Fan, S. X.-D. Tan, Y. Cai, and X. Hong, “Statistical analysis of on-chip power
delivery networks considering lognormal leakage current variations with spatial correlations,”
IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications, vol. 55, no. 7,
pp. 2064–2075, Aug 2008.
110. N. Mi, S. X.-D. Tan, Y. Cai, and X. Hong, “Fast variational analysis of on-chip power grids
by stochastic extended krylov subspace method,” IEEE Trans. on Computer-Aided Design of
Integrated Circuits and Systems, vol. 27, no. 11, pp. 1996–2006, 2008.
111. N. Mi, S. X.-D. Tan, P. Liu, J. Cui, Y. Cai, and X. Hong, “Stochastic extended Krylov
subspace method for variational analysis of on-chip power grid networks,” in Proc. Int. Conf.
on Computer Aided Design (ICCAD), 2007, pp. 48–53.
112. B. Moore, “Principal component analysis in linear systems: Controllability, and observability,
and model reduction,” IEEE Trans. Automat. Contr., vol. 26, no. 1, pp. 17–32, 1981.
113. R. E. Moore, Interval Analysis. Prentice-Hall, 1966.
114. S. Mukhopadhyay and K. Roy, “Modeling and estimation of total leakage current in nano-
scaled CMOS devices considering the effect of parameter variation,” in Proc. Int. Symp. on
Low Power Electronics and Design (ISLPED), 2003, pp. 172–175.
115. K. Nabors and J. White, “Fastcap: A multipole accelerated 3-d capacitance extraction
program,” IEEE TCAD, pp. 1447–1459, Nov 1991.

116. F. Najm, “Transition density: a new measure of activity in digital circuits,” IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, vol. 12, no. 2, pp. 310–323, Feb
1993.
117. F. Najm, R. Burch, P. Yang, and I. Hajj, “Probabilistic simulation for reliability analysis of
CMOS VLSI circuits,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and
Systems, vol. 9, no. 4, pp. 439–450, Apr 1990.
118. K. Narbos and J. White, “FastCap: a multipole accelerated 3D capacitance extraction
program,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems,
vol. 10, no. 11, pp. 1447–1459, 1991.
119. S. Narendra, V. De, S. Borkar, D. A. Antoniadis, and A. P. Chandrakasan, “Full-chip
subthreshold leakage power prediction and reduction techniques for sub-0.18-μm CMOS,”
IEEE J. Solid-State Circuits, vol. 39, no. 3, pp. 501–510, Mar 2004.
120. S. Nassif, “Delay variability: sources, impact and trends,” in Proc. IEEE Int. Solid-State
Circuits Conf., San Francisco, CA, Feb 2000, pp. 368–369.
121. S. Nassif, “Design for variability in DSM technologies,” in Proc. Int. Symposium. on Quality
Electronic Design (ISQED), San Jose, CA, Mar 2000, pp. 451–454.
122. S. R. Nassif, “Model to hardware correlation for nm-scale technologies,” in Proc. IEEE Inter-
national Workshop on Behavioral Modeling and Simulation (BMAS), Sept 2007, keynote
speech.
123. S. R. Nassif, “Power grid analysis benchmarks,” in Proc. Asia South Pacific Design Auto-
mation Conf. (ASPDAC), 2008, pp. 376–381.
124. S. R. Nassif and K. J. Nowka, “Physical design challenges beyond the 22 nm node,” in Proc.
ACM Int. Sym. Physical Design (ISPD), 2010, pp. 13–14.
125. “Nangate open cell library,” http://www.nangate.com/.
126. E. Novak and K. Ritter, “Simple cubature formulas with high polynomial exactness,”
Constructive Approximation, vol. 15, no. 4, pp. 449–522, Dec 1999.
127. A. Odabasioglu, M. Celik, and L. Pileggi, “PRIMA: Passive reduced-order interconnect
macro-modeling algorithm,” IEEE TCAD, pp. 645–654, 1998.
128. J. Oehm and K. Schumacher, “Quality assurance and upgrade of analog characteristics by fast
mismatch analysis option in network analysis environment,” IEEE J. of Solid State Circuits,
pp. 865–871, 1993.
129. M. Orshansky, L. Milor, and C. Hu, “Characterization of spatial intrafield gate cd variability,
its impact on circuit performance, and spatial mask-level correction,” in IEEE Trans. on
Semiconductor Devices, vol. 17, no. 1, Feb 2004, pp. 2–11.
130. C. C. Paige and M. A. Saunders, “Solution of sparse indefinite systems of linear equations,”
SIAM J. on Numerical Analysis, vol. 12, no. 4, pp. 617–629, September 1975.
131. S. Pant, D. Blaauw, V. Zolotov, S. Sundareswaran, and R. Panda, “A stochastic approach
to power grid analysis,” in Proc. IEEE/ACM Design Automation Conference (DAC), 2004,
pp. 171–176.
132. A. Papoulis and S. Pillai, Probability, Random Variables and Stochastic Processes. McGraw-
Hill, 2001.
133. M. Pelgrom, A. Duinmaijer, and A. Welbers, “Matching properties of mos transistors,” IEEE
J. of Solid State Circuits, pp. 1433–1439, 1989.
134. J. R. Phillips and L. M. Silveira, “Poor man’s TBR: a simple model reduction scheme,” IEEE
Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 1, pp. 43–
55, 2005.
135. L. Pileggi, G. Keskin, X. Li, K. Mai, and J. Proesel, “Mismatch analysis and statistical design
at 65 nm and below,” in Proc. IEEE Custom Integrated Circuits Conf., 2008, pp. 9–12.
136. L. T. Pillage and R. A. Rohrer, “Asymptotic waveform evaluation for timing analysis,” IEEE
Trans. on Computer-Aided Design of Integrated Circuits and Systems, pp. 352–366, April
1990.
137. L. T. Pillage, R. A. Rohrer, and C. Visweswariah, Electronic Circuit and System Simulation
Methods. New York: McGraw-Hill, 1994.

138. S. Pilli and S. Sapatnekar, “Power estimation considering statistical ic parametric variations,”
in Proc. IEEE Int. Symp. on Circuits and Systems (ISCAS), vol. 3, June 1997, pp. 1524–1527.
139. “Predictive Technology Model,” http://www.eas.asu.edu/ptm/.
140. L. Qian, D. Zhou, S. Wang, and X. Zeng, “Worst case analysis of linear analog circuit
performance based on kharitonov’s rectangle,” in Proc. IEEE Int. Conf. on Solid-State and
Integrated Circuit Technology (ICSICT), Nov 2010.
141. W. T. Rankin, III, “Efficient parallel implementations of multipole based n-body algorithms,”
Ph.D. dissertation, Duke University, Durham, NC, USA, 1999.
142. R. Rao, A. Srivastava, D. Blaauw, and D. Sylvester, “Statistical analysis of subthreshold
leakage current for VLSI circuits,” IEEE Trans. on Very Large Scale Integration (VLSI)
Systems, vol. 12, no. 2, pp. 131–139, Feb 2004.
143. J. Relles, M. Ngan, E. Tlelo-Cuautle, S. X.-D. Tan, C. Hu, W. Yu, and Y. Cai, “Statistical
extraction and modeling of 3D inductance with spatial correlation,” in Proc. IEEE Interna-
tional Workshop on Symbolic and Numerical Methods, Modeling and Applications to Circuit
Design, Oct 2010.
144. M. Rewienski and J. White, “A trajectory piecewise-linear approach to model order reduction
and fast simulation of nonlinear circuits and micromachined devices,” IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, vol. 22, no. 2, pp. 155–170,
Feb 2003.
145. J. Roy, S. Adya, D. Papa, and I. Markov, “Min-cut floorplacement,” IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, vol. 25, no. 7, pp. 1313–1326,
July 2006.
146. J. Roychowdhury, “Reduced-order modelling of time-varying systems,” in Proc. Asia South
Pacific Design Automation Conf. (ASPDAC), Jan 1999, pp. 53–56.
147. A. E. Ruehli, “Equivalent circuits models for three dimensional multiconductor systems,”
IEEE Trans. on Microwave Theory and Techniques, pp. 216–220, 1974.
148. R. Rutenbar, “Next-generation design and EDA challenges,” in Proc. Asia South Pacific
Design Automation Conf. (ASPDAC), January 2007, keynote speech.
149. Y. Saad and M. H. Schultz, “GMRES: a generalized minimal residual algorithm for solving
nonsymmetric linear systems,” SIAM J. on Sci and Sta. Comp., pp. 856–869, 1986.
150. Y. Saad, Iterative methods for sparse linear systems. SIAM, 2003.
151. S. B. Samaan, “The impact of device parameter variations on the frequency and performance
of VLSI chips,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), ser. ICCAD ’04,
2004, pp. 343–346.
152. Y. Sawaragi, H. Nakayama, and T. Tanino, Theory of Multiobjective Optimization (vol. 176
of Mathematics in Science and Engineering). Orlando, FL: Academic Press Inc. ISBN
0126203709, 1985.
153. F. Schenkel, M. Pronath, S. Zizala, R. Schwencker, H. Graeb, and K. Antreich, “Mismatch
analysis and direct yield optimization by specwise linearization and feasibility-guided
search,” in Proc. IEEE/ACM Design Automation Conference (DAC), 2001.
154. A. S. Sedra and K. C. Smith, Microelectronic Circuits. Oxford University Press, USA, 2009.
155. R. Shen, N. Mi, S. X.-D. Tan, Y. Cai, and X. Hong, “Statistical modeling and analysis of
chip-level leakage power by spectral stochastic method,” in Proc. Asia South Pacific Design
Automation Conf. (ASPDAC), Jan 2009, pp. 161–166.
156. R. Shen, S. X.-D. Tan, J. Cui, W. Yu, Y. Cai, and G. Chen, “Variational capacitance extraction
and modeling based on orthogonal polynomial method,” IEEE Trans. on Very Large Scale
Integration (VLSI) Systems, vol. 18, no. 11, pp. 1556–1565, 2010.
157. R. Shen, S. X.-D. Tan, N. Mi, and Y. Cai, “Statistical modeling and analysis of chip-level
leakage power by spectral stochastic method,” Integration, the VLSI Journal, vol. 43, no. 1,
pp. 156–165, January 2010.
158. R. Shen, S. X.-D. Tan, and J. Xiong, “A linear algorithm for full-chip statistical leakage power
analysis considering weak spatial correlation,” in Proc. Design Automation Conf. (DAC), Jun.
2010, pp. 481–486.

159. R. Shen, S. X.-D. Tan, and J. Xiong, “A linear statistical analysis for full-chip leakage power
with spatial correlation,” in Proc. IEEE/ACM International Great Lakes Symposium on VLSI
(GLSVLSI), May 2010, pp. 227–232.
160. C.-J. Shi and X.-D. Tan, “Canonical symbolic analysis of large analog circuits with determi-
nant decision diagrams,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and
Systems, vol. 19, no. 1, pp. 1–18, Jan 2000.
161. C.-J. Shi and X.-D. Tan, “Compact representation and efficient generation of s-expanded
symbolic network functions for computer-aided analog circuit design,” IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, vol. 20, no. 7, pp. 813–827, April
2001.
162. C.-J. R. Shi and M. W. Tian, “Simulation and sensitivity of linear analog circuits under
parameter variations by robust interval analysis,” ACM Trans. Des. Autom. Electron. Syst.,
vol. 4, pp. 280–312, July 1999.
163. W. Shi, J. Liu, N. Kakani, and T. Yu, “A fast hierarchical algorithm for 3-d capacitance
extraction,” in Proc. ACM/IEEE Design Automation Conf. (DAC), 1998.
164. W. Shi, J. Liu, N. Kakani, and T. Yu, “A fast hierarchical algorithm for 3-d capacitance
extraction,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems,
vol. 21, no. 3, pp. 330–336, March 2002.
165. R. W. Shonkwiler and L. Lefton, An introduction to parallel and vector scientific computing.
Cambridge University Press, 2006.
166. V. Simoncini and D. Szyld, “Recent computational developments in Krylov subspace methods
for linear systems,” Num. Lin. Alg. with Appl., pp. 1–59, 2007.
167. R. S. Soin and R. Spence, “Statistical exploration approach to design centering,” Proceedings
of the Institution of Electrical Engineering, pp. 260–269, 1980.
168. R. Spence and R. Soin, Tolerance Design of Electronic Circuits. Addison-Wesley, Reading,
MA., 1988.
169. A. Srivastava, R. Bai, D. Blaauw, and D. Sylvester, “Modeling and analysis of leakage power
considering within-die process variations,” in Proc. Int. Symp. on Low Power Electronics and
Design (ISLPED), Aug 2002, pp. 64–67.
170. A. Srivastava, D. Sylvester, and D. Blaauw, Statistical Analysis and Optimization for VLSI:
Timing and Power. Springer, 2005.
171. G. W. Stewart, Matrix Algorithms, VOL II. SIAM Publisher, 2001.
172. B. G. Streetman and S. Banerjee, Solid-State Electronic Devices. Prentice Hall, 2000, 5th ed.
173. E. Suli and D. Mayers, An Introduction to Numerical Analysis. Cambridge University, 2006.
174. S. X.-D. Tan, W. Guo, and Z. Qi, “Hierarchical approach to exact symbolic analysis of large
analog circuits,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems,
vol. 24, no. 8, pp. 1241–1250, August 2005.
175. S. X.-D. Tan and C.-J. Shi, “Efficient DDD-based interpretable symbolic characterization of
large analog circuits,” IEICE Trans. on Fundamentals of Electronics, Communications and
Computer Science(IEICE), vol. E86-A, no. 12, pp. 3112–3118, Dec 2003.
176. S. X.-D. Tan and C.-J. Shi, “Efficient approximation of symbolic expressions for analog
behavioral modeling and analysis,” IEEE Trans. on Computer-Aided Design of Integrated
Circuits and Systems, vol. 23, no. 6, pp. 907–918, June 2004.
177. S. X.-D. Tan and L. He, Advanced Model Order Reduction Techniques in VLSI Design.
Cambridge University Press, 2007.
178. R. Teodorescu, B. Greskamp, J. Nakano, S. R. Sarangi, A. Tiwari, and J. Torrellas, “A
model of parameter variation and resulting timing errors for microarchitects,” in Workshop
on Architectural Support for Gigascale Integration (ASGI), Jun 2007.
179. W. Tian, X.-T. Ling, and R.-W. Liu, “Novel methods for circuit worst-case tolerance analysis,”
IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications, vol. 43, no. 4,
pp. 272–278, Apr 1996.

180. S. Tiwary and R. Rutenbar, “Generation of yield-aware Pareto surfaces for hierarchical circuit
design space exploration,” in Proc. IEEE/ACM Design Automation Conference (DAC), 2006,
pp. 31–36.
181. S. K. Tiwary and R. A. Rutenbar, “Faster, parametric trajectory-based macromodels via
localized linear reductions,” in Proc. Int. Conf. on Computer Aided Design (ICCAD),
Nov 2006, pp. 876–883.
182. J. W. Tschanz, S. Narendra, R. Nair, and V. De, “Effectiveness of adaptive supply voltage and
body bias for reducing impact of parameter variations in low power and high performance
microprocessors,” IEEE J. Solid-State Circuits, vol. 38, no. 5, pp. 826–829, May 2003.
183. C.-Y. Tsui, M. Pedram, and A. Despain, “Efficient estimation of dynamic power consumption
under a real delay model,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), Nov 1993,
pp. 224–228.
184. “Umfpack,” http://www.cise.ufl.edu/research/sparse/umfpack/.
185. J. Vlach and K. Singhal, Computer Methods for Circuit Analysis and Design. New York, NY:
Van Nostrand Reinhold, 1995.
186. M. Vratonjic, B. R. Zeydel, and V. G. Oklobdzija, “Circuit sizing and supply-voltage selection
for low-power digital circuit design,” in Power and Timing Modeling, Optimization and
Simulation: 18th International Workshop, (PATMOS), 2006, pp. 148–156.
187. S. Vrudhula, J. M. Wang, and P. Ghanta, “Hermite polynomial based interconnect analysis
in the presence of process variations,” IEEE Trans. on Computer-Aided Design of Integrated
Circuits and Systems, vol. 25, no. 10, 2006.
188. C.-Y. Wang and K. Roy, “Maximum power estimation for CMOS circuits using deterministic
and statistical approaches,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems,
vol. 6, no. 1, pp. 134–140, Mar 1998.
189. H. Wang, H. Yu, and S. X.-D. Tan, “Fast analysis of nontree-clock network considering
environmental uncertainty by parameterized and incremental macromodeling,” in Proc.
IEEE/ACM Asia South Pacific Design Automation Conf. (ASPDAC), 2009, pp. 379–384.
190. J. Wang, P. Ghanta, and S. Vrudhula, “Stochastic analysis of interconnect performance in the
presence of process variations,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), Nov
2004, pp. 880–886.
191. J. M. Wang and T. V. Nguyen, “Extended Krylov subspace method for reduced order analysis
of linear circuit with multiple sources,” in Proc. IEEE/ACM Design Automation Conference
(DAC), 2000, pp. 247–252.
192. J. M. Wang, B. Srinivas, D. Ma, C. C.-P. Chen, and J. Li, “System-level power and thermal
modeling and analysis by orthogonal polynomial based response surface approach (OPRS),”
in Proc. Int. Conf. on Computer Aided Design (ICCAD), Nov 2005, pp. 727–734.
193. M. S. Warren and J. K. Salmon, “A parallel hashed oct-tree n-body algorithm,” in Proceedings
of the 1993 ACM/IEEE conference on Supercomputing, ser. Supercomputing ’93, 1993,
pp. 12–21.
194. D. Wilton, S. Rao, A. Glisson, D. Schaubert, O. Al-Bundak, and C. Butler, “Potential integrals
for uniform and linear source distributions on polygonal and polyhedral domains,” IEEE
Trans. on Antennas and Propagation, vol. AP-32, no. 3, pp. 276–281, March 1984.
195. J. Xiong, V. Zolotov, and L. He, “Robust extraction of spatial correlation,” IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 4, 2007.
196. D. Xiu and G. Karniadakis, “The Wiener-Askey polynomial chaos for stochastic differential
equations,” SIAM J. Scientific Computing, vol. 24, no. 2, pp. 619–644, Oct 2002.
197. D. Xiu and G. Karniadakis, “Modeling uncertainty in flow simulations via generalized
polynomial chaos,” J. of Computational Physics, vol. 187, no. 1, pp. 137–167, May 2003.
198. H. Xu, R. Vemuri, and W. Jone, “Run-time active leakage reduction by power gating and
reverse body biasing: An energy view,” in Proc. IEEE Int. Conf. on Computer Design (ICCD),
Oct 2008, pp. 618–625.
199. S. Yan, V. Sarim, and W. Shi, “Sparse transformation and preconditioners for 3-d capacitance
extraction,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems,
vol. 24, no. 9, pp. 1420–1426, 2005.

200. Z. Ye and Z. Yu, “An efficient algorithm for modeling spatially-correlated process variation in
statistical full-chip leakage analysis,” in Proc. Int. Conf. on Computer Aided Design (ICCAD),
Nov 2009, pp. 295–301.
201. L. Ying, G. Biros, D. Zorin, and H. Langston, “A new parallel kernel-independent fast multi-
pole method,” in IEEE Conf. on High Performance Networking and Computing, 2003.
202. H. Yu, X. Liu, H. Wang, and S. X.-D. Tan, “A fast analog mismatch analysis by an incremental
and stochastic trajectory piecewise linear macromodel,” in Proc. Asia South Pacific Design
Automation Conf. (ASPDAC), Jan 2010, pp. 211–216.
203. H. Yu and S. X.-D. Tan, “Recent advance in computational prototyping for analysis of
high-performance analog/RF ICs,” in IEEE International Conf. on ASIC (ASICON), 2009,
pp. 760–764.
204. W. Yu, C. Hu, and W. Zhang, “Variational capacitance extraction of on-chip interconnects
based on continuous surface model,” in Proc. IEEE/ACM Design Automation Conference
(DAC), July 2009, pp. 758–763.
205. W. Zhang, W. Yu, Z. Wang, Z. Yu, R. Jiang, and J. Xiong, “An efficient method for chip-level
statistical capacitance extraction considering process variations with spatial correlation,” in
Proc. Design, Automation and Test In Europe. (DATE), Mar 2008, pp. 580–585.
206. M. Zhao, R. V. Panda, S. S. Sapatnekar, and D. Blaauw, “Hierarchical analysis of power
distribution networks,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and
Systems, vol. 21, no. 2, pp. 159–168, Feb 2002.
207. Y. Zhou, Z. Li, Y. Tian, W. Shi, and F. Liu, “A new methodology for interconnect parasitics
extraction considering photo-lithography effects,” in Proc. Asia South Pacific Design Automa-
tion Conf. (ASPDAC), Jan 2007, pp. 450–455.
208. H. Zhu, X. Zeng, W. Cai, J. Xue, and D. Zhou, “A sparse grid based spectral stochastic collo-
cation method for variations-aware capacitance extraction of interconnects under nanometer
process technology,” in Proc. Design, Automation and Test In Europe. (DATE), Mar 2007,
pp. 1514–1519.
209. Z. Zhu and J. Phillips, “Random sampling of moment graph: a stochastic Krylov-
reduction algorithm,” in Proc. Design, Automation and Test In Europe. (DATE), April 2007,
pp. 1502–1507.
210. Z. Zhu and J. White, “FastSies: a fast stochastic integral equation solver for modeling
the rough surface effect,” in Proc. Int. Conf. on Computer Aided Design (ICCAD), 2005,
pp. 675–682.
211. Z. Zhu, B. Song, and J. White, “Algorithms in FastImp: a fast and wideband impedance
extraction program for complicated 3-d geometries,” in Proc. Design Automation Conf.
(DAC). New York, NY, USA: ACM, 2003, pp. 712–717.
212. Z. Zhu, J. White, and A. Demir, “A stochastic integral equation method for modeling the
rough surface effect on interconnect capacitance,” in Proc. Int. Conf. on Computer Aided
Design (ICCAD), 2004, pp. 887–891.
213. V. Zolotov, C. Viweswariah, and J. Xiong, “Voltage binning under process variation,” in Proc.
Int. Conf. on Computer Aided Design (ICCAD), Nov 2009, pp. 425–432.
214. Y. Zou, Y. Cai, Q. Zhou, X. Hong, S. X.-D. Tan, and L. Kang, “Practical implementation of
stochastic parameterized model order reduction via hermite polynomial chaos,” in Proc. Asia
South Pacific Design Automation Conf. (ASPDAC), Jan 2007, pp. 367–372.
Index

A
Adaptive voltage supply
  yield optimization, 273
Affine interval, 13
  performance bound analysis, 222
Arnoldi algorithm
  capacitance extraction, 194, 199
  power grid, 150
Askey scheme, 29
  yield analysis, 257
Augmented potential coefficient matrix
  capacitance extraction, 167

B
Balancing
  TBR, 146
Baseline
  yield, 271
BEM
  boundary element method, 163
  capacitance extraction, 165, 184
  inductance extraction, 209
BEOL
  back-end-of-the-line, 111
Bin voltage level
  yield, 275
Binning algorithm
  yield, 275
Block-Arnoldi orthonormalization, 243
BPV
  backward propagation of variance, 237
  mismatch, 242

C
CAD
  developers, 9
  inductance extraction, 209
Capacitance extraction, 163
Capacitance matrix
  power grid, 111
CDF
  cumulative distribution function, 19
Charge distribution
  capacitance extraction, 165
Chebyshev's inequality, 17–18
Cholesky decomposition, 26
CMP, 3
Collocation-based method
  spectral stochastic method, 31
Collocation-based spectral stochastic method
  capacitance extraction, 163
  leakage analysis, 65
Conductance matrix, 110
Continuous random variable, 16
Corner-based, 3
Correlation index neighbor set
  statistical leakage analysis, 67
Covariance, 21
Covariance matrix, 8, 23, 25
  statistical leakage analysis, 43, 57
Critical dimension, 7

D
DAE
  differential-algebra-equation, 235
  yield, 258
DDD
  determinant decision diagram, 222
Decancellation
  performance bound analysis, 227
Delay
  dynamic power, 86
  inductance extraction, 217
  power grid, 107
  yield, 254
Deterministic current source, 134
Discrete probability distribution, 18
Discrete random variable, 16
Dishing, 7
Downward pass, 185
Dynamic current
  power grid, 128
Dynamic power, 10
  yield optimization, 273
Dynamic power analysis, 85

E
Effective channel length
  dynamic power analysis, 84
  power grid, 112
  statistical leakage analysis, 41
  yield, 257, 274
EKS, 11
  extended Krylov subspace, 127
Electrical parameter, 256
Electromigration, 4
ETBR, 11
  extended truncated balanced realization, 11, 145
  power grid, 130, 148
Event, 15
Expectation, 16
Experiment, 15
Exponential correlation model
  capacitance extraction, 166
  inductance extraction, 211
Extended Krylov subspace method, 11
  power grid, 128

F
Fast multipole method, 12
Filament current, 211
Filament voltage, 211
FMM
  fast-multipole-method, 183
Free space Green function, 168

G
Galerkin-based method, 33
  spectral stochastic method, 31
Galerkin-based spectral stochastic method, 11, 166
  capacitance extraction, 164, 166
  power grid, 113, 136
Gate oxide leakage
  statistical leakage analysis, 41
Gate oxide thickness
  dynamic power analysis, 84
  statistical leakage analysis, 41
Gaussian
  capacitance extraction, 166
  dynamic power analysis, 90
  inductance extraction, 211
  mismatch, 241
  power grid, 111
  random variable, 7
  statistical leakage analysis, 58
  yield, 256
  yield optimization, 275
Gaussian distribution, 19
Gaussian-Hermite quadrature
  fundamental, 31
Gaussian quadrature
  fundamental, 31
  inductance extraction, 212
  leakage analysis, 10
  statistical leakage analysis, 59
Geometric variation
  capacitance extraction, 166
  inductance extraction, 209
Geometrical parameter, 256
Glitch
  dynamic power analysis, 86
Glitch width variation
  dynamic power analysis, 89
Global aggregation, 245
GM
  geometrical moment, 186
GMRES
  capacitance extraction, 183
  general minimal residue, 164
Gradient-based yield optimization, 256
Gramian
  power grid, 145, 147
Greedy algorithm, 13
Green function, 168
Grid-based method, 24
  statistical leakage analysis, 49

H
Hermite polynomials
  total power analysis, 10, 95
  yield, 257
HOC
  Hermite polynomial chaos, 33
Hot carrier injection, 4
HPC
  capacitance extraction, 163, 166
  Hermite polynomial chaos, 29
  inductance extraction, 214–215
  power grid, 115, 131
  statistical leakage analysis, 40
  total power analysis, 97

I
Idle leakage, 77
IEKS
  improved extended Krylov subspace methods, 11
IGMRES
  incremental GMRES, 195
Incremental aggregation, 246
Independent, 20
  capacitance extraction, 167
  power grid, 110
  statistical leakage analysis, 57, 67
Inductance extraction, 209
Inductance matrix, 210
Inner product
  capacitance extraction, 171
  mismatch, 241
  power grid, 132
Inter-die, 6
  fundamentals, 23
  power grid, 111
  statistical leakage analysis, 45, 57
  yield optimization, 275
Interval arithmetic
  performance bound analysis, 222
Intra-die, 6
  fundamentals, 23
  power grid, 111
  statistical leakage analysis, 45, 55
  yield optimization, 275
IsTPWL
  incremental stochastic TPWL, 236
  mismatch, 247
ITRS
  International technology roadmap for semiconductors, 107

K
KCL
  Kirchhoff's current law, 211
  yield, 258
Kharitonov's functions, 13
  performance bound analysis, 222, 228
Kharitonov's polynomials, 13
Krylov subspace
  capacitance extraction, 194

L
Layout dependent variation, 7
LE
  local expansion, 187
Leakage power, 39
  yield optimization, 273
Local tangent subspace
  mismatch, 244
Log-normal, 19
  power grid, 111, 134
  statistical leakage analysis, 41
Log-normal leakage current, 11
Look-up table, 10
  capacitance extraction, 171
  gate-based leakage analysis, 41
  LUT, 66
LU decomposition, 184
Lyapunov equation, 146

M
Macromodel
  mismatch, 242
ManiMOR
  mismatch, 247
Markov's inequality, 17–18
Maximum possible yield, 276
MC
  capacitance extraction, 166
  dynamic power analysis, 90
  inductance extraction, 211
  mismatch, 235
  Monte Carlo, 28
  performance bound analysis, 221, 228
  power grid, 132, 151
  statistical leakage analysis, 49, 61
  total power analysis, 95
  yield, 253, 260, 282
ME
  multipole expansion, 186
Mean value, 16
  dynamic power analysis, 90
  inductance extraction, 211
  mismatch, 241
  power grid, 116
  statistical leakage analysis, 39, 58
  total power analysis, 100
  yield, 261
Mismatch, 235
  analog circuits, 13
  performance bound analysis, 221
  yield, 253
MNA
  modified nodal analysis, 111
  power grid, 115
Moment, 17
  power grid, 129
  statistical leakage analysis, 50
MOR
  mismatch, 236, 238
  model order reduction, 236
Multi-objective optimization, 262
Multivariate Gaussian process
  power grid, 111
Mutually independent, 20
MVP
  matrix-vector product, 183

N
NBTI, 4
NMC
  mismatch, 235
  non-Monte Carlo, 253
Non-Monte Carlo method, 13
  yield, 259

O
OPAM
  operational amplifier, 265
Optical proximity correction, 7
Optimal binning scheme, 280
Ordinary differential equation
  ODE, 238
Orthogonal decomposition
  capacitance extraction, 12
  leakage analysis, 10
  power grids, 11
Orthogonal PC
  power grids, 11
Orthogonal polynomial chaos, 29, 158
  analog circuits, 13
  capacitance extraction, 166, 183, 188
  dynamic power analysis, 87
  leakage analysis, 55
  mismatch, 236, 240
  power grid, 108, 127
  statistical leakage analysis, 53
  yield, 257
  yield analysis and optimization, 13
Oxide erosion, 7

P
Panel-distance, 186
Panel-width, 186
Parametric yield, 254, 275
PBTI, 4
PCA
  capacitance extraction, 167, 186
  power grid, 111, 150
  principal component analysis, 27
  statistical leakage analysis, 49, 57, 67
  yield, 257
PDF
  mismatch, 241
  probability density function, 18
  total power analysis, 99
  yield, 255, 263
  yield optimization, 274
Pelgrom's model
  mismatch, 237
  yield, 256
Performance bound analysis, 12, 222
Performance metric, 255
Perturbation
  mismatch, 240
Perturbed SDAE
  mismatch, 240
PFA
  principal factor analysis, 26
Phase-shift mask, 7
PiCAP, 12
  parallel and incremental capacitance extraction, 183
PMTBR
  power grid, 147
Potential coefficient matrix
  capacitance extraction, 165
  second-order, 168
POV
  propagation of variation, 256
  yield, 261
Power constraint, 276
Power grid network, 109
Power grids, 10
Pre-set potential, 165
Preconditioner, 184
Primary conductor, 211
Principal factor analysis, 10
Process variation, 4, 23
  capacitance extraction, 163, 165, 183
  inductance extraction, 209
  performance bound analysis, 221
  statistical leakage analysis, 45
  total power analysis, 95
  yield, 253
Projection matrix, 147
PSD
  power spectral density, 235
PWL
  piece-wise linear, 128

Q
Quadrature points, 31
  statistical leakage analysis, 59

R
Random variable, 16
Random variable reduction, 12
RC network, 109
Response Gramian, 11, 148
RHS
  right-hand-side, 258
Run-time leakage, 77
  estimation, 77
  reduction, 79

S
Sample space, 15
  power grid, 111
Schmitt trigger, 265
SCL
  standard cell library, 66
Segment
  dynamic power analysis, 86
Set covering, 276
SGM
  stochastic geometric moment, 189
Single-objective yield optimization, 272
Singular value
  power grid, 146
Slack, 274
SLP
  sequential linear programming, 256
  yield, 262
Smolyak quadrature
  dynamic power analysis, 88
  fundamental, 32
  inductance extraction, 212
  statistical leakage analysis, 60
  total power analysis, 98
SMOR
  stochastic model order reduction, 130
Snapshot
  mismatch, 243
Sparse grid
  inductance extraction, 12, 214
  total power analysis, 10, 95
Sparse grid quadrature, 32
Spatial correlation, 8, 23
  capacitance extraction, 169
  leakage analysis, 10
  power grid, 111
  statistical leakage analysis, 46, 57, 67
  total power analysis, 95
  yield optimization, 275
Spectral-stochastic-based MOR
  power grid, 127
Spectral stochastic method
  leakage analysis, 10
  mismatch, 240
  power grid, 108
  statistical leakage analysis, 40
  total power analysis, 97
  yield, 257
SPICE
  dynamic power analysis, 86
  mismatch, 240
  total power analysis, 95
SSCM
  capacitance extraction, 175
Standard deviation, 17–18
  dynamic power analysis, 90
  mismatch, 241
  statistical leakage analysis, 39, 58
  total power analysis, 100
StatCap, 12
  statistical capacitance extraction, 166
State-space
  power grid, 146
StatHenry, 12, 212
Statistical leakage analysis, 10
Statistical variation, 7
Statistical yield, 12
STEP
  statistical chip-level total power estimation, 95
Stochastic current source
  yield, 257
Stochastic differential-algebra-equation, 13
  mismatch, 235
Stochastic geometrical moments, 183
Stochastic sensitivity, 261
StoEKS, 11
  stochastic Krylov subspace method, 127
Subthreshold leakage, 39
  power grid, 107
  statistical leakage analysis, 41
Supply voltage, 263
Supply voltage adjustment
  yield optimization, 273
SVD
  mismatch, 245
  singular-value-decomposition, 239
Switching segment, 89
Symbolic analysis, 13
  performance bound analysis, 223
Symbolic cancellation
  performance bound analysis, 223

T
Taylor expansion, 118
  capacitance extraction, 166
  mismatch, 240
TBR
  truncated balanced realization, 146
Tensor product
  capacitance extraction, 171
Threshold voltage
  power grid, 107
  statistical leakage analysis, 41
Timing constraint, 276
Total power, 10, 93
TPWL
  mismatch, 246
  trajectory-piecewise-linear, 236
Trajectory-piecewise-linear macromodeling, 13
Transition waveform
  dynamic power analysis, 86
Truncating
  TBR, 146

U
Uniform binning scheme, 277
Upward pass, 185

V
Valid voltage segments
  yield, 276
VarETBR
  variational TBR, 11
Variance, 17–18
  inductance extraction, 211
  mismatch, 241
  statistical leakage analysis, 46, 59
  yield, 261
Variation
  capacitance extraction, 167
  yield, 257
Variation-aware design
  inductance extraction, 209
Variational current source
  power grid, 128
Variational response Gramian, 151
Variational transfer function
  performance bound analysis, 226
VarPMTBR
  variational Poor man's TBR, 145
Virtual grid
  dynamic power analysis, 10, 87
  statistical leakage analysis, 67
Virtual variables, 10
Voltage binning method, 13
  yield optimization, 273
Voltage binning scheme
  yield, 275

W
Wafer-level variation, 7
Wire thickness
  power grid, 111
Wire width
  power grid, 111
Worst case (corner)
  mismatch, 235
  performance bound analysis, 221
  power grid, 111
  statistical leakage analysis, 39
  yield, 260
WPFA
  weighted PFA, 26

Y
Yield estimation, 253
Yield optimization, 253
Yield sensitivity, 253