Article info

Article history:
Received 21 February 2008
Received in revised form 30 September 2008
Accepted 3 October 2008

Keywords:
Statistical static timing analysis
Process variations
Systematic variations
Random variations
Inter-die variability
Intra-die variability

Abstract

As the device and interconnect physical dimensions decrease steadily in modern nanometer silicon technologies, controlling the process and environmental variations is becoming increasingly difficult. As a consequence, variability is a dominant factor in the design of complex system-on-chip (SoC) circuits. A solution to the problem of accurately evaluating the design performance with variability is statistical static timing analysis (SSTA). Starting from the probability distributions of the process parameters, SSTA allows the probability distribution of the circuit performance to be accurately estimated in a single timing analysis run. An excellent survey on SSTA was recently published [D. Blaauw, K. Chopra, A. Srivastava, L. Scheffer, Statistical timing analysis: from basic principles to state of the art, IEEE Trans. Computer-Aided Design 27 (2008) 589–607], where the authors presented a general overview of the subject and provided a comprehensive list of references.

The purpose of this survey is complementary with respect to Blaauw et al. (2008): it presents the reader with a detailed description of the main sources of process variation, as well as a more in-depth review and analysis of the most important algorithms and techniques proposed in the literature that have been applied for an accurate and efficient statistical timing analysis.

© 2008 Elsevier B.V. All rights reserved.
Contents

1. Introduction
2. Sources of variation
   2.1. Definition and classification
        2.1.1. Inter-die variations
        2.1.2. Intra-die variations
        2.1.3. Device variations
        2.1.4. Interconnect variations
   2.2. Variation trends
3. Introduction to statistical static timing analysis
   3.1. Static timing analysis
        3.1.1. Path-enumeration and block-oriented algorithms
   3.2. Monte Carlo methods
   3.3. Probabilistic analysis methods
   3.4. Key challenges for statistical static timing analysis
4. Block-based statistical static timing analysis
   4.1. The canonical first-order delay model
   4.2. Circuit delay calculation in block-based statistical timing analysis
   4.3. Spatial correlation modeling
   4.4. Orthogonal transformations of correlated random variables
   4.5. Canonical form generalization
Corresponding author. Tel.: +39 039 603 6437; fax: +39 039 603 6251.
E-mail addresses: cristiano.forzan@st.com (C. Forzan), davide.pandini@st.com (D. Pandini).
0167-9260/$ - see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.vlsi.2008.10.002
ARTICLE IN PRESS
1. Introduction

As microelectronic technology continues to reduce the minimum feature size, and consequently to increase the number of transistors that can be integrated onto the same die in accordance with Moore's law, the gap between the designed layout and what is really fabricated on silicon is widening significantly. As a consequence, performance predicted at the design level may differ drastically from the results obtained after silicon manufacturing. Aggressive technology scaling introduces new sources of variation, while at the same time process control and tuning during fabrication become more and more difficult. Coping with variations during design has potentially significant advantages both in terms of time-to-market and reduced costs in process control. The former stem from taking the right decisions early in the design flow, even at the system level, thus considerably reducing the number of design iterations before tape-out. Furthermore, variability reduction by means of process control usually requires expensive manufacturing equipment [1]. Hence, the impact of parameter variations should be compensated with novel design solutions and tools, due to the very high cost of advanced process control techniques [2,3]. Following technology scaling, while steadily shrinking in absolute terms, process variations are growing as a percentage of increasingly smaller geometries [4,5]. Moreover, variability sources grow in number as the process becomes more complex, and correlations between different sources of variation and a general quality figure of the process are becoming more and more difficult to predict.

Manufacturing variations introduce the following yield loss mechanisms:

Catastrophic yield loss: Fabricated chips do not function correctly.
Parametric yield loss: Fabricated chips do not perform according to specification (they may not be as fast as predicted during

[Figure: yield plot; labels recovered from the original: "Defect Based", "Lithography Based", "Dummy fill".]

Recently, a strong research effort has been devoted to this topic, and this survey is focused on parametric yield loss.

Typically, the methodology to determine the circuit timing performance spread under variability is to run multiple static timing analyses (STA) at different process conditions, i.e., “cases” or “corners”, which include the “best-”, “nominal-” and “worst-case”. A process corner (or corner in short) is a set of values assigned to all process parameters to bound the circuit performance. The worst-case corner is defined as the corner with every parameter at the $\mu \pm 3\sigma$ value, such that a typical circuit has the smallest slack. However, it is worth pointing out that determining the real worst-case corner is very difficult (if not impossible) without an explicit enumeration of all corners, since the circuit slack is a non-monotonic function of variation parameters. This approach is breaking down because the increasing number of independent sources of variation would require too many timing analyses. In fact, the corner-case approach necessitates up to $2^n$ runs, where $n$ is the number of significant sources of variation. In Table 1, a list of the principal variability sources in advanced
silicon technologies and their impact on delay is reported [6], and a complete case analysis taking into account all these variations may need from $2^7$ up to $2^{20}$ timing analyses! A possible solution to reduce the number of timing analyses is to design and verify in the worst/best-case corner. Worst/best-case timing analysis determines the chip performance by assuming that worst/best process and operating conditions exist simultaneously. Therefore, the delay of each circuit element is computed under these conditions. Since only the performance extreme values are of interest, neither the details of the performance probability density function (PDF), nor the distribution of the single parameters are necessary. This approach is based on the assumption that if a circuit works correctly under the most pessimistic conditions, then it will function under nominal conditions. Hence, designing in worst-/best-case would automatically take into account the nominal case. However, considering the corner values for each electrical parameter may lead to over-pessimistic performance estimation, since the actual correlation between electrical parameters is not considered. In other words, the scenario with all parameters in their worst-/best-case values has a very small probability of occurring in practice, and in several cases it cannot happen at all. As an example, by considering the variation impact on delay reported in Table 1, the worst-case approach will give a [−65%, +80%] guard-band timing interval, thus leading to a strong underutilization of the technology. Furthermore, within-die (WID) variations have become a non-negligible component of the total variations [4,5]. These variations may be handled by existing corner-case design methodology only by applying different derating factors for datapath and clock-path delay, and/or by introducing large uncertainty margins, resulting in either an over- or under-estimation of the circuit delay, depending on the circuit topology. Another drawback of the traditional worst-case methodology is that it cannot provide information about the design sensitivity to different process parameters, which could potentially be very useful to obtain a more robust design implementation. Examples of worst-case approaches can be found in [7,8].

A potential solution to the problem of accurately evaluating the design performance with variability is statistical static timing analysis (SSTA). Starting from the probability distributions of the sources of variation, SSTA computes the probability distribution of the design slack in a single analysis. An example of the design slack distribution is illustrated in Fig. 2. The plot indicates that for a slack of 200 ps the parametric yield of the design will be close to 100%, while for a slack of 300 ps the yield drops to about 0%. The slack distribution information may yield several advantages. For products that are at-speed tested and binned like microprocessors, it allows predicting the number of chips that will fall into the high-frequency bin, and consequently be sold for the highest profit. More generally, it allows estimating the true operating frequency. In contrast, for ASICs, it permits early decision making on risk management at chip level. Another important output from SSTA is diagnostics, enabling a designer or an automatic optimization tool to improve the overall circuit performance and robustness, by exploiting the sensitivity of the arrival times to different sources of variation. Therefore, SSTA will simultaneously allow targeting high performance while providing quantitative risk management [9].

This survey is organized as follows: in Section 2 the most important sources of device and interconnect variations are introduced and classified. In Section 3, the formulation of the SSTA problem, the key challenges, and the different approaches are presented, while the main algorithms and techniques adopted in modern block-based SSTA are described in Section 4. Finally, Section 5 presents some concluding remarks.

2. Sources of variation

Process variations in both interconnect and devices dictate more conservative design margins. Therefore, understanding how much variability exists in a given design and its impact on timing and power performance is becoming a critical issue. In the following sections, the impact of different variability sources is analyzed.

2.1. Definition and classification

Variation is the deviation from designed values for a layout structure or circuit parameter. The electrical performance of VLSI ICs is impaired by two principal sources of variation:

Environmental variations, which arise during the circuit operation, and include fluctuations in power supply, switching activity, and die temperature. These variations are time-dependent and have a large range of temporal time constants that vary from the nanosecond to millisecond for temperature effects. Therefore, they are also called temporal (or dynamic) variations, and directly impact the parametric yield.
Physical variations, which arise during manufacturing and result in structural device and interconnect parameter fluctuations. They include lithography-induced systematic and random variations in critical device dimensions such as transistor length and width, as well as wire and via width. Moreover, they also include random phenomena like the impact of discrete dopant fluctuations on MOSFET threshold voltage, and systematic phenomena like inter-layer dielectric thickness
[Fig. 2: parametric yield (vertical axis, 0 to 1.2) versus slack in ns (horizontal axis, -0.5 to 0.5).]
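As a rough illustration of how a slack distribution translates into a parametric yield curve of the kind referenced as Fig. 2, the following Python sketch assumes that the design slack is normally distributed. The mean (250 ps), the standard deviation (15 ps), and the function name `parametric_yield` are hypothetical choices made only for this example; they are not taken from the survey.

```python
from statistics import NormalDist

# Assumed (hypothetical) slack distribution of a design, in picoseconds.
SLACK_MEAN_PS = 250.0
SLACK_SIGMA_PS = 15.0


def parametric_yield(required_slack_ps, mean=SLACK_MEAN_PS, sigma=SLACK_SIGMA_PS):
    """Fraction of manufactured chips whose slack is at least `required_slack_ps`.

    Under the normality assumption this is 1 - CDF(required_slack_ps).
    """
    return 1.0 - NormalDist(mean, sigma).cdf(required_slack_ps)


if __name__ == "__main__":
    # Sweep the required slack to trace a yield curve.
    for s in (200, 225, 250, 275, 300):
        print(f"required slack {s} ps -> yield {parametric_yield(s):.4f}")
```

With these assumed numbers the yield is close to 1 for a requirement well below the mean slack and close to 0 well above it, which is the qualitative behavior described in the text.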
2.1.4. Interconnect variations

The interconnect variations, also denoted as Back-End-of-the-Line (BEOL) variations, consist of the following components:

Metal thickness T variations, due to deposition deviations in conventional metal interconnects, or dishing and erosion fluctuations in damascene (i.e., copper polishing) processes.
Dielectric thickness H or ILD variations, caused by fluctuations of deposited or polished oxide films. Furthermore, the CMP process can introduce strong ILD variations across the chip.
Line width W and line space S variations, due to photolithography and etch dependencies. At the smallest dimensions (lower metal levels), proximity and photolithographic effects may be important, while at higher levels etch effects depending on line width and local layout can be more significant.
Line edge roughness (LER), due to the photolithographic and etching steps.

ing variations are increasing relative to their nominal values, as illustrated in Fig. 4. Furthermore, the intra-die variations are also increasing significantly, as shown in Fig. 5, which reports the ratio between WID and total variations for some key device and interconnect parameters. Following the technology scaling trends, CMOS devices are expected to continue shrinking over the next two decades, but as they approach the dimensions of the silicon lattice, they can no longer be described, designed, modeled, or interpreted as continuous semiconductor devices. Fig. 6 illustrates a 22 nm (physical gate length) MOSFET expected in mass production before 2010 according to the 2003 ITRS roadmap [15], where there may be fewer than 50 Si atoms along the channel. In these devices, random discrete dopants, atomic-scale interface roughness, and line-edge roughness will introduce large intrinsic
Table 2
Variation components in 90 nm CMOS technology.
Table 3
Technology process parameter (nominal/3σ variations) trends.
Year | Leff (nm) | Tox (nm) | Vth (mV) | W (μm) | H (μm) | ρ (mΩ)
Fig. 9. A simple combinational circuit (left) and its corresponding timing graph (right).
delay distribution is derived from the collection of output delays. With a sufficient number of trials, the output distribution can be predicted with a measurable confidence. An estimation of the timing yield is then obtained by considering the fraction of samples for which the timing constraint is satisfied. MC-based

where $\Omega$ is a finite domain, $X$ is a vector variable representing the process parameters, and $f(X)$ is the PDF on $X$. If $g(X)$ is a function that evaluates to 1 when the circuit delay is within the specifications and 0 otherwise, then the value of the integral $G$ is the circuit yield. The MC estimate of $G$ is obtained by drawing a set of samples $X_1, X_2, \ldots, X_N$ from $f(X)$ and letting the estimator $G_N$ be given by the following expression:

\[
G_N = \frac{1}{N} \sum_{i=1}^{N} g(X_i). \tag{2}
\]

The variance reduction techniques typically reduce the number of MC simulations required to accurately estimate (i.e., with small variance) the value of the finite integral (1) by means of expression (2). The work in [24] focused on the importance sampling and the control variates techniques. The first method biases the choice of the samples from the process parameter space towards areas where the circuit delay violates the timing constraints (called important regions). Mathematically, the technique is based on drawing the samples for $X$ from another distribution $\tilde{f}$ in order to reduce the variance of the estimator $G_N$. Integral (1) is then written as

\[
G = \int_{\Omega} \frac{g(X)\, f(X)}{\tilde{f}(X)}\, \tilde{f}(X)\, dX
\]

and if $X_1, X_2, \ldots, X_N$ are drawn from $\tilde{f}$ instead of $f$, the new estimator is expressed as

\[
\tilde{G}_N = \frac{1}{N} \sum_{i=1}^{N} \frac{g(X_i)\, f(X_i)}{\tilde{f}(X_i)}.
\]

Ideally, the choice of $\tilde{f}$ that minimizes the variance of the estimator $\tilde{G}_N$ is given by

\[
\tilde{f}_{\mathrm{ideal}}(X) = \frac{g(X)\, f(X)}{G},
\]

but in practice $\tilde{f}_{\mathrm{ideal}}$ cannot be realized since the value of $G$ is not known a priori. Instead, a function $\tilde{f}$ “similar” to $\tilde{f}_{\mathrm{ideal}}$ is typically used.

In the control variates approach, a function $h(X)$ that “correlates well” with $g(X)$ is used. The function $h$ must be chosen so that the integral

\[
H = \int_{\Omega} h(X)\, f(X)\, dX
\]

can be evaluated with zero or very low variance. With $D(X) = g(X) - h(X)$, the estimator becomes

\[
G_{cm} = H + \frac{1}{N} \sum_{i=1}^{N} D(X_i).
\]

Since $H$ can be estimated with zero or very low variance, and all $D(X_i)$ values (and therefore their contribution to the total variance) are very small, a variance reduction is then obtained.

In order to be effective, these techniques require a function that approximates $g(X)$ well. In [24] the authors first defined the timing yield as an integral in the form of (1), by defining an indicator variable $I(S, X)$ that evaluates to 1 if the circuit delay does not meet the timing target, and 0 otherwise. The variable $S$ represents the fixed design parameters for the circuit. Therefore, the yield computation can be expressed as in (1):

\[
\mathrm{Loss}(S) = 1 - \mathrm{Yield}(S) = \int I(S, X)\, f(X)\, dX.
\]

Then, for estimating the timing yield, it was proposed to use the logical effort approximation to obtain a function that approximates $I(S, X)$ and has the mathematical properties required by the variance reduction methods. In [24] the control variates technique is used in conjunction with importance sampling; however, no experimental results were presented. The work in [25] presented an efficient formulation of the importance sampling method, called mixture importance sampling, for statistical SRAM design and analysis. To produce more samples in the important region, where the delay does not meet the target, the authors proposed to distort the (natural) sampling function by using an appropriate mixture of distributions, including a shifted Gaussian and a uniform distribution. The reported results demonstrated some efficiency and accuracy improvement against the standard MC analysis. A further application of the importance sampling technique to speed up path-based MC simulations for statistical timing analysis was proposed in [26].

Fig. 11. Example of LHS sampling with N = 8, k = 3: (a) sampling of a variable in equal probability bins and (b) forming triplets by randomly combining individual samples [27].

Another variance reduction technique suitable for parametric yield estimation is Latin Hypercube Sampling (LHS). The advantage of LHS over the importance sampling and control variates techniques is that it does not require any knowledge of the system under consideration, and is therefore general and scalable. LHS attempts to ensure that the chosen samples are spread more or less uniformly in the sample space. In a simple version, LHS generates $N$ samples from a sample space of $k$ random variables $X = [X_1, X_2, \ldots, X_k]$ in the following manner. The range of each variable is partitioned into $N$ non-overlapping intervals of equal probability size $1/N$. One value is chosen at random from each of these $N$ intervals for every variable, and the $N$ values thus obtained for $X_1$ are randomly paired with the $N$ values obtained for $X_2$. This results in $N$ pairs that are combined randomly with the $N$ values of $X_3$ to form $N$ triplets. The procedure continues until $N$ $k$-tuples are obtained. Fig. 11 illustrates the LHS sampling algorithm for the three-variable case [27]. LHS achieves variance reduction
in very general cases and can be effectively combined with other techniques for variance reduction. In [27], a Criticality Aware Latin Hypercube Sampling (CALHS) approach is introduced to improve the efficiency of MC-based statistical timing analysis. Timing criticality information is used to partition the process space into mutually exclusive strata. Then, the LHS technique determines an appropriate set of samples in these strata. By assuming that process variations can be represented as a linear combination of orthogonal random variables, and by assuming a linear relationship between the gate delay and the principal components of all the parameters and the uncorrelated random component (the validity of both the above assumptions will be discussed in the next section), the results in [27] showed about 7× reduction in the number of samples compared to random sampling. Moreover, the MC-based SSTA with CALHS computed the 99th percentile circuit delay with about 50% less error than a traditional SSTA-based approach.

Another variance reduction technique is the Quasi-Monte Carlo (QMC) method. The error bound for numerically estimating integral (1) by using a sequence of samples can be related to a mathematical measure of uniformity for the distribution of the points, called “discrepancy”. This suggests that sequences with the smallest discrepancy should be used to evaluate the function in order to achieve the smallest possible error bound. Such sequences constructed to reduce discrepancy are called Low Discrepancy Sequences (LDS) and they are deterministic. QMC techniques are characterized by using LDSs to generate samples. However, their exploitation in SSTA is not straightforward, since when the problem dimension increases, there is degraded uniformity (pattern dependency, [28]). To minimize this effect, the concept of criticality of variables was introduced in [29], where a technique for variable ordering based on their criticality with respect to circuit delay is proposed. The variables are separated into critical, moderate, and non-critical ones. Then, the variance reduction techniques are applied where they are most effective. For the top-most critical variables, the stratified sampling technique is used, so that a given accuracy is reached faster. Only the top 2–5 variables are used to guide stratification since the number of strata increases exponentially with the number of variables. QMC methods are employed on the top-most to moderately critical variables for their fast convergence properties. Because of pattern dependency, only a limited number of variables are sampled with QMC. Therefore, on the non-critical variables, the LHS technique is adopted. This approach, called Stratification+Hybrid QMC (SH-QMC), achieved on average about 24× reduction in the number of samples required for timing estimation compared to a random sampling approach. Moreover, SH-QMC is suitable for incremental timing analysis, when a fast recomputation of the circuit delay with small changes in the design is necessary. In fact, if the samples for SH-QMC on circuit C are reused for C′ (C with small changes), then most samples need not be re-evaluated to recompute the xth percentile delay; only those samples with a circuit arrival time close enough to the xth percentile delay of C need to be re-evaluated.

However, although these techniques improve the performance of MC-based SSTA, and some limitations can be discussed and possibly removed [30], there is general agreement that more research is required to assess whether MC methods can be effective for the timing yield estimation of large system-on-chip (SoC) designs.

3.3. Probabilistic analysis methods

While MC techniques are based on sample space enumeration, other methods explicitly model timing quantities such as delays, arrival times, and slacks as probability distributions; they are referred to as Probabilistic Analysis Methods. The equivalent timing graph is probabilistic, and delays are random variables, as illustrated in Fig. 12. Therefore, the probability distribution of the circuit performance under the influence of parameter variations can be predicted with a single timing analysis. The problems of unnecessary risk, excessive number of timing analyses, and pessimism are all potentially avoided. Moreover, the WID variations, which are random in nature, are actually considered as statistical quantities during the analysis. Finally, other phenomena can be considered statistically, such as [9]:

The inaccuracy of the model-to-hardware correlation can be treated statistically to reduce pessimism.
Aging and fatigue effects such as negative bias temperature instability (NBTI), hot electron effects, and electromigration can be considered with probabilistic techniques.
Coupling noise can be probabilistically integrated into a unified timing verification environment. However, coupling effects are typically not considered as variability sources. SSTA algorithms including coupling effects will be discussed in Section 4.7.

A typical SSTA tool accepts additional input information with respect to a traditional timing analyzer, including the sources of variation and their probability distributions, variances and covariances. Moreover, it is possible to compute the dependence of the cell delay and slew on the sources of variability. The main output of the tool is the probability distribution of the slack and probabilistic diagnostics.
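For comparison with the probabilistic methods above, the sampling-based timing yield estimators discussed earlier (the plain MC estimator (2) and a simple Latin Hypercube Sampling scheme) can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the linear delay model, the sensitivities in `SENS`, the timing target `TARGET`, and all function names are hypothetical and do not come from the survey or from the cited works.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical linear delay model (ns): nominal delay plus sensitivities to
# three independent, standard-normal process parameters.
NOMINAL = 1.0
SENS = np.array([0.05, 0.03, 0.02])
TARGET = 1.1  # assumed timing constraint (ns)


def g(x):
    """Indicator g(X): 1 where the delay meets the target, 0 otherwise."""
    return (NOMINAL + x @ SENS <= TARGET).astype(float)


def exact_yield():
    """Closed-form yield, available only because the toy model is linear."""
    sigma = float(np.sqrt(np.sum(SENS ** 2)))
    return NormalDist(NOMINAL, sigma).cdf(TARGET)


def mc_samples(n, k, rng):
    """Plain Monte Carlo: n independent draws of k standard normals."""
    return rng.standard_normal((n, k))


def lhs_samples(n, k, rng):
    """Simple LHS: per variable, one draw in each of n equal-probability
    bins, then a random permutation so that bins are paired at random."""
    u = (np.arange(n)[:, None] + rng.random((n, k))) / n
    for j in range(k):
        u[:, j] = rng.permutation(u[:, j])
    u = np.clip(u, 1e-12, 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    return np.vectorize(NormalDist().inv_cdf)(u)


def yield_estimate(sampler, n, rng):
    """Estimator (2): average of g over the n samples."""
    return float(np.mean(g(sampler(n, len(SENS), rng))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print("exact:", exact_yield())
    print("MC   :", yield_estimate(mc_samples, 2000, rng))
    print("LHS  :", yield_estimate(lhs_samples, 2000, rng))
```

Repeating both estimators over many seeds and comparing the spread of the results is one way to observe the variance reduction that LHS is meant to provide; the size of the benefit depends on the delay model and is not guaranteed by this sketch.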
3.4. Key challenges for statistical static timing analysis potentially statistically critical paths may be missed, as illustrated
in Fig. 13. This plot shows the probability that a given path is in
Taking spatial correlations into account is a crucial require- the top 50 worst-case paths on a given die. The paths are ranked
ment for SSTA [31]. There are several kinds of correlation that on the x-axis by margins (computed deterministically with worst-
must be considered. The first ones are structural correlations case STA) at the latching flip-flops. As shown in Fig. 13, several
introduced by different data paths sharing some standard cells, paths with rank higher than 100 show up in the top 50 paths for
otherwise known as reconvergent fanouts. The second type of the block on 10% of the dies. This result demonstrates that
correlation is related to spatial proximity: devices and wires that deterministic timing analysis may not give an accurate path
are within the same layout region exhibit very similar parameter ordering [32]. All path-based methods have the fundamental
variations, because they are caused by the same manufacturing limitation that the number of paths is too large and some
sources. For instance, standard cells close to each other are likely heuristics must be used to limit the critical paths considered for
to have very small channel length variation; therefore, their delays detailed analysis. On the other hand, block-based approaches,
are also quite similar. Moreover, it is very likely that transistors while computationally more efficient, suffer from a lack of
and interconnects within the same layout region also have similar accuracy especially due to the statistical max/min operation. In
temperature and power supply values. Hence, this type of the next section, the main approaches proposed in the literature
correlation is known as spatial correlation. addressing the challenges discussed above will be analyzed,
Another challenge is represented by the delay modeling for focusing the attention on the block-based approach, which
cells and interconnects. While most process variations can be enables SSTA on multi-million gate designs in a reasonable
described by means of a normal distribution, this is not amount of time.
necessarily the case for the delay variations introduced by such
process variations. In order to simplify calculations and reduce the
overall computational effort for SSTA, most approaches assumed a linear dependency of delay on process variations. Recently, higher-order models have been proposed, while analytical modeling of gate-level behavior has not received much attention as yet. The propagation of delay distributions through a circuit represents another critical issue in SSTA. After the delay distribution of all circuit components has been modeled, the delay of an entire circuit needs to be computed. Operations of fundamental importance in block-based analysis are the sum and the max/min of random variables. In particular, for the max/min operation, it is computationally very expensive to determine the exact result. Therefore, most of the proposed approaches make the simplifying assumption that the result of these operations is also a normal distribution.

A critical topic is related to the different algorithmic approaches used to compute the delay distribution, i.e., path-based or block-based, which may differ significantly in terms of both accuracy and computational complexity. Due to the large computational effort necessary for path-based analysis, in [31] it was proposed first to run traditional STA, and then to analyze only the n-most critical paths accurately using SSTA. However, some

4. Block-based statistical static timing analysis

One of the most useful approaches for circuit analysis and optimization is parameterized statistical timing analysis. This technique considers gate and wire delays as functions of the process parameters. Using this representation, parameterized statistical timing analysis computes circuit timing characteristics (arrival times, delays, timing slacks) as functions of the same parameters. Knowing explicit dependencies of timing characteristics on process parameters has two main advantages. First, by combining this information with the parameter statistics, we can compute the probability distribution of circuit delay and predict manufacturing yield. Then, this information can be used for circuit optimization, improving the design robustness and manufacturing line tailoring. In contrast, non-parameterized statistical timing analysis cannot compute relations between circuit timing characteristics and process parameters [33–36]. The most important works on parameterized SSTA using a block-based approach were proposed by Visweswariah et al. [37], and Chang and Sapatnekar [38]. The work of Visweswariah et al. was one of the first statistical timing methods that were exploited in an industrial tool by IBM, called EINSSTAT.
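As a rough illustration of the first advantage, the sketch below estimates timing yield from a circuit delay expressed as a linear function of the process parameters. It assumes, for illustration only, that the parameter variations are independent standard normal variables, so that the delay is Gaussian with a variance equal to the sum of the squared sensitivities. The function names, argument names, and numeric values are hypothetical.

```python
import math


def delay_sigma(sensitivities, random_sensitivity):
    """Standard deviation of a linear delay model with independent N(0, 1) parameters."""
    return math.sqrt(sum(a * a for a in sensitivities) + random_sensitivity ** 2)


def timing_yield(mean, sensitivities, random_sensitivity, clock_period):
    """Probability that the (assumed Gaussian) circuit delay does not exceed clock_period."""
    sigma = delay_sigma(sensitivities, random_sensitivity)
    if sigma == 0.0:
        # Degenerate case: the delay is deterministic.
        return 1.0 if mean <= clock_period else 0.0
    z = (clock_period - mean) / sigma
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical numbers: nominal delay 1.0 ns, two global sensitivities, no random term.
example_yield = timing_yield(1.0, [0.03, 0.04], 0.0, clock_period=1.05)
```

With these numbers the delay standard deviation is 0.05 ns, so the clock period sits one standard deviation above the mean delay.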
Sapatnekar [38]. The method is outlined in Fig. 15. However, the correlation (i.e., covariance) between the independent sources of variations ($\Delta R_a$ in canonical first-order form (3)) is not preserved. Moreover, during the recursive computation of the n-variable max function, some inaccuracy can be introduced since the max is approximated by a normal distribution even though it is not normal. Such inaccuracy is exacerbated when proceeding with further recursive calculations. Therefore, as the number of variables increases, a larger error can be introduced. Moreover, the loss in accuracy of the final result is dependent on the ordering of the pair-wise max operations. The max operation on n Gaussians is analogous to the construction of a binary tree with n leaves such that each internal node computes the max of its two children. In [43] the above tree is referred to as Max Binary Tree (MBT). Novel approaches for constructing good MBTs to reduce the inaccuracy of the max of n Gaussians have been proposed and analyzed in [43]. The experimental results of the proposed methods showed an accuracy improvement in variance estimation of up to 50% compared with the traditional approach.

The sum operation between two random variables (timing quantities) in canonical form, D = A + B, can be easily expressed in canonical form

$$D = (a_0 + b_0) + \sum_{i=1}^{n} (a_i + b_i)\,\Delta X_i + \sqrt{a_{n+1}^2 + b_{n+1}^2}\;\Delta R_d. \tag{9}$$

Therefore, by replacing the sum (difference) and max (min) operations with probabilistic equivalents, and by re-expressing the result in canonical form after each operation, SSTA can be carried out by a standard forward and backward propagation through the timing graph.

[Figure: nested pair-wise max operations applied to $d_1, d_2, d_3, d_4, \ldots, d_n$, yielding $\max(d_1, \ldots, d_n)$.]
Fig. 15. Recursive computation of n-variable max function.

It is important to notice that the canonical first-order delay model (3) employed for all timing quantities allows considering both global correlations and independent randomness, but it does not take into account the spatial correlations, which can be handled by means of derating factors. However, considering the spatial correlations by means of derating factors will yield inaccurate results in statistical timing analysis, which might be either pessimistic or risky. As such, spatial correlations must be included, and different modeling techniques will be discussed in the next section.

4.3. Spatial correlation modeling

Not every timing quantity depends on all global sources of variation, and the works [38,45,46] suggest methods for modeling parameter variations by having the delay of gates and wires in physically different die regions depending on different sets of random variables. The approach proposed in [45] is mainly focused on device channel length variability, but it can be straightforwardly extended to other process variations. The total channel length $L_{total,k}$ of device k is the algebraic sum of nominal channel length, inter-die channel length variation, and intra-die channel length variation:

$$L_{total,k} = L_{nom} + \Delta L_{inter} + \Delta L_{intra,k}, \tag{10}$$

where $\Delta L_{inter}$ and $\Delta L_{intra,k}$ are random variables, and $L_{nom}$ represents the mean of the channel length across all possible dies, which is equal to the nominal value of the device channel length. All devices on a die share one variable $\Delta L_{inter}$ for the inter-die component of their total channel length variation, which represents a variation of the mean of all the devices of a particular die. $\Delta L_{intra,k}$ is the variation of an individual device from this die mean. If the spatial correlation of intra-die variations is not considered, then each device is represented with a separate independent random variable $\Delta L_{intra,k}$, where all random variables $\Delta L_{intra,k}$ have identical probability distributions. Based on the assumption that for small variations the change in gate delay is linear with respect to the change in channel length, the delay of gate k can be expressed as

$$d_k = D_{nom} + a\,\Delta L_{inter} + a\,\Delta L_{intra,k}, \tag{11}$$

where a is the sensitivity of the delay with respect to the channel length computed at the nominal device channel length. In (10) the intra-die variation of channel length is modeled by assigning an independent random variable for each gate. However, in presence

[Figure: quad-tree partitioning with regions labeled (level, index): level 0: (0,1); level 1: (1,1) to (1,4); level 2: (2,1) to (2,16).]

the gate delay:

$$d_k = D_{nom} + a\left(\Delta L_{inter} + \sum_{0 \le l \le m;\ r\ \text{intersects}\ k} \Delta L_{l,r} + \Delta L_{random,k}\right). \tag{14}$$
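The effect of sharing region variables in (14) can be sketched numerically. The example assumes, purely for illustration, that all region variables are mutually independent and that every variable on a given quad-tree level has the same variance. The helper names and the variance values are hypothetical. Regions are identified by (level, index) pairs, and only the intra-die part of the variation is considered.

```python
def shared_variance(regions_i, regions_j, level_variance):
    """Covariance of the intra-die terms of two distinct gates.

    With independent region variables, only the regions common to both
    gates contribute, each with the variance of its quad-tree level.
    """
    shared = set(regions_i) & set(regions_j)
    return sum(level_variance[level] for level, _index in shared)


def intra_die_correlation(regions_i, regions_j, level_variance, random_variance):
    """Correlation coefficient between the intra-die variations of two distinct gates."""
    var_i = sum(level_variance[level] for level, _ in regions_i) + random_variance
    var_j = sum(level_variance[level] for level, _ in regions_j) + random_variance
    cov = shared_variance(regions_i, regions_j, level_variance)
    return cov / (var_i * var_j) ** 0.5


# Hypothetical variances: one unit per level and one unit for the gate-specific term.
level_variance = {0: 1.0, 1: 1.0, 2: 1.0}
gate_a = [(0, 1), (1, 2), (2, 6)]   # gate located in level-2 region 6
gate_c = [(0, 1), (1, 2), (2, 8)]   # gate located in level-2 region 8
gate_e = [(0, 1), (1, 4), (2, 14)]  # gate located in level-2 region 14

corr_ac = intra_die_correlation(gate_a, gate_c, level_variance, 1.0)
corr_ce = intra_die_correlation(gate_c, gate_e, level_variance, 1.0)
```

Gates a and c share two region variables while gates c and e share only the level-0 variable, so under these assumptions the first pair is more strongly correlated than the second.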
The covariance matrix can be determined from data extracted from manufactured wafers [47]. However, if real silicon data is not available, the correlation matrix can also be derived from the spatial correlation model proposed in [45,46].

It is believed that the correlation model proposed in [38] is more general than the model described in [45,46], since it is purely based on neighborhood. For example, consider the case in Fig. 18, where the 4 × 4 grids are numbered according to the quad-tree partitioning of Fig. 16. Following the model proposed in [38], the intra-die device length in grid (2,8) has equal correlations with that in grid (2,6) and (2,14), while by the model described in [45] it will have higher correlation with grid (2,6) than grid (2,14), i.e., the correlations are uneven at the two neighbors of grid (2,8), as summarized in

$$\begin{aligned}
\Delta L_{intra,a} &= \Delta L_{2,6} + \Delta L_{1,2} + \Delta L_{0,1} + \Delta L_{random,a},\\
\Delta L_{intra,c} &= \Delta L_{2,8} + \Delta L_{1,2} + \Delta L_{0,1} + \Delta L_{random,c},\\
\Delta L_{intra,e} &= \Delta L_{2,14} + \Delta L_{1,4} + \Delta L_{0,1} + \Delta L_{random,e}.
\end{aligned} \tag{16}$$

We can observe from (16) that gates in squares (2,6) and (2,8) are strongly correlated, as they share the common variables $\Delta L_{1,2}$ and $\Delta L_{0,1}$. On the other hand, gates in squares (2,8) and (2,14) are weakly correlated as they share only the common variable $\Delta L_{0,1}$.

(2,6)   (2,8)   (2,14)  (2,16)
(2,5)   (2,7)   (2,13)  (2,15)
(2,2)   (2,4)   (2,10)  (2,12)
(2,1)   (2,3)   (2,9)   (2,11)
Fig. 18. Quad-tree partitioning (level 2). Gates a, c, and e are marked on the grid.

Another approach for spatial correlation modeling was proposed in [48]. A uniform grid is imposed on the placed netlist to partition the gates into spatial regions, as shown in Fig. 19, similarly to the technique proposed in [38]. The variation of a process parameter P can be represented as a linear combination of four independent random components $P_1$, $P_2$, $P_3$, and $P_4$, with zero mean and finite variance, which are random variables corresponding to the four corners of the chip (as depicted in Fig. 19). For any gate j, the corresponding parameter $P_j$ can be modeled as

$$P_j = a_0 + a_1 P_1 + a_2 P_2 + a_3 P_3 + a_4 P_4, \tag{17}$$

where $a_0$ is the nominal value of parameter $P_j$. For any placed gate j we can compute the grid-based radial distance from the four corners of the placement, i.e., $R_1$, $R_2$, $R_3$, and $R_4$ in Fig. 19. The coefficients $a_1$, $a_2$, $a_3$, and $a_4$ in (17) can be computed by using these radial distances with an appropriate function H(R) as follows:

$$a_1 = H(R_1);\quad a_2 = H(R_2);\quad a_3 = H(R_3);\quad a_4 = H(R_4). \tag{18}$$

The random variables $P_1$, $P_2$, $P_3$, and $P_4$ can have any arbitrary distributions, depending on the distribution of the parameter $P_j$. Hence, if two gates are far apart, they will have different contributions from the four components $P_1$, $P_2$, $P_3$, and $P_4$, and will have a weak correlation. In contrast, if they are placed close by, the four coefficients (18) will be similar, and a stronger spatial correlation will exist between them. This approach to model the spatial correlations is similar to the method proposed in [46]. However, in [46] the number of underlying variables to capture the spatial correlations is potentially higher, whereas in the approach proposed in [47] only four variables are necessary for each parameter. The importance of including spatial correlations in statistical timing analysis was demonstrated in [46], where ignoring such correlations may yield an underestimation of the computed variability.

Fig. 19. Grid-based radial spatial correlation model [48].

Fig. 20. Average correlation vs. distance [49]. (Axes: distance in mm, from 0 to 18; average correlation, from 0 to 0.8.)

The correlation models proposed in [38,45] were analyzed in [49], based on critical dimension (CD) data obtained through electrical linewidth measurements (ELM) of a 130 nm test chip consisting of 8 different test structures (various densities and orientations of polysilicon lines with OPC included). Five different wafers were investigated, each wafer containing 23 fields, and each field including 308 measurement points: 14 points in the horizontal direction and 22 points in the vertical direction. It was demonstrated that correlation is not monotonically decreasing with distance, as shown in Fig. 20, where it is evident that correlation vs. horizontal distance is different from correlation vs. the vertical distance (distance is not the key component to correlation, which is typically stronger along a particular axis). Moreover, it was reported that the number of
principal components (from Principal Component Analysis) necessary to obtain accurate results with the grid-based approach presented in [38] is about 3, while for the quad-tree method [45] any number of levels above 3 did not give any significant improvement in terms of accuracy. The results presented in [49] demonstrate that both the grid-based approach [38] and the quad-tree method [45] provide an accurate estimation of the actual mean and variance of the circuit delay distributions. However, another interesting result reported in [49] is that much simpler models (i.e., the die-to-die plus random model) for spatial correlations can also yield a good accuracy, within a few percent of the grid-based models.

4.4. Orthogonal transformations of correlated random variables

In SSTA, when both the spatial correlations and the structural correlations due to reconvergent fanouts are taken into account, the overall correlation composition becomes very complicated. To make this problem tractable, in [38] the principal component analysis (PCA) technique is used to transform a set of correlated parameters into an uncorrelated set. Given a set of correlated random variables $\vec{X}$ with a covariance matrix $\Sigma$, PCA can transform the set $\vec{X}$ into a set of mutually orthogonal random variables $\vec{X}'$, such that each member of $\vec{X}'$ has zero mean and unit variance. The elements of the set $\vec{X}'$ are called principal components (PCs) in PCA, and are mathematical abstractions that cannot be directly measured. The size of $\vec{X}'$ is no larger than the size of $\vec{X}$, and any variable $x_i \in \vec{X}$ can be expressed in terms of the PCs $\vec{X}'$ as

$$x_i = \left(\sum_j \sqrt{\lambda_j}\, v_{ij}\, x'_j\right)\sigma_i + \mu_i,$$

where $x'_j \in \vec{X}'$ is a PC, $\lambda_j$ is the jth eigenvalue of the covariance matrix $\Sigma$, $v_{ij}$ is the ith element of the jth eigenvector of $\Sigma$, and $\mu_i$ and $\sigma_i$ are the mean and standard deviation of $x_i$, respectively. For instance, let $\vec{L}_g$ be a vector of random variables representing transistor channel length fluctuations in all grids of Fig. 17, and the set of random variables is of multivariate normal distribution with covariance matrix $\Sigma_{L_g}$. Let $\vec{L}'_g$ be the set of PCs computed with PCA. Then any random variable $L_g^i \in \vec{L}_g$ representing the variation of transistor channel length in the ith grid can be expressed as a linear function of the PCs:

$$L_g^i = \mu_{L_g^i} + a_{i1}\, l'^{1}_{g} + \cdots + a_{it}\, l'^{t}_{g},$$

where $\mu_{L_g^i}$ is the mean of $L_g^i$, $l'^{i}_{g}$ is a PC in $\vec{L}'_g$, all $l'^{i}_{g}$ are independent with zero mean and unit variance, and t is the total number of PCs in $\vec{L}'_g$. In this way, any FEOL and BEOL process random variable can be expressed as a linear function of the corresponding principal components.

Hence, by assuming that different types of process parameters are uncorrelated and by approximating the delay linearly using a first-order Taylor expansion, gate and interconnect delays are random variables that can be expressed as a linear combination of PCs of all relevant FEOL and BEOL process parameters:

$$d = d_0 + \sum_{i=1}^{m} k_i\, p'_i, \tag{19}$$

where $p'_i \in \vec{P}'$, $\vec{P}'$ is the union of the sets of principal components of each relevant process parameter, m is the size of $\vec{P}'$, and all the PCs $p'_i$ in (19) are independent. Since all $p'_i$ are orthogonal random variables with zero mean and unit variance, the variance of d in (19) can be simply computed as

$$\sigma_d^2 = \sum_{i=1}^{m} k_i^2, \tag{20}$$

while the covariance between d and any PC $p'_i$ is given by

$$\mathrm{cov}(d, p'_i) = k_i\, \sigma_{p'_i}^2 = k_i. \tag{21}$$

Moreover, if $d_i$ and $d_j$ are two random variables expressed in terms of PCs as

$$d_i = d_i^0 + \sum_{r=1}^{m} k_{ir}\, p'_r, \qquad d_j = d_j^0 + \sum_{r=1}^{m} k_{jr}\, p'_r,$$

their covariance can be computed by

$$\mathrm{cov}(d_i, d_j) = \sum_{r=1}^{m} k_{ir}\, k_{jr}.$$

In the work presented in [38], the above properties of delay in the form of Eq. (19) are used to find the distribution of circuit delay. The approach described in [38] to compute the max function of n normally distributed random variables is an extension of the method proposed in [40], which only considered uncorrelated random variables. In [38] the Gaussian distribution is used to approximate the max function $d_{max} \sim N(\mu_{max}, \sigma_{max})$ by means of a linear combination of all PCs as

$$d_{max} = \mu_{max} + \sum_{j=1}^{m} a_j\, p'_j. \tag{22}$$

Therefore, determining the approximation for $d_{max}$ is equivalent to finding $\mu_{max}$ and all the coefficients $a_j$. From (21) the coefficient $a_j$ equals $\mathrm{cov}(d_{max}, p'_j)$ and the variance of $d_{max}$ (22) can be expressed by means of (20) as

$$\sigma_0^2 = \sum_{j=1}^{m} a_j^2 = \sum_{j=1}^{m} \mathrm{cov}^2(d_{max}, p'_j). \tag{23}$$

Since (23) is an approximation, to reduce the difference between $\sigma_0^2$ and the actual variance $\sigma_{max}^2$ of $d_{max}$, the value $a_j$ can be normalized as

$$a_j = \mathrm{cov}(d_{max}, p'_j)\,\frac{\sigma_{max}}{\sigma_0}.$$

Hence, to find the linear approximation for $d_{max}$ the values of $\mu_{max}$, $\sigma_{max}$, and $\mathrm{cov}(d_{max}, p'_j)$ are necessary. Those values can be obtained by using Clark's formulas (4) and (5). This approach has similarities with [37], as they are both based on Clark's result; they differ in the fact that [37] uses its sensitivity to match variance while [38] scales all sensitivities to match variance (and thus it loses some correlation information).

Finally, in [38] an extension to consider also the intra-die spatially uncorrelated parameters was proposed. To model the intra-die variation of spatially uncorrelated parameters a separate random variable is used for each gate (wire), instead of a single random variable for all gates (wires) in the same grid as for spatially correlated parameters. After each sum or max operation the random variations for spatially uncorrelated parameters are merged into one random variable. Hence, only one independent random variable is kept for all intra-die variations of spatially uncorrelated parameters. This technique of adding an independent random variable to the standard form of timing quantities is similar to [37]. However, in the approach presented in [38], the structural correlations due to spatially uncorrelated parameters cannot be handled.

4.5. Canonical form generalization

As it was discussed in the previous sections, one of the most promising approaches for circuit analysis and optimization taking
into account parameter variability is parameterized SSTA. This technique considers gate and wire delay D as a function of process parameters $X_i$:

$$D = D(X_1, X_2, \ldots, X_n), \tag{24}$$

and Fig. 21 shows a graphical illustration of expression (24) for two process parameters. Using this description, parameterized SSTA computes circuit timing characteristics A (arrival and required arrival times, delays, timing slacks) as a function of the same process parameters:

$$A = A(X_1, X_2, \ldots, X_n). \tag{25}$$

[Figure: $D(X_1, X_2)$ plotted over the axes $X_1$ and $X_2$.]

Parameterized SSTA [37,38] assumes that all parameters have independent Gaussian probability distributions and affect gate delays linearly. The independence can be achieved by PCA. According to this assumption, gate delays are represented in first-order canonical form (3), where Fig. 22 shows the canonical form for one process parameter. In the case of multiple process parameters, the canonical form is represented by a hyper-plane defining the timing quantity (25) as a linear function of process parameters and two parallel hyper-planes bounding the 3σ region of uncertainty for the uncorrelated variation.

The assumption about the linear Gaussian nature of process parameters is very convenient for SSTA, since it allows the use of analytical formulas for computing canonical forms, thus making statistical timing analysis practical. Unfortunately, some process parameters have significantly non-Gaussian probability distributions. For example, via resistance is known to have an asymmetric probability distribution, and the dopant concentration density is also observed to be well-modeled by a Poisson distribution. Hence, a normality assumption may lead to errors. Moreover, the linear approximation is justified by small variations, but with critical feature size shrinking, the process variations are becoming larger and linear approximation is not accurate enough. For instance, delay dependence on transistor channel length ($L_{eff}$) is essentially nonlinear, and assuming linear dependency can result in substantially inaccurate results [50]. Furthermore, there is a nonlinearity source coming from the max operation, which generates a non-Gaussian delay distribution even if the input operands are Gaussian distributions. The obvious way to handle process parameters that have non-Gaussian distributions and/or affect gate delay nonlinearly is to apply efficient numerical-integration techniques [31]. However, these methods are quite expensive in runtime. A combined approach, which processes linear Gaussian parameters analytically and uses a numerical technique only for nonlinear and non-Gaussian parameters, was presented in [51]. The first-order canonical form was generalized to include non-Gaussian and nonlinear parameters, and a statistical approximation for the maximum of two generalized canonical forms was derived similarly as in the linear Gaussian case: first, a linear approximation using tightness probabilities as weighting factors is derived; then, the exact mean and variance values of the maximum of two generalized forms are computed. The first-order canonical form is generalized as

$$A = a_0 + \sum_{i=1}^{n_{LG}} a_{LG,i}\,\Delta X_{LG,i} + f_A(\Delta X_N) + a_{n_{LG}+1}\,\Delta R_a, \tag{26}$$

where $\Delta X_{LG,i}$ are linear Gaussian parameters and $a_{LG,i}$ their sensitivities, $n_{LG}$ is the number of linear Gaussian parameters, $\Delta X_N = (\Delta X_{N,1}, \Delta X_{N,2}, \ldots)$ is a vector of non-Gaussian and/or nonlinear parameters, $f_A$ is a function describing the dependence on non-Gaussian/nonlinear parameters (it should have zero mean value), and $\Delta R_a$ is a normalized Gaussian parameter for uncorrelated variation with its sensitivity $a_{n_{LG}+1}$. The generalization of the first-order canonical form (26) differs from the original one (3) only by the term $f_A(\Delta X_N)$ that describes dependencies of A on nonlinear and non-Gaussian parameters. For numerical computations, function $f_A$, which can be of arbitrary form, is represented by a table. Furthermore, there are no restrictions on the distribution of the non-Gaussian parameters that can be mutually correlated by means of a JPDF $r(\Delta X_{N,1}, \Delta X_{N,2}, \ldots)$ specified by a table for numerical computation.

Propagation of arrival time in generalized canonical form through a timing edge with delay in the same form is similar to the pure linear Gaussian case. The only difference is the summation of nonlinear functions of the arrival time and delay, which can be performed numerically by summing tables describing these nonlinear functions. Hence, the sum of two generalized canonical forms is also a generalized canonical form. The computation of the sum of two timing quantities expressed as in (26), i.e., C = sum(A, B), is expressed as in the following equation:

$$C = (a_0 + b_0) + \sum_{i=1}^{n_{LG}} (a_{LG,i} + b_{LG,i})\,\Delta X_{LG,i} + \big(f_A(\Delta X_N) + f_B(\Delta X_N)\big) + \sqrt{a_{n_{LG}+1}^2 + b_{n_{LG}+1}^2}\;\Delta R_c.$$
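The sum of two generalized canonical forms described above can be sketched in a few lines. This is a minimal illustration, assuming that the nonlinear functions of both operands are tabulated on the same grid of sample points of the non-Gaussian/nonlinear parameters; the class name, field names, and numeric values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class GeneralizedForm:
    mean: float     # a_0
    linear: list    # sensitivities to the linear Gaussian parameters
    table: list     # samples of the nonlinear function on a shared grid (assumption)
    random: float   # sensitivity to the uncorrelated Gaussian term


def add_forms(a, b):
    """Sum of two generalized canonical forms, term by term."""
    if len(a.linear) != len(b.linear) or len(a.table) != len(b.table):
        raise ValueError("operands must share parameters and table grid")
    return GeneralizedForm(
        mean=a.mean + b.mean,
        linear=[x + y for x, y in zip(a.linear, b.linear)],
        # Nonlinear parts are added numerically, table entry by table entry.
        table=[x + y for x, y in zip(a.table, b.table)],
        # Independent uncorrelated terms combine as a root sum of squares.
        random=math.hypot(a.random, b.random),
    )


arrival = GeneralizedForm(1.0, [0.1, 0.2], [0.0, 0.5], 3.0)
edge_delay = GeneralizedForm(2.0, [0.3, 0.4], [1.0, -0.5], 4.0)
total = add_forms(arrival, edge_delay)
```

The uncorrelated terms are merged into a single sensitivity because the two independent Gaussian terms are replaced by one equivalent term with the same variance.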
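$$\begin{aligned}
c_0 &= E[\max(A, B)],\\
c_i &= T_A\, a_i + (1 - T_A)\, b_i, \quad i = 1, \ldots, n_{LG},\\
f_C(\Delta X_N) &= T_A\, f_A(\Delta X_N) + (1 - T_A)\, f_B(\Delta X_N),
\end{aligned} \tag{27}$$

[Figure, left panel: $A = a_0 + a_1 \Delta X$, $B = b_0 + b_1 \Delta X$, accurate max(A, B), and approximation $C_{appr} = c_0 + c_1 \Delta X$, plotted against $\Delta X$. Right panel: $A = a_0 + f_A(\Delta X)$, $B = b_0 + f_B(\Delta X)$, accurate max(A, B), and approximation $C_{appr} = c_0 + f_C(\Delta X)$, plotted against $\Delta X$.]
Fig. 23. Linear approximation of max of two canonical forms (left) and two generalized canonical forms (right) [51].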
deviation of the exact maximum C = max(A, B). Similarly to the linear Gaussian case, the approximation of the maximum of two generalized canonical forms is linear: the coefficients $c_i$ and function $f_C$ are computed as linear combinations of coefficients $a_i$ and $b_i$, and functions $f_A$ and $f_B$, respectively, as in (27). Fig. 23 shows the linear approximation of the maximum of: (1) two canonical forms that depend only on one linear parameter (left); (2) two generalized canonical forms that depend only on one nonlinear parameter (right). The approximation of the maximum $C_{appr}$ is represented by the green curve. The approximation of the maximum of two generalized canonical forms requires the computation of the tightness probability $T_A$, the mean, and the second moment of max(A, B). Considering the nonlinear and non-Gaussian parameter variations fixed, the expression for the generalized canonical form can be rewritten by combining the mean value $a_0$ and the term $f_A(\Delta X_N)$:

$$A = \big(a_0 + f_A(\Delta X_N)\big) + \sum_{i=1}^{n_{LG}} a_{LG,i}\,\Delta X_{LG,i} + a_{n_{LG}+1}\,\Delta R_a. \tag{28}$$

Expression (28) can be considered as a canonical form $A_{cond}$ with a mean value $a_0 + f_A(\Delta X_N)$ and linear Gaussian parameters. All the sensitivities are the same as in the original generalized canonical form (26). If two generalized canonical forms A and B are represented as in (28), the conditional tightness probability, conditional mean, and second moments of max(A, B) are functions of the nonlinear and non-Gaussian parameters $\Delta X_N$ (with fixed values) given by

$$\begin{aligned}
T_{A,cond}(\Delta X_N) &= \mathrm{Prob}(A > B \mid \Delta X_N),\\
c_{0,cond}(\Delta X_N) &= E[\max(A, B) \mid \Delta X_N],\\
m_{2,cond}(\Delta X_N) &= E[(\max(A, B))^2 \mid \Delta X_N].
\end{aligned}$$

The linear Gaussian parameters are independent of the nonlinear and non-Gaussian ones. Therefore, the joint conditional PDF of the linear Gaussian parameters at the condition of frozen values of nonlinear and non-Gaussian parameters is simply a JPDF of the linear Gaussian parameters. Hence, the same approach presented in [37] and reported in Section 4.2 can be used to compute the conditional tightness probability, mean, and second moments for the maximum of two generalized canonical forms at the condition that all nonlinear and non-Gaussian parameters are frozen, by substituting $a_0 + f_A(\Delta X_N)$ and $b_0 + f_B(\Delta X_N)$ for $a_0$ and $b_0$, respectively. The unconditional tightness probability, mean, and second moment of max(A, B) can be computed by integrating the conditional tightness probability, mean, and second moment over the space of nonlinear and non-Gaussian parameters with their JPDF, where such integration can be implemented by any numerical technique. Although the computational complexity for numerical integration by discretizing the integration region is exponential with respect to the number of nonlinear and non-Gaussian parameters, the experimental results presented in [51] show that for achieving a reasonable accuracy 5–7 discrete points of each variable are sufficient. This approach is practical for cases with up to 7–8 nonlinear and non-Gaussian variables. For higher dimensions the integrals can be computed by MC integration, and the overall approach rapidly becomes computationally expensive. Moreover, the approach [51] does not provide a solution in the presence of correlated non-Gaussian parameter distributions. Since the deviation from a normal distribution becomes more significant when the non-Gaussian random variables exhibit correlation, it is crucial to accurately manage the case where the non-Gaussian parameters may be correlated.

The work in [52] proposes a parameterized block-based SSTA algorithm that can handle both spatially correlated non-Gaussian as well as Gaussian distributions. The correlations are described using a grid structure similar to [38], which incorporates also non-Gaussian distributions. This approach works even for cases when the closed-form expression of the PDF of the sources of variation is not available, and it only requires the moments of the process parameter distributions. These moments are relatively easier to calculate from the process data files than the actual PDFs, and the procedure is based on a moment matching technique to generate the PDFs of the arrival time and delay variables.

To incorporate the effects of both Gaussian and non-Gaussian parameters in the SSTA framework presented in [52], all delays and arrival times are represented in linear form as

$$D = \mu + \sum_{i=1}^{n} b_i X_i + \sum_{j=1}^{m} c_j Y_j + e\,Z = \mu + B^T X + C^T Y + e\,Z, \tag{29}$$

where D is the random variable corresponding to a timing quantity (gate delay or arrival time at the input pin of a gate), $X_i$ [$Y_j$] is a non-Gaussian [Gaussian] random variable corresponding to the physical parameter variation, $b_i$ [$c_j$] is the first-order (linear) sensitivity of the timing quantity with respect to the ith non-Gaussian [jth Gaussian] parameter, Z is the uncorrelated parameter that could be either a Gaussian or non-Gaussian random variable, e is the sensitivity with respect to the uncorrelated variable, and n [m] is the number of correlated non-Gaussian [Gaussian] random variables. In the vector form, B and C are the sensitivity vectors for X, the random vector of non-Gaussian parameter variations, and Y, the random vector of Gaussian random variables, respectively. Gaussian and non-Gaussian parameters are statistically independent. The mean $\mu$ is adjusted so that X and Y are centered, i.e., each $X_i$, $Y_j$, and Z has zero mean.

For computational and conceptual simplicity, it is useful to work with a set of statistically independent random variables. Since the random vector Y consists of correlated Gaussian random variables, a PCA transformation $R = P_Y Y$ guarantees statistical independence for the components of the transformed vector R (for a Gaussian distribution, uncorrelatedness implies statistical independence). Such a property does not hold for general non-Gaussian parameters X.
Independent component analysis (ICA) is a mathematical technique that accomplishes the desired goal of transforming a set of non-Gaussian correlated random variables into a set of random variables that are statistically as independent as possible, via a linear transformation. The approach described in [52] uses ICA as a preprocessing step to transform the correlated set of non-Gaussian random variables $X_1, \ldots, X_n$ to a set of statistically independent variables $S_1, \ldots, S_n$ by the following relation:

$$S = W X, \quad \text{where } S_i = W_i^T X = \sum_{j=1}^{n} w_{ij} X_j \quad \forall\, i = 1, \ldots, n.$$

As in [38], the chip area is first tiled into a grid, and the covariance matrix associated with the random vector X is determined. Using the covariance matrix, and the underlying probability distributions of the variables in X, samples of the correlated non-Gaussian variables are generated and are given as input to the ICA procedure, which produces as output the estimates of the matrix W and its inverse A, called the mixing matrix. For a specific grid, the independent components of the non-Gaussian random variables must be computed only once, and this can be carried out as a pre-characterization step. Hence, ICA does not have to be recomputed for different circuits or different placements of the same circuit, and this preprocessing step does not impact the runtime of the SSTA procedure. ICA is applied to the non-Gaussian parameters X and PCA to the Gaussian variables Y, to obtain a set of statistically independent non-Gaussian variables S and a set of independent Gaussian variables R. By substituting the respective transformation matrices A and $P_Y$ in (29), the following canonical delay model can be derived:

$$\begin{aligned}
D &= \mu + B'^T S + C'^T R + e\,Z = \mu + \sum_{i=1}^{n} b'_i S_i + \sum_{j=1}^{m} c'_j R_j + e\,Z,\\
B'^T &= B^T A,\\
C'^T &= C^T P_Y^{-1},
\end{aligned} \tag{30}$$

where $B'^T$ and $C'^T$ are the new sensitivity vectors with respect to the statistically independent non-Gaussian components $S_1, \ldots, S_n$ and Gaussian principal components $R_1, \ldots, R_m$. The inputs required for the SSTA approach in [52] are the moments of the random vector X: $m_k(X_i) = E[X_i^k]$, which can be computed from mathematical tables if a closed-form PDF for the process parameters $X_i$ is available, or from the process files. After performing ICA, the next step is to determine the moments of the independent components $S_1, \ldots, S_n$ from the moments of the correlated non-Gaussian parameters $m_k(X_i)$. The moments $E[S_i^k]$ can be used to compute the PDF (CDF) of any random delay variable expressed in the canonical form (30) using the binomial moment evaluation procedure proposed in [53], since this canonical form satisfies the independence requirement by construction. After computing the PDF and CDF of the delay and arrival time random variables expressed as linear canonical forms, the sum and max atomic operations of block-based SSTA can be performed to obtain a result in canonical form.

4.6. Quadratic timing modeling

In order to accurately account for the impact of non-Gaussian and nonlinear parameters, most of the recent papers proposed quadratic timing models as a solution. In [54] it was reported that a quadratic delay model matches the MC simulations quite well. Moreover, for any Gaussian random variable, the skew (third-order moment) is always zero; hence, distributions with non-zero skew cannot be represented in linear delay models. In contrast, under nonlinear delay models, non-zero skews can be expressed by quadratic terms.

A quadratic timing model was proposed in [50] to capture the nonlinearity of the dependency of gate and wire delays as well as arrival times on the variation sources. In [50], the first-order canonical model was extended with second-order terms:

$$D = m + aR + \sum_{i} b_i X_i + \sum_{i,j} a_{ij} X_i X_j, \tag{31}$$

where $a_{ij}$ are quadratic coefficients and m is a constant term that in general might be different from the mean value of the delay timing variable. The difference with respect to the generalized canonical form (26) proposed in [51] is that in (26) the nonlinear/non-Gaussian parameters are represented by the nonlinear function $f_A(\Delta X_N)$, while in (31) they are characterized by the quadratic terms. The quadratic gate delay model is formulated by the second-order Taylor expansion with respect to the global sources of variation (evaluated around their mean value):

$$D_g \approx m_g + aR + \frac{\partial D_g}{\partial L} L + \frac{\partial D_g}{\partial V} V + \frac{1}{2}\frac{\partial^2 D_g}{\partial L^2} L^2 + \frac{1}{2}\frac{\partial^2 D_g}{\partial V^2} V^2 + \frac{\partial^2 D_g}{\partial L\,\partial V} LV + \cdots, \tag{32}$$

where the coefficients in this Taylor expansion are computed during cell characterization, and are the same coefficients $b_i$ and $a_{ij}$ in (31):

$$b_i = \frac{\partial D_g}{\partial X_i}; \qquad a_{ij} = \frac{1}{2}\frac{\partial^2 D_g}{\partial X_i\,\partial X_j}. \tag{33}$$

Assuming there are p global sources of variation, the Gaussian variation vector is defined as $X_g = [X_1, X_2, \ldots, X_p]^T \sim N(0, \Sigma_g)$. The correlation matrix $\Sigma_g = E[X_g X_g^T]$ in general is not a unit matrix, as these global variation random variables may be correlated. Eqs. (31) and (32) can be compacted into a quadratic form:

$$D_g = m_g + aR + B_g^T X_g + X_g^T A_g X_g, \tag{34}$$

where vector $B_g$ and matrix $A_g$ are a vectorized representation of the Taylor expansion coefficients (33). Similarly to the work [38], also in [50] the wire delay is expressed by Elmore's delay model:

$$D_w = \sum_{i=1}^{N} \sum_{j=i}^{N} R_i C_j = \sum_{i=1}^{N} \sum_{j=i}^{N} \frac{r_s\, l^2\, (c_s W_j + c_f T_j)}{W_i T_i}, \tag{35}$$

where $R_i$ and $C_i$ are the resistance and capacitance of the ith wire segment, $r_s$ is the wire resistivity, $c_s$ and $c_f$ are the wire sheet and fringing capacitance, $W_i$ and $T_i$ are the width and thickness of the ith wire segment, and N is the number of wire segments with equal length l. Truncating the Taylor expansion of (35) at the second order, the quadratic wire delay model can be expressed in compact form similarly to (34):

$$D_w = m_w + aR + B_w^T X_w + X_w^T A_w X_w, \tag{36}$$

where $X_w$ is a 2N × 1 global variation vector: $X_w = [W'_1, W'_2, \ldots, W'_N, T'_1, T'_2, \ldots, T'_N]^T \sim N(0, \Sigma_w)$, while $W'_i = W_i - E[W_i]$ and $T'_i = T_i - E[T_i]$ are random variables, which in general are not statistically independent of each other since interconnects usually span a long distance and these variables may be spatially correlated. Due to the nonlinearity of the wire delay with respect to the process variations of width and thickness shown in Eq. (35), the delay distribution of the wire will not be Gaussian even if the width and thickness are usually considered to be Gaussian [5].

If there are q gate/wire delays in the input cone of the arrival time $D_a$ and there are p global sources of variation impacting the q gate/wire delays, the arrival time will be approximated by the following quadratic form:

$$D_a = m_a + a_a^T R_a + B_a^T X_a + X_a^T A_a X_a, \tag{37}$$
where random variation vectors $R_a = [R_1, R_2, \ldots, R_q]^T \sim N(0, I)$ and $X_a = [X_1, X_2, \ldots, X_p]^T \sim N(0, \Sigma_a)$ are mutually independent local and global variations. If every arrival time in a circuit is approximated as a linear combination of its input gate/wire delays and all gate/wire delays have the quadratic delay form (34) and (36), then all timing variables in the circuit, including gate/wire delays and arrival times, will have the quadratic timing model:

$$D \sim Q(\mu, \alpha, B, A) = \mu + \alpha^T R + B^T X + X^T A X. \tag{38}$$

In [50] it was demonstrated that for a quadratic timing quantity expressed as (38), its mean and variance are given by

$$\mu_D = E[D] = \mu + \mathrm{tr}\{\Sigma A\},$$
$$\sigma_D^2 = \alpha^T \alpha + B^T \Sigma B + 2\,\mathrm{tr}\{\Sigma^2 A^2\},$$

where $\mathrm{tr}\{\cdot\}$ means trace and equals the sum of the diagonal elements of the matrix. The distribution of the quadratic delay model (38) can be computed by means of its characteristic function, analytically derived in [50].

If random variables X and Y are both expressed in quadratic form (38), the output of the sum operator is given by

$$Z = X + Y \sim Q(\mu_Z, \alpha_Z, B_Z, A_Z),$$
$$\mu_Z = \mu_X + \mu_Y; \quad \alpha_Z = \alpha_X + \alpha_Y,$$
$$B_Z = B_X + B_Y; \quad A_Z = A_X + A_Y.$$

In contrast, the max operator is intrinsically nonlinear, and it is necessary to evaluate if it can be approximated with a linear operator. The linearity of the max operator can be evaluated by the Gaussianity of the max output assuming the inputs are Gaussian. Skewness, which is a symmetry indicator of the distribution, can then be applied for the purpose of Gaussianity checking since a Gaussian distribution will always be symmetric. To propagate the quadratic timing model through the max operator, in [50] the max operation is first performed on two Gaussian inputs whose mean and variance match what is computed from the quadratic timing model. Then, the equations given in [42] are used to compute the output skewness. If the skewness is smaller than a threshold, then the max operator can be approximated by a linear operator. Otherwise, both inputs are placed into a max-tuple (Mt), which is a collection of random variables waiting to be maxed. The actual max operation can be postponed, since the sum operation for a max-tuple can be simply done as

$$\mathrm{Mt}\{X, Y\} + D = \mathrm{Mt}\{X + D,\; Y + D\}$$

and the max operation between two max-tuples is the merge of two tuples together:

$$\max(\mathrm{Mt}\{X, Y\}, \mathrm{Mt}\{U, V\}) = \mathrm{Mt}\{X, Y, U, V\}.$$

To keep the size of the max-tuple as small as possible, the linearity of the max operation is constantly checked between any two members of the max-tuple: if their max output skewness is small enough, then the max operation is performed on the two variables. With such a conditional linear max operation, it is possible to control the error of the linear approximation for the max operator within an acceptable range.

When two quadratic random variables X and Y expressed as in (38) are maximized with a linear approximation $Z = a X + b Y + c$, the approximation parameters a, b, and c are computed assuming X and Y are Gaussian and using the equations in [42]. Hence, the quadratic timing variable $Z \sim Q(\mu_Z, \alpha_Z, B_Z, A_Z)$ can be obtained by the following expressions:

$$\alpha_Z = a\,\alpha_X + b\,\alpha_Y; \quad \mu_Z = a\,\mu_X + b\,\mu_Y + c;$$
$$B_Z = a\,B_X + b\,B_Y; \quad A_Z = a\,A_X + b\,A_Y.$$

Compared with the SSTA method based on the first-order canonical model, the extra computation complexity of the method based on the quadratic timing model stems from updating the quadratic coefficient matrix A at every arrival time propagation step. The number of quadratic coefficients is limited by the number of global variations and is usually a constant. Updating matrix A will not increase the computation complexity since it only involves moment computation of quadratic timing variables, which is not dependent on the circuit size. To sum up, the computation complexity of SSTA based on the quadratic timing model will be the same as that of its canonical timing model counterpart. In [54] the timing quantities such as gate and wire delays, arrival times, slacks, etc., are represented in the following quadratic form:

$$Y = X^T A X + B^T X + C,$$

where $X = (X_1, X_2, \ldots, X_n)^T$ is the independent process parameter vector with normalized Gaussian distributions N(0, 1) derived from PCA, A is a symmetric $n \times n$ matrix that contains the coefficients of the second-order terms, while $B^T$ is a $1 \times n$ vector, whose components are coefficients of the first-order terms, and C is a scalar constant term. Therefore, the sum operation of two random variables $Y_1$ and $Y_2$ is straightforward:

$$Y_1 = X^T A_1 X + B_1^T X + C_1$$
$$Y_2 = X^T A_2 X + B_2^T X + C_2$$
$$Y = \mathrm{sum}(Y_1, Y_2) = Y_1 + Y_2 = X^T (A_1 + A_2) X + (B_1^T + B_2^T) X + C_1 + C_2. \tag{39}$$

In order to simplify the max operation, the cross terms $X_i X_j$ in the quadratic expression:

$$\max(Y_1, Y_2) = Y_1 + \max(0, Y_2 - Y_1) = Y_1 + \max\left(0,\; X^T (A_2 - A_1) X + (B_2^T - B_1^T) X + C_2 - C_1\right)$$

should be removed, where $Y_1$ and $Y_2$ are expressed by quadratic forms as in (39). $(A_2 - A_1)$ is a symmetric matrix, thus it can be factorized as $P^T \Sigma P$, where $\Sigma$ is a diagonal matrix composed of the eigenvalues of $(A_2 - A_1)$ and P is the corresponding eigenvector matrix. If $Z = P X$ and $U = (B_2^T - B_1^T) P^T$, then we obtain the following expression:

$$\max(Y_1, Y_2) = Y_1 + \max(0,\; Z^T \Sigma Z + U Z + C_2 - C_1),$$

which no longer includes cross terms in the max operation. Since the $X_i$'s are independent Gaussian random variables, the $Z_i$'s are also Gaussian random variables. Moreover, since the eigenvectors P of a symmetric matrix $(A_2 - A_1)$ are orthonormal, the $Z_i$'s are also uncorrelated; hence, the $Z_i$'s are also independent [53]. Therefore, it is possible to map the original parameter base into a new base without cross terms, perform the max operation under the new base, and map the results back into the original base. Based on this orthogonalization procedure, the inputs of the max operation in the approach presented in [54] are quadratic functions of an independent normalized base $X = (X_1, X_2, \ldots, X_n)^T$ without cross terms, where all $X_i$'s are normalized Gaussian random variables N(0, 1). The quadratic approximation of the nonlinear max operation in [54] is performed by solving a system of equations obtained via the moment matching technique. However, this approach requires expensive numerical integrations.

A novel technique to model the gate and interconnect delay was presented in [55], where the authors proposed a delay model representation using orthogonal polynomials, which allows the coefficients of the max of two delay expansions to be computed independently instead of using the moment matching technique as in [54]. Their approach is based on the Polynomial Chaos theory. A second-order stochastic process can be
represented as

$$f = \sum_{i=0}^{\infty} a_i \psi_i, \tag{40}$$

where the functions $\psi_i$ are the orthonormal basis, and depend on the random variables modeling the underlying process variations. If the process variations are modeled with Gaussian variables, the basis functions are Hermite polynomials. In practice, the series expansion in (40) is truncated to a finite number of terms. While for any general distribution of random variables and any arbitrary function f the coefficients $a_i$ can be estimated with expensive numerical techniques such as MC or generalized quadrature methods, for some specific distributions such as Gaussian, Uniform, etc., and a smooth function f, the integral can be evaluated with very high accuracy using (N+1)-order Gaussian quadrature, where N is the order of the polynomial that accurately approximates f. In [55] this method is used to perform library characterization; since standard cell delay and output slew can be modeled accurately using a second-order expansion, a third-order Gaussian quadrature can be used to estimate the expansion coefficients.

The delay is first expressed as a multi-variate function of the process variations (e.g., $V_{tn}$, $V_{tp}$, $T_{ox}$, L), load capacitance $C_{eff}$, and input slew $S_{in}$, thus treating all these variables as deterministic quantities. By denoting with $\tilde{\eta}$ the normalized variables within the range [−1, 1], the delay deterministic model can be expressed as a second-order Chebyshev polynomial series in the variables $\tilde{\eta}$. The coefficients of the Chebyshev polynomial expansion are obtained from the third-order interpolation of Chebyshev zeros on the Smolyak grid, to ensure some optimality in convergence while reducing the number of interpolation points:

$$d(\tilde{\eta}) = \sum_{i=0}^{N} a_i \psi_i(\tilde{\eta}) = a_0 + \sum_{i=1}^{6} a_i \eta_i + \sum_{i=1}^{6} a_{6+i}\,(2\eta_i^2 - 1) + \cdots + a_N \eta_5 \eta_6 .$$

Subsequently, the delay deterministic model is projected onto a second-order Hermite polynomial basis in the process variables and input slew. The coefficients of the second-order Hermite polynomial expansion, which are functions of the load capacitance $C_{eff}$, can be readily obtained for various values of $C_{eff}$ by using the Galerkin technique. As a result, the delay can be expressed as

$$d(\bar{\xi}) = \sum_{i=0}^{N} a_i \psi_i(\bar{\xi}), \tag{41}$$

where $\bar{\xi}$ represents the normalized (zero mean, unit variance) process and slew variables. A similar approach is adopted for modeling the output slew.

Due to manufacturing variations, some gate parameters on a die are random variables. Moreover, for a particular die, these random variables are functions of the gate location on the die, and can be modeled as a stochastic process $p(\bar{x}, \theta)$, where $\bar{x} = (x, y)$ is the location on the die, and $\theta$ belongs to the space of manufactured outcomes. Ideally, for each parameter, there are as many random variables as the number of gates in a die. In order to reduce the number of random variables, in [55] it was proposed to represent the process $p(\bar{x}, \theta)$ using the Karhunen–Loève expansion (KLE):

$$p(\bar{x}, \theta) = \sum_{n=1}^{\infty} \sqrt{\lambda_n}\, \xi_n(\theta)\, \phi_n(\bar{x}),$$

where $\{\xi_n(\theta)\}$ is a set of uncorrelated random variables, $\lambda_n$ are the eigenvalues, and $\{\phi_n(\bar{x})\}$ are the orthonormal eigenfunctions of the covariance function $C(\bar{x}_1, \bar{x}_2)$. The delay expansion of each gate i is obtained in terms of a common set of random variables by substituting the KLE corresponding to each random parameter of gate i in its delay expansion $d_i$. Once the delays of all gates are obtained, it is possible to perform SSTA to compute the circuit delay in terms of the common set of variables. To propagate the delay through the circuit, both the sum and the max operations must be defined for the proposed delay expression. Given two delay expansions $d_1 = \sum_{i=1}^{n} a_i \psi_i(\bar{\xi})$ and $d_2 = \sum_{i=1}^{n} b_i \psi_i(\bar{\xi})$, their sum can be obtained as $d_1 + d_2 = \sum_{i=1}^{n} (a_i + b_i)\, \psi_i(\bar{\xi})$. The computation of the max is based on an efficient dimensionality reduction technique, which uses moment matching methods to obtain the coefficients of the max of two delay expansions. The computation of the sum and max can also be extended to non-Gaussian variables. Therefore, the proposed approach can be used to propagate linear expansions of non-Gaussian variables.

Another approach where gate delay and arrival time distributions were modeled as polynomials using a Taylor-series expansion on the underlying parameters was presented in [48], where the degree of the polynomial depends on the magnitude of the variations and the required level of accuracy. In this work, the gate delay is a function of location-dependent parameters that are mutually independent random variables. Suppose P, Q, and R are such parameters (although the approach is very general and can be easily extended to more parameters); hence, the gate delay can be expressed similarly to (24) as

$$D = D(P, Q, R), \tag{42}$$

where D can be a nonlinear function, and even if the random variables P, Q, and R are Gaussian, in general the delay distribution (42) will not be Gaussian. Each parameter can be represented as a linear combination of the underlying random components as in (17), using the spatial correlation model described in Section 4.3. Therefore, expression (42) becomes

$$D = D(P_1, P_2, P_3, P_4, Q_1, Q_2, Q_3, Q_4, R_1, R_2, R_3, R_4) \tag{43}$$

and for the sake of conciseness the random variables in (43) are represented with the following notation:

$$D = D(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8, X_9, X_{10}, X_{11}, X_{12}), \tag{44}$$

where all the random variables $X_i$ are independent with zero mean and finite variance. The Taylor-series expansion of (44) around the mean values yields:

$$D = D(0) + \sum_{k=1}^{12} \left( \left. \frac{\partial D}{\partial X_k} \right|_{X_k = 0} \right) X_k + \frac{1}{2} \sum_{k=1}^{12} \left. \frac{\partial^2 D}{\partial X_k^2} \right|_{X_k = 0} X_k^2 + \cdots, \tag{45}$$

where D(0) is the nominal value for gate delay (44) when all $X_k$ random variables assume their nominal value. Expression (45) is similar to the quadratic gate delay represented by (32), and the gate delay is modeled as a general polynomial in the global variables $X_k$. It is worth pointing out that in (45) there are 66 second-order cross terms in the form $X_i X_j$, with $i \neq j$:

$$D = c_1 X_1 + \cdots + c_{12} X_{12} + c_{13} X_1^2 + \cdots + c_{24} X_{12}^2 + \cdots \tag{46}$$

and consequently there are 91 terms in expression (46), which is the second-order truncation of the Taylor-series expansion (45). It can be observed that by increasing the degree of the approximating polynomial, the number of terms increases and the approximation error decreases. Therefore, there is a trade-off between the runtime of statistical timing analysis and its accuracy. This trade-off can be controlled by the degree of the polynomial (46). Moreover, since all timing quantities in the circuit share the same global variables $X_i$, this approach effectively captures the correlations between them, similarly to the works [37,38].

The result of the sum operation between arrival time at the gate input $A_i$ and the gate delay $D_i$ approximated as a polynomial
(46) in the same independent global parameters is also a polynomial in the same global parameters. As in expression (9), the coefficient of each term in the resulting polynomial is the sum of the coefficients of the corresponding terms in $A_i$ and $D_i$:

$$D_i = \mathrm{poly}(X_1, X_2, \ldots, X_{12}),$$
$$A_i = \mathrm{poly}(X_1, X_2, \ldots, X_{12}),$$
$$A_{i,\mathrm{out}} = A_i + D_i = \mathrm{poly}(X_1, X_2, \ldots, X_{12}). \tag{47}$$

Hence, the max operation among n polynomials obtained with (47) is a polynomial in the same global random variables

$$A_{\mathrm{out}} = \max(A_{1,\mathrm{out}}, A_{2,\mathrm{out}}, \ldots, A_{n,\mathrm{out}}) = \mathrm{poly}(X_1, X_2, \ldots, X_{12}). \tag{48}$$

In [48] a regression-based strategy is proposed to compute the max operation by performing least square fitting, that is, by finding the polynomial of a given degree that approximates (48) with the smallest error. To approximate $A_{\mathrm{out}}$ with a degree-two (i.e., quadratic) polynomial, the coefficients of the approximating polynomial should yield the smallest error against the actual max operation result obtained on a set of sampling vectors for the parameters $X_i$. The advantage of using regression stems from its generality in handling timing distributions of any nature (not only Gaussians). However, the computational complexity of this approach grows exponentially with the polynomial order. To achieve the accuracy obtained from using a higher-order polynomial as well as a runtime that is comparable to SSTA with linear delay models, a scheme using linear-modeling-based SSTA to drive the polynomial (i.e., quadratic) SSTA was proposed in [48]. Although the quadratic polynomial can represent the PDF/CDF of gate delays and arrival times more accurately than linear modeling, the mean and variance of the distributions are captured with reasonable accuracy with first-order polynomials. Therefore, in [48] a second-order polynomial modeling technique driven by linear modeling (which has lower runtime) was derived. With this technique the work presented in [48] avoided the complexity of solving a large (i.e., quadratic) polynomial regression problem at each gate (during the max operation) in block-based SSTA by solving a smaller linear regression problem and then performing moment matching (first two moments).

However, the proposed techniques to handle nonlinear delay dependency and non-Gaussian variation sources suffer from some limitations. The approach [52] addressed the non-Gaussian variation sources, but it is still based on a linear delay model. The nonlinear effects were considered in [50] and [54]: these works proposed a quadratic delay model. However, to keep the complexity under control they assumed that all the sources of variation must be represented by a Gaussian distribution, even though the delay may not be Gaussian. In order to compute the max between two delays $D_1$ and $D_2$, [50] treated $D_1$ and $D_2$ as Gaussians to obtain the tightness probability, even though there is no justification for applying the tightness probability formula to non-Gaussian distributions. Instead, [54] proposed to compute $D = \max(D_1, D_2)$ by means of moment matching techniques, which require several expensive numerical integrations. The works in [48,51] handled both nonlinear and non-Gaussian effects simultaneously. The first one proposed to compute $D = \max(D_1, D_2)$ by a regression-based strategy, while the latter dealt with the max operation through the concept of tightness probability, computed by means of expensive numerical multi-dimensional integrations. As a result, such methods are not suitable to handle a large number of non-Gaussian random variables.

A novel SSTA technique that efficiently performs the max operation and simultaneously handles both the nonlinear dependency and non-Gaussian distributions was proposed in [56]. The authors represented the timing quantities in the following quadratic form, called general canonical form:

$$D = d_0 + \sum_i (a_i X_i + b_i X_i^2) + a_r X_r + b_r X_r^2, \tag{49}$$

where $X_i$ are the global sources of variation, and $X_r$ is the independent random variation. The $X_i$ random variables may have arbitrary distributions with bounded values; they are assumed independent (if they are correlated, techniques like ICA [52] may be used to generate a new set of independent components) and centered with zero mean. To propagate the delay in block-based SSTA, not only is it necessary to efficiently compute the sum and max operations, but the timing results after each operation must also be represented in the same general canonical form. Therefore, given $D_1$ and $D_2$ in the form (49)

$$D_1 = d_{01} + \sum_i (a_{i1} X_i + b_{i1} X_i^2) + a_{r1} X_{r1} + b_{r1} X_{r1}^2;$$
$$D_2 = d_{02} + \sum_i (a_{i2} X_i + b_{i2} X_i^2) + a_{r2} X_{r2} + b_{r2} X_{r2}^2$$

both $D = D_1 + D_2$ and $D = \max(D_1, D_2)$ must be represented as in (49). Denote $\Delta D_1 = D_1 - \mu_1$ and $\Delta D_2 = D_2 - \mu_2$, where $\mu_1$ and $\mu_2$ are the mean values of $D_1$ and $D_2$, respectively. Since both $D_1$ and $D_2$ are timing quantities, their values are physically lower- and upper-bounded:

$$-l \le \Delta D_1 \le l; \quad -h \le \Delta D_2 \le h.$$

To compute the max, the work [56] proposed a six-step flow. The first step computes the JPDF of $D_1$ and $D_2$, denoted as $g(v_1, v_2)$. If the JPDF of $\Delta D_1$ and $\Delta D_2$ is $f(v_1, v_2)$, it is easy to show that:

$$g(v_1, v_2) = f(v_1 - \mu_1,\; v_2 - \mu_2).$$

Then, the JPDF $f(v_1, v_2)$ is approximated by means of a K-order Fourier series

$$f(v_1, v_2) \approx \sum_{p,q=-K}^{K} a_{pq}\, e^{\zeta_p v_1 + \eta_q v_2}, \tag{50}$$

where $\zeta_p = jp\pi/l$ and $\eta_q = jq\pi/h$. In [56] an effective solution to simplify the computation of the Fourier coefficients $a_{pq}$ was developed: for an arbitrary source of variation $X_i$, the range of $X_i$ is divided into M small sub-regions, $S_1, \ldots, S_M$. Then, the Fourier transform of the PDF of $X_i$, denoted as $g_i(x_i)$, is pre-calculated for all pre-determined sub-regions of the variation source $X_i$, and the results are stored into a 1D lookup table. The valid region of each variation source is uniformly divided into twelve sub-regions and the fourth-order Fourier series is considered to represent the JPDF. In the second step, the raw moments $M_t = E[\max(D_1, D_2)^t]$ for $D = \max(D_1, D_2)$ are computed. According to (50), $M_t$ can be written as

$$M_t = \sum_{p,q=-K}^{K} a_{pq}\, L(t, p, q, l, h, \mu_1, \mu_2), \tag{51}$$

where $L(t, p, q, l, h, \mu_1, \mu_2)$ can be efficiently evaluated with closed-form formulas. In the third step, the expectation $E_{ci,t} = E[X_i^t \max(D_1, D_2)]$ is evaluated, by first obtaining the JPDF of $X_i$, $\Delta D_1$, and $\Delta D_2$, and then by computing $E_{ci,t}$, similarly to the derivation of (51).

Finally, the last three steps are needed to reconstruct $D = \max(D_1, D_2)$ into the general canonical form (49), by first computing the coefficients $a_i$ and $b_i$ by matching $E[X_i^t \max(D_1, D_2)]$ for t = 1, 2; then by computing $a_r$ and $b_r$ in (49) by matching the second- and third-order moments of $\max(D_1, D_2)$; finally by computing $d_0$ in (49) by matching the first-order moment of $\max(D_1, D_2)$. The computation of $D = D_1 + D_2$ in the general canonical form (49) is straightforward, both for the nominal and global random variable coefficients, as they can be obtained by adding up the corresponding terms:

$$d_0 = d_{01} + d_{02}; \quad a_i = a_{i1} + a_{i2}; \quad b_i = b_{i1} + b_{i2}.$$
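The coefficient-wise sum in the general canonical form (49) can be illustrated with a minimal Python sketch. The class name `GeneralCanonical`, its fields, and the numeric values are hypothetical and chosen only for illustration. The sketch covers only the nominal term and the global coefficients, since the text gives addition rules only for $d_0$, $a_i$ and $b_i$; the local term $a_r X_r + b_r X_r^2$ is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneralCanonical:
    """Global part of form (49): d0 + sum_i (a[i]*X_i + b[i]*X_i**2).

    Assumption of this sketch: the local term a_r*X_r + b_r*X_r**2 is omitted.
    """
    d0: float
    a: List[float]
    b: List[float]

    def evaluate(self, x: List[float]) -> float:
        """Evaluate the form for one sample x of the global variables."""
        return self.d0 + sum(
            ai * xi + bi * xi * xi for ai, bi, xi in zip(self.a, self.b, x)
        )


def add(d1: GeneralCanonical, d2: GeneralCanonical) -> GeneralCanonical:
    """Sum of two forms sharing the same global variables X_i."""
    if len(d1.a) != len(d2.a) or len(d1.b) != len(d2.b):
        raise ValueError("both forms must use the same set of global variables")
    return GeneralCanonical(
        d0=d1.d0 + d2.d0,                          # d0 = d01 + d02
        a=[u + v for u, v in zip(d1.a, d2.a)],     # a_i = a_i1 + a_i2
        b=[u + v for u, v in zip(d1.b, d2.b)],     # b_i = b_i1 + b_i2
    )


# Hypothetical coefficients for two timing quantities over two global variables.
D1 = GeneralCanonical(d0=1.0, a=[0.5, 0.25], b=[0.125, 0.0])
D2 = GeneralCanonical(d0=2.0, a=[0.25, 0.5], b=[0.0, 0.25])
D_SUM = add(D1, D2)
```

Because the sum stays in the same form, `D_SUM.evaluate(x)` agrees with `D1.evaluate(x) + D2.evaluate(x)` for any sample `x` of the global variables.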
Fig. 24. Application of RRR technique to second-order SSTA: conceptual (on the left) and actual (on the right) [57].
$t_1 < t_4$ and $t_3 < t_2$, where $t_1$, $t_3$ are the early arrival times and $t_2$, $t_4$ are the late arrival times. Following this iterative timing window alignment procedure, [60] proposed to extend SSTA to consider the impact of variations and coupling effects concurrently. In SSTA, the earliest arrival time and latest arrival time for a timing window of a given net become random variables. The overlap of two timing windows can no longer be simply determined by the condition $t_1 < t_4$ and $t_3 < t_2$, represented by the timing interval between the dashed lines in Fig. 25. On the other hand, since $t_1$, $t_2$, $t_3$, $t_4$ are all random variables, new random variables $(t_4 - t_1)$ and $(t_2 - t_3)$, along with the overlap condition, can be defined as follows:

$$\mu_{t_4 - t_1} + 3\sigma_{t_4 - t_1} > 0;$$
$$\mu_{t_2 - t_3} + 3\sigma_{t_2 - t_3} > 0.$$

By using the $3\sigma$ values to determine the overlap of two timing windows, represented by the timing interval between the grey lines in Fig. 25, the proposed method prevents the over-shrink of the timing windows and preserves the earliest and latest arrival times. Furthermore, the correlations between different arrival times are inherently incorporated into the new random variables, thus removing any unnecessary pessimism in the timing window alignment. The mean value and the standard deviation for the new random variable $t_i - t_j$ can be computed from the existing mean and covariance tables:

$$\mu_{t_i - t_j} = \mathrm{Mean}(t_i) - \mathrm{Mean}(t_j),$$
$$\sigma_{t_i - t_j} = \sqrt{\mathrm{Var}(t_i) - 2\,\mathrm{Cov}(t_i, t_j) + \mathrm{Var}(t_j)}.$$

However, this approach estimates the switching window overlap deterministically, based on the worst-case values, leading to a pessimistic computation of the victim delay variation due to coupling. An alternative solution to statistical timing analysis with coupling is to consider a distribution of switching windows on each wire of the circuit [61]. This view is obtained by first considering the uncertainty from ignorance of functional information, and then the uncertainty from variability. Each window is denoted by a best and worst value. Therefore, the distribution of the switching windows contains the distributions of the best- and worst-case values. The two distributions thus obtained are represented as the distributions of two correlated random variables, respectively. The window formed using these random variables as the best and the worst-case value, respectively, contains all possible signal switching distributions. This transformation of the original solution gives an alternate view of the solution to SSTA with coupling as a window of signal switching distributions. This view of the solution can also be obtained by considering the uncertainty from variability first. Hence, SSTA on a wire i computes a distribution of a single signal switching. However, when the uncertainty from functional information and input configuration is also considered, the timing information on i is then a set of signal switching distributions, which is represented by a window of signal switching distributions. In [61] the concept of statistical switching window was introduced as a representation for a set of random variables. For any random variable $x_i^n$ in the set, a lower and an upper bound on the probability that $x_i^n$ is not less (or more) than a given real value c is considered. The statistical switching window extends the bounds over the entire range of c, in the form of two distributions of correlated random variables $x_i^l$ and $x_i^u$, respectively. These two random variables are called the bounding random variables of the statistical switching window $x_i$. Mathematically, the statistical switching window is defined as follows:

$$x_i = [x_i^l, x_i^u] = \{ x_i^n : \Pr(x_i^l \le x_i^n \le x_i^u) = 1 \},$$

where Pr(k) denotes the probability of event k. Then, the inclusion relation between two statistical switching windows $x_i$ and $x_j$ is formally defined as

$$x_i \subseteq x_j \;\overset{\mathrm{def}}{=}\; \left( \Pr(x_j^l \le x_i^l \le x_i^u \le x_j^u) = 1 \right).$$

Both a statistical switching window example and the inclusion relation between statistical switching windows are graphically illustrated in Fig. 26. The amount of delay noise is a function of the overlap between the switching windows on the coupled nets. For two statistical switching windows $x_i$ and $x_j$, the overlap between the windows is a random variable $O_{ij}$ defined as follows:

$$O_{ij} = \min(x_i^u, x_j^u) - \min(x_i^l, x_j^l).$$

Since the overlap between two statistical switching windows is a random variable, the coupling induced delay noise is consequently a random variable.

Fig. 26. A statistical switching window for a set of distributions (a) and the inclusion relation between two statistical switching windows (b) [61].

To compute the worst-case delay on a wire i when coupling effects are considered, we need to add the delay on the wire when coupling effects are not considered and the coupling induced delay noise due to each wire it couples with. In [61], an example of computation of a delay noise as a random variable based on a simple coupling model is illustrated. The considered coupling model is given by

$$D = \begin{cases} D^O & \text{overlap;} \\ D^N & \text{no overlap;} \end{cases}$$

where $D^O$ and $D^N$ are the values assigned to D, depending on whether the statistical switching windows between the coupled
wires overlap or not, respectively. Using PCA, all random variables then the PDF of the delay noise fd(s) as a function of the input
are expressed as a weighted sum of independent and orthonormal skew distribution can be directly obtained by applying the basic
random variables xi, i ¼ 1,2, y, n. The delay noise is a random theory of probability and statistics:
variable and it is expressed as
f s ðr 1 Þ f s ðr 2 Þ
8 f d ðsÞ ¼ þ , (52)
> O P
n
O j2a1 r 1 þ b1 j j2a2 r 2 þ b2 j
>
> d þ d x overlap;
Xn < 0 i¼1 i i
D ¼ d0 þ di x i ¼ P
n where r1 and r2 are the smaller and larger roots of the two
>
> N N
i¼1 > d0 þ
: di xi no overlap:
i¼1
quadratic pieces of the DCC. The delay noise in (52) is not
necessarily Gaussian. However, using the PDF of delay distribution
from (52), it is possible to compute the first and the second
The computation of the di coefficients as a function of the
moment of delay noise in closed form. Therefore, the canonical
overlap random variable O is reported in [61]. While the approach
form of delay noise can be constructed by matching the first two
proposed in [61] is extensible to an arbitrary coupling model, it
moments, while the correlations of the delay noise distribution are
cannot use Gaussian switching windows because the assumption
assigned by using the sensitivities of the given single aggressor-
of a Gaussian distribution for the bounding Random Variables
victim input skew distribution to process parameters. The proposed
prohibits the generic use of the inclusion relation between the
analytical technique can be extended so that the worst-case delay
switching windows. In this case, the solution is to replace
noise computation can be performed within the current SSTA
Gaussian distributions with truncated Gaussian distributions for
framework with statistical timing windows, instead of single skew
representing the bounding Random Variables. Arithmetic opera-
distribution. The solution is based on the result reported in [63],
tions on Gaussians are used identically for truncated Gaussians,
where it is shown that regardless of the aggressor transition, the
although they involve some approximations.
worst-case delay noise occurs when the victim input transition
Another approach to include crosstalk noise into SSTA was
occurs at the latest point in its timing window. Therefore, for
proposed in [62]. Given a quadratic model of the delay change
computing the worst-case delay noise, in [62] only the distribution
curve DCC which captures the dependence of delay noise on the
of late victim input arrival time was considered. Given the
aggressor-victim input skew, graphically represented in Fig. 27,
statistical timing window at the input of the aggressor, the early
and an input skew distribution in canonical form, the proposed approach yields closed-form expressions of the resulting delay noise distribution. Since the correlations in the input skew are preserved exactly in the delay noise distribution, it is possible to express the delay noise in canonical form. Without loss of generality, the input skew distribution is given by

$$ s = s_0 + s_1 x_1 + s_2 x_2, $$

where s0 is the mean, and s1 and s2 are the sensitivities with respect to two independent standard normal random variables x1 and x2. Since the process parameters are Gaussian, the input skew PDF fs(s) is therefore normally distributed with mean μ and standard deviation σ expressed by

$$ \mu = s_0, \qquad \sigma = \sqrt{s_1^2 + s_2^2}. $$

Supposing that the delay change curve DCC is piece-wise quadratic, as depicted in Fig. 27:

$$
\mathrm{DCC} =
\begin{cases}
0, & s < z_0 \\
a_1 s^2 + b_1 s + c_1, & z_0 \le s \le z_1 \\
a_2 s^2 + b_2 s + c_2, & z_1 \le s \le z_2 \\
0, & s > z_2
\end{cases}
$$

and late aggressor input arrival time distributions are subtracted from the late victim arrival time distribution to obtain the statistical skew window. The arrival time distributions of the end points of the skew window are referred to as the early and late skew distributions. As shown in Fig. 27, the skew window can align with the DCC in three different ways, denoted as Case A (when the mean of the late skew distribution is less than the worst-case skew value of the DCC), Case B (when the mean of the late skew distribution is greater than z1 and the mean of the early skew distribution is less than z1) and Case C (when the mean of the early skew distribution is greater than z1). Since any skew distribution which lies within the skew window is feasible, for Case B the delay noise is modeled by its worst-case value dmax. Note that in Case A, the DCC is a monotonic function of skew. Therefore, the mean delay noise is maximized when the mean of the feasible skew distribution coincides with the late skew distribution. Hence, the delay noise distribution in canonical form can be analytically computed as a function of the late skew distribution. Similarly, for Case C, the early skew distribution can be used to obtain the delay noise distribution. As a result, given the statistical timing window from block-based SSTA, the delay noise distribution can be analytically computed. Since it is in canonical form, it can trivially be added to the late victim output arrival time distribution and propagated downstream.
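
The following Python fragment is a minimal sketch of the quantities discussed above, not the derivation used in the cited work. It assumes a skew in canonical form with two independent standard normal variables and nonzero spread, a piece-wise quadratic DCC, and it identifies the worst-case skew value with the breakpoint z1. The names CanonicalSkew, QuadraticDCC, mean_delay_noise and alignment_case are hypothetical. Only the mean of the delay noise is computed in closed form, using the partial moments of a Gaussian; the sensitivities of the full canonical form are not derived here. Case B is taken to be the situation in which the skew window straddles z1.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalSkew:
    """Skew s = s0 + s1*x1 + s2*x2, with x1 and x2 independent N(0, 1)."""
    s0: float
    s1: float
    s2: float

    @property
    def mean(self) -> float:
        return self.s0

    @property
    def sigma(self) -> float:
        # Standard deviation: sqrt(s1^2 + s2^2); assumed nonzero below.
        return math.hypot(self.s1, self.s2)


@dataclass(frozen=True)
class QuadraticDCC:
    """Piece-wise quadratic delay change curve, zero outside [z0, z2]."""
    z0: float
    z1: float
    z2: float
    left: tuple   # (a1, b1, c1), valid on [z0, z1]
    right: tuple  # (a2, b2, c2), valid on [z1, z2]

    def __call__(self, s: float) -> float:
        if s < self.z0 or s > self.z2:
            return 0.0
        a, b, c = self.left if s <= self.z1 else self.right
        return a * s * s + b * s + c


def _pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def _partial_moments(mu: float, sigma: float, lo: float, hi: float):
    """E[s^k * 1{lo <= s <= hi}] for k = 0, 1, 2 with s ~ N(mu, sigma^2)."""
    al, be = (lo - mu) / sigma, (hi - mu) / sigma
    m0 = _cdf(be) - _cdf(al)
    m1 = mu * m0 - sigma * (_pdf(be) - _pdf(al))
    m2 = (mu * mu + sigma * sigma) * m0 - sigma * (
        (hi + mu) * _pdf(be) - (lo + mu) * _pdf(al))
    return m0, m1, m2


def mean_delay_noise(skew: CanonicalSkew, dcc: QuadraticDCC) -> float:
    """Closed-form E[DCC(s)] for a Gaussian skew (mean only)."""
    total = 0.0
    for (a, b, c), lo, hi in ((dcc.left, dcc.z0, dcc.z1),
                              (dcc.right, dcc.z1, dcc.z2)):
        m0, m1, m2 = _partial_moments(skew.mean, skew.sigma, lo, hi)
        total += a * m2 + b * m1 + c * m0
    return total


def alignment_case(early_mean: float, late_mean: float, z1: float) -> str:
    """Classify how the skew window aligns with the DCC peak at z1."""
    if late_mean < z1:
        return "A"   # window left of the peak: use the late skew
    if early_mean > z1:
        return "C"   # window right of the peak: use the early skew
    return "B"       # window contains the peak: use the worst case dmax
```

For Cases A and C, mean_delay_noise would be evaluated with the late or the early skew distribution, respectively; for Case B the worst-case value dmax would be used instead.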
5. Conclusions
Cristiano Forzan received the Dr. Eng. degree in electronics engineering from the University of Padova, Italy, in 1993. In 1994 he joined STMicroelectronics in Agrate Brianza, Italy, where he is a CAD Expert. He has published several papers in his research areas, which include delay calculation, digital standard cell characterization, interconnect characterization and modeling, crosstalk- and noise-aware timing analysis. Presently his research interests are in statistical analysis and optimization, variability-aware design, DFM for nanometer technologies, and EMC-aware design. In 2008 he received the ST Corporate STAR Gold Award for participating in the R&D excellence team on EMC-aware design.

Davide Pandini holds a Ph.D. degree in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA. He was a research intern at Philips Research Labs. in Eindhoven, the Netherlands, and at Digital Equipment Corp., Western Research Labs. in Palo Alto, CA. He joined STMicroelectronics in Agrate Brianza, Italy, in 1995, where he is a Design Methodologies R&D manager and a senior member of the technical staff. His current research interests include signal integrity and interconnect modeling for DSM technologies, statistical analysis and optimization, asynchronous design, DFM and regular design, EMC/EMI. Dr. Pandini has authored and coauthored more than forty papers in international journals and conference proceedings, and during the academic years from 1998 to 2000, he was a visiting professor at the University of Brescia, Italy. He serves on the program committee of international conferences such as DAC, GLSVLSI, EMC-COMPO, PATMOS, ASYNC, and ESSDERC. Dr. Pandini received the ST Corporate STAR 2008 Gold Award for leading the R&D excellence team on EMC-aware design.