Beruflich Dokumente
Kultur Dokumente
Library
a b
0 1
clk
0 1
clk
Design Process
Design : specify and enter the
design intent
Verify:
verify the correctness of design and implementation
Implement:
refine the design through all phases
Implementation Verification
HDL RTL Synthesis netlist logic optimization netlist physical design layout
a b
0 1
manual design
a b
0 1
clk
Is the implementation consistent with the original design intent? Is what I implemented what I wanted?
4
clk
manual design
a b
0 1
a b
0 1
functionality: 0-1 behavior on regression test set and timing: maximum delay of circuit across critical paths
clk
ASIC signoff
Software Simulation
Simulation driver (vectors)
s clk a
0
d q
clk
a b
0 1
clk
ASIC signoff
functionality: formal equivalencechecking techniques well talk about this later and timing: use static timing analysis
Timing-driven optimization give fast accurate timing information to guide tools as they evolve the chip
Logic synthesis Placement Routing
Combinational logic
Combinational logic
Combinational logic
clk
clk
clk
determine fastest permissible clock speed (e.g. 100MHz) by determining delay (including set-up and hold time) of longest path from register to register (e.g. 10ns.) largely eliminates need for gate-level simulation to verify the delay of the circuit
cycle time
data Tclock1
Tmax T
Q1
Tclock1 clock
Tclock2
10
setup time
data Tclock1
Tmax + Tsetup T
Q1
Tclock1 clock
Tclock2
11
clock skew
Q1
Tclock1 clock
Tclock2
12 12
clock-to-Q
Q1
Tclock1 clock
Tclock2
13
hold time
data Tclock1
Tmin Thold + Tskew
Q1
Tclock1 clock
Tclock2
14
hold time D stable after clock set-up time D stable before clock When signal may change
Delay calculation
Delay numbers for gates Delay numbers for wires
D
Vdd
tW1
tG3
tG1
tG2
tW1
tG3
Time
17
A CL B
F CL
Vdd tpd
tslew
Time
20
Scope of Variation
Inter-Die Variation Intra-Die Variation As an ASIC vendor we must ensure that our chips work across this range of variation This is a very restrictive condition and will be responsible for many conservative assumptions throughout timing analysis
N(,)
21
FF: Fast NMOS & Fast PMOS SF: Slow NMOS & Fast PMOS 3 corner Model: TT, SS, FF
Must use worst case assumptions for Slow/set-up and Fast/hold analysis
22
Interpolate between table entries Interpolation error is usually below 10% of SPICE
Output Capacitance Output Capacitance
Input Slew
Intrinsic Delay
Input Slew
Output Slew
Resulting waveform
23
Vdd A B tpd F
A B
Time
F CL
Vdd A tpd F
Time
24
Delay calculation
Delay numbers for gates Delay numbers for wires
Wire-Dominated Chips
Wiring requirements grow dramatically
Interconnect effects
Large RC and RLC delays Inter-wire coupling
Vin
Vout C
Distributed RC model
Required for longer interconnect lines Exact solution requires solving diffusion equation No closed-form solution - approximations
R1
C1
R2
C2
RN CN
28
R1
C1
R2
C2
RN CN
29
The q unknown time- constants, or poles, of the circuit, are obtained through a moment matching technique (a Pad approximation) A first-order AWE approximation reduces to RC tree methods
30
Constructing RC Network
Resistance estimation (extraction) is easy
Rw = Lw W T
S C11
W T H
31
Electrical-field lines
cint =
di
t di
WL
SCwire =
S 1 = S SL SL
32
Interwire Capacitance
fringing
parallel
33
T H
Problem: wire-to-wire capacitance and wire-length not known until placement and routing of all circuit components Two modes of evaluating wire delay
Post P&R: 2, 2&1/2 or 3-D interconnect parasitic extraction (e.g. capacitance and resistance), distributed RC delay models Synthesis: tabular data based on prior chips
34
Wire delay based on gathered empirical data from past chips or estimated by typical (average) wirelength given by Rents rule see extra slides at the end of the lecture
Function of FO Function of block size fanout
5 4 3 2 1 100 500 1000 2000 3000 block size (gates) 35 cap
input slew intrinsic delay output capacitance non-linear models used in tabular approach
Physical footprint (area) Power characteristics
Library
37
Delay calculation
Delay numbers for gates Delay numbers for wires
Combinational logic
Combinational logic
Combinational logic
clk
clk
clk
Clocking issues Regimes: single-phase, two-phase, multi-phase overlapping, non-overlapping qualified clocks, clock skew
39
Clock Skew
The clock may be delivered to each FF at a different, unsynchronized, time
Combinational logic
Combinational logic
Combinational logic
clk
clk
clk
Delay calculation procedure Longest graphical path Longest true delay Method of calculation ``Batch mode Incremental
41
Typical Simplifications
Clocking issues
clocking regime treated as orthogonal issue
Delay modeling
gate - fixed pin-to-pin delays, interconnect - non-linear effects captured in tables Process-voltage-temperature - worst-case values used
Delay calculation
``extract combinational logic from sequential circuit Choose for accuracy
42
Delay calculation
Delay numbers for gates Delay numbers for wires
Combinational logic
Combinational logic
clk
clk
clk
original circuit
Combinational logic
extracted block
A
.05
2 Y 2 .05
.20 Z .15
C
.20 1
2 .20
Whats the right mathematical object to use to represent this physical object?
45
Problem formulation - 1
Use a labeled directed graph G = <V,E> Vertices represent gates, primary inputs and primary outputs Edges represent wires Labels represent delays Now what do we do with this?
1 0 W .10 1 0 .05 X
A
.05
2 Y 2 .05 0 1 W .05
.20 Z .15
C
.20 1
2 .20
A
0 0
.1
.20
2 .15 Z
.2
.20
46
dx z
Z dY z Y
A(Z)
A(Y)
A( ) = max (A(u) + du )
uFI( )
47
Problem formulation - 2
Use a labeled directed graph G = <V,E> Enumerate all paths - choose the longest?
0 W .10 1 0 .05 X
A
.5
2 Y 2 .05 0 1 W .5
.20 Z .15
C
.20 1
2 .20
A
0 0 1
X 2 .20 2 .15 Z
.1
.2
.20 Y 2
48
Problem formulation - 3
0 0 0 0
A
.1 .2 1 1
0 W .5
C0
0
Origin .1
of the DAG) Forward-prop(W){ for each vertex v in W
.20
B Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
for each edge <v,w> from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>)) if all incoming edges of w have been traversed add w to W } Longest path(G) Forward_prop(Origin) }
49
Algorithm Execution
0 0 0 0
AO
.1 .2 1 1
0 W .5
2 .20 2 Z .15 0
C0
0
Origin .1
of the DAG) Forward-prop(W){ for each vertex v in W
.20
B Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
for each edge <v,w> from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>)) if all incoming edges of w have been traversed add w to W } Longest path(G) Forward_prop(Origin) }
50
Algorithm Execution
0 0 0 0 0C
AO
0 .1 .2 1 1
0 W .5
2 .20 2 Z .15 0
Origin .1
of the DAG) Forward-prop(W){ for each vertex v in W
.20
B Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
for each edge <v,w> from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>)) if all incoming edges of w have been traversed add w to W } Longest path(G) Forward_prop(Origin) }
51
Algorithm Execution
0 0 0 0 0
AO
.1 .2 1 1
0 W .5
2 .20 2 Z .15 0
C0
0 .1
Origin .1
of the DAG) Forward-prop(W){
.20
B Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
for each vertex v in W for each edge <v,w> from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>)) if all incoming edges of w have been traversed add w to W } Longest path(G) Forward_prop(Origin) }
52
Algorithm Execution
0 0 0 0 0
AO
.1 .2 1 1
0 W .5
2 X
C0
0 .1
Origin .1
of the DAG) Forward-prop(W){
.20
B Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
for each vertex v in W for each edge <v,w> from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>)) if all incoming edges of w have been traversed add w to W } Longest path(G) Forward_prop(Origin) }
53
Algorithm Execution
AO
0 0 0 0 1 1.1 W .2 1 Y 2 .5 .20 X 2 .20 2 Z .15 0
C0
.1
.1
Origin .1
of the DAG) Forward-prop(W){ for each vertex v in W
B Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
for each edge <v,w> from v Final-delay(w) = max(Final-delay(w), delay(v) + delay (w) + delay(<v,w>)) if all incoming edges of w have been traversed add w to W } Longest path(G) Forward_prop(Origin) }
54
Algorithm Execution
AO
0 0 0 0 1 1.1 W .2 1 .5 2.2 Y 2 .20 X 2 .20 2 Z .15 0
C0
.1
.1
Origin .1
of the DAG) Forward-prop(W){ for each vertex v in W
B Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
for each edge <v,w> from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>)) if all incoming edges of w have been traversed add w to W } Longest path(G) Forward_prop(Origin) }
55
Algorithm Execution
AO
0 0 0 0 1 1.1 W .2 1 .5 3 Y 2 .20 X 2 .20 2 Z .15 0
C0
.1
.1
Origin .1
of the DAG) Forward-prop(W){ for each vertex v in W
B Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
for each edge <v,w> from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>)) if all incoming edges of w have been traversed add w to W } Longest path(G) Forward_prop(Origin) }
56
Algorithm Execution
AO
0 0 0 0 1 1.1 W .2 1 Y .5 3 2 .20 2 X 3.6 .20 2 Z .15 0
C0
0
.1
Origin 1
of the DAG) Forward-prop(W){ for each vertex v in W
B Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
for each edge <v,w> from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>)) if all incoming edges of w have been traversed add w to W } Longest path(G) Forward_prop(Origin) }
57
Algorithm Execution
AO
0 0 0 0 1 1.1 W .2 1 Y .5 2 3 .20 2 X 3.6 .20 2 5.8 Z .15 0
C0
0
.1
Origin 1
of the DAG) Forward-prop(W){ for each vertex v in W
B Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
for each edge <v,w> from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>)) if all incoming edges of w have been traversed add w to W } Longest path(G) Forward_prop(Origin) }
58
Algorithm Execution
AO
0 0 0 0 1 1.1 W .2 1 Y .5 2 3 .20 2 X 3.6 .20 2 5.8 Z .15 0
C0
0
.1
Origin 1
of the DAG) Forward-prop(W){ for each vertex v in W
B Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
for each edge <v,w> from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>)) if all incoming edges of w have been traversed add w to W } Longest path(G) Forward_prop(Origin) }
59
Algorithm Execution
AO
0 0 0 0 1 1.1 W .2 1 Y .5 2 3 .20 2 X 3.6 .20 2 5.8 Z .15 5.95
C0
0
.1
f
0
Origin 1
of the DAG) Forward-prop(W){ for each vertex v in W
B Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
for each edge <v,w> from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>)) if all incoming edges of w have been traversed add w to W } Longest path(G) Forward_prop(Origin) }
60
C0
0
.1
f
0
Origin 1
of the DAG) Forward-prop(W){ for each vertex v in W
B Compute the longest path in a graph G = <V,E,delay,Origin> (delay is set of labels, Origin is the super-source
for each edge <v,w> from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>)) if all incoming edges of w have been traversed add w to W } Longest path(G) Forward_prop(Origin) }
61
clock
62
Required Time
Required time R(v) is the time before which a signal must arrive to avoid a timing violation
Y
R(Y)
R(X)
dx Y dX z
Z R(Z)
Then recursively
R( ) =
uFO( )
min (R(u) du )
0 1 1.1 0.95 .2 1 W .5
2 X 3.45 2 Y
f
0
Origin
0.45
3 3.45
.20
Assume required time at output R(f) = 5.80 Propagate required times backwards
64
Timing Slack
From arrival and required time can compute slack. For each node v: S( ) = R( ) A( ) Slack reflects criticality of a node Positive slack
Node is not on critical path. Timing constraints met.
Zero slack
Node is on critical path. Timing constraints are barely met.
Negative slack
There is a timing violation
-0.15
0.45
66
Path slack reflects slack at output assuming no other path in circuit is active
Lemma 1: For any path =< v1,v 2 ,...,vk > through circuit, S(vi ) S(), for 1 i k
Lemma 2: For each cell v V there is some path v such that S(v) = S(v )
By increasing delay on only one cell in the path, can set path slack to zero -> other cells also have zero slack.
67
Combinational logic
Combinational logic
Combinational logic
clk
clk
clk
computing longest path delay full path enumeration - potentially exponential longest path algorithm on DAG (Kirkpatrick 1966, IBM JRD) (O(v+e) or O(g + p)) Currently in successful application on even the largest (>10M gate) circuits has two challenges: asynchronous sub-circuits - limited gate-level simulation false paths - ubiquitous and problematic
68
Delay calculation
Delay numbers for gates Delay numbers for wires
so
s1
mux
co
Full Adder
c1
Full Adder
c2
p0
p1
co
a0 b0
a1
b1
and
Lehman, Birla - IRETrans. Electron. Comput. , 1961 V. Oklobdzija - Jrnl. of VLSI Signal Processing, 1991
70
si+1 ci+1
Longest graphical/topological path runs along carry chain from stage to stage Longest path analysis would identify red path as critical
71
si+1 ci+1
To sensitize red path we need: aibi && ai+1 bi+1 But: red path is false because when this condition is true MUX selects 1 input, i.e. directly from ci-1 Instead shorter green paths are sensitized and red path is not the critical path of the circuit False paths first observed by V. Hrapcenko (Soviet Math. Dokl. 1978 )
72
Eliminates false paths from analysis Two approaches to false path elimination
User-specified path exceptions (still mainstream) Automatic false-path detection
73
74
a b
a b
75
timing variation
76
Gate-level models from Semiconductor vendors Driven by fast interpolation from tables Estimated (wire-load model) or back-annotated capacitances
Custom (e.g. Intel microprocessor)/COT (e.g. nVidia, ATI graphics chips)
Mixture of above and transistor-level static-timing verification Transistor level based on extracted device and wiring attributes Simulation of each transistor network builds accurate model
77
Extras
Rents rule RC Models
78
Early on data showed that there exist strong correlation between number of IO terminals and number of gates in the block
(Bakoglu)
79
Rents Rule
In 1960, E.F.Rent showed that
Np = K(Ng )
where Np =no. of external signal connections on a block, Ng = no. of logic gates in a block, K accounts for # of pins per gate, and is Rent s constant is a measure of many of the gates in a circuit block need to communicate with the outside world.
80
Lavg = f(Ng, )
(J. Davis)
Wirelength (gate pitches)
81
R1
1
C1 R3 R4
4
C4
3
Ri
C3
i
Ci
where Ri,k = R j (R j [path(i s) I path(k s)]) E.g. i == C1R1 + C2R1 + C3 (R1 + R3 ) + C4 (R1 + R3 ) + Ci (R1 + R3 + Ri )
82