
Dynamic Programming and

Some VLSI CAD Applications

Shmuel Wimer
Bar Ilan Univ. Eng. Faculty
Technion, EE Faculty

May 2012 Dynamic Programming 1


Outline
• NP-completeness paradox
• Efficient matrix multiplication by dynamic programming
• Dynamic programming in a tree model
  – Optimal tree covering in technology mapping
  – Optimal floorplanning
  – Optimal buffer insertion
• Dynamic programming as a sequential decision process
  – Resource allocation
  – The knapsack problem
  – Automatic cell layout generation
  – Optimal wire sizing


NP Completeness Paradox
Let A = {a_1, a_2, …, a_n} and the sizes s(a_1), s(a_2), …, s(a_n) ∈ ℤ⁺ constitute an
arbitrary instance of the PARTITION problem, where we ask whether there exists
A′ ⊆ A satisfying Σ_{a∈A′} s(a) = Σ_{a∈A∖A′} s(a).

If B = Σ_{a∈A} s(a) is an odd integer, then the answer is NO. Otherwise,
define a Boolean function t(i, j) as follows:

t(i, j) = T if there exists a subset A′ ⊆ {a_1, a_2, …, a_i} with Σ_{a∈A′} s(a) = j,
t(i, j) = F otherwise.

t(1, j) = T iff either j = 0 or j = s(a_1). For 1 < i ≤ n and 0 ≤ j ≤ B/2, t(i, j) = T
iff either t(i−1, j) = T, or s(a_i) ≤ j and t(i−1, j − s(a_i)) = T. The answer is
then YES iff t(n, B/2) = T.
Example: A = {a_1, a_2, a_3, a_4, a_5},
s(a_1) = 1, s(a_2) = 9, s(a_3) = 5, s(a_4) = 3 and s(a_5) = 8.

 i\j | 0 1 2 3 4 5 6 7 8 9 10 11 12 13
  1  | T T F F F F F F F F F  F  F  F
  2  | T T F F F F F F F T T  F  F  F
  3  | T T F F F T T F F T T  F  F  F
  4  | T T F T T T T F T T T  F  T  T
  5  | T T F T T T T F T T T  T  T  T

s(a_1) + s(a_2) + s(a_4) = 1 + 9 + 3 = 13 = 26/2 = s(a_3) + s(a_5)


Here is the paradox: it is easy to define an iterative
algorithm to fill the entries of the table. The complexity
of such an algorithm is a very low polynomial in the table
size nB.
Have we found a polynomial algorithm for PARTITION,
thus proving that P = NP?
Every s(a_i) can be coded at the input by a string of
O(log s(a_i)) bits. The length of the input of PARTITION
is therefore O(n log B). nB is not bounded by any
polynomial function of n log B.
The NP-completeness of PARTITION strongly depends
on allowing arbitrarily large input numbers. If those are
bounded in advance, the algorithm runs in polynomial time.
We call such algorithms pseudo-polynomial.
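The table above can be filled iteratively; a minimal Python sketch (keeping only the current row of t(i, j), since row i depends only on row i−1) might look like this:

```python
def partition(sizes):
    """Pseudo-polynomial DP for PARTITION.

    Returns True iff `sizes` can be split into two subsets of equal sum,
    filling the table t(i, j) of the slides row by row.
    """
    B = sum(sizes)
    if B % 2 == 1:                        # odd total sum: the answer is NO
        return False
    half = B // 2
    t = [False] * (half + 1)              # t[j]: some subset seen so far sums to j
    t[0] = True
    for s in sizes:
        for j in range(half, s - 1, -1):  # downward scan: use each item once
            if t[j - s]:
                t[j] = True
    return t[half]
```

Running it on the slide's example, `partition([1, 9, 5, 3, 8])`, returns True (the split {a_1, a_2, a_4} vs. {a_3, a_5}).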


Optimal Matrix-Chain Multiplication
Let A, B and C be k×l, l×m and m×n matrices, respectively.
Consider the cost of computing D = ABC = (AB)C = A(BC),
measured as the number of scalar products.

The elements of E = BC are given by E_rs = Σ_{t=1}^{m} B_rt C_ts,
1 ≤ r ≤ l, 1 ≤ s ≤ n, i.e., l·m·n products are required
for the matrix multiplication.

For k = 10, l = 100, m = 5 and n = 50:
A(10×100) (B(100×5) C(5×50)): 100·5·50 + 10·100·50 = 75000 products,
(A(10×100) B(100×5)) C(5×50): 10·100·5 + 10·5·50 = 7500 products.
Problem: how to parenthesize the chain multiplication
A_1 A_2 ⋯ A_n so as to minimize the number of products?

We could calculate upfront the best parenthesization,
but how many parenthesizations exist?

We could split the matrix product at any 1 ≤ k ≤ n−1 into:
(multiply k matrices) × (multiply n−k matrices).


This yields the recursive equation
P(n) = 1 if n = 1,
P(n) = Σ_{k=1}^{n−1} P(k) P(n−k) if n ≥ 2,
whose solution is P(n) = Ω(4^{n−1} / (n−1)^{3/2}).
P(n) is exponential in n.

Denote by A_{i..j} the result of A_i A_{i+1} ⋯ A_j. For i ≤ k ≤ j−1, a
parenthesization is a tree whose root splits A_i A_{i+1} ⋯ A_j
into A_{i..k} and A_{k+1..j}. An optimal parenthesization implies
an optimal parenthesization of each of A_{i..k} and A_{k+1..j}.
An optimal solution must contain optimal solutions
of subproblems.

Let m[i, j] be the minimal number of scalar multiplications
to produce A_{i..j}, and let the size of A_i be p_{i−1} × p_i. Then
m[i, j] = 0 if i = j,
m[i, j] = min_{i ≤ k < j} { m[i, k] + m[k+1, j] + p_{i−1} p_k p_j } if i < j.

m[1, n] is then the smallest number of scalar
multiplications to compute A_{1..n}.

The optimal solution solves subproblems recursively.


The recurrence is no better than exploring all
parenthesizations, since it expands a full binary
tree and the same m[i, j] is computed many times,
while only O(n²) distinct m[i, j] exist.
Let s[i, j] denote the split index at which m[i, j]
is obtained. The trick is to compute m[i, j] in
increasing order of the chain length l = j − i. We use
two tables m[1..n, 1..n] and s[1..n, 1..n] to store
m[i, j] and s[i, j], respectively.
Overlapping solutions of sub-problems are memoized.

Matrix-chain product minimization can be computed
in O(n³) time.


MatrixChainOrder(p_0, p_1, …, p_n) {
  for (i = 1 to n) { m[i, i] = 0 }  // initialize length-1 chains
  for (l = 2 to n) {  // increasing chain length
    for (i = 1 to n − l + 1) {  // set starting index
      j = i + l − 1;  // set ending index
      m[i, j] = ∞;  // initialize smallest number of scalar products
      for (k = i to j − 1) {  // set the split of A_{i..j}
        // m[i, k] and m[k+1, j] are already known!
        q = m[i, k] + m[k+1, j] + p_{i−1} p_k p_j;
        if (q < m[i, j]) { m[i, j] = q; s[i, j] = k }
      }
    }
  }
  return tables m and s;
}
A_1: 30×35, A_2: 35×15, A_3: 15×5, A_4: 5×10, A_5: 10×20, A_6: 20×25

m[1..6, 1..6]                                s[1..6, 1..6]
      i=1    i=2   i=3   i=4   i=5  i=6           i=1 i=2 i=3 i=4 i=5
j=6  15125  10500  5375  3500  5000    0      j=6   3   3   3   5   5
j=5  11875   7125  2500  1000     0           j=5   3   3   3   4
j=4   9375   4375   750     0                 j=4   3   3   3
j=3   7875   2625     0                       j=3   1   2
j=2  15750      0                             j=2   1
j=1      0

m[2,5] = min of:
  m[2,2] + m[3,5] + p_1 p_2 p_5 = 0 + 2500 + 35·15·20 = 13000,
  m[2,3] + m[4,5] + p_1 p_3 p_5 = 2625 + 1000 + 35·5·20 = 7125,
  m[2,4] + m[5,5] + p_1 p_4 p_5 = 4375 + 0 + 35·10·20 = 11375
= 7125
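The pseudocode translates directly to Python; running it on this six-matrix example reproduces the tables above:

```python
import math

def matrix_chain_order(p):
    """Bottom-up DP for matrix-chain multiplication.

    p[0..n] are the dimensions: matrix A_i is p[i-1] x p[i].
    Returns (m, s), 1-indexed tables of minimal scalar-product counts
    and of the split indices at which they are attained.
    """
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for l in range(2, n + 1):                 # increasing chain length
        for i in range(1, n - l + 2):
            j = i + l - 1
            m[i][j] = math.inf
            for k in range(i, j):             # split A_i..j into A_i..k, A_k+1..j
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k
    return m, s

# the slide's example: A1: 30x35, A2: 35x15, ..., A6: 20x25
m, s = matrix_chain_order([30, 35, 15, 5, 10, 20, 25])
```

Here `m[1][6] == 15125` and `s[1][6] == 3`, matching the tables.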


The procedure MatrixChainOrder does not directly
perform the multiplication.

That information is derived from s[1..n, 1..n] by
recursive construction of the split binary tree:
start from s[1, n], which yields A_{1..n}, then call
s[1, s[1, n]] and s[s[1, n]+1, n], yielding A_{1..s[1,n]}
and A_{s[1,n]+1..n}, respectively, etc.
MatrixChainMultiply(A_1, …, A_n, s[1..n, 1..n], i, j) {
  if (i < j) {
    A_{i..s[i,j]} = MatrixChainMultiply(A_1, …, A_n, s[1..n, 1..n], i, s[i, j]);
    A_{s[i,j]+1..j} = MatrixChainMultiply(A_1, …, A_n, s[1..n, 1..n], s[i, j]+1, j);
    return A_{i..s[i,j]} A_{s[i,j]+1..j};
  }
  else { return A_i }
}

Construction of the optimal solution (backtracking).
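In Python the backtracking can be sketched by reading the split table into a parenthesization string (the table below is copied from the example's s[1..6, 1..6]):

```python
def optimal_parens(s, i, j):
    """Return the optimal parenthesization of A_i..A_j encoded by the
    split table s (as produced by the matrix-chain DP)."""
    if i == j:
        return f"A{i}"
    k = s[i][j]
    return "(" + optimal_parens(s, i, k) + optimal_parens(s, k + 1, j) + ")"

# split table of the six-matrix example (1-indexed, upper triangle)
n = 6
s = [[0] * (n + 1) for _ in range(n + 1)]
s[1][2] = 1; s[1][3] = 1; s[2][3] = 2
s[1][4] = 3; s[2][4] = 3; s[3][4] = 3
s[1][5] = 3; s[2][5] = 3; s[3][5] = 3; s[4][5] = 4
s[1][6] = 3; s[2][6] = 3; s[3][6] = 3; s[4][6] = 5; s[5][6] = 5
```

`optimal_parens(s, 1, 6)` yields `((A1(A2A3))((A4A5)A6))`.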


Elements of Dynamic Programming
• A problem exhibits optimal substructure if an optimal
solution to the problem contains within it optimal
solutions to sub-problems.

• In a sequence of decisions, the remaining ones must
constitute an optimal solution regardless of past
decisions (principle of optimality).

• The space of sub-problems must be small, namely, a
recursive solution must solve the same problem many
times. The optimization problem has overlapping
sub-problems.


• Overlapping sub-problems called by the recursive solution
are memoized (encoded in a table), hence their solutions
are addressed only once.

• The optimal solution is constructed by backtracking.


Optimal Tree Covering
A problem occurring in mapping a logic circuit into a
new cell library. Given:

• A rooted binary tree T(V, E) called the subject tree (a cone of the
logic circuit), whose leaves are inputs, whose root is an output,
and whose internal nodes are logic gates with their I/O pins.

• A family of rooted pattern trees (logic cells of the library),
each associated with a non-negative cost (area, power,
delay). The root is the cell's output and the leaves are its inputs.


A cover of the subject tree is a partitioning where
every part matches an element of the library and
every edge of the subject tree is covered exactly
once.
Find a cover of the subject tree whose total sum
of costs is minimal.


[Figure: a library of pattern trees t1 (cost 2), t2 (3), t3 (3), t4 (4), t5 (5),
and three covers of a subject tree with root r and inputs s, t, of total
costs 3+2+2+3 = 10, 4+2+3 = 9 and 3+5 = 8.]


[Figure: a subject tree with internal nodes a–j; each node is annotated
with its best matching cell (INV, NAND2, NAND3, AOI21) and the minimal
cumulative cost at that node.]

Observation: a pattern p rooted at the root of T(V, E) yields
minimal cost only if the cost at each of p's leaves is minimal,
suggesting a bottom-up matching algorithm.
TreeCover(T(V, E), P) {
  foreach (v ∈ V) {
    if (v is a leaf) { cost[v] = 0 } else { cost[v] = −1 }
  }
  while (some v ∈ V with cost[v] = −1 exists) {
    select v ∈ V whose children all have nonnegative cost;
    cost[v] = ∞;
    M(v) = set of all patterns matching at v;
    L(m) = set of u ∈ V matching the leaves of an m ∈ M(v);
    cost[v] = min_{m ∈ M(v)} { cost(m) + Σ_{u ∈ L(m)} cost[u] };
  }
}
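A sketch of the bottom-up covering in Python. Real technology mapping derives M(v) by pattern matching against the library; here a hypothetical `matches` table stands in for that step:

```python
def tree_cover(root, matches):
    """Minimal-cost tree covering, bottom-up (a sketch).

    matches[v] lists (pattern_cost, leaves) pairs: each is one way to
    cover node v by a single pattern whose input boundary is `leaves`.
    Nodes absent from `matches` are primary inputs with cost 0.
    """
    cost = {}

    def solve(v):
        if v in cost:
            return cost[v]
        if not matches.get(v):          # a leaf (primary input) of the subject tree
            cost[v] = 0
        else:
            cost[v] = min(c + sum(solve(u) for u in leaves)
                          for c, leaves in matches[v])
        return cost[v]

    return solve(root)

# hypothetical matches: cover 'root' directly by a large cell (cost 5),
# or by a small cell (cost 2) whose input 'm' is covered for another 2
matches = {"root": [(5, ["a", "b"]), (2, ["m"])],
           "m": [(2, ["a", "b"])]}
```

`tree_cover("root", matches)` returns 4: the pair of small cells beats the single large one.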


Optimal Buffer Insertion

[Figure: a routing tree driven at root v; each leaf is a receiver (u, q)
with required time q, and internal nodes marked "?" are candidate buffer
positions.]

d(v, u_i): driver-to-receiver delay. Root required time: T = min_i { q_i − d(v, u_i) }.

Problem 1: maximize min_i { q_i − d(v, u_i) } over buffer insertions at
internal nodes.

A buffer reduces load delay but adds internal delay, power and area.

Problem 2: maximize min_i { q_i − d(v, u_i) } over buffer insertions,
s.t. power and area constraints.


Delay Model

[Figure: an RC tree rooted at node 0 with resistances R_1..R_7 and node
capacitances C_1..C_7.]

π(k) – the nodes along the path from the root to node k
T(k) – the nodes of the sub-tree rooted at node k
R_kl = Σ_{j ∈ π(k)∩π(l)} R_j – the resistance along common paths
L_k = Σ_{j ∈ T(k)} C_j – the capacitance of the sub-tree

d(v, u_i) = Σ_j R_ji C_j = Σ_{j ∈ π(i)} R_j L_j


Bottom-Up Solution

[Figure: node K with wire resistance R_K and capacitance C_K joining child
sub-trees M and N, carrying candidates (T_M, L_M) and (T_N, L_N).]

Merging the two sub-trees:
T_K = min{T_M, T_N}
L_K = L_M + L_N

Without buffer:
T′_K = T_K − R_K L_K − (1/2) R_K C_K
L′_K = L_K + C_K

With buffer:
T′_K = T_K − D_buffer − R_buffer L_K − R_K C_buffer − (1/2) R_K C_K
L′_K = C_buffer + C_K


Outline of Algorithm
With b nodes, 2^b buffer insertions exist. There's a polynomial solution!

Compare (T′, L′) and (T″, L″) at a node. If T″ ≤ T′ and L″ ≥ L′ then (T″, L″)
is dropped, as it necessarily results in a non-optimal solution. Candidate
optimal solutions are obtained at the root, from which an optimal one is
chosen. The nodes of buffer insertion are obtained by top-down backtracking.

Merging sub-tree solutions at a parent node takes linear time!

[Figure: two sorted candidate lists (L_M, T_M) and (L_N, T_N) are merged
into the parent's list (L′_K, T′_K).]
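A sketch of the candidate bookkeeping in Python, with the wire and buffer updates separated into helper steps (hypothetical names; the slide folds the wire at K into a single per-node update):

```python
def prune(cands):
    """Drop dominated (T, L) pairs: (T2, L2) is useless if some (T1, L1)
    has T1 >= T2 and L1 <= L2 (at least the slack for at most the load)."""
    cands = sorted(set(cands), key=lambda c: (c[1], -c[0]))  # by load, best slack first
    kept, best_T = [], float("-inf")
    for T, L in cands:
        if T > best_T:          # strictly more slack than any lighter candidate
            kept.append((T, L))
            best_T = T
    return kept

def add_wire(cands, R, C):
    """Wire of resistance R, capacitance C: T' = T - R*L - R*C/2, L' = L + C."""
    return prune([(T - R * L - R * C / 2, L + C) for T, L in cands])

def add_buffer(cands, D_buf, R_buf, C_buf):
    """Optionally insert a buffer: keep the unbuffered candidates plus the
    buffered ones (T' = T - D_buf - R_buf*L, L' = C_buf)."""
    buffered = [(T - D_buf - R_buf * L, C_buf) for T, L in cands]
    return prune(cands + buffered)

def merge(cands_m, cands_n):
    """Join two children: T = min(T_M, T_N), L = L_M + L_N over all
    pairings, then prune (the slide's sorted-list merge does this in
    linear time; this quadratic version is for clarity)."""
    return prune([(min(tm, tn), lm + ln)
                  for tm, lm in cands_m for tn, ln in cands_n])
```

For example, `prune([(5, 10), (4, 12), (6, 8)])` keeps only `(6, 8)`, since it has the most slack at the least load.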


Interconnect Signal Model

[Figure: a distributed RC line with the driver's resistance, line
resistance, line-to-line coupling on both sides, and the receiver's load;
the signal's activity satisfies 0 ≤ AF ≤ 1.]

Using the Elmore delay model: simple and inaccurate, but with high fidelity.


Interconnect Bus Model

[Figure: a bus of n signals over length L and total width A; signal S_i has
wire width W_i, spacing σ_i to its neighbor, driver resistance R_i and
load C_i.]


Delay and Dynamic Power Minimization

Signal's delay:
D_i(s_{i−1}, w_i, s_i) = α_i + β_i w_i + γ_i / w_i + (δ_i + ε_i w_i)(1/s_{i−1} + 1/s_i), 1 ≤ i ≤ n
α_i, β_i, γ_i, δ_i, ε_i – determined by technology parameters, the driver's
resistance, the capacitive load and the bus length L.

Signal's dynamic power:
P_i(s_{i−1}, w_i, s_i) = ζ_i w_i + η_i (1/s_{i−1} + 1/s_i), 1 ≤ i ≤ n
ζ_i, η_i – determined by technology parameters, the signal's activity, and
the bus length L.


Minimize bus delay

D^sum(s, w) = Σ_{i=1}^{n} D_i(s_{i−1}, w_i, s_i)  or  D^max(s, w) = max_{1≤i≤n} D_i(s_{i−1}, w_i, s_i)

Minimize bus power

P(s, w) = Σ_{i=1}^{n} P_i(s_{i−1}, w_i, s_i)

Subject to:

Σ_{i=1}^{n} w_i + Σ_{i=0}^{n} s_i ≤ A

In the 32nm node and beyond, spaces and widths take very few discrete values:

s_i ∈ S = {S_1, …, S_p} and w_i ∈ W = {W_1, …, W_q}

Continuous optimization and its well-known results are invalid. The sizing
problem is NP-complete. A pseudo-polynomial resource-allocation dynamic
programming solution is suitable.


 :  w0 , s0 , w1, s1,..., wn , sn  is a sequence of allocation decisions.

Observation: After  wi , si  , 0  i  j, are decided, optimal allocation

of rest n  1  j wires depends only on s j and A   j


i 0
j

wi  i 0 si .

  :  w0 , s0 ,..., wj , sj  is dominant and   :  w0, s0,..., wj , sj  is redundant if:
j
1. A0..  j j
 
 i0 si   i0 wi   i0 si  i0 wi
j j
  j
A0..
2. sj  sj
3. D     D    and P     P   

The treiplet A0.. j , s j ,  D  A0.. j , s j  , P  A0.. j , s j  is a state and it is

sufficient to maintain only non redundant states.


May 2012 Dynamic Programming 32
Dynamic programming comprises n decision stages. Each stage expands all
non-redundant states.

A state (A_{0..j+1}, s_{j+1}, (D(A_{0..j+1}, s_{j+1}), P(A_{0..j+1}, s_{j+1}))) at stage j+1 is obtained
from the states (A_{0..j}, s_j, (D(A_{0..j}, s_j), P(A_{0..j}, s_j))) of stage j by augmentation
with all permissible (w, s) ∈ {W_1, …, W_q} × {S_1, …, S_p}.

A stage maintains only non-redundant states.

The algorithm can be extended to arbitrary routing by constructing the wire
visibility graph and topologically ordering the graph's nodes.
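The staged state expansion can be sketched as follows (a simplified variant that tracks only total delay under the area budget; `delay` is a hypothetical per-wire model standing in for D_i):

```python
def size_bus(n, A, widths, spaces, delay):
    """Pseudo-polynomial DP for discrete wire sizing (a sketch).

    States are keyed by (area_used, last_space); each keeps the best
    total delay of any decision prefix reaching that key -- the slides'
    dominance pruning, restricted here to delay only.
    delay(s_prev, w, s) is the contribution of one wire.
    """
    # stage 0: choose the boundary spacing s_0
    states = {(s0, s0): 0.0 for s0 in spaces if s0 <= A}
    for _ in range(n):                  # one stage per wire
        nxt = {}
        for (area, s_prev), d in states.items():
            for w in widths:
                for s in spaces:
                    a2 = area + w + s
                    if a2 > A:
                        continue        # violates the total width budget
                    d2 = d + delay(s_prev, w, s)
                    key = (a2, s)
                    if key not in nxt or d2 < nxt[key]:
                        nxt[key] = d2
        states = nxt
    return min(states.values()) if states else None
```

With the toy model delay(s_prev, w, s) = 1/w + 1/s_prev + 1/s, two wires, widths and spaces in {1, 2} and budget A = 8, the optimum is 4.0.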


Floorplan and Layout

[Figure: a floorplan of blocks B1–B12 and its graph representation.]

A floorplan is represented by a planar graph.

Vertices – vertical lines. Arcs – rectangular areas where blocks are embedded.
A dual graph is implied.
From Floorplan to Layout

• Actual layout is obtained by embedding real blocks into floorplan


cells.

– Blocks’ adjacency relations are maintained

– Blocks are not perfectly matched, thus white area (waste) results

• Layout width and height are obtained by assigning blocks’


dimensions to corresponding arcs.

– Width and height are derived from longest paths

• Different block dimensions yield different layout area, even when block
areas are invariant.


Optimal Slicing Floorplan

The top block's area is divided by vertical and horizontal cut-lines.
In the slicing tree, leaf blocks are associated with areas.

[Figure: a slicing floorplan of blocks B1–B12 and the corresponding slicing
tree, whose internal nodes are vertical (v) and horizontal (h) cuts.]


Let block B_i, 1 ≤ i ≤ b, have possible implementations (x_i^j, y_i^j),
1 ≤ j ≤ n_i, of fixed area x_i^j · y_i^j = a_i.

In the most simplified case each B_i, 1 ≤ i ≤ b, has 2 implementations,
corresponding to its 2 orientations.

Problem: find among the 2^b possible block orientations Φ_i, 1 ≤ i ≤ 2^b,
the one of smallest area.

Theorem (L. Stockmeyer): given a slicing floorplan of b blocks whose
slicing tree has depth d, finding the orientation that yields the smallest
area takes O(bd) time and O(b) storage.


Merge horizontally two width-height sets (vertical cut-line):
h_parent = max(h_left, h_right)
w_parent = w_left + w_right

[Figure: the children's sorted width-height lists are merged pairwise into
the parent's list.]


VerticalMerging({(w′_i, h′_i)}_{i=1..s}, {(w″_j, h″_j)}_{j=1..t}) { // horizontal cut-line
  // lists are sorted in descending order of width
  i = 1; j = 1;
  while ((i ≤ s) && (j ≤ t)) {
    w_parent = max(w′_i, w″_j);
    h_parent = h′_i + h″_j;
    if (w′_i > w″_j) { ++i }
    else if (w′_i < w″_j) { ++j }
    else { ++i; ++j } // w′_i = w″_j
  }
}
The size of the new width-height list equals the sum of the lengths of the
children's lists, rather than their product.
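The merge can be sketched in Python (here for a horizontal cut-line: the blocks are stacked, so widths take the max and heights add; both input lists sorted in descending width):

```python
def merge_horizontal_cut(left, right):
    """Merge two children's (width, height) lists under a horizontal
    cut-line. Inputs sorted by descending width; the output is likewise
    sorted, with at most len(left) + len(right) non-dominated entries.
    """
    i, j, out = 0, 0, []
    while i < len(left) and j < len(right):
        (wl, hl), (wr, hr) = left[i], right[j]
        out.append((max(wl, wr), hl + hr))
        # advance whichever list holds the larger width: only shrinking
        # it can reduce the merged width further
        if wl >= wr:
            i += 1
        if wr >= wl:
            j += 1
    return out
```

For example, stacking a 2×3 block (orientations `[(3, 2), (2, 3)]`) over a 1×4 block (`[(4, 1), (1, 4)]`) gives `[(4, 3), (3, 6), (2, 7)]`.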
Sketch of Proof
• Problem is solved by a bottom-up dynamic programming algorithm
working on corresponding slicing tree.
• Each node maintains a set of width-height pairs, none of which can
be ruled out until root of tree is reached. Size of sets is in the order
of node’s leaf count. Sets in leaves are just Bi’s two orientations.
• The set of width-height pairs at each node is created by merging
the sets of the left-son and right-son sub-trees in time linear in their size.
• Width-height pair sets are maintained as a sorted list in one
dimension (hence sorted inversely in the other dimension).
• Final implementation is obtained by backtracking from the root.



Automatic Cell Layout Generation

A 3-step process:
1. Transistor placement
2. Interconnect completion
3. Design rule adherence

Transistor placement comprises:
1. Transistor P-N pairing
2. Pair ordering
3. Pair flipping – optimizing cell area, node capacitance, potential cell
abutment, and the cell's internal routing

[Figure: three flip configurations of adjacent transistor pairs with
abutment costs 0, 1 and 2; matching diffusion nets (Vcc, Vss, a, b) at the
common edge allow abutment.]


• Most cells unfortunately contain more than 4 transistors.
• A flip configuration of a pair depends on the flips of its left and right
neighbors.
• Seek the flip configuration yielding the minimal sum of abutment costs.
  – With n pairs, there are 2^n solutions to consider.
• Observation: an optimal flip of j+1 pairs subject to a given right-end
configuration of pair j necessitates that the first j pairs have been
optimally flipped.
  – Principle of optimality: optimal sub-problem solutions.
• Observation: the optimal flip of the remaining n − j pairs is independent
of the first j flips except for the right-end configuration of pair j.
  – This defines a state for which only the lowest-cost flip of j pairs is of
interest.
• A dynamic programming solution is in order (Bar-Yehuda et al.).
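A sketch of the flip DP in Python; the pair model and abutment cost below are hypothetical (cost 0 when the touching diffusion nets match, else 1):

```python
def best_flips(pairs, abut_cost):
    """O(n) DP over flip states: cost[f] is the minimal total abutment
    cost of the prefix, given that the current pair ends with flip f."""
    cost = {False: 0, True: 0}          # first pair: either flip is free
    for i in range(1, len(pairs)):
        cost = {f: min(cost[g] + abut_cost(pairs[i - 1], g, pairs[i], f)
                       for g in (False, True))
                for f in (False, True)}
    return min(cost.values())

def net_abut_cost(p, fp, q, fq):
    """Hypothetical model: a pair is (left_net, right_net); flipping swaps
    its ends; abutment is free iff the touching nets match."""
    right_of_p = p[0] if fp else p[1]
    left_of_q = q[1] if fq else q[0]
    return 0 if right_of_p == left_of_q else 1
```

For example, `best_flips([("a", "b"), ("b", "c"), ("a", "c")], net_abut_cost)` returns 0: flipping only the third pair makes every junction abut.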


State Augmentation

[Figure: the four flip configurations of pair j at stage j are each
augmented with the four configurations of pair j+1, adding the
corresponding abutment cost to each transition.]


• Dynamic programming takes O(n) time.
• Can be extended to multi-row cells (double height, etc.).
• It can be combined in a DFS algorithm which simultaneously considers
pairing, pair ordering and optimal flips, without any complexity overhead
(state augmentation takes O(1) time).
• Dynamic programming is in fact solving a shortest-path problem on the
state transition graph.
• New litho rules at 32nm and smaller feature sizes offer many optimization
opportunities.


Resource Allocation

K units (integer) of a resource are used for manufacturing n commodities.
The production of x_i units (integer) of commodity i consumes c_i(x_i) of the
resource (integer), where c_i(0) = 0, and produces profit p_i(x_i). At most
B units can be allocated for each commodity.

The optimal resource allocation problem is therefore:

maximize over all allocations: Σ_{i=1}^{n} p_i(x_i)

subject to: Σ_{i=1}^{n} c_i(x_i) ≤ K and 0 ≤ x_i ≤ B, 1 ≤ i ≤ n


Allocation can be viewed as sequential decision making. Let commodities
1 through j have already been produced, consuming 0 ≤ Σ_{i=1}^{j} c_i(x_i) ≤ K.

Define f(j, y) as the maximal total profit that can be achieved by allocating
y units for producing commodities 1 through j. By definition, f(n, K) solves
the problem.

The production of x units of commodity j must satisfy c_j(x) ≤ y.

Functional equations which can be solved recursively result:

f(j, y) = max_{x ≤ B} p_1(x) if j = 1
f(j, y) = max_{x ≤ B, c_j(x) ≤ y} { p_j(x) + f(j−1, y − c_j(x)) } if 1 < j ≤ n
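The functional equations can be evaluated iteratively over j (a sketch; `c` and `p` stand for arbitrary tabulated consumption/profit functions):

```python
def allocate(n, K, B, c, p):
    """Solve f(j, y) for the resource allocation problem.

    c(j, x), p(j, x): integer resource consumption and profit of producing
    x units of commodity j, with c(j, 0) == 0. Returns f(n, K).
    """
    f = [0] * (K + 1)                   # f[y] after processing commodities 1..j
    for j in range(1, n + 1):
        f = [max(p(j, x) + f[y - c(j, x)]
                 for x in range(B + 1) if c(j, x) <= y)
             for y in range(K + 1)]
    return f[K]
```

For example, with c(j, x) = x and unit profits 3 and 2 for the two commodities, `allocate(2, 4, 2, ...)` gives 10 (produce 2 units of each).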


Elements of Dynamic Programming

• Sequential decision making process.
• Transitions occur from state to state.
• A state is a summary of the prior history of the process, sufficiently
detailed to enable evaluation of the current alternatives.
  – The sequential decision process evolves from state to state.
  – The pair (j, y) is a state in the resource allocation process.
  – The elements encoded in a state are called state variables.
• The principle of optimality states that whatever the initial state and
decisions were, the remaining decisions must constitute an optimal
policy.


Linear Case: Knapsack Problem

The resource is a knapsack of volume K.
A unit of commodity i occupies volume c_i, yielding profit p_i.
We look for the most profitable way to pack the sack.

To allow a partially empty sack we introduce commodity 0 with c_0 = 1 and
p_0 = 0, resulting in the problem:

maximize over all allocations: Σ_{i=0}^{n} p_i x_i

subject to: Σ_{i=0}^{n} x_i c_i = K, x_i ≥ 0 and integer, 0 ≤ i ≤ n


Linear Case: Knapsack Problem

Let some items have been put into the sack and volume y remains.

Linearity implies that packing the rest is independent of the past.

y ∈ {0, 1, …, K} is the state, and f(y) is the maximum profit obtainable
from packing volume y.

Hence f(y) = max_{j : c_j ≤ y} { p_j + f(y − c_j) }, y = 1, …, K.
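The state recurrence packs into a few lines of Python (unused capacity is allowed directly here, instead of via the slack commodity 0):

```python
def knapsack(K, c, p):
    """Unbounded knapsack: f(y) = max over j with c[j] <= y of
    p[j] + f(y - c[j]), f(0) = 0; c[j], p[j] are the unit volume and
    profit of commodity j."""
    f = [0] * (K + 1)
    for y in range(1, K + 1):
        f[y] = max((p[j] + f[y - c[j]]
                    for j in range(len(c)) if c[j] <= y),
                   default=0)
    return f[K]
```

For example, `knapsack(10, [3, 4], [5, 7])` returns 17: two items of volume 3 plus one of volume 4.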
