
Dynamic Programming and

Some VLSI CAD Applications

Shmuel Wimer
Bar Ilan Univ. Eng. Faculty
Technion, EE Faculty

May 2012 Dynamic Programming 1


Outline
• NP-completeness paradox
• Efficient matrix multiplication by dynamic programming
• Dynamic programming in a tree model
  – Optimal tree covering in technology mapping
  – Optimal floorplanning
  – Optimal buffer insertion
• Dynamic programming as a sequential decision process
  – Resource allocation
  – The knapsack problem
  – Automatic cell layout generation
  – Optimal wire sizing


NP Completeness Paradox
Let A = {a_1, a_2, …, a_n} and the sizes s(a_1), s(a_2), …, s(a_n) ∈ ℤ⁺ constitute an
arbitrary instance of the PARTITION problem, where we ask whether there exists
A′ ⊆ A satisfying Σ_{a∈A′} s(a) = Σ_{a∈A∖A′} s(a).

If B = Σ_{a∈A} s(a) is an odd integer, then the answer is NO. Otherwise,
define a Boolean function t(i, j) as follows:

t(i, j) = T if there exists a subset A′ ⊆ {a_1, a_2, …, a_i} with Σ_{a∈A′} s(a) = j,
t(i, j) = F otherwise.

t(1, j) = T iff either j = 0 or j = s(a_1). For 1 < i ≤ n and 0 ≤ j ≤ B/2, t(i, j) = T
iff either t(i−1, j) = T, or s(a_i) ≤ j and t(i−1, j − s(a_i)) = T. The answer is
then YES iff t(n, B/2) = T.
Example: A = {a_1, a_2, a_3, a_4, a_5},
s(a_1) = 1, s(a_2) = 9, s(a_3) = 5, s(a_4) = 3 and s(a_5) = 8.

 i\j | 0 1 2 3 4 5 6 7 8 9 10 11 12 13
  1  | T T F F F F F F F F F  F  F  F
  2  | T T F F F F F F F T T  F  F  F
  3  | T T F F F T T F F T T  F  F  F
  4  | T T F T T T T F T T T  F  T  T
  5  | T T F T T T T F T T T  T  T  T

s(a_1) + s(a_2) + s(a_4) = 1 + 9 + 3 = 13 = 26/2 = s(a_3) + s(a_5)


Here is the paradox: it is easy to define an iterative
algorithm to fill the entries of the table. The complexity
of such an algorithm is a very low polynomial in the table
size nB.
Have we found a polynomial algorithm for PARTITION,
thus proving that P = NP?
Every s(a_i) can be coded at the input by a string of
O(log s(a_i)) bits. The length of the input of PARTITION
is therefore O(n log B). nB is not bounded by any
polynomial function of n log B.
The NP-completeness of PARTITION strongly depends
on allowing arbitrarily large input numbers. If those are
bounded in advance, the algorithm runs in polynomial time.
We call such algorithms pseudo-polynomial.
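The table above can be filled iteratively; a minimal Python sketch (keeping only the current row of t(i, j), since row i depends only on row i−1) might look like this:

```python
def partition(sizes):
    """Pseudo-polynomial DP for PARTITION.

    Returns True iff `sizes` can be split into two subsets of equal sum,
    filling the table t(i, j) of the slides row by row.
    """
    B = sum(sizes)
    if B % 2 == 1:                        # odd total sum: the answer is NO
        return False
    half = B // 2
    t = [False] * (half + 1)              # t[j]: some subset seen so far sums to j
    t[0] = True
    for s in sizes:
        for j in range(half, s - 1, -1):  # downward scan: use each item once
            if t[j - s]:
                t[j] = True
    return t[half]
```

Running it on the slide's example, `partition([1, 9, 5, 3, 8])`, returns True (the split {a_1, a_2, a_4} vs. {a_3, a_5}).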


Optimal Matrix-Chain Multiplication
Let A, B and C be k×l, l×m and m×n matrices, respectively.
Consider the cost of computing D = ABC = (AB)C = A(BC),
measured as the number of scalar products.

The elements of E = BC are given by E_rs = Σ_{t=1}^{m} B_rt C_ts,
1 ≤ r ≤ l, 1 ≤ s ≤ n, i.e., l·m·n products are required
for the matrix multiplication.

For k = 10, l = 100, m = 5 and n = 50:
A(10×100) (B(100×5) C(5×50)): 100·5·50 + 10·100·50 = 75000 products,
(A(10×100) B(100×5)) C(5×50): 10·100·5 + 10·5·50 = 7500 products.
Problem: how to parenthesize the chain multiplication
A_1 A_2 ⋯ A_n so as to minimize the number of products?

We could calculate upfront the best parenthesization,
but how many parenthesizations exist?

We could split the matrix product at any 1 ≤ k ≤ n−1 into:
(multiply k matrices) × (multiply n−k matrices).


This yields the recursive equation
P(n) = 1 if n = 1,
P(n) = Σ_{k=1}^{n−1} P(k) P(n−k) if n ≥ 2,
whose solution is P(n) = Ω(4^{n−1} / (n−1)^{3/2}).
P(n) is exponential in n.

Denote by A_{i..j} the result of A_i A_{i+1} ⋯ A_j. For i ≤ k ≤ j−1, a
parenthesization is a tree whose root splits A_i A_{i+1} ⋯ A_j
into A_{i..k} and A_{k+1..j}. An optimal parenthesization implies
an optimal parenthesization of each of A_{i..k} and A_{k+1..j}.
An optimal solution must contain optimal solutions
of subproblems.

Let m[i, j] be the minimal number of scalar multiplications
to produce A_{i..j}, and let the size of A_i be p_{i−1} × p_i. Then
m[i, j] = 0 if i = j,
m[i, j] = min_{i ≤ k < j} { m[i, k] + m[k+1, j] + p_{i−1} p_k p_j } if i < j.

m[1, n] is then the smallest number of scalar
multiplications to compute A_{1..n}.

The optimal solution solves subproblems recursively.


The recurrence is no better than exploring all
parenthesizations, since it expands a full binary
tree and the same m[i, j] is computed many times,
while only O(n²) distinct m[i, j] exist.
Let s[i, j] denote the split index at which m[i, j]
is obtained. The trick is to compute m[i, j] in
increasing order of the chain length l = j − i. We use
two tables m[1..n, 1..n] and s[1..n, 1..n] to store
m[i, j] and s[i, j], respectively.
Overlapping solutions of sub-problems are memoized.

Matrix-chain product minimization can be computed
in O(n³) time.


MatrixChainOrder(p_0, p_1, …, p_n) {
  for (i = 1 to n) { m[i, i] = 0 }  // initialize length-1 chains
  for (l = 2 to n) {  // increasing chain length
    for (i = 1 to n − l + 1) {  // set starting index
      j = i + l − 1;  // set ending index
      m[i, j] = ∞;  // initialize smallest number of scalar products
      for (k = i to j − 1) {  // set the split of A_{i..j}
        // m[i, k] and m[k+1, j] are already known!
        q = m[i, k] + m[k+1, j] + p_{i−1} p_k p_j;
        if (q < m[i, j]) { m[i, j] = q; s[i, j] = k }
      }
    }
  }
  return tables m and s;
}
A_1: 30×35, A_2: 35×15, A_3: 15×5, A_4: 5×10, A_5: 10×20, A_6: 20×25

m[1..6, 1..6]                                s[1..6, 1..6]
      i=1    i=2   i=3   i=4   i=5  i=6           i=1 i=2 i=3 i=4 i=5
j=6  15125  10500  5375  3500  5000    0      j=6   3   3   3   5   5
j=5  11875   7125  2500  1000     0           j=5   3   3   3   4
j=4   9375   4375   750     0                 j=4   3   3   3
j=3   7875   2625     0                       j=3   1   2
j=2  15750      0                             j=2   1
j=1      0

m[2,5] = min of:
  m[2,2] + m[3,5] + p_1 p_2 p_5 = 0 + 2500 + 35·15·20 = 13000,
  m[2,3] + m[4,5] + p_1 p_3 p_5 = 2625 + 1000 + 35·5·20 = 7125,
  m[2,4] + m[5,5] + p_1 p_4 p_5 = 4375 + 0 + 35·10·20 = 11375
= 7125
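The pseudocode translates directly to Python; running it on this six-matrix example reproduces the tables above:

```python
import math

def matrix_chain_order(p):
    """Bottom-up DP for matrix-chain multiplication.

    p[0..n] are the dimensions: matrix A_i is p[i-1] x p[i].
    Returns (m, s), 1-indexed tables of minimal scalar-product counts
    and of the split indices at which they are attained.
    """
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for l in range(2, n + 1):                 # increasing chain length
        for i in range(1, n - l + 2):
            j = i + l - 1
            m[i][j] = math.inf
            for k in range(i, j):             # split A_i..j into A_i..k, A_k+1..j
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k
    return m, s

# the slide's example: A1: 30x35, A2: 35x15, ..., A6: 20x25
m, s = matrix_chain_order([30, 35, 15, 5, 10, 20, 25])
```

Here `m[1][6] == 15125` and `s[1][6] == 3`, matching the tables.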


The procedure MatrixChainOrder does not directly
perform the multiplication.

That information is derived from s[1..n, 1..n] by
recursive construction of the split binary tree:
start from s[1, n], which yields A_{1..n}, then call
s[1, s[1, n]] and s[s[1, n]+1, n], yielding A_{1..s[1,n]}
and A_{s[1,n]+1..n}, respectively, etc.
MatrixChainMultiply(A_1, …, A_n, s[1..n, 1..n], i, j) {
  if (i < j) {
    A_{i..s[i,j]} = MatrixChainMultiply(A_1, …, A_n, s[1..n, 1..n], i, s[i, j]);
    A_{s[i,j]+1..j} = MatrixChainMultiply(A_1, …, A_n, s[1..n, 1..n], s[i, j]+1, j);
    return A_{i..s[i,j]} A_{s[i,j]+1..j};
  }
  else { return A_i }
}

Construction of the optimal solution (backtracking).
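In Python the backtracking can be sketched by reading the split table into a parenthesization string (the table below is copied from the example's s[1..6, 1..6]):

```python
def optimal_parens(s, i, j):
    """Return the optimal parenthesization of A_i..A_j encoded by the
    split table s (as produced by the matrix-chain DP)."""
    if i == j:
        return f"A{i}"
    k = s[i][j]
    return "(" + optimal_parens(s, i, k) + optimal_parens(s, k + 1, j) + ")"

# split table of the six-matrix example (1-indexed, upper triangle)
n = 6
s = [[0] * (n + 1) for _ in range(n + 1)]
s[1][2] = 1; s[1][3] = 1; s[2][3] = 2
s[1][4] = 3; s[2][4] = 3; s[3][4] = 3
s[1][5] = 3; s[2][5] = 3; s[3][5] = 3; s[4][5] = 4
s[1][6] = 3; s[2][6] = 3; s[3][6] = 3; s[4][6] = 5; s[5][6] = 5
```

`optimal_parens(s, 1, 6)` yields `((A1(A2A3))((A4A5)A6))`.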


Elements of Dynamic Programming
• A problem exhibits optimal substructure if an optimal
solution to the problem contains within it optimal
solutions to sub-problems.

• In a sequence of decisions, the remaining ones must
constitute an optimal solution regardless of past
decisions (principle of optimality).

• The space of sub-problems must be small, namely, a
recursive solution must solve the same problem many
times. The optimization problem has overlapping
sub-problems.


• Overlapping sub-problems called by the recursive solution
are memoized (encoded in a table), hence their solutions
are addressed only once.

• The optimal solution is constructed by backtracking.


Optimal Tree Covering
A problem occurring in mapping a logic circuit into a
new cell library. Given:

• A rooted binary tree T(V, E) called the subject tree (a cone of the
logic circuit), whose leaves are inputs, whose root is an output,
and whose internal nodes are logic gates with their I/O pins.

• A family of rooted pattern trees (logic cells of the library),
each associated with a non-negative cost (area, power,
delay). The root is the cell's output and the leaves are its inputs.


A cover of the subject tree is a partitioning where
every part matches an element of the library and
every edge of the subject tree is covered exactly
once.
Find a cover of the subject tree whose total sum
of costs is minimal.


[Figure: a library of pattern trees t1 (cost 2), t2 (3), t3 (3), t4 (4), t5 (5),
and three covers of a subject tree with root r and inputs s, t, of total
costs 3+2+2+3 = 10, 4+2+3 = 9 and 3+5 = 8.]


[Figure: a subject tree with internal nodes a–j; each node is annotated
with its best matching cell (INV, NAND2, NAND3, AOI21) and the minimal
cumulative cost at that node.]

Observation: a pattern p rooted at the root of T(V, E) yields
minimal cost only if the cost at each of p's leaves is minimal,
suggesting a bottom-up matching algorithm.
TreeCover(T(V, E), P) {
  foreach (v ∈ V) {
    if (v is a leaf) { cost[v] = 0 } else { cost[v] = −1 }
  }
  while (some v ∈ V with cost[v] = −1 exists) {
    select v ∈ V whose children all have nonnegative cost;
    cost[v] = ∞;
    M(v) = set of all patterns matching at v;
    L(m) = set of u ∈ V matching the leaves of an m ∈ M(v);
    cost[v] = min_{m ∈ M(v)} { cost(m) + Σ_{u ∈ L(m)} cost[u] };
  }
}
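A sketch of the bottom-up covering in Python. Real technology mapping derives M(v) by pattern matching against the library; here a hypothetical `matches` table stands in for that step:

```python
def tree_cover(root, matches):
    """Minimal-cost tree covering, bottom-up (a sketch).

    matches[v] lists (pattern_cost, leaves) pairs: each is one way to
    cover node v by a single pattern whose input boundary is `leaves`.
    Nodes absent from `matches` are primary inputs with cost 0.
    """
    cost = {}

    def solve(v):
        if v in cost:
            return cost[v]
        if not matches.get(v):          # a leaf (primary input) of the subject tree
            cost[v] = 0
        else:
            cost[v] = min(c + sum(solve(u) for u in leaves)
                          for c, leaves in matches[v])
        return cost[v]

    return solve(root)

# hypothetical matches: cover 'root' directly by a large cell (cost 5),
# or by a small cell (cost 2) whose input 'm' is covered for another 2
matches = {"root": [(5, ["a", "b"]), (2, ["m"])],
           "m": [(2, ["a", "b"])]}
```

`tree_cover("root", matches)` returns 4: the pair of small cells beats the single large one.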


Optimal Buffer Insertion

[Figure: a routing tree driven at root v; each leaf is a receiver (u, q)
with required time q, and internal nodes marked "?" are candidate buffer
positions.]

d(v, u_i): driver-to-receiver delay. Root required time: T = min_i { q_i − d(v, u_i) }.

Problem 1: maximize min_i { q_i − d(v, u_i) } over buffer insertions at
internal nodes.

A buffer reduces load delay but adds internal delay, power and area.

Problem 2: maximize min_i { q_i − d(v, u_i) } over buffer insertions,
s.t. power and area constraints.


Delay Model

[Figure: an RC tree rooted at node 0 with resistances R_1..R_7 and node
capacitances C_1..C_7.]

π(k) – the nodes along the path from the root to node k
T(k) – the nodes of the sub-tree rooted at node k
R_kl = Σ_{j ∈ π(k)∩π(l)} R_j – the resistance along common paths
L_k = Σ_{j ∈ T(k)} C_j – the capacitance of the sub-tree

d(v, u_i) = Σ_j R_ji C_j = Σ_{j ∈ π(i)} R_j L_j


Bottom-Up Solution

[Figure: node K with wire resistance R_K and capacitance C_K joining child
sub-trees M and N, carrying candidates (T_M, L_M) and (T_N, L_N).]

Merging the two sub-trees:
T_K = min{T_M, T_N}
L_K = L_M + L_N

Without buffer:
T′_K = T_K − R_K L_K − (1/2) R_K C_K
L′_K = L_K + C_K

With buffer:
T′_K = T_K − D_buffer − R_buffer L_K − R_K C_buffer − (1/2) R_K C_K
L′_K = C_buffer + C_K


Outline of Algorithm
With b nodes, 2^b buffer insertions exist. There's a polynomial solution!

Compare (T′, L′) and (T″, L″) at a node. If T″ ≤ T′ and L″ ≥ L′ then (T″, L″)
is dropped, as it necessarily results in a non-optimal solution. Candidate
optimal solutions are obtained at the root, from which an optimal one is
chosen. The nodes of buffer insertion are obtained by top-down backtracking.

Merging sub-tree solutions at a parent node takes linear time!

[Figure: two sorted candidate lists (L_M, T_M) and (L_N, T_N) are merged
into the parent's list (L′_K, T′_K).]
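A sketch of the candidate bookkeeping in Python, with the wire and buffer updates separated into helper steps (hypothetical names; the slide folds the wire at K into a single per-node update):

```python
def prune(cands):
    """Drop dominated (T, L) pairs: (T2, L2) is useless if some (T1, L1)
    has T1 >= T2 and L1 <= L2 (at least the slack for at most the load)."""
    cands = sorted(set(cands), key=lambda c: (c[1], -c[0]))  # by load, best slack first
    kept, best_T = [], float("-inf")
    for T, L in cands:
        if T > best_T:          # strictly more slack than any lighter candidate
            kept.append((T, L))
            best_T = T
    return kept

def add_wire(cands, R, C):
    """Wire of resistance R, capacitance C: T' = T - R*L - R*C/2, L' = L + C."""
    return prune([(T - R * L - R * C / 2, L + C) for T, L in cands])

def add_buffer(cands, D_buf, R_buf, C_buf):
    """Optionally insert a buffer: keep the unbuffered candidates plus the
    buffered ones (T' = T - D_buf - R_buf*L, L' = C_buf)."""
    buffered = [(T - D_buf - R_buf * L, C_buf) for T, L in cands]
    return prune(cands + buffered)

def merge(cands_m, cands_n):
    """Join two children: T = min(T_M, T_N), L = L_M + L_N over all
    pairings, then prune (the slide's sorted-list merge does this in
    linear time; this quadratic version is for clarity)."""
    return prune([(min(tm, tn), lm + ln)
                  for tm, lm in cands_m for tn, ln in cands_n])
```

For example, `prune([(5, 10), (4, 12), (6, 8)])` keeps only `(6, 8)`, since it has the most slack at the least load.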


Interconnect Signal Model

[Figure: a distributed RC line with the driver's resistance, line
resistance, line-to-line coupling on both sides, and the receiver's load;
the signal's activity satisfies 0 ≤ AF ≤ 1.]

Using the Elmore delay model: simple and inaccurate, but with high fidelity.


Interconnect Bus Model

[Figure: a bus of n signals over length L and total width A; signal S_i has
wire width W_i, spacing σ_i to its neighbor, driver resistance R_i and
load C_i.]


Delay and Dynamic Power Minimization

Signal's delay:
D_i(s_{i−1}, w_i, s_i) = α_i + β_i w_i + γ_i / w_i + (δ_i + ε_i w_i)(1/s_{i−1} + 1/s_i), 1 ≤ i ≤ n
α_i, β_i, γ_i, δ_i, ε_i – determined by technology parameters, the driver's
resistance, the capacitive load and the bus length L.

Signal's dynamic power:
P_i(s_{i−1}, w_i, s_i) = ζ_i w_i + η_i (1/s_{i−1} + 1/s_i), 1 ≤ i ≤ n
ζ_i, η_i – determined by technology parameters, the signal's activity, and
the bus length L.


Minimize bus delay

D^sum(s, w) = Σ_{i=1}^{n} D_i(s_{i−1}, w_i, s_i)  or  D^max(s, w) = max_{1≤i≤n} D_i(s_{i−1}, w_i, s_i)

Minimize bus power

P(s, w) = Σ_{i=1}^{n} P_i(s_{i−1}, w_i, s_i)

Subject to:

Σ_{i=1}^{n} w_i + Σ_{i=0}^{n} s_i ≤ A

In the 32nm node and beyond, spaces and widths take very few discrete values:

s_i ∈ S = {S_1, …, S_p} and w_i ∈ W = {W_1, …, W_q}

Continuous optimization and its well-known results are invalid. The sizing
problem is NP-complete. A pseudo-polynomial resource-allocation dynamic
programming solution is suitable.


 :  w0 , s0 , w1, s1,..., wn , sn  is a sequence of allocation decisions.

Observation: After  wi , si  , 0  i  j, are decided, optimal allocation

of rest n  1  j wires depends only on s j and A   j


i 0
j

wi  i 0 si .

  :  w0 , s0 ,..., wj , sj  is dominant and   :  w0, s0,..., wj , sj  is redundant if:
j
1. A0..  j j
 
 i0 si   i0 wi   i0 si  i0 wi
j j
  j
A0..
2. sj  sj
3. D     D    and P     P   

The treiplet A0.. j , s j ,  D  A0.. j , s j  , P  A0.. j , s j  is a state and it is

sufficient to maintain only non redundant states.


May 2012 Dynamic Programming 32
Dynamic programming comprises n decision stages. Each stage expands all
non-redundant states.

A state (A_{0..j+1}, s_{j+1}, (D(A_{0..j+1}, s_{j+1}), P(A_{0..j+1}, s_{j+1}))) at stage j+1 is obtained
from the states (A_{0..j}, s_j, (D(A_{0..j}, s_j), P(A_{0..j}, s_j))) of stage j by augmentation
with all permissible (w, s) ∈ {W_1, …, W_q} × {S_1, …, S_p}.

A stage maintains only non-redundant states.

The algorithm can be extended to arbitrary routing by constructing the wire
visibility graph and topologically ordering the graph's nodes.
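The staged state expansion can be sketched as follows (a simplified variant that tracks only total delay under the area budget; `delay` is a hypothetical per-wire model standing in for D_i):

```python
def size_bus(n, A, widths, spaces, delay):
    """Pseudo-polynomial DP for discrete wire sizing (a sketch).

    States are keyed by (area_used, last_space); each keeps the best
    total delay of any decision prefix reaching that key -- the slides'
    dominance pruning, restricted here to delay only.
    delay(s_prev, w, s) is the contribution of one wire.
    """
    # stage 0: choose the boundary spacing s_0
    states = {(s0, s0): 0.0 for s0 in spaces if s0 <= A}
    for _ in range(n):                  # one stage per wire
        nxt = {}
        for (area, s_prev), d in states.items():
            for w in widths:
                for s in spaces:
                    a2 = area + w + s
                    if a2 > A:
                        continue        # violates the total width budget
                    d2 = d + delay(s_prev, w, s)
                    key = (a2, s)
                    if key not in nxt or d2 < nxt[key]:
                        nxt[key] = d2
        states = nxt
    return min(states.values()) if states else None
```

With the toy model delay(s_prev, w, s) = 1/w + 1/s_prev + 1/s, two wires, widths and spaces in {1, 2} and budget A = 8, the optimum is 4.0.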


Floorplan and Layout

[Figure: a floorplan of blocks B1–B12 and its graph representation.]

A floorplan is represented by a planar graph.

Vertices – vertical lines. Arcs – rectangular areas where blocks are embedded.
A dual graph is implied.
From Floorplan to Layout

• Actual layout is obtained by embedding real blocks into floorplan


cells.

– Blocks’ adjacency relations are maintained

– Blocks are not perfectly matched, thus white area (waste) results

• Layout width and height are obtained by assigning blocks’


dimensions to corresponding arcs.

– Width and height are derived from longest paths

• Different block dimensions yield different layout area, even when block
areas are invariant.


Optimal Slicing Floorplan

The top block's area is divided by vertical and horizontal cut-lines.
In the slicing tree, leaf blocks are associated with areas.

[Figure: a slicing floorplan of blocks B1–B12 and the corresponding slicing
tree, whose internal nodes are vertical (v) and horizontal (h) cuts.]


Let block B_i, 1 ≤ i ≤ b, have possible implementations (x_i^j, y_i^j),
1 ≤ j ≤ n_i, of fixed area x_i^j · y_i^j = a_i.

In the most simplified case each B_i, 1 ≤ i ≤ b, has 2 implementations,
corresponding to its 2 orientations.

Problem: find among the 2^b possible block orientations Φ_i, 1 ≤ i ≤ 2^b,
the one of smallest area.

Theorem (L. Stockmeyer): given a slicing floorplan of b blocks whose
slicing tree has depth d, finding the orientation that yields the smallest
area takes O(bd) time and O(b) storage.


Merge horizontally two width-height sets (vertical cut-line):
h_parent = max(h_left, h_right)
w_parent = w_left + w_right

[Figure: the children's sorted width-height lists are merged pairwise into
the parent's list.]


VerticalMerging({(w′_i, h′_i)}_{i=1..s}, {(w″_j, h″_j)}_{j=1..t}) { // horizontal cut-line
  // lists are sorted in descending order of width
  i = 1; j = 1;
  while ((i ≤ s) && (j ≤ t)) {
    w_parent = max(w′_i, w″_j);
    h_parent = h′_i + h″_j;
    if (w′_i > w″_j) { ++i }
    else if (w′_i < w″_j) { ++j }
    else { ++i; ++j } // w′_i = w″_j
  }
}
The size of the new width-height list equals the sum of the lengths of the
children's lists, rather than their product.
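The merge can be sketched in Python (here for a horizontal cut-line: the blocks are stacked, so widths take the max and heights add; both input lists sorted in descending width):

```python
def merge_horizontal_cut(left, right):
    """Merge two children's (width, height) lists under a horizontal
    cut-line. Inputs sorted by descending width; the output is likewise
    sorted, with at most len(left) + len(right) non-dominated entries.
    """
    i, j, out = 0, 0, []
    while i < len(left) and j < len(right):
        (wl, hl), (wr, hr) = left[i], right[j]
        out.append((max(wl, wr), hl + hr))
        # advance whichever list holds the larger width: only shrinking
        # it can reduce the merged width further
        if wl >= wr:
            i += 1
        if wr >= wl:
            j += 1
    return out
```

For example, stacking a 2×3 block (orientations `[(3, 2), (2, 3)]`) over a 1×4 block (`[(4, 1), (1, 4)]`) gives `[(4, 3), (3, 6), (2, 7)]`.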
Sketch of Proof
• Problem is solved by a bottom-up dynamic programming algorithm
working on corresponding slicing tree.
• Each node maintains a set of width-height pairs, none of which can
be ruled out until root of tree is reached. Size of sets is in the order
of node’s leaf count. Sets in leaves are just Bi’s two orientations.
• The set of width-height pairs at each node is created by merging
the sets of the left-son and right-son sub-trees in time linear in their size.
• Width-height pair sets are maintained as a sorted list in one
dimension (hence sorted inversely in the other dimension).
• Final implementation is obtained by backtracking from the root.



Automatic Cell Layout Generation

A 3-step process:
1. Transistor placement
2. Interconnect completion
3. Design rule adherence

Transistor placement comprises:
1. Transistor P-N pairing
2. Pair ordering
3. Pair flipping – optimizing cell area, node capacitance, potential cell
abutment, and the cell's internal routing

[Figure: three flip configurations of adjacent transistor pairs with
abutment costs 0, 1 and 2; matching diffusion nets (Vcc, Vss, a, b) at the
common edge allow abutment.]


• Most cells unfortunately contain more than 4 transistors.
• A flip configuration of a pair depends on the flips of its left and right
neighbors.
• Seek the flip configuration yielding the minimal sum of abutment costs.
  – With n pairs, there are 2^n solutions to consider.
• Observation: an optimal flip of j+1 pairs subject to a given right-end
configuration of pair j necessitates that the first j pairs have been
optimally flipped.
  – Principle of optimality: optimal sub-problem solutions.
• Observation: the optimal flip of the remaining n − j pairs is independent
of the first j flips except for the right-end configuration of pair j.
  – This defines a state for which only the lowest-cost flip of j pairs is of
interest.
• A dynamic programming solution is in order (Bar-Yehuda et al.).
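A sketch of the flip DP in Python; the pair model and abutment cost below are hypothetical (cost 0 when the touching diffusion nets match, else 1):

```python
def best_flips(pairs, abut_cost):
    """O(n) DP over flip states: cost[f] is the minimal total abutment
    cost of the prefix, given that the current pair ends with flip f."""
    cost = {False: 0, True: 0}          # first pair: either flip is free
    for i in range(1, len(pairs)):
        cost = {f: min(cost[g] + abut_cost(pairs[i - 1], g, pairs[i], f)
                       for g in (False, True))
                for f in (False, True)}
    return min(cost.values())

def net_abut_cost(p, fp, q, fq):
    """Hypothetical model: a pair is (left_net, right_net); flipping swaps
    its ends; abutment is free iff the touching nets match."""
    right_of_p = p[0] if fp else p[1]
    left_of_q = q[1] if fq else q[0]
    return 0 if right_of_p == left_of_q else 1
```

For example, `best_flips([("a", "b"), ("b", "c"), ("a", "c")], net_abut_cost)` returns 0: flipping only the third pair makes every junction abut.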


State Augmentation

[Figure: the four flip configurations of pair j at stage j are each
augmented with the four configurations of pair j+1, adding the
corresponding abutment cost to each transition.]


• Dynamic programming takes O(n) time.
• Can be extended to multi-row cells (double height, etc.).
• It can be combined in a DFS algorithm which simultaneously considers
pairing, pair ordering and optimal flips, without any complexity overhead
(state augmentation takes O(1) time).
• Dynamic programming is in fact solving a shortest-path problem on the
state transition graph.
• New litho rules at 32nm and smaller feature sizes offer many optimization
opportunities.


Resource Allocation

K units (integer) of a resource are used for manufacturing n commodities.
The production of x_i units (integer) of commodity i consumes c_i(x_i) of the
resource (integer), where c_i(0) = 0, and produces profit p_i(x_i). At most
B units can be allocated for each commodity.

The optimal resource allocation problem is therefore:

maximize over all allocations: Σ_{i=1}^{n} p_i(x_i)

subject to: Σ_{i=1}^{n} c_i(x_i) ≤ K and 0 ≤ x_i ≤ B, 1 ≤ i ≤ n


Allocation can be viewed as sequential decision making. Let commodities
1 through j have already been produced, consuming 0 ≤ Σ_{i=1}^{j} c_i(x_i) ≤ K.

Define f(j, y) as the maximal total profit that can be achieved by allocating
y units for producing commodities 1 through j. By definition, f(n, K) solves
the problem.

The production of x units of commodity j must satisfy c_j(x) ≤ y.

Functional equations which can be solved recursively result:

f(j, y) = max_{x ≤ B} p_1(x) if j = 1
f(j, y) = max_{x ≤ B, c_j(x) ≤ y} { p_j(x) + f(j−1, y − c_j(x)) } if 1 < j ≤ n
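The functional equations can be evaluated iteratively over j (a sketch; `c` and `p` stand for arbitrary tabulated consumption/profit functions):

```python
def allocate(n, K, B, c, p):
    """Solve f(j, y) for the resource allocation problem.

    c(j, x), p(j, x): integer resource consumption and profit of producing
    x units of commodity j, with c(j, 0) == 0. Returns f(n, K).
    """
    f = [0] * (K + 1)                   # f[y] after processing commodities 1..j
    for j in range(1, n + 1):
        f = [max(p(j, x) + f[y - c(j, x)]
                 for x in range(B + 1) if c(j, x) <= y)
             for y in range(K + 1)]
    return f[K]
```

For example, with c(j, x) = x and unit profits 3 and 2 for the two commodities, `allocate(2, 4, 2, ...)` gives 10 (produce 2 units of each).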


Elements of Dynamic Programming

• Sequential decision making process.
• Transitions occur from state to state.
• A state is a summary of the prior history of the process, sufficiently
detailed to enable evaluation of the current alternatives.
  – The sequential decision process evolves from state to state.
  – The pair (j, y) is a state in the resource allocation process.
  – The elements encoded in a state are called state variables.
• The principle of optimality states that whatever the initial state and
decisions were, the remaining decisions must constitute an optimal
policy.


Linear Case: Knapsack Problem

The resource is a knapsack of volume K.
A unit of commodity i occupies volume c_i, yielding profit p_i.
We look for the most profitable way to pack the sack.

To allow a partially empty sack we introduce commodity 0 with c_0 = 1 and
p_0 = 0, resulting in the problem:

maximize over all allocations: Σ_{i=0}^{n} p_i x_i

subject to: Σ_{i=0}^{n} x_i c_i = K, x_i ≥ 0 and integer, 0 ≤ i ≤ n


Linear Case: Knapsack Problem

Let some items have been put into the sack and volume y remains.

Linearity implies that packing the rest is independent of the past.

y ∈ {0, 1, …, K} is the state, and f(y) is the maximum profit obtainable
from packing volume y.

Hence f(y) = max_{j : c_j ≤ y} { p_j + f(y − c_j) }, y = 1, …, K.
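The state recurrence packs into a few lines of Python (unused capacity is allowed directly here, instead of via the slack commodity 0):

```python
def knapsack(K, c, p):
    """Unbounded knapsack: f(y) = max over j with c[j] <= y of
    p[j] + f(y - c[j]), f(0) = 0; c[j], p[j] are the unit volume and
    profit of commodity j."""
    f = [0] * (K + 1)
    for y in range(1, K + 1):
        f[y] = max((p[j] + f[y - c[j]]
                    for j in range(len(c)) if c[j] <= y),
                   default=0)
    return f[K]
```

For example, `knapsack(10, [3, 4], [5, 7])` returns 17: two items of volume 3 plus one of volume 4.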
