
An Efficient Transistor Folding Algorithm for Row-Based CMOS Layout Design

Jaewon Kim
Quickturn Design Systems, Inc.
440 Clyde Av., Mountain View, CA 94043

S. M. Kang
Coordinated Science Laboratory
University of Illinois at Urbana-Champaign
Urbana, IL 61801

Abstract

In timing-driven layout synthesis, transistor sizes tend to differ significantly from each other, and thus conventional layout approaches can cause inefficient area utilization. We propose an efficient algorithm to find the optimal transistor folding sizes in row-based designs. Given a CMOS circuit with $m$ pairs of pMOS and nMOS transistors, our algorithm finds the optimal folding sizes in $O(m^2 \log m)$ time through an effective reduction of the solution space. MCNC benchmark circuits are used to demonstrate the area efficiency of the physical layouts with optimal folding sizes.

1 Introduction

Among various design attributes, timing delay has become one of the most important constraints in high-performance circuit design. To meet all the timing requirements, transistors with various current driving capabilities are required. Judicious increase of certain transistor sizes can reduce the circuit delay at the expense of additional chip area [1]. Many optimization techniques have been introduced to solve the transistor sizing problem [2, 3, 4]. Most approaches minimize either the area subject to a maximum circuit delay constraint or the area-delay product. Even if transistor sizes are optimized to minimize the total diffusion area, the variations in transistor sizes may make the circuit layout worse than one with non-optimal uniform transistor sizes, depending on the layout design methodology. Among the design methodologies developed recently, row-based design has become the most popular. In row-based layout synthesis, the variation in transistor sizes may cause non-uniform cell heights, and non-uniform cell heights may lead to a significant waste of layout area. To utilize the chip area more efficiently, a transistor folding scheme should be introduced. The objective of transistor folding is to keep the cell heights uniform and, concurrently, to reduce the total layout area. Traditionally, transistor folding has been left exclusively to layout experts in high-performance custom cell or standard cell designs, while cell generation tools have ignored it. Recently, several researchers have worked on transistor folding [5, 6]. However, their work is based on a fixed cell height or on cell-by-cell local optimization, neither of which is guaranteed to lead to the optimal layout area.

In this paper, we present an efficient algorithm that finds the optimal transistor folding sizes for the given transistor sizes in a circuit. We first eliminate redundant folding sizes from the solution space. Then, we apply a modified exhaustive method with time complexity $O(m^2 \log m)$ to find the optimal folding sizes, where $m$ is the number of transistors in the circuit. In the automatic synthesis of custom VLSI chips, the transistor folding scheme contributes significantly to the overall layout area reduction.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org.

Design Automation Conference® Copyright © 1997 by the Association for Computing Machinery, Inc.
0-89791-847-9/97/0006/$3.50
DAC 97 - 06/97 Anaheim, CA, USA

2 Problem Definition

Let $C$ be a CMOS circuit with $m$ transistor pairs. The transistor sizes of $C$ are divided into two sets according to the transistor types. Let $P = \{p_1, p_2, p_3, \ldots, p_m\}$ and $N = \{n_1, n_2, n_3, \ldots, n_m\}$ be the sets of the sizes of the pMOS transistors and of the nMOS transistors, respectively. Each transistor size is assumed to be a multiple of the minimum resolution size. A transistor pair with the same index, $p_i$ in $P$ and $n_i$ in $N$, has the CMOS duality. If $p_i$ ($n_i$) does not have a dual transistor, $n_i$ ($p_i$) is set to zero. To satisfy the design rules, each transistor size should be at least the minimum transistor size, PMIN for pMOS transistors and NMIN for nMOS transistors, which are also multiples of the minimum resolution size. Each element in $P$ and $N$ should therefore satisfy the minimum size constraint

$$p_i \ge \mathrm{PMIN} \ \text{for all } p_i \in P, \qquad n_i \ge \mathrm{NMIN} \ \text{for all } n_i \in N.$$

The folding size limits the maximum height of any transistor layout. If the specified transistor size exceeds the folding size of its type, the transistor should be folded. To determine the optimal folding size for each transistor type precisely, it would be desirable to generate the physical layout for every possible folding size. However, since generating actual layouts is excessively time-consuming, an area estimate function is introduced. The total layout area is estimated by summing the cell areas. Since the routing area is difficult to estimate precisely, it is treated as an overhead. The total cell area is estimated as

$$\mathrm{Area} = \big(\max(P) + \max(N) + \mathrm{VertOverhead}\big) \times \big(\mathrm{TotalColumn} + \mathrm{HoriOverhead}\big).$$

VertOverhead includes the overhead for the power rails, the minimum distance between pMOS and nMOS transistors, and the estimate of the minimum intra-cell interconnections, including terminal positions. TotalColumn is the total number of transistors multiplied by the minimum column width specified in the design rules. HoriOverhead represents the horizontal area for the gate output and the intra-cell interconnections.

Direct layout synthesis with the given transistor sizes and no folding may waste a large chip area because of the wide variation of the transistor sizes. In the above estimate, the cell height is max(P) + max(N) + VertOverhead, but this height may not be optimal in terms of area, because most cells in the same row are shorter. Limiting the maximum transistor height can improve the total area efficiency. Let the folding sizes be $h_p$ and $h_n$ for pMOS and nMOS transistors, respectively. If any pMOS (nMOS) transistor size is larger than $h_p$ ($h_n$), the transistor should be folded. The wasted area can be reduced as long as the horizontal expansion does not cause a net area increase. If a transistor is folded with folding size $h_p$ ($h_n$), it is divided into $\lceil p_i/h_p \rceil$ ($\lceil n_i/h_n \rceil$) columns. Our objective is to minimize the area function

$$\mathrm{area}_{total} = (h_p + h_n + \mathrm{VertOverhead}) \times \mathrm{width},$$

where width is expressed in terms of $h_p$ and $h_n$ as

$$\mathrm{width} = \sum_i \max\left(\left\lceil \frac{p_i}{h_p} \right\rceil, \left\lceil \frac{n_i}{h_n} \right\rceil\right) + \mathrm{HoriOverhead}.$$
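To make the two estimates concrete, the following Python sketch evaluates them on the twelve transistor pairs that Figure 4 lists in Section 3. It is a minimal illustration under stated assumptions, not the authors' estimator: TotalColumn is taken as one shared gate column per transistor pair with unit column width, and the overhead values VERT and HORI, like the helper names, are hypothetical.

```python
# Sketch of the Section 2 area estimates. Assumptions (not from the paper):
# TotalColumn = one shared gate column per transistor pair with unit column
# width, and the overhead values below are made up for illustration.

def ceil_div(a, b):
    """Ceiling of a / b for a >= 0 and b > 0, using integers only."""
    return -(-a // b)


def unfolded_area(P, N, vert, hori):
    """(max(P) + max(N) + VertOverhead) * (TotalColumn + HoriOverhead)."""
    total_column = len(P)  # assumption: one column per pair
    return (max(P) + max(N) + vert) * (total_column + hori)


def folded_area(P, N, hp, hn, vert, hori):
    """(hp + hn + VertOverhead) * (sum_i max(ceil(p_i/hp), ceil(n_i/hn)) + HoriOverhead)."""
    columns = sum(max(ceil_div(p, hp), ceil_div(n, hn)) for p, n in zip(P, N))
    return (hp + hn + vert) * (columns + hori)


# Transistor sizes of the twelve pairs in Figure 4.
P = [15, 20, 25, 5, 8, 30, 30, 23, 11, 20, 9, 17]
N = [30, 8, 20, 12, 20, 5, 15, 10, 25, 10, 17, 13]
VERT, HORI = 10, 2  # hypothetical overheads

print(unfolded_area(P, N, VERT, HORI))          # 980
print(folded_area(P, N, 10, 10, VERT, HORI))    # 960
print(folded_area(P, N, 15, 15, VERT, HORI))    # 1000
```

With these made-up overheads, folding both types at 10 brings the estimate below the unfolded one, while folding both at 15 raises it. This is why the folding sizes must be searched rather than fixed. Setting $h_p = \max(P)$ and $h_n = \max(N)$ reproduces the unfolded estimate, because every pair then occupies exactly one column.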

The max operation in the width formula is needed because pMOS and nMOS transistors share the same input signal on a vertical polysilicon layer, so the two types cannot be considered separately. This follows from the fact that our scheme targets automatic CMOS layout synthesis, in which gate columns are shared to reduce the layout area. The upper bound of $h_p$ ($h_n$) is set to max(P) (max(N)). On the other hand, the lower bound of $h_p$ ($h_n$) must also be limited, because any pMOS (nMOS) transistor should be larger than or equal to PMIN (NMIN), which is specified by the design rules of the corresponding CMOS technology. Therefore, the problem can be formulated as a minimization problem with double-sided constraints:

$$\begin{aligned}
\text{minimize} \quad & (h_p + h_n + \mathrm{VertOverhead}) \times \Big(\sum_i \max\big(\lceil p_i/h_p \rceil, \lceil n_i/h_n \rceil\big) + \mathrm{HoriOverhead}\Big) \\
\text{s.t.} \quad & \mathrm{PMIN} \le h_p \le \max(P), \qquad \mathrm{NMIN} \le h_n \le \max(N).
\end{aligned}$$

If two adjacent diffusion areas are electrically equivalent, the layout area can be reduced by merging the diffusions. Otherwise, a diffusion break is required between them. The above function does not consider the diffusion breaks that may be caused by transistor folding, for two reasons. First, the number of required diffusion breaks is not known before actual cell synthesis. Normally, it is determined by transistor ordering algorithms [7], and all good transistor ordering algorithms try to minimize the diffusion breaks between transistors. Second, transistor folding in conjunction with any good transistor ordering algorithm does not introduce new diffusion breaks, because folded transistors are usually abutted, as shown in Fig. 1.

Figure 1: Folding example (panels (a)-(d); transistors labeled C1 and C2).

Algorithm optimal-folding 1:

1: for every folding size h_p from PMIN to max(P)
2:     for every folding size h_n from NMIN to max(N)
3:         calculate the current area
           if current area < best area
               save the current area as the best area (and record h_p, h_n)
       end for
   end for

Figure 2: Algorithm 1.
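The exhaustive search of Algorithm 1 fits in a few lines of Python. This is an illustrative sketch, not the authors' implementation: it reuses the Figure 4 sizes and assumes hypothetical values PMIN = NMIN = 5 and overheads VERT = 10, HORI = 2, with one column per transistor pair. Each of the (max(P) - PMIN + 1)(max(N) - NMIN + 1) candidate pairs costs one O(m) area evaluation, which is the $O(s^2 m)$ behavior analyzed in Section 3.

```python
# Exhaustive search in the spirit of Algorithm 1 (Fig. 2). Assumptions:
# hypothetical PMIN/NMIN and overhead values; one column per pair.

def ceil_div(a, b):
    return -(-a // b)


def estimated_area(P, N, hp, hn, vert, hori):
    """Objective of the minimization problem for one (hp, hn) pair: O(m)."""
    columns = sum(max(ceil_div(p, hp), ceil_div(n, hn)) for p, n in zip(P, N))
    return (hp + hn + vert) * (columns + hori)


def optimal_folding_1(P, N, pmin, nmin, vert, hori):
    """Return (best_area, hp, hn, evaluations) over every admissible pair."""
    best = None
    evaluations = 0
    for hp in range(pmin, max(P) + 1):        # line 1
        for hn in range(nmin, max(N) + 1):    # line 2
            current = estimated_area(P, N, hp, hn, vert, hori)  # line 3
            evaluations += 1
            if best is None or current < best[0]:
                best = (current, hp, hn)      # keep the first minimum found
    return best[0], best[1], best[2], evaluations


P = [15, 20, 25, 5, 8, 30, 30, 23, 11, 20, 9, 17]   # Figure 4, pMOS
N = [30, 8, 20, 12, 20, 5, 15, 10, 25, 10, 17, 13]  # Figure 4, nMOS
PMIN = NMIN = 5        # hypothetical design-rule minima
VERT, HORI = 10, 2     # hypothetical overheads

best_area, hp, hn, evaluations = optimal_folding_1(P, N, PMIN, NMIN, VERT, HORI)
print(best_area, hp, hn, evaluations)
```

The evaluation counter makes the quadratic number of $(h_p, h_n)$ pairs explicit: here 26 × 26 = 676 evaluations, each summing over all twelve pairs.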

3 Optimal Folding

A simple, albeit inefficient, method to solve the problem is to consider all possible folding sizes exhaustively, calculate the areas, and choose the pair of folding sizes with the minimum area, as shown in Fig. 2 (Algorithm 1). The time complexities of lines 1 and 2 depend on the maximum transistor sizes in the circuit, because the minimum sizes of the transistors are fixed. Let the size of the solution space for $h_p$ and $h_n$ be $s$. Since line 3 takes time proportional to the number of transistor pairs, $m$, the total time complexity of the algorithm is $O(s^2 m)$. Depending on the circuit behavior, however, $s$ can be a constant or can depend on the number of transistor pairs. In full custom VLSI design, each transistor is customized to have its own optimal size in terms of layout area and timing delay. Furthermore, if a transistor sizing algorithm is used in the synthesis, every transistor can have a different size. If the transistor sizes in a circuit are almost uniform and the solution space $s$ is very small, no transistor folding scheme may be needed. However, this is not the case in real custom designs. Thus, for the worst case, in which all transistor sizes are different, we assume that $s$ is proportional to $m$, and the time complexity becomes $O(m^3)$. With such time complexity, finding the optimal folding sizes for large circuits may take a prohibitively long time.

Reduction of solution space

In the previous section, we defined the solution space of $h_p$ and $h_n$ for the optimum area. Although the size of the solution space depends on the circuit behavior, the space is usually very large because of the variation of the transistor sizes in the circuit. Hence, it is desirable to shrink the solution space to reduce the run time. However, the reduction process must not remove possible optimal folding sizes. The following definitions and theorems apply to both types of transistors; without loss of generality, only the pMOS case is described here.

Definition 1: Let $S_i^p$ be the ordered set of the possible folding sizes of transistor $p_i$ in $P$. $S_i^p$ is composed of all $\lceil p_i/k \rceil$ that are greater than or equal to PMIN, where $k$ is a positive integer, and is sorted in descending order.

The ordered folding size set $S_i^p$ is a subset of the entire solution space of folding sizes of $p_i$. Since $k$ is a positive integer, every element in $S_i^p$ is smaller than or equal to max(P).

Theorem 1: A folding size $s_{ex}^p$ that satisfies $\mathrm{PMIN} \le s_{ex}^p \le \max(P)$ but is not in $S_i^p$ does not yield the minimum area solution.

Proof: Let $S_i^p = \{s_1, s_2, \ldots, s_u\}$ be the ordered set of possible folding sizes for transistor $p_i$ in $P$. Let $k$ be a positive constant and let $\lceil p_i/k \rceil$ be $s_j$ in $S_i^p$. Since $S_i^p$ is sorted in decreasing order, the situation can be divided into three cases:

(i) $\lceil p_i/k \rceil = \lceil p_i/(k+1) \rceil$;
(ii) $\lceil p_i/k \rceil = \lceil p_i/(k+1) \rceil + 1$;
(iii) $\lceil p_i/k \rceil = \lceil p_i/(k+1) \rceil + c$, where $c > 1$.

Obviously, no other integer lies strictly between $\lceil p_i/k \rceil$ and $\lceil p_i/(k+1) \rceil$ in the first and second cases. In the third case, let $\lceil p_i/k \rceil$ and $\lceil p_i/(k+1) \rceil$ be $a + c$ and $a$, respectively. $a + c$ and $a$ are the elements $s_j$ and $s_{j+1}$ of $S_i^p$, but $a + c - 1, a + c - 2, \ldots, a + 1$ are not contained in $S_i^p$ according to its definition. We need to justify why $a + c - 1, \ldots, a + 1$ are not better than $a$ in terms of cell area. $p_i$ can be represented as

$$p_i = (a + c - 1)k + b, \qquad \text{where } 0 < b \le k.$$

Applying the folding size $a + c - 1$ to $p_i$ gives

$$\left\lceil \frac{p_i}{a + c - 1} \right\rceil = k + \left\lceil \frac{b}{a + c - 1} \right\rceil.$$

Since $b$ is greater than 0, $\lceil p_i/(a + c - 1) \rceil \ge k + 1$. Because a smaller folding size never yields a smaller folded number, this implies

$$\left\lceil \frac{p_i}{a + c - 1} \right\rceil, \left\lceil \frac{p_i}{a + c - 2} \right\rceil, \ldots, \left\lceil \frac{p_i}{a} \right\rceil \ge k + 1. \tag{1}$$

$p_i$ can also be represented as

$$p_i = (a - 1)(k + 1) + d, \qquad \text{where } 0 < d \le k + 1.$$

Applying the folding size $a$ to $p_i$ gives

$$\left\lceil \frac{p_i}{a} \right\rceil = k + 1 + \left\lceil \frac{d - (k + 1)}{a} \right\rceil.$$

Since $d$ is less than or equal to $k + 1$, $\lceil p_i/a \rceil \le k + 1$. Because a larger folding size never yields a larger folded number, this also implies

$$\left\lceil \frac{p_i}{a + c - 1} \right\rceil, \left\lceil \frac{p_i}{a + c - 2} \right\rceil, \ldots, \left\lceil \frac{p_i}{a} \right\rceil \le k + 1. \tag{2}$$

According to (1) and (2),

$$\left\lceil \frac{p_i}{a + c - 1} \right\rceil, \left\lceil \frac{p_i}{a + c - 2} \right\rceil, \ldots, \left\lceil \frac{p_i}{a} \right\rceil = k + 1. \tag{3}$$

Thus, $a + c - 1, a + c - 2, \ldots, a + 1$ produce the same folded transistor number as $a$ does, but the estimated areas with those folding sizes are obviously greater than the estimated area with $a$, because these sizes are greater than $a$ and therefore cost more area with the same folded number. Note that the product of the folding size and the folded number is the main contributor to the area estimate. Hence, $a + c - 1, \ldots, a + 1$, which are not elements of $S_i^p$, can be ignored in the area minimization process. □

Definition 2: $S^p$ is the ordered set of folding sizes for the transistor set $P$ if $S^p$ contains all elements in $S_1^p \cup S_2^p \cup \cdots \cup S_m^p$ and is sorted in descending order.

Theorem 2: A folding size $s_{ex}^p$ that satisfies $\mathrm{PMIN} \le s_{ex}^p \le \max(P)$ but is not contained in $S^p$ does not produce the minimal area.

Proof: Let $S^p = \{s_1^p, s_2^p, \ldots, s_u^p\}$ be the ordered set of the folding sizes for $P$, and let $x$ be an integer that satisfies $s_i^p > x > s_{i+1}^p$ but is not an element of $S^p$. According to Theorem 1, $x$ generates the same folded transistor numbers as $s_{i+1}^p$ does, because $x$ does not equal $\lceil p/k \rceil$ for any $p$ in $P$ and any positive integer $k$. Therefore,

$$x \times \text{folded number}(x) > s_{i+1}^p \times \text{folded number}(s_{i+1}^p),$$

because the folded numbers for $x$ and $s_{i+1}^p$ are equal. Hence, $x$ can be excluded from $S^p$ without degrading the solution quality. □

With the theorems described above, the continuous solution space is reduced and converted to a discrete solution space.

index   pMOS   nMOS
1       15     30
2       20     8
3       25     20
4       5      12
5       8      20
6       30     5
7       30     15
8       23     10
9       11     25
10      20     10
11      9      17
12      17     13

Figure 4: Initial transistor sizes
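Definitions 1 and 2, and the step that Theorem 2 relies on, can be checked numerically on the Figure 4 pMOS sizes. The sketch below assumes a hypothetical PMIN = 5, which is the smallest pMOS size in Figure 4, so PMIN itself belongs to $S^p$. The helper names are illustrative only. It builds $S_i^p$ and $S^p$ and confirms that every skipped size between PMIN and max(P) yields exactly the folded numbers of the next smaller element of $S^p$.

```python
# Sketch of Definitions 1 and 2 and a numerical check of Theorem 2 on the
# Figure 4 pMOS sizes. PMIN = 5 is a hypothetical value.

def ceil_div(a, b):
    return -(-a // b)


def folding_sizes_for(p, pmin):
    """S_i^p: every ceil(p/k) >= pmin for positive integers k, descending."""
    sizes = set()
    for k in range(1, p + 1):          # ceil(p/k) == 1 for all k >= p
        s = ceil_div(p, k)
        if s < pmin:
            break                      # ceil(p/k) never grows as k grows
        sizes.add(s)
    return sorted(sizes, reverse=True)


def candidate_sizes(P, pmin):
    """S^p: union of all S_i^p, sorted in descending order."""
    union = set()
    for p in P:
        union.update(folding_sizes_for(p, pmin))
    return sorted(union, reverse=True)


def folded_numbers(P, h):
    return [ceil_div(p, h) for p in P]


def skipped_sizes_are_redundant(P, pmin):
    """True if each size x in [pmin, max(P)] outside S^p gives the same folded
    numbers as the next smaller element of S^p (the step used in Theorem 2)."""
    cands = candidate_sizes(P, pmin)
    for x in range(pmin, max(P) + 1):
        if x in cands:
            continue
        below = [s for s in cands if s < x]
        if not below:
            continue  # x < min(S^p): not covered by this simple check
        if folded_numbers(P, x) != folded_numbers(P, max(below)):
            return False
    return True


P = [15, 20, 25, 5, 8, 30, 30, 23, 11, 20, 9, 17]
PMIN = 5

print(candidate_sizes(P, PMIN))
# [30, 25, 23, 20, 17, 15, 13, 12, 11, 10, 9, 8, 7, 6, 5]
print(max(P) - PMIN + 1, len(candidate_sizes(P, PMIN)))  # 26 15
print(skipped_sizes_are_redundant(P, PMIN))              # True
```

For this data, the reduced set keeps 15 of the 26 integer sizes from 5 to 30. The guard for sizes below min($S^p$) never fires here, because PMIN is itself a transistor size.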


(a) Initial state

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
N1 N2 N3 N4 N5 N6 N7 N8 N9 N10 N11 N12

(b) pMOS sorting and sectioning ($s_i^p = 10$)

section number:  3             | 2                  | 1
                 P6 P7 P3 P8   | P2 P10 P12 P1 P9   | P11 P5 P4
                 N6 N7 N3 N8   | N2 N10 N12 N1 N9   | N11 N5 N4

(c) nMOS sorting ($s_i^p = 10$, $s_j^n = 10$)

section number:  3             | 2                  | 1
                 P3 P7 P8 P6   | P1 P9 P12 P10 P2   | P5 P11 P4
                 N3 N7 N8 N6   | N1 N9 N12 N10 N2   | N5 N11 N4

Figure 5: Array sectioning and sorting

Algorithm optimal-folding 2:

optimal-folding
    initialize column
    pMOS-phase
    nMOS-phase
end optimal-folding

pMOS-phase
1:  sort P and N in descending order of p in P
2:  for each s_i^p in S^p
3:      divide P and N into sections
4:      calculate section number g_k of each section
5:      sort each section in descending order of n in N
6:      for each s_j^n in S^n
            for each section k
                if g_k > ceil(n_k1 / s_j^n), where n_k1 is the first nMOS transistor in section k
7:                  column(s_i^p, s_j^n) = column(s_i^p, s_j^n) + g_k * |g_k|
                else if g_k < ceil(n_kl / s_j^n), where n_kl is the last nMOS transistor in section k
                    do nothing
                else
8:                  find the first n_kl that satisfies ceil(n_kl / s_j^n) <= g_k
                    column(s_i^p, s_j^n) = column(s_i^p, s_j^n) + g_k * |g'_k|
            end for
        end for
    end for
end pMOS-phase

Figure 3: Algorithm 2.

Optimal folding size

After the reduced solution-space sets for pMOS and nMOS transistors are found, the optimal folding sizes for pMOS and nMOS transistors can be chosen from these sets. To avoid the $O(m^3)$ time complexity of the algorithm in Fig. 2, a proper algorithm and data structure must be established. The main idea is to count the folded numbers of the pMOS and nMOS transistors efficiently for given folding sizes. With the given folding sizes, only the dominant (larger) folded number of each pair of dual transistors needs to be counted. The optimal folding size problem is divided into two phases: the pMOS phase and the nMOS phase. In the pMOS phase, the folded numbers of the pMOS transistors that are greater than or equal to the folded numbers of their dual nMOS transistors are counted. Likewise, in the nMOS phase, the folded numbers of the nMOS transistors that are greater than the folded numbers of their dual pMOS transistors are counted. The algorithm is shown in Fig. 3. Since the only major difference between the pMOS phase and the nMOS phase is whether the condition in line 8 includes equality, only the pMOS phase is described in Fig. 3. Since the folded numbers are counted in a two-phase operation, storage for them is needed during the entire operation: column is a two-dimensional array whose indices are the folding sizes of the two transistor types.

Figure 4 shows an example of transistor sizes in a circuit that has 12 transistor pairs; the same index denotes a pair of dual pMOS and nMOS transistors. The initial state of the transistor arrays is shown in Fig. 5a. In the pMOS phase, the transistor arrays are sorted in descending order of pMOS transistor size. For each pMOS folding size given in line 2 of Fig. 3, the array is divided into sections within which the folded numbers of the transistors are the same. Suppose that $s_i^p$ is 10. Then the array is divided into three sections, because the maximum folded number with $s_i^p$ is 3 and the minimum is 1. Section numbers indicate the folded transistor numbers. In Fig. 5b, transistors $p_6, p_7, p_3, p_8$ and $n_6, n_7, n_3, n_8$ are contained in the first section, whose section number is 3. After sectioning, each section is sorted again in descending order of nMOS transistor size, as shown in Fig. 5c. Once all sections are sorted, the arrays are ready for the calculation of the folded transistor numbers. For a given nMOS folding size $s_j^n$, each section is in one of the following three situations:

1. The section number is greater than the folded number of the first nMOS transistor in the section.
2. The section number is smaller than the folded number of the last nMOS transistor in the section.
3. None of the above.

The first situation means that the folded number of the pMOS transistors in a section is greater than the folded numbers of all nMOS transistors in the section. Since the larger folded number dominates the smaller one, the folded numbers of the nMOS transistors in such a section can be ignored. The partial sum of folded numbers is calculated as

$$\mathrm{column}(s_i^p, s_j^n) \leftarrow \mathrm{column}(s_i^p, s_j^n) + g_k \times |g_k|,$$

where $g_k$ and $|g_k|$ are the folded number of section $k$ and the number of elements in section $k$, respectively. In Fig. 5c, the section with section number 3 corresponds to this situation. The second case is the opposite situation of the first one.
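The sorting, sectioning, and three-way test described so far, together with the partial-sum rules that the following paragraphs give for the remaining situations, can be sketched in Python as below. This is a simplified sketch, not the code of Fig. 3. It evaluates one pair $(s_i^p, s_j^n)$ at a time and re-sorts on every call instead of reusing the sorted sections across all $s_j^n$, so it does not attain the stated complexity. Only the binary search for the section point mirrors line 8. The nMOS phase reuses the same routine with the roles swapped and a strict inequality, and a single returned count stands in for the column array. All function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import groupby

# Simplified sketch of the two-phase counting behind Fig. 3 for one pair of
# folding sizes (sp, sn). Pair i + 1 of Figure 4 is (P[i], N[i]).
P = [15, 20, 25, 5, 8, 30, 30, 23, 11, 20, 9, 17]
N = [30, 8, 20, 12, 20, 5, 15, 10, 25, 10, 17, 13]


def ceil_div(a, b):
    return -(-a // b)


def sectioned(A, B, sa):
    """Yield (g_k, section): pair indices sorted by A size (descending),
    grouped by folded number g_k = ceil(A/sa), each section re-sorted by
    B size (descending). Sorting is stable, so ties keep index order."""
    order = sorted(range(len(A)), key=lambda i: A[i], reverse=True)
    for g, group in groupby(order, key=lambda i: ceil_div(A[i], sa)):
        yield g, sorted(group, key=lambda i: B[i], reverse=True)


def describe_sections(P, N, sp, sn):
    """pMOS-phase view: (g_k, 1-based pair indices, situation 1, 2 or 3)."""
    report = []
    for g, sec in sectioned(P, N, sp):
        first = ceil_div(N[sec[0]], sn)    # largest nMOS folded number
        last = ceil_div(N[sec[-1]], sn)    # smallest nMOS folded number
        situation = 1 if g > first else 2 if g < last else 3
        report.append((g, [i + 1 for i in sec], situation))
    return report


def phase(A, B, sa, sb, strict):
    """Add g_k for every pair whose B folded number is <= g_k (strict=False,
    pMOS phase) or < g_k (strict=True, nMOS phase)."""
    total = 0
    for g, sec in sectioned(A, B, sa):
        # Negated B folded numbers are non-decreasing, so bisect applies.
        neg = [-ceil_div(B[i], sb) for i in sec]
        if g > -neg[0]:                    # situation 1: count the whole section
            total += g * len(sec)
        elif g < -neg[-1]:                 # situation 2: left to the other phase
            continue
        else:                              # situation 3: binary-search the section point
            cut = bisect_right(neg, -g) if strict else bisect_left(neg, -g)
            total += g * (len(sec) - cut)
    return total


def column(P, N, sp, sn):
    return phase(P, N, sp, sn, strict=False) + phase(N, P, sn, sp, strict=True)


def direct(P, N, sp, sn):
    return sum(max(ceil_div(p, sp), ceil_div(n, sn)) for p, n in zip(P, N))


print(describe_sections(P, N, 10, 10))
# [(3, [3, 7, 8, 6], 1), (2, [1, 9, 12, 10, 2], 3), (1, [5, 11, 4], 2)]
print(phase(P, N, 10, 10, strict=False), phase(N, P, 10, 10, strict=True))  # 18 12
print(column(P, N, 10, 10), direct(P, N, 10, 10))  # 30 30
```

On the Figure 4 data with $s_i^p = s_j^n = 10$, the sections match Fig. 5c. The pMOS phase contributes 3 × 4 for the section with section number 3 and 2 × 3 for the pairs (p12, n12), (p10, n10), and (p2, n2). The nMOS phase adds the remaining 12, so the total equals the direct sum of per-pair maxima.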

The second case should be handled in the nMOS phase. If the folded numbers of the nMOS transistors in such a section were calculated in the pMOS phase, the time complexity would be proportional to the number of transistors, whereas the section takes only constant time in the nMOS phase. The section with section number 1 in Fig. 5c belongs to this situation. The third case represents the situation in which the folded number of the pMOS transistors in a section lies within the range of the folded numbers of the nMOS transistors in that section. To calculate the partial sum of the folded numbers, the section point must be found in the section. The section point is the first nMOS transistor $n_{kl}$ that satisfies

$$\left\lceil \frac{n_{kl}}{s_j^n} \right\rceil \le g_k.$$

The partial sum of the folded numbers for the part of the section that satisfies this condition is then

$$\mathrm{column}(s_i^p, s_j^n) \leftarrow \mathrm{column}(s_i^p, s_j^n) + g_k \times |g'_k|,$$

where $|g'_k|$ is the number of elements that satisfy the above condition in section $k$. In Fig. 5c, the section with section number 2 corresponds to this situation. The remaining elements of the section, which do not contribute to the partial sum, are considered in the nMOS phase. However, the condition is slightly different in the nMOS phase, because equality has already been handled in the pMOS phase. The condition in the nMOS phase is

$$\left\lceil \frac{p_{kl}}{s_i^p} \right\rceil < g_k.$$

In Fig. 5c, $(p_{12}, n_{12})$ is the section point for the given folding sizes. Only the transistor pairs $(p_{12}, n_{12})$, $(p_{10}, n_{10})$, and $(p_2, n_2)$, which satisfy the pMOS-phase condition in the section with section number 2, are considered in the calculation of the partial sum of the folded numbers in the pMOS phase.

Time Complexity

To evaluate the performance of the algorithm in Fig. 3, its time complexity should be compared with that of the algorithm in Fig. 2. Let the number of transistor pairs and the number of possible folding sizes in $S^p$ or $S^n$ be $m$ and $s$, respectively. Sorting the transistor arrays in line 1 takes $O(m \log m)$ with a reasonable sorting algorithm. The sectioning process in line 3 linearly scans the entire transistor arrays, which costs $O(sm)$. Since a typical transistor is folded only a few times and its folded number is usually small compared with the folding size, the maximum folded number is assumed to be a constant. Since the maximum number of sections is then also a constant, line 4 takes $O(s)$. The sorting process in each section takes $O(sm \log m)$ in total. While line 7 costs $O(s^2)$, line 8 takes $O(s^2 \log m)$ in the worst case because of the binary search. Hence, either $O(sm \log m)$ or $O(s^2 \log m)$ represents the time complexity of the optimal folding algorithm, depending on the situation. It is hard to define the relation between $s$ and $m$; however, we expect it to depend on the applied design methodology. With the gate array and gate matrix approaches, $s$ is a small constant independent of $m$. The standard cell approach may have a moderately large $s$, which partially depends on $m$. The major concern of this paper is fully customized cell designs, which have a large $s$. In the worst case of this approach, every transistor width can differ from the others. If we assume that $s$ is proportional to the number of transistor pairs in the circuit, as we did for the first algorithm, we obtain a worst-case complexity of $O(m^2 \log m)$, which is significantly smaller than the $O(m^3)$ of the algorithm in Fig. 2.

4 Experimental Result

We implemented the optimal transistor folding algorithms and compared their performance in terms of run time and area using a layout generation system [8]. First, to measure how optimal transistor folding affects the layout area, the physical layouts of several circuits, including MCNC benchmarks, were generated with cell generation, placement, and routing, as shown in Table 1. For circuit eldckt, which is a full custom design, a significant amount of area was saved by the optimal transistor folding scheme. The other circuits are benchmark circuits for the standard cell approach. For the transistor sizes specified in the MCNC standard cell library, which are relatively uniform compared with customized designs, the area savings are smaller than for eldckt (their area ratios are closer to 1).

To compare the running times, algorithm 1 in Fig. 2, without solution space reduction, and algorithm 2 in Fig. 3, with solution space reduction, were implemented. Both algorithms were used to generate the optimal transistor folding sizes of the benchmark circuits, and Table 1 compares their run times. Both algorithms produce the same folding sizes for all designs, while the run times of the second algorithm are significantly smaller than those of the first, as expected. The run times were measured on a SPARCstation 10/30 with the algorithms written in C. The optimal folding sizes found for each transistor type were fed to the custom cell synthesis system to generate the physical layouts of the circuits.

To observe the effect of the folding sizes, we applied various folding sizes to generate the physical layouts of fract; the circuit layouts were synthesized automatically with placement and routing tools. Table 2 shows the area ratios of fract for the various folding sizes. The layout with the worst area was generated with a pMOS folding size of 80 and an nMOS folding size of 10, while the layout with the best area was generated with a pMOS folding size of 30 and an nMOS folding size of 30 or 20. The actual best or worst areas may be obtained with intermediate folding sizes, since not all possible folding sizes are shown in Table 2 because of space limitations. However, the table gives a global view of the area variation over the specified folding sizes. We note that the area difference can be as large as 30%, depending on the folding sizes of the transistors.

Table 1: Algorithm results on MCNC benchmarks.

                            layout area                              run time (sec.)
circuit     transistors   no folding   optimal folding   ratio    algorithm 1   algorithm 2   speed-up
eldckt      192           384800       268320            0.69     2.4           0.39          6.15
highway     156           114692       114168            0.99     2.64          0.63          4.19
fract       598           685000       619200            0.91     10.87         2.12          5.12
struct      8990          1.238×10^6   1.090×10^6        0.88     173.07        14.61         11.85
avq.small   83300         1.038×10^8   0.954×10^8        0.91     2357.97       270.98        8.7
avq.large   88258         1.199×10^8   1.116×10^8        0.93     3297.45       311.03        10.6

Table 2: fract layout area comparison.

                      pMOS folding size
nMOS folding size     20     30     40     50     60     70     80
10                    1.16   1.20   1.21   1.20   1.21   1.19   1.21
20                    0.95   0.92   0.94   0.93   0.94   0.96   0.94
30                    0.99   0.92   0.94   0.94   1.01   1.01   1.08
40                    0.97   0.94   0.93   0.97   0.94   1.01   1.01
50                    0.96   1.09   0.97   0.99   0.93   1.03   1.01
60                    0.95   1.08   0.99   1.04   1.00   1.01   1.02

REFERENCES

[1] S. S. Sapatnekar et al., "An exact solution to the transistor sizing problem for CMOS circuits using convex optimization," IEEE Trans. CAD, pp. 1621-1634, Nov. 1993.
[2] J. P. Fishburn and A. E. Dunlop, "TILOS: A posynomial programming approach to transistor sizing," Proc. 1985 ICCAD, pp. 326-328, 1985.
[3] J. Shyu et al., "Optimization-based transistor sizing," IEEE J. Solid-State Circuits, pp. 400-409, Apr. 1988.
[4] Z. Dai and K. Asada, "MOSIZ: A two-step transistor sizing algorithm based on optimal timing assignment method for multi-stage complex gates," Proc. 1989 CICC, pp. 17.3.1-17.3.4, May 1989.
[5] Y.-C. Hsieh, "LiB: A Cell Layout Generator," Proc. 1990 DAC, pp. 474-479, 1990.
[6] T. W. Her and D. F. Wong, "Cell Area Minimization by Transistor Folding," Proc. 1993 Euro-DAC, pp. 172-177, 1993.
[7] T. Uehara and W. M. VanCleemput, "Optimal layout of CMOS functional arrays," IEEE Transactions on Computers, vol. 30, pp. 305-312, May 1981.
[8] J. Kim and S. M. Kang, "High Performance CMOS Macromodule Layout Synthesis," Proc. 1994 ISCAS, 1994.
