
A robust gates-2-layout methodology for Faster Migration to New DSM Processes

Srikant Reddy Modugula

Project Manager, Design Flow & Reuse, Central R&D, STMicroelectronics Pvt. Ltd.
Plot 2&3, Sector 16-A, Institutional Area, Noida, INDIA 201301

srikant-reddy.modugula@st.com
ABSTRACT

In the traditional gates-to-layout flow there is a marked distinction between the logical and physical domains, which we call Front End and Back End respectively. With deep sub-micron effects and the dramatic increase in gate count, design cycle times have increased proportionately, in contrast to the shorter time-to-market demanded by the market and by rapidly evolving silicon process technology. The traditional flow ends up looping between Front End and Back End (IPO and ECO loops), and realizing late in the design cycle that the timing requirements are not met is costly. In addition, because a different tool is used for each task, the flow contains many interfaces that require file format changes, name changes, ad-hoc scripts and so on. We prepared a flow from RTL to placed gates using the SYNOPSYS physical synthesis tools Chip Architect and Physical Compiler, while retaining the existing tools for clock tree synthesis, final routing, extraction and DRC/LVS. The synthesis and placement domains are now merged, giving better placement results and a shorter design cycle for meeting the timing requirements. The number of interfaces between tools is also greatly reduced. The flow itself is simpler to learn and use, especially for current FE designers, who have less expertise with P&R. This paper describes the flow in detail, including all the interfaces involved. We also describe the shortcomings and propose corresponding workarounds and improvements in the tools involved.

1.0 Introduction
The flow takes a gate-level netlist as input and produces an optimized, routed DEF as output. With a few modifications it can be adapted into an RTL-to-routed-design flow.

Tools and versions used


Table 1:

Tool               Version
Chip Architect     3.0, 2000.11 Nov 30, 2000
Physical Compiler  2.0, 2000.11 Nov 27, 2000
Design Compiler    2000.11 Nov 27, 2000
PrimeTime          2000.11-PTSI-Beta2.0
Silicon Ensemble   5.3
CTGEN              3.5-p3

SNUG India 2001

2 Gates to Placed Gates For Faster Migration to New Technology

Proposed gates-2-layout Flow Diagram


Input: Block.db from Design Compiler

1. Hierarchical floorplanning in CA
   Fix die area
   Place IO pads
   Fix block sizes, positions and pin positions
   Power planning for top level as well as blocks
   Define rows
   Place and fix macro positions
   Define placement and routing obstructions wherever required
   Global routing and congestion analysis at top level
   Write out db for all sub-blocks
   Output: Top_Phier.idm, Block1.db ... Blockn.db

2. physopt (PC)
   Do iterations for timing closure
   Write out db
   Output: Placed.db

3. Read into CA and write out def for CTGEN
   Output: Placed_block.v, Placed_block.def

4. Clock tree synthesis and placement in CTGEN
   Output: Block_ctgen.v, Block_ctgen.def

5. Read in netlist and def into CA and save as IDM and db
   Output: Block_ctgen.db

6. (Re)optimize in PSYN if required
   Output: Optimized Block_ctgen.db

7. Read in def into CA and save as db
   Output: Block.def for routing

8. wroute in SE

9. Generate parasitics info and backannotate

Legend (box styles in the original diagram): PC, CA, Cadence data files


2.0 Hierarchical Floorplanning


Using CA, hierarchical floorplanning is done with no apparent limit on the number of levels in the hierarchy. The GUI features of CA come in very handy here. Once the hierarchical floorplanning is done, the individual blocks are implemented in Physical Compiler from the bottom up. For instance, consider the following hierarchical design.

TOP
    U1
    U2
        U2_1
        U2_2
    U3

Here, during hierarchical floorplanning, the footprints of U2_1, U2_2, U2, U1 and U3 are decided and then the floorplan of TOP is frozen. The power planning, hard macro positions, standard cell rows, and placement and routing obstructions are all planned at this stage for all the hierarchical blocks. A .db for each block is then written out using the update_db command in CA. For this, an initial .db (with block-level constraints applied) must have been saved for each block before starting the floorplan. This can be done in Design Compiler using the characterize command if the top-level boundary constraints are known.

Hierarchy manipulations can also be done at this point, provided they are planned so that post-routing backannotation causes no name clashes. For the design above, one can flatten U2 alone or flatten the whole of TOP; in both cases backannotation after implementation poses no problems. But if new hierarchies are created, for example if TOP is flattened and new blocks U1_new and U2_new are then created for implementation, implementation becomes difficult.

The .db format of SYNOPSYS has been extended to hold physical as well as logical information, so it can be used as the interface between SYNOPSYS tools such as Design Compiler, Chip Architect, Physical Compiler and PrimeTime. Only Chip Architect and Physical Compiler can interpret the physical information, though.

A rough timing-driven placement of the standard cells can be done inside Chip Architect (create_placement), followed by a global route (route_global). This helps in estimating and fixing timing exceptions after floorplanning. This global routing information is not passed on to PC, however, unlike in the traditional SE flow, where it normally is.
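The bottom-up implementation order described above can be sketched as a post-order walk of the example hierarchy. This is only an illustration in Python, not part of the CA or PC flow; the dictionary layout and the function name bottom_up_order are assumptions made for the example.

```python
# Example hierarchy from the text: TOP contains U1, U2 and U3;
# U2 contains U2_1 and U2_2. The dict layout is an assumption.
HIERARCHY = {
    "TOP": ["U1", "U2", "U3"],
    "U2": ["U2_1", "U2_2"],
}


def bottom_up_order(block, hierarchy):
    """Return blocks in post-order, so every sub-block precedes its parent."""
    order = []
    for child in hierarchy.get(block, []):
        order.extend(bottom_up_order(child, hierarchy))
    order.append(block)
    return order


if __name__ == "__main__":
    # Leaf blocks come first and TOP comes last.
    print(bottom_up_order("TOP", HIERARCHY))
```

Any order in which a block appears after all of its sub-blocks would serve equally well; post-order is simply the most direct way to obtain one.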


Once power planning is done at the top level, it can be pushed down into the blocks. For pin placement in the blocks, if the locations are to be specified by the designer, a script can be written and sourced in CA. To avoid redoing the floorplan next time, say for a similar project, one can write out the script used to place the blocks, pins etc. and reuse it later.

Some limitations to mention are:
1. There are still some limitations on hierarchy handling. CA is not yet a true hierarchical floorplanner that lets you work freely with logical and physical hierarchies.
2. Pin locations chosen by the designer have to be scripted, as noted above. Special features such as staggering the pins or placing a group of pins equidistantly are not available in CA (Avant! Apollo has some interesting features for this).

3.0 Block Placement


For the placement of standard cells, the netlist, floorplan data and constraints are taken to Physical Compiler. Physical Compiler places the standard cells taking into consideration timing constraints, placement blockages, regioning of cells etc. Use the physopt -effort high command for placement. In Physical Compiler, PrimeTime commands such as report_timing and report_constraints can be used to analyze the placement of the design. Use these commands to achieve timing closure on each block. Apply set_dont_touch to high-fanout nets such as TE (Test Enable), CK (Clock) and CD (Reset). This tells the tool that these nets need not be optimised for timing. Once a satisfactory placement is achieved, a db is written out (write).

Clock tree synthesis and routing are then done in the Cadence environment, so a DEF file of the placed design is required. The DEF file can be obtained with the db2def5 utility packaged with PC, but this utility writes out DEF with DB units of 1000, whereas the LEF units in the ST libraries are 100. The two are not compatible, so the db has to be taken into CA to write out a DEF file with DB units of 100: we read the placed db into CA and write out a def (write_def). A verilog netlist is also written out (write); the reason for this is explained in the next section.

Good correlation is observed between the congestion map in Physical Compiler and the one in Silicon Ensemble.

Some limitations to mention are:


1. A detailed router from Synopsys is needed that works in tandem with the other Synopsys tools and shares their global route algorithm.
2. There is no facility for adding filler cells to maintain power grid connectivity in the standard cell rows.
3. write_def is not available in PC, which forces one to use db2def5.
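The DB-unit mismatch mentioned in this section comes down to simple arithmetic: DEF coordinates are integers counted in database units per micron, so the same physical point has a different integer value at 1000 units than at 100 units. The Python sketch below illustrates only that arithmetic; the function name rescale_coordinate is hypothetical, and the flow in this paper obtains the 100-unit DEF from CA (write_def), not from a script like this.

```python
from fractions import Fraction


def rescale_coordinate(value, from_units, to_units):
    """Convert an integer DEF coordinate between DB-unit resolutions.

    value      -- coordinate expressed in from_units per micron
    from_units -- e.g. 1000 (as written by db2def5)
    to_units   -- e.g. 100 (as used by the LEF)
    Raises ValueError when the point is off-grid at the target resolution.
    """
    scaled = Fraction(value * to_units, from_units)
    if scaled.denominator != 1:
        raise ValueError(
            f"{value} at {from_units} units/micron is off-grid at {to_units}"
        )
    return int(scaled)


if __name__ == "__main__":
    # 12340 at 1000 units/micron is 12.34 um, i.e. 1234 at 100 units/micron.
    print(rescale_coordinate(12340, 1000, 100))
```

A coordinate such as 12345 at 1000 units has no exact integer equivalent at 100 units, which is one reason a blind textual conversion is not a safe substitute for writing the DEF at the correct resolution.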

4.0 Clock Tree Synthesis


Clock tree synthesis cannot be done inside Physical Compiler. Hence, we stick to the proven and established CTGEN flow. The inputs to CTGEN are placed_block.def from PC and ctgen.cst, which contains all the clock information such as the root pins, leaf pins, clock constraints and the types of buffers and inverters to be used. The control file design.cmd is passed as the argument when running ctgentool. A verilog netlist after clock tree synthesis is required to re-enter the SYNOPSYS environment for ECO. For this, we must give the original verilog netlist as input to CTGEN. This is the reason for writing out a verilog netlist after placement.

Some limitations to mention are:
1. No clock tree synthesis feature in PC.

5.0 Post Clock Tree Synthesis (including ECO if needed)


After clock tree synthesis, the verilog netlist and the def output from CTGEN are read into CA. The database is then saved in .db and IDM formats. The .db can be taken to Physical Compiler for post-clock-tree ECO. For guidelines on this, refer to the Physical Compiler User Guide Version 2000.11, page 9-9.

Physical Compiler can also be used to remove the overlaps left after clock tree synthesis. Doing the overlap removal inside Physical Compiler gives a better optimisation and avoids upsetting the placement of the logic cells created by PC. This is done by using the ctgen/run/db/qp_in.def file as input to the Physical Compiler tools and running legalize_placement -incremental -eco.


The qp_in.def file has to be used with caution. The trials run for this document always had naming and missing-net problems when the qp_in.def file was used. Overlap removal in CTGEN is recommended instead.

After the optimisation, write out a db and take it to CA. This is required for writing out a def file: PC gives a def output with db units of 1000, which is not compatible with the LEF db units of 100, so the def (version 5.2, db units 100) has to be written out from CA.

.update_db.PC...write_def should be ok

Some limitations to mention are:
2. The def written out by CA has one problem: CA does not maintain the USE CLOCK property in the def. This property is required if we want to route clock nets separately, so the def output of CA has to be processed to add it. Refer to sample script A.1 in the Appendix.
3. Adding filler cells to maintain power grid connectivity in the standard cell rows takes too long in CA compared to the time taken in SE.
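The kind of DEF post-processing needed to restore the USE CLOCK property can be sketched as follows. This is not the sample script A.1; it is a minimal Python illustration that assumes each net statement in the NETS section starts with "- netName" at the beginning of a line and ends with a whitespace-separated ";". The function name add_use_clock, the sample DEF fragment and the net name CK are all assumptions made for the example.

```python
def add_use_clock(def_text, clock_nets):
    """Append '+ USE CLOCK' to the named nets inside the NETS section."""
    out = []
    in_nets = False   # True between 'NETS n ;' and 'END NETS'
    is_clock = False  # current net statement belongs to a clock net
    has_use = False   # current net statement already carries a USE property
    for line in def_text.splitlines():
        tokens = line.split()
        if tokens[:1] == ["NETS"]:
            in_nets = True
        elif tokens[:2] == ["END", "NETS"]:
            in_nets = False
        elif in_nets and tokens:
            if tokens[0] == "-" and len(tokens) > 1:
                # Start of a new net statement.
                is_clock = tokens[1] in clock_nets
                has_use = False
            if "USE" in tokens:
                has_use = True
            if is_clock and not has_use and tokens[-1] == ";":
                # Insert the property just before the closing semicolon.
                line = line.rstrip()[:-1].rstrip() + " + USE CLOCK ;"
            if tokens[-1] == ";":
                is_clock = False
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical DEF fragment used only to exercise the function.
SAMPLE_DEF = """NETS 2 ;
- CK ( U1 CP )
  ( U2 CP ) ;
- n1 ( U1 Q ) ( U3 A ) ;
END NETS
"""

if __name__ == "__main__":
    print(add_use_clock(SAMPLE_DEF, {"CK"}), end="")
```

Nets in the SPECIALNETS section and nets that already carry a USE property are left untouched. A production script would also need to cope with whatever formatting the def written by CA actually uses.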

Clock Routing and Detailed Signal Routing


Before we move on to routing the design, a few things have to be taken care of. Filler cells have to be added to maintain power rail continuity in the standard cell rows, after which fillervia cells must be instantiated to propagate power and ground to the standard cells. Mac files can be used for this, but the co-ordinates of the power stripes have to be extracted first; this can be done using tcl scripts. Now we are all set to route the clock net. We can use wroute to route the clock net by selecting it first. Once the clock routing is finished, detailed routing can be done with wroute.
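The idea behind filler cell insertion can be illustrated with a small gap finder: every stretch of a standard cell row that is not covered by a placed cell has to be filled so that the power rails stay continuous. The Python sketch below shows only that idea under simplifying assumptions (one row, cells given as (x, width) pairs in DB units, no overlaps); the function name find_filler_gaps is hypothetical, and in the flow described here the actual insertion is done with mac files in SE.

```python
def find_filler_gaps(row_start, row_end, cells):
    """Return the (start, end) intervals of a row not covered by any cell.

    row_start, row_end -- x extent of the standard cell row
    cells              -- iterable of (x, width) for the placed cells
    """
    gaps = []
    cursor = row_start  # right edge of the area covered so far
    for x, width in sorted(cells):
        if x > cursor:
            gaps.append((cursor, x))
        cursor = max(cursor, x + width)
    if cursor < row_end:
        gaps.append((cursor, row_end))
    return gaps


if __name__ == "__main__":
    # Three cells in a row spanning 0..100; two gaps remain to be filled.
    print(find_filler_gaps(0, 100, [(10, 20), (30, 10), (60, 40)]))
```

Each returned interval would then be tiled with filler cells of the available widths.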

NOTE: Timing analysis can be done in PrimeTime at any stage: after placement, clock tree synthesis, clock routing or detailed routing.


Post Routing ECO Flow


After detailed routing, an rspf can be extracted from SE itself, or extraction can be done in Arcadia to get a dspf file. This parasitic file is read into PrimeTime along with the post-clock-tree-synthesis verilog netlist written out by CA. Then write out an sdf and a setload file.

Now we have to create a new IDM database, which will be used for all post-CTS analysis. For this, read the post-CTS verilog netlist and the routed def into CA and save as IDM and db. There will be warnings for fillercell instances. They can be ignored, but caution has to be maintained during all future operations. These warnings occur because the fillercell and fillervia instances are not present in the front-end netlist that is used. One place where this can lead to trouble is during ECO runs: an ECO run may instantiate a new buffer and place it in a location where a fillercell already exists. Such overlaps are not removed by the tool, as it has no information about the fillercell placement. The suggested remedy is to place the fillercells and fillervias all over again.

Take the .db created in CA to PC and do a post-routing ECO (physopt -incremental -post_route). Before the re-optimization, the parasitics information must already have been read in as a delay file (read_sdf). Refer to the Physical Compiler User Guide Version 2000.11, page 9-12. This means that we work on the placed database taking into account the actual parasitics extracted after routing the initial placed design. The placement may be improved as a result. The new placed design then has to be routed again.
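The filler cell hazard described above can be made concrete with a rectangle overlap check. The Python sketch below lists the fillercells that a newly inserted ECO cell lands on; the Rect type, the function names and the instance names are assumptions for illustration only, and neither CA nor PC performs this check, which is why the fillercells are placed again after the ECO.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Placed instance outline (hypothetical representation, DB units)."""
    name: str
    x: int
    y: int
    width: int
    height: int


def overlaps(a, b):
    """True when the two outlines share area; abutting edges do not count."""
    return (a.x < b.x + b.width and b.x < a.x + a.width and
            a.y < b.y + b.height and b.y < a.y + a.height)


def fillers_hit_by_eco(eco_cells, fillers):
    """Return the sorted names of fillers overlapped by any ECO cell."""
    return sorted({f.name for f in fillers for c in eco_cells if overlaps(c, f)})


if __name__ == "__main__":
    eco = [Rect("eco_buf_1", 40, 0, 10, 12)]
    fillers = [Rect("FILL_1", 45, 0, 5, 12), Rect("FILL_2", 50, 0, 5, 12)]
    # FILL_1 lies under the new buffer; FILL_2 only abuts it.
    print(fillers_hit_by_eco(eco, fillers))
```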

6.0 Conclusion
We have illustrated a flow which mixes Synopsys CA and PC with the existing tools in our standard flow, which include Cadence CTGEN, Silicon Ensemble, and Synopsys Arcadia. The flow works well, without many bottlenecks, and is easy to learn, and we have decided to use it for some of our future designs. However, apart from the fixes needed in CA and PC to replace the current workarounds, further improvements would be:
1. A proven clock tree synthesis feature in PC
2. A detailed router which is more closely linked with CA and PC
3. Better handling of hierarchies

Further experiments are ongoing to establish an RTL-2-layout flow using CA and PC. Transferring the floorplan information (PDEF) from CA to PC for hierarchical blocks is not straightforward.

7.0 References
1. Chip Architect User Guide Version 3.0
2. Physical Compiler User Guide Version 2000.11
