Project Manager, Design Flow & Reuse, Central R&D, STMicroelectronics Pvt. Ltd.
Plot 2&3, Sector 16-A, Institutional Area, Noida, INDIA 201301
srikant-reddy.modugula@st.com
ABSTRACT In the traditional gates-to-layout flow there is a marked distinction between the logical and physical domains, which we call the Front End and the Back End respectively. With deep sub-micron effects and the dramatic increase in the gate count of designs, design cycle times have increased proportionately, at odds with the faster time-to-market demanded by the market and by rapidly evolving silicon process technology. The traditional flow leads to loops between Front End and Back End (IPO and ECO loops), and realizing late in the design cycle that the timing requirements are not met is costly. In addition, using a separate tool for each task in the flow creates many interfaces, which require file format changes, name changes, ad hoc scripts and so on. We prepared a flow from RTL to placed gates using the Synopsys physical synthesis tools Chip Architect and Physical Compiler, while retaining the existing tools for clock tree synthesis, final routing, extraction and DRC/LVS. The synthesis and placement domains are now merged, which gives better placement results and a shorter design cycle for meeting the timing requirements. The number of interfaces required between tools is also greatly reduced. The flow itself is simpler to learn and use, especially for current FE designers, who have less expertise with P&R. This paper describes the flow in detail, including all the interfaces involved. We also describe the shortcomings and propose corresponding workarounds and improvements in the tools involved.
1.0 Introduction
The flow takes a gate-level netlist as input and gives an optimized, routed DEF as output. With a few modifications the flow can be adapted into an RTL-to-routed-design flow.
Hierarchical floorplanning in CA: fix the die area; place IO pads; fix block sizes, positions and pin positions; plan power for the top level as well as the blocks; define rows; place and fix macro positions; define placement and routing obstructions wherever required; run global routing and congestion analysis at top level; write out a db for all sub-blocks.
[Figure: flow diagram. Legend: PC, CA, Cadence data files. Data passed between steps: Top_Phier.idm, Placed.db, Placed_block.v, Placed_block.def, Block_ctgen.v, Block_ctgen.def, Block_ctgen.db, optimized Block_ctgen.db; final routing uses wroute in SE.]
SNUG India 2001
[Figure: example design hierarchy with blocks U1, U2, U3, U2_1 and U2_2.]
Here, after hierarchical floorplanning, the footprints of U2_1, U2_2, U2, U1 and U3 are decided and then the floorplan of TOP is frozen. The power planning, hard macro positions, standard cell rows, and placement and routing obstructions are all planned at this stage for all the hierarchical blocks. Then a .db for each block is written out using the update_db command in CA. For this, an initial .db (with block-level constraints applied) must have been saved for each block before starting the floorplan. This can be done in Design Compiler using the characterize command if the top-level boundary constraints are known.
At this point it is worth mentioning that hierarchy manipulations can also be done with post-routing backannotation in mind, so that no name clashes arise. For the design above, one can flatten U2 alone or flatten the whole of TOP. In both cases backannotation after implementation poses no problems. But if new hierarchies are created, implementation becomes a problem. For example, if TOP is flattened and new blocks U1_new and U2_new are then created for implementation, implementation becomes difficult.
The Synopsys .db format has been extended to hold physical information as well as logical information. This database format can therefore be used as the interface between Synopsys tools such as Design Compiler, Chip Architect, Physical Compiler and PrimeTime. Only Chip Architect and Physical Compiler can interpret the physical information, though.
A rough timing-driven placement of the standard cells can be done inside Chip Architect (create_placement), followed by a global route (route_global). This can help in estimating and fixing timing exceptions after floorplanning. However, this global routing information is not passed on to PC, whereas in the traditional SE flow it normally is passed on.
Once power planning is done at top level, it can be pushed into the blocks. For pin placement in the blocks, if the locations are to be specified by the designer, a script can be written and sourced in CA. To avoid redoing the floorplan next time, for example for a similar project, one can write out the script used to place the blocks, pins etc. and reuse it later. Some limitations to mention are:
1. There are still some limitations on hierarchy handling. CA is not yet a true hierarchical floorplanner that lets you work freely with logical and physical hierarchies.
2. Special pin placement features, such as staggering the pins or placing a group of pins equidistantly, are not available in CA (Avant! Apollo has some interesting features for this).
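Since equidistant or staggered pin placement has to be scripted by the designer, the coordinate calculation can be sketched as below. This is only an illustration in Python: the function name `place_pins` and its parameters are hypothetical, it computes coordinates only, and turning them into commands for CA is left to the designer's own script.

```python
def place_pins(names, edge_start, edge_end, fixed_coord, axis="x", stagger=0.0):
    """Spread pins evenly along one block edge (hypothetical helper).

    names       -- pin names, in the order they should appear along the edge
    edge_start  -- coordinate where the edge begins (along `axis`)
    edge_end    -- coordinate where the edge ends (along `axis`)
    fixed_coord -- coordinate of the edge on the other axis
    axis        -- "x" for a horizontal edge, "y" for a vertical edge
    stagger     -- offset applied to every second pin, perpendicular to the edge
    Returns a dict mapping pin name to an (x, y) tuple.
    """
    count = len(names)
    if count == 0:
        return {}
    # Equal gaps between pins and between the end pins and the corners.
    pitch = (edge_end - edge_start) / (count + 1)
    placed = {}
    for index, name in enumerate(names):
        along = edge_start + (index + 1) * pitch
        across = fixed_coord + (stagger if index % 2 else 0.0)
        placed[name] = (along, across) if axis == "x" else (across, along)
    return placed
```

For example, `place_pins(["a", "b", "c"], 0.0, 100.0, 0.0)` spaces three pins at 25, 50 and 75 along a bottom edge of length 100; passing a non-zero `stagger` shifts every second pin away from the edge.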
1. A detailed router from Synopsys is needed that works in tandem with the other Synopsys tools and shares the global route algorithm.
2. No facility is provided for adding filler cells to maintain power grid connectivity in the standard cell rows.
3. write_def is not available in PC, which forces one to use db2def5.
Physical Compiler can be used to remove the overlaps after clock tree synthesis. Doing the overlap removal inside Physical Compiler gives a better optimisation and avoids upsetting the placement of the logic cells created by PC. This can be done by using the ctgen/run/db/qp_in.def file as input to the Physical Compiler tools. Use legalize_placement -incremental -eco.
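To make concrete what an overlap is in this context, the following Python sketch flags cells in the same row whose extents intersect, as can happen when clock buffers are inserted into an already placed row. It is an illustration only and not part of the flow: the function name `find_overlaps` and the tuple layout `(name, row_y, x, width)` are assumptions, and extracting these values from a DEF file is not shown.

```python
def find_overlaps(cells):
    """Return pairs of cell names that overlap within the same row.

    cells -- iterable of (name, row_y, x, width) tuples (assumed layout)
    """
    rows = {}
    for name, row_y, x, width in cells:
        rows.setdefault(row_y, []).append((x, x + width, name))

    overlaps = []
    for row in rows.values():
        reach, owner = None, None  # rightmost edge seen so far and its cell
        for left, right, name in sorted(row):
            # A cell starting before the rightmost edge seen so far overlaps.
            if reach is not None and left < reach:
                overlaps.append((owner, name))
            if reach is None or right > reach:
                reach, owner = right, name
    return overlaps
```

Cells that merely abut (one ends exactly where the next begins) are not reported.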
The qp_in.def file has to be used with caution. The trials run while preparing this document always had naming and missing-net problems when the qp_in.def file was used. Overlap removal in CTGEN is recommended instead. After the optimisation, write out a db and take the db to CA. This is required for writing out a DEF file. PC gives a DEF output with db units of 1000, but that is not compatible with the LEF db units of 100. So we have to write out a DEF (version 5.2, db units 100) from CA.
.update_db.PC...write_defshould be ok
Some limitations to mention are:
2. The DEF written out by CA has one problem. CA does not maintain the USE CLOCK property in the DEF, but this property is required if we want to route clock nets separately. So the DEF output of CA has to be processed to add this property. Refer to sample script A.1 in the Appendix section.
3. Adding filler cells to maintain the power grid connectivity in standard cell rows takes too long in CA compared to the time taken in SE.
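The post-processing of the CA DEF output to restore the USE CLOCK property could look roughly like the Python sketch below. It is not the script A.1 from the Appendix. It assumes that the clock net names are known in advance, that each net statement in the NETS section starts with "- netName", and that the closing ";" is the last character on its line; the function name `add_use_clock` is hypothetical.

```python
def add_use_clock(def_text, clock_nets):
    """Append '+ USE CLOCK' to the named nets in the NETS section of a DEF.

    def_text   -- full DEF file contents as a string
    clock_nets -- set of net names to mark as clock nets (assumed known)
    """
    out = []
    in_nets = False   # inside the NETS section (not SPECIALNETS)
    current = None    # clock net statement currently being read, if any
    has_use = False   # the current statement already carries a USE property
    for line in def_text.splitlines():
        stripped = line.strip()
        if not in_nets:
            if stripped.startswith("NETS "):
                in_nets = True
            out.append(line)
            continue
        if stripped.startswith("END NETS"):
            in_nets = False
            out.append(line)
            continue
        if stripped.startswith("- "):
            name = stripped.split()[1]
            current = name if name in clock_nets else None
            has_use = False
        if current is not None:
            if "+ USE " in stripped:
                has_use = True
            if stripped.endswith(";"):
                if not has_use:
                    # Insert the property just before the terminating ';'.
                    line = line.rstrip()[:-1].rstrip() + " + USE CLOCK ;"
                current = None
        out.append(line)
    result = "\n".join(out)
    return result + "\n" if def_text.endswith("\n") else result
```

Nets that already carry a USE property are left untouched, so running the function twice on the same file does not add the property a second time.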
NOTE: Timing analysis can be done in PrimeTime at any stage: after placement, clock tree synthesis, clock routing or detailed routing.
6.0 Conclusion
We have illustrated a flow that mixes Synopsys CA and PC with the existing tools in our standard flow, which include Cadence CTGEN, Silicon Ensemble and Synopsys Arcadia. The flow works well with few bottlenecks and is easy to learn, and we have decided to use it for some of our future designs. Apart from the fixes needed in CA and PC to replace the current workarounds, further improvements would be:
1. A proven clock tree synthesis feature in PC
2. A detailed router that is more closely linked with CA and PC
3. Better handling of hierarchies
Further experiments are ongoing to establish an RTL-to-layout flow using CA and PC. Transferring the floorplan information (PDEF) from CA to PC for hierarchical blocks is not straightforward.