
Institut für Integrierte Systeme

Integrated Systems Laboratory

Department of Information Technology and Electrical Engineering

VLSI II:
Design of Very Large Scale Integration Circuits
227-0147-00L

Exercise 8

Placement and Routing Flow


Prof. L. Benini
F. Gürkaynak

Last Changed: 2019-04-15 16:34:37 +0200

Reminder:
With the execution of this training you declare that you understand and accept the regulations about using
CAE/CAD software installations at the ETH Zurich. These regulations can be read anytime at
http://eda.ee.ethz.ch/index.php/Regulations.
1 Overview
In Exercise 3 we learned how to set up the back-end design flow in Cadence Innovus: we added the
pad frame, placed macro blocks, and performed the power routing steps. In this exercise, we will
continue the back-end design flow from where we stopped at the end of Exercise 3 and perform the main tasks of
back-end design: placement and routing. In the process you will learn:
• How to place the standard cells
• How to perform the signal routing (power routing was covered in Exercise 3)
• How to synthesize the clock tree (remember from Exercise 5 that high-fan-out nets, such as clock, reset
and scan enable require special attention during back-end design because their routing is critical for the
correct timing of your chip)
• How to make sure that your chip after back-end design still meets all the timing constraints you have
specified.
The starting point for this exercise is the saved design of Exercise 3, in which the floorplanning and power routing
were completed. You will use the knowledge of timing you acquired in Exercise 5 to successfully
complete this exercise.

1.1 About the Style


We will use different typesetting and highlighting styles to identify different types of actions.

Student Task: Parts of the text that have a gray background, like the current paragraph, indicate steps
required to complete the exercise.

Actions that require you to select a specific menu will be shown like the following:
menu→sub-menu→sub-sub-menu
Whenever there is an option or a tab that can be found in the current view/menu we will use a BUTTON to
indicate such an option. Throughout the exercise you will be asked to enter certain commands using the
command line.

Note: We strongly recommend using the command line instead of the GUI, mainly for two reasons.
First, certain commands or options are not available via the GUI. Second, during your chip
design you will certainly run the same sequence of commands many times, and the commands you enter
on the command line can be collected in a script to make repeated execution faster and the results
reproducible.

The following is an example of how the execution of Linux commands is represented in this and the following
exercises.
sh > some_linux_command

Some of the commands, however, have to be entered on the command line of the Cadence Innovus
tool, which is represented by
sh > some_innovus_command

2 Getting Started
You will need a terminal program to type in commands throughout this exercise. On the computers in ETZ
D61.2 you can get a terminal by opening the menu in the top left corner and selecting
Applications→Accessories→Terminal.

Student Task 1:
• Change to your home directory and install the training files with the script provided:
sh > cd
sh > /home/vlsi2/ex08/install_ex08

• Change to the design directory


sh > cd ex08

• Start Cadence Innovus (see note a) either from your design directory by using cockpit
sh > icdesign umcL65 &

• or from the encounter directory by issuing the command


sh > cd ~/ex08/encounter
sh > cds_innovus-17.11.000 innovus

a This exercise uses version 17.11 of Cadence Innovus. In the past, we used Cadence Encounter version 13.14. The
main difference to the previous version is a fancier name and a new clock insertion tool which will be used throughout this
exercise.

We want to continue the back-end design from where we stopped at the end of Exercise 3. To make things
easier we have already provided you with a save file from the end of Exercise 3 under
./encounter/save/filter_chip.enc (see footnote 1).

Note: Once again, for an efficient back-end flow it is really important to build up a script while you continue
through the flow. Cadence Innovus keeps a log of all the commands you enter in the command line or
execute via the GUI. Each time you start Cadence Innovus a new command log file innovus.cmd* is
created in the working directory. You can simply copy the commands you need to build your script from
these files.
However, be aware that if you execute commands via the GUI, the corresponding commands in the logs
may include a lot of unnecessary command options that were specified as default in the dialog boxes.
Thus, you should consult the man pages of the commands to understand what exactly they do.

Student Task 2:
• Select File→Restore Design and change the Data Type to Innovus, which will change the
appearance of the dialog. Click on the folder icon at the far end of the Restore Design File
field and select the saved design with the name filter_chip.enc, which was stored in the
./encounter/save directory.

The design we just loaded contains the completed pad frame, the placed SRAM macro-cell, the power rings
around the core, and the power grid, and should correspond to what we did in Exercise 3. Make sure to switch to
the Physical View at the top right of the layout area (see footnote 2). If you zoom into the layout using the ’z’ key,
you will see that all standard cell rows are empty. What you actually see are the power and ground lines on
the metal layer (Metal-1) that will later connect to the corresponding supply pins of the standard cells. With
’SHIFT-z’ you can zoom out again, or press ’f’ to fit the entire chip layout into the Innovus window.

1 Normally you only need to make a save to preserve the state of the design after a computationally intensive operation such as
a lengthy optimization or final routing. Steps that do not take much time can be easily repeated by executing a script.
2 See earlier exercises for using Cadence Innovus

2.1 Well taps
Before we can have Cadence Innovus place the standard cells for us, we need to add well taps to the
design. All MOS transistors need a bulk connection to function properly. For simple digital designs, the bulk
of all nMOS transistors is normally connected to the lowest voltage (ground), and the bulk of all pMOS
transistors is connected to the highest voltage (VDD). These connections are made by
directly contacting the pwell (for nMOS transistors) and the nwell (for pMOS transistors), and are therefore also known
as well taps. There are some advanced design techniques where the bulk is connected to a
different voltage. This is called body biasing. In older technologies, the bulk connections of all transistors were
automatically connected to GND and VDD. However, in some more modern technologies, standard cells have
the option of providing a separate body bias. This is also the case for umcL65. In this technology the standard
cells do not have any well taps at all. The well taps have to be provided by specific well tap cells instantiated
regularly with a certain density. There are different well tap cells that either connect the wells to the supply
voltages (standard behavior) or allow one or both of the well voltages to be routed to a global signal for advanced
body biasing solutions.
Student Task 3:
• There is a script that will automatically place the well taps at the correct spacing. Type the following in
the Cadence Innovus console window, and do not forget to include the same line in your back-end
script.
sh > source scripts/welltap.tcl

• Zoom into the layout again and examine the result of this pre-placement step. How far apart from each
other were the well taps placed?

Now it is time to place the standard cells into our design.

3 Placement
The goal of the placement is to find a suitable location for all the standard cells in the design in a way that
minimizes the utilization of routing resources and satisfies the given timing constraints. The placement problem
is computationally very intensive (see footnote 3), and only heuristic algorithms can be used for this purpose. Before we place
all standard cells we have to give Cadence Innovus additional information about all scan chains in the design
because this information can be used to improve the placement of the standard cells.
Student Task 4:
• The design contains 14 scan chains, which we can specify with the following command:
sh > specifyScanChain scan_chain0 -start In_DI[0] -stop Out_DO[0]

• Repeat the command for the remaining 13 scan chains (chains: In_DI[i] - Out_DO[i], i=[0:13]).
• Since In_DI[i] and Out_DO[i] are not the actual names of the pad instances but names of the ports
of the Synopsys netlist, we have to allow Cadence Innovus to trace the scan chains to the actual
pads. The following two commands take care of that.
sh > setScanReorderMode -compLogic true
sh > scanTrace -lockup -verbose
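Typing the remaining scan chain commands by hand is error-prone. The following is a minimal Python sketch (not part of the provided exercise scripts) that prints all 14 commands following the pattern of chain 0, so that they can be pasted into your back-end script. The helper name scan_chain_commands is hypothetical, and the sketch assumes that every chain uses exactly the In_DI[i] / Out_DO[i] naming given above.

```python
# Hypothetical helper: generate the specifyScanChain lines for all chains.
NUM_CHAINS = 14  # number of scan chains stated in the exercise


def scan_chain_commands(num_chains=NUM_CHAINS):
    """Return one specifyScanChain command per chain, modeled on chain 0."""
    return [
        f"specifyScanChain scan_chain{i} -start In_DI[{i}] -stop Out_DO[{i}]"
        for i in range(num_chains)
    ]


if __name__ == "__main__":
    # Print the commands so they can be copied into the back-end script.
    for command in scan_chain_commands():
        print(command)
```

The generated lines are plain text; they still have to be entered in the Innovus console or added to your script.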

As with all heuristic algorithms, the initial conditions of the problem can affect the final solution. We influence
these initial conditions by placing the macro and I/O cells. It is also possible to give Cadence Innovus hints
as to where to place certain cells (see footnote 4); however, we will not cover this in this exercise and rely on the defaults.
3 It is actually NP-complete
4 Just the basics: When you change to the Floorplan mode you will see a pink box that corresponds to the design. You can

Student Task 5:
• As mentioned earlier, this is a computationally intensive task, so make sure that you get all the help
from your computer by typing:
sh > setMultiCpuUsage -localCpu max

• Select Place →Place Standard Cell. We will keep the default options in this dialog box. This
will allow us to run a full placement (not an incremental one, which builds on a previous placement,
and not a quick one; see note a). Include Pre-Place Optimization is very useful as it removes all
buffer/inverter trees from the netlist, which will facilitate the timing analysis as you will see later.
• In the Mode... menu, set the congestion effort to medium and deactivate the Timing Driven Placement.
Click on the button OK to save your changes.
• Now, we are ready to start the placement procedure by clicking OK. This may take some time.
a You can find a short description of all the options if you press Help in the bottom right corner of the dialog box.

The placement tool has several advanced options, which we accessed by clicking on Mode within the
Place window that you open from Place →Place Standard Cell.
We have to warn you about the various performance-related options you can find under this menu, such as
Congestion Effort and Run Timing Driven Placement. In the exercises we will sometimes advise you
to use certain settings for these options (or keep the defaults) in order to reduce run time, or because for
this particular design we have found out that a particular option gives better results. When you do your own
designs, you should consider evaluating which options are better suited rather than copying the options from
this exercise.
Just to give some brief information about some of these options (see footnote 5):
Congestion Effort On a chip there is only a finite amount of routing resources. Cadence Innovus divides
the circuit area into equally sized gcells and determines how many routing tracks are available in
each gcell in both the horizontal and the vertical direction. Once the cells are placed, this results in a certain
connection pattern, and Cadence Innovus can estimate the connections and determine what
percentage of the routing resources of each gcell would be needed. If a gcell is overfilled, there is congestion. You can actually
see this in your design: the congestion markers in the Layer Control window on the right-hand side of
Cadence Innovus are enabled by default. In our example, there will be only very few of these markers,
so to see them better, turn off all the routing layers. The following figure shows an example congestion
marker.

4 (continued) ungroup this box and select the boxes that correspond to modules in your hierarchy. You can then drag them into the chip
and by right clicking select the Attribute Editor. There you will find the Constraint Type where you can select whether
this is a slight suggestion (Guide), a general instruction (Region) or an absolute order (Fence). Make sure you first start
with a default placement strategy before you try these features.
5 This is not a complete description of all the options. Additional information can be found in the man pages of the placement
commands placeDesign and setPlaceMode.

In the figure you can see that there are 161 routing tracks available in the horizontal direction, but 170
were used. This may or may not be a problem: not all the connections need to go through that
particular area, and if the neighboring areas have free tracks, some connections can make a small detour.
The congestion effort determines how much effort should be put into minimizing such
congestion hot spots. This of course depends on how tight the overall routing resources are, and will
change from design to design.
Timing Driven Placement Timing driven placement tries to minimize the distance between connected cells
in order to reduce the delays due to long interconnect lines and the associated parasitics. On paper, this
option sounds like a must-have. In practice, it sometimes results in longer run times at similar or
worse performance. This also depends strongly on the type of circuit you have.
Module Plan It is difficult to guess the functionality of this option. It uses a different placement engine that
bunches up gates that are within the same module together. The result is a placement where there are
sort of bubbles that collect cells together. In most cases this option works pretty well. However, in some
designs turning this option off yields better results. We would suggest experimenting with this option only
if you have congestion problems or are generally not satisfied with the results you are getting from the
placement.
Scan Connection Placement algorithms try to optimize the placement by calculating a cost function that involves
the number (and distance) of connections between the cells. This also includes nets that belong
to the scan chain. However, the scan chain connections could actually be reordered. For scan functionality,
the order in which the flip-flops are connected does not always matter, so Cadence Innovus can
reconnect these cells in a different order to find a better placement solution.
Maximum Density The density refers to how much of the available space is used to place cells. This option
allows some space to be left between the cells that could alleviate congestion problems, or could be used
to add more filler cells for signal integrity issues.
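The congestion marker discussed under Congestion Effort (161 horizontal tracks available, 170 used) can be expressed as an overflow count and a utilization figure. The following is a minimal Python sketch of that arithmetic; the function name track_overflow is hypothetical and the calculation is a simplification for illustration, not the metric Cadence Innovus reports internally.

```python
def track_overflow(available, used):
    """Return (overflow in tracks, utilization in percent) for one gcell direction."""
    overflow = max(0, used - available)      # tracks demanded beyond capacity
    utilization = 100.0 * used / available   # above 100 % means the gcell is overfilled
    return overflow, utilization


# Numbers from the example congestion marker in the text.
overflow, utilization = track_overflow(available=161, used=170)
print(f"overflow: {overflow} tracks, utilization: {utilization:.1f} %")
```

For the example marker this gives an overflow of 9 tracks, which the router may be able to absorb by detouring through neighboring gcells.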

Student Task 6:
• As soon as the placement is completed, examine the result in the design browser by switching to the
Physical View. You can get a better view of the placed standard cells if you turn off the visibility
of the power grid by deselecting P/G in the Net section of the Layer Control at the right edge
of the Cadence Innovus window. Zoom in with the ’z’ key to have a closer look at the detailed
placement of the standard cells.
• You may notice that the standard cells are bunched up into several distinct clusters. Find out what
these clusters are by using the design browser. The standard cells pertaining to the entities you
select in the design browser will be highlighted in the layout.
• The command checkPlace provides information about the number of placed standard cells and
placement density. If by any chance not all standard cells have been placed, you can use the
ecoPlace command to place the remaining cells.
• The previously specified scan chains can be highlighted and cleared again with the following
commands:
sh > displayScanChain
sh > clearScanDisplay

Placement and routing is an iterative process. Rather than obsessing about the current results, the best course
of action is to continue with the process and then evaluate the end results. If you are not happy with the overall
result, you could try a different strategy for the next iteration. Of course what you change will depend on
what exactly is not to your liking in the current iteration. For example if the final design was unroutable due
to congestion, in the next iteration you will have to place cells with more space. Only experience and several
iterations will allow you to find a placement for your circuit that is close to optimal.

The results for placement (and later routing) are strongly design dependent. For example, structures with
many interconnections such as look-up tables will usually need much more space than synthesis predicted as
the cells need to be spread out in order to have enough space to route all the interconnections. This is why
generalizations for back-end design, such as ”During back-end design, your circuit area will increase by 10%”
don’t work very well.

Student Task 7:
• Let us save the entire design with File →Save Design. Select Innovus, use the browse button
to change to the ./save folder and select an appropriate filename, e.g., filter_chip_preCTS
(as our design is currently in the pre–clock–tree–synthesis (preCTS) stage).
Alternatively you could also just save the placement. To do that, select File →Save →Place, and
again choose a sensible filename.

3.1 Replacing Tie cells


During synthesis, Synopsys Design Compiler assigns constant logic values to two special standard cells
named TIE0x and TIE1x, where x is a drive strength modifier (see footnote 6). This creates a small inconvenience, as often
one of these cells is assigned to drive many pins at the same time, creating relatively long interconnections.
Long nets require optimizations due to max capacitance and fan-out requirements in the library (see footnote 7). If you think
about it, there is no need to connect all the constants to a single tie cell. These cells are very small anyway, and
could be replicated at will. There is sufficient space on the chip to place several of these cells.

6 In theory, you could directly connect these nets to VDD and VSS. However, this would connect the gate of the transistors to
the power rail, which in practice could cause several problems that could break down the gate oxide. This is the reason why
usually a simple cell is used to drive the gates through a transistor rather than a power rail.
7 All related to the dynamic behavior of the cells.

We will use a script that first removes all these cells. Then we will set the rules for placing these cells. The
example script scripts/tiehilo.tcl sets the maximum number of connections driven by a single cell to
12, and the maximum distance between the pin and the tie cell to 200 µm. And, finally, we insert the tie cells
according to the rules we have defined.

Student Task 8:
• At the command line type:
sh > source scripts/tiehilo.tcl

Although we are adding more cells to our layout, actually the routing will be simplified, as we no longer have to
deal with high fan-out nets for constants.
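To get a feeling for how many tie cells the fan-out rule implies, the following minimal Python sketch computes a lower bound from the limit of 12 connections per tie cell mentioned above. The function name min_tie_cells and the example pin count are hypothetical; the 200 µm distance rule is ignored here and can only increase the real number of cells.

```python
import math

MAX_FANOUT = 12  # connections per tie cell, as set in scripts/tiehilo.tcl per the text


def min_tie_cells(num_constant_pins, max_fanout=MAX_FANOUT):
    """Lower bound on the tie cells needed when each may drive max_fanout pins."""
    return math.ceil(num_constant_pins / max_fanout)


# Hypothetical example: 100 pins tied to a constant value.
print(min_tie_cells(100))
```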

3.2 preCTS optimizations


Now that we have placed our design, it is time to look at how we stand regarding the timing constraints we
have specified. Please refer to Exercise 5 if you have questions about how to define timing constraints and
how to run the timing analysis (see footnote 8).

Student Task 9: Run a timing analysis in the preCTS mode:


sh > timeDesign -preCTS -outDir reports/timing

If you examine the results, you will realize that you have a lot of DRV (Design Rule Violations). As explained
in Exercise 5 these are nets that exceed certain limits set in the library. Before correcting all the other timing
problems, it is a good idea to address these problems first.

Student Task 10: Run an optimization that will fix only the DRV errors in the preCTS mode:
sh > optDesign -preCTS -drv -outDir reports/timing

Running the DRV rule fixing first reduces easily fixable problems in the design. The subsequent optimization
phase can then concentrate on the real problems. If you examine the results you will see that the timing is
much better, but there are still violations.
Student Task 11: Now let us run a standard optimization step to try to fix the remaining timing violations.
sh > optDesign -preCTS -outDir reports/timing

When and how many times you optimize a design is one of the many parameters you can experiment with
during the back-end design flow. In most cases, optimizing after every step works very well. In others, you
could defer the optimization step until the very end, and still get acceptable results.

Student Task 12: These last steps have taken some time to run. In case something goes wrong during the
next steps, it is a good idea to save the current state under ./encounter/save/filter_chip_placed.enc.
Select File→Save Design..., change the Data Type to Innovus and, using the browser, save the
design under ./encounter/save/ (see note a).
Alternatively you could simply type the following command on the console:

8 You can use the command createBasicPathGroups -expanded to display slacks for each path group (reg2reg, in2out, in2reg,
reg2out).

sh > saveDesign save/filter_chip_placed.enc

a Make sure that you select the correct directory, Cadence EDI Encounter does not automatically place you in the
./encounter/save/ directory when you select Save Design.... You do not want your saves distributed in random
directories.

4 Clock Tree Insertion
The fan-out of a net refers to the number of inputs driven by a particular output. High fan-out nets (that drive
hundreds or even thousands of inputs) need to be handled differently from standard interconnections.
Every synchronous circuit has at least one high fan-out net, namely the clock net. For most circuits reset and
scan-enable signals have to be distributed to each and every flip-flop as well.
The main problem with high fan-out nets is the large load capacitance that needs to be driven. Each driven
input adds its own input capacitance to the total load capacitance and in addition, the interconnection required
to distribute the signal to all these inputs increases the load capacitance further.
There are three important parameters for such nets:
Transition time This is the time it takes to change the logic level of a node (e.g. 0 → 1). Basically, the
more load an output has to drive, the more time is required to charge this load. CMOS drivers consume
additional short circuit current during the transition, therefore long transition times are not very welcome.
Furthermore, noise on signals with long transition times can result in glitching. Most libraries set an upper
limit for the transition time (for the technology we are using this is 0.56 ns for typical libraries). To lower
the transition time, a tree of buffers can be inserted so that the total load is shared between the buffers.
The lower the desired transition time, the more buffers are required.
Insertion delay The time required for the signal to travel from the driver to the end-points. This delay is usually
different for each end-point. Each level of buffers in the buffer tree will add a delay to the signal.
Skew The difference between insertion delays of different end-points. To minimize skew, a balanced buffer
tree has to be built. Generally, the lower the desired skew the more buffers are required.
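The relation between insertion delay and skew can be made concrete with a few numbers. The following minimal Python sketch derives the minimum delay, maximum delay and skew from a list of per-end-point insertion delays. The function name clock_tree_summary and the delay values are hypothetical and are not taken from the exercise design.

```python
def clock_tree_summary(insertion_delays_ns):
    """Return min delay, max delay and skew (max - min) over all end-points."""
    min_delay = min(insertion_delays_ns)
    max_delay = max(insertion_delays_ns)
    return {
        "min_delay": min_delay,
        "max_delay": max_delay,
        "skew": max_delay - min_delay,
    }


# Hypothetical insertion delays (in ns) of four clock end-points.
summary = clock_tree_summary([0.82, 0.95, 0.90, 1.07])
print(f"min {summary['min_delay']:.2f} ns, "
      f"max {summary['max_delay']:.2f} ns, "
      f"skew {summary['skew']:.2f} ns")
```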
Which parameters are most important depends on the type of net:
Clock Our main concern is to control the skew, since it will affect the timing. The maximum acceptable skew
depends on the defined clock period. As an example, for a 20 MHz clock (50 ns period) a clock skew of 0.5 ns is
acceptable. But for a 200 MHz clock (5 ns period), the same skew equals 10% of the clock period and would be too
high.
If you over-constrain your skew, you will need a deep (and large) clock tree and your insertion delay will
rise, which will affect your input and output timing. Therefore you will want to balance the skew against
insertion delay and the number of buffers. Constraining maximum insertion delay too low will usually
degrade results.
Usually, a tree that gives you an acceptable skew will also give you a decent transition time, so you don’t
have to worry about that.
Reset We are interested in propagating the reset within one clock cycle to all flip-flops in our design. For
designs with on-chip reset synchronization this is strictly required. The insertion delay should therefore
be less than the clock period, transition times within the bounds imposed by the technology and skew
doesn’t matter at all.
Scan Enable Very similar to the reset signal. Usually a slower clock is used for scan testing, therefore we can
allow even a larger insertion delay. For transition time and skew the same holds true as for the reset.
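The skew example for the clock net (0.5 ns at 20 MHz versus 200 MHz) is simple arithmetic and can be checked as follows. This is a minimal Python sketch; the function name skew_fraction is hypothetical.

```python
def skew_fraction(skew_ns, clock_mhz):
    """Return the skew as a fraction of the clock period."""
    period_ns = 1000.0 / clock_mhz  # 1 / f, converted from MHz to ns
    return skew_ns / period_ns


# The two clock frequencies used in the text, both with 0.5 ns skew.
for freq_mhz in (20, 200):
    share = 100.0 * skew_fraction(0.5, freq_mhz)
    print(f"{freq_mhz} MHz: 0.5 ns skew is {share:.0f} % of the period")
```

At 20 MHz the skew is 1 % of the 50 ns period, while at 200 MHz it is 10 % of the 5 ns period.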

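[Figure: clock tree driven from Clk_CI, divided into a trunk and a leaf section. Annotated quantities: Buf Tran
(transition time at buffer inputs), Sink Tran (transition time at sinks), Min Delay, Max Delay, Max Skew.]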

As shown in the figure above, the clock tree is divided into trunk and leaf. In Cadence Innovus, a new tool
called Clock Concurrent Optimization (CCOpt) performs the synthesis of this clock tree. In contrast to previous
clock tree synthesis (CTS) tools, CCOpt optimizes both the clock tree and the datapath to meet global timing
constraints. Hence, small skews at the leaves of the clock tree can be used to balance the different propagation
delays between flip-flops. We will use CCOpt to insert the clock tree. Scan Enable and Reset are less timing
critical and can be routed together with all other signals. Hence, no special treatment is required.

Student Task 13:


• A sample ccopt specification file can be found under ./encounter/src/sample/chip.ccopt.spec.
• Copy this file to the src directory (the next task sources it as src/filter_chip.ccopt.spec) and make
sure that the clock name matches your design.
• For educational purposes, change the max. transition time at the leaves to 0.2 ns and specify a target
skew of 0.3 ns in the clock tree specification.

Student Task 14:
• We have adjusted the clock tree specification and are ready to insert the tree. First source the
specifications:
sh > source src/filter_chip.ccopt.spec

• Now we perform the clock tree insertion and data path optimization in one step with the following
command:
sh > ccopt_design -outDir reports/timing

• Generate the clock tree results and identify the total insertion delay, total number of inserted buffers,
and the maximum skew.
sh > mkdir -p reports/clock

sh > report_ccopt_clock_trees -file reports/clock/clock_trees.rpt


sh > report_ccopt_skew_groups -file reports/clock/skew_groups.rpt

Note: Just because you have specified some values in the constraints file, it does not mean that these
will be automatically fulfilled. In our case, the constraints were relatively easy to realize, and you should
not have any problems. You will also have to pay a price for the performance you demand. In a clock tree,
this is the number of clock tree cells (buffers and inverters) that have to be inserted.

The clock tree with all buffers, leaves, and skews can also be displayed in a graphical interface.

Student Task 15:


• Select Clock →CCOpt Clock Tree Debugger... to start the GUI. Accept the default name and
press OK.
• Zoom in and click on different clock nets to see where they are located on the chip (make sure to
hide all signals and special nets).

4.1 Additional Tricks for Clock Tree Synthesis
If you are working on a design with challenging clock distribution, the Clock→CCOpt Clock Tree Debugger
can be quite helpful. This tool allows you to graphically display the clock tree, the skews and insertion delays
as seen in the figure below.

This can be useful in analyzing the connections in more complex designs, especially when multiple clocks are
involved.
In case you have trouble balancing paths in your design you can make use of skew groups. Let us assume the
request path to a memory is 1.0 ns longer than the return path and slows down your complete circuit because
it is not possible to balance the paths with additional pipeline registers. In this case you can optimize the timing
by allowing the optimizer to use larger skews with the following command:
sh > setUsefulSkewMode -maxSkew true -maxAllowedDelay 0.5 -minAllowedDelay 0.0

This allows the clock tree to have skews up to 0.5 ns. In the best case the insertion delay of all flip-flops will
then be 0.5 ns shorter than the insertion delay of the memory macro. This 0.5 ns skew would perfectly balance
the request and return path of the memory and allow you to achieve a higher operating frequency. You can
also define different skew groups in the ccopt-specification file (see src/sample/chip.ccopt.spec). In
this example the memory would be in one group and all the rest in the other.
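The effect of the 0.5 ns skew in the memory example can be checked with an idealized model. The following minimal Python sketch assumes that the clock reaches the memory macro later than the flip-flops by a given lag, so the request path gains that time and the return path loses it. Setup times and clock uncertainty are ignored, and the function name min_clock_period as well as the absolute path delays are hypothetical.

```python
def min_clock_period(request_delay_ns, return_delay_ns, memory_clock_lag_ns):
    """Smallest clock period that satisfies both paths in an idealized model.

    memory_clock_lag_ns is how much later the clock arrives at the memory
    macro than at the flip-flops. The request path (flip-flop to memory)
    gains this time, the return path (memory to flip-flop) loses it.
    """
    request_limit = request_delay_ns - memory_clock_lag_ns
    return_limit = return_delay_ns + memory_clock_lag_ns
    return max(request_limit, return_limit)


RETURN_DELAY_NS = 3.0                     # hypothetical value
REQUEST_DELAY_NS = RETURN_DELAY_NS + 1.0  # 1.0 ns longer, as in the text

for lag in (0.0, 0.25, 0.5):
    period = min_clock_period(REQUEST_DELAY_NS, RETURN_DELAY_NS, lag)
    print(f"lag {lag:.2f} ns -> minimum period {period:.2f} ns")
```

With a lag of 0.5 ns both paths limit the period to the same value, which is the balanced case described above. A larger lag would make the return path the new bottleneck.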
Skew groups can also be used to minimize the skew between two nets, e.g. if you want to create a fast clock
out of two slower clock nets it is necessary to have two balanced insertion delays to create a fast and clean
clock. Hence, you would put both nets in one skew group and minimize the skew in this group.

sh > set_ccopt_property -skew_group balancedClocks target_skew 0.05

Note: Such strategies can indeed improve timing, but should be discussed with your assistant as they can
also lead to major problems if not properly used! In any case, the graphical clock tree debug tool should be used to
understand what the tool did with the different clock trees.

5 Signal Routing
Finally, the last step of the back-end design is to complete the routing of all the connections. Note that what
you have seen so far as routing is only the result of trial route, which is used to quickly approximate the routing
parasitics but is useless as actual interconnect because it is full of shorts and DRC errors (see footnote 9).

Student Task 16:


• Start the routing process by selecting Route →NanoRoute →Route.... A large window will open.
Enable the Insert Diodes option (you can leave the Diode Cell Name field blank) and leave all
other settings at their defaults (see note a). Click OK to start routing. You can observe the progress in the
console window.
• To check if all nets are routed you can use the following command:
sh > checkRoute

a If you have followed the exercise correctly, the Number of Local CPU(s) option should already be set to more than 1.
If this is not the case, set the number of CPUs to the number you have available on your computer for an almost linear
speed-up.

The Fix Antenna and Insert Diodes options cause the router to change layers and/or insert special protection
diodes in order to avoid damage that can occur during manufacturing when charge accumulates on
the wires and stresses the gate oxide of input pins. Note that these are usually referred to as process antennas,
which are entirely different from geometrical antennas (which are related to dangling wires).
Our example design should route without problems. This is not always the case and we might get geometry
violations. Geometry violations include shorts between nets and design rule violations (for example metal lines
are drawn too close to be manufactured as separate wires). Needless to say that we must solve all these
violations.
You should always closely examine the violations in order to find out what causes them. Sometimes there is an
unfortunate placement of macro-cells or power lines to blame and sometimes there is just not enough space
to route all connections. Solutions range from re-running routing to completely reworking the floorplan.

Student Task 17:


• Now that we have the real signal wiring we need to perform a post-route timing analysis to see if we
still meet all constraints. At this point not only a setup time analysis, but also a hold time analysis
needs to be run. Usually it is not necessary to deal with hold time until this point.
Note that you have to do two separate runs, one for setup and one for hold, as it is not possible
to do this in one single step. Use the GUI (make sure to select Post-Route) or type the commands
below to perform the two analyses.
sh > timeDesign -postRoute -outDir reports/timing
sh > timeDesign -postRoute -hold -outDir reports/timing

9 We will see more about DRC errors in Exercises 9 and 10.

• Inspect the two summaries and the report files written to the reports/timing directory. If you
have problems with the timing you should run an optimization step.
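The difference between the two runs can be illustrated with a minimal sketch. It is not part of the tool flow: the `Path` class, the function names and all delay values are hypothetical and only chosen to show that setup analysis uses the slowest data delay while hold analysis uses the fastest one, and that a late capture clock helps setup but hurts hold.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One register-to-register path; all times in ns (hypothetical values)."""
    launch_clk: float   # clock arrival at the launching flip-flop
    capture_clk: float  # clock arrival at the capturing flip-flop
    clk_to_q: float     # clock-to-output delay of the launching flip-flop
    data_max: float     # slowest data delay (logic plus routed wires)
    data_min: float     # fastest data delay (logic plus routed wires)


def setup_slack(p, period, t_setup):
    # Data launched at one edge must arrive t_setup before the next capture edge.
    required = period + p.capture_clk - t_setup
    arrival = p.launch_clk + p.clk_to_q + p.data_max
    return required - arrival


def hold_slack(p, t_hold):
    # Data must stay stable until t_hold after the same capture edge.
    arrival = p.launch_clk + p.clk_to_q + p.data_min
    required = p.capture_clk + t_hold
    return arrival - required


# Assumed example: the capture clock arrives 0.25 ns later than the launch clock.
p = Path(launch_clk=0.5, capture_clk=0.75, clk_to_q=0.25,
         data_max=3.0, data_min=0.0625)
print("setup slack:", setup_slack(p, period=5.0, t_setup=0.25))
print("hold slack: ", hold_slack(p, t_hold=0.125))
```

With these assumed numbers the path has positive setup slack but negative hold slack, so a path can pass one check and still fail the other.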

At this point the timing analysis is at its most accurate, because it is based on the real signal wiring.

Student Task 18:
• Now let us have a look at the post-route timing of our clock tree(s):
sh > report_ccopt_clock_trees -file reports/clock/clock_treesPostRoute.rpt
sh > report_ccopt_skew_groups -file reports/clock/skew_groupsPostRoute.rpt

For this sample design the clock report after routing is almost the same as the one before routing.
This is not always the case, especially when the density is high and there is little space between
standard cells for wires.

5.1 Multi-cut vias


Vias connect one metal layer to the next one. In modern technologies these are actually among the harder
structures to manufacture. Consequently some device failures are attributed to malformed vias. One solution
is to use redundant connections, known as multi-cut vias. You might have noticed that Cadence Innovus
reports the ratio of single-cut and multi-cut vias during the routing phases, similar to the following report.

#Up-Via Summary (total 923846):
#
#            single-cut           multi-cut         Total
#-----------------------------------------------------------
#  Metal 1   336388 (100.0%)        0 (  0.0%)     336388
#  Metal 2   333433 (100.0%)        6 (  0.0%)     333439
#  Metal 3   138280 (100.0%)        0 (  0.0%)     138280
#  Metal 4    72246 (100.0%)        3 (  0.0%)      72249
#  Metal 5    35181 (100.0%)        0 (  0.0%)      35181
#  Metal 6        0 (  0.0%)     6957 (100.0%)       6957
#  Metal 7     1352 (100.0%)        0 (  0.0%)       1352

Notice that all via connections from Metal6 to Metal7 are multi-cut, as this is enforced (see footnote 10). On all other layers
almost exclusively single-cut vias were used. The main problem with multi-cut vias is that vias are generally placed at the
intersection points of two metal lines that by default run horizontally on one level and vertically on the other. This
arrangement does not naturally allow two vias to be placed next to each other. Doing so requires routing
in the non-preferred direction on at least one of the layers, which reduces the routing density somewhat.
However, once routing is finished, a post-route repair stage can be used to add as many multi-cut vias as
possible.
sh > setNanoRouteMode -droutePostRouteSwapVia multiCut
sh > routeDesign -viaOpt

This will not make all vias multi-cut but will significantly increase the ratio.
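To compare the ratio before and after this step, the via summary can be evaluated with a small script. The following is only a sketch: it assumes the summary has been copied from the log into a string in the layout shown above, and the function names are our own, not part of Cadence Innovus.

```python
import re

# Via summary as printed in the routing log (values taken from the report above).
REPORT = """\
#Up-Via Summary (total 923846):
#
#            single-cut           multi-cut         Total
#-----------------------------------------------------------
#  Metal 1   336388 (100.0%)        0 (  0.0%)     336388
#  Metal 2   333433 (100.0%)        6 (  0.0%)     333439
#  Metal 3   138280 (100.0%)        0 (  0.0%)     138280
#  Metal 4    72246 (100.0%)        3 (  0.0%)      72249
#  Metal 5    35181 (100.0%)        0 (  0.0%)      35181
#  Metal 6        0 (  0.0%)     6957 (100.0%)       6957
#  Metal 7     1352 (100.0%)        0 (  0.0%)       1352
"""

# One table row: layer number, single-cut count, multi-cut count, row total.
ROW = re.compile(
    r"#\s*Metal\s+(\d+)\s+(\d+)\s*\(\s*[\d.]+%\)\s+(\d+)\s*\(\s*[\d.]+%\)\s+(\d+)"
)


def parse_via_summary(text):
    """Return {layer: (single_cut, multi_cut)} for every 'Metal N' row."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            layer, single, multi, total = map(int, m.groups())
            if single + multi != total:
                raise ValueError(f"inconsistent row for Metal {layer}")
            rows[layer] = (single, multi)
    return rows


def multi_cut_ratio(rows):
    """Fraction of all up-vias that are multi-cut."""
    single = sum(s for s, _ in rows.values())
    multi = sum(m for _, m in rows.values())
    return multi / (single + multi)


rows = parse_via_summary(REPORT)
print(f"overall multi-cut ratio: {multi_cut_ratio(rows):.2%}")
```

Running the same script on the summary printed after the via optimization shows how much the ratio has improved.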

6 Next steps
Congratulations, we have finished the placement and routing stage. In the next exercise we will see how we
can export the data from Cadence Innovus and will talk about some checks that have to be performed before
we conclude our design.
The last part of the design will be the DRC and LVS checks, which will be done using Mentor Graphics Calibre,
a different tool. Once our design passes those checks, it can be manufactured.
10 This is the result of the design rules as supplied by the manufacturer and available in the design rule manual
./docs/65nm_layout_rules.pdf. According to these rules, all connections from metal 6 to metal 7 need to be multi-cut. This is
mainly because there is a wide disparity in the minimum size between metal 6 and metal 7, and the via6 is comparatively small.
This rule is specific to the metallization option we have chosen for umcL65.

We strongly encourage you to go through the placement and routing flow once as early as possible for your
design. This will allow you to
• Prepare the basic setup that you can reuse for the final chip
• Get some experience and a feel for the design. Some designs offer more challenges than others. An early run will
allow you to judge how much time you need to factor in for the final back-end flow.
• See if the constraints for area and timing are realistic for your design.
• Obtain a usable power estimation for your circuit. As explained in Exercise 6, power estimations are
wildly inaccurate if parasitics are not properly accounted for.
