Institut für Integrierte Systeme
Integrated Systems Laboratory
Training 1
SVN Rev.: 1016
Last Changed: 2013-10-15
Reminder:
With the execution of this training you declare that you understand and accept the regulations about
using CAE/CAD software installations at the ETH Zurich. These regulations can be read anytime at
http://dz.ee.ethz.ch/regulations/index.en.html.
1 Overview
Unlike other exercises in the VLSI lectures, the back-end design flow requires you to learn how to
use a commercial Electronic Design Automation (EDA) tool, in our case Cadence SoC Encounter
from Cadence Design Systems. These exercises are therefore called Trainings and will teach you
the basics of Cadence SoC Encounter so that you can use it for your semester projects.
There will be three trainings:
Training 1
Floorplanning, placement, clock tree synthesis, optimization, routing and timing analysis with
Cadence SoC Encounter.
Training 2
Determining power consumption, IR drop analysis.
Training 3
Tape-out preparation, performing Design Rule Check (DRC) and Layout Versus Schematic
(LVS) on your final database.
Students who plan to work on an ASIC semester project should make sure to visit all three trainings.
There are many reasons for using the command line. Some functionality cannot be accessed through GUI commands,
and in some cases using the command line is much faster. Most importantly, commands you enter on the command line
can be collected into a script and executed repeatedly.
2 Introduction
In this training we will start with a structural Verilog design netlist (from synthesis) and create step by
step a physical layout that can be manufactured. To keep runtimes reasonably low, we will use an
example design with a (slightly) lower complexity than most student design projects.
[Figure: block diagram of the example design filter_chip. The chip contains a chain of filter stages (filter_stage1 to filter_stage8) with look-up tables (LUT) and the macro-cell SY180_2048X16X1CM8. External pins: ClkxCI, ResetxRBI, ScanEnxTI, RamTestxTI, DataInxDI, DataInReqxSI, DataInAckxSO, DataOutxDO, DataOutReqxSO, DataOutAckxSI.]
Each filter stage contains a large multiplier, a look-up table and an accumulator. Note that the input of
the first stage is tied to constants and therefore greatly simplified. The filter is basically useless and has
only been engineered as an example circuit suitable for the exercise. The following is a short description
of all pins of the circuit:
Pin Descriptions
Name            Bits  Dir  Description
ClkxCI                In   Clock input
ResetxRBI             In
ScanEnxTI             In
RamTestxTI            In
DataInxDI       16    In
DataInReqxSI          In
DataInAckxSO          Out
DataOutxDO      16    Out
DataOutReqxSO         Out
DataOutAckxSI         In
3 Getting Started
You will need a terminal program to type in commands throughout this exercise. On the computers
in room ETZ D61.2 you can get a terminal by accessing the menu in the top left corner and selecting
Applications -> Accessories -> Terminal.
Student Task 1:
Change to your home directory and install the training files with the script provided:
sh > cd
sh > /home/vlsi2/t1/install_t1
Change to the design directory
sh > cd training_1
The copied files and folders are arranged in a certain structure which is described in the next section.
design
  .cockpitrc
  calibre
  docs            Links to documents
  encounter
    out
    save
    scripts
    src
      sample
    tech
      lef
      lib
  modelsim        Simulation tool
  simvectors
  sourcecode      VHDL sourcecode
  synopsys        Synthesis environment
  tetramax
In this structure, there are five subdirectories for Cadence SoC Encounter. It is strongly recommended to use them in the following way:
out      Place all final data to be exported from Cadence SoC Encounter in this directory. This
         includes the final netlist (the initial netlist gets modified by clock tree insertion, optimization etc.),
         layout and delay files that will be used for postlayout simulation and/or physical verification and
         chip finishing. A sample script that generates all these files is provided (scripts/exportall.tcl).
save     Put all Cadence SoC Encounter save files, i.e. files in native Cadence SoC Encounter
         format, in this directory.
scripts  Contains TCL scripts. By default several example scripts for common tasks are provided. It
         is highly recommended to develop a run script that contains all the commands used for your
         design.
src      All user input files should be placed here. These include the initial Verilog netlist, the I/O
         placement file, the timing constraints file and the clock tree definition file (all will be explained
         later in section 3.2).
tech     Holds links to technology specific files. Cockpit manages this directory automatically.
With all this information we are now ready to add the missing corner and supply pads to our Verilog
netlist.
A typical Verilog netlist that you will obtain from Synopsys Design Compiler will contain many
levels of hierarchy. Each level of hierarchy is enclosed between the
module name ( pin names separated by comma )
...
endmodule
statements, where name refers to the name of the module (module is the Verilog equivalent of an
entity in VHDL). In our case we need to add the pads to the top-level module which contains the rest
of the I/O pads. The top-level design is almost always the last module definition in a Verilog file [3].
Student Task 2:
Copy the Verilog netlist to encounter/src/ in order to have a clean copy of the initial netlist
even if synthesis is rerun.
sh > cd encounter/src/
sh > cp -p ../../synopsys/netlists/filter_chip.v \
filter_chip.v.initial
The file specialpads.v contains four corner pads and eight supply pads corresponding to
power scheme 1. As our design uses power scheme 1, no changes are required to this
file. For power scheme 2, we would have to comment out the eight additional supply pads
(comments in Verilog start with //).
[3] The content of a module needs to be defined before it can be instantiated by a different module. Consequently, the
top-level module is the last to be defined. However, not all Verilog files need to be hierarchical, and a design can also be
spread over multiple files.
What remains to do is to add the contents of specialpads.v at the right point, i.e. where the
other pads are, to the initial netlist.
Using a text editor [a], open filter_chip.v.initial and find the definition of the top-level module
chip by searching for:
module chip
Below this declaration you should see lines that instantiate the pads. Insert the contents of
specialpads.v at this point. As long as you are in the module body, it does not matter where
exactly you insert them.
Save the file as filter_chip.v and exit the text editor.
[a] There are many text editors you can use: terminal based editors (vi, vim, nvi, joe, jed, pico, nano etc.),
editors that are mainly terminal based but have a simple GUI (emacs, xemacs, gvim etc.), and GUI based editors
(mousepad, gedit, nedit, kate etc.). Of these, emacs, vi (and derivatives) and nedit are the most advanced
editors.
Remark: In the future you can use a small Perl script to add the special pads to the initial netlist, i.e.
sh > ./insert_specialpads ../../synopsys/netlists/filter_chip.v \
./specialpads.v > filter_chip.v
inserts the contents of specialpads.v into the last module defined in ../../synopsys/netlists/filter_chip.v
and writes the modified netlist to filter_chip.v.
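The effect of such an insertion can be sketched in a few lines of Python. This is only an illustration and not the insert_specialpads Perl script itself: the function name insert_special_pads is hypothetical, and the sketch assumes that the last endmodule keyword in the netlist closes the top-level module and that no comment after it contains that word.

```python
def insert_special_pads(netlist, pads):
    """Return `netlist` with `pads` inserted at the end of the last module body.

    Assumption: the last occurrence of the keyword `endmodule` closes the
    top-level module, as is usual for a synthesized hierarchical netlist.
    """
    idx = netlist.rfind("endmodule")
    if idx < 0:
        raise ValueError("no endmodule found, is this a Verilog netlist?")
    # Keep everything before the last endmodule, add the pad instances,
    # then close the module again.
    return netlist[:idx] + pads.rstrip("\n") + "\n" + netlist[idx:]


if __name__ == "__main__":
    # Hypothetical miniature netlist, only for demonstration.
    demo = "module chip ( ClkxCI );\n  XMD ClkxCI_PAD ( .I(ClkxCI) );\nendmodule\n"
    print(insert_special_pads(demo, "  // contents of specialpads.v go here\n"))
```

Inserting just before endmodule is one valid choice, since any position inside the module body is acceptable.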
The cockpit will copy sample I/O files automatically to the src/sample directory [6]. All lines starting
with # are comments. The file consists of two main sections: globals and iopad.
(globals
  [global definitions]
)
(iopad
  (topleft
    [pads that are on the top left]
  )
  (left
    [pads that are on the left side]
  )
  [definitions for other sides]
)
[4] A good pinout could simplify the routing on the PCB, allow you to use fewer layers and result in fewer parasitics.
[5] Eight pins will be left unconnected.
[6] For this technology there will be four files. There will be two template files, chip.iotemplate and chipep.iotemplate,
for the normal and extended power configuration respectively. These files have all the required power connections in
place, and the data sections are commented out. There are also two example files that have fictional I/O placement
where all pins are defined.
For us the relevant part is the iopad section. This part contains eight subsections that define the
names of the pad instances and their locations on the four sides and in the four corners. We do not have
to touch the corner specifications as they will be the same for all designs. We have to distribute the
pads among the four sides of the chip: top, right, bottom, left. If you look at the sample file you
will see that for each pad there is a single line entry in the following form:
(inst name="NAME_OF_PAD"
The last part following # is a comment; it is there just for your information. Regardless of the power
scheme you are using, we will use the same 56 pin package as illustrated in the webpage above.
The PIN_NUMBER is just a reminder to show which particular location is being defined. The location
is specified using the OFFSET_VALUE. Cadence SoC Encounter uses a coordinate system that
places the coordinate (0,0) at the bottom-left corner as shown in the figure below:
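[Figure: pad locations around the chip. The corners are named topleft, topright, bottomleft and bottomright, the sides top, left, right and bottom. The offset along a side is measured from the origin (0,0) at the bottomleft corner.]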
On the left and right side the pads will be ordered from bottom-to-top, and on the top and
bottom side the pads will be ordered from left-to-right. This ordering can be quite confusing, as
it is neither clockwise, nor counterclockwise. Therefore the aforementioned comments showing the
actual pin numbers will be very useful.
The OFFSET_VALUEs given in the template represent fixed locations for the given pad. It is very
important that you do not change these values, as the chip-finishing part will rely on the pads being
located exactly at these locations.
You can assign your pads by writing the name of each pad into the corresponding NAME_OF_PAD.
The name of the pad will be the name of the instance in the Verilog file. For example, assume that you
are using the standard power scheme and your clock signal is assigned to a pad named ClkxCI_PAD. In
your Verilog file you would have the following entry for this pad:
XMD ClkxCI_PAD ( .I(ClkxCI) [other pin definitions] )
If you now want to place this pad on pin number 54 of your package, you will find the subsection top
in the I/O file and edit the line for pin 54:
...
(iopad
...
(top
...
(inst name="ClkxCI_PAD"
...
)
...
)
Be careful not to modify the offset value while you are editing the I/O file. Since we use a fixed
bonding scheme for the power and ground pins, all we need to do is extract the instance names of
all our signal pads and insert each of them into the inst name="" statement whose OFFSET_VALUE
corresponds to the desired location. It is also recommended
to put the clock pin (if possible) on pin number 48. All new test boards will make sure that pin 48
has the best signal quality.
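Extracting the instance names of the signal pads from the netlist can be sketched as follows. This is a minimal illustration under two assumptions that are not guaranteed by the flow: every instance starts on its own line in the form CELL INSTANCE (, and signal pad instances can be recognized by the substring _PAD in their name, as in the example design. The function name pad_instance_names is hypothetical.

```python
import re

# "CELL INSTANCE (" at the start of a line; both names are plain identifiers.
INSTANCE_RE = re.compile(r"^\s*(\w+)\s+(\w+)\s*\(", flags=re.M)


def pad_instance_names(verilog_text):
    """Return the instance names that look like signal pads, in file order."""
    names = []
    for cell, instance in INSTANCE_RE.findall(verilog_text):
        if cell == "module":
            continue  # a module header, not an instantiation
        if "_PAD" in instance:  # naming convention assumed from the example design
            names.append(instance)
    return names


if __name__ == "__main__":
    line = "XMD ClkxCI_PAD ( .I(ClkxCI) );\n"
    print(pad_instance_names(line))
```

The resulting list is only a starting point; each name still has to be placed by hand at the desired location in the I/O file.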
Preparing the I/O file from scratch can be a lengthy and tedious task. To avoid unnecessary work
during this exercise we will start with an almost complete I/O file, but before doing so we will describe
the full procedure recommended when starting from scratch:
1. Start Cadence SoC Encounter and proceed to design import by selecting Design ->
Import Design. In this form make sure that the IO Assignment File field is empty.
2. If everything works well, the design will be loaded. Now we can write out a template file that will
contain all the names of the pads. Use Design -> Save -> I/O File ... to save an I/O file
src/chipsequence.io. You can select the Sequence checkbox, however it is not imperative.
What we need is only the names of the pads.
3. Copy the template I/O file src/sample/chip.iotemplate to src/chip.io. As noted earlier, this file
includes all offset= statements, and all statements for corner and supply pads.
4. Using a text editor open the files src/chip.io and src/chipsequence.io. You need to move the
PAD_NAMEs from the file src/chipsequence.io to the correct positions in the file src/chip.io.
5. All entries for data pins in the template file are by default commented out using the # character.
Do not forget to remove the comment character for the pads you are using.
Student Task 3:
Now, for this exercise you can start with the almost complete I/O file src/chip.ioincomplete
instead of the template file. This file has all the pads placed properly with the exception of
the 16 pads of the input bus DataInxDI which are still missing.
Furthermore the file src/filter_chip.sequence.io mentioned above has already been generated for you.
The desired I/O assignment is depicted in the figure below and can also be found in the file
src/filter_chip.io.ps [a].
Create the complete I/O file and save it as src/filter_chip.io.
[a] Postscript viewers were very common in earlier days; you can use gv, kghostview, or evince to view this file.
You can use the utility src/io2ps.pl to generate a postscript file from your I/O file. This utility will also
verify that you have used the correct offset locations in your I/O file, and will report errors. For best
results, you should also provide the Verilog netlist file, which will enable the script to make even more
checks.
sh > ./io2ps.pl filter_chip.io > filter_chip.pin_diagram.ps
[Figure: pin diagram of the 56 pin package showing the desired I/O assignment. Signal pads: ClkxCI_PAD, ResetxRBI_PAD, ScanEnxTI_PAD, RamTestxTI_PAD, DataInReqxSI_PAD, DataInAckxSO_PAD, DataOutReqxSO_PAD, DataOutAckxSI_PAD, DataInxDI_PAD_0 to DataInxDI_PAD_15 and the DataOutxDO_PAD_n pads. Supply pads: pad_vcc_c1 to pad_vcc_c4, pad_gnd_c1 to pad_gnd_c4, pad_vcc_p1 to pad_vcc_p4 and pad_gnd_p1 to pad_gnd_p4.]
The src/io2ps.pl utility uses a configuration file with the extension .pads. By default the file src/io2ps.pads
will be used. If you are planning to use the extended power scheme, you will have to add the
configuration file src/io2psep.pads to the command as well.
3.2.5 Macro-cells
The macro-cells for the umcL180 process are created using dedicated memory compilers. The specific memory compiler we have access to is able to create five different types of macro-cells with
various capacities:
SU180 : single-port static RAM
SJ180 : dual-port [9] static RAM
SY180 : single-port register-file [10]
SZ180 : two-port [11] register-file
SP180 : via programmable ROM
The following parameters are used for the macro-cells:
words
    Number of words in the memory.
sub-word size
    Number of bits within a sub-word of the memory. The sub-word is the smallest unit used for
    data access in the macro-cell [12].
number of sub-words per data word
    This parameter allows creating multiple sub-words. Each sub-word can be written to separately.
    For example, a 32-bit RAM can be configured as having a single 32-bit sub-word, two 16-bit
    sub-words, four 8-bit sub-words and so on.
column or block multiplexer
    This parameter affects the geometry of the macro-block. This can have significant influence
    on the performance of the macro-block. There is no general rule to determine this parameter.
    Once the memory requirements are known, all possible geometries will be considered and the
    most suitable one will be determined.
There are several macro-cells available; their datasheets can be found under:
/usr/pack/designkits-1.0-ma/umc_L180/faraday/gen/memaker/200901.1.1/datasheet.dz
If none of the available macro-cells suits your needs, more can easily be generated on demand. Please
contact the Microelectronics Design Center for this purpose.
Our example design uses a single-port RAM named SY180_2048X16X1CM8. This RAM has 2048
words of 16 bits each (single sub-word) and a block multiplexer of 8. All necessary preparations to
work with this macro-cell have already been done, so you do not need to do anything additional for
this exercise.
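The parameters can be read back from the macro-cell name. The following sketch decodes a name such as SY180_2048X16X1CM8. The naming pattern TYPE_WORDSxBITSxSUBWORDS followed by CM and the multiplexer value is inferred from this single example and is an assumption, not a documented rule of the memory compiler. The function name decode_macro_name is hypothetical.

```python
import re

MACRO_TYPES = {
    "SU180": "single-port static RAM",
    "SJ180": "dual-port static RAM",
    "SY180": "single-port register-file",
    "SZ180": "two-port register-file",
    "SP180": "via programmable ROM",
}

# Assumed pattern, derived only from the name SY180_2048X16X1CM8.
NAME_RE = re.compile(r"(S[UJYZP]180)_(\d+)X(\d+)X(\d+)CM(\d+)")


def decode_macro_name(name):
    """Split a macro-cell name into its parameters and compute the capacity."""
    match = NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"name does not follow the assumed pattern: {name}")
    words, bits, sub_words, mux = (int(g) for g in match.groups()[1:])
    return {
        "type": MACRO_TYPES[match.group(1)],
        "words": words,
        "sub_word_bits": bits,
        "sub_words": sub_words,
        "mux": mux,
        "total_bits": words * bits * sub_words,
    }


if __name__ == "__main__":
    print(decode_macro_name("SY180_2048X16X1CM8"))
```

For the example RAM this gives 2048 words of one 16-bit sub-word each, i.e. a capacity of 32768 bits.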
[9] Dual-port memories have two completely independent access ports. Two separate memory addresses
can be accessed at the same time, for both read and write.
[10] Although the name suggests that the memory is made out of individual registers, it is very similar in design to SRAM.
[11] In two-port memories, the read and write ports are separate, so you can simultaneously read and write. There are
timing constraints for reads and writes to the same address; please refer to the memory compiler manual for details.
[12] In many places this sub-word is referred to as byte. This might be slightly confusing, since a byte is commonly
accepted to be an information unit consisting of 8 bits.
sh > cd /training_1
sh >
sh > icdesign umcL180 &
or from the encounter directory by issuing the command
cd /training_1/encounter
cds_soc81 encounter
[a] This exercise uses version 8.1 of Cadence SoC Encounter. There are newer versions of this software;
however, the main principles have not changed much, so we will continue to use this version for this exercise.
Newer versions have slightly changed GUI elements and improved capabilities for some functions.
We are now in the floorplan view of Cadence SoC Encounter which displays an empty floorplan
with only the pads placed. All top level module(s) of the netlist are shown as a pink/purple square to
the left and all macro-cells to the right. Note that all standard cells are inside the module(s).
5 Floorplanning
Now we will have to decide how cells and macro-cells will be placed on our chip. This process is
called floorplanning. For a standard design, our main concern would be to find a floorplan that will
result in the smallest possible area, while fulfilling all performance and reliability requirements. This
is purely driven by economic reasons, since chip costs are mainly determined by the area. In some
cases there are additional geometrical constraints. The manufacturing company may impose certain
limits on the aspect ratio of the final layout [13], or even dictate the maximum height or width of the
layout.
Back-end design is not only used for complete chips. Macro-cells that will be part of a larger
system-on-chip design can also be designed in this way. In such cases there might be even more restrictions.
For example, certain metal layers might be reserved for the system level.
So the question is: "How small can my layout be so that I am still able to fulfill all specifications?" As
a lower bound, you will need enough area to place all your I/O pads and standard cells. Ideally, in
terms of area (and assuming your design is not pad limited, see exercise 2), you will want to place
standard cells without leaving extra space in between, completely filling out the core area. This is
hardly ever possible because:
• The number of interconnections that can pass through a certain area is limited by the number
  of metal layers available [14], wire width and minimum spacing requirements. Depending on the
  interconnection overhead, the area above the cells [15] may not be sufficient for routing.
• Timing is greatly affected by the placement of your cells. Placing them next to each other with
  no space in between does not leave the tool any flexibility in placing cells. This in turn reduces the
  optimization options of the tool, like the ability to cluster cells that are closely interconnected.
• All designs require power routing for operation. Some wires of the power connection limit where
  the cells can be placed, or restrict signal routing, which in turn increases the area requirement.
• The majority of designs require a clock tree to function. This clock tree is added during the
  back-end design. This requires additional area for the buffers used in the clock tree. Furthermore,
  the clock tree synthesis algorithm can produce better results if it has more freedom to place its
  buffers.
• Macro-cells, like the RAM in our example, usually require some extra space along the edges so
  that they can properly be connected to power and signal lines.
• Designs that have a high switching activity require a lot of current for a short time, which is
  called a surge. The power distribution network may need additional decoupling capacitors to
  store some charge that can provide some of the current of the standard cells during such a
  surge. Additional space for these decoupling cells may be required during placement.
As a consequence, the standard cell rows (which form the core area) cannot be filled completely
with standard cells; in other words, some free space needs to remain between cells.
Utilization indicates to what extent the standard cell rows are filled. 100% utilization is the upper
bound where all cells are abutted and there is no extra space, while a utilization of 50% means that
half of the core area is empty.
Usually, it is not possible to predict whether or not all requirements can be fulfilled with a certain
utilization [16]. You will have to try and find out. This is the main reason why back-end design is an
iterative process [17].
[13] Especially in MPW runs, a lot of silicon area is wasted if all designs have wildly different dimensions.
[14] For our technology there are 6 metal layers.
[15] Cells in our technology mostly use the lowest metal layer Metal-1 and very rarely Metal-2 for internal connections;
all other layers are free for routing.
[16] Both placement and routing are separately NP-complete problems; without completing the placement and routing
you will not know if it is possible to fulfill the requirements.
[17] Obviously, technology plays an important role, and it is possible to give certain guidelines for a technology. However,
back-end design is always highly dependent on the design itself. You will usually see in a few iterations what is possible
and what is not.
As a consequence, we only have to make sure that our design fits on this area, and there is no need
to find the smallest possible layout. We may however need to constrain the core area to make it
smaller if the utilization is too low, since a spread out design has longer interconnections that may
adversely affect timing.
[18] The width of the metal line depends on the amount of current drawn from the line; you will be able to judge this
better after exercise 3, which is dedicated to estimating the power consumption. We will mostly use a width of 20 µm,
since this is the widest metal that can be manufactured without slotting (wider metal lines require slots/holes which
break up the metal shape).
[19] The problem is that if much current is drawn, there will be a significant IR drop along the power lines. The cells
in the middle will be supplied with a lower VCC than the ones on the sides. This could dramatically affect the
performance of the system.
When placing a macro-cell, you should also take into account where the power and signal pins of the
block are located and what metal layer they are on. Often signal connections are only on two edges
and you want them to face the core and not the I/O pads.
Now, when we consider all the above, the core area that remains free to place core cells on is much
smaller than the 1.54 mm² that we started with. Our example design has a total cell area (including
RAM) of 0.82 mm² and should therefore comfortably fit into the designated area.
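The relation between cell area, core area and utilization can be checked with a few lines of Python. This is a rough sketch: it simply divides the two areas quoted above, whereas the usable core area is smaller than 1.54 mm² once rings, stripes and halos are subtracted, so the real utilization will be higher. The function names are hypothetical, and the 80% target below is only an illustrative value.

```python
def utilization(cell_area_mm2, core_area_mm2):
    """Fraction of the core area that is occupied by cells."""
    if core_area_mm2 <= 0:
        raise ValueError("core area must be positive")
    return cell_area_mm2 / core_area_mm2


def required_core_area(cell_area_mm2, target_utilization):
    """Core area needed to place the cells at the given target utilization."""
    if not 0 < target_utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return cell_area_mm2 / target_utilization


if __name__ == "__main__":
    # Values from the text: 0.82 mm^2 of cells in a 1.54 mm^2 area.
    print(f"lower bound on utilization: {utilization(0.82, 1.54):.0%}")
    # Illustrative target of 80 % utilization (not a value from the text).
    print(f"core area for 80 %: {required_core_area(0.82, 0.8):.3f} mm^2")
```

With the numbers from the text the ratio is roughly 53%, which is why the design is expected to fit comfortably.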
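[Figure: floorplan sketch with the dimensions 1519.62 µm and 1239.38 µm, showing the VDD and GND power rings, a power stripe, the standard cell power connections, the macro cell (RAM) with its block halo, the standard cells, and the I/O to core spacing.]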
Larger values will reduce the area available to place the core cells, thereby increasing core
utilization.
As noted earlier, some iterations are usually required to find optimal values for a particular
design.
In this exercise we will assume that we will use one VCC and one GND line of maximum
width 20 µm. We need some extra space between the lines and, for the moment, we can
start with a distance of 50 µm for all sides and click on OK.
The floorplan should now look like shown in the screen-shot below. Note that the pads are all placed
at their proper locations as the I/O file used during design import specifies absolute locations and we
made sure that the die size stays fixed to the proper size during the initialize floorplan step.
Student Task 8:
Next we need to place the RAM macro-cell. Change the cursor mode to Move/Resize/Reshape
by selecting the appropriate icon (next to the ruler icon) or use the keyboard shortcut
SHIFT-R. Now you can select the RAM macro-cell and drag it to any location you like. The
blue lines displayed are so called flightlines that show where the signal connections to the
block are.
You can change the orientation of the RAM by either using Floorplan -> Edit Floorplan ->
Flip/Rotate Instances ... (or press r), or with the attribute editor (press q). Note that the
RAM macro will completely block Metal-1, Metal-2, Metal-3 and Metal-4. Only Metal-5 and
Metal-6 will be available for routing over the RAM macro-cell [20].
[20] By default, the internal structures within a cell or block are not displayed. You need to make Cell Blkg visible to
see the so called blockages within a cell.
[21] There is also a special rule required if there are logic one/zero values 1'b1/1'b0 instead of TIE1/TIE0 cells in your
netlist. You should however not have such logic values in your netlist.
Next we will add the core power rings that distribute power all around the core.
Student Task 10:
Select the menu Power -> Power Planning -> Add Rings.... A large window will appear. The Net(s) field on the top defines for which nets rings will be created. The default
is to create power VCC as well as ground GND rings.
In the Ring Configuration section you can specify on what layers the ring segments will
be created. Select metal5 H for Top and Bottom and metal6 V for Left and Right.
Specify Width as 20 µm, Spacing as 1.5 µm and Offset as 4 µm and click OK.
There are many alternative power distribution schemes that can be used. The one that we have
chosen here is a very simple one. We have selected the upper metal layers Metal-5 and Metal-6
for the ring, because in this technology Metal-6 is thicker and consequently has less parasitic
resistance, which is desirable for power distribution.
For your own designs, you should perform a power analysis (topic of Training 2) to find out the best
power distribution approach that matches your design.
The width has been chosen as 20 µm for convenience. Basically, the wider the power connection,
the better. But as already mentioned earlier, in this technology metal lines wider than 20 µm
need to be slotted (stress relief slots), which requires extra effort. As an alternative to slotting it is
also possible to create several smaller parallel rings, e.g. two VCC and two GND rings.
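Why several narrower parallel rings are electrically equivalent to one wide ring can be seen from the usual sheet resistance estimate R = R_sheet · L / W. The sketch below is only an illustration: the sheet resistance of 0.05 Ω per square, the 1000 µm length and the 100 mA current are hypothetical values, not data for this technology, and the function names are hypothetical as well.

```python
def wire_resistance(length_um, width_um, sheet_resistance_ohm_sq):
    """Resistance of a rectangular wire: R = R_sheet * L / W."""
    return sheet_resistance_ohm_sq * length_um / width_um


def parallel(*resistances):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


R_SHEET = 0.05  # ohm per square, hypothetical value for illustration only

# One 20 um wide line versus two parallel 10 um wide lines of the same length.
single = wire_resistance(1000.0, 20.0, R_SHEET)
split = parallel(
    wire_resistance(1000.0, 10.0, R_SHEET),
    wire_resistance(1000.0, 10.0, R_SHEET),
)

# IR drop for a hypothetical current of 100 mA through the line.
ir_drop_v = 0.1 * single

if __name__ == "__main__":
    print(f"single: {single:.2f} ohm, split: {split:.2f} ohm, drop: {ir_drop_v:.2f} V")
```

Both variants have the same resistance, so the choice between them is a matter of manufacturing rules and routing convenience rather than of IR drop.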
Spacing determines the distance between the two nets and Offset determines the distance between the core area and the innermost ring.
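With these definitions, the space that the rings occupy between the core and the pads can be added up directly. The sketch assumes that Offset is measured from the core edge to the inner edge of the innermost ring and that Spacing is measured edge to edge; both are assumptions about how the values are interpreted. The function name ring_extent is hypothetical.

```python
def ring_extent(offset_um, width_um, spacing_um, nets):
    """Distance from the core edge to the outer edge of the outermost ring."""
    if nets < 1:
        raise ValueError("at least one ring is required")
    return offset_um + nets * width_um + (nets - 1) * spacing_um


# Values from the exercise: offset 4 um, width 20 um, spacing 1.5 um, two nets.
extent = ring_extent(4.0, 20.0, 1.5, 2)

if __name__ == "__main__":
    print(f"rings need {extent} um of the 50 um between core and I/O")
```

For the values used in the exercise the two rings need 45.5 µm, which fits into the 50 µm reserved between the core and the I/O pads.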
We also need a (partial) ring around the macro-cell, you will see later why this is necessary.
Student Task 11:
Select the menu Power -> Power Planning -> Add Rings... just like before. This time
in the Ring Type box, select Block ring(s) around. You can leave the selection at
Each block since we have only one block anyway.
Cadence SoC Encounter is usually smart enough to create wires only on the edges
where no power lines are yet, i.e. to not create new wires on top of the core ring.
If this fails you can specify the segments and connections you want on the Advanced tab.
Fill in the values/settings similar to those used for Add Rings and click on OK.
At any point, if you wish to delete part of the floorplan you can:
• use the Undo feature by simply pressing u
• select and remove objects of a specific class (press d)
• use the menu option Floorplan -> Edit Floorplan -> Clear Floorplan...
• select an object and hit the Del key on the keyboard
Student Task 12:
Also, you can save or load (restore) your floorplan at any time using the menu Design ->
Save -> Floorplan ... and Design -> Load -> Floorplan ... respectively.
Save your floorplan to the save directory.
At this point, power reaches the standard cells only from the sides. Especially for fast designs, the
standard cells in the middle of a standard cell row will not receive sufficient power, so it is important to
add vertical stripes to improve the power distribution.
Student Task 13:
Select Power -> Power Planning -> Add Stripes ....
The Set Configuration part of the window defines the properties of one stripe set.
The Set Pattern part defines how many stripes will be added. We can either choose to
insert a fixed number of sets or only specify the distance between two sets (Set-to-set
distance).
In the First/Last Stripe part, we select Relative from core or selected area. Enter values
for X from left and X from right that place the stripe sets in such a way that the standard cell
rows get divided into three equally long pieces. See the screen shot for width, spacing and
layer. Note: You can fine tune this later by moving the stripe sets.
By default stripes will continue over macro cells. To prevent this, select the Omit stripes
inside block rings option in the Stripe Breaking section of the Advanced tab.
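The positions that divide the rows into equal pieces follow directly from the core width. The sketch below computes the centre positions of the stripe sets. Whether the tool measures X from left and X from right to the centre or to an edge of a stripe set is not stated here, so treat the results as starting values to be fine tuned. The function name stripe_set_centers and the core width of 900 µm are hypothetical.

```python
def stripe_set_centers(core_width_um, pieces=3):
    """Centre positions (from the left core edge) that split the rows into
    `pieces` equally long parts; this needs pieces - 1 stripe sets."""
    if pieces < 2:
        raise ValueError("need at least two pieces to place a stripe set")
    return [core_width_um * i / pieces for i in range(1, pieces)]


if __name__ == "__main__":
    core_width = 900.0  # hypothetical core width in um
    centers = stripe_set_centers(core_width)
    x_from_left = centers[0]
    x_from_right = core_width - centers[-1]
    print(centers, x_from_left, x_from_right)
```

For three pieces two stripe sets are needed, each one third of the core width away from the nearest core edge.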
It is rather easy to move wires in Cadence SoC Encounter. Click on the move wires button (or
press m), select the wires you want to move, and drag them to their new location. Cadence SoC
Encounter will make sure that electrical connections remain intact. If you want, you can use this to
fine tune the stripe placement.
We still need to define a block halo for the RAM macro-cell. This is necessary to keep standard cells
from being placed too close to the RAM and also to avoid problems when routing the power lines of
the standard cell rows.
The figure below illustrates one common problem with the block halo.
[Figure: two standard cell rows with their power rails next to a block halo.]
In this figure, only two standard cell rows are shown. The block halo around the first row extends far
enough to cover the two power lines [22]. This is as it should be.
For the second row, the block halo does not cover the power rails, and when making the power
connections Cadence SoC Encounter will try to extend the power connection past the power
rails as shown in the figure. This leaves a dangling power line [23]. While this will not render your chip
useless, it should be avoided.
Student Task 14:
From the menu select Floorplan -> Edit Floorplan -> Edit Halo.... A window will
appear, where you can specify a keep-out zone for routing and/or placement around the
macro-cell.
Usually we only need a Placement Halo. The size will depend on your power routing/floorplan.
Create an appropriate Placement Halo.
Notice that the I/O pads are placed with some distance between them [24]. At some point in the design
flow we need to close the gaps between the I/O pads in order to complete the supply rings that run
around the core (within the pad cells) and are required to supply the circuitry within the pad cells.
Student Task 15:
Instead of using wires, we will place so called filler cells that completely fill the gaps and
establish the required connectivity.
There is a script that will automatically insert matching filler cells. Type the following in the
Cadence SoC Encounter console window:
enc > source scripts/fillperi.tcl
[22] This is just for illustration. It is not possible to draw a block halo that has this (L) shape.
[23] Dangling wires of this sort are known as geometry antennas in Cadence SoC Encounter.
[24] This is due to the constraints set by the company that bonds the chips. They specify that the minimum distance
between two adjacent pads is 90 µm. Since even a core-limited pad in this technology is only roughly 60 µm wide, we
need to place them with gaps in between.
Now we need to finalize the power connections of the chip. The following connections still need to be
made:
The core ring needs to be connected to the core supply pads (VCC3IOD and GNDIOD).
All standard cells need to be connected to VCC and GND lines.
All macro-cells need to be connected to VCC and GND lines.
Student Task 16:
Select Route → Special Route... from the menu. SRoute is the special net router,
and is only used to make power connections.
The ROUTE: part contains the different connection types we have listed above. BLOCK
PINS are macro-cell power connections, PAD PINS are the connections from the core supply
pads to the core ring. We will not need PAD RINGS since we have already used filler cells to
complete these rings. STANDARD CELL PINS will add power lines to the standard cell rows.
Finally, if you still have stripes that are not connected to power (not very likely) you can use
the STRIPES (UNCONNECTED) option.
While it is possible to route all connections at the same time, it is strongly recommended to
do it one by one:
1. Start with PAD PINS. If nothing happens you have most likely forgotten to source the
globalnet.tcl script.
2. Route BLOCK PINS. Check the result: did the router connect the macro-cell the way
you wanted? If not, you may need to study the ADVANCED tab of the SRoute window.
If all else fails you can edit the connections manually.
3. Route the STANDARD CELL PINS. This should create many horizontal Metal-1 lines
that connect to the rings and stripes. Look for dangling wires around the block halo
(adjust the block halo if necessary).
We are now finished with floorplanning. Your floorplan should look similar to the following screen
shot.
6 Placement
We will now start with the placement of the standard cells in the core area. Placement is a very
computationally intensive problem, and mostly heuristic algorithms are used for this purpose.
Student Task 17:
Select Place → Standard Cells....
We want to run a full placement and not an incremental or just the quick prototyping one.
INCLUDE PRE-PLACE OPTIMIZATION, however, is very useful as it removes all buffer/inverter trees from the netlist, which will help us with timing analysis, as you will see later.
To set advanced options click MODE. Set CONGESTION EFFORT to LOW and deselect RUN
TIMING DRIVEN PLACEMENT, as timing-driven placement takes much longer and might not help that
much to improve timing. There are several other options that you can set, but at this time
we will leave them as they are. Apply the changes by pressing OK.
You will come back to the placement window seen below, click OK to start placement. This
may take some time.
We have to warn you about the various performance-related options such as CONGESTION EFFORT
and RUN TIMING DRIVEN PLACEMENT above. In the exercises we will sometimes advise you to use
certain settings for these options in order to reduce runtime, or because for this particular design
we have found out that a particular option gives better results. When you do your own designs, you
should consider evaluating which options are better suited rather than copying all options from this
exercise.
For each standard cell, the placement algorithm will try to find the optimum location so that there is a
feasible routing solution and the total length of the connections is minimized.
Examine the placement by using the design browser (switch to the physical view). You will notice that
standard cells within the same entity are mostly placed next to each other.
The available space and the placement of macro-cells and I/O pads can have a great influence on
the placement of standard cells. Even though more space seems to be a good idea, too much
space sometimes results in placements where the average distance between standard cells and
consequently the delays caused by wire capacitance/resistance become larger. Only experience and
several iterations will allow you to find a placement for your circuit that is close to optimal.
Note: Visibility of SPECIAL NET is turned off in the next screenshot.
The results for placement (and later routing) are strongly design dependent. For example, structures
with many interconnections such as look-up tables will usually need much more space than synthesis
predicted as the cells need to be spread out in order to have enough space to route all the interconnections. This is why generalizations for back-end design, such as "During back-end design, your
circuit area will increase by 10%", don't work very well.
Student Task 18:
Let us save the entire design with Design → Save Design As → SoCE. This will save the
configuration file, netlist, floorplan, special route, placement and routing files as well as the
current mode, options and preferences. A design saved in this way can be restored using
Design → Restore Design... → SoCE.
The space required is surprisingly small as most files are compressed and the library files
do not get saved along with the design.
Remember to save under the save directory.
Alternatively you could also just save the placement. Select Design → Save → Place....
During synthesis, Synopsys Design Compiler assigns constant logic values to two special standard cells named TIE0x and TIE1x, where x is a drive strength modifier. This creates a small
inconvenience, as often one of these cells is assigned to drive many outputs at the same time, creating relatively long interconnections.
There is sufficient space on the chip to place several of these cells. We will use a script that first
removes all these cells. Then we will set the rules for placing these cells. The example script scripts/tiehilo.tcl
sets the maximum number of connections driven by a single cell to 20, and the maximum
distance between the pin and the tie cell to 250 µm. Finally, we insert the tie cells according to
the rules we have defined.
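The fanout rule alone already gives a lower bound on the number of tie cells that will be inserted. The following Python sketch illustrates the arithmetic; the pin count of 130 is a hypothetical value (the real number depends on your netlist), and the distance rule can only increase the result.

```python
import math

# Fanout rule from the text: one tie cell drives at most 20 connections.
MAX_CONNECTIONS_PER_TIE_CELL = 20


def min_tie_cells(constant_pins, max_connections=MAX_CONNECTIONS_PER_TIE_CELL):
    """Lower bound on the number of tie cells required by the fanout rule alone."""
    return math.ceil(constant_pins / max_connections)


# Hypothetical example: 130 pins tied to a constant logic value.
print(min_tie_cells(130))  # 7
```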
Student Task 19:
At the command line type:
enc > source scripts/tiehilo.tcl
7 Timing
The synthesis tools we currently use for HDL synthesis (Synopsys Design Compiler) are not
aware of any instance placement information. Therefore the interconnects can only be estimated
based on a statistical model, i.e. the fanout of a net determines its length, capacitance, resistance and
area. Now that the placement and even trial-routing are available, the timing might differ considerably
from the numbers obtained from Synopsys Design Compiler.
7.1 Analysis
Cadence SoC Encounter has a practical timing analysis function, where you only have to specify
the state of the design (see below) and the ANALYSIS TYPE (Setup or Hold) you want to run.
The summary gives a very good overview of the current design timing. Some explanations:
The analysis was run in setup mode, i.e. setup time checks were performed but no hold time
checks.
The columns contain numbers for all paths in the design (ALL) or for specific path groups, e.g.
reg2reg for all register-to-register paths.
Worst negative slack (WNS) reports the slack for the most critical path. Negative numbers
mean that the constraints are violated by this value.
Total negative slack (TNS) is the sum of the negative slacks of all violating paths. Together with the number
of violating paths this figure helps to see how severe the violations are.
Real/Total DRV show (electrical) design rule violations. Some libraries have a maximum transition time for all nets. The report above shows that 370 nets have a transition violation (the
signal takes too long to change from logic-1 to logic-0 or vice versa). In addition 135 nets have
a maximum capacitance violation (the total amount of capacitance driven by a net exceeds the
limit set by the design library). These violations are mostly related to excessive parasitic capacitance due to interconnections, and generally cause timing violations as well. However, even if
a DRV does not cause a timing violation it needs to be fixed.
DENSITY and ROUTING OVERFLOW show the placement utilization and routing resources, i.e.
are a measure for the feasibility of the current floorplan/placement.
Remark: Refer to exercise 4 of VLSI I [25] if you have problems with timing concepts.
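To make the relation between WNS, TNS and the number of violating paths concrete, here is a small Python sketch. It is not part of the tool flow, and the slack values are made up for illustration.

```python
def summarize_slacks(slacks_ns):
    """Return (worst slack, total negative slack, number of violating paths)."""
    violating = [s for s in slacks_ns if s < 0.0]
    wns = min(slacks_ns)   # slack of the most critical path
    tns = sum(violating)   # only negative slacks contribute
    return wns, tns, len(violating)


# Hypothetical per-path slacks in ns
slacks = [-6.554, -1.200, 0.350, 2.100]
wns, tns, count = summarize_slacks(slacks)
print(f"WNS = {wns:.3f} ns, TNS = {tns:.3f} ns, violating paths = {count}")
# WNS = -6.554 ns, TNS = -7.754 ns, violating paths = 2
```

A large TNS spread over many paths points to a systematic problem (for example wrong constraints), whereas a TNS close to the WNS means only a few paths are critical.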
The summary looks really terrible. Obviously we have many timing violations that we need to have a
closer look at, before we try to optimize the timing with Cadence SoC Encounter.
Here are some important points to consider when doing so:
The timing depends entirely on the constraints you have specified in the file src/chip.sdc. The
most common mistake is to have errors in this file. Before you go any further make sure that
your timing constraints are correct.
Make sure to not accidentally use constraints that were written for the core level (chip without
pads) at the chip level (with pads) and vice versa. The pads affect the I/O timing quite a bit and
the drive capabilities of a standard cell and an output pad are entirely different, i.e. set_load
needs to be very different.
Inputs and outputs used for test and debugging may cause timing violations. Most of these
signals are not dynamic (they are not toggled during normal operation) and the timing paths
originating from these inputs or ending at these outputs should be ignored, i.e. left unconstrained or explicitly disabled.
To speed up delay calculation Cadence SoC Encounter does not compute the timing of
nets with a fanout above a certain limit but rather swaps in predefined values for delay, capacitance and transition time. All these numbers are specified on the DESIGN IMPORT form on the
ADVANCED tab in the Delay Calculation category. As a result you will not see the real timing [26]
of these nets in timing analysis and furthermore optimization will not see (and therefore not fix)
violations [27] on these nets. However, this is usually the desired behavior as we give these nets
a special treatment anyway (with CTS).
[25] You can access the exercise descriptions, files, and solutions under /home/vlsi1/u4.
[26] To see the real timing you can change the limit on-the-fly from 1000 to a very high value in the console with
setUseDefaultDelayLimit 100000. More on this topic later.
[27] DRV violations will be fixed but no setup/hold violations. Clock nets are even more special, also no DRV fixing will
be done there.
Let's now examine the detailed reports that were generated by timing analysis and can be found in
the timingReports folder. Each analysis produces multiple files. Among these there are three files
dedicated to design rule violations (max capacitance: *.cap, max fanout: *.fanout, max transition
time: *.tran) and separate *.tarpt timing analysis report files for the different path groups
(in2out, in2reg, reg2reg, reg2out).
Student Task 21:
Where do the violating paths in the in2out path category start?
Where do the violating paths in the in2reg path category start?
Do the paths in reg2out and reg2reg look like normal paths that should be optimized to
meet timing or is there something wrong?
Why are the reg2reg paths too slow? Look for large numbers in the Delay column and
check the drive strength of the corresponding cell.
There are several different problems in the .sdc file that we have used. First of all, two of our inputs
should not be considered for timing analysis [28]. We also have several nets (clock, reset and scan
enable) that we will take care of separately (using the clock tree synthesizer, which we will see later).
These nets will show up in the DRV reports. We do not want to solve timing-related problems for
these nets now (they will be solved later anyway); the time and effort required to optimize these nets
could prevent other parts of the design from being optimized.
We can use the DEFAULT PIN LIMIT feature of Cadence SoC Encounter to stop Cadence SoC
Encounter from extracting timing information (and reporting timing violations) for the nets that we
will be optimizing later on. By default the pin limit of Cadence SoC Encounter is set to 1000. In
our case this number is too high (we have slightly more than 400 flip-flops in our design).
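The effect of the limit can be pictured as a simple filter on net fanout. The Python sketch below only mimics the selection rule described above (nets with a fanout above the limit get default values); it does not call the tool. The three large fanout numbers are the endpoint counts quoted later in this training for the example design, while the net names and the two small nets are hypothetical.

```python
def nets_using_default_delay(net_fanout, limit):
    """Names of nets whose fanout exceeds the limit and is therefore not timed."""
    return sorted(name for name, fanout in net_fanout.items() if fanout > limit)


# Fanout per net: clock/reset/scan counts are taken from this training,
# the remaining entries are hypothetical ordinary signal nets.
net_fanout = {
    "clock": 443,
    "reset": 441,
    "scan_enable": 447,
    "data_bit": 12,
    "ram_address_bit": 35,
}

print(nets_using_default_delay(net_fanout, 1000))  # []
print(nets_using_default_delay(net_fanout, 400))   # ['clock', 'reset', 'scan_enable']
```

With the default limit of 1000 all five nets are timed normally, which is why the clock, reset and scan enable nets show up in the violation reports.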
Student Task 22:
Let us see the nets which have a large fanout. Report all nets with e.g. more than 400 pins.
Use the console command:
enc > report_net -min_fanout 400
Now set a suitable limit with the command
enc > setUseDefaultDelayLimit <number>
so that the high fanout nets will not be considered for timing. Also make the necessary changes to the timing constraints file src/chip.sdc to disable the offending input ports. Reload the timing constraints by selecting the menu Timing → Load Timing Constraint....
Then rerun timing analysis.
If you have done everything correctly, the only setup violations should be in the path groups register-to-register and register-to-out. There should no longer be pins that belong to the scan enable or reset
network in the transition time violation report.
[28] Cadence SoC Encounter provides a special timing calculation mode that is called Multi-Mode Multi-Corner
Analysis (MMMC). In this mode it is possible to define several scenarios (i.e. separate test and functional modes).
The setup for MMMC is slightly involved and will not be covered as part of this exercise.
7.2 Optimization
In order to (better) meet the constraints, Cadence SoC Encounter can try to optimize the design
at every stage of the design process. In our case, the worst setup time violation is about 5.8 ns (for
an 8 ns period), although the netlist delivered by the synthesis tool had no timing violations. This is
due to differences in interconnect parasitics between the two tools. While the synthesis tool relies on
an estimate (based on a statistical model), Cadence SoC Encounter can use the real placement and
(trial-)routing at hand. Consider the following excerpt from a timing report:
Path 1: VIOLATED Setup Check with Pin i_filter_top/u_filter/u_filter_stage_5/RegxDP_reg_42_/CK
Endpoint:   i_filter_top/u_filter/u_filter_stage_5/RegxDP_reg_42_/D () checked with leading edge of ClkxCI
Beginpoint: i_filter_top/u_ram_wrapper/i_ram/DO5 () triggered by leading edge of ClkxCI
Path Groups: {reg2reg}
Other End Arrival Time        0.000
- Setup                       0.149
+ Phase Shift                 8.000
= Required Time               7.851
- Arrival Time               14.405
= Slack Time                 -6.554
Clock Rise Edge               0.000
= Beginpoint Arrival Time     0.000
Timing Path:
+-----------------------------------------------------------------------------------------------------------------------+
| Instance                                      | Arc       | Cell               | Slew  | Load  | Delay | Arrival Time |
|-----------------------------------------------+-----------+--------------------+-------+-------+-------+--------------|
|                                               | ClkxCI    |                    | 0.000 | 1.828 |       |        0.000 |
| ClkxCI_PAD                                    | I -> O    | XMD                | 0.000 | 0.000 | 0.000 |        0.000 |
| i_filter_top/u_ram_wrapper/i_ram              | CK -> DO5 | SY180_2048X16X1CM8 | 0.130 | 0.033 | 1.750 |        1.750 |
| i_filter_top/u_ram_wrapper/i_test_bypass_mux5 | A -> O    | MUX2               | 8.441 | 1.874 | 3.973 |        5.722 |
The last line reports a standard cell instance MUX2 with low driving capability (2) that has to drive a
big load on its output (1.874 pF). The propagation delay is therefore huge (3.973 ns).
The timing values for the same cell as reported by synthesis are: Delay: 0.15 ns, Slew: 0.09, Load: 0.01.
While this is an extreme case, you can see how wrong synthesis can be without knowing the actual
placement and wire loads.
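The header of the path report follows a fixed piece of arithmetic: the required time is the other end arrival time minus the setup time plus the phase shift, and the slack is the required time minus the arrival time. The Python sketch below reproduces the numbers of the report above; the function name is ours and not a tool command.

```python
def setup_slack(other_end_arrival, setup, phase_shift, arrival):
    """Recompute required time and slack (all values in ns) as listed in the report header."""
    required = other_end_arrival - setup + phase_shift
    slack = required - arrival
    return required, slack


# Values from the report above
required, slack = setup_slack(
    other_end_arrival=0.000, setup=0.149, phase_shift=8.000, arrival=14.405
)
print(f"Required Time = {required:.3f} ns")  # Required Time = 7.851 ns
print(f"Slack Time    = {slack:.3f} ns")     # Slack Time    = -6.554 ns
```

Here the phase shift equals the 8 ns clock period, since launch and capture happen on consecutive rising edges of the same clock.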
Student Task 23:
Open the optimization form by selecting Timing → Optimize....
DESIGN STAGE needs to be set to the current design stage. Some options are only available
for certain stages, e.g. hold time optimization cannot be performed during PRE-CTS as it
doesn't make much sense.
Timing is not the only thing that can be optimized. Most technologies specify design rules
like maximum transition time, maximum capacitance driven by a certain cell or maximum
fanout.
After pressing the MODE button, within the THRESHOLDS section you can find options that
can be used to tighten the constraints in order to get some margin [a].
Set the options as shown in the figure below and hit OK. Watch the progress of the optimization in the console window. Cadence SoC Encounter is very verbose with its
actions.
[a] Cadence SoC Encounter will already automatically add a small margin on its own (internally).
During optimization Cadence SoC Encounter can select different drive strengths for cells, add/remove buffers and inverters, move instances or even restructure part of the logic (just like synthesis
does).
Optimization is done using iterations of timing analysis, optimization, trial-route and parasitic extraction.
As a last step Cadence SoC Encounter performs a timing analysis on the optimized design,
prints the summary to the console and writes the detailed reports to the timingReports directory.
Student Task 24:
Take a look at the summary and the final reports generated. There should be no violations
left.
But what happens if we can not fix the violations with optimization? Again, first make sure to understand what your constraints are and why they are violated. Often there are errors in converting the
design specifications to constraints (is the input delay really 3.5 ns? Also for this pin?) and describing
them properly with the commands available. If you still have problems, there are three levels where
you can reach a solution:
Optimization during backend design (Cadence SoC Encounter)
Cadence SoC Encounter can optimize the design at every stage of the design process. In
general, the earlier the stage, the more changes can be made, e.g. PRE-CTS optimization has
much more flexibility than POST-ROUTE optimization. At the PRE-CTS stage registers can be
moved and resized; this will no longer be possible after clock tree insertion. On the other hand,
the parasitic interconnect information is much more accurate at later stages of the design, so the
timing information (and hence the optimization goals) will be more accurate.
We can (re)run the optimization at various stages, try a new placement or even start with a
new floorplan. It is impossible to give general guidelines, you will have to see what works best
for your design. If you are far from meeting your target (e.g. for a 10 ns clock, if after all
optimizations you still have a timing violation of 2 ns), you may need to go back to synthesis.
Clock: Our main concern is to reduce the skew, since it will affect our timing. The maximum skew
depends on the clock period. As an example, for a 20 MHz clock a clock skew of 0.5 ns is
acceptable. But for a 200 MHz clock, the same skew equals 10% of the clock period and
would be too high.
If you over-constrain your skew, you will need a deep (and large) clock tree and your insertion
delay will rise, which will affect your input and output timing. Therefore you will want to balance
the skew against insertion delay and the number of buffers. Constraining maximum insertion
delay too low will usually degrade results.
Usually, a tree that gives you an acceptable skew will also give you a decent transition time, so
you don't have to worry about that.
Reset: We are interested in propagating the reset within one clock cycle to all flip-flops in our design.
For designs with on-chip reset synchronization this is strictly required. The insertion delay
should therefore be less than the clock period and transition times within the bounds imposed by
the technology; skew doesn't matter at all.
Scan Enable: Very similar to the reset signal. Usually a slower clock is used for scan testing, therefore
we can allow an even larger insertion delay. For transition time and skew the same holds true as
for the reset.
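The skew example for the clock can be checked with a few lines of Python. This is only the arithmetic behind the two cases mentioned above (0.5 ns skew at 20 MHz and at 200 MHz); the function name is ours.

```python
def skew_fraction(clock_mhz, skew_ns):
    """Skew expressed as a fraction of the clock period."""
    period_ns = 1000.0 / clock_mhz
    return skew_ns / period_ns


for clock_mhz in (20, 200):
    period_ns = 1000.0 / clock_mhz
    share = skew_fraction(clock_mhz, 0.5)
    print(f"{clock_mhz} MHz: period {period_ns:.1f} ns, 0.5 ns skew = {share:.0%} of the period")
# 20 MHz: period 50.0 ns, 0.5 ns skew = 1% of the period
# 200 MHz: period 5.0 ns, 0.5 ns skew = 10% of the period
```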
[Figure: clock tree constraints: AutoCTSRootPin, buffer and sink transition times (Buf Tran, Sink Tran), Min Delay, Max Delay, Max Skew]
In Cadence SoC Encounter, clock tree synthesis (CTS) is used to generate optimized buffer
trees to drive high fan-out nets. It can be configured to satisfy a variety of constraints.
Student Task 26:
A sample clock tree synthesis configuration file can be found under src/sample/chip.ctstch\
sample. The sample file contains three different configurations for a clock, a reset and a
scan enable signal.
Copy this file to the src directory and adapt the AutoCTSRootPin statements to match
your design.
For educational purposes, change the clock tree specifications as follows: max. skew
0.2 ns, max. insertion delay 4 ns, max. transition time at buffers 0.6 ns and at clock pins
0.4 ns [a].
[a] It is usually not a good idea to specify a small max. insertion delay such that this becomes a limiting factor for
CTS. Results may degrade significantly and for most designs the insertion delay is not very important anyway.
If the design employs a reset synchronization register (the example design has one) the source of
the reset tree must be the output of the synchronization register. Note that there is a special option
named SetASyncSRPinAsSync YES for the reset tree definition. This allows set and reset pins to
be considered as targets for the clock tree optimization.
The scan-enable signal is also a special case. Normally the clock tree synthesis algorithm starts at
the AutoCTSRootPin and traces through the netlist in order to find valid endpoints. By default,
combinational gates are traced through, and tracing stops at clock and asynchronous input pins of sequential
elements (flip-flops).
By specifying the NoGating rising option, we can make the tracer stop at the first gate encountered. This is necessary since the scan enable signal is often connected to multiplexers and we want
their input pins to be endpoints. When this option is used, you need to specify the internal pin of
the pad driving the scan-enable signal, otherwise tracing will stop prematurely at the pad cell.
Student Task 27:
Read in the clock tree specification by selecting Clock → Design Clock... from the
menu. Using the browser select the clock tree specification file you have just modified.
Press LOAD SPEC. DON'T PRESS OK yet [a]. You should now see a summary for all three
clock specifications on the console; check it.
Our netlist may have some buffers on the high fan-out nets we want to build trees on. We
need to remove them prior to CTS with the following command:
enc > deleteClockTree -all
[a] Pressing OK will start the clock tree insertion. We need to make sure that the clock tree specification is correct
before we go ahead with this step. If you accidentally pressed OK here, it is advised to restart from the last
saved point.
A large number of errors can be discovered by analyzing the pins connected to these nets, even
before building a clock tree.
Student Task 28:
Select Clock → Trace Pre-CTS Clock Tree.... To start the trace, click on the icon
on the top left and accept the default trace file name. A summary will be displayed on the
console and the content of the trace file visualized in the GUI.
We can see what the trees currently look like and what pins are connected to them. Look also at the
trace file directly. Things to look for include:
Clock, reset, or scan-enable connecting to unexpected input pins, e.g. the reset signal should
not connect to pins other than asynchronous set/reset pins of sequential elements.
Unexpected latches on the clock tree can be discovered this way (G or GB pin).
Discrepancy between the number of endpoints of clock, reset and scan trees. For our example
numbers are as follows:
clock tree: 443 with 442 flip-flop CK pins + 1 RAM CK pin
reset tree: 441 flip-flop RB pins
scan tree: 447 with 441 flip-flop SEL pins + 6 mux S pins, to choose between the functional
and test (scan chain) output signal.
As we see, 442 flip-flops are clocked but only 441 receive a reset signal. This is due to the reset
synchronization register being connected to the external reset signal rather than the internal
reset tree. As the reset synchronization flip-flop is also not on the scan chain and we use full
scan otherwise, the 441 flip-flops on the scan tree match perfectly. You get the idea...
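The plausibility check described above is simple bookkeeping. A short Python sketch using the endpoint counts of the example design (the variable and function names are ours; the numbers for your own design will differ):

```python
# Endpoint counts quoted in the text for the example design
clock_ff_ck_pins = 442
clock_ram_ck_pins = 1
reset_ff_rb_pins = 441
scan_ff_sel_pins = 441
scan_mux_s_pins = 6
# The reset synchronization flip-flop is clocked, but it is neither on the
# internal reset tree nor on the scan chain.
reset_sync_ffs = 1


def tree_totals():
    """Total number of endpoints per tree."""
    return {
        "clock": clock_ff_ck_pins + clock_ram_ck_pins,
        "reset": reset_ff_rb_pins,
        "scan": scan_ff_sel_pins + scan_mux_s_pins,
    }


def counts_are_consistent():
    """Every clocked flip-flop except the synchronizer must be reset and scanned."""
    expected = clock_ff_ck_pins - reset_sync_ffs
    return reset_ff_rb_pins == expected and scan_ff_sel_pins == expected


print(tree_totals())            # {'clock': 443, 'reset': 441, 'scan': 447}
print(counts_are_consistent())  # True
```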
Student Task 29:
Open the file chip.cts trace and search for Clock Tree to examine the leaf pins.
If everything looks OK we can proceed with clock synthesis. In the SYNTHESIZE CLOCK
TREE form press OK.
After a few minutes clock tree synthesis will be completed. Detailed reports will be generated under
the directory specified on the form (most likely clock report). This directory includes a simple report
file (clock.report).
A summary report is also displayed on the Cadence SoC Encounter console. The first column
shows the achieved performance while the second column reports the target specified in the configuration file.
Student Task 30:
Check your results (summary and detailed reports). How many buffers were added? How
many levels were created? What's the insertion delay? Are all constraints met?
Note 1: You will get a max transition time violation on ClkxCI_PAD/I which can safely be
ignored. As we have specified an input transition time of 800 ps on all primary inputs there
is no way CTS could fulfill the 600 ps requirement at this point.
Note 2: Unless the RouteClkNet YES option was used (more on this later), the
timing figures reported are only estimates and might change quite a bit with detailed routing.
9 Timing Revisited
At this point we will have to go into some more detail about timing. During different stages of the design flow, we have slightly different timing constraints (Refer to the following figure for the differences
in the three stages).
a) synthesis: initially the design does not contain any pads. The input delay tidel and the output delay
todel should contain the contribution of the input (tinpad) and output (toutpad) pads.
b) pre-CTS: during the placement and routing phase, all required I/O pads and drivers will be present.
At this stage there is no clock tree present. The timing should be adjusted, as at this moment
the input delay tidel and output delay todel no longer include the pad delays.
c) post-CTS: once the clock tree is inserted, the timing will change slightly again. Due to the clock
insertion delay tdi the internal clock will be slightly offset when compared to the external clock.
At the input, the data travelling towards the first flip-flop inside the chip will have more time,
since this flip-flop will be triggered by a clock signal that has been delayed by tdi. At the output,
however, the data that is coming from the chip will be launched with the internal clock, but will
have to be sampled by the external clock. Consequently there will be less time for this signal.
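The shift between the pre-CTS and post-CTS stages can be written down as a small budget calculation. The Python sketch below is a simplified model (ideal clock edges, no skew or uncertainty); the 8 ns period is the one used in this training, while the delay values are assumptions chosen for illustration.

```python
def io_budgets(t_clk, t_idel, t_odel, t_di=0.0):
    """Time (ns) left for the in2reg and reg2out paths inside the chip.

    t_di is the clock insertion delay; 0.0 models the pre-CTS (ideal clock) case.
    """
    in2reg = t_clk - t_idel + t_di   # capture flip-flop is clocked t_di later
    reg2out = t_clk - t_odel - t_di  # launch flip-flop is clocked t_di later
    return in2reg, reg2out


T_CLK = 8.0    # clock period used in this training
T_IDEL = 3.5   # assumed input delay
T_ODEL = 2.0   # assumed output delay
T_DI = 1.0     # assumed clock insertion delay after CTS

print(io_budgets(T_CLK, T_IDEL, T_ODEL))        # (4.5, 6.0)  pre-CTS
print(io_budgets(T_CLK, T_IDEL, T_ODEL, T_DI))  # (5.5, 5.0)  post-CTS
```

The input paths gain exactly what the output paths lose, which is why an unbounded insertion delay can break an otherwise comfortable output timing.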
It should now be clear why it might be desirable to set constraints on the clock insertion delay property
by specifying minimum and maximum values in the chip.ctstch file by MinDelay and MaxDelay
parameters. The clock insertion delay can play an important part in the I/O delay. You may want to
keep the insertion delay within certain limits to ensure proper I/O timing.
Design tools have different mechanisms to deal with these three different cases. The simple solution
is to use multiple constraint files for different stages. However, both Synopsys Design Compiler
and Cadence SoC Encounter accept several parameters to deal with this problem automatically. In the following we will discuss how Cadence SoC Encounter calculates delays in the
presence and absence of a clock tree. The following table summarizes the most important settings:
(setAnalysisMode)    |                    | clock latency
-noSkew              | forced ideal       | no effect
-skew -noClockTree   | forced ideal       | SDCs in effect
-skew -clockTree     | SDCs in effect [a] | SDCs in effect [b]
The timing analysis mode is automatically updated by Cadence SoC Encounter to match the
design stage, i.e. before clock tree insertion it is set to -skew -noClockTree and afterwards to
-skew -ClockTree. The analysis mode can also be changed manually with the setAnalysisMode
command.
The two Synopsys Design Constraints (SDC) commands set_propagated_clock and set_clock_latency
are usually specified by the designer in the chip.sdc file. Furthermore, CTS tries to add a
set_propagated_clock constraint on-the-fly (in memory), which can cause a number of problems:
This constraint will only be added if the AutoCTSRootPin pin/port in chip.ctstch and the clock
waveform source pin/port (from the create_clock command in chip.sdc) are perfectly identical, i.e. not port vs. instance pin etc.
This constraint is never written to your chip.sdc file, so if you reload that file the constraint is
lost.
Before CTS, only a pointer to your constraints file is saved along with the database. Now, if a
constraint was added by CTS, all loaded constraints (including the new one) will be saved along
with the database to a new file (*.pt). Restoring this database will then load this new constraints
file instead of the one in encounter/src/ that you might have expected.
Note: As soon as you manually (re-)load a constraints file, the behavior is reverted to the normal
one.
Now, as can be seen from the table above, to get the actual timing of the buffers/inverters on the
clock tree instead of ideal mode, setting both -skew -ClockTree and set_propagated_clock
is required. Also note that set_propagated_clock gets overridden for all pre-CTS design stages
and could therefore be set right from the start (as already mentioned earlier).
In ideal mode, the clock tree insertion delay is zero unless the set_clock_latency command
is used to specify a different number, preferably close to the delay of the real tree (that is still to
be inserted). While this placeholder delay has the advantage that the I/O timing doesn't change
between pre-CTS and post-CTS phases, it renders timing reports less transparent and is not
handled exactly the same across different tools. Therefore, do not use this command unless you
know what you are doing.
In conclusion, it is recommended to include set_propagated_clock right from the start, not to use
set_clock_latency, and to load modified timing constraints after CTS only if required, i.e. when the
I/O timing numbers (set_input_delay, set_output_delay) need to be adjusted to account for
the actual clock tree [29]. For this training we will modify and reload the constraints [30].
[29] For slower clock speeds and/or uncritical I/O timing this is often not required.
[30] It might be more convenient to keep a separate post-CTS constraint file rather than changing the numbers back and
forth when redoing the flow.
[Figure: timing constraints in the three design stages a) synthesis, b) pre-CTS and c) post-CTS. Labels: Top, Chip, Clk, External Clock, Internal Clock, Tclk, tidel, todel, tinpad, toutpad, tin2reg, treg2reg, treg2out, tdi]
The previous figure illustrates all three stages in some detail. Wherever possible the same naming
conventions as in the textbook have been used [31].
[31] Refer to page 235, "How to formulate timing constraints", and page 346, "How to achieve friendly input/output timing",
for more on this topic.
Currently loaded constraints will be purged before the new ones get loaded.
10 Signal Routing
We will now route the signal nets. What you have seen so far are only trial-route nets that are not
DRC clean and can therefore not be manufactured.
Student Task 32:
There are two routing engines in Cadence SoC Encounter. WRoute is the older one
and NanoRoute is supposed to be the latest and greatest. Start NanoRoute by selecting
Route → NanoRoute → Route.... A large window will open. Enable the INSERT DIODES
option (you can leave the DIODE CELL NAME field blank) and leave all other settings at their
defaults [a]. Click OK to start routing. You can observe the progress in the console window.
[a] On multi-CPU or multi-core machines you can increase the number of CPUs used by selecting Set Multiple
CPU. This gives an almost linear speedup.
The Fix Antenna and Insert Diodes options cause the router to change layers and/or insert special
protection diodes in order to avoid damage that can occur during manufacturing when charges
accumulate on the wires and stress the gate oxide of input pins. Note that this effect is usually referred
to as process antennas, which are entirely different from geometrical antennas (which are related to
dangling wires).
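The settings made in the NanoRoute form can also be entered on the console. The following is a minimal sketch, assuming an Encounter release in which setNanoRouteMode and setMultiCpuUsage accept these options. Option names have changed between releases, so check the command reference of your installation. The CPU count of 4 is an arbitrary example value.

```tcl
# Sketch only: console equivalent of the NanoRoute form (option names are assumptions).
# Let the router fix process antenna violations by changing layers ...
setNanoRouteMode -drouteFixAntenna true
# ... and by inserting protection diodes where layer changes are not enough.
# No diode cell name is given, mirroring the blank Diode Cell Name field.
setNanoRouteMode -routeInsertAntennaDiode true
# Optional: use several local CPUs (4 is an arbitrary example value).
setMultiCpuUsage -localCpu 4
# Run global and detailed routing.
routeDesign
```

Keeping these lines in a script makes the routing step repeatable when the flow has to be redone.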
Our example design should route without problems. This is not always the case, and we might get
geometry violations. Geometry violations include shorts between nets and design rule violations (for
example, metal lines that are drawn too close together to be manufactured as separate wires). Needless to say,
all these violations must be resolved.
You should always closely examine the violations in order to find out what causes them. Sometimes
there is an unfortunate placement of macro-cells or power lines to blame and sometimes there is just
not enough space to route all connections. Solutions range from re-running routing to completely
reworking the floorplan.
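To get a list of violations to examine, the verification commands can be run from the console after routing. This is a minimal sketch, assuming the commands are available under these names in your Encounter release. It only reports violations and does not fix anything.

```tcl
# Sketch only: run the layout checks after routing and inspect the reported violations.
verifyGeometry                 ;# shorts and design rule violations (for example spacing)
verifyConnectivity -type all   ;# open nets and unconnected pins
verifyProcessAntenna           ;# process antenna violations
```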
Student Task 33:
Now that we have the real signal wiring we need to perform a postroute timing analysis to
see if we still meet all constraints. At this point not only a setup time analysis, but also a
hold time analysis needs to be run. Usually it is not necessary to deal with hold time until
this point.
Note that you have to do two separate runs, one for setup and one for hold, as it is not
possible to do this in one single step. Use the GUI (make sure to select Post-Route) or type
the commands below to perform the two analyses.
enc > timeDesign -postRoute
enc > timeDesign -postRoute -hold
Inspect the two summaries and the report files written to the timingReports directory. You
will most likely have setup violations.
To fix violations or increase the hold margin we can now perform a postroute optimization. Internal
hold time violations need to be fixed in any case as, unlike internal setup violations, they can not be
In the clock tree constraint file, set RouteClkNet YES. This is a per-tree setting that instructs
CTS to call NanoRoute in order to route this clock net during clock tree insertion. The wires
get a status of FIXED and will therefore not be changed later during signal routing. While this
improves timing on the clock tree, overall routability gets worse.
To further improve timing, you can tell NanoRoute to route this net not like an ordinary signal
net, but to create a balanced routing (by following the so-called RouteGuide computed by
CTS). To do so, set UseCTSRouteGuide YES in the clock constraint file [33].
11 Timing Debug
To analyze timing violations, Cadence SoC Encounter also offers a graphical interface (Timing →
Debug Timing) that visualizes paths and allows cross-probing with the layout. We will not explain
the tool in detail here, but rather make some important notes:
This functionality is largely standalone. It does not use results from the timeDesign command
but runs a new analysis that generates the file top.mtarpt. The paths in this file are then visualized.
If the above file already exists, it will usually simply be loaded. This means that whenever your
design has changed, you have to regenerate this file in order to get up-to-date data. This can be
done with the Generate switch on the form that opens when you click the folder icon.
When generating top.mtarpt, the current timing mode is relevant, i.e. to analyze hold paths the
timing mode has to be set to hold mode.
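The regeneration described above can also be sketched on the console. The command and option names below are assumptions based on common Encounter usage and may differ in your release. The limit of 100 and the report name default_report are arbitrary example values.

```tcl
# Sketch only (assumed option names): regenerate top.mtarpt for hold analysis.
setAnalysisMode -checkType hold
# Write a machine-readable report; 100 is an arbitrary example limit.
report_timing -machine_readable -max_points 100 > top.mtarpt
# Load the fresh report into the timing debug GUI.
load_timing_debug_report -name default_report top.mtarpt
# Switch back to setup analysis afterwards.
setAnalysisMode -checkType setup
```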
[33] This will persistently(!) alter the global CTS Mode to setCTSMode -useCTSRouteGuide
12 Finishing
We are almost done with backend design. Only a few steps remain to finish the layout and
verify that everything is correct.
Note that your row utilization will be 100% after this step. This means that you will have no room
for further optimizations. Make sure to insert filler cells after all optimizations have been completed.
Note: It is also possible to remove the filler cells with Place → Filler → Delete... or by using
the script removefillcore.tcl.
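Filler insertion and removal can also be done directly on the console. The following is a minimal sketch. The cell names FILL1, FILL2 and FILL4 and the prefix FILLER are hypothetical and must be replaced with the filler cells of your standard cell library.

```tcl
# Sketch only: FILL1, FILL2, FILL4 are hypothetical cell names from an assumed library.
# Fill all remaining gaps in the rows; instances are named with the given prefix.
addFiller -cell FILL1 FILL2 FILL4 -prefix FILLER
# If further optimization is needed later, remove the fillers again first.
deleteFiller -prefix FILLER
```

Using a dedicated prefix makes it easy to identify and delete only the filler instances.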
There is a script that will perform the last verification steps for you automatically. You can set a
variable DESIGNNAME to assign the base name for all the files generated by this script.
enc > set DESIGNNAME MyBeautifulChip
enc > source scripts/checkdesign.tcl
To get complete supply net connectivity in the Verilog netlist for LVS, the missing connections for the power
and ground pins (GNDIO/VCC3IO) of the pads are added and removed on-the-fly. We could also define and
handle these two nets in the same way as VCC/GND, but there are more drawbacks than benefits.
Similar to the checkdesign.tcl file, the variable DESIGNNAME will be used to assign the base name of
the files. If you do not specify a name, final will be used. After you complete this step you will have
the following files:
*.v This is the final netlist. Make sure to use this netlist for post layout simulations.
*.gds.gz The layout in GDSII (Graphic Design System II) format. This is the standard format for
exchanging layout data.
*.sdf.gz The SDF (Standard Delay Format) file to be used for post layout simulation.
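A quick way to confirm that all three files were produced is a short Tcl check on the console. This is only a sketch. It assumes that the files are written to the current working directory, which may not match where the script actually places them.

```tcl
# Sketch only: assumes the output files are in the current working directory.
if {![info exists DESIGNNAME]} {
    set DESIGNNAME final   ;# default base name stated above
}
# Check each expected output file and report whether it exists.
foreach ext {v gds.gz sdf.gz} {
    set f "$DESIGNNAME.$ext"
    if {[file exists $f]} {
        puts "found:   $f"
    } else {
        puts "missing: $f"
    }
}
```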