
Understanding Clock Tree Synthesis Log Messages

© Synopsys 2012 1
Agenda
• Prerequisites for Clock Tree Synthesis

• Enabling Useful Debug Messages in IC Compiler Clock Tree Synthesis

• Clock Tree Synthesis Log Messages

• Clock Tree Optimization Log Messages

Prerequisite 1:
Run the check_clock_tree Command
• Run the check_clock_tree command prior to clock tree
synthesis, and fix the issues reported

• This command checks the following, and reports issues that can
lead to bad QoR:
– Clock Tree Structure
– Constraints
– Clock Tree Exceptions

Prerequisite 2:
Ensure Placement Legality
• For clock tree synthesis to proceed without any errors, it is necessary to
have a legally placed design.
• Use the check_legality command to check whether the design is
properly placed and legalized, prior to CTS.
• In case of legality issues, use the legalize_placement command to
resolve these issues.

Note:
• Clock tree synthesis will abort in case of placement legality issues.
• In some cases, like overlapping standard cells, it may still proceed and
issue a warning during placement legality checking, but continuing with
placement legality issues may lead to bad QoR.

Warning: Some cells in the design are not legal. (CTS-242)

Default Constraints

• The default constraints that clock tree synthesis uses are as follows:
Maximum transition time 0.5ns
Maximum capacitance 0.6pF
Maximum fanout 2000

Design Rule Constraints
• In addition to the clock tree design rule constraint values specified using
set_clock_tree_options, IC Compiler also considers the design rule constraint values
from the logic library and the design.

• The following table summarizes how IC Compiler determines the design rule constraint
values used during the design rule fixing stage of clock tree synthesis and optimization.

Case 1: Default behavior
cts_use_lib_max_fanout=false
cts_use_sdc_max_fanout=false
cts_force_user_constraints=false

Case 2: Use library and SDC settings for maximum fanout
cts_use_lib_max_fanout=true
cts_use_sdc_max_fanout=true
cts_force_user_constraints=false

Case 3: Use only user-set settings for clock tree synthesis and clock tree optimization
cts_force_user_constraints=true

Maximum capacitance
– Case 1 and Case 2: the minimum value from the set_clock_tree_options command, the CTS default value (0.6pF), the logic library, and the SDC constraints
– Case 3: the value set using set_clock_tree_options

Maximum transition time
– Case 1 and Case 2: the minimum value from the set_clock_tree_options command, the CTS default value (0.5ns), the logic library, and the SDC constraints
– Case 3: the value set using set_clock_tree_options

Maximum fanout
– Case 1: the value set using set_clock_tree_options
– Case 2: the minimum value from the logic library, the SDC constraints, and the set_clock_tree_options command
– Case 3: the value set using set_clock_tree_options
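The selection rules in the table above can be sketched in a few lines of Python. This is only an illustration of the table, not IC Compiler code: the function name, its parameters, and the single `use_lib_and_sdc_max_fanout` flag (standing in for the two cts_use_*_max_fanout variables being set together) are assumptions made for this example.

```python
from typing import Optional

# CTS defaults quoted in this presentation (assumed units: pF and ns).
CTS_DEFAULTS = {"max_capacitance": 0.6, "max_transition": 0.5}


def effective_constraint(kind: str,
                         user: Optional[float] = None,
                         lib: Optional[float] = None,
                         sdc: Optional[float] = None,
                         use_lib_and_sdc_max_fanout: bool = False,
                         force_user: bool = False) -> Optional[float]:
    """Return the value the table says is used for one design rule.

    kind is "max_capacitance", "max_transition" or "max_fanout".
    user is the set_clock_tree_options value; None means "not available".
    """
    if force_user:                              # Case 3: user value only
        return user
    if kind == "max_fanout":
        if not use_lib_and_sdc_max_fanout:      # Case 1: user value only
            return user
        candidates = [lib, sdc, user]           # Case 2: minimum of the three
    else:
        # Cases 1 and 2: minimum of user value, CTS default, library, SDC
        candidates = [user, CTS_DEFAULTS[kind], lib, sdc]
    known = [v for v in candidates if v is not None]
    return min(known) if known else None
```

For example, a user value of 0.1 with an SDC value of 0.05 yields 0.05 by default, and 0.1 once the user constraints are forced.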

Constraints Specified Using the
set_clock_tree_options Command
• Library units are used for time and capacitance values specified by using
the set_clock_tree_options command

• The smallest values accepted for the -max_capacitance and
-max_transition options of the set_clock_tree_options
command are 1fF and 1ps, respectively.

• For example, if the library units are pF and ps, and you specify the following
command, IC Compiler will issue an error:
icc_shell> set_clock_tree_options -max_cap 0.0009 -max_tran 0.300
Error: User max_cap constraint (0.900000 fF) is too small. (CTS-206)
Error: User max_tran constraint (0.300000 ps) is too small. (CTS-207)

– IC Compiler will not accept these small values, and will use the previously
specified values or the default values for maximum capacitance and maximum
transition during clock tree synthesis.
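The unit arithmetic behind the two errors can be checked with a short sketch. The 1fF and 1ps floors come from the text above; the function name, the unit tables, and the return format are hypothetical and only illustrate the conversion.

```python
# Scale factors from library units to fF and ps (illustrative subset).
CAP_TO_FF = {"pF": 1000.0, "fF": 1.0}
TIME_TO_PS = {"ns": 1000.0, "ps": 1.0}

MIN_CAP_FF = 1.0    # smallest accepted -max_capacitance, per the text
MIN_TRAN_PS = 1.0   # smallest accepted -max_transition, per the text


def check_cts_options(max_cap, max_tran, cap_unit, time_unit):
    """Return a list of (option, converted_value) pairs that are too small."""
    problems = []
    cap_ff = max_cap * CAP_TO_FF[cap_unit]
    tran_ps = max_tran * TIME_TO_PS[time_unit]
    if cap_ff < MIN_CAP_FF:
        problems.append(("max_cap", cap_ff))
    if tran_ps < MIN_TRAN_PS:
        problems.append(("max_tran", tran_ps))
    return problems
```

With library units of pF and ps, the values 0.0009 and 0.300 convert to about 0.9 fF and 0.3 ps, which is why both options are rejected in the example.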

Enabling Debug Messages

• To enable clock tree synthesis debug messages in IC Compiler, use:

set cts_use_debug_mode true

• Many of the messages discussed in this presentation are available only
when you enable the debug mode.

Messages in the compile_clock_tree
Command Log

• Before clock tree synthesis:
– Design update
– Buffer and inverter information
– Clock tree constraints
– Clock structure before clock tree synthesis

• During clock tree synthesis:
– Clustering
– Meeting target early delay
– Gate level clock tree synthesis results

• After clock tree synthesis:
– Summary report
– Embedded clock tree optimization
– DRC fixing beyond exceptions
– Placement legalization

Overview of the compile_clock_tree Command Log
START_CMD: compile_clock_tree CPU: 55 s ( 0.02 hr) ELAPSE: 288 s ( 0.08 hr) MEM-PEAK: 203 Mb Wed Dec 28 22:33:54 2011 (PSYN-508)
CTS: CTS Operating Condition(s): MAX(Worst)
START_FUNC: prelude CPU: 55 s ( 0.02 hr) ELAPSE: 288 s ( 0.08 hr) MEM-PEAK: 203 Mb Wed Dec 28 22:33:54 2011 (PSYN-508)   <-- Prelude
Loading design 'ORCA_TOP'

Information: Design Library and main library capacitance units are matched - 1.000 pf.
END_FUNC: prelude CPU: 56 s ( 0.02 hr) ELAPSE: 288 s ( 0.08 hr) MEM-PEAK: 203 Mb Wed Dec 28 22:33:54 2011 (PSYN-508)

****************************************************************
Information: TLUPlus based RC computation is enabled. (RCEX-141)   <-- Extraction related messages
****************************************************************
Information: The distance unit in Capacitance and Resistance is 1 micron. (RCEX-007)
Information: The RC model used is TLU+. (RCEX-015)

CTS: Blockage Aware Algorithm
CTS: Marking Ignore Pins....

Warning: too small maximum transition (=0.300000) defined at library cell dl02d4. (CTS-619)
CTS: buffer estimated skew target delay driving res input cap   <-- Buffer characterization
CTS: invbdk [0.009 0.010] [0.043 0.058] [0.197 0.213] [0.059 0.059]
...
CTS: Prepare sources for clock domain SD_DDR_CLK
CTS: Prepare sources for clock domain SDRAM_CLK
CTS: Prepare sources for clock domain SYS_2x_CLK

CTS: Region Aware Algorithm is automatically turned off when design has no region or only has one region.
CTS: Info: Found net sys_2x_clk, on cell I_RISC_CORE/I_REG_FILE/REG_FILE_B_RAM is macro. Will not treat as pad.

clean drc fixing cell first...
In all, 0 drc fixing cell(s) are cleaned
In all, 0 drc fixing cell(s) beyond exception pins are cleaned

CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_8/S is implicit ignore
CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_9/S is implicit ignore

CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_8/S is implicit ignore
CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_11/S is implicit ignore

Warning: Ignore net sd_CK since it has no synchronous pins. (CTS-231)


CTS: Info: will use target transition value for initial CTS stages

Pruning library cells (r/f, pwr)   <-- Pruning of buffers and inverters
Min drive = 0.000372606.
…
Final pruned buffer set (7 buffers):
bufbd1

CTDN lib estimation: buffers should result in better clock power.
CTS: BA: Net 'sdram_clk'
CTS: Starting clock tree synthesis ...
CTS: Conditions = worst(1)
CTS: Global design rule constraints [rise fall]   <-- Reporting global clock tree constraints
CTS: max transition = worst[0.300 0.300] GUI = worst[0.300 0.300] SDC = undefined/ignored

Information: Removing clock transition on clock PCI_CLK ... (CTS-103)

CTS: gate level 1 clock tree synthesis   <-- Clock tree synthesis
CTS: clock net = sdram_clk
CTS: gate level 1 clock tree synthesis results
CTS: clock net : sdram_clk

CTS: Clock tree synthesis completed successfully   <-- Reporting the results of clock tree synthesis
CTS: CPU time: 18 seconds
CTS: Reporting clock tree violations ...
…
CTS: ------------------------------------------------
CTS: Clock Tree Synthesis Summary
CTS: ------------------------------------------------

CTS: Starting block level clock tree optimization

CTS: gate level 1 clock tree optimization   <-- Embedded clock tree optimization
CTS: clock net = pclk

Gate Upsizing During Clock Tree
Synthesis
• The compile_clock_tree command will upsize all the
preexisting cells in the clock tree before building the clock tree.
Information: Replaced the library cell of sys_ctl/sunburst_clk_mux_div1/clk_buf from bufbd4 to bufbdf. (CTS-152)   <-- Preexisting gate

• In the previous example, the preexisting gate is upsized from a
bufbd4 to a bufbdf.

• This upsizing helps reduce the number of buffer levels needed
to build the clock tree, thereby reducing the buffer count.

Maximum Capacitance and Transition Related
Warnings
• Even if the set_clock_tree_options command does not issue
any errors when you set the maximum capacitance and transition
constraints, the compile_clock_tree command can issue
warnings if the values are too small.

Warning: too small maximum transition (=0.050000) defined at pin instCLK1GC1/Q. (CTS-620)   <-- Max trans = 50ps is too tight for the pin instCLK1GC1/Q
Warning: too small maximum capacitance (=0.050000) defined at pin instCLK1GC1/Q. (CTS-620)   <-- Max cap = 50fF is too tight for the pin instCLK1GC1/Q

Warning: too small maximum transition (=0.050000) defined at library cell bufbdk. (CTS-619)

• Tight constraints can cause clock tree synthesis to use an excessive
number of buffers to build the clock trees.

Buffers and Inverters Used During Clock Tree
Synthesis
• Before synthesizing the clock tree, IC Compiler characterizes each buffer
and inverter
– To see the characterization details, set the following variable to true:
set cts_do_characterization true
– After characterization is done, characterized values for each buffer and
inverter are reported
CTS: buffer estimated skew target delay driving res input cap
CTS: bufbdf [0.013 0.015] [0.217 0.200] [0.210 0.248] [0.007 0.007]   <-- Buffer
CTS: inv0da [0.018 0.021] [0.097 0.119] [0.294 0.347] [0.036 0.036]   <-- Inverter
CTS: bufbd7 [0.025 0.030] [0.223 0.234] [0.415 0.503] [0.008 0.008]
CTS: bufbd4 [0.047 0.053] [0.347 0.357] [0.786 0.880] [0.004 0.004]
In each bracketed pair, the first value is for rise and the second is for fall (for example, rise delay and fall delay).

• Driving resistance determines the drive strength of the buffer or inverter.
• The smaller the driving resistance, the greater the drive strength.
• In the previous example, bufbdf is the buffer with the highest drive strength.
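As a rough illustration of reading the characterization table, the sketch below parses the rows and picks the cell with the smallest driving resistance. The regular expression assumes the exact row layout shown in this presentation; the function names and the choice to rank cells by the larger of the rise and fall resistance are assumptions for this example.

```python
import re

# One characterization row: cell name followed by four [rise fall] pairs.
ROW = re.compile(
    r"^CTS:\s+(\S+)\s+" + r"\s+".join([r"\[([\d.]+) ([\d.]+)\]"] * 4) + r"\s*$")

SAMPLE = """\
CTS: buffer estimated skew target delay driving res input cap
CTS: bufbdf [0.013 0.015] [0.217 0.200] [0.210 0.248] [0.007 0.007]
CTS: inv0da [0.018 0.021] [0.097 0.119] [0.294 0.347] [0.036 0.036]
CTS: bufbd7 [0.025 0.030] [0.223 0.234] [0.415 0.503] [0.008 0.008]
CTS: bufbd4 [0.047 0.053] [0.347 0.357] [0.786 0.880] [0.004 0.004]
"""


def parse_characterization(log_text):
    """Map each cell name to its characterized (rise, fall) value pairs."""
    cells = {}
    for line in log_text.splitlines():
        m = ROW.match(line.strip())
        if m:
            v = [float(x) for x in m.groups()[1:]]
            cells[m.group(1)] = {
                "estimated_skew": (v[0], v[1]),
                "target_delay": (v[2], v[3]),
                "driving_res": (v[4], v[5]),
                "input_cap": (v[6], v[7]),
            }
    return cells


def strongest_driver(cells):
    """Smallest driving resistance means greatest drive strength."""
    return min(cells, key=lambda name: max(cells[name]["driving_res"]))
```

Calling `strongest_driver(parse_characterization(SAMPLE))` selects bufbdf, in line with the statement above; the header row is skipped because it contains no bracketed pairs.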

Unbalanced Buffers

• Buffers and inverters that have a big difference between their rise
and fall delays, which is referred to as the rise/fall delay skew, are
reported.
CTS: inverter inv0da: rise/fall delay skew = 0.204816 (> 0.200000)

• Remove unbalanced buffers from the buffer list specified for
clock tree synthesis, as they might cause bad skew.
• Use the set_clock_tree_references command to specify the
buffers and inverters that should be used for clock tree synthesis
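A small script can collect the cells flagged as unbalanced so that they can be left out of the reference list. This is a sketch only: the pattern is modeled on the single inverter message shown above, and the assumption that buffers are reported with the word "buffer" in the same position is not confirmed by the source.

```python
import re

# Modeled on: "CTS: inverter inv0da: rise/fall delay skew = 0.204816 (> 0.200000)"
# The "buffer" alternative is an assumption; only the inverter form appears above.
UNBALANCED = re.compile(
    r"CTS:\s+(?:buffer|inverter)\s+(\S+): rise/fall delay skew = ([\d.]+) \(> ([\d.]+)\)")


def find_unbalanced_cells(log_text):
    """Return (cell, skew, threshold) for every unbalanced-cell message."""
    return [(m.group(1), float(m.group(2)), float(m.group(3)))
            for m in UNBALANCED.finditer(log_text)]
```

The resulting cell names are candidates to omit when building the list passed to set_clock_tree_references.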

Pruning of Buffers and Inverters
• Pruning is a process by which IC Compiler selects the buffers and
inverters which are best suited for clock tree synthesis, based on the
buffer and inverter characterization, and prevents the remaining ones
from being used.

• IC Compiler prunes the buffers and inverters based on drive strength
and power:
Pruning library cells (r/f, pwr)
Min drive = 0.264263.
Pruning inv0d0 because drive of 0.149845 is less than 0.264263.
Pruning inv0d2 because it is (w/ power-considered) inferior to invbd2.

• IC Compiler calculates a minimum drive value based on heuristics.
Buffers and inverters whose drive strength is less than the minimum
drive value are considered as weak drivers and are pruned by IC
Compiler.

• It is not possible to override the default pruning process

Maximum Transition, Maximum
Capacitance and Timing Constraints
Before clock tree synthesis begins, all the global clock tree constraints are
reported in the log, in the format shown below:

CTS: Global design rule constraints [rise fall]
CTS: max transition = worst[0.050 0.050] GUI = worst[0.100 0.100] SDC = worst[0.050 0.050]
CTS: max capacitance = worst[0.600 0.600] GUI = worst[0.600 0.600] SDC = undefined/ignored
CTS: max fanout = 2000 GUI = 2000 SDC = undefined/ignored

CTS: Global timing/clock tree constraints
CTS: clock skew = worst[0.100]
CTS: insertion delay = worst[2.000]
CTS: levels per net = 200

• The first value on each design rule line is the value used by CTS
• GUI is the default value or the value set using set_clock_tree_options
• SDC is the value from SDC
– Undefined means no value is specified in SDC
– Ignored means the value from SDC is ignored as the
cts_force_user_constraints variable is set to true
• The skew and insertion delay targets are the values set using the
set_clock_tree_options command
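To compare the value used by CTS with its GUI and SDC sources across a long log, the design rule lines can be split into their three fields. The sketch below assumes the line layout shown on this slide; the function names and the dictionary keys are made up for the example.

```python
import re

LINE = re.compile(
    r"CTS:\s+(max \w+)\s+=\s+(.+?)\s+GUI\s+=\s+(.+?)\s+SDC\s+=\s+(.+)$")
PAIR = re.compile(r"\w+\[([\d.]+) ([\d.]+)\]")   # e.g. worst[0.050 0.050]


def parse_value(text):
    """Return None for undefined/ignored, a (rise, fall) tuple, or a number."""
    text = text.strip()
    if text == "undefined/ignored":
        return None
    m = PAIR.fullmatch(text)
    if m:
        return (float(m.group(1)), float(m.group(2)))
    return float(text)


def parse_design_rule_line(line):
    """Split one global design rule line into used, GUI and SDC values."""
    m = LINE.match(line.strip())
    if not m:
        return None
    return {"name": m.group(1),
            "used": parse_value(m.group(2)),
            "gui": parse_value(m.group(3)),
            "sdc": parse_value(m.group(4))}
```

On the max transition line above this returns a used value of (0.05, 0.05), a GUI value of (0.1, 0.1) and an SDC value of (0.05, 0.05), which makes it easy to see that the SDC value is the one limiting the constraint.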

Clock Tree Synthesis Target Specifications

• Target specifications are the internal targets for clock tree synthesis,
but are not guaranteed. Only target constraints are guaranteed to be
achieved
CTS: Global target spec [rise fall]
CTS: transition = worst[0.250 0.250]
CTS: capacitance = worst[0.300 0.300]
CTS: fanout= 32 (This target fanout value is not considered by CTS)

• Target specifications:
– maxTransSpec: Min(0.25, 80% of max_transition constraints)
– maxCapSpec: Min(0.30, 80% of max_capacitance constraints)
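The two target formulas are simple enough to check by hand or with a few lines of Python. The function names are illustrative, and the values are assumed to be in the same library units as the log excerpts (ns and pF).

```python
def target_spec(max_constraint, ceiling):
    """Min(ceiling, 80% of the maximum constraint), as stated above."""
    return min(ceiling, 0.8 * max_constraint)


def target_transition(max_transition):
    return target_spec(max_transition, 0.25)


def target_capacitance(max_capacitance):
    return target_spec(max_capacitance, 0.30)
```

With the default maximum capacitance of 0.6 the target is capped at 0.30, as in the log above, while a maximum transition of 0.300 gives a target of about 0.240.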

Preexisting Clock Tree Information in the Log File
Before starting to build the clock tree, the preexisting clock tree structure is
printed in the log file:
CTS: Design infomation
CTS: total gate levels = 8   <-- Maximum number of gate levels available
CTS: Root clock net CLK2
CTS: clock gate levels = 2   <-- Number of gate levels
CTS: clock sink pins = 4   <-- Number of sinks for clock CLK2
CTS: level 2: gates = 1   <-- Existing gate levels and number of gates at each level
CTS: level 1: gates = 1
CTS: Buffer/Inverter list for CTS for clock net CLK2:
CTS: invbdk
CTS: bufbdk
...
CTS: Root clock net CLK1
CTS: clock gate levels = 8
CTS: clock sink pins = 8431
CTS: level 8: gates = 2   <-- Gate levels are listed from the flip-flops towards the clock source
CTS: level 7: gates = 3
CTS: level 6: gates = 4
CTS: level 5: gates = 3
CTS: level 4: gates = 1
CTS: level 3: gates = 5
CTS: level 2: gates = 4
CTS: level 1: gates = 1
CTS: Buffer/Inverter list for CTS for clock net CLK1:
CTS: invbdk
CTS: bufbdk
...

Real Gates and Guide Buffers
• You may see the term real gates in the preexisting clock tree structure
information section:
CTS: Root clock net CLK1
CTS: clock gate levels = 16
CTS: clock sink pins = 70644
...
CTS: level 13: gates = 14 (real gates = 4)
CTS: level 12: gates = 111 (real gates = 101)
CTS: level 11: gates = 146 (real gates = 136)
CTS: level 10: gates = 2488 (real gates = 2478)

• Real gates are preexisting gates in the clock tree, and are not gates added by
the tool

• Guide buffers are buffers or inverters that are inserted by the tool, before it
begins to build the tree. They are intended to help clock tree synthesis build a
better clock tree

• The number of guide buffers inserted at each level can be determined from the
difference between gates and real gates.
– In the above example, the tool has added 10 guide buffers at each of the clock tree levels shown
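The subtraction described above is easy to automate for a log with many levels. The following sketch assumes the two annotations that appear in this presentation, "(real gates = N)" and "(no real gates, guide buffers only)"; treating an unannotated level as having only real gates is an assumption of this example.

```python
import re

LEVEL = re.compile(
    r"CTS:\s+level (\d+): gates = (\d+)"
    r"(?: \((?:real gates = (\d+)|no real gates, guide buffers only)\))?")


def guide_buffers_per_level(log_text):
    """Map each gate level to gates minus real gates."""
    result = {}
    for m in LEVEL.finditer(log_text):
        level, gates = int(m.group(1)), int(m.group(2))
        if m.group(3) is not None:
            real = int(m.group(3))
        elif "no real gates" in m.group(0):
            real = 0
        else:
            real = gates          # assumption: no annotation means all real
        result[level] = gates - real
    return result
```

For the four levels shown above the result is 10 guide buffers at each of levels 13, 12, 11 and 10.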

Buffers and Inverters Used
• Before it begins to build the clock tree, the tool will list all the buffers and inverters it will
use to build the tree
CTS: Buffer/Inverter list for CTS for clock net sdram_clk:   <-- CTS uses this list
CTS: CLKBUFX20
CTS: CLKBUFX16
CTS: CLKBUFX12
CTS: Buffer/Inverter LEQ cell list for Boundary Cell for clock net sdram_clk:   <-- CTS uses this list for inserting boundary cells
CTS: CLKBUFX20
CTS: CLKBUFX16
CTS: CLKINVX8
CTS: Buffer/Inverter LEQ cell list for CTO for clock net sdram_clk:   <-- CTO uses this list for sizing
CTS: CLKBUFX20
CTS: CLKBUFX16
CTS: CLKINVX8
CTS: Buffer/Inverter list for DelayInsertion for clock net sdram_clk:   <-- CTO uses this list for delay insertion
CTS: CLKBUFX20
CTS: CLKBUFX16
CTS: CLKINVX8

• You can change the buffer and inverter list by using the following command:
set_clock_tree_references

Clock Tree Synthesis Removes User-Specified
Ideal Attributes on Clocks
• Synthesized clocks are set to be propagated, and clock transition, which
is an attribute of an ideal clock, is removed
CTS: Information: Removing clock transition on clock SP0XCLK ... (CTS-103)
CTS: Information: Removing clock transition on clock SP0RCLK ... (CTS-103)

• Latency, another attribute of an ideal clock, is also removed


CTS: Information: Removing clock latency on pin
Idma_scr_wrap0__Idma_scrba0_m2m0_wrap/I_dma_scrba0_m2m0/ I_dma@ ... (CTS-098)

• Source Latency is removed for generated clocks


Information: Removing clock source latency on clock CLK1GC1 ... (CTS-289)

• These messages are informational only, and no action is required

Overlap or Reconvergent Paths

• Overlap or reconvergent paths occur when multiple clocks can drive a node

• IC Compiler issues warnings about such paths


Warning: Either the driven net has been synthesized previously or
clock path overlaps/reconverges at pin periph/U1852/Y. (CTS-209)
• Such messages should be treated as informational, rather than as
warnings
– IC Compiler has no problems handling such situations

Gate Level-by-Level Clock Tree Synthesis
• Clock tree building is done gate level by gate level, starting from the
sinks to the clock root

• For each gate level, just before the synthesis starts, the following
information will be printed in the log:
CTS: gate level 2 clock tree synthesis
CTS: clock net = I_BLENDER_1/gclk   <-- Net and driver at this gate level
CTS: driving pin = I_BLENDER_1/U483/Z
CTS: gate level 2 design rule constraints [rise fall]
CTS: max transition = worst[0.300 0.300]
CTS: max capacitance = worst[0.300 0.300]
CTS: max fanout = 2000
CTS: gate level 2 target spec [rise fall]
CTS: transition = worst[0.240 0.240]
CTS: capacitance = worst[0.240 0.240]
CTS: driver cap. = worst[0.088 0.088]
CTS: fanout = 32
CTS: gate level 2 timing constraints
CTS: clock skew = worst[0.000]
CTS: levels per net = 200
CTS: -----------------------------------------------
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]

Clustering During Clock Tree Synthesis
• The clock tree building starts with clustering. Clustering is the process of
dividing a set of sink pins (fanouts) into groups. Each group is driven by a
buffer
– The instances of a cluster are all close to each other
• The following message says that 423 sink pins are divided into 27 clusters,
each with approximately 423/27 sink pins
CTS: gate level 2 clock tree synthesis
...
CTS: gate level 2 design rule constraints [rise fall]
CTS: max transition = worst[0.300 0.300]
CTS: max capacitance = worst[0.300 0.300]
CTS: max fanout = 2000
CTS: gate level 2 target spec [rise fall]
CTS: transition = worst[0.240 0.240]
CTS: capacitance = worst[0.240 0.240]
CTS: driver cap. = worst[0.088 0.088]
CTS: fanout = 32
CTS: gate level 2 timing constraints
...
CTS: -----------------------------------------------
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 423 to 27 clustering
CTS: BA: lp (1.520, 0.673): skew (0.149, 0.080) c(1.481, 0.198) viol(n y)
CTS: -----------------------------------------------
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 27 to 4 clustering
CTS: BA: lp (0.673, 0.597): skew (0.080, 0.105) c(0.198, 0.026) viol(n n)
CTS: -----------------------------------------------
• One buffer level is added with each clustering
• In the BA lines, the two values in a pair are the value before clustering and the value after clustering; for example, skew (before clustering, after clustering)
• viol(...) represents DRCs (cap, trans): y means a violation is present, n means no violation
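When a log contains many clustering passes, pairing each "Completed N to M clustering" line with the BA line that follows it gives a compact view of how skew and violations evolve. The sketch below assumes the message layout shown on this slide; the function name and the field names are illustrative.

```python
import re

COMPLETED = re.compile(r"CTS: Completed (\d+) to (\d+) clustering")
BA_LINE = re.compile(
    r"CTS: BA: lp \(([\d.]+), ([\d.]+)\): skew \(([\d.]+), ([\d.]+)\) "
    r"c\(([\d.]+), ([\d.]+)\) viol\(([yn]) ([yn])\)")


def parse_clustering_steps(log_text):
    """Return one record per clustering pass that is followed by a BA line."""
    steps, pending = [], None
    for line in log_text.splitlines():
        done = COMPLETED.search(line)
        if done:
            pending = (int(done.group(1)), int(done.group(2)))
            continue
        ba = BA_LINE.search(line)
        if ba and pending:
            pins, clusters = pending
            steps.append({
                "pins": pins,
                "clusters": clusters,
                "pins_per_cluster": pins / clusters,
                "skew_before": float(ba.group(3)),
                "skew_after": float(ba.group(4)),
                "cap_violation": ba.group(7) == "y",    # viol(cap trans)
                "trans_violation": ba.group(8) == "y",
            })
            pending = None
    return steps
```

For the first pass above this reports 423 pins in 27 clusters (roughly 15.7 pins per cluster), a skew going from 0.149 to 0.080, and a remaining transition violation.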

Clustering With Hookup Pins
• Hookup pins are input pins of gates or macros

• Unlike clock pins of flip-flops and latches (sink pins), hookup pins
have a nonzero phase delay that must be balanced with the sink
pins

Clustering With Hookup Pins
• Initially, the tool attempts to cluster hookup pins along with the normal sinks (trial
clustering)
• In this example, there are 479 sinks and 1 hookup pin
CTS: gate level 1 clock tree synthesis
...
CTS: gate level 1 design rule constraints [rise fall]
CTS: max transition = worst[0.300 0.300]
CTS: max capacitance = worst[0.300 0.300]
CTS: max fanout = 2000
CTS: gate level 1 target spec [rise fall]
CTS: transition = worst[0.240 0.240]
CTS: capacitance = worst[0.240 0.240]
CTS: driver cap. = worst[0.150 0.150]
CTS: fanout = 32
CTS: gate level 1 timing constraints
...
CTS: -----------------------------------------------
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]   <-- Trial clustering
CTS: Completed 480 to 34 clustering
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 34 to 6 clustering
CTS: BA: this delay [max min] (skew) = worst[0.000 0.000] (0.000)
CTS: BA: next delay [max min] (skew) = worst[0.124 0.124] (0.000)
CTS: BA: target cap = 0.070 pf
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]   <-- Actual clustering
CTS: BA: CAC set: target cap = 0.070317: targetWireCap = 0.274866
CTS: Completed 479 to 39 clustering
CTS: BA: lp (1.574, 0.770): skew (0.821, 0.451) c(1.737, 0.269) viol(n y)
CTS: -----------------------------------------------
• At the trial clustering stage, the hookup pin is considered along with the other sink pins and
(479+1) to 34 to 6 clustering is obtained
• At the actual clustering stage, the tool clusters the 479 sink pins separately from the hookup
pin

Clustering With Hookup Pins:
Hookup Pin Clustered With Sinks
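• If the trial clustering gives good QoR results, the hookup pin is selected for the next
level, as reported by the "Pin 1: ... is selected for next level" message below (shown in blue on the original slide):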
CTS: BA: lp (1.968, 2.031): skew (0.257, 0.194) c(0.076, 0.072) viol(y y)
CTS: -----------------------------------------------
CTS: Starting clustering for bufbd7 with target load = worst[0.000 0.005]
CTS: BA: rootNetCap = 0.071776: targ cap = 0.045000: targ wirecap = 0.000000: not relaxed
CTS: Completed 2 to 2 clustering
CTS: Starting clustering for bufbd7 with target load = worst[0.000 0.005]
CTS: BA: rootNetCap = 0.071776: targ cap = 0.045000: targ wirecap = 0.000000: not relaxed
CTS: Completed 2 to 1 clustering
CTS: BA: this delay [max min] (skew) = worst[2.040 1.844] (0.196)
CTS: BA: next delay [max min] (skew) = worst[2.161 1.965] (0.196)
CTS: BA: target cap = 0.048 pf
CTS: Pin 1: periph/U5659/A is selected for next level
CTS: delay [max min] (skew) = worst[1.976 1.921] (0.055)
CTS: Starting clustering for bufbd7 with target load = worst[0.000 0.005]
CTS: Completed 2 to 2 clustering
CTS: BA: lp (2.031, 2.153): skew (0.194, 0.210) c(0.072, 0.026) viol(n n)
CTS: -----------------------------------------------

• When the phase delay of the hookup pin periph/U5659/A matches the
delay of the already built tree at that gate level, it will be clustered at that buffer
level.

Meeting Target Early Delay
• After the synthesis of the root clock net (gate level 1 synthesis), the tool checks if the delay
constraint set by the user is being met or not.

• If it is not met, the tool inserts some buffers at the root clock net to achieve the target delay
specified by the user.

• In the following message, 16 buffers are inserted at the root clock net to increase the delay from
0.569ns to 2ns, which is the user specified target.
CTS: gate level 1 clock tree synthesis
CTS: clock net = sys_clk
CTS: driving pin = sys_clk
CTS: gate level 1 design rule constraints [rise fall]
...
CTS: gate level 1 target spec [rise fall]
...
CTS: gate level 1 timing constraints
CTS: clock skew = worst[0.000]
CTS: insertion delay = worst[2.000]   <-- Constraint set by the user
CTS: levels per net = 200
CTS: -----------------------------------------------
CTS: Starting clustering for CLKBUF_X20 with target load = worst[0.211 0.270]
...
CTS: -----------------------------------------------
CTS: Starting clustering for CLKBUF_X20 with target load = worst[0.211 0.270]
CTS: Completed 19 to 2 clustering
CTS: BA: lp (0.563, 0.569): skew (0.142, 0.112) c(0.008, 0.008) viol(n n)
CTS: -----------------------------------------------
CTS: Inserting delay cells for clock tree sys_clk ...
CTS: current delay = worst[0.569] worst[0.457]
CTS: constraint = worst[2.000] worst[0.000]
CTS: inserted 16 (buffd3) delay cells to the clock net sys_clk
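The delay that the inserted cells have to make up is the difference between the constraint and the current delay. The sketch below pulls those numbers out of the three messages shown above; the function name and the returned keys are assumptions for this example, and only the first value of each "current delay" and "constraint" line is read.

```python
import re


def parse_delay_insertion(log_text):
    """Summarize one 'Inserting delay cells' block, or return None."""
    cur = re.search(r"CTS:\s+current delay\s+=\s+\w+\[([\d.]+)\]", log_text)
    tgt = re.search(r"CTS:\s+constraint\s+=\s+\w+\[([\d.]+)\]", log_text)
    ins = re.search(
        r"CTS:\s+inserted (\d+) \((\S+)\) delay cells to the clock net (\S+)",
        log_text)
    if not (cur and tgt and ins):
        return None
    current, target = float(cur.group(1)), float(tgt.group(1))
    return {"clock_net": ins.group(3),
            "cell": ins.group(2),
            "count": int(ins.group(1)),
            "current_delay": current,
            "target_delay": target,
            "delay_to_add": target - current}
```

For the block above this gives 16 buffd3 cells on sys_clk covering a gap of about 1.431 ns between the 0.569 ns tree and the 2 ns target.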

Synthesis Results of One Gate Level
After the synthesis of a gate level, the results are printed in the log:
CTS: gate level 1 clock tree synthesis results
CTS: clock net : sdram_clk
CTS: driving pin: sdram_clk
CTS: load pins : 5 sink pins, 0 gates/macros pins, 0 ignore pins
CTS: buffer level 1: bufbd7 (1)
CTS: buffer level 2: bufbd7 (1)
CTS: clock tree skew = worst[0.036]
CTS: longest path delay = worst[0.327](rise)
CTS: shortest path delay = worst[0.291](rise)
CTS: total capacitance = worst[0.389 0.389]
CTS: buffer level phase delay
CTS: 1 (I): worst[0.293](rise), worst[0.256](rise); skew = worst[0.036]
CTS: (O): worst[0.151](rise), worst[0.129](rise); skew = worst[0.022]
CTS: 2 (I): worst[0.150](rise), worst[0.128](rise); skew = worst[0.022]
CTS: (O): worst[0.004](rise), worst[0.000](rise); skew = worst[0.004]
CTS: buffer level output transition delays [rise fall]
CTS: level 0: worst[0.088 0.085] worst[0.088 0.085]
CTS: load 0: worst[0.088 0.085] worst[0.088 0.085]
CTS: level 1: worst[0.111 0.115] worst[0.091 0.092]
CTS: load 1: worst[0.111 0.115] worst[0.091 0.092]
CTS: level 2: worst[0.158 0.153] worst[0.080 0.071]
CTS: load 2: worst[0.158 0.153] worst[0.080 0.071]
CTS: buffer level total load capacitance
CTS: level 0: worst[0.045 0.045]
CTS: level 1: worst[0.093 0.093]
CTS: level 2: worst[0.251 0.251]
CTS: drc violations: 0 0
• The clock tree skew, longest path delay, and shortest path delay are the skew and insertion delay at the driving pin (pin A in the slide diagram, here sdram_clk)
• worst is the operating condition
• The load capacitance values of the buffer levels are added and reported as the total capacitance of the subtree
• The two values after drc violations are the number of cap violations and the number of trans violations
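Two relationships in this report can be verified with simple arithmetic: the clock tree skew is the longest path delay minus the shortest path delay, and the total capacitance is the sum of the per-level load capacitances. The helper below is a sketch; its name and the tolerance (chosen to absorb the three-decimal rounding in the log) are assumptions.

```python
def check_gate_level_summary(longest, shortest, level_loads,
                             reported_skew, reported_total_cap, tol=0.0015):
    """Recompute skew and total capacitance and compare with the log values."""
    skew = longest - shortest
    total_cap = sum(level_loads)
    return {"skew": skew,
            "total_cap": total_cap,
            "skew_ok": abs(skew - reported_skew) <= tol,
            "cap_ok": abs(total_cap - reported_total_cap) <= tol}
```

For the sdram_clk results, 0.327 - 0.291 gives the reported skew of 0.036, and 0.045 + 0.093 + 0.251 gives the reported total capacitance of 0.389.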

Maximum Transition and Capacitance
Violations
• After each gate level is synthesized, the maximum capacitance and
maximum transition violations at that gate level are reported
CTS: gate level 3 clock tree synthesis results
...
CTS: buffer level total load capacitance
...
CTS: capacitance violation on periph/CTS_755
i h/CTS 755
CTS: capacitance = worst[0.052 0.052]
CTS: constraint = worst[0.050 0.050]
CTS: capacitance violation on periph/CTS_757
CTS: capacitance = worst[0.051 0.051]
CTS: constraint = worst[0.050 0.050]
...
CTS: transition delay violation at periph/CLKBUFX20_G3B1I3/A
CTS: transition delay = worst[0.052 0.050] worst[0.052 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: transition delay violation at periph/CLKBUFX20_G3B2I14/A
CTS: transition delay = worst[0.053 0.051] worst[0.053 0.051]
CTS: constraint = worst[0.050 0.050]
...
CTS: drc violations: 18 5   <-- Number of cap violations, number of trans violations
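To see how far each net exceeds its limit, the capacitance violation blocks can be reduced to an overshoot per net. The pattern below assumes the three-line layout shown on this slide; the function name and the choice to report the larger of the rise and fall overshoot are illustrative.

```python
import re

CAP_VIOLATION = re.compile(
    r"CTS:\s+capacitance violation on (\S+)\s*\n"
    r"\s*CTS:\s+capacitance\s+=\s+\w+\[([\d.]+) ([\d.]+)\]\s*\n"
    r"\s*CTS:\s+constraint\s+=\s+\w+\[([\d.]+) ([\d.]+)\]")


def capacitance_overshoots(log_text):
    """Map each violating net to capacitance minus constraint (worst edge)."""
    overshoots = {}
    for m in CAP_VIOLATION.finditer(log_text):
        rise = float(m.group(2)) - float(m.group(4))
        fall = float(m.group(3)) - float(m.group(5))
        overshoots[m.group(1)] = max(rise, fall)
    return overshoots
```

Both nets in the excerpt exceed the 0.050 limit by only 0.001 to 0.002, which suggests the constraint itself may be too tight rather than the tree being badly built.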

A More Complex Synthesis Result
CTS: gate level 1 clock tree synthesis results
CTS: clock net : clk
CTS: driving pin: clk
CTS: load pins : 80 sink pins, 0 gates/macros pins, 0 ignore pins
CTS: buffer level 1: CLKBUFX20 (1)
CTS: buffer level 2: CLKBUFX20 (2) CLKBUFX12 (1)
CTS: clock tree skew = worst[0.001]
CTS: longest path delay = worst[0.248](rise)
CTS: shortest path delay = worst[0.246](rise)
CTS: total capacitance = worst[0.549 0.549]
CTS: buffer level phase delay
CTS: 1 (I): worst[0.247](rise), worst[0.246](rise); skew = worst[0.001]
CTS: (O): worst[0.141](rise), worst[0.140](rise); skew = worst[0.001]
CTS: 2 (I): worst[0.141](rise), worst[0.140](rise); skew = worst[0.001]
CTS: (O): worst[0.001](rise), worst[0.000](rise); skew = worst[0.001]
CTS: buffer level output transition delays [rise fall]
CTS: level 0: worst[0.000 0.000] worst[0.000 0.000]
CTS: load 0: worst[0.000 0.000] worst[0.000 0.000]
CTS: level 1: worst[0.089 0.076] worst[0.089 0.076]
CTS: load 1: worst[0.089 0.076] worst[0.089 0.076]
CTS: level 2: worst[0.109 0.093] worst[0.104 0.091]
CTS: load 2: worst[0.109 0.093] worst[0.104 0.091]
CTS: buffer level total load capacitance
CTS: level 0: worst[0.038 0.038]
CTS: level 1: worst[0.108 0.108]
CTS: level 2: worst[0.403 0.403]
CTS: drc violations: 0 0

Gate Level and Buffer Level Nomenclature
[Figure: clock tree diagram. Gate level 1 starts at the clock source pin and contains buffer level 1 and buffer level 2 of gate level 1. Gate level 2 contains buffer levels 1 through 4 of gate level 2. Red: preexisting gates. Black: CTS introduced gates.]
At each gate level, the clock tree is built bottom-up, but the buffer names are changed
to appear top-down
DRC Violation Report After Synthesis
• After building the complete clock tree, all the remaining DRC violations in
the entire clock tree are reported in the log file:
CTS: Clock tree synthesis completed successfully
CTS: CPU time: 50 seconds
CTS: Reporting clock tree violations ...
CTS: Global design rules:   <-- Constraints
CTS: maximum transition delay [rise,fall] = [0.05,0.05]
CTS: maximum capacitance = 0.05
CTS: maximum fanout = 2000
CTS: maximum buffer levels per net = 200
CTS: transition delay violation at sdram_clk   <-- Reports only transition and capacitance violations
CTS: user specified transition delay = worst[0.056 0.050] worst[0.056 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: transition delay violation at CLKBUF_X20_G1B21I1/Z
CTS: transition delay = worst[0.051 0.050] worst[0.051 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: capacitance violation on CTS_6557
CTS: capacitance = worst[0.074 0.074]
CTS: constraint = worst[0.050 0.050]
CTS: Summary of clock tree violations:   <-- Total transition and capacitance violations
CTS: Total number of transition violations = 2
CTS: Total number of capacitance violations = 1

Summary Report After
Clock Tree Synthesis
CTS: ------------------------------------------------
CTS: Clock Tree Synthesis Summary
CTS: ------------------------------------------------
CTS: 5 clock domain synthesized
CTS: 30 gated clock nets synthesized
CTS: 26 buffer trees inserted
CTS: 722 buffers used (total size = 45974.2)
CTS: 752 clock nets total capacitance = worst[76.868 76.868]

Note: Each gate level can have multiple nets

Clock-by-Clock Summary
• A summary is reported for each clock:
CTS: ------------------------------------------------
CTS: Clock-by-Clock Summary
CTS: ------------------------------------------------
CTS: Root clock net pclk
CTS: 3 gated clock nets synthesized
CTS: 2 buffer trees inserted   <-- Buffer tree is inserted only if necessary
CTS: 2 buffers used (total size = 159.667)
CTS: 5 clock nets total capacitance = worst[0.514 0.514]
CTS: clock tree skew = worst[0.341]
CTS: longest path delay = worst[5.959](rise)
CTS: shortest path delay = worst[5.619](rise)
CTS: Root clock net sys_clk
...
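To compare clocks across runs, the clock-by-clock summary can be collected into one record per root clock net. This is a sketch only, assuming the line shapes shown above; `parse_clock_summaries` and the 0.002 tolerance are illustrative choices, not tool behavior.

```python
import re

# Sample lines copied from the clock-by-clock summary above.
LOG = """\
CTS: Root clock net pclk
CTS: 3 gated clock nets synthesized
CTS: 2 buffer trees inserted
CTS: 2 buffers used (total size = 159.667)
CTS: 5 clock nets total capacitance = worst[0.514 0.514]
CTS: clock tree skew = worst[0.341]
CTS: longest path delay = worst[5.959](rise)
CTS: shortest path delay = worst[5.619](rise)
"""

ROOT = re.compile(r"^CTS:\s+Root clock net (\S+)")
METRIC = re.compile(
    r"^CTS:\s+(clock tree skew|longest path delay|shortest path delay) = worst\[([\d.]+)\]"
)

def parse_clock_summaries(log_text):
    """Map each root clock net to its skew and longest/shortest path delays."""
    clocks = {}
    current = None
    for line in log_text.splitlines():
        m = ROOT.match(line)
        if m:
            current = clocks.setdefault(m.group(1), {})
            continue
        m = METRIC.match(line)
        if m and current is not None:
            current[m.group(1)] = float(m.group(2))
    return clocks

clocks = parse_clock_summaries(LOG)
for name, c in clocks.items():
    spread = c["longest path delay"] - c["shortest path delay"]
    # The log rounds each value to three decimals, so allow a small tolerance.
    consistent = abs(spread - c["clock tree skew"]) < 0.002
    print(name, c, "consistent" if consistent else "check manually")
```

For pclk, the longest minus the shortest path delay is 0.340, which agrees with the reported skew of 0.341 to within rounding.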

Embedded Clock Tree Optimization
• After clock tree synthesis, embedded clock tree optimization begins
• The characteristics of the buffers and inverters used are reported again
CTS: buffer estimated skew target delay driving res input cap
CTS: bufbdf [0.013 0.015] [0.217 0.200] [0.210 0.248] [0.007 0.007]
CTS: inv0da [0.018 0.021] [0.097 0.119] [0.294 0.347] [0.036 0.036]
...

• The global constraints for clock tree are also reported again
CTS: Global design rule constraints [rise fall]
CTS: max transition = worst[0.050 0.050] GUI = worst[0.050 0.050] SDC = undefined/ignored
...
CTS: Global timing/clock tree constraints
CTS: clock skew = worst[0.000]
...
CTS: Global target spec [rise fall]
CTS: transition = worst[0.040 0.040]
...

Note:
Embedded clock tree optimization is called only when the compile_clock_tree
command is used. It is not called when the clock_opt command is used.

More Messages on Real Gates and
Guide Buffers
• At the beginning of optimization, you might get the following
messages:
CTS: Root clock net chip_sclk_src
CTS: clock gate levels = 75
CTS: clock sink pins = 125896
...
CTS: level 73: gates = 3 (real gates = 1)
CTS: level 72: gates = 2 (no real gates, guide buffers only)

• All the gates are guide buffers and inverters inserted during clock
tree synthesis.
• This information is similar to the one printed prior to clock tree
synthesis.
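When a clock has many gate levels, it can help to tabulate how many gates at each level are real gates. The sketch below assumes the two line shapes shown on this slide; treating every gate that is not a real gate as a guide buffer or inverter is an assumption made for illustration.

```python
import re

# Sample lines copied from the messages above.
LOG = """\
CTS: level 73: gates = 3 (real gates = 1)
CTS: level 72: gates = 2 (no real gates, guide buffers only)
"""

LEVEL = re.compile(
    r"^CTS:\s+level (\d+): gates = (\d+) "
    r"\((?:real gates = (\d+)|no real gates, guide buffers only)\)"
)

def parse_levels(log_text):
    """Return one record per gate level with total, real, and non-real gate counts."""
    levels = []
    for line in log_text.splitlines():
        m = LEVEL.match(line)
        if not m:
            continue
        gates = int(m.group(2))
        real = int(m.group(3)) if m.group(3) else 0
        # Assumption: gates that are not real gates are guide buffers or inverters.
        levels.append({"level": int(m.group(1)), "gates": gates,
                       "real": real, "non_real": gates - real})
    return levels

levels = parse_levels(LOG)
for record in levels:
    print(record)
```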

Gate Level Optimization
• Clock tree optimization is also done for each gate level, similar to when
the clock tree is built

• Before optimizing a gate level, the current skew, longest path delay, and shortest
path delay from the driving pin of that gate level are reported.

CTS: gate level 2 clock tree optimization
CTS: clock net = I_BLENDER_1/gclk
CTS: driving pin = I_BLENDER_1/U483/Z
CTS: clock tree skew = worst[0.517]
CTS: longest path delay = worst[5.339](rise)
CTS: shortest path delay = worst[4.822](fall)

• That gate level is then optimized

Buffer Sizing

• The following message indicates that buffer sizing was successful


CTO-BS: Starting buffer sizing ...
Information: Replaced the library cell of CLKBUF_X20_G2B2I1 from CLKBUF_X20 to CLKBUF_X16. (CTS-152)
CTO-BS: CPU time = 0 seconds for buffer sizing

• Clock tree optimization tries to resize buffers to improve skew and
insertion delay. If resizing is not beneficial, the original cell
master is restored.

CTO-BS: Starting buffer sizing ...


CTO-BS: Restoring original cellMaster <CLKBUF_X20> of <CLKBUF_X20_G2B2I4>
CTO-BS: CPU time = 1 seconds for buffer sizing
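The two outcomes of buffer sizing (a CTS-152 replacement or a restored cell master) can be separated with two patterns. This is a minimal sketch, assuming each message sits on a single line as shown above; the variable names are illustrative.

```python
import re

# Sample lines copied from the two buffer sizing excerpts above.
LOG = """\
CTO-BS: Starting buffer sizing ...
Information: Replaced the library cell of CLKBUF_X20_G2B2I1 from CLKBUF_X20 to CLKBUF_X16. (CTS-152)
CTO-BS: CPU time = 0 seconds for buffer sizing
CTO-BS: Starting buffer sizing ...
CTO-BS: Restoring original cellMaster <CLKBUF_X20> of <CLKBUF_X20_G2B2I4>
CTO-BS: CPU time = 1 seconds for buffer sizing
"""

REPLACED = re.compile(r"Replaced the library cell of (\S+) from (\S+) to (\S+)\. \(CTS-152\)")
RESTORED = re.compile(r"Restoring original cellMaster <([^>]+)> of <([^>]+)>")

resized = []   # (instance, old library cell, new library cell)
restored = []  # (instance, library cell that was kept)
for line in LOG.splitlines():
    m = REPLACED.search(line)
    if m:
        resized.append(m.groups())
        continue
    m = RESTORED.search(line)
    if m:
        master, instance = m.groups()
        restored.append((instance, master))

print("resized:", resized)
print("restored:", restored)
```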

Gate Sizing
CTO-GS: Starting gate sizing ...
Information: Replaced the library cell of I7188625 from TLQMUX2X60 to TULQMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I7586451 from TLTMUX2X60 to TLTMUX2X50. (CTS-152)
Information: Replaced the library cell of I3342873 from TULTMUX2X50 to TLTMUX2ZSX60. (CTS-152)
Information: Replaced the library cell of I1387108 from TULTMUX2X80 to TULTMUX2ZSX80. (CTS-152)
... 14 cells sized
Information: Replaced the library cell of I6717862 from THQMUX2ZSX80 to TSTMUX2ZSX20. (CTS-152)
Information: Replaced the library cell of I9359863 from TLTMUX2ZSX80 to TULTMUX2ZSX60. (CTS-152)
Information: Replaced the library cell of I10258160 from TLTMUX2ZSX60 to TLTMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I7636259 from TLTMUX2ZFFX80 to TULTMUX2ZSX60. (CTS-152)
CTO-GS: 1: Sized 14/40 cell instances (tested 40X247)
CTO-GS: delay (from) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: delay (to) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: improvement = worst[0.106%]
Information: Replaced the library cell of I2130284 from TLTMUX2X80 to TLTMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I8618764 from TLTMUX2ZFFX80 to TLTMUX2X80. (CTS-152)
Information: Replaced the library cell of I1749911 from TULTMUX2ZFFFX80 to TULTMUX2ZFFX80. (CTS-152)
Information: Replaced the library cell of I3342873 from TLTMUX2ZSX60 to TLTMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I8872989 from TULTMUX2ZFFFX60 to TLTMUX2ZFFX80. (CTS-152)
Information: Replaced the library cell of I1387108 from TULTMUX2ZSX80 to TULTMUX2X50. (CTS-152)
CTO-GS: 2: Sized 6/40 cell instances (tested 40X247)
CTO-GS: delay (from) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: delay (to) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: improvement = worst[0.000%]
CTO-GS: Summary of cell sizing
CTO-GS: Sized 20/40 cell instances (tested 80X247)
CTO-GS: delay (from) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: delay (to) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: improvement = worst[0.106%]
CTO-GS: CPU time = 2413 seconds for gate sizing

• The lines after "1:" summarize the first round of sizing
  - Number of gates sized (here 14 out of 40 gates)
  - Improvement in skew
• "Summary of cell sizing" is the overall summary of the gate sizing done at
this gate level: a total of 14 + 6 = 20 gates were sized, giving a 0.106%
improvement in skew at this gate level
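The relationship between the per-round lines and the final summary (14 + 6 = 20 in this example) can be checked automatically. The following is a sketch under the assumption that round lines carry a "<n>:" prefix and the summary line does not, as in the excerpt above.

```python
import re

# Sample lines copied from the gate sizing log above (replacement messages omitted).
LOG = """\
CTO-GS: 1: Sized 14/40 cell instances (tested 40X247)
CTO-GS: improvement = worst[0.106%]
CTO-GS: 2: Sized 6/40 cell instances (tested 40X247)
CTO-GS: improvement = worst[0.000%]
CTO-GS: Summary of cell sizing
CTO-GS: Sized 20/40 cell instances (tested 80X247)
CTO-GS: improvement = worst[0.106%]
"""

ROUND = re.compile(r"^CTO-GS:\s+(\d+): Sized (\d+)/(\d+) cell instances")
SUMMARY = re.compile(r"^CTO-GS:\s+Sized (\d+)/(\d+) cell instances")

per_round = {}        # round number -> cells sized in that round
summary_sized = None  # total from the "Summary of cell sizing" section
for line in LOG.splitlines():
    m = ROUND.match(line)
    if m:
        per_round[int(m.group(1))] = int(m.group(2))
        continue
    m = SUMMARY.match(line)
    if m:
        summary_sized = int(m.group(1))

print("per round:", per_round)
print("summary:", summary_sized, "matches rounds:", sum(per_round.values()) == summary_sized)
```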

Gate Relocation

• Gate relocation works on preexisting gates.

• If you have no preexisting gates, you might see the following message:

CTO-GR: gate relocation is skipped since there are no hookup pins

A Successful Gate Relocation
CTO-GR: Starting gate relocation ...
CTO-GR: delay [max min] (skew) = worst[9.023 8.563] (0.460)
CTO-GR: 1: Relocated 1/40 cell instances (tested 2 cell instances at 47 points)
CTO-GR: delay (from) = worst[9.023] worst[8.563]; skew = worst[0.460]
CTO-GR: delay (to) = worst[9.023] worst[8.563]; skew = worst[0.460]
CTO-GR: improvement = worst[0.000%]
CTO-GR: delay [max min] (skew) = worst[9.018 8.563] (0.455)
CTO-GR: delay [max min] (skew) = worst[9.018 8.563] (0.455)
CTO-GR: 2: Relocated 2/40 cell instances (tested 5 cell instances at 83 points)
CTO-GR: delay (from) = worst[9.023] worst[8.563]; skew = worst[0.460]
CTO-GR: delay (to) = worst[9.018] worst[8.563]; skew = worst[0.455]
CTO-GR: improvement = worst[1.118%]
CTO-GR: Summary of cell relocation
CTO-GR: Relocated 3/40 cell instances (tested 7 cell instances at 130 points)
CTO-GR: delay (from) = worst[9.023] worst[8.563]; skew = worst[0.460]
CTO-GR: delay (to) = worst[9.018] worst[8.563]; skew = worst[0.455]
CTO-GR: improvement = worst[1.118%]
CTO-GR: CPU time = 2 seconds for gate relocation

• In round 1, 2 cells were tried at 47 new locations, and 1 was moved
• "delay (from)" gives the initial skew, "delay (to)" the final skew, and
"improvement" the improvement in skew
• "Summary of cell relocation" is the overall summary of gate relocation at
this gate level

Gate Relocation: Failed Attempts

CTO-GR: Starting gate relocation ...


CTO-GR: Summary of cell relocation
CTO-GR: Relocated 0/1 cell instances (tested 1 cell instances at 24 points)
CTO-GR: delay (from) = worst[1.207] worst[0.980]; skew = worst[0.227]
CTO-GR: delay (to) = worst[1.207] worst[0.980]; skew = worst[0.227]
CTO-GR: improvement = worst[0.000%]
CTO-GR: CPU time = 0 seconds for gate relocation

• In this example, clock tree optimization tried to move one gate
instance to 24 different locations. Since none of the attempts improved
the QoR, the gate relocation was abandoned.

Buffer Relocation

• Buffer relocation is done on all buffers inserted by clock tree synthesis

CTO-BR: Buffer relocation ...
CTO-BR: Optimization level: net
CTO-BR: delay [max min] (skew) = worst[9.087 8.503] (0.584)
CTO-BR: 1: Relocated 1/6 cell instances (tested 6 cell instances at 74 points)
CTO-BR: delay (from) = worst[9.099] worst[8.503]; skew = worst[0.596]
CTO-BR: delay (to) = worst[9.087] worst[8.503]; skew = worst[0.584]
CTO-BR: improvement = worst[2.013%]
CTO-BR: delay [max min] (skew) = worst[9.087 8.503] (0.584)
CTO-BR: 2: Relocated 1/6 cell instances (tested 5 cell instances at 62 points)
CTO-BR: delay (from) = worst[9.087] worst[8.503]; skew = worst[0.584]
CTO-BR: delay (to) = worst[9.087] worst[8.503]; skew = worst[0.584]
CTO-BR: improvement = worst[0.000%]
CTO-BR: Summary of cell relocation
CTO-BR: Relocated 2/6 cell instances (tested 11 cell instances at 136 points)
CTO-BR: delay (from) = worst[9.099] worst[8.503]; skew = worst[0.596]
CTO-BR: delay (to) = worst[9.087] worst[8.503]; skew = worst[0.584]
CTO-BR: improvement = worst[2.013%]
CTO-BR: CPU time = 0 seconds for buffer relocation

• The information is similar to gate relocation
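The "improvement" percentage in the relocation summaries appears to be the relative reduction in skew. The sketch below tests that reading; the formula is an assumption inferred from the numbers in these excerpts, not a documented definition.

```python
def skew_improvement_percent(skew_from, skew_to):
    """Relative skew reduction in percent: (from - to) / from * 100 (assumed formula)."""
    if skew_from == 0:
        return 0.0
    return (skew_from - skew_to) / skew_from * 100.0

# Buffer relocation summary: skew goes from 0.596 to 0.584.
buffer_reloc = skew_improvement_percent(0.596, 0.584)

# Gate relocation summary: skew goes from 0.460 to 0.455.
gate_reloc = skew_improvement_percent(0.460, 0.455)

print(f"buffer relocation: {buffer_reloc:.3f}%")
print(f"gate relocation:   {gate_reloc:.3f}%")
```

For the buffer relocation summary the assumed formula reproduces the logged 2.013%. For the gate relocation summary it gives about 1.087% against a logged 1.118%; a plausible explanation is that the tool computes the percentage from unrounded skew values, so small differences are expected when working from the three-decimal numbers in the log.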

Post Embedded Clock Tree Synthesis
• After the embedded clock tree optimization, the tool prints the summary.
• It has the same format as the summary printed after clock tree synthesis.
CTS: ------------------------------------------------
CTS: Clock Tree Optimization Summary
CTS: ------------------------------------------------
CTS: 4 clock domain synthesized
CTS: 5 gated clock nets synthesized
CTS: 5 buffer trees inserted
CTS: 1000 buffers used (total size = 16570.8)
CTS: 1005 clock nets total capacitance = worst[14.010 14.010]
CTS: ------------------------------------------------
CTS: Clock-by-Clock Summary
CTS: ------------------------------------------------
CTS: Root clock net sdram_clk
CTS: 1 gated clock nets synthesized
CTS: 1 buffer trees inserted
CTS: 302 buffers used (total size = 5039.47)
CTS: 303 clock nets total capacitance = worst[4.170 4.170]
CTS: clock tree skew = worst[0.035]
CTS: longest path delay = worst[2.041](rise)
CTS: shortest path delay = worst[2.006](fall)
CTS: Root clock net sys_2x_clk
...

• After the summary, all the transition and capacitance violations on the clock tree are also reported.
CTS: Global design rules:
CTS: maximum transition delay [rise,fall] = [0.05,0.05]
CTS: maximum capacitance = 0.05
CTS: maximum fanout = 2000
CTS: maximum buffer levels per net = 200
CTS: transition delay violation at sdram_clk
CTS: user specified transition delay = worst[0.056 0.050] worst[0.056 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: transition delay violation at buffd2_G1B1I1/Z
...
CTS: Summary of clock tree violations:
CTS: Total number of transition violations = 3994
CTS: Total number of capacitance violations = 1

DRC Fixing Beyond Exceptions
• After embedded clock tree optimization, the tool will start fixing the
DRC violations beyond exceptions.
• The messages are similar to clustering:
CTS: fixing DRC beyond exception pins under clock CLK1

CTS: gate level 2 DRC fixing (exception level 1)


CTS: clock net = CLK1_G1IP
CTS: driving pin = bufbd2_G1IP_1/Z
CTS: gate level 2 design rule constraints [rise fall]
CTS: max transition = worst[0.100 0.100]
CTS: max capacitance = worst[0.600 0.600]
CTS: max fanout = 2000
CTS: -----------------------------------------------
CTS: Starting clustering for bufbdf with target load = worst[0.056 0.056]
CTS: Completed 4 to 1 clustering
CTS: -----------------------------------------------
CTS: Starting clustering for bufbd7 with target load = worst[0.050 0.050]
CTS: Completed 1 to 1 clustering
CTS: ------------------------------------------------

• After fixing the DRC violations, the whole summary and the clock-by-clock
summary of DRC fixing beyond exceptions are reported.

Placement Legalization is Called
After Clock Tree Synthesis
• When clock tree synthesis places a clock tree buffer or inverter, it
places it at a legal location, but the location might already be occupied
  - This causes overlaps, which need to be resolved
• The tool calls the placement legalizer, which moves the cells to
resolve the overlaps.
• After legalization, the cells with large displacement are reported in
the log (the example shows 1 of the 6 cells that were displaced):
Largest displacement cells:
Cell: periph/U122 (AND3X)
Input location: (906.380 1597.520)
Legal location: (897.140 1582.400)
Displacement: 17.720 um, e.g. 3.52 row height.
Total 6 cells has large displacement (e.g. > 15.120 um or 3 row height)
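The displacement figures in this report can be reproduced from the two locations. The sketch below assumes that the displacement is the straight-line (Euclidean) distance between the input and legal locations and that the row height is 15.120 um / 3, as implied by the threshold line; both are inferences from this example rather than documented behavior.

```python
import math

def displacement(input_loc, legal_loc):
    """Straight-line distance in um between two (x, y) locations (assumed metric)."""
    dx = input_loc[0] - legal_loc[0]
    dy = input_loc[1] - legal_loc[1]
    return math.hypot(dx, dy)

# Values for cell periph/U122 from the log excerpt above.
d = displacement((906.380, 1597.520), (897.140, 1582.400))

# The threshold line says "15.120 um or 3 row height", so one row is assumed to be 5.04 um.
ROW_HEIGHT_UM = 15.120 / 3
rows = d / ROW_HEIGHT_UM

print(f"{d:.3f} um, {rows:.2f} row heights")
```

With these assumptions the result matches the logged 17.720 um and 3.52 row heights, which suggests the reported displacement is a Euclidean rather than a Manhattan distance.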

Agenda
• Prerequisites for Clock Tree Synthesis

• Enabling Useful Debug Messages in IC Compiler Clock


Tree Synthesis

• Clock Tree Synthesis Log Messages

• Clock Tree Optimization Log Messages

The optimize_clock_tree Command
Log File Messages

• Optimization options
• Report before optimization
• Optimization
• Report after optimization

Standalone Optimization Using the
optimize_clock_tree Command
• Standalone optimization differs from embedded optimization in the
algorithms used

• Some of the log messages are similar to those printed when you use the
compile_clock_tree command
  - Design update information
  - Buffer characterization
  - Pruning of cells
  - List of cells used for clock tree optimization

CTS-352 Warning
• The default delay calculation engine is Elmore. Elmore delay
calculation might lead to inferior accuracy in skew and latency
estimation.

• Enable the Arnoldi delay calculation engine for more accurate delay
calculation during optimization, by using the following command:
set_delay_calculation -clock_arnoldi

• Otherwise, the optimize_clock_tree command will issue the
following warning:
Warning: set_delay_calculation is currently set to 'elmore'.
'clock_arnoldi' is suggested. (CTS-352)
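A quick way to notice CTS-352 (and other coded messages) in a long log is to count messages by severity and code. The following is a minimal sketch, assuming each message has been joined onto a single line and ends with its code in parentheses; messages that wrap across lines, as the warning does on this slide, would need to be joined first.

```python
import re
from collections import Counter

# Sample messages quoted in this presentation, each joined onto one line.
LOG = """\
Warning: Some cells in the design are not legal. (CTS-242)
Warning: set_delay_calculation is currently set to 'elmore'. 'clock_arnoldi' is suggested. (CTS-352)
Information: Replaced the library cell of CLKBUF_X20_G2B2I1 from CLKBUF_X20 to CLKBUF_X16. (CTS-152)
"""

# Severity prefix, message text, then the message code at the end of the line.
MESSAGE = re.compile(r"^(Warning|Information): .*\((CTS-\d+)\)\s*$")

counts = Counter()
for line in LOG.splitlines():
    m = MESSAGE.match(line)
    if m:
        counts[(m.group(1), m.group(2))] += 1

# CTS-352 indicates that optimization ran with Elmore delay calculation.
uses_elmore = counts[("Warning", "CTS-352")] > 0

for (severity, code), n in sorted(counts.items()):
    print(f"{severity:12s} {code}: {n}")
print("Elmore delay calculation in use:", uses_elmore)
```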

Optimization Options

• Before starting optimization, the optimize_clock_tree command reports
the root pin and the optimization options for each clock.
• The following are the options which you have specified, by using the
set_clock_tree_optimization_options command

Initializing parameters for clock CLK2GC:
Root pin: instCLK2GC/Q
Using the following optimization options:
gate sizing : on
gate relocation : on
preserve levels : off
area recovery : on
relax insertion delay : off
balance rc : off
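To confirm that each clock was optimized with the intended settings, the option block can be read into a dictionary per clock. This is a sketch assuming the "name : on/off" layout shown above; `parse_options` is an illustrative helper, not a tool command.

```python
import re

# Sample lines copied from the option report above.
LOG = """\
Initializing parameters for clock CLK2GC:
Root pin: instCLK2GC/Q
Using the following optimization options:
gate sizing : on
gate relocation : on
preserve levels : off
area recovery : on
relax insertion delay : off
balance rc : off
"""

CLOCK = re.compile(r"^Initializing parameters for clock (\S+):")
ROOT = re.compile(r"^Root pin: (\S+)")
OPTION = re.compile(r"^([a-z ]+?)\s*:\s*(on|off)$")

def parse_options(log_text):
    """Map each clock name to its root pin and a dict of option name -> bool."""
    clocks = {}
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        m = CLOCK.match(line)
        if m:
            current = clocks.setdefault(m.group(1), {"root_pin": None, "options": {}})
            continue
        if current is None:
            continue
        m = ROOT.match(line)
        if m:
            current["root_pin"] = m.group(1)
            continue
        m = OPTION.match(line)
        if m:
            current["options"][m.group(1)] = (m.group(2) == "on")
    return clocks

result = parse_options(LOG)
print(result)
```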

Preoptimization Report
• Before the tool begins to optimize the clock tree, it reports some of
the current characteristics of the clock tree:
*****************************************
* Preoptimization report (clock 'CLK3') *
*****************************************
Corner 'max'
Estimated Skew (r/f/b) = (0.073 0.000 0.073)
Estimated Insertion Delay (r/f/b) = (1.903 -inf 1.903)
Corner 'RC-ONLY'
Estimated Skew (r/f/b) = (0.005 0.000 0.005)
Estimated Insertion Delay (r/f/b) = (0.008 -inf 0.008)
Wire capacitance = 0.8 pf
Total capacitance = 2.3 pf
Max transition = 0.448 ns
Cells = 24 (area=67.500000)
Buffers = 23 (area=67.500000)
Buffer Types
============
bufbd2: 1
bufbdf: 8
bufbd7: 5
bufbd4: 3
bufbd1: 6

• The header gives the clock name, and 'max' is the CTS corner
• Estimated Skew and Estimated Insertion Delay are the starting skew and
ID for the clock as seen by CTO
• Max transition is the maximum transition value present in the clock tree
• Buffer Types gives information about the buffers and inverters present
in the clock tree

Optimization Messages
• During optimization, the tool prints out messages for sizing, insertion
and removal, and switching of metal layers:

Deleting cell I_SDRAM_TOP/bufbda_G1B1I10 and output net I_SDRAM_TOP/sdram_clk_G1B1I10.
iteration 1: (0.314104, 3.328620) (skew, ID)
Total 1 buffers removed on clock CLK3
Start (3.256, 3.527), End (3.015, 3.329)
....
iteration 2: (0.313991, 3.314841)
iteration 3: (0.308073, 3.295621)
Total 2 cells sized on clock CLK3
Start (3.015, 3.329), End (2.988, 3.296)
....
iteration 6: (0.305181, 3.275623)
Total 1 delay buffers added on clock sck_in12 (LP)
Start (2.975, 3.283), End (2.970, 3.276)
....
Switch to low metal layer for clock 'CLK3':
Total 9 out of 13 nets switched to low metal layer for clock 'CLK3' with largest cap
change 0.00 percent

• The excerpts show, in order: buffer removal, cell sizing, buffer
insertion, and metal layer switching
• Start (sp, lp): initial delays
• End (sp, lp): final delays
• sp: shortest path delay
• lp: longest path delay
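The iteration lines and the Start/End lines describe the same state in two ways: the skew on an iteration line should equal lp minus sp of the matching End pair, up to rounding. The sketch below checks this on the excerpt above; the pairing of iteration lines with End values is an assumption based on this example.

```python
import re

# Sample lines copied from the optimization messages above.
LOG = """\
iteration 1: (0.314104, 3.328620) (skew, ID)
Total 1 buffers removed on clock CLK3
Start (3.256, 3.527), End (3.015, 3.329)
iteration 2: (0.313991, 3.314841)
iteration 3: (0.308073, 3.295621)
Total 2 cells sized on clock CLK3
Start (3.015, 3.329), End (2.988, 3.296)
"""

ITERATION = re.compile(r"^iteration (\d+): \(([\d.]+), ([\d.]+)\)")
START_END = re.compile(r"^Start \(([\d.]+), ([\d.]+)\), End \(([\d.]+), ([\d.]+)\)")

iterations = []  # (iteration number, skew, insertion delay)
steps = []       # (start_sp, start_lp, end_sp, end_lp)
for line in LOG.splitlines():
    m = ITERATION.match(line)
    if m:
        iterations.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))
        continue
    m = START_END.match(line)
    if m:
        steps.append(tuple(float(v) for v in m.groups()))

# Skew after each step, computed as longest path minus shortest path.
end_skews = [round(end_lp - end_sp, 3) for (_, _, end_sp, end_lp) in steps]

print("iterations:", iterations)
print("skew after each step:", end_skews)
```

Here the computed values 0.314 and 0.308 agree with the skews on the iteration 1 and iteration 3 lines, and the End longest path delay agrees with the ID value on the same lines.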

Optimization Messages

• If the area recovery option is enabled, the tool does area recovery after
optimizing each clock, and reports the changes made to that clock:

Area recovery optimization for clock 'CLK3':
15% 23% 30% 46% 53% 61% 76% 84% 92% 100%
Deleting cell cell I_SDRAM_TOP/bufbda_G1B1I9 and output net I_SDRAM_TOP/sdram_clk_G1B1I9.

Total 1 buffers removed (all paths) for clock 'CLK3'

Post Optimization Report
• After completing the optimization of a clock, the tool reports the new
characteristics of the clock tree.

• This is similar to the information printed before optimization:


**************************************************
* Multicorner optimization report (clock 'CLK3') *
**************************************************
Corner 'max'
Estimated Skew (r/f/b) = (0.041 0.000 0.041)
Estimated Insertion Delay (r/f/b) = (1.725 -inf 1.725)
Corner 'RC-ONLY'
Estimated Skew (r/f/b) = (0.007 0.000 0.007)
Estimated Insertion Delay (r/f/b) = (0.009 -inf 0.009)
Wire capacitance = 0.8 pf
Total capacitance = 2.3 pf
Max transition = 0.356 ns
Cells = 24 (area=59.000000)
Buffers = 23 (area=59.000000)
Buffer Types
============
bufbd7: 4
bufbdf: 6
bufbd4: 5
bufbd1: 7
bufbd2: 1
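Because the pre- and postoptimization reports share a format, the effect of optimization can be summarized by parsing both and taking the difference. The following is a sketch that uses the 'max' corner values from the two reports in this presentation; it reads the third value of each (r/f/b) triple and keeps only the first corner listed, both of which are simplifying assumptions.

```python
import re

# 'max' corner lines from the preoptimization report.
PRE = """\
Corner 'max'
Estimated Skew (r/f/b) = (0.073 0.000 0.073)
Estimated Insertion Delay (r/f/b) = (1.903 -inf 1.903)
Max transition = 0.448 ns
Buffers = 23 (area=67.500000)
"""

# The same lines from the report printed after optimization.
POST = """\
Corner 'max'
Estimated Skew (r/f/b) = (0.041 0.000 0.041)
Estimated Insertion Delay (r/f/b) = (1.725 -inf 1.725)
Max transition = 0.356 ns
Buffers = 23 (area=59.000000)
"""

PATTERNS = {
    "skew": re.compile(r"^Estimated Skew \(r/f/b\) = \(\S+ \S+ ([\d.]+)\)"),
    "insertion_delay": re.compile(r"^Estimated Insertion Delay \(r/f/b\) = \(\S+ \S+ ([\d.]+)\)"),
    "max_transition": re.compile(r"^Max transition = ([\d.]+) ns"),
    "buffer_area": re.compile(r"^Buffers = \d+ \(area=([\d.]+)\)"),
}

def parse_report(text):
    """Extract the metrics named in PATTERNS from one report."""
    values = {}
    for line in text.splitlines():
        for key, pattern in PATTERNS.items():
            m = pattern.match(line)
            # Keep the first match only, i.e. the first corner listed in the report.
            if m and key not in values:
                values[key] = float(m.group(1))
    return values

pre, post = parse_report(PRE), parse_report(POST)
delta = {key: round(post[key] - pre[key], 3) for key in pre}

for key in pre:
    print(f"{key:16s} {pre[key]:8.3f} -> {post[key]:8.3f}  ({delta[key]:+.3f})")
```

In this example all four metrics improve: skew, insertion delay, and maximum transition decrease, and buffer area drops from 67.5 to 59.0 while the buffer count stays at 23.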

Reporting the Longest and Shortest Paths
• The longest and shortest paths corresponding to all corners are reported,
soon after the post optimization report:

++ Longest path for clock CLK3 in corner 'max':


object fan cap trn inc arr r location
clk3 (port) 32 0 0 r ( 440 748)
clk3 (net) 13 97

I_SDRAM_TOP/I_SDRAM_READ_FIFO/reg_array_reg_3__8_/CP (senrq1)
167 4 289 r ( 521 520)

++ Shortest path for clock CLK3 in corner 'max':


object fan cap trn inc arr r location
clk3 (port) 32 0 0 r ( 440 748)
clk3 (net) 13 97

I_SDRAM_TOP/I_SDRAM_READ_FIFO/reg_array_reg_4__11_/CP (senrq1)
217 4 247 r ( 687 656)

• Placement legalization related messages are located at the end of the
optimize_clock_tree command log

Thank you
