Clock Implementation
Technology Guide
Magma Design Automation
Inc.
1650 Technology Drive
San Jose, CA 95110
USA
408-565-7500
October 2007
Talus
1.0
Copyright 1997-2007 Magma Design Automation Inc. All rights reserved.
Clock Implementation Technology Guide, Talus 1.0
This document, as well as the software described in it, are furnished under license and can be used or copied only in
accordance with the terms of such license. The content of this document is furnished for information use only, is subject to
change without notice, and should not be construed as a commitment by Magma Design Automation Inc. Magma Design
Automation Inc. assumes no responsibility or liability for any errors, omissions, or inaccuracies that might appear in this
book.
Except as permitted by such license, no part of this publication can be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, recording, or otherwise, without the prior written
permission of Magma Design Automation Inc. Further, this document and the software described in it constitute the
confidential information of Magma Design Automation Inc. and cannot be disclosed within your company or to any third
party except as expressly permitted by such license.
The absence of a name, tagline, symbol or logo in these lists does not constitute a waiver of any and all intellectual property
rights that Magma Design Automation Inc. has established in any of its product, feature, or service names or logos.
Registered Trademarks
Magma, the Magma logo, Magma Design Automation, Blast Chip, Blast Fusion, Blast Gates, Blast Noise, Blast RTL, Blast
Speed, Blast Wrap, FixedTiming, MegaLab, Melting Logical & Physical Design, MOLTEN, QuickCap, SiliconSmart, Talus,
and YieldManager are registered trademarks of Magma Design Automation Inc.
Trademarks
ArchEvaluator, Automated Chip Creation, Blast Create, Blast DFT, Blast FPGA, Blast Logic, Blast Plan, Blast Power, Blast
Prototype, Blast Rail, Blast SA, Blast View, Blast Yield, Camelot, Characterization-to-Silicon, Design Ahead of the Curve,
Diamond SI, Fastest Path from RTL to Silicon, FineSim, FineWave, GlassBox, HyperCell, MagmaCast, Merlin, Native
Parallel Technology, PALACE, Physical Netlist, Quartz, QuickInd, QuickRules, Relative Floorplanning Constraints, Relative
Placement Constraint, Sign-off in the Loop, Silicon Integrity, SiliconSmart CR, SiliconSmart I/O, SiliconSmart MR,
SiliconSmart SI, SuperSite, and Volcano are trademarks of Magma Design Automation Inc.
Sun, Sun Microsystems, and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the United
States and in other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks
of SPARC International, Inc. in the United States and in other countries. UNIX is a registered trademark of The Open
Group.
All other trademarks are the property of their respective owners.
Notice to U.S. government end users. The software and documentation are "commercial items," as that term is defined at
48 C.F.R. 2.101, consisting of "commercial computer software" and "commercial computer software documentation," as
such terms are used in 48 C.F.R. 12.212 or 48 C.F.R. 227.7202, as applicable. Consistent with 48 C.F.R. 12.212 or 48
C.F.R. 227.7202-1 through 227.7202-4, as applicable, the commercial computer software and commercial computer
software documentation are being licensed to U.S. government end users (A) only as commercial items and (B) with only
those rights as are granted to all other end users pursuant to the terms and conditions set forth in the Magma standard
commercial agreement for this software. Unpublished rights reserved under the copyright laws of the United States.
Magma trademarks, taglines, symbols, and logo are registered trademarks or trademarks of Magma Design Automation
Inc., in the United States and/or other countries. This trademark list is provided for informational purposes only; Magma
Design Automation Inc. does not provide any express or implicit warranties or guarantees with respect to the information
provided in this document.
Printed in the U.S.A.
Typographic Conventions

The following table summarizes typographic conventions or styles used throughout this document to
improve readability.
Visual Cue          What It Means
blue                Indicates hyperlinked text.
Bold                Used in running text to identify Magma commands and options and menu
                    selection sequences. For example:
                    Set the -case option of the config timing crosstalk delay command to
                    best. To save the file, choose File > Save.
Bold italic         Used in running text to identify user-replaceable strings in Magma
                    commands. For example:
                    Use the filename argument to specify the output file for the constraints.
Italic              Used in running text to indicate emphasis, book titles, and generic
                    unknowns such as n.
Courier             Indicates commands, system prompts and output, code from files, error
                    messages, and reports printed by the system. For example:
                    force route power2 ring $mfpringv85
                    Note: Not used in running text.
Courier italic      Indicates user-replaceable strings in Magma commands and code.
                    For example:
                    force cell temperature cell temperature
                    Note: Not used in running text.
- (hyphen)          Precedes an option in command syntax. For example:
                    Use the -domain option to identify the domain.
                    or
                    force delay lib_group -domain domain_name
ALL UPPERCASE       Used in running text and code to indicate logic functions such as AND, OR,
                    and NOR.
/ (slash)           Indicates levels of directory structure in UNIX. For example:
                    /work/top/top.
\ (backslash)       Used in two different ways:
                    In Microsoft NT, indicates levels of directory structure.
                    In Magma code, indicates a continuation of a command line. For
                    example:
                    export spice path $mpath1 -file "path1.sp" \
                    -run_spice "-from U1/Q -to U12/D"
[ ] (brackets)      Denotes optional parameters such as:
                    port1 [port2 ... portn]
| (vertical bar)    Used in command syntax to indicate a choice among literal arguments. For
                    example:
                    config cell -case worst | best | both
_ (underscore)      Connects words that are read as a single term by the system. For example:
                    scan_output_pin
$l                  Used in Magma code to indicate either:
                    The replaceable string library_name.
                    A library that has been set for the design using the set command.
$m                  Used in Magma code to indicate either:
                    The replaceable string model_name.
                    A model that has been set for the design using the set command.
Menu > Command      Shows a menu selection sequence using the > symbol to descend through
                    menu options. For example:
                    File > Save.
Contents
1. Overview of Magma Clock Tree Synthesis
    Magma Clock Tree Synthesis
    Clock Tree Synthesis Methodology
    Constraint Generation
    Clock Implementation
    Clock Tuning
    Clock Tree Synthesis Reporting
2. Clock Constraints
    Using config... Commands to Implement Clock Constraints
    The config clock auto_skew_balance Command
    The config timing clock multiple Command
    The config timing mode multiple Command
    Using force... Commands to Implement Clock Constraints
    The force timing clock Command
    The force plan clock Command
    Applying Constraints to Portions of the Clock Tree
    Applying Skew Balancing Constraints
    Specifying Inverters and Buffers
    Specifying Separate Trees
    Modifying Clock Phases and Skew Groups
    The force clock gate_clone Command
    The force timing adjust_latency Command
    Recomputing Source and Network Latency
    The force timing latency Command
    Specifying Network Latency
    Specifying Source Latency
    Specifying I/O Latency
    Specifying Skew Latency
    The force model routing layer Command
    The force net nondefault Command
    The force net shielding Command
    Using the rule nondefault Command to Implement Clock Constraints
3. Clock Implementation
    Clock Tree Synthesis Implementation
    Preparing for Clock Tree Synthesis
    Clock Signals and Data Signals
    Library Preparation
    Forcing Specific Cells to Be Used by the fix clock Command
    Synthesizing Clocks With the fix clock Command
    Using Higher Effort Clock Tree Synthesis Flows
    Using the run route clock Command for Clock Tree Synthesis Implementation
    Controlling the Scope
    Clock Repeater Naming Conventions
    Handling Gated Clock Trees
    Propagating Nondefault Rules
    Propagating Shielding Rules
    Reducing Crosstalk Sensitivity
    Controlling the Size of Buffers at the Same Level
    Sizing Buffers and Inverters
    Skipping Global Routing
    Skipping the Sign-In Check
    Using a Special Timing Optimization Command for Clock Tree Synthesis Implementation
    Clock Implementation: Key Points To Remember
4. Clock Tuning
    Using the run gate clock Command for Clock Tuning
    Using the Optimization Modes
    The slack Mode
    The skew Mode
    The boundary Mode
    The failing_endpoints Mode
    Understanding Clock Repeater Naming Conventions
    Mixing Optimization Modes
    Controlling the Scope
    Clock Tuning: Key Points To Remember
5. Clock Reporting
    Reporting Commands
    The report clock tree Command
    The report clock skew Command
    The report clock sinks Command
    The report clock latency Command
    Query Commands
    The query clock histogram Command
    The query clock sinks Command
    The query model buffer_count Command
    Exporting Clock Information
    Using the export clock Command
    Using the Clock Viewer
    Clock Reporting: Key Points To Remember
1. Overview of Magma Clock Tree Synthesis
This chapter describes the Magma clock tree synthesis methodology.
Magma Clock Tree Synthesis
The Magma fix clock_hier command performs full-chip hierarchical clock tree routing in the Talus
Automated Chip Creation flow. Magma also provides a prebuilt clock insertion command called fix
clock to use when performing clock tree synthesis in a flat flow. Figure 1 shows the clock insertion
commands in the context of the overall Talus flows.
Figure 1: The Magma Talus Design Flows
The fix clock_hier command performs the following tasks:
• Runs the following pre-clock steps for top-level cells and nets only:
  o Global placement
  o Gate sizing
  o Detailed placement
  o Global routing
• Runs clock prototyping on the entire design to determine clock latency for all the soft macros
• Performs top-level clock routing (replaces the top-level clock tree with a final clock tree to
  achieve accurate budgeting and OCV analysis)
• Performs full-chip clock skew balancing
• Runs incremental global routing
Note: This command does not perform any hold buffering.
[Figure 1 diagram not reproduced. It shows the Talus platform (Talus Design and Talus Vortex) with two
flows, the Flat Design Flow (Chip or Block) and the Automated Chip Creation Design Flow. Commands
appearing in the figure: fix rtl, fix netlist, fix time*, fix plan, fix power, fix cell, fix clock, fix wire,
fix top*, fix block*, fix partition*, fix shape*, fix pin*, fix budget, and fix clock_hier* **.
* Multiprocessing step (multi-CPUs or multithreading)
** Not included in the Talus Vortex flow
Not full functionality in the Talus Design flow]
The fix clock command performs the following tasks:
• Routes and buffers clock nets to minimize skew and insertion delay
• Optimizes the clock network to meet timing and skew objectives
• Inserts buffers to fix hold time violations
• Performs incremental timing and placement optimizations to recover timing as necessary
• Performs detailed placement
This chapter pays special attention to the first two tasks of the fix clock command: routing and
buffering clock nets and optimizing the clock network.
No matter which flow you are using, you need to set clock constraints. See Chapter 2, Clock
Constraints, for details about using constraints in your design.
If you are using the Automated Chip Creation design flow, see the Talus Automated Chip Creation
Flow Guide for more information about the fix clock_hier command.
Clock Tree Synthesis Methodology
The Magma clock tree synthesis methodology involves three fundamental steps:
1. Constraint generation
2. Clock implementation
3. Clock tuning
Magma tools also allow you to create a variety of reports that provide information about the timing of
signal paths in your clock tree.
Constraint Generation
You specify constraints for your clock tree by using config..., force..., and rule... commands. The
commands let you apply specific conditions to your clock tree, ranging from skew constraints to the
inclusion of inverters or buffers. The clock routing and clockgating commands adhere to the
constraints you set.
See Chapter 2, Clock Constraints, for details about using constraints in your design.
Clock Implementation
The run route clock command builds the initial structure of the clock tree. It routes and buffers the
clock tree to minimize skew and insertion delay, taking into account the constraints you specified
earlier in the flow.
This command has several options that control the implementation of your clock tree, including:
• Whether to run the command on the entire clock tree or only portions of it
• How to balance and size the clock structure
• How to propagate nondefault and net shielding rules
See Using the run route clock Command for Clock Tree Synthesis Implementation on page 45 for
an explanation of the run route clock command.
Clock Tuning
The run gate clock command optimizes the previously constructed clock tree to meet your
constraints, which might include the following:
• Target insertion delay
• Skew
• Maximum useful skew
Like the run route clock command, the run gate clock command has options that control its
operation. The options let you fine-tune the clock tree to meet your goals by controlling the
following:
• Whether to run the command on all clock nets or only on particular nets
• How to adjust clock paths by weight to minimize skew or reduce slack
See Chapter 4, Clock Tuning, for an explanation of the run gate clock command.
Clock Tree Synthesis Reporting
Four reporting commands are helpful in gaining insight into the timing of your clock trees:
• The report clock tree command generates a report of the clock tree structure for every clock
  tree in your design. You can configure the report to contain a large variety of information in
  specified columns. Fanout, wire length, pin capacitance, and arrival time are just a few of the
  columns you can create.
• The report clock skew command generates a report of skew statistics, with configurable
  columns, for all clocks in your design. The report can include data about the number of sinks
  and, for each phase of the clock, the maximum rise/fall skew, maximum rise/fall insertion
  delay, and the earliest and latest arriving sinks.
• The report clock sinks command gives information about insertion delay, clock phases, and
  skew groups for sinks in the design. Particularly useful for debugging, it can be called on
  subsections of the clock tree for detailed analysis.
• The query clock histogram command gives a histogram of clock insertion delay distribution.
In addition, the Clock Viewer in the GUI is valuable for studying the clocks in a design. The
Clock Viewer provides a look into skew groups, insertion delay histograms, and individual
clock sinks, while also providing useful cross-probes to path reports, the clock schematic, the
Clock Tree Browser, and the layout window.
See Chapter 5, Clock Reporting, for an explanation of these commands, as well as other
commands that are useful for gathering information about your clock trees.
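The quantities these reports describe can be illustrated with a small Python sketch. It is not the output format or implementation of any Magma command. It assumes the common definition of skew as the difference between the latest and earliest sink insertion delays, and the function names, sink names, and delay values are hypothetical.

```python
from collections import Counter


def skew_summary(insertion_delay_ps):
    """Summarize sink insertion delays (sink name -> delay in picoseconds)."""
    earliest = min(insertion_delay_ps, key=insertion_delay_ps.get)
    latest = max(insertion_delay_ps, key=insertion_delay_ps.get)
    return {
        "sinks": len(insertion_delay_ps),
        "earliest_sink": earliest,
        "latest_sink": latest,
        "max_insertion_delay_ps": insertion_delay_ps[latest],
        # Assumed definition: skew = latest arrival minus earliest arrival.
        "skew_ps": insertion_delay_ps[latest] - insertion_delay_ps[earliest],
    }


def delay_histogram(insertion_delay_ps, bin_ps):
    """Count sinks per insertion-delay bin; keys are the lower bin edges."""
    counts = Counter((d // bin_ps) * bin_ps for d in insertion_delay_ps.values())
    return dict(sorted(counts.items()))


# Hypothetical sinks and delays, for illustration only.
delays = {"ff1/CK": 810, "ff2/CK": 845, "ff3/CK": 790, "ff4/CK": 902}
print(skew_summary(delays)["skew_ps"])   # 112
print(delay_histogram(delays, 50))       # {750: 1, 800: 2, 900: 1}
```

A wide histogram or a large skew value in a model like this points to the sinks worth inspecting first, which is the kind of question the reporting commands are meant to answer on a real design.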
2. Clock Constraints
This chapter describes how to configure the most widely used commands associated with clock tree
constraints: the config... commands, the force... commands, and the rule... commands.
Using config... Commands to Implement Clock Constraints
The config... commands allow you to define specific options used by commands during construction
of the clock tree. This section explains how to use the most commonly used config... commands.
The config clock auto_skew_balance Command
The config clock auto_skew_balance command, when set to on, automatically issues force plan
clock balancing constraints for common situations. The goal is to automatically issue common
constraints so that you do not have to create them manually.
The online man pages contain detailed information about the command syntax. Here are a few
important issues you should keep in mind:
Example:
config clock auto_skew_balance on
As a result of setting the value to on, all clock-related commands (such as run route clock, run
gate clock, and report clock skew) first analyze the clock definition to determine whether
generated clock constraints are present. If generated clock constraints are found, force plan clock
constraints are dynamically created to force the generated clock sinks to be balanced with their root
clock sinks.
In addition, force plan clock constraints are dynamically created to balance the rising phase of each
clock with its falling phase. In other words, rising-edge-triggered and falling-edge-triggered sinks of
the same clock are balanced together.
The force plan clock constraints are not persistent, but are dynamically derived each time a clock
tree construction or skew reporting command is issued. This ensures that any changes to clock
timing constraints are always properly handled by clock tree synthesis and reporting.
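The grouping described above can be pictured with a short Python sketch. It is only a conceptual model, not how the tool derives its constraints. The function name, the dictionary layout, and the "divider" flag (standing in for the tool's detection of a proper divider circuit) are assumptions made for illustration, and the phase names follow the CLK:R and CLK:F style used later in this chapter.

```python
def derive_skew_phases(clocks):
    """Map each clock phase (e.g. "CLK:R") to the clock it is balanced with.

    `clocks` maps a clock name to a dict with:
      "source":  name of the source clock, or None for a root clock
      "divider": True if a proper divider circuit was detected (hypothetical flag)
    """
    def root_of(name):
        # Follow generated clocks up to their root while each step is a proper divider.
        seen = set()
        while clocks[name]["source"] is not None and clocks[name].get("divider", False):
            if name in seen:
                raise ValueError(f"cyclic clock definition at {name}")
            seen.add(name)
            name = clocks[name]["source"]
        return name

    phases = {}
    for name in clocks:
        # Rising and falling phases of a clock always share one balancing group.
        for edge in ("R", "F"):
            phases[f"{name}:{edge}"] = root_of(name)
    return phases


# Hypothetical clocks: one root, one proper divider, one unusual generated clock.
clocks = {
    "CLK": {"source": None},
    "CLK_DIV2": {"source": "CLK", "divider": True},
    "CLK_BUF": {"source": "CLK", "divider": False},
}
for phase, group in derive_skew_phases(clocks).items():
    print(phase, "->", group)
```

In this model CLK_DIV2 is balanced with CLK, while CLK_BUF stays on its own because no divider was detected. Recomputing the mapping on every call mirrors the idea that the constraints are derived each time rather than stored.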
The constraints generated by turning on the config clock auto_skew_balance command do not
automatically appear in report plan clock. You must use the report plan clock
-auto_skew_balance command to see any constraints automatically generated by the tool.
If you want to use some of the automatic constraints, but not all of them, you can write the automatic
constraints to a file using report plan clock -auto_skew_balance -file. Then, edit that file to reflect
the required constraint set. After that, set config clock auto_skew_balance off and source that file
prior to clock tree synthesis.
Important: The config clock auto_skew_balance command does not guarantee that every
generated clock will be balanced with the source clock. The tool examines the topology
and puts generated clocks in the same skew phase as the source clock only if it detects
a proper divider circuit. Hence, any unusual divider topologies or generated clocks used
inappropriately (such as divide-by-one generated clocks at the output of a buffer) are
not automatically balanced with the source clock.
For more information about the force plan clock command, see The force plan clock Command on
page 18.
The config timing clock multiple Command
The config timing clock multiple command allows multiple clocks to be propagated through
multiplexers, logic gates, or other points where they converge. Enabling the command allows run
gate clock to acknowledge the multiple clock phases that might appear on a wire, and account for all
of them when making changes to balance skew. This helps keep one domain from affecting another
in an adverse way.
The online man pages contain detailed information about the command syntax.
Example:
config timing clock multiple on
The example enables the propagation of multiple clocks through points where they converge.
Typically, only one clock signal emerges from the convergence point.
For more information, see Using the run gate clock Command for Clock Tuning on page 53.
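As a rough illustration of what propagating multiple clocks through a convergence point means, the following Python sketch tracks the set of clocks visible at each node of a small graph. It is a conceptual model only. The function and node names are hypothetical, and the rule used when propagation of multiple clocks is disabled (keep one clock, chosen alphabetically) is an assumption for the sake of the example, not documented tool behavior.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def propagate_clocks(arcs, clock_roots, multiple):
    """Return the set of clocks seen at every node of an acyclic graph."""
    preds = {}
    for src, dst in arcs:
        preds.setdefault(dst, set()).add(src)
        preds.setdefault(src, set())
    clocks_at = {}
    # static_order() yields each node after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        arriving = set()
        if node in clock_roots:
            arriving.add(clock_roots[node])
        for p in preds[node]:
            arriving |= clocks_at[p]
        if not multiple and len(arriving) > 1:
            # Assumption for illustration only: keep one clock, chosen alphabetically.
            arriving = {min(arriving)}
        clocks_at[node] = arriving
    return clocks_at


# Two clocks converge at a multiplexer that drives one flip-flop.
arcs = [
    ("clk_a", "mux/A"), ("clk_b", "mux/B"),
    ("mux/A", "mux/Y"), ("mux/B", "mux/Y"),
    ("mux/Y", "ff/CK"),
]
roots = {"clk_a": "CLKA", "clk_b": "CLKB"}
print(sorted(propagate_clocks(arcs, roots, multiple=True)["ff/CK"]))   # ['CLKA', 'CLKB']
print(sorted(propagate_clocks(arcs, roots, multiple=False)["ff/CK"]))  # ['CLKA']
```

With both clocks visible at the flip-flop, a balancing step can account for each of them, which is the motivation the text gives for enabling the option.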
The config timing mode multiple Command
The config timing mode multiple command is similar to config timing clock multiple, except
instead of allowing multiple clocks to propagate through the same timing nodes in a single mode, it
enables the multimode capability of the tool so that multiple clocks belonging to multiple modes
propagate through the same nodes.
Example:
config timing mode multiple on
The example enables the activation of multiple clocks through points where they converge. Typically,
only one mode can be active at a time. The force timing mode command defines which modes are
currently active.
Using force... Commands to Implement Clock Constraints
The force... commands allow you to establish constraints that affect specific parameters used during
construction of the clock tree. This section explains how to use the following force... commands:
• force timing clock
• force plan clock
• force clock gate_clone
• force timing adjust_latency
While not restricted to clock tree synthesis, the following force... commands are applied to clock nets
quite often:
• force timing latency
• force model routing layer
• force net nondefault
• force net shielding
The force timing clock Command
The force timing clock command defines clock specifications to the timer.
The following example defines a reference named CLK at the primary pin $m/clk:
force timing clock $m/clk 4n -waveform {-rise 500p -fall 1500p} -name CLK
The clock in this example has a period of 4n, with the waveform rising at 500p and falling at 1500p.
Figure 2 shows this clock.
Figure 2: Clock Definitions
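The numbers in this example can be checked with a short Python sketch. It assumes that the -rise and -fall values are the times of the rising and falling edges within the period, as the labels in Figure 2 suggest. The class and method names are hypothetical and are not part of any Magma interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockWaveform:
    """Illustrative model of a clock definition; not a Magma data structure."""
    period_ps: int  # clock period in picoseconds
    rise_ps: int    # time of the rising edge within the period (assumed meaning)
    fall_ps: int    # time of the falling edge within the period (assumed meaning)

    def high_time_ps(self) -> int:
        # Time spent high per period; the modulo allows the fall to wrap around.
        return (self.fall_ps - self.rise_ps) % self.period_ps

    def duty_cycle(self) -> float:
        return self.high_time_ps() / self.period_ps

    def level_at(self, t_ps: int) -> int:
        # 1 while the clock is high, 0 while it is low.
        return 1 if (t_ps - self.rise_ps) % self.period_ps < self.high_time_ps() else 0


# Values from the example: period 4n, rise at 500p, fall at 1500p.
clk = ClockWaveform(period_ps=4000, rise_ps=500, fall_ps=1500)
print(clk.high_time_ps())   # 1000
print(clk.duty_cycle())     # 0.25
print([clk.level_at(t) for t in (0, 500, 1499, 1500, 4500)])  # [0, 1, 1, 0, 1]
```

Under that assumption the clock is high for 1000 ps of every 4000 ps, a 25 percent duty cycle, and the pattern repeats one period later.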
For clock tree synthesis purposes, force timing clock definitions create the clock phases that
propagate to the clock sinks. Every node in the timing graph is flagged as either CLOCK or DATA.
You can query this value with query node flag is_clock. Every node that is not a clock node is a
data node. For a node to be a clock node, it must have:
• A clock definition as a predecessor somewhere in the timing graph, and
• A clock sink as a successor somewhere in the timing graph
Unless both of these requirements are met, the node is flagged as DATA.
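The two-condition rule can be sketched in Python as a pair of graph traversals. This is a simplified model, not the timer's implementation. The function names and pin names are hypothetical, and the sketch assumes that the clock definition point and the sink pin themselves count as satisfying the respective condition.

```python
from collections import defaultdict, deque


def reachable(starts, edges):
    """Return `starts` plus every node reachable from them along `edges` (BFS)."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def classify(arcs, clock_defs, clock_sinks):
    """Flag each node CLOCK or DATA from timing arcs given as (source, destination)."""
    fwd, bwd = defaultdict(list), defaultdict(list)
    nodes = set()
    for src, dst in arcs:
        fwd[src].append(dst)
        bwd[dst].append(src)
        nodes.update((src, dst))
    after_def = reachable(clock_defs, fwd)     # a clock definition lies upstream
    before_sink = reachable(clock_sinks, bwd)  # a clock sink lies downstream
    return {
        n: "CLOCK" if n in after_def and n in before_sink else "DATA"
        for n in sorted(nodes)
    }


# Hypothetical netlist: clk drives a buffer that feeds a flip-flop and an output port.
arcs = [
    ("clk", "buf/A"), ("buf/A", "buf/Y"),
    ("buf/Y", "ff1/CK"), ("buf/Y", "out_port"),
    ("din", "ff1/D"),
]
flags = classify(arcs, clock_defs={"clk"}, clock_sinks={"ff1/CK"})
print(flags["buf/Y"])     # CLOCK
print(flags["out_port"])  # DATA
```

The out_port node has a clock definition upstream but no clock sink downstream, so it is flagged DATA even though it carries the clock signal.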
The force plan clock Command
The force plan clock command establishes clock router and skew minimization settings for your
design. The following sections discuss important areas of clock design covered by this command.
See the online man pages for detailed information about the command syntax.
Applying Constraints to Portions of the Clock Tree
In general, the -pin and -net options to force plan clock can be paired with many other options, and
are used to apply constraints to portions of the clock tree.
Use the -net option to apply a constraint to one net only. Use the -pin option to apply a constraint to
only that pin. Generally, a -pin constraint applied on an intermediate node in the clock tree does not
apply to all sinks downstream.
See the online man page for force plan clock for specific details about the options you can use with
-net and -pin and the meaning of -pin when paired with other options. Not all force plan clock
constraints work with -net or -pin.
Applying Skew Balancing Constraints
Some options that you use with the force plan clock command affect the behavior of the run gate
clock command. See Using the run gate clock Command for Clock Tuning on page 53 for more
information about the run gate clock command.
Skew Groups and Generated Clocks
Using the Magma clock router, it is possible to automatically balance entire clock domains together
or specify a set of clock sinks to be balanced separately from the rest of the tree.
Six special force plan clock options are available to greatly expand your ability to control the clock
router. The options apply only when you use the run gate clock -weight skew option.
1. Clock phase (-clock_phase clock_phase_name)
A clock phase is the basic element of a clock. Each clock domain created with the force
timing clock command creates two clock phases: one for the rising edge and one for the
falling edge.
Use this option to specify the name of a clock phase to be affected by the -skew_anchor,
-skew_group, and -skew_phase options.
2. Skew phase (-skew_phase phase_name)
A skew phase can contain one or more clock phases. Every clock phase in the same skew
phase is automatically balanced together by the tool. By default, the force timing clock
command creates each clock phase in its own skew phase. Use the force plan clock
-skew_phase command to place clock phases in the same skew phase.
Use this option to specify the name of a skew phase to which the clock phase named by the
-clock_phase option is assigned.
3. Skew group (-skew_group number)
A skew group is a subdivision of a skew phase. Normally, all pins in a skew phase are in
skew group 0 and are balanced as a group. If you have created a set of pins labeled as group
1, for example, the skew phase containing these pins is divided into two skew groups: one
containing the user-specified group and one containing the normal clock pins. This option is
useful if you want to segregate certain sets of clock pins and not balance them with the
default group. You can now define multiple groups of pins and balance them independently.
4. Skew anchor (-skew_anchor)
A skew anchor is used to define a target insertion delay for a nondefault skew group (that is,
anything other than skew group 0). A skew anchor is not needed for a skew group if the
group consists only of clock sinks. A skew anchor is required for a skew group that contains a
clockgating cell input or a divider register input as one of its elements. The skew anchor is
applied to the clock input pin of the clockgating cell or divider to instruct the tool that all other
elements of the skew group must be tapped early to match insertion delay with that pin.
For example, a skew group for a nonintegrated clockgating circuit can be created by putting
the clockgating cell clock input and latch clock input into a skew group together. Then, a skew
anchor is defined on the clockgating cell clock input to instruct the tool to tap the latch early to
match delay with the clockgating cell. If the skew anchor is defined on the latch clock input,
the clockgating cell is delayed to match insertion delay with the latch, which is typically not
the desired behavior.
You must use the -pin option with this option.
5. Skew offset (-skew_offset float)
The skew offset, a floating-point number, is used to describe certain phase relationships that
exist when placing multiple clock phases into the same skew phase. The skew offset is used
to adjust the arrival time of a specific clock phase when being compared to another clock
phase in the same group.
For example, to balance two clock phases, CLK:R with an arrival time (AT) of 0 ns and CLK:F
with an AT of 2 ns, put the clock phase CLK:F into the same skew phase as CLK:R with a 2
ns skew offset. More detailed examples are presented later in this section.
You must use the -skew_phase option in conjunction with the -skew_offset option in the
same command execution.
6. Skew dontcare (-skew_dontcare)
Normally, all sink pins are considered during skew balancing. To remove pins from the list of
pins to skew balance, use the -skew_dontcare option in combination with the -pin option.
Important: Using -skew_dontcare does not mean that no sync buffers will be inserted prior to that
sink by the run gate clock command. Instead, it means that you do not care whether
the sink is skewed. In most cases, sink buffers are not inserted prior to that sink but, if all
other sinks in that branch of the tree need to be delayed, that -skew_dontcare sink
might also get delayed.
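The skew offset described in item 5 can be pictured as a normalization applied before arrival
times are compared. The following Python sketch is only an illustration and is not Talus code. It
assumes that the offset is subtracted from the arrival time of the offset clock phase, and the
function name is hypothetical.

```python
def normalized_arrival(arrival_ns, skew_offset_ns=0.0):
    """Arrival time as it would be compared inside a skew phase.

    Assumption: the skew offset is subtracted from the raw arrival time.
    """
    return arrival_ns - skew_offset_ns


# Values from the example in item 5: CLK:R arrives at 0 ns, CLK:F at 2 ns.
clk_r = normalized_arrival(0.0)
clk_f = normalized_arrival(2.0, skew_offset_ns=2.0)
print(clk_r, clk_f)
```

With the 2 ns offset removed, both clock phases compare as arriving at the same time, so they can
be balanced together.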
Skew Optimization Mode
Specify a skew using the -max_skew option of the force plan clock command if you want to use the
run gate clock -weight skew command during clock tuning. The -max_skew option denotes the
maximum allowable skew during skew minimization. If you do not define a skew target, the default is
derived from the library and is approximately one buffer delay.
Slack Optimization Mode
Specify a skew using the -max_useful_skew option of the force plan clock command if you want to
use the run gate clock -weight slack or run gate clock -weight failing_endpoints command
during clock tuning. The -max_useful_skew setting is also honored by the run timing adjust skew
command. The -max_useful_skew option denotes the maximum amount of useful skew allowed.
Important: The -max_useful_skew option does not specify how much skew remains in the design
after useful skewing. It controls the amount of adjustment that useful skewing is allowed
to make per clock endpoint. If a large amount of skew is present before running useful
skew, it is not deskewed during run gate clock. For this reason, if you want useful
skewing in your clock tree synthesis methodology, first use run gate clock -weight
skew to minimize skew, and then implement useful skew on the balanced clock tree.
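The note above can be illustrated with a small Python sketch. It is a simplified model and not
the Talus algorithm: it assumes the limit applies symmetrically (plus or minus) to each endpoint,
and the pin names, arrival times, and requested shifts are hypothetical.

```python
def apply_useful_skew(arrivals_ns, requested_shifts_ns, max_useful_skew_ns):
    """Shift each endpoint, clamping the shift to +/- max_useful_skew_ns (assumed symmetric)."""
    adjusted = {}
    for pin, arrival in arrivals_ns.items():
        shift = requested_shifts_ns.get(pin, 0.0)
        shift = max(-max_useful_skew_ns, min(max_useful_skew_ns, shift))
        adjusted[pin] = arrival + shift
    return adjusted


def skew(arrivals_ns):
    """Spread between the latest and earliest clock arrival."""
    return max(arrivals_ns.values()) - min(arrivals_ns.values())


# Hypothetical unbalanced tree with 1.5 ns of skew before useful skewing.
arrivals = {"ff1/CLK": 1.0, "ff2/CLK": 1.5, "ff3/CLK": 2.5}
requested = {"ff1/CLK": 0.5, "ff3/CLK": -0.25}
result = apply_useful_skew(arrivals, requested, max_useful_skew_ns=0.25)
print(result, skew(result))
```

Each endpoint moves by at most 0.25 ns, yet the remaining skew is still far larger than 0.25 ns,
which is why the tree should be balanced first.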
Specifying Inverters and Buffers
If you do not specify inverters or buffers with force plan clock -inverter or -buffer, respectively, the
run route clock and run gate clock commands choose the repeaters for you. The choice might not
be the one you want.
These are the three choices you can make:
1. Specify -inverter without specifying -buffer.
   The run route clock and run gate clock commands use inverters only.
2. Specify -buffer without specifying -inverter.
   The run route clock and run gate clock commands use buffers only.
3. Specify both -inverter and -buffer.
   When given both buffers and inverters to choose from, the run route clock command
   uses buffers in stem buffering (which connects the clock tree to the driver pin) and uses
   inverters in leaf buffering (the bulk of the clock tree). The run gate clock command uses
   only buffers.
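The three choices above amount to a small decision table. The Python sketch below restates it for
quick reference. The function name and the returned labels are hypothetical, and the
"tool-selected" entry stands for the case in which neither option is given.

```python
def repeater_choice(inverter_given, buffer_given):
    """Return the repeater types each command uses for a given option combination."""
    if inverter_given and buffer_given:
        return {
            "run route clock": {"stem": "buffer", "leaf": "inverter"},
            "run gate clock": "buffer",
        }
    if inverter_given:
        return {"run route clock": "inverter", "run gate clock": "inverter"}
    if buffer_given:
        return {"run route clock": "buffer", "run gate clock": "buffer"}
    # Neither option given: the commands choose the repeaters themselves.
    return {"run route clock": "tool-selected", "run gate clock": "tool-selected"}


print(repeater_choice(inverter_given=True, buffer_given=True))
```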
The force plan clock -buffer and -inverter options take a library downmodel as the argument. If you
select a sized model, these options can use only that size buffer or inverter. If you select the
HyperCell model, these options can use any of the unhidden sized models in that entity.
Additionally, you can use -buffer multiple times to specify up to two different buffer models. This is
useful only for run gate clock, where you can specify a regular buffer as well as a delay buffer. The
run gate clock command uses the larger delay buffer where needed and the regular buffer
everywhere else.
Specifying Separate Trees
With the -separate_tree option to the force plan clock command, you can specify a pin or set of
pins to be built on a separate tree during run route clock.
A separate tree means that, for a given clock net, a completely separate clock route is tapped off
from the driver of the net to go to the pins on the separate tree. This overrides the normal behavior of
run route clock towards clock sinks, which is to balance the sinks on the same tree. It also
overrides the normal behavior of run route clock towards gating elements, which is to tap those
gating elements into the tree at the best point for insertion delay and on-chip variation (OCV) issues.
Note: The force plan clock -separate_tree command applies only to the run route clock
command.
You might find it helpful to use the -separate_tree option in conjunction with skew groups to
constrain certain pins to be placed in another clock tree. Alternatively, you can use the option on
intermediate clock pins, such as clockgating cells and MUXs, to set early tap points if you need more
control.
Pins are placed into separate tree groups using the integer argument to the -separate_tree option.
All pins in the same numbered group are placed on the same separate tree. Therefore, you can
create multiple separate trees, as needed, using different integers. It does not make sense to put
pins that are not on the same physical net into the same separate tree.
The following example defines the input of a MUX to be tapped at the root during run route clock to
guarantee low latency to its input:
force plan clock $m -pin "I_MUX/A" -separate_tree 2
The following example defines a group of two registers to be in a separate skew group and sets them
to be tapped at the root during run route clock.
force plan clock $m -pin "I_DFF1/CLK I_DFF2/CLK" -skew_group 1 -separate_tree 1
It is necessary to put clock sink pins in a skew group, as well, when putting them in a separate tree.
This is because the separate tree constraint affects only the behavior of run route clock. The run
route clock command taps early to those pins, but run gate clock -weight skew balances those
pins with other sinks in the skew phase. So, run gate clock undoes the early tapping by delaying
those sinks. If you put them in a separate skew group, run gate clock balances those sinks only with
each other and not with the larger skew phase, preserving the early tapping done by run route
clock.
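The interaction between early tapping and skew balancing can be sketched in Python. This is a
simplified model, not Talus code: it assumes each skew group is balanced to the largest insertion
delay in that group, and the delay values and the I_MAIN/CLK pin are hypothetical.

```python
from collections import defaultdict


def added_delay(sinks):
    """Delay that balancing adds to each sink.

    sinks: dict of pin -> (skew_group, insertion_delay_ns after run route clock).
    Simplification: every sink is delayed up to the slowest sink in its own skew group.
    """
    targets = defaultdict(float)
    for group, delay in sinks.values():
        targets[group] = max(targets[group], delay)
    return {pin: targets[group] - delay for pin, (group, delay) in sinks.items()}


# Two registers tapped early (0.25 ns) and one representative main-tree sink (1.0 ns).
same_group = {"I_DFF1/CLK": (0, 0.25), "I_DFF2/CLK": (0, 0.25), "I_MAIN/CLK": (0, 1.0)}
own_group = {"I_DFF1/CLK": (1, 0.25), "I_DFF2/CLK": (1, 0.25), "I_MAIN/CLK": (0, 1.0)}

print(added_delay(same_group))  # early sinks are delayed to match the main tree
print(added_delay(own_group))   # early tapping is preserved
```

In the first case the early sinks are padded until they match the main tree, which undoes the
separate tree. In the second case they are balanced only with each other, so no delay is added.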
Modifying Clock Phases and Skew Groups
Clock tree designs continue to grow more complicated. To save power, special clock structures
such as clockgating cells and dividers are becoming more prevalent. Talus lets you specify how
to balance and build clock trees, even for these more complex clocks.
This section covers some of the more advanced structures and constraints that you can use to deal
with them when the default flow is not sufficient.
Background
Clock tree synthesis is a two-step process. Two commands are responsible for inserting the clock.
The first is run route clock. This command builds the basic clock topology, in a bottom-up
fashion, and inserts it into the design.
The second is run gate clock. After the basic topologies are inserted, this command works in
a top-down fashion to satisfy design goals like skew and insertion delay minimization. The
run gate clock command performs additional sizing and repeater insertion to meet your
design goals. The run gate clock command supports four different weight options to allow
you to control what goal run gate clock is trying to achieve.
To understand how best to optimize skew, you need to understand how the tool addresses skew
problems. Skew optimization is performed on what is called a skew group basis. This optimization
takes place after the basic clock has been inserted by run route clock. The run gate clock
command performs tuning and balancing. All endpoints in a given skew group are examined, and the
largest insertion delay endpoint is found. The arrival time of the clock at this endpoint is assigned to
each clock endpoint in the skew group as a required time. (Actually, two arrival times are determined
and converted to required times: one for rising and one for falling.)
After the required times are assigned, the tool examines the clock tree by starting at the root and
working down the tree. If delay can be added so the arrival time slack of all clock pins in the current
fanout is improved without causing any arrival time slack violations, delay is added at that point. In
this manner, the number of buffers is reduced by always inserting delay as high in the tree as
possible so that it can be most effectively shared.
This process iterates until the skew target of the skew group is met or the tool reaches the point
where no improvement can be made.
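The required-time assignment and top-down delay insertion described above can be sketched in
Python. This is a simplified model and not the Talus implementation: it treats added delay as an
ideal, continuous quantity (the real tool inserts and sizes repeaters), handles a single edge, and
uses a hypothetical tree with hypothetical node names and delays.

```python
def leaf_arrivals(node, arrival_in=0.0):
    """Yield (name, arrival time) for every sink at or below node."""
    arrival = arrival_in + node["delay"]
    children = node.get("children", [])
    if not children:
        yield node["name"], arrival
    for child in children:
        yield from leaf_arrivals(child, arrival)


def balance(node, required, arrival_in=0.0, added=None):
    """Walk from the root down, padding each node by the slack shared below it."""
    if added is None:
        added = {}
    # The smallest slack in this fanout is the delay every sink here can absorb.
    pad = min(required - at for _, at in leaf_arrivals(node, arrival_in))
    if pad > 0:
        node["delay"] += pad
        added[node["name"]] = pad
    for child in node.get("children", []):
        balance(child, required, arrival_in + node["delay"], added)
    return added


tree = {"name": "root", "delay": 0.5, "children": [
    {"name": "A", "delay": 0.5, "children": [
        {"name": "a1/CLK", "delay": 0.25},
        {"name": "a2/CLK", "delay": 0.25}]},
    {"name": "B", "delay": 1.0, "children": [
        {"name": "b1/CLK", "delay": 0.5},
        {"name": "b2/CLK", "delay": 0.25}]},
]}

required = max(at for _, at in leaf_arrivals(tree))  # slowest sink sets the target
added = balance(tree, required)
final = dict(leaf_arrivals(tree))
print(added)
print(final)
```

In this model, the delay needed by both sinks under node A is added once at A rather than once
per sink, which is the sharing effect that keeps the buffer count down.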
Defining Skew Groups
In a normal flow, you need to define only the clocks for a circuit using the timing constraint force
timing clock. This is done for purposes of static timing analysis, and not just clock tree synthesis.
The process of declaring the clocks also begins the process of defining the skew groups.
To understand this fully, you need to know the definitions of two more terms: clock phase and skew
phase.
A clock phase is a timer event that is associated with a particular edge of the source clock.
For any given clock defined by force timing clock, two clock phases are created: one for
the rising edge and one for the falling edge. The clock phases are created with the same
names as the timing clock (using the -name option to force timing clock), with a :R or :F to
denote rising or falling clock phase, respectively. These phases propagate through the circuit
to the endpoints so that events at the clock pins can be traced to events driven by the clocks
defined. Because Talus is capable of propagating multiple clocks through a circuit, it is
possible for any clock pin to have two or more clock phases associated with it. For example,
if CLKA and CLKB are connected to the i0 and i1 inputs of a 2:1 MUX, all clock pins in the
fanout of this MUX will have four clock phases associated with them: CLKA:R, CLKA:F,
CLKB:R, and CLKB:F. (This assumes that you allow the propagation of multiple clock
phases.) For more information, see the online man page for the config timing clock
multiple command.
A skew phase is a collection of clock phases. When a clock is defined with force timing
clock, skew phases are automatically created. They are created with the same names as the
clock phases, and each clock phase is placed into the skew phase of the same name. The
purpose of a skew phase is to indicate to the tool which clock phases should be balanced.
Every clock phase is balanced with every other clock phase residing in the same skew
phase.
Example:
force timing clock $m/mpin:clk 10n -name CLK
This constraint creates a clock of period 10 ns, with a name of CLK. This creates the following skew
phases automatically:
A skew phase named CLK:R, which contains a clock phase also named CLK:R
A skew phase named CLK:F, which contains a clock phase also named CLK:F
This might seem redundant, but you see the power of this structure when it becomes necessary to
move clock phases between different skew phases to achieve certain skew balancing goals.
Clock tree skew balancing is done on a per-skew group basis. How does a skew group relate to the
clock phase and skew phase? A skew group is a set of clock pins that have been declared as a
group. By default, all clock pins are placed in group 0, so each skew phase contains one group. If
you create a group of pins labeled by the number 1, for example, the skew phase that contains these
pins is divided into two skew groups: one containing all of the normal clock pins and a separate
group containing the specified group. This is useful for segregating groups of clock pins that have
special circumstances and that you do not want to be balanced with the default group.
If skew groups have been created, you see them in skew reports denoted with a #n appended to the
skew phase name. The n is the number of the skew group. The default group, group 0, is not
denoted by a pound sign.
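The naming rules above can be summarized in a short Python sketch. It only models the names
described in this section. The function names are hypothetical and nothing here calls Talus.

```python
def default_skew_phases(clock_name):
    """Skew phases created for one clock: each holds the clock phase of the same name."""
    return {f"{clock_name}:{edge}": [f"{clock_name}:{edge}"] for edge in ("R", "F")}


def report_name(skew_phase, skew_group=0):
    """Name of a skew group as it appears in skew reports (group 0 has no #n suffix)."""
    return skew_phase if skew_group == 0 else f"{skew_phase}#{skew_group}"


print(default_skew_phases("CLK"))
print(report_name("CLK:R"), report_name("CLK:R", 1))
```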
The default behavior is sufficient to balance most clock trees. It is only when there are special
circumstances that you might have to modify this structure. The rest of this section describes the
constraints needed to handle special cases and presents several typical scenarios.
The Constraints
To interact with the clock tree synthesizer and modify the default behavior, use the force plan clock
constraint. This constraint has many options, but only those of particular interest to this topic are
described here. There are five main options that are used to modify the behavior of the skew groups
used during clock tree synthesis. They are:
-skew_phase name
-clock_phase name
-skew_group integer
-skew_anchor
-skew_offset time_value
Use the first two options, -skew_phase and -clock_phase, together when you want a clock phase
moved into a skew phase, other than the default skew phase for that clock phase. For example,
force plan clock $m -skew_phase phase1 -clock_phase phase2 places the clock phase named
phase2 inside the skew phase named phase1.
Use the -skew_group option in combination with the -pin option to force plan clock to place clock
pins into a nondefault skew group. Remember, all clock pins start in skew group 0, the default, but
you can reassign them to a positive-integer-numbered group to isolate them from the default
group.
Use the -skew_anchor option with the -pin option to force plan clock. A skew anchor pin is a clock
endpoint pin that controls a downstream clock tree. For example, a register that is a divide-by-two
clock generator has a clock input pin that is a skew anchor, because the arrival time of the clock at
that clock pin affects the arrival times of all the clocks in the generated domain that begin at the
register Q pin.
Use the -skew_offset option to describe certain phase relationships that exist when placing multiple
clock phases into the same skew phase. The skew offset adjusts the arrival time of a specific clock
phase compared to another clock phase in the same group. This is useful for specifying complex
relationships between clocks with different periods or different edges of the same clock.
Case Study #1: Aligning Two Unrelated Clocks
This first example is a design with two clock domains. One has a period of 10n, and the other has a
period of 15n. The clocks are defined as follows:
force timing clock $m/mpin:CLK10 10n -waveform {-rise 0 -fall 5n}
force timing clock $m/mpin:CLK15 15n -waveform {-rise 0 -fall 7.5n}
The requirements for this design are that the rising edges of both clocks should line up, have the
same insertion delay, and have skew minimized between them.
These two timing constraints generate four skew phases, each containing one clock phase of the
same name. In the default flow, when run gate clock is balancing skew, it examines each skew
group separately, determines the maximum insertion delay in that group, and uses that value as a
target insertion delay for all pins in that group. So, clock pins in skew group CLK10:R have no
relationship to the insertion delay of the clock pins in group CLK15:R. While each group can have
good skew within it, it does not have any relationship to the other. To remedy this, you can assign the
clock phase from one clock to be in the same skew phase as the other, and set them to be
considered together. For example:
force plan clock $m -skew_phase CLK10:R -clock_phase CLK15:R
Now, when the run gate clock operation examines the skew group CLK10:R, it considers the arrival
times in the clock phases CLK10:R and CLK15:R to determine the maximum. There is, however, a
minor detail that might lead to unexpected behavior. Even though the rising edges of the two clocks
are now considered together when arrival-time computation is done, the falling edges are not. If one
of these two clocks has a significantly higher insertion delay than the other, the required times for the
rising edges are the same because they are both in the same skew group. The falling edges of each
clock are still considered separately, so it is possible to have a situation on one of these clocks where
the rising-edge required time is based on an insertion delay that is much greater than the
falling-edge required time. When this clock is being optimized, it is impossible to adjust the rising
edge to meet specifications, without causing the falling edge to fail the specifications. As a result,
run gate clock is unable to balance this clock.
The solution is straightforward. Assign the falling-edge clock phases with the rising edges using a
skew offset.
Example:
force plan clock $m -skew_phase CLK10:R -clock_phase CLK10:F -skew_offset 5n
force plan clock $m -skew_phase CLK10:R -clock_phase CLK15:F -skew_offset 7.5n
These two constraints place the two falling edge clock phases into the skew phase that already
contains the two rising-edge clock phases. The use of -skew_offset describes the relationship
between the falling clock phase and the rising clock phase in the CLK10:R skew phase. In the first
constraint, an offset of 5n is used to specify that, when comparing arrival times in the CLK10:F clock
phase with other arrival times, the arrival times of the CLK10:F edges need to be adjusted by 5n.
This is because the falling edge of CLK10 is 5n after the rising edge of CLK10 and CLK15. Similarly,
the CLK15:F edges are 7.5n after the rising edges of CLK10:R and CLK15:R, so they must be
adjusted by 7.5n to be meaningfully compared.
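The rise/fall mismatch described in this case study, and the way merging the falling phases
removes it, can be sketched in Python. This is a simplified model: each skew phase is balanced to
its largest insertion delay, skew offsets are assumed to have been removed already, and the
insertion delay values are hypothetical.

```python
def required_delays(skew_phases, insertion_delay_ns):
    """Target insertion delay assigned to each clock phase.

    skew_phases: dict of skew phase -> list of clock phases it contains.
    insertion_delay_ns: dict of clock phase -> largest insertion delay in that phase.
    """
    targets = {}
    for members in skew_phases.values():
        target = max(insertion_delay_ns[phase] for phase in members)
        for phase in members:
            targets[phase] = target
    return targets


# Hypothetical trees: the CLK15 tree is much deeper than the CLK10 tree.
delays = {"CLK10:R": 1.0, "CLK10:F": 1.0, "CLK15:R": 3.0, "CLK15:F": 3.0}

# Only the rising phases share a skew phase.
rising_only = {"CLK10:R": ["CLK10:R", "CLK15:R"],
               "CLK10:F": ["CLK10:F"],
               "CLK15:F": ["CLK15:F"]}
# Falling phases moved into CLK10:R as well.
all_merged = {"CLK10:R": ["CLK10:R", "CLK15:R", "CLK10:F", "CLK15:F"]}

print(required_delays(rising_only, delays))
print(required_delays(all_merged, delays))
```

With only the rising phases merged, CLK10 receives different targets for its rising and falling
edges, which cannot both be met on the same tree. With all four phases in one skew phase, every
edge shares a single target.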
Case Study #2: Isolated Interface Circuit
For this example, assume that there is a circuit containing 10,000 registers on one clock domain. You
want the majority of these registers to be skew balanced to one another. Also assume that there are
five critical clock pins that need to be balanced to one another, but not to the main tree. They should
be isolated and get as early a clock as possible.
The clock is declared as follows:
force timing clock $m/mpin:clk 5n -waveform {-rise 0 -fall 2.5n}
There are two tasks that you need to do to properly isolate the five special clock endpoints. First, you
must remove them from the default skew group and place them into a nondefault group. You can do
this as follows:
force plan clock $m -pin "$m/dff1/CLK" -skew_group 1
This creates a nondefault skew group, numbered 1, and places each of the five clock pins inside it.
This constrains run gate clock to balance these five pins separately from the majority of the tree. In
the skew reports, you see the main tree reported in skew groups clk:R and clk:F, and the five special
pins in skew groups clk:R#1 and clk:F#1.
The constraints you issued cause run gate clock to balance these five pins separately from the rest
of the tree. However, because run route clock is responsible for defining the initial structure of the
tree, the initial tree does not isolate these five clock pins onto a separate high-speed clock branch.
The force plan clock constraint has an option called -separate_tree that allows this to happen.
The -separate_tree option functions similarly to a skew group in that it isolates clock pins from the
main clock tree. The difference is that this constraint is applied during run route clock, where the
initial structure is defined. In this example, you add the following constraints:
force plan clock $m -pin "$m/dff1/CLK" -separate_tree 1
These constraints modify the behavior of run route clock and place these five pins onto a separate
clock branch, which will be faster than the main branch that feeds the rest of the design.
Note that the integers used in the -separate_tree and -skew_group options are unrelated: separate
tree numbers and skew group numbers are independent of one another. Also, you can assign the pins
to the separate tree and skew group in the same constraint line to expedite the process:
force plan clock $m -pin pin_list -separate_tree n -skew_group j
Case Study #3: Generated Clocks (Simple)
A common application of skew grouping is found in circuits that contain clock dividers. This occurs
often enough and the remedy is easy enough to define that, in many cases, Talus handles the
generation of the skew grouping constraints for you automatically. This example covers not only the
constraints that must be issued, but also the following information:
Operations that Talus performs automatically
How to predict what the tool is going to do
How to modify this automatic behavior
How to turn off the automatic behavior
You access the control for enabling the automatic derivation of skew groups through a config...
command: config clock auto_skew_balance on | off.
If the config... command is enabled (on), Talus automatically derives and applies a set of force plan
clock constraints to balance any generated clocks in the design with the source clock that is
generating them. These constraints are derived every time you issue a skew-related command: any
command that optimizes or reports skew in any way triggers this automatic derivation. Talus
rederives the constraints each time so that it always operates on the correct set of clock
constraints. If the clock definitions change, the system still operates on the most current
information the next time you run a skew-related command.
You can view or remove the force plan clock constraints you supply using the report plan clock
and clear plan clock commands, respectively. Because the force constraints that Talus generates
automatically are not persistent, you cannot view or remove them with these two commands.
You can turn the constraints off by setting config clock auto_skew_balance to off. In addition, you
can preview them by adding the -auto_skew_balance option to report plan clock. When this is
done, a skew group constraint derivation is done automatically, and all force constraints that would
be applied, given the current definition of the clocks, are printed. This gives you the ability to see
what these constraints actually do. You can save this preview report in a file, modify it, and reapply it
to the design as user constraints if you want, thus overriding what Talus would do automatically. You
must ensure that these force constraints are kept up to date with any clock constraint changes.
Consider a simple divided clock example, as shown in Figure 3.
Figure 3: Divided Clock
This is a simple circuit in which a register is fed back to itself to create a divide-by-two clock. The
timing constraints to describe this are as follows:
force timing clock $m/mpin:CLK 5n -waveform {-rise 0 -fall 2.5n}
force timing clock $m/I_DIV/Q -generated -divider 2 -source $m/mpin:CLK
Figure 4 shows the waveforms that are created by these two constraints.
Figure 4: Waveforms Created by Timing Constraints
Make sure that the rising edge of CLK2 occurs at the same time as the rising edge of CLK at all
endpoints. Because these are actually two clocks, if nothing special is done, they are each balanced
separately and have no regard for one another.
If you have enabled the software to automatically detect this circumstance by setting config clock
auto_skew_balance to on, the following constraints are issued before each skew optimization or
reporting command:
force plan clock $m -skew_phase CLK:R -clock_phase CLK2:R -skew_offset 0
force plan clock $m -skew_phase CLK:R -clock_phase CLK2:F -skew_offset 5n
force plan clock $m -pin $m/I_DIV/CLK -skew_anchor
force plan clock $m -skew_phase CLK:R -clock_phase CLK:F -skew_offset 2.5n
The first two constraints place both edges of the generated clocks, CLK2:R and CLK2:F, into the
skew phase of the rising edge of the source clock, CLK:R. This is done because the rising edge of
the source clock actually triggers both edges of the generated clocks, due to the division taking
place.
The third constraint places a skew anchor tag on the clock pin of the divider circuit to inform the clock
router not to treat this pin like a standard clock endpoint during the balancing of the clocks. The
presence of a skew anchor in the default skew group (all pins are in group 0 by default) causes Talus
to treat that anchor like a clockgating pin, rather than an endpoint. The tool detects that there is
another downstream clock tree from this point, and this clock pin will be tapped into the main clock
tree early to attempt to compensate for the anticipated additional delay in the CLK2 tree.
The final constraint takes the falling phase of the root clock (CLK:F) and places it into the skew
phase (CLK:R). With the addition of the generated clock phases into the CLK:R skew phase, the
CLK:F skew phase contains only the CLK:F clock phase. In many common configurations, not doing
this can result in different arrival time requirements on the rising and falling edges of clock pins in the
CLK:F and CLK:R clock phases. This happens when the CLK:R skew phase arrival time is
generated by a late arrival in the CLK2:R or CLK2:F clock phase. This late arrival is propagated to all
clock phases in the CLK:R skew phase (CLK:R, CLK2:R, and CLK2:F are all in the CLK:R skew
phase.) When the arrival time for the CLK:F skew phase is calculated, it only has to search in the
CLK:F clock phase, which might not be very deep. So the CLK:F skew phase required time is
created without information about the CLK:R skew phase arrival time. The discrepancy is removed
by moving the CLK:F clock phase into the CLK:R skew phase, and all arrival times are generated in
a consistent manner.
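The skew offsets in the automatically generated constraints follow from the clock waveforms. The
Python sketch below reproduces that arithmetic for this case study. It assumes that the divider
toggles on source rising edges and produces a 50 percent duty cycle, which holds for the
divide-by-two here. The function name is hypothetical.

```python
def divider_offsets(source_period_ns, divider):
    """Offsets of a divided clock's edges from the source rising edge at t = 0.

    Assumes a 50% duty cycle output, as produced by an even divider.
    """
    period = source_period_ns * divider
    return {"R": 0.0, "F": period / 2}


source_period_ns = 5.0
source_fall_ns = 2.5  # from -waveform {-rise 0 -fall 2.5n}

clk2 = divider_offsets(source_period_ns, divider=2)
offsets = {"CLK2:R": clk2["R"], "CLK2:F": clk2["F"], "CLK:F": source_fall_ns}
print(offsets)
```

The resulting values (0, 5 ns, and 2.5 ns) match the -skew_offset arguments of the constraints
listed above.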
Case Study #4: Generated Clocks (Complex)
This example contains several additional structures that you find in many complicated clock divider
circuits. In this example, you add multistage generated clocks, as well as a more complicated state
machine divider.
Consider the waveforms shown in Figure 5.
Figure 5: Waveforms of Complex Generated Clocks
Generation of these clocks is accomplished by a two-register state machine (that generates the
divide-by-three clock CLK3) and a simple divide-by-two (that generates CLK6 by dividing CLK3 by
2). Assume that the registers generating CLK3 are called I_DIV3_0 and I_DIV3_1. The register
divider generating CLK6 is called I_DIV6.
Here are the timing constraints used to define these three clocks:
force timing clock $m/mpin:CLK 5n -waveform {-rise 0 \
    -fall 2.5n} -name CLK
force timing clock $m/I_DIV3_0/Q -generated -source $m/mpin:CLK \
    -edges {1 5 7} -name CLK3
force timing clock $m/I_DIV6/Q -generated -divider 2 -source \
    $m/I_DIV3_0/Q -name CLK6
One of the key differences between this example and Case Study #3: Generated Clocks (Simple)
is that one of the generated clocks (CLK3) is created by a state machine of more than
one register. An important concern is that the two clock pins that make up this state machine should
be skew balanced to each other. The register that drives the actual generated clock (I_DIV3_0) must
have its clock pin placed in the CLK tree early to account for its downstream delay. Because the
other register in the state machine (I_DIV3_1) needs to be balanced to I_DIV3_0, it too must be in
the tree early. To achieve this, you must create a skew group. The automatic detection in the tool is
not capable of decoding the state machine.
In this example, you create a force... command to make the automatic skew group derivation more
efficient. This is what you need:
force plan clock $m -pin $m/I_DIV3_0/CLK -skew_group 1
force plan clock $m -pin $m/I_DIV3_1/CLK -skew_group 1
These constraints create a skew group out of the two registers that form the state machine. The other
constraints that are automatically generated are as follows:
force plan clock $m -pin I_DIV3_0/CLK -skew_anchor
force plan clock $m -pin I_DIV6/CLK -skew_anchor
force plan clock $m -skew_phase CLK:R -clock_phase CLK:F -skew_offset 2.5n
In the first two constraints, the generated clock phases CLK3:R and CLK3:F are put into the skew
phase CLK:R. The clock phase CLK3:R requires no skew offset adjustment, while the clock phase
CLK3:F requires a 10 ns skew offset. This is because the falling edge of CLK3 occurs 10 ns after the
first rising edge of CLK.
The clock pin that controls the register driving the generated clock, CLK3, has also been declared as
a skew anchor, and you placed it in a nondefault skew group. During clock tree construction with run
route clock, the two pins in the nondefault skew group are treated as clockgate pins and inserted
into the root clock tree early. During skew tuning, run gate clock uses the arrival time of the skew
anchor pin, I_DIV3_0/CLK, as the arrival time requirement for the other pin in skew group 1,
I_DIV3_1/CLK. Neither of these pins is balanced to the main tree, but they are balanced to each other.
The next set of constraints places the rising and falling clock phases CLK6:R and CLK6:F into the
skew phase CLK:R. Even though CLK6 is a generated clock of CLK3, because CLK3 itself was
generated by CLK, the CLK6 clock phases follow CLK3 back to the real root, CLK:R. Again, the
rising phase CLK6:R needs no skew offset, and the falling phase CLK6:F needs a skew offset of
15 ns because the falling edge of CLK6 occurs 15 ns after the first rising edge of CLK.
The last constraint places the falling clock phase CLK:F into the skew phase of CLK:R, with a 2.5 ns
skew offset.
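The 10 ns and 15 ns skew offsets quoted in this case study can be checked with a little
arithmetic. The Python sketch below assumes that the entries of -edges count source clock edges
starting at 1 for the first rising edge, alternating rise and fall; this numbering is an
assumption, chosen because it is consistent with the offsets quoted above. The function names are
hypothetical.

```python
def source_edge_time(edge_number, period_ns, fall_ns):
    """Time of the Nth source edge (1 = first rise at 0, 2 = first fall, and so on)."""
    cycle, is_fall = divmod(edge_number - 1, 2)
    return cycle * period_ns + (fall_ns if is_fall else 0.0)


def clock_from_edges(edges, period_ns, fall_ns):
    """Rise time, fall time, and period for a three-entry edge list."""
    rise, fall, next_rise = (source_edge_time(e, period_ns, fall_ns) for e in edges)
    return {"R": rise, "F": fall, "period": next_rise - rise}


# CLK: 5 ns period, falling edge at 2.5 ns. CLK3 uses -edges {1 5 7}.
clk3 = clock_from_edges([1, 5, 7], period_ns=5.0, fall_ns=2.5)

# CLK6 is CLK3 divided by two: it toggles on each CLK3 rising edge.
clk6 = {"R": clk3["R"], "F": clk3["R"] + clk3["period"], "period": 2 * clk3["period"]}

print(clk3)
print(clk6)
```

Under these assumptions CLK3 falls 10 ns after the first rising edge of CLK and CLK6 falls 15 ns
after it, in agreement with the skew offsets described for CLK3:F and CLK6:F.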
The force clock gate_clone Command
The force clock gate_clone command specifies clockgate cloning settings for the model you
specify. These settings are used by run clock gate_clone. To apply settings for specific clockgate
cells, use the -net, -pin, or -cell options. It is best to run the run clock gate_clone command before
using the fix cell command.
All settings specified by this command (except for -dont_clone) are active only for the next
invocation of run clock gate_clone. After the cloning command completes, all settings are
deactivated and are not used for any subsequent cloning. Settings must be reapplied if needed for
subsequent cloning. The -dont_clone setting is not deactivated after cloning completes. This setting
is honored until it is cleared using the clear clock gate_clone command.
To remove settings specified by force clock gate_clone, use the clear clock gate_clone
command. To view current settings, use the report force clock gate_clone command.
The report clock gate_clone command creates a report of the clockgate cells that were cloned for
the specified model. This report is normally generated after clockgate cloning is done with the run
clock gate_clone command. The config report clock gate_clone command configures the
clockgate cloning report.
See the online man pages for complete information about these commands and their options.
The force timing adjust_latency Command
The force timing adjust_latency command enables the automatic adjustment of source, network,
and I/O latencies after clock insertion.
Consider the following example:
force timing adjust_latency $m boundary_average
As a result of setting the value to boundary_average, the fix clock command automatically calls the
run timing adjust latency command after performing the run route clock and run gate clock
commands. The goal of latency adjustment is to ensure that the arrival times at clock sinks after
clock tree synthesis approximately match the arrival times prior to clock tree synthesis. A perfect
match is impossible, because clock skew is introduced in computed clock mode. The force timing
adjust_latency setting of boundary_average, average, boundary_median, or median determines
which sinks are used to determine arrival times (all sinks or only boundary sinks) and whether to
consider the average or median arrival time at those sinks. The main advantage of running latency
adjustment is that it prevents sudden I/O timing shifts due to clock insertion. Without latency
adjustment, it is common for input I/O paths to have better timing and output I/O paths to have worse
timing, due to a nonzero clock insertion delay.
With force timing adjust_latency turned on, the run timing adjust latency command is also called
at the end of the fix wire command to ensure that the latencies are updated for final mode timing.
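The four settings described above differ only in which sinks are sampled and which statistic is taken. The following Python sketch illustrates that selection under stated assumptions: the (arrival, is_boundary) tuple layout and the function name are hypothetical, and the sketch does not reproduce how Talus itself measures sinks.

```python
from statistics import mean, median

# The four settings named in the text for force timing adjust_latency.
MODES = {"boundary_average", "average", "boundary_median", "median"}

def reference_arrival(sinks, mode):
    """Return the reference clock arrival time (ns) for one setting.

    sinks: list of (arrival_ns, is_boundary) tuples (hypothetical layout).
    mode:  one of the four settings in MODES.
    """
    if mode not in MODES:
        raise ValueError("unknown mode: " + mode)
    use_boundary_only = mode.startswith("boundary_")
    statistic = median if mode.endswith("median") else mean
    # Keep every sink, or only the boundary sinks, depending on the setting.
    arrivals = [t for t, is_boundary in sinks
                if is_boundary or not use_boundary_only]
    if not arrivals:
        raise ValueError("no sinks selected for mode " + mode)
    return statistic(arrivals)

# Illustrative data only: two boundary sinks and two internal sinks.
sinks = [(2.0, True), (2.5, True), (2.25, False), (3.25, False)]
for mode in sorted(MODES):
    print(mode, reference_arrival(sinks, mode))
```

A median-based setting is less sensitive to a few outlying sinks than an average-based one, which is the usual reason to choose between them.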
Recomputing Source and Network Latency
In hierarchical chip design flows, the modeling of the clock is becoming increasingly difficult. In the
block-level flow, it is difficult to know what the clock insertion delay is going to be before the clock is
built, which makes it difficult to properly constrain the block I/O. After block-level clock tree
construction, if the clock that was actually built is not what was originally constrained, it becomes a
problem to adjust the latencies of the block so that new timing problems on the block I/Os do not
become a concern.
Relevant Commands
The run timing adjust_latency command is run at the block level to perform an automatic
adjustment of the ideal latencies set on a block, based on a measurement taken on the actual
computed network. You can run this command yourself or instruct Talus to run the command at
appropriate times in the flow.
The force timing adjust_latency command, when turned on, causes the run timing adjust latency
command to be executed at certain points in the fix clock flow, as well as fix wire flow, when the
clock latencies could potentially change. It also causes run timing adjust latency to be executed
during run prepare glassbox abstract, so that the GlassBox model being created has the most
up-to-date latencies set.
Design Flow Example
To illustrate the process, consider the next design example. For simplicity, assume that a chip is
being constructed with one hierarchical block and that it has some additional logic at the top level.
Also assume that the overall insertion delay budget for the chip is 4 ns.
When the block is constructed, model the expected clock latencies as accurately as possible. So, in
the block level constraints, the following three force timing latency commands are applied:
force timing latency $m/mpin:clk 1.5n -type source
force timing latency $m/mpin:clk 2.5n -type network
force timing latency $m/mpin:clk 2.5n -type io
This assumes that the block requires 2.5 ns of insertion delay, and the rest of the 4 ns chip budget
(1.5 ns) is used at the top level.
In the clock-relative I/O constraint methodology, which uses force timing delay and force timing
check, the actual arrival and required times used at the block I/O are the values in your constraints
plus the source and I/O latencies you define. Because this methodology adjusts the source and
network latencies to maintain arrival times at the clock endpoints within the block, the I/O latency
must also be adjusted to maintain those arrival and required times at the block data I/O. The
adjustment applied to the I/O latency is the opposite of the adjustment made to the source latency,
so that the source and I/O latencies add up to the same delay as they did before the adjustment.
Because the arrival times of the clock signals at the clock endpoints are maintained in both
computed and ideal modes, it is possible to use absolute arrival and required times at the block I/O,
or use force timing arrival and force timing required. These constraints are used in a top-down
flow when using data pushdown timing.
In either methodology, the arrival times are maintained at both the clock pins inside the block, as well
as the data I/O at the boundary.
Continuing with the example, assume that you want to automatically keep the latencies up to date
throughout the flow. Turn on automatic latency adjustment as follows:
force timing adjust_latency $m boundary_average
Latencies are automatically kept up to date as you proceed through the chip building flow. Assume
that you are now at the fix clock stage of the block-level flow. Run fix clock with a skew target, and
instruct it to minimize skew within that target.
force plan clock $m -max_skew 100p
fix clock $m $l -weight skew
Because the automatic latency adjustment has been turned on and the tool is running in skew
optimization mode (during the fix clock flow, after the clock has been turned on), the run timing
adjust latency command is executed. Assume that you are able to achieve a clock skew goal of 100
ps and that the average insertion delay in the clock network is 2.15 ns (quite a bit faster than the
estimate of 2.5 ns). The run timing adjust latency command automatically performs the following
tasks:
clear timing latency $m/mpin:clk -type source
clear timing latency $m/mpin:clk -type network
clear timing latency $m/mpin:clk -type io
force timing latency $m/mpin:clk 1.85n -type source
force timing latency $m/mpin:clk 2.15n -type network
force timing latency $m/mpin:clk 2.15n -type io
During fix clock, after the clock is constructed, timing is automatically switched to computed mode.
This is done so that some additional sizing and pin swapping can be done to account for small
changes in the design caused by the clock insertion. Because switching to computed mode does not
affect the source latency, which is now set to 1.85 ns, you still have, at each clock endpoint, an arrival
time that is very close to what it was in the ideal mode. The only deviation is the deviation of each
individual endpoint compared to the average latency. This has the positive side effect of not
drastically changing the slack on I/O paths because block-level latency is not exactly as expected.
You assume that the difference in block-level latency is accounted for at the top level.
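The numbers in the design flow example above can be checked with a short calculation. The Python sketch below is only an illustration of the arithmetic described in this section, not the tool's implementation; the function name is hypothetical, and values are kept in picoseconds to avoid rounding.

```python
def adjust_latencies(source_ps, network_ps, io_ps, measured_ps):
    """Recompute ideal latencies from a measured average insertion delay.

    Follows the rules stated in the text:
      - network latency becomes the measured insertion delay,
      - source latency absorbs the difference, so source + network is unchanged,
      - I/O latency gets the opposite adjustment, so source + I/O is unchanged.
    """
    delta = network_ps - measured_ps      # positive when the built tree is faster
    new_source = source_ps + delta
    new_network = measured_ps
    new_io = io_ps - delta
    return new_source, new_network, new_io

# Values from the example: 1.5 ns source, 2.5 ns network, 2.5 ns I/O,
# and a measured average insertion delay of 2.15 ns.
old = (1500, 2500, 2500)
new = adjust_latencies(*old, measured_ps=2150)
print(new)   # (1850, 2150, 2150)
```

Both sums stay at the 4 ns chip budget, which is why the clock endpoint arrival times and the block I/O timing are preserved.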
The force timing latency Command
The force timing latency command specifies ideal mode clock latencies (delays). Most of these
latencies are not used during clock tree synthesis but, instead, give ideal mode timing a better
concept of what the clock tree looks like.
You can specify four different types of latency using the -type option: network (the default), source,
io, and skew.
The online man pages contain detailed information about the force timing latency command
syntax.
To remove latency constraints set by force timing latency, use the clear timing latency command.
Specifying Network Latency
Network latency is the internal insertion delay for the circuit you are timing (the delay of the clock tree
from the source of the clock to all of the clock sinks). After you use the run route clock command,
the timing switches into computed mode. This means that the network latencies are ignored and,
instead, real clock insertion delays are used. Network latencies are only for modeling timing in ideal
mode.
Normally, network latencies have no impact on clock tree synthesis. But, network latencies do have a
second application when put on clock sinks. In this case, the latency is treated as an offset target
during clock routing. For example, a 200 ps network latency on a clock sink delays the sink by 200 ps
during clock implementation. Likewise, a -200 ps network latency taps the clock sink early by 200 ps.
Network latencies are also valuable to ensure that clockgate enable paths get properly optimized
during fix cell. For more information, see the online man page for run timing adjust latency
-clockgates.
Important: Network latencies are added to any source latencies that have also been defined.
Example:
force timing latency $m/clk {-rise 50p -fall 300p}
The example adds a network latency of 50 ps to the rising edges of clock $m/clk and 300 ps to the
falling edges of the same clock. You do not have to use the -type option because network is the
default.
Specifying Source Latency
Source latency is the insertion delay external to the circuit you are timing. It applies only to primary
clocks. Source latency adds latency to the clock arrival time and does not disappear when the clock
is switched to propagated mode.
Important: Source latency is also added to I/O latency when adjusting arrival and required times at
design I/Os.
Source latency, like jitter, is fixed throughout the Magma design flow. Consider the following
example:
force timing latency $m/clk 100p -type source
The example adds a source latency of 100 ps to clock $m/clk. If you do not use the -type option, the
default type is network latency.
While source latency might be used to help in the I/O timing specification, it is not always necessary.
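The interaction between source and network latency can be summarized in a few lines. The following Python sketch assumes, as the text states, that network latency is added to source latency in ideal mode and replaced by the real insertion delay in computed mode, while source latency is kept in both. The function name and the 420 ps computed insertion delay are illustrative assumptions.

```python
def clock_arrival_ps(source_ps, network_ps, computed_insertion_ps=None):
    """Clock arrival at a sink, in picoseconds.

    Ideal mode (computed_insertion_ps is None): source + network latency.
    Computed mode: source latency + the real clock insertion delay;
    the network latency is ignored.
    """
    if computed_insertion_ps is None:
        return source_ps + network_ps
    return source_ps + computed_insertion_ps

# Combining the two examples: 100 ps source latency with
# 50 ps (rise) and 300 ps (fall) network latency.
ideal_rise = clock_arrival_ps(100, 50)
ideal_fall = clock_arrival_ps(100, 300)
# Hypothetical computed-mode insertion delay of 420 ps.
computed = clock_arrival_ps(100, 50, computed_insertion_ps=420)
print(ideal_rise, ideal_fall, computed)
```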
Specifying I/O Latency
I/O latency accounts for clock network latency on primary inputs and outputs. It enables you to
correlate Magma timing results with other tools.
Important: I/O latencies are added to any source latencies that have also been defined.
Example:
force timing latency $m/clk 200p -type io
The example sets a 200 ps latency for all primary I/O pins referenced to $m/clk.
I/O latency is added to the input pin constraints (arrival times) that are set by the force timing delay
command and added to output pin constraints (required times) that you set with the force timing
check command.
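As a rough illustration of how these latencies combine at a design I/O, the Python sketch below adds the source and I/O latencies to a constraint value, as described above. The function name and the 500 ps constraint value are hypothetical; the source and I/O latency values reuse the examples in this chapter.

```python
def effective_io_time_ps(constraint_ps, source_ps=0, io_ps=0):
    """Arrival (input) or required (output) time actually used at a design I/O.

    The constraint value comes from force timing delay or force timing check;
    the source and I/O latencies are added on top of it.
    """
    return constraint_ps + source_ps + io_ps

# Hypothetical 500 ps input constraint, 100 ps source latency, 200 ps I/O latency.
print(effective_io_time_ps(500, source_ps=100, io_ps=200))
```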
Specifying Skew Latency
Skew latencies are created by run timing adjust skew to implement useful skew in ideal clock
mode.
Skew latencies are nearly identical to network latencies, in that the timer treats both identically.
However, when you run the run timing adjust skew command incrementally, it overwrites pre-existing
skew latencies. It does not overwrite network latencies. For this reason, it is advisable to use network
latencies rather than skew latencies when manually applying latency offsets to clock sinks, to
prevent those constraints from being overwritten.
Skew latencies can be cleared separately with the clear timing all -type skew_latency command.
The force model routing layer Command
The force model routing layer command constrains the router to use routing layers between the
specified lowest and highest layers. You can use the -net_type option to indicate that the routing
layer constraint applies only to clock nets.
The online man pages contain detailed information about command syntax.
Example:
force model routing layer $m highest METAL4 -net_type clock
This example sets the highest routing layer on model $m to METAL4. The constraint applies to clock
nets only.
The highest routing layer constraint is a hard constraint that is never violated. A clock net is never
routed above the highest allowed layer, regardless of routing blockages or congestion.
The lowest routing layer constraint is a best effort constraint, because the clock router has to be
able to tap down to buffers and clockgating cells, which necessitates violating the constraint. It is also
possible that the lowest level constraint might be violated in cases of bad routing blockage definitions
or extreme congestion. This is very rare, however, because clocks and critical signal nets are given
top priority during routing.
The force net nondefault Command
After using the rule nondefault command to define spacing and width requirements for a clock net,
use the force net nondefault command to apply the rule to the clock net. After the rule has been set
on the net, that net uses the nondefault rule instead of the library default rule.
The -propagate_clock option is used specifically on clock nets. With this option, the nondefault rule
is propagated to all clock nets that have the same phase tag and are driven by the same net.
Propagation passes the constraints through buffers and clockgating cells.
Note: The -propagate_clock option controls how and if the nondefault rule is propagated to
downstream nets in a design as they exist when the force net nondefault command is
issued. The -nondefault_mode option of the run route clock command controls how the
rules are propagated as run route clock buffers the clock and creates new nets. For
information about the run route clock command, see Chapter 3, Clock Implementation.
The online man pages contain detailed information about command syntax. Consider a nondefault
rule you create called spacing_rule. To apply the rule to the clock net called /work/top/top/net:CLK,
run the following command:
force net nondefault /work/top/top/net:CLK spacing_rule
The force net shielding Command
The force net shielding command places a shielding constraint on a specified net to ensure that
shield wires are created by the run route shielding command. It reserves space for the shield wires
by placing a spacing requirement on the net.
The -propagate_clock option is used specifically on clock nets. With this option, the shielding rule is
propagated to all clock nets that have the same phase tag and are driven by the same net.
Propagation passes the constraint through buffers and clockgating cells.
Note: The -propagate_clock option controls how and if the shielding rule is propagated to
downstream nets in a design as they exist when the force net shielding command is issued.
The -shielding_mode option of the run route clock command controls how the rules are
propagated as run route clock buffers the clock and creates new nets. For information about
the run route clock command, see Chapter 3, Clock Implementation.
If you issue the run route clock -shielding_mode manual command, the rule that was applied to
the clock net with the force net shielding command is applied to all newly created clock nets.
For more information about the run route clock command, see Chapter 3, Clock Implementation.
Using the rule nondefault Command to Implement Clock Constraints
The rule nondefault command creates a nondefault routing rule for a library. In the context of clock
tree synthesis, you can create rules governing width and spacing of clock nets. Although the command
is involved in design issues other than clock tree synthesis, it is also used for clock nets.
See the online man page for rule nondefault for detailed information about the command syntax.
Example:
rule nondefault $l clock_spacing {METAL4 0.4u 0.8u}
This example defines a rule for METAL4 with a width of 0.4 µm and a spacing of 0.8 µm.
3. Clock Implementation
This chapter explains how to build the initial structure of the clock tree using the run route clock
command. It also introduces the Clock Tree Browser.
The heart of clock tree synthesis in the Magma environment is the fix clock command.
Figure 6 shows the input and output of clock tree synthesis.
Figure 6: Input and Output of Clock Tree Synthesis
Input: Placed, optimized design
Process: Clock tree synthesis
Output: Placed design with synthesized clock trees that is (optionally) hold buffered
Preparing for Clock Tree Synthesis
The following sections provide information you should know before you undertake clock tree
synthesis.
Clock Signals and Data Signals
Before discussing clock tree synthesis, it is important to understand the difference between a clock
signal and a data signal in the Magma tools. Anytime the Static Timing Analyzer updates the timing
of the design model, every timing node is flagged as either clock or data. You can check this value
with the query node flag is_clock command. Normally, a node is considered to be a data node,
unless it meets the requirements to be a clock node:
A definition implemented by the force timing clock command must be traceable in the
timing graph as a predecessor.
A clock sink must be traceable in the timing graph as a successor.
A clock sink can be a flip-flop clock pin, a latch enable pin, or a hard-macro clock pin. It can also be a
manually constrained clock balance point on an output mpin of the design, or maybe on an input pin
of a piece of hierarchy, like a black box or GlassBox model.
If either of these requirements is not met, a node is considered data. Clock tree synthesis operates
only on clock nodes.
In a complex clock network, it is possible that a pin fans out to both clock sinks and data endpoints.
In such a case, any node having both clock and data sinks in its fanout is flagged as a clock node,
because it meets both of the requirements. Once a branch of the tree has only data endpoints in its
fanout, the nodes on that branch are flagged as data nodes.
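The two requirements amount to a pair of reachability checks on the timing graph. The Python sketch below is a minimal model of that rule, not the Static Timing Analyzer's implementation: the graph layout, node names, and function names are hypothetical, and it assumes a clock definition point or clock sink counts as satisfying its own requirement.

```python
def reachable(graph, starts):
    """Return every node reachable from the start nodes (start nodes included)."""
    seen = set(starts)
    stack = list(starts)
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def classify_nodes(graph, clock_roots, clock_sinks):
    """Flag each node as 'clock' or 'data'.

    graph: dict mapping a node to the list of its successors.
    A node is 'clock' only if a clock definition is traceable as a predecessor
    AND a clock sink is traceable as a successor.
    """
    reverse = {}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    downstream_of_clock = reachable(graph, clock_roots)
    upstream_of_sink = reachable(reverse, clock_sinks)
    nodes = set(graph) | set(reverse) | set(clock_roots) | set(clock_sinks)
    return {n: "clock" if n in downstream_of_clock and n in upstream_of_sink
            else "data" for n in nodes}

# Hypothetical network: buf fans out to a clock sink and to a data-only branch.
graph = {
    "clk": ["buf"],
    "buf": ["ff1/CK", "div"],
    "div": ["out_data"],
    "in_d": ["ff1/D"],
}
flags = classify_nodes(graph, clock_roots={"clk"}, clock_sinks={"ff1/CK"})
for name in sorted(flags):
    print(name, flags[name])
```

In this example buf is a clock node even though part of its fanout is data, while div and everything after it are data nodes because no clock sink remains downstream.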
Library Preparation
Most libraries have separate buffers and inverters for optimization and clock tree synthesis. Because
these buffers or inverters have differing characteristics, they must be separated into different entities
so that the HyperCell models will be characterized correctly. Additionally, it is standard practice to
hide the clock buffer and inverter entities so that normal optimization does not use them. See the
information about library and design preparation in the Talus Library Preparation Technology Guide
for details.
Forcing Specific Cells to Be Used by the fix clock Command
After performing the fix cell command and prior to running the fix clock command, the cells that
were hidden by the library preparation process should be made available to the clock router. You can
remove the hidden property from the buffers and inverters with the clear hide command.
Example:
clear hide $l/CLKBUF
clear hide $l/CLKINV
The following example constrains the clock router to use only models found in the CLKBUF and
CLKINV entities during clock tree expansion.
force plan clock $m buffer $l/CLKBUF/CLKBUF_HYPER \
inverter $l/CLKINV/CLKINV_HYPER
After the completion of the fix clock command, additional optimization might be required. To prevent
the use of clock models for data signal buffering, hide the special clock entities after the fix clock
command ends. You can do this with the force hide command.
Example:
force hide $l/CLKBUF
force hide $l/CLKINV
Synthesizing Clocks With the fix clock Command
Clock tree synthesis is handled by the fix clock command. Before running fix clock, apply all clock
constraints to the design. This includes force timing clock definitions, as well as the force plan
clock constraints that guide the clock router.
Most flows involve building minimum insertion clock trees that are skew balanced. For backward
compatibility reasons, the default behavior of fix clock is to implement useful skew during clock tree
construction, without any effort toward skew balancing. This does not comply with most common
clocking strategies. To get the tool to build balanced trees, run the fix clock command with the
following option:
fix clock $m $l -weight skew
Figure 7 on page 44 shows a schematic example of a simple clock tree containing two branches of a
single clock, in which one branch is gated and the other is not. At this point in the flow, the design has
just finished going through the fix cell command and the clock tree has not been implemented yet.
There are 200,000 balance points on the main branch, and 50,000 balance points on the gated
branch. All of the registers, along with related cells, have been placed and sized.
Prior to the fix clock part of the flow, the skew between the balance points has been idealized by the
timer, which eliminates the delay through gating logic. The timer uses the timing specification to
determine the idealized arrival time of the clocks at each of the balance points. The ideal arrival
times do not calculate any delay for existing logic in the clock tree, such as the gate logic in Figure 7
on page 44.
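Figure 7: Idealized Clock Input to Clock Implementation Process
(Figure labels: clk, clk_en, gated_clk; registers R1, R2, R3, and R4 with CK, D, and Q pins; 200K loads; 50K loads; balance points)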
Nodes that require a clock include the following:
Clock inputs of sequential cells
Nodes with an applied force timing clockbalance constraint
In the previous example, the default insertion delay is applied to the clock tree. The value of the
default insertion delay is 0 ns. Therefore, the arrival time of the clock signal at the CK pins is at 0 ns.
This default, or ideal, insertion delay is used from the fix time stage of the flow through the fix cell
stage of the flow.
Using Higher Effort Clock Tree Synthesis Flows
It is possible to increase the level of effort applied during clock tree synthesis in order to achieve the
optimal solution. If absolute best skew and lowest buffer count are required, add the -clock_effort
high option to the fix clock command. This increases the amount of compute resources and
memory required to perform clock tree synthesis, but usually provides the best results possible. In
many cases, this can lead to improvements in power as a result of the best use of buffers (buffer
count reductions over the default flow), but this is not exclusively a low-power feature.
If low power is the primary goal, use the config optimize clock_power on command. This
automatically applies the high-effort techniques, as described previously, as well as enables other
specific power-saving methodologies in an effort to reduce the dynamic clock tree power as much as
possible.
For design flows for which you want to use a low-skew approach, you can also introduce additional
effort during placement to attempt to keep registers close together at the leaf level, thus reducing the
net capacitance and the driver requirements necessary to drive them. This is done by adding the
-placement option to config optimize clock_power. The default value is off, meaning no
placement optimization is done. Other accepted values are early (recommended), late, or both.
Using early enables the placement optimization to occur during fix cell at a point where it is least
disruptive. Use late to cause some additional optimization to occur at clock tree synthesis runtime, or
use both to enable both techniques. It is important to understand that these techniques apply only to
low-skew flows run with fix clock -weight skew, and to no others, because the placement
optimization and useful skew scheduling are often at odds with one another. If useful skew is a
desired methodology, skip placement optimization. Finally, because this optimization occurs during
fix cell, remember to enable it before you use the fix cell command.
Aside from the effort applied to clock tree synthesis, you can also increase the effort that fix clock
applies to any timing optimization that it performs. To do this, use the -timing option to
fix cell. This option does not accept arguments; it is either used or not.
Using the run route clock Command for Clock Tree Synthesis Implementation
The run route clock command builds the initial structure of the clock tree. By default, all clock nets
are routed.
Running the run route clock command on a previously routed clock net unroutes, and then
reroutes, the previous result. Be sure this is your intention before attempting this operation.
Depending on the attributes you want your clock tree to have, several options are available to help
you control the manner in which your clock tree is synthesized.
This is the syntax of the run route clock command:
run route clock model lib [-net net] [-pin pin] \
    [-nondefault_mode mode] [-shielding_mode mode] \
    [-override_sign_in_check] [-separate_gate_tree] \
    [-crosstalk] [-nosize] [-samesize] [-noglobal] \
    [-effort medium|high] [-prototype] [-hier] [-both_edges]
Controlling the Scope
Rather than running the command on your entire design, you can run it on a subset of nets by using
the -pin or -net options.
Use the -pin option to route the entire fanout from a specific primary input or output pin of a cell. The
clock router traverses gating elements and multiplexers (MUXs). Depending on the structure of the
design, it might route several nets.
Use the -net option to route and buffer only on a specified net. The clock router does not traverse
gating elements and MUXs.
If using -pin or -net calls the clock router on a previously routed clock net, it is unbuffered and
unrouted first, and then reimplemented.
Clock Repeater Naming Conventions
The repeaters inserted by the run route clock command follow a naming convention that helps you
to identify why a particular repeater was added.
Tree buffers are added to create the high-fanout clock tree to all the sinks. Stem buffers are added to
connect the clock driver pin to the top of the tree created by the tree buffers. Sync buffers are added
by run gate clock, and are discussed in more detail in Chapter 4, Clock Tuning.
buffer tree:<net_name>_L<i>_<j>, where <i>=level number, and
<j>=1,2,... n-1, where there are n buffers at level i.
buffer_stem:<net_name>_S<i>_<j>, where <i>=level number, and
<j>=1,2,... n-1, where there are n buffers at level i over all stems.
CLK_SYNC_: These are delay elements added by the run gate clock command to satisfy
the minimum insertion delay.
The query model buffer_count command returns the number of inserted repeaters of various types
in the design. Use this command to report the total number of clock buffers or the number of tree,
stem, and sync buffers. See the online man pages for syntax of this command.
Handling Gated Clock Trees
When a clock tree contains gated subbranches, the run route clock command automatically
determines where to tap those clockgating cells into the clock tree. The tapping level is a trade-off
between minimizing insertion delay and improving on-chip variation (OCV) robustness. The deeper a
clock-gating cell is placed in the clock tree, the better for OCV, because there is more common path
between two different gated subbranches. If a clockgating cell is inserted too deep into the clock tree,
it can cause the overall insertion delay of the clock tree to increase. Therefore, the run route clock
command taps clockgating cells as deep as possible to maximize common path, while not tapping so
deep as to artificially increase the worst insertion delay.
You can manually override the default tapping behavior with force plan clock -separate_tree. For a
given clock net, one clock pin or a group of clock pins can be placed in a separate tree together. This
builds a separate clock tree from the root of that net, and that tree drives only those pins placed in
that separate tree. This technique is used to force run route clock to tap as early as possible to
those sinks for the given net. See Figure 8 and Figure 9 for the difference between normal tapping
behavior and using a -separate_tree constraint. For more information about the force plan clock
command, see Chapter 4, Clock Tuning.
Figure 8: Gated Clock Tree With Normal Tapping Behavior
Tap-in builds one tree for both branches.
Figure 9: Gated Clock Tree With Separate Tree Constraint on Clockgating Cell
Gated branch is driven from a separate clock tree.
Propagating Nondefault Rules
Use the -nondefault_mode option to select how nondefault rules are propagated throughout the
clock tree by running the following command:
r un r out e cl ock - nondef aul t _mode mode
Some -nondefault_mode option settings assume that you have placed a rule on the clock using the
force net nondefault command. For more information about the force net nondefault command,
see Chapter 4, Clock Tuning.
The mode argument can be one of the following:
none
In this mode, new nets created as a result of buffering do not get a nondefault rule assigned
to them.
manual (default)
This mode takes the rule that was applied to the clock net with the force net nondefault
command and assigns it to all newly created nets.
noleaf
Similar to manual mode, the noleaf mode differs slightly in that it does not apply nondefault
rules to newly created nets if they fan out to a clock leaf pin. Not applying nondefault rules on
leaf-level nets helps prevent routing congestion.
tapering
This mode is an automatic method for creating tapered routing rules. Each level up from the
leaves is automatically increased in width and spacing. You must specify a nondefault rule on
the net that is used as an upper bound for all newly created rules.
double_s
This mode is an automatic method of routing the clock with a double spacing rule. No force
net nondefault command is needed. This mode implies the use of the noleaf mode as well.
enforce_double_s
This mode is the same as the double_s mode, except that the nondefault rules are enforced
on the top two metal layers used to route the clocks. An enforced nondefault rule means that
the DRC checker flags violations of the nondefault rule. For example, if clock routing is
allowed on M1 through M6, this mode applies double spacing rules to all clock layers, and
enforces the double spacing rules on M5 and M6. This mode also implies the use of the
noleaf mode.
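The propagation rules for the simpler modes, and the layer selection in the enforce_double_s example, can be expressed compactly. The Python sketch below models only what the descriptions above state for none, manual, and noleaf; the tapering and double-spacing modes create rules automatically and are not modeled. The function names are hypothetical.

```python
def new_net_gets_rule(mode, fans_out_to_leaf):
    """Decide whether a net created by buffering inherits the nondefault rule.

    mode: 'none', 'manual', or 'noleaf', as described for -nondefault_mode.
    fans_out_to_leaf: True if the new net drives a clock leaf pin.
    """
    if mode == "none":
        return False
    if mode == "manual":
        return True
    if mode == "noleaf":
        return not fans_out_to_leaf
    raise ValueError("mode not modeled in this sketch: " + mode)

def enforced_layers(clock_layers, count=2):
    """Return the top metal layers on which enforce_double_s enforces the rule.

    clock_layers: layers allowed for clock routing, ordered lowest to highest.
    """
    return clock_layers[-count:]

print(new_net_gets_rule("noleaf", fans_out_to_leaf=True))
print(enforced_layers(["M1", "M2", "M3", "M4", "M5", "M6"]))
```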
Propagating Shielding Rules
By using the -shielding_mode option, you can control the propagation of shielding rules through the
clock nets. This is the command syntax:
r un r out e cl ock - shi el di ng_mode mode
Because earlier stages of the flow can detect that shielding will be inserted, the routers reserve
space for the shields and the extractors use the shields when estimating wire capacitance. The wires
are physically inserted by the run route shielding command later in the flow.
The mode argument can be one of the following:
none
In this mode, new nets created as a result of buffering do not get a shielding rule assigned to
them.
manual (default)
This mode takes the rule that was applied to the clock net with the force net shielding
command and assigns it to all newly created nets.
noleaf
Similar to manual mode, the noleaf mode differs slightly in that it does not apply shielding
rules to newly created nets if they fan out to a clock leaf pin.
greedy
The greedy mode marks all clock nets for shielding, but it does not constrain them
immediately. The mode allows signal nets to route next to the clock nets. Any gaps left after
routing are then shielded after running the fix wire command.
auto
This mode causes minimum width shielding constraints to be applied to all nets created by
the clock router. No force net shielding commands are required.
Reducing Crosstalk Sensitivity
The -crosstalk option applies double spacing on all routed clock nets to reduce crosstalk sensitivity.
The option has the same effect as using -nondefault_mode double_s. It is part of the crosstalk
avoidance flow.
There is a tradeoff between eliminating wire coupling on the clock net versus minimizing area. The
-crosstalk method of double spacing the clock is minimal and not strictly enforced. If crosstalk is a
serious problem in your design, triple spacing or shielding is a more effective tactic. Shielding is the
most effective approach, but it costs more in area than double or triple spacing the clock nets.
Controlling the Size of Buffers at the Same Level
By default, all buffers are sized independently so that skew at each level of the clock tree is
minimized. If you want all buffers at the same level of the tree to be the same size, use the
-samesize option for the run route clock command. The tool selects the optimal size buffer for each
level based on the worst-case load at that level.
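As a rough illustration of sizing by worst-case load, the following Python sketch picks one buffer per level. The selection rule used here (the weakest buffer whose load limit covers the largest load at that level) is an assumption made for illustration; the guide does not state how the tool defines the optimal size. The cell names and load limits are hypothetical.

```python
def pick_same_size(level_loads, buffer_max_load):
    """Choose one buffer cell per clock tree level.

    level_loads     -- {level: [load seen by each buffer at that level]}
    buffer_max_load -- {cell name: largest load the cell may drive}
    """
    # Weakest cells first, so the first match is the smallest adequate one.
    by_strength = sorted(buffer_max_load.items(), key=lambda item: item[1])
    choice = {}
    for level, loads in level_loads.items():
        worst = max(loads)  # worst-case load at this level
        for name, limit in by_strength:
            if limit >= worst:
                choice[level] = name
                break
        else:
            # No cell covers the load; fall back to the strongest cell.
            choice[level] = by_strength[-1][0]
    return choice


cells = {"BUFX2": 40, "BUFX4": 80, "BUFX8": 160}   # hypothetical library
loads = {1: [30, 35], 2: [60, 75, 20]}              # hypothetical loads
print(pick_same_size(loads, cells))  # {1: 'BUFX2', 2: 'BUFX4'}
```

Every buffer at level 2 receives the same cell even though one of them drives a load of only 20, which is the tradeoff the -samesize option makes.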
The -samesize option and the -nosize option, discussed in the following section, are mutually
exclusive.
Sizing Buffers and Inverters
Normally, each buffer in the clock tree is sized to try to match the buffer delay at each level. Use the
-nosize option for the run route clock command to skip this sizing step.
The -samesize option and the -nosize option are mutually exclusive.
Skipping Global Routing
The -noglobal option for the run route clock command causes the global routing update to be
skipped. This is very useful if you are planning to perform many calls to the run route clock
command. Only the last call needs to update the global route. Allowing earlier commands to skip this
step saves time.
Skipping the Sign-In Check
When invoked, the run route clock command first performs a library sanity check to make sure the
library RC values, buffer and inverter typical loads, slew limits, and antenna information are
reasonable. If they are not, an error message displays and execution of the command terminates.
Use the -override_sign_in_check option for the run route clock command to skip the library sanity
check and allow the command to continue running.
Using a Special Timing Optimization Command
for Clock Tree Synthesis Implementation
The tool includes a special optimization command, fix opt global, that is based on techniques used
to optimize timing beyond that achieved using the standard fix clock command.
Note: This command is not part of the default fix-command-based flow and should only be used
when absolutely necessary to achieve timing convergence on difficult timing designs.
After running fix clock, you can use the fix opt global command to improve the timing of the design.
The command performs optimization techniques such as timing-driven placement, gate sizing, and
unbuffering. It also performs incremental detailed placement, global routing, and (optionally) track
routing.
Clock Implementation: Key Points To Remember
While using the run route clock command, keep a few important points in mind:
Latencies
Latencies are never used during initial clock tree construction, regardless of the options you
use. Specifically, latency targets specified with the force timing latency command are not
taken into account at this stage of the flow.
Skew goals
Skew goals are not used during this stage. The run route clock command builds all clock
trees to be as fast as possible. Skew goals are handled during clock tuning, using the run
gate clock command. For information about clock tuning, see Chapter 4, Clock Tuning.
4. Clock Tuning
This chapter describes how to tune the clock tree using the run gate clock command.
Clock tuning involves the optimization of previously constructed clock trees to meet various design
goals. Tuning is the part of the clock tree synthesis flow that works to meet the clock constraints by
resizing and buffering clock nets. It attempts to meet the following constraints:
Target insertion delay
Maximum skew
Maximum useful skew
Using the run gate clock Command for Clock Tuning
You perform clock tuning with the run gate clock command. You can run the command several
times to meet different objectives.
Like the run route clock command, the run gate clock command has several options that let you
control how clock tuning is accomplished. Additionally, several force plan clock settings control the
behavior of run gate clock.
This is the syntax of the run gate clock command:
run gate clock model lib [-pin string] \
[-weight slack | skew | boundary | failing_endpoints] [-both_edges] \
[-prototype] [-hier] [-power] [-override_sign_in_check] \
[-effort medium | high] [-legalize on | off]
Using the Optimization Modes
Using the -weight option allows you to control the balancing and tuning of the clock tree. This option
is closely tied to constraints you set with the force plan clock and the force timing latency
commands earlier in the flow.
Four modes are available:
1. slack (default)
2. skew
3. boundary
4. failing_endpoints
The slack Mode
Use the -weight slack option to implement the slack mode.
The slack mode, which is the default mode if you do not use the -weight option, helps you achieve
better timing by adjusting clock paths to improve slack. This mode does not address skew. Using
slack mode might not result in the minimum skew solution because skew is adjusted to obtain the
best slack solution.
The slack mode algorithm differs from the failing_endpoints mode algorithm for slack minimization.
The slack mode attempts to minimize the worst negative slack. This means that it is always focused
on the worst violators. If no improvements can be made on the worst slack path in the design, other
paths are not investigated. This behavior might not be desirable if the design has constraint or other
known issues causing bad, uncorrectable slack.
The slack mode uses the constraint you set earlier with the force plan clock -max_useful_skew
option.
Important: The -max_useful_skew option does not control the maximum overall skew. It controls
the amount of adjustment that the run gate clock -weight slack command is allowed to
make per clock sink. If a large amount of skew is present before running the run gate
clock -weight slack command, it is not deskewed during run gate clock.
Consider the example in Figure 10.
Figure 10: Useful Clock Skew Example
Figure 10 depicts four pipeline registers. The long path in the design is between the middle two
registers (R2 and R3). If the path from R3 to R4 is shorter than the cycle time and has positive timing
slack, it is possible to borrow that positive slack to achieve timing goals in the long path. This is
accomplished by adding clock buffers to slow the clock to R3.
The -weight slack option respects the force plan clock -max_useful_skew constraint each time
that you perform run gate clock -weight slack.
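The slack borrowing in the Figure 10 discussion can be worked through numerically. The following Python sketch is a simplified model, not the tool's algorithm: it ignores setup time and clock-to-Q delay, treats the -max_useful_skew value as a cap on the delay added at one sink, and never borrows more than the positive slack available on the R3 to R4 path. All function names and numbers are invented for illustration.

```python
def setup_slack(period, launch_clk, capture_clk, path_delay):
    """Simplified setup slack: time available minus time needed."""
    return period + capture_clk - launch_clk - path_delay


def borrow_slack(period, delay_r2_r3, delay_r3_r4, max_useful_skew):
    """Delay the R3 clock to help the R2->R3 path; return the result.

    Returns (shift applied to the R3 clock, R2->R3 slack, R3->R4 slack).
    """
    slack_23 = setup_slack(period, 0, 0, delay_r2_r3)
    slack_34 = setup_slack(period, 0, 0, delay_r3_r4)
    wanted = max(0, -slack_23)          # what the long path is missing
    available = max(0, slack_34)        # what the short path can give up
    shift = min(wanted, available, max_useful_skew)
    # R3 captures later (helps R2->R3) and launches later (hurts R3->R4).
    new_23 = setup_slack(period, 0, shift, delay_r2_r3)
    new_34 = setup_slack(period, shift, 0, delay_r3_r4)
    return shift, new_23, new_34


# 1000 ps cycle, 1100 ps long path, 700 ps short path.
print(borrow_slack(1000, 1100, 700, 150))  # (100, 0, 200)
print(borrow_slack(1000, 1100, 700, 60))   # (60, -40, 240)
```

In the second call the cap of 60 ps prevents the long path from being fully fixed, which mirrors the point that the constraint bounds the adjustment per sink rather than the overall skew.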
For more information about the force plan clock command, see the online man page and Chapter 2,
Clock Constraints.
[Figure 10 diagram labels: registers R1, R2, R3, and R4 (pins CK, D, and Q); data signals R2D, R3D, and R4D; clk distribution with fast clk and slow clk branches; CLK_SYNC_SLACK buf.]
The skew Mode
Use the -weight skew option to implement the skew balancing mode.
The skew balancing mode attempts to minimize skew to the limit you set with the force plan clock
-max_skew option. Skew is minimized by first determining the latest arriving clock sink, and then
slowing all other clock sinks to match delay with the latest sink. This is done on a per-skew-group
basis across the entire design.
If you do not specify a value, the -max_skew option defaults to a library-dependent value, equal to
approximately a buffer delay.
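The balancing idea (find the latest sink, then slow the others toward it) can be sketched in a few lines of Python. This is a conceptual model only. It assumes that a sink is padded just enough to land within max_skew of the latest sink; the guide does not spell out the exact padding rule. The sink names and delays are hypothetical, and the function would be applied once per skew group.

```python
def balance_skew_group(insertion_delays, max_skew):
    """Return the delay to add at each sink of one skew group.

    insertion_delays -- {sink name: insertion delay in ps}
    max_skew         -- allowed skew within the group, in ps
    """
    latest = max(insertion_delays.values())
    earliest_allowed = latest - max_skew
    added = {}
    for sink, delay in insertion_delays.items():
        # Delay can only be added, never removed.
        added[sink] = max(0, earliest_allowed - delay)
    return added


group = {"a/CK": 500, "b/CK": 420, "c/CK": 300}
print(balance_skew_group(group, 100))
# {'a/CK': 0, 'b/CK': 0, 'c/CK': 100}
```

Because the latest sink is never sped up, one very late sink sets the target for the whole group.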
The skew mode respects and implements the following force plan clock constraints:
-target_insertion_delay
-skew_dontcare
-skew_care
-skew_phase
-skew_offset
-skew_group
Additionally, skew mode respects any network or skew latency constraints placed on clock sinks with
the force timing latency command. These latencies are implemented as offsets. For example, a
200 ps network latency on a clock sink causes it to be delayed by 200 ps with respect to the insertion
delay of its skew group. A negative latency causes early tapping. Note that these latencies must be
placed on clock sinks to be implemented by run gate clock. Any latencies on clock source pins or
intermediate nodes in the clock network are ignored during clock tree synthesis.
Because the only mechanism that run gate clock has to control skew is to add delay, one large
insertion delay endpoint causes the rest of the clock endpoints to be delayed to meet it. Sometimes,
the better way to address skew problems is to address the insertion delay problems that caused the
bad skew to begin with. This must be done during run route clock, because run gate clock never
decreases the insertion delay to a clock sink.
The skew mode attempts to minimize skew on both edges of the clock at all clock endpoints.
Consequently, libraries with unbalanced buffers might have trouble converging on a good skew
value. In addition, clock networks with many gating elements might have trouble converging on a
good skew due to differences in rise and fall delays in these cells. Also, non-unate gating elements,
such as XOR gates, can hurt skew balancing because they cause the number of timing events in the
clock network to be doubled, effectively doubling the number of events that have to be balanced at
the clock sinks.
For more information about the force plan clock command, see the online man page and
Chapter 2, Clock Constraints.
The boundary Mode
Use the -weight boundary option to implement the boundary mode.
The boundary mode, a hybrid between skew mode and slack mode, attempts to minimize skew
between all registers that interact with the design I/O. It minimizes the clock skew for all phases of all
clocks connected to registers that send or receive signals to or from the design boundary.
As in skew mode, boundary mode minimizes skew by slowing faster clock paths to meet the limit
you set with the force plan clock -max_skew option. Similarly, as in slack mode, boundary mode
improves timing by slowing clock nets to destination registers to stretch the clock cycle time to those
registers.
The boundary mode gives the appearance from the outside of a skew balanced clock, while
allowing freedom inside the block to meet slack goals through clock skew tuning.
For more information about the force plan clock command, see Chapter 2, Clock Constraints.
The failing_endpoints Mode
Use the -weight failing_endpoints option to implement the failing_endpoints mode of useful skew.
The failing_endpoints mode is similar to slack mode in that it performs useful skew, but it uses a
different algorithm. It focuses on minimizing the total number of failing endpoints in the design. So,
unlike slack mode, failing_endpoints mode examines subcritical paths for improvement. This
method of slack optimization results in a worst negative slack similar to the slack mode algorithm, but
usually has better total negative slack and fewer failing endpoints. As a result, the timing might be
better, but at the expense of more useful skew applied in the design.
The failing_endpoints mode uses the constraint you set earlier with the force plan clock
-max_useful_skew option.
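The three figures of merit mentioned above (worst negative slack, total negative slack, and number of failing endpoints) can be computed from a list of endpoint slacks as in the following Python sketch. The two slack lists are invented to illustrate the kind of difference described; they are not tool output.

```python
def slack_metrics(endpoint_slacks):
    """Return (worst negative slack, total negative slack, failing count)."""
    negatives = [s for s in endpoint_slacks if s < 0]
    wns = min(negatives) if negatives else 0
    tns = sum(negatives)
    return wns, tns, len(negatives)


# Hypothetical endpoint slacks in ps after each kind of tuning.
after_slack_mode = [-50, -45, -40, 10]
after_failing_endpoints_mode = [-50, -5, 5, 10]

print(slack_metrics(after_slack_mode))              # (-50, -135, 3)
print(slack_metrics(after_failing_endpoints_mode))  # (-50, -55, 2)
```

Both lists share the same worst negative slack, but the second has a smaller total negative slack and fewer failing endpoints.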
Understanding Clock Repeater Naming Conventions
The repeaters inserted by the run gate clock command follow a naming convention that helps you
identify why a particular repeater was added:
CLK_SYNC_SKEW_ repeaters are added if the -weight skew option is used.
CLK_SYNC_USKEW_ repeaters are added if the -weight slack option is used.
CLK_SYNC_FEP_ repeaters are added if the -weight failing_endpoints option is used.
For the naming convention of repeaters inserted by the run route clock command, see Chapter 3,
Clock Implementation.
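When post-processing a netlist or report outside the tool, the prefixes above can be used to group repeaters by the tuning mode that created them. The following Python sketch shows one way to do that. The instance names and the "/" hierarchy separator are assumptions; only the three prefixes come from this guide.

```python
PREFIX_TO_MODE = {
    "CLK_SYNC_SKEW_": "skew",
    "CLK_SYNC_USKEW_": "slack",
    "CLK_SYNC_FEP_": "failing_endpoints",
}


def tuning_mode_for(instance_name):
    """Return the -weight mode that added a repeater, or None."""
    # Look only at the leaf name, after the last hierarchy separator.
    leaf = instance_name.rsplit("/", 1)[-1]
    for prefix, mode in PREFIX_TO_MODE.items():
        if leaf.startswith(prefix):
            return mode
    return None


print(tuning_mode_for("top/core/CLK_SYNC_USKEW_17"))  # slack
print(tuning_mode_for("top/core/CLK_SYNC_SKEW_3"))    # skew
print(tuning_mode_for("top/core/u_buf_9"))            # None
```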
Mixing Optimization Modes
Sometimes, it is useful to perform the run gate clock command more than once using different
optimization modes.
Important: If you run the command more than once with different modes, be careful that the second
pass does not undo work from the first pass.
Consider the following scenario:
You are working on a design and want to take advantage of useful skew to meet your timing goals.
But you find that the run gate clock -weight slack command leaves too much skew between
noncritical endpoints, causing hold problems.
A solution to this problem is to perform the run gate clock command two times: once with the
-weight skew option and a second time with the -weight slack option. The code might look like this:
# Implement the clock tree
run route clock $m $l
# Establish a loose constraint for skew, but do not overdo
# it because skew is not your primary concern. You just want
# to clean up the really bad skews.
force plan clock $m -max_skew 300p
run gate clock $m $l -weight skew
# Perform run gate clock again to implement useful skew to
# try to fix timing problems
run gate clock $m $l -weight slack
Most design methodologies call for reducing the overall clock skew in the design, sometimes
allowing useful skew for critical paths. For this reason, most methodologies first use -weight skew
during fix clock, followed optionally by a pass of useful skew, either during fix clock, fix opt global,
or in final mode after fix wire.
Controlling the Scope
Rather than running the run gate clock command on your entire design, you can run it on a subset
of nets by using the -pin option.
Use the -pin option to balance the entire fanout from any given pin. The clock net connected to each
pin and all clock nets in the transitive fanout cone of the pin are optimized. The optimization
traverses gating elements and MUXs. Depending on the structure, it might operate on several nets.
By default, all clock nets are optimized.
Clock Tuning: Key Points To Remember
While using the run gate clock command, keep these important points in mind:
Changing arrival times
The run gate clock command can change arrival times only by slowing branches. If you are
trying to meet a target latency and are already failing, the run gate clock command does not
fix the problem. You must investigate the initial structure of your clock tree or the target
latency to alleviate the problem.
Performing clock tuning
Never perform the run gate clock command on a net that has not been initially buffered with
the run route clock command.
5. Clock Reporting
This chapter describes how to build reports about the clock tree.
After implementing and tuning your clock tree, it might be good enough to examine timing results to
assess whether clock tree synthesis met your design requirements. However, you might also want to
evaluate the resulting clock skew. Alternatively, you might want to look at latencies or the levels of
logic in the clock tree.
Reporting Commands
Magma clock tree reports allow you to extract such relevant information about your clock tree
implementation. You can generate reports with the following commands:
report clock tree
report clock skew
report clock sinks
report clock latency
Additionally, the Clock Viewer in the GUI provides many useful ways to visualize and report on your
clock tree. See Using the Clock Viewer on page 71.
The following section explains each reporting command and its associated config... command, if
applicable.
The report clock tree Command
The report clock tree command generates a report of the clock tree structure for all elements in the
clock tree. This is the syntax for the report clock tree command:
report clock tree model [-pin pin] [-mode] [-late] [-early] \
[-file filename] [-append filename] [-string] [-noheader] [-nohier] \
[-prototype]
Each row in the report contains the name and downmodel of a specific clock node in the clock
network, organized by number indicating the level in the clock tree, starting with zero for the root of
the clock tree.
You can customize the columns in the report to contain only the information you want, such as net,
pin, arrival time, successor, predecessor, and others. Use the config report clock tree command to
specify the information you want the report to contain.
This is the syntax of the config report clock tree command:
config report clock tree value [-prototype]
where value is a list of column identifiers enclosed in double quotation marks ("") or curly braces ({}).
See the online man pages for a complete list of identifiers. The -prototype option for this command
applies the settings to the clock tree report for soft macros (invoked using the -prototype option for
report clock tree).
Running the config report clock tree command without an argument returns the currently specified
set of column identifiers. Running the command with an invalid argument, such as an empty set,
returns the entire list of valid column identifiers.
Example:
config report clock tree {SOURCE BALANCE RT_RISE OFFSET}
This example configures the clock tree report to contain columns for the following items for each
node in the clock tree:
Source name
Position relative to the balance point
Required time for the rising edge of the input pin
User-defined offset
Example:
config report clock tree
This example returns a list of the currently specified column identifiers.
Example:
config report clock tree {}
This example returns a list of all possible column identifiers.
Example:
report clock tree /work/entity/my_model
This example generates a report of the clock tree structure for each clock node in the model
/work/entity/my_model. The columns of the report contain the identifiers previously specified with the
config report clock tree command.
To control the scope of the report clock tree command, rather than running it on your entire design,
you can run it on a subset of nets by using the -pin pin option.
Use the -pin option to report statistics for the clock driven by pin. The driven clock includes the net
connected to pin as well as the clock nets in the transitive fanout cone of pin.
The report clock skew Command
The report clock skew command generates a report of skew statistics for all clocks in the design on
a per-clock-phase basis. Like the report clock tree command, the clock skew report has
configurable columns. This is the syntax for the report clock skew command:
report clock skew model [-all] [-mode] [-late] [-early] [-no_offset] \
[-no_offset_pins] [-pin pin_name] [-both_edges] [-nohier] \
[-prototype] [-file] [-append filename] [-string] [-noheader]
Important: The report clock skew command is helpful in debugging latency problems. The
command runs faster than the report clock latency command and is a dedicated skew
report.
Use the config report clock skew command to specify the information you want the report to
contain. The report contains a table with one row for each skew phase or skew group in the design.
For example, a row with CLK:R contains information pertaining to the CLK:R skew phase. A row with
CLK:R#1 contains information pertaining to skew group #1 of the CLK:R skew phase.
This is the syntax of the config report clock skew command:
config report clock skew format [-prototype]
where format is a list of column identifiers enclosed in double quotation marks ("") or curly braces
({}). See the online man pages for a complete list of identifiers. The -prototype option indicates that
the settings are applied to the clock skew report for soft macros.
Running the config report clock skew command without an argument returns the currently
specified set of column identifiers. Running the command with an invalid argument, such as an
empty set, returns the entire list of valid column identifiers.
Several of the column identifiers contain the prefixes GLOBAL_, LOCAL_, or BOUNDARY_. The
values refer to global skew, local skew, and boundary skew. The following definitions are helpful:
Global skew
The difference between the minimum insertion delay and the maximum insertion delay on
any given clock. This value is pessimistic because not all registers in a design have an
interaction. Global skew is a theoretical upper bound on clock skew that might never occur
in an actual design.
Local skew
The largest difference between insertion delays for register combinations that have an
interaction. This value is more realistic. All local skew reports represent a real skew. Local
skews are always less than or equal to global skews.
Boundary skew
Like global skew, but represents the skew only between clock pins that have an interaction
with the design I/O. Boundary skews are always less than or equal to global skews.
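The three definitions can be made concrete with a short Python sketch. It is a conceptual model only: the sink names, insertion delays, interacting pairs, and boundary set are all hypothetical, and the tool's own computation may differ in detail (for example, in how it handles clock phases and edges).

```python
def global_skew(delays):
    """Largest minus smallest insertion delay over all sinks."""
    return max(delays.values()) - min(delays.values())


def local_skew(delays, interacting_pairs):
    """Largest delay difference over sink pairs that exchange data."""
    return max((abs(delays[a] - delays[b]) for a, b in interacting_pairs),
               default=0)


def boundary_skew(delays, boundary_sinks):
    """Global-style skew restricted to sinks that interact with design I/O."""
    subset = [delays[s] for s in boundary_sinks]
    return max(subset) - min(subset) if subset else 0


delays = {"r1": 400, "r2": 450, "r3": 520, "r4": 610}  # ps
pairs = [("r1", "r2"), ("r2", "r3")]                   # registers that interact
io_sinks = ["r1", "r3"]                                # registers at the I/O

print(global_skew(delays))              # 210
print(local_skew(delays, pairs))        # 70
print(boundary_skew(delays, io_sinks))  # 120
```

In this example r4 has the largest insertion delay but interacts with no other register, so it inflates the global skew without affecting the local skew.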
Note: By default, the report clock skew command reports skew only on the active edges. If you
want a report containing information on both edges of the clock, use the -both_edges option.
Example:
config report clock skew {CLOCK SINK_COUNT PHASE MAX_SKEW \
MAX_INSERTION_DELAY MIN_INSERTION_DELAY}
This example configures the skew report to include the following information for all clocks in the
design:
Clock name
Number of sinks
Phase name
Maximum skew
Maximum insertion delay
Minimum insertion delay
Example:
config report clock skew
This example returns a list of the currently specified column identifiers.
Example:
config report clock skew {}
This example returns a list of all possible column identifiers.
Example:
report clock skew /work/entity/my_model
This example generates a report of skew statistics for each clock in the model
/work/entity/my_model. The columns of the report contain the identifiers previously specified with the
config report clock skew command.
The report clock sinks Command
The report clock sinks command generates a report of clock sinks, sorted by increasing insertion
delay. The report contains a table with rows for each clock sink.
By default, the command reports all clock sinks for the specified model. However, several options are
available to limit the report to a subset of the sinks.
By default, the report contains columns for latency, pin name, edge, skew group, and clock phase.
You can override the default configuration by using the config report clock sinks command to
specify a list of column identifiers that represents the information that you want the report to contain.
The report clock latency Command
The report clock latency command generates a report of clock latencies for all clocks in the design.
As opposed to the skew report, the latency report is a slack report in which the worst latency slack is
reported on a per-clock-phase basis. Slack is defined as the comparison of ideal clock arrival time
with the computed clock arrival time.
Note: Use the report clock skew command to debug latency problems. It runs much faster than
report clock latency. The report clock latency command does not provide good results if
the ideal arrival times are not specified or are unreasonable.
This is the syntax for the report clock latency command:
report clock latency model [-pin pin_name] [-mode] [-late] [-early] \
[-both_edges] [-nohier] [-file] [-append filename] [-string] \
[-noheader]
Use the config report clock latency command to specify the information you want the report to
contain. The report contains tables with one row for each timing event: either the rising or falling
edge.
This is the syntax of the config report clock latency command:
config report clock latency value
where value is a list of column identifiers enclosed in double quotation marks ("") or curly braces ({}).
See the online man pages for a complete list of identifiers.
Running the config report clock latency command without an argument returns the currently
specified set of column identifiers. Running the command with an invalid argument, such as an
empty set, returns the entire list of valid column identifiers.
Query Commands
This section explains three important query... commands that can provide useful information about
your clock tree:
query clock histogram
query clock sinks
query model buffer_count
The query clock histogram Command
Use the query clock histogram command to generate a histogram showing the number of
endpoints versus insertion delay for a specified range. This method is a fast way to assess the clock
skew distribution. It can help you decide whether bad skew is caused by one or a few bad endpoints,
or whether the bad skew is evenly distributed.
There are several options you can use with this command. They allow you to set and control
parameters such as these:
Maximum and minimum insertion delays for the histogram subranges
Number of divisions in the histogram
Number of endpoints with insertion delay less than or equal to that of each range
Generation of reports for specific skew groups
Use the -all option if you want all clock sinks to be included in the histogram. By default, sinks
specified as skew_dontcares by the force plan clock command are excluded from the histogram.
The -all option overrides the default behavior and includes all sinks.
Use the -nohier option to indicate that the report is not to traverse the hierarchy and not to include
clock pins embedded within hierarchical cells. By default, clock pins in hierarchical cells are
considered.
Use the -pin option to specify the name of a pin for which to include clock sinks in the histogram.
Sinks are then included if they are in the transitive fanout of the specified pin. By default, all clock
sinks are considered.
See the online man pages for a complete list of options for this command.
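To see why a histogram separates the two cases (a few bad endpoints versus evenly distributed skew), consider the following Python sketch. It is a standalone model of binning insertion delays, not a reimplementation of the command; the function name, parameters, and data are hypothetical.

```python
def insertion_delay_histogram(delays, lo, hi, divisions):
    """Count sinks per insertion delay subrange between lo and hi."""
    width = (hi - lo) / divisions
    counts = [0] * divisions
    for d in delays:
        if lo <= d <= hi:
            # A delay equal to hi falls into the last subrange.
            index = min(int((d - lo) / width), divisions - 1)
            counts[index] += 1
    return counts


one_outlier = [410, 420, 430, 440, 450, 900]   # ps
evenly_spread = [410, 510, 610, 710, 810]      # ps

print(insertion_delay_histogram(one_outlier, 400, 900, 5))    # [5, 0, 0, 0, 1]
print(insertion_delay_histogram(evenly_spread, 400, 900, 5))  # [1, 1, 1, 1, 1]
```

The first distribution points to a single late endpoint that is worth investigating on its own; the second suggests a structural problem in the tree.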
The query clock sinks Command
The query clock sinks command returns a list of clock sinks, sorted by increasing insertion delay.
By default, the command returns a list of all clock sinks for the specified model. However, several
options are available to limit the list to a subset of the sinks. Options allow you to tailor the list in the
following ways:
Use the -metric value option to specify the metric to be used for generating and sorting
clock sink information. Valid values are as follows:
o Use the latency value to report sinks and sort them by latency (the default).
o Use the levels value to generate data where sinks are sorted according to the number of
logic levels in the clock tree leading up to that sink.
o Use the wire_delay_fraction value to generate data where sinks are sorted according to
the total wire delay fraction of all logic levels leading up to a sink.
Use the -mode option to specify the timer modes, as specified by force timing mode, that
need to be considered. Only clocks that are active in one of the specified modes are
reported.
Use the -late or -early options to configure the report to consider only worst case timing (the
default) or best case timing.
Use the -boundary option to list clock sinks only if they have an interaction with the design
interface.
Use the -both_edges option to specify using both edges of the clock at each clock sink,
rather than only the active edges.
Use the -no_offset option to eliminate from the report offsets specified with the force plan
clock -offset command.
Use the -no_offset_pins option to configure the command not to include balance points with
nonzero skew offsets, as specified by force timing latency -type skew or -type network at
clock sinks. By default, all sinks are included.
Use the -pin pin_name option to include sinks only if they are in the transitive fanout of a
specific pin. By default, all clock sinks are considered.
Use the -skew_group string option to include only the sinks for a specific skew group,
rather than for all skew groups.
Use the -no_hier option to configure the command not to traverse the hierarchy to include
clock sinks embedded within hierarchical cells.
Use the -all option to specify that all clock sinks are to be considered. By default, only sinks
specified by force plan clock -skew_care, if any, are considered.
Use the -clock_phase option to specify the name of the clock phase for which the sinks are
reported.
Use the -largest number or -smallest number option to specify the number of sinks to be
listed, beginning with the highest or lowest insertion delay. You can use these options
together.
For complete information about the query clock sinks command and its options, see the online man
pages.
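The interaction between the sorting metric and the -largest and -smallest limits can be pictured with the following Python sketch. It models the behavior described above on an in-memory list and is not the command's implementation. The record layout, pin names, and values are hypothetical, and the handling of overlapping selections is an assumption.

```python
def select_sinks(sinks, metric="latency", largest=None, smallest=None):
    """Sort sinks by a metric and optionally keep only the extremes.

    sinks -- list of dicts, each with a "pin" key and one key per metric
    """
    ordered = sorted(sinks, key=lambda s: s[metric])
    if largest is None and smallest is None:
        return [s["pin"] for s in ordered]
    picked = []
    if smallest:
        picked.extend(ordered[:smallest])
    if largest:
        # Skip sinks already picked when the two selections overlap.
        picked.extend(s for s in ordered[-largest:] if s not in picked)
    return [s["pin"] for s in picked]


sinks = [
    {"pin": "a/CK", "latency": 520, "levels": 4},
    {"pin": "b/CK", "latency": 410, "levels": 5},
    {"pin": "c/CK", "latency": 610, "levels": 3},
]

print(select_sinks(sinks))                         # ['b/CK', 'a/CK', 'c/CK']
print(select_sinks(sinks, metric="levels"))        # ['c/CK', 'a/CK', 'b/CK']
print(select_sinks(sinks, largest=1, smallest=1))  # ['b/CK', 'c/CK']
```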
The query model buffer_count Command
Use the query model buffer_count command to quickly determine the impact of various buffering
steps during clock tree synthesis. The command, by default, returns the total number of repeaters
that were inserted in the design by all commands. To see information about clock buffers or inverters,
use the -type option.
By using the -type option, you can specify the particular types of repeaters you want counted. There
are many values available, including these:
Buffers in the basic clock tree (-type clock_tree)
Buffers in the stem of the clock tree (-type clock_stem)
Buffers used in the clock tree for skew or slack optimization (-type clock_sync)
Buffers that create dummy loads to perform capacitance balancing in the clock tree (-type
clock_cap)
All clock buffers listed in the previous bullets (-type clock)
For a complete list of repeater types and other query model buffer_count options, see the online
man page for this command.
Exporting Clock Information
Using the export clock command, you can export logical (netlist) and physical (placement and
routing) information about the clock networks in your design for use at a later time. The output of the
operation is a script containing commands needed to re-create the clock objects or networks. You
can then read the output file back into the tool using the source command to re-create the clock
objects or networks.
Using the export clock Command
This is the syntax of the export clock command:
export clock model filename [-pin pin_name] [-level] [-append] \
[-routing list] [-placement list] [-wire_preroute_statuses list] \
[-shield_preroute_statuses list] \
[-routing_layers {layer_name layer_name ...}]
Use the -pin option if you want to export the clock fanout from a specific pin. For example, consider
a design with three clocks. By default, the export clock command exports all clocks in the design.
However, if you want to export only one of the three clocks, use the -pin option to specify the source
pin of one of the clocks.
The -level option is useful primarily for debugging operations. By default, the export clock
command exports all clock objects and networks. If you want to export a specific clock network or
path between two particular gates, though, use the -level option to export one level of logic.
Use the -append option if you do not want to overwrite the contents of the output file during
subsequent exporting operations. For example, consider a design containing two clock networks.
Suppose you export the first clock net to the output file. If you later export the second clock net to the
same output file, the previous contents are overwritten by default. However, if you use the -append
option, the information about the second clock net is appended to the previous contents of the output
file.
By default, the export clock command does not export routing and placement information. If you
want to export this information, you can use the -routing and -placement options.
Use the -routing option to export routing information, such as wire and shielding data. The
accepted arguments to the option are wire, shielding, or all. If you use the all argument,
both wire and shielding information are exported.
Example:
export clock $m -file dump.tcl -routing wire
The example exports the clock netlist and routing wire information of all clock networks in
model $m to a file named dump.tcl.
If you use the -routing wire argument, you can also use the -wire_preroute_statuses
option to limit the export to wires with one or more of the following preroute statuses: soft,
hard, special, or none.
Example:
export clock $m -file dump.tcl -routing wire \
    -wire_preroute_statuses soft
The example exports the clock netlist and routing wire information of all clock networks in
model $m to a file named dump.tcl. The clock routing wires are limited to those with a soft
preroute status.
Similarly, if you use the -routing shield argument, you can also use the
-shield_preroute_statuses option to limit the export to clock shielding wires
with one or more of the following preroute statuses: soft, hard, special, or none.
Example:
export clock $m -file dump.tcl -routing shield \
    -shield_preroute_statuses hard
The example exports the clock netlist and routing shield information of all clock networks in
model $m to a file named dump.tcl. The clock shielding wires are limited to those with a hard
preroute status.
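When you need to compare wires by preroute status, one possible approach is to write a separate file per status. The sketch below uses only the options documented above; the output file names are hypothetical, and $m is assumed to hold the design model.

```tcl
# Write one export file per wire preroute status so that each can be
# inspected or sourced on its own.
foreach status {soft hard special none} {
    export clock $m -file clock_wires_${status}.tcl \
        -routing wire -wire_preroute_statuses $status
}
```

The same loop could be adapted for shielding wires by using -routing shield with -shield_preroute_statuses.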
Use the -placement option to export placement information. The following arguments are
accepted:
o gater (clock-gating cell)
o repeater
o leaf
o all (which exports the preceding items)
Example:
export clock $m -file dump.tcl -placement {gater repeater}
The example exports the clock netlist and placement information of clock-gating cells and
repeaters in model $m to the file named dump.tcl.
Lastly, you can use the -routing_layers option to list the clock routing layers to export. Use this
option in conjunction with the -routing wire or -routing shield options.
Example:
export clock $m -file dump.tcl -routing wire \
    -routing_layers {METAL2 METAL3}
The example exports the clock netlist and routing wire information of all clock networks in model $m
to a file named dump.tcl. The clock routing layers are limited to METAL2 and METAL3.
Using the Clock Viewer
The Clock Viewer is the part of the GUI used to facilitate clock reporting and analysis. Key
components of the Clock Viewer include:
o A summary of all skew groups in the design
o Detailed histogram views of individual skew groups
o Cross-probing between histogram, layout, schematic, timing paths, and the Clock Tree Browser
o A specialized clock schematic viewer to maximize usability on high fanout clock networks
o A Clock Tree Browser used for a level-by-level logical view of the clock tree
To open the Clock Viewer, right-click on the design model in the Model Browser, and choose Clock
Viewer (shown in Figure 11 on page 72).
Figure 11: Clock Viewer
When the Clock Viewer launches, it prompts you to update timing (if needed), and then displays the
Clock Summary panel (shown in Figure 12). This panel consists of a summary of clock issues on the
left, and a listing of all skew groups with minimum and maximum insertion delay on the right.
Figure 12: Clock Summary Panel
The issues on the left of the panel are those reported by check clock. Any possible problems are
shown with a yellow or red indicator. Review and, if necessary, correct all of the issues indicated by
these messages before proceeding with clock tree synthesis.
On the right side is a listing of all the skew groups in the design. Hovering over a skew group shows
detailed information about it. To see the full histogram for any of the skew groups, left-click on it. The
Clock Histogram panel (shown in Figure 13 on page 74) opens, with that skew group as the focus.
Figure 13: Clock Histogram Panel
The Clock Histogram panel shows the skew group summary again, this time on the left side. You can
select any of those skew groups for a detailed view. The detailed view consists of a
full histogram in the upper right corner and a list of sinks in the lower right corner. By default, the five
earliest and the five latest sinks are shown.
Left-clicking on any bar in the detailed histogram displays all sinks in that insertion delay range
in the lower panel. Right-clicking on a histogram bar lets you drill down into that bar or zoom out
from it for more detailed inspection.
Right-clicking on any individual or group of sinks presents several options to cross-probe to other
areas of the GUI or to set that sink name to a variable. You can select multiple sinks at once.
Cross-probing to the Clock Path Details opens the Path Details panel (Figure 14 on page 75). This is
the same as the Timing Viewer Path Details panel, but it displays only the clock path when
cross-probed from the Clock Viewer.
Figure 14: Path Details Panel
All normal capabilities are available here, including inspecting constraints on sinks and customizing
the columns in the path report.
You can also cross-probe to investigate timing and clock constraints or to investigate a layout view.
Two other aspects of the Clock Viewer are the clock schematic and the Clock Tree Browser (see
Figure 15 on page 76). These occupy the same panel and can be displayed side by side or
individually.
Figure 15: Clock Schematic and Clock Tree Browser
When displayed side by side, the two views are linked. If a new object is selected in the Clock Tree
Browser, the grouping that contains the object in the schematic is highlighted.
The Clock Tree Browser can display the clock tree with or without buffers, and the columns are
customizable. By right-clicking on an object in the Clock Tree Browser, you can access the
cross-probing options.
The clock schematic differs from the regular schematic in that groupings of clock-gating cells and
buffers are clustered together to simplify the view. If this is not done, schematic viewing of a clock
tree is infeasible because of the large number of elements and large fanouts. Right-click these
groupings to expand or selectively expand some of the elements.
Clock Reporting: Key Points To Remember
When generating clock tree reports, keep these important points in mind:
Using config... commands
Use the config... commands to customize report clock tree and report clock skew.
Do not mistake insertion delay for arrival time. Magma supports both fields, and they might
have different values.
Addressing latency and skew issues
Debug latency issues before performing the run gate clock command. Latency issues
usually stem from a deficiency in the basic structure of the clock tree. After using the run
gate clock command, clock tuning might make it more difficult to identify the source of
problems.
In addition, the run gate clock command used for tuning works differently after the fix wire
command is completed. There are fewer degrees of freedom when working with such a layout.
Detailed routing can make bad problems even worse, and more difficult to fix.
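The ordering described in these points can be sketched as a script outline. The command names come from this guide, but the argument lists are assumptions made for illustration: $m is assumed to be the design model and $l the library, and the exact arguments each command accepts should be confirmed in the command reference.

```tcl
# Assumption: $m is the design model and $l the library; the argument
# lists below are illustrative and are not taken from this guide.

# 1. Check for clock issues and review the tree structure before any tuning.
check clock $m
report clock tree $m
report clock skew $m

# 2. Correct structural latency problems here, then re-run the reports above.

# 3. Only then tune the clock, and leave detailed routing until afterward.
run gate clock $m $l
fix wire $m $l
```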