
Clock Distribution

Rajeev Murgai
Advanced CAD Technologies
Fujitsu Labs of America

UC Berkeley
Feb 15, 2005

Defining Clock Skew and Jitter

Clock skew

-- The deterministic (knowable) difference in clock arrival times at each flip-flop
-- Caused mainly by imperfect balancing of clock tree/mesh
-- Can be deliberately introduced using delay blocks in order to time-borrow
-- Accounted for in STA by calculating the clock arrival times at each flip-flop

Clock jitter

-- The random (unknowable, except for its distribution) difference in clock arrival times at each flip-flop
-- Caused by on-die process, Vdd, and temperature variation, PLL jitter, crosstalk, limited static timing analysis (STA) accuracy, and limited layout parameter extraction (LPE) accuracy
-- Accounted for in STA by subtracting the jitter (~3σ) from the cycle time in long path analysis, and adding it to the receiving clock arrival time in race analysis

Jitter is always bad; skew can be helpful or harmful.

Clock uncertainty = skew + jitter
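The way skew and jitter enter the two timing checks can be sketched in a few lines. This is a minimal illustration, not a real STA engine: the function names, parameter names, and the example numbers (in ps) are all assumptions made for this sketch. Skew appears through the launch and capture clock arrival times; jitter is subtracted from the cycle time in the long path check and added to the receiving clock arrival in the race check.

```python
def setup_slack(t_cycle, a_launch, a_capture, d_max, t_setup, jitter):
    """Long path check: jitter shrinks the usable cycle time."""
    return (t_cycle - jitter) + a_capture - a_launch - d_max - t_setup


def hold_slack(a_launch, a_capture, d_min, t_hold, jitter):
    """Race check: jitter is added to the receiving clock arrival time."""
    return (a_launch + d_min) - (a_capture + jitter) - t_hold


# Hypothetical numbers in ps: 1 ns cycle, 50 ps skew toward the capture flop,
# 100 ps jitter allowance.
print(setup_slack(1000, 0, 50, 800, 50, 100))  # 100
print(hold_slack(0, 50, 200, 30, 100))         # 20
```

In this example the 50 ps of skew helps the long path and hurts the race, while the jitter allowance reduces both slacks.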


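[Figure: two panels, each showing a launching and a receiving flip-flop on clk. Long path analysis: logic between the flip-flops, with skew and -jitter on the clock. Race analysis: with skew and +jitter on the clock.]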

Background

Technology scaling results in:

-- higher clock frequencies possible and requested by users
-- prominence of wiring parasitics (R, L, C) in electrical behavior
-- increasing noise impact on delays
-- increasing on-chip process variation impact on delays

Existing ASIC clock synthesis flows

-- Use tree architectures: not best for low skew, jitter, variations
-- Don't properly address noise issues
-- Rely on STA to calculate the delays through clock networks
-- Use inaccurate wiring models
-- Use noise-sensitive clock circuit topologies
-- Ignore or crudely estimate process/voltage/temperature variations
-- Don't have tight integration of physical synthesis & clock synthesis

Result

-- Predictability of clock delay is poor: clock uncertainty (i.e., skew + jitter) of 400ps is not uncommon
-- Maximum attainable clock frequency is impaired

Problems with Existing Clock Methodologies

Tree-based Clock Distribution

Low power but...

Sensitive to mismatching branches, difficult to layout

Sensitive to noise, especially if wires are not shielded

Using STA to calculate tree timing results in large errors

=> high skew and jitter


[Figure: clock tree driven from a PLL, with flip-flop pairs labeled small, medium, and large skew and jitter]

Problems with Static Timing Analysis (STA)


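What we have...

[Figure: signal wire with elements labeled R, Cs, and Cg]

What STA uses...

[Figure: lumped model with driver resistances Rup and Rdn, wire resistance Rwire, wire capacitance split as Cw/2 at each end of the wire, and load Cload]

Note: the driver model is a little better than this, with table look-up.

Other problems

-- Cw can match either delay or slew, but not both
-- interpolation using look-up tables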
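To make the lumped model concrete, here is a minimal sketch of the Elmore delay it implies at the far end of the wire. It assumes a single driver resistance (standing in for Rup or Rdn), the wire resistance Rwire, the wire capacitance Cw split in half at each end, and the load Cload. The function name and the example values are illustrative only; real STA tools use table-driven driver models, as noted above.

```python
def pi_model_elmore(r_drv, r_wire, c_wire, c_load):
    """Elmore delay to the far end of a driver + pi-model wire + load.

    The driver resistance charges all of the capacitance; the wire
    resistance charges only the far-end half of Cw plus the load.
    """
    through_driver = r_drv * (c_wire + c_load)
    through_wire = r_wire * (c_wire / 2 + c_load)
    return through_driver + through_wire


# Hypothetical values in kOhm and fF, so the product is in ps.
print(pi_model_elmore(r_drv=1.0, r_wire=0.2, c_wire=100.0, c_load=50.0))
```

A single number like this cannot capture both the delay and the slew of the real distributed, coupled wire, which is the limitation the slide points at.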

Clock Distribution Architectures

Two basic architectures

Tree
Grid (mesh)

Hybrids of tree and mesh

Tree + crosslinks
Mesh + local trees

Tree

Widely used in ASICs

Advantages

-- Low cost: wiring, capacitance, power
-- Clock gating easy

Disadvantages

-- Difficult to balance path delays due to asymmetric FF distribution
-- Sensitive to variations

Topologies

-- Symmetric H-tree
-- Asymmetric trees

CAD for Tree Architecture

Topology generation

H-tree: widely used


Method of means and medians (MMM) [Jackson et al. DAC 90]

Goal: reduce wirelength while minimizing skew.

Divide set S of points into Sleft and Sright, based on median.

| Sleft | = | Sright |

Connect/route center of mass (CM) of S to CM of Sleft and Sright.

Recurse on Sleft and Sright.
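The recursion above can be sketched as follows. This is a simplified illustration rather than the published algorithm: it assumes the split direction simply alternates between x and y at each level, routes are represented only as (parent, child) center-of-mass pairs, and the function names are made up for this sketch.

```python
def center_of_mass(points):
    """Mean x and mean y of a non-empty list of (x, y) sinks."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def mmm(points, split_on_x=True):
    """Return tree edges as (parent_cm, child_cm) pairs."""
    if len(points) <= 1:
        return []
    axis = 0 if split_on_x else 1
    ordered = sorted(points, key=lambda p: p[axis])
    half = len(ordered) // 2          # median split: |S_left| ~= |S_right|
    root = center_of_mass(ordered)
    edges = []
    for subset in (ordered[:half], ordered[half:]):
        edges.append((root, center_of_mass(subset)))
        edges.extend(mmm(subset, not split_on_x))  # assumed: alternate axis
    return edges


sinks = [(0, 0), (0, 2), (2, 0), (2, 2)]
for parent, child in mmm(sinks):
    print(parent, "->", child)
```

For four sinks on the corners of a square this produces an H-like structure: the root at the center connects to the midpoints of the left and right sides, which connect to the sinks.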

Method of Means & Medians

Problem

-- May not result in zero skew

Solution

-- One step look-ahead and decide direction of splitting.
-- Estimate skews using Penfield-Rubinstein model.

Other problems

-- Buffer insertion not handled.
-- Obstructions not handled.

Topology: Recursive Geometric Matching

[Kahng et al. DAC 91]

Bottom-up pair-wise merge algorithm

Optimum geometric matching on n points (minimum wirelength)

Determine center point of each match edge

Recurse on n/2 points

Uses path length skews

Tries to balance root to leaf path lengths.
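A small sketch of the bottom-up merge follows. It is an illustration under stated assumptions: the optimum matching is found by exhaustive search, which is only practical for a handful of points (the published method uses efficient matching algorithms), the number of sinks is assumed to be a power of two, and the merge point is taken as the midpoint of each match edge. All function names are hypothetical.

```python
from math import dist


def min_matching(points):
    """Exact minimum-length perfect matching by exhaustive search."""
    if not points:
        return 0.0, []
    first, rest = points[0], points[1:]
    best_cost, best_pairs = float("inf"), []
    for k, partner in enumerate(rest):
        cost, pairs = min_matching(rest[:k] + rest[k + 1:])
        cost += dist(first, partner)
        if cost < best_cost:
            best_cost, best_pairs = cost, [(first, partner)] + pairs
    return best_cost, best_pairs


def recursive_matching(points):
    """Return (levels, root): matched pairs per level and the final merge point."""
    points = list(points)
    levels = []
    while len(points) > 1:
        _, pairs = min_matching(points)
        levels.append(pairs)
        # Center point of each match edge becomes a node of the next level.
        points = [((p[0] + q[0]) / 2, (p[1] + q[1]) / 2) for p, q in pairs]
    return levels, points[0]


levels, root = recursive_matching([(0, 0), (0, 2), (4, 0), (4, 2)])
print(levels[0], root)
```

Each level halves the number of nodes, so n sinks are merged in log2(n) levels, and the midpoint choice balances path length rather than electrical delay.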


Topology: Simulated Annealing

Topology generation

Cheng et al: improve initial topology by simulated annealing

effective in reducing delay


CAD for Tree Architecture

Routing & wire sizing

Tsay, TCAD 93: zero-skew routing

first paper to use Elmore delay as delay model

earlier work used pathlength

DME, planar DME

make faster paths slower by detours/snaking to match delays

may use wire-sizing: make slower paths faster


Wire spacing

Buffering

Tellez & Sarrafzadeh, TCAD 97


insert minimum buffers on a given topology to meet skew and slew constraints.
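For the zero-skew routing idea, the core step is choosing where on a connecting wire to tap two subtrees so that both see the same Elmore delay. The sketch below equates the two branch delays and solves for the tap position; it assumes a uniform wire with per-unit-length resistance `r` and capacitance `c`, and each subtree summarized by its delay and total capacitance. The names and example values are illustrative.

```python
def branch_delay(t_sub, c_sub, seg_len, r, c):
    """Elmore delay through a wire segment into a subtree."""
    return r * seg_len * (c * seg_len / 2 + c_sub) + t_sub


def zero_skew_tap(t1, c1, t2, c2, length, r, c):
    """Fraction x of the wire (measured from subtree 1) that balances delays.

    Obtained by setting branch_delay(t1, c1, x*L) == branch_delay(t2, c2, (1-x)*L).
    """
    num = (t2 - t1) + r * length * (c2 + c * length / 2)
    den = r * length * (c * length + c1 + c2)
    return num / den


x = zero_skew_tap(t1=3.0, c1=2.0, t2=5.0, c2=1.0, length=10.0, r=0.1, c=0.2)
print(x)
print(branch_delay(3.0, 2.0, x * 10.0, 0.1, 0.2))
print(branch_delay(5.0, 1.0, (1 - x) * 10.0, 0.1, 0.2))
```

When the computed fraction falls outside the range 0 to 1, no point on the direct wire balances the two sides, which is the situation where the faster path is lengthened by a detour or snaking.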

Grid/Mesh
n x n uniform mesh

Distributed array of k x k buffers drives the mesh.

Buffers driven by global H-tree.

Flip-flops directly connected to the nearest mesh segment

Used in modern processors

Advantages

-- Excellent for low skew
-- Robust to variations

Disadvantages

-- Higher wiring area, capacitance, power
-- Difficult to analyze

[Figure: clock source driving a mesh with flip-flops attached; annotation: loops and redundancy]

Mesh

Sizing of clock distribution networks for high performance CPU chips

Desai et al., DEC [DAC 1996]

goal: size grid interconnect segments with constraints on clock latency and average current

assume: initial grid and interconnect sizes

width explicit => non-linear program; practical for small networks/trees.

consider width as implicit & solve using sequence of network problems.

Results: applied on clock networks of two actual processors: DC21046A and DC21164. Results for DC21046A:

275MHz clock

grid has 1 million edges, 15.5K drivers, 81K receivers

16% reduction in capacitance - without increasing clock latency.

Runtime: 3 days.

Optimal Wire and Transistor Sizing for Circuits with Non-tree Topology

Vandenberghe et al., Stanford University [ICCAD 97]

RC circuit with tree topology => sizing problem is convex optimization

meshes have R loops; use dominant time constant as measure of delay

solve using semi-definite programming (quasi-convex function)

Hybrid Architecture: Tree + Cross-links

Reducing Clock Skew Variability via Cross Links

[Rajaram et al., DAC 2004]

tree + short-circuit some sink pairs => non-tree topology

clock signal propagates through multiple paths; reduces skew and skew variability between shorted sinks

reduces skew variability by 30-70%

very small wire-length penalty (2%) over tree topology

Drawback:

does not consider buffering


Hybrid Architecture: Mesh + Trees

Hybrid Structured Clock Network Construction [Hu & Sapatnekar, ICCAD 01]

Hybrid clock topology

-- simple top-level global mesh
-- zero-skew local trees at bottom

Presents wire sizing scheme to achieve latency and skew reduction.

-- iterative LP to minimize wire width (area) of top-level mesh, given delay bound
-- uses Elmore delay $t = G^{-1} C$
-- sensitivity-based post-layout clock tree tuning to reduce skew.

[Figure: source driving a mesh with nodes a, b, c, d; node a annotated (a, CDa)]
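For a network with loops, the Elmore delay can no longer be summed along a unique path, but it can still be obtained by solving the linear system G t = c, where G is the node conductance matrix (with the source shorted) and c holds the node capacitances. The sketch below is a small illustration of that relation, not the sizing method of the paper: the tiny dense solver, the function names, and the three-node triangular "mesh" are assumptions chosen to keep the example self-contained.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for k in range(col, n + 1):
                m[i][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def elmore_delays(n, resistors, r_source, caps):
    """Elmore delay at every node; node 0 is driven through r_source.

    resistors: list of (u, v, resistance) between nodes u and v.
    """
    g = [[0.0] * n for _ in range(n)]
    g[0][0] += 1.0 / r_source
    for u, v, res in resistors:
        y = 1.0 / res
        g[u][u] += y
        g[v][v] += y
        g[u][v] -= y
        g[v][u] -= y
    return solve(g, caps)


# Triangle of three wires (one loop), unit capacitance at each node.
loop = [(0, 1, 10.0), (1, 2, 10.0), (0, 2, 10.0)]
print(elmore_delays(3, loop, r_source=5.0, caps=[1.0, 1.0, 1.0]))
```

By symmetry, the two nodes on the far side of the loop see the same delay, which hints at why redundant paths reduce skew between nearby sinks.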

Clock Architectures
Tree
-- low cost (wiring, power, cap)
-- higher skew, jitter than mesh
-- widely used in ASIC designs
-- clock gating easy to incorporate

Mesh
-- excellent for low skew, jitter
-- high power, area, capacitance
-- difficult to analyze
-- clock gating not easy
-- used in modern processors

Hybrid: tree + cross-links
-- low cost (wiring, power, cap)
-- smaller skew, jitter than tree
-- difficult to analyze

Hybrid: mesh + local trees
-- suitable for coarse mesh

Best architecture depends on the application

[Figure: four panels showing a tree, a mesh, a tree with cross-links, and a mesh with local trees, each driven from a clock source and feeding flip-flops]

Processors

Traditionally two hierarchies

Global clock network


Local clock network

Skew control

Global network: balanced trees or grids


Local network: de-skewing buffers


Pentium 4 [JSSC, Nov 2001]

0.18u, 6 metal layers, 42 million transistors

Core medium clock frequency: 2 GHz

Used by most core blocks

High speed scheduling and execution: 4GHz

Non critical blocks (e.g., bus interface logic): 1GHz

Global clock distribution

3 spines; each spine has binary clock distribution


jitter reduction schemes

low-pass RC-filtered power supply for clock drivers

shield clock wires

[Figure: clock source feeding the spines]

IBM [JSSC 2001]

Same clock architecture for 6 chips (including PowerPC):

Design priorities: min. clock skew, sharp rise and fall times (below 100 ps for 1ns clock), 50% duty cycle, low power consumption

Global buffered H-trees (on top 2 layers) drive sector buffers.

Each sector buffer drives tuneable tree, which drives global mesh

length-matched

Tree wire-widths tuned to minimize skew over long distances


Mesh minimizes local skew by connecting nearby points directly.

For each chip, 10-20 complete tuning cycles

Buffer placement, wiring

Flip-flops connected to closest point on mesh

Global clock skew of 22ps

Inductance included in analysis

Mesh difficult to analyze due to loops

cut the mesh

[Figure: clock source driving a mesh with flip-flops attached]

Alpha, DEC [JSSC, Nov 98]

0.35u, 4 metal layers, 15.2 million transistors, 600 MHz at 2.2V

3 hierarchies in clock distribution

Global, major (regional) and local

Multi-level mesh

global: trees to global GCLK grid

Uses 3% of M3/M4 interconnect

M3/M4 shielding; M2, M4: Vdd/Vss

power = 16W; skew = 72ps

Major (regional)

six grids over execution units

use 6% of M3, M4

power = 14W

Local clock

tree structure, not shielded

conditional/unconditional clocks

less than 10ps skew; power = 15.6W

Clock simulation

AWE-reduction + SPICE

[Figure: PLL driving the GCLK grid]

Summary of Processor Clock Design

Three basic routing structures for global clock

H-tree

low skew, smallest routing capacitance, low power

Floorplan flexibility is poor:


Grid or mesh

low skew, increases routing capacitance, worse power

Alpha uses global clock grid and regional clock grids


Spine

Small RC delay because of large spine width

Spine has to balance delays; difficult problem

Routing cap lower than grid but may be higher than H-tree.

Clock structure    Clock skew    Capacitance/layout area/power    Floorplan flexibility
H-tree             Low/medium    Low                              Low
Grid               Low           High                             High
Spine              Medium        Medium/high                      Medium

Estimation of Process-dependent Clock Skew in CMOS VLSI, Shoji [JSSC, Oct. 86]

Given two paths from clock source to FFs

Conventional design method

-- design paths such that skew between S1 and S2 is zero at a (fixed) process corner
-- $T_A + T_B + T_C = T_D + T_E$ (typical corner)

However,

-- skew may not be zero at another process corner

Novel idea in the paper

-- design the two paths such that skew between S1 and S2 is zero for different process corners

For high-current process corner H,

-- $T_A(H) = T_A / f_N$; $T_B(H) = T_B / f_P$ ($f_N, f_P > 1$)

Zero-skew condition at H

-- $T_A(H) + T_B(H) + T_C(H) = T_D(H) + T_E(H)$
-- $(T_A + T_C)/f_N + T_B/f_P = T_D/f_N + T_E/f_P$
-- substituting $T_A + T_C = T_D + T_E - T_B$ from the typical corner: $(T_E - T_B)/f_N = (T_E - T_B)/f_P$

[Figure: CLK driving two paths that end at S1 and S2]

Estimation of Process-dependent Clock Skew in CMOS VLSI, Shoji [JSSC, Oct. 86]

Either $T_E = T_B$ or $f_N = f_P$.

But $f_N$ may not be same as $f_P$ (for PH-NL process)

In general, $T_E = T_B$ => $T_D = T_A + T_C$.

Pull-up and pull-down delays of two paths should be identical.

Determine NMOS & PMOS transistor widths of inverters to achieve this.
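The matching condition can be checked numerically. The sketch below assumes, as in the zero-skew condition for corner H, that delays TA, TC, and TD scale by 1/fN and delays TB and TE scale by 1/fP. The function name and the delay values are made up for illustration.

```python
def skew_at_corner(ta, tb, tc, td, te, f_n=1.0, f_p=1.0):
    """Skew between the two paths when delays scale by 1/f_n and 1/f_p."""
    path1 = (ta + tc) / f_n + tb / f_p
    path2 = td / f_n + te / f_p
    return path1 - path2


# Matched design: TE == TB and TD == TA + TC, so skew stays zero at any corner.
matched = dict(ta=1.0, tb=2.0, tc=1.5, td=2.5, te=2.0)
print(skew_at_corner(**matched))
print(skew_at_corner(**matched, f_n=1.3, f_p=1.1))

# Balanced only at the typical corner: total delays agree (3 == 3), but TE != TB.
unmatched = dict(ta=1.0, tb=1.0, tc=1.0, td=1.0, te=2.0)
print(skew_at_corner(**unmatched))
print(skew_at_corner(**unmatched, f_n=1.3, f_p=1.1))
```

The second design has zero skew at the typical corner but a nonzero skew once the NMOS and PMOS factors differ, which is the case the matching rule is meant to rule out.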

Results

-- 1.75 u process
-- Widths selected manually
-- Lead to very small skews at all process corners

Drawbacks

-- only analyzes two paths
-- assumes identical percentage delay variation for all NMOS (PMOS) devices
-- uses simplistic delay model; ignores wire cap

[Figure: CLK driving two paths that end at S1 and S2, with stages B and C labeled]

Optimal Clock Skew Scheduling

Long & short path constraints impose lower/upper bounds on skew.

long path analysis: $a_j \ge a_i + logic_{max} + t_{setup} - T_{cycle}$

short path analysis: $a_j \le a_i + logic_{min} - t_{hold}$

Leads to a set of linear inequalities: $a_i - a_j \le c_{ij}$

Given a clock cycle, feasibility can be checked with a linear program, or more efficiently with Bellman-Ford shortest paths [Fishburn TCAD90].

To compute the optimum clock cycle, either:

-- Perform binary search using the above feasibility check.
-- Perform parametrized shortest path [Tarjan et al.]

One challenge: realize each $a_i$

Other objectives: minimize power or switching noise.
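The feasibility check and the binary search can be sketched as follows. This is a minimal illustration, not the formulation of any cited paper: each path from flip-flop i to flip-flop j contributes one long path and one short path difference constraint, Bellman-Ford with a virtual source either finds clock arrival times or detects a negative cycle, and the outer loop bisects on the cycle time. Function names and the two-flop example are assumptions.

```python
def feasible(n, paths, t_cycle, t_setup=0.0, t_hold=0.0):
    """Return arrival times a[0..n-1] satisfying all constraints, or None.

    paths: list of (i, j, d_min, d_max) for logic from flop i to flop j.
    """
    edges = []  # (v, u, c) encodes a[u] - a[v] <= c
    for i, j, d_min, d_max in paths:
        edges.append((j, i, t_cycle - d_max - t_setup))  # long path
        edges.append((i, j, d_min - t_hold))             # short path
    dist = [0.0] * n  # virtual source reaches every node with weight 0
    for _ in range(n):
        changed = False
        for v, u, c in edges:
            if dist[v] + c < dist[u] - 1e-12:
                dist[u] = dist[v] + c
                changed = True
        if not changed:
            return dist
    return None  # still relaxing after n passes: negative cycle, infeasible


def min_cycle_time(n, paths, lo, hi, tol=1e-6):
    """Bisect on the cycle time; hi must be feasible, lo infeasible."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(n, paths, mid) is not None:
            hi = mid
        else:
            lo = mid
    return hi


# Two flops in a loop: 0 -> 1 takes 2..10, 1 -> 0 takes 2..4 (arbitrary units).
paths = [(0, 1, 2.0, 10.0), (1, 0, 2.0, 4.0)]
print(min_cycle_time(2, paths, lo=0.0, hi=20.0))
print(feasible(2, paths, 10.0))
```

With zero skew this loop needs a cycle time of 10 (the slower path); with scheduled skew the search converges close to 8, where the short path constraint on the 0 -> 1 path becomes the limit.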


[Figure: flip-flop i (clock arrival $a_i$) driving logic into flip-flop j (clock arrival $a_j$); skew between the two arrivals of clk]

Optimal Clock Skew Scheduling Tolerant to Process Variations [Neves & Friedman, 96]

Long path and short path constraints impose lower and upper bounds on skew.

long path analysis: $a_j \ge a_i + logic_{max} + t_{setup} - T_{cycle}$

short path analysis: $a_j \le a_i + logic_{min} - t_{hold}$

Try to choose skews in the middle of the bounds for maximum protection against process variations.
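For a single pair of flip-flops, the permissible range and its midpoint follow directly from the two bounds. The sketch below treats one pair in isolation, with skew defined as the capture arrival minus the launch arrival; the function names and numbers are illustrative. On a real design the per-pair midpoints can conflict around loops of flip-flops, so they serve as targets for a global schedule rather than as a final answer.

```python
def permissible_range(d_min, d_max, t_setup, t_hold, t_cycle):
    """Lower and upper bound on skew = a_j - a_i for one i -> j path."""
    lower = d_max + t_setup - t_cycle   # from long path analysis
    upper = d_min - t_hold              # from short path analysis
    return lower, upper


def centered_skew(d_min, d_max, t_setup, t_hold, t_cycle):
    """Midpoint of the range and the margin to either bound."""
    lower, upper = permissible_range(d_min, d_max, t_setup, t_hold, t_cycle)
    if lower > upper:
        raise ValueError("no permissible skew at this cycle time")
    return (lower + upper) / 2, (upper - lower) / 2


print(permissible_range(2.0, 10.0, 0.5, 0.5, 12.0))
print(centered_skew(2.0, 10.0, 0.5, 0.5, 12.0))
```

The returned margin is the amount by which the realized skew can drift in either direction before a long path or race violation appears.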

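[Figure: flip-flop i (clock arrival $a_i$) driving logic into flip-flop j (clock arrival $a_j$); skew between the two arrivals of clk]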
