
Institut für Integrierte Systeme
Integrated Systems Laboratory
Department of Information Technology and Electrical Engineering
VLSI II: Entwurf von hochintegrierten Schaltungen

227-0147-00
Training 2
Energy Efficiency and Power Distribution

Prof. Dr. H. Kaeslin
Dr. N. Felber
SVN Rev.: 1025
Last Changed: 2013-11-05
Reminder:
With the execution of this training you declare that you understand and accept the regulations about
using CAE/CAD software installations at the ETH Zurich. These regulations can be read anytime at
http://dz.ee.ethz.ch/regulations/index.en.html.
1 What you will learn

In previous trainings, you have learned how to carry out a digital circuit design that meets given
timing and area constraints. This exercise will extend your knowledge to power considerations. More
specifically, we will show you:
How to determine node activity figures of adequate accuracy.

How to estimate a circuit's power dissipation from node activities.
How to locate excessive voltage losses in power and ground networks.
How to detect excessive current densities in power and ground networks.
How to improve power and ground distribution networks where necessary.
A few ideas for improving a circuit's overall energy efficiency (optional).
You will be assisted by Mentor Graphics ModelSim (for circuit simulation) and by Cadence SoC
Encounter (for place&route, preparation of power and ground nets, IR drop analysis, and current
density estimations).
2 Introduction
2.1 Theoretical background
As explained in section 9.1 of our textbook (see footnote 1), four phenomena dissipate energy in static CMOS circuits:
Phenomenon                                     Results in dissipation                         Nature
Charging and discharging of capacitive loads   while node voltages are in transit             dynamic
Crossover currents                             while node voltages are in transit             dynamic
Driving of resistive loads (if any)            at all times, even after circuit has settled   static
Leakage currents                               at all times, even after circuit has settled   static
We will not be concerned with static power in this exercise as we limit ourselves to pure CMOS circuits
with no resistive loads and because leakage is almost negligible due to the conservative fabrication
process being studied. For the needs of EDA tools the dynamic dissipation can be attributed to library
cells as follows.
Internal power Pint is the power dissipated inside a cell for the charging and discharging of internal
capacitances and due to crossover currents.
Switching power Pext is the power dissipated inside a cell for charging and discharging the load
capacitance connected to the cell's output. That external load consists of the input capacitances
of all cells being driven plus the parasitic capacitances of the wires (aka interconnect).
The total power dissipation Ptot related to a cell can now be expressed as
  P_{tot} = P_{stat} + P_{dyn} \approx P_{dyn} = P_{int} + P_{ext}    (1)
Calculating Pext is straightforward
  P_{ext} = f_{cp} \, \frac{\alpha}{2} \, C_{ext} \, U_{dd}^2    (2)
Footnote 1: Hubert Kaeslin, Digital Integrated Circuit Design, from VLSI Architectures to CMOS Fabrication, Cambridge
University Press, 2008.
where α denotes the switching activity of the cell's output node, Cext the load capacitance attached,
and Udd the supply voltage. fcp stands for the computation rate, i.e. the inverse of the computation
period (see footnote 2). Pint gets calculated in much the same way, yet coming up with accurate activity and
capacitance figures requires detailed information about the inner circuitry and layout of each cell.
A power estimator essentially is a piece of software that sums up the various contributions over an
entire circuit. Provided the same clock and voltage get used everywhere, this amounts to
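  P_{ckt} = \sum_{m=1}^{M} P_{int,m} + \sum_{n=1}^{N} P_{ext,n} \approx f_{cp} \, U_{dd}^2 \left( \sum_{m=1}^{M} \frac{\alpha_m}{2} C_{int,m} + \sum_{n=1}^{N} \frac{\alpha_n}{2} C_{ext,n} \right)    (3)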
Index m = 1...M refers to the cells instantiated in the circuit and n = 1...N to the nets of interconnect
running in between. For each cell, an internal activity figure α_m is estimated from the node activities at
the input(s). Note that Cint m is not meant to correspond to any capacitance physically present in the
circuit. Rather, it is just a numerical parameter adjusted for each cell during library characterization
such as to model its internal dissipation (see footnote 3).
Equation (3) tells us a few important things about power dissipation and power estimation:
Realistic switching activity figures are crucial; they can be obtained from gate-level simulations.
Realistic capacitance figures are important; they are best extracted from layout data.
Dynamic power grows with Udd squared. The power vs. speed dilemma is discussed in the
textbook.
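To make equations (2) and (3) concrete, here is a minimal Python sketch. It assumes that the factor 1/2 per transition from (2) applies to every term of (3); the function names and all numbers are invented for illustration and do not belong to any of the circuits in this training.

```python
def switching_power(f_cp, alpha, c_ext, u_dd):
    """Equation (2): P_ext = f_cp * (alpha / 2) * C_ext * U_dd^2, in watts."""
    return f_cp * (alpha / 2.0) * c_ext * u_dd ** 2


def circuit_power(f_cp, u_dd, cells, nets):
    """Equation (3): sum the per-cell and per-net contributions.

    cells: list of (alpha_m, C_int_m) pairs, one per cell instance
    nets:  list of (alpha_n, C_ext_n) pairs, one per net
    Assumes the same clock and supply voltage everywhere.
    """
    weighted_cap = sum(a / 2.0 * c for a, c in cells)
    weighted_cap += sum(a / 2.0 * c for a, c in nets)
    return f_cp * u_dd ** 2 * weighted_cap


# Hypothetical toy circuit: 2 cells and 3 nets (values are made up).
F_CP = 20e6    # computation rate [Hz]
U_DD = 1.0     # supply voltage [V]
cells = [(0.5, 10e-15), (0.1, 20e-15)]                   # (activity, C_int [F])
nets = [(2.0, 100e-15), (0.5, 40e-15), (0.1, 30e-15)]    # (activity, C_ext [F])

p_clock_net = switching_power(F_CP, 2.0, 100e-15, U_DD)
p_total = circuit_power(F_CP, U_DD, cells, nets)
print(f"clock-like net: {p_clock_net * 1e6:.2f} uW, circuit: {p_total * 1e6:.2f} uW")
```

Note how the single net with activity 2 (a clock toggles twice per period) dominates the toy total, which is why realistic activity figures matter so much.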
2.2 Manual activity and power calculations for warm up

To get a feeling for the process, let us estimate the power consumption of the toy example of Figure 1,
a simple arithmetic processing unit that accepts two unsigned numbers of 4 bits each (InputAxDI
and InputBxDI) and that delivers either their sum or their product at the output (OutputxDO) as an
8 bit word.
Figure 1: A small arithmetic unit used for hand calculations.

A signal AddxSI decides which operation result gets assigned to the output according to the following
rule (in pseudo-VHDL):
  if AddxSI = '1' then
    OutputxDO <= InputAxDI + InputBxDI;  -- sum
  else
    OutputxDO <= InputAxDI * InputBxDI;  -- product
  end if;
Footnote 2: For standard single-edge-triggered one-phase clocking, computation period and clock cycle are the same, so fcp = fclk.
Double-edge triggered circuits, in contrast, offer two computation periods per clock cycle so that fcp = 2 fclk.
Footnote 3: Incidentally observe that any attempt to capture the internal dissipation of a cell with a single quantity is not exactly
accurate as the energy dissipated when one input toggles may also depend on what is happening at other inputs at the
same time. In the case of a bistable, the current state is likely to matter too. While industrial standard
cell models typically cover all possible situations, we shall not be concerned with such details here.
The frequency of ClkxCI is 100 MHz and the input waveforms are represented in Figure 2. They are
periodic and the two input values (InputAxDI and InputBxDI) have been chosen to be always the
same. Moreover, suppose that no glitches occur. Supply voltage is 1.8 V.
Figure 2: Input and output waveforms. [Timing diagram with time axis in ns (ticks at 20, 40, 60, 80) showing the traces
ClkxCI, InputAxDI = InputBxDI (alternating 0000, 1111, 0000, 1111, ...), AddxSI, and OutputxDO.]
Table 1: Power dissipated for driving the various nets of interconnect.

Net                        Capacitive load   Node activity   Switching power
                           Cext [fF]         α [1]           Pext [mW]
ClkxCI                     140               2               ...
AddxSI                     90                ...             ...
InputAxDI (per bit line)   60                ...             ...
InputBxDI (per bit line)   60                ...             ...
OutputxDO (per bit line)   0                 ...             ...
Further nets               neglected in the context of this exercise
Table 2: Power dissipated by the various circuit blocks (@ 100 MHz).

                          Dynamic power [mW]
Block                     Switching power   Internal power
Adder                     0.04              0.12
Multiplier                0.66              0.56
Output register + mux     0.00              0.54
Student Task 1:
1. Output waveform: Collecting all 8 bits into one signature, draw the waveform and numeric values of OutputxDO in Figure 2.
2. Switching activities: Assuming single-edge-triggered one-phase clocking, complete the
node activity column in Table 1.
3. Power spent for switching of nets: You now have all the facts required to calculate the
switching powers associated with the various nets according to (2). Fill in the numbers into
the last column.
4. Power dissipated within circuit blocks: Now consider Table 2. What is the main sink
of power among the blocks listed there and how much does it dissipate?
5. Consolidated dissipation: Compiling all contributions from Table 1 and Table 2, how
much power does the circuit dissipate internally, that is, with no load attached?
6. Overall dissipation: Suppose each output drives a load of 1 pF. What is the total power
consumption now?
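Node activity is the average number of transitions per computation period. As a rough illustration of how such a figure can be counted from a glitch-free waveform, consider the following Python sketch. The helper names and the sample sequences are hypothetical and deliberately differ from the waveforms of Figure 2. Because the node is sampled only once per period, the sketch is not suitable for a clock net, which toggles twice per period.

```python
def node_activity(samples):
    """Average number of transitions per computation period.

    samples: settled value of one node, one entry per computation period.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    toggles = sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)
    return toggles / (len(samples) - 1)


def bus_bit_activities(words, width):
    """Per-bit-line activity of a bus, least significant bit first."""
    return [node_activity([(w >> bit) & 1 for w in words]) for bit in range(width)]


# Hypothetical control signal that changes every second period.
enable = [0, 0, 1, 1, 0, 0, 1, 1, 0]
enable_activity = node_activity(enable)

# Hypothetical 3-bit counter value observed over five periods.
counter = [0, 1, 2, 3, 4]
counter_activities = bus_bit_activities(counter, 3)

print(enable_activity, counter_activities)
```

The per-bit view matters because Table 1 lists bus capacitances per bit line, so each bit line contributes its own term to (2).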
3 The test vehicle used for computerized calculations

3.1 Architectural overview
Figure 3 illustrates the circuit serving as a test case for this exercise. The circuit is entirely digital
and dominated by two finite impulse response (FIR) filters of identical structure that differ in their
coefficients. Each filter is fully parallel. At the output, an adder combines the high-pass and low-pass
responses. A 2-bit selection input signal can be used to output only the high-pass component
or only the low-pass component. Additional flags (ModexSI/TestModexTI) enable/disable the filters
completely.
Figure 3: High-level diagram of test vehicle used in this exercise (simplified). [Block diagram with the ports
OUTSELECTxSI, MODExSI, TESTMODExTI, DATAINxDI, and DATAOUTxDO.]

In the exercise, node activity figures will be determined by way of gate-level simulations. For comparison, let us now make a quick back-of-the-envelope calculation from data available without detailed
simulation. The test vehicle is believed to have the characteristics below.
Clocking discipline                      single-edge-triggered one-phase
Clock frequency fclk [MHz]               50
Supply voltage Udd [V]                   1.8
Number of interconnect nets N            5 500
Avg. load capacitance Cext n [fF]        30.0
Avg. switching activity α_n [1]          0.2
Number of cell instances M               3 900
Avg. equiv. capacitance Cint m [fF]      25.0
Avg. internal activity α_m [1]           same as α_n
Student Task 2: Plug these numbers into (3) and put down the result here: ....
3.2 Install test vehicle and start cockpit

We provide you with a finished test vehicle with final routing completed. To install it, proceed as follows:
Student Task 3:
1. Open a Unix shell window.
2. Install the test vehicle:
sh > /home/vlsi2/t2/install_t2_partA
3. Start the cockpit:
sh > cd training2_partA
sh > icdesign umcL180 &
The design views now available include

1. Source code (available at sourcecode/..)
2. Cadence SoC Encounter database
3. Final netlist
4. .sdf file for back annotation
3.3 Generating stimuli

For running meaningful power simulations we will need the right input stimuli. We provide a set of
stimuli in the simvectors directory (input.stim). During this training, you will need to modify the stimuli
files to estimate power in different operating modes. As seen in Figure 3, the signal OutSelectxSI
is used to control which filter block is added to the output. Furthermore, there is a ModexSI and
a TestModexTI signal that controls how the internal registers are enabled. These signals can be
used to configure the test vehicle in a variety of modes. To change the operating mode, you need to
adapt the number in the first line of the stimuli file simvectors/input.stim, since it encodes the
operating mode of the design as an integer value. See the following table for the operating modes we
will use in this exercise.
The subsequent integer values in the stimuli file correspond to the input data. Next let us give some
technical comments on the process of automated power estimation.
Mode          TestModexTI   ModexSI   OutSelectxSI(1)   OutSelectxSI(0)   int value
Enable all    1             1         1                 1                 15
Disable HP    1             1         1                 0                 14
Clock Gate    0             0         1                 0                 2
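The integer values in the table are consistent with reading the four control bits as a binary number with TestModexTI as the most significant bit. The following Python sketch shows this packing; the function name `mode_word` is hypothetical, and the bit order is an assumption inferred from the three table rows only.

```python
def mode_word(test_mode, mode, out_select):
    """Pack the control flags into the integer on the first line of input.stim.

    Assumed bit order (inferred from the table): TestModexTI is bit 3,
    ModexSI is bit 2, OutSelectxSI(1) is bit 1, OutSelectxSI(0) is bit 0.
    """
    return (test_mode << 3) | (mode << 2) | (out_select & 0b11)


MODES = {
    "Enable all": mode_word(1, 1, 0b11),
    "Disable HP": mode_word(1, 1, 0b10),
    "Clock Gate": mode_word(0, 0, 0b10),
}
print(MODES)
```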
4 Power Estimation Flow

We are going to use the same CAD/CAE tools you are familiar with from previous exercises and/or
from your semester project in VLSI design. During earlier design phases, ModelSim had served to
functionally verify RTL source code. The focus now shifts to collecting the respective toggle counts
of electrical nodes present in a circuit netlist as a prerequisite for power calculations.
In search of accuracy, we are going to do a post-layout simulation that includes the various layout
parasitics that came into existence once placement and routing were completed. For this
purpose, the netlist previously written out by Cadence SoC Encounter in Verilog format is
compiled using vlog instead of vcom (have a look into the file modelsim/compile_gate.csh in order
to observe the compilation of the Verilog netlist). Since ModelSim is able to perform mixed-language
simulations, we can use a VHDL testbench (almost the same as the testbench for RTL simulation,
only with some minor adaptations) to carry out this particular post-layout simulation.
The next point that merits your attention is the selection of the stimuli. As power dissipation is
data-dependent, it is important to make a proper choice of the stimuli vectors to get meaningful results.
The node activities used for power estimation must be statistically representative of the target
application, which implies that the stimuli will not necessarily be the same as those employed during
functional verification.
What follows is a brief overview of the file types involved in annotating a netlist.
SDF back annotation: The SDF (Standard Delay Format) file contains the information about the
interconnect and cell delays in a design. It can be exported from Cadence SoC Encounter
to transmit these delay data to a simulator (and/or to a static timing analyzer). This file is
required for any type of post-layout simulation, irrespective of whether you are interested in
calculating power consumption or in gate-level functional verification.
VCD back annotation: The VCD (Value Change Dump) file logs all signal changes (i.e. the events
in VHDL terminology) that occur during a simulation run. The information is essentially the same
as in the ModelSim wave window but in textual form. File size thus not only grows with design
complexity but also with the length of a simulation run. A VCD is required for power analysis
with Cadence SoC Encounter. For obvious reasons, it is always possible to extract the
average activity for each circuit node from a VCD file but not the other way round.
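To illustrate why average activities can always be derived from a VCD file, here is a small Python sketch that counts value changes per signal in a strongly simplified VCD text. It is not the parser used by any of the tools: it ignores scopes, treats a change on a vector as a single event, and the embedded sample (signal name EnxSI, timestamps) is invented for the demonstration.

```python
from collections import Counter


def count_toggles(vcd_text):
    """Count value changes per signal in a simplified VCD text.

    Handles scalar changes such as '1!' and vector changes such as 'b1010 %'.
    The first value assigned to a signal counts as initialization, not as a toggle.
    """
    names = {}          # identifier code -> signal name
    last = {}           # identifier code -> last value seen
    toggles = Counter()
    in_header = True
    for line in vcd_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if in_header:
            if tokens[0] == "$var":
                # $var <type> <width> <identifier> <name> ... $end
                names[tokens[3]] = tokens[4]
            elif tokens[0] == "$enddefinitions":
                in_header = False
            continue
        if tokens[0].startswith(("#", "$")):
            continue    # timestamps and $dumpvars/$end markers
        if tokens[0][0] in "bBrR":
            value, code = tokens[0][1:], tokens[1]
        else:
            value, code = tokens[0][0], tokens[0][1:]
        if code in last and last[code] != value:
            toggles[names.get(code, code)] += 1
        last[code] = value
    return toggles


SAMPLE = """\
$timescale 1ns $end
$scope module DUT $end
$var wire 1 ! ClkxCI $end
$var wire 1 % EnxSI $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
0!
0%
$end
#5
1!
#10
0!
1%
#15
1!
#20
0!
"""

toggles = count_toggles(SAMPLE)
print(dict(toggles))
```

Dividing each count by the number of computation periods covered (two periods of 10 ns in this sample) yields the node activity. The timing of the individual events, however, cannot be recovered from such averages.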
As a welcome observation, we note that no parasitics exchange file (such as SPEF or RSPF) is
required to transport estimated capacitance values from the place&route tool to the power calculation
tool as both functions are assumed by Cadence SoC Encounter in the current design flow.
Side note: Our experience suggests that, while internal dissipation is well characterized in our
technology (umc L180), leakage power is often overestimated by far.
5 SoC Encounter Power Analysis

In this section we will perform a power analysis of our final chip using different sets of toggle activities.
Cadence SoC Encounter is able to perform a power analysis based on statistical estimates of the
switching activity. For more accuracy it can also process value change dump (VCD) files generated
as a result of post-layout simulations. Throughout the whole power analysis exercise you will have to
update the following table continuously.
We will first start Cadence SoC Encounter and load the saved test vehicle.
Student Task 4:
Start Cadence SoC Encounter.
In the Cadence SoC Encounter GUI, select the menu Design -> Restore Design -> SoCE...
and choose chip_filter.enc from the save directory. Among the views
on the top right hand, select the last one, the Physical View.
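Power Analysis Method                           Total Power [mW]   Dominating Instances Power [mW]
Global Activity                                 ...                ...
Input Activity                                  ...                ...
VCD-Based Activity: Enable all                  ...                ...
VCD-Based Activity: Enable all (zero inputs)    ...                ...
VCD-Based Activity: Disable HP                  ...                ...
VCD-Based Activity: Clock Gate                  ...                ...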
5.1 Statistical Power Analysis

As you know, dynamic power consumption directly depends on the switching activity. Cadence SoC
Encounter provides some simple approaches that estimate the switching activity of the circuit
without running costly simulations. These methods are useful to quickly get a first measure of the
chip's power consumption.
Global activity
Cadence SoC Encounter can automatically assign a default toggle activity value to all internal
nodes. Throughout the power analysis, each internal node of your chip is then assumed to toggle with
this activity during each clock cycle.
Student Task 5: In order to start this analysis, select Power -> Power Analysis ->
Run Power Analysis... (a). In this form, select the folder reports/power as the results
directory (see Figure 4). For the moment, leave the clock frequency at 100 MHz. Then step into
the Activity tab and enter 0.2 as global activity (this means that every node will change its
state with a probability of 0.2 per clock cycle). This is a good initial value. At this point, you are
able to start your first statistical power analysis. Press the OK button (or Apply).
(a) If the menu Run Power Analysis... is not available, first select Set Power Analysis Mode... and press
OK with the default settings. The previous menu should now be accessible.
The power analysis will then start and write lines similar to the following in the Cadence SoC
Encounter shell window:
CPE found ground net: GND
CPE found power net: VCC voltage: 1.8V
INFO (POWER-1606): Found clock ClkxCI with frequency 50MHz from SDC file.
CK: assigning clock ClkxCI to net ClkxCI
Propagating signal activity...
Starting Levelizing
2011-Nov-07 10:29:54 (2011-Nov-07 09:29:54 GMT)
2011-Nov-07 10:29:54 (2011-Nov-07 09:29:54 GMT): 5%
..
Among the messages in the console you will find some information about the clock. Notice that the
clock frequency extracted from the SDC file (50 MHz) does not match the frequency specified in the
GUI. The tool will use the SDC version, so the entry in the GUI will be ignored. It is important that
you always check the clock frequency on the console.
Student Task 6: Adjust the clock frequency (dominant frequency value) in the GUI so that
it matches the SDC value, and rerun the analysis.
There will be a warning message on the console about the TIE cells not having a power model. Since
the tie cells do not have any switching activity (they tie the output to either logic-1 or logic-0), this is
not really a problem.
At the end of the analysis Cadence SoC Encounter will write a summary on the console. The
result will also be written to the chip_filter.rpt file in the reports/power directory. Have a look at it and
try to identify the main results of the power dissipation of your chip. How much power does the chip
dissipate? What are the values that contribute most to the total power?
Student Task 7: Talk to an assistant and discuss where most of the power is being dissipated.
Calculate the total power dissipated by these instances. Update the results table at the beginning
of section 5. Use the additional column to enter the power dissipated by the above mentioned
instances.
Once we run the analysis again this report file will be overwritten. For this exercise we would
like to preserve the file, so that we can compare the results later on. Step into the encounter
directory of this exercise and make a copy or move the file under a different name, for example:
sh > cd ../encounter
sh > mv reports/power/chip_filter.rpt \
        reports/power/chip_filter_ga.rpt
Figure 4: Run Power Analysis menu in Cadence SoC Encounter.

Input Activity
Setting all internal nodes to a fixed activity is a gross oversimplification. Not all gates will switch with
the same probability (e.g. a 3-input AND gate switches its output much less often than, say, a 2-input XOR
gate). Instead of setting a default switching value for every internal node of the chip, it is also possible
to define only the activity of the input pins. Cadence SoC Encounter is then able to propagate
this activity inside the chip.
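The AND versus XOR remark can be checked with a deliberately simple probabilistic model. The Python sketch below assumes that all inputs are statistically independent, both of each other and from cycle to cycle, and that glitches do not occur. This is an illustrative assumption only, not a description of the propagation algorithm used by the tool.

```python
from itertools import product


def output_toggle_probability(gate, n_inputs, p_one=0.5):
    """Probability that a gate output differs between two consecutive cycles.

    gate:     function mapping a tuple of input bits to a truth value
    p_one:    probability of each input being 1 in any given cycle
    Assumes independent inputs and independent cycles (simplification).
    """
    p_out = 0.0
    for bits in product((0, 1), repeat=n_inputs):
        weight = 1.0
        for b in bits:
            weight *= p_one if b else 1.0 - p_one
        if gate(bits):
            p_out += weight
    # The output toggles if it is 1 in one cycle and 0 in the other.
    return 2.0 * p_out * (1.0 - p_out)


and3 = output_toggle_probability(all, 3)
xor2 = output_toggle_probability(lambda bits: sum(bits) % 2 == 1, 2)
print(f"AND3: {and3:.5f}  XOR2: {xor2:.5f}")
```

Under these assumptions each input toggles with probability 0.5, the XOR output toggles just as often, whereas the AND output toggles less than half as often because it is 1 only one time out of eight.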
Student Task 8:
To execute this new power analysis, go back into the Run Power Analysis... menu
and deselect the global activity option in the Activity tab. Return to the Basic tab and
put the value 0.2 in the input activity field. As before, set the frequency to 50 MHz. Leave
the flop activity and the clock gate activity fields empty (a).
Run the analysis and check the new report. What is the total power dissipation of the chip
now? Can you explain the difference from the previous value? Which of the two results is
more reliable?
Update the table you started earlier with the current results.
As before, rename the generated report file:
sh > mv reports/power/chip_filter.rpt \
        reports/power/chip_filter_ia.rpt
(a) The first specifies the activity of the outputs of sequential logic, while the latter specifies the average number of
times that a clock-gating cell switches in a clock cycle.
5.2 Stimuli-based Power Analysis

Using a circuit simulator to determine node activity figures
Instead of trying to estimate the switching activity (with different levels of accuracy), we can use the
Mentor Graphics ModelSim simulator to run the complete simulation and determine the exact
switching activity. We can tell Mentor Graphics ModelSim to write out a Value Change Dump
(VCD) file from the post-layout netlist, which records for every node when it has switched and to
what value.
Student Task 9: Step into the modelsim directory of this exercise:
sh > cd ../modelsim
Compile the placed & routed netlist of the final design. Also compile the testbench and related
files. All these compilations can be performed by executing a single shell script (a):
sh > ./compile_gate.csh
Now start the simulator with a prepared run script:
sh > ./run_gate.csh
(a) It is a good idea to take a look at it: you should know what you are executing.
To view the input and output of the filter, there is a .do file that will show the relevant signals in the
Wave window. On the console you could type:
vsim > do wave.do
Student Task 10:

Now we are ready to generate the dump file. We will first simulate the circuit for 100 ns
so that the circuit is properly initialized (we do not want to include the activity during the
initialization phase). Then we have to tell ModelSim where to store the VCD file. The last
thing is to specify the names of the nodes that we would like to monitor, i.e., the scope. The
following three commands are used for this purpose:
vsim > run 100ns
vsim > vcd file ./vcd/chip_filter.vcd
vsim > vcd add -r /chip_filter_tb/DUT/*
At this stage we can run the gate-level simulation until the end (20,142 ns). Moreover, the
simulator needs to be flushed at the end of the simulation run to make Mentor Graphics
ModelSim write the VCD file.
vsim > run -all
vsim > vcd flush
For a real design, the simulation could take a very long time and, more importantly, could produce
very large VCD files (gigabytes!). For your own designs, consider writing the VCD files to the
/scratch directory.
This simulation, however, should not take that long. As you can see from the wave window, the inputs
are rather random, and should produce a lot of activity.
Stimuli-based Activity
At this point, we have a VCD file that contains the toggle activity of the nodes in the design based
on a simulation with actual stimuli. We will now give it to Cadence SoC Encounter to perform a
stimuli-based power analysis:
Student Task 11:
As before, select the menu Power -> Power Analysis -> Run Power Analysis....
In the main tab, select VCD File to perform a simulation-based power analysis. Note
that if you do not check this option, SoC Encounter uses the values given in the other fields. Take the
generated VCD file and enter as Scope the top-level module chip_filter_tb/DUT. Note that
there is no leading slash / in the scope. You could also specify a start and stop time for
the power simulation. Here, specify a start time of 100 ns and a stop time of 20,000 ns
(numbers are taken from the simulation). Leave the block field empty and press Add. Do
not forget to press Add!
The results directory should be reports/power. See Figure 5 to get an overview of the
window's setup. Press OK.
Figure 5: Run Power Analysis menu in Cadence SoC Encounter with vcd file.
Once the power analysis starts, it will write messages to the Cadence SoC Encounter shell
that look similar to those of the previous runs. But we have to study them carefully. When the clock period specified
in the SDC file and the clock period within the VCD file do not match, you will get a message that
says (for example):
WARNING (POWER-1784): Existing clock frequency 217.391MHz
is being overwritten with 200.034MHz on clock rooted on
net ClkxCI from VCD file.
In this case the VCD clock frequency will be taken. In our exercise, we do not have this problem.
Furthermore, there will be a message similar to the following one
With this vcd command, 4426896 value changes and 1.99e-05 second
simulation time were counted for power consumption calculation.
The line above summarizes how Cadence SoC Encounter has interpreted the VCD file. It is very
important to make sure that the time (expressed in seconds) is equal to what we have simulated (and
have intended). In our case, the time should be 20,000 ns - 100 ns = 19,900 ns, which matches the
above message. Make sure that you have the correct time.
Filename (activity) : ../modelsim/vcd/chip_filter.vcd
Found in design     : 24858/26118
Coverage for file   : 5473/5473 = 100%
The lines above tell us what Cadence SoC Encounter has extracted from the VCD file. It is
very easy to make mistakes and use the wrong VCD file. The second line shows the total number
of switching activities, and the third line shows what percentage of the internal nodes were
annotated.
If you see that the message looks like the following:
Found in design     : 0/0
Coverage for file   : 0/5473 = 0%
you have a problem (most probably it is the wrong file, or the wrong scope has been specified,
for example with a leading slash). Cadence SoC Encounter will still perform the
analysis regardless of the success of the annotation. Since nothing was back-annotated, the results
will just be wrong.
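The two sanity checks described above (simulated time window and annotation coverage) can also be scripted. The following Python sketch is a hypothetical helper, not part of the tool flow. It assumes that the log contains a line of the form "Coverage for file : annotated/total" as in the excerpts above and that the reported simulation time is passed in by hand.

```python
import re


def annotation_coverage(log_text):
    """Return (annotated, total) from the 'Coverage for file' line, or None."""
    match = re.search(r"Coverage for file\s*:\s*(\d+)/(\d+)", log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def vcd_annotation_ok(log_text, start_ns, stop_ns, reported_seconds, rel_tol=0.01):
    """True if all nodes were annotated and the time window is as intended.

    rel_tol allows for the rounding of the time printed in the log.
    """
    coverage = annotation_coverage(log_text)
    if coverage is None:
        return False
    annotated, total = coverage
    expected_seconds = (stop_ns - start_ns) * 1e-9
    window_ok = abs(reported_seconds - expected_seconds) <= rel_tol * expected_seconds
    return total > 0 and annotated == total and window_ok


good_log = "Found in design : 24858/26118\nCoverage for file : 5473/5473 = 100%\n"
bad_log = "Found in design : 0/0\nCoverage for file : 0/5473 = 0%\n"

good = vcd_annotation_ok(good_log, 100, 20000, 1.99e-05)
bad = vcd_annotation_ok(bad_log, 100, 20000, 1.99e-05)
print(good, bad)
```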
Student Task 12:
Take a look at the report chip_filter.rpt in the output directory that you have selected. How
much power does the chip dissipate now?
Update your results table with the latest result. Do not forget to update the power in the
second (mystery) column.
Compare the results with the older analyses. Does your result make sense?
5.3 Effect of Switching Activity

For the last part we have used a simulation of random input data. The stimuli file was given for the
exercise, and we just used these values. The question that we should now investigate is how much
the stimuli file could affect the overall power consumption.
Student Task 13:
To do this, we apply the stimuli producing the least activity in the design: an all zero vector.
Generate a stimuli file with an all zero input and record a new VCD file. (You will have to
figure out how)
Update the estimated power in our table.
Present the results to an assistant.
5.4 Architectural Changes to Save Power

Architectural decisions can have a significant effect on the power consumption of the circuit. The
test circuit we use in this exercise has been designed to have several different operation modes that
correspond to differing architectural choices. A summary of the options can be found in Section 3.3.
The stimulus file in the previous section used both the high-pass and the low-pass filter component
at the same time (option Enable all). The first thing that we will do is to disable the high-pass-filter
(option Disable HP) and check the resulting power analysis.
Student Task 14:
Modify the stimulus file, simvectors/input.stim so that the option Disable HP is selected.
You should only change the first number in the stimulus file. (Make sure you are not using
the stimuli file with zero activity!)
Perform a power analysis using the VCD file generated from the new stimulus file.
Report your numbers in the table. How does it compare to previous results?
After examining the power reports, and consulting the simplified block diagram in Figure 3, you should
notice that there is a way to reduce the power consumption without losing functionality.
Student Task 15:
Describe a couple of approaches that could reduce the power consumption of the circuit. Discuss
your solutions with an assistant.
We will implement a solution that uses clock gating to disable the unused filter bank. The
test circuit already has the control signals for this solution (see Section 3.3). We will use the option
Clock Gate. This option will a) only enable one block, and b) use clock gating to stop the clock
propagation in the block that is not enabled.
Student Task 16:

Modify the stimulus file, simvectors/input.stim so that the option Clock Gate is selected.
You should only change the first number in the stimulus file.
Perform a power analysis using the VCD file generated from the new stimulus file.
Report your numbers in the table. How does it compare to previous results?
The change in the input file uses the ModexSI signal to disable the filter blocks in connection with the
OutSelectxSI signal. The TestModexTI signal controls the clock gating circuitry: 1 = clock gate
inactive, 0 = clock gate active.
Normally, architectural changes like the one we have just described cannot be performed simply by
changing the input stimuli (this was done in this exercise to save time). Such architectural changes
would require changes to be made to the circuit description, re-synthesis of the circuit, and a fresh
back-end design process. Once the back-end process is complete, we would extract the SDF file and
the netlist, use Mentor Graphics ModelSim to generate a new VCD file, import this file back into
Cadence SoC Encounter and perform the power analysis.
Explain the numbers in your final table with your assistant.
Next week, we will study the effects of IR drop and investigate the effects of different power distribution
strategies.
6 Ground bounce, supply droop and Electromigration

In this part of the training we want to determine an adequate power routing strategy for our design.
We can determine the width, layers, and the number of stripes and power rings by evaluating how
much the power distribution is affected.
To perform this analysis, we will use the Rail Analysis of Cadence SoC Encounter. The rail
analyser can show the current density, ground bounce and supply droop across the power lines in
a chip. This allows us to evaluate whether or not the current power distribution is adequate for the
design. In Cadence SoC Encounter the ground bounce and supply droop are called IR drop.
While designing the power nets, it is important to keep in mind two different problems:
IR drop: Since the metal exhibits a natural resistance (R), current (I) flowing through such a
connection will create a voltage drop. This in turn will reduce the supply voltage of any cell,
which is to the detriment of its performance (e.g. increased propagation delay). Additionally,
excessive supply droop and ground bounce may violate noise margins, leading to a malfunction
of the chip (see footnote 4). Depending on process, voltage, and temperature (PVT) variations,
IR drop immediately influences the correct behavior of the chip.
Electromigration (see footnote 5): Thermally agitated metal ions are washed away by flowing electrons, thereby
reducing the cross section of the metal. As a final result an interruption of a power line can
occur, which destroys the chip. This phenomenon depends on the current density J.
IR drop is a problem that has an immediate effect on the chip's operation, while electromigration is
a slow process, which may show its negative impact after months or even years during which the IC
has been working correctly. The positive side effect of designing the supply wires sufficiently wide
with respect to electromigration is that fusing due to high current densities is prevented. That is,
constraints for preventing electromigration are much tighter than those for preventing fusing.
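A back-of-the-envelope Python sketch may help to see how stripe width enters both problems. All numbers below (sheet resistance, stripe length, current) are invented for illustration and are not umc L180 process data. The model lumps the whole current at the far end of a single stripe, whereas a real power grid draws distributed currents through many parallel paths.

```python
def stripe_resistance(length_um, width_um, sheet_res_ohm_sq):
    """Resistance of a straight metal stripe: R = R_sheet * (L / W)."""
    return sheet_res_ohm_sq * length_um / width_um


def ir_drop(current_a, resistance_ohm):
    """Voltage lost along the stripe: U = I * R."""
    return current_a * resistance_ohm


def current_per_width(current_a, width_um):
    """Current per micrometre of stripe width in mA/um.

    For a fixed metal thickness this is proportional to the current density J.
    """
    return current_a * 1e3 / width_um


# Hypothetical stripe: 1000 um long, 0.08 ohm/square, carrying 5 mA.
for width in (10.0, 20.0):
    r = stripe_resistance(1000.0, width, 0.08)
    print(f"W = {width:4.1f} um: R = {r:.1f} ohm, "
          f"IR drop = {ir_drop(5e-3, r) * 1e3:.1f} mV, "
          f"I/W = {current_per_width(5e-3, width):.2f} mA/um")

drop_narrow = ir_drop(5e-3, stripe_resistance(1000.0, 10.0, 0.08))
drop_wide = ir_drop(5e-3, stripe_resistance(1000.0, 20.0, 0.08))
```

Doubling the width halves both the IR drop and the current per unit width, which is why widening stripes or adding further stripes are the first remedies to try.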
Fortunately, Cadence SoC Encounter features efficient rail analysis tools that graphically show the IR drop
along the supply lines and the current density therein. Basically, there are two versions of
the Rail Analysis available:
Early Rail Analysis: A simplified analysis that can be used after floorplanning.
Rail Analysis: A more accurate analysis that can also take into account the power distribution within macros such as I/O pads and memory macros.
In this exercise we will use the more precise one, i.e. the Rail Analysis.
7 The Test Vehicle

The design being used throughout this part of the training will already be very familiar to you. It is
the same design you used in Exercise 3. In order to give you an overview of it once again, Figure 6
illustrates the main components.
Footnote 4: Check the VLSI book in chapter 10.3.
Footnote 5: Check the VLSI book in chapter 11.6.1 for a more detailed discussion.
Figure 6: The test vehicle being used. [Block diagram of the top level mbcjr_chip with pads, an FSM, the input memory
(mem1, mem2, mem3), gammaAdder blocks, alphaUnit, betaUnit, dummyBetaUnit, alphaMem, LLRUnit, and MBCJRUnit.
Signals shown include DataxDI, LLRSelectxSI, ModexSI, TestModexTI, BistEnxTI, ClkxCI, ResetxRBI, LLRxDO,
BistGammaOkxTO, BistAlphaOkxTO, BistGammaDonexTO, and BistAlphaDonexTO.]
7.1 Installation and Preparation Work

The test vehicle can be installed as follows:
Student Task 17:
1. Open a Unix shell window.
2. Install the test vehicle:
sh > /home/vlsi2/t2/install_t2_partB
3. Start the cockpit:
sh > cd training2_partB
sh > icdesign umcL180 &
Afterwards, load the already prepared design:

Student Task 18:
1. Start Cadence SoC Encounter.
2. Navigate to Design → Restore Design → SoCE... and choose mbcjr_chip.enc from the
save directory.
3. Change to the Physical View of the design.
For the Rail Analysis, some power-specific information is required, which can be gained from the
power analysis as follows:
Student Task 19:
1. Set up the power analysis mode: Power → Power Analysis → Set Power Analysis Mode... and click OK using the default settings.
2. Switch to the Cadence SoC Encounter shell and execute the following command in
order to perform a power analysis which generates the required power-specific information
for the Rail Analysis:
enc > report_power -rail_analysis_format VS \
enc >     -outfile reports/power/mbcjr_chip_vcdx4.rpt
3. Watch the output within the Cadence SoC Encounter shell and check whether the
coverage of the node activity file reaches 100%.
The output of the power analysis should look similar to the following:
Loading TCF file save/mbcjr_chip.enc.dat/mbcjr_chip.tcf
Filename (activity)  : save/mbcjr_chip.enc.dat/mbcjr_chip.tcf
Found in design      : 26202/26202
Coverage for file    : 26202/26202 = 100%
TCF (Toggle Count File): You should have recognized that for the previous power simulation we
didn't use a VCD file (as in the first part of the training), but a TCF file instead. As the name suggests, the TCF file type contains the toggle count information of the nodes, and is an SoC Encounter-specific file format. In contrast, the VCD file format contains the complete timing information. TCF
files can be generated from VCD files but not the other way round.
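Why the conversion works in one direction only can be illustrated with a small sketch. It does not read or write the real VCD or TCF formats; the (time, net, value) tuples and the function name `toggle_counts` are assumptions made purely for illustration. Reducing a waveform to toggle counts throws the time stamps away, so the waveform cannot be recovered from the counts.

```python
def toggle_counts(events):
    """Count value changes per net from (time, net, value) samples.

    The time stamps are used only to order the samples; they do not
    appear in the result, which is why the step cannot be reversed.
    """
    last_value = {}
    counts = {}
    for _time, net, value in sorted(events):
        counts.setdefault(net, 0)
        if net in last_value and last_value[net] != value:
            counts[net] += 1
        last_value[net] = value
    return counts


# Hypothetical waveform samples: (time in ns, net name, logic value).
events = [
    (0, "ClkxCI", 0), (5, "ClkxCI", 1), (10, "ClkxCI", 0), (15, "ClkxCI", 1),
    (0, "ResetxRBI", 0), (12, "ResetxRBI", 1),
]
print(toggle_counts(events))
```

Stretching all time stamps by a constant factor gives exactly the same counts, which shows that the timing information is no longer present in the result.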
Now you are ready to start analysing the design with regard to its power distribution.
8 Rail Analysis
8.1 Rail Analysis Setup
Student Task 20:
From the menu select Power → Rail Analysis → Set Rail Analysis Mode.... Within
the Basic tab, set the Accuracy to Accurate. For the Power Grid Libraries choose
the .cl files in the directory tech/cl/.
Select EM Analyse Models and choose the file tech/EM.6.models.
Compare the settings to Figure 7. If all is correct, save the settings by using Save... and
then press the OK button.
Figure 7: Set Rail Analysis Mode GUI in Cadence SoC Encounter.
8.2 IR Drop Threshold

To perform IR drop analysis, we need to fix a threshold that indicates the worst acceptable voltage
level in the design. The threshold voltage can be extracted from the databook (located in the DOCS
directory):
Student Task 21: Look for the operating conditions in the standard cell databook and report the
following values:
Operating voltage:
Minimal voltage:
At first sight, a good threshold value might be the minimal voltage of the standard cells. However, we
need to take into account that the IR drop analysis is done for VCC and ground separately; that is, the
total voltage loss seen by a cell is the sum of the VCC and ground drops.
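The budgeting argument can be sketched numerically. The function name `vcc_threshold`, the equal split between the two rails, and the voltages used below are assumptions for illustration only; the actual values have to come from the databook, and a different split may be more appropriate for a given design.

```python
def vcc_threshold(v_nominal, v_min, vcc_share=0.5):
    """Return the lowest acceptable voltage on the VCC rail.

    v_nominal - v_min is the total drop a cell can tolerate. Only the
    fraction vcc_share of it may be used up on the VCC side; the rest is
    reserved for the ground network, which is analyzed separately.
    """
    if not 0.0 <= vcc_share <= 1.0:
        raise ValueError("vcc_share must lie between 0 and 1")
    total_budget = v_nominal - v_min
    return v_nominal - vcc_share * total_budget


# Hypothetical numbers, NOT taken from the databook.
print(f"{vcc_threshold(1.0, 0.9):.3f} V")
```

With an equal split, half of the tolerable drop is spent on each rail, so the VCC threshold lies halfway between the nominal and the minimal voltage.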
Student Task 22: Taking into account the considerations from before, determine an appropriate
threshold level for the IR drop on the power nets:
8.3 Rail Analysis Run

Student Task 23:
To run the analysis select from the menu Power → Rail Analysis → Run Rail Analysis....
Set VCC as the Power Net(s), set the Voltage(s) and the appropriate threshold.
In the Power Data menu choose the Current Files switch and then select the instance
current file that was generated in the previous step, i.e. static_VCC.ptiavg for the net VCC
(located in the reports/power directory).
Cadence SoC Encounter does not know by itself how the power supply enters the chip.
You can specify this by using the Power Pad definition. The easiest way is to use a Pad File.
To create this file, choose Pad File and click on the Create button.
In the Edit Pad Location window set the net name under Auto Fetch Pad Location
to VCC and press Auto Fetch. The Pad Location List is updated with all the VCC supplies. Now you can save this list under the name mbcjr_chip_VCC.pp in the save folder (use
the VS file format). Close the window by pressing Cancel.
Back in the Run Rail Analysis... dialog, load the Pad Location List that you
have just created by selecting it within the Files: option. As the Net Name: use VCC
and press the Add button.
After providing the results directory reports/rail, the GUI should look similar to Figure 8.
Press the Apply button.
If the rail analysis succeeded, the Cadence SoC Encounter shell should display an output similar
to the following:
* Exiting vstorm2 normally.
vstorm2 exited successfully.
Check Reports/main.html generated inside state directory.
8.4 View Rail Analysis Result

Once the rail analysis is completed, you have to open a new window, named Power & Rail
Results, to be able to see the results.
Student Task 24:
Go to the menu Power → Report → Power & Rail Results....
This will bring up a new window. In the Basic tab, first select the Browse button
and choose the previously generated rail analysis results, which should be located in the
reports/rail directory. The results directory will be called something like VCC_25C_avg_1. Press
the Load State button to load the results.
Figure 8: Rail Analysis GUI in Cadence SoC Encounter.

Note that the last number of the results directory name (1 in the example above) is incremented each time you run a new rail analysis. Thus, when you want to view the results of a new rail
analysis, you need to load the state from the new results directory. The tool allows you to visualize
different features like the IR Drop or the Current Density directly on your chip.
IR Drop
For the first step we will analyze the IR Drop map of the chip.
Student Task 25: Under Rail Analysis Plot Type select IR - IR Drop. Make sure that the
option Auto Apply in the Action field is checked. Otherwise you will have to press the Apply
button in order to show the results. Compare your settings with those from Figure 9. This will give
you a color-coded map of the IR drop of your chip. The highest drop will be colored dark red. You
can dim the rest of the circuit with the F9 key to see the IR drop more clearly.
By default, the tool will automatically determine the color ranges. You can change this if you want in
the Auto Filter field (e.g. by pressing the Auto button).
Resistor Current
In the Power & Rail Results window select RC - Resistor Current to show the plot of the current
flowing through the wires. Again you can check Auto Apply or press Apply.
Figure 9: Power & Rail Results GUI in Cadence SoC Encounter.
Resistor Current Density

The resistor current density plot (RJ - Resistor Current Density option) shows the ratio
J/Jmax for every wire of the chip. More precisely, J is the actual current density in the wire
and Jmax is the maximum allowed current density of the respective metal layer. A ratio greater than 1 means
that the current density limit of the segment is violated. This is an important aspect since for critical
values of J/Jmax, your chip could suffer from the problems described at the beginning of Sec. 6.
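The meaning of the ratio can be made concrete with a small sketch. It assumes that the current density limit is expressed as current per unit wire width (mA/µm), which is a common way of stating such limits; the function name `density_ratio`, the segment names, and all numbers are hypothetical and do not come from the design kit.

```python
def density_ratio(current_ma, width_um, jmax_ma_per_um):
    """Return J/Jmax for one wire segment.

    J is taken as current per unit wire width (mA/um); a result
    greater than 1 means the segment violates its limit.
    """
    if width_um <= 0 or jmax_ma_per_um <= 0:
        raise ValueError("width and limit must be positive")
    return (current_ma / width_um) / jmax_ma_per_um


J_MAX = 1.0  # hypothetical limit in mA/um, not from the design kit
segments = {"narrow_wire": (12.0, 8.0), "wide_wire": (12.0, 16.0)}
for name, (current_ma, width_um) in segments.items():
    ratio = density_ratio(current_ma, width_um, J_MAX)
    print(name, ratio, "violation" if ratio > 1.0 else "ok")
```

Carrying the same current through a wire twice as wide halves the ratio, which is why widening supply wires is the first remedy for current density violations.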
Student Task 26:
Examine the default design, talk to an assistant and discuss some possible solutions in
order to better distribute the power.
Where is the worst IR drop located?
Where is the worst resistor current density located? Why?
9 Power Distribution Techniques

Throughout this section we will apply different techniques that allow us to better distribute the
available power within our design. In order to see how the particular techniques affect the power
distribution of our design, Table 3 should be updated continuously with the results obtained.
Note that, in order to make you aware of the different problems of power distribution, we use a design
that is very bad in the beginning so that you see the improvement brought by each step. In a typical chip
design flow, however, most of these steps are not necessary.
Table 3: Power Distribution Techniques - Results Table.
                                   Voltage / IR Drop [V]   Nets below Threshold [%]
Default design:
Connected pads:
Connected macro:
Widened power rings:
Doubled power rings:
Power rings @ Metal _ / Metal _:
Added power stripes:
Student Task 27: Have a look at the results of the rail analysis of the default design, which you
obtained in Section 8, and fill out the first row of Table 3. The first empty column of the
table should contain the maximum IR Drop within the design, whereas the second column should
contain the share of nets that violate the IR Drop threshold (in %).
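How the two table entries relate to the analysis data can be sketched as follows. The list of voltages stands for per-net voltage levels read from a rail analysis report; the function name `summarize_rail` and all numbers are hypothetical and only illustrate the arithmetic.

```python
def summarize_rail(voltages, v_nominal, threshold):
    """Return (maximum IR drop in V, share of nets below threshold in %)."""
    if not voltages:
        raise ValueError("need at least one voltage value")
    max_drop = v_nominal - min(voltages)
    below = sum(1 for v in voltages if v < threshold)
    return max_drop, 100.0 * below / len(voltages)


# Hypothetical per-net voltages, nominal supply and threshold.
drop, percent = summarize_rail([1.0, 0.98, 0.96, 0.93], v_nominal=1.0, threshold=0.95)
print(f"max IR drop: {drop:.3f} V, nets below threshold: {percent:.1f} %")
```

The first value depends only on the single worst net, while the second reflects how widespread the problem is, so the two columns can improve at different rates from one technique to the next.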
9.1 Supply Pads Connectivity

One of the main reasons why our test vehicle has such a bad power distribution is the tiny
connections between the supply pads and the actual core of the design. One way to solve this
problem would be to manually widen those connections. Another way is to use the built-in routing
option of Cadence SoC Encounter, which makes sure that the connections are as wide as
possible for the supply pads used. This can be done in the following way:
Student Task 28:
Go to the menu Route → Special Route.... Within the Basic tab have a look at the
Route field and deselect all options except the one for Pad Pins. Your settings should
look similar to those in Figure 10. Close the dialog using the OK button and as soon as
routing has finished, have a look at the newly created connections at the supply pads.
Run another rail analysis as described in Section 8.3, have a look at the results (see Section 8.4) and complete the appropriate row in Table 3.
Figure 10: Special Route GUI in Cadence SoC Encounter to improve Pad Connectivity.
9.2 Macro Blocks Connectivity

You should have already recognized that another major problem within our design is the connectivity
of the macro block. Fixing this issue works much like fixing the previous one:
Student Task 29:

Go to the menu Route → Special Route.... Within the Basic tab have a look at the
Route field and deselect all options except the one for Block Pins. Close the dialog using
the OK button and as soon as routing has finished, check the newly created connections
at the macro block.
Run another rail analysis and complete the results table.
9.3 Adjustment of the Power Rings

The current width of the power rings is definitely at a minimum (they are almost as narrow as the cell
library allows them to be). In order to get some information about the different available metal layers
and their electrical characteristics, you will now examine one of the technology specific files provided
by the design kit:
Student Task 30: Navigate to the directory encounter/tech/lef/ and open the file header6_V55.lef
using less. Browse through the file and complete the following table:
           Minimum Wire Width [µm]   Maximum Wire Width (a) [µm]   Resistance [Ω/□]   Thickness [µm]
Metal 1
Metal 3
Metal 6
(a) Watch out for the maximum wire width before slotting occurs.
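The table entries matter because a wire's resistance follows from the sheet resistance of its layer: R = R_sheet · L / W, and the resulting voltage loss is V = I · R. The sketch below applies these two standard relations; the function names and all numbers are hypothetical and are not values from the LEF file.

```python
def wire_resistance(sheet_res_ohm_per_sq, length_um, width_um):
    """Resistance of a rectangular wire: sheet resistance times number of squares."""
    if width_um <= 0:
        raise ValueError("width must be positive")
    return sheet_res_ohm_per_sq * length_um / width_um


def ir_drop(current_a, resistance_ohm):
    """Voltage lost along a wire carrying a constant current (Ohm's law)."""
    return current_a * resistance_ohm


# Hypothetical values: 0.08 ohm/sq, 1000 um long wire, 10 mA.
for width_um in (2.0, 20.0):
    r = wire_resistance(0.08, 1000.0, width_um)
    print(f"width {width_um:5.1f} um: R = {r:.2f} ohm, drop = {ir_drop(0.010, r):.3f} V")
```

Widening a wire by a factor of ten reduces both its resistance and its IR drop by the same factor, and a layer with lower sheet resistance has the same effect at unchanged width.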
Now you should be able to set the width of the power rings accordingly:
Student Task 31:
Use the ruler to determine the width of the power rings. How wide are they currently?
What would be a more suitable width for the power rings?
Ask an assistant whether your assumptions are suitable or not. Correct them if necessary.
Afterwards open the menu Power → Power Planning → Add Rings... and enter the
settings illustrated in Figure 11.
Run another rail analysis and complete the results table.
Widening the power rings has definitely improved the power distribution of our design. Nevertheless, not all
of the nets reach the previously defined threshold. Hence, we have to take further steps in order to
achieve a lower IR Drop. One possibility is to double the number of power rings:
Student Task 32: Open the menu Power → Power Planning → Add Rings... and apply
the same settings as in the previous step, except for the Net(s). Here insert GND VCC GND VCC,
which results in doubled power rings. After hitting the OK button, run another rail analysis
and write down the results in Table 3.
Figure 11: Add Rings GUI in Cadence SoC Encounter.
As you should see from your results, the addition of a second power ring does not improve the power
distribution much. Therefore you can delete the second power ring we have just created by simply
removing the appropriate wires within the design. What you can see from the previous step is that
oversized power networks do not always help you to get a better power distribution. Instead, they
only consume die area, which can certainly be put to better use.
In the course of this section you have gathered some electrical information about the
different metal layers. Maybe you can already imagine that the choice of the correct metal layer also
plays a major role when designing the power distribution network. Hence, let us now try to change
the metal layers of our power ring in order to reduce the IR Drop.
Student Task 33:
First, remove the existing power ring within the floorplan (select and delete).
Open the menu Power → Power Planning → Add Rings... and keep the previously
entered settings (check that you do not insert the unnecessary second power ring this
time), except that you choose a more suitable metal layer.
Press the OK button and run another rail analysis. Have a look at the results of the rail
analysis and complete the corresponding row in your results table. Which metal layer did
you choose and does the change improve the power distribution?
9.4 Power Stripes

Still some of the voltage levels of the nets within our design are below the initially set threshold. As
we can see from the latest rail analysis results, the highest IR Drop is located right in the middle of
our design. Therefore we will try to correct these violations by inserting power stripes.
As with the insertion of power rings, there are many parameters that you can tune
when inserting power stripes. Some of them are listed below:
Orientation: Power stripes can, of course, be inserted either horizontally or vertically. Because the
supply wires for the standard-cells are horizontally aligned, vertical power stripes are more
suitable to improve the power distribution.
Width: As with power rings, the width of the power stripes can be defined.
Quantity: Depending on the present design, you may have to adjust the number of the power stripes
being inserted.
Power Grids: Further power distribution techniques like a power grid (i.e. vertical as well as horizontal stripes) are possible6.
6 Figure 10.9 of Section 10.4 within our textbook "Digital Integrated Circuit Design, from VLSI Architectures to CMOS
Fabrication" shows some sample layouts.
For our design we will only insert a single power stripe:
Student Task 34:
Open the menu Power → Power Planning → Add Stripes... and navigate to the Basic tab. The stripes should be designed for the Net(s) GND VCC. Choose an appropriate
metal layer and a Vertical direction. The Width of the stripes should be 20 µm and they
should have a Spacing of 1.5 µm.

In the Set Pattern field select the Number of sets and insert just a single set.
The stripes should be inserted at a predefined location (a). Within the First/Last Stripe section
select Start from: left and for Relative from core or selected area insert 430 µm.
Compare your settings with those from Figure 12 and press the OK button.
Run your final rail analysis and check the results. Complete the results table. Hopefully,
you do not have any violating nets anymore.
(a) As already mentioned earlier, re-running the whole backend design flow for each power distribution improvement
would have been too time-consuming for a single afternoon. Therefore the nice guys from the DZ have already
prepared a suitable location for the power stripes.
Figure 12: Add Power Stripes GUI in Cadence SoC Encounter.
9.5 Concluding remarks

Although we primarily tried to reduce the worst-case IR drop and to stay above the specified
threshold voltage, you should in general also check that the IR drop distribution is consistent with your
expectations. For instance, you would expect the IR drop to increase the farther away you go from the
supply pads.
Also note that we only performed a rail analysis for the VCC net and thus omitted the ground network in this
training.
10 It's Your Turn

Now that you are more or less an expert7 in power analysis and power distribution techniques and
you know how to work around the problems that appear, you can show what you have learnt on
a new sample design.
Student Task 35:
In order to close the current design use the Cadence SoC Encounter shell and type in:
enc > freeDesign
This will close the previous sample design. Open the new design as usual by navigating
to Design → Restore Design → SoCE... and choose mbcjr_chip_II.enc from the save
directory. Change to the Physical View of the design.
Before you can start with the power distribution analysis in Cadence SoC Encounter, you need
to create a VCD file to get node activity information using the technique you learned in the first part
of this training. In the following, we recapitulate the flow:
Student Task 36:
Compilation of the netlist: As a starting point, use the Verilog netlist exported from Cadence SoC Encounter,
which is located at /encounter/out/mbcjr_chip_II.v. This netlist, together with
the testbench- and simulation-specific VHDL files, has to be compiled. The required VHDL
files are listed in the following:
1. /sourcecode/VHDLTools.vhd
2. /sourcecode/LTEPkg.vhd
3. /sourcecode/mbcjr_simulstuff.vhd
4. /sourcecode/mbcjr_chip_TB_pack.vhd
5. /sourcecode/mbcjr_chip_TB.vhd
You may want to have a look at the gate-level compile script we used during the first
part of this training.
Simulation of the netlist: If the netlist and the VHDL files have been compiled successfully, you
can start with the actual simulation of the netlist. The gate-level simulation script from the
first part of the training will help you to design a suitable run script for your current design.
The SDF file you will need for the simulation is located at /encounter/out/mbcjr_chip_II.sdf.fixed.gz.
Because the present design has a RAM macro block in it, you have to specify
the fsa0a_c_memaker_verilog library before you can run the simulation (in addition to the
core- and I/O-specific Verilog libraries).
In order to get a VCD file that contains the node information during the actual
running phase of the design, we recommend generating the VCD file only between 1 µs
and 3 µs. On the one hand, this has the advantage that you do not record the
toggle activity during the initialization phase; on the other hand, the simulation end time limits the size of the
resulting VCD file.
7 Although you should be familiar with all of the tasks required for this part of the training, do not hesitate to ask an
assistant if you get stuck somewhere. The EDA tools can be a little bit confusing at the beginning. Nevertheless,
this part of the training should help you to get a better overview of how power analysis works by going through all
of the different steps on your own, this time without a guided tour provided by the assistants.
Power Simulation: Now that you have the node activity file, you can switch back to Cadence
SoC Encounter and create the power-specific files required for the subsequent rail analysis by running a VCD-based power simulation. Do not forget to first run the power simulation
setup at Power → Power Analysis → Set Power Analysis Mode.... How
much power does the design consume?
Check the output in the Cadence SoC Encounter shell in order to be sure that the
coverage of the VCD file is OK and hence your power value is correct. After running the
power analysis the files static_GND.ptiavg and static_VCC.ptiavg should be available in the
directory /encounter/reports/power/.
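The effect of restricting the activity dump to a time window, as recommended for the simulation step, can be sketched as follows. In practice the window is set in the simulator; this sketch only mimics the outcome on hypothetical (time, net, value) samples, and the function name `toggles_in_window` is an assumption for illustration.

```python
def toggles_in_window(events, t_start, t_end):
    """Count toggles per net that occur within [t_start, t_end].

    Samples before the window still update the known value of each net,
    but their transitions are not counted.
    """
    last_value = {}
    counts = {}
    for time, net, value in sorted(events):
        changed = net in last_value and last_value[net] != value
        if changed and t_start <= time <= t_end:
            counts[net] = counts.get(net, 0) + 1
        last_value[net] = value
    return counts


# Hypothetical samples, time in ns: reset is released early, data toggles later.
events = [
    (0, "ResetxRBI", 0), (200, "ResetxRBI", 1),
    (0, "DataxDI", 0), (500, "DataxDI", 1), (1500, "DataxDI", 0),
    (2500, "DataxDI", 1), (3500, "DataxDI", 0),
]
print(toggles_in_window(events, 1000, 3000))
```

Transitions that belong to the initialization phase fall outside the window and therefore do not distort the activity figures used for the power estimate.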
Now you are ready to start with your first attempts in order to improve the power distribution of the
new design. Do not forget to do the setup of the rail analysis as described in Section 8.1 before you
start with the actual analysis.
Student Task 37:
Your first task will be to perform a rail analysis of the initial design and complete the first row
of Table 4. Then, improve the power distribution network step by step using the techniques
you have seen in the guided example in the previous section.
Complete the results table below by describing the power distribution technique you have
applied in the first column and the resulting maximum IR Drop in the second column. The goal
should be to achieve a minimum supply voltage level of 1.788 V.
Remark (Hints): In the following, we provide some hints and comments that should help you to
improve the power distribution:
1. A well-formed power distribution network cannot be identified by only considering the worst-case
IR Drop. Rather, try to build your network in such a way that almost all components
(standard cells, macro blocks, etc.) are provided with the same supply voltage. This means that you should not simply stop your efforts as soon as no net violates the
initially set threshold anymore, but try to achieve a balanced power distribution.
2. As you have seen, the special route option in Cadence SoC Encounter can be used
to route specific nets, such as VCC and GND. However, keep in mind that Cadence SoC
Encounter considers only those nets which are not yet connected and, moreover, only those wires which have not been placed yet (i.e. if there are two wires already
placed on two different metal layers and running across each other, Cadence SoC
Encounter will not check whether they should be connected during a special route process).
3. Some of the problems in the design might be much easier to detect by using further analysis
methods of the rail analysis, which we have not mentioned in this training. Feel free to try
the other analysis methods besides IR Drop and Current Density.
Table 4: New Design Power Distribution Techniques - Results Table.

Step   Power Distribution Improvement   Voltage / IR Drop [V]
       None (Initial design):
1
2
3
4
Congratulations! That's it!
Present the results to your assistant and discuss any open questions.
