Institut für Integrierte Systeme
Integrated Systems Laboratory
Training 2
SVN Rev.: 1025
Last Changed: 2013-11-05
Reminder:
With the execution of this training you declare that you understand and accept the regulations about
using CAE/CAD software installations at the ETH Zurich. These regulations can be read anytime at
http://dz.ee.ethz.ch/regulations/index.en.html.
You will be assisted by Mentor Graphics ModelSim (for circuit simulation) and by Cadence SoC
Encounter (for place & route, preparation of power and ground nets, IR drop analysis, and current
density estimations).
2 Introduction
2.1 Theoretical background
As explained in section 9.1 of our textbook,¹ four phenomena dissipate energy in static CMOS circuits:

Phenomenon | Results in dissipation | Nature
Charging and discharging of capacitive loads | while node voltages are in transit | dynamic
Crossover currents | while node voltages are in transit | dynamic
Driving of resistive loads (if any) | at all times, even after circuit has settled | static
Leakage currents | at all times, even after circuit has settled | static

We will not be concerned with static power in this exercise as we limit ourselves to pure CMOS circuits
with no resistive loads and because leakage is almost negligible due to the conservative fabrication
process being studied. For the needs of EDA tools the dynamic dissipation can be attributed to library
cells as follows.
Internal power Pint is the power dissipated inside a cell for the charging and discharging of internal
capacitances and due to crossover currents.
Switching power Pext is the power dissipated inside a cell for charging and discharging the load
capacitance connected to the cell's output. That external load consists of the input capacitances
of all cells being driven plus the parasitic capacitances of the wires (aka interconnect).
The total power dissipation Ptot related to a cell can now be expressed as

$$P_{tot} = P_{stat} + P_{dyn} \approx P_{dyn} = P_{int} + P_{ext} \qquad (1)$$

Calculating Pext is straightforward

$$P_{ext} = f_{cp}\,\frac{\alpha}{2}\,C_{ext}\,U_{dd}^{2} \qquad (2)$$

where α denotes the switching activity of the cell's output node, Cext the load capacitance attached,
and Udd the supply voltage. fcp stands for the computation rate, i.e. the inverse of the computation
period.² Pint gets calculated in much the same way, yet coming up with accurate activity and
capacitance figures requires detailed information about the inner circuitry and layout of each cell.

¹ Hubert Kaeslin, Digital Integrated Circuit Design, from VLSI Architectures to CMOS Fabrication,
Cambridge University Press, 2008.
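As a quick numerical illustration of equation (2), the following minimal Python sketch evaluates the switching power of a single net. The function name and the operating point (100 MHz, activity 0.5, 1 pF, 1.8 V) are assumptions chosen for illustration only; they are not results from this training.

```python
def switching_power(f_cp, alpha, c_ext, u_dd):
    """Equation (2): P_ext = f_cp * (alpha / 2) * C_ext * U_dd^2, in watts.

    f_cp  : computation rate in Hz
    alpha : switching activity of the net (toggles per computation period)
    c_ext : load capacitance in farads
    u_dd  : supply voltage in volts
    """
    return f_cp * (alpha / 2.0) * c_ext * u_dd ** 2


# Hypothetical operating point, for illustration only.
p_ext = switching_power(f_cp=100e6, alpha=0.5, c_ext=1e-12, u_dd=1.8)
print(f"P_ext = {p_ext * 1e6:.1f} uW")
```

Note that doubling the activity or the capacitance doubles the result, whereas doubling the supply voltage quadruples it.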
A power estimator essentially is a piece of software that sums up the various contributions over an
entire circuit. Provided the same clock and voltage get used everywhere, this amounts to
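$$P_{ckt} = \sum_{m=1}^{M} P_{int\,m} + \sum_{n=1}^{N} P_{ext\,n} \approx f_{cp}\,\frac{U_{dd}^{2}}{2} \left( \sum_{m=1}^{M} \alpha_m C_{int\,m} + \sum_{n=1}^{N} \alpha_n C_{ext\,n} \right) \qquad (3)$$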
Index m = 1...M refers to the cells instantiated in the circuit and n = 1...N to the nets of interconnect
running in between. For each cell, an internal activity figure αm is estimated from the node activities at
the input(s). Note that Cint m is not meant to correspond to any capacitance physically present in the
circuit. Rather, it is just a numerical parameter adjusted for each cell during library characterization
such as to model its internal dissipation.³
Equation (3) tells us a few important things about power dissipation and power estimation:
- Realistic switching activity figures are crucial; they can be obtained from gate-level simulations.
- Realistic capacitance figures are important; they are best extracted from layout data.
- Dynamic power grows with Udd squared. The power vs. speed dilemma is discussed in the
  textbook.
² For standard single-edge-triggered one-phase clocking, computation period and clock cycle are the same, so fcp = fclk.
Double-edge-triggered circuits, in contrast, offer two computation periods per clock cycle, so that fcp = 2 fclk.
³ Incidentally, observe that any attempt to capture the internal dissipation of a cell with a single quantity is not exactly
accurate, as the energy dissipated when one input toggles may also depend on what is happening at other inputs at the
same time. And in the case of a bistable, the current state is likely to matter too. While industrial standard
cell models typically cover all possible situations, we shall not be concerned with such details here.
if AddxSI = '1' then
  OutputxDO <= InputAxDI + InputBxDI;
else
  OutputxDO <= InputAxDI * InputBxDI;
end if;
The frequency of ClkxCI is 100 MHz and the input waveforms are shown in Figure 2. They are
periodic, and the two input values (InputAxDI and InputBxDI) have been chosen to be always the
same. Moreover, suppose that no glitches occur. The supply voltage is 1.8 V.
[Figure 2: Waveforms of ClkxCI, InputAxDI = InputBxDI, AddxSI and OutputxDO; time axis in ns with ticks at 20, 40, 60 and 80.]
Table 1
Net | Capacitance | Node activity [1] | Switching power Pext [mW]
ClkxCI | 140 | 2 | ...
AddxSI | 90 | ... | ...
InputAxDI (per bit line) | 60 | ... | ...
InputBxDI (per bit line) | 60 | ... | ...
OutputxDO (per bit line) | 0 | ... | ...
Further nets | neglected in the context of this exercise
Internal power
0.04
0.66
0.00
0.12
0.56
0.54
Adder
Multiplier
Output register + mux
Student Task 1:
1. Output waveform: Collecting all 8 bits into one signature, draw the waveform and numeric values of OutputxDO in Figure 2.
2. Switching activities: Assuming single-edge-triggered one-phase clocking, complete the
node activity column in Table 1.
3. Power spent for switching of nets: You now have all the facts required to calculate the
switching powers associated with the various nets according to (2). Enter the numbers in
the last column.
4. Power dissipated within circuit blocks: Now consider Table 2. What is the main sink
of power among the blocks listed there and how much does it dissipate?
5. Consolidated dissipation: Compiling all contributions from Table 1 and Table 2, how
much power does the circuit dissipate internally, that is, with no load attached?
6. Overall dissipation: Suppose each output drives a load of 1 pF. What is the total power
consumption now?
[Figure: block diagram showing the ports TESTMODExTI, DATAINxDI and DATAOUTxDO.]
Parameter | Value
Clocking discipline | single-edge-triggered one-phase
Clock frequency fclk [MHz] | 50
Supply voltage Udd [V] | 1.8
Number of interconnect nets N | 5 500
Avg. load capacitance Cext n [fF] | 30.0
Avg. switching activity αn [1] | 0.2
Number of cell instances M | 3 900
Avg. equiv. capacitance Cint m [fF] | 25.0
Avg. internal activity αm [1] | same as αn
Student Task 2: Plug these numbers into (3) and write down the result here: ....
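To cross-check a hand calculation, equation (3) can be evaluated with average figures in a few lines of Python. This is only a sketch under two assumptions: equation (3) carries the same factor 1/2 as equation (2), and every cell and net is represented by its tabulated average. The function and parameter names are made up for this illustration.

```python
def circuit_power_from_averages(f_cp, u_dd,
                                n_cells, c_int_avg, alpha_m,
                                n_nets, c_ext_avg, alpha_n):
    """Equation (3) with all cells and nets replaced by their averages.

    Assumes the factor 1/2 of equation (2) also applies here.
    Frequencies in Hz, voltages in V, capacitances in F; result in W.
    """
    weighted_cap = n_cells * alpha_m * c_int_avg + n_nets * alpha_n * c_ext_avg
    return f_cp * u_dd ** 2 / 2.0 * weighted_cap


# Figures from the parameter table; single-edge-triggered one-phase
# clocking means f_cp equals f_clk.
p_ckt = circuit_power_from_averages(
    f_cp=50e6, u_dd=1.8,
    n_cells=3900, c_int_avg=25.0e-15, alpha_m=0.2,
    n_nets=5500, c_ext_avg=30.0e-15, alpha_n=0.2,
)
print(f"P_ckt = {p_ckt * 1e3:.2f} mW")
```

If your own result differs by exactly a factor of two, check how you treated the factor 1/2.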
Signal | Enable all | Disable HP | Clock Gate
TestModexTI | 1 | 1 | 0
ModexSI | 1 | 1 | 0
OutSelectxSI(1) | 1 | 1 | 1
OutSelectxSI(0) | 1 | 0 | 0
int value | 15 | 14 | 2
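The "int value" row is simply the four control bits read as a binary number, with TestModexTI as the most significant bit and OutSelectxSI(0) as the least significant bit. The short Python sketch below reproduces the three values; the function name and the dictionary layout are illustrative assumptions.

```python
def control_word(test_mode, mode, out_select1, out_select0):
    """Pack the four control bits into one integer (TestModexTI is the MSB)."""
    return (test_mode << 3) | (mode << 2) | (out_select1 << 1) | out_select0


# Bit settings in the order TestModexTI, ModexSI, OutSelectxSI(1), OutSelectxSI(0).
OPTIONS = {
    "Enable all": (1, 1, 1, 1),
    "Disable HP": (1, 1, 1, 0),
    "Clock Gate": (0, 0, 1, 0),
}

int_values = {name: control_word(*bits) for name, bits in OPTIONS.items()}
for name, value in int_values.items():
    print(f"{name}: {value}")
```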
Analysis | Power [mW] | Dominating Instances
Global Activity | |
Input Activity | |
VCD-Based Activity: Enable all | |
VCD-Based Activity: Enable all (zero inputs) | |
VCD-Based Activity: Disable HP | |
VCD-Based Activity: Clock Gate | |
Global activity
Cadence SoC Encounter can automatically assign a default toggle activity to all internal
nodes. Throughout the power analysis, each internal node of your chip is then assumed to toggle
with this activity in every clock cycle.
Student Task 5: In order to start this analysis, select Power → Power Analysis →
Run Power Analysis....ᵃ In this form, select the folder reports/power as the results
directory (see Figure 4). For the moment leave the clock frequency at 100 MHz. Then step into
the Activity tab and enter 0.2 as global activity (this means that every node will change its
state with a probability of 0.2 per clock cycle). This is a good initial value. At this point, you are
able to start your first statistical power analysis. Press the OK button (or Apply).
ᵃ If the menu Run Power Analysis... is not available, first select Set Power Analysis Mode... and press
OK with the default settings. Now the previous menu should be accessible.
The power analysis will then start and write lines similar to the following to the Cadence SoC
Encounter shell window:
CPE found ground net: GND
CPE found power net: VCC voltage: 1.8V
INFO (POWER-1606): Found clock ClkxCI with frequency 50MHz from SDC file.
CK: assigning clock ClkxCI to net ClkxCI
Propagating signal activity...
Starting Levelizing
2011-Nov-07 10:29:54 (2011-Nov-07 09:29:54 GMT)
2011-Nov-07 10:29:54 (2011-Nov-07 09:29:54 GMT): 5%
..
Among the messages in the console you will find some information about the clock. Notice that the
clock frequency extracted from the SDC file (50 MHz) does not match the frequency specified in the
GUI. The tool will use the SDC version, so the entry in the GUI will be ignored. It is important that
you always check the clock frequency on the console.
Student Task 6: Adjust the clock frequency (dominant frequency value) in the GUI so that
it matches the SDC value, and rerun the analysis.
There will be a warning message on the console about the TIE cells not having a power model. Since
the tie cells do not have any switching activity (they tie the output to either logic 1 or logic 0), this is
not really a problem.
At the end of the analysis Cadence SoC Encounter will write a summary on the console. The
result will also be written to the chip_filter.rpt file in the reports/power directory. Have a look at it and
try to identify the main results of the power dissipation of your chip. How much power does the chip
dissipate? What are the values that contribute most to the total power?
Student Task 7: Talk to an assistant and discuss where most of the power is being dissipated.
Calculate the total power dissipated by these instances. Update the results table at the beginning
of section 5. Use the additional column to enter the power dissipated by the above mentioned
instances.
Once we run the analysis again this report file will be overwritten. For this exercise we would
like to preserve the file, so that we can compare the results later on. Step into the encounter
directory of this exercise and make a copy or move the file under a different name, for example:
sh > cd ../encounter
sh > mv reports/power/chip_filter.rpt reports/power/chip_filter_ga.rpt
The first specifies the activity of outputs of sequential logic, while the latter specifies the average number of
times that a clock-gating cell switches in a clock cycle.
It is a good idea to take a look at it: you should know what you are executing.
To view the input and output of the filter, there is a .do file that will show the relevant signals in the
Wave window. On the console you could type:
vsim > do wave.do
For a real design, the simulation could take a very long time and, more importantly, could produce
very large VCD files (gigabytes!). For your own designs consider writing the VCD files to the
/scratch directory.
This simulation, however, should not take that long. As you can see from the wave window, the inputs
are rather random, and should produce a lot of activity.
Stimuli-based Activity
At this point, we have a VCD file that contains the toggle activity of the nodes in the design based
on a simulation with actual stimuli. We will now give it to Cadence SoC Encounter to perform a
stimuli-based power analysis:
Student Task 11:
As before, select the menu Power → Power Analysis → Run Power Analysis....
In the main tab, select VCD File to perform a simulation-based power analysis. Note
that if you do not check this option, SoC Encounter uses the values given in the other fields.
Take the generated VCD file and enter as Scope the top-level module chip_filter_tb/DUT. Note that
there is no leading slash / in the scope. You could also specify a start and stop time for
the power simulation. Here, specify a start time of 100 ns and a stop time of 20,000 ns
(numbers are taken from the simulation). Leave the block field empty and press Add. Do
not forget to press Add!
The results directory should be reports/power. See Figure 5 to get an overview of the
window setup. Press OK.
Once the power analysis starts, it will write messages to the Cadence SoC Encounter shell
that look similar to those of the previous runs. This time, however, we have to study them carefully.
When the clock period specified in the SDC file and the clock period within the VCD file do not match,
you will get a message that says (for example):
WARNING (POWER-1784): Existing clock frequency 217.391MHz
is being overwritten with 200.034MHz on clock rooted on
net ClkxCI from VCD file.
In this case the VCD clock frequency will be taken. In our exercise, we do not have this problem.
Furthermore, there will be a message similar to the following one
With this vcd command, 4426896 value changes and 1.99e-05 second
simulation time were counted for power consumption calculation.
The line above summarizes how Cadence SoC Encounter has interpreted the VCD file. It is very
important to make sure that the time (expressed in seconds) is equal to what we have simulated (and
have intended). In our case, the time should be 20,000 ns - 100 ns = 19,900 ns, which matches the
above message. Make sure that you have the correct time.
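The unit conversion behind this check is easy to get wrong by a factor of 1000, so it can help to script it. The following sketch compares the window entered in the GUI (in ns) with the time reported by the tool (in seconds); the variable and function names are illustrative assumptions, and the reported value is copied from the example message above.

```python
import math


def vcd_window_seconds(start_ns, stop_ns):
    """Length of the analysed VCD window in seconds."""
    return (stop_ns - start_ns) * 1e-9


expected_s = vcd_window_seconds(start_ns=100, stop_ns=20_000)
reported_s = 1.99e-05  # taken from the example console message

# Compare with a tolerance, since both values are floating point numbers.
window_ok = math.isclose(expected_s, reported_s, rel_tol=1e-6)
print("VCD window matches" if window_ok else "VCD window MISMATCH")
```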
Filename (activity) : ../modelsim/vcd/chip_filter.vcd
Found in design     : 24858/26118
Coverage for file   : 5473/5473 = 100%
The lines above tell us what Cadence SoC Encounter has extracted from the VCD file. It is
very easy to make mistakes and use the wrong VCD file. The second line shows the total number
of switching activities, and the third line shows what percentage of the internal nodes were
annotated.
If you see that the message looks like the following:
Found in design     : 0/0
Coverage for file   : 0/5473 = 0%
you have a problem (most probably it is the wrong file, or the wrong scope has been specified;
double-check the leading slash as described in Student Task 11). Cadence SoC Encounter will still
perform the analysis regardless of the success of the annotation. Since nothing was back-annotated,
the results will just be wrong.
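Because the tool carries on even with zero coverage, it can be worth checking the coverage line automatically when you keep the console output. The sketch below parses a line of the form shown above; the helper name and the idea of reading the line from a saved log are assumptions of this example, not features of the tool.

```python
import re


def parse_coverage(line):
    """Return the annotated fraction (0.0 to 1.0) from a line such as
    'Coverage for file : 5473/5473 = 100%'."""
    match = re.search(r"(\d+)\s*/\s*(\d+)", line)
    if match is None:
        raise ValueError(f"no 'annotated/total' pair found in: {line!r}")
    annotated, total = int(match.group(1)), int(match.group(2))
    # A total of zero (as in '0/0') means nothing was matched at all.
    return annotated / total if total else 0.0


good = parse_coverage("Coverage for file : 5473/5473 = 100%")
bad = parse_coverage("Coverage for file : 0/5473 = 0%")
print(f"good run: {good:.0%}, failed annotation: {bad:.0%}")
```

A result below 100% is a prompt to recheck the VCD file name and the scope before trusting any power figure.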
Student Task 12:
Take a look at the report chip_filter.rpt in the output directory that you have selected. How
much power does the chip dissipate now?
Update your results table with the latest result. Do not forget to update the power in the
second (mystery) column.
Compare the results with the earlier analyses. Does your result make sense?
We will implement a solution that uses clock gating to disable the unused filter bank. The
test circuit already has the control signals for this solution (see Section 3.3). We will use the option
Clock Gate. This option will a) enable only one block, and b) use clock gating to stop the clock
propagation in the block that is not enabled.
Next week, we will study the effects of IR drop and investigate the effects of different power distribution
strategies.
[Figure: block diagram of the mbcjr_chip top level: pads, FSM, input memories (mem1, mem2, mem3), gammaAdder units, alphaUnit, betaUnit, dummyBetaUnit, alphaMem and LLRUnit inside MBCJRUnit, with the signals BCJRDDataxDI, LLRSelectxSI, ModexSI, TestModexTI, BistEnxTI, ClkxCI, ResetxRBI and LLRxDO.]
For the Rail Analysis, some power-specific information is required, which can be gained from the
power analysis as follows:
Student Task 19:
1. Set up the power analysis mode: Power → Power Analysis → Set Power Analysis
Mode... and click OK using the default settings.
2. Switch to the Cadence SoC Encounter shell and execute the following command in
order to perform a power analysis which generates the required power-specific information
for the Rail Analysis:
enc > report_power -rail_analysis_format VS -outfile reports/power/mbcjr_chip_vcdx4.rpt
3. Watch the output within the Cadence SoC Encounter shell and check whether the
coverage of the node activity file reaches 100%.
The output of the power analysis should look similar to the following:
Loading TCF file save/mbcjr_chip.enc.dat/mbcjr_chip.tcf
Filename (activity) : save/mbcjr_chip.enc.dat/mbcjr_chip.tcf
Found in design     : 26202/26202
Coverage for file   : 26202/26202 = 100%
TCF (Toggle Count File): You should have noticed that for the previous power simulation we
did not use a VCD file (as in the first part of the training), but a TCF file instead. As the name
suggests, the TCF file type contains the toggle count information of the nodes, and is an SoC
Encounter-specific file format. In contrast, the VCD file format contains the complete timing
information. TCF files can be generated from VCD files but not the other way round.
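The one-way relationship between the two formats can be illustrated with a small sketch: timed value changes can always be collapsed into per-net toggle counts, but the time stamps cannot be recovered from the counts. The event tuples below are a simplified stand-in invented for this example; they reproduce neither the real VCD syntax nor the real TCF syntax.

```python
def toggle_counts(events):
    """Collapse timed value changes into toggle counts per net.

    events: iterable of (time_ns, net_name, value), ordered by time.
    The first value seen on a net is its initial state, not a toggle.
    """
    last_value = {}
    counts = {}
    for _time_ns, net, value in events:
        counts.setdefault(net, 0)
        if net in last_value and last_value[net] != value:
            counts[net] += 1
        last_value[net] = value
    return counts


# Hypothetical value changes on two nets.
events = [
    (0, "a", 0), (0, "b", 1),
    (10, "a", 1),
    (20, "a", 0),
    (30, "b", 1),  # same value again: no toggle
]
counts = toggle_counts(events)
print(counts)
```

Two different event sequences with the same number of toggles per net yield identical counts, which is why the conversion cannot be reversed.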
Now you are ready to start analysing the design with regard to its power distribution.
8 Rail Analysis
8.1 Rail Analysis Setup
Student Task 20:
From the menu select Power → Rail Analysis → Set Rail Analysis Mode.... Within
the Basic tab, set the Accuracy to Accurate. For the Power Grid Libraries choose
the .cl files in the directory tech/cl/.
Select EM Analyse Models and choose the file tech/EM.6.models.
Compare the settings to Figure 7. If all is correct, save the settings by using Save... and
then press the OK button.
IR Drop
For the first step we will analyze the IR Drop map of the chip.
Student Task 25: Under Rail Analysis Plot Type select IR - IR Drop. Make sure that the
option Auto Apply in the Action field is checked. Otherwise you will have to press the Apply
button in order to show the results. Compare your settings with those from Figure 9. This will give
you a color-coded map of the IR drop of your chip. The highest drop will be colored dark red. You
can dim the rest of the circuit with the F9 key to see the IR drop more clearly.
By default, the tool will automatically determine the color ranges. You can change this if you want in
the Auto Filter field (e.g. by pressing the Auto button).
Resistor Current
In the Power & Rail Results window select RC - Resistor Current to show the plot of the current
flowing across the wires. Again you can check Auto Apply or press Apply.
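Table 3
Design | Max. IR drop | Nets violating the IR drop threshold [%]
Default design | |
Connected pads | |
Connected macro | |
Widened power rings | |
Doubled power rings | |
Power rings @ Metal __ / Metal __ | |
Added power stripes | |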
Student Task 27: Have a look at the results of the rail analysis of the default design, which you
obtained in Section 8, and fill in the first row of Table 3. The first empty column of the
table should contain the maximum IR drop within the design, whereas the second column should
contain the share of nets that violate the IR drop threshold (in %).
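The two figures asked for in Table 3 are a maximum and a percentage over all analysed nets. As a minimal sketch of that bookkeeping, assuming you had the per-net IR drops available as a plain list (the numbers and the threshold below are made up, not tool output):

```python
def summarize_ir_drop(drops_v, threshold_v):
    """Return (maximum IR drop in V, share of nets above the threshold in %)."""
    if not drops_v:
        raise ValueError("need at least one IR drop value")
    worst = max(drops_v)
    violating = sum(1 for drop in drops_v if drop > threshold_v)
    return worst, 100.0 * violating / len(drops_v)


# Hypothetical per-net IR drops in volts and a hypothetical threshold.
drops = [0.004, 0.009, 0.015, 0.021]
worst_drop, violating_pct = summarize_ir_drop(drops, threshold_v=0.012)
print(f"max IR drop = {worst_drop * 1e3:.0f} mV, violating nets = {violating_pct:.0f} %")
```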
Figure 10: Special Route GUI in Cadence SoC Encounter to improve Pad Connectivity.
Layer | Resistance [Ω/□] | Thickness [µm]
Metal 1 | |
Metal 3 | |
Metal 6 | |
ᵃ Watch out for the maximum wire width before slotting occurs.
Now you should be able to set the width of the power rings accordingly:
Student Task 31:
Use the ruler to determine the width of the power rings. How wide are they currently?
What would be a more suitable width for the power rings?
Ask an assistant whether your assumptions are suitable or not. Correct them if necessary.
Afterwards open the menu Power → Power Planning → Add Rings... and enter the
settings illustrated in Figure 11.
Run another rail analysis and complete the results table.
Widening the power rings definitely improved the power distribution of our design. Nevertheless, not all
of the nets meet the previously defined threshold. Hence, we have to take further steps in order to
achieve a lower IR drop. One possibility is to double the number of power rings:
Student Task 32: Open the menu Power → Power Planning → Add Rings... and apply
the same settings as in the previous step, except for Net(s). Here enter GND VCC GND VCC,
which results in doubled power rings. After hitting the OK button, run another rail analysis
and write down the results in Table 3.
As you should see from your results, the addition of a second power ring does not improve the power
distribution much. Therefore you can delete the second power ring we have just created by simply
removing the corresponding wires from the design. What you can see from the previous step is that
oversized power networks do not always help you to get a better power distribution. Instead, they
only consume die area, which can certainly be put to better use.
Throughout the previous section you have gathered some electrical information about the
different metal layers. Maybe you can already imagine that the choice of the correct metal layer also
plays a major role when designing the power distribution network. Hence, let us now try to change
the metal layers of our power ring in order to reduce the IR drop.
Student Task 33:
First, remove the existing power ring within the floorplan (select and delete).
Open the menu Power → Power Planning → Add Rings... and keep the previously
entered settings (check that you do not insert the unnecessary second power ring this
time), except that you choose a more suitable metal layer.
Press the OK button and run another rail analysis. Have a look at the results of the rail
analysis and complete the corresponding row in your results table. Which metal layer did
you choose and does the change improve the power distribution?
Figure 10.9 of Section 10.4 within our textbook Digital Integrated Circuit Design, from VLSI Architectures to CMOS
Fabrication shows some sample layouts.
As already mentioned earlier, re-running the whole backend design flow for each power distribution improvement
would have been too time-consuming for a single afternoon. Therefore the nice guys from the DZ have already
prepared a suitable location for the power stripes.
/sourcecode/VHDLTools.vhd
/sourcecode/LTEPkg.vhd
/sourcecode/mbcjr_simulstuff.vhd
/sourcecode/mbcjr_chip_TB_pack.vhd
/sourcecode/mbcjr_chip_TB.vhd
You may want to have a look at the gate-level compile script we used during the first
part of this training.
Simulation of the netlist: If the netlist and the VHDL files have been compiled successfully, you
can start with the actual simulation of the netlist. The gate-level simulation script from the
first part of the training will help you to design a suitable run script for your current design.
The SDF file you will need for the simulation is located at /encounter/out/mbcjr_chip_II.sdf.fixed.gz.
Because the present design has a RAM macro block in it, you have to specify
the fsa0a_c_memaker_verilog library before you can run the simulation (in addition to the
core- and I/O-specific Verilog libraries).

⁷ Although you should be familiar with all of the tasks required for this part of the training, do not hesitate to ask an
assistant if you get stuck somewhere. The EDA tools can be a little bit confusing at the beginning. Nevertheless,
this part of the training should help you to get a better overview of how power analysis works by going through all
of the different steps on your own, this time without a guided tour provided by the assistants.
In order to get a VCD file that contains the information of the nodes during the actual
running phase of the design, we recommend generating the VCD file only between 1 µs
and 3 µs. On the one hand, this avoids recording the toggle activity of the initialization
phase; on the other hand, the simulation end time limits the size of the resulting VCD file.
Power Simulation: Now that you have the node activity file, you can switch back to Cadence
SoC Encounter and create the power-specific files required for the subsequent rail analysis
by running a VCD-based power simulation. Do not forget to first run the power simulation
setup at Power → Power Analysis → Set Power Analysis Mode.... How
much power does the design consume?
Check the output in the Cadence SoC Encounter shell in order to be sure that the
coverage of the VCD file is OK and hence your power value is correct. After running the
power analysis, the files static_GND.ptiavg and static_VCC.ptiavg should be available in the
directory /encounter/reports/power/.
Now you are ready to start with your first attempts in order to improve the power distribution of the
new design. Do not forget to do the setup of the rail analysis as described in Section 8.1 before you
start with the actual analysis.
Student Task 37:
Your first task will be to perform a rail analysis of the initial design and complete the first row
of Table 4. Then, improve the power distribution network step-by-step using the techniques
you have seen in the guided example in the previous section.
Complete the results table below by describing the power distribution technique you have
applied in the first row and the resulting maximum IR Drop in the second row. The goal
should be to achieve a minimum supply voltage level of 1.788 V.
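The target of 1.788 V translates directly into an IR drop budget. The sketch below assumes a nominal supply of 1.8 V (the VCC value reported in the first part of the training) and, as a simplification, attributes the whole drop to the VCC rail; both are assumptions of this example.

```python
NOMINAL_VDD_V = 1.8      # assumed nominal supply voltage
MIN_SUPPLY_V = 1.788     # target from the task description

# Largest IR drop that still meets the target, rounded to microvolts
# to avoid floating point noise.
ir_drop_budget_v = round(NOMINAL_VDD_V - MIN_SUPPLY_V, 6)


def meets_target(worst_ir_drop_v):
    """True if the worst-case IR drop keeps the supply at or above the target."""
    return worst_ir_drop_v <= ir_drop_budget_v


print(f"IR drop budget: {ir_drop_budget_v * 1e3:.0f} mV")
```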
Remark (Hints): In the following, we provide some hints and comments that should help you to
improve the power distribution:
1. A well-formed power distribution network cannot be identified by considering only the
worst-case IR drop. Rather, try to build your network in such a way that almost all components
(standard cells, macro blocks, etc.) are provided with the same supply voltage. This means
that you should not simply stop your efforts as soon as no net violates the
initially set threshold anymore, but try to achieve a balanced power distribution.
2. As you have seen, the special route option in Cadence SoC Encounter can be used
to route specific nets, such as VCC and GND. However, keep in mind that Cadence SoC
Encounter considers only those nets which are not yet connected, and moreover only
those wires which have not been placed yet (i.e. if two wires have already been
placed on two different metal layers and run across each other, Cadence SoC
Encounter will not check whether they should be connected during a special route process).
3. Some of the problems in the design might be much easier to detect by using further analysis
methods of the rail analysis, which we have not mentioned in this training. Feel free to try
the other analysis methods besides IR Drop and Current Density.
Table 4
Step | Power distribution technique | Voltage / IR Drop [V]
1 | |
2 | |
3 | |
4 | |
Congratulations! That's it!
Present the results to your assistant and discuss any open questions.