Methods for Electronic System Design and Verification

Lab Report
Akshay Vijayashekar
akshay@student.chalmers.se

The main aim of the lab series is to understand and implement the stages of the physical
design flow of a chip, from RTL to GDSII. Along the way, we learn the software tools
used in Electronic Design Automation (EDA) to design an IC. In the preparatory lab
exercise we draw a block diagram of a 32-bit ALU built from shifters, an adder, muxes,
registers, etc. In the following exercises, we write RTL descriptions of the ALU using a
ripple carry adder and a Sklansky adder. The design is tested against a given set of
vectors and then synthesized. We analyse area and power consumption in detail for
different timing constraints. Finally, placement and routing of the design completes the
design flow.

Exercise 1: ALU DESIGN – VERIFICATION


The block diagram from the preparatory exercise is coded at RTL in VHDL in this
exercise. We write two versions of the ALU, one based on a ripple carry adder (RCA)
and another based on a Sklansky adder. We also write a test bench in VHDL to verify
the ALU logically. The test bench is used only for testing the ALU code and hence need
not be synthesizable, i.e., need not be realizable in hardware. Code that cannot be
realized in hardware is called non-synthesizable code.
Next comes verification, which is the most important step. Here we compile our VHDL
code and simulate it against a set of given vectors using ncsim.

Results: The VHDL descriptions of both ALU designs, one with the ripple carry adder and
one with the Sklansky adder, were tested successfully, without any errors, against all the
test vectors provided.

Conclusions: The block diagram gives the designer a proper overview and understanding
of what has to be designed. Basing the VHDL code on this block diagram leads to a
well-structured design.

Testing the design is a crucial step in the design flow, because it lets bugs be removed
early. If bugs are found in later stages of the design flow, the cost of fixing them is
much higher than the time and cost invested in testing the design during the initial
stages. Functional verification checks whether the desired output values are obtained
for a given set of inputs. This is accomplished with a test bench. The importance of test
benches is also evident in this exercise. Since the number of test vectors is very high,
verifying by looking at waveforms is not only tedious but also error-prone. Moreover,
since the test vectors reflect the real scenario, test benches provide a reliable and less
time-consuming method of testing the design. In case of bugs, the specific part can be
examined using waveform simulations.

Exercise 2: ALU DESIGN - BASIC SYNTHESIS


This exercise focuses on synthesizing the ALU design with the ripple carry adder for
different timing constraints. The RTL code is mapped to a 1.2 V, 130-nm process
technology library. We first synthesize without any timing constraint and then with
stricter constraints, analysing the worst case path, the area required for implementation
and the power consumption in each case. Finally, we verify the synthesized design, i.e.,
the gate netlist produced by the RTL compiler, to check that it matches the original
VHDL code.
Results: As said above, the ALU code is initially synthesized without any timing
constraint. Since no constraint is set, we obtain the intrinsic timing of the
implementation, on which the further constraints are based. The timing report for my
design indicated 5454 ps as the worst case delay, with the worst case path running
through the ripple carry adder from bit 0 of input register B to the 30th bit of output
register Outs, as expected for a carry that ripples through the adder. The worst case path
contains D flip-flops, inverters, a non-inverting multiplexer, an AND-OR-INVERT gate, a
4-input NAND gate and many 1-bit full adders. The total area of the implementation is
13129.95 µm². Logic elements occupy as much as 69% of it, while sequential elements
account for 27%.

In the next assignment, we synthesize with a timing constraint of 2727 ps, which is 50%
of the previously obtained value. This constraint was met with a timing slack of 3 ps.
The worst case path still runs through the ripple carry adder, from bit 0 of register A to
bit 31 of the output register. This path consists of D flip-flops, a 2-input NAND with an
inverted input, an inverter and 1-bit full adders, among many others. Essentially, the
worst case path has not changed much from the previous case, but faster gates have
been used. Gates with larger transistors, which have higher drive strength and lower
resistance, have been selected to meet the timing constraint, trading area for speed:
larger gates have shorter delays but consume more area. The implementation area has
increased to 15308.53 µm², of which logic elements take 71.7% and sequential elements
24.3%. We also notice that 9 buffers have been added, which take 0.4% of the total area.

We now tighten the constraint further to 1250 ps, i.e. 800 MHz. This results in a timing
violation, with a slack of -1139 ps, so this design is not suitable for 800 MHz operation.
The worst case path is again through the ripple carry adder. The implementation area in
this case is 15346.86 µm², slightly higher than in the previous case. Here, faster gates
and flip-flops with low delays and fan-outs are chosen, and more buffers have been
added. The share of buffers (24 in number) has increased to 1% of the area, which also
helps speed. Even with the fastest gates available in the given 130-nm technology
library, we are unable to meet the timing constraint.
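
The relation between the period constraint, the target frequency and the reported slack
can be checked with a few lines of Python. This is only a sketch: it assumes that slack is
the required time minus the arrival time on the worst path, so that the shortest workable
period is roughly the constraint minus the slack. The function names are illustrative and
do not belong to any tool.

```python
def period_ps_to_mhz(period_ps):
    """Convert a clock period in picoseconds to a frequency in MHz."""
    return 1e6 / period_ps


def achievable_period_ps(constraint_ps, slack_ps):
    """Estimate the shortest workable period from a constraint and its slack.

    Assumption: slack = required time - arrival time on the worst case path.
    """
    return constraint_ps - slack_ps


# Constraint and slack values quoted in the results above.
target_mhz = period_ps_to_mhz(1250)
worst_period = achievable_period_ps(1250, -1139)

print(f"target frequency: {target_mhz:.0f} MHz")
print(f"estimated achievable period: {worst_period} ps")
print(f"estimated achievable frequency: {period_ps_to_mhz(worst_period):.1f} MHz")
```

Under this assumption the RCA based ALU needs a period of about 2389 ps at the tightest
setting, which is well short of the 800 MHz target.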

Finally, the synthesized netlist is verified against the same set of vectors that was used
to verify the VHDL code. The netlist synthesized with the 50% timing constraint
(2727 ps) was successfully verified.

Conclusions: Static Timing Analysis (STA), an effective way of calculating the delay of a
circuit without simulating it, is employed here to find the worst case timing path.
Because it uses simplified delay models, STA gives a fast and reasonably accurate
measurement of circuit timing. When the timing constraint is tightened, the ripple carry
adder continues to be the bottleneck of the ALU, even with the fastest gates available in
the technology library. Faster gates are selected for tighter constraints, but they
consume more area. Transistors are resized and buffers are added to achieve higher
speeds.

Exercise 3: DESIGN RESPIN AND POWER ANALYSIS


Fast adders like the Sklansky adder can be implemented to overcome the RCA bottleneck
of the previous design. The delay of this adder grows logarithmically with the number of
bits, whereas the delay of the RCA grows linearly with the number of bits. In this lab
exercise, we analyse the improvement of the ALU with the Sklansky adder over the one
with the RCA. We also perform power analysis for the two designs.
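
The difference in carry depth can be illustrated with a small bit-level model in Python.
This is a sketch for illustration only, not the VHDL used in the lab: the function names
are hypothetical, and the returned stage count is the number of carry stages or prefix
levels, not a gate delay, since real delay also depends on fan-out and wiring.

```python
def ripple_carry_add(a, b, n=32):
    """Add two n-bit numbers bit by bit; the carry passes through n stages."""
    carry, total = 0, 0
    for i in range(n):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, n  # sum modulo 2**n, number of full-adder stages


def sklansky_add(a, b, n=32):
    """Add two n-bit numbers using a Sklansky parallel-prefix carry tree."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # bit propagate
    G = [((a >> i) & (b >> i)) & 1 for i in range(n)]  # group generate
    P = p[:]                                           # group propagate
    levels, span = 0, 1
    while span < n:
        for i in range(n):
            if i & span:                   # i lies in the upper half of its block
                j = (i & ~(span - 1)) - 1  # top bit of the lower half
                G[i] |= P[i] & G[j]
                P[i] &= P[j]
        span *= 2
        levels += 1
    total = 0
    for i in range(n):
        carry_in = G[i - 1] if i else 0    # carry into bit i, with carry-in 0
        total |= (p[i] ^ carry_in) << i
    return total, levels  # sum modulo 2**n, number of prefix levels


a, b = 0x12345678, 0x0FEDCBA9
print(ripple_carry_add(a, b))
print(sklansky_add(a, b))
```

For 32 bits the ripple version reports 32 stages while the prefix tree needs 5 levels,
which is the logarithmic versus linear behaviour described above.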

Results: In the first exercise we synthesize the ALU with the Sklansky adder and check
the unconstrained intrinsic delay. My design had a worst case delay of 4805 ps, with the
worst path running from bit 0 of the B register to bit 31 of the Outs register through the
Sklansky adder. This delay is lower than the 5454 ps of the ripple carry adder design.
The estimated area for this design is 13892.456 µm².
Next, we synthesize the Sklansky ALU design with a constraint of 1250 ps, i.e. for
800 MHz. This design meets the 800 MHz constraint, and the worst case path has now
shifted from the adder to the right shifter. It begins at bit 0 of register B and ends at
bit 6 of the Outs register. The adder is thus no longer the bottleneck of the ALU design.
On checking the 10 worst case paths, we also find the left shifter (positions 8 and 9) in
the list along with the adder. The required implementation area is 16494.65 µm², which
is far higher than in the unconstrained case. This is because gates with larger transistors
and higher drive strength are used to reduce delay, which increases the area in exchange
for better performance. This process of changing the design to fulfil the given set of
timing constraints is called timing closure.
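
The area cost of tightening the constraint can be put in relative terms with a short
calculation. The sketch below only reuses the area and delay figures quoted in this
report for the unconstrained and 1250 ps runs; the helper name and dictionary keys are
illustrative.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# Figures quoted in this report (area in µm², unconstrained delay in ps).
rca = {"area_unconstrained": 13129.95, "area_1250ps": 15346.86, "delay": 5454}
skl = {"area_unconstrained": 13892.456, "area_1250ps": 16494.65, "delay": 4805}

for name, design in (("RCA", rca), ("Sklansky", skl)):
    growth = percent_change(design["area_unconstrained"], design["area_1250ps"])
    print(f"{name}: area grows by {growth:.1f}% under the 1250 ps constraint")

delay_change = percent_change(rca["delay"], skl["delay"])
print(f"Unconstrained delay, Sklansky vs RCA: {delay_change:.1f}%")
```

Both designs grow by a little under a fifth in area at 1250 ps, but only the Sklansky
based ALU meets the constraint at that cost.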

A plot of the implementation area for both the Sklansky and RCA designs is shown in
figure(1). Notice that the area of the Sklansky design increases exponentially as the
timing constraint is tightened, while the area of the RCA design increases linearly. A
tighter constraint therefore means that the Sklansky based ALU uses much faster and
larger gates than the RCA based ALU. Also, since the adder is no longer on the critical
path of the Sklansky based ALU, the tool optimizes the shifters using large gates. We
also verify the Sklansky ALU netlist against the set of given test vectors. The netlist was
parsed without any errors.

We now look at the power consumed by both ALU models. A theoretical, probabilistic
model is used for the analysis. The input high probability is set to 0.5, and the toggle
probability, i.e. the rate at which an input toggles between high and low, is set to 0.02
in the first case and then to a higher value of 0.1. The values obtained for toggle
probability 0.02 are plotted in figure(2). We notice that the RCA ALU consumes more
power than the Sklansky ALU under tighter constraints, but the trend shows that the
RCA ALU consumes less power at lower frequencies. Table(1) summarizes the power
consumption obtained for toggle probabilities 0.1 and 0.02. Dynamic power increases
when the toggle rate is higher because of the more frequent switching between 1 and 0,
whereas leakage power remains almost the same for both probabilities. Looking into the
power consumption of the individual blocks, we notice that the adder block consumes
more than 10% of the total power.

We also check the power consumed by the clock tree. At 360 MHz, it consumes
162117.818 nW and 211706.182 nW for the Sklansky and RCA ALUs respectively.
Calculating the power theoretically using the formula Power = f * C * VDD^2, with
VDD = 1.2 V and C = 309 fF for the Sklansky ALU and 404 fF for the RCA ALU, we see
that the values match the reported ones fairly well.
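
The clock tree check can be reproduced numerically. The sketch below uses the standard
switching power expression P = f * C * VDD^2 for a net that charges and discharges its
capacitance once per clock cycle, together with the frequency, supply voltage and
capacitances quoted above; the function and variable names are illustrative.

```python
def switching_power_nw(freq_hz, cap_farad, vdd_volt):
    """Switching power P = f * C * VDD^2 of a clock net, returned in nW."""
    return freq_hz * cap_farad * vdd_volt ** 2 * 1e9


F_CLK = 360e6  # clock frequency in Hz
VDD = 1.2      # supply voltage in V

# Clock tree capacitance in F and reported clock tree power in nW.
clock_trees = {
    "Sklansky": (309e-15, 162117.818),
    "RCA": (404e-15, 211706.182),
}

for name, (cap, reported) in clock_trees.items():
    estimate = switching_power_nw(F_CLK, cap, VDD)
    error = 100.0 * (estimate - reported) / reported
    print(f"{name}: estimate {estimate:.0f} nW, reported {reported:.0f} nW, {error:+.1f}%")
```

With these inputs the hand calculation lands within roughly 1% of the reported clock tree
power for both designs.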

We checked the power consumption by assuming certain probabilities. We can get more
realistic estimates by calculating the power consumed when the test vectors used earlier
for verification drive the analysis. These vectors are fed into the tools, which determine
the probabilities as well as the power consumed. Table(2) shows the power consumed in
the three different cases. The dynamic power is high for the Random test vectors, which
means that the inputs toggle much more often in this case than with the other test
vectors. The power consumed with Realtrace (150000 test vectors) is comparable with
that of Regular (1000 test vectors) despite the much larger number of test cases in the
former. We cannot settle on a fixed number of test vectors for testing: the larger the
number of test vectors, the higher the accuracy of the estimate.

Figure(1) Area (y-axis) in µm² vs time in ns.
Figure(2) Power (y-axis) vs time in ns.

                 Toggle probability 0.1      Toggle probability 0.02
Power            Leakage      Dynamic        Leakage      Dynamic
RCA              349153.1     1939050        349056.6     1512384
Sklansky         331687.1     1794628        331241.2     1395077

Table(1) Power consumed in nW for different toggle probabilities

Test vectors     Leakage        Dynamic
Random           331642.572     1054265
Realtrace        338729.899     522640.8
Regular          333278.364     540620.4

Table(2) Power consumed in nW for test vectors

From the Toggle Count Format (.tcf) file, Clk has a probability of 0.5 for Realtrace and
0.4998 for the other test vectors. The number given after the probability is the number
of times the value switches between 0 and 1. Considering bits A[15] and A[16] in all
three cases, we obtain table(3). Notice that for the Regular test vectors A[15] has a
probability of 0, which means it never goes to state 1. This can be verified from the
A.tv file, where bit 15 remains zero throughout the test vectors. We can also see that
A[16] toggles from 0 to 1 and from 1 to 0, which justifies the probability of 0.5002
shown in the .tcf file.

                 Random                  Realtrace               Regular
                 A[15]       A[16]       A[15]       A[16]       A[15]    A[16]
Probability      0.4873      0.5022      0.1555      0.0614      0        0.5002
Toggle count     24267000    24664000    8243000     5307000     0        49677000

Table(3) Toggle probabilities and Toggle count for different test vectors

Conclusions: The selection of the best design depends on the constraints that the
manufacturer or the application imposes. In the case of our ALU design, the Sklansky
based ALU suits high-speed designs that can afford more area, while the RCA based ALU
is better suited to designs where speed is not important and area is the main criterion.
This can also be seen from the graphs. The power consumed can also be a criterion for
selecting among the available designs.

Exercise 4: PLACE AND ROUTE

The ALU we have designed and tested is now just one step away from being sent to
production. This step is the placement and routing of the ALU design, where the cells
are placed in the chip area and wires are routed between them. It is usually time
consuming and is iterated a number of times in practice. We first form blocks such as
the input and output register blocks and then proceed to floor planning. Floor planning
is done manually, taking the die area into account: the blocks, such as the adder, the
input and output registers and the shifters, are arranged on the cell area. It is good
practice to place macros in the corners or along the perimeter. The input block is placed
along the left edge and the output block along the right edge. Another important
consideration is the placement of timing-critical blocks. The area utilization of the
various blocks should be between 60% and 85% for good results. Power routing is then
done by adding power rings and stripes. This is a very important step: if it is not planned
well, it degrades performance through voltage drops and noise. Even though it is
possible to place all the cells manually along the rows on the chip area, doing so is
highly inefficient and time consuming and, considering the various constraints,
practically infeasible. So only floor planning is done manually, and its result is fed to the
place and route EDA tools. We use the tool SoC Encounter in Guide mode for placing the
cells. The netlist generated during the previous lab exercise is used for this exercise.
Results: We first run pre-place optimization and then in-place optimization, which
improves the design performance by reducing delays and correcting capacitance
violations. The placement density also increases from 73% to 86% after in-place
optimization, indicating that the cells now occupy a larger share of the placement area.
Clock Tree Synthesis (CTS) is then performed on the placed layout. In this step the clock
is routed to the different parts of the chip layout. We first look at the pre-CTS timing
report, in which my design showed a violation with a slack of -0.993 ps. The post-CTS
step only worsened the timing, bringing the slack to -1.064 ps.