
Low-Power and Area-Efficient Shift Register

Using Pulsed Latches

ABSTRACT
This work proposes a low-power and area-efficient shift register using pulsed latches.
The area and power consumption are reduced by replacing flip-flops with pulsed latches. The
method solves the timing problem between pulsed latches by using multiple non-overlap
delayed pulsed clock signals instead of the conventional single pulsed clock signal. The
shift register uses a small number of pulsed clock signals by grouping the latches into several
sub shift registers and using additional temporary storage latches. A 256-bit shift register using
pulsed latches was fabricated using a CMOS process. The proposed shift register saves area and
power compared to the conventional shift register with flip-flops.

CHAPTER 1

INTRODUCTION
Flip-flops (FFs) are the basic storage elements used extensively in all kinds of digital
designs. In particular, digital designs nowadays often adopt intensive pipelining techniques
and employ many FF-rich modules such as register files, shift registers, and first-in first-out
(FIFO) buffers. It is also estimated that the power consumption of the clock system, which
consists of clock distribution networks and storage elements, is as high as 50% of the total
system power. FFs thus contribute a significant portion of the chip area and power consumption
of the overall system design. The pulse-triggered FF (P-FF), because of its single-latch
structure, is more popular than the conventional transmission gate (TG) and master-slave based
FFs in high-speed applications. Besides the speed advantage, its circuit simplicity lowers the
power consumption of the clock tree system. A P-FF consists of a pulse generator for
strobe signals and a latch for data storage. If the triggering pulses are sufficiently narrow,
the latch acts like an edge-triggered FF.

Since only one latch is needed, as opposed to two in the conventional master-slave
configuration, a P-FF is simpler in circuit complexity. This leads to a higher toggle
rate for high-speed operations. P-FFs also allow time borrowing across clock cycle
boundaries and feature a zero or even negative setup time. Here we present a low-power
pulse-triggered flip-flop based on a signal feed-through scheme. The design shortens the
longer delay by feeding the input signal directly to an internal node of the latch
to speed up the data transition. This mechanism is implemented by introducing a simple
pass transistor for extra signal driving. When combined with the pulse generation circuitry, it
forms a new P-FF design with enhanced speed and power-delay product (PDP)
performance.

The clock system, which consists of the clock distribution network and timing elements
(flip-flops and latches), is one of the most power-consuming components in a
VLSI system [1]. It accounts for approximately 30% to 60% of the total power
dissipation in a system. As a result, reducing the power consumed by flip-flops has a large
impact on the total power consumption. In common digital VLSI circuits, the sources of
power dissipation are switching power (Pswitching), short-circuit power (Pshortcircuit), static
power (Pstatic) and leakage power (Pleakage).

The following equation describes the total power consumption (Ptot) in terms of these four
power components:

\[ P_{tot} = P_{switching} + P_{shortcircuit} + P_{static} + P_{leakage} \qquad (1) \]

Two important ways to reduce this power consumption are voltage scaling and double-edge
triggering. Voltage scaling is the most effective way to decrease power consumption, since
switching power is proportional to the square of the supply voltage (the well-known equation
for the power consumption of VLSI circuits is \(P = C_L V_{dd}^2 f_{clk}\), where \(C_L\) is the
load capacitance, \(V_{dd}\) the supply voltage and \(f_{clk}\) the clock frequency [7]). However,
voltage scaling is associated with threshold voltage scaling, which can cause the leakage to
increase exponentially. On the other hand, double-edge triggered clocking can save half of the
power on the clock distribution network and thereby reduce the total power consumption.
Double-edge triggering means that a flip-flop responds to both the positive (0 to 1) and
negative (1 to 0) clock edges, so the clock frequency can be cut by one half for the same data
rate. In this paper the second method, double-edge triggering, is used to implement a clock
branch sharing implicit pulse (CBS_ip) flip-flop, which is then compared
with the existing double-edge triggered flip-flops.
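
As a rough numerical illustration of the relation P = C_L * Vdd^2 * f_clk quoted above, the following Python sketch compares halving the supply voltage with halving the clock frequency. The capacitance, voltage and frequency values are hypothetical and were chosen only to make the ratios visible; they are not measurements from this work.

```python
def dynamic_power(c_load, vdd, f_clk):
    """Switching power P = C_L * Vdd^2 * f_clk, in watts."""
    return c_load * vdd ** 2 * f_clk


# Hypothetical operating point (illustrative values, not taken from the report).
C_LOAD = 1e-12   # load capacitance: 1 pF
VDD = 1.8        # supply voltage in volts
F_CLK = 100e6    # clock frequency: 100 MHz

base = dynamic_power(C_LOAD, VDD, F_CLK)
half_vdd = dynamic_power(C_LOAD, VDD / 2, F_CLK)   # voltage scaling
half_clk = dynamic_power(C_LOAD, VDD, F_CLK / 2)   # double-edge clocking at half rate

print(f"baseline     : {base * 1e6:.1f} uW")
print(f"Vdd halved   : {half_vdd / base:.2f} x baseline")
print(f"f_clk halved : {half_clk / base:.2f} x baseline")
```

Halving the supply voltage cuts the switching power to a quarter, while halving the clock frequency cuts it to a half. The sketch ignores leakage, which is the reason voltage scaling cannot be pushed arbitrarily far.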

Most double-edge triggered flip-flops (DEFFs) are developed from single-edge
triggered flip-flop (SEFF) designs. The main SEFF types are the traditional master-slave FF,
the sense-amplifier based FF and the pulse-triggered FF. The first two have two stages and
are characterized by a positive setup time, causing large D-to-Q delays. The pulse-triggered
FF, in contrast, reduces the two stages to a single stage and is characterized by a soft-edge
property. Pulsed latches have fewer clocked transistors and hence lower power consumption.
Pulse-triggered flip-flops are classified into two types: explicit pulsed FFs (ep-FF) and
implicit pulsed FFs (ip-FF).

Generally, a DEFF design uses more clocked transistors than an SEFF design.
However, the DEFF design should not increase the clock load too much. It
should aim at saving energy on both the clock distribution network (by halving the
frequency) and the flip-flops. It is preferable to reduce the clock load by minimizing the
number of clocked transistors. Furthermore, from equation (1), circuits with reduced
switching activity are preferable. Low-swing capability is also very helpful to
further reduce the voltage on the clock distribution network for power saving.

Because voltage scaling reduces power efficiently, cluster
voltage scaling (CVS) systems are also preferred. The techniques used to implement double-edge
triggered flip-flops are the conventional master-slave scheme, the explicit pulse-triggered
scheme and the implicit pulse-triggered scheme. The implicit pulse-triggered schemes include
the symmetric pulse generator (SPGFF) scheme, the conditional precharge (DECPFF) scheme and the
clock branch sharing implicit pulse (CBS_ip) scheme.

The increasing significance of portable systems and the need to limit power
consumption (and hence heat dissipation) in very-high-density Very Large Scale Integration
(VLSI) chips have led to rapid and innovative developments in low-power design in
recent years.

Depending on the method of pulse generation, P-FF designs can be classified as implicit
or explicit [3]. In an implicit-type P-FF, the pulse generator is built into the latch
design, and no explicit pulse signals are generated. In an explicit-type P-FF, the pulse
generator and the latch are designed separately. Examples of implicit-type P-FF designs are
the implicit pulsed data-close-to-output (ip-DCO) FF, the modified hybrid latch flip-flop
(MHLFF), the single-ended conditional capture energy recovery (SCCER) FF and the signal
feed-through flip-flop (SFT-FF).

A shift register is a sequential logic circuit used mainly for
storing digital data. It consists of a group of FFs linked together so
that the output of one is the input of the next. All FFs run from a
common clock and are set or reset simultaneously. In this way each FF passes its stored bit
on to its neighbor.
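
The chaining described above can be sketched as a small behavioral model in Python. The class and method names are hypothetical and the model is idealized: every flip-flop captures its predecessor's previous output on one common clock edge, with no setup or hold effects.

```python
class ShiftRegister:
    """Behavioral model of an n-bit serial-in shift register built from D flip-flops."""

    def __init__(self, n_bits):
        # q[0] is the flip-flop nearest the serial input.
        self.q = [0] * n_bits

    def clock(self, serial_in):
        """One common clock edge: every FF takes the old output of its predecessor."""
        self.q = [serial_in] + self.q[:-1]
        return self.q[-1]  # serial output after the edge

    def reset(self):
        """Clear all flip-flops simultaneously."""
        self.q = [0] * len(self.q)


sr = ShiftRegister(4)
for bit in [1, 0, 1, 1]:
    sr.clock(bit)
print(sr.q)  # [1, 1, 0, 1]: the most recent input bit sits in the first FF
```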

The storage capacity of a register is the total number of bits (0 or 1) of digital
data that it can hold. Every FF within a shift register provides one bit of storage
capacity, so the number of flip-flops in a register defines its storage capacity. An FF can be
defined as an electronic circuit that retains the logical state of its data input once it
responds to a clock pulse. FFs are mostly used in computational circuits to
operate in a predefined sequence during repeating clock periods, taking and keeping data for a
limited time interval that is long enough for other circuits in the system to process the
data.

At every rising or falling clock edge, all data kept in the FFs are
available as inputs to other computational and sequential circuitry. Double-edge
triggered FFs are those that capture data on both the leading and trailing edges. FFs that
capture data on only one edge are called single-edge triggered FFs [23-24]. The widely used
D-type FF is known as the delay or data FF (D-FF). This kind of flip-flop captures the
input value at a certain point of the clock cycle (the falling or rising edge). The captured
value appears at the Q output and does not change at other times. D-FFs are used as a delay
line, a zero-order hold or a memory cell.

The D-FFs in integrated circuits can be forced to a set or reset state. The benefit of the
D flip-flop in comparison with the D-type transparent latch is that the signal on the D input
is captured only when the FF is clocked [25]. A subsequent change on the D input is ignored
until the next clock event. An exception is that some FFs have a reset input
that resets Q (to zero) and can be either synchronous or asynchronous with respect to the clock.

Testing of any chip is mandatory to guarantee its functionality after the manufacturing
process. Different testing techniques have been proposed depending on the type of circuit.
Scan-based testing is one of the popular testing techniques for digital circuits [1]. During
the logic synthesis phase of ASIC design, the classical D flip-flop is replaced by a scan
flip-flop as a design-for-testability measure. The logic diagram of the scan flip-flop is
shown in Figure 1. Usually the scan flip-flop is the combination of a multiplexer and a D
flip-flop. These scan flip-flops are
connected as a shift register to pass the test vectors into the circuit.

Figure 1: Block diagram of scan flip-flop
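
The multiplexer-plus-D-flip-flop structure can be sketched behaviorally as follows. The port names (d, scan_in, scan_enable) and the helper shift_in are assumptions made for this illustration and are not taken from the figure.

```python
class ScanFlipFlop:
    """D flip-flop preceded by a 2:1 multiplexer (hypothetical port names)."""

    def __init__(self):
        self.q = 0

    def clock(self, d, scan_in, scan_enable):
        # Test mode selects the scan input; normal mode selects the functional input.
        self.q = scan_in if scan_enable else d
        return self.q


def shift_in(chain, vector):
    """Shift a test vector into a scan chain, one bit per clock."""
    for bit in vector:
        old = [ff.q for ff in chain]  # outputs before this clock edge
        for i, ff in enumerate(chain):
            source = bit if i == 0 else old[i - 1]
            ff.clock(d=0, scan_in=source, scan_enable=1)


chain = [ScanFlipFlop() for _ in range(3)]
shift_in(chain, [1, 0, 1])
print([ff.q for ff in chain])  # [1, 0, 1]

# Capture cycle: with scan_enable low the flip-flop takes the functional input.
chain[1].clock(d=1, scan_in=0, scan_enable=0)
print(chain[1].q)  # 1
```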

A testing cycle consists of a sequence of three cycles: shift-in, capture and
shift-out. During the shift-in and shift-out cycles the circuit remains in test mode, and
during the capture cycle the circuit remains in normal mode. The power consumption during the
shift cycles is directly proportional to the switching activity of the components in the
circuit caused by the serial shifting of test vectors. Zorian showed that the power
dissipation during test mode of an IC is significantly higher than during normal mode.

Different techniques have been proposed to reduce the test power during both the shift cycle
and the capture cycle. A software-based method of reducing test power during the shift cycle
was proposed by Dabholkar [4], in which test vectors are reordered to reduce the number of
transitions in the circuit by 10% to 14%. Kajihara [5] proposed a software-based method that
reduces the switching activity in the circuit by filling each don't-care bit with the value
of its adjacent bit on the left. This method reduces the switching activity by 36% to 47%.
Preferred fill is a software-based power reduction method proposed by Remersaro to reduce the
switching activity of the circuit during the capture cycle. There are also a few
hardware-based methods of reducing test power. Gerstendorfer proposed adding a NOR gate to
the scan cell to hold a constant output value toward the combinational circuit during
scanning. Swarup Bhunia proposed inserting an extra supply gating transistor in the
supply-to-ground path of the first-level gates at the outputs of the scan flip-flops. This
method showed improvements of 62% in area overhead, 101% in power overhead and 94% in delay
overhead. Amit Mishra proposed a modified scan flip-flop for low-power testing in
which the flip-flop disables the slave latch during scan and uses an alternative low-cost
dynamic latch. Many other latch and flip-flop techniques have been
proposed to reduce power and delay during testing.

CHAPTER 2
LITERATURE SURVEY

Power consumption plays a vital role in every VLSI design. Low
power has emerged as a principal theme in today's electronics industry, and low-power VLSI
design has an important role in many electronic systems. When designing
any combinational or sequential circuit, the power consumption, implementation area,
leakage, and efficiency of the circuit are the important parameters to be considered
initially. A low-power flip-flop (FF) design featuring an explicit-type pulse-triggered
structure and a modified true single-phase clock latch based on a signal feed-through scheme
has been presented. The design adopts a signal feed-through technique to improve the delay.
Similar to the SCDFF design, it also employs a static latch structure and a
conditional discharge scheme to avoid superfluous switching at an internal node. The
design solves the long discharging path problem in conventional explicit-type
pulse-triggered FF (P-FF) designs and achieves better speed and power performance.

In low-power digital design, especially in shift registers, flip-flops (FFs) play a
significant role. In shift registers, the power consumption of the system clock is estimated
to be half of the overall system power. Therefore, selecting the right FF is very important
for designing a compact and low-power shift register. One paper reviews
the different FF designs that have been applied to different shift registers (SIPO, PIPO,
SISO and PISO). The connection between FF parameters and shift registers is also discussed.
Each FF architecture is evaluated via its average power, delay and power-delay product.
The comparative study showed that FFs have a great effect on the performance of shift
registers.

Universal shift registers, like all other types of registers, are used in computers as memory
elements. Flip-flops are an inherent building block in universal shift register design. To
achieve universal shift registers that are both high-performance and power-efficient,
careful attention must be paid to the design of the flip-flops. Several fast low-power
flip-flop designs, called pulse-triggered flip-flops (PTFFs), have been analyzed and used to
design universal shift registers.

Another work presents a modified design for an explicit pulse-triggered flip-flop with reduced
transistor count for low-power and high-performance applications. HSPICE simulation results
of a shift register at a frequency of 1 GHz indicate an improvement in power-delay product
with respect to existing pulse-triggered flip-flop configurations using CMOS technology.

A further paper proposes a low-power shift register using double-edge
triggered flip-flops and compares it with existing double-edge triggered flip-flops.
The flip-flops (FFs) in that shift register are designed using the clock branch sharing
implicit pulsed (CBS_ip) scheme. The existing double-edge triggered flip-flops considered are
the transmission-gate latch-MUX, the C2MOS latch-MUX and the dual-edge transmission-gate
pulsed latch (DE-TGPL). The main feature of the clock branch sharing scheme is that it
reduces the number of clocked transistors in the design compared with existing double-edge
triggered flip-flops. Compared with other state-of-the-art double-edge triggered flip-flop
designs, the CBS_ip design has lower power consumption.

The Elementary Concept of Shift Registers:

A shift register is a sequential logic circuit used mainly for
storing digital data. It consists of a group of FFs linked together so
that the output of one is the input of the next. All FFs run from a
common clock and are set or reset simultaneously. In this way each FF passes its stored bit
on to its neighbor. Figure 1 shows the basic movement of
data in a shift register.

Figure 1: Basic data movement in shift registers


CHAPTER 3
Proposed system
Proposed Shift Register

A master-slave flip-flop using two latches, as in Fig. 1(a), can be replaced by a pulsed latch
consisting of a latch and a pulsed clock signal, as in Fig. 1(b) [6]. All pulsed latches share
the pulse generation circuit for the pulsed clock signal. As a result, the area and power
consumption of the pulsed latch become almost half of those of the master-slave flip-flop.

Fig. 1. (a) Master-slave flip-flop. (b) Pulsed latch.

The pulsed latch is an attractive solution for small area and low power consumption. However,
the pulsed latch cannot be used directly in shift registers because of the timing problem
shown in Fig. 2. The shift
register in Fig. 2(a) consists of several latches and a pulsed clock signal (CLK_pulse). The
operation waveforms in Fig. 2(b) show the timing problem in the shift register. The output
signal of the first latch (Q1) changes correctly because the input signal of the first latch
(IN) is constant during the clock pulse width. But the second latch has an uncertain output
signal (Q2) because its input signal (Q1) changes during the clock pulse width.

Fig. 2. Shift register with latches and a pulsed clock signal. (a) Schematic. (b)Waveforms

One solution for the timing problem is to add delay circuits between latches, as shown in
Fig. 3(a). The output signal of each latch is delayed (T_delay) and reaches the next latch
after the clock pulse. As shown in Fig. 3(b), the output signals of the first and second
latches (Q1 and Q2) change during the clock pulse width (T_pulse), but the input signals of
the second and third latches (D2 and D3) become the same as the output signals of the first
and second latches (Q1 and Q2) only after the clock pulse. As a result, all latches have
constant input signals during the clock
pulse and no timing problem occurs between the latches. However, the delay circuits cause
large area and power overheads.

Fig. 3. Shift register with latches, delay circuits, and a pulsed clock signal. (a) Schematic.
(b) Waveforms

Another solution is to use multiple non-overlap delayed pulsed clock signals, as shown
in Fig. 4(a). The delayed pulsed clock signals are generated when a pulsed clock signal goes
through delay circuits. Each latch uses a pulsed clock signal which is delayed from the pulsed
clock signal used in its next latch. Therefore, each latch updates its data after its next
latch has updated its data. As a result, each latch has a constant input during its clock
pulse and no timing problem occurs between latches.

Fig. 4. Shift register with latches and delayed pulsed clock signals. (a) Schematic. (b)
Waveforms.
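
The effect of the pulse ordering can be illustrated with a simplified Python sketch. It is a behavioral abstraction, not a circuit simulation: each latch is opened alone and copies its input instantly, and the function name apply_pulses is hypothetical. Opening the latches first-stage-first is used here only as a crude stand-in for the race that occurs when one shared pulse makes all latches transparent together.

```python
def apply_pulses(state, data_in, order):
    """Open the latches one at a time in the given order; each copies its input."""
    state = list(state)
    for i in order:
        state[i] = data_in if i == 0 else state[i - 1]
    return state


start = [0, 1, 0, 1]  # Q1..Q4 before the shift
n = len(start)

# Last-stage-first pulses (non-overlap delayed scheme): a clean 1-bit shift.
print(apply_pulses(start, 1, range(n - 1, -1, -1)))  # [1, 0, 1, 0]

# First-stage-first order: the new input bit races through every latch.
print(apply_pulses(start, 1, range(n)))              # [1, 1, 1, 1]
```

With the last latch pulsed first, every latch samples a value that its predecessor has not yet overwritten, which is the condition stated above for avoiding the timing problem.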

However, this solution also requires many delay circuits. Fig. 5(a) shows an example of the
proposed shift register. The proposed shift register is divided into sub shift registers to
reduce the number of delayed pulsed clock signals. A 4-bit sub shift register consists of five
latches and performs shift operations with five non-overlap delayed pulsed clock signals
(CLK_pulse<1:4> and CLK_pulse<T>). In the 4-bit sub shift register #1, four latches store
4-bit data (Q1-Q4) and the last latch stores 1-bit temporary data (T1) which will be stored
in the first latch (Q5) of the 4-bit sub shift register #2. Fig. 5(b) shows the operation
waveforms of the proposed shift register. Five non-overlap delayed pulsed clock signals are
generated by the delayed pulsed clock generator in Fig. 6. The sequence of the pulsed clock
signals is in the opposite order of the five latches. Initially, the pulsed clock signal
CLK_pulse<T> updates the latch data T1 from Q4. Then the pulsed clock signals CLK_pulse<1:4>
update the four latch data from Q4 to Q1 sequentially. The latches Q2-Q4 receive data from
their previous latches Q1-Q3, but the first latch Q1 receives data from the input of the
shift register (IN). The
operations of the other sub shift registers are the same as that of the sub shift register #1
except that the first latch receives data from the temporary storage latch in the previous
sub shift register.

Fig. 5. Proposed shift register. (a) Schematic.
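
The shift sequence of the proposed register can be sketched behaviorally in Python. The function name, the data layout and the 8-bit size are assumptions made for illustration; the sketch only reproduces the pulse order described in the text (temporary latches first, then the data latches from last to first) and says nothing about circuit timing.

```python
def shift_once(data, temp, data_in):
    """One shift cycle of the sub-shift-register scheme (behavioral sketch).

    data[j][k] is data latch k of sub shift register j; temp[j] is its
    temporary storage latch. Both lists are updated in place.
    """
    k_bits = len(data[0])
    # Temporary-latch pulse: each T latch copies the last data latch of its sub register.
    for j in range(len(data)):
        temp[j] = data[j][-1]
    # Data-latch pulses, last latch first; all sub registers share each pulse.
    for k in range(k_bits - 1, -1, -1):
        for j in range(len(data)):
            if k > 0:
                data[j][k] = data[j][k - 1]
            elif j == 0:
                data[j][0] = data_in       # first latch of the whole register
            else:
                data[j][0] = temp[j - 1]   # fed by the previous sub register's T latch


N_BITS, K_BITS = 8, 4  # hypothetical sizes: two 4-bit sub shift registers
data = [[0] * K_BITS for _ in range(N_BITS // K_BITS)]
temp = [0] * len(data)

stream = [1, 0, 1, 1, 0, 0, 1, 0]
for bit in stream:
    shift_once(data, temp, bit)

flat = [q for sub in data for q in sub]
print(flat)  # [0, 1, 0, 0, 1, 1, 0, 1], the input stream with the newest bit first
```

Although only K_BITS + 1 distinct pulses are used, the register behaves like an ordinary 8-bit shift register, because each temporary latch holds the old value of the preceding sub register's last latch until the next sub register's first latch is pulsed.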

The proposed shift register reduces the number of delayed pulsed clock signals significantly,
but it increases the number of latches because of the additional temporary storage latches. As
shown in Fig. 6, each pulsed clock signal is generated in a clock-pulse circuit consisting of
a delay circuit and an AND gate. When an N-bit shift register is divided into K-bit sub shift
registers, the number of clock-pulse circuits is K+1 and the number of latches is N+N/K. A
K-bit sub shift register consisting of K+1 latches requires K+1 pulsed clock signals. The
number of sub shift registers becomes N/K, and each sub shift register has a temporary
storage latch. Therefore, N/K latches are added for the temporary storage latches.

The conventional delayed pulsed clock circuits in Fig. 4 can be used to save the AND gates in
the delayed pulsed clock generator in Fig. 6. In the conventional delayed pulsed clock
circuits, the clock pulse width must be larger than the summation of the rising and falling
times of all inverters in the delay circuits to keep the shape of the pulsed clock. However,
in the delayed pulsed clock generator in Fig. 6, the clock pulse width can be shorter than the
summation of the rising and falling times because each sharp pulsed clock signal is generated
from an AND gate and two delayed signals. Therefore, the delayed pulsed clock generator is
suitable for short pulsed clock signals.

Fig. 6. Delayed pulsed clock generator.
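
One way to picture how an AND gate and two delayed signals yield a short pulse is the sampled-signal sketch below. It is an assumption about the gate-level arrangement, not a reading of Fig. 6: pulse i is taken as the clock delayed by i steps ANDed with the inverse of the clock delayed by i+1 steps. The function names, the sample counts and the delay value are hypothetical.

```python
def delayed(signal, steps):
    """Delay a sampled 0/1 signal by a whole number of time steps (zero-filled)."""
    return [0] * steps + signal[:len(signal) - steps]


def pulse_train(clk, n_pulses, delay_steps):
    """pulse[i] = (clk delayed by i*d) AND NOT (clk delayed by (i+1)*d)."""
    pulses = []
    for i in range(n_pulses):
        early = delayed(clk, i * delay_steps)
        late = delayed(clk, (i + 1) * delay_steps)
        pulses.append([a & (1 - b) for a, b in zip(early, late)])
    return pulses


clk = ([1] * 20 + [0] * 20) * 2  # two clock periods of 40 samples each
pulses = pulse_train(clk, n_pulses=5, delay_steps=2)

for i, p in enumerate(pulses):
    print(i, "".join(str(v) for v in p[:12]))

# At every sample at most one pulse is high, so the pulses do not overlap.
print(all(sum(column) <= 1 for column in zip(*pulses)))  # True
```

In this sketch the pulse width equals one delay step and is independent of the clock's high time, which matches the remark above that the generator suits short pulsed clock signals.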

The numbers of latches and clock-pulse circuits change according to the word length of
the sub shift register (K). K is selected by considering the area, power consumption and
speed. The area optimization can be performed as follows. When the circuit areas are
normalized to that of a latch, the areas of a latch and a clock-pulse circuit are 1 and
\(\alpha_A\), respectively. The total area of an N-bit shift register
becomes \(\alpha_A (K+1) + N(1 + 1/K)\). The optimal K for the minimum area,
\(K = \sqrt{N/\alpha_A}\), is obtained from the
first-order differential equation of the total area, \(0 = \alpha_A - N/K^2\).
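
The area trade-off can be explored numerically with the sketch below. It assumes a cost model in which an N-bit register split into K-bit sub shift registers needs K+1 clock-pulse circuits, each of normalized area alpha, plus N + N/K latches of area 1. The symbol names and the value of alpha are hypothetical and chosen only so that the optimum falls on an integer.

```python
import math


def total_area(n_bits, k_bits, alpha):
    """Area in units of one latch: (K+1) clock-pulse circuits plus N + N/K latches."""
    return alpha * (k_bits + 1) + n_bits * (1 + 1 / k_bits)


def optimal_k(n_bits, alpha):
    """Setting d(area)/dK = alpha - N/K^2 to zero gives K = sqrt(N/alpha)."""
    return math.sqrt(n_bits / alpha)


N_BITS = 256
ALPHA = 4.0  # hypothetical clock-pulse circuit area relative to one latch

k_star = optimal_k(N_BITS, ALPHA)
print(k_star)  # 8.0

# Cross-check against a brute-force search over word lengths that divide N.
candidates = [k for k in range(1, N_BITS + 1) if N_BITS % k == 0]
best = min(candidates, key=lambda k: total_area(N_BITS, k, ALPHA))
print(best, total_area(N_BITS, best, ALPHA))  # 8 324.0
```

Area is only one of the three criteria named above; a word length chosen for power or speed may differ from the area optimum.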

CHAPTER 4
VLSI TECHNOLOGY

4.1 HARDWARE REQUIREMENTS

Integrated circuit (IC) technology is the enabling technology for a whole host of innovative
devices and systems that have changed the way we live. Jack Kilby, who together with Robert Noyce is
credited with inventing the integrated circuit, received the 2000 Nobel Prize in Physics for that work
(Noyce had died in 1990); without the integrated circuit, neither
transistors nor computers would be as important as they are today. VLSI systems are much smaller and
consume less power than the discrete components used to build electronic systems before the 1960s.
Integration allows us to build systems with many more transistors, allowing much more computing power
to be applied to solving a problem. Integrated circuits are also much easier to design and manufacture and
are more reliable than discrete systems; that makes it possible to develop special-purpose systems that are
more efficient than general-purpose computers for the task at hand.

4.1.1 APPLICATIONS OF VLSI


Electronic systems now perform a wide variety of tasks in daily life. Electronic systems in some
cases have replaced mechanisms that operated mechanically, hydraulically, or by other means; electronics
are usually smaller, more flexible, and easier to service. In other cases electronic systems have created
totally new applications. Electronic systems perform a variety of tasks, some of them visible, some more
hidden:

• Personal entertainment systems such as portable MP3 players and DVD players perform
sophisticated algorithms with remarkably little energy.
• Electronic systems in cars operate stereo systems and displays; they also control fuel injection
systems, adjust suspensions to varying terrain, and perform the control functions required for
anti-lock braking (ABS) systems.
• Digital electronics compress and decompress video, even at high-definition data rates, on the fly
in consumer electronics.
• Low-cost terminals for Web browsing still require sophisticated electronics, despite their
dedicated function.
• Personal computers and workstations provide word processing, financial analysis, and games.
Computers include both central processing units (CPUs) and special-purpose hardware for disk access,
faster screen display, etc.
• Medical electronic systems measure bodily functions and perform complex processing algorithms
to warn about unusual conditions.

The availability of these complex systems, far from overwhelming
consumers, only creates demand for even more complex systems. The growing sophistication of
applications continually pushes the design and manufacturing of integrated circuits and electronic systems
to new levels of complexity. And perhaps the most amazing characteristic of this collection of systems is
its variety: as systems become more complex, we build not a few general-purpose computers but an ever
wider range of special-purpose systems. Our ability to do so is a testament to our growing mastery of both
integrated circuit manufacturing and design, but the increasing demands of customers continue to test the
limits of design and manufacturing.

4.2 ADVANTAGES OF VLSI


While we will concentrate on integrated circuits in this book, the properties of integrated circuits
(what we can and cannot efficiently put in an integrated circuit) largely determine the architecture of the
entire system. Integrated circuits improve system characteristics in several critical ways. ICs have three
key advantages over digital circuits built from discrete components:

Size. Integrated circuits are much smaller: both transistors and wires are shrunk to micrometer sizes,
compared to the millimeter or centimeter scales of discrete components. Small size leads to advantages in
speed and power consumption, since smaller components have smaller parasitic resistances, capacitances,
and inductances.

Speed. Signals can be switched between logic 0 and logic 1 much quicker within a chip than they can
between chips. Communication within a chip can occur hundreds of times faster than communication
between chips on a printed circuit board. The high speed of circuits on-chip is due to their small size:
smaller components and wires have smaller parasitic capacitances to slow down the signal.

Power consumption. Logic operations within a chip also take much less power. Once again, lower
power consumption is largely due to the small size of circuits on the chip: smaller parasitic capacitances
and resistances require less power to drive them.

4.3 VLSI AND SYSTEMS


These advantages of integrated circuits translate into advantages at the system level:

Smaller physical size. Smallness is often an advantage in itself: consider portable televisions or
handheld cellular telephones.

Lower power consumption. Replacing a handful of standard parts with a single chip reduces total
power consumption. Reducing power consumption has a ripple effect on the rest of the system: a smaller,
cheaper power supply can be used; since less power consumption means less heat, a fan may no longer be
necessary; a simpler cabinet with less electromagnetic shielding may be feasible, too.

Reduced cost. Reducing the number of components, the power supply requirements, cabinet costs, and
so on, will inevitably reduce system cost. The ripple effect of integration is such that the cost of a system
built from custom ICs can be less, even though the individual ICs cost more than the standard parts they
replace. Understanding why integrated circuit technology has such profound influence on the design of
digital systems requires understanding both the technology of IC manufacturing and the economics of ICs
and digital systems.

4.4 INTEGRATED CIRCUIT MANUFACTURING


Integrated circuit technology is based on our ability to manufacture huge numbers of very small
devices: today, more transistors are manufactured in California each year than raindrops fall on the state.
In this section, we briefly survey VLSI manufacturing.

4.4.1 TECHNOLOGY
Most manufacturing processes are fairly tightly coupled to the item they are manufacturing. An
assembly line built to produce Buicks, for example, would have to undergo moderate reorganization to
build Chevys: tools like sheet metal molds would have to be replaced, and even some machines would
have to be modified. And either assembly line would be far removed from what is required to produce
electric drills.

4.4.2 MASK-DRIVEN MANUFACTURING

Integrated circuit manufacturing technology, on the other hand, is remarkably versatile. While
there are several manufacturing processes for different circuit typesCMOS, bipolar, etc.a
manufacturing line can make any circuit of that type simply by changing a few basic tools called masks.
For example, a single CMOS manufacturing plant can make both microprocessors and microwave oven
controllers by changing the masks that form the patterns of wires and transistors on the chips. Silicon
wafers are the raw material of IC manufacturing.

The fabrication process forms patterns on the wafer that create wires and transistors. A series of
identical chips is patterned onto the wafer (with some space reserved for test circuit structures, which
allow manufacturing to measure the results of the manufacturing process).

The IC manufacturing process is efficient because we can produce many identical chips by
processing a single wafer. By changing the masks that determine what patterns are laid down on the chip,
we determine the digital circuit that will be created. The IC fabrication line is a generic manufacturing
line: we can quickly retool the line to make large quantities of a new kind of chip, using the same
processing steps used for the line's previous product.

4.4.3 CIRCUITS AND LAYOUTS

We could build a breadboard circuit out of standard parts. To build it on an IC fabrication line, we
must go one step further and design the layout, or patterns on the masks. The rectangular shapes in the
layout (shown here as a sketch called a stick diagram) form transistors and wires which conform to the
circuit in the schematic. Creating layouts is very time-consuming and very important: the size of the
layout determines the cost to manufacture the circuit, and the shapes of elements in the layout determine
the speed of the circuit as well.

During manufacturing, a photolithographic (photographic printing) process is used to transfer the
layout patterns from the masks to the wafer. The patterns left by the mask are used to selectively change
the wafer: impurities are added at selected locations in the wafer; insulating and conducting materials are
added on top of the wafer as well.

4.5 MANUFACTURING DEFECTS

Because no manufacturing process is perfect, some of the chips on the wafer may not work. Since
at least one defect is almost sure to occur on each wafer, wafers are cut into smaller, working chips; the
largest chip that can be reasonably manufactured today is 1.5 to 2 cm on a side, while wafers are
moving from 30 to 45 cm in diameter. Each chip is individually tested; the ones that pass the test are saved after the
wafer is diced into chips. The working chips are placed in the packages familiar to digital designers. In
some packages, tiny wires connect the chip to the package's pins while the package body protects the chip
from handling and the elements; in others, solder bumps directly connect the chip to the package.

Integrated circuit manufacturing is a powerful technology for two reasons: all circuits can be
made out of a few types of transistors and wires; and any combination of wires and transistors can be built
on a single fabrication line just by changing the masks that determine the pattern of components on the
chip. Integrated circuits run very fast because the circuits are very small. Just as important, we are not
stuck building a few standard chip types; we can build any function we want. The flexibility given by IC
manufacturing lets us build faster, more complex digital systems in ever greater variety.

4.5.1 ECONOMICS

Because integrated circuit manufacturing has so much leverage (a great number of parts can be
built with a few standard manufacturing procedures), a great deal of effort has gone into improving IC
manufacturing. However, as chips become more complex, the cost of designing a chip goes up and
becomes a major part of the overall cost of the chip.

Moore's Law

In the 1960s Gordon Moore predicted that the number of transistors that could be manufactured on
a chip would grow exponentially. His prediction, now known as Moore's Law, was remarkably prescient.
Moore's ultimate prediction was that transistor count would double every two years, an estimate that has
held up remarkably well. Today, an industry group maintains the International Technology Roadmap for
Semiconductors (ITRS), which maps out strategies to maintain the pace of Moore's Law.
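
The doubling rule can be written as N(t) = N0 * 2^(t/2), with t in years. A minimal sketch in Python follows; the function name, the starting count, and the time span are illustrative assumptions, not figures from this report.

```python
def projected_transistors(initial_count, years, doubling_period_years=2.0):
    """Project a transistor count assuming one doubling per doubling period."""
    return initial_count * 2 ** (years / doubling_period_years)

# Hypothetical starting point: 1,000 transistors, projected 10 years ahead.
# Ten years is five doubling periods, so the count grows by 2**5 = 32.
print(projected_transistors(1_000, 10))  # 32000.0
```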

4.6 TERMINOLOGY

The most basic parameter associated with a manufacturing process is the minimum channel
length of a transistor. (In this book, for example, we use a technology that can
manufacture 180 nm transistors.) A manufacturing technology at a particular channel length is called a
technology node. We often refer to a family of technologies at similar feature sizes: micron, submicron,
deep submicron, and now nanometer technologies. The term nanometer technology is generally used for
technologies below 100 nm.

4.6.1 COST OF MANUFACTURING

IC manufacturing plants are extremely expensive. A single plant costs as much as $4 billion. Given
that a new, state-of-the-art manufacturing process is developed every three years, that is a sizeable
investment. The investment makes sense because a single plant can manufacture so many chips and can
easily be switched to manufacture different types of chips. In the early years of the integrated circuits
business, companies focused on building large quantities of a few standard parts. These parts are
commodities: one 80 ns, 256 Mb dynamic RAM is more or less the same as any other, regardless of the
manufacturer. Companies concentrated on commodity parts in part because manufacturing processes were
less well understood and manufacturing variations are easier to keep track of when the same part is being
fabricated day after day.

Standard parts also made sense because designing integrated circuits was hard: not only the circuit, but
the layout had to be designed, and there were few computer programs to help automate the design
process.

4.6.2 COST OF DESIGN

One of the less fortunate consequences of Moore's Law is that the time and money required to design a
chip go up steadily. The cost of designing a chip comes from several factors:

Skilled designers are required to specify, architect, and implement the chip. A design team may range
from a half-dozen people for a very small chip to 500 people for a large, high-performance
microprocessor.

These designers cannot work without access to a wide range of computer-aided design (CAD) tools.
These tools synthesize logic, create layouts, simulate, and verify designs. CAD tools are generally
licensed and you must pay a yearly fee to maintain the license. A license for a single copy of one tool,
such as logic synthesis, may cost as much as $50,000 US.

The CAD tools require a large compute farm on which to run. During the most intensive part of the
design process, the design team will keep dozens of computers running continuously for weeks or
months.

A large ASIC, which contains millions of transistors but is not fabricated on the state-of-the-art process,
can easily cost $20 million US and as much as $100 million. Designing a large microprocessor costs
hundreds of millions of dollars.

4.7 DESIGN COSTS AND IP

We can spread these design costs over more chips if we can reuse all or part of the design in other
chips. The high cost of design is the primary motivation for the rise of IP-based design, which creates
modules that can be reused in many different designs.

4.7.1 TYPES OF CHIPS

The preponderance of standard parts pushed the problems of building customized systems back to the
board-level designers who used the standard parts. Since a function built from standard parts usually
requires more components than if the function were built with custom designed ICs, designers tended to
build smaller, simpler systems. The industrial trend, however, is to make available a wider variety of
integrated circuits. The greater diversity of chips includes:

4.7.2 MORE SPECIALIZED STANDARD PARTS

In the 1960s, standard parts were logic gates; in the 1970s they were LSI components. Today,
standard parts include fairly specialized components: communication network interfaces, graphics
accelerators, floating point processors. All these parts are more specialized than microprocessors but are
used in enough volume that designing special-purpose chips is worth the effort. In fact, putting a
complex, high-performance function on a single chip often makes other applications possible.

4.7.3 APPLICATION-SPECIFIC INTEGRATED CIRCUITS (ASICs)

Rather than build a system out of standard parts, designers can now create a single chip for their
particular application. Because the chip is specialized, the functions of several standard parts can often be
squeezed into a single chip, reducing system size, power, heat, and cost. Application-specific ICs are
possible because of computer tools that help humans design chips much more quickly.

4.7.4 SYSTEMS-ON-CHIPS (SoCs)

Fabrication technology has advanced to the point that we can put a complete system on a single
chip. For example, a single-chip computer can include a CPU, bus, I/O devices, and memory. SoCs allow
systems to be made at much lower cost than the equivalent board-level system. SoCs can also be higher
performance and lower power than board-level equivalents because on-chip connections are more
efficient than chip-to-chip connections. A wider variety of chips is now available in part because
fabrication methods are better understood and more reliable. More importantly, as the number of
transistors per chip grows, it becomes easier and cheaper to design special-purpose ICs. When only a few
transistors could be put on a chip, careful design was required to ensure that even modest functions
could be put on a single chip.

Today's VLSI manufacturing processes, which can put millions of carefully-designed transistors
on a chip, can also be used to put tens of thousands of less-carefully designed transistors on a chip. Even
though the chip could be made smaller or faster with more design effort, the advantages of having a
single-chip implementation of a function that can be quickly designed often outweigh the lost potential
performance. The problem and the challenge of the ability to manufacture such large chips is design: the
ability to make effective use of the millions of transistors on a chip to perform a useful function.

4.7.5 CMOS TECHNOLOGY

CMOS is the dominant integrated circuit technology. In this section we will introduce some basic
concepts of CMOS to understand why it is so widespread and some of the challenges introduced by the
inherent characteristics of CMOS.

4.8 POWER CONSUMPTION

Power Consumption Constraints

The huge chips that can be fabricated today are possible only because of the relatively tiny power
consumption of CMOS circuits. Power consumption is critical at the chip level because much of the
power is dissipated as heat, and chips have limited heat dissipation capacity. Even if the system in which a
chip is placed can supply large amounts of power, most chips are packaged to dissipate fewer than 10 to
15 Watts of power before they suffer permanent damage (though some chips dissipate well over 50 Watts
thanks to special packaging).

The power consumption of a logic circuit can, in the worst case, limit the number of transistors we
can effectively put on a single chip. Limiting the number of transistors per chip changes system design in
several ways. Most obviously, it increases the physical size of a system. Using high-powered circuits also
increases power supply and cooling requirements. A more subtle effect is caused by the fact that the time
required to transmit a signal between chips is much larger than the time required to send the same signal
between two transistors on the same chip; as a result, some of the advantage of using a higher-speed
circuit family is lost.

Another subtle effect of decreasing the level of integration is that the electrical design of multi-
chip systems is more complex: microscopic wires on-chip exhibit parasitic resistance and capacitance,
while macroscopic wires between chips have capacitance and inductance, which can cause a number of
ringing effects that are much harder to analyze. The close relationship between power consumption and
heat makes low-power design techniques important knowledge for every CMOS designer.

Of course, low-energy design is especially important in battery-operated systems like cellular telephones.
Energy must be saved by avoiding unnecessary work. We will see throughout the rest of this
book that minimizing power and energy consumption requires careful attention to detail at every level of
abstraction, from system architecture down to layout. As CMOS features become smaller, additional
power consumption mechanisms come into play. Traditional CMOS consumes power when signals
change but consumes only negligible power when idle. In modern CMOS, leakage mechanisms start to
drain current even when signals are idle.

4.8.1 DESIGN AND TESTABILITY

Design Verification

Our ability to build large chips of unlimited variety introduces the problem of checking whether
those chips have been manufactured correctly. Designers accept the need to verify or validate their
designs to make sure that the circuits perform the specified function. (Some people use the terms
verification and validation interchangeably; a finer distinction reserves verification for formal proofs of
correctness, leaving validation to mean any technique which increases confidence in correctness, such as
simulation.) Chip designs are simulated to ensure that the chip's circuits compute the proper functions for a
sequence of inputs chosen to exercise the chip. But each chip that comes off the
manufacturing line must also undergo manufacturing test.

Manufacturing test

The chip must be exercised to demonstrate that no manufacturing defects rendered the chip
useless. Because IC manufacturing tends to introduce certain types of defects and because we want to
minimize the time required to test each chip, we can't just use the input sequences created for design
verification to perform manufacturing test. Each chip must be designed to be fully and easily testable.
Finding out that a chip is bad only after you have plugged it into a system is annoying at best and
dangerous at worst. Customers are unlikely to keep using manufacturers who regularly supply bad chips.
Defects introduced during manufacturing range from the catastrophic (contamination that destroys every
transistor on the wafer) to the subtle (a single broken wire or a crystalline defect that kills only one
transistor).

While some bad chips can be found very easily, each chip must be thoroughly tested to find even
subtle flaws that produce erroneous results only occasionally. Tests designed to exercise functionality and
expose design bugs don't always uncover manufacturing defects.

We use fault models to identify potential manufacturing problems and determine how they affect
the chip's operation. The most common fault model is stuck-at-0/1: the defect causes a logic gate's output
to be always 0 (or 1), independent of the gate's input values. We can often determine whether a logic
gate's output is stuck even if we can't directly observe its outputs or control its inputs. We can generate a
good
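
To illustrate the stuck-at model, the sketch below searches for input vectors that expose a stuck-at fault on an internal node of a small, hypothetical circuit y = (a AND b) OR c. The circuit, the function names, and the exhaustive search are assumptions chosen for illustration; practical test generation uses more scalable methods.

```python
from itertools import product

def circuit(a, b, c, stuck=None):
    """y = (a AND b) OR c; `stuck` forces the AND gate output to 0 or 1."""
    and_out = a & b
    if stuck is not None:
        and_out = stuck  # model a stuck-at-0 or stuck-at-1 defect
    return and_out | c

def detecting_vectors(stuck):
    """Return input vectors whose output differs between good and faulty circuits."""
    return [v for v in product((0, 1), repeat=3)
            if circuit(*v) != circuit(*v, stuck=stuck)]

print(detecting_vectors(0))  # [(1, 1, 0)]
print(detecting_vectors(1))  # [(0, 0, 0), (0, 1, 0), (1, 0, 0)]
```

Only the primary output y is compared, which mirrors the point that a stuck internal node can be detected without observing it directly.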

4.8.2 TESTABILITY AS A DESIGN PROCESS

Unfortunately, not all chip designs are equally testable. Some faults may require long input
sequences to expose; other faults may not be testable at all, even though they cause chip malfunctions that
aren't covered by the fault model. Traditionally, chip designers have ignored testability problems, leaving
them to a separate test engineer who must find a set of inputs to adequately test the chip. If the test
engineer can't change the chip design to fix testability problems, his or her job becomes both difficult and
unpleasant. The result is often poorly tested chips whose manufacturing problems are found only after the
customer has plugged them into a system.

Companies now recognize that the only way to deliver high-quality chips to customers is to make
the chip designer responsible for testing, just as the designer is responsible for making the chip run at the
required speed. Testability problems can often be fixed easily early in the design process at relatively little
cost in area and performance. But modern designers must understand testability requirements, analysis
techniques which identify hard-to-test sections of the design, and design techniques which improve
testability.

4.8.3 RELIABILITY

Reliability Is A Lifetime Problem

Earlier generations of VLSI technology were robust enough that testing chips at manufacturing
time was sufficient to identify working parts: a chip either worked or it didn't. In today's nanometer-scale
technologies, the problem of determining whether a chip works is more complex. A number of
mechanisms can cause transient failures that cause occasional problems but are not repeatable. Some
other failure mechanisms, like overheating, cause permanent failures but only after the chip has operated
for some time. And more complex manufacturing problems cause failures that are harder to diagnose
and may affect performance rather than functionality.

4.8.4 DESIGN-FOR-MANUFACTURABILITY

A number of techniques, referred to as design-for-manufacturability or design-for-yield, are in use
today to improve the reliability of chips that come off the manufacturing line. We can make chips more
reliable by designing circuits and architectures that reduce design stresses and check for problems. For
example, heat is one major cause of chip failure. Proper power management circuitry can reduce the
chip's heat dissipation and reduce the damage caused by overheating. We also need to change the way we
design chips. Some of the convenient levels of abstraction that served us well in earlier technologies are
no longer entirely appropriate in nanometer technologies. We need to check more thoroughly and be
willing to solve reliability problems by modifying design decisions made earlier.

4.9 INTEGRATED CIRCUIT DESIGN TECHNIQUES

To make use of the flood of transistors given to us by Moore's Law, we must design large, complex chips
quickly. The obstacle to making large chips work correctly is complexity: many interesting ideas for
chips have died in the swamp of details that must be made correct before the chip actually works.
Integrated circuit design is hard because designers must juggle several different problems:

Multiple Levels Of Abstraction

IC design requires refining an idea through many levels of detail. Starting from a specification of
what the chip must do, the designer must create an architecture which performs the required function,
expand the architecture into a logic design, and further expand the logic design into a layout like the one
in Figure 1-2. As you will learn by the end of this book, the specification-to-layout design process is a lot
of work.

Multiple And Conflicting Costs

In addition to drawing a design through many levels of detail, the designer must also take into
account costs: not dollar costs, but criteria by which the quality of the design is judged. One critical cost
is the speed at which the chip runs.

Two architectures that execute the same function (multiplication, for example) may run at very
different speeds. We will see that chip area is another critical design cost: the cost of manufacturing a
chip is exponentially related to its area, and chips much larger than 1 cm² cannot be manufactured at all.

Furthermore, if multiple cost criteria, such as area and speed requirements, must be satisfied,
many design decisions will improve one cost metric at the expense of the other. Design is dominated by
the process of balancing conflicting constraints.

Short Design Time

In an ideal world, a designer would have time to contemplate the effect of a design decision. We
do not, however, live in an ideal world. Chips which appear too late may make little or no money because
competitors have snatched market share. Therefore, designers are under pressure to design chips as
quickly as possible. Design time is especially tight in application-specific IC design, where only a few
weeks may be available to turn a concept into a working ASIC.

4.9.1 FIELD-PROGRAMMABLE GATE ARRAYS (FPGA)

A field-programmable gate array (FPGA) is a block of programmable logic that can implement
multi-level logic functions. FPGAs are most commonly used as separate commodity chips that can be
programmed to implement large functions. However, small blocks of FPGA logic can be useful
components on-chip to allow the user of the chip to customize part of the chip's logical function. An
FPGA block must implement both combinational logic functions and interconnect to be able to construct
multi-level logic functions. There are several different technologies for programming FPGAs, but most
logic processes are unlikely to implement anti-fuses or similar hard programming technologies, so we will
concentrate on SRAM-programmed FPGAs.

4.9.2 LOOKUP TABLES

The basic method used to build a combinational logic block (CLB), also called a logic element,
in an SRAM-based FPGA is the lookup table (LUT). As shown in Fig 4.1, the lookup table is an SRAM
that is used to implement a truth table. Each address in the SRAM represents a combination of inputs to
the logic element. The value stored at that address represents the value of the function for that input
combination. An n-input function requires an SRAM with 2^n locations.
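
The truth-table view of a lookup table can be sketched in Python as follows. This is a behavioural illustration only, with hypothetical helper names (`make_lut`, `lut_read`); it says nothing about how a particular FPGA lays out its SRAM bits.

```python
def make_lut(func, n):
    """Fill a 2**n-entry table; each address encodes one input combination."""
    table = []
    for address in range(2 ** n):
        bits = [(address >> i) & 1 for i in range(n)]  # bit i is input i
        table.append(func(*bits))
    return table

def lut_read(table, inputs):
    """Look up the output: the inputs together form the SRAM address."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return table[address]

xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
print(len(xor4))                     # 16
print(lut_read(xor4, [1, 0, 1, 1]))  # 1
```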

Fig 4.1 Lookup Tables

Because a basic SRAM is not clocked, the lookup table logic element operates much as any other logic
gate: as its inputs change, its output changes after some delay.

4.9.3 PROGRAMMING A LOOKUP TABLE

Unlike a typical logic gate, the function represented by the logic element can be changed by changing the
values of the bits stored in the SRAM. As a result, the n-input logic element can represent 2^(2^n) functions
(though some of these functions are permutations of each other).
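
Since an n-input table holds 2^n bits and each bit can be set independently, there are 2^(2^n) possible bit patterns. The sketch below, using hypothetical names and a 2-input element for brevity, counts the patterns and shows the same read logic producing different functions after the stored bits change.

```python
def count_functions(n):
    """An n-input table has 2**n bits, so it can hold 2**(2**n) bit patterns."""
    return 2 ** (2 ** n)

def lut_read(table_bits, a, b):
    """Read a 2-input lookup table; inputs a and b form the address."""
    return table_bits[(b << 1) | a]

and_bits = [0, 0, 0, 1]   # programmed as AND
xor_bits = [0, 1, 1, 0]   # same element reprogrammed as XOR

print(count_functions(2), count_functions(4))              # 16 65536
print(lut_read(and_bits, 1, 1), lut_read(xor_bits, 1, 1))  # 1 0
```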

Fig 4.2 Programming A Lookup Table

A typical logic element has four inputs. The delay through the lookup table is independent of the
bits stored in the SRAM, so the delay through the logic element is the same for all functions. This means
that, for example, a lookup table-based logic element will exhibit the same delay for a 4-input XOR and a
4-input NAND. In contrast, a 4-input XOR built with static CMOS logic is considerably slower than a 4-
input NAND. Of course, the static logic gate is generally faster than the logic element. Logic elements
generally contain registers (flip-flops and latches) as well as combinational logic. A flip-flop or latch is
small compared to the combinational logic element (in sharp contrast to the situation in custom VLSI), so
it makes sense to add it to the combinational logic element. Using a separate cell for the memory element
would simply take up routing resources. The memory element is connected to the output; whether it
stores a given value is controlled by its clock and enable inputs.

4.9.4 COMPLEX LOGIC ELEMENT

Many FPGAs also incorporate specialized adder logic in the logic element. The critical
component of an adder is the carry chain, which can be implemented much more efficiently in specialized
logic than it can using standard lookup table techniques. The wiring channels that connect to the logic
element's inputs and outputs also need to be programmable. A wiring channel has a number of
programmable connections such that each input or output generally can be connected to any one of
several different wires in the channel.

4.9.5 PROGRAMMABLE INTERCONNECTION POINTS

Fig 4.3 shows a simple version of an interconnection point, often known as a connection box.

Fig 4.3 Programming A Lookup Table

A programmable connection between two wires is made by a CMOS transistor (a pass transistor). The
pass transistor's gate is controlled by a static memory program bit (shown here as a D register). When the
pass transistor's gate is high, the transistor conducts and connects the two wires; when the gate is low, the
transistor is off and the two wires are not connected.
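
The behaviour of one programmable connection can be sketched in Python as below. The class and method names are hypothetical, and the model is purely logical: it ignores the electrical effects of a real pass transistor.

```python
class ConnectionPoint:
    """One programmable connection: a pass transistor gated by a program bit."""

    def __init__(self, program_bit=0):
        self.program_bit = program_bit  # value held in the static memory bit

    def propagate(self, wire_a_value):
        """Return the value seen on wire B, or None when the wires are disconnected."""
        return wire_a_value if self.program_bit else None

point = ConnectionPoint(program_bit=1)
print(point.propagate(1))  # 1
point.program_bit = 0      # reprogram: transistor off
print(point.propagate(1))  # None
```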

CHAPTER-5

TOOLS USED
5.1 SOFTWARE REQUIREMENTS

Verification Tool: ModelSim 6.4b
Synthesis Tool: Xilinx ISE 10.1

5.2 INTRODUCTION TO MODELSIM

ModelSim/VHDL, ModelSim/VLOG, ModelSim/LNL, and ModelSim/PLUS are produced by Model
Technology Incorporated.

ModelSim is a useful tool that allows you to stimulate the inputs of your modules and view both outputs
and internal signals. It allows you to do both behavioural and timing simulation; however, this document
will focus on behavioural simulation. Keep in mind that these simulations are based on models and thus
the results are only as accurate as the constituent models.

5.3 STANDARDS SUPPORTED

ModelSim VHDL supports both the IEEE 1076-1987 and 1076-1993 VHDL, the 1164-1993 Standard
Multivalue Logic System for VHDL Interoperability, and the 1076.2-1996 Standard VHDL Mathematical
Packages standards. Any design developed with ModelSim will be compatible with any other VHDL
system that is compliant with either IEEE Standard 1076-1987 or 1076-1993. ModelSim Verilog is based
on IEEE Std 1364-1995 and a partial implementation of 1364-2001, Standard Hardware Description
Language Based on the Verilog Hardware Description Language. The Open Verilog International Verilog
LRM version 2.0 is also applicable to a large extent. Both PLI (Programming Language Interface) and
VCD (Value Change Dump) are supported for ModelSim PE and SE users.

5.3.1 MODELSIM

Basic Steps For Simulation

This section provides further detail related to each step in the process of simulating your design using
ModelSim.

Step 1 - Collecting Files And Mapping Libraries

Files needed to run ModelSim on your design:

design files (VHDL, Verilog, and/or SystemC), including stimulus for the design
libraries, both working and resource
modelsim.ini (automatically created by the library mapping command)

Providing Stimulus To The Design

You can provide stimulus to your design in several ways:

Language based testbench
Tcl-based ModelSim interactive command, force
VCD files / commands (see "Using extended VCD as stimulus" (UM-458))
3rd party test bench generation tools

A Library In Modelsim

A library is a location where data to be used for simulation is stored. Libraries are ModelSim's way of
managing the creation of data before it is needed for use in simulation. They also serve as a way to
streamline simulation invocation. Instead of compiling all design data each and every time you simulate,
ModelSim uses binary pre-compiled data from these libraries. So, if you make a change to a single
Verilog module, only that module is recompiled, rather than all modules in the design.

Working And Resource Libraries

Design libraries can be used in two ways: 1) as a local working library that contains the compiled version
of your design; 2) as a resource library. The contents of your working library will change as you update
your design and recompile. A resource library is typically unchanging, and serves as a parts source for
your design. Examples of resource libraries might be: shared information within your group, vendor
libraries, packages, or previously compiled elements of your own working design. You can create your
own resource libraries, or they may be supplied by another design team or a third party (e.g., a silicon
vendor). For more information on resource libraries and working libraries, see "Working library versus
resource libraries", "Managing library contents", "Working with design libraries", and "Specifying the
resource libraries".

Creating The Logical Library - vlib

Before you can compile your source files, you must create a library in which to store the compilation
results. You can create the logical library using the GUI, using File > New > Library (see "Creating a
library"), or you can use the vlib command. For example, the command:

vlib work

creates a library named work. By default, compilation results are stored in the work library.

Mapping The Logical Work To The Physical Work Directory - vmap

VHDL uses logical library names that can be mapped to ModelSim library directories. If libraries are not
mapped properly, and you invoke your simulation, necessary components will not be loaded and
simulation will fail. Similarly, compilation can also depend on proper library mapping. By default,
ModelSim can find libraries in your current directory (assuming they have the right name), but for it to
find libraries located elsewhere, you need to map a logical library name to the pathname of the library.
You can use the GUI ("Library mappings with the GUI"), a command ("Library mappings with the GUI"),
or a project ("Getting started with projects") to assign a logical name to a design library.

The format for command line entry is:

vmap <logical_name> <directory_pathname>

This command sets the mapping between a logical library name and a directory.
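
The vlib and vmap command lines described above can be assembled for any library name. The Python sketch below only builds the command strings and does not invoke ModelSim; the function name and the use of `work` for both the logical name and the directory are illustrative assumptions.

```python
def library_setup_commands(logical_name, directory):
    """Build the vlib/vmap command lines for one design library."""
    return [
        f"vlib {directory}",                 # create the physical library directory
        f"vmap {logical_name} {directory}",  # map the logical name to that directory
    ]

for command in library_setup_commands("work", "work"):
    print(command)
# vlib work
# vmap work work
```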

Step 2 - Compiling the design with vlog/vcom/sccom

Designs are compiled with one of the three language compilers.

Compiling Verilog - vlog

ModelSim's compiler for the Verilog modules in your design is vlog. Verilog files may be compiled in
any order, as they are not order dependent. See "Compiling Verilog files" for details.

Verilog portions of the design can be optimized for better simulation performance; see
"Optimizing Verilog designs".

Compiling VHDL - vcom

ModelSim's compiler for VHDL design units is vcom. VHDL files must be compiled in the order required by the
dependencies of the design. Projects may assist you in determining the compile order: for more
information, see "Auto-generating compile order" (UM-46). See "Compiling VHDL files" (UM-73) for
details on VHDL compilation.

Compiling SystemC - sccom

ModelSim's compiler for SystemC design units is sccom, and is used only if you have SystemC
components in your design. See "Compiling SystemC files" for details.
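
Choosing among the three compilers can be sketched as a lookup on the source file's extension. In the Python sketch below, the extension-to-language mapping and the function name are assumptions for illustration, and each compiler may need further options that are not shown.

```python
from pathlib import Path

# Assumed file-extension conventions; adjust to match the project's naming.
COMPILER_BY_SUFFIX = {
    ".v": "vlog",     # Verilog
    ".vhd": "vcom",   # VHDL
    ".vhdl": "vcom",  # VHDL
    ".cpp": "sccom",  # SystemC
}

def compile_command(source_file):
    """Return the basic compile command line for one source file."""
    suffix = Path(source_file).suffix.lower()
    if suffix not in COMPILER_BY_SUFFIX:
        raise ValueError(f"no compiler known for {source_file}")
    return f"{COMPILER_BY_SUFFIX[suffix]} {source_file}"

print(compile_command("FA.vhd"))       # vcom FA.vhd
print(compile_command("testbench.v"))  # vlog testbench.v
```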

Step 3 - Loading the design for simulation

vsim <top>

Your design is ready for simulation after it has been compiled and (optionally) optimized with vopt. For
more information on optimization, see "Optimizing Verilog designs". You may then invoke vsim with the
names of the top-level modules (many designs contain only one top-level module) or the name you
assigned to the optimized version of the design.

For example, if your top-level modules are "testbench" and "globals", then invoke the simulator as
follows:

vsim testbench globals

After the simulator loads the top-level modules, it iteratively loads the instantiated modules and UDPs in
the design hierarchy, linking the design together by connecting the ports and resolving hierarchical
references.

Using SDF

You can incorporate actual delay values to the simulation by applying SDF back annotation
files to the design. For more information on how SDF is used in the design, see "Specifying SDF files for
simulation".

Step 4 - Simulating the design

Once the design has been successfully loaded, the simulation time is set to zero, and you must enter a
run command to begin simulation. For more information, see "Verilog simulation", "VHDL simulation",
and "SystemC simulation".

The basic simulator commands are:

add wave

force

bp

run

step

next

Step 5 - Debugging the design

Numerous tools and windows useful in debugging your design are available from the ModelSim GUI. For
more information, see "Waveform analysis" (UM-237), "PSL Assertions" and "Tracing signals with the
Dataflow window". In addition, several basic simulation commands are available from the command line
to assist you in debugging your design:

describe

drivers

examine

force

log

checkpoint

restore

show

5.3.2 MODELSIM BASICS

On the left side of the interface, under the Project tab, is the frame listing the files that pertain
to the opened project. The Library frame lists the entities of the project that have been compiled.
To the right is the ModelSim shell frame. It is an extension of MS-DOS, so both ModelSim and MS-DOS
commands can be executed.

A. Creating a New Project

Once ModelSim has been started, create a new project:

File > New > Project

Name the project FA, set the project location to F:/VHDL, and click OK.

A new window should appear to add new files to the project. Choose Create New File.

Enter F:\VHDL\FA.vhd as the file name and click OK.

Then close the add new files window.

Additional files can be added later by choosing from the menu: Project > Add File to Project

B. Editing Source Files

Double click the FA.vhd source found under the Workspace window's Project tab. This will open up an
empty text editor configured to highlight VHDL syntax. Copy the source found at the end of
this document. After writing the code for the entity and its architecture, save and close the source file.

C. Compiling projects

Select the file from the project files list frame and right click on it. Select Compile to compile just
this file, or Compile All to compile all the files in the current project. If there are errors within the
code or the project, a red failure message will be displayed. Double click a red message for more detail
on the error. If all is well, no red warning or error messages will be displayed.

II. Simulating with ModelSim

To simulate, first the entity design has to be loaded into the simulator. Do this by selecting from the menu:

Simulate > Simulate

A new window will appear listing all the entities (not filenames) that are in the work library. Select FA
entity for simulation and click OK.

It is often necessary to create entities with multiple architectures. In this case the architecture
has to be specified for the simulation. Expand the tree for the entity, select the architecture to be
simulated, and then click OK.

Creating test files for the simulator

After the design is loaded, clear any previous data and restart the timer from the prompt. Then open
the signal list from the menu:

View > Signals

A new window will be displayed listing the design entity's signals and their initial values (shown
below). Items in the waveform and listing appear in the same order in which they are declared
in the code. To display the waveform, select the signals to display (hold Ctrl
and click to select multiple signals) and from the signal list window menu select:

Add > Wave > Selected signals

5.4 XILINX ISE

5.4.1 INTRODUCTION

The Spartan-3 family of Field-Programmable Gate Arrays is specifically designed to meet the
needs of high volume, cost-sensitive consumer electronic applications. The eight-member family offers
densities ranging from 50,000 to five million system gates. The Spartan-3 family builds on the success of
the earlier Spartan-IIE family by increasing the amount of logic resources, the capacity of internal RAM,
the total number of I/Os, and the overall level of performance as well as by improving clock management
functions. Numerous enhancements derive from the Virtex-II platform technology. These Spartan-3
FPGA enhancements, combined with advanced process technology, deliver more functionality and
bandwidth per dollar than was previously possible, setting new standards in the programmable logic
industry.

Because of their exceptionally low cost, Spartan-3 FPGAs are ideally suited to a wide range of
consumer electronics applications, including broadband access, home networking, display/projection and
digital television equipment. The Spartan-3 family is a superior alternative to mask programmed ASICs.
FPGAs avoid the high initial cost, the lengthy development cycles, and the inherent inflexibility of
conventional ASICs. Also, FPGA programmability permits design upgrades in the field with no
hardware replacement necessary, an impossibility with ASICs.

5.4.2 FEATURES

Low-cost, high-performance logic solution for high-volume, consumer-oriented applications

- Densities up to 74,880 logic cells

SelectIO interface signaling

- Up to 633 I/O pins

- 622+ Mb/s data transfer rate per I/O

- 18 single-ended signal standards

- 8 differential I/O standards including LVDS, RSDS

- Termination by Digitally Controlled Impedance

- Signal swing ranging from 1.14V to 3.465V

- Double Data Rate (DDR) support

- DDR, DDR2 SDRAM support up to 333 Mbps

Logic resources

- Abundant logic cells with shift register capability

- Wide, fast multiplexers

- Fast look-ahead carry logic

- Dedicated 18 x 18 multipliers

- JTAG logic compatible with IEEE 1149.1/1532

SelectRAM hierarchical memory

- Up to 1,872 Kbits of total block RAM

- Up to 520 Kbits of total distributed RAM

Digital Clock Manager (up to four DCMs)

- Clock skew elimination

- Frequency synthesis

- High resolution phase shifting

Eight global clock lines and abundant routing

Fully supported by Xilinx ISE and WebPACK software development systems

MicroBlaze and PicoBlaze processor, PCI, PCI Express PIPE Endpoint, and other IP cores

Pb-free packaging options

Automotive Spartan-3 XA Family variant

5.5 ARCHITECTURAL OVERVIEW

The Spartan-3 family architecture consists of five fundamental programmable functional elements:

Configurable Logic Blocks (CLBs) contain RAM-based Look-Up Tables (LUTs) to implement logic and
storage elements that can be used as flip-flops or latches. CLBs can be programmed to perform a wide
variety of logical functions as well as to store data.

Input/Output Blocks (IOBs) control the flow of data between the I/O pins and the internal logic of the
device. Each IOB supports bidirectional data flow plus 3-state operation. Twenty-six different signal
standards, including eight high-performance differential standards, are available as shown in Table 2.
Double Data-Rate (DDR) registers are included. The Digitally Controlled Impedance (DCI) feature
provides automatic on-chip terminations, simplifying board designs.

Block RAM provides data storage in the form of 18-Kbit dual-port blocks.

Multiplier blocks accept two 18-bit binary numbers as inputs and calculate the product.

Digital Clock Manager (DCM) blocks provide self-calibrating, fully digital solutions for
distributing, delaying, multiplying, dividing, and phase shifting clock signals.

These elements are organized as shown in Fig 5.1. A ring of IOBs surrounds a regular array of CLBs.
The XC3S50 has a single column of block RAM embedded in the array. Those devices ranging from the
XC3S200 to the XC3S2000 have two columns of block RAM. The XC3S4000 and XC3S5000 devices have four
RAM columns. Each column is made up of several 18-Kbit RAM blocks; each block is associated with a
dedicated multiplier. The DCMs are positioned at the ends of the outer block RAM columns. The
Spartan-3 family features a rich network of traces and switches that interconnect all five functional
elements, transmitting signals among them. Each functional element has an associated switch matrix
that permits multiple connections to the routing.

Fig 5.1- SPARTAN-3 Family Architecture
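The behavior of a dedicated multiplier block, which takes two 18-bit operands and produces their product, can be illustrated with a short Python model. This is a sketch under an assumption the text does not state: the operands are treated as two's-complement signed values, and the result is returned as a 36-bit pattern. The names to_signed and mult18x18 are made up for the example.

```python
WIDTH = 18  # operand width of one multiplier block, from the text


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def mult18x18(a, b):
    """Multiply two 18-bit operands and return the 36-bit product bit pattern.

    Assumption: operands are signed (two's complement).
    """
    product = to_signed(a, WIDTH) * to_signed(b, WIDTH)
    return product & ((1 << (2 * WIDTH)) - 1)


if __name__ == "__main__":
    print(hex(mult18x18(3, 5)))        # small positive operands
    print(hex(mult18x18(0x3FFFF, 2)))  # 0x3FFFF is -1 as an 18-bit signed value
```

A 36-bit result is wide enough for every operand pair, since the largest magnitude product of two 18-bit signed values is 2^34.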

5.5.1 CONFIGURATION

Spartan-3 FPGAs are programmed by loading configuration data into robust, reprogrammable, static
CMOS configuration latches (CCLs) that collectively control all functional elements and routing
resources. Before powering on the FPGA, configuration data is stored externally in a PROM or some
other nonvolatile medium either on or off the board. After applying power, the configuration data is
written to the FPGA using any of five different modes: Master Parallel, Slave Parallel, Master Serial,
Slave Serial, and Boundary Scan (JTAG). The Master and Slave Parallel modes use an 8-bit-wide
SelectMAP port.

The recommended memory for storing the configuration data is the low-cost Xilinx Platform
Flash PROM family, which includes the XCF00S PROMs for serial configuration and the higher density
XCF00P PROMs for parallel or serial configuration.

5.5.2 I/O CAPABILITIES

The SelectIO feature of Spartan-3 devices supports 18 single-ended standards and 8 differential
standards. Many standards support the DCI feature, which uses integrated terminations to eliminate
unwanted signal reflections.

Package Marking

Fig 5.2 shows the top marking for Spartan-3 FPGAs in the quad-flat packages. Fig 5.3 shows the top
marking for Spartan-3 FPGAs in BGA packages except the 132-ball chip-scale package (CP132 and
CPG132). The markings for the BGA packages are nearly identical to those for the quad-flat packages,
except that the marking is rotated with respect to the ball A1 indicator. Fig 5.4 shows the top marking for
Spartan-3 FPGAs in the CP132 and CPG132 packages. The "5C" and "4I" part combinations may be dual
marked as "5C/4I". Devices with the dual mark can be used as either -5C or -4I devices. Devices with a
single mark are only guaranteed for the marked speed grade and temperature range. Some specifications
vary according to mask revision. Mask revision E devices are errata-free. All shipments since 2006 have
been mask revision E.

Fig 5.2 - Spartan-3 QFP Package Marking Example for Part Number XC3S400-4PQ208C

Fig 5.3. Spartan-3 BGA Package Marking Example for Part Number XC3S1000-4FT256C

Fig 5.4. Spartan-3 CP132 and CPG132 Package Marking Example for XC3S50-4CP132C

Ordering Information

Spartan-3 FPGAs are available in both standard and Pb-free packaging options for all device/package
combinations. The Pb-free packages include a special "G" character in the ordering code.

Fig 5.5 Standard Packaging

For additional information on Pb-free packaging, see XAPP427: "Implementation and Solder Reflow
Guidelines for Pb-Free Packages".

Fig 5.6-Pb-Free Packaging
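The ordering code can be split into its fields with a short Python sketch. The field layout used here (device, speed grade, two-letter package code, optional Pb-free "G", pin count, temperature letter) is inferred from the part numbers quoted in this section, such as XC3S400-4PQ208C and the CP132/CPG132 pair. It is an assumption for illustration, not the vendor's ordering specification, and it does not cover other variants such as the automotive XA family. The names PART_RE and parse_part_number are made up for the example.

```python
import re

# Assumed layout: <device>-<speed><package letters>[G]<pins><temperature>
PART_RE = re.compile(
    r"^(?P<device>XC3S\d+)-(?P<speed>\d)"
    r"(?P<pkg>[A-Z]{2})(?P<pb_free>G?)(?P<pins>\d+)(?P<temp>[CI])$"
)


def parse_part_number(code):
    """Split a Spartan-3 ordering code into its fields (assumed layout)."""
    match = PART_RE.match(code)
    if match is None:
        raise ValueError(f"unrecognized ordering code: {code!r}")
    return {
        "device": match["device"],
        "speed_grade": "-" + match["speed"],
        "package": match["pkg"] + match["pins"],  # package name without the G
        "pb_free": match["pb_free"] == "G",       # G marks Pb-free packaging
        "temperature": match["temp"],
    }


if __name__ == "__main__":
    print(parse_part_number("XC3S400-4PQ208C"))
    print(parse_part_number("XC3S50-4CPG132C"))
```

Under this layout the second code differs from the XC3S50-4CP132C example only in the pb_free field, matching the statement that Pb-free packages add a "G" to the ordering code.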

5.6 SIMULATION IMPLEMENTATION

Since the early 1980s, when schematic capture was introduced as an efficient way to design very
large-scale integration (VLSI) circuits, it has been the design method of choice for designers in the world
of VLSI design. However, the use of this method reached its limits in the early 1990s, as more and more
logic functionality and features were integrated onto a single chip. Today, most application-specific
integrated circuit (ASIC) chips consist of no fewer than one million transistors.

Designing circuits this large using the method of schematic capture is time consuming and is no longer
efficient. Therefore, a more efficient manner of design was required.

This new method had to increase the designer's efficiency and allow ease of design, even when
dealing with large circuits. From this requirement arose the wide acceptance of HDL (hardware
description language). HDL allows a designer to describe the functionality of a required logic circuit in a
language that is easy to understand. The description is then simulated using test benches. After the HDL
description is verified for logic functionality, it is synthesized to logic gates by using synthesis tools. This
method helps a designer to design a circuit in a shorter timeframe.

The savings in design time are achieved because the designer need not be concerned with the
intricate complexities that exist in a particular circuit, but instead is focused on the functionality that is
required. This new method of design has been widely adopted today in the field of ASIC design. It allows
designers to design large numbers of logic gates to implement logic features and functionality that are
required on an ASIC chip.

As the size and complexity of digital systems increase, more computer aided design (CAD) tools are
introduced into the hardware design process. Early simulation and primitive hardware generation tools
have given way to sophisticated design entry, verification, high-level synthesis, formal verification, and
automatic hardware generation and device programming tools. Growth of design automation tools is
largely due to hardware description languages (HDLs) and design methodologies that are based on these
languages. Based on HDLs, new digital system CAD tools have been developed. At the same time,
research for finding better and more abstract hardware languages continues.

One of the most widely used HDLs is the Verilog HDL. Because of its wide acceptance in the digital
design industry, Verilog has become a must-know for design engineers and students in
computer-hardware-related fields. This chapter presents tools and environments that are based on Verilog
and are available to a hardware designer for automating his or her design process, and hence improving
the final product's time to market. We discuss steps involved in taking a hierarchical, high-level design
from a Verilog description of the design to its implementation in hardware. Processes and terminologies
are illustrated here. We discuss available electronic design automation (EDA) tools that are based on
Verilog, and talk about their role in an automated design environment. The last section of this chapter
discusses some of the properties of Verilog that make this language a good choice for designers and
modelers of hardware.

Verilog HDL

The previous section showed steps involved in taking an RT level design from a Verilog
description to hardware implementation. This design process is only possible because Verilog is a
language that can be understood by system designers, RT level designers, test engineers, simulators,
synthesis tools, and machines. Because of this important role in design, Verilog has become an IEEE
standard. The standard is used by users as well as tool developers.

5.6.1 VERILOG EVOLUTION

Verilog was designed in early 1984 by Gateway Design Automation. Initially the original
language was used as a simulation and verification tool. After the initial acceptance of this language
by the electronics industry, a fault simulator, a timing analyzer, and later, in 1987, a synthesis tool were
developed based on this language. Gateway Design Automation and its Verilog-based tools were later
acquired by Cadence Design Systems. Since then, Cadence has been a strong force behind popularizing
the Verilog hardware description language. In 1987 VHDL became an IEEE standard hardware
description language. Because of its Department of Defense (DoD) support, VHDL was adopted by the
U.S. government for related projects and contracts. In an effort to popularize Verilog, OVI
(Open Verilog International) was formed in 1990 and Verilog was placed in the public domain.

This created a new line of interest in Verilog for users and EDA vendors. In 1993, efforts for
standardization of this language started. Verilog became an IEEE standard, IEEE Std. 1364-1995, in
1995. With simulation tools, synthesizers, fault simulation programs, timing analyzers, and
many other design tools already developed for Verilog, this standardization helped further acceptance of
Verilog in electronic design communities. A new version of Verilog was approved by IEEE in 2001. This
version, referred to as Verilog-2001, is the present standard used by most users and tool developers.
Features added to this version of Verilog include external file access for read and write, library
management, constructs for design configuration, higher abstraction level constructs, and constructs for
specification of iterative structures. Work on improving this standard continues in
various IEEE sponsored study groups.

5.6.2 VERILOG ATTRIBUTES

Verilog is a hardware description language for describing hardware from transistor level to
behavioral. The language supports timing constructs for switch level timing simulation and at the same
time, it has features for describing hardware at the abstract algorithmic level. A Verilog description may
consist of a mix of modules at various abstraction levels with different degrees of detail.

Switch Level

Features of the language that make it ideal for switch level modeling and simulation include primitive
unidirectional and bidirectional switches with parameters for delay and charge storage. Circuit delays
may be modeled as propagation delay, rise and fall delay, and line delays. The charge storage feature at
this level of abstraction in Verilog makes this language capable of describing dynamic complementary
metal oxide semiconductor (CMOS) and metal oxide semiconductor (MOS) circuits.

Gate Level

Gate level primitives with predefined parameters provide a convenient platform for netlist
representation and gate level simulation. For more detailed and special purpose gate simulations, gate
components may be defined at the behavioral level. Verilog also provides utilities for defining primitives
with special functionalities. A simple 4-value logic system is used in Verilog for signal values. However,
for more accurate logic modeling, Verilog signals also include 16 levels of strength in addition to the four
values.
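The 4-value logic system can be illustrated with a small Python model of the basic gate functions on the values 0, 1, x (unknown) and z (high impedance). This is a sketch of the value tables only. Signal strengths are not modeled, and the function names and4, or4 and not4 are made up for the example. At a gate input, z is treated like x.

```python
def and4(a, b):
    """AND on 4-value logic: a 0 on either input forces the output to 0."""
    if a == "0" or b == "0":
        return "0"
    if a == "1" and b == "1":
        return "1"
    return "x"  # any remaining combination involves x or z


def or4(a, b):
    """OR on 4-value logic: a 1 on either input forces the output to 1."""
    if a == "1" or b == "1":
        return "1"
    if a == "0" and b == "0":
        return "0"
    return "x"


def not4(a):
    """NOT on 4-value logic: x and z both invert to x."""
    return {"0": "1", "1": "0"}.get(a, "x")


if __name__ == "__main__":
    for a in "01xz":
        print(a, [and4(a, b) for b in "01xz"], [or4(a, b) for b in "01xz"], not4(a))
```

The controlling value of each gate (0 for AND, 1 for OR) gives a known output even when the other input is unknown, which is why an x on one input does not always propagate.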

Pin-To-Pin Delay

A utility for timing specification of components at the input/output level is provided in Verilog. This
utility can be used for back annotation of timing information in original predesigned descriptions.
Moreover, the pin-to-pin language facility enables modelers to fine-tune timing behavior of their models
based on physical implementations.

Bussing Specifications

Bus and register modeling utilities are provided in Verilog. For various bus structures, Verilog supports
predefined wire and bus resolution functions using its 4-value logic system. Combinations of bus
logic and resolution functions enable modeling of most physical bus types. For register modeling, high-
level clock representation and timing-control constructs can be used for representation of registers with
various clocking and resetting schemes.
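How a resolution function combines several drivers on one bus line can be sketched in Python. The model assumes a plain wire net with all drivers at equal strength: a driver at z gives way to any other driver, agreeing drivers keep their value, and conflicting drivers resolve to x. The names resolve_pair and resolve_wire are made up for the example. Other net types and strength handling are not covered.

```python
from functools import reduce


def resolve_pair(a, b):
    """Resolve two drivers on a wire (equal strengths assumed)."""
    if a == "z":
        return b
    if b == "z":
        return a
    return a if a == b else "x"  # 0 against 1, or anything against x, gives x


def resolve_wire(drivers):
    """Resolve any number of drivers; an undriven wire floats at z."""
    return reduce(resolve_pair, drivers, "z")


if __name__ == "__main__":
    print(resolve_wire(["z", "1", "z"]))  # one active driver on a tri-state bus
    print(resolve_wire(["0", "1"]))       # bus contention
    print(resolve_wire([]))               # no driver at all
```

This is the behavior that makes a tri-state bus work: as long as at most one driver is active and the rest are at z, the bus carries the active driver's value.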

Behavioral Level

Procedural blocks of Verilog enable algorithmic representations of hardware structures. Constructs
similar to those in software programming languages are provided for describing hardware at this level.

System Utilities

System tasks in Verilog provide designers with tools for testbench generation, file access for read and
write, data handling, data generation, and special hardware modeling. System utilities for reading memory
and programmable logic array (PLA) images provide convenient ways of modeling these components.
Verilog display and I/O tasks can be used to handle all inputs and outputs for data application and
simulation. Verilog allows random access to files for read and write operations.

Programming Language Interface (PLI)

Programming language interface (PLI) of Verilog provides an environment for accessing Verilog data
structures using a library of C-language functions.

5.7 THE VERILOG LANGUAGE

The Verilog HDL satisfies all requirements for design and synthesis of digital systems. The
language supports hierarchical description of hardware from system to gate or even switch level. Verilog
has strong support at all levels for timing specification and violation detection. Timing and concurrency
required for hardware modeling are specially emphasized. In Verilog a hardware component is described
by the module declaration language construct. Description of a module specifies a component's input and
output list as well as internal component busses and registers.

Within a module, concurrent assignments, component instantiations, and procedural blocks can
be used to describe a hardware component. Several modules can hierarchically be instantiated to form
other hardware structures. Leaves of a hierarchical design specification may be modules, primitives, or
user defined primitives. For simulating a design, it is expected that all leaves of the hierarchy are
individually compiled. Many Verilog tools and environments exist that provide simulation, fault
simulation, formal verification, and synthesis. Simulation environments provide graphical front-end
programs and waveform editing and display tools. Synthesis tools are based on a subset of Verilog. For
synthesizing a design, target hardware, e.g., specific FPGA or ASIC, must be known.


Chapter 5
Results

Simulation Results:

Synthesis Results:
RTL schematic:

Technology schematic:

Chapter 6
Conclusion
This paper proposed a low-power and area-efficient shift register using pulsed latches.
The shift register reduces area and power consumption by replacing flip-flops with pulsed
latches. The timing problem between pulsed latches is solved using multiple non-overlap
delayed pulsed clock signals instead of a single pulsed clock signal. A small number of the
pulsed clock signals is used by grouping the latches into several sub shift registers and using
additional temporary storage latches. A 256-bit shift register was fabricated using a 0.18 µm
CMOS process. It consumes 1.2 mW at a 100 MHz clock frequency. The proposed shift register
saves 37% area and 44% power compared to the conventional shift register with flip-flops.
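The grouping scheme summarized above can be illustrated with a behavioral Python model. It is a sketch under stated assumptions, not the fabricated circuit: each sub shift register is taken to hold k data latches plus one temporary latch, and within one clock cycle the delayed pulses are assumed to fire in the order temporary latch first, then the data latches from last to first, so that every latch captures its neighbor's old value. The group size of 4 used in the example and the names make_register, shift_cycle, contents and latch_budget are assumptions for illustration.

```python
def make_register(n_bits, k):
    """Create n_bits of storage as groups of k data latches plus one temp latch each."""
    if n_bits % k:
        raise ValueError("n_bits must be a multiple of the group size k")
    groups = [[0] * k for _ in range(n_bits // k)]
    temps = [0] * (n_bits // k)
    return groups, temps


def shift_cycle(groups, temps, data_in):
    """Apply one clock cycle of non-overlapping delayed pulses (assumed order)."""
    k = len(groups[0])
    # Temporary-latch pulse: each temp latch saves the last data latch of its group.
    for j, group in enumerate(groups):
        temps[j] = group[-1]
    # Pulses for latches k down to 2: latch i captures latch i-1 before it changes.
    for i in range(k - 1, 0, -1):
        for group in groups:
            group[i] = group[i - 1]
    # Pulse for latch 1: take the serial input or the previous group's temp latch.
    for j, group in enumerate(groups):
        group[0] = data_in if j == 0 else temps[j - 1]


def contents(groups):
    """Return the stored bits in shift order, first stage first."""
    return [bit for group in groups for bit in group]


def latch_budget(n_bits, k):
    """Return (total latches, distinct pulsed clock signals) for the grouping."""
    return (n_bits // k) * (k + 1), k + 1


if __name__ == "__main__":
    groups, temps = make_register(8, 4)
    for bit in [1, 0, 1, 1]:
        shift_cycle(groups, temps, bit)
    print(contents(groups))
    print(latch_budget(256, 4))
```

With a group size of 4, a 256-bit register needs 320 latches but only 5 pulsed clock signals, because every group reuses the same pulses. Without grouping, each latch would need its own delayed pulse.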

REFERENCES

[1] P. Reyes, P. Reviriego, J. A. Maestro, and O. Ruano, "New protection techniques against
SEUs for moving average filters in a radiation environment," IEEE Trans. Nucl. Sci., vol.
54, no. 4, pp. 957–964, Aug. 2007.
[2] M. Hatamian et al., "Design considerations for gigabit ethernet 1000 base-T twisted pair
transceivers," Proc. IEEE Custom Integr. Circuits Conf., pp. 335–342, 1998.
[3] H. Yamasaki and T. Shibata, "A real-time image-feature-extraction and vector-generation VLSI
employing arrayed-shift-register architecture," IEEE J. Solid-State Circuits, vol. 42, no. 9,
pp. 2046–2053, Sep. 2007.
[4] H.-S. Kim, J.-H. Yang, S.-H. Park, S.-T. Ryu, and G.-H. Cho, "A 10-bit column-driver IC
with parasitic-insensitive iterative charge-sharing based capacitor-string interpolation for
mobile active-matrix LCDs," IEEE J. Solid-State Circuits, vol. 49, no. 3, pp. 766–782,
Mar. 2014.
[5] S.-H. W. Chiang and S. Kleinfelder, "Scaling and design of a 16-mega-pixel CMOS image
sensor for electron microscopy," in Proc. IEEE Nucl. Sci. Symp. Conf. Record (NSS/MIC),
2009, pp. 1249–1256.
[6] S. Heo, R. Krashinsky, and K. Asanovic, "Activity-sensitive flip-flop and latch selection
for reduced energy," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 9, pp.
1060–1064, Sep. 2007.
[7] S. Naffziger and G. Hammond, "The implementation of the next-generation 64 b Itanium
microprocessor," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb.
2002, pp. 276–504.
[8] H. Partovi et al., "Flow-through latch and edge-triggered flip-flop hybrid elements," IEEE
Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, pp. 138–139, Feb. 1996.
[9] E. Consoli, M. Alioto, G. Palumbo, and J. Rabaey, "Conditional push-pull pulsed latch
with 726 fJ·ps energy delay product in 65 nm CMOS," in IEEE Int. Solid-State Circuits
Conf. (ISSCC) Dig. Tech. Papers, Feb. 2012, pp. 482–483.
[10] V. Stojanovic and V. Oklobdzija, "Comparative analysis of master-slave latches and
flip-flops for high-performance and low-power systems," IEEE J. Solid-State Circuits, vol. 34,
no. 4, pp. 536–548, Apr. 1999.
[11] J. Montanaro et al., "A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor," IEEE J.
Solid-State Circuits, vol. 31, no. 11, pp. 1703–1714, Nov. 1996.
[12] S. Nomura et al., "A 9.7 mW AAC-decoding, 620 mW H.264 720p 60fps
decoding, 8-core media processor with embedded forward-body-biasing and
power-gating circuit in 65 nm CMOS technology," in IEEE Int. Solid-State Circuits Conf.
(ISSCC) Dig. Tech. Papers, Feb. 2008, pp. 262–264.
[13] Y. Ueda et al., "6.33 mW MPEG audio decoding on a multimedia processor," in
IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2006, pp. 1636–1637.
[14] B.-S. Kong, S.-S. Kim, and Y.-H. Jun, "Conditional-capture flip-flop for statistical
power reduction," IEEE J. Solid-State Circuits, vol. 36, pp. 1263–1271, Aug. 2001.
[15] C. K. Teh, T. Fujita, H. Hara, and M. Hamada, "A 77% energy-saving
22-transistor single-phase-clocking D-flip-flop with adaptive-coupling configuration in 40
nm CMOS," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb.
2011, pp. 338–339.
