Design of Embedded Processors

Lesson 20

Field Programmable Gate Arrays and Applications

Instructional Objectives

After going through this lesson the student will be able to:

Define a field programmable gate array (FPGA)
Distinguish between an FPGA and a stored-memory processor
List and explain the principle of operation of the various functional units within an FPGA
Compare the architecture and performance specifications of various commercially available FPGAs
Describe the steps in using an FPGA in an embedded system

Introduction

An FPGA is a device that contains a matrix of reconfigurable gate array logic circuitry.
When an FPGA is configured, the internal circuitry is connected in a way that creates a hardware
implementation of the software application. Unlike processors, FPGAs use dedicated hardware
for processing logic and do not have an operating system. FPGAs are truly parallel in nature so
different processing operations do not have to compete for the same resources. As a result, the
performance of one part of the application is not affected when additional processing is added.
Also, multiple control loops can run on a single FPGA device at different rates. FPGA-based
control systems can enforce critical interlock logic and can be designed to prevent I/O forcing by
an operator. However, unlike hard-wired printed circuit board (PCB) designs, which have fixed
hardware resources, FPGA-based systems can literally rewire their internal circuitry to allow
reconfiguration after the control system is deployed to the field. FPGA devices thus combine this
flexibility with the performance and reliability of dedicated hardware circuitry.

A single FPGA can replace thousands of discrete components by incorporating millions of logic
gates in a single integrated circuit (IC) chip. The internal resources of an FPGA chip consist of a
matrix of configurable logic blocks (CLBs) surrounded by a periphery of I/O blocks, as shown in
Fig. 20.1. Signals are routed within the FPGA matrix by programmable interconnect switches
and wire routes.

Fig. 20.1 Internal Structure of FPGA (labels: programmable interconnect, logic blocks, I/O blocks)

In an FPGA, logic blocks are implemented using multi-level, low fan-in gates, which gives a
more compact design than an implementation with two-level AND-OR logic. An FPGA
lets its user configure:
1. The interconnection between the logic blocks, and
2. The function of each logic block.
A logic block of an FPGA can be configured to provide functionality as
simple as that of a transistor or as complex as that of a microprocessor. It can be used to implement
different combinations of combinational and sequential logic functions. Logic blocks of an
FPGA can be implemented by any of the following:
1. Transistor pairs
2. Combinational gates such as basic NAND gates or XOR gates
3. n-input lookup tables
4. Multiplexers
5. Wide fan-in AND-OR structures
Routing in FPGAs consists of wire segments of varying lengths which can be interconnected via
electrically programmable switches. The density of logic blocks in an FPGA depends on the length
and number of wire segments used for routing. The number of segments used for interconnection
is typically a tradeoff between the density of logic blocks and the area used up for
routing. A simplified version of the FPGA internal architecture with routing is shown in Fig. 20.2.

Fig. 20.2 Simplified Internal Structure of FPGA (labels: logic block, I/O block)

Why do we need FPGAs?

By the early 1980s, large scale integrated (LSI) circuits formed the backbone of most of
the logic circuits in major systems. Microprocessors, bus/IO controllers, system timers, etc. were
implemented using integrated circuit fabrication technology. Random glue logic or
interconnects were still required to help connect the large integrated circuits in order to:
1. Generate global control signals (for resets etc.)
2. Carry data signals from one subsystem to another.
Systems typically consisted of a few large scale integrated components and a large number of SSI
(small scale integrated circuit) and MSI (medium scale integrated circuit) components. The initial
attempt to solve this problem led to the development of custom ICs, which were to replace the large
amount of interconnect. This reduced system complexity and manufacturing cost, and improved
performance. However, custom ICs have their own disadvantages. They are relatively
expensive to develop, and they delay a product's introduction to market (time to market) because of
increased design time. There are two kinds of costs involved in the development of custom ICs:
1. Cost of development and design
2. Cost of manufacture
(A tradeoff usually exists between the two costs.)
Therefore the custom IC approach was only viable for products with very high volume that
were not time-to-market sensitive. FPGAs were introduced as an alternative to custom ICs
for implementing an entire system on one chip and to provide the flexibility of reprogrammability to the
user. The introduction of FPGAs improved density relative to discrete SSI/MSI
components (to within around 10x of custom ICs). Another advantage of FPGAs over custom ICs
is that, with the help of computer aided design (CAD) tools, circuits can be implemented in a
short amount of time (no physical layout process, no mask making, no IC manufacturing).

Evolution of FPGA

In the world of digital electronic systems, there are three basic kinds of devices: memory,
microprocessors, and logic. Memory devices store random information such as the contents of a
spreadsheet or database. Microprocessors execute software instructions to perform a wide variety
of tasks such as running a word processing program or video game. Logic devices provide
specific functions, including device-to-device interfacing, data communication, signal
processing, data display, timing and control operations, and almost every other function a system
must perform.
The first type of user-programmable chip that could implement logic circuits was the
Programmable Read-Only Memory (PROM), in which address lines can be used as logic circuit
inputs and data lines as outputs. Logic functions, however, rarely require more than a few
product terms, and a PROM contains a full decoder for its address inputs. PROMs are thus an
inefficient architecture for realizing logic circuits, and so are rarely used in practice for that
purpose. The devices that replaced PROMs for this purpose are programmable logic arrays,
or PLAs for short. Logically, a PLA is a circuit that implements Boolean functions in
sum-of-product form. The typical implementation consists of input buffers for all inputs, the
programmable AND-matrix followed by the programmable OR-matrix, and output buffers. The
input buffers provide both the original and the inverted values of each PLA input. The input lines
run horizontally into the AND matrix, while the so-called product-term lines run vertically.
Therefore, the size of the AND matrix is twice the number of inputs times the number of
product-terms.
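
The sum-of-products structure described above can be sketched in a few lines of Python. This is only an illustrative model, not a description of any particular device: the function names `pla_eval` and `and_matrix_size`, the dictionary encoding of product terms, and the 3-input example functions are all assumptions made for the sketch.

```python
from itertools import product


def pla_eval(and_plane, or_plane, inputs):
    """Evaluate a PLA modelled as two programmable planes.

    and_plane: list of product terms; each term maps an input index to the
               value (1 = true line, 0 = inverted line) that input must have.
    or_plane:  one list per output, naming the product terms that are ORed.
    """
    terms = [all(inputs[i] == v for i, v in term.items()) for term in and_plane]
    return [int(any(terms[t] for t in out)) for out in or_plane]


def and_matrix_size(num_inputs, num_terms):
    # Each input supplies a true and an inverted line, hence the factor 2.
    return 2 * num_inputs * num_terms


# Hypothetical 3-input example (inputs a, b, c at indices 0, 1, 2):
#   f0 = a.b + a'.c      f1 = a.b
AND_PLANE = [{0: 1, 1: 1}, {0: 0, 2: 1}]
OR_PLANE = [[0, 1], [0]]

if __name__ == "__main__":
    for bits in product((0, 1), repeat=3):
        print(bits, pla_eval(AND_PLANE, OR_PLANE, bits))
    print("AND matrix crosspoints:", and_matrix_size(3, 2))
```

Because both planes are plain data, changing the implemented functions means changing only `AND_PLANE` and `OR_PLANE`, which mirrors how a PLA is programmed rather than rewired.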
When PLAs were introduced by Philips in the early 1970s, their main drawbacks were that they
were expensive to manufacture and offered somewhat poor speed-performance. Both
disadvantages were due to the two levels of configurable logic, because programmable logic
planes were difficult to manufacture and introduced significant propagation delays. To overcome
these weaknesses, Programmable Array Logic (PAL) devices were developed. PALs provide
only a single level of programmability, consisting of a programmable wired AND plane that
feeds fixed OR-gates. PALs usually contain flip-flops connected to the OR-gate outputs so that
sequential circuits can be realized. PLAs, PALs and similar devices are often referred to as Simple
Programmable Logic Devices (SPLDs). Fig. 20.3 shows a simplified structure of a PLA and a PAL.

Fig. 20.3 Simplified Structure of PLA and PAL (panels: PLA, PAL; labels: inputs, outputs)

With the advancement of technology, it has become possible to produce devices with
higher capacities than SPLDs. As chip densities increased, it was natural for the PLD
manufacturers to evolve their products into larger (logically, but not necessarily physically) parts
called Complex Programmable Logic Devices (CPLDs). For most practical purposes, CPLDs
can be thought of as multiple PLDs (plus some programmable interconnect) in a single chip. The
larger size of a CPLD allows it to implement either more logic equations or a more complicated
design.

Fig. 20.4 Internal structure of a CPLD (four logic blocks connected through a switch matrix)

Fig. 20.4 contains a block diagram of a hypothetical CPLD. Each of the four logic blocks shown
there is the equivalent of one PLD. However, in an actual CPLD there may be more (or fewer) than
four logic blocks. These logic blocks are themselves composed of macrocells and interconnect
wiring, just like an ordinary PLD.
Unlike the programmable interconnect within a PLD, the switch matrix within a CPLD
may or may not be fully connected. In other words, some of the theoretically possible
connections between logic block outputs and inputs may not actually be supported within a given
CPLD. The effect of this is most often to make 100% utilization of the macrocells very difficult
to achieve. Some hardware designs simply won't fit within a given CPLD, even though there are
sufficient logic gates and flip-flops available. Because CPLDs can hold larger designs than
PLDs, their potential uses are more varied. They are still sometimes used for simple applications
like address decoding, but more often contain high-performance control-logic or complex finite
state machines. At the high-end (in terms of numbers of gates), there is also a lot of overlap in
potential applications with FPGAs. Traditionally, CPLDs have been chosen over FPGAs
whenever high-performance logic is required. Because of its less flexible internal architecture,
the delay through a CPLD (measured in nanoseconds) is more predictable and usually shorter.
The development of the FPGA was distinct from the SPLD/CPLD evolution just described. This
is apparent from the architecture of the FPGA shown in Fig. 20.1. FPGAs offer the highest
logic density, the most features, and the highest performance. The largest FPGA shipping at the
time of writing, part of the Xilinx Virtex line of devices, provides eight million "system gates" (the relative
density of logic). These advanced devices also offer features such as built-in hardwired
processors (such as the IBM PowerPC), substantial amounts of memory, clock management
systems, and support for many of the latest, very fast device-to-device signaling technologies.
FPGAs are used in a wide variety of applications ranging from data processing and storage, to
instrumentation, telecommunications, and digital signal processing. The value of programmable
logic has always been its ability to shorten development cycles for electronic equipment
manufacturers and help them get their product to market faster. As PLD (Programmable Logic
Device) suppliers continue to integrate more functions inside their devices, reduce costs, and
increase the availability of time-saving IP cores, programmable logic is certain to expand its
popularity with digital designers.

FPGA Structural Classification

The basic structure of an FPGA includes logic elements, programmable interconnects and memory.
The arrangement of these blocks is specific to each manufacturer. On the basis of the internal
arrangement of blocks, FPGAs can be divided into three classes:

Symmetrical arrays

This architecture consists of logic elements (called CLBs) arranged in rows and columns
of a matrix, with interconnect laid out between them, as shown in Fig. 20.2. This symmetrical matrix
is surrounded by I/O blocks which connect it to the outside world. Each CLB consists of an n-input
lookup table and a pair of programmable flip-flops. I/O blocks also control functions such as
tri-state control and output transition speed. Interconnects provide the routing paths. Direct interconnects
between adjacent logic elements have smaller delay than the general purpose interconnect.

Row based architecture

The row based architecture shown in Fig. 20.5 consists of alternating rows of logic modules
and programmable interconnect tracks. Input/output blocks are located on the periphery of the
rows. One row may be connected to adjacent rows via vertical interconnect. Logic modules can
be implemented in various combinations. Combinatorial modules contain only combinational
elements, while sequential modules contain combinational elements along with flip-flops.
A sequential module can therefore implement complex combinatorial-sequential functions. Routing
tracks are divided into smaller segments connected by anti-fuse elements between them.

Hierarchical PLDs

This architecture is designed in a hierarchical manner, with the top level containing only logic
blocks and interconnects. Each logic block contains a number of logic modules, and each logic
module has combinatorial as well as sequential functional elements. Each of these functional
elements is controlled by the programmed memory. Communication between logic blocks is
achieved by programmable interconnect arrays. Input/output blocks surround this scheme of
logic blocks and interconnects. This type of architecture is shown in Fig. 20.6.

Fig. 20.5 Row based Architecture (labels: routing channels, logic block rows, I/O blocks on all four sides)

Fig. 20.6 Hierarchical PLD (labels: I/O blocks on all four sides, logic modules, interconnects)

FPGA Classification on user programmable switch technologies

FPGAs are based on an array of logic modules and a supply of uncommitted wires to
route signals. In gate arrays these wires are connected by a mask design during manufacture. In
FPGAs, however, these wires are connected by the user and therefore must use an electronic
device to connect them. Three types of devices have been commonly used to do this: pass
transistors controlled by an SRAM cell, a flash or EEPROM cell to pass the signal, or a direct
connection using antifuses. Each of these interconnect devices has its own advantages and
disadvantages, and the choice has a major effect on the design, architecture, and performance of the FPGA.
The classification of FPGAs by user programmable switch technology is given in Fig. 20.7.

Fig. 20.7 FPGA Classification on user programmable technology
FPGA
SRAM-Programmed: Xilinx LCA, AT&T Orca, Altera Flex, Toshiba, Plessey ERA, Atmel CLi
Antifuse-Programmed: Actel ACT1 & 2, QuickLogic pASIC, Crosspoint CP20K
EEPROM-Programmed: Altera MAX, AMD Mach, Xilinx EPLD

SRAM Based

The major advantage of SRAM based devices is that they are infinitely re-programmable,
can be soldered into the system, and can have their function changed quickly by merely changing
the contents of a PROM. They therefore have simple development mechanics. They can also be
changed in the field by uploading new application code, a feature attractive to designers. This does,
however, come at a price: the interconnect element has high impedance and capacitance and
consumes much more area than other technologies. Hence wires are very expensive and
slow. The FPGA architect is therefore forced to make large, inefficient logic modules (typically a
look up table or LUT). The other disadvantages are that these devices need to be reprogrammed each time
power is applied, need an external memory to store the program, and require a large area. Fig.
20.8 shows two applications of SRAM cells: controlling the gate nodes of pass-transistor
switches and controlling the select lines of multiplexers that drive logic block inputs. The figure
gives an example of the connection of one logic block (represented by the AND-gate in the upper
left corner) to another through two pass-transistor switches, and then a multiplexer, all controlled
by SRAM cells. Whether an FPGA uses pass-transistors or multiplexers or both depends on the
particular product.
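
As a rough behavioral sketch of the path just described (two SRAM-controlled pass transistors followed by an SRAM-controlled multiplexer), the following Python model may help. It ignores all electrical effects; the names `pass_transistor`, `mux` and `route`, the 4:1 multiplexer width, and the configuration keys are assumptions made for illustration only.

```python
def pass_transistor(signal, sram_bit):
    """Pass the signal when the controlling SRAM cell holds 1.

    None models a floating (undriven) wire when the switch is off.
    """
    return signal if sram_bit else None


def mux(inputs, select_bits):
    """Multiplexer whose select lines are driven by SRAM cells (MSB first)."""
    index = 0
    for bit in select_bits:
        index = (index << 1) | bit
    return inputs[index]


def route(source, other_inputs, config):
    """Source block -> two pass transistors in series -> 4:1 input multiplexer."""
    wire = pass_transistor(source, config["sw1"])
    wire = pass_transistor(wire, config["sw2"])
    return mux([wire] + list(other_inputs), config["mux_sel"])


if __name__ == "__main__":
    connected = {"sw1": 1, "sw2": 1, "mux_sel": [0, 0]}
    print(route(1, [0, 0, 0], connected))  # the source value reaches the destination
```

Changing only the `config` dictionary reroutes the signal, which is the sense in which rewriting the SRAM contents reconfigures the device.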

Fig. 20.8 SRAM-controlled Programmable Switches (labels: logic cells, SRAM cells)

Antifuse Based

The antifuse based cell is the highest density interconnect because it is a true cross point.
Thus the designer has a much larger number of interconnects, so logic modules can be smaller
and more efficient. Place and route software also has a much easier time. These devices, however,
are only one-time programmable and therefore have to be discarded every time a change is
made in the design. The antifuse has an inherently low capacitance and resistance, such that the
fastest parts are all antifuse based. The disadvantage of the antifuse is the requirement to
integrate the fabrication of the antifuses into the IC process, which means the process will always
lag the SRAM process in scaling. Antifuses are suitable for FPGAs because they can be built
using modified CMOS technology. As an example, Actel's antifuse structure is depicted in Fig.
20.9. The figure shows that an antifuse is positioned between two interconnect wires and
physically consists of three sandwiched layers: the top and bottom layers are conductors, and the
middle layer is an insulator. When unprogrammed, the insulator isolates the top and bottom
layers, but when programmed the insulator changes to become a low-resistance link. Actel's antifuse uses
Poly-Si and n+ diffusion as conductors and ONO (oxide-nitride-oxide) as an insulator, but other antifuses rely on
metal for conductors, with amorphous silicon as the middle layer.

Fig. 20.9 Actel Antifuse Structure (labels: wires, antifuse, oxide, dielectric, Poly-Si, n+ diffusion, silicon substrate)

EEPROM Based

The EEPROM/FLASH cell in FPGAs can be used in two ways: as a control device, as in
an SRAM cell, or as a directly programmable switch. When used as a switch, such cells can be very
efficient as interconnect and are reprogrammable at the same time. They are also non-volatile,
so they do not require an extra PROM for loading. They do, however, have their drawbacks. The
EEPROM process is complicated and therefore also lags SRAM technology.

Logic Block and Routing Techniques

Crosspoint FPGA: consists of two types of logic blocks. One is transistor pair tiles, in which
transistor pairs run in parallel lines, as shown in Fig. 20.10.

Fig. 20.10 Transistor pair tiles in cross-point FPGA.

The second type of logic block is RAM logic, which can be used to implement random access
memory.

Plessey FPGA: The basic building block here is a 2-input NAND gate; these gates are connected to
one another to implement the desired function.

Fig. 20.11 Plessey Logic Block (labels: latch, config RAM, 8 interconnect lines, 8-2 multiplexer, CLK, data)

Both Crosspoint and Plessey use fine grain logic blocks. Fine grain logic blocks have the
advantage of a high percentage usage of logic blocks, but they require a large number of wire
segments and programmable switches, which occupy a lot of area.

Actel Logic Block: If the inputs of a multiplexer are connected to a constant or to a signal, it can be
used to implement different logic functions. For example, a 2-input multiplexer with data inputs a and
b and select input c implements the function ac + bc' (taking c = 1 to select a). If b = 0 it implements
ac, and if a = 0 it implements bc'.
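
The idea of tying multiplexer inputs to constants can be checked with a short Python sketch. It assumes the convention that select c = 1 passes data input a and c = 0 passes data input b; the helper names `mux2`, `and_gate`, `or_gate` and `not_gate` are illustrative only and do not correspond to any vendor primitive.

```python
def mux2(a, b, c):
    """2-input multiplexer: output is a when select c is 1, otherwise b.

    In Boolean form this is a.c + b.c' under the convention assumed here.
    """
    return a if c else b


def and_gate(x, y):
    # Tie the b data input to constant 0: output = x.y
    return mux2(x, 0, y)


def or_gate(x, y):
    # Tie the a data input to constant 1: output = y + x.y' = x + y
    return mux2(1, x, y)


def not_gate(x):
    # Constants on both data inputs: output = x'
    return mux2(0, 1, x)


if __name__ == "__main__":
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, and_gate(x, y), or_gate(x, y), not_gate(x))
```

The same multiplexer therefore yields AND, OR or NOT depending only on what is wired to its inputs, which is why a small tree of multiplexers makes a flexible logic block.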

Typically an Actel logic block consists of several multiplexers and logic gates.

Fig. 20.12 Actel Logic Block (labels: signals w, x, y, z; multiplexer inputs 0 and 1; nodes n1 to n4)

Xilinx Logic block

In the Xilinx logic block, a lookup table (LUT) is used to implement any of a large number of different
functions. The input lines go into the input and enable of the lookup table. The output of the
lookup table gives the result of the logic function that it implements. The lookup table is
implemented using SRAM.

Fig. 20.13 Xilinx - LUT based (labels: inputs A to E, look-up table, data in, enable clock, clock, reset, global reset, multiplexers, S/R flip-flops, outputs X and Y)

A k-input logic function is implemented using a 2^k x 1 SRAM. The number of different possible
functions for a k-input LUT is 2^(2^k). The advantage of such an architecture is that it supports
the implementation of very many logic functions; the disadvantage is the unusually large number
of memory cells required to implement such a logic block when the number of inputs is large. Fig.
20.13 shows a 5-input LUT based implementation of a logic block. LUT based design provides
better logic block utilization. A k-input LUT based logic block can be implemented in a number of
different ways, with a tradeoff between performance and logic density.
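
To get a feel for how quickly these two quantities grow with k, the counts can be tabulated with a few lines of Python. The function names `lut_memory_bits` and `lut_function_count` are made up for this sketch.

```python
def lut_memory_bits(k):
    """SRAM cells in a k-input LUT: one bit per input combination."""
    return 2 ** k


def lut_function_count(k):
    """Distinct Boolean functions a k-input LUT can hold: 2^(2^k)."""
    return 2 ** (2 ** k)


if __name__ == "__main__":
    print("k  cells  functions")
    for k in range(2, 7):
        print(k, lut_memory_bits(k), lut_function_count(k))
```

Each additional input doubles the number of memory cells, so the storage cost grows exponentially with k even though the extra cells are only needed by functions that genuinely depend on all k inputs.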

Figure: logic block with inputs, a 4-input look-up table (4-LUT), a flip-flop (FF) and an output; the look-up table latches are set by the configuration bit-stream.

An n-LUT can be viewed as a direct implementation of a function truth table. Each latch
holds the value of the function for one input combination. For example, the 2-LUT contents
shown below implement the 2-input AND and OR functions.
Example: 2-LUT
INPUTS  AND  OR
00      0    0
01      0    1
10      0    1
11      1    1
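
The truth-table view of a LUT translates directly into code: the inputs form an address and the stored bit at that address is the output. The sketch below is a behavioral model only; the class name `LUT`, its `read` method, and the choice of the first input as the most significant address bit are assumptions for illustration.

```python
class LUT:
    """k-input look-up table modelled as a 2^k x 1 memory."""

    def __init__(self, k, contents):
        if len(contents) != 2 ** k:
            raise ValueError("a k-input LUT needs exactly 2**k stored bits")
        self.k = k
        self.contents = list(contents)

    def read(self, *inputs):
        if len(inputs) != self.k:
            raise ValueError("wrong number of inputs")
        # The inputs form the memory address (first input = most significant bit).
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return self.contents[address]


# Contents taken column by column from the 2-LUT table (rows 00, 01, 10, 11).
AND2 = LUT(2, [0, 0, 0, 1])
OR2 = LUT(2, [0, 1, 1, 1])

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, AND2.read(a, b), OR2.read(a, b))
```

Loading a different list of stored bits turns the same structure into a different gate, which is all that "configuring" a LUT amounts to in this model.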

Altera Logic Block

Altera's logic block has evolved from earlier PLDs. It consists of wide fan-in (up to 100
input) AND gates feeding into an OR gate with 3-8 inputs. The advantage of a large fan-in AND
gate based implementation is that a few logic blocks can implement the entire functionality,
thereby reducing the area required by interconnects. On the other hand, the disadvantage is
the low utilization of logic blocks in a design that requires logic with fewer inputs. Another
disadvantage is the use of pull-up devices (in the AND gates) that consume static power. To reduce
power, manufacturers provide low-power logic blocks at the expense of delay. Such
logic blocks have gates with a high threshold and as a result consume less power. Such logic
blocks can be used in non-critical paths.
Altera and Xilinx devices are coarse grain architectures.

Example: Altera's FLEX 8000 series consists of a three-level hierarchy. However, the lowest
level of the hierarchy consists of a set of lookup tables, rather than an SPLD-like block, and so
the FLEX 8000 is categorized here as an FPGA. It should be noted, however, that the FLEX 8000 is
a combination of FPGA and CPLD technologies. The FLEX 8000 is SRAM-based and features a
four-input LUT as its basic logic block. Logic capacity ranges from about 4000 gates to more
than 15,000 for the 8000 series. The overall architecture of the FLEX 8000 is illustrated in Fig.
20.14.

Fig. 20.14 Architecture of Altera FLEX 8000 FPGAs (labels: I/O, FastTrack interconnect, LAB with 8 Logic Elements and local interconnect)

The basic logic block, called a Logic Element (LE) contains a four-input LUT, a flip-flop, and
special-purpose carry circuitry for arithmetic circuits. The LE also includes cascade circuitry that
allows for efficient implementation of wide AND functions. Details of the LE are illustrated in
Fig. 20.15.
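
The role of the cascade path in building wide AND functions can be illustrated with a simplified model. This is a sketch under stated assumptions, not the actual FLEX 8000 circuit: each logic element is assumed to AND up to four data inputs in its LUT and then AND that result with the cascade signal from the previous element. The names `le_and4` and `wide_and` are hypothetical.

```python
def le_and4(data, cascade_in=1):
    """One logic element configured as an AND stage.

    The LUT ANDs up to four data inputs; the cascade stage then ANDs the
    LUT output with the cascade signal arriving from the previous element.
    """
    if len(data) > 4:
        raise ValueError("a four-input LUT accepts at most four data inputs")
    lut_out = int(all(data))
    return lut_out & cascade_in


def wide_and(bits):
    """AND of arbitrarily many bits using a chain of cascaded logic elements."""
    cascade = 1
    for i in range(0, len(bits), 4):
        cascade = le_and4(bits[i:i + 4], cascade)
    return cascade


if __name__ == "__main__":
    print(wide_and([1] * 10))        # ten inputs need three chained elements
    print(wide_and([1] * 9 + [0]))
```

In this model a 10-input AND occupies three elements chained through the cascade path, with no use of the general-purpose interconnect between them.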

Fig. 20.15 Altera FLEX 8000 Logic Element (LE) (labels: data1 to data4, cntrl1 to cntrl4, look-up table, carry in/out, cascade in/out, D flip-flop with set/clear, clock, LE out)

In the FLEX 8000, LEs are grouped into sets of 8, called Logic Array Blocks (LABs, a term
borrowed from Altera's CPLDs). As shown in Fig. 20.16, each LAB contains local interconnect,
and each local wire can connect any LE to any other LE within the same LAB. Local
interconnect also connects to the FLEX 8000's global interconnect, called FastTrack. All
horizontal FastTrack wires are identical, and so interconnect delays in the FLEX 8000 are
more predictable than in FPGAs that employ many shorter segments, because there are fewer
programmable switches in the longer paths.

Fig. 20.16 Altera FLEX 8000 Logic Array Block (LAB) (labels: LEs, local interconnect, data and cntrl inputs, cascade and carry lines, connections to and from FastTrack interconnect, link to adjacent LAB)

FPGA Design Flow

One of the most important advantages of FPGA based design is that users can carry it out
using CAD tools provided by design automation companies. The generic design flow for an FPGA
includes the following steps:

System Design

At this stage the designer has to decide what portion of the functionality is to be implemented on
the FPGA and how to integrate that functionality with the rest of the system.

I/O integration with rest of the system

The input/output streams of the FPGA are integrated with the rest of the printed circuit board, which
allows the PCB to be designed early in the design process. FPGA vendors provide additional automation
software for the I/O design process.

Design Description

The designer describes the design functionality either by using schematic editors or by using one of the
various Hardware Description Languages (HDLs) such as Verilog or VHDL.

Synthesis

Once the design has been defined, CAD tools are used to implement it on a given FPGA.
Synthesis includes generic optimization, slack optimization and power optimization, followed by
placement and routing. Implementation includes partition, place and route. The output of the design
implementation phase is a bit-stream file.

Design Verification

The bit-stream file is fed to a simulator, which simulates the design functionality and reports errors in
the desired behavior of the design. Timing tools are used to determine the maximum clock frequency of
the design. The design is then loaded onto the target FPGA device and tested in its real
environment.

Hardware design and development

The process of creating digital logic is not unlike the embedded software development
process. A description of the hardware's structure and behavior is written in a high-level
hardware description language (usually VHDL or Verilog) and that code is then compiled and
downloaded prior to execution. Of course, schematic capture is also an option for design entry,
but it has become less popular as designs have become more complex and the language-based
tools have improved. The overall process of hardware development for programmable logic is
shown in Fig. 20.17 and described in the paragraphs that follow.
Perhaps the most striking difference between hardware and software design is the way a
developer must think about the problem. Software developers tend to think sequentially, even
when they are developing a multithreaded application. The lines of source code that they write
are always executed in that order, at least within a given thread. If there is an operating system it
is used to create the appearance of parallelism, but there is still just one execution engine. During
design entry, hardware designers must think, and program, in parallel. All of the input signals are
processed in parallel as they travel through a set of execution engines (each one a series of
macrocells and interconnections) toward their destination output signals. Therefore, the
statements of a hardware description language create structures, all of which are "executed" at
the very same time.
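
The difference between the two ways of thinking can be made concrete with a small Python sketch. It contrasts two statements executed in order with the same two statements treated as hardware structures that all sample the old state and update together. The register names a and b and both function names are illustrative assumptions, and the model deliberately ignores clocks and delays.

```python
def sequential_step(state):
    """Software view: statements run one after another."""
    s = dict(state)
    s["a"] = s["b"]  # a takes b's value ...
    s["b"] = s["a"]  # ... so b only reads back its own old value
    return s


def parallel_step(state):
    """Hardware view: every register samples the OLD state, then all update together."""
    return {"a": state["b"], "b": state["a"]}


if __name__ == "__main__":
    start = {"a": 1, "b": 2}
    print(sequential_step(start))  # both registers end up holding 2
    print(parallel_step(start))    # the two values are swapped
```

The same pair of assignments swaps the two values only in the parallel reading, which is why the order of statements in a hardware description does not imply an order of execution.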

Fig. 20.17 Programmable logic design process (flow: design entry, simulation, synthesis, place and route, download; inputs: design constraints, design library)

Typically, the design entry step is followed or interspersed with periods of functional simulation.
That's where a simulator is used to execute the design and confirm that the correct outputs are
produced for a given set of test inputs. Although problems with the size or timing of the
hardware may still crop up later, the designer can at least be sure that his logic is functionally
correct before going on to the next stage of development.
Compilation only begins after a functionally correct representation of the hardware exists. This
hardware compilation consists of two distinct steps. First, an intermediate representation of the
hardware design is produced. This step is called synthesis and the result is a representation called
a netlist. The netlist is device independent, so its contents do not depend on the particulars of the
FPGA or CPLD; it is usually stored in a standard format called the Electronic Design
Interchange Format (EDIF).
The second step in the translation process is called place & route. This step involves mapping the
logical structures described in the netlist onto actual macrocells, interconnections, and input and
output pins. This process is similar to the equivalent step in the development of a printed circuit
board, and it may likewise allow for either automatic or manual layout optimizations. The result
of the place & route process is a bitstream. This name is used generically, despite the fact that
each CPLD or FPGA (or family) has its own, usually proprietary, bitstream format. Suffice it to
say that the bitstream is the binary data that must be loaded into the FPGA or CPLD to cause that
chip to execute a particular hardware design.
Increasingly there are also debuggers available that at least allow for single-stepping the
hardware design as it executes in the programmable logic device. But those only complement a
simulation environment that is able to use some of the information generated during the place &
route step to provide gate-level simulation. Obviously, this type of integration of device-specific
information into a generic simulator requires a good working relationship between the chip and
simulation tool vendors.

Things to Ponder

Q1. Define the following acronyms as they apply to digital logic circuits:
ASIC
PAL
PLA
PLD
CPLD
FPGA

Q2. How does the granularity of the logic block influence the performance of an FPGA?

Q3. Why would anyone use programmable logic devices (PLD, PAL, PLA, CPLD, FPGA,
etc.) in place of traditional "hard-wired" logic such as NAND, NOR, AND, and OR gates? Are
there any applications where hard-wired logic would do a better job than a programmable
device?

Q4. Some programmable logic devices (and PROM memory devices as well) use tiny fuses
which are intentionally "blown" in specific patterns to represent the desired program.
Programming a device by blowing tiny fuses inside of it carries certain advantages and
disadvantages - describe what some of these are.

Q5. Use one 4 x 8 x 4 PLA to implement the functions

F1(w, x, y, z) = wx'y'z + wx'yz' + wxy'
F2(w, x, y, z) = wx'y + x'y'z

Lesson 21

Introduction to Hardware Description Languages - I

At the end of the lesson the student should be able to:

Describe a digital IC design flow and explain its various abstraction levels.
Explain the need for a hardware description language in the IC design flow.
Model simple hardware devices at various levels of abstraction using Verilog (Gate/Switch/Behavioral).
Write Verilog code meeting the prescribed requirement at a specified level.

1.1 Introduction

1.1.1 What is an HDL and where does Verilog come in?

HDL is an abbreviation of Hardware Description Language. Any digital system can be
represented at the REGISTER TRANSFER LEVEL (RTL), and HDLs are used to describe this
RTL. Verilog is one such HDL; it is a general-purpose language that is easy to learn and use.
Its syntax is similar to C. The idea is to specify how the data flows between registers and how
the design processes the data. To define RTL, hierarchical design concepts play a very
significant role. Hierarchical design methodology facilitates the digital design flow with several
levels of abstraction. Verilog HDL can utilize these levels of abstraction to produce a simplified
and efficient representation of the RTL description of any digital design.
For example, an HDL might describe the layout of the wires, resistors and transistors on
an Integrated Circuit (IC) chip, i.e., the switch level, or it may describe the design at a
higher level in terms of logic gates and flip-flops in a digital system, i.e., the gate level. Verilog
supports all of these levels.

1.1.2 Hierarchy of design methodologies

Bottom-Up Design

The traditional method of electronic design is bottom-up (designing from transistors and moving
to a higher level of gates and, finally, the system). But with the increase in design complexity
traditional bottom-up designs have to give way to new structural, hierarchical design methods.

Top-Down Design

For HDL representation it is convenient and efficient to adopt this design style. A real top-down
design allows early testing, fabrication technology independence and a structured system design, and
offers many other advantages. But it is very difficult to follow a pure top-down design. For
this reason most designs are a mix of both methods, implementing some key elements of both
design styles.

1.1.3 Hierarchical design concept and Verilog

To follow the hierarchical design concepts briefly mentioned above one has to describe the
design in terms of entities called MODULES.

Modules

A module is the basic building block in Verilog. It can be an element or a collection of low level
design blocks. Typically, elements are grouped into modules to provide common functionality
that is used in many places of the design. A module offers this functionality through its port
interface but hides its internal implementation.

1.1.4 Abstraction Levels

Behavioral level
Register-Transfer Level
Gate Level
Switch level

Behavioral or algorithmic Level

This level describes a system by concurrent algorithms (Behavioral). Each algorithm itself is
sequential, meaning that it consists of a set of instructions that are executed one after the other.
initial and always blocks, functions and tasks are some of the elements used to define the
system at this level. The intricacies of the system are not elaborated at this stage and only the
functional description of the individual blocks is prescribed. In this way the whole logic
synthesis is greatly simplified and at the same time made more efficient.

Register-Transfer Level

Designs using the Register-Transfer Level specify the characteristics of a circuit by operations
and the transfer of data between the registers. An explicit clock is used. An RTL design contains
exact timing bounds: operations are scheduled to occur at certain times. A modern definition of
RTL code is "Any code that is synthesizable is called RTL code".

Gate Level

At the gate (logic) level the characteristics of a system are described by logical links and their
timing properties. All signals are discrete signals. They can only have definite logical values ('0',
'1', 'X', 'Z'). The usable operations are predefined logic primitives (AND, OR, NOT, etc. gates).
It must be pointed out that writing gate level models by hand may not be a good idea in logic
design. Gate level code is generated by tools such as synthesis tools in the form of netlists, which are
used for gate level simulation and for the backend.

Switch Level

This is the lowest level of abstraction. A module can be implemented in terms of switches,
storage nodes and interconnection between them.

However, as has been mentioned earlier, one can mix and match all the levels of abstraction in a
design. RTL is frequently used for Verilog description that is a combination of behavioral and
dataflow while being acceptable for synthesis.

Instances

A module provides a template from where one can create objects. When a module is invoked
Verilog creates a unique object from the template, each having its own name, variables,
parameters and I/O interfaces. These are known as instances.
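
The relationship between a module (the template) and its instances can be sketched as follows. This is only an illustration: the module names half_adder and full_adder and the instance names ha0 and ha1 are assumptions made for this sketch and do not come from the lesson.

```verilog
// Template: a half adder, described once as a module.
module half_adder(sum, carry, a, b);
  output sum, carry;
  input  a, b;
  xor (sum, a, b);     // sum bit
  and (carry, a, b);   // carry bit
endmodule

// Higher-level module that creates two instances of the template.
module full_adder(sum, cout, a, b, cin);
  output sum, cout;
  input  a, b, cin;
  wire   s0, c0, c1;                  // internal nets joining the instances
  half_adder ha0(s0, c0, a, b);       // first instance, ports connected by order
  half_adder ha1(sum, c1, s0, cin);   // second instance, with its own nets
  or (cout, c0, c1);
endmodule
```

Each instance (ha0, ha1) has its own copy of the internal logic of half_adder; only the port connections differ.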

1.1.5 The Design Flow

This block diagram describes a typical design flow for the description of the digital design for
both ASIC and FPGA realizations.

LEVEL OF FLOW | TOOLS USED
Specification | Word processor like Word, Kwriter, AbiWord, Open Office
High Level Design | Word processor like Word, Kwriter, AbiWord, Open Office; for drawing waveforms use tools like waveformer or testbencher
Micro Design/Low level design | Word processor like Word, Kwriter, AbiWord, Open Office; for drawing waveforms use tools like waveformer or testbencher; for FSMs, StateCAD or some similar tool
RTL Coding | Vim, Emacs, conTEXT, HDL TurboWriter
Simulation | Modelsim, VCS, Verilog-XL, Veriwell, Finsim, iVerilog, VeriDOS
Synthesis | Design Compiler, FPGA Compiler, Synplify, Leonardo Spectrum. Synthesis tools can be downloaded for free from FPGA vendors like Altera and Xilinx
Place & Route | For FPGA use the FPGA vendor's P&R tool. ASICs require expensive P&R tools like Apollo. Students can use LASI, Magic
Post Si Validation | For ASIC and FPGA, the chip needs to be tested in a real environment. Board design and device drivers need to be in place

Specification

This is the stage at which we define the important parameters of the system that has to be
designed. For example, when designing a counter one has to decide its bit-size, whether it should
have a synchronous reset, whether it must have an active-high enable, etc.

High Level Design

This is the stage at which one defines various blocks in the design in the form of modules and
instances. For instance for a microprocessor a high level representation means splitting the
design into blocks based on their function. In this case the various blocks are registers, ALU,
Instruction Decode, Memory Interface, etc.

Micro Design/Low level design

Low level design or Micro design is the phase in which the designer describes how each block is
implemented. It contains details of state machines, counters, muxes, decoders and internal registers.
For state machine entry you can use either Word or special tools like StateCAD. It is always a
good idea to draw waveforms at the various interfaces. This is the phase where one spends a lot of
time. A sample low level design is indicated in the figure below.

RTL Coding

In RTL coding, Micro Design is converted into Verilog/VHDL code, using synthesizable
constructs of the language. Normally, vim editor is used, and conTEXT, Nedit and Emacs are
other choices.

Simulation

Simulation is the process of verifying the functional characteristics of models at any level of
abstraction. We use simulators to simulate the hardware models. The aim is to test whether the RTL
code meets the functional requirements of the specification, i.e., whether all the RTL blocks are
functionally correct. To achieve this we need to write a testbench, which generates clk, reset and
the required test vectors. A sample testbench for a counter is as shown below. Normally, we spend
60-70% of the time in verification of the design.

We use the waveform output from the simulator to see if the DUT (Device Under Test) is
functionally correct. Most simulators come with a waveform viewer. As the design becomes
complex, we write a self-checking testbench, in which the testbench applies the test vectors and
compares the output of the DUT with the expected value.
There is another kind of simulation, called timing simulation, which is done after synthesis or
after P&R (Place and Route). Here we include the gate delays and wire delays and see if the DUT
works at the rated clock speed. This is also called SDF simulation or gate level simulation.
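
A self-checking testbench of the kind described above might look like the following minimal sketch. The counter counter4, the testbench counter4_tb and all signal names are assumptions made for this illustration; the testbench generates clk and reset, applies the enable, and compares the DUT output with a reference value on every clock cycle.

```verilog
// Hypothetical DUT: 4-bit up counter with synchronous reset and enable.
module counter4(clk, reset, enable, count);
  input        clk, reset, enable;
  output [3:0] count;
  reg    [3:0] count;
  always @(posedge clk)
    if (reset)       count <= 4'b0000;
    else if (enable) count <= count + 1'b1;
endmodule

// Self-checking testbench: no waveform inspection is needed.
module counter4_tb;
  reg        clk, reset, enable;
  reg  [3:0] expected;        // reference model of the counter value
  wire [3:0] count;
  integer    errors;

  counter4 dut(clk, reset, enable, count);

  always #5 clk = ~clk;       // clock with a period of 10 time units

  initial begin
    clk = 0; errors = 0; expected = 0;
    reset = 1; enable = 0;    // hold reset over the first rising edge
    @(negedge clk);
    reset = 0; enable = 1;    // release reset, start counting
    repeat (20) begin
      @(negedge clk);         // sample midway between rising edges
      expected = expected + 1'b1;
      if (count !== expected) errors = errors + 1;
    end
    if (errors == 0) $display("PASS");
    else             $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```

Sampling on the falling edge keeps the check away from the rising edge on which the DUT updates, which avoids a race between the testbench and the design.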

Synthesis

Synthesis is the process in which a synthesis tool like Design Compiler takes in the RTL in
Verilog or VHDL, the target technology, and constraints as input and maps the RTL to target
technology primitives. After mapping the RTL to gates, the synthesis tool also does a minimal
amount of timing analysis to see if the mapped design meets the timing requirements.
(An important thing to note is that synthesis tools are not aware of wire delays; they know only
gate delays.) After synthesis there are a couple of things that are normally done before passing the
netlist to the backend (Place and Route):

Verification: Check if the RTL to gate mapping is correct.
Scan insertion: Insert the scan chain in the case of ASIC.

Place & Route

The gate-level netlist from the synthesis tool is imported into the place and route tool in the
Verilog netlist format. All the gates and flip-flops are placed, clock tree synthesis is performed
and the reset is routed. After this each block is routed. The output of the P&R tool is a GDS file,
which is used by a foundry for fabricating the ASIC. Normally the P&R tool is also used to output
the SDF file, which is back-annotated along with the gate-level netlist from P&R into a static
timing analysis tool like PrimeTime to do timing analysis.

Post Silicon Validation

Once the chip (silicon) is back from fabrication, it needs to be put in a real environment and
tested before it can be released into the market. Since the speed of simulation with RTL is very slow
(in number of clocks per second), there is always a possibility of finding a bug at this stage.

1.2 Verilog HDL: Syntax and Semantics

1.2.1 Lexical Conventions

The basic lexical conventions used by Verilog HDL are similar to those in the C programming
language. Verilog HDL is a case-sensitive language. All keywords are in lowercase.

1.2.2 Data Types

Verilog Language has two primary data types :
Nets - represent structural connections between components.
Registers - represent variables used to store data.

Every signal has a data type associated with it. Data types are:
Explicitly declared with a declaration in the Verilog code.
Implicitly declared with no declaration but used to connect structural building blocks in
the code. Implicit declarations are always net type "wire" and only one bit wide.

Types of Net

Each net type has functionality that is used to model different types of hardware (such as PMOS,
NMOS, CMOS, etc.). This is tabulated as follows:

Net Data Type | Functionality
wire, tri | Interconnecting wire - no special resolution function
wor, trior | Wired outputs OR together (models ECL)
wand, triand | Wired outputs AND together (models open-collector)
tri0, tri1 | Net pulls-down or pulls-up when not driven
supply0, supply1 | Net has a constant logic 0 or logic 1 (supply strength)

Register Data Types

Registers store the last value assigned to them until another assignment statement
changes their value.
Registers represent data storage constructs.
Register arrays are called memories.
Register data types are used as variables in procedural blocks.
A register data type is required if a signal is assigned a value within a procedural block.
Procedural blocks begin with the keyword initial or always.

Some common data types are listed in the following table:

Data Type | Functionality
reg | Unsigned variable
integer | Signed variable - 32 bits
time | Unsigned integer - 64 bits
real | Double precision floating point variable

1.2.3 Apart from these there are vectors, integer, real & time
register data types.

Some examples are as follows:
Integer
integer counter; // general purpose variable used as a counter.

initial
counter=-1; // a negative one is stored in the counter

Real
real delta; // Define a real variable called delta.

initial
begin
delta=4e10; // delta is assigned in scientific notation
delta =2.13; // delta is assigned a value 2.13
end

integer i; // define an integer I;

initial
i =delta ; // I gets the value 2(rounded value of 2.13)

Time
time save_sim_time; // define a time variable save_sim_time

initial
save_sim_time =$time; // save the current simulation time.
n.b. $time is invoked to get the current simulation time

Arrays
integer count [0:7]; // an array of 8 count variables
reg [4:0] port_id[0:7]; // Array of 8 port _ids, each 5 bit wide.
integer matrix[4:0] [0:255] ; // two dimensional array of integers.
1.2.4 Some Constructs Using Data Types

Memories

Memories are modeled simply as a one-dimensional array of registers. Each element of the array is
known as an element or word and is addressed by a single array index.
reg membit [0:1023] ; // memory membit with 1K 1-bit words
reg [7:0] membyte [0:1023]; // memory membyte with 1K 8-bit words
membyte [511] // fetches the 1-byte word whose address is 511.

Strings

A string is a sequence of characters enclosed by double quotes and all contained on a single line.
Strings used as operands in expressions and assignments are treated as a sequence of eight-bit
ASCII values, with one eight-bit ASCII value representing one character. To declare a variable
to store a string, declare a register large enough to hold the maximum number of characters the
variable will hold. Note that no extra bits are required to hold a termination character; Verilog
does not store a string termination character. Strings can be manipulated using the standard
operators.
When a variable is larger than required to hold a value being assigned, Verilog pads the contents
on the left with zeros after the assignment. This is consistent with the padding that occurs during
assignment of non-string values. Certain characters can be used in strings only when preceded by
an introductory character called an escape character. The following table lists these characters in
the right-hand column with the escape sequence that represents the character in the left-hand
column.
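
The storage and padding rules for strings can be illustrated with a short sketch. The module name string_demo and the variable greeting are assumptions made for this example; the register is sized at 8 bits per character for at most five characters.

```verilog
module string_demo;
  // 8 bits per character: room for up to 5 characters,
  // with no extra bits for a termination character.
  reg [8*5-1:0] greeting;

  initial begin
    greeting = "Hello";              // exactly fills the 40-bit register
    $display("%s", greeting);
    greeting = "Hi";                 // shorter string: padded on the left with zeros
    $display("%h", greeting);        // the upper three bytes are 00
    $display("line one\nline two");  // \n is the escape sequence for a newline
  end
endmodule
```

Because the padding is on the left, the last character of a short string always occupies the least significant byte of the register.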

Modules

Modules are the building blocks of Verilog designs.
You create design hierarchy by instantiating modules in other modules.
An instance of a module can be called in another, higher-level module.

Ports

Ports allow communication between a module and its environment.
All but the top-level modules in a hierarchy have ports.
Ports can be associated by order or by name.
You declare ports to be input, output or inout. The port declaration syntax is :
input [range_val:range_var] list_of_identifiers;
output [range_val:range_var] list_of_identifiers;
inout [range_val:range_var] list_of_identifiers;

Schematic

1.2.5 Port Connection Rules

Inputs : internally must always be type net, externally the inputs can be connected to
variable reg or net type.
Outputs : internally can be type net or reg, externally the outputs must be connected to a
variable net type.
Inouts : internally or externally must always be type net, can only be connected to a
variable net type.

Width matching: It is legal to connect internal and external ports of different sizes. But
beware, synthesis tools could report problems.
Unconnected ports : unconnected ports are allowed by using a ","
The net data types are used to connect structure.
A net data type is required if a signal can be driven by a structural connection.

Example Implicit

dff u0 ( q,,clk,d,rst,pre); // Here second port is not connected

Example Explicit

dff u0 (.q (q_out),
.q_bar (),
.clk (clk_in),
.d (d_in),
.rst (rst_in),
.pre (pre_in)); // Here second port is not connected

1.3 Gate Level Modeling

At this level of abstraction the system is modeled in terms of logic gates and the connections
between them. The gates are available as predefined building blocks known as primitives.
Verilog has built-in primitives for gates, transmission gates, switches, buffers, etc. These
primitives are instantiated like modules, except that they are predefined in Verilog and do not
need a module definition. The two basic types of gates are and/or gates and buf/not gates.

1.3.1 Gate Primitives

And/Or Gates: These have one scalar output and multiple scalar inputs. The output of the gate is
evaluated as soon as the input changes .

wire OUT, IN1, IN2, IN3;
// basic gate instantiations
and a1(OUT, IN1, IN2);
nand na1(OUT, IN1, IN2);
or or1(OUT, IN1, IN2);
nor nor1(OUT, IN1, IN2);
xor x1(OUT, IN1, IN2);
xnor nx1(OUT, IN1, IN2);
// more than two inputs; 3 input nand gate
nand na1_3inp(OUT, IN1, IN2, IN3);
// gate instantiation without instance name
and (OUT, IN1, IN2); // legal gate instantiation

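Buf/Not Gates: These gates, in contrast, have one scalar input and one or more scalar outputs.
The bufif and notif variants instantiated below additionally have a control input; their output
is z when the control signal is deasserted.
// basic gate instantiations for bufif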

bufif1 b1(out, in, ctrl);
bufif0 b0(out, in, ctrl);
// basic gate instantiations for notif
notif1 n1(out, in, ctrl);
notif0 n0(out, in, ctrl);

Array of instantiations

wire [7:0] OUT, IN1, IN2;
// basic gate instantiations
nand n_gate[7:0](OUT, IN1, IN2);

Gate-level multiplexer

A multiplexer serves as a very efficient basic logic design element.
// module 4:1 multiplexer
module mux4_to_1(out, i0, i1, i2 , i3, s1, s0);
// port declarations
output out;
input i0, i1, i2, i3;
input s1, s0;
// internal wire declarations
wire s1n, s0n;
wire y0, y1, y2, y3 ;
//gate instantiations
// create s1n and s0n signals
not (s1n, s1);
not (s0n, s0);
// 3-input and gates instantiated
and (y0, i0, s1n, s0n);
and (y1, i1, s1n, s0);
and (y2, i2, s1, s0n);
and (y3, i3, s1, s0);
// 4- input gate instantiated
or (out, y0, y1, y2, y3);
endmodule

1.3.2 Gate and Switch delays

In real circuits, logic gates have delays associated with them. Verilog provides a mechanism
to associate delays with gates:
Rise, Fall and Turn-off delays.
Minimum, Typical, and Maximum delays.

Rise Delay

The rise delay is associated with a gate output transition to 1 from another value (0,x,z).

Fall Delay

The fall delay is associated with a gate output transition to 0 from another value (1,x,z).

Turn-off Delay

The turn-off delay is associated with a gate output transition to z from another value (0,1,x).

Min Value

The min value is the minimum delay value that the gate is expected to have.

Typ Value

The typ value is the typical delay value that the gate is expected to have.

Max Value

The max value is the maximum delay value that the gate is expected to have.
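
The delay types above are attached to a gate instance with the # delay syntax. The following is a minimal sketch; the module name delay_demo, the port names and the delay values are all assumptions chosen for illustration.

```verilog
module delay_demo(out1, out2, out3, out4, in1, in2, ctrl);
  output out1, out2, out3, out4;
  input  in1, in2, ctrl;

  // One value: the same delay is used for all output transitions.
  and    #(5)       a1(out1, in1, in2);

  // Two values: rise delay = 4, fall delay = 6.
  and    #(4, 6)    a2(out2, in1, in2);

  // Three values: rise = 3, fall = 4, turn-off = 5.
  // A turn-off delay is meaningful only for gates whose output can go to z.
  bufif1 #(3, 4, 5) b1(out3, in1, ctrl);

  // Each delay may be given as min:typ:max; the simulator uses one of the
  // three values, typically selected by a simulator option.
  and    #(1:2:3, 3:4:5) a3(out4, in1, in2);
endmodule
```

If no delay is specified, the gate has zero delay.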

1.4 Verilog Behavioral Modeling

1.4.1 Procedural Blocks

Verilog behavioral code is written inside procedural blocks, with one exception: some behavioral
code also exists outside procedural blocks. We will see this in detail as we progress.
There are two types of procedural blocks in Verilog:

initial : initial blocks execute only once, starting at time zero.
always : always blocks loop to execute over and over again; in other words, as the name
suggests, they execute always.

Example initial
module initial_example();
reg clk,reset,enable,data;
initial begin
clk =0;
reset =0;
enable =0;
data =0;
end
endmodule

Execution of both initial blocks and always blocks starts at time 0. The initial block in the
example above executes all the statements between begin and end without waiting, whereas the
always block in the example below waits for its event, here the positive edge of the clock.

Example always
module always_example();
reg clk,reset,enable,q_in,data;
always @ (posedge clk)
if (reset) begin
data <=0;
end
else if (enable) begin
data <=q_in;
end
endmodule

In always block, when the trigger event occurs, the code inside begin and end is executed and
then once again the always block waits for next posedge of clock. This process of waiting and
executing on event is repeated till simulation stops.

1.4.2 Procedural Assignment Statements

Procedural assignment statements assign values to reg , integer , real , or time variables
and can not assign values to nets ( wire data types)
You can assign to the register (reg data type) the value of a net (wire), constant, another
register, or a specific value.

1.4.3 Procedural Assignment Groups

If a procedural block contains more than one statement, those statements must be enclosed within a
Sequential begin - end block
Parallel fork - join block

Example - "begin-end"
module initial_begin_end();
reg clk,reset,enable,data;
initial begin
#1 clk =0;
#10 reset =0;
#5 enable =0;
#3 data =0;
end
endmodule

Begin : clk gets 0 after 1 time unit, reset gets 0 after 11 time units, enable after 16 time units,
data after 19 time units. All the statements are executed sequentially, so the delays accumulate.

Example - "fork-join"
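module initial_fork_join();
reg clk,reset,enable,data;
initial fork
#1 clk =0;
#10 reset =0;
#5 enable =0;
#3 data =0;
join
endmodule

Fork : clk gets 0 after 1 time unit, reset after 10 time units, enable after 5 time units, data
after 3 time units. All the statements are executed in parallel, so each delay is measured from
the start of the block.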

1.4.4 Sequential Statement Groups

The begin - end keywords:
Group several statements together.
Cause the statements to be evaluated sequentially (one at a time)
o Any timing within the sequential groups is relative to the previous statement.
o Delays in the sequence accumulate (each delay is added to the previous delay)
o Block finishes after the last statement in the block.

1.4.5 Parallel Statement Groups

The fork - join keywords:
Group several statements together.
Cause the statements to be evaluated in parallel ( all at the same time).
o Timing within parallel group is absolute to the beginning of the group.
o Block finishes after the last statement completes (this is the statement with the
largest delay, which may even be the first statement in the block).

Example Parallel
module parallel();
reg a;
initial
fork
#10 a =0;
#11 a =1;
#12 a =0;
#13 a =1;
#14 $finish;
join
endmodule

Example - Mixing "begin-end" and "fork - join"
module fork_join();
reg clk,reset,enable,data;
initial begin
$display ( "Starting simulation" );
fork : FORK_VAL
#1 clk =0;
#5 reset =0;
#5 enable =0;
#2 data =0;
join
$display ( "Terminating simulation" );
#10 $finish;
end
endmodule

1.4.6 Blocking and Nonblocking assignment

Blocking assignments are executed in the order they are coded; hence they are sequential. Since
they block the execution of the next statement until the current statement has been executed, they
are called blocking assignments. The assignment is made with the "=" symbol. Example: a =b;
Nonblocking assignments are executed in parallel. Since the execution of the next statement is not
blocked by the execution of the current statement, they are called nonblocking assignments.
The assignment is made with the "<=" symbol. Example: a <=b;

Example - blocking and nonblocking
module blocking_nonblocking();
reg a, b, c, d ;
// Blocking Assignment
initial begin
#10 a =0;
#11 a =1;
#12 a =0;
#13 a =1;
end
initial begin
#10 b <=0;
#11 b <=1;
#12 b <=0;
#13 b <=1;
end
initial begin
c =#10 0;
c =#11 1;
c =#12 0;
c =#13 1;
end
initial begin
d <=#10 0;
d <=#11 1;
d <=#12 0;
d <=#13 1;
end
initial begin
$monitor( " TIME =%t A =%b B =%b C =%b D =%b" ,$time, a, b, c, d );
#50 $finish(1);
end
endmodule
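
The practical difference between the two kinds of assignment shows up most clearly when two variables are exchanged on a clock edge. The sketch below is an illustration only; the module name swap_demo and the signal names are assumptions.

```verilog
module swap_demo;
  reg a, b;   // updated with blocking assignments
  reg c, d;   // updated with nonblocking assignments
  reg clk;

  initial begin
    a = 0; b = 1; c = 0; d = 1;
    clk = 0;
    #5 clk = 1;   // a single rising edge
    #1 $display("a=%b b=%b c=%b d=%b", a, b, c, d);  // prints a=1 b=1 c=1 d=0
  end

  // Blocking: the second statement already sees the new value of a,
  // so both variables end up with the old value of b.
  always @(posedge clk) begin
    a = b;
    b = a;
  end

  // Nonblocking: both right-hand sides are sampled before either
  // variable is updated, so the two values are exchanged.
  always @(posedge clk) begin
    c <= d;
    d <= c;
  end
endmodule
```

This is why nonblocking assignments are normally preferred when modeling registers that are clocked together.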

1.4.7 The Conditional Statement if-else

The if - else statement controls the execution of other statements. In a programming language like
C, if - else controls the flow of the program. When more than one statement needs to be executed for
an if condition, we need to use begin and end as seen in earlier examples.

Syntax: if
if (condition) statements;

Syntax: if-else
if (condition) statements;
else statements;

1.4.8 Syntax: nested if-else-if

if (condition) statements;
else if (condition) statements;
................
................
else statements;

Example- simple if
module simple_if();
reg latch;
wire enable,din;
always @ (enable or din)
if (enable) begin
latch <=din;
end
endmodule

Example- if-else
module if_else();
reg dff;
wire clk,din,reset;
always @ (posedge clk)
if (reset) begin
dff <=0;
end else begin
dff <=din;
end
endmodule

Example- nested-if-else-if
module nested_if();
reg [3:0] counter;
wire clk,reset,enable, up_en, down_en;
always @ (posedge clk)
// If reset is asserted (active low)
if (reset ==1'b0) begin
counter <=4'b0000;
// If counter is enabled and up count mode is selected
end else if (enable ==1'b1 && up_en ==1'b1) begin
counter <=counter +1'b1;
// If counter is enabled and down count mode is selected
end else if (enable ==1'b1 && down_en ==1'b1) begin
counter <=counter - 1'b1;
// If counting is disabled
end else begin
counter <=counter; // Redundant code
end
endmodule

Parallel if-else

In the above example, the condition (enable ==1'b1 && up_en ==1'b1) is given the highest priority
and the condition (enable ==1'b1 && down_en ==1'b1) is given the lowest priority. We normally don't
include the reset check in the priority, as it does not fall in the combinational logic input to the
flip-flop, as shown in the figure below.

So when we need priority logic, we use nested if-else statements. On the other hand, if we don't
want to implement priority logic, knowing that only one input is active at a time, i.e. all inputs
are mutually exclusive, then we can write the code as shown below.
It is a known fact that a priority implementation takes more logic than a parallel
implementation. So if you know the inputs are mutually exclusive, you can code the logic as a
parallel if.

module parallel_if();
reg [3:0] counter;
wire clk,reset,enable, up_en, down_en;
always @ (posedge clk)
// If reset is asserted (active low)
if (reset ==1'b0) begin
counter <=4'b0000;
end else begin
// If counter is enabled and up count mode is selected
if (enable ==1'b1 && up_en ==1'b1) begin
counter <=counter +1'b1;
end
// If counter is enabled and down count mode is selected
if (enable ==1'b1 && down_en ==1'b1) begin
counter <=counter - 1'b1;
end
end
endmodule

1.4.9 The Case Statement

The case statement compares an expression with a series of cases and executes the statement or
statement group associated with the first matching case.
The case statement supports single or multiple statements.
Group multiple statements using the begin and end keywords.
The syntax of a case statement is shown below.
case (<expression>)
<case1> : <statement>
<case2> : <statement>
default : <statement>
endcase
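
As an illustration of this syntax, a 4:1 multiplexer can be written with a case statement. This is a sketch under assumed names (mux4_case, sel, i0 to i3); it is a behavioral counterpart of the gate-level multiplexer shown earlier, with the two select lines grouped into one 2-bit input.

```verilog
module mux4_case(out, i0, i1, i2, i3, sel);
  output      out;
  input       i0, i1, i2, i3;
  input [1:0] sel;
  reg         out;   // assigned in a procedural block, so it must be a reg

  always @(sel or i0 or i1 or i2 or i3)
    case (sel)
      2'b00   : out = i0;
      2'b01   : out = i1;
      2'b10   : out = i2;
      2'b11   : out = i3;
      default : out = 1'bx;   // reached only if sel contains x or z
    endcase
endmodule
```

Since the four cases are mutually exclusive, no priority between them is implied, unlike in a nested if-else-if chain.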

1.4.10 Looping Statements

Looping statements appear inside procedural blocks only. Verilog has four looping statements
like any other programming language.
forever
repeat
while
for

The forever statement

The forever loop executes continually; the loop never ends. Normally we use the forever statement
in initial blocks.
syntax : forever <statement>
One should be very careful in using a forever statement: if no timing construct is present in the
forever statement, the simulation could hang.

The repeat statement

The repeat loop executes a statement a fixed <number> of times.
syntax : repeat (<number>) <statement>

The while loop statement

The while loop executes as long as an <expression> evaluates as true. This is the same as in any
other programming language.
syntax : while (<expression>) <statement>

The for loop statement

The for loop is the same as the for loop used in any other programming language.
Executes an <initial assignment> once at the start of the loop.
Executes the loop as long as an <expression> evaluates as true.
Executes a <step assignment> at the end of each pass through the loop.
syntax : for (<initial assignment>; <expression>; <step assignment>) <statement>
Note : Verilog does not have the ++ operator as in the case of the C language.
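
The four looping statements can be seen side by side in the following sketch. The module name loop_demo and the variable names are assumptions made for this illustration.

```verilog
module loop_demo;
  integer i, sum_for, sum_while, count_repeat;
  reg clk;

  initial begin
    // for loop: add the numbers 0 to 9
    sum_for = 0;
    for (i = 0; i < 10; i = i + 1)   // i = i + 1, since there is no ++ operator
      sum_for = sum_for + i;

    // while loop: the same sum, with the loop variable handled explicitly
    sum_while = 0;
    i = 0;
    while (i < 10) begin
      sum_while = sum_while + i;
      i = i + 1;
    end

    // repeat loop: executes its statement a fixed number of times
    count_repeat = 0;
    repeat (5) count_repeat = count_repeat + 1;

    $display("for=%0d while=%0d repeat=%0d", sum_for, sum_while, count_repeat);
  end

  // forever loop: contains a timing construct (#5), so simulation time
  // advances and the simulation does not hang
  initial begin
    clk = 0;
    forever #5 clk = ~clk;
  end

  initial #100 $finish;   // stop the otherwise endless clock
endmodule
```

Both sums evaluate to 45 and the repeat counter to 5; the forever loop produces a clock with a period of 10 time units until $finish is reached.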

1.5 Switch level modeling

1.5.1 Verilog provides the ability to design at the MOS-transistor level; however, as the
complexity of circuits increases, design at this level becomes increasingly difficult. Verilog
provides only digital design capability, with drive strengths associated with the switches; analog
capability is not supported. As a matter of fact, transistors are used only as switches.

MOS switches
//MOS switch keywords
nmos
pmos

Whereas the keyword nmos is used to model a NMOS transistor, pmos is used for PMOS
transistors.

Instantiation of NMOS and PMOS switches
nmos n1(out, data, control); // instantiate a NMOS switch
pmos p1(out, data, control); // instantiate a PMOS switch

CMOS switches

Instantiation of a CMOS switch.

cmos c1(out, data, ncontrol, pcontrol ); // instantiate a cmos switch

The ncontrol and pcontrol signals are normally complements of each other

Bidirectional switches

These switches allow signal flow in both directions and are defined by keywords tran,tranif0 ,
and tranif1

Instantiation

tran t1(inout1, inout2); // instance name t1 is optional
tranif0 (inout1, inout2, control); // instance name is not specified
tranif1 (inout1, inout2, control); // instance name is not specified

1.5.2 Delay specification of switches

Delay specification | Example (pmos, nmos, rpmos, rnmos)
Zero (no delay) | pmos p1(out, data, control);
One (same delay on all transitions) | pmos #(1) p1(out, data, control);
Two (rise, fall) | nmos #(1,2) n1(out, data, control);
Three (rise, fall, turn-off) | nmos #(1,3,2) n1(out, data, control);

1.5.3 An Instance: Verilog code for a NOR- gate

// define a nor gate, my_nor
module my_nor(out, a, b);
output out;
input a, b;

//internal wires
wire c;

// set up power and ground lines
supply1 pwr; // power is connected to Vdd
supply0 gnd; // ground is connected to Vss

// instantiate pmos switches
pmos (c, pwr, b);
pmos (out, c, a);

//instantiate nmos switches
nmos (out, gnd, a);
nmos (out, gnd, b);

endmodule

Stimulus to test the NOR-gate
// stimulus to test the gate
module stimulus;
reg A, B;
wire OUT;

//instantiate the my_nor module
my_nor n1(OUT, A, B);

//Apply stimulus
initial
begin
//test all possible combinations
A=1'b0; B=1'b0;
#5 A=1'b0; B=1'b1;
#5 A=1'b1; B=1'b0;
#5 A=1'b1; B=1'b1;
end
//check results
initial
$monitor($time, " OUT =%b, A=%b, B=%b", OUT, A, B);

endmodule

1.6 Some Exercises

1.6.1 Gate level modelling

i) A 2-input xor gate can be built from my_and, my_or and my_not gates. Construct an xor module
in Verilog that realises the logic function z=xy'+x'y. The inputs are x and y, and z is the output.
Write a stimulus module that exercises all four combinations of x and y.
ii) The logic diagram for an RS latch with delay is shown.

Write the verilog description for the RS latch, including delays of 1 unit when instantiating the
nor gates. Write the stimulus module for the RS latch using the following table and verify the
outputs.

Set Reset Qn+1
0 0 qn
0 1 0
1 0 1
1 1 ?

iii) Design a 2-input multiplexer using bufif0 and bufif1 gates as shown below

The delay specification for gates b1 and b2 are as follows

Min Typ Max
Rise 1 2 3
Fall 3 4 5
Turnoff 5 6 7

1.6.2. Behavioral modelling

i) Using a while loop design a clk generator whose initial value is 0. time period of the clk is 10.
ii) Using a forever statement, design a clk with time period=10 and duty cycle =40%. Initial
value of clk is 0
iii) Using the repeat loop, delay the statement a=a+1 by 20 positive edges of clk.
iv) Design a negative edge triggered D-FF with synchronous clear, active high (D-FF clears only
at negative edge of clk when clear is high). Use behavioral statements only. (Hint: output q of D-
FF must be declared as reg.) Design a clock with a period of 10units and test the D-FF
v) Design a 4 to 1 multiplexer using if and else statements
vi) Design an 8-bit counter by using a forever loop, named block, and disabling of named block.
The counter starts counting at count =5 and finishes at count =67. The count is incremented at
positive edge of clock. The clock has a time period of 10. The counter starts through the loop
only once and then is disabled (hint: use the disable statement)

Lesson
22

Introduction to Hardware
Description Languages - II

At the end of the lesson the student should be able to

Call a task and a function in a Verilog code and distinguish between them
Plan and write test benches to a Verilog code such that it can be simulated to check the
desired results and also test the source code
Explain what are User Defined Primitives, classify them and use them in code

2.1 Task and Function

2.1.1 Task

Tasks are used in all programming languages, where they are generally known as procedures or
subroutines. The lines of code are enclosed between the keywords task and endtask. Data is passed
to the task, the processing is done, and the result is returned to the main program. Tasks have to
be specifically called, with data in and out, rather than just wired in to the general netlist.
Included in the main body of code, they can be called many times, reducing code repetition.
Tasks are defined in the module in which they are used. It is possible to define a task in a
separate file and use the compile directive `include to include the task in the file which
calls the task.
Tasks can include timing controls, like posedge, negedge, # delay and wait.
Tasks can have any number of inputs and outputs.
The variables declared within the task are local to that task. The order of declaration
within the task defines how the variables passed to the task by the caller are used.
A task can take, drive and source global variables when no local variables are used. When
local variables are used, the output is assigned only at the end of task execution.
One task can call another task or function.
A task can be used for modeling both combinational and sequential logic.
A task must be specifically called with a statement; it cannot be used within an
expression as a function can.

Syntax

task begins with the keyword task and ends with the keyword endtask
Input and output are declared after the keyword task.
Local variables are declared after input and output declaration.

module simple_task();
task convert;
input [7:0] temp_in;
output [7:0] temp_out;
begin
temp_out =(9/5) *( temp_in +32); // note: 9/5 is an integer division and evaluates to 1
end
endtask
endmodule

Example - Task using Global Variables
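module task_global ();
reg[7:0] temp_in;
reg [7:0] temp_out;
// the task has no arguments: it reads and writes the module variables directly
task convert;
begin
temp_out =(9/5) *( temp_in +32); // note: 9/5 is an integer division and evaluates to 1
end
endtask
endmodule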

Calling a task

Let us assume that the task in the first example is stored in a file called mytask.v. The advantage
of coding the task in a separate file is that it can then be used in multiple modules.
module task_calling (temp_a, temp_b, temp_c, temp_d);
input [7:0] temp_a, temp_c;
output [7:0] temp_b, temp_d;
reg [7:0] temp_b, temp_d;
`include "mytask.v"
always @ (temp_a)
begin
convert (temp_a, temp_b);
end
always @ (temp_c)
begin
convert (temp_c, temp_d);
end
endmodule

Automatic (Re-entrant) Tasks

Tasks are normally static in nature. All declared items are statically allocated and they are shared
across all uses of the task executing concurrently. Therefore, if a task is called simultaneously
from two places in the code, these task calls will operate on the same task variables, and it is highly
likely that the result will be incorrect. To avoid this, the keyword automatic is added after
the task keyword to make the task re-entrant. All items declared within an automatic task are
allocated dynamically for each invocation, so each task call operates in an independent space.

Example
// Module that contains an automatic (re-entrant) task.
// There are two clocks; clk2 runs at twice the frequency of clk and is synchronous with it.
module top;
reg [15:0] cd_xor, ef_xor; // variables in module top
reg [15:0] c, d, e, f; // variables in module top
reg clk, clk2; // clocks, assumed to be driven by stimulus not shown here
parameter delay = 10; // delay used inside the task (value chosen for illustration)
task automatic bitwise_xor;
output [15:0] ab_xor; // output from the task
input [15:0] a, b; // inputs to the task
reg [15:0] ab_and, ab_or; // local variables, allocated separately for each call
begin
#delay ab_and = a & b;
ab_or = a | b;
ab_xor = a ^ b;
end
endtask
// These two always blocks call the bitwise_xor task concurrently at each
// positive edge of clk. Because the task is re-entrant, the concurrent
// calls do not disturb each other.
always @(posedge clk)
bitwise_xor(ef_xor, e, f);
always @(posedge clk2) // twice the frequency of clk
bitwise_xor(cd_xor, c, d);
endmodule

2.1.2 Function

A function is very similar to a task, with a few differences: a function cannot drive
more than one output and it cannot contain delays.
Functions are defined in the module in which they are used. It is possible to define a
function in a separate file and use the compiler directive `include to include the function in the
file that calls it.
A function cannot include timing controls such as posedge, negedge or # delay. This means that a
function executes in zero simulation time.
A function can have any number of inputs but only one output.
The variables declared within the function are local to that function. The order of
declaration within the function defines how the variables are passed to it by the caller.
A function can read and drive global variables when no local variables are used.
When local variables are used, the output is assigned only at the end of function
execution.
A function can be used for modeling combinational logic.
A function can call other functions, but cannot call a task.

Syntax

A function begins with the keyword function and ends with the keyword endfunction
Inputs are declared after the keyword function.

Example - Simple Function
module simple_function();
function myfunction;
input a, b, c, d;
begin
myfunction =((a+b) +(c-d));
end
endfunction
endmodule

Example - Calling a Function
module function_calling(a, b, c, d, e, f);
input a, b, c, d, e ;
output f;
wire f;
`include "myfunction.v"
assign f =(myfunction (a,b,c,d)) ? e :0;
endmodule
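For comparison with a task call, the sketch below defines a function in the module that uses it and calls it inside a continuous assignment. The module name function_demo, the parity function and the stimulus values are assumptions made for this illustration.

```verilog
// Sketch: a function used inside an expression.
module function_demo;
  reg [7:0] data;
  wire      p;

  // Returns 1 when data_in contains an odd number of ones.
  function parity;
    input [7:0] data_in;
    begin
      parity = ^data_in;   // reduction XOR over all eight bits
    end
  endfunction

  assign p = parity(data);  // unlike a task, a function call is an operand

  initial begin
    data = 8'b1011_0000;    // three ones
    #1 $display("data=%b parity=%b", data, p);   // parity=1
    data = 8'b1111_0000;    // four ones
    #1 $display("data=%b parity=%b", data, p);   // parity=0
    $finish;
  end
endmodule
```

Because the function has no delays, the continuous assignment updates p in the same time step in which data changes.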

Automatic (Recursive) Function

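Functions are normally non-recursive and, like tasks, use statically allocated variables. The
keyword automatic makes a function re-entrant: each call gets its own copy of the variables, which
eliminates problems when the same function is called concurrently from two locations and makes
recursive calls possible.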

Example
// define a factorial with a recursive function
module top;
// define the function
function automatic integer factorial;
input [31:0] oper;
begin
if (oper >= 2)
factorial = factorial(oper - 1) * oper; // recursive call
else
factorial = 1;
end
endfunction
// call the function
integer result;
initial
begin
result = factorial(4); // compute the factorial of 4
$display("Factorial of 4 is %0d", result); // displays 24
end
endmodule
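
Constant function

A constant function is a regular Verilog function whose result can be computed at elaboration
time. It can be used wherever a constant expression is expected, for example to derive a vector
width from a parameter instead of writing the constant by hand.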
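A typical use is computing an address width from a memory depth. The sketch below assumes a Verilog-2001 simulator; the names const_func_demo, DEPTH, ADDR_WIDTH and clogb2 are chosen for illustration.

```verilog
// Sketch: a constant function that sizes a vector from a parameter.
module const_func_demo;
  parameter DEPTH = 256;

  // Ceiling of log2(value), evaluated at elaboration time.
  function integer clogb2;
    input [31:0] value;
    begin
      value = value - 1;
      for (clogb2 = 0; value > 0; clogb2 = clogb2 + 1)
        value = value >> 1;
    end
  endfunction

  // The function call is legal here because its argument is a constant.
  localparam ADDR_WIDTH = clogb2(DEPTH);

  reg [ADDR_WIDTH-1:0] addr;

  initial
    $display("DEPTH=%0d needs %0d address bits", DEPTH, ADDR_WIDTH);  // 8 bits
endmodule
```

Changing DEPTH then resizes addr automatically, which is the main reason to prefer a constant function over a hand-written constant.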

Signed function

These functions allow signed operations on function return values.
module top;
reg [63:0] vector; // value passed to the function (declaration added for completeness)
// signed function declaration
// returns a 64 bit signed value
function signed [63:0] compute_signed (input [63:0] vector);
// ... (function body not given in the source)
endfunction
// call to the signed function; the comparison with -3 is a signed comparison
always @(vector)
if (compute_signed(vector) < -3)
begin
// ...
end
// ...
endmodule

2.1.3 System tasks and functions

Introduction

There are tasks and functions that are used to generate inputs and check the output during
simulation. Their names begin with a dollar sign ($). The synthesis tools parse and ignore system
functions, and, hence, they can be included even in synthesizable models.

$display, $strobe, $monitor

These commands have the same syntax, and display text on the screen during simulation. They
are much less convenient than waveform display tools like GTKWave or Undertow. $display
and $strobe display once every time they are executed, whereas $monitor displays every time
one of its parameters changes. The difference between $display and $strobe is that $strobe
displays the parameters at the very end of the current simulation time unit, after all other
assignments in that time step have taken effect, rather than at the point where it is executed.
The format string is like that in C/C++, and may contain format
characters. Format characters include %d (decimal), %h (hexadecimal), %b (binary), %c
(character), %s (string), %t (time) and %m (hierarchical name). A field width can be given, as in
%5d or %5b. The letters b, h or o can be appended
to the task names to change the default format to binary, hexadecimal or octal.

Syntax

$display ("format_string", par_1, par_2, ... );
$strobe ("format_string", par_1, par_2, ... );
$monitor ("format_string", par_1, par_2, ... );
$displayb ( as above but defaults to binary..);
$strobeh (as above but defaults to hex..);
$monitoro (as above but defaults to octal..);
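The difference between the three tasks is easiest to see with a nonblocking assignment, which takes effect only at the end of the time step. The sketch below is a minimal illustration; the module name display_demo and the values are assumptions.

```verilog
// Sketch: $display, $strobe and $monitor observing the same variable.
module display_demo;
  reg [3:0] a;

  initial begin
    // Prints once now and again whenever a changes.
    $monitor("monitor: time=%0t a=%0d", $time, a);
    a = 4'd1;
    #10;
    a <= 4'd2;                       // nonblocking: updates at the end of the time step
    $display("display: a=%0d", a);   // executed immediately, still sees 1
    $strobe ("strobe : a=%0d", a);   // deferred to the end of the time step, sees 2
    #10 $finish;
  end
endmodule
```

Only one $monitor can be active at a time; a later $monitor call replaces the earlier one.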

$time, $stime, $realtime

These return the current simulation time as a 64-bit integer, a 32-bit integer, and a real number,
respectively.

$reset, $stop, $finish

$reset resets the simulation back to time 0; $stop halts the simulator and puts it in the interactive
mode where the user can enter commands; $finish exits the simulator back to the operating
system.

$scope, $showscopes

$scope(hierarchy_name) sets the current hierarchical scope to hierarchy_name. $showscopes(n)
lists all module, task and block names in the current scope (and in the scopes below it, if n is
set to 1).

$random

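$random returns a random 32-bit signed integer every time it is called. To control the sequence,
pass a seed variable as the argument on the first call; the same seed always reproduces the same
sequence. When no seed is given, the simulator supplies one, and how it is chosen depends on the
simulator, so pass an explicit seed whenever repeatability matters.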
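A minimal sketch of seeded random stimulus follows. The module name random_demo and the seed value 42 are arbitrary choices for illustration; the printed numbers depend on the simulator and are therefore not listed.

```verilog
// Sketch: repeatable random numbers with an explicit seed.
module random_demo;
  integer   seed;
  integer   i;
  reg [3:0] nibble;

  initial begin
    seed = 42;                      // arbitrary; the same seed repeats the sequence
    for (i = 0; i < 5; i = i + 1) begin
      nibble = $random(seed);       // the assignment keeps only the low 4 bits
      $display("sample %0d: %0d", i, nibble);
    end
    // Concatenation makes the value unsigned, so the result lies in 0..15.
    $display("bounded: %0d", {$random(seed)} % 16);
  end
endmodule
```

The seed must be a variable rather than a literal, because $random updates it on every call.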

$dumpfile, $dumpvars, $dumpon, $dumpoff, $dumpall

These can dump variable changes to a simulation viewer like Debussy. The dump files are
capable of dumping all the variables in a simulation. This is convenient for debugging, but can
be very slow.

Syntax

$dumpfile("filename.dmp") names the file that receives the dump.
$dumpvars dumps all variables in the design.
$dumpvars(1, top) dumps all the variables in module top only, not those in modules
instantiated in top.
$dumpvars(2, top) dumps all the variables in module top and 1 level below.
$dumpvars(n, top) dumps all the variables in module top and n-1 levels below.
$dumpvars(0, top) dumps all the variables in module top and all levels below.
$dumpon resumes dumping after a $dumpoff.
$dumpoff suspends dumping.
$dumpall writes the current value of every selected variable to the dump file.
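Because a full dump can be slow, a common pattern is to switch dumping off for the part of the run that is not of interest. The sketch below assumes a free-running 4-bit counter as the design; the module name dump_demo, the file name and the time values are illustrative.

```verilog
// Sketch: dumping only part of a simulation run.
module dump_demo;
  reg       clk;
  reg [3:0] count;

  initial begin
    clk   = 0;
    count = 0;
  end

  always #5 clk = ~clk;
  always @(posedge clk) count <= count + 1;

  initial begin
    $dumpfile("dump_demo.vcd");
    $dumpvars(0, dump_demo);   // this module and every level below it
    #40 $dumpoff;              // suspend dumping
    #40 $dumpon;               // resume dumping
    #40 $finish;
  end
endmodule
```

The resulting file can be opened in a waveform viewer; the signals appear as unknown during the interval in which dumping was suspended.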

$fopen, $fclose, $fdisplay, $fstrobe, $fmonitor and $fwrite

These commands write more selectively to files.

$fopen opens an output file and gives the open file a handle for use by the other
commands.
$fclose closes the file and lets other programs access it.
$fdisplay and $fwrite write formatted data to a file whenever they are executed. They are
the same except that $fdisplay inserts a new line after every execution and $fwrite does not.
$fstrobe also writes to a file when executed, but it waits until all other operations in the
time step are complete before writing. Thus
initial begin #1 a=1; b=0; $fstrobe(hand1, a,b); b=1; end
will write 1 1 for a and b. $fmonitor writes to a file
whenever any one of its arguments changes.

Syntax

handle1=$fopen("filenam1.suffix");
handle2=$fopen("filenam2.suffix");
$fstrobe(handle1, format, variable list); //strobe data into filenam1.suffix
$fdisplay(handle2, format, variable list); //write data into filenam2.suffix
$fwrite(handle2, format, variable list); //write data into filenam2.suffix all on one line.
//put \n in the format string where a new line is
// desired.
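The file tasks can be combined as in the following sketch, which writes a few formatted lines and closes the file. The module name file_demo and the file name results.txt are assumptions for illustration.

```verilog
// Sketch: writing formatted results to a file.
module file_demo;
  integer handle;
  integer i;

  initial begin
    handle = $fopen("results.txt");
    if (handle == 0) begin                // $fopen returns 0 when the file cannot be opened
      $display("could not open results.txt");
      $finish;
    end
    for (i = 0; i < 4; i = i + 1)
      $fdisplay(handle, "i=%0d square=%0d", i, i * i);   // one line per call
    $fwrite(handle, "no newline here, ");                // $fwrite adds no newline
    $fwrite(handle, "so this continues the same line\n");
    $fclose(handle);
  end
endmodule
```

Checking the handle before writing avoids silently losing output when the file cannot be created.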

2.2 Writing Testbenches

2.2.1 Testbenches

Testbenches are code written in an HDL to test the design blocks. A testbench is also known as
a stimulus, because a stimulus is applied to the designed block and its
functionality is tested by checking the results. For writing a testbench it is important to have the
design specifications of the "design under test" (DUT). The specifications need to be understood
clearly and a test plan made accordingly. The test plan documents the testbench
architecture and the test scenarios (test cases) in detail.

Example Counter
Consider a simple 4-bit up counter, which increments its count when ever enable is high and
resets to zero, when reset is asserted high. Reset is synchronous with clock.

Code for Counter
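// Function : 4 bit up counter
module counter (clk, reset, enable, count);
input clk, reset, enable;
output [3:0] count;
reg [3:0] count;
// Synchronous reset: count is cleared on the clock edge while reset is high.
always @ (posedge clk)
if (reset == 1'b1) begin
count <= 0;
end else if (enable == 1'b1) begin
count <= count + 1;
end
endmodule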

2.2.2 Test Plan

We will write a self-checking testbench, but we will do this in steps to help you understand the
concept of writing automated testbenches. Our testbench environment will look something like
that shown in the figure.

The DUT is instantiated in the testbench, which contains a clock generator, a reset generator, an
enable logic generator and compare logic. The compare logic calculates the expected count value
of the counter and compares the counter output with the calculated value.

2.2.3 Test Cases

Reset Test: Start with reset deasserted, then assert reset for a few clock
ticks and deassert it. Check that the counter sets its output to zero.
Enable Test: Assert/deassert enable after reset is applied.
Random assert/deassert of enable and reset.

2.2.4 Creating testbenches

There are two ways of defining a testbench.
The first way is to simply instantiate the design block (DUT) and write the code such that it
directly drives the signals in the design block. In this case the stimulus block itself is the
top-level block.
In the second style a dummy module acts as the top-level module and both the design (DUT) and
the stimulus blocks are instantiated within it. Generally, in the stimulus block the inputs to the DUT
are defined as reg and the outputs from the DUT are defined as wire. An important point is that there is
no port list for the testbench.
An example of the stimulus block is given below.
Note that the initial block below is used to set the various inputs of the DUT to a predefined
logic state.

Test Bench with Clock generator

module counter_tb;
reg clk, reset, enable;
wire [3:0] count;
counter U0 (
.clk (clk),
.reset (reset),
.enable (enable),
.count (count)
);
initial
begin
clk =0;
reset =0;
enable =0;
end

always
#5 clk =!clk;
endmodule

An initial block in Verilog is executed only once. Thus, the simulator sets the values of clk, reset and
enable to 0 (0 makes all these signals inactive). It is good design practice to keep the file name
the same as the module name.

A more elaborate version of the testbench is shown below. In this version the use of system
tasks is explored.
module counter_tb;
reg clk, reset, enable;
wire [3:0] count;
counter U0 (
.clk (clk),
.reset (reset),
.enable (enable),
.count (count)
);
initial begin
clk =0;
reset =0;
enable =0;
end

always
#5 clk =!clk;
initial begin
$dumpfile ( "counter.vcd" );
$dumpvars;
end

initial begin
$display( "\t\ttime,\tclk,\treset,\tenable,\tcount" );
$monitor( "%d,\t%b,\t%b,\t%b,\t%d" ,$time, clk,reset,enable,count);
end

initial
#100 $finish;
//Rest of testbench code after this line
endmodule

$dumpfile specifies the file in which the simulator stores the waveform, which can
be viewed later with a waveform viewer. (Please refer to the tools section for freeware
viewers.) $dumpvars instructs the simulator to start dumping all the signals
to "counter.vcd".
$display is used for printing text or variables to stdout (the screen); \t inserts a tab. The syntax is
the same as that of printf. The $monitor on the second line is a bit different: $monitor keeps track of changes to the
variables in its list (clk, reset, enable, count). Whenever any one of them changes, it
prints their values in the radix specified for each.
$finish terminates the simulation after 100 time units (note that all the initial and always
blocks start execution at time 0).

Adding the Reset Logic

Once we have the basic logic to allow us to see what our testbench is doing, we can next add the
reset logic. If we look at the test cases, we see that we had added a constraint that it should be
possible to activate reset at any time during simulation. There are many ways to achieve this,
but the following one works quite well. Verilog has a construct called an event; events can
be triggered, and also monitored to see if an event has occurred.
Let us code our reset logic in such a way that it waits for the trigger event "reset_trigger" to
happen. When this event happens, the reset logic asserts reset at the negative edge of the clock
and de-asserts it on the next negative edge, as shown in the code below. After de-asserting the
reset, the reset logic triggers another event called "reset_done_trigger". This event can then be
used elsewhere in the testbench to synchronize.

Code for the reset logic

event reset_trigger;
event reset_done_trigger;
initial begin
forever begin
@ (reset_trigger);
@ (negedge clk);
reset =1;
@ (negedge clk);
reset =0;
-> reset_done_trigger;
end
end

Adding test case logic

Moving forward, let us add logic to generate the test cases. We have the three test cases listed
in section 2.2.3; they are repeated here.
Reset Test: Start with reset deasserted, then assert reset for a few clock
ticks and deassert it. Check that the counter sets its output to zero.
Enable Test: Assert/deassert enable after reset is applied.
Random assert/deassert of enable and reset.

Adding compare Logic

To make any testbench self checking/automated, a model that mimics the DUT in functionality
needs to be designed. For the counter defined previously the model looks similar to:
reg [3:0] count_compare;
always @ (posedge clk)
if (reset ==1'b1)
count_compare <=0;
else if ( enable ==1'b1)
count_compare <=count_compare +1;

Once the logic to mimic the DUT functionality has been defined, the next step is to add the
checker logic. The checker logic at any given point keeps checking the expected value with the
actual value. Whenever there is an error, it prints out the expected and the actual values, and,
also, terminates the simulation by triggering the event terminate_sim. This can be appended to
the code above as follows:

if (count_compare !=count) begin
$display ( "DUT Error at time %d" , $time);
$display ( " Expected value %d, Got Value %d" , count_compare, count);
#5 ->terminate_sim;
end
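The pieces described in this section (clock generator, event-driven reset logic, a test case, the reference model and the checker) can be assembled as in the sketch below. It is one possible arrangement under stated assumptions, not the only one: the module name counter_selfcheck_tb, the delay values, and the choice of checking at the negative clock edge are made for this illustration, and the counter is repeated so that the sketch is self-contained.

```verilog
// DUT: 4-bit up counter with synchronous reset.
module counter (clk, reset, enable, count);
  input clk, reset, enable;
  output [3:0] count;
  reg [3:0] count;
  always @(posedge clk)
    if (reset == 1'b1)       count <= 0;
    else if (enable == 1'b1) count <= count + 1;
endmodule

module counter_selfcheck_tb;
  reg        clk, reset, enable;
  wire [3:0] count;
  reg  [3:0] count_compare;
  event reset_trigger, reset_done_trigger, terminate_sim;

  counter U0 (.clk(clk), .reset(reset), .enable(enable), .count(count));

  initial begin clk = 0; reset = 0; enable = 0; end
  always #5 clk = !clk;

  // Reset logic: one clock period of reset for each reset_trigger.
  initial forever begin
    @(reset_trigger);
    @(negedge clk) reset = 1;
    @(negedge clk) reset = 0;
    -> reset_done_trigger;
  end

  // Test case: reset, then count for ten clock periods.
  initial begin
    #12 -> reset_trigger;
    @(reset_done_trigger);
    @(negedge clk) enable = 1;
    repeat (10) @(negedge clk);
    enable = 0;
    #5 -> terminate_sim;
  end

  // Reference model that mimics the DUT.
  always @(posedge clk)
    if (reset == 1'b1)       count_compare <= 0;
    else if (enable == 1'b1) count_compare <= count_compare + 1;

  // Checker: compare at the falling edge, when both values have settled.
  always @(negedge clk)
    if (count_compare !== count) begin
      $display("DUT Error at time %0d", $time);
      $display(" Expected value %d, Got Value %d", count_compare, count);
      -> terminate_sim;
    end

  initial begin
    @(terminate_sim);
    $display("Terminating simulation");
    $finish;
  end
endmodule
```

With a correct counter the run should end without a DUT Error message. Changing the DUT, for example to count by two, is a quick way to confirm that the checker actually fires.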

2.3 User Defined Primitives

2.3.1 Verilog comes with built-in primitives like gates, transmission gates, and switches. This set
is sometimes too small, and a more complex primitive set needs to be constructed.
Verilog provides the facility to design such primitives, which are known as UDPs or User
Defined Primitives. UDPs can model:
Combinational Logic
Sequential Logic
One can include timing information along with the UDPs to model complete ASIC library
models.

Syntax

A UDP begins with the keyword primitive and ends with the keyword endprimitive. UDPs must be
defined outside the module definition.
This code shows how the input/output ports and the primitive are declared.
primitive udp_syntax (
a, // Port a
b, // Port b
c, // Port c
d // Port d
);
output a;
input b,c,d;
// UDP function code here
endprimitive

Note:
A UDP can contain only one output and up to 10 inputs.
The output port must be the first port, followed by one or more input ports.
All UDP ports are scalar, i.e. vector ports are not allowed.
UDPs cannot have bidirectional ports.

Body

The functionality of a primitive (both combinational and sequential) is described inside a table,
which ends with the reserved word endtable (as shown in the code below). For sequential UDPs, one can use
initial to assign an initial value to the output.

// This code shows what a UDP body looks like
primitive udp_body (
a, // Port a
b, // Port b
c // Port c
);
output a;
input b,c;
// UDP function code here
// A = B | C;
table
// B C : A
? 1 : 1;
1 ? : 1;
0 0 : 0;
endtable
endprimitive

Note: A UDP table cannot contain the value z. A z on an input is treated as x.

2.3.2 Combinational UDPs

In combinational UDPs, the output is determined as a function of the current input. Whenever an
input changes value, the UDP is evaluated and one of the state table rows is matched. The output
state is set to the value indicated by that row.
Let us consider the previously mentioned UDP.

TestBench to Check the above UDP

`include "udp_body.v"
module udp_body_tb();
reg b,c;
wire a;
udp_body udp (a,b,c);
initial begin
$monitor( " B =%b C =%b A =%b" ,b,c,a);
b =0;
c=0;
#1 b =1;
#1 c =1;
#1 b =1'bx;
#1 c =0;
#1 b =1;
#1 c =1'bx;
#1 b =0;
#10 $finish;
end
endmodule

Sequential UDPs

Sequential UDPs differ in the following manner from the combinational UDPs

The output of a sequential UDP is always defined as a reg
An initial statement can be used to initialize output of sequential UDPs
The format of a state table entry is somewhat different
There are 3 sections in a state table entry: inputs, current state and next state. The three
sections are separated by a colon (:) symbol.
The input specification of state table can be in term of input levels or edge transitions
The current state is the current value of the output register.
The next state is computed based on inputs and the current state. The next state becomes
the new value of the output register.
All possible combinations of inputs must be specified to avoid unknown output.

Level sensitive UDPs
// define level sensitive latch by using UDP
primitive latch (q, d, clock, clear);

// declarations
output q;
reg q; // q declared as reg to create internal storage
input d, clock, clear;

// sequential UDP initialization
// only one initial statement allowed
initial
q=0; // initialize output to value 0

// state table
table
// d clock clear : q : q+ ;
// q+ is the new output value
? ? 1 : ? : 0 ; // clear condition
1 1 0 : ? : 1 ; // latch q = data = 1
0 1 0 : ? : 0 ; // latch q = data = 0

? 0 0 : ? : - ; // retain original state if clock = 0

endtable

endprimitive

Edge sensitive UDPs

// Define edge sensitive sequential UDP
primitive edge_dff(output reg q = 0, input d, clock, clear);

// state table
table
// d clock clear : q : q+;
? ? 1 : ? : 0 ; // output=0 if clear =1
? ? (10): ? : - ; // ignore negative transition of clear
1 (10) 0 : ? : 1 ;// latch data on negative transition
0 (10) 0 : ? : 0 ;// clock

? (1x) 0 : ? : - ;// hold q if clock transitions to unknown state
? (0?) 0 : ? : - ;// ignore positive transitions of clock
? (x1) 0 : ? : - ;// ignore positive transitions of clock

(??) ? 0 : ? : - ;// ignore any change in d if clock is steady

endtable
endprimitive
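The state table above can be exercised with a small testbench. The sketch below repeats the table under the hypothetical name edge_dff_demo so that it is self-contained; the testbench name edge_dff_tb and the stimulus timing are likewise assumptions, and the comments describe the behaviour the table is intended to produce.

```verilog
// Negative-edge-triggered D flip-flop with clear, as a sequential UDP.
primitive edge_dff_demo (output reg q = 0, input d, clock, clear);
  table
  // d  clock clear : q : q+
     ?   ?     1    : ? : 0 ;   // clear
     ?   ?    (10)  : ? : - ;   // ignore release of clear
     1  (10)   0    : ? : 1 ;   // latch 1 on falling clock edge
     0  (10)   0    : ? : 0 ;   // latch 0 on falling clock edge
     ?  (1x)   0    : ? : - ;   // hold if clock goes unknown
     ?  (0?)   0    : ? : - ;   // ignore rising clock edges
     ?  (x1)   0    : ? : - ;
    (??) ?     0    : ? : - ;   // ignore data changes while clock is steady
  endtable
endprimitive

module edge_dff_tb;
  reg  d, clock, clear;
  wire q;

  edge_dff_demo u (q, d, clock, clear);

  initial begin
    $monitor("t=%0t d=%b clock=%b clear=%b q=%b", $time, d, clock, clear, q);
    d = 0; clock = 0; clear = 1;   // start with clear asserted
    #5 clear = 0;
    #5 d = 1;
    #5 clock = 1;                  // rising edge: q should hold
    #5 clock = 0;                  // falling edge: q should take d = 1
    #5 d = 0;                      // data change alone: q should hold
    #5 clock = 1;
    #5 clock = 0;                  // falling edge: q should take d = 0
    #5 clear = 1;                  // clear forces q to 0
    #5 $finish;
  end
endmodule
```

Any input transition that matches no table row drives the output to x, which is why the table lists the "ignore" cases explicitly.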

Some Exercises

1. Tasks and functions

i. Define a function to multiply two four-bit numbers. The output is a 32-bit value. Invoke the
function by using stimulus and check the results.
ii. Define a function to design an 8-function ALU that takes two 4-bit numbers a and b and computes a
5-bit result out based on a 3-bit select signal. Ignore overflow or underflow bits.
iii. Define a task to compute the even parity of a 16-bit number. The result is a 1-bit value that is
assigned to the output after 3 positive edges of the clock. (Hint: use a repeat loop in the task.)
iv. Create a design using a full adder. Use conditional compilation (`ifdef). Compile the
fulladd4 with defparam statements if the text macro DPARAM is defined by the
`define statement; otherwise compile the full adder with module instance parameter
values.
v. Consider a 4-bit full adder. Write a stimulus file to do random testing of the full adder. Use
$random to generate a 32-bit random number. Pick bits 3:0 and apply them to
input a; pick bits 7:4 and apply them to input b. Use bit 8 and apply it to c_in. Apply 20
random test vectors and observe the output.

2. Timing

i) a. Consider the negative edge triggered D-FF with asynchronous reset shown below. Write
the Verilog description for the module D-FF. Describe the path delays using parallel connection.

b. Modify the above if all the path delays are 5.

ii) Assume that a six-delay specification is to be given for all the path delays. All path delays
are equal. In the specify block define parameters t_01=4, t_10=5, t_0z=7, t_z1=2, t_z0=8. Using
the previous DFF, write the six-delay specification for all the paths.

3. UDP

i. Define a positive edge triggered D flip-flop with clear as a UDP. Signal clear is active low.
ii. Define a level sensitive latch with a preset signal. Inputs are d, clock, and preset. Output
is q. If clock=0, then q=d. If clock=1 or x, then q is unchanged. If preset=1, then q=1. If
preset=0, then q is decided by the clock and d signals. If preset=x, then q=x.
iii. Define a negative edge triggered JK FF, jk_ff, with asynchronous preset and clear as a
UDP. q=1 when preset=1 and q=0 when clear=1.

The table for the JK FF is as follows

J K qn+1
0 0 qn
0 1 0
1 0 1
1 1 qn' (toggle)

Lesson
23

Description Languages-III


Interface Verilog code to C and C++ using the Programming Language Interface
Synthesize a Verilog code and generate a netlist for layout
Verify the generated code, and carry out optimization and debugging
Classify various types of flows in Verification

3.1 Programming Language interface

3.1.1 Verilog PLI
PLI (Programming Language Interface) is a facility to invoke C or C++ functions from Verilog
code.
The function invoked in Verilog code is called a system call. Examples of built-in system calls
are $display, $stop, $random. PLI allows the user to create custom system calls, something that
Verilog syntax does not otherwise allow. Typical applications of PLI are:
Power analysis.
Code coverage tools.
Modifying the Verilog simulation data structure, e.g. for more accurate delays.
Custom output displays.
Co-simulation.
Design debug utilities.
Simulation analysis.
C-model interface to accelerate simulation.
Testbench modeling.

To support these applications, the C code must have access to the internal data
structure of the Verilog simulator. To facilitate this, the Verilog PLI provides a set of
acc routines, or access routines.

How it Works?

Write the functions in C/C++ code.
Compile them to generate a shared library (*.DLL in Windows and *.so in UNIX). Simulators
like VCS allow static linking.
Use these functions in Verilog code (mostly in the Verilog testbench).
Depending on the simulator, pass the C/C++ function details to the simulator during the compile
process of the Verilog code (this is called linking; refer to the simulator user guide to
understand how this is done).
Once linked, run the simulation like any other Verilog simulation.

The block diagram representing above is as follows:

During execution of the Verilog code by the simulator, whenever the simulator encounters the
user defined system tasks (the one which starts with $), the execution control is passed to PLI
routine (C/C++function).

Example - Hello World

Define a function hello ( ), which when called will print "Hello World". This example does not
use any of the PLI standard functions (ACC, TF and VPI). For exact linking details, the
simulator manuals must be referred. Each simulator implements its own strategy for linking with
the C/C++functions.

C Code

#include <stdio.h>
void hello () {
printf ( "\nHello World\n" );
}

Verilog Code

module hello_pli ();
initial begin
$hello;
#10 $finish;
end
endmodule

3.1.2 Running a Simulation

Once linking is done, simulation is run as a normal simulation with slight modification to the
command line options. These modifications tell the simulator that the PLI routines are being
used (e.g. Modelsim needs to know which shared objects to load in command line).

Writing PLI Application (counter example)

Write the DUT reference model and checker in C and link them to the Verilog testbench.

The requirements for writing a C model using PLI are:
A means of calling the C model whenever there is a change in the input signals (which could be
wire or reg types).
A means to get the value of the changed signals, or of any other signals in the
Verilog code, from inside the C code.
A means to drive a value onto any signal inside the Verilog code from the C code.
The Verilog PLI provides a set of routines (functions) that satisfy the above
requirements.

3.1.3 PLI Application Specification

This is best understood in the context of the counter logic above. The objective is to design the
PLI function $counter_monitor and use it to check the response of the designed counter. The
problem can be addressed in the following steps:
Implement the counter logic in C.
Implement the checker logic in C.
Terminate the simulation whenever the checker fails.
This is represented in the block diagram in figure 23.2.

Calling the C function

The change in the clock signal is monitored, and with each change the counter function is executed.
The acc_vcl_add routine is used for this. Its syntax can be found in the Verilog PLI LRM.
acc_vcl_add monitors a list of signals and, whenever any of the monitored
signals changes, calls the user-defined function (this function is called the consumer C
routine). The VCL routine has four arguments:
Handle to the monitored object
Consumer C routine to call when the object value changes
String to be passed to the consumer C routine
Predefined VCL flags: vcl_verilog_logic for logic monitoring, vcl_verilog_strength for
strength monitoring
acc_vcl_add (net, display_net, netname, vcl_verilog_logic);

C Code Basic

The desired C function is counter_monitor, which is called from the Verilog testbench. As
with any other C code, header files specific to the application are included. Here the include file
contains the declarations of the acc routines.
The access routine acc_initialize initializes the environment for access routines and must be
called from the C-language application program before the program invokes any other access
routines. Before exiting a C-language application program that calls access routines, it is
necessary to exit the access routine environment by calling acc_close at the end of the program.
#include <stdio.h>
#include "acc_user.h"
typedef char * string;
handle clk ;
handle reset ;
handle enable ;
handle dut_count ;
int count ;
void counter (); /* consumer routine, defined below */
void counter_monitor()
{
acc_initialize();
clk =acc_handle_tfarg(1);
reset =acc_handle_tfarg(2);
enable =acc_handle_tfarg(3);
dut_count =acc_handle_tfarg(4);
acc_vcl_add(clk,counter,null,vcl_verilog_logic);
acc_close();
}
/* Called by the simulator whenever clk changes value. */
void counter ()
{
printf( "Clock changed state\n" );
}

Handles are used for accessing the Verilog objects. The handle is a predefined data type that is a
pointer to a specific object in the design hierarchy. Each handle conveys information to access
routines about a unique instance of an accessible object: information about the object type and
also how and where the data pertaining to it can be obtained. The object that a handle
refers to can be passed from the Verilog code as a parameter to the function $counter_monitor.
These parameters can be accessed from the C program with the acc_handle_tfarg( ) routine.
For instance, clk = acc_handle_tfarg(1) makes clk a handle to the first
parameter passed. The other handles are assigned similarly. clk can now be added to the list of
signals to be monitored using the routine acc_vcl_add(clk, counter, null,
vcl_verilog_logic). Here clk is the handle and counter is the user function to execute when clk
changes.

Verilog Code
Below is the code of a simple testbench for the counter example. If the object being passed is an
instance, then it should be passed inside double quotes. Since here all the objects are nets or
wires, there is no need to pass them inside double quotes.
module counter_tb();
reg enable;
reg reset;
reg clk_reg;
wire clk;
wire [3:0] count;
initial begin
clk_reg =0; // clk is a wire driven from clk_reg, so the reg is initialized
reset =0;
$display( "Asserting reset" );
#10 reset =1;
#10 reset =0;
$display ( "Asserting Enable" );
#10 enable =1;
#20 enable =0;
$display ( "Terminating Simulator" );
#10 $finish;
end

always
#5 clk_reg =!clk_reg;
assign clk =clk_reg;
initial begin
// hierarchical names use the testbench module name
$counter_monitor(counter_tb.clk,counter_tb.reset,counter_tb.enable,counter_tb.count);
end

counter U(
.clk (clk),
.reset (reset),
.enable (enable),
.count (count)
);
endmodule

Access Routines

Access routines are C programming language routines that provide procedural access to
information within Verilog. Access routines perform one of two operations:
Extract information pertaining to an object from the internal data representation.
Write information pertaining to an object into the internal data representation.

Program Flow using access routines

#include "acc_user.h"
void pli_func() {
acc_initialize();
// Main body: Insert the user application code here
acc_close();
}
acc_user.h : all data structures related to access routines
acc_initialize( ) : initialize variables and set up the environment
main body : user-defined application
acc_close( ) : undo the actions taken by the function acc_initialize( )

Utility Routines

Interaction between the Verilog tool and the user's routines is handled by a set of programs that
are supplied with the Verilog toolset. Library functions defined in PLI 1.0 perform a wide variety
of operations on the parameters passed to the system call, and are used for simulation
synchronization or for implementing conditional program breakpoints.

3.2 Verilog and Synthesis

3.2.1 What is logic synthesis?

Logic synthesis is the process of converting a high-level description of a design into an optimized
gate-level netlist representation. Logic synthesis uses standard cell libraries which consist of
simple cells, such as basic logic gates like and, or, and nor, and macro cells, such as adders, muxes,
memories, and flip-flops. Standard cells put together form the technology library. Normally, a
technology library is known by its minimum feature size (0.18u, 90nm).
A circuit description is written in a hardware description language (HDL) such as Verilog. Design
constraints such as timing, area, testability, and power are considered during synthesis. A typical
design flow with a large example is given in the last example of this lesson.

3.2.2 Impact of automation on Logic synthesis

For large designs, manual conversion of the behavioral description to the gate-level
representation is prone to error. Before modern synthesis tools were available, designers could
not be sure that the design constraints would be met after fabrication. Moreover, a significant
part of the design cycle was consumed in converting the high-level design into its gate-level
representation. As a result, if the gate-level design did not meet the requirements, the
turnaround time for redesigning the blocks was very high. Each designer implemented design
blocks independently and there was very little consistency in design cycles; hence, although the
individual blocks were optimized, the overall design still contained redundant logic. Moreover,
timing, area and power dissipation were fabrication-process specific, so a change of process
required the entire design to be reworked.
Automated logic synthesis has solved these problems. The high-level design is less
prone to human error because designs are described at higher levels of abstraction. High-level
design is done without much concentration on the constraints; the tool is given the constraints
and works to meet them. The designer can go back, redesign and synthesize once again very easily
if some aspect is found unaddressed, so the turnaround time has fallen considerably. Automated
logic synthesis tools synthesize the design as a whole and, thus, an overall design optimization
is achieved. Logic synthesis allows a technology-independent design. The tools convert the design
into gates using cells from the standard cell library provided by the vendor.
Design reuse is possible for technology-independent designs. If the technology changes, the tool
is capable of mapping the design to the new library.

Constructs Not Supported in Synthesis

Construct Type       Notes
initial              Used only in testbenches.
event                Events make more sense for synchronizing testbench components.
real                 The real data type is not supported.
time                 The time data type is not supported.
force and release    force and release of data types are not supported.
assign and deassign  assign and deassign of reg data types are not supported, but
                     assign on the wire data type is supported.

Example of a Non-Synthesizable Verilog construct

Code containing one or more of the above constructs is not synthesizable. But even with
synthesizable constructs, bad coding may cause serious synthesis problems.

Example - Initial Statement

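module synthesis_initial(clk, q, d);
  input  clk, d;
  output q;
  reg    q;

  // Not synthesizable: initial blocks are meant for testbenches
  initial begin
    q <= 0;
  end

  // Synthesizable part: q captures d on every rising clock edge
  always @(posedge clk) begin
    q <= d;
  end
endmodule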

Delays are also non-synthesizable, e.g. a = #10 b; such code is useful only for simulation.
A synthesis tool normally ignores such constructs: it assumes that there is no #10 in the above
statement and treats the code as just a = b.
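
Because initial blocks are not supported in synthesis, the starting value of a register in synthesizable RTL is commonly established with a reset instead. The following is a minimal sketch under that assumption; the module name dff_sync_reset and the rst port are illustrative and do not appear in the original example.

```verilog
// Hypothetical synthesizable alternative to the initial-block example:
// the register is cleared by a synchronous, active-high reset.
module dff_sync_reset (clk, rst, d, q);
  input  clk, rst, d;
  output q;
  reg    q;

  always @(posedge clk) begin
    if (rst)
      q <= 1'b0;   // known starting value without an initial block
    else
      q <= d;      // normal operation: capture d on each rising edge
  end
endmodule
```

Holding rst high for one rising clock edge puts q into a known state, which is what the initial block was trying to achieve in simulation.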

3.2.3 Constructs and Their Description

The following constructs are supported in synthesis.

Construct Type          Keyword                                Description
ports                   input, inout, output                   Use inout only at the I/O level.
parameters              parameter                              Makes the design more generic.
module definition       module
signals and variables   wire, reg, tri                         Vectors are allowed.
instantiation           module instances,                      E.g. nand (out, a, b); coding RTL
                        primitive gate instances               this way is a bad idea.
function and tasks      function, task                         Timing constructs are ignored.
procedural              always, if, else, case, casex, casez   initial is not supported.
procedural blocks       begin, end, named blocks, disable      Disabling of named blocks is allowed.
data flow               assign                                 Delay information is ignored.
named blocks            disable                                Disabling of named blocks is supported.
loops                   for, while, forever                    while and forever loops must contain
                                                               @(posedge clk) or @(negedge clk).
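
Several of the supported constructs can be seen working together in a short module. The sketch below is illustrative only: the module name parity_gen, the parameter WIDTH and its default of 8 are assumptions made for this example. It uses a parameter to keep the design generic and a for loop with constant bounds, which a synthesis tool can typically unroll into combinational logic.

```verilog
// Hypothetical example: parity generator with a parameterized width.
module parity_gen (data, parity);
  parameter WIDTH = 8;            // assumed default, overridable per instance
  input  [WIDTH-1:0] data;
  output parity;
  reg    parity;
  integer i;

  // Combinational block: parity is the XOR of all bits of data,
  // so it is 1 when data contains an odd number of 1s.
  always @(data) begin
    parity = 1'b0;
    for (i = 0; i < WIDTH; i = i + 1)
      parity = parity ^ data[i];  // constant loop bounds, no timing control
  end
endmodule
```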

3.2.4 Operators and Their Description

Operator Type   Operator Symbol   Description
Arithmetic      *                 Multiply
                /                 Divide
                +                 Add
                -                 Subtract
                %                 Modulus
                +                 Unary plus
                -                 Unary minus
Logical         !                 Logical negation
                &&                Logical and
                ||                Logical or
Relational      >                 Greater than
                <                 Less than
                >=                Greater than or equal
                <=                Less than or equal
Equality        ==                Equality
                !=                Inequality
Reduction       &                 and
                ~&                nand
                |                 or
                ~|                nor
                ^                 xor
                ^~ or ~^          xnor
Shift           >>                Right shift
                <<                Left shift
Concatenation   { }               Concatenation
Conditional     ?                 Conditional
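
The logical and reduction operators in the table are easy to confuse with each other and with the two-operand bitwise form of the same symbol. The sketch below contrasts them; the module name operator_demo and its port names are made up for this illustration.

```verilog
// Hypothetical module contrasting bitwise, logical and reduction operators.
module operator_demo (a, b, bitwise_and, logical_and, reduce_and, reduce_xor);
  input  [3:0] a, b;
  output [3:0] bitwise_and;
  output       logical_and, reduce_and, reduce_xor;

  assign bitwise_and = a & b;    // bitwise: one result bit per bit position
  assign logical_and = a && b;   // logical: 1 if both operands are non-zero
  assign reduce_and  = &a;       // reduction: 1 only if every bit of a is 1
  assign reduce_xor  = ^a;       // reduction: 1 if a has an odd number of 1s
endmodule
```

For a = 4'b1010 and b = 4'b0101, bitwise_and is 4'b0000 while logical_and is 1, because both operands are non-zero; reduce_and and reduce_xor are both 0.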

3.2.5 Overall Logic Circuit Modeling and Synthesis in brief

Combinational Circuit modeling using assign

RTL description

This comprises the high-level description of the circuit using RTL
constructs. Some functional verification is also done at this level to ensure the validity of the
RTL description.

RTL for magnitude comparator

// module magnitude comparator
module magnitude_comparator(A_gt_B, A_lt_B, A_eq_B, A, B);
  // comparison outputs
  output A_gt_B, A_lt_B, A_eq_B;
  // 4-bit input numbers
  input [3:0] A, B;
  assign A_gt_B = (A > B);   // A greater than B
  assign A_lt_B = (A < B);   // A less than B
  assign A_eq_B = (A == B);  // A equal to B
endmodule

Translation

The RTL description is converted by the logic synthesis tool to an unoptimized, intermediate,
internal representation. At this stage the tool interprets the basic primitives and operators in
the Verilog RTL description but does not yet take design constraints such as area, timing and
power into account.

Logic optimization

The logic is optimized to remove the redundant logic. It generates the optimized internal
representation.
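
As a small illustration of what removing redundant logic means, consider the sketch below. Both modules and their names are hypothetical; the second shows the simplest form an optimizer could reduce the first to, using the Boolean identity noted in the comment.

```verilog
// Hypothetical description containing redundant logic.
module redundant_logic (a, b, y);
  input  a, b;
  output y;
  // (a & b) | (a & ~b) = a & (b | ~b) = a, so b has no effect on y
  assign y = (a & b) | (a & ~b);
endmodule

// Functionally identical form after the redundancy is removed.
module optimized_logic (a, y);
  input  a;
  output y;
  assign y = a;
endmodule
```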

Technology library

The technology library contains standard library cells which are used during synthesis to replace
the behavioral description with actual circuit components. These are the basic building blocks.
The physical layout of each cell is done first, and its area is then estimated. Finally, modeling
techniques are used to estimate the power and timing characteristics.
The library includes the following:
Functionality of the cells
Area of the different cell layouts
Timing information about the various cells
Power information of the various cells

The synthesis tools use these cells to implement the design.

// Library cells for abc_100 technology
VNAND  // 2-input nand gate
VAND   // 2-input and gate
VNOR   // 2-input nor gate
VOR    // 2-input or gate
VNOT   // not gate
VBUF   // buffer

Design constraints

Any circuit must satisfy at least three constraints, viz. area, power and timing. Optimization
demands a compromise among these three constraints. Apart from these, operating
conditions such as temperature also contribute to synthesis complexity.

Logic synthesis

The logic synthesis tool takes in the RTL design and generates an optimized gate-level
description with the help of the technology library, in keeping with the design constraints.
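
To give a feel for what a gate-level description looks like, the sketch below hand-writes a netlist for a 1-bit version of the magnitude comparator using three of the abc_100 cell names. It is not the output of any real tool. The cell port order (output first), the behavioral stand-in models for the cells, and the module name magnitude_comparator_1bit_gates are all assumptions made so that the example can be simulated on its own.

```verilog
// Stand-in behavioral models for three abc_100 cells.
// The port order (output first) is an assumption for this sketch.
module VNOT (out, in);       input in;       output out; assign out = ~in;          endmodule
module VAND (out, in0, in1); input in0, in1; output out; assign out = in0 & in1;    endmodule
module VNOR (out, in0, in1); input in0, in1; output out; assign out = ~(in0 | in1); endmodule

// Hypothetical gate-level netlist for a 1-bit magnitude comparator.
module magnitude_comparator_1bit_gates (A_gt_B, A_lt_B, A_eq_B, A, B);
  output A_gt_B, A_lt_B, A_eq_B;
  input  A, B;
  wire   A_n, B_n;

  VNOT u0 (A_n, A);
  VNOT u1 (B_n, B);
  VAND u2 (A_gt_B, A, B_n);          // greater: A = 1 and B = 0
  VAND u3 (A_lt_B, A_n, B);          // less:    A = 0 and B = 1
  VNOR u4 (A_eq_B, A_gt_B, A_lt_B);  // equal:   neither greater nor less
endmodule
```

A real netlist for the 4-bit comparator would contain many more cell instances, and the cells chosen would depend on the library and on the area, timing and power constraints.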

Verification of the gate level netlist

An optimized gate-level netlist must always be checked for its functionality and, in addition,
for whether it meets the timing specifications. Timing verification guides the adjustment of the
synthesis parameters so that the different timing constraints, such as input delay and output
delay, are met.

Functional verification

Identical stimulus is run with the original RTL and the synthesized gate-level description of the
design. The outputs are compared for matches.

module stimulus;
  reg  [3:0] A, B;
  wire A_GT_B, A_LT_B, A_EQ_B;

  // instantiate the magnitude comparator
  magnitude_comparator MC (A_GT_B, A_LT_B, A_EQ_B, A, B);

  initial
    $monitor($time, " A=%b, B=%b, A_GT_B=%b, A_LT_B=%b, A_EQ_B=%b",
             A, B, A_GT_B, A_LT_B, A_EQ_B);

  // stimulate the magnitude comparator (example vectors, chosen for illustration)
  initial begin
    A = 4'b1010; B = 4'b1001;
    #10 A = 4'b1110; B = 4'b1111;
    #10 A = 4'b0000; B = 4'b0000;
    #10 $finish;
  end
endmodule

3.3 Verification

3.3.1 Traditional verification flow

Traditional verification generally follows these steps.

1. First, a design specification must be set. This requires analysis of architectural
trade-offs and is usually done by simulating various architectural models of the design.
2. Based on this specification a functional test plan is created. This forms the framework for
verification. Based on this plan various test vectors are applied to the DUT (design under
test), written in Verilog. Functional test environments are needed to apply these test
vectors.
3. The DUT is then simulated using traditional software simulators.
4. The output is then analyzed and checked against the expected results. This can be done
manually using waveform viewers and debugging tools or else can be done automatically
by verification tools. If the output matches expected results then verification is complete.
5. Optionally, additional steps can be taken to decrease the risk of future design respin.
These include Hardware Acceleration, Hardware Emulation and assertion based
Verification.
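
Step 4 mentions that the output can be checked automatically. One common way to do this is a self-checking testbench, sketched below for the 4-bit magnitude comparator from section 3.2.5. The comparator is repeated so that the example is self-contained; the testbench name self_checking_tb, the exhaustive loop over all 256 input pairs, and the message strings are choices made for this illustration.

```verilog
// DUT: the 4-bit magnitude comparator, repeated here for completeness.
module magnitude_comparator (A_gt_B, A_lt_B, A_eq_B, A, B);
  output A_gt_B, A_lt_B, A_eq_B;
  input  [3:0] A, B;
  assign A_gt_B = (A > B);
  assign A_lt_B = (A < B);
  assign A_eq_B = (A == B);
endmodule

// Hypothetical self-checking testbench: applies every input pair and
// compares the DUT outputs with values computed from the loop indices.
module self_checking_tb;
  reg  [3:0] A, B;
  wire A_gt_B, A_lt_B, A_eq_B;
  integer i, j, errors;

  magnitude_comparator dut (A_gt_B, A_lt_B, A_eq_B, A, B);

  initial begin
    errors = 0;
    for (i = 0; i < 16; i = i + 1)
      for (j = 0; j < 16; j = j + 1) begin
        A = i; B = j;
        #1;  // let the combinational outputs settle
        if (A_gt_B !== (i > j) || A_lt_B !== (i < j) || A_eq_B !== (i == j)) begin
          errors = errors + 1;
          $display("MISMATCH: A=%d B=%d", A, B);
        end
      end
    if (errors == 0) $display("All vectors passed");
    else             $display("%0d vectors failed", errors);
    $finish;
  end
endmodule
```

For larger designs the input space cannot be enumerated like this, which is why the test plan selects a set of test vectors instead.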

Functional verification

When the specifications for a design are ready, a functional test plan is created based on them.
This is the fundamental framework of functional verification. Based on this test plan, test
vectors are selected and given as input to the design under test (DUT). The DUT is simulated to
compare its output with the desired results. If the observed results match the expected values,
verification is complete.

Functional verification Environment

Verification can be divided into three substages:
Block level verification: verification is done for blocks of code written in Verilog using a
number of test cases.
Full chip verification: the goal of full chip verification is to cover all the features of the
full chip described in the test plan.
Extended verification: this stage targets corner-case bugs.

3.3.2 Formal Verification

A formal verification tool proves a design by manipulating its inputs as much as possible. All
input changes must, however, conform to the constraints for behaviour validation. Assertions on
interfaces act as constraints to the formal tool, and the tool attempts to prove the assertions
in the RTL code false. However, if the constraints are too tight, the tool will not explore all
possible behaviours and may miss real bugs, wrongly reporting the design as correct.
Both the formal and the semi-formal methodologies have come into prominence with the
increasing complexity of designs.

3.3.3 Semi-formal verification

Semi-formal verification combines the traditional verification flow using test vectors with the
power and thoroughness of formal verification.

Semi-formal methods supplement simulation with test vectors.
Embedded assertion checks define the properties targeted by formal methods.
Embedded assertion checks define the input constraints.
Semi-formal methods explore a limited space exhaustively from the states reached by
simulation, thus maximizing the effect of simulation. The exploration is limited to a
certain point around the state reached by simulation.

3.3.4 Equivalence checking

After logic synthesis and place-and-route tools create a gate-level netlist and a physical
implementation of the RTL design, respectively, it is necessary to check whether their
functionality matches the original RTL design. This is the role of equivalence checking, an
application of formal verification. It ensures that the gate-level or physical netlist has the
same functionality as the Verilog RTL that was simulated. A logical model of both the RTL and
gate-level representations is constructed, and it is mathematically proved that their
functionality is the same.
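
A real equivalence checker works by mathematical proof rather than by simulation. For a very small circuit, however, the idea can be imitated by applying every input combination to both descriptions and comparing the outputs. The sketch below does this for a 2-to-1 multiplexer; the design, the module names and the testbench are all assumptions made for this illustration, and the approach does not scale to designs with many inputs or internal state.

```verilog
// RTL reference description of a 2-to-1 multiplexer.
module mux2_rtl (y, a, b, sel);
  output y;
  input  a, b, sel;
  assign y = sel ? b : a;
endmodule

// Gate-level version built from Verilog's built-in primitives.
module mux2_gates (y, a, b, sel);
  output y;
  input  a, b, sel;
  wire   sel_n, t0, t1;
  not (sel_n, sel);
  and (t0, a, sel_n);
  and (t1, b, sel);
  or  (y, t0, t1);
endmodule

// Hypothetical exhaustive comparison of the two descriptions.
module equivalence_tb;
  reg  a, b, sel;
  wire y_rtl, y_gates;
  integer k, mismatches;

  mux2_rtl   u_rtl   (y_rtl,   a, b, sel);
  mux2_gates u_gates (y_gates, a, b, sel);

  initial begin
    mismatches = 0;
    for (k = 0; k < 8; k = k + 1) begin
      {sel, b, a} = k;   // low three bits of k drive the inputs
      #1;
      if (y_rtl !== y_gates) mismatches = mismatches + 1;
    end
    if (mismatches == 0) $display("Outputs match on all 8 input combinations");
    else                 $display("%0d mismatches found", mismatches);
    $finish;
  end
endmodule
```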

3.4 Some Exercises

3.4.1 PLI

i) Write a user-defined system task, $count_and_gates, which counts the number of and gate
primitives in a module instance. The hierarchical module instance name is the input to the task.
Use this task to count the number of and gates in a 4-to-1 multiplexer.

3.4.2 Verilog and Synthesis

i) A 1-bit full subtractor has three inputs x, y, z (previous borrow) and two outputs D (difference)
and B (borrow). The logic equations for D and B are as follows, where ' denotes the complement:
D = x'y'z + x'yz' + xy'z' + xyz
B = x'y + x'z + yz
Write the Verilog RTL description for the full subtractor. Synthesize the full subtractor using any
technology library available. Apply identical stimulus to the RTL and gate-level netlist and
compare the outputs.

ii) Design a 3-8 decoder, using a Verilog RTL description. A 3-bit input a[2:0] is provided to the
decoder. The output of the decoder is out[7:0]. The output bit indexed by a[2:0] gets the value 1,
the other bits are 0. Synthesize the decoder, using any technology library available to you.
Optimize for smallest area. Apply identical stimulus to the RTL and gate level netlist and
compare the outputs.

iii) Write the Verilog RTL description for a 4-bit binary counter with synchronous reset that is
active high. (Hint: use an always block with the @(posedge clock) statement.) Synthesize the counter
using any technology library available to you. Optimize for smallest area. Apply identical
stimulus to the RTL and gate-level netlist and compare the outputs.
