
Modern Digital Design Flow

• Agenda

1. History of Digital Design Approach


2. HDLs
3. Design Abstraction
4. Modern Design Steps
5. Implementation Options (FPGAs)
History
• In the beginning…

1970's
- designers used Paper/Pencil & Boolean Equations to create schematics
- the drawback :
- each flop required a Boolean equation
- impractical in large designs

1980's
- schematic based designs using electronic editors
- this enabled Copy/Paste & Hierarchy
- Design-reuse was enabled which increased design sizes

mid 80's
- HDLs became more common (created mid 80's)
- Text-based Compilers (C, PASCAL) could be adapted to perform digital simulation
- Larger Designs could be described using text

[Figure: Design and Simulation, with Physical Implementation still separate]

History
• More recently

1990's
- Synthesis became practical due to increase in computational power of computers

Synthesis - the creation of circuitry from a functional description

ex) "Functional Description of MUX"

if (Sel = 0)
Out = A
else
Out = B

[Figure: Synthesis turns this description into a MUX with inputs A and B, select Sel, and output Out]

HDL
• Real Power

1990's - Now engineers had a powerful combination

"HDL"

if (Sel = 0)
Out = A
else
Out = B

"Simulation" "Synthesis"
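[Figure: the same HDL text feeds both Simulation and Synthesis; Synthesis produces a MUX with inputs A and B, select Sel, and output Out]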

HDL
• Abstraction

Engineers could now stay at a higher level of abstraction and rely on the tools to:

1) Simulate the design
2) Synthesize the circuitry

- This allows larger systems to be described/designed in the same time

- Since HW is expensive to build, using the tools to reduce prototyping was the next step

HDL
• Timing Verification
- Let the tool "Verify" timing

- Less time spent prepping design for a prototyping run

[Figure: flow from HDL through Functional Simulation, Synthesis, Technology Mapping, Place/Route (extract RC's), and Post Implementation Simulation; the "Match?" check against the functional simulation comes before Fab]

Hardware Description Languages vs.
Programming Languages
• Program structure
– instantiation of multiple components of the same type
– specify interconnections between modules via schematic
– hierarchy of modules (only leaves can be HDL in Xilinx Foundation)
• Assignment
– continuous assignment (logic always computes)
– propagation delay (computation takes time)
– timing of signals is important (when does computation have its effect)
• Data structures
– size explicitly spelled out - no dynamic structures
– no pointers
• Parallelism
– hardware is naturally parallel (must support multiple threads)
– assignments can occur in parallel (not just sequentially)

Hardware Description Languages and
Combinational Logic
• Modules - specification of inputs, outputs, bidirectional, and
internal signals
• Continuous assignment - a gate's output is a function of its
inputs at all times (doesn't need to wait to be "called")
• Propagation delay- concept of time and delay in input affecting
gate output
• Composition - connecting modules together with wires
• Hierarchy - modules encapsulate functional blocks
• Specification of don't care conditions (accomplished by setting
output to “x”)

Hardware Description Languages and
Sequential Logic
• Flip-flops
– representation of clocks - timing of state changes
– asynchronous vs. synchronous
• FSMs
– structural view (FFs separate from combinational logic)
– behavioral view (synthesis of sequencers)
• Data-paths = ALUs + registers (e.g. Combination Lock)
– use of arithmetic/logical operators
– control of storage elements
• Parallelism
– multiple state machines running in parallel
• Sequential don't cares

Design Abstraction
• At What level can we design?

Design Abstraction
• What does abstraction give us?

- The higher in abstraction we go, the more complex & larger the system becomes

- But, we let go of the details of how it performs (speed, fine tuning)

- There are engineering jobs at each level

- Gurus can span multiple levels

• What does VHDL model?

- System : Chip : Register : Gate

- VHDL lets us describe systems in two ways:

1) Structural (text netlist)


2) Behavioral (requires synthesis)

VHDL/Verilog: Structure/Behavior
• Supports structural and behavioral descriptions
• Structural
– explicit structure of the circuit
– e.g., each logic gate instantiated and connected to others
• Behavioral
– program describes input/output behavior of circuit
– many structural implementations could have same behavior
– e.g., different implementation of one Boolean function
• We’ll only be using behavioral VHDL/Verilog in design work
– rely on schematic when we want structural descriptions

Modern Digital Design Flow
• Designing Large Digital Circuits

- this is the ideal process

Digital Design Flow
• Designing Large Digital Circuits

- this is reality

Digital Design Flow
• A More Detailed Breakdown (and its Relation to our class)

HW or Lab Assignment

Write VHDL, Simulate with ModelSim

Synthesize in Quartus, Run Timing Simulation

Place/Route on FPGA, Download, Test

Take idea, create custom HW to reduce cost


start your own company
sell and become rich

Hardware Design Flow

Digital Implementation
• What options do we have for hardware implementation?

- Discrete Devices (i.e., go to the stock room and buy NAND gates & Flip-flops)

- ASICs (Application Specific Integrated Circuits, i.e., custom silicon)

- Programmable Logic (CPLDs, FPGAs)

• FPGAs have become one of the most popular technologies recently

- We’ll use an FPGA in this class to test our designs

- We’ll use the ModelSim simulator for functional simulation

- We’ll use the Altera Quartus II design software for synthesis, place/route, and post-synthesis verification.

- We’ll use an Altera Cyclone II FPGA on a DE2 evaluation board to test our designs in hardware.

FPGA's
• What is an FPGA

Field Programmable Gate Array

• An FPGA uses Re-configurable Logic Blocks

- we set the config bits of this block to set its Boolean logic function

- the configuration is a Truth Table (or Look Up Table) of functionality

[Figure: logic block with inputs In1 and In2, a config input, and output Out]

config   Out
000      NOT(In1)
001      NOT(In2)
010      OR
011      NOR
100      AND
101      NAND
110      XOR
111      XNOR
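
The config table above can be sketched in VHDL (the language introduced later in these notes). This is only an illustrative model, not how a vendor builds a LUT: the entity name logic_block and its port names are assumptions made for this example.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical model of the configurable logic block from the table:
-- the 3-bit config input selects which Boolean function drives Out1.
entity logic_block is
  port (In1, In2 : in  STD_LOGIC;
        config   : in  STD_LOGIC_VECTOR (2 downto 0);
        Out1     : out STD_LOGIC);
end entity logic_block;

architecture logic_block_arch of logic_block is
begin
  with config select
    Out1 <= not In1      when "000",
            not In2      when "001",
            In1 or In2   when "010",
            In1 nor In2  when "011",
            In1 and In2  when "100",
            In1 nand In2 when "101",
            In1 xor In2  when "110",
            In1 xnor In2 when "111",
            'X'          when others;  -- config holds a value other than 0/1
end architecture logic_block_arch;
```

In a real FPGA the config bits are loaded once at configuration time rather than driven as a port; the port is used here only so the behavior can be simulated.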

FPGA's
• LUTs = Look Up Tables

- we can program the LUTs to be whatever type of gate is needed by the design
- there are a finite number of LUTs within a given FPGA (also called "resources")

• The LUTs are configured into an ARRAY on the silicon

- Array of LUT's = Array of Gates = Gate Array


[Figure: 3 x 3 array of LUTs, each with inputs In1 and In2, a config input, and output Out]

FPGA's
• Programmable Interconnect

- there are programmable interconnect switches that connect the LUTs

LUT X LUT X LUT

X X X X X

LUT X LUT X LUT

X X X X X

LUT X LUT X LUT

FPGA's
• Configuration

- We start with a Gate Level Schematic of our design (from synthesis)


- The FPGA LUTs are configured to implement Gates

LUT X LUT X LUT

X X X X X

LUT X LUT X LUT

X X X X X

LUT X LUT X LUT

FPGA's
• Configuration

- The interconnect switches are then programmed to implement the net connections

A INV X AND X LUT

B X X X X X Out

C INV X OR X LUT

X X X X X

LUT X LUT X LUT

FPGA's
• Configuration

- The LUT and Interconnect configuration is volatile (i.e., it goes away when power is removed)

- Since the programming is done by the user after fabrication, we call it "Field Programmable"

A INV X AND X LUT

B X X X X X Out

C INV X OR X LUT

X X X X X

LUT X LUT X LUT

- We now understand where the name “Field Programmable Gate Array” comes from.

FPGA's
• Adding More Functionality

- FPGA manufacturers quickly learned that Flip-Flops would be useful

- They put a DFF next to a 4-Input LUT to form a "Configurable Logic Block" (CLB)

CLB X CLB

X X X

CLB X CLB
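
As a rough sketch of the CLB idea (a 4-input LUT with a DFF next to it), the following VHDL model may help. The names clb, LUT_INIT, LutIn and UseFF are hypothetical, and the process statement it uses belongs to behavioral VHDL, which is covered later; the IEEE NUMERIC_STD package is assumed for the index conversion.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical CLB model: one 4-input LUT followed by one D flip-flop.
entity clb is
  generic (LUT_INIT : STD_LOGIC_VECTOR (15 downto 0) := x"8000"); -- truth table; default = 4-input AND
  port (Clock : in  STD_LOGIC;
        LutIn : in  STD_LOGIC_VECTOR (3 downto 0);
        UseFF : in  STD_LOGIC;   -- '1' = registered output, '0' = combinational output
        Out1  : out STD_LOGIC);
end entity clb;

architecture clb_arch of clb is
  signal lut_out : STD_LOGIC;
  signal ff_out  : STD_LOGIC := '0';
begin
  -- the four inputs index into the 16-entry truth table
  lut_out <= LUT_INIT(to_integer(unsigned(LutIn)));

  -- D flip-flop placed next to the LUT
  process (Clock)
  begin
    if rising_edge(Clock) then
      ff_out <= lut_out;
    end if;
  end process;

  Out1 <= ff_out when UseFF = '1' else lut_out;
end architecture clb_arch;
```

With the default LUT_INIT of x"8000", only entry 15 (all inputs '1') is '1', so the LUT behaves as a 4-input AND gate.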

FPGA's
• Adding Even More Functionality

- To improve performance, common logic functions were "hard coded" on the silicon

- Block RAM
- Adders / Multipliers
- Global Clock Buffers
- even Microprocessors!

FPGA's
• What else can we program?

- Which Pins to use on the package

- What logic levels

- CMOS_33, CMOS25
- SSTL, SSTL2, etc…

VHDL

• Agenda

1. Hardware Description Languages


2. VHDL History
3. VHDL Systems and Signals
4. VHDL Entities, Architectures, and Packages
5. VHDL Data Types
6. VHDL Operators
7. VHDL Structural Design
8. VHDL Behavioral Design
9. VHDL Test Benches
VHDL History
• VHDL

V = Very High Speed Integrated Circuit


H = Hardware
D = Description
L = Language

- Originally a Department of Defense sponsored project in the 80's

- Original Intent was to Document Behavior (instead of writing system manuals)

- Original Intent was NOT synthesis, that came later

- Simulation was a given, since the designs were already in text and we had text compilers (C, ….)

- Designed by IBM, TI, Intermetrics (all sponsored by DoD)

VHDL History
• VHDL & IEEE

- In 1987, IEEE published the "VHDL Standard"

- IEEE 1076-1987 = First formal version of VHDL


- Strong "Data Typing"

- each signal/variable is typed (bit, bit_vector, real, integer)

- assignments between different types NOT allowed

- Did not handle multi-valued logic

VHDL History
• VHDL & IEEE

- What is multi-valued logic?

- when there are more possible values than 0 and 1

- we need this for real world systems such as buses

- a bus is where multiple circuits drive and receive information


- only one agent drives the bus (low impedance)
- all other agents listen (high impedance)

- how can something drive AND receive?

- a "transceiver" has both a transmit (i.e., a gate facing out) and receive (i.e., a gate facing in)

- we can draw it as follows:


[Figure: transceiver with direction control Tx/Rx']

VHDL History
• VHDL & IEEE

- What is multi-valued logic?

- but that circuit doesn't actually work, because the driving gate will always be driving

[Figure: transceiver, control Tx/Rx]

- in reality it looks like this:

[Figure: transceiver, control Tx/Rx']

- what does this look like when it is "OFF"? High Impedance

VHDL History
• VHDL & IEEE

- High Impedance

[Figure: three transceivers (Tx/Rx) sharing one bus]

- this is how circuits behave: a strong driver will control the bus when everyone else is High-Z

- When nobody is driving the bus, the bus is High-Z

- So for true behavior, VHDL has to model High-Z

- VHDL's built in types (bit and bit_vector) can only be 0 or 1, these don't cut it.

- Weak/Strong

- Some busses have multiple drivers, but some are weaker than others (e.g., MCAN)

- We should model these too
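
A minimal sketch of a bus transceiver that can go High-Z, assuming the STD_LOGIC type from the IEEE 1164 package (which provides 'Z' as well as weak values such as 'L' and 'H'). The entity and port names are hypothetical; note that "bus" itself is a reserved word in VHDL, so the shared wire is called BusLine here.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical single-bit transceiver: drives the bus when enabled,
-- otherwise releases it ('Z') and just listens.
entity transceiver is
  port (TxEnable : in    STD_LOGIC;  -- '1' = drive the bus, '0' = listen
        TxData   : in    STD_LOGIC;
        RxData   : out   STD_LOGIC;
        BusLine  : inout STD_LOGIC);
end entity transceiver;

architecture transceiver_arch of transceiver is
begin
  -- transmit side: low impedance when enabled, high impedance otherwise
  BusLine <= TxData when TxEnable = '1' else 'Z';

  -- receive side: always listening
  RxData <= BusLine;
end architecture transceiver_arch;
```

When several of these share one BusLine, the resolved STD_LOGIC type works out the value on the wire: a '0' or '1' driver wins over 'Z', and two conflicting strong drivers produce 'X'.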

VHDL History
• VHDL & IEEE

- VHDL allows users to come up with their own data types. Since the world needed multi-valued logic,
everyone started creating their own add-on packages.

- this created a lot of confusion when multiple vendors worked together (e.g., Fab Shop and Designer)

- In 1993, IEEE published an Upgrade

- IEEE 1164 - added support for Multi-Valued Logic through the "STD_LOGIC_1164" package
- IEEE 1076-1993 - revised the language itself, with better syntax consistency

- Every time there is a need for a data type, industry will start to create add-ons. Then IEEE
will create a standard to reduce confusion

- Other package standards that were added to VHDL

- 1076.2 = "Real and Complex Data Types"
- 1076.3 = "Signed and Unsigned Data Types"

- Note that 1076.2 and 1076.3 are package standards; revisions of the language itself are numbered 1076-year (1076-1987, 1076-1993, 1076-2002)

- The revision people talked about as VHDL 2006 (which then turned into VHDL 200x) was eventually published as IEEE 1076-2008

VHDL Systems and Signals
• Systems

- The world is made up of systems communicating with each other

- Systems are made up of other Systems

- A System has a particular "Behavior" and "Structure"


[Figure: an Adder system described by its Behavior (OUT = In1 + In2) and by its Structure]

- We can describe an "Adder" system in multiple ways and at multiple levels of abstraction

VHDL Systems and Signals
• System Interface

- We must first describe the system's Interface to connect it to other systems

[Figure: Adder block with inputs In1, In2 and output Out]

- An "Interface" is a description of the Inputs and Outputs

- We also call these "Ports"

VHDL Systems and Signals
• System Behavior

- We then must describe the system's behavior (or functionality)

[Figure: Adder block with inputs In1, In2 and output Out]

- There are many ways to describe the behavior in VHDL

- When describing a system, we must always describe its:

1) Interface
2) Behavior

VHDL Systems and Signals
• Signals

- Multiple Systems communicate with each other using signals

[Figure: three Adder blocks (In1, In2, Out) wired together; Internal Signals connect the adders to each other, External Signals connect them to the outside]

VHDL Entity
• VHDL

Entity - used to describe a system's interface


- we call the Inputs and Outputs "Ports"
- creating this in VHDL is called an "Entity Declaration"

Architecture - used to describe a system's behavior (or structure)


- separate from an entity
- an architecture must be tied to an entity
- creating this in VHDL is called an "Architecture Definition"

Syntax Details we'll follow:

- we put the entity and architecture together in one text file
- we name the text file with the system name used in the entity
- the file extension for VHDL is *.vhd

[Figure: file adder.vhd containing the entity declaration followed by the architecture definition]

VHDL Entity
• More Syntax Notes

- VHDL is NOT case sensitive


- Comment text is preceded by "--"
- Names must start with an alphabetic letter (not a number)
- Names can include underscore, but not two in a row (i.e., __) or as the last character.
- Names cannot be keywords (in, out, bit, ….)

VHDL Entity
• Entity Details

- an entity declaration must possess the following:

1) entity-name - user selected, same as text file

2) signal-names - user selected

- mode - direction of signal (in, out, buffer, inout)

3) signal-type - what type of data is it?


(bit, STD_LOGIC, real, integer, signed,…)

- this is where VHDL is strict!

- we say it is a "strongly typed" language

- there are built in (or pre-defined) types

(bit, bit_vector, boolean, character, integer, real, string, time)

- we can add more types for realistic behavior (i.e., buses)

VHDL Entity
• Entity Syntax

entity entity-name is

port (signal-name : mode signal-type;


signal-name : mode signal-type;

signal-name : mode signal-type);

end entity entity-name;

NOTES: - the keywords are entity, is, port, end


- multiple signal-names with the same type can be comma delimited on the same line
- the port definition is contained within parenthesis
- each signal-name line ends with a ";" except the last line (watch the ");" at the end, this will get you every time!)

VHDL Entity
• Entity Example

entity adder is

port (In1, In2 : in bit;
Out1 : out bit);

end entity adder;

[Figure: Adder block with inputs In1, In2 and output Out]

NOTES: - we can also put "Generics" within an entity, which are parameters whose values are fixed when the design is elaborated (not run-time variables)

ex) generic (BusWidth : Integer := 8);

more on generics later….

VHDL Entity
• Systems in VHDL

- Systems need to have two things described

1) Interface (I/O, Ports…)
2) Behavior (Functionality, Structure)

- In VHDL, we do this using entity and architecture

[Figure: file adder.vhd containing the entity declaration and the architecture definition]


VHDL Architecture
• Architecture Details

- an architecture is always associated with an entity (in the same file too)

- an architecture definition must possess the following:

1) architecture-name - user selected, different from entity


- we usually give something descriptive (adder_arch, and2_arch)
- some companies like to use "behavior", "structural" as the names

2) entity-name - the name of the entity that this architecture is associated with
- must already be declared before compile

3) optional items… - types


- signals : internal connections within the architecture
- constants
- functions : calling predefined blocks
- procedures : calling predefined blocks
- components : calling predefined blocks

4) end architecture - keywords to signify the end of the definition


- we follow this by the architecture name and ";"

VHDL Architecture
• Architecture Syntax

architecture architecture-name of entity-name is

type…
signal…
constant…
function…
procedure…
component…

begin

…behavior or structure

end architecture architecture-name;

NOTE: - the keywords are architecture, of, is, type…component, begin, end
- there is a ";" at the end of the last line

VHDL Architecture
• Architecture definition of an AND gate

architecture and2_arch of and2 is

begin
Out1 <= In1 and In2;

end architecture and2_arch;

• Architecture definition of an ADDER

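architecture adder_arch of adder is

begin
Out1 <= In1 + In2;

end architecture adder_arch;

[Figure: Adder block with inputs In1, In2 and output Out]

NOTE: - "+" is not pre-defined for type bit, so this only compiles if a "+" operator has been defined for the port type (see VHDL Packages)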

VHDL Packages
• VHDL is a "Strongly Typed" language…

- this means that assignments between different data types are not allowed.

- this means that operators must be defined for a given data type.

- this becomes important when we think about synthesis

ex) string + real = ???

- can we add a string to a real?


- what is a "string" in HW?
- what is a "real" in HW?

- VHDL has built-in features:

1) Data Types
2) Operators

- built-in is also called "pre-defined"

VHDL Packages
• Pre-defined Functionality

ex) there is a built in addition operator for integers

integer + integer = integer

- the built-in operator "+" works for "integers" only


- it doesn't work for "bits" as is

• Adding on Functionality

- VHDL allows us to define our own data types and operators


- a set of types, operators, functions, procedures… is called a "Package"
- A set of packages are kept in a "Library"

VHDL Packages
• IEEE Packages

- when functionality is needed in VHDL, engineers start creating add-ons using Packages

- when many packages exist to perform the same function (or are supposed to)
keeping consistency becomes a problem

- IEEE publishes "Standards" that give a consistent technique for engineers to use in VHDL

- we include the IEEE Library at the beginning of our VHDL code

syntax: library library-name;

- we include the Package within the library that we want to use

syntax: use library-name.package.function;

- we can substitute "ALL" for "function" if we want to include everything

VHDL Packages
• Common IEEE Packages

- in the IEEE library, there are common Packages that we use:

STD_LOGIC_1164
STD_LOGIC_ARITH
STD_LOGIC_SIGNED

Ex) library IEEE;


use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_SIGNED.ALL;

- libraries are defined before the entity declaration

VHDL Design
• Let's Put it all together now…

library IEEE; -- package


use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_SIGNED.ALL;

entity and2 is -- entity declaration

port (In1, In2 : in STD_LOGIC;


Out1 : out STD_LOGIC);

end entity and2;

architecture and2_arch of and2 is -- architecture definition

begin
Out1 <= In1 and In2;

end architecture and2_arch;
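
To check a design like and2 in a simulator, a small test bench can drive its inputs and compare the output against the expected values. The sketch below assumes the and2 entity above has been compiled into the same working library; the names and2_tb, A, B, Y and UUT are chosen for illustration, and the 10 ns step is arbitrary.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- A test bench has no ports: it only generates stimulus and checks results.
entity and2_tb is
end entity and2_tb;

architecture and2_tb_arch of and2_tb is

  component and2
    port (In1, In2 : in  STD_LOGIC;
          Out1     : out STD_LOGIC);
  end component;

  signal A, B, Y : STD_LOGIC;

begin

  UUT : and2 port map (In1 => A, In2 => B, Out1 => Y);

  stimulus : process
  begin
    A <= '0'; B <= '0'; wait for 10 ns;
    assert Y = '0' report "0 and 0 should be 0" severity error;

    A <= '0'; B <= '1'; wait for 10 ns;
    assert Y = '0' report "0 and 1 should be 0" severity error;

    A <= '1'; B <= '0'; wait for 10 ns;
    assert Y = '0' report "1 and 0 should be 0" severity error;

    A <= '1'; B <= '1'; wait for 10 ns;
    assert Y = '1' report "1 and 1 should be 1" severity error;

    wait;  -- stop the stimulus process
  end process stimulus;

end architecture and2_tb_arch;
```

The final "wait;" suspends the process forever, so the simulation runs out of events once all four input combinations have been applied.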

VHDL Design
• Another Example…

library IEEE; -- package


use IEEE.STD_LOGIC_1164.ALL;

entity inv1 is -- entity declaration

port (In1 : in STD_LOGIC;


Out1 : out STD_LOGIC);

end entity inv1;

architecture inv1_arch of inv1 is -- architecture definition

begin
Out1 <= not In1;

end architecture inv1_arch;

• The Pre-defined features of VHDL are kept in the STANDARD library


- but we don't need to explicitly use the STANDARD library, it is automatic

VHDL Data Types
• Signals

- a single bit is considered a Scalar quantity

- a bus (or multiple bits represented with one name) is called a Vector

- in VHDL, we can define a signal bus as:

data_bus : in bit_vector (7 downto 0); -- we will use "downto"

or

data_bus : in bit_vector (0 to 7);

- the Most Significant Bit (MSB) is ALWAYS on the left of the range description:

ex) data_bus : in bit_vector (7 downto 0);

data_bus(7) = MSB

ex) data_bus : in bit_vector (0 to 7);

data_bus(0) = MSB

VHDL Data Types
• Signals

- there are "Internal" and "External" signals

Internal - are within the Entity's Interface

External - are outside the Entity's Interface and connect it to other systems

VHDL Data Types
• Scalar Data Types (Built into VHDL)

- scalar means that the type only has one value at any given time

Boolean - values {TRUE, FALSE}


- not the same as '0' or '1'

Character - values are all symbols in the 8-bit ISO8859-1 set (i.e., Latin-1)
- examples are '0', '+', 'A', 'a', '\'

Integer - values are whole numbers from -2,147,483,647 to +2,147,483,647


- the range comes from +/- (2^31 - 1), i.e., a 32-bit signed number
- examples are -12, 0, 1002

Real - values are fractional numbers from -1.0E308 to +1.0E308


- examples are 0.0, 1.134, 1.0E5

Bit - values {'0', '1'}


- different from Boolean
- this type can be used for logic gates
- single bits are always represented with single quotes (i.e., '0', '1')

VHDL Data Types
• Array Data Types (Built into VHDL)

- array is a name that represents multiple signals

Bit_Vector - vector of bits, values {'0', '1'}


- array values are represented with double quotes (i.e., "0010")
- this type can be used for logic gates

ex) Addr_bus : in BIT_VECTOR (7 downto 0);

- unconstrained: the index range is chosen where the object is declared
- indices are of type NATURAL, so the lowest possible index is 0 (i.e., Addr_bus(0)…)

String - vector of characters, values{Latin-1}


- again use double quotes
- define using "to" or "downto" ("to" is easier for strings)

ex) Message : string (1 to 10) := "message here…"

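- first element in array has index=1 (STRING is indexed by POSITIVE), this is different from BIT_VECTOR
- the string literal must contain exactly as many characters as the declared range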

VHDL Data Types
• Physical Data Types (Built into VHDL)

- these types contain an object value and units


- NOT synthesizable

Time - range from -2,147,483,647 to +2,147,483,647


- units: fs, ps, ns, us, ms, sec, min, hr

• User-Defined Enumerated Types

- we can create our own descriptive types, useful for State Machine
- no quotes needed

ex) type States is (Red, Yellow, Green);
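
As a sketch of how such an enumerated type is typically used for a state machine, the example below cycles through the three states on each clock edge. The entity traffic_light, its ports, and the Red, Green, Yellow ordering are assumptions for illustration; the process and case statements belong to behavioral VHDL, covered later in the course.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical traffic light sequencer built around the States type.
entity traffic_light is
  port (Clock, Reset : in  STD_LOGIC;
        IsGreen      : out STD_LOGIC);
end entity traffic_light;

architecture traffic_light_arch of traffic_light is

  type States is (Red, Yellow, Green);      -- no quotes needed
  signal current_state : States := Red;

begin

  process (Clock, Reset)
  begin
    if Reset = '1' then
      current_state <= Red;                 -- asynchronous reset
    elsif rising_edge(Clock) then
      case current_state is
        when Red    => current_state <= Green;
        when Green  => current_state <= Yellow;
        when Yellow => current_state <= Red;
      end case;
    end if;
  end process;

  IsGreen <= '1' when current_state = Green else '0';

end architecture traffic_light_arch;
```

The descriptive state names show up in the simulator's waveform view, and the synthesis tool is left to choose the binary encoding.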

VHDL Operators
• VHDL Operators

- Data types define both "values" and "operators"

- There are "Pre-defined" data types

Pre-defined = Built-In = STANDARD Package

- We can add additional types/operators by including other Packages

- We'll first start with the STANDARD Package that comes with VHDL

VHDL Operators
• Logical Operators

- works on types BIT, BIT_VECTOR, BOOLEAN

- vectors must be same length

- the result is always the same type as the input

not
and
nand
or
nor
xor
xnor

VHDL Operators
• Numerical Operators

- works on types INTEGER, REAL

- the types of the input operands must be the same

+ "addition"
- "subtraction"
* "multiplication"
/ "division"
mod "modulus" (INTEGER only)
rem "remainder" (INTEGER only)
abs "absolute value"
** "exponentiation"

ex) Can we make an adder circuit yet?

A,B : in BIT_VECTOR (7 downto 0)


Z : out BIT_VECTOR (7 downto 0)

Z <= A + B;

- Not yet: the pre-defined "+" works on INTEGER and REAL, not on BIT_VECTOR
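
One common way to get a working 8-bit adder is to bring in a package that defines "+" for vector types. The sketch below assumes the IEEE NUMERIC_STD package (the 1076.3 signed/unsigned types) and STD_LOGIC_VECTOR ports rather than BIT_VECTOR; the entity name adder8 is hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity adder8 is
  port (A, B : in  STD_LOGIC_VECTOR (7 downto 0);
        Z    : out STD_LOGIC_VECTOR (7 downto 0));
end entity adder8;

architecture adder8_arch of adder8 is
begin
  -- convert to UNSIGNED so that "+" is defined, then convert the sum back;
  -- the result is 8 bits wide, so the carry out is discarded
  Z <= STD_LOGIC_VECTOR(unsigned(A) + unsigned(B));
end architecture adder8_arch;
```

The explicit conversions are the price of strong typing: the designer states whether the bits are to be treated as an unsigned or a signed number.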

VHDL Operators
• Relational Operators

- used to compare objects

- objects must be of same type

- Output is always BOOLEAN (TRUE, FALSE)

- works on types: BOOLEAN, BIT, BIT_VECTOR, CHARACTER, INTEGER, REAL, TIME, STRING

= "equal"
/= "not equal"
< "less than"
<= "less than or equal"
> "greater than"
>= "greater than or equal"

VHDL Operators
• Shift Operators

- works on one-dimensional arrays

- works on arrays that contain types BIT, BOOLEAN

- the operator requires


1) An Operand (what is to be shifted)
2) Number of Shifts (specified as an INTEGER)

- a negative Number of Shifts (i.e., "-") is valid and reverses the direction of the shift

sll "shift left logical"       0011 sll 1 = 0110
srl "shift right logical"      1100 srl 2 = 0011
sla "shift left arithmetic"    1100 sla 1 = 1000 (rightmost = 0, insert 0)
sra "shift right arithmetic"   1100 sra 2 = 1111 (leftmost = 1, insert 1)
rol "rotate left"              1001 rol 1 = 0011
ror "rotate right"             0110 ror 1 = 0011

- with a negative Number of Shifts, an operator behaves like its opposite:

ex) 1100 ror -1 = 1100 rol 1 = 1001
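
The shift examples can be checked in a simulator with a few assert statements. This is a small sketch using only the pre-defined BIT_VECTOR type; the entity name shift_demo and the constant names are made up for the example.

```vhdl
entity shift_demo is
end entity shift_demo;

architecture shift_demo_arch of shift_demo is
  constant v0011 : BIT_VECTOR (3 downto 0) := "0011";
  constant v1100 : BIT_VECTOR (3 downto 0) := "1100";
  constant v1001 : BIT_VECTOR (3 downto 0) := "1001";
  constant v0110 : BIT_VECTOR (3 downto 0) := "0110";
begin
  process
  begin
    assert (v0011 sll 1) = "0110" report "sll example failed" severity error;
    assert (v1100 srl 2) = "0011" report "srl example failed" severity error;
    assert (v1100 sla 1) = "1000" report "sla example failed" severity error;
    assert (v1100 sra 2) = "1111" report "sra example failed" severity error;
    assert (v1001 rol 1) = "0011" report "rol example failed" severity error;
    assert (v0110 ror 1) = "0011" report "ror example failed" severity error;

    -- a negative count reverses the direction: ror -1 behaves like rol 1
    assert (v1100 ror (-1)) = (v1100 rol 1) report "negative count failed" severity error;

    wait;  -- run the checks once
  end process;
end architecture shift_demo_arch;
```

An assert only prints its report when the condition is FALSE, so a silent run means every example held.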

VHDL Operators
• Concatenation Operator

- combines objects of same type into an array

- the order is preserved

& "concatenate"

ex) New_Bus <= Bus1(7 downto 4) & Bus2(3 downto 0);

VHDL Operators
• Assignment Operators

- The assignment operator is <=

- The Result is always on the Left, Operands on the Right

- Operands and result must all be of the same type

- need to watch the length of arrays!

Ex) x <=y;

a <= b or c;

sum <= x + y;

NewBus <= m & k;

VHDL Operators
• Delay Modeling

- VHDL allows us to include timing information into assignment statements

- this gives us the ability to model real world gate delay

- we use the keyword "after" in our assignment followed by a time operand.

Ex) B <= not A after 2 ns;

- VHDL has two types of timing models that allow more accurate representation of real gates

1) Inertial Delay (default)

2) Transport Delay

VHDL Operators
• Inertial Delay

- if the input has two edge transitions in less time than the inertial delay, the pulse is ignored

said another way…

- if the input pulse width is smaller than the delay, it is ignored

- this models the behavior of trying to charge up the gate capacitance of a MOSFET

ex) B <= A after 5 ns;

any pulses less than 5 ns in width are ignored.

VHDL Operators
• Transport Delay

- transport delay will always pass the pulse, no matter how small it is.

- this models the behavior of transmission lines

- we have to explicitly call out this type of delay using the "transport" keyword

ex) B <= transport A after 5 ns;

B <= transport not A after t_delay; -- here we used a constant
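
The difference between the two delay models shows up when a pulse is narrower than the delay. The sketch below, with hypothetical names delay_demo, B_inertial and B_transport, applies a 2 ns pulse to two assignments that each have a 5 ns delay.

```vhdl
entity delay_demo is
end entity delay_demo;

architecture delay_demo_arch of delay_demo is
  signal A           : BIT := '0';
  signal B_inertial  : BIT;
  signal B_transport : BIT;
begin

  B_inertial  <= A after 5 ns;             -- inertial delay (the default)
  B_transport <= transport A after 5 ns;   -- transport delay

  stimulus : process
  begin
    wait for 10 ns;
    A <= '1';          -- 2 ns pulse, narrower than the 5 ns delay
    wait for 2 ns;
    A <= '0';
    wait for 20 ns;    -- leave time to observe both outputs
    wait;
  end process stimulus;

end architecture delay_demo_arch;
```

In the waveform, B_transport should reproduce the 2 ns pulse 5 ns later, while B_inertial should stay at '0' because the pulse is shorter than the inertial delay.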

Generics vs. Constants
• Generics vs. Constants

- it is very useful to be able to design using variables/parameters instead of hard coded values

ex) width of bus, delay, loop counters

- VHDL Provides two methods for this functionality

1) Generics
2) Constants

- These are similar but have subtle differences

Generics vs. Constants
• Generics

- declared in Entity

- design can be compiled without initialization

- parameter whose value is fixed when the design is elaborated (it can be overridden for each instance, but it cannot be altered at run-time)

- is visible to all architectures below that entity

syntax:
generic (gen-name : gen-type := init-val);

NOTE: init-val is optional

ex) entity inv_n is

generic (WIDTH : integer := 7);

port (In1 : in STD_LOGIC_VECTOR (WIDTH downto 0);
Out1 : out STD_LOGIC_VECTOR (WIDTH downto 0) );

end entity inv_n;

NOTE: a port with no mode defaults to "in", so the "out" on Out1 must be written explicitly

Generics vs. Constants
• Constants

- declared in Architecture

- needs to be initialized

- only visible to the architecture it is defined in

syntax:
constant const-name : const-type := init-val;

NOTE: init-val is NOT optional

ex) architecture inv_n_arch of inv_n is

constant t_dly : time := 1 ns;

begin
Out1 <= not In1 after t_dly;

end architecture inv_n_arch;
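
Putting the generic and the constant together, and overriding the generic from a higher level design, might look like the sketch below. The wrapper entity inv_top and its ports D and Q are hypothetical; the "generic map" clause is what sets WIDTH for this particular instance.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity inv_n is
  generic (WIDTH : integer := 7);
  port (In1  : in  STD_LOGIC_VECTOR (WIDTH downto 0);
        Out1 : out STD_LOGIC_VECTOR (WIDTH downto 0));
end entity inv_n;

architecture inv_n_arch of inv_n is
  constant t_dly : time := 1 ns;    -- visible only inside this architecture
begin
  Out1 <= not In1 after t_dly;
end architecture inv_n_arch;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical top level that needs a 16-bit inverter.
entity inv_top is
  port (D : in  STD_LOGIC_VECTOR (15 downto 0);
        Q : out STD_LOGIC_VECTOR (15 downto 0));
end entity inv_top;

architecture inv_top_arch of inv_top is
  component inv_n
    generic (WIDTH : integer := 7);
    port (In1  : in  STD_LOGIC_VECTOR (WIDTH downto 0);
          Out1 : out STD_LOGIC_VECTOR (WIDTH downto 0));
  end component;
begin
  -- WIDTH => 15 replaces the default of 7 for this instance only
  U1 : inv_n generic map (WIDTH => 15)
             port map (In1 => D, Out1 => Q);
end architecture inv_top_arch;
```

The same inv_n source can therefore serve any bus width, while t_dly stays a private detail of the architecture.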

VHDL Concurrent Signal Assignments

• Concurrency

- the way that our designs are simulated is important in modeling real HW behavior

- components are executed concurrently (i.e., at the same time)

- VHDL gives us another method to describe concurrent logic behavior called "Concurrent Signal Assignments"

- we simply list our signal assignments (<=) after the "begin" statement in the architecture

- each time any signal on the Right Hand Side (RHS) of the expression changes,
the Left Hand Side (LHS) of the assignment is updated.

- operators can be included (and, or, +, …)

VHDL Concurrent Signal Assignments

• Concurrent Signal Assignment Example

entity TOP is
port (A,B,C : in STD_LOGIC;
X : out STD_LOGIC);
end entity TOP;

architecture TOP_arch of TOP is

signal node1 : STD_LOGIC;

begin

node1 <= A xor B;


X <= node1 or C;

end architecture TOP_arch;

VHDL Concurrent Signal Assignments

• Concurrent Signal Assignment Example

node1 <= A xor B;
X <= node1 or C;

- if these are executed concurrently, does it model the real behavior of this circuit?

Yes, that is how these gates operate. We can see that there may be timing that
needs to be considered….

- When does C get to the OR gate relative to (A xor B)?


- Could this cause a glitch on X? What about a delay in the actual value?

VHDL Concurrent Signal Assignments

• Conditional Signal Assignments

- we can also include conditional situations in a concurrent assignment

- the keywords for these are:

"when" = if the condition is TRUE, make this assignment
"else" = if the condition is FALSE, make this assignment

- conditions are tested in the order written, so this describes priority logic

ex) X <= '1' when A='0' else '0';


Y <= '0' when A='0' and C='0' else '1';

- X and Y are evaluated concurrently !!!

- notice that we are assigning static values (0 and 1), this is essentially a "Truth Table"

- if using this notation, make sure to include every possible input condition, or else you haven't
described the full operation of the circuit.

VHDL Concurrent Signal Assignments

• Conditional Signal Assignments

- We can also assign signals to other signals using conditions

- this is similar to a MUX

ex) X <= A when Sel='0' else B;

- Again, make sure to include every possible input condition, or else you haven't
described the full operation of the circuit.

- If you try to synthesize an incomplete description, the tool has to fill in the missing behavior itself (typically by inferring a latch to hold the old value)
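
A complete 2-to-1 MUX built from a conditional signal assignment could look like the following sketch. The entity name mux2 is an assumption; the final else branch is there because a STD_LOGIC select line can take values other than '0' and '1'.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity mux2 is
  port (A, B, Sel : in  STD_LOGIC;
        X         : out STD_LOGIC);
end entity mux2;

architecture mux2_arch of mux2 is
begin
  -- every possible value of Sel is covered, so nothing is left
  -- for the synthesis tool to guess
  X <= A when Sel = '0' else
       B when Sel = '1' else
       'X';              -- Sel is unknown, high impedance, etc.
end architecture mux2_arch;
```

The 'X' branch mainly matters in simulation, where it makes an undriven or unknown select line visible on the output instead of silently passing B.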

VHDL Concurrent Signal Assignments

• Selected Signal Assignment

- We can also use a technique that allows the listing of "choices" and "assignments" in a comma
delimited fashion.

- this is called "Selected Signal Assignment" but it is still CONCURRENTLY assigned

syntax:

with expression select

signal-name <= signal-value when choices,
               signal-value when choices,
               :
               signal-value when others;

- the choices have no priority and must not overlap

- we use the term "others" to describe any input condition that isn't explicitly described

VHDL Concurrent Signal Assignments

• Selected Signal Assignment Example

Describe the following Truth Table using Selected Signal Assignments:

Input   X
000     0
001     1
010     1
011     0
100     1
101     1
110     0
111     0

begin
with Input select
X <= '0' when "000",
     '1' when "001",
     '1' when "010",
     '0' when "011",
     '1' when "100",
     '1' when "101",
     '0' when "110",
     '0' when "111";

VHDL Concurrent Signal Assignments

• Selected Signal Assignment Example

- we can shorten the description by using "others" for the 0's

- we can also use "|" delimited choices

Input   X
000     0
001     1
010     1
011     0
100     1
101     1
110     0
111     0

begin
with Input select
X <= '1' when "001" | "010" | "100" | "101",
     '0' when others;

VHDL Structural Design
• Structural Design

- we can specify functionality in an architecture in two ways

1) Structurally : text based schematic, manual instantiation of another system
   (suited to small designs where the internal connections are clear and straightforward)

2) Behaviorally : abstract description of functionality

- we will start with learning Structural VHDL design

• Components

- blocks that already exist and are included into a higher level design

- we need to know the entity declaration of the system we are calling

- we "declare" a component using the keyword "component"

- we declare the component in the architecture which indicates we wish to use it

VHDL Structural Design
• Component Syntax

component component-name

port (signal-name : mode signal-type;


signal-name : mode signal-type); -- exactly the same as the Entity declaration

end component;

• Let's build this…

VHDL Structural Design
• Component Example

- let's use these pre-existing entities "xor2" & "or2"

entity xor2 is

port (In1, In2 : in STD_LOGIC;


Out1 : out STD_LOGIC);

end entity xor2;

entity or2 is

port (In1, In2 : in STD_LOGIC;


Out1 : out STD_LOGIC);

end entity or2;

VHDL Structural Design
• Component Example

- now let's include the pre-existing entities "xor2" & "or2" into our "TOP" design

entity TOP is
port (A,B,C : in STD_LOGIC;
X : out STD_LOGIC);
end entity TOP;

architecture TOP_arch of TOP is

component xor2 -- declaration of xor2 component


port (In1, In2 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end component;

component or2 is -- declaration of or2 component


port (In1, In2 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end component;

begin
…..

84
VHDL Structural Design
• Signals

- now we want to connect items within an architecture, we need "signals" to do this

- we define signals within an architecture, before the "begin" keyword

Internal "Signal"

Internal "Components"

85
VHDL Structural Design
• Signal Syntax

architecture TOP_arch of TOP is

signal signal-name : signal-type;


signal signal-name : signal-type;

86
VHDL Structural Design
• Let's put the signal declaration into our Architecture

- now let's include the pre-existing entities "xor2" & "or2" into our "TOP" design

architecture TOP_arch of TOP is

signal node1 : STD_LOGIC;

component xor2 -- declaration of xor2 component


port (In1, In2 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end component;

component or2 is -- declaration of or2 component


port (In1, In2 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end component;

begin
…..

end architecture TOP_arch;

87
VHDL Structural Design
• Component Instantiation

- after the "begin" keyword, we can start adding components and connecting signals

- we add components with a "Component Instantiation"

syntax:

label : component-name port map (port => signal, ……) ;

NOTE: - "label" is a unique reference designator for that component (U1, INV1, UUT1)

- "component-name" is the exact name as declared prior to the "begin" keyword

- "port map" is a keyword

- the signals within the ( ) of the port map define how signals are connected
to the ports of the instantiated component

88
VHDL Structural Design
• Port Maps

- There are two ways to describe the "port map" of a component

1) Positional
2) Explicit

• Positional Port Map

- signals to be connected to the component are listed in exactly the same order as the component's ports

ex) U1 : xor2 port map (A, B, node1);

• Explicit Port Map

- signals to be connected to the component are explicitly linked to the port names of the
component using the "=>" notation (Port => Signal, Port => Signal, ….)

ex) U1 : xor2 port map (In1 => A, In2 => B, Out1 => node1);

89
VHDL Structural Design
• Execution

- All components are executed CONCURRENTLY

- this mimics real hardware

- this is different from traditional program execution (i.e., C/C++) which is executed sequentially

because

We are NOT writing code, we are describing hardware!!!

90
VHDL Structural Design
• Let's put everything together

architecture TOP_arch of TOP is

signal node1 : STD_LOGIC;

component xor2 -- declaration of xor2 component


port (In1, In2 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end component;

component or2 is -- declaration of or2 component


port (In1, In2 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end component;

begin
U1 : xor2 port map (In1=>A, In2=>B, Out1=>node1);
U2 : or2 port map (In1=>C, In2=>node1, Out1=>X);

end architecture TOP_arch;
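
The pieces above can be collected into one file that compiles on its own. This is a minimal sketch: the slides only give the entity declarations of "xor2" and "or2", so the two architectures (xor2_arch, or2_arch) and the library clauses are assumptions added here for illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity xor2 is
  port (In1, In2 : in  STD_LOGIC;
        Out1     : out STD_LOGIC);
end entity xor2;

-- assumed implementation, not shown in the slides
architecture xor2_arch of xor2 is
begin
  Out1 <= In1 xor In2;
end architecture xor2_arch;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity or2 is
  port (In1, In2 : in  STD_LOGIC;
        Out1     : out STD_LOGIC);
end entity or2;

-- assumed implementation, not shown in the slides
architecture or2_arch of or2 is
begin
  Out1 <= In1 or In2;
end architecture or2_arch;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity TOP is
  port (A, B, C : in  STD_LOGIC;
        X       : out STD_LOGIC);
end entity TOP;

architecture TOP_arch of TOP is

  signal node1 : STD_LOGIC;            -- internal wire between U1 and U2

  component xor2                       -- must match the xor2 entity
    port (In1, In2 : in  STD_LOGIC;
          Out1     : out STD_LOGIC);
  end component;

  component or2                        -- must match the or2 entity
    port (In1, In2 : in  STD_LOGIC;
          Out1     : out STD_LOGIC);
  end component;

begin
  -- both instances operate concurrently; their order in the text does not matter
  U1 : xor2 port map (In1 => A, In2 => B,     Out1 => node1);
  U2 : or2  port map (In1 => C, In2 => node1, Out1 => X);
end architecture TOP_arch;
```

Because each component has the same name and ports as an entity, most tools bind them automatically; the result is X = (A xor B) or C.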

91
VHDL Behavioral Design
• Behavioral Design

- we've learned the basic constructs of VHDL (entity, architecture, packages)

- we've learned how to use structural VHDL to instantiate lower-level systems


and to create text-based schematics

- now we want to go one level higher in abstraction and design using


"Behavioral Descriptions" of HW

- when we design at the Behavioral level, we now rely on Synthesis tools to create
the ultimate gate level schematic

- we need to be aware of what we CAN and CAN'T synthesize

- Remember, VHDL was invented to model systems, not for synthesis

- This means we can simulate a lot more functionality than could ever be synthesized

92
VHDL Behavioral Design
• Processes

- a way to describe interaction between signals

- a process executes a SEQUENCE of operations

- the new values in a process (i.e., the LHS) depend on the current and past values
of the other signals

- the new values in a process (i.e., the LHS) do not get their value until the process
suspends

- a process goes in the architecture after the "begin" keyword

syntax: name : process (sensitivity list)

declarations
begin
sequential statements
end process name;

93
VHDL Behavioral Design
• Process Execution

- Real systems start on certain conditions

- they then perform an operation

- they then wait for the next start condition

ex) Button pushed?


Clock edge present?
Reset?
Change on Inputs?

- to mimic real HW, we want to be able to START and STOP processes

- otherwise, the simulation would get stuck in an infinite loop or "hang"

94
VHDL Behavioral Design
• Process Execution

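- The statements inside a process execute in Sequence (i.e., one after another, in order)

- these are NOT concurrent (although the process as a whole runs concurrently with the other processes and components in the architecture)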

- this is a difficult concept to grasp and leads to difficulty in describing HW

ex) name : process (sensitivity list)

begin
sequential statement;
sequential statement;
sequential statement;
end process name;

- these signal assignments are called "Sequential Signal Assignments"

(as opposed to "Concurrent Signal Assignments")

95
VHDL Behavioral Design
• Starting and Stopping a Process

- There are two ways to start and stop a process 1) Sensitivity List
2) Wait Statement

• Sensitivity List

- a list of signal names

- the process will begin executing if there is a change on any of the signals in the list

ex) FLOP : process (clock)

begin
Q <= D;
end process FLOP;

- each time there is a change on "clock", the process will execute ONCE

- the process ends after the last statement

96
VHDL Behavioral Design
• Wait Statements

- the keyword "wait" can be used inside of a process to start/stop it

- the process executes its statements one by one until it hits the wait statement

- a process cannot have both a sensitivity list and wait statements

ex) No Start/Stop Control, loops forever:

DOIT : process
begin
statement 1;
statement 2;
statement 3;
end process DOIT;

w/ Start/Stop Control, executes until "wait" then stops:

DOIT : process
begin
statement 1;
statement 2;
wait;
end process DOIT;

- we need to have a conditional operator associated with the wait statement,


otherwise it just stops the process and it will never start again.

97
VHDL Behavioral Design
• Wait Statements

- the wait statements can be followed by keywords "for" or "until" to describe the
wait condition

- the wait statement can wait for:

1) time-expression ex) wait for 10 ns;
wait for period/2;

2) condition ex) wait until Clock='1';
wait until Data>16;
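
Both forms of wait can be seen working together in a small simulation-only model. This is a sketch under stated assumptions: the entity name wait_demo, the constant period, and the signal Count are hypothetical names chosen for illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity wait_demo is
end entity wait_demo;                    -- no ports: simulation only

architecture wait_demo_arch of wait_demo is
  constant period : time := 10 ns;       -- hypothetical clock period
  signal Clock    : STD_LOGIC := '0';
  signal Count    : integer   := 0;
begin

  -- "wait for": suspend for a fixed amount of simulation time
  CLOCK_GEN : process
  begin
    Clock <= '0';
    wait for period/2;
    Clock <= '1';
    wait for period/2;
  end process CLOCK_GEN;                 -- the process then restarts from "begin"

  -- "wait until": suspend until Clock changes AND the condition is true
  COUNTER : process
  begin
    wait until Clock = '1';
    Count <= Count + 1;
  end process COUNTER;

end architecture wait_demo_arch;
```

Neither process has a sensitivity list; the wait statements alone decide when each one stops and restarts. The clock never stops, so the simulator has to be given an end time.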

98
VHDL Behavioral Design
• Signals and Processes

- Rules of a Process

1) Signals cannot be declared inside of a process

2) Assignment to a Signal takes effect only after the process suspends.


Until it suspends, signals keep their previous value

3) Only the last signal assignment to a signal in the list has an effect.
So there's no use making multiple assignments to the same signal.

ex) DOIT : process (A,B) -- initially A=2, B=2… then A changes to 7

begin -- Y = 7 + 2, NOT Y = 0 + 0


A <= 0;
B <= 0;
Y <= A+B;

end process DOIT;

99
VHDL Behavioral Design
• Signals and Processes

- But what if we want this behavior?

ex) DOIT : process (A,B) -- initially A=2, B=2… then A changes to 7

begin
A <= 0; -- we WANT A to be assigned 0
B <= 0; -- we WANT B to be assigned 0
Y <= A+B; -- we WANT Y to be assigned A + B = 0

end process DOIT;

- we need something besides a Signal to hold the interim value

- we need a "Variable"

100
Variables
• Variables

- Signals in processes are only assigned their value when the process suspends

- this makes multiple assignments to a signal meaningless

ex) DOIT : process (A,B) -- a change on A or B will trigger this process

begin
A <= 2; -- B gets its value from the previous value of A,
B <= A + 1; -- not from the A <= 2 assignment

end process DOIT;

- Variables allow us to assign values during the sequence of statements

101
Variables
• Variables

- Variables are defined within a process

syntax:
variable var-name : var-type := init-value;

- assignments to variables are made using ":=" instead of "<="

- assignments take place immediately

ex) DOIT : process (A,B) -- a change on A or B will trigger this process

variable temp : integer := 0;

begin
temp := 2;
B <= temp + 1;

end process DOIT;
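
The difference between the two assignment styles can be checked in simulation. The sketch below is self-contained; the entity name sig_vs_var and the signals B_sig and B_var are hypothetical, and A starts at 5 only so that the old and new values are easy to tell apart.

```vhdl
entity sig_vs_var is
end entity sig_vs_var;

architecture demo of sig_vs_var is
  signal A     : integer := 5;
  signal B_sig : integer := 0;
  signal B_var : integer := 0;
begin

  DOIT : process
    variable temp : integer := 0;
  begin
    A     <= 2;           -- scheduled only; A still reads 5 inside this run
    B_sig <= A + 1;       -- uses the OLD value of A: 5 + 1 = 6

    temp  := 2;           -- variable assignment takes place immediately
    B_var <= temp + 1;    -- uses the NEW value of temp: 2 + 1 = 3

    wait for 1 ns;        -- process suspends, signal updates take effect

    assert A     = 2 report "A was not updated"       severity ERROR;
    assert B_sig = 6 report "B_sig did not use old A" severity ERROR;
    assert B_var = 3 report "B_var did not use temp"  severity ERROR;
    wait;                 -- stop forever
  end process DOIT;

end architecture demo;
```

If the three asserts stay silent, the simulation has confirmed that the signal assignment was deferred and the variable assignment was not.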

102
Variables
• Signal vs. Variable

Signal                                          Variable

carries (type, value, time)                     carries (type, value)

assignment with <=                              assignment with :=

declared outside of the process                 declared inside of the process

assignment takes place when process suspends    assignment is immediate

exists throughout the architecture              visible only inside its process
                                                (it keeps its value between executions)

103
If-Then Statements
• If / Then Statements

- Used ONLY within a process. VHDL has the following:

- if, then
- if, then, else
- if, then, elsif, then
- if, then, elsif, then, else
syntax:

if boolean-exp then seq-statement


elsif boolean-exp then seq-statement
else seq-statement
end if;

- parentheses around the Boolean expression are allowed, but not required

- multiple sequential statements allowed, they are separated by a ";" and


can be on different lines

- logical operators allowed in Boolean Expression

104
If-Then Statements
• If / Then Statements

ex) Design a 2-to-1 MUX

architecture mux_2to1_arch of mux_2to1 is

begin
MUX : process (A,B,Sel)
begin
if (Sel = '0') then
Out1 <= A;
elsif (Sel = '1') then
Out1 <= B;
else
Out1 <=A; -- this isn't necessary, just for illustration
end if;
end process MUX;

end architecture mux_2to1_arch;

105
Case Statements
• Case Statements

- used ONLY within a process

- better for larger input combinations, If/Then's can get too long

syntax:

case expression is
when choices => seq-statement;
when choices => seq-statement;
:
end case;

- the keyword "others" is available for input combinations not explicitly called out

106
Case Statements
• Case Statements

ex) Design a 2-to-1 MUX

architecture mux_2to1_arch of mux_2to1 is

begin
MUX : process (A,B,Sel)
begin
case (Sel) is
when '0' => Out1 <= A;
when '1' => Out1 <= B;
when others => Out1 <= A; -- this isn't necessary, just for illustration
end case;
end process MUX;

end architecture mux_2to1_arch;

- the case statement works nicely on vectors

- if you want to combine individual signals to form a vector, you can use
variables and the concatenation operator
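
A short sketch of that idea: two separate select lines are joined into a 2-bit variable with "&", and the case statement then works on the vector. The entity name mux_4to1_case and its port names are assumptions made for this example.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity mux_4to1_case is
  port (D0, D1, D2, D3 : in  STD_LOGIC;
        S1, S0         : in  STD_LOGIC;   -- individual select lines
        Y              : out STD_LOGIC);
end entity mux_4to1_case;

architecture mux_4to1_case_arch of mux_4to1_case is
begin
  MUX : process (D0, D1, D2, D3, S1, S0)
    variable Sel : STD_LOGIC_VECTOR (1 downto 0);
  begin
    Sel := S1 & S0;                       -- concatenate, available immediately

    case Sel is
      when "00"   => Y <= D0;
      when "01"   => Y <= D1;
      when "10"   => Y <= D2;
      when "11"   => Y <= D3;
      when others => Y <= 'X';            -- covers 'U', 'X', 'Z', ... on the selects
    end case;
  end process MUX;
end architecture mux_4to1_case_arch;
```

A variable is used rather than a signal so that the concatenated value can be used by the case statement in the same pass through the process.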

107
Conditional Loops
• Conditional Loops

- There are multiple loop structures we can use within VHDL

1) Loop
2) While
3) For

• Loops

- "Loop" is a keyword that starts a loop

- creates an infinite loop

- useful for modeling processes that run forever (e.g., clocks)

108
Conditional Loops
• Loops

ex) CLOCK_GEN : process


begin
clock <= '0';

loop
wait for 1 ns; -- without a wait, simulation time never advances
clock <= '1';
wait for 1 ns;
clock <= '0';
end loop;

end process CLOCK_GEN;

- the loop is ended using the keywords "end loop;"

109
Conditional Loops
• While Loops

- a Boolean condition is tested at the beginning of the loop

- the loop only executes if the condition is true

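ex) CLOCK_GEN : process


begin
clock <= '0';

while (EN = '1') loop
wait for 1 ns; -- time must advance inside the loop
clock <= not clock;
end loop;

wait until EN = '1'; -- loop finished or skipped: suspend until enabled again

end process CLOCK_GEN;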

110
Conditional Loops
• For Loops

- a loop with a counter

- the loop executes the # of times in the range that is specified

syntax:

for identifier in range loop

seq-statement
seq-statement

end loop;

- the "identifier" is the loop variable.

- It is implicitly declared when included in the "for" statement.


- It is automatically the same type as the "range"
- it will step through ALL values in range
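
As an illustration of the syntax, the following sketch uses a for loop to XOR all bits of a vector together (even parity). The entity parity_8 and its port names are hypothetical; note that the loop variable "i" is never declared.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity parity_8 is
  port (Data   : in  STD_LOGIC_VECTOR (7 downto 0);
        Parity : out STD_LOGIC);
end entity parity_8;

architecture parity_8_arch of parity_8 is
begin
  CALC : process (Data)
    variable p : STD_LOGIC;
  begin
    p := '0';
    for i in 7 downto 0 loop     -- i is implicitly declared, type integer
      p := p xor Data(i);        -- steps through ALL values 7, 6, ... 0
    end loop;
    Parity <= p;                 -- '1' when Data holds an odd number of 1's
  end process CALC;
end architecture parity_8_arch;
```

The loop variable can be read inside the loop but cannot be assigned to, and it does not exist after "end loop".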

111
Conditional Loops
• For Loops

- the "range" needs to be previously defined. All discrete types (integers and enumerated types) are allowed

- Supporting all types is powerful for enumerated lists in state machines

(i.e., state_list = idle, go, stop, ….)

ex) for state in state_list loop

if (current_state = state) then

valid_state <= TRUE;
end if;
end loop;

112
Attributes
• Attributes

- ability to get more information about a signal other than its current value

- attributes allow access to the signal's history

- previous value
- time since last change

- this is how we can specify "edge triggered" events in sequential logic

- we put the attribute keyword after the signal name using the apostrophe (')

- there are many attributes, the most commonly used are:

1) event
2) transaction
3) last_value
4) last_event

113
Attributes
• "event" Attribute

- tells us when there was a change on the signal

- useful for edge detection

ex) "rising edge"

if (Clock'event and Clock='1')
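
Placed inside a process, this test gives a rising-edge-triggered D flip-flop. The sketch below is one common way to write it; the entity name dff and the asynchronous Reset input are assumptions added for illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity dff is
  port (Clock, Reset, D : in  STD_LOGIC;
        Q               : out STD_LOGIC);
end entity dff;

architecture dff_arch of dff is
begin
  FLOP : process (Clock, Reset)
  begin
    if (Reset = '1') then                      -- asynchronous reset (assumed)
      Q <= '0';
    elsif (Clock'event and Clock = '1') then   -- rising edge of Clock
      Q <= D;
    end if;                                    -- no else: Q holds its value
  end process FLOP;
end architecture dff_arch;
```

Without the 'event test, Q would also be updated on the falling edge, because the process runs on every change of Clock.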

• "transaction" Attribute

- tells us when an assignment is made to a signal

- the signal value does not need to change (i.e., 0 to 0)

ex) process (A'transaction)

statement if anybody ever assigns to A

114
Attributes
• "last_value" Attribute

- tells us the last value of a signal (before most recent assignment)

• "last_event" Attribute

- gives TIME since last event

- good for tracking timing violations (Setup/Hold, signals changing too fast)

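ex) process (Data) -- runs on every event on Data

begin
-- at the instant of an event, Data'last_event itself is 0 ns, so we read it
-- from the delta-delayed copy to get the time since the PREVIOUS event
if (Data'delayed'last_event < 0.5 ns) then
too_fast <= TRUE;
else
too_fast <= FALSE;
end if;
end process;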

115
VHDL : Test Benches
• Test Benches

- We need to stimulate our designs in order to test their functionality

- Stimulus in a real system is from an external source, not from our design

- We need a method to test our designs that is not part of the design itself

- This is called a "Test Bench"

- Test Benches are VHDL entity/architectures with the following:

- We instantiate the design to be tested using components

- We call these instantiations "Unit Under Test" (UUT) or "Device Under Test".

- The entity has no ports

- We create a stimulus generator within the architecture

- We can use reporting features to monitor the expected outputs

116
VHDL : Test Benches
• Test Benches

- Test Benches are for Verification, not for Synthesis!!!

- this allows us to use constructs that we ordinarily wouldn't put in a design


because they are not synthesizable

• Let's test this MUX

entity Mux_2to1 is

port (A, B, Sel : in STD_LOGIC;


Y : out STD_LOGIC);

end entity Mux_2to1;

117
VHDL : Test Benches
entity Test_Mux is
end entity Test_Mux; -- the test bench entity has no ports

architecture Test_Mux_arch of Test_Mux is

signal In1_TB, In2_TB : STD_LOGIC; -- setup internal Test Signals


signal Sel_TB : STD_LOGIC; -- give descriptive names to make
signal Out_TB : STD_LOGIC; -- apparent they are test signals

component Mux_2to1 -- declare any used components


port (A, B, Sel : in STD_LOGIC;
Y : out STD_LOGIC);
end component;

begin

UUT : Mux_2to1 -- instantiate the design to test


port map ( A => In1_TB,
B => In2_TB,
Sel => Sel_TB,
Y => Out_TB);

118
VHDL : Test Benches

STIM : process -- create process to generate stimulus


begin
In1_TB <= '0'; In2_TB <= '0'; Sel_TB <= '0'; wait for 10 ns; -- we can use wait
In1_TB <= '0'; In2_TB <= '1'; Sel_TB <= '0'; wait for 10 ns; -- statements to control
In1_TB <= '1'; In2_TB <= '0'; Sel_TB <= '0'; wait for 10 ns; -- the speed of the stim

:
:
:
In1_TB <= '1'; In2_TB <= '1'; Sel_TB <= '1'; wait for 10 ns; -- end with a wait…

end process STIM;

end architecture Test_Mux_arch;

119
VHDL : Test Benches
• Test Bench Reporting

- There are reporting features that allow us to monitor the output of a design

- We can compare the output against "Golden" data and report if there are differences

- This is powerful when we evaluate our designs across power, temp, process…..

• Assert

- the keyword "assert" will check a Boolean expression

- if the Boolean expression is FALSE, it will print a string following the "report" keyword

- Severity levels are also reported with possible values {ERROR, WARNING, NOTE, FAILURE}

ex) A<='0'; B<='0'; wait for 10ns;


assert (Z='1') report "Failed test 00" severity ERROR;

- The message comes out at the simulator console.

120
VHDL : Test Benches
• Report

- the keyword "report" will always print a string

- this is good for outputting the progress of a test

- Severity levels are also reported

ex) report "Beginning the MUX test" severity NOTE;

A<='0'; B<='0'; wait for 10ns;


assert (Z='1') report "Failed test 00" severity ERROR;
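
Putting the test bench structure, stimulus, assert and report together gives a self-checking test. The sketch below is self-contained; the architecture Mux_2to1_arch is an assumed implementation (Sel = '0' selects A), and only four of the eight input combinations are exercised to keep it short.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity Mux_2to1 is
  port (A, B, Sel : in  STD_LOGIC;
        Y         : out STD_LOGIC);
end entity Mux_2to1;

architecture Mux_2to1_arch of Mux_2to1 is   -- assumed design under test
begin
  Y <= A when Sel = '0' else B;
end architecture Mux_2to1_arch;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity Test_Mux is
end entity Test_Mux;                         -- the test bench has no ports

architecture Test_Mux_arch of Test_Mux is
  signal In1_TB, In2_TB : STD_LOGIC;
  signal Sel_TB         : STD_LOGIC;
  signal Out_TB         : STD_LOGIC;

  component Mux_2to1
    port (A, B, Sel : in  STD_LOGIC;
          Y         : out STD_LOGIC);
  end component;
begin

  UUT : Mux_2to1
    port map (A => In1_TB, B => In2_TB, Sel => Sel_TB, Y => Out_TB);

  STIM : process
  begin
    report "Beginning the MUX test" severity NOTE;

    In1_TB <= '0'; In2_TB <= '1'; Sel_TB <= '0'; wait for 10 ns;
    assert (Out_TB = '0') report "Failed: Sel=0, A=0" severity ERROR;

    In1_TB <= '1'; In2_TB <= '0'; Sel_TB <= '0'; wait for 10 ns;
    assert (Out_TB = '1') report "Failed: Sel=0, A=1" severity ERROR;

    In1_TB <= '1'; In2_TB <= '0'; Sel_TB <= '1'; wait for 10 ns;
    assert (Out_TB = '0') report "Failed: Sel=1, B=0" severity ERROR;

    In1_TB <= '0'; In2_TB <= '1'; Sel_TB <= '1'; wait for 10 ns;
    assert (Out_TB = '1') report "Failed: Sel=1, B=1" severity ERROR;

    report "MUX test complete" severity NOTE;
    wait;                                    -- stop the stimulus for good
  end process STIM;

end architecture Test_Mux_arch;
```

Each assert is placed after the wait so that the output has had time to respond to the new inputs before it is checked.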

121
Logic Synthesis with VHDL
What is logic synthesis
• Logic synthesis is the process of converting a high-level description of a design into an optimized gate-level representation
• Logic synthesis uses a standard cell library, which contains simple cells (basic logic gates such as and, or, and nor) and macro cells (such as adders, muxes, memory, and special flip-flops)
• The designer first understands the architectural description, then considers design constraints such as timing, area, testability, and power

pp. 2
What is logic synthesis
• Synthesis = translation + optimization + mapping

HDL Source:

residue = 16'h0000;
if (high_bits == 2'b10) residue = state_table[index];
else state_table[index] = 16'h0000;

HDL Source -> (Translate) -> Generic Boolean (GTECH) -> (Optimize + Map) -> Target Technology
pp. 3
Synthesis is Constraint Driven

always @(reset or set)
begin : direct_set_reset
if (reset)
y=1'b0;
else if (set)
y=1'b1;
end

always @(gate or reset)
if (reset)
t=1'b0;
else if (gate)
t=d;

[Figure: Translation of the HDL, followed by optimization along an area vs. speed curve]

pp. 4
Technology Independent
v Design can be transferred to any technology

[Figure: area vs. speed curves for Technology A and Technology B]

pp. 5
What is logic synthesis(cont.)
Basic Computer-Aided Logic Synthesis Process

Architectural Description
-> High-Level Description
-> Computer-Aided Logic Synthesis (inputs: Design Constraints, Standard Cell Library (technology dependent))
-> Optimized Gate-Level Netlist
-> Meets Constraints?
no : go back and repeat
yes : Place and Route

pp. 6
Impact of Logic Synthesis
• Limitations of manual design
- For large designs, manual conversion was prone to human error, such as a small gate missed somewhere
- The designer could never be sure that the design constraints were going to be met until the gate-level implementation was complete and tested
- A significant portion of the design cycle was dominated by the time taken to convert a high-level design into gates
- Design reuse was not possible
- Each designer would implement design blocks differently. For large designs, this could mean that smaller blocks were optimized but the overall design was not optimal

pp. 7
Impact of Logic Synthesis(cont.)
• Automated logic synthesis tools addressed these problems as follows
- High-level design is less prone to human error because designs are described at a higher level of abstraction
- High-level design is done without significant concern about design constraints
- Conversion from high-level design to gates is fast
- Logic synthesis tools optimize the design as a whole. This removes the problem with varied designer styles for the different blocks in the design and suboptimal designs
- Logic synthesis tools allow technology-independent design
- Design reuse is possible for technology-independent descriptions.

pp. 8
Logic Synthesis
• Takes place in two stages:

1) Translation of Verilog (or VHDL) source to a netlist
- Register inference

2) Optimization of the resulting netlist to improve speed and area
- Most critical part of the process
- Algorithms very complicated and beyond the scope of this class

pp. 9
Logic Optimization
• Netlist optimization is the critical enabling technology
- Takes a slow or large netlist and transforms it into one that implements the same function more cheaply

• Typical operations
- Constant propagation
- Common subexpression elimination
- Function factoring

• Time-consuming operation
- Can take hours for large chips

pp. 10
Translating VHDL into Gates
• Parts of the language are easy to translate
- Structural descriptions with primitives: already a netlist
- Continuous assignment: expressions turn into little datapaths

• Behavioral statements are the bigger challenge

pp. 11
What Can Be Translated
• Structural definitions
- Everything
• Behavioral blocks
- Depends on sensitivity list
- Only when they have a reasonable interpretation as combinational logic, edge-triggered flip-flops, or level-sensitive latches
- Blocks sensitive to both edges of the clock, changes on unrelated signals, changing sensitivity lists, etc. cannot be synthesized
• User-defined primitives
- Primitives defined with truth tables
- Some sequential UDPs can't be translated (not latches or flip-flops)

pp. 12
What Isn’t Translated
• Initial blocks
- Used to set up initial state or describe finite testbench stimuli
- Don't have an obvious hardware component
• Delays
- May be in the Verilog source, but are simply ignored
• A variety of other obscure language features
- In general, things heavily dependent on discrete-event simulation semantics
- Certain "disable" statements
- Pure events

pp. 13
Compile: the “Art” of Synthesis
• The compile command performs design optimization
• Logic level Optimization
- flatten (off by default) : removes structure
- structure : minimizes generic logic
• Gate level Optimization
- map : makes design technology dependent

pp. 14
Compile

pp. 15
Compile

pp. 16
Logic Level Optimization
• Operates on the Boolean representation of a circuit
• Has a global effect on the overall area/speed characteristic of a design
• Strategy
- Structure
- Flatten
- If both are true, the design is first flattened and then structured

pp. 17
Gate Level Optimization
• Selects components to meet timing, design rule & area goals specified for the circuit
• Has a local effect on the area/speed characteristics of a design
• Strategy
- Mapping
- Combinational Mapping
- Sequential Mapping

pp. 20
Combinational vs. Sequential Mapping
Combinational Mapping
- Mapping rearranges components, combining and re-combining logic into different components
- May use different algorithms such as cloning, resizing or buffering
- Tries to meet the design rule constraints and timing/area goals

Sequential Mapping
- Optimizes the mapping to sequential cells from the technology library
- Analyzes the combinational logic surrounding a sequential cell to see if it can absorb the logic attribute with HDL
- Tries to save speed and area by using a more complex sequential cell

pp. 21
Mapping
Combinational mapping Sequential mapping

pp. 22
Design Methodology

pp. 23
Design Flow
1. Write a design description in the Verilog language. This description can be a combination of structural and functional elements. This description is used with both the Synopsys HDL Compiler and the Verilog simulator.
2. Provide Verilog-language test drivers for the Verilog HDL simulator. The drivers supply test vectors for simulation and gather output data.
3. Simulate the design by using a Verilog HDL simulator. Verify that the description is correct.
4. Synthesize the HDL description with HDL Compiler. HDL Compiler performs architectural optimizations, then creates an internal representation of the design.

pp. 24
Design Flow
5. Use Synopsys Design Compiler to produce an optimized gate-level description in the target ASIC library. You can optimize the generated circuits to meet the required timing & area constraints.
6. Use Synopsys Design Compiler to output a gate-level Verilog description. This netlist-style description uses ASIC components as the leaf-level cells of the design. The gate-level description has the same port and module definitions as the original high-level Verilog description.
7. Simulate the gate-level description with the original Verilog simulation drivers from Step 2. They can be reused because module and port definitions are preserved.
8. Compare the output of the gate-level simulation with the output of the original Verilog description simulation to verify that the implementation is correct.

pp. 25
Basic Logic Design with VHDL

• Agenda
Combinational Logic Review
• Combinational logic circuits are memoryless
• No feedback path
• Output can have multiple logical transitions before settling to
correct value

146
Boolean Equations in VHDL
• Boolean equations and truth tables are both valid ways to
define a function (f = ???)
• Use logical operators in signal assignment statements
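
As a small sketch, a Boolean equation maps directly onto a concurrent signal assignment. The function f = a·b + c' used here is a made-up example, as are the entity and port names.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity bool_eq is
  port (a, b, c : in  STD_LOGIC;
        f       : out STD_LOGIC);
end entity bool_eq;

architecture bool_eq_arch of bool_eq is
begin
  -- f = a.b + c'
  -- "and" and "or" have the same precedence in VHDL, so the
  -- parentheses are required when the two are mixed
  f <= (a and b) or (not c);
end architecture bool_eq_arch;
```

Only "not" binds tighter than the other logical operators; writing "a and b or not c" without parentheses is a compile error.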

147
Boolean Equation Example

148
Binary Coding
• How do we represent information with more than two possible
values?
– eg, numbers
– N voltage levels? — No.
• Multiple binary signals (multiple bits)
• (a1, a0): (0, 0), (0, 1), (1, 0), (1, 1)
– This is a binary code
– Each pair of values is a code word
– Uses two signal wires for a1, a0
• Code Word Size
– An n-bit code has 2^n code words
– To represent N possible values
• Need at least ⌈log2 N⌉ code word bits
• More bits can be useful in some cases
• Example: code for inkjet printer
– black, cyan, magenta, yellow, red, blue
– six values, ⌈log2 6⌉ = 3
– black: (0, 0, 1), cyan: (0, 1, 0), magenta: (0, 1, 1), yellow: (1, 0, 0), red: (1, 0, 1), blue: (1, 1, 0)

149
One-Hot Codes
• Each code word has exactly one 1 bit
• Traffic light:
– red: (1,0,0), yellow: (0,1,0), green: (0,0,1)
– Three signal wires: red, yellow, green (r, y, g)
• Each bit of a one-hot code corresponds to an encoded value
– No hardware needed to decode values

150
Binary Codes in VHDL
• Multiple bits represented by a vector
• signal s: std_logic_vector(4 downto 0);
– This is a five-element signal
– s(4), s(3), s(2), s(1), s(0)
• signal a: std_logic_vector(1 to 3);
– This is a three-element signal
– a(1), a(2), a(3)
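
The two index directions can be checked with a short simulation-only sketch. The entity name vector_demo and the initial values are assumptions; in both cases the leftmost character of the string literal lands on the leftmost index of the declared range.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity vector_demo is
end entity vector_demo;

architecture vector_demo_arch of vector_demo is
  signal s : std_logic_vector(4 downto 0) := "10010";  -- s(4) is leftmost
  signal a : std_logic_vector(1 to 3)     := "100";    -- a(1) is leftmost
begin
  CHECK : process
  begin
    wait for 1 ns;
    assert s(4) = '1' report "s(4) should be the leftmost bit"  severity ERROR;
    assert s(1) = '1' report "s(1) should be '1'"               severity ERROR;
    assert s(0) = '0' report "s(0) should be the rightmost bit" severity ERROR;
    assert a(1) = '1' report "a(1) should be the leftmost bit"  severity ERROR;
    assert a(3) = '0' report "a(3) should be the rightmost bit" severity ERROR;
    wait;
  end process CHECK;
end architecture vector_demo_arch;
```

"downto" is the usual choice for binary numbers, because the highest index then sits on the most significant bit.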

151
Binary Coding Example

152
Combinational Logic Design with VHDL

• Agenda

1. Decoders/Encoders
2. Multiplexers/Demultiplexers
3. Tri-State Buffers
4. Comparators
5. Adders (Ripple Carry, Carry-Look-Ahead)
6. Subtraction
7. Multiplication
8. Division (brief overview)
Integrated Circuit Scaling
• Integrated Circuit Scales

Scale                                            Example                        # of Transistors

SSI - Small Scale Integrated Circuits            Individual Gates               10's

MSI - Medium Scale Integrated Circuits           Mux, Decoder                   100's

LSI - Large Scale Integrated Circuits            RAM, ALU's                     1k - 10k

VLSI - Very Large Scale Integrated Circuits      uP, uCNT                       100k - 1M

ULSI - Ultra Large Scale Integrated Circuits     Modern uP's                    > 1M

SoC - System on Chip                             Microcomputers

SoP - System on Package                          Different technology blending

- we use the terms SSI and MSI. Everything larger is typically just called "VLSI"

- VLSI covers design that can't be done using schematics or by hand.

154
Decoders
• Decoders

- a decoder has n inputs and 2^n outputs

- one and only one output is asserted for a given input combination

ex) truth table of decoder

Input Output
00 0001
01 0010
10 0100
11 1000

- these are key circuits for Address Decoders

155
Decoder
• Decoder Structure

- The output stage of a decoder can be constructed using AND gates


- Inverters are needed to give the appropriate code to each AND gate
- Using AND/INV structure, we need:

2^n AND gates
n Inverters

(the figure shows more inverters than necessary to illustrate the concept)

156
Decoders
• Decoders with ENABLES

- An Enable line can be fed into the AND gate

- The AND gate now needs (n+1) inputs

- Using positive logic:

EN = 0, Output = 0
EN =1, Output depends on input code

157
Decoders
• Decoder Example

- Let's design a 2-to-4 Decoder using Structural VHDL

- We know we need to describe the following structure:

- We know what we'll need:

2^n AND gates = 4 AND gates
n Inverters = 2 Inverters

(the figure shows more inverters than necessary to illustrate the concept)

158
Decoder
• Decoder Example

- Let's design the inverter using concurrent signal assignments….

entity inv is
port (In1 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end entity inv;

architecture inv_arch of inv is


begin
Out1 <= not In1;
end architecture inv_arch;

159
Decoders
• Decoder Example

- Let's design the AND gate using concurrent signal assignments….

entity and2 is
port (In1,In2 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end entity and2;

architecture and2_arch of and2 is


begin
Out1 <= In1 and In2;
end architecture and2_arch;

160
Decoders
• Decoder Example

- Now let's work on the top level design entity called "decoder_2to4"

entity decoder_2to4 is
port (A,B : in STD_LOGIC;
Y0,Y1,Y2,Y3 : out STD_LOGIC);
end entity decoder_2to4;

161
Decoders
• Decoder Example

- Now let's work on the top level design architecture called "decoder_2to4_arch"

architecture decoder_2to4_arch of decoder_2to4 is

signal A_n, B_n : STD_LOGIC;

component inv
port (In1 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end component;

component and2
port (In1,In2 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end component;

begin
………

162
Decoders
• Decoder Example

- cont….

begin
U1 : inv port map (A, A_n);
U2 : inv port map (B, B_n);

U3 : and2 port map (A_n, B_n, Y0);


U4 : and2 port map (A, B_n, Y1);
U5 : and2 port map (A_n, B, Y2);
U6 : and2 port map (A, B, Y3);

end architecture decoder_2to4_arch;
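
For comparison, the same decoder can be sketched behaviorally, this time with the ENABLE line described earlier. The entity name decoder_2to4_en, the EN port, and the choice to bundle Y0..Y3 into one vector are assumptions; A is treated as the least significant input bit, matching the structural port maps above.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity decoder_2to4_en is
  port (A, B, EN : in  STD_LOGIC;
        Y        : out STD_LOGIC_VECTOR (3 downto 0));  -- Y(3) .. Y(0)
end entity decoder_2to4_en;

architecture decoder_2to4_en_arch of decoder_2to4_en is
  signal Sel : STD_LOGIC_VECTOR (2 downto 0);
begin
  Sel <= EN & B & A;                   -- enable, then input code "BA"

  with Sel select
    Y <= "0001" when "100",            -- EN=1, BA=00 : Y0 asserted
         "0010" when "101",            -- EN=1, BA=01 : Y1 asserted
         "0100" when "110",            -- EN=1, BA=10 : Y2 asserted
         "1000" when "111",            -- EN=1, BA=11 : Y3 asserted
         "0000" when others;           -- EN=0 (or unknown inputs) : all outputs 0
end architecture decoder_2to4_en_arch;
```

The "others" choice implements the positive-logic enable: with EN = 0 no output is asserted, whatever the input code.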

163
Decoder Example

164
Encoders
• Encoder

- an encoder has 2^n inputs and n outputs

- it assumes that one and only one input will be asserted

- depending on which input is asserted, an output code will be generated

- this is the exact opposite of a decoder

ex) truth table of binary encoder

Input Output
0001 00
0010 01
0100 10
1000 11

165
Encoders
• Encoder

- an encoder output is a simple OR structure that looks at the incoming signals

ex) 4-to-2 encoder

I3 I2 I1 I0 Y1 Y0
0 0 0 1 0 0
0 0 1 0 0 1
0 1 0 0 1 0
1 0 0 0 1 1

Y1 = I3 + I2
Y0 = I3 + I1

166
Encoders
• Encoders in VHDL

- 8-to-3 binary encoder modeled with Structural VHDL

entity encoder_8to3_binary is
generic (t_delay : time := 1.0 ns);
port (I : in STD_LOGIC_VECTOR (7 downto 0);
Y : out STD_LOGIC_VECTOR (2 downto 0) );

end entity encoder_8to3_binary;

architecture encoder_8to3_binary_arch of encoder_8to3_binary is

component or4 port (In1,In2,In3,In4: in STD_LOGIC; Out1: out STD_LOGIC); end component;

begin
U1 : or4 port map (In1 => I(1), In2 => I(3), In3 => I(5), In4 => I(7), Out1 => Y(0) );
U2 : or4 port map (In1 => I(2), In2 => I(3), In3 => I(6), In4 => I(7), Out1 => Y(1) );
U3 : or4 port map (In1 => I(4), In2 => I(5), In3 => I(6), In4 => I(7), Out1 => Y(2) );

end architecture encoder_8to3_binary_arch;

167
Encoders
• Encoders in VHDL

- 8-to-3 binary encoder modeled with Behavioral VHDL

entity encoder_8to3_binary is
generic (t_delay : time := 1.0 ns);
port (I : in STD_LOGIC_VECTOR (7 downto 0);
Y : out STD_LOGIC_VECTOR (2 downto 0) );

end entity encoder_8to3_binary;

architecture encoder_8to3_binary_arch of encoder_8to3_binary is


begin
ENCODE : process (I)
begin
case (I) is
when "00000001" => Y <= "000";
when "00000010" => Y <= "001";
when "00000100" => Y <= "010";
when "00001000" => Y <= "011";
when "00010000" => Y <= "100";
when "00100000" => Y <= "101";
when "01000000" => Y <= "110";
when "10000000" => Y <= "111";
when others => Y <= "ZZZ";
end case;

end process ENCODE;

end architecture encoder_8to3_binary_arch;

168
Encoder Example

169
Priority Encoders
• Priority Encoder

- a generic encoder does not know what to do when multiple input bits are asserted

- to handle this case, we need to include prioritization

- we decide the list of priority (usually MSB to LSB) where the truth table can be written as follows:

ex) 4-to-2 encoder I3 I2 I1 I0 Y1 Y0


1 x x x 1 1
0 1 x x 1 0
0 0 1 x 0 1
0 0 0 1 0 0

- we can then write expressions for an intermediate stage of priority bits “H” (i.e., Highest Priority):

H3 = I3
H2 = I2∙I3’
H1 = I1∙I2’∙I3’
H0 = I0∙I1’∙I2’∙I3’

- the final output stage then becomes:

Y1 = H3 + H2
Y0 = H3 + H1

170
Priority Encoders
• Priority Encoders in VHDL

- 8-to-3 binary priority encoder modeled with Behavioral VHDL

- If/Then/Else statements give priority

- Concurrent Conditional Signal Assignments give priority

entity encoder_8to3_priority is
generic (t_delay : time := 1.0 ns);
port (I : in STD_LOGIC_VECTOR (7 downto 0);
Y : out STD_LOGIC_VECTOR (2 downto 0) );
end entity encoder_8to3_priority;

architecture encoder_8to3_priority_arch of encoder_8to3_priority is
begin
Y <= "111" when I(7) = '1' else -- highest priority code
"110" when I(6) = '1' else
"101" when I(5) = '1' else
"100" when I(4) = '1' else
"011" when I(3) = '1' else
"010" when I(2) = '1' else
"001" when I(1) = '1' else
"000" when I(0) = '1' else -- lowest priority code
"ZZZ";

end architecture encoder_8to3_priority_arch;
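
The If/Then/Else form gives the same priority behavior inside a process. Below is a minimal sketch for the 4-to-2 case from the truth table; the entity name encoder_4to2_priority is hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity encoder_4to2_priority is
  port (I : in  STD_LOGIC_VECTOR (3 downto 0);
        Y : out STD_LOGIC_VECTOR (1 downto 0));
end entity encoder_4to2_priority;

architecture encoder_4to2_priority_arch of encoder_4to2_priority is
begin
  ENCODE : process (I)
  begin
    if    (I(3) = '1') then Y <= "11";   -- highest priority, I2..I0 are don't-cares
    elsif (I(2) = '1') then Y <= "10";
    elsif (I(1) = '1') then Y <= "01";
    elsif (I(0) = '1') then Y <= "00";   -- lowest priority
    else                    Y <= "ZZ";   -- no input asserted
    end if;
  end process ENCODE;
end architecture encoder_4to2_priority_arch;
```

The first condition that is true wins and the remaining branches are skipped, which is what produces the "x" entries in the truth table.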

171
Priority Encoder Example

172
Seven-Segment Decoder

173
Multiplexer
• Multiplexer

- gates are combinational logic which generate an output depending on the current inputs

- what if we wanted to create a “Digital Switch” to pass along the input signal?

- this type of circuit is called a “Multiplexer”

ex) truth table of Multiplexer

Sel Out
0 A
1 B

174
Multiplexer
• Multiplexer

- we can use the behavior of an AND gate to build this circuit:

X∙0 = 0 “Block Signal”


X∙1 = X “Pass Signal”

- we can then use the behavior of an OR gate at the output stage (since a 0 input has no effect)
to combine the signals into one output

175
Multiplexer
• Multiplexer

- the outputs will track the selected input

- this is in effect, a “Switch”

ex) truth table of Multiplexer

Sel  A B | Out
 0   0 x |  0
 0   1 x |  1
 1   x 0 |  0
 1   x 1 |  1

- an ENABLE line can also be fed into each AND gate
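
The AND/OR construction described above can be sketched as a single concurrent assignment for a 2-to-1 multiplexer with an enable. The entity name is hypothetical, and the output is called Y because "out" is a reserved word in VHDL.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 2-to-1 mux; Sel = '0' selects A, Sel = '1' selects B.
entity mux_2to1_sketch is
  port (A, B, Sel, EN : in  std_logic;
        Y             : out std_logic);
end entity mux_2to1_sketch;

architecture dataflow of mux_2to1_sketch is
begin
  -- Each AND gate passes its input only when selected and enabled;
  -- the OR gate combines the two (at most one AND output is non-zero).
  Y <= (A and not Sel and EN) or (B and Sel and EN);
end architecture dataflow;
```

With EN = '0' both AND gates block, so this sketch drives Y to '0' rather than to high impedance.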

176
Multiplexer
• Multiplexers in VHDL

- Structural Model

entity mux_4to1 is
  port (D : in STD_LOGIC_VECTOR (3 downto 0);
        Sel : in STD_LOGIC_VECTOR (1 downto 0);
        Y : out STD_LOGIC);
end entity mux_4to1;

architecture mux_4to1_arch of mux_4to1 is

signal Sel_n : STD_LOGIC_VECTOR (1 downto 0);


signal U3_out, U4_out, U5_out, U6_out : STD_LOGIC;
component inv1 port (In1: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component and3 port (In1,In2,In3 : in STD_LOGIC; Out1: out STD_LOGIC); end component;
component or4 port (In1,In2,In3,In4: in STD_LOGIC; Out1: out STD_LOGIC); end component;

begin
U1 : inv1 port map (In1 => Sel(0), Out1 => Sel_n(0));
U2 : inv1 port map (In1 => Sel(1), Out1 => Sel_n(1));
U3 : and3 port map (In1 => D(0), In2 => Sel_n(1), In3 => Sel_n(0), Out1 => U3_out);
U4 : and3 port map (In1 => D(1), In2 => Sel_n(1), In3 => Sel(0), Out1 => U4_out);
U5 : and3 port map (In1 => D(2), In2 => Sel(1), In3 => Sel_n(0), Out1 => U5_out);
U6 : and3 port map (In1 => D(3), In2 => Sel(1), In3 => Sel(0), Out1 => U6_out);
U7 : or4 port map (In1 => U3_out, In2 => U4_out, In3 => U5_out, In4 => U6_out, Out1 => Y);

end architecture mux_4to1_arch;

177
Multiplexer
• Multiplexers in VHDL

- Structural Model w/ EN

entity mux_4to1 is
  port (D : in STD_LOGIC_VECTOR (3 downto 0);
        Sel : in STD_LOGIC_VECTOR (1 downto 0);
        EN : in STD_LOGIC;
        Y : out STD_LOGIC);
end entity mux_4to1;

architecture mux_4to1_arch of mux_4to1 is

signal Sel_n : STD_LOGIC_VECTOR (1 downto 0);


signal U3_out, U4_out, U5_out, U6_out : STD_LOGIC;

component inv1 port (In1: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component and4 port (In1,In2,In3,In4: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component or4 port (In1,In2,In3,In4: in STD_LOGIC; Out1: out STD_LOGIC); end component;

begin
U1 : inv1 port map (In1 => Sel(0), Out1 => Sel_n(0));
U2 : inv1 port map (In1 => Sel(1), Out1 => Sel_n(1));
U3 : and4 port map (In1 => D(0), In2 => Sel_n(1), In3 => Sel_n(0), In4 => EN, Out1 => U3_out);
U4 : and4 port map (In1 => D(1), In2 => Sel_n(1), In3 => Sel(0), In4 => EN, Out1 => U4_out);
U5 : and4 port map (In1 => D(2), In2 => Sel(1), In3 => Sel_n(0), In4 => EN, Out1 => U5_out);
U6 : and4 port map (In1 => D(3), In2 => Sel(1), In3 => Sel(0), In4 => EN, Out1 => U6_out);
U7 : or4 port map (In1 => U3_out, In2 => U4_out, In3 => U5_out, In4 => U6_out, Out1 => Y);
end architecture mux_4to1_arch;

178
Multiplexer
• Multiplexers in VHDL

- Behavioral Model w/ EN

entity mux_4to1 is
  port (D : in STD_LOGIC_VECTOR (3 downto 0);
        Sel : in STD_LOGIC_VECTOR (1 downto 0);
        EN : in STD_LOGIC;
        Y : out STD_LOGIC);
end entity mux_4to1;

architecture mux_4to1_arch of mux_4to1 is


begin
MUX : process (D, Sel, EN)
begin

if (EN = '1') then


case (Sel) is
when "00" => Y <= D(0);
when "01" => Y <= D(1);
when "10" => Y <= D(2);
when "11" => Y <= D(3);
when others => Y <= 'Z';
end case;
else
Y <= 'Z';
end if;

end process MUX;


end architecture mux_4to1_arch;

179
Multiplexer Example

180
Multi-bit Mux Example

181
Demultiplexer
• Demultiplexer

- this is the exact opposite of a Mux

- a single input will be routed to a particular output pin depending on the Select setting

ex) truth table of Demultiplexer

Sel Y0 Y1
0 In 0
1 0 In

182
Demultiplexer
• Demultiplexer

- we can again use the behavior of an AND gate to “pass” or “block” the input signal

- an AND gate is used for each Demux output

183
Demultiplexer
• Demultiplexers in VHDL

- Structural Model

entity demux_1to4 is
  port (D : in STD_LOGIC;
        Sel : in STD_LOGIC_VECTOR (1 downto 0);
        EN : in STD_LOGIC;
        Y : out STD_LOGIC_VECTOR (3 downto 0));
end entity demux_1to4;

architecture demux_1to4_arch of demux_1to4 is

signal Sel_n : STD_LOGIC_VECTOR (1 downto 0);

component inv1 port (In1: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component and4 port (In1,In2,In3,In4: in STD_LOGIC; Out1: out STD_LOGIC); end component;

begin
U1 : inv1 port map (In1 => Sel(0), Out1 => Sel_n(0));
U2 : inv1 port map (In1 => Sel(1), Out1 => Sel_n(1));

U3 : and4 port map (In1 => D, In2 => Sel_n(1), In3 => Sel_n(0), In4 => EN, Out1 => Y(0));
U4 : and4 port map (In1 => D, In2 => Sel_n(1), In3 => Sel(0), In4 => EN, Out1 => Y(1));
U5 : and4 port map (In1 => D, In2 => Sel(1), In3 => Sel_n(0), In4 => EN, Out1 => Y(2));
U6 : and4 port map (In1 => D, In2 => Sel(1), In3 => Sel(0), In4 => EN, Out1 => Y(3));

end architecture demux_1to4_arch;

184
Demultiplexer
• Demultiplexers in VHDL

- Behavioral Model with High Z Outputs

entity demux_1to4 is
  port (D : in STD_LOGIC;
        Sel : in STD_LOGIC_VECTOR (1 downto 0);
        EN : in STD_LOGIC;
        Y : out STD_LOGIC_VECTOR (3 downto 0));
end entity demux_1to4;

architecture demux_1to4_arch of demux_1to4 is


begin
DEMUX : process (D, Sel, EN)
begin

if (EN = '1') then


case (Sel) is
when "00" => Y <= 'Z' & 'Z' & 'Z' & D;
when "01" => Y <= 'Z' & 'Z' & D & 'Z';
when "10" => Y <= 'Z' & D & 'Z' & 'Z';
when "11" => Y <= D & 'Z' & 'Z' & 'Z';
when others => Y <= "ZZZZ";
end case;
else
Y <= "ZZZZ";
end if;
end process DEMUX;

end architecture demux_1to4_arch;

185
Tri-State Buffers
• Tri-State Buffers

- Provides either a Pass-Through or High Impedance Output depending on Enable Line

- High Impedance (Z) allows the circuit to be connected to a line with multiple circuits driving/receiving

- Using two Tri-State Buffers creates a "Bus Transceiver"

- This is used for "Multi-Drop" Buses (i.e., many Drivers/Receivers on the same bus)

ex) truth table of Tri-State Buffer

ENB | Out
 0  | Z
 1  | In

ex) truth table of Bus Transceiver

Tx/Rx | Mode
  0   | Receive from Bus (Rx)
  1   | Drive Bus (Tx)
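
A bus transceiver of the kind described above can be sketched with an inout port and one conditional assignment. All names here (bus_xcvr_sketch, TxRx, Tx_Data, Rx_Data, Bus_IO) are hypothetical; the polarity follows the Bus Transceiver truth table.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical single-bit bus transceiver.
entity bus_xcvr_sketch is
  port (TxRx    : in    std_logic;   -- '1' = drive bus (Tx), '0' = receive (Rx)
        Tx_Data : in    std_logic;   -- value to drive onto the bus
        Rx_Data : out   std_logic;   -- value seen on the bus
        Bus_IO  : inout std_logic);  -- shared, multi-drop bus line
end entity bus_xcvr_sketch;

architecture dataflow of bus_xcvr_sketch is
begin
  -- Transmit buffer: drives the bus only in Tx mode, otherwise high impedance.
  Bus_IO  <= Tx_Data when TxRx = '1' else 'Z';

  -- Receive path: always reflects the resolved value on the bus.
  Rx_Data <= Bus_IO;
end architecture dataflow;
```

Because STD_LOGIC is a resolved type, several such transceivers can share Bus_IO as long as only one of them is in Tx mode at a time.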

186
Tri-State Buffers
• Tri-State Buffers in VHDL

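- 'Z' is one of the values of the resolved STD_LOGIC data type defined in Package STD_LOGIC_1164

- when 'Z' and another value drive the same signal, the resolution function returns the other value:

Z resolved with 0 = 0
Z resolved with 1 = 1
Z resolved with L = L
Z resolved with H = H

TRISTATE: process (In1, ENB)
begin
  if (ENB = '1') then
    Out1 <= In1; -- enabled: pass the input through
  else
    Out1 <= 'Z'; -- disabled: high impedance
  end if;
end process TRISTATE;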

187
Comparators
• Comparators

- a circuit that compares digital values (i.e., Equal, Greater Than, Less Than)

- we are considering Digital Comparators (Analog comparators also exist)

- typically there will be 3-outputs, of which only one is asserted

- whether a bit is EQ, GT, or LT is a Boolean expression

- a Digital Comparator for two single bits (A and B) would look like:

A B | EQ (A=B)  GT (A>B)  LT (A<B)
0 0 |    1         0         0
0 1 |    0         0         1
1 0 |    0         1         0
1 1 |    1         0         0

EQ = (A⊕B)'
GT = A·B'
LT = A'·B

188
Comparators
• Non-Iterative Comparators

- "Iterative" refers to a circuit made up of identical blocks. The first block performs its operation, which
produces a result used in the 2nd block, and so on.

- this can be thought of as a "Ripple" effect

- Iterative circuits tend to be slower due to the ripple, but take less area

- Non-Iterative circuits consist of combinational logic executing at the same time

"Equality"

- since each bit in a vector must be equal, the outputs of each bit's compare can be AND'd

- for a 4-bit comparator:

EQ = (A3⊕B3)' · (A2⊕B2)' · (A1⊕B1)' · (A0⊕B0)'

189
Comparators
• Non-Iterative Comparators

"Greater Than"

- we can start at the MSB (n) and check whether An>Bn.

- If it is, we are done and can ignore the rest of the LSB's.
- If it is NOT, but they are equal, we need to check the next MSB bit (n-1)

- to ensure the previous bit was equal, we include it in the next LSB's logic expression:

Steps - GT = An·Bn' (this is ONLY true if An>Bn)

- if it is NOT GT, we go to the n-1 bit, requiring that An = Bn, i.e., (An⊕Bn)'
- we consider An-1>Bn-1 only when An = Bn [i.e., (An⊕Bn)' · (An-1·Bn-1') ]
- we continue this process through all of the bits

- 4-bit comparator

GT = (A3·B3') +
     (A3⊕B3)' · (A2·B2') +
     (A3⊕B3)' · (A2⊕B2)' · (A1·B1') +
     (A3⊕B3)' · (A2⊕B2)' · (A1⊕B1)' · (A0·B0')

190
Comparators
• Non-Iterative Comparators

"Less Than"

- since two vectors are always exactly one of EQ, GT, or LT, we can create LT from the other two outputs:

LT = EQ' · GT'
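
The three non-iterative expressions (EQ, GT, LT) can be sketched as concurrent assignments. The entity name and the internal signal names E, EQ_i, and GT_i are hypothetical; internal copies are used so that LT can be built from EQ and GT without reading an output port.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 4-bit non-iterative comparator.
entity comparator_4bit_sketch is
  port (A, B       : in  std_logic_vector(3 downto 0);
        EQ, GT, LT : out std_logic);
end entity comparator_4bit_sketch;

architecture dataflow of comparator_4bit_sketch is
  signal E          : std_logic_vector(3 downto 0);  -- per-bit equality, (Ai xor Bi)'
  signal EQ_i, GT_i : std_logic;                     -- internal copies of the outputs
begin
  E <= A xnor B;

  -- Equality: every bit position must match.
  EQ_i <= E(3) and E(2) and E(1) and E(0);

  -- Greater than: the first differing bit (from the MSB) has A = 1, B = 0.
  GT_i <= (A(3) and not B(3)) or
          (E(3) and A(2) and not B(2)) or
          (E(3) and E(2) and A(1) and not B(1)) or
          (E(3) and E(2) and E(1) and A(0) and not B(0));

  EQ <= EQ_i;
  GT <= GT_i;
  LT <= not EQ_i and not GT_i;  -- LT = EQ' . GT'
end architecture dataflow;
```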

• Iterative Comparators

- we can build an iterative comparator by passing signals between identical modules from MSB to LSB

ex) module for 1-bit comparator

EQout = (A⊕B)' · EQin

- EQout is fed into the EQin port of the next LSB module

- the first iterative module has EQin set to '1'
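
A chain of the 1-bit EQ modules can be sketched with a generate statement. The entity name, the generic N, and the signal EQ_chain are assumptions for illustration; only the equality chain given above is modeled.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical N-bit iterative equality comparator.
entity eq_iterative_sketch is
  generic (N : positive := 4);
  port (A, B : in  std_logic_vector(N-1 downto 0);
        EQ   : out std_logic);
end entity eq_iterative_sketch;

architecture iterative of eq_iterative_sketch is
  -- EQ_chain(i+1) is the EQin of module i; EQ_chain(i) is its EQout.
  signal EQ_chain : std_logic_vector(N downto 0);
begin
  EQ_chain(N) <= '1';  -- EQin of the first (MSB) module

  CELLS : for i in N-1 downto 0 generate
    -- EQout = (A xor B)' . EQin, passed on toward the LSB
    EQ_chain(i) <= (A(i) xnor B(i)) and EQ_chain(i+1);
  end generate CELLS;

  EQ <= EQ_chain(0);   -- EQout of the last (LSB) module
end architecture iterative;
```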

191
Comparators
• Comparators in VHDL

- Structural Model

entity comparator_4bit is

port (In1, In2 : in STD_LOGIC_VECTOR (3 downto 0);


EQ, LT, GT : out STD_LOGIC);

end entity comparator_4bit;

architecture comparator_4bit_arch of comparator_4bit is

signal Bit_Equal : STD_LOGIC_VECTOR (3 downto 0);


signal Bit_GT : STD_LOGIC_VECTOR (3 downto 0);
signal In2_n : STD_LOGIC_VECTOR (3 downto 0);
signal In1_and_In2_n : STD_LOGIC_VECTOR (3 downto 0);
signal EQ_temp, GT_temp : STD_LOGIC;

component xnor2 port (In1,In2: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component or4 port (In1,In2,In3,In4: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component nor2 port (In1,In2: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component and2 port (In1,In2: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component and3 port (In1,In2,In3: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component and4 port (In1,In2,In3,In4: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component inv1 port (In1: in STD_LOGIC; Out1: out STD_LOGIC); end component;

192
Comparators
• Comparators in VHDL
Cont…

begin
-- "Equal" Circuitry
XN0 : xnor2 port map (In1(0), In2(0), Bit_Equal(0)); -- 1st level of XNOR tree
XN1 : xnor2 port map (In1(1), In2(1), Bit_Equal(1));
XN2 : xnor2 port map (In1(2), In2(2), Bit_Equal(2));
XN3 : xnor2 port map (In1(3), In2(3), Bit_Equal(3));
AN0 : and4 port map (Bit_Equal(0), Bit_Equal(1), Bit_Equal(2), Bit_Equal(3), Eq); -- 2nd level of "Equal" Tree
AN1 : and4 port map (Bit_Equal(0), Bit_Equal(1), Bit_Equal(2), Bit_Equal(3), Eq_temp);

-- "Greater Than" Circuitry


IV0 : inv1 port map (In2(0), In2_n(0)); -- creating In2'
IV1 : inv1 port map (In2(1), In2_n(1));
IV2 : inv1 port map (In2(2), In2_n(2));
IV3 : inv1 port map (In2(3), In2_n(3));
AN2 : and2 port map (In1(3), In2_n(3), In1_and_In2_n(3)); -- creating In1 & In2'
AN3 : and2 port map (In1(2), In2_n(2), In1_and_In2_n(2));
AN4 : and2 port map (In1(1), In2_n(1), In1_and_In2_n(1));
AN5 : and2 port map (In1(0), In2_n(0), In1_and_In2_n(0));
AN6 : and2 port map (Bit_Equal(3), In1_and_In2_n(2), Bit_GT(2));
AN7 : and3 port map (Bit_Equal(3), Bit_Equal(2), In1_and_In2_n(1), Bit_GT(1));
AN8 : and4 port map (Bit_Equal(3), Bit_Equal(2), Bit_Equal(1), In1_and_In2_n(0), Bit_GT(0));
OR0 : or4 port map (In1_and_In2_n(3), Bit_GT(2), Bit_GT(1), Bit_GT(0), GT);
OR1 : or4 port map (In1_and_In2_n(3), Bit_GT(2), Bit_GT(1), Bit_GT(0), GT_temp);

-- "Less Than" Circuitry


ND0 : nor2 port map (EQ_temp, GT_temp, LT);

end architecture comparator_4bit_arch;

193
Comparators
• Comparators in VHDL

- Behavioral Model

entity comparator_4bit is

port (In1, In2 : in STD_LOGIC_VECTOR (3 downto 0);


EQ, LT, GT : out STD_LOGIC);

end entity comparator_4bit;

architecture comparator_4bit_arch of comparator_4bit is


begin

COMPARE : process (In1, In2)


begin

EQ <= '0'; LT <= '0'; GT <= '0'; -- initialize outputs to '0'

if (In1 = In2) then EQ <= '1'; end if; -- Equal


if (In1 < In2) then LT <= '1'; end if; -- Less Than
if (In1 > In2) then GT <= '1'; end if; -- Greater Than

end process COMPARE;

end architecture comparator_4bit_arch;

194
Numeric Basics
• Representing and processing numeric data is a common
requirement
– unsigned integers
– signed integers
– fixed-point real numbers
– floating-point real numbers
– complex numbers

195
Unsigned Integers in VHDL

196
Extending/Truncating Unsigned Numbers

197
Increment/Decrement in VHDL

198
Scaling in VHDL

199
Signed Integers in VHDL

200
Resizing Signed Integers

201
Ripple Carry Adder
• Addition – Half Adder

- one bit addition can be accomplished with an XOR gate (modulo-2 sum)

  0     1     0     1
+ 0   + 0   + 1   + 1
  0     1     1    10

- notice that we need to also generate a “Carry Out” bit

- the “Carry Out” bit can be generated using an AND gate

- this type of circuit is called a “Half Adder”

- it is only “Half” because it doesn’t consider a “Carry In” bit

202
Ripple Carry Adder
• Addition – Full Adder

- to create a full adder, we need to include the “Carry In” in the Sum

Cin A B | Cout Sum
 0  0 0 |  0    0
 0  0 1 |  0    1
 0  1 0 |  0    1
 0  1 1 |  1    0
 1  0 0 |  0    1
 1  0 1 |  1    0
 1  1 0 |  1    0
 1  1 1 |  1    1

Sum = A ⊕ B ⊕ Cin
Cout = Cin∙A + A∙B + Cin∙B

- you could also use two "Half Adders" to accomplish the same thing
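
A sketch of the two-Half-Adder construction follows. Both entity names are hypothetical. The second Half Adder adds Cin to the first partial sum, and the two carries are ORed, which is logically equivalent to the Cout expression in the table.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical Half Adder: XOR for the sum, AND for the carry.
entity half_adder_sketch is
  port (A, B      : in  std_logic;
        Sum, Cout : out std_logic);
end entity half_adder_sketch;

architecture dataflow of half_adder_sketch is
begin
  Sum  <= A xor B;
  Cout <= A and B;
end architecture dataflow;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical Full Adder built from two Half Adders and an OR gate.
entity full_adder_sketch is
  port (A, B, Cin : in  std_logic;
        Sum, Cout : out std_logic);
end entity full_adder_sketch;

architecture structural of full_adder_sketch is
  signal S1, C1, C2 : std_logic;  -- partial sum and the two partial carries
begin
  HA1 : entity work.half_adder_sketch
    port map (A => A,  B => B,   Sum => S1,  Cout => C1);
  HA2 : entity work.half_adder_sketch
    port map (A => S1, B => Cin, Sum => Sum, Cout => C2);

  -- At most one of C1 and C2 can be '1', so an OR combines them.
  Cout <= C1 or C2;
end architecture structural;
```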

203
Ripple Carry Adder
• Addition – Ripple Carry Adder

- cascading Full Adders together will allow the Cout’s to propagate (or Ripple) through the circuit

- this configuration is called a Ripple Carry Adder
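
The cascade can be sketched with a generate statement, writing each Full Adder's Sum and Cout equations inline. The entity name, the generic N, and the carry vector C are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical N-bit Ripple Carry Adder.
entity rca_sketch is
  generic (N : positive := 4);
  port (A, B : in  std_logic_vector(N-1 downto 0);
        Cin  : in  std_logic;
        Sum  : out std_logic_vector(N-1 downto 0);
        Cout : out std_logic);
end entity rca_sketch;

architecture ripple of rca_sketch is
  signal C : std_logic_vector(N downto 0);  -- C(i) is the carry into stage i
begin
  C(0) <= Cin;

  STAGE : for i in 0 to N-1 generate
    -- One Full Adder per bit; its carry out feeds the next stage.
    Sum(i) <= A(i) xor B(i) xor C(i);
    C(i+1) <= (C(i) and A(i)) or (A(i) and B(i)) or (C(i) and B(i));
  end generate STAGE;

  Cout <= C(N);
end architecture ripple;
```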

204
Ripple Carry Adder
• Addition – Ripple Carry Adder

- What is the delay through the Full Adder?

- Each Full Adder has the following logic:

Sum = A ⊕ B ⊕ Cin
Cout = Cin∙A + A∙B + Cin∙B

- tFull-Adder will be the longest combinational logic delay path in the adder

205
Ripple Carry Adder
• Addition – Ripple Carry Adder

- What is the delay through the entire iterative circuit?

- Each Full Adder must wait for the Carry Out of the previous stage, so the n stage delays add up:

tRCA = n·tFull-Adder

- the delay increases linearly with the number of bits

- different topologies within the full-adder to reduce delay (Δt) will have a n·Δt effect

206
Carry Look Ahead Adders
• Addition – Carry Look Ahead Adder

- We've seen a Ripple Carry Adder topology (RCA)

- this is good for simplicity and design-reuse

- however, the delay increases linearly with the number of bits

tRCA = n·tFull-Adder

- different topologies within the full-adder to reduce delay (Δt) will have a n·Δt effect

- the linear increase in delay comes from waiting for the Carry to Ripple through

207
Carry Look Ahead Adders
• Addition – Carry Look Ahead Adder

- to avoid the ripple, we can build a Carry Look-Ahead Adder (CLA)

- this circuit calculates the carry for all Full-Adders at the same time

- we define the following intermediate stages of a CLA:

Generate "g", an adder (i) generates a carry out (Ci+1) under input conditions Ai and Bi,
independent of Ai-1, Bi-1, or Carry In (Ci)

Ai Bi | Ci+1
0  0  |  0
0  1  |  0
1  0  |  0
1  1  |  1

we can say that: gi = Ai·Bi

remember, g does NOT consider carry in (Ci)

208
Carry Look Ahead Adders
• Addition – Carry Look Ahead Adder

Propagate "p", an adder (i) will propagate (or pass through) a carry in (Ci) depending on input
conditions Ai and Bi:

Ci Ai Bi | Ci+1
0  0  0  |  0
0  0  1  |  0
0  1  0  |  0
0  1  1  |  1
1  0  0  |  0
1  0  1  |  1
1  1  0  |  1
1  1  1  |  1

- pi is defined when there is a carry in, so we ignore the row entries where Ci=0
- if we only look at the Ci=1 rows, the carry is passed through whenever Ai or Bi is 1, so we can say that:

pi = Ai+Bi

209
Carry Look Ahead Adders
• Addition – Carry Look Ahead Adder

- said another way, Adder(i) will "Generate" a Carry Out (Ci+1) if:

gi = Ai·Bi

and it will "Propagate" a Carry In (Ci) when

pi = Ai+Bi

- a full expression for the Carry Out (Ci+1) in terms of p and g is given by:

Ci+1 = gi+pi·Ci

- this is good, but each carry still depends on the carry from the previous stage (i-1) of the iterative circuit

210
Carry Look Ahead Adders
• Addition – Carry Look Ahead Adder

- We can eliminate this dependence by recursively expanding each Carry Equation

ex) 4 bit Carry Look Ahead Logic

C1 = g0+p0·C0 (2-Level Sum-of-Products)

C2 = g1+p1·C1
C2 = g1+p1·(g0+p0·C0)
C2 = g1+p1·g0+p1·p0·C0 (2-Level Sum-of-Products)

C3 = g2+p2·C2
C3 = g2+p2·(g1+p1·g0+p1·p0·C0)
C3 = g2+p2·g1+p2·p1·g0+p2·p1·p0·C0 (2-Level Sum-of-Products)

C4 = g3+p3·C3
C4 = g3+p3·(g2+p2·g1+p2·p1·g0+p2·p1·p0·C0)
C4 = g3+p3·g2+p3·p2·g1+p3·p2·p1·g0+p3·p2·p1·p0·C0 (2-Level Sum-of-Products)

- this gives us logic expressions that can generate a next stage carry based upon ONLY
the inputs to the adder and the original carry in (C0)
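
The expanded carry equations can be sketched as a 4-bit adder in which every carry is computed directly from g, p, and C0. The entity name is hypothetical, and the sketch assumes the propagate term depends only on the operand bits, p(i) = A(i) or B(i), so that no carry depends on another carry.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 4-bit Carry Look Ahead Adder.
entity cla_4bit_sketch is
  port (A, B : in  std_logic_vector(3 downto 0);
        C0   : in  std_logic;
        Sum  : out std_logic_vector(3 downto 0);
        C4   : out std_logic);
end entity cla_4bit_sketch;

architecture dataflow of cla_4bit_sketch is
  signal g, p : std_logic_vector(3 downto 0);  -- generate and propagate terms
  signal C    : std_logic_vector(3 downto 1);  -- look-ahead carries C1..C3
begin
  g <= A and B;  -- gi = Ai.Bi
  p <= A or B;   -- pi = Ai+Bi (assumed form, see lead-in)

  -- Each carry is a two-level sum of products of g, p and C0 only.
  C(1) <= g(0) or (p(0) and C0);
  C(2) <= g(1) or (p(1) and g(0)) or (p(1) and p(0) and C0);
  C(3) <= g(2) or (p(2) and g(1)) or (p(2) and p(1) and g(0)) or
          (p(2) and p(1) and p(0) and C0);
  C4   <= g(3) or (p(3) and g(2)) or (p(3) and p(2) and g(1)) or
          (p(3) and p(2) and p(1) and g(0)) or
          (p(3) and p(2) and p(1) and p(0) and C0);

  -- Sum bits use the look-ahead carries instead of rippled ones.
  Sum(0) <= A(0) xor B(0) xor C0;
  Sum(1) <= A(1) xor B(1) xor C(1);
  Sum(2) <= A(2) xor B(2) xor C(2);
  Sum(3) <= A(3) xor B(3) xor C(3);
end architecture dataflow;
```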

211
Carry Look Ahead Adders
• Addition – Carry Look Ahead Adder

- the Carry Look Ahead logic has 3 levels

1) g and p logic
2) product terms in the Ci equations
3) sum terms in the Ci equations

- the Sum bits require 2 levels of Logic

1) Ai⊕Bi⊕Ci

NOTE: A Full Adder made up of 2 Half Adders has 3 levels. But the 3rd level is used in the
creation of the Carry Out bit. Since we do not use it in a CLA, we can ignore that level.

- So a CLA will have a total of 5 levels of Logic

212
Carry Look Ahead Adders
• Addition – Carry Look Ahead Adder

- the 5 levels of logic are fixed no matter how many bits the adder is (really?)

- In reality, the most significant Carry equation will have i+1 inputs into its largest sum/product term

- this means that Fan-In becomes a problem since real gates tend to not have more than 4-6 inputs

- When the number of inputs gets larger than the Fan-In, the logic needs to be broken into another level

ex) A+B+C+D+E = (A+B+C+D)+E

- In the worst case, the logic Fan-In would be 2. Even in this case, the delay associated with the
Carry Look Ahead logic would be proportional to log2(n)

- Area and Power are also concerns with CLA's. Typically CLA's are used in computationally intense
applications where performance outweighs Power and Area.

213
Carry Look Ahead Adders
• Adders in VHDL

- (+) and (-) are not defined for STD_LOGIC_VECTOR

- The Package STD_LOGIC_ARITH gives two data types:

UNSIGNED (3 downto 0) := "1111"; -- +15


SIGNED (3 downto 0) := "1111"; -- -1

- these are still resolved types (STD_LOGIC), but the equality and arithmetic operations are slightly
different depending on whether you are using Signed vs. Unsigned

• Considerations

- when adding signed and unsigned numbers, the type of the result will dictate how the operands are
handled/converted

- if assigning to an n-bit SIGNED result, an (n-1)-bit UNSIGNED operand will automatically be converted
to signed by extending its vector length by 1 and filling the new bit with a sign bit (0)

214
Carry Look Ahead Adders
• Adders in VHDL

ex) A,B : in UNSIGNED (7 downto 0);
C : in SIGNED (7 downto 0);
D : in STD_LOGIC_VECTOR (7 downto 0);

S : out UNSIGNED (8 downto 0);
T : out SIGNED (8 downto 0);
U : out SIGNED (7 downto 0);

S(7 downto 0) <= A + B; -- 8-bit UNSIGNED addition, not considering Carry

S <= ('0' & A) + ('0' & B); -- manually increasing size of A and B to include Carry.
                            -- Carry will be kept in S(8)

T <= A + C; -- T is SIGNED, so A's UNSIGNED vector size is increased
            -- by 1 and filled with '0' as a sign bit

U <= C + SIGNED(D); -- D is converted (considered) to SIGNED, not increased in size

T <= C + UNSIGNED(D); -- D is converted (considered) to UNSIGNED; like A above, it is then
                      -- extended by 1 with a '0' sign bit, so the 9-bit result needs T, not U

215
Subtraction
• Half Subtractor

- one bit subtraction can be accomplished using combinational logic

(A-B)

A B | Bout D
0 0 |  0   0
0 1 |  1   1
1 0 |  0   1
1 1 |  0   0

D = A ⊕ B
Bout = A'·B

216
Subtraction
• Full Subtractor

- to create a full Subtractor, we need to include the “Borrow In” in the Difference

(A-B-Bin)

A B Bin | Bout D
0 0  0  |  0   0
0 0  1  |  1   1
0 1  0  |  1   1
0 1  1  |  1   0
1 0  0  |  0   1
1 0  1  |  0   0
1 1  0  |  0   0
1 1  1  |  1   1

D = A ⊕ B ⊕ Bin
Bout = A'∙B + A'∙Bin + B∙Bin

- notice this is very similar to addition.

- The Sum and Difference Logic are identical

- The Carry and Borrow Logic are close

217
Subtraction
• Subtraction

- Can we manipulate the subtraction logic so that Full Adders can be used as Full Subtractors?

Addition                        Subtraction

S = A ⊕ B ⊕ Cin                 D = A ⊕ B ⊕ Bin
Cout = A∙B + A∙Cin + B∙Cin      Bout = A'∙B + A'∙Bin + B∙Bin

- Let's manipulate Bout to try to get it into a form similar to Cout

Bout = A'∙B + A'∙Bin + B∙Bin

Bout' = (A+B') ∙ (A+Bin') ∙ (B'+Bin') Generalized DeMorgan's Theorem

Now Multiply Out the Terms

Bout' = (A∙A∙B')+(A∙B'∙Bin')+(A∙B'∙B')+(B'∙B'∙Bin')+(A∙A∙Bin')+(A∙Bin'∙Bin')+(A∙B'∙Bin')+(B'∙Bin'∙Bin')

Now Remove Redundant Terms


Bout' = (A∙B')+(A∙B'∙Bin')+(A∙Bin')+(B'∙Bin')
Bout' = (A∙B')+(A∙Bin')+(B'∙Bin')

218
Subtraction
• Subtraction

- Now we have similar expressions for Cout and Bout where

Addition Subtraction

Cout = A∙B + A∙Cin + B∙Cin Bout' = A∙B' + A∙Bin' + B'∙Bin'

- But this requires the Subtrahend and Bin be inverted. How does this affect the Sum/Difference Logic?

Addition                        Subtraction

S = A ⊕ B ⊕ Cin                 D = A ⊕ B ⊕ Bin

- remember that both inputs of a 2-input XOR can be inverted without changing the logic
function which gives us:

S = A ⊕ B ⊕ Cin                 D = A ⊕ B' ⊕ Bin'

219
Subtraction
• Subtraction

- After all of this manipulation, we are left with

Addition                        Subtraction

S = A ⊕ B ⊕ Cin                 D = A ⊕ B' ⊕ Bin'
Cout = A∙B + A∙Cin + B∙Cin      Bout' = A∙B' + A∙Bin' + B'∙Bin'

- This means we can use "Full Adders" for subtraction as long as:

1) The Subtrahend is inverted


2) Bin is inverted
3) Bout is inverted

- In a ripple carry subtractor, intermediate Bout's are fed into Bin's, which is a double inversion

- The only inversions left are on the first Bin and the last Bout: since the first Bin is 0, we insert a '1' into the first carry input of the chain, and we invert the last carry output to obtain Bout

220
Subtraction
• Subtraction

- this gives us the minimal logic for a "Ripple Carry Subtractor" using "Full Adders"

X-Y
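
The three conditions (inverted Subtrahend, inverted Bin, inverted Bout) can be sketched by reusing the Full Adder equations in a ripple chain. The entity name and the signals Y_n and C are hypothetical; X and Y follow the X-Y label above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical N-bit ripple subtractor (D = X - Y) built from Full Adder logic.
entity ripple_sub_sketch is
  generic (N : positive := 4);
  port (X, Y : in  std_logic_vector(N-1 downto 0);
        D    : out std_logic_vector(N-1 downto 0);
        Bout : out std_logic);
end entity ripple_sub_sketch;

architecture ripple of ripple_sub_sketch is
  signal Y_n : std_logic_vector(N-1 downto 0);  -- inverted Subtrahend
  signal C   : std_logic_vector(N downto 0);    -- adder carries, i.e. inverted borrows
begin
  Y_n  <= not Y;
  C(0) <= '1';  -- first Bin is 0, so its inverse '1' enters the chain

  STAGE : for i in 0 to N-1 generate
    -- Same Sum and Cout logic as a Full Adder, applied to X and Y'.
    D(i)   <= X(i) xor Y_n(i) xor C(i);
    C(i+1) <= (X(i) and Y_n(i)) or (X(i) and C(i)) or (Y_n(i) and C(i));
  end generate STAGE;

  Bout <= not C(N);  -- last carry out is Bout', so invert it
end architecture ripple;
```

Intermediate carries are left uninverted because each Bout' feeds the next stage's Bin', which is the double inversion noted above.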

221
Adders/Subtractors in VHDL

222
Signed Addition in VHDL

223
Multipliers
• Multipliers

- binary multiplication of an individual bit can be performed using combinational logic:

A B | P
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1

we can say that: P = A·B

- for multi-bit multiplication, we can mimic the algorithm that we use when doing multiplication by hand

ex)    12   this number is the "Multiplicand"
     x 34   this number is the "Multiplier"
       48   1) multiplicand for digit (0)
    + 36    2) multiplicand for digit (1), shifted one place left
      408   3) Sum of all multiplicands

- this is called the "Shift and Add" algorithm

224
Multipliers
• "Shift and Add" Multipliers

- example of Binary Multiplication using our "by hand" method

    11                  1011   - multiplicand
  x 13                x 1101   - multiplier
    33                  1011
 + 11                  0000    - these are the individual multiplicands
   143                1011
                   + 1011
                    10001111   - the final product is the sum of all multiplicands

- this is simple and straight forward. BUT, the addition of the individual multiplicand products requires
as many as n-inputs.

- we would really like to re-use our Full Adder circuits, which only have 3 inputs.

225
Multipliers
• "Shift and Add" Multipliers

- we can perform the additions of each multiplicand after it is created

- this is called a "Partial Product"

- to keep the algorithm consistent, we use "0000" as the first Partial Product

        1011  - Original multiplicand
      x 1101  - Original multiplier
        0000  - Partial Product for 1st multiply
      + 1011  - Shifted Multiplicand for 1st multiply
        1011  - Partial Product for 2nd multiply
     + 0000   - Shifted Multiplicand for 2nd multiply
       01011  - Partial Product for 3rd multiply
    + 1011    - Shifted Multiplicand for 3rd multiply
      110111  - Partial Product for 4th multiply
   + 1011     - Shifted Multiplicand for 4th multiply
    10001111  - the final product is the sum of all multiplicands
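
The running Partial Product can be sketched behaviorally with a loop over the multiplier bits. This sketch assumes the IEEE numeric_std package for the unsigned type; the entity name and the variable pp are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical unsigned "Shift and Add" multiplier (combinational).
entity mult_shift_add_sketch is
  generic (N : positive := 4);
  port (Multiplicand, Multiplier : in  unsigned(N-1 downto 0);
        Product                  : out unsigned(2*N-1 downto 0));
end entity mult_shift_add_sketch;

architecture behavioral of mult_shift_add_sketch is
begin
  MULT : process (Multiplicand, Multiplier)
    variable pp : unsigned(2*N-1 downto 0);  -- running Partial Product
  begin
    pp := (others => '0');                   -- first Partial Product is all zeros
    for i in 0 to N-1 loop
      if Multiplier(i) = '1' then
        -- add the Multiplicand shifted to the weight of multiplier bit i
        pp := pp + shift_left(resize(Multiplicand, 2*N), i);
      end if;                                -- a '0' bit adds a shifted "0000"
    end loop;
    Product <= pp;
  end process MULT;
end architecture behavioral;
```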

226
Multipliers
• "Shift and Add" Multipliers

- Graphical view of product terms and summation

227
Multipliers
• "Shift and Add" Multipliers

- Graphical View of interconnect for an 8x8 multiplier. Note the Full Adders

228
Multipliers
• "Sequential" Multipliers

- the main speed limitation of the Combinational "Shift and Add" multiplier is the delay through the
adder chain.

- in the worst case, the number of delay paths through the adders would be [n + 2(n-2)]

ex) 4-bit = 8 Full Adders


8-bit = 20 Full Adders

- we can decrease this delay by using a register to accumulate the incremental additions as they
take place.

- this would reduce the number of operation states to [n-1]

• "Carry Save" Multipliers

- another trick to speed up the multiplication is to break the carry chain

- we can run the 0th carry from the first row of adders into adder for the 2nd row

- a final stage of adders is needed to recombine the carrys. But this reduces the delay to [n+(n-2)]

229
Multipliers
• "Carry Save" Multipliers

230
Unsigned Multiplication in VHDL

231
Signed Multipliers
• Multipliers

- we learned the "Shift and Add" algorithm for constructing a combinational multiplier

- but this only worked for unsigned numbers

- we can create a signed multiplier using a similar algorithm

• Convert to Positive

- one of the simplest ways is to first convert any negative numbers to positive, then use the unsigned
multiplier

- the sign bit is added after the multiplication following:

pos x pos = pos
pos x neg = neg
neg x pos = neg
neg x neg = pos

Remember that in 2's comp a sign bit of 0 = pos and 1 = neg, so the sign of the product is the XOR of the two sign bits

232
Signed Multipliers
• 2's Comp Multiplier

- remember that in a "Shift and Add', we created a shifted multiplicand

- the shifted multiplicand corresponded to the weight of the multiplier bit

- we can use this same technique for 2's comp remembering that

- the MSB of a 2's comp # has a weight of -2^(n-1)

- we also must remember that 2's comp addition must

- be on same-sized vectors
- the carry is ignored

- we can make partial products the same size as shifted multiplicands by doing a "2's comp sign extend"

ex) 1011 = 11011 = 1111011

- since the MSB has a negative weight, we NEGATE the shifted multiplicand for that bit prior to the
last addition.

233
Signed Multipliers
• 2's Comp Shift and Add Multipliers

- we can perform the additions of each multiplicand after it is created

- this is called a "Partial Product"

- to keep the algorithm consistent, we use "0000" as the first Partial Product

          1011  - Original multiplicand
        x 1101  - Original multiplier
         00000  - Partial Product for 1st multiply w/ Sign Extension
       + 11011  - Shifted Multiplicand for 1st multiply w/ Sign Extension
        111011  - Partial Product for 2nd multiply w/ Sign Extension
      + 00000   - Shifted Multiplicand for 2nd multiply w/ Sign Extension
       1111011  - Partial Product for 3rd multiply w/ Sign Extension
     + 11011    - Shifted Multiplicand for 3rd multiply w/ Sign Extension
      11100111  - Partial Product for 4th multiply w/ Sign Extension
    + 00101     - NEGATED Shifted Multiplicand for 4th multiply w/ Sign Extension
    1 00001111  - the final product is the sum of all multiplicands, ignore Carry_Out
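
The 2's comp variant differs from the unsigned algorithm only in sign-extending each shifted multiplicand and subtracting the one that belongs to the multiplier's MSB. A behavioral sketch, assuming the IEEE numeric_std package (whose resize sign-extends signed values); the entity and variable names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 2's comp "Shift and Add" multiplier (combinational).
entity mult_signed_sketch is
  generic (N : positive := 4);
  port (Multiplicand, Multiplier : in  signed(N-1 downto 0);
        Product                  : out signed(2*N-1 downto 0));
end entity mult_signed_sketch;

architecture behavioral of mult_signed_sketch is
begin
  MULT : process (Multiplicand, Multiplier)
    variable pp, term : signed(2*N-1 downto 0);
  begin
    pp := (others => '0');
    for i in 0 to N-1 loop
      if Multiplier(i) = '1' then
        -- sign-extend, then shift to the weight of multiplier bit i
        term := shift_left(resize(Multiplicand, 2*N), i);
        if i = N-1 then
          pp := pp - term;  -- MSB has negative weight: add the NEGATED term
        else
          pp := pp + term;
        end if;
      end if;
    end loop;
    Product <= pp;          -- carry out of the 2N-bit sum is discarded
  end process MULT;
end architecture behavioral;
```

For the worked example, 1011 (-5) times 1101 (-3) accumulates -5, then -20, then subtracts -40, giving 00001111 (+15).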

234
Division
• Division - "Repeated Subtraction"

- a simple algorithm to divide is to count the number of times you can subtract the divisor from the
dividend

- this is slow, but simple

- the number of times it can be subtracted without going negative is the "Quotient"

- when the next subtraction would result in a negative number, whatever is left prior to that
subtraction is the "Remainder" (a result of exactly zero means the Remainder is 0)
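
Repeated subtraction can be sketched behaviorally as below. This is a simulation-oriented sketch that assumes the IEEE numeric_std package; the while loop has a data-dependent bound, so it is not meant as synthesizable code. The entity name is hypothetical, and the result for a zero Divisor (Quotient 0, Remainder = Dividend) is an arbitrary choice made here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical unsigned divider using repeated subtraction.
entity div_repeat_sub_sketch is
  generic (N : positive := 4);
  port (Dividend, Divisor   : in  unsigned(N-1 downto 0);
        Quotient, Remainder : out unsigned(N-1 downto 0));
end entity div_repeat_sub_sketch;

architecture behavioral of div_repeat_sub_sketch is
begin
  DIV : process (Dividend, Divisor)
    variable r, q : unsigned(N-1 downto 0);
  begin
    r := Dividend;
    q := (others => '0');
    if Divisor /= 0 then            -- guard: avoid an endless loop on divide-by-zero
      while r >= Divisor loop       -- subtract only while the result stays non-negative
        r := r - Divisor;
        q := q + 1;                 -- count the successful subtractions
      end loop;
    end if;
    Quotient  <= q;
    Remainder <= r;                 -- whatever is left when no more subtractions fit
  end process DIV;
end architecture behavioral;
```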

235
Division
• Division - "Shift and Subtract"

- Division is similar to multiplication, but instead of "Shift and Add", we "Shift and Subtract"

236
Fixed-Point in VHDL
• Many applications use non-integers
– especially signal-processing apps
– Fixed-point numbers allow for fractional parts
– represented as integers that are implicitly scaled by a power of 2
• Choosing Range and Precision
– Choice depends on application
– Need to understand the numerical behavior of computations performed
• some operations can magnify quantization
– In DSP: fixed-point range affects dynamic range
– In DSP: precision affects signal-to-noise ratio
• Use numeric_bit with implied scaling
• Use proposed fixed_pkg package
– Currently being standardized by IEEE
– Types ufixed and sfixed
– Arithmetic operations, resizing, conversion
237
Floating-Point in VHDL
• Similar to scientific notation for decimal
– e.g., 6.02214199×10^23, 1.60217653×10^–19
• Allow for larger range, with same relative precision throughout the range

• Use proposed float_pkg package


– Currently being standardized by IEEE
– Types float, float32, float64, float128
– Arithmetic operations, resizing, conversion
• Not likely to be synthesizable
– Rather, use to verify results of hand-optimized circuits

238
Sequential Logic Design with VHDL

• Agenda

1. Flip-Flops & Latches


2. Counters
3. Finite State Machines
4. State Variable Encoding
Model of Sequential Circuits

240
Example

241
Types of Memory Elements
• Flip-Flop
– Latch
– Registers

• Others
– Register Files
– Cache
– Flash memory
– ROM
– RAM

242
D-FF vs. D-Latch
• FF is edge sensitive (can be either positive or negative edge)
– At trigger edge of clock, input transferred to output
• Latch is level sensitive (can be either active-high or active-low)
– When clock is active, input passes to output (transparent)
– When clock is not active, output stays unchanged

243
Important Timing Parameters (1)

244
Important Timing Parameters (2)

245
System Timing: Minimum Period

246
System Timing: Minimum Delay

247
FF Based, Edge Trigger Clocking
• Td = delay of combinational logic
• Tcycle = cycle time of clock
– Duty cycle does not matter

• Timing requirements for Td


– Tdmax < Tcycle –Tsetup – Tcq -> no setup time violation
– Tdmin > Thold – Tcq -> no hold time violation

248
Latch Based, Single Phase Clocking
• Aka. Pulse Mode clocking
• Tcycle = cycle time of clock; Tw = pulse width of clock

• Timing requirements for Td


– Tdmax < Tcycle –Tdq -> data latched correctly
– Tdmin > Tw – Tdq -> no racing through next stage

249
Comparison
• Flip-Flop Based
− Larger in area
− Larger clocking overhead (Tsetup, Tcq)
+ Design more robust
• Only have to worry about Tdmax
• Tdmin usually small, can be easily fixed by buffer
+ Pulse width does not matter
• Latch Based Single Phase
+ Smaller area
+ Smaller clocking overhead ( only Tdq)
− Worry about both Tdmax and Tdmin
– Pulse width DOES matter
(unfortunately, pulse width can vary on chip)

250
Latches
• Latches
– we’ve learned all of the VHDL syntax necessary to describe sequential
storage elements

– Let’s review where sequential devices come from

• SR Latch

- To understand the SR Latch, we must remember the truth table for a NOR Gate

A B | F
0 0 | 1
0 1 | 0
1 0 | 0
1 1 | 0

251
Latches
• SR Latch

- when S=0 & R=0, it puts this circuit into a Bi-stable feedback mode where the output is either:

Q=0, Qn=1 or Q=1, Qn=0

[Figure: the two cross-coupled NOR gates (U1, U2) shown in each stable state, annotated with the NOR truth-table row each gate is operating in]

252
Latches
• SR Latch
- we can force a known state using S & R:

Set (S=1, R=0) or Reset (S=0, R=1)

[Figure: the two cross-coupled NOR gates (U1, U2) shown in the Set and Reset states, annotated with the NOR truth-table row each gate is operating in]

253
Latches
• SR Latch
- we can write a Truth Table for an SR Latch as follows

SR Q Qn .
0 0 Last Q Last Qn - Hold
0 1 0 1 - Reset
1 0 1 0 - Set
1 1 0 0 - Don’t Use

- S=1 & R=1 forces a 0 on both outputs. However, when the latch comes out of this state it is
metastable. This means the final state is unknown.
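
The cross-coupled NOR structure can be sketched with two concurrent assignments. The entity name and the internal signals Q_i and Qn_i are hypothetical. In this zero-delay model, releasing S and R from 1 to 0 at the same instant makes the two assignments chase each other without settling, which is the simulation counterpart of the unknown final state described above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical SR Latch built from two cross-coupled NOR gates.
entity sr_latch_sketch is
  port (S, R  : in  std_logic;
        Q, Qn : out std_logic);
end entity sr_latch_sketch;

architecture dataflow of sr_latch_sketch is
  signal Q_i, Qn_i : std_logic;  -- internal copies so the outputs can be fed back
begin
  Q_i  <= R nor Qn_i;  -- R = 1 forces Q  to 0 (Reset)
  Qn_i <= S nor Q_i;   -- S = 1 forces Qn to 0 (Set)

  Q  <= Q_i;
  Qn <= Qn_i;
end architecture dataflow;
```

Until the first Set or Reset, both outputs stay at 'U' in simulation because neither NOR gate has a known feedback value.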

254
Latches
• S’R’ Latch
- we can also use NAND gates to form an inverted SR Latch

S’ R’ Q Qn .
0 0 1 1 - Don’t Use
0 1 1 0 - Set
1 0 0 1 - Reset
1 1 Last Q Last Qn - Hold

255
Latches
• SR Latch w/ Enable
- we then can add an enable line using NAND gates

- remember the Truth Table for a NAND gate

A B | F
0 0 | 1
0 1 | 1
1 0 | 1
1 1 | 0

- a 0 on any input forces a 1 on the output
- when C=0, the two EN NAND Gate outputs are 1, which forces “Last Q/Qn”
- when C=1, S & R are passed through INVERTED

256
Latches
• SR Latch w/ Enable
- the truth table then becomes

C SR Q Qn .
1 0 0 Last Q Last Qn - Hold
1 0 1 0 1 - Reset
1 1 0 1 0 - Set
1 1 1 1 1 - Don’t Use
0 x x Last Q Last Qn - Hold

257
Latches
• D Latch
- a modification to the SR Latch where R = S’ creates a D-latch

- when C=1, Q <= D


- when C=0, Q <= Last Value

CD Q Qn .
1 0 0 1 - track
1 1 1 0 - track
0 x Last Q Last Qn - Hold

258
Latches
• VHDL of a D Latch

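architecture Dlatch_arch of Dlatch is
begin
  LATCH : process (D, C)
  begin
    if (C = '1') then   -- transparent: outputs track D
      Q <= D; Qn <= not D;
    end if;             -- no else branch: Q and Qn hold their last value (a latch is inferred)
  end process LATCH;
end architecture Dlatch_arch;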

259
Flip Flops
• D-Flip-Flops

- we can combine D-latches to get an edge triggered storage device (or flop)

- the first D-latch is called the “Master”, the second D-latch the “Slave”

Master: CLK=0, Q<=D “Open”;   CLK=1, Q<=Q “Closed”
Slave:  CLK=0, Q<=Q “Closed”; CLK=1, Q<=D “Open”

- on a rising edge of clock, D is “latched” and held on Q until the next rising edge

260
Flip Flops
• VHDL of a D-Flip-Flop

architecture DFF_arch of DFF is

begin
    FLOP : process (CLK)
    begin
        if (CLK'event and CLK='1') then -- recognized by all synthesizers as DFF
            Q<=D; Qn<=not D;
        end if;                         -- no else branch: Q and Qn hold between rising edges
    end process;
end architecture;

261
Registers
• Store a multi-bit encoded value
– One D-flipflop per bit
– Stores a new value on each clock cycle

262
Register with Enable
• Storage controlled by a clock-enable
– stores only when CE = 1 on a rising edge of the clock
– CE is a synchronous control input
• One flipflop per bit
– clk and CE wired in common

263
Example: Accumulator
• Sum a sequence of signed numbers
– A new number arrives when data_en = 1
– Clear sum to 0 on synch reset
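
The accumulator described above can be sketched as a register with a clock-enable and a synchronous reset. This is only an illustration: the entity name, the port names other than data_en, the 16-bit input width, the 20-bit sum width, and the active-high reset are assumptions, and IEEE.NUMERIC_STD is used for the signed arithmetic.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity accumulator is
    port ( clk      : in  std_logic;
           reset    : in  std_logic;               -- synchronous, active high (assumed)
           data_en  : in  std_logic;               -- a new number is present when '1'
           data_in  : in  signed(15 downto 0);     -- input width is an assumption
           data_out : out signed(19 downto 0) );   -- extra bits give headroom for the sum
end entity;

architecture rtl of accumulator is
    signal sum : signed(19 downto 0);
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if reset = '1' then
                sum <= (others => '0');                      -- clear sum to 0
            elsif data_en = '1' then
                sum <= sum + resize(data_in, sum'length);    -- sign-extend, then add
            end if;                                          -- otherwise hold
        end if;
    end process;

    data_out <= sum;
end architecture;
```

Because reset is tested inside the rising-edge branch, it only takes effect on a clock edge, which is what makes it synchronous.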

264
Flipflop and Register Variations

265
Shift Registers
• Performs shift operation on stored data
– Arithmetic scaling
– Serial transfer of data
• Example: Sequential Multiplier
– 16×16 multiply over 16 clock cycles, using one adder
– Shift register for multiplier bits
– Shift register for lsb’s of accumulated product

266
Counters
• Counters
- the name given to any clocked sequential circuit whose state diagram is a single cycle (a circle)

- there are many types of counters, each suited for particular applications

267
Counters
• Binary Counter
- state machine that produces a straight binary count

- for n flip-flops, 2^n counts can be produced

- the Next State Logic "F" is a combinational SOP/POS circuit

- the speed will be limited by the Setup/Hold and Combinational Delay of "F"

- this gives the maximum number of counts for n-flip flops

268
Counters
• Toggle Flop
- a D-Flip-Flop can produce a "Divide-by-2" effect by feeding back Qn to D

- this topology is also called a "Toggle Flop"
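
A minimal sketch of a Toggle Flop in VHDL follows. The entity and signal names are illustrative, and the asynchronous active-low reset is an assumption chosen to match the counter examples later in these slides.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity toggle_flop is
    port ( clk   : in  std_logic;
           reset : in  std_logic;   -- asynchronous, active low (assumed)
           q     : out std_logic );
end entity;

architecture rtl of toggle_flop is
    signal q_int : std_logic;       -- internal copy so the value can be read back
begin
    process (clk, reset)
    begin
        if reset = '0' then
            q_int <= '0';
        elsif rising_edge(clk) then
            q_int <= not q_int;     -- D = Qn, so Q toggles on every rising edge
        end if;
    end process;

    q <= q_int;                     -- q changes once per clk period: half the clk frequency
end architecture;
```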

269
Counters
• Ripple Counter
- Cascaded Toggle Flops can be used to form a ripple counter

- there is no Next State Logic

- this is slower than a straight binary counter due to waiting for the "ripple"

- this is good for low power, low speed applications

270
Counters
• Synchronous Counter with ENABLE
- an enable can be included in a "Synchronous" binary counter using Toggle Flops

- the enable is implemented by AND'ing the Q output prior to the next toggle flop

- this gives us the "ripple" effect, but also gives the ability to run synchronously

- a little faster than a ripple counter, but still fewer gates than a straight binary circuit

271
Counters
• Shift Register
- a chain of D-Flip-Flops that pass data to one another

- this is good for "pipelining"

- also good for Serial-to-Parallel conversion

- for n flip-flops, the data is present at the final stage after n clocks
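
A minimal sketch of a 4-bit serial-in, parallel-out shift register follows. The names and the 4-bit length are illustrative assumptions.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity shift_reg is
    port ( clk : in  std_logic;
           sin : in  std_logic;                       -- serial input
           q   : out std_logic_vector(3 downto 0) );  -- parallel output
end entity;

architecture rtl of shift_reg is
    signal sr : std_logic_vector(3 downto 0);
begin
    process (clk)
    begin
        if rising_edge(clk) then
            -- every bit moves one stage toward sr(3); the new bit enters at sr(0)
            sr <= sr(2 downto 0) & sin;
        end if;
    end process;

    q <= sr;
end architecture;
```

With n = 4 flip-flops, a bit presented on sin reaches the final stage sr(3) after 4 rising clock edges.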

272
Counters
• Ring Counter
- feeding the output of a shift register back to the input creates a "ring counter"

- also called a "One Hot" counter

- the first flip-flop needs to reset to 1, while the others reset to 0

- for n flip-flops, there will be n counts

273
Counters
• Johnson Counter
- feeding the inverted output of a shift register back to the input creates a "Johnson Counter"

- this gives more states with the same reduced gate count

- all flip-flops can reset to 0

- for n flip-flops, there will be 2·n counts (twice the number of flip-flops, not 2^n)

274
Counters
• Linear Feedback Shift Register (LFSR) Counter
- all of the counters based on shift registers give far fewer states than the 2^n counts that are possible

- an LFSR counter is based on the theory of finite fields

- this theory was created by French mathematician Évariste Galois (1811-1832)

- for each size of shift register, a feedback equation is given which is the sum modulo 2 of a certain
set of output bits

- this equation produces the input to the shift register

- this type of counter can produce 2^n - 1 counts, nearly the maximum possible

275
Counters
• Linear Feedback Shift Register (LFSR) Counter
- the feedback equations are listed in Table 8.26 of the textbook

- By convention, bits always shift from X(n-1) toward X0 (or Q0 toward Q(n-1), as we defined the shift
register previously)

- they each use XOR gates (sum modulo 2) of particular bits in the register chain

ex)

n    Feedback Equation
2    X2 = X1 ⊕ X0
3    X3 = X1 ⊕ X0
4    X4 = X1 ⊕ X0
5    X5 = X2 ⊕ X0
6    X6 = X1 ⊕ X0
7    X7 = X3 ⊕ X0
8    X8 = X4 ⊕ X3 ⊕ X2 ⊕ X0
:    :
:    :

276
Counters
• Linear Feedback Shift Register (LFSR) Counter
ex) 4-flip-flop LFSR Counter
Feedback Equation = X1 ⊕ X0 (or Q2 ⊕ Q3 as we defined it)

# Q(0:3) Sin
0 1000 0
1 0100 0
2 0010 1
3 1001 1
4 1100 0
5 0110 1
6 1011 0
7 0101 1
8 1010 1
9 1101 1
10 1110 1
11 1111 0
12 0111 0
13 0011 0
14 0001 1 - this is 2^n - 1 = 15 unique counts
repeat 1000

277
Counters
• Counters in VHDL
- strong typing in VHDL can make modeling counters difficult (at first glance)

- the reason for this is that the STANDARD and STD_LOGIC_1164 Packages do not define
"+" or "-" for BIT_VECTOR or STD_LOGIC_VECTOR types, and the predefined relational operators
compare these vectors element by element as arrays rather than as numbers

278
Counters
• Counters in VHDL
- there are a couple ways that we get around this

1) Use the STD_LOGIC_UNSIGNED Package

- this package defines "+" and "-" functions for STD_LOGIC_VECTOR

- we can use +1 just like normal

- the vector will wrap as expected (1111 -> 0000)

- one catch is that an output Port cannot be read back (before VHDL-2008), so it cannot appear on the right-hand side of the count

- we need to create an internal signal of STD_LOGIC_VECTOR for counting

- we then assign to the Port at the end

279
Counters
• Counters in VHDL using STD_LOGIC_UNSIGNED

library IEEE;
use IEEE.STD_LOGIC_1164.ALL; -- defines STD_LOGIC and STD_LOGIC_VECTOR
use IEEE.STD_LOGIC_UNSIGNED.ALL; -- call the package

entity counter is
Port ( Clock : in STD_LOGIC;
Reset : in STD_LOGIC;
Direction : in STD_LOGIC;
Count_Out : out STD_LOGIC_VECTOR (3 downto 0));
end counter;

280
Counters
• Counters in VHDL using STD_LOGIC_UNSIGNED
architecture counter_arch of counter is

signal count_temp : std_logic_vector(3 downto 0); -- Notice internal signal

begin
process (Clock, Reset)
begin
if (Reset = '0') then
count_temp <= "0000";
elsif (Clock='1' and Clock'event) then
if (Direction='0') then
count_temp <= count_temp + '1'; -- count_temp can be used on both LHS and RHS
else
count_temp <= count_temp - '1';
end if;
end if;
end process;

Count_Out <= count_temp; -- assign to Port after the process

end counter_arch;

281
Counters
• Counters in VHDL
2) Use integers for the counter and then convert back to STD_LOGIC_VECTOR

- STD_LOGIC_ARITH is a Package that defines a conversion function

- the function is: conv_std_logic_vector (ARG, SIZE)

- functions are defined for ARG = integer, unsigned, signed, STD_ULOGIC

- SIZE is the number of bits in the vector to convert to, given as an integer

- we need to keep track of the RANGE and Counter Overflow

282
Counters
• Counters in VHDL using STD_LOGIC_ARITH

library IEEE;
use IEEE.STD_LOGIC_1164.ALL; -- defines STD_LOGIC and STD_LOGIC_VECTOR
use IEEE.STD_LOGIC_ARITH.ALL; -- call the package

entity counter is
Port ( Clock : in STD_LOGIC;
Reset : in STD_LOGIC;
Direction : in STD_LOGIC;
Count_Out : out STD_LOGIC_VECTOR (3 downto 0));
end counter;

283
Counters
• Counters in VHDL using STD_LOGIC_ARITH
architecture counter_arch of counter is

signal count_temp : integer range 0 to 15; -- Notice internal integer specified with Range

begin
process (Clock, Reset)
begin
if (Reset = '0') then
count_temp <= 0; -- integer assignment doesn't require quotes
elsif (Clock='1' and Clock'event) then
if (count_temp = 15) then
count_temp <= 0; -- we manually check for overflow
else
count_temp <= count_temp + 1;
end if;
end if;
end process;

Count_Out <= conv_std_logic_vector (count_temp, 4); -- convert integer into a 4-bit STD_LOGIC_VECTOR

end counter_arch;

284
Counters
• Counters in VHDL
3) Use UNSIGNED data types

- STD_LOGIC_ARITH also defines "+", "-", and the relational operators for UNSIGNED types

- UNSIGNED is a Data type defined in STD_LOGIC_ARITH

- UNSIGNED is an array of STD_LOGIC

- An UNSIGNED type has the same structure as a STD_LOGIC_VECTOR type, but it is a distinct type, so an explicit type conversion is needed between the two

- the relational operators treat the value as an unsigned number (as opposed to 2's comp SIGNED)

• Pros and Cons

- using integers allows a higher level of abstraction and more functionality can be included

- with integers it is also easier to write unsynthesizable code or code that produces unwanted logic

- both are synthesizable when written correctly
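
STD_LOGIC_ARITH and STD_LOGIC_UNSIGNED are vendor (Synopsys) packages rather than IEEE standards. The IEEE-standard package NUMERIC_STD provides an UNSIGNED type along the same lines. The sketch below rewrites the up/down counter from the earlier slides with NUMERIC_STD; it is one possible version, not the form used in these slides, and it keeps the same port names and active-low reset.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity counter is
    port ( Clock     : in  std_logic;
           Reset     : in  std_logic;
           Direction : in  std_logic;
           Count_Out : out std_logic_vector(3 downto 0) );
end entity;

architecture counter_arch of counter is
    signal count_temp : unsigned(3 downto 0);   -- internal signal, as before
begin
    process (Clock, Reset)
    begin
        if Reset = '0' then
            count_temp <= (others => '0');
        elsif rising_edge(Clock) then
            if Direction = '0' then
                count_temp <= count_temp + 1;   -- wraps from 1111 to 0000
            else
                count_temp <= count_temp - 1;   -- wraps from 0000 to 1111
            end if;
        end if;
    end process;

    -- UNSIGNED and STD_LOGIC_VECTOR are closely related array types,
    -- so a plain type conversion is enough here
    Count_Out <= std_logic_vector(count_temp);
end architecture;
```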

285
Counters
• Ring Counters in VHDL
- to mimic the shift register behavior, we need access to the signal value before and after clock'event

- consider the following concurrent signal assignments:

architecture ….
begin
Q0 <= Q3;
Q1 <= Q0;
Q2 <= Q1;
Q3 <= Q2;

end architecture…

- since they are executed concurrently, it is equivalent to Q0=Q1=Q2=Q3, or a simple wire

286
Counters
• Ring Counters in VHDL
- since a process doesn't assign the signal values until it suspends, we can use this to model the
"before and after" behavior of a clock event.

process (Clock, Reset)


begin
if (Reset = '0') then
Q0<='1'; Q1<='0'; Q2<='0'; Q3<='0';
elsif (Clock'event and Clock='1') then
Q0<=Q3; Q1<=Q0; Q2<=Q1; Q3<=Q2;
end if;
end process;

- notice that the signals DO NOT appear in the sensitivity list. If they did the process would
continually execute and not be synthesized as a flip-flop structure

287
Counters
• Johnson Counters in VHDL

process (Clock, Reset)


begin
if (Reset = '0') then
Q0<='0'; Q1<='0'; Q2<='0'; Q3<='0';
elsif (Clock'event and Clock='1') then
Q0<=not Q3; Q1<=Q0; Q2<=Q1; Q3<=Q2;
end if;
end process;

288
Counters
• Linear Feedback Shift Register Counters in VHDL

process (Clock, Reset)
begin
    if (Reset = '0') then
        Q0<='1'; Q1<='0'; Q2<='0'; Q3<='0'; -- must not reset to all 0's: 0 xor 0 = 0, so "0000" would never change
    elsif (Clock'event and Clock='1') then
        Q0<=Q3 xor Q2; Q1<=Q0; Q2<=Q1; Q3<=Q2;
    end if;
end process;

289
Terminal Count and Divide by k
• TC is '1' for one cycle in every 2^n cycles
– frequency = clock frequency / 2^n
– Called a clock divider
• Decode k–1 as terminal count and reset counter register
– Counter increments modulo k
• Example: decade counter
– Terminal count (TC) = 9
• Decade Counter in VHDL
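
A minimal sketch of such a decade counter follows, assuming IEEE.NUMERIC_STD and a synchronous active-high reset. The entity and port names are illustrative.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity decade_counter is
    port ( clk   : in  std_logic;
           reset : in  std_logic;                      -- synchronous, active high (assumed)
           q     : out std_logic_vector(3 downto 0);
           tc    : out std_logic );                    -- terminal count
end entity;

architecture rtl of decade_counter is
    signal count : unsigned(3 downto 0);
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if reset = '1' or count = 9 then
                count <= (others => '0');   -- decode k-1 = 9 and return to 0
            else
                count <= count + 1;
            end if;
        end if;
    end process;

    q  <= std_logic_vector(count);
    tc <= '1' when count = 9 else '0';      -- '1' for one cycle in every 10
end architecture;
```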

290
Loadable Counter in VHDL
• Load a starting value, then decrement
– Terminal count = 0
– Useful for interval timer
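
A minimal sketch of a loadable down counter used as an interval timer follows. The names, the 8-bit width, and the choice to stop at zero rather than wrap around are assumptions made for illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity interval_timer is
    port ( clk  : in  std_logic;
           load : in  std_logic;                      -- load the starting value
           d    : in  std_logic_vector(7 downto 0);   -- starting value
           tc   : out std_logic );                    -- terminal count reached
end entity;

architecture rtl of interval_timer is
    signal count : unsigned(7 downto 0);
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if load = '1' then
                count <= unsigned(d);
            elsif count /= 0 then
                count <= count - 1;         -- decrement until zero, then hold
            end if;
        end if;
    end process;

    tc <= '1' when count = 0 else '0';      -- terminal count = 0
end architecture;
```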

291
Reloading Counter in VHDL

292
State Machines
What is FSM?
• A model of computation consisting of
– a set of states, (limited number)
– a start state,
– input symbols,
– a transition function that maps input symbols and current states to a next
state.

294
Counters
• Multiple Processes
- we can now use State Machines to control the start/stop/load/reset of counters

- each is an independent process, and the processes interact with each other through signals

- a common task for a state machine is:

1) at a certain state, load and enable a counter

2) go to a state and wait until the counter reaches a certain value

3) when it reaches the certain value, disable the counter and continue to the next state

- since the counter runs off of a clock, we know how long it will count between the start and stop

295
State Machines
• State Machines
- there is a basic structure for a Clocked, Synchronous State Machine

1) State Memory (i.e., flip-flops)


2) Next State Logic “F” (combinational logic)
3) Output Logic “G” (combinational logic) we’ll revisit G later…

- if we keep this structure in mind while designing digital machines in VHDL, then it is a very
straight forward task

- Each part of the State Machine is modeled with its own process

- let’s start by reviewing the design of a state machine using a manual method

296
Elements of FSM
• Memory Elements (ME)
– Memorize Current States (CS)
– Usually consist of FF or latch
– N flip-flops have 2^N possible states
• Next-state Logic (NL)
– Combinational Logic
– Produce next state
• Based on current state (CS) and input (X)
• Output Logic (OL)
– Combinational Logic
– Produce outputs (Z)
• Based on current state, or
• Based on current state and input

297
Finite State Machine
• Used to control the circuit core
• Partition FSM and non-FSM part

298
Finite State Machines
• Synchronous (i.e. clocked) finite state machines (FSMs) have
widespread application in digital systems, e.g. as datapath
controllers in computational units and processors.
Synchronous FSMs are characterized by a finite number of
states and by clock-driven state transitions.
• Mealy Machine: The next state and the outputs depend on the
present state and the inputs.
• Moore Machine: The next state depends on the present state
and the inputs, but the output depends on only the present
state.

299
State Machines
• State Machines
“Mealy Outputs” – outputs depend on the Current_State and the Inputs

300
State Machines
• State Machines
“Moore Outputs” – outputs depend on the Current_State only

301
State Machines
• State Machines
- the steps in a state machine design are:

1) Word Description of the Problem


2) State Diagram
3) State/Output Table
4) State Variable Assignment
5) Choose Flip-Flop type
6) Construct F
7) Construct G
8) Logic Diagram

302
State Machines
• State Machine Example “Sequence Detector”
1) Design a machine by hand that takes in a serial bit stream and looks for the pattern “1011”.
When the pattern is found, a signal called “Found” is asserted

2) State Diagram

303
State Machines
• State Machine Example “Sequence Detector”
3) State/Output Table

Current_State   In   Next_State   Out (Found)

S0              0    S0           0
                1    S1           0
S1              0    S2           0
                1    S0           0
S2              0    S0           0
                1    S3           0
S3              0    S0           0
                1    S0           1

304
State Machines
• State Machine Example “Sequence Detector”
4) State Variable Assignment – let’s use binary

Current_State   In   Next_State   Out
Q1 Q0                Q1* Q0*      Found

0  0            0    0   0        0
                1    0   1        0
0  1            0    1   0        0
                1    0   0        0
1  0            0    0   0        0
                1    1   1        0
1  1            0    0   0        0
                1    0   0        1

5) Choose Flip-Flop Type

- 99% of the time we use D-Flip-Flops

305
State Machines
• State Machine Example “Sequence Detector”
6) Construct Next State Logic “F”

K-map for Q1*

          Q1 Q0
          00   01   11   10
In = 0    0    1    0    0
In = 1    0    0    0    1

Q1* = Q1’∙Q0∙In’ + Q1∙Q0’∙In

K-map for Q0*

          Q1 Q0
          00   01   11   10
In = 0    0    0    0    0
In = 1    1    0    0    1

Q0* = Q0’∙In

306
State Machines
• State Machine Example “Sequence Detector”
7) Construct Output Logic “G”

K-map for Found

          Q1 Q0
          00   01   11   10
In = 0    0    0    0    0
In = 1    0    0    1    0

Found = Q1∙Q0∙In

8) Logic Diagram

- for large designs, this becomes impractical
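
The equations derived in steps 6 and 7 can also be written directly as VHDL signal assignments, which makes a useful cross-check of the hand design. This is a sketch: the entity name is illustrative, the serial input is called Din because "in" is a reserved word in VHDL, and the initial values on Q1 and Q0 are an assumption that places the machine in S0 at the start of simulation.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity seq_detector_gates is
    port ( CLK   : in  std_logic;
           Din   : in  std_logic;    -- the serial input "In"
           Found : out std_logic );
end entity;

architecture gates of seq_detector_gates is
    signal Q1, Q0           : std_logic := '0';   -- state memory, starts in S0 ("00")
    signal Q1_next, Q0_next : std_logic;
begin
    -- Next State Logic "F": Q1* and Q0* from the K-maps
    Q1_next <= (not Q1 and Q0 and not Din) or (Q1 and not Q0 and Din);
    Q0_next <= not Q0 and Din;

    -- Output Logic "G": a Mealy output, since it depends on the state and the input
    Found <= Q1 and Q0 and Din;

    -- State Memory: two D-Flip-Flops
    process (CLK)
    begin
        if rising_edge(CLK) then
            Q1 <= Q1_next;
            Q0 <= Q0_next;
        end if;
    end process;
end architecture;
```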

307
State Machines in VHDL
• State Memory
- we use a process that updates the “Current_State” with the “Next_State”

- we describe DFF’s using (CLK’event and CLK=‘1’)

- this will make the assignment on the rising edge of CLK

STATE_MEMORY : process (CLK)
begin
    if (CLK'event and CLK='1') then
        Current_State <= Next_State;
    end if;
end process;

- at this point, we need to discuss State Names

308
State Machines in VHDL
• State Memory using “User-Enumerated Data Types"
- we always want to use descriptive names for our states

- we can use a user-enumerated type for this

type State_Type is (S0, S1, S2, S3);


signal Current_State : State_Type;
signal Next_State : State_Type;

- this makes our simulations very readable.

• State Memory using “Pre-Defined Data Types"


- we haven’t encoded the variables though, we can either leave it to the synthesizer or manually do it

subtype State_Type is BIT_VECTOR (1 downto 0);

constant S0 : State_Type := "00";
constant S1 : State_Type := "01";
constant S2 : State_Type := "10";
constant S3 : State_Type := "11";

signal Current_State : State_Type;
signal Next_State : State_Type;

309
State Machines in VHDL
• State Memory with “Synchronous RESET”

STATE_MEMORY : process (CLK)
begin
    if (CLK'event and CLK='1') then
        if (Reset = '1') then
            Current_State <= S0; -- name of “reset” state to go to
        else
            Current_State <= Next_State;
        end if;
    end if;
end process;

- this design will only observe RESET on the positive edge of clock (i.e., synchronous)

310
State Machines in VHDL
• State Memory with “Asynchronous RESET”

STATE_MEMORY : process (CLK, Reset)
begin
    if (Reset = '1') then
        Current_State <= S0; -- name of “reset” state to go to
    elsif (CLK'event and CLK='1') then
        Current_State <= Next_State;
    end if;
end process;

- this design is sensitive to both RESET and the positive edge of clock (i.e., asynchronous)

311
State Machines in VHDL
• Next State Logic “F”
- we use another process to construct “F”

312
State Machines in VHDL
• Next State Logic “F”
- the process will be combinational logic

NEXT_STATE_LOGIC : process (Din, Current_State) -- "in" is a reserved word in VHDL, so the input In is named Din
begin
    case (Current_State) is

        when S0 => if (Din='0') then Next_State <= S0;
                   else Next_State <= S1; end if;
        when S1 => if (Din='0') then Next_State <= S2;
                   else Next_State <= S0; end if;
        when S2 => if (Din='0') then Next_State <= S0;
                   else Next_State <= S3; end if;
        when S3 => if (Din='0') then Next_State <= S0;
                   else Next_State <= S0; end if;

    end case; -- every branch assigns Next_State, so no latch is inferred
end process;

313
State Machines in VHDL
• Output Logic “G”
- we use another process to construct “G”
- the expressions in the sensitivity list dictate Mealy/Moore type outputs
- for now, let’s use combinational logic for G (we’ll go sequential later)

314
State Machines in VHDL
• Output Logic “G”
- Mealy type outputs

OUTPUT_LOGIC : process (Din, Current_State) -- "in" is a reserved word in VHDL, so the input In is named Din
begin
    case (Current_State) is

        when S0 => if (Din='0') then Found <= '0';
                   else Found <= '0'; end if;
        when S1 => if (Din='0') then Found <= '0';
                   else Found <= '0'; end if;
        when S2 => if (Din='0') then Found <= '0';
                   else Found <= '0'; end if;
        when S3 => if (Din='0') then Found <= '0';
                   else Found <= '1'; end if;

    end case;
end process;

315
State Machines in VHDL
• Output Logic “G”
- Moore type outputs

OUTPUT_LOGIC : process (Current_State)
begin
    case (Current_State) is

        when S0 => Found <= '0';
        when S1 => Found <= '0';
        when S2 => Found <= '0';
        when S3 => Found <= '1';

    end case;
end process;

- this is just an example, it doesn’t really work for this machine

316
State Machines in VHDL
• Example
- Let’s design a 2-bit Up/Down Gray Code Counter using User-Enumerated State Encoding
- In=0, Count Up
- In=1, Count Down
- this will be a Moore Type Machine
- no Reset

317
State Machines in VHDL
• Example
- let’s collect our thoughts using a State/Output Table

Current_State In Next_State Out

CNT0 0 CNT1 00
1 CNT3
CNT1 0 CNT2 01
1 CNT0
CNT2 0 CNT3 11
1 CNT1
CNT3 0 CNT0 10
1 CNT2

318
State Machines in VHDL
• Example
architecture CNT_arch of CNT is

    type State_Type is (CNT0, CNT1, CNT2, CNT3);
    signal Current_State, Next_State : State_Type;

begin
    -- "in" and "out" are reserved words in VHDL, so the ports In and Out are named Din and Dout here

    STATE_MEMORY : process (CLK)
    begin
        if (CLK'event and CLK='1') then
            Current_State <= Next_State;
        end if;
    end process;

    NEXT_STATE_LOGIC : process (Din, Current_State)
    begin
        case (Current_State) is
            when CNT0 => if (Din='0') then Next_State <= CNT1;
                         else Next_State <= CNT3; end if;
            when CNT1 => if (Din='0') then Next_State <= CNT2;
                         else Next_State <= CNT0; end if;
            when CNT2 => if (Din='0') then Next_State <= CNT3;
                         else Next_State <= CNT1; end if;
            when CNT3 => if (Din='0') then Next_State <= CNT0;
                         else Next_State <= CNT2; end if;
        end case;
    end process;

    OUTPUT_LOGIC : process (Current_State)
    begin
        case (Current_State) is
            when CNT0 => Dout <= "00";
            when CNT1 => Dout <= "01";
            when CNT2 => Dout <= "11";
            when CNT3 => Dout <= "10";
        end case;
    end process;

end architecture;

319
State Machines in VHDL
• Example
- in the lab, we may want to observe the states on the LEDs
- in this case we want to explicitly encode the STATE variables

architecture CNT_arch of CNT is

    subtype State_Type is BIT_VECTOR (1 downto 0);

    constant CNT0 : State_Type := "00";
    constant CNT1 : State_Type := "01";
    constant CNT2 : State_Type := "10";
    constant CNT3 : State_Type := "11";
    signal Current_State, Next_State : State_Type;

320
State Encoding
• State Variable Encoding
- we can decide how we encode our state variables
- there are advantages/disadvantages to different techniques

• Binary Encoding
- straight encoding of states

S0 = “00”
S1 = “01”
S2 = “10”
S3 = “11”

- for n states, ceil(log2(n)) flip-flops are needed

- this gives the Least # of Flip-Flops

- Good for “Area” constrained designs

- Drawbacks: - multiple bits switch at the same time = Increased Noise & Power
- the Next State Logic “F” is multi-level = Increased Power and Reduced Speed

321
State Encoding
• Gray-Code Encoding
- encoding using a gray code where only one bit switches at a time

S0 = “00”
S1 = “01”
S2 = “11”
S3 = “10”

- for n states, ceil(log2(n)) flip-flops are needed

- this gives low Power and Noise due to only one bit switching

- Good for “Power/Noise” constrained designs

- Drawbacks: - the Next State Logic “F” is multi-level = Increased Power and Reduced Speed

322
State Encoding
• One-Hot Encoding
- encoding one flip-flop for each state

S0 = “0001”
S1 = “0010”
S2 = “0100”
S3 = “1000”

- for n states, there are n flip-flops needed

- the combinational logic for F is one level (i.e., a Decoder)

- Good for Speed

- Especially good for FPGA due to “Programmable Logic Block”

- Drawbacks: - takes more area

323
State Encoding
• State Encoding Trade-Offs
- We typically trade off Speed, Area, and Power

[Figure: trade-off triangle with corners One-Hot (speed), Binary (area), and Gray (power)]

324
Mealy Finite State Machine
• A serially-transmitted BCD (8421 code) word is to be
converted into an Excess-3 code. An Excess-3 code word
is obtained by adding 3 to the decimal value and taking
the binary equivalent. Excess-3 code is self-complementing
[Wakerly, p. 80], i.e. the 9's complement of a code word is
obtained by complementing the bits of the word.

325
Mealy Finite State Machine
• The serial code converter is described by the state transition
graph of a Mealy FSM below
• The vertices of the state transition graph of a Mealy machine
are labeled with the states.
• The branches are labeled with (1) the input that causes a
transition to the indicated next state, and (2) with the output
that is asserted in the present state for that input.
• The state transition is synchronized to a clock.
• The state table summarizes the machine's behavior in tabular
format.

326
Design of Mealy Finite State
Machine

330
Example: Design of A Serial Line Code Converter

339
Pipelined Outputs
• Pipelined Outputs
- Having combinational logic drive outputs can lead to:

- multiple delay paths through the logic


- potential for glitches

- Both reduce the speed at which the system clock can be run

- A good design practice is to pipeline the outputs (i.e., use DFF’s as the output driver)

340
Pipelined Outputs
• Pipelined Outputs
- This gives a smaller Data Uncertainty window on the output

- The only consideration is that the output is not present until one clock cycle later

341
Pipelined Outputs
• Pipelined Outputs
- we use a 4th process for this stage of the State Machine

PIPELINED_OUTPUTS : process (CLK)
begin
    if (CLK'event and CLK='1') then
        Dout <= Next_Out; -- "out" is a reserved word in VHDL, so the output Out is named Dout
    end if;
end process;

342
Asynchronous Inputs
• Asynchronous Inputs
- Real world inputs are not phase-locked to the clock

- this means an input can change within the Setup/Hold window of the clock

- this can send the Machine into an incorrect state

- we always want to “synchronize” inputs so that this doesn’t happen

343
Asynchronous Inputs
• Asynchronous Inputs
- We use D-Flip-Flops to take in the input

- with one D-Flip-Flop, the input can still occur within the Setup/Hold window

- the output of the first DFF may be metastable for a moment of time (trecovery)

- a second DFF is used to latch in the metastable input after it has had time to settle

- the output of the second flip-flop is now stable and synchronized as long as:

Tclk > trecovery + tcomb + tsetup

- where tcomb is the delay of any combinational logic in the input path
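
A minimal sketch of the two-flip-flop synchronizer described above follows. The entity and signal names are illustrative.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity synchronizer is
    port ( clk      : in  std_logic;
           async_in : in  std_logic;    -- not phase-locked to clk
           sync_out : out std_logic );
end entity;

architecture rtl of synchronizer is
    signal meta : std_logic;            -- output of the first DFF, may go metastable
begin
    process (clk)
    begin
        if rising_edge(clk) then
            meta     <= async_in;       -- first DFF samples the asynchronous input
            sync_out <= meta;           -- second DFF samples it one clock period later
        end if;
    end process;
end architecture;
```

Both assignments sit in the same clocked process, so sync_out takes the value meta held before the edge. That gives two flip-flops in series with no combinational logic between them, which leaves most of the clock period for the first stage to settle.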

344
Comparison of Binary and Onehot Style
• Binary-encoded FSM
– fewer flip-flops for state register
– = ceil(log2(number of states))

• Onehot-encoded FSM
– more flip-flops for state register
– = number of states

• FPGA vendors frequently recommend the onehot encoding style because flip-flops are plentiful
in an FPGA and the onehot style requires fewer combinational logic cells.
• i.e. a onehot style FSM usually runs faster than a binary style FSM on an FPGA
345
A Simple Design Example:
Level-to-Pulse Converter
• A level-to-pulse converter produces a single-cycle pulse each
time its input goes high
– In other words, it’s a synchronous rising edge detector
• Sample application
– Buttons and switches (may need de-bounce processing)
– Single-cycle enable signal for counters
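
One common way to build a level-to-pulse converter is to delay the input by one clock and compare, as sketched below. The names are illustrative, the input is assumed to be already synchronized to clk, and the initial value on level_d is an assumption for simulation. This is a Mealy-style version, because pulse depends directly on the input.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity level_to_pulse is
    port ( clk   : in  std_logic;
           level : in  std_logic;   -- assumed already synchronized to clk
           pulse : out std_logic );
end entity;

architecture rtl of level_to_pulse is
    signal level_d : std_logic := '0';   -- level delayed by one clock
begin
    process (clk)
    begin
        if rising_edge(clk) then
            level_d <= level;
        end if;
    end process;

    -- '1' only while the input is high now but was low at the previous clock edge
    pulse <= level and not level_d;
end architecture;
```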

353
Datapaths and Control
• Digital systems perform sequences of operations on encoded
data
• Digital hardware systems = data-path + control
• Datapath: registers, counters, combinational functional units
(e.g., ALU), communication (e.g., busses)
– Combinational circuits for operations
– Registers for storing intermediate results
• Control section: control sequencing (FSM generating
sequences of control signals that instructs datapath what to do
next)
– Generates control signals
• Selecting operations to perform
• Enabling registers at the right times
– Uses status signals from datapath

354
Review of FSM Design
• FSM Design
– Partition FSM and non-FSM logic
– Partition combinational part and sequential part
– Use parameter to define names of the state vector
– Assign a default (reset) state

355
Homework
• Design a traffic signal controller at crossroads

• One pair traffic signal controller


– State Diagram
– State Coding
– Performance
– [Optional] With interrupt/extra setting

• Other example:
– Automatic Vending Machine
– Automatic Teller Machine

356
Project Example:
DataPath - Digital combination lock
(Verilog)
Digital combination lock
• Door combination lock:
– punch in 3 values in sequence and the door opens; if there is an error
the lock must be reset; once the door opens the lock must be reset
– inputs: sequence of input values, reset
– outputs: door open/close
– memory: must remember combination or always have it available
– open questions: how do you set the internal combination?
• stored in registers (how loaded?)
• hardwired via switches set by user

358
Digital combination lock
Implementation in software

359
Determining details of the specification
• How many bits per input value?
• How many values in sequence?
• How do we know a new input value is entered?
• What are the states and state transitions of the system?

360
Digital combination lock state diagram
• States: 5 states
– represent point in execution of machine
– each state has outputs
• Transitions: 6 from state to state, 5 self transitions, 1 global
– changes of state occur when clock says it's OK
– based on value of inputs
• Inputs: reset, new, results of comparisons
• Output: open/closed

361
Digital combination lock
(state encoding)
• Verilog description including state encoding

// names kept from the slide; "string" and "new" are keywords in SystemVerilog,
// so this is meant to be compiled as plain Verilog
`define S1   3'b000
`define S2   3'b001
`define S3   3'b010
`define OPEN 3'b011
`define ERR  3'b100

`define C1   4'b1101
`define C2   4'b0111
`define C3   4'b0100

module string (clk, value, new, rst, open);
  input clk, new, rst;
  input [3:0] value;
  output open;

  reg [2:0] state;

  assign open = (state == `OPEN);

  // as written, S1, S2, and S3 advance only when new is 1 and value matches;
  // any other clock edge in those states moves to ERR
  always @(posedge clk) begin
    if (rst) state <= `S1;
    else
      case (state)
        `S1:   if ((value == `C1) & new) state <= `S2;
               else state <= `ERR;
        `S2:   if ((value == `C2) & new) state <= `S3;
               else state <= `ERR;
        `S3:   if ((value == `C3) & new) state <= `OPEN;
               else state <= `ERR;
        `OPEN: state <= `OPEN;
        `ERR:  state <= `ERR;
        default: begin
          $display("invalid state reached");
          state <= 3'bxxx;
        end
      endcase
  end
endmodule

362
Data-path and control structure

363
State table for combination lock
• Finite-state machine
– refine state diagram to take internal structure into account
– state table ready for encoding

reset  new  equal  state  next state  mux  open/closed
1      –    –      –      S1          C1   closed
0      0    –      S1     S1          C1   closed
0      1    0      S1     ERR         C1   closed
0      1    1      S1     S2          C1   closed
0      0    –      S2     S2          C2   closed
0      1    0      S2     ERR         C2   closed
0      1    1      S2     S3          C2   closed
0      0    –      S3     S3          C3   closed
0      1    0      S3     ERR         C3   closed
0      1    1      S3     OPEN        C3   closed
0      –    –      OPEN   OPEN        –    open

364
Encodings for combination lock
• Encode state table
– state can be: S1, S2, S3, OPEN, or ERR
• needs at least 3 bits to encode: 000, 001, 010, 011, 100
• and as many as 5: 00001, 00010, 00100, 01000, 10000
• choose 4 bits: 0001, 0010, 0100, 1000, 0000
– output mux can be: C1, C2, or C3
• needs 2 to 3 bits to encode
• choose 3 bits: 001, 010, 100
– output open/closed can be: open or closed
• needs 1 or 2 bits to encode
• choose 1 bit: 1, 0

365
Data-path implementation for combination lock

• Multiplexer
– easy to implement as combinational logic when few inputs
– logic can easily get too big for most PLDs
– output mux can be: C1, C2, or C3
– 3 mux control bits: 001, 010, 100

[Figure: 4-bit multiplexer selecting C1, C2, or C3 under mux control, feeding a comparator with value to produce equal; gate-level detail shown per bit Value[i], C1[i], C2[i], C3[i] for 0 ≤ i ≤ 3]

366
Data-path implementation (cont’d)
• Tri-state logic
– utilize a third output state: “no connection” or “float”
– connect outputs together as long as only one is “enabled”
– open-collector gates can only output 0, not 1
• can be used to implement logical AND with only wires

[Figure: the multiplexer and comparator data-path redrawn per bit, Value[i], C1[i], C2[i], C3[i] for 0 ≤ i ≤ 3. Labels: tri-state driver (can disconnect from output); open-collector connection (zero whenever one connection is zero, one otherwise – wired AND)]

367
Tri-state gates
• The third value
– logic values: “0”, “1”
– don't care: “X” (must be 0 or 1 in real circuit!)
– third value or state: “Z” — high impedance, infinite R, no connection
• Tri-state gates
– additional input – output enable (OE)
– output values are 0, 1, and Z
– when OE is high, the gate functions normally
– when OE is low, the gate is disconnected from wire at output
– allows more than one gate to be connected to the same output wire
• as long as only one has its output enabled at any one time (otherwise, sparks
could fly)

368
Tri-state and multiplexing
• When using tri-state logic
– (1) make sure never more than one "driver" for a wire at any one time
(pulling high and low at the same time can severely damage circuits)
– (2) make sure to only use the value on a wire when it's being driven (using a
floating value may cause failures)
• Using tri-state gates to implement an economical multiplexer
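
A minimal sketch of a multiplexer built from tri-state drivers follows, using the resolved STD_LOGIC value 'Z' for the disconnected state. The entity name, the port names, and the mapping of the one-hot control bits to C1, C2, and C3 are assumptions made for illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity tristate_mux is
    port ( C1, C2, C3 : in  std_logic_vector(3 downto 0);
           sel        : in  std_logic_vector(2 downto 0);   -- one-hot: 001, 010, 100
           Y          : out std_logic_vector(3 downto 0) );
end entity;

architecture rtl of tristate_mux is
begin
    -- three tri-state drivers share the wire Y; at most one may be enabled at a time
    Y <= C1 when sel(0) = '1' else (others => 'Z');
    Y <= C2 when sel(1) = '1' else (others => 'Z');
    Y <= C3 when sel(2) = '1' else (others => 'Z');
end architecture;
```

If two drivers are enabled with different values, simulation resolves the conflict to 'X', which mirrors rule (1) above. If none is enabled, Y floats at 'Z', which mirrors rule (2).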

369
Open-collector gates and wired-AND
• Open collector: another way to connect gate outputs to the same wire
– gate only has the ability to pull its output low
– it cannot actively drive the wire high (default – pulled high through resistor)
• Wired-AND can be implemented with open collector logic
– if A and B are "1", output is actively pulled low
– if C and D are "1", output is actively pulled low
– if one gate output is low and the other high, then low wins
– if both gate outputs are "1", the wire value "floats", pulled high by resistor
• low to high transition usually slower than it would have been with a gate pulling
high
– hence, the two NAND functions are ANDed together

[Figure: equivalent circuits. Open-collector NAND gates with outputs wired together, using "wired-AND" to form (AB)'(CD)']

370
Digital combination lock (new data-path)
• Decrease number of inputs
• Remove 3 code digits as inputs
– use code registers
– make them loadable from value
– need 3 load signal inputs (net saving in inputs: (4*3)–3 = 9)
• could be done with 2 signals and decoder
(ld1, ld2, ld3, load none)

371
Complex Datapath
Complex Multiplier Datapath

373
Complex Multiplier in VHDL

374
Multiplier Control Sequence
• Avoid resource conflict
• First attempt
– 1. a_r * b_r → pp1_reg
– 2. a_i * b_i → pp2_reg
– 3. pp1 – pp2 → p_r_reg
– 4. a_r * b_i → pp1_reg
– 5. a_i * b_r → pp2_reg
– 6. pp1 + pp2 → p_i_reg
• Takes 6 clock cycles
• Merge steps where no resource conflict
• Revised attempt
– 1. a_r * b_r → pp1_reg
– 2. a_i * b_i → pp2_reg
– 3. pp1 – pp2 → p_r_reg, and in the same cycle a_r * b_i → pp1_reg
– 4. a_i * b_r → pp2_reg
– 5. pp1 + pp2 → p_i_reg
• Takes 5 clock cycles
Finite-State Machines
• Used to implement control sequencing
– Based on mathematical automaton theory
• A FSM is defined by
– set of inputs: Σ
– set of outputs: Γ
– set of states: S
– initial state: s0 ∈ S
– transition function: δ: S × Σ → S
– output function: ω: S × Σ → Γ or ω: S → Γ

• FSM in Hardware

FSM Example: Multiplier Control
• One state per step
– Separate idle state?
– Wait for input_rdy = '1'
– Then proceed to steps 1, 2, ...
– But this wastes a cycle!
• Use step 1 as idle state
– Repeat step 1 if input_rdy ≠ '1'
– Proceed to step 2 otherwise
• Output function
– Defined by table on slide 43
– Moore or Mealy?

FSMs in VHDL
• Use an enumeration type for state values
– abstract, avoids specifying encoding
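
A minimal sketch of how the five-step multiplier sequence could be written with an enumeration type is shown below. The entity name, the clk and reset ports, the Moore-style outputs and the active-high enables are assumptions. The select outputs (a_sel, b_sel, sub) are omitted because their encoding is not given in the text.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sequencer for the revised 5-step control sequence.
entity multiplier_sequencer is
  port ( clk, reset     : in  std_logic;
         input_rdy      : in  std_logic;
         pp1_ce, pp2_ce : out std_logic;
         p_r_ce, p_i_ce : out std_logic );
end entity multiplier_sequencer;

architecture rtl of multiplier_sequencer is
  -- Enumeration type: abstract state names, encoding left to the tool
  type state_type is (step1, step2, step3, step4, step5);
  signal current_state, next_state : state_type;
begin

  state_reg : process (clk, reset) is
  begin
    if reset = '1' then               -- assumed asynchronous, active-high reset
      current_state <= step1;
    elsif rising_edge(clk) then
      current_state <= next_state;
    end if;
  end process state_reg;

  next_state_logic : process (current_state, input_rdy) is
  begin
    case current_state is
      when step1 =>                   -- step 1 doubles as the idle state
        if input_rdy = '1' then
          next_state <= step2;
        else
          next_state <= step1;
        end if;
      when step2 => next_state <= step3;
      when step3 => next_state <= step4;
      when step4 => next_state <= step5;
      when step5 => next_state <= step1;
    end case;
  end process next_state_logic;

  -- Register enables derived from the revised sequence (assumed active high)
  pp1_ce <= '1' when current_state = step1 or current_state = step3 else '0';
  pp2_ce <= '1' when current_state = step2 or current_state = step4 else '0';
  p_r_ce <= '1' when current_state = step3 else '0';
  p_i_ce <= '1' when current_state = step5 else '0';

end architecture rtl;
```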

Multiplier Control in VHDL

Multiplier Control Diagram
• Input: input_rdy
• Outputs
– a_sel, b_sel, pp1_ce, pp2_ce, sub, p_r_ce, p_i_ce

Bubble Diagrams or VHDL?
• Many CAD tools provide editors for bubble diagrams
– Automatically generate VHDL for simulation and synthesis
• Diagrams are visually appealing
– but can become unwieldy for complex FSMs
• Your choice...
– or your manager's!

Verifying Sequential Circuits
• DUV may take multiple and varying number of cycles to
produce output
• Checker needs to
– synchronize with test generator
– ensure DUV outputs occur when expected
– ensure DUV outputs are correct
– ensure no spurious outputs occur

Computer Systems

• Agenda

1. Memory
2. Von Neumann Architecture
3. Sequence Controllers
4. Processing Units & Register Modeling
Memory

• Memory Types

Notes on definitions:

1) The word "RAM" is now used interchangeably with R/W memory.
Formally, most types of ROM are also Random Access.

2) ROM typically refers to storage that can't be written during program execution.
It can hold program and data information, but under normal operation a CPU doesn't
use it for variable storage.

As Flash EEPROM gets faster and more reliable, Flash may come to be used as RAM.

Memory - SRAM

• Static Random Access Memory (SRAM)

- SRAM is volatile memory (i.e., if the power is removed, the information is lost)

- SRAM uses an inverter loop to store the digital information

- two NMOS transistors acting as switches are used to Read and Write the stored data

- we call the circuitry to store 1-bit a "cell"

Memory - SRAM

• SRAM Addressing
- we configure the cells into an array

- we address each cell using:

Row Address

- a row decoder produces a "Word Line"

- this gives a "Row Select" (RS) signal

Column Address

- a column decoder produces a "Bit Line"

- this gives a "Column Select" (CS)

Memory - SRAM

• SRAM Addressing
- The Word Lines are used to address a row of cells

- The Bit Lines are used to address a column in addition to reading and writing

- There are two bit lines per cell, BL and BL'

- This allows a difference amplifier to be used to distinguish between a 1 and a 0

Memory - SRAM

• SRAM Reading
- The capacitance of the Bit Lines can be very large due to multiple cells being attached

- This creates a problem during a READ because the small cell will need to drive this large capacitance

- To reduce the amount of charge that the cell has to drive during a READ, pull-up
transistors are used to "pre-charge" the lines to VDD

Memory - SRAM

• SRAM Reading

- In order to design a usable SRAM cell, we must meet the condition that:

"Reading the value does NOT destroy the contents of the cell"

- Let's look at what happens during a read to see how to meet this condition

Reading a '0'

- Initially V1=0v, V2=VDD

- M3 and M4 are turned ON

- this allows the Cell to drive BL and BL'

- The voltage V2 will be the same as the pre-charged BL' line, so no current will
flow through M4

Memory - SRAM

• SRAM Writing

- when writing to the SRAM cell, we inject full swing digital signals onto BL and BL'.

- when we assert the Word Line, M3 and M4 turn ON and the Bit Lines attempt to change
the state of the cell.

Memory - DRAM

• Dynamic Random Access Memory (DRAM)


- A volatile memory storage device even smaller than SRAM

- DRAM uses a capacitor to store the value of the digital information (instead of an inverter loop)

- one NMOS transistor is used to address the storage element

- the one-transistor configuration is known as a “1T” DRAM

Memory - DRAM

• DRAM Operation
- When the cell is addressed, the charge on the storage capacitor (CS) is dumped onto the bit line (BL)

- To reduce the amount of charge the cell has to provide, the bit line capacitance (CBL) is
pre-charged to VDD/2

- When the NMOS switch closes, the two capacitances will share their charge and settle to a level
that the sense amplifiers can read
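
The charge sharing described above can be written out as a short worked equation. This is a sketch under the usual ideal-capacitor assumption; the symbol V_S for the initial voltage on the storage capacitor is introduced here and does not appear in the slides.

```latex
% Charge conservation when C_S (at V_S) is connected to C_{BL} (pre-charged to V_{DD}/2)
V_{BL,\,final} = \frac{C_S V_S + C_{BL}\,\frac{V_{DD}}{2}}{C_S + C_{BL}}
\qquad\Rightarrow\qquad
\Delta V_{BL} = V_{BL,\,final} - \frac{V_{DD}}{2}
             = \frac{C_S}{C_S + C_{BL}}\left(V_S - \frac{V_{DD}}{2}\right)
```

Since C_BL is typically much larger than C_S, the swing on the bit line is small, which is why an amplifier is needed to read the cell.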

Memory – ROM

• Nonvolatile Memory
- SRAM and DRAM are attractive due to their speed

- however, they are volatile which means when the power is removed, the data is lost

- for a microcomputer, we need a nonvolatile storage device so that upon power-up, the
computer knows what to do.

- currently, the most popular semiconductor ROM is Flash (or EEprom)

- before looking at the details of a Flash transistor, let’s first look at the different types
of ROM arrays and addressing modes

Memory – ROM

• ROM Arrays
- There are two basic types of ROM arrays

1) NOR-based ROM
2) NAND-based ROM

• NOR-based ROM
- All Column Lines are pulled-up using a PMOS transistor (or resistor)

- The Row Lines are connected to the gates of NMOS transistors at the intersection of
Row and Column Lines

- The presence or absence of the NMOS transistors dictates whether a 1 or a 0 is stored

- If the NMOS transistor is present, it will pull down the Column Line when its gate is
driven high by the Row Line

- if the NMOS transistor is absent, the Column Line will not be pulled down, so it will remain
pulled up by the PMOS’s

Memory – ROM

• NOR-based ROM

- In order to Read from the array, the Row line is asserted and the desired Column line is observed

- a NOR-based ROM is similar to a Hex Keypad

Memory – ROM

• NAND-based ROM
- NAND-based ROM is a different array architecture

- it uses a depletion-load NMOS as the pull-up transistor

- the Column NMOS’s are connected in series with the column lines (i.e. a NAND configuration)

- If an NMOS exists in the Column line and the Row line is asserted, the NMOS will pull
the Column Line down and represent a stored ‘0’

- If an NMOS is absent on the Column line and the Row line is asserted, the Column Line
will remain pulled high by the depletion NMOS and represent a stored ‘1’

- since all of the NMOS’s are in series, in order to Read from a Row, all other Rows must be turned ON

- this means in order to distinguish the Row we are asserting, we drive it to ‘0’

Memory – ROM

• NAND-based ROM
- In this configuration, if an NMOS is present, it will
represent a “stored 1” since in order to address its
location, the Row line is driven to a ‘0’ and the NMOS is
not turned on. This leaves the Column line pulled HIGH

- if an NMOS is absent, it will represent a “stored 0” since all of the other
Row NMOS’s are turned on and will pull the Column Line LOW

- this gives the opposite behavior to a NOR-based ROM

                NOR   NAND
NMOS present    0     1
NMOS absent     1     0

- it also gives a complementary addressing scheme

                                 NOR   NAND
Address Row Line by driving:     1     0
All other Row Lines driven to:   0     1

Memory – Flash

• Flash Memory Cells


- a novel breakthrough in ROM memory was the introduction of Flash memory, built on the
floating gate transistor, by Toshiba in 1984

- this transistor is constructed such that the threshold of the device can be changed in-system

- if the threshold can be raised and lowered, this allows the transistor within the ROM array
to either be:

“present” i.e., Normal Row addressing will turn the device ON (VRow-HIGH>VT,n)

or

“absent” i.e., Normal Row addressing is not high enough to turn the device on (VRow-HIGH<VT,n)

- the threshold change is accomplished by applying an E-field to specifically induce
“hot electron injection” to change the characteristics of the Gate structure

- if this threshold change can be accomplished after fabrication, this allows a reconfigurable
ROM device that is nonvolatile, reusable, and programmable with electricity (i.e., EEprom)

Memory – Flash

• Flash Memory Cells


- a floating gate transistor has a Control Gate and a Floating Gate

- the Floating Gate is separated from the semiconductor substrate using a “Thin Tunneling Oxide”

- On top of the Floating Gate, a thick Dielectric is grown and the Control Gate is patterned

Memory – Flash

• Flash Memory Cells


Raising VT,n

- if charge accumulates at the Floating Gate, this in effect makes the thin dielectric a better conductor

- If the thin dielectric becomes a conductor, this is the same as moving the functional Gate further
away from the substrate

- this makes it more difficult to create a channel in the substrate (i.e., VT,n gets higher)

Memory – Flash

• Flash Memory Cells


Raising VT,n

- we use hot electron injection to accomplish this

- if we apply a high voltage across the Source and Drain (VD=6v), electrons near the Drain
region will receive enough energy to form electron/hole pairs

- if we apply a high voltage at the Gate (VG=12v), the hot electrons in the substrate
will be attracted to the gate

- since the electron/holes have enough energy to move freely, electrons will tunnel into the thin oxide
and holes will tunnel into the substrate

- when the high voltages are removed, the electron/holes will remain in their new
locations and effectively increase VT,n

- Raising VT,n is called Programming

Memory – Flash

• Flash Memory Cells


Lowering VT,n

- we use the Fowler-Nordheim Tunneling Mechanism (FN tunneling) to return the
Thin Floating Gate oxide to an insulator

- if the Gate is grounded and a high voltage (12v) is applied to the Source, the electrons in the
Floating Gate will be ejected out of the dielectric and into the Source

- this has the effect of restoring the insulating ability of the Thin Dielectric and effectively moves
the functional gate of the transistor closer to the substrate

- this makes it easier to create a channel in the substrate (i.e., VT,n gets lower)

- Lowering VT,n is called Erasing

Memory – Flash

• Flash Memory Cells


- If we position the threshold voltage at a normal CMOS level (~1v), then the transistor
can be turned on using a standard signal level at the gate (i.e., Vgate=5v)

- If we position the threshold voltage at a raised level (>VDD), then a standard signal level
at the gate will NOT be able to turn on the transistor

Memory – Flash

• NAND/NOR Flash
- we can use Flash Cells in a NOR or NAND Array to implement an EEprom

- the Flash Cell requires one additional line on the Source of each transistor in order to accomplish
the programming and erasing.

Memory – Flash

• NAND vs. NOR Flash


- “Flash” implies that blocks of memory are erased at a time

- this is a specific type of EEprom and is cheaper to fabricate due to less programming circuitry

NOR Flash

- slower erase and write times


- allows access to any address which makes it truly Random Access
- this is suitable for uP ROM applications such as BIOS or Firmware in which the uP needs to access
memory locations individually

NAND Flash

- faster erase and write times


- smaller chip area which creates higher density and lower cost
- more erase cycles than NOR-Flash
- not Random Access, data must be read/written in large blocks, not suitable for uP ROM
- it is well suited for thumb drives, iPods, and secondary storage in microcomputers
(i.e., in the role of hard drives and CD-ROMs)

Memory in VHDL

• Memory in VHDL

– Memory is described in VHDL using the keyword array

– The array keyword defines a 2D vector of information.

type memory_type is array (0 to 255) of std_logic_vector(7 downto 0);

– This defines a data type which is a 2D array that is m x n (256 x 8)

– This data type can then be used to define either a signal (for RAM) or
constant (for ROM)

– An array declared this way is indexed with integers. This means a type
conversion must be used when accessing the 2D array since the address
lines will come in as STD_LOGIC_VECTOR (i.e.,
conv_integer(address))
Memory in VHDL

• RAM in VHDL
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;   -- provides conv_integer

entity ram_256x8_sync is
  port (clock    : in  std_logic;
        data_in  : in  std_logic_vector(7 downto 0);
        write    : in  std_logic;
        address  : in  std_logic_vector(7 downto 0);
        data_out : out std_logic_vector(7 downto 0));
end entity;

architecture rtl of ram_256x8_sync is

  -- This line defines a new data type called "ram_type" which
  -- is a 2D array that is 256x8 of STD_LOGIC_VECTOR
  type ram_type is array (0 to 255) of std_logic_vector(7 downto 0);

  -- This line creates a signal called RAM which uses "ram_type".
  -- This signal can be read or written to.
  signal RAM : ram_type;

begin

  memory : process (clock)
  begin
    if (clock'event and clock='1') then
      -- Since "address" is STD_LOGIC_VECTOR but the array can only be indexed
      -- with integers, we do a type conversion when accessing the 2D array.
      if (write = '1') then
        RAM(conv_integer(address)) <= data_in;    -- synchronous write (write = 1)
      else
        data_out <= RAM(conv_integer(address));   -- synchronous read (write = 0)
      end if;
    end if;
  end process;

end architecture;

Memory in VHDL

• ROM in VHDL (synchronous)


library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;   -- provides conv_integer

entity rom_128x8_sync is
  port (clock    : in  std_logic;
        address  : in  std_logic_vector(7 downto 0);   -- must stay within 0 to 127
        data_out : out std_logic_vector(7 downto 0));
end entity;

architecture rtl of rom_128x8_sync is

  -- This line defines a new data type called "rom_type" which
  -- is a 2D array that is 128x8 of STD_LOGIC_VECTOR
  type rom_type is array (0 to 127) of std_logic_vector(7 downto 0);

  -- Instead of creating a signal as in RAM, we create a constant of type
  -- "rom_type". This constant is 128x8 and can be initialized. It can only
  -- be read from by external systems.
  constant ROM : rom_type := (0 => x"12",
                              1 => x"AA",
                              2 => x"CD",
                              3 => x"80",
                              -- : (remaining entries elided in the original)
                              others => x"00");   -- assumed fill value so the aggregate is complete

begin

  memory : process (clock)
  begin
    if (clock'event and clock='1') then
      -- Again, a type conversion is needed to access the 2D array.
      -- Only read capability needs to be modeled.
      data_out <= ROM(conv_integer(address));
    end if;
  end process;

end architecture;

Memory in VHDL

• ROM in VHDL (asynchronous)


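library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;   -- provides conv_integer

entity rom_128x8_async is
  port (address  : in  std_logic_vector(7 downto 0);   -- must stay within 0 to 127
        data_out : out std_logic_vector(7 downto 0));
end entity;

architecture rtl of rom_128x8_async is

  type rom_type is array (0 to 127) of std_logic_vector(7 downto 0);

  constant ROM : rom_type := (0 => x"12",
                              1 => x"AA",
                              2 => x"CD",
                              3 => x"80",
                              -- : (remaining entries elided in the original)
                              others => x"00");   -- assumed fill value so the aggregate is complete

begin

  -- data_out is always being driven with this concurrent signal assignment;
  -- no clock is involved.
  data_out <= ROM(conv_integer(address));

end architecture;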

Memory Mapping

• Memory Mapping

- Mapping different types of memory to certain address ranges creates a
“Memory Mapped” system.

- This makes addressing from the CPU simpler

Memory Mapping

• Address Decoding

- Address decoding can be accomplished within the model for the RAM/ROM/IO

ROM mapped to addresses 0-127

memory : process (clock)
begin
  if (clock'event and clock='1') then
    if (address >= 0 and address <= 127) then
      data_out <= ROM(conv_integer(address));
    end if;
  end if;
end process;

RAM mapped to addresses 128-191

memory : process (clock)
begin
  if (clock'event and clock='1') then
    if ((address >= 128 and address <= 191) and (write = '1')) then
      RAM(conv_integer(address)) <= data_in;
    elsif (address >= 128 and address <= 191) then
      data_out <= RAM(conv_integer(address));
    end if;
  end if;
end process;

An output port mapped to 192

U3 : process (clock, reset)
begin
  if (reset = '0') then
    port_out_00 <= x"00";
  elsif (clock'event and clock='1') then
    if (address = x"C0" and write = '1') then
      port_out_00 <= data_in;
    end if;
  end if;
end process;

More Details of Using VHDL
for Memories

Portions of this work are from the book, Digital Design: An Embedded
Systems Approach Using VHDL, by Peter J. Ashenden, published by Morgan
Kaufmann Publishers, Copyright 2007 Elsevier Inc. All rights reserved.
VHDL

General Concepts
• A memory is an array of storage locations
– Each with a unique address
– Like a collection of registers, but with optimized implementation
• Address is unsigned-binary encoded
– n address bits ⇒ 2^n locations
• All locations the same size
– 2^n × m bit memory
[Figure: memory drawn as a column of m-bit locations with addresses 0, 1, 2, ..., 2^n–2, 2^n–1]

Memory Sizes
• Use power-of-2 multipliers
– Kilo (K): 2^10 = 1,024 ≈ 10^3
– Mega (M): 2^20 = 1,048,576 ≈ 10^6
– Giga (G): 2^30 = 1,073,741,824 ≈ 10^9
• Example
– 32K × 32-bit memory
– Capacity = 1,024K = 1Mbit
– Requires 15 address bits
• Size is determined by application requirements

Basic Memory Operations
• a inputs: unsigned address
• d_in and d_out
– Type depends on application
• Write operation
– en = 1, wr = 1
– d_in value stored in location given by address inputs
• Read operation
– en = 1, wr = 0
– d_out driven with value of location given by address inputs
• Idle: en = 0
[Figure: memory symbol with inputs a(0) ... a(n-1), d_in(0) ... d_in(m-1), en, wr and outputs d_out(0) ... d_out(m-1)]

Example: Audio Delay Unit
• System clock: 1MHz
• Audio samples: 8-bit signed, at 50kHz (50 samples/msec)
– New sample arrives when audio_in_en = 1
• Delay control: 8-bit unsigned ⇒ ms to delay
• Output: audio_out_en = 1 when output ready
[Timing diagram: clk, audio_in (s_t, s_t+1), audio_in_en, audio_out (s_t−d, s_t−d+1), audio_out_en; sample period 20µs]

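Audio Delay Datapath
[Figure: datapath. A 14-bit counter enabled by count_en supplies the write address; a subtractor forms the counter value minus delay × 50 for the read address; addr_sel selects which of the two drives the memory address a; audio_in drives d_in and d_out drives audio_out; the memory is controlled by mem_en and mem_wr]
• Max delay = 255ms
– Need to store 255 × 50 = 12,750 samples
– Use a 16K × 16-bit memory (14 address bits)
– 2^14 = 16384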

Audio Delay Control Section
• Step 1: (idle state)
– audio_in_en = 0 ⇒ do nothing
– audio_in_en = 1 ⇒ write memory using counter value as address
• Step 2:
– Read memory using subtractor output as address, increment counter

State    audio_in_en   Next state   addr_sel   mem_en   mem_wr   count_en   audio_out_en
Step 1   0             Step 1       0          0        0        0          0
Step 1   1             Step 2       0          1        1        0          0
Step 2   –             Step 1       1          1        0        1          1
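
The control table above could be coded along the following lines. This is only a sketch: the entity name, the clk and reset ports and the synchronous active-high reset are assumptions, while the remaining port names follow the table columns.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical control section for the audio delay unit.
entity audio_delay_control is
  port ( clk, reset  : in  std_logic;
         audio_in_en : in  std_logic;
         addr_sel, mem_en, mem_wr,
         count_en, audio_out_en : out std_logic );
end entity audio_delay_control;

architecture fsm of audio_delay_control is
  type state_type is (step1, step2);
  signal current_state : state_type;
begin

  state_reg : process (clk) is
  begin
    if rising_edge(clk) then
      if reset = '1' then                       -- assumed synchronous reset
        current_state <= step1;
      elsif current_state = step1 and audio_in_en = '1' then
        current_state <= step2;                 -- a sample arrived: go read
      else
        current_state <= step1;                 -- idle, or return after step 2
      end if;
    end if;
  end process state_reg;

  -- One assignment per output column of the table.
  -- In step 1 the memory outputs depend on audio_in_en (Mealy style).
  addr_sel     <= '1' when current_state = step2 else '0';
  mem_en       <= '1' when current_state = step2 or audio_in_en = '1' else '0';
  mem_wr       <= '1' when current_state = step1 and audio_in_en = '1' else '0';
  count_en     <= '1' when current_state = step2 else '0';
  audio_out_en <= '1' when current_state = step2 else '0';

end architecture fsm;
```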


Wider Memories
• Memory components have a fixed width
– E.g., ×1, ×4, ×8, ×16, ...
• Use memory components in parallel to make a wider memory
– E.g., three 16K×16 components for a 16K×48 memory
[Figure: three 16K×16 components sharing en, wr and a(13…0); their d_in(15…0) and d_out(15…0) ports connect to bits (15…0), (31…16) and (47…32) of the wide d_in and d_out]

More Locations
• To provide 2^n locations with 2^k-location components
– Use 2^n/2^k components
• Address A
– at offset A mod 2^k (least-significant k bits of A)
– in component ⌊A/2^k⌋ (most-significant n−k bits of A)
– decode to select component
[Figure: address range 0 ... 2^n−1 divided into blocks of 2^k locations; the upper n−k address bits go to a decoder that drives the chip enables of all memory chips, the lower k bits go to the address bus]

More Locations
• Example: 64K×8 memory composed of 16K×8 components
[Figure: four 16K×8 components sharing wr, a(13…0) and d_in(7…0); a decoder on a(15…14), gated by en, drives the four component enables; a 4-to-1 multiplexer selected by a(15…14) drives d_out(7…0)]
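
The address split for the 64K example can be sketched as a small decoder. The entity and port names (mem_select_64k, chip_en, chip_a) are assumptions for illustration and do not come from the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical decoder for a 64K memory built from four 16K components.
entity mem_select_64k is
  port ( a       : in  std_logic_vector(15 downto 0);    -- full address A
         en      : in  std_logic;                        -- memory enable
         chip_en : out std_logic_vector(3 downto 0);     -- one enable per component
         chip_a  : out std_logic_vector(13 downto 0) );  -- offset within component
end entity mem_select_64k;

architecture rtl of mem_select_64k is
begin

  -- Offset A mod 2^14: the least-significant 14 bits go to every component
  chip_a <= a(13 downto 0);

  -- Component number floor(A / 2^14): the 2 most-significant bits are decoded
  decode : process (a, en) is
  begin
    chip_en <= "0000";
    if en = '1' then
      chip_en(to_integer(unsigned(a(15 downto 14)))) <= '1';
    end if;
  end process decode;

end architecture rtl;
```

With n = 16 and k = 14 this gives 2^16 / 2^14 = 4 components, matching the example.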


Tristate Drivers
• Allow multiple outputs to be connected together
– Only one active at a time
– Remaining outputs are high-impedance (both output transistors turned off)
• Allow bidirectional input/output ports
[Figure: tristate output stages, each between +V and ground, with their outputs wired together]

Memories with Tristate Ports
• During write
– memory d drivers hi-Z
– memory senses d
• During read
– selected memory drives d
• Fewer pins and wires
– Reduced cost of PCB
• Usually not available within ASICs or FPGAs
[Figure: four 16K×8 components with bidirectional d(7…0) ports wired to one shared data bus; they share wr and a(13…0), and a(15…14) is decoded, gated by en, to enable one component]

Memory Types
• Random-Access Memory (RAM)
– Can read and write
– Static RAM (SRAM): stores data so long as power is supplied
– Asynchronous SRAM: not clocked
– Synchronous SRAM (SSRAM): clocked
– Dynamic RAM (DRAM): needs to be periodically refreshed
• Read-Only Memory (ROM)
– Combinational
– Programmable and Flash rewritable
• Volatile and non-volatile

Asynchronous SRAM
• Data stored in 1-bit latch cells
– Address decoded to enable a given cell
• Usually use active-low control inputs
• Not available as components in ASICs or FPGAs
[Timing diagram, write: A, CE, WE, OE and D, with setup time (tsu) and hold time (th) around the stored data]
[Timing diagram, read: A, CE, WE, OE and D, with access time until the read data is valid]

Asynch SRAM Timing
• Timing parameters published in data sheets
• Access time
– From address/enable valid to data-out valid
• Cycle time
– From start to end of access
• Data setup and hold
– Before/after end of WE pulse
• Makes asynch SRAMs hard to use in clocked synchronous designs

Example Data Sheet


Synchronous SRAM (SSRAM)
• Clocked storage registers for inputs
– address, data and control inputs
– stored on a clock edge
– held for read/write cycle
• Flow-through SSRAM
– no register on data output
– on write, input shows up at output after propagation delay
[Timing diagram: clk, A (a1, a2), en, wr, D_in (xx), D_out (xx, M(a2))]

Example: Coefficient Multiplier
• Compute function y = c_i × x^2
– Coefficient stored in flow-through SSRAM
– 12-bit unsigned integer index for i
• x, y, c_i 20-bit signed fixed-point
– 8 pre- and 12 post-binary point bits
• Use a single multiplier
– Multiply c_i × x × x

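Multiplier Datapath
[Figure: datapath. The SSRAM (address i, data input c_in, controls c_ram_en and c_ram_wr) supplies c_i; a register enabled by x_ce holds x; multiplexers controlled by mult_sel feed a single multiplier whose product is stored in the y register, enabled by y_ce]
1. (mult_sel = 0): c_i × x
2. (mult_sel = 1): x × (c_i × x)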

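Multiplier Timing and Control
[State diagram: step1 repeats while start = 0 and moves to step2 when start = 1; step2 → step3 → step1. Outputs (c_ram_en, x_ce, mult_sel, y_ce): step1 = 1, 1, 0, 0; step2 = 0, 0, 0, 1; step3 = 0, 0, 1, 1]
[Timing diagram: clk, start, c_ram_en, x_ce, mult_sel and y_ce over the state sequence step1, step1, step2, step3, step1]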

Pipelined SSRAM
• Data output also has a register
– More suitable for high-speed systems
– Access RAM in one cycle, use the data in the next cycle
[Timing diagram: clk, A (a1, a2), en, wr, D_in (xx), D_out (xx, M(a2))]

Memories in VHDL
• RAM storage represented by an array signal

type RAM_4Kx16 is array (0 to 4095) of std_logic_vector(15 downto 0);
signal data_RAM : RAM_4Kx16;
...
data_RAM_flow_through : process (clk) is
begin
  if rising_edge(clk) then
    if en = '1' then
      if wr = '1' then
        -- Flow-through: on write, input shows up at output after propagation delay
        data_RAM(to_integer(a)) <= d_in; d_out <= d_in;
      else
        d_out <= data_RAM(to_integer(a));
      end if;
    end if;
  end if;
end process data_RAM_flow_through;

Example: Coefficient Multiplier


library ieee; use ieee.std_logic_1164.all,
ieee.numeric_std.all, ieee.fixed_pkg.all;
entity scaled_square is
port ( clk, reset : in std_logic;
start : in std_logic;
i : in unsigned(11 downto 0);
c_in, x : in sfixed(7 downto -12);
y : out sfixed(7 downto -12) );
end entity scaled_square;

architecture rtl of scaled_square is


signal c_ram_en ,c_ram_wr ,x_ce ,mult_sel ,y_ce : std_logic;
signal c_out, x_out : sfixed(7 downto -12);
signal y_out : sfixed(7 downto -12);
type c_array is array (0 to 4095) of sfixed(7 downto -12);
signal c_RAM : c_array;
type state is (step1, step2, step3);
signal current_state, next_state : state;


Example: Coefficient Multiplier

begin

  c_ram_wr <= '0';

  c_RAM_flow_through : process (clk) is
  begin
    if rising_edge(clk) then
      if c_ram_en = '1' then
        if c_ram_wr = '1' then
          c_RAM(to_integer(i)) <= c_in;    -- store (and use) the scaling values
          c_out <= c_in;
        else
          c_out <= c_RAM(to_integer(i));   -- use the previously stored values
        end if;
      end if;
    end if;
  end process c_RAM_flow_through;

Example: Coefficient Multiplier

  y_reg : process (clk) is
    variable operand1, operand2 : sfixed(7 downto -12);
  begin
    if rising_edge(clk) then
      if y_ce = '1' then
        if mult_sel = '0' then
          operand1 := c_out; operand2 := x_out;   -- c_i × x
        else
          operand1 := x_out; operand2 := y_out;   -- x × (c_i × x)
        end if;
        -- the full sfixed product is wider than y_out, so resize it to fit
        y_out <= resize(operand1 * operand2, y_out'left, y_out'right);
      end if;
    end if;
  end process y_reg;

  y <= y_out;

  state_reg : process ...

  next_state_logic : process ...

  output_logic : process ...

end architecture rtl;

Pipelined SSRAM in VHDL

data_RAM_pipelined : process (clk) is
  variable pipelined_en    : std_logic;
  variable pipelined_d_out : std_logic_vector(15 downto 0);
begin
  if rising_edge(clk) then
    if pipelined_en = '1' then
      d_out <= pipelined_d_out;     -- output register
    end if;
    pipelined_en := en;             -- need the enable for one more clock
    -- SSRAM access
    if en = '1' then
      if wr = '1' then
        data_RAM(to_integer(a)) <= d_in; pipelined_d_out := d_in;
      else
        pipelined_d_out := data_RAM(to_integer(a));
      end if;
    end if;
  end if;
end process data_RAM_pipelined;

Generating SSRAM Components
• Variations on SSRAM behavior
– E.g., write-first, read-first or no-change on write cycle
– Burst accesses to successive locations
• Not all synthesis tools recognize the same templates
• Use a RAM core generator tool

Example: RAM Core Generator


Multiport Memories
• Multiple address, data and control connections to the storage locations
• Allows concurrent accesses
– Avoids multiplexing and sequencing
• Scenario
– Data producer and data consumer
• What if two writes to a location occur concurrently?
– Result may be unpredictable
– Some multi-port memories include an arbiter

FIFO Memories
• First-In/First-Out buffer
– Connecting producer and consumer
– Decouples rates of production/consumption
[Figure: Producer subsystem → FIFO → Consumer subsystem]
• Implementation using dual-port RAM
– Circular buffer
– Full: write_addr = read_addr
– Empty: write_addr = read_addr
[Figure: circular buffer with separate read and write positions]

Example: FIFO Datapath
[Figure: datapath. Two 8-bit counters with reset, enabled by rd_en and wr_en, generate A_rd and A_wr for a dual-port SSRAM (write port A_wr, D_wr, wr_en; read port A_rd, D_rd, rd_en; both clocked by clk); a comparator on A_rd and A_wr produces equal]
• Equal = full or empty
– Need to distinguish between these states. How?

Example: FIFO Control
• Control FSM
– → filling when write without concurrent read
– → emptying when read without concurrent write
– Unchanged when concurrent write and read
[State diagram: states emptying and filling; (wr_en, rd_en) = 1, 0 moves to filling and 0, 1 moves to emptying]
• full = filling and equal
• empty = emptying and equal
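
A minimal sketch of this two-state control FSM follows. The entity name, the clk and reset ports, and the choice of emptying as the reset state (so the FIFO reads as empty after reset) are assumptions. The signals wr_en, rd_en, equal, full and empty are the ones named above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical FIFO control section.
entity fifo_control is
  port ( clk, reset   : in  std_logic;
         wr_en, rd_en : in  std_logic;
         equal        : in  std_logic;    -- '1' when write_addr = read_addr
         full, empty  : out std_logic );
end entity fifo_control;

architecture rtl of fifo_control is
  type fifo_state is (emptying, filling);
  signal state : fifo_state;
begin

  state_reg : process (clk) is
  begin
    if rising_edge(clk) then
      if reset = '1' then
        state <= emptying;              -- assumption: FIFO is empty after reset
      elsif wr_en = '1' and rd_en = '0' then
        state <= filling;               -- write without concurrent read
      elsif wr_en = '0' and rd_en = '1' then
        state <= emptying;              -- read without concurrent write
      end if;                           -- otherwise the state is unchanged
    end if;
  end process state_reg;

  -- Equal addresses mean full if the last operation was a write,
  -- and empty if the last operation was a read.
  full  <= '1' when state = filling  and equal = '1' else '0';
  empty <= '1' when state = emptying and equal = '1' else '0';

end architecture rtl;
```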


Multiple Clock Domains
• Need to resynchronize data that traverses clock domains
– Use resynchronizing registers
• May overrun if sender's clock is faster than receiver's clock
• FIFO smooths out differences in data flow rates
– Latch cells inside FIFO RAM written with sender's clock, read with receiver's clock

Dynamic RAM (DRAM)
• Data stored in a 1-transistor/1-capacitor cell
– Smaller cell than SRAM, so more per chip
– But longer access time
• Write operation
– pull bit-line high or low (1 or 0)
– activate word line
• Read operation
– precharge bit-line to intermediate voltage
– activate word line, and sense charge equalization
– rewrite to restore charge
[Figure: 1-transistor/1-capacitor cell connected to the bit line and word line]

DRAM Refresh
• Charge on capacitor decays over time
– Need to sense and rewrite periodically
– Typically every cell every 64ms
• Refresh each location
• DRAMs organized into banks of rows
– Refresh whole row at a time
• Can’t access while refreshing
– Interleave refresh among accesses
– Or burst refresh every 64ms

DDR DRAM

Feature                                                      DDR    DDR2   DDR3
Voltage                                                      2.5V   1.8V   1.5V
Max data rate per I/O pin (Mbits/sec)                        800    1066   1600
Peak bandwidth (Gbytes/sec for a 32 bit data bus)            3.2    4.2    6.4
Sustained bandwidth (Gbytes/sec for a 32 bit data bus, 60%)  1.9    2.5    3.8
Max density (Gbits per device)                               1      4      4

Read-Only Memory (ROM)
• For constant data, or CPU programs
• Masked ROM
– Data manufactured into the ROM
• Programmable ROM (PROM)
– Use a PROM programmer
• Erasable PROM (EPROM)
– UV erasable
– Electrically erasable (EEPROM)
– Flash RAM

Combinational ROM
• A ROM maps address input to data output
– This is a combinational function!
– Specify using a table
• Example: 7-segment decoder
[Figure: ROM with address inputs A0 to A3 driven by BCD0 to BCD3 and A4 driven by blank; data outputs D0 to D6 drive segments a to g]

Address   Content    Address   Content
0         0111111    6         1111101
1         0000110    7         0000111
2         1011011    8         1111111
3         1001111    9         1101111
4         1100110    10-15     1000000
5         1101101    16-31     0000000

Example: ROM in VHDL

library ieee; use ieee.numeric_std.all;
architecture ROM_based of seven_seg_decoder is
  type ROM_array is
    array (0 to 31) of std_logic_vector ( 7 downto 1 );
  constant ROM_content : ROM_array
    := ( 0 => "0111111", 1 => "0000110",
         2 => "1011011", 3 => "1001111",
         4 => "1100110", 5 => "1101101",
         6 => "1111101", 7 => "0000111",
         8 => "1111111", 9 => "1101111",
         10 to 15 => "1000000",
         16 to 31 => "0000000" );
begin

  seg <= ROM_content(to_integer(unsigned(blank & bcd)));

end architecture ROM_based;

Flash RAM
• Non-volatile, readable (relatively fast), writable (relatively slow)
• Storage partitioned into blocks
– Erase a whole block at a time, then write/read
– Once a location is written, can't rewrite until erased
• NOR Flash
– Can write and read individual locations
– Used for program storage, random-access data
• NAND Flash
– Denser, but can only write and read block at a time
– Used for bulk data, e.g., cameras, memory sticks

Memory Errors
• Bits in memory can be flipped
• Hard error
– The chip is broken
– E.g., manufacturing defect, wear (in Flash)
• Soft error
– Stored data corrupted, but cell still works
– E.g., from atmospheric neutrons
• Soft-error rate
– frequency of occurrence

Error Detection using Parity
• Add a parity bit to each location
• On write access
– compute data parity and store with data
• On read access
– check parity, take exception on error
• If we could tell which bit flipped
– correct by flipping it back, then write back to memory location
– Can’t do this with parity
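
As a small illustration of the parity scheme, the package below computes an even-parity bit on write and checks it on read. The package and function names are assumptions chosen for this sketch, and even parity is one of the two possible conventions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper package for parity generation and checking.
package parity_pkg is
  -- Parity bit that makes the total number of '1's (data plus parity) even
  function even_parity (data : std_logic_vector) return std_logic;
  -- True when the stored parity bit does not match the data that was read
  function parity_error (data : std_logic_vector; stored : std_logic)
    return boolean;
end package parity_pkg;

package body parity_pkg is

  function even_parity (data : std_logic_vector) return std_logic is
    variable p : std_logic := '0';
  begin
    for k in data'range loop
      p := p xor data(k);          -- XOR of all data bits
    end loop;
    return p;
  end function even_parity;

  function parity_error (data : std_logic_vector; stored : std_logic)
    return boolean is
  begin
    return even_parity(data) /= stored;
  end function parity_error;

end package body parity_pkg;
```

On a write, even_parity(d_in) would be stored next to the data; on a read, parity_error reports a mismatch but cannot say which bit flipped.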


Error-Correcting Codes (ECC)
• Allow identification of the flipped bit
• Hamming Codes
– E.g., for single-bit-error correction of an N-bit word, need log2 N + 1 extra bits
• Example: 8-bit word, d1 … d8
– 12-bit ECC code, e1 ... e12
– e1, e2, e4, e8 are check bits, the rest data
– Check bits are in bit positions whose indices are a power of 2
[Figure: data bits d8 ... d1 mapped into the ECC word e12 ... e1, skipping the check-bit positions]

Hamming Code Example

[Figure: ECC word e12 ... e1; the data bits d8 ... d1 occupy the positions that are not powers of 2]

e1 = e3 ⊕ e5 ⊕ e7 ⊕ e9 ⊕ e11
e2 = e3 ⊕ e6 ⊕ e7 ⊕ e10 ⊕ e11
e4 = e5 ⊕ e6 ⊕ e7 ⊕ e12
e8 = e9 ⊕ e10 ⊕ e11 ⊕ e12

Bit    Binary code of position
e1     0 0 0 1
e2     0 0 1 0
e4     0 1 0 0
e8     1 0 0 0
e3     0 0 1 1
e5     0 1 0 1
e6     0 1 1 0
e7     0 1 1 1
e9     1 0 0 1
e10    1 0 1 0
e11    1 0 1 1
e12    1 1 0 0

• Every data bit covered by two or more check bits
• On write: Compute check bits and store with data

Hamming Code Example
• On read: Recompute check bits and XOR with read check bits
– result called the syndrome
– 0000 => no error
• If data bit flipped
– covering bits of syndrome are 1
– = binary code of flipped ECC bit
• If stored check bit flipped
– that bit of syndrome is 1
• On error, unflip bit and rewrite memory location
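
The read-side check for the 12-bit code can be sketched as follows. The entity and port names are assumptions. The XOR terms follow the coverage of e1, e2, e4 and e8 given in the example, and only single-bit errors are handled.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-error corrector for the 12-bit Hamming code.
entity hamming_12_8_correct is
  port ( e         : in  std_logic_vector(12 downto 1);   -- ECC word as read
         syndrome  : out std_logic_vector(3 downto 0);
         corrected : out std_logic_vector(12 downto 1) );
end entity hamming_12_8_correct;

architecture rtl of hamming_12_8_correct is
  signal s : std_logic_vector(3 downto 0);
begin

  -- Each syndrome bit = recomputed check bit XOR stored check bit
  s(0) <= e(1) xor e(3) xor e(5)  xor e(7)  xor e(9) xor e(11);
  s(1) <= e(2) xor e(3) xor e(6)  xor e(7)  xor e(10) xor e(11);
  s(2) <= e(4) xor e(5) xor e(6)  xor e(7)  xor e(12);
  s(3) <= e(8) xor e(9) xor e(10) xor e(11) xor e(12);

  syndrome <= s;

  -- A nonzero syndrome is the binary position of the flipped ECC bit
  correct : process (e, s) is
    variable pos : natural;
  begin
    corrected <= e;
    pos := to_integer(unsigned(s));
    if pos >= 1 and pos <= 12 then
      corrected(pos) <= not e(pos);    -- unflip the identified bit
    end if;
  end process correct;

end architecture rtl;
```

Syndrome values 13 to 15 do not name a valid position; this sketch leaves the word unchanged in that case.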

Multiple-Error Detection
• What if two bits flip
– syndrome identifies wrong bit, or is invalid
• One extra check bit allows
– single-error correction, double-error detection

       Single-bit correction     Double-bit detection
N      Check bits   Overhead     Check bits   Overhead
8      4            50%          5            63%
16     5            31%          6            38%
32     6            19%          7            22%
64     7            11%          8            13%
128    8            6.3%         9            7.0%
256    9            3.5%         10           3.9%

Summary
 Memory: addressable storage locations
 Read and Write operations
 Asynchronous RAM
 Synchronous RAM (SSRAM)
 Dynamic RAM (DRAM)
 Read-Only Memory (ROM) and Flash
 Multiport RAM and FIFOs
 Error Detection and Correction
 Hamming Codes

Embedded Computers

Embedded Computers
 A computer as part of a digital system
 Performs processing to implement or control the
system’s function
 Components
 Processor core
 Instruction and data memory
 Input, output, and input/output controllers
 For interacting with the physical world
 Accelerators
 High-performance circuit for specialized functions
 Interconnecting buses


Memory Organization
 Von Neumann architecture
 Single memory for instructions and data
 Harvard architecture
 Separate instruction and data memories
 Most common in embedded systems
[Figure: CPU and accelerator connected to instruction memory, data memory, and input, output, and I/O controllers]


Bus Organization
 Single bus for low-cost low-performance
systems
 Multiple buses for higher performance
[Figure: multiple-bus organization with CPU, instruction memory, data memory, accelerator, and input, output, and I/O controllers]


Bus Organization

Traditional Bus Topology


Bus Organization

Typical Switch Fabric Topology


Bus Organization
Altera’s System Interconnect Fabric Example


Bus Organization
Altera’s Memory-Mapped and Streaming System Interconnect Fabrics

SRIO: Serial RapidIO is a high-performance, point-to-point, packet-switched
interconnect technology defined by the RapidIO Trade Association.

Full-duplex point-to-point links are established with single or multiple
high-speed serial lanes (1x and 4x are currently defined), and
industry-standard 8B/10B-encoded data transmission at signaling rates of
1.25, 2.50, or 3.125 Gbaud for peak bandwidth of up to 20 Gbps.


Microprocessors
 Single-chip processor in a package
 External connections to memory and
I/O buses
 Most commonly seen in general
purpose computers
 E.g., Intel Pentium family, PowerPC, …


Microcontrollers
 Single chip combining
 Processor
 A small amount of instruction/data memory
 I/O controllers
 Microcontroller families
 Same processor, varying memory and I/O
 8-bit microcontrollers
 Operate on 8-bit data
 Low cost, low performance
 16-bit and 32-bit microcontrollers
 Higher performance

NXP’s 50-MHz ARM Cortex-M0-based LPC1100 microcontroller family
represents the latest 32-bit challenge to 8- and 16-bit processors.
The parts are available now with prices starting at 65 to 95 cents
(10,000). CoreMark Benchmark measures 40 to 50% better code density
for the LPC1100 than that of 8- and 16-bit microcontrollers.

Processor Cores
 Processor as a component in an FPGA or
ASIC
 In FPGA, can be a fixed-function block
 E.g., PowerPC cores in some Xilinx FPGAs
 Or can be a soft core
 Implemented using programmable resources
 E.g., Xilinx MicroBlaze, Altera Nios-II
 In ASIC, provided as an IP block
 E.g., ARM, PowerPC, MIPS, Tensilica cores
 Can be customized for an application


Digital Signal Processors


 DSPs are processors optimized for
signal processing operations
 E.g., audio, video, sensor data; wireless
communication
 Often combined with a conventional
core for processing other data
 Heterogeneous multiprocessor


Instruction Sets
 A processor executes a program
 A sequence of instructions, each performing a
small step of a computation
 Instruction set: the repertoire of available
instructions
 Different processor types have different instruction
sets
 High-level languages: more abstract
 E.g., C, C++, Ada, Java
 Translated to processor instructions by a compiler

How are new instructions chosen to be added to the instruction set?

CPU_time = IC × (CPI_execution + Memory_stall_cycles / Instruction) × Clock_period
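
As a worked illustration of the CPU time formula, the following Python sketch evaluates it for one set of numbers. The function name and the sample values are hypothetical, chosen only to show how the terms combine; IC is read here as the instruction count and CPI as cycles per instruction.

```python
def cpu_time(instruction_count, cpi_execution, stall_cycles_per_instruction, clock_period):
    """CPU_time = IC x (CPI_execution + memory stall cycles per instruction) x clock period."""
    return instruction_count * (cpi_execution + stall_cycles_per_instruction) * clock_period


# Hypothetical workload: 1 million instructions, 1.5 execution cycles each,
# 0.5 memory stall cycles per instruction on average, 10 ns clock period.
seconds = cpu_time(1_000_000, 1.5, 0.5, 10e-9)
print(f"{seconds * 1e3:.1f} ms")
```

With these values each instruction costs 2 cycles on average, so the program takes about 20 ms; halving the stall cycles would save 0.25 cycles per instruction without touching the instruction set or the clock.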

Instruction Execution
 Instructions are encoded in binary
 Stored in the instruction memory
 A processor executes a program by
repeatedly
 Fetching the next instruction
 Decoding it to work out what to do

 Executing the operation

 Program counter (PC)


 Register in the processor holding the
address of the next instruction

Data and Endian-ness


 Instructions operate on data from the data memory
 Byte: 8-bit data
 Data memory is usually byte addressed
 16-bit, 32-bit, 64-bit words of data
Address       Little endian              Big endian
              (LSB = lowest address)     (MSB = lowest address)
              e.g., Intel x86            e.g., PowerPC

8-bit data
  0           8-bit data                 8-bit data

16-bit data
  m           least sig. byte            most sig. byte
  m+1         most sig. byte             least sig. byte

32-bit data
  n           least sig. byte            most sig. byte
  n+1
  n+2
  n+3         most sig. byte             least sig. byte
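
The two byte orders can be checked with Python's built-in integer conversions. This is a small sketch; `store_word` is a hypothetical helper name, and the returned list is read as memory contents from the lowest address upward.

```python
def store_word(value, size, byteorder):
    """Return the bytes of value as they would sit in memory, lowest address first."""
    return list(value.to_bytes(size, byteorder=byteorder))


word = 0x12345678
little = store_word(word, 4, "little")  # least significant byte at address n
big = store_word(word, 4, "big")        # most significant byte at address n

print([hex(b) for b in little])
print([hex(b) for b in big])

# Reading the bytes back with the matching byte order recovers the word.
assert int.from_bytes(bytes(little), "little") == word
assert int.from_bytes(bytes(big), "big") == word
```

Reading little-endian bytes with the big-endian convention (or the reverse) yields a byte-swapped value, which is the usual symptom of an endianness mismatch between a processor and a peripheral or file format.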

von Neumann Computer
von Neumann Computer

• von Neumann Stored Program Computer

- "Stored Program" means the HW is designed to execute a set of pre-defined instructions

- the program and data reside in a storage unit (i.e., memory)

- to change the functionality of the computer, the program is changed (instead of the HW)

- John von Neumann was a mathematician who described a computer architecture where the
instructions and data reside in the same memory

- this implies sequential execution

- it is simple from the standpoint of state machine timing

- the drawback is the "von Neumann bottleneck" in getting data into and out of memory in order for
the computer to run

- this architecture is what we are using in the labs on the Freescale microcontrollers

von Neumann Computer

• Block Diagram of von Neumann Computer

- Notice that information going into/out-of the computer is on ports.

von Neumann Computer

• Bus Management

- There are a great many signals in a microcomputer. Sharing lines reduces the amount
of wiring needed on the chip.

- This creates a situation where bus contention needs to be avoided.

- There are three common techniques for bus management:

1) Verbose (or explicit) routing - every device has a dedicated input/output bus that connects to
any/all devices with which it needs to communicate.

2) High Impedance - devices share a single output bus, but each device has a high
impedance state. Only one device is allowed to drive the bus at any
given time.

3) Multiplexed - devices share a single output bus, but each device routes its output
to a multiplexer, which in turn drives the bus.

von Neumann Computer

• Block Diagram of the Central Processing Unit (CPU)

von Neumann Computer

• Central Processing Unit (CPU)

- the CPU consists of:

1) Control Unit - the state machine that directs the execution of instructions.
- for a given Opcode, the state machine traverses a specific
path within its state diagram
- also called the "Sequence Controller" or "Sequencer"

2) Processing Unit - contains all of the registers and ALU that hold and manipulate data
- memory signals (data/address) coming into/out-of this unit

3) Control Signals - signals sent to processing unit from the control unit
- direct data flow
- load data into registers
- select ALU operation
- manage memory access signals

4) Test Signals - signals sent to control unit from the processing unit
- results of operations that affect state machine flow

von Neumann Computer (Processing Unit)

• Processing Unit

- let's start with the registers within the processing unit

Instruction Register (IR) - holds the Opcode that is read from memory
- passes the Opcode to the Control Unit as a test signal

Memory Address Reg (MAR) - holds the current address being sent to memory

Program Counter (PC) - tracks the address of which instruction is being executed
- PC is sequential (0,1,2…)
- PC is loaded during a branch, incremented otherwise
- MAR tracks PC when executing instruction

User-Controlled Reg (X, Y,..) - these are operated on directly by the program
- can be loaded and stored

ALU Operand Register (Z) - holds one of the inputs to the ALU
- the other input comes from one of the user-controlled registers

von Neumann Computer (Processing Unit)

• Processing Unit

Arithmetic / Logic Unit (ALU)


- performs data math and manipulation
- we first load Z with the first input
- we then select which user-controlled register is the other input
- the control unit sends select lines to indicate which operation to perform

Condition Code Register (CCR)


- tracks the status of ALU operations (i.e., NZVC)
- these signals are sent to the control unit in order to alter sequence flow

von Neumann Computer (Processing Unit)

• Buses

- for this example, let’s use a multiplexed bus system

- we route data in the processing unit between registers/memory using shared lines called buses

- for this architecture, we need two buses

Bus1 - can take either PC or the User-Controlled Registers

- will drive to Memory_In or Bus2

Bus2 - can take either ALU, Bus1, or Memory_Out

- will drive to IR, MAR, PC, User-Controlled Registers, or ALU Operand Reg

- Information from Bus1 can be routed to Bus2 for feedback operations (PC = PC + 1)

- Bus select lines come from the Control Unit to select which information is on which bus at any
given time.

von Neumann Computer (Processing Unit)

• Control Signals
- the Bus1 and Bus2 control lines come from the control unit and drive the multiplexers

- the WRITE line is a synchronous load to memory from Memory_Out

- CCR_Load will load the status bits (NZVC), whose values depend on the previous ALU operation

- the ALU_Sel line tells the ALU which function to perform (AND, ADD, …)

• Test Signals
- the Instruction Register (IR) holds the Opcode for the Control Unit to base state decisions on

- the CCR_Result holds the NZVC status bits from an ALU operation, which influence state decisions

von Neumann Computer (Processing Unit)

• Register Modeling
- each register in the processing unit can be loaded by the control unit.

- the input to most registers is Bus2

- the loads are synchronous to clock and occur on the following state

Instruction Register (IR)

IR_Register : process (Clock, Reset)
  begin
    if (Reset = '0') then
      IR <= x"00";
    elsif (Clock'event and Clock='1') then
      if (IR_Load = '1') then
        IR <= Bus2;
      end if;
    end if;
end process;

Memory Address Register (MAR)

MAR_Register : process (Clock, Reset)
  begin
    if (Reset = '0') then
      MAR <= x"00";
    elsif (Clock'event and Clock='1') then
      if (MAR_Load = '1') then
        MAR <= Bus2;
      end if;
    end if;
end process;

von Neumann Computer (Processing Unit)

• Register Modeling Cont…


- The Program Counter needs a “load” and an “increment”
Program Counter (PC)
PC_Register : process (Clock, Reset)
begin
if (Reset = '0') then
PC <= x"00";
elsif (Clock'event and Clock='1') then
if (PC_Load = '1') then
PC <= Bus2;
elsif (PC_Inc = '1') then
PC <= PC + 1;
end if;
end if;
end process;

X Register

X_Register : process (Clock, Reset)
  begin
    if (Reset = '0') then
      X <= x"00";
    elsif (Clock'event and Clock='1') then
      if (X_Load = '1') then
        X <= Bus2;
      end if;
    end if;
end process;

Y Register

Y_Register : process (Clock, Reset)
  begin
    if (Reset = '0') then
      Y <= x"00";
    elsif (Clock'event and Clock='1') then
      if (Y_Load = '1') then
        Y <= Bus2;
      end if;
    end if;
end process;

Z Register

Z_Register : process (Clock, Reset)
  begin
    if (Reset = '0') then
      Z <= x"00";
    elsif (Clock'event and Clock='1') then
      if (Z_Load = '1') then
        Z <= Bus2;
      end if;
    end if;
end process;

von Neumann Computer (Processing Unit)

• MUX Modeling
- The bus select signals come from the control unit. The Multiplexers are “combinational logic”

Bus 1

BUS1_CONTROL : process (Bus1_Sel, PC, X, Y)
  begin
    case (Bus1_sel) is
      when "00" => Bus1 <= PC;
      when "01" => Bus1 <= X;
      when "10" => Bus1 <= Y;
      when others => Bus1 <= "XXXXXXXX";
    end case;
end process;

Bus 2

BUS2_CONTROL : process (Bus2_Sel, ALU, Bus1, Memory_Out)
  begin
    case (Bus2_sel) is
      when "00" => Bus2 <= ALU;
      when "01" => Bus2 <= Bus1;
      when "10" => Bus2 <= Memory_Out;
      when others => Bus2 <= "XXXXXXXX";
    end case;
end process;

von Neumann Computer (ALU)

• ALU Modeling
- The ALU is combinational logic. It contains as many operations as desired. The operation being
performed is dictated by the control unit.
ALU
ALU_Functions : process (ALU_Sel, Z, Bus1)
begin
case (ALU_sel) is
when '0' => ALU <= Z and Bus1; -- AND
when '1' => ALU <= Z + Bus1; -- ADD
when others => ALU <= x"00";
end case;
end process;

von Neumann Computer (ALU)

• CCR Modeling
- The CCR is a register because we want it to hold the status flags across multiple instructions.

- Typical flags are: Negative (N), Zero (Z), 2’s Comp Overflow (V), and Carry (C)

- These flags are fed back to the control unit for state transition decisions during branch instructions
(i.e., Branch if Zero, Branch if Carry, etc.)
CCR example for Zero Flag
CCR_Register : process (Clock, Reset)
begin
if (Reset = '0') then
CCR_Result <= x"00";
elsif (Clock'event and Clock='1') then
if (CCR_Load = '1') then
if (ALU = x"00") then
CCR_Result <= "00000100";
else
CCR_Result <= "00000000";
end if;
end if;
end if;
end process;

von Neumann Computer (Control Unit)

• Sequence Control Modeling


- The control unit is the finite state machine that handles the computer operations of:

Fetch, Decode, & Execute

- It consists of a single state transition path for Fetch & Decode followed by a set of parallel paths
which handle the execution of each instruction in the instruction set of the microcomputer.

- The Sequence Controller creates all of the control signals which drive the processing unit & ALU.

- Its inputs include:

- The Instruction Register (for decoding the Opcode)


- The Condition Code Register (for branching)

von Neumann Computer (Control Unit)

• Sequence Control State Diagram


- Example State Paths for:

1) Load X with Immediate Addressing


2) Store X with Direct Addressing
3) Branch Always

Fetch States handle reading the OpCode from memory and placing it in the
Instruction Register.

Decode State(s) give the state machine time to decide which instruction
was read.

Execute State(s) perform the specific operation for each of the
instructions in the microcomputer’s instruction set.

von Neumann Computer (Control Unit)

• Sequence Controller Modeling


- Instruction mnemonics can be symbolized using “generics”
Mnemonics for 3 instructions
generic (LDX_IMM : STD_LOGIC_VECTOR (7 downto 0) := x"86"; -- Load Register X with Immediate Addressing
STX_DIR : STD_LOGIC_VECTOR (7 downto 0) := x"96"; -- Store Register X to memory (RAM or IO)
BRA : STD_LOGIC_VECTOR (7 downto 0) := x"20"); -- Branch Always

- States are included as instructions are added to the instruction set


State Names for executing 3 instructions
type State_Type is (S_FETCH_0, S_FETCH_1, S_FETCH_2, -- States to Fetch Opcode
S_DECODE_3, -- State to Decode Opcode
S_LXIMM_4, S_LXIMM_5, S_LXIMM_6, -- States for LDX_IMM Instruction
S_STXDIR_4, S_STXDIR_5, S_STXDIR_6, S_STXDIR_7, -- States for STX_DIR Instruction
S_BRA_4, S_BRA_5, S_BRA_6); -- States for BRA Instruction

von Neumann Computer (Control Unit)

• Sequence Controller Modeling


- The FSM is then modeled using the traditional 3-process technique in VHDL
Next State Memory
STATE_MEMORY : process (Clock, Reset)
begin
if (Reset = '0') then
Current_State <= S_FETCH_0; -- State upon reset
elsif (Clock'event and Clock='1') then
Current_State <= Next_State; -- Normal Operation
end if;
end process STATE_MEMORY;

von Neumann Computer (Control Unit)

• Sequence Controller Modeling


NEXT_STATE_LOGIC : process (Current_State, IR) -- Next State Logic
begin
case (Current_State) is

when S_FETCH_0 => Next_State <= S_FETCH_1; -- Fetch First Opcode


when S_FETCH_1 => Next_State <= S_FETCH_2;
when S_FETCH_2 => Next_State <= S_DECODE_3;
when S_DECODE_3 => if (IR = LDX_IMM) then
                     Next_State <= S_LXIMM_4; -- LDX_IMM Instruction
                   elsif (IR = STX_DIR) then
                     Next_State <= S_STXDIR_4; -- STX_DIR Instruction
                   elsif (IR = BRA) then
                     Next_State <= S_BRA_4; -- BRA Instruction
                   else
                     Next_State <= S_FETCH_0; -- unrecognized Opcode: refetch (avoids an inferred latch)
                   end if;

when S_LXIMM_4 => Next_State <= S_LXIMM_5; -- States when the instruction is Load X Immediate
when S_LXIMM_5 => Next_State <= S_LXIMM_6;
when S_LXIMM_6 => Next_State <= S_FETCH_0;

when S_STXDIR_4 => Next_State <= S_STXDIR_5; -- States when the instruction is Store X Direct
when S_STXDIR_5 => Next_State <= S_STXDIR_6;
when S_STXDIR_6 => Next_State <= S_STXDIR_7;
when S_STXDIR_7 => Next_State <= S_FETCH_0;

when S_BRA_4 => Next_State <= S_BRA_5; -- States when the instruction is a Branch Always
when S_BRA_5 => Next_State <= S_BRA_6;
when S_BRA_6 => Next_State <= S_FETCH_0;

when others => Next_State <= S_FETCH_0;


end case;
end process NEXT_STATE_LOGIC;

von Neumann Computer (Control Unit)

• Sequence Controller Modeling


Output Logic
OUTPUT_LOGIC : process (Current_State) -- Moore Type
begin
case (Current_State) is
when S_FETCH_0 => Bus1_Sel <= "00"; -- Bus1_Sel = PC
Bus2_Sel <= "01"; -- Bus2_Sel = Bus1
IR_Load <= '0';
MAR_Load <= '1'; -- Mar Load
PC_Load <= '0';
PC_Inc <= '0';
X_Load <= '0';
Y_Load <= '0';
Z_Load <= '0';
Write <= '0';
ALU_Sel <= '0';
CCR_Load <= '0';

when S_FETCH_1 => Bus1_Sel <= "00";


Bus2_Sel <= "10"; -- Bus2_Sel = Memory_Out
IR_Load <= '0';
MAR_Load <= '0';
PC_Load <= '0';
PC_Inc <= '1'; -- PC Inc
X_Load <= '0';
Y_Load <= '0';
Z_Load <= '0';
Write <= '0';
ALU_Sel <= '0';
CCR_Load <= '0';

Gumnut Core in VHDL

The Gumnut Core


 A small 8-bit soft core
 Can be used in FPGA designs
 Instruction set illustrates features typical of 8-bit
cores and processors in general
 Programs written in assembly language
 Each processor instruction written explicitly
 Translated to binary representation by an
assembler
 Resources available on companion web site


Gumnut Storage
General-Purpose Registers: r0 (holds 0), r1, r2, r3, r4, r5, r6, r7
Condition Code Registers: C (Carry), Z (Zero)
Program Counter: PC
Data Memory (256 × 8-bit, 8-bit addresses): locations 0 to 255
Instruction Memory (4K × 18-bit, 12-bit addresses): locations 0 to 4095

How many registers should you encode for in the instruction? Two? Three?
How many registers should there be?


Arithmetic Instructions
 Operate on register data and put result
in a register
 add, addc, sub, subc
 Can have immediate value operand
 Condition codes
 Z: 1 if result is zero, 0 if result is non-zero
 C: carry out of add/addc, borrow out of
sub/subc
 addc and subc include C bit in
operation

Arithmetic Instructions
 Examples
 add r3, r4, r1
 add r5, r1, 2
 sub r4, r4, 1
 Evaluate 2x + 1; x in r3, result in r4
 add r4, r3, r3 ; double x
add r4, r4, 1 ; then add 1


Logical Instructions
 Operate on register data and put result
in a register
 and, or, xor, mask (and not)
 Operate bitwise on 8-bit operands

 Can have immediate value operand

 Condition codes
 Z: 1 if result is zero, 0 if result is non-zero
 C: always 0


Logical Instructions
 Examples
 and r3, r4, r5
 or r1, r1, 0x80 ; set r1(7)
 xor r5, r5, 0xFF ; invert r5
 Set Z if least-significant 4 bits of r2 are 0101
 and r1, r2, 0x0F ; clear high bits
sub r0, r1, 0x05 ; compare with 0101


Shift Instructions
 Logical shift/rotate register data and
put result in a register
 shl, shr, rol, ror
 Count specified as a literal operand
 Condition codes
 Z: 1 if result is zero, 0 if result is non-zero
 C: the value of the last bit shifted/rotated
past the end of the byte


Shift Instructions
 Examples
 shl r4, r1, 3
 ror r2, r2, 4
 Multiply r4 by 8, ignoring overflow
 shl r4, r4, 3
 Multiply r4 by 10, ignoring overflow
 shl r1, r4, 1; multiply by 2
shl r4, r4, 3 ; multiply by 8
add r4, r4, r1
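
The multiply-by-10 sequence relies on 10x = 2x + 8x. A small Python model of the three instructions makes the "ignoring overflow" behavior concrete. This is a sketch: `shl` and `times_ten` are hypothetical helper names, and registers are modeled as 8-bit values by masking with 0xFF.

```python
MASK = 0xFF  # Gumnut registers are 8 bits wide


def shl(value, count):
    """Logical shift left; bits shifted past bit 7 are discarded."""
    return (value << count) & MASK


def times_ten(r4):
    r1 = shl(r4, 1)           # shl r1, r4, 1 ; multiply by 2
    r4 = shl(r4, 3)           # shl r4, r4, 3 ; multiply by 8
    return (r4 + r1) & MASK   # add r4, r4, r1


print(times_ten(7))    # fits in 8 bits
print(times_ten(40))   # 400 does not fit; the result wraps modulo 256
```

For inputs up to 25 the result is exact; beyond that the result is the true product modulo 256, which is what "ignoring overflow" means here.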


Memory Instructions
 Transfer data between registers and data
memory
 Compute address by adding an offset to a base
register value
 Load register from memory
 ldm r1, (r2)+5
 Store from register to memory
 stm r1, (r4)-2
 Use r0 if base address is 0
 ldm r3, 23 ≡ ldm r3, (r0)+23
 Condition codes not affected

Memory Instructions
 Increment a 16-bit integer in memory
 Little-endian: address of lsb in r2, msb in next
location
 ldm r1, (r2) ; increment lsb
add r1, r1, 1
stm r1, (r2)
ldm r1, (r2)+1 ; increment msb
addc r1, r1, 0 ; with carry
stm r1, (r2)+1
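
The 16-bit increment works because the carry out of the low-byte add survives the intervening stm and ldm (memory instructions do not affect the condition codes) and is then folded into the high byte by addc. A Python sketch of the same sequence, with hypothetical helper names and a list standing in for data memory:

```python
def add8(a, b, carry_in=0):
    """8-bit add returning (result, carry_out), like add/addc setting C."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def increment16(memory, addr):
    """Increment the little-endian 16-bit integer stored at addr and addr + 1."""
    lsb, carry = add8(memory[addr], 1)             # ldm / add r1, r1, 1
    memory[addr] = lsb                             # stm r1, (r2)
    msb, _ = add8(memory[addr + 1], 0, carry)      # ldm / addc r1, r1, 0
    memory[addr + 1] = msb                         # stm r1, (r2)+1


mem = [0xFF, 0x00]   # the value 0x00FF, least significant byte first
increment16(mem, 0)
print([hex(b) for b in mem])
```

After the call the two bytes hold 0x0100 in little-endian order; the high byte changes only when the low byte wraps from 0xFF to 0x00.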


Input/Output Instructions
 I/O controllers have registers that govern
their operation
 Each has an address, like data memory
 Gumnut has separate data and I/O address spaces
 Input from I/O register
 inp r3, 157 ≡ inp r3, (r0)+157
 Output to I/O register
 out r3, (r7) ≡ out r3, (r7)+0
 Condition codes not affected
 Further examples in Chapter 8

Branch Instructions
 Programs can evaluate conditions and take
alternate courses of action
 Condition codes (Z,C) represent outcomes of
arithmetic/logical/shift instructions
 Branch instructions examine Z or C
 bz, bnz, bc, bnc
 Add a displacement to PC if condition is true
 Specifies how many instructions forward or
backward to skip
 Counting from instruction after branch


Branch Example
 Elapsed seconds in location 100
 Increment, wrapping to 0 after 59
 ldm r1, 100
add r1, r1, 1
sub r0, r1, 60 ; Z set if r1 = 60
bnz +1 ; skip to store if Z is 0
add r1, r0, 0 ; otherwise wrap r1 to 0
stm r1, 100


Jump Instruction
 Unconditionally skips forward or backward to
specified address
 Changes the PC to the address
 Example: if r1 = 0, clear data location 100 to
0; otherwise clear location 200 to 0
 Assume instructions start at address 10
 10: sub r0, r1, 0
11: bnz +2
12: stm r0, 100
13: jmp 15
14: stm r0, 200
15: …


Subroutines
 A sequence of instructions that perform
some operation
 Can call them from different parts of a
program using a jsb instruction
 Subroutine returns with a ret instruction

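[Figure: two jsb m calls in a program each transfer control to the subroutine instructions at address m; ret returns to the instruction after the call]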


Subroutine Example
 Subroutine to increment second count
 Address of count in r2
 ldm r1, (r2)
add r1, r1, 1
sub r0, r1, 60
bnz +1
add r1, r0, 0
stm r1, (r2)
ret
 Call to increment locations 100 and 102
 add r2, r0, 100
jsb 20
add r2, r0, 102
jsb 20


Return Address Stack


 The jsb saves the return address for
use by the ret
 But what if the subroutine includes a jsb?
 Gumnut core includes an 8-entry push-down
stack of return addresses

[Figure: return address stack after the second nested call (return addrs for first and second calls) and after the third (return addrs for first, second, and third calls)]


Miscellaneous Instructions
 Instructions supporting interrupts
 See Chapter 8 (more later)
 reti Return from interrupt
 enai Enable interrupts
 disi Disable interrupts
 wait Wait for an interrupt
 stby Stand by in low power mode until
an interrupt occurs


The Gumnut Assembler


 Gasm: translates assembly programs
 Generates memory images for program
text (binary-coded instructions) and data
 See documentation on web site

 Write a program as a text file


 Instructions
 Directives
 Comments
 Use symbolic labels


Example Program
; Program to determine greater of value_1 and value_2
text
org 0x000 ; start here on reset
jmp main
; Data memory layout
data
value_1: byte 10
value_2: byte 20
result: bss 1
; Main program
text
org 0x010
main: ldm r1, value_1 ; load values
ldm r2, value_2
sub r0, r1, r2 ; compare values
bc value_2_greater
stm r1, result ; value_1 is greater
jmp finish
value_2_greater: stm r2, result ; value_2 is greater
finish: jmp finish ; idle loop


Gumnut Instruction Encoding


 Instructions are a form of information
 Can be encoded in binary
 Gumnut encoding
 18 bits per instruction
 Divided into fields representing different
aspects of the instruction
 Opcodes and function codes
 Register numbers
 Addresses

The VAX has a computer architecture with easily the most complex
instruction set. The instruction set has a highly variable format where
the minimal instruction length is 1 byte and the longest instruction is
37 bytes (296 bits).


Gumnut Instruction Encoding


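Fields are listed from most to least significant bit, with widths in bits
in parentheses; "-" marks a field that carries no label on the slide.

Arith/Logical Register    1110 (4) | rd (3) | rs (3) | rs2 (3) | - (2) | fn (3)
Arith/Logical Immediate   0 (1) | fn (3) | rd (3) | rs (3) | immed (8)
Shift                     110 (3) | - (1) | rd (3) | rs (3) | count (3) | - (3) | fn (2)
Memory, I/O               10 (2) | fn (2) | rd (3) | rs (3) | offset (8)
Branch                    111110 (6) | fn (2) | - (2) | disp (8)
Jump                      11110 (5) | fn (1) | addr (12)
Miscellaneous             1111110 (7) | fn (3) | - (8)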


Encoding Examples
 Encoding for addc r3, r5, 24
 Arithmetic immediate, fn = 001
1 3 3 3 8
0 fn rd rs immed
0 001 011 101 00011000 => 05D18

 Instruction encoded by 3ECFC
11 1110 1100 1111 1100
6 2 2 8
Branch 111110 fn disp
111110 11 00 11111100 => bnc -4
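
Both examples can be checked with a few shifts and masks. The Python sketch below uses hypothetical function names and only the field layouts shown on these slides; the branch word is the 18-bit pattern 11 1110 1100 1111 1100 from the example, which is 3ECFC in hexadecimal.

```python
def encode_arith_immediate(fn, rd, rs, immed):
    """Pack 0 | fn(3) | rd(3) | rs(3) | immed(8) into an 18-bit word."""
    return (fn << 14) | (rd << 11) | (rs << 8) | (immed & 0xFF)


def decode_branch(word):
    """Unpack 111110 | fn(2) | unused(2) | disp(8); disp is two's complement."""
    if word >> 12 != 0b111110:
        raise ValueError("not a branch instruction")
    fn = (word >> 10) & 0b11
    disp = word & 0xFF
    if disp >= 0x80:       # sign-extend the 8-bit displacement
        disp -= 0x100
    return fn, disp


# addc r3, r5, 24 with fn = 001
print(f"{encode_arith_immediate(0b001, 3, 5, 24):05X}")

# The branch example: fn = 11 with displacement -4 (bnc -4 on the slide)
print(decode_branch(0x3ECFC))
```

The first call prints 05D18, matching the slide. The leading opcode bits are what distinguish the formats, so a decoder tests them before extracting any other field.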


Other Instruction Sets


 8-bit cores and microcontrollers
 Xilinx PicoBlaze: like Gumnut
 8051, and numerous like it
 Originated as 8-bit microprocessors
 Instructions encoded as one or more bytes
 Instruction set is more complex and irregular
 Complex instruction set computer (CISC)
 C.f. Reduced instruction set computer (RISC)
 16-, 32- and 64-bit cores
 Mostly RISC
 E.g., PowerPC, ARM, MIPS, Tensilica, …


Instruction and Data Memory


 In embedded systems
 Instruction memory is usually ROM, flash,
SRAM, or combination
 Data memory is usually SRAM

 DRAM if large capacity needed


 Processor/memory interfacing
 Gluing the signals together


Example: Gumnut Memory

[Figure: gumnut core connected to an instruction ROM and a data SRAM. Core signals: clk_i, rst_i; instruction side: inst_cyc_o, inst_stb_o, inst_ack_i, inst_adr_o, inst_dat_i; data side: data_cyc_o, data_stb_o, data_ack_i, data_we_o, data_adr_o, data_dat_o, data_dat_i. Memory ports: en, adr, dat_i, dat_o, we, clk.]


Example: Gumnut Memory

IMem : process (clk) is
begin
if rising_edge(clk) then
if inst_cyc_o = '1' and inst_stb_o = '1' then
inst_dat_i <=
instr_ROM(to_integer(inst_adr_o(10 downto 0)));
inst_ack_i <= '1';
else
inst_ack_i <= '0';
end if;
end if;
end process IMem;


Example: Gumnut Memory


DMem : process (clk) is
begin
if rising_edge(clk) then
if data_cyc_o = '1' and data_stb_o = '1' then
if data_we_o = '1' then
data_RAM(to_integer(data_adr_o)) <= data_dat_o;
data_dat_i <= data_dat_o;
data_ack_i <= '1';
else
data_dat_i <= data_RAM(to_integer(data_adr_o));
data_ack_i <= '1';
end if;
else
data_ack_i <= '0';
end if;
end if;
end process DMem;

Example: Microcontroller Memory

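[Figure: 8051 interfaced to an SRAM. 8051 pins shown: P2, P0, ALE, PSEN, WR, RD. SRAM pins shown: A(16), A(15..8), A(7..0), D, WE, OE, CE. A latch (D, Q, LE) sits between P0 and A(7..0), with ALE on LE.]

PSEN (program store enable)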


32-bit Memory
 Four bytes per memory word
 Little-endian: lsb at least address
 Big-endian: msb at least address
0 1 2 3
4 5 6 7
8 9 10 11

 Partial-word read
 Read all bytes, processor selects those needed
 Partial-word write
 Use byte-enable signals

Example: MicroBlaze Memory


[Figure: MicroBlaze connected to four byte-wide SSRAMs, one per byte lane (0:7, 8:15, 16:23, 24:31). MicroBlaze signals shown: Addr (2:16), Data_Write, AS, Write_Strobe, Byte_Enable(0) to Byte_Enable(3), Read_Strobe, Data_Read, Ready (tied to +V), Clk. Each SSRAM has A, D_in, D_out, en, wr, and clk.]


Cache Memory
 For high-performance processors
 Memory access time is several clock cycles
 Performance bottleneck

 Cache memory
 Small fast memory attached to a processor
 Stores most frequently accessed items,
plus adjacent items
 Locality: those items are most likely to be
accessed again soon


Cache Memory
 Memory contents divided into fixed-
sized blocks (lines)
 Cache copies whole lines from memory
 When processor accesses an item
 If item is in cache: hit - fast access
 Occurs most of the time
 If item is not in cache: miss
 Line containing item is copied from memory
 Slower, but less frequent

 May need to replace a line already in cache
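
The hit, miss, and replacement behavior can be sketched with a toy simulator. The slides do not say how lines are placed in the cache, so this Python model assumes a direct-mapped cache (each memory line maps to exactly one cache slot); the class name and parameters are hypothetical.

```python
class DirectMappedCache:
    """Toy model: tracks which memory line occupies each cache slot."""

    def __init__(self, num_lines, line_size):
        self.num_lines = num_lines
        self.line_size = line_size
        self.lines = [None] * num_lines  # memory line number held in each slot
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit; on a miss, copy the whole line in."""
        line = address // self.line_size   # which fixed-size block of memory
        slot = line % self.num_lines       # the only slot that block may use
        if self.lines[slot] == line:
            self.hits += 1
            return True
        self.lines[slot] = line            # replaces any line already there
        self.misses += 1
        return False


cache = DirectMappedCache(num_lines=4, line_size=4)
for address in range(16):                  # sequential access to 16 bytes
    cache.access(address)
print(cache.hits, cache.misses)
```

Sequential access misses once per line and then hits on the adjacent items in that line, which is the spatial locality the slides describe. Accessing address 64 afterwards would map to slot 0 and evict the line holding addresses 0 to 3.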


Fast Main Memory Access


 Optimize memory for line access by cache
 Wide memory
 Read a line in one access
 Burst transfers
 Send starting address, then read successive locations
 Pipelining
 Overlapping stages of memory access
 E.g., address transfer, memory operation, data transfer
 Double data rate (DDR), Quad data rate (QDR)
 Transfer on both rising and falling clock edges


Summary
 Embedded computer
 Processor, memory, I/O controllers, buses
 Microprocessors, microcontrollers, and
processor cores
 Soft-core processors for ASIC/FPGA

 Processor instruction sets


 Binary encoding for instructions
 Assembly language programs
 Memory interfacing
