Sie sind auf Seite 1von 94

Digital Design

Chapter 5: Register-Transfer Level (RTL) Design

Digital Design Copyright 2006 Frank Vahid

Introduction
FSM inputs

5.1

Chapter 3: Controllers
Control input/output: single bit (or just a few) representing event or state Finite-state machine describes behavior; implemented as state register and combinational logic

bi

bo Combinational logic n1 s1 s0 n0

clk

State register

Chapter 4: Datapath components


Data input/output: Multiple bits collectively representing single entity Datapath components included registers, adders, ALU, comparators, register files, etc. Register
i s i s n a

Comparator Register file

ALU
e

This chapter: custom processors


Processor: Controller and datapath components working together to implement an algorithm
Digital Design Copyright 2006 Frank Vahid

bi

bo

Combinational logic n1 n0 s1 s0 State register

Register file ALU

Datapath Controller
2

FSM outputs

RTL Design: Capture Behavior, Convert to Circuit


Recall
Chapter 2: Combinational Logic Design
First step: Capture behavior (using equation or truth table) Remaining steps: Convert to circuit

Chapter 3: Sequential Logic Design


First step: Capture behavior (using FSM) Remaining steps: Convert to circuit

Capture behavior

RTL Design (the method for creating custom processors)


First step: Capture behavior (using highlevel state machine, to be introduced) Remaining steps: Convert to circuit
Digital Design Copyright 2006 Frank Vahid

Convert to circuit

RTL Design Method

5.2

Digital Design Copyright 2006 Frank Vahid

RTL Design Method: Preview Example


Soda dispenser
c: bit input, 1 when coin deposited a: 8-bit input having value of deposited coin s: 8-bit input having cost of a soda d: bit output, processor sets to 1 when total value of deposited coins equals or exceeds cost of a soda
s a

c d

Soda dispenser processor

a 25

0 1 0 1 0
c d

50

25
a

0 1 0
Digital Design Copyright 2006 Frank Vahid

Soda tot: tot: dispenser 25 processor50

How can we precisely describe this processors behavior?

Preview Example: Step 1 -Capture High-Level State Machine


Declare local register tot Init state: Set d=0, tot=0 Wait state: wait for coin
If see coin, go to Add state
c d s 8 Soda dispenser processor a 8

Add state: Update total value: tot = tot + a


Remember, a is present coins value Go back to Wait state

Inputs: c (bit), a (8 bits), s (8 bits) Outputs: d (bit) Local registers: tot (8 bits) c Add Init d=0 tot=0 Wait c*(tot<s) Disp d=1
6

In Wait state, if tot >= s, go to Disp(ense) state Disp state: Set d=1 (dispense soda)
Return to Init state
Digital Design Copyright 2006 Frank Vahid

tot=tot+a c*(tot<s)

Preview Example:
Step 2 -- Create Datapath
Need tot register Need 8-bit comparator to compare s and a Need 8-bit adder to perform tot = tot + a Wire the components as needed for above Create control input/outputs, give them names
Inputs : c (bit), a(8 bits), s (8 bits) Outputs : d (bit) Local reg isters: t ot (8 bits) c Add Init d=0 t ot=0 Wait c (tot<s) tot= tot+a c (tot<s) Disp d=1

tot_ld tot_clr 8

ld clr

tot 8 8

tot_lt_s

8-bit < Datapath

8-bit adder 8
7

Digital Design Copyright 2006 Frank Vahid

Preview Example: Step 3


Connect Datapath to a Controller
Controllers inputs
External input c (coin detected) Input from datapath comparators output, which we named tot_lt_s
tot_ld tot_clr 8 s a ld clr tot

8 8-bit < 8-bit adder 8

tot_lt_s

Datapath

s 8

a 8

Controllers outputs
External output d (dispense soda) Outputs to datapath to load and clear the tot register
Digital Design Copyright 2006 Frank Vahid

c d tot_ld tot_clr Controller tot_lt_s Datapath


8

Preview Example: Step 4 Derive the Controllers


FSM
Same states and arcs as high-level state machine But set/read datapath control signals for all c datapath operations d and conditions
Digital Design Copyright 2006 Frank Vahid c d tot_ld tot_clr tot_lt_s s Inputs: : c, tot_lt_s(bit) Outputs:d, tot_ld, tot_clr (bit) tot_ld c Add Init Wait
c* t ot_ lt_ s

s a 8 8

Controller

Datapath

tot_ld tot_clr 8

ld tpt clr 8 8-bit < Datapath 8

tot_clr tot_lt_s
tot_lt_s

tot_ld=1 c*tot_lt_s Disp d=1

d=0 tot_clr=1

8-bit adder 8

Controller

Preview Example: Completing the Design


Implement the FSM as a state register and logic
Init tot_lt_s tot_clr tot_ld

As in Ch3 Table shown on right


Inputs: : c, tot_lt_s(bit) Outputs:d, tot_ld, tot_clr (bit) tot_ld

s1 0 0 0 0 0 0 0 0 1 1

s0 0 0 0 0 1 1 1 1 0 1

c 0 0 1 1 0 0 1 1 0 0

n1 0 0 0 0 1 0 1 1 0 0

n0 1 1 1 1 1 1 0 0 1 0

d 0 0 0 0 0 0 0 0 0 1

0 1 0 1 0 1 0 1 0 0

0 0 0 0 0 0 0 0 1 0

1 1 1 1 0 0 0 0 0 0

c d
Init Wait
c* tot _lt _s

c Add tot_ld=1 c*tot_lt_s Disp d=1

tot_clr tot_lt_s
Add Disp

d=0 tot_clr=1

Controller
Digital Design Copyright 2006 Frank Vahid

Wait

10

Step 1: Create a High-Level State Machine


Lets consider each step of the RTL design process in more detail , s (8 bits) Inputs: c (bit), a (8 bits) Outputs: d (bit) Step 1
Soda dispenser example Not an FSM because:
Multi-bit (data) inputs a and s Local register tot Data operations tot=0, tot<s, tot=tot+a.
Local registers: tot (8 bits) c Init d=0 tot=0 Wait c(tot<s) tot= tot+a c (tot<s) Disp d=1

Useful high-level state machine:


Data types beyond just bits Local registers Arithmetic equations/expressions
Digital Design Copyright 2006 Frank Vahid

11

Step 1 Example: Laser-Based Distance Measurer


T (in seconds) laser D Object of interest 2D = T sec * 3*108 m/sec

sensor

Example of how to create a high-level state machine to describe desired processor behavior Laser-based distance measurement pulse laser, measure time T to sense reflection
Laser light travels at speed of light, 3*10 m/sec 8 Distance is thus D = T sec * 3*10 m/sec / 2
Digital Design Copyright 2006 Frank Vahid

12

Step 1 Example: Laser-Based Distance Measurer


T (in seconds) laser from button B Laser-based distance measurer L to laser

sensor

D to display

16

S from sensor

Inputs/outputs
B: bit input, from button to begin measurement L: bit output, activates laser S: bit input, senses laser reflection D: 16-bit output, displays computed distance
Digital Design Copyright 2006 Frank Vahid

13

Step 1 Example: Laser-Based Distance Measurer


from button B L Laserbased distance measurer

Inputs: B, S(1 bit each) Outputs: L (bit), D (16 bits)


to display D 16

to laser

from sensor

S0
a

L = 0 (laser off) D = 0 (distance = 0)

Step 1: Create high-level state machine Begin by declaring inputs and outputs Create initial state, name it S0
Initialize laser to off (L=0) Initialize displayed distance to 0 (D=0)
Digital Design Copyright 2006 Frank Vahid

14

Step 1 Example: Laser-Based Distance Measurer


Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits) B (button not pressed)
a

from button B

L Laserbased distance measurer

to laser

to display

16

from sensor

S0 L=0 D=0

S1 B (button pressed)

Add another state, call S1, that waits for a button press
B stay in S1, keep waiting B go to a new state S2

Q: What should S2 do?


Digital Design Copyright 2006 Frank Vahid

A: Turn on the laser


a

15

Step 1 Example: Laser-Based Distance Measurer


Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits)
from button B L Laserbased distance measurer to laser to display D 16 S from sensor

S0 L=0 D=0

S1

S2 L=1 (laser on)

S3 L=0 (laser off)


a

Add a state S2 that turns on the laser (L=1) Then turn off laser (L=0) in a state S3 Q: What do next? A: Start timer, wait to sense reflection
a

Digital Design Copyright 2006 Frank Vahid

16

Step 1 Example: Laser-Based Distance Measurer


Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits) Local Registers: Dctr (16 bits)
to displ ay from but ton B Lase r-based distan ce measu rer L to laser D 16 S from sensor

S (no reflection) S (reflection) ?


a

S0 L=0 D=0

S1

S2 L=1

S3

Dctr = 0 (reset cycle count)

L=0 Dctr = Dctr + 1 (count cycles)

Stay in S3 until sense reflection (S) To measure time, count cycles for which we are in S3
To count, declare local register Dctr Increment Dctr each cycle in S3 Initialize Dctr to 0 in S1. S2 would have been O.K. too
Digital Design Copyright 2006 Frank Vahid

17

Step 1 Example: Laser-Based Distance Measurer


from but ton B Lase r-based distan ce measu rer L to laser

Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits) Local Registers: Dctr (16 bits) B S

to displ ay

16

from sensor

S0 L=0 D=0

S1 Dctr = 0

S2 L=1

S3

S4

D = Dctr / 2 L=0 Dctr = Dctr + 1 (calculate D)

Once reflection detected (S), go to new state S4


Calculate distance 8 Assuming clock frequency is 3x10 , Dctr holds number of meters, so D=Dctr/2

After S4, go back to S1 to wait for button again


Digital Design Copyright 2006 Frank Vahid

18

Step 2: Create a Datapath


Datapath must
Implement data storage Implement data computations

Look at high-level state machine, do three substeps


(a) Make data inputs/outputs be datapath inputs/outputs (b) Instantiate declared registers into the datapath (also instantiate a register for each data output) (c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations
Digital Design Copyright 2006 Frank Vahid

Instantiate: to introduce a new component into a design.

19

Step 2 Example: Laser-Based Distance Measurer


(a) Make data Local Registers: Dctr (16 bits) inputs/outputs be datapath B S inputs/outputs (b) Instantiate declared S4 S0 S1 S2 S3 registers into the B S datapath (also L=0 D = Dctr / 2 Dctr = 0 L=1 L=0 instantiate a D=0 Dctr = Dctr + 1 (calculate D) register for each a data output) Datapath (c) Examine every Dreg_clr state and Dreg_ld transition, and clear clear I Dctr_clr instantiate Dreg: 16-bit Dctr: 16-bit count Dctr_cnt load register up-counter datapath Q Q components and connections to implement any 16 data computations
D
Digital Design Copyright 2006 Frank Vahid

Inputs: B, S (1 bit each)

Outputs: L (bit), D (16 bits)

20

Step 2 Example: Laser-Based Distance Measurer


(c) (continued) Examine every state and transition, and instantiate datapath components and connections to implement any data computations
Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits) Local Registers: Dctr (16 bits) B S S4

S0 L=0 D=0
Datapath

S1 Dctr = 0

S2 L=1

S3

D = Dctr / 2 L=0 Dctr = Dctr + 1 (calculate D)


a

Dreg_clr Dreg_ld Dctr_clr Dctr_cnt clear count Q 16 Dctr: 16-bit up-counter clear load

>>1 16 I Q 16 D Dreg: 16-bit register

Digital Design Copyright 2006 Frank Vahid

21

Step 2 Example Showing Mux Use


Localregisters : E ,F , G, R (16 bits) E F G E F G E F G

T0 R = E + F A T1 R = R + G
a

add_A_s0 add_B_s0

1 2 A

1 2

R (a) (b)

R R

(c)

(d)

Introduce mux when one component input can come from more than one source
Digital Design Copyright 2006 Frank Vahid

22

Step 3: Connecting the Datapath to a Controller


from button B Controller Dreg_clr Dreg_ld Dctr_clr Dctr_cnt D to display 16 300 MHz Clock Datapath S L to laser from sensor

Laser-based distance measurer example Easy just connect all control signals between controller and datapath

Datapath Dreg_clr Dreg_ld Dctr_clr Dctr_cnt

clear count Q

Dctr: 16-bit up-counter

clear load

I Dreg: 16-bit register Q 16 D

Digital Design Copyright 2006 Frank Vahid

23

Step 4: Deriving the Controllers FSM


B from but ton L C o n troller D reg_clr D reg_ld D c tr_clr D a tap a th to displ a y D 16 D c tr_c n t 300H M z Clock S to laser from sensor

Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits) Local Registers: Dctr (16 bits) B S

S0 L=0 D=0

S1 Dctr = 0

S2 L=1

S3

S4

L=0 D = Dctr / 2 Dctr = Dctr + 1 (calculate D)

Inputs: B, S FSM has same Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt structure as highB level state machine

S
a

Inputs/outputs all bits now Replace data operations by bit operations using datapath
Digital Design Copyright 2006 Frank Vahid

S0 L=0 Dreg_clr = 1 Dreg_ld = 0 Dctr_clr = 0 Dctr_cnt = 0 (laser off) (clear D reg)

S1 L=0 Dreg_clr = 0 Dreg_ld = 0 Dctr_clr = 1 Dctr_cnt = 0 (clear count)

S2 L=1 Dreg_clr = 0 Dreg_ld = 0 Dctr_clr = 0 Dctr_cnt = 0 (laser on)

S3

S4 L=0 Dreg_clr = 0 Dreg_ld = 1 Dctr_clr = 0 Dctr_cnt = 0 (load D reg with Dctr/2) (stop counting)24

L=0 Dreg_clr = 0 Dreg_ld = 0 Dctr_clr = 0 Dctr_cnt = 1 (laser off) (count up)

Step 4: Deriving the Controllers FSM


B S S0 L=0 Dreg_clr = 1 Dreg_ld = 0 Dctr_clr = 0 Dctr_cnt = 0 (laser off) (clear D reg) S1 L=0 Dreg_clr = 0 Dreg_ld = 0 Dctr_clr = 1 Dctr_cnt = 0 (clear count) B S2 L=1 Dreg_clr = 0 Dreg_ld = 0 Dctr_clr = 0 Dctr_cnt = 0 (laser on) S3 S S4 L=0 Dreg_clr = 0 Dreg_ld = 1 Dctr_clr = 0 Dctr_cnt = 0 (load D reg with Dctr/2) (stop counting)

L=0 Dreg_clr = 0 Dreg_ld = 0 Dctr_clr = 0 Dctr_cnt = 1 (laser off) (count up)

Using shorthand of outputs not assigned implicitly assigned 0

Inputs: B, S

Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt B S


a

S0 L=0 Dreg_clr = 1 (laser off) (clear D reg)

S1 Dctr_clr = 1 (clear count)

S2 L=1 (laser on)

S3

S4 Dreg_ld = 1 Dctr_cnt = 0 (load D reg with Dctr/2) (stop counting)

Digital Design Copyright 2006 Frank Vahid

L=0 Dctr_cnt = 1 (laser off) (count up)

25

Step 4
from button Controller B L Dreg_clr Dreg_ld Dctr_clr Dctr_cnt to display D 16
300 MHz Clock

Datapath

to laser from sensor

Datapath

Dreg_clr Dreg_ld Dctr_clr Dctr_cnt clear count Dctr: 16-bit up-counter Q clear load I Dreg: 16-bit register Q 16

Inputs: B, S

Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt B S

S0 L=0 Dreg_clr = 1 (laser off) (clear D reg)

S1 Dctr_clr = 1 (clear count)

S2 L=1 (laser on)

S3

S4 Dreg_ld = 1 Dctr_cnt = 0 (load D reg with Dctr/2) (stop counting)

L=0 Dctr_cnt = 1 (laser off) (count up)

Implement FSM as state register and logic (Ch3) to complete the design
26

Digital Design Copyright 2006 Frank Vahid

RTL Design Examples and Issues


Well use several more examples to illustrate RTL design Example: Bus interface
Master processor can read register from any peripheral
Each register has unique 4-bit address Assume 1 register/periph.
Per0 Per1 Master processor rd 32 D 4 A Per15

5.3

to/from processor bus rd D A 32 Bus interface Q 32 Main part Peripheral


27

Sets rd=1, A=address Appropriate peripheral places register data on 32-bit D lines
Periphs address provided on Faddr inputs (maybe from DIP switches, or another register)
Digital Design Copyright 2006 Frank Vahid

4 Faddr 4

RTL Example: Bus Interface


Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits) Outputs: D (32 bits) Local register: Q1 (32 bits) rd ((A = Faddr) and rd) WaitMyAddress (A = Faddr) and rd D = Z Q1 = Q rd SendData D = Q1

Step 1: Create high-level state machine


State WaitMyAddress
Output nothing (Z) on D, store peripherals register value Q into local register Q1 Wait until this peripherals address is seen (A=Faddr) and rd=1

State SendData
Output Q1 onto D, wait for rd=0 (meaning main processor is done reading the D lines)
Digital Design Copyright 2006 Frank Vahid

28

RTL Example: Bus Interface


Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits) Outputs: D (32 bits) Local register: Q1 (32 bits) rd ((A = Faddr) and rd) WaitMyAddress (A = Faddr) and rd D = Z Q1 = Q rd SendData D = Q1

clk Inputs rd State Outputs D


Digital Design Copyright 2006 Frank Vahid

W Z

SD Q1

W Z

SD

SD Q1

W Z

29

RTL Example: Bus Interface


Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits) Outputs: D (32 bits) Local register: Q1 (32 bits) rd ((A = Faddr) and rd) SendData WaitMyAddress (A = Faddr) D = Q1 and rd D = Z Q1 = Q A rd Q1_ld 4 Faddr 4 Q 32 ld Q1 = (4-bit) A_eq_ Faddr D_en 32

32
a

Step 2: Create a datapath


(a) Datapath inputs/outputs (b) Instantiate declared registers (c) Instantiate datapath components and connections
Digital Design Copyright 2006 Frank Vahid

Datapath Bus interface D

30

RTL Example: Bus Interface


Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits) Outputs: D (32 bits) Local register: Q1 (32 bits) rd Inputs: rd, A_eq_Faddr (bit) ((A = Faddr) Outputs: Q1_ld, D_en (bit)rd) and rd SendData rd WaitMyAddress (A = Faddr) (A_eq_Faddr D = Q1 and rd D = Z and rd) Q1 = Q
WaitMyAddress D_en = 0 Q1_ld = 1 A_eq_Faddr and rd

rd Q1_ld

A 4
rd

Faddr 4 ld

Q 32 Q1 32

= (4-bit) A_eq_Faddr D_en Datapath

SendData D_en = 1 Q1_ld = 0

32

Bus interface

Step 3: Connect datapath to controller Step 4: Derive controllers FSM


Digital Design Copyright 2006 Frank Vahid

31

RTL Example: Video Compression Sum of Absolute


Only difference: ball moving
Frame 1 Frame 2

Differences
Frame 1 Frame 2

Digitized

Digitized

Digitized

Difference of

frame 1

frame 2

frame 1

2 from 1

1 Mbyte (a)

1 Mbyte

1 Mbyte (b)

0.01 Mbyte

Video is a series of frames (e.g., 30 per second) Most frames similar to previous frame
Compression idea: just send difference from previous frame
Digital Design Copyright 2006 Frank Vahid

Just send difference

32

RTL Example: Video Compression Sum of Absolute


compare
Frame 1 Frame 2

Differences
Each is a pixel, assume represented as 1 byte (actually, a color picture might have 3 bytes per pixel, for intensity of red, green, and blue components of pixel)

Need to quickly determine whether two frames are similar enough to just send difference for second frame
Compare corresponding 16x16 blocks
Treat 16x16 block as 256-byte array

Compute the absolute value of the difference of each array item Sum those differences if above a threshold, send complete frame for second frame; if below, can use difference method (using another technique, not described)
Digital Design Copyright 2006 Frank Vahid

33

RTL Example: Video Compression Sum of Absolute


Differences
256-byte array 256-byte array
A SAD

B go
< i ( ! ) 6 5 2

sad

integer

Want fast sum-of-absolute-differences (SAD) component


When go=1, sums the differences of element pairs in arrays A and B, outputs that sum
Digital Design Copyright 2006 Frank Vahid

34

RTL Example: Video Compression Sum of Absolute


Differences
A SAD B go sad

Inputs: A, B (256 byte memory); go (bit) Outputs: sad (32 bits) Local registers: sum, sad_reg (32 bits); i (9 bits) S0 go S1
(i<256)
< i ( ! ) 6 5 2

!go sum = 0 i=0


a

S0: wait for go S1: initialize sum and index S2: check if done (i>=256) S3: add difference to sum, increment index S4: done, write to output sad_reg
Digital Design Copyright 2006 Frank Vahid

S2 i<256 sum=sum+abs(A[i]-B[i]) S3 i=i+1 S4 sad_reg = sum

35

RTL Example: Video Compression Sum of Absolute


Differences
Inputs: A, B (256 byte memory); go (bit) Outputs: sad (32 bits) Local registers: sum, sad_reg (32 bits); i (9 bits) S0 go S1
(i<256)

AB_addr i_lt_256 i_inc i_clr


a

A_data B_data

<256 9 i

!go sum = 0 i=0

8 32

sum_ld sum_clr
< i ( ! ) 6 5 2

S2 i<256 sum=sum+abs(A[i]-B[i]) S3 i=i+1

sum 32 32

abs 8

sad_reg_ld sad_reg Datapath 32 sad

t l < _ ! i ( ) 6 5 2

S4

sad_reg=sum

Step 2: Create datapath


Digital Design Copyright 2006 Frank Vahid

36

RTL Example: Video Compression Sum of Absolute


Differences
go AB_ rd i_lt_256 S0 go S1
?

AB_addr <256 9 i

A_data B_data

go i_inc sum=0 sum_clr=1 i=0 i_clr=1 i_clr sum_ld sum_clr


< i ( ! ) 6 5 2

8 32

S2 i<256 i_lt_256 sum=sum+abs(A[i]-B[i]) S3 sum_ld=1; AB_rd=1 i=i+1 i_inc=1 S4

sum 32 32 sad_reg 32 sad

abs 8

sad_reg_ld

t l < _ ! i ( ) 6 5 2

t l < _ ! i ( ) 6 5 2

sad_reg=sum sad_reg_ld=1

Controller

Digital Design Copyright 2006 Frank Vahid

Step 3: Connect to controller Step 4: Replace high-level state machine by FSM


37

RTL Example: Video Compression Sum of Absolute


Differences
Comparing software and custom circuit SAD
Circuit: Two states (S2 & S3) for each i, 256 is 512 clock cycles Software: Loop (for i = 1 to 256), but for each i, must move memory to local registers, subtract, compute absolute value, add to sum, increment i say about 6 cycles per array item 256*6 = 1536 cycles Circuit is about 3 times (300%) faster Later, well see how to build SAD circuit that is even faster
< i ( ! ) 6 5 2 t l < _ ! i ( ) 6 5 2

(i<256)

S2 i<256 sum=sum+abs(A[i]-B[i]) S3 i=i+1

Digital Design Copyright 2006 Frank Vahid

38

RTL Design Pitfalls and Good Practice


Common pitfall: Assuming register is update in the state its written
Final value of Q? Final state? Answers may surprise you
Value of Q unknown Final state is C, not D
Local registers: R, Q (8 bits) R<100 A R=99 Q=R B R=R+1 (a) R<100 clk R Q A 99 ? ? B 100 99 ? (b) C 100 ? C

R>=100 D

Why?
State A: R=99 and Q=R happen simultaneously State B: R not updated with R+1 until next clock cycle, simultaneously with state register being updated
Digital Design Copyright 2006 Frank Vahid

39

RTL Design Pitfalls and Good Practice


Solutions
Read register in following state (Q=R) Insert extra state so that conditions use updated value Other solutions are possible, depends on the example
Local registers: R, Q (8 bits) R<100 A R=99 Q=R B R=R+1 Q=R (a) R<100 R>=100 clk R Q A 99 ? ? B 100 99 ? (b) B2 100 99 D 100 99 B2 C

R>=100 D

Digital Design Copyright 2006 Frank Vahid

40

RTL Design Pitfalls and Good Practice


Common pitfall: Reading outputs
Outputs can only be written Solution: Introduce additional register, which can be written and read
Inputs: A, B (8 bits) Outputs: P (8 bits) Inputs: A, B (8 bits) Outputs: P (8 bits) Local register: R (8 bits)

S P=A (a)

T P=P+B

S R=A P=A (b)

T P=R+B

Digital Design Copyright 2006 Frank Vahid

41

RTL Design Pitfalls and Good Practice


Good practice: Register all data outputs
In fig (a), output P would show spurious values as addition computes
Furthermore, longest register-to-register path, which determines clock period, is not known until that output is connected to another component

B R R

+ P (a)

Preg P (b)

In fig (b), spurious outputs reduced, and longest register-to-register path is clear

Digital Design Copyright 2006 Frank Vahid

42

Control vs. Data Dominated RTL Design


Designs often categorized as control-dominated or datadominated
Control-dominated design Controller contains most of the complexity Data-dominated design Datapath contains most of the complexity General, descriptive terms no hard rule that separates the two types of designs Laser-based distance measurer control dominated Bus interface, SAD circuit mix of control and data Now lets do a data dominated design

Digital Design Copyright 2006 Frank Vahid

43

Data Dominated RTL Design Example: FIR Filter


Filter concept
Suppose X is data from a temperature sensor, and particular input sequence is 180, 180, 181, 240, 180, 181 (one per clock cycle) That 240 is probably wrong!
Could be electrical noise

X 12 clk digital filter 12

Filter should remove such noise in its output Y Simple filter: Output average of last N values
Small N: less filtering Large N: more filtering, but less sharp output
Digital Design Copyright 2006 Frank Vahid

44

Data Dominated RTL Design Example: FIR Filter


FIR filter
Finite Impulse Response Simply a configurable weighted sum of past input values y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
Above known as 3 tap Tens of taps more common Very general filter User sets the constants (c0, c1, c2) to define specific filter
X 12 clk digital filter 12 Y

y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

RTL design
Step 1: Create high-level state machine
But there really is none! Data dominated indeed.

Go straight to step 2
Digital Design Copyright 2006 Frank Vahid

45

Data Dominated RTL Design Example: FIR Filter


Step 2: Create datapath
Begin by creating chain of xt registers to hold past values of X Suppose sequence is: 180, 181, 240
x(t) xt0 X clk 12 3-tap FIR filter x(t-1) xt1 12 x(t-2) xt2 12
X 12 clk digital filter 12 Y

y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

240 180 181

180 181

180

12

Y
a

Digital Design Copyright 2006 Frank Vahid

46

Data Dominated RTL Design Example: FIR Filter


Step 2: Create datapath (cont.)
Instantiate registers for c0, c1, c2 Instantiate multipliers to compute c*x values
x(t) xt0 X clk
a

X 12 clk digital filter 12

y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

c0

3-tap FIR filter x(t-1) c1 xt1

x(t-2) xt2

c2

Digital Design Copyright 2006 Frank Vahid

47

Data Dominated RTL Design Example: FIR Filter


Step 2: Create datapath (cont.)
Instantiate adders
X 12 clk digital filter 12 Y

y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)


3-tap FIR filter x(t) xt0 X clk c0 x(t-1) xt1 c1 x(t-2) xt2 c2

Digital Design Copyright 2006 Frank Vahid

48

Data Dominated RTL Design Example: FIR Filter


Step 2: Create datapath (cont.)
Add circuitry to allow loading of particular c register
CL Ca1 Ca0 C x(t) xt0 X clk c0 x(t-1) xt1 c1 x(t-2) xt2 c2
a

X 12 clk digital filter 12

y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)


3-tap FIR filter 3 2x4 2 1 0 e

* +
Digital Design Copyright 2006 Frank Vahid

* +

*
yreg Y

49

Data Dominated RTL Design Example: FIR Filter


Step 3 & 4: Connect to controller, Create FSM
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

No controller needed Extreme data-dominated example (Example of an extreme control-dominated design an FSM, with no datapath)

Comparing the FIR circuit to a software implementation


Circuit
Assume adder has 2-gate delay, multiplier has 20-gate delay Longest past goes through one multiplier and two adders
20 + 2 + 2 = 24-gate delay

100-tap filter, following design on previous slide, would have about a 34-gate delay: 1 multiplier and 7 adders on longest path

Software
100-tap filter: 100 multiplications, 100 additions. Say 2 instructions per multiplication, 2 per addition. Say 10-gate delay per instruction. (100*2 + 100*2)*10 = 4000 gate delays

Circuit is more than 100 times faster (10,000% faster). Wow.


Digital Design Copyright 2006 Frank Vahid

50

Determining Clock Frequency


Designers of digital circuits often want fastest performance
Means want high clock frequency
clk a b

5.4

Frequency limited by longest register-to-register delay


Known as critical path If clock is any faster, incorrect data may be stored into register Longest path on right is 2 ns
Ignoring wire delays, and register setup and hold times, for simplicity
Digital Design Copyright 2006 Frank Vahid

2 ns delay

+
c

51

Critical Path
Example shows four paths
a to c through +: 2 ns a to d through + and *: 7 ns b to d through + and *: 7 ns b to d through *: 5 ns
2 ns delay
2 ns

1 / 7 ns = 142 MHz
Max
(2,7,7,5) = 7 ns

7 ns 7 ns

Digital Design Copyright 2006 Frank Vahid

5 ns

Longest path is thus 7 ns Fastest frequency

+
7 ns 7 ns

5 ns delay

52

Critical Path Considering Wire Delays


Real wires have delay too
Must include in critical path

Example shows two paths


Each is 0.5 + 2 + 0.5 = 3 ns clk a 0.5 ns b 0.5 ns

Trend
1980s/1990s: Wire delays were tiny compared to logic delays But wire delays not shrinking as fast as logic delays
Wire delays may even be greater than logic delays!

+
3 ns
s n 3

2 ns 0.5 ns
3 ns

Must also consider register setup and hold times, also add to path Then add some time to the computed path, just to be safe
e.g., if path is 3 ns, say 4 ns instead
Digital Design Copyright 2006 Frank Vahid

53

A Circuit May Have Numerous Paths


Paths can exist
In the datapath In the controller Between the controller and datapath May be hundreds or thousands of paths
Combinational logic
d

s 8

a 8

tot_ld c
tot_lt_s t ot_clr (c) n1

ld

tot
clr 8

n0
tot_lt_s

8-bit < Datapath

8-bit adder
8

Timing analysis tools that evaluate all possible paths automatically very helpful
Digital Design Copyright 2006 Frank Vahid

s1 clk

s0

(b) State register

(a)

54

Behavioral Level Design: C to Gates


C code
S0 go S1
(i<256)

5.5

!go sum = 0 i=0


int SAD (byte A[256], byte B[256]) // not quite C syntax { uint sum; short uint I; sum = 0; i = 0; while (i < 256) { sum = sum + abs(A[i] B[i]); i = i + 1; } return sum; }

S2 i<256 sum=sum+abs(A[i]-B[i]) S3 i=i+1 S4 sad_reg = sum


a

Earlier sum-of-absolute-differences example


Started with high-level state machine C code is an even better starting point -- easier to understand
Digital Design Copyright 2006 Frank Vahid

55

Behavioral-Level Design: Start with C (or Similar Language)


Replace first step of RTL design method by two steps
Capture in C, then convert C to high-level state machine How convert from C to high-level state machine?

Step 1A: Capture in C


a

Step 1B: Convert to high-level state machine

Digital Design Copyright 2006 Frank Vahid

56

Converting from C to High-Level State Machine


Convert each C construct to equivalent states and transitions Assignment statement
Becomes one state with assignment
target = expression; target= expression
a

If-then statement
Becomes state with condition check, transitioning to then statements if condition true, otherwise to ending state
then statements would also be converted to states
Digital Design Copyright 2006 Frank Vahid

!cond if (cond) { // then stmts } cond (then stmts) (end)


a

57

Converting from C to High-Level State Machine


If-then-else
Becomes state with condition check, transitioning to then statements if condition true, or to else statements if condition false
!cond if (cond) { // then stmts } else { // else stmts } cond (then stmts) (else stmts)
a

(end)

While loop statement


Becomes state with condition check, transitioning to while loops statements if true, then transitioning back to condition check
Digital Design Copyright 2006 Frank Vahid

!cond while (cond) { // while stmts } cond (while stmts)


a

(end)

58

Simple Example of Converting from C to HighLevel State Machine


Inputs: uint X, Y Outputs: uint Max X>Y if (X > Y) { Max = X; } else { Max = Y;
}
a a

!(X>Y)

!(X>Y)

X>Y

(then stmts)

(else stmts)

Max=X

Max=Y

(end)

(end)

(a)

(b)

(c)

Simple example: Computing the maximum of two numbers


Convert if-then-else statement to states (b) Then convert assignment statements to states (c)
Digital Design Copyright 2006 Frank Vahid

59

Example: Converting Sum-of-Absolute-Differences C


code to High-Level State Machine
Convert each construct to states
Simplify when possible, e.g., merge states
Inputs: byte A[256, B[256] bit go; Output: int sad main() { uint sum; short uint I; while (1) {
while (!go); !(!go)

!go

!go

go

!go

go sum=0 i=0

sum=0

From high-level state machine, follow RTL design method to create circuit Thus, can convert C to gates using straightforward automatable process
Not all C constructs can be efficiently converted Use C subset if intended for circuit Can use languages other than C, of course
Digital Design Copyright 2006 Frank Vahid

i=0
(c)

sum = 0; i = 0; while (i < 256) { sum = sum + abs(A[i] - B[i]); i = i + 1; } }


} sad = sum; !go (b)

(d)

go

!go

go a

(a)
!go go

sum=0 i=0
!(i<256)

sum=0 i=0
!(i<256)

sum=0 i=0
!(i<256) i<256

i<256
sum=sum + abs i=i+1

i<256
sum=sum + abs i=i+1 sad = sum

while stmts

(g) (e)
sad = sum

60

(f)

Memory Components
Register-transfer level design instantiates datapath components to create datapath, controlled by a controller
A few more components are often used outside the controller and datapath

5.6

MxN memory
M words, N bits wide each

M words
N-bits wide each MN memory

Several varieties of memory, which we now introduce


Digital Design Copyright 2006 Frank Vahid

61

Random Access Memory (RAM)


RAM Readable and writable memory
Random access memory
Strange name Created several decades ago to contrast with sequentially-accessed storage like tape drives
32 W_data 4 W_addr W_en 1632 register file R_data R_addr R_en 4 32

Logically same as register file Memory with address inputs, data inputs/outputs, and control
RAM usually just one port; register file usually two or more

Register file from Chpt. 4

RAM vs. register file


RAM typically larger than roughly 512 or 1024 words RAM typically stores bits using a bit storage approach that is more efficient than a flip flop RAM typically implemented on a chip in a square rather than rectangular shape keeps longest wires (hence delay) short
Digital Design Copyright 2006 Frank Vahid

32 10

data addr rw en 1024 32 R AM

RAM block symbol


62

RAM Internal Structure


32 10 data addr rw en 1024x32 RAM

Let A = log2M
d0 addr0 addr1
r d a

wdata(N-1) wdata(N-2) wdata0 word enable


bit storage block (aka cell) word

addr(A-1)
clk

a0 a1 AxM d1 decoder a(A-1)

data cell word word enable enable rw data

d(M-1) to all cells

en rw

rdata(N-1) rdata(N-2) rdata0

RAM cell

Similar internal structure as register file


Decoder enables appropriate word based on address inputs rw controls whether cell is written or read Lets see whats inside each RAM cell

Digital Design Copyright 2006 Frank Vahid

63

Static RAM (SRAM)


32 10 data addr
addr

Let A = log2 M d0

wdata(N-1)
word enable

wdata(N-2) wdata0
bit storage block ,, (aka cell ) word ,,

SRAM cell
data d
data cell

data cell d
a

rw en

1024x32 RAM

addr0 addr1

addr(A-1) clk en rw

a0 a1 A M d1 decoder a(A-1) e d(M-1) to all cells

word word enable enable rw data

rdata(N-1)

rdata(N-2)

rdata0

word enable

Static RAM cell


6 transistors (recall inverter is 2 transistors) Writing this cell
word enable input comes from decoder When 0, value d loops around inverters
That loop is where a bit stays stored

SRAM cell
data 1 d
a

data 0

When 1, the data bit value enters the loop


data is the bit to be stored in this cell data enters on other side Example shows a 1 being written into cell
Digital Design Copyright 2006 Frank Vahid
data

word enable

1
word enable

data cell d

64

Static RAM (SRAM)


32 10 data addr
addr

Let A = log2 M d0

wdata(N-1)
word enable

wdata(N-2) wdata0
bit storage block ,, (aka cell ) word ,,

rw en

1024x32 RAM

addr0 addr1

addr(A-1) clk en rw

a0 a1 A M d1 decoder a(A-1) e d(M-1) to all cells

data cell

word word enable enable rw data

Static RAM cell


Reading this cell

rdata(N-1)

rdata(N-2)

rdata0

SRAM cell
data 1 d 1 1 0
a

data 1

Somewhat trickier When rw set to read, the RAM logic sets both data and data to 1 The stored bit d will pull either the left line or the right bit down slightly below 1 Sense amplifiers detect which side is slightly pulled down

word enable

1 To sense amplifiers

<1

The electrical description of SRAM is really beyond our scope just general idea here, mainly to contrast with DRAM...
Digital Design Copyright 2006 Frank Vahid

65

Dynamic RAM (DRAM)


32 10 data addr
addr

Let A = log2 M d0

wdata(N-1)
word enable

wdata(N-2) wdata0
bit storage block ,, (aka cell ) word ,,

rw en

1024x32 RAM

addr0 addr1

addr(A-1) clk en rw

a0 a1 A M d1 decoder a(A-1) e d(M-1) to all cells

data cell

word word enable enable rw data

DRAM cell
data cell word enable

Dynamic RAM cell

rdata(N-1)

rdata(N-2)

rdata0

1 transistor (rather than 6) Relies on large capacitor to store bit


Write: Transistor conducts, data voltage level gets stored on top plate of capacitor Read: Just look at value of d Problem: Capacitor discharges over time
Must refresh regularly, by reading d and then writing it right back
Digital Design Copyright 2006 Frank Vahid

d capacitor slowly discharging (a)

data enable d discharges (b)

66

Comparing Memory Types


Register file
Fastest But biggest size
MxN Memory implemented as a: register file SRAM DRAM

SRAM
Fast More compact than register file

DRAM
Slowest
And refreshing takes time

But very compact

Use register file for small items, SRAM for large items, and DRAM for huge items
Note: DRAMs big capacitor requires a special chip design process, so DRAM is often a separate chip
Digital Design Copyright 2006 Frank Vahid

Size comparison for same number of bits (not to scale)

67

Reading and Writing a RAM


clk 1 addr data rw en RAM[9] RAM[13] now equals 500 now equals 999 9 500 13 999 Z 2 9 500 3 clk addr data rw valid setup time valid hold time setup time

500

1 means write

access time (b)

Writing
Put address on addr lines, data on data lines, set rw=1, en=1

Reading
Set addr and en lines, but put nothing (Z) on data lines, set rw=0 Data will appear on data lines

Dont forget to obey setup and hold times


In short keep inputs stable before and after a clock edge
Digital Design Copyright 2006 Frank Vahid

68

RAM Example: Digital Sound Recorder


4096 16 RAM

data

addr

rw

wire

16

en

microphone

analog-todigital converter

ad_buf

12

Ra Rrw Ren
da_ld

digital-toanalog converter

wire

ad_ld

processor

Behavior
Well use a 4096x16 RAM (12-bit wide RAM not common)

speaker

Record: Digitize sound, store as series of 4096 12-bit digital values in RAM Play back later Common behavior in telephone answering machine, toys, voice recorders

To record, processor should read a-to-d, store read values into successive RAM words
To play, processor should read successive RAM words and enable d-to-a
Digital Design Copyright 2006 Frank Vahid

69

RAM Example: Digital Sound Recorder


RTL design of processor
Create high-level state machine Begin with the record behavior Keep local register a
Stores current address, ranges from 0 to 4095 (thus need 12 bits)
16 analog-todigital converter ad_buf ad_ld 12 Ra Rw Ren da_ld digital-toanalog converter 4096x16 RAM

processor

Record behavior
Local register: a (12 bits) a<4095 S a=0 T ad_ld=1 ad_buf=1 Ra=a Rrw=1 Ren=1
a

Create state machine that counts from 0 to 4095 using a


For each a
Read analog-to-digital conv. ad_ld=1, ad_buf=1 Write to RAM at address a Ra=a, Rrw=1, Ren=1
Digital Design Copyright 2006 Frank Vahid

U a=a+1 a=4095

70

RAM Example: Digital Sound Recorder


Now create play behavior Use local register a again, create state machine that counts from 0 to 4095 again
For each a
Read RAM Write to digital-to-analog conv.
4096x16 RAM

data bus

16 analog-todigital converter ad_buf ad_ld 12 Ra Rw Ren da_ld digital-toanalog converter

processor

Note: Must write d-to-a one cycle after reading RAM, when the read data is available on the data bus

Play behavior
Local register: a (12 bits) a<4095 V a=0 W
a

The record and play state machines would be parts of a larger state machine controlled by signals that determine when to record or play
Digital Design Copyright 2006 Frank Vahid

ad_buf=0 Ra=a Rrw=0 Ren=1

X da_ld=1 a=a+1
a=4095

71

Read-Only Memory ROM


Memory that can only be read from, not written to
Data lines are output only No need for rw input
32 10 addr rw en 1024 32 R AM data

Advantages over RAM


Compact: May be smaller Nonvolatile: Saves bits even if power supply is turned off Speed: May be faster (especially than DRAM) Low power: Doesnt need power supply to save bits, so can extend battery life
RAM block symbol

32 10

data addr 1024x32 ROM en

Choose ROM over RAM if stored data wont change (or wont change often)
For example, a table of Celsius to Fahrenheit conversions in a digital thermometer
Digital Design Copyright 2006 Frank Vahid

ROM block symbol

72

Read-Only Memory ROM


32 10 addr en data 1024x32 ROM

Let A = log2M
d0 addr0 addr1
r d a

word enable

ROM block symbol

bit storage block (aka cell) word

addr(A-1)
clk

a0 a1 AxM d1 decoder a(A-1)

data word word enable enable data

d(M-1)

en rdata(N-1) rdata(N-2) rdata0

ROM cell

Internal logical structure similar to RAM, without the data input lines
Digital Design Copyright 2006 Frank Vahid

73

ROM Types
d0
addr

addr0 addr1

addr(A-1)

a0 a1 A M d1 decoder a(A-1) e d(M-1)

word

Storing bits in a ROM known as programming Several methods

en

data(N-1)

data(N-2)

data0

Mask-programmed ROM
Bits are hardwired as 0s or 1s during chip manufacturing
2-bit word on right stores 10 word enable (from decoder) simply passes the hardwired value through transistor
word enable

data line cell

Notice how compact, and fast, this memory would be


Digital Design Copyright 2006 Frank Vahid

,,

If a ROM can only be read, how are the stored bits stored in the first place?

Let A = log2 M word enable

bit storage block ,, (a cell )

data cell word word enable enable data

data line cell

74

ROM Types
d0
addr

addr0 addr1

Each cell has a fuse A special device, known as a programmer, blows certain fuses (using higher-than-normal voltage)
Those cells will be read as 0s (involving some special electronics) Cells with unblown fuses will be read as 1s 2-bit word on right stores 10

addr(A-1)

a0 a1 A M d1 decoder a(A-1) e d(M-1)

word

en

data(N-1)

data(N-2)

data0

data line cell

1 cell

word enable fuse blown fuse

Also known as One-Time Programmable (OTP) ROM


Digital Design Copyright 2006 Frank Vahid

,,

Fuse-Based Programmable ROM

Let A = log2 M word enable

bit storage block ,, (a cell )

data cell word word enable enable data

data line

75

ROM Types

d0
addr

addr0 addr1

Uses floating-gate transistor in each cell Special programmer device uses higherthan-normal voltage to cause electrons to tunnel into the gate
Electrons become trapped in the gate Only done for cells that should store 0 Other cells (without electrons trapped in gate) will be 1
2-bit word on right stores 10

addr(A-1)

a0 a1 A M d1 decoder a(A-1) e d(M-1)

word

en

data(N-1)

data(N-2)

data0

floating-gate transistor

data line cell 1

r o

word enable
n i t g a r t e t a g

ee trapped electrons

Details beyond our scope just general idea is necessary here

To erase, shine ultraviolet light onto chip


Gives trapped electrons energy to escape Requires chip package to have window
Digital Design Copyright 2006 Frank Vahid

,,

Erasable Programmable ROM (EPROM)

Let A = log2 M word enable

bit storage block ,, (a cell )

data cell word word enable enable data

data line cell 0

76

ROM Types
Electronically-Erasable Programmable ROM (EEPROM)
Similar to EPROM
Uses floating-gate transistor, electronic programming to trap electrons in certain cells

But erasing done electronically, not using UV light Erasing done one word at a time

Flash memory
Like EEPROM, but all words (or large blocks of words) can be erased simultaneously Become common relatively recently (late 1990s)
r o t n i t g a r t e

32 10

data addr en write busy 1024x32 EEPROM

Both types are in-system programmable

Can be programmed with new stored bits while in the system in which the ROM operates
Requires bi-directional data lines, and write control input Also need busy output to indicate that erasing is in progress erasing takes some time
Digital Design Copyright 2006 Frank Vahid

77

ROM Example: Talking Doll


4096x16 ROM

Hello there! audio divided into 4096 samples, stored in ROM

speaker

Hello there!

Hello there!
a

16

Ra Ren

digital-toanalog converter da_ld v

vibration sensor

processor

Doll plays prerecorded message, trigger by vibration


Message must be stored without power supply Use a ROM, not a RAM, because ROM is nonvolatile
And because message will never change, use a mask-programmed ROM or OTP ROM

Processor should wait for vibration (v=1), then read words 0 to 4095 from the ROM, writing each to the d-to-a
Digital Design Copyright 2006 Frank Vahid

78

ROM Example: Talking Doll


Local register: a (12 bits) 4096x16 ROM a=0
a d

v S T Ra=a Ren=1 v a=4095 v

a<4095

16

Ra Ren

digital-toanalog converter da_ld

U da_ld=1 a=a+1

processor

High-level state machine


Create state machine that waits for v=1, and then counts from 0 to 4095 using a local register a For each a, read ROM, write to digital-to-analog converter
Digital Design Copyright 2006 Frank Vahid

79

ROM Example: Digital Telephone Answering Machine


Using a Flash Memory
Want to record the outgoing announcement
When rec=1, record digitized sound in locations 0 to 4095 When play=1, play those stored sounds to digital-toanalog converter
y s u b

4096x16 Flash

Were not home.

What type of memory?


Should store without power supply ROM, not RAM Should be in-system programmable EEPROM or Flash, not EPROM, OTP ROM, or mask-programmed ROM Will always erase entire memory when reprogramming Flash better than EEPROM
Digital Design Copyright 2006 Frank Vahid

analog-todigital converter

16 ad_buf ad_ld 12 Ra Rrw Rener bu da_ld digital-toanalog converter

processor rec record play

microphone

speaker

80

ROM Example: Digital Telephone Answering Machine


Using a Flash Memory
High-level state machine
Once rec=1, begin erasing flash by setting er=1 Wait for flash to finish erasing by waiting for bu=0 Execute loop that sets local register a from 0 to 4095, reading analog-todigital converter and writing to flash for each a
a w d r n e

4096x16 Flash

analog-todigital converter

16 ad_buf ad_ld 12 Ra Rrw Ren er bu da_ld digital-toanalog converter

processor rec record play speaker

microphone

S a=0 er=1 rec

Local register: a (13 bits) bu a<4096 T bu U er=0 ad_ld=1 ad_buf=1 Ra=a Rrw=1 Ren=1 a=a+1

V a=4096

Digital Design Copyright 2006 Frank Vahid

81

Blurring of Distinction Between ROM and RAM


We said that
RAM is readable and writable ROM is read-only

ROM

Flash

RAM
a

EEPROM

NVRAM

But some ROMs act almost like RAMs


EEPROM and Flash are in-system programmable
Essentially means that writes are slow
Also, number of writes may be limited (perhaps a few million times)

And, some RAMs act almost like ROMs


Non-volatile RAMs: Can save their data without the power supply
One type: Built-in battery, may work for up to 10 years Another type: Includes ROM backup for RAM controller writes RAM contents to ROM before turning off

New memory technologies evolving that merge RAM and ROM benefits
e.g., MRAM

Bottom line
Lot of choices available to designer, must find best fit with design goals
Digital Design Copyright 2006 Frank Vahid

82

Queues
A queue is another component sometimes used during RTL design Queue: A list written to at the back, from read from the front
Like a list of waiting restaurant customers
write items
to the back of the queue

5.7

back

front

read (and
remove) items from front of the queue

Writing called a push, reading called a pop Because first item written into a queue will be the first item read out, also called a FIFO (first-infirst-out)
Digital Design Copyright 2006 Frank Vahid

83

Queues
7 6 5 4 3 2 1 0

Queue has addresses, and two pointers: rear and front


Initially both point to 0

Push (write)
Item written to address pointed to by rear rear incremented
A

rf 0 A
a

Pop (read)
Item read from address pointed to by front front incremented
7 B r 2 6 5 4 3 2

r 1 B

f 0 A f 0
a a

If front or rear reaches 7, next (incremented) value should be 0 (for a queue with addresses 0 to 7)
Digital Design Copyright 2006 Frank Vahid

1 B

f 84

Queues
Treat memory as a circle
If front or rear reaches 7, next (incremented) value should be 0 rather than 8 (for a queue with addresses 0 to 7)
7 6 5 4 3 2 1 B r f 0 A

Two conditions of interest


Full queue no room for more items
In 8-entry queue, means 8 items present No further pushes allowed until a pop occurs Causes front=rear 1 B
f

0 7
a

Empty queue no items


No pops allowed until a push occurs Causes front=rear 2
r
r

Both conditions have front=rear


To detect whether front=rear means full or empty, need state machine that detects if previous operation was push or pop, sets full or empty output signal (respectively)
Digital Design Copyright 2006 Frank Vahid

3 4

85

Queue Implementation
Can use register file for item storage Implement rear and front using up counters
rear used as register files write address, front as read address
wdata 16 3 8 16 register file wdata waddr wr clr inc rd rdata raddr rd 3 16 rdata

wr

clr inc 3-bit up counter front

Simple controller would set control lines for pushes and pops, and also detect full and empty situations
FSM for controller not shown
Digital Design Copyright 2006 Frank Vahid

reset

Controller

3-bit up counter rear

eq

=
full empty 8-word 16-bit queue

86

Common Uses of a Queue


Computer keyboard
Pushes pressed keys onto queue, meanwhile pops and sends to computer

Digital video recorder


Pushes captured frames, meanwhile pops frames, compresses them, and stores them

Computer network routers


Pushes incoming packets onto queue, meanwhile pops packets, processes destination information, and forwards each packet out over appropriate port

Digital Design Copyright 2006 Frank Vahid

87

Queue Usage Example


7 6 5 4 3 2 1 0

Example series of pushes and pops


Note how rear and front pointers move Note that popping doesnt really remove the data from the queue, but that data is no longer accessible Note how rear (and front) wraps around from address 7 to 0

Initially empty queue 7 1. After pushing 9, 5, 8, 5, 7, 2, 3 r 7 2. After popping r 7 3. After pushing 6 6 6 3 5 2 4 7 3 5 2 8 1 5 rf 0 9 f 0 9 data: 9

6 3

5 2

4 7

3 5

2 8

1 5 f 1 5 f 1 5 rf

6 3

5 2

4 7

3 5

2 8

0 9 r 0 3 full

Note: pushing a full queue is an error


As is popping an empty queue
Digital Design Copyright 2006 Frank Vahid
4. After pushing 3

7 6

6 3

5 2

4 7

3 5

2 8

5. After pushing 4

ERROR! Pushing a full queue results in unknown state

88

Hierarchy A Key Design Concept


Hierarchy
An organization with a few items at the top, with each item decomposed into other items Common example: A country
1 item at the top (the country) Country item decomposed into state/province items Each state/province item decomposed into city items
CityA CityD

5.8

Province 2

Province 3

Province 1 Province 1

CityF

CityB CityC

CityE

CityG

Country A

Map showing all levels of hierarchy

Province 2

Province 3

Hierarchy helps us manage complexity


P P i v n r r P o o r n i v n i v o n i v c c 2 e 3 e c

To go from transistors to gates, muxes, decoders, registers, ALUs, controllers, datapaths, memories, queues, etc. Imagine trying to comprehend a controller and datapath at the level of gates
c 1 e 1 e

n i v

i v n

2 e

Country A

Map showing just top two levels of hierarchy


89

Digital Design Copyright 2006 Frank Vahid

Hierarchy and Abstraction


Abstraction
Hierarchy often involves not just grouping items into a new item, but also associating higher-level behavior with the new item, known as abstraction
e.g., an 8-bit adder has an understandable high-level behavior it adds two 8-bit binary numbers

a7.. a0

b7.. b0 ci

8-bit adder co
P r o n i v i v n c

Frees designer from having to remember, or even from having to understand, the lower-level details
P r o P P i v n r r P o o r c n i v n i v o 1 e n i v c c 2 e 3 e c 1 e

s7.. s0
P r o c

2 e

3 e

Digital Design Copyright 2006 Frank Vahid

90

Hierarchy and Composing Larger Components from Smaller Versions


A common task is to compose smaller components into a larger one
Gates: Suppose you have plenty of 3-input AND gates, but need a 9-input AND gate
Can simple compose the 9-input gate from several 3-input gates

i0 i1 i2 i3

4 1 i0
i1 i2 i3 2 1 s1 s0 i0 d i1 s0 d d
a

Muxes: Suppose you have 4x1 and 2x1 muxes, but need an 8x1 mux
s2 selects either top or bottom 4x1 i4 s1s0 select particular 4x1 input Implements 8x1 mux 8 data inputs, 3 selects, one output i5
P

4 1 i0
i1 i2 i3
i v n c 3 e P r o

i6
P r o n i v

i7
c 2 e

i v n

n i v

n i v

1 e

n i v

2 e

3 e

1 e

s1 s1

s0 s0 s2

Digital Design Copyright 2006 Frank Vahid

91

Hierarchy and Composing Larger Components from Smaller Versions


Composing memory very common Making memory words wider
Easy just place memories side-by-side until desired width obtained Share address/control lines, concatenate data lines Example: Compose 1024x8 ROMs into 1024x32 ROM
10
r d a

n e

addr 1024x8 ROM en data 8


P

addr 1024x8 ROM en data


P

addr 1024x8 ROM en data


P r o

addr 1024x8 ROM en data


P r o

8
P r o n i v

8
r o i v n c 3 e

n i v

data(31..0)

n i v

2 e

3 e

1 e

10
r d a

1024x32 ROM data 32

n e

Digital Design Copyright 2006 Frank Vahid

92

Hierarchy and Composing Larger Components from Smaller Versions


11

Creating memory with more words


r d a

a9..a0 a10 1x2 d0 i0 dcd e d1

addr 1024x8 ROM en data 8

Put memories on top of one another until the number of desired words is achieved Use decoder to select among the memories
Can use highest order address input(s) as decoder input Although actually, any address line could be used
n e

addr 11
r d a

Example: Compose 1024x8 memories into 2048x8 memory


a10a9a8 a0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0
P r P o r n i v o

1024x8 ROM 2048x8 ROM data en data 8


P

n e

a10 just chooses which memory to access


Digital Design Copyright 2006 Frank Vahid

0 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1
n i v c 2 e c 1 e

addr 1024x8 ROM en data


P r o n i v c 3 e

8
i v n

n i v

2 e

3 e

1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1

addr 1024x8 ROM en data

To create memory with more words and wider words, can first compose to enough words, then widen.
93

Chapter Summary
Modern digital design involves creating processor-level components Four-step RTL method can be used
1. High-level state machine 2. Create datapath 3. Connect datapath to controller 4. Derive controller FSM

Several example
Control dominated, data dominated, and mix

Determining fastest clock frequency


By finding critical path

Behavioral-level design C to gates


By using method to convert C (subset) to high-level state machine

Additional RTL components


Memory: RAM, ROM Queues

Hierarchy: A key concept used throughout Chapters 2-5


Digital Design Copyright 2006 Frank Vahid

94

Das könnte Ihnen auch gefallen