
Modeling with SystemC™

Workshop Version 1.4


An investment in continuing success!

© 2000 Author: Martin Wang of SLD. 3-day SystemC™ training based on SystemC Ver. 1.0

Workshop Prerequisites

 Familiarity with UNIX workstations running X-windows OR Windows NT
 Familiarity with vi, emacs, or other UNIX/NT text editors
 Familiarity with the C programming language

Workshop Target Audience
 System Architects
 HW Engineers responsible for board, ASIC and
FPGA design
 Test/Verification Engineers
 Programmers, software developers

[Figure: SystemC™ surrounded by System Architect, Software Developer, RTL Designer and Lead Architect]


Workshop Goal

Acquire the basic skills to write, simulate and debug general hardware models using SystemC™ Ver 0.91

Workshop Measurable Objectives
By the end of this workshop, you should be able to:

 Demonstrate an applied, working knowledge of the SystemC™ C++ class libraries.
 Write general hardware models with SystemC™.
 Simulate and debug hardware models using SystemC™.
 Write test-benches and verify SystemC™ models.

This is not just a language class, this is a design class!


Agenda: Day One


DAY
1
Unit Topic Lab

1 Why C-based Design Flow

2 SystemC™ - Introduction

3 C++ for the C Programmer

4 Datatypes for Diff Abstraction Level

5 Communication with Signals

Agenda: Day One
DAY
1
Unit Topic Lab

6 Ports and Signals

7 Asynchronous Function Process

8 Asynchronous Thread Process


Agenda: Day Two


DAY
2
Unit Topic Lab

9 Special Case: Synchronous Process

10 Process Execution Order

11 Top_Level and Testbench

12 Channels for Abstract Protocol

13 Hierarchy for Modular Design

Agenda: Day Three
DAY
3
Unit Topic Lab

14 Global and Local Watching

15 Modeling BUS with Resolved Vector

16 Refinement

17 Functional I/F

18 Hardware/Software Co-verification


Agenda: Day Three


DAY
3
Unit Topic Lab

19 System-on-a-Chip

20 Workshop Summary

21 Appendix and Labs Solutions

22 Quick Reference Guide

Agenda: Structured Labs & Exercises
DAY
Lab Topic Lab

1 Getting Started

2 Simple Arithmetic Pipeline Design

3 Simple Memory Controller

4 4x4 Multicast Helix Packet Switching

5 Multiple-Cycle RAM

6 Simple RISC CPU



Agenda: Labs
DAY
Lab Topic Lab

7 Image Smoother

8 Simple Arithmetic Pipeline Design II

9 Master-Slave Bus System

10 Simple FIR Filter IP

11 RSA Public-Key Crypto-system

Labs & exercises will challenge your understanding of the concepts taught
Additional experimentation labs are provided for those who might finish the required labs early
Icons Used in this Workshop
Icons (not reproduced here) mark the following kinds of content:
Lab Exercise, Caution, Question, Note, Hint/Tip/Suggestion, Remember, Simulation Specific, Implementation Specific

Lab Solutions at the end of the book


Abbreviations and Acronyms

Acronym   Meaning

FSM       Finite State Machine
GTECH     Generic Technology
HDL       Hardware Description Language
RTL       Register Transfer Level
SOLD      Synopsys On-Line Documentation

Unit 1: Why C-based Design Flow

Today’s Market

Development times are getting shorter
First-silicon tape-out needs to be functional!

[Figure: sales volume versus years to reach 1 million units sold (axis: 5, 10, 15 years) for Black & White TV, Color TV, Cable TV, VCR, PC, Cellular, PCS, DVD and 3G Cellular]

Designers Challenges
Design complexity is still increasing

[Figure: design complexity and design productivity, 1975 to 2000. Labels: polygons, transistors, gates, ASICs, System on a Chip; IC CAD, pattern generation, SPICE, logic simulation, place & route, logic synthesis, deep submicron, design reuse, cycle-based simulation, behavioral synthesis, high-level design]



The Design Process Today


System Level Design (C/C++ environment)
  Hardware and software algorithm development
  Processor selection
  Done mainly in C/C++

Refinement / manual translation into:

IC Development (EDA environment)
  Hardware implementation decisions
  Done mainly in Verilog/VHDL

Software (C/C++ environment)
  Code development
  RTOS details
  Done mainly in C/C++

The Verification Process
  Emulation / prototyping ($$)

Productivity Gaps

IC Development Process
The current design flow
Same flow for RTL, schematic, polygons, etc
Design done then captured in an EDA environment
Very little, if any, “what-if” exploration
Designers can end up focussing more on “data management”
than “design”
Library-based System Design Methodology


HDL Based Flow


C/C++ side:
1. Conceptualize
2. Simulate in C++
3. Write specification document

4. Hand over specification document

HDL side:
5. Understand
6. (Re)Implement in HDL
7. (Re)Verify
8. Synthesize from HDL

Problems:
Written specifications are incomplete and inconsistent
Translation to HDL is time consuming and error prone

C++ Based Flow
C/C++
C/C++

1. Conceptualize
2. Simulate in C++
3. Write specification
document
4. Hand over
• Executable specification
• Testbench
• Written specification
5. Understand
6. Refine in C++
7. Verify reusing testbenches
8. Synthesize from C++

Turning Algorithms into the Right Architectures for ASICs quicker and better

Why C/C++ Based Design?

Specification between architect and implementer is executable

High simulation speed at the higher level of abstraction

Refinement, no translation into HDL (no "semantic gap")

Testbench re-use

[Figure: a C/C++ design shared by the System Architect, SoC Marketing & Sales, the Software Designer and the Hardware Designer (HDL)]
Existing Approaches are limited ...

Languages such as C/C++ and Java were not created to model HW

Lack of uniform modeling style

Translation tools (C/C++ to HDL)

No clear design methodology

Sub-optimal QoR

Why another translation tool?


Can C++ be used as is?

 C++ does not support


 Hardware style communication
 Signals, protocols, etc.
 Notion of time
 Time sequenced operations.
 Concurrency
 Hardware and systems are inherently concurrent, i.e. they
operate in parallel.
 Reactivity
 Hardware is inherently reactive, it responds to stimuli and is in
constant interaction with its environment, which requires
handling of exceptions.
 Hardware data types
 Bit type, bit-vector type, multi-valued logic type, signed and
unsigned integer types and fixed-point types.

Summary

 Why C-based Design Flow


 covered HDL based flow
 covered C++ Based flow
 covered benefits of a C++ based flow
 covered “Can C++ be used as is?”


Unit 2: SystemC™ - Introduction
C++ powered by SystemC™

What is SystemC?
SystemC is a C++ class library
  Include any C++ programs, libraries, encapsulation
  ... a methodology for modeling SoC designs consisting of DSPs, ASICs, IP-Cores, Interfaces, ...
SystemC also enables
  Modeling at high level of abstraction, e.g. communication protocols
  Refinement to hardware
  Software modeling - interrupts, exception handling
  System wide verification
  Hardware/Software co-verification
  IP exchange
SystemC provides all the advantages of Verilog and VHDL
  Concurrent processes, e.g. methods, threads, clocked threads
  Concept of a clock
  Wide variety of bit-true data types
SystemC IS NOT
  Another C++ dialect -> it is C++
  Just for hardware modeling only -> you can model hardware AND software in C++ with SystemC

[Figure: refinement steps from C++ programs. Purely algorithm/software in C++; after HW/SW partitioning, a high level abstract hardware model in SystemC plus a C++ software model; then a behavioral level hardware model in SystemC plus a C++ software model; then a register transfer level hardware model in SystemC plus a C++ software model]

Short History of SystemC™

[Timeline, 1997 to 2000: Scenery, 1997 DAC paper, HDL constructs, fixed-point datatypes, V0.9 launch on 9/27/1999, V1.0 release in 3/2000]

• Source Code
• User Guide
• Reference Manual
• Discussion Forum
• All available from http://www.SystemC.org
Getting Started with SystemC

Compiler
gcc
native compiler Visual C++, SUN cc
Debugger
gdb, ddd
lint, profiler, memory access checking
quantify, purify
www.gnu.org


First example

Compiler version
which gcc, gcc -v
which architecture is used
autoconf helps installation
./configure; gmake
Makefiles control what is included
compile directives
libraries
include files

Books/Training

Nothing replaces hands-on experience!


Books
Kernighan/Ritchie; The C Programming Language
Stroustrup; The C++ Programming Language
Lippman; C++ Primer
man pages
web sources


Compile SystemC and Run the executable!


[Figure: your standard C/C++ development environment (compiler, linker, debugger, ...) combines the SystemC header files and libraries (class library and simulation kernel) with the source files for the system (DSP, interface, IP-core, ASIC, ...) and testbenches. Running "make" produces a.out, the "executable specification".]

executable = simulator

A SystemC™ System
 A system consists of a set of concurrent processes
 A process describes functionality
 Processes communicate with each other through signals and channels
 Processes can be combined into modules to create hierarchy

[Figure: a system containing Process 1 to Process 5, connected by signals and a channel; some of the processes are grouped into Module 1]


Simple Example - 1

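[Figure: the Stimulus Generator (outputs a1, a2) drives the Adder (inputs in1, in2; output out), whose result feeds the Response Monitor (input re); all three are clocked by clk]

Source files: stimgen.h, stimgen.cpp, adder.h, adder.cpp, monitor.h, monitor.cpp, main.cpp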

Simple Example - 2
// top level routine (main.cpp)
#include "systemc.h"
#include "adder.h"
#include "stimgen.h"
#include "monitor.h"

int sc_main(int ac, char *av[ ]){
    // Create signals to tie modules
    sc_signal<int> s1;
    sc_signal<int> s2;
    sc_signal<int> s3;

    // Create clock
    sc_clock clock("CLOCK", 10, 0.5);

    // Instantiate modules
    adder Add("MyAdd");
    Add << clock << s1 << s2 << s3;
    stimgen ST("STIM");
    ST(clock, s1, s2);
    monitor M("MON");
    M.clockint(clock);
    M.s3int(s3);

    // Simulate
    sc_start(200);
    return(0);   // sc_main has return type int; returning 0 means no error
}

Simple Example - 3
// header file adder.h
struct adder : public sc_module {
    // Input ports
    sc_in_clk CLK;
    sc_in<int> in1;
    sc_in<int> in2;
    // Output ports
    sc_out<int> out;

    // Constructor
    adder (const char * NAME)
        : sc_module(NAME) {
        sc_sync_tprocess(handle1, "ADDER",
                         adder, entry, CLK.pos())
        end_module();
    }

    // Functionality of the process
    void entry();
};

Simple Example - 4
// Implementation file adder.cc
#include "systemc.h"
#include "adder.h"

void
adder :: entry( )
{
    while (true) {
        out = in1 + in2;
        wait( );
        out = in1 + in2 + 2;
        wait( );
    }
}
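The alternating behavior of the adder process can be tried out without the SystemC library. The following is a minimal plain C++ sketch, not the workshop's adder.cc: the struct `AdderModel` and its method `on_clock_edge` are hypothetical names, and each call stands in for one resumption of `entry()` after `wait()`. Signal update timing, which the real simulation kernel handles, is deliberately ignored.

```cpp
#include <cassert>

// Plain C++ stand-in for the adder's clocked thread (assumption: one call
// per active clock edge; no SystemC kernel or signals are involved).
struct AdderModel {
    bool second_phase = false;  // false: out = in1 + in2, true: out = in1 + in2 + 2
    int out = 0;

    // Hypothetical helper: performs the statement that entry() would run
    // between two consecutive wait() calls.
    int on_clock_edge(int in1, int in2) {
        out = second_phase ? in1 + in2 + 2 : in1 + in2;
        second_phase = !second_phase;  // the next edge runs the other statement
        return out;
    }
};
```

Calling `on_clock_edge(1, 2)` twice on a fresh `AdderModel` yields the plain sum first and the sum plus 2 second, which mirrors the two halves of the `while (true)` loop.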


Another Simple Multiply and Add Example


[Figure: the Stimulus Generator (outputs a1, a2) drives the MAC inputs in1 and in2 through signals s1 and s2; the MAC outputs out and Rdy drive the Response Monitor inputs re and OK through signals s3 and s4; all blocks are clocked by CLK]

// top level routine main.cc
#include "mac.h"
#include "stimgen.h"
#include "monitor.h"

int sc_main(int ac, char *av[ ]){
    // Create signals
    sc_signal<int> s1;
    sc_signal<int> s2;
    sc_signal<int> s3;
    sc_signal<bool> s4;

    // Create clock
    sc_clock clock("clock", 10, 0.5);

    // Instantiate Processes
    mac MAC("MAC_BLOCK");
    MAC << clock << s1 << s2 << s3 << s4;
    // other processes e.g. stimulus generator, monitor
    // Simulate
    sc_start(200);
    return(0);
}

// header file mac.h
#include "systemc.h"
struct mac : public sc_module {
    // Input ports
    sc_in_clk CLK;
    sc_in<int> in1;
    sc_in<int> in2;
    // Output ports
    sc_out<int> out;
    sc_out<bool> rdy;

    // Constructor
    mac (const char * NAME)
        : sc_module (NAME) {
        sc_sync_tprocess(handle1, "MAC", mac,
                         entry, CLK.pos())
        { /* watching for reset */ }
    }
    // Functionality of the process
    void entry();
};

// Implementation file mac.cc
#include "mac.h"
void mac :: entry( ){
    int int_var; // internal variables
    while (true) {
        rdy = false;
        int_var = in1 * in2;
        wait( );
        out = out + int_var;
        rdy = true;
        wait( );
    }
}
So, How to create a SystemC Model

Determine the abstraction level (SystemC system level, algorithm level description, behavioral level description, or RTL description)

Determine signals & channels between processes

Determine protocol for signals between processes

Create the process functionality


Summary

 SystemC™ - Introduction
 covered a short history of SystemC™
 covered how to use SystemC™
 covered a basic SystemC™ example
 covered how to create a SystemC™ model

Lab 1: Getting Started - 1
Primary Objective:
Understanding a SystemC™ System
This lab introduces the process of simulating with SystemC™.
You will simulate a simple counter that counts up to seven, then
wraps around to zero. The counter can be reset at any point.

Counter.h contains the process declaration
Counter.cpp contains the process functionality
Test_Counter.h contains the stimulus and control process declaration
Test_Counter.cpp contains the stimulus and control functionality
main.cpp contains the main entry point and instantiates the two processes
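Before opening the lab files, the counter's intended behavior can be pinned down in a few lines of plain C++. This is only a sketch under assumptions: `CounterModel` and `on_clock_edge` are hypothetical names, reset is assumed to take priority over counting, and the code is not the lab's Counter.cpp (it does not use the SystemC library at all).

```cpp
#include <cassert>

// Behavioral stand-in for the lab counter: counts 0..7, wraps to 0,
// and can be reset at any point (assumption: reset wins over counting).
struct CounterModel {
    unsigned count = 0;

    // One call models one active clock edge.
    unsigned on_clock_edge(bool reset) {
        if (reset) {
            count = 0;                 // reset can happen at any point
        } else {
            count = (count + 1) % 8;   // after seven, wrap around to zero
        }
        return count;
    }
};
```

The sequence printed by lab1.x can then be compared against what this model returns for the same reset pattern.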


Lab 1: Getting Started - 2


1. Compile and link the files with the following command:

gmake ultraclean; gmake

2. Run lab1.x. You should see the counter counting and being reset.

OR compile each file with the following commands:

gcc -c -g -I. -I../include Counter.cpp
gcc -c -g -I. -I../include test_Counter.cpp
gcc -c -g -I. -I../include main.cpp
Where:
-c            Compile only - do not attempt to link
-g            Include debugging information
-I.           Look for include files in the current dir
-I../include  Look for include files in the ../include dir
Note: no space between the option (-I) and the argument.

Link the object files together to create the executable:

gcc -o lab1.x *.o -lsystemc-g -L../lib -lstdc++

Where:
-o lab1.x     The resulting executable file will be called lab1.x
*.o           Link together all the object files
-lsystemc-g   Link with the library file libsystemc-g.a
-L../lib      Look for the previous library in the ../lib dir
-lstdc++      Link with the standard C++ library

Lab 1: Sample Output


Lab 1: Debugging with SystemC™ - 1

Debugging: you determine what is wrong in your simulation. Debugging consists of:
 Controlling the execution of your simulation
 Examining values of key data during the course of simulation

Use a debugger tool for source-level debugging:
 Unix: gdb, xgdb, dbx
 NT: Visual C++ development environment

Use "print" I/O statements in your code
 SystemC™ provides useful data

Trace waveforms for post-simulation viewing

Tip:
To enable source-level debugging, compile with the -g option in gcc

Lab 1: Debugging with SystemC™ - 2

Controlling the execution of the simulation:

SystemC™ is different from normal, sequential programming
 Single-stepping from the beginning is not productive
 You don't know the actual order in which your processes will execute

Tips:
 Set breakpoints in processes (first executable statement inside the while loop)
 For asynchronous blocks, set them at the first executable statement
 Single-step within processes
 Step over calls to SystemC™ built-in functions

proc1::entry()
{
    // Declarations, initialization
    while (true)
    {
        val = some_func(in_sig);   // <- set breakpoint here
        out_sig.write(val);
        wait();                    // <- step OVER
    }
}


Lab 1: Debugging with SystemC™ - 3

Examining data values:

While single-stepping, use the print command in the debugger
 gdb is C++ savvy

Use C++ I/O to print out data values
 SystemC™ data types know how to display themselves using C++ I/O
 If you must use C-style I/O, SystemC™ data types have a to_string() method

proc1::entry()
{
    while (true)
    {
        val = some_func(in_sig);
        printf("proc1(%s): in_sig = %s\n",
               sc_time_stamp(), in_sig.to_string());
        cout << "proc1 (" << sc_clock::time_stamp() <<
                "): val =" << val << endl;
        out_sig.write(val);
        wait();
    }
}

Lab 1: Overview of ddd - Free Graphical C++ Debugger
http://www.gnu.org/software/ddd/


Lab 1: Debugging with ddd - 1


Invoking the debugger:
ddd
File -> Open Program
Open lab1.x (lab1.x is the executable)

Lab 1: Debugging with ddd - 2

Click here to Run!

NOTE: If you see this sc_xxx.cpp then your library is compiled correctly for debug mode.


Lab 1: Debugging with ddd - 3

Outputs window

Lab 1: Debugging with ddd - 4


Lab 1: Debugging with ddd - 5

Lab 1: Debugging with ddd - 6


Lab 1: Debugging with ddd - 7

Put your mouse here => get current value of "count"
Set your breakpoint here!
Step your simulation

Case Study: Digital Video Camera Design Project
Subsequent labs will be based on the following design specification. By
modeling different modules using different abstraction levels we will learn
the capabilities of SystemC™.

Challenge
 Need to prove the concept -> ask for design funding FAST from management.
 Complete design as fast as possible.
 IP reuse, minimum changes to IP.
 Need to quickly explore different architectures before actual implementation.
 Easy and quick validation/debug.
 Need a golden C++ model.
 Need a system model for software development before the hardware design is complete.
 Quick prototyping.

Need RAPID product development to enter explosive, competitive consumer markets


Case Study: DSP Systems Development Process

DSP System Design Objectives

• Algorithm Design:
  Stable, accurate algorithms that perform as required
• Performance Optimization:
  Optimized fixed-point algorithms that meet performance specifications under system operating conditions
• HW or SW Implementation:
  Optimized VLSI architecture that executes algorithms efficiently and rapidly
• Integration/Verification:
  Functional correctness of the implementation of the SYSTEM

[Figure: development flow. Algorithm Design (models) -> Performance Optimization (models) -> Hardware Implementation, Verification (Unit Test) and Software Implementation -> System Integration and Verification (System Test)]

Case Study: Digital Video Camera System View
[Figure: system view. A CPU core (Reg, Decode, Fetch, FPU, ALU, SIMD, BUS) and a DSP ASIC (DSP core from Company X, FFT, BUS controller) share a system bus with DMA, memory and a network interface. Application software and device drivers written in C++ (while (…) { ... } cout << …) pass through a compiler to assembly code (Add R1, R2, R3; mul R2, R4, R5; bnz loop1) and machine code. Legend: SystemC™ models, C++ models.]


Case Study: SystemC™ View

[Figure: hierarchical design. The system contains a CPU core and a DSP ASIC (DSP core from Company X, FFT, BUS controller) on a system bus with DMA, memory and a network interface. Legend: SystemC™ models, C++ models. The IP vendor provides a .obj file (compiled with the same compiler) and an interface.h file.]

//Example: sc_main.cpp (System Level)
#include "systemc.h" // REQUIRED
#include "cpu_core.h"
#include "dsp_asic.h"
#include "testbench.h"

int sc_main(int ac, char* av[]){ // MAIN
    // internal signals
    ...
    sc_clock clk("SYSCLK", 20.0, 0.5, 0.0);
    testbench T("TESTBENCH", ...
    cpu_core CPU("CPU_CORE", ...
    dsp_asic DSP("DSP_ASIC", ...

    sc_start(1000);
}

//Example: dsp_asic.h (Module Level)
#include "ip_interface.h"
#include "fft.h"
#include "bus_controller.h"
…
struct dsp_asic : sc_module {
    //internal signals declaration.
    ...
    // constructor
    dsp_asic (const char* NAME,…
    ...
}

Further files shown on the slide, with their bodies elided:
//Example: cpu_core.h (Cluster Level)
//Example: cpu_core.cpp (Cluster Level): while (true) { …. }
//Example: dsp_asic.cpp (Cluster Level): while (true) { …. }
//Example: fft.h (Unit Level): #include "debug.h" ……
//Example: fft.cpp (Unit Level): #include "fft.h", while (true) { …. }
//Example: bus_controller.h (Unit Level): #include "debug.h" ….
//Example: bus_controller.cpp (Unit Level): #include "bus_interface.h" ……
Unit 3: C++ for the C Programmer

Identifiers & Comments


Identifiers are the names assigned by the user to C++ or SystemC™ objects such as variables.
 Identifiers must begin with a letter or an underscore (a-z A-Z _ ).
 Identifiers may be composed of letters, digits and the underscore (a-z A-Z 0-9 _ ).
 C++ is case sensitive.
 There is no limit on length, and all characters are significant.

Comments
Single line comments:
• Begin with "//" and end at the end of the line
• May begin anywhere on the line.
Multiple line comments:
• Begin with "/*" and end with "*/"
• May begin and end anywhere on the line
• Everything in between is commented out

Strings

• Strings are enclosed in double quotes and are specified on one line.
• SystemC™ recognizes normal C/C++ escape characters (\t, \n, \\, \", %%).

"Synopsys's tools are the key to your success!"
"This format is spaced with a tab \t followed with this"
"\n This puts a newline before this string"
"Address = %x at time %d"


If and if-else
syntax:
if (expression) statement;
If the expression evaluates to true, then execute the statement (or
statement group).

if (expression) statement1;
else statement2;
If the expression evaluates to true, then execute statement1.
If false, then execute statement2 (or the corresponding statement groups).

if (Sel_A == 1) {
    Data_Out = Data_A;
}
else if (Sel_B == 1) {
    Data_Out = Data_B;
}
else if (Sel_C == 1) {
    Data_Out = Data_C;
}
else {
    Data_Out = Data_D;
}

[Figure: a chain of three 2:1 multiplexers. Sel_C chooses between Data_D (0) and Data_C (1); Sel_B then chooses Data_B when 1; Sel_A finally chooses Data_A when 1 and drives Data_Out]
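The priority implied by an if/else-if chain can be checked with ordinary C++. The sketch below wraps the slide's chain in a function; the name `select_priority` and the use of plain `bool` and `int` arguments are assumptions made for illustration.

```cpp
#include <cassert>

// Priority selection: Sel_A beats Sel_B, which beats Sel_C.
// Data_D is the fallback when no select line is active.
int select_priority(bool sel_a, bool sel_b, bool sel_c,
                    int data_a, int data_b, int data_c, int data_d) {
    if (sel_a) {
        return data_a;
    } else if (sel_b) {
        return data_b;   // only reached when sel_a is false
    } else if (sel_c) {
        return data_c;   // only reached when sel_a and sel_b are false
    } else {
        return data_d;
    }
}
```

When several select inputs are active at once, the earliest test in the chain wins, which is exactly the priority-encoded behavior the multiplexer chain describes.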

Case statement
syntax:
switch (expression) {
    case case_item_1: statement or statement_group
                      break;
    case case_item_2: break;        // REMEMBER TO PUT BREAK!
    case case_item_3: statement or statement_group
                      break;
    case case_item_n: statement or statement_group
                      break;
    default:          statement or statement_group
                      break;
}

[Figure: 4:1 multiplexer. Sel(1 downto 0) selects A (00), B (01), C (10) or D (11) onto D_Out]

• Does an identity comparison
• Compares expression with each case_item_(n) in turn.
• If none match, the default code is executed.
• Default clause is ideal to catch unknown/unspecified values
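As a concrete counterpart to the syntax template, the 4:1 multiplexer can be written as a small C++ function. This is a sketch: the function name `mux4` is hypothetical, and throwing `std::out_of_range` from the default clause is one possible way to flag an unspecified select value, chosen here only for illustration.

```cpp
#include <cassert>
#include <stdexcept>

// 4:1 multiplexer: sel 0..3 picks a, b, c or d.
int mux4(unsigned sel, int a, int b, int c, int d) {
    switch (sel) {
        case 0: return a;   // 00
        case 1: return b;   // 01
        case 2: return c;   // 10
        case 3: return d;   // 11
        default:
            // Catches select values outside the specified range.
            throw std::out_of_range("mux4: select must be 0..3");
    }
}
```

Each case returns directly here, so no `break` is needed; if the cases assigned to a variable instead, every case would need its own `break` to avoid falling through into the next one.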


When to use : case or if-then-else

Some general rules to remember:


• Use if-else where you MUST have priority encoded
logic
• Use case for non-priority encoded logic
• case items are mutually exclusive
• Always specify a default clause in case statements
• There is an impact on simulation speed

SystemC™ Modeling Constructs

[Figure: SystemC modeling constructs: modules, inputs, outputs, clocks, I/O, variables, operations, if-else, case, for loop, while loop, loop exits/continues, wait statements]


Quick(!) Introduction to C++ Class


structs
 Are used in C to group data elements together as an integral whole.
 C++ extends this notion to include functions as members, alongside the data.
 Encapsulates behavior as well as characteristics

Example:
struct RandNum
{
    unsigned long seed;
    unsigned short getRand();   // a function can be a member
};

It is a bad idea to allow users of an object to directly use and manipulate internal data.
It is better to provide a public interface.

C++ Classes and Data Abstraction
C++ provides modifiers to control access to data:
 public
 private
 protected
 In C++, the default for a class is private; the default for a struct is public.

Example:
class Time{
private:                          // member access specifier
    int hour;
    int minute;
    int second;
public:                           // member access specifier
    Time();
    void setTime(int, int, int);
    void printMilitary();
    void printStandard();
};

Alert: do not initialize a data member explicitly in the class definition


Information Hiding Example

Example:

Time rv;
rv.hour = 12;              // Error!! Can't access private member
rv.setTime(11, 12, 13);    // This is the right way
rv.printStandard();        // Member functions are accessed just like data members

Tips: Making the data members of a class private and the member functions of the class
public facilitates debugging, because data manipulation is localized to either the
class's member functions or friends of the class.
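A compilable version of the idea may help. The sketch below follows the Time class from the slides but fills in the elided bodies with assumptions: out-of-range fields fall back to 0, and a hypothetical `military()` method returns the "HH:MM" text instead of printing it, so that the result can be checked.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

class Time {
public:
    Time() : hour(0), minute(0), second(0) {}

    // The only way to change the time: every write goes through this check.
    // (Assumption for this sketch: invalid fields are replaced by 0.)
    void setTime(int h, int m, int s) {
        hour   = (h >= 0 && h < 24) ? h : 0;
        minute = (m >= 0 && m < 60) ? m : 0;
        second = (s >= 0 && s < 60) ? s : 0;
    }

    // Hypothetical helper: returns "HH:MM" rather than printing it.
    std::string military() const {
        char buf[32];
        std::snprintf(buf, sizeof buf, "%02d:%02d", hour, minute);
        return std::string(buf);
    }

private:
    int hour, minute, second;   // not reachable from outside the class
};
```

Because `hour` is private, a statement such as `rv.hour = 12;` is rejected by the compiler, and a wrong value can only have come from `setTime`, which narrows down where to look when debugging.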

Data Initialization
 C++: A special member function (constructor) is automatically called by the compiler.
 A constructor has the same name as the class or struct, with no return type.
 You can have more than one constructor, so you can initialize your data in different ways.
Example:
class Time{
private:
    int hour, minute, second;
public:
    Time();
    void setTime(int, int, int);
    void printMilitary();
    void printStandard();
};
// the constructor initializes each data member
// Don't forget the class name and :: !
Time::Time() { hour = minute = second = 0;}
void Time::setTime (int h, int m, int s) {
    hour = ..
}
void Time::printMilitary() {
    cout << (hour < 10 ? …
}

C: initialization routines must be called explicitly; in C++ the constructor does this automatically.


C++ Constructor Overview

 A constructor is a class member function that is invoked automatically each time an object of the class is instantiated

 A constructor can be overloaded to provide a variety of ways of initializing objects of the class

 Data members must either:
 be initialized in a constructor of the class, or
 have their values set after the object is created

Tips: NO return value for a constructor!
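Constructor overloading is easiest to see on a small class. The following sketch uses a hypothetical `ClockConfig` class (it is not part of SystemC) with three constructors; the default period of 1.0 and default duty cycle of 0.5 are arbitrary values chosen for the illustration.

```cpp
#include <cassert>

// Hypothetical class for illustration only; not a SystemC type.
class ClockConfig {
public:
    // Three overloaded constructors: each one leaves every data member initialized.
    ClockConfig() : period_(1.0), duty_(0.5) {}
    explicit ClockConfig(double period) : period_(period), duty_(0.5) {}
    ClockConfig(double period, double duty) : period_(period), duty_(duty) {}

    double period() const { return period_; }
    double high_time() const { return period_ * duty_; }  // time spent high per cycle

private:
    double period_;
    double duty_;
};
```

The compiler picks the constructor from the argument list: `ClockConfig()`, `ClockConfig(10.0)` and `ClockConfig(20.0, 0.25)` each call a different one, and none of them declares a return type.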


Operator & Function Overloading
 C: Multiple functions with unique names
 C++: Functions with the same name, different arguments
 C++: Define how operators interpret data, as if built into the language.
 C++ is strongly typed
 Need more flexible, generic objects, e.g. instances that differ only in type

// Example: overloading for complex number
class Complex {
public:
    Complex(double = 0.0, double = 0.0);                  // Constructor !
    Complex operator+(const Complex &) const;             // Overloading + operator
    const Complex &operator=( const Complex &);
    void print() const;
private:
    double real, imaginary;
};

Complex::Complex(double r, double i)
    : real(r), imaginary(i) {}

Complex Complex::operator+(const Complex &operand2) const {
    return Complex(real+operand2.real, imaginary+operand2.imaginary);
}
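The Complex example can be made self-contained so that the overloaded `+` can actually be exercised. This sketch adds two accessor functions, `real()` and `imaginary()`, which are not on the slide, and renames the data members with a trailing underscore to avoid a clash with them.

```cpp
#include <cassert>

class Complex {
public:
    Complex(double r = 0.0, double i = 0.0) : real_(r), imaginary_(i) {}

    // Overloaded + : adds the real parts and the imaginary parts separately.
    Complex operator+(const Complex &operand2) const {
        return Complex(real_ + operand2.real_, imaginary_ + operand2.imaginary_);
    }

    // Accessors added for this sketch so results can be inspected.
    double real() const { return real_; }
    double imaginary() const { return imaginary_; }

private:
    double real_, imaginary_;
};
```

With this in place, `Complex(1.0, 2.0) + Complex(3.0, 4.5)` reads like built-in arithmetic, and because the constructor has default arguments, `c + 1.0` also works: the double is converted to a Complex with a zero imaginary part.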


Inheritance

 Reuse existing classes by defining a new sub-class that inherits from a super (or parent) class
 You change only what is unique about the new class

 Benefit: Easily leverage previously debugged and optimized work
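A short example shows how little a sub-class has to restate. The class names `Counter` and `SaturatingCounter` below are hypothetical and chosen only for illustration; the sketch assumes a limit greater than zero.

```cpp
#include <cassert>

// Parent class: a counter that wraps around at `limit` (limit must be > 0).
class Counter {
public:
    explicit Counter(unsigned limit) : limit_(limit), value_(0) {}
    virtual ~Counter() {}
    virtual void step() { value_ = (value_ + 1) % limit_; }
    unsigned value() const { return value_; }

protected:
    unsigned limit_;
    unsigned value_;
};

// Sub-class: inherits storage and value(); changes only how step() behaves.
class SaturatingCounter : public Counter {
public:
    explicit SaturatingCounter(unsigned limit) : Counter(limit) {}
    void step() override {
        if (value_ + 1 < limit_) ++value_;   // stop at limit - 1 instead of wrapping
    }
};
```

Everything that already worked in `Counter` (construction, storage, `value()`) is reused untouched; only the one function that differs is rewritten.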

References vs. Pointers

 C: Arguments to functions are passed by value, or by reference using pointers
 C++ references simplify passing data by reference
 Eliminate pointer de-referencing
 Use const references for passing large objects to simulate call-by-value.
 Benefit of reducing large data copying overhead
 Called function will not corrupt caller's data.
 Benefits: Better code clarity, readability, eliminates common errors

Example:
void incr( int &arg1, int *arg2 )
{
    arg1++;        // reference: no de-referencing needed
    (*arg2)++;     // pointer: must be de-referenced explicitly
}

int i1, i2;
i1 = 0; i2 = 0;
incr(i1, &i2);     // passes i1 by reference and the address of i2 to incr

Tips: Call-by-reference improves performance because it
eliminates the overhead of copying large amounts of data.
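Both points, references versus pointers and const references for large objects, fit in one small sketch. The function `sum` and its use of `std::vector` are assumptions added for illustration; `incr` follows the slide's example.

```cpp
#include <cassert>
#include <vector>

// by_ref is an alias for the caller's variable; by_ptr holds its address.
void incr(int &by_ref, int *by_ptr) {
    by_ref++;        // no de-referencing needed
    (*by_ptr)++;     // parentheses matter: *by_ptr++ would advance the pointer instead
}

// const reference: no copy of the vector is made, and the function
// cannot modify the caller's data.
long sum(const std::vector<int> &values) {
    long total = 0;
    for (int v : values) total += v;
    return total;
}
```

A call such as `incr(i1, &i2)` changes both caller variables, while `sum(v)` reads a vector of any size without copying it.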


Scope Resolution Operator


C++ provides the unary scope resolution operator :: to access a global
variable when a local variable of the same name is in scope.
Use const for passing large objects

// purely C++ software code
#include <iostream>
#include <iomanip>
using namespace std;

const double PI = 3.14159265358979;

int main() {
    const float PI = static_cast<float>(::PI);
    cout << setprecision(20)
         << "Local value of PI= " << PI       // local float copy: 3.14159...
         << " Global value of PI= " << ::PI   // global double: 3.14159265358...
         << endl;
    return 0;
}

I/O in C++
 C++ simplifies I/O considerably
 Standard variables: cout, cin, cerr
 Operators: << for writing, >> for reading
 Formatting manipulators
 endl                Output end-of-line (\n)
 setw(int)           Set the width of the output
 hex                 Output numbers in hexadecimal format
 setprecision(int)   Sets floating-point output precision
 Streams
 file, network
 memory buffer

Example:
#include <iostream>
#include <iomanip>
using namespace std;

int main() {
    int i = 12;
    float r = 12.34;
    cout << "1234567890" << endl;
    cout << i << endl;
    cout << hex << setw(4) << i << endl;
    cout << r << endl;
    cout << fixed << setprecision(1) << r << endl;   // fixed: one digit after the point
    return 0;
}

Output:
1234567890
12
   c
12.34
12.3

Tips: C printf works well too!


Why Objects for hardware?

Object contains data and functions


Objects can get created and destroyed
Data members are visible local to the object
Splitting functionality is easy
Report internal states
add member function for debugging when designing

Why Objects for hardware?
struct cmult : sc_module {
    sc_in_clk clk;
    sc_in<bool> new_data;
    sc_in<sc_int<8> > data_in;
    sc_out<sc_int<8> > data_out;

    sc_int<8> a;

    void entry() {
        sc_int<8> b, c, d;   // b, c and d are never assigned on the slide
        while(true) {
            wait_until(new_data.delayed() == true);
            a = data_in.read();
#if DEBUG
            dump();
#endif
            wait();

            data_out.write(a*c-a*d);
        }
    }

    // member function added for debugging: reports internal state
    void dump() {
        cout << "Information : cmult new data item " << a << endl;
    }

    SC_CTOR(cmult) {
        SC_CTHREAD(entry, clk.pos());   // entry() calls wait(), so it is a clocked thread
    }
};


Summary

 C++ for the C Programmer
 covered the basics of the standard C++ programming language
 introduced Classes
 covered Constructor - way to initialize things
 covered Operator and Function Overloading
 covered References and Pointers

Unit 4: Datatypes for Diff Abstraction Level

Data Types
 C++ built in data types may be used but are not adequate to model hardware.
 long, int, short, char, unsigned long, unsigned int, unsigned short, unsigned char, float, double, long double, and bool.

 SystemC™ provides other types that are needed for system modeling.
 Boolean types: sc_bit
 Scalar types: sc_logic
 Integer types: sc_int<length>, sc_uint<length>, sc_bigint<length>, sc_biguint<length>
 Logic/Boolean vector types: sc_bv<length> (Ver 0.9: sc_bool_vector), sc_lv<length> (Ver 0.9: sc_logic_vector), sc_array

 Scalar and integer types introduced now, vector types covered later.

Table of SystemC Datatypes

List of all operations SystemC™ allows on types

Columns: Types | Bitwise | Arithmetic | Logical | Equality | Relational | Assignment | Auto-increment or decrement | Arithmetic if | Concatenation | Index

sc_bit               & ^ |     all   all   all   =, &=, |=, ^=   yes   yes
sc_logic             ~ & ^ |   all   =, &=, |=, ^=   yes
sc_int<length>       all   all   all   all   all   all   all   yes   yes
sc_uint<length>      all   all   all   all   all   all   all   yes   yes
sc_bigint<length>    all   all   all   all   all   all   all   yes   yes
sc_biguint<length>   all   all   all   all   all   all   all   yes   yes
sc_bv<length>        ~ & ^ |   all   =, &=, |=, ^=   yes   yes
sc_lv<length>        ~ & ^ |   all   =, &=, |=, ^=   yes   yes
sc_array             all   =   yes
sc_2d                all   =   yes


When to use which Data Types?

There is a tradeoff between simulation speed and code objective.

To summarize, follow these rules in order (fastest first, slowest last):

1) use native data types as much as possible
2) use sc_int<length>/sc_uint<length>
3) use sc_bv<length> for bitwise operations
4) use sc_logic/sc_lv<length> to model 'X' and 'Z'
5) use sc_bigint<length>/sc_biguint<length> for arbitrary length arithmetic operations

Hint: One very interesting option for simulation speed is using profiling tools
like quantify, which tells you the time spent in each function call and how
often functions are executed

sc_int<>, sc_uint<>
Unsigned integer

sc_int<length>, sc_uint<length>
 Used for performing arithmetic on signed and unsigned integers of
less than or equal to 64 bits (on a 32-bit machine)
 Not optimal for bit vector manipulation.
 A two's-complement representation is used for signed integers.
 An array of bits
 Can address an individual bit
 Interpreted as an integer for arithmetic operations
 Simulate faster than the arbitrary precision data types

NOTE: On most platforms the C++ int type is 32 bits wide (the C++ language
itself only guarantees at least 16 bits). SystemC provides both fixed
precision and arbitrary precision types when you need integers of a
different bit width than 32.


sc_bigint<>, sc_biguint<>
Signed and unsigned integers

sc_bigint<length>, sc_biguint<length>
 Used when more than 64 bits of precision are required
 Not optimal for bit vector manipulation
 A two's-complement representation is used for signed integers
 An array of bits
 Can address an individual bit
 Interpreted as an integer for arithmetic operations

Hint: Use for arithmetic operations wider than 64 bits

sc_int, sc_uint, sc_bigint, sc_biguint Syntax
Syntax:
sc_int<length> variable_name ;
sc_uint<length> variable_name ;
sc_bigint<length> variable_name ;
sc_biguint<length> variable_name ;
length:
 Specifies the number of elements in the array
 Must be greater than 0
 Must be compile time constant
 Use [] to bit select and range() to part select.
 Rightmost is LSB(0), Leftmost is MSB (n-1)

sc_int<5> a; // a is a 5-bit signed integer
sc_uint<44> b; // b is a 44-bit unsigned integer
sc_int<3> c; // c is a 3-bit signed integer

// Separate example (the names a, b and c are reused):
sc_bigint<5> a;
a = 13; // a gets 01101, a[4] = 0, a[3] = 1, …, a[0] = 1
bool b;
b = a[4]; // b gets 0
b = a[3]; // b gets 1
sc_bigint<3> c;
c = a.range(3, 1); // c gets 110 - interpreted as -2
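
The bit-select and part-select arithmetic in the example above can be checked in plain C++ without the SystemC library. This is only a sketch of the arithmetic: the helper names bit_select, part_select and to_signed are hypothetical and are not SystemC functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not part of SystemC): they mimic a[i] and
// a.range(hi, lo) on a value held in an ordinary 64-bit integer.

// Returns bit i of value (bit 0 is the LSB, the rightmost bit).
inline bool bit_select(std::uint64_t value, unsigned i) {
    assert(i < 64);
    return ((value >> i) & 1u) != 0;
}

// Returns bits hi..lo of value, right-aligned.
inline std::uint64_t part_select(std::uint64_t value, unsigned hi, unsigned lo) {
    assert(hi >= lo && hi < 64);
    unsigned width = hi - lo + 1;
    std::uint64_t mask = (width == 64) ? ~std::uint64_t{0}
                                       : ((std::uint64_t{1} << width) - 1);
    return (value >> lo) & mask;
}

// Interprets the low `width` bits of raw as a two's-complement number.
inline std::int64_t to_signed(std::uint64_t raw, unsigned width) {
    assert(width >= 1 && width <= 64);
    if (width == 64) return static_cast<std::int64_t>(raw);
    std::uint64_t mask = (std::uint64_t{1} << width) - 1;
    std::uint64_t sign_bit = std::uint64_t{1} << (width - 1);
    raw &= mask;
    // If the sign bit is set, subtract 2^width to get the negative value.
    if (raw & sign_bit)
        return static_cast<std::int64_t>(raw) - static_cast<std::int64_t>(mask) - 1;
    return static_cast<std::int64_t>(raw);
}
```

With the value 13 (01101), part_select(13, 3, 1) yields the bits 110, and to_signed of those three bits gives -2, matching the comment in the slide.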


sc_logic
sc_logic
 Interpreted as a single-bit variable representing multiple-valued
logic
 Values:
0 logical zero or false (equivalent to bool false)
1 logical one or true (equivalent to bool true)
Z high impedance
X unknown
 Use only when necessary to model tri-state buses and their
drivers.
 Remember: Simulation & Synthesis interpret “X” differently;
Simulation - unknown, Synthesis - don’t care
 Try to limit assignments of X to internal variables, not the output
nodes of your module.

sc_bit
sc_bit
 Interpreted as a single-bit variable representing two-valued logic
 Values:
0 logical zero or false (equivalent to bool false)
1 logical one or true (equivalent to bool true)

Syntax:
sc_bit variable_name, variable_name, … ;

Example:
sc_bit a, b; // declares variables ‘a’ and ‘b’ as type sc_bit

Hint: Use two-valued logic whenever possible.


WARNING: C++ bool is implemented as an integer! Remember to initialize your
bool variables to 0 or 1 if you use C++'s bool.


sc_logic Syntax
Syntax:
sc_logic variable_name, variable_name, … ;

Example:
sc_logic a, b; // declares variables ‘a’ and ‘b’ as type sc_logic

This is a model of a tri-state driver.

{
    bool control, data;
    sc_logic ts_out;

    if (control == false) {
        ts_out = 'Z'; // Set the drive to Z
    }
    else {
        ts_out = data; // Set the drive to data
    }
}

[Figure: tri-state driver with inputs control and data and output ts_out]

INFO: Most simulators simplify 9-value logic to 4-value logic for
faster simulation speed; therefore 9-value logic is not provided.

Summary

 Datatypes for Different Abstraction Level


 covered sc_int<>, sc_uint<>
 covered sc_bigint<>, sc_biguint<> and sc_logic,
sc_bit
 mentioned simulation speed impact



Introduction to Signals
 Processes communicate with each other through signals or
channels
 Signals are analogous to wires in hardware
 Signals are typed and always carry a value
 Signals are resolved or non-resolved
 SystemC allows signals of all built-in C++ data types or
SystemC data types
 Declaring or instantiating a signal of a particular type requires
the use of the C++ template mechanism
 i.e., in C++ terminology, a signal is a template class

[Figure: process1 connected to process2 through sc_signal<T>]

Tip: Use sc_signal<type> for communication between processes!


Reading Signals

 When a process reads a signal, it reads the current value.
 Reading a signal does not remove the value stored in the signal.
 Multiple processes can read the same signal.

ALERT: Reading an sc_channel DOES REMOVE the value. We
will cover channels later.
Reading Signal Using read()

All signals support the read() method.


 Returns the value currently carried by the signal immediately.
 Any number of processes can read value of a signal with
read() method.

Example:
i = count.read(); //Read signal ’count’ and store
// value in variable ’i’

Examples:
i = count; // Equivalent to i = count.read()
if (count < 7) {…} // Equivalent to if (count.read() < 7) {…}

Hint: The read() method is recommended because it clearly
differentiates reads from writes. It is not required; either
form, or a mix of both, is OK.


Writing Signal - 1
 A write to a signal overwrites the value.
 The value is overwritten regardless of whether the last value has
been read.
 Non-resolved signals may have only one writer
 Resolved signals may have multiple writers (drivers)
 The signal value is computed by applying the resolution function
to the drivers:
Resolve X 0 1 Z
X X X X X
0 X 0 X 0
1 X X 1 1
Z X 0 1 Z
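
The resolution table above can be written as a small lookup function. The following is a plain C++ sketch, not the SystemC implementation; the names Logic, resolve and resolve_all are hypothetical, and treating a net with no drivers as Z is an assumption.

```cpp
#include <initializer_list>

// Four-valued logic, in the same order as the table: X, 0, 1, Z.
enum class Logic { X, Zero, One, Z };

// Resolves two drivers according to the table above.
inline Logic resolve(Logic a, Logic b) {
    static const Logic table[4][4] = {
        //            X         0            1           Z
        /* X */ { Logic::X, Logic::X,    Logic::X,   Logic::X    },
        /* 0 */ { Logic::X, Logic::Zero, Logic::X,   Logic::Zero },
        /* 1 */ { Logic::X, Logic::X,    Logic::One, Logic::One  },
        /* Z */ { Logic::X, Logic::Zero, Logic::One, Logic::Z    },
    };
    return table[static_cast<int>(a)][static_cast<int>(b)];
}

// Resolves any number of drivers by folding them pairwise.
// Assumption: a net with no drivers at all floats at Z.
inline Logic resolve_all(std::initializer_list<Logic> drivers) {
    Logic result = Logic::Z;
    for (Logic d : drivers) result = resolve(result, d);
    return result;
}
```

Because Z is the neutral element of the table, a bus where every driver but one is tri-stated resolves to the value of the single active driver, while two active drivers that disagree resolve to X.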

Tip: Use variables if you want to update your data immediately.
Writing Signal - 2

 All signals have a write() method for writing.
 The value is assigned to the signal, but the signal value does
not change as soon as the assignment statement is executed.
 The assignment statement schedules the signal for a value
update.
 When the update occurs depends upon the type of process
(more later).
 Implication: Successive writes to the same signal can
overwrite the value of the signal before the update
occurs. Only the last value written will be visible to
other processes.
Example:
count.write('a'); // Write char 'a' on signal 'count'
count.write('b'); // Now 'b' replaces 'a'; 'b' will be written to
                  // count in the next clock cycle.

count2 = 'c'; // Equivalent to count2.write('c')
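
The deferred-update behavior described above can be illustrated with a tiny stand-in class. This is a plain C++ sketch under stated assumptions, not the sc_signal implementation: the class name DeferredSignal and its update() method are hypothetical, and update() stands for whatever the simulator does once the writing process has finished or suspended.

```cpp
#include <utility>

// Hypothetical stand-in for a signal (not the SystemC class): write()
// only schedules a value, update() makes it visible to readers.
template <typename T>
class DeferredSignal {
public:
    explicit DeferredSignal(T initial) : current_(initial), next_(initial) {}

    // Readers always see the current value; reading does not consume it.
    const T& read() const { return current_; }

    // Schedules a new value; a later write before update() replaces it.
    void write(T value) {
        next_ = std::move(value);
        pending_ = true;
    }

    // Applies the scheduled value. Returns true if the visible value changed.
    bool update() {
        if (!pending_) return false;
        pending_ = false;
        bool changed = !(current_ == next_);
        current_ = next_;
        return changed;
    }

private:
    T current_;
    T next_;
    bool pending_ = false;
};
```

Writing 'a' and then 'b' before the update leaves the old value visible to every reader; after update() only 'b' is seen, which is the "last value written" rule from the slide.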


Signals of Scalar types - sc_signal


 A signal is a C++ template class - sc_signal
 Used to communicate scalar values between processes.
 Non-resolved signal.
 Signals are as wide as their corresponding data type.
 Individual bits of a scalar signal cannot be read or
written. All bits must be read together.

Syntax:
sc_signal<type> signal_name ;

type:
 Built-in C++ types
 sc_logic

sc_signal<char> a; // a is a signal of type char
sc_signal<sc_uint<10> > b; // b is a signal of type sc_uint, 10 bits wide
                           // (there is a space in between the two '>')
sc_signal<int> c; // c is a signal of type int
sc_signal<bool> d; // d is a signal of type bool
sc_signal<sc_logic> e; // e is a signal of type sc_logic

Other Signal types

Other signal types are defined - covered later:
 Special signal arrays
 sc_signal_bv<length> (Ver 0.9: sc_signal_bool_vector)
 sc_signal_lv<length> (Ver 0.9: sc_signal_logic_vector)
 Resolved signal of vector type
 sc_signal_rv<length> (Ver 0.9: sc_signal_resolved_vector)

Refer to Appendix or Reference Manual for more information.


Summary

 Communication with Signals


 introduced signals - means of communicating
between processes
 covered reading and writing of a signal

Lab 2: Simple Arithmetic Pipeline Design - 1
Primary Objective:
 Understanding how to write an interface file
 This lab introduces process communication using
SystemC™ signals. You will simulate a simple 3-stage
pipelined arithmetic operation. You need to provide the
correct signal type (by modifying the Stage1.h, Stage2.h and
Stage3.h files) for inter-pipestage/process communication.

Stage1.h, Stage2.h, Stage3.h contain the process declarations
Stage1.cc, Stage2.cc, Stage3.cc contain the process functionality
Numgen.h and display.h contain the stimulus and control process declarations
Numgen.cc and display.cc contain the stimulus and control functionality
main.cpp contains the main entry point and instantiates all processes

[Figure: file structure. Stage1.cc, Stage2.cc, Stage3.cc; Stage1.h, Stage2.h, Stage3.h; main.cc]


Lab 2: Simple Arithmetic Pipeline Design - 2

[Figure: three-stage pipeline; all signals are of type double.
Stage1: inputs In1 (a) and In2 (b); outputs Sum = a+b and Diff = a-b
Stage2: inputs Sum (a) and Diff (b); outputs Prod = a*b and Quot = a/b
Stage3: inputs Prod (a) and Quot (b); output Powr = a^b]

Tip: Make sure the signals in your interface file are declared
as sc_signal.

Lab 2: Sample Output
Sample output:



Ports
Ports of a module are the external interface that passes information to
and from the module and triggers actions within the module.

Signals create connections between module ports, allowing
modules to communicate.

A port can have three different modes of operation:

Input: sc_in<port type>
an input port transfers data into a module, e.g. sc_in<bool> reset;
Output: sc_out<port type>
an output port transfers data out from a module, e.g. sc_out<int> dest;
InOut: sc_inout<port type>
an inout port transfers data both into and out of a module depending on
module operation, e.g. sc_inout<sc_bit> dnt;

Ports are always bound to a signal, except when a port is bound
directly to another port.

Signal binding occurs during the execution of the module
constructor.

Signals
A signal sc_signal<type> connects the port of a module to the
port of another module.

The signal transfers data from one port to another as if the ports
were connected directly.

When a port is read, the value of the signal connected to the
port is returned.

When a port is written, the new value will be written to the signal
when the process performing the write operation has finished
execution or has been suspended.

This prevents some processes from seeing the old value while other
processes see the new value during execution.
All processes executing during a time step will see the old value of the
signal.
These signal semantics are the same as VHDL signal operation and Verilog
deferred assignment behavior.
Ports and Signals

No signal required: port data of the module NWK is directly
connected to a port din of module FIFO, so in this case no
local signal is required.

Signal required: if an input port of a module connects directly to
an output port, then a local signal is needed. In this case, port clkin
connects directly to port clkout, so a signal is needed to connect
the two ports.

[Figure: modules NWK, FIFO and CTREE; labels data, din, clkin, clkint, clkout]


Example of ports and signals

Example:

sc_in<sc_logic> a[32]; // ports a[0] to a[31] of type sc_logic
sc_signal<sc_logic> i[16]; // signals i[0] to i[15] of type sc_logic
sc_in_clk CLK; // CLOCK port
sc_in_bv<10> kan; // a bit vector input port 10 bits wide
sc_out_bv<20> bao; // a bit vector output port 20 bits wide
sc_inout_bv<33> wang; // a bit vector inout port 33 bits wide
sc_in_lv<10> creat; // a logic vector input port 10 bits wide
sc_out_lv<20> this_port; // a logic vector output port 20 bits wide
                         // ("this" is a C++ keyword and cannot be used as a name)
sc_inout_lv<33> train; // a logic vector inout port 33 bits wide
sc_in_rv<10> ing; // a resolved logic vector input port 10 bits wide
sc_out_rv<20> mater; // a resolved logic vector output port 20 bits wide
sc_inout_rv<8> labz; // a resolved logic vector inout port 8 bits wide

Clocks - A special port and signal
 Time is determined by transitions of special signals called
clocks
 No smaller granularity of time than a clock cycle
 Clocks provide a mechanism for synchronization and
ordering of communication between processes
 A system can have more than one clock
 Clocks do not have to be related (e.g. CPU core clock, bus clock…)
 Time of first edge used to establish phase relationship.

[Figure: Master Clock and Out-of-Phase Clock waveforms starting at T=0, with positive and negative edges marked; the time of the first edge establishes the phase relationship between clocks. Duty Cycle = D/Period]



Clock Syntax
 A clock is a C++ class - sc_clock

Syntax:
sc_clock variable_name (name, period, duty_cycle, start_time, positive_first) ;

Parameter        Meaning               Type     Default value
name             clock name            char *   none
period           clock period          double   1
duty_cycle       clock duty cycle      double   0.5
start_time       time of first edge    double   0
positive_first   first edge positive   bool     true

Example:
sc_in_clk clk2; // creating a clock port

sc_clock clk2("clk2", 20.0, 0.5, 10, 1); // generating a clock

[Figure: clk2 waveform on a time axis from 0 to 60, with the period marked; first edge at time 10, first edge positive]
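
To see how the five constructor parameters shape the waveform, the level of the clock at a given time can be computed directly. This is a plain C++ sketch, not the sc_clock class: ClockSpec and clock_level are hypothetical names, and the sketch assumes the duty cycle is the fraction of each period the clock spends high.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not the sc_clock class): describes a clock with the
// same parameters as the constructor above, minus the name.
struct ClockSpec {
    double period = 1.0;
    double duty_cycle = 0.5;     // assumed: fraction of the period spent high
    double start_time = 0.0;     // time of the first edge
    bool positive_first = true;  // is the first edge a rising edge?
};

// Returns true if the clock is high at time t.
inline bool clock_level(const ClockSpec& c, double t) {
    assert(c.period > 0.0 && c.duty_cycle > 0.0 && c.duty_cycle < 1.0);
    // Before the first edge the clock rests at the level it will leave.
    if (t < c.start_time) return !c.positive_first;
    // Position inside the current period, measured from the first edge.
    double phase = std::fmod(t - c.start_time, c.period);
    double high_time = c.duty_cycle * c.period;
    if (c.positive_first) return phase < high_time;  // high first, then low
    return phase >= (c.period - high_time);          // low first, then high
}
```

For the clk2 parameters above (period 20, duty cycle 0.5, first edge at time 10, positive first), the sketch gives a low level before time 10, high from 10 up to 20, low from 20 up to 30, and so on.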
Clocks in general
 A clock is a general ordering mechanism. It does not have to
be associated with the real system clock.
 At a high level of abstraction, a clock cycle can be thought of as
a step of computation and a way of ordering communication

[Figure: steps labeled read Header, Processing, Sentout data]

 At the implementation level of abstraction, a clock should be
thought of as the actual system clock

[Figure: clock waveform on a time axis from 0 to 50]

Hint: Good practice is to give the variable and the clock
the same name. Use only one clock per process.


Clocks and Edges

 Each clock has two edges - each edge of a clock is also an object
of a C++ class, sc_clock_edge
 Clock edges are used to trigger synchronous processes
 Clock edges are used only by synchronous processes
 Clock edges are not explicitly defined, but are derived from a
clock using the pos() and neg() methods

Example:
given: sc_clock clk("clk", 10, 0.5);

clk.pos(); // Gives a reference to the positive edge of clk
clk.neg(); // Gives a reference to the negative edge of clk

Clocks - Other Information

 If you know the variable name of the clock signal:
 name() method returns the clock name
 period() method returns the clock period
 duty_cycle() method returns the clock duty cycle

Example:
given clock clk2 was defined as follows:
sc_clock clk2("clk2", 20.0, 0.5, 10, 1);

clk2.name(); // returns "clk2"
clk2.period(); // returns 20.0
clk2.duty_cycle(); // returns 0.5


More on Clock: Gated Clocks and DFT


 Though gated clocks are an important power
management strategy, they reduce the controllability
required by a design for test (DFT) methodology.
 Architects can use SystemC to do clock gating analysis
and DFT evaluation.
 Controllability can be restored by introducing another pin
and assigning a fixed value to it during testing
 This fixed value will override and nullify the gated clock
subsystem
[Figure: counter (+1, Count) with a combinational circuit and Enable, Clock and Reset inputs; a Test Mode Select (TMS) pin is added]

Power : Growing ASIC/IC Driver
[Figure: market segments grouped by design driver: Power-Driven, Area-Driven (cost), Performance-Driven.
Segment labels: Portables, Cellular, Pagers, Medical, Consumer, Peripheral, Hi-Volume, Military, Telecom, Green PC, Workstations, PCs.
Concern labels: Battery life, Packaging Cost, Reliability, Cooling Costs]

More than just your cellular phone


Clock Gating Overview


 Clock gating
 Reduces power consumption in sequential parts
 Eliminates high activity (power) feedback muxes
 Reduces capacitance on clock tree network

 Benefits greatest when


 High ratio of sequential elements
 Large bus width designs
 Designs that do not have a throughput of one
 Variable lifetimes are long (and static!)

 Costs
 Impact on clock tree synthesis and balancing
 Impact on testability
 If small bus widths, possible increase in gate count

Clock Gating

 Clock the data registers only when there is new data
 Power saving is design dependent (a range of 0 to 60%)
 Typical network and control designs that “hold” a lot of
data are a “good fit” for clock gating.
 The wider the bus, the better the savings

[Figure: BEFORE and AFTER clock gating. Labels: FSM, Enable, Clock; the AFTER circuit adds a Latch]
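
The effect of gating can be sketched by counting how many rising clock edges actually reach a register. This is a plain C++ illustration, not SystemC code and not taken from the workshop labs: the names GatedClock and rising_edges_seen are hypothetical, and the circuit is assumed to be the common latch-plus-AND arrangement (latch transparent while the clock is low).

```cpp
#include <vector>

// Hypothetical model of a latch-based clock gate: the latch follows Enable
// while the clock is low and holds it while the clock is high, and the
// register sees clock AND latched-enable.
struct GatedClock {
    bool latched_enable = false;

    // Evaluates one clock phase and returns the gated clock level.
    bool step(bool clock, bool enable) {
        if (!clock) latched_enable = enable;  // latch is transparent when clock is low
        return clock && latched_enable;
    }
};

// Counts the rising edges that reach the register.
// enable_per_cycle[i] is the Enable value during clock cycle i.
inline int rising_edges_seen(const std::vector<bool>& enable_per_cycle) {
    GatedClock gate;
    bool previous = false;
    int edges = 0;
    for (bool enable : enable_per_cycle) {
        for (bool clock : {false, true}) {  // low phase, then high phase
            bool gated = gate.step(clock, enable);
            if (gated && !previous) ++edges;
            previous = gated;
        }
    }
    return edges;
}
```

In this model a register that only needs new data in two out of four cycles is clocked twice instead of four times, which is where the power saving on the slide comes from.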


Manually generated Real Clocks

Tip: For your convenience, you can simply use
sc_start(value) and sc_stop() with an automatically generated clock!
Summary

 Ports and Signals


 covered the syntax of the clock and its usage



Functionality Description - 1
 Functionality is described in processes.

 Processes are like functions that are executed whenever their
inputs change.
 Some processes execute when called and return control to the calling
function (behave like a function).
 Some processes are called once, and then can suspend themselves and
resume execution later (behave like threads).
 Processes are not hierarchical
 Cannot have a process inside another process (use module for
hierarchical design)


Functionality Description - 2

 Processes use signals to communicate with each other.
 Don't use global variables - may cause order dependencies in code
(very bad).
 Processes use timing control statements to implement
synchronization and writing of signals in processes.
 Process and signal updates follow the evaluate-update
semantics of SystemC™

So Which Process Types Does SystemC™ Support?
 SystemC™ supports three different process types:

Version      Asynchronous Function Process   Asynchronous Thread Process   Synchronous Thread Process
Ver 0.9      sc_async                        sc_aproc                      sc_sync
Ver 1.0 DS   sc_async_fprocess               sc_async_tprocess             sc_sync_tprocess
Ver 0.91     SC_METHOD                       SC_THREAD                     SC_CTHREAD

NOTE: Ver 1.0 DS stands for Version 1.0 draft specification.
The SystemC library is backward-compatible.


SystemC™ Processes

♦ sc_sync_tprocess (SC_CTHREAD)
    ♦ execution is suspended on wait() until the next active clock edge
    ♦ process contains an internal “FSM”
    ♦ synthesis has to find the correct FSM based on the code
    ♦ one process can contain only registers sensitive to one clock edge
♦ sc_async_fprocess (SC_METHOD)
    ♦ process executes everything and wakes up at the next event
    ♦ no automatic FSM generation necessary
    ♦ synchronous logic is sensitive to the clock edge
♦ sc_async_tprocess (SC_THREAD)
    ♦ execution is suspended on wait() until the next event

Asynchronous Processes : Function and Thread
Asynchronous Processes:
Type 1: Asynchronous Function Process
Type 2: Asynchronous Thread Process

The two types are similar to each other but with some differences:

 Asynchronous Function Process (sc_async_fprocess; SC_METHOD):
Once the block is invoked, the entire body of the block is executed
infinitely fast and in order.

 Asynchronous Thread Process (sc_async_tprocess; SC_THREAD):
Once the process is invoked, instructions are executed infinitely fast
and in order UNTIL a wait() statement is reached.


Asynchronous Function Process - Basic Idea


 Asynchronous Function Process: Once the block is invoked,
the entire body of the block is executed infinitely fast and in
order.
 Behaves like a function call.

//Implementation File
//sensitive to data1 data2
#include "systemc.h"
#include "interface.h"

void example::entry() {
    int InternalState = 0; // Local variable REDEFINED on every invocation
    if (reset) {

    } else {

    }
}

[Figure: waveforms for Clock, data1 and data2 with numbered steps 1 to 5; the code above is shown for two successive invocations, with the local variable redefined each time]

Asynchronous Function Process Characteristics - 1

 Asynchronous Function Process sc_async_fprocess:


 Is sensitive to a set of signals.
 May be sensitive to any change on a signal
 May be sensitive to the positive or negative edge of a
boolean signal

 Is invoked whenever any of the inputs it is sensitive to changes.


 Once an asynchronous function process is invoked:
 Entire body of the block is executed.
 Instructions are executed infinitely fast.
 Instructions are executed in order.


Asynchronous Function Process Characteristics - 2

 Asynchronous Function Process sc_async_fprocess:


 Should NOT be enclosed in an infinite loop.

 CANNOT have wait() statements

 Local variables defined in the entry() function are redefined each
time the block is invoked.
 Need to save the state of the block in member variables.

 Asynchronous Function Process with sensitivity lists can model
combinational logic.

Tip: Easier for the Implementation/RTL level of abstraction.
Refer to the SystemC Reference Manual for the simulation algorithm.
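
The rule about local variables versus member variables can be demonstrated without the SystemC library. The following is a plain C++ sketch, not the SystemC API: EdgeCounter and run_on_changes are hypothetical names, and the loop over samples is only a stand-in for a sensitivity list.

```cpp
// Hypothetical illustration of why state must live in member variables:
// entry() is an ordinary function call, so its locals are re-created on
// every invocation.
struct EdgeCounter {
    int invocations = 0;  // member variable: survives between calls

    // Models the body of an asynchronous function process. Returns the
    // value of the local counter at the end of the call.
    int entry() {
        int local_count = 0;  // local variable: redefined on every call
        ++local_count;
        ++invocations;
        return local_count;
    }
};

// Minimal stand-in for a sensitivity list: runs the process once for each
// change in the input sequence and returns how many times it ran.
inline int run_on_changes(EdgeCounter& process, const int* samples, int n) {
    int runs = 0;
    for (int i = 1; i < n; ++i) {
        if (samples[i] != samples[i - 1]) {  // the input changed
            process.entry();
            ++runs;
        }
    }
    return runs;
}
```

However many times the process runs, local_count is 1 at the end of every call, while invocations keeps accumulating; only the member variable carries state from one invocation to the next.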

Example: Simple Instruction Set Simulator
 Illustrates an event (instruction) driven ISS
 Advantage: faster simulation than cycle-accurate
 fetch is an asynchronous function process for fetching
instructions from memory (file)
 exec_decode is also an asynchronous function process for
executing instructions after decoding.

[Figure: file structure. fetch.cpp, fetch.h; Exec decode.cpp, Exec-decode.h; main.cpp]

Question: There is a bug in the decode unit. Can you find it?


Example : ISS - Asynchronous Function Declaration


 Declaration of any process requires two files
 Interface file (also called header file - ends with .h)
 Implementation file (also called source file - ends with .cc or .cpp)

 Interface file contains


 Declaration of class
 Asynchronous Function Process: sc_async_fprocess
 Declaration of input and output ports
 Declaration of internal (state) variables
 The constructor for the process
 Prototypes of all member functions (methods)

 Implementation file contains


 Implementation of all member functions

Example : ISS - Interface File Declaration Rules
 An Asynchronous Function Process must be declared as a
structure that publicly derives from a SystemC library class.
 struct fetch : public sc_async_fprocess
 An Asynchronous Function Process consists of the following
member variables and functions:
 Input ports
 Declared as variables
 const-qualified reference to a signal
 Output ports (same for INOUT ports)
 Declared as variables
 Reference to a signal
 Member variables
 Internal to process
 e.g. state_variable ;
 Generic constants
 const-qualified variables
 e.g. const int bit_width;


Example: ISS - Interface File fetch.h (Ver 0.9)


struct fetch : public sc_async { // INSTRUCTION FETCH UNIT
    const sc_signal<unsigned>& program_counter;
    sc_signal<unsigned>& instruction;
    unsigned *prog_mem; // The program memory

    fetch(const char * NAME,
          const sc_signal<unsigned>& PROGRAM_COUNTER,
          sc_signal<unsigned>& INSTRUCTION)
        : sc_async(NAME), program_counter(PROGRAM_COUNTER), instruction(INSTRUCTION)
    {
        sensitive(program_counter);
        FILE *fp = fopen("progmem", "r"); // Initialize the program memory from file progmem
        if (fp == (FILE *) 0) return; // No prog mem file to read
        // First field in this file is the size of program memory desired
        int size;
        fscanf(fp, "%d", &size);
        prog_mem = new unsigned[size];
        if (prog_mem == (unsigned *) 0) { printf("Not enough memory left\n"); return; }
        unsigned mem_word;
        size = 0;
        while (fscanf(fp, "%x", &mem_word) != EOF) {
            prog_mem[size++] = mem_word;
        }
        fclose(fp); // Close the file once the memory image is loaded
    }
    // Functionality
    void entry();
};

Example: ISS - Interface File fetch.h (Ver 0.91)
struct fetch : public sc_module { // INSTRUCTION FETCH UNIT
    sc_in<unsigned> program_counter;
    sc_out<unsigned> instruction;
    unsigned *prog_mem; // The program memory

    fetch(const char * NAME)
        : sc_module (NAME) {
        sc_async_fprocess(handle1, "FETCH", fetch, entry);

        sensitive(program_counter);
        FILE *fp = fopen("progmem", "r"); // Initialize the program memory from file progmem
        if (fp == (FILE *) 0) return; // No prog mem file to read
        // First field in this file is the size of program memory desired
        int size;
        fscanf(fp, "%d", &size);
        prog_mem = new unsigned[size];
        if (prog_mem == (unsigned *) 0) { printf("Not enough memory left\n"); return; }
        unsigned mem_word;
        size = 0;
        while (fscanf(fp, "%x", &mem_word) != EOF) {
            prog_mem[size++] = mem_word;
        }
        fclose(fp); // Close the file once the memory image is loaded

        end_module();
    }
    // Functionality
    void entry();
};


Example: ISS -Class Declaration


// "fetch" is the name of the block.
// ": public sc_module" is public inheritance (the keyword public can be left out).
struct fetch : public sc_module { // INSTRUCTION FETCH UNIT
    sc_in<unsigned> program_counter;
    sc_out<unsigned> instruction;
    unsigned *prog_mem; // The program memory

    fetch(const char * NAME)
        : sc_module (NAME) {
        // A module struct contains the class sc_async_fprocess defined in SystemC™
        sc_async_fprocess(handle1, "FETCH", fetch, entry);

        sensitive(program_counter);
        FILE *fp = fopen("progmem", "r"); // Initialize the program memory from file progmem
        if (fp == (FILE *) 0) return; // No prog mem file to read
        // First field in this file is the size of program memory desired
        int size;
        fscanf(fp, "%d", &size);
        prog_mem = new unsigned[size];
        if (prog_mem == (unsigned *) 0) { printf("Not enough memory left\n"); return; }
        unsigned mem_word;
        size = 0;
        while (fscanf(fp, "%x", &mem_word) != EOF) {
            prog_mem[size++] = mem_word;
        }
        fclose(fp); // Close the file once the memory image is loaded

        end_module();
    }
    // Functionality
    void entry();
};

Example: ISS - Input/Output Port Declaration
struct fetch : public sc_module { // INSTRUCTION FETCH UNIT
    // sc_in<T> is defined as an input port in SystemC™.
    // "program_counter" is the name of the port
    // (a port is a data member of the class).
    sc_in<unsigned> program_counter;
    // sc_out<T> is defined as an output port in SystemC™.
    sc_out<unsigned> instruction;
    unsigned *prog_mem; // The program memory

    fetch(const char * NAME)
        : sc_module (NAME) {
        sc_async_fprocess(handle1, "FETCH", fetch, entry);

        sensitive(program_counter);
        FILE *fp = fopen("progmem", "r"); // Initialize the program memory from file progmem
        if (fp == (FILE *) 0) return; // No prog mem file to read
        // First field in this file is the size of program memory desired
        int size;
        fscanf(fp, "%d", &size);
        prog_mem = new unsigned[size];
        if (prog_mem == (unsigned *) 0) { printf("Not enough memory left\n"); return; }
        unsigned mem_word;
        size = 0;
        while (fscanf(fp, "%x", &mem_word) != EOF) {
            prog_mem[size++] = mem_word;
        }
        fclose(fp); // Close the file once the memory image is loaded

        end_module();
    }
    // Functionality
    void entry();
};


Example: ISS - Constructor Declaration


// "fetch" is the name of the module.
struct fetch : public sc_module { // INSTRUCTION FETCH UNIT
    sc_in<unsigned> program_counter;
    sc_out<unsigned> instruction;
    unsigned *prog_mem; // The program memory

    // The constructor has the same name as the block.
    fetch(const char * NAME)
        // Initialize the module by calling this function with NAME as argument.
        : sc_module (NAME) {
        // "FETCH" is the runtime process name;
        // "entry" is the name of the thread process (functionality).
        sc_async_fprocess(handle1, "FETCH", fetch, entry);

        // Sensitive to signal program_counter
        sensitive(program_counter);
        FILE *fp = fopen("progmem", "r"); // Initialize the program memory from file progmem
        if (fp == (FILE *) 0) return; // No prog mem file to read
        // First field in this file is the size of program memory desired
        int size;
        fscanf(fp, "%d", &size);
        prog_mem = new unsigned[size];
        if (prog_mem == (unsigned *) 0) { printf("Not enough memory left\n"); return; }
        unsigned mem_word;
        size = 0;
        while (fscanf(fp, "%x", &mem_word) != EOF) {
            prog_mem[size++] = mem_word;
        }
        fclose(fp); // Close the file once the memory image is loaded
        end_module();
    }
    // Functionality
    void entry();
};

C++ Constructor
 What is a C++ constructor?
 It is a function defined in the class that is used to initialize a
class.
 This function is called whenever a variable of that class is
instantiated.
 The function has the same name as that of the class.
 A constructor can take several arguments, but is not allowed to
have a return type.

 What does the constructor for an asynchronous
function process (block) do?
 The constructor for a block is called every time the block is
instantiated.
 The constructor is used to provide the block with a name.
 The constructor is used to connect the ports of the block to
signals.
 The constructor is used to initialize the block.
 Each block is required to have one and only one constructor.

Fetch.cc - Implementation file


#include "systemc.h" // systemc.h includes the declaration of all
                     // classes provided in SystemC™
#include "fetch.h"

void fetch::entry()
{
    unsigned pc, instr; // Local variables redefined every time the
                        // entry function is entered
    pc = program_counter.read();
    instr = prog_mem[pc];
    instruction.write(instr);
}

Example: ISS EXEC_DECODE Interface file - 1
SystemC™ Version 0.9

struct exec_decode : public sc_async { //
    const sc_signal<unsigned>& instruction;
    sc_signal<unsigned>& program_counter;

    unsigned pc; // Program counter
    unsigned cpu_reg[32]; // Cpu registers
    unsigned *data_mem; // The data memory

    exec_decode(const char * NAME,
                const sc_signal<unsigned>& INSTRUCTION,
                sc_signal<unsigned>& PROGRAM_COUNTER)
        : sc_async(NAME), instruction(INSTRUCTION),
          program_counter(PROGRAM_COUNTER)
    {

SystemC™ Version 0.91

struct exec_decode : public sc_module { //
    sc_in<unsigned> instruction;
    sc_out<unsigned> program_counter;

    unsigned pc; // Program counter
    unsigned cpu_reg[32]; // Cpu registers
    unsigned *data_mem; // The data memory

    exec_decode(const char * NAME)
        : sc_module (NAME) {
        sc_async_fprocess(handle1, "EXEC_DECODE",
                          exec_decode, entry);


Example: ISS EXEC_DECODE Interface file - 2


SystemC™ Version 0.9

        // sensitive only to the instruction signal
        sensitive(instruction);
        pc = 0x000000; // Power up reset value
        for (int i = 0; i < 32; i++) cpu_reg[i] = 0;
        // Initialize the data memory from file datamem
        FILE *fp = fopen("datamem", "r");
        if (fp == (FILE *) 0) return; // No data mem file to read
        // First field in this file is the size of data memory
        int size;
        fscanf(fp, "%d", &size);
        data_mem = new unsigned[size];
        if (data_mem == (unsigned *) 0) {
            printf("Not enough memory left\n");
            return;
        }
        unsigned mem_word;
        size = 0;
        while (fscanf(fp, "%x", &mem_word) != EOF) {
            data_mem[size++] = mem_word;
        }
        fclose(fp); // Close the file once the memory image is loaded
        program_counter.write(pc);
    }
    // Functionality
    void entry();
};

SystemC™ Version 0.91

        // sensitive only to the instruction signal
        sensitive(instruction);
        pc = 0x000000; // Power up reset value
        for (int i = 0; i < 32; i++) cpu_reg[i] = 0;
        // Initialize the data memory from file datamem
        FILE *fp = fopen("datamem", "r");
        if (fp == (FILE *) 0) return; // No data mem file to read
        // First field in this file is the size of data memory
        int size;
        fscanf(fp, "%d", &size);
        data_mem = new unsigned[size];
        if (data_mem == (unsigned *) 0) {
            printf("Not enough memory left\n");
            return;
        }
        unsigned mem_word;
        size = 0;
        while (fscanf(fp, "%x", &mem_word) != EOF) {
            data_mem[size++] = mem_word;
        }
        fclose(fp); // Close the file once the memory image is loaded
        program_counter.write(pc);
        end_module();
    }
    // Functionality
    void entry();
};

Problem: What happens if two consecutive instructions are the same?
Solution: exec_decode should be sensitive to program_counter too!
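
The problem called out above follows from the rule that a process is invoked only when a signal it is sensitive to changes. A plain C++ sketch of that rule (count_triggers is a hypothetical helper, not a SystemC function, and the initial signal value is an assumption):

```cpp
#include <vector>

// Hypothetical model: a process sensitive to one signal runs only when a
// written value differs from the value the signal currently holds.
inline int count_triggers(const std::vector<unsigned>& values_written,
                          unsigned initial_value) {
    unsigned current = initial_value;
    int triggers = 0;
    for (unsigned v : values_written) {
        if (v != current) {  // only a change of value wakes the process
            ++triggers;
            current = v;
        }
    }
    return triggers;
}
```

If fetch writes the same instruction word twice in a row, the instruction signal produces only one change, so a process sensitive to instruction alone would miss the second instruction. The program counter takes a new value for every fetched instruction, which is why the slide suggests adding it to the sensitivity list.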
Example: ISS EXEC_DECODE Implementation file
// REMEMBER: native C++ datatypes run faster!
void exec_decode::entry() {
    unsigned instr;
    unsigned opcode;
    unsigned regnum1, regnum2, regnum3;
    unsigned addr;
    int i;

    instr = instruction.read();
    opcode = (instr & 0xe0000000) >> 29; // Extract opcode
    switch(opcode) {
    case 0x0: // Halt
        printf("CPU Halted\n");
        printf("\tPC = 0x%x\n", pc);
        for (i = 0; i < 32; i++)
            printf("\tR[%d] = %x\n", i, cpu_reg[i]);
        // Don't write pc and execution will stop
        break;
    case 0x1: // Store
        regnum1 = (instr & 0x1f000000) >> 24; // Extract register number
        addr = (instr & 0x00ffffff); // Extract address
        printf("Store: Memory[0x%x] = R[%d]\n", addr, regnum1);
        data_mem[addr] = cpu_reg[regnum1];
        pc = pc + 1;
        program_counter.write(pc);
        break;
    case 0x2: // Load
        regnum1 = (instr & 0x1f000000) >> 24; // Extract register number
        addr = (instr & 0x00ffffff); // Extract address
        printf("Load: R[%d] = Memory[0x%x]\n", regnum1, addr);
        cpu_reg[regnum1] = data_mem[addr];
        pc = pc + 1;
        program_counter.write(pc);
        break;
    case 0x3: // Add
        regnum1 = (instr & 0x1f000000) >> 24; // Extract register number
        regnum2 = (instr & 0x00f80000) >> 19; // Extract register number
        regnum3 = (instr & 0x0007c000) >> 14; // Extract register number
        cpu_reg[regnum3] = cpu_reg[regnum1] + cpu_reg[regnum2];
        pc = pc + 1;
        program_counter.write(pc);
        break;
    case 0x7: // JNZ
        regnum1 = (instr & 0x1f000000) >> 24; // Extract register number
        addr = (instr & 0x00ffffff); // Extract address
        printf("JNZ R[%d] 0x%x\n", regnum1, addr);
        if (cpu_reg[regnum1] == 0x0) pc = pc + 1;
        else pc = addr;
        program_counter.write(pc);
        break;
    default: // Bad opcode
        pc = pc + 1;
        program_counter.write(pc);
        break;
    }
}
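
The masks and shifts in entry() imply a 32-bit instruction layout that can be exercised on its own. This is a plain C++ sketch: the field positions are read off the code above, while the Decoded struct and the encode_add and encode_mem helpers are hypothetical additions for building test words.

```cpp
#include <cstdint>

// Field layout taken from the masks and shifts in entry():
//   bits 31..29 opcode, bits 28..24 first register,
//   bits 23..0 address (Store, Load, JNZ), or for Add:
//   bits 23..19 second register, bits 18..14 destination register.
struct Decoded {
    std::uint32_t opcode;
    std::uint32_t regnum1;
    std::uint32_t regnum2;
    std::uint32_t regnum3;
    std::uint32_t addr;
};

inline Decoded decode(std::uint32_t instr) {
    Decoded d;
    d.opcode  = (instr & 0xe0000000u) >> 29;
    d.regnum1 = (instr & 0x1f000000u) >> 24;
    d.regnum2 = (instr & 0x00f80000u) >> 19;
    d.regnum3 = (instr & 0x0007c000u) >> 14;
    d.addr    = (instr & 0x00ffffffu);
    return d;
}

// Hypothetical encoder for an Add instruction (opcode 0x3): R[r3] = R[r1] + R[r2].
inline std::uint32_t encode_add(std::uint32_t r1, std::uint32_t r2, std::uint32_t r3) {
    return (0x3u << 29) | ((r1 & 0x1fu) << 24) | ((r2 & 0x1fu) << 19) | ((r3 & 0x1fu) << 14);
}

// Hypothetical encoder for Store (0x1), Load (0x2) and JNZ (0x7).
inline std::uint32_t encode_mem(std::uint32_t opcode, std::uint32_t reg, std::uint32_t addr) {
    return ((opcode & 0x7u) << 29) | ((reg & 0x1fu) << 24) | (addr & 0x00ffffffu);
}
```

Encoding and then decoding a word is a quick way to build entries for the progmem file and to confirm that the register and address fields do not collide for a given opcode.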

Another Example: MUX Using Version 0.91 Syntax


/* Header file for If-Then-Else MUX inference */

struct mux1 : public sc_module {
    sc_in<int> in_A;
    sc_in<int> in_B;
    sc_in<bool> in_Y;
    sc_out<int> out_X;

    mux1(const char *NAME)
        : sc_module (NAME) {
        sc_async_fprocess(handle1, "MUX1",
                          mux1, entry);
        sensitive (in_A);
        sensitive (in_B);
        sensitive (in_Y);
        end_module();
    }
    void entry();
};

/* Implementation file */

#include <stdio.h>
#include "systemc.h"
#include "mux1.h"

void mux1::entry()
{
    int out;
    int a, b;

    a = in_A.read();
    b = in_B.read();
    if (in_Y.read())
        out = b + a;
    else
        out = b;
    out_X.write(out);
}

140
Ver 1.4

70
Summary

• Asynchronous Function Process
  • covered sc_async_fprocess
  • an asynchronous function process behaves like a function call without a wait() statement
  • runs from top to bottom of the code and returns
  • NO wait() statement is allowed
  • NO while loop containing functionality is allowed
  • advantage: runs fast

Lab 3: Simple Memory Controller - 1

Primary Objective:
Understand how to write an asynchronous function process (sc_async/sc_async_fprocess/SC_METHOD)
• Write the next-state and output logic for a memory controller FSM
• mem_control is an asynchronous function process
• All outputs are signals of type bool, active high
• opcode is a signal of type sc_uint<8>
• reset is a signal of type bool, active high
• Port order: state, opcode, reset, a_wen, rd_wen, wd_wen, inca, nextstate

[Figure: mem_control block. Inputs: opcode (sc_uint<8>), reset, state (state_t). Outputs: a_wen, rd_wen, wd_wen, inca, nextstate (state_t).]
Lab 3: Simple Memory Controller - 2
• For your reference, here is the top-level design of the memory controller, including the testbench and SRAM memory.

[Figure: top-level block diagram with TestBench, sm_seq, SRAM, and mem_control. Labeled signals: into (8), outof (8), data (8), addr (8), rd, wr, reset, opcode, state, nextstate, a_wen, rd_wen, wd_wen, inca.]

Lab 3: Memory Controller State Diagram

[Figure: state diagram. Commands (opcodes) NOP, RDWD, RDBLK, WTWD, WTBLK leave the IDLE state. Action/outputs per state: READ_WORD, READ_BLOCK, WRITE_WORD, and WRITE_BLOCK assert a_wen; READ_WORD2 and READ_BLOCK2 through READ_BLOCK5 assert rd_wen; WRITE_WORD2 and WRITE_BLOCK2 through WRITE_BLOCK5 assert wd_wen; inca accompanies rd_wen/wd_wen in the block-transfer states.]
Lab 3: States, Commands (opcodes), and Outputs
#ifndef _FSM_TYPES_
#define _FSM_TYPES_
enum state_t {  // states
  IDLE,
  READ_WORD,
  READ_WORD2,
  READ_BLOCK,
  READ_BLOCK2,
  READ_BLOCK3,
  READ_BLOCK4,
  READ_BLOCK5,
  WRITE_WORD,
  WRITE_WORD2,
  WRITE_BLOCK,
  WRITE_BLOCK2,
  WRITE_BLOCK3,
  WRITE_BLOCK4,
  WRITE_BLOCK5
};

enum command_t {  // opcodes
  NOP,
  RDWD,
  RDBLK,
  WTWD,
  WTBLK
};
#endif

Control outputs:
  a_wen    assert: high (true)   address write enable
  rd_wen   assert: high (true)   read data write enable
  wd_wen   assert: high (true)   write data write enable
  inca     assert: high (true)   increment address

Interface signal:
  a_wen    address write enable
  rd_wen   memory read enable
  wd_wen   memory write enable
  inca     increment address

Lab 3 : sm_seq

[Figure: sm_seq datapath. Labels: into, inreg, address, + (incrementer), addr, data, outof, rdata, wr, rd, reset, state, nextstate, a_wen, rd_wen, wd_wen, inca.]
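Lab 3: Sample Instruction Flow
• For your reference, here is the instruction flow.

[Figure: contents of in_reg (bits 31..0) along the Time axis for the commands wt_blk, rd_wd, and wt_wd. Each command consists of an opcode word with its Addr and Data words (four Data words for wt_blk, "Get data" for rd_wd); the block write targets addr, addr+1, addr+2, addr+3.]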

Lab 3: Sample Output


Sample output:

Agenda: Day One

DAY
1
Unit Topic Lab

6 Ports and Signals

7 Asynchronous Function Process

8 Asynchronous Thread Process


Asynchronous Function and Thread Basics

• May model both synchronous and asynchronous systems
• Model event-driven systems where processes are executed only when their inputs change
• Model glue logic between synchronous processes
  • Conversion between signed and unsigned
  • Bit extraction, multiplexing, etc.
  • Interface between two clock domains
• Model sequential logic
• Monitor signals as part of a testbench
• Model single-cycle interaction with a synchronous process
Asynchronous Thread Process - Basic Idea
• Asynchronous Thread Process: once the process is invoked, instructions are executed infinitely fast and in order UNTIL a wait() statement is reached
• Behaves like a thread.

[Figure: process sensitive to data1 and data2 (labels: Clock, A, C, D); callouts 1 and 2 mark the two invocations in the code.]

//Implementation File
//sensitive to data1 data2
#include "systemc.h"
#include "interface.h"

void example::entry() {
  int InternalState = 0;  // Local variable NOT REDEFINED

  while (true) {
    if (reset) {
    } else {
      …
      // (1) do something here
      printf("here is what you do in this invocation !");
      wait();

      // (2) do something else
      printf("here is what you do in another invocation !");
      wait();

      // where both are invocations
      // done, go back to the beginning of the while loop
    }
  }
}

Special Case: Synchronous Process

• A synchronous process is a special type of process used to model synchronous systems
• All communication is synchronous to the clock of the process
• Has only one clock associated with it
• Sensitive to only one edge (positive or negative), called the active edge
• Samples inputs at its clock's active edge

[Figure: combinational logic blocks driven by a common clock.]
Timing Control Statements
• Timing control statements are used to implement synchronization and writing of signals in processes.
• wait()
  • Suspends execution of the process until the process is invoked again.
• wait_until() (more later)
  • Suspends execution of the process until an expression becomes true.
• If no timing control statement is used:
  • The process executes in zero time
  • Outputs are never visible
• To write to an output signal, a process needs to invoke a timing control statement.
• Multiple interacting synchronous processes must have at least one timing control statement in every path.

wait() Function
• wait() may be used in both asynchronous and synchronous thread processes but NOT in an asynchronous function process (block).
• wait() suspends execution of the process until the process is invoked again.
• wait(argument) is used to mark cycle/event boundaries, where argument must be a positive value
• In a synchronous process (sc_sync_tprocess)
  • Statements before the wait() are executed in one cycle.
  • Statements after the wait() are executed in the next cycle.
• In an asynchronous process (sc_async_tprocess)
  • Statements before the wait() are executed at one event.
  • Statements after the wait() are executed at the next event.

Examples:
wait();   // waits 1 cycle in synchronous process
wait(4);  // waits 4 cycles in synchronous process
wait(4);  // ERROR!! in asynchronous process
SystemC™ (Ver 0.9 ONLY) Constructor Conventions

• Naming convention used (does not have to be followed)
  • All ports have lowercase names
  • All constructor arguments have uppercase names
  • All constructor arguments that are references to signals have the same name as the ports they are going to be connected to, except in uppercase, e.g. in1 and IN1, in2 and IN2
• Body of the constructor
  • Used to perform process-specific initialization (such as a global watcher)

Tip: The constructor is much easier to write in Version 0.91!

Asynchronous Thread Process Characteristics - 1

• An asynchronous thread process sc_async_tprocess (Ver 0.9: sc_aproc):
  • Is sensitive to a set of signals.
    • May be sensitive to any change on a signal
    • May be sensitive to the positive or negative edge of a boolean signal
  • Is invoked whenever any of the inputs it is sensitive to changes.
  • Once an asynchronous thread process is invoked:
    • statements are executed until a wait() statement is encountered
    • at the wait() statement, the process execution is suspended
    • at the next invocation, process execution starts from the statement following the wait()

ALERT: wait(argument) IS ILLEGAL for asynchronous thread processes.
Asynchronous Thread Process Characteristics - 2

• An asynchronous thread process:
  • Should be enclosed in an infinite loop.
    • This ensures that the process can be repeatedly invoked for every input change.
    • If it is not enclosed in an infinite loop, the process is invoked only once.
  • Local variables defined in the entry() function are saved each time the process is suspended.
    • The state of the process is implicitly saved.

Tip: Putting things into a process instead of a function will make the code easier to debug and maintain.

Example : Asynchronous Thread Process - 1

A Simple Communication Channel model
• contains transmit, receiver, channel, timer, and display blocks
• display block emulates the application interface on the receiver side
• timer block generates timeout events
• packets are generated by a function in the transmit block
• packets are then sent through the channel to the receiver
• receiver block then sends the data to the display block
• data rate, effective data rate, and error recovery time can be analyzed

[Figure: blocks Xmit, Channel, Receiver, Display, and Timer.]
Example : Asynchronous Thread Process - 2
Advantages of Using SystemC
• If the design goals suddenly change, SystemC allows the designer to start the design at a high level and refine it down to the hardware implementation level without the disconnect that occurs between the C model and the VHDL/Verilog models.

[Figure: blocks Xmit, Channel, Receiver, Display, and Timer.]

Example : Asyn Thread Process - Timer
Note: Thread Process ONLY!

Interface file (timer.h):
#include "systemc.h"

struct timer : sc_module {
  sc_in<bool> start;
  sc_out<bool> timeout;
  sc_in<bool> clock;

  int count;
  void runtimer();

  timer(const char *name) : sc_module(name) {
    sc_async_tprocess(handle3, "runtimer", timer, runtimer);
    sensitive_pos << clock;
    sensitive << start;
    count = 0;
    // timeout = 0;
    end_module();
  }
};

Implementation file:
#include "timer.h"

void timer::runtimer() {
  while (true) {
    if (start.event()) {
      cout << "Timer: timer start detected" << endl;
      count = 5;  // need to make this a constant
      timeout = false;
    } else {
      if (count > 0) {
        count--;
        timeout = false;
      } else {
        timeout = true;
      }
    }
    wait();
  }
}
Example : Asyn Thread Process - Transmit

Interface file (transmit.h):
#include "systemc.h"
#include "packet.h"

struct transmit : sc_module {
  sc_in<packet_type> tpackin;
  sc_in<bool> timeout;
  sc_in<bool> clock;
  sc_out<packet_type> tpackout;
  sc_out<bool> start_timer;

  int buffer;
  int framenum;
  packet_type packin;
  packet_type s;
  int retry;
  bool start;

  void send_data();
  int get_data_fromApp();

  transmit(const char *name) : sc_module(name) {
    sc_async_tprocess(handle, "send_data", transmit, send_data);
    sensitive << tpackin << timeout;
    sensitive_pos << clock;
    framenum = 1;
    retry = 0;
    start = false;
    buffer = get_data_fromApp();
    end_module();
  }
};

Implementation file:
#include "transmit.h"

int transmit::get_data_fromApp() {
  int result;
  result = rand();
  cout << "Generate:Sending Data Value = " << result << "\n";
  return result;
}

void transmit::send_data() {
  while (true) {
    if (tpackin.event()) {
      packin = tpackin;
      if (packin.seq == framenum) {
        buffer = get_data_fromApp();
        framenum++;
        retry = 0;
      }
    }
    if (clock == true) {
      s.info = buffer;
      s.seq = framenum;
      s.retry = retry;
      retry++;
      tpackout = s;
      start = !start;
      start_timer = start;
      cout << "Transmit:Sending packet no. " << s.seq << "\n";
    }
    wait();
  }
}

Example : Asyn Thread Process - Channel

Interface file (channel.h):
#include "systemc.h"
#include "packet.h"

struct channel : sc_module {
  sc_in<packet_type> tpackin;
  sc_in<packet_type> rpackin;
  sc_out<packet_type> tpackout;
  sc_out<packet_type> rpackout;

  packet_type packin;
  packet_type packout;
  packet_type ackin;
  packet_type ackout;

  void receive_data();
  void send_ack();

  channel(const char *name) : sc_module(name) {
    sc_async_tprocess(handle1, "receive_data", channel, receive_data);
    sensitive << tpackin;
    sc_async_tprocess(handle2, "send_ack", channel, send_ack);
    sensitive << rpackin;
    end_module();
  }
};

Implementation file:
#include "channel.h"

void channel::receive_data() {
  int i;
  while (true) {
    packin = tpackin;
    cout << "Channel:Received packet seq no. = " << packin.seq << "\n";
    i = rand();
    packout = packin;
    cout << "Channel: Random number = " << i << endl;
    if ((i > 1000) && (i < 50000)) {
      packout.seq = 0;
    }
    rpackout = packout;
    wait();
  }
}

void channel::send_ack() {
  int i;
  while (true) {
    ackin = rpackin;
    cout << "Channel:Received Ack for packet = " << ackin.seq << "\n";
    i = rand();
    ackout = ackin;
    if ((i > 10) && (i < 500)) {
      ackout.seq = 0;
    }
    tpackout = ackout;
    wait();
  }
}
Example : Asyn Thread Process - Receiver

Interface file (receiver.h):
#include "systemc.h"
#include "packet.h"

struct receiver : sc_module {
  sc_in<packet_type> rpackin;
  sc_out<packet_type> rpackout;
  sc_out<long> dout;

  int framenum;
  packet_type packin;
  packet_type s;

  void receive_data();

  receiver(const char *name) : sc_module(name) {
    sc_async_tprocess(handle, "receive_data", receiver, receive_data);
    sensitive << rpackin;
    framenum = 1;
    end_module();
  }
};

Implementation file:
#include "receiver.h"

void receiver::receive_data() {
  while (true) {
    packin = rpackin;
    cout << "Receiver: got packet no. = " << packin.seq << "\n";
    if (packin.seq == framenum) {
      dout = packin.info;
      framenum++;
    }
    s.seq = framenum - 1;
    rpackout = s;
    wait();
  }
}

Example : Asyn Thread Process - Display

Interface file (display.h):
#include "systemc.h"
#include "packet.h"

struct display : sc_module {
  sc_in<long> din;

  void print_data();

  display(const char *name) : sc_module(name) {
    sc_async_tprocess(handle, "print_data", display, print_data);
    sensitive << din;
    end_module();
  }
};

Implementation file:
#include "display.h"

void display::print_data() {
  while (true) {
    cout << "Display:Data Value Received, Data = " << din << "\n";
    wait();
  }
}

//packet.h
#include "systemc.h"

struct packet_type {
  long info;
  int seq;
  int retry;

  inline bool operator == (const packet_type& rhs) const
  {
    return (rhs.info == info && rhs.seq == seq && rhs.retry == retry);
  }
};

extern
void sc_trace(sc_trace_file *tf, const packet_type& v, const sc_string& NAME);
Asynchronous Evaluate-Update Cycle
Evaluation phase:
• All asynchronous objects with events at their inputs are executed
• Outputs are not updated

Update phase:
• Once all asynchronous objects are executed, signals with new values are updated

If the update causes new events that require asynchronous objects to be executed, the evaluation phase is repeated.

• Takes place instantaneously (no simulation time elapses)
• Execution order does not affect final signal values or process state.

Alert: Be careful not to create deadlock code (i.e. A triggers B and B triggers A in an infinite loop)!
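
The claim that execution order does not affect the final signal values can be checked with a small stand-alone model. This is a minimal sketch in plain C++, not the SystemC implementation: IntSignal and the two swap functions are hypothetical names, and the model assumes that a write only stores a pending value which becomes visible in the update phase.

```cpp
#include <cassert>

// Hypothetical two-phase signal: write() stores a pending value,
// update() makes it visible and reports whether the value changed.
struct IntSignal {
    int current;
    int next;
    explicit IntSignal(int v) : current(v), next(v) {}
    int read() const { return current; }
    void write(int v) { next = v; }
    bool update() {
        bool changed = (next != current);
        current = next;
        return changed;  // a change corresponds to an event on the signal
    }
};

// Two "processes" exchange the values of a and b.
// Evaluation phase: P1 (a = b) runs before P2 (b = a).
void swap_p1_then_p2(IntSignal& a, IntSignal& b) {
    a.write(b.read());
    b.write(a.read());  // still reads the old value of a
    a.update();         // update phase
    b.update();
}

// Same processes, evaluated in the opposite order.
void swap_p2_then_p1(IntSignal& a, IntSignal& b) {
    b.write(a.read());
    a.write(b.read());  // still reads the old value of b
    a.update();         // update phase
    b.update();
}
```

Because both processes read only values from before the update phase, either evaluation order produces the same swapped result.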


Special Signal Methods - event()

• The event() method:
  • Is only used with asynchronous function and thread processes.
  • Is supported by all signals and signal arrays.
  • Returns true if there is an event on a signal, false if not.
  • Returns true only if the signal was updated in the last evaluate-update cycle.

Cycle 1: signal a is updated in this cycle only
Cycle 2: a.event() returns true in this cycle
Cycle 3: a.event() returns false in this cycle
Special Signal Methods - posedge() & negedge()
• The posedge() and negedge() methods are:
  • Only used with asynchronous blocks and processes.
  • Supported by signals of type bool.
• For posedge()
  • Returns true only if there is an event on a signal and the transition on the signal is from 0 to 1. False if not.
• For negedge()
  • Returns true only if there is an event on a signal and the transition on the signal is from 1 to 0. False if not.

Cycle 1: signal a is updated (0-to-1) in this cycle only
Cycle 2: a.posedge() returns true in this cycle
Cycle 3: a.posedge() returns false in this cycle
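
The cycle-by-cycle behavior of event(), posedge(), and negedge() can be illustrated with a small stand-alone model. This is only a sketch in plain C++ under the assumption that the flags are recomputed at every update; BoolSignal is a hypothetical class, not the SystemC signal type.

```cpp
#include <cassert>

// Hypothetical bool signal that remembers what happened in the last update.
struct BoolSignal {
    bool current = false;
    bool next = false;
    bool changed = false;  // value changed in the last update
    bool rose = false;     // last update was a 0-to-1 transition
    bool fell = false;     // last update was a 1-to-0 transition

    bool read() const { return current; }
    void write(bool v) { next = v; }

    // Update phase: apply the pending value and recompute the flags.
    void update() {
        changed = (next != current);
        rose = changed && next;
        fell = changed && !next;
        current = next;
    }

    bool event() const { return changed; }
    bool posedge() const { return rose; }
    bool negedge() const { return fell; }
};
```

A second update() without a new write clears all three flags, which matches the Cycle 3 column above.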


Summary

• Asynchronous Thread Processes
  • covered sc_async_tprocess
  • described its behavior: like a thread with wait() statements (e.g. can be used for multi-cycle operations)
  • at least one wait() statement is needed
  • mentioned the while loop that contains the functionality
  • runs until a wait() statement
Summary for Day One

• Day One
  • covered the motivation behind SystemC for SoC design
  • briefly discussed the C++ language
  • introduced SystemC™ datatypes for modeling
  • introduced signals for communication between processes
  • introduced the concept of time
  • introduced the Asynchronous Function Process (sc_async_fprocess)
  • introduced Asynchronous Thread Processes (sc_async_tprocess)

Lab 4: 4x4 MULTICAST HELIX PACKET SWITCH - 1

A network switch is a multi-port device used to increase network bandwidth by allowing multiple conversations to occur simultaneously.

[Figure: a Switch (X) attached to the Backbone, with a Dedicated Segment and Shared Segments leading to HUBs on the LAN.]
Lab 4: 4x4 MULTICAST HELIX PACKET SWITCH - 2
The switch allows for higher bandwidth. Terminals are hooked up to hubs, which are hooked up to the switch, which is connected to the backbone. Hubs are cheap, but they don't filter data.
The switch addresses collisions and contentions when communicating between two local terminals. A 4-port switch can handle 4 simultaneous conversations.
The switch has a buffering scheme so it can handle burst data. As soon as a port is free, it un-queues data out of the buffer and sends it to the network.

[Figure: switching between ports.]

Lab 4: 4x4 MULTICAST HELIX PACKET SWITCH - 3

Primary Objectives:
• use asynchronous thread processes
• write the sensitivity list for this asynchronous process
• correct the implementation file for the asynchronous thread process

[Figure: Main.cc instantiates Switch, sender0 through sender3, and receiver0 through receiver3.]
Lab 4: 4x4 MULTICAST HELIX PACKET SWITCH - 4
Here is the interface of the switch block

[Figure: switch block. Inputs IN0 through IN3 and outputs OUT0 through OUT3 each connect through a FIFO; registers R0 through R3 form the shift-register ring.]

Lab 4: 4x4 MULTICAST HELIX PACKET SWITCH - 5

The switch uses a self-routing ring of shift registers to transfer cells from one port to another in a pipelined fashion, resolving output contention and efficiently handling multicast cells. Input and output ports have FIFO buffers of depth four each.

The packet structure: data (8), source id (4), dest0, dest1, dest2, dest3

Tip: The performance (% packets dropped) depends upon a number of factors: 1) the traffic, 2) the FIFO buffer size, 3) the speed of the switch relative to the activity rates on the input ports
Lab 4: 4x4 MULTICAST HELIX PACKET SWITCH - 6

Sender Process (id = i): writes a random value to data and sends the packet (pkt) to one or more of the four receivers. Each packet also contains a sender id field. The sender process sends packets at random intervals, varying from 1 to 4 units of its clock.

Receiver Process (id = i): is activated whenever a packet (pkt) arrives. It then displays the contents of the packet and the receiver id.

Lab 4: Sample Output


Sample output:

Lab 4: Switch Question?
Similarly, using aggregate data types, you can model an ATM traffic policing block.

[Figure: ATM traffic policing block diagram. Labels include ATM Cells (in and out), Utopia L2 Rx Master, Utopia L1 Tx Slave, Input FIFO, Output FIFO, Input Adapter, Output Adapter, Cell Header Processing, µP Interface, VC Map, Policing, GCRA, and UPC Parameters Table.]

Lab 4: Switch Question?

[Figure: policing datapath. Labels: VC Number from Cell Header Processing, VC Map, UPC Parameters Table (RAM with addr, data in, data out), parameters, Policing Algorithm (GCRA), updated parameters, Verdict Logic, Verdict to Cell Header Processing.]
Lab 4: Switch Question?

[Figure: cell header processing. An ATM Cell (fields VPI, VCI, PTI, CLP, HEC, payload) arrives from Input; VPI and VCI are extracted and sent to the VC Map; the Verdict from Policing (Pass/Tag/Discard?) routes the Passed/Tagged ATM Cell to Output and the Discarded Cell to Trash.]

Agenda: Day Two

DAY
2
Unit Topic Lab

9 Special Case: Synchronous Process

10 Process Execution Order

11 Top_Level and Testbench

12 Channels for Abstract Protocol

13 Hierarchy for Modular Design

Synchronous Thread Processes Characteristics - 1
• Synchronous process sc_sync_tprocess:
  • Sensitive to only one edge of one and only one clock
    • Cannot use multiple clocks with a synchronous process
    • This edge is called the active edge
  • The process samples inputs at its active edge
    • It is not triggered if inputs other than the clock change.
  • At the active edge, the process is invoked only after all the asynchronous blocks and processes have executed
    • This models the behavior of unregistered inputs

[Figure: Clock waveform for sc_sync_tprocess; inputs are sampled and outputs are produced at the active edge.]

Synchronous Thread Processes Characteristics - 2

• Synchronous process:
  • Once invoked:
    • The statements execute until a wait() or wait_until() (defined later) statement is encountered
    • At the wait() or wait_until() statement, the process execution is suspended
    • At the next execution, process execution starts from the statement after the wait() or wait_until() statement
  • Local variables defined in the entry() function are saved each time the process is suspended.
    • The state of the process is implicitly saved.
  • Use set_stack_size(some number); if you have too many variables

Tip: Easy for modeling multi-cycle operations. Conceptually, it is like an asynchronous thread process sensitive to the clock ONLY!
Synchronous Thread Processes Characteristics - 3

• Synchronous process:
  • On a write to an output signal:
    • The value is not immediately visible
    • The value is available at the next active edge (like registered outputs)
  • The clock referenced by wait() or wait_until() is implicitly the clock of the synchronous process.
  • Minimum granularity is one clock cycle

[Figure: signals a1, a2, a3 passing through Process1 (sc_sync) and Process2 (sc_async), both connected to Clock.]

What is wait_until() ?
• The argument of wait_until(argument) is an expression involving:
  • Signals of type sc_signal<bool>
  • == and != operators
  • Boolean connectives && and ||
• Suspends execution of the process until the next active edge of the calling process
  • The expression is evaluated
  • If the expression is true, the process proceeds with the next statement
  • If the expression is false, process execution remains suspended until the next active edge

Examples:
wait_until(aa.delayed() == 1 && bb.delayed() == 0);
// suspends execution until both aa is 1 and bb is 0
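
The edge-by-edge evaluation described above can be pictured as a loop over the values seen at successive active edges. The following plain C++ sketch is only an illustration of that idea, not how SystemC implements wait_until(); first_edge_satisfied and its vector arguments are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// aa[i] and bb[i] hold the values visible at the i-th active edge after
// the call. Returns the index of the first edge at which the condition
// (aa == 1 && bb == 0) holds, or aa.size() if it never holds.
std::size_t first_edge_satisfied(const std::vector<bool>& aa,
                                 const std::vector<bool>& bb) {
    assert(aa.size() == bb.size());
    for (std::size_t edge = 0; edge < aa.size(); ++edge) {
        if (aa[edge] && !bb[edge]) {
            return edge;  // the process would resume at this edge
        }
        // otherwise the process stays suspended until the next edge
    }
    return aa.size();
}
```

For example, with aa = 0, 1, 1 and bb = 0, 1, 0 the condition first holds at the third edge (index 2).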

Edge Detection

• at_posedge(), at_negedge()
  • SystemC™ functions that implement waiting for a positive edge or negative edge.
  • Take a single argument of type sc_signal<bool>
  • Suspend execution of the process until the next active edge where the signal makes the appropriate transition.

// waiting for a positive edge on start
if (start.read() != 0) wait_until(start.delayed() == 0);
wait_until(start.delayed() == 1);

// waiting for a negative edge on start
if (start.read() != 1) wait_until(start.delayed() == 1);
wait_until(start.delayed() == 0);

// the same waits written with at_posedge() and at_negedge()
at_posedge(start.delayed());  // Wait for positive edge on start
at_negedge(start.delayed());  // Wait for negative edge on start

Delay-evaluated Expressions
• wait_until() and watching() (defined later):
  • Take an expression as an argument
  • Evaluate the expression at the next active edges of the process
  • The expression is not evaluated immediately
  • It is called a delay-evaluated expression
  • The argument must be a delay-evaluated expression
• A delay-evaluated expression is created when a signal of type sc_signal<bool> is used in conjunction with the delayed() method.
• Don't use read() in delay-evaluated expressions.
  • The value is sampled once and used for all cycles

Equivalent:
wait_until(a == b.read());
b_tmp = b.read(); wait_until(a == b_tmp);

Use:
wait_until(data_ready.delayed() == 1);
or
do {
  wait();
} while (data_ready.read() != 1);
Synchronous Thread Process : I/O Timing

while (true) {
  out = in1 + in2;
  wait( );
}

[Figure: process block with inputs in1, in2 and output out, with a clock waveform.]
Waveform values (out appears one active edge after the inputs that produced it):
in1   1    35
in2   13   18
out   14   53

• Inputs are sampled at the active edge
• Outputs written during a clock cycle are not visible externally until the next active edge

Can We Model Half-Clock Cycle Behavior?

Case 1: Both Process 1 and Process 2 are modeled using sc_sync_tprocess
• synchronous thread processes

Process 1 (sc_sync), A1 to A2:
while (true) {
  A2 = A1;
  wait( );
}

Process 2 (sc_sync), A2 to A3:
while (true) {
  A3 = A2;
  wait( );
}

Waveform rows (each row is shifted later along the clock axis than the row above):
A1                        AAAA BBBB CCCC
A2                        AAAA BBBB CCCC
A2 sampled by Process2    AAAA BBBB CCCC
A3                        AAAA BBBB
Yes, We Can Model Half-Clock Cycle Behavior
Case 2: Process 1: synchronous process
        Process 2: asynchronous block

Process 1 (sc_sync), A1 to A2:
while (true) {
  A2 = A1;
  wait( );
}

Process 2 (sc_async), A2 to A3:
A3 = A2;

Waveform rows:
A1                        AAAA BBBB CCCC
A2                        AAAA BBBB CCCC
A2 sampled by Process2    AAAA BBBB CCCC
A3                        AAAA BBBB CCCC
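
The difference between Case 1 and Case 2 can be reproduced with a small stand-alone model. This is a sketch in plain C++ under simplifying assumptions: one loop iteration stands for one active clock edge, "?" is a placeholder for a value that is not yet known, and Trace, two_registers, and register_then_async are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Values of A2 and A3 visible after each active edge.
struct Trace {
    std::vector<std::string> a2;
    std::vector<std::string> a3;
};

// Case 1: both processes are synchronous. At each edge, both sample the
// values that were visible before the edge, so A3 lags A2 by one edge.
Trace two_registers(const std::vector<std::string>& a1) {
    Trace t;
    std::string a2 = "?";
    std::string a3 = "?";
    for (const std::string& in : a1) {
        std::string new_a3 = a2;  // Process 2 samples the old A2
        std::string new_a2 = in;  // Process 1 samples A1
        a2 = new_a2;
        a3 = new_a3;
        t.a2.push_back(a2);
        t.a3.push_back(a3);
    }
    return t;
}

// Case 2: Process 2 is asynchronous. It runs as soon as A2 changes,
// so A3 follows A2 without waiting for another edge.
Trace register_then_async(const std::vector<std::string>& a1) {
    Trace t;
    std::string a2 = "?";
    std::string a3 = "?";
    for (const std::string& in : a1) {
        a2 = in;  // Process 1 at the active edge
        a3 = a2;  // Process 2 triggered by the change on A2
        t.a2.push_back(a2);
        t.a3.push_back(a3);
    }
    return t;
}
```

With the input sequence AAAA, BBBB, CCCC, the first model yields A3 = ?, AAAA, BBBB, while the second yields A3 = AAAA, BBBB, CCCC.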


Behavioral vs. RTL Timing

Behavioral Code:
while (true) {
  out = in1 * in2;
  wait( );
}

RTL Code:
while (true) {
  out = in1 * in2;
  wait(2);
}

[Figure: multiplier (*) with inputs in1, in2 and output out; timing annotations 15, 25, 15; the multiplier sets the critical timing for the clock period.]

Tip: At the architectural level, we only care about the result of in1*in2, but in the implementation we need to make sure it fits into a single-cycle operation; otherwise we need to modify the communication protocol to make it a multi-cycle operation!
Example Interface File special_adder.h

// Name of the module: special_adder
// Public inheritance (keyword public can be left out)
struct special_adder : public sc_module {
  // Input ports
  sc_in_clk CLK;
  sc_in<int> in1;  // Name of port (a port is a data member of the class)
  sc_in<int> in2;

  // Output ports
  sc_out<int> out;

  // Constructor
  // A module struct contains the class sc_sync_tprocess defined in SystemC™
  special_adder (const char * NAME)
    : sc_module (NAME) {
    // Positive-edge triggered Synchronous Thread Process
    sc_sync_tprocess(handle1, "SA", special_adder, entry, CLK.pos());
    end_module();
  }

  // Functionality of the process
  void entry();
};

Special_adder.cc - Implementation file

#include "systemc.h"
#include "special_adder.h"  // Interface file should be included
//THIS IS A VERY SPECIAL special_adder
// This syntax indicates that the entry function belongs to process special_adder
void special_adder :: entry( )
{
  while (true) {  // Process functionality enclosed in an infinite loop
    out = in1 + in2;
    wait( );      // Timing control statement
    out = in1 + in2 + 2;
    wait( );
  }
}

[Figure: special_adder block with inputs in1, in2 and output out; adder symbol labeled b, c0, c1, sum.]
Special_adder functionality
void special_adder :: entry( )
{
while (true) {
out = in1 + in2;
wait( );
out = in1 + in2 + 2;
wait( );
}
}

Clock cycle     1              2                3              4
Code executed   out=in1+in2;   out=in1+in2+2;   out=in1+in2;   out=in1+in2+2;
                wait( );       wait( );         wait( );       wait( );
in1             5              2                65
in2             4              3                18
out             unknown        9                7              83

[Figure: special_adder block with inputs in1, in2 and output out.]
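
The timing table can be reproduced without SystemC by treating the thread as a resumable state machine. This is a minimal sketch in plain C++: SpecialAdderModel and run_special_adder are hypothetical names, and the phase member stands in for the position that the real thread process keeps implicitly while suspended at a wait().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the special_adder thread: 'phase' records which of the two
// wait() statements the process is currently suspended at.
struct SpecialAdderModel {
    int phase = 0;  // 0: next run executes out = in1 + in2
                    // 1: next run executes out = in1 + in2 + 2

    // Called once per active clock edge; returns the value written to out,
    // which becomes visible in the following cycle.
    int on_clock_edge(int in1, int in2) {
        int out = (phase == 0) ? (in1 + in2) : (in1 + in2 + 2);
        phase = 1 - phase;  // resume after the other wait() next time
        return out;
    }
};

// Runs the model over sampled input values, one pair per clock edge.
std::vector<int> run_special_adder(const std::vector<int>& in1,
                                   const std::vector<int>& in2) {
    assert(in1.size() == in2.size());
    SpecialAdderModel model;
    std::vector<int> outs;
    for (std::size_t i = 0; i < in1.size(); ++i) {
        outs.push_back(model.on_clock_edge(in1[i], in2[i]));
    }
    return outs;
}
```

Feeding in1 = 5, 2, 65 and in2 = 4, 3, 18 gives 9, 7, 83, the values shown for out after the initial unknown.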


Another Example fsm recognizer

Files: stimgen.h, stimgen.cpp, fsmr.h, fsmr.cpp, pcntr.h, pcntr.cpp, main.cpp

• Illustrates:
  • Internal variables
  • Initialization of internal variables
• Recognizer looks for the pattern "WANG"

[Figure: Process Stimgen, Process Recognizer, and Process Counter in a chain; signals handshake and stream between Stimgen and Recognizer, found between Recognizer and Counter.]
fsmr.h Interface File
/* Filename fsmr.h Interface file for FSM Recognizer Process */
struct fsm_recognizer : public sc_module {
  // Input ports
  sc_in_clk CLK;
  sc_in<char> input_char;
  sc_in<bool> data_ready;

  // Output ports
  sc_out<bool> found;

  // The internal variables of this process
  char pattern[4];  // Pattern to match against

  // The constructor
  fsm_recognizer(const char* NAME)
    : sc_module(NAME) {
    sc_sync_tprocess(….. ..CLK.pos());
    // Initialization of internal variables
    pattern[0] = 'W';
    pattern[1] = 'A';
    pattern[2] = 'N';
    pattern[3] = 'G';
    end_module();
  }
  // The functionality of the process
  void entry();
};

fsmr.cc Implementation File

[Figure: finite state machine with states S0, S1, S2, ...]

/* Filename fsmr.cc Implementation file for FSM Recognizer Process */
#include "systemc.h"
#include "fsmr.h"

void fsm_recognizer::entry()
{
  char c;
  int state = 0;
  bool out;

  while (true) {
    wait_until(data_ready.delayed() == true);
    c = input_char.read();
    printf("%c", c);  // for debugging

    switch (state) {
    case 0:
      if (c == pattern[0]) state = 1;
      else state = 0;
      out = false;
      break;
    case 1:
      if (c == pattern[1]) state = 2;
      else if (c != pattern[0]) state = 0;
      out = false;
      break;
    …...
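
The remaining states are elided on the slide. As an illustration only, the same matching idea can be written in plain C++ with the state held as the number of characters matched so far. This sketch is not the workshop's fsmr.cc: count_matches is a hypothetical function, and the fallback rule assumes that all pattern characters are distinct, which holds for "WANG".

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts occurrences of 'pattern' in 'stream' using a state variable that
// holds how many pattern characters have been matched so far.
// Assumption: all characters of 'pattern' are distinct, so after a mismatch
// the only possible partial match is a fresh first character.
int count_matches(const std::string& stream,
                  const std::string& pattern = "WANG") {
    assert(!pattern.empty());
    std::size_t state = 0;
    int found = 0;
    for (char c : stream) {
        if (c == pattern[state]) {
            ++state;            // advance to the next state
        } else if (c == pattern[0]) {
            state = 1;          // mismatch, but c restarts the pattern
        } else {
            state = 0;          // mismatch, start over
        }
        if (state == pattern.size()) {
            ++found;            // corresponds to asserting 'found'
            state = 0;
        }
    }
    return found;
}
```

States 0 and 1 of this loop behave like case 0 and case 1 on the slide: a repeated first character keeps the recognizer in state 1 instead of resetting it.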

Example: Positive-edge triggered DFF
/* Positive-edge triggered DFF */

#include "systemc.h"

// process definition
struct dff_pos : public sc_module {
  sc_in_clk CLK;
  sc_in<bool> in_data;
  sc_out<bool> out_q;

  dff_pos (const char *NAME)
    : sc_module (NAME) {
    sc_sync_tprocess(handle1, "DP", dff_pos, entry, CLK.pos());
    end_module();
  }
  void entry();
};

// process entry
void dff_pos::entry()
{
  while (true) {
    out_q = in_data;
    wait();
  }
}

[Figure: sc_sync block driven by Clock.]

Example: D Latch
[Figure: sc_async block driven by Clock.]

#include "systemc.h"

struct d_latch : public sc_module {
  sc_in_clk clock;
  sc_in<bool> in_data;
  sc_out<bool> out_q;

  d_latch (const char *NAME)
    : sc_module (NAME) {
    sc_async_fprocess(handle1, "DLATCH", d_latch, entry);
    sensitive(in_data);
    sensitive(clock);
    end_module();
  }

  void entry();
};

void d_latch::entry()
{
  if (clock)
    out_q = in_data;
}
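
The behavioral difference between the latch and the flip-flop can be seen in a small stand-alone comparison. This is a sketch in plain C++ rather than SystemC: DLatchModel and DffPosModel are hypothetical classes, and evaluate() stands for one invocation of the process with the current clock and data values.

```cpp
#include <cassert>

// Level-sensitive latch: transparent while clock is high,
// like d_latch::entry() above.
struct DLatchModel {
    bool q = false;
    void evaluate(bool clock, bool in_data) {
        if (clock) {
            q = in_data;
        }
    }
};

// Edge-triggered flip-flop: captures data only on a rising clock edge,
// like the positive-edge triggered DFF.
struct DffPosModel {
    bool q = false;
    bool prev_clock = false;
    void evaluate(bool clock, bool in_data) {
        if (clock && !prev_clock) {
            q = in_data;
        }
        prev_clock = clock;
    }
};
```

If the data input changes while the clock stays high, the latch output follows it, whereas the flip-flop holds its value until the next rising edge.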

Observation: One Size Does Not Fit All

[Figure: trade-off chart with axes Speed, Flexibility & Interactivity, and Detail; labeled modeling styles: SW, Behav, Mixed, RTL, Gate.]

SystemC Coding: Speed vs. Flexibility vs. Detail

So, when to use which ?

• Asynchronous objects (function and thread processes) are more general than synchronous thread processes. They are used for modeling event-driven systems.
• Asynchronous function processes are useful for modeling at the RTL level
  • they simulate faster because they are not thread-based
• Asynchronous thread processes are useful for testbenches
• Synchronous thread processes are simply asynchronous thread processes sensitive to the clock ONLY!
• Unnecessary entries in the sensitivity list degrade simulation speed for sc_async_tprocess and sc_async_fprocess

                    wait();   while(true)   Used for     Type
sc_async_fprocess   NO        NO            RTL          Function
sc_async_tprocess   YES       YES           Testbench    Thread
sc_sync_tprocess    YES       YES           Behavioral   Thread
Summary

• Special Case: Synchronous Thread Processes
  • covered the synchronous process sc_sync_tprocess
  • Note: It is a special case of an asynchronous thread process that is sensitive to the clock ONLY!
  • described its behavior: like a thread with wait() statements (e.g. can be used for multi-cycle operations)
  • at least one wait() statement is needed
  • mentioned the while loop that contains the functionality
  • runs until a wait() statement

Lab 5: Multiple-Cycle RAM

Primary Objective:
• Learn how to write a synchronous thread process

1. Write a synchronous thread process describing a multiple-cycle RAM memory
   • memory latency is programmable
   • MEMORY_LANTENCY holds the number of cycles needed to access memory.

E.g. assume a 4-cycle latency for both read and write
Lab 5: Multiple-Cycle RAM
2. Understand how many wait() statements are needed for programmable
latency.


Lab 5: Multiple-Cycle RAM

3. Here is the interface between the memory accessor (testbench) and your RAM model, as connected in Main.cc.

Memory Accessor port   RAM port   Type
datain                 dataout    int
chip_select            cs
write_enable           we
address                addr       int
dataout                datain     int
Lab 5: Multiple-Cycle SRAM Question?
Question: Do you know how to model this?

[Figure: DRAM READ timing with address generation. Signals CLK, ADR (ROW, COL M, COL M+1), and DATAO; address generation for Column m and Column m+1; memory array with rows a, a+1, a+2 and columns b, b+1, b+2.]

Lab 5: Advanced Memory

Advantages over DRAM
• large memory units available, e.g. 16MB modules
• relatively cheap storage elements
• latest generation embedded on chips
Disadvantages
• Advanced addressing scheme and timing model
  • read and write bursts of 8 data elements to use the bandwidth
  • avoid addressing the same column consecutively
  • charge row and column separately
  • addressing for multiple cycles
  • refresh mode
• arbitration for multiple sources
  • to use memory efficiently, you usually schedule different sources
• pointer management in certain applications
  • to save memory, the same application can use pointer units
Lab 5: Digital Designers' Perspective of Chip Select

• As a digital designer, you will search for a pattern in the address ranges that circumvents the need for comparator operators
• In fact, patterns for chip select address ranges (and instruction decoding) are typically designed to minimize the necessary decoding logic

Lab 5: Sample Output


Sample output:

Agenda: Day Two
DAY
2
Unit Topic Lab

9 Special Case: Synchronous Process

10 Process Execution Order

11 Top_Level and Testbench

12 Channels for Abstract Protocol

13 Hierarchy for Modular Design

209
Ver 1.4

Process Execution Order - 1


For every transition of the clock signal:
1. All asynchronous objects (function/thread processes) sensitive to the
clock are executed.
  • Output signals are updated according to the evaluate-update cycle
  • Multiple evaluate-update cycles may be necessary
  • The entire body of an asynchronous function process is executed
  • Asynchronous thread processes are executed until the next
    wait() suspends execution of that process.
  • There is no fixed order of execution for asynchronous objects

2. All output signals of synchronous thread processes that have this
edge as the active edge are updated.

[Figure: Synchronous Process A drives its port out to the port input of an asynchronous object; the object's port result drives Process A's port in.]

Process Execution Order - 2
For every transition of the clock signal:
3. As a result of step 2, asynchronous objects may have new values on
their inputs.
  • If so, the actions of step 1 are repeated.
  • One of the signals with new values may be a clock signal.
  • Step 2 is repeated.
  • Steps 2 and 3 are repeated until no more signals change value.

4. All synchronous thread processes that have this edge as the active
edge:
  • Execute until their next timing control statement suspends execution.
  • This includes execution of wait_until() and watching().
  • There is no fixed order of execution for synchronous processes.

Process Execution Order - 3

For every transition of the clock signal:

5. Simulation time is advanced.
  • To the time of the next clock edge if automatic clock generation is used
  OR
  • By the value provided to sc_cycle()

Time & Event Queues
[Figure: event queues, one per time step t, t+1, t+2, t+3, laid out along the time axis.]

• Time can only advance forward.


• Time advances when every event scheduled at that time step is
executed.
• Simulation completes when all event queues are empty
• An event at time t may schedule another event at time t or any other
time t+n
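The four rules above can be sketched as a tiny event-driven kernel in plain C++. This is not the SystemC scheduler; `EventSim`, `schedule`, `run` and `demo_trace` are hypothetical names chosen for this illustration. There is one FIFO queue per time step, time only moves forward, a step is left only when its queue is drained, and an event may schedule further events at the same step or at a later one.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical mini-kernel: one FIFO event queue per time step.
class EventSim {
public:
    using Action = std::function<void(EventSim&)>;

    // Schedule an action `delay` steps after the current time (0 = same step).
    void schedule(unsigned delay, Action a) {
        queues_[now_ + delay].push(std::move(a));
    }

    // Run until all event queues are empty; returns the final time.
    unsigned long run() {
        while (!queues_.empty()) {
            auto it = queues_.begin();      // earliest pending time step
            now_ = it->first;               // time can only advance forward
            while (!it->second.empty()) {   // drain every event at this step
                Action a = std::move(it->second.front());
                it->second.pop();
                a(*this);                   // may schedule at t or at t+n
            }
            queues_.erase(it);
        }
        return now_;
    }

    unsigned long now() const { return now_; }

private:
    std::map<unsigned long, std::queue<Action>> queues_;
    unsigned long now_ = 0;
};

// An event at t=0 schedules one event at the same step and one at t+3.
std::vector<unsigned long> demo_trace() {
    EventSim sim;
    std::vector<unsigned long> trace;
    sim.schedule(0, [&trace](EventSim& s) {
        trace.push_back(s.now());
        s.schedule(0, [&trace](EventSim& s2) { trace.push_back(s2.now()); });
        s.schedule(3, [&trace](EventSim& s2) { trace.push_back(s2.now()); });
    });
    sim.run();
    return trace;
}
```

An action that keeps rescheduling itself with a delay of 0 would never let `run()` leave the current step, which is exactly the zero time loop described on the next slide.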


Zero time loops


A zero time loop is encountered when events are continuously
scheduled at a time step without advancing time.
• Characterized by the simulator appearing to "hang"
• Example: an asynchronous or synchronous thread process that executes

    a = b;

  in its loop with no wait() statement.

This causes the event a <-- b to be continuously scheduled and
executed at time t. Time cannot advance beyond the start time t.

[Figure: the event queue at time t keeps filling with a<-b events, so the queue at t+1 is never reached.]

"Deadlocks"

A "deadlock" is encountered when a portion of the code is waiting
for events to occur and no events are being scheduled.

• Characterized by the simulation finishing apparently prematurely
  (usually accompanied by an incredulous "what?").
• The simulation finishes because no events are scheduled.

Mixing Synchronous and Asynchronous Example

Synchronous Process A, body:
    ...
    out.write(100);
    wait();
    res = in.read();
    ...

Asynchronous Function Process B, body:
    i = input.read();
    j = i + 20;
    result.write(j);

Connections: port out of A drives port input of B; port result of B drives port in of A.

Given the above, what is the value of res as a result of reading the signal in?

Ans: 120
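The answer follows from the execution order given earlier in this unit: the value written by A becomes visible at the next active edge, B reacts to it in the same time step, and only then does A resume after its wait(). The sketch below replays that sequence in plain C++. It is an illustration only: `Signal`, `process_b` and `run_example` are hypothetical stand-ins, not SystemC classes.

```cpp
#include <cassert>

// Hypothetical signal with separate current and next values,
// mimicking the evaluate-update cycle.
struct Signal {
    int cur = 0;
    int next = 0;
    int read() const { return cur; }
    void write(int v) { next = v; }
    // Makes the written value visible; returns true if it changed.
    bool update() {
        bool changed = (cur != next);
        cur = next;
        return changed;
    }
};

// Asynchronous function process B: result = input + 20.
void process_b(const Signal& input, Signal& result) {
    int i = input.read();
    int j = i + 20;
    result.write(j);
}

int run_example() {
    Signal a_to_b;   // A.out    -> B.input
    Signal b_to_a;   // B.result -> A.in

    // Active edge n: A executes out.write(100) and suspends at wait().
    a_to_b.write(100);

    // Active edge n+1, step 2: outputs of synchronous processes are updated.
    bool changed = a_to_b.update();

    // Step 3: B sees a new input value, so its whole body runs and its
    // output is updated in the same time step.
    if (changed) {
        process_b(a_to_b, b_to_a);
        b_to_a.update();
    }

    // Step 4: A resumes after wait() and executes res = in.read().
    int res = b_to_a.read();
    return res;
}
```

`run_example()` returns 120, matching the slide.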

Summary

• Process Execution Order
  • covered the process execution order of all 3 SystemC™ functionality
    descriptions, i.e. processes
  • Can you name all 3 of them?
  • Can you tell the difference between them?

Top Level Routine - sc_main
[Figure: main.cpp on top of stimgen.cpp/stimgen.h, adder.cpp/adder.h and monitor.cpp/monitor.h.]

• Top level routine is called sc_main
  • Create signals and instantiate processes
  • Call simulation functions
• Prototype of sc_main in SystemC™ is:

    int sc_main(int argc, char* argv[]);

• The argc and argv passed to a C++ main routine for parameter passing
  are also passed to sc_main
  • e.g. simulation period: sc_start(n), where n is the number of time units
• The top level is called sc_main instead of main because main is
  defined in SystemC™
• Initialization of the simulation kernel is performed before
  sc_main is called.
• File name is main.cpp or main.cc (depends on the Makefile)

sc_main Outline

[Figure: main.cpp on top of stimgen.cpp/stimgen.h, adder.cpp/adder.h and monitor.cpp/monitor.h.]

• sc_main Outline:
  • include interface files
  • sc_main
    • arguments
    • signals
    • clock instantiations
    • process instantiations
    • clock generation / starting the simulation

Tip: Make sure you put return(0); at the end of your sc_main!

Automatic Clock Generation
• sc_clock clk2("CLK2", 20.0, 0.5, 10, 1);   // clock period = 20 time units
• sc_start(argument) function
  • Argument is a variable or a constant of type double
  • Specifies the number of time units to simulate
  • -1: simulation continues indefinitely
• sc_stop() function
  • No arguments
  • Stops the simulation
  • Causes sc_start() to return control to the sc_main routine
• sc_time_stamp() function
  • No arguments
  • Returns the current simulation time as a double-precision floating-point
    value.
  • For synchronous processes: time of the last active edge.
  • For asynchronous processes: time of invocation of the process.

ALERT: Make sure you have sc_stop() somewhere in your code if you use
sc_start(-1)!

Example sc_main
#include "adder.h"
#include "stimgen.h"
#include "monitor.h"

int sc_main(int argc, char *argv[]) {
    // Create signals
    sc_signal<int> s1;
    sc_signal<int> s2;
    sc_signal<int> s3;

    // Create clock
    sc_clock clock("clock", 10, 0.5);

    // Instantiate processes
    // Mapping 1: by order, using the "<<" symbol
    adder ADD("ADD_BLOCK");
    ADD << clock << s1 << s2 << s3;
    // Mapping 2: by position
    stimgen STIM("STIM_BLOCK");
    STIM(clock, s1, s2);
    // Mapping 3: explicit mapping, one by one (order independent)
    monitor MON("MON_BLOCK");
    MON.clk(clock);
    MON.input(s3);

    // Simulate
    sc_start(200);
    return (0);
}

[Figure: main.cpp on top of stimgen.cpp/stimgen.h, adder.cpp/adder.h and monitor.cpp/monitor.h.]

Example sc_main - includes & arguments
#include "adder.h"      // Include interface files for all
#include "stimgen.h"    // processes to be instantiated
#include "monitor.h"

// argc and argv are the same arguments that are passed to main() in C/C++
int sc_main(int argc, char *argv[]) {
    // (signals, clock and process instantiation as in the example sc_main)

    // Simulate
    sc_start(200);
    // OR programmable simulation duration:
    //     if (argc == 2) n = atoi(argv[1]);
    //     sc_start(n);
    return (0);
}
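The programmable duration mentioned on this slide can be factored into a small helper. The sketch below is plain C++; the function name `simulation_duration` and its `fallback` parameter are assumptions for illustration. It swaps the slide's atoi for std::strtod so that a malformed argument can be detected and replaced by the default instead of silently becoming 0.

```cpp
#include <cassert>
#include <cstdlib>

// Returns the simulation duration in time units: the single command-line
// argument if it parses completely as a positive number, otherwise
// `fallback` (200 in the slide's example).
double simulation_duration(int argc, char* argv[], double fallback) {
    if (argc != 2) return fallback;
    char* end = nullptr;
    double n = std::strtod(argv[1], &end);
    if (end == argv[1] || *end != '\0' || n <= 0.0) return fallback;
    return n;
}
```

Inside sc_main this would be used as sc_start(simulation_duration(argc, argv, 200)); so that running the executable with one argument, e.g. 500, simulates 500 time units.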

Example sc_main - Signals


int sc_main(int argc, char *argv[]) {
    // Create signals (creation of signals)
    sc_signal<int> s1;
    sc_signal<int> s2;
    sc_signal<int> s3;   // s3 is a signal of type int (and not a reference to a signal)

    // (clock, process instantiation and simulation as in the example sc_main)
}

Note: Signals must be instantiated before they are used in the
instantiation of a process!

Example sc_main - Clocks
int sc_main(int argc, char *argv[]) {
    // (signals as in the example sc_main)

    // Create clock (creation of clocks)
    sc_clock clock("clock", 10, 0.5);   // Clock of period 10 time units and 50% duty cycle

    // (process instantiation as in the example sc_main)

    // Simulate
    sc_start(200);                      // Clock runs for 200 time units
    return (0);
}

Note: Clocks must be instantiated before they are used in the
instantiation of a process.

Example sc_main - Processes


    adder ADD("ADD_BLOCK");
    ADD << clock << s1 << s2 << s3;
    stimgen STIM("STIM_BLOCK");
    STIM(clock, s1, s2);
    monitor MON("MON_BLOCK");
    MON.clk(clock);
    MON.input(s3);
    ….

Callouts for a line such as stimgen STIM("STIM_BLOCK"):
• stimgen: name of process
• STIM: name of instance (variable name)
• "STIM_BLOCK": given name of instance (used for error reporting)
• the values in parentheses: arguments to the constructor of the process

Note 1: Processes may be instantiated in any order.
Note 2: It is good practice to keep the variable name and the given name
the same for process instantiations.

Process Instantiation

• During instantiation of a process, the constructor for the process is
  called.
• Signals are connected to ports during instantiation.
• Each process needs a reporting name and an instance name, because the
  instance name is the name of a variable and is accessible only to the C++
  compiler.
  • The reporting name is used by the simulation kernel for error
    reporting.

Starting the simulation

• After all the processes are instantiated and connected with
  signals, the clock needs to be generated.
• Function sc_start()
  • Automatic clock generator
  • Takes a double as an argument
  • Specifies the number of time units to run
  • -1 for argument: simulation runs forever.

    // Simulate
    sc_start(200);   // Simulation is started by calling this function, provided
                     // in SystemC™; 200 is the duration of the simulation
    return (0);      // Note that code following sc_start() is not executed
                     // until after the function returns
}

Compiling and Running the Program
• After the implementation and header files are written:
  • Implementation files are compiled individually
  • Linked together with SystemC™
  • Compilation requires header files from SystemC™
• Current Compilers
  • GNU g++ version 2.95.2 or later (Free!)
    http://www.gnu.org/software/gcc/gcc.html
  • Borland C++ 4.0 or later
  • Microsoft Visual C++ 6.0 or later
• Current Platforms
  • Sun Solaris
  • Windows NT 4.0
  • Linux
• SystemC™ has makefiles (in examples) that can be modified to
  compile your program.

Program Profiling
 Use Quantify to profile your simulation.

System Profiling
[Figure: flowchart of statement execution order (Wait For Next Input, Interrupt, Make Decision, Read Input, Process Data, Set Flag, Write to RAM, Re-Initialize RAM, Accumulate Checksum, Get Test Result, Output Result, Transfer Data, Update Status); a finite state machine with states S0, S1 and S2; and a pie chart "Profile BUS utilization" with slices Memory read, Memory write, Block transfer, interrupt and Idle (values shown: 43%, 30%, 12%, 8%, 7%).]

// bus_controller.h header file
// signal declaration
struct bus_controller : public sc_module {
    sc_in<sc_biguint<256> > databus;   // 256 bits databus
    …
}
...
// bus_controller.cc implementation file
// reset behavior
if (reset.read() == true) {
    state = reset_s;
}
...
while (true) {                          // reset_loop : loop
    opcode = instruction.read();
    switch (state) {
    case reset_s: …..; break;
    case WRITE:   ……; break;
    case READ:    …….; break;
    }
    cout << SOME PROFILE STATISTICS
    ...
}

Project Management
 use SNIFF+ for project management

Common Makefile Definitions

# --------------- Common macro definitions ----------------------
# C++ Compiler
CC = g++

# Location of SystemC library
SYSTEMC = ../../..
INCDIR = -I. -I.. -I$(SYSTEMC)/include
LIBDIR = -L. -L.. -L$(SYSTEMC)/lib-$(TARGET_ARCH)
CFLAGS = -Wall -pedantic $(D_OPT) $(INCDIR) $(LIBDIR) -fexceptions
LIBS = -lsystemc$(D_OPT) -lnumeric_bit$(D_OPT) -lstdc++ \
       -lqt -lm $(EXTRA_LIBS)

# Name of final executable file
EXE = $(MODULE).x

# These are the suffixes that we know about
.SUFFIXES: .cpp .cc .o .x

# How to compile to an object file (.cpp source extension)
.cpp.o:
	$(CC) $(CFLAGS) -c $<

# How to compile to an object file (.cc source extension)
.cc.o:
	$(CC) $(CFLAGS) -c $<

Common Makefile Definitions (cont.)

# ----------------- Common targets ------------------

# How to link final executable file from object & lib files
$(EXE): $(OBJS) $(SYSTEMC)/lib-$(TARGET_ARCH)/libsystemc$(D_OPT).a
	$(CC) $(CFLAGS) -o $@ $(OBJS) $(LIBS) 2>&1 | c++filt

# Remove all derived files (except header dependency file)
clean:
	rm -f $(OBJS) *~ $(EXE)

# Remove all derived files including header dependency file
ultraclean: clean
	rm -f Makefile.deps

# Create the header dependency file
Makefile.deps:
	gcc $(CFLAGS) -M $(SRCS) >> Makefile.deps

include Makefile.deps

Local makefile for a project
# MODULE is the prefix of the executable file. The actual
# executable file will have a ".x" appended to it
MODULE = run

# List all source files here
SRCS = display.cc numgen.cc main.cc stage1.cc stage2.cc stage3.cc

# Set this D_OPT flag to -g for debug and -O for optimized code
D_OPT = -g

# ------- Normally do not edit below this line ----------------

# Object files are the same name as source files, but with a
# .o extension
OBJS = $(SRCS:.cc=.o)

# Bring in all the common definitions
include ../../Makefile.defs

Note: Please use tabs in the makefile instead of spaces.

Testbench

[Figure: the testbench surrounds the DUT (the selected block); a stimulus process drives the DUT and a response process checks its outputs.]

• Testbench is a set of processes.
• Typically at least two:
  • Stimulus generation/application
  • Response checking
  • Other processes as needed

Summary

• Top_Level and Testbench
  • covered sc_main in main.cc
  • covered the interface between different modules and how to
    instantiate them in the top level
  • learned how to create and use a Makefile
  • can you tell the difference between sc_main and main?

Lab 6: Simple RISC CPU

• From this lab, you will see SystemC's ability to model and enforce:
  • large-grain (architectural) components and their interfaces
  • structural compositions of architectural components
  • dynamic interaction patterns between architectural components
  • refinement of architectures

Lab 6: Simple RISC CPU - 1
Primary Objective:
• Basic: learn how to write a main.cc that instantiates multiple
  processes for different modules
• Advanced: implement some of the functionality using synchronous
  processes.

1. Write the top-level routine main.cc
2. Advanced: add one instruction
   - modify assembler.pl (assembler in Perl)
   - modify decode.cc for the new instruction
   - modify exec.cc/mmxu.cc/floating.cc for your new instruction

[Figure: pipeline stages Fetch, Decode, Execute, WriteBack.]

Note: Modeling at a different abstraction level will have a big impact on
simulation speed.

Lab 6: Simple RISC CPU - 2


Here is the System Level View.

[Figure: blocks FETCH, DECODE, EXEC, ICACHE, DCACHE, FLOATING, MMXU, PAGING, BIOS and PIC, all instantiated in Main.cc.]

Lab 6: Sample Output
Sample output:


Writing Fast SystemC Simulation Models


What Affects Simulation Speed?

Level of abstraction
The data types used
The process types used
The process sensitivity list
The quality of the user code

Level of Abstraction

• The more details you have to simulate, the slower the simulation
• Always code at the highest level of abstraction that can be used
• Take advantage of the capabilities provided by the C++ language
  • Use native C++ types, struct
  • Use OO (Object Oriented) features where appropriate

Speed of Datatypes for Arithmetic Operations

Speed of Datatypes for Logical Operations


Choosing the Right Datatype

• At the higher levels of abstraction, use C++ native types
  (int, short, long, char, float, double, struct)
• When you require hardware data types for arithmetic operations, choose:
  • sc_int/sc_uint if you can live with 64 bits of precision or fewer
  • sc_bigint/sc_biguint if you need more than 64 bits of precision
  • fixed-point types if you need fixed-point arithmetic

Choosing the Right Datatype - II

• When you require hardware data types for logical operations, choose:
  • sc_int/sc_uint if your vector is 64 bits or fewer in length
  • sc_bv if your vector is more than 64 bits, but only two-valued
  • sc_lv if your vector is more than 64 bits and you absolutely require
    4 values

SystemC Processes

• Processes that are methods are invoked with minimum performance penalty
  • just a function call, and therefore cheap
• Thread processes are more expensive to invoke
  • a context switch is required every time such a process is invoked,
    which is expensive

Process switching times


Choosing the Right Process Type

• Use SC_METHOD when writing RTL code
  • Save the state of the process explicitly in member variables of the
    module
  • Usually requires more code and is more difficult to understand
• Use SC_THREAD or SC_CTHREAD when writing behavioral code
  • No need to save state explicitly
  • This will often produce more compact and readable code

Use of SC_METHOD and SC_THREAD
EXPLICIT (state kept in member variables):

int temp;
int state;
int cmd;

void datapath() {
    if (cmd)
        temp = temp + in;
}

void fsm() {
    switch (state) {
    case 0: temp = 0; cmd = 0; state = 1; break;
    case 1: state = 2; cmd = 1; break;
    case 2: state = 3; cmd = 1; break;
    case 3: state = 0; cmd = 0; out = temp; break;
    }
}

IMPLICIT (state kept in the position of the thread):

void behavior() {
    while (true) {
        temp = 0;
        wait();
        temp = temp + in;
        wait();
        temp = temp + in;
        wait();
        out = temp;
        wait();
    }
}

Process Sensitivity

• A process is triggered every time one of the inputs it is sensitive to
  changes
• Every time a process is triggered, some code is executed in the process
  body
• Remove useless process triggering to increase efficiency

Avoiding False Process Executions

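void my_method()
{
    if (clock.posedge())
        out = a * b - c * d;
}

SC_CTOR(my_process) {
    SC_METHOD(my_method);
    sensitive << a;
    sensitive << b;
    sensitive << c;
    sensitive << d;
    sensitive_pos << clock;
}

[Figure: block with inputs a, b, c, d and clock, and output out.]

The method only does work on a positive clock edge, so the sensitivity to
a, b, c and d triggers it needlessly; sensitivity to the positive edge of
clock alone is sufficient.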

Channels - Introduction
• A channel is a special type of signal that processes may use for
  communication.
• Its capabilities differ from those of signals (sc_signal).
• The built-in protocol allows modeling of communication without I/O
  protocols (handshaking etc.)
• Useful at higher levels of abstraction

[Figure: abstraction levels System, Architectural, Behavioral, RTL and Gate; Processes 1 to 5 communicate through signals and a channel, and Module 1 contains Process 1 and Process 2.]

Channel Characteristics

• Created between 1 source and 1 destination only
  • Single reader and writer
• Has zero or more buffers
  • Number specified by user
  • Act as temporary storage until read by the destination

[Figure: Process 1 writes to Process 2 through a channel with a buffer of depth 5 (buffersize = 5).]

Channel Read
• Same read semantics as signals until buffers are empty
• If data is available:
  • read completes instantaneously.
• If no data is available:
  • Reader (destination) process blocks for some number of
    cycles until data is available.
  • data_available may be used to determine if data is
    available in the buffer. Returns true or false
• If the channel is specified with zero buffers:
  • Reader (destination) process blocks until the writer
    (source) process writes to the channel.

Channel Write
• Same write semantics as signals until no buffers are available
• If buffers are full:
  • the channel blocks the writing (source) process for an integral
    number of clock cycles until buffers become available.
• If buffers are available:
  • write takes one clock cycle
  • A write to a channel incorporates a wait()
  • space_available may be used to determine if space is
    available in the buffer. Returns true or false
• If the channel is specified with zero buffers:
  • Writer (source) is blocked until the reader (destination) performs a
    read on the channel.
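The buffered read and write rules can be illustrated with a small stand-in written in plain C++. This is not the sc_channel implementation: the class `BufferedChannel` and its `try_write`/`try_read` methods are hypothetical. Blocking depends on the simulation kernel, so the sketch returns false at the points where a real process would be suspended for a cycle, and the zero-buffer rendezvous case is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical single-reader, single-writer buffered channel.
template <typename T>
class BufferedChannel {
public:
    explicit BufferedChannel(std::size_t buffer_size) : capacity_(buffer_size) {
        assert(buffer_size > 0);  // zero-buffer rendezvous is not modeled here
    }

    bool space_available() const { return fifo_.size() < capacity_; }
    bool data_available() const { return !fifo_.empty(); }

    // Returns false where the real writer would block until space frees up.
    bool try_write(const T& value) {
        if (!space_available()) return false;
        fifo_.push_back(value);
        return true;
    }

    // Returns false where the real reader would block until data arrives.
    // A successful read removes the value from the channel.
    bool try_read(T& out) {
        if (!data_available()) return false;
        out = fifo_.front();
        fifo_.pop_front();
        return true;
    }

private:
    std::size_t capacity_;
    std::deque<T> fifo_;
};
```

A writer loop would call try_write once per cycle and retry on false; the reader does the same with try_read, which mirrors the "blocks for an integral number of clock cycles" behavior described above.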

Scalar Channels
• A channel is a C++ class - sc_channel

Syntax:
sc_channel<type> variable_name;
sc_channel<type> variable_name(int BufferSize);
sc_channel<type> variable_name(char *Name, int BufferSize);

type: channel type: C++ built-in, aggregate types, SystemC™ types
BufferSize (optional): number of buffers, default value: 0
Name (optional): channel name

EXAMPLE:
// c is a channel of type int with a buffer of size 10 (given name is "I")
sc_channel<int> c("I", 10);

Channel Arrays
• A channel array is a C++ class - sc_channel_array
• The whole array is read or written.
Syntax:
sc_channel_array<type> variable_name(int ArraySize);
sc_channel_array<type> variable_name(int ArraySize, int BufferSize);
sc_channel_array<type> variable_name(char *Name, int ArraySize, int BufferSize);

type: channel type: C++ built-in, aggregate types, SystemC™ types
ArraySize: size of the array
BufferSize (optional): number of buffers in each element of the array, default value: 0
Name (optional): channel name

EXAMPLE:
// a is a channel array of type int containing 10 elements (0 buffers)
sc_channel_array<sc_array<int> > a(10);

Channel read () method
 read() method is provided to read the values from
the channel
 For scalar channels:
 returns a value of the type of the channel
 For array channels:
 returns an array of the type of the channel
 If data is available:
 value returned instantaneously
 data is removed from the channel
 If no data is available:
 execution of the reading process blocks until data becomes
available.
 execution blocked for an integral number of clock cycles.
 If channel is specified with zero buffers:
 reader (destination) process blocks until the writer (source)
process writes to the channel


Channel write() method

• write() method is provided to write values to the channel
• For scalar channels:
  • accepts an argument of the type of the channel
• For array channels:
  • accepts an sc_array of the type of the channel
• If space is available:
  • takes one clock cycle to complete (implicit wait())
• If no space is available:
  • execution of the writing process blocks until space
    becomes available.
  • execution is blocked for an integral number of clock cycles.
• If the channel is specified with zero buffers:
  • writer (source) process blocks until the reader
    (destination) process reads the channel

Channel Read Gotcha - 1

• Each appearance of a channel in an expression corresponds to a read
  from the channel
• Each read removes a value from the channel

Example
Given: sc_channel<int> chn;
    if (chn < 7 || chn > 12)
or
    if (chn.read() < 7 || chn.read() > 12)

• Evaluation of the expression in the examples above can remove two
  values from the channel (the second read happens whenever the first
  comparison is false).

Channel Read Gotcha - 2

• SOLUTION: To have the channel read just once, read into a temp
  variable:

Example
Given: sc_channel<int> chn;

    int temp = chn;
    if (temp < 7 || temp > 12)
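The difference between the two forms can be reproduced without SystemC. In the sketch below, `IntChannel` is a hypothetical stand-in whose read() removes the value, like a channel read; the two helper functions are likewise invented for illustration.

```cpp
#include <cassert>
#include <queue>

// Hypothetical stand-in for a channel whose read() removes the value.
struct IntChannel {
    std::queue<int> q;
    int read() {
        assert(!q.empty());
        int v = q.front();
        q.pop();
        return v;
    }
};

// Two reads in one expression: if the first comparison is false, a second,
// different value is consumed and compared.
bool out_of_range_double_read(IntChannel& chn) {
    return chn.read() < 7 || chn.read() > 12;
}

// Exactly one read, kept in a temporary, as recommended on the slide.
bool out_of_range_single_read(IntChannel& chn) {
    int temp = chn.read();
    return temp < 7 || temp > 12;
}
```

With the values 9 and 20 queued, the double-read version consumes both and reports "out of range" even though 9 is within 7..12; the single-read version consumes only 9 and reports "in range".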

Channel Example

Example telecom system architecture

[Figure: a structure model (System1 with TerminalA, TerminalB and Channel1; System2 with TerminalA, Tester and Channel2), a library (Terminal, Channel, Tester) and a behavior model (Application Layer, Services Layer).]

Summary

• Channels for Abstract Protocol
  • covered sc_channel
  • used for communication at a high level of abstraction,
    with user-specified buffer size
  • Read will remove the data!

Lab 7: Image Smoother - 1
Primary Objective:
Understand how to use a channel
Input: a raw picture
Output: a smoothed picture (averaged pixel values)

1. Write an asynchronous process describing the smooth function using
   sc_channel (pix_smoother.cc)
2. Refer to smooth.h for interface names and types.

[Figure: Reader, Smoother and Writer processes connected in a chain.]

Lab 7: Image Smoother - 2

Secondary Objective:
View the before and after pictures
Performance analysis:
you can change the number of pixels to be averaged for optimal results
Run lab7.sh (a C-shell script in a UNIX environment)

Lab 7: Image Smoother - 3
Smoothing Algorithm:
`pix_reader' reads the input image in raster order.
It reads the greyscale value of a pixel and its 3 x 3 pixel neighborhood
and transmits them to the `pix_smoother' process.
The `pix_smoother' process computes the average greyscale value over
the 3 x 3 pixel neighborhood and sends it to the `pix_writer' process,
which then writes it to the output image.

[Figure: pix_smoother with channels pix_in, pix_nbd_cnt, pix_nbd and pix_out, each of type unsigned char.]

Lab 7: Image Smoother Interface -1


#ifndef PIX_SMOOTHER_H
#define PIX_SMOOTHER_H      // constant define

struct pix_smoother : public sc_aproc {     // asynchronous process

    /* Channel Ports */
    sc_channel<unsigned char>& pix_in;
    sc_channel<unsigned char>& pix_nbd_cnt;
    sc_channel<unsigned char>& pix_nbd;
    sc_channel<unsigned char>& pix_out;

    pix_smoother(const char* NAME,
                 sc_channel<unsigned char>& PIX_IN,      // Creation of channels
                 sc_channel<unsigned char>& PIX_NBD_CNT,
                 sc_channel<unsigned char>& PIX_NBD,
                 sc_channel<unsigned char>& PIX_OUT)
        : sc_aproc(NAME), pix_in(PIX_IN),
          pix_nbd_cnt(PIX_NBD_CNT), pix_nbd(PIX_NBD),
          pix_out(PIX_OUT)
    {
        // Sensitivity list. NOTE: sensitive << a << b is OK!
        sensitive(pix_in);
        sensitive(pix_nbd_cnt);
    }
    void entry();
};

#endif

Lab 7: Image Smoother Interface -2
pix_smoother Interface:
1. Each pixel grey value can be represented in 8 bits (0-255), so the channel type is
unsigned char.
2. `pix_in' is an input channel port that carries the grey value of the input pixel.
3. `pix_nbd' is an input channel port with 9 buffers. We need 9 buffers because we
implement a 3 x 3 pixel neighborhood. pix_nbd stores the grey values of the pixels in the
3 x 3 neighborhood of `pix_in'.
4. `pix_nbd_cnt' is a count of the number of neighborhood pixels. Note that the pixels on the
border have only a partial 3 x 3 neighborhood, and we use this count when computing the
average of the neighborhood pixels.
5. `pix_out' is the grey value of the output pixel, i.e. the grey value computed by averaging
the grey values of the neighborhood pixels.
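The averaging described in items 3 to 5 can be sketched in plain C++, independent of the channel plumbing. The function names `neighbourhood` and `smooth_pixel` and the row-major image layout are assumptions made for this illustration; the sketch also assumes the neighborhood includes the pixel itself, so that an interior pixel yields 9 values and a border pixel fewer (the vector size plays the role of pix_nbd_cnt).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Collects the grey values of the 3 x 3 neighbourhood of (x, y) that lie
// inside a width x height image stored in row-major (raster) order.
std::vector<unsigned char> neighbourhood(const std::vector<unsigned char>& img,
                                         int width, int height, int x, int y) {
    std::vector<unsigned char> nbd;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = x + dx;
            int ny = y + dy;
            if (nx >= 0 && nx < width && ny >= 0 && ny < height)
                nbd.push_back(img[static_cast<std::size_t>(ny * width + nx)]);
        }
    }
    return nbd;
}

// Average grey value over the collected neighbourhood; nbd.size() is the
// neighbourhood count (9 inside the image, fewer on the border).
unsigned char smooth_pixel(const std::vector<unsigned char>& nbd) {
    assert(!nbd.empty() && nbd.size() <= 9);
    unsigned int sum = 0;
    for (unsigned char g : nbd) sum += g;
    return static_cast<unsigned char>(sum / nbd.size());
}
```

In the lab, the reader process would send the count and the values over pix_nbd_cnt and pix_nbd, and the smoother would write the result of the division to pix_out.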

Why Partition ?
Why partition a design?
• Functionality
• Generate regular data-path architectures
Good partitioning strategy results in:
• Shorter run times, easier to debug
• Lower memory requirements
• Improved quality of results (QOR)

[Figure: RISC_CORE containing PRGRM_CNT, CONTROL, ALU, DATA_PATH and glue logic connected by a 32-bit data_bus; block sizes shown: 20 k, 40 k, 100 k, 30 k and 5 k (glue) gates.]
How would you partition this design?

What is a Module?
Module Features:
• Well-defined I/O ports
• Specific logic function
Module Complexity:
• Typically, a functional block
• As big as a multi-chip system
• As small as a single gate
The basic building block in any SystemC/HDL description!

[Figure: module named DATA_MORPH with external I/O ports IN[15]..IN[0] and OUT[15]..OUT[0] and an internal function built from * and + operators.]

Structure of a Module Description
Source-code file:

/**************************************/
/* SystemC Generic Module             */      // Comment block
/**************************************/
struct data_morph : sc_module {               // Declarative portion
    sc_out_bv<8> OUT;
    sc_in_bv<8> IN;
    sc_in_clk CLK;
    <other_declarations and constructor>

void data_morph::entry() {                    // Executable portion
    <executable_statement>
    <executable_statement>
    • • •

A module has declarative and executable code, including:
• port declarations specifying external inputs and outputs
• executable statements describing the internal function

Module Characteristics - 1

• Hierarchy is built with Modules.
• Modules are structural - can have functionality.
• Each module is a C++ class - sc_module
• Module may contain:
  • Other modules
  • Processes
  • Signals

[Figure: module Mymod with ports m, n, p and clocks clk1, clk2. It contains Process1 (ports a, b, c, d) and Process2 (ports e, f, g), connected by the internal signal sig_int.]

Module Characteristics - 2
• A module has:
  • input and output ports
    • Ports connected directly to ports of constituent processes
    • Not specified as data members
  • Constituent processes as data members
  • Constituent modules as data members
  • Constituent signals as data members
  • Constructor

[Figure: module Mymod as on the previous slide, with callouts for the input port, the output ports, the processes and the internal signal.]

Module Characteristics - Clocks

• May have zero or more clock edges as inputs
  • Active edges are not specified as data members but
    rather as formal parameters of the module constructor.
  • Processes with different active edges may be in the
    same module
  • No edges need be specified for asynchronous
    processes.
• May not have internally defined clocks (no clock signal
  instantiation inside a module).

Example - Process 1 Interface File
#include "systemc.h"
struct process1 : public sc_module {
    // Inputs
    sc_in<bool> a;
    sc_in<bool> b;
    // Outputs
    sc_out<bool> c;
    sc_out<bool> d;
    sc_in_clk CLK;

    // Constructor
    process1(const char *NAME)
        : sc_module(NAME) {
        sc_sync_tprocess(handle1, "PROC1",
                         process1, entry, CLK.pos());
        end_module();
    }

    void entry();
};

[Figure: module Mymod containing Process1 and Process2; file layout: Process1.h/Process1.cpp, Process2.h/Process2.cpp, module.h, main.cpp.]

Example - Process 2 Interface File


#include "systemc.h"
struct process2 : public sc_module {
    // Inputs
    sc_in<bool> e;
    sc_in<bool> f;
    // Outputs
    sc_out<bool> g;
    sc_in_clk CLK;

    // Constructor
    process2(const char *NAME)
        : sc_module(NAME) {
        sc_sync_tprocess(handle1, "PROC2",
                         process2, entry, CLK.pos());
        end_module();
    }

    void entry();
};

Example - module.h
#include "process1.h"
#include "process2.h"
struct mymod : public sc_module {
    sc_in<bool> m;
    sc_out<bool> n;
    sc_out<bool> p;
    sc_in_clk CLK1;
    sc_in_clk CLK2;

    // Internal signals
    sc_signal<bool> sig_int;

    // Component processes
    process1 P1;
    process2 P2;

    // Constructor
    mymod(sc_module_name NAME)
        : P1("PROC1"), P2("PROC2") {
        // port names match the declarations above (m, n, p)
        P1(m, m, sig_int, p, CLK1);
        P2(sig_int, p, n, CLK2);
    }
};

module.h - Module declaration


#include "process1.h"               // Interface files of component processes
#include "process2.h"               // need to be included
struct mymod : public sc_module {   // mymod: name of the module
    // (ports, internal signals, component processes and constructor
    //  as in module.h)
};

• Like a process, a module is a struct
• A module inherits from the class sc_module defined in SystemC™

module.h - Data Members
struct mymod : public sc_module {
    // (ports as in module.h)

    // Internal signals: internal signals are data members
    sc_signal<bool> sig_int;    // Not a reference, but an actual signal

    // Component processes: component processes are data members
    process1 P1;
    process2 P2;

    // (constructor as in module.h)
};

Note that constructors for component processes are not called yet, i.e.
process instantiation is not yet complete.

module.h - Constructor
    // Constructor: has the same name as the module
    mymod(sc_module_name NAME)
        : P1("PROC1"), P2("PROC2") {    // Process constructors: process
                                        // instantiation is completed here
        // Initialize module
        P1(m, m, sig_int, p, CLK1);     // Clock of P1 comes from port CLK1
        P2(sig_int, p, n, CLK2);
    }

Port e of P2 gets connected to sig_int, port f of P2 to port p of the
module and port g of P2 to port n of the module.

Module Instantiation

• Modules may be instantiated inside other modules
• A module does not have an entry function
• A module does not need an implementation file
• A module is instantiated like a process:

Example:
sc_signal<bool> sig1;
sc_signal<bool> sig2;
sc_signal<bool> sig3;
sc_clock clk1("Clock1", 10, 0.5);
sc_clock clk2("Clock2", 10, 0.8);

mymod mod1("MOD1", sig1, sig2, sig3, clk1, clk2);

Example Lab 8b: Dashboard Controller 1


Primary Objective:
The way constructors are written in header files has changed considerably
since Version 0.9. This lab shows how to convert from one style to the other.

Basic Information:
• This controller contains a speedometer, two odometers (total and partial distance), a clock, and the pulse generator.
• The pulses are generated by sensors placed around one of the wheel shafts.
• The rate of pulse generation is determined by the speed of the car, which is constant at 120 km/h.
• The clock represents real time. The signals in this program are traced.
• The simulation is stopped by the odometers module.
Example Lab 8b: Dashboard Controller 2

[Figure: dashboard controller block diagram with the labels Driver (stimulus generator), Sensor #1 to Sensor #4, Pulse, read filter, counter, and mileage]

Example Lab 8b: Dashboard Controller 3


Original version Dash0
Purpose: no environment module; multiple modules at one level; single processes within each module; input, output and clock ports; internal and external signals; asynchronous function and thread processes; one clock; tracing.

struct dist_mod : sc_module {
    // Ports:
    sc_in<bool> pulse;   // Pulse coming from the pulse generator.

    // Compute the total and partial distances traveled.
    void get_dist_proc();

    dist_mod(const char *NAME) : sc_module(NAME) {
        // a simple macro DEF_PROC(async_t, get_dist_proc, dist_mod);
        //#define DEF_PROC(type, proc, mod) \
        //    sc_ ## type ## process(proc ## _handle, #proc, mod, proc);
        sc_async_tprocess(get_dist_proc_handle, "get_dist_proc", dist_mod,
                          get_dist_proc);
        sensitive_pos << pulse;

        end_module();
    }
};

sc_async_tprocess: Asynchronous Thread Process
sc_async_fprocess: Asynchronous Function Process
sc_sync_tprocess: Synchronous Thread Process
Example Lab 8b: Dashboard Controller 4
Version Dash1
Purpose (in terms of changes to Dash0's purpose): environment module (driver); multiple clocks.

struct driver_mod : sc_module {
    // Input ports:
    sc_in_clk clk;            // Clock for the actions of the driver.
    sc_in<double> speed;
    sc_in<double> angle;
    sc_in<double> total;
    sc_in<double> partial;

    // Output ports:
    sc_out<bool> reset;       // Set if the driver wants to reset the partial
                              // distance odometer.
    sc_out<int> speed_set;    // Speed of the car as set by the driver.
    sc_out<bool> start;       // Set if the driver starts the car.

    // Driver's actions (two processes in this module).
    void driver_out_proc();
    void driver_in_proc();

    driver_mod(const char *NAME) : sc_module(NAME) {
        sc_async_tprocess(handle1, "DRIVER_OUT_PROC", driver_mod, driver_out_proc);
        sensitive_pos << clk;

        sc_async_fprocess(handle2, "DRIVER_IN_PROC", driver_mod, driver_in_proc);
        sensitive << speed << angle << total << partial;

        end_module();
    }
};

sc_async_tprocess: Asynchronous Thread Process
sc_async_fprocess: Asynchronous Function Process
sc_sync_tprocess: Synchronous Thread Process

Example Lab 8b: Dashboard Controller 5


Version Dash4
Purpose (in terms of changes to Dash3's purpose): new style of declaring modules and processes (i.e., via the use of module_name).

struct driver_mod : sc_module {
    // Input ports:
    sc_in_clk clk;            // Clock for the actions of the driver.
    sc_in<double> speed;
    sc_in<double> angle;
    sc_in<double> total;
    sc_in<double> partial;

    // Output ports:
    sc_out<bool> reset;       // Set if the driver wants to reset the partial
                              // distance odometer.
    sc_out<int> speed_set;    // Speed of the car as set by the driver.
    sc_out<bool> start;       // Set if the driver starts the car.

    // Driver's actions (two processes in this module).
    void driver_out_proc();
    void driver_in_proc();

    driver_mod(sc_module_name NAME) {
        sc_async_tprocess(handle1, "DRIVER_OUT_PROC", driver_mod, driver_out_proc);
        sensitive_pos << clk;

        sc_async_fprocess(handle2, "DRIVER_IN_PROC", driver_mod, driver_in_proc);
        sensitive << speed << angle << total << partial;
    }
};

sc_async_tprocess: Asynchronous Thread Process
sc_async_fprocess: Asynchronous Function Process
sc_sync_tprocess: Synchronous Thread Process

Summary

• Hierarchy for Modular Design
  - covered sc_module for modular design
  - covered module constructor macros
Summary for Day Two

• Day Two
  - covered synchronous processes
  - covered process execution order
  - covered main.cc and how to connect all processes together
  - covered channels for high-level abstraction protocols
  - covered hierarchical design methodology

Lab 8: Simple Arithmetic Pipeline Design II - 1


Primary Objective:
Write the module interface file for a hierarchical design

We will return to Lab 2 - Simple Arithmetic Pipeline Design.

This lab introduces hierarchical design by grouping individual processes into a module and instantiating the modules in main.cc.
First: Write a stage1_2.h interface file using stage1.h and stage2.h
Second: Write a testbench.h interface file using numgen.h and display.h
Third: Modify main.cc

Remember:
Stage1.h, Stage2.h and Stage3.h contain the process declarations
Stage1.cc, Stage2.cc and Stage3.cc contain the process functionality
Numgen.h and display.h contain the stimulus and control process declarations
Numgen.cc and display.cc contain the stimulus and control functionality
main.cc contains the main entry point and instantiates all the processes
Lab 8: Simple Arithmetic Pipeline Design II - 2
[Figure: file hierarchy, top to bottom]
Stage1.cc, Stage2.cc, Stage3.cc, numgen.cc, display.cc
Stage1.h, Stage2.h, Stage3.h, numgen.h, display.h
Stage1_2.h, Testbench.h
Pipeline.h
main.cc

Lab 8: Simple Arithmetic Pipeline Design II - 3


[Figure: Pipeline block diagram. Stage1 takes In1 (a) and In2 (b) and produces Sum (a+b) and Diff (a-b); Stage2 produces Prod (a*b) and Quot (a/b); Stage3 produces Powr (a^b). All signals are <double>. Stage1 and Stage2 are grouped into Stage1_2.]

Tip: Make sure you have sc_signal in your module interface file.
Lab 8: Sample Output
Sample output:


Agenda: Day Three

DAY
3
Unit Topic Lab

14 Global and Local Watching

15 Modeling BUS with Resolved Vector

16 Refinement

17 Functional I/F

18 Hardware/Software Co-verification

Modeling Real-Time Systems

A system that maintains a continuous, timely interaction with its environment

[Figure: the environment supplies inputs to the real-time system, which holds state and returns outputs to the environment]

outputs = f (inputs, state)
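
As a rough illustration of outputs = f (inputs, state), the plain C++ sketch below (no SystemC library needed) models a system whose output depends on both the current inputs and a stored state. The names CounterSystem, step and limit are hypothetical and are not part of the workshop material.

```cpp
#include <cassert>

// Hypothetical example: a saturating event counter.
// state  : number of pulses counted so far
// inputs : pulse (count one event) and reset (clear the count)
// output : the updated count
struct CounterSystem {
    int state = 0;   // internal state kept between calls
    int limit = 3;   // saturation limit (an assumption for this sketch)

    // One interaction with the environment: outputs = f(inputs, state).
    int step(bool pulse, bool reset) {
        assert(limit >= 0);
        if (reset) {
            state = 0;                 // reset has priority over counting
        } else if (pulse && state < limit) {
            ++state;                   // count until the limit is reached
        }
        return state;
    }
};
```

The same inputs can give different outputs on different calls, because the result also depends on the state left behind by earlier calls.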


Modeling Reactivity
A reactive system is in constant interaction with its environment
• Reading inputs and producing outputs in response to these inputs
• Responding to exceptions like interrupts and resets

• Accepting inputs and producing outputs
  - Model this using signals

• Responding to exceptions
  - Exceptions are interrupts, resets, etc.
  - Responding to an exception requires aborting the execution of the process and executing an exception handler
  - Global or local
Watching
• Mechanism for handling exceptions.
• Check for exception conditions at every clock edge.

void
adder::entry()
{
    while (true) {
        out = in1 + in2;
        wait();
        if (reset == 1) { /* do something */ }
        out = in1 + in2 + 2;
        wait();
        if (reset == 1) { /* do something */ }
    }
}

"This method is cumbersome!"

Watching - What it is
• The SystemC mechanism for this is watching for an event
• An event is watched in a particular region. Control flow is diverted from its normal path whenever that event occurs
  - Control flow is diverted regardless of the state of execution
  - Control flow is diverted to a pre-determined place
• Watching an event does not affect the normal execution of the process
  - Only when the watched events occur does control get diverted

[Figure: two watched regions, each with its own exception handler]
Watching Characteristics - Syntax
A watched event is specified as an expression
involving signals

Syntax:
watching ( expression );
expression: Signals of type sc_signal<bool>
and operators: ==, !=, &&, ||

delayed() method is
used to indicate that
signal should be read
at every cycle

Example:
watching (reset.delayed() == 1);


Watching Characteristics

• The signals in the watching expression are sampled only at the active edge of the process.
• Signals are sampled at every active edge.
• Since wait() or wait_until() statements suspend execution until the next active edge, one can think of the watched signals as being sampled after every wait() or wait_until() statement.

[Figure: waveform of the process clock; watched signals are sampled at each active edge of the process]
Global Watching

• Global watching is used for events that have to be watched at all times during the execution of the process
  - An example of such an event is a reset
• Globally watched events need to be registered in the constructor of the process
• Globally watched events are permanently registered, i.e. their watching cannot be disabled at any time
• Only in sc_sync processes

Watching Example - Interface File: accumulator.h


// Interface File accumulator.h
struct accumulator : public sc_module {
    sc_in<int> number;        // input
    sc_in<bool> reset_add;    // input
    sc_in<bool> reset_mult;   // input
    . . . .

    // Constructor
    accumulator(const char* NAME)
    . . .
        // Declaration of watched events in the constructor.
        // Each expression returns a boolean. Without the delayed() method
        // the signal would be read only once!
        watching (reset_add.delayed() == true);
        watching (reset_mult.delayed() == true);
    . . .
    }
    // Process functionality in this member function
    void entry();
};

[Figure: number feeds two accumulators. The adder (+) is reset by reset_add and drives sum; the multiplier (*) is reset by reset_mult and drives prod.]
Global Watching Characteristics - 1
• watching() function
  - Must be invoked inside the constructor of the process
  - Registers the globally watched event
  - There can be multiple watching() functions
  - delayed() method is used (more later).
• Globally watched events are permanently registered.
  - Watched throughout execution of the process
  - Cannot be disabled at any time
• Watching an event does not block execution of the process.
  - Normal flow of the process is preempted only when the event occurs
• Watching expression is evaluated only at the active edge of the process.
  - On execution of wait() or wait_until() statements inside a process
  - delayed() method is used to evaluate the expression at the active edge instead of immediately

Global Watching Characteristics - 2


• Watching expression is evaluated at every active edge of the process.
• If the expression is true
  - Control is transferred to the beginning of the entry() function, instead of continuing with the execution of the next sequential statement, and
  - All variables declared in the entry() function are re-defined.
  - The previous values of variables are no longer accessible.
• If the execution state of a process needs to be stored, do so in some data member of the process class.

// Interface File accumulator.h
struct accumulator : public sc_sync {
    const sc_signal<int>& number;        // input
    const sc_signal<bool>& reset_add;    // input
    const sc_signal<bool>& reset_mult;   // input
    sc_signal<int>& sum;                 // output
    sc_signal<int>& prod;                // output
    int sum_acc;                         // internal variable
    int mult_acc;                        // internal variable
entry () function Structure
• The entry() function for watching begins with a check for the watched conditions
  - Executes code for handling these conditions if true
• Ends with an infinite loop
  - Body of the process
  - Statements are executed until watched conditions become true, then
  - Control is diverted to the beginning of the entry() function
  - Code for handling the watched condition is executed

void process_name::entry()
{
    // Put all local variable declarations here
    int a;
    char b;

    // Check for watched condition(s):
    // put the code for handling watched signals here
    if (reset.read() == 1) {
        // Code for handling reset
        a = 0;
    }

    // Infinite loop that contains normal
    // functionality of the process
    while (true) {
        // Normal process functionality
    }
}

Example entry () function


/* Filename accumulator.cc */
/* This is the implementation file for
   synchronous process 'accumulator' */
#include "systemc.h"
#include "accumulator.h"
void accumulator::entry()
{
    int a;   // Local variable - will be
             // redefined on each reset

    // Code to handle resets (handler code):
    // check for watched condition(s)
    if (reset_add.read() == true) {
        sum_acc = 0;
    }
    if (reset_mult.read() == true) {
        mult_acc = 1;
    }

    // Normal operation of the process
    while (true) {
        a = number.read();
        sum_acc += a;
        mult_acc *= a;
        sum.write(sum_acc);
        prod.write(mult_acc);
        wait();
    }
} // end of entry function
Example entry () function notes - 1
• Explicit checks for watched conditions are recommended
  - Important when there are multiple watched conditions and handling behavior is different.
  - Otherwise handlers may get executed the first time the entry() function is executed.
• read() method used
  - Checking the value of the signal in a watched condition is like any other read from a signal.
  - Different from specifying the condition in the watching() function.
• If both resets are asserted simultaneously, then both accumulators are reset.

// Code to handle resets
if (reset_add.read() == true) {
    sum_acc = 0;
}
if (reset_mult.read() == true) {
    mult_acc = 1;
}

Example entry () function notes - 2


• Given the code below (both reset handlers include timing controls):
  - Assume reset_mult asserts and then 2 cycles later reset_add asserts:
  - 2 cycles into the execution of the wait(5) in the reset_mult handler:
    - The handler for reset_mult is preempted
    - Control is diverted to the beginning of the entry() function

[Figure: timeline on which reset_mult is asserted first and reset_add is asserted later]

// Code to handle resets
if (reset_add.read() == true) {
    wait(3);
    sum_acc = 0;
}
if (reset_mult.read() == true) {
    wait(5);
    mult_acc = 1;
}
Local Watching

• Local watching deals with issues arising from global watching characteristics, or behavior that cannot be achieved with global watching.
• Globally watched events:
  - Cannot be disabled.
  - Have the same priority when preempting the execution of a process.
• Locally watched events:
  - Watch events only in certain portions of the code.
  - Prioritize events so a lower priority event cannot preempt the handler of a higher priority event.

Local Watching Syntax

Syntax:
W_BEGIN // Declaration block
watching (event) ;
W_DO // Action block
W_ESCAPE // Escape block
W_END

W_BEGIN: SystemC™ macro
    All watching() functions are called in the declaration block
W_DO: SystemC™ macro
    Code normally executed is placed in the action block
W_ESCAPE: SystemC™ macro
    Code for handlers is placed in the escape block
W_END: SystemC™ macro
    Marks the end of the block watching the events
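
To make the control flow of a watched region more concrete, here is a minimal plain C++ emulation that uses an exception to divert control from an "action block" to an "escape block". It is only a sketch under stated assumptions: it is not SystemC code and does not claim to show how the W_ macros are implemented. The names WatchedEvent, run_accumulator and wait_edge are hypothetical, and the inputs are given as one sample per clock cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Thrown when the watched condition is true at an (emulated) active edge.
struct WatchedEvent {};

// Emulates a clocked multiply-accumulator with a locally watched reset.
// numbers[i] and reset_mult[i] are the values sampled in cycle i; the
// returned vector holds the product written in each cycle.
std::vector<int> run_accumulator(const std::vector<int>& numbers,
                                 const std::vector<bool>& reset_mult) {
    assert(numbers.size() == reset_mult.size());
    std::vector<int> prod_trace;
    int mult_acc = 1;
    std::size_t cycle = 0;

    // Stand-in for wait(): advance one cycle, then sample the watched signal.
    auto wait_edge = [&]() {
        ++cycle;
        if (cycle < reset_mult.size() && reset_mult[cycle]) throw WatchedEvent{};
    };

    while (cycle < numbers.size()) {
        try {                                // plays the role of the action block
            while (cycle < numbers.size()) {
                mult_acc *= numbers[cycle];
                prod_trace.push_back(mult_acc);
                wait_edge();
            }
        } catch (const WatchedEvent&) {      // plays the role of the escape block
            mult_acc = 1;                    // handler: reset the product
            prod_trace.push_back(mult_acc);
            ++cycle;                         // the handler consumes one cycle
        }
    }
    return prod_trace;
}
```

With the inputs {2, 3, 4, 5} and a reset asserted in the third cycle, the action block is abandoned at that edge, the handler writes 1, and normal operation resumes in the following cycle.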

Example Local Watching
/* Filename accumulator.cc */
#include "systemc.h"
#include "accumulator.h"
void accumulator::entry()
{
    int a;
    // Code to handle resets
    if (reset_add.read() == true) {
        sum_acc = 0;
        wait(3);
    }
    while (true) {
        W_BEGIN // local watching block for reset_mult
        {
            watching(reset_mult.delayed() == true);
        }
        W_DO
        { // Normal functionality
            a = number.read();
            sum_acc += a;
            mult_acc *= a;
            sum.write(sum_acc);
            prod.write(mult_acc);
            wait();
        }
        W_ESCAPE
        { // reset_mult handler
            mult_acc = 1;
            wait(5);
        }
        W_END
    }
} // end of entry function

Local Watching Characteristics - 1

• Used only in synchronous processes.
• There can be more than one watched event in the declaration block.
  - These events all have the same priority in terms of preempting execution of the action block.
• Signals in the watching expression are sampled only at the active edge of the process, and at all active edges.
Local Watching Characteristics - 2
• Events being watched are registered when a local watching block is entered (reset_mult).
• Normally, statements in the action block are executed.
• When any event being watched happens:
  - the action block is preempted
  - control is diverted to the escape block

W_BEGIN // local watching block for reset_mult
{
    watching(reset_mult.delayed() == true);
}
W_DO { // action block - normal functionality
    a = number.read();
    sum_acc += a;
    mult_acc *= a;
    sum.write(sum_acc);
    prod.write(mult_acc);
    wait();
}
W_ESCAPE { // escape block - reset_mult handler
    mult_acc = 1;
    wait(5);
}
W_END

Local Watching Characteristics - 3


• Control leaves the watching block through:
  - Normal termination of the action block
    - Escape block is not executed.
  - The escape block
• When control leaves the watching block:
  - Control goes to the first statement after W_END
  - Events being watched in the block are no longer watched.

W_BEGIN // local watching block for reset_mult
{
    watching(reset_mult.delayed() == true);
}
W_DO { // action block - normal functionality
    a = number.read();
    sum_acc += a;
    mult_acc *= a;
    sum.write(sum_acc);
    prod.write(mult_acc);
    wait();
}
W_ESCAPE { // escape block - reset_mult handler
    mult_acc = 1;
    wait(5);
}
W_END
// Something else
Nested Local Watching - In Action Block
W_BEGIN // Local watching block A - event A is being watched
....
W_DO // Action block of A
....
    W_BEGIN // Local watching block B - event B is being watched
    ....
    W_DO // Action block of B
    ....
    W_ESCAPE // Escape block of B
    ....
    W_END // End of local watching block B
....
W_ESCAPE // Escape block of A
....
W_END // End of local watching block A

• If event A occurs, control is transferred to escape block of A
  - Event B is no longer watched and has no effect
• If event B occurs:
  - If local watching block B has not been entered, then no effect until B is entered
  - If local watching block B has been entered, control is transferred to escape block of B
  - If event A occurs, escape block B is preempted and control is transferred to escape block of A
• If events A and B occur simultaneously, control is transferred to escape block of A.

Nested Local Watching - In Escape Block - 1


W_BEGIN // Local watching block A - event A is being watched
....
W_DO // Action block of A
....
W_ESCAPE // Escape block of A
....
W_BEGIN // Local watching block B - event B is being watched
....
W_DO // Action block of B
....
W_ESCAPE // Escape block of B
....
W_END // End of local watching block B
....
W_END // End of local watching block A

• If event A occurs, control is transferred to escape block of A, then if B occurs:
  - If local watching block B has not been entered, then no effect until B is entered
  - If local watching block B has been entered, control is transferred to escape block of B
Nested Local Watching - In Escape Block - 2
W_BEGIN // Local watching block A - event A is being watched
....
W_DO // Action block of A
....
W_ESCAPE // Escape block of A
....
W_BEGIN // Local watching block B - event B is being watched
....
W_DO // Action block of B
....
W_ESCAPE // Escape block of B
....
W_END // End of local watching block B
....
W_END // End of local watching block A

• If event B occurs:
  - If only B then no effect.
  - Then if event A occurs, control is transferred to escape block of A.
  - At some point local watching block B is entered and the condition for event B may transfer control to escape block of B.
• If events A and B occur simultaneously, control is transferred to escape block of A.
  - If local watching block B has not been entered, then no effect until B is entered

Summary

• Global and Local Watching
  - covered global and local watching for reactive behavior, e.g. global or local reset

Arbitrary Length Bit Vector: sc_bv<>

sc_bv<>
• Essentially a SystemC™ array of type bool
• Additional methods for:
  - Bitwise logical operations
  - Operations with C++ strings
  - Operations with arbitrary-precision unsigned and signed integers
  - Operations with arbitrary-precision built-in integers
• Used whenever a vector of bits is required
• Use SystemC™ arithmetic types if the vector of bits is to be interpreted as a number with arithmetic operations performed on it.
sc_bv<> Syntax
Data Type Syntax:
sc_bv<length> variable_name ;

Signal Syntax:
sc_signal_bv<length> signal_name ;

length:
• Specifies the number of elements in the array
• Must be greater than 0
• Must be a compile-time constant

Example:
sc_bv<10> a ;            // variable "a" is a bool vector of 10 bits
sc_signal_bv<10> asig ;  // signal "asig" is a bool vector of 10 bits

Special Array: sc_lv<>


sc_lv<>
• Same as sc_bv, with the additional values of the sc_logic type.
• Represents a vector of multiple-valued logic
• The functions and_reduce(), or_reduce(), and xor_reduce() can be used to obtain the AND-reduction, OR-reduction and XOR-reduction of a logic vector
• Used for modeling tri-state buses
• Values:
  0   logical zero or false (equivalent to bool false)
  1   logical one or true (equivalent to bool true)
  Z   high impedance
  X   unknown
sc_lv<> Syntax
Data Type Syntax:
sc_lv<length> variable_name ;

Signal Syntax:
sc_signal_lv<length> signal_name ;

length:
• Specifies the number of elements in the array
• Must be greater than 0
• Must be a compile-time constant

Example:
sc_lv<10> a ;            // variable "a" is a logic vector of 10 bits
sc_signal_lv<10> asig ;  // signal "asig" is a logic vector of 10 bits

Concatenation of Arrays
• Arrays of type sc_bv and sc_lv can be concatenated.
• The operator is the comma ( , )
• May be used with an array and a scalar if the elements have the same type.
• Concatenation of two scalars is illegal
• Indexing operator [ ] and range methods are applicable

sc_bv<5> a;
sc_bv<4> b;
bool c = false;
b = "1010";
a = (b, c); // a gets 10100
a = (c, b); // a gets 01010
a = (b.range(3,1), c, b[0]); // a gets 10100
(a, b) = "101110000"; // a gets 10111, b gets 0000
(c, a) = "111001"; // c gets 1, a gets 11001
sc_bv<2> x;
bool m, n;
m = false; n = true;
x = (m, n); // concatenation of two scalars - illegal
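
The bit ordering in the comments above can be checked with a small plain C++ model. This is only a sketch: bit vectors are represented as strings of '0' and '1' with the most significant bit first, and the helper names concat, range and split are hypothetical, not SystemC functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Concatenation: the left operand supplies the most significant bits.
std::string concat(const std::string& hi, const std::string& lo) {
    return hi + lo;
}

// Bits left..right inclusive (left >= right); bit 0 is the rightmost character.
std::string range(const std::string& v, std::size_t left, std::size_t right) {
    assert(left >= right && left < v.size());
    return v.substr(v.size() - 1 - left, left - right + 1);
}

// Models assignment to a concatenation: the first hi_width bits of value go
// to the left operand, the remaining bits to the right operand.
std::pair<std::string, std::string> split(const std::string& value,
                                          std::size_t hi_width) {
    assert(hi_width <= value.size());
    return {value.substr(0, hi_width), value.substr(hi_width)};
}
```

For example, with b = "1010" and c = "0", concat(b, c) gives "10100" and split("101110000", 5) gives "10111" and "0000", matching the comments in the slide code.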

Types & bit widths
• Objects of compatible types but different bit-widths:
Step 1: Identify the maximum precision required for the expression
  - Size of the operand on the LHS in the case of assignments
  - Size of the largest operand when not an assignment
Step 2: Promote the smaller operand to the size of the larger when the operands differ in size.
  - Depending upon operand type, either sign-extension or zero-extension.
Step 3: Each operation is performed with an operator of the appropriate size
  - No precision is lost as a result of the operation
  - If the result is larger than the maximum precision from step 1:
    - Truncate the most significant bits

Given: a, b, c, d are all 8 bit quantities
d = a * b + c ;
• maximum precision is 8 bits (step 1)
• a * b result is 16 bits, truncate to 8 bits (result a*b)
• (result a*b) + c result is 9 bits, truncate to 8 bits
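
The truncation steps for d = a * b + c can be traced with ordinary unsigned arithmetic. The sketch below is plain C++, not SystemC, and assumes unsigned 8 bit operands; the helper names truncate and mul_add_8bit are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low `bits` bits of v (truncate the most significant bits).
std::uint32_t truncate(std::uint32_t v, unsigned bits) {
    assert(bits > 0 && bits < 32);
    return v & ((1u << bits) - 1u);
}

// d = a * b + c with every intermediate result truncated to 8 bits.
std::uint8_t mul_add_8bit(std::uint8_t a, std::uint8_t b, std::uint8_t c) {
    std::uint32_t product = truncate(std::uint32_t(a) * b, 8);  // up to 16 bits -> 8 bits
    std::uint32_t sum = truncate(product + c, 8);               // up to 9 bits -> 8 bits
    return static_cast<std::uint8_t>(sum);
}
```

For a = 200, b = 3, c = 100 the product 600 is truncated to 88, and 88 + 100 = 188 fits in 8 bits, so d is 188. This equals (200 * 3 + 100) modulo 256, since truncating after each step gives the same low 8 bits as truncating once at the end.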

Resolved Signals

• Resolved signals have two or more drivers.
• Typically used to model busses with tri-state drivers
• A resolution function is applied to the set of values written by the drivers to determine the value of the signal.
• Using resolved and non-resolved signals
  - Identical as far as external interfaces are concerned
  - Only difference is the signal update mechanism (resolution function or not).
  - Input ports declared as references to non-resolved signals may be connected to non-resolved or resolved signals.
  - Output ports of resolved signal type must be used when the port is connected to resolved signals.
Resolved Signal DataType: sc_signal_rv<>

• C++ template class - sc_signal_rv<length>
• Used to interface to a tri-state bus.

Resolve | X  0  1  Z
--------+-----------
   X    | X  X  X  X
   0    | X  0  X  0
   1    | X  X  1  1
   Z    | X  0  1  Z

Syntax:
sc_signal_rv<length> signal_name ;

sc_signal_rv<10> a; // a is a 10 bit wide array
                    // of resolved signals of type sc_logic
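
The resolution table can be written out as a lookup in plain C++. This is an illustrative sketch, not the SystemC implementation: logic values are represented by the characters 'X', '0', '1' and 'Z', the function names index_of, resolve and resolve_all are hypothetical, and returning 'Z' for a bus with no drivers is an assumption of this sketch.

```cpp
#include <cassert>
#include <vector>

// Map a logic value to its row/column in the table (order: X, 0, 1, Z).
int index_of(char v) {
    switch (v) {
        case 'X': return 0;
        case '0': return 1;
        case '1': return 2;
        case 'Z': return 3;
    }
    assert(false && "not a logic value");
    return 0;
}

// Resolve two driver values using the table from the slide.
char resolve(char a, char b) {
    static const char table[4][4] = {
        /*        X    0    1    Z  */
        /* X */ {'X', 'X', 'X', 'X'},
        /* 0 */ {'X', '0', 'X', '0'},
        /* 1 */ {'X', 'X', '1', '1'},
        /* Z */ {'X', '0', '1', 'Z'},
    };
    return table[index_of(a)][index_of(b)];
}

// Resolve any number of drivers; an undriven bus floats at 'Z' (assumption).
char resolve_all(const std::vector<char>& drivers) {
    char result = 'Z';
    for (char d : drivers) result = resolve(result, d);
    return result;
}
```

A driver at 'Z' never changes the outcome, so one active driver plus any number of tri-stated drivers yields the active value, while two drivers with conflicting '0' and '1' yield 'X'.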


Concatenation of Signal Arrays

• Signals of bit vectors may be concatenated to form a signal array
• Resulting array serves as a reference to its constituent arrays
  - Writes to the concatenated array are reflected as writes to the original arrays.
• Only sc_signal_bv and sc_signal_lv types
  - Can be concatenated with scalar signals
  - Must have the same underlying type
• Bounds are from 0 to n-1, where n is the sum of the lengths of the concatenated arrays.
Concatenation Example Use

sc_signal<bool> carry;
sc_signal_bv<16> sum;
sc_int<8> a, b;
sc_int<17> temp;
temp = a + b;
(carry, sum).write(temp);   // Similar method for checking overflow
. . .
sc_signal_bv<8> lsb;
sc_signal_bv<8> msb;
sc_bv<16> bus;
bus = (msb, lsb).read();

Summary

• Modeling BUS with Resolved Vector
  - covered data types sc_lv, sc_bv
  - covered resolved and non-resolved signals, namely sc_signal_rv, sc_signal_lv, sc_signal_bv
Lab 9: Master-Slave Bus System - 1
Basic Information:
• Assumption: the slaves are always accessible and never busy with something else
• The data bus is used for both read and write and is accessed by the masters and slaves through resolved signals
• Resolved signals are also used by the masters to drive the target address (on a 32 bit address bus) and the direction control signal

[Figure: two Bus Masters, a Bus Arbiter and two Bus Slaves]

Assumption: the slaves behave perfectly, i.e. they add no latency to an operation

Lab 9: Master-Slave Bus System - 2


Basic Information:
• Since the slave is always accessible, the control protocol is simplified to essentially one control signal, originated by a master
• This control signal conveys the direction of the transfer requested by the slave
• In this scheme, the arbiter allocates the bus for a fixed number of cycles.

[Figure: two Bus Masters, a Bus Arbiter and two Bus Slaves]
Lab 9: Master-Slave Bus System - 3
Basic Information:
• These blocks are all sc_async_fprocess (ver 0.9: sc_async block) triggered by a clock signal
• Masters deliver request signals to the arbiter and monitor a potential grant signal from the arbiter.
• Here a master's request rule is random, with a probability of 1/10 of actually asking for the bus
• When there is a conflict, the grant signal is given at random to one of the bus masters

[Figure: two Bus Masters, a Bus Arbiter and two Bus Slaves]

Lab 9: Master-Slave Bus System - 4


Basic Information:
• For simplicity, the slave is just a block that reads from and writes to the communication medium; it does nothing special with the data
• If the slave is asked for data, a function is called that creates 256 random bits
• If the slave is required to read data, it reads data that have been created in a similar manner

[Figure: two Bus Masters, a Bus Arbiter and two Bus Slaves]
Lab 9: Master-Slave Bus System - 5
Primary Objective:
Create an asynchronous function process for the bus Master (Master.cc) using sc_signal_rv

[Figure: Master block with the ports mClock, mGranted, mRequest, mAddress, mData and mDirection, and the internal variable sampledDirection]

Lab 9: Master-Slave Bus System - 6


// Bus Master interface file
#include "systemc.h"
#include "math.h"

struct Master : sc_module {

    static unsigned int masterSeed;

    sc_out<bool> mRequest;
    sc_inout_rv<32> mAddress;
    sc_inout_rv<256> mData;
    sc_inout<sc_logic> mDirection;

    sc_in_clk mClock;
    sc_in<bool> mGranted;

    sc_logic sampledDirection;

    enum m_state { mStart, mRequesting, mTransfering } mCurrentState, mNextState;

    Master( char* NAME)
        : sc_module (NAME) {
        sc_async_fprocess(handle1, "MASTER", Master, entry);
        sensitive_pos(mClock);
        sensitive(mGranted);
        srand(masterSeed);
        masterSeed++;
        end_module();
    }

    void entry();

    bool fireAtRandomRequest() const;
    sc_logic fireAtRandomDirection() const;
    sc_logic_vector fireAtRandomData() const;
    sc_logic_vector fireAtRandomAddress() const;

};
Lab 9: Master-Slave Bus System - 7
State machine for Bus Master

Read sampledDirection, and mCurrentState = mNextState

Case mCurrentState = mStart:
  Read mData if (sampledDirection == 0)

  fireAtRandomRequest() == true?
    Yes: mRequest <= true and mNextState <= mRequesting
    No:  mNextState <= mStart

  mAddress <= "ZZZZZ…..Z"
  mData <= "ZZZZ…"
  mDirection <= 'Z'

Lab 9: Master-Slave Bus System - 8


State machine for Bus Master

Read sampledDirection, and mCurrentState = mNextState

Case mCurrentState = mRequesting:
  mGranted == true?
    Yes: mRequest <= false and mNextState <= mTransfering
    No:  mNextState = mRequesting

  mAddress <= "ZZZZZ…..Z"
  mData <= "ZZZZ…"
  mDirection <= 'Z'
Lab 9: Master-Slave Bus System - 9
State machine for Bus Master

Read sampledDirection, and mCurrentState = mNextState

Case mCurrentState = mTransfering:
  If (sampledDirection == 0) then print out mAddress and mData

  tmpData = fireAtRandomData();
  cout << "master's Data Choice: " << tmpData << endl;
  tmpDirection = fireAtRandomDirection();
  cout << "master's direction choice: " << tmpDirection << endl;
  tmpAddress = fireAtRandomAddress();
  cout << "master's Address Choice: " << tmpAddress << endl;

Lab 9: Master-Slave Bus System - 10


State machine for Bus Master

tmpDirection == 1?
  Yes: mData.write(tmpData);
       mDirection.write(1);
       mAddress.write(tmpAddress);
  No:  mAddress <= tmpAddress
       mData <= "ZZZZ…"
       mDirection <= 0

mGranted == false?
  Yes: mNextState <= mStart
  No:  DO NOTHING
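
The state transitions in the Bus Master flowcharts can be summarized as a next-state function. The sketch below is plain C++ covering only the transitions, not the signal writes, and is not the lab solution. The names MasterState, next_state, wants_bus and granted are hypothetical stand-ins for the enum m_state, the result of fireAtRandomRequest() and the value of mGranted.

```cpp
#include <cassert>

enum class MasterState { Start, Requesting, Transferring };

// Next-state logic as read from the flowcharts:
//   Start        -> Requesting   when the master decides to ask for the bus
//   Requesting   -> Transferring when the grant arrives
//   Transferring -> Start        when the grant is withdrawn
// In every other case the master stays in its current state.
MasterState next_state(MasterState current, bool wants_bus, bool granted) {
    switch (current) {
        case MasterState::Start:
            return wants_bus ? MasterState::Requesting : MasterState::Start;
        case MasterState::Requesting:
            return granted ? MasterState::Transferring : MasterState::Requesting;
        case MasterState::Transferring:
            return granted ? MasterState::Transferring : MasterState::Start;
    }
    assert(false && "unknown state");
    return MasterState::Start;
}
```

Keeping the transition logic in a function without side effects makes it easy to check each arrow of the flowchart separately from the bus signal handling.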

Lab 9: Master-Slave Bus System - 11
Provided Routine in Master.cc
// Request the bus with a probability of about 1/10.
bool Master::fireAtRandomRequest() const {
    if ( ((double(rand()))/RAND_MAX) > 0.9 ) return true;
    else return false;
}

// Choose the transfer direction with equal probability.
sc_logic Master::fireAtRandomDirection() const {
    if ( ((double(rand()))/RAND_MAX) > 0.5 ) return '1';
    else return '0';
}

// Build 256 random data bits from eight 32 bit words.
sc_logic_vector Master::fireAtRandomData() const {
    unsigned int ui;
    double r;
    sc_logic_vector result(256);

    for ( int i=0; i<8; i++) {
        r = (double(rand())/RAND_MAX);
        ui = (unsigned int)floor(r*pow(2.0,32.0));
        result.range(31+i*32,i*32) = ui;
    }
    return result;
}

// Build a random 32 bit address.
sc_logic_vector Master::fireAtRandomAddress() const {
    unsigned int ui = (unsigned int)floor( (double(rand())/RAND_MAX) * pow(2.0,32.0));
    sc_logic_vector result(32);
    result = ui;
    return result;
}
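
The provided routines derive their probabilities from rand() and RAND_MAX. As an alternative sketch in plain C++, the same "fire with probability p" idea can be written with the standard <random> header. This is not part of the lab code, and the names fire_with_probability and count_fires are hypothetical.

```cpp
#include <cassert>
#include <random>

// Returns true with probability p, drawing from the caller's generator.
bool fire_with_probability(std::mt19937& rng, double p) {
    assert(p >= 0.0 && p <= 1.0);
    std::bernoulli_distribution fire(p);
    return fire(rng);
}

// Counts how many of `trials` independent draws fire.
int count_fires(std::mt19937& rng, double p, int trials) {
    int fires = 0;
    for (int i = 0; i < trials; ++i) {
        if (fire_with_probability(rng, p)) ++fires;
    }
    return fires;
}
```

Passing the generator in explicitly keeps each master's random stream under the caller's control, which plays the role that the masterSeed counter plays in the lab's interface file.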


Lab 9: Sample Output


Sample output:

Lab 9: Additional Topics - 1
Can you model a PCI configuration?

[Block diagram. Blocks shown: PROCESSOR, CACHE, BRIDGE / MEMORY CONTROLLER, DRAM, AUDIO, MOTION VIDEO, PCI BUS, LAN, SCSI, GRAPHICS, and a BRIDGE to ISA / EISA / MICROCHANNEL with BASIC I/O FUNCTIONS]

Refer to Appendix P for details

Lab 9: Additional Topics - 2


PCI COMPLIANT *** MASTER *** DEVICE SIGNALS

Address/Data and Command:          AD[31:0], C/BE#[3:0], PAR
Interface Control:                 FRAME#, TRDY#, IRDY#, STOP#, DEVSEL#, IDSEL
Arbitration:                       REQ#, GNT#
System:                            CLK, RST#
Error Reporting:                   PERR#, SERR#

64-Bit Extension:                  AD[63:32], C/BE#[7:4], PAR64, REQ64#, ACK64#
Atomic Accesses:                   LOCK#
Clock Control:                     CLKRUN#
JTAG Boundary Scan (IEEE 1149.1):  TDI, TDO, TCK, TMS, TRST#
Interrupt Request:                 INTA#, INTB#, INTC#, INTD#
Lab 9: Additional Topics - 3
PCI COMPLIANT *** Target *** DEVICE SIGNALS

Address/Data and Command:          AD[31:0], C/BE#[3:0], PAR
Interface Control:                 FRAME#, TRDY#, IRDY#, STOP#, DEVSEL#, IDSEL
System:                            CLK, RST#
Error Reporting:                   PERR#, SERR#

64-Bit Extension:                  (signals not listed on this slide)
Atomic Accesses:                   (signals not listed on this slide)
Clock Control:                     CLKRUN#
Snoop Result:                      SBO#, SDONE
JTAG Boundary Scan (IEEE 1149.1):  TDI, TDO, TCK, TMS, TRST#
Interrupt Request:                 INTA#, INTB#, INTC#, INTD#

Agenda: Day Three


DAY
3
Unit Topic Lab

14 Global and Local Watching

15 Modeling BUS with Resolved Vector

16 Refinement

17 Functional I/F

18 Hardware/Software Co-verification

Refinement
System Specification

Refine Structure
• Partition into blocks that will be
independently synthesized/refined (HW/SW)
and resource sharing
• Refine interfaces for communication

Refine Control
• Specify I/O protocol
• Specify clock domains
• Specify latency, throughput
• Specify FSM & datapath for RTL

Refine Data
• Use bit-true types
• Select appropriate bit widths

System Implementation

Example: Fast Fourier Transform


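A simple synchronous system in SystemC™ for
16-point FFT computation

[Diagram: a DSP ASIC containing a DSP Core, the FFT block, a CPU Core from Company X, a BUS Controller, DMA and Cache]

[Diagram: SOURCE -> FFT -> SINK; in_real and in_imag run from SOURCE to FFT, out_real and out_imag run from FFT to SINK]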

Example: FFT Testbench

Processes:
Unit Level Validation - requires a stimulus generation process and a
result monitoring process.
There are 3 synchronous processes:

Process                 Files                  Role
SOURCE (sync process)   source.h, source.cc    stimulus generation
FFT (sync process)      fft.h, fft.cc          Unit Under Test
SINK (sync process)     sink.h, sink.cc        result monitoring

Signals between SOURCE and FFT: in_real, in_imag, data_valid, data_req
Signals between FFT and SINK: out_real, out_imag, data_ready, data_ack


Example: Verification
Two Primary methods of verification
Create a local testbench to verify the design

SOURCE FFT SINK


Sync Sync Sync
Process Process Process

sc_main (main.cc)

Include the module in a System wide testbench


Enables HW/SW co-verification

DSP CPU
FFT DMA RTOS
Routine Model

sc_main (main.cc)

Example: FFT Floating Point Version

Module declaration (use of native data types):

struct fft: sc_module {
  sc_in<float> in_real;
  sc_in<float> in_imag;
  sc_in<bool> data_valid;
  sc_in<bool> data_ack;
  sc_out<float> out_real;
  sc_out<float> out_imag;
  sc_out<bool> data_req;
  sc_out<bool> data_ready;
  sc_in_clk CLK;
  . . .

Process body:

void fft::entry()
{ float sample[16][2];
  unsigned int index;

  while(true)
  { data_req.write(false);
    data_ready.write(false);
    index = 0;
    //Reading in the Samples
    cout << endl << "Reading in the samples..."
         << endl;
    while( index < 16 )
    {
      data_req.write(true);
      wait_until(data_valid.delayed() == true);
      sample[index][0] = in_real.read();
      sample[index][1] = in_imag.read();
      index++;
      data_req.write(false);
      wait();
    }
    index = 0;
    ….

Example: FFT Fixed Pt Implementation File


Module declaration:

struct fft: sc_module {
  sc_in<sc_int<16> > in_real;
  sc_in<sc_int<16> > in_imag;
  sc_in<bool> data_valid;
  sc_in<bool> data_ack;
  sc_out<sc_int<16> > out_real;
  sc_out<sc_int<16> > out_imag;
  sc_out<bool> data_req;
  sc_out<bool> data_ready;
  sc_in_clk CLK;

//Function for butterfly computation
void func_butterfly
( const sc_int<16>& w_real /* snps width 16 */,
  const sc_int<16>& w_imag /* snps width 16 */,
  const sc_int<16>& real1_in /* snps width 16 */,
  const sc_int<16>& imag1_in /* snps width 16 */,
  // others ,,,,,,)
{
  // Variable declarations
  sc_int<17> tmp_real1;
  sc_int<17> tmp_imag1;
  sc_int<17> tmp_real2;
  // others ...
  // Begin Computation
  tmp_real1 = real1_in + real2_in;
  // <s,6,10> = <s,5,10> + <s,5,10>
  tmp_imag1 = imag1_in + imag2_in;
  // <s,6,10> = <s,5,10> + <s,5,10>

  // assign the sign-bit(MSB)
  real1_out[15] = tmp_real1[16];
  imag1_out[15] = tmp_imag1[16];
  // assign the rest of the bits
  real1_out.range(14,0) = tmp_real1.range(14,0);
  imag1_out.range(14,0) = tmp_imag1.range(14,0);
  // assign the sign-bit(MSB)
  real2_out[15] = tmp_real3[33];
  imag2_out[15] = tmp_imag3[33];
Arithmetic Operation
Use fixed-width integer datatypes like sc_int/sc_uint instead
of the full native integer range when modeling
data narrower than 32 bits
Multiplying an INTEGER by 2 ** N shifts its bit-level
equivalent N places to the left
Vacated bits on the right are zero filled
Using the bitwise shift operator instead speeds up the simulation
Dividing an INTEGER by 2 ** N shifts its bit-level
equivalent N places to the right
Vacated bits on the left are sign extended
Using the bitwise shift operator instead speeds up the simulation
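
A minimal plain C++ sketch of the shift equivalences, using native integers rather than sc_int; the helper names are hypothetical. One caveat worth knowing: for negative values an arithmetic right shift rounds toward minus infinity, while C++ integer division rounds toward zero, so the two only agree for non-negative operands or exact multiples.

```cpp
#include <cassert>
#include <cstdint>

// Multiply by 2^n with a left shift; vacated bits on the right are zero.
// Assumes the product still fits in 32 bits.
inline std::uint32_t mul_pow2(std::uint32_t x, unsigned n) {
    assert(n < 32);
    return x << n;
}

// Divide by 2^n with a right shift; the sign bit is replicated on the left.
// Right-shifting a negative value is arithmetic on mainstream compilers
// and is guaranteed to be arithmetic from C++20 on.
inline std::int32_t div_pow2_floor(std::int32_t x, unsigned n) {
    assert(n < 32);
    return x >> n;
}
```

For example, `mul_pow2(5, 3)` gives 40 and `div_pow2_floor(40, 3)` gives 5.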


Unnecessary Repeated Computations

Before:

for (K=1;K <=7;K++) {
  if (K > (A - 1))
    S(K) = '1';
  else
    S(K) = '0';
}

A-1 has the same fixed value during all iterations of the for..loop

After:

Temp = A - 1;
for (K=1;K <=7;K++) {
  if (K > Temp)
    S(K) = '1';
  else
    S(K) = '0';
}

A - 1 was manually removed from the for..loop
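
The two loops can be compared directly in plain C++. This is a sketch with hypothetical function names; the array index 0 is unused so that K runs from 1 to 7 as on the slide. An optimizing C++ compiler will often hoist such an invariant by itself, so the manual rewrite matters most where the description is interpreted or synthesized as written.

```cpp
#include <array>

// Recomputes a - 1 in every iteration.
inline std::array<char, 8> flags_naive(int a) {
    std::array<char, 8> s;
    s.fill('0');
    for (int k = 1; k <= 7; ++k) {
        s[k] = (k > (a - 1)) ? '1' : '0';
    }
    return s;
}

// Computes the loop-invariant a - 1 once, before the loop.
inline std::array<char, 8> flags_hoisted(int a) {
    std::array<char, 8> s;
    s.fill('0');
    const int temp = a - 1;
    for (int k = 1; k <= 7; ++k) {
        s[k] = (k > temp) ? '1' : '0';
    }
    return s;
}
```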

Architectural Trade-Offs : Additions
Designer Makes Architecture Tradeoffs
Example: Serial vs. Parallel

Shared (one adder; SEL picks one of X0..X3 per cycle, RST gates the accumulated value Z):

X = (SEL == 3) ? X3 : (SEL == 2) ? X2 : (SEL == 1) ? X1 : X0;
Z1 = X + (Z & RST);
Z = Z1;
wait();

Parallel (all inputs summed at once):

Z = X0 + X1 + X2 + X3;

Carrysave Example In Decimal

Y = D + E + F
  = 945 + 446 + 715 = 2106

Conventional Arithmetic

  Ripple Carry  101
                945   D
              + 446   E
               1391   Temp

  Ripple Carry  110
               1391   Temp
              + 715   F
               2106   Y

Carrysave Arithmetic

                945   D
                446   E
              + 715   F
                096   Sum0
               201    Carry0

  Ripple Carry  010
                096   Sum0
             + 201    Carry0
               2106   Y
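
The same idea in binary can be sketched in a few lines of plain C++. The struct and function names are hypothetical. Each bit position produces a sum bit (XOR of the three inputs) and a carry bit (majority of the three inputs, shifted one place left); no carry ripples until the final addition.

```cpp
#include <cstdint>

// Result of one carry-save stage: two words whose sum equals a + b + c.
struct CarrySave {
    std::uint32_t sum;
    std::uint32_t carry;
};

// One 3-to-2 carry-save stage, computed bitwise with no carry propagation.
inline CarrySave csa(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return CarrySave{ a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1 };
}

// Three-operand addition: one carry-save stage, then one ordinary adder.
inline std::uint32_t add3(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    const CarrySave r = csa(a, b, c);
    return r.sum + r.carry;  // the only carry-propagating addition
}
```

With the slide's operands, `add3(945, 446, 715)` returns 2106.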

Modeling Optimal Arithmetic

Conventional Arithmetic
Y = (a+b) + (c+d) + (e+f)
• Additions in groups of 2
• Complete computation at the end of each operator

Carry-Save-Adder (CSA) Arithmetic
Y = (a+b+c) + (d+e+f)
• Additions in groups of 3
• Partial intermediate computation
  (saves area and improves speed)

Carrysave arithmetic
is the most effective
datapath technique


Carrysave Transformation, Addition


Without CSA: a b c d e f g h are summed by a tree of carry-propagate adders (+).
Carry delay incurred three times.

With CSA (CSA Transformation, Level 1): a b c d e f g h are reduced by a tree of
csa stages followed by one carry-propagate adder (+).
Carry delay incurred once.

CSA Tree Height
- Output is 2 bits per position

Levels   # of Addends
   1          3
   2          4
   3          6
   4          9
   5         13
   6         19
   7         28
   8         42
   9         63
  10         94

(for carry-look-ahead Arch.)

Input width   w/o csa area    w/csa area    improvement
     8             965            438           44%
    16            2310            897           61%
    32            5617           1957           65%

Input width   w/o csa delay   w/csa delay   improvement
     8            4.02           3.77            6%
    16            4.90           4.14           16%
    32            5.83           4.37           25%
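
The addend counts in the tree-height table are consistent with a simple recurrence: one CSA level turns every group of three rows into two, so if k levels can handle n rows, k + 1 levels can handle floor(3n / 2). A small sketch with a hypothetical function name:

```cpp
#include <cassert>

// Largest number of addends that a CSA tree with the given number of
// levels can reduce to two rows (sum and carry).
inline unsigned max_addends(unsigned levels) {
    assert(levels >= 1);
    unsigned n = 3;  // one level: a single 3-to-2 stage
    for (unsigned k = 1; k < levels; ++k) {
        n = (3 * n) / 2;  // each extra level absorbs 3 rows per 2 outputs
    }
    return n;
}
```

Calling it for levels 1 through 10 reproduces the sequence 3, 4, 6, 9, 13, 19, 28, 42, 63, 94.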
Ripple Carry Adder - Review

A[3] B[3] A[2] B[2] A[1] B[1] A[0] B[0] Cin

A B CI A B CI A B CI A B CI

CO S CO S CO S CO S

Cout SUM[3] SUM[2] SUM[1] SUM[0]

Area = (bit-width) * (area of a full-adder cell) = [Order n]


Delay = (bit-width) * (delay of a full-adder cell) = [Order n]


Carry Look Ahead Adder - Review

A[3] B[3] A[2] B[2] A[1] B[1] A[0] B[0] Cin

A3 B3 A2 B2 A1 B1 A0 B0
CLA Logic
C4 C3 C2 C1

A[3] B[3] A[2] B[2] A[1] B[1] A[0] B[0] Cin

Cout SUM[3] SUM[2] SUM[1] SUM[0]

Area = approximately (1 + Log2(bit-width)) * (area of a ripple adder)


= [Order n*log2(n)]
Delay = delay of CLA block + delay of a full-adder cell
This is approximately (1 + Log2(bit-width)) * (delay of a full-adder cell)
= [Order log2(n)]

Carry Select Adder - Example
Y = A[9:0] + B[9:0]

4 bit ripple adder 3 bit ripple adder 2 bit ripple adder 1 bit ripple adder
A[9:6] B[9:6] A[5:3] B[5:3] A[2:1] B[2:1] A[0] B[0]

FA FA FA FA 0 FA FA FA 0 FA FA 0 FA 0

0 0 0 0
Cout
1 A[9:6] B[9:6] 1 A[5:3] B[5:3] 1 A[2:1] B[2:1] 1 A[0] B[0]

FA FA FA FA 1 FA FA FA 1 FA FA 1 FA 1

Cin

0 1 0 1 0 1 0 1

Y[9:6] Y[5:3] Y[2:1] Y[0]

•The key to designing this structure is to balance the delays

•Assume mux and FA delays are equal


10 bit adder critical path = 5 gate delays


Carry Propagate Adder Types


fastcla (Fast Carry Look Ahead)
  Dense Carry Look Ahead Tree
  Big [nlog(n)] and Fast [log(n)]

cla (Carry Look Ahead)
  Sparse Carry Look Ahead Tree. Slower
  Twice the Delay of fastcla 2*[log(n)-1], much less Area (n)

csa (Carry Select Adder)
  Generally Slower and Smaller than CLA
  Reacts to Delay Skews (e.g. Multipliers, FIR)
  Works Very Well with Moderate Pipelining

clsa (Carry Look Ahead Select)
  Very Flexible (ripple to fastcla)
  Poor Pipelining
  Both csa and clsa are variable block adders

ripple (Ripple Carry Adder)


Subtraction in 2’s Complement
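y = a - b
y = a + (~b) + 1

(~b) + 1 is the two's complement of b. The + 1 enters through the adder's
Cin, tied to Logic1, so it is free!

Example:
y = a+b-c-d
  = a+b+(~c)+(~d) +2

[Diagram: 8-bit operands a, b, c, d; the two subtractors are replaced by CSA stages and one adder (+), with the Cin inputs tied to Logic1, Logic1 and Logic0]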
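
Both identities can be checked on 8-bit values in plain C++. The function names are hypothetical, and the results wrap modulo 256 just as an 8-bit datapath would.

```cpp
#include <cstdint>

// y = a - b computed as a + (~b) + 1 on 8 bits.
inline std::uint8_t sub_via_add(std::uint8_t a, std::uint8_t b) {
    const std::uint8_t not_b = static_cast<std::uint8_t>(~b);
    return static_cast<std::uint8_t>(a + not_b + 1u);
}

// y = a + b - c - d computed as a + b + (~c) + (~d) + 2 on 8 bits.
inline std::uint8_t add_add_sub_sub(std::uint8_t a, std::uint8_t b,
                                    std::uint8_t c, std::uint8_t d) {
    const std::uint8_t not_c = static_cast<std::uint8_t>(~c);
    const std::uint8_t not_d = static_cast<std::uint8_t>(~d);
    return static_cast<std::uint8_t>(a + b + not_c + not_d + 2u);
}
```

In hardware the constant 2 costs nothing extra because each +1 is fed in through an otherwise unused carry input.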


Architectural Trade-Offs : Shifting; RGB to YUV


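Shifting (conceptual view)
[Figure: Shared version with inputs SHIFT, IN, LEFT: one >> shifter placed between two BITREV stages, output OUT. Parallel version with inputs SHIFT, IN, LEFT: separate << and >> shifters, output OUT.]

RGB to YUV (conceptual view)
[Figure: Parallel version: three Σ AxB units produce Y, U and V. Serial version: one Σ AxB unit produces Y, U and V. Inputs in both versions: [RGB] and the coefficient sets [C0], [C1], [C2].]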

Architectural Trade-Offs : Complex Multiply

Parallel version (conceptual view: four multipliers, one subtractor, one adder)

SystemC Code
Real = a*c - b*d;
Imag = b*c + a*d;

Golub’s version (conceptual view: three multipliers, with sum and diff computed first)

SystemC Code
sum = a + b;
diff = c-d;
ad = a*d;
bc = b*c;
Real = (sum * diff) + ad - bc;
Imag = bc + ad;
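
The three-multiplier form works because (a + b)(c - d) = ac - ad + bc - bd, so adding ad and subtracting bc leaves ac - bd. A plain C++ sketch with hypothetical names makes the equivalence easy to test:

```cpp
// Complex product (a + jb) * (c + jd), integer components for simplicity.
struct ComplexInt {
    long re;
    long im;
};

// Direct form: four multiplications.
inline ComplexInt cmul_parallel(long a, long b, long c, long d) {
    return ComplexInt{ a * c - b * d, b * c + a * d };
}

// Three-multiplication form from the slide.
inline ComplexInt cmul_three_mult(long a, long b, long c, long d) {
    const long sum = a + b;
    const long diff = c - d;
    const long ad = a * d;
    const long bc = b * c;
    return ComplexInt{ sum * diff + ad - bc, bc + ad };
}
```

The trade is one multiplier for three extra additions, and the intermediate `sum * diff` needs one more bit of headroom on each operand than the direct form.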


What is a Finite State Machine?


Usage: the control block sends out control signals that steer the
behavior of the data-processing block

R1 S1 S2
add_1
* +
S3
R2

FSM R3
state
reg Decode
Logic

Finite State Machine Overview
Two general types of Finite State Machines

Implicit FSM descriptions
• Suited for “structured” sequential systems
• Easier to read & debug

Explicit FSM descriptions
• Easier to write with many state transitions or “unstructured” systems


Implicit FSM style


No state register -
states are implied

A while loop with
one or more clock
cycles.

States separated by
clock boundaries

while (true) {
  total <= data; // state A
  wait();
  total <= total + data; // state B
  wait();
  total <= total + data; // state C
  wait();
}

Explicit FSM style

Inputs Output
Next state Logic Outputs
logic cloud State cloud
Vector
clk

• Mealy machine: Outputs derived from present state & inputs


• Moore machine: Outputs derived from present state
while (true) {
………..
case IDLE:
cmd = opcode.read();
switch (cmd)
{
case NOP:
next_state = IDLE;
break;
case RDWD:
……...
}
Synchronous Mealy Machine Overview

Next-State State Output


inputs Logic Registers current state Logic outputs

clock

reset

Synchronous Moore Machine Overview

Next-State State Output


inputs Logic Registers current state Logic outputs

clock

reset


Explicit FSM style


•Several ways to describe explicit FSM
•Sequential: steps the state machine

•Combinational:

•A case statement, one state per branch.

•State register is the case selector.

•Next state assignments are explicit in each branch.

•Outputs are assigned in each branch.

•OR
•Combinational logic is separated from the next state register
assignments

•Reduced FSM code-size for better simulation efficiency


Alternative Coding Styles for Synchronous FSMs
One process only
Handles both state transitions and outputs
Two processes
A synchronous process for updating the state register
A combinational process for conditionally deriving the next
machine state and updating the outputs
Three processes
A synchronous process for updating the state register
A combinational process for conditionally deriving the next
machine state
A combinational process for conditionally deriving the outputs
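
As a rough illustration of the split between combinational and synchronous parts, here is a three-state accumulator (state A loads the data, states B and C add to the total) written as an explicit FSM in plain C++. The names are hypothetical and no SystemC is used; `clock_edge` plays the role of the synchronous process, and the two helper functions play the role of the combinational processes.

```cpp
// Explicit FSM sketch: a three-state accumulator.
enum class State { A, B, C };

struct Accumulator {
    State state = State::A;
    int total = 0;
};

// Combinational: next state depends only on the current state.
inline State next_state(State s) {
    switch (s) {
    case State::A: return State::B;
    case State::B: return State::C;
    case State::C: return State::A;
    }
    return State::A;
}

// Combinational: next value of total from current state and input.
inline int next_total(State s, int total, int data) {
    return (s == State::A) ? data : total + data;
}

// Synchronous: registers are updated together on the clock edge.
inline void clock_edge(Accumulator& acc, int data) {
    const int t = next_total(acc.state, acc.total, data);
    const State n = next_state(acc.state);
    acc.total = t;
    acc.state = n;
}
```

Computing both next values before assigning either mirrors how registers sample their inputs simultaneously.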


Converting Existing HDL to SystemC


SystemC supports many different data types and process types
for different levels of abstraction and simulation speed.

Here we will use a simple RISC core and show you how to
convert it from Verilog to SystemC code easily.

SystemC

A Simple Verilog RISC Core - 1
//
module RISC;

parameter
  RWIDTH = 16,
  IWIDTH = 32,
  ZERO = 0,
  NEG = 1,
  CARRY = 2;

parameter
  OP_ADD = 6'b000000,
  OP_ADD_PLUS_ONE = 6'b000001,
  OP_A = 6'b000010,
  OP_A_PLUS_ONE = 6'b000011,
  OP_AND = 6'b010001,
  OP_NOT_A_AND_B = 6'b010010,
  OP_B = 6'b010011,
  OP_NOT_A_AND_NOT_B = 6'b011000,
  OP_A_XNOR_B = 6'b011001,
  OP_NOT_A = 6'b011010,
  OP_NOT_A_OR_B = 6'b011011,
  OP_SUB_MINUS_ONE = 6'b000100,
  OP_SUB = 6'b000101,
  OP_A_MINUS_ONE = 6'b000110,
  OP_A_AND_NOT_B = 6'b010100,
  OP_A_XOR_B = 6'b010110,
  OP_A_OR_B = 6'b010111,
  OP_NOT_B = 6'b011100,
  OP_A_OR_NOT_B = 6'b011101,
  OP_NAND = 6'b011110,
  OP_JTRUE = 6'b100000,
  OP_JFALSE = 6'b100010,
  OP_HALT = 6'b111111,
  OP_CALL= 6'b010000,
  OP_RET = 6'b001000,

  ALU_NEG = 8'b00000000,
  ALU_ZERO = 8'b00000001,
  ALU_CARRY = 8'b00000010,
  ALU_NEG_ZERO = 8'b00000011,
  BOOL_SHIFT_ZERO = 8'b00000100,

  ALU_TRUE = 8'b00111111;


A Simple Verilog RISC Core - 2

// //
reg [31:0] instr_mem[0:255]; function [2:0] check_result ;
reg [IWIDTH-1:0] instruction; input [RWIDTH-1 :0] result;
reg [15:0] pc; //Program Counter input [RWIDTH-1 :0] op1;
reg [15:0] regfile[0:127]; input [RWIDTH-1 :0] op2;
reg [7:0] ra,rb; //Register address for op1,op2; begin
reg [RWIDTH-1:0] op1,op2;
reg [7:0] wr; //Write address check_result = 0;
reg [RWIDTH-1:0] result; //result of ALU
operation if ( result == 0)
reg [7:0] cond; //Condition check_result[0] = 1'b1;
reg [7:0] ja; //Jump address else
reg [7:0] ssr; //Status register check_result[0] = 1'b0;
reg clk;
reg [7:0] stack_ssr[0:127]; //Stack to store ssr if ( result[RWIDTH-1] == 1 )
info check_result[1] = 1'b1;
reg [15:0] stack_pc[0:127]; //Stack to store pc else
info check_result[1] = 1'b0;

if ( (op1[RWIDTH-1] ^ op2[RWIDTH-1] ) != 0 )
reg [1:0] instr_type; check_result[2] = 1'b0;
reg [5:0] opcode; else if ( ( result[RWIDTH-1] ^ op1[RWIDTH-1]
integer i; //Pointer for stack memory ) ==1 )
check_result[2] = 1'b1;
integer clk_count;
else

check_result[2] = 1'b0;
end
endfunction
A Simple Verilog RISC Core - 3

// //
initial always @( posedge clk)
begin begin
clk = 1; if ( clk_count == 10000000)
pc=0; $finish;
i=0; clk_count = clk_count+1;
ssr=0; instruction=instr_mem[pc];
$readmemh("instruction.hex",instr_mem); // instr_type[1:0]=instruction[31:30];
$readmemh("data.hex",regfile); // opcode[5:0]=instruction[IWIDTH-3:IWIDTH-8];

// regfile[0] = 16'b0000000000000001 ; // value


of op1 case(instruction[31:30])
// regfile[1] = 16'b0000000000000011 ; // value 2'b00 :begin
of op2 cond=instruction[IWIDTH-9:IWIDTH-16];
// instr_mem[0] = ja=instruction[IWIDTH-17:IWIDTH-24];
32'b01000001000000110000000000000001 ; end
end 2'b01 :begin
wr=instruction[IWIDTH-9:IWIDTH-16];
initial ra=instruction[IWIDTH-17:IWIDTH-24];
//#350000000 $finish; rb=instruction[IWIDTH-25:IWIDTH-32];
clk_count = 0; op1=regfile[(ra)];
op2=regfile[(rb)];
always
begin end
#20 clk = !clk;

end


A Simple Verilog RISC Core - 4

// //
2'b10 :begin OP_ADD_PLUS_ONE :begin
wr=instruction[IWIDTH-9:IWIDTH-16]; // result = itype_to_signed(op1) + itype_to_signed(op2);
ra=instruction[IWIDTH-17:IWIDTH-24]; result = op1 + op2 + 1;
//regfile[std_logic_vector_to_integer(wr)]=result;
op1=regfile[(ra)];
ssr[2:0] = check_result(result,op1,op2);
//op2=sign_extend(instruction[IWIDTH-25:IWIDTH-32]); //$display("ssr = %d", ssr);
op2 = { 8'b00000000, instruction[IWIDTH-25:IWIDTH- //$display("op1 + op2 + 1 = %b ", result);
32]}; pc = pc + 1;
end end
2'b11 :begin
wr=instruction[IWIDTH-9:IWIDTH-16]; OP_A :begin
//op1=sign_extend(instruction[IWIDTH-17:IWIDTH-29]); result = op1 ;
op1 = { 3'b000, instruction[IWIDTH-17:IWIDTH-29]}; ssr[2:0] = check_result(result, op1, op2) ;
//$display("ssr = %d", ssr);
end
//$display("op1 = %b ", result);
default :; pc = pc + 1;
endcase end

case (instruction[IWIDTH-3:IWIDTH-8]) OP_A_PLUS_ONE :begin


OP_ADD :begin result = (op1) + 1;
// result = itype_to_signed(op1) + itype_to_signed(op2); ssr[2:0] = check_result(result, op1, op2);
result = op1+op2; //$display("ssr = %d", ssr);
ssr[7:0] = {5'b00000,check_result(result, op1, op2)};
//$display("ssr = %d", ssr); //$display("op1 + 1 = %b ", result);
// //$display("op1 + op2 = %b ", result);
pc = pc + 1;
pc = pc + 1; end

end OP_AND :begin


result = (op1) & (op2) ;
ssr[2:0] = check_result(result, op1, op2);
//$display("ssr = %d", ssr);
//$display("op1 AND op2 = %b ", result, result);
pc = pc + 1;

end
A Simple Verilog RISC Core - 5
// //
OP_NOT_A_AND_B :begin OP_A_XNOR_B :begin
result = (~op1) & (op2) ; // result = CONV_SIGNED(SIGNED(not (op1 xor op2)),
ssr[2 : 0] = check_result(result, op1, op2); RWIDTH+1) ;
//$display("ssr = %d", ssr); result = ~ ((op1) ^ (op2)) ;
//$display("(NOT op1) AND op2 = %b ", result); ssr[2 : 0] = check_result(result, op1, op2);
pc = pc + 1; //$display("ssr = %d", ssr);
end //$display("op1 XNOR op2 = %b", result);
pc = pc + 1;
OP_B :begin end
// result = abs(CONV_SIGNED(SIGNED(op2),
RWIDTH+1)) ; OP_NOT_A :begin
result = op2 ; // result = abs ((CONV_SIGNED(SIGNED(not op1),RWIDTH+1)))
ssr[2 : 0] = check_result(result, op1, op2) ; ;
//$display("ssr = %d", ssr); result = ~(op1) ;
//$display("op2 = %b ", result); ssr[2 : 0] = check_result(result, op1, op2);
pc = pc + 1; //$display("ssr = %d", ssr);
end //$display("NOT op1 = %b ",result);
pc = pc + 1;
OP_NOT_A_AND_NOT_B :begin end
result = (~ op1) & (~ op2) ;
ssr[2 : 0] = check_result(result, op1, op2); OP_NOT_A_OR_B :begin
//$display("ssr = %d", ssr); // result = abs (CONV_SIGNED(SIGNED((not op1) OR
//$display("(NOT op1) AND (NOT op2) = %b ", result); (op2)),RWIDTH+1)) ;
pc = pc + 1;
result = (~ op1) | (op2) ;
end
ssr[2 : 0] = check_result(result, op1, op2);
//$display("ssr = %d", ssr);
//$display("(NOT op1) OR op2 %b ", result);
pc = pc + 1;
end


A Simple Verilog RISC Core - 6

// //
OP_SUB_MINUS_ONE :begin OP_A_XOR_B :begin
result = op1 - op2 -1; result = op1 ^ op2 ;
ssr[2 : 0] = check_result(result, op1, op2); // result = (CONV_STD_LOGIC_VECTOR(SIGNED(op1),
//$display("ssr = %d", ssr); RWIDTH+1) xor CO
NV_STD_LOGIC_VECTOR(SIGNED(op2), RWIDTH+1)) ;
//$display("op1 - op2 - 1 = %b ",result); ssr[2 : 0] = check_result(result, op1, op2) ;
pc = pc + 1; //$display("ssr = %d", ssr);
end //$display("op1 XOR op2 = %b ", result);
pc = pc + 1;
OP_SUB :begin end
result = op1 - op2 ;
ssr[2 : 0] = check_result(result, op1, op2) ; OP_A_OR_B :begin
//$display("ssr = %d", ssr); result = op1 | op2 ;
//$display("op1 - op2 = %b ", result); // result = (CONV_STD_LOGIC_VECTOR(SIGNED(op1),
pc = pc + 1; RWIDTH+1) or CONV_STD_LOGIC_VECTOR(SIGNED(op2), RWIDTH+1)) ;
end // result = op1 or op2 ;
ssr[2 : 0] = check_result(result, op1, op2) ;
OP_A_MINUS_ONE :begin //$display("ssr = %d", ssr);
result = op1 - 1 ; //$display("op1 OR op2 = %b ", result);
ssr[2 : 0] = check_result(result, op1, op2) ; pc = pc + 1;
//$display("ssr = %d", ssr);
//$display("op1 - 1 = %b ", result); end

pc = pc + 1;
OP_NOT_B :begin
result = ~ op2 ;
end
ssr[2 : 0] = check_result(result, op1, op2) ;
//$display("ssr = %d", ssr);
OP_A_AND_NOT_B :begin
//$display("NOT op2 = %b ",result);
result = (op1) & (~ op2) ;
ssr[2 : 0] = check_result(result, op1, op2) ;
pc = pc + 1;
//$display("ssr = %d", ssr);
//$display("op1 AND (NOT op2) = %b ", result); end
pc = pc + 1;
end
A Simple Verilog RISC Core - 7

// //
OP_JTRUE :begin
OP_A_OR_NOT_B :begin case (cond)
ALU_NEG :begin
result = op1 | (~ op2) ; if (ssr[NEG] ==1)
// result = pc = (ja);
(CONV_STD_LOGIC_VECTOR(SIGNED(op1),RWIDTH+1)) OR else
pc = pc + 1;
// (CONV_STD_LOGIC_VECTOR(SIGNED(not end
op2),RWIDTH+1)) ;
ssr[2 : 0] = check_result(result, op1, op2) ; ALU_ZERO :begin
//$display("ssr = %d", ssr); if (ssr[ZERO] == 1)
//$display("op1 OR (NOT op2) = %b ", result); pc = (ja);
pc = pc + 1; else
end pc = pc + 1;
end
OP_NAND :begin
result = ~(op1 & op2) ; ALU_CARRY :begin
// result = if (ssr[CARRY] == 1)
CONV_STD_LOGIC_VECTOR(SIGNED(op1),RWIDTH+1) NAND pc = (ja);
else
// pc = pc + 1;
CONV_STD_LOGIC_VECTOR(SIGNED(op2),RWIDTH+1) ; end
ssr[2 : 0] = check_result(result, op1, op2) ;
//$display("ssr = %d", ssr); ALU_NEG_ZERO :begin
//$display("op1 NAND op2 = %b ",result); if (ssr[ZERO] == 1 || ssr[NEG] == 1)
pc = pc + 1; pc = (ja);
else
end pc = pc + 1;
end
//BOOL_SHIFT_ZERO :begin
//end
//ALU_TRUE :begin
//end

default :;


A Simple Verilog RISC Core - 8


// //
OP_CALL :begin
OP_JFALSE :begin pc = pc + 1;
case ( cond ) stack_ssr[i]=ssr;
ALU_NEG :begin stack_pc[i]=pc;
if (ssr[NEG] == 0) i = i + 1;
pc = (ja); pc = (ja);
else end
pc = pc + 1;
end OP_RET :begin
i = i - 1;
ALU_ZERO :begin
if (ssr[ZERO] == 0) ssr=stack_ssr[i];
pc = (ja); pc=stack_pc[i];
else end
pc = pc + 1;
end endcase
ALU_CARRY :begin end
if (ssr[CARRY] == 0)
pc = (ja);
else
endmodule
pc = pc + 1;
end
ALU_NEG_ZERO :begin
if (ssr[ZERO] == 0 | ssr[NEG] == 0)
pc = (ja);
else
pc = pc + 1;
end

BOOL_SHIFT_ZERO :begin
end
ALU_TRUE :begin
end

default :begin
end

endcase

end
A Simple Verilog RISC Core - sc_async Integer - 1

//risc.h
#include <stdio.h>
#include <stdlib.h>
#include "systemc.h"

#define OP_ADD 0
#define OP_ADD_PLUS_ONE 1
#define OP_A 2
#define OP_A_PLUS_ONE 3
#define OP_AND 17
#define OP_NOT_A_AND_B 18
#define OP_B 19
#define OP_NOT_A_AND_NOT_B 24
#define OP_A_XNOR_B 25
#define OP_NOT_A 26
#define OP_NOT_A_OR_B 27
#define OP_SUB_MINUS_ONE 4
#define OP_SUB 5
#define OP_A_MINUS_ONE 6
#define OP_A_AND_NOT_B 20
#define OP_A_XOR_B 22
#define OP_A_OR_B 23
#define OP_NOT_B 28
#define OP_A_OR_NOT_B 29
#define OP_NAND 30
#define OP_JTRUE 32
#define OP_JFALSE 34
#define OP_HALT 63
#define OP_CALL 16
#define OP_RET 8

#define ALU_NEG 0
#define ALU_ZERO 1
#define ALU_CARRY 2
#define ALU_NEG_ZERO 3
#define BOOL_SHIFT_ZERO 4
#define ALU_TRUE 63

#define ZERO 0
#define NEG 1
#define CARRY 2


A Simple Verilog RISC Core - sc_async Integer - 2


//risc.h
class risc_nopipe : public sc_async
{
  const sc_signal<bool>& clock_i;

  unsigned pc ; // program counter
  unsigned reg_file[128]; // values of operands
  unsigned instr_mem[256];
  unsigned ssr;
  unsigned stack_ssr[256];
  unsigned stack_pc[256]; // same depth as stack_ssr
  unsigned stack_top;

public :

  // Constructor
  risc_nopipe( const sc_string& NAME,
               const sc_signal<bool>& CLOCK_I)
  : sc_async(NAME),clock_i(CLOCK_I)
  {
    sensitive(clock_i);
    pc = 0;
    stack_top = 0;
    ssr = 0;

    // Load the instruction memory
    FILE* fp_instr = fopen("instr.hex","r");
    if ( fp_instr == NULL )
    {
      printf("ERROR : no instruction file specified \n");
      exit(1); // non-zero exit status signals the error
    }

    // Load the register file
    FILE* fp_data = fopen("reg.bin", "r");
    if ( fp_data == NULL )
    {
      printf("ERROR : no data file specified \n");
      exit(1);
    }
    int i = 0 ;

    for( i = 0; i<256; i++)
    {
      if ( feof(fp_instr) )
        break;
      fscanf(fp_instr,"%x",&instr_mem[i]);
    }
    for( i = 0; i< 128; i++)
    {
      if ( feof(fp_data))
        break;
      fscanf(fp_data,"%u",&reg_file[i]); // %u matches unsigned
    }
    fclose(fp_instr);
    fclose(fp_data);
  }
  /** Process functionality in member function below ****/
  void entry() ;
};
A Simple Verilog RISC Core - sc_async Integer - 3

//risc.cc
#include<stdio.h>
#include "risc.h"

// Updates the ZERO (bit 0), NEG (bit 1) and CARRY (bit 2) flags in ssr.
// Wrapped in do { } while (0) so that "CHECK_RESULT;" acts as one statement.
#define CHECK_RESULT do { \
  if ( result == 0 ) ssr = ssr | 0x01; else ssr = ssr & 0xfe; \
  if ( result < 0 ) ssr = ssr | 0x02; else ssr = ssr & 0xfd ; \
  if (((op1 >> 15) ^ (op2 >> 15) ) != 0) ssr = ssr & 0xfb; \
  else { if (( (result < 0) ^ (op1>>15)) == 1) ssr = ssr | 0x04 ; \
         else ssr = ssr &0xfb; } \
} while (0)

void risc_nopipe :: entry()
{
  unsigned instruction ; //(31,0);
  unsigned instr_type ; //(1,0);
  unsigned opcode ; //(5,0);
  unsigned cond ; // (7,0);
  unsigned ja ; // (7,0);
  unsigned ra ; // (7,0);
  unsigned rb ; // (7,0);
  unsigned wr ; //(7,0);
  unsigned op1 ; //(15,0);
  unsigned op2 ; // (15,0);
  signed result ; // (15,0);

  if ( clock_i.posedge())
  {
    instruction = instr_mem[pc];
    instr_type = (instruction ) >> 30 ; //sub(31,30);
    opcode = (instruction & 0x3fffffff) >> 24; // sub(29,24)
    switch (instr_type)
    {
    case 0 :
      cond = (instruction & 0x00ffffff) >> 16;
      ja = (instruction & 0x0000ffff) >> 8;
      break;
    case 1 :
      wr = (instruction & 0x00ffffff) >> 16;
      ra = (instruction & 0x0000ffff) >> 8;
      rb = (instruction & 0x000000ff );
      op1 = reg_file[ra];
      op2 = reg_file[rb];
      break;
    case 2 :
      wr = (instruction & 0x00ffffff) >> 16;



A Simple Verilog RISC Core - sc_async Integer - 4

//risc.cc
      ra = (instruction & 0x0000ffff) >> 8;
      op1 = reg_file[ra];
      op2 = instruction & 0x000000ff;
      break;
    case 3 :
      wr = (instruction & 0x00ffffff) >> 16;
      op1 = (instruction &0x0000ffff) >> 3;
      break;
    default :
      pc = pc + 1;
      break;
    }

    switch ( opcode)
    {
    case OP_ADD : result = op1 + op2;
      CHECK_RESULT;
      //printf(" ssr = %d\n", ssr);
      //printf("op1 + op2 = %d \n", result);
      pc = pc + 1;
      break;

    case OP_ADD_PLUS_ONE : result = op1 + op2 + 1;
      CHECK_RESULT;
      //printf(" ssr = %d\n", ssr);
      //printf("op1 + op2 + 1 = %d \n", result);
      pc = pc + 1;
      break;

    case OP_A : result = op1;
      CHECK_RESULT;
      //printf(" ssr = %d\n", ssr);
      //printf(" op1 = %d \n", op1);
      pc = pc + 1;
      break;

    case OP_A_PLUS_ONE :
      result = op1 + 1;
      CHECK_RESULT;
      //printf(" ssr = %d\n", ssr);
      // //printf("op1+ 1 = %d \n", result);
      pc = pc + 1;
      break;
    case OP_AND :
      result = op1 & op2 ;
      CHECK_RESULT;

A Simple Verilog RISC Core - sc_async Integer - 5

//risc.cc
      //printf(" ssr = %d\n", ssr);
      //printf(" op1 & op2 = %d \n", result);
      pc = pc + 1;
      break;

    case OP_NOT_A_AND_B :
      result = ( ~op1) & op2;
      CHECK_RESULT;
      //printf(" ssr = %d\n", ssr);
      //printf(" OP_NOT_A_AND_B = %d\n", result);
      pc = pc + 1;
      break;

    case OP_B :
      result = op2;
      CHECK_RESULT;
      //printf(" ssr = %d\n", ssr);
      //printf(" op 2 = %d\n", result);
      pc = pc + 1;
      break;

    case OP_NOT_A_AND_NOT_B :
      result = ( ~op1) & ( ~op2);
      CHECK_RESULT;
      //printf(" ssr = %d\n", ssr);
      //printf(" (NOT OP1) & ( NOT OP2) = %d \n", result);
      pc = pc + 1;
      break;

    case OP_A_XNOR_B :
      result = op1 ^ op2;
      result = ~ result;
      CHECK_RESULT;
      //printf(" ssr = %d\n", ssr);
      //printf( "OP A XNOR B = %d \n", result);
      pc = pc + 1;
      break;

    case OP_NOT_A :
      result = ~op1;
      CHECK_RESULT;
      //printf(" ssr = %d\n", ssr);
      //printf( " NOT A = %d \n", result);
      pc = pc + 1;
      break;


A Simple Verilog RISC Core - sc_async Integer - 6

//risc.cc
    case OP_NOT_A_OR_B :
      result = ~op1 | op2;
      CHECK_RESULT;
      //printf(" ssr = %d\n", ssr);
      //printf(" (NOT A ) or B %d \n" , result);
      pc = pc + 1;
      break;

    case OP_SUB_MINUS_ONE :
      result = op1 - op2 - 1;
      CHECK_RESULT;
      //printf(" ssr = %d\n", ssr);
      //printf( " op1 - op2 - 1 = %d \n", result);
      pc = pc + 1;
      break;
    case OP_SUB :
      result = op1 - op2;
      CHECK_RESULT;
      //printf(" ssr = %d\n", ssr);
      //printf(" op1 - op2 = %d\n", result);
      pc = pc + 1;
      break;
    case OP_A_MINUS_ONE :
      result = op1 - 1;
      CHECK_RESULT;
      //printf(" ssr = %d\n", ssr);
      //printf( " op1 - 1 = %d \n", result);
      pc = pc + 1;
      break;
    case OP_A_AND_NOT_B :
      result = op1 & ( ~op2);
      CHECK_RESULT;
      //printf(" ssr = %d\n", ssr);
      //printf( " op1 AND ( not op2 ) %d \n", result);
      pc = pc + 1;
      break;
    case OP_A_XOR_B :
      result = op1 ^ op2;
      CHECK_RESULT;
      //printf(" ssr = %d\n", ssr);
      //printf( "op1 XOR op2 = %d \n", result);
      pc = pc + 1;
      break;
    case OP_A_OR_B :
      result = op1 | op2 ;

A Simple Verilog RISC Core - sc_async Integer - 7

//risc.cc
      CHECK_RESULT;
      //printf(" ssr = %d\n", ssr);
      //printf( " op1 OR op2 = %d \n", result);
      pc = pc + 1;
      break;
    case OP_NOT_B :
      result = ~op2;
      CHECK_RESULT;
      //printf(" ssr = %d\n", ssr);
      //printf( " NOT op2 = %d\n", result);
      pc = pc + 1;
      break;
    case OP_A_OR_NOT_B :
      result = op1 | ( ~op2);
      CHECK_RESULT;
      //printf(" ssr = %d\n", ssr);
      //printf( " op1 OR ( not op2 ) %d \n", result);
      pc = pc + 1;
      break;
    case OP_NAND :
      result = op1 & op2 ;
      result = ~result;
      CHECK_RESULT;
      //printf(" ssr = %d\n", ssr);
      //printf( " op1 NAND op2 = %d \n", result);
      pc = pc + 1;
      break;
    case OP_JTRUE :
      switch ( cond)
      {
      case ALU_NEG :
        if ( ((ssr & 0x02)>> NEG) == 1)
          pc = ja;
        else
          pc = pc + 1;
        break;
      case ALU_ZERO :
        if (( ssr & 0x01) == 1)
          pc = ja;
        else
          pc = pc + 1;
        break;
      case ALU_CARRY :
        // parentheses needed: == binds tighter than >>
        if ( ((ssr & 0x04 ) >> CARRY) == 1)
          pc = ja;
        else


A Simple Verilog RISC Core - sc_async Integer - 8

//risc.cc
          pc = pc + 1;
        break;
      case ALU_NEG_ZERO :
        // ZERO or NEG set, as in the Verilog version
        if ( ((ssr & 0x01 )== 1) || (( (ssr & 0x02) >> NEG) == 1))
          pc = ja;
        else
          pc = pc + 1;
        break;
      default : pc = pc + 1;
        break;
      }
      break;
    case OP_JFALSE :
      switch ( cond)
      {
      case ALU_NEG :
        if ( ((ssr & 0x02)>> NEG) == 0 )
          pc = ja;
        else
          pc = pc + 1;
        break;

      case ALU_ZERO :
        // jump when the ZERO flag is clear, as in the Verilog version
        if (( ssr & 0x01) == 0)
          pc = ja;
        else
          pc = pc + 1;
        break;
      case ALU_CARRY :
        // parentheses needed: == binds tighter than >>
        if ( ((ssr & 0x04 ) >> CARRY) == 0)
          pc = ja;
        else
          pc = pc + 1;
        break;
      case ALU_NEG_ZERO :
        if ( ((ssr & 0x01 )== 0) && (( (ssr & 0x02) >> NEG) == 0))
          pc = ja;
        else
          pc = pc + 1;
        break;
      default : pc = pc + 1;
        break;
      }
      break;

A Simple Verilog RISC Core - sc_async Integer - 9

//risc.cc
case OP_CALL :
pc = pc + 1;
stack_ssr[stack_top] = ssr;
stack_pc[stack_top] = pc;
stack_top = stack_top + 1;
pc = ja;
break;
case OP_RET :
stack_top = stack_top - 1;
ssr = stack_ssr[stack_top];
pc = stack_pc[stack_top];
break;

default :
pc = pc + 1;
break;
}
}
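
The flag update can also be written as an ordinary function instead of a macro, which makes its inputs explicit and lets it be tested on its own. The sketch below follows the Verilog check_result function (bit 0 zero, bit 1 sign, bit 2 set when both operands share a sign and the result's sign differs). The names `check_result_flags` and `update_ssr` are hypothetical, and 16-bit values are assumed as in the Verilog model.

```cpp
#include <cstdint>

// Returns the three status bits for a 16-bit result and its operands.
inline unsigned check_result_flags(std::uint16_t result,
                                   std::uint16_t op1,
                                   std::uint16_t op2) {
    const unsigned sign_r = result >> 15;
    const unsigned sign_1 = op1 >> 15;
    const unsigned sign_2 = op2 >> 15;
    unsigned flags = 0;
    if (result == 0) flags |= 1u << 0;                           // ZERO
    if (sign_r == 1) flags |= 1u << 1;                           // NEG
    if (sign_1 == sign_2 && sign_r != sign_1) flags |= 1u << 2;  // CARRY
    return flags;
}

// Replaces the low three bits of ssr and keeps the others.
inline unsigned update_ssr(unsigned ssr, std::uint16_t result,
                           std::uint16_t op1, std::uint16_t op2) {
    return (ssr & ~0x07u) | check_result_flags(result, op1, op2);
}
```

Testing bit 15 directly, rather than comparing a wider signed value with zero, keeps the sign flag tied to the 16-bit result width.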

Proverb for Effective Refinement

The more coding tricks and techniques that you know,
the happier your refinement life will be


Summary

• Refinement
  • Covered how to refine a model taking bit-width into consideration
  • Covered FSM coding styles

Agenda: Day Three
DAY
3
Unit Topic Lab

14 Global and Local Watching

15 Modeling BUS with Resolved Vector

16 Refinement

17 Functional I/F

18 Hardware/Software Co-verification


The Changing System Design Market


System Design Today -> System Design Tomorrow

[Figure: today's system built from separate components (µP, DSP, ASIC with data path and random logic, RAM, ROM, program memory, FPGA) next to tomorrow's On-Chip System combining µP, DSP, ASIC, data path, RAM and ROM]

[Chart: "Shift in Design Method". Annual Revenue, axis 0 to 5000, for the years 1991 to 2000. Series: ASIC, ASSP/Custom, On-Chip System. * Source: IBS]

Systems Require Multiple IP Sources

[Figure: an ASIC assembled from IP supplied by multiple vendors: Cisco, Micron, AMD, SUN, NEC, Motorola]


System Level Building Blocks

As systems move on to chips .......

Chip verification becomes a system problem

• System performance
• Component interaction
• HW / SW debug

ASIC

Detailed timing & logic

SYSTEMS MUST BE DESIGNED WITH PRE-VERIFIED COMPONENTS

Verification Becomes the Challenge

Year   Gates    Frequency     Vectors     Specification

1990   15 K     25 MHz        1 x 10^6    Text

1996   250 K    50-100 MHz    1 x 10^8    Testbench

2000   >2 M     500 MHz       “minutes”   Live Systems


Verification in Synthesis-Based Design

Must-Have:

❏ Fast functional simulation at all levels of abstraction
❏ Consistent verification tool set that matches the design task
❏ Common verification environment throughout the design process
❏ Streamlined path to sign-off
❏ Tight integration with synthesis design flow
❏ Interface between HW/SW environments

Design step                                Verification
System/Algorithm Design                    Specification Validation
High Level Hardware Design (Behavioral)    Fast Functional Simulation
Hardware Architecture Design (RTL)         Fast Functional Simulation
Logic and Test Implementation (Gate)       Implementation Verification
Physical Design (Layout)                   Physical Verification

Designing with Pre-verified Blocks

New Block    Flexible Block    Fixed Block

Pre-verified System Blocks

• Functionally verified
• Abstracted timing
• Accurately modeled
• Protected IP
• Transportable at physical level


Classes of Verification

Specification Validation
  Validate this is what I want to build
  Remove ambiguity from spec

High performance Functional Verification
  Verify that the implementation captures the
  desired functionality

Implementation Verification
  Verify final design equals the
  RTL golden spec
  Streamlined paths to signoff

[Diagram: verification stack of Specification Validation, Fast Functional Simulation, Implementation Verification, Physical Verification]

Cycle-Accurate Verification Using SystemC

Uses the same design description as the creation tools
Supports flexible design constructs for testbench creation
100 - 10,000 times the speed of event simulation
Flexible debug and fast design modification
Supports creation of system models and reusable blocks

[Figure: verification tool map with boxes labeled System, System HW/SW Verification, Cycle-Based Functional Simulation, Hardware-Based RTL Acceleration, High Performance Models, Full language Event Simulation, Formal Implementation Verification, and Static Timing]


IP Challenges

Authoring the IP?
Protecting the IP?
Integrating the IP?
Evaluating the IP?
Implementing the IP?
Delivering the IP?
Configuring the IP?
Validating the IP?

Key to reducing the barriers: Advanced Specialist EDA Tools + Advanced “User friendly” IP Models

IP Protection Problems?

Processes and modules are instantiated as variables.
  Requires that the interface file is exported
  Exporting exposes the internals of the process or module to the user.
May want to hide the internal structure
  Interface that shows only the input and output signals of a process or module

[Figure: CPU Core containing Reg, Decode, BUS, Fetch, FPU, ALU, and SIMD blocks]

// cpu_core header file


struct cpu_core : sc_module {
ports;
internal signals;
module Reg, BUS, FPU, SIMD
Fetch, ALU, Decode ….;
constructor
}


Functional I/F - IP Protection Solution


A solution:
  Make process or module instantiation identical to a function call
  Only the function prototype (doesn’t contain internal details) is exported
  Compile as object file

[Figure: CPU Core containing Reg, Decode, BUS, Fetch, FPU, ALU, and SIMD blocks]


// cpu_core header file / library file
struct cpu_core : sc_module {
  ports;
  internal signals;
  module Reg, BUS, FPU, SIMD
         Fetch, ALU, Decode ….;
  constructor
}

// cpu_core .cc implementation file
f_cpu_core( const char * NAME, ….
  SC_NEW (cpu_core (NAME, CLK,….)) ;}

// Exported prototype. This is what the user sees!
extern f_cpu_core(const char * NAME, ports)

Process1 - Functional I/F
 Function f_process1
 Takes exactly the same arguments as the constructor for
process1
 Must have a return type of void
 SC_NEW()
 SystemCTM function
 Creates process process1 and makes the process object
persistent

// addition to implementation file for the process process1


void f_process1 (const char *NAME,
sc_clock_edge& CLK,
const sc_signal<bool>& A,
const sc_signal<bool>& B,
sc_signal<bool>& C,
sc_signal<bool>& D)
{
SC_NEW (process1 (NAME, CLK, A, B, C, D)) ;
}


Prototype of Function Export


 Prototype exported through a separate interface file - the
functional interface file
 Same name as the interface file, but with prefix f_

 Naming convention not a requirement

// Filename f_process1.h
// Functional interface file for process process1
extern void f_process1 (const char *NAME,
sc_clock_edge& CLK,
const sc_signal<bool>& A,
const sc_signal<bool>& B,
sc_signal<bool>& C,
sc_signal<bool>& D) ;
extern declaration indicates
that the function is defined in
some other file.
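
The hiding idea can be tried out without SystemC at all. The following is a minimal plain C++ sketch, not the workshop's code: `Process1`, `registry`, `run_all_processes`, and `process_count` are hypothetical names, and the static registry only stands in for the persistence that the slides attribute to SC_NEW. A user who receives just the prototype of `f_process1` and an object file never sees the class definition.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// ---- implementation file: shipped only as an object file ----
namespace {

// Hypothetical stand-in for a process class (not a SystemC type).
struct Process1 {
    std::string name;
    const bool& a;  // inputs
    const bool& b;
    bool& c;        // output
    void evaluate() { c = a && b; }
};

// Keeps every created object alive after the creating call returns,
// which is the role the slides give to SC_NEW.
std::vector<std::unique_ptr<Process1>>& registry() {
    static std::vector<std::unique_ptr<Process1>> objects;
    return objects;
}

}  // namespace

// ---- functional interface: the only name a user needs to instantiate ----
void f_process1(const char* name, const bool& a, const bool& b, bool& c) {
    assert(name != nullptr);
    registry().push_back(
        std::unique_ptr<Process1>(new Process1{name, a, b, c}));
}

// Hypothetical helpers so the sketch can be exercised without a simulator.
void run_all_processes() {
    for (auto& p : registry()) p->evaluate();
}
std::size_t process_count() { return registry().size(); }
```

Calling `f_process1("Proc1", a, b, c)` creates and keeps one process; `run_all_processes()` then drives `c` from `a` and `b`.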

Functional I/F & Functional Interface File

// addition to implementation file for the process process2

void f_process2 (const char *NAME,


sc_clock_edge& CLK,
const sc_signal<bool>& E,
const sc_signal<bool>& F,
const sc_signal<bool>& G)
{
SC_NEW (process2 (NAME, CLK, E, F, G)) ;
}

// Filename f_process2.h
// Functional interface file for process process2
extern void f_process2 (const char *NAME,
sc_clock_edge& CLK,
const sc_signal<bool>& E,
const sc_signal<bool>& F,
const sc_signal<bool>& G) ;


Functional I/F Use


 Once declared it can be used wherever a process is instantiated.
 Compare the example below with module.h in the module section
#include "process1.h"
#include "process2.h"
// Declarations of process1 and process2 have been removed
struct mymod : public sc_module {
  // Internal signals
  sc_signal<bool> sig_int;
  // Note no processes as data elements
  // Constructor: only invokes the constructor of the base class sc_module
  mymod(const char *NAME,
        sc_clock_edge& CLK1,
        sc_clock_edge& CLK2,
        const sc_signal<bool>& M,
        const sc_signal<bool>& N,
        sc_signal<bool>& P)
    : sc_module(NAME) // note no other initializers called
  {
    // Body of the constructor has the function calls that
    // instantiate process1 and process2.
    f_process1 ("Proc1", CLK1, M, M, sig_int, P);
    f_process2 ("Proc2", CLK2, sig_int, P, N);
    end_module( );
  }
};

Example: Simple Arithmetic Pipeline Design - 1
[Figure: source file hierarchy, bottom to top
 Implementation files: Stage1.cc, Stage2.cc, Stage3.cc, numgen.cc, display.cc
 Interface files: Stage1.h, Stage2.h, Stage3.h, numgen.h, display.h
 Stage1_2.h, Testbench.h
 Pipeline.h
 main.cc]


Example: Simple Arithmetic Pipeline Design - 2


 Let us take the process called stage1 and create a new function

// addition to implementation file
// for stage1 process

void f_stage1(const char *NAME,
              sc_clock_edge& CLK,
              const sc_signal<double>& IN1,
              const sc_signal<double>& IN2,
              sc_signal<double>& SUM,
              sc_signal<double>& DIFF)
{
  SC_NEW(stage1(NAME, CLK, IN1, IN2, SUM, DIFF));
}

Example: Simple Arithmetic Pipeline Design - 3
 Similarly, create a new function for each of the processes stage2 and stage3

// addition to implementation file
// for stage2 process

void f_stage2(const char *NAME,
              sc_clock_edge& CLK,
              const sc_signal<double>& SUM,
              const sc_signal<double>& DIFF,
              sc_signal<double>& PROD,
              sc_signal<double>& QUOT)
{
  SC_NEW(stage2(NAME, CLK, SUM, DIFF, PROD, QUOT));
}

// addition to implementation file
// for stage3 process

void f_stage3(const char *NAME,
              sc_clock_edge& CLK,
              const sc_signal<double>& PROD,
              const sc_signal<double>& QUOT,
              sc_signal<double>& POWR)
{
  SC_NEW(stage3(NAME, CLK, PROD, QUOT, POWR));
}


Example: Simple Arithmetic Pipeline Design - 4


 Once the function is defined, the prototype needs to be exported.
 Exporting the prototype through a separate interface file will hide
the internal details of a process
 Similarly for f_stage2.h and f_stage3.h

/* Filename f_stage1.h */
/* This is the functional interface file for synchronous process `stage1' */

extern void f_stage1(const char *NAME,


sc_clock_edge& CLK,
const sc_signal<double>& IN1,
const sc_signal<double>& IN2,
sc_signal<double>& SUM,
sc_signal<double>& DIFF);

Example: Simple Arithmetic Pipeline Design - 5
The functional interface can be used in a module declaration too.

Everything is identical except for the constructor function, which now invokes only the constructor for the sc_module class. Its body has the function calls that instantiate the processes stage1 and stage2 and connect them with internal signals SUM and DIFF.

Example: Simple Arithmetic Pipeline Design - 6


Let us now use the functional interface f_stage3 in the definition of the pipeline module

Instantiation through function calls can coexist with instantiation through variable definition

Note that the module stage1_2 does not have a functional interface, therefore S1_2 is declared as a data member (line 10). On the other hand, the functional interface for stage3 is used, so f_stage3 is called in line 21.
Example: Simple Arithmetic Pipeline Design - 7
Let us now use the functional interface f_pipeline in the definition of the pipeline module

Let us now use the functional interface for implementation file pipeline.cc


Example: Simple Arithmetic Pipeline Design - 8


Putting it all together

Lab 10: Simple FIR Filter IP
Primary Objectives:
understand SystemC’s functional interface usage
your task is to update fir.cc and fir.h using the functional interface

Background
• A simple FIR filter which reads in samples with each input_valid signal and writes out the result when output_data_ready is high.
• The filter is a 16 tap FIR filter (fir.cc).
• The test bench simply feeds ascending values into the FIR (stimulus.cc).
• The output is sampled (display.cc) and displayed with print statements.
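
For orientation before opening fir.cc, the arithmetic of a 16 tap FIR can be sketched in plain C++. This is only an untimed illustration and not the lab's fir.cc: the struct `Fir`, its members, and the coefficient values are assumptions, and the input_valid / output_data_ready handshake is left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kTaps = 16;  // the lab's filter has 16 taps

struct Fir {
    std::array<int, kTaps> coeff{};  // hypothetical coefficients, set by the caller
    std::array<int, kTaps> delay{};  // delay[0] holds the newest sample

    // Shift in one sample and return the filter output for it.
    long step(int sample) {
        for (std::size_t i = kTaps - 1; i > 0; --i) delay[i] = delay[i - 1];
        delay[0] = sample;
        long acc = 0;
        for (std::size_t i = 0; i < kTaps; ++i)
            acc += static_cast<long>(coeff[i]) * delay[i];
        return acc;
    }
};
```

With every coefficient set to 1 the filter is a running sum, so feeding ascending values as the test bench does gives an easy first check.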


Lab 10: Simple FIR Filter IP

FINITE-IMPULSE RESPONSE FILTER

[Figure: tapped delay line built from unit delays Z^-1, with tap coefficients C1, C2, ..., C(N-1), CN]

Lab 10: Simple FIR Filter IP

Index (binary)   Input   Output
000              x(0)    F(0)
100              x(4)    F(1)
010              x(2)    F(2)
110              x(6)    F(3)
001              x(1)    F(4)
101              x(5)    F(5)
011              x(3)    F(6)
111              x(7)    F(7)

Stages: four 2-point DFTs, two 4-point DFTs, one 8-point DFT

Data flow in the radix-2 decimation-in-time FFT algorithm
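
The input ordering in this figure is the bit-reversed index: position 1 (binary 001) holds x(4) (binary 100), and so on. A small plain C++ sketch of that permutation follows; the function name `bit_reverse` is chosen here for illustration and does not come from the workshop files.

```cpp
#include <cassert>

// Reverse the lowest `bits` bits of `value`.
// For an 8-point FFT, bits = 3, so position 1 (001) maps to input 4 (100).
unsigned bit_reverse(unsigned value, unsigned bits) {
    assert(bits <= 8 * sizeof(unsigned));
    unsigned result = 0;
    for (unsigned i = 0; i < bits; ++i) {
        result = (result << 1) | (value & 1u);  // move the lowest bit to the front
        value >>= 1;
    }
    return result;
}
```

Evaluating `bit_reverse(i, 3)` for i = 0..7 reproduces the input column of the figure: 0, 4, 2, 6, 1, 5, 3, 7.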



Lab 10: Sample Output


Sample output:

Summary

 Functional I/F
 covered Functional I/F which is useful for IP exchange


Agenda: Day Three


DAY
3
Unit Topic Lab

14 Global and Local Watching

15 Modeling BUS with Resolved Vector

16 Refinement

17 Functional I/F

18 Hardware/Software Co-verification

It is Silicon and Software!

[Figure: design flow from applications / standards / specifications, through systems algorithm exploration and system partitioning, into two branches.
 Hardware: application-specific processor architecture exploration, behavioral synthesis, FSM generators, logic synthesis, standard cells or FPGA; result: application specific hard-wired processor.
 Software: programmable DSP/µP, DSP/µP code generation compilers, handcrafted DSP/µP code, DSP/µP code integration, ROM-coded DSP/µP; result: general purpose programmable processor.]

Silicon Complexity vs. Software Complexity

● Silicon complexity is growing 10x every 6 years
● Software in systems is growing faster than 10x every 6 years.

[Figure: log-scale chart of device complexity (1K to 1G) over the years 1970 to 1998. Memory (DRAM): 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M. Microprocessor/DSP: 4044, 8080, 8085, 8086, 68000, 80286, 68020, 80386, 68040, 80486, Pentium, P6, P7, and TMS320C15, TMS320C30, TMS320C240, TMS320C80. Logic: LSI Logic, Mitsubishi, and IBM gate arrays.]

Algorithms
An algorithm is a sequence of computational steps that
transform the input into the output
Demodulation, Parameter Estimation,
& Symbol Detection

Modulation &
Pulse-shaping Source Coding
Channel Coding &
Interleaving

Sequence, conditionals, and loops of memory accesses (A[i], A[j]), assignments (t<-, i<-), and operations (+, and, >)


HARDWARE/SOFTWARE CO-VERIFICATION

Hardware Window
  Is the ASIC hardware working correctly with the µp Controller or µP?

Co-Simulation Interface Bridge
  Is the HW/SW co-simulation interface working?

Software Window
  Is the application software working correctly with the ASIC hardware?

C/C++ Based HW/SW Co-Design Motivations

 Well-known Programming Language


 tools: compiler, debugger
 legacy code
 System Modeled in C/C++
 verify functionality with fast simulation speed
 estimate performance
 software/hardware migration
 Libraries for hardware modeling exist
 parallelism and communication through signals
and channels
 data types + bit-level operations
 timing


Software Flow: HW/SW Co-verification

 Functional Specification
 validate algorithm and functionality
 processes can be mapped to SW or HW
 no timing
 Architectural specification
 processes are mapped to specific HW blocks (e.g. processor,
memory, DSP…)
 test interfaces: e.g. memory map, interrupts…
 bus functional model (BFM) for processors
 abstract behavioral model for hardware
 timing: e.g. number of cycles, buffer size
 use BFM + ISS (instruction set simulator) for processors
 refined RTL description for hardware

Architectural Level: Untimed
 Bus Functional model (BFM)
 model transactions on processor bus
 untimed software execution

[Figure: Software (C/C++) connected through the BFM to the Hardware Model]


Architectural Level: Cycle - Accurate


 Bus Functional model (BFM)
 model transactions on processor bus ( cycle
accurate)
 Cycle Accurate Instruction Set Simulator (ISS)
 fetch, decode, execute instructions

[Figure: Software (Assembly) connected through the ISS and the BFM to the Hardware Model]

Speeding up Simulation
 Reduce activity on memory bus
 95% of traffic on the memory bus comes from instructions and data transfer
 bus functional model (BFM) integration after testing memory interface
 instruction memory
 part of data memory (definition of memory map)
 turn off clocks on modules
 bus functional model (BFM) generates the clock signal only when devices are “active”.

[Figure: BFM containing instruction and data memory, with a device and RAM on the Memory Bus, plus a Controller and Clock]


Hardware/software partition
 Hardware/software partition / co-verification
 e.g. implement FFT in software and run it through ISS and
compare result with FFT hardware model
 then determine whether to implement FFT in software or
hardware
 co-verify each other for accuracy

[Figure: ISS containing instruction and data memory, with an FFT block and RAM on the Memory Bus, plus the CPU Core and Clock]

Types of Simulation Models

[Figure: taxonomy of system simulation models: hardware models (hardware modeler), full functional models, bus functional models, bus interface models, and software models, supplied either by a model vendor or user defined]


Full Functional Models

Accurately represent interaction between ASIC and key functional components
  Check interaction between ASIC and Memory or between ASIC and Microprocessor

Test ASIC under different system timing conditions
  Verify ASIC operation with min/typical/max timing for external components

[Figure: System containing uP, ASIC, and Memory]

Bus Functional Models

Bus cycle simulation of standard microprocessor (e.g. Pentium III) with bus (e.g. AGP) interconnect
Stimulus is provided by a high level language

[Figure: Bus Functional Model connected over the System Bus to the User ASIC]


Bus Interface Models

Simulate all types of bus cycles
Stimulus is provided by the bus control commands
Bus models avoid handwritten vectors and simplify test development
Verify compliance of interface circuits to specification

[Figure: Bus Interface Model connected over the System Bus to the User ASIC]

Hardware Models

[Figure: device(s) (486, 486, K62) mounted on an adapter in an LM-family hardware model server, with timing files, connected over Ethernet to a workstation running the simulator]

A Hardware Model uses the silicon itself to model functional behavior
Model is fully functional and may include undocumented behavior
The server applies patterns to the device, captures the responses and returns results & timing to simulator


Software Models

Software models use high level language to model functional behavior
  VHDL models
  C models
Software models can be developed or purchased
  User defined models
  Models available from third party
Software models are available from third party as
  VHDL source models
  Pre-compiled models

Hardware/software Co-design

Software
Hardware
RTOS
C/C++ CPU
SystemC Bus
memory

VHDL
Verilog
SystemC
ASM
OMI
C/C++


Summary

 Hardware and Software Co-Verification


 covered issues involving hardware and software
co-verification for System Level design

Agenda: Day Three
DAY
3
Unit Topic Lab

19 System-on-a-Chip

20 Workshop Summary

21 Appendix and Labs Solutions


SOC

[Figure: SOC block diagram with blocks labeled Standard µp, Memory, Graphic Chip Set, New Block, Reused Block, Comm. Subsystem, Multimedia Subsystem, Host Processor, two Co-Processors, and Control]

System Design Market Forces

Semiconductor
Technology

Market Needs

Systems Systems
on Boards on Chips


System Design Market Forces

Semiconductor
ROI Concerns

IP Protection &
Ownership Issues

Design Transportability

Existing Design Tools and


Methodologies

Systems Systems
on Boards on Chips

What is Inside a Module?

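[Figure: degree of resource sharing plotted against number of clock cycles per data sample (1, 10, 100, 1000). Labels range from pipelined/parallel HW with operation and multiplexor, through multiplexer-based and bus-based ALU and register structures, to uP and uP/DSP cores with RAM, ROM, A/D, D/A, P->S, and S->P blocks]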

Design Trends - Merging Design Styles


SOG
• Few large blocks
• Compiled RAM, data path
• Usable 30KG - 200KG
• Up to 300 I/Os
• 0.8um, 0.5um CMOS
• Metal programmable only

Embedded
• Few large blocks (RAM, data path)
• Some custom blocks
• Supports heterogenous design style (module compiler, synthesis, PLA)
• Usable 60KG - 300KG
• Up to 500 I/Os
• 0.8um, 0.5um CMOS / Mixed analog

STANDARD CELL
• Compiled RAM, datapath, random logic
• Custom blocks and I/Os
• Soft macros, hard and firm macros
• Use of large blocks (30-70% die)
• Re-use design paradigm
• Usable 100KG - 1MG
• Up to 1000 I/Os
• 0.5um, 0.35 CMOS

IC Design
• more cores, re-use blocks
• layout issues
• must automate to meet schedule

[Figure: example floorplans with UART, RAM, FIFO, REG, and CORE blocks]

Higher levels of system integration are changing the implementation methods -> non-synthesizable components are becoming more important
µP’s are growing in importance relative to ASIC’s -> Design re-use is becoming more important

HW/SW/Time/Cost
Effect of HW Resource Constraints on HW/SW System Prototyping Costs

1- Minimum Hardware cost - lowest production cost with nominal memory and computation margins
2- Reduced Development Cost & Time - more memory, use of advanced software development tools and methodology
3- Minimum Development Cost & Time - Powerful processor reducing software development

SoCs Require Fast Prototyping Design


 Typical Prototyping Process Flow’s Limitations
 long prototyping times
 high cost of design
 in-cycle silicon fabrication and test
 ad hoc techniques used to make architectural and packaging trade-offs
 lack of coupling between HW and SW design efforts

New SoC Design Flows
[Figure: new SoC design flow. Algorithm Design (C, C++, Fortran) feeds Design Partitioning into HARDWARE and SOFTWARE.
 Hardware: Behavioral HDL and RTL HDL (VHDL, Verilog, C++ with SystemC), Logic Synthesis, RTL and gate-level descriptions, MegaCells and cores, Hardware Models.
 Software: C and assembly code, C-asm Compiler, DSP processor behavioral models.]


What Does Silicon Inversion Look Like?

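[Figure: timeline from 1970 to 2000 ending in SILICON INVERSION. Later design units are DSP cores, uC cores, algorithm modules, peripheral modules, and interface modules; an earlier design unit is HDL code such as the fragment below.]

begin
  Err <= HiAddr xor LoAddr;

  process begin
    wait until CLOCK'event
      and CLOCK = '1';
    Flag <= (Err = "0000000");
  end process;
end;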
SystemC Solves the Verification Bottleneck

[Figure: SoC with A/D, D/A, DP, Memory, µP, µC, Mega Cells, S/P, P/S, Control Logic, and DMA blocks, annotated with bit-level values (‘1’, ’0’, “001011”) as seen in gate-level simulation]

Gate-level simulation is not feasible for System on a Chip


So, What is a System Design Language?

 SystemC is a System Design Language


 It is a language that raises the level of abstraction
above conventional digital design levels
 It is a language that adds collateral information to
descriptions at conventional digital design levels

Language | Orientation | Data Types | Assignments | Processes | Structure | Delays
Verilog | DES model (gate level) | event, real, int; no, weak typing | inertial, immediate, high priority, preemptive | concurrent, initial, always, task, cont. assig | components | timing & functionality
VHDL | DES model | user defined | preemptive | sequential, guarded | components | delayed assignments
Esterel | Reactive models | int, bool, triv, abstract | atomic reaction | nested actions | modules | perfect synchrony
SystemC | Synch. Syn, RTL | HW/SW data types | next cycle, next delta cycle | Synchronized, asynchronized | Hierarchical w/ Processes | Delayed assignments

Integrated Design Flow for SOC with SystemC
[Figure: integrated SOC design flow.
 Entry: FSM / Protocol (state diagrams with states, super state, and transitions such as a=1, a=0, b), COSSAP, and SystemC / Verilog / VHDL text such as the fragment below.
 Verification and design: C/C++ Compiler, Assembler, Module Generation and Compiler, Behavioral Compiler, Design Compiler.
 Implementation: Co-synthesis onto RAM, DSP, µC, and ASIC, followed by post synthesis refinement.]

process begin
  wait until not CLOCK'stable
    and CLOCK=1;
  if(ENABLE='1') then
    TOGGLE<= not TOGGLE;
  end if;
end process;


SystemC for modeling


 Modeling hardware
 SystemC provides all the advantages of
Verilog and VHDL
 Concurrent processes
 Concept of the clock
 Wide variety of bit true data types
 SystemC also enables
 System wide verification
 Portability between software and hardware
 Ability to handle late changes
 Performance Modeling

Roles of Models in SOC Verification
SystemC Bus
Legacy Functional
Arb Models
Source

Standard Bus
Models CPU

Interface
RAM Bridge • Popular buses
Peripherals

Bus

Kit
• Fast,

Peripherals
On-Chip Bus
• Memory and programmable
Std Logic Local Bus device BFMs
• VHDL/Verilog • Link to
Source DMAC Instruction Set
uP
• Customizable uP STD Simulators
Core
Core PROD SmartModel
IP vendor BFM Library
Cmds On-Chip
DSP User
HW/SystemC Bus Kit • Binary, ASIC
Core Defined
Models Logic
vendor
• Full- System compliant
ISA Sim On-Chip
functional; Bus Kit Models • All popular HDL
“accurate as simulators.
the silicon!”
• Guarantee of System ASIC
availability System ASIC Testbench
• COSSAP models of
real-world. DSP-Specific
• Verify standards Models

Model ‘Views’ Through the Flow


Determine Design Requirements

System Architectural Design


& Partitioning
Info Model (power, etc.)
Design Exploration Subsystem Verification

Ex. Testbench/Tests Test Development


SW Development

Instruction Set Level Behavioral Simulation


HW-SW Co-simulation

RTL Simulation
Bus Functional Model Co-sim Test Extraction

Synthesis & Optimization


Cycle-based Full-function
Static Timing Analysis

Synthesis & Timing Model Unique Delay Simulation

Design For Test


Cycle-based Discrete Event
Floor-planning & P&R
Fault Model Static Timing Signoff
&
Formal Verification
Physical Model
Debug Prototype

Behavioral Modeling

Abstraction Levels: System, Behavioral, RTL, Gate, Transistor

Characteristics
• Interfaces between Major Models are mixed clock/event driven
• Balance of design is event driven behavioral code
• Behaviorally accurate
Purpose
• Integrated System Model
• Fastest simulating code
Challenges
• Behavioral design is new: guidelines, models etc. are scarce


System Level Modeling

[Figure: CPU, Memory, I/O Board, Disk, and Bus Arbiter models on a shared Bus. Each model is event driven internally and has a clock-driven Bus IF.]

Simple Example : System Code
 System Level Code
 Concise

 Easy to write

 Functionally accurate

index = 0;
...
while( index < 16 )
{
D_diff = data_A * data_B - data_C * data_D;
D_out = (D_diff + Accum);
} ...

There are two ways to get this to a more accurate representation of hardware.
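
To experiment with the system-level fragment outside a simulator, it can be completed into a plain C++ function. This is a sketch under assumptions: the slide elides how index advances and how Accum is updated, so the loop counter and the running accumulation below are guesses, and `Sample` and `accumulate_diffs` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical container for one set of data_A..data_D values.
struct Sample { int a, b, c, d; };

// Untimed model: for 16 samples, form a*b - c*d and add it to a running sum.
long accumulate_diffs(const std::array<Sample, 16>& samples) {
    long accum = 0;
    for (std::size_t index = 0; index < samples.size(); ++index) {
        const Sample& s = samples[index];
        const long d_diff =
            static_cast<long>(s.a) * s.b - static_cast<long>(s.c) * s.d;
        accum += d_diff;  // assumed meaning of D_out = D_diff + Accum
    }
    return accum;
}
```

There are no ports, clocks, or bit widths here, which is what keeps system-level code concise and quick to write.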


Behavioral Level Modeling


Mixed clk/event code for interface:
  wait_until(bus_valid == true);
  data_reg = bus_data;
  addr_reg = bus_addr;
  wait();

Event code for rest of module:
  wait_until(in_q_not_empty);
  read_queue(in_q, data_reg, addr_reg);

[Figure: Bus feeding a Bus Interface with a Bus State Machine, In and Out queues (Q), and an event-driven Memory Control in front of the Memory]

The behavior of a hardware design block can be described as a series of conditionally executed loops, which may in turn be conditional
Modeling Behavioral Building Blocks Overview
Concept
  Determine the tasks, in order, you wish to model
  Specify how these tasks are executed (repetitive, conditionally)

Basic Code
  Write the code
  Ensure the code correctly models hardware

Behavioral Code
  Synthesize


Determine Tasks
Step 1
 Determine the tasks required to fulfill functionality
 Sketch a task diagram
 Include only what is required to describe the functionality
 Implementation details are not necessary

[Figure: task diagram for a typical hardware design example. Begin, then Wait for Input, then one of four branches, then End.
 Branch 1: Write to RAM, Update Checksum
 Branch 2: Read RAM, Wait for Send, Output all data
 Branch 3: Read Input, Accumulate, then Update RAM or Indicate Ready
 Branch 4: Process RAM Data, Output Result, Update Status]

 A Behavioral description contains
 The tasks to be executed and the required order of execution

Behavioral Building Blocks
Step 2
 How is a task executed?
 One-time execution
 Repetitive execution
 How is a task ordered?
 Inherent dependencies
 Conditional execution
 Behavioral Building Blocks
 Order of the tasks
 Loops
 Conditional statements
 Conditional loops

Example:
A = B + C
D = A - 5        (subtraction must happen after addition)
if (D > 10)
  Out = 10
else
  Out = D * 2    (multiplication is conditional)
end if
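
The A/D/Out fragment on this slide translates directly into C++. The sketch below only restates it as a function; the name `building_block` is made up for illustration.

```cpp
#include <cassert>

// Inherent dependency: d needs a, so the addition must come first.
// Conditional execution: the multiplication runs only when d <= 10.
int building_block(int b, int c) {
    const int a = b + c;
    const int d = a - 5;
    if (d > 10) {
        return 10;
    }
    return d * 2;
}
```

For b = 10 and c = 10, d is 15 and the result saturates at 10; for b = 3 and c = 4, d is 2 and the result is 4.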


Specify the Behavior


Step 3
 In this example
 The entire process always repeats
 Some tasks require repetitive execution
 loop
 Branching requires a conditional test
 if-then-else

[Figure: the task diagram (Begin, Wait for Input, four task branches, End) with IF marked at each branch point]

Write the Code
 The Code
 Falls naturally from the task diagram
 Rotate the task diagram

[Figure: the task diagram with IF marked at each branch point]

Begin
Loop
  Wait for Input
  if
    Write to RAM
    Update Checksum
  else if
    Read RAM
    Wait for Send
    Output all data
  else if
    Read Input
    Accumulate
    if
      Update RAM
    else
      Indicate Ready
    end if
  else
    Process RAM Data
    Output Result
    Update Status
  end if
End Loop
End

In 3 steps, we now have an overview of a behavioral description
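
As a way to check the control structure before writing real behavior, the pseudocode can be mirrored in plain C++ with each task reduced to its name. The slide does not name the branch conditions, so the `Command` enum and the `buffer_full` flag below are hypothetical placeholders, and `tasks_for` models one pass through the loop body.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical branch selectors; the slide leaves the conditions unnamed.
enum class Command { WriteRam, ReadRam, ReadInput, ProcessRam };

// Returns the ordered task names executed in one pass of the loop body.
std::vector<std::string> tasks_for(Command cmd, bool buffer_full) {
    std::vector<std::string> tasks;
    tasks.push_back("Wait for Input");
    if (cmd == Command::WriteRam) {
        tasks.push_back("Write to RAM");
        tasks.push_back("Update Checksum");
    } else if (cmd == Command::ReadRam) {
        tasks.push_back("Read RAM");
        tasks.push_back("Wait for Send");
        tasks.push_back("Output all data");
    } else if (cmd == Command::ReadInput) {
        tasks.push_back("Read Input");
        tasks.push_back("Accumulate");
        // Nested conditional from the task diagram.
        tasks.push_back(buffer_full ? "Update RAM" : "Indicate Ready");
    } else {
        tasks.push_back("Process RAM Data");
        tasks.push_back("Output Result");
        tasks.push_back("Update Status");
    }
    return tasks;
}
```

Replacing each string with the real task, and wrapping the body in an endless loop, gives the shape of the behavioral description.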


Simple Example : Behavioral Code


 Behavioral Level Code
 Add IO accesses

 Specify a clock

 Refine data types/widths


 Synthesize directly from this point

index = 0;
...
// Read in the sample values
while( index < 16 )
{
  // Add input and output port accesses
  data_A = Port1.read();
  data_B = Port2.read();
  data_C = Port3.read();
  data_D = Port4.read();
  D_diff = data_A * data_B - data_C * data_D;
  D_out.write(D_diff + Accum);
  // Add a clock
  wait();
} ...

Coding in Behavioral Style and Tips - 1

Abstraction Levels: System, Behavioral, RTL, Gate, Transistor

 describe behavior
 start with generic types like float or integer and try to limit the bit width as much as possible
 move main behavior/IO into entry function
 be careful with using signals globally; think about it as a routing problem
 add better I/O protocol to the design
 use handshake
 channels
 try to improve memory performance by using wider memories


Coding in Behavioral Style and Tips - 2

Abstraction Levels: System, Behavioral, RTL, Gate, Transistor

 try to find a reasonable compromise between complexity of function and maintainability
 too much behavior in a function makes it difficult to read
 too little behavior leads to an “artificial” hierarchy
 I/O usually changes between behavioral and RT level simulation
 think about latency impact; using memories means latency (wait statements)
 internal bit widths have to be adjusted so that the error rate is not violated

Coding in Behavioral Style and Tips - 3

Abstraction Levels: System, Behavioral, RTL, Gate, Transistor

 keep internal variables, e.g. loop variables, as short as possible
 don’t hide I/O operations
 keep subprograms a reasonable size
 try to find the “lifetime” of signals
 control statements, e.g. if or switch, go by default into the state machine and are not implemented as a simple multiplex operation
 try to describe the behavior, not a state machine
 control branches can increase design complexity dramatically and unnecessarily


Coding in Behavioral Style and Tips - 4

Abstraction Levels: System, Behavioral, RTL, Gate, Transistor

 keep names simple and easy to understand (similar to standard)
 entry function should show main behavior, hide details in subprograms, but don’t hide everything
 make sure you know the “key” parts of the design (comments, comments, comments…)

RTL Architecture

[Figure: RTL architecture. A Control FSM (State register, Ext Logic, Status inputs, Optional Pipe Register) drives a Datapath with Data inputs, Memory, CLK, and outputs.]


RTL Level Modeling - 1

Abstraction Levels: System, Behavioral, RTL, Gate, Transistor

Characteristics
• Fully clock driven RTL code with some behavioral constructs
• Contains complete functional description
• Cycle accurate
Purpose
• Input for Synthesis tools
• Validation model for structural code
• Full functionality
Challenges
• Textual entry
• Synthesis coding style

RTL Level Modeling - 2

Abstraction Levels: System, Behavioral, RTL, Gate, Transistor

At the RT Level, language issues blur somewhat:
 Are we modeling a design for simulation?
 Are we describing the design to a synthesis tool?

Compromise, because both apply
 Must simulate the design & the target system
 Don’t even need a simulator for successful synthesis
 Synthesis uses a subset of the C++ language
 Good code for synthesis may not be fastest to simulate
 Synthesis tools require partitioning/structure not useful in simulation


Simple Example : RTL Code


 RT Level Code
 Design the Implementation

 Capture SystemC code


 Add I/O accesses, clock and refine data types/widths
 Synthesize from here directly
Control
{
  while (true) {
    if (clock.posedge()) {
      if (reset) {
        index = 0;
      } else {
        if (index < 16) {
          index++;
          enable = true;
        } else {
          index = 0;
          enable = false;
        }
      }
    }
  }
}

Datapath
{
  while (true) {
    if (clock.posedge()) {
      if (reset) {
        D_out.write(0);
      } else {
        if (enable) {
          data_A = Port1.read();
          data_B = Port2.read();
          data_C = Port3.read();
          data_D = Port4.read();
          D_diff = data_A * data_B - data_C * data_D;
          D_out.write(D_diff + Accum);
        }
      }
    }
  }
}
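
The control/datapath split can be tried as a cycle-based model in plain C++, where one function call stands for one rising clock edge. This is a sketch, not the workshop's SystemC code: `RtlModel` and `posedge` are hypothetical names, Accum is held at zero because the slide never updates it, and the datapath is assumed to see the enable value registered on the previous edge, as it would with signal semantics.

```cpp
#include <cassert>

struct RtlModel {
    int  index  = 0;      // control counter register
    bool enable = false;  // control output, read by the datapath
    long d_out  = 0;      // datapath output register
    long accum  = 0;      // never updated on the slide, so constant here

    // One rising clock edge for both blocks.
    void posedge(bool reset, int a, int b, int c, int d) {
        const bool enable_q = enable;  // registered value from the previous edge

        // Control block
        if (reset) {
            index = 0;
        } else if (index < 16) {
            ++index;
            enable = true;
        } else {
            index = 0;
            enable = false;
        }

        // Datapath block
        if (reset) {
            d_out = 0;
        } else if (enable_q) {
            d_out = static_cast<long>(a) * b - static_cast<long>(c) * d + accum;
        }
    }
};
```

After a reset edge, the first active edge only raises enable; the datapath result appears one edge later, which is the cycle-level detail the system-level code did not express.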

Gate Level Modeling

Abstraction Levels: System, Behavioral, RTL, Gate, Transistor

Characteristics
• Fully clock driven Structural code (gate level)
• ASIC's, PLD's, Glue logic
• Synthesized representation
• Schematic representation
Purpose
• Representation for Physical design
Challenges
• More simulator events, so slower simulation


Use Behavioral or RTL Modeling?


 Enter the design in the most intuitive fashion ….
 If you have thought of design as an FSM, use RTL
 Do not make an FSM into behavioral code
 If a design has a lot of arbitrary state transitions
(“go to” functionality) use RTL
 Use Behavioral for :
 Designs with algorithmic computations
 Designs with complex data flow
 Designs with a lot of memory accesses
 Designs with a flexible latency to implement the functionality

 Control dominated designs

Methodology Note

 System to Behavioral Level


 Add I/O reads and writes
 re-verify with top level testbench
 Add Clock information
 re-verify with top level testbench
 Refine data types and bit widths
 re-verify with top level testbench
 synthesize
 System to RTL
 Write SystemC code
 verify with top level testbench
 here, the chance of being in a long debug loop is
much greater


Summary

 System-on-a-Chip
 covered issues involving System Level Design

Lab 11: RSA Public-Key Crypto-System -1
Primary Objectives:
understand SystemC’s role in Software development
understand the use of sc_biguint<> and sc_bigint<>

Background
RSA is a public-key cryptosystem that can be used to encrypt messages sent between 2 communicating parties.
RSA also enables a “digital signature” to be added to the end of an electronic message, which is the electronic version of a handwritten signature on a paper document.
It is based on the dramatic difference between the ease of finding large prime numbers and the difficulty of factoring the product of 2 large prime numbers.


Lab 11: RSA Public-Key Crypto-System -2


Basic Information
To encrypt data, enter the data ("plaintext") and an encryption key to the encryption portion of the algorithm.
To decrypt the "ciphertext," a proper decryption key is used at the decryption portion of the algorithm. Those keys, which are simply strings of numbers, are called the public key and the private key, respectively.
For example, suppose Alice intends to send e-mail to Bob. Through a public-key directory, she finds his public key.
Then, she encrypts her message using the key and sends it to Bob.
This public key, however, will not decrypt the ciphertext. Knowledge of Bob's public key will not help an eavesdropper.
In order for Bob to decrypt his ciphertext, he must use his private key. If Bob wants to respond to Alice, he encrypts his message using her public key.
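
The encrypt/decrypt round trip can be seen with a toy key in plain C++. This is a minimal sketch and not the lab solution: it uses 64-bit integers instead of sc_biguint<>, the key (p = 61, q = 53) is far too small to be secure, and the names `mod_pow`, `encrypt`, and `decrypt` are chosen here for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Computes (base ^ exponent) mod modulus by square-and-multiply.
// Free of overflow only while modulus < 2^32.
std::uint64_t mod_pow(std::uint64_t base, std::uint64_t exponent,
                      std::uint64_t modulus) {
    assert(modulus != 0 && modulus < (1ULL << 32));
    std::uint64_t result = 1 % modulus;
    base %= modulus;
    while (exponent > 0) {
        if (exponent & 1) result = (result * base) % modulus;
        base = (base * base) % modulus;
        exponent >>= 1;
    }
    return result;
}

// Toy key: n = 61 * 53 = 3233, public exponent e = 17, private exponent d = 2753
// (17 * 2753 = 46801 = 15 * 3120 + 1, where 3120 = 60 * 52).
constexpr std::uint64_t kN = 3233;
constexpr std::uint64_t kE = 17;
constexpr std::uint64_t kD = 2753;

std::uint64_t encrypt(std::uint64_t plaintext)  { return mod_pow(plaintext, kE, kN); }
std::uint64_t decrypt(std::uint64_t ciphertext) { return mod_pow(ciphertext, kD, kN); }
```

Anyone may call `encrypt` with the public pair (n, e), but only the holder of d can invert it. Real keys need integers hundreds of bits wide, which is where sc_biguint<> and sc_bigint<> come in.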

Lab 11: RSA Public-Key Crypto-System -3

Lab Assignment:
 Learn how to write a software model using sc_bigint<> and
sc_biguint<>

1. If you understand the RSA algorithm, try to implement it in SystemC.
2. Otherwise, look at the model solution, try to implement it in ANSI C++, and
compare the ease of coding and the code size.


Lab 11: Sample Output


Sample output:

Agenda: Day Three
DAY
3
Unit Topic Lab

19 System-on-a-Chip

20 Workshop Summary

21 Appendix and Labs Solutions


SystemC™ Enables System/Behavior/RTL Design

Algorithms • Using C/C++
System • Using C/C++
Application • Using C/C++
Architecture • Using C/C++
The Open SystemC™ Initiative (OSCI)

 Launched on 9/27/99
 Technical collaborators:
 CoWare
 Frontier Design
 ARM
 Endorsed by over 55 systems, semiconductor, IP, embedded software
and EDA companies (as of 09/27/1999)

System-level implementation tools

[Figure: block diagram of a digital phone design (RF and analog front end, demodulator, DSP core, µC core, RAM, DMA, keypad and phone book interfaces, speech decoding, voice recognition) annotated with implementation tools: Memory Compilers, Design Compiler, Cyclone, Analog design, High-speed datapath compiler, Behavioral Compiler, COSSAP]

System Level Co-Design Continuum

[Figure: continuum from Architecture Definition and Functional Validation through HW/SW Performance Evaluation, with C/C++ models compiled to an executable (a.out)]

Synthesis(?!) Contact your Synopsys representatives for details!

Problems in System Level Design


Here is what the author (Martin Wang) sees as the main System Level Design problems, and why he believes
SystemC is the solution to them.
1) "I have an algorithm in hand. How do I know whether to implement it in dedicated hardware or in software?
Which one is more cost-efficient and effective?" This is a hardware/software partitioning problem.

2) "I have an IP core, and I want to test whether my algorithm fits with it or not, but I don't want to spend 4
months writing the RTL code and then find out that it doesn't. Is there a faster way to prove to my
management that my algorithm is worth using?" This is a fast proof-of-concept problem.

3) "I found a bug in my chip after tape-out. I have an idea how to fix it, but I want to verify it
first before asking the RTL designer to change the code." This is a verification problem.

4) "I want to improve my chip's performance, but I don't want to write the whole RTL. I just want to do some
experiments to explore the performance improvement. What should I do?" This is an architectural exploration
problem.

5) "We got a CPU C model from company X, and we would like to add other designs to it as a System on a
Chip. But our company's C modeling language is different from theirs. We are all using our company's
internal proprietary language. Is there a way out?" This is an IP problem.

6) "We want to move to C++ modeling, but many architects and designers are still not comfortable
programming in C++. They are very good at C. Is there a C++ modeling language that is easy to use for C
programmers?" This is a programming problem.

SystemC™ is YOUR SOLUTION.


Summary for Day Three

 Day Three
 covered global and local watching
 covered hardware modeling especially bus
modeling
 covered refinement for more details
 covered functional I/F
 covered hardware/software co-verification
 briefly covered system-on-a-chip


SystemCTM Workshop Summary


 SystemC™ is the C++ Modeling Standard
 Designers do not need to be experts in a number of different languages.
SystemC will allow modeling from the system level to the gate level if necessary.
 One set of tools

 Hardware and Software Co-verification with SystemC™

 Fast simulation when compared with traditional co-simulation
 An executable specification
 A System-to-Gates Development Environment
 Faster development & debug

Connecting Designers
HW/SW
SystemCTM Community

http://www.systemc.org


Beyond SystemC (!?)

TTM Complexity
Synthesize?
Test?
Simulate?
Floorplan?

???

Contact your local AC/Synopsys representatives . . .

Synopsys can HELP you SUCCEED!

[Figure: the design cycle, from Design idea through System Definition and Entry, System-Level Debugging, Software Development and Hardware Implementation with Synopsys tools, to Product Presentation, Product Sale and Customer Response (Customer Feedback Loop); the axis is labeled CRITICAL TIME]

Appendix A - Tracing

Appendix A: Tracing - 1

 All tracing functions have in common:


 Called sc_trace
 First argument is pointer to the trace file data structure
sc_trace_file
 Second argument is a reference to a variable being traced
 Third argument is a reference to a string.

 Each tracing function has additional arguments that specify how the
variable is traced.
 All additional arguments have defaults, so only the first 3 arguments need to be specified

Appendix A: Tracing - 2

Tracing functions:
// For tracing unsigned char
void sc_trace(sc_trace_file *tf, const unsigned char& v,
const sc_string& NAME,
const int width = 8 * sizeof(char));
// For tracing unsigned short
void sc_trace(sc_trace_file *tf, const unsigned short& v,
const sc_string& NAME,
const int width = 8 * sizeof(short));
// For tracing unsigned long
void sc_trace(sc_trace_file *tf, const unsigned long& v,
const sc_string& NAME,
const int width = 8 * sizeof(long));
// For tracing unsigned int
void sc_trace(sc_trace_file *tf, const unsigned int & v,
const sc_string& NAME,
const int width = 8 * sizeof(int));

Appendix A: Tracing - 3

Tracing functions:
// For tracing char
void sc_trace(sc_trace_file *tf, const char& v,
const sc_string& NAME,
const int width = 8 * sizeof(char));
// For tracing short
void sc_trace(sc_trace_file *tf, const short& v,
const sc_string& NAME,
const int width = 8 * sizeof(short));
// For tracing long
void sc_trace(sc_trace_file *tf, const long& v,
const sc_string& NAME,
const int width = 8 * sizeof(long));
// For tracing int
void sc_trace(sc_trace_file *tf, const int & v,
const sc_string& NAME,
const int width = 8 * sizeof(int));
// For tracing float
void sc_trace(sc_trace_file *tf, const float& v,
const sc_string& NAME);

Appendix A: Tracing - 4

Tracing functions:
// For tracing double
void sc_trace(sc_trace_file *tf, const double& v,
const sc_string& NAME);
// For tracing bool
void sc_trace(sc_trace_file *tf, const bool& v, const sc_string& NAME);
// For tracing sc_logic
void sc_trace(sc_trace_file *tf, const sc_logic& v,
const sc_string& NAME);

 Note that sc_trace is a template function for tracing signals of scalar
types and for channels
Tracing functions:
// scalars
template <class T> void
sc_trace(sc_trace_file *tf, const sc_signal<T>& v,
const sc_string& NAME);
// channels
template <class T> void
sc_trace(sc_trace_file *tf, const sc_channel<T>& v,
const sc_string& NAME);
Appendix A: Tracing - 5

Tracing functions:
// For tracing sc_signal<int>
void sc_trace(sc_trace_file *tf, const sc_signal<int>& v,
const sc_string& NAME, const int width);
// For tracing sc_signal<long>
void sc_trace(sc_trace_file *tf, const sc_signal<long>& v,
const sc_string& NAME, const int width);
// For tracing sc_signal<short>
void sc_trace(sc_trace_file *tf, const sc_signal<short>& v,
const sc_string& NAME, const int width);
// For tracing sc_signal<char>
void sc_trace(sc_trace_file *tf, const sc_signal<char>& v,
const sc_string& NAME, const int width);


Appendix A: Tracing Example


/* Top-level routine for handshaking simulation */
#include "systemc.h"
#include "proc1.h"
#include "proc2.h"
int sc_main(int ac, char* av[])
{
  sc_signal<bool> data_ready;
  sc_signal<bool> data_ack;
  sc_signal<int> data;
  sc_clock clock("CLOCK", 10, 0.5, 0.0);
  proc1 Mast("MastrProcess", clock.pos(), data_ack, data, data_ready);
  proc2 Slav("SlaveProcess", clock.pos(), data_ready, data, data_ack);
  // Create VCD file hshake.vcd
  sc_trace_file * tf = sc_create_vcd_trace_file("hshake");
  // Dump signal values
  sc_trace(tf, data_ready, "DataReady");
  sc_trace(tf, data_ack, "Data_Ack");
  sc_trace(tf, data, "Data"); // the third argument (trace name) is required
  // Dump internal variables of processes
  sc_trace(tf, Mast.data_written, "DataWritten");
  sc_trace(tf, Slav.data_read, "DataRead");
  sc_trace(tf, clock.signal(), "Clock");
  sc_start(-1);

  cout << "SIMULATION COMPLETED AT TIME " << sc_time_stamp() << endl;
  return 0;
}

Appendix A: Tracing aggregate type

Given:

struct bus {
  unsigned address;
  bool read_write;
  unsigned data;
};

example data trace function:

void sc_trace(sc_trace_file *tf, const bus& v, const sc_string& NAME)
{
  sc_trace(tf, v.address, NAME + ".address");
  sc_trace(tf, v.read_write, NAME + ".rw");
  sc_trace(tf, v.data, NAME + ".data");
}

Appendix A: Tracing Variable, signal & channel arrays - 1

Tracing function example for sc_array<int> :

void sc_trace(sc_trace_file *tf, const sc_array<int>& v,
const sc_string& NAME, const int width = 8 * sizeof(int))
{
char stbuf[20];
for (int i = 0; i< v.length(); i++) {
sprintf(stbuf, "[%d]", i);
sc_trace(tf, v[i], NAME + stbuf, width);
}
}

Appendix A: Tracing Variable, signal & channel arrays - 2
Tracing function example for sc_signal_array<sc_array<int> > :
void sc_trace(sc_trace_file *tf,
const sc_signal_array<sc_array<int> >& v,
const sc_string& NAME, const int width = 8 * sizeof(int))
{
char stbuf[20];
for (int i = 0; i< v.length(); i++) {
sprintf(stbuf, "[%d]", i);
sc_trace(tf, v[i], NAME + stbuf, width);
}
}

Appendix A: Tracing Variable, signal & channel arrays - 3
Tracing functions :
// For tracing sc_bool_vector
void sc_trace(sc_trace_file *tf, const sc_bool_vector& v,
const sc_string& NAME);
// For tracing sc_signal_bool_vector
void sc_trace(sc_trace_file *tf, const sc_signal_bool_vector& v,
const sc_string& NAME);
// For tracing sc_logic_vector
void sc_trace(sc_trace_file *tf, const sc_logic_vector& v,
const sc_string& NAME);
// For tracing sc_signal_logic_vector
void sc_trace(sc_trace_file *tf, const sc_signal_logic_vector& v,
const sc_string& NAME);
// For tracing sc_signal_resolved_vector
void sc_trace(sc_trace_file *tf, const sc_signal_resolved_vector& v,
const sc_string& NAME);
// For tracing sc_int
void sc_trace(sc_trace_file *tf, const sc_int& v,
const sc_string& NAME);
// For tracing sc_uint
void sc_trace(sc_trace_file *tf, const sc_uint& v,
const sc_string& NAME);

Appendix B
Data Types

Conventional Number Systems


Unsigned Binary
\[ V = \sum_{i=0}^{n-1} b_i 2^i \]

Signed Binary
\[ V = -b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i \]

General form: base "r"
\[ V = \sum_{i=0}^{n-1} v_i r^i, \qquad v_i \in [0, r-1] \]

In the number systems shown above, each allowed value, V, can be
uniquely represented. Example: "7" in unsigned binary is: "111"
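
The unsigned and signed formulas can be checked with a few lines of plain C++. This is a sketch only: the function names unsigned_value and signed_value are made up for the illustration, and bit strings are assumed to be written MSB first and shorter than 63 bits.

```cpp
#include <cstdint>
#include <string>

// Unsigned binary: V = sum of b_i * 2^i. The string is written MSB first, e.g. "111".
std::int64_t unsigned_value(const std::string& bits)
{
    std::int64_t v = 0;
    for (char c : bits)
        v = v * 2 + (c == '1' ? 1 : 0);
    return v;
}

// Signed binary (two's complement): the MSB b_{n-1} carries the weight -2^(n-1).
std::int64_t signed_value(const std::string& bits)
{
    if (bits.empty())
        return 0;
    std::int64_t v = unsigned_value(bits.substr(1));   // lower n-1 bits
    if (bits[0] == '1')
        v -= std::int64_t(1) << (bits.size() - 1);      // subtract 2^(n-1)
    return v;
}
```

For example, unsigned_value("111") is 7, while signed_value("111") is -1 because the leading 1 now weighs -4.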

Redundant Binary Number Systems
Binary
base is "2" and magnitudes are 2^0, 2^1, 2^2, 2^3, ...
more than one bit per position
digit values V beyond 0,1 (base is still "2")

Signed Digit {-1,0,+1}        Carrysave {0,+1,+2}

example encoding              example encoding
bits   digit                  bits   digit
00      0                     00      0
01     +1                     01     +1
10      0                     10     +1
11     -1                     11     +2

Why are the above redundant binary ?
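
One way to explore the question: count how many digit strings produce the same value. The sketch below does this for the signed-digit set {-1, 0, +1}; the function name count_signed_digit_forms is an assumption made for the illustration.

```cpp
// Counts how many n-digit base-2 strings over the digit set {-1, 0, +1}
// have the value `target`.
int count_signed_digit_forms(int n, int target)
{
    if (n == 0)
        return target == 0 ? 1 : 0;
    int count = 0;
    // Pick the least significant digit d; the remaining n-1 digits
    // must then encode (target - d) / 2, which requires target - d to be even.
    for (int d = -1; d <= 1; ++d) {
        int rest = target - d;
        if (rest % 2 == 0)
            count += count_signed_digit_forms(n - 1, rest / 2);
    }
    return count;
}
```

With three digits the value 1 can be written in three ways: (0, 0, +1), (0, +1, -1) and (+1, -1, -1). In conventional unsigned binary every value has exactly one representation.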


Appendix B: List of Operations by operation type

List of all operations SystemC™ allows, classified into operation types

Operator Class     Operators in class
Bitwise            ~ & ^ | << >>
Arithmetic         + - * / %
Logical            ! && ||
Equality           == !=
Relational         < <= > >=
Assignment         =, +=, -=, *=, /=, %=, &=, ^=, |=, <<=, >>=
Auto-increment     ++
Auto-decrement     --
Arithmetic if      ? :
Concatenation      ,
Index              []
Appendix B: sc_int, sc_uint Operators - 1

sc_int, sc_uint operators:


 [ ] used to refer to a particular element
 range() method extracts a sub-vector - signed integer or unsigned integer
 User responsible to make sure sub-vector is valid (makes
sense in the design)

Examples:
sc_int<5> a;
a = 13; // a gets 01101, a[4] = 0, a[3] = 1, …, a[0] = 1
bool b;
b = a[4]; // b gets 0
b = a[3]; // b gets 1
b = a[0]; // b gets 1
sc_int<3> c;
c = a.range(1, 3); // c gets 011 - interpreted as 3


Appendix B: sc_int, sc_uint Operators - 2

sc_int, sc_uint operators (Cont.):


 Arithmetic and logical operators
 Standard C++ arithmetic and logical operations are supported
 For binary arithmetic operators when one operand is sc_int
or sc_uint other operand may be of type sc_int, sc_uint ,
(unsigned) long, (unsigned) int, (unsigned) short,
or (unsigned) char only.
Example:
sc_int<10> a, b;
if (a == b)
a *= 10;
else if (a > b)
a = b + 5;
else {
while (a > 0) {
b++;
a--;
}
}
Appendix B: sc_int, sc_uint Operators - 3

sc_int, sc_uint operators (cont.)


 Arithmetic and logical operators
 When the operands of a binary operator are:
 sc_uint and a built-in integer type
 Built-in integer type converted to sc_uint same width
as sc_uint .
 Result is of type sc_uint

Example:
signed char a = -1;
sc_uint b(16), c(16);
b = 5;
c = b + a; // a gets promoted to sc_uint of width 16 and
// gets value 255. Therefore the value of c is 260 and not 4.


Appendix B: sc_int, sc_uint Operators - 4

sc_int, sc_uint operators (cont.)


 Arithmetic and logical operators
 When the operands of a binary operator are:
 sc_int and a built-in integer type
 Built-in integer type converted to sc_int same width
as sc_int .
 Result is of type sc_int

Examples:
unsigned char a = 2;
sc_int<8> b;
sc_int<9> c;
b = 4;
c = b + a; // a gets promoted to sc_int of width 8 and
// gets value 2. Therefore the value of c is 6.

Appendix B: sc_int, sc_uint Operators - 5

sc_int, sc_uint operators (cont.)


 Arithmetic and logical operators
 When the operands of a binary operator are (Cont.):
 sc_uint and sc_int
 sc_uint promoted to sc_int of appropriate width.
 Result is of type sc_int

 Bitwise operators defined for sc_bool_vector are also
defined for sc_int or sc_uint .
 For bitwise logical operators, when one operand is
sc_int or sc_uint other operand may be of type
sc_int, sc_uint , (unsigned) long, (unsigned)
int, (unsigned) short, or (unsigned) char only.


Appendix B: sc_int, sc_uint Operators - 6

Summary of binary operators allowed for sc_int, sc_uint


Operators: & | ^ + * / % == != << >> < <= > >= = += -= *= /= %= &= ^= <<= >>=
Type of the other operand: sc_int, sc_uint, long, int, short, char,
unsigned long, unsigned int, unsigned short, unsigned char

Operators: =
Type of the other operand: sc_bool_vector, sc_logic_vector, char *

Appendix B: sc_int, sc_uint Type Conversions - 1

 Assignment operator = is used.


 RHS:
 sc_bv, sc_lv covered already
 C++ string (char * )
 used when number expressed in binary
 format is <base><sign> digits
 base is "d", "D", "o", "O", "x", "X".
 No base specified means binary
 Converted to 2's complement

Examples:
sc_int<10> a;
sc_uint<10> b;
a = "d28"; // a gets decimal 28 or 0000011100
a = "d-28"; // a gets decimal -28 or 1111100100
b = "o32"; // b gets decimal 26 or 0000011010
b = "o-32"; // b gets decimal 998 or 1111100110
a = "xfff"; // a gets decimal -1 or 1111111111
a = "x-fff"; // a gets decimal 1 or 0000000001
b = "d-2048"; // b gets 0
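
The bit patterns in the comments above follow from two's complement arithmetic on a fixed width. A minimal sketch in plain C++, with no SystemC types: the helper names low_bits and as_signed are hypothetical, and widths are assumed to be between 1 and 63.

```cpp
#include <cstdint>

// Keep only the low `width` bits of a two's complement value (1 <= width <= 63).
std::uint64_t low_bits(std::int64_t value, int width)
{
    const std::uint64_t mask = (std::uint64_t(1) << width) - 1;
    return static_cast<std::uint64_t>(value) & mask;
}

// Read a `width`-bit pattern back as a signed number:
// the top bit of the pattern has the weight -2^(width-1).
std::int64_t as_signed(std::uint64_t pattern, int width)
{
    const std::uint64_t sign = std::uint64_t(1) << (width - 1);
    return static_cast<std::int64_t>(pattern ^ sign) - static_cast<std::int64_t>(sign);
}
```

For a 10-bit target, low_bits(-28, 10) is 996 (1111100100), which as_signed reads back as -28, while an unsigned reader sees 996. Likewise low_bits(-26, 10) is 998 and low_bits(-2048, 10) is 0, matching the examples.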

Appendix B: sc_int, sc_uint Type Conversions - 2


 RHS (Cont.):
 sc_int sc_uint

 LHS shorter
 RHS left-most bits truncated
 RHS shorter
 LHS sc_int then RHS sign extended
 LHS sc_uint then RHS zero extended
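
The truncation and extension rules above can be modeled on raw bit patterns. This is a sketch under assumptions: assign_bits is a hypothetical helper, not a SystemC function, widths are assumed to be below 64, and the extension is chosen by the type of the left-hand side exactly as the rules on this slide state.

```cpp
#include <cstdint>

// Returns the bit pattern stored in a dst_width-bit target when a
// src_width-bit pattern is assigned to it.
std::uint64_t assign_bits(std::uint64_t src_bits, int src_width,
                          int dst_width, bool dst_is_signed)
{
    const std::uint64_t dst_mask = (std::uint64_t(1) << dst_width) - 1;
    if (dst_width <= src_width)
        return src_bits & dst_mask;                   // LHS shorter: drop left-most bits
    const bool msb_set = ((src_bits >> (src_width - 1)) & 1) != 0;
    if (dst_is_signed && msb_set) {
        const std::uint64_t src_mask = (std::uint64_t(1) << src_width) - 1;
        return src_bits | (dst_mask & ~src_mask);      // signed LHS: copy the sign bit
    }
    return src_bits;                                   // otherwise the new bits are 0
}
```

For example, assigning the 4-bit pattern 1110 to an 8-bit signed target gives 11111110 (254 as a raw pattern, -2 as a signed value), while assigning 1101 to a 6-bit unsigned target gives 001101.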

Examples:
sc_int<4> a;
sc_int<6> b;
sc_uint<4> x;
sc_uint<6> y;
a = "1101"; // a gets -3
b = "010011"; // b gets 19
x = a; // x gets 1101 or 13
y = a; // y gets 001101 or 13
x = b; // x gets 0011 or 3
x = "0011"; // x gets 3
y = "100101"; // y gets 37
a = x; // a gets 0011 or 3
b = x; // b gets 000011 or 3
a = y; // a gets 0101 or 5

Examples:
sc_int<4> a;
sc_int<6> b;
sc_int<8> c;
sc_uint<4> m;
sc_uint<6> n;
sc_uint<8> p;
b = "101110"; // b gets -18
a = b; // a gets 1110 or -2
c = a; // c gets 11111110 or -2
n = "100011"; // n gets 35
m = n; // m gets 0011 or 3
p = m; // p gets 00000011 or 3

Appendix B: sc_int, sc_uint Type Conversions - 3

 RHS (Cont.):
 unsigned int
 Assignment is right-aligned
 RHS larger, left most bits truncated
 RHS smaller, excess bits are assigned value of 0

Examples:
sc_int<4> a;
sc_uint<4> c;
unsigned int b = 6; // 110 in binary
sc_int<40> d;
d = b; // d gets 000…0110 (37 leading zeros)
a = b; // a gets 0110 or 6;
c = b; // c gets 0110 or 6;
b = 31; // 31 decimal = 11111
a = b; // a gets 1111 or -1
c = b; // c gets 1111 or 15

Appendix B: sc_int, sc_uint Type Conversions - 4

 RHS (Cont.):
 signed int
 Assignment is right-aligned
 RHS larger, left most bits truncated
 RHS smaller, excess bits are assigned value of sign bit

Examples:
sc_int x(4);
sc_uint y(4);
int w = -7; // 111…1001 in binary (29 leading 1's)
sc_int<33> m;
sc_uint<33> n;
x = w; // x gets 1001 or -7
y = w; // y gets 1001 or 9
m = w; // m gets 111…1001 (30 leading 1's) = -7
n = w; // n gets 111…1001 (30 leading 1's) = 8589934585
w = -28; // -28 decimal = 11…100100
x = w; // x gets 0100 or 4
y = w; // y gets 0100 or 4
Appendix B: sc_int, sc_uint Type Conversions - 5

 sc_int conversion to C++ types:
 to_int() integer
 to_double() double precision floating point
 to_string() character string.
 sc_uint conversion to C++ types:
 to_unsigned() integer
 to_double() double precision floating point
 to_string() character string.
Examples:
sc_int<4> x;
sc_uint<4> y;
int a;
unsigned b;
double c;
char *str;
x = "1011"; // x gets -5
y = "0111"; // y gets 7
a = x.to_int(); // a gets -5
b = y.to_unsigned(); // b gets 7
c = x.to_double(); // c gets -5.0
str = y.to_string(); // str gets "0111"

Appendix B: sc_logic Operators - 1

sc_logic operators:
& logical AND
| logical OR
^ logical exclusive OR
~ logical NOT
== equality
!= inequality

NOT   X 0 1 Z
      X 1 0 X

AND   X 0 1 Z      OR    X 0 1 Z      XOR   X 0 1 Z
 X    X 0 X X       X    X X 1 X       X    X X X X
 0    0 0 0 0       0    X 0 1 X       0    X 0 1 X
 1    X 0 1 X       1    1 1 1 1       1    X 1 0 X
 Z    X 0 X X       Z    X X 1 X       Z    X X X X
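
The truth tables can be expressed compactly in plain C++. The sketch below is not how sc_logic is implemented; the enum Logic and the function names are assumptions for illustration. It relies on one observation from the tables: Z behaves like X whenever it is used as an operand.

```cpp
// Four logic values, in the same order as the table headers.
enum Logic { LX, L0, L1, LZ };

// Z behaves like X as an operand, so fold it first.
Logic fold(Logic v) { return v == LZ ? LX : v; }

Logic logic_and(Logic a, Logic b)
{
    a = fold(a); b = fold(b);
    if (a == L0 || b == L0) return L0;   // a 0 dominates AND
    if (a == L1 && b == L1) return L1;
    return LX;
}

Logic logic_or(Logic a, Logic b)
{
    a = fold(a); b = fold(b);
    if (a == L1 || b == L1) return L1;   // a 1 dominates OR
    if (a == L0 && b == L0) return L0;
    return LX;
}

Logic logic_xor(Logic a, Logic b)
{
    a = fold(a); b = fold(b);
    if (a == LX || b == LX) return LX;   // any unknown makes XOR unknown
    return a == b ? L0 : L1;
}

Logic logic_not(Logic a)
{
    a = fold(a);
    if (a == LX) return LX;
    return a == L0 ? L1 : L0;
}
```

For example, logic_and(L0, LX) is L0 and logic_or(L1, LZ) is L1, because a controlling value decides the result regardless of the unknown input.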

Appendix B: sc_logic Operators - 2

 When one operand is of type sc_logic, the other
operand may be of type sc_logic, bool or char only.
 Assignment operators
(given: sc_logic a, b; )
Operator   Long form      Short form
&=         a = a & b;     a &= b;
|=         a = a | b;     a |= b;
^=         a = a ^ b;     a ^= b;

 Equality operators
 when using the == or != operator, one of the
operands may be a character literal:
'0', '1', 'X', or 'Z'


Appendix B: sc_logic Operators

Summary of binary operators allowed for sc_logic


Operators: & | ^ &= |= ^= == !=
Type of the other operand: sc_logic, bool, char

Operators: =
Type of the other operand: sc_logic, char, bool (restricted to 0
and 1 comparisons only)

sc_logic a, b, c, d, f, g;
b = '0';
c = '1';
f = '1';
a = b & c; // a gets 0
d = a | f; // d gets 1
g = f ^ d; // g gets 0
a = ~g; // a gets 1
b |= a; // b gets 1
c ^= a; // c gets 0
if (a == b) // condition is true
d = f & (~a); // d gets 0;
if (a == 'X') // condition is false
d = a & (~f); // not executed in this case
Appendix B: sc_logic Type Conversions
 The assignment operator = is used to assign a value to a variable of
type sc_logic.
Example ( given sc_logic a,b; ):
a = b;
 The RHS of the assignment operator may be of type:
 sc_logic
 bool
 char
 When the RHS is of type char, translation is from textual to logic
value.
 Lower case letters are translated to their uppercase
counterparts
 values other than '0', '1', 'x', 'z' translate to unknown (X)

Examples (given sc_logic a;):

a = '0'; // a gets 0
a = '1'; // a gets 1
a = 'z'; // a gets Z
a = 'x'; // a gets X
a = 's'; // a gets X
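
The character translation rule can be sketched as a small stand-alone function. The name to_logic_char is hypothetical, and the result is returned as a character rather than as an sc_logic value.

```cpp
#include <cctype>

// Returns one of '0', '1', 'X', 'Z' for any input character.
char to_logic_char(char c)
{
    // Lower case letters are translated to their uppercase counterparts.
    const char up = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    switch (up) {
        case '0': case '1': case 'X': case 'Z':
            return up;
        default:
            return 'X';   // anything unrecognized is unknown
    }
}
```

So to_logic_char('z') yields 'Z', and to_logic_char('s') yields 'X', as in the last example above.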

Appendix B: sc_bv Operators - 1


sc_bv operators:
 &, |, ^, ~, &=, |=, ^=
 Same as for bool except perform bitwise on entire vector
 Length of both vectors must be same
 Must be right-aligned
 When one operand is sc_bv other operand may be of type sc_bv,
sc_lv and char * .
 char *
 consist of 0 and 1 only
 equal in length to other operand, excess characters
ignored.

Examples:
sc_bv<3> a;
sc_bv<3> b;
a[2] = true; a[1] = false; a[0] = true; // a = 101
b[2] = true; b[1] = true; b[0] = false; // b = 110
sc_bv<3> c;
c = a & b; // c gets 100, that is c[2] = 1, c[1] = 0, c[0] = 0
c = a | b; // c gets 111
c = a ^ b; // c gets 011
c = ~c; // c gets 100
a &= "110"; // a gets 100, that is a[2] &= 1, a[1] &= 1, a[0] &= 0
b |= "0010"; // b gets 110 - leftmost 0 on C-string ignored
Appendix B: sc_bv Operators - 2
sc_bv operators (cont.):
 and_reduce(), or_reduce(), and xor_reduce()
 logical reduction of all elements of the vector

 ==, !=
 Bitwise comparison
 Variables must be of same size

Examples:
sc_bv<3> a;
sc_bv<3> b;
b |= "0010"; // b gets 110 - leftmost 0 on C-string ignored
a &= "110"; // a gets 100, that is a[2] &= 1, a[1] &= 1, a[0] &= 0
bool d;
d = and_reduce(a); // d gets 0
d = or_reduce(b); // d gets 1
d = xor_reduce(b) & xor_reduce(a); // d gets 0
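
For readers without a SystemC installation at hand, the three reductions can be tried on std::bitset as a stand-in for sc_bv. This is only an analogy: the template functions below are illustrative definitions, not the SystemC ones.

```cpp
#include <bitset>
#include <cstddef>

// AND of all bits: true only if every bit is 1.
template <std::size_t N>
bool and_reduce(const std::bitset<N>& v) { return v.all(); }

// OR of all bits: true if at least one bit is 1.
template <std::size_t N>
bool or_reduce(const std::bitset<N>& v) { return v.any(); }

// XOR of all bits: true if the number of 1 bits is odd.
template <std::size_t N>
bool xor_reduce(const std::bitset<N>& v) { return v.count() % 2 == 1; }
```

With a = 100 and b = 110 as in the example, and_reduce(a) is false, or_reduce(b) is true, xor_reduce(b) is false (two 1 bits) and xor_reduce(a) is true (one 1 bit).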


Appendix B: sc_lv Type Conversions - 1

 Assignment operator = is used.


 RHS variable:
 sc_bv same length LHS
 C++ array of sc_logic at least the length of LHS
 sc_lv same length LHS
 C++ string (char * )
 Greater than or equal to length LHS
 Assignment is right-aligned
 Any excess characters are ignored
 Characters can only be valid characters used to represent
sc_logic (‘1’, ‘0’, ‘x’, ‘z’)

Appendix B: sc_lv Type Conversions - 2
 RHS variable (Cont.):
 unsigned int or sc_uint or sc_int (discussed later)
 Assignment is right-aligned.
 If number of bits on RHS is greater than LHS then left most
bits are truncated.
 For sc_uint if number of bits on LHS is greater than RHS
then excess bits are assigned value of 0.
 For sc_int if number of bits on LHS is greater than RHS
then excess bits are assigned value of the sign bit (sign
extend).


Appendix B: sc_lv Type Conversions - 3

 sc_lv may be converted to the following types:


 signed int or unsigned int
 to_unsigned() method
 returns an unsigned int equivalent.
 sc_lv treated as an unsigned quantity.
 If sc_lv is greater than size of int then most significant
(leftmost) bits are truncated before conversion.
 to_signed() method
 returns a signed int equivalent.
 sc_lv treated as a signed quantity.
 If sc_lv is greater than size of int then most significant
(leftmost) bits are truncated before conversion.

Appendix B: sc_lv Type Conversions - 4

 sc_lv may be converted to the following types (Cont.):
 sc_int or sc_uint
 If size of sc_int is larger than sc_lv
 Sign extend sc_lv
 If size of sc_uint is larger than sc_lv
 Fill left most bits of sc_lv with zeros
 If size is smaller than sc_lv
 Truncate left most bits of sc_lv


Appendix B: sc_lv Type Conversions - 5

 Conversion (Cont.)
 C++ String
 to_string() method
 returns a C++ string representation.
 Useful for printing out sc_lv value

Example:
sc_lv<10> a;
cout << a; //Prints the value of a using C++ iostream
printf("%s", a.to_string()); //Prints using C stdio

Appendix B: One Dimensional Arrays: sc_array

Useful for modeling bit-vectors and memories.


sc_array
 Semantics identical to C++ arrays
 template class - SystemCTM arrays
 Declared by invoking template sc_array
 Array of elements
 Organized left to right.
 Rightmost element index: 0
 Leftmost element index: length of array - 1
 Each element is a scalar type (not an array)
 Arrays that are bit vectors of length n
 Rightmost bit (index 0) is LSB
 Leftmost bit (index n-1) is MSB
 Used to write to signals of array types.

sc_array<bool> a(4);    a[3]  a[2]  a[1]  a[0]
                        MSB               LSB

Appendix B: sc_array Syntax

Syntax:
sc_array<type> array_name (length) ;

type:
 Built in C++ type or class that has a default
constructor
length:
 Specifies the number of elements in the array
 Must be greater than 0
 Must be compile time constant

Examples:
sc_array<int> a(10) ; // an array of 10 int’s
sc_array<bool> b(32) ; // an array of 32 bool’s
sc_array<sc_logic> c(4) ; // an array of 4 sc_logic’s
sc_array<double> d(5) ; // an array of 5 double’s

Appendix B: sc_array Operator
sc_array operator:
 [ ] used to refer to a particular element
 Index ranges from 0 to length - 1

Example (given sc_array<int> a(10) ):


a[5]


Appendix B: sc_array Type Conversions


 Assignment operator = is used.
 RHS(source), LHS (target)
 Source may be (given an sc_array of type T ):
 sc_array of type T
 Length of target array and source array must be equal
 C++ array of type T
 Size of source array must be at least equal to target
array (may be greater).
 Assignment is right-aligned (rightmost to rightmost)
 Any excess elements are ignored

Examples:
sc_array<int> a(3);
sc_array<int> b(3);
sc_array<int> c(4);
int d[3] = {0, 1, 2}; //d[2] = 2, d[1] = 1, d[0] = 0
int e[5] = {0, 1, 2, 3, 4}; //e[4] = 4, e[3] = 3, …, e[1] = 1, e[0] = 0
a = d; // O.K. a[2] = d[2], a[1] = d[1], a[0] = d[0]
b = a; // O.K. b[2] = a[2], b[1] = a[1], b[0] = a[0]
c = b; // Illegal - array lengths not the same!!
c = d; // Illegal - C++ array smaller than SystemCTM array
c = e; // O.K. c[3] = e[3], c[2] = e[2], c[1] = e[1], c[0] = e[0]

Appendix B: Two Dimensional Arrays - sc_2d
 Collection of objects arranged into rows and columns.
 Each row is considered a one-dimensional array

 Two dimensions is highest supported at this time.

 Declared by invoking template on a one-dimensional
SystemC™ array type.

 Use sc_int<16> mem [1024]; instead of sc_2d

         Column numbers
         4   3   2   1   0
Row 0
Row 1
Row 2

Leftmost element of row (MSB): column 4. Rightmost element of row (LSB): column 0.

Appendix B: sc_2d Syntax

Syntax:
sc_2d< array_type > 2d_array_name (row_num, column_num ) ;

array_type: one of sc_array<>, sc_bool_vector,
sc_logic_vector, sc_int, sc_uint
row_num: Number of rows
column_num: Number of columns

Examples:
sc_2d<sc_array<int> > a (3, 7); //note the space between <int> and >
// 3 rows, 7 columns
sc_2d<sc_int<32> > b (5, 10); // 5 rows, 10 columns

Appendix B: sc_2d Operators

sc_2d operators:
 [ ] [ ] used to access an individual element

Example:
sc_2d<sc_bool_vector> a(10, 16);
sc_bool_vector c(16);
bool d;
c = a[0];
d = a[7][2];


Appendix B: sc_2d Type Conversions


 Assignment operator = is used.

 Can assign one two-dimensional SystemC™ array to another,
provided they have the same dimensions and same type for
elements.

 Can initialize with C++ one-dimensional array
 Through constructor for the SystemC™ two-dimensional array
 Or through assignment

Assignment example:
int init_array [] = {
//Column numbers
//0 1 2 3 4 5 6
1, 2, 3, 4, 5, 6, 7, //row 0
11, 12, 13, 14, 15, 16, 17, //row 1
111, 121, 131, 141, 151, 161, 171 //row 2
};
sc_2d<sc_array<int> > actual_array(3, 7,init_array);
// actual_array[0][0] = init_array[0] = 1
// actual_array[0][6] = init_array[6] = 7
// actual_array[1][2] = init_array[9] = 13
// actual_array[2][6] = init_array[20] = 171
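
The index arithmetic in the comments above is ordinary row-by-row flattening. A small sketch in plain C++: flat_index is a hypothetical helper, and init_array repeats the initializer shown above.

```cpp
#include <cstddef>

// Same initializer as above: 3 rows of 7 columns, stored row by row.
const int init_array[] = {
      1,   2,   3,   4,   5,   6,   7,   // row 0
     11,  12,  13,  14,  15,  16,  17,   // row 1
    111, 121, 131, 141, 151, 161, 171    // row 2
};

// Position of element (row, col) in a flat array with `cols` columns per row.
std::size_t flat_index(std::size_t row, std::size_t col, std::size_t cols)
{
    return row * cols + col;
}
```

For the 3 x 7 array, flat_index(1, 2, 7) is 9 and flat_index(2, 6, 7) is 20, which select 13 and 171 from init_array.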
Appendix C
Debugging

Appendix C: Debugging with SystemCTM - 1


Debugging: determining what is going wrong in your simulation.
Debugging consists of:
 Controlling the execution of your simulation
 Examining values of key data during the course of simulation

Use a debugger tool for source-level debugging:


 Unix: gdb, xgdb, dbx
 NT: Visual C++ development environment

Use "print" I/O statements in your code


 SystemCTM provides useful data

Trace waveforms for post-simulation viewing

Tip:
To enable source-level debugging, compile
with the -g option in gcc

Appendix C: Debugging with SystemCTM - 2

Controlling the execution of the simulation:

SystemC™ is different from normal, sequential programming

 Single-stepping from the beginning is not productive
 You don't know the actual order in which your processes will
execute
Tips:
 Set breakpoints in processes (first executable
statement inside the while loop)
 For asynchronous blocks, at first executable
statement
 Single-step within processes
 Step over calls to SystemC™ built-in functions

void proc1::entry()
{
  // Declarations, initialization
  while (true)
  {
    val = some_func(in_sig);   // <-- Set breakpoint here
    out_sig.write(val);
    wait();                    // <-- Step OVER
  }
}


Appendix C: Debugging with SystemCTM - 3

Examining data values:


While single-stepping, use the print command in the debugger
 gdb is C++ savvy

Use C++ I/O to print out data values


 SystemCTM data types know how to display themselves using C++ I/O
 If you must use C-style I/O, SystemCTM data types have a to_string()
method
proc1::entry()
{
while (true)
{
val = some_func(in_sig);
printf("proc1(%s): in_sig = %s\n",
sc_time_stamp(), in_sig.to_string());
cout << "proc1 (" << sc_clock::time_stamp() <<
"): val =" << val << endl;
out_sig.write(val);
wait();
}
}

Appendix C: Debugging with gdb

Invoking the debugger:


gdb [-x command-file] program [core-dump]

Common gdb commands:

List source lines to screen
  list
  list line1,line2
  list routine

Informational commands
  where
  whatis name
  ptype structure-name
  print name
  backtrace

Setting breakpoints
  break filename:line-number
  break filename:function-name
  break line-or-function if expression
  tbreak same-breakpoint-options
  watch expression

Managing breakpoints
  info breakpoints
  delete breakpoint#
  disable breakpoint#
  enable breakpoint#
  enable once breakpoint#
  clear

Controlling execution
  run command-line-args
  continue
  step
  next
  finish

Searching for text in source
  search regexp
  reverse-search regexp

Miscellaneous
  set gdb-var = text
  set variable name = expression
  source command-file-name
  shell unix-shell-command
  cd new-working-dir
  pwd
  kill
  quit
  help

Appendix C: Trace Waveforms


 Trace waveforms VCD example (WIF the same, except
substitute wif for vcd in the example):
 Trace functions can accept signals or variable of scalar types only.
Cannot trace variables
 Put in top-level main routine, or in a constructor

// Constructor for proc1


proc1(const char *NAME,
sc_clock_edge& CLK, sc_signal<int>& IN_SIG,
sc_signal<int>& OUT_SIG) : sc_sync(NAME, CLK),
in_sig(IN_SIG),out_sig(OUT_SIG)
{ // Create trace file
sc_trace_file *tf = sc_create_vcd_trace_file("tracefile");

// Trace signals
sc_trace(tf, in_sig, "in_sig");
sc_trace(tf, out_sig, "out_sig");
}

Use a waveform viewing tool such as VSS for WIF waveform files,
GTKWave for VCD files, or SimWave for ISDB files.
Appendix TB
Testbench Issues

[Figure: a system made of a microprocessor, memory, standard parts and several ASICs, surrounded by a testbench. The behavioral model of one ASIC is plugged out of the system around it, then partitioned and distributed to Designer 1, Designer 2, ... Designer n, each with a testbench.]

Need ONE testbench for standalone ASIC simulation

Testbench issues
 Terminology
 Reset Interval
 The number of cycles spent in the “reset” tail of a process for
initialization (Ex: initialize a RAM)
 Handshake Interval
 The number of cycles required by the handshake part of the algorithm
 Functional latency
 The number of cycles required to schedule the actual functionality
 Initialization interval
 The number of cycles between successive iterations of an algorithm
loop (including handshake, if any)
[Figure: timeline of the testbench and the entity under test, marking the reset interval, handshake interval, functional latency, initialization interval and output rate]

Testbench issues

 Correct protocol between testbench and design


 Use Handshake signals if possible
 Avoid different testbenches for behavioral and RTL

[Figure: timing diagram of Clock, Reset, Input and Output, marking the reset interval, initialization interval, latency and handshake]

Simulation

[Figure: simulation setup. The testbench contains a clocking process, a reset process, a stimulation process fed by a stimulus file, random logic, and a monitor process writing a response file. The design is either a behavioral process or an RTL process.]

One Way Handshake Request Data

[Figure: a block with fixed response time asserts ready_for_data and receives data[31:0]. Timing diagram of clk, ready_for_data and data (Old Data, New Data 1, New Data 2, New Data 3).]

Two Way Handshake; Respond to Ready

When rdy4data high, send data, toggle new_data

[Figure: Block1 drives data[31:0] and new_data into the Take Data block, which returns rdy4data. Timing diagram of clk, rdy4data, data (Old Data, New Data 1, New Data 2) and new_data.]

C++

Appendix : More on C++

Using C functions in C++ code
Just use your C functions as if you were programming in C:
include the header file, and then call the function.
In C++, this will compile without any problem:

#include <stdio.h>
#include <stdlib.h>

char buffer[1];

void foo (char *name)
{
    FILE *f;
    if ((f = fopen (name, "r")) == NULL) return;
    fread (buffer, 1, 1, f);
    fclose (f);
}

Declaring C functions to be used with C++


You can write modules of an application in C, compile them, and then
link them with C++ modules. To call C functions from C++ code, declare
your C functions in a header file, and use the same header to compile
both the C and the C++ sources. For example:

...
#ifdef __cplusplus
extern "C" {
#endif

/* C declarations */

#ifdef __cplusplus
}
#endif
...

The extern "C" block tells the C++ compiler that the enclosed declarations
have C linkage. The C functions are then usable as in C.
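As a minimal single-file sketch of the header pattern above: the function name c_add is hypothetical, and its definition would normally sit in a separate .c file compiled as C. It is defined here with C linkage only to keep the example self-contained.

```cpp
// Content that would normally live in a shared header (name is hypothetical).
#ifdef __cplusplus
extern "C" {
#endif

int c_add(int a, int b);   /* declaration seen identically by C and C++ */

#ifdef __cplusplus
}
#endif

// Stand-in for the C source file: defined with C linkage in this same file.
extern "C" int c_add(int a, int b) { return a + b; }

// C++ caller: the function is used exactly as it would be in C.
int add_from_cpp() { return c_add(2, 3); }
```

The caller needs no special syntax; only the declaration carries the linkage information.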
Function prototypes
You always need to prototype a function before using it. A prototype gives the type of the
returned value (or void if none), the name of the function, and the list of parameter types.
Example: good:
void hello();
int about (int, char);
int foo (int); // the return type must be written, even when it is int

bad:
void hello();
...
void foo(){
    hello (10);
}

This will generate an error such as: call to undetermined function hello (int).

Note that the name of a parameter is not needed in the prototype, only its type is. Unlike old C,
standard C++ has no default return type: omitting it is an error. You do not need a prototype in a
source file if the function is called only after its definition.
Example:

// void hello(); Line not needed

void hello(){
    printf ("hello world.\n");
}
void foo(){
    hello();
}

Overloading functions
The changes in the definition of functions also mean that two functions with the same name are allowed,
provided that their parameter lists are different. Just declare and use them as usual.

Example:
int foo (int a, int b){
    return a * b;
}

int foo (int a){
    return a * 2;
}
...
{
    a = foo (5, 5) + foo (5); // returns 5 * 5 (first foo)
} // + 5 * 2 (second foo)

But there are traps. You can define two functions this way:
int add (int a, int b);
float add (float a, float b);
and then make a call. This is already risky: a call with mixed argument types, such as
add (2, 2.5f), matches neither function better than the other. And what about this?

int add (int a, float b);
float add (float a, int b);
...
a = add (2, 2);

Fortunately, the compiler will reject that, explaining it can't choose between the two add functions. But be
careful!
Parameters default value
You can specify a default value for the last parameters of a function. If the parameter is omitted in the call,
the compiler will give it the default value.

Example:
int foo (int a, int b = 5);
...
int foo (int a, int b){
return a * b;
}

{
a = foo (2); // Same result as
a = foo (2, 5);
}

Note that this only works for the last parameters, or the compiler would not know which parameter to
replace...
int func1 (int = 0, char); // Not allowed
void func2 (char, int = 0); // Ok
A default value is specified once, normally in the prototype, and is not repeated in the definition of the
function. In effect, one declaration covers several call forms. In the last example, both foo (int) and
foo (int, int) can be called. So combined with function overloading, some combinations are unusable:

int foo (int a, int b = 5); // Declaration with default value for b
int foo (float a); // Other overload
int foo (int a); // Conflict: the call foo (2) is now ambiguous with the first one
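A small sketch of the default-value rule, using a hypothetical function named scale: the default is written once, in the prototype, and the call with one argument gives the same result as the call that spells the value out.

```cpp
// Default value given once, in the prototype.
int scale(int a, int b = 5);

// The definition does not repeat the default.
int scale(int a, int b) { return a * b; }

int with_default()  { return scale(2); }     // b takes the default value 5
int with_explicit() { return scale(2, 5); }  // same call, written in full
```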


Constant Definitions

To declare a variable constant, just add the word "const" before its definition.

const int a = 10;

In C++, a const variable declared at file scope is local to that file by default, so each file can define a
constant of the same name with a different value. ANSI C did not allow this: you had to declare the constant static.

Another difference with C is that the C++ compiler evaluates constant values when compiling. This means that
such things are now allowed:

const int number = 10; // Const declaration

int array1[number]; // Allowed in C++, not in ANSI C
int array2[number * 2]; // Allowed in C++, not in ANSI C
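The declarations above can be checked with a short sketch; the helper total_elements is hypothetical and only reports how many ints the two arrays hold.

```cpp
const int number = 10;        // compile-time constant in C++

int array1[number];           // size taken from the constant
int array2[number * 2];       // size computed by the compiler

// Hypothetical helper: number of elements in both arrays together.
int total_elements()
{
    return static_cast<int>((sizeof array1 + sizeof array2) / sizeof(int));
}
```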

Void pointers

In C++ mode, you cannot implicitly convert a void pointer (void *) to a pointer to another type. You
have to make the conversion explicit with a cast operator.

Example:

int *p_int;
void *p_void;

p_void = p_int; // Ok to implicitly convert

p_int = p_void; // Conversion not allowed

p_int = (int *)p_void; // Explicit conversion -> OK

In a sense, a void pointer can point to anything, so you can assign it any pointer value. But an int pointer, for example,
can only point to an int value, so you cannot assign it an arbitrary pointer.

Inline functions
In C, it was not easy to define macro-functions. The #define directive has pitfalls. C++
provides a better way to handle macros.

If you declare a function "inline", the compiler may reproduce its code wherever you call the function. So the
definition must be visible at each call site for it to be expanded properly. That's the
reason why the code is generally put in the header file with the declaration.

inline int addition (int a, int b){ return a + b;}

But you can also do it in two steps:

inline int addition (int a, int b);
...
int addition (int a, int b){ return a + b;}

Using inline functions will probably be more efficient, because most C++ compilers will first replace code,
and then optimise it. But the size of the code can increase dramatically. So it is recommended for small
functions (class constructors and operators are good examples).

Here is some code for those who don't see the difference between inline functions and a #define:
#define SQUARE(x) ((x) * (x))
int a = 2;
int b = SQUARE (a++);
// expands to (a++) * (a++): a is incremented twice and the result is undefined.
// A typical outcome is a = 4 and b = 2 * 3 = 6. Surprising ?!

inline int square (int x){ return x * x;}

int c = 2;
int d = square (c++);
// end: c = 3 and d = 2 * 2 = 4. Much better results !!
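The double evaluation can be observed without relying on undefined behavior. In this sketch, next_value and the counter calls are hypothetical helpers: the macro evaluates its argument twice, while the inline function evaluates it once.

```cpp
#define SQUARE(x) ((x) * (x))   // no space between the macro name and '('

static int calls = 0;
int next_value() { return ++calls; }   // hypothetical helper with a side effect

inline int square(int x) { return x * x; }

// Each function returns how many times next_value() ran.
int calls_with_macro()  { calls = 0; (void)SQUARE(next_value()); return calls; }
int calls_with_inline() { calls = 0; (void)square(next_value());  return calls; }
```

An inline function keeps normal function-call semantics, so its argument is always evaluated exactly once.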
References

You can now pass parameters to a function by reference. It means that when you modify the parameter
inside the function, the variable passed by the caller is modified too. To specify that a parameter is
passed by reference, just put a & between the type and the name of the parameter:

void A (int number) // by value. Inside the function, number is
{ // another variable
    number *= 2;
}
void B (int & number) // by reference. The same variable will be modified
{
    number *= 2;
}
int a = 5;
A (a); // a not modified
B (a); // a modified

You could compare that to pointers, but a reference must be bound to a variable when it is created and
cannot be re-pointed afterwards. The code is also far easier to write: just add a & to switch between value and reference.
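To compare the two styles side by side, here is a sketch with two hypothetical swap functions, one taking pointers and one taking references. Only the call syntax differs.

```cpp
void swap_by_pointer(int *a, int *b)   { int t = *a; *a = *b; *b = t; }
void swap_by_reference(int &a, int &b) { int t = a;  a = b;   b = t;  }

int demo_reference_swap()
{
    int x = 1, y = 2;
    swap_by_pointer(&x, &y);    // caller must take the addresses
    swap_by_reference(x, y);    // caller passes the variables themselves
    return x * 10 + y;          // two swaps restore the original order
}
```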


Namespaces - 1
This concept is the answer to the common problem of multi-definition. Imagine you have two header files
describing two different types, but with the same name. How can you access both of them ?

The answer in C++ is namespaces. You can define types or functions in a namespace and use them
afterwards. Namespaces are defined using the namespace reserved word, followed by an identifier. But
let's have a look at an example...

namespace Geometry
{
struct Vector
{
float x, y;
};

float Norm (Vector v);
}

In this example, what is inside the {} is usual C++ code. To use this code, you have to tell the
compiler that you're going to use the Geometry namespace. There are two ways of doing this: by specifying
the namespace of a single name (the syntax is namespace::name) or by using the reserved
sequence of words using namespace.
Namespaces - 2
// With ::
void Foo1(){
    Geometry::Vector v = {1.0f, 2.0f};
    float f = Geometry::Norm (v);
}
// With using keyword
using namespace Geometry;
void Foo2(){
    Vector v = {1.0f, 2.0f};
    float f = Norm (v);
}

When you use the using namespace sequence, you can get into trouble if several namespaces define the same
type. If there is any ambiguous code, the compiler won't compile it. For example...
namespace Geometry{
    struct Vector {
        float x, y;
    };
}

namespace Transport{
    struct Vector
    {
        float speed;
    };
}

using namespace Geometry;
using namespace Transport;

void foo()
{
    Vector v0; // Error ambiguous
    Geometry::Vector v1; // OK
    Transport::Vector v2; // OK
}

A last thing (obvious ?): you cannot use a namespace if it has not been defined...
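A compilable sketch of the two namespaces, using qualified names throughout. The body of Norm is an assumption (the slides only declare it); a Euclidean norm is used here.

```cpp
#include <cmath>

namespace Geometry {
    struct Vector { float x, y; };
    // Assumed definition: Euclidean length of the vector.
    float Norm(Vector v) { return std::sqrt(v.x * v.x + v.y * v.y); }
}

namespace Transport {
    struct Vector { float speed; };   // same type name, different namespace
}

float norm_qualified()
{
    Geometry::Vector v = { 3.0f, 4.0f };   // brace initialization of the struct
    Transport::Vector t = { 10.0f };       // no clash: the name is qualified
    (void)t;
    return Geometry::Norm(v);
}
```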

How to declare a class - 1


Class declaration

The declaration of a class is really simple. The keyword class must be followed by the name of the class
and the declaration of internal variables and methods, as shown in example:

class Complex
{
float r, i;
public:
float GetRe();
float GetIm();
float Modulus();
};

You can notice a few things:

Variables are declared the same way they are in classical C structs.
Functions are also declared as if they were prototypes.
Do not forget the final ";" or the compiler won't like it!

How to declare a class - 2
public is a keyword that indicates that the GetRe, GetIm and Modulus functions may be called from outside the
object. There's no keyword before r and i, so they're declared private by default, and not accessible from
the outside.

In fact, there are three access levels for members:

private: only accessible inside of the class.
protected: accessible inside of the class and its derived classes.
public: accessible everywhere in the code.

When you declare a class with the reserved keyword class, every member is "private" by default. You
can also declare a class with the struct keyword, and every member will be "public" by default. The default
access (for members and for inheritance) is the only difference.

class Complex
{
float r, i; // Private by default
};
struct Complex
{
float r, i; // Public by default
};

Member functions implementation

Then you have to implement member functions. Just tell the compiler which class the function is part of by
putting the name of the class followed by :: before the name.

In the previous example:

float Complex::Modulus()
{
    return sqrt (r*r + i*i); // Note the use of r and i; sqrt needs <math.h>
}

In those functions, you can use every variable or method of the object as if they were globally declared.
Look at the use of r and i to get the idea.
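Putting the declaration and the implementation together gives the following sketch. The two-argument constructor is an addition made only so the object can be given values; the const qualifiers are also not part of the original declaration.

```cpp
#include <cmath>

class Complex
{
    float r, i;                                    // private by default
public:
    Complex(float re, float im) : r(re), i(im) {}  // added for this sketch
    float GetRe() const { return r; }
    float GetIm() const { return i; }
    float Modulus() const;
};

// Implemented outside the class: the name is prefixed with Complex::
float Complex::Modulus() const
{
    return std::sqrt(r * r + i * i);   // r and i are used as plain variables
}
```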

Instantiation of a class as variable

A class can be used like other types of variables. To declare an object belonging to a class, just put the
name of the class followed by the name of the object you want to create.

For example, to use a complex you can do:

Complex x, y; // Declaration
Complex vector[10]; // Array of Complex
...
x = y; // Using declared objects


Dynamic allocation
Pointers to objects are declared exactly like in C, with a "*". You can also get the address of an object
with "&".

Complex x; // Declaration of a Complex
Complex *y; // Declaration of a pointer to a Complex
y = &x; // y points to x

The C++ counterpart of the malloc() function is the operator new. It allocates the room in memory for an object
and returns a pointer to this area. To free the object, use the operator delete. For example:

Complex *p, *q; // Declaration of two pointers

p = new Complex; // Dynamic allocation of a single Complex
q = new Complex[10]; // Dynamic allocation of an array of 10 Complex
... // Code using p and q
delete p; // Free memory for a single object
delete [] q; // Free memory for an array

Note the use of [] to tell the compiler that an array has to be deleted. If you omit it, the behavior is
undefined: depending on the compiler, only the first object of the array may be destroyed.
The effect of using delete [] with a single object is also undefined.

You can use the new and delete operators anywhere in the code, just as malloc and free. Do not forget to free an
object when it is no longer used.
Using members of an object
Once you've declared a new object, you can access its public members as you would do with a C struct.
Just use the . separator with a statically declared object, and -> with a pointer to a dynamically allocated object.

Complex x; // Static object
Complex *y = new Complex; // Dynamic object
float a;
a = x.Modulus(); // Call of Complex Modulus functions
a = y->Modulus();
delete y;

class foo
{
    int a; // private member
    int Calc(); // private function
public:
    int b;
    int GetA();
};

Auto-reference to an object: this


Until now, you were able to use any member of a class inside of it, considering them as declared variables.
But you may want to get the address of an object inside of its member functions, for example to insert the
object in a linked list.

The solution is the reserved keyword this. Inside a member function of a class, this is a pointer to the
current object. It is passed implicitly to every non-static member function. Example:

class A
{
    int a, b;
public:
    void Display();
};

void A::Display()
{
    printf ("Object at [%p]: %d, %d.\n", (void *) this, a, b);
}
Constructors - 1
A constructor member of a class is a function called when the object is declared or dynamically allocated.
Its name must be the same as the class, but there can be many different overloaded constructors, provided
that their lists of parameters are different. A constructor has no return type (not even void): it must be left
blank. It cannot be declared static.

Example:

class Complex
{
float i, r;
public:
Complex(); // First simple constructor
Complex (float, float); // Second overloaded constructor
};
Complex::Complex()
{
i = 0;
r = 0;
}
Complex::Complex (float ii, float rr)
{
i = ii;
r = rr;
}

Constructors - 2
But how can you call a constructor? This depends on whether you use the "new" operator or not. Without
"new":

Complex x; // Call of first constructor (no parentheses:
// Complex x(); would declare a function)
Complex y (2, 2); // Call of second constructor

Just place parameters after the name of the variable, and the corresponding constructor will be called.
With the "new" operator, it becomes:

Complex *x = new Complex(); // Or to make shorter
Complex *y = new Complex; // Call of first constructor
Complex *z = new Complex (2, 2); // Call of second constructor
...
delete x;
delete y;
delete z;

The compiler generates code that does the following:

Allocates memory on the stack (static allocation) or on the heap (dynamic allocation).
Initializes the object (sets up virtual functions, see inheritance).
Calls the constructor with the right parameters.

If objects are declared outside of any function as global variables, their constructor is called before the
function "main" starts. There are two particular constructors (default and copy) which are described below.
Destructor
After the definition of a constructor, the use of a destructor is obvious: it is called when the object is freed.
Its name is the name of the class preceded by ~. It has no return type (not even void). It takes no
parameters, since the programmer never needs to call it directly.

Example:
class Complex{
    float i, r;
public:
    Complex(); // First simple constructor
    Complex (float, float); // Second overloaded constructor
    ~Complex(); // Destructor
};
Complex::~Complex(){
    printf ("Destructor called.\n");
}
...
{
    Complex *x = new Complex (5, 5);
    Complex y;
    ...
    delete x; // Call of destructor for x
} // Implicit call of destructor for y

When you "delete" an object, two things happen:

- Call destructor
- Free memory occupied by the object

With a static (non-dynamic) object, you do not call the destructor yourself: for a local object it is called
when the object goes out of scope, and for a global object after "main" returns.
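The order of constructor and destructor calls can be traced with a small sketch. The Traced class and the trace_log string are hypothetical helpers that record each call as a letter pair.

```cpp
#include <string>

static std::string trace_log;   // hypothetical log used to observe the call order

struct Traced
{
    char id;
    explicit Traced(char c) : id(c) { trace_log += 'C'; trace_log += id; }
    ~Traced()                        { trace_log += 'D'; trace_log += id; }
};

std::string run_scope()
{
    trace_log.clear();
    {
        Traced *x = new Traced('x');  // dynamic object
        Traced y('y');                // local (non-dynamic) object
        delete x;                     // destructor of x runs here
    }                                 // destructor of y runs at the closing brace
    return trace_log;
}
```

Here run_scope returns "CxCyDxDy": both constructors, then the explicit delete, then the implicit destruction at scope exit.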

Default constructor and destructor


When you do not implement any destructor, a default destructor is used, which does nothing. But as soon
as you give code, it will be used instead. There is a default constructor too, which becomes unavailable as soon as any
constructor is declared. You can then only use the provided constructors.

class Complex{
public:
    Complex (float, float); // A single constructor is declared
};

After this, the only way to construct a Complex object is to give 2 parameters to the constructor.

Complex x; // The compiler will refuse that
Complex y (2, 2); // Ok to call the right constructor
Complex z[10]; // Not allowed (see below)

With arrays of objects, you should give the arguments for each constructor. With the simple initializer
syntax below, each element receives a single value, so the class needs a constructor that can be called
with no argument or with one argument:

class Complex{
public:
    Complex (float = 0, float = 0);
};

Complex v1[5] = { 0, 1, 2, 3, 4 }; // each constructor is called
// with (x, 0) arguments
void Create()
{
    Complex v2[5] = { 0, 1, 2}; // The first three are initialized
    // with (x, 0), and the others with (0,0)
}
Copy Constructor - 1
The copy constructor is the other default constructor. It takes a single argument: a reference to another object of the
same class. When no copy constructor is declared, a default one copies each field of the
source object to the destination object. If you declare one, you have to copy all the fields you want
manually. Note the use of the keyword "const": it allows copies to be made from const objects.

class Vector
{
int n;
float *v;
public:
Vector();
Vector (const Vector &);
};
Vector::Vector()
{
v = new float[100];
n = 100;
}
Vector::Vector (const Vector &vector)
{
    n = vector.n; // Copy of field n
    v = new float[n]; // Create a new array
    for (int i = 0; i < n; i++)
        v[i] = vector.v[i]; // Copy all n elements
}
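A self-contained sketch of the deep copy follows. The size parameter, the at accessor, the destructor and the disabled assignment operator are additions for illustration and are not part of the Vector class shown on the slide.

```cpp
class Vector
{
    int n;
    float *v;
public:
    explicit Vector(int size) : n(size), v(new float[size]) {
        for (int i = 0; i < n; i++) v[i] = 0.0f;
    }
    Vector(const Vector &other) : n(other.n), v(new float[other.n]) {
        for (int i = 0; i < n; i++) v[i] = other.v[i];   // copy all n elements
    }
    ~Vector() { delete [] v; }          // each object frees its own array
    float &at(int i) { return v[i]; }   // hypothetical accessor
private:
    Vector &operator=(const Vector &);  // assignment left out of this sketch
};

float deep_copy_demo()
{
    Vector a(4);
    a.at(0) = 1.0f;
    Vector b(a);        // copy constructor: b gets its own array
    b.at(0) = 2.0f;     // must not affect a
    return a.at(0);
}
```

With the default field-by-field copy, both objects would share one array and the write through b would change a.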

Copy Constructor - 2

Copy constructors are needed if you allocate memory inside of an object. This way, you do not
just copy the pointer, but really allocate a separate block of memory whose values can then differ.

The copy constructor is called whenever you declare an object of a class and initialize it from another object in the
same instruction.

Vector a; // Empty constructor
Vector b (a); // Copy constructor
Vector c = a; // Copy constructor (equal to the previous one)
Vector *d = new Vector (a); // Copy constructor

You can also use the same initialization syntax with basic types, such as int or char:

int a = 2; // Assign a value
int b (2); // Assign a value
Including objects in other objects - 1
In this part, the situation is simple. You want to include a class as member of another. The declaration is
quite simple:

class A
{
int a;
public:
A (int);
};
class B
{
int b;
A a_element;
public:
B (int, int);
};

In this example, you include an object of class A in an object of class B. The interesting part is what
happens when you call the B constructor. After memory is allocated, the constructor of each included object
is called, and after that the constructor of the containing object. Which A constructor will be called (if there are
several), and how do you specify it? When you define the B constructor, you must specify which A constructor
will be called, and with which arguments. This looks like:

Including objects in other objects - 2


B::B (int aa, int bb) : a_element (aa)
{
// Code for constructor
b = bb;
}

Note the use of ":" before the call to the A constructor. If several constructors are to be called, just separate
their calls with a comma (see below). The same syntax also initializes members of basic types:

B::B (int aa, int bb) : a_element (aa), b (bb)
{
    // Code for constructor
}

When the default copy constructor of an object containing other objects is called, it implicitly calls the
copy constructor of each contained object. When you write your own copy constructor, you must call the copy
constructor of each contained object yourself.
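A complete sketch of the A and B classes with the initializer list: the Get and Sum accessors are hypothetical additions used only to observe the stored values.

```cpp
class A
{
    int a;
public:
    A(int aa) : a(aa) {}
    int Get() const { return a; }     // accessor added for this sketch
};

class B
{
    int b;
    A a_element;                      // A has no default constructor
public:
    // Members are listed here in their declaration order.
    B(int aa, int bb) : b(bb), a_element(aa) {}
    int Sum() const { return b + a_element.Get(); }   // accessor added for this sketch
};

int composition_demo() { return B(3, 4).Sum(); }
```

Without the initializer list the class would not compile, because a_element cannot be constructed without an argument.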

Friend Functions
You can't access private members of a class from outside the class. But sometimes you
need to. For example, a callback function has a predefined header, and can't be a member of any class. In
those cases, you can declare the function as a "friend" of the class. It will then be able to access any private
member without error. Let's have a look at an example:

class A{
    int a;
public:
    A (int aa) : a (aa) {} // Constructor initializing a
    friend void foo();
};

void foo()
{
    A a_obj (5); // Normal call
    a_obj.a = 10; // Wouldn't be allowed in a non-friend function
}
You can place your friend declarations anywhere in the class definition, because a friend function has
nothing to do with public or private. Note that it is the class that chooses which functions will have access
to its private members. In standard C++ the friend declaration alone does not make the function visible to
ordinary calls, so keep a normal prototype as well.

Friend functions are often used for redefining operators. A friend function is not a member, so it has no this pointer.
You can also use a member function of another class as a friend function of a class.

class B; // needed to avoid circular references

class A{
public:
    void uses_class_B (B &); // Normal member function
};
class B{
    friend void A::uses_class_B (B &); // declaration of a friend function
};

Friend classes - 1
As you did for functions, you can declare friend classes. This is rather simple:

class A
{
    int a;

    friend class B;
};
class B
{
public:
    void foo();
};
void B::foo()
{
    A a_obj;
    a_obj.a = 10; // Not allowed without being friend
}

Also note that it is class A that chooses which classes are allowed to access its private members.
Friend classes - 2
Another consequence of this is that friendship is not transitive: a friend class of a friend class of a class
is not itself a friend of that class. Let's look at an example:

class A
{
    int a;
    friend class B;
};
class B
{
    int b;
    friend class C;
};
class C
{
    int c;
    void foo()
    {
        B b_obj;
        b_obj.b = 10; // OK
        A a_obj;
        a_obj.a = 10; // Refused by compiler. It would have needed
    } // a friend class C declaration in A.
};

Static variables
To declare a static variable, just add static keyword before. After that, a definition in a code file is also
needed. Every instance of the class will reference the same variable.

// A.h
class A
{
static int number;
public:
A() { number++; }
~A() { number--; }
};

// A.C
#include "A.h"
int A::number = 0;

In this example, the internal variable number is used to count the number of objects of class A in
memory. The same protection rules apply: outside the class, a private static member can only be given its initial value
in its definition, while a public one can also be modified.
Static functions
Static functions exist and can be called without any instantiation of the class. Therefore, they can only
access static members directly. To declare a static function, just put the keyword static before its declaration.

class A
{
static int number;
public:
A() { number++; }
~A() { number--; }
static int GetNumber();
};
int A::GetNumber()
{
return number;
}

You can then normally call a static function, using an object. But you can also call it without any object.
Just put the name of the class followed by a "::" and the name of the function:

A a_obj;
n = a_obj.GetNumber(); // Normal call
n = A::GetNumber(); // Does not need any object
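The instance counter can be exercised with the sketch below. The class is renamed Counted to keep the example standalone, and a counting copy constructor is added so that copies are tracked as well.

```cpp
class Counted
{
    static int number;                     // shared by every instance
public:
    Counted()                { number++; }
    Counted(const Counted &) { number++; } // added: copies are counted too
    ~Counted()               { number--; }
    static int GetNumber()   { return number; }
};

int Counted::number = 0;                   // the single definition

int count_inside_scope()
{
    Counted a, b;
    return Counted::GetNumber();           // called without any object
}
```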


Constant functions
Constant functions are member functions of a class that do not modify the member variables of the object. They
can therefore be called on a const object. To declare them, just put the "const" keyword after the declaration
and the definition of the function:

#include <stdio.h>
class A
{
int a;
public:
A (int aa = 1) { a = aa; }
void Display() const;
};
void A::Display() const
{
printf ("%d", a);
}
const A a_obj (2);
int main (int, char **)
{
a_obj.Display(); // Would not be allowed if Display weren't const
}

You can also call const functions of normal objects.

Class inheritance - 1
Inheritance is a mechanism that allows a class to reproduce every member of a father class, adding and
modifying things according to its functionality.

When you want to use every member of a father class without having to declare each one, just name the father class in
the header of the class declaration:

class father
{
public:
    int a;
};

class A : public father
{
public:
    int b;
};

In this example, the class A will have one public member b, as usual. But as it is declared as son of the
class father, it will also have an implicit member a.

When a class inherits from another, it gets every member of it. This raises two questions: how can a class
protect its members so that its potential sons cannot access them, and how can a son class forbid the outside world
access to the part inherited from its father?

Class inheritance - 2
A class can declare members public. This way, everyone can access them. This is not good object oriented
programming practice. It is good for debugging for example, but don't use it too much. Instead, use inline
member functions (accessors) which return the value of components, if needed. You then make the
variable read-only in a clean way.

class A{
    int a;
public:
    int GetA() { return a; }
};

On the contrary, you can declare every member private. No one outside the class will be able to use them. This is good
object oriented programming. But you may want only the children of a class to be allowed to access some members.
Declare those members protected: there will be no way to access them from the outside, but inside the class and
its children, everything will be allowed.

class father{
protected:
    int a;
};

class A : public father{
    int GetA() { return a; } // can access a
};
...
{
    father f;
    x = f.a; // refused by compiler
}
Private, protected or public inheritance - 1
A child of a class can choose what to do with the public members of its father. It can keep them public, totally
respecting its father's wishes. Just add the word public between ":" and the name of the father class:

class father
{
public:
    int a;
};

class A : public father
{
    int GetA() { return a; } // can access a
};
...
{
    A a_obj;
    x = a_obj.a; // can access public member inherited public
}

A son class can also hide the members it inherited from its father, treating its father as an internal detail. The
inheritance must then be declared protected (all father's public members will become protected) or private
(public members will become private). The only difference between those two is for children of the child
class.

Private, protected or public inheritance - 2


class father
{
public:
    int a;
};

class A : private father
{
    int GetA() { return a; } // can access a (a becomes a private member)
};
...
{
    A a_obj;
    x = a_obj.a; // refused by compiler: cannot access private a
}

Note that you can omit the keyword public, protected or private. For a class, the inheritance will then be private by
default.
Constructors and destructor
As for member objects, you have to pass parameters to the constructor of the father class, and it is
called before the construction of the child.

The syntax is also the same:

class Father
{
    int a;
public:
    Father (int aa) { a = aa; }
};

class Child : public Father
{
    int b;
public:
    Child (int aa) : Father (aa) {} // As inline constructor
    Child (int, int); // Or as normal function
};

Child::Child (int aa, int bb) : Father (aa)
{
    b = bb;
}

Member function overloading


When a class inherits from another one, it keeps every variable member (static or not). It can't refuse it.
But for a function, a child has the choice of replacing a father's function or not.

class Father{
public:
void MakeAThing();
};
class Child : public Father{
public:
void MakeAThing();
};
...
{
Father father;
father.MakeAThing(); // Father's method MakeAThing called
Child child;
child.MakeAThing(); // Child's method MakeAThing called
}

In the new version of the function, you may want to call the father's version of the function, or even a global
function with the same name. That is possible using :: and the name of the class you want to be called.

void MakeAThing();
class Father{
public:
    void MakeAThing();
};
class Child : public Father{
public:
    void MakeAThing() {
        Father::MakeAThing(); // Father's method called
        ::MakeAThing(); // Global function called
        // An unqualified MakeAThing(); here would call Child's own method again
    }
};
...
{
    Father father;
    father.MakeAThing(); // Father's method MakeAThing called
    Child child;
    child.MakeAThing(); // Child's method MakeAThing called
}
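The three call forms can be checked with this sketch. The string named order is a hypothetical log: each function appends one letter so the sequence of calls is visible.

```cpp
#include <string>

static std::string order;   // hypothetical log of which functions ran

void MakeAThing() { order += "G"; }          // global function

class Father {
public:
    void MakeAThing() { order += "F"; }
};

class Child : public Father {
public:
    void MakeAThing() {
        Father::MakeAThing();   // Father's version
        ::MakeAThing();         // global version
        order += "C";           // Child's own work
    }
};

std::string call_order()
{
    order.clear();
    Child child;
    child.MakeAThing();
    return order;
}
```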
Polymorphism
The main advantage of this is that a child class can always replace its father as argument of a function for
example. The child can be seen as a class with two (or more) identities.

class Father
{
...
};
class Child : public Father
{
...
};
...
void ExampleFunction (Father &);
...
{
Father father;
ExampleFunction (father); // Normal call
Child child;
ExampleFunction (child); // child is considered as a father
}

This is possible with pointers and references to objects (an object passed by value is copied as a plain Father).
You can then improve a class, and also use every characteristic and functionality of its father. This is the
main source of success of object oriented programming.

Output Streams
The type of an output stream is ostream. For standard types, the operator << is redefined:

ostream & operator << (base_type);

The result is also a stream, which allows outputs to be chained:

stream << "Time is" << hour << ":" << minute << ":" << second;

Standard streams are cout for standard output, cerr for unbuffered standard error, and clog for
buffered standard error. For example, to write the time, just type:

cout << "Time is" << hour << ":" << minute << ":" << second;

Another way to write into streams is to use the member function put. It is declared as:

ostream & put (char);

and it is a member function of ostream. You can use it in a very simple way:

char Message[] = "This is a message.\n";
...
{
    for (int i = 0; Message[i]; i++)
        cout.put (Message[i]);
}
Input Streams

The operator >> is overloaded in the same way:

istream & operator >> (base_type &);

Standard input is cin, so you can use it this way:

{
    int hh;
    int mm;

    cin >> hh >> mm; // get two ints from standard input
}

You can also use the get member function to get a char or a set of chars:

istream & get (char &c);
istream & get (char *p, int max, char separator);

Overloading << and >> operators


You can redefine the << and >> operators to allow your objects to comply with the standard architecture
described on this page. Just declare them as follows:

class Example{
    friend ostream & operator << (ostream &st, Example &ex);
    friend istream & operator >> (istream &st, Example &ex);
};

ostream & operator << (ostream &st, Example &ex){
    ...
}

We can take as example the Complex class:

class Complex{
    float r, i;

    friend ostream & operator << (ostream &out, Complex &ex);
    friend istream & operator >> (istream &in, Complex &ex);
};

ostream & operator << (ostream &out, Complex &ex){
    out << '(' << ex.r << ',' << ex.i << ')';
    return out;
}

istream & operator >> (istream &in, Complex &ex)
{
    in >> ex.r >> ex.i;
    return in;
}
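The Complex operators can be tried without a console by using string streams. This sketch assumes the standard headers (with the std namespace) rather than the older iostream.h; the round_trip helper and the default constructor arguments are hypothetical additions.

```cpp
#include <iostream>
#include <sstream>
#include <string>

class Complex
{
    float r, i;
public:
    Complex(float re = 0, float im = 0) : r(re), i(im) {}
    friend std::ostream &operator<<(std::ostream &out, const Complex &c);
    friend std::istream &operator>>(std::istream &in, Complex &c);
};

std::ostream &operator<<(std::ostream &out, const Complex &c)
{
    out << '(' << c.r << ',' << c.i << ')';
    return out;                // returning the stream allows chaining
}

std::istream &operator>>(std::istream &in, Complex &c)
{
    in >> c.r >> c.i;          // reads two whitespace-separated numbers
    return in;
}

// Hypothetical helper: parse a Complex from text and print it back.
std::string round_trip(const std::string &text)
{
    std::istringstream in(text);
    Complex c;
    in >> c;
    std::ostringstream out;
    out << c;
    return out.str();
}
```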
Connecting a stream to a file - 1

When you want to connect a stream to a file, you use fstream, ifstream and ofstream as basic classes.
Constructors look like:

fstream (char *filename, open_mode mode);
ifstream (char *filename, open_mode mode = 0);
ofstream (char *filename, open_mode mode = 0);

where filename is the name of the file and mode an enumerated type. This enum is declared in the class
ios, and can take the values:

in         open in input mode
out        open in output mode
ate        open and seek to the end
app        open in append mode
trunc      erase file before writing
nocreate   fails if file does not exist
noreplace  fails if file exists

Connecting a stream to a file - 2


// Copy a file to another

#include <iostream.h>
#include <fstream.h>
#include <stdlib.h> // for exit

void Fatal (const char *message1, const char *message2 = "")
{
    cerr << message1 << ' ' << message2 << '\n';
    exit (-1);
}

int main (int argc, char *argv[])
{
    if (argc != 3)
        Fatal ("Wrong number of parameters.");
    ifstream source (argv[1]);
    if (!source) Fatal ("Cannot open file : ", argv[1]);
    ofstream dest (argv[2]);
    if (!dest) Fatal ("Cannot open file : ", argv[2]);

    char c;
    while (source.get(c))
        dest.put (c);

    if (!source.eof() || dest.bad())
        Fatal ("Reading or writing error.");
}
Declaring an exception handler - 1

This is a powerful way of handling every error occurring in a program in a simple way. You declare an
exception as a class using the keyword class:

class exception_name {};

You raise an exception with the keyword throw:

throw exception_name();

In C++, you try executing a piece of code, and catch every exception that can occur during the
execution:

try
{
// code
}
catch (class_name::exception_1)
{
// handling of exception
}
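
The three steps above (declare, throw, catch) can be put together in a minimal sketch. The class Divider and the function try_divide are hypothetical names invented for this illustration. The handler catches by const reference, which is the usual modern form; catching by value as on the slide is also valid.

```cpp
#include <string>

// Hypothetical class used only for this sketch.
class Divider {
public:
    class DivideByZero {};   // exception type nested in the class

    static int divide(int a, int b) {
        if (b == 0) throw DivideByZero();   // raise the exception
        return a / b;
    }
};

// Returns a short description instead of letting the exception escape.
std::string try_divide(int a, int b) {
    try {
        Divider::divide(a, b);
        return "ok";
    }
    catch (const Divider::DivideByZero &) {
        return "divide by zero";
    }
}
```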


Declaring an exception handler - 2


// Vector example

#include <sys/types.h>
#include <iostream.h>
class Vector
{
private:
    size_t si;
    int *value;
public:
    class BadIndex {};
    class BadSize {};
    class BadAllocation {};

    Vector (int);
    int& operator[](int);
};

Vector::Vector (int s)
{
    if (s <= 0) throw BadSize();
    si = s;
    value = new int[si];
    if (value == 0) throw BadAllocation();
}

int& Vector::operator[] (int i)
{
    if ((i < 0) || (i >= si)) throw BadIndex();
    return value[i];
}

int main (int, char **)
{
    Vector *v;
    try
    {
        int si, index;
        cout << "Give the size of the array: ";
        cin >> si;
        v = new Vector (si);
        cout << "Give an index in the array: ";
        cin >> index;
        cout << "Give its value: ";
        cin >> (*v)[index];
    }
    catch (Vector::BadSize)
    {
        cerr << "The size of an array must be greater than 0.\n";
    }
    catch (Vector::BadAllocation)
    {
        cerr << "Memory allocation error.\n";
    }
    catch (Vector::BadIndex)
    {
        cerr << "Index out of range.\n";
    }
}
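
One caveat about the constructor above: with a standard-conforming compiler, a plain new expression reports failure by throwing std::bad_alloc rather than by returning 0, so the test for a null pointer is only meaningful with a pre-standard compiler or with the nothrow form of new. The following sketch assumes a modern compiler and the standard header <new>. The cut-down Vector and the helper rejects_size are illustrative only.

```cpp
#include <cstddef>
#include <new>

// Hypothetical cut-down Vector for this sketch only.
class Vector {
    std::size_t si;
    int *value;
public:
    class BadSize {};
    class BadAllocation {};

    explicit Vector(int s) : si(0), value(0) {
        if (s <= 0) throw BadSize();
        si = static_cast<std::size_t>(s);
        // new (std::nothrow) returns a null pointer on failure instead of
        // throwing std::bad_alloc, so the null test below is meaningful.
        value = new (std::nothrow) int[si];
        if (value == 0) throw BadAllocation();
    }
    ~Vector() { delete[] value; }

    std::size_t size() const { return si; }

private:
    Vector(const Vector &);              // copying disabled in this sketch
    Vector &operator=(const Vector &);
};

// Hypothetical helper: true when constructing with this size throws BadSize.
bool rejects_size(int s) {
    try {
        Vector v(s);
        return false;
    }
    catch (const Vector::BadSize &) {
        return true;
    }
}
```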
Lab Solutions

Appendix : Solution

Sample Solution Lab 2 - stage1.h and stage1.cc


// stage1.h
/* Filename stage1.h */
/* This is the interface file for synchronous process `stage1' */

struct stage1 : sc_module {
  sc_in<double> in1; //input
  sc_in<double> in2; //input
  sc_out<double> sum; //output
  sc_out<double> diff; //output
  sc_in_clk CLK;
  //Constructor
  stage1(const char *NAME)
  : sc_module (NAME) {
    sc_sync_tprocess(handle1, "STAGE1", stage1, entry, CLK.pos());
    end_module();
  }
  // Process functionality in member function below
  void entry();
};

// stage1.cc
#include "systemc.h"
#include "stage1.h"

void stage1::entry()
{
  double a, b;

  a = 20.0;
  b = 5.0;

  while (true) {
    sum.write(a+b);
    diff.write(a-b);
    wait();
    a = in1.read();
    b = in2.read();
  }
} // end of entry function

Sample Solution Lab 2 - stage2.h and stage2.cc
// stage2.h
/* Filename stage2.h */
/* This is the interface file for synchronous process `stage2' */

struct stage2 : sc_module {
  sc_in<double> sum; //input
  sc_in<double> diff; //input
  sc_out<double> prod; //output
  sc_out<double> quot; //output
  sc_in_clk CLK;

  //Constructor
  stage2(const char *NAME)
  : sc_module (NAME) {
    sc_sync_tprocess(handle1, "STAGE2", stage2, entry, CLK.pos());
    end_module();
  }

  // Process functionality in member function below
  void entry();
};

// stage2.cc
/* Filename stage2.cc */
/* This is the implementation file for synchronous process `stage2' */

#include "systemc.h"
#include "stage2.h"

void stage2::entry()
{
  double a, b;

  a = 20.0;
  b = 5.0;
  while (true) {
    prod.write(a*b);
    quot.write(a/b);
    wait();
    a = sum.read();
    b = diff.read();
  }
} // end of entry function


Sample Solution Lab 2 - stage3.h and stage3.cc


// stage3.h
/* Filename stage3.h */
/* This is the interface file for synchronous process `stage3' */

struct stage3 : sc_module {
  sc_in<double> prod; //input
  sc_in<double> quot; //input
  sc_out<double> powr; //output
  sc_in_clk CLK;

  //Constructor
  stage3(const char *NAME)
  : sc_module (NAME) {
    sc_sync_tprocess(handle1, "STAGE3", stage3, entry, CLK.pos());
    end_module();
  }

  // Process functionality in member function below
  void entry();
};

extern "C" double pow(double, double);

// stage3.cc
#include <math.h>
#include "systemc.h"
#include "stage3.h"

void stage3::entry()
{
  double a, b;
  double c;

  a = 20.0;
  b = 5.0;
  while (true) {
    c = pow(a, b);
    powr.write(c);
    wait();
    a = prod.read();
    b = quot.read();
  }

} // end of entry function

Sample Solution Lab 2 -main.cc
//main.cc
#include "systemc.h"
#include "stage1.h"
#include "stage2.h"
#include "stage3.h"
#include "display.h"
#include "numgen.h"

int sc_main(int ac, char *av[])
{
  sc_signal<double> in1;
  sc_signal<double> in2;
  sc_signal<double> sum;
  sc_signal<double> diff;
  sc_signal<double> prod;
  sc_signal<double> quot;
  sc_signal<double> powr;

  sc_clock clk("CLOCK", 20.0, 0.5, 0.0);

  numgen N("STIMULUS");
  N << in1 << in2 << clk;

  stage1 S1("Stage1");
  S1 << in1 << in2 << sum << diff << clk;
  stage2 S2("Stage2");
  S2 << sum << diff << prod << quot << clk;
  stage3 S3("Stage3");
  S3 << prod << quot << powr << clk;

  display D("Display");
  D << powr << clk;

  sc_start(1000);
  return 0;
}

Sample Solution Lab 3 - mem_control.h, test_fsm.h


// mem_control.h
struct mem_control : sc_module {
  sc_in<state_t> state;
  sc_in<sc_uint<8> > opcode;
  sc_in<bool> reset;
  sc_out<bool> a_wen;
  sc_out<bool> rd_wen;
  sc_out<bool> wd_wen;
  sc_out<bool> inca;
  sc_out<state_t> next_state;

  sc_uint<8> tstate;

  mem_control(const char *NAME)
  : sc_module (NAME) {
    sc_async_fprocess(handle1, "MEM_CONTROL", mem_control, entry);
    sensitive(state);
    sensitive(opcode);
    sensitive(reset);
    end_module();
  }

  void entry();
};

// test_fsm.h
struct test_fsm : sc_module {
  sc_in<sc_uint<8> > outof;
  sc_out<sc_uint<8> > into;
  sc_out<bool> reset;
  sc_in_clk CLK;

  test_fsm(const char *NAME)
  : sc_module (NAME) {
    sc_sync_tprocess(handle1, "TEST_FSM", test_fsm, entry, CLK.pos());
    end_module();
  }

  void entry();

};

Sample Solution Lab 3 - mem_control.cpp
// mem_control.cpp
#include <iostream.h>
#include "systemc.h"
#include "fsm_types.h"
#include "mem_control.h"

void mem_control::entry(){
  sc_uint<8> cmd;
  tstate = state;
  a_wen.write(false);
  rd_wen.write(false);
  wd_wen.write(false);
  inca.write(false);

  if (reset)
    next_state = IDLE;
  else {
    switch(state.read())
    {
    case IDLE:
      cmd = opcode.read();
      switch (cmd)
      {
      case NOP:
        next_state = IDLE;
        break;
      case RDWD:
        next_state = READ_WORD;
        break;
      case WTWD:
        next_state = WRITE_WORD;
        break;
      case WTBLK:
        next_state = WRITE_BLOCK;
        break;
      case RDBLK:
        next_state = READ_BLOCK;
        break;
      default:
        next_state = IDLE;
        break;
      } // end case
      break;
    case READ_WORD:
      a_wen.write(true);
      next_state = READ_WORD2;
      break;
    case READ_WORD2:
      rd_wen.write(true);
      next_state = IDLE;
      break;
    case WRITE_WORD:
      a_wen.write(true);
      next_state = WRITE_WORD2;
      break;
    case WRITE_WORD2:
      wd_wen.write(true);
      next_state = IDLE;
      break;
    case WRITE_BLOCK:
      a_wen.write(true);
      next_state = WRITE_BLOCK2;
      break;
    case WRITE_BLOCK2:
      wd_wen.write(true);
      next_state = WRITE_BLOCK3;
      break;
    case WRITE_BLOCK3:
      wd_wen.write(true);
      inca.write(true);
      next_state = WRITE_BLOCK4;
      break;
    case WRITE_BLOCK4:
      wd_wen.write(true);
      inca.write(true);
      next_state = WRITE_BLOCK5;
      break;
    case WRITE_BLOCK5:
      wd_wen.write(true);
      inca.write(true);
      next_state = IDLE;
      break;
    case READ_BLOCK:
      a_wen.write(true);
      next_state = READ_BLOCK2;
      break;
    case READ_BLOCK2:
      rd_wen.write(true);
      inca.write(true);
      next_state = READ_BLOCK3;
      break;
    case READ_BLOCK3:
      rd_wen.write(true);
      inca.write(true);
      next_state = READ_BLOCK4;
      break;
    case READ_BLOCK4:
      rd_wen.write(true);
      inca.write(true);
      next_state = READ_BLOCK5;
      break;
    case READ_BLOCK5:
      rd_wen.write(true);
      next_state = IDLE;
      break;
    } // end case
  } // end if
}

Sample Solution Lab 3- sram.h, Makefile


// sram.h
struct sram : sc_module {
  sc_in<bool> reset;
  sc_in<bool> rd;
  sc_in<bool> wr;
  sc_in<sc_uint<8> > addr;
  sc_inout_rv<8> data;
  sc_in_clk CLK;

  const int width; // parameters
  const int depth;

  sc_int<8> memory[1024]; // Internal variables

  sram(const char* NAME, const int WIDTH = 8,
       const int DEPTH = 64)
  : sc_module (NAME), width(WIDTH),
    depth(DEPTH) {
    sc_async_fprocess(handle1, "SRAM", sram, entry);
    // Global watching for reset
    sensitive_pos(CLK);

    end_module();
  }

  // Process functionality
  void entry();
};

// makefile
MODULE = lab3

SRCS = mem_control.cpp main.cpp sm_seq.cpp sram.cpp test_fsm.cpp
OBJS = $(SRCS:.cpp=.o)

include ../Makefile.defs

Sample Solution Lab 3 - sram.cpp
// sram.cpp
#include <iostream.h>
#include "systemc.h"
#include "sram.h"

/* * Reg - functionality */

void sram::entry()
{
  int i = 0;
  sc_uint<8> address;
  sc_bool_vector data_tmp(8);

  // Reset behavior

  if (reset)
  {
    for (i = 0; i < depth; i++) memory[i] = 0;
    data.write("ZZZZZZZZ");
  }
  else
  {
    if (rd) // Read
    {
      address = addr.read();
      /* cout << sc_clock::time_stamp() << " " <<
         "MEM: Reading " << memory[address] << " from address " << address << endl;
      */
      data_tmp = memory[address];
      data.write(data_tmp);
    } else
      data.write("ZZZZZZZZ");
    if (wr) // Write
    {
      address = addr.read();
      /* cout << sc_clock::time_stamp() << " " <<
         "MEM: Writing " << data << " to address " << address << endl; */
      memory[address] = data;
    }
    if (rd && wr)
      cout << "Can't read and write at the same time!!" << endl;
  }
}

Sample Solution Lab 3 - sm_seq.h


// sm_seq.h
struct sm_seq : sc_module {

  sc_in<sc_uint<8> > into;
  sc_in<bool> reset;
  sc_in<state_t> next_state;
  sc_out<state_t> state;
  sc_in<bool> a_wen;
  sc_in<bool> rd_wen;
  sc_in<bool> wd_wen;
  sc_in<bool> inca;
  sc_inout<sc_uint<8> > inreg;
  sc_inout_rv<8> data;
  sc_out<sc_uint<8> > outof;
  sc_inout<sc_uint<8> > addr;
  sc_out<bool> mem_wr;

  sc_uint<8> rdata, address, idata;
  sc_in_clk CLK;

  sm_seq(const char *NAME)
  : sc_module (NAME) {
    sc_sync_tprocess(handle1, "SM_SEQ", sm_seq, entry, CLK.pos());
    watching(reset.delayed() == true);
    idata = 0;
    end_module();
  }

  void entry();

};

Sample Solution Lab 3- sm_seq.cpp
// sm_seq.cpp
#include <iostream.h>
#include "systemc.h"
#include "fsm_types.h"
#include "sm_seq.h"

void sm_seq::entry()
{
  sc_bool_vector tmp_inreg(8);

  if (reset.read() == true)
  {
    data.write("ZZZZZZZZ");
    inreg.write(0);
    outof.write(0);
    mem_wr.write(false);
    rdata = 0;
    address = 0;
    state.write(IDLE);
    wait();
  }

  while (true)
  {

    inreg.write(into);
    rdata = data;
    outof.write(rdata);
    state.write(next_state);

    if (a_wen)
    {
      address = inreg;
      addr.write(address);
    }
    else if (inca)
    {
      address++;
      addr.write(address);
    }

    if (wd_wen)
    {
      mem_wr.write(true);
      tmp_inreg = inreg.read();
      data.write(tmp_inreg);
      idata = inreg.read();
    }
    else
    {
      mem_wr.write(false);
      data.write("ZZZZZZZZ");
    }
    if (rd_wen)
    {
      wait();
      idata = data;
    }
    wait();
  }
}

Sample Solution Lab 3- main.cpp


// main.cpp
#include "systemc.h"
#include "fsm_types.h"
#include "mem_control.h"
#include "sm_seq.h"
#include "sram.h"
#include "test_fsm.h"

int sc_main(int argc, char *argv[]) {
  sc_signal<bool> reset;
  sc_signal<bool> mem_wr;
  sc_signal<bool> a_wen;
  sc_signal<bool> rd_wen;
  sc_signal<bool> wd_wen;
  sc_signal<bool> inca;
  sc_signal<state_t> state;
  sc_signal<state_t> next_state;
  sc_signal_rv<8> data;
  sc_signal<sc_uint<8> > addr;
  sc_signal<sc_uint<8> > into;
  sc_signal<sc_uint<8> > outof;
  sc_signal<sc_uint<8> > inreg;

  sc_clock clock("CLOCK", 100, 0.5, 0.0);

  into.write(0); // Don't want initialization 'X' warnings
  inreg.write(0);
  state.write(IDLE);
  next_state.write(IDLE);

  mem_control fsm1("MEM_CONTROL");
  fsm1 << state << inreg << reset << a_wen << rd_wen <<
          wd_wen << inca << next_state;

  sm_seq seq1("SEQ");
  seq1 << into << reset << next_state << state << a_wen <<
          rd_wen << wd_wen << inca << inreg
       << data << outof << addr << mem_wr << clock;

  sram ram1("SRAM", 8, 64);
  ram1 << reset << rd_wen << mem_wr << addr << data <<
          clock;

  test_fsm TestBench("TEST");
  TestBench << outof << into << reset << clock;
  sc_clock::start(-1);
  return 0;
}

Sample Solution Lab 4- switch.h, pkt.h
// switch.h
struct mcast_pkt_switch : sc_module {
  sc_in<bool> switch_cntrl;
  sc_in<pkt> in0;
  sc_in<pkt> in1;
  sc_in<pkt> in2;
  sc_in<pkt> in3;

  sc_out<pkt> out0;
  sc_out<pkt> out1;
  sc_out<pkt> out2;
  sc_out<pkt> out3;

  mcast_pkt_switch(char* NAME )
  : sc_module(NAME) {
    sc_async_tprocess(handle1, "SWITCH", mcast_pkt_switch, entry);
    sensitive(in0);
    sensitive(in1);
    sensitive(in2);
    sensitive(in3);
    sensitive(switch_cntrl);
    end_module();
  }
  void entry();
};

// pkt.h
#include "systemc.h"
struct pkt {
  sc_int<8> data;
  sc_int<4> id;
  bool dest0;
  bool dest1;
  bool dest2;
  bool dest3;

  inline bool operator == (const pkt& rhs) const
  {
    return (rhs.data == data && rhs.id == id &&
            rhs.dest0 == dest0 && rhs.dest1 == dest1 &&
            rhs.dest2 == dest2 && rhs.dest3 == dest3);
  }

};

Sample Solution Lab 4- switch.cc


// switch.cc
#include "systemc.h"
#include "pkt.h"
#include "fifo.h"
#include "switch_reg.h"
#include "switch.h"
#define SIM_NUM 500

void mcast_pkt_switch :: entry()
{
  // declarations
  switch_reg R0;
  switch_reg R1;
  switch_reg R2;
  switch_reg R3;
  switch_reg temp;

  int sim_count;
  int pkt_count;
  int drop_count;

  fifo q0_in;
  fifo q1_in;
  fifo q2_in;
  fifo q3_in;

  fifo q0_out;
  fifo q1_out;
  fifo q2_out;
  fifo q3_out;

  FILE *result;

  // initialization
  pkt_count = 0;
  drop_count = 0;
  sim_count = 0;

  q0_in.pntr = 0;
  q1_in.pntr = 0;
  q2_in.pntr = 0;
  q3_in.pntr = 0;

  q0_out.pntr = 0;
  q1_out.pntr = 0;
  q2_out.pntr = 0;

Sample Solution Lab 4- switch.cc
// switch.cc (continued)
  q3_out.pntr = 0;

  q0_in.full = false;
  q1_in.full = false;
  q2_in.full = false;
  q3_in.full = false;

  q0_in.empty = true;
  q1_in.empty = true;
  q2_in.empty = true;
  q3_in.empty = true;

  q0_out.full = false;
  q1_out.full = false;
  q2_out.full = false;
  q3_out.full = false;

  q0_out.empty = true;
  q1_out.empty = true;
  q2_out.empty = true;
  q3_out.empty = true;

  R0.free = true;
  R1.free = true;
  R2.free = true;
  R3.free = true;

  result = fopen("result","w");

  cout << endl;
  cout << "--------------------------------------------------------------------------------------" << endl;
  cout << endl << " 4x4 Multicast Helix Packet Switch Simulation" <<endl;
  cout << "--------------------------------------------------------------------------------------" << endl;
  cout << " This is the simulation of a 4x4 non-blocking multicast helix packet switch. The" << endl;
  cout << " switch uses a self-routing ring of shift registers to transfer cells from one" << endl;
  cout << " port to another in a pipelined fashion, resolving output contention and handling" << endl;
  cout << " multicast switch efficiently." << endl << endl;

  cout << " Press any key to start the simulation..." << endl << endl;

Sample Solution Lab 4- switch.cc


// switch.cc (continued)
  getchar();

  wait();
  // functionality
  while( sim_count++ < SIM_NUM )
  {
    wait();

    /////read input packets
    if (in0.event())
    {
      pkt_count++;
      if (q0_in.full == true) drop_count++;
      else q0_in.pkt_in(in0.read());
    };

    if (in1.event())
    {
      pkt_count++;
      if (q1_in.full == true) drop_count++;
      else q1_in.pkt_in(in1.read());
    };

    if (in2.event())
    {
      pkt_count++;
      if (q2_in.full == true) drop_count++;
      else q2_in.pkt_in(in2.read());
    };

    if (in3.event())
    {
      pkt_count++;
      if (q3_in.full == true) drop_count++;
      else q3_in.pkt_in(in3.read());
    };

    /////move the packets from fifo to shift register ring/////

    if((!q0_in.empty) && R0.free)
    {
      R0.val = q0_in.pkt_out();
      R0.free = false;
    }

    if((!q1_in.empty) && R1.free)

Sample Solution Lab 4- switch.cc
// switch.cc (continued)
    {
      R1.val = q1_in.pkt_out();
      R1.free = false;
    }
    if((!q2_in.empty) && R2.free)
    {
      R2.val = q2_in.pkt_out();
      R2.free = false;
    }
    if((!q3_in.empty) && R3.free)
    {
      R3.val = q3_in.pkt_out();
      R3.free = false;
    }

    if(switch_cntrl.event() && switch_cntrl)
    {
      /////shift the channel registers
      temp = R0;
      R0 = R1;
      R1 = R2;
      R2 = R3;
      R3 = temp;

      /////write the register values to output fifos////////////

      // A register is freed only when none of its four destination flags is still set
      if ((!R0.free) && (R0.val.dest0) && (!q0_out.full))
      {
        q0_out.pkt_in(R0.val);
        R0.val.dest0 = false;
        if (!(R0.val.dest0|R0.val.dest1|R0.val.dest2|R0.val.dest3))
          R0.free = true;
      }

      if ((!R1.free) && (R1.val.dest1) && (!q1_out.full))
      {
        q1_out.pkt_in(R1.val);
        R1.val.dest1 = false;
        if (!(R1.val.dest0|R1.val.dest1|R1.val.dest2|R1.val.dest3))
          R1.free = true;
      }
      if ((!R2.free) && (R2.val.dest2) && (!q2_out.full))
      {
        q2_out.pkt_in(R2.val);
        R2.val.dest2 = false;
        if (!(R2.val.dest0|R2.val.dest1|R2.val.dest2|R2.val.dest3))
          R2.free = true;

Sample Solution Lab 4- switch.cc


// switch.cc (continued)
      }
      if ((!R3.free) && (R3.val.dest3) && (!q3_out.full))
      {
        q3_out.pkt_in(R3.val);
        R3.val.dest3 = false;
        if (!(R3.val.dest0|R3.val.dest1|R3.val.dest2|R3.val.dest3))
          R3.free = true;
      }
      /////write the packets out//////////
      if (!q0_out.empty)
        out0.write(q0_out.pkt_out());
      if (!q1_out.empty)
        out1.write(q1_out.pkt_out());
      if (!q2_out.empty)
        out2.write(q2_out.pkt_out());
      if (!q3_out.empty)
        out3.write(q3_out.pkt_out());
    }
  }
  sc_stop();
  cout << endl << endl << "------------------------" << endl;
  cout << "End of switch operation..." << endl;
  cout << "Total number of packets received: " << pkt_count << endl;
  cout << "Total number of packets dropped: " << drop_count << endl;
  cout << "Percentage packets dropped: " << drop_count*100/pkt_count << endl;
  cout << "--------------------------------------------------------------------------------------" << endl;

}

Sample Solution Lab 4- main.cc
// main.cc
#include "systemc.h"
#include "pkt.h"
#include "switch_clk.h"
#include "sender.h"
#include "receiver.h"
#include "switch.h"

int sc_main(int argc, char *argv[]) {
  sc_signal<pkt> pkt_in0;
  sc_signal<pkt> pkt_in1;
  sc_signal<pkt> pkt_in2;
  sc_signal<pkt> pkt_in3;
  sc_signal<pkt> pkt_out0;
  sc_signal<pkt> pkt_out1;
  sc_signal<pkt> pkt_out2;
  sc_signal<pkt> pkt_out3;

  sc_signal<sc_int<4> > id0, id1, id2, id3;

  id0.write(0);
  id1.write(1);
  id2.write(2);
  id3.write(3);

  sc_signal<bool> switch_cntrl;
  sc_clock clock1("CLOCK1", 75, 0.5, 0.0);
  sc_clock clock2("CLOCK2", 30, 0.5, 10.0);

  sender sender0("SENDER0");
  sender0 << pkt_in0 << id0 << clock1;
  sender sender1("SENDER1");
  sender1 << pkt_in1 << id1 << clock1;
  sender sender2("SENDER2");
  sender2 << pkt_in2 << id2 << clock1;
  sender sender3("SENDER3");
  sender3 << pkt_in3 << id3 << clock1;
  switch_clk switch_clk1("SWITCH_CLK");
  switch_clk1 << switch_cntrl << clock2;
  mcast_pkt_switch switch1("SWITCH");
  switch1 << switch_cntrl << pkt_in0 << pkt_in1 << pkt_in2 <<
             pkt_in3 << pkt_out0 << pkt_out1 << pkt_out2 << pkt_out3;

  receiver receiver0("RECEIVER0");
  receiver0 << pkt_out0 << id0;
  receiver receiver1("RECEIVER1");
  receiver1 << pkt_out1 << id1;
  receiver receiver2("RECEIVER2");
  receiver2 << pkt_out2 << id2;
  receiver receiver3("RECEIVER3");
  receiver3 << pkt_out3 << id3;
  sc_clock::start(-1);
  return 0;
}

Sample Solution Lab 5- ram.h and ram.cc


// ram.h
struct ram : sc_module {
  sc_in<sc_uint<32> > datain; //input
  sc_in<bool> cs; //input
  sc_in<bool> we; //input
  sc_in<sc_int<32> > addr; //input
  sc_out<sc_uint<32> > dataout; //output
  sc_in_clk CLK;
  // Internal variable
  int memory[1000];
  // Parameter
  const int wait_cycles; // Number of cycles it takes to access memory
  //Constructor
  ram(const char * NAME, const int WAIT_CYCLES)
  : sc_module (NAME), wait_cycles(WAIT_CYCLES) {
    sc_sync_tprocess(handle1, "RAM", ram, entry, CLK.pos());
    end_module();
  }
  // Process functionality in member function below
  void entry();
};

// ram.cc
#include <iostream.h> // For C++ I/O
#include <stdio.h> // For C I/O
#include "systemc.h"
#include "ram.h"

void ram::entry(){
  int address;
  unsigned int datai;
  unsigned int datao;

  // initialize the memory
  for (int i=0;i<1000;i++) {
    memory[i] = 0xffffffff;
  }
  while (true) {
    wait_until(cs.delayed() == true);
    address = addr.read();
    if (we.read() == true) { // Write operation
      wait(wait_cycles-1);
      datai = datain.read();
      memory[address] = datai;
    } else { // Read operation
      if (wait_cycles > 2)
        wait(wait_cycles-2); // Introduce delay needed
      datao = memory[address];
      dataout.write(datao);
      wait();
    }
  }
} // end of entry function

Sample Solution Lab 5- accessor.h, accessor.cc
// accessor.h
struct accessor : sc_module {
  sc_in<sc_uint<32> > datain; //input
  sc_out<bool> chip_select; //output
  sc_out<bool> write_enable; //output
  sc_out<sc_int<32> > address; //output
  sc_out<sc_uint<32> > dataout; //output
  sc_in_clk CLK;

  // Parameter
  const int memory_latency;

  //Constructor
  accessor(const char * NAME, const int MEMORY_LATENCY)
  : sc_module (NAME), memory_latency(MEMORY_LATENCY) {
    sc_sync_tprocess(handle1, "ACCESSOR", accessor, entry, CLK.pos());
    end_module();
  }

  // Process functionality in member function below
  void entry();
};

// accessor.cc
#include <iostream.h> // For C++ I/O
#include <stdio.h> // For C I/O
#include "systemc.h"
#include "accessor.h"
void accessor::entry() {
  int addr;
  unsigned int datao;
  unsigned int datai;
  addr = 10;
  datao = 0xdeadbeef;
  while (true) {
    // Write memory location first
    chip_select.write(true);
    write_enable.write(true);
    address.write(addr);
    dataout.write(datao);
    printf("Accessor: Data Written = %x at address %x\n", datao, addr);
    wait(memory_latency); // To make all the outputs appear at the interface
    // some process functionality not shown here, during which
    // chip select is deasserted and bus is tristated
    chip_select.write(false);
    dataout.write(0);
    wait();
    // Now read memory location
    chip_select.write(true);
    write_enable.write(false);
    address.write(addr);
    wait(memory_latency); // For data to appear
    datai = datain.read();
    printf("Accessor: Data Read = %x from address %x\n", datai, addr);
    chip_select.write(false);
    wait();
    addr++;
    datao++;
  }
} // end of entry function

Sample Solution Lab 5- main.cc


// main.cc
#include "systemc.h"
#include "accessor.h"
#include "ram.h"

int sc_main(int ac, char *av[]){


sc_signal<bool> cs("CS");
sc_signal<bool> we("WE");
sc_signal<sc_int<32> > addr("Address");
sc_signal<sc_uint<32> > data1("Data1");
sc_signal<sc_uint<32> > data2("Data2");
const int delay_cycles = 2;

sc_clock clk("Clock", 20, 0.5, 0.0);

accessor A("Accessor", delay_cycles);


A.datain(data2);
A.chip_select(cs);
A.write_enable(we);
A.address(addr);
A.dataout(data1);
A.CLK(clk);

ram R("Ram", delay_cycles);


R(data1, cs, we, addr, data2, clk);

sc_start(1060);
return(0);
}

Sample Solution Lab 6- main.cc
// main.cc
#include "directive.h"
#include "systemc.h"
#include "bios.h"
#include "paging.h"
//#include "debugger.h"
//#include "ebl.h"
#include "icache.h"
#include "fetch.h"
#include "decode.h"
#include "exec.h"
#include "mmxu.h"
#include "floating.h"
#include "dcache.h"
#include "pic.h"
#include <climits>
#include <cstdlib>

int sc_main(int ac, char *av[])
{
  // ************************ ICACHE
  //***********************************
  // ICACHE = ram_cs
  // ICACHE = ram_we
  // ICACHE = addr
  // ICACHE = ram_datain
  // ICACHE = ram_dataout
  // ICACHE = ld_valid = pid_valid
  // ICACHE = ld_data = pid_data
  sc_signal<bool> icache_valid("ICACHE_VALID") ;

  // ************************ BIOS
  //***********************************
  sc_signal<bool> ram_cs("RAM_CS") ;
  sc_signal<bool> ram_we("RAM_WE") ;
  sc_signal<sc_uint<32> > addr("Address") ;
  sc_signal<sc_uint<32> > ram_datain("RAM_DATAIN") ;
  sc_signal<sc_uint<32> > ram_dataout("RAM_DATAOUT") ;
  sc_signal<bool> bios_valid("BIOS_VALID") ;
  const int delay_cycles = 2;

  // ************************ Paging
  //***********************************
  // Paging paging_din = ram_datain
  // Paging paging_csin = ram_cs

Sample Solution Lab 6- main.cc


// main.cc (continued)
  // Paging paging_wein = ram_we
  // Paging logical_address = addr
  // Paging dataout = ram_dataout
  // Paging data_valid = icache_valid
  // Paging stall_ifu = stall_fetch
  sc_signal<sc_uint<32> > icache_din("ICACHE_DIN") ;
  sc_signal<bool> icache_validin("ICACHE_VALIDIN") ;
  sc_signal<bool> icache_stall("ICACHE_STALL") ;
  sc_signal<sc_uint<32> > paging_dout("PAGING_DOUT") ;
  sc_signal<bool> paging_csout("PAGING_CSOUT") ;
  sc_signal<bool> paging_weout("PAGING_WEOUT") ;
  sc_signal<sc_uint<32> > physical_address("PHYSICAL_ADDRESS") ;

  // ************************ Fetch
  //***********************************
  // IFU ramdata = ram_dataout
  // IFU ram_valid = bios_valid
  // IFU ram_cs = ram_cs
  sc_signal<sc_uint<32> > branch_target_address("BRANCH_TARGET_ADDRESS") ;
  sc_signal<bool> next_pc("NEXT_PC") ;
  sc_signal<bool> branch_valid("BRANCH_VALID") ;
  sc_signal<bool> stall_fetch("STALL_FETCH") ;
  sc_signal<bool> pred_fetch("PRED_FETCH") ;
  // IFU ram_we = ram_we
  // IFU address = addr
  // IFU smc_instrction = ram_datain
  // IFU pred_branch_address = pred_branch_address
  // IFU pred_branch_valid = pred_branch_valid
  sc_signal<unsigned> instruction("INSTRUCTION") ;
  sc_signal<bool> instruction_valid("INSTRUCTION_VALID") ;
  sc_signal<sc_uint<32> > program_counter("PROGRAM_COUNTER") ;
  sc_signal<bool> branch_clear("BRANCH_CLEAR") ;
  sc_signal<bool> pred_fetch_valid("PRED_FETCH_VALID") ;
  sc_signal<bool> reset("RESET") ;

  // ************************ Branch
  // BPU: fetch_inst = instruction
  // BPU: fetch_pc = program_counter
  // BPU: fetch_valid = instruction_valid
  // BPU: branch_inst_addr = branch_instruction_address
  // BPU: branch_target_address = branch_target_address
  // BPU: branch_valid = branch_valid
  sc_signal<sc_uint<32> > pred_branch_address("PRED_BRANCH_ADDRESS");
  sc_signal<bool> pred_branch_valid("PRED_BRANCH_VALID") ;
  sc_signal<bool> pred_tellid("PRED_TELLID") ;
  sc_signal<unsigned> pred_instruction("PRED_INSTRUCTION") ;
  sc_signal<bool> pred_inst_valid("PRED_INST_VALID") ;
  sc_signal<sc_uint<32> > pred_inst_pc("PRED_INST_PC");

Sample Solution Lab 6- main.cc
// main.cc (continued)
  // ************************ Decode ***********************************
  sc_signal<bool> pred_on("PRED_ON") ;
  sc_signal<sc_uint<32> > branch_instruction_address("BR_INSTRUCTION_ADDRESS");
  // ID alu_dataout = dout from EXEC
  sc_signal<signed> dram_dataout("DRAM_DATAOUT") ;
  sc_signal<bool> dram_rd_valid("DRAM_RD_VALID") ;
  sc_signal<unsigned> dram_write_src("DRAM_WRITE_SRC");
  // ID next_pc = next_pc
  // ID branch_valid = branch_valid
  // ID branch_target_address = branch_target_address
  sc_signal<bool> mem_access("MEM_ACCESS") ;
  sc_signal<sc_uint<32> > mem_address("MEM_ADDRESS") ;
  sc_signal<int> alu_op("ALU_OP") ;
  sc_signal<bool> mem_write("MEM_WRITE") ;
  sc_signal<unsigned> alu_src("ALU_SRC") ;
  sc_signal<bool> reg_write("REG_WRITE") ;
  sc_signal<signed int> src_A("SRC_A") ;
  sc_signal<signed int> src_B("SRC_B") ;
  sc_signal<bool> forward_A("FORWARD_A") ;
  sc_signal<bool> forward_B("FORWARD_B") ;
  // ID stall_fetch = stall_fetch
  sc_signal<bool> decode_valid("DECODE_VALID") ;
  sc_signal<bool> float_valid("FLOAT_VALID") ;
  sc_signal<bool> mmx_valid("MMX_VALID") ;
  sc_signal<bool> pid_valid("PID_VALID") ;
  sc_signal<signed> pid_data("PID_DATA") ;

  // ************************ DCACHE
  sc_signal<signed> mmic_datain("MMIC_DATAIN") ; /* DCU: datain */
  sc_signal<unsigned> mmic_statein("MMIC_STATEIN") ;/* DCU: statein */
  sc_signal<bool> mmic_cs("MMIC_CS") ; /* DCU: cs */
  sc_signal<bool> mmic_we("MMIC_WE") ; /* DCU: we */
  sc_signal<sc_uint<32> > mmic_addr("MMIC_ADDR") ; /* DCU: addr */
  sc_signal<unsigned> mmic_dest("MMIC_DEST") ; /* DCU: dest */
  sc_signal<unsigned> mmic_destout("MMIC_DESTOUT") ;/* DCU: destout */
  sc_signal<signed> mmic_dataout("MMIC_DATAOUT") ;/* DCU: dataout */
  sc_signal<bool> mmic_out_valid("MMIC_OUT_VALID") ;/* DCU: out_valid*/
  sc_signal<unsigned> mmic_stateout("MMIC_STATEOUT") ;/* DCU: stateout */

  // ************************ Execute ***********************************
  // EXEC in_valid = decode_valid
  sc_signal<bool> in_valid("IN_VALID") ;
  // EXEC opcode = alu_op
  sc_signal<bool> negate("NEGATE") ;
  sc_signal<int> add1("ADD1") ;
  sc_signal<bool> shift_sel("SHIFT_SEL") ;
  // EXEC dina = src_A
  // EXEC dinb = src_B
  // EXEC dest = alu_src
  sc_signal<bool> c("C") ;
  sc_signal<bool> v("V") ;
  sc_signal<bool> z("Z") ;
  sc_signal<signed> dout("DOUT") ;
  sc_signal<bool> out_valid("OUTPUT_VALID") ;
  sc_signal<unsigned> destout("DESTOUT") ;


Sample Solution Lab 6- main.cc


// main.cc (continued)
  // ************************ Floating point ******************************
  // FPU in_valid = float_valid
  // FPU opcode = alu_op
  // FPU floata = src_A
  // FPU floatb = src_B
  // FPU dest = alu_src
  sc_signal<signed> fdout("FDOUT") ;
  sc_signal<bool> fout_valid("FOUT_VALID") ;
  sc_signal<unsigned> fdestout("FDESTOUT") ;

  // ************************ PIC *****************************************
  sc_signal<bool> ireq0("IREQ0") ;
  sc_signal<bool> ireq1("IREQ1") ;
  sc_signal<bool> ireq2("IREQ2") ;
  sc_signal<bool> ireq3("IREQ3") ;
  // PIC cs = interrupt_ack
  // PIC intack_cpu = interrupt_ack
  sc_signal<bool> rd_wr("RD_WR") ;
  sc_signal<bool> intreq("INTREQ") ;
  sc_signal<unsigned> vectno("VECTNO") ;
  sc_signal<bool> intack("INTACK") ;
  sc_signal<bool> intack_cpu("INTACK_CPU") ;

  // ************************ MMX ***********************************
  // MMX mmx_valid = mmx_valid
  // MMX opcode = alu_op
  // MMX mmxa = src_A
  // MMX mmxb = src_B
  // MMX dest = dest
  // MMX mmxdout = fdout
  // MMX mmxout_valid = fpu_valid
  // MMX mmxdestout = fpu_destout

  // ************************ DSP *****************************************
  sc_signal<int> dsp_in1("DPS_IN1");
  sc_signal<int> dsp_out1("DSP_OUT1");
  sc_signal<bool> dsp_data_valid("DSP_DATA_VALID");
  sc_signal<bool> dsp_input_valid("DSP_INPUT_VALID");
  sc_signal<bool> dsp_data_requested("DSP_DATA_REQUESTED");

  ////////////////////////////////////////////////////////////////////////////
  // MAIN PROGRAM
  ////////////////////////////////////////////////////////////////////////////

Sample Solution Lab 6- main.cc
// main.cc (continued)
  sc_clock clk("Clock", 1, 0.5, 0.0);

  printf("/////////////////////////////////////////////////////////////////////////\n");
  printf("// This code is written at SYNOPSYS, Inc.\n");
  printf("/////////////////////////////////////////////////////////////////////////\n");
  printf("// Module : main of CPU Model\n");
  printf("// Author : Martin Wang\n");
  printf("// Company : SYNOPSYS, Inc.\n");
  printf("// Purpose : This is a simple CPU modeling using SystemC.\n");
  printf("// Instruction Set Architecture defined by Martin Wang.\n");
  printf("// \n");
  printf("// SystemC (TM) Copyright (c) 1988-1999 by Synopsys, Inc. \n");
  printf("// \n");
  printf("/////////////////////////////////////////////////////////////////////////\n");

  fetch IFU("FETCH_BLOCK", delay_cycles);
  IFU << ram_dataout << branch_target_address <<
         next_pc << branch_valid << stall_fetch << intreq << vectno <<
         bios_valid << icache_valid << pred_fetch << pred_branch_address
      << pred_branch_valid << ram_cs << ram_we << addr <<
         ram_datain << instruction << instruction_valid <<program_counter
      << intack_cpu << branch_clear << pred_fetch_valid << reset << clk;

  decode IDU("DECODE_BLOCK");
  IDU << reset << instruction << pred_instruction <<
         instruction_valid << pred_inst_valid << out_valid << destout <<
         dout << dram_dataout << dram_rd_valid << destout << fdout <<
         fout_valid << fdestout << branch_clear << dsp_data_valid <<
         program_counter << pred_on << branch_instruction_address <<
         next_pc << branch_valid << branch_target_address << mem_access
      << mem_address << alu_op << mem_write << alu_src << reg_write
      << src_A << src_B << forward_A << forward_B << stall_fetch <<
         decode_valid << float_valid << mmx_valid << pid_valid << pid_data
      << clk;


Sample Solution Lab 6- main.cc


// main.cc floating // main.cc
FPU("FLOAT_BLOCK"); // order dependent paging PAGING("PAGING_BLOCK");
FPU << float_valid << alu_op << src_A << src_B
PAGING << ram_datain << ram_cs << ram_we <<
<< alu_src addr << icache_din << icache_validin << icache_stall <<
<< fdout << fout_valid << fdestout << clk; paging_dout << paging_csout << paging_weout <<physical_address
<< ram_dataout << icache_valid << stall_fetch << clk ;
mmxu MMXU("MMX_BLOCK");
MMXU << mmx_valid << alu_op << src_A <<
icache ICACHE("ICACHE_BLOCK", delay_cycles);
src_B << alu_src ICACHE << paging_dout << paging_csout <<
<< fdout << fout_valid << fdestout << clk; paging_weout << physical_address << pid_valid << pid_data <<
icache_din << icache_validin<< icache_stall << clk;
bios BIOS("BIOS_BLOCK", delay_cycles);
BIOS.datain(ram_datain); // order dcache DCACHE("DCACHE_BLOCK", delay_cycles);
independent
DCACHE << mmic_datain << mmic_statein <<
BIOS.cs(ram_cs); mmic_cs << mmic_we << mmic_addr << mmic_dest <<
BIOS.we(ram_we); mmic_destout << mmic_dataout << mmic_out_valid <<
BIOS.addr(addr); mmic_stateout << clk;
BIOS.dataout(ram_dataout); pic APIC("PIC_BLOCK");
BIOS.bios_valid(bios_valid); APIC << ireq0 << ireq1 << ireq2 << ireq3 <<intack_cpu
BIOS.stall_fetch(stall_fetch); << rd_wr << intack_cpu << intreq << intack << vectno;
BIOS.CLK(clk);
struct tms tbuffer; // For keeping track of cycle counts
times(&tbuffer);
float old_time = float((tbuffer.tms_utime + tbuffer.tms_stime));
sc_start(-1);
times(&tbuffer);
float new_time = float((tbuffer.tms_utime + tbuffer.tms_stime));
cout << "Time for simulation = " << (new_time - old_time) <<
endl;
return 0; /* this is necessary */
}
Sample Solution Lab 7- pix_smoother.h and .cc
// pix_smoother.h
#ifndef PIX_SMOOTHER_H
#define PIX_SMOOTHER_H

struct pix_smoother : sc_module {
  /* Channel Ports */
  sc_channel<unsigned char> pix_in;
  sc_channel<unsigned char> pix_nbd_cnt;
  sc_channel<unsigned char> pix_nbd;
  sc_channel<unsigned char> pix_out;

  pix_smoother( const char* NAME)
    : sc_module (NAME) {
    sc_async_tprocess(handle1, "PIX_SMOOTHER", pix_smoother, entry);
    sensitive << pix_in << pix_nbd_cnt << pix_nbd;
    end_module();
  }
  void entry();
};

#endif

// pix_smoother.cc
#include "systemc.h"
#include "pix_smoother.h"
void pix_smoother::entry()
{
  unsigned char pix;
  unsigned char nbd_pix_cnt;
  unsigned char pix_array[9];

  while(true){
    int sum = 0;
    pix = pix_in.read(); //do you know why I have this statement
    nbd_pix_cnt = pix_nbd_cnt.read();
    for(int i=0 ;i < nbd_pix_cnt; i++){
      pix_array[i] = pix_nbd.read();
      sum += pix_array[i];
    }
    sum /= nbd_pix_cnt; // Average over nbd pixel count
    pix_out.write((unsigned char)sum);
  }
}
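
The averaging step of the pixel smoother can be tried outside SystemC. The following is a minimal sketch in plain C++, not part of the lab solution: the function name `smooth_pixel` and its `fallback` parameter are hypothetical, and the guard against an empty neighbourhood is an added assumption (the lab code divides by the neighbour count without checking it).

```cpp
#include <vector>

// Hypothetical helper, not part of the lab files: average the neighbourhood
// pixel values, as pix_smoother::entry() does with sum /= nbd_pix_cnt.
// `fallback` is returned when no neighbours are supplied (added assumption).
unsigned char smooth_pixel(const std::vector<unsigned char>& nbd,
                           unsigned char fallback) {
    if (nbd.empty()) {
        return fallback;  // avoid dividing by zero
    }
    int sum = 0;
    for (unsigned char p : nbd) {
        sum += p;  // accumulate in int so the sum cannot wrap at 255
    }
    return static_cast<unsigned char>(sum / static_cast<int>(nbd.size()));
}
```

Calling `smooth_pixel({10, 20, 30}, 0)` averages three neighbours; integer division truncates, as in the lab code.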


Sample Solution Lab 8- stage1_2.h , pipeline.h


// stage1_2.h
#include "stage1.h"
#include "stage2.h"
struct stage1_2 : public sc_module {
  sc_in<double> in1;
  sc_in<double> in2;
  sc_out<double> prod;
  sc_out<double> quot;
  sc_in_clk CLK;
  sc_signal<double> sum ; // internal signal
  sc_signal<double> diff ; // internal signal
  stage1 STAGE1;
  stage2 STAGE2;
  //Constructor
  stage1_2(sc_module_name NAME)
    : STAGE1("STAGE1_BLOCK"),
      STAGE2("STAGE2_BLOCK")
  {
    STAGE1(in1, in2, sum, diff, CLK);
    STAGE2(sum, diff, prod, quot, CLK);
    end_module();
  }
};

// pipeline.h
#include "stage1_2.h"
#include "stage3.h"
struct pipeline : public sc_module {
  sc_in<double> in1;
  sc_in<double> in2;
  sc_out<double> out;
  sc_in_clk CLK;
  sc_signal<double> prod; // internal signals
  sc_signal<double> quot; // internal signals
  stage1_2 STAGE1_2;
  stage3 STAGE3;
  //Constructor
  pipeline(sc_module_name NAME)
    : STAGE1_2("STAGE1_2_BLOCK"),
      STAGE3("STAGE3_BLOCK") {
    STAGE1_2(in1, in2, prod, quot, CLK);
    STAGE3(prod, quot, out, CLK);
  }
};

Sample Solution Lab 8- testbench.h , main.cc
// testbench.h
#include "display.h"
#include "numgen.h"

struct testbench : public sc_module {
  numgen N; // component
  display D; // component

  //Constructor
  testbench(const char *NAME,
            sc_clock_edge& CLK,
            const sc_signal<double>& IN,
            sc_signal<double>& OUT1,
            sc_signal<double>& OUT2)
    : sc_module(NAME),
      N("Numgen", CLK, OUT1, OUT2),
      D("Display", CLK, IN)
  {
    end_module();
  }
};

// main.cc
#include "systemc.h"
#include "pipeline.h"
#include "testbench.h"

int sc_main(int ac, char *av[])
{
  sc_signal<double> in1;
  sc_signal<double> in2;
  sc_signal<double> powr;

  sc_clock clk("CLOCK", 20.0, 0.5, 0.0);

  testbench T("Testbench", clk.pos(), powr, in1, in2);
  pipeline P("PIPE");
  P << in1 << in2 << powr << clk;

  sc_start(1000);
  return(0);
}


Sample Solution Lab 9- Master.h


// master.h
#ifndef MASTER_H
#define MASTER_H
#include "systemc.h"
#include "math.h"
struct Master : sc_module {

  static unsigned int masterSeed;

  sc_out<bool> mRequest;
  sc_inout_rv<32> mAddress;
  sc_inout_rv<256> mData;
  sc_inout<sc_logic> mDirection;
  sc_in_clk mClock;
  sc_in<bool> mGranted;

  sc_logic sampledDirection;

  enum m_state { mStart, mRequesting,
                 mTransfering } mCurrentState, mNextState;

  Master( char* NAME)
    : sc_module (NAME) {
    sc_async_fprocess(handle1, "MASTER", Master, entry);
    sensitive_pos(mClock);
    sensitive(mGranted);
    srand(masterSeed);
    masterSeed++;
    end_module();
  }

  void entry();

  bool fireAtRandomRequest() const;
  sc_logic fireAtRandomDirection() const;
  sc_logic_vector fireAtRandomData() const;
  sc_logic_vector fireAtRandomAddress() const;
};

#endif

Sample Solution Lab 9- Master.cc
// master.cc
#include "Master.h"
#define debugging

unsigned int Master::masterSeed=1;

void Master::entry() {
  // reaction to a positive edge of the clock
  sc_logic tmpDirection;
  sc_logic_vector tmpAddress(32);
  sc_logic_vector tmpData(256);

  if (mClock.posedge()) {
    sampledDirection = mDirection.read();
    mCurrentState = mNextState;
  }

  // master finite state machine
  switch (mCurrentState) {
  case mStart:
    if (sampledDirection=='0') {
      mData.read();
    }
    if (fireAtRandomRequest()==true) {
      mRequest.write(true);
      mNextState = mRequesting;
    }
    else mNextState = mStart;
    mAddress.write("ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ");
    // 256-bit data bus released: 8 x 32 'Z' characters
    mData.write("ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ");
    mDirection.write('Z');
    break;
  case mRequesting:
    if (mGranted.read()==true) {
      mRequest.write(false);
      mNextState = mTransfering;
    }
    else {
      mNextState = mRequesting;
    }

Sample Solution Lab 9- Master.cc


// master.cc
    mAddress.write("ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ");
    // 256-bit data bus released: 8 x 32 'Z' characters
    mData.write("ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ");
    mDirection.write('Z');
    break;
  case mTransfering:
    if (mClock.posedge()){
      if (sampledDirection == 0) {
        cout << "master's reading from address: "<< mAddress.read() << endl;
        cout << "data read by master: " << mData.read() << endl;
      }
      tmpData = fireAtRandomData();
      cout << "master's Data Choice: " << tmpData << endl;
      tmpDirection = fireAtRandomDirection();
      cout << "master's direction choice: " << tmpDirection << endl;
      tmpAddress = fireAtRandomAddress();
      cout << "master's Address Choice: " << tmpAddress << endl;

      if (tmpDirection == 1) {
        mData.write(tmpData);
        mDirection.write(1);
        mAddress.write(tmpAddress);
      }
      else {
        // 256-bit data bus released: 8 x 32 'Z' characters
        mData.write("ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                    "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                    "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                    "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                    "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                    "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                    "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
                    "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ");
        mDirection.write(0);
        mAddress.write(tmpAddress);
      }
    }
    if (mGranted.read() == false) mNextState =mStart;
    break;
  default:
    mNextState = mStart;
    break;
  }
};

bool Master::fireAtRandomRequest() const {
Sample Solution Lab 9- Master.cc
// master.cc
  if ( ((double(rand()))/RAND_MAX) > 0.9 ) return true;
  else return false;
};

sc_logic Master::fireAtRandomDirection() const {
  if ( ((double(rand()))/RAND_MAX) > 0.5 ) return '1';
  else return '0';
};

sc_logic_vector Master::fireAtRandomData() const {
  unsigned int ui;
  double r;
  sc_logic_vector result(256);

  // fill the 256-bit vector in eight 32-bit slices
  for ( int i=0; i<8; i++) {
    r = (double(rand())/RAND_MAX);
    ui = (unsigned int)floor(r*pow(2.0,32.0));
    result.range(31+i*32,i*32) = ui;
  }
  return result;
};

sc_logic_vector Master::fireAtRandomAddress() const {
  unsigned int ui = (unsigned int)floor(
    (double(rand())/RAND_MAX) * pow(2.0,32.0)
  );
  sc_logic_vector result(32);
  result = ui;
  return result;
};


Sample Solution Lab 10- fir.h, Makefile


// fir.h
#ifndef fir_H
#define fir_H

extern void f_fir(const char* name, sc_in<bool> reset,
                  sc_in<bool> input_valid,
                  sc_in<int> sample,
                  sc_out<bool> output_data_ready, sc_out<int> result,
                  sc_in_clk CLK) ;

#endif

// makefile
TARGET_ARCH = gccsparcOS5

MODULE = lab10
SRCS = stimulus.cc display.cc main.cc
OBJS = $(SRCS:.cc=.o) fir.o

include ../Makefile.defs

Sample Solution Lab 10- fir.cc
// fir.cc
#include <systemc.h>
#include "fir.h"

struct fir : sc_module {

  sc_in<bool> reset;
  sc_in<bool> input_valid;
  sc_in<int> sample;
  sc_out<bool> output_data_ready;
  sc_out<int> result;
  sc_in_clk CLK;

  sc_int<9> coefs[16];

  SC_CTOR(fir)
  {
    SC_CTHREAD(entry, CLK.pos());
    set_stack_size(99999);
    watching(reset.delayed() == true);
    coefs[0] = -6;
    coefs[1] = -4;
    coefs[2] = 13;
    coefs[3] = 16;
    coefs[4] = -18;
    coefs[5] = -41;
    coefs[6] = 23;
    coefs[7] = 154;
    coefs[8] = 222;
    coefs[9] = 154;
    coefs[10] = 23;
    coefs[11] = -41;
    coefs[12] = -18;
    coefs[13] = 16;
    coefs[14] = 13;
    coefs[15] = -4;
  }

  void entry();

};


Sample Solution Lab 10- fir.cc


// fir.cc
// Multiply accumulate unit
// preserve if necessary
sc_int<19> mac_func(
  const sc_int<8>& a,
  const sc_int<9>& b,
  const sc_int<19>& c
){
  /* MAC function */
  sc_int<19> d;
  d = a*b;
  d += c;
  return d;
}

void fir::entry() {
  sc_int<8> sample_tmp;
  sc_int<17> pro;
  sc_int<19> acc;
  sc_int<8> shift[16];

  // reset watching
  /* this would be an unrolled loop */
  for (int i=0; i<=15; i++)
    shift[i] = 0;

  result.write(0);
  output_data_ready.write(false);
  wait();

  // main functionality
  while(1) {
    output_data_ready.write(false);
    wait_until(input_valid.delayed() == true);
    sample_tmp = sample.read();
    acc = sample_tmp*coefs[0];

    for(int i=14; i>=0; i--) {

Sample Solution Lab 10- fir.cc
// fir.cc
      /* this would be an unrolled loop */
      // pro = shift[i]*coefs[i+1];
      // acc += pro;
      acc = mac_func(shift[i], coefs[i+1], acc);
    };

    for(int i=14; i>=0; i--) {
      /* this would be an unrolled loop */
      shift[i+1] = shift[i];
    };

    shift[0] = sample_tmp;
    // write output values
    result.write(acc);
    output_data_ready.write(true);
    wait();
  };
}

void f_fir(const char* name, sc_in<bool> reset, sc_in<bool> input_valid, sc_in<int> sample,
           sc_out<bool> output_data_ready, sc_out<int> result, sc_in_clk CLK) {
  fir* f = (fir*) SC_NEW(fir(name));
  (*f)(reset, input_valid, sample, output_data_ready, result, CLK);
}
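
The arithmetic of the 16-tap filter above can be checked without a simulator. The following is a minimal sketch in plain C++, assuming ordinary `int` arithmetic in place of the `sc_int<8>`, `sc_int<9>` and `sc_int<19>` types (so no bit-width wrap-around is modelled) and no clock, reset or handshake. The struct name `FirSketch` and its `step` method are hypothetical names introduced for illustration; only the coefficient values and the loop structure come from the lab solution.

```cpp
#include <array>

// Hypothetical software-only model of the lab's FIR datapath.
struct FirSketch {
    // Coefficients copied from the SC_CTOR(fir) body of the lab solution.
    std::array<int, 16> coefs{{-6, -4, 13, 16, -18, -41, 23, 154,
                               222, 154, 23, -41, -18, 16, 13, -4}};
    std::array<int, 16> shift{};  // delay line, cleared as on reset

    // Consume one input sample and return one output value.
    int step(int sample) {
        int acc = sample * coefs[0];
        // Multiply-accumulate over the stored samples (mac_func in the lab).
        for (int i = 14; i >= 0; --i) {
            acc += shift[i] * coefs[i + 1];
        }
        // Advance the delay line by one position.
        for (int i = 14; i >= 0; --i) {
            shift[i + 1] = shift[i];
        }
        shift[0] = sample;
        return acc;
    }
};
```

Feeding a single 1 followed by zeros into `step` returns the coefficients one after another, which is a quick way to confirm that the shift register and the multiply-accumulate loop are indexed consistently.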


Sample Solution Lab 11- rsa.cc


// rsa.cc
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>
#include "systemc.h"

#define ABS_VAL(x) (x < 0 ? -x : x) // Absolute value of x.

#define NBITS 250 // The num. of bits in n of P and S below.
#define NBITS2 (NBITS / 2) // The num. of bits in p and q, the prime factors of n.
#define MSG_NBITS (NBITS - 2) // The num. of bits in the message to cipher.

typedef sc_bigint<NBITS> bigint;

// Initialize the random number generator.
long
randomize(long seed)
{
  long in_seed = seed;
  if (in_seed == -1)
    time(&in_seed);

  srand48(in_seed);

  return in_seed;
}

// Flip a coin with probability p.
bool
flip(double p)
{
  if (drand48() < p)
    return true;
  else
    return false;
}

// Randomly generate a bit string with nbits bits.
// str has a length of nbits + 1.
void
rand_bitstr(char *str, int nbits)
{
  str[0] = '0'; // Sign for positive numbers.

Sample Solution Lab 11- rsa.cc
// rsa.cc
  for (int i = 1; i < nbits; ++i)
    str[i] = (flip(0.5) == true ? '1' : '0');

  str[nbits] = '\0';
}

// Generate "111..111" with nbits bits for masking.
// str has a length of nbits + 1.
void
max_bitstr(char *str, int nbits)
{
  str[0] = '0'; // Sign for positive numbers.

  for (int i = 1; i < nbits; ++i)
    str[i] = '1';

  str[nbits] = '\0';
}

// Return a positive remainder.
bigint
ret_pos(const bigint& x, const bigint& n)
{
  if (x < 0)
    return x + n;
  return x;
}

// Compute the greatest common divisor (gcd) of a and b.
bigint
gcd(const bigint& a, const bigint& b)
{
  if (b == 0)
    return a;
  return gcd(b, a % b);
}

// Compute d, x, and y such that d = gcd(a, b) = ax + by
// x and y can be zero or negative.
void
euclid(const bigint& a, const bigint& b, bigint& d,
       bigint& x, bigint& y)


Sample Solution Lab 11- rsa.cc


// rsa.cc
{
  if (b == 0) {
    d = a;
    x = 1;
    y = 0;
    return;
  }

  euclid(b, a % b, d, x, y);

  bigint tmp = x;
  x = y;
  y = tmp - (a / b) * y;

  return;
}

// Return d = a^b % n.
bigint
modular_exp(const bigint& a, const bigint& b,
            const bigint& n)
{
  bigint d = 1;

  for (int i = b.length() - 1; i >= 0; --i)
  {
    d = (d * d) % n;

    if (b[i])
      d = (d * a) % n;
  }

  return ret_pos(d, n);
}

// Return the multiplicative inverse of a, modulo n, when a and n are
// relatively prime.
bigint
inverse(const bigint& a, const bigint& n)
{
  bigint d, x, y;

  euclid(a, n, d, x, y);
  assert(d == 1);

Sample Solution Lab 11- rsa.cc
// rsa.cc
  x %= n;

  return ret_pos(x, n);
}

// Find a small odd integer a that is relatively prime to n. I do not
// know an efficient algorithm to do that. The loop below usually
// iterates a few times.
bigint
find_rel_prime(const bigint& n)
{
  bigint a = 3;
  while (true) {
    if (gcd(a, n) == 1)
      break;
    a += 2;
  }

  return a;
}

// Return true iff a is a witness to the compositeness of n, i.e., a
// can be used to prove that n is composite.
bool
witness(const bigint& a, const bigint& n)
{
  bigint d = 1;
  bigint x;

  // Compute d = a^(n-1) % n.
  for (int i = n.length() - 1; i >= 0; --i)
  {
    x = ABS_VAL(d);
    d = (d * d) % n;

    // x is a nontrivial square root of 1 modulo n ==> n is composite.
    if ((ABS_VAL(d) == 1) && (x != 1) && (x != (n - 1)))
      return true;

    if ((i > 0) && (n[i]))
      d = (d * a) % n;
  }


Sample Solution Lab 11- rsa.cc


// rsa.cc
  // d = a^(n-1) % n != 1 ==> n is composite.
  if (ABS_VAL(d) != 1)
    return true;

  return false;
}

// Check to see if n has any small divisors.
bool
div_test(const bigint& n)
{
  int limit;

  if (n < 1023)
    limit = n.to_int() - 2;
  else
    limit = 1023;

  for (int i = 3; i <= limit; i += 2) {
    if (n % i == 0)
      return false; // n is composite.
  }

  return true; // n may be prime.
}

// Return true if n is almost surely prime, return false if n is
// definitely composite. CLR suggests s = 50 for any imaginable
// application, and s = 3 if we are trying to find large primes by
// applying miller_rabin to randomly chosen large integers. Even
// though we are doing the latter here, we will still choose s = 50.
bool
miller_rabin(const bigint& n)
{
  if (n <= 1)
    return false;

  if (! div_test(n))
    return false;

  char str[NBITS + 1];

  int s = 50;
  for (int j = 1; j <= s; ++j) {
Sample Solution Lab 11- rsa.cc
// rsa.cc
    // Choose a random number.
    rand_bitstr(str, NBITS);

    // Set a to the chosen number.
    bigint a = str;

    // Make sure that a is in [1, n - 1].
    a = (a % (n - 1)) + 1;

    // Check to see if a is a witness.
    if (witness(a, n))
      return false; // n is definitely composite.
  }

  return true; // n is almost surely prime.
}

// Return a prime number.
bigint
find_prime(const bigint& r)
{
  char p_str[NBITS2 + 1];

  rand_bitstr(p_str, NBITS2);
  p_str[NBITS2 - 1] = '1'; // Force p to be an odd number.
  bigint p = p_str;

  // p is randomly determined. Now, we'll look for a prime in the
  // vicinity of p. By the prime number theorem, executing the
  // following loop approx. ln (2^NBITS) iterations should find a
  // prime.
  while (! miller_rabin(p))
    p = (p + 2) % r;
  return p;
}

// Encode or cipher the message in msg using the RSA public key P=(e, n).
bigint
cipher(const bigint& msg, const bigint& e, const bigint& n)
{
  return modular_exp(msg, e, n);
}
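
The witness test used by miller_rabin can be exercised on small numbers without sc_bigint. The following is a minimal sketch in plain C++, assuming 64-bit unsigned integers and n < 2^32 so that every product fits in 64 bits. The names `witness64` and `is_probable_prime` are hypothetical, and the fixed bases 2, 3, 5 and 7 are an assumption chosen to make the sketch repeatable; the lab code draws 50 random bases instead.

```cpp
#include <cstdint>

// Hypothetical 64-bit port of witness(): returns true iff `a` proves that
// `n` is composite. Assumes 1 <= a < n and n < 2^32 (no overflow handling).
bool witness64(std::uint64_t a, std::uint64_t n) {
    std::uint64_t d = 1;
    const std::uint64_t e = n - 1;  // exponent: compute a^(n-1) mod n
    for (int i = 63; i >= 0; --i) {
        const std::uint64_t x = d;
        d = (d * d) % n;
        // x is a nontrivial square root of 1 modulo n, so n is composite.
        if (d == 1 && x != 1 && x != n - 1) {
            return true;
        }
        if ((e >> i) & 1) {
            d = (d * a) % n;
        }
    }
    return d != 1;  // a^(n-1) mod n != 1, so n is composite
}

// Hypothetical driver with fixed bases (assumption; the lab uses random ones).
bool is_probable_prime(std::uint64_t n) {
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;
    const std::uint64_t bases[] = {2, 3, 5, 7};
    for (std::uint64_t a : bases) {
        if (a % n == 0) continue;  // base is a multiple of n, carries no information
        if (witness64(a % n, n)) return false;
    }
    return true;
}
```

561 and 341 both satisfy 2^(n-1) mod n = 1 although they are composite; the nontrivial-square-root check inside the loop is what catches them.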

Sample Solution Lab 11- rsa.cc


// rsa.cc
// Decode or decipher the message in msg using the RSA secret key S=(d, n).
bigint
decipher(const bigint& msg, const bigint& d, const bigint& n)
{
  return modular_exp(msg, d, n);
}

// The RSA cipher.
void
rsa()
{
  // Generate all 1's in r.
  char r_str[NBITS2 + 1];
  max_bitstr(r_str, NBITS2);
  bigint r = r_str;

  // Initialize the random number generator.
  cout << "seed = " << randomize(long(-1)) << endl;
  cout << endl;

  // Find two large primes p and q.
  bigint p = find_prime(r);
  bigint q = find_prime(r);

  // Compute n and (p - 1) * (q - 1) = m.
  bigint n = p * q;
  bigint m = (p - 1) * (q - 1);

  // Find a small odd integer e that is relatively prime to m.
  bigint e = find_rel_prime(m);

  // Find the multiplicative inverse d of e, modulo m.
  bigint d = inverse(e, m);

  // Output public and secret keys.
  cout << "RSA public key: P=(e, n)" << endl;
  cout << "e = " << e << endl;
  cout << "n = " << n << endl;
  cout << endl;

  cout << "RSA secret key: S=(d, n)" << endl;
  cout << "d = " << d << endl;
  cout << "n = " << n << endl;


Sample Solution Lab 11- rsa.cc
// rsa.cc
  cout << endl;

  // Cipher and decipher a randomly generated message msg.
  char msg_str[MSG_NBITS + 1];
  rand_bitstr(msg_str, MSG_NBITS);
  bigint msg = msg_str;

  msg %= n; // Make sure msg is smaller than n. If larger, this part
            // will be a block of the input message.

  cout << "msg = " << msg << endl;

  bigint msg2 = cipher(msg, e, n);
  cout << "Ciphered msg = " << msg2 << endl;

  msg2 = decipher(msg2, d, n);
  cout << "Deciphered msg = " << msg2 << endl;

  // Make sure that the original message is recovered.
  assert(msg == msg2);
  return;
}

int
sc_main(int argc, char *argv[])
{
  rsa();
  return 0;
}
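
The key arithmetic of rsa.cc can be followed by hand on small numbers. The following is a minimal sketch in plain C++, assuming 64-bit signed integers in place of sc_bigint and a modulus below 2^31 so that products do not overflow. It reuses the function names of the lab solution (`ret_pos`, `euclid`, `inverse`, `modular_exp`), but the bodies are rewritten for `std::int64_t` and are not the lab's code; the alias `i64` is introduced here for brevity.

```cpp
#include <cassert>
#include <cstdint>

using i64 = std::int64_t;  // stand-in for sc_bigint (assumption: n < 2^31)

// Return a non-negative remainder.
i64 ret_pos(i64 x, i64 n) { return x < 0 ? x + n : x; }

// Extended Euclid: compute d, x, y such that d = gcd(a, b) = a*x + b*y.
void euclid(i64 a, i64 b, i64& d, i64& x, i64& y) {
    if (b == 0) { d = a; x = 1; y = 0; return; }
    euclid(b, a % b, d, x, y);
    const i64 tmp = x;
    x = y;
    y = tmp - (a / b) * y;
}

// Multiplicative inverse of a modulo n, when a and n are relatively prime.
i64 inverse(i64 a, i64 n) {
    i64 d, x, y;
    euclid(a, n, d, x, y);
    assert(d == 1);
    return ret_pos(x % n, n);
}

// a^b mod n by square-and-multiply, scanning b from its top bit down.
i64 modular_exp(i64 a, i64 b, i64 n) {
    i64 d = 1;
    a %= n;
    for (int i = 62; i >= 0; --i) {
        d = (d * d) % n;
        if ((b >> i) & 1) d = (d * a) % n;
    }
    return ret_pos(d, n);
}
```

With p = 61 and q = 53, n = 3233 and m = 3120. Choosing e = 17 gives `inverse(17, 3120) == 2753`, `modular_exp(65, 17, 3233)` ciphers the message 65 to 2790, and `modular_exp(2790, 2753, 3233)` recovers 65.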


Changes from Ver 0.9 to 0.91

Appendix: What’s New about Version 0.91

What is New?

 More Structured Hierarchical Design Methodology
    Multiple processes can now be easily specified inside one module.
 SystemC Version 0.9 Processes Renamed
    More intuitive names for hardware and software designers
 Introduction of Ports
    Similar to VHDL/Verilog, the concept of ports is introduced, which makes writing the constructor for the interface file much easier.
    e.g. sc_in<input type>, sc_out<output type>, sc_inout<inout type>
 Introduction of fast bit-vector (sc_bv) and logic-vector (sc_lv)
    e.g. sc_bv<32> address;


Why does Version 0.91 need gcc 2.95.x?

In sc_module.h, some macros are used to define processes. These macros use another macro called SC_DECL_HELPER_STRUCT, which may define a structure that contains a member function. According to the C++ standard, because this structure is defined in a block statement inside the declare_sc_xxxx_process macros, neither the structure nor its member function should be visible outside that block statement; that is, it has 'no linkage'. Older versions of g++ do not handle this correctly and complain about multiple definitions of the structure/function. g++ versions 2.95.x handle this case according to the standard. Since this issue is difficult to resolve with changes in the code,
=> we had to use a newer version of g++ that conforms to the C++ standard.
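
The language rule in question can be seen in a few lines of plain C++. The following is a minimal sketch, not taken from sc_module.h: the names `first_process`, `second_process` and `helper` are hypothetical. Two functions each define a block-scope structure with the same name and a member function; a conforming compiler treats them as unrelated types and reports no multiple definition.

```cpp
// Each function defines its own block-scope struct named `helper`.
// The two structs have no linkage, so they do not clash with each other.
int first_process() {
    struct helper {
        static int run() { return 1; }
    };
    return helper::run();
}

int second_process() {
    struct helper {  // same name, different block scope, different type
        static int run() { return 2; }
    };
    return helper::run();
}
```

Calling `first_process()` and `second_process()` in the same program shows that each call reaches the `helper` defined in its own block.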

New SystemC Ver 0.91 Process Naming Convention

 SystemC™ supports three different process types:
 NOTE: SystemC is backward compatible

                     Asynchronous        Asynchronous        Synchronous
                     Function Process    Thread Process      Thread Process
Ver 0.9              sc_async            sc_aproc            sc_sync
Ver 1.0 Draft Spec   sc_async_fprocess   sc_async_tprocess   sc_sync_tprocess
Ver 0.91             SC_METHOD           SC_THREAD           SC_CTHREAD


Version 0.91 Example


Example: // ONE MODULE
SC_MODULE(driver_mod) {

  // Input ports: sc_in<type>
  sc_in_clk clk; // Clock for the actions of the driver.
  sc_in<double> speed;
  sc_in<double> angle;
  sc_in<double> total;
  sc_in<double> partial;

  // Output ports: sc_out<type>
  sc_out<bool> reset; // Set if the driver wants to reset the partial
                      // distance odometer.
  sc_out<int> speed_set; // Speed of the car as set by the driver.
  sc_out<bool> start; // Set if the driver starts the car.

  // Driver's actions (multiple processes in one module).
  void driver_out_proc();
  void driver_in_proc();

  SC_CTOR(driver_mod) {

    SC_CTHREAD(driver_out_proc, clk.pos());

    SC_METHOD(driver_in_proc);
    sensitive << speed << angle << total << partial;

  }
};
Simple Version 1.0 Example

Appendix: Version 1.0 Draft Specification, Simple Example


Simple SystemC 1.0 Example


//example.cc
struct my_module : sc_module {
  sc_in<sc_uint<8> > in1;
  sc_in<sc_uint<8> > in2;
  sc_out<sc_uint<8> > out1;
  sc_out<sc_uint<8> > out2;
  sc_in<bool> clock;

  sc_uint<8> temp;

  void add()
  {
    temp = in1.read() + in2.read();
    out1 = temp;
  }

  void update()
  {
    out2 = temp;
  }

  my_module(sc_module_name myname)
  {
    sc_async_fprocess(handle1, "add", my_module, add);
    sensitive << in1 << in2;

    sc_async_fprocess(handle2, "update", my_module, update);
    sensitive << clock;
  }
};

struct testbench : sc_module {
  sc_in<sc_uint<8> > in1, in2;
  sc_out<sc_uint<8> > out1, out2;
  sc_in<bool> clock;

  int i, j;
  void stimulus()
  {
    cout << "Out1 = " << i << " Out2 = " << j << endl;
    out1 = i++;
    out2 = j++;
  }

  void response()
  {
    int a, b;

Simple SystemC 1.0 Example
//example.cc
    a = in1.read(); b = in2.read();
    cout << "In1 = " << a << " In2 = " << b << endl;
  }

  testbench(sc_module_name my_name)
  {
    sc_async_fprocess(handle1, "stim", testbench, stimulus);
    sensitive_pos(clock);

    sc_async_fprocess(handle2, "resp", testbench, response);
    sensitive << in1 << in2;

    i = j = 0;
  }
};

int sc_main(int ac, char *av[])
{
  sc_signal<sc_uint<8> > adder_in_1, adder_in_2;
  sc_signal<sc_uint<8> > adder_out_1, adder_out_2;
  sc_signal<bool> clock;

  my_module M("MM");
  M << adder_in_1 << adder_in_2 << adder_out_1 << adder_out_2 << clock;
  testbench T("TB");
  T << adder_out_1 << adder_out_2 << adder_in_1 << adder_in_2 << clock;

  clock = 0;
  adder_in_1.write(0); adder_in_2.write(0);
  sc_initialize();
  for (int i = 0; i < 10; i ++) {
    clock = 1;
    sc_cycle(10);
    clock = 0;
    sc_cycle(10);
  }

  return 0;
}


SystemC 1.0 preview: fixed point data types

 Signed/unsigned fixed-point datatypes


 Two’s complement representation used for representing signed fixed-point numbers
 Operations performed using arbitrary precision
 Quantization mode and over