To be simulated, the generated C code is compiled into a UNIX process, and the VHDL code generated from the hardware graph is passed to the VHDL simulator for hardware simulation.
[Figure 1 block diagram: the SDF/DDF algorithm graph is simulated and partitioned; interface nodes are inserted; CGC (C) and VHDL code are generated; the C code is compiled with the UNIX cc into a UNIX process while the VHDL code goes to a VHDL simulator; the two are cosimulated and evaluated on the backplane (BP). If the partition is not satisfactory the flow iterates; otherwise the DSP compiler and FPGA synthesis produce a DSP executable file and an FPGA loadable file for the prototyping board.]
Figure 1. Hardware Software Codesign Workflow

 

The next step is to partition the initial dataflow graph into subgraphs: software graphs and hardware graphs. Through cosimulation and evaluation, the feasibility and cost-effectiveness of a partition are examined. Each iteration of the codesign process produces a new partition, which requires rebuilding the interface between the two subgraphs. Since building an interface by hand is tedious and error-prone, it is desirable to generate the interface automatically [10]; this is the main topic of this paper. After partitioning, each subgraph is modified to add interface nodes at the boundary, and C and VHDL code is generated from the partitioned graphs. No user intervention is needed for interface-node insertion or code generation. The interface node to be added is chosen from a reusable interface-node library according to the synthesis target and communication protocol.
Through cosimulation, a designer can check functional correctness ahead of the final synthesis step. Cosimulation is also used to gather profiling information; as shown in [1], the profiling results help the designer partition the target system more efficiently and identify performance bottlenecks. Moreover, with timed cosimulation, the exact timing behavior of the whole system can be determined.
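The selection of an interface node from the reusable library can be pictured as a table lookup keyed by synthesis target, communication protocol, and data type. The C sketch below is purely illustrative: the structure, entries, and function names are assumptions, not part of PeaCE.

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical descriptor of a reusable interface node. */
    typedef struct {
        const char *target;     /* synthesis target, e.g. "cosim", "fpga"       */
        const char *protocol;   /* communication protocol, e.g. "socket"        */
        const char *datatype;   /* data type of the partitioned arc, e.g. "int" */
        const char *block_name; /* name of the library block to instantiate     */
    } InterfaceNode;

    static const InterfaceNode library[] = {
        { "cosim", "socket", "int",   "SocketSendInt"   },
        { "cosim", "socket", "float", "SocketSendFloat" },
        { "board", "dma",    "int",   "DmaSendInt"      },
    };

    /* Pick the interface node matching the properties of a partitioned arc. */
    static const InterfaceNode *select_interface_node(const char *target,
                                                      const char *protocol,
                                                      const char *datatype)
    {
        for (size_t i = 0; i < sizeof(library) / sizeof(library[0]); i++) {
            if (strcmp(library[i].target, target) == 0 &&
                strcmp(library[i].protocol, protocol) == 0 &&
                strcmp(library[i].datatype, datatype) == 0)
                return &library[i];
        }
        return NULL; /* no suitable node in the library */
    }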

Figure 2. The cosimulation backplane and communication with client simulators

To combine two concurrent simulators (a C process and a VHDL simulator), PeaCE introduces and implements a backplane concept, which reduces the number of interface modules from N(N-1) to N. In the backplane approach, a software or hardware simulator interacts only with PeaCE, the cosimulation backplane, without knowing of the existence of its counterpart. The backplane monitors and manages all communication events between the software and hardware simulators. The Ptolemy group at U.C. Berkeley [8], on the other hand, devised a common interface mechanism between any pair of simulators that achieves the same reduction of interface modules to N; their approach is called heterogeneous simulation [12].
Therefore, as shown in figure 2, several UNIX processes run concurrently and cooperatively while the cosimulation is in progress: the C processes, a VHDL simulator, and PeaCE itself. In figure 2, a C process and a VHDL simulator communicate with each other through the cosimulation backplane, PeaCE. Inside the backplane we can use the simulation and visualization capabilities of Ptolemy. In figure 2, a dashed line represents a flow of data within the backplane through a function call, while a solid line represents a data transmission through a socket. The user specifies the dataflow along the dotted lines as shown in figure 4.
The backplane supports event-driven scheduling with an event queue that holds future events sorted by time. All data between processes is transferred through the cosimulation backplane, which makes the event queue in the backplane a global event queue. The backplane scheduler manages the event queue and transmits packets to the client simulators in order of event generation time. If the destination is an external process such as a C process or a VHDL simulator, the backplane scheduler calls utility functions to send the packet via a socket. The interface node, automatically inserted by PeaCE, receives the packets and passes them on to the C or VHDL module. After transmitting packets to a client simulator, the backplane scheduler waits in a polling loop until it receives the results. To avoid deadlock, the client simulator must make sure to send a response packet to the backplane even when there is no result data.
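The scheduling loop just described can be sketched in C as follows. The packet layout, queue structure, and function names are hypothetical; this is a minimal illustration of a time-ordered global event queue with socket delivery and the polling wait, not the PeaCE source.

    #include <stdlib.h>
    #include <sys/socket.h>

    /* Hypothetical packet format exchanged through the backplane. */
    typedef struct {
        long   timestamp;  /* event generation time      */
        int    node_id;    /* destination interface node */
        double value;      /* one data sample (payload)  */
    } Packet;

    /* Global event queue: a list kept sorted by time stamp. */
    typedef struct Event {
        Packet        pkt;
        int           client_fd;  /* socket of the client simulator */
        struct Event *next;
    } Event;

    static Event *event_queue = NULL;

    static void enqueue_event(Event *e)
    {
        Event **p = &event_queue;
        while (*p && (*p)->pkt.timestamp <= e->pkt.timestamp)
            p = &(*p)->next;              /* keep the queue sorted by time */
        e->next = *p;
        *p = e;
    }

    /* One scheduling step: deliver the earliest event to its client simulator
     * and poll until the mandatory response packet comes back.               */
    static void backplane_step(void)
    {
        Event *e = event_queue;
        if (e == NULL)
            return;
        event_queue = e->next;

        /* External destination (C process or VHDL simulator): send via socket. */
        send(e->client_fd, &e->pkt, sizeof(e->pkt), 0);

        /* Wait in a polling loop for the response; the client must answer even
         * when it has no result data, otherwise the backplane deadlocks here. */
        Packet reply;
        while (recv(e->client_fd, &reply, sizeof(reply), MSG_DONTWAIT) <= 0)
            ;                             /* keep polling */

        free(e);  /* the reply would normally be turned into new events */
    }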



It turned out that the interface design would improve significantly if we could add a few capabilities to the simulator; these desirable capabilities are described in section 4.3.



No modification of VHDL module libraries: We do not change the code of the VHDL modules; instead, the interface code is augmented automatically without user intervention. This distinguishes our approach from Sari's approach at Carnegie Mellon University [11].



No restriction on the VHDL specification: A previous work from U.C. Berkeley restricted the VHDL program generated from a program graph with SDF semantics to a single thread of control [8]. Even though this approach schedules the communication statically, both to avoid deadlock and to improve runtime performance, it is too restrictive for general applications: in [8], only one sequential process runs on the VHDL simulator. Our approach places no such restriction on the VHDL model. In fact, we add a new VHDL module as the interface module, which runs concurrently with the VHDL graph.

Based on the simulation results, the evaluation module decides whether the current partition satisfies the system requirements. If they are not satisfied, the codesign process iterates again from the partitioning stage to the evaluation stage; otherwise, it proceeds to the synthesis and testing step. After validation through timed cosimulation and evaluation, C and VHDL code is regenerated from the software and hardware subgraphs. In this stage, the generated code includes interface code suitable for the prototyping board, which consists of a DSP and an FPGA.


3 Design and generation of cosimulation interface


In this paper, we use VHDL simulators and UNIX processes as concurrent processes communicating with each other through BSD sockets. An alternative approach is to treat the entire system as a single process by implementing the C portions of the system as procedures called from the VHDL modules [7]. A serious drawback of this approach is that not all software parts can be expressed as procedures.
In our environment, the communication between the hardware and the software is modeled as a message-passing system. We aim to design a flexible and extensible message-passing interface for cosimulation.
We first list the desired characteristics of a message-passing interface for generic cosimulation.


No modification of the initial specification: Generally, the initial algorithm specification contains no considerations for partitioning and interfacing. Therefore, whenever cosimulation is needed, interface code should be generated automatically for every hardware-software partition. By only adding the new interface, a cosimulation is constructed without any modification of the user design.
No modification of the VHDL simulator: Since we use existing VHDL simulators for hardware simulation, this requirement is crucial. Many of the problems we met in this work stem from this requirement.
Timed cosimulation: After functional correctness is validated, we also need to check the timing requirements of the system. Since the VHDL simulator has a notion of time, we can perform timed cosimulation if the software processes are also managed by event-driven scheduling. We then need to define a synchronization protocol between the two concurrent event-driven simulators.

4 Implementation





Our codesign environment, PeaCE, is based on Ptolemy, a framework for heterogeneous system specification, simulation, and synthesis [6]. In Ptolemy, an application is represented as a block diagram that is given an appropriate semantics; for example, a DSP algorithm is represented as a block diagram with dataflow graph semantics. Each block contains the code to be executed, so the block diagram forms a coarse-grain dataflow graph. In this paper, we consider the category of applications that can be represented with dataflow graphs, for example DSP systems. Ptolemy generates C code from the partitioned dataflow graphs for software and builds stand-alone C processes to run on the host computer. From the partitioned dataflow graphs for hardware, Ptolemy generates VHDL code that is passed to the VHDL simulator. Before the C and VHDL code is generated from the partitioned graphs, communication blocks are automatically inserted at each partition boundary.
The type and connectivity information of a communication block is determined by that of the partitioned arc. These communication blocks contain all the communication and synchronization routines for cosimulation: socket establishment, socket termination, data communication, buffer management, and synchronization by time-stamp management. Using the foreign interface mechanism of ANSI/IEEE Std 1076-1993, the communication code for the VHDL side is written in C and called from the VHDL code in the newly added communication modules.
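As a rough sketch of the C side of such a communication module, the code below shows socket establishment, data transmission, and socket termination of the kind a foreign subprogram called from the new VHDL communication module might perform. The function names, the text-based packet format, and the registration with the simulator's foreign interface are assumptions; the actual mechanism differs between simulators (see section 5).

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>

    static int backplane_fd = -1;

    /* Socket establishment: called once when the partitioned graph starts. */
    void cosim_open(const char *host, int port)
    {
        struct sockaddr_in addr;

        backplane_fd = socket(AF_INET, SOCK_STREAM, 0);
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port   = htons((unsigned short)port);
        inet_pton(AF_INET, host, &addr.sin_addr);
        if (connect(backplane_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect to backplane");
            exit(1);
        }
    }

    /* Data communication: ship one sample, tagged with the node id and the
     * current time stamp, to the cosimulation backplane.                   */
    void cosim_send(int node_id, long timestamp, double value)
    {
        char buf[64];
        int  n = snprintf(buf, sizeof(buf), "%d %ld %g\n", node_id, timestamp, value);
        write(backplane_fd, buf, (size_t)n);
    }

    /* Socket termination: called when the simulation finishes. */
    void cosim_close(void)
    {
        if (backplane_fd >= 0)
            close(backplane_fd);
    }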

 


)" 

#)
$
 
'
%& 
inf(1) object_time

inf(2)

VHDL signal
VHDL foreign interface

M_1

Foreign Module (C)

Receive

Send

Receive

Send

Even when there is no output data, the master node generates an END signal to the outside to indicate that the partitioned VHDL graph has finished its execution.
If we define a simulation loop as the interval between receiving input packets from the backplane and transmitting the response packets, a simulation loop is divided into two parts: the interface time (inf(1) + inf(2) in figure 3) and the user module time (object_time in figure 3). At the beginning of a simulation loop, the master node calls the socket receive function, which scans all incoming channels and reads all simultaneous data from the backplane until the GO control packet is received. Each packet from the outside contains an identification field specifying the receiver node to which it is to be transferred. After transferring all received data to the input buffer, the master node generates an enable signal to each receive node to run the partitioned graph. The receive nodes get the data from the buffer, which defines the end of the inf(1) interval. When a send node is scheduled, it writes its result data into an output buffer in the foreign module; this is the beginning of the inf(2) interval. After a delta delay in the VHDL simulator, the master node scans the output buffer and transmits the data to the backplane through the socket, which marks the end of inf(2). The time measurement is described in section 5.2.
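A minimal C sketch of the receive side of this simulation loop, under the assumption that packets carry an integer identification field and that the GO control packet is encoded by a reserved id. Buffer sizes, the packet format, and all names are illustrative, not the actual PeaCE interface code.

    #include <stdio.h>

    #define MAX_RECEIVE_NODES 16
    #define GO_PACKET        (-1)   /* assumed id encoding the GO control packet */

    typedef struct {
        double data[64];
        int    count;
    } InputBuffer;

    static InputBuffer input_buf[MAX_RECEIVE_NODES];  /* one per receive node */

    /* Stand-in for the socket read: parses "id timestamp value" records. */
    static int read_packet(long *timestamp, double *value)
    {
        int id;
        if (scanf("%d %ld %lf", &id, timestamp, value) != 3)
            return GO_PACKET;
        return id;
    }

    /* Start of a simulation loop (the inf(1) interval): drain the backplane
     * until GO arrives, routing each packet into the input buffer of the
     * receive node named by its identification field.                       */
    void master_receive_all(void)
    {
        long   t;
        double v;
        int    id;

        while ((id = read_packet(&t, &v)) != GO_PACKET) {
            if (id >= 0 && id < MAX_RECEIVE_NODES && input_buf[id].count < 64)
                input_buf[id].data[input_buf[id].count++] = v;
        }
        /* The master node now raises the enable signal of each receive node;
         * inf(1) ends when the receive nodes read their buffers.             */
    }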





Figure 3. Function of three entities of VHDL interface
For the VHDL simulator we append three types of interface nodes: a master node, send nodes, and receive nodes. For each partitioned arc, we add either a receive node (for an input arc) or a send node (for an output arc). Send and receive nodes are implemented separately for each data type, float or integer. The VHDL simulator schedules the interface nodes when communication occurs. If there is more than one input communication link between hardware and software, there is a risk of deadlock unless the firing order of the receiver modules is managed carefully. We solve this problem by adding one VHDL entity, called the master node, which serializes the firings of the communication nodes.
The architecture of the VHDL interface simulation is depicted in figure 3. In the initialization stage, the master node establishes the socket connection and sets up the input and output queues, one for each partitioned link. The master node also scans the output buffers of the send blocks to export new data to the outside.
To make the simulation a concurrent event-driven simulation, we use a conservative approach, so that the clock of the VHDL simulator may not run ahead of the global clock.
In a conservative timed cosimulation, a client simulator can advance its local time only when it receives a packet whose time stamp is larger than its local time. Since we cannot expect the VHDL simulator to provide an interface through which an external cosimulation engine could block the advancement of its local clock, we let the master node keep the time of the VHDL simulator from running ahead of the global clock. The master node thus plays the role of a runtime manager for the VHDL simulator.
For correct timed cosimulation, the behavior of the master node described in the previous subsection is divided into two parts. At each execution, the master node first checks the input connections and sends signals to wake up the receive nodes. After all events at the current time have been processed in the VHDL simulator, the master node is scheduled again to check the output buffer and send packets to the cosimulation backplane. The master node then goes into a wait loop until it is triggered again from the outside.
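The conservative rule can be stated in a few lines of C: the local clock may only be advanced up to the time stamp of the packet just received from the backplane, so the VHDL simulator never runs ahead of the global clock. This is a sketch with illustrative names, not the foreign procedure actually used.

    /* Conservative synchronization check (illustrative).
     * local_time is the VHDL simulator's clock; pkt_time is the global
     * time stamp carried by the packet received from the backplane.     */
    long conservative_advance(long local_time, long pkt_time)
    {
        if (pkt_time > local_time)
            return pkt_time - local_time;  /* safe to advance up to the packet time */
        return 0;                          /* otherwise the local clock must wait    */
    }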
Figure 4 and figure 5 are screen dumps from a QAM cosimulation in PeaCE. Figure 4 shows the top view of the QAM system, which has two super nodes: one contains a C subgraph and the other a VHDL subgraph, which is displayed in figure 5. The top view window also contains two simulation support nodes in the backplane.
Figure 4. A screen dump of top view window of QAM in PeaCE

If the ramp node is scheduled after the master node has already checked the output buffer and produced a response, the new result is checked and sent out only at the next time slot. The backplane then receives the packet after it has advanced to the next time slot, so the packet arrives stale and the conservative cosimulation is broken. Another problem is that the value responded first is a glitch.
Since it is usually not possible to schedule a particular process at the end of the current time wheel, we instead advance the local time by a delta delay before the master node checks the output buffer and sends the response packets to the cosimulation backplane, as shown in figure 6. Although the master node is then scheduled at the beginning of the next time slot rather than at the end of the current one, we can still deliver the final results of the previous time slot to the backplane by subtracting the delta time from the time-stamps of the output packets.
while (GO packet is not received) loop
    receive packet from socket;
    write data into input queue;
end loop;
Check timestamp of received packet;
if (timestamp > simulator's time) then advance local time;
Send ENABLE signal to receive nodes;
Advance local time by delta delay (1 ns);
Check data in output queue;
if data exists then send data to socket;
Send END signal with next time;

Figure 5. A screen dump of QAM cosimulation in PeaCE

The two simulation support nodes in the backplane are a clock node (a test pattern generation node) and an XGraph node (a visualization node). While the VHDL module graph has 8 nodes, 12 VHDL entities are simulated because of the insertion of a master node, two receive nodes, and a send node. Each of the 12 entities has its own process, so 12 concurrent processes run within the VHDL simulator.
The master node should check the output buffer only after all events in the same time wheel have been processed; otherwise the VHDL simulator may not deliver the current output to the backplane on time. Recall that once the master node receives a packet from the backplane, a response packet must be sent back even if there is no result data at that time, so that the backplane scheduler can exit its wait loop. The VHDL module graph shown in figure 5 contains a source node, a ramp node, which produces an incremented value whenever it is scheduled. The designer's intention is that the ramp node be scheduled and produce an output value whenever a packet is received from the backplane.
Figure 6. Behavior of the master node


The advancement of the local clock by delta delay is prohibited in conservative distributed simulation. To cure this
problem, the duration of arbitrary time advancement should
be small enough to be ignorable at the simulation interface.
An easiest way is to use a very small time unit such as 1
femto second. Since the small unit of time makes the internal data structure of VHDL simulator inefficient, we use a
SCALE parameter, which is a ratio between the time unit
in the entire cosimulation and that within the VHDL simulator. When the master communicate with the backplane,
the time-stamp of a packet is interpreted by multiplying or
dividing the time-stamp with the SCALE value.
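A small sketch of how the SCALE conversion might look in C: time stamps are multiplied when they enter the VHDL simulator and divided when they leave it, with the 1 ns delta-delay advance subtracted from outgoing time stamps as described above. The value of SCALE and the function names are assumptions.

    /* SCALE: assumed ratio between one time unit of the whole cosimulation
     * and one time unit inside the VHDL simulator (e.g. 1000).              */
    #define SCALE 1000L

    /* Backplane time stamp -> VHDL simulator time (entering the simulator). */
    static long to_vhdl_time(long backplane_time)
    {
        return backplane_time * SCALE;
    }

    /* VHDL simulator time -> backplane time stamp (leaving the simulator).
     * The 1 ns delta-delay advance is subtracted first, so the packet still
     * carries the time stamp of the previous time slot.                     */
    static long to_backplane_time(long vhdl_time)
    {
        return (vhdl_time - 1) / SCALE;
    }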
The behavior of the master node is shown in figure 6. There are two points where time advancement occurs. After the packets from the backplane are received, if a future time stamp is detected by calling the check_time_advance() foreign procedure, the VHDL simulator advances its local time; the amount of the advancement is computed by multiplying the SCALE value by the time difference between the backplane and the VHDL simulator. The other advancement is the delta-delay advancement described earlier, which is used only once per simulation cycle. As long as SCALE is larger than 1, timed simulation works correctly.
However, the SCALE parameter causes another problem. If a VHDL module uses a time unit smaller than SCALE, the module behaves differently from the designer's intention. In the QAM example, a ramp node contains a "wait for 1 ns;" statement; after we replace the statement with "wait for SCALE ns;", the system works correctly. We plan to construct a VHDL module library in which SCALE is used as one of the generic parameters.
From our experience with the interface design and implementation, we compiled a list of facilities that VHDL simulators should ideally support for cosimulation.


A callback function for hooking the scheduler: Before the VHDL simulator's scheduler advances to the next cycle, it calls a function through a function pointer. In the normal situation the pointer points to a null function; if a cosimulation environment designer wants to hook into the scheduler, he defines a function body and sets the pointer to its address (in C++ this could be done with a virtual function). With such a callback mechanism, the master node could be executed at the end of the current time wheel, and we could do without the delta-delay management described above (see the sketch after this list).


The time of the nearest future event: To perform cosimulation more efficiently, information about the nearest future event is required. The current implementation of PeaCE uses the next time increment as the nearest future event, which is a major source of inefficiency in the cosimulation time. The C language interface of the Synopsys VHDL simulator (VSS) supports this facility with the cliGetNextEventTime function.
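The callback facility proposed in the first item could look like the following C sketch: a function pointer, defaulting to a null (do-nothing) function, is called by the simulator kernel before it advances to the next cycle, and the cosimulation environment can replace it. This is a suggestion for simulator implementers, not an API of VSS or IVSIM.

    #include <stddef.h>

    /* Sketch of the proposed callback hook (not an existing simulator API). */
    typedef void (*cycle_hook_t)(void);

    static void null_hook(void) { /* default: do nothing */ }

    static cycle_hook_t end_of_cycle_hook = null_hook;

    /* A cosimulation environment would install its own hook body here. */
    void set_end_of_cycle_hook(cycle_hook_t fn)
    {
        end_of_cycle_hook = (fn != NULL) ? fn : null_hook;
    }

    /* Inside the simulator kernel, just before advancing to the next cycle: */
    static void before_time_advance(void)
    {
        end_of_cycle_hook();   /* e.g. the master node could flush its output here */
    }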

5 Experiences of cosimulation with the designed interface mechanism
We implemented the proposed interface generation mechanism twice, using a QAM modulation example: one implementation targets the Synopsys VSS simulator and the other targets IVSIM, which was developed at the same university [13]. Although both simulators support the VHDL foreign interface, the implementation details of the foreign interface are not specified in the standard, so the interface mechanism in each simulator depends on the simulator implementation.



5.1 Interface code overhead

The fixed C module in table 1 is a foreign module that includes the socket handling routines and the VHDL interface routine. In VSS, a foreign module defines the body of a VHDL entity itself and therefore needs extra code to interface with the scheduler of the VSS simulator, whereas the C modules in IVSIM are just procedure calls. As a result, the fixed C part in VSS is larger than that in IVSIM. In VSS, however, the interface designer can exploit more facilities of the VHDL simulator kernel through the CLI (C Level Interface). The fixed VHDL module in VSS includes the entity definitions of the send, receive, and master nodes, while that in IVSIM contains only the master node. In contrast, the proportional parts in IVSIM are larger than those in VSS: the proportional part in VSS contains only entity instantiation code, while in IVSIM it defines and instantiates the entities of the send or receive nodes. Although the detailed implementations differ considerably, the same interface mechanism based on the master node is used for both simulators, and the size of the interface part is about 10% of the whole simulated code for both VSS and IVSIM in the QAM example.

Table 1. Interface code overhead in VSS and IVSIM simulators

            Fixed C module   Fixed VHDL module   VHDL lines    VHDL lines
            (bytes)          (lines)             per Receive   per Send
    VSS     17688            341                 2             2
    IVSIM   5628             25                  27            17

5.2 Interface time overhead

The result of runtime monitoring of the VHDL simulator and the interface modules is presented in table 2. To measure the execution time of the user-design VHDL modules and that of the interface modules, we use the gettimeofday() UNIX system call, which reports the current time as elapsed seconds and microseconds since 00:00 GMT, January 1, 1970. We obtain the system time at the four points described in section 2.2. The time spent in the interface modules is the duration between the UNIX socket and the input/output buffers, while the module time is the duration between the moment the receive nodes get the input data from the input buffer and the moment the send nodes write their results into the output buffer. The time overhead of the interface blocks is small enough to be ignored.
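How such an interval is measured with gettimeofday() is sketched below; the two sampling points shown are illustrative stand-ins for two of the four measurement points.

    #include <stdio.h>
    #include <sys/time.h>

    /* Microseconds elapsed between two gettimeofday() samples. */
    static long elapsed_us(const struct timeval *start, const struct timeval *end)
    {
        return (end->tv_sec - start->tv_sec) * 1000000L
             + (end->tv_usec - start->tv_usec);
    }

    int main(void)
    {
        struct timeval t_in, t_out;   /* two of the four measurement points */

        gettimeofday(&t_in, NULL);    /* e.g. data taken from the UNIX socket     */
        /* ... copy the data into the input buffers ... */
        gettimeofday(&t_out, NULL);   /* e.g. receive nodes read the input buffer */

        printf("interface time: %ld us\n", elapsed_us(&t_in, &t_out));
        return 0;
    }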

6 Conclusion
We have presented a new interface mechanism for hardware-software cosimulation. We believe that the approach, which satisfies all of the requirements in the wish-list above, is applicable to all levels of cosimulation detail and to various existing VHDL simulators.
Table 2. Interface time overhead in QAM cosimulation (IVSIM simulator; all values in microseconds)

    Simulation loop   User module   Interface module
    96                65,905        5,599
    192               127,768       8,772
    288               190,005       11,522
    384               259,545       15,220
    480               327,625       18,399

To our knowledge, no previous cosimulation environment works with arbitrary VHDL simulators while also performing timed cosimulation. We implemented the interface mechanism for both the VSS and IVSIM simulators and compared them, and we verified the feasibility of the proposed approach through conservative timed cosimulation of a QAM-16 modulation example. We have also described the implemented interface and the lessons learned for more efficient timed cosimulation. As future work, we will improve our cosimulation environment, construct a generic and parameterized VHDL module library, and provide a smooth migration path to cosynthesis.

References
[1] C. Passerone et al. Fast and accurate hardware-software co-simulation using software timing estimates. CODES/CASHE'96, 1996.
[2] E. A. Lee. Recurrences, Iteration, and Conditionals in Statically Scheduled Block Diagram Language, in VLSI Signal Processing III. IEEE Press, 1988.
[3] E. A. Lee and D. G. Messerschmitt. Synchronous data flow. Proceedings of the IEEE, September 1987.
[4] G. Jennings. A case against event driven simulation of digital system design. The 24th Annual Simulation Symposium, pages 170-176, April 1991.
[5] IEEE. IEEE Standard VHDL Language Reference Manual. IEEE, Inc., 345 East 47th Street, New York, NY 10017, USA, 1993.
[6] J. Buck, S. Ha, E. A. Lee, and D. G. Messerschmitt. Ptolemy: A framework for simulating and prototyping heterogeneous systems. International Journal of Computer Simulation, 4:155-182, April 1994.
[7] J. P. Soininen et al. Co-simulation of real-time control systems. IEEE/ACM Proc. of Euro-DAC'95, pages 170-175, 1995.
[8] J. Pino, M. C. Williamson, and E. A. Lee. Interface Synthesis in Heterogeneous System-Level DSP Design Tool. IEEE International Conference on Acoustics, Speech, and Signal Processing, 1996.
[9] P. Zepter, T. Grotker, and H. Meyr. Digital Receiver Design Using VHDL Generation From Data Flow Graphs. Proceedings of the 34th DAC, June 1995.
[10] S. Schmerler et al. A backplane approach for cosimulation in high-level system specification environments. European Design and Test Conference, 1995.
[11] S. L. Coumeri and D. E. Thomas. A Simulation Environment for Hardware-Software Codesign. IEEE Design and Test of Computers, pages 16-28, September 1993.
[12] W. Wolf. Hardware-software codesign of embedded systems. Proceedings of the IEEE, 82:967-989, July 1994.
[13] Y. Kim, K. Kim, Y. Shin, T. Ahn, W. Sung, K. Choi, and S. Ha. An integrated hardware-software cosimulation environment for heterogeneous system prototyping. Proc. of ASP-DAC, pages 101-106, August 1995.
