Design and Synthesis of Image Processing Systems using Reconfigurable Dataflow Graphs

Mainak Sen and Shuvra S. Bhattacharyya
Department of Electrical and Computer Engineering, and Institute for Advanced Computer Studies
University of Maryland at College Park
Maryland DSPCAD Research Group
http://www.ece.umd.edu/DSPCAD/home/dspcad.htm
November 22, 2005
Leiden University, The Netherlands
Design and Synthesis of Image Processing Systems,

University of Maryland at College Park


Outline
• Dataflow-based model of computation for modeling the behavior of DSP applications
• Decidable dataflow models
o Example: use of decidable dataflow as a model of computation for modeling the mapping of (decidable) dataflow behaviors onto embedded multiprocessors
• Structured reconfiguration of dataflow graphs
• Examples of meta-modeling techniques that can be classified as structured, reconfigurable dataflow
o Parameterized dataflow and its application to SDF
o Homogeneous-parameterized dataflow and its application to SDF and CSDF
o Experiments on a gesture recognition application
• Summary


Dataflow-based design for DSP
(Example from Agilent ADS tool)

DSP-oriented Dataflow Models of Computation
• Used widely in design tools for DSP
• Application is modeled as a directed graph
o Nodes (actors) represent functions
o Edges represent communication channels between functions
o Nodes produce and consume data from edges
o Edges buffer data in FIFO (first-in first-out) fashion
• Data-driven execution model
o A node can execute whenever it has sufficient data on its input edges
o The order in which nodes execute is not part of the specification
o The order is typically determined by the compiler, the hardware, or both
• Iterative execution
o Body of loop to be iterated a large or infinite number of times
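The data-driven firing rule above can be illustrated with a minimal sketch. The class name `Actor`, the function `run`, and the two example actors are hypothetical, and the "first ready actor in list order" policy is only one of many valid orderings, since the order is not part of the specification.

```python
from collections import deque

class Actor:
    def __init__(self, name, inputs, outputs, func):
        self.name = name
        self.inputs = inputs      # list of (fifo, tokens consumed per firing)
        self.outputs = outputs    # list of (fifo, tokens produced per firing)
        self.func = func

    def can_fire(self):
        # Firing rule: sufficient data on every input edge.
        return all(len(fifo) >= n for fifo, n in self.inputs)

    def fire(self):
        consumed = [[fifo.popleft() for _ in range(n)] for fifo, n in self.inputs]
        produced = self.func(consumed)
        for (fifo, n), tokens in zip(self.outputs, produced):
            assert len(tokens) == n
            fifo.extend(tokens)

def run(actors, max_firings=100):
    """Fire ready actors until none is ready; return the firing order."""
    trace = []
    for _ in range(max_firings):
        ready = [a for a in actors if a.can_fire()]
        if not ready:
            break
        ready[0].fire()           # assumed policy: first ready actor in list order
        trace.append(ready[0].name)
    return trace

# Edges are FIFO buffers; e0 is preloaded with four input samples.
e0, e1, e2 = deque([1, 2, 3, 4]), deque(), deque()
scale = Actor("scale", [(e0, 1)], [(e1, 1)], lambda c: [[2 * c[0][0]]])
sum2 = Actor("sum2", [(e1, 2)], [(e2, 1)], lambda c: [[sum(c[0])]])
trace = run([sum2, scale])
```

Listing the actors in a different order changes `trace` but not the tokens that end up on `e2`, which is the sense in which the execution order is left to the compiler or hardware.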

Dataflow Features and Advantages
• Exposes coarse-grain parallelism.
• Exposes high-level structure that facilitates analysis, verification, and optimization.
• Captures multi-rate behavior.
• Complementary to ongoing advances in DSP compiler technology for procedural languages, such as C and MATLAB.
• Encourages desirable software engineering practices: modularity and code reuse
o Amenable also to aspect-oriented design.
• Intuitive to DSP algorithm designers: signal flow graphs.

Evolution of Dataflow Models for DSP
• Synchronous dataflow: static multirate behavior
o Agilent ADS, Cadence SPW, etc.
• Well-behaved dataflow: schemas for bounded dynamics
• Boolean/integer dataflow: Turing-complete models
• Multidimensional synchronous dataflow: image and video
• Scalable synchronous dataflow: block processing
o Synopsys COSSAP
• Cyclo-static dataflow: phased behavior
o Synopsys El Greco, Eonic Systems Virtuoso Synchro, System Canvas
• Bounded dynamic dataflow: bounded dynamics
• The processing graph method: reconfigurable dynamic DF
o US Naval Research Laboratory, MCCI Autocoding Toolset
• Parameterized dataflow: dynamically-reconfigurable static DF
• Blocked dataflow: image and video in terms of reconfigurable dataflow

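Modeling Design Space
[Figure: models of computation plotted by expressive power (vertical axis) against verification / synthesis power (horizontal axis). Point labels recovered from the figure: C, BDF, DDF; SDF; CSDF; CSDF, SSDF, MDSDF, WBDF; PSDF; PCSDF.]
(Third dimension: simplicity and intuitive appeal)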

Decidable Dataflow Models
• Modeling flow for representing static flowgraph behavior:
o Cyclo-static dataflow (CSDF), multiphase modeling
o Synchronous dataflow (SDF), multirate modeling
o Homogeneous synchronous dataflow (HSDF)
o Acyclic homogeneous synchronous dataflow (“task graphs”)
• These are in decreasing order of generality
• Designs represented in the more general models can be converted to equivalent representations in the less general ones
o e.g., CSDF → SDF → HSDF → task graph
• HSDF: each actor (graph node) produces/consumes exactly one data value to/from each incident output/input edge
o Suitable for exposing parallelism
o Not the best model for minimizing memory requirements
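As an illustration of why HSDF exposes parallelism but costs memory, the sketch below solves the SDF balance equations for the repetitions vector q and counts the nodes that an SDF-to-HSDF expansion would create (q[a] copies of each actor a). The function name `repetitions`, the edge tuple layout, and the three-actor example graph are assumptions made for this sketch, and the graph is assumed to be connected.

```python
from fractions import Fraction
from math import gcd, lcm

def repetitions(actors, edges):
    """edges: list of (src, produced, snk, consumed).
    Returns the minimal positive integer q with q[src]*produced == q[snk]*consumed
    on every edge, or None if the rates are inconsistent."""
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:                       # propagate relative rates along edges
        changed = False
        for src, prod, snk, cons in edges:
            if src in rate and snk not in rate:
                rate[snk] = rate[src] * prod / cons
                changed = True
            elif snk in rate and src not in rate:
                rate[src] = rate[snk] * cons / prod
                changed = True
    if any(rate[s] * p != rate[t] * c for s, p, t, c in edges):
        return None                      # balance equations have no solution
    scale = lcm(*(r.denominator for r in rate.values()))
    q = {a: int(r * scale) for a, r in rate.items()}
    g = gcd(*q.values())
    return {a: n // g for a, n in q.items()}

# Hypothetical multirate chain: A produces 2, B consumes 3; B produces 1, C consumes 2.
edges = [("A", 2, "B", 3), ("B", 1, "C", 2)]
q = repetitions(["A", "B", "C"], edges)
hsdf_nodes = sum(q.values())             # one HSDF node per actor invocation
inconsistent = repetitions(["A", "B", "C"], edges + [("A", 1, "C", 1)])
```

Here three SDF actors become six HSDF nodes; for graphs with large rate changes the expansion grows accordingly, which is one reason HSDF is not the preferred form when memory is the concern.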

Synthesis Techniques for Decidable Models
• Static scheduling: low overhead, predictability
• Performance analysis through synchronization graphs
• Loop scheduling
o Implicit repetition in the dataflow graph (through changes in sample rate) needs to be translated into explicit repetition in the form of loops on the execution target.
o Complex design space exists for such translation
o Complementary to procedural language techniques for nested loop compilation
• Loop scheduling techniques
o Simulation speedup (minimization of scheduling complexity)
o Code/data minimization
o Hierarchical parallel scheduling
o Block processing
• Task scheduling for latency/throughput optimization
• Probabilistic design: exploiting tolerances to deadline misses
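To make the translation from implicit repetition to explicit loops concrete, here is a small sketch of a looped schedule representation and its expansion into a firing sequence. The nested-list encoding, the function `expand`, and the two-actor example (A fires 3 times, B fires 2 times per iteration) are assumptions chosen for illustration.

```python
from collections import Counter

def expand(schedule):
    """Flatten a looped schedule into the sequence of actor firings.
    A schedule is a list whose items are actor names or (count, sub_schedule) loops."""
    firings = []
    for item in schedule:
        if isinstance(item, str):
            firings.append(item)
        else:
            count, body = item
            for _ in range(count):
                firings.extend(expand(body))
    return firings

# Two looped schedules for the same graph, written (3A)(2B) and A(2 AB).
flat = [(3, ["A"]), (2, ["B"])]
nested = ["A", (2, ["A", "B"])]

flat_firings = expand(flat)
nested_firings = expand(nested)
counts_match = Counter(flat_firings) == Counter(nested_firings)
```

Both schedules fire each actor the same number of times but interleave the firings differently, which is a small instance of the design space mentioned above.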


Example: Intermediate representations for synthesis from decidable dataflow models
• Consider a decidable dataflow behavior that is to be implemented on a self-timed, embedded multiprocessor
o Natural way to implement DSP multiprocessors from decidable dataflow
o Actor assignment and ordering are performed statically
o Invocation (dispatch) of actors is performed dynamically, through synchronization
• Candidate mappings of the behavior onto the architecture can be represented through an intermediate representation that also has decidable dataflow semantics
o This representation is useful for understanding the performance, communication overhead, and synchronization structure associated with the candidate mapping
• Facilitates the separation of communication and synchronization functionality
• This is a useful modeling methodology for design space exploration

Interprocessor Communication Graph (Gipc)
[Figure: self-timed schedule and its IPC graph. Labels recovered from the figure: actors 1 through 9 and 2r1, 4s1, 4s2, 4s3, 5s1, 7r1, 8r1, 9r1.]
Self-Timed Schedule
Proc 1: (1, 2, 3, 4, 6)
Proc 2: (5, 7, 8)
Proc 3: (9)
IPC Graph
Every edge (vi, vj) induces a precedence constraint between vi and vj.


The synchronization graph Gs
• Derived from the interprocessor communication graph
• Synchronization edges are distinguished from interprocessor communication (IPC) edges
o Synchronization edges represent precedence constraints that are enforced by synchronization protocols
o IPC edges represent data transfers
• Interprocessor connections
o Coincident synchronization and IPC edges → communication together with synchronization protocol (conventional approach)
o IPC edge only → communication without synch. protocol
o Synchronization edge only → synchronization protocol only

Applications of Synchronization Graphs
• Simulation
• Throughput estimation through cycle mean analysis
• Removal of redundant synchronizations
• Resynchronization
• Conversion to more efficient synchronization protocols (strongly connected synchronization graphs)
• Statically determining and minimizing the sizes of interprocessor communication buffers

• All are post-processing methods that can be applied to improve a wide range of existing task graph scheduling techniques on a wide range of multiprocessor architectures.
• These techniques benefit from good execution time estimates, but do not depend on exact execution time values to deliver useful results.
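Throughput estimation through cycle mean analysis can be sketched as follows. The sketch assumes the usual definition: the cycle mean of a cycle is the sum of the execution time estimates of its actors divided by the sum of the delays on its edges, and the reciprocal of the maximum cycle mean estimates the achievable iteration rate. The function name, the brute-force cycle enumeration (suitable for small graphs only), and the three-node example are assumptions, not the tooling used in the talk.

```python
from fractions import Fraction

def max_cycle_mean(exec_time, edges):
    """exec_time: {node: integer time estimate}; edges: list of (src, snk, delay).
    Enumerates simple cycles by depth-first search."""
    succ = {}
    for s, t, d in edges:
        succ.setdefault(s, []).append((t, d))
    best = None
    for start in sorted(exec_time):
        # Only visit nodes larger than start, so each cycle is found exactly once,
        # from its smallest node.
        stack = [(start, exec_time[start], 0, {start})]
        while stack:
            v, tsum, dsum, seen = stack.pop()
            for w, d in succ.get(v, []):
                if w == start:
                    if dsum + d == 0:
                        raise ValueError("zero-delay cycle: the graph deadlocks")
                    mean = Fraction(tsum, dsum + d)
                    best = mean if best is None else max(best, mean)
                elif w > start and w not in seen:
                    stack.append((w, tsum + exec_time[w], dsum + d, seen | {w}))
    return best

# Hypothetical synchronization graph with two cycles: 1-2 (one delay) and 2-3 (two delays).
times = {1: 3, 2: 2, 3: 4}
graph = [(1, 2, 0), (2, 1, 1), (2, 3, 0), (3, 2, 2)]
mcm = max_cycle_mean(times, graph)
throughput = 1 / mcm   # estimated graph iterations per time unit
```

Because the result depends only on sums of estimates around cycles, moderately inaccurate execution times still identify the critical cycle in many cases, in line with the last bullet above.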

Beyond Decidable Models
• Limited expressive power: DSP applications increasingly employ high-level dynamics in their behavior
o User interface functionality
o Mode changes
o Adaptive algorithms
o Reconfiguration of processing resources/parameters
• However, key subsystems still exhibit large amounts of “quasi-static” structure, that is, structure that stays fixed across significant windows of time.
• Various dynamic dataflow models have been proposed that address the limitation above by abandoning most or all restrictions related to decidable dataflow
• However, these methods are correspondingly limited in their ability to exploit the quasi-static structure described above

Parameterized Dataflow: Structured Control of Dynamic Parameters
• The key discipline that is imposed on reconfiguration is that each subsystem must have a consistent view of each of its actors (hierarchical or primitive) throughout any given iteration of that subsystem.

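Parameterized Dataflow
• Hierarchical modeling
[Figure: a subsystem embedded in its parent graph. The subsystem contains the graphs subinit, init, and body and has a parameter list (parameter n, ...); the figure marks where n is written (“writes n”) and where it is read (“reads n”).]
• Parameterized DF subsystem is composed of 3 parameterized DF graphs:
o init, subinit, body
• Subsystem parameters
o configured in init/subinit, used in body
• Dynamically reconfigurable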
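A minimal sketch of the init/subinit/body split follows. Everything in it is hypothetical: the class `Subsystem`, the fixed block size, the control stream, and in particular the invocation order used in `run` (init once, then subinit before every body invocation) are assumptions made to show a parameter being written outside the body and held fixed while the body runs.

```python
from collections import deque

BLOCK = 4  # assumed fixed interface of the body: 4 data tokens in, 1 token out

class Subsystem:
    def __init__(self):
        self.n = None                    # subsystem parameter

    def init(self):
        self.n = BLOCK                   # default configuration

    def subinit(self, control):
        # Writes n from a control token before the body runs.
        self.n = min(control.popleft(), BLOCK)

    def body(self, data, out):
        # Reads n; the value cannot change during this invocation.
        block = [data.popleft() for _ in range(BLOCK)]
        out.append(sum(block[:self.n]))

def run(control_tokens, data_tokens):
    control, data, out = deque(control_tokens), deque(data_tokens), []
    sub = Subsystem()
    sub.init()
    while control and len(data) >= BLOCK:
        sub.subinit(control)
        sub.body(data, out)
    return out

result = run([1, 4], [1, 2, 3, 4, 5, 6, 7, 8])
```

The body behaves like a static graph within each invocation, and reconfiguration happens only at invocation boundaries.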

Meta-modeling with parameterized dataflow
• Parameterized dataflow can be applied to any dataflow model of computation (“base model”) to augment that model with dynamic reconfiguration capabilities in a structured way
o Provides for efficient quasi-static scheduling
o Enables execution to be viewed in terms of a sequence of dataflow graphs in the base model
• Parameterized dataflow + XYZ → “Parameterized XYZ”
• Examples of parameterized dataflow models of computation that we are developing and experimenting with
o parameterized synchronous dataflow (PSDF)
o parameterized cyclo-static dataflow (PCSDF)

Parameterized Synchronous Dataflow (PSDF)
• “Local synchrony” conditions can be formulated and checked in a quasi-static fashion to ensure that bounded token production and consumption along with bounded delays lead to bounded memory requirements overall.
o This is not true of unstructured dynamic dataflow models, such as general dynamic dataflow, Boolean dataflow, and bounded dynamic dataflow
• Techniques for construction of streamlined looped schedules for synchronous dataflow graphs have natural and efficient extensions to the construction of parameterized looped schedules for PSDF graphs.

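PSDF Example: CD to DAT Conversion
[Figure: PSDF specification of CD-to-DAT sample rate conversion. Labels recovered from the figure: init graph with actor setFac (sets i1,…d4); body graph with the actors CD, PF1, PF2, PF3, PF4, DAT; edge rates 1, d1, i1, d2, i2, d3, i3, d4, i4, 1; params i1, d1, …., i4, d4; additional labels initChild and preamble.]
Parameterized looped schedule: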
repeat 5 times {
  fire setFac /* sets i1, d1, i2, d2, i3, d3, i4, d4 */
  int _g1 = gcd(i1, d2);
  int _g2 = gcd((i2 * i1)/_g1, d3);
  int _g3 = gcd((i3 * i2 * i1)/(_g2 * _g1), d4);
  repeat (d4/_g3) times {
    repeat (d3/_g2) times {
      repeat (d2/_g1) times {
        repeat (d1) times {fire CD}
        fire PF1
      }
      repeat (i1/_g1) times {fire PF2}
    }
    repeat ((i2 * i1)/(_g2 * _g1)) times {fire PF3}
  }
  repeat ((i3 * i2 * i1)/(_g3 * _g2 * _g1)) times {
    fire PF4
    repeat (i4) times {fire DAT}
  }
}
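The loop counts of this parameterized schedule can be checked numerically. The sketch below counts firings for one pass of the schedule body (one setting of the parameters) and verifies that every edge of the chain CD → PF1 → PF2 → PF3 → PF4 → DAT is balanced. Two assumptions are made: the DAT loop is executed once per PF4 firing, which is what the PF4 → DAT edge needs in order to balance, and the factor values (2, 4, 5, 4) and (1, 3, 7, 7) are illustrative choices whose product ratio is 160/147 = 48000/44100, not values taken from the slides.

```python
from math import gcd
from collections import Counter

def cd_dat_firings(i, d):
    """Count actor firings for one pass of the looped schedule.
    i = (i1, i2, i3, i4) and d = (d1, d2, d3, d4) are the values set by setFac."""
    i1, i2, i3, i4 = i
    d1, d2, d3, d4 = d
    g1 = gcd(i1, d2)
    g2 = gcd(i2 * i1 // g1, d3)
    g3 = gcd(i3 * i2 * i1 // (g2 * g1), d4)
    fired = Counter()
    for _ in range(d4 // g3):
        for _ in range(d3 // g2):
            for _ in range(d2 // g1):
                fired["CD"] += d1
                fired["PF1"] += 1
            fired["PF2"] += i1 // g1
        fired["PF3"] += i2 * i1 // (g2 * g1)
    for _ in range(i3 * i2 * i1 // (g3 * g2 * g1)):
        fired["PF4"] += 1
        fired["DAT"] += i4           # assumption: DAT loop nested in the PF4 loop
    return fired

def balanced(fired, i, d):
    """Tokens produced equal tokens consumed on each of the five edges."""
    chain = ["CD", "PF1", "PF2", "PF3", "PF4", "DAT"]
    prod = [1, *i]                   # per-firing production of CD, PF1..PF4
    cons = [*d, 1]                   # per-firing consumption of PF1..PF4, DAT
    return all(fired[chain[k]] * prod[k] == fired[chain[k + 1]] * cons[k]
               for k in range(5))

i_vals, d_vals = (2, 4, 5, 4), (1, 3, 7, 7)
fired = cd_dat_firings(i_vals, d_vals)
ok = balanced(fired, i_vals, d_vals)
```

With these values CD fires 147 times for every 160 firings of DAT, matching the 44.1 kHz to 48 kHz rate change; the gcd terms only matter when adjacent factors share a divisor, in which case they shorten the loops.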

PSDF Example: Speech Compression

PCSDF Version of Speech Compression


Homogeneous Parameterized Dataflow (HPDF)
• Parameterized dataflow model that can encapsulate the dynamics of an application.
• Meta-modeling technique: hierarchical actors can have any other underlying dataflow model (SDF, CSDF, PSDF, etc.)
• In a large number of applications, data production and consumption rates, though dynamic, are equal across an edge; hence the name homogeneous.
• Reconfiguration can be performed without introducing hierarchy when it is more natural to do so (an advantage over parameterized dataflow).
• Parameterized dataflow is a more powerful technique and thus can be used to represent a wider set of applications.


Applications
• Applications with dynamic run-time data and aggregating final-stage processes are modeled especially well by HPDF compared with SDF semantics.
• Many applications in image and speech processing seem well suited for our model.
• We applied the model to two applications:
o A real-time video processing algorithm for a smart camera, developed at Princeton
o A face detection algorithm developed at CFAR labs at UMD

Application characteristics
[Figure: actors labeled A, B, M, N. Annotations: “Dynamic but balanced amount of data”; “Aggregating final-stage”.]
• This structure seems to be abundant in many audio/video applications.
• Our HPDF model is a natural fit for applications with the above structure.

Gesture recognition algorithm
• Real-time video processing for gesture recognition.
• Does low-level (red oval in the figure) and high-level processing.
• Low-level processing recognizes body parts and identifies movements.
• High-level processing recognizes actions.
• We concentrate on low-level processing.

Ref: W. Wolf, B. Ozer, and T. Lv. Smart cameras as embedded systems. IEEE Computer Magazine, Vol. 35, Iss. 9, Sept. 2002, pages 48-53.


HPDF model of gesture recognition algorithm
[Figure: Ptolemy II implementation. Actors: Region finding, Contour following, Ellipse Fitting, Graph Matching. Annotations: “Dynamic data” (twice), with edge rate labels n, n and p, p; “Aggregating final-stage”.]

Modeling with HPDF/CSDF
[Figure: HPDF/CSDF model with the actors VIDEO INPUT, REGION EXTRACTION, CONTOUR FOLLOWING, ELLIPSE FITTING, MATCH. Edge annotations as extracted: (s 1) (s 1), three times; (s 1) (Xi, Yi), twice; (I 0,I ki) (n 1); p (pi1, qi 0).]
p phases with 1 token and (n-p) phases with 0 token production
#phases = #pixels = s


Integrating HPDF and CSDF
• Number of phases in a fundamental period can vary dynamically.
• Number of tokens produced or consumed in a given phase can also vary dynamically.
• HPDF constraint: the total number of tokens produced by a source actor of a given edge in a given invocation (a fundamental period) must equal the total number of tokens consumed by the sink in its corresponding invocation.
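The HPDF constraint can be stated as a one-line check per edge and invocation. In the sketch below the function name and the phase lists are hypothetical; each list holds the token count of every phase in one fundamental period, and both the number of phases and the per-phase counts are allowed to differ from one invocation to the next.

```python
def hpdf_edge_ok(produced_phases, consumed_phases):
    """True if the source's total production over its phases equals the
    sink's total consumption over its phases for this invocation."""
    return sum(produced_phases) == sum(consumed_phases)

# Two invocations of the same edge with different phase structures.
invocations = [
    ([1, 0, 1, 1, 0], [1, 1, 1]),   # 5 source phases, p = 3 tokens in total
    ([0, 1], [1]),                  # next invocation: 2 source phases, 1 token
]
ok = all(hpdf_edge_ok(p, c) for p, c in invocations)
violation = hpdf_edge_ok([1, 1, 0], [1])   # 2 produced, 1 consumed
```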

Finer granularity and Input modeling
• Each frame has 384x240 pixels, so we model the input as a CSDF actor with s = 92160 phases.
• Model captures pixel-level parallelism present in Region.
• It also captures the frame-level parallelism through the number of phases in Input (s).
[Figure: VIDEO INPUT and REGION EXTRACTION, with edge annotations (s 1) (s 1), three times. #phases = #pixels = s]


Modeling dynamicity - Contour
• 2 phases for Contour
• The first phase scans until it finds a contour.
o Output = 0 tokens
• The second phase follows this contour and all the overlapping ones.
o Output = k_i tokens, each token is a list of pixels from a contour
• Homogeneous condition remains: [left-hand side lost in extraction] = s

Scheduling
• VRCEM
• (s V)(s R)(2I C)(n E)M
• (s VR)(2I C)(n E)M
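The effect of merging (s V)(s R) into (s VR) can be seen by tracking buffer occupancy on the edge between the video input V and region extraction R. The sketch assumes one token per firing on that edge, uses s = 384 x 240 from the input model, and the helper `peak_tokens` is a hypothetical name.

```python
def peak_tokens(firings, src, snk):
    """Maximum number of tokens queued on the src -> snk edge,
    assuming one token produced or consumed per firing."""
    level = peak = 0
    for actor in firings:
        if actor == src:
            level += 1
            peak = max(peak, level)
        elif actor == snk:
            level -= 1
            assert level >= 0, "sink fired without an input token"
    return peak

s = 384 * 240                       # phases per frame
separate = ["V"] * s + ["R"] * s    # (s V)(s R)
fused = ["V", "R"] * s              # (s VR)

peak_separate = peak_tokens(separate, "V", "R")
peak_fused = peak_tokens(fused, "V", "R")
```

Under these assumptions the separate loops need a full frame of buffering on the edge, while the fused loop needs a single token.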


Results
• We also applied HPDF successfully to model a face detection algorithm.
• We developed a TI DSP implementation of the HPDF model of the gesture recognition algorithm.
• The application was run on a TMS320C64xx fixed-point processor.
• When implemented with our HPDF model, the runtime was 21405671 cycles.
• With a 40 ns cycle period, execution time for the application was 0.86 sec.

Results (contd.)
• Scheduling overhead was minimal because a highly streamlined quasi-static schedule was obtained.
• Worst-case buffer size was 642 Kb when the input images were 384x240 pixels. HPDF modeling suggested buffer reuse between the edges.
• The original C code had a runtime of 27741882 cycles; execution time was 1.11 sec with the same clock period of 40 ns.
• HPDF improved runtime by 23%.
• Efficient hardware code generation is being investigated using the hardware synthesis framework developed in our research group.
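The reported figures can be reproduced from the cycle counts and the 40 ns cycle period. The variable names below are illustrative; the numbers are the ones quoted on the Results slides.

```python
CYCLE_NS = 40                       # clock period in nanoseconds
hpdf_cycles = 21_405_671            # HPDF-based implementation
orig_cycles = 27_741_882            # original C code

hpdf_seconds = hpdf_cycles * CYCLE_NS * 1e-9
orig_seconds = orig_cycles * CYCLE_NS * 1e-9
improvement = (orig_cycles - hpdf_cycles) / orig_cycles   # fraction of cycles saved
```

Rounded, these give 0.86 sec, 1.11 sec, and a 23% reduction in runtime.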

Summary
• Dataflow-based models of computation are attractive for modeling the behavior of DSP applications
• Decidable dataflow models are useful for exposing and exploiting static structure in synthesis tools for DSP
• Decidable dataflow models in conjunction with structured reconfiguration techniques allow for efficient handling of application dynamics
• Examples of structured, reconfigurable dataflow techniques that we discussed:
o Parameterized dataflow and its application to SDF
o Homogeneous-parameterized dataflow and its application to SDF and CSDF
o Experiments on a gesture recognition application
• Other examples include dynamic configuration of graph topologies, and blocked dataflow modeling.
