
MPSoC Performance Modeling and Analysis
Chapter 9
(Chapter 6 of Multiprocessor Systems-on-Chips)
Outline
Chapter objectives
To give a better understanding of run-time effects on system performance (nonfunctional performance dependency)
To review the formal approach to performance analysis, which is a more efficient and reliable alternative to co-simulation
Outline
1. Introduction
2. Performance modeling and analysis of architecture
components
Processing elements
Communication elements
Memories
3. Modeling of process execution
4. Modeling the effect of the shared resources
5. Global performance analysis
6. Conclusions
1. Introduction
1.1 Complex Heterogeneous Architectures
1.2 Design Challenges
1.3 State of the Practice
1.4 Structuring Performance Analysis
1.1 Complex Heterogeneous Architectures
The productivity requirements for MPSoC design can only be met
by systematic reuse of components and subsystems
Reuse and specialization result in increasingly heterogeneous
MPSoC structures
Architecture templates, called platforms, have been introduced
To improve component reusability
To simplify the design process
An example of a platform is the Philips Nexperia, proposed for multimedia applications
As with other platforms, many types of processors, coprocessors, memories, and busses are available as library elements for the Nexperia platform
1.1 Complex Heterogeneous Architectures
Philips VIPER for set-top box and digital TV applications (Nexperia platform)

1.1 Complex Heterogeneous Architectures
Software architecture example
In any sufficiently complex MPSoC, we find different
hardware/software architectures and different scheduling
strategies
Scheduling strategies
RISC processors with static or dynamic priority
scheduling
Hardware components or DSPs with a fixed schedule
Communication devices with time-triggered resource
sharing
The heterogeneous hardware architecture leads to a
similarly heterogeneous software architecture.
1.1 Complex Heterogeneous Architectures
Major challenges in MPSoC design
Component integration
Component and subsystem interfacing
Design space exploration and optimization
Integration verification
Design process integration
1.2 Design Challenges
Design verification of MPSoCs can be separated into
Function verification: verifying that the specified function has been correctly implemented
Performance verification: validating that the target architecture is well designed by checking the following:
the target architecture provides sufficient processing and communication performance to run the application
the target architecture meets the timing requirements
the target architecture avoids memory underflow or overflow
the target architecture does not introduce run-time-dependent behavior such as deadlocks
1.2 Design Challenges
What is new in MPSoC design is the heterogeneity of
scheduling algorithms, leading to almost arbitrary
combinations
The problem of evaluating the performance of an MPSoC is
found in the integration of different scheduling strategies
Some of the advanced analysis techniques cover more than one scheduling approach, but they are limited to a few combinations and, therefore, do not scale to large heterogeneous systems
Terms
Component: processor, communication link, memory
Subsystem: a set of components controlled by a common scheduling strategy and a common layered SW
1.2 Design Challenges
The state-of-the-art method to verify MPSoC performance is co-simulation
We find analysis tools from major ECAD vendors, which
support the co-simulation of hardware and software
Advantages of HW/SW co-simulation
We can use the same simulation environment and patterns for
performance verification as well as function verification
The system designer can reuse component or subsystem simulation patterns for system integration
Application benchmarks can be reused for simulation-based
performance verification
Simulation provides use cases for a given set of input patterns
that support the understanding and debugging of a system
1.3 State of the Practice
Critical disadvantages of simulation-based performance
verification (including HW/SW co-simulation)
Performance verification is much more time-consuming than
function verification
Performance verification and function verification use the
same simulation patterns
However, they use different models; performance verification
needs to use timed models, whereas function verification
uses untimed models
The (detailed) timed model exhibits much higher complexity in behavior representation and simulation. Thus, even for the same number of simulation patterns, the timed model results in much longer run times than the untimed model
Thus, if we use simulation for performance verification, it can become a bottleneck in the design process
1.3 State of the Practice
Serious limitation of simulation: performance verification is only partially possible due to the complex runtime interdependencies introduced by resource sharing
The arbitration for a shared medium, Bus B, introduces a
nonfunctional performance dependency that is not reflected in
the system function
1.3 State of the Practice
The runtime interdependency among components can turn component or subsystem best-case performance into system worst-case performance
(Figure: example architecture with a fixed-function component on shared Bus B)
Process P1 running on CPU1 sends one data packet per process execution.
The fastest execution time of P1 corresponds to the maximum transient bus load, slowing down the communication of other components.
The burst behavior of CPU1 depends on the input queue state, which in turn may depend on previous executions of other processes.
1.3 State of the Practice
Resource sharing introduces a confusing variety of run-time
interdependencies among processes
They lead to data-dependent transient run-time effects
that are difficult to find and to debug
System corner cases are different from component corner
cases
System corner cases can be quite complex and difficult to
reach in simulation
Function verification patterns are not sufficient, since function corner cases do not cover the nonfunctional dependencies introduced by resource sharing
So, where do we get the stimuli for performance verification of the system corner cases?
Formal performance analysis, which is more efficient and reliable than co-simulation (the authors' claim)
1.3 State of the Practice
1.4 Structuring Performance Analysis
Fig. 6.5 Performance model structure
An activation model that defines the activation of processes is introduced for P1
CP1 and M1 run two processes, P1 and P2, on a software architecture
An MPSoC example to show the performance analysis problem
1.4 Structuring Performance Analysis
How to analyze the performance of an entire MPSoC system
The co-simulation approach has difficulty in evaluating the system worst cases due to the nonfunctional performance dependencies of the MPSoC system.
Formal performance analysis is more efficient and reliable than co-simulation (the authors' claim)
Formal analysis of MPSoC performance:
The basic idea is to extend the SW analysis approach to
MPSoC analysis
Formal SW performance analysis distinguishes two problems:
Formal analysis of process performance
Analyzes the execution time of a process running on a single processor without resource sharing
Schedulability analysis
Analyzes the effects of resource sharing on one or multiple processors
These two problems are independent.
1.4 Structuring Performance Analysis
The authors' contributions
Formal analysis of process performance:
The approach for SW analyzes the execution time of a
process.
Authors introduced a more general approach to analyzing
the performance of architecture components (processing
elements, communication elements, and memories).
Schedulability analysis:
The classical schedulability analysis is limited to subsystems.
Authors introduced a new approach to investigating the
performance of a complete heterogeneous system through
the composition of individual subsystem analysis.
1.4 Structuring Performance Analysis
1.4 Structuring Performance Analysis
How to formally analyze the performance of an entire MPSoC system
Sec. 6.2 Modeling and analyzing the architecture components
Sec. 6.3 Determining the frequency of process activation to analyze the total load
Sec. 6.4 Resource sharing and performance analysis
Sec. 6.5 Approaches to the analysis of heterogeneous systems with subsystems of different scheduling strategies
2. Architecture Component Performance Modeling and Analysis
What components do we study?
1. Processing elements
2. Communication elements
3. Memories
After studying architecture components, we study process
modeling and analysis with activation frequency
2.1 Processing Element Modeling and Analysis
Two kinds of information are needed from a processing element for system performance analysis: its execution time and its communication activity
1. Execution time of an execution path in an algorithm:
the time needed to execute the given path. The execution time is determined by the execution path and by the function implementation.
2. Communication activity:
needed to determine the communication load and, hence, communication component timing
Execution path example
An execution path is a sequence of actions defined by a SW program or, in the case of HW, by a state machine, which is executed for a given initial state and given input data
The execution paths are hardware-architecture independent.
The execution time, however, depends not only on the execution path but also on the function implementation, which captures the target architecture (HW).
2.1 Processing Element Modeling and Analysis
Execution path example
The execution path is determined by control statements in
the program or FSM, i.e., by branches and loops
Note that
The conditions that control branches and loops depend on input data or internal control variables
If the execution path of a component depends on the input data, then that component will have data-dependent execution times
In practice, processing element analysis today is based on process simulation
Process execution times are measured or simulated using, e.g., breakpoints
The simulation patterns are taken from component design.
Even with formal methods, basic component data are derived via simulation
(Figure: example program partitioned into basic blocks)
2.1 Processing Element Modeling and Analysis
Formal analysis?
Execution time and communication activity
2.2 Formal Analysis of Processing Elements
Execution time of an execution path
The execution time of a path can be computed by summing up the execution times of all basic blocks along the execution path: T = Σ_i t_i
In the case of pipelined architectures
There will be overlap between the execution of subsequent basic blocks. The overlap must be taken into account in the execution time computation
To determine the architecture-dependent timing interval,
Basic blocks can be analyzed for best-case and worst-case execution times
The overlap can be considered for the best case (e.g., perfect pipelining) and the worst case (pipeline empty)
2.2 Formal Analysis of Processing Elements
t_pe(F, pe_j) = Σ_i t_pe(b_i, pe_j) · x(b_i)    (1)

where
F is a process
b_i is basic block i of F
x(b_i) is a parameter for the number of executions of basic block b_i, associated with the execution path
t_pe(b_i, pe_j) is the execution time of b_i when it is executed on processing element pe_j.
1. Execution time of process F on processing element pe_j, t_pe(F, pe_j)
Existing approaches analyze the execution time of a process using the execution times of basic blocks
A. ILP (Integer Linear Programming) solving for the minimum and maximum, which correspond to the best-case execution time (BCET) and the worst-case execution time (WCET)
B. Clustering of basic blocks into longer segments with a single execution path
In complex architectures, execution paths and function implementation cannot always be treated independently. How to treat the impact of cache operations has been studied extensively.
2.2 Formal Analysis of Processing Elements
2. Computing the execution time of each basic block
For simple component architectures, it is sufficient to add the instruction execution times of basic block b_i
More complex architecture models include pipelining effects
How to reflect the communication activity of a processing element:
s(F): total sent data volume
r(F): total received data volume
s(b_i): the data volume sent in b_i
r(b_i): the data volume received in b_i
2.2 Formal Analysis of Processing Elements
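The accumulation in Eq. (1), and the analogous accumulation of sent data volume s(F), can be sketched in a few lines. All names and numbers below are illustrative assumptions, not values from the chapter:

```python
# Sketch of Eq. (1): the execution time of process F on processing element
# pe_j is the sum over basic blocks b_i of the per-block execution time
# t_pe(b_i, pe_j) weighted by the execution count x(b_i) of the chosen path.

def process_time(block_times, block_counts):
    """t_pe(F, pe_j) = sum_i t_pe(b_i, pe_j) * x(b_i)."""
    return sum(block_times[b] * block_counts[b] for b in block_counts)

def sent_volume(block_sent, block_counts):
    """s(F): total data volume sent, accumulated the same way."""
    return sum(block_sent[b] * block_counts[b] for b in block_counts)

# Per-block execution times on pe_j (e.g., in cycles) and data sent per block.
t_blocks = {"b1": 10, "b2": 25, "b3": 7}
s_blocks = {"b1": 0, "b2": 4, "b3": 1}
# Execution counts x(b_i) for one execution path (e.g., b2 in a loop run 3x).
x_blocks = {"b1": 1, "b2": 3, "b3": 1}

print(process_time(t_blocks, x_blocks))  # 10*1 + 25*3 + 7*1 = 92
print(sent_volume(s_blocks, x_blocks))   # 0*1 + 4*3 + 1*1 = 13
```

Running the same computation with best-case and worst-case per-block times yields the BCET/WCET interval mentioned above.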
Communication element behavior is typically far less complex
than processing element behavior
More complexity is found in communication links with protocols
that are used
to share the communication resource or
to ensure correct transmission
Communication element modeling and analysis follow the same
principles as processing element analysis
In addition to simulation, profiling and statistical load analysis are
popular performance validation approaches
This appears useful if the source of a communication is
statistically characterized and/or if communication latency and
memory cost are not critical
2.3 Communication Element Modeling and Analysis
A simple point-to-point communication link
Suppose that communication element CE_j transfers data word by word
The communication time to transfer a data volume x grows linearly with the size of x: (1)
For packet-based transmission with a fixed packet length pl: (2)
2.4 Formal Communication Element Analysis
t_com(x, ce_j) = x · t_com(1, ce_j)            (1)
t_com(x, ce_j) = ⌈x / pl⌉ · t_com(pl, ce_j)    (2)

where t_com(1, ce_j) is the time to transfer one word over ce_j.
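A minimal sketch of formulas (1) and (2); the function names and the word and packet timings are my own illustrative assumptions:

```python
import math

# Word-by-word transfer (Eq. 1) grows linearly in the data volume x;
# packet-based transfer (Eq. 2) is charged per packet of fixed length pl,
# so a partially filled last packet costs a full packet time.

def t_word(x, t_one_word):
    # Eq. (1): t_com(x, ce_j) = x * t_com(1, ce_j)
    return x * t_one_word

def t_packet(x, pl, t_one_packet):
    # Eq. (2): t_com(x, ce_j) = ceil(x / pl) * t_com(pl, ce_j)
    return math.ceil(x / pl) * t_one_packet

print(t_word(100, 2))        # 100 words at 2 time units each -> 200
print(t_packet(100, 32, 40)) # ceil(100/32) = 4 packets at 40 each -> 160
```

The ceiling in Eq. (2) is what makes packet-based timing a step function of the data volume rather than a straight line.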
A one-to-many link
can logically be split into many one-to-one links with possibly different timing.
Many-to-many links, i.e., busses, originate in resource sharing and are treated in Section 4, Modeling Shared Resources.
2.4 Formal Communication Element Analysis
The types of memories in MPSoC
SRAMs, FLASH, EEPROM, and DRAM
Since the memory words are much wider than the
communication links, words are best communicated via
burst transmission.
DRAMs are inevitable for many MPSoC applications with
large memory requirements, e.g., in multimedia applications.
DRAMs are implemented on chip or externally
DRAM timing analysis depends on the memory architecture,
e.g., having caches or not.
If the communication link between memory and processing element is a shared resource, then the access time depends nonfunctionally on other processes. This makes accurate single-process simulation hard (to be discussed in Section 4, Modeling Shared Resources).
2.5 Memory Element Modeling and Analysis
3. Process Execution Modeling
To determine the load of a component on a system, the number of such executions per unit time is needed → activation modeling
Sec. 6.2 Modeling and analyzing the architecture components
Sec. 6.3 Determining the frequency of process activation to analyze the total load
Sec. 6.4 Resource sharing and performance analysis
Sec. 6.5 Approaches to the analysis of heterogeneous systems with subsystems of different scheduling strategies
Activation modeling aims at modeling how a process is activated. Activation modeling depends on the type of process
There are two very different types of processes
1. Processes with infinite execution time (run forever)
The system behavior is only limited by the environment and by the available processing and communication performance.
It is hard to analyze the delay times and buffer requirements of systems with such processes
2. Processes with finite execution time
Can be time-activated or event-activated
3.1 Activation Modeling
1. Time-activated processes
A. A process is executed when a given time has elapsed
B. Typically, such a process is activated periodically
C. The process reads input data, but its activation is independent of the input data
D. Time-activated processes often use timer-generated interrupts and poll the input data
3.1 Activation Modeling
2. Event-activated processes
A. A process is activated by one or more events generated by the environment or by other processes
B. The implementation of event-activated processes often uses interrupts caused by events
C. Typical events are
a. the arrival of a signal or
b. the satisfaction of a certain input data condition
D. A dense sequence of events may lead to overlapping executions of the same process.
E. A system can make the sequence of events follow a certain pattern that can be exploited for better utilization of a resource.
3.1 Activation Modeling
Events with periods (time-activated modeling)
Periodic events
Periodic events with jitter
Periodic events with bursts
3.1 Activation Modeling
Events with minimum inter-arrival time
Sporadic events with minimum inter-arrival times
Sporadic events with bursts
3.1 Activation Modeling
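For analysis, such event models are typically turned into upper bounds on the number of events that can fall into a time window. The bounds below are standard ones from the real-time literature, not equations from this chapter, and the exact open/closed-interval convention is an assumption:

```python
import math

# Upper bound eta+(delta) on the number of events of a stream in a window of
# length delta. Periodic events are the special case jitter = 0; sporadic
# streams are bounded via their minimum inter-arrival time d_min.

def eta_plus_periodic_jitter(delta, period, jitter):
    """Max events of a periodic-with-jitter stream (period P, jitter J)."""
    return math.ceil((delta + jitter) / period)

def eta_plus_sporadic(delta, d_min):
    """Max events with minimum inter-arrival time d_min in the window."""
    return math.ceil(delta / d_min)

print(eta_plus_periodic_jitter(25, 10, 0))  # periodic: ceil(25/10) = 3
print(eta_plus_periodic_jitter(25, 10, 6))  # jitter adds one: ceil(31/10) = 4
print(eta_plus_sporadic(25, 8))             # ceil(25/8) = 4
```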
Not only the HW but also the SW architecture impacts the system delay
The application processes are only a small part of the chain of activations necessary to trigger the actuator (from a sensor to an actuator)
3.2 Software Architecture
Software event path: example
Thus, the SW components must be included in the path delay calculation, and SW libraries must be characterized.
So far, we have focused on the timing of a single process
execution (processing element) or communication activity.
In fact, processes and communication activities share the
MPSoC resources during operation.
4. Modeling Shared Resources
Resource sharing requires resource arbitration, i.e.,
scheduling, and context switching
Scheduling strategies can be divided into
Static order versus dynamic order scheduling
Preemptive scheduling (interrupt) or non-preemptive
(run-to-completion) scheduling
Context switching implies overhead
Context switching time is mostly constant and can,
therefore, usually be determined at design time
Scheduling effects are highly execution time dependent
4.1 Resource Sharing Principle and Impact
Arbitration (scheduling) strategies: for processes and communication
Static execution order scheduling
Time-driven scheduling
Fixed time slot assignment
fixed assignment of time slices to processes or communication links
Dynamic time slot assignment
Priority-driven scheduling
Static priority assignment
Dynamic priority assignment
The efficiency of the above strategies depends heavily on the activation model
4.1 Resource Sharing Principle and Impact
Uses a fixed scheduling sequence.
The fixed scheduling sequence can repeat.
There is full control over the execution order. Thus, it provides a number of important advantages, such as minimized idle time and optimized context switching.
4.2 Static Execution Order Scheduling
Static execution order scheduling: (a) Example architecture, (b) Schedule
Time-driven scheduling assigns time slices to processes or communication links without considering activation, execution times, or data dependencies (inflexible scheduling)
Fixed time slot assignment: the time division multiple access (TDMA) strategy keeps a fixed assignment of time slices to processes or communication links
This assignment is repeated periodically
The greatest advantages are predictability and simplicity
The main limitations are efficiency and long total response times
There is some flexibility, since the time slots can be adapted at system startup time
4.3 Time-Driven Scheduling
4.3 Time-Driven Scheduling
TDMA example
Processes P1, P2, P3, and P4 are assigned slots of 12, 10, 5, and 13 ms, respectively
This results in a total period t_pTDMA of 40 ms
Total execution times and resulting response times:
P1: 45 ms → t_r = 129 ms
P2: 23 ms → t_r = 95 ms
P3: 54 ms → t_r = 426 ms
P4: 30 ms → t_r = 111 ms
The P1 slot remains idle until P1 is activated again
(Figure: Scheduling and idle times in TDMA)
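The response times above can be reproduced with a small TDMA simulation. Assumed semantics (consistent with the example, but my own reading): all four processes are ready at t = 0, slots run in the fixed order P1..P4 every 40 ms, and a slot's time is consumed even when it is partly or fully idle:

```python
def tdma_response(slots, demand):
    """slots: list of (name, slot_len) in fixed cyclic order.
    demand: name -> total execution time; all processes ready at t = 0.
    Returns name -> completion time."""
    remaining = dict(demand)
    finish, t = {}, 0
    while remaining:
        for name, slot_len in slots:
            if name in remaining:
                if remaining[name] <= slot_len:
                    finish[name] = t + remaining[name]
                    del remaining[name]
                else:
                    remaining[name] -= slot_len
            t += slot_len  # TDMA: the slot is consumed even if (partly) idle
    return finish

slots = [("P1", 12), ("P2", 10), ("P3", 5), ("P4", 13)]  # t_pTDMA = 40 ms
demand = {"P1": 45, "P2": 23, "P3": 54, "P4": 30}
res = tdma_response(slots, demand)
print(res["P1"], res["P2"], res["P3"], res["P4"])  # 129 95 426 111
```

Note how P3, with only 5 ms out of every 40 ms, needs 11 cycles for its 54 ms of work, which is where the 426 ms response time comes from.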
Dynamic time slot assignment: round-robin scheduling
Round-robin scheduling begins with the fixed time slot assignment and terminates a slot early if the corresponding process ends, avoiding the idle time of TDMA
Slots are omitted or shortened, and the cycle time t_RR of the round-robin schedule is time-variant
4.3 Time-Driven Scheduling
P1 now ends at t_r = 113 ms, but, more impressively, P3 has a response time of t_r = 179 ms
(Figure: Round-robin scheduling; execution times P1: 45 ms, P2: 23 ms, P3: 54 ms, P4: 30 ms)
Round-robin scheduling
As round robin avoids the idle times of TDMA, it reaches maximum resource utilization
On the other hand, process execution is no longer independent. So, it loses the integration property of TDMA, which is TDMA's most important advantage.
However, round robin guarantees a minimum resource assignment per process, since under full load conditions it falls back to a TDMA schedule
Round robin is applicable to communication and processing. It is found in many applications, such as the Sonics MicroNetwork for on-chip interconnect or standard operating systems.
4.3 Time-Driven Scheduling
Process priority or communication priority is also used to
schedule the use of shared resources in an MPSoC.
Static priority assignment / Dynamic priority assignment
Static priority assignment
It allows one to offload the scheduling problem to a simple
interrupt unit
Vectorized interrupt units reducing interrupt latency and
context switching overhead are found even in small 8-bit
microcontrollers, such as the 8051
Static priority algorithms cannot reach maximum resource utilization, not even for the simple Model 1
To reach higher resource utilization (or meet shorter deadlines), priorities must be assigned dynamically at run time
4.4 Priority-Driven Scheduling
Static priority assignment
Model 1
Processes are activated by the arrival of an input event
Input events are periodic with jitter
The process deadline is at the end of the period
Therefore, process execution must be periodic with jitter as well
The input events and, hence, the process execution rates may have different periods
It was proved that the optimal solution for single processors is to order the process priorities according to increasing process execution rates, i.e., the process with the shortest period is assigned the highest priority. Rate-monotonic scheduling (RMS) is very popular in embedded system design due to its simplicity and ease of analysis.
4.4 Priority-Driven Scheduling
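Rate-monotonic priority assignment is easy to sketch. The utilization check added below is the classic Liu/Layland sufficient bound from the real-time literature, not a result stated in this chapter; task names and numbers are illustrative:

```python
# RMS: shortest period -> highest priority. The Liu/Layland test says n tasks
# are schedulable under RMS on one processor if U <= n * (2**(1/n) - 1);
# this is sufficient, not necessary.

def rms_priorities(periods):
    """periods: name -> period; returns name -> priority (0 is highest)."""
    order = sorted(periods, key=periods.get)
    return {name: prio for prio, name in enumerate(order)}

def liu_layland_ok(tasks):
    """tasks: name -> (wcet, period). Sufficient utilization test."""
    n = len(tasks)
    u = sum(c / p for c, p in tasks.values())
    return u <= n * (2 ** (1 / n) - 1)

tasks = {"P1": (1, 4), "P2": (1, 5), "P3": (2, 10)}
print(rms_priorities({k: v[1] for k, v in tasks.items()}))
# P1 has the shortest period, so the highest priority
print(liu_layland_ok(tasks))  # U = 0.65 <= 3*(2^(1/3)-1) ~ 0.78 -> True
```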
Model 2
Similar to Model 1. However, if a process P2 depends on P1, P2 can only be executed once process P1 has finished in this period
This relation can be represented in a task graph
Obviously, P1 and P2 must have the same period
Model 3
Similar to Model 1, but with arbitrary deadlines
This seemingly small change has a major impact on optimization, analysis, and system load
This model is frequently found in more complex systems, in which a deadline covers multiple components and subsystems
4.4 Priority-Driven Scheduling
Dynamic priority assignment
Earliest deadline first (EDF)
The best dynamic priority assignment strategy is the one that gives the process with the earliest deadline the highest priority
The advantage is its flexible response to input event timing and process execution times
However, it depends on the availability of hard deadlines for each process
Least slack time first
Which process has the least slack time?
Dynamic priority assignment requires a scheduler process running the assignment strategy and thereby observing the (local) system state. This increases power consumption.
4.4 Priority-Driven Scheduling
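Both dynamic policies reduce to a selection rule over the set of ready processes. The data representation below is an illustrative assumption:

```python
# EDF picks the ready process with the earliest absolute deadline.
# Least-slack-time-first picks the one with the smallest slack,
# slack = deadline - now - remaining execution time.

def edf_pick(ready):
    """ready: name -> (abs_deadline, remaining). Earliest deadline first."""
    return min(ready, key=lambda n: ready[n][0])

def lst_pick(ready, now):
    """Least slack time first."""
    return min(ready, key=lambda n: ready[n][0] - now - ready[n][1])

ready = {"P1": (20, 4), "P2": (15, 9), "P3": (18, 2)}
print(edf_pick(ready))     # P2: deadline 15 is the earliest
print(lst_pick(ready, 5))  # P2: slack 15-5-9 = 1, vs 11 for P1 and P3
```

In a real implementation this selection runs inside the scheduler on every scheduling decision, which is exactly the run-time overhead the slide points out.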
The efficiency of resource sharing strategies largely depends on the activation model
Time-driven scheduling is very robust but is less efficient than the other strategies, or is only suitable for best-effort applications
Static order scheduling reaches the highest efficiency but also imposes the tightest constraints on the input event stream, and needs narrow execution time intervals to reach this efficiency
Static priority scheduling provides good adaptation to a wide range of input event model parameters but can create burst event sequences in the general case of arbitrary deadlines
Static priority scheduling is well supported by algorithms for analysis and optimization
Dynamic priority scheduling provides the highest flexibility but incurs significant scheduling overhead (power consumption overhead)
4.5 Resource Sharing: Summary
Difficulties of resource sharing
No single resource sharing strategy is close enough to optimal to be used for all functions of an MPSoC
In any sufficiently complex MPSoC, there will always be subsystems that use different scheduling strategies
5. Global Performance Analysis
(Figure: dynamic scheduling example)
Three classes of approaches to the analysis of a complete system from subsystem analyses:
1. Coherent analysis for several subsystems
2. Event model generalization for a set of scheduling strategies
3. Event model adaptation
1. Coherent analysis for several subsystems
This extends the scope of analysis to cover all scheduling strategies of a system coherently: a holistic approach.
However, there is a huge variety of possible combinations of scheduling strategies, so such a holistic approach will be very complex in general.
For some combinations, however, coherent analysis has proved feasible and efficient, e.g., TDMA combined with static priority scheduling (used in automotive systems)
5. Global Performance Analysis
(Figure: subsystems coupled via input and output event models)
2. Event model generalization for a set of scheduling strategies
3. Event model adaptation
These approaches regard a system as a set of independent subsystems communicating via event streams
Communication, then, corresponds to event propagation
5. Global Performance Analysis
2. Event model generalization for a set of scheduling strategies
3. Event model adaptation
Given an input event model and a scheduling strategy, we can derive an appropriate output event model.
E.g., static process order scheduling with periodic input generates a periodic output event stream with jitter.
5. Global Performance Analysis
2. Event model generalization for a set of scheduling strategies
This approach tries to cover all possible event models between subsystems with a single, general event model
Then, all global analysis and optimization techniques are based on this single interface event model
An early approach uses a set of vectors to declare the number of possible events in a given time interval.
In another approach, a more intuitive model consisting of minimum and maximum event arrival curves is defined. The definition using arrival and service curves is intentionally flexible, to cover as many event models as possible
The event generalization approach trades generality against analysis complexity
5. Global Performance Analysis
3. Event model adaptation
Two approaches: EMIF and EAF
Event model interface (EMIF)
This approach transforms an output event model into an input event model. The transformation is called an EMIF.
EMIFs are just mathematical functions enabling event propagation for the analysis algorithms; they do not appear in the target architecture and, therefore, do not cause any overhead.
Lossless transformation, whereby the transformed input model can use all information of the output model:
A periodic output event model with period t_p can be transformed losslessly into an input event model that is periodic with jitter, with period t_p and jitter j = 0.
Lossy transformation, whereby the input event model contains less information than is available in the output event model:
An output event model with period t_p and jitter j can be transformed into a sporadic event model with minimum distance t_p - j.
5. Global Performance Analysis
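The two EMIF examples above can be sketched directly; representing event models as small dicts is my own illustration, not the book's notation:

```python
# EMIFs are pure functions between event models; they exist only in the
# analysis, never in the implemented system.

def periodic_to_periodic_with_jitter(model):
    """Lossless EMIF: period t_p -> (period t_p, jitter 0)."""
    assert model["type"] == "periodic"
    return {"type": "periodic_with_jitter",
            "period": model["period"], "jitter": 0}

def periodic_with_jitter_to_sporadic(model):
    """Lossy EMIF: (period t_p, jitter j) -> sporadic, min distance t_p - j."""
    assert model["type"] == "periodic_with_jitter"
    return {"type": "sporadic", "d_min": model["period"] - model["jitter"]}

out = periodic_to_periodic_with_jitter({"type": "periodic", "period": 10})
print(out)  # periodic with jitter, period 10, jitter 0
print(periodic_with_jitter_to_sporadic(
    {"type": "periodic_with_jitter", "period": 10, "jitter": 3}))
# sporadic with minimum distance 10 - 3 = 7
```

The lossy direction is one-way: once only the minimum distance 7 is kept, the original period and jitter cannot be recovered.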
3. Event model adaptation
Event adaptation function (EAF)
Represents a function that must be implemented as a buffer in hardware or software.
Example
A periodic event stream with bursts can be transformed into a periodic event stream, but this requires an interface function with internal state, effectively buffering and resynchronizing the event flow.
The buffer is not introduced as transformation overhead; it is needed anyway to enable correct system integration.
5. Global Performance Analysis
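A sketch of such a resynchronizing buffer, under my assumption that the EAF releases at most one buffered event per output period:

```python
# EAF: unlike an EMIF, this is stateful and must exist in the implementation
# (a queue in hardware or software). Bursts are absorbed by the queue and
# replayed one event per output period.

def resynchronize(arrivals, period, horizon):
    """arrivals: sorted event arrival times; returns the periodic release
    times up to `horizon` (one release per period while the buffer is
    non-empty)."""
    releases, queued, i, t = [], 0, 0, 0
    while t <= horizon:
        while i < len(arrivals) and arrivals[i] <= t:
            queued += 1           # event enters the EAF buffer
            i += 1
        if queued > 0:
            releases.append(t)    # one event per period leaves the buffer
            queued -= 1
        t += period
    return releases

# A burst of three events at t = 0 plus one event at t = 30, period 10.
print(resynchronize([0, 0, 0, 30], period=10, horizon=50))
# [0, 10, 20, 30] -- the burst is spread over three periods
```

The queue occupancy (here at most 2 after the first release) is exactly the buffer size the integration must provision.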
3. Event model adaptation
Event adaptation function (EAF)
Model transformation is a requirement of the target architecture. Otherwise, such subsystem combinations are not feasible, leading to potential buffer underflows.
(Figure: possible event model transformations for the four commonly used event models)
5. Global Performance Analysis
6. Conclusions
MPSoCs show complex run-time behavior due to the implementation of multiple hardware, software, and communication scheduling strategies in one system
Resource sharing creates transient dependencies that are not reflected in the system function (nonfunctional dependencies)
Finding and quantifying their impact on performance and correctness by simulation is increasingly hard
Systematic formal MPSoC performance analysis is possible with a hierarchical approach exploiting knowledge from real-time systems analysis and formal execution time analysis, combined with novel approaches to heterogeneous systems analysis
6. Conclusions
Supplementary reading
System-on-Chip: Next Generation Electronics, edited by Bashir M. Al-Hashimi, The Institution of Engineering and Technology
Chapter 2, "System-level performance analysis - the SymTA/S approach", Rafik Henia, Arne Hamann, Marek Jersak, Razvan Racu, Kai Richter and Rolf Ernst
