Embedded Computing Exam

52
This type of architecture has a complexity of 16, i.e. very large, so it cannot be tested
exhaustively. Such systems can only be optimized by means of genetic algorithms, which
randomly test possible combinations.
Ciprian Radu – pg 129
Pareto efficiency:
A change that makes one individual better off without making any other individual worse
off is called a Pareto improvement (or a Pareto-optimal move).
An allocation is defined as "Pareto efficient" or "Pareto optimal" when no further Pareto
improvements can be made.
Domination relation: no order can be established between points a and b (see figure), but
both a and b dominate c.
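For a minimization problem, the domination relation can be written as a short check: a dominates b when a is no worse on every objective and strictly better on at least one. In this minimal Python sketch, the coordinates of a, b and c are hypothetical values chosen to reproduce the situation described for the figure, not values read from it.

```python
from typing import Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b in a minimization problem."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of objectives")
    no_worse = all(x <= y for x, y in zip(a, b))          # a is never worse than b
    strictly_better = any(x < y for x, y in zip(a, b))    # and better on at least one objective
    return no_worse and strictly_better

# Hypothetical (O1, O2) points: a and b are incomparable, both dominate c.
a, b, c = (1.0, 4.0), (3.0, 2.0), (4.0, 5.0)
print(dominates(a, b), dominates(b, a))  # False False -> no order between a and b
print(dominates(a, c), dominates(b, c))  # True True
```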
1. generate the first population randomly
2. evaluate all the individuals
3. check whether any individuals violate the constraints (rules)
4. while the maximum number of evaluations has not been reached do:
a. …
5. STOP
b. Define the set of Pareto individuals for a bi-objective minimization problem (O1, O2).
Optimization is an essential process in engineering applications where multiple
and often conflicting objectives need to be satisfied. Solving such problems has
traditionally consisted of converting all objectives into a single objective (SO)
function. The ultimate goal is to find the solution that minimizes or maximizes
this single objective while maintaining the physical constraints of the system or
process.
The Pareto principle (also known as the 80–20 rule, the law of the vital few, and
the principle of factor sparsity) states that, for many events, roughly 80% of the
effects come from 20% of the causes.
To compare candidate solutions to MO (multi-objective) problems, the concepts of Pareto
dominance and Pareto optimality are commonly used. These concepts were
originally introduced by Francis Ysidro Edgeworth and then generalized by Vilfredo Pareto
[3]. A solution belongs to the Pareto set if no other solution can
improve at least one of the objectives without degrading any other objective.
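For the bi-objective minimization case (O1, O2), the Pareto set of a population can be obtained by keeping only the individuals that no other individual dominates. This is a minimal sketch: the population values are hypothetical, and the quadratic pairwise check is chosen for clarity rather than efficiency.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (O1, O2), both to be minimized

def dominates(a: Point, b: Point) -> bool:
    # a is no worse on both objectives and strictly better on at least one
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

def pareto_set(points: List[Point]) -> List[Point]:
    """Return the non-dominated points (the Pareto set) using an O(n^2) pairwise check."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical population of candidate solutions
population = [(1, 9), (2, 7), (3, 8), (4, 4), (6, 3), (7, 5), (9, 1)]
print(pareto_set(population))  # [(1, 9), (2, 7), (4, 4), (6, 3), (9, 1)]
```

Here (3, 8) is dominated by (2, 7) and (7, 5) by (6, 3); the remaining points trade O1 against O2 and form the Pareto front.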
Fig 2 depicts a Pareto set for a two-objective minimization problem. Potential
solutions that optimize f1 and f2 are shown on the graph.
c. PDF: 1c.pdf
Question 2.
a) Explain the superior performance (IPC) of the Sv reuse scheme compared with the Sn
scheme.
These two schemes are used to implement dynamic reuse. These schemes mainly differ in
the way in which reusable results are identified.
The first scheme (Sv) tracks operand values for each instruction; the second scheme
(Sn) tracks only operand names (register identifiers).
(The third scheme (Sn+d) and the fourth scheme (Sv+d) extend the first two schemes by using
dependence relationships among the instructions to track reuse.)
There are a few issues with using these schemes. First, we have to look at the type of
information stored in the RB (Reuse Buffer). Second, how we know that the values stored in
the RB can be reused. Third, how the information in the RB is updated or invalidated.
Reuse Buffer entry formats:

A. Format 1:
TAG | OP1_NAME | OP2_NAME | ADDRESS | RESULT | MEM_VALID
B. Format 2:
TAG | OP1_NAME | OP2_NAME | ADDRESS | RESULT | RES_VALID | MEM_VALID
C. Format 3:
TAG | OP1_NAME | OP2_NAME | ADDRESS | RESULT | RES_VALID | MEM_VALID
(in Format 3, each operand field holds: src-index | reg name)
TAG – might be represented by the instruction’s PC;
OP1, OP2 – represent the value or name of source registers used by the instruction;
RESULT – represents the actual result of the instruction, which will be reused in the case
of a “hit” in the RB;
RES_VALID – indicates, in the case of arithmetic/logic instructions, whether the result
“RESULT” is valid. In the case of Load and Store instructions, when set, it shows
that the address of the instruction stored in the RB is valid and can therefore be reused. It is set
when the instruction is introduced into the RB. It is reset automatically by any instruction
that writes one of the source registers (OP1, OP2);
ADDRESS – is the (reusable) memory address in the case of a Load/Store instruction;
MEM_VALID – indicates whether the value in the “RESULT” field is reusable in the case
of a Load instruction. The bit is set when the Load instruction is written into the RB. The
bit is reset by any Store instruction that has the same access address.
In summary, an arithmetic/logic instruction can be reused if RES_VALID=1.
RES_VALID=1 also guarantees the correct address for any Load/Store instruction
and exempts the processor from computing it (indexed addressing: [Register +
Offset]). On the other hand, the result of a Load instruction can be reused only if
MEM_VALID=1 AND RES_VALID=1.
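These flag rules can be condensed into a small decision function that reports what a hit on an RB entry lets the processor skip. This is an illustrative Python sketch, not a hardware description; the `kind` strings and the field names are assumptions that mirror RES_VALID and MEM_VALID.

```python
from dataclasses import dataclass
from typing import Set

@dataclass
class RBFlags:
    kind: str        # "alu", "load" or "store" (hypothetical encoding)
    res_valid: bool  # RES_VALID
    mem_valid: bool  # MEM_VALID (meaningful for loads only)

def reusable_parts(e: RBFlags) -> Set[str]:
    """What a hit on this RB entry lets the processor skip."""
    parts: Set[str] = set()
    if not e.res_valid:
        return parts                     # nothing can be reused
    if e.kind == "alu":
        parts.add("result")              # skip the execution itself
    elif e.kind in ("load", "store"):
        parts.add("address")             # skip the [Register + Offset] computation
        if e.kind == "load" and e.mem_valid:
            parts.add("loaded_value")    # skip the memory access as well
    else:
        raise ValueError(e.kind)
    return parts

print(reusable_parts(RBFlags("load", res_valid=True, mem_valid=False)))  # {'address'}
```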
Scheme Sv: Reuse based upon operand values
Scheme Sv is a straightforward implementation of the reuse concept. Since the reuse test
is based on operand values, the operand values of an instruction are stored along with its
result.
When an instruction is decoded, its current operand values are compared
with those stored in the RB. If they are the same, then the result stored in the RB
is reused.
RB entry: The tag field stores part of the PC. The result, operand value1 and
operand value2 fields store the result and the operand values of the instruction. These fields
are used to identify the instruction (or address calculation in case of a load/store)
that can be reused.
The memvalid bit and the address field are used to determine if the actual
memory access for a load instruction can be reused; the memvalid bit indicates
whether the value loaded from memory (present in the result field) is valid, and
the address field stores the memory address (i.e., the outcome of the address
calculation).
Reuse test: For testing reuse, the operands of an instruction are compared
with the values in the operand value fields of the RB entry.
A match indicates that the result is valid (for non-load/store instructions) or
that the address is valid (for loads and stores). For loads, in addition to testing the validity
of the address, we also need to test the memvalid bit to see if the outcome of
the load (in the result field) can be reused. If the operand values are not known at the
time of the reuse test, then the instruction is not reused.
Invalidation: For non-load operations, the reuse test works because the
operands uniquely determine the result and therefore invalidations are not needed to
maintain the integrity of the test.
For loads, a store to the same address invalidates the value in the result field.
Accordingly, on a store the address field of each RB entry is searched for a matching
address, and the memvalid bit reset for matching entries.
Note that the address field, memvalid field, and the associative search for
invalidations are required only to maintain the integrity of load values.
The RB can be split into two buffers: one for storing load values and
another, the main RB, for storing everything except the load values (including
entries for load addresses).
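A toy software model of scheme Sv may help fix the mechanics: the reuse test compares operand values, and a store performs the associative search on the address field. The buffer is keyed by PC and has no capacity limit; both are simplifying assumptions, since a real RB is a finite hardware table.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class SvEntry:
    op_values: Tuple[int, int]       # operand value1, operand value2
    result: int                      # ALU result, or the loaded value for a load
    address: Optional[int] = None    # memory address (loads/stores only)
    mem_valid: bool = False          # loaded value still matches memory

class ReuseBufferSv:
    """Toy scheme-Sv reuse buffer keyed by PC, with no capacity limit (assumption)."""

    def __init__(self) -> None:
        self.entries: Dict[int, SvEntry] = {}

    def insert(self, pc: int, entry: SvEntry) -> None:
        self.entries[pc] = entry

    def reuse_alu(self, pc: int, ops: Tuple[int, int]) -> Optional[int]:
        # Reuse test: the current operand values must equal the stored ones.
        e = self.entries.get(pc)
        return e.result if e is not None and e.op_values == ops else None

    def reuse_load(self, pc: int, ops: Tuple[int, int]) -> Tuple[Optional[int], Optional[int]]:
        """Return (address, loaded value); a None component must be recomputed."""
        e = self.entries.get(pc)
        if e is None or e.op_values != ops:
            return None, None                     # nothing reusable
        return e.address, (e.result if e.mem_valid else None)

    def on_store(self, address: int) -> None:
        # Associative search on the address field: a store invalidates loaded values.
        for e in self.entries.values():
            if e.address == address:
                e.mem_valid = False

rb = ReuseBufferSv()
rb.insert(0x40, SvEntry(op_values=(3, 4), result=7))                                 # add
rb.insert(0x44, SvEntry(op_values=(100, 8), result=55, address=108, mem_valid=True))  # load
print(rb.reuse_alu(0x40, (3, 4)))     # 7
print(rb.reuse_alu(0x40, (3, 5)))     # None: an operand value changed
rb.on_store(108)
print(rb.reuse_load(0x44, (100, 8)))  # (108, None): address reused, memory must be read
```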
Scheme Sn: Reuse based upon register names
In scheme Sn, we attempt to trivialize the reuse test (and also to reduce the size of
each RB entry). Rather than store operand values, we store operand (architectural)
register identifiers in the RB.
When an instruction writes into a register, all instructions with a matching
(source) register identifier in the RB are invalidated. Only the valid instructions are
reused from the RB.
The advantage of this reuse test is that it can be done much earlier in the pipeline
than the reuse test in scheme Sv since it does not require the operand values.
Since the reuse test is based on operand names (and not value), we call this
scheme Sn, where ‘n’ stands for name.
RB entry: Differences from scheme Sv are:
(i) the operand1 and operand2 fields contain register names of the operands
instead of actual operand values,
(ii) there is a resultvalid bit, which indicates whether the result is valid. (This bit
was not required in scheme Sv because the reuse test detected the stale results.) This bit is
set when an entry is first inserted into the RB.
Reuse test: The reuse test is as simple as testing the state of resultvalid and
memvalid bits.
Address calculation for load/store instructions and results for all other instructions can be
reused if the resultvalid bit is set; the result of a load instruction can be reused if both
resultvalid and memvalid are set. (Since different instances of the same static instruction
will have the same operand names, we do not need to compare the operand names
explicitly for reuse.)
As mentioned above, since this reuse test does not require operand values, it can be
potentially done earlier in the pipeline; this may result in the reuse being more beneficial.
Invalidations: As before, stores invalidate the loads from the same address
(memvalid bit is reset). Moreover, when a register is written, the RB is searched for
entries whose operand field matches the name of the register. The entries that match are
marked invalid (resultvalid bit is reset).
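The Sn mechanics can be sketched in the same toy style (PC-keyed, unbounded buffer, an assumption). The usage lines also illustrate the answer to question a): Sn invalidates an entry whenever one of its source registers is written, even if the new value equals the old one, whereas Sv, which compares values, would still reuse the instruction. This is why Sv detects more reusable instructions and can reach a higher IPC.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class SnEntry:
    src_regs: Tuple[str, ...]        # operand register names, e.g. ("r2", "r3")
    result: int                      # ALU result, or the loaded value for a load
    address: Optional[int] = None    # memory address (loads only in this toy)
    result_valid: bool = True        # set when the entry is inserted
    mem_valid: bool = False

class ReuseBufferSn:
    """Toy scheme-Sn reuse buffer keyed by PC, with no capacity limit (assumption)."""

    def __init__(self) -> None:
        self.entries: Dict[int, SnEntry] = {}

    def insert(self, pc: int, entry: SnEntry) -> None:
        self.entries[pc] = entry

    def reuse_result(self, pc: int) -> Optional[int]:
        # Reuse test: no operand values are needed, only the valid bits.
        e = self.entries.get(pc)
        if e is None or not e.result_valid:
            return None
        if e.address is not None and not e.mem_valid:   # load whose value went stale
            return None
        return e.result

    def on_register_write(self, reg: str) -> None:
        # Any write to a source register invalidates the entry, whatever the new value.
        for e in self.entries.values():
            if reg in e.src_regs:
                e.result_valid = False

    def on_store(self, address: int) -> None:
        for e in self.entries.values():
            if e.address == address:
                e.mem_valid = False

rb = ReuseBufferSn()
rb.insert(0x40, SnEntry(src_regs=("r2", "r3"), result=7))   # add r1, r2, r3
print(rb.reuse_result(0x40))   # 7
rb.on_register_write("r3")     # r3 rewritten, possibly with the same value
print(rb.reuse_result(0x40))   # None
```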
Supplementary (not on the exam):
Scheme Sn+d: Reuse using register names and dependence chains
Scheme Sn+d extends scheme Sn by attempting to establish chains of dependent
instructions, and to track the reuse status of such instruction chains. Since the reuse status
of an instruction in the RB is established based on its operand names and/or its
dependence information in this scheme, we call it scheme Sn+d (the letters ‘n’ and ‘d’
stand for name and dependence respectively).
Dependence chains are created as entries are inserted into the RB. To facilitate
this process, we use a mapping table called a Register Source Table (RST). The RST has
an entry for each architectural register; it tracks the RB entry which has (or will have) the
latest result for that register. When an entry is reserved in the RB for an instruction, the
RST entry for its destination register is updated to point to the reserved entry. If,
however, an entry could not be reserved, then the RST entry for the destination register is
set to invalid (since the latest producer of that register will not be in the RB). When an
instruction is reused, the RST entry for its destination register is updated to point to the
reused RB entry. The RST is similar in spirit to the rename map used in register
renaming. In essence, the RST is used to link a consumer instruction to the latest
producer instruction by pointing to the “physical register” (RB entry) of the producer.
Accordingly, another way of looking at scheme Sn+d is to consider it as a “physical
register” version of scheme Sn, which tracks dependences using architectural
registers.
RB entry: An RB entry is similar to the one in scheme Sn, except for the addition
of a src-index field. The dependence links are created by storing the RB index of
the source instructions in this field. An invalid value is inserted in this field if the source
doesn’t exist in the RB.
Reuse test: The reuse status of independent instructions is established as it was in
scheme Sn (resultvalid bit is set; memvalid is set in the case of load instructions). A
dependent instruction is reused if its source instructions (in the RB), as indicated by the
src-index field of its operands, are indeed the latest producers for its operands. This fact is
established with the help of the RST.
State updates: As in schemes Sv and Sn, stores invalidate loads to the same
address (memvalid is reset). As in scheme Sn, independent instructions are invalidated
when their operand registers are overwritten (resultvalid is reset). Dependent
instructions need not be invalidated on operand overwrites because their reuse status can
be established using their dependence information. Instead, they are invalidated when
their source instructions are evicted from the RB, i.e., when the dependence information
is lost. To perform this operation the RB needs to be searched for entries whose src-index
field matches the index (in the RB) of the source instruction being evicted. The entries
which result in a match are invalidated (resultvalid bit is reset).
Here is a reuse example with Sn+d:
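A hypothetical example can be modelled as a toy Python version of scheme Sn+d, including the RST. Assumptions not taken from the source: the RB is an append-only list that never fills up, every execution allocates a new entry, an operand is either a register name (no producer in the RB) or a `(src_index, register)` link, and RST entries that point to an evicted entry are cleared. The scenario is I: r1 = r2 + r3 followed by J: r4 = r1 + 1.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical operand encoding: a register name (no producer in the RB)
# or a (src_index, register) link to the producer's RB entry.
Operand = Union[str, Tuple[int, str]]

@dataclass
class Entry:
    pc: int
    dest: str
    srcs: Tuple[Operand, ...]
    result: int
    result_valid: bool = True

class ReuseBufferSnd:
    """Toy scheme Sn+d: append-only RB plus a Register Source Table (RST)."""

    def __init__(self) -> None:
        self.rb: List[Optional[Entry]] = []
        self.rst: Dict[str, Optional[int]] = {}   # register -> RB index of latest producer

    def _on_write(self, reg: str, producer: Optional[int]) -> None:
        # Independent (register-named) operands are invalidated on overwrite ...
        for e in self.rb:
            if e is not None and reg in e.srcs:
                e.result_valid = False
        # ... and the RST records the new latest producer (None: not in the RB).
        self.rst[reg] = producer

    def execute_and_insert(self, pc: int, dest: str, src_regs: Tuple[str, ...], result: int) -> int:
        # Link each source to its latest producer if the RST knows one.
        srcs = tuple((self.rst[r], r) if self.rst.get(r) is not None else r for r in src_regs)
        self.rb.append(Entry(pc, dest, srcs, result))
        idx = len(self.rb) - 1
        self._on_write(dest, idx)
        return idx

    def write_unbuffered(self, reg: str) -> None:
        """A register written by an instruction that got no RB entry."""
        self._on_write(reg, None)

    def can_reuse(self, idx: int) -> bool:
        e = self.rb[idx]
        if e is None or not e.result_valid:
            return False
        # Dependent operands: the linked entry must still be the latest producer.
        return all(isinstance(s, str) or self.rst.get(s[1]) == s[0] for s in e.srcs)

    def reuse(self, idx: int) -> Optional[int]:
        if not self.can_reuse(idx):
            return None
        e = self.rb[idx]
        self._on_write(e.dest, idx)   # a reused instruction still writes its destination
        return e.result

    def evict(self, idx: int) -> None:
        self.rb[idx] = None
        for e in self.rb:
            if e is not None and any(isinstance(s, tuple) and s[0] == idx for s in e.srcs):
                e.result_valid = False       # dependence information is lost
        for reg in [r for r, p in self.rst.items() if p == idx]:
            self.rst[reg] = None

rb = ReuseBufferSnd()
i = rb.execute_and_insert(0x10, "r1", ("r2", "r3"), 7)   # I: r1 = r2 + r3
j = rb.execute_and_insert(0x14, "r4", ("r1",), 8)        # J: r4 = r1 + 1, linked to I
print(rb.reuse(i), rb.reuse(j))                          # 7 8
rb.evict(i)
print(rb.can_reuse(j))                                   # False
```

J is reused through its link to I rather than through a register-name check, which is what lets a whole dependence chain be reused together.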
Scheme Sv+d: Reuse using register values and dependence chains
Although scheme Sv is the most accurate of the three schemes presented so far in detecting
reusable instructions, it is not very well suited for reusing chains of
dependent instructions in a single cycle. For example, reusing two instructions, I and J,
with J dependent on I, would require that we first reuse I and then, using the reused
result of I, perform the reuse test for J. This whole operation may be difficult to do in a
single cycle, especially for long dependence chains. To facilitate the reuse of dependent
instructions, we augment scheme Sv with the dependence-tracking ability of scheme
Sn+d, giving us scheme Sv+d. As in scheme Sn+d, instructions in this scheme are
stored in the RB with pointers to the RB entries containing their source instructions.
RB entry: An RB entry is similar to the one in scheme Sv, except for the addition
of a src-index field. Just like in scheme Sn+d, the dependence links are created by storing
the RB index of the source instructions in this field. An invalid value is inserted in this
field if the source doesn’t exist in the RB.
Reuse test: The reuse status of independent instructions is established as in
scheme Sv: the operand values are compared with the current values of those registers
and the memvalid bit is used to determine the validity of loads. As in scheme Sn+d, a
dependent instruction is reused by confirming that its source instructions (in the RB), as
indicated by the src-index field of its operands, are indeed the latest producers for its
operands. This fact is established with the help of the RST.
State updates: As in other schemes, stores invalidate the loads to the same
address (memvalid is reset). As in scheme Sn+d, the state of dependent instructions is
updated when their source instructions are evicted from the RB, i.e., when their
dependence information is lost. The state can be updated in two ways: either (i) the
dependent instructions can be marked invalid, or (ii) their src-index fields, pointing to the
evicted source, are annulled (and thereafter, they are treated like independent instructions
— i.e., their validity is determined by value comparison). The first option is simple but
conservative since it invalidates potentially useful instructions. The second option, on the
other hand, retains the dependent instructions, but it requires additional space in RB
entries since the operand values need to be stored for the dependent instructions as well (so
that value comparison can be performed if the dependent instructions become
independent). Nevertheless, both update operations require that the RB be searched for
the entries whose src-index field matches the RB index of the source instruction being
evicted. These matching entries are either invalidated or converted into independent
entries.
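The two state-update options on eviction can be contrasted in a short Python sketch. The entry layout (operand values kept for every entry, one src-index per operand, a single valid bit) and the list-based RB are assumptions chosen to mirror the description; the RST check for dependent entries is deliberately omitted, since only the eviction handling is illustrated.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class SvdEntry:
    op_values: Tuple[int, int]       # stored for every entry, so option (ii) can fall back to Sv
    src_index: List[Optional[int]]   # RB index of each operand's producer, None if independent
    result: int
    valid: bool = True

def evict_source(rb: List[Optional[SvdEntry]], victim: int, convert: bool) -> None:
    """Evict RB entry `victim` and update the entries that depend on it.

    convert=False: option (i), conservatively invalidate the dependent entries.
    convert=True:  option (ii), annul the src-index link so the entry becomes
                   independent and is checked by value comparison from now on.
    """
    rb[victim] = None
    for e in rb:
        if e is None:
            continue
        for k, src in enumerate(e.src_index):
            if src == victim:        # associative search on the src-index field
                if convert:
                    e.src_index[k] = None
                else:
                    e.valid = False

def independent_reuse(e: SvdEntry, current: Tuple[int, int]) -> bool:
    # Independent entries are reused by operand-value comparison, as in scheme Sv.
    return e.valid and all(s is None for s in e.src_index) and e.op_values == current

for convert in (False, True):
    rb: List[Optional[SvdEntry]] = [
        SvdEntry((3, 4), [None, None], 7),   # I: r1 = r2 + r3
        SvdEntry((7, 1), [0, None], 8),      # J: r4 = r1 + r5, first operand linked to I
    ]
    evict_source(rb, 0, convert)
    print(convert, independent_reuse(rb[1], (7, 1)))   # False False, then True True
```

Option (ii) keeps J reusable, at the price of storing its operand values even while it is a dependent entry.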
b) Define the concept of k-value locality. What are the possible advantages of a contextual
value predictor over a dynamic instruction reuse scheme?
Dynamic Instruction Reuse (DIR)

Statistics suggest that many instructions, and groups of instructions, are executed
dynamically again and again with the same inputs. Such instructions do not have to be
executed repeatedly: their results can be obtained from a cache-like data structure where
they were previously saved. Result: throughput gain.
Value Prediction (VP)
It is based on the value locality concept, which describes the likelihood of the recurrence
of an instruction's previously seen value (50% - 80%). As a consequence, VP consists of
predicting an instruction's value based on its previously seen values.
DIR and VP collapse true dependences and reduce average result latency.
Value Prediction & Value Locality
Dynamic Instruction Value Prediction is a speculative technique that is based on
instructions value prediction, followed by speculative execution of the successive
dependent instructions.
This speculative execution is later validated, after the instruction's result is produced. In
the case of a correct prediction, the (critical path's) execution time might be reduced.
Otherwise, the mispredicted instructions must be re-executed (recovery), with
corresponding time penalties.
Pred_Probab = 1/2^32 (the probability of correctly guessing a 32-bit value at random)
Value Locality (value vicinity)
k-value locality is the statistical probability (frequency) that an instruction produces a
value belonging to the set of values produced by its previous k dynamic instances.
- Temporal locality is necessary but not sufficient for value locality to exist.
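As an illustration of the definition, the sketch below measures k-value locality on the value trace of a single static instruction. Counting the first instances, whose history is still incomplete, as misses is an assumption of this sketch; the trace itself is hypothetical.

```python
from collections import deque
from typing import Deque, Iterable

def k_value_locality(values: Iterable[int], k: int) -> float:
    """Fraction of dynamic instances whose value is among the previous k values
    produced by the same static instruction (the whole trace is one instruction)."""
    if k < 1:
        raise ValueError("k must be >= 1")
    history: Deque[int] = deque(maxlen=k)
    hits = total = 0
    for v in values:
        hits += v in history          # instances with an incomplete history still count
        history.append(v)
        total += 1
    return hits / total if total else 0.0

trace = [5, 5, 9, 5, 9, 9, 2, 5]
for k in (1, 2, 4):
    print(k, k_value_locality(trace, k))   # 1 0.25, then 2 0.5, then 4 0.625
```

As in the SPEC measurements quoted later (about 50% with a history of one, about 80% with 16), locality can only grow as k increases.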
A contextual predictor predicts the next value
based on a particular stored pattern (context) that is repetitively generated in
the value sequence, in a markovian stochastic manner. Theoretically they
can predict any repetitive value sequences. A context predictor is of order k
if its context information includes the last k values, and, therefore, the
search is done using this pattern of k values length. As we already pointed
out, a contextual predictor of order k derives from the k-value locality metric
that represents an idealised k-context predictor.
In this case the prediction will be done based on the most frequent value that
follows a pattern context in the string of history values.
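A minimal order-k context predictor, assuming an unbounded table (the idealised case), stores for every k-value context a count of the values that followed it and predicts the most frequent one. On the hypothetical period-3 sequence below, the 1-value locality is zero, yet once each context has been seen the predictor is always right.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Optional, Tuple

class ContextPredictor:
    """Order-k context value predictor with an unbounded table (idealised assumption):
    predicts the value that has most often followed the current k-value context."""

    def __init__(self, k: int) -> None:
        if k < 1:
            raise ValueError("k must be >= 1")
        self.k = k
        self.history: List[int] = []
        self.table: DefaultDict[Tuple[int, ...], Counter] = defaultdict(Counter)

    def predict(self) -> Optional[int]:
        if len(self.history) < self.k:
            return None
        counts = self.table.get(tuple(self.history))   # .get() does not create entries
        return counts.most_common(1)[0][0] if counts else None

    def update(self, value: int) -> None:
        if len(self.history) == self.k:
            self.table[tuple(self.history)][value] += 1   # count: context -> next value
        self.history = (self.history + [value])[-self.k:]  # keep only the last k values

seq = [1, 2, 3] * 5      # 1-value locality of this sequence is zero
p = ContextPredictor(k=2)
correct = 0
for v in seq:
    correct += p.predict() == v
    p.update(v)
print(correct, len(seq))  # 10 15
```

The five misses all occur while the three contexts (1, 2), (2, 3) and (3, 1) are being learned.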
Dynamic instruction reuse is a relatively new, non-speculative microarchitectural
technique that exploits the repetition of dynamic instructions, thus reducing the amount
of code that has to be executed, with a remarkable benefit on processing speed.
THE REST: FILLER…
- Measurements using SPEC benchmarks show that value locality on Load instructions
is about 50% using a history of one (producing the same value as the previous instance)
and about 80% using a history of 16 previous instances.
- The concept is strongly related to redundant computing concepts (such as the
memoization technique), including the Dynamic Instruction Reuse technique introduced
above.
Value Locality -> Value Predictability
However, value locality and value predictability are not the same concept. You can
have 100% locality and be very unpredictable (as a simple example, a random sequence
of 0s and 1s has 100% value locality with a history of two values but can be very
unpredictable). More generally, this happens when the value sequence is not a Markov process.
Not every predictable value sequence derives from the value locality concept:
E.g.: i++: 1, 2, 3, 4, 5, 6, 7, ?
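The i++ example can be checked with a stride predictor, a simple computational predictor that predicts the last value plus the last difference. The sketch shows that it predicts 8 although no value in the sequence ever repeats, i.e. the sequence has zero value locality for any history depth.

```python
from typing import List, Optional

def stride_predict(history: List[int]) -> Optional[int]:
    """Computational (stride) prediction: last value + (last - previous)."""
    if len(history) < 2:
        return None
    return history[-1] + (history[-1] - history[-2])

seq = [1, 2, 3, 4, 5, 6, 7]
print(stride_predict(seq))                           # 8
print(any(v in seq[:i] for i, v in enumerate(seq)))  # False: no value ever repeats
```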
Why Value Locality?
Data redundancy – the input data sets of general-purpose programs are redundant
(sparse matrices, text files with many blanks and repetitive characters, empty cells
in spreadsheets, etc.).
Compiler error tables – exploited repeatedly when repetitive errors are generated.
Program constants – it can be more efficient to load program constants from
memory than to construct them as immediate operands.
Switch-case constructs – they require repeatedly loading a constant (the branch's
base address).
Virtual function calls – they load a function pointer that is constant at
run time. Polymorphism in object-oriented programming is implemented similarly.
Computed branches – for calculating a branch address it is necessary to load a register
with the base address for the branch jump table, which might be a run-time constant.
Register spill code – when all the CPU registers are busy, variables that may remain
constant are spilled to data-memory and loaded repeatedly.
Polling algorithms – the most likely outcome is that the event being polled for has
not yet occurred, which involves redundant computation that repeatedly checks for the event,
etc.
Requirements:
- Prediction and speculation need dedicated mechanisms for:
- Detecting mispredicted values and checking the prediction's accuracy.
- Recovering the processor's context after a misprediction (ROB).
- Issuing dependent instructions speculatively (using the standard out-of-order logic
with some minor modifications).
- Storing and bypassing predicted values to the dependent instructions processed next.
This speculative mechanism is the main advantage of VP.
Two-Level Adaptive Branch Prediction uses
two levels of branch history information to make a branch prediction. The
first level consists of a History Register (HR) that records the outcome of
the last k branches encountered. The HR may be a single global register,
HRg, that records the outcome of last k branches executed in the dynamic
instruction stream or one of multiple local history registers, HRl, that record
the last k outcomes of each branch. The second level of the predictor,
known as the Pattern History Table (PHT) records the behaviour of a branch
during previous occurrences of the first level predictor. It consists of an
array of two-bit saturating counters, one for each possible entry in the HR.
2^k entries are therefore required if a global PHT is provided, or many times
this number if a separate HR and therefore PHT is provided for each branch.
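A minimal Python sketch of the global variant (one global HR indexing one global PHT) makes the 2^k table size explicit. Initialising the counters to "weakly not taken" is an assumption. On the hypothetical repeating pattern taken, taken, not-taken, each 2-bit history is followed by a single outcome, so after warm-up the predictor makes no further mistakes.

```python
class TwoLevelGlobal:
    """Two-level adaptive predictor: a k-bit global history register (HR)
    indexes a PHT of 2^k two-bit saturating counters."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.hr = 0                         # last k outcomes, 1 = taken, newest in bit 0
        self.pht = [1] * (1 << k)           # 2^k counters, weakly not taken (assumption)

    def predict(self) -> bool:
        return self.pht[self.hr] >= 2       # counter values 2 and 3 predict taken

    def update(self, taken: bool) -> None:
        c = self.pht[self.hr]
        self.pht[self.hr] = min(c + 1, 3) if taken else max(c - 1, 0)
        self.hr = ((self.hr << 1) | int(taken)) & ((1 << self.k) - 1)

p = TwoLevelGlobal(k=2)
pattern = [True, True, False] * 10
misses = 0
for taken in pattern:
    misses += p.predict() != taken
    p.update(taken)
print(misses)  # 3, all during warm-up
```

Doubling the exploited history (k to 2k) squares the PHT size, which is the exponential cost discussed below.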
Two types of global neural predictors were developed and presented, an LVQ predictor and
an MLP predictor. While the LVQ predictor achieved results comparable to an equivalent
conventional predictor, the statically trained MLP predictor, and also the
(only) dynamically trained MLP, outperformed its conventional counterpart.
The dynamic MLP predictor was a bit less successful but still managed to
outperform a conventional predictor. These results suggested that not only
can neural networks generate respectable prediction results, but also that in some
circumstances a neural predictor may be able to exploit correlation
information more effectively than a conventional predictor, and at a linear
cost. The results obtained are extremely optimistic, taking into account that
we compared global neural predictors (one predictor for all branches!) with
classical branch predictors (one predictor for each branch!), a comparison that is
totally unfair from the neural point of view.
An important advantage is that, in contrast with
conventional branch prediction, neural branch prediction can exploit deeper
correlations at linear complexity rather than at the exponential complexity
that the classical Two-Level Adaptive schemes involve.
Question 4:
a. Why is it difficult to predict a dynamic branch that has an entropic, unbiased
behaviour in a given dynamic context? What strategies could be applied to make such a
branch (more) predictable?
b. Advantages of neural branch predictors over classical Two-Level Adaptive
predictors.
The branch prediction problem consists of two sub-problems:
firstly, generating the correct prediction and, secondly, in the case of a
taken branch, predicting the correct branch target.
Two-level adaptive branch predictors:
- can achieve a 3% misprediction rate, which still has a severely
limiting impact on MII processor performance (MII = multiple-instruction
issue);
- have been implemented in several commercial microprocessors;
- although high prediction rates are achieved with two-level adaptive
predictors, this success is obtained by providing very large arrays of
prediction counters or pattern history tables (PHTs). Since the size
of the PHT increases exponentially as a function of history register
length, the cost of the PHT can become excessive, and it is difficult
to exploit a large amount of branch history effectively;
- disadvantages: in most practical implementations each prediction
counter is shared between several branches; large arrays of
prediction counters require extensive initial training before they can
predict accurately. Furthermore, the amount of training required
increases as additional branch history is exploited, further limiting
the amount of branch history that can be exploited.
- The main disadvantages can be found in the following areas:
• High cost of PHT
• Branch interference
• Slow initialisation
It should be possible to predict certain branches that are currently
hard to predict more accurately by identifying new correlation
mechanisms and adding them to the prediction process. We suggest
that neural predictors may prove to be a useful vehicle for
investigating potential new correlation mechanisms.
NNs can be used to dynamically predict branch outcomes by
forecasting future values of data series. Neural predictors can achieve
success rates that are comparable to conventional two-level adaptive
predictors. Jimenez uses the history register as input values to his
perceptron predictor, whereas we use the history register and the
branch address as input values to our LVQ and backpropagation
neural predictors.
Both researchers (Jimenez and Vintan) conclude that greater
correlations are achieved by neural predictors than two-level
predictors and greater prediction accuracy can be achieved. Jimenez
showed that his predictor achieved a misprediction rate of 1.71%,
which equates to 36% fewer mispredictions than a McFarling style
hybrid two-level predictor [18].
Vintan showed that his predictor achieved a misprediction rate of
about 11%, which equates to 3% improvement in the misprediction
rate for his
neural predictor over a conventional two-level predictor.
The cost of neural branch prediction grows linearly with the history length, whereas
the cost of two-level adaptive branch predictors grows exponentially.
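To make the linear-cost claim concrete, here is a minimal perceptron predictor in the style of Jimenez's perceptron predictor: the weights are combined with a ±1 global history, and training happens when the prediction is wrong or the output magnitude is at most a threshold θ = ⌊1.93h + 14⌋. Each table entry holds h + 1 weights, so storage grows linearly in h instead of the 2^h counters of a PHT. The table size, the all-taken initial history and the absence of weight saturation are simplifying assumptions. On a hypothetical alternating taken/not-taken branch it stops mispredicting after a short warm-up.

```python
from typing import List

class PerceptronPredictor:
    """Perceptron branch predictor: h + 1 weights (bias + one per history bit)
    per table entry, so storage is linear in the history length h."""

    def __init__(self, h: int, n_entries: int = 64) -> None:
        self.h = h
        self.n_entries = n_entries
        self.weights: List[List[int]] = [[0] * (h + 1) for _ in range(n_entries)]
        self.ghr: List[int] = [1] * h         # +1 = taken, -1 = not taken; init is an assumption
        self.theta = int(1.93 * h + 14)       # training threshold

    def _output(self, pc: int) -> int:
        w = self.weights[pc % self.n_entries]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.ghr))

    def predict(self, pc: int) -> bool:
        return self._output(pc) >= 0

    def update(self, pc: int, taken: bool) -> None:
        y = self._output(pc)
        t = 1 if taken else -1
        if (y >= 0) != taken or abs(y) <= self.theta:   # mispredicted or low confidence
            w = self.weights[pc % self.n_entries]
            w[0] += t
            for i, xi in enumerate(self.ghr):
                w[i + 1] += t * xi
        self.ghr = self.ghr[1:] + [t]                   # shift in the newest outcome

p = PerceptronPredictor(h=4)
misses = 0
for i in range(100):
    taken = (i % 2 == 0)          # alternating T, N, T, N, ...
    misses += p.predict(0x40) != taken
    p.update(0x40, taken)
print(misses)  # 2
```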
Question 5
1. The cache coherence problem in multiprocessor systems
The cache coherence problem arises from the possibility that more than one cache
in the system holds a copy of the same memory block.
If different processors load the same memory block into their caches, it must be
ensured that these copies remain identical to one another.
Consider the following figure:
The system under consideration consists of the memory M and processors Pi, each with
its own cache Ci.
If Pi and Pj both load the same variable X from memory into their caches, there is no
problem as long as X is only read. However, if, for example, Pi modifies X, then the copy
of X in cache Cj becomes inconsistent (it no longer reflects the current value of X).
Two main methods are used to address this problem:
a. Invalidation: the most recently modified copy is considered the valid one (in our
case, the copy in Ci) and the other copies are invalidated.
b. Update: every modification of a block in any cache is communicated to the other
caches.
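A minimal Python model of the X scenario above contrasts the two methods. Caches are plain dictionaries and `None` marks an invalid copy; both are assumptions of this sketch, and write-back to memory is ignored.

```python
from typing import Dict, List, Optional

def write(caches: List[Dict[str, Optional[int]]], writer: int, addr: str,
          value: int, policy: str) -> None:
    """Processor `writer` writes `value` to `addr` in its cache; the copies held by
    the other caches are either invalidated (None) or updated, depending on `policy`."""
    if policy not in ("invalidate", "update"):
        raise ValueError(policy)
    caches[writer][addr] = value
    for i, cache in enumerate(caches):
        if i != writer and addr in cache:
            cache[addr] = None if policy == "invalidate" else value

for policy in ("invalidate", "update"):
    caches: List[Dict[str, Optional[int]]] = [{"X": 5}, {"X": 5}]  # Pi = 0, Pj = 1 read X
    write(caches, 0, "X", 6, policy)                                # Pi modifies X
    print(policy, caches[1]["X"])                                   # invalidate None, update 6
```

With invalidation, Pj misses on its next read of X and fetches the new value; with update, every write costs a broadcast even if Pj never reads X again.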
2. Advantages of the MESI protocol over MSI
To understand these advantages, we first need to explain the characteristics of the
MSI protocol.
It is essentially an invalidation protocol for write-back caches.
To distinguish modified blocks from unmodified ones, this protocol uses 3 states for
every block of a write-back cache:
- Modified: only this cache holds a valid copy of the memory block; the copies in the
other caches are inconsistent.
- Shared: the copies are consistent with one another, so all of them are valid.
- Invalid: the block does not hold valid data.
The advantages of the MESI protocol over MSI are presented below.
Analyzing the MSI protocol, one can observe an inefficiency when a process needs to
read and then modify a block of data: first, a transition brings the block into the
Shared state, and then a second transition moves it from Shared to Modified, even if no
other cache holds the block.
The MESI protocol adds a fourth state, Exclusive, to reduce the traffic generated by
writing a block that is present in only one of the caches: a block in the Exclusive state
is clean and held by no other cache, so a write can move it to Modified without any bus
transaction.
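The saving can be counted with a deliberately simplified model of one processor that reads a block missing from its cache and then writes it. The transaction names BusRd and BusUpgr follow common textbook usage and are not taken from these notes; some MSI variants issue a full BusRdX instead of BusUpgr for the Shared-to-Modified step.

```python
from typing import List

def read_then_write(protocol: str, other_sharers: int) -> List[str]:
    """Bus transactions for: read miss on a block, then a write to the same block.
    Simplified single-block model of MSI and MESI."""
    if protocol not in ("MSI", "MESI"):
        raise ValueError(protocol)
    txns = ["BusRd"]                              # the read miss fetches the block
    if protocol == "MESI" and other_sharers == 0:
        state = "E"                               # no other copy: load as Exclusive
    else:
        state = "S"                               # MSI always loads as Shared
    if state == "S":
        txns.append("BusUpgr")                    # the write must invalidate other copies
    # In state E, the E -> M transition is silent: no bus transaction.
    return txns

print(read_then_write("MSI", 0))    # ['BusRd', 'BusUpgr']
print(read_then_write("MESI", 0))   # ['BusRd']
print(read_then_write("MESI", 2))   # ['BusRd', 'BusUpgr']
```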
3. Advantages of the MOESI protocol over MESI
The new element is the Owned state.
This state is similar to Shared in that it can hold a copy of the most recent data (the
correct data).
It is similar to Modified in that the copy in main memory may be stale (incorrect).
This protocol brings together all the states used in the other protocols.
Under MOESI, a cache block is classified according to the following
characteristics:
- Validity
- Exclusiveness
- Ownership
This approach makes it possible to avoid writing modified data back to memory before it is
sent to the other caches.
4. The need for atomic operations
The need for atomic operations comes from the requirement that parallel processing can
proceed without processes blocking one another.
Atomic operations use atomic locks which, when used, lock several variables at the same
time. If not all of them can be locked, then none of them is locked.
This rules out the possibility of a deadlock, which could otherwise occur, for example,
when one thread locks the first variable and a second thread locks the second
variable. In that case, neither of the two threads would ever complete.
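The all-or-nothing idea can be emulated in software by trying every lock without blocking and rolling back on the first failure, so a thread never waits while holding a lock. This is a minimal Python sketch using the standard `threading` module; it is a software emulation of the idea, not a hardware atomic lock, and the retry back-off value is an arbitrary assumption.

```python
import threading
import time
from typing import List, Sequence

def try_lock_all(locks: Sequence) -> bool:
    """Acquire every lock in `locks` or none of them, without ever blocking."""
    held: List = []
    for lock in locks:
        if lock.acquire(blocking=False):
            held.append(lock)
        else:
            for h in reversed(held):      # roll back: release what was already taken
                h.release()
            return False
    return True

def release_all(locks: Sequence) -> None:
    for lock in locks:
        lock.release()

a, b = threading.Lock(), threading.Lock()
done: List[str] = []

def worker(name: str, order: Sequence) -> None:
    while not try_lock_all(order):        # retry; never waits while holding a lock
        time.sleep(0.001)
    try:
        done.append(name)                 # critical section using both variables
    finally:
        release_all(order)

# Opposite acquisition orders: with plain blocking acquire() this could deadlock.
t1 = threading.Thread(target=worker, args=("T1", (a, b)))
t2 = threading.Thread(target=worker, args=("T2", (b, a)))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(done))   # ['T1', 'T2']
```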
Disadvantages:
- They cause blocking, in the sense that some threads are forced to wait
until a lock is released;
- They increase program complexity;
- Priority: higher-priority threads cannot run if lower-priority threads hold
resources they need (priority inversion);
- Debugging is difficult because the bugs are timing-dependent.
If an operation requires multiple CPU instructions, then it may be interrupted in the middle of
executing. If this results in a context switch (or if the interrupt handler refers to data that was
being used), then atomicity could be compromised. It is possible to use any standard locking
technique (e.g. a spinlock) to prevent this, but this may be inefficient. If possible, disabling
interrupts may be the most efficient method of ensuring atomicity (although note that this may
increase the worst-case interrupt latency, which could be problematic if it becomes too long).
- MPI_Reduce: a single process (the root process) collects the data from
the other processes in a group and combines it, using a given operation, into a
single result.
- MPI_Barrier: blocks the processes of communicator comm as each of them calls
the function, until all processes of the communicator have called it. On return
from the function, all processes are synchronized.
Scope for exam (discussed in last lesson)

(These are the topics covered in the last lesson when talking about the exam. Please add
topics and answers which you think are relevant.)
1. Advanced branch prediction

a. Difficult-to-predict branches, for example:
i. dynamic branches
dynamic branches that are not predictable with the
correlation information used by current prediction methods (local branch
history, global history, and path);
the branch’s condition is represented by the difference between
two source operands.
Obviously, this can be positive, negative or zero.
ii. unbiased branches - in a given context (= path information: GHR, LHR)
iii. shuffled/random behaviour
b. solutions
i. 1.) enlarge the prediction context (number of bits)
1. => the number of unbiased branches decreases
ii. 2.) find new relevant context information for
representing/predicting such branches
1. => but it was shown, for example, that additional path
information did not improve prediction accuracy
significantly
c. Big question is: “What is random in computer engineering?”
2. Neural Branch prediction methods
a. Weaknesses of two-level adaptive branch predictor
i. 1.) finite state machine automata
1. there is no mathematical proof of convergence; it is only an
empirical algorithm
2. 2.) exponentially rising complexity (to exploit n bits of
history, a 2^n-entry history table is needed)
ii. Proposal: perceptron branch predictor
1. Perceptron based learning/prediction algorithm
2. The perceptron is also feasible to implement in hardware
3. Dynamic instruction reuse
a. Why is there a significant degree of instruction reuse in programs? (e.g.
polymorphism)
b. How could this be efficiently exploited in a superscalar processor? (Sodani
& Sohi -> reuse buffer structure)
c. Ideas about extending Dynamic Instruction Reuse to function reuse in high-level
languages (= Memoization)
Memoization saves a function's result in a table. If the function is called again with the
same parameters, its result is reused from the table instead of being re-evaluated.
Memoization is also used to reduce the running time of some
optimizing compilers, in which the same data dependence test is carried out
repeatedly.
4. Dynamic Value Prediction

a. Value locality concept
b. Computational and contextual prediction schemes
Computational predictors predict the next value based on some previous values, in an
algorithmic (computational) manner, i.e. according to a deterministic recurrence
formula. An incremental predictor belongs to the computational class. As can be
observed, a computational predictor does not directly derive from the
value locality concept. A contextual predictor predicts the next value
based on a particular stored pattern (context) that is repeatedly generated in
the value sequence, in a Markovian stochastic manner. Theoretically, contextual
predictors can predict any repetitive value sequence. A context predictor is of order k
if its context information includes the last k values; the search is therefore done
using this pattern of length k. As already pointed out, a contextual predictor of
order k derives from the k-value locality metric, which represents an idealised
k-context predictor.
c. Hybrid prediction (multiple value predictors working together, arbitrated by a
meta-predictor)
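The two classes of value predictors from b. can be contrasted with a small sketch. Assumptions: the incremental predictor is modelled as a stride predictor (next = last value + last difference), the contextual predictor is of order k = 2 and predicts the value that most often followed the current k-value context, and all class and function names are hypothetical.

```python
from collections import Counter, defaultdict


class StridePredictor:
    """Computational (incremental) predictor: next = last + (last - previous)."""

    def __init__(self):
        self.last = None
        self.stride = 0

    def predict(self):
        return None if self.last is None else self.last + self.stride

    def update(self, value):
        if self.last is not None:
            self.stride = value - self.last   # deterministic recurrence: keep the latest difference
        self.last = value


class ContextPredictor:
    """Contextual predictor of order k: predicts what most often followed the last k values."""

    def __init__(self, k):
        self.k = k
        self.history = []
        self.table = defaultdict(Counter)     # context (tuple of k values) -> follower counts

    def predict(self):
        if len(self.history) < self.k:
            return None
        followers = self.table.get(tuple(self.history[-self.k:]))
        return followers.most_common(1)[0][0] if followers else None

    def update(self, value):
        if len(self.history) >= self.k:
            self.table[tuple(self.history[-self.k:])][value] += 1
        self.history.append(value)


def count_hits(predictor, values):
    """Feed a value sequence to a predictor and count the correct predictions."""
    hits = 0
    for v in values:
        hits += predictor.predict() == v
        predictor.update(v)
    return hits


arithmetic = list(range(0, 300, 3))   # 0, 3, 6, ...: a regular stride, no value repeats
repeating = [5, 1, 7] * 30           # a repeating, non-arithmetic pattern
print(count_hits(StridePredictor(), arithmetic), count_hits(ContextPredictor(2), arithmetic))
print(count_hits(StridePredictor(), repeating), count_hits(ContextPredictor(2), repeating))
```

Each predictor wins on the sequence the other cannot handle, which is the motivation for c.: a hybrid scheme would run both and let a meta-predictor choose, for instance by tracking each one's recent accuracy.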
5. Advanced Prediction Methods in computer science
a. Simple Markovian Predictors
Prediction by Partial Matching (PPM)
Used in data compression and prefetching, and also in some
speech recognition problems.
A Markov predictor of order j predicts
the next bit based on the pattern of the j immediately preceding bits (a simple
Markov chain).
More precisely, every time that j-bit pattern was found, the predictor counts
whether it was followed by a ‘1’ or by a ‘0’, and predicts the more frequent successor.
Example sequence: aaabcaaabcaaa? (context: a)
A complete PPM predictor contains N simple Markov
predictors, from order 0 to order N-1. If the order-(N-1) Markov predictor
produces a prediction (its context is matched in the sequence), the process is
finished; otherwise the order-(N-2) Markov predictor is activated, and
so on down to order 0.
b. Prediction by partial matching algorithms
i. N Markov predictors of orders 0 to N-1 working together
c. Bayesian Predictors
For example, a Bayesian network could represent the probabilistic relationships between
diseases and symptoms. Given symptoms, the network can be used to compute the probabilities
of the presence of various diseases.
d. Neural predictors
e.g. a Learning Vector Quantisation network (LVQ, T. Kohonen) and a Multi-Layer
Perceptron (MLP)
e. Metaphor of hidden Markov predictors (have a rough idea, no details!)

In simpler Markov models (like a Markov chain), the state is directly visible to the observer, and
therefore the state transition probabilities are the only parameters. In a hidden Markov model, the
state is not directly visible, but the output, which depends on the state, is visible. Each state has a
probability distribution over the possible output tokens.
f. Meta-Prediction (again, see 4.)
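The simple Markov predictor and the PPM fallback from 5.a and 5.b can be sketched as follows. Assumptions: the predictor works on any symbols (bits or characters), it scans the whole past sequence instead of keeping counters in hardware tables, ties are broken by first occurrence, and `markov_predict` / `ppm_predict` are hypothetical names.

```python
from collections import Counter


def markov_predict(seq, order):
    """Simple Markov predictor of the given order.

    Counts, for every earlier occurrence of the last `order` symbols, which symbol
    followed it, and returns the most frequent follower (None if the context never occurred).
    """
    if order >= len(seq):
        return None
    context = tuple(seq[len(seq) - order:])
    followers = Counter()
    for i in range(order, len(seq)):
        if tuple(seq[i - order:i]) == context:
            followers[seq[i]] += 1
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def ppm_predict(seq, max_order):
    """PPM: try orders max_order, max_order-1, ..., 0 and stop at the first that predicts."""
    for order in range(max_order, -1, -1):
        prediction = markov_predict(seq, order)
        if prediction is not None:
            return prediction, order
    return None, None


s = "aaabcaaabcaaa"
print(markov_predict(s, 1))   # context "a" alone: 'a' followed 'a' more often than 'b' did
print(markov_predict(s, 3))   # context "aaa": followed by 'b' both times
print(ppm_predict(s, 3))      # the highest order that matches decides
```

On the example sequence, the order-1 context "a" suggests another 'a', while the longer context "aaa" correctly anticipates 'b'. This is why PPM starts from the highest order and falls back to lower orders only when the longer context has never been seen.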
6. X
a. Multi-Core and many-core architectures
i. Fundamental concepts: programming models
1. (shared address space / shared memory vs. message passing)
ii. Cache coherence problem
iii. Critical section concept
In concurrent programming, a critical section is a piece of code that accesses a shared resource
(data structure or device) that must not be concurrently accessed by more than one thread of
execution.[1] A critical section will usually terminate in fixed time, and a thread, task, or process
will have to wait for a fixed time to enter it (aka bounded waiting).
Some synchronization mechanism is required at the entry and exit of the critical section to ensure
exclusive use, for example a semaphore.
iv. Shared variable consistency problem
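The critical-section concept from iii. can be illustrated with a minimal sketch: several threads increment a shared counter, and a binary semaphore (Python's standard `threading.Semaphore(1)`) guards the read-modify-write so only one thread is inside the critical section at a time. The thread count, iteration count and function name are illustrative assumptions.

```python
import threading

counter = 0                          # shared resource
mutex = threading.Semaphore(1)       # binary semaphore guarding the critical section


def increment_many(n: int) -> None:
    global counter
    for _ in range(n):
        with mutex:                  # entry: acquire the semaphore; exit: release it
            counter += 1             # critical section: read-modify-write of shared data


threads = [threading.Thread(target=increment_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```

Without the semaphore, the read and the write of `counter` could interleave between threads and some increments could be lost; with it, the final value is always 4 x 10,000.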

b. Automatic design space exploration
i. Pareto-multiobjective optimization concepts based on evolutionary
algorithms (e.g. genetic algorithms)
ii. Optimization examples
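The core Pareto concept behind automatic design space exploration can be sketched for a bi-objective minimization problem (O1, O2): a point dominates another if it is no worse in both objectives and strictly better in at least one, and the Pareto set is the set of non-dominated points. This shows only the filtering step; the evolutionary algorithm that would generate candidate configurations is not shown, and the candidate values are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # (O1, O2): both objectives are minimized


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse in both objectives and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (simple O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical design points, e.g. (energy, execution time) of candidate configurations.
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
print(pareto_front(candidates))
```

Here (3, 4) is dominated by (2, 3) and (5, 5) by (1, 5). The remaining points, such as (1, 5) and (4, 1), cannot be ordered against each other, because each is better in one objective.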
General scope:
● Only important concepts, no details! For example:
○ How does a perceptron learn?
○ What is an unbiased branch?
○ Why can problems arise in object-oriented programming (e.g. C++)?