
Review Binod Kumar (Roll No. 143079023)

Strong points:

The authors revisit the decade-old technique of speculative scheduling. They propose an
easily implementable, low-cost solution that reduces the number of replays caused by L1
bank conflicts, which can improve performance.
The authors also propose a way to improve on existing L1 hit/miss prediction schemes by
taking instruction criticality into account.
Experimental results indicate that 78% of replays due to L1 data cache bank conflicts and
96.5% of L1 hit mispredictions can be avoided. The authors discuss the experimental results
in great detail in each section, which helps in understanding the result for each
benchmark.

Weak points:

The novelty of the paper is limited. While that is not a big problem in itself, the authors
have not clearly outlined the distinction between their contribution and previous proposals.
The authors determine the criticality of a load from its position in the ROB when it
completes. They do not explain why this criterion was selected. A good discussion of
criticality can be found in "Focusing Processor Policies via Critical-Path Prediction" by
Fields et al. (2001).
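
To make the criterion under discussion concrete, here is a minimal sketch of one plausible reading of it: a load is flagged critical if it sits close to the ROB head when it completes. The direction of the test, the ROB size, the threshold, and the function names are all assumptions for illustration; the paper's actual rule and parameters may differ.

```python
ROB_SIZE = 192          # hypothetical ROB capacity (assumption)
CRITICAL_DISTANCE = 8   # hypothetical threshold in ROB entries (assumption)


def distance_from_head(rob_head: int, entry: int, rob_size: int = ROB_SIZE) -> int:
    """Number of entries between the ROB head and `entry` in a circular ROB."""
    return (entry - rob_head) % rob_size


def is_critical(rob_head: int, entry: int, threshold: int = CRITICAL_DISTANCE) -> bool:
    """Assumed rule: a load completing near the ROB head is treated as critical,
    since it is likely to be holding up retirement."""
    return distance_from_head(rob_head, entry) < threshold
```

A justification of the criterion would need to explain why such a position-based test is a good proxy for being on the critical path.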

Points of disagreement:

The authors state, "We assume that this buffer can handle all μ-ops in flight." The
recovery buffer may not be able to hold all μ-ops, and a discussion of that case would
have been useful for analysis when write ports are lacking.

Suggestions for improvement:

The authors could have improved the quality of the writing by first stating their proposed
techniques (in contrast with previous proposals) and then presenting the experimental
evaluation. In the present paper, the flow is highly confusing and the contribution is not
clear at all.
The authors could have compared the performance improvement with that of earlier approaches.
The implementation of a replay mechanism can also affect energy consumption. An experimental
evaluation of the energy impact would have been useful for estimating the usefulness of the
complete micro-architecture.

Points which are not clear:

The authors claim, "This is not the case for several μ-ops due to circumstances mostly
associated with memory accesses." The authors should mention which kinds of μ-ops these are
to better explain the proposed methodology.

Points to be discussed in class:

The authors mention, "We found that, at equal number of banks, set interleaving
performs similarly to a quadword (8-byte) interleaved scheme on our benchmark set." The
reason behind this is not clear. Does it depend on the workload (i.e., the benchmark set)?
Can set interleaving perform better on a different workload?
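
As a starting point for that discussion, the sketch below contrasts the two bank mappings on two synthetic access streams. The line size, bank count, and helper names are assumptions for illustration and are not taken from the paper. Counting adjacent same-bank accesses is only a rough proxy for conflicts, since some designs can serve two accesses to the same line together.

```python
LINE_BYTES = 64   # assumed cache line size
NUM_BANKS = 8     # assumed number of L1 data cache banks


def quadword_bank(addr: int) -> int:
    """Quadword interleaving: the bank is chosen by the 8-byte word index."""
    return (addr // 8) % NUM_BANKS


def set_bank(addr: int) -> int:
    """Set interleaving: the bank is chosen by the low bits of the line index."""
    return (addr // LINE_BYTES) % NUM_BANKS


def same_bank_pairs(addrs, bank_of) -> int:
    """Count adjacent accesses in the stream that map to the same bank."""
    return sum(1 for a, b in zip(addrs, addrs[1:]) if bank_of(a) == bank_of(b))


# Two synthetic streams: consecutive 8-byte loads, and one load per cache line.
unit_stride = [i * 8 for i in range(16)]
line_stride = [i * LINE_BYTES for i in range(16)]

for name, stream in (("unit stride", unit_stride), ("line stride", line_stride)):
    print(name,
          "quadword:", same_bank_pairs(stream, quadword_bank),
          "set:", same_bank_pairs(stream, set_bank))
```

Under these assumptions the unit-stride stream favors quadword interleaving and the line-stride stream favors set interleaving, which suggests that the reported similarity may well depend on the access patterns in the benchmark set.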
