RECVF review --- Binod kumar, 143079023

Strong points

The paper uses the widely researched concept of redundant execution to achieve fault tolerance through execution assistance. The main contribution is the forwarding of critical values, which act as hints for execution on the second (trailing) core. This leads to an energy-efficient fault-tolerant architecture. The proposed technique makes clever use of DVFS to reduce energy consumption, and the experimental results show lower energy consumption than two previously reported techniques, CRT and PVA. In general, per-core DVFS has been shown to be effective in reducing the power of a chip-level redundant architecture. That reduction, however, is appreciable only for certain kinds of programs, such as those with poor branch prediction and L1 data cache performance, because it relies on the slack created by these events to run the trailing core at a lower frequency. It is indeed a strong point that the technique introduced in this paper (critical value forwarding) does not rely on these program properties. Another strong point is that the area overhead is small compared to other fault-tolerant architectures. The paper shows that forwarding the values of critical instructions provides fault tolerance together with an improvement in power. This is definitely a better approach than simply forwarding branch outcomes, or forwarding the outcomes of all load and branch instructions (which together constitute nearly one-third of all instructions in SPEC'00 programs), as reported in the literature. This is clearly summarized in the authors' key finding that 80% of the speedup of forwarding can be achieved by forwarding the results of just 10-15% of all instructions. The performance degradation with the proposed technique is also small.

Weak points

One weak point of the proposed technique is its probable inability to cover all faults that occur in the cache-coherence circuitry. This is because it does not redundantly access the memory hierarchy for unverified cache lines obtained from cache-to-cache transfers. The authors could have commented on this aspect and on how far fault recovery is still possible, if at all. Another weak point is that the proposed technique gives no hint towards fault localisation (i.e., whether the fault occurred in the trailing core or the leading core). If fault localisation were possible, debugging the corresponding electrical errors would become relatively easier. The authors could have analyzed the chances of fault diagnosis with some additional hardware along the lines of “Online Diagnosis of Hard Faults in Microprocessors” by Bower et al., ACM TACO 2007. The authors claim that “the architecture provides a high degree of coverage for processor control and execution logic”. This may not always be true, i.e., not for all kinds of programs. For example, in a sparse matrix multiplication the execution logic is exercised relatively little. The proposed architecture would surely be able to perform fault recovery in this case. However, in some other program, a fault that was latent during the sparse matrix computation may manifest, and fault detection and recovery may not be as smooth.

Disagreement

To identify critical instructions for value forwarding, the authors use the concept introduced in reference no. 32. These are merely heuristics, and a plausible explanation of their relevance for all kinds of programs needs to be provided. Although the results shown by the authors in the present work seem to validate these heuristics, reasoning as to why the other heuristics do not perform as well as the fanout2 heuristic is definitely needed. As is common with any signature-based error detection technique, aliasing is also a concern for this paper: a fingerprint generated by a faulty leading core may match that of the trailing core, or vice versa. The authors do not comment much on these scenarios. A very interesting case is when both cores are faulty and the generated fingerprints (signatures) somehow match; how does fault recovery happen in such a scenario?
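
The aliasing concern can be made concrete with a rough sketch. It assumes an idealized n-bit fingerprint that behaves like a uniform hash, so a corrupted result stream would match the correct signature with probability of about 2^-n. The function names are hypothetical, and the truncated BLAKE2b hash is only a software stand-in; the fingerprint function used in the paper is not modeled here.

```python
import hashlib

def fingerprint(values, bits=16):
    """Compress a stream of retired result values into a `bits`-wide signature.

    Assumption: a truncated BLAKE2b digest stands in for the hardware
    fingerprint; values are unsigned and fit in 64 bits.
    """
    data = b"".join(v.to_bytes(8, "little") for v in values)
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "little") & ((1 << bits) - 1)

def ideal_aliasing_probability(bits):
    """Chance that a faulty stream yields the same signature under a uniform hash."""
    return 2.0 ** -bits

def count_aliases(reference, bits, trials):
    """Flip one bit of one value per trial and count corruptions that go undetected."""
    good = fingerprint(reference, bits)
    aliases = 0
    for t in range(trials):
        faulty = list(reference)
        idx = t % len(faulty)              # which value to corrupt
        bit = (t // len(faulty)) % 64      # which bit of that value to flip
        faulty[idx] ^= 1 << bit
        if fingerprint(faulty, bits) == good:
            aliases += 1
    return aliases
```

Under this assumption, widening the fingerprint lowers the aliasing probability exponentially, but it never reaches zero, which is why the case of matching fingerprints from two faulty cores deserves an explicit comment from the authors.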

Point of discussion

This method can be used with coarse-grained multithreading to schedule several trailing threads on a few processor cores while allowing leading threads to occupy individual cores. This can definitely improve CMP throughput compared to the proposal in this paper. The idea of critical value forwarding can be used for the scheduling in this case. Thus, whereas in this proposal each leading/trailing thread pair requires two cores for execution, the alternative is to multiplex multiple trailing threads onto a single trailing core. However, the multiplexing scheme must ensure a small performance penalty relative to non-redundant execution.

Possibilities of improvement

A few opportunities for enhancement stem from the use of execution assistance through critical value forwarding. One of them is adaptive critical value forwarding, which can be based on monitoring some parameters of the leading core's execution. For example, retirement stalls in the leading core can be monitored, and execution assistance increased for programs that show many of them. Similarly, branch forwarding may also be carried out adaptively. The fanout() heuristic fails to gauge the importance of the special case of branch/jump instructions, as these do not produce any values to be consumed. However, they can be very important for performance. So, branch, jump and call instructions that are mispredicted should be treated as being on the critical path, and their outcomes forwarded from the leading to the trailing core, where they can be used instead of branch predictions.
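
A minimal sketch of what such an adaptive scheme might look like is given below. Everything in it is an assumption made for illustration: the class name, the stall thresholds, the interval-based update, and the reading of a fanout heuristic as "forward a result if it has at least N consumers". None of this is taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class AdaptiveForwarder:
    """Hypothetical controller that tunes how many results the leading core forwards."""
    fanout_threshold: int = 2   # forward results with at least this many consumers
    min_threshold: int = 1
    max_threshold: int = 4
    high_stall: float = 0.20    # stall fraction above which assistance is increased
    low_stall: float = 0.05     # stall fraction below which assistance is reduced

    def update(self, stall_cycles: int, total_cycles: int) -> int:
        """Adjust the threshold once per monitoring interval and return it."""
        if total_cycles <= 0:
            raise ValueError("total_cycles must be positive")
        stall_fraction = stall_cycles / total_cycles
        if stall_fraction > self.high_stall:
            # Many retirement stalls: lower the threshold so more values are forwarded.
            self.fanout_threshold = max(self.min_threshold, self.fanout_threshold - 1)
        elif stall_fraction < self.low_stall:
            # Few stalls: raise the threshold to save bandwidth and power.
            self.fanout_threshold = min(self.max_threshold, self.fanout_threshold + 1)
        return self.fanout_threshold

    def should_forward(self, fanout: int, is_control: bool, mispredicted: bool) -> bool:
        """Control instructions are forwarded only when mispredicted; others by fanout."""
        if is_control:
            return mispredicted
        return fanout >= self.fanout_threshold
```

For example, `AdaptiveForwarder().update(30, 100)` lowers the threshold from 2 to 1, so that more results qualify for forwarding in the next interval.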

Suggestions

Execution assistance is an ideal candidate technique for performance enhancement in addition to fault tolerance. Many works in the literature have observed that redundant execution in a leader-follower fashion can help achieve a gain in performance. This, however, requires the execution to be adaptive for optimum improvement in performance, because some programs require more execution assistance (i.e., more instruction results to be forwarded) than others to achieve good performance. A static forwarding scheme would have to provide this higher amount to all programs, whether they need it or not. This wastes interconnect power, core power, core-to-core bandwidth and chip area. In contrast, adaptive execution assistance provides higher assistance only to programs that need it. This can increase hardware efficiency while achieving the goals of fault tolerance, improved performance and energy efficiency. The enhancement suggested in “Point of discussion” can potentially solve the problem of throughput loss, which is inherent in the present proposal because two cores are used to execute a single program. However, another important aspect is the order in which execution requests are handled by the trailing core. For a multiplexed scheme, priority-based scheduling can be performed such that a higher priority is assigned to trailing threads whose leading threads are stalled. This can also assist in performance improvement.
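
The priority-based scheduling suggested above could be sketched as follows. The tuple layout, the function name, and the tie-breaking rule (largest backlog first) are assumptions made for illustration; the paper does not describe such a scheduler.

```python
import heapq

def schedule_order(threads):
    """Return trailing-thread ids in the order a shared trailing core would run them.

    `threads` is a list of (thread_id, leader_stalled, pending_instructions).
    Threads whose leading thread is stalled come first; among equals, the
    thread with the larger backlog of unverified instructions comes first.
    """
    # heapq is a min-heap, so "stalled" maps to False (sorts first)
    # and the backlog is negated to prefer larger values.
    heap = [(not stalled, -pending, tid) for tid, stalled, pending in threads]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, tid = heapq.heappop(heap)
        order.append(tid)
    return order
```

With `[("t0", False, 40), ("t1", True, 10), ("t2", True, 25)]` as input, the two threads with stalled leaders are served before `t0`, and `t2` precedes `t1` because of its larger backlog.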