
Survey on Shared Cache Management in Chip Multiprocessors

Mohammad Foyzur Rahman (mfrahman@ucdavis.edu)
Student ID: 994851501
February 5, 2010

Problem

There has been an increasing disparity between processor and memory speed since the 1980s. With the emergence of chip multiprocessors (CMPs), pressure on the memory and I/O subsystems has only worsened. To improve the situation, traditional mechanisms such as multi-level cache memory have been adapted for CMPs. Typically, such systems have multiple cache levels, where the larger last-level cache is shared among all cores. This poses a new challenge: contention among the cores for limited cache space. Different cores may have different data requirements (e.g., a core running a data-processing-dominated [1] application may exhaust the cache and starve others without any gain in performance), so a traditional sharing-oblivious cache management policy such as LRU may yield poor performance. A body of research is trying to improve shared cache management through various ideas such as data partitioning [2], application-based profiling [3], compiler-assisted cache management [4], cooperative data sharing [5], eviction algorithms [6], dynamic insertion and promotion policies [7], and many other approaches. We intend to summarize these works, their pros and cons, and the present state of research in this area.
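To make the contention problem concrete, below is a minimal sketch in C of a sharing-oblivious LRU policy for a single set of a shared last-level cache. The structures and constants (8 ways, the core_id field) are illustrative assumptions, not taken from any of the cited papers; the point is that the replacement decision never consults which core owns a line, so a core streaming through a large array can steadily evict a neighbor's reused working set.

```c
/* Minimal sketch of sharing-oblivious LRU for one set of a shared
 * last-level cache. Way count and fields are assumed for illustration. */
#include <stdint.h>
#include <stddef.h>

#define WAYS 8 /* associativity of the shared set (assumed) */

typedef struct {
    uint64_t tag;
    int      core_id; /* core that installed the line; UNUSED by LRU */
    unsigned age;     /* 0 = most recently used */
    int      valid;
} line_t;

/* On a hit: promote the line to MRU by aging every more-recent line. */
static void lru_touch(line_t set[WAYS], size_t hit_way) {
    for (size_t w = 0; w < WAYS; w++)
        if (set[w].valid && set[w].age < set[hit_way].age)
            set[w].age++;
    set[hit_way].age = 0;
}

/* On a miss: evict the oldest line, regardless of which core owns it.
 * A streaming core thus steadily displaces another core's working set. */
static size_t lru_victim(const line_t set[WAYS]) {
    size_t victim = 0;
    for (size_t w = 0; w < WAYS; w++) {
        if (!set[w].valid) return w; /* prefer a free way */
        if (set[w].age > set[victim].age) victim = w;
    }
    return victim;
}
```

Sharing-aware proposals such as PIPP [7] and pseudo-LIFO [6] intervene at exactly these two decision points: where a new line is inserted or promoted, and which line is chosen as the victim.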

Motivation

For decades the computer industry has relied on Moore's law to improve performance. Performance scaling was traditionally achieved either through higher clock speeds or through increasing instruction-level parallelism (ILP). However, computer architects now face an enormous challenge: ILP optimization has reached its limit, and frequency scaling is no longer feasible due to limitations such as power, heat, and wire delay. So, to keep exploiting the bounty of Moore's law, we have entered the thread-level parallelism (TLP) era, where more and more cores are assembled on a single chip.

However, while in an ideal situation performance would scale linearly with the number of cores, in the real world this goal seems increasingly illusory. One of the major obstacles is the disparity between processor and memory speed. Ideally, memory would be as fast as the processor; in reality, it is several orders of magnitude slower. Typical memory parameters such as capacity, speed, and cost are in direct conflict with one another, so improving all of them at once is not feasible. We can, however, exploit the principle of locality to improve the situation while keeping costs low. Modern multicore architectures do exactly this, supporting a multi-level cache hierarchy that transparently reduces the disparity between processor and memory speed at low cost.

In a traditional multi-level cache architecture, a smaller cache is dedicated to each core while a larger cache is shared among all cores. While this approach works well in theory, it faces a practical challenge: the cores contend for a shared, limited space, and performance can actually degrade if the contention is not managed properly. To ameliorate this situation, a body of research is developing ideas for better shared cache management. Better management of the shared cache could transparently allow cores to use scarce cache memory efficiently, thereby yielding better performance. We propose to explore this research area and summarize its approaches, their pros and cons, and the current state of the research. We believe our work will help future researchers get an overview of the existing literature in this area.
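As a back-of-the-envelope illustration of how much shared-cache behavior matters, consider the standard average memory access time (AMAT) model; all latencies and miss rates below are assumed round numbers, not measurements from the cited papers:

```latex
\mathrm{AMAT} = t_{L1} + m_{L1}\,\bigl(t_{L2} + m_{L2}\,t_{\mathrm{mem}}\bigr)
```

With a 2-cycle L1, a 20-cycle shared L2, 200-cycle memory, and a 5% L1 miss rate, a 20% shared-cache miss rate gives AMAT = 2 + 0.05(20 + 0.20 × 200) = 5 cycles. If contention doubles the shared-cache miss rate to 40%, AMAT becomes 2 + 0.05(20 + 0.40 × 200) = 7 cycles, a 40% slowdown caused entirely by shared-cache behavior, with no change to the core's own code.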

References
[1] G. Blake, R. G. Dreslinski, and T. Mudge, "A survey of multicore processors," IEEE Signal Processing Magazine, vol. 26, no. 6, pp. 26–37, October 2009. [Online]. Available: http://dx.doi.org/10.1109/MSP.2009.934110

[2] Y. Chen, W. Li, C. Kim, and Z. Tang, "Efficient shared cache management through sharing-aware replacement and streaming-aware insertion policy," in IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2009, pp. 1–11. [Online]. Available: http://dx.doi.org/10.1109/IPDPS.2009.5161016

[3] R. Iyer, "CQoS: a framework for enabling QoS in shared caches of CMP platforms," in ICS '04: Proceedings of the 18th Annual International Conference on Supercomputing. New York, NY, USA: ACM, 2004, pp. 257–266. [Online]. Available: http://dx.doi.org/10.1145/1006209.1006246

[4] M. Kandemir, S. P. Muralidhara, S. H. K. Narayanan, Y. Zhang, and O. Ozturk, "Optimizing shared cache behavior of chip multiprocessors," in MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture. New York, NY, USA: ACM, 2009, pp. 505–516. [Online]. Available: http://dx.doi.org/10.1145/1669112.1669176

[5] S. Fide and S. Jenks, "Proactive use of shared L3 caches to enhance cache communications in multi-core processors," IEEE Computer Architecture Letters, vol. 7, no. 2, pp. 57–60, July 2008. [Online]. Available: http://dx.doi.org/10.1109/L-CA.2008.10

[6] M. Chaudhuri, "Pseudo-LIFO: the foundation of a new family of replacement policies for last-level caches," in MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture. New York, NY, USA: ACM, 2009, pp. 401–412. [Online]. Available: http://dx.doi.org/10.1145/1669112.1669164

[7] Y. Xie and G. H. Loh, "PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches," SIGARCH Computer Architecture News, vol. 37, no. 3, pp. 174–183, 2009. [Online]. Available: http://dx.doi.org/10.1145/1555815.1555778
