
Dynamic Way Partitioning Of Hybrid Last Level Cache

Anushree Pendharkar

MTP Phase 1
Guide: Prof Virendra Singh

June 29, 2019

Anushree Pendharkar (IIT Bombay) Dynamic Way Partitioning Of Hybrid Last Level Cache June 29, 2019 1 / 29
Outline

Background
Motivation
Literature review
Experimental results
Summary
Proposed approach
Future work

Background

Figure: Performance gap between processor and memory [1]

One way to bridge the gap is to increase Last Level Cache (LLC) size.
[1] Carvalho, Carlos. "The gap between processor and memory speeds." Proc. of IEEE International Conference on Control and Automation. 2002.
Motivation

Problems with an SRAM-based LLC:
SRAM consumes significant static power and occupies significant area.
Transistor leakage current grows further at nanometer technology nodes.
We need a new memory technology with:
Negligible leakage power dissipation
Access time comparable to SRAM
Compatibility with CMOS
High density and scalability

Motivation

STTRAM is a potential alternative to SRAM. However, it suffers from:
High write energy and write latency.
Limited write endurance (10^12 to 10^15 writes per cell) [2].
Problem statement: design a last level cache using STTRAM that receives fewer writes.

[2] P. Chi, S. Li, Y. Cheng, Y. Lu, S. H. Kang, and Y. Xie, "Architecture design with STT-RAM: Opportunities and challenges," in 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 109–114, IEEE, 2016.
Literature review

Two categories of solutions:
Circuit level solutions
Architecture level solutions
Circuit level solutions:
Reducing the retention time of STTRAM cells [3] (requires refresh hardware)
Early write termination [4]
These works do not consider the non-uniform distribution of writes across the cache.

[3] Smullen, Clinton W., et al. "Relaxing non-volatility for fast and energy-efficient STT-RAM caches." 2011 IEEE 17th International Symposium on High Performance Computer Architecture. IEEE, 2011.
[4] Zhou, Ping, et al. "Energy reduction for STT-RAM using early write termination." Proceedings of the 2009 International Conference on Computer-Aided Design. ACM, 2009.
Literature review

Architecture level solutions for inclusive caches:
Hybrid cache: a few ways of each set are SRAM and the remaining ways are STTRAM.
Identify frequently modified (write-intensive) blocks and place them in the SRAM region.
A simple design, Read Write Aware Hybrid Cache Architecture (RWHCA) [5], places a block in the SRAM region on a store miss and in the STTRAM region on a load miss.
Blocks brought in on a load miss may later become write intensive and must then be migrated to SRAM. Hence, the RWHCA approach incurs significant migration overhead.

[5] X. Wu, J. Li, L. Zhang, E. Speight, and Y. Xie, "Power and performance of read-write aware hybrid caches with non-volatile memories," in Proceedings of the Conference on Design, Automation and Test in Europe, pp. 737–742, European Design and Automation Association, 2009.
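The RWHCA placement rule and the migration overhead it causes can be sketched as a minimal Python model. The names (`Block`, `place_on_miss`, `on_write_hit`) are hypothetical, and the migration trigger (move a block to SRAM after a fixed number of writes while it sits in STTRAM, here 2) is an assumption for illustration, not the exact RWHCA policy.

```python
from dataclasses import dataclass

SRAM, STTRAM = "SRAM", "STTRAM"

@dataclass
class Block:
    tag: int
    region: str
    sttram_writes: int = 0  # writes absorbed while resident in STTRAM
    migrations: int = 0     # STTRAM -> SRAM moves (the overhead discussed above)

def place_on_miss(tag: int, is_store: bool) -> Block:
    # RWHCA rule: store miss -> SRAM region, load miss -> STTRAM region
    return Block(tag=tag, region=SRAM if is_store else STTRAM)

def on_write_hit(block: Block, threshold: int = 2) -> None:
    # Hypothetical migration trigger: after `threshold` writes land on a
    # block in STTRAM, move it to SRAM (the threshold value is assumed).
    if block.region == STTRAM:
        block.sttram_writes += 1
        if block.sttram_writes >= threshold:
            block.region = SRAM
            block.sttram_writes = 0
            block.migrations += 1

b = place_on_miss(0x1A, is_store=False)  # load miss -> STTRAM
for _ in range(3):
    on_write_hit(b)                      # the block turns write intensive
print(b.region, b.migrations)            # prints "SRAM 1"
```

Every block that was filled on a load miss but is later written repeatedly pays one such migration, which is the overhead the slide refers to.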
Literature review

Dynamically reconfigurable hybrid cache [6] aims at reducing power dissipation by power gating ways of a set.
It identifies underutilized sets using per-set counters.
The technique reduces power dissipation but also leads to performance loss.
Static and dynamic co-optimizations for blocks mapping in hybrid caches [7] is a compiler-assisted approach.
The compiler generates hints about the write frequency of each block.

[6] Chen, Yu-Ting, et al. "Dynamically reconfigurable hybrid cache: An energy-efficient last-level cache design." Proceedings of the Conference on Design, Automation and Test in Europe. EDA Consortium, 2012.
[7] Chen, Yu-Ting, et al. "Static and dynamic co-optimizations for blocks mapping in hybrid caches." Proceedings of the 2012 ACM/IEEE International Symposium on Low Power Electronics and Design. ACM, 2012.
Literature review

Capacity assessment hardware, together with compiler-generated hints, guides block placement.
Inaccurate block placement guidance leads to block swapping.
Frequent block swapping consumes significant power, and the performance benefit is minimal.
To obtain a performance benefit, block placement decisions must be accurate.
Hence, predictor-based designs are more effective at avoiding migration overhead.

Literature review

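Write intensity predictor [8]
Correlates the write intensity of a block with the address of the memory instruction that generated the miss.
The instruction address, along with a cost, is stored per LLC block.
A cost model is used to determine the write intensity of a block brought in by a miss:
$$C = N_r \times (E_r^{STT} - E_r^{S}) + N_w \times (E_w^{STT} - E_w^{S})$$
where $N_r$ and $N_w$ are the numbers of reads and writes to the block, and $E_r^{STT}$, $E_r^{S}$, $E_w^{STT}$, $E_w^{S}$ are the per-access read and write energies of STTRAM and SRAM.
A block is predicted to be write intensive if its cost is greater than a threshold.
The memory instruction address of a write-intensive block is used to index the prediction table, and the corresponding counter is incremented.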

[8] J. Ahn, S. Yoo, and K. Choi, "Write intensity prediction for energy-efficient non-volatile caches," in International Symposium on Low Power Electronics and Design (ISLPED), pp. 223–228, IEEE, 2013.
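The cost model and the PC-indexed prediction table can be sketched in Python as below. This is a minimal illustration, not the design from [8]: the integer per-access energies (in pJ), the 256-entry table indexed by instruction address modulo table size, the 2-bit saturating counters, the cost threshold of 3000, and the decrement for non-write-intensive blocks are all assumptions, and `WriteIntensityPredictor` is a hypothetical name.

```python
# Illustrative per-access energies in pJ (assumed values, not from [8])
E_R_STT, E_R_SRAM = 400, 300
E_W_STT, E_W_SRAM = 2000, 300

def block_cost(n_reads, n_writes):
    """C = Nr*(Er_STT - Er_S) + Nw*(Ew_STT - Ew_S): extra energy if kept in STTRAM."""
    return n_reads * (E_R_STT - E_R_SRAM) + n_writes * (E_W_STT - E_W_SRAM)

class WriteIntensityPredictor:
    """PC-indexed table of saturating counters (sizes and update rule assumed)."""

    def __init__(self, entries=256, counter_bits=2, cost_threshold=3000):
        self.entries = entries
        self.max_count = (1 << counter_bits) - 1
        self.cost_threshold = cost_threshold
        self.table = [0] * entries

    def _index(self, pc):
        return pc % self.entries

    def train(self, pc, n_reads, n_writes):
        """Update the counter of the missing instruction once the block's cost is known."""
        i = self._index(pc)
        if block_cost(n_reads, n_writes) > self.cost_threshold:
            self.table[i] = min(self.table[i] + 1, self.max_count)
        else:
            self.table[i] = max(self.table[i] - 1, 0)  # assumed decrement

    def predict_write_intensive(self, pc):
        """True means a block missed by this PC should be filled into SRAM."""
        return self.table[self._index(pc)] > self.max_count // 2

p = WriteIntensityPredictor()
for _ in range(2):
    p.train(0x400A10, n_reads=10, n_writes=2)  # cost 4400 > 3000
p.train(0x400B20, n_reads=5, n_writes=0)       # cost 500 <= 3000
print(p.predict_write_intensive(0x400A10), p.predict_write_intensive(0x400B20))
```

Using saturating counters rather than a single bit keeps one unusual block from flipping the prediction for every later miss by the same instruction.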
Literature review

Adding cost bits and instruction address bits to every LLC block incurs significant storage overhead.
The threshold is application specific.
This predictor was modified to tackle the above issues:
Prediction hybrid cache [9]
Samples a few sets and uses their metadata to train the prediction table.
Incorporates a dynamic threshold adjustment unit.

[9] J. Ahn, S. Yoo, and K. Choi, "Prediction hybrid cache: An energy-efficient STT-RAM cache architecture," IEEE Transactions on Computers, vol. 65, no. 3, pp. 940–951, 2015.
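The storage argument for set sampling can be made concrete with a little arithmetic. The sketch below assumes 64-byte lines (so the 2 MB, 16-way L3 used in the experiments has 2048 sets), a 16-bit hashed instruction address and an 8-bit cost field per block, and 64 evenly spaced sampled sets; all of these widths and the sampling pattern are illustrative assumptions. The dynamic threshold adjustment unit is not modelled.

```python
def metadata_bits(sets_with_metadata, ways, pc_bits, cost_bits):
    # Every block in a set that carries metadata stores a PC field and a cost field
    return sets_with_metadata * ways * (pc_bits + cost_bits)

def is_sampled(set_index, num_sets, sampled_sets):
    # Hypothetical sampling pattern: evenly spaced sets train the predictor
    stride = num_sets // sampled_sets
    return set_index % stride == 0

NUM_SETS, WAYS = 2048, 16   # 2 MB / (16 ways * 64 B) = 2048 sets
PC_BITS, COST_BITS = 16, 8  # assumed field widths
SAMPLED = 64                # assumed number of sampled sets

full = metadata_bits(NUM_SETS, WAYS, PC_BITS, COST_BITS)
sampled = metadata_bits(SAMPLED, WAYS, PC_BITS, COST_BITS)
print(f"all sets: {full // 8 // 1024} KiB, sampled: {sampled // 8 // 1024} KiB")
print(sum(is_sampled(s, NUM_SETS, SAMPLED) for s in range(NUM_SETS)))
```

Under these assumptions the per-block metadata shrinks from 96 KiB to 3 KiB, while the unsampled sets only need to query the prediction table.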
Literature review

Figure: Architecture of prediction hybrid cache [9]

On average, the write intensity predictor achieves an accuracy of 93%.

[9] J. Ahn, S. Yoo, and K. Choi, "Prediction hybrid cache: An energy-efficient STT-RAM cache architecture," IEEE Transactions on Computers, vol. 65, no. 3, pp. 940–951, 2015.
Literature review
Architecture level solutions for exclusive caches:

Figure: Exclusive cache [11]

Adaptive placement and migration policy for hybrid cache [10]: identifies write-intensive and dead blocks. Dead blocks are bypassed, and write-intensive blocks are placed in the SRAM region.
LAP-hybrid [11]: identifies loop blocks and gives them preference to stay in the STTRAM region.
[10] Wang, Zhe, et al. "Adaptive placement and migration policy for an STT-RAM-based hybrid cache." 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2014.
[11] H.-Y. Cheng, J. Zhao, J. Sampson, M. J. Irwin, A. Jaleel, Y. Lu, and Y. Xie, "LAP: Loop-block aware inclusion properties for energy-efficient asymmetric last level caches," in ACM SIGARCH Computer Architecture News, vol. 44, pp. 103–114, IEEE Press, 2016.
Literature review

Insights:
All optimizations aim at reducing the number of writes to the STTRAM region.
Due to the non-uniform intra-set write distribution, hybrid cache designs give a performance improvement and lower power dissipation for inclusive caches.
Hybrid cache designs assume a static partitioning of the ways of a set into SRAM and STTRAM types.
Architecture modifications in the cache are made to identify write-intensive blocks and place them in the SRAM region.
The works discussed take intra-set write imbalance into account.

Experimental results
Simulator setup:
Simulator: Sniper v7.2
Core: Nehalem, frequency = 2.66 GHz, issue width = 4, ROB size = 128
Caches are inclusive, write-back, and use the LRU replacement policy
Cache parameters used in simulation:

Cache parameter                L1-I   L1-D   L2    L3
Cache size (kB)                32     32     256   2048
Associativity                  4      8      8     16
Data access latency (cycles)   4      4      8     30

The data read latency of the STTRAM region is taken as 30 cycles, while its write latency is taken as 180 cycles [9].
[9] J. Ahn, S. Yoo, and K. Choi, "Prediction hybrid cache: An energy-efficient STT-RAM cache architecture," IEEE Transactions on Computers, vol. 65, no. 3, pp. 940–951, 2015.
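The cache table can be turned into set counts and address-bit splits with a short Python sketch. It assumes 64-byte cache lines, which the table does not state; under that assumption the 2 MB, 16-way L3 has 2048 sets (11 index bits), and the STTRAM write latency is six times its read latency.

```python
import math

LINE_BYTES = 64  # assumed line size; not stated in the table

# name: (size in kB, associativity, data access latency in cycles)
CACHES = {
    "L1-I": (32, 4, 4),
    "L1-D": (32, 8, 4),
    "L2": (256, 8, 8),
    "L3": (2048, 16, 30),
}

def geometry(size_kb, ways, line_bytes=LINE_BYTES):
    """Return (sets, index bits, offset bits) for a set-associative cache."""
    num_sets = size_kb * 1024 // (ways * line_bytes)
    return num_sets, int(math.log2(num_sets)), int(math.log2(line_bytes))

for name, (size_kb, ways, latency) in CACHES.items():
    sets, index_bits, offset_bits = geometry(size_kb, ways)
    print(f"{name}: {sets} sets, {index_bits} index bits, "
          f"{offset_bits} offset bits, {latency} cycles")

STTRAM_READ, STTRAM_WRITE = 30, 180  # cycles, from the setup above
print(f"STTRAM write/read latency ratio: {STTRAM_WRITE // STTRAM_READ}x")
```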
Experimental results
Experiment 1: compare the performance of hybrid and STTRAM LLC designs with a traditional LLC design.

Experimental results
Experiment 2: find the number of writes to each way of a set.
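One way to collect per-way write counts is to replay an access trace for a single set under LRU and count, for each physical way, the fills plus write hits that land in it. The sketch below is a hypothetical Python model, not the Sniper instrumentation used for the experiment, and treating every fill as a write to the way is an assumption.

```python
from collections import OrderedDict

def per_way_writes(accesses, ways):
    """Count writes (fills + write hits) to each physical way of one LRU set."""
    way_of = OrderedDict()  # tag -> way, ordered from LRU (first) to MRU (last)
    free_ways = list(range(ways))
    writes = [0] * ways
    for tag, is_write in accesses:
        if tag in way_of:
            way_of.move_to_end(tag)  # hit: tag becomes MRU
            if is_write:
                writes[way_of[tag]] += 1
        else:
            if free_ways:
                way = free_ways.pop(0)
            else:
                _, way = way_of.popitem(last=False)  # evict the LRU tag
            way_of[tag] = way  # new tag inserted as MRU
            writes[way] += 1   # assumed: the fill itself writes the way
    return writes

trace = [(1, False), (2, False), (1, True), (1, True), (3, False)]
print(per_way_writes(trace, ways=2))  # prints [3, 2]
```

Running the same counter over every set of the LLC gives both the per-way (intra-set) and per-set (inter-set) write distributions.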

Experimental results

Experimental results

There is also inter-set write imbalance: some sets experience a higher number of STTRAM writes than others.
The number of writes that a set can direct to its SRAM region is limited by static way partitioning.
More SRAM ways can be allocated to sets that experience more writes.
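A minimal sketch of what allocating more SRAM ways to write-heavy sets could look like, assuming a fixed budget of SRAM ways shared by a group of sets and measured per-set STTRAM write counts. The greedy rule (repeatedly give the next way to the set with the highest writes per allocated-ways-plus-one), the per-set minimum and maximum, and the function name are hypothetical illustrations, not the proposed design.

```python
import heapq

def allocate_sram_ways(sttram_writes, total_sram_ways, min_ways=1, max_ways=16):
    """Greedily split a shared SRAM way budget across sets by write pressure."""
    n = len(sttram_writes)
    spare = total_sram_ways - min_ways * n
    if spare < 0:
        raise ValueError("budget is smaller than the per-set minimum")
    alloc = [min_ways] * n
    # Max-heap (via negated keys) on writes per (allocated ways + 1)
    heap = [(-w / (min_ways + 1), i) for i, w in enumerate(sttram_writes)
            if min_ways < max_ways]
    heapq.heapify(heap)
    while spare > 0 and heap:
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        spare -= 1
        if alloc[i] < max_ways:
            heapq.heappush(heap, (-sttram_writes[i] / (alloc[i] + 1), i))
    return alloc

print(allocate_sram_ways([30, 20, 0], total_sram_ways=6))  # prints [3, 2, 1]
```

Dividing by allocated ways plus one gives diminishing returns, so a very hot set does not absorb the whole budget unless its write count dominates the group.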

Literature review

Adaptive line replacement [12] includes mechanisms to deal with both inter-set and intra-set write imbalance.
Intra-set write imbalance is mitigated using the approach of the RWHCA design.
Frequent data swapping incurs significant energy and yields minimal performance benefit.
Inter-set write imbalance is tackled by monitoring the STTRAM write count of each set.
8 sets whose indices differ only in the 3 MSBs of the set index form a merge group.

[12] Jadidi, Amin, Mohammad Arjomand, and Hamid Sarbazi-Azad. "High-endurance and performance-efficient design of hybrid cache architectures through adaptive line replacement." Proceedings of the 17th IEEE/ACM International Symposium on Low-power Electronics and Design. IEEE Press, 2011.
Literature review

The merge destination field holds the 3 MSBs of the coupled set.
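The merge-group arithmetic can be sketched as follows. This assumes the 3 MSBs are the top bits of the set index and that the LLC has 2048 sets (11 index bits, assuming 64-byte lines in the 2 MB, 16-way L3); the function names are hypothetical.

```python
GROUP_BITS = 3  # the 3 MSBs of the set index select the group member

def merge_group(set_index, index_bits):
    """All 8 sets that share the low (index_bits - 3) bits with set_index."""
    shift = index_bits - GROUP_BITS
    low = set_index & ((1 << shift) - 1)
    return [(msb << shift) | low for msb in range(1 << GROUP_BITS)]

def coupled_set(set_index, merge_dest, index_bits):
    """Rebuild the coupled set's index from a 3-bit merge destination field."""
    shift = index_bits - GROUP_BITS
    low = set_index & ((1 << shift) - 1)
    return (merge_dest << shift) | low

INDEX_BITS = 11  # 2048 sets under the assumptions above
print(merge_group(5, INDEX_BITS))         # [5, 261, 517, 773, 1029, 1285, 1541, 1797]
print(coupled_set(5, 0b010, INDEX_BITS))  # 517
```

Because members of a group share their low index bits, storing only the 3-bit destination is enough to locate the coupled set.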

Summary

Limitations of the adaptive line replacement technique:
Intra-set re-mapping
A simple counter-based approach is used to overcome intra-set write imbalance.
Multiple block swaps lead to additional power dissipation and slow down performance.
An efficient predictor can help avoid block swapping and provide a performance benefit.
Inter-set re-mapping
Tag search must be done in more than one set.
A replacement policy designed for a set-associative cache may not be beneficial.
Linking the tag and data structures through pointers, such that a tag remains in its native set, may tackle the above issues.
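The pointer-linking idea can be sketched as a tag array whose entries stay in their native set but point to a data frame that may physically belong to another set, so a lookup never searches more than one set's tags. This is a hypothetical Python illustration (the class names, the `(set, way)` frame pointer, and the one-tag-entry-per-way limit are all assumptions), not the proposed hardware design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Frame = Tuple[int, int]  # (physical set, way) holding the data

@dataclass
class TagEntry:
    tag: int
    frame: Frame  # may point into another set's data ways

@dataclass
class PointerLinkedCache:
    ways: int
    tags: Dict[int, List[TagEntry]] = field(default_factory=dict)

    def insert(self, set_index: int, tag: int, frame: Frame) -> None:
        entries = self.tags.setdefault(set_index, [])
        if len(entries) >= self.ways:
            raise RuntimeError("tag entries full; a victim must be evicted first")
        entries.append(TagEntry(tag, frame))

    def lookup(self, set_index: int, tag: int) -> Optional[Frame]:
        # Only the native set's tags are searched, wherever the data lives
        for entry in self.tags.get(set_index, []):
            if entry.tag == tag:
                return entry.frame
        return None

cache = PointerLinkedCache(ways=16)
cache.insert(5, 0xABC, frame=(517, 2))  # tag stays in set 5, data in set 517
print(cache.lookup(5, 0xABC), cache.lookup(517, 0xABC))  # (517, 2) None
```

Keeping tags in the native set preserves ordinary set-associative lookup and replacement over tags, while the pointer lets data occupy a frame elsewhere.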

Proposed approach

Static mapping

Proposed approach

Dynamic mapping

Future work

Simulate the proposed approach for the hybrid LLC in the Sniper simulator and observe the performance and power dissipation of single-core and multi-core systems for different workloads.
Combine the above implementation with a write intensity predictor and observe the performance and power dissipation of single-core and multi-core systems for different workloads.

Thanks

References

Carvalho, Carlos. "The gap between processor and memory speeds." Proc. of IEEE International Conference on Control and Automation. 2002.
P. Chi, S. Li, Y. Cheng, Y. Lu, S. H. Kang, and Y. Xie, "Architecture design with STT-RAM: Opportunities and challenges," in 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 109–114, IEEE, 2016.
Smullen, Clinton W., et al. "Relaxing non-volatility for fast and energy-efficient STT-RAM caches." 2011 IEEE 17th International Symposium on High Performance Computer Architecture. IEEE, 2011.
Zhou, Ping, et al. "Energy reduction for STT-RAM using early write termination." Proceedings of the 2009 International Conference on Computer-Aided Design. ACM, 2009.

References

Chen, Yu-Ting, et al. "Dynamically reconfigurable hybrid cache: An energy-efficient last-level cache design." Proceedings of the Conference on Design, Automation and Test in Europe. EDA Consortium, 2012.
Chen, Yu-Ting, et al. "Static and dynamic co-optimizations for blocks mapping in hybrid caches." Proceedings of the 2012 ACM/IEEE International Symposium on Low Power Electronics and Design. ACM, 2012.
J. Ahn, S. Yoo, and K. Choi, "Write intensity prediction for energy-efficient non-volatile caches," in International Symposium on Low Power Electronics and Design (ISLPED), pp. 223–228, IEEE, 2013.
J. Ahn, S. Yoo, and K. Choi, "Prediction hybrid cache: An energy-efficient STT-RAM cache architecture," IEEE Transactions on Computers, vol. 65, no. 3, pp. 940–951, 2015.

References
X. Wu, J. Li, L. Zhang, E. Speight, and Y. Xie, "Power and performance of read-write aware hybrid caches with non-volatile memories," in Proceedings of the Conference on Design, Automation and Test in Europe, pp. 737–742, European Design and Automation Association, 2009.
Wang, Zhe, et al. "Adaptive placement and migration policy for an STT-RAM-based hybrid cache." 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2014.
H.-Y. Cheng, J. Zhao, J. Sampson, M. J. Irwin, A. Jaleel, Y. Lu, and Y. Xie, "LAP: Loop-block aware inclusion properties for energy-efficient asymmetric last level caches," in ACM SIGARCH Computer Architecture News, vol. 44, pp. 103–114, IEEE Press, 2016.
Jadidi, Amin, Mohammad Arjomand, and Hamid Sarbazi-Azad. "High-endurance and performance-efficient design of hybrid cache architectures through adaptive line replacement." Proceedings of the 17th IEEE/ACM International Symposium on Low-power Electronics and Design. IEEE Press, 2011.
