Dynamic Way Partitioning of Hybrid Last Level Cache: Anushree Pendharkar

Dynamic Way Partitioning Of Hybrid Last Level Cache
Anushree Pendharkar
MTP Phase 1
Guide: Prof Virendra Singh
June 29, 2019
Anushree Pendharkar (IIT Bombay) Dynamic Way Partitioning Of Hybrid Last Level Cache June 29, 2019 1 / 29
Outline
Background
Motivation
Literature review
Experimental results
Summary
Proposed approach
Future work
Background
Figure: Performance gap between processor and memory 1
One way to bridge the gap is to increase Last Level Cache (LLC) size.
1
C. Carvalho, "The gap between processor and memory speeds," in Proc. of IEEE International Conference on Control
and Automation, 2002.
Motivation
Problems with LLC made of SRAM :-

SRAM consumes significant static power and occupies significant
area.
Transistor leakage current increases further as technology scales
deeper into the nanometer regime.
Need to look for new memory type with :-
Negligible leakage power dissipation
Access time comparable to SRAM
Compatibility with CMOS
High density and scalability
Motivation
STTRAM is a potential alternative to SRAM. However it suffers from :-

High write energy and write latency.
Limited write endurance ($10^{12}$ - $10^{15}$ writes per cell) 2
Problem statement :- Design a last level cache using STTRAM with fewer
writes to it.
2
P. Chi, S. Li, Y. Cheng, Y. Lu, S. H. Kang, and Y. Xie, "Architecture design with STT-RAM: Opportunities and challenges," in
2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 109-114, IEEE, 2016.
Literature review
Two categories of solutions :-

Circuit level solutions
Architecture level solutions.
Circuit level solutions :-
Reducing retention time of STTRAM cells 3
(requires refresh hardware)

Early write termination 4
These works do not consider the non-uniform distribution of writes
across the cache.
3
C. W. Smullen et al., "Relaxing non-volatility for fast and energy-efficient STT-RAM caches," in 2011 IEEE 17th
International Symposium on High Performance Computer Architecture, IEEE, 2011.
4
P. Zhou et al., "Energy reduction for STT-RAM using early write termination," in Proceedings of the 2009 International
Conference on Computer-Aided Design, ACM, 2009.
Literature review
Architecture level solutions for inclusive cache :-

Partition a few ways of each set as SRAM and the remaining ways as
STTRAM, forming a hybrid cache.
Identify frequently modified (write-intensive) blocks and place them
in the SRAM region.
A simple design such as Read Write Aware Hybrid Cache Architecture
5 (RWHCA) places a block in the SRAM region on a store miss and in
the STTRAM region on a load miss.
Blocks brought in after a load miss may also become write intensive.
Hence, the RWHCA approach incurs significant migration overhead.
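The RWHCA placement rule and the migration problem it creates can be illustrated with a minimal sketch. This is not the implementation from the cited work: the cache is modelled with unlimited capacity, and the `migrate_after_writes` parameter (the number of STTRAM write hits that triggers a migration to SRAM) is a hypothetical value chosen only for illustration.

```python
from enum import Enum


class Region(Enum):
    SRAM = "SRAM"
    STTRAM = "STTRAM"


def rwhca_place_on_miss(is_store: bool) -> Region:
    """RWHCA rule from the slide: store miss -> SRAM, load miss -> STTRAM."""
    return Region.SRAM if is_store else Region.STTRAM


def simulate(accesses, migrate_after_writes=2):
    """Replay (block_id, is_store) accesses on an idealized hybrid cache.

    Assumptions (not from the slides): no evictions, and a block is migrated
    from STTRAM to SRAM after `migrate_after_writes` write hits.
    Returns the final region of each block and the number of migrations.
    """
    region = {}
    stt_writes = {}
    migrations = 0
    for block, is_store in accesses:
        if block not in region:  # miss: apply the RWHCA placement rule
            region[block] = rwhca_place_on_miss(is_store)
            stt_writes[block] = 0
            continue
        if is_store and region[block] is Region.STTRAM:
            stt_writes[block] += 1
            if stt_writes[block] >= migrate_after_writes:
                region[block] = Region.SRAM  # costly block migration
                migrations += 1
    return region, migrations


# Block "A" arrives on a load miss, then turns out to be written repeatedly.
trace = [("A", False), ("A", True), ("A", True), ("B", True)]
final_region, migrations = simulate(trace)
```

In this trace, block A is first placed in STTRAM and later has to be migrated, which is the overhead the slide refers to; block B arrives on a store miss and goes to SRAM directly.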
5
X. Wu, J. Li, L. Zhang, E. Speight, and Y. Xie, "Power and performance of read-write aware hybrid caches with
non-volatile memories," in Proceedings of the Conference on Design, Automation and Test in Europe, pp. 737-742, European
Design and Automation Association, 2009.
Literature review
Dynamically reconfigurable hybrid cache 6 aims at reducing power
dissipation by power gating a way of the set.
It identifies the underutilized set using a per-set counter.
The technique reduces power dissipation but also leads to
performance loss.
Static and dynamic co-optimizations for mapping blocks in
hybrid cache 7 is another approach, assisted by the compiler.
The compiler generates hints about the write frequency of each block.
6
Y.-T. Chen et al., "Dynamically reconfigurable hybrid cache: An energy-efficient last-level cache design," in Proceedings
of the Conference on Design, Automation and Test in Europe, EDA Consortium, 2012.
7
Y.-T. Chen et al., "Static and dynamic co-optimizations for blocks mapping in hybrid caches," in Proceedings of the 2012
ACM/IEEE International Symposium on Low Power Electronics and Design, ACM, 2012.
Literature review
Capacity assessment hardware, together with compiler-generated hints,
guides block placement.
An inaccurate block placement mechanism leads to block swapping.
Frequent block swapping consumes significant power, and the
performance benefit is minimal.
To obtain a performance benefit, the accuracy of block placement
decisions is crucial.
Hence, predictor-based designs are more effective at avoiding
migration overhead.
Literature review
Write intensity predictor 8
Correlates the write intensity of a block with the address of the
memory instruction that generates the miss.
The instruction address, along with the cost, is stored per LLC cache block.
A cost model is used to determine the write intensity of a block after a miss :-
$C = N_r \times (E_{r,STT} - E_{r,S}) + N_w \times (E_{w,STT} - E_{w,S})$
where $N_r$ and $N_w$ are the read and write counts of the block, and
$E_r$, $E_w$ are the per-access read and write energies of the STTRAM
(STT) and SRAM (S) regions.
A block is predicted to be write intensive if its cost is greater than
a threshold.
The memory instruction address of a write-intensive block is used to
index the prediction table.
The corresponding counter value is incremented.
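The cost model and the table update described above can be sketched as follows. This is a simplified illustration, not the predictor from the cited paper: the energy values, table size, counter width, indexing by modulo, and the rule "predict write intensive when the counter is non-zero" are all assumptions made for the example. Only the increment step is modelled, since that is the only update the slide describes.

```python
# Hypothetical per-access energies in picojoules (illustrative values only).
E_READ_STT, E_READ_SRAM = 400, 500
E_WRITE_STT, E_WRITE_SRAM = 2000, 500


def block_cost(n_reads: int, n_writes: int) -> int:
    """C = Nr * (Er_STT - Er_S) + Nw * (Ew_STT - Ew_S)."""
    return (n_reads * (E_READ_STT - E_READ_SRAM)
            + n_writes * (E_WRITE_STT - E_WRITE_SRAM))


class WriteIntensityPredictor:
    """Prediction table indexed by the address of the missing instruction."""

    def __init__(self, threshold: int, table_size: int = 1024,
                 counter_max: int = 3):
        self.threshold = threshold      # application specific per the slides
        self.table_size = table_size    # assumed size
        self.counter_max = counter_max  # assumed 2-bit saturating counters
        self.table = [0] * table_size

    def _index(self, pc: int) -> int:
        return pc % self.table_size     # assumed indexing function

    def train(self, pc: int, n_reads: int, n_writes: int) -> None:
        """Increment the counter if the block turned out write intensive."""
        if block_cost(n_reads, n_writes) > self.threshold:
            i = self._index(pc)
            self.table[i] = min(self.table[i] + 1, self.counter_max)

    def predict_write_intensive(self, pc: int) -> bool:
        """Assumed decision rule: any non-zero counter means write intensive."""
        return self.table[self._index(pc)] > 0


predictor = WriteIntensityPredictor(threshold=5000)
predictor.train(pc=0x400123, n_reads=10, n_writes=5)  # cost 6500 > threshold
predictor.train(pc=0x500000, n_reads=10, n_writes=1)  # cost 500, no update
```

A block fetched later by the instruction at 0x400123 would be steered to the SRAM region on a miss, while blocks fetched by 0x500000 would go to STTRAM.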
8
J. Ahn, S. Yoo, and K. Choi, "Write intensity prediction for energy-efficient non-volatile caches," in International
Symposium on Low Power Electronics and Design (ISLPED), pp. 223-228, IEEE, 2013.
Literature review
Adding cost bits and instruction address bits per LLC block incurs
significant storage overhead.
The threshold is application specific.
The predictor is modified to tackle the above issues.
Prediction hybrid cache 9
Samples a few sets and uses their metadata to train the prediction table.
Incorporates a dynamic threshold adjustment unit.
9
J. Ahn, S. Yoo, and K. Choi, "Prediction hybrid cache: An energy-efficient STT-RAM cache architecture," IEEE Transactions
on Computers, vol. 65, no. 3, pp. 940-951, 2015.
Literature review
Figure: Architecture of prediction hybrid cache 9
On average, the write intensity predictor achieves an accuracy of 93%.
Literature review
Architecture level solutions for exclusive cache :-
Figure: Exclusive cache 11
Adaptive placement and migration based policy for hybrid
cache 10 :- Based on identifying write-intensive and dead blocks. Dead
blocks are bypassed; write-intensive blocks are placed in the SRAM region.
LAP-hybrid 11 :- Based on identifying loop blocks and giving them
preference to stay in the STTRAM region.
10
Z. Wang et al., "Adaptive placement and migration policy for an STT-RAM-based hybrid cache," in 2014 IEEE 20th
International Symposium on High Performance Computer Architecture (HPCA), IEEE, 2014.
11
H.-Y. Cheng, J. Zhao, J. Sampson, M. J. Irwin, A. Jaleel, Y. Lu, and Y. Xie, "LAP: Loop-block aware inclusion properties for
energy-efficient asymmetric last level caches," in ACM SIGARCH Computer Architecture News, vol. 44, pp. 103-114, IEEE Press, 2016.
Literature review
Insights:-
All optimizations aim at reducing the number of writes to the STTRAM
region.
Due to non-uniform intra-set write distribution, hybrid cache designs
give performance improvement and lower power dissipation for
inclusive caches.
Hybrid cache designs assume static partitioning of the ways in a set
into SRAM and STTRAM types.
Architectural modifications to the cache identify
write-intensive blocks and place them in the SRAM region.
The works discussed take intra-set write imbalance into account.
Simulator setup :-
Simulator :- SNIPER v7.2
Core is Nehalem, frequency = 2.66 GHz, issue width = 4, ROB size = 128
Caches are inclusive, write-back, and use the LRU replacement policy
Cache parameters used in simulation :-
Cache parameter                 L1-I   L1-D   L2    L3
Cache size (kB)                 32     32     256   2048
Associativity                   4      8      8     16
Data access latency (cycles)    4      4      8     30
Data read latency of the STTRAM region is taken as 30 cycles, while its
write latency is taken as 180 cycles 9.
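As a quick sanity check on these parameters, the number of sets per cache level follows from size / (associativity x block size). The sketch below assumes a 64-byte cache block, which is not stated in the slides; the dictionary layout and names are only illustrative.

```python
BLOCK_SIZE_BYTES = 64  # assumption: block size is not given in the slides

# Values copied from the cache parameter table.
CACHES = {
    "L1-I": {"size_kb": 32, "assoc": 4, "latency": 4},
    "L1-D": {"size_kb": 32, "assoc": 8, "latency": 4},
    "L2": {"size_kb": 256, "assoc": 8, "latency": 8},
    "L3": {"size_kb": 2048, "assoc": 16, "latency": 30},
}

STT_READ_CYCLES = 30
STT_WRITE_CYCLES = 180


def num_sets(size_kb: int, assoc: int,
             block_size: int = BLOCK_SIZE_BYTES) -> int:
    """Number of sets = cache size / (associativity * block size)."""
    return size_kb * 1024 // (assoc * block_size)


sets = {name: num_sets(c["size_kb"], c["assoc"]) for name, c in CACHES.items()}
write_penalty = STT_WRITE_CYCLES // STT_READ_CYCLES  # STTRAM write vs read
```

Under the 64-byte assumption the L3 has 2048 sets of 16 ways each, and an STTRAM write costs six times as many cycles as a read, which is why steering writes away from the STTRAM ways matters.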
Experiment 1 :- To compare the performance of hybrid and STTRAM
LLC designs against a traditional LLC design
Experiment 2 :- To find the number of writes to each way of a set.
There is inter-set write imbalance too, as some sets experience a higher
number of STTRAM writes than others.
The number of writes that the SRAM region of a particular set can
absorb is limited by static way partitioning.
More SRAM ways can be allocated to a set that experiences more
writes.
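One way to observe the inter-set imbalance is to tally LLC writes per set from an address trace. The following is a minimal sketch, not the instrumentation used in the experiments: the set count and block size are assumptions (2048 sets, 64-byte blocks), the index function is a plain modulo, and `imbalance` is a hypothetical metric defined here as the busiest set's writes divided by the mean over all sets.

```python
from collections import Counter

NUM_SETS = 2048   # assumption: 2 MB, 16-way LLC with 64-byte blocks
BLOCK_SIZE = 64   # assumption: block size is not stated in the slides


def set_index(addr: int) -> int:
    """Map a byte address to its LLC set (simple modulo indexing)."""
    return (addr // BLOCK_SIZE) % NUM_SETS


def writes_per_set(write_addrs) -> Counter:
    """Count how many writes land in each set."""
    return Counter(set_index(a) for a in write_addrs)


def imbalance(counts: Counter) -> float:
    """Busiest set's write count divided by the mean over all sets."""
    mean = sum(counts.values()) / NUM_SETS
    return max(counts.values()) / mean if counts else 0.0


# Tiny hypothetical trace: three writes map to set 0, one to set 1.
trace = [0, 0, 64, 64 * NUM_SETS]
counts = writes_per_set(trace)
hottest = counts.most_common(1)
```

Sets that appear at the top of `counts.most_common()` are the candidates that would benefit from additional SRAM ways.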
Literature review
The adaptive line replacement 12 technique includes mechanisms to deal
with inter-set and intra-set write imbalance.
Intra-set write imbalance is mitigated using the approach of the RWHCA
design.
Frequent data swapping consumes significant energy and yields
minimal performance benefit.
Inter-set write imbalance is tackled by monitoring the STTRAM write
count of each set.
The 8 sets whose indices differ only in the 3 MSBs form a merge group.
12
A. Jadidi, M. Arjomand, and H. Sarbazi-Azad, "High-endurance and performance-efficient design of
hybrid cache architectures through adaptive line replacement," in Proceedings of the 17th IEEE/ACM International Symposium on
Low-Power Electronics and Design, IEEE Press, 2011.
Literature review
The merge destination field holds the 3 MSBs of the coupled set.
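The merge group structure can be made concrete with a little bit arithmetic. The sketch below assumes an 11-bit set index (2048 sets), which is not a value taken from the cited paper; the function names are hypothetical.

```python
INDEX_BITS = 11  # assumption: 2048 sets
GROUP_BITS = 3   # the 3 MSBs of the index select a set within its group
LOW_BITS = INDEX_BITS - GROUP_BITS


def merge_group(set_idx: int) -> list:
    """All 8 sets that share the low index bits of `set_idx`."""
    low = set_idx & ((1 << LOW_BITS) - 1)
    return [(msb << LOW_BITS) | low for msb in range(1 << GROUP_BITS)]


def merge_destination_field(coupled_set: int) -> int:
    """3-bit value stored in a set to name the set it is coupled with."""
    return coupled_set >> LOW_BITS


def coupled_set_of(set_idx: int, field: int) -> int:
    """Rebuild the coupled set index from the stored 3-bit field."""
    low = set_idx & ((1 << LOW_BITS) - 1)
    return (field << LOW_BITS) | low


group = merge_group(5)
field = merge_destination_field(group[-1])
```

Because all members of a group share their low index bits, three bits per set are enough to name the coupled set, which keeps the storage overhead of the merge destination field small.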
Summary
Limitations of adaptive line replacement technique :-
Intra-set re-mapping
A simple counter-based approach is used to overcome intra-set write
imbalance.
Multiple block swaps lead to additional power dissipation and
degrade performance.
An efficient predictor can help avoid block swapping and
provide a performance benefit.
Inter-set re-mapping
Tag search needs to be done in more than one set.
A replacement policy designed for a set-associative cache may not be
beneficial.
Linking the tag and data structures through pointers, such that the tag
remains in its native set, may tackle the above issues.
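The last point, keeping the tag in its native set while letting the data live elsewhere, can be pictured with a small data-structure sketch. This is only one possible reading of the idea, not the design proposed in the slides: the class and field names are hypothetical, and capacity limits and replacement are not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TagEntry:
    tag: int
    data_set: int  # set whose data array actually holds the block
    data_way: int  # way within that data array


@dataclass
class NativeSetTags:
    """Tag store of one set; every block's tag stays in its native set."""
    entries: List[TagEntry] = field(default_factory=list)

    def insert(self, tag: int, data_set: int, data_way: int) -> None:
        self.entries.append(TagEntry(tag, data_set, data_way))

    def lookup(self, tag: int) -> Optional[Tuple[int, int]]:
        """Search only the native set; follow the pointer on a hit."""
        for entry in self.entries:
            if entry.tag == tag:
                return (entry.data_set, entry.data_way)
        return None


# The tag 0xAB lives in this set, but its data was placed in set 7, way 2.
native = NativeSetTags()
native.insert(0xAB, data_set=7, data_way=2)
location = native.lookup(0xAB)
```

With this arrangement a lookup still probes a single set, which addresses the multi-set tag search listed above, at the cost of storing a pointer with every tag.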
Proposed approach
Static mapping
Proposed approach
Dynamic mapping
Future work
Simulating the proposed approach for a hybrid LLC in the Sniper simulator and
observing the performance and power dissipation of single-core and
multi-core systems for different workloads.
Combining the above implementation with a write intensity predictor and
observing the performance and power dissipation of single-core and
multi-core systems for different workloads.
Thanks
References
C. Carvalho, "The gap between processor and memory speeds," in
Proc. of IEEE International Conference on Control and Automation,
2002.
P. Chi, S. Li, Y. Cheng, Y. Lu, S. H. Kang, and Y. Xie, "Architecture
design with STT-RAM: Opportunities and challenges," in 2016 21st Asia
and South Pacific Design Automation Conference (ASP-DAC), pp.
109-114, IEEE, 2016.
C. W. Smullen et al., "Relaxing non-volatility for fast and
energy-efficient STT-RAM caches," in 2011 IEEE 17th International
Symposium on High Performance Computer Architecture, IEEE,
2011.
P. Zhou et al., "Energy reduction for STT-RAM using early write
termination," in Proceedings of the 2009 International Conference on
Computer-Aided Design, ACM, 2009.
References
Y.-T. Chen et al., "Dynamically reconfigurable hybrid cache: An
energy-efficient last-level cache design," in Proceedings of the Conference
on Design, Automation and Test in Europe, EDA Consortium, 2012.
Y.-T. Chen et al., "Static and dynamic co-optimizations for
blocks mapping in hybrid caches," in Proceedings of the 2012
ACM/IEEE International Symposium on Low Power Electronics and
Design, ACM, 2012.
J. Ahn, S. Yoo, and K. Choi, "Write intensity prediction for
energy-efficient non-volatile caches," in International Symposium on
Low Power Electronics and Design (ISLPED), pp. 223-228, IEEE,
2013.
J. Ahn, S. Yoo, and K. Choi, "Prediction hybrid cache: An
energy-efficient STT-RAM cache architecture," IEEE Transactions on
Computers, vol. 65, no. 3, pp. 940-951, 2015.
References
X. Wu, J. Li, L. Zhang, E. Speight, and Y. Xie, "Power and
performance of read-write aware hybrid caches with non-volatile
memories," in Proceedings of the Conference on Design, Automation
and Test in Europe, pp. 737-742, European Design and Automation
Association, 2009.
Z. Wang et al., "Adaptive placement and migration policy for an
STT-RAM-based hybrid cache," in 2014 IEEE 20th International
Symposium on High Performance Computer Architecture (HPCA),
IEEE, 2014.
H.-Y. Cheng, J. Zhao, J. Sampson, M. J. Irwin, A. Jaleel, Y. Lu, and
Y. Xie, "LAP: Loop-block aware inclusion properties for energy-efficient
asymmetric last level caches," in ACM SIGARCH Computer
Architecture News, vol. 44, pp. 103-114, IEEE Press, 2016.
A. Jadidi, M. Arjomand, and H. Sarbazi-Azad,
"High-endurance and performance-efficient design of hybrid cache
architectures through adaptive line replacement," in Proceedings of the
17th IEEE/ACM International Symposium on Low-Power Electronics
and Design, IEEE Press, 2011.
