Hierarchy?
[Figure: performance growth of µProc (CPU), 60%/yr., versus DRAM, 7%/yr.]
Way Predicting Caches
• Use processor address to index into way prediction table
• Look in predicted way at given index, then:
    • HIT: done
    • MISS: look in the other ways, then:
        • SLOW HIT (change entry in prediction table)
        • MISS: read block of data from next level of cache
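The lookup sequence above can be sketched as a small simulator. This is only an illustration under stated assumptions: the geometry (64 sets, 2 ways, 64-byte blocks), the names `wp_cache` and `wp_access`, indexing the prediction table by set, and filling the non-predicted way on a miss are all hypothetical choices, not taken from the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS   64   /* hypothetical geometry */
#define NUM_WAYS   2
#define BLOCK_BITS 6    /* 64-byte blocks (assumption) */

typedef enum { FAST_HIT, SLOW_HIT, MISS } access_result;

typedef struct {
    bool     valid[NUM_SETS][NUM_WAYS];
    uint32_t tag[NUM_SETS][NUM_WAYS];
    uint8_t  predicted_way[NUM_SETS];   /* way prediction table */
} wp_cache;

/* Simulate one access; only tags are tracked, no data is stored. */
access_result wp_access(wp_cache *c, uint32_t addr)
{
    uint32_t block = addr >> BLOCK_BITS;
    uint32_t set   = block % NUM_SETS;
    uint32_t tag   = block / NUM_SETS;
    uint8_t  guess = c->predicted_way[set];
    assert(guess < NUM_WAYS);

    /* Step 1: look only in the predicted way. */
    if (c->valid[set][guess] && c->tag[set][guess] == tag)
        return FAST_HIT;

    /* Step 2: look in the remaining ways; on a slow hit, change the
       entry in the prediction table. */
    for (uint8_t w = 0; w < NUM_WAYS; w++) {
        if (w != guess && c->valid[set][w] && c->tag[set][w] == tag) {
            c->predicted_way[set] = w;
            return SLOW_HIT;
        }
    }

    /* Step 3: miss in every way. Model reading the block from the next
       level by filling a victim way (assumed policy: the way after the
       predicted one) and predicting that way next time. */
    uint8_t victim = (uint8_t)((guess + 1) % NUM_WAYS);
    c->valid[set][victim]  = true;
    c->tag[set][victim]    = tag;
    c->predicted_way[set]  = victim;
    return MISS;
}
```

Two addresses that map to the same set alternate between slow hits, because each slow hit redirects the prediction to the other way; repeated accesses to one address are fast hits.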
Merging Arrays
Before (two separate arrays):

    int val[SIZE];
    int key[SIZE];

    for (i=0; i<SIZE; i++){
        key[i] = newkey;
        val[i]++;
    }

After (one array of structs):

    struct record{
        int val;
        int key;
    };
    struct record records[SIZE];

    for (i=0; i<SIZE; i++){
        records[i].key = newkey;
        records[i].val++;
    }
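To make the layout difference concrete, the sketch below measures how far apart `val[i]` and `key[i]` sit in each version. It assumes `SIZE` is 1024, and it wraps the two separate arrays in a hypothetical `struct split_arrays` purely so that `offsetof` gives a well-defined distance; the helper names are illustrative.

```c
#include <stddef.h>

#define SIZE 1024   /* hypothetical array length */

/* Split layout: two arrays back to back (wrapped in a struct only so
   that offsets can be compared portably). */
struct split_arrays {
    int val[SIZE];
    int key[SIZE];
};

/* Merged layout from the slide: one record holds both fields. */
struct record {
    int val;
    int key;
};

/* Bytes between val[i] and key[i] in the split layout (same for all i). */
size_t split_distance(void)
{
    return offsetof(struct split_arrays, key)
         - offsetof(struct split_arrays, val);
}

/* Bytes between records[i].val and records[i].key in the merged layout. */
size_t merged_distance(void)
{
    return offsetof(struct record, key) - offsetof(struct record, val);
}

/* The merged loop: both fields of a record are touched together, so
   they normally share a cache block. */
void update_merged(struct record *records, size_t n, int newkey)
{
    for (size_t i = 0; i < n; i++) {
        records[i].key = newkey;
        records[i].val++;
    }
}
```

In the split layout the two fields for one index are a whole array apart and so fall in different cache blocks, while in the merged layout they are adjacent.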
Split loops: every access to a and c misses. Fused loops: only the first
access misses. Fusion improves temporal locality.
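The loops this remark refers to did not survive in the text, so the following is a reconstruction under assumptions: it follows the common textbook form of loop fusion, where arrays `b` and `d`, the loop bodies, and the size `N` are hypothetical. Only the roles of `a` and `c` come from the text above.

```c
#define N 64   /* hypothetical matrix dimension */

/* Split version: the second loop nest rereads a and c after the first
   nest has swept the whole matrices, so for large N the blocks may
   already have been evicted. */
void split_loops(double a[N][N], double b[N][N],
                 double c[N][N], double d[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = 1.0 / b[i][j] * c[i][j];

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            d[i][j] = a[i][j] + c[i][j];
}

/* Fused version: a[i][j] and c[i][j] are reused immediately, while
   their blocks are still in the cache. */
void fused_loops(double a[N][N], double b[N][N],
                 double c[N][N], double d[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = 1.0 / b[i][j] * c[i][j];
            d[i][j] = a[i][j] + c[i][j];
        }
}
```

Both functions compute the same results; only the order of memory accesses differs, which is what lets the fused form reuse `a` and `c` before they leave the cache.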
Summary of Compiler Optimizations to Reduce Cache Misses
[Figure: bar chart of performance improvement (axis from 1 to 3) for the benchmarks vpenta (nasa7), gmty (nasa7), tomcatv, btrix (nasa7), mxm (nasa7), spice, cholesky (nasa7), and compress]
[Figure: diagram labels only; recoverable text: "28" (three times), "256 Entries", "4MB"]