
Who Cares About the Memory Hierarchy?

[Figure: Processor–DRAM performance gap — µProc (CPU) performance improves ~60%/yr while DRAM performance improves only ~7%/yr, so the gap grows every year]
Way Predicting Caches
• Use processor address to index into way prediction table
• Look in predicted way at given index, then:

• HIT in predicted way: return copy of data from cache (fast hit)
• MISS in predicted way: look in the other way
  – HIT there: SLOW HIT — return the data and change the entry in the prediction table
  – MISS in both ways: read block of data from next level of cache
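The lookup flow above can be sketched in C. This is a minimal sketch for a toy 2-way set-associative cache with a per-set prediction table; all names and sizes (Set, lookup, way_pred, NSETS) are illustrative assumptions, not the slides' design:

```c
#include <assert.h>
#include <stdbool.h>

#define NSETS 4

typedef struct {
    bool valid[2];        /* one valid bit per way */
    unsigned tag[2];      /* one tag per way       */
} Set;

static Set sets[NSETS];
static unsigned way_pred[NSETS];   /* predicted way per set (assumed structure) */

typedef enum { FAST_HIT, SLOW_HIT, MISS } Result;

static Result lookup(unsigned addr) {
    unsigned idx = addr % NSETS;   /* processor address indexes the table */
    unsigned tag = addr / NSETS;
    unsigned p = way_pred[idx];

    /* Probe the predicted way first: a hit here costs no extra time. */
    if (sets[idx].valid[p] && sets[idx].tag[p] == tag)
        return FAST_HIT;

    /* Miss in the predicted way: look in the other way. A hit there is
       a slow hit, and the prediction-table entry is changed. */
    unsigned o = 1u - p;
    if (sets[idx].valid[o] && sets[idx].tag[o] == tag) {
        way_pred[idx] = o;
        return SLOW_HIT;
    }

    /* Miss in both ways: read the block from the next cache level,
       modeled here as simply filling the predicted way. */
    sets[idx].valid[p] = true;
    sets[idx].tag[p] = tag;
    return MISS;
}
```

A fast hit behaves like a direct-mapped access; the prediction table trades an occasional slow hit for that common-case latency.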
Merging Arrays
/* Before: two parallel arrays */
int val[SIZE];
int key[SIZE];

for (i=0; i<SIZE; i++){
  key[i] = newkey;
  val[i]++;
}

/* After: one array of structs */
struct record{
  int val;
  int key;
};
struct record records[SIZE];

for (i=0; i<SIZE; i++){
  records[i].key = newkey;
  records[i].val++;
}

• Reduces conflicts between val & key and improves spatial locality
Loop Fusion
/* Before: two separate loop nests */
for (i = 0; i < N; i++)
  for (j = 0; j < N; j++)
    a[i][j] = 1/b[i][j] * c[i][j];
for (i = 0; i < N; i++)
  for (j = 0; j < N; j++)
    d[i][j] = a[i][j] + c[i][j];

/* After: fused loops */
for (i = 0; i < N; i++)
  for (j = 0; j < N; j++){
    a[i][j] = 1/b[i][j] * c[i][j];
    d[i][j] = a[i][j] + c[i][j];   /* reference to a[i][j] can be directly to register */
  }

Split loops: every access to a and c can miss. Fused loops: only the first access misses, since a[i][j] and c[i][j] are reused immediately. Improves temporal locality.
Summary of Compiler Optimizations to Reduce Cache Misses
[Figure: bar chart of performance improvement (roughly 1x to 3x) from merged arrays, loop interchange, loop fusion, and blocking, across the benchmarks vpenta (nasa7), gmty (nasa7), tomcatv, btrix (nasa7), mxm (nasa7), spice, cholesky (nasa7), and compress]


Translation Lookaside Buffer
[Figure: TLB organization — 256 entries, 28-bit tags and physical page numbers, supporting page sizes from 8KB to 4MB]