Ray Seyfarth
© 2011 Ray Seyfarth
Outline
Use C or C++
[Figure: seconds to process 80 GB (y-axis) versus array size in bytes (x-axis, 10000 to 1e+10)]
[Figure: MFLOPS (y-axis) versus block size (x-axis)]
Strength reduction
Unroll loops
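.add_words:
        add     rax, [rdi]      ; four independent partial sums
        add     rbx, [rdi+8]
        add     rcx, [rdi+16]
        add     rdx, [rdi+24]
        add     rdi, 32         ; advance past 4 quad-words
        sub     rsi, 4          ; 4 fewer words remain
        jg      .add_words
        add     rcx, rdx        ; combine the partial sums
        add     rax, rbx
        add     rax, rcx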
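The same unrolling idea can be sketched in C: four independent partial sums are accumulated per iteration and combined after the loop, mirroring the register pattern in the listing above. The function name add_array_unrolled is hypothetical, and the sketch assumes the element count is a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum n 64-bit words, unrolled 4 times.
   Assumption for this sketch: n is a multiple of 4. */
int64_t add_array_unrolled(const int64_t *a, size_t n)
{
    assert(n % 4 == 0);

    /* Four accumulators, so the additions do not depend on each other. */
    int64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;

    for (size_t i = 0; i < n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    /* Combine the partial sums once, after the loop. */
    return (s0 + s1) + (s2 + s3);
}
```

Keeping the sums separate lets the CPU overlap the additions instead of waiting on a single accumulator. A general version would need a short cleanup loop for the leftover 1 to 3 elements.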
Merge loops
If 2 loops have the same loop limits, consider merging the bodies
There will be less loop overhead
The following 2 loops can be profitably merged
for ( i = 0; i < 1000; i++ ) a[i] = b[i] + c[i];
for ( j = 0; j < 1000; j++ ) d[j] = b[j] - c[j];
After merging, the values loaded for b[i] and c[i] can each be used twice
for ( i = 0; i < 1000; i++ ) {
    a[i] = b[i] + c[i];
    d[i] = b[i] - c[i];
}
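A minimal C sketch of the merge, assuming the output arrays do not overlap the inputs. The function names sum_diff_separate and sum_diff_merged are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Two passes: b[i] and c[i] are loaded once in each loop. */
void sum_diff_separate(double *a, double *d,
                       const double *b, const double *c, size_t n)
{
    for (size_t i = 0; i < n; i++) a[i] = b[i] + c[i];
    for (size_t j = 0; j < n; j++) d[j] = b[j] - c[j];
}

/* One pass: each b[i] and c[i] is loaded once and used twice. */
void sum_diff_merged(double *a, double *d,
                     const double *b, const double *c, size_t n)
{
    /* Partial check of the no-overlap assumption made by this sketch. */
    assert(a != b && a != c && d != b && d != c);

    for (size_t i = 0; i < n; i++) {
        double bi = b[i];
        double ci = c[i];
        a[i] = bi + ci;
        d[i] = bi - ci;
    }
}
```

Both functions produce the same a and d under that assumption. The merged form pays the loop overhead once and reads each input element once.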
Split loops
Interchange loops
for ( j = 0; j < n; j++ ) {
    for ( i = 0; i < n; i++ ) {
        x[i][j] = 0;
    }
}
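The previous loop steps through the x array in large increments: C stores arrays in row-major order, so consecutive accesses are a full row apart
The loop below steps through the array one element after the other
Each cache line fetched is fully used before the next one is needed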
for ( i = 0; i < n; i++ ) {
    for ( j = 0; j < n; j++ ) {
        x[i][j] = 0;
    }
}
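The difference between the two loop orders comes down to the stride between consecutive accesses. The sketch below makes that explicit for an n by n array stored row-major in a flat buffer. The helper row_major_index and both function names are hypothetical, introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Element offset of x[i][j] in a row-major n-by-n array. */
size_t row_major_index(size_t i, size_t j, size_t n)
{
    assert(i < n && j < n);
    return i * n + j;
}

/* Inner loop over i: consecutive accesses are n elements apart. */
void zero_column_order(double *x, size_t n)
{
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            x[row_major_index(i, j, n)] = 0.0;
}

/* Inner loop over j: consecutive accesses are adjacent in memory. */
void zero_row_order(double *x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            x[row_major_index(i, j, n)] = 0.0;
}
```

Both functions leave the array in the same state. Only the order of the memory accesses differs, and that order is what determines how well each fetched cache line is used.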
64 Bit Intel Assembly Language
Remove recursion
Inline functions
The compiler will have a harder time using SIMD instructions than you will
There are SIMD integer instructions
There are also SIMD floating point instructions
The AVX instructions are a new feature that widens the SIMD registers from 128 to 256 bits, so each register holds twice as many floating point values
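As a hedged illustration of the SIMD floating point instructions mentioned above, the sketch below adds two arrays of doubles two elements at a time, using the SSE2 intrinsics from emmintrin.h. The function name add_arrays_simd is hypothetical. The sketch assumes an even element count and falls back to a scalar loop when SSE2 is not available at compile time.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics: 128-bit registers, 2 doubles each */
#endif

/* a[i] = b[i] + c[i]. Assumption for this sketch: n is a multiple of 2. */
void add_arrays_simd(double *a, const double *b, const double *c, size_t n)
{
    assert(n % 2 == 0);
#if defined(__SSE2__)
    for (size_t i = 0; i < n; i += 2) {
        __m128d vb = _mm_loadu_pd(b + i);          /* load b[i], b[i+1] */
        __m128d vc = _mm_loadu_pd(c + i);          /* load c[i], c[i+1] */
        _mm_storeu_pd(a + i, _mm_add_pd(vb, vc));  /* two adds at once  */
    }
#else
    /* Scalar fallback for targets without SSE2. */
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
#endif
}
```

An AVX variant would follow the same shape with 256-bit values holding four doubles, stepping the loop by 4 instead of 2.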