Strassen Implementation
This algorithm was implemented as follows:
void inline strassen(int n, double *c, int ldc, int rc, int cc,
                     double *a, int lda, int ra, int ca,
                     double *b, int ldb, int rb, int cb,
                     int alpha, int min_block_size) {
    // C(i,j) = alpha*C(i,j) + A(i,k) * B(k,j)
    // alpha is either 0 or 1
Because the algorithm is recursive, a flag named alpha indicates whether the product
is accumulated into C (alpha = 1) or simply assigned to C (alpha = 0). The recursion is
cut off by a parameter called min_block_size: if a recursion level produces an n such
that n < min_block_size, an ordinary matrix multiply is used instead.
The results are shown below for min_block_size = 32 and min_block_size = 64. Also
included are IJK_Blocking_20 (block size = 20) and IJK_Blocking_21 (block size = 21),
together with a naïve implementation and the default DGEMM for comparison.
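The role of alpha and of the min_block_size cutoff can be illustrated with a minimal C sketch. It is not the implementation that was measured here: the names strassen_sketch, base_multiply and add_blocks are hypothetical, row-major storage is assumed, sub-blocks are addressed with pointer offsets instead of the row and column offsets (rc, cc, ra, ca, rb, cb) of the original signature, and an odd n simply falls back to the ordinary multiply.

```c
#include <stdlib.h>

/* Ordinary multiply on n x n row-major blocks: C = alpha*C + A*B, alpha is 0 or 1. */
static void base_multiply(int n, double *c, int ldc, const double *a, int lda,
                          const double *b, int ldb, int alpha)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = alpha ? c[i * ldc + j] : 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * lda + k] * b[k * ldb + j];
            c[i * ldc + j] = sum;
        }
}

/* t = x + sign*y on h x h blocks; t is stored densely (leading dimension h). */
static void add_blocks(int h, double *t, const double *x, int ldx,
                       const double *y, int ldy, double sign)
{
    for (int i = 0; i < h; i++)
        for (int j = 0; j < h; j++)
            t[i * h + j] = x[i * ldx + j] + sign * y[i * ldy + j];
}

/* Hypothetical sketch: C = alpha*C + A*B; C must not overlap A or B. */
void strassen_sketch(int n, double *c, int ldc, const double *a, int lda,
                     const double *b, int ldb, int alpha, int min_block_size)
{
    /* Cutoff: small (or odd-sized) blocks use the ordinary multiply. */
    if (n < min_block_size || n % 2 != 0) {
        base_multiply(n, c, ldc, a, lda, b, ldb, alpha);
        return;
    }
    int h = n / 2;
    /* Scratch space: two operand buffers (s, t) and the seven products. */
    double *buf = malloc(9 * (size_t)h * h * sizeof(double));
    if (buf == NULL) {                 /* assumption: fall back if out of memory */
        base_multiply(n, c, ldc, a, lda, b, ldb, alpha);
        return;
    }
    double *s = buf, *t = buf + (size_t)h * h, *m[7];
    for (int i = 0; i < 7; i++)
        m[i] = buf + (size_t)(2 + i) * h * h;

    const double *a11 = a, *a12 = a + h, *a21 = a + h * lda, *a22 = a21 + h;
    const double *b11 = b, *b12 = b + h, *b21 = b + h * ldb, *b22 = b21 + h;
    double *c11 = c, *c12 = c + h, *c21 = c + h * ldc, *c22 = c21 + h;

    /* The seven products are always "set" (alpha = 0) into scratch space. */
    add_blocks(h, s, a11, lda, a22, lda, 1.0);          /* M1 = (A11+A22)(B11+B22) */
    add_blocks(h, t, b11, ldb, b22, ldb, 1.0);
    strassen_sketch(h, m[0], h, s, h, t, h, 0, min_block_size);
    add_blocks(h, s, a21, lda, a22, lda, 1.0);          /* M2 = (A21+A22) B11 */
    strassen_sketch(h, m[1], h, s, h, b11, ldb, 0, min_block_size);
    add_blocks(h, t, b12, ldb, b22, ldb, -1.0);         /* M3 = A11 (B12-B22) */
    strassen_sketch(h, m[2], h, a11, lda, t, h, 0, min_block_size);
    add_blocks(h, t, b21, ldb, b11, ldb, -1.0);         /* M4 = A22 (B21-B11) */
    strassen_sketch(h, m[3], h, a22, lda, t, h, 0, min_block_size);
    add_blocks(h, s, a11, lda, a12, lda, 1.0);          /* M5 = (A11+A12) B22 */
    strassen_sketch(h, m[4], h, s, h, b22, ldb, 0, min_block_size);
    add_blocks(h, s, a21, lda, a11, lda, -1.0);         /* M6 = (A21-A11)(B11+B12) */
    add_blocks(h, t, b11, ldb, b12, ldb, 1.0);
    strassen_sketch(h, m[5], h, s, h, t, h, 0, min_block_size);
    add_blocks(h, s, a12, lda, a22, lda, -1.0);         /* M7 = (A12-A22)(B21+B22) */
    add_blocks(h, t, b21, ldb, b22, ldb, 1.0);
    strassen_sketch(h, m[6], h, s, h, t, h, 0, min_block_size);

    /* Combine; alpha decides whether the old C is kept (1) or overwritten (0). */
    for (int i = 0; i < h; i++)
        for (int j = 0; j < h; j++) {
            int p = i * h + j, q = i * ldc + j;
            c11[q] = (alpha ? c11[q] : 0.0) + m[0][p] + m[3][p] - m[4][p] + m[6][p];
            c12[q] = (alpha ? c12[q] : 0.0) + m[2][p] + m[4][p];
            c21[q] = (alpha ? c21[q] : 0.0) + m[1][p] + m[3][p];
            c22[q] = (alpha ? c22[q] : 0.0) + m[0][p] - m[1][p] + m[2][p] + m[5][p];
        }
    free(buf);
}
```

In this sketch the seven intermediate products are written to scratch buffers with alpha = 0, and only the final combination step consults the caller's alpha. A call such as strassen_sketch(n, c, n, a, n, b, n, 0, 32) would correspond to the min_block_size = 32 configuration, under the assumptions stated above.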
[Figure: Performance (MFLOP/s) versus N for strassen32, strassen64, DGEMM, IJK_BLOCKING_20, IJK_BLOCKING_21 and IJK_NAIVE. The vertical axis runs from 0 to 400 MFLOP/s and the horizontal axis from N = 0 to N = 700.]
APPENDIX A – Results Strassen
F = Frobenius norm