Evrim Guler
Question: Develop a matrix multiplication algorithm on the shared memory model which uses up
to n^3 processors.
Find T_p, S_p, E_p, cost, and work, and calculate the range of cost optimality and time
optimality.
Solution:
The algorithm below performs the matrix multiplication. A, B, and C are n × n
two-dimensional square matrices.
$$C_{i,j} = \sum_{k=0}^{n-1} A_{i,k} B_{k,j}, \qquad 0 \le i, j < n$$
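The algorithm is:
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        C[i][j] = 0;  /* reset the accumulator for cell (i, j) */
        for (int k = 0; k < n; k++) {
            C[i][j] = C[i][j] + A[i][k] * B[k][j];  /* row i of A times column j of B */
        }
    }
}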
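$$\begin{bmatrix} A_{1,1} & A_{1,2} & A_{1,3} \\ A_{2,1} & A_{2,2} & A_{2,3} \\ A_{3,1} & A_{3,2} & A_{3,3} \end{bmatrix} \begin{bmatrix} B_{1,1} & B_{1,2} & B_{1,3} \\ B_{2,1} & B_{2,2} & B_{2,3} \\ B_{3,1} & B_{3,2} & B_{3,3} \end{bmatrix} = \begin{bmatrix} C_{1,1} & C_{1,2} & C_{1,3} \\ C_{2,1} & C_{2,2} & C_{2,3} \\ C_{3,1} & C_{3,2} & C_{3,3} \end{bmatrix}$$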
Given this naive algorithm for multiplying two n × n square matrices, we can calculate its time
complexity. To do so, we count the multiplications of each row of A with each column of B, and
the additions that sum those products for the corresponding row and column of C.
The algorithm requires n^3 multiplications and n^3 additions, so its sequential time
complexity is O(n^3). That is, T_1* = O(n^3).
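The operation count above can be checked with a small sketch. The function below is an illustration, not part of the original solution: the name matmul_counted, the struct op_count, and the row-major flat array layout are all assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical helper type: tallies of the two kinds of operations. */
struct op_count {
    unsigned long mults;
    unsigned long adds;
};

/* Naive n x n multiplication on row-major flat arrays (an assumed layout).
   Returns how many multiplications and additions were performed. */
struct op_count matmul_counted(size_t n, const double *A, const double *B,
                               double *C)
{
    struct op_count ops = {0, 0};
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double acc = 0.0;               /* running sum for cell (i, j) */
            for (size_t k = 0; k < n; k++) {
                acc += A[i * n + k] * B[k * n + j];
                ops.mults++;                /* one product per k */
                ops.adds++;                 /* one accumulation per k */
            }
            C[i * n + j] = acc;
        }
    }
    return ops;
}
```

For n = 2 the counters both end at 8, which is n^3, in line with the sequential bound T_1* = O(n^3).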
This algorithm is easy to parallelize with p processors: we treat each cell of the C matrix
(C_{1,1}, C_{1,2}, C_{2,1}, and so on) separately. The computation time for each cell is the
time for its multiplications plus the time to add the products together.
For the multiplications, T_multiplication = O(n^3/p). For the additions, the C matrix has
n^2 elements, so each element gets p/n^2 processors and needs n/(p/n^2) = n^3/p steps plus
log n steps to sum up, so T_summation = O(n^3/p + log n).
In more detail, the sum is computed with a reduction tree. Adding n numbers with p processors
takes n/2p steps at the first level, n/4p steps at the second level, and so on. For the matrix
multiplication there are n^3 products to add in total, so the same analysis applies with n^3
in place of n.
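For n elements:
$$T_{summation} = \left(\frac{n}{2p} + \frac{n}{4p} + \frac{n}{8p} + \cdots\right) + \log n = \frac{n}{p}\left(\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots\right) + \log n = O\left(\frac{n}{p} + \log n\right)$$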
For the matrix multiplication, we replace the first n with n^3, so:
T_summation = O(n^3/p + log n)
Adding the multiplication time gives T_p = O(n^3/p + log n).
The tree construction illustrates this: one reduction tree computes just one cell of the
C matrix, and it is repeated to fill out the whole n × n matrix.
n numbers can be added in log_2 n steps using n processors.
So the computational time complexity is O(log_2 n) using n^3 processors.
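The reduction tree can be sketched as a sequential simulation that counts parallel steps. This is an illustration under stated assumptions: the name tree_sum_steps is made up for this example, p must be at least 1, and each tree level with m/2 additions is charged ceil((m/2)/p) steps, which is one way to model the level-by-level costs described above.

```c
#include <stddef.h>

/* Sums values[0..n-1] in place with a pairwise reduction tree.
   Returns the number of parallel steps under the assumed model:
   a level with `pairs` additions costs ceil(pairs / p) steps.
   Assumes p >= 1. The total is written to *sum_out. */
unsigned long tree_sum_steps(double *values, size_t n, size_t p,
                             double *sum_out)
{
    unsigned long steps = 0;
    size_t m = n;                           /* numbers still to combine */
    while (m > 1) {
        size_t pairs = m / 2;               /* additions at this level */
        for (size_t i = 0; i < pairs; i++)
            values[i] = values[2 * i] + values[2 * i + 1];
        if (m % 2 == 1)                     /* odd element moves up unchanged */
            values[pairs] = values[m - 1];
        steps += (pairs + p - 1) / p;       /* ceil(pairs / p) */
        m = pairs + m % 2;
    }
    *sum_out = (n > 0) ? values[0] : 0.0;
    return steps;
}
```

With n = 8 and p = 4 the function reports 3 steps, which matches log_2 8; with p = 1 it reports 7 steps, the n - 1 sequential additions.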
The terms are calculated as follows:
T_1*    O(n^3)
T_p     O(n^3/p + log n)
Cost    p * T_p
Work    number of operations (≤ Cost)
S_p     T_1* / T_p
E_p     S_p / p = T_1* / (T_p * p)
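To make these definitions concrete, the sketch below evaluates them for given n and p. It is only a model: it drops the constants hidden by the O-notation and takes T_1* = n^3 and T_p = n^3/p + ceil(log_2 n) literally. The names ceil_log2, struct metrics, and model_metrics are assumptions made for this example.

```c
/* ceil(log2(n)) for n >= 1, computed with shifts so no math library is needed. */
static unsigned ceil_log2(unsigned long n)
{
    unsigned bits = 0;
    unsigned long v = 1;
    while (v < n) {
        v <<= 1;
        bits++;
    }
    return bits;
}

/* Hypothetical container for the quantities in the table. */
struct metrics {
    double t1;          /* T_1* = n^3 */
    double tp;          /* T_p  = n^3 / p + ceil(log2 n) */
    double cost;        /* p * T_p */
    double speedup;     /* S_p = T_1* / T_p */
    double efficiency;  /* E_p = S_p / p */
};

/* Evaluates the model for n >= 1 and p >= 1 (constants ignored). */
struct metrics model_metrics(unsigned long n, unsigned long p)
{
    struct metrics m;
    m.t1 = (double)n * (double)n * (double)n;
    m.tp = m.t1 / (double)p + (double)ceil_log2(n);
    m.cost = (double)p * m.tp;
    m.speedup = m.t1 / m.tp;
    m.efficiency = m.speedup / (double)p;
    return m;
}
```

For n = 8 and p = 512 (that is, n^3 processors) the model gives T_p = 4, cost 2048, and efficiency 0.25, so the cost exceeds T_1* = 512 by the log factor plus the per-processor term.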
If T_p = O(n^2) with n processors, each iteration of the outer loop (each instance of the
inner loops) is independent and can be done by a different processor.
If T_p = O(n) with n^2 processors, each processor computes one element of C from one row of A
and one column of B. Both cases are cost optimal, because O(n^3) = n * O(n^2) = n^2 * O(n).
If T_p = O(log_2 n) with n^3 processors, the inner loop computation is parallelized as well.
This is not cost optimal, because n^3 * O(log_2 n) = O(n^3 log_2 n), which is larger than O(n^3).
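The n^2-processor case can be sketched by giving each virtual processor one cell of C. The code below is a sequential simulation, not real parallel code: the names cell_task and multiply_by_cells, the processor id q, and the row-major flat arrays are assumptions made for this illustration.

```c
#include <stddef.h>

/* Work of virtual processor q (0 <= q < n*n): compute one cell of C.
   The mapping q -> (row, column) is an assumed convention. */
void cell_task(size_t q, size_t n, const double *A, const double *B,
               double *C)
{
    size_t i = q / n;                       /* row of the assigned cell */
    size_t j = q % n;                       /* column of the assigned cell */
    double acc = 0.0;
    for (size_t k = 0; k < n; k++)          /* O(n) work per processor */
        acc += A[i * n + k] * B[k * n + j];
    C[i * n + j] = acc;
}

/* Runs the n*n tasks one after another. No task reads a cell of C written
   by another task, so on a shared memory machine they could run at once. */
void multiply_by_cells(size_t n, const double *A, const double *B, double *C)
{
    for (size_t q = 0; q < n * n; q++)
        cell_task(q, n, A, B, C);
}
```

Each task does O(n) work and writes only its own cell, which is why n^2 processors give T_p = O(n) without any write conflicts on C.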
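$$S_p = \frac{T_1^*}{T_p} = \frac{O(n^3)}{O\left(\frac{n^3}{p} + \log_2 n\right)} = O\left(\frac{p}{1 + \frac{p \log_2 n}{n^3}}\right)$$
$$\text{Cost} = p \cdot T_p = p \cdot O\left(\frac{n^3}{p} + \log_2 n\right) = O(n^3 + p \log_2 n)$$
The cost stays O(n^3) only while p log_2 n = O(n^3); with n^3 processors it is O(n^3 log_2 n),
which is larger than O(n^3).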
With 1 processor, the time complexity is O(n^3), which is the sequential time T_1*.
With n processors, the time complexity is O(n^2).
With n^2 processors, the time complexity is O(n).
With n^3 processors, the time complexity is O(log_2 n).
For the theoretical range of time optimality, we require
T_p ≤ problem complexity, that is, O(n^3/p + log n) ≤ Θ(n^2), so p needs to be at least n.
If p is n, then O(n^2 + log n) = O(n^2) = Θ(n^2).
So the range of time optimality is (p: number of processors):
1 ≤ p ≤ n          O(n^2) ≤ T_p ≤ O(n^3)
n < p ≤ n^2        O(n) ≤ T_p < O(n^2)
n^2 < p ≤ n^3      O(log_2 n) ≤ T_p < O(n)