
Parallel Algorithm Homework 1

Evrim Guler
Question: Develop a matrix multiplication algorithm on the shared-memory model that uses up
to $n^3$ processors.
Find $T_p$, $S_p$, $E_p$, cost, and work, and calculate the ranges of cost optimality and time
optimality.

Solution:
The algorithm below performs matrix multiplication. A, B, and C are $n \times n$ square
matrices, and each entry of C is defined by

$$C_{i,j} = \sum_{k=0}^{n-1} A_{i,k}\,B_{k,j}, \qquad 0 \le i, j < n$$

The algorithm is:
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        C[i][j] = 0;                       /* clear the accumulator for this cell */
        for (int k = 0; k < n; k++)
            C[i][j] += A[i][k] * B[k][j];  /* row i of A times column j of B */
    }
}
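
$$
\begin{bmatrix}
A_{1,1} & A_{1,2} & A_{1,3} \\
A_{2,1} & A_{2,2} & A_{2,3} \\
A_{3,1} & A_{3,2} & A_{3,3}
\end{bmatrix}
\begin{bmatrix}
B_{1,1} & B_{1,2} & B_{1,3} \\
B_{2,1} & B_{2,2} & B_{2,3} \\
B_{3,1} & B_{3,2} & B_{3,3}
\end{bmatrix}
=
\begin{bmatrix}
C_{1,1} & C_{1,2} & C_{1,3} \\
C_{2,1} & C_{2,2} & C_{2,3} \\
C_{3,1} & C_{3,2} & C_{3,3}
\end{bmatrix}
$$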

Given this naive algorithm for multiplying two $n \times n$ matrices, we can calculate its time
complexity. Each entry of C requires multiplying one row of A by one column of B, element by
element, and then adding all of the products together.
Sequentially, the algorithm performs $n^3$ multiplications and $n^3$ additions, so its time
complexity is $O(n^3)$. That is, $T_1^* = O(n^3)$.

Parallelizing this algorithm with p processors is straightforward: we treat each cell of the
C matrix ($C_{1,1}$, $C_{1,2}$, $C_{2,1}$, and so on) separately. The computation time for each
cell consists of the multiplications and the additions that combine them.
For the multiplications, $T_{multiplication} = O(n^3/p)$. For the additions, the C matrix has
$n^2$ elements. With $p/n^2$ processors available per element, summing the products of one
element takes $n/(p/n^2) = n^3/p$ steps plus $\log n$ steps for the reduction, so
$T_{summation} = O(n^3/p + \log n)$.
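The O(n^3/p) multiplication bound can be illustrated with a small C sketch. It assumes, purely for illustration, that the n^3 products are numbered t = (i*n + j)*n + k and handed out in contiguous blocks; the names `Range` and `product_range` are hypothetical and not part of the original solution.

```c
#include <assert.h>

/* Half-open range [lo, hi) of flattened product indices
 * t = (i*n + j)*n + k assigned to one processor (assumed block layout). */
typedef struct { long lo, hi; } Range;

Range product_range(long n, long p, long rank)
{
    assert(n >= 1 && p >= 1 && rank >= 0 && rank < p);
    long total = n * n * n;                 /* n^3 products overall */
    long chunk = (total + p - 1) / p;       /* ceil(n^3 / p) products each */
    Range r;
    r.lo = rank * chunk < total ? rank * chunk : total;
    r.hi = r.lo + chunk < total ? r.lo + chunk : total;
    return r;
}
```

No processor receives more than ceil(n^3/p) products, which is where the O(n^3/p) term comes from. When p exceeds n^3, the extra processors simply get an empty range.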
In more detail, the additions are organized as a reduction tree, as described below. Adding n
numbers with p processors takes $n/2p$ steps at the first level, $n/4p$ steps at the second
level, and so on. Matrix multiplication has $n^3$ products to add in total, so the same analysis
applies with $n^3$ in place of n.
For n elements:
$$T_{summation} = \left(\frac{n}{2p} + \frac{n}{4p} + \frac{n}{8p} + \cdots\right) + \log n = \frac{n}{p}\left(\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots\right) + \log n = O\left(\frac{n}{p} + \log n\right)$$
For matrix multiplication, the first n becomes $n^3$, so:
$$T_{summation} = O\left(\frac{n^3}{p} + \log n\right)$$
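The level-by-level count can be checked with a short C sketch. It assumes the simple schedule described here, in which the additions of each tree level are shared evenly among p processors; `reduction_steps` is a hypothetical helper name.

```c
#include <assert.h>

/* Parallel steps needed to sum m values with p processors using a
 * pairwise reduction tree, scheduled one level at a time. */
long reduction_steps(long m, long p)
{
    assert(m >= 1 && p >= 1);
    long steps = 0;
    while (m > 1) {
        long adds = m / 2;             /* additions at this level */
        steps += (adds + p - 1) / p;   /* ceil(adds / p) rounds */
        m -= adds;                     /* partial sums left for the next level */
    }
    return steps;
}
```

With 8 values and 8 processors the function returns 3, which is log2(8). With a single processor it returns 7, the n - 1 additions of the sequential sum. For 1024 values and 4 processors it returns 257, within the n/p + log2(n) = 266 bound.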
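$$T_p = T_{summation} + T_{multiplication} = O\left(\frac{n^3}{p} + \log n\right) + O\left(\frac{n^3}{p}\right) = O\left(\frac{n^3}{p} + \log n\right)$$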

If we have n processors, the time complexity is $O(n^2)$.
If we have $n^2$ processors, the time complexity is $O(n)$.
If we have $n^3$ processors, the time complexity is $O(\log_2 n)$.
So the lower bound for this parallel matrix multiplication is $O(\log_2 n)$.

This can be explained with the tree construction: the previous pictures show the computation of
only one cell of the C matrix, and the same computation is repeated to fill the whole
$n \times n$ matrix.
n numbers can be added in $\log_2 n$ steps using n processors.
So the computational time complexity is $O(\log_2 n)$ using $n^3$ processors.
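The schedule with n^3 processors can be simulated sequentially, as in the following C sketch. Each loop nest stands for one parallel step in which every virtual processor does constant work. The fixed size `N`, the scratch array `P`, and the function name `matmul_tree` are assumptions made for this illustration.

```c
#include <assert.h>

#define N 4  /* illustrative size; any N >= 1 works */

/* Simulates the n^3-processor schedule one parallel step at a time.
 * Returns the number of parallel steps: one for the products plus
 * one per reduction level. */
int matmul_tree(double A[N][N], double B[N][N], double C[N][N])
{
    static double P[N][N][N];  /* P[i][j][k] belongs to virtual processor (i,j,k) */
    int steps = 0;

    /* Step 1: every virtual processor forms a single product. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                P[i][j][k] = A[i][k] * B[k][j];
    steps++;

    /* Reduction tree: each level halves the partial sums of every cell. */
    for (int stride = 1; stride < N; stride *= 2) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                for (int k = 0; k + stride < N; k += 2 * stride)
                    P[i][j][k] += P[i][j][k + stride];
        steps++;
    }

    /* The root of each tree holds the finished cell of C. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            C[i][j] = P[i][j][0];
    return steps;
}
```

For N = 4 the function reports 3 steps: one multiplication step and log2(4) = 2 reduction levels. The final copy into C is not counted, since the root processor of each tree could write its result directly.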
The terms to calculate are:

$T_1^*$    $O(n^3)$
$T_p$      $O(n^3/p + \log n)$
Cost       $p \cdot T_p$
Work       # of operations $\le$ Cost
$S_p$      $T_1^* / T_p$
$E_p$      $S_p / p = T_1^* / (T_p \cdot p)$
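These definitions can be evaluated for concrete values with a small C sketch. It is only a model: hidden constants are dropped, T1* is taken as n^3, Tp as n^3/p plus the depth of the reduction tree, and the names `Metrics`, `tree_depth`, and `model_metrics` are hypothetical.

```c
#include <assert.h>

/* ceil(log2(n)): depth of a pairwise reduction tree over n values. */
static int tree_depth(long n)
{
    int d = 0;
    for (long m = 1; m < n; m *= 2)
        d++;
    return d;
}

typedef struct {
    double t1, tp, cost, speedup, efficiency;
} Metrics;

/* Model values with constants dropped: T1 = n^3, Tp = n^3/p + log2(n). */
Metrics model_metrics(long n, long p)
{
    assert(n >= 1 && p >= 1);
    Metrics m;
    m.t1 = (double)n * n * n;
    m.tp = m.t1 / p + tree_depth(n);
    m.cost = p * m.tp;                /* Cost = p * Tp */
    m.speedup = m.t1 / m.tp;          /* Sp = T1 / Tp */
    m.efficiency = m.speedup / p;     /* Ep = Sp / p */
    return m;
}
```

For n = 8 and p = 512 (that is, n^3 processors) the model gives Tp = 4, cost 2048, speedup 128, and efficiency 0.25. With p = 8 the cost is 536, close to n^3 = 512.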

For the cost:

If $T_p = O(n^2)$ with n processors, each instance of the inner loop is independent and can be
done by a different processor.
If $T_p = O(n)$ with $n^2$ processors, one element of A and B is assigned to each
processor. Both cases are cost optimal, because $n \cdot O(n^2) = n^2 \cdot O(n) = O(n^3)$.
If $T_p = O(\log_2 n)$ with $n^3$ processors, the inner-loop computation itself must be
parallelized. This case is not cost optimal, because $n^3 \cdot O(\log_2 n) = O(n^3 \log_2 n)$,
which exceeds $O(n^3)$.

$O(\log_2 n)$ must be the lower bound for this parallel matrix multiplication.


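To cover all the quantities asked for in the homework question, they are worked out in the
following lines.

$$S_p = \frac{T_1^*}{T_p} = \frac{O(n^3)}{O\left(\frac{n^3}{p} + \log_2 n\right)} = O\left(\frac{n^3}{1 + \log_2 n}\right) \quad \text{(speedup with } n^3 \text{ processors)}$$

$$E_p = \frac{T_1^*}{\text{Cost}} = \frac{S_p}{p} = \frac{T_1^*}{T_p \cdot p} = \frac{O(n^3)}{O(n^3 + p \log_2 n)} = O\left(\frac{1}{1 + \log_2 n}\right) \quad \text{(efficiency with } n^3 \text{ processors)}$$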

Cost = $p \cdot O(n^3/p + \log_2 n) = O(n^3 + p \log_2 n) = O(n^3 + n^3 \log_2 n)$ with $n^3$ processors.

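Work = # of operations = $O(n^3)$ multiplications + $O(n^3)$ additions = $O(n^3)$, independent of p. The share of each processor is $O(n^3/p) + O(n^3/p + \log_2 n) = O(n^3/p + \log_2 n)$ steps, which is $T_p$.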

To examine cost optimality, consider the following cases:

If we use 1 processor for the whole computation, Cost = $1 \cdot O(n^3)$, which is the sequential time ($T_1^*$).
If we use n processors for the whole computation, Cost = $n \cdot O(n^2) = O(n^3)$.
If we use $n^2$ processors for the whole computation, Cost = $n^2 \cdot O(n) = O(n^3)$.
If we use $n^3$ processors for the whole computation, Cost = $n^3 \cdot O(\log_2 n) = O(n^3 \log_2 n)$.

Next, we find the range of cost optimality:

$$p \cdot T_p = O(T_1^*)$$

$$O(n^3 + p \log_2 n) = O(n^3)$$

$$p \log_2 n \le n^3 \;\Rightarrow\; p \le \frac{n^3}{\log_2 n}$$

To check this, take $p = n^3/\log_2 n$:

$$p \log_2 n = \frac{n^3}{\log_2 n} \cdot \log_2 n = n^3, \text{ which is in } O(n^3).$$
So the algorithm is cost optimal as long as the cost stays within $O(n^3)$, that is, for $p \le n^3/\log_2 n$.
For time optimality, we look at the running time for each number of processors used in the
computation:

If we have 1 processor, the time complexity is $O(n^3)$, which is the sequential time ($T_1^*$).
If we use n processors, the time complexity is $O(n^2)$.
If we use $n^2$ processors, the time complexity is $O(n)$.
If we use $n^3$ processors, the time complexity is $O(\log_2 n)$.

For the theoretical range of time optimality, we require
$T_p \le$ problem complexity, i.e. $O(n^3/p + \log n) \le \Omega(n^2)$, so p needs to be at least n.
If p is n, then $O(n^2 + \log n) = O(n^2) = \Omega(n^2)$.
So, the range of time optimality must be (p: # of processors):

$1 \le p \le n$        $O(n^2) \le T_p \le O(n^3)$
$n < p \le n^2$        $O(n) \le T_p < O(n^2)$
$n^2 < p \le n^3$      $O(\log_2 n) \le T_p < O(n)$
