
Algorithms for Parallel Machines

1) Problems which will be answered in the following sections

1. In what way is the design of parallel programs different from the design of sequential programs?
2. Can we always write sequential programs first and then get them to run on parallel machines?
3. Is super-linear speedup achievable?

2) Speedup, Complexity and Cost

Speedup measures the degree of improvement in speed when a problem is solved on a parallel machine as compared to a sequential machine.
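
A common way to quantify this (the notation T_s and T_p is introduced here for illustration): if T_s is the execution time of the best sequential program and T_p is the execution time on p processors, then the speedup is S(p) = T_s / T_p.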

Complexity can be considered of two types:

1. Worst-case time complexity: the maximum time taken by a program to execute over all inputs.
2. Expected time complexity: the average of execution times over all inputs.

Analogously, we can define worst-case space complexity and expected space complexity.

3) Histogram Computation

Given an image as an array Image[m][n] of integers in the range [0, 255], find the histogram of the distribution of pixels over the gray scales in the image.

A sequential algorithm to solve the above problem:

for (i = 0; i < 256; i++)
    histogram[i] = 0;

for (i = 0; i < m; i++)
    for (j = 0; j < n; j++) {
        color = image[i][j];
        histogram[color]++;
    }

Complexity of the above program is O(mn).

Let us parallelize the above program in the following way.

Let there be p processes numbered 0 to (p-1) and m rows (m > p) in the image. Each process can be assigned (m / p) rows on which to compute the histogram. The histogram is shared across the processes for concurrent updates. Hence the kth process executes the following:

for (i = k * (m / p); i < (k + 1) * (m / p); i++)
    for (j = 0; j < n; j++) {
        mutex_begin();            /* serialize updates to the shared histogram */
        color = image[i][j];
        histogram[color]++;
        mutex_end();
    }
Still, the above program takes longer than the sequential one to execute because the mutex serializes the updates to the shared histogram.

A better approach is for each process to independently compute its own histogram, which can be done fully in parallel. After all processes have done their work, a single process accumulates the counts from the p separate histograms to produce the final result:

for (i = k * (m / p); i < (k + 1) * (m / p); i++)
    for (j = 0; j < n; j++) {
        color = image[i][j];
        hist[color][k]++;         /* column k is private to process k, so no mutex is needed */
    }

/* after all processes have finished, a single process accumulates the p partial histograms */
for (i = 0; i < 256; i++)
    for (j = 0; j < p; j++)
        histogram[i] = histogram[i] + hist[i][j];
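
For concreteness, here is a minimal runnable sketch of this second approach using POSIX threads (threads standing in for the processes); the dimensions M and N, the worker count P, and all names are illustrative, and M is assumed to be divisible by P as in the notes.

#include <pthread.h>

#define M 512          /* image rows (illustrative)               */
#define N 512          /* image columns (illustrative)            */
#define P 4            /* number of worker threads (illustrative) */

int image[M][N];
int hist[256][P];      /* one private histogram column per worker */
int histogram[256];    /* final accumulated histogram             */

void *worker(void *arg)
{
    int k = *(int *)arg;                        /* worker index 0 .. P-1 */
    for (int i = k * (M / P); i < (k + 1) * (M / P); i++)
        for (int j = 0; j < N; j++)
            hist[image[i][j]][k]++;             /* column k is private: no mutex needed */
    return NULL;
}

int main(void)
{
    pthread_t t[P];
    int id[P];
    for (int k = 0; k < P; k++) {
        id[k] = k;
        pthread_create(&t[k], NULL, worker, &id[k]);
    }
    for (int k = 0; k < P; k++)
        pthread_join(t[k], NULL);
    for (int i = 0; i < 256; i++)               /* one thread accumulates the P partial histograms */
        for (int k = 0; k < P; k++)
            histogram[i] += hist[i][k];
    return 0;
}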

4) Parallel Reduction

Given a set of n values a0, a1, ..., a(n-1) and an associative operator ⊕, reduction is the process of computing a0 ⊕ a1 ⊕ ... ⊕ a(n-1).

Examples of associative operators are addition, multiplication, etc.


The algorithm for sequential reduction is: a) initialize sum = 0; b) iterate over the n values in the input, adding each one to the sum.
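
As a concrete illustration (assuming the n input values are stored in an array a of doubles), the sequential sum reduction is simply:

double sum = 0.0;
for (int i = 0; i < n; i++)
    sum += a[i];          /* apply the associative operator (here, addition) to each element */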

For parallel processing we view reduction as a tree-structured operation. The elements of the input to the reduction problem are placed at the leaf nodes of a binary tree.

The reduction operator is applied to the children of each parent node and the result is propagated towards the root of the tree.

When the result at the root node is complete, the work of the algorithm is over.

Analysis of parallel Reduction

There is a dependency across the levels of the reduction: unless the reduction at a lower level is completed, the higher level cannot proceed with its operation.

After each step, the algorithm must perform the equivalent of a barrier operation, as in the sketch below.
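
A minimal sketch of the tree-structured reduction using POSIX threads; P (assumed here to be a power of two), the array a, and the other names are illustrative, and the barrier enforces the level-by-level dependency described above.

#include <pthread.h>
#include <stdio.h>

#define P 8                                  /* number of workers, a power of two (illustrative) */

double a[P] = { 1, 2, 3, 4, 5, 6, 7, 8 };    /* leaf values; a[0] will hold the result */
pthread_barrier_t bar;

void *reduce_worker(void *arg)
{
    int k = *(int *)arg;
    /* at each level, every second remaining worker adds in its partner's value */
    for (int stride = 1; stride < P; stride *= 2) {
        if (k % (2 * stride) == 0)
            a[k] += a[k + stride];
        pthread_barrier_wait(&bar);          /* the lower level must complete before the next starts */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[P];
    int id[P];
    pthread_barrier_init(&bar, NULL, P);
    for (int k = 0; k < P; k++) {
        id[k] = k;
        pthread_create(&t[k], NULL, reduce_worker, &id[k]);
    }
    for (int k = 0; k < P; k++)
        pthread_join(t[k], NULL);
    printf("reduction result = %f\n", a[0]);
    pthread_barrier_destroy(&bar);
    return 0;
}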

5) Quadrature Problems

For a function of the type y = f(x), we want to find the area under the curve.

The range of methods for numerically computing this integral is referred to as quadrature methods.

Example: the trapezoidal rule.

The trapezoidal rule statically subdivides the domain into uniformly spaced partitions. The integral is estimated by summing up the areas of the trapezia approximating the area under the curve.

It is parallelized by allotting non-overlapping ranges of the domain to different processes. Each process finds the local sum of areas in its allocated range of the domain, and finally the global sum is updated, as in the sketch below.
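
A minimal sketch of this scheme; the integrand f, the limits, and the partition counts are illustrative, and the loop over k stands in for the separate processes (in a real implementation each k would run in its own process or thread, and the update of the global sum would be protected, e.g. by a mutex).

#include <math.h>
#include <stdio.h>

double f(double x) { return sin(x); }         /* stand-in integrand */

/* trapezoidal estimate over [lo, hi] using n uniform partitions */
double trapezoid(double lo, double hi, int n)
{
    double h = (hi - lo) / n;
    double sum = (f(lo) + f(hi)) / 2.0;
    for (int i = 1; i < n; i++)
        sum += f(lo + i * h);
    return sum * h;
}

int main(void)
{
    double a = 0.0, b = 3.141592653589793;    /* integrate sin(x) over [0, pi] */
    int n = 1000000, p = 4;                   /* total partitions and "processes" (illustrative) */
    double global = 0.0;

    for (int k = 0; k < p; k++) {             /* each k models one process with its own sub-range */
        double lo = a + k * (b - a) / p;
        double hi = a + (k + 1) * (b - a) / p;
        global += trapezoid(lo, hi, n / p);   /* local sum added to the global result */
    }
    printf("integral ~ %f (exact value is 2)\n", global);
    return 0;
}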

Problems:

Accuracy depends on the function: since the domain is divided statically, regions where the function has sharp variations are not handled very well, and hence the accuracy suffers.

Complexity increases: recovering accuracy requires finer partitioning over the whole domain, which raises the cost.

Solution 1: Adaptive Quadrature Algorithm

It uses the divide-and-conquer method.

To find the integral of a function of the type y = f(x) in the range [a, b], we take the following steps.

1. Let A = area of the trapezium considering the end points a and b,
   B = area of the trapezium considering the end points a and (a+b)/2,
   C = area of the trapezium considering the end points (a+b)/2 and b.

2. Subdivide the range [a, b] if |A - (B + C)| >= ε, for a chosen accuracy tolerance ε.

3. Assign the responsibility of one sub-domain to another process and take the other part for local processing.
4. If the above condition does not hold, compute the quadrature in the same process and update the global result. (A minimal sketch follows this list.)
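
A minimal single-process sketch of the adaptive idea (the parallel hand-off of one half to another process is omitted here); eps and the helper names are illustrative.

#include <math.h>

double trap(double (*f)(double), double a, double b)
{
    return (f(a) + f(b)) * (b - a) / 2.0;    /* area of one trapezium over [a, b] */
}

double adaptive(double (*f)(double), double a, double b, double eps)
{
    double m = (a + b) / 2.0;
    double A = trap(f, a, b);                /* coarse estimate over [a, b]   */
    double B = trap(f, a, m);                /* refined estimate over [a, m]  */
    double C = trap(f, m, b);                /* refined estimate over [m, b]  */

    if (fabs(A - (B + C)) < eps)             /* estimates agree: accept B + C */
        return B + C;

    /* otherwise subdivide; in the parallel version one half would be handed
       to another process while this one keeps the other half */
    return adaptive(f, a, m, eps / 2.0) + adaptive(f, m, b, eps / 2.0);
}

It could be called, for example, as adaptive(sin, 0.0, 3.141592653589793, 1e-9), with sin provided by <math.h>.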

Solution 2: Self-Scheduling Implementation

This scheme creates a group of worker processes, and the workers schedule themselves dynamically to pick up work that is available on a shared global stack.

When there is no more work to be done, the workers perish.

This implementation strategy is called self-scheduling.

The algorithm is:

The parent process:

1) Stores the initial task on the shared stack.

2) Creates the worker processes.

Each worker process:

1) Pops a task from the stack and starts working on it.

2) If the problem cannot be solved immediately, partitions the task further and pushes the sub-tasks onto the stack; then goes to 1 and repeats the same steps.

3) If there is no space left on the stack, the process must do the task sequentially and update the global result.

4) All processes terminate when there is no more work to be done. (A sketch of this scheme appears below.)
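
A minimal sketch of self-scheduling using POSIX threads (threads standing in for processes), applied to the adaptive quadrature task; all names (Task, STACK_MAX, NWORKERS, busy, ...) are illustrative assumptions, and the busy counter is added so that idle workers do not quit while another worker may still push new sub-tasks.

#include <math.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

typedef struct { double a, b, eps; } Task;

double f(double x) { return sin(x); }                         /* stand-in integrand */
double trap(double a, double b) { return (f(a) + f(b)) * (b - a) / 2.0; }

#define STACK_MAX 4096
#define NWORKERS 4

Task stack[STACK_MAX];
int top = 0;                          /* number of tasks currently on the shared stack */
int busy = 0;                         /* workers currently processing a task           */
double result = 0.0;                  /* global result                                 */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        Task t;
        pthread_mutex_lock(&lock);
        while (top == 0 && busy > 0) {         /* idle, but more work may still appear */
            pthread_mutex_unlock(&lock);
            sched_yield();
            pthread_mutex_lock(&lock);
        }
        if (top == 0) {                        /* no work anywhere: the worker perishes */
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        t = stack[--top];                      /* pop a task */
        busy++;
        pthread_mutex_unlock(&lock);

        double m = (t.a + t.b) / 2.0;
        double A = trap(t.a, t.b), B = trap(t.a, m), C = trap(m, t.b);

        pthread_mutex_lock(&lock);
        if (fabs(A - (B + C)) < t.eps || top + 2 > STACK_MAX)
            result += B + C;                   /* good enough (or stack full): finish here and update the global result */
        else {
            stack[top++] = (Task){ t.a, m, t.eps / 2.0 };   /* push sub-tasks for any worker to pick up */
            stack[top++] = (Task){ m, t.b, t.eps / 2.0 };
        }
        busy--;
        pthread_mutex_unlock(&lock);
    }
}

int main(void)
{
    pthread_t w[NWORKERS];
    stack[top++] = (Task){ 0.0, 3.141592653589793, 1e-9 };   /* parent stores the initial task */
    for (int k = 0; k < NWORKERS; k++)
        pthread_create(&w[k], NULL, worker, NULL);           /* parent creates the workers */
    for (int k = 0; k < NWORKERS; k++)
        pthread_join(w[k], NULL);
    printf("integral ~ %f\n", result);
    return 0;
}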

6) Matrix Multiplication

A sequential algorithm for matrix multiplication (here a is l x m, b is m x n, and c is l x n; the shared dimension is written as m):

for (i = 0; i < l; i++)
    for (j = 0; j < n; j++) {
        c[i][j] = 0;
        for (k = 0; k < m; k++)
            c[i][j] += a[i][k] * b[k][j];
    }
