
Algorithms for Parallel Machines

1) Problems which will be answered in the following sections

1. In what way is the design of parallel programs different from the design of sequential programs?
2. Can we always write sequential programs first and then get them to run on parallel machines?
3. Is super-linear speedup achievable?

2) Speedup, Complexity and Cost

Speedup measures the degree of improvement in speed when a problem is solved on a parallel machine as compared to a sequential machine.
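
A common way to quantify this (the notation T_s and T_p is introduced here for illustration): if T_s is the execution time of the best sequential program and T_p is the execution time on p processors, then the speedup is S(p) = T_s / T_p.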

Complexity can be considered of two types:

1. Worst-case time complexity: the maximum time taken by a program to execute over all inputs.
2. Expected time complexity: the average of execution times over all inputs.

Analogously, we can define worst-case space complexity and expected space complexity.

3) Histogram Computation

Given an image as an array Image[m][n] of integers in the range [0, 255], find the histogram of the distribution of pixels over the gray scales in the image.

A sequential algorithm to solve the above problem:

for (i = 0; i < 256; i++)
    histogram[i] = 0;

for (i = 0; i < m; i++)
    for (j = 0; j < n; j++) {
        color = image[i][j];
        histogram[color]++;
    }

Complexity of the above program is O(mn).

Let us parallelize the above program in the following way.

Let there be p processes numbered 0 to (p-1) and m rows (m > p) in the image. Each process can be assigned (m / p) rows on which to compute the histogram. The histogram is shared across the processes for concurrent updates. Hence the kth process executes the following:

for (i = k * (m / p); i < (k + 1) * (m / p); i++)
    for (j = 0; j < n; j++) {
        mutex_begin();            /* serialize updates to the shared histogram */
        color = image[i][j];
        histogram[color]++;
        mutex_end();
    }
Still, the above program takes longer than the sequential one to execute because the mutex serializes the updates to the shared histogram.

A better approach is for each process to independently compute its own histogram, which can be done fully in parallel. After all processes have done their work, a single process accumulates the counts from the p separate histograms to produce the final result:

for (i = k * (m / p); i < (k + 1) * (m / p); i++)
    for (j = 0; j < n; j++) {
        color = image[i][j];
        hist[color][k]++;         /* column k is private to process k, so no mutex is needed */
    }

/* after all processes have finished, a single process accumulates the p partial histograms */
for (i = 0; i < 256; i++)
    for (j = 0; j < p; j++)
        histogram[i] = histogram[i] + hist[i][j];
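
For concreteness, here is a minimal runnable sketch of this second approach using POSIX threads (threads standing in for the processes); the dimensions M and N, the worker count P, and all names are illustrative, and M is assumed to be divisible by P as in the notes.

#include <pthread.h>

#define M 512          /* image rows (illustrative)               */
#define N 512          /* image columns (illustrative)            */
#define P 4            /* number of worker threads (illustrative) */

int image[M][N];
int hist[256][P];      /* one private histogram column per worker */
int histogram[256];    /* final accumulated histogram             */

void *worker(void *arg)
{
    int k = *(int *)arg;                        /* worker index 0 .. P-1 */
    for (int i = k * (M / P); i < (k + 1) * (M / P); i++)
        for (int j = 0; j < N; j++)
            hist[image[i][j]][k]++;             /* column k is private: no mutex needed */
    return NULL;
}

int main(void)
{
    pthread_t t[P];
    int id[P];
    for (int k = 0; k < P; k++) {
        id[k] = k;
        pthread_create(&t[k], NULL, worker, &id[k]);
    }
    for (int k = 0; k < P; k++)
        pthread_join(t[k], NULL);
    for (int i = 0; i < 256; i++)               /* one thread accumulates the P partial histograms */
        for (int k = 0; k < P; k++)
            histogram[i] += hist[i][k];
    return 0;
}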

4) Parallel Reduction

Given a set of n values a0, a1, ..., a(n-1) and an associative operator ⊕, reduction is the process of computing a0 ⊕ a1 ⊕ ... ⊕ a(n-1).

Examples of associative operators are addition, multiplication, etc.


The algorithm for sequential reduction is: a) initialize sum = 0; b) iterate over the n values in the input, adding each one to the sum.
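
As a concrete illustration (assuming the n input values are stored in an array a of doubles), the sequential sum reduction is simply:

double sum = 0.0;
for (int i = 0; i < n; i++)
    sum += a[i];          /* apply the associative operator (here, addition) to each element */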

For parallel processing we view reduction as a tree-structured operation. The elements of the input to the reduction problem are placed at the leaf nodes of a binary tree.

The reduction operator is applied to the children of each parent node and the result is propagated towards the root of the tree.

When the result at the root node is complete, the work of the algorithm is over.

Analysis of parallel Reduction

There is a dependency across the levels of the reduction: unless the reduction at a lower level is completed, the higher level cannot proceed with its operation.

After each step, the algorithm must perform the equivalent of a barrier operation, as in the sketch below.
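
A minimal sketch of the tree-structured reduction using POSIX threads; P (assumed here to be a power of two), the array a, and the other names are illustrative, and the barrier enforces the level-by-level dependency described above.

#include <pthread.h>
#include <stdio.h>

#define P 8                                  /* number of workers, a power of two (illustrative) */

double a[P] = { 1, 2, 3, 4, 5, 6, 7, 8 };    /* leaf values; a[0] will hold the result */
pthread_barrier_t bar;

void *reduce_worker(void *arg)
{
    int k = *(int *)arg;
    /* at each level, every second remaining worker adds in its partner's value */
    for (int stride = 1; stride < P; stride *= 2) {
        if (k % (2 * stride) == 0)
            a[k] += a[k + stride];
        pthread_barrier_wait(&bar);          /* the lower level must complete before the next starts */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[P];
    int id[P];
    pthread_barrier_init(&bar, NULL, P);
    for (int k = 0; k < P; k++) {
        id[k] = k;
        pthread_create(&t[k], NULL, reduce_worker, &id[k]);
    }
    for (int k = 0; k < P; k++)
        pthread_join(t[k], NULL);
    printf("reduction result = %f\n", a[0]);
    pthread_barrier_destroy(&bar);
    return 0;
}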

5) Quadrature Problems

For a function of the type y = f(x), we want to find the area under the curve.

The range of methods for numerically computing this integral is referred to as quadrature methods.

Example: the trapezoidal rule.

The trapezoidal rule statically subdivides the domain into uniformly spaced partitions. The integral is estimated by summing up the areas of the trapezia approximating the area under the curve.

It is parallelized by allotting non-overlapping ranges of the domain to different processes. Each process finds the local sum of areas in its allocated range of the domain, and finally the global sum is updated, as in the sketch below.
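
A minimal sketch of this scheme; the integrand f, the limits, and the partition counts are illustrative, and the loop over k stands in for the separate processes (in a real implementation each k would run in its own process or thread, and the update of the global sum would be protected, e.g. by a mutex).

#include <math.h>
#include <stdio.h>

double f(double x) { return sin(x); }         /* stand-in integrand */

/* trapezoidal estimate over [lo, hi] using n uniform partitions */
double trapezoid(double lo, double hi, int n)
{
    double h = (hi - lo) / n;
    double sum = (f(lo) + f(hi)) / 2.0;
    for (int i = 1; i < n; i++)
        sum += f(lo + i * h);
    return sum * h;
}

int main(void)
{
    double a = 0.0, b = 3.141592653589793;    /* integrate sin(x) over [0, pi] */
    int n = 1000000, p = 4;                   /* total partitions and "processes" (illustrative) */
    double global = 0.0;

    for (int k = 0; k < p; k++) {             /* each k models one process with its own sub-range */
        double lo = a + k * (b - a) / p;
        double hi = a + (k + 1) * (b - a) / p;
        global += trapezoid(lo, hi, n / p);   /* local sum added to the global result */
    }
    printf("integral ~ %f (exact value is 2)\n", global);
    return 0;
}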

Problems:

Accuracy depends on the function: since the domain is divided statically, regions where the function has sharp variations are not handled very well, and hence the accuracy suffers.

Complexity increases: recovering accuracy requires finer partitioning over the whole domain, which raises the cost.

Solution 1: Adaptive Quadrature Algorithm

It uses the divide-and-conquer method.

To find the integral of a function of the type y = f(x) in the range [a, b], we take the following steps.

1. Let A = area of the trapezium considering the end points a and b,
   B = area of the trapezium considering the end points a and (a+b)/2,
   C = area of the trapezium considering the end points (a+b)/2 and b.

2. Subdivide the range [a, b] if |A - (B + C)| >= ε, for a chosen accuracy tolerance ε.

3. Assign the responsibility of one sub-domain to another process and take the other part for local processing.
4. If the above condition does not hold, compute the quadrature in the same process and update the global result. (A minimal sketch follows this list.)
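
A minimal single-process sketch of the adaptive idea (the parallel hand-off of one half to another process is omitted here); eps and the helper names are illustrative.

#include <math.h>

double trap(double (*f)(double), double a, double b)
{
    return (f(a) + f(b)) * (b - a) / 2.0;    /* area of one trapezium over [a, b] */
}

double adaptive(double (*f)(double), double a, double b, double eps)
{
    double m = (a + b) / 2.0;
    double A = trap(f, a, b);                /* coarse estimate over [a, b]   */
    double B = trap(f, a, m);                /* refined estimate over [a, m]  */
    double C = trap(f, m, b);                /* refined estimate over [m, b]  */

    if (fabs(A - (B + C)) < eps)             /* estimates agree: accept B + C */
        return B + C;

    /* otherwise subdivide; in the parallel version one half would be handed
       to another process while this one keeps the other half */
    return adaptive(f, a, m, eps / 2.0) + adaptive(f, m, b, eps / 2.0);
}

It could be called, for example, as adaptive(sin, 0.0, 3.141592653589793, 1e-9), with sin provided by <math.h>.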

Solution 2: Self-Scheduling Implementation

This scheme creates a group of worker processes, and the workers schedule themselves dynamically to pick up work that is available on a shared global stack.

When there is no more work to be done, the workers perish.

This implementation strategy is called self-scheduling.

The algorithm is:

The parent process:

1) Stores the initial task on the shared stack.

2) Creates the worker processes.

Each worker process:

1) Pops a task from the stack and starts working on it.

2) If the problem cannot be solved immediately, partitions the task further and pushes the sub-tasks onto the stack; then goes to 1 and repeats the same steps.

3) If there is no space left on the stack, the process must do the task sequentially and update the global result.

4) All processes terminate when there is no more work to be done. (A sketch of this scheme appears below.)
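
A minimal sketch of self-scheduling using POSIX threads (threads standing in for processes), applied to the adaptive quadrature task; all names (Task, STACK_MAX, NWORKERS, busy, ...) are illustrative assumptions, and the busy counter is added so that idle workers do not quit while another worker may still push new sub-tasks.

#include <math.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

typedef struct { double a, b, eps; } Task;

double f(double x) { return sin(x); }                         /* stand-in integrand */
double trap(double a, double b) { return (f(a) + f(b)) * (b - a) / 2.0; }

#define STACK_MAX 4096
#define NWORKERS 4

Task stack[STACK_MAX];
int top = 0;                          /* number of tasks currently on the shared stack */
int busy = 0;                         /* workers currently processing a task           */
double result = 0.0;                  /* global result                                 */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        Task t;
        pthread_mutex_lock(&lock);
        while (top == 0 && busy > 0) {         /* idle, but more work may still appear */
            pthread_mutex_unlock(&lock);
            sched_yield();
            pthread_mutex_lock(&lock);
        }
        if (top == 0) {                        /* no work anywhere: the worker perishes */
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        t = stack[--top];                      /* pop a task */
        busy++;
        pthread_mutex_unlock(&lock);

        double m = (t.a + t.b) / 2.0;
        double A = trap(t.a, t.b), B = trap(t.a, m), C = trap(m, t.b);

        pthread_mutex_lock(&lock);
        if (fabs(A - (B + C)) < t.eps || top + 2 > STACK_MAX)
            result += B + C;                   /* good enough (or stack full): finish here and update the global result */
        else {
            stack[top++] = (Task){ t.a, m, t.eps / 2.0 };   /* push sub-tasks for any worker to pick up */
            stack[top++] = (Task){ m, t.b, t.eps / 2.0 };
        }
        busy--;
        pthread_mutex_unlock(&lock);
    }
}

int main(void)
{
    pthread_t w[NWORKERS];
    stack[top++] = (Task){ 0.0, 3.141592653589793, 1e-9 };   /* parent stores the initial task */
    for (int k = 0; k < NWORKERS; k++)
        pthread_create(&w[k], NULL, worker, NULL);           /* parent creates the workers */
    for (int k = 0; k < NWORKERS; k++)
        pthread_join(w[k], NULL);
    printf("integral ~ %f\n", result);
    return 0;
}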

6) Matrix Multiplication

A sequential algorithm for matrix multiplication (here a is l x m, b is m x n, and c is l x n; the shared dimension is written as m):

for (i = 0; i < l; i++)
    for (j = 0; j < n; j++) {
        c[i][j] = 0;
        for (k = 0; k < m; k++)
            c[i][j] += a[i][k] * b[k][j];
    }
