• Severe penalty for academic dishonesty: automatic FR in the course, with a report to ADAC.
1. Consider a hypothetical von Neumann-type machine, shown in the figure. This machine
has a cache of 1 kilobyte and a main memory of 10 megabytes. The cost of fetching
one floating-point number (8 bytes in size) from the cache is 1 CPU cycle (one clock
tick). If the data needed is not in the cache, a chunk of data 1 kilobyte
in size is fetched from main memory into the cache, replacing the contents of the cache.
The cost of fetching this chunk from main memory is 150 CPU cycles.
Now consider the following code snippet and its operations on the matrix A. Both
A and c are 8-byte floating-point types.
(a) What is the cost in CPU cycles to fetch the requisite data and compute the updated
matrix A (the code is in C)? [5]
(b) If the code were written in FORTRAN, what would the cost in CPU cycles be? [5]
Assume that each addition, multiplication and comparison costs 1 CPU cycle.
#define n 1024
for(j=1; j<n; j++)
    for(i=1; i<n; i++)
    {
        if(i>j)
        {
            c=A[i][j]/A[j][j];
            for(k=1; k<n; k++)
                A[i][k]=A[i][k]-c*A[j][k];
        }
    }
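The cache model in Problem 1 can be explored with a small simulation. The sketch below is only an illustration and not a worked answer: it models a cache holding a single 1 KB block and counts hits and misses while walking an n x n matrix of 8-byte values under a row-major (C) or column-major (FORTRAN) layout. The names `Cache`, `cache_read`, `cache_cycles`, `row_major`, `col_major` and `walk_rows_misses` are hypothetical. It assumes that the matrix starts on a block boundary and that a miss costs 150 cycles for the block load plus 1 cycle for the read from cache; the problem statement does not settle either point. It does not model the interleaved accesses to rows i and j in the snippet above.

```c
#define BLOCK_BYTES 1024  /* the cache holds exactly one 1 KB block */
#define ELEM_BYTES  8     /* one 8-byte floating-point number */
#define HIT_CYCLES  1     /* cost of a read served from the cache */
#define MISS_CYCLES 150   /* cost of loading one block from main memory */

typedef struct {
    long block;   /* index of the block currently cached, -1 if empty */
    long hits;
    long misses;
} Cache;

void cache_init(Cache *c)
{
    c->block = -1;
    c->hits = 0;
    c->misses = 0;
}

/* Record one read of the element with the given linear element index. */
void cache_read(Cache *c, long elem_index)
{
    long block = (elem_index * ELEM_BYTES) / BLOCK_BYTES;
    if (block == c->block) {
        c->hits++;
    } else {
        c->misses++;       /* the whole cache content is replaced */
        c->block = block;
    }
}

/* ASSUMPTION: a miss costs the block load plus one read from the cache. */
long cache_cycles(const Cache *c)
{
    return c->misses * (MISS_CYCLES + HIT_CYCLES) + c->hits * HIT_CYCLES;
}

/* Linear offset of element (i, j), 0-based, in an n x n matrix. */
long row_major(long i, long j, long n) { return i * n + j; }  /* C layout */
long col_major(long i, long j, long n) { return j * n + i; }  /* FORTRAN layout */

/* Walk the matrix with i as the outer loop and j as the inner loop,
   and return the number of cache misses under the given layout. */
long walk_rows_misses(long n, long (*layout)(long, long, long))
{
    Cache c;
    cache_init(&c);
    for (long i = 0; i < n; i++)
        for (long j = 0; j < n; j++)
            cache_read(&c, layout(i, j, n));
    return c.misses;
}
```

Calling `walk_rows_misses(256, row_major)` and `walk_rows_misses(256, col_major)` shows how strongly the miss count depends on whether the inner loop runs along contiguous memory, which is the effect part (b) asks about.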
2. (a) Write pseudocode for a parallel algorithm to find the maximum value in a set
of integers. [5]
(b) Suppose the number of integers N is 1 million and each comparison (essentially an IF
statement) costs 1 CPU cycle. What is the minimum number of CPU
cycles required to compute the maximum if the program is run in parallel on 2
hypothetical von Neumann machines, each configured exactly as in Problem
1? Disregard the cost of communication between parallel processes. [5]
(c) Repeat the analysis of the above question for 4 and 8 processes. Plot a
graph of speedup vs. processor count for 2, 4 and 8 processes. [5]
(d) If the cost of sending 1 floating-point number across the network (interconnect)
is 1 million CPU cycles and a distributed-memory architecture is used, what
is the smallest problem size, in terms of N, for which using 4 processors will result
in a speedup? [5 + 5 bonus points]
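One possible shape for the parallel maximum in Problem 2 is a block partition followed by a reduction. The sketch below simulates that serially in C and is not a model answer. Each of p hypothetical processes scans its own contiguous chunk, then the p local maxima are combined one after another. The names `local_max` and `block_max` are illustrative. The code counts comparisons only: it ignores the memory-fetch cost of Problem 1 and the communication cost of part (d). It also assumes a linear combine step of up to p - 1 comparisons, whereas a tree reduction would need fewer steps on the critical path.

```c
#include <stddef.h>

/* Maximum of values[lo..hi-1]; requires lo < hi.
   Adds the number of comparisons performed to *comparisons. */
int local_max(const int *values, size_t lo, size_t hi, long *comparisons)
{
    int best = values[lo];
    for (size_t i = lo + 1; i < hi; i++) {
        (*comparisons)++;
        if (values[i] > best)
            best = values[i];
    }
    return best;
}

/* Serial simulation of p processes, each scanning one block of the input.
   Requires n >= 1 and p >= 1. On return, *critical_path holds the comparison
   count of the slowest process plus the comparisons of the final combine
   step (ASSUMPTION: the combine step is done linearly by one process). */
int block_max(const int *values, size_t n, size_t p, long *critical_path)
{
    long slowest = 0;
    long combine = 0;
    int best = 0;
    int have_best = 0;

    for (size_t r = 0; r < p; r++) {
        size_t lo = r * n / p;
        size_t hi = (r + 1) * n / p;
        if (lo == hi)
            continue;               /* more processes than elements */

        long cmp = 0;
        int m = local_max(values, lo, hi, &cmp);
        if (cmp > slowest)
            slowest = cmp;          /* blocks run concurrently in a real run */

        if (!have_best) {
            best = m;
            have_best = 1;
        } else {
            combine++;              /* one comparison per extra local maximum */
            if (m > best)
                best = m;
        }
    }
    *critical_path = slowest + combine;
    return best;
}
```

Dividing the critical path for p = 1 by the critical path for a larger p gives a comparison-only estimate of speedup under these assumptions.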
3. Consider a continuous function f(x) on the interval [a, b]. f(a) and f(b) have opposite
signs, so there exists at least one root of f(x) in the interval [a, b].
The bisection method of finding the root is given below.
Write pseudocode for computing the root using a parallel bisection method with k
processors. [10]
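One common way to generalise bisection to k processors is to split [a, b] into k + 1 equal subintervals in each iteration, let processor r evaluate f at the r-th interior point, and keep the subinterval across which the sign changes. The sketch below simulates this serially in C. It is an assumption about what a parallel variant could look like, not the method from the handout. The names `example_f` and `parallel_bisection` and the parameters `tol` and `max_iter` are hypothetical. It assumes a < b and that f(a) and f(b) are nonzero with opposite signs.

```c
/* Hypothetical test function with a root at sqrt(2). */
double example_f(double x)
{
    return x * x - 2.0;
}

/* Serial simulation of bisection with k processors.
   Each iteration shrinks the bracketing interval by a factor of k + 1. */
double parallel_bisection(double (*f)(double), double a, double b,
                          int k, double tol, int max_iter)
{
    double fa = f(a);

    for (int it = 0; it < max_iter && (b - a) > tol; it++) {
        double h = (b - a) / (k + 1);
        double left = a;        /* left end of the subinterval being tested */
        double fleft = fa;
        double right = b;       /* default: the change is in the last piece */
        int found = 0;

        /* In a real parallel run the k evaluations happen concurrently;
           here they run in order and stop at the first sign change. */
        for (int r = 1; r <= k && !found; r++) {
            double x = a + r * h;
            double fx = f(x);
            if (fx == 0.0)
                return x;                       /* exact root found */
            if ((fleft < 0.0) != (fx < 0.0)) {
                right = x;                      /* sign change in [left, x] */
                found = 1;
            } else {
                left = x;
                fleft = fx;
            }
        }

        a = left;
        fa = fleft;
        b = right;
    }
    return 0.5 * (a + b);
}
```

With k = 1 this reduces to ordinary bisection. A call such as `parallel_bisection(example_f, 0.0, 2.0, 4, 1e-9, 200)` brackets the root of the example function in far fewer iterations than k = 1 would need, at the price of k function evaluations per iteration.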