
ME 766: Final. Time: 2 hrs. Tuesday, 27/04/2018 [Max: 40 points]

• Open book, open notes.

• Severe penalty for academic dishonesty: automatic FR in the course, with a report to ADAC.

• Make a reasonable assumption if you think any data is missing.

1. Consider a hypothetical Von Neumann type machine shown in the figure. This machine
has a cache of 1 KiloByte and a main memory of 10 MegaBytes. The cost of fetching
one floating point number (8 bytes in size) from the cache is 1 CPU cycle (or one clock
tick). In the event the data needed is not in the cache, a chunk of data 1 KiloByte
in size is fetched from the main memory to cache, replacing the contents of the cache.
The cost of fetching data from main memory is 150 CPU cycles.

Figure 1: Hypothetical Von Neumann Machine
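
As an illustration of the memory model described above, the following is a minimal sketch of a cycle counter for a cache that holds a single 1 KiloByte chunk. The names `cache_model`, `cache_init` and `cache_read` are hypothetical. The sketch assumes that chunks are aligned to 1 KiloByte boundaries and that a miss costs 150 cycles for the transfer plus 1 cycle for the subsequent read from cache. Other readings of the problem statement are possible.

```c
#include <assert.h>

#define CACHE_BYTES 1024L  /* cache size and transfer chunk: 1 KiloByte */
#define HIT_COST    1L     /* cycles to read one 8-byte number from cache */
#define MISS_COST   150L   /* cycles to bring one chunk in from main memory */

/* Hypothetical helper type: state of the single-chunk cache. */
typedef struct {
    long resident_chunk;   /* index of the chunk now in cache, -1 if empty */
    long cycles;           /* accumulated fetch cost in CPU cycles */
} cache_model;

void cache_init(cache_model *m)
{
    m->resident_chunk = -1;
    m->cycles = 0;
}

/* Charge the cost of reading the 8-byte number at byte address addr.
   Assumption: chunks are aligned to CACHE_BYTES boundaries. */
void cache_read(cache_model *m, long addr)
{
    assert(addr >= 0);
    long chunk = addr / CACHE_BYTES;
    if (chunk != m->resident_chunk) {
        /* Miss: the whole cache is replaced by the new chunk. */
        m->cycles += MISS_COST;
        m->resident_chunk = chunk;
    }
    /* Assumption: the read from cache is charged after a miss as well. */
    m->cycles += HIT_COST;
}
```

Under these assumptions, reading 128 consecutive 8-byte numbers from an aligned address costs one miss plus 128 cache reads, whereas reads spaced a full chunk or more apart miss every time.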

Now consider the following code snippet and its operations on the matrix A. Both
the elements of A and the scalar c are 8-byte floating-point numbers.

(a) What is the cost in CPU cycles to fetch the requisite data and compute the matrix
C (code is in C)? [5]
(b) If the code were to be in FORTRAN, what will be the cost in CPU cycles? [5]

Assume that each addition, multiplication and comparison costs 1 CPU cycle.

#define n 1024
for(j=1; j<n; j++)
    for(i=1; i<n; i++)
    {
        if(i>j)
        {
            c=A[i][j]/A[j][j];
            for(k=1; k<n; k++)
                A[i][k]=A[i][k]-c*A[j][k];
        }
    }

2. Consider a one dimensional array consisting of N random integers.

(a) Write pseudocode for a parallel algorithm to find the maximum value in the set
of integers. [5]
(b) Suppose the number of integers N is 1 million and each comparison (essentially
an IF statement) costs 1 CPU cycle. What is the minimum number of CPU
cycles required to compute the maximum if the program is run in parallel on 2
hypothetical von Neumann machines configured exactly as in Problem
1? Disregard the cost of communication between parallel processes. [5]
(c) Repeat the analysis of part (b) for 4 and 8 processes. Plot a
graph of speedup vs. processor count for 2, 4 and 8 processes. [5]
(d) If the cost of sending 1 floating point number across the network (interconnect)
is 1 million CPU cycles and one uses the distributed memory architecture, what
is the smallest problem size in terms of N for which using 4 processors will result
in a speedup? [5 + 5 bonus points]

3. Consider a continuous function f(x) on the interval [a, b]. f(a) and f(b) have opposite
signs, so there exists at least one root of f(x) in the interval [a, b].
The bisection method for finding the root is given below.

1. Let c = (a+b)/2 (the midpoint of the interval [a,b]).
2. Compute f(c). If f(c) = 0 or |f(c)| < some tolerance value,
then c is a root.
3. If f(a).f(c) < 0, a root exists in the interval [a,c];
else the root exists in the interval [c,b]. Repeat from step 1 on that interval.
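
For reference, the serial steps above can be sketched in C as follows. This is only an illustrative sketch of the serial method, not the parallel version. The function name `bisect`, the parameters `tol` and `max_iter`, and the test function `example_f` are assumptions introduced here.

```c
#include <assert.h>

/* Serial bisection following steps 1-3 above.
   tol and max_iter are assumed parameters, not part of the problem statement. */
double bisect(double (*f)(double), double a, double b, double tol, int max_iter)
{
    double fa = f(a);
    assert(fa * f(b) < 0.0);           /* f(a) and f(b) must have opposite signs */
    double c = 0.5 * (a + b);
    for (int it = 0; it < max_iter; it++) {
        c = 0.5 * (a + b);             /* step 1: midpoint of [a, b] */
        double fc = f(c);              /* step 2: evaluate f at the midpoint */
        if (fc > -tol && fc < tol)     /* i.e. |f(c)| < tol: accept c as the root */
            break;
        if (fa * fc < 0.0) {           /* step 3: sign change in [a, c] */
            b = c;
        } else {                       /* otherwise the root lies in [c, b] */
            a = c;
            fa = fc;
        }
    }
    return c;
}

/* Hypothetical test function: f(x) = x^2 - 2, with a root at sqrt(2) in [0, 2]. */
double example_f(double x)
{
    return x * x - 2.0;
}
```

A call such as `bisect(example_f, 0.0, 2.0, 1e-10, 200)` halves the bracketing interval on each pass until the tolerance test in step 2 is met.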

Write pseudocode for computing the root using a parallel bisection method with k
processors. [10]
