CONCURRENT AND PARALLEL PROGRAMMING
Part I: OpenMP
February 2008
Outline
1. Matrix multiplication
   - sequential solution
   - pthreads solution
   - OpenMP solution
2. Introduction to OpenMP
3. Expressing parallel execution
4. Work sharing
5. Data storage attributes
6. Synchronization
7. Nested parallelism
8. Using third party libraries

Matrix multiplication
Sequential version: pseudocode

for i := 1 to n do
  for j := 1 to n do
    C[i][j] := 0;
    for s := 1 to n do
      C[i][j] := C[i][j] + A[i][s] * B[s][j];
    end for;
  end for;
end for;
Matrix multiplication
PRAM CREW version: pseudocode

parfor i := 1 to n do
  parfor j := 1 to n do
    C[i][j] := 0;
    for s := 1 to n do
      C[i][j] := C[i][j] + A[i][s] * B[s][j];
    end for;
  end parfor;
end parfor;

File: matrix-seq.c
Wojciech Mikanik, PhD
Matrix multiplication
Facts

for i := 1 to n do
  for j := 1 to n do
    C[i][j] := 0;
    for s := 1 to n do
      C[i][j] := C[i][j] + A[i][s] * B[s][j];
    end for;
  end for;
end for;

Decision
Thread i computes rows i, i + NT, i + 2NT, i + 3NT, ... of the result matrix C
(NT = number of threads)
Matrix multiplication
typedef struct {
    double (*a) [N];   /* input matrix A */
    double (*b) [N];   /* input matrix B */
    double (*c) [N];   /* result matrix C */
    sem_t * s;         /* semaphore for synchronization */
    int id;            /* thread number */
} thread_args;
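The struct above packs the per-thread arguments for the pthreads solution. A minimal sketch of the worker that implements the row-cyclic decomposition from the previous slide (thread i computes rows i, i + NT, ...); the function name `worker` and the small N, NT values are assumptions for illustration, and the semaphore field is left unused here:

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define N  4   /* matrix dimension (assumed) */
#define NT 2   /* number of threads (assumed) */

typedef struct {
    double (*a)[N];   /* input matrix A */
    double (*b)[N];   /* input matrix B */
    double (*c)[N];   /* result matrix C */
    sem_t *s;         /* semaphore (unused in this sketch) */
    int id;           /* thread number */
} thread_args;

/* Worker: thread `id` computes rows id, id+NT, id+2*NT, ... of C. */
static void *worker(void *arg)
{
    thread_args *t = (thread_args *) arg;
    for (int i = t->id; i < N; i += NT) {
        for (int j = 0; j < N; j++) {
            t->c[i][j] = 0.0;
            for (int s = 0; s < N; s++)
                t->c[i][j] += t->a[i][s] * t->b[s][j];
        }
    }
    return NULL;
}
```

Each worker receives its own `thread_args` with a distinct `id`; all workers write disjoint rows of C, so no locking is needed for the result.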
Matrix multiplication
File: matrix-seq.c
File: matrix-omp.c
OpenMP
Goals

History
Format
#pragma omp directive [clause ...]

Parallel region
#pragma omp parallel [clause ...]

Work sharing

Synchronization
#include <omp.h>
- Query functions
- Lock functions
- Dynamic adjustment of the number of threads
- Nested parallelism
Setting the number of threads:
- region wide: the num_threads clause
- application wide: the OMP_NUM_THREADS environment variable

Environment variables
- OMP_DYNAMIC
- OMP_NUM_THREADS
- OMP_NESTED
- OMP_SCHEDULE
parallel directive
#pragma omp parallel
{
printf("hello world\n");
}
Useful query functions
- omp_get_thread_num()
- omp_get_num_threads()

Runtime library

Compiler directives
Expressing parallelism
- sections directive
- combined parallel sections directive
- for directive
- combined parallel for directive
- single directive
- master directive

sections directive: compare cobegin/coend

cobegin
  block1
  block2
  block3
  ...
coend;

single directive

#pragma omp single
{
  doSth (defaultDelay);
  single2 = omp_get_thread_num();
}

master directive

#pragma omp master
master = omp_get_thread_num();
sections directive
An example

#pragma omp sections
{
  #pragma omp section
  {
    printf ("This code has been executed by thread %d of %d\n",
            omp_get_thread_num (), omp_get_num_threads ());
  }
  #pragma omp section
  {
    printf ("This code has been executed by thread %d of %d\n",
            omp_get_thread_num (), omp_get_num_threads ());
  }
}
- With sections: exactly 4 messages; the thread numbers printed form any 4-element permutation with repetitions of the integers [0 .. teamSize - 1]
- With parallel: exactly teamSize messages; the thread numbers printed form a permutation without repetitions of the integers [0 .. teamSize - 1]
for directive
Syntax

#pragma omp for [clause ...]
for-loop

Example

int a[ARR_SIZE];
#pragma omp parallel
{
  #pragma omp for
  for (i = 0; i < ARR_SIZE; i ++)
    a[i] = 1;
}
schedule clause
- schedule(static [, chunk_size]): iterations are divided into chunks of chunk_size (or into chunks of roughly equal size, if chunk_size is not provided), assigned round-robin in the order of the thread number
- schedule(dynamic [, chunk_size]): chunks are handed out to threads on request as they finish previous chunks
- schedule(guided [, chunk_size]): like dynamic, but with chunk sizes decreasing over time
- schedule(runtime): scheduling according to the value of the OMP_SCHEDULE environment variable
combined parallel for directive
Syntax

#pragma omp parallel for [clause ...]
for-loop
Example
double a [N][N], b [N][N], c [N][N];
int i;
...
#pragma omp parallel for
for (i = 0; i < N; i ++){
int j, s;
for (j = 0; j < N; j ++){
c[i][j] = 0.0;
for (s = 0; s < N; s ++)
c[i][j] += a[i][s] * b[s][j];
}
}
Data model
Declaration: variables can be
- shared
- private
- subject to reduction

Defaults
- usually: shared
- private: loop iteration variables of worksharing loops and variables declared inside the parallel region

Clauses
- shared (list)
- private (list)
- default (shared | none)
- firstprivate (list)
- lastprivate (list)
- reduction (operator : list)
- threadprivate directive
- copyin clause
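The firstprivate and lastprivate clauses can be seen in a small sketch (function and variable names are assumptions): firstprivate initializes each thread's private copy from the value before the region, and lastprivate copies the value from the sequentially last iteration back out:

```c
#include <omp.h>

/* lastprivate: the value from the sequentially last iteration
   (i == n-1) survives the loop. */
int demo_lastprivate(int n)
{
    int i, last = -1;
    #pragma omp parallel for lastprivate(last)
    for (i = 0; i < n; i++)
        last = i;
    return last;              /* n - 1 */
}

/* firstprivate: every thread's private copy of `base` starts at the
   value it had before the region (100). */
int demo_firstprivate(void)
{
    int i, base = 100, sum = 0;
    #pragma omp parallel for firstprivate(base) reduction(+: sum)
    for (i = 0; i < 4; i++)
        sum += base + i;
    return sum;               /* 4*100 + (0+1+2+3) = 406 */
}
```

Without firstprivate, `private(base)` would leave each copy uninitialized; without lastprivate, `last` would be undefined after the loop.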
Data model
void matrix_mult2(){
double a [N][N], b [N][N], c [N][N];
int i;
...
#pragma omp parallel
{
#pragma omp for
for (i = 0; i < N; i ++){
int j, s;
for (j = 0; j < N; j ++){
c[i][j] = 0.0;
for (s = 0; s < N; s ++)
c[i][j] += a[i][s] * b[s][j];
}
}
}
}
Data model
void matrix_mult3(){
double a [N][N], b [N][N], c [N][N];
int i, j, s;
...
#pragma omp parallel
{
#pragma omp for private(j, s)
for (i = 0; i < N; i ++){
for (j = 0; j < N; j ++){
c[i][j] = 0.0;
for (s = 0; s < N; s ++)
c[i][j] += a[i][s] * b[s][j];
}
}
}
}
Sequential consistency
- benefits
- drawbacks

Relaxed consistency
- temporary view
- synchronization points

reduction clause
reduction (operator : list)
Example:
#define ARR_SIZE 100000
int a [ARR_SIZE], i, sum;
...
sum = 0;
#pragma omp parallel
{
#pragma omp for reduction (+: sum)
for (i = 0; i < ARR_SIZE; i ++)
sum += a[i];
}
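The reduction example above, wrapped into a checkable function: each thread accumulates into a private copy of `sum`, and OpenMP combines the copies with `+` at the end of the loop (the function name `parallel_sum` is an assumption):

```c
#include <omp.h>

/* Parallel sum of an int array using the reduction clause. */
int parallel_sum(const int *a, int n)
{
    int i, sum = 0;
    #pragma omp parallel
    {
        #pragma omp for reduction(+: sum)
        for (i = 0; i < n; i++)
            sum += a[i];
    }
    return sum;
}
```

Unlike a plain shared update, the reduction involves no data race and no per-iteration synchronization.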
Synchronization

Implicit synchronization (a barrier, removable with the nowait clause):
- for directive
- sections directive
- single directive

Explicit synchronization (directives):
- master
- critical
- atomic
- barrier
- ordered directive and ordered clause
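By default each worksharing for ends with an implicit barrier; nowait removes it, letting threads run straight into the next loop. A minimal sketch (function and array names are assumptions; the two loops touch disjoint arrays, so dropping the barrier is safe here):

```c
#include <omp.h>

#define M 1000

/* Two independent loops; nowait drops the barrier after the first,
   so a thread finished with its share of x moves on to y immediately. */
void fill(int *x, int *y)
{
    int i;
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (i = 0; i < M; i++)
            x[i] = i;
        #pragma omp for
        for (i = 0; i < M; i++)
            y[i] = 2 * i;
    }
}
```

If the second loop read from `x`, the nowait would introduce a race; it is only correct when the loops are independent.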
critical directive
Syntax

#pragma omp critical [(name)]
block
Example
#define ARR_SIZE 100000
int a [ARR_SIZE], i, sum;
...
sum = 0;
#pragma omp parallel for
for (i = 0; i < ARR_SIZE; i ++)
#pragma omp critical
sum += a[i];
printf ("Sum == %d\n", sum);
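The critical section above serializes every single addition. The atomic directive (listed among the explicit synchronization directives) covers this special case of a single memory update and can map to a hardware atomic instruction; a sketch of the same sum with atomic (function name assumed), though for a plain sum the reduction clause remains the idiomatic choice:

```c
#include <omp.h>

/* Sum with an atomic update instead of a named critical section. */
int atomic_sum(const int *a, int n)
{
    int i, sum = 0;
    #pragma omp parallel for
    for (i = 0; i < n; i++)
    {
        /* atomic protects exactly this one update of sum */
        #pragma omp atomic
        sum += a[i];
    }
    return sum;
}
```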
ordered example
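The ordered directive, inside a loop carrying the ordered clause, forces the marked block to execute in sequential iteration order even though the iterations themselves are shared among threads. A minimal sketch (function and buffer names are assumptions):

```c
#include <omp.h>

/* Each iteration appends its index to `out`. The ordered regions run
   one at a time, in iteration order, so out receives 0, 1, 2, ... */
void ordered_fill(int *out, int n)
{
    int i, k = 0;
    #pragma omp parallel for ordered shared(k)
    for (i = 0; i < n; i++)
    {
        #pragma omp ordered
        out[k++] = i;
    }
}
```

Only the ordered block is serialized; any work before it in the loop body still runs in parallel.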
Remaining issues
- Nested parallelism