Beruflich Dokumente
Kultur Dokumente
4 Introduction to Threads
5 Kinds of threads
User threads
Kernel threads
Mixed models
6 User Threads and Blocking System Calls
Scheduler Activations
7 Thread Programming Interface
POSIX Threads
Linux POSIX Threads Libraries
Basic POSIX Thread API
Part I: High Performance Architectures
Part II: Parallelism and Threads
Outlines
Part III: Synchronisation
Part IV: Multithreading and Networking
Synchronisation
8 Hardware Support
9 Busy-waiting Synchronisation
Part I
Parallel Architectures
Multiprocessors Clusters
P P
P Mem P Mem P Mem
Mem
P P Fast network
Parallel Machines with Shared Memory
ILP and multi-cores
Parallel Machines with Distributed Memory
Symmetric Multi Processors
Current Architectures in HPC
P P M P P M
M
● SMT P P M P P M
(HyperThreading) M
● Multi-core
● NUMA
P P M P P M
M
Parallel Machines with Shared Memory
ILP and multi-cores
Parallel Machines with Distributed Memory
Symmetric Multi Processors
Current Architectures in HPC
AMD Quad-Core
P P P P
L2 $ L2 $ L2 $ L2 $
M L3 $
P P P P
€ L2
L2 $ €
L2L2
$ L2
€ L2
$ L2
€ L2
$
Shared L3 cache M € L3
L3 $
P P P P
NUMA factor ~1.1-1.5 M
L2 $ L2 $ L2 $ L2 $
L3 $
...
Parallel Machines with Shared Memory
ILP and multi-cores
Parallel Machines with Distributed Memory
Symmetric Multi Processors
Current Architectures in HPC
Intel Quad-Core
P P P P
L2 $ L2 $
M
P P P P
L2 $ L2 $
...
Parallel Machines with Shared Memory
ILP and multi-cores
Parallel Machines with Distributed Memory
Symmetric Multi Processors
Current Architectures in HPC
dual-quad-core
● Intel
● Hierarchical cache levels
P0 P4 P2 P6
P1 P5 P3 P7
5
Parallel Machines with Shared Memory
Clusters
Parallel Machines with Distributed Memory
Grids
Current Architectures in HPC
Clusters
Grids
Part II
4 Introduction to Threads
5 Kinds of threads
User threads
Kernel threads
Mixed models
6 User Threads and Blocking System Calls
Scheduler Activations
7 Thread Programming Interface
POSIX Threads
Linux POSIX Threads Libraries
Basic POSIX Thread API
Introduction to Threads
Kinds of threads
User Threads and Blocking System Calls
Thread Programming Interface
Using process
Process
Processors
Introduction to Threads
Kinds of threads
User Threads and Blocking System Calls
Thread Programming Interface
Using threads
Multithreaded process
Process
Memory
Processors
Introduction to Threads
Kinds of threads
User Threads and Blocking System Calls
Thread Programming Interface
Introduction to Threads
Why threads ?
To take profit from shared memory parallel architectures
SMP, hyperthreaded, multi-core, NUMA, etc. processors
future Intel processors: several hundreds cores
To describe the parallelism within the applications
independent tasks, I/O overlap, etc.
User threads
Multithreaded process
User level
threads
User scheduler
Entities managed
by the kernel
Kernel scheduler
Operating system
Processors
Kernel threads
Multithreaded process
Kernel level
threads
Entities managed
by the kernel
Kernel scheduler
Operating system
Processors
Mixed models
Multithreaded process
User level
threads
User scheduler
Entities managed
by the kernel
Kernel scheduler
Operating system
Processors
Mixed models
Characteristics
Library Efficiency Flexibility SMP Blocking syscalls
User + + - -
Kernel - - + +
Mixed + + + limited
Summary
Mixed libraries seems more attractive however they are more
complex to develop. They also suffer from the blocking system
call problem.
Introduction to Threads
Kinds of threads
Scheduler Activations
User Threads and Blocking System Calls
Thread Programming Interface
Scheduler Activations
Idea proposed by Anderson et al. (91)
Dialogue (and not monologue) between the user and kernel
schedulers
the user scheduler uses system calls
the kernel scheduler uses upcalls
Upcalls
Notify the application of scheduling kernel events
Activations
a new structure to support upcalls
a kinf of kernel thread or virtual processor
creating and destruction managed by the kernel
Introduction to Threads
Kinds of threads
Scheduler Activations
User Threads and Blocking System Calls
Thread Programming Interface
Scheduler Activations
Instead of:
User
space
Kernel Time
Scheduler Activations
Instead of:
User
space
Kernel Time
User
space
upcall(blocked, new) upcall(unblocked, preempted, new)
Kernel Time
Act. B Act. C
Act. A (next)
External request Interruption
Introduction to Threads
Kinds of threads
Scheduler Activations
User Threads and Blocking System Calls
Thread Programming Interface
Working principle
Multithreaded process
User-level
threads
User scheduler
Activations
managed by the kernel
Kernel scheduler
Operating system
Processors
Introduction to Threads
Kinds of threads
Scheduler Activations
User Threads and Blocking System Calls
Thread Programming Interface
Working principle
Multithreaded process
User-level
threads
Upcall(New)
Activations
managed by the kernel
Kernel scheduler
Operating system
Processors
Introduction to Threads
Kinds of threads
Scheduler Activations
User Threads and Blocking System Calls
Thread Programming Interface
Working principle
Multithreaded process
User-level
threads
Activations
managed by the kernel
Kernel scheduler
Operating system
Processors
Introduction to Threads
Kinds of threads
Scheduler Activations
User Threads and Blocking System Calls
Thread Programming Interface
Working principle
Multithreaded process
User-level
threads
Upcall(New)
Activations
managed by the kernel
Kernel scheduler
Operating system
Processors
Introduction to Threads
Kinds of threads
Scheduler Activations
User Threads and Blocking System Calls
Thread Programming Interface
Working principle
Multithreaded process
Blocking
User-level system
threads call
Activations
managed by the kernel
Kernel scheduler
Operating system
Processors
Introduction to Threads
Kinds of threads
Scheduler Activations
User Threads and Blocking System Calls
Thread Programming Interface
Working principle
Multithreaded process
Blocking
User-level system
threads call
Upcall(Blocked, New)
Activations
managed by the kernel
Kernel scheduler
Operating system
Processors
Introduction to Threads
Kinds of threads
Scheduler Activations
User Threads and Blocking System Calls
Thread Programming Interface
Working principle
Multithreaded process
Blocking
User-level system
threads call
Activations
managed by the kernel
Kernel scheduler
Operating system
Processors
Introduction to Threads
Kinds of threads
Scheduler Activations
User Threads and Blocking System Calls
Thread Programming Interface
Working principle
Multithreaded process
Blocking
User-level system
threads call
Activations
managed by the kernel
Kernel scheduler
Operating system
Processors
Introduction to Threads
Kinds of threads
Scheduler Activations
User Threads and Blocking System Calls
Thread Programming Interface
Working principle
Multithreaded process
User-level
threads
Activations
managed by the kernel
Kernel scheduler
Operating system
Processors
Introduction to Threads
Kinds of threads
Scheduler Activations
User Threads and Blocking System Calls
Thread Programming Interface
Working principle
Multithreaded process
User-level
threads
Upcall(Unblocked,Preempted,New)
Activations
managed by the kernel
Kernel scheduler
Operating system
Processors
Introduction to Threads
POSIX Threads
Kinds of threads
Linux POSIX Threads Libraries
User Threads and Blocking System Calls
Basic POSIX Thread API
Thread Programming Interface
POSIX normalisation
IEEE POSIX 1003.1 C norm (also called POSIX threads
norm)
Only the API is normalised (not the ABI)
POSIX thread libraries can easily be switched at source
level but not at runtime
POSIX threads own
processor registers, stack, etc.
signal mask
POSIX threads can be of any kind (user, kernel, etc.)
Introduction to Threads
POSIX Threads
Kinds of threads
Linux POSIX Threads Libraries
User Threads and Blocking System Calls
Basic POSIX Thread API
Thread Programming Interface
Creation/destruction
int pthread_create(pthread_t *thread, const
pthread_attr_t *attr, void
*(*start_routine)(void*), void *arg)
void pthread_exit(void *value_ptr)
int pthread_join(pthread_t thread, void
**value_ptr)
Synchronisation (semaphores)
int sem_init(sem_t *sem, int pshared, unsigned
int value)
int sem_wait(sem_t *sem)
int sem_post(sem_t *sem)
int sem_destroy(sem_t *sem)
Introduction to Threads
POSIX Threads
Kinds of threads
Linux POSIX Threads Libraries
User Threads and Blocking System Calls
Basic POSIX Thread API
Thread Programming Interface
Synchronisation (conditions)
int pthread_cond_init(pthread_cond_t *cond,
const pthread_condattr_t *attr)
int pthread_cond_wait(pthread_cond_t *cond,
pthread_mutex_t *mutex)
int pthread_cond_signal(pthread_cond_t *cond)
Introduction to Threads
POSIX Threads
Kinds of threads
Linux POSIX Threads Libraries
User Threads and Blocking System Calls
Basic POSIX Thread API
Thread Programming Interface
Part III
Synchronisation
Hardware Support
Busy-waiting Synchronisation
High-level Synchronisation Primitives
Some examples with Linux
Outlines: Synchronisation
8 Hardware Support
9 Busy-waiting Synchronisation
Hardware Support
var++; var++;
Hardware Support
Busy-waiting Synchronisation
High-level Synchronisation Primitives
Some examples with Linux
Hardware Support
Hardware Support
Example of code
while (! TAS(&var))
;
/* in critical section */
var=0;
Hardware Support
Busy-waiting Synchronisation
High-level Synchronisation Primitives
Some examples with Linux
Example of code
while (! TAS(&var))
while (var) ;
/* in critical section */
var=0;
Hardware Support
Busy-waiting Synchronisation
High-level Synchronisation Primitives
Some examples with Linux
Example of code
while (! TAS(&var))
while (var) ;
/* in critical section */
var=0;
Busy waiting
+ very reactive
+ no OS or lib support required
- use a processor while not doing anything
- does not scale if there are lots of waiters
Hardware Support
Busy-waiting Synchronisation Semaphores
High-level Synchronisation Primitives Monitors
Some examples with Linux
Semaphores
Monitors
Mutex
Two states: locked or not
Two methods:
lock(m) take the mutex
unlock(m) release the mutex (must be done by the
thread owning the mutex)
Conditions
waiting thread list (conditions are not related with tests)
Three methods:
wait(c, m) sleep on the condition. The mutex is released
atomically during the wait.
signal(c) one sleeping thread is wake up
broadcast(c) all sleeping threads are wake up
Hardware Support
Busy-waiting Synchronisation Old Linux libpthread
High-level Synchronisation Primitives New POSIX Thread Library
Some examples with Linux
Part IV
Standard in industry
frequently used by engineers, physicians, etc.
MPI2 begin to be available
if (mynode!=0)
MPI_Recv();
req=MPI_Isend(next);
Work(); /* about 1s */
MPI_Wait(req);
if (mynode==0)
MPI_Recv();
A Brief Overview of MPI Problems arises
Mixing Threads and Communication in HPC Discussion about solution
if (mynode!=0)
MPI_Recv();
req=MPI_Isend(next);
Work(); /* about 1s */
MPI_Wait(req);
if (mynode==0)
MPI_Recv();
expected time: ~ 1 s
observed time: ~ 4 s
A Brief Overview of MPI Problems arises
Mixing Threads and Communication in HPC Discussion about solution
req=MPI_Isend(next);
Recv req
Isend Work(); /* about 1s */
ack MPI_Wait(req);
Calculus
if (mynode==0)
Wait data MPI_Recv();
Isend req
expected time: ~ 1 s
Calculus observed time: ~ 4 s
A Brief Overview of MPI Problems arises
Mixing Threads and Communication in HPC Discussion about solution
Possible solutions
add calls to MPI_test() in the code
using a multithreaded MPI version
+ parallelism, communication progression independent from
computations
- busy waiting synchronisation less efficient
- scrutation must be managed
A Brief Overview of MPI Problems arises
Mixing Threads and Communication in HPC Discussion about solution
instead of
A Brief Overview of MPI Problems arises
Mixing Threads and Communication in HPC Discussion about solution
A
MPI call
group
{
MPI
}
library poll
{
}
User level
Application threads scheduler
A Brief Overview of MPI Problems arises
Mixing Threads and Communication in HPC Discussion about solution
A
MPI call
init_server() Callback functions:
- group
- poll
Frequency
group
{
MPI
}
library poll
{
}
User level
Application threads scheduler
A Brief Overview of MPI Problems arises
Mixing Threads and Communication in HPC Discussion about solution
A C
B MPI call
ev_wait()
ev_wait()
group
{
MPI
}
library poll
{ Req1 Req2
}
User level
Application threads scheduler
A Brief Overview of MPI Problems arises
Mixing Threads and Communication in HPC Discussion about solution
A C
B MPI call
Aggregated
requests
1 2
group
{
MPI
}
library poll
{ Req1 Req2
}
User level
Application threads scheduler
A Brief Overview of MPI Problems arises
Mixing Threads and Communication in HPC Discussion about solution
A C
B MPI call
Aggregated
requests
1 2
group
{
MPI
}
library poll
{ Req1 Req2
}
User level
Application threads scheduler
A Brief Overview of MPI Problems arises
Mixing Threads and Communication in HPC Discussion about solution
A C
B MPI call
Aggregated
requests
1
group
{
MPI
}
library poll
{ Req1
}
User level
Application threads scheduler
Part V
Conclusion
Outlines: Conclusion
Conclusion
Multi-threading
cannot be avoided in current HPC
directly or through languages/middlewares
difficulties to get a efficient scheduling
no perfect universal scheduler
threads must be scheduled with respect to memory (NUMA)
threads and communications must be scheduled together