Parallel Computing

Parallel & High Performance Computing
8/18/2013
High Performance Computing

Branch
of computing that deals with extremely powerful computers and the applications that use them Supercomputers: Fastest computer at any given point of time HPC Applications: Applications that cannot be solved by conventional computers in a reasonable amount of time
8/18/2013
Supercomputers
Characterized
memory Speed measured in terms of number of floating point operations per second (FLOPS) Fastest Computer in the world: Earth Simulator (NEC, Japan) 35 Tera Flops Memory in the order of hundreds of gigabytes or terabytes
by very high speed, very large
8/18/2013
HPC Technologies
Different
approaches for building supercomputers

Traditional : Build faster CPUs
Special
Semiconductor technology for increasing clock speed Advanced CPU architecture: Pipelining, Vector Processing, Multiple functional units etc.
Parallel Processing : Harness large number of ordinary CPUs and divide the job between them
8/18/2013
Traditional Supercomputers
Eg:
CRAY Very complex architecture Very high clock speed results in very high heat dissipation and advanced cooling techniques (Liquid Freon / Liquid Nitrogen) Custom built or produced as per order Extremely expensive Advantages: Program development is conventional and straight forward
8/18/2013
Alternative to Supercomputer
Parallel Computing: the use of multiple computers or processors working together on a single problem; harness large number of ordinary CPUs and divide the job between them each processor works on its section of the problem processors are allowed to exchange information with other processors via fast interconnect path
Sequential 1 1
Parallel
cpu 1 2500 2501 5000 5001 cpu 3 7500 7501 cpu 4 cpu 2
10000
10000
Big advantages of parallel computers: 1. total computing performance multiples of processors used 2. total very large amount of memory to fit very large programs 3. Much lower cost and can be developed in India
8/18/2013
What is Parallel Computing?

Traditionally
software has been written for serial computation. Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem.
8/18/2013
Why Use Parallel Computing?

Saves
time wall clock time Cost savings Overcoming memory constraints Its the future of computing
8/18/2013
10
8/18/2013
Flynns Classical Taxonomy

Distinguishes
multi-processor architecture by instruction and data SISD Single Instruction, Single Data SIMD Single Instruction, Multiple Data MISD Multiple Instruction, Single Data MIMD Multiple Instruction, Multiple Data
11
8/18/2013
Flynns Classical Taxonomy: SISD

Serial
Only
one instruction and data stream is acted on during any one clock cycle
12
8/18/2013
SISD
IS
IS
DS
13
8/18/2013
Flynns Classical Taxonomy: SIMD

All
processing units execute the same instruction at any given clock cycle. Each processing unit operates on a different data element.
14
8/18/2013
SIMD
P
IS DS
C
P
DS
15
8/18/2013
Flynns Classical Taxonomy: MISD

Different
instructions operated on a single data element. Very few practical uses for this type of classification. Example: Multiple cryptography algorithms attempting to crack a single coded message.
16
8/18/2013
MISD
IS
IS
DS
M
IS
IS
DS
17
8/18/2013
Flynns Classical Taxonomy: MIMD

Can
execute different instructions on different data elements. Most common type of parallel computer.
18
8/18/2013
MIMD
IS
IS
DS
M
IS
IS
DS
19
8/18/2013
Modern Classification
Parallel architectures
Data-parallel
Function-parallel
architectures
architectures
20
8/18/2013
Data Parallel Architectures

Data-parallel
architectures
Vector architectures
Associative And neural architectures
SIMDs
Systolic architectures
21
8/18/2013
Function Parallel Architectures

Function-parallel architectures Instr level Parallel Arch (ILPs) Thread level Parallel Arch Process level Parallel Arch (MIMDs)
Pipelined VLIWs Superscalar processors processors
Distributed Memory MIMD
Shared Memory MIMD
22
8/18/2013
Types of Parallel Computers
The parallel computers are classified as

shared memory
distributed memory
Both shared and distributed memory systems have:

1. 2. 3.
processors: now generally commodity processors memory: now general commodity DRAM/DDR network/interconnect: between the processors or memory
23
8/18/2013
Interconnect Method
There is no single way to connect bunch of processors The manner in which the nodes are connected Network & Topology Best choice would be a fully connected network (every processor to every other). Unfeasible for cost and scaling reasons : Instead, processors are arranged in some variation of a grid, torus, tree, bus, mesh or hypercube.
3-d hypercube
2-d mesh
2-d torus
24
8/18/2013
Parallel Computer Memory Architectures: Shared Memory Architecture

All
processors access all memory as a single global address space. Data sharing is fast. Lack of scalability between memory and CPUs
25
8/18/2013
Shared Address Space

Process P1 Shared Private Process P2 Shared Private Process P3 Pvt P1 Shared
Pvt P2
Pvt P3
Shared
Private
Physical address space
Virtual address space of each process
26
8/18/2013
Parallel Computer Memory Architectures: Distributed Memory

Each
processor has its own memory. Is scalable, no overhead for cache coherency. Programmer is responsible for many details of communication between processors.
27
8/18/2013
Symmetric Multiprocessors (SMP)

A collection of processors, a collection of memory: both are connected through some interconnect (usually, the fastest possible) Symmetric because latency for any processor to access any memory is constant uniform memory access (UMA)
Proc 1 Proc 2 Proc 3 Proc 4
Mem 1
Mem 2
Mem 3
Mem 4
28
8/18/2013
Distributed Memory Multiprocessors

Each processor has local memory that is accessible through a fast interconnect The different nodes are connected as I/O devices with (potentially) slower interconnect
Local memory access is a lot faster than remote memory non-uniform memory access (NUMA)
Advantage: can be built with commodity processors and many applications will perform well thanks to locality
Proc 1 Mem 1 Proc 2 Mem 2 Proc 3 Mem 3 Proc 4 Mem 4
29
8/18/2013
Parallel Programming Models

Exist
as an abstraction above hardware and memory architectures Examples:
Shared Memory Threads Messaging Passing Data Parallel
30
8/18/2013
Parallel Programming Models: Shared Memory Model

Appears
to the user as a single shared memory, despite hardware implementations. Locks and semaphores may be used to control shared memory access. Program development can be simplified since there is no need to explicitly specify communication between tasks.
31
8/18/2013
Parallel Programming Models: Threads Model

A
single process may have multiple, concurrent execution paths. Typically used with a shared memory architecture. Programmer is responsible for determining all parallelism.
32
8/18/2013
Parallel Programming Models: Message Passing Model

Tasks
exchange data by sending and receiving messages. Typically used with distributed memory architectures. Data transfer requires cooperative operations to be performed by each process. Ex.- a send operation must have a receive operation. MPI (Message Passing Interface) is the interface standard for message passing.
33
8/18/2013
Parallel Programming Models: Data Parallel Model

Tasks
performing the same operations on a set of data. Each task working on a separate piece of the set. Works well with either shared memory or distributed memory architectures.
34
8/18/2013
Designing Parallel Programs: Automatic Parallelization

Automatic
Compiler analyzes code and identifies opportunities for parallelism Analysis includes attempting to compute whether or not the parallelism actually improves performance. Loops are the most frequent target for automatic parallelism.
35
8/18/2013
Designing Parallel Programs: Manual Parallelization

Understand
the problem
A Parallelizable Problem:
Calculate
the potential energy for each of several thousand independent conformations of a molecule. When done find the minimum energy conformation. Fibonacci Series
A Non-Parallelizable Problem:
The
All calculations are dependent
Designing Parallel Programs: Domain Decomposition

Each
36
8/18/2013
task handles a portion of the data set.
37
8/18/2013
Designing Parallel Programs: Functional Decomposition

Each
work
task performs a function of the overall
38
8/18/2013
Parallel Algorithm Examples: Array Processing

Serial

Solution
Perform a function on a 2D array. Single processor iterates through each element in the array
Possible

Parallel Solution
Assign each processor a partition of the array. Each process iterates through its own partition.
39
8/18/2013
Parallel Algorithm Examples: Odd-Even Transposition Sort

Basic
idea is bubble sort, but concurrently comparing odd indexed elements with an adjacent element, then even indexed elements. If there are n elements in an array and there are n/2 processors. The algorithm is effectively O(n)!
40
8/18/2013
Parallel Algorithm Examples: Odd Even Transposition Sort

Initial
6,
4, 5, 2, 3, 0, 1 4, 6, 2, 5, 0, 3, 1 4, 2, 6, 0, 5, 1, 3 2, 4, 0, 6, 1, 5, 3 2, 0, 4, 1, 6, 3, 5 0, 2, 1, 4, 3, 6, 5 0, 1, 2, 3, 4, 5, 6
6, 5, 4, 3, 2, 1, 0

array:
Worst case scenario. Phase 1 Phase 2 Phase 1 Phase 2 Phase 1 Phase 2 Phase 1
41
8/18/2013
Other Parallelizable Problems

The
n-body problem Floyds Algorithm
Serial: O(n^3), Parallel: O(n log p)
Game
Trees Divide and Conquer Algorithms
42
8/18/2013
Conclusion
Parallel
computing is fast. There are many different approaches and models of parallel computing. Parallel computing is the future of computing.

Parallel Computing

Hochgeladen von

Dokumentinformationen

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

Parallel Computing

Hochgeladen von

Copyright:

Verfügbare Formate

Parallel & High Performance Computing

High Performance Computing

by very high speed, very large

approaches for building supercomputers

What is Parallel Computing?

Why Use Parallel Computing?

Flynns Classical Taxonomy

Flynns Classical Taxonomy: SISD

Flynns Classical Taxonomy: SIMD

Flynns Classical Taxonomy: MISD

Flynns Classical Taxonomy: MIMD

Data Parallel Architectures

Associative And neural architectures

Function Parallel Architectures

Pipelined VLIWs Superscalar processors processors

Distributed Memory MIMD

Shared Memory MIMD

Types of Parallel Computers

The parallel computers are classified as

Both shared and distributed memory systems have:

Parallel Computer Memory Architectures: Shared Memory Architecture

Shared Address Space

Physical address space

Virtual address space of each process

Parallel Computer Memory Architectures: Distributed Memory

Symmetric Multiprocessors (SMP)

Distributed Memory Multiprocessors

Parallel Programming Models

as an abstraction above hardware and memory architectures Examples:

Shared Memory Threads Messaging Passing Data Parallel

Parallel Programming Models: Shared Memory Model

Parallel Programming Models: Threads Model

Parallel Programming Models: Message Passing Model

Parallel Programming Models: Data Parallel Model

Designing Parallel Programs: Automatic Parallelization

Designing Parallel Programs: Manual Parallelization

All calculations are dependent

Designing Parallel Programs: Domain Decomposition

task handles a portion of the data set.

Designing Parallel Programs: Functional Decomposition

task performs a function of the overall

Parallel Algorithm Examples: Array Processing

Parallel Algorithm Examples: Odd-Even Transposition Sort

Parallel Algorithm Examples: Odd Even Transposition Sort

Other Parallelizable Problems

n-body problem Floyds Algorithm

Serial: O(n^3), Parallel: O(n log p)

Trees Divide and Conquer Algorithms

Das könnte Ihnen auch gefallen