
Parallel & High Performance Computing


High Performance Computing

Branch of computing that deals with extremely powerful computers and the applications that use them.
Supercomputers: the fastest computers at any given point in time.
HPC applications: applications that cannot be solved by conventional computers in a reasonable amount of time.


Supercomputers
Characterized by very high speed and very large memory.
Speed is measured in terms of the number of floating point operations per second (FLOPS).
Fastest computer in the world (at the time): Earth Simulator (NEC, Japan), 35 teraflops.
Memory on the order of hundreds of gigabytes or terabytes.


HPC Technologies
Different approaches for building supercomputers:

Traditional: build faster CPUs
Special semiconductor technology for increasing clock speed
Advanced CPU architecture: pipelining, vector processing, multiple functional units, etc.

Parallel processing: harness a large number of ordinary CPUs and divide the job between them


Traditional Supercomputers
E.g., CRAY
Very complex architecture
Very high clock speed results in very high heat dissipation, requiring advanced cooling techniques (liquid Freon / liquid nitrogen)
Custom built or produced to order
Extremely expensive
Advantage: program development is conventional and straightforward


Alternative to Supercomputer

Parallel computing: the use of multiple computers or processors working together on a single problem. Harness a large number of ordinary CPUs and divide the job between them:
each processor works on its own section of the problem
processors exchange information with other processors via a fast interconnect path

(Figure: a sequential run processes elements 1 through 10000 on a single CPU; the parallel run splits them over four CPUs, with cpu 1 taking 1-2500, cpu 2 taking 2501-5000, cpu 3 taking 5001-7500, and cpu 4 taking 7501-10000.)

Big advantages of parallel computers:
1. Total computing performance is a multiple of the number of processors used
2. A very large total amount of memory, big enough to fit very large programs
3. Much lower cost, and such machines can be developed in India
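A minimal sketch of the idea in the figure, in Python (the range boundaries match the figure, but the square-summing work function is an illustrative assumption): each process works on its own section of 1..10000 and sends its partial result back over a queue, which stands in for the fast interconnect.

```python
from multiprocessing import Process, Queue

def worker(lo, hi, out):
    """Each process sums the squares of its own section of 1..10000."""
    out.put(sum(i * i for i in range(lo, hi + 1)))

if __name__ == "__main__":
    sections = [(1, 2500), (2501, 5000), (5001, 7500), (7501, 10000)]
    results = Queue()
    procs = [Process(target=worker, args=(lo, hi, results)) for lo, hi in sections]
    for p in procs:
        p.start()
    total = sum(results.get() for _ in procs)   # collect one partial result per process
    for p in procs:
        p.join()
    print(total)
```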


What is Parallel Computing?

Traditionally, software has been written for serial computation.
Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem.


Why Use Parallel Computing?

Saves wall-clock time
Cost savings
Overcomes memory constraints
It is the future of computing


Flynn's Classical Taxonomy

Distinguishes multi-processor architectures by instruction stream and data stream:
SISD: Single Instruction, Single Data
SIMD: Single Instruction, Multiple Data
MISD: Multiple Instruction, Single Data
MIMD: Multiple Instruction, Multiple Data


Flynn's Classical Taxonomy: SISD

A serial (non-parallel) computer.
Only one instruction and one data stream is acted on during any one clock cycle.


SISD

(Diagram: a single instruction stream (IS) drives one processing element operating on a single data stream (DS).)


Flynn's Classical Taxonomy: SIMD

All processing units execute the same instruction at any given clock cycle.
Each processing unit operates on a different data element.


SIMD

(Diagram: one control unit (C) broadcasts a single instruction stream (IS) to multiple processing elements (P), each working on its own data stream (DS).)


Flynn's Classical Taxonomy: MISD

Different instructions operate on a single data element.
Very few practical uses for this type of architecture.
Example: multiple cryptography algorithms attempting to crack a single coded message.


MISD

(Diagram: several control units each issue their own instruction stream (IS) to a processing element; all processing elements operate on the same data stream (DS) from memory (M).)


Flynn's Classical Taxonomy: MIMD

Can execute different instructions on different data elements.
The most common type of parallel computer.


MIMD

(Diagram: multiple control units each issue their own instruction stream (IS) to their own processing element, each operating on its own data stream (DS) against memory (M).)


Modern Classification

Parallel architectures are divided into:
Data-parallel architectures
Function-parallel architectures


Data Parallel Architectures

Data-parallel architectures:
Vector architectures
Associative and neural architectures
SIMDs
Systolic architectures


Function Parallel Architectures

Function-parallel architectures:
Instruction-level parallel architectures (ILPs): pipelined processors, VLIWs, superscalar processors
Thread-level parallel architectures
Process-level parallel architectures (MIMDs): distributed-memory MIMD, shared-memory MIMD


Types of Parallel Computers

Parallel computers are classified as:
shared memory
distributed memory

Both shared and distributed memory systems have:
1. processors: now generally commodity processors
2. memory: now generally commodity DRAM/DDR
3. network/interconnect: between the processors or memory


Interconnect Method
There is no single way to connect a bunch of processors. The manner in which the nodes are connected defines the network and its topology. The best choice would be a fully connected network (every processor connected to every other), but this is infeasible for cost and scaling reasons; instead, processors are arranged in some variation of a grid, torus, tree, bus, mesh or hypercube.

(Figures: a 3-D hypercube, a 2-D mesh, and a 2-D torus.)
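As an illustration of one of these topologies (not from the slide): in a d-dimensional hypercube, each node is labelled with a d-bit number and is wired directly to the d nodes whose labels differ from its own in exactly one bit. A small sketch in Python:

```python
def hypercube_neighbors(node, dims):
    """Return the nodes directly connected to `node` in a `dims`-dimensional hypercube."""
    # Flipping one bit of the label gives one neighbor per dimension.
    return [node ^ (1 << k) for k in range(dims)]

# Neighbors of node 0 and node 5 in a 3-D hypercube (8 nodes, labels 0..7).
print(hypercube_neighbors(0, 3))   # [1, 2, 4]
print(hypercube_neighbors(5, 3))   # [4, 7, 1]
```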


Parallel Computer Memory Architectures: Shared Memory Architecture

All processors access all memory as a single global address space.
Data sharing is fast.
Lack of scalability between memory and CPUs.


Shared Address Space

(Diagram: processes P1, P2 and P3 each have a virtual address space with a private region and a shared region; the shared regions all map onto one common area of the physical address space, while each private region (Pvt P1, Pvt P2, Pvt P3) maps to its own physical area.)


Parallel Computer Memory Architectures: Distributed Memory

Each processor has its own memory.
Scalable, with no overhead for cache coherency.
The programmer is responsible for many details of the communication between processors.


Symmetric Multiprocessors (SMP)

A collection of processors and a collection of memory, both connected through some interconnect (usually the fastest possible).
Symmetric because the latency for any processor to access any memory is constant: uniform memory access (UMA).

(Diagram: Proc 1 through Proc 4 all reach Mem 1 through Mem 4 over a common interconnect.)


Distributed Memory Multiprocessors

Each processor has local memory that is accessible through a fast interconnect.
The different nodes are connected as I/O devices with a (potentially) slower interconnect.
Local memory access is much faster than remote memory access: non-uniform memory access (NUMA).
Advantage: can be built with commodity processors, and many applications will perform well thanks to locality.

(Diagram: four nodes, each pairing a processor with its own local memory, Proc 1/Mem 1 through Proc 4/Mem 4, joined by an interconnect.)


Parallel Programming Models

Exist as an abstraction above hardware and memory architectures.
Examples:
Shared Memory
Threads
Message Passing
Data Parallel


Parallel Programming Models: Shared Memory Model

Appears to the user as a single shared memory, regardless of the hardware implementation.
Locks and semaphores may be used to control access to the shared memory.
Program development can be simplified, since there is no need to explicitly specify communication between tasks.
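A minimal sketch of this model in Python (the shared counter and the thread count are illustrative assumptions, not from the slide): several tasks update one shared variable, a lock controls access to it, and no explicit communication between tasks is needed.

```python
import threading

counter = 0                      # shared memory visible to every thread
lock = threading.Lock()          # controls access to the shared variable

def add(n):
    global counter
    for _ in range(n):
        with lock:               # only one thread updates the counter at a time
            counter += 1

threads = [threading.Thread(target=add, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                   # 40000
```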


Parallel Programming Models: Threads Model

A single process may have multiple, concurrent execution paths.
Typically used with a shared memory architecture.
The programmer is responsible for determining all parallelism.
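A small sketch of two concurrent execution paths inside one process (the particular tasks, a compute loop and a progress reporter, are made up purely for illustration):

```python
import threading
import time

def crunch(n):
    """One execution path: CPU-style work."""
    print("sum of squares:", sum(i * i for i in range(n)))

def heartbeat():
    """A second execution path in the same process: periodic status messages."""
    for _ in range(3):
        time.sleep(0.1)
        print("still working...")

t1 = threading.Thread(target=crunch, args=(1_000_000,))
t2 = threading.Thread(target=heartbeat)
t1.start(); t2.start()
t1.join(); t2.join()
```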


Parallel Programming Models: Message Passing Model

Tasks exchange data by sending and receiving messages.
Typically used with distributed memory architectures.
Data transfer requires cooperative operations to be performed by each process, e.g. a send operation must have a matching receive operation.
MPI (Message Passing Interface) is the interface standard for message passing.
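A minimal sketch of this model using mpi4py, a common Python binding for MPI (the library choice, script name, and payload are assumptions, not from the slide); it would be launched with something like `mpirun -n 2 python send_recv.py`:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()                    # each process gets its own rank

if rank == 0:
    data = {"payload": [1, 2, 3]}
    comm.send(data, dest=1, tag=0)        # cooperative send ...
    print("rank 0 sent", data)
elif rank == 1:
    data = comm.recv(source=0, tag=0)     # ... matched by a receive on the other side
    print("rank 1 received", data)
```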


Parallel Programming Models: Data Parallel Model

Tasks perform the same operations on a set of data, each task working on a separate piece of the set.
Works well with either shared memory or distributed memory architectures.
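A small sketch of the data-parallel idea in Python (the data set and the operation, squaring, are illustrative assumptions): every task applies the same operation to its own piece of the set.

```python
from multiprocessing import Pool

def square_piece(piece):
    """The same operation, applied by each task to its own piece of the data."""
    return [x * x for x in piece]

if __name__ == "__main__":
    data = list(range(100))
    pieces = [data[i:i + 25] for i in range(0, len(data), 25)]   # one piece per task
    with Pool(4) as pool:
        squared = pool.map(square_piece, pieces)
    print([chunk[0] for chunk in squared])    # first element of each piece: [0, 625, 2500, 5625]
```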


Designing Parallel Programs: Automatic Parallelization

The compiler analyzes the code and identifies opportunities for parallelism.
The analysis includes attempting to determine whether or not the parallelism actually improves performance.
Loops are the most frequent target for automatic parallelization.
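To show the kind of loop such a compiler looks for, here is a hedged sketch using Numba's parallel JIT (Numba is not mentioned in the slides; it simply stands in for one auto-parallelizing compiler). The loop iterations are independent, so the compiler is free to split them across cores.

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def scale(a):
    out = np.empty_like(a)
    for i in prange(a.shape[0]):      # no iteration depends on another one
        out[i] = 2.0 * a[i]
    return out

x = np.arange(1_000_000, dtype=np.float64)
print(scale(x)[:3])                   # [0. 2. 4.]
```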


Designing Parallel Programs: Manual Parallelization

Understand the problem.

A parallelizable problem: calculate the potential energy for each of several thousand independent conformations of a molecule; when done, find the minimum-energy conformation.

A non-parallelizable problem: the Fibonacci series, because all calculations are dependent.
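A sketch of the contrast above in Python (the toy energy function and the conformation data are illustrative assumptions): the independent energy evaluations can be farmed out to a process pool, while each Fibonacci term needs the two before it and therefore stays sequential.

```python
from multiprocessing import Pool

def energy(conf):
    # stand-in for an expensive potential-energy calculation; each call is independent
    return sum((x - 0.5) ** 2 for x in conf)

def fib(n):
    # each term depends on the two before it, so this loop cannot be split up
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

if __name__ == "__main__":
    conformations = [(i * 0.001, i * 0.002, i * 0.003) for i in range(5000)]
    with Pool() as pool:
        energies = pool.map(energy, conformations)    # parallelizable
    print("minimum energy:", min(energies))
    print("fib(30):", fib(30))                        # stays serial
```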

Designing Parallel Programs: Domain Decomposition

Each task handles a portion of the data set.


Designing Parallel Programs: Functional Decomposition

Each task performs a function of the overall work.
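One common shape of functional decomposition is a pipeline, sketched here in Python (the two stages and the data are illustrative assumptions): one task generates and transforms items, while a second task, doing a different function of the overall work, accumulates them.

```python
import threading
import queue

q = queue.Queue()

def produce():
    """Task 1: one function of the overall work, generating/transforming items."""
    for i in range(10):
        q.put(i * i)
    q.put(None)                              # sentinel: no more items

def consume():
    """Task 2: a different function, accumulating the results."""
    total = 0
    while True:
        item = q.get()
        if item is None:
            break
        total += item
    print("sum of squares:", total)          # 285

t1 = threading.Thread(target=produce)
t2 = threading.Thread(target=consume)
t1.start(); t2.start()
t1.join(); t2.join()
```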


Parallel Algorithm Examples: Array Processing

Serial solution:
Perform a function on a 2D array; a single processor iterates through each element in the array.

Possible parallel solution:
Assign each processor a partition of the array; each process iterates through its own partition.
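A minimal sketch of the parallel solution (the array size, the per-element function, and the row-wise partitioning are illustrative assumptions): the 2D array is split into blocks of rows, and each process applies the same function to every element of its own block.

```python
from multiprocessing import Pool

def process_block(block):
    """Each process iterates over every element of its own partition of rows."""
    return [[x * x + 1 for x in row] for row in block]

if __name__ == "__main__":
    n = 8
    array = [[r * n + c for c in range(n)] for r in range(n)]      # 8x8 array
    blocks = [array[r:r + 2] for r in range(0, n, 2)]              # 2 rows per process
    with Pool(4) as pool:
        done = pool.map(process_block, blocks)
    result = [row for block in done for row in block]              # stitch blocks back together
    print(result[0])    # first row after the function has been applied
```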


Parallel Algorithm Examples: Odd-Even Transposition Sort

The basic idea is bubble sort, but concurrently comparing odd-indexed elements with an adjacent element, then even-indexed elements.
If there are n elements in the array and n/2 processors, the algorithm is effectively O(n)!


Parallel Algorithm Examples: Odd-Even Transposition Sort

Initial array (worst case scenario): 6, 5, 4, 3, 2, 1, 0

Phase 1: 6, 4, 5, 2, 3, 0, 1
Phase 2: 4, 6, 2, 5, 0, 3, 1
Phase 1: 4, 2, 6, 0, 5, 1, 3
Phase 2: 2, 4, 0, 6, 1, 5, 3
Phase 1: 2, 0, 4, 1, 6, 3, 5
Phase 2: 0, 2, 1, 4, 3, 6, 5
Phase 1: 0, 1, 2, 3, 4, 5, 6
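A sequential Python sketch of the algorithm that simulates the alternating compare-exchange phases in a loop (in a real parallel implementation, each of the n/2 processors would handle at most one pair per phase):

```python
def odd_even_transposition_sort(a):
    """Simulate the alternating compare-exchange phases of odd-even transposition sort."""
    a = list(a)
    n = len(a)
    for phase in range(n):                   # n phases are enough to sort n elements
        start = 1 if phase % 2 == 0 else 0   # alternate which adjacent pairs are compared
        for i in range(start, n - 1, 2):     # in parallel, each processor handles one pair
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(odd_even_transposition_sort([6, 5, 4, 3, 2, 1, 0]))   # [0, 1, 2, 3, 4, 5, 6]
```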


Other Parallelizable Problems

The n-body problem
Floyd's algorithm (serial: O(n^3); parallel: O(n log p))
Game trees
Divide and conquer algorithms


Conclusion
Parallel computing is fast.
There are many different approaches to and models of parallel computing.
Parallel computing is the future of computing.
