
Parallel Programming

1) History of parallel computing

• Introduction and definitions
• Brief history of supercomputers
• Programming parallel computers


Parallel programming 2009, Ver. 2.0

Copyright 2007–2009 Andrea Di Blas

Introduction and definitions

Why faster computers?

• Solve compute-intensive problems faster
  - Make infeasible problems feasible
  - Reduce design time
• Solve larger problems in the same amount of time
  - Improve an answer's precision
  - Reduce design time
• Gain competitive advantage

End of the Moore's Law "free lunch for software"


Definitions

• Parallel computing: using parallel computers to solve single problems faster.
• Parallel computer: a multiple-processor system supporting parallel execution.
• Parallel programming: programming parallel computers. Two ways:
  - Explicit
  - Implicit


History of parallel computing

Military-driven evolution of (super)computing

• World War II
  - Hand-computed artillery tables: ENIAC (USA)
  - Breaking Nazi codes: Bombe, Colossus (UK)
• Cold War
  - Nuclear weapons design
  - Aircraft, submarine, etc. design
  - Intelligence gathering
  - Code breaking


ENIAC, 1943

Eckert and Mauchly build the ENIAC (Electronic Numerical Integrator And Calculator), the first general-purpose electronic computer.


The first attempt at a supercomputer: Illiac-IV, 1966-1976

• Linear array of 256 64-bit Processing Elements, ECL
• Target: 1 GFLOPS, 13 MHz clock
• Programmed in "GLYPNIR", a vectorized derivative of ALGOL 60

The first real supercomputer: Seymour Cray's CRAY-1, 1976

• Scalar + vector processor, 80 MHz clock, 133 MFLOPS, 8 MB main memory in bipolar technology (ECL), $5 to $8+ million
• 150 kW motor-generator, 20-ton compressor for the freon cooling system
• Programmed in CFT, the Cray Fortran Compiler, which vectorized DO loops

Commercial supercomputing

• Started in capital-intensive industries:
  - Petroleum exploration
  - Automobile and aircraft manufacturing
• Today:
  - Consumer products
  - Pharmaceutical design
  - Circuit simulation
  - ...


Microprocessor-based supercomputers: Caltech's Cosmic Cube (1981)

• 64-node hypercube based on the Intel 8086 + 8087, 128 KB RAM per node
• 8 MHz, 10 MFLOPS, $80,000
• Programmed in Pascal or C, with a message-passing library

A new model: Thinking Machines' CM-1

• Tried to model the human brain: variable-connectivity 12-D hypercube
• 65,536 1-bit processing elements, with 4 Kbit (CM-1) or 64 Kbit (CM-2) of memory per processor; 2,500 MIPS and 2,500 MFLOPS (CM-2)
• Programmed in *Lisp, C*, or CM Fortran


A popular massively parallel SIMD computer: MasPar MP-2 (1993)

• 2-D mesh of up to 16K processors, 1-bit (MP-1) or 32-bit (MP-2)
• Full-fledged SIMD, with Xnet and global-router communication
• Programmed in MPL (MasPar Language) and HPF (High-Performance Fortran)


Commodity clusters: NASA's Beowulf cluster (1994)

• 16 Intel 486DX PCs connected with standard 10 Mb/s Ethernet
• Linux with MPI
• 1 GFLOPS on a $50,000 system

Massively parallel SIMD coprocessors: UCSC Kestrel (1999)

• 512-PE linear SIMD array of 8-bit Processing Elements (PEs), 20 MHz
• 64 PEs per chip, 0.5 μm CMOS (HP), 256 bytes of SRAM per PE
• 30 GOPS (8-bit integer), 1 W peak power per chip

Today: IBM BlueGene/L

• 64K nodes (32 × 32 × 64) in a 3-D torus, two PowerPC 440 cores at 700 MHz per node
• 360 TFLOPS peak (world's fastest supercomputer)
• Starting at only $1.5 million per rack (1,024 nodes)

Tomorrow: IBM Roadrunner

• 1.6 PFLOPS peak (1.0 PFLOPS Linpack)
• Hybrid AMD Opteron + IBM Cell, 16K nodes
• Being delivered to Los Alamos National Laboratory, fully operational in 2008


Supercomputer manufacturers

Today they are:

• IBM
• NEC
• Cray Inc.
• Dell
• Hewlett-Packard
• Sun Microsystems
• Silicon Graphics


Yesterday: IBM 7044

• Solid-state (transistors), 36-bit words, 32K address space
• Fixed-point and floating-point arithmetic

Programming parallel computers

Seeking concurrency:

• Data parallelism
• Functional parallelism


Data dependence graphs

P = (X + Y) * (X - Y)
Q = Z - W
T = P + Q


Data parallelism

• Independent tasks apply the same operation to different data.
• Example:

    for (i = 0; i < 100; ++i)
        a[i] = b[i] + c[i];

• OK to perform the operations concurrently (see the sketch below)

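A minimal sketch (an addition, not from the original slides) of how the loop above could be divided among workers: each worker applies the same operation to its own slice of the data. The helper name and the slicing are illustrative assumptions.

    #include <stdio.h>

    /* Each worker would call add_chunk on its own index range [lo, hi);
     * all calls apply the same operation to different data, so they
     * could run concurrently on different processors. */
    static void add_chunk(double *a, const double *b, const double *c,
                          int lo, int hi)
    {
        for (int i = lo; i < hi; ++i)
            a[i] = b[i] + c[i];
    }

    int main(void)
    {
        double a[100], b[100], c[100];
        for (int i = 0; i < 100; ++i) { b[i] = i; c[i] = 100 - i; }
        /* Sequential stand-in for two workers handling half the data each. */
        add_chunk(a, b, c, 0, 50);
        add_chunk(a, b, c, 50, 100);
        printf("a[0] = %g, a[99] = %g\n", a[0], a[99]);
        return 0;
    }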

Functional parallelism

• Independent tasks apply different operations to different data.
• Example:

    a = 2;
    b = 3;
    m = (a + b) / 2;
    s = (a*a + b*b) / 2;
    v = s - m;

• The first and second statements can execute concurrently
• The third and fourth statements can execute concurrently (see the sketch below)

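One way to express this functional parallelism explicitly is with OpenMP sections (OpenMP is introduced later in these slides). This is an added sketch, not part of the original notes; without an OpenMP-capable compiler the pragmas are ignored and the code simply runs sequentially.

    #include <stdio.h>

    int main(void)
    {
        double a, b, m, s, v;
        a = 2;                    /* step 1: a and b do not depend on each other */
        b = 3;
        #pragma omp parallel sections
        {
            #pragma omp section
            m = (a + b) / 2;      /* step 2: needs a and b, but not s */
            #pragma omp section
            s = (a*a + b*b) / 2;  /* step 2: needs a and b, but not m */
        }
        v = s - m;                /* step 3: needs both m and s */
        printf("m = %g  s = %g  v = %g\n", m, s, v);
        return 0;
    }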

Four possible ways to program parallel computers:

• Extend compilers to translate sequential programs into parallel code automatically ("implicitly parallel")
• Extend languages with new operations to express parallelism ("explicitly parallel")
• Add a new parallel language layer on top of an existing sequential language
• Define a totally new parallel language and compiler system


Strategy 1: Extend compilers

Let the compiler discover parallelism and produce executable code.

Advantages:

• Easiest to use: doesn't require any specific parallel programming training
• Leverage billions of lines of existing (Fortran) code

Disadvantages:

• Parallelism may be lost when programs are formulated in a sequential fashion
• Performance of parallelizing compilers still poor on generic applications
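The first disadvantage can be made concrete with a small example (an addition, not from the original slides): the first loop below has independent iterations that a parallelizing compiler can distribute, while the second has a loop-carried dependence that defeats straightforward automatic parallelization.

    /* Illustration only: two loops as a parallelizing compiler sees them. */
    void example(double *x, double *y, int n)
    {
        for (int i = 0; i < n; ++i)
            y[i] = 2.0 * x[i];         /* iterations are independent        */

        for (int i = 1; i < n; ++i)
            x[i] = x[i - 1] + x[i];    /* needs the previous iteration's x  */
    }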


Strategy 1: Example

www.parallelsp.com

• Translates serial FORTRAN source code into parallel source code


Strategy 2: Extend a language

Let the programmer create, terminate, and synchronize processes, and define all communications, to explicitly encode parallelism.

Advantages:

• Easiest, quickest, and least expensive to implement
• Leverages existing compiler technology
• New libraries ready soon after new parallel computers are available

Disadvantages:

• Lack of compiler support to catch errors
• Easy to write programs that are hard to debug
• Harder to learn

Strategy 2: Examples

All the most popular parallel programming tools belong to this class:

• Message-Passing Interface (MPI)
• Open specifications for Multi-Processing (OpenMP)
• POSIX Threads (Pthreads)
• Parallel Virtual Machine (PVM)


Strategy 3: Two-layer approach

View each parallel program as made of two layers:

• Lower layer:
  - Single-process computation (core of the computation)
  - Expressed in any sequential programming language
• Upper layer:
  - Creation and synchronization of processes
  - Partitioning of data among processes
• Only research prototypes so far.


Strategy 3: Example

The CODE Project at the University of Texas
www.cs.utexas.edu/users/code/

• Visually glue C functions in parallel using Pthreads, MPI, or PVM



Strategy 4: Create a parallel language

Two approaches:

• Create a parallel language from scratch
• Add parallel constructs to an existing language: Fortran 90, High-Performance Fortran (HPF), C* (Thinking Machines Corp.)

Advantages:

• Program with parallelism in mind (higher performance)

Disadvantages:

• Requires new languages and new compilers
• Programmers' resistance


Strategy 4: Examples

• INMOS' Occam language
• High-Performance FORTRAN
• SISAL dataflow language


Current status

The low-level approach is most popular:

• Augment existing languages with low-level parallel constructs
• MPI, OpenMP, and Pthreads

Advantages:

• Efficiency
• Portability

Disadvantages:

• Harder to program
• Harder to debug


MPI

• MPI = "Message-Passing Interface"
• Explicitly parallel programming strategy
• Standard specification for a message-passing API
• (Free) libraries available on virtually all parallel computers, including networks of workstations and commodity clusters
• Libraries available for C/C++ and Fortran
• Assumes distributed-memory systems:

[Figure: each node has its own CPU, cache, memory, and I/O devices; nodes communicate over an interconnection network.]
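A minimal MPI sketch in C (an addition, not from the original slides): every process learns its rank, and rank 1 sends one integer to rank 0 over the network. The values are arbitrary; compile and launch commands (e.g. mpicc, mpirun) vary by installation.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size;
        MPI_Init(&argc, &argv);                   /* start the MPI runtime  */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);     /* who am I?              */
        MPI_Comm_size(MPI_COMM_WORLD, &size);     /* how many processes?    */

        if (rank == 1) {
            int value = 42;
            MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        } else if (rank == 0 && size > 1) {
            int value;
            MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 0 received %d from rank 1\n", value);
        }

        printf("hello from rank %d of %d\n", rank, size);
        MPI_Finalize();                           /* shut down the runtime  */
        return 0;
    }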


OpenMP

• OpenMP = "Open specifications for Multi-Processing"
• Incrementally explicit parallel programming strategy
• An application program interface (API) for multi-threaded, shared-memory systems
• A set of compiler directives and runtime library routines
• Available for C/C++ and Fortran
• Not meant for distributed-memory systems, only for shared-memory systems:

[Figure: several CPUs, each with its own cache, share the main memory and I/O devices over a bus.]
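A minimal OpenMP sketch in C (an addition, not from the original slides): one compiler directive spreads the loop iterations across the threads of a shared-memory machine. Array sizes and values are arbitrary; compile with an OpenMP-capable compiler (e.g. gcc -fopenmp).

    #include <omp.h>
    #include <stdio.h>

    #define N 1000

    int main(void)
    {
        static double a[N], b[N], c[N];
        for (int i = 0; i < N; ++i) { b[i] = i; c[i] = 2.0 * i; }

        /* The directive asks the compiler to divide the iterations among
         * the available threads; a, b, and c are shared by all threads. */
        #pragma omp parallel for
        for (int i = 0; i < N; ++i)
            a[i] = b[i] + c[i];

        printf("a[%d] = %g, using up to %d threads\n",
               N - 1, a[N - 1], omp_get_max_threads());
        return 0;
    }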


Pthreads

• Pthreads = "POSIX Threads"
• Explicitly parallel programming strategy
• An application program interface (API) for multi-threaded, shared-memory systems
• A set of library routines, for C/C++ only
• Not meant for distributed-memory systems, only for shared-memory systems:

[Figure: several CPUs, each with its own cache, share the main memory and I/O devices over a bus.]
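A minimal Pthreads sketch in C (an addition, not from the original slides): the main thread creates a few worker threads, each fills its own slice of a shared array, and the main thread joins them. The thread count and array size are arbitrary; link with -lpthread.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define N 1000

    static double a[N];                      /* shared by all threads */

    static void *fill_slice(void *arg)
    {
        long id = (long)arg;                 /* worker index 0..NTHREADS-1 */
        long lo = id * (N / NTHREADS);
        long hi = lo + (N / NTHREADS);
        for (long i = lo; i < hi; ++i)
            a[i] = (double)i * i;            /* each thread writes its own slice */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        for (long t = 0; t < NTHREADS; ++t)
            pthread_create(&tid[t], NULL, fill_slice, (void *)t);
        for (long t = 0; t < NTHREADS; ++t)
            pthread_join(tid[t], NULL);      /* wait for all workers */
        printf("a[%d] = %g\n", N - 1, a[N - 1]);
        return 0;
    }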


Practice

Practice problem 1.A

Card sorting:
1. How long does it take one person to sort a shuffled deck of cards?
2. How long does it take p people to sort p decks of cards?
3. How long does it take p people to sort one deck of cards?
4. What is the optimal number of people?


Practice problem 1.B

You have 1000 cards, each with a number on it, and you can use up to 1000 accountants (ready at their desks in a 40 × 25 cave) to add them all up.
1. How do you do it?
2. How long does it take?
3. Where are you?
4. Can they do it 1000 times faster than you?
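One possible line of attack (an added sketch, not necessarily the intended answer): treat the addition as a tree-shaped reduction, pairing accountants off in rounds, so about ceil(log2 1000) = 10 addition rounds suffice instead of 999 sequential additions; walking cards between desks (communication) is what keeps the real speedup well below 1000x. A sequential model of the rounds:

    #include <stdio.h>

    int main(void)
    {
        double card[1000];
        for (int i = 0; i < 1000; ++i)
            card[i] = i + 1;                 /* example numbers on the cards */

        int n = 1000;
        while (n > 1) {                      /* one round of pairwise sums   */
            for (int i = 0; i < n / 2; ++i)  /* up to n/2 accountants add    */
                card[i] = card[2 * i] + card[2 * i + 1];
            if (n % 2)                       /* odd card out carries forward */
                card[n / 2] = card[n - 1];
            n = (n + 1) / 2;
        }
        printf("total = %g\n", card[0]);     /* 1 + 2 + ... + 1000 = 500500  */
        return 0;
    }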

