
Parallel Processing

Kudang B. Seminar

The Need for High-Performance Computers

Weather forecasting
Aerodynamics
Artificial intelligence: robotics
Genetic engineering

The applications above involve intensive
computation and require high-performance
computers.

Example 1: Weather Prediction

Area, segments
3000 * 3000 * 11 cubic miles
0.1 * 0.1 * 0.1 cubic mile segments: ~10^11 segments

Two-day prediction
half-hour periods: ~100 periods

Computation per segment
Temp, Pressure, Humidity, Wind speed, Wind direction
Assume ~100 FLOPs

Performance: Weather Prediction

Computational requirement: 10^15 FLOPs
Serial supercomputer: 10^9 instr/sec
Total serial time: 10^6 sec = ~280 hours
Not too good for a 48-hour weather prediction
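
The requirement is just the product of the counts from the previous slide; as a check:

\[
\underbrace{10^{11}}_{\text{segments}} \times \underbrace{10^{2}}_{\text{periods}} \times \underbrace{10^{2}}_{\text{FLOPs/segment}} = 10^{15}\text{ FLOPs},
\qquad
\frac{10^{15}\text{ FLOPs}}{10^{9}\text{ FLOPs/sec}} = 10^{6}\text{ s} \approx 278\text{ hours}.
\]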

Parallel Weather Prediction

1 K workstations, grid connected
10^8 segment computations per processor
10^8 instructions per second
100 instructions per segment computation
100 time steps: 10^4 seconds = ~3 hours
Much more acceptable
Assumption: Communication not a problem here
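
Checking the slide's numbers: 10^11 segments spread over 10^3 workstations gives 10^8 segments per processor, so

\[
\frac{10^{8}\text{ segments} \times 10^{2}\text{ instr/segment}}{10^{8}\text{ instr/sec}} = 10^{2}\text{ s per time step},
\qquad
10^{2}\text{ steps} \times 10^{2}\text{ s} = 10^{4}\text{ s} \approx 3\text{ hours}.
\]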

More workstations:
finer grid
better accuracy

Example 2: N-body problem

Astronomy: bodies in space
Attract each other: gravitational force, Newton's law
O(n*n) calculations per snapshot
Galaxy: ~10^11 bodies -> ~10^22 calculations
Calculation: 1 microsec
Snapshot: 10^16 secs = ~10^11 days = ~3*10^8 years
Is parallelism going to help us? NO
What does help? A better algorithm: Barnes-Hut
Divides the space into a quad tree
Treats far-away quads as one body
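
A minimal 2D sketch of the Barnes-Hut trick in C (not from the slides; THETA, the Node layout, and add_force are all illustrative names): a quad of bodies far enough away (size/distance < THETA) is treated as a single point mass at its centre of mass, which cuts the O(n*n) per-snapshot cost to O(n log n).

#include <math.h>
#include <stddef.h>

#define THETA 0.5   /* opening angle: smaller = more accurate, slower */
#define G     6.674e-11

typedef struct Node {
    double mass;           /* total mass of bodies in this quad        */
    double cx, cy;         /* centre of mass of this quad              */
    double size;           /* side length of the square region         */
    struct Node *child[4]; /* NW, NE, SW, SE subquads (NULL at a leaf) */
} Node;

/* Accumulate the force on a body of mass m at (x, y) from the tree
 * rooted at n. A far-away quad is approximated by one point mass.   */
void add_force(const Node *n, double x, double y, double m,
               double *fx, double *fy)
{
    if (n == NULL || n->mass == 0.0) return;

    double dx = n->cx - x, dy = n->cy - y;
    double dist = sqrt(dx * dx + dy * dy) + 1e-9; /* avoid div by 0 */

    if (n->child[0] == NULL || n->size / dist < THETA) {
        /* Leaf, or far enough away: one "body-to-body" interaction. */
        double f = G * m * n->mass / (dist * dist);
        *fx += f * dx / dist;
        *fy += f * dy / dist;
    } else {
        /* Too close: descend into the four subquads. */
        for (int i = 0; i < 4; i++)
            add_force(n->child[i], x, y, m, fx, fy);
    }
}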

Other Challenging Applications
Satellite data acquisition: billions of bits / sec
Satellite data processing
Pollution levels, Remote sensing of materials
Image recognition

Discrete optimization problems

Planning, Scheduling, VLSI design

Material modeling
Nuclear weapons modeling (ASCI)
Airplane/Satellite/Vehicle design

Application-Specific Architectures

Mapping an algorithm directly onto hardware
ASICs: Application-Specific Integrated Circuits

Levels of specificity
Full custom ASICs
Standard cell ASICs
Field-programmable gate arrays
Computational models
Dataflow graphs
Systolic arrays
Orders of magnitude better performance
Orders of magnitude lower power

ASICs (cont.)

How much faster than general purpose?
Example: 1D 1024-point FFT
General-purpose machine (G4): 25 microsecs
ASIC device (MIT Lincoln Labs): 32 nanosecs (~780x faster)
ASIC device uses 20 milliwatts (100x less power)

Future designs:
2 teraops in a small (< 1 cubic ft) device

Target applications
FFT
Finite Impulse Response (FIR) filters
Matrix multiply
QR decomposition
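
These kernels are regular enough to map straight onto hardware. For reference, a plain-C version of the FIR filter named in the list above (tap count and coefficients are illustrative); on an ASIC the inner multiply-accumulates become parallel multipliers feeding an adder tree, producing one output per cycle.

/* y[i] = sum_{k=0}^{NTAPS-1} h[k] * x[i-k] -- direct-form FIR filter
 * with zero initial conditions (x[i-k] = 0 for i < k).              */
#include <stdio.h>

#define NTAPS 4

void fir(const double h[NTAPS], const double *x, double *y, int n)
{
    for (int i = 0; i < n; i++) {
        double acc = 0.0;
        for (int k = 0; k < NTAPS && k <= i; k++)
            acc += h[k] * x[i - k];   /* multiply-accumulate */
        y[i] = acc;
    }
}

int main(void)
{
    double h[NTAPS] = { 0.25, 0.25, 0.25, 0.25 }; /* moving average */
    double x[8] = { 1, 2, 3, 4, 5, 6, 7, 8 }, y[8];

    fir(h, x, y, 8);
    for (int i = 0; i < 8; i++)
        printf("%.2f ", y[i]);        /* smoothed output */
    printf("\n");
    return 0;
}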

A Concrete Example

A 24-hour weather forecast for the UK involves about 10^12
operations. This takes 2.7 hours on a Cray-1
(capable of 10^8 operations per second).

How many operations for a weekly, monthly,
or yearly forecast?

According to Einstein, the speed of light is 3 x 10^8 m/s. Consider
two electronic devices, each capable of 10^12 operations/second,
separated by a distance of 0.5 mm. In this case a signal takes
longer to travel between the two devices than either device needs
to execute one operation (10^-12 seconds).
So the limiting factor is the speed of light.
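
The numbers check out: the signal travel time exceeds the time for one operation,

\[
t_{\text{signal}} = \frac{5 \times 10^{-4}\text{ m}}{3 \times 10^{8}\text{ m/s}} \approx 1.7 \times 10^{-12}\text{ s} > 10^{-12}\text{ s} = t_{\text{op}}.
\]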

SOLUTION: exploit parallelism

Motivation of Parallel Computing

Parallel computing is cost effective
Off-the-shelf, commodity processors are very fast
Memory is very cheap
Building a processor that is a small factor faster
costs an order of magnitude more

NoW is the time!
Cheapest way to get more performance: multiprocessor
NoW: Networks of Workstations
A workstation can be an SMP
SMP: Symmetric Multiprocessor
Shared memory
Bus

Wile E. Coyote's Parallel Computer

Get a lot of the fastest processors
Get a lot of memory per processor
Get the fastest network
Hook it all together
And then what ???

Now you need to program it!

Parallel programming introduces:
Task partitioning, task scheduling
Data partitioning
Synchronization
Load balancing
Latency issues
  hiding
  tolerance
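
A minimal pthreads sketch of the first few items above, assuming a simple array sum (NTHREADS, worker, and partial are illustrative names): the array is split among threads (data partitioning), each thread is a created and scheduled task, and the joins plus the final combination are the synchronization points.

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N 1000000

static double data[N];
static double partial[NTHREADS]; /* one slot per thread: no lock needed */

/* Each worker sums its own contiguous slice of the array. */
static void *worker(void *arg)
{
    long id = (long)arg;
    long lo = id * (N / NTHREADS);
    long hi = (id == NTHREADS - 1) ? N : lo + N / NTHREADS;

    double s = 0.0;
    for (long i = lo; i < hi; i++)
        s += data[i];
    partial[id] = s;
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];

    for (long i = 0; i < N; i++)
        data[i] = 1.0;                      /* expected sum: N */

    for (long t = 0; t < NTHREADS; t++)     /* task creation/scheduling */
        pthread_create(&tid[t], NULL, worker, (void *)t);

    double total = 0.0;
    for (long t = 0; t < NTHREADS; t++) {   /* synchronization: join */
        pthread_join(tid[t], NULL);
        total += partial[t];
    }
    printf("sum = %.0f\n", total);
    return 0;
}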

Problem with the Wile E. Coyote Architecture

Von Neumann machines were not built for //ism
To get high speed, processors have lots of state
Cache, stack, global memory
To tolerate latency, we need fast context switches. WHY?
No free lunch: can't have both
Certainly not if the processor was not designed for both

Memory wall: memory gets slower and slower
in terms of the number of cycles it takes to access
Memory hierarchy gets more and more complex
Memory accesses block
No split-phase memory access

Sequential vs Parallel Algorithms

Efficient parallel algorithms
Maximize parallelism
Minimize synchronization, remote accesses
Efficiency is architecture dependent

Efficient sequential algorithms
Minimize time, space
Efficiency is portable
Efficient C program on a Pentium ~ efficient C program on an Alpha

Speedup

Ideal: n processors -> n-fold speedup
Ideal not always possible. WHY?
Tasks are data dependent
Not all processors are always busy
Remote data
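
In symbols (the standard definition, with T_1 the one-processor time and T_n the time on n processors):

\[
S(n) = \frac{T_1}{T_n}, \qquad \text{ideally } S(n) = n.
\]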

Superlinear speedup: > n-fold speedup
Nonsense! Because we can execute the faster
parallel program sequentially
No nonsense!! Because parallel computers do not
just have more processors, they have more caches

Parallel Programming

Parallel programming paradigms:
Super compilers
20 years of parallelizing compilers, and what do we get?
...not much: we understand loops (a bit)
Multithreading
Pthreads, Solaris threads: not much difference
Message passing
MPI rules; ...well, there is PVM (Parallel Virtual Machine)
Data-parallel programming
Niche work, but important
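
To make the message-passing paradigm concrete, a minimal MPI sketch (the canonical two-process exchange, not from the slides): rank 0 sends an integer, rank 1 receives it.

#include <mpi.h>
#include <stdio.h>

/* Minimal message passing: rank 0 sends, rank 1 receives.
 * Run with: mpirun -np 2 ./a.out                           */
int main(int argc, char **argv)
{
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}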

Implicit vs Explicit //ism

Implicit: super compilers
Extract parallelism from a sequential program
The general case is too hard:
pointers, aliases, recursion, separate compilation
dynamic dependence distances in array references

Explicit parallelism: threads or messages
Complicates programming:
creation, allocation, scheduling of processes
data partitioning
synchronization (locks, messages)
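
A small illustration of the last item, synchronization with locks: a pthread mutex protecting a shared counter (all names illustrative).

#include <pthread.h>
#include <stdio.h>

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Without the lock, two threads could read the same old value of
 * counter and lose an increment -- a classic data race.           */
static void *bump(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);
        counter++;                 /* critical section */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, bump, NULL);
    pthread_create(&b, NULL, bump, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %ld\n", counter);  /* 200000, race-free */
    return 0;
}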

Sequential & Parallel Processing

[Figure: the parallel version shown as 3x faster than the sequential one]

Classification of Parallel Machines:
Models of Computation (Flynn, 1966)

1. Single Instruction Stream, Single Data Stream: SISD
2. Multiple Instruction Stream, Single Data Stream: MISD
3. Single Instruction Stream, Multiple Data Stream: SIMD
4. Multiple Instruction Stream, Multiple Data Stream: MIMD
5. Single Program, Multiple Data: SPMD

SISD Computers

The operation a1 + a2 + a3 + ... + an
requires n memory accesses by the
processor and n-1 addition operations.
So the time complexity of the operation
is O(n).
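
The slide's count, in code: a single instruction stream walking a single data stream, with one memory read per element and n-1 additions.

/* SISD-style summation: one processor walks the array once.
 * n memory reads, n-1 additions => O(n) time. Assumes n >= 1. */
double sum(const double *a, int n)
{
    double s = a[0];
    for (int i = 1; i < n; i++)
        s += a[i];          /* n-1 additions in total */
    return s;
}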

von Neumann Architecture Computer

MISD Computers

N processors, each with its own control unit, share
a common memory (shared memory).

Parallelism is obtained by having all processors
perform different operations/tasks simultaneously
on the same data.

SIMD Computers

N processors operate under the control of a single
instruction stream issued by a central control unit.

The processors operate synchronously, and a
global clock is used to ensure lockstep operation.

MIMD Computers

The Potential of the Four Classes of Computers

SPMD Computers

The same program is executed on the processors of an
MIMD computer.
SPMD is not a hardware paradigm; it is the software
equivalent of SIMD, but asynchronous.

Consider the instruction IF X = 0 THEN S1 ELSE S2
Assume X = 0 on processor P1 and X != 0 on
processor P2
Process P1 executes S1 in parallel with processor P2
executing S2 (this cannot happen on SIMD)
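
The slide's IF/ELSE scenario in SPMD form, sketched with MPI (illustrative: here X is derived from the process rank so that one process takes the THEN branch while the others take the ELSE branch, concurrently).

#include <mpi.h>
#include <stdio.h>

/* SPMD: every process runs this same program, but each follows its
 * own control flow -- exactly the IF X = 0 THEN S1 ELSE S2 case.   */
int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int X = rank;            /* X = 0 on rank 0, X != 0 elsewhere */

    if (X == 0)
        printf("P%d executes S1\n", rank);   /* S1 */
    else
        printf("P%d executes S2\n", rank);   /* S2, concurrently */

    MPI_Finalize();
    return 0;
}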
