
Application Engineer

MathWorks, Inc.


Some Questions to Consider

– Do you have datasets too big to fit on your computer?

If so…

– Do you have a multi-core or multiprocessor desktop machine?
– Do you have access to a computer cluster?

Solving Big Technical Problems

Challenges                                 Solutions
Long running, computationally intensive    Wait; Larger Compute Pool (e.g. More Processors)
Large data set                             Reduce size of problem; Larger Memory Pool (e.g. More Machines)

Utilizing Additional Processing Power

Built-in multithreading
– Core MATLAB
– Introduced in R2007a
– Utility for specific matrix operations
– Automatically enabled since R2008a

Parallel computing tools
– Parallel Computing Toolbox
– MATLAB Distributed Computing Server
– Broad utility controlled by the MATLAB user
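Built-in multithreading needs no code changes. Here is a minimal sketch, assuming a MATLAB release that still provides `maxNumCompThreads` for querying the thread limit. The names `nThreads`, `A` and `B` are illustrative.

```matlab
% Query the number of computational threads that multithreaded built-in
% functions may use (query only; the setting is not changed here).
nThreads = maxNumCompThreads;
fprintf('MATLAB may use up to %d computational threads.\n', nThreads);

% Matrix multiplication is one of the built-ins that is multithreaded
% automatically; the code is identical to the serial version.
A = rand(500);
B = A * A.';
```

Calling `maxNumCompThreads` with no argument only reports the limit. The matrix product is written exactly as it would be in serial code.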

Parallel Computing with MATLAB

[Figure: a MATLAB session with its toolboxes and blocksets distributing work to several MATLAB workers]

Parallel Computing with MATLAB

[Figure: Parallel Computing Toolbox and MATLAB Distributed Computing Server, with MATLAB workers]

Programming Parallel Applications

Level of control    Required effort
Minimal             None
Some                Straightforward
Extensive           Involved

Programming Parallel Applications

Level of control    Parallel Options
Minimal             Toolboxes
Some                High-Level Programming Constructs (e.g. parfor, batch, distributed)
Extensive           Low-Level Programming Constructs (e.g. Jobs/Tasks, MPI-based)

Example: Optimizing Tower Placement

– Maximize coverage
– Minimize overlap

Summary of Example

– … Parallel Computing Toolbox in Optimization Toolbox

Parallel Support in Optimization Toolbox

Functions:
– fmincon: finds a constrained minimum of a function of several variables
– fminimax: finds a minimax solution of a function of several variables
– fgoalattain: solves the multiobjective goal attainment optimization problem

Finite-difference gradients are evaluated in parallel in order to speed the estimation of gradients.
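Here is a minimal sketch of enabling this support. It assumes Optimization Toolbox and a release with `optimoptions`. The objective and disk constraint are hypothetical, chosen so that the answer is known: the point of the unit disk closest to (1, 2) is (1, 2)/sqrt(5). Without an open parallel pool (or without Parallel Computing Toolbox), fmincon should fall back to serial gradient estimation.

```matlab
% Hypothetical problem: find the point in the unit disk closest to (1, 2).
objective = @(x) (x(1) - 1)^2 + (x(2) - 2)^2;

% Nonlinear constraint c(x) <= 0 keeps x inside the unit disk;
% deal returns [c, ceq] with no equality constraints.
diskConstraint = @(x) deal(x(1)^2 + x(2)^2 - 1, []);

% UseParallel asks fmincon to estimate finite-difference gradients in parallel.
opts = optimoptions('fmincon', 'UseParallel', true, 'Display', 'off');

x0 = [0; 0];
x = fmincon(objective, x0, [], [], [], [], [], [], diskConstraint, opts);
```

No gradient is supplied, so fmincon must estimate one by finite differences. That estimation is the part the parallel option speeds up.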

Tools with Built-in Support

– Optimization Toolbox
– Global Optimization Toolbox
– Statistics Toolbox
– SystemTest
– Simulink Design Optimization
– Bioinformatics Toolbox
– …

http://www.mathworks.com/products/parallel-computing/builtin-parallel-support.html


Running Independent Tasks or Iterations

– No dependencies or communications between tasks
– Examples include parameter sweeps and Monte Carlo simulations
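As an illustration of independent iterations, here is a sketch of a Monte Carlo estimate of pi written as a `parfor`-loop. The chunk counts are arbitrary choices. Without Parallel Computing Toolbox, the loop simply runs serially as an ordinary `for`-loop.

```matlab
% Monte Carlo estimate of pi: every chunk of samples is independent, so the
% chunks can run as parfor iterations in any order on any worker.
nChunks = 100;        % number of independent iterations
perChunk = 1e4;       % random points generated per iteration
hits = 0;             % reduction variable, combined across iterations
parfor c = 1:nChunks
    xy = rand(perChunk, 2);                  % points in the unit square
    hits = hits + sum(sum(xy.^2, 2) <= 1);   % count points inside the quarter circle
end
piEstimate = 4 * hits / (nChunks * perChunk);
```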

Example: Parameter Sweep of ODEs

Solve $m\ddot{x} + b\dot{x} + kx = 0$ with $m = 5$, $b = 1, 2, \ldots$, $k = 1, 2, \ldots$

– Sweep through values for b and k
– Record peak value for each run

[Figure: displacement x versus time (s) for m = 5, b = 2, k = 2 and for m = 5, b = 5, k = 5; surface of the peak displacement over damping (b) and stiffness (k)]
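The sweep can be sketched as a `parfor`-loop over every (b, k) pair. This is not the presenter's code. The initial conditions x(0) = 0, x'(0) = 1, the 5-by-5 grid of b and k values, and the 0 to 25 s time span (taken from the plotted axis) are assumptions for illustration.

```matlab
% Sweep damping b and stiffness k for m*x'' + b*x' + k*x = 0 with m = 5,
% recording the peak displacement of each run.
% ASSUMED (not from the slides): x(0) = 0, x'(0) = 1, b = 1:5, k = 1:5.
m = 5;
bValues = 1:5;
kValues = 1:5;
[bGrid, kGrid] = meshgrid(bValues, kValues);
nRuns = numel(bGrid);
peakValues = zeros(nRuns, 1);   % sliced output: one entry per run
tspan = [0 25];                 % seconds
y0 = [0; 1];                    % [x(0); x'(0)]

parfor idx = 1:nRuns
    b = bGrid(idx);
    k = kGrid(idx);
    % First-order form: y(1) = x, y(2) = x'
    odefun = @(t, y) [y(2); -(b * y(2) + k * y(1)) / m];
    [~, y] = ode45(odefun, tspan, y0);
    peakValues(idx) = max(y(:, 1));
end

% Rows follow k and columns follow b, matching meshgrid's layout.
peakValues = reshape(peakValues, size(bGrid));
```

Each run depends only on its own (b, k) pair, so the iterations satisfy the independence requirement for `parfor`. `surf(bGrid, kGrid, peakValues)` would give a plot like the peak-value surface.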

Summary of Example

– Mixed task-parallel and serial code
– Ran loops on a pool of MATLAB resources
– … in converting existing for-loop into parfor-loop

[Figure: the same displacement-versus-time plot and peak-displacement surface over damping (b) and stiffness (k)]

The Mechanics of parfor Loops

a = zeros(10, 1);
parfor i = 1:10
    a(i) = i;
end
a

[Figure: the iterations of a(i) = i; divided among four workers, with the results assembled back into a = 1, 2, …, 10]

Converting for to parfor

– Task independent
– Order independent
– Cannot “introduce” variables (e.g. eval, load, global, etc.)
– Cannot contain break or return statements
– Cannot contain another parfor loop
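The order-independence requirement can be illustrated with two small loops (the names `fib` and `squares` are illustrative). The first cannot become a `parfor`-loop because each iteration reads earlier results. The second can, because each iteration depends only on its own index.

```matlab
n = 10;

% Order-dependent: iteration i reads the results of iterations i-1 and i-2,
% so this loop must remain a for-loop.
fib = zeros(1, n);
fib(1) = 1;
fib(2) = 1;
for i = 3:n
    fib(i) = fib(i-1) + fib(i-2);
end

% Order-independent: iteration i depends only on i, so parfor is allowed.
squares = zeros(1, n);
parfor i = 1:n
    squares(i) = i^2;
end
```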

Advice for Converting for to parfor

– Consider wrapping a subset of the loop body in a function
– Learn the classification of variables (loop, sliced, broadcast, reduction, temporary)

http://blogs.mathworks.com/loren/2009/10/02/using-parfor-loops-getting-up-and-running/
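This small loop, with illustrative names, annotates each variable with its `parfor` classification. It is a sketch for orientation. The MATLAB documentation on variable classification remains the authority on the exact rules.

```matlab
n = 8;
scale = 3;             % broadcast variable: read-only, copied to every worker
x = (1:n)';            % sliced input: each iteration reads only x(i)
y = zeros(n, 1);       % sliced output: each iteration writes only y(i)
total = 0;             % reduction variable: combined with + across iterations
parfor i = 1:n         % i is the loop variable
    t = scale * x(i);  % t is a temporary variable, local to one iteration
    y(i) = t^2;
    total = total + t;
end
```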

Performance Gain with More Hardware

[Figure: a four-core CPU (Core 1 to Core 4) with cache, and a GPU with its own device memory]

What is a Graphics Processing Unit (GPU)?

– … used for scientific calculations
– … floating point processors
– Typically hundreds of processors per card
– GPU cores complement CPU cores

* Parallel Computing Toolbox requires NVIDIA GPUs with Compute Capability 1.3 or greater, including NVIDIA Tesla 10-series and 20-series products. See http://www.nvidia.com/object/cuda_gpus.html for a complete listing.

Summary of Options for Targeting GPUs

– … built-in functions
– … the GPU array
– … MATLAB

[Figure: the options arranged along an axis running from Ease of Use to Greater Control]
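Two of these options can be sketched together: calling built-in functions on a GPU array, and applying a custom element-wise function across it with `arrayfun`. This assumes R2019b or later for `canUseGPU`. The data and the function t.^2 + sin(t) are arbitrary, and the code falls back to the CPU when no supported GPU is present.

```matlab
% Use the GPU only when a supported device is available.
x = linspace(0, 1, 1e6);
onGPU = canUseGPU();
if onGPU
    x = gpuArray(x);                 % copy data to GPU device memory
end

% Built-in functions accept gpuArray inputs directly.
s = sum(sin(x));

% A custom element-wise function applied across every element of the array.
y = arrayfun(@(t) t.^2 + sin(t), x);

if onGPU
    s = gather(s);                   % copy results back to the MATLAB workspace
    y = gather(y);
end
```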

Performance: A\b with Double Precision
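The operation benchmarked here can be sketched as follows. This assumes R2019b or later for `canUseGPU`. The diagonally dominant test matrix is an illustrative choice that keeps the system well conditioned, and no timings are implied.

```matlab
n = 1000;
A = rand(n) + n * eye(n);    % diagonally dominant, hence well conditioned
b = rand(n, 1);

if canUseGPU()
    % Solve on the GPU in double precision, then copy the result back.
    x = gather(gpuArray(A) \ gpuArray(b));
else
    x = A \ b;               % CPU fallback
end

relResidual = norm(A * x - b) / norm(b);
```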


Performance Acceleration Options in the Parallel Computing Toolbox

Technology              Example    MATLAB Workers    Execution Target
GPU-based parallelism   GPUArray   No                NVIDIA GPU with Compute Capability 1.3 or greater

Parallel Computing enables you to …

Larger Compute Pool    Larger Memory Pool

[Figure: a large array divided into blocks held by different workers]

Limited Process Memory

32-bit platforms
– Windows 2000 and XP (by default): 2 GB
– Linux/UNIX/Mac (system configurable): 3-4 GB
– Windows XP with /3gb boot.ini switch: 3 GB

64-bit platforms
– Linux/UNIX/Mac: 8 TB
– Windows XP Professional x64: 8 TB

Client-side Distributed Arrays

[Figure: a distributed array is manipulated from the desktop while the array itself lives on the cluster]
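Here is a minimal sketch of a client-side distributed array. It assumes Parallel Computing Toolbox is installed and that a parallel pool can be started automatically. The `ver('parallel')` check is one common way to detect the toolbox, and the code falls back to an ordinary array without it. `magic(400)` is used only because its total is known in closed form.

```matlab
n = 400;
A = magic(n);
hasPCT = ~isempty(ver('parallel'));
if hasPCT
    D = distributed(A);           % data is partitioned across the pool workers
    total = gather(sum(sum(D)));  % sums run on the workers; result returns to the client
else
    total = sum(sum(A));          % serial fallback without the toolbox
end
```

The distributed code is ordinary MATLAB syntax. Only the `distributed` and `gather` calls mark where data moves between client and workers.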

Enhanced MATLAB Functions That Operate on Distributed Arrays

spmd blocks

spmd
    % single program across workers
end

– Run on a pool of MATLAB resources
– Single Program runs simultaneously across workers
   – Distributed arrays, message-passing
– Multiple Data spread across multiple workers
   – Data stays on workers
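Here is a sketch of a single program operating on multiple data. It assumes Parallel Computing Toolbox. It uses the `labindex`, `numlabs` and `gplus` names contemporary with these slides; newer releases call them `spmdIndex`, `spmdSize` and `spmdPlus`. Each worker builds its own data, and a global sum combines them.

```matlab
hasPCT = ~isempty(ver('parallel'));
if hasPCT
    spmd
        % Every worker runs this same code on its own worker-specific data.
        localData = labindex * ones(1, 3);
        total = gplus(sum(localData));   % global sum across all workers
        nWorkers = numlabs;
    end
    % Variables assigned inside spmd come back to the client as Composites.
    result = total{1};
    n = nWorkers{1};
else
    n = 1;                               % serial stand-in for one worker
    result = sum(1 * ones(1, 3));
end
```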


MPI-Based Functions in Parallel Computing Toolbox™

Use when a high degree of control over the parallel algorithm is required
– labSendReceive, labBroadcast, and others
– Send, receive, and broadcast any data type in MATLAB

Automatic bookkeeping
– Setup: communication, ranks, etc.
– Error detection: deadlocks and miscommunications

Pluggable
– Use any MPI implementation that is binary-compatible with MPICH2
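Here is a minimal sketch of `labBroadcast` inside an `spmd` block, assuming Parallel Computing Toolbox. Worker 1 sends a matrix and every other worker receives it. `magic(4)` is an arbitrary payload.

```matlab
hasPCT = ~isempty(ver('parallel'));
if hasPCT
    spmd
        if labindex == 1
            data = labBroadcast(1, magic(4));   % worker 1 sends the data
        else
            data = labBroadcast(1);             % all other workers receive it
        end
        nWorkers = numlabs;
    end
    n = nWorkers{1};
    received = data{n};          % the copy held by the last worker
else
    received = magic(4);         % serial stand-in without the toolbox
end
```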

Scheduling Applications


Interactive to Scheduling

Interactive
– Great for prototyping
– Immediate access to MATLAB workers

Scheduling
– Offloads work to other MATLAB workers (local or on a cluster)
– Access to more computing resources for improved performance
– Frees up local MATLAB session

Scheduling Work

[Figure: work is sent from the MATLAB session to a worker, and the result is returned]
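Here is a minimal sketch of scheduling work with `batch`. It assumes Parallel Computing Toolbox and a usable default cluster profile. The built-in `sum` stands in for real work, and the code falls back to serial execution without the toolbox.

```matlab
hasPCT = ~isempty(ver('parallel'));
if hasPCT
    job = batch(@sum, 1, {1:100});   % run sum(1:100) on a worker; request 1 output
    % The client session is free here until it chooses to wait.
    wait(job);
    out = fetchOutputs(job);         % cell array holding the requested output
    s = out{1};
    delete(job);                     % remove the job's data from the cluster
else
    s = sum(1:100);                  % serial fallback
end
```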

Example: Schedule Processing

– Offload parameter sweep
– Get peak value results when processing is complete

[Figure: displacement x versus time (s) for m = 5, b = 2, k = 2 and for m = 5, b = 5, k = 5; surface of the peak displacement over damping (b) and stiffness (k)]

Summary of Example

– … worker’s workspace

[Figure: the same displacement-versus-time plot and peak-displacement surface as in the previous example]

Scheduling Workflows

parfor
– Multiple independent iterations
– Easy to combine serial and parallel code
– Workflow
   – Interactive using matlabpool
   – Scheduled using batch

jobs/tasks
– Series of independent tasks; not necessarily iterations
– Workflow: always scheduled

Scheduling Jobs and Tasks

[Figure: a job containing several tasks is passed to the scheduler, which sends each task to a worker and collects the results]
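Here is a minimal sketch of the job/task workflow. It assumes Parallel Computing Toolbox and a default cluster profile returned by `parcluster`. The four tasks, each computing `sum(1:n)`, are placeholders for genuinely independent pieces of work.

```matlab
hasPCT = ~isempty(ver('parallel'));
if hasPCT
    c = parcluster();                     % cluster from the default profile
    job = createJob(c);                   % one job...
    for n = 1:4
        createTask(job, @sum, 1, {1:n});  % ...made of four independent tasks
    end
    submit(job);                          % hand the job to the scheduler
    wait(job);
    out = fetchOutputs(job);              % 4-by-1 cell array, one row per task
    results = cell2mat(out);
    delete(job);
else
    results = arrayfun(@(n) sum(1:n), (1:4)');   % serial stand-in
end
```

Unlike `parfor`, the tasks need not be iterations of one loop. Each can call a different function with different arguments.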

Parallel Computing with MATLAB

Built in parallel functionality within specific toolboxes (also requires Parallel Computing Toolbox)
– Global Optimization Toolbox
– Optimization Toolbox
– Simulink Design Optimization
– Bioinformatics Toolbox
– Model-Based Calibration Toolbox
– SystemTest

MATLAB and Parallel Computing Tools
– … standard libraries
– Message Passing Interface (MPI)
– ScaLAPACK

Parallel Computing on the Desktop

– Rapidly develop parallel applications on local computer
– … power by using CPUs and GPUs
– … not required

[Figure: desktop computer]

Scale Up to Clusters, Grids and Clouds

[Figure: a desktop computer running Parallel Computing Toolbox connects through a scheduler to a computer cluster running MATLAB Distributed Computing Server]

Licensing: MATLAB® Distributed Computing Server™

One key required per worker:
– Packs of 8, 16, 32, 64, 128, etc.
– Worker is a MATLAB® session, not a processor

All-product install
– No code generation or deployment products

[Figure: a desktop with Simulink, toolboxes, blocksets and Parallel Computing Toolbox sends tasks through the MATLAB Job Scheduler to MATLAB Distributed Computing Server workers, which return results]

Support for Schedulers

Direct Support
– TORQUE

