
Parallel Computing with MATLAB

Sarah Wait Zaranek
Application Engineer
MathWorks, Inc.
Some Questions to Consider

 Do you want to speed up your algorithms?
 Do you have datasets too big to fit on your computer?

If so…
 Do you have a multi-core or multiprocessor desktop machine?
 Do you have access to a computer cluster?
Solving Big Technical Problems

Challenges                   You could…                Solutions
Long running,                Wait                      Larger Compute Pool
computationally intensive                              (e.g. more processors)
Large data set               Reduce size of problem    Larger Memory Pool
                                                       (e.g. more machines)
Utilizing Additional Processing Power

 Built-in multithreading
– Core MATLAB
– Introduced in R2007a
– Used to speed up specific matrix operations
– Automatically enabled since R2008a

 Parallel computing tools
– Parallel Computing Toolbox
– MATLAB Distributed Computing Server
– Broad utility controlled by the MATLAB user
Parallel Computing with MATLAB

[Diagram: MATLAB with toolboxes and blocksets distributing work to a pool of workers]
Parallel Computing with MATLAB

[Diagram: Parallel Computing Toolbox on the user’s desktop; MATLAB Distributed Computing Server runs MATLAB workers on a compute cluster]
Programming Parallel Applications

Level of control    Required effort
Minimal             None
Some                Straightforward
Extensive           Involved
Programming Parallel Applications

Level of control    Parallel Options
Minimal             Support built into Toolboxes
Some                High-Level Programming Constructs
                    (e.g. parfor, batch, distributed)
Extensive           Low-Level Programming Constructs
                    (e.g. Jobs/Tasks, MPI-based)
Example: Optimizing Tower Placement

 Determine location of cell towers

 Maximize coverage

 Minimize overlap

Summary of Example

 Enabled built-in support for Parallel Computing Toolbox in Optimization Toolbox

 Used a pool of MATLAB workers

 Optimized in parallel using fmincon
Parallel Support in Optimization Toolbox

 Functions:
– fmincon
 Finds a constrained minimum of a function of several variables
– fminimax
 Finds a minimax solution of a function of several variables
– fgoalattain
 Solves the multiobjective goal attainment optimization problem

 Functions can take finite differences in parallel in order to speed the estimation of gradients
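In code, enabling this parallel finite-differencing looks roughly like the following sketch. The objective, constraints, bounds, and starting point (myObjective, myConstraints, x0, lb, ub) are illustrative placeholders, not names from the original demo, and the option syntax is the era-appropriate matlabpool form:

```matlab
% Sketch only: parallel gradient estimation in fmincon.
% myObjective, myConstraints, x0, lb, ub are illustrative placeholders.
matlabpool open                                  % start a pool of local workers
opts = optimset('fmincon');
opts = optimset(opts, 'UseParallel', 'always');  % era-appropriate setting
[x, fval] = fmincon(@myObjective, x0, [], [], [], [], lb, ub, ...
                    @myConstraints, opts);
matlabpool close
```

With UseParallel set, fmincon evaluates the finite-difference perturbations of the objective and constraints on the worker pool instead of serially.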
Tools with Built-in Support

 Optimization Toolbox
 Global Optimization Toolbox
 Statistics Toolbox
 SystemTest
 Simulink Design Optimization
 Bioinformatics Toolbox
 Model-Based Calibration Toolbox
 …

http://www.mathworks.com/products/parallel-computing/builtin-parallel-support.html

Directly leverage functions in Parallel Computing Toolbox
Programming Parallel Applications

Level of control    Parallel Options
Minimal             Support built into Toolboxes
Some                High-Level Programming Constructs
                    (e.g. parfor, batch, distributed)
Extensive           Low-Level Programming Constructs
                    (e.g. Jobs/Tasks, MPI-based)
Running Independent Tasks or Iterations

 Ideal problem for parallel computing
 No dependencies or communications between tasks
 Examples include parameter sweeps and Monte Carlo simulations

[Figure: the same set of tasks running sequentially on one machine vs. simultaneously across workers]
Example: Parameter Sweep of ODEs

 Solve a 2nd order ODE:
    m·x″ + b·x′ + k·x = 0,  with m = 5, b = 1, 2, …, k = 1, 2, …

 Simulate with different values for b and k

 Record peak value for each run

 Plot results

[Figures: displacement vs. time for (m = 5, b = 2, k = 2) and (m = 5, b = 5, k = 5); surface of peak displacement vs. damping (b) and stiffness (k)]
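A parameter sweep like this one can be sketched with parfor as follows. The parameter ranges, initial conditions, and variable names here are illustrative, not taken from the original demo:

```matlab
% Sketch of the ODE parameter sweep (values and names are illustrative).
m = 5;                                 % fixed mass
bVals = 1:0.2:5;                       % damping values to sweep
kVals = 1:0.2:5;                       % stiffness values to sweep
[kGrid, bGrid] = meshgrid(kVals, bVals);
peakVals = zeros(size(kGrid));

matlabpool open                        % era-appropriate pool startup
parfor idx = 1:numel(kGrid)
    b = bGrid(idx);  k = kGrid(idx);
    % m*x'' + b*x' + k*x = 0, rewritten as a first-order system
    odeFun = @(t, y) [y(2); -(b*y(2) + k*y(1))/m];
    [~, y] = ode45(odeFun, [0 25], [0.3 0]);   % assumed initial conditions
    peakVals(idx) = max(y(:,1));       % record peak displacement
end
matlabpool close

surf(kGrid, bGrid, peakVals)
xlabel('Stiffness (k)'), ylabel('Damping (b)'), zlabel('Peak Displacement (x)')
```

Each iteration is independent, so parfor can hand them to the worker pool in any order; peakVals is a sliced output variable, gathered automatically when the loop finishes.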
Summary of Example

 Mixed task-parallel and serial code in the same function

 Ran loops on a pool of MATLAB resources

 Used Code Analyzer to help in converting existing for-loop into parfor-loop

[Figures: displacement vs. time and peak-displacement surface, as on the previous slide]
The Mechanics of parfor Loops

a = zeros(10, 1);
parfor i = 1:10
    a(i) = i;
end
a

[Diagram: the 10 iterations are divided among the workers in the pool, each worker executing a(i) = i for its share]

Pool of MATLAB Workers
Converting for to parfor

 Requirements for parfor loops
– Task independent
– Order independent

 Constraints on the loop body
– Cannot “introduce” variables (e.g. eval, load, global, etc.)
– Cannot contain break or return statements
– Cannot contain another parfor loop
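As a quick illustration of the independence requirements, this sketch (variable names invented here) contrasts a loop that converts cleanly with one that cannot:

```matlab
% OK: each iteration writes only its own slice of the output,
% so iterations are task- and order-independent.
n = 8;
results = zeros(1, n);
parfor i = 1:n
    results(i) = i^2;
end

% NOT OK as parfor: iteration i reads the value written by
% iteration i-1, so the iterations are neither task- nor
% order-independent and must stay a serial for-loop.
b = zeros(1, n);
for i = 2:n
    b(i) = b(i-1) + results(i);
end
```

Code Analyzer flags the second pattern when you try to change the keyword to parfor.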
Advice for Converting for to parfor

 Use Code Analyzer to diagnose parfor issues

 If your for loop cannot be converted to a parfor, consider wrapping a subset of the body in a function

 Read the section in the documentation on classification of variables

 http://blogs.mathworks.com/loren/2009/10/02/using-parfor-loops-getting-up-and-running/
Performance Gain with More Hardware

Using More Cores (CPUs)          Using GPUs

[Diagram: a quad-core CPU with shared cache alongside a GPU with dedicated device memory]
What is a Graphics Processing Unit (GPU)?

 Originally for graphics acceleration, now also used for scientific calculations

 Massively parallel array of integer and floating-point processors
– Typically hundreds of processors per card
– GPU cores complement CPU cores

 Dedicated high-speed memory

* Parallel Computing Toolbox requires NVIDIA GPUs with Compute Capability 1.3 or greater, including NVIDIA Tesla 10-series and 20-series products. See http://www.nvidia.com/object/cuda_gpus.html for a complete listing.
Summary of Options for Targeting GPUs

1) Use GPU array interface with MATLAB built-in functions

2) Execute custom functions on elements of the GPU array

3) Invoke your CUDA kernels directly from MATLAB

(Ease of use is greatest with option 1; control is greatest with option 3.)
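Options 1 and 2 can be sketched in a few lines; the sizes and arithmetic below are illustrative, and the API is the era-appropriate GPUArray interface:

```matlab
% Sketch of options 1 and 2 (sizes and math are illustrative).
A = rand(1000, 'single');          % data created on the CPU
G = gpuArray(A);                   % option 1: move it to the GPU...
F = fft(G);                        % ...then built-in functions run there
S = arrayfun(@(x) x + 2*x.^2, G);  % option 2: custom elementwise function
result = gather(S);                % copy the result back to the CPU
```

Option 3 (invoking CUDA kernels from MATLAB) compiles a .cu kernel separately and wraps it in a kernel object, so it is not shown here.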
Performance: A\b with Double Precision

[Chart: performance of the backslash operator (A\b) in double precision, CPU vs. GPU]
Performance Acceleration Options in the Parallel Computing Toolbox

Technology              Example      MATLAB Workers   Execution Target
matlabpool              parfor       Required         CPU cores
user-defined tasks      createTask   Required         CPU cores
GPU-based parallelism   GPUArray     Not required     NVIDIA GPU with Compute
                                                      Capability 1.3 or greater
Parallel Computing enables you to …

Larger Compute Pool: Speed up Computations
Larger Memory Pool: Work with Large Data

[Diagram: an array whose elements are distributed across the memory of several workers]
Limited Process Memory

 32-bit platforms
– Windows 2000 and XP (by default): 2 GB
– Linux/UNIX/Mac, system configurable: 3–4 GB
– Windows XP with /3gb boot.ini switch: 3 GB

 64-bit platforms
– Linux/UNIX/Mac: 8 TB
– Windows XP Professional x64: 8 TB
Client-side Distributed Arrays

[Diagram: a distributed array’s elements spread across the workers on the cluster]

Remotely manipulate the array from the desktop; the distributed array lives on the cluster.
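Creating and working with a distributed array from the client can be sketched as follows; the array size and computation are illustrative, and the syntax is the era-appropriate form:

```matlab
% Sketch: a distributed array manipulated from the client session.
matlabpool open                    % workers hold the array's pieces
D = distributed.rand(4096);        % array data lives on the workers
s = sum(D(:));                     % enhanced functions run where the data is
local = gather(s);                 % bring only the (small) result back
matlabpool close
```

The client session never holds the full 4096-by-4096 array; only the scalar result crosses back to the desktop.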
Enhanced MATLAB Functions That Operate on Distributed Arrays

[Table: MATLAB functions enhanced to operate on distributed arrays]
spmd blocks

spmd
    % single program across workers
end

 Mix parallel and serial code in the same function
 Run on a pool of MATLAB resources
 Single Program runs simultaneously across workers
– Distributed arrays, message-passing
 Multiple Data spread across multiple workers
– Data stays on workers
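A small spmd example might look like the sketch below; the computation is invented here for illustration:

```matlab
% Sketch of an spmd block (the computation is illustrative).
matlabpool open
spmd
    % Each worker runs the same program on its own data.
    localPart = rand(1000, 1) * labindex;   % labindex identifies the worker
    localMax = max(localPart);
    % Combine results across workers with a global reduction
    % (message-passing happens under the hood).
    globalMax = gop(@max, localMax);
end
disp(globalMax{1})    % back on the client, spmd results are Composites,
                      % indexed by worker
matlabpool close
```

Note that localPart never leaves the workers; only the reduced value is examined on the client.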
Programming Parallel Applications

Level of control    Parallel Options
Minimal             Support built into Toolboxes
Some                High-Level Programming Constructs
                    (e.g. parfor, batch, distributed)
Extensive           Low-Level Programming Constructs
                    (e.g. Jobs/Tasks, MPI-based)
MPI-Based Functions in Parallel Computing Toolbox™
Use when a high degree of control over the parallel algorithm is required

 High-level abstractions of MPI functions
– labSendReceive, labBroadcast, and others
– Send, receive, and broadcast any data type in MATLAB

 Automatic bookkeeping
– Setup: communication, ranks, etc.
– Error detection: deadlocks and miscommunications

 Pluggable
– Use any MPI implementation that is binary-compatible with MPICH2
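As a sketch of these abstractions, the following spmd block passes data around a ring of workers with labSendReceive; the payload itself is illustrative:

```matlab
% Sketch: ring communication with labSendReceive (payload is illustrative).
matlabpool open
spmd
    right = mod(labindex, numlabs) + 1;       % neighbor to send to
    left  = mod(labindex - 2, numlabs) + 1;   % neighbor to receive from
    myData = labindex^2;
    % Send to the right and receive from the left in one call,
    % avoiding the deadlock a naive blocking send/receive pair risks.
    received = labSendReceive(right, left, myData);
end
matlabpool close
```

This is the kind of bookkeeping the slide refers to: ranks (labindex, numlabs) and pairing are handled by the toolbox rather than raw MPI calls.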
Scheduling Applications

Interactive to Scheduling

 Interactive
– Great for prototyping
– Immediate access to MATLAB workers

 Scheduling
– Offloads work to other MATLAB workers (local or on a cluster)
– Access to more computing resources for improved performance
– Frees up local MATLAB session
Scheduling Work

[Diagram: TOOLBOXES and BLOCKSETS submit Work to a Scheduler, which dispatches it to Workers and returns the Result]
Example: Schedule Processing

 Offload parameter sweep to local workers

 Get peak value results when processing is complete

 Plot results in local MATLAB

[Figures: displacement vs. time and peak-displacement surface, as on the earlier parameter-sweep slide]
Summary of Example

 Used batch for off-loading work

 Used matlabpool option to off-load and run in parallel

 Used load to retrieve worker’s workspace

[Figures: displacement vs. time and peak-displacement surface, as on the earlier parameter-sweep slide]
Scheduling Workflows

 parfor
– Multiple independent iterations
– Easy to combine serial and parallel code
– Workflow
   Interactive using matlabpool
   Scheduled using batch

 jobs/tasks
– Series of independent tasks; not necessarily iterations
– Workflow
   Always scheduled
Scheduling Jobs and Tasks

[Diagram: a Job containing Tasks goes from TOOLBOXES and BLOCKSETS to the Scheduler, which sends each Task to a Worker and collects each Result back into the Job’s Results]
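The jobs/tasks workflow can be sketched with the era-appropriate API as follows; the task function and inputs are invented here for illustration:

```matlab
% Sketch of the jobs/tasks workflow (task function is illustrative).
sched = findResource('scheduler', 'type', 'local');  % local scheduler
job = createJob(sched);
for k = 1:4
    % Each task: 1 output argument, input arguments {100*k}.
    % Tasks are independent of one another.
    createTask(job, @(n) sum(sum(rand(n))), 1, {100*k});
end
submit(job)
waitForState(job, 'finished')
results = getAllOutputArguments(job);   % 4x1 cell array of task outputs
destroy(job)
```

Unlike parfor, nothing here needs to be a loop iteration; each createTask call can name a different function entirely.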
Parallel Computing with MATLAB

Built-in parallel functionality within specific toolboxes (also requires Parallel Computing Toolbox): Optimization Toolbox, Global Optimization Toolbox, System Test, Simulink Design Optimization, Bioinformatics Toolbox, Model-Based Calibration Toolbox

MATLAB and Parallel Computing Tools

 High-level parallel functions: parfor, matlabpool, batch

 Low-level parallel functions: jobs, tasks

 Built on industry standard libraries: Message Passing Interface (MPI), ScaLAPACK
Parallel Computing on the Desktop

Desktop Computer
Parallel Computing Toolbox

 Rapidly develop parallel applications on local computer

 Take full advantage of desktop power by using CPUs and GPUs

 Separate computer cluster not required
Scale Up to Clusters, Grids and Clouds

Desktop Computer                 Computer Cluster
Parallel Computing Toolbox       MATLAB Distributed Computing Server
                                 Scheduler
Licensing: MATLAB® Distributed Computing Server™

 One key required per worker:
– Packs of 8, 16, 32, 64, 128, etc.
– Worker is a MATLAB® session, not a processor

 All-product install
– No code generation or deployment products

[Diagram: desktop MATLAB, Simulink, toolboxes, blocksets, and Parallel Computing Toolbox submit Tasks through the MATLAB Job Scheduler to Workers running MATLAB Distributed Computing Server, which return Results]
Support for Schedulers

Direct Support
– TORQUE

Open API for others
Programming Parallel Applications

Level of control    Parallel Options
Minimal             Support built into Toolboxes
Some                High-Level Programming Constructs
                    (e.g. parfor, batch, distributed)
Extensive           Low-Level Programming Constructs
                    (e.g. Jobs/Tasks, MPI-based)