Beruflich Dokumente
Kultur Dokumente
Agenda
Motivation
A little terminology
Hardware in a heterogeneous world Software in a heterogeneous world
Solution
Definitions
Parallelism A property of a computation where portions of the calculations are independent of each other, allowing them to be executed at the same time.
Definitions
Concurrency A logical programming abstraction used to arbitrate communication between multiple processing entities (like processes or threads). For example, concurrency can be used to build user interfaces and other asynchronous tasks.
Definitions
Heterogenous Computing A system comprised of two or more compute engines with signficant structural differences In our case, a low latency x86 CPU and a high throughput Radeon GPU Fusion
Bringing together two or more components and joining them into a single unified whole
In our case, combining CPUs and GPUs on a single silicon die for higher performance and lower power
Delivers
optimal performance
Multi-Core Era
Enabled by: Moores Law Desire for Throughput 20 years of SMP arch Constrained by: Power Parallel SW availability Scalability
Throughput Performance
o
we are here
o
we are here
o
we are here
Time
Time (# of Processors)
HD5870
HD4870
CPU
HD5870 HD4870
CPU
250
200
150
HD5870
100
HD4870
50
GFLOPS/W
GFLOPS/mm2 4.50
7.50
7.90
GFLOPS/mm2
2.01
1.07
2.21 2.24
4.56
0.42
1.06
0.92
Fusion APU
Programmer Accessibility
Heterogeneous Computing
System-level Programmable
Unacceptable
Throughput Performance
16 | Introduction to Parallel and Heterogeneous Computing| October, 2010
GPU Advancement
Scalable architecture that can target a broad range of platforms from mobile to data center
20 | Introduction to Parallel and Heterogeneous Computing| October, 2010
http://sites.amd.com/us/fusion/apu/Pages/fusion-developer-summit.aspx
Introduction to OpenCL
OpenCL Programming in Detail Using OpenCL C Language
http://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspx
Java Threads
X10 Cuda Thread Building Blocks Cilk Fortress Data parallelism
Brook+
Data-decomposition
26 | Introduction to Parallel and Heterogeneous Computing| October, 2010
Task-decomposition
Divides the problem by type of task to be done
Synchronization
Load balancing Etc
Data-decomposition
Divides the problem into elements to be processed
Braided Parallelism
Reference: Aaron Lefohn. Programming Larrabee: Beyond Data Parallelism. Beyond Programmable Shading Course. SIGGRAPH 2008.
32 | Introduction to Parallel and Heterogeneous Computing| October, 2010
Fusion APU
Programmer Accessibility
Unacceptable
Throughput Performance
33 | Introduction to Parallel and Heterogeneous Computing| October, 2010
GPU Advancement
Braided Parallelism High Performance Heterogeneous Task Parallel Execution a natural programming Computing model for heterogeneous computing
Experts Only
System-level Programmable
Trademark Attribution AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners. 2009 Advanced Micro Devices, Inc. All rights reserved.