Module Overview
Overview
OpenCL Architecture & Programming Model
Basic components for getting started
Information on tools
OVERVIEW
OpenCL
OpenCL: Open Computing Language
Open standard
Royalty free, cross-platform, vendor neutral
GPUs
Emerging Intersection
Increasingly general-purpose data-parallel computing
Improving numerical precision
OpenCL
Multi-processor programming, e.g. OpenMP
Heterogeneous Computing
Defines a configuration profile for handheld and embedded devices
Close integration with OpenGL and other 3D APIs
OpenCL
Software Stack
Interface designed as a graphics-free API
High level Language
C extended to express parallelism
Runtime libraries
Allows GPU memory management
GPU as Co-processor
GPU as Compute device
Has its own DRAM (video memory)
Can run multiple threads in parallel
The application runs on the host
The compute-intensive, data-parallel part is sent to the GPU
That part is written as C functions called kernels
A kernel is executed on the device by multiple threads simultaneously
Programming Model
Host application GPU kernel
FireStream
Main Memory
GPU Memory
C - Rewritten
float sum_kernel(int x, float A[], float B[])
{
    return A[x] + B[x];
}

// n is the number of elements in A, B and C
void sum(int n, float A[], float B[], float C[])
{
    for (int i = 0; i < n; i++)
        C[i] = sum_kernel(i, A, B);
}
OpenCL
// Kernel definition
__kernel void vecAdd(__global float* A, __global float* B, __global float* C)
{
    // get_global_id(0) is unique across the whole launch;
    // get_local_id(0) is only unique within one work-group
    int i = get_global_id(0);
    C[i] = A[i] + B[i];
}

int main()
{
    // Kernel invocation: n work-items in a single work-group of size n
    size_t globalWorkSize[] = {n};
    size_t localWorkSize[] = {n};
    clEnqueueNDRangeKernel(.., 1, NULL, globalWorkSize, localWorkSize, 0, NULL, NULL);
}
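To make the step from the sequential loop to the kernel concrete, the following plain C sketch emulates on the CPU what a 1D launch does: the kernel body is called once per work-item, each time with a different id. The names vec_add_work_item and emulate_ndrange_1d are hypothetical helpers for illustration only and are not part of the OpenCL API; a real device runs the work-items in parallel rather than in a loop.

```c
#include <stddef.h>

/* Body of the vecAdd kernel for a single work-item. `gid` stands in for
   the work-item id that the kernel reads on the device. */
static void vec_add_work_item(size_t gid, const float *A, const float *B, float *C)
{
    C[gid] = A[gid] + B[gid];
}

/* Hypothetical serial stand-in for a 1D kernel launch: one call of the
   kernel body for every id in [0, global_size). */
static void emulate_ndrange_1d(size_t global_size,
                               const float *A, const float *B, float *C)
{
    for (size_t gid = 0; gid < global_size; ++gid)
        vec_add_work_item(gid, A, B, C);
}
```

Calling emulate_ndrange_1d(n, A, B, C) produces the same result as the sequential sum() loop; the difference on a device is only that the iterations become independent work-items.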
Kernel
Each thread has a unique thread ID
__kernel void vecAdd(__global float* A, __global float* B, __global float* C)
{
    // Unique thread ID, accessible within the kernel through an intrinsic function
    int i = get_global_id(0);
    C[i] = A[i] + B[i];
}
Work-Group
Work-items are organized into work-groups
A group can be a 1D, 2D or 3D array of work-items
Specified during kernel invocation
Helpful for invoking kernels on matrices and fields
Each work-item within a group can be identified by a 1D, 2D or 3D id
Built-in function get_local_id()
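The relationship between a work-item's id inside its group and its id across the whole launch can be sketched in plain C. The sketch assumes a global offset of 0 and a single dimension; work_item_ids, split_global_id and join_global_id are hypothetical names for illustration, while the built-ins named in the comments (get_group_id, get_local_id, get_local_size, get_global_id) are the OpenCL C functions they mirror.

```c
#include <stddef.h>

/* Ids of one work-item in one dimension (global offset assumed to be 0). */
typedef struct {
    size_t group_id;  /* mirrors get_group_id(d) */
    size_t local_id;  /* mirrors get_local_id(d) */
} work_item_ids;

/* Split a global id into (group id, local id) for a given work-group size. */
static work_item_ids split_global_id(size_t global_id, size_t local_size)
{
    work_item_ids ids;
    ids.group_id = global_id / local_size;
    ids.local_id = global_id % local_size;
    return ids;
}

/* Inverse mapping, mirroring
   get_global_id(d) == get_group_id(d) * get_local_size(d) + get_local_id(d). */
static size_t join_global_id(work_item_ids ids, size_t local_size)
{
    return ids.group_id * local_size + ids.local_id;
}
```

With a work-group size of 4, for example, global id 10 is the work-item with local id 2 in group 2. When a launch uses a single work-group, the local id and the global id coincide.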
Work-Group
WI (0, 0)   WI (0, 1)   WI (0, 2)
WI (1, 0)   WI (1, 1)   WI (1, 2)
WI (2, 0)   WI (2, 1)   WI (2, 2)
WI (3, 0)   WI (3, 1)   WI (3, 2)
WI (4, 0)   WI (4, 1)   WI (4, 2)
Work-Group
Example of 2D work-group
// Add two matrices A and B of dimension NxN and store the
// result into C
__kernel void matAdd(int N, __global float* A, __global float* B, __global float* C)
{
    // Local ids are sufficient here because the launch below uses a single
    // N x N work-group; with several work-groups, use get_global_id() instead
    int i = get_local_id(0);
    int j = get_local_id(1);
    C[j * N + i] = A[j * N + i] + B[j * N + i];
}

// host code
int main()
{
    // Declare, allocate and initialize device memory A, B & C

    // Kernel invocation: work_dim is 2 for a 2D range
    size_t globalWorkSize[] = {N, N};
    size_t localWorkSize[] = {N, N};
    clEnqueueNDRangeKernel(.., 2, NULL, globalWorkSize, localWorkSize, 0, NULL, NULL);
}
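The 2D indexing used by matAdd can be checked on the CPU with a small emulation. This is a sketch only: mat_add_work_item and emulate_ndrange_2d are hypothetical helper names, the matrices are assumed to be stored row-major in flat arrays, and the nested loop stands in for work-items that a device would run in parallel.

```c
#include <stddef.h>

/* Body of matAdd for the work-item with 2D id (i, j):
   i is the column (dimension 0), j is the row (dimension 1). */
static void mat_add_work_item(size_t i, size_t j, int N,
                              const float *A, const float *B, float *C)
{
    size_t idx = j * (size_t)N + i;  /* row-major position of element (j, i) */
    C[idx] = A[idx] + B[idx];
}

/* Hypothetical serial stand-in for an N x N 2D launch:
   one call of the kernel body per (i, j) pair. */
static void emulate_ndrange_2d(int N, const float *A, const float *B, float *C)
{
    for (size_t j = 0; j < (size_t)N; ++j)
        for (size_t i = 0; i < (size_t)N; ++i)
            mat_add_work_item(i, j, N, A, B, C);
}
```

Each (i, j) pair touches exactly one element, so no two work-items write to the same location and no synchronization is needed for this kernel.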
Choose the dimensions that are best for your algorithm
Maps well
Performs well
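One practical consequence of choosing a work-group size: in OpenCL 1.x the global size in each dimension has to be a multiple of the local size, so a problem size that does not divide evenly is commonly padded up. The helper below is a minimal sketch of that idiom; round_up_global_size is a hypothetical name, not an OpenCL function, and the kernel is assumed to guard against the padded ids with a bounds check such as if (i < n).

```c
#include <stddef.h>

/* Smallest multiple of local_size that is >= n.
   local_size must be greater than 0. */
static size_t round_up_global_size(size_t n, size_t local_size)
{
    size_t remainder = n % local_size;
    return remainder == 0 ? n : n + (local_size - remainder);
}
```

For n = 1000 elements and a work-group size of 64 this yields a global size of 1024, so the last 24 work-items fall outside the data and must do nothing.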
Host program
Platform Layer
Query compute devices
Create contexts
Runtime
Create memory objects associated to contexts
Compile and create kernel program objects
Issue commands to command-queue
Synchronization of commands
Clean up OpenCL resources
INFORMATION ON TOOLS
OpenCL Implementation
AMD's implementation
Ships with ATI Stream SDK v2.0
Released on: 21st Dec, 2009
OpenCL Installation
ATI Stream SDK
Environment variables
$(ATISTREAMSDKROOT) = ATI Stream SDK installation directory
$(ATISTREAMSDKSAMPLESROOT) = ATI Stream SDK Samples installation directory
Library files
OpenCL.lib under $(ATISTREAMSDKROOT)\lib\x86