Introduction to programming in CUDA C
Will Landau (Iowa State University)
September 30, 2013

Outline

A review: GPU parallelism and CUDA architecture
Beginning CUDA C
    Hello world
    Skeleton program
    Simple program
    Vector addition
    Pairwise summation
    Respecting the SIMD paradigm

A review: GPU parallelism and CUDA architecture

The single instruction, multiple data (SIMD) paradigm

- SIMD: apply the same command to multiple places in a dataset.

CPU / GPU cooperation

- The CPU ("host") is in charge.

How GPU parallelism works

1. The CPU sends a command called a kernel to a GPU.
2. The GPU executes several duplicate realizations of this command, called threads.
   - These threads are grouped into bunches called blocks.
   - The sum total of all threads in a kernel is called a grid.

- Toy example:
  - CPU says: "Hey, GPU. Sum pairs of adjacent numbers. Use the array (1, 2, 3, 4, 5, 6, 7, 8)."
  - GPU thinks: "Sum pairs of adjacent numbers" is a kernel.
  - The GPU spawns 2 blocks, each with 2 threads:

        Block     0           1
        Thread    0     1     0     1
        Action    1+2   3+4   5+6   7+8

  - I could have also used 1 block with 4 threads and given the threads different pairs of numbers.

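The toy example above can be sketched in CUDA C roughly as follows. This is an illustrative sketch rather than code from the lecture: the kernel name `sumAdjacentPairs` and the host helper `sumPairsOnGpu` are made up for this example, and the memory calls (`cudaMalloc`, `cudaMemcpy`, `cudaFree`) are the ones that simple.cu introduces later in these slides.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread sums one pair of adjacent numbers.
__global__ void sumAdjacentPairs(const int *in, int *out) {
    // Position of this thread in the whole grid (0..3 for 2 blocks of 2 threads).
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    out[t] = in[2 * t] + in[2 * t + 1];
}

// Hypothetical host helper: sums the 4 adjacent pairs of an 8-element array.
void sumPairsOnGpu(const int *in_h, int *out_h) {
    int *in_d, *out_d;
    cudaMalloc((void **) &in_d, 8 * sizeof(int));
    cudaMalloc((void **) &out_d, 4 * sizeof(int));
    cudaMemcpy(in_d, in_h, 8 * sizeof(int), cudaMemcpyHostToDevice);

    sumAdjacentPairs<<<2, 2>>>(in_d, out_d);   // 2 blocks, 2 threads per block

    cudaMemcpy(out_h, out_d, 4 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(in_d);
    cudaFree(out_d);
}

int main() {
    int in[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int out[4] = {0, 0, 0, 0};
    int i;

    sumPairsOnGpu(in, out);
    for (i = 0; i < 4; i++)
        printf("%d + %d = %d\n", in[2 * i], in[2 * i + 1], out[i]);
    return 0;
}
```

Because the kernel works from the grid-wide index `t`, launching it as `sumAdjacentPairs<<<1, 4>>>` (1 block with 4 threads) should give the same four sums.
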
CUDA: making a gaming toy do science

- CUDA: Compute Unified Device Architecture.
- Before CUDA, programmers could only do GPU programming in graphics languages, which are appropriate for video games but clumsy for science.
- CUDA devices support CUDA C, an extension of C for programs that use GPUs.

Beginning CUDA C

Hello world

- A standard C program:

    #include <stdio.h>

    int main() {
        printf("Hello, World!\n");
        return 0;
    }

- A beginner CUDA C program:

    #include <stdio.h>

    __global__ void myKernel() {
    }

    int main() {
        myKernel<<<1, 1>>>();
        printf("Hello, World!\n");
        return 0;
    }

Hello world

    #include <stdio.h>

    __global__ void myKernel() {
    }

    int main() {
        myKernel<<<2, 4>>>();   // 2 blocks, 4 threads per block
        printf("Hello, World!\n");
        return 0;
    }

Prefixes in CUDA C

- __host__
  - Runs once per call on the CPU.
  - Only callable from the CPU (i.e., from another host function).
  - All functions without explicit prefixes are host functions.
- __global__
  - Used to specify a kernel.

Prefix example: 2 blocks and 5 threads per block

    #include <stdio.h>

    // Device functions run on the GPU and are callable only from GPU code.
    __device__ int dev1() {
        return 0;
    }

    __device__ int dev2() {
        return 0;
    }

    // 2 blocks x 5 threads per block = 10 executions of the kernel body.
    __global__ void pleaseRunThis10Times() {
        dev1();
        dev2();
    }

    int main() {
        pleaseRunThis10Times<<<2, 5>>>();
        printf("Hello, World!\n");
        return 0;
    }

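One way to check that a `<<<2, 5>>>` launch really runs the kernel body 10 times is to have every thread add 1 to a counter. The sketch below is an illustration, not part of the lecture code: the names `one`, `countThreads`, and `countKernelRuns` are hypothetical, and it assumes a device that supports the standard `atomicAdd` on `int`, which makes the simultaneous updates safe.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Device helper (hypothetical): callable only from GPU code.
__device__ int one() {
    return 1;
}

// Kernel: every thread adds one() to the same counter.
// atomicAdd keeps simultaneous updates from overwriting each other.
__global__ void countThreads(int *counter) {
    atomicAdd(counter, one());
}

// Host helper (hypothetical): launch the kernel and return the tally.
int countKernelRuns(int blocks, int threadsPerBlock) {
    int count = 0, *count_d;

    cudaMalloc((void **) &count_d, sizeof(int));
    cudaMemcpy(count_d, &count, sizeof(int), cudaMemcpyHostToDevice);

    countThreads<<<blocks, threadsPerBlock>>>(count_d);

    cudaMemcpy(&count, count_d, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(count_d);
    return count;
}

int main() {
    printf("kernel body ran %d times\n", countKernelRuns(2, 5));
    return 0;
}
```
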
skeleton.cu: outlining a CUDA C workflow

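As a rough illustration of what a typical CUDA C workflow looks like, here is a minimal sketch. It is not the author's skeleton.cu; the kernel `doubleElements`, the helper `doubleOnGpu`, and the array length are placeholders chosen for this example. The numbered comments mark the usual steps: allocate on the host, allocate on the device, copy in, launch, copy out, and free.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Placeholder kernel (hypothetical): each thread doubles one element.
__global__ void doubleElements(float *x_d, int n) {
    int i = threadIdx.x;   // one block, so the thread ID is the array index
    if (i < n)
        x_d[i] = 2.0f * x_d[i];
}

// Hypothetical helper wrapping the device-side steps of the workflow.
void doubleOnGpu(float *x_h, int n) {
    float *x_d;
    size_t bytes = n * sizeof(float);

    cudaMalloc((void **) &x_d, bytes);                     // 2. allocate device memory
    cudaMemcpy(x_d, x_h, bytes, cudaMemcpyHostToDevice);   // 3. copy host -> device
    doubleElements<<<1, n>>>(x_d, n);                      // 4. launch the kernel
    cudaMemcpy(x_h, x_d, bytes, cudaMemcpyDeviceToHost);   // 5. copy device -> host
    cudaFree(x_d);                                         // 6. free device memory
}

int main() {
    int n = 4, i;
    float *x_h = (float *) malloc(n * sizeof(float));      // 1. allocate and fill host memory
    for (i = 0; i < n; i++)
        x_h[i] = (float) i;

    doubleOnGpu(x_h, n);

    for (i = 0; i < n; i++)
        printf("x[%d] = %.1f\n", i, x_h[i]);

    free(x_h);                                             // 7. free host memory
    return 0;
}
```
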
simple.cu: a program that actually does something

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda.h>
    #include <cuda_runtime.h>

    // Kernel: runs on the GPU and writes 2 to the device integer a_d points to.
    __global__ void colonel(int *a_d) {
        *a_d = 2;
    }

    int main() {
        int a = 0, *a_d;

        // Allocate one int on the device and copy a into it.
        cudaMalloc((void **) &a_d, sizeof(int));
        cudaMemcpy(a_d, &a, sizeof(int), cudaMemcpyHostToDevice);

        // Launch the kernel with 1 block of 1 thread.
        colonel<<<1, 1>>>(a_d);

        // Copy the result back to the host.
        cudaMemcpy(&a, a_d, sizeof(int), cudaMemcpyDeviceToHost);

        printf("a = %d\n", a);
        cudaFree(a_d);
    }

Compiling and running simple.cu

    > nvcc simple.cu -o simple
    > ./simple
    a = 2

- Notes:
  - nvcc is the NVIDIA CUDA C compiler.
  - CUDA C source files usually have the *.cu extension.

Builtin CUDA C variables

- For a kernel call with B blocks and T threads per block:
  - blockIdx.x
    - ID of the current block (in the x direction).
    - Integer from 0 to B - 1 inclusive.
  - threadIdx.x
    - Within the current block, ID of the current thread (in the x direction).
    - Integer from 0 to T - 1 inclusive.
  - gridDim.x: number of blocks in the current grid (in the x direction).
  - blockDim.x: number of threads per block (in the x direction).
- With some modifications that I will describe in later lectures, you can use the y and z directions with variables like threadIdx.y, threadIdx.z, etc.

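To see these variables in action, the following sketch has every thread record its own `blockIdx.x` and `threadIdx.x`, along with the grid-wide index `blockIdx.x * blockDim.x + threadIdx.x`. The kernel `recordIds`, the helper `collectIds`, and the choice B = 2, T = 4 are assumptions made for illustration only.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical kernel: thread i of the grid stores its block ID, its thread ID
// within the block, and its grid-wide index in row i of a 3-column table.
__global__ void recordIds(int *ids) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // grid-wide index
    ids[3 * i]     = blockIdx.x;
    ids[3 * i + 1] = threadIdx.x;
    ids[3 * i + 2] = i;
}

// Hypothetical host helper: launch B blocks of T threads and copy the table back.
void collectIds(int B, int T, int *ids_h) {
    int *ids_d;
    size_t bytes = 3 * B * T * sizeof(int);

    cudaMalloc((void **) &ids_d, bytes);
    recordIds<<<B, T>>>(ids_d);
    cudaMemcpy(ids_h, ids_d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(ids_d);
}

int main() {
    int B = 2, T = 4, i;
    int *ids = (int *) malloc(3 * B * T * sizeof(int));

    collectIds(B, T, ids);

    printf("blockIdx.x  threadIdx.x  grid-wide index\n");
    for (i = 0; i < B * T; i++)
        printf("%10d  %11d  %15d\n", ids[3 * i], ids[3 * i + 1], ids[3 * i + 2]);

    free(ids);
    return 0;
}
```

With B = 2 and T = 4, the grid-wide index runs from 0 to 7, and thread 1 of block 1 has index 1 * 4 + 1 = 5.
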
Vector addition: vectorsums.cu

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda.h>
    #include <cuda_runtime.h>

    #define N 10

    __global__ void add(int *a, int *b, int *c) {
        int bid = blockIdx.x;
        if (bid < N)
            c[bid] = a[bid] + b[bid];
    }

    int main(void) {
        int i, a[N], b[N], c[N];
        int *dev_a, *dev_b, *dev_c;

        cudaMalloc((void **) &dev_a, N * sizeof(int));
        cudaMalloc((void **) &dev_b, N * sizeof(int));
        cudaMalloc((void **) &dev_c, N * sizeof(int));

        for (i = 0; i < N; i++) {
            a[i] = -i;
            b[i] = i * i;
        }

        cudaMemcpy(dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice);
        cudaMemcpy(dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice);

        add<<<N, 1>>>(dev_a, dev_b, dev_c);

        cudaMemcpy(c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost);

        printf("\na + b = c\n");
        for (i = 0; i < N; i++) {
            printf("%5d + %5d = %5d\n", a[i], b[i], c[i]);
        }

        cudaFree(dev_a);
        cudaFree(dev_b);
        cudaFree(dev_c);
    }

Compiling and running vectorsums.cu

    > nvcc vectorsums.cu -o vectorsums
    > ./vectorsums
    a + b = c
        0 +     0 =     0
       -1 +     1 =     0
       -2 +     4 =     2
       -3 +     9 =     6
       -4 +    16 =    12
       -5 +    25 =    20
       -6 +    36 =    30
       -7 +    49 =    42
       -8 +    64 =    56
       -9 +    81 =    72

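vectorsums.cu uses N blocks of 1 thread each. A common variation, sketched here under stated assumptions, puts several threads in each block and indexes the vectors with `blockIdx.x * blockDim.x + threadIdx.x`. The names `addMany` and `vectorAdd` and the value of `THREADS_PER_BLOCK` are hypothetical. The bounds check matters because the block count is rounded up, so the last block may contain threads with no element to process.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 10
#define THREADS_PER_BLOCK 4   // illustrative choice, not from the lecture

// Hypothetical kernel: one thread per vector element.
__global__ void addMany(const int *a, const int *b, int *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)   // spare threads in the last block do nothing
        c[i] = a[i] + b[i];
}

// Hypothetical host helper: c = a + b for vectors of length n.
void vectorAdd(const int *a, const int *b, int *c, int n) {
    int *a_d, *b_d, *c_d;
    size_t bytes = n * sizeof(int);
    int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;   // round up

    cudaMalloc((void **) &a_d, bytes);
    cudaMalloc((void **) &b_d, bytes);
    cudaMalloc((void **) &c_d, bytes);
    cudaMemcpy(a_d, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(b_d, b, bytes, cudaMemcpyHostToDevice);

    addMany<<<blocks, THREADS_PER_BLOCK>>>(a_d, b_d, c_d, n);

    cudaMemcpy(c, c_d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(a_d);
    cudaFree(b_d);
    cudaFree(c_d);
}

int main(void) {
    int i, a[N], b[N], c[N];

    for (i = 0; i < N; i++) {
        a[i] = -i;
        b[i] = i * i;
    }

    vectorAdd(a, b, c, N);

    for (i = 0; i < N; i++)
        printf("%5d + %5d = %5d\n", a[i], b[i], c[i]);
    return 0;
}
```

With N = 10 and 4 threads per block, the launch uses 3 blocks (12 threads), and the 2 spare threads fail the `i < n` test.
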
Synchronizing threads within blocks: the pairwise sum revisited

    Start:                                       5   2  -3   1   1   8   2   6
    Threads 0-3 (5+1, 2+8, -3+2, 1+6):           6  10  -1   7
    Synchronize threads
    Threads 0-1 (6+(-1), 10+7):                  5  17
    Synchronize threads
    Thread 0 (5+17):                            22

Pairwise sum in pseudocode

Let n be the length of the vector v (a power of 2).

1. Set offset = n/2.
2. Each thread i with i < offset computes v[i] = v[i] + v[i + offset].
3. Synchronize threads.
4. Integer divide offset by 2.
5. Return to step 2 if offset > 0.

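The pseudocode can be sketched as a kernel along the following lines. Treat this as one possible implementation under stated assumptions, not necessarily the kernel in pairwise_sum.cu: the helper `pairwiseSum` is hypothetical, the vector length must be a power of 2, and the launch uses a single block of n/2 threads so that `__syncthreads()` covers every thread taking part in the sum.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 8   // length of the vector; must be a power of 2

// Pairwise (cascading) sum within one block; the result ends up in v[0].
__global__ void psum(float *v) {
    int t = threadIdx.x;        // thread ID within the block
    int offset = blockDim.x;    // launched with N/2 threads, so offset starts at N/2

    while (offset > 0) {
        if (t < offset)
            v[t] += v[t + offset];   // step 2: add the element offset places away
        __syncthreads();             // step 3: wait until every thread has finished
        offset /= 2;                 // step 4: integer divide offset by 2
    }
}

// Hypothetical host helper: returns the sum of the n elements of v_h.
float pairwiseSum(const float *v_h, int n) {
    float *v_d, sum = 0.0f;

    cudaMalloc((void **) &v_d, n * sizeof(float));
    cudaMemcpy(v_d, v_h, n * sizeof(float), cudaMemcpyHostToDevice);

    psum<<<1, n / 2>>>(v_d);   // 1 block, n/2 threads

    cudaMemcpy(&sum, v_d, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(v_d);
    return sum;
}

int main() {
    float v[N] = {5, 2, -3, 1, 1, 8, 2, 6};   // the vector from the worked example
    printf("Pairwise sum = %7.3f\n", pairwiseSum(v, N));
    return 0;
}
```

The loop condition depends only on `offset`, which is the same for every thread, so all threads in the block reach `__syncthreads()` the same number of times. On the example vector the steps are 6 10 -1 7, then 5 17, then 22.
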
pairwise_sum.cu

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <cuda.h>
    #include <cuda_runtime.h>

    /*
     * This program computes the sum of the elements of
     * vector v using the pairwise (cascading) sum algorithm.
     */

    #define N 8 // length of vector v. MUST BE A POWER OF 2!!!

    // Fill the vector v with n random floating point numbers.
    void vfill(float *v, int n) {
        int i;
        for (i = 0; i < n; i++) {
            v[i] = (float) rand() / RAND_MAX;
        }
    }

    // Print the vector v.
    void vprint(float *v, int n) {
        int i;
        printf("v = \n");
        for (i = 0; i < n; i++) {
            printf("%7.3f\n", v[i]);
        }
        printf("\n");
    }

pairwise_sum.cu (continued)

        // Write the contents of v_h to v_d
        cudaMemcpy(v_d, v_h, N * sizeof(float), cudaMemcpyHostToDevice);

        // Compute the pairwise sum of the elements of v_d and store the result in v_d[0].
        psum<<<1, N/2>>>(v_d);

        // Write the pairwise sum, v_d[0], to v_h[0].
        cudaMemcpy(v_h, v_d, sizeof(float), cudaMemcpyDeviceToHost);

        // Print the pairwise sum.
        printf("Pairwise sum = %7.3f\n", v_h[0]);

        // Free dynamically-allocated host memory
        free(v_h);

        // Free dynamically-allocated device memory
        cudaFree(v_d);
    }

Compiling and running pairwise_sum.cu

    > nvcc pairwise_sum.cu -o pairwise_sum
    > ./pairwise_sum
    v =
      0.840
      0.394
      0.783
      0.798
      0.912
      0.198
      0.335
      0.768

    Pairwise sum =   5.029

Best practices: respect the SIMD paradigm

- SIMD: "Single Instruction, Multiple Data"
- Under this paradigm, the threads in a kernel call write to different memory spaces.
- When threads write to the same memory (SISD), problems can arise.

sisd.cu: violating the SIMD paradigm

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda.h>
    #include <cuda_runtime.h>

    __global__ void colonel(int *a_d) {
        *a_d = blockDim.x * blockIdx.x + threadIdx.x;
    }

    int main() {

        int a = 0, *a_d;

        cudaMalloc((void **) &a_d, sizeof(int));
        cudaMemcpy(a_d, &a, sizeof(int), cudaMemcpyHostToDevice);

        colonel<<<4, 5>>>(a_d);

        cudaMemcpy(&a, a_d, sizeof(int), cudaMemcpyDeviceToHost);

        printf("a = %d\n", a);
        cudaFree(a_d);

    }

sisd.cu: violating the SIMD paradigm

    > nvcc sisd.cu -o sisd
    > ./sisd
    a = 14

- The output is unpredictable because the threads modify the same variable in an unpredictable order.

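One way to bring sisd.cu back in line with the SIMD paradigm is to give each thread its own memory location to write to. The sketch below is a hypothetical variation, not code from the lecture: the kernel `writeOwnSlot` and the helper `fillThreadIds` are made-up names, and the 4 x 5 launch configuration is carried over from sisd.cu.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread writes its grid-wide ID to its own array slot,
// so no two threads ever touch the same memory location.
__global__ void writeOwnSlot(int *a_d) {
    int id = blockDim.x * blockIdx.x + threadIdx.x;
    a_d[id] = id;
}

// Hypothetical host helper: a_h must have room for blocks * threadsPerBlock ints.
void fillThreadIds(int *a_h, int blocks, int threadsPerBlock) {
    int *a_d;
    size_t bytes = blocks * threadsPerBlock * sizeof(int);

    cudaMalloc((void **) &a_d, bytes);
    writeOwnSlot<<<blocks, threadsPerBlock>>>(a_d);
    cudaMemcpy(a_h, a_d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(a_d);
}

int main() {
    int a[20], i;

    fillThreadIds(a, 4, 5);   // same 4 blocks of 5 threads as sisd.cu

    for (i = 0; i < 20; i++)
        printf("a[%d] = %d\n", i, a[i]);
    return 0;
}
```

Since every element is written by exactly one thread, the result no longer depends on the order in which the threads run.
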
Resources

- Texts:
- Code:
  - skeleton.cu
  - simple.cu
  - vectorsums.cu
  - pairwise_sum.cu

That’s all for today.

- Series materials are available at http://will-landau.com/gpu.