Daniel Weingaertner
Informatics Department, Federal University of Paraná, Brazil
FH-Regensburg
Summary
1. Introduction
2. Insight Toolkit (ITK)
3. GPGPU and CUDA
4. Integrating CUDA and ITK
5. Canny Edge Detection
6. Experimental Results
7. Conclusion
Paraná, Brazil
Brazil Europe
Paraná
Curitiba
Informatics Department
Undergraduate:
- Bachelor in Computer Science: 8-semester course, 80 incoming students per year
- Bachelor in Biomedical Informatics: 8-semester course, 30 incoming students per year
Graduate: Master and PhD in Computer Science
- Algorithms, Image Processing, Computer Vision, Artificial Intelligence
- Databases, Scientific Computing and Open Source Software, Computer-Human Interface
- Computer Networks, Embedded Systems
// ReaderType, CannyFilter and WriterType are assumed to be typedefs for
// ITK's image file reader, Canny edge detection filter and image file
// writer; their definitions are not part of this extract.
int main(int argc, char** argv)
{
  // Read the input image named by the first argument.
  ReaderType::Pointer reader = ReaderType::New();
  reader->SetFileName(argv[1]);
  reader->Update();

  // Configure the Canny filter: variance, upper and lower thresholds.
  CannyFilter::Pointer canny = CannyFilter::New();
  canny->SetInput(reader->GetOutput());
  canny->SetVariance(atof(argv[3]));
  canny->SetUpperThreshold(atoi(argv[4]));
  canny->SetLowerThreshold(atoi(argv[5]));
  canny->Update();

  // Write the edge image to the file named by the second argument.
  WriterType::Pointer writer = WriterType::New();
  writer->SetFileName(argv[2]);
  writer->SetInput(canny->GetOutput());
  writer->Update();

  return EXIT_SUCCESS;
}
What is CUDA?
- CUDA = Compute Unified Device Architecture.
- General-purpose parallel computing architecture.
- Provides libraries, a C language extension and a hardware driver.
Each multiprocessor creates, manages, schedules and executes threads in groups of 32 (warps). All threads in a warp start at the same point, but they are free to branch to different code positions independently.
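The grouping of threads into warps can be illustrated with a little index arithmetic. The following is a minimal C++ sketch, assuming a one-dimensional thread block in which consecutive thread indices are packed into consecutive warps; the helper names (`warp_index`, `lane_index`, `warps_per_block`) are hypothetical and not part of the CUDA API.

```cpp
#include <cstddef>

// Warp size as stated on the slide: threads are executed in groups of 32.
constexpr std::size_t kWarpSize = 32;

// Index of the warp that a thread belongs to within its block
// (assumes a 1-D block with consecutively numbered threads).
constexpr std::size_t warp_index(std::size_t thread_in_block) {
    return thread_in_block / kWarpSize;
}

// Position of the thread inside its warp.
constexpr std::size_t lane_index(std::size_t thread_in_block) {
    return thread_in_block % kWarpSize;
}

// Number of warps needed for a block. A partially filled last warp
// still counts as a whole warp.
constexpr std::size_t warps_per_block(std::size_t threads_per_block) {
    return (threads_per_block + kWarpSize - 1) / kWarpSize;
}
```

Under these assumptions a block of 100 threads occupies 4 warps, the last of which has only 4 active threads, which is one reason block sizes are usually chosen as multiples of 32.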
Main optimization strategies for CUDA involve:
- Optimized/careful memory access
- Maximization of processor utilization
- Maximization of non-serialized instructions
ITK community suggests:
- Re-implement filters where parallelizing provides significant speedup
- Consider the entire workflow: copying to/from the GPU is very time consuming
- Careful! "Premature optimization is the root of all evil!" (Donald Knuth)
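The point about considering the entire workflow can be made concrete with simple arithmetic: the speedup a user sees is the CPU time divided by the sum of kernel time and copy times. The sketch below is illustrative only; the function name `effective_speedup` and any timings passed to it are hypothetical, not measurements from this talk.

```cpp
// Speedup of a GPU filter over its CPU version when the host-to-device
// and device-to-host copies are counted as part of the GPU run.
// All arguments are times in the same unit (for example milliseconds).
double effective_speedup(double cpu_time, double kernel_time,
                         double upload_time, double download_time) {
    return cpu_time / (kernel_time + upload_time + download_time);
}
```

With made-up numbers, a filter that takes 100 ms on the CPU and 5 ms as a kernel looks 20 times faster, but with 10 ms for each copy the end-to-end gain drops to 4 times. Chaining several GPU filters without copying back in between amortizes that cost.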
CudaCanny
itkCudaCannyEdgeDetectionImageFilter

Algorithm 1: Canny Edge Detection Filter
  Gaussian Smoothing
  Gradient Computation
  Non-Maximum Suppression
  Hysteresis
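As an illustration of the first step, Gaussian smoothing is typically applied as a separable convolution with a sampled 1-D Gaussian. The following is a minimal sketch of building such a kernel from the variance parameter; it is not ITK's or the author's implementation, and the cut-off radius of three standard deviations is an assumption of this example.

```cpp
#include <cmath>
#include <vector>

// Builds a normalized 1-D Gaussian kernel for a given variance (> 0).
// The radius of 3 standard deviations is an assumption of this sketch.
std::vector<double> gaussian_kernel(double variance) {
    const double sigma = std::sqrt(variance);
    const int radius = static_cast<int>(std::ceil(3.0 * sigma));
    std::vector<double> kernel(2 * radius + 1);
    double sum = 0.0;
    for (int i = -radius; i <= radius; ++i) {
        kernel[i + radius] = std::exp(-(i * i) / (2.0 * variance));
        sum += kernel[i + radius];
    }
    for (double& w : kernel) {
        w /= sum;  // normalize so the weights add up to 1
    }
    return kernel;
}
```

Because the 2-D Gaussian is separable, the same kernel can be applied along rows and then along columns, which maps well onto one GPU thread per output pixel.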
Figure: (a) Sobel X, (b) Sobel Y

\[ L_v = \sqrt{L_x^2 + L_y^2} \tag{1} \]
\[ \theta = \arctan\left(\frac{L_y}{L_x}\right) \tag{2} \]
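The magnitude and direction formulas (1) and (2) can be sketched for a single pixel as follows. This is a minimal CPU-side illustration, assuming a row-major image of doubles and the standard 3x3 Sobel masks; the names `Gradient` and `sobel_at` are hypothetical. It uses `std::atan2` rather than a plain arctangent of the quotient so that a zero horizontal derivative does not cause a division by zero.

```cpp
#include <cmath>
#include <vector>

struct Gradient {
    double magnitude;  // equation (1)
    double direction;  // equation (2), in radians
};

// Sobel gradient at an interior pixel (x, y) of a row-major image.
// The caller must ensure that (x, y) is not on the image border.
Gradient sobel_at(const std::vector<double>& img, int width, int x, int y) {
    auto p = [&](int dx, int dy) { return img[(y + dy) * width + (x + dx)]; };
    // Horizontal derivative: right column minus left column.
    const double lx = (p(1, -1) + 2 * p(1, 0) + p(1, 1))
                    - (p(-1, -1) + 2 * p(-1, 0) + p(-1, 1));
    // Vertical derivative: bottom row minus top row.
    const double ly = (p(-1, 1) + 2 * p(0, 1) + p(1, 1))
                    - (p(-1, -1) + 2 * p(0, -1) + p(1, -1));
    return {std::sqrt(lx * lx + ly * ly), std::atan2(ly, lx)};
}
```

Each pixel is computed independently of all others, so on a GPU this step can be assigned one thread per pixel.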
Hysteresis Operation
Hysteresis Algorithm

Algorithm 2: Hysteresis on CPU
  Transfer the gradient/NMS images to the GPU
  repeat
    Run the hysteresis kernel on the GPU
  until no pixel changes status
  return edge image
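The host-side loop can be mimicked on the CPU to show its structure. The sketch below is not the author's code: it assumes each pixel has already been classified as a definite edge, a candidate (between the two thresholds) or a non-edge, and `hysteresis_pass` stands in for one kernel launch. All names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

enum : std::uint8_t { kNone = 0, kCandidate = 1, kEdge = 2 };

// One pass over the image: a candidate becomes an edge when any of its
// 8 neighbours is already an edge. Returns true if any pixel changed.
bool hysteresis_pass(std::vector<std::uint8_t>& status, int width, int height) {
    bool changed = false;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (status[y * width + x] != kCandidate) continue;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx, ny = y + dy;
                    if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
                    if (status[ny * width + nx] == kEdge) {
                        status[y * width + x] = kEdge;
                        changed = true;
                    }
                }
            }
        }
    }
    return changed;
}

// Host loop: repeat passes until no pixel changes status, then discard
// the candidates that were never connected to an edge.
void run_hysteresis(std::vector<std::uint8_t>& status, int width, int height) {
    while (hysteresis_pass(status, width, height)) {
    }
    for (auto& s : status) {
        if (s == kCandidate) s = kNone;
    }
}
```

The number of passes depends on how far edges have to propagate, which is why the loop is controlled by a "changed" flag rather than a fixed iteration count.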
Hysteresis Algorithm

Algorithm 3: Hysteresis on GPU
  Load an image region of size 18x18 into shared memory
  modified ← false
  repeat
    modified_region ← false
    Synchronize threads of the same multiprocessor
    if pixel changes status then
      modified ← true
      modified_region ← true
    end if
    Synchronize threads of the same multiprocessor
  until modified_region = false
  if modified = true then
    Update modified status on host
  end if
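The per-block logic of Algorithm 3 can be simulated sequentially. The sketch below assumes the 18x18 region consists of a 16x16 interior plus a one-pixel halo (the slide gives only the 18x18 size), uses a local array in place of shared memory, and replaces the thread synchronization with an ordinary loop. The name `process_tile` and the pixel status values are hypothetical.

```cpp
#include <cstdint>
#include <vector>

constexpr int kTile = 16;           // interior pixels per side (assumed)
constexpr int kRegion = kTile + 2;  // 18: interior plus 1-pixel halo
enum : std::uint8_t { kNone = 0, kCandidate = 1, kEdge = 2 };

// Processes the tile whose interior starts at (x0, y0), with x0, y0 >= 0.
// Returns true if any interior pixel became an edge ("modified").
bool process_tile(std::vector<std::uint8_t>& img, int width, int height,
                  int x0, int y0) {
    std::uint8_t region[kRegion][kRegion];  // stands in for shared memory
    for (int ry = 0; ry < kRegion; ++ry) {
        for (int rx = 0; rx < kRegion; ++rx) {
            const int x = x0 + rx - 1, y = y0 + ry - 1;
            const bool inside = x >= 0 && x < width && y >= 0 && y < height;
            region[ry][rx] = inside ? img[y * width + x] : kNone;
        }
    }
    bool modified = false;
    bool modified_region = false;
    do {  // iterate inside the tile until nothing changes any more
        modified_region = false;
        for (int ry = 1; ry <= kTile; ++ry) {
            for (int rx = 1; rx <= kTile; ++rx) {
                if (region[ry][rx] != kCandidate) continue;
                bool near_edge = false;
                for (int dy = -1; dy <= 1; ++dy)
                    for (int dx = -1; dx <= 1; ++dx)
                        near_edge = near_edge || region[ry + dy][rx + dx] == kEdge;
                if (near_edge) {
                    region[ry][rx] = kEdge;
                    modified = modified_region = true;
                }
            }
        }
    } while (modified_region);
    // Write the interior back; halo pixels belong to neighbouring tiles.
    for (int ry = 1; ry <= kTile; ++ry) {
        for (int rx = 1; rx <= kTile; ++rx) {
            const int x = x0 + rx - 1, y = y0 + ry - 1;
            if (x < width && y < height) img[y * width + x] = region[ry][rx];
        }
    }
    return modified;
}
```

Iterating inside the tile lets an edge travel across the whole block in a single launch; the halo only carries information one pixel across a tile boundary, so the host still has to relaunch until no tile reports a modification.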
Methodology
Hardware:
Server:
- CPU: 4x AMD Opteron(tm) Processor 6136, 2.4 GHz, with 8 cores, each with 512 KB cache, and 126 GB RAM
- GPU1: NVidia Tesla C2050 with 448 cores at 1.15 GHz and 3 GB RAM
- GPU2: NVidia Tesla C1060 with 240 cores at 1.3 GHz and 4 GB RAM
Desktop:
- CPU: Intel(R) Core(TM)2 Duo E7400, 2.80 GHz, with 3072 KB cache and 2 GB RAM
- GPU: NVidia GeForce 8800 GT with 112 cores at 1.5 GHz and 512 MB RAM
Methodology
Images from the Berkeley Segmentation Dataset

Base   Image resolution             Num. of images
B1     321x481 and 481x321          100
B2     642x962 and 962x642          100
B3     1284x1924 and 1924x1284      100
B4     2568x3848 and 3848x2568      100
Performance Tests
Conclusion
Parallel Programming
- Parallel programming is definitely the way to go.
- Implementing efficient parallel code is demanding.
- Programmers should know more details about the hardware, especially the memory architecture.
Canny Filter with CUDA
- We obtained a great speedup on the edge detection filter.
- We also noticed that the existing implementation is not efficient.
- There is still a LOT of work to do if we want to parallelize ITK.
Contact
Thank You!