– Using the CUBLAS/CUFFT libraries: simple codes that show how to call the functions the libraries provide
• Profiling techniques
– ‘asyncAPI’/‘simpleStream’: overlapping execution on the CPU and GPU
– ‘bandwidthTest’: measuring the memory-copy (memcpy) bandwidth of the GPU
– ‘clock’: using the clock() function to measure kernel performance accurately
– ‘deviceQuery’: how to query the properties of the CUDA devices present in the system
• Batch-processing
– ‘fastWalshTransform’: Hadamard-ordered Fast Walsh Transform for batched vectors of arbitrary eligible (power-of-two) length
• Scan/reduction
– ‘reduction’: a fundamental operation underlying many parallel algorithms
– ‘scan’: an efficient implementation of the parallel prefix sum algorithm for large arrays
• Data compression
– ‘dxtc’: high-quality DXT compression using CUDA
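
To make the ‘reduction’ and ‘scan’ entries above more concrete, here is a minimal host-side sketch in C++ of the two operations. It is not the code of the SDK samples. It is a sequential reference, of the kind one might use to check GPU output, arranged in passes so that the additions within each pass are independent and could be spread across GPU threads. The function names `tree_reduce` and `exclusive_scan_blelloch` are hypothetical, and the scan uses the well-known up-sweep/down-sweep (Blelloch) formulation, which may differ from what the ‘scan’ sample actually implements.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sum-reduction organized as a binary tree (hypothetical helper name).
// The stride doubles each pass; all additions within one pass touch
// disjoint elements, so a GPU could run them in parallel.
int tree_reduce(std::vector<int> data) {
    if (data.empty()) return 0;
    for (std::size_t stride = 1; stride < data.size(); stride *= 2) {
        for (std::size_t i = 0; i + stride < data.size(); i += 2 * stride) {
            data[i] += data[i + stride];
        }
    }
    return data[0];
}

// Exclusive prefix sum: out[k] = in[0] + ... + in[k-1], with out[0] = 0.
// Up-sweep/down-sweep (Blelloch) formulation, run sequentially here.
// Assumption: the input is zero-padded to the next power of two.
std::vector<int> exclusive_scan_blelloch(const std::vector<int>& in) {
    if (in.empty()) return {};

    std::size_t n = 1;
    while (n < in.size()) n *= 2;      // padded length, a power of two

    std::vector<int> t(n, 0);
    std::copy(in.begin(), in.end(), t.begin());

    // Up-sweep: build partial sums in place, like the reduction tree.
    for (std::size_t d = 1; d < n; d *= 2) {
        for (std::size_t i = 2 * d - 1; i < n; i += 2 * d) {
            t[i] += t[i - d];
        }
    }

    // Down-sweep: clear the root, then push prefix sums back down.
    t[n - 1] = 0;
    for (std::size_t d = n / 2; d >= 1; d /= 2) {
        for (std::size_t i = 2 * d - 1; i < n; i += 2 * d) {
            int left = t[i - d];
            t[i - d] = t[i];           // left child receives parent's prefix
            t[i] += left;              // right child adds the left subtree sum
        }
    }

    t.resize(in.size());               // drop the padding
    return t;
}
```

For the input {3, 1, 7, 0, 4, 1, 6, 3}, `tree_reduce` returns 25 and `exclusive_scan_blelloch` returns {0, 3, 4, 11, 11, 15, 16, 22}. Both functions perform O(n) additions in O(log n) passes, which is what makes this formulation attractive for large arrays on a GPU.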