
Wolfram White Paper

CUDA Programming within Mathematica

Introduction
CUDA is short for Compute Unified Device Architecture. NVIDIA developed it to program graphics cards, letting you run C-like code on the graphics processing unit (GPU). CUDA promises significant performance increases, and CUDA-designed programs will scale to multicore systems. Considering CUDA's advanced technology, you might expect its programming to be enormously complicated. Enter Mathematica, the easiest way to program for CUDA and unlock GPU performance potential. Unlike programming in C or developing CUDA wrapper code, you don't have to be a programming wizard to use CUDA. Mathematica offers an intuitive environment (even featuring built-in, ready-to-use examples for common application areas such as image processing, medical imaging, statistics, and finance) that makes CUDA programming a breeze, even if you've never used Mathematica before. And if you have used Mathematica, you'll be amazed by the massive boost in computational power, as well as application performance, with speedups by factors easily exceeding 100. In this document, we describe the benefits of CUDA integration in Mathematica and present some applications for which it is well suited, such as image processing and financial engineering.

Motivations for CUDA


CUDA is different from regular C code, which runs on the central processing unit (CPU). In CUDA terminology, the device is the GPU, while the host is the CPU. There are two motivations for using CUDA. The first is the promise of significant speed; the other is the promise that a CUDA-designed program will scale to multicore systems. With CPU manufacturers indicating that hundred- and thousand-core systems will reach user desktops, the CUDA programming paradigm, which scales well to that many cores, becomes more important. GPUs already have hundreds of cores, and correct formulation of the problem and of memory usage can give you a significant performance boost.


A Brief Introduction to Mathematica


Mathematica has been one of the most powerful languages for technical computing for more than 20 years. Wolfram Research now introduces fully integrated GPU programming capabilities in Mathematica, bringing a whole new meaning to high-performance computing. For CUDA developers, the new integration means unlimited access to Mathematica's vast computing abilities. Mathematica is a sophisticated development environment that combines a flexible programming language with a wide range of symbolic and numeric computational capabilities, production of high-quality visualizations, built-in application area packages, and a range of immediate deployment options. With direct integration of dynamic libraries, instant interface construction, and automatic C code generation and linking, Mathematica provides the most sophisticated build-to-deploy environment on the market. Some highlights of Mathematica's features include:

Full-featured, unified development environment


Through its unique interface and integrated features for computation, development, and deployment, Mathematica provides a streamlined workflow.

Unified data representation


At the core of Mathematica is the foundational idea that everything (data, programs, formulas, graphics, documents) can be represented as symbolic entities, called expressions. This unified representation makes Mathematica's language and functions extremely flexible, streamlined, and consistent.

Multiparadigm programming language


Mathematica provides its own highly declarative functional language, as well as several other programming paradigms, such as procedural and rule-based programming. Programmers can choose their own style of writing code with minimal effort. Along with comprehensive documentation and resources, Mathematica's flexibility greatly reduces the cost of entry for new users.

Unique symbolic-numeric hybrid system


The fundamental principle of Mathematica is full integration of symbolic and numeric computing capabilities. Through its full automation and preprocessing mechanisms, users can enjoy the full power of a hybrid computing system without knowledge of the specific methodologies and algorithms.

Extensive scientific and technical area coverage


Mathematica provides thousands of built-in functions and packages that cover a broad range of scientific and technical computing areas, such as statistics, control systems, data visualization, and image processing. All functions are carefully designed and tightly integrated with the system, which enables users to break the barriers between specialized areas and explore new possibilities.


Built-in high-performance computing


Mathematica fully supports multi-threaded, multicore processors without extra packages. Many functions automatically utilize the power of multicore processors, and built-in generic parallel functions make high-performance programming a simple process.

Easy data access and connectivity


Mathematica natively supports hundreds of formats for importing and exporting, as well as unlimited access to data from Wolfram|Alpha, Wolfram Research's computational knowledge engine. It also provides APIs for accessing common database systems and programming languages, such as SQL, C/C++, and .NET.

Platform-independent deployment options


Through its interactive notebooks, Mathematica Player, and browser plug-ins, Mathematica provides a wide range of options for deployment. Built-in code generation functionality can be used to create standalone programs for independent distribution.

Benefits of CUDA Integration in Mathematica


For users who want to tap into the power of GPU computing, CUDA integration in Mathematica provides benefits in both development and performance. The full integration and automation of Mathematica's CUDA capabilities means a more productive and efficient development cycle. In addition, it brings unprecedented levels of performance improvement without extra development time and cost.

Simplified Development Cycle


Automation of development project management
Like many other development frameworks, programming with CUDA may require programmers to manage project setup, platform dependencies, and device configuration. CUDA integration in Mathematica makes the process completely transparent and fully automated. Users do not have to set up any project files or worry about platform dependency.


Automated GPU memory and thread management


In a typical CUDA program, programmers must write memory and thread management code manually, in addition to the CUDA kernel function itself.

With Mathematica, memory and thread management for the GPU is completely automatic, eliminating the need for that extra code.

Memory between the host and the device is synchronized only as needed, so that data transfers (which are known to be very costly) are avoided unless absolutely necessary. For advanced applications, full control over how and when memory is copied between the host and GPU devices is provided.
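The "synchronize only as needed" policy can be illustrated with a small bookkeeping sketch. The Python below is a hypothetical model written for this paper, not CUDALink's actual implementation: it tracks which side holds the freshest copy of a buffer and copies only when the stale side reads it.

```python
class ManagedBuffer:
    """Toy model of lazy host/device synchronization (illustration only)."""

    def __init__(self, data):
        self.host = list(data)   # host-side copy
        self.device = None       # device-side copy (simulated)
        self.fresh = "host"      # which side holds the latest data
        self.transfers = 0       # count of host<->device copies

    def read_on_device(self):
        if self.fresh != "device":
            self.device = list(self.host)   # simulate a host-to-device copy
            self.transfers += 1
            self.fresh = "both"
        return self.device

    def write_on_device(self, data):
        self.device = list(data)
        self.fresh = "device"               # host copy is now stale

    def read_on_host(self):
        if self.fresh == "device":
            self.host = list(self.device)   # simulate a device-to-host copy
            self.transfers += 1
            self.fresh = "both"
        return self.host

buf = ManagedBuffer([1, 2, 3])
buf.read_on_device()                 # first use: one host-to-device transfer
buf.write_on_device([4, 5, 6])       # kernel output stays on the device
buf.read_on_device()                 # no transfer: device copy is fresh
print(buf.read_on_host(), buf.transfers)   # one device-to-host transfer
```

Note that the two consecutive device-side uses cost only one transfer; repeated kernel launches on the same data never touch the bus until the host actually reads the result.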

Automatic compilation and binding of dynamic libraries


Mathematica's CUDA support streamlines the whole programming process. It goes one step further by compiling the code and establishing the connection with Mathematica in a completely transparent way, greatly simplifying development for the end user. With a single command, your CUDA kernel code is compiled, linked, and bound to Mathematica, ready to use.

CUDA integration provides full access to Mathematica's native language and built-in functions. It also provides free exchange of data between Mathematica and users' CUDA programs.

With Mathematica's comprehensive symbolic and numerical functions, built-in application area support, and graphical interface building functions, users can not only combine the power of Mathematica and GPU computing, but also spend more time on developing and optimizing core CUDA kernel algorithms.

Ready-to-use applications
CUDA integration in Mathematica provides several ready-to-use CUDA functions that cover a broad range of topics, such as mathematics, image processing, and financial engineering. Examples are given later in this paper.

Zero device configuration


As long as NVIDIA's CUDA Toolkit is installed on the development system, there is nothing for users to configure or set up to utilize the GPU's computing power. Mathematica automatically finds, configures, and makes CUDA devices available to users.

Performance Improvements
GPUs have offered many cores for quite some time. A current example is the NVIDIA GeForce GTX 480, a consumer card with 480 CUDA cores, double-precision support, and a retail price of less than $500. Mathematica also supports multicore CPU systems via built-in functions such as Parallelize and ParallelMap, which free you from the cumbersome details of launching multiple kernels and processes and let you concentrate on the core implementation of your algorithms. GPU support provides considerable speedups in many areas:

On an NVIDIA GT 240 paired with an Intel Core i7 950, FinancialDerivative shows improvements of up to 35x for binomial methods and 80x for Monte Carlo methods. Several random number generation algorithms have been implemented, with observed speedups of 50x. Volumetric rendering, implicit algebraic functions, and 3D fractals can be computed in real time at high frame rates.
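For readers unfamiliar with the methods being accelerated, here is a plain-Python sketch of the Cox-Ross-Rubinstein binomial method that option pricers of this kind use. The function name and parameters are illustrative, and CUDALink's GPU implementation is far more elaborate; this only shows the shape of the computation.

```python
from math import exp, sqrt

def binomial_call(s0, k, r, sigma, t, steps):
    """Price a European call with the Cox-Ross-Rubinstein binomial tree."""
    dt = t / steps
    u = exp(sigma * sqrt(dt))          # up factor per step
    d = 1 / u                          # down factor
    p = (exp(r * dt) - d) / (u - d)    # risk-neutral up probability
    disc = exp(-r * dt)
    # option values at expiry, for 0..steps up-moves
    values = [max(s0 * u**j * d**(steps - j) - k, 0.0)
              for j in range(steps + 1)]
    # backward induction to time 0
    for _ in range(steps):
        values = [disc * (p * values[j + 1] + (1 - p) * values[j])
                  for j in range(len(values) - 1)]
    return values[0]

# Example: at-the-money call, 1 year, 5% rate, 20% volatility
print(binomial_call(100, 100, 0.05, 0.2, 1.0, 500))
```

With 500 steps this lands close to the analytic Black-Scholes value of about 10.45 for this contract. The per-node payoff evaluations and induction steps are exactly the kind of data-parallel work a GPU accelerates.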

Mathematicas CUDALink: Integrated GPU Programming


CUDALink is a built-in Mathematica package that provides a simple and powerful interface for using CUDA within Mathematica's streamlined workflow. CUDALink provides you with carefully tuned linear algebra, discrete Fourier transform, and image processing algorithms. You can also write your own CUDALink modules with minimal effort. Using CUDALink from within Mathematica gives you access to Mathematica's features, including visualization, import/export, and programming capabilities.


The CUDALink package, included with Mathematica at no additional cost, offers:


Automatic compilation of CUDA programs
Automatic memory handling between host and GPU devices
The ability to write your own CUDALink modules with little effort
Support for Microsoft Visual Studio and GCC compilers
Multiple-GPU support
Support for single and double arithmetic precision
Access to Mathematica's flexible programming language, automatic interface builders, and full-featured development environment
Access to Mathematica's computable data, import/export capabilities, visualization features, and more
Ready-to-use functionality with zero configuration needed if current video drivers and compilers are already installed
Support for 32-bit and 64-bit Windows, Mac OS X, and Linux systems

System Requirements
To utilize Mathematica's CUDALink, the following operating systems and software are required:
Operating system: Windows, Linux, or Mac OS X, in both 32- and 64-bit architectures; Mac OS X users need at least Mac OS X 10.6.3
An NVIDIA CUDA-enabled product; see www.nvidia.com/object/cuda_gpus.html for further information
Mathematica 8.0 or higher
CUDA Toolkit 3.1 or higher
A recent NVIDIA driver
On Windows: Microsoft Visual Studio 2005 or higher

Getting Started with CUDALink


Programming the GPU in Mathematica is straightforward. It begins with loading the CUDALink package into Mathematica:
Needs["CUDALink`"]

The following function verifies that the system has CUDA support:
CUDAQ[]
True

Now, we will create a simple example that negates colors of a 3-channel image. First, write a CUDA kernel function as a string, and assign it to a variable:
kernel = "
  __global__ void cudaColorNegate(int *img, int *dim, int channels) {
      int width = dim[0], height = dim[1];
      int xIndex = threadIdx.x + blockIdx.x * blockDim.x;
      int yIndex = threadIdx.y + blockIdx.y * blockDim.y;
      int index = channels * (xIndex + yIndex * width);
      if (xIndex < width && yIndex < height) {
          for (int c = 0; c < channels; c++)
              img[index + c] = 255 - img[index + c];
      }
  }";


Pass that string to the built-in function CUDAFunctionLoad, along with the kernel function name and the argument specification. The last argument specifies the thread block dimensions to launch.
colorNegate = CUDAFunctionLoad[kernel, "cudaColorNegate",
    {{_Integer, _, "InputOutput"}, {_Integer, _, "Input"}, _Integer}, {16, 16}];

Several things are happening at this stage. Mathematica automatically compiles the kernel function as a dynamic library. There is no need for users to add system interface or memory management code. After compilation, the function is automatically bound to Mathematica and is ready to be called. Now you can apply this new CUDA function to any image format that Mathematica can handle.

Applied to an image i imported into Mathematica, the function returns the color-negated result:

colorNegate[i, ImageDimensions[i], ImageChannels[i]]
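The kernel's index arithmetic can be checked on the CPU. This Python sketch, written for this paper as an illustration rather than CUDALink code, emulates a grid of 16x16 thread blocks and verifies that the kernel's bounds guard makes every sample of a 3-channel image get negated exactly once, even when the image size is not a multiple of the block size.

```python
from math import ceil

def emulate_color_negate(img, width, height, channels, block=(16, 16)):
    """CPU emulation of the cudaColorNegate kernel's thread/index logic."""
    bx, by = block
    grid_x, grid_y = ceil(width / bx), ceil(height / by)   # blocks per axis
    for block_x in range(grid_x):
        for block_y in range(grid_y):
            for tx in range(bx):
                for ty in range(by):
                    x = tx + block_x * bx   # threadIdx.x + blockIdx.x*blockDim.x
                    y = ty + block_y * by   # threadIdx.y + blockIdx.y*blockDim.y
                    if x < width and y < height:   # bounds guard from the kernel
                        index = channels * (x + y * width)
                        for c in range(channels):
                            img[index + c] = 255 - img[index + c]
    return img

# 20x17 image (not a multiple of 16) with 3 channels, every sample set to 10
w, h, ch = 20, 17, 3
out = emulate_color_negate([10] * (w * h * ch), w, h, ch)
print(all(v == 245 for v in out))   # each sample negated exactly once
```

The guard matters: without it, threads in the partially filled edge blocks would write past the image buffer.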

Finally, CUDAFunctionUnload can be called to unbind the function from Mathematica.


CUDAFunctionUnload[colorNegate]

Accessing System Information


CUDALink supplies several functions that make it easy to acquire detailed system information for GPU programming. For instance, CUDAQ tells whether the current hardware and system configuration support CUDALink:
Needs["CUDALink`"]
CUDAQ[]
True


CUDAInformation generates a detailed report on supported CUDA devices. The data returned by CUDAInformation is a valid Mathematica input form, which means that it can be used to optimize CUDA kernel code programmatically. Several other functions return in-depth information about Mathematica, the operating system, the hardware, and the C/C++ compilers currently used by CUDALink.

Example of a report generated by CUDAInformation.

Retargetable Code Generation


CUDALink provides you with the ability to perform on-the-fly compilation and execution, or to build executables or libraries for later use. CUDALink offers a simple, portable compilation mechanism for CUDA code based on the NVCC compiler. Code can be compiled into an executable or an external library for reuse.


Mathematica also provides SymbolicC, which offers a hierarchical view of C code in Mathematica's own language. This makes it well suited to creating, manipulating, and optimizing C code. In conjunction with this capability, users can generate CUDA kernel code for several different targets, yielding greater portability, less platform dependency, and better code optimization. Several built-in functions perform code generation, depending on the target:

SymbolicC
CUDACodeGenerate takes a CUDA kernel function or program and generates SymbolicC output. The SymbolicC output can then be used to render CUDA code that calls the correct functions to bind the CUDA code to Mathematica. CUDASymbolicCGenerate produces SymbolicC output for the CUDA kernel as well as the C++ wrapper code.

Dynamic library
CUDALibraryGenerate generates CUDA interface code and compiles it into a library that can be loaded into Mathematica. CUDALibraryFunctionGenerate takes a CUDA kernel function, generates the code required to use it, compiles the sources into a dynamic library, and returns a Mathematica library function that can be called through Mathematica's automatic dynamic-library binding.

String
CUDACodeStringGenerate generates CUDA interface code in string form, which can be exported to other development platforms.

Integration with Mathematica Functions


When you use CUDALink, you can use all of Mathematica's features, including visualization, import/export, and programming capabilities. By combining Mathematica's full-featured development environment with CUDA integration, you can focus on innovating your algorithms in CUDA kernels rather than spending time on repetitive tasks, such as interface building.

Manipulate: Mathematicas automatic interface generator


Mathematica provides extensive built-in interface functions, including both standard and advanced controls. Users can also customize controls using Mathematica's highly declarative interface language. Furthermore, Mathematica provides a fully automated interface-generating function called Manipulate.



Manipulate[operation[image, x],
    {x, 0, 9}, {operation, {CUDAErosion, CUDADilation}}]

By specifying possible ranges for the variables, Manipulate automatically chooses appropriate controls and creates a user interface around them.

Example of a user interface built with Manipulate.

Support for import and export


Mathematica natively supports hundreds of file formats and their subformats for importing and exporting. Supported formats include common image formats (JPEG, PNG, TIFF, BMP, etc.), video formats (AVI, MOV, H.264, etc.), audio formats (WAV, AU, AIFF, FLAC, etc.), medical imaging formats (DICOM), data formats (Excel, CSV, MAT, etc.), and various raw formats for further processing. Any supported data format is automatically converted to Mathematica's unified data representation, an expression, which can be used in all Mathematica functions, including CUDALink functions.

Not only does Mathematica provide access to local resources, but any URL can be used to access data online. The following code imports an image from a given URL:
image = Import["http://gallery.wolfram.com/2d/popup/00_contourMosaic.pop.jpg"];

The Import function automatically recognizes the file format and converts it into a Mathematica expression. The result can be used directly by CUDALink functions, such as CUDAImageAdd:

output = CUDAImageAdd[image, mask]  (* mask: the second image, shown inline in the original *)

All outputs from Mathematica functions, including the ones from CUDALink functions, are also expressions, and can be easily exported to one of the supported formats using the Export function. For example, the following code exports CUDA output into PNG format:
Export["masked.png", output]
masked.png

$ImportFormats and $ExportFormats give full lists of supported import and export formats.

OpenCL Compatibility
In addition to CUDALink, Mathematica supports OpenCL with the built-in package OpenCLLink, which provides the same benefits and functionality of GPU programming as CUDALink, built on the OpenCL architecture.

Mathematicas CUDALink Applications


In addition to support for user-defined CUDA functions and automatic compilation, CUDALink includes several ready-to-use functions that support NVIDIA's CUFFT (Fourier analysis) and CUBLAS (linear algebra) libraries and cover application areas such as image processing, financial derivatives, and random number generation.

Image Processing
CUDALink offers many image processing functions that have been carefully tuned for the GPU. These include pixel operations such as image arithmetic and composition; morphological operators such as erosion, dilation, opening, and closing; and image convolution and filtering. All of these operations work on either images or arrays of real and integer numbers.

Image convolution
CUDALink's convolution is similar to Mathematica's ListConvolve and ImageConvolve functions. It operates on images, lists, or CUDA memory references, and it can use Mathematica's built-in filters as the kernel.
CUDAImageConvolve[image, {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}}]

Convolving a microscopic image with a Sobel mask to detect edges.

CUDALink supports simple pixel operations on one or two images, such as adding or multiplying pixel values from two images.

CUDAImageMultiply[image1, image2]

Multiplication of two images.
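To make the convolution concrete, here is a plain-Python sketch of the 2D convolution that the Sobel example computes. This is illustrative CPU code written for this paper; it handles only the interior of the image and skips the border, unlike the GPU function.

```python
def convolve2d(img, kernel):
    """Valid-region 2D convolution of a grayscale image (list of lists)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    out = []
    for y in range(h - kh + 1):
        row = []
        for x in range(w - kw + 1):
            acc = 0
            for j in range(kh):
                for i in range(kw):
                    # convolution flips the kernel relative to correlation
                    acc += kernel[kh - 1 - j][kw - 1 - i] * img[y + j][x + i]
            row.append(acc)
        out.append(row)
    return out

sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# A vertical edge: dark on the left, bright on the right
img = [[0, 0, 10, 10]] * 4
print(convolve2d(img, sobel_x))
```

Each output value is the Sobel response at that pixel; large magnitudes mark edges. On the GPU, one thread computes one output pixel's weighted sum.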

Morphological operations
CUDALink supports fundamental operations such as erosion, dilation, opening, and closing. CUDAErosion, CUDADilation, CUDAOpening, and CUDAClosing are equivalent to Mathematica's built-in Erosion, Dilation, Opening, and Closing functions. More sophisticated morphological operations can be built using these fundamental operations.
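As intuition for these operators, here is a minimal pure-Python sketch of erosion and dilation over a 3x3 neighborhood, written for this paper as an illustration of what the functions compute, not of their implementation. Border pixels here use just the in-image part of the neighborhood, a simplification of the padding options real implementations offer.

```python
def morph(img, op):
    """Apply op (min = erosion, max = dilation) over each 3x3 neighborhood."""
    h, w = len(img), len(img[0])
    return [[op(img[y + j][x + i]
                for j in (-1, 0, 1) for i in (-1, 0, 1)
                if 0 <= y + j < h and 0 <= x + i < w)
             for x in range(w)]
            for y in range(h)]

def erosion(img):
    return morph(img, min)

def dilation(img):
    return morph(img, max)

def opening(img):
    # opening = erosion followed by dilation; closing is the reverse order
    return dilation(erosion(img))

img = [[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 1],   # the trailing 1 is a protruding "noise" pixel
       [0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0]]

print(erosion(img)[2][2])   # 1: a fully surrounded pixel survives erosion
print(opening(img)[2][4])   # 0: opening removed the protruding pixel
```

This is also why opening is a standard denoising step: thin protrusions vanish under erosion and are not restored by the following dilation.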


Video Processing
CUDALink's built-in image processing functions can also be applied to videos to perform real-time filtering. Many common formats, such as H.264, QuickTime, and DivX, are supported. With GPU computing power, CUDALink's video processing functions can easily handle full high-definition (1080p) video filtering at 30 frames per second.

Processing multiple video streams with different filters in real time.

Linear Algebra
You can perform various linear algebra operations on the GPU. Examples include vector addition, products, and other operations; finding minimum or maximum elements; and transposing the rows and columns of a matrix or image.

Fourier Analysis
The Fourier analysis capabilities of the CUDALink package include forward and inverse Fourier transforms. The CUDAFourier function can operate on 1D, 2D, or 3D lists of real or complex numbers, or on data held in a chunk of device memory that is registered with CUDALink's Fourier memory manager.
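The underlying transform is the standard discrete Fourier transform. Below is a naive O(n^2) pure-Python version using the symmetric 1/sqrt(n) normalization that Mathematica's Fourier uses by default; it is an illustration only, since CUFFT uses fast O(n log n) algorithms.

```python
from cmath import exp, pi, sqrt

def dft(data):
    """Naive O(n^2) discrete Fourier transform, 1/sqrt(n) normalization."""
    n = len(data)
    return [sum(data[r] * exp(2j * pi * r * k / n) for r in range(n)) / sqrt(n)
            for k in range(n)]

out = dft([1, 0, 0, 0])
print([round(c.real, 3) for c in out])   # an impulse has a flat spectrum
```

Because every output coefficient is an independent sum over the input, the transform maps naturally onto thousands of GPU threads, which is what makes FFT libraries such a good fit for CUDA.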



Fluid Dynamics
Computational fluid dynamics examples are included with CUDALink. They simulate a large number of particles in real time using the GPU's computing power.

Fluid simulation with many particles.

Financial Derivatives
CUDALink's option pricing function uses the binomial or Monte Carlo method, depending on the type of option selected. Computing options on the GPU can be dozens of times faster than using the CPU with parallel processing.
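As a sketch of the Monte Carlo approach, here is a minimal plain-Python pricer for a European call under geometric Brownian motion, with illustrative parameters chosen for this paper; a GPU version evaluates many such paths in parallel.

```python
import random
from math import exp, sqrt

def monte_carlo_call(s0, k, r, sigma, t, paths, seed=0):
    """European call price by simulating terminal prices under GBM."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * t
    vol = sigma * sqrt(t)
    payoff_sum = 0.0
    for _ in range(paths):
        s_t = s0 * exp(drift + vol * rng.gauss(0.0, 1.0))  # terminal price
        payoff_sum += max(s_t - k, 0.0)
    return exp(-r * t) * payoff_sum / paths   # discounted average payoff

# At-the-money call; the analytic Black-Scholes value is about 10.45
print(monte_carlo_call(100, 100, 0.05, 0.2, 1.0, 200_000))
```

Each path is independent, which is why Monte Carlo pricing parallelizes so well on the GPU; the only coordination needed is the final sum over payoffs.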

Volumetric Rendering
CUDALink includes functions to read and display volumetric data in 3D, with interactive interfaces for controlling transfer functions and other volume-rendering parameters.

Volumetric rendering of a medical image.



Pricing and Licensing Information


Wolfram Research offers many flexible licensing options for both organizations and individuals. You can choose a convenient, cost-effective plan for your workgroup, department, directorate, university, or just yourself, including network licensing for groups. Visit us online for more information: www.wolfram.com/mathematica-purchase

Summary
Thanks to Mathematica's integrated platform design, all functionality is included without the need to buy, learn, use, and maintain multiple tools and add-on packages. With its simplified development cycle, automatic memory management, multicore computing, and built-in functions for many applications, plus full integration with all of Mathematica's other computation, development, and deployment capabilities, Mathematica's built-in CUDALink package provides a powerful interface for GPU computing.

Recommended Next Steps

Request your free Mathematica trial version


www.wolfram.com/mathematica-trial

Request a technical demo:


www.wolfram.com/mathematica-demo

Learn more about CUDA programming in Mathematica:


U.S. and Canada: 1-800-WOLFRAM (965-3726), info@wolfram.com
Outside U.S. and Canada (except Europe and Asia): +1-217-398-0700, info@wolfram.com
Europe: +44-(0)1993-883400, info@wolfram.co.uk
Asia: +81-(0)3-3518-2880, info@wolfram.co.jp

© 2010 Wolfram Research, Inc. Mathematica is a registered trademark and Mathematica Player is a trademark of Wolfram Research, Inc. Wolfram|Alpha is a registered trademark and "computational knowledge engine" is a trademark of Wolfram Alpha LLC, a Wolfram Research Company. All other trademarks are the property of their respective owners. Mathematica is not associated with Mathematica Policy Research, Inc. or Mathtech, Inc. MKT3014 2060699 0910.hk


