
Wolfram White Paper

CUDA Programming within Mathematica

Introduction
CUDA is short for Compute Unified Device Architecture. NVIDIA developed it to program graphics cards, letting you run C-like code on the graphics processing unit (GPU). CUDA promises significant performance increases, and CUDA-designed programs will scale to multicore systems. Considering CUDA's advanced technology, you might expect its programming to be enormously complicated. Enter Mathematica, the easiest way to program for CUDA and unlock GPU performance potential. Unlike programming in C or developing CUDA wrapper code, you don't have to be a programming wizard to use CUDA. Mathematica offers an intuitive environment (even featuring built-in, ready-to-use examples for common application areas such as image processing, medical imaging, statistics, and finance) that makes CUDA programming a breeze, even if you've never used Mathematica before. And if you have used Mathematica, you'll be amazed by the massive boost in computational power, as well as application performance, with speedups by factors easily exceeding 100. In this document, we describe the benefits of CUDA integration in Mathematica and present some applications for which it is well suited, such as image processing and financial engineering.

Motivations for CUDA


CUDA is different from regular C code, which runs on the central processing unit (CPU). In CUDA terminology, the device is the GPU, while the host is the CPU. There are two motivations for using CUDA. The first is the promise of significant speed; the other is the promise that a CUDA-designed program will scale to multicore systems. With CPU manufacturers indicating that hundred- and thousand-core systems will reach user desktops, the CUDA programming paradigm, which scales well to that many cores, becomes more important. GPUs already have hundreds of cores, and correct formulation of the problem and of memory usage can give you a significant performance boost.


A Brief Introduction to Mathematica


Mathematica has been one of the most powerful languages for technical computing for more than 20 years. Wolfram Research now introduces fully integrated GPU programming capabilities in Mathematica, bringing a whole new meaning to high-performance computing. For CUDA developers, the new integration means unlimited access to Mathematica's vast computing abilities. Mathematica is a sophisticated development environment that combines a flexible programming language with a wide range of symbolic and numeric computational capabilities, production of high-quality visualizations, built-in application area packages, and a range of immediate deployment options. With direct integration of dynamic libraries, instant interface construction, and automatic C code generation and linking, Mathematica provides the most sophisticated build-to-deploy environment on the market. Some highlights of Mathematica's features include:

Full-featured, unified development environment


Through its unique interface and integrated features for computation, development, and deployment, Mathematica provides a streamlined workflow.

Unified data representation


At the core of Mathematica is the foundational idea that everything (data, programs, formulas, graphics, documents) can be represented as symbolic entities, called expressions. This unified representation makes Mathematica's language and functions extremely flexible, streamlined, and consistent.

Multiparadigm programming language


Mathematica provides its own highly declarative functional language, as well as several other programming paradigms, such as procedural and rule-based programming. Programmers can choose their own style of writing code with minimal effort. Along with comprehensive documentation and resources, Mathematica's flexibility greatly reduces the cost of entry for new users.

Unique symbolic-numeric hybrid system


The fundamental principle of Mathematica is full integration of symbolic and numeric computing capabilities. Through its full automation and preprocessing mechanisms, users can enjoy the full power of a hybrid computing system without knowledge of the specific methodologies and algorithms.

Extensive scientific and technical area coverage


Mathematica provides thousands of built-in functions and packages that cover a broad range of scientific and technical computing areas, such as statistics, control systems, data visualization, and image processing. All functions are carefully designed and tightly integrated with the system, which enables users to break the barriers between specialized areas and explore new possibilities.


Built-in high-performance computing


Mathematica fully supports multi-threaded, multicore processors without extra packages. Many functions automatically utilize the power of multicore processors, and built-in generic parallel functions make high-performance programming a simple process.

Easy data access and connectivity


Mathematica natively supports hundreds of formats for importing and exporting, as well as unlimited access to data from Wolfram|Alpha, Wolfram Research's computational knowledge engine. It also provides APIs for accessing common database systems and programming languages, such as SQL, C/C++, and .NET.

Platform-independent deployment options


Through its interactive notebooks, Mathematica Player, and browser plug-ins, Mathematica provides a wide range of options for deployment. Built-in code generation functionality can be used to create standalone programs for independent distribution.

Benefits of CUDA Integration in Mathematica


For users who want to tap into the power of GPU computing, CUDA integration in Mathematica provides benefits in both development and performance. The full integration and automation of Mathematica's CUDA capabilities means a more productive and efficient development cycle. In addition, it brings unprecedented levels of performance improvement without extra development time and cost.

Simplified Development Cycle


Automation of development project management
Like many other development frameworks, programming with CUDA may require programmers to manage project setup, platform dependencies, and device configuration. CUDA integration in Mathematica makes the process completely transparent and fully automated. Users do not have to set up any project files or worry about platform dependency.


Automated GPU memory and thread management


In a typical CUDA program, programmers must write memory and thread management code manually, in addition to the CUDA kernel function itself.

With Mathematica, memory and thread management for the GPU is completely automatic, eliminating the need for that extra code.

Memory between the host and the device is synchronized only as needed, so that data transfers (which are known to be very costly) are avoided unless absolutely necessary. For advanced applications, full control over how and when memory is copied between the host and GPU devices is provided.
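The "synchronize only as needed" policy can be illustrated with a small bookkeeping sketch. The Python below is a hypothetical model written for this paper, not CUDALink's actual implementation: it tracks which side holds the freshest copy of a buffer and copies only when the stale side reads it.

```python
class ManagedBuffer:
    """Toy model of lazy host/device synchronization (illustration only)."""

    def __init__(self, data):
        self.host = list(data)   # host-side copy
        self.device = None       # device-side copy (simulated)
        self.fresh = "host"      # which side holds the latest data
        self.transfers = 0       # count of host<->device copies

    def read_on_device(self):
        if self.fresh != "device":
            self.device = list(self.host)   # simulate a host-to-device copy
            self.transfers += 1
            self.fresh = "both"
        return self.device

    def write_on_device(self, data):
        self.device = list(data)
        self.fresh = "device"               # host copy is now stale

    def read_on_host(self):
        if self.fresh == "device":
            self.host = list(self.device)   # simulate a device-to-host copy
            self.transfers += 1
            self.fresh = "both"
        return self.host

buf = ManagedBuffer([1, 2, 3])
buf.read_on_device()                 # first use: one host-to-device transfer
buf.write_on_device([4, 5, 6])       # kernel output stays on the device
buf.read_on_device()                 # no transfer: device copy is fresh
print(buf.read_on_host(), buf.transfers)   # one device-to-host transfer
```

Note that the two consecutive device-side uses cost only one transfer; repeated kernel launches on the same data never touch the bus until the host actually reads the result.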

Automatic compilation and binding of dynamic libraries


Mathematica's CUDA support streamlines the whole programming process. It goes one step further by compiling the code and establishing the connection with Mathematica in a completely transparent way, greatly simplifying development for the end user. With a single command, your CUDA kernel code is compiled, linked, and bound to Mathematica, ready to use.

CUDA integration provides full access to Mathematica's native language and built-in functions. It also provides free exchange of data between Mathematica and users' CUDA programs.

With Mathematica's comprehensive symbolic and numerical functions, built-in application area support, and graphical interface building functions, users can not only combine the power of Mathematica and GPU computing, but also spend more time on developing and optimizing core CUDA kernel algorithms.

Ready-to-use applications
CUDA integration in Mathematica provides several ready-to-use CUDA functions that cover a broad range of topics, such as mathematics, image processing, and financial engineering. Examples are given later in this paper.

Zero device configuration


As long as NVIDIA's CUDA Toolkit is installed on the development system, there is nothing for users to configure or set up to utilize the GPU's computing power. Mathematica automatically finds, configures, and makes CUDA devices available to users.

Performance Improvements
GPUs have offered many cores for quite some time. A current example is the NVIDIA GeForce GTX 480, a consumer card with 480 CUDA cores, double-precision support, and a retail price of less than $500. Mathematica also supports multicore CPU systems via built-in functions such as Parallelize and ParallelMap, which free you from the cumbersome details of launching multiple kernels and processes and let you concentrate on the core implementation of your algorithms. GPU support provides considerable speedups in many areas:

On an NVIDIA GT 240 paired with an Intel Core i7 950, FinancialDerivative shows improvements of up to 35x for binomial methods and 80x for Monte Carlo methods. Several random number generation algorithms have been implemented, with observed speedups of 50x. Volumetric rendering, implicit algebraic functions, and 3D fractals can be computed in real time at high frame rates.
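For readers unfamiliar with the methods being accelerated, here is a plain-Python sketch of the Cox-Ross-Rubinstein binomial method that option pricers of this kind use. The function name and parameters are illustrative, and CUDALink's GPU implementation is far more elaborate; this only shows the shape of the computation.

```python
from math import exp, sqrt

def binomial_call(s0, k, r, sigma, t, steps):
    """Price a European call with the Cox-Ross-Rubinstein binomial tree."""
    dt = t / steps
    u = exp(sigma * sqrt(dt))          # up factor per step
    d = 1 / u                          # down factor
    p = (exp(r * dt) - d) / (u - d)    # risk-neutral up probability
    disc = exp(-r * dt)
    # option values at expiry, for 0..steps up-moves
    values = [max(s0 * u**j * d**(steps - j) - k, 0.0)
              for j in range(steps + 1)]
    # backward induction to time 0
    for _ in range(steps):
        values = [disc * (p * values[j + 1] + (1 - p) * values[j])
                  for j in range(len(values) - 1)]
    return values[0]

# Example: at-the-money call, 1 year, 5% rate, 20% volatility
print(binomial_call(100, 100, 0.05, 0.2, 1.0, 500))
```

With 500 steps this lands close to the analytic Black-Scholes value of about 10.45 for this contract. The per-node payoff evaluations and induction steps are exactly the kind of data-parallel work a GPU accelerates.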

Mathematicas CUDALink: Integrated GPU Programming


CUDALink is a built-in Mathematica package that provides a simple and powerful interface for using CUDA within Mathematica's streamlined workflow. CUDALink provides you with carefully tuned linear algebra, discrete Fourier transform, and image processing algorithms. You can also write your own CUDALink modules with minimal effort. Using CUDALink from within Mathematica gives you access to Mathematica's features, including visualization, import/export, and programming capabilities.


The CUDALink package, included with Mathematica at no additional cost, offers:


Automatic compilation of CUDA programs
Automatic memory handling between host and GPU devices
The ability to write your own CUDALink modules with little effort
Support for Microsoft Visual Studio and GCC compilers
Multiple-GPU support
Support for single and double arithmetic precision
Access to Mathematica's flexible programming language, automatic interface builders, and full-featured development environment
Access to Mathematica's computable data, import/export capabilities, visualization features, and more
Ready-to-use functionality with zero configuration needed if current video drivers and compilers are already installed
Support for 32-bit and 64-bit Windows, Mac OS X, and Linux systems

System Requirements
To utilize Mathematica's CUDALink, the following operating systems and software are required:
Operating system: Windows, Linux, or Mac OS X, in both 32- and 64-bit architectures; Mac OS X users need at least Mac OS X 10.6.3
An NVIDIA CUDA-enabled product; see www.nvidia.com/object/cuda_gpus.html for further information
Mathematica 8.0 or higher
CUDA Toolkit 3.1 or higher
A recent NVIDIA driver
On Windows: Microsoft Visual Studio 2005 or higher

Getting Started with CUDALink


Programming the GPU in Mathematica is straightforward. It begins with loading the CUDALink package into Mathematica:
Needs["CUDALink`"]

The following function verifies that the system has CUDA support:
CUDAQ[]
True

Now, we will create a simple example that negates colors of a 3-channel image. First, write a CUDA kernel function as a string, and assign it to a variable:
kernel = "
  __global__ void cudaColorNegate(int *img, int *dim, int channels) {
      int width = dim[0], height = dim[1];
      int xIndex = threadIdx.x + blockIdx.x * blockDim.x;
      int yIndex = threadIdx.y + blockIdx.y * blockDim.y;
      int index = channels * (xIndex + yIndex * width);
      if (xIndex < width && yIndex < height) {
          for (int c = 0; c < channels; c++)
              img[index + c] = 255 - img[index + c];
      }
  }";


Pass that string to the built-in function CUDAFunctionLoad, along with the kernel function name and the argument specification. The last argument specifies the thread block dimensions to launch.
colorNegate = CUDAFunctionLoad[kernel, "cudaColorNegate",
    {{_Integer, _, "InputOutput"}, {_Integer, _, "Input"}, _Integer}, {16, 16}];

Several things are happening at this stage. Mathematica automatically compiles the kernel function as a dynamic library. There is no need for users to add system interface or memory management code. After compilation, the function is automatically bound to Mathematica and is ready to be called. Now you can apply this new CUDA function to any image format that Mathematica can handle.

Applied to an image i imported into Mathematica, the function returns the color-negated result:

colorNegate[i, ImageDimensions[i], ImageChannels[i]]
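The kernel's index arithmetic can be checked on the CPU. This Python sketch, written for this paper as an illustration rather than CUDALink code, emulates a grid of 16x16 thread blocks and verifies that the kernel's bounds guard makes every sample of a 3-channel image get negated exactly once, even when the image size is not a multiple of the block size.

```python
from math import ceil

def emulate_color_negate(img, width, height, channels, block=(16, 16)):
    """CPU emulation of the cudaColorNegate kernel's thread/index logic."""
    bx, by = block
    grid_x, grid_y = ceil(width / bx), ceil(height / by)   # blocks per axis
    for block_x in range(grid_x):
        for block_y in range(grid_y):
            for tx in range(bx):
                for ty in range(by):
                    x = tx + block_x * bx   # threadIdx.x + blockIdx.x*blockDim.x
                    y = ty + block_y * by   # threadIdx.y + blockIdx.y*blockDim.y
                    if x < width and y < height:   # bounds guard from the kernel
                        index = channels * (x + y * width)
                        for c in range(channels):
                            img[index + c] = 255 - img[index + c]
    return img

# 20x17 image (not a multiple of 16) with 3 channels, every sample set to 10
w, h, ch = 20, 17, 3
out = emulate_color_negate([10] * (w * h * ch), w, h, ch)
print(all(v == 245 for v in out))   # each sample negated exactly once
```

The guard matters: without it, threads in the partially filled edge blocks would write past the image buffer.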

Finally, CUDAFunctionUnload can be called to unbind the function from Mathematica.


CUDAFunctionUnload[colorNegate]

Accessing System Information


CUDALink supplies several functions that make it easy to acquire detailed system information for GPU programming. For instance, CUDAQ tells whether the current hardware and system configuration support CUDALink:
Needs["CUDALink`"]
CUDAQ[]
True


CUDAInformation generates a detailed report on supported CUDA devices. The data returned by CUDAInformation is a valid Mathematica input form, which means that it can be used to optimize CUDA kernel code programmatically. Several other functions return in-depth information about Mathematica, the operating system, the hardware, and the C/C++ compilers currently used by CUDALink.

Example of a report generated by CUDAInformation.

Retargetable Code Generation


CUDALink provides you with the ability to perform on-the-fly compilation and execution, or to build executables or libraries for later use. CUDALink offers a simple, portable compilation mechanism for CUDA code based on the NVCC compiler. Code can be compiled into an executable or an external library for reuse.


Mathematica also provides SymbolicC, which offers a hierarchical view of C code in Mathematica's own language. This makes it well suited to creating, manipulating, and optimizing C code. In conjunction with this capability, users can generate CUDA kernel code for several different targets, yielding greater portability, less platform dependency, and better code optimization. Several built-in functions perform code generation, depending on the target:

SymbolicC
CUDACodeGenerate takes a CUDA kernel function or program and generates SymbolicC output. The SymbolicC output can then be used to render CUDA code that calls the correct functions to bind the CUDA code to Mathematica. CUDASymbolicCGenerate produces SymbolicC output for the CUDA kernel as well as the C++ wrapper code.

Dynamic library
CUDALibraryGenerate generates CUDA interface code and compiles it into a library that can be loaded into Mathematica. CUDALibraryFunctionGenerate takes a CUDA kernel function, generates the code required to use it, compiles the sources into a dynamic library, and returns a Mathematica library function that can be called through Mathematica's automatic dynamic-library binding.

String
CUDACodeStringGenerate generates CUDA interface code in string form, which can be exported to other development platforms.

Integration with Mathematica Functions


When you use CUDALink, you can use all of Mathematica's features, including visualization, import/export, and programming capabilities. By combining Mathematica's full-featured development environment with CUDA integration, you can focus on innovating your algorithms in CUDA kernels rather than spending time on repetitive tasks, such as interface building.

Manipulate: Mathematicas automatic interface generator


Mathematica provides extensive built-in interface functions, including both standard and advanced controls. Users can also customize controls using Mathematica's highly declarative interface language. Furthermore, Mathematica provides a fully automated interface-generating function called Manipulate.



Manipulate[operation[image, x],
    {x, 0, 9}, {operation, {CUDAErosion, CUDADilation}}]

By specifying possible ranges for the variables, Manipulate automatically chooses appropriate controls and creates a user interface around them.

Example of a user interface built with Manipulate.

Support for import and export


Mathematica natively supports hundreds of file formats and their subformats for importing and exporting. Supported formats include common image formats (JPEG, PNG, TIFF, BMP, etc.), video formats (AVI, MOV, H.264, etc.), audio formats (WAV, AU, AIFF, FLAC, etc.), medical imaging formats (DICOM), data formats (Excel, CSV, MAT, etc.), and various raw formats for further processing. Any supported data format is automatically converted to Mathematica's unified data representation, an expression, which can be used in all Mathematica functions, including CUDALink functions.

Not only does Mathematica provide access to local resources, but any URL can be used to access data online. The following code imports an image from a given URL:
image = Import["http://gallery.wolfram.com/2d/popup/00_contourMosaic.pop.jpg"];

The Import function automatically recognizes the file format and converts it into a Mathematica expression. The result can be used directly by CUDALink functions, such as CUDAImageAdd:

output = CUDAImageAdd[image, mask]  (* mask: the second image, shown inline in the original *)

All outputs from Mathematica functions, including the ones from CUDALink functions, are also expressions, and can be easily exported to one of the supported formats using the Export function. For example, the following code exports CUDA output into PNG format:
Export["masked.png", output]
masked.png

$ImportFormats and $ExportFormats give full lists of supported import and export formats.

OpenCL Compatibility
In addition to CUDALink, Mathematica supports OpenCL with the built-in package OpenCLLink, which provides the same benefits and functionality of GPU programming as CUDALink, built on the OpenCL architecture.

Mathematicas CUDALink Applications


In addition to support for user-defined CUDA functions and automatic compilation, CUDALink includes several ready-to-use functions that support NVIDIA's CUFFT (Fourier analysis) and CUBLAS (linear algebra) libraries and cover application areas such as image processing, financial derivatives, and random number generation.

Image Processing
CUDALink offers many image processing functions that have been carefully tuned for the GPU. These include pixel operations such as image arithmetic and composition; morphological operators such as erosion, dilation, opening, and closing; and image convolution and filtering. All of these operations work on either images or arrays of real and integer numbers.

Image convolution
CUDALink's convolution is similar to Mathematica's ListConvolve and ImageConvolve functions. It operates on images, lists, or CUDA memory references, and it can use Mathematica's built-in filters as the kernel.
CUDAImageConvolve[image, {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}}]

Convolving a microscopic image with a Sobel mask to detect edges.

CUDALink supports simple pixel operations on one or two images, such as adding or multiplying pixel values from two images.

CUDAImageMultiply[image1, image2]

Multiplication of two images.
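To make the convolution concrete, here is a plain-Python sketch of the 2D convolution that the Sobel example computes. This is illustrative CPU code written for this paper; it handles only the interior of the image and skips the border, unlike the GPU function.

```python
def convolve2d(img, kernel):
    """Valid-region 2D convolution of a grayscale image (list of lists)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    out = []
    for y in range(h - kh + 1):
        row = []
        for x in range(w - kw + 1):
            acc = 0
            for j in range(kh):
                for i in range(kw):
                    # convolution flips the kernel relative to correlation
                    acc += kernel[kh - 1 - j][kw - 1 - i] * img[y + j][x + i]
            row.append(acc)
        out.append(row)
    return out

sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# A vertical edge: dark on the left, bright on the right
img = [[0, 0, 10, 10]] * 4
print(convolve2d(img, sobel_x))
```

Each output value is the Sobel response at that pixel; large magnitudes mark edges. On the GPU, one thread computes one output pixel's weighted sum.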

Morphological operations
CUDALink supports fundamental operations such as erosion, dilation, opening, and closing. CUDAErosion, CUDADilation, CUDAOpening, and CUDAClosing are equivalent to Mathematica's built-in Erosion, Dilation, Opening, and Closing functions. More sophisticated morphological operations can be built using these fundamental operations.
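As intuition for these operators, here is a minimal pure-Python sketch of erosion and dilation over a 3x3 neighborhood, written for this paper as an illustration of what the functions compute, not of their implementation. Border pixels here use just the in-image part of the neighborhood, a simplification of the padding options real implementations offer.

```python
def morph(img, op):
    """Apply op (min = erosion, max = dilation) over each 3x3 neighborhood."""
    h, w = len(img), len(img[0])
    return [[op(img[y + j][x + i]
                for j in (-1, 0, 1) for i in (-1, 0, 1)
                if 0 <= y + j < h and 0 <= x + i < w)
             for x in range(w)]
            for y in range(h)]

def erosion(img):
    return morph(img, min)

def dilation(img):
    return morph(img, max)

def opening(img):
    # opening = erosion followed by dilation; closing is the reverse order
    return dilation(erosion(img))

img = [[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 1],   # the trailing 1 is a protruding "noise" pixel
       [0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0]]

print(erosion(img)[2][2])   # 1: a fully surrounded pixel survives erosion
print(opening(img)[2][4])   # 0: opening removed the protruding pixel
```

This is also why opening is a standard denoising step: thin protrusions vanish under erosion and are not restored by the following dilation.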


Video Processing
CUDALink's built-in image processing functions can also be applied to videos to perform real-time filtering. Many common formats, such as H.264, QuickTime, and DivX, are supported. With GPU computing power, CUDALink's video processing functions can easily handle full high-definition (1080p) video filtering at 30 frames per second.

Processing multiple video streams with different filters in real time.

Linear Algebra
You can perform various linear algebra operations on the GPU. Examples include vector addition, products, and other operations; finding minimum or maximum elements; and transposing the rows and columns of a matrix or image.

Fourier Analysis
The Fourier analysis capabilities of the CUDALink package include forward and inverse Fourier transforms. The CUDAFourier function can operate on 1D, 2D, or 3D lists of real or complex numbers, or on data held in a chunk of device memory that is registered with CUDALink's Fourier memory manager.
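The underlying transform is the standard discrete Fourier transform. Below is a naive O(n^2) pure-Python version using the symmetric 1/sqrt(n) normalization that Mathematica's Fourier uses by default; it is an illustration only, since CUFFT uses fast O(n log n) algorithms.

```python
from cmath import exp, pi, sqrt

def dft(data):
    """Naive O(n^2) discrete Fourier transform, 1/sqrt(n) normalization."""
    n = len(data)
    return [sum(data[r] * exp(2j * pi * r * k / n) for r in range(n)) / sqrt(n)
            for k in range(n)]

out = dft([1, 0, 0, 0])
print([round(c.real, 3) for c in out])   # an impulse has a flat spectrum
```

Because every output coefficient is an independent sum over the input, the transform maps naturally onto thousands of GPU threads, which is what makes FFT libraries such a good fit for CUDA.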



Fluid Dynamics
Computational fluid dynamics examples are included with CUDALink. They simulate a large number of particles in real time using the GPU's computing power.

Fluid simulation with many particles.

Financial Derivatives
CUDALink's option pricing function uses the binomial or Monte Carlo method, depending on the type of option selected. Computing options on the GPU can be dozens of times faster than using the CPU with parallel processing.
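As a sketch of the Monte Carlo approach, here is a minimal plain-Python pricer for a European call under geometric Brownian motion, with illustrative parameters chosen for this paper; a GPU version evaluates many such paths in parallel.

```python
import random
from math import exp, sqrt

def monte_carlo_call(s0, k, r, sigma, t, paths, seed=0):
    """European call price by simulating terminal prices under GBM."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * t
    vol = sigma * sqrt(t)
    payoff_sum = 0.0
    for _ in range(paths):
        s_t = s0 * exp(drift + vol * rng.gauss(0.0, 1.0))  # terminal price
        payoff_sum += max(s_t - k, 0.0)
    return exp(-r * t) * payoff_sum / paths   # discounted average payoff

# At-the-money call; the analytic Black-Scholes value is about 10.45
print(monte_carlo_call(100, 100, 0.05, 0.2, 1.0, 200_000))
```

Each path is independent, which is why Monte Carlo pricing parallelizes so well on the GPU; the only coordination needed is the final sum over payoffs.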

Volumetric Rendering
CUDALink includes functions to read and display volumetric data in 3D, with interactive interfaces for controlling transfer functions and other volume-rendering parameters.

Volumetric rendering of a medical image.



Pricing and Licensing Information


Wolfram Research offers many flexible licensing options for both organizations and individuals. You can choose a convenient, cost-effective plan for your workgroup, department, directorate, university, or just yourself, including network licensing for groups. Visit us online for more information: www.wolfram.com/mathematica-purchase

Summary
Thanks to Mathematica's integrated platform design, all functionality is included without the need to buy, learn, use, and maintain multiple tools and add-on packages. With its simplified development cycle, automatic memory management, multicore computing, and built-in functions for many applications, plus full integration with all of Mathematica's other computation, development, and deployment capabilities, Mathematica's built-in CUDALink package provides a powerful interface for GPU computing.

Recommended Next Steps

Request your free Mathematica trial version


www.wolfram.com/mathematica-trial

Request a technical demo:


www.wolfram.com/mathematica-demo

Learn more about CUDA programming in Mathematica:


U.S. and Canada: 1-800-WOLFRAM (965-3726), info@wolfram.com
Outside U.S. and Canada (except Europe and Asia): +1-217-398-0700, info@wolfram.com
Europe: +44-(0)1993-883400, info@wolfram.co.uk
Asia: +81-(0)3-3518-2880, info@wolfram.co.jp

© 2010 Wolfram Research, Inc. Mathematica is a registered trademark and Mathematica Player is a trademark of Wolfram Research, Inc. Wolfram|Alpha is a registered trademark and "computational knowledge engine" is a trademark of Wolfram Alpha LLC, a Wolfram Research Company. All other trademarks are the property of their respective owners. Mathematica is not associated with Mathematica Policy Research, Inc. or Mathtech, Inc. MKT3014 2060699 0910.hk


