
Compilation for GPU

Accelerated Ray Tracing in OptiX


Steven G. Parker

Tuesday, August 16, 2011

Overview

Lessons from history


OptiX overview
CUDA as a shading language
OptiX compiler
Results


RTSL: a Ray Tracing Shading Language
Steven G. Parker
Solomon Boulos
James Bigler
Austin Robison
University of Utah

Foundation
Build on OpenGL Shading Language (GLSL)
Simple, clean, C-like syntax
Simple type system
float, int, bool
small vectors (2-4 elements)
small matrices (float only)
structs, fixed-length arrays, no pointers

Convenient element access syntax:


v1.xy = v2.yx; v3.xyz = v4.yyy;

Graphics oriented operators, functions


New functions
Convenience:
dominant axis of vector
vector perpendicular to vector
range test
horizontal min/max

Additional functionality:
random numbers
matrix inversion
overloaded pow for integer exponent

Ray Tracing:
trace(): Recursive ray tracing call
hit(): Report possible intersection with primitive


Deviation from GLSL


Object model with inheritance:
class Sphere : rt_Primitive;

Reference type
Different variable qualifiers
GLSL: attribute, uniform, varying, const
RTSL: public, private, scratch, const

First class color type (not always 3/4 components)


Computational Model

[Figure: RTSL computational model. Labeled blocks: Render Loop, Camera, Scene Traversal, Primitive, Material, Texture, Light, Lighting Engine, Frame Buffer or Parent Ray; the diagram marks which blocks belong to RTSL and which to the Rendering Engine. Labeled data flows: ScreenCoord, LensCoord; RayOrigin, RayDirection; HitDistance; GeometricNormal, ShadingNormal, HitPoint; TextureUV/UVW; TextureColor; LightColor, LightDirection, LightDistance; Point, Normal; Shadow Rays; Secondary Rays; SampleColor.]

OptiX Goals

Make GPU ray tracing simpler


Function in a resource limited device
Achieve high performance
Able to express most ray tracing algorithms
Algorithm agnostic

User defined payloads


Programmable intersection
Interoperate with rasterization pipeline

Don't require a new language

Leverage CUDA architecture and existing compiler infrastructure


Life of a ray

1 Ray Generation: Pinhole Camera
2 Intersection: Ray-Sphere Intersection
3 Shading: Lambertian Shading
Payload: float3 color

Life of a ray
1 Pinhole Camera

RT_PROGRAM void pinhole_camera()
{
  float2 d = make_float2(launch_index) / make_float2(launch_dim) * 2.f - 1.f;
  float3 ray_origin = eye;
  float3 ray_direction = normalize(d.x*U + d.y*V + W);
  optix::Ray ray = optix::make_Ray(ray_origin, ray_direction,
                                   radiance_ray_type, scene_epsilon,
                                   RT_DEFAULT_MAX);
  PerRayData_radiance prd;
  rtTrace(top_object, ray, prd);
  output_buffer[launch_index] = make_color( prd.result );
}

Ray-Sphere Intersection

RT_PROGRAM void intersect_sphere()
{
  float3 O = ray.origin - center;
  float3 D = ray.direction;
  float b = dot(O, D);
  float c = dot(O, O)-radius*radius;
  float disc = b*b-c;
  if(disc > 0.0f){
    float sdisc = sqrtf(disc);
    float root1 = (-b - sdisc);
    bool check_second = true;
    if( rtPotentialIntersection( root1 ) ) {
      shading_normal = geometric_normal = (O + root1*D)/radius;
      if(rtReportIntersection(0))
        check_second = false;
    }
    if(check_second) {
      float root2 = (-b + sdisc);
      if( rtPotentialIntersection( root2 ) ) {
        shading_normal = geometric_normal = (O + root2*D)/radius;
        rtReportIntersection(0);
      }
    }
  }
}
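
The same quadratic can be tried on the host without OptiX. The sketch below is a plain C++ stand-in, not OptiX code: `Vec3`, `intersect_sphere` with this signature, and the explicit `tmin`/`tmax` range check (used here in place of `rtPotentialIntersection`) are assumptions made for illustration. Like the slide code, it assumes a normalized ray direction.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Returns true and stores the nearest hit distance inside (tmin, tmax) in t_hit.
// dir is assumed to be normalized, so the quadratic's leading coefficient is 1.
bool intersect_sphere(Vec3 origin, Vec3 dir, Vec3 center, float radius,
                      float tmin, float tmax, float& t_hit)
{
    assert(radius > 0.0f);
    Vec3 O = sub(origin, center);
    float b = dot(O, dir);
    float c = dot(O, O) - radius * radius;
    float disc = b * b - c;
    if (disc <= 0.0f) return false;          // ray misses (or grazes) the sphere

    float sdisc = std::sqrt(disc);
    float roots[2] = { -b - sdisc, -b + sdisc };  // near root first, then far root
    for (float root : roots) {
        // Hypothetical stand-in for rtPotentialIntersection: a plain range check.
        if (root > tmin && root < tmax) {
            t_hit = root;
            return true;
        }
    }
    return false;
}
```

For a ray starting at (0, 0, -5) and pointing along +z toward a unit sphere at the origin, the near root is 4; a ray starting inside the sphere falls through to the far root.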

Lambertian Shading

RT_PROGRAM void closest_hit_radiance3()
{
  float3 world_geo_normal = normalize( rtTransformNormal( RT_OBJECT_TO_WORLD, geometric_normal ) );
  float3 world_shade_normal = normalize( rtTransformNormal( RT_OBJECT_TO_WORLD, shading_normal ) );
  float3 ffnormal = faceforward( world_shade_normal, -ray.direction, world_geo_normal );
  float3 color = Ka * ambient_light_color;
  float3 hit_point = ray.origin + t_hit * ray.direction;
  for(int i = 0; i < lights.size(); ++i) {
    BasicLight light = lights[i];
    float3 L = normalize(light.pos - hit_point);
    float nDl = dot( ffnormal, L);
    if( nDl > 0.0f ){
      // cast shadow ray
      PerRayData_shadow shadow_prd;
      ...

Program objects (shaders)

Input language is based on CUDA C/C++
No new language to learn
Powerful language features available immediately (templates, function overloading, default values, ...)
Can also take raw PTX as input
Caveat: still need to use it responsibly to get performance

Data associated with ray is programmable

Interconnection of programs defines the outcome


Abstract Execution Model

[Figure: execution flow. Launch: rtContextLaunch invokes the Ray Generation Program, with an Exception Program alongside. rtTrace leads to Traverse and Shade. Traverse: Node Graph Traversal (Selector Visit Program) and Acceleration Traversal (Intersection Program, Any Hit Program). Shade: Closest Hit Program or Miss Program.]

Programmable Operations

Rasterization: Fragment, Vertex, Geometry, Hull/Domain (Tessellation)
Ray Tracing: Closest Hit, Any Hit, Intersection, Selector, Ray Generation, Miss, Exception

Flexible intersection
Primary Ray

Intersection (miss)
Intersection (hit)
Any Hit
Intersection (miss)
Intersection (hit)
Any hit (ignore intersection)
Closest hit


Flexible intersection
Shadow Ray

Intersection (hit)
Any Hit (ignore intersection)
Intersection (hit)
Any hit (terminate ray)
Closest hit

Geometry representation

[Figure: a Geometry Instance node linked to a Geometry node (with its Intersection program) and to several Material nodes, each holding Closest Hit and Any Hit programs for ray types 0 and 1.]

Geometry instance binds geometry to one or more materials

Closest Hit/Any Hit pairs per ray type (user defined)

Scene representation

[Figure: scene graph. Context at the top; a Group with an Acceleration; Transform nodes; a Geometry Group with an Acceleration; Geometry Instances, each linked to a Geometry (Intersection program) and to Materials (Closest Hit and Any Hit programs for ray types 0 and 1).]

All objects can have multiple parents (instancing)
Geometry data
Materials
Acceleration structures

Tree may have multiple roots
Separate objects for shadow rays

Not shown: Selector
Programmable (LOD, switch, etc).

Object variable inheritance

[Figure: the scene graph from the previous slide (Context, Group, Transform, Acceleration, Geometry Group, Geometry Instance, Geometry with Intersection program, Material with Closest Hit and Any Hit programs), with a "Lights" variable shown attached at the Context and at a Material.]

Material parameters can be attached to context (global), geometry instance or material

Variables are one of:
A small primitive type (float4, matrix, ...)
A small user defined type
A handle to a buffer (1D, 2D, 3D)
A texture

CUDA as a shading language

Shading language for OptiX is a restricted subset of the CUDA device code

Just C++
A few conventions for accessing runtime (object model, trace functions, etc.)
No data management functionality required from CUDA
Some CUDA functionality disallowed (shared memory, barriers, etc.)

Use CUDA compiler (nvcc)

Produces PTX


Intersection program
Determines if/where ray hits an object
Sets attributes (normal, texture coordinates)
Used by closest hit shader for shading

Selects which material to use
Used for
Programmable surfaces
Allowing arbitrary triangle buffer formats
Etc.

rtDeclareVariable(float3, p0, , );
rtDeclareVariable(float3, p1, , );
rtDeclareVariable(float3, p2, , );
rtDeclareVariable(float3, geometric_normal, attribute geometric_normal, );
rtDeclareVariable(float3, shading_normal, attribute shading_normal, );
rtDeclareVariable(optix::Ray, ray, rtCurrentRay, );

RT_PROGRAM void triangle_intersect(int)
{
  // Intersect ray with triangle
  float3 e0 = p1 - p0;
  float3 e1 = p0 - p2;
  float3 n  = cross( e0, e1 );
  float v   = dot( n, ray.direction );
  float r   = 1.0f / v;
  float3 e2 = p0 - ray.origin;
  float va  = dot( n, e2 );
  float t   = r*va;
  if(t < ray.tmax && t > ray.tmin) {
    float3 i   = cross( e2, ray.direction );
    float v1   = dot( i, e1 );
    float beta = r*v1;
    if(beta >= 0.0f){
      float v2 = dot( i, e0 );
      float gamma = r*v2;
      if( (v1+v2)*v <= v*v && gamma >= 0.0f ) {
        if( rtPotentialIntersection( t ) ) {
          shading_normal = geometric_normal = -n;
          rtReportIntersection( 0 );
        }
      }
    }
  }
}
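
To check the arithmetic of the triangle test outside OptiX, the following host-side C++ sketch repeats the same steps. `Vec3`, the helper functions, and the `intersect_triangle` signature are hypothetical names introduced for illustration; the parallel-ray guard is an addition that the slide code does not contain.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b)
{
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Same steps as the slide's triangle_intersect, with the ray and the accepted
// distance range passed in explicitly. On a hit, writes the distance and the
// two barycentric weights (beta, gamma) and returns true.
bool intersect_triangle(Vec3 p0, Vec3 p1, Vec3 p2, Vec3 origin, Vec3 dir,
                        float tmin, float tmax,
                        float& t_hit, float& beta_out, float& gamma_out)
{
    assert(tmin < tmax);
    Vec3 e0 = sub(p1, p0);
    Vec3 e1 = sub(p0, p2);
    Vec3 n = cross(e0, e1);            // unnormalized plane normal (negated)
    float v = dot(n, dir);
    if (v == 0.0f) return false;       // added guard: ray parallel to the plane
    float r = 1.0f / v;
    Vec3 e2 = sub(p0, origin);
    float t = r * dot(n, e2);          // distance to the triangle's plane
    if (!(t < tmax && t > tmin)) return false;

    Vec3 i = cross(e2, dir);
    float v1 = dot(i, e1);
    float beta = r * v1;
    if (beta < 0.0f) return false;
    float v2 = dot(i, e0);
    float gamma = r * v2;
    // (v1 + v2) * v <= v * v is beta + gamma <= 1 without the extra divisions.
    if (!((v1 + v2) * v <= v * v && gamma >= 0.0f)) return false;

    t_hit = t;
    beta_out = beta;
    gamma_out = gamma;
    return true;
}
```

For the triangle (0,0,0), (1,0,0), (0,1,0) and a ray from (0.25, 0.25, -1) along +z, the hit distance is 1 and both weights are 0.25; moving the origin to (2, 2, -1) fails the `beta + gamma <= 1` test.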

rtBuffer<Vertex> vertex_buffer;
rtBuffer<uint3> index_buffer;
rtDeclareVariable(float3, geometric_normal, attribute geometric_normal, );
rtDeclareVariable(float3, shading_normal, attribute shading_normal, );
rtDeclareVariable(optix::Ray, ray, rtCurrentRay, );

RT_PROGRAM void mesh_intersect(int primIdx)
{
  uint3 v_idx = index_buffer[primIdx];
  float3 p0 = vertex_buffer[v_idx].p0;
  float3 p1 = vertex_buffer[v_idx].p1;
  float3 p2 = vertex_buffer[v_idx].p2;
  // Intersect ray with triangle
  float3 e0 = p1 - p0;
  float3 e1 = p0 - p2;
  float3 n  = cross( e0, e1 );
  float v   = dot( n, ray.direction );
  float r   = 1.0f / v;
  ...
}

Closest hit program (traditional shader)
Defines what happens when a ray hits an object
Executed for nearest intersection (closest hit) along a ray
Automatically performs deferred shading (shade once per ray)
Can recursively shoot more rays
Shadows
Reflections
Ambient occlusion

Most common

Ray Payloads
Can define arbitrary data with the ray
Sometimes called the "per ray data"
Data can be passed down or up the ray tree (or both)
Just a user-defined struct accessed by all shader programs
Varies per ray type

[Figure: example payloads: Attenuation; Color, Depth, importance.]

struct PerRayData_radiance
{
  float3 result;
};
rtDeclareVariable(PerRayData_radiance, prd_radiance, rtPayload,);
rtDeclareVariable(float3, shading_normal, attribute shading_normal,);
RT_PROGRAM void closest_hit_radiance()
{
  float3 worldnormal = normalize(rtTransformNormal(RT_OBJECT_TO_WORLD,
                                                   shading_normal));
  prd_radiance.result = worldnormal * 0.5f + 0.5f;
}

Normal shader


for(int i = 0; i < lights.size(); ++i) {
  BasicLight light = lights[i];
  float3 L = normalize(light.pos - hit_point);
  float nDl = dot( ffnormal, L);
  if( nDl > 0.0f ){
    // cast shadow ray
    PerRayData_shadow shadow_prd;
    shadow_prd.attenuation = 1.0f;
    float Ldist = length(light.pos - hit_point);
    Ray shadow_ray = make_Ray( hit_point, L, shadow_ray_type, scene_epsilon, Ldist );
    rtTrace(top_shadower, shadow_ray, shadow_prd);
    float light_attenuation = shadow_prd.attenuation;
    if( light_attenuation > 0.0f ){
      float3 Lc = light.color * light_attenuation;
      color += Kd * nDl * Lc;
    }
  }
}

rtTextureSampler<float4, 2> envmap;
rtDeclareVariable(optix::Ray, ray, rtCurrentRay, );
rtDeclareVariable(PerRayData_radiance, prd_radiance, rtPayload, );

RT_PROGRAM void envmap_miss()
{
  float theta = atan2f( ray.direction.x, ray.direction.z );
  float phi   = M_PIf * 0.5f - acosf( ray.direction.y );
  float u     = (theta + M_PIf) * (0.5f * M_1_PIf);
  float v     = 0.5f * ( 1.0f + sin(phi) );
  prd_radiance.result = make_float3( tex2D(envmap, u, v) );
}

Environment Maps
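
The direction-to-texture-coordinate mapping used by the miss program can be checked in isolation. The sketch below reproduces only that arithmetic in host C++; `UV` and `envmap_uv` are hypothetical names, and the texture fetch itself is left out.

```cpp
#include <cassert>
#include <cmath>

struct UV { float u, v; };

// Maps a normalized direction (dx, dy, dz) to latitude-longitude texture
// coordinates with the same formulas as the miss program.
UV envmap_uv(float dx, float dy, float dz)
{
    const float pi = 3.14159265358979323846f;
    // acos needs dy in [-1, 1], so the direction is expected to be unit length.
    assert(std::fabs(dx * dx + dy * dy + dz * dz - 1.0f) < 1e-3f);

    float theta = std::atan2(dx, dz);            // azimuth in [-pi, pi]
    float phi = pi * 0.5f - std::acos(dy);       // elevation in [-pi/2, pi/2]
    float u = (theta + pi) * (0.5f / pi);        // azimuth remapped to [0, 1]
    float v = 0.5f * (1.0f + std::sin(phi));     // equals 0.5 * (1 + dy)
    return {u, v};
}
```

Since sin(pi/2 - acos(y)) = y, the vertical coordinate reduces to 0.5 * (1 + dy): straight up maps to v = 1, the horizon to v = 0.5.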


rtDeclareVariable(uint2, launch_index, rtLaunchIndex,);

RT_PROGRAM void accumulation_camera()
{
  ...
  rtTrace(top_object, ray, prd);
  float4 acc_val = accum_buffer[launch_index];
  if( frame > 0 )
    acc_val += make_float4(prd.result, 0.f);
  else
    acc_val = make_float4(prd.result, 0.f);
  output_buffer[launch_index] = make_color( make_float3(acc_val) * 1.f/(float(frame+1)) );
  accum_buffer[launch_index] = acc_val;
}

Accumulation Camera
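
The accumulation logic is a running sum that is divided by the frame count for display. A minimal host-side sketch, assuming one float per pixel instead of a float4 and a hypothetical `Accumulator` type that keeps its own frame counter (on the slide, `frame` is a variable supplied from outside the program):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Accumulator {
    std::vector<float> sum;  // running sum per pixel (accum_buffer on the slide)
    int frame;               // number of frames accumulated so far

    explicit Accumulator(std::size_t pixels) : sum(pixels, 0.0f), frame(0) {}

    // Adds one new sample per pixel and returns the values to display,
    // i.e. the average of all samples seen so far.
    std::vector<float> add_frame(const std::vector<float>& samples)
    {
        assert(samples.size() == sum.size());
        std::vector<float> display(sum.size());
        for (std::size_t i = 0; i < sum.size(); ++i) {
            // Frame 0 overwrites the buffer, later frames add to it.
            sum[i] = (frame > 0) ? sum[i] + samples[i] : samples[i];
            display[i] = sum[i] / float(frame + 1);
        }
        ++frame;
        return display;
    }
};
```

Feeding the samples 1, 3, 5 to a single pixel displays 1, then 2, then 3. Resetting `frame` to 0 (for example after a camera move) restarts the average without clearing the buffer.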


Under the hood

rtDeclareVariable(optix::Ray, ray, rtCurrentRay, );

.global .align 16 .b8 ray[36];
.global .align 4 .b8 _ZN21rti_internal_typeinfo3rayE[8] = {82,97,121,0,36,0,0,0};
.global .align 1 .b8 _ZN21rti_internal_typename3rayE[11] = {0x6f,0x70,0x74,0x69,0x78,0x3a,0x3a,0x52,0x61,0x79,0x0};
.global .align 1 .b8 _ZN21rti_internal_semantic3rayE[13] = {0x72,0x74,0x43,0x75,0x72,0x72,0x65,0x6e,0x74,0x52,0x61,0x79,0x0};
.global .align 1 .b8 _ZN23rti_internal_annotation3rayE[1] = {0x0};
...
ld.global.v4.f32 {%f30,%f31,%f32,%f33}, [ray+0];
ld.global.v2.f32 {%f38,%f39}, [ray+16];

mov %f30, ray_ox;
mov %f31, ray_oy;
mov %f33, ray_oz;
...

Under the hood

int3 vertex = vindex_buffer[primIdx];

ld.param.s32 %r1, [__cudaparm__Z14mesh_intersecti_primIdx];
cvt.s64.s32 %rd1, %r1;
mov.u64 %rd2, vindex_buffer;
mov.u32 %r2, 1;
mov.u32 %r4, 12;
mov.u64 %rd5, 0;
mov.u64 %rd7, 0;
mov.u64 %rd9, 0;
call (%rd11), _rt_buffer_get_64, (%rd2, %r2, %r4, %rd1, %rd5, %rd7, %rd9);
ld.global.s32 %r6, [%rd12+0];
ld.global.s32 %r7, [%rd12+4];
ld.global.s32 %r12, [%rd12+8];

Extra credit: data paging


Execution models
OptiX pipeline does not completely specify an execution model
Sequential consistency within a thread (usually pixel/sample)
No consistency between threads
No ordering guarantees for intersection tests
Side-effects limited to output buffers and ray payloads

Many possible execution models
Allows for HW/SW innovation

1. Map launchIndex -> CUDA thread
2. Sequential execution within each thread

1. Map launchIndex -> CUDA thread
2. Explicitly manage divergence in a MIMD state machine

1. Map launchIndex -> CUDA thread
2. Explicitly manage divergence in a MIMD state machine
3. Defer work when data not present

1. Initiate rays in large-ish batches
2. Sort or scan for state and data coherence
3. Process batches until done

1. Initiate rays to fill warp
2. Manage on-chip queues to process each state

Compilation to state machine

Input program:

RT_PROGRAM void pinhole_camera()
{
  Ray ray = make_ray();
  PerRayData_radiance prd;
  rtTrace(top_object, ray, prd);
  output_buffer[index] = make_color( prd.result );
}

After the OptiX Just-in-Time Compiler:

RT_PROGRAM void pinhole_camera()
{
  // State 1
  Ray ray = make_ray();
  PerRayData_radiance prd;
  save prd, index;
  rtTrace(top_object, ray, prd);
  // State 2
  restore prd, index;
  output_buffer[index] = make_color( prd.result );
}

OptiX Just-in-Time Compiler
Inserts continuations
Transforms to state machine
Rewrites variable load/store for object model
Inlines intrinsic functions

Step 1: Compile to PTX

for( int i = 0; i < 5; ++i ) {
  Ray ray = make_Ray( make_float3( i, 0, 0 ),
                      make_float3( 0, 0, 1 ),
                      0, 1e-4f, 1e20f );
  UserPayloadStruct payload;
  rtTrace( top_object, ray, payload );
}

nvcc output:

ld.global.u32 %node, [top_object+0];
mov.s32 %i, 0;
loop:
call _rt_trace, ( %node, %i, 0, 0, 0, 0, 1, 0, 1e-4f, 1e20f, payload );
add.s32 %i, %i, 1;
mov.u32 %iend, 5;
setp.ne.s32 %predicate, %i, %iend;
@%predicate bra loop;

Step 2: Insert continuations

OptiX output:

ld.global.u32 %node, [top_object+0];
mov.s32 %i, 0;
loop:
mov payload, %stack;
save %i, %iend, %node;
call _rt_trace, ( %node, %i, 0, 0, 0, 0, 1, 0, 1e-4f, 1e20f, payload );
restore %i, %iend, %node;
add.s32 %i, %i, 1;
mov.u32 %iend, 5;
setp.ne.s32 %predicate, %i, %iend;
@%predicate bra loop;

Step 3: Apply optimizations

OptiX output:

ld.const.u32 %node, [top_object+0];
mov.s32 %i, 0;
loop:
mov payload, %stack;
save %i;
call _rt_trace, ( %node, %i, 0, 0, 0, 0, 1, 0, 1e-4f, 1e20f, payload );
restore %i;
rematerialize %iend, %node;
add.s32 %i, %i, 1;
mov.u32 %iend, 5;
setp.ne.s32 %predicate, %i, %iend;
@%predicate bra loop;

Step 4: Transform to state machine

OptiX output:

state 1:
ld.const.u32 %node, [top_object+0];
mov.s32 %i, 0;
loop:
mov payload, %stack;
save %i;
bra mainloop;
state 2:
restore %i;
rematerialize %iend, %node;
add.s32 %i, %i, 1;
mov.u32 %iend, 5;
setp.ne.s32 %predicate, %i, %iend;
@%predicate bra loop;
bra mainloop;
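
The effect of this step can be imitated in ordinary C++ with a `switch` inside a loop. The sketch below is an illustration under stated assumptions, not OptiX's generated code: the `Thread` struct, the explicit `STATE_TRACE` state standing in for traversal and shading, and the `run` function are all hypothetical. It keeps the three ideas from the PTX listing: the loop counter is saved before the trace and restored afterwards, the loop bound is rematerialized rather than saved, and control returns to a main loop at every state boundary.

```cpp
#include <cassert>
#include <vector>

enum State { STATE_1, STATE_2, STATE_TRACE, STATE_DONE };

struct Thread {
    State state = STATE_1;       // state to execute next
    State resume = STATE_DONE;   // continuation: state to enter after the trace
    std::vector<int> stack;      // holds values saved across a trace
    int traces = 0;              // number of rays traced so far
};

// Runs one thread to completion and returns how many rays it traced.
int run(Thread& th)
{
    int i = 0;  // models a register; its value does not survive a trace
    while (th.state != STATE_DONE) {       // "mainloop"
        switch (th.state) {
        case STATE_1:
            i = 0;
            th.stack.push_back(i);         // save %i
            th.resume = STATE_2;
            th.state = STATE_TRACE;        // bra mainloop
            break;
        case STATE_TRACE:
            ++th.traces;                   // stand-in for traversal and shading
            i = -1;                        // clobber the register on purpose
            th.state = th.resume;
            break;
        case STATE_2: {
            assert(!th.stack.empty());
            i = th.stack.back();           // restore %i
            th.stack.pop_back();
            const int iend = 5;            // rematerialized, never saved
            ++i;
            if (i != iend) {               // loop again: save and trace
                th.stack.push_back(i);
                th.resume = STATE_2;
                th.state = STATE_TRACE;
            } else {
                th.state = STATE_DONE;
            }
            break;
        }
        case STATE_DONE:
            break;
        }
    }
    return th.traces;
}
```

Running a fresh `Thread` traces five rays, matching the original `for` loop, even though the register `i` is overwritten during every trace.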

Step 5: Restore structured control flow

OptiX output:

state 1:
ld.const.u32 %node, [top_object+0];
mov.s32 %i, 0;
loop:
mov payload, %stack;
save %i;
bra mainloop;
loop_copy:
mov payload, %stack;
save %i;
bra mainloop;
state 2:
restore %i;
rematerialize %iend, %node;
add.s32 %i, %i, 1;
mov.u32 %iend, 5;
setp.ne.s32 %predicate, %i, %iend;
@%predicate bra loop_copy;

Graphical View

[Figure: control-flow graphs for three stages. Initial: begin, loop. Transformed State Machine: begin, loop part 1, loop part 2. Restored State Machine: begin, loop part 1, loop part 2, loop part 1.]

Other optimizations

JIT Compiler gives opportunity for data-dependent optimizations

Elide unused transforms: up to 7%


Eliminate printing/exception code when not enabled: arbitrary
Reduce continuation size with rematerialization: arbitrary
Specialize traversal based on tree characteristics: 10-15%
Move small read-only data to textures or constant memory: up to 29%

Traditional compiler optimizations

Loop hoisting, dead-code elimination, copy propagation, etc.

Architecture-dependent optimizations

Different load-balancing, scheduling, traversal


Execution

Goal: execute efficiently on a GPU


Must manage execution coherence within a warp (32 threads)
Must manage data coherence
Minimize local state
Minimize context switch overhead

Strategy: compile to a megakernel


All programs (traversal, shading, intersection)
Utilize dynamic load-balancing
Some divergence unavoidable

Key observation
Although threads may temporarily diverge, they return to frequently used states


Fine-grained scheduling

[Figure: timelines of four threads (A to D) stepping through states 1, 2 and 3 under three schedules. MIMD Schedule: 13 iterations, 88% occupancy. Naive SIMD Schedule: 22 iterations, 53% occupancy. Prioritized SIMD Schedule: 14 iterations, 82% occupancy.]
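
How the choice of state per iteration affects iteration count and occupancy can be explored with a small simulation. The sketch below is a toy model, not the OptiX scheduler: each thread is a fixed list of state ids, one state executes per iteration, and only threads waiting on that state advance. The two policies (`LowestState`, `MostThreads`) and all names are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

enum class Policy { LowestState, MostThreads };

struct ScheduleResult {
    int iterations;     // SIMD steps needed until every thread has finished
    double occupancy;   // fraction of thread slots that did useful work
};

ScheduleResult simulate(const std::vector<std::vector<int> >& threads, Policy policy)
{
    assert(!threads.empty());
    std::vector<std::size_t> pc(threads.size(), 0);  // next position per thread
    int iterations = 0;
    long work = 0;

    for (;;) {
        // Count how many unfinished threads wait on each state.
        std::map<int, int> waiting;
        for (std::size_t t = 0; t < threads.size(); ++t)
            if (pc[t] < threads[t].size()) ++waiting[threads[t][pc[t]]];
        if (waiting.empty()) break;

        // Default choice: the lowest state id that has a waiting thread.
        int chosen = waiting.begin()->first;
        int best = waiting.begin()->second;
        if (policy == Policy::MostThreads) {
            // Prefer the state with the most waiting threads (ties: lowest id).
            for (const auto& kv : waiting)
                if (kv.second > best) { chosen = kv.first; best = kv.second; }
        }

        // Only threads waiting on the chosen state advance in this iteration.
        for (std::size_t t = 0; t < threads.size(); ++t)
            if (pc[t] < threads[t].size() && threads[t][pc[t]] == chosen) {
                ++pc[t];
                ++work;
            }
        ++iterations;
    }

    double slots = double(iterations) * double(threads.size());
    return { iterations, slots > 0.0 ? double(work) / slots : 0.0 };
}
```

For the three threads {1,1,1,2}, {2}, {2}, the lowest-state policy finishes in 4 iterations at 50% occupancy, while the most-threads policy needs 5 iterations at 40%. Neither toy policy is better in general; the point is only that the result depends on which state is picked each iteration.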

[Figure: per-state bar charts for the megakernel states (builtin functions, pinhole_camera, miss, intersect, closest_hit_radiance, any_hit_shadow, traverse_geometry), vertical axis from 0.0E+00 to 3.5E+06. Panels: Divergence Before and After; Time Before and After.]

Raw Traversal Performance

[Figure: bar charts comparing Aila-Laine and OptiX on GTX285 and GTX480 for Conference, Fairy Forest and Sibenik, each with primary and AO rays; horizontal axis from 50 to 250.]

System performance (ISPM)

[Figure: "Frog Performance", frame time in ms (axis 30 to 150), broken down into OpenGL local, Trace, OpenGL Gather and I/O, for OptiX on GTX480 versus McGuire-Luebke on Quad Core2 DUO.]

3.5X ray tracing speedup
2.5X net speedup

OptiX Examples

ISPM: Caustics
ISPM: Diffuse Interreflection, 2.8X
ISPM: Color Bleeding, 3.0X

OptiX SDK Samples

Cook: Motion Blur and Depth of Field
14 programs
30 states
4648 bytes constant data

Photon Map: Progressive Photon Mapping
14 programs
30 states
1416 bytes constant data

Path tracer: Forward Path Tracing
11 programs
21 states
2100 bytes constant data

OptiX SDK Samples

Sample6: Ambient occlusion and .obj loading
10 programs
20 states
648 bytes constant data

Julia: Fractals using custom intersection
20 programs
37 states
844 bytes constant data

Tutorial: Programmable shading and intersection
17 programs
41 states
1120 bytes constant data

OptiX SDK Samples

Whirligig: Animated scenes
14 programs
41 states
1120 bytes constant data

Whitted: Interactive reconstruction of 1980 paper
20 programs
57 states
1072 bytes constant data

Collision: Line of sight and collision detection with ray casting
13 programs
23 states
552 bytes constant data

OptiX Examples

Mandelbulb: Fractal intersection

Path tracer: Reconstruction of 1986 Kajiya paper

Design Garage: Progressive path tracing
28 programs
66 states
36444 bytes constant data


Questions?
