Overview
Foundation
Built on OpenGL Shading Language (GLSL)
Simple, clean, C-like syntax
Simple type system
float, int, bool
small vectors (2-4 elements)
small matrices (float only)
structs, fixed-length arrays, no pointers
New functions
Convenience:
dominant axis of vector
vector perpendicular to vector
range test
horizontal min/max
Additional functionality:
random numbers
matrix inversion
overloaded pow for integer exponent
Ray Tracing:
trace(): Recursive ray tracing call
hit(): Report possible intersection with primitive
Reference type
Different variable qualifiers
GLSL: attribute, uniform, varying, const
RTSL: public, private, scratch, const
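The convenience functions listed above can be sketched in plain C++. The source gives only their descriptions, so every name and signature below (`Vec3`, `dominantAxis`, `perpendicularTo`, `inRange`, `hmin`, `hmax`) is a hypothetical stand-in, not RTSL's actual spelling.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

// Hypothetical 3-component vector; RTSL's built-in vector types are not shown in the source.
using Vec3 = std::array<float, 3>;

// Dominant axis: index (0, 1 or 2) of the component with the largest magnitude.
inline int dominantAxis(const Vec3& v) {
    int axis = 0;
    for (int i = 1; i < 3; ++i)
        if (std::fabs(v[i]) > std::fabs(v[axis])) axis = i;
    return axis;
}

// Some vector perpendicular to v (not normalized, non-zero for non-zero v).
inline Vec3 perpendicularTo(const Vec3& v) {
    if (std::fabs(v[0]) > std::fabs(v[2]))
        return Vec3{-v[1], v[0], 0.0f};
    return Vec3{0.0f, -v[2], v[1]};
}

// Range test: true when lo <= x <= hi.
inline bool inRange(float x, float lo, float hi) { return lo <= x && x <= hi; }

// Horizontal min/max: reduce across the components of one vector.
inline float hmin(const Vec3& v) { return std::min(v[0], std::min(v[1], v[2])); }
inline float hmax(const Vec3& v) { return std::max(v[0], std::max(v[1], v[2])); }
```

For `v = (1, -5, 2)` the dominant axis is 1 (the y component), and `perpendicularTo(v)` has a zero dot product with `v`.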
Computational Model
[Figure: RTSL computational model. Boxes: Rendering Engine (Render Loop, Scene Traversal, Lighting Engine, Frame Buffer or Parent Ray) and the RTSL classes Camera, Primitive, Material, Texture and Light. Values on the connecting arrows: ScreenCoord, LensCoord; RayOrigin, RayDirection; HitPoint, HitDistance; GeometricNormal, ShadingNormal; TextureUV/UVW, TextureColor; Point, Normal; LightColor, LightDirection, LightDistance; SampleColor; Secondary Rays; Shadows.]
Tuesday, August 16, 2011
OptiX Goals
Life of a ray
1 Ray Generation: Pinhole Camera
2 Intersection: Ray-Sphere Intersection
3 Shading: Lambertian Shading
Payload: float3 color
    PerRayData_radiance prd;
    rtTrace(top_object, ray, prd);
    output_buffer[launch_index] = make_color( prd.result );
}
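The three stages in the life of a ray (ray generation, intersection, shading) can be sketched on the CPU in plain C++. This is not OptiX code: `pinholeCamera`, `intersectSphere`, `lambertian` and `tracePixel` are hypothetical names, and the scene (a unit sphere at z = -3 lit from +z) is an assumption chosen to keep the arithmetic easy to follow.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 normalize(Vec3 v) { return v * (1.0f / std::sqrt(dot(v, v))); }

struct Ray { Vec3 origin, direction; };
struct Payload { Vec3 color; };  // mirrors "Payload: float3 color"

// 1. Ray generation: pinhole camera at the origin looking down -z.
//    u and v are screen coordinates in [-1, 1].
inline Ray pinholeCamera(float u, float v) {
    return Ray{Vec3{0.0f, 0.0f, 0.0f}, normalize(Vec3{u, v, -1.0f})};
}

// 2. Intersection: nearest positive hit distance along a unit-length ray
//    direction, or -1 when the ray misses the sphere.
inline float intersectSphere(const Ray& r, Vec3 center, float radius) {
    Vec3 oc = r.origin - center;
    float b = dot(oc, r.direction);
    float c = dot(oc, oc) - radius * radius;
    float disc = b * b - c;
    if (disc < 0.0f) return -1.0f;
    float s = std::sqrt(disc);
    float t = -b - s;
    if (t > 0.0f) return t;
    t = -b + s;
    return t > 0.0f ? t : -1.0f;
}

// 3. Shading: Lambertian term, albedo scaled by the clamped cosine.
inline Vec3 lambertian(Vec3 normal, Vec3 toLight, Vec3 albedo) {
    return albedo * std::max(0.0f, dot(normal, toLight));
}

// Glue code playing the role of the ray generation program plus rtTrace.
inline Payload tracePixel(float u, float v) {
    const Vec3 center{0.0f, 0.0f, -3.0f};
    Ray ray = pinholeCamera(u, v);
    Payload prd{Vec3{0.0f, 0.0f, 0.0f}};  // a miss leaves the pixel black
    float t = intersectSphere(ray, center, 1.0f);
    if (t > 0.0f) {
        Vec3 hit = ray.origin + ray.direction * t;
        Vec3 n = normalize(hit - center);
        prd.color = lambertian(n, Vec3{0.0f, 0.0f, 1.0f}, Vec3{1.0f, 1.0f, 1.0f});
    }
    return prd;
}
```

The centre pixel `tracePixel(0, 0)` hits the sphere head-on and is fully lit, while the corner pixel `tracePixel(1, 1)` misses and stays black.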
[Figure: OptiX control flow. Entry points: rtContextLaunch, rtTrace. Stages: Traverse, Shade. Traversal boxes: Node Graph Traversal, Acceleration Traversal. Programs: Exception Program, Miss Program, Selector Visit Program, Closest Hit Program, Intersection Program, Any Hit Program.]
Programmable Operations
Rasterization: Fragment, Vertex, Geometry, Hull/Domain (Tessellation)
Ray Tracing: Closest Hit, Any Hit, Intersection, Selector, Ray Generation, Miss, Exception
Flexible intersection
Primary ray
    Intersection (miss)
    Intersection (hit)
    Any hit
    Intersection (miss)
    Intersection (hit)
    Any hit (ignore intersection)
    Closest hit
Shadow ray
    Intersection (hit)
    Any hit (ignore intersection)
    Intersection (hit)
    Any hit (terminate ray)
    Closest hit
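The interplay of intersection, any-hit and closest-hit programs can be sketched as a small traversal loop in plain C++. This is a model of the behaviour listed above, not the OptiX API: `AnyHitResult`, `Hit` and `traceRay` are hypothetical names, and the candidate list stands in for whatever order the acceleration structure happens to visit primitives.

```cpp
#include <functional>
#include <limits>
#include <vector>

// What an any-hit program may decide about a reported intersection.
enum class AnyHitResult { Accept, Ignore, Terminate };

struct Hit { float t; int primitive; };

// Returns the primitive the closest-hit program would run for, or -1 on a miss.
// `candidates` are the intersections reported during traversal, in visit order.
inline int traceRay(const std::vector<Hit>& candidates,
                    const std::function<AnyHitResult(const Hit&)>& anyHit) {
    float closestT = std::numeric_limits<float>::infinity();
    int closest = -1;
    for (const Hit& h : candidates) {
        if (h.t >= closestT) continue;       // farther than the current closest: rejected
        AnyHitResult decision = anyHit(h);
        if (decision == AnyHitResult::Ignore) continue;  // e.g. a cut-out texel
        closestT = h.t;                      // accept and shorten the ray
        closest = h.primitive;
        if (decision == AnyHitResult::Terminate) break;  // e.g. an opaque shadow hit
    }
    return closest;
}
```

A shadow ray typically terminates on the first opaque hit because any occluder is enough, whereas a primary ray keeps going until the closest accepted hit is known.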
Geometry representation
[Figure: a Geometry Instance node containing one Geometry (with its Intersection program) and three Material nodes; each Material holds Closest Hit 0, Any Hit 0, Closest Hit 1 and Any Hit 1 programs.]
Scene representation
[Figure: scene graph built up over several slides. Context at the top; below it a Group with an Acceleration; below that two Transform nodes; then a Geometry Group with an Acceleration; then Geometry Instance nodes, each with a Geometry (Intersection program) and Material nodes holding Closest Hit 0, Any Hit 0, Closest Hit 1 and Any Hit 1 programs. The final builds add a Lights node.]
The shading language for OptiX is a restricted subset of CUDA device code
Just C++
A few conventions for accessing the runtime (object model, trace functions, etc.)
No data management functionality required from CUDA
Some CUDA functionality disallowed (shared memory, barriers, etc.)
Produces PTX
Intersection program
Determines if/where a ray hits an object
Sets attributes (normal, texture coordinates)
Used by closest hit shader for shading
Selects which material to use
Used for
Programmable surfaces
Allowing arbitrary triangle buffer formats
Etc.
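rtDeclareVariable(float3, p0, , );
rtDeclareVariable(float3, p1, , );
rtDeclareVariable(float3, p2, , );
rtDeclareVariable(float3, geometric_normal, attribute geometric_normal, );
rtDeclareVariable(float3, shading_normal, attribute shading_normal, );
rtDeclareVariable(optix::Ray, ray, rtCurrentRay, );

// Assumption: the program header, edge vectors, and the bounds and barycentric
// tests were lost from the slide; they are filled in as a standard ray-triangle test.
RT_PROGRAM void intersect( int primIdx )
{
  float3 e0 = p1 - p0;
  float3 e1 = p0 - p2;
  float3 n  = cross( e0, e1 );            // unnormalized face normal
  float v   = dot( n, ray.direction );
  float r   = 1.0f / v;
  float3 e2 = p0 - ray.origin;
  float va  = dot( n, e2 );
  float t   = r*va;                       // distance to the triangle's plane
  if( t < ray.tmax && t > ray.tmin ) {
    float3 i   = cross( e2, ray.direction );
    float v1   = dot( i, e1 );
    float beta = r*v1;
    if( beta >= 0.0f ) {
      float v2    = dot( i, e0 );
      float gamma = r*v2;
      if( (v1+v2)*v <= v*v && gamma >= 0.0f ) {
        if( rtPotentialIntersection( t ) ) {
          shading_normal = geometric_normal = -n;
          rtReportIntersection( 0 );
}}}}}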
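The plane-distance computation visible in the intersection program (`v`, `r`, `e2`, `va`, `t`) can be exercised on the CPU. The sketch below completes it with a barycentric inside test. That part is an assumption (a standard formulation, not recovered from the slide), and `Vec3` and `intersectTriangle` are hypothetical names.

```cpp
struct Vec3 { float x, y, z; };
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Returns true and writes the hit distance to tOut when the ray
// origin + t * dir crosses triangle (p0, p1, p2) with tmin < t < tmax.
inline bool intersectTriangle(Vec3 origin, Vec3 dir, Vec3 p0, Vec3 p1, Vec3 p2,
                              float tmin, float tmax, float& tOut) {
    Vec3 e0 = p1 - p0;
    Vec3 e1 = p0 - p2;
    Vec3 n = cross(e0, e1);          // unnormalized face normal
    float v = dot(n, dir);
    if (v == 0.0f) return false;     // ray parallel to the triangle's plane
    float r = 1.0f / v;
    Vec3 e2 = p0 - origin;
    float va = dot(n, e2);
    float t = r * va;                // distance to the plane
    if (!(t < tmax && t > tmin)) return false;
    Vec3 i = cross(e2, dir);
    float v1 = dot(i, e1);
    float beta = r * v1;             // first barycentric coordinate
    if (beta < 0.0f) return false;
    float v2 = dot(i, e0);
    float gamma = r * v2;            // second barycentric coordinate
    if ((v1 + v2) * v > v * v || gamma < 0.0f) return false;  // beta + gamma > 1
    tOut = t;
    return true;
}
```

With the triangle (0,0,0), (1,0,0), (0,1,0) and a ray from (0.25, 0.25, 1) pointing down -z, the hit distance is 1. Moving the origin to (2, 2, 1) still reaches the plane but fails the inside test.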
rtBuffer<Vertex> vertex_buffer;
rtBuffer<uint3> index_buffer;
rtDeclareVariable(float3, geometric_normal, attribute geometric_normal, );
rtDeclareVariable(float3, shading_normal, attribute shading_normal, );
rtDeclareVariable(optix::Ray, ray, rtCurrentRay, );
float v = dot( n, ray.direction );
float r = 1.0f / v;
Most common
Ray Payloads
Can define arbitrary data with the ray
Sometimes called the "per ray data"
Data can be passed down or up the ray tree (or both)
Just a user-defined struct accessed by all shader programs
Varies per ray type
[Figure: ray tree with payload fields labeled Attenuation, Color, Depth and importance.]
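How a payload carries data both down and up the ray tree can be sketched in plain C++. The `trace` function here is a hypothetical stand-in for rtTrace over a made-up scene in which every hit emits white and spawns one reflected ray. Only the payload pattern (importance and depth going down, color coming back up) reflects the slide.

```cpp
struct Color { float r, g, b; };

// User-defined payload: `result` travels up the ray tree,
// `importance` and `depth` travel down it.
struct PerRayData {
    Color result;
    float importance;
    int depth;
};

// Hypothetical stand-in for rtTrace plus a closest-hit program.
// Each hit contributes white and reflects with the given reflectance.
inline void trace(PerRayData& prd, float reflectance, int maxDepth, float minImportance) {
    const Color emitted{1.0f, 1.0f, 1.0f};
    prd.result = emitted;

    // Data passed down: the child ray is less important and one level deeper.
    PerRayData child{Color{0.0f, 0.0f, 0.0f}, prd.importance * reflectance, prd.depth + 1};
    if (child.depth >= maxDepth || child.importance < minImportance) return;

    trace(child, reflectance, maxDepth, minImportance);

    // Data passed up: the child's color is attenuated and accumulated.
    prd.result.r += reflectance * child.result.r;
    prd.result.g += reflectance * child.result.g;
    prd.result.b += reflectance * child.result.b;
}
```

With reflectance 0.5 and a depth limit of 3, the root ray accumulates 1 + 0.5 + 0.25 = 1.75 per channel. Raising the importance cutoff to 0.3 prunes the last bounce and gives 1.5.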
struct PerRayData_radiance
{
  float3 result;
};
rtDeclareVariable(PerRayData_radiance, prd_radiance, rtPayload,);
rtDeclareVariable(float3, shading_normal, attribute shading_normal,);
RT_PROGRAM void closest_hit_radiance()
{
  float3 worldnormal = normalize(rtTransformNormal(RT_OBJECT_TO_WORLD, shading_normal));
  prd_radiance.result = worldnormal * 0.5f + 0.5f;
}
Normal shader
Environment Maps
Accumulation Camera
{%f30,%f31,%f32,%f33}, [ray+0];
{%f38,%f39}, [ray+16];
ld.param.s32   %r1, [__cudaparm__Z14mesh_intersecti_primIdx];
cvt.s64.s32    %rd1, %r1;
mov.u64        %rd2, vindex_buffer;
mov.u32        %r2, 1;
mov.u32        %r4, 12;
mov.u64        %rd5, 0;
mov.u64        %rd7, 0;
mov.u64        %rd9, 0;
call (%rd11), _rt_buffer_get_64, (%rd2, %r2, %r4, %rd1, %rd5, %rd7, %rd9);
ld.global.s32  %r6, [%rd12+0];
ld.global.s32  %r7, [%rd12+4];
ld.global.s32  %r12, [%rd12+8];
Extra credit: data paging
Execution models
OptiX pipeline does not completely specify an execution model
Sequential consistency within a thread (usually pixel/sample)
No consistency between threads
No ordering guarantees for intersection tests
Side-effects limited to output buffers and ray payloads
Many possible execution models
Allows for HW/SW innovation

1. Map launchIndex -> CUDA thread
2. Sequential execution within each thread

1. Map launchIndex -> CUDA thread
2. Explicitly manage divergence in a MIMD state machine

1. Map launchIndex -> CUDA thread
2. Explicitly manage divergence in a MIMD state machine
3. Defer work when data not present

1. Initiate rays in large-ish batches
2. Sort or scan for state and data coherence
3. Process batches until done

1. Initiate rays, fill warp
2. Manage on-chip queues, process each state
State 1
    PerRayData_radiance prd;
    save prd, index;
    rtTrace(top_object, ray, prd);
State 2
    restore prd, index;
    output_buffer[index] = make_color( prd.result );
Inserts continuations
Transforms to state machine
Rewrites variable load/store for object model
Inlines intrinsic functions
nvcc
ld.global.u32  %node, [top_object+0];
mov.s32        %i, 0;
loop:
call _rt_trace, ( %node, %i, 0, 0, 0, 0, 1, 0, 1e-4f, 1e20f, payload );
add.s32        %i, %i, 1;
mov.u32        %iend, 5;
setp.ne.s32    %predicate, %i, %iend;
@%predicate bra loop;
OptiX
ld.global.u32  %node, [top_object+0];
mov.s32        %i, 0;
loop:
mov payload, %stack;
save %i, %iend, %node;
call _rt_trace, ( %node, %i, 0, 0, 0, 0, 1, 0, 1e-4f, 1e20f, payload );
restore %i, %iend, %node;
add.s32        %i, %i, 1;
mov.u32        %iend, 5;
setp.ne.s32    %predicate, %i, %iend;
@%predicate bra loop;
OptiX
state 1:
ld.const.u32   %node, [top_object+0];
mov.s32        %i, 0;
loop:
mov payload, %stack;
save %i;
bra mainloop;
loop_copy:
mov payload, %stack;
save %i;
bra mainloop;
state 2:
restore %i;
rematerialize %iend, %node;
add.s32        %i, %i, 1;
mov.u32        %iend, 5;
setp.ne.s32    %predicate, %i, %iend;
@%predicate bra loop_copy;
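The continuation transform shown in the PTX listings can be mimicked in plain C++: a loop that calls trace five times is cut at the call, the live variable `i` is saved, and control returns to a main loop that dispatches on a state number. The state numbering, `Thread`, `step` and `runStateMachine` are hypothetical. The sketch only mirrors the save / restore / rematerialize pattern.

```cpp
#include <vector>

enum State { kDone = 0, kState1 = 1, kState2 = 2, kTrace = 3 };

struct Thread {
    State state = kState1;
    int saved_i = 0;  // the only value that must survive across the trace call
};

// Executes exactly one state for one thread, then returns to the main loop.
// `traced` records the argument of every simulated trace call.
inline void step(Thread& th, std::vector<int>& traced) {
    switch (th.state) {
    case kState1: {              // code before the first trace call
        int i = 0;
        th.saved_i = i;          // save %i
        th.state = kTrace;       // bra mainloop
        break;
    }
    case kTrace:                 // stand-in for traversal and shading states
        traced.push_back(th.saved_i);
        th.state = kState2;      // continuation: resume after the call
        break;
    case kState2: {              // code after the trace call
        int i = th.saved_i;      // restore %i
        const int iend = 5;      // rematerialized rather than saved
        i += 1;
        if (i != iend) {
            th.saved_i = i;      // loop_copy: save %i again
            th.state = kTrace;
        } else {
            th.state = kDone;
        }
        break;
    }
    case kDone:
        break;
    }
}

// Main loop: keeps dispatching until the thread reaches the done state.
inline std::vector<int> runStateMachine() {
    Thread th;
    std::vector<int> traced;
    while (th.state != kDone) step(th, traced);
    return traced;
}
```

Running the state machine issues the same five trace calls (i = 0 to 4) as the original loop, even though no single function invocation spans a call.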
Graphical View
[Figure: control-flow graphs of the loop. Panel titles: Initial, Transformed, State Machine, Restored. Node labels: begin, loop, loop part 1, loop part 2.]
Other optimizations
Architecture-dependent optimizations
Execution
Key observation
Although threads may temporarily diverge, they return to frequently used states
Fine-grained scheduling
[Figure: timelines of states 1, 2 and 3 executed by threads A to D over time, shown for a Naive SIMD Schedule and for a MIMD Schedule.]
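The key observation (diverged threads keep returning to frequently used states) suggests a simple scheduling policy: at each step, run the state that the most threads are waiting on. The sketch below simulates that policy in plain C++. It is an assumption for illustration, not the scheduler OptiX actually uses, and `scheduleSteps` is a hypothetical name.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Each thread is a string of state ids, e.g. "12123" means the thread must
// execute state 1, then 2, then 1, and so on. One time step executes a single
// state for every thread whose next state matches it (SIMD-style).
// Returns the number of time steps until all threads have finished.
inline int scheduleSteps(const std::vector<std::string>& threads) {
    std::vector<std::size_t> pc(threads.size(), 0);
    int steps = 0;
    for (;;) {
        // Count how many threads are waiting on each state.
        std::map<char, int> votes;
        for (std::size_t t = 0; t < threads.size(); ++t)
            if (pc[t] < threads[t].size()) ++votes[threads[t][pc[t]]];

        // Pick the most popular state; ties go to the smaller state id.
        char best = 0;
        int bestCount = 0;
        for (const auto& kv : votes)
            if (kv.second > bestCount) { best = kv.first; bestCount = kv.second; }
        if (bestCount == 0) break;  // every thread has finished

        // Advance every thread that was waiting on the chosen state.
        for (std::size_t t = 0; t < threads.size(); ++t)
            if (pc[t] < threads[t].size() && threads[t][pc[t]] == best) ++pc[t];
        ++steps;
    }
    return steps;
}
```

Two threads that both run "1212" finish in 4 steps with full utilization. Threads "12" and "21" need 3 steps because they can share only the step for state 2.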
[Figure: bar chart with one group of bars per program state; the horizontal axis lists states such as builtin functions, pinhole_camera, miss, intersect, closest_hit_radiance, any_hit_shadow and traverse_geometry; the vertical axis runs from 0.0E+00 to 3.5E+06.]
[Figure: bar charts comparing OptiX with Aila-Laine on Conference (Primary), Conference (AO), Sibenik (Primary) and Sibenik (AO), one chart for GTX285 and one for GTX480; axis ticks from 50 to 250.]
[Figure: frame time in ms (axis ticks 30 to 120) comparing OptiX on GTX480 with McGuire-Luebke on a Quad Core2 DUO; legend entries: OpenGL Gather, I/O.]
[Figure: three ISPM renderings labeled Caustics, Diffuse Interreflection (2.8X) and Color Bleeding (3.0X).]
OptiX Examples
Cook
Photon Map
Path tracer
Sample6
Julia
Tutorial
Whirligig
Whitted
Collision
Animated scenes
14 programs
41 states
1120 bytes constant data
Mandelbulb
Path tracer
Design Garage
Fractal intersection
Questions?