Highly dense foliage visualization on multi-CPU architectures.

Mark McCubbin

With thanks to
Thomas Engel
Markus Breyer

Goals

• Highly dense foliage for games with large view distances (duh?).
• Near-zero stalls, so multiple threads/CPUs are utilized effectively.
• Low GPU cost.
• Low main-CPU cost.
• No memory management overhead.
• Visually appealing: there is no point in being fast if it looks like a turd.

Example – Lair PS3

Overview

• Basic idea
• World description
• Density representation
• Job management / dispatch
• Worker-CPU specifics
• Rendering

LOD and rendering; basic visualization of the foliage.

Basic Idea

• We want a foliage system that supports multiple layers of foliage.
• Each layer represents one visual type of foliage, such as grass.
• Visually, each layer has up to 3 levels of detail (LOD).
• Everything is set up to process the data and generate renderable foliage on the fly, using a "magic carpet" of foliage around the camera.

Basic Idea

• Let me explain the LODing a little more.
• We have 3 levels:
• [3D Object] when the user is very close to the ground (this level is optional).
• [Camera Facing Sprite] + [Ground Aligned Sprite] when the user is a little further away.
• [Ground Aligned Sprite] when the foliage is a reasonable distance away from the camera.

The switch between camera-facing sprite + ground-aligned sprite can happen surprisingly close to the camera and still be convincing.

You can also get an excellent parallax effect if you go further and have multiple slices on the camera-facing sprite layer. GPU time was critical to us in Lair, so that wasn't an option, though it is being added as a 4th LOD.
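The three levels above can be sketched as a simple selection function. This is only a minimal sketch, not the Lair code: the flag names, the `LodDistances` struct, and the use of plain camera distance as the switch criterion are all assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical LOD flags. One patch of foliage may combine representations,
// as in the [Camera Facing Sprite] + [Ground Aligned Sprite] level.
enum LodBits : std::uint8_t {
    kLodNone         = 0,
    kLodObject3D     = 1 << 0,
    kLodCameraFacing = 1 << 1,
    kLodGroundSprite = 1 << 2,
};

// Assumed per-layer switch distances, tweakable independently per layer.
struct LodDistances {
    float objectEnd;        // 3D objects closer than this (0 disables the optional level)
    float cameraFacingEnd;  // camera-facing + ground sprites closer than this
    float groundEnd;        // ground sprites closer than this, nothing beyond
};

inline std::uint8_t selectLod(float distance, const LodDistances& d) {
    if (distance < d.objectEnd)       return kLodObject3D;
    if (distance < d.cameraFacingEnd) return kLodCameraFacing | kLodGroundSprite;
    if (distance < d.groundEnd)       return kLodGroundSprite;
    return kLodNone;
}
```

Setting `objectEnd` to 0 turns the optional 3D object level off, so the nearest foliage falls through to the sprite levels.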

Level of Detail

• 3 basic levels of detail

World Description

• Use spatial subdivision of the world for visibility and batching. A quadtree or octree works well; preferably use something the engine already has.
• The area each node represents in the world determines the accuracy and overall density of the final foliage.
• These should be tweakable: the number of nodes and their area definitely affect the balance of work between the main CPU and the worker CPUs.

Each foliage layer should represent only one type of foliage, grass for example.

Tweakable nodes means you need to be able to change the size of the area represented by each node. This directly affects both rendering and processing time.

The LOD levels should be independently tweakable in the foliage layer.


Density Representation

• We use the density map to determine where, and how much of, the foliage is rendered.
• It must have a low memory footprint for platforms whose co-processors have little available memory.
• It must be DMA friendly in both bandwidth and size.
• Content will traditionally want it to be really accurate (don't annoy your content folks).

Density Representation

• Use a byte map to describe density.
• Each byte gives you a range from 0 (nothing visible) to 255 (as dense as possible).
• But wait, isn't one huge byte map for an entire level too large for low-memory co-processors?

Density Representation

• Old-school solution: a character map.
• Since we divided our world into nodes using spatial subdivision, we can cover each node's area with 16x16 pixels' worth of density data.
• That is a DMA-friendly size.
• For most levels you can losslessly compress this data very well.
• Hint: make sure the empty character is character 0 in your map (for easy early rejection of the node).
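A character map in this sense could look roughly like the sketch below: one small index per node that points into a pool of unique 16x16 tiles, with tile 0 reserved for "empty". The type and member names are hypothetical, and the duplicate-tile search is just one simple way to get lossless reuse; the talk does not say how the original data was compressed.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kTileSize = 16;  // 16x16 density bytes per node, as on the slide
using DensityTile = std::array<std::uint8_t, kTileSize * kTileSize>;  // 256 bytes

// Hypothetical character map: per-node tile index plus a pool of unique tiles.
struct DensityMap {
    int nodesX;
    int nodesY;
    std::vector<std::uint16_t> tileIndex;  // one "character" per node, 0 = empty
    std::vector<DensityTile> tiles;        // tiles[0] is the all-zero tile

    DensityMap(int nx, int ny)
        : nodesX(nx), nodesY(ny),
          tileIndex(static_cast<std::size_t>(nx) * ny, std::uint16_t{0}),
          tiles(1) {
        tiles[0].fill(0);
    }

    // Early rejection: no tile data needs to be touched (or DMA'd) at all.
    bool nodeIsEmpty(int nx, int ny) const { return tileIndex[ny * nodesX + nx] == 0; }

    std::uint8_t density(int nx, int ny, int px, int py) const {
        const DensityTile& t = tiles[tileIndex[ny * nodesX + nx]];
        return t[py * kTileSize + px];
    }

    // Assign a tile to a node, reusing an identical existing tile if there is one.
    void setTile(int nx, int ny, const DensityTile& t) {
        std::uint16_t idx = 0;
        for (; idx < tiles.size(); ++idx)
            if (tiles[idx] == t) break;
        if (idx == tiles.size()) tiles.push_back(t);
        tileIndex[ny * nodesX + nx] = idx;
    }
};
```

With this layout a worker only ever needs the one 256-byte tile for the node it is processing, never the whole level's density data.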

Visible Nodes / LOD Example

• Node

Job Management – CPU

• Step 1: Interrogate the engine's visibility system at a high level to determine whether each node is visible.
• Step 2: For all visible nodes, query the density map at a macro level and remove any nodes with zero density.
• Step 3: Request memory from your per-frame vertex buffer for each node that has passed all the early-out tests.
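Steps 1 to 3 might be sketched as below. The `Node` fields stand in for whatever your engine's visibility system and density map report, and `FrameVertexBuffer` is a hypothetical linear allocator that is reset once per frame; none of these names come from the talk.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-frame vertex buffer: a linear allocator, reset every frame,
// so workers never have to do any memory management of their own.
class FrameVertexBuffer {
public:
    explicit FrameVertexBuffer(std::size_t bytes) : storage_(bytes), used_(0) {}
    void reset() { used_ = 0; }
    // Returns nullptr when the buffer is exhausted; the caller skips that node.
    std::uint8_t* reserve(std::size_t bytes) {
        if (bytes > storage_.size() - used_) return nullptr;
        std::uint8_t* p = storage_.data() + used_;
        used_ += bytes;
        return p;
    }
    std::size_t used() const { return used_; }
private:
    std::vector<std::uint8_t> storage_;
    std::size_t used_;
};

struct Node {
    bool visible;             // step 1 result (assumed to come from the engine)
    bool hasDensity;          // step 2 result (e.g. character index != 0)
    std::size_t vertexBytes;  // output space this node needs
    std::uint8_t* output;     // filled in by step 3
};

// Returns the nodes that survived all early outs and should become jobs.
std::vector<Node*> selectNodes(std::vector<Node>& nodes, FrameVertexBuffer& vb) {
    std::vector<Node*> accepted;
    for (Node& n : nodes) {
        if (!n.visible) continue;              // step 1
        if (!n.hasDensity) continue;           // step 2
        n.output = vb.reserve(n.vertexBytes);  // step 3
        if (n.output != nullptr) accepted.push_back(&n);
    }
    return accepted;
}
```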

Job Management - Flow

• Step 4: Take all the remaining nodes, create jobs, and submit them to your engine's job manager.
• Step 5: Only when a node passes the early outs do we dispatch a job to generate foliage.
• Step 6: Repeat for both the ground LOD and the camera-facing LOD.

Job Management - Flow

• Tip: During development you'll obviously want to grow your buffers and the space available for content generation dynamically.
• Tip: For memory-constrained games, sort your nodes by depth; you can then early out and simply not draw the remaining nodes. This is the worst case.
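One reading of the second tip, sketched under the assumption that "early out" means dropping the farthest nodes once a fixed vertex budget is used up. `NodeRequest` and `fitToBudget` are made-up names for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct NodeRequest {
    float depth;              // distance from the camera
    std::size_t vertexBytes;  // output space the node would need
};

// Sorts the requests near-to-far and returns how many fit into the budget.
// Everything after that count is simply not drawn this frame.
std::size_t fitToBudget(std::vector<NodeRequest>& requests, std::size_t budgetBytes) {
    std::sort(requests.begin(), requests.end(),
              [](const NodeRequest& a, const NodeRequest& b) { return a.depth < b.depth; });
    std::size_t used = 0;
    std::size_t kept = 0;
    for (const NodeRequest& r : requests) {
        if (r.vertexBytes > budgetBytes - used) break;  // early out: budget exhausted
        used += r.vertexBytes;
        ++kept;
    }
    return kept;
}
```

Because the cut always falls on the most distant nodes, running out of memory costs you foliage where it is least noticeable.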

Job Management – Performance

• Could having an individual job per node be costly?
• It is possible to batch multiple nodes into a single job and let the worker CPUs work on it.
• This creates a problem: our objective is to avoid stalling any single worker CPU for any length of time.

Job Management – Performance

• It's all about balance and the current performance characteristics of your main-CPU code.
• We found it was better to have hundreds of small, short-and-sweet jobs, which kept the worker CPUs available for more important tasks that weren't so loosely coupled.

Job Management – Performance

• Boiling it down, there are two cases.
• 1) You have the perfect engine that only dispatches fire-and-forget jobs to your worker CPUs (loosely coupled).
• 2) You live in the real world, where your engine needs the results of the jobs you dispatch quickly (tightly coupled).

Job Management - Performance

• Case 1: Batch everything up and dispatch only a few large jobs.
• Case 2: Dispatch small individual jobs for each node (or for really small groups of nodes). You'll lose some main-CPU time, but we found it was better than stalling.
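Both cases can be covered by one tweakable batch size, as in this sketch. `JobRange` and `makeJobs` are hypothetical names; the point is only that the nodes-per-job count is a tuning knob rather than something baked into the code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// A job covers a contiguous range of accepted nodes.
struct JobRange {
    std::size_t first;
    std::size_t count;
};

// nodesPerJob == 1 gives one small job per node (case 2);
// a large value gives a few big jobs (case 1).
std::vector<JobRange> makeJobs(std::size_t nodeCount, std::size_t nodesPerJob) {
    std::vector<JobRange> jobs;
    if (nodesPerJob == 0) nodesPerJob = 1;  // guard against a bad tweak value
    for (std::size_t first = 0; first < nodeCount; first += nodesPerJob)
        jobs.push_back({first, std::min(nodesPerJob, nodeCount - first)});
    return jobs;
}
```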

Worker-CPU/Thread

• Each node in our world represents a simple quad area, which can become a task or job.
• The payload attached to this job includes the character index (quad index).
• World-space bounding volume.

Worker-CPU/Thread

• Foliage parameter block:
• 1) Randomization of positions.
• 2) Animation type/parameters.
• 3) Base size.
• 4) Randomization of sizes (easy variety).
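Put together, the job payload from the last two slides might be laid out like this. Every field name here is an assumption, as are the `seed` and `lodType` fields and the 16-byte alignment (chosen because DMA engines typically favor 16-byte-aligned transfers in multiples of 16 bytes); the talk only lists which kinds of data the payload carries.

```cpp
#include <cstdint>

// Hypothetical foliage parameter block, one per foliage layer.
struct alignas(16) FoliageParams {
    float positionJitter;         // 1) randomization of positions
    std::uint32_t animationType;  // 2) animation type...
    float animationParams[2];     //    ...and its parameters
    float baseSize;               // 3) base size
    float sizeJitter;             // 4) randomization of sizes
    std::uint32_t seed;           // assumed: seed for deterministic noise
    std::uint32_t pad;            // keeps the size a multiple of 16 bytes
};

// Hypothetical per-job payload: small, flat, no pointers to chase.
struct alignas(16) FoliageJobPayload {
    float boundsMin[3];       // world-space bounding volume (min corner)
    std::uint32_t tileIndex;  // character index (quad index) into the density map
    float boundsMax[3];       // world-space bounding volume (max corner)
    std::uint32_t lodType;    // assumed: ground or camera-facing
    FoliageParams params;
};

static_assert(sizeof(FoliageParams) % 16 == 0, "parameter block should be DMA friendly");
static_assert(sizeof(FoliageJobPayload) % 16 == 0, "payload should be DMA friendly");
```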

Worker-CPU/Thread

• We have no dependencies; waiting on a job to finish is bad.
• No memory management.
• Using your job manager, assign these jobs as low a priority as possible (within reason).

Job - Implementation

• First pass: generate degenerate quads (the default, no-foliage case).
• LOD.
• Check the density map; this gives you the probability that something should appear.
• Query your height map, unless you are lucky enough to be working on a sports title :)
• Using your world bounding volume, generate a simple position within the quad (initially this is a simple regular grid).

LOD consists of an alpha fade-out.

We basically have 3 LOD levels: ground-aligned sprite, camera-facing sprite and 3D object.

Additional tricks are available to fake large areas of foliage in the distance by increasing the size of the nodes that the ground-aligned sprites represent.


Job - Implementation

• Add some noise to the positions, using the parameters in the payload (relax the positions).
• Depending on the LOD type, [ground] or [camera-facing], you will want to generate 4 vertices or 1.
• Procedural animation (warning: when content asks for "silly animation", they don't actually want "silly" animation).
• Write the final data to the output buffer, then transfer the data or release resources.

Ensure the noise function you use is deterministic.
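The core of such a job could look something like the sketch below. It is an illustration under several assumptions: the integer mix in `hash32` is an arbitrary stand-in for "a deterministic noise function", a zero-size sprite stands in for a degenerate quad, `heightAt` is a placeholder for the height-map query, and the expansion to 4 vertices and the animation step are left out.

```cpp
#include <cstdint>

// Arbitrary integer mix used as deterministic noise (not from the talk).
inline std::uint32_t hash32(std::uint32_t x) {
    x ^= x >> 16; x *= 0x7feb352dU;
    x ^= x >> 15; x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

// Maps a hash to [0, 1) using its top 24 bits.
inline float hashToUnit(std::uint32_t h) { return (h >> 8) * (1.0f / 16777216.0f); }

struct Sprite { float x, y, z, size; };  // size 0 = degenerate, rejected by the GPU

// Fills 256 output sprites for one node from its 16x16 density tile.
void generateNode(const std::uint8_t* density, float minX, float minZ, float nodeSize,
                  float jitter, float baseSize, std::uint32_t seed,
                  float (*heightAt)(float, float), Sprite* out) {
    const float cell = nodeSize / 16.0f;
    for (int j = 0; j < 16; ++j) {
        for (int i = 0; i < 16; ++i) {
            const int k = j * 16 + i;
            Sprite s{0.0f, 0.0f, 0.0f, 0.0f};  // first pass: degenerate by default
            const std::uint32_t h = hash32(seed ^ (static_cast<std::uint32_t>(k) * 0x9e3779b9U));
            // Density acts as a probability: 0 never spawns, 255 always spawns.
            if ((h % 255U) < density[k]) {
                // Start from a regular grid, then relax the position with noise.
                const float jx = (hashToUnit(hash32(h + 1U)) - 0.5f) * jitter * cell;
                const float jz = (hashToUnit(hash32(h + 2U)) - 0.5f) * jitter * cell;
                s.x = minX + (i + 0.5f) * cell + jx;
                s.z = minZ + (j + 0.5f) * cell + jz;
                s.y = heightAt(s.x, s.z);
                s.size = baseSize;
            }
            out[k] = s;
        }
    }
}
```

Because the noise depends only on the seed and the cell index, regenerating a node on a later frame puts every blade in the same place, which is why determinism matters here.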

Rendering

• Yay, our first and only stall is here: we have to wait for ALL our jobs to finish.
• If you correctly kicked off the jobs at the beginning of your game loop, this stall shouldn't cost much.
• Our final output from the CPU farm is a huge display list.
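If your job manager doesn't already offer a "wait for group" primitive, the single sync point can be as small as an atomic counter. This is a generic sketch with made-up names, not the Lair job system.

```cpp
#include <atomic>
#include <cstdint>

// Counts outstanding foliage jobs for the current frame.
class JobCounter {
public:
    // Main CPU: called when the jobs are kicked off at the start of the game loop.
    void addJobs(std::uint32_t n) { remaining_.fetch_add(n); }
    // Worker: called after the job's output has been written.
    void jobFinished() { remaining_.fetch_sub(1); }
    // Render code: polled right before the display list is submitted.
    bool allDone() const { return remaining_.load() == 0; }
private:
    std::atomic<std::uint32_t> remaining_{0};
};
```

The render code would loop on `allDone()` (ideally doing other useful work or yielding in between) and only then hand the display list to the GPU.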

Rendering

• Display list example

Rendering

• We can have multiple layers of foliage, each representing a different type (green grass, wheat, flowers, etc.).
• Each layer of foliage has a [ground] and a [camera facing] sub-list.
• Our display list is contiguous in memory, so we have the perfect single batch of each type for rendering.
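One possible way to carve a single contiguous display list into per-layer sub-lists is sketched below. The ordering (each layer's ground sub-list followed by its camera-facing sub-list) and all names are assumptions; the talk only states that the list is contiguous and that each layer has the two sub-lists.

```cpp
#include <cstddef>
#include <vector>

struct LayerCounts { std::size_t groundVerts; std::size_t cameraVerts; };
struct SubList     { std::size_t firstVertex; std::size_t vertexCount; };
struct LayerLists  { SubList ground; SubList camera; };

// Assigns each sub-list a range inside one contiguous vertex buffer.
std::vector<LayerLists> layoutDisplayList(const std::vector<LayerCounts>& layers) {
    std::vector<LayerLists> out;
    std::size_t cursor = 0;
    for (const LayerCounts& l : layers) {
        LayerLists lists;
        lists.ground = {cursor, l.groundVerts};
        cursor += l.groundVerts;
        lists.camera = {cursor, l.cameraVerts};
        cursor += l.cameraVerts;
        out.push_back(lists);
    }
    return out;
}
```

Each `SubList` then maps directly onto one draw call: a start vertex and a count, with no per-node batching left to do at render time.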

Rendering

• Anything that wasn't visible will immediately be rejected by the GPU anyway, since we injected degenerate quads. This is fast and didn't even show up on our profiles.
• The final draw code needs to set up the shader once, depending on list and type.
• We have two lists for speed: flat (aligned to ground) and camera facing.

Note: If you have an expensive vertex shader, the degenerate quads might show up in your profiles; this was not an issue for us.

Rendering

• For very primitive sorting, we set up and draw all the [ground] lists first.
• We draw all the camera-facing foliage last (for all layers of foliage); this prevented most sorting issues.
• Alpha is always an issue; for most games, you can get away with drawing your foliage last.
• Kick off the drawing…
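The draw order described above amounts to two passes over the layers, roughly as in this sketch. `DrawCall` and `buildDrawOrder` are illustrative names; real code would bind the shader for each list type and issue the draw instead of recording it.

```cpp
#include <cstddef>
#include <vector>

enum class ListType { Ground, CameraFacing };

struct DrawCall {
    std::size_t layer;
    ListType type;
};

// All [ground] lists first, then all camera-facing lists, across every layer.
std::vector<DrawCall> buildDrawOrder(std::size_t layerCount) {
    std::vector<DrawCall> order;
    for (std::size_t i = 0; i < layerCount; ++i)
        order.push_back({i, ListType::Ground});
    for (std::size_t i = 0; i < layerCount; ++i)
        order.push_back({i, ListType::CameraFacing});
    return order;
}
```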

Q&A

• Any questions?
