
2010 Second International Conference on Computer Modeling and Simulation

Real-time Rendering for 3D Game Terrain with GPU Optimization

Bing Tang
College of Math, Physics and Information Engineering
Zhejiang Normal University
Jinhua, China, 321004
email: tangb888@163.com

LanFang Miao *
College of Math, Physics and Information Engineering
Zhejiang Normal University
Jinhua, China, 321004
* Corresponding author email: mlf@zjnu.cn

Abstract- This paper proposes an algorithm for real-time rendering of 3D game terrain that adapts the Geometry Clipmaps algorithm to the characteristics of 3D games. By improving the grid structure and the organization of the index buffer, the entire terrain can be rendered with a single draw call, which takes full advantage of the GPU. Cracks and jumping are resolved by adding vertex indices from the adjacent coarser level. An alpha blend texture controls the location and weight of each terrain texture, which makes color texturing more flexible. Experimental results show that the algorithm renders terrain in real time and achieves higher rendering efficiency than a traditional game terrain algorithm.

Keywords- Game Terrain; Real-time Rendering; GPU

I. INTRODUCTION

Traditional game engines offer many methods for organizing and rendering outdoor terrain, such as Continuous Level of Detail (CLOD) [1], Octree, and Real-time Optimally Adapting Meshes (ROAM) [2]. These algorithms all rely on level of detail (LOD): the area near the viewpoint uses a high-resolution grid, and distant areas use a low-resolution grid. Their main idea is to use the CPU to reduce the number of vertices transferred to the GPU. They can render large-scale terrain in real time and were very successful in the era when GPU performance was weak. However, with the rapid development of graphics hardware, GPU computing speed has become much higher than that of the CPU. A GeForce 8800 GTX reaches up to 300 GFLOPS (floating-point operations per second), whereas a popular 3.0 GHz dual-core CPU reaches only 50 GFLOPS. With increasingly powerful GPUs, reducing the vertex count on the CPU is out of date. To exploit the potential of a modern GPU, it is necessary to reduce the number of draw calls and the amount of data transferred from the CPU to the GPU [3], because for each draw call the CPU must set up a large amount of render state in the device driver.

In 2004, Hoppe proposed the Geometry Clipmaps (GeoClipmap) algorithm [4] for terrain rendering, which greatly improved the speed of terrain rendering. However, GeoClipmap was designed for GIS applications, and terrain in 3D games is not as large as in GIS. In this paper, the GeoClipmap algorithm is adapted to the characteristics of 3D games so that it is better suited to rendering game terrain.

II. SUMMARY OF GEOMETRY CLIPMAPS

The Geometry Clipmap, which is similar to the Texture Clipmap, was put forward by Frank Losasso and Hugues Hoppe in 2004 [4]. In GPU Gems 2, Arul Asirvatham and Hugues Hoppe [5] optimized the algorithm for the GPU. Apart from decompression, all operations are performed by the GPU, which greatly improved the efficiency of the algorithm and reduced the burden on the CPU.

GeoClipmap stores terrain height data in a 2D elevation image. The terrain grid is organized into L levels centered on the viewpoint, and the grid spacing of each level is twice that of the next inner level (see Figure 1). The algorithm defines a square clip window of n×n samples within each level. The height data of the clip window is updated as the viewpoint moves. The original algorithm targets GIS, where data sets are on the scale of tens or even hundreds of GB, so the terrain data is stored in highly compressed form. Each grid is also divided into many sub-blocks so that the CPU can cull them. Consequently, compressed data has to be decompressed whenever the terrain is updated, which consumes a large amount of CPU time.

Figure 1. GeoClipmap hierarchical grid: the inner level is dense and the outer level is sparse.

III. ALGORITHM DESCRIPTION IN THIS PAPER

A. Height map

The height map is a grayscale image that stores the elevation data of the terrain. In the original paper [4], Hoppe stored the height map in video memory and sampled heights with the Vertex Texture Fetch (VTF) feature of Shader Model 3.0. However, in this paper, the height map is

978-0-7695-3941-6/10 $26.00 © 2010 IEEE 196


DOI 10.1109/ICCMS.2010.146

Authorized licensed use limited to: IEEE Xplore. Downloaded on April 27,2010 at 20:53:09 UTC from IEEE Xplore. Restrictions apply.
stored in system memory, for three reasons. First, many ATI graphics cards do not support VTF, so keeping the height map in system memory lets the game run on more players' machines. Second, players roam over the terrain, so the CPU must know the terrain height at the player's position, and reading heights back from video memory is very slow for the CPU. Finally, on many NVIDIA graphics cards VTF is relatively slow.

B. Grid structure

To render ultra-large-scale GIS terrain in real time, the original paper [4] divides the grid into many blocks so that the CPU can cull the data, which greatly increases the complexity of the algorithm. In addition, the CPU must test the visibility of each sub-block, so it is impossible to render the entire terrain with a single DrawIndexedPrimitive (DIP) call. Because game terrain data is much smaller than GIS data, culling on the CPU may take more time than simply rendering everything on the GPU. In this paper, the terrain mesh is simplified to reduce the complexity of the algorithm and to render the entire terrain with one DIP call. Each level of the grid has a size of 2^n + 1. The innermost level of the grid is complete and fully detailed, and the outer levels are sparse (see Figure 2). Each level is a single block with no sub-blocks.

With this grid structure it is easy to render the whole terrain with only one DIP call, and it also helps to eliminate cracks and jumping in the terrain.

Figure 2. Structure of the grid. Vertex indices of the coarser level are added to eliminate cracks and jumping.

C. Vertex buffers and index buffers

1) Create buffer

All algorithms in this paper are implemented with DirectX. DirectX distinguishes three storage areas: system memory, AGP memory, and video memory. For the GPU, read speed is ordered video memory > AGP memory > system memory, and the order is reversed for the CPU. The CPU needs to write updated data to the vertex buffer but never needs to read it back. When the CPU writes to video memory, a cache of 32 or 64 bytes is allocated, and the vertex data is transferred to the GPU only when this cache is full. It is therefore faster for the CPU to write to video memory than to read from it. For this reason, the vertex data and index data are stored in video memory, which speeds up GPU reads, and the vertex buffer is created as write-only to optimize CPU writes.

2) Buffer Data Organization

The structure of the vertex data is as follows:
struct VERTEX
{
    short x;   // horizontal grid coordinate
    short z;   // horizontal grid coordinate
    float y;   // terrain height
};
The index buffer is organized into triangle strips. To draw the entire terrain with a single DIP call, indices that form degenerate triangles are added. As shown in Figure 3, the inner level and the outer level each have four rows. Without additional indices there would be eight strips, which would require eight draw calls. To reduce the number of strips, additional indices are added to link the strips together. The vertex indices are organized as follows:
5,0,6,1,7,2,8,3,9,4,4,10,10,5,
11,6,12,7,13,8,14,9,9,15,15,10,16,11,
17,12,18,13,19,14,14,20,20,15,21,16,22,
17,23,18,24,19,19,30,30,25,0,26,1,27,
2,27,3,28,4,29,31,31,32,32,30,30,0,30,
5,32,10,10,4,4,31,9,33,14,33,33,34,34,
32,32,10,32,15,34,20,20,14,14,33,19,35,
24,35,35,36,36,34,20,37,21,38,22,38,23,
39,24,40.
At the end of the first row and the beginning of the second row, the indices 4 and 10 are each repeated, which adds degenerate triangles and joins the whole inner block of the grid into one triangle strip. Similarly, the indices 24 and 30 are repeated, which links the finer level and the coarser level into a single block. With this method, rendering the whole terrain needs only one DIP call, whereas the original algorithm in [5] requires at least 3L + 2 draw calls; with L = 5, that is at least 17 calls.

Figure 3. The triangles are organized into a strip with indices.
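The linking of row strips by repeated indices can be illustrated with a short C++ sketch. It is not the authors' code: the function name buildGridStrip and the row-major vertex numbering (index = row × cols + col) are assumptions, chosen so that a 5×5 block reproduces the inner-level part of the index list in Section III.C (5,0,6,1,...,24,19). Linking the finer level to the coarser level, which the paper does with the outer-level vertices, is not covered here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Builds one triangle strip covering a regular block of rows x cols vertices,
// numbered row by row (index = row * cols + col). Consecutive row strips are
// joined by repeating the last index of one strip and the first index of the
// next, which produces zero-area (degenerate) triangles.
std::vector<std::uint16_t> buildGridStrip(int rows, int cols)
{
    assert(rows >= 2 && cols >= 2);
    assert(rows * cols <= 65536); // indices must fit in 16 bits

    std::vector<std::uint16_t> indices;
    for (int r = 0; r + 1 < rows; ++r) {
        if (r > 0) {
            // Link: repeat the previous strip's last index, then add the
            // first index of this strip once more before the strip itself.
            const std::uint16_t last = indices.back();
            indices.push_back(last);
            indices.push_back(static_cast<std::uint16_t>((r + 1) * cols));
        }
        // One strip between row r and row r + 1, alternating lower/upper row.
        for (int c = 0; c < cols; ++c) {
            indices.push_back(static_cast<std::uint16_t>((r + 1) * cols + c));
            indices.push_back(static_cast<std::uint16_t>(r * cols + c));
        }
    }
    return indices;
}
```

For a 5×5 block, buildGridStrip(5, 5) returns 46 indices beginning 5,0,6,1,7,2,8,3,9,4,4,10,10,5. The repeated indices form triangles with zero area, which cover no pixels, so the four row strips can be submitted in a single draw call.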

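The combination of a system-memory height map (Section III.A), levels of 2^n + 1 vertices (Section III.B), and the VERTEX layout (Section III.C) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the HeightMap type, the buildLevel function, border clamping, and the choice to build every level as a full square are all hypothetical, and the spacing rule (each coarser level doubles the grid spacing) is taken from Section II.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Vertex layout from Section III.C: grid position in x/z, height in y.
struct VERTEX
{
    short x;
    short z;
    float y;
};

// Hypothetical system-memory height map (row-major, size x size samples).
struct HeightMap
{
    int size;
    std::vector<float> samples;

    // Returns the height at (x, z), clamping coordinates to the map border.
    float at(int x, int z) const
    {
        if (x < 0) x = 0;
        if (z < 0) z = 0;
        if (x >= size) x = size - 1;
        if (z >= size) z = size - 1;
        return samples[static_cast<std::size_t>(z) * size + x];
    }
};

// Samples one level of (2^n + 1) x (2^n + 1) vertices centered on (cx, cz).
// Level 0 is the finest; each coarser level doubles the grid spacing.
std::vector<VERTEX> buildLevel(const HeightMap& map, int cx, int cz, int n, int level)
{
    assert(n >= 1 && n <= 7 && level >= 0 && level <= 6);
    const int side = (1 << n) + 1;
    const int spacing = 1 << level;
    const int half = (side / 2) * spacing; // distance from center to border

    std::vector<VERTEX> vertices;
    vertices.reserve(static_cast<std::size_t>(side) * side);
    for (int j = 0; j < side; ++j) {
        for (int i = 0; i < side; ++i) {
            const int x = cx - half + i * spacing;
            const int z = cz - half + j * spacing;
            // Coordinates must fit the 16-bit fields of VERTEX.
            assert(x >= -32768 && x <= 32767 && z >= -32768 && z <= 32767);
            vertices.push_back({static_cast<short>(x), static_cast<short>(z), map.at(x, z)});
        }
    }
    return vertices;
}
```

With n = 2, each level has 5×5 vertices. Because all levels sample the same height map, a coarser-level vertex that coincides with a finer-level vertex receives the same height, and the CPU can look up the height under the player directly with HeightMap::at.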

D. The elimination of cracks and jumping

Because adjacent levels differ in resolution, T-junctions and cracks appear along the grid border unless they are handled explicitly. As can be seen from Figure 3, cracks would occur at the border points 1, 3, 5, 9, .... The original paper [4] adds a transition region near the border and uses geometric morphing so that the vertex heights in the transition region change gradually from one level to the adjacent level.

The drawback of that method is that it needs additional storage for the heights of the coarser level and has to compute an alpha value for morphing. In this paper, the grid of each level has size 2^n + 1. Along the grid boundary, the height of the coarser level equals that of the finer level at even-numbered points but differs at odd-numbered points. By adding indices at the odd-numbered points to the coarser level, the coarser level uses the same indices as the finer level at these points, so both levels have the same height along the boundary. This completely prevents cracks and does not require storing an extra height for the coarser level, which helps to reduce the data transferred from the CPU to the GPU.

The great advantage of this method is that neither jumping nor the complex computation of alpha values has to be handled. Adding the extra indices of the coarser level in the border region makes the heights and the number of vertices of the finer and coarser levels consistent there. Consequently, there is no jumping caused by a mismatch between the interpolated height of the coarser level and the original height of the finer level.

E. Update buffer

The index buffer never changes, whereas the vertex buffer must be updated as the viewpoint moves. Each update transfers data from the CPU to the GPU, so to increase rendering speed these transfers should be kept to a minimum. In this paper the buffer is not updated every frame: an update is performed only when the viewpoint of the current frame differs from that of the last frame by more than one grid spacing. To reduce the number of updates further, the terrain grid can be scaled in x and z so that the grid spacing equals 4 or 8 times the movement step of the virtual character. In this case, the vertex buffer does not need to be updated while the character stands still, and when the character moves the buffer is updated only once every 4 or 8 frames.

F. Terrain texture

Game terrain needs highly realistic textures, and each texture must appear at the desired location: for example, pebbles in the river, grass on the plain, and a path running through the middle of the grass. There must therefore be a way to control where each texture is placed. Some conventional methods choose the texture according to the terrain height. However, this obviously makes the textures of every scene look similar and monotonous. In this paper, an alpha blend texture controls exactly where each texture is applied. For example, in Figure 4, area A can be filled with red, B with green, and C with blue, which is easy to do in Photoshop.

The pixel shader uses the following formula to compute the color:
Color = w × (n1 × blend.r × grass + n2 × blend.g × stone + n3 × blend.b × path);
w = 1.0 / (blend.r + blend.g + blend.b);
Here grass, stone, and path are the color values sampled from the corresponding textures; blend.r, blend.g, and blend.b are the red, green, and blue components of the alpha blend texture; and the parameters n1, n2, and n3 control the brightness of each texture. To use more textures, more than one alpha blend texture can be created to control the weight of each color texture.

Textures are compressed in DXT1 format, which significantly reduces their storage space. The GPU can decompress DDS files in DXT1 format quickly, so DXT1 greatly reduces both the file size and the bandwidth from the CPU to the GPU. It also avoids the CPU decompression used in [5].

Figure 4. Alpha blend texture, used to control the weight of each color texture.

G. The Light of terrain

In [5], the authors compute the terrain lighting in real time from the normal vectors. This works well for dynamic lights, but in a computer game the light position in a scene is typically fixed. It is therefore better to modulate the terrain color with a light map. The light map can be produced in advance with suitable software. At run time, the light map is passed to the pixel shader and blended with the color texture. This approach reduces the run-time cost of computing illumination and gives very realistic results.

IV. EXPERIMENTS AND RESULTS

Our main data set is a 4096×4096 height map. We render this terrain into a 1024×768 window on a PC with a 1.6 GHz Sempron 2800+ CPU, 1 GB of system memory, and an NVIDIA GeForce 6600 GT GPU with 128 MB of video memory. The


size of the clip window is 129×129, and there are 5 levels. Table I compares the method of this paper with the Quadtree method of a traditional game engine. Figures 5 and 6 are two screenshots of the experimental results measured with NVIDIA PerfHUD 6.

TABLE I. COMPARISON BETWEEN THE ALGORITHM IN THIS PAPER AND THE QUADTREE ALGORITHM

Metric                        This paper    Quadtree
FPS (frames/sec)              260           133
DrawPrim count (red line)     1             63
GPU idle time (green line)    0 ms          11 ms

In Table I, FPS is the frame rate, which reflects the speed of the algorithm. DrawPrim count denotes the number of draw calls per frame, and GPU idle time denotes the time the GPU spends waiting for the CPU. DrawPrim count and GPU idle time were measured with NVIDIA PerfHUD 6. Because PerfHUD 6 itself consumes some time, the measured values differ slightly from those obtained without it. However, the data still reflects the speed of the two algorithms well.

Figure 5. Test screenshot of our algorithm (using the NVIDIA PerfHUD 6 performance test).

Figure 6. Test screenshot of the Quadtree algorithm (using the NVIDIA PerfHUD 6 performance test).

The experiments show that the algorithm in this paper takes full advantage of the GPU. With our algorithm, GPU idle time is 0 ms and the number of DIP calls is one. With the Quadtree algorithm, GPU idle time is 11 ms and the number of DIP calls is as high as 63. The frame rate of our algorithm is 260 frames per second, compared with 133 frames per second for the Quadtree algorithm, which greatly improves the efficiency of terrain rendering.

V. SUMMARY

In this paper, the Geometry Clipmaps algorithm is adapted to be better suited to rendering game terrain. By improving the grid structure, carefully organizing the indices, and inserting degenerate triangles, the entire terrain can be rendered with a single draw call. Cracks and jumping are eliminated by adding indices of the coarser level in the boundary area. The algorithm fully utilizes the GPU, and its frame rate is considerably higher than that of the Quadtree algorithm. Using an alpha blend texture to control the blending of color textures also increases flexibility in texture mapping.

REFERENCES

[1] P. Lindstrom, D. Koller, W. Ribarsky, L. Hodges, N. Faust, and G. Turner, "Real-time, continuous level of detail rendering of height fields," ACM SIGGRAPH 1996, USA: ACM, 1996, pp. 109-118.
[2] M. Duchaineau, M. Wolinsky, D. Sigeti, M. Miller, C. Aldrich, and M. Mineev-Weinstein, "ROAMing terrain: Real-time optimally adapting meshes," Proceedings of IEEE Visualization 97, pp. 81-88, 1997.
[3] John Owens, "Streaming Architectures and Technology Trends," GPU Gems 2, Addison-Wesley, 2005, 7.
[4] Frank Losasso and Hugues Hoppe, "Geometry clipmaps: terrain rendering using nested regular grids," ACM Transactions on Graphics, pp. 769-776, 2004.
[5] Arul Asirvatham and Hugues Hoppe, "Terrain Rendering Using GPU-Based Geometry Clipmaps," GPU Gems 2, Addison-Wesley, 2005, 7.
