OpenGL 4.0 - Mountains demo

Dynamic geometry level-of-detail (LOD) algorithms are popular and powerful: they improve rendering performance while preserving perceived detail by using less detailed geometry for objects that are far away, too small, or otherwise less significant to the quality of the final image. Many of them have been in use since the early days of computer graphics and are present in some form in current CAD software, video games and other graphics applications. While determining the appropriate geometry LOD was previously the task of the CPU, with today's hardware it is possible to offload this work to the GPU, which excels at handling large numbers of objects in parallel.

Introduction

With the advent of Shader Model 5.0 GPUs and the appearance of programmable tessellation hardware, it may seem that the geometry LOD problem is solved once and for all. However, in many cases tessellation alone is not enough: for far away objects, even a pass-through tessellation shader already produces more geometry than the added detail is worth. As a result, classic geometry LOD algorithms are still worth having in the developer's toolbox. Moreover, GPU vendors recommend disabling tessellation shaders entirely when no geometry amplification is needed, as even a pass-through tessellation shader has a cost.

This means that a conventional rendering path is still needed for geometry that should not be tessellated. So why not try offloading the geometry LOD determination to the GPU as well?

This article presents a technique that was already demonstrated by AMD’s March of the Froblins demo and by NVIDIA’s Skinned Instancing demo. It performs dynamic geometry LOD determination on the GPU using a geometry shader that selects the most appropriate LOD from a group of geometry LODs based on the object’s distance from the camera. While this article and the reference implementation (OpenGL 4.0 – Mountains demo) apply the technique only to instanced geometry, the same method can be extended to support heterogeneous objects by taking advantage of the latest functionality introduced in OpenGL 4.

The algorithm

The technique relies on the geometry shader’s ability to emit, or decline to emit, primitives into a transform feedback buffer, as done in the DirectX-based implementations mentioned above. One major improvement over those earlier approaches is that LOD determination is done in a single pass rather than requiring a separate pass for each geometry LOD. Additionally, this pass can be merged with other visibility determination passes such as Instance Cloud Reduction or Hierarchical-Z map based occlusion culling, as is done in the reference implementation. This is made possible by the transform feedback capabilities introduced in OpenGL 4.0 (see the ARB_transform_feedback3 extension), which enable the geometry shader to output data to separate streams.

Flow-chart presenting the culling and dynamic LOD algorithms used in AMD's March of the Froblins demo. The implementation needs five passes for culling and separating three detail levels, and performs two asynchronous queries along the way. Requires OpenGL 3 compliant hardware.

Flow-chart presenting the culling and dynamic LOD algorithm used in our Mountains demo. The implementation requires only one pass for culling and separating three detail levels without the need to use asynchronous queries. Requires OpenGL 4 compliant hardware.

The algorithm itself is simple and straightforward. For each object instance, determine the appropriate geometry LOD based on its distance from the camera and the LOD distances passed to the shader as a uniform. Then write the instance’s data to the output stream whose ID corresponds to the index of the selected LOD. Here is a GLSL implementation of the algorithm:

#version 400 core

uniform mat4 ModelViewMatrix;
uniform vec2 LodDistance;

layout(points) in;
layout(points, max_vertices = 1) out;

in vec3 InstancePosition[1];

layout(stream=0) out vec3 InstPosLOD0;
layout(stream=1) out vec3 InstPosLOD1;
layout(stream=2) out vec3 InstPosLOD2;

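void main() {
  // Eye-space distance of the instance from the camera. Take .xyz so that
  // the homogeneous w component does not contribute to the length.
  float dist = length((ModelViewMatrix * vec4(InstancePosition[0], 1.0)).xyz);
  if ( dist < LodDistance.x ) {
    // Closest range: highest detail level, written to stream 0.
    InstPosLOD0 = InstancePosition[0];
    EmitStreamVertex(0);
  } else if ( dist < LodDistance.y ) {
    // Middle range: medium detail level, written to stream 1.
    InstPosLOD1 = InstancePosition[0];
    EmitStreamVertex(1);
  } else {
    // Everything farther away: lowest detail level, written to stream 2.
    InstPosLOD2 = InstancePosition[0];
    EmitStreamVertex(2);
  }
}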
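
When checking the shader's output, it can help to have the same selection rule available on the CPU. The following is a minimal C sketch and not part of the demo: the names `select_lod_sq` and `count_lods`, and the assumption that instance positions are already given in eye space as packed xyz triples, are hypothetical choices made for illustration. It compares squared distances, which gives the same result as the shader's test as long as the thresholds are non-negative.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_LOD 3

/* Hypothetical helper: the same two-threshold test as the geometry shader,
 * applied to a squared distance so that no square root is needed. */
static int select_lod_sq(float dist_sq, float lod_near, float lod_far)
{
    assert(lod_near >= 0.0f && lod_near <= lod_far);
    if (dist_sq < lod_near * lod_near) return 0;  /* highest detail */
    if (dist_sq < lod_far * lod_far)   return 1;  /* medium detail  */
    return 2;                                     /* lowest detail  */
}

/* Counts how many instances fall into each LOD.
 * eye_pos holds instance_count packed xyz triples, assumed to be in eye
 * space already (i.e. after the model-view transform). */
static void count_lods(const float *eye_pos, size_t instance_count,
                       float lod_near, float lod_far,
                       unsigned counts[NUM_LOD])
{
    for (int i = 0; i < NUM_LOD; i++)
        counts[i] = 0;

    for (size_t i = 0; i < instance_count; i++) {
        const float *p = &eye_pos[3 * i];
        float dist_sq = p[0] * p[0] + p[1] * p[1] + p[2] * p[2];
        counts[select_lod_sq(dist_sq, lod_near, lod_far)]++;
    }
}
```

Given the same thresholds and instance positions, and with no other culling merged into the pass, the counts from such a reference would be expected to match the number of primitives written to each output stream.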

Additionally, the geometry LOD determination pass has to be executed with primitive queries enabled for all the relevant output streams to acquire the number of instances for each geometry LOD index:

for (int i=0; i<NUM_LOD; i++)
  glBeginQueryIndexed(GL_PRIMITIVES_GENERATED, i, lodQuery[i]);

glBeginTransformFeedback(GL_POINTS);
  glDrawArrays(GL_POINTS, 0, instanceCount);
glEndTransformFeedback();

for (int i=0; i<NUM_LOD; i++)
  glEndQueryIndexed(GL_PRIMITIVES_GENERATED, i);

Finally, the only thing left is to issue an instanced draw call for each geometry LOD index to draw all the instances:

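for (int i=0; i<NUM_LOD; i++) {
  // glGetQueryObjectiv writes the result through a pointer, so pass the address.
  glGetQueryObjectiv(lodQuery[i], GL_QUERY_RESULT, &instanceCountLOD[i]);
  // Skip the draw call when no instance was assigned to this LOD.
  if ( instanceCountLOD[i] > 0 )
    glDrawElementsInstanced(..., instanceCountLOD[i]);
}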

That’s all, and what you get as a result is a fully GPU based geometry LOD selection algorithm.

The Mountains demo

The reference implementation is provided as part of the OpenGL 4.0 – Mountains demo, which is available with full source code and a Windows executable in the downloads section. In addition to the dynamic geometry LOD algorithm presented here, the demo application implements the same visibility determination algorithms that were presented in the SIGGRAPH 2008 Course Notes, all in a single pass.

Dynamic LOD can be enabled in the demo with the F3 key. Once enabled, the demo separates the various geometry detail levels according to the configured LOD distances. As can be seen, there is almost no visible difference between the scene rendered with dynamic geometry LOD enabled and disabled. Also, when the LOD distances are set appropriately, the algorithm provides a seamless transition between subsequent geometry detail levels as the camera moves.

Close-up view of distant objects to compare the image quality without (left) and with (right) dynamic LOD.

Geometry LOD visualization: LOD 0 (red), LOD 1 (green), LOD 2 (blue).

When dynamic LOD is enabled, the demo also makes it possible to visualize the various geometry detail levels by pressing the F4 key. The highest detail LOD is marked in red, the mid-level in green and the lowest detail geometry in blue. It can be seen that as the camera moves, the renderer automatically adjusts the detail of each individual instance.

Besides maintaining constant quality, with no transitions between the detail levels noticeable to the viewer, the algorithm provides a large performance gain for complex geometry, as can be seen in the figure below:

Performance comparison of the demo in frames per second on a Radeon HD5770 (higher is better): no culling (bottom), instance cloud reduction (middle), ICR + Hi-Z map based occlusion culling (top), no geometry LOD (blue), dynamic geometry LOD (red).

Conclusion

We’ve seen how straightforward it is to implement GPU based dynamic geometry LOD determination using geometry shaders on OpenGL 4.0 compliant hardware, and provided a reference implementation that uses the algorithm to efficiently determine detail levels for a large number of instanced geometries. We also briefly mentioned that the algorithm can be extended to handle arbitrary object sets. We discussed a possible OpenGL 3 based implementation but did not provide one, as it requires several rendering passes to perform all the operations that can be implemented in a single pass on Shader Model 5.0 hardware.

Even though the algorithm is already extremely efficient, it still involves the use of asynchronous primitive queries that may induce some latency. Of course, this latency can be easily hidden by performing other operations on the CPU/GPU until the results are available.

Furthermore, by taking full advantage of Shader Model 5.0 GPUs it would be possible to eliminate the need for asynchronous queries by using atomic counters and indirect rendering. However, the core OpenGL specification does not yet expose such functionality, so this improvement is left for a future release of the demo.
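
To make the indirect rendering idea more concrete, here is a small C sketch of the per-LOD draw commands such a path might use. It is not taken from the demo. The struct follows the five-field command layout read by `glDrawElementsIndirect`, while `LodMesh`, `build_lod_commands` and the idea of filling the commands on the CPU are hypothetical and only for illustration; in a fully GPU-driven variant the instance count would presumably be written on the GPU so that no query result has to be read back.

```c
#include <assert.h>
#include <stdint.h>

/* Command layout read by glDrawElementsIndirect: five tightly packed
 * 32-bit fields. */
typedef struct {
    uint32_t count;          /* number of indices per instance          */
    uint32_t instanceCount;  /* number of instances to draw             */
    uint32_t firstIndex;     /* offset into the index buffer            */
    int32_t  baseVertex;     /* value added to every fetched index      */
    uint32_t reserved;       /* must be zero in OpenGL 4.0              */
} DrawElementsIndirectCommand;

/* Hypothetical description of where one LOD mesh lives in shared
 * vertex and index buffers. */
typedef struct {
    uint32_t index_count;
    uint32_t first_index;
    int32_t  base_vertex;
} LodMesh;

/* Fills one indirect command per LOD from the per-LOD instance counts. */
static void build_lod_commands(const LodMesh *meshes,
                               const uint32_t *instance_counts,
                               int num_lod,
                               DrawElementsIndirectCommand *out)
{
    assert(num_lod > 0);
    for (int i = 0; i < num_lod; i++) {
        out[i].count         = meshes[i].index_count;
        out[i].instanceCount = instance_counts[i];
        out[i].firstIndex    = meshes[i].first_index;
        out[i].baseVertex    = meshes[i].base_vertex;
        out[i].reserved      = 0;
    }
}
```

A command whose `instanceCount` is zero draws nothing, so with this layout the per-LOD draw calls could be issued unconditionally instead of being skipped on the CPU.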

Classic dynamic geometry LOD algorithms are still first class citizens of every rendering system. Even though the introduction of hardware tessellation somewhat reduces the need for these classic techniques, practice shows that the best way to implement a full-fledged dynamic LOD system is to use geometry LOD selection and tessellation together rather than one instead of the other.