Uniform Buffers vs. Texture Buffers
OpenGL 3.1 introduced two new sources from which shaders can retrieve their data, namely uniform buffers and texture buffers. These can be used to accelerate rendering when shaders make heavy use of application-provided data, as in skeletal animation, especially when combined with geometry instancing. However, even though the functionality has been in the core specification for about a year now, there are few demos out there that show its usage, so there is considerable confusion about when to use these buffers and which one is more suitable for a particular use case.
Both AMD and NVIDIA have updated their GPU programming guides to present the latest facilities provided by both OpenGL and DirectX; however, I still see that many developers don't really understand how these buffers work, which prevents them from taking full advantage of the features.
Once, on some online forum, I found somebody asking why the Khronos Group introduced this whole confusion, why there is no single general buffer type instead, and why the choice between uniform and texture buffers isn't simply made by the driver. That particular post motivated me to write this article.
Admittedly, such an abstraction may seem convenient for the application; however, one should never forget that OpenGL is just a thin layer on top of any graphics capable hardware, and as such it should not hide details that, in the hands of a good programmer, can provide additional performance benefits.
When used as shader input, both uniform buffers and texture buffers have strengths and weaknesses that are documented for application developers, especially in the detailed descriptions found in the vendors' GPU programming guides. It would be very difficult, if not impossible, for the driver to decide which buffer type to use based on the shader source code alone, and doing so would give the programmer less flexibility.
To decide which of the two should be used for a particular purpose, the developer must investigate the characteristics of both and make the choice based on that. To ease this decision, I will try to present the most important features of each. I will also talk about what I've used them for and what results I've achieved.
Uniform buffers:

- Maximum size: 64 KByte (or more)
- Memory access pattern: coherent access
- Memory storage: usually local memory
- Use case examples: geometry instancing, skeletal animation, etc.
Uniform buffers were introduced in OpenGL 3.1 but are also available on driver implementations that don't conform to version 3.1 of the standard via the GL_ARB_uniform_buffer_object extension. As the specification says, uniform buffers provide a way to group GLSL uniforms into so-called "uniform blocks" and source their data from buffer objects, which gives the application more streamlined ways to update that data.
As uniform buffers are relatively small, they can easily fit in local memory. This makes data access nearly instant and thus provides optimum performance whenever the size constraints don't prevent the application developer from using them. However, vendors also state that uniform buffers prefer a coherent memory access pattern: they perform best when accesses to the buffer data are relatively local. This does not necessarily mean that the coherent reads must occur within a single shader execution; as in the case of geometry instancing, subsequent shader executions can together produce the desired access pattern.
Personally, I use them for instanced rendering by storing the model-view matrix and related information of each instance in a common uniform buffer and using the instance ID as an index into this combined data structure. This usage performs very well on my system.
Uniform buffers can also be used to store bone matrices for implementing skeletal animation; however, I personally prefer normal 2D textures for this purpose, to take advantage of the free interpolation provided by the dedicated texture fetching units, but that's another story.
Uniform buffers can also be used for other rendering techniques like skinned instancing or geometry deformation, but the buffer size limitation may rule out such use cases.
Texture buffers:

- Maximum size: 128 MByte (or more)
- Memory access pattern: random access
- Memory storage: global texture memory
- Use case examples: skinned instancing, geometry tessellation, etc.
Texture buffers also became core OpenGL in version 3.1 of the specification but are likewise available via the GL_ARB_texture_buffer_object extension (or via the GL_EXT_texture_buffer_object extension on earlier implementations). Buffer textures are one-dimensional arrays of texels whose storage comes from an attached buffer object.
They provide by far the largest storage for raw data access, much larger than equivalent 1D textures. However, they don't provide texture filtering and other facilities that are usually available for other texture types; they represent formatted 1D data arrays rather than texture images. From another perspective, however, they are still textures residing in global memory, so the access method is completely different from that of uniform buffers. This has both advantages and disadvantages.
First, accessing global texture memory means a texture fetch, which involves the use of a texture unit and possibly takes several clock cycles to complete. Still, thanks to the latency hiding mechanisms inside today's commodity GPUs, this can sometimes be as cheap as accessing uniform buffers. This part of the story is implementation dependent and up to the hardware vendor; however, as stated in their programming guides, both AMD and NVIDIA have such latency hiding facilities, and both suggest that one should not expect a huge performance impact when using texture buffers.
On the other hand, texture memory access provides a huge benefit compared to uniform buffers: textures are built for scattered accesses and thus cope much better with random memory access. As the AMD HD2000 series programming guide says, if a certain set of data is accessed in a very random fashion, it may even be faster to use texture fetches than indexed uniform accesses.
So even though texture buffers can be used in the same scenarios as uniform buffers, the performance of either depends much more on the actual shader implementation than on the hardware implementation of the features.
Besides the aforementioned use cases, texture buffers can be used for more advanced techniques like instanced skeletal animation or even for implementing geometry tessellation, though I'm not convinced the latter has any practical use, as it involves tricks that don't perform well on current hardware. Personally, I use texture buffers for various geometry deformation techniques, to resolve batching issues when the size limitation of uniform buffers is a blocking factor, and for some inverse kinematics effects.
From here, it's your task to draw a conclusion based on the information presented, but I recommend reading the mentioned programming guides for a more accurate presentation of both methods. My personal conclusion is that there is no ultimate choice, as the two buffer types serve different purposes. Even though their possible use cases overlap, there are plenty of rendering techniques that would take advantage of the benefits of one but would suffer from the disadvantages of the other.
For further details on the topic, please refer to the OpenGL extension registry and the vendor supplied GPU programming guides:
- AMD’s ATI Radeon HD2000 programming guide: http://developer.amd.com/media/gpu_assets/ATI_Radeon_HD_2000_programming_guide.pdf
- NVIDIA’s G80 GPU programming guide: http://developer.download.nvidia.com/GPU_Programming_Guide/GPU_Programming_Guide_G80.pdf
This entry was posted by Daniel Rákos on January 18, 2010, and is filed under Graphics, Programming.