An introduction to OpenGL 4.1
The Khronos Group keeps up the pace it set for itself, delivering the latest specification of OpenGL less than half a year after the revolutionary appearance of OpenGL 4. Abandoning the OpenGL 3.x line of the specification (at least for a while), the new update concentrates on Shader Model 5.0 class GPUs and on extensions heavily promoted by the community. Besides all this, the Khronos Group now openly moves towards convergence with OpenGL ES, making the desktop version of the specification downward compatible with its embedded brother. In this article I would like to present the features introduced with the latest revision of the specification.
At the time of the release of the OpenGL 4 specification I was able to quickly deliver a thorough presentation of all the new features introduced by that revision. This time I am quite late; however, I hope this article will still prove valuable to many of you, especially those who haven't had time recently to dig into the details of the new API version.
OpenGL 4.1 is not as revolutionary and feature-rich as its predecessor; however, the latest revision was well received by the community, as it brought into core several extensions the community had long been waiting for. The new revision of the specification was accompanied by a couple of other ARB extensions that have not yet been included in core. I will still talk about some of them, as they indicate a slight shift in the balance of influence among the vendors and other members of the Architecture Review Board (ARB).
New features of OpenGL 4.1
Let’s start with the presentation of the new features arriving with the OpenGL 4.1 specification, primarily targeting Shader Model 5.0 hardware. Here you will find a lot of harmonization features as well as community-requested features squarely intended to increase OpenGL development efficiency and freedom.
For a long time there have been rumors about the Khronos Group preparing a convergence between desktop OpenGL and OpenGL ES. GL_ARB_ES2_compatibility, now part of the core specification, clearly takes the first step towards this goal by making the desktop version of the specification downward compatible with ES. The extension adds support for features of OpenGL ES 2.0 that are missing from OpenGL 3+. According to the extension specification, enabling these features will ease the process of porting applications from OpenGL ES 2.0 to OpenGL.
More precisely, GL_ARB_ES2_compatibility not only exposes all the functions and tokens that weren’t present in the desktop version of the specification but also adds all the semantics that were previously specified only in the embedded version. Just to mention a few of them:
- Vertex data formats are extended with 32-bit signed fixed-point values in 16.16 format, exposed through the GL_FIXED type token.
- The range and precision of the numeric formats used internally by shaders can be queried (glGetShaderPrecisionFormat).
- GLSL ES can be used for writing shaders for desktop GL.
While having this extension under the hood does not mean that we can simply pick our latest game made for, e.g., Symbian and just drop it on our PC, this extension may prove to be of great value for GL ES developers migrating their software to desktop platforms.
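To make the GL_FIXED format concrete, here is a minimal C sketch of converting between float and the 16.16 fixed-point representation that GLfixed uses (a 32-bit signed integer whose low 16 bits hold the fraction). The helper names and the clamping of out-of-range values are my own choices for illustration; arrays of such values would then be handed to glVertexAttribPointer with GL_FIXED as the type.

```c
#include <stdint.h>

/* GLfixed-style value: 32-bit signed integer, 16 integer bits (incl. sign)
   and 16 fractional bits. The typedef name is hypothetical. */
typedef int32_t fixed16_16;

/* Convert a float to 16.16 fixed point, rounding half away from zero.
   Out-of-range inputs are clamped; this is a choice made in this sketch. */
static fixed16_16 float_to_fixed(float f) {
    double scaled = (double)f * 65536.0;
    if (scaled >= 2147483647.0) return INT32_MAX;
    if (scaled <= -2147483648.0) return INT32_MIN;
    return (fixed16_16)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Convert a 16.16 fixed-point value back to float. */
static float fixed_to_float(fixed16_16 x) {
    return (float)x / 65536.0f;
}
```

For example, 1.0 maps to 65536 and -1.5 maps to -98304.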
GL_ARB_get_program_binary is one of the additions to the core specification most awaited by the developer community. It introduces the possibility to retrieve a binary representation of compiled and linked program objects, which can later be used to specify a program object directly from its binary code. This provides a caching mechanism that eliminates the need for compilation and linking the next time the shader has to be used. It also makes it possible to create an offline GLSL compiler using just the OpenGL API itself.
Still, it has to be mentioned that having this feature at hand does not mean that we can simply create our shader binaries offline and then distribute our software without the shader source, as the binary formats supported by a particular implementation heavily depend on the hardware vendor as well as the driver version. This is because the shader binary most probably consists of instructions generated specifically for the particular GPU and driver combination. The only way to relax this limitation would be some sort of cross-platform byte-code for shaders, but that would in fact defeat most of the benefits of the extension. Additionally, this extension does not define any binary formats but leaves this to vendor-specific extensions; it only exposes a common infrastructure for retrieving and loading program binaries.
While the use of this extension does not completely eliminate the need for shader source compilation, it can limit compilation and linking to installation time or first-run time, with the stored binaries used afterwards. It also leaves room for SDK tools that provide offline shader compilers with more aggressive optimizations. Such tools are a realistic prospect, as the specification explicitly mentions that binaries generated by the GL at run time should be interchangeable with those generated by offline SDK tools.
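The caching workflow described above could look roughly like the following C sketch. It covers only the storage side and needs no GL context; the GL calls that produce and consume the blob are listed in comments. The file layout (a magic tag, the binary format enum, the length and the blob) and all names in it are hypothetical. Keep in mind that glProgramBinary may reject a cached binary, for example after a driver update, so the application should check GL_LINK_STATUS and fall back to compiling from source.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Arbitrary tag marking files written by this sketch (hypothetical layout). */
#define CACHE_MAGIC 0x47504231u

/* A retrieved program binary. With a GL context it would be obtained via:
 *   glProgramParameteri(prog, GL_PROGRAM_BINARY_RETRIEVABLE_HINT, GL_TRUE); // before linking
 *   glGetProgramiv(prog, GL_PROGRAM_BINARY_LENGTH, &len);
 *   glGetProgramBinary(prog, len, NULL, &format, data);
 * and restored with:
 *   glProgramBinary(prog, format, data, length);
 *   glGetProgramiv(prog, GL_LINK_STATUS, &ok); // if !ok, rebuild from source
 */
typedef struct {
    uint32_t format;      /* binaryFormat reported by glGetProgramBinary */
    uint32_t length;      /* blob size in bytes */
    unsigned char *data;  /* opaque, GPU/driver specific blob */
} program_binary;

/* Writes header + blob. The header uses native byte order, which is fine
   because the blob itself is specific to this machine anyway. */
static int cache_write(FILE *f, const program_binary *pb) {
    uint32_t header[3] = { CACHE_MAGIC, pb->format, pb->length };
    if (fwrite(header, sizeof header, 1, f) != 1) return -1;
    if (pb->length > 0 && fwrite(pb->data, pb->length, 1, f) != 1) return -1;
    return 0;
}

/* Returns 0 on success (caller frees pb->data), -1 if the file is truncated
   or was not written by cache_write. */
static int cache_read(FILE *f, program_binary *pb) {
    uint32_t header[3];
    if (fread(header, sizeof header, 1, f) != 1 || header[0] != CACHE_MAGIC) return -1;
    pb->format = header[1];
    pb->length = header[2];
    pb->data = malloc(pb->length > 0 ? pb->length : 1);
    if (pb->data == NULL) return -1;
    if (pb->length > 0 && fread(pb->data, pb->length, 1, f) != 1) {
        free(pb->data);
        pb->data = NULL;
        return -1;
    }
    return 0;
}
```

On the first run the application would compile from source and call cache_write; on later runs it would try cache_read plus glProgramBinary first.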
GL_ARB_separate_shader_objects is another extension requested by the community on several forums. It has a longer history, as it is based on the already existing and widely supported GL_EXT_separate_shader_objects extension by NVIDIA. Those already familiar with the predecessor won’t find much new in the specification of the ARB version; still, it is a must-read for them as well, because even though there aren’t many semantic differences between the functionality of the two, their usage differs quite a lot: the ARB version solves the design issues of its predecessor by introducing a new type of GL object that I will talk about in a moment.
In a nutshell, this extension provides a way to create program objects containing any combination of shader stages and to bind several of them to the current rendering context at once. Previously there was no way to bind multiple program objects to the context, as the program object was designed to be a container for all the shaders forming the rendering pipeline. This design decision of GLSL meant that, before this extension, the varyings of subsequent shader stages were connected using name-based binding. As the names are matched only at link time, the shaders were tightly coupled, meaning that a change in any shader stage required relinking the complete program object.
This proved to be very unpleasant for OpenGL developers, as every rendering engine usually has its own set of vertex and fragment shaders (possibly accompanied by other shader types) that are used in various combinations. As an example, let’s take two vertex shaders: a simple MVP-matrix-based transformation shader and a more complex one that also supports skeletal animation. Also, let’s take two fragment shaders: one for diffuse material and one for reflective material. This gives four types of objects: static with diffuse material, static with reflective material, animated with diffuse material and animated with reflective material.
In traditional GLSL, the vertex and fragment shaders are bound together at link time rather than when they are bound to the context, as was the case with legacy shaders (GL_ARB_vertex_program, GL_ARB_fragment_program and others). This means that in order to use any combination of vertex and fragment shaders (and maybe some geometry and tessellation shaders as well) we are left with two possible solutions, both with severe drawbacks:
Link every combination of the shader objects
While this sounds like a viable solution and is still used by most developers, it has several problems. First of all, it wastes resources, as we now have several copies of the same piece of code, and the number of combinations can be pretty high, especially if not just vertex and fragment shaders are in use. While this is already a considerable issue, the biggest problem for the application developer is having to maintain a separate set of uniform locations for each program, as well as binding points for vertex attributes, draw buffers and possibly transform feedback buffers. While the GL_ARB_explicit_attrib_location extension already eliminates the need for maintaining binding points for vertex attributes, this solution is still simply unacceptable.
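The cost of linking every combination is easy to quantify. The small C sketch below, with hypothetical helper names, counts program objects when every combination is linked versus when each shader variant is linked once as a separable single-stage program (assuming one program per variant). For the two-by-two example above both counts are four, but the gap grows quickly: 4 vertex, 2 tessellation control, 3 geometry and 5 fragment variants need 120 monolithic programs but only 14 separable ones.

```c
#include <stddef.h>

/* Programs needed when every combination is linked into one monolithic
   program object: the product of the variant counts of all stages. */
static unsigned long monolithic_programs(const unsigned *variants, size_t stages) {
    unsigned long total = 1;
    for (size_t i = 0; i < stages; ++i) total *= variants[i];
    return total;
}

/* Programs needed when each variant is linked once as a separable
   single-stage program: the sum of the variant counts of all stages. */
static unsigned long separable_programs(const unsigned *variants, size_t stages) {
    unsigned long total = 0;
    for (size_t i = 0; i < stages; ++i) total += variants[i];
    return total;
}
```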
Link the program objects on an on-demand basis
With this alternative we link the shader objects only when a particular combination is actually needed. While this eliminates the need for a possibly huge number of program objects, it introduces a considerable run-time performance hit due to the additional relinking. Additionally, this solution is even inferior to the previous one, as uniform locations are determined at link time, so it causes no less headache for the application developer.
This is the rationale behind this extension and why it was included in the core specification. The extension relaxes the tightly coupled behavior of GLSL and adopts a mix-and-match shader stage model, allowing multiple program objects to be bound at once, each to its own set of pipeline stages, independently of the other stage bindings.
Since program objects are no longer the topmost containers for the code currently used by the rendering pipeline, the ARB decided to introduce a new container object called a “program pipeline object”, which holds a set of program objects, each bound to its own set of shader stages. This is the main difference between the EXT and the ARB version of the extension. I think introducing this new type of object and the associated semantics was a good decision, as I always felt that the EXT version was not really well designed; I saw it as a kind of hack to relax the limitations of GLSL. The program pipeline object idea is definitely superior, and I hope GLSL does not have too many more such annoying design issues hidden within.
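The mix-and-match model can be illustrated with a toy C model of a pipeline object: one program name per stage, plus a function that mimics the binding behavior of glUseProgramStages. This is not how GL implements it, only a sketch assuming the stage bit values of the GL_*_SHADER_BIT tokens; it ignores details such as what happens when a program lacks an executable for a requested stage. In real code the corresponding calls would be glGenProgramPipelines, glUseProgramStages and glBindProgramPipeline, with the programs created by glCreateShaderProgramv or linked with GL_PROGRAM_SEPARABLE set.

```c
#include <stdint.h>

/* Stage bits, same values as the GL_*_SHADER_BIT tokens.
   Stage index i in the toy pipeline corresponds to bit (1 << i). */
enum {
    VERTEX_SHADER_BIT          = 0x00000001,
    FRAGMENT_SHADER_BIT        = 0x00000002,
    GEOMETRY_SHADER_BIT        = 0x00000004,
    TESS_CONTROL_SHADER_BIT    = 0x00000008,
    TESS_EVALUATION_SHADER_BIT = 0x00000010
};

#define STAGE_COUNT 5

/* Toy stand-in for a program pipeline object (hypothetical type). */
typedef struct {
    uint32_t program[STAGE_COUNT]; /* 0 means no program bound to the stage */
} pipeline;

/* Mimics glUseProgramStages: bind `program` to every stage whose bit is set
   in `stages`, leaving all other stage bindings untouched. */
static void use_program_stages(pipeline *p, uint32_t stages, uint32_t program) {
    for (int i = 0; i < STAGE_COUNT; ++i)
        if (stages & (1u << i)) p->program[i] = program;
}
```

Swapping the vertex stage from the simple MVP shader to the skinning shader leaves the fragment binding untouched, which the monolithic model could not do without relinking.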
GL_ARB_shader_precision is much more a clarification of the existing specification than a new feature. It specifies the precision requirements of GLSL implementations more strictly. According to the specification, the extension is meant to define more precisely the precision of arithmetic operations (addition, multiplication, etc.) and transcendentals (log, exp, pow, etc.), when NaNs (not-a-number) and INFs (infinities) are accepted and generated, and the denorm flushing behavior. The precision of the remaining operations, including trigonometric functions, is not addressed by the extension. For further details, please refer to the extension specification.
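Precision bounds of this kind are largely expressed in ULPs (units in the last place), i.e. how many representable floating-point values apart a result may be from the exact one. As an illustration only, the C sketch below (helper names are my own) measures the ULP distance between two finite single-precision floats, which is one way to check shader results read back from the GPU against a CPU reference. NaNs are deliberately not handled.

```c
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern onto an integer line where adjacent
   representable floats differ by exactly 1 and +0 and -0 coincide. */
static int64_t float_to_ordinal(float f) {
    int32_t bits;
    memcpy(&bits, &f, sizeof bits);
    /* IEEE 754 floats are sign-magnitude: mirror negative values below zero. */
    return bits < 0 ? (int64_t)INT32_MIN - bits : (int64_t)bits;
}

/* Distance in ULPs between two finite floats (NaNs are not handled). */
static int64_t ulp_distance(float a, float b) {
    int64_t d = float_to_ordinal(a) - float_to_ordinal(b);
    return d < 0 ? -d : d;
}
```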
GL_ARB_vertex_attrib_64bit adds 64-bit floating-point types to the list of supported vertex attribute component types. Nominally OpenGL has accepted double-precision vertex data since the very early stages of its history, but in practice such data was converted to single precision, and only the latest generation of hardware really accepts vertex attributes in double-precision floating-point format. While OpenGL 4 already introduced support for 64-bit floating-point values in GLSL and most of the shader environment, vertex attributes gained 64-bit precision only with this new extension.
This new feature makes it possible to use high precision for position data and other attributes of our geometry. While this sounds pretty awesome, and actually is, game developers and other real-time graphics users should not rush to the new precision but use it only where the precision requirements of the application really demand it: 64-bit floating-point vertex attributes not only double the memory consumption but also carry a serious performance hit due to bandwidth limitations, and vertex attributes of this type may count double against the implementation-dependent limit on the number of vertex shader attribute vectors.
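A quick back-of-the-envelope calculation shows the cost. The C helpers below (hypothetical names) compute the size of an attribute and the number of attribute slots it may occupy, under the worst-case assumption that three- and four-component double attributes count double. For a mesh with a three-component position and normal, switching from float to double grows each vertex from 24 to 48 bytes, so a million vertices go from 24 MB to 48 MB. On the GL side, double attributes meant to reach the shader as dvec types are specified with glVertexAttribLPointer; GL_DOUBLE data passed to glVertexAttribPointer is still converted to single precision.

```c
#include <stddef.h>

/* Bytes occupied by one attribute of `components` values, each
   `component_size` bytes wide. */
static size_t attrib_bytes(unsigned components, size_t component_size) {
    return components * component_size;
}

/* Attribute slots consumed, assuming the worst case where 3- and
   4-component double attributes count double (implementation-dependent). */
static unsigned attrib_slots(unsigned components, int is_double) {
    return (is_double && components > 2) ? 2u : 1u;
}
```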
Previously, the viewport configuration, i.e. the transformation that generates window coordinates from the normalized device coordinates of the incoming vertices, was global state that affected all draw commands; to draw a primitive into multiple viewports, the viewport had to be changed between several draw calls. While previously this limitation wasn’t really an issue, geometry shaders, which can amplify geometry and produce multiple output primitives for each input primitive, justify the need for several separately configurable viewports. Why? Because even though one was able to render the output primitives into separate render targets, they still shared the same global viewport.
GL_ARB_viewport_array enhances OpenGL with a mechanism to specify multiple viewports and gives the geometry shader the ability to select the viewport on a per-primitive basis by writing the gl_ViewportIndex output. This does not just mean that separate viewports can be used for separate render targets but also that multiple viewports can be used when rendering into the same render target.
Additionally, the introduction of a viewport array means that we also get a separate scissor rectangle for each viewport in the array. This can come in handy for deferred shading renderers, which often use the scissor rectangle to limit the number of pixels touched when rendering the effect of a light source. Having multiple scissors means that we have to change state less often, so batching is much less of an issue even with heavy scissor rectangle usage.
Finally, the new viewport specification commands accept floating-point values, giving application developers additional flexibility to define their very own pixel center conventions.
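As an example of feeding the viewport array, the following C sketch tiles a framebuffer into a grid of viewports and writes them in the x, y, width, height layout that glViewportArrayv(first, count, v) takes. The function name is hypothetical, and the geometry shader side (writing gl_ViewportIndex per primitive) is only mentioned in a comment. Because the values are floats, a 1366-pixel-wide target split into four columns can be described with 341.5-pixel-wide viewports, leaving the final precision to the implementation.

```c
#include <stddef.h>

/* Fill `out` with cols * rows viewports tiling a width x height framebuffer.
   Each viewport takes four floats (x, y, w, h), ordered row by row from the
   bottom-left corner. The result can be passed as
       glViewportArrayv(0, cols * rows, out);
   and a geometry shader then picks a tile per primitive via gl_ViewportIndex. */
static void grid_viewports(float width, float height,
                           unsigned cols, unsigned rows, float *out) {
    float w = width / (float)cols;
    float h = height / (float)rows;
    size_t k = 0;
    for (unsigned r = 0; r < rows; ++r) {
        for (unsigned c = 0; c < cols; ++c) {
            out[k++] = (float)c * w; /* x */
            out[k++] = (float)r * h; /* y */
            out[k++] = w;            /* width */
            out[k++] = h;            /* height */
        }
    }
}
```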
I’m not sure whether this feature really depends on Shader Model 5.0 hardware; maybe others know more about this. Anyway, I wouldn’t be surprised if this extension were supported by a much larger range of graphics cards than just SM5 GPUs. Actually, this may be true for many other extensions introduced by OpenGL 4.1, but let’s not guess; let’s wait for the upcoming drivers to see whether I’m right or wrong.
Some other interesting extensions
So far I have presented the new features of the latest revision of the OpenGL specification. While this was the main topic of this article, at about the same time the specification was published, several other ARB extensions appeared in the registry. While these extensions are not yet included in core and I cannot know whether they ever will be, I would like to talk about some of them, as they led me to an interesting conclusion.
The stencil test is a powerful OpenGL mechanism for selectively discarding fragments based on the content of the stencil buffer, used in a wide variety of rendering techniques including shadow volumes and deferred shading. However, the stencil test and stencil operations are completely fixed-function, limited to operations such as incrementing or decrementing the existing value, or replacing it with a fixed reference value.
GL_ARB_shader_stencil_export adds some programmability to the fixed-function stencil operations by enabling the fragment shader to output a stencil reference value on a per-fragment basis. When stencil testing is enabled, this allows the test to be performed against the value generated in the shader. Also, when the stencil operation is set to GL_REPLACE, this allows a value generated in the shader to be written to the stencil buffer directly.
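These semantics can be modelled on the CPU. The C sketch below is a toy model (not GL code, with hypothetical names) of the stencil test followed by the GL_REPLACE operation for a subset of stencil functions, ignoring the depth test. Here ref stands for whatever the fragment shader wrote to gl_FragStencilRefARB, instead of the single reference value set with glStencilFunc. With GL_ALWAYS and GL_REPLACE, for example, every fragment can stamp its own value, such as a material ID, into the stencil buffer in a single pass.

```c
#include <stdint.h>

/* A subset of the GL stencil functions, modelled for illustration. */
typedef enum { FUNC_ALWAYS, FUNC_EQUAL, FUNC_LESS } stencil_func;

/* GL compares (ref & mask) against (stored & mask). */
static int stencil_test(stencil_func func, uint8_t ref, uint8_t stored, uint8_t mask) {
    uint8_t r = (uint8_t)(ref & mask);
    uint8_t s = (uint8_t)(stored & mask);
    switch (func) {
    case FUNC_ALWAYS: return 1;
    case FUNC_EQUAL:  return r == s;
    case FUNC_LESS:   return r < s;
    }
    return 0;
}

/* One fragment: on pass, GL_REPLACE writes the per-fragment ref (in GLSL,
   e.g. gl_FragStencilRefARB = materialId; with
   #extension GL_ARB_shader_stencil_export : require) limited by the write
   mask; on fail the stored value is kept (GL_KEEP). */
static void stencil_fragment(uint8_t *stored, stencil_func func, uint8_t ref,
                             uint8_t mask, uint8_t write_mask) {
    if (stencil_test(func, ref, *stored, mask))
        *stored = (uint8_t)((*stored & ~write_mask) | (ref & write_mask));
}
```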
This opens up a lot of possibilities; however, I need to think more about it, as the best use cases of this feature are far from basic. Obviously, exporting the stencil reference value from a fragment shader disables early stencil testing, in the same way as exporting a new depth value from a fragment shader disables early depth testing.
GL_ARB_debug_output allows OpenGL to notify the application when various events occur that can come in handy during application development and debugging. These events include errors, usage of deprecated functionality, configurations that result in undefined behavior, and portability or performance issues. The application is notified about these events through a callback function, registered by passing a function pointer to the appropriate OpenGL command.
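A minimal callback could look like the C sketch below. It follows the parameter order of the ARB_debug_output callback type (source, type, id, severity, length, message, userParam) but uses plain C types so that it compiles without GL headers; with real headers you would declare it with GLenum, GLuint, GLsizei, GLchar and APIENTRY and register it with glDebugMessageCallbackARB(debug_callback, &stats). The token values are those of the GL_DEBUG_TYPE_*_ARB and GL_DEBUG_SEVERITY_HIGH_ARB enums, while the statistics struct is hypothetical. Messages are only guaranteed in a debug context, and enabling GL_DEBUG_OUTPUT_SYNCHRONOUS_ARB makes it easier to break at the offending call.

```c
#include <stdio.h>

/* Values of GL_DEBUG_TYPE_*_ARB and GL_DEBUG_SEVERITY_HIGH_ARB. */
#define DEBUG_TYPE_ERROR               0x824C
#define DEBUG_TYPE_DEPRECATED_BEHAVIOR 0x824D
#define DEBUG_TYPE_UNDEFINED_BEHAVIOR  0x824E
#define DEBUG_TYPE_PORTABILITY         0x824F
#define DEBUG_TYPE_PERFORMANCE         0x8250
#define DEBUG_SEVERITY_HIGH            0x9146

/* Hypothetical per-application statistics passed as userParam. */
typedef struct { unsigned errors; unsigned high_severity; } debug_stats;

static const char *debug_type_name(unsigned type) {
    switch (type) {
    case DEBUG_TYPE_ERROR:               return "error";
    case DEBUG_TYPE_DEPRECATED_BEHAVIOR: return "deprecated";
    case DEBUG_TYPE_UNDEFINED_BEHAVIOR:  return "undefined behavior";
    case DEBUG_TYPE_PORTABILITY:         return "portability";
    case DEBUG_TYPE_PERFORMANCE:         return "performance";
    default:                             return "other";
    }
}

/* Same parameter order as the ARB_debug_output callback type. */
static void debug_callback(unsigned source, unsigned type, unsigned id,
                           unsigned severity, int length, const char *message,
                           const void *user_param) {
    debug_stats *stats = (debug_stats *)user_param;
    (void)source; (void)id; (void)length; /* message is NUL-terminated */
    if (type == DEBUG_TYPE_ERROR) stats->errors++;
    if (severity == DEBUG_SEVERITY_HIGH) stats->high_severity++;
    fprintf(stderr, "[GL %s] %s\n", debug_type_name(type), message);
}
```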
While this extension provides a callback mechanism only for debugging purposes, the most revolutionary thing about it is that this is the first official appearance of a feature that calls back into application code. I’m most probably not the only person who would like to see more callbacks in the OpenGL API in the future, as we could benefit from them, e.g., by getting notified about the completion of previously issued asynchronous commands. This would not just provide a lot of flexibility but might also help optimize rendering code based on information that is currently available only through polling.
Why are these extensions so interesting?
The two extensions presented above are already of great value on their own, but this isn’t why I mentioned them. I found these extensions so interesting because they are both obviously based on vendor-specific extensions released recently by AMD, namely GL_AMD_shader_stencil_export and GL_AMD_debug_output. This conspicuously reveals that AMD has serious plans for its OpenGL support, something that a lot of crazy folks like me, who develop OpenGL stuff on ATI cards, have been waiting for.
I think this also means that the NVIDIA monopoly in the ARB is over, resulting in competition from which OpenGL and its community will definitely benefit in the long run.
The article ran out of control again, just like the one I wrote about the previous release of the specification. Again, I hope there are at least a few of you who kept reading and finally got to this last chapter of the article. We can once more quote the ever-recurring question of the community:
Where is direct state access?
Well, it is still not here; however, AMD has finally finished implementing it as well and made it public. They had been working on it for quite some time, but it became officially available only with Catalyst 10.7. I haven’t used it so far, so plenty of hidden bugs may still be in it, but at least they have it. This is another thing that strengthens my prediction that AMD has committed itself to supporting OpenGL, as previously they barely added support for any extensions beyond core features.
Back to the topic of the OpenGL 4.1 specification: while it is not as revolutionary as the previous update we got used to, OpenGL is still on track, thanks to the Khronos Group and obviously to the great community. If OpenGL keeps evolving iteratively at the pace we’ve seen in the last two years, Microsoft will have a difficult time keeping up.
Thanks for reading this not-so-short article!
This entry was posted by Daniel Rákos on August 24, 2010 at 7:32 pm, and is filed under Graphics, Programming.