<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>RasterGrid Blog &#187; callback</title>
	<atom:link href="http://rastergrid.com/blog/tag/callback/feed/" rel="self" type="application/rss+xml" />
	<link>http://rastergrid.com/blog</link>
	<description>A technical blog from Daniel Rákos (aka aqnuep)</description>
	<lastBuildDate>Fri, 24 Feb 2012 03:23:41 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Suggestions for OpenGL 4.2 and beyond</title>
		<link>http://rastergrid.com/blog/2010/11/suggestions-for-opengl-4-2-and-beyond/</link>
		<comments>http://rastergrid.com/blog/2010/11/suggestions-for-opengl-4-2-and-beyond/#comments</comments>
		<pubDate>Sun, 14 Nov 2010 17:15:23 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[callback]]></category>
		<category><![CDATA[fragment shader]]></category>
		<category><![CDATA[geometry instancing]]></category>
		<category><![CDATA[GLSL]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[texture buffer]]></category>
		<category><![CDATA[transform feedback]]></category>
		<category><![CDATA[uniform buffer]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=504</guid>
		<description><![CDATA[The Khronos Group has done a great job in the last few years of proving once again that OpenGL is still in the game and that it can become the ultimate graphics API of choice, if it is not that already. However, we must note that it is not quite true yet that OpenGL 4.1 is a]]></description>
			<content:encoded><![CDATA[
<p>The Khronos Group has done a great job in the last few years of proving once again that OpenGL is still in the game and that it can become the ultimate graphics API of choice, if it is not that already. However, we must note that it is not quite true yet that OpenGL 4.1 is a superset of its competitor, DirectX 11. There are still some holes to be filled, and I think the ARB should not stop there: current hardware architectures have much more potential than any graphics API currently exposes, so establishing the future of OpenGL should start by going one step further than DX11. In this article I would like to present my vision of the items that should be included in the next revision of the specification and how I see the future of OpenGL.</p>
<p><span id="more-504"></span>Since the original OpenGL Longs Peak announcement, graphics developers were really excited to get their hands on the completely revised OpenGL 3 specification. However, due to severe backward compatibility and portability issues the original plan seemed to fail, and developers expressed great disappointment at the ARB&#8217;s decision to choose a more evolutionary move away from the legacy API instead of a radical rewrite. Still, the Khronos Group has proved that the decision was not necessarily bad for OpenGL: we now have a pretty powerful API, even though the coexistence of the legacy and the new design greatly increased the complexity of the specification.</p>
<p>What we have now is an API that can really compete with DirectX 11, but I strongly believe that this is not the end of the story, as we still have a lot of work ahead of us. I mean this both from the point of view of exposing more hardware capabilities and of streamlining the API itself to increase the productivity of the developers who use it. My plan is to address both of these issues in this article, also focusing on hardware functionality that is not exposed by other graphics APIs yet.</p>
<h2>Exposing more hardware capabilities</h2>
<p>In this chapter of the article I will talk about some familiar and some less familiar hardware features and the corresponding OpenGL extensions that should be included in the next revision of the specification, so that we can confidently say that OpenGL is a strict superset of the competing graphics APIs. The extensions are not listed in any particular priority order; they are simply arranged in a way that eases the discussion of their functionality.</p>
<h3><a title="GL_EXT_shader_image_load_store" href="http://www.opengl.org/registry/specs/EXT/shader_image_load_store.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/shader_image_load_store.txt?referer=');">GL_EXT_shader_image_load_store</a></h3>
<p>This extension provides GLSL built-in functions allowing shaders to load from, store to, and perform atomic read-modify-write operations on a single level of a texture from any shader stage. The extension also indirectly enables the same operations for buffer objects through texture buffers. This enables developers to implement more sophisticated shader algorithms that require more complex data structures than plain arrays.</p>
<p>An example use case is the implementation of Order-Independent Transparency (OIT) using per-pixel fragment linked lists, as presented by <a title="OIT And Indirect Illumination Using Dx11 Linked Lists" href="http://www.slideshare.net/hgruen/oit-and-indirect-illumination-using-dx11-linked-lists" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.slideshare.net/hgruen/oit-and-indirect-illumination-using-dx11-linked-lists?referer=');">AMD at GDC10</a>. Of course, many other techniques could benefit from hardware-accelerated random-access images (called UAV textures/buffers in DX11 terminology), including algorithms related to global illumination, ray tracing and, my personal favorite, scene management.</p>
<p>As the introduction of write operations in fragment shaders, besides the traditional framebuffer writes, makes the results of shader execution sensitive to whether the hardware uses early-Z or not, the extension also introduces a new fragment shader input layout qualifier called &#8220;early_fragment_tests&#8221; that forces OpenGL to perform the depth and stencil tests before fragment shader execution. Otherwise the existing specification language remains valid, stating that the depth and stencil tests are performed after fragment shader execution.</p>
<p>Finally, the extension provides some control over the ordering of image loads, stores, and atomics relative to other pipeline operations accessing the same memory, both through the OpenGL API (glMemoryBarrierEXT) and from within shaders (the GLSL memoryBarrier() function).</p>
<p>The API itself provides a DSA-style binding mechanism (glBindImageTextureEXT takes the texture name directly) for attaching textures to so-called &#8220;image units&#8221;, which are separate from texture image units. In the same spirit, the specification language and GLSL refer to the introduced read-write textures as &#8220;images&#8221;.</p>
<p>In my opinion this is one of the most important extensions that should become core in OpenGL 4.2, and I&#8217;m pretty sure this will actually happen.</p>
<h3><a title="GL_NV_texture_barrier" href="http://www.opengl.org/registry/specs/NV/texture_barrier.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/NV/texture_barrier.txt?referer=');">GL_NV_texture_barrier</a></h3>
<p>This extension relaxes the restrictions of OpenGL on rendering to a currently bound texture and provides a mechanism to avoid read-after-write problems. More precisely, the extension allows rendering to a currently bound texture in the following cases:</p>
<ul>
<li>If the reads and writes are from/to disjoint sets of texels (after accounting for texture filtering rules), so it works as long as the areas being read and written do not overlap, or</li>
<li>If there is only a single read and write of each texel, and the read is in the fragment shader invocation that writes the same texel (e.g. using texelFetch2D).</li>
</ul>
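<p>To make the first rule a bit more concrete, here is a minimal C sketch of the kind of check it implies, assuming the read and write regions are axis-aligned texel rectangles and that linear filtering may touch at most one extra texel around the read region. The TexelRect type and the feedback_safe function are hypothetical illustrations, not part of the extension.</p>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical texel-space rectangle, half-open: [x0, x1) x [y0, y1). */
typedef struct { int x0, y0, x1, y1; } TexelRect;

static bool rects_overlap(TexelRect a, TexelRect b) {
    return a.x0 < b.x1 && b.x0 < a.x1 && a.y0 < b.y1 && b.y0 < a.y1;
}

/* Returns true if reading read_rect while rendering to write_rect of the same
 * texture level touches disjoint texel sets. With linear filtering a sample near
 * the edge of the read region may also fetch its neighbor, so the read set is
 * conservatively grown by one texel on every side. */
bool feedback_safe(TexelRect write_rect, TexelRect read_rect, bool linear_filter) {
    assert(write_rect.x0 <= write_rect.x1 && write_rect.y0 <= write_rect.y1);
    assert(read_rect.x0 <= read_rect.x1 && read_rect.y0 <= read_rect.y1);
    if (linear_filter) {
        read_rect.x0 -= 1;
        read_rect.y0 -= 1;
        read_rect.x1 += 1;
        read_rect.y1 += 1;
    }
    return !rects_overlap(write_rect, read_rect);
}
```

<p>Adjacent regions are the tricky case: they are disjoint texel sets, but with linear filtering a sample at the border can still fetch a texel that is being written in the same pass.</p>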
<p>Some of these situations were already implicitly supported, like rendering to one texture level while fetching from another. But the extension goes further and provides an API function (glTextureBarrierNV) to put an explicit barrier between draw calls to ensure correct rendering.</p>
<p>The extension can be used to implement a limited form of programmable blending and can eliminate the need for image or buffer data copies, provided we can live with the restrictions mentioned above.</p>
<p>One may ask why we need this extension if we have the <a title="GL_EXT_shader_image_load_store" href="http://www.opengl.org/registry/specs/EXT/shader_image_load_store.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/shader_image_load_store.txt?referer=');">GL_EXT_shader_image_load_store</a> extension, as this one provides just a subset of that functionality. The answer is simple: performance. While read-write textures can mimic the same functionality, they usually use different hardware paths that are slower than regular read-only texture accesses. So it would definitely be beneficial to have this extension in core OpenGL as well.</p>
<h3>GL_ARB_shader_atomic_counters</h3>
<p>This extension does not have a public specification yet; however, it can be found in the extension lists of the latest Catalyst driver releases, sometimes with an EXT, sometimes with an ARB prefix. The extension provides an API to access a number of hardware atomic counters that offer efficient, GPU-wide counter operations.</p>
<p>Atomic counters come in handy when one has to read or write individual elements of a buffer or texture at unique locations. For example, this extension is needed to efficiently implement the OIT algorithm mentioned earlier because, when constructing the fragment linked list, we need unique offsets into the linked list buffer. These unique offsets can, of course, be acquired with atomic read-modify-write operations on images, but those perform much slower than hardware atomic counters.</p>
<p>Besides this example, atomic counters are useful in many algorithms from many domains. One important use case is performing feedback operations similar to those provided by transform feedback, which can be used for various scene management or culling mechanisms.</p>
<p>The extension provides access to these atomic counters from GLSL and also makes it possible to back them with buffer objects, so after OpenGL draw calls the values of the counters are preserved in these buffers for subsequent use.</p>
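<p>The following C11 sketch models the linked-list construction on the CPU to show why a single shared counter is enough to hand out unique offsets. It assumes one atomic counter for node allocation and one atomic head pointer per pixel, the latter standing in for an image updated with imageAtomicExchange from GL_EXT_shader_image_load_store. The buffer sizes and helper names are hypothetical.</p>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define LIST_END  0xFFFFFFFFu  /* sentinel: empty list / end of list */
#define WIDTH     4
#define HEIGHT    4
#define MAX_NODES 64

typedef struct {
    float    depth;
    uint32_t color;  /* packed RGBA8 */
    uint32_t next;   /* index of the next node or LIST_END */
} FragmentNode;

static atomic_uint  node_counter;          /* models the hardware atomic counter */
static atomic_uint  head[WIDTH * HEIGHT];  /* models the r32ui head pointer image */
static FragmentNode nodes[MAX_NODES];      /* models the node buffer */

/* Per-frame reset: counter to zero, every head pointer to "empty". */
void clear_lists(void) {
    atomic_store(&node_counter, 0u);
    for (int i = 0; i < WIDTH * HEIGHT; ++i)
        atomic_store(&head[i], LIST_END);
}

/* What each fragment shader invocation would do. Returns 0 if the node buffer
 * is full and the fragment had to be dropped. */
int insert_fragment(int x, int y, float depth, uint32_t color) {
    assert(x >= 0 && x < WIDTH && y >= 0 && y < HEIGHT);
    /* The counter hands out a unique slot: no two fragments share an index. */
    uint32_t idx = atomic_fetch_add(&node_counter, 1u);
    if (idx >= MAX_NODES)
        return 0;
    nodes[idx].depth = depth;
    nodes[idx].color = color;
    /* The new node becomes the head and links to the previous head. */
    nodes[idx].next = atomic_exchange(&head[y * WIDTH + x], idx);
    return 1;
}

/* Walks one pixel's list; a resolve pass would sort these nodes by depth. */
int list_length(int x, int y) {
    int n = 0;
    for (uint32_t i = atomic_load(&head[y * WIDTH + x]); i != LIST_END; i = nodes[i].next)
        ++n;
    return n;
}
```

<p>Each fragment performs one counter increment and one exchange, so no two fragments ever write the same node regardless of the order in which they arrive; a resolve pass would later walk and sort each per-pixel list by depth.</p>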
<h3><a title="GL_AMD_conservative_depth" href="http://www.opengl.org/registry/specs/AMD/conservative_depth.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/AMD/conservative_depth.txt?referer=');">GL_AMD_conservative_depth</a></h3>
<p>Early depth test is a common optimization for hardware-accelerated graphics that skips the evaluation of fragment shaders for fragments that would be discarded anyway because they fail the depth test. The problem is that if the fragment shader modifies the depth value of the fragment, early depth test is disabled. One can force early depth testing with the functionality introduced by the extension <a title="GL_EXT_shader_image_load_store" href="http://www.opengl.org/registry/specs/EXT/shader_image_load_store.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/shader_image_load_store.txt?referer=');">GL_EXT_shader_image_load_store</a>, but that can lead to rendering artifacts, as the modified depth value output by the fragment shader is then not taken into account.</p>
<p>This extension allows the application to pass enough information to the GL implementation to activate some early depth test optimizations safely while still taking the final depth value into account in the depth test. To achieve this, the extension introduces four new layout qualifiers for the gl_FragDepth output called &#8220;depth_unchanged&#8221;, &#8220;depth_any&#8221;, &#8220;depth_greater&#8221; and &#8220;depth_less&#8221;. The most interesting ones are the last two, which make it possible to perform early-Z and hierarchical-Z tests in one direction to discard groups of fragments while still allowing the fragment shader to safely modify the depth value.</p>
<p>This technique comes in very handy when rendering volumetric particles, decals or billboards. Without this extension one has to sacrifice early rejection of fragments in order to be able to render such volumetric primitives.</p>
<p>As far as I know this feature is also present in DirectX 11, so it should be a must for OpenGL 4.x as well. As the extension is an AMD one, I don&#8217;t know whether NVIDIA GPUs support anything like this in hardware, but even if not, they can simply ignore the new layout qualifiers and perform a late depth test instead. Of course, this would result in lower performance, but as far as functionality is concerned it should be just fine.</p>
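<p>The reasoning behind depth_greater can be sketched as a small decision function. In GLSL the qualifier is applied by redeclaring the depth output, e.g. <code>layout (depth_greater) out float gl_FragDepth;</code>. The C model below is a hypothetical illustration that assumes a GL_LESS depth function; it returns whether a fragment can be rejected before the shader runs.</p>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the four gl_FragDepth layout qualifiers. */
typedef enum { DEPTH_ANY, DEPTH_UNCHANGED, DEPTH_GREATER, DEPTH_LESS } DepthLayout;

/* Assumes GL_LESS: a fragment passes if its final depth < stored depth.
 * Returns true if the fragment fails no matter what depth the shader writes,
 * so the implementation may skip shading it. */
bool can_reject_early(DepthLayout layout, float interpolated_z, float stored_z) {
    assert(layout >= DEPTH_ANY && layout <= DEPTH_LESS);
    bool fails_now = !(interpolated_z < stored_z);
    switch (layout) {
    case DEPTH_UNCHANGED: /* final depth == interpolated depth */
    case DEPTH_GREATER:   /* final depth >= interpolated depth: can only fail harder */
        return fails_now;
    case DEPTH_LESS:      /* shader may pull the fragment closer and make it pass */
    case DEPTH_ANY:       /* nothing is known about the final depth */
    default:
        return false;
    }
}
```

<p>With a GL_GREATER depth function the roles of depth_greater and depth_less are swapped, which is what &#8220;in one direction&#8221; refers to.</p>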
<h3>GL_ARB_instanced_arrays2</h3>
<p>OpenGL provides two means of geometry instancing via the extensions <a title="GL_ARB_draw_instanced" href="http://www.opengl.org/registry/specs/ARB/draw_instanced.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/draw_instanced.txt?referer=');">GL_ARB_draw_instanced</a> and <a title="GL_ARB_instanced_arrays" href="http://www.opengl.org/registry/specs/ARB/instanced_arrays.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/instanced_arrays.txt?referer=');">GL_ARB_instanced_arrays</a>. While this (as yet non-existent) extension would extend both, it is more relevant to the latter, so I named it accordingly.</p>
<p>The extension should add the trivial ability to specify a &#8220;first instance&#8221; parameter for instanced draw commands. Whether this is accomplished by introducing new variants of the glDrawElements* and glDrawArrays* commands or by a separate command for specifying the new parameter is up to the ARB. The extension should also interact with <a title="GL_ARB_draw_indirect" href="http://www.opengl.org/registry/specs/ARB/draw_indirect.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/draw_indirect.txt?referer=');">GL_ARB_draw_indirect</a>, which already mentions the lack of this parameter in GL and has reserved a field in the indirect draw command structure for specifying it.</p>
<p>This extension would be a bug fix much more than a completely new feature, as this functionality should have been exposed the first time instancing was introduced to OpenGL.</p>
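<p>The sketch below shows the command structure as defined by GL_ARB_draw_indirect (with GLuint written as uint32_t) and how a first-instance value could affect instanced array fetching. The formula assumes the offset is simply added after the divisor is applied; the function name and its first_instance parameter are hypothetical.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Layout of DrawArraysIndirectCommand from GL_ARB_draw_indirect. The last,
 * reserved field is the slot the proposed "first instance" would occupy. */
typedef struct {
    uint32_t count;
    uint32_t primCount;
    uint32_t first;
    uint32_t reservedMustBeZero;
} DrawArraysIndirectCommand;

/* Element of an instanced vertex array (divisor != 0) read by a given instance.
 * Today first_instance is effectively always 0, so every draw starts reading
 * its per-instance data at element 0 of the bound array. */
uint32_t instanced_attrib_index(uint32_t instance_id, uint32_t divisor,
                                uint32_t first_instance) {
    assert(divisor != 0);
    return instance_id / divisor + first_instance;
}
```

<p>With such an offset, the per-instance data of many draw calls could live in a single buffer, each draw starting at its own element, without re-specifying the vertex attribute pointer between draws.</p>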
<h3>GL_ARB_draw_indirect2</h3>
<p>This is one of the extensions I would be happiest to see in the next release of the OpenGL specification. It would be a functional addition to the <a title="GL_ARB_draw_indirect" href="http://www.opengl.org/registry/specs/ARB/draw_indirect.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/draw_indirect.txt?referer=');">GL_ARB_draw_indirect</a> extension, which currently only allows the execution of a single instanced draw command that sources its parameters from a buffer object.</p>
<p>The new extension would add a new buffer binding point, called e.g. GL_DRAW_INDIRECT_PRIMITIVE_COUNT, that would specify the source of the &#8220;primcount&#8221; parameter (the number of draw commands to execute) for the following newly introduced draw commands:</p>
<pre>    void <strong>MultiDrawArraysIndirect</strong>( enum <em>mode</em>, sizei stride,
                                  const void *<em>indirect</em>,
                                  const void *<em>primcount</em> );
    void <strong>MultiDrawElementsIndirect</strong>( enum <em>mode</em>, enum <em>type</em>, sizei stride,
                                    const void *<em>indirect</em>,
                                    const void *<em>primcount</em> );</pre>
<p>This would not only allow executing multiple indirect draw commands at once without further CPU involvement, but would also source the &#8220;primcount&#8221; parameter from a buffer object. Thus, if the draw commands are generated using transform feedback, read-write buffers or OpenCL (e.g. by a GPU-based scene management algorithm), the application does not have to use asynchronous queries or other means that may introduce sync points into the rendering in order to feed the &#8220;primcount&#8221; parameter.</p>
<p>Some people said that this is quite a futuristic feature to expect and that such functionality will most probably be available only on newer GPU generations, maybe with OpenGL 5. I was not that pessimistic, so I decided to raise the question with the relevant ARB members at NVIDIA and AMD. While I did not receive any answer from NVIDIA, I did receive some good news from AMD: they said that this functionality can be implemented on Shader Model 5.0 level hardware.</p>
<p>What this extension would give developers is a way to efficiently implement GPU-based scene management, where the GPU assembles all the rendering commands for the current frame using atomic counters and buffer writes, and the CPU just has to issue a few, or maybe just a single, MultiDraw*Indirect command to render the whole scene. Of course, the feature can also increase draw command throughput in the case of CPU-based scene management.</p>
<p>So my message to the Khronos Group is: please start working on such an extension, as it would not just make developers happy but would also strengthen OpenGL&#8217;s position in the industry by putting something into the specification that even DirectX 11 cannot do.</p>
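<p>The intended semantics can be modeled on the CPU as follows. This is only a sketch of the proposal: multi_draw_arrays_indirect, the DrawFn callback and the count_vertices helper are hypothetical, both the commands and the draw count are read from memory that the GPU could have written, and a stride of 0 is assumed to mean tightly packed commands.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the GL_ARB_draw_indirect DrawArraysIndirectCommand. */
typedef struct {
    uint32_t count;
    uint32_t primCount;
    uint32_t first;
    uint32_t reservedMustBeZero;
} DrawArraysIndirectCommand;

/* Hypothetical stand-in for issuing one instanced draw. */
typedef void (*DrawFn)(uint32_t first, uint32_t count, uint32_t instances, void *user);

/* CPU model of the proposed MultiDrawArraysIndirect. The number of commands is
 * read from primcount_buf, modeling the GL_DRAW_INDIRECT_PRIMITIVE_COUNT buffer,
 * so the application itself never needs to know it. */
void multi_draw_arrays_indirect(const void *indirect, size_t stride,
                                const void *primcount_buf, DrawFn draw, void *user) {
    assert(stride == 0 || stride >= sizeof(DrawArraysIndirectCommand));
    if (stride == 0)
        stride = sizeof(DrawArraysIndirectCommand);  /* assumed: 0 = tightly packed */
    uint32_t n;
    memcpy(&n, primcount_buf, sizeof n);             /* draw count from "buffer" */
    const unsigned char *base = (const unsigned char *)indirect;
    for (uint32_t i = 0; i < n; ++i) {
        DrawArraysIndirectCommand cmd;
        memcpy(&cmd, base + (size_t)i * stride, sizeof cmd);
        if (cmd.count != 0 && cmd.primCount != 0)   /* empty commands draw nothing */
            draw(cmd.first, cmd.count, cmd.primCount, user);
    }
}

/* Example callback: totals the vertices that would be processed. */
void count_vertices(uint32_t first, uint32_t count, uint32_t instances, void *user) {
    (void)first;
    *(uint64_t *)user += (uint64_t)count * instances;
}
```

<p>The key point is that the loop bound comes from buffer memory, so if a GPU pass wrote both the commands and their number, the CPU would never have to read either of them back.</p>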
<h3><a title="GL_AMD_transform_feedback3_lines_triangles" href="http://www.opengl.org/registry/specs/AMD/transform_feedback3_lines_triangles.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/AMD/transform_feedback3_lines_triangles.txt?referer=');">GL_AMD_transform_feedback3_lines_triangles</a></h3>
<p>OpenGL 4.0 introduced the extension <a title="GL_ARB_transform_feedback3" href="http://www.opengl.org/registry/specs/ARB/transform_feedback3.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/transform_feedback3.txt?referer=');">GL_ARB_transform_feedback3</a>, which further extended the transform feedback capabilities provided by earlier extensions to allow output to separate vertex streams. However, there is one caveat: multiple vertex streams are only supported for point primitives.</p>
<p>This new AMD extension simply removes that restriction, allowing the same set of primitive types to be used with multiple transform feedback streams as with a single stream, as long as the primitive type is the same for all output streams.</p>
<p>Limiting the output primitive type for transform feedback into multiple streams should not be a problem unless you also want to rasterize some triangles at the same time as you capture output. Without relaxing this restriction you can do this only by issuing two separate draw commands, which incurs a performance hit.</p>
<p>I don&#8217;t know whether the restriction is present in the ARB extension because NVIDIA hardware does not support this, but if that is not the case, I think this extension should be included in the next release of the specification. Otherwise, NVIDIA, please include this feature in your next GPU generation.</p>
<h3><a title="GL_NV_copy_image" href="http://www.opengl.org/registry/specs/NV/copy_image.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/NV/copy_image.txt?referer=');">GL_NV_copy_image</a></h3>
<p>OpenGL 3.1 already introduced a method for GPU-accelerated copying of buffer data (GL_ARB_copy_buffer). This NVIDIA extension provides similar functionality for efficient image data transfers between image objects (i.e. textures and renderbuffers).</p>
<p>While there are already methods to copy image data between textures, e.g. using the <a title="GL_EXT_framebuffer_blit" href="http://www.opengl.org/registry/specs/EXT/framebuffer_blit.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/framebuffer_blit.txt?referer=');">GL_EXT_framebuffer_blit</a> extension promoted to core in OpenGL 3.0, these require expensive framebuffer object operations and lack direct support for transferring 3D image data.</p>
<p>This extension simply introduces a single command (glCopyImageSubDataNV) that allows such image data copies for every type of texture (including cube maps, 3D textures and array textures) without the need to bind the image objects or otherwise configure the rendering state.</p>
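<p>For illustration, a CPU model of such a sub-box copy might look like the C sketch below. The Image3D type and the copy function are hypothetical; they merely mirror the shape of the glCopyImageSubDataNV parameters (source and destination offsets plus a width, height and depth), and treat z as the slice, layer or face index.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tightly packed 3D image: texel (x, y, z) is stored at
 * ((z * height + y) * width + x), one uint32_t (e.g. RGBA8) per texel. */
typedef struct {
    int width, height, depth;
    uint32_t *texels;
} Image3D;

/* Copies a w x h x d box from (sx, sy, sz) in src to (dx, dy, dz) in dst,
 * one row at a time. For array and cube map textures z would select the
 * layer or face. Overlapping copies within one image are not handled. */
void copy_image_sub_data(const Image3D *src, int sx, int sy, int sz,
                         Image3D *dst, int dx, int dy, int dz,
                         int w, int h, int d) {
    assert(sx >= 0 && sy >= 0 && sz >= 0 && dx >= 0 && dy >= 0 && dz >= 0);
    assert(sx + w <= src->width && sy + h <= src->height && sz + d <= src->depth);
    assert(dx + w <= dst->width && dy + h <= dst->height && dz + d <= dst->depth);
    for (int z = 0; z < d; ++z) {
        for (int y = 0; y < h; ++y) {
            const uint32_t *from = src->texels +
                ((size_t)(sz + z) * src->height + (size_t)(sy + y)) * src->width + sx;
            uint32_t *to = dst->texels +
                ((size_t)(dz + z) * dst->height + (size_t)(dy + y)) * dst->width + dx;
            memcpy(to, from, (size_t)w * sizeof *from);
        }
    }
}
```

<p>Because the region is a box rather than a rectangle, a single call covers what would otherwise take one framebuffer blit per slice.</p>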
<h3><a title="GL_AMD_depth_clamp_separate" href="http://www.opengl.org/registry/specs/AMD/depth_clamp_separate.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/AMD/depth_clamp_separate.txt?referer=');">GL_AMD_depth_clamp_separate</a></h3>
<p>The extension <a title="GL_ARB_depth_clamp" href="http://www.opengl.org/registry/specs/ARB/depth_clamp.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/depth_clamp.txt?referer=');">GL_ARB_depth_clamp</a>, promoted to core in OpenGL 3.2, introduced the ability to disable clipping against the near and far clip planes and clamp the depth value instead. This eliminates artifacts like seeing inside an object when its geometry is clipped by the near clip plane.</p>
<p>This new extension provides a means for the application to enable depth clamping separately for the near and the far clip plane. This increases the flexibility of depth clamping and can save some fill rate in certain situations.</p>
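<p>A simplified per-point model of separate near/far clamping is sketched below in C. It ignores primitive clipping and vertices with w &lt;= 0, and assumes the default glDepthRange(0, 1); the two flags stand in for the GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD enables, while the type and function names are hypothetical.</p>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state: one flag per plane instead of a single GL_DEPTH_CLAMP. */
typedef struct {
    bool clamp_near;  /* models GL_DEPTH_CLAMP_NEAR_AMD */
    bool clamp_far;   /* models GL_DEPTH_CLAMP_FAR_AMD */
} DepthClampState;

/* ndc_z is z_clip / w_clip; [-1, 1] is the range between the near and far
 * planes. Returns false if the point would be clipped; otherwise stores the
 * window-space depth for the default depth range [0, 1] in *window_z. */
bool resolve_depth(DepthClampState s, float ndc_z, float *window_z) {
    assert(window_z != 0);
    if (ndc_z < -1.0f) {
        if (!s.clamp_near)
            return false;        /* clipped by the near plane */
        ndc_z = -1.0f;           /* clamped onto the near plane */
    } else if (ndc_z > 1.0f) {
        if (!s.clamp_far)
            return false;        /* clipped by the far plane */
        ndc_z = 1.0f;            /* clamped onto the far plane */
    }
    *window_z = 0.5f * ndc_z + 0.5f;
    return true;
}
```

<p>Clamping only the near plane, for example, avoids the see-through artifacts of near-plane clipping while still letting far-plane clipping cull distant geometry, which is presumably where the fill-rate saving comes from.</p>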
<h3><a title="GL_EXT_texture_filter_anisotropic" href="http://www.opengl.org/registry/specs/EXT/texture_filter_anisotropic.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/texture_filter_anisotropic.txt?referer=');">GL_EXT_texture_filter_anisotropic</a></h3>
<p>I don&#8217;t think I have to say much about this extension, as it should be familiar to all of you. It simply enables anisotropic filtering on a per-texture basis. I really wonder how this extension has not made its way into core, as it has been supported in hardware for more than a decade.</p>
<p>I know that the extension is supported by all relevant graphics driver vendors but really, why can&#8217;t we simply include it in the core specification?</p>
<h3>GL_ARB_texture_gather_lod</h3>
<p>This is another as yet non-existent extension, which would extend <a title="GL_ARB_texture_gather" href="http://www.opengl.org/registry/specs/ARB/texture_gather.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_gather.txt?referer=');">GL_ARB_texture_gather</a> by adding GLSL built-in functions called textureGatherLod that would allow gathered fetches with explicit LOD. I&#8217;m not sure whether these functions are missing from the specification because of lack of hardware support or just because the ARB thought they would not be useful. Anyway, if the hardware supports it, OpenGL should expose it to developers, as there are certain situations where one has to use explicit LOD and could benefit from the increased fetching performance of gathered fetches.</p>
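<p>To clarify what such a function would return, here is a hypothetical CPU model in C. It gathers the 2&#215;2 footprint that bilinear filtering would use at an explicitly chosen mip level, in the component order defined for textureGather; clamp-to-edge wrapping, an integer LOD and a single-channel texture are assumed, and all names are illustrative.</p>

```c
#include <assert.h>

/* Hypothetical single-channel mip chain: level L is max(1, base_w >> L) by
 * max(1, base_h >> L) texels, stored row-major with row index j (the t axis). */
typedef struct {
    int base_w, base_h, levels;
    const float *const *level_data;
} MipChain;

typedef struct { float x, y, z, w; } Vec4;

static int clampi(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

static int floor_to_int(float f) {
    int i = (int)f;                 /* truncates toward zero */
    return (float)i > f ? i - 1 : i;
}

/* Models textureGatherLod(sampler, uv, lod): the 2x2 bilinear footprint at an
 * explicit level, ordered like textureGather:
 * x = (i0, j1), y = (i1, j1), z = (i1, j0), w = (i0, j0). */
Vec4 texture_gather_lod(const MipChain *t, float u, float v, int lod) {
    assert(lod >= 0 && lod < t->levels);
    int w = t->base_w >> lod, h = t->base_h >> lod;
    if (w < 1) w = 1;
    if (h < 1) h = 1;
    const float *texels = t->level_data[lod];
    int i0 = floor_to_int(u * (float)w - 0.5f);
    int j0 = floor_to_int(v * (float)h - 0.5f);
    int i1 = clampi(i0 + 1, 0, w - 1), j1 = clampi(j0 + 1, 0, h - 1);
    i0 = clampi(i0, 0, w - 1);      /* clamp-to-edge wrapping */
    j0 = clampi(j0, 0, h - 1);
    Vec4 r = { texels[j1 * w + i0], texels[j1 * w + i1],
               texels[j0 * w + i1], texels[j0 * w + i0] };
    return r;
}
```

<p>A shader that already knows its LOD, for example when processing a specific level of a depth pyramid, could then fetch four texels with one instruction instead of four separate fetches.</p>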
<h3><a title="GL_ARB_shader_stencil_export" href="http://www.opengl.org/registry/specs/ARB/shader_stencil_export.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shader_stencil_export.txt?referer=');">GL_ARB_shader_stencil_export</a></h3>
<p>This extension was published when the OpenGL 4.1 specification came out and provides the ability for the fragment shader to output the stencil reference value, which was otherwise configurable only via API calls. This brings a great level of flexibility to existing and future stencil-buffer-based algorithms, also making it possible to write independent values directly to the stencil buffer on a per-fragment basis.</p>
<p>The predecessor of the extension is <a title="GL_AMD_shader_stencil_export" href="http://www.opengl.org/registry/specs/AMD/shader_stencil_export.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/AMD/shader_stencil_export.txt?referer=');">GL_AMD_shader_stencil_export</a>, which suggests that it may be supported in hardware only on AMD GPUs. However, if this is not the case and NVIDIA could support it as well, then I think it is worth promoting this feature to core OpenGL too.</p>
<h2>Streamlining the API</h2>
<p>After discussing the long list of functional features that would be nice to include in the next release of OpenGL, let&#8217;s focus on the API improvements and ideas that are necessary to improve the usability of the API itself. Actually, this part could be much longer than what I&#8217;ll discuss, because as OpenGL gets more and more features, developers struggle with the increased complexity of the API. I&#8217;ll try to focus on the most crucial issues.</p>
<h3><a title="GL_EXT_direct_state_access" href="http://www.opengl.org/registry/specs/EXT/direct_state_access.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/direct_state_access.txt?referer=');">GL_EXT_direct_state_access</a></h3>
<p>This is the extension that all OpenGL developers have been waiting for a long time now. Direct state access eliminates the OpenGL API&#8217;s stupid &#8220;bind-to-modify&#8221; nature.</p>
<p>For a very long time the only vendor supporting the extension was NVIDIA. Fortunately, since Catalyst 10.7 AMD also exposes the extension to developers. Still, I have one problem with it: this extension is very poorly designed.</p>
<p>The main problem with the extension is that the functions were designed so that a naive implementation could simply use &#8220;bind-to-modify&#8221; under the hood. That&#8217;s what resulted in crazy API functions like MultiTexParameter* and friends. Also, enabling DSA for all of the deprecated functionality results in an explosion of API entry points and, as a consequence, in bloated specification language. Finally, I would also like to object somewhat to the contributors&#8217; lack of creativity regarding the awkward naming conventions present in the current DSA extension.</p>
<p>In my opinion the Khronos Group has to address the issue by creating a new ARB version of the DSA extension that focuses strictly on core functionality, throws away DSA support for deprecated features (if somebody needs deprecated features they can still use the EXT version) and provides a naming convention that fits much better into the current API language.</p>
<p>Anyway, I completely agree with the other developers out there who are screaming for DSA. I think the Khronos Group has to eliminate the &#8220;bind-to-modify&#8221; semantics as soon as possible; otherwise, even though the core specification exposes more and more hardware features, developers will not be attracted to OpenGL.</p>
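<p>The difference can be illustrated with a toy model in C. Nothing below is real OpenGL: the types and functions are hypothetical stand-ins, with comments naming the GL calls they imitate (glBindTexture and glTexParameteri versus glTextureParameteriEXT from GL_EXT_direct_state_access). The model counts how often the shared binding has to be touched just to change one property of one object.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-ins, not real OpenGL entry points. */
typedef struct { int min_filter; } Texture;
typedef struct {
    Texture *bound_2d;        /* the single "currently bound" texture */
    int      binding_changes; /* how often the binding was touched */
} Context;

/* Bind-to-modify, like glBindTexture + glTexParameteri: the object can only be
 * changed through the binding, so a well-behaved library must remember the
 * old binding, bind, modify and then restore it. */
void tex_parameter_bind_to_modify(Context *ctx, Texture *tex, int filter) {
    assert(ctx != NULL && tex != NULL);
    Texture *previous = ctx->bound_2d;  /* extra state query in real code */
    ctx->bound_2d = tex;                /* bind */
    ctx->binding_changes++;
    ctx->bound_2d->min_filter = filter; /* modify through the binding */
    ctx->bound_2d = previous;           /* restore */
    ctx->binding_changes++;
}

/* Direct state access, like glTextureParameteriEXT: the object is named
 * directly and the context binding is left alone. */
void texture_parameter_dsa(Context *ctx, Texture *tex, int filter) {
    assert(ctx != NULL && tex != NULL);
    tex->min_filter = filter;
}
```

<p>In real code the save and restore step also requires a state query, which is exactly the kind of overhead and fragility that DSA removes.</p>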
<h3>GL_ARB_explicit_sampler_location</h3>
<p>The ARB moved in the right direction when it introduced the <a title="GL_ARB_explicit_attrib_location" href="http://www.opengl.org/registry/specs/ARB/explicit_attrib_location.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/explicit_attrib_location.txt?referer=');">GL_ARB_explicit_attrib_location</a> extension, eliminating the need for dummy API calls to bind vertex attributes and fragment outputs to shader variables, but it should not stop here. One of the most important additions would be a similar GLSL syntax that allows us to bind sampler uniforms to texture image units. Obviously, the same goes for read-write images if <a title="GL_EXT_shader_image_load_store" href="http://www.opengl.org/registry/specs/EXT/shader_image_load_store.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/shader_image_load_store.txt?referer=');">GL_EXT_shader_image_load_store</a> is included.</p>
<h3>GL_ARB_explicit_uniform_block_index</h3>
<p>Similarly to the previous request, uniform block indices should also be explicitly specifiable in the shaders themselves. This extension would add exactly such functionality. The implementation is also straightforward: only a simple uniform block layout qualifier has to be added.</p>
<h3>Other API clarifications</h3>
<p>Besides the major issues, the current specification language also has some bugs and unclear parts that should be addressed as well:</p>
<ul>
<li>Program pipeline objects are created by binding the object name, which is not in line with the rest of the API language.</li>
<li>There is no language about whether program pipeline objects are shared among contexts or not, which suggests that they aren&#8217;t; this is not in line with the fact that program and shader objects are shared.</li>
</ul>
<p>Most probably there are a lot more issues with the specification language, but for now only these came to mind. Maybe some of you can extend the list with tons of other specification mistakes.</p>
<h2>OpenGL 4.2 and beyond</h2>
<p>While my feature requests cover most of the functionality that should be included in the next revision of the OpenGL specification, there are a lot of other things that could be very useful for developers but are very unlikely to make their way into the specification any time soon. I will talk about these features in this section of the article, as they raise many more questions than could be settled by simply including them in OpenGL 4.2.</p>
<h3>Affinity contexts</h3>
<p>We have had multi-GPU designs like SLI and CrossFire for a long time now. Fortunately, we also have vendor-specific extensions to create affinity contexts that are associated with a single GPU of a multi-GPU configuration. We have <a title="WGL_AMD_gpu_association" href="http://www.opengl.org/registry/specs/AMD/wgl_gpu_association.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/AMD/wgl_gpu_association.txt?referer=');">WGL_AMD_gpu_association</a> and <a title="WGL_NV_gpu_affinity" href="http://www.opengl.org/registry/specs/NV/gpu_affinity.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/NV/gpu_affinity.txt?referer=');">WGL_NV_gpu_affinity</a> for Windows, and <a title="GLX_AMD_gpu_association" href="http://www.opengl.org/registry/specs/AMD/glx_gpu_association.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/AMD/glx_gpu_association.txt?referer=');">GLX_AMD_gpu_association</a> on GLX-based platforms. I have just two problems with this:</p>
<ul>
<li>First, these are vendor-specific extensions.</li>
<li>Second, NVIDIA exposes its affinity context support only on Windows and only for its professional cards, leaving owners of consumer hardware without affinity context support.</li>
</ul>
<p>In the future I would be pleased to see extensions like <span style="text-decoration: underline;">WGL_ARB_gpu_affinity_context</span> and <span style="text-decoration: underline;">GLX_ARB_gpu_affinity_context</span> that are supported by both NVIDIA and AMD, on both professional and consumer hardware.</p>
<h3>Command buffers</h3>
<p>I would like to see something in OpenGL similar to the command queues we have in OpenCL. Having several separate command buffers for a single OpenGL context could bring performance benefits, as some of the implicit sync points otherwise present in OpenGL draw commands could be eliminated. Another solution would be to simply use multiple GL contexts, but that is much more complicated, and context switches are quite heavyweight operations. Command buffers would replace multiple contexts much like framebuffer objects replaced pbuffers.</p>
<p>This could even go as far as encapsulating state changes into command buffers, similarly to what display lists allowed in many cases, just in a more efficient and hardware-centric manner.</p>
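<p>To make the idea more concrete, here is a minimal CPU-side sketch in C. It is not OpenGL code, and every name in it (cmd_buffer, cmd_record, cmd_submit, fake_draw) is hypothetical: commands are recorded once into a buffer and replayed later, which is roughly how a reusable GL command buffer might behave under these assumptions.</p>

```c
#include <stddef.h>

/* Hypothetical sketch: a command buffer that records draw-like commands
   and replays them later. None of these names exist in OpenGL. */

typedef void (*cmd_fn)(void *ctx, int arg);

typedef struct {
    cmd_fn fn;   /* command to execute */
    int arg;     /* its single argument, e.g. a vertex count */
} cmd;

#define CMD_CAPACITY 64

typedef struct {
    cmd cmds[CMD_CAPACITY];
    size_t count;
} cmd_buffer;

/* Append a command; returns 0 on success, -1 if the buffer is full. */
int cmd_record(cmd_buffer *buf, cmd_fn fn, int arg) {
    if (buf->count == CMD_CAPACITY)
        return -1;
    buf->cmds[buf->count].fn = fn;
    buf->cmds[buf->count].arg = arg;
    buf->count++;
    return 0;
}

/* Replay every recorded command, in order, against a context. */
void cmd_submit(const cmd_buffer *buf, void *ctx) {
    for (size_t i = 0; i < buf->count; ++i)
        buf->cmds[i].fn(ctx, buf->cmds[i].arg);
}

/* Example command: accumulates a vertex count into an int context. */
void fake_draw(void *ctx, int vertex_count) {
    *(int *)ctx += vertex_count;
}
```

<p>The buffer is recorded once and can be submitted every frame without re-specifying its contents, which is the property that would let a driver validate and optimize it up front.</p>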
<h3>Immutable state objects</h3>
<p>Another idea strongly related to the previous one is immutable state objects. If state changes could not be stored efficiently in such a command buffer, we could instead use immutable state objects that would be very similar in nature to display lists, hiding the underlying representation of the commands.</p>
<p>Display lists are deprecated, and I don&#8217;t think that was a wrong decision. They made the API language complex, and you never knew which commands were compiled into display lists and how. I remember the time when I was making an OpenGL app on my GeForce2 and used DrawElements calls inside display lists that referenced buffer object data. Funnily enough, it worked on NVIDIA hardware, even though the specification says otherwise, and I was wondering why my app crashed on ATI cards.</p>
<p>Anyway, display lists are gone, but we need some complex state objects that could fill the holes they left behind.</p>
<h3>More callbacks</h3>
<p>I was very happy to see the appearance of an extension that introduced the callback concept into OpenGL (<a title="GL_AMD_debug_output" href="http://www.opengl.org/registry/specs/AMD/debug_output.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/AMD/debug_output.txt?referer=');">GL_AMD_debug_output</a>). Since then, the functionality has been promoted to an ARB extension, meaning that the ARB has accepted that we need callbacks.</p>
<p>What I would like to see in the future is more OpenGL callbacks. The most obvious candidate I can think of is asynchronous queries. It would be so much easier if we could receive a callback from OpenGL when the results of our asynchronous queries are available, rather than having to poll for them manually at various phases of rendering.</p>
<p>Actually, I could imagine callbacks for every issued rendering command, called by the driver as soon as the actual rendering is complete on the GPU.</p>
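<p>Today the application has to poll with glGetQueryObjectuiv and GL_QUERY_RESULT_AVAILABLE. The following C sketch simulates the callback alternative on the CPU; the callback type, query_set_callback and driver_complete_query are all hypothetical and do not exist in any OpenGL version.</p>

```c
#include <stddef.h>

/* Hypothetical sketch: a query that notifies the application through a
   callback instead of being polled. Real OpenGL has no such registration. */

typedef void (*query_callback)(unsigned query_id, unsigned long long result,
                               void *user_param);

typedef struct {
    unsigned id;
    int available;               /* set once the "GPU" has finished */
    unsigned long long result;   /* e.g. number of samples passed */
    query_callback callback;     /* invoked once, on completion */
    void *user_param;            /* handed back to the callback untouched */
} async_query;

void query_set_callback(async_query *q, query_callback cb, void *user_param) {
    q->callback = cb;
    q->user_param = user_param;
}

/* Simulates the driver noticing that the GPU has produced the result. */
void driver_complete_query(async_query *q, unsigned long long result) {
    q->result = result;
    q->available = 1;
    if (q->callback != NULL)
        q->callback(q->id, result, q->user_param);
}

/* Example callback: stores the result in an application variable. */
void on_query_done(unsigned query_id, unsigned long long result,
                   void *user_param) {
    (void)query_id;
    *(unsigned long long *)user_param = result;
}
```

<p>The application never checks the query in its render loop; the result simply arrives when the driver learns that the GPU is done.</p>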
<h3>Programmable blending</h3>
<p>This is another thing developers are screaming for. Fortunately, we now have indirect methods to solve most of the issues of programmable blending via the extensions <a title="GL_EXT_shader_image_load_store" href="http://www.opengl.org/registry/specs/EXT/shader_image_load_store.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/shader_image_load_store.txt?referer=');">GL_EXT_shader_image_load_store</a> and <a title="GL_NV_texture_barrier" href="http://www.opengl.org/registry/specs/NV/texture_barrier.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/NV/texture_barrier.txt?referer=');">GL_NV_texture_barrier</a>; however, a more general solution would be welcome.</p>
<p>I don&#8217;t know whether this would actually be possible on current hardware, but if not, then this is a message to hardware vendors to solve the issue in the near future.</p>
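<p>Fixed-function blending combines the source and destination colors with one of a few fixed equations, such as GL_FUNC_ADD, which computes src * srcFactor + dst * dstFactor. Programmable blending would allow an arbitrary function of both colors. The C sketch below contrasts the two on a single color channel; the function names are illustrative, and the overlay mode is just one example of a blend that does not fit the fixed formula, because it branches on the destination value.</p>

```c
/* Illustrative sketch: single-channel colors in [0, 1], evaluated on the CPU. */

/* Fixed-function style: GL_FUNC_ADD with GL_SRC_ALPHA and
   GL_ONE_MINUS_SRC_ALPHA, i.e. result = src * a + dst * (1 - a). */
float blend_alpha(float src, float src_alpha, float dst) {
    return src * src_alpha + dst * (1.0f - src_alpha);
}

/* A "programmable" blend is simply any function of src and dst. */
typedef float (*blend_fn)(float src, float dst);

/* Overlay: branches on the destination, so it cannot be written as
   src * factor + dst * factor with fixed factors. */
float blend_overlay(float src, float dst) {
    return dst < 0.5f ? 2.0f * src * dst
                      : 1.0f - 2.0f * (1.0f - src) * (1.0f - dst);
}

float apply_blend(blend_fn fn, float src, float dst) {
    return fn(src, dst);
}
```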
<h2>Summary</h2>
<p>We&#8217;ve seen that even though OpenGL is on track and the Khronos Group is keeping pace with its competitors, there is still a lot of room for improvement in the OpenGL specification, both from a functional and from an API design point of view.</p>
<p>I would like to end the article with a summary of what I expect to be part of the OpenGL 4.2 specification, and my personal wish-list beyond that, in rough priority order.</p>
<p><strong>My expectations for OpenGL 4.2:</strong></p>
<ul>
<li>GL_EXT_shader_image_load_store</li>
<li>GL_ARB_shader_atomic_counters</li>
<li>GL_ARB_instanced_arrays2</li>
<li>GL_ARB_explicit_sampler_location</li>
<li>GL_ARB_explicit_uniform_block_index</li>
</ul>
<p><strong>My personal wish-list for OpenGL 4.2:</strong></p>
<ul>
<li>GL_ARB_draw_indirect2</li>
<li>GL_ARB_direct_state_access</li>
<li>GL_NV_texture_barrier</li>
<li>GL_AMD_conservative_depth</li>
<li>GL_ARB_texture_gather_lod</li>
<li>GL_NV_copy_image</li>
<li>GL_EXT_texture_filter_anisotropic</li>
<li>GL_ARB_shader_stencil_export</li>
<li>GL_AMD_depth_clamp_separate</li>
<li>GL_AMD_transform_feedback3_lines_triangles</li>
</ul>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/11/suggestions-for-opengl-4-2-and-beyond/feed/</wfw:commentRss>
		<slash:comments>29</slash:comments>
		</item>
		<item>
		<title>An introduction to OpenGL 4.1</title>
		<link>http://rastergrid.com/blog/2010/08/an-introduction-to-opengl-4-1/</link>
		<comments>http://rastergrid.com/blog/2010/08/an-introduction-to-opengl-4-1/#comments</comments>
		<pubDate>Tue, 24 Aug 2010 19:32:51 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[binary shader]]></category>
		<category><![CDATA[callback]]></category>
		<category><![CDATA[fragment shader]]></category>
		<category><![CDATA[geometry shader]]></category>
		<category><![CDATA[GLSL]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[OpenGL ES]]></category>
		<category><![CDATA[stencil]]></category>
		<category><![CDATA[vertex shader]]></category>
		<category><![CDATA[viewport]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=290</guid>
		<description><![CDATA[The Khronos Group keeps up the pace it set for itself, delivering the latest OpenGL specification less than half a year after the revolutionary appearance of OpenGL 4. Leaving the OpenGL 3.x line of the specification behind (at least for a while), the new update concentrates on Shader Model 5.0 class GPUs and extensions]]></description>
			<content:encoded><![CDATA[
<p>The Khronos Group keeps up the pace it set for itself, delivering the latest OpenGL specification less than half a year after the revolutionary appearance of OpenGL 4. Leaving the OpenGL 3.x line of the specification behind (at least for a while), the new update concentrates on Shader Model 5.0 class GPUs and on extensions heavily promoted by the community. Besides all this, the Khronos Group is now openly moving towards convergence with OpenGL ES, making the desktop version of the specification downward compatible with its embedded sibling. In this article I would like to present the features introduced with the latest revision of the specification.</p>
<p><span id="more-290"></span>When the OpenGL 4 specification was released, I was able to quickly deliver a <a title="A brief preview of the new features introduced by OpenGL 3.3 and 4.0" href="http://rastergrid.com/blog/2010/03/a-brief-preview-of-the-new-features-introduced-by-opengl-3-3-and-4-0/">thorough presentation</a> of all the new features introduced by that revision of the specification. This time I am quite late, but I hope this article will still prove valuable for many of you, especially for those who haven&#8217;t had time recently to dig into the details of the new API version.</p>
<p>OpenGL 4.1 is not as revolutionary and feature-rich as its predecessor; however, the latest revision was well received by the community, as it brought core extensions to the API that the community had been waiting for a long time. The new revision of the specification was accompanied by a couple of other ARB extensions that have not yet been included in core. I will still talk about some of them, as they indicate a slight shift in the influence of various vendors and representatives inside the <a title="About the OpenGL ARB &quot;Architecture Review Board&quot;" href="http://www.opengl.org/about/arb/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/about/arb/?referer=');">Architecture Review Board (ARB)</a>.</p>
<h2>New features of OpenGL 4.1</h2>
<p>Let&#8217;s start with the new features arriving with the OpenGL 4.1 specification, primarily targeting Shader Model 5.0 hardware. Here you will see many harmonization features as well as community-requested features that are squarely intended to increase OpenGL development efficiency and freedom.</p>
<h3><a title="GL_ARB_ES2_compatibility" href="http://www.opengl.org/registry/specs/ARB/ES2_compatibility.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/ES2_compatibility.txt?referer=');">ARB_ES2_compatibility</a></h3>
<p>There have long been rumors about the Khronos Group preparing a convergence between desktop OpenGL and OpenGL ES. This core extension clearly takes the first step towards this goal by providing an all-in-one package that makes the desktop version of the specification downward compatible with ES. The extension adds support for features of OpenGL ES 2.0 that are missing from OpenGL 3+. According to the extension specification, enabling these features will ease the process of porting applications from OpenGL ES 2.0 to OpenGL.</p>
<p>More precisely, <a title="GL_ARB_ES2_compatibility" href="http://www.opengl.org/registry/specs/ARB/ES2_compatibility.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/ES2_compatibility.txt?referer=');">GL_ARB_ES2_compatibility</a> not only exposes the functions and tokens that were missing from the desktop version of the specification, but also adds the semantics that were previously specified only in the embedded version. To mention just a few:</p>
<ul>
<li>Vertex data can now use 16.16 fixed-point values (32-bit signed integers with 16 fractional bits) through the GL_FIXED type token.</li>
<li>The precision formats used internally by shaders can be queried with glGetShaderPrecisionFormat.</li>
<li>GLSL ES can be used to write shaders for desktop GL.</li>
</ul>
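<p>GL_FIXED values use the 16.16 format inherited from OpenGL ES, where 65536 represents 1.0. The C helpers below (hypothetical names) sketch the conversion such vertex data implies; they illustrate the number format only and are not code taken from the extension.</p>

```c
#include <stdint.h>

/* Sketch: GL_FIXED components are 32-bit signed two's-complement integers
   in 16.16 format (16 integer bits, 16 fractional bits). */

#define FIXED_ONE 65536 /* 1.0 in 16.16 */

float fixed_to_float(int32_t x) {
    return (float)x / (float)FIXED_ONE;
}

int32_t float_to_fixed(float f) {
    /* Truncates toward zero; real code may prefer rounding and clamping. */
    return (int32_t)(f * (float)FIXED_ONE);
}
```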
<p>While having this extension under the hood does not mean that we can simply take our last game made for, e.g., Symbian and just drop it onto our PC, it may prove to be of great value for GL ES developers migrating their software to desktop platforms.</p>
<h3><a title="GL_ARB_get_program_binary" href="http://www.opengl.org/registry/specs/ARB/get_program_binary.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/get_program_binary.txt?referer=');">ARB_get_program_binary</a></h3>
<p>This is one of the most awaited additions to the core specification by the developer community. The extension makes it possible to retrieve a binary representation of a compiled and linked program object that can later be used to specify the program object directly, so applications can cache programs and avoid compiling and linking them the next time they are needed. This also makes it possible to create an offline GLSL compiler using just the OpenGL API itself.</p>
<p>Still, it has to be mentioned that having this feature at hand does not mean that we can simply create our shader binaries offline and then distribute our software without the shader source, as the binary formats supported by a particular implementation heavily depend on the hardware vendor as well as the driver version. This is because the shader binary most probably consists of instructions generated specifically for the particular GPU and driver combination. The only way to relax this limitation would be some sort of cross-platform byte-code for shaders, but that would in fact defeat most of the benefits of the extension. Additionally, this extension does not define any binary formats but leaves this to vendor-specific extensions; it only exposes a common infrastructure for acquiring and loading program binaries.</p>
<p>While this extension does not completely eliminate the need to compile shader source, it can limit compilation and linking to installation time or first run, with the stored binaries used afterwards. It also opens up room for SDK tools that offer shader compilers with more aggressive optimizations for offline use. Such tools are feasible because the specification explicitly mentions that binaries generated at run time by the GL should be interchangeable with those generated by offline SDK tools.</p>
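<p>A typical use is to cache the program binary on first run. The real entry points are glGetProgramBinary and glProgramBinary, with the supported formats queried through GL_NUM_PROGRAM_BINARY_FORMATS and GL_PROGRAM_BINARY_FORMATS. The pure C sketch below covers only the validation step and rests on assumptions: the header struct and cache_is_usable are hypothetical, and recording the GL_VENDOR, GL_RENDERER and GL_VERSION strings is one conservative policy, not something the extension requires.</p>

```c
#include <string.h>

/* Hypothetical cache header stored next to the blob returned by
   glGetProgramBinary. The layout is an assumption for this sketch. */
typedef struct {
    char vendor[64];         /* GL_VENDOR string at cache time */
    char renderer[64];       /* GL_RENDERER string at cache time */
    char version[64];        /* GL_VERSION string at cache time */
    unsigned binary_format;  /* binaryFormat returned by glGetProgramBinary */
} binary_cache_header;

/* Returns 1 if the cached binary is worth trying, 0 if the shaders should be
   compiled and linked from source. Even on 1, GL_LINK_STATUS must still be
   checked after glProgramBinary, since the driver may reject the binary. */
int cache_is_usable(const binary_cache_header *cached,
                    const char *vendor, const char *renderer,
                    const char *version,
                    const unsigned *supported_formats, int num_formats) {
    if (strcmp(cached->vendor, vendor) != 0 ||
        strcmp(cached->renderer, renderer) != 0 ||
        strcmp(cached->version, version) != 0)
        return 0;
    /* The stored format must be one the current driver reports. */
    for (int i = 0; i < num_formats; ++i)
        if (supported_formats[i] == cached->binary_format)
            return 1;
    return 0;
}
```

<p>A driver update changes the version string, so the cache is rebuilt from source once and then reused again.</p>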
<h3><a title="GL_ARB_separate_shader_objects" href="http://www.opengl.org/registry/specs/ARB/separate_shader_objects.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/separate_shader_objects.txt?referer=');">ARB_separate_shader_objects</a></h3>
<p>This is another extension the community requested on several forums. The feature has a longer history, as it is based on the already existing and widely supported <a title="GL_EXT_separate_shader_objects" href="http://www.opengl.org/registry/specs/EXT/separate_shader_objects.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/separate_shader_objects.txt?referer=');">GL_EXT_separate_shader_objects</a> extension by NVIDIA. Those already familiar with the predecessor won&#8217;t find much new in the specification of the ARB version. It is still a must-read for them, though: even though there aren&#8217;t many semantic differences between the two, their usage differs quite a lot, as the ARB version solves the design issues of its predecessor by introducing a new type of GL object that I will talk about in a moment.</p>
<p>In a nutshell, this extension provides a way to create program objects containing any subset of the shader stages and to bind several of them together to the current rendering context. Previously there was no way to bind multiple program objects to the context, as the program object was designed to be a container for all the shaders forming the rendering pipeline. This was a design decision made during the development of GLSL that, before this extension, connected the varyings of subsequent shader stages through name-based binding. As varyings are matched by name at link time, shaders were tightly coupled, meaning that a change in any shader stage required relinking the complete program object.</p>
<p>This proved to be very unpleasant for OpenGL developers, as every rendering engine usually has its own set of vertex and fragment shaders (possibly accompanied by other shader types) that are used in various combinations. As an example, take two vertex shaders: a simple MVP matrix based transformation shader and a more complex one that also supports skeletal animation. Also take two fragment shaders: one for diffuse materials and one for reflective materials. This gives four types of objects: static with diffuse material, static with reflective material, animated with diffuse material and animated with reflective material.</p>
<p>In traditional GLSL the vertex and fragment shaders are bound together at link time rather than at the time they are bound to the context, as was the case with legacy assembly shaders (<a title="GL_ARB_vertex_program" href="http://www.opengl.org/registry/specs/ARB/vertex_program.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/vertex_program.txt?referer=');">GL_ARB_vertex_program</a>, <a title="GL_ARB_fragment_program" href="http://www.opengl.org/registry/specs/ARB/fragment_program.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/fragment_program.txt?referer=');">GL_ARB_fragment_program</a> and others). This means that in order to use any combination of vertex and fragment shaders (and maybe some geometry and tessellation shaders as well), we end up with two possible solutions, both with severe drawbacks:</p>
<p><strong><em>Link every combination of the shader objects</em></strong></p>
<p>While this sounds like a viable solution and is still used by most developers, it has several problems. First of all, it wastes resources, as we now have several copies of the same code, and the number of combinations can be quite high, especially when stages other than vertex and fragment shaders are in use. While this is already a considerable issue, the biggest problem for the application developer is having to maintain a separate set of uniform locations for each program, as well as binding points for vertex attributes, draw buffers and possibly transform feedback buffers. Although the <a title="GL_ARB_explicit_attrib_location" href="http://www.opengl.org/registry/specs/ARB/explicit_attrib_location.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/explicit_attrib_location.txt?referer=');">GL_ARB_explicit_attrib_location</a> extension already eliminates the need to maintain binding points for vertex attributes, this solution is still simply unacceptable.</p>
<p><strong><em>Link the program objects on an on-demand basis</em></strong></p>
<p>With this alternative, we link the shader objects only when they are actually needed. While this eliminates the need for a possibly huge number of program objects, it introduces a considerable run-time performance hit due to the additional relinking. Additionally, it is in some ways even worse than the previous solution: uniform locations are still determined at link time, so it causes the application developer no less headache.</p>
<p>This is the rationale behind this extension and why it was included in the core specification. The extension relaxes the tightly coupled behavior of GLSL and adopts a mix-and-match shader stage model, allowing multiple program objects to be bound at once, each to its own set of rendering pipeline stages, independently of the other stage bindings.</p>
<p>Since program objects are no longer the topmost containers for the code used by the rendering pipeline, the ARB decided to introduce a new container object called a &#8220;program pipeline object&#8221; that holds a set of program objects, each bound to its own set of shader stages. This is the main difference between the EXT and ARB versions of the extension. I think introducing this new type of object and its associated semantics was a good decision, as I always felt that the EXT version was not really well designed; I saw it as a kind of hack to relax the limitations of GLSL. The program pipeline object idea is definitely superior, and I hope GLSL does not hide too many more such annoying design issues.</p>
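<p>The resource problem of linking every combination is easy to quantify. The C sketch below (function names are hypothetical) counts the program objects needed when every combination is linked, compared with one single-stage program per shader combined through program pipeline objects, which in practice would be created with calls such as glCreateShaderProgramv, glUseProgramStages and glBindProgramPipeline.</p>

```c
#include <stddef.h>

/* stage_counts[i] is the number of distinct shaders for pipeline stage i
   (vertex, geometry, fragment, ...). */

/* Monolithic GLSL programs: one linked program per combination. */
unsigned long long programs_linked_per_combination(const unsigned *stage_counts,
                                                  size_t stages) {
    unsigned long long total = 1;
    for (size_t i = 0; i < stages; ++i)
        total *= stage_counts[i];
    return total;
}

/* Separate shader objects: one single-stage program per shader, mixed and
   matched at bind time through program pipeline objects. */
unsigned long long programs_separate(const unsigned *stage_counts,
                                     size_t stages) {
    unsigned long long total = 0;
    for (size_t i = 0; i < stages; ++i)
        total += stage_counts[i];
    return total;
}
```

<p>For the two vertex and two fragment shaders of the example above both models need four programs, but with, say, four vertex, three geometry and eight fragment shaders the combinatorial approach already needs 96 linked programs against 15 separable ones.</p>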
<h3><a title="GL_ARB_shader_precision" href="http://www.opengl.org/registry/specs/ARB/shader_precision.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shader_precision.txt?referer=');">ARB_shader_precision</a></h3>
<p>This extension is more a clarification of the existing specification than a new feature. It states the precision requirements of GLSL implementations more clearly. According to its specification, the extension more precisely defines the precision of arithmetic operations (addition, multiplication, etc.) and transcendentals (log, exp, pow, etc.), when <a title="NaN - Wikipedia" href="http://en.wikipedia.org/wiki/NaN" target="_blank" onclick="pageTracker._trackPageview('/outgoing/en.wikipedia.org/wiki/NaN?referer=');">NaN</a>s (not-a-number) and INFs (infinities) are accepted and generated, and the denorm flushing behavior. The precision of the remaining operations, including trigonometric functions, is not addressed by the extension. For further details, please refer to the extension specification.</p>
<h3><a title="GL_ARB_vertex_attrib_64bit" href="http://www.opengl.org/registry/specs/ARB/vertex_attrib_64bit.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/vertex_attrib_64bit.txt?referer=');">ARB_vertex_attrib_64bit</a></h3>
<p>This extension simply adds 64-bit floating-point types to the list of supported vertex attribute component types. Nominally, OpenGL has accepted double-precision vertex data since the very early stages of its history, but such data was converted to single precision before reaching the shaders; in practice, only the latest generation of hardware really accepts double-precision vertex attributes. While OpenGL 4 already introduced support for 64-bit floating-point values in GLSL and most of the shader environment, vertex attributes gained 64-bit precision only with this new extension.</p>
<p>This new feature makes it possible to use high precision for the positions and other attributes of our geometry. While this sounds pretty awesome, and actually is, game developers and other real-time graphics users should not rush to switch to the new precision; it should be used only when the precision requirements of the application really demand it. Using 64-bit floating-point vertex attributes not only doubles the memory consumption but also involves a serious performance hit due to bandwidth limitations, and attributes of this type may count double against the implementation-dependent limit on the number of vertex shader attribute vectors.</p>
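<p>The cost can be estimated with a little arithmetic. This C sketch assumes tightly packed attributes and IEEE single and double precision types (4 and 8 bytes); the slot function makes the conservative assumption that dvec3 and dvec4 attributes consume two attribute vectors, which implementations are allowed to do. Both function names are hypothetical.</p>

```c
#include <stddef.h>

/* Bytes per vertex for one attribute, assuming no padding. */
size_t attrib_bytes(unsigned components, int is_double) {
    return components * (is_double ? sizeof(double) : sizeof(float));
}

/* Conservative (worst-case) estimate of attribute vectors consumed:
   dvec3 and dvec4 are assumed to take two slots; double, dvec2 and all
   single-precision vectors take one. */
unsigned attrib_slots(unsigned components, int is_double) {
    return (is_double && components > 2) ? 2u : 1u;
}
```

<p>A position stored as a dvec3 thus costs 24 bytes per vertex instead of 12, and may occupy two of the available attribute slots.</p>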
<h3><a title="GL_ARB_viewport_array" href="http://www.opengl.org/registry/specs/ARB/viewport_array.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/viewport_array.txt?referer=');">ARB_viewport_array</a></h3>
<p>Previously, the viewport, i.e. the transformation that maps the normalized device coordinates of the vertices to window coordinates, was a global setting that affected all draw commands, meaning that in order to draw a primitive into multiple viewports, the OpenGL viewport had to be changed between several draw calls. While this limitation wasn&#8217;t really an issue before, geometry shaders, with their ability to amplify geometry and produce multiple output primitives for each input primitive, justify the need for several separately configurable viewports. Why? Because even though one was able to render the output primitives into separate render targets, they still shared the same global viewport.</p>
<p>This extension enhances OpenGL with a mechanism to specify multiple viewports, and gives the geometry shader the ability to select the viewport on a per-primitive basis through gl_ViewportIndex. This not only means that separate viewports can be used for separate render targets, but also that multiple viewports can be used to render into the same render target.</p>
<p>Additionally, the viewport array comes with a separate scissor rectangle for each viewport in the array. This can come in handy for deferred shading renderers, which often use the scissor rectangle to limit the number of pixels touched when rendering the effect of a light source. Having multiple scissor rectangles means that state has to be changed less often, so batching is much less of an issue even with heavy scissor rectangle usage.</p>
<p>Finally, the new viewport specification commands accept floating-point values, giving application developers additional flexibility to define their own pixel center conventions.</p>
<p>I&#8217;m quite unsure whether this feature depends on any Shader Model 5.0 hardware capability; maybe others know more about this. Anyway, I wouldn&#8217;t be surprised if this extension were supported by a much wider range of graphics cards than just SM5 GPUs. The same may be true for many other extensions introduced by OpenGL 4.1, but rather than guessing, let&#8217;s wait for the upcoming drivers to see whether I&#8217;m right or wrong.</p>
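<p>The viewport transform itself is simple arithmetic: a normalized device coordinate x_d in [-1, 1] maps to x_w = (x_d + 1) * w / 2 + x, and likewise for y. The C sketch below applies that transform with a per-primitive viewport index standing in for gl_ViewportIndex; on the API side the array would be filled with glViewportIndexedf or glViewportArrayv. The types and the function are illustrative, and the depth range mapping is omitted.</p>

```c
/* Illustrative viewport array entry; floats, as glViewportIndexedf accepts. */
typedef struct { float x, y, w, h; } viewport;

typedef struct { float x, y; } vec2;

/* Maps normalized device coordinates in [-1, 1] to window coordinates
   using the viewport selected for this primitive:
   xw = (xd + 1) * w / 2 + x and yw = (yd + 1) * h / 2 + y. */
vec2 ndc_to_window(const viewport *vps, unsigned viewport_index, vec2 ndc) {
    const viewport *vp = &vps[viewport_index];
    vec2 out;
    out.x = (ndc.x + 1.0f) * 0.5f * vp->w + vp->x;
    out.y = (ndc.y + 1.0f) * 0.5f * vp->h + vp->y;
    return out;
}
```

<p>With two side-by-side 400x300 viewports, the same NDC point lands at x = 200 in the first and x = 600 in the second. A viewport origin of (0.5, 0.5) shifts everything by half a pixel, which is the kind of control over pixel center conventions that floating-point viewports allow.</p>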
<h2>Some other interesting extensions</h2>
<p>So far I have presented the new features of the latest revision of the OpenGL specification. While this was the main topic of this article, at about the time the specification was published, many other ARB extensions also appeared in the <a title="OpenGL Extension Registry" href="http://www.opengl.org/registry/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/?referer=');">registry</a>. Although these extensions are not yet included in core, and I cannot know whether they ever will be, I would like to talk about some of them, as they led me to an interesting conclusion.</p>
<h3><a title="GL_ARB_shader_stencil_export" href="http://www.opengl.org/registry/specs/ARB/shader_stencil_export.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shader_stencil_export.txt?referer=');">ARB_shader_stencil_export</a></h3>
<p>The stencil test is a powerful OpenGL mechanism for selectively discarding fragments based on the contents of the stencil buffer, used in a wide variety of rendering techniques including shadow volumes and deferred shading. However, the stencil test and stencil operations are completely fixed-function, limited to operations such as incrementing or decrementing the existing value, or replacing the existing value in the stencil buffer with a fixed reference value.</p>
<p>This extension provides some programmability to the fixed function stencil operations by enabling the fragment shader to output a stencil reference value on a per-fragment basis. When stencil testing is enabled, this allows the test to be performed against the value generated in the shader. Also, when the stencil operation is set to GL_REPLACE, this allows a value generated in the shader to be written to the stencil buffer directly.</p>
<p>This opens up a lot of possibilities; however, I need to think much more about it, as the best use cases of this feature are far from basic. Obviously, exporting the stencil reference value from a fragment shader disables the early stencil test, in the same way as writing a new depth value from within a fragment shader disables the early depth test.</p>
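<p>The stencil logic described above can be modelled in a few lines of C. The sketch only covers the GL_EQUAL comparison and the GL_REPLACE operation, with the reference value passed in as if the fragment shader had written it to gl_FragStencilRefARB; the function names are hypothetical and the code runs on the CPU purely for illustration.</p>

```c
#include <stdint.h>

/* GL_EQUAL stencil test. With stencil export, ref comes from the fragment
   shader instead of glStencilFunc. Returns 1 if the fragment passes. */
int stencil_equal_test(uint8_t ref, uint8_t stored, uint8_t mask) {
    return (ref & mask) == (stored & mask);
}

/* GL_REPLACE on pass: the (shader-provided) reference value is written to
   the stencil buffer, limited by the stencil write mask. */
uint8_t stencil_replace(uint8_t ref, uint8_t stored, uint8_t write_mask) {
    return (uint8_t)((ref & write_mask) | (stored & (uint8_t)~write_mask));
}
```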
<h3><a title="GL_ARB_debug_output" href="http://www.opengl.org/registry/specs/ARB/debug_output.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/debug_output.txt?referer=');">ARB_debug_output</a></h3>
<p>This extension allows OpenGL to notify the application when various events occur that can come in handy during development and debugging. These events include errors, usage of deprecated functionality, configurations that result in undefined behavior, and portability or performance issues. The application is notified about these events through a callback function, registered by passing a function pointer to glDebugMessageCallbackARB.</p>
<p>While this extension provides a callback mechanism only for debugging purposes, the most revolutionary thing about it is that it is the first official appearance of a feature that calls back into application code. I&#8217;m most probably not the only person who would like to see many other callbacks included in the OpenGL API in the future, as we could benefit from getting notified about, e.g., the completion of previously issued asynchronous commands. This would not only provide a lot of flexibility but could also help optimize rendering code using information that is currently available only through polling.</p>
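<p>The callback pattern itself can be simulated on the CPU. In the real extension the application registers its function with glDebugMessageCallbackARB and can filter messages with glDebugMessageControlARB; the simplified signature, the severity enum and driver_emit below are hypothetical and only illustrate how the user parameter travels back to the application.</p>

```c
/* Hypothetical, simplified model of a debug message callback. */

enum severity { SEVERITY_LOW, SEVERITY_MEDIUM, SEVERITY_HIGH };

typedef void (*debug_proc)(enum severity sev, const char *message,
                           void *user_param);

typedef struct {
    debug_proc callback;
    void *user_param;            /* passed back untouched to the callback */
    enum severity min_severity;  /* messages below this level are dropped */
} debug_state;

/* What a driver would do when it detects an error or performance issue. */
void driver_emit(const debug_state *st, enum severity sev, const char *msg) {
    if (st->callback && sev >= st->min_severity)
        st->callback(sev, msg, st->user_param);
}

/* Application side: counts high-severity messages through user_param. */
void count_high(enum severity sev, const char *message, void *user_param) {
    (void)message;
    if (sev == SEVERITY_HIGH)
        (*(int *)user_param)++;
}
```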
<h3>Why are these extensions so interesting?</h3>
<p>The two extensions presented above are of great value on their own, but this isn&#8217;t why I mentioned them. I find them so interesting because they are both obviously based on vendor-specific extensions recently released by AMD, namely <a title="GL_AMD_shader_stencil_export" href="http://www.opengl.org/registry/specs/AMD/shader_stencil_export.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/AMD/shader_stencil_export.txt?referer=');">GL_AMD_shader_stencil_export</a> and <a title="GL_AMD_debug_output" href="http://www.opengl.org/registry/specs/AMD/debug_output.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/AMD/debug_output.txt?referer=');">GL_AMD_debug_output</a>. This clearly reveals that AMD has serious plans for its OpenGL support, and this is something that a lot of us crazy folks who develop OpenGL software on ATI cards have been waiting for.</p>
<p>I think this also means that NVIDIA&#8217;s monopoly in the ARB is over, resulting in competition from which OpenGL and its community will definitely benefit in the long run.</p>
<h2>Conclusion</h2>
<p>The article got out of hand again, like the one I wrote about the previous release of the specification. Again, I hope at least a few of you kept reading and made it to this last section. We can once more quote the community&#8217;s ever-recurring question:</p>
<blockquote><p>Where is direct state access?</p>
</blockquote>
<p>Well, it is still not here; however, AMD has finally finished implementing it as well and released it. They had been working on it for quite some time, but it became officially public only with Catalyst 10.7. I haven&#8217;t used it so far, so there may still be plenty of hidden bugs in it, but at least they have it. This is another thing that strengthens my prediction that AMD is committed to supporting OpenGL, as previously they barely added support for any extensions beyond core features.</p>
<p>Back to the OpenGL 4.1 specification: while it is not as revolutionary as the previous update got us used to, OpenGL is still on track, thanks to the Khronos Group and obviously to the great community. If OpenGL keeps evolving at the pace we&#8217;ve seen in the last two years, Microsoft will have a difficult time keeping up.</p>
<p>Thanks for reading this not-so-short article!</p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/08/an-introduction-to-opengl-4-1/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>One more degree of freedom for C++</title>
		<link>http://rastergrid.com/blog/2010/02/one-more-degree-of-freedom-for-c/</link>
		<comments>http://rastergrid.com/blog/2010/02/one-more-degree-of-freedom-for-c/#comments</comments>
		<pubDate>Sun, 14 Feb 2010 14:38:42 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[callback]]></category>
		<category><![CDATA[delegate]]></category>
		<category><![CDATA[delegation]]></category>
		<category><![CDATA[Delphi]]></category>
		<category><![CDATA[event handling]]></category>
		<category><![CDATA[message]]></category>
		<category><![CDATA[signal]]></category>
		<category><![CDATA[signals and slots]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=176</guid>
		<description><![CDATA[Those who have worked enough with C or other procedure-oriented languages know how much flexibility callbacks provide. The simplest example is the qsort function of the C standard library. It is no accident that many libraries, windowing system APIs and operating system APIs rely heavily on callbacks to hand a particular task over]]></description>
			<content:encoded><![CDATA[
<p>Those who have worked enough with C or other procedure-oriented languages know how much flexibility callbacks provide. The simplest example is the qsort function of the C standard library. It is no accident that many libraries, windowing system APIs and operating system APIs rely heavily on callbacks to hand a particular task over to another program module; callbacks are also one of the fundamental tools needed to implement an event-driven application. At the same time, object-oriented languages do not directly support the concept of callbacks, as callbacks don&#8217;t really fit the paradigms used by these languages. Fortunately, even if not as a language feature, object-oriented languages can provide a facility similar to callbacks in the form of delegates.</p>
<p><span id="more-176"></span>Delegation as a design pattern describes the situation where one object hands over the implementation of a particular task to another object. This clearly mirrors the purpose of callbacks in procedure-oriented languages. Many languages natively support some form of delegation; well-known examples are C# and Delphi.</p>
<h2>Callbacks</h2>
<p>As mentioned before, procedure-oriented languages delegate functionality to other modules by means of callbacks. These callbacks are specified by passing function pointers to registration functions provided by the library. Here is a very simple C example:</p>
<pre class="brush: c">/* server header */
void registerFooCallback(int (*fooCB)(int, float));
int doFoo(int a, float b);

/* client code */
#include &lt;stdio.h&gt;

int myFooCallback(int a, float b) {
    /* ... do something ... */
    return 0; /* placeholder for the computed result */
}

int main(void) {
    registerFooCallback(myFooCallback);
    printf("%d\n", doFoo(5, 3.2f));
    return 0;
}</pre>
<p>This shows how easily callbacks let the client inject its own code for handling events that happen in the server.</p>
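<p>The qsort function mentioned in the introduction follows the same pattern: the library owns the sorting algorithm, and the caller injects the comparison through a function pointer. A minimal C++ sketch of this; <strong>compareInts</strong> and <strong>sortInts</strong> are hypothetical helper names chosen for illustration:</p>
<pre class="brush: cpp">#include &lt;cstdlib&gt; // std::qsort, std::size_t

// Comparison callback supplied by the client; qsort calls it whenever it
// needs to order two elements.
int compareInts(const void* lhs, const void* rhs) {
    int a = *static_cast&lt;const int*&gt;(lhs);
    int b = *static_cast&lt;const int*&gt;(rhs);
    return (a &gt; b) - (a &lt; b); // negative, zero or positive, without overflow
}

// Sorts an int array in ascending order by injecting compareInts into qsort.
void sortInts(int* values, std::size_t count) {
    std::qsort(values, count, sizeof(int), compareInts);
}</pre>
<p>For example, calling <strong>sortInts</strong> on the array {3, 1, 2} leaves it as {1, 2, 3}; qsort itself never needs to know what kind of data it is ordering.</p>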
<h2>Delegation as a design pattern</h2>
<p>The simplest way to create object-oriented callbacks is to apply the delegation design pattern. If we construct the C++ equivalent of the example above using this pattern, we end up with something like the following:</p>
<pre class="brush: cpp">/* server header */
class IFooCallback {
public:
    virtual ~IFooCallback() {}
    virtual int operator() (int a, float b) = 0;
};

class Foo {
private:
    IFooCallback* _fooCB;
public:
    Foo() : _fooCB(0) {}
    void registerCallback(IFooCallback* fooCB);
    int doFoo(int a, float b);
};

/* client code */
#include &lt;iostream&gt;

class MyFooCallback: public IFooCallback {
public:
    int operator() (int a, float b) {
        /* ... do something ... */
        return 0; // placeholder for the computed result
    }
};

int main() {
    Foo foo;
    MyFooCallback fooCB;
    foo.registerCallback(&amp;fooCB); // the server expects a pointer
    std::cout &lt;&lt; foo.doFoo(5, 3.2f);
    return 0;
}</pre>
<p>As you can see, it is quite straightforward to provide an object-oriented alternative to callbacks. However, the technique above has a very significant drawback, namely the type intrusion inherent in this definition of a callback: the client needs to derive its own class from a type defined in the server. This results in tight coupling and is likely to cause further maintainability and migration problems.</p>
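<p>To make the intrusion concrete, consider a hypothetical existing class, <strong>Statistics</strong>, that already has a method with a matching signature. Under this design the client still cannot register that method directly; it has to write an adapter (here the equally hypothetical <strong>StatisticsFooAdapter</strong>) that derives from the server&#8217;s <strong>IFooCallback</strong> and merely forwards the call. A minimal sketch:</p>
<pre class="brush: cpp">// Server-defined interface, as in the example above.
class IFooCallback {
public:
    virtual ~IFooCallback() {}
    virtual int operator() (int a, float b) = 0;
};

// Hypothetical pre-existing client class with a suitable method
// that knows nothing about IFooCallback.
class Statistics {
public:
    Statistics() : calls(0) {}
    int record(int a, float b) { ++calls; return a + static_cast&lt;int&gt;(b); }
    int calls;
};

// The adapter the client is forced to write: it derives from the
// server's interface type only to forward the call.
class StatisticsFooAdapter: public IFooCallback {
public:
    explicit StatisticsFooAdapter(Statistics&amp; target) : _target(target) {}
    int operator() (int a, float b) { return _target.record(a, b); }
private:
    Statistics&amp; _target;
};</pre>
<p>Every client class that wants to receive a different callback type needs another such adapter, and each one depends on a server header.</p>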
<h2>Delegate methods</h2>
<p>In our previous attempt at an easy-to-use, object-oriented C++ alternative to callbacks, we replaced function pointers with a pure virtual base class that acts as an interface definition for our callback. However, this somewhat violates the original goal of delegates, which by definition should be some form of run-time inheritance (definitions vary, but this is the one I&#8217;m referring to in this article). We soon realize that the most convenient solution would be the ability to assign a member function of any class as a callback. The parameter and return types should still match, as before, to provide type safety, but we would like to remove any additional dependencies between the client and the server.</p>
<p>While C++ does have pointers to member functions, there is no easy, standard way to implement callbacks using them. Or is there? Static member functions pose no particular problem, as they behave much like C functions; however, limiting delegates to static methods severely restricts the developer. The problem with non-static member functions, and especially with virtual ones, is that they take the implicit parameter <strong>this</strong>, which gives them access to the object they belong to.</p>
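<p>The difference is visible in the types themselves. In the following sketch (the <strong>Counter</strong> class and the two helper functions are hypothetical), the static member function converts to an ordinary function pointer, while the non-static one has a pointer-to-member type and can only be called through an object that supplies <strong>this</strong>:</p>
<pre class="brush: cpp">// Hypothetical client class used only for illustration.
class Counter {
public:
    Counter() : total(0) {}
    int add(int a, float b) { total += a + static_cast&lt;int&gt;(b); return total; }
    static int twice(int a, float b) { return 2 * a + static_cast&lt;int&gt;(b); }
    int total;
};

// A static member function has an ordinary function pointer type...
typedef int (*PlainFn)(int, float);
// ...but a non-static one has a pointer-to-member type.
typedef int (Counter::*MemberFn)(int, float);

int callPlain(PlainFn fn, int a, float b) {
    return fn(a, b); // no object needed
}

int callMember(Counter&amp; obj, MemberFn fn, int a, float b) {
    return (obj.*fn)(a, b); // obj becomes 'this' inside the called method
}</pre>
<p>Had <strong>add</strong> been virtual, <strong>(obj.*fn)(a, b)</strong> would also dispatch on the dynamic type of <strong>obj</strong>. Either way, a delegate has to carry the object and the member pointer together, which a plain function pointer cannot do.</p>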
<p>The popular Boost library provides a mechanism that enables the use of object member functions as separate entities: the <strong>bind</strong> functor adaptor, which was later included in the standard library extensions of <a title="ISO/IEC TR 19768:2007" href="http://www.iso.org/iso/catalogue_detail.htm?csnumber=43289" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.iso.org/iso/catalogue_detail.htm?csnumber=43289&amp;referer=');">Technical Report 1</a>. This makes it possible to use member functions as delegates without any type intrusion side effects.</p>
<p>Unfortunately, invoking a callback through these facilities carries a noticeable performance cost compared to a simple method invocation. Also, implementing delegates with functor adaptors is not the most straightforward approach and makes the code quite ugly compared to the ideal situation where delegates are part of the language itself. Of course, this is only my opinion; others who use these libraries more often may see it differently.</p>
<p>Anyway, as performance is always a concern for me, I started looking for alternatives. To my surprise, I soon found two of them:</p>
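<p>As a sketch of this approach, the example below uses <strong>std::function</strong> and <strong>std::bind</strong>, the C++11 standard library descendants of the TR1 components (the TR1 versions live in <strong>std::tr1</strong> instead). The server names only the callback&#8217;s signature, so the client class needs no common base; <strong>connectHandler</strong> is a hypothetical helper name:</p>
<pre class="brush: cpp">#include &lt;functional&gt;

// Server side: the callback type names only the signature, no client class.
class Foo {
public:
    typedef std::function&lt;int (int, float)&gt; FooCallback;
    void registerCallback(const FooCallback&amp; fooCB) { _fooCB = fooCB; }
    // Invokes the registered callback, or returns 0 if none is registered.
    int doFoo(int a, float b) { return _fooCB ? _fooCB(a, b) : 0; }
private:
    FooCallback _fooCB;
};

// Client side: an unrelated class with a matching member function.
class MyClass {
public:
    int handleFoo(int a, float b) { return a * 10 + static_cast&lt;int&gt;(b); }
};

// Hypothetical helper: std::bind fixes &amp;obj as the implicit 'this' and
// forwards the two remaining arguments through the placeholders.
void connectHandler(Foo&amp; foo, MyClass&amp; obj) {
    using namespace std::placeholders;
    foo.registerCallback(std::bind(&amp;MyClass::handleFoo, &amp;obj, _1, _2));
}</pre>
<p>The price of this flexibility is the type erasure inside <strong>std::function</strong>, which typically adds an indirect call (and possibly a heap allocation when it is constructed) compared to calling the method directly.</p>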
<ul>
<li><a title="Fastest Possible C++ Delegates" href="http://www.codeproject.com/KB/cpp/FastDelegate.aspx" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.codeproject.com/KB/cpp/FastDelegate.aspx?referer=');">Fastest Possible C++ Delegates</a> by Don Clugston &#8211; A library that provides delegates as fast as simple virtual method invocations. The implementation relies heavily on compiler-specific behavior, yet it is very portable, at least as far as I can tell.</li>
<li><a title="The Impossibly Fast C++ Delegates" href="http://www.codeproject.com/KB/cpp/ImpossiblyFastCppDelegate.aspx" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.codeproject.com/KB/cpp/ImpossiblyFastCppDelegate.aspx?referer=');">The Impossibly Fast C++ Delegates</a> by Sergey Ryazanov &#8211; Introduced as an alternative to the previous library, this one relies strictly on standard language features. Surprisingly, it is supported by fewer compilers and is also somewhat slower than the former.</li>
</ul>
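<p>To give an idea of how a standard-conforming approach can work, here is a heavily simplified sketch of the technique behind Ryazanov&#8217;s delegates: the delegate stores an untyped object pointer together with a pointer to a static stub function, which a template instantiates for each bound member function. The names used here (<strong>Delegate2</strong>, <strong>fromMethod</strong>, <strong>MyClass</strong>) are hypothetical and this is not the actual interface of either library; note also that its template parameters list the return type first.</p>
<pre class="brush: cpp">// Minimal delegate for R(A1, A2): an untyped object pointer plus a
// pointer to a stub function that restores the object's type.
template &lt;typename R, typename A1, typename A2&gt;
class Delegate2 {
public:
    Delegate2() : _object(0), _stub(0) {}

    // The member function is a template argument, so each stub is a
    // distinct ordinary function known at compile time.
    template &lt;class T, R (T::*Method)(A1, A2)&gt;
    static Delegate2 fromMethod(T* object) {
        Delegate2 d;
        d._object = object;
        d._stub = &amp;methodStub&lt;T, Method&gt;;
        return d;
    }

    bool empty() const { return _stub == 0; }

    // Calling an empty delegate is undefined; check empty() first.
    R operator()(A1 a1, A2 a2) const { return _stub(_object, a1, a2); }

private:
    typedef R (*StubType)(void*, A1, A2);

    template &lt;class T, R (T::*Method)(A1, A2)&gt;
    static R methodStub(void* object, A1 a1, A2 a2) {
        T* p = static_cast&lt;T*&gt;(object);
        return (p-&gt;*Method)(a1, a2);
    }

    void* _object;
    StubType _stub;
};

// Hypothetical client class used to demonstrate binding.
class MyClass {
public:
    int handleFoo(int a, float b) { return a - static_cast&lt;int&gt;(b); }
};</pre>
<p>For example, <strong>Delegate2&lt;int, int, float&gt;::fromMethod&lt;MyClass, &amp;MyClass::handleFoo&gt;(&amp;obj)</strong> creates a delegate that calls <strong>obj.handleFoo</strong>. Because the method is fixed at compile time, the compiler can often inline it into the stub, leaving one indirect call per invocation.</p>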
<p>Personally, I went with the first one, as performance and portability matter more to me than conformance with the standard. And, of course, it is not that hard to swap out the delegate back-end later if I change my mind. Finally, let&#8217;s see what our foo callback looks like using Don Clugston&#8217;s fast delegates:</p>
<pre class="brush: cpp">/* server header */
#include "FastDelegate.h"

class Foo {
private:
    fastdelegate::FastDelegate2&lt;int, float, int&gt; _fooCB;
public:
    void registerCallback(fastdelegate::FastDelegate2&lt;int, float, int&gt; fooCB);
    int doFoo(int a, float b);
};

/* client code */
#include &lt;iostream&gt;

class MyClass {
public:
    virtual int handleFoo(int a, float b) {
        /* ... do something ... */
        return 0; // placeholder for the computed result
    }
};

int main() {
    Foo foo;
    MyClass myObj;
    foo.registerCallback( fastdelegate::MakeDelegate(&amp;myObj, &amp;MyClass::handleFoo) );
    std::cout &lt;&lt; foo.doFoo(5, 3.2f);
    return 0;
}</pre>
<h2>Multicast delegates</h2>
<p>The delegates presented so far can each be bound to only a single method, which is how delegates usually behave, although a single method can be bound by many delegates. The signals and slots model extends this to a many-to-many relationship: a signal is essentially a delegate that can be bound to multiple methods at once. Such a primitive is sometimes also referred to as a multicast delegate.</p>
<p>Multicast delegates come in handy especially in user interface programming and other situations where the event-based programming model is used. The foundation of this programming model is the idea of &#8220;subscribe and notify&#8221;. That means there are <em>publishers</em> who perform some logic and occasionally publish <em>events</em>. When such an <em>event</em> is published, it is delivered to the <em>subscribers</em> that have subscribed to that specific event. At the implementation level, this is nothing more than a multicast delegate in the <em>publisher</em> object plus an interface that the <em>subscriber</em> objects use to register the method to be called when a particular <em>event</em> occurs.</p>
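<p>A minimal multicast delegate can be sketched as a list of <strong>std::function</strong> objects (C++11). In the hypothetical example below, <strong>Button</strong> plays the <em>publisher</em>, its <strong>clicked</strong> signal represents the <em>event</em>, and every function passed to <strong>connect</strong> is a <em>subscriber</em>. For brevity, this sketch offers no way to disconnect a subscriber and is not thread-safe.</p>
<pre class="brush: cpp">#include &lt;cstddef&gt;
#include &lt;functional&gt;
#include &lt;vector&gt;

// Minimal multicast delegate ("signal"): any number of subscribers can
// connect a handler, and emit() calls all of them in connection order.
template &lt;typename A1&gt;
class Signal {
public:
    typedef std::function&lt;void (A1)&gt; Slot;

    void connect(const Slot&amp; slot) { _slots.push_back(slot); }

    void emit(A1 arg) const {
        for (std::size_t i = 0; i &lt; _slots.size(); ++i)
            _slots[i](arg);
    }

    std::size_t subscriberCount() const { return _slots.size(); }

private:
    std::vector&lt;Slot&gt; _slots;
};

// Hypothetical publisher: a button that publishes a "clicked" event
// carrying the total number of clicks so far.
class Button {
public:
    Button() : _clicks(0) {}
    void click() { clicked.emit(++_clicks); }
    Signal&lt;int&gt; clicked;
private:
    int _clicks;
};</pre>
<p>A subscriber registers with, e.g., <strong>button.clicked.connect(handler)</strong>, where the handler may be a free function, a lambda or a bound member function; each <strong>click()</strong> then notifies every registered handler.</p>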
<p>There are plenty of signals and slots libraries out there, including but not limited to the Boost Signals library. However, again, if performance is a concern, one must look around carefully to find a library suitable for the particular purpose. One such library, which extends Clugston&#8217;s fast delegates with a signals and slots framework, is <a title="Simpler UI Code With Signals and Slots" href="http://www.gallantgames.com/2009/12/13/simpler-ui-code-with-signals-and-slots" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.gallantgames.com/2009/12/13/simpler-ui-code-with-signals-and-slots?referer=');">Patrick Hogan</a>&#8217;s.</p>
<h2>Asynchronous delegates</h2>
<p>Going one step further, we arrive at asynchronous delegates, which can provide a flexible yet efficient messaging system for multi-threaded applications. The only additional things we have to implement are a message queue on the callee side and, optionally, some form of synchronization if we also want asynchronous delegates to be able to return data to the caller.</p>
<p>As this topic deserves a thorough discussion of its own, I will return to it in a future article and try to provide a sample implementation using OpenMP, as usual.</p>
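<p>Until then, here is a minimal sketch of the callee-side queue, using C++11 <strong>std::mutex</strong> rather than OpenMP. Callers on any thread <strong>post</strong> a bound call, and the thread owning the queue later executes the pending calls with <strong>processPending</strong>; the class and method names are hypothetical. Returning data to the caller is left out; one possible route would be a <strong>std::promise</strong>/<strong>std::future</strong> pair captured in the posted call.</p>
<pre class="brush: cpp">#include &lt;cstddef&gt;
#include &lt;deque&gt;
#include &lt;functional&gt;
#include &lt;mutex&gt;

// Minimal callee-side message queue for asynchronous delegates.
class MessageQueue {
public:
    typedef std::function&lt;void ()&gt; Message;

    // May be called from any thread; only enqueues the call.
    void post(const Message&amp; msg) {
        std::lock_guard&lt;std::mutex&gt; lock(_mutex);
        _pending.push_back(msg);
    }

    // Called by the callee's own thread; returns how many messages it ran.
    std::size_t processPending() {
        std::deque&lt;Message&gt; batch;
        {
            std::lock_guard&lt;std::mutex&gt; lock(_mutex);
            batch.swap(_pending);
        }
        // Run outside the lock so handlers may post() further messages.
        for (std::size_t i = 0; i &lt; batch.size(); ++i)
            batch[i]();
        return batch.size();
    }

private:
    std::mutex _mutex;
    std::deque&lt;Message&gt; _pending;
};</pre>
<p>Swapping the whole pending list out under the lock keeps the critical section short, so posting threads are blocked only for the duration of a container swap.</p>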
<h2>Conclusion</h2>
<p>We&#8217;ve only scratched the surface of the use cases for delegates one can meet during software development; still, we&#8217;ve seen how many advantages such a programming primitive can give C++ developers, whether they are implementing a very simple sorting library like the qsort function of the C standard library or a robust, fully event-driven multi-threaded application. We&#8217;ve also seen that several efficient implementations of such a framework exist for performance fanatics like me.</p>
<p>This is a perfect example of how easily one can extend C++ with a facility that is usually available only in the most modern managed languages. By the way, I would be interested in your opinion: which features of other languages, such as Java and C#, do you like the most and miss in C++? Maybe C++ alternatives exist for those facilities as well; we just have to look around to find them&#8230;</p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/02/one-more-degree-of-freedom-for-c/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

