<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>RasterGrid Blog &#187; fragment shader</title>
	<atom:link href="http://rastergrid.com/blog/tag/fragment-shader/feed/" rel="self" type="application/rss+xml" />
	<link>http://rastergrid.com/blog</link>
	<description>A technical blog from Daniel Rákos (aka aqnuep)</description>
	<lastBuildDate>Fri, 24 Feb 2012 03:23:41 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>OpenGL vs DirectX: The War Is Far From Over</title>
		<link>http://rastergrid.com/blog/2011/10/opengl-vs-directx-the-war-is-far-from-over/</link>
		<comments>http://rastergrid.com/blog/2011/10/opengl-vs-directx-the-war-is-far-from-over/#comments</comments>
		<pubDate>Fri, 07 Oct 2011 19:02:12 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Direct3D]]></category>
		<category><![CDATA[DirectX]]></category>
		<category><![CDATA[fragment shader]]></category>
		<category><![CDATA[geometry instancing]]></category>
		<category><![CDATA[geometry shader]]></category>
		<category><![CDATA[GLSL]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[occlusion culling]]></category>
		<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[tessellation control shader]]></category>
		<category><![CDATA[tessellation evaluation shader]]></category>
		<category><![CDATA[transform feedback]]></category>
		<category><![CDATA[uniform buffer]]></category>
		<category><![CDATA[vertex buffer]]></category>
		<category><![CDATA[vertex shader]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=652</guid>
		<description><![CDATA[I&#8217;ve chosen the title based on the popular article that tries to prove that OpenGL lost the war against Direct3D. To be honest, I didn&#8217;t really like the article at all. First, because it compared OpenGL 3, which targeted Shader Model 4.0 hardware, with DirectX 11, which targeted Shader Model 5.0 hardware. Besides that, as we]]></description>
			<content:encoded><![CDATA[
<div class="wp-caption alignleft" style="width: 260px"><img title="OpenGL vs DirectX" src="http://rastergrid.com/blog/wp-content/uploads/2011/10/opengl-vs-directx-250x138.jpg" alt="OpenGL vs DirectX" width="250" height="138" /><p class="wp-caption-text">The War Is Far From Over</p></div>
<p>I&#8217;ve chosen the title based on the <a title="OpenGL 3 &amp; DirectX 11: The War Is Over" href="http://www.tomshardware.com/reviews/opengl-directx,2019.html" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.tomshardware.com/reviews/opengl-directx_2019.html?referer=');">popular article</a> that tries to prove that OpenGL lost the war against Direct3D. To be honest, I didn&#8217;t really like the article at all. First, because it compared OpenGL 3, which targeted Shader Model 4.0 hardware, with DirectX 11, which targeted Shader Model 5.0 hardware. Besides that, as we will see, the war is really far from over&#8230; This article lists the most important features introduced by OpenGL 3.x, OpenGL 4.x, Direct3D 10 and Direct3D 11, and, to be fair to DirectX, it also covers the promised features of the upcoming Direct3D 11.1 <img src='http://rastergrid.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p><span id="more-652"></span></p>
<p>After I wrote <a title="An introduction to OpenGL 4.2" href="http://rastergrid.com/blog/2011/08/an-introduction-to-opengl-4-2/">my article about the latest features introduced in OpenGL</a>, someone asked me whether I could write an article comparing the hardware features exposed by OpenGL and Direct3D. Instead of a long explanation, I decided to simply create a table of the features introduced by the APIs. Please note that the list focuses on hardware features and does not discuss API feature differences between the two. The list is probably far from complete, and I&#8217;m happy to get feedback about what is missing from the table so that I can extend it. For some features I could not determine whether an equivalent exists in D3D, because I did not find a specification of the HLSL versions; these are marked with a question mark. If anybody can point me to the answer, I would be grateful.</p>
<table style="width: 100%;" border="0">
<tbody>
<tr>
<td style="text-align: center; background-color: #c5e526;" colspan="6"><strong>HARDWARE FEATURES EXPOSED</strong></td>
</tr>
<tr>
<td style="text-align: center; background-color: #c5e526;" colspan="6"><strong>Draw command related features</strong></td>
</tr>
<tr style="height: 20px">
<td style="background-color: #aaaaaa;"></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 3.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 4.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 10</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11.1</span></strong></td>
</tr>
<tr>
<td style="padding: 0px">Conditional/predicated rendering based on the result of occlusion queries (<a href="http://www.opengl.org/registry/specs/NV/conditional_render.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/NV/conditional_render.txt?referer=');">NV_conditional_render</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Basic geometry instancing support and instanced draw commands (<a href="http://www.opengl.org/registry/specs/ARB/draw_instanced.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/draw_instanced.txt?referer=');">ARB_draw_instanced</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Geometry instancing with the ability to specify instanced vertex attributes (<a href="http://www.opengl.org/registry/specs/ARB/instanced_arrays.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/instanced_arrays.txt?referer=');">ARB_instanced_arrays</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Primitive restart (cut index) feature for batching multiple strips together (<a href="http://www.opengl.org/registry/specs/NV/primitive_restart.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/NV/primitive_restart.txt?referer=');">NV_primitive_restart</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Draw commands allowing modification of the base vertex index (<a href="http://www.opengl.org/registry/specs/ARB/draw_elements_base_vertex.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/draw_elements_base_vertex.txt?referer=');">ARB_draw_elements_base_vertex</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Indirect draw commands that source their parameters from server-side buffers (<a href="http://www.opengl.org/registry/specs/ARB/draw_indirect.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/draw_indirect.txt?referer=');">ARB_draw_indirect</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="text-align: center; background-color: #c5e526;" colspan="6"><strong>New shader type related features</strong></td>
</tr>
<tr style="height: 20px">
<td style="background-color: #aaaaaa;"></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 3.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 4.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 10</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11.1</span></strong></td>
</tr>
<tr>
<td style="padding: 0px">Geometry shader support and adjacency primitive support (<a href="http://www.opengl.org/registry/specs/ARB/geometry_shader4.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/geometry_shader4.txt?referer=');">ARB_geometry_shader4</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Instanced geometry shader support with fixed number of invocations (<a href="http://www.opengl.org/registry/specs/ARB/gpu_shader5.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/gpu_shader5.txt?referer=');">ARB_gpu_shader5</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Tessellation control and evaluation (hull and domain) shader support (<a href="http://www.opengl.org/registry/specs/ARB/tessellation_shader.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/tessellation_shader.txt?referer=');">ARB_tessellation_shader</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="text-align: center; background-color: #c5e526;" colspan="6"><strong>Transform feedback (stream-output) related features</strong></td>
</tr>
<tr style="height: 20px">
<td style="background-color: #aaaaaa;"></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 3.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 4.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 10</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11.1</span></strong></td>
</tr>
<tr>
<td style="padding: 0px">Basic transform feedback (stream-output) support (<a href="http://www.opengl.org/registry/specs/EXT/transform_feedback.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/transform_feedback.txt?referer=');">EXT_transform_feedback</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Transform feedback support without a geometry shader being active (<a href="http://www.opengl.org/registry/specs/EXT/transform_feedback.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/transform_feedback.txt?referer=');">EXT_transform_feedback</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #cc5555;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for pausing and resuming transform feedback (stream-output) (<a href="http://www.opengl.org/registry/specs/ARB/transform_feedback2.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/transform_feedback2.txt?referer=');">ARB_transform_feedback2</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Auto-draw support (feed back the contents of the transform feedback buffer) (<a href="http://www.opengl.org/registry/specs/ARB/transform_feedback2.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/transform_feedback2.txt?referer=');">ARB_transform_feedback2</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Instanced auto-draw support (transform feedback buffer drawing with instancing support) (<a href="http://www.opengl.org/registry/specs/ARB/transform_feedback_instanced.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/transform_feedback_instanced.txt?referer=');">ARB_transform_feedback_instanced</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #cc5555;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for outputting multiple primitive streams using transform feedback (stream-output) (<a href="http://www.opengl.org/registry/specs/ARB/transform_feedback3.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/transform_feedback3.txt?referer=');">ARB_transform_feedback3</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="text-align: center; background-color: #c5e526;" colspan="6"><strong>Asynchronous queries and related features</strong></td>
</tr>
<tr style="height: 20px">
<td style="background-color: #aaaaaa;"></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 3.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 4.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 10</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11.1</span></strong></td>
</tr>
<tr>
<td style="padding: 0px">Support for occlusion queries returning the number of samples passed (<a href="http://www.opengl.org/registry/specs/ARB/occlusion_query.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/occlusion_query.txt?referer=');">ARB_occlusion_query</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for occlusion queries returning only a boolean visibility result (<a href="http://www.opengl.org/registry/specs/ARB/occlusion_query2.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/occlusion_query2.txt?referer=');">ARB_occlusion_query2</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support to query the number of vertices processed and the number of vertex shader invocations</td>
<td style="background-color: #cc5555"></td>
<td style="background-color: #cccc55; text-align: center;"><a href="#tblcmt1">[1]</a></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support to query the number of geometry shader invocations in case a geometry shader is active</td>
<td style="background-color: #cc5555"></td>
<td style="background-color: #cccc55; text-align: center;"><a href="#tblcmt1">[1]</a></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support to query the number of primitives output by the geometry shader (<a href="http://www.opengl.org/registry/specs/EXT/transform_feedback.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/transform_feedback.txt?referer=');">EXT_transform_feedback</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support to query the number of primitives that were sent to the rasterizer (<a href="http://www.opengl.org/registry/specs/EXT/transform_feedback.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/transform_feedback.txt?referer=');">EXT_transform_feedback</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support to query the number of primitives that passed clipping and were actually rendered</td>
<td style="background-color: #cc5555"></td>
<td style="background-color: #cccc55; text-align: center;"><a href="#tblcmt1">[1]</a></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support to query the number of times a fragment/pixel shader was invoked</td>
<td style="background-color: #cc5555"></td>
<td style="background-color: #cccc55; text-align: center;"><a href="#tblcmt1">[1]</a></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support to query the number of primitives written during transform feedback (stream-output) (<a href="http://www.opengl.org/registry/specs/EXT/transform_feedback.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/transform_feedback.txt?referer=');">EXT_transform_feedback</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support to query the number of primitives generated during transform feedback (stream-output) (<a href="http://www.opengl.org/registry/specs/EXT/transform_feedback.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/transform_feedback.txt?referer=');">EXT_transform_feedback</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support to query a server-side high-resolution timestamp (<a href="http://www.opengl.org/registry/specs/ARB/timer_query.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/timer_query.txt?referer=');">ARB_timer_query</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support to query the completion of rendering commands (<a href="http://www.opengl.org/registry/specs/ARB/sync.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/sync.txt?referer=');">ARB_sync</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="text-align: center; background-color: #c5e526;" colspan="6"><strong>Texture, vertex and renderbuffer format related features</strong></td>
</tr>
<tr style="height: 20px">
<td style="background-color: #aaaaaa;"></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 3.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 4.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 10</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11.1</span></strong></td>
</tr>
<tr>
<td style="padding: 0px">Floating point color and depth formats for textures and render buffers (various extensions)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Cube map textures with depth component internal format (<a href="http://www.opengl.org/registry/specs/EXT/gpu_shader4.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/gpu_shader4.txt?referer=');">EXT_gpu_shader4</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Half-float (16-bit) vertex and pixel data support (<a href="http://www.opengl.org/registry/specs/NV/half_float.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/NV/half_float.txt?referer=');">NV_half_float</a>, <a href="http://www.opengl.org/registry/specs/ARB/half_float_pixel.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/half_float_pixel.txt?referer=');">ARB_half_float_pixel</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Non-normalized integer color formats for textures and renderbuffers (<a href="http://www.opengl.org/registry/specs/EXT/texture_integer.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/texture_integer.txt?referer=');">EXT_texture_integer</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Packed depth/stencil texture and renderbuffer formats (<a href="http://www.opengl.org/registry/specs/EXT/packed_depth_stencil.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/packed_depth_stencil.txt?referer=');">EXT_packed_depth_stencil</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">RGTC texture compression for two-component textures (<a href="http://www.opengl.org/registry/specs/EXT/texture_compression_rgtc.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/texture_compression_rgtc.txt?referer=');">EXT_texture_compression_rgtc</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Signed normalized texture component formats (<a href="http://www.opengl.org/registry/specs/EXT/texture_snorm.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/texture_snorm.txt?referer=');">EXT_texture_snorm</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Seamless cube map filtering support (to hide artifacts at cube map edges) (<a href="http://www.opengl.org/registry/specs/ARB/seamless_cube_map.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/seamless_cube_map.txt?referer=');">ARB_seamless_cube_map</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for swizzling the components of a texture (<a href="http://www.opengl.org/registry/specs/ARB/texture_swizzle.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_swizzle.txt?referer=');">ARB_texture_swizzle</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #cc5555;"></td>
</tr>
<tr>
<td style="padding: 0px">BPTC texture compression for floating point and unsigned normalized textures (<a href="http://www.opengl.org/registry/specs/ARB/texture_compression_bptc.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_compression_bptc.txt?referer=');">ARB_texture_compression_bptc</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">64-bit floating point vertex attribute formats (<a href="http://www.opengl.org/registry/specs/ARB/vertex_attrib_64bit.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/vertex_attrib_64bit.txt?referer=');">ARB_vertex_attrib_64bit</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #cc5555;"></td>
</tr>
<tr>
<td style="text-align: center; background-color: #c5e526;" colspan="6"><strong>New texture type related features</strong></td>
</tr>
<tr style="height: 20px">
<td style="background-color: #aaaaaa;"></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 3.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 4.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 10</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11.1</span></strong></td>
</tr>
<tr>
<td style="padding: 0px">One- and two-dimensional layered array textures (<a href="http://www.opengl.org/registry/specs/EXT/texture_array.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/texture_array.txt?referer=');">EXT_texture_array</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Cube map array textures as special two-dimensional array textures (<a href="http://www.opengl.org/registry/specs/ARB/texture_cube_map_array.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_cube_map_array.txt?referer=');">ARB_texture_cube_map_array</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Rectangular textures that have no mipmap support and are accessed with non-normalized coordinates (<a href="http://www.opengl.org/registry/specs/ARB/texture_rectangle.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_rectangle.txt?referer=');">ARB_texture_rectangle</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Multisampled textures and support for fetching specific sample locations (<a href="http://www.opengl.org/registry/specs/ARB/texture_multisample.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_multisample.txt?referer=');">ARB_texture_multisample</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Casting a texture&#8217;s interpreted internal format to another internal format</td>
<td style="background-color: #cccc55; text-align: center;"><a href="#tblcmt4">[4]</a></td>
<td style="background-color: #cccc55; text-align: center;"><a href="#tblcmt4">[4]</a></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="text-align: center; background-color: #c5e526;" colspan="6"><strong>Uniform buffer (constant buffer) related features</strong></td>
</tr>
<tr style="height: 20px">
<td style="background-color: #aaaaaa;"></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 3.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 4.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 10</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11.1</span></strong></td>
</tr>
<tr>
<td style="padding: 0px">Basic uniform buffer (constant buffer) support (<a href="http://www.opengl.org/registry/specs/ARB/uniform_buffer_object.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/uniform_buffer_object.txt?referer=');">ARB_uniform_buffer_object</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for large uniform buffers and binding subranges (<a href="http://www.opengl.org/registry/specs/ARB/uniform_buffer_object.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/uniform_buffer_object.txt?referer=');">ARB_uniform_buffer_object</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="text-align: center; background-color: #c5e526;" colspan="6"><strong>Framebuffer and texture rendering related features</strong></td>
</tr>
<tr style="height: 20px">
<td style="background-color: #aaaaaa;"></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 3.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 4.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 10</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11.1</span></strong></td>
</tr>
<tr>
<td style="padding: 0px">Rendering to textures and renderbuffers (<a href="http://www.opengl.org/registry/specs/EXT/framebuffer_object.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/framebuffer_object.txt?referer=');">EXT_framebuffer_object</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Multisample stretch blit functionality (<a href="http://www.opengl.org/registry/specs/EXT/framebuffer_multisample.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/framebuffer_multisample.txt?referer=');">EXT_framebuffer_multisample</a>, <a href="http://www.opengl.org/registry/specs/EXT/framebuffer_blit.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/framebuffer_blit.txt?referer=');">EXT_framebuffer_blit</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">sRGB rendering and blending support for framebuffers (<a href="http://www.opengl.org/registry/specs/EXT/framebuffer_sRGB.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/framebuffer_sRGB.txt?referer=');">EXT_framebuffer_sRGB</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for enabling or disabling clamping of the depth of fragments (<a href="http://www.opengl.org/registry/specs/ARB/depth_clamp.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/depth_clamp.txt?referer=');">ARB_depth_clamp</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for logical operations on integer render targets (supported for a decade in OpenGL)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="text-align: center; background-color: #c5e526;" colspan="6"><strong>Blending related features</strong></td>
</tr>
<tr style="height: 20px">
<td style="background-color: #aaaaaa;"></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 3.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 4.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 10</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11.1</span></strong></td>
</tr>
<tr>
<td style="padding: 0px">Support for alpha-to-coverage when using multisampling (<a href="http://www.opengl.org/registry/specs/ARB/multisample.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/multisample.txt?referer=');">ARB_multisample</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Per-color-buffer blend enables and color writemasks (<a href="http://www.opengl.org/registry/specs/EXT/draw_buffers2.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/draw_buffers2.txt?referer=');">EXT_draw_buffers2</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Dual-source color blending support based on a secondary output of the fragment shader (<a href="http://www.opengl.org/registry/specs/ARB/blend_func_extended.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/blend_func_extended.txt?referer=');">ARB_blend_func_extended</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Individual blend equations and blend functions support for each color output (<a href="http://www.opengl.org/registry/specs/ARB/draw_buffers_blend.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/draw_buffers_blend.txt?referer=');">ARB_draw_buffers_blend</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="text-align: center; background-color: #c5e526;" colspan="6"><strong>Shader related features</strong></td>
</tr>
<tr style="height: 20px">
<td style="background-color: #aaaaaa;"></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 3.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 4.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 10</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11.1</span></strong></td>
</tr>
<tr>
<td style="padding: 0px">Texture lookup functions to access individual texels of a LOD using integer coordinates (<a href="http://www.opengl.org/registry/specs/EXT/gpu_shader4.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/gpu_shader4.txt?referer=');">EXT_gpu_shader4</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Query the dimensions of a specific LOD of a texture in shaders (<a href="http://www.opengl.org/registry/specs/EXT/gpu_shader4.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/gpu_shader4.txt?referer=');">EXT_gpu_shader4</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Ability to apply integer offsets to the texel location during texture lookup (<a href="http://www.opengl.org/registry/specs/EXT/gpu_shader4.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/gpu_shader4.txt?referer=');">EXT_gpu_shader4</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Ability to explicitly pass in derivative values that are used to compute LOD during texture lookup (<a href="http://www.opengl.org/registry/specs/EXT/gpu_shader4.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/gpu_shader4.txt?referer=');">EXT_gpu_shader4</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Control over varying variable interpolation: non-perspective, flat, centroid sampling, etc. (<a href="http://www.opengl.org/registry/specs/EXT/gpu_shader4.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/gpu_shader4.txt?referer=');">EXT_gpu_shader4</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Full signed and unsigned integer support in shaders (<a href="http://www.opengl.org/registry/specs/EXT/gpu_shader4.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/gpu_shader4.txt?referer=');">EXT_gpu_shader4</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Vertex ID built-in variable available in vertex shader (<a href="http://www.opengl.org/registry/specs/EXT/gpu_shader4.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/gpu_shader4.txt?referer=');">EXT_gpu_shader4</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Primitive ID built-in variable available in geometry and fragment shader (<a href="http://www.opengl.org/registry/specs/EXT/gpu_shader4.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/gpu_shader4.txt?referer=');">EXT_gpu_shader4</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Instance ID built-in variable available in vertex shader (<a href="http://www.opengl.org/registry/specs/ARB/draw_instanced.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/draw_instanced.txt?referer=');">ARB_draw_instanced</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Shader fragment coordinate convention control (<a href="http://www.opengl.org/registry/specs/ARB/fragment_coord_conventions.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/fragment_coord_conventions.txt?referer=');">ARB_fragment_coord_conventions</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #cc5555;"></td>
</tr>
<tr>
<td style="padding: 0px">Provoking vertex control (for flat shaded varying value selection) (<a href="http://www.opengl.org/registry/specs/ARB/provoking_vertex.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/provoking_vertex.txt?referer=');">ARB_provoking_vertex</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #cc5555;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for encoding and decoding floating point values from and to integers (<a href="http://www.opengl.org/registry/specs/ARB/shader_bit_encoding.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shader_bit_encoding.txt?referer=');">ARB_shader_bit_encoding</a>)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for getting the results of the automatic LOD computations in shaders (<a href="http://www.opengl.org/registry/specs/ARB/texture_query_lod.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_query_lod.txt?referer=');">ARB_texture_query_lod</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for coherent indexing into arrays of samplers using non-constant indices (addressable samplers) (<a href="http://www.opengl.org/registry/specs/ARB/gpu_shader5.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/gpu_shader5.txt?referer=');">ARB_gpu_shader5</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for indexing into arrays of uniform blocks (addressable constant buffers) (<a href="http://www.opengl.org/registry/specs/ARB/gpu_shader5.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/gpu_shader5.txt?referer=');">ARB_gpu_shader5</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Gathered texture fetches over a 2&#215;2 footprint (with custom offsets) (<a href="http://www.opengl.org/registry/specs/ARB/texture_gather.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_gather.txt?referer=');">ARB_texture_gather</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Invocation ID built-in variable available in geometry shader (<a href="http://www.opengl.org/registry/specs/ARB/gpu_shader5.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/gpu_shader5.txt?referer=');">ARB_gpu_shader5</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for double-precision floating-point data types in shaders (<a href="http://www.opengl.org/registry/specs/ARB/gpu_shader_fp64.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/gpu_shader_fp64.txt?referer=');">ARB_gpu_shader_fp64</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for sample-frequency fragment shader execution (<a href="http://www.opengl.org/registry/specs/ARB/sample_shading.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/sample_shading.txt?referer=');">ARB_sample_shading</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for indirect subroutine calls in all shader stages (<a href="http://www.opengl.org/registry/specs/ARB/shader_subroutine.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shader_subroutine.txt?referer=');">ARB_shader_subroutine</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for selecting from multiple viewports using a geometry shader (<a href="http://www.opengl.org/registry/specs/ARB/viewport_array.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/viewport_array.txt?referer=');">ARB_viewport_array</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for dedicated atomic counters in shaders (<a href="http://www.opengl.org/registry/specs/ARB/shader_atomic_counters.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shader_atomic_counters.txt?referer=');">ARB_shader_atomic_counters</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55; text-align: center;"><a href="#tblcmt2">[2]</a></td>
<td style="background-color: #55cc55; text-align: center;"><a href="#tblcmt2">[2]</a></td>
</tr>
<tr>
<td style="padding: 0px">Support for backing up dedicated atomic counters with buffers (<a href="http://www.opengl.org/registry/specs/ARB/shader_atomic_counters.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shader_atomic_counters.txt?referer=');">ARB_shader_atomic_counters</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #cccc55; text-align: center;"><a href="#tblcmt5">[5]</a></td>
<td style="background-color: #cccc55; text-align: center;"><a href="#tblcmt5">[5]</a></td>
</tr>
<tr>
<td style="padding: 0px">Support for load/store (read/write) buffers and textures in shaders (<a href="http://www.opengl.org/registry/specs/ARB/shader_image_load_store.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shader_image_load_store.txt?referer=');">ARB_shader_image_load_store</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #cccc55; text-align: center;"><a href="#tblcmt3">[3]</a></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for atomic operations on load/store buffers and textures (<a href="http://www.opengl.org/registry/specs/ARB/shader_image_load_store.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shader_image_load_store.txt?referer=');">ARB_shader_image_load_store</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for disabling or forcing early depth test (<a href="http://www.opengl.org/registry/specs/ARB/shader_image_load_store.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shader_image_load_store.txt?referer=');">ARB_shader_image_load_store</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for conservative depth (enabling safe early tests even when modifying depth) (<a href="http://www.opengl.org/registry/specs/ARB/conservative_depth.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/conservative_depth.txt?referer=');">ARB_conservative_depth</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for coverage as input to the fragment shader (<a href="http://www.opengl.org/registry/specs/ARB/gpu_shader5.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/gpu_shader5.txt?referer=');">ARB_gpu_shader5</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="text-align: center; background-color: #c5e526;" colspan="6"><strong>Miscellaneous features</strong></td>
</tr>
<tr style="height: 20px">
<td style="background-color: #aaaaaa;"></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 3.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">GL 4.x</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 10</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11</span></strong></td>
<td style="text-align: center; width: 50px; background-color: #aaaaaa; padding: 0px;"><strong><span style="color: #ffffff;">DX 11.1</span></strong></td>
</tr>
<tr>
<td style="padding: 0px">Support for floating point viewport specification (<a href="http://www.opengl.org/registry/specs/ARB/viewport_array.txt" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/viewport_array.txt?referer=');">ARB_viewport_array</a>)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Per-texture mipmap clamping (supported since the very early versions of OpenGL)</td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
<tr>
<td style="padding: 0px">Support for using a single depth texture both for depth testing and as texture input (when depth writes are disabled)</td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #cc5555;"></td>
<td style="background-color: #55cc55;"></td>
<td style="background-color: #55cc55;"></td>
</tr>
</tbody>
</table>
<p><a name="tblcmt1">[1]</a> There is no support for these counters in OpenGL; however, they can be implemented with the help of shader atomic counters.<br />
<a name="tblcmt2">[2]</a> Direct3D exposes the dedicated atomic counter hardware (currently supported only by AMD GPUs) only through append/consume buffers. However, as atomic counters are part of UAVs and an arbitrary number of UAVs can be attached to a single resource, the same functionality is supported indirectly.<br />
<a name="tblcmt3">[3]</a> There is read/write buffer and texture support in Direct3D 11; however, it is available only in the fragment (pixel) shader. Direct3D 11.1 plans to remove this restriction.<br />
<a name="tblcmt4">[4]</a> There is no support for texture format casting in OpenGL; however, the conversion can be done with a copy, preferably using pixel buffer objects.<br />
<a name="tblcmt5">[5]</a> There is no support for automatic storage of atomic counter values in buffers in Direct3D; however, their values can be manually copied to arbitrary resources.</p>
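<p>To make the idea behind note [4] a bit more concrete, here is a minimal CPU-side sketch of what &#8220;casting&#8221; a format means: the same bytes are kept and only their interpretation changes (here, tightly packed RGBA8 texels viewed as one 32-bit unsigned integer per texel). This is not OpenGL code. The function names are hypothetical and exist only for illustration; in a real application the byte copy would go through a pixel buffer object rather than a <code>std::vector</code>.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not an OpenGL call): views tightly packed RGBA8
// texels as one 32-bit unsigned integer per texel by copying raw bytes.
// No value conversion takes place, only the interpretation changes.
std::vector<std::uint32_t> reinterpretRgba8AsR32ui(const std::vector<std::uint8_t>& rgba8)
{
    // Assumption: the input holds whole texels, four bytes each.
    assert(rgba8.size() % 4 == 0);

    std::vector<std::uint32_t> r32ui(rgba8.size() / 4);
    if (!rgba8.empty())
        std::memcpy(r32ui.data(), rgba8.data(), rgba8.size());
    return r32ui;
}

// Hypothetical inverse helper: views 32-bit texels as RGBA8 bytes again.
std::vector<std::uint8_t> reinterpretR32uiAsRgba8(const std::vector<std::uint32_t>& r32ui)
{
    std::vector<std::uint8_t> rgba8(r32ui.size() * 4);
    if (!r32ui.empty())
        std::memcpy(rgba8.data(), r32ui.data(), rgba8.size());
    return rgba8;
}
```

<p>Because only raw bytes are copied, a round trip through both helpers returns the original texel data unchanged. The numeric value of each 32-bit texel depends on the byte order of the machine, so the sketch makes no claim about it.</p>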
<p>In conclusion, I would like to say just one thing: even though each API lacks a few features that the other one supports, we really can say that the two APIs are on par in the number of hardware features they expose.</p>
<p>(Sorry in advance for any mistakes; it took quite some time to create this table and I may have become too tired by the end.)</p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2011/10/opengl-vs-directx-the-war-is-far-from-over/feed/</wfw:commentRss>
		<slash:comments>70</slash:comments>
		</item>
		<item>
		<title>An introduction to OpenGL 4.2</title>
		<link>http://rastergrid.com/blog/2011/08/an-introduction-to-opengl-4-2/</link>
		<comments>http://rastergrid.com/blog/2011/08/an-introduction-to-opengl-4-2/#comments</comments>
		<pubDate>Sun, 28 Aug 2011 14:25:25 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[atomic counter]]></category>
		<category><![CDATA[fragment shader]]></category>
		<category><![CDATA[geometry instancing]]></category>
		<category><![CDATA[GLSL]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[image load store]]></category>
		<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[texture buffer]]></category>
		<category><![CDATA[transform feedback]]></category>
		<category><![CDATA[uniform buffer]]></category>
		<category><![CDATA[vertex shader]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=611</guid>
		<description><![CDATA[After the release of the OpenGL 4.1 specification the Khronos Group slowed down the pace a little bit, but they didn&#8217;t leave OpenGL developers without a new specification version for too long, as a few weeks ago they released OpenGL 4.2. The new version of the specification brings several API improvements as well as exposes]]></description>
			<content:encoded><![CDATA[
<p>After the release of the OpenGL 4.1 specification the Khronos Group slowed down the pace a little bit, but they didn&#8217;t leave OpenGL developers without a new specification version for too long, as a few weeks ago they released OpenGL 4.2. The new version of the specification brings several API improvements and exposes some important pieces of hardware functionality that make OpenGL 4.x class hardware a great step forward in GPU history. This article presents the features newly introduced in the latest version of the OpenGL specification and, as a few months ago I wrote an article about <a title="Suggestion for OpenGL 4.2 and beyond" href="http://rastergrid.com/blog/2010/11/suggestions-for-opengl-4-2-and-beyond/">Suggestions for OpenGL 4.2 and beyond</a>, I will also write a few words about how the new specification reflects my forecast.</p>
<p><span id="more-611"></span></p>
<h2>New features in OpenGL 4.2</h2>
<p>OpenGL 4.2 finally fills the holes in the capability matrix of Shader Model 5.0 hardware with some long-awaited extensions, some of whose functionality was already accessible through cross-vendor and vendor-specific extensions. The new version of the specification also brings some important API improvement extensions and GLSL constructs that continue the transition to easier-to-use state and shader management.</p>
<h3><a title="GL_ARB_texture_compression_bptc" href="http://www.opengl.org/registry/specs/ARB/texture_compression_bptc.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_compression_bptc.txt?referer=');">ARB_texture_compression_bptc</a></h3>
<p>This extension adds the new block compression texture formats called BC7 and BC6H in Direct3D terminology. The extension has actually been available for quite some time, since the release of OpenGL 4.0, but now it has become core. The formats provide high quality block compression for fixed-point RGBA and sRGB textures, as well as two floating-point texture compression formats for signed and unsigned data.</p>
<p>Traditional block compression methods (such as S3TC or RGTC) rely on a single gradient per block of pixels, which works fine for smooth images but provides poor results in case of sharp edges. BPTC solves the issue by dividing blocks into multiple partitions which are compressed using independent gradients, thus providing better overall quality.</p>
<p>In terms of compression efficiency, BPTC has a compression ratio of 3:1 (measured against 24-bit RGB data, as each 4&#215;4 block occupies 128 bits), compared to the 6:1, 4:1 and 2:1 compression ratios of the S3TC DXT1, S3TC DXT5 and RGTC formats respectively.</p>
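<p>The ratios quoted above depend on which uncompressed format is taken as the baseline. The following is a minimal C++ sketch of the arithmetic; the helper names are hypothetical and the baselines (24-bit RGB, 32-bit RGBA, 8 bits per channel for RGTC) are my assumption about what the figures refer to.</p>

```cpp
#include <cassert>

// Bits per texel of a block-compressed format that stores every
// 4x4 texel block in `blockBytes` bytes.
double compressedBitsPerTexel(int blockBytes) {
    assert(blockBytes > 0);
    return blockBytes * 8.0 / 16.0;
}

// Compression ratio relative to an uncompressed source format with
// `sourceBitsPerTexel` bits per texel.
double compressionRatio(int sourceBitsPerTexel, int blockBytes) {
    assert(sourceBitsPerTexel > 0);
    return sourceBitsPerTexel / compressedBitsPerTexel(blockBytes);
}

// Block sizes in bytes: DXT1 and RGTC1 use 8 bytes per 4x4 block,
// while DXT5, RGTC2 and both BPTC families (BC7, BC6H) use 16 bytes.
const int kBlockBytesDXT1 = 8;
const int kBlockBytesDXT5 = 16;
const int kBlockBytesRGTC1 = 8;
const int kBlockBytesBPTC = 16;
```

<p>With these definitions, compressionRatio(24, kBlockBytesBPTC) gives 3, compressionRatio(24, kBlockBytesDXT1) gives 6, compressionRatio(32, kBlockBytesDXT5) gives 4 and compressionRatio(8, kBlockBytesRGTC1) gives 2. Measured against 32-bit RGBA instead, BPTC reaches 4:1.</p>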
<h3><a title="GL_ARB_compressed_texture_pixel_storage" href="http://www.opengl.org/registry/specs/ARB/compressed_texture_pixel_storage.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/compressed_texture_pixel_storage.txt?referer=');">ARB_compressed_texture_pixel_storage</a></h3>
<p>This is an interesting extension that solves a problem that I didn&#8217;t even know was such a big issue. The extension is designed primarily to support compressed image formats with fixed-size blocks, such as those of BPTC. The application can use this extension to configure the pixel store parameters so that subtexture operations on compressed images provide consistent results in all cases.</p>
<h3><a title="GL_ARB_texture_storage" href="http://www.opengl.org/registry/specs/ARB/texture_storage.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_storage.txt?referer=');">ARB_texture_storage</a></h3>
<p>This is again an interesting extension that improves on how texture storage is allocated in classic OpenGL. As we all know, OpenGL has always been too ad hoc about resource management, in the sense of when actual resources are allocated for a particular API primitive. This is especially a problem in case of textures, where we are potentially talking about large amounts of data. In classic OpenGL the driver could not know from the beginning, for example, whether the application will need mipmaps for the texture or how many levels are required. This could easily result in bad allocation patterns and/or large reallocations. This extension introduces the concept of immutable texture images, where all the levels of a texture object are allocated up-front.</p>
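<p>To illustrate what &#8220;all levels up-front&#8221; means, here is a small C++ sketch that computes the level count and per-level sizes an application would have to decide on before calling something like glTexStorage2D(target, levels, internalformat, width, height). The helper names and the MipSize type are hypothetical, and the code runs on the CPU without touching any GL state.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct MipSize {
    int width;
    int height;
};

// Number of levels in a complete mipmap chain, which equals
// floor(log2(max(width, height))) + 1.
int fullMipLevelCount(int width, int height) {
    assert(width > 0 && height > 0);
    int levels = 1;
    for (int size = std::max(width, height); size > 1; size /= 2) {
        ++levels;
    }
    return levels;
}

// Dimensions of the first `levels` mip levels. Each level halves the
// previous one (rounding down) and never drops below one texel.
std::vector<MipSize> mipChain(int width, int height, int levels) {
    assert(levels >= 1 && levels <= fullMipLevelCount(width, height));
    std::vector<MipSize> chain;
    for (int level = 0; level < levels; ++level) {
        chain.push_back({std::max(1, width >> level),
                         std::max(1, height >> level)});
    }
    return chain;
}
```

<p>For a 256&#215;64 texture, fullMipLevelCount returns 9, and the chain ends with a 2&#215;1 level followed by a 1&#215;1 level. Because the count is fixed at allocation time, the driver never has to guess whether more levels will show up later.</p>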
<h3><a title="GL_ARB_transform_feedback_instanced" href="http://www.opengl.org/registry/specs/ARB/transform_feedback_instanced.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/transform_feedback_instanced.txt?referer=');">ARB_transform_feedback_instanced</a></h3>
<p>This extension extends the so-called &#8220;AutoDraw&#8221; feature by providing instanced &#8220;AutoDraw&#8221;. This means that geometry captured using transform feedback can be rendered multiple times using geometry instancing. This is actually a feature that even D3D11 does not provide and, as such, I didn&#8217;t even think that hardware supports it, even though I think the list of usage patterns for the extension is most probably pretty narrow.</p>
<h3><a title="GL_ARB_base_instance" href="http://www.opengl.org/registry/specs/ARB/base_instance.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/base_instance.txt?referer=');">ARB_base_instance</a></h3>
<p>This extension is actually the feature I called <strong>ARB_instanced_arrays2</strong> in my <a title="Suggestions for OpenGL 4.2 and beyond." href="http://rastergrid.com/blog/2010/11/suggestions-for-opengl-4-2-and-beyond/" target="_blank">suggestion list</a>. The extension provides three new draw commands. One of them is rather awkwardly named <strong>DrawElementsInstancedBaseVertexBaseInstance</strong>, even though this command can be considered the &#8220;basic&#8221; indexed draw command, as it specifies all parameters. Also, the parameter list of the indirect indexed draw command is extended with the base instance parameter. Fortunately, the ARB chose to add new commands rather than a <strong>SetBaseInstance</strong>-style state specifier command to introduce the new concept. Funnily enough, this feature was missing for a long time even though, as far as I know, it is supported by all GPUs capable of doing instanced drawing, and is available in D3D as well.</p>
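<p>As a rough illustration, the sketch below shows the extended indirect command layout and how the base instance offsets the fetch of instanced vertex attributes. The field names follow my reading of the specification and the helper function is hypothetical, so treat this as a sketch rather than a copy of the spec.</p>

```cpp
#include <cassert>
#include <cstdint>

// One indexed indirect draw command as stored in the indirect buffer,
// with the base instance occupying the fifth 32-bit word.
struct DrawElementsIndirectCommand {
    std::uint32_t count;         // number of indices per instance
    std::uint32_t primCount;     // number of instances
    std::uint32_t firstIndex;    // offset into the index buffer
    std::int32_t  baseVertex;    // value added to every fetched index
    std::uint32_t baseInstance;  // offset applied to instanced attributes
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20,
              "five tightly packed 32-bit fields");

// Element index used to fetch an instanced vertex attribute (one whose
// divisor is non-zero) for a given instance of the draw.
std::uint32_t instancedAttribIndex(std::uint32_t instance,
                                   std::uint32_t divisor,
                                   std::uint32_t baseInstance) {
    assert(divisor > 0);
    return instance / divisor + baseInstance;
}
```

<p>This makes it possible to keep the per-instance data of many objects in one buffer and pick a sub-range per draw call without rebinding anything. Note that, as far as I know, the base instance only affects attribute fetching and is not added to gl_InstanceID.</p>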
<h3><a title="GL_ARB_shader_image_load_store" href="http://www.opengl.org/registry/specs/ARB/shader_image_load_store.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shader_image_load_store.txt?referer=');">ARB_shader_image_load_store</a></h3>
<p>This is where things start to get really interesting. This new extension is the ARBified version of the extension <a title="GL_EXT_shader_image_load_store" href="http://www.opengl.org/registry/specs/EXT/shader_image_load_store.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/shader_image_load_store.txt?referer=');">EXT_shader_image_load_store</a> which fortunately didn&#8217;t make it into core in its current form.</p>
<p>The extension provides GLSL built-in functions that allow shaders in any stage to load from, store to, and perform atomic read-modify-write operations on a single level of a texture (called an image). The extension also indirectly enables the same set of operations for buffer objects through buffer textures. This enables developers to implement more sophisticated shader algorithms that require more complex data structures than just plain arrays.</p>
<p>This, together with atomic counters that we will talk about later, enables the possibility to implement append/consume buffers and rendering techniques like AMD&#8217;s Order-Independent Transparency (OIT) algorithm as <a title="OIT and Indirect Illumination  Using DX11 Linked Lists" href="http://www.slideshare.net/hgruen/oit-and-indirect-illumination-using-dx11-linked-lists" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.slideshare.net/hgruen/oit-and-indirect-illumination-using-dx11-linked-lists?referer=');">presented at GDC10</a>.</p>
<p>The introduction of the new write operations to fragment shaders, besides the traditional framebuffer writes, gives shader execution side effects and thus makes it sensitive to whether early-Z is used by the hardware, so the extension also provides a mechanism to force or disable early-Z in the fragment shader.</p>
<p>A similar issue arises with vertex shaders, as the post-transform cache may no longer be valid with certain usage patterns of load/store images. Depending on how smart the shader compiler is, the post-transform cache could easily end up disabled whenever a vertex shader uses load/store images, resulting in degraded performance. Care must therefore be taken when using read/write images in vertex shaders, as OpenGL does not have any mechanism to help with these issues (but I actually have a proposal that I&#8217;ll talk about in a future article).</p>
<p>The API of this extension is greatly improved compared to the EXT version, especially when dealing with various texture image formats. The extension also provides a future-proof DSA-style API. Further, the ARB version of the extension supports loads from any texture format and corrects some specification bugs of the EXT version.</p>
<p>From a hardware implementation point of view, it must be noted that if a shader contains atomic operations applied to a particular read/write image, the driver uses a different hardware path, as required by atomic read-modify-writes, so care must be taken to use atomic operations only when necessary. Also note that this decision is made statically at compile time by the driver, so even a single atomic operation in a rarely taken branch will result in degraded performance. This is another reason to use atomic counters to implement append/consume buffers instead of read/write image atomics.</p>
<h3><a title="GL_ARB_shader_atomic_counters" href="http://www.opengl.org/registry/specs/ARB/shader_atomic_counters.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shader_atomic_counters.txt?referer=');">ARB_shader_atomic_counters</a></h3>
<p>This is the other long-awaited feature that I also suggested and that was still missing from OpenGL while being available in D3D11. Work on the specification had actually been ongoing for a long time (about a year) and the extension even appeared for a while in AMD&#8217;s OpenGL drivers, sometimes as an EXT and sometimes as an ARB extension. The extension provides an API to access a number of hardware atomic counters that provide efficient counter operations on a GPU-global scale. Atomic counters come in handy in many cases like append/consume buffers or indirect draw buffer construction.</p>
<p>The extension provides access to these atomic counters from GLSL and also makes it possible to back them up with buffer objects so after OpenGL draw calls the value of the counters is preserved in these buffers for later use.</p>
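<p>The append buffer idea can be sketched on the CPU with std::atomic standing in for a GPU atomic counter. This is only an analogy under my own assumptions: the AppendBuffer type is hypothetical, and in a real shader the slot reservation would be done with the GLSL built-in atomicCounterIncrement while the data would be written through an image or buffer texture.</p>

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU stand-in for an append buffer: an atomic counter hands out unique
// slot indices, so concurrent writers never target the same element.
struct AppendBuffer {
    std::atomic<std::uint32_t> counter{0};
    std::vector<std::uint32_t> slots;

    explicit AppendBuffer(std::size_t capacity) : slots(capacity) {}

    // Reserves the next slot and stores `value` in it. Returns false if
    // the buffer is already full; the counter is still incremented in
    // that case, so it records how many appends were attempted.
    bool append(std::uint32_t value) {
        // fetch_add returns the value before the increment, which is
        // also what atomicCounterIncrement returns in GLSL.
        const std::uint32_t index = counter.fetch_add(1);
        if (index >= slots.size()) {
            return false;
        }
        slots[index] = value;
        return true;
    }
};
```

<p>After the work is done, the counter holds the number of attempted appends. On the GPU this is the value that ends up in the backing buffer object, where it can feed, for example, the instance or vertex count of a subsequent indirect draw.</p>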
<p>The OpenGL implementation is superior to D3D&#8217;s as it provides access to atomic counters from all shader stages. There are caveats, of course: as mentioned in the previous section, the side effects made possible by read/write images and atomic counters require special care in fragment and vertex shaders, as they may result in invalid rendering and/or lower performance.</p>
<p>Regarding hardware vendor implementations, it must be noted that atomic counters are much, much faster than read/write image atomics, at least on AMD hardware, which has dedicated hardware for atomic counters. On NVIDIA hardware, though, there seems to be no separate hardware path for atomic counters, as their performance is roughly the same as that of read/write image atomics.</p>
<p>The dedicated hardware implementation of atomic counters, however, comes with a trade-off, as the number of atomic counters is severely limited on AMD hardware. One can still fall back to read/write image atomics after running out of atomic counters.</p>
<h3><a title="GL_ARB_conservative_depth" href="http://www.opengl.org/registry/specs/ARB/conservative_depth.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/conservative_depth.txt?referer=');">ARB_conservative_depth</a></h3>
<p>This is another extension I&#8217;ve suggested and that fills another functionality hole compared to D3D11. The extension is actually an ARBified version of <a title="GL_AMD_conservative_depth" href="http://www.opengl.org/registry/specs/AMD/conservative_depth.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/AMD/conservative_depth.txt?referer=');">AMD_conservative_depth</a> that extends the application developer&#8217;s control over early depth and stencil tests. <a title="GL_ARB_shader_image_load_store" href="http://www.opengl.org/registry/specs/ARB/shader_image_load_store.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shader_image_load_store.txt?referer=');">ARB_shader_image_load_store</a> already provides a way to force or disable early-Z, and this extension adds further modes that give the driver a hint about how depth is modified in a fragment shader that outputs depth. This passes enough information to the GL implementation to activate some early depth test optimizations safely while still preserving the ability to account for the final depth value in the depth test.</p>
<p>The extension exposes the new capability in the form of fragment shader layout qualifiers called &#8220;depth_any&#8221;, &#8220;depth_greater&#8221;, &#8220;depth_less&#8221; and &#8220;depth_unchanged&#8221;, which are applied when redeclaring gl_FragDepth. The interesting ones are those that promise a greater or a smaller depth value as output, as they make it possible to reject groups of fragments early using Hi-Z and early-Z even when depth is modified. This technique can greatly improve the rendering performance of volumetric particles, decals and billboards.</p>
<p>As far as I can tell, though, the extension currently provides performance benefits only on AMD hardware, as NVIDIA hardware does not have such functionality. Using the extension would thus still force NVIDIA GPUs to disable early-Z when the fragment shader outputs a depth value, but future hardware may change this.</p>
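<p>Why a promise such as <code>layout(depth_greater) out float gl_FragDepth;</code> permits early rejection can be shown with a small CPU-side model. The enums and functions below are hypothetical and only mirror the reasoning; they do not describe how any particular GPU implements the test.</p>

```cpp
#include <cassert>

enum class DepthHint { Any, Greater, Less, Unchanged };
enum class DepthFunc { Less, Greater };

// The regular depth test: does a fragment with `fragmentDepth` pass
// against the depth already stored in the depth buffer?
bool depthTestPasses(DepthFunc func, float fragmentDepth, float storedDepth) {
    return func == DepthFunc::Less ? fragmentDepth < storedDepth
                                   : fragmentDepth > storedDepth;
}

// True if the fragment can be discarded before the shader runs, given
// only the interpolated depth and the promise made by the shader.
bool canRejectEarly(DepthHint hint, DepthFunc func,
                    float interpolatedDepth, float storedDepth) {
    assert(interpolatedDepth >= 0.0f && interpolatedDepth <= 1.0f);
    // If the interpolated depth passes, the shader has to run in any
    // case, as the final depth is not known yet.
    if (depthTestPasses(func, interpolatedDepth, storedDepth)) {
        return false;
    }
    switch (hint) {
        case DepthHint::Unchanged:
            return true;  // final depth equals the failing depth
        case DepthHint::Greater:
            // The shader only pushes depth further away, so a fragment
            // that already fails a "less" test can never pass it.
            return func == DepthFunc::Less;
        case DepthHint::Less:
            // Symmetric case for a "greater" test.
            return func == DepthFunc::Greater;
        case DepthHint::Any:
            return false;  // no promise, no early rejection
    }
    return false;
}
```

<p>With a &#8220;less&#8221; depth test, a stored depth of 0.5 and an interpolated depth of 0.8, the fragment can be rejected early under depth_greater and depth_unchanged, but not under depth_less or depth_any.</p>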
<h3><a title="GL_ARB_shading_language_420pack" href="http://www.opengl.org/registry/specs/ARB/shading_language_420pack.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shading_language_420pack.txt?referer=');">ARB_shading_language_420pack</a></h3>
<p>This is a strangely named extension that provides a lot of improvements to GLSL. These are mostly API improvements, but they have great value when it comes to source code maintainability and resource management.</p>
<p>I think the most useful addition of the extension is the &#8220;binding&#8221; layout qualifier that I referred to as ARB_explicit_sampler_location and ARB_explicit_uniform_block_index in my <a title="Suggestions for OpenGL 4.2 and beyond." href="http://rastergrid.com/blog/2010/11/suggestions-for-opengl-4-2-and-beyond/" target="_blank">suggestion list</a>. This enables shader writers to explicitly bind a uniform block binding index to a uniform block as well as explicitly bind sampler, texture and image binding points to a sampler or image variable.</p>
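<p>As a hedged illustration, this is roughly how a fragment shader using the &#8220;binding&#8221; qualifier might look when embedded in a C++ application as a string. The variable, block and uniform names are made up for the example.</p>

```cpp
#include <cassert>
#include <string>

// GLSL 4.20 fragment shader source with explicit binding points.
const std::string kFragmentShaderSource = R"glsl(
#version 420 core

// The sampler is tied to texture unit 0 in the shader itself, so the
// application does not have to set it through glUniform1i.
layout(binding = 0) uniform sampler2D diffuseMap;

// The uniform block is tied to uniform buffer binding point 1, so the
// application does not have to call glUniformBlockBinding.
layout(std140, binding = 1) uniform Material {
    vec4 tint;
};

in vec2 texCoord;
out vec4 fragColor;

void main() {
    fragColor = texture(diffuseMap, texCoord) * tint;
}
)glsl";
```

<p>The application then only has to bind the texture to unit 0 and the uniform buffer to binding point 1. The same convention can be shared by every shader without querying locations or block indices after linking.</p>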
<p>Besides that, the extension adds other minor improvements, like implicit conversion of return values of functions, UTF-8 character set support, C-style initializer list support and scalar swizzle operators.</p>
<h3><a title="GL_ARB_internalformat_query" href="http://www.opengl.org/registry/specs/ARB/internalformat_query.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/internalformat_query.txt?referer=');">ARB_internalformat_query</a></h3>
<p>This is another somewhat strangely named extension that was meant to make it possible to query information about the internal formats of textures. It falls short of that goal, however, as it only provides the ability to query the maximum number of samples available for different texture formats.</p>
<p>The extension was ambitious, as it was planned to expose internal format information such as the actual internal format used, whether the format is renderable, whether it is accessible in a particular shader stage, whether it can be used as a read/write image, and even performance hints about using a particular texture internal format. Unfortunately all of these were left for a future extension.</p>
<h3><a title="GL_ARB_map_buffer_alignment" href="http://www.opengl.org/registry/specs/ARB/map_buffer_alignment.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/map_buffer_alignment.txt?referer=');">ARB_map_buffer_alignment</a></h3>
<p>This is the last new extension introduced in OpenGL 4.2. It simply requires that pointers returned by buffer mapping commands are aligned to at least 64 bytes, in order to support processing the data directly with special CPU instructions like SSE or AVX. This can provide a further performance increase when the client is modifying buffer data.</p>
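<p>On the client side, an application that relies on this guarantee might want to verify it before taking an aligned SIMD code path. A minimal sketch, with a hypothetical helper name:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if `pointer` is aligned to `alignment` bytes. The alignment has
// to be a non-zero power of two, such as the 64 bytes guaranteed for
// mapped buffers.
bool isAligned(const void* pointer, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t address = reinterpret_cast<std::uintptr_t>(pointer);
    return (address & (alignment - 1)) == 0;
}
```

<p>A pointer that passes isAligned(pointer, 64) is also suitable for the 16-byte aligned loads and stores of SSE and the 32-byte aligned ones of AVX.</p>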
<h2>Conclusion</h2>
<p>OpenGL 4.2 has again proven that OpenGL is not dead, but in fact aims to be the ultimate choice of 3D API again by pushing the exposed hardware capabilities over the line set by D3D11. Looking back at the list of expected extensions I presented in my earlier article, <a title="Suggestions for OpenGL 4.2 and beyond" href="http://rastergrid.com/blog/2010/11/suggestions-for-opengl-4-2-and-beyond/" target="_blank">Suggestions for OpenGL 4.2 and beyond</a>, we can see that OpenGL 4.2 fulfilled all my expectations and even my wish list was partly fulfilled. Here&#8217;s the list for a better overview:</p>
<p><strong>My expectations for OpenGL 4.2:</strong></p>
<pre style="background-color: #ccffcc;"><strong>GL_EXT_shader_image_load_store</strong>
<span>- added in the form of GL_ARB_shader_image_load_store</span></pre>
<pre style="background-color: #ccffcc;"><strong>GL_ARB_shader_atomic_counters</strong>
<span>- added as is</span></pre>
<pre style="background-color: #ccffcc;"><strong>GL_ARB_instanced_arrays2</strong>
<span>- added in the form of GL_ARB_base_instance</span></pre>
<pre style="background-color: #ccffcc;"><strong>GL_ARB_explicit_sampler_location</strong>
<span>- added in the form of GL_ARB_shading_language_420pack</span></pre>
<pre style="background-color: #ccffcc;"><strong>GL_ARB_explicit_uniform_block_index</strong>
<span>- added in the form of GL_ARB_shading_language_420pack</span></pre>
<p><strong>My personal wish-list for OpenGL 4.2:</strong></p>
<pre style="background-color: #ffcccc;"><strong>GL_ARB_draw_indirect2</strong>
<span>- still missing, though partly available through <a title="GL_AMD_multi_draw_indirect" href="http://www.opengl.org/registry/specs/AMD/multi_draw_indirect.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/AMD/multi_draw_indirect.txt?referer=');">GL_AMD_multi_draw_indirect</a></span></pre>
<pre style="background-color: #ffcccc;"><strong>GL_ARB_direct_state_access</strong>
<span>- still missing, however, there is hope that it will be included in the next release where the ARB plans to rewrite the whole structure of the core specification</span></pre>
<pre style="background-color: #ccffcc;"><strong>GL_NV_texture_barrier</strong>
<span>- not in core but it is implicitly subsumed by GL_ARB_shader_image_load_store, they say</span></pre>
<pre style="background-color: #ccffcc;"><strong>GL_AMD_conservative_depth</strong>
<span>- added in the form of GL_ARB_conservative_depth, despite lack of NVIDIA support</span></pre>
<pre style="background-color: #ffcccc;"><strong>GL_ARB_texture_gather_lod</strong>
<span>- still missing, because of lack of supporting hardware</span></pre>
<pre style="background-color: #ffcccc;"><strong>GL_NV_copy_image</strong>
<span>- still missing, even though it could be a good API improvement</span></pre>
<pre style="background-color: #ffcccc;"><strong>GL_EXT_texture_filter_anisotropic</strong>
<span>- still missing, as I was informed, because of patent issues</span></pre>
<pre style="background-color: #ffcccc;"><strong>GL_ARB_shader_stencil_export</strong>
<span>- still missing, most probably because of lack of NVIDIA hardware support</span></pre>
<pre style="background-color: #ffcccc;"><strong>GL_AMD_depth_clamp_separate</strong>
<span>- still missing, most probably because of lack of NVIDIA hardware support</span></pre>
<pre style="background-color: #ffcccc;"><strong>GL_AMD_transform_feedback3_lines_triangles</strong>
<span>- still missing, most probably because of lack of NVIDIA hardware support</span></pre>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2011/08/an-introduction-to-opengl-4-2/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Frei-Chen edge detector</title>
		<link>http://rastergrid.com/blog/2011/01/frei-chen-edge-detector/</link>
		<comments>http://rastergrid.com/blog/2011/01/frei-chen-edge-detector/#comments</comments>
		<pubDate>Sun, 30 Jan 2011 15:27:43 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Samples]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[detection]]></category>
		<category><![CDATA[edge]]></category>
		<category><![CDATA[filter]]></category>
		<category><![CDATA[fragment shader]]></category>
		<category><![CDATA[GLSL]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[OpenGL]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=532</guid>
		<description><![CDATA[In this article, I would like to present an edge detection algorithm that has performance characteristics similar to the well-known Sobel operator but provides slightly better edge detection and can be seamlessly extended, with little to no performance overhead, to also detect corners alongside edges. The algorithm works on a 3&#215;3 texel footprint]]></description>
			<content:encoded><![CDATA[
<div class="wp-caption alignleft" style="width: 160px"><img title="Frei-Chen edge detector" src="http://www.rastergrid.com/blog/wp-content/uploads/2011/01/frei-chen.png" alt="Frei-Chen edge detector" width="150" height="150" /><p class="wp-caption-text">Frei-Chen edge detector</p></div>
<p>In this article, I would like to present an edge detection algorithm that has performance characteristics similar to the well-known Sobel operator but provides slightly better edge detection and can be seamlessly extended, with little to no performance overhead, to also detect corners alongside edges. The algorithm works on a 3&#215;3 texel footprint like the Sobel filter but applies a total of nine convolution masks to the image that can be used for either edge or corner detection. The article presents the mathematical background that is needed to implement the edge detector and provides a reference implementation written in C/C++ using OpenGL that showcases both the Frei-Chen and the Sobel edge detection filters applied to the same image.</p>
<p><span id="more-532"></span>I came across the algorithm during my computer graphics studies, when one of my homework assignments was to implement the Frei-Chen edge detector. As I already mentioned in an earlier post, I am willing to provide source code for more basic graphics algorithms after seeing the success of <a title="Efficient Gaussian blur with linear sampling" href="http://rastergrid.com/blog/2010/09/efficient-gaussian-blur-with-linear-sampling/">my former post</a> about the Gaussian blur filter. This is a similarly basic article, considering that it only shows how to apply a particular convolution filter based algorithm to a still image, while the possibilities this edge detection algorithm brings are a more complex topic that is out of the scope of this article.</p>
<p>As the provided reference implementation also showcases applying the Sobel operator on an image, I would like to present that first and then continue with the presentation of the Frei-Chen masking set. Those who are already well familiar with edge detection and the Sobel operator can skip the following two sections.</p>
<h2>Edge detection</h2>
<p>Before getting deep into how to implement edge detectors, let&#8217;s first talk about what an edge detector is and why we need it.</p>
<p>In general, edge detection is one of the most fundamental image processing tools, particularly used in the areas of feature detection and feature extraction. The aim of the technique is to identify points of a digital image at which the intensity changes sharply. These intensity changes can be caused by discontinuities in depth or surface orientation, changes in lighting conditions and many other factors. In the ideal case, applying an edge detector to an image leads to a set of connected lines or curves that indicate the boundaries of objects.</p>
<p>Not going that far, what an edge detector gives us to begin with is a gray-scale image where each pixel intensity approximates the likelihood that the pixel belongs to an object boundary. How well a particular algorithm can detect such pixels depends on many factors, and usually it is better to try multiple edge detectors in order to choose the one that best fits the particular use case.</p>
<p>After we get this gray-scale image we usually have to define a threshold value that will be used as the acceptance criterion for edge pixels. If the previously calculated intensity value is above this threshold then we accept the pixel as an edge, otherwise we don&#8217;t. This part is the so-called binarization stage. Additionally, subsequent image processing algorithms can be used to further interpret the edge image.</p>
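<p>The binarization stage is simple enough to sketch in a few lines of C++. The function name and the choice of 255 for edge pixels are my own assumptions for the example.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Converts a gray-scale edge response image into a binary mask:
// 255 where the response is above the threshold, 0 everywhere else.
std::vector<std::uint8_t> binarize(const std::vector<float>& edgeResponse,
                                   float threshold) {
    assert(threshold >= 0.0f);
    std::vector<std::uint8_t> mask;
    mask.reserve(edgeResponse.size());
    for (float value : edgeResponse) {
        mask.push_back(value > threshold ? 255 : 0);
    }
    return mask;
}
```

<p>A response exactly equal to the threshold is not accepted here, as the text asks for values above the threshold.</p>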
<p>In computer graphics, edge detection is usually used to implement various image decoration algorithms. Maybe the most popular applications of edge detectors nowadays are non-photorealistic rendering (NPR) and screen-space anti-aliasing techniques.</p>
<h2>Sobel filter</h2>
<p>The Sobel edge detection filter works on a 3&#215;3 texel footprint and applies two convolution masks to the image that are intended to detect the horizontal and vertical gradients of the image. The filter weights can be seen in the figure below:</p>
<p style="text-align: center;"><img class="   aligncenter" title="Sobel masks" src="http://www.rastergrid.com/blog/wp-content/uploads/2011/01/sobel-masks.png" alt="Sobel masks" width="457" height="119" /></p>
<p>These masks are applied to the intensities gathered from the 3&#215;3 footprint of the image and then are accumulated to produce the final gradient value in the following way:</p>
<p style="text-align: center;"><img class="aligncenter" title="Sobel gradient" src="http://www.rastergrid.com/blog/wp-content/uploads/2011/01/sobel-grad.png" alt="Sobel gradient" width="321" height="84" /></p>
<p>The actual algorithm can be seen in the accompanying demo that provides a GLSL-based implementation. The algorithm is defined to work on single-channel images; however, it can easily be extended to usual three-channel RGB images either by applying it separately to each channel or by first calculating a gray-scale value from the color components. The former is more computationally intensive but usually provides better results when the threshold criterion is defined so that a pixel is accepted as a boundary point if the gradient value is larger than the threshold for any of the color channels. The reference implementation, however, is based on the latter approach for the sake of simplicity, so for each pixel an intensity value is first calculated simply by taking the length of the vector comprised of the RGB components.</p>
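<p>For readers without access to the demo, the following CPU-side C++ sketch shows the same idea on a single 3&#215;3 footprint. It is not the GLSL code of the reference implementation: the masks are the textbook Sobel weights, which I assume match the figure above, and the function names are hypothetical.</p>

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Patch = std::array<float, 9>;  // row-major 3x3 intensities

// Intensity of an RGB texel, taken as the length of the color vector.
float intensity(float r, float g, float b) {
    assert(r >= 0.0f && g >= 0.0f && b >= 0.0f);
    return std::sqrt(r * r + g * g + b * b);
}

// Gradient magnitude of the 3x3 footprint: the two masks are applied
// separately and the results are combined as sqrt(gx^2 + gy^2).
float sobelGradient(const Patch& p) {
    static const Patch kMaskX = {{ 1, 0, -1,
                                   2, 0, -2,
                                   1, 0, -1 }};
    static const Patch kMaskY = {{  1,  2,  1,
                                    0,  0,  0,
                                   -1, -2, -1 }};
    float gx = 0.0f;
    float gy = 0.0f;
    for (std::size_t i = 0; i < p.size(); ++i) {
        gx += kMaskX[i] * p[i];
        gy += kMaskY[i] * p[i];
    }
    return std::sqrt(gx * gx + gy * gy);
}
```

<p>A uniform footprint yields a gradient of zero, while a vertical step edge (two columns of 0 followed by one column of 1) yields 4. Note that the result scales with the absolute intensities, which is relevant for the comparison later in the article.</p>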
<h2>Frei-Chen filter</h2>
<p>The Frei-Chen edge detector also works on a 3&#215;3 texel footprint but applies a total of nine convolution masks to the image. The nine Frei-Chen masks form a complete orthogonal basis of the 3&#215;3 neighborhood space. This implies that any 3&#215;3 image area can be represented as a weighted sum of the nine Frei-Chen masks that can be seen below:</p>
<p style="text-align: center;"><img class="aligncenter" title="Frei-Chen masks" src="http://www.rastergrid.com/blog/wp-content/uploads/2011/01/frei-chen-masks.png" alt="Frei-Chen masks" width="650" height="237" /></p>
<p>The first four Frei-Chen masks above are used for edges, the next four are used for lines and the last mask is used to compute averages. For edge detection, the appropriate masks are chosen and the image is projected onto them. The projection equation is given below:</p>
<p style="text-align: center;"><img class="aligncenter" title="Frei-Chen equation" src="http://www.rastergrid.com/blog/wp-content/uploads/2011/01/frei-chen-eq.png" alt="Frei-Chen equation" width="631" height="108" /></p>
<p>When we are using the Frei-Chen masks for edge detection we are searching for the cosine defined above and we use the first four masks as the elements of importance so the first sum above goes from one to four.</p>
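<p>The projection can be sketched on the CPU as follows. The masks are the commonly published Frei-Chen basis, which I assume is the one shown in the figure above, and the function name is hypothetical. The edge response is the square root of the energy captured by the first four masks divided by the energy captured by all nine.</p>

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Patch = std::array<float, 9>;  // row-major 3x3 intensities

// Cosine of the angle between the footprint and the edge subspace
// spanned by the first four masks; the result is in the range [0, 1].
float freiChenEdgeResponse(const Patch& p) {
    const float r = std::sqrt(2.0f);
    // Unscaled masks: four edge masks, four line masks, one average.
    const std::array<Patch, 9> masks = {{
        {{  1,  r,  1,   0, 0,  0,  -1, -r, -1 }},
        {{  1,  0, -1,   r, 0, -r,   1,  0, -1 }},
        {{  0, -1,  r,   1, 0, -1,  -r,  1,  0 }},
        {{  r, -1,  0,  -1, 0,  1,   0,  1, -r }},
        {{  0,  1,  0,  -1, 0, -1,   0,  1,  0 }},
        {{ -1,  0,  1,   0, 0,  0,   1,  0, -1 }},
        {{  1, -2,  1,  -2, 4, -2,   1, -2,  1 }},
        {{ -2,  1, -2,   1, 4,  1,  -2,  1, -2 }},
        {{  1,  1,  1,   1, 1,  1,   1,  1,  1 }},
    }};
    // Scale factors that give every mask unit length.
    const float e = 1.0f / (2.0f * r);
    const std::array<float, 9> scale = {{
        e, e, e, e, 0.5f, 0.5f, 1.0f / 6.0f, 1.0f / 6.0f, 1.0f / 3.0f }};

    float edgeEnergy = 0.0f;   // sum over masks one to four
    float totalEnergy = 0.0f;  // sum over all nine masks
    for (std::size_t k = 0; k < masks.size(); ++k) {
        float projection = 0.0f;
        for (std::size_t i = 0; i < p.size(); ++i) {
            projection += masks[k][i] * p[i];
        }
        projection *= scale[k];
        totalEnergy += projection * projection;
        if (k < 4) {
            edgeEnergy += projection * projection;
        }
    }
    assert(edgeEnergy <= totalEnergy);
    if (totalEnergy <= 0.0f) {
        return 0.0f;  // completely black footprint, no edge
    }
    return std::sqrt(edgeEnergy / totalEnergy);
}
```

<p>Because the result is a ratio, it does not change when the whole footprint is made brighter or darker. A uniform footprint gives a response close to 0, a footprint that exactly matches one of the edge masks gives 1, and a vertical step edge (two columns of 0 followed by one column of 1) gives about 0.71.</p>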
<p>Applying a threshold and applying the filter to multi-channel images work exactly the same way as with the Sobel filter. Similarly, the reference implementation applies the filter to the image as if it were a single-channel image by first calculating the intensity value for each texel in the same fashion as with the previously presented filter.</p>
<h2>Comparison</h2>
<p>Based on my experience, the Frei-Chen edge detector looks better than the Sobel filter as it is less sensitive to noise and is able to detect edges that have small gradients and thus are not found by the basic Sobel filter. For a comparison, you can check the figure below:</p>
<div class="wp-caption aligncenter" style="width: 610px"><a href="http://www.rastergrid.com/blog/wp-content/uploads/2011/01/ed-comparison.png" onclick="pageTracker._trackPageview('/outgoing/www.rastergrid.com/blog/wp-content/uploads/2011/01/ed-comparison.png?referer=');"><img title="Comparison of edge detectors" src="http://www.rastergrid.com/blog/wp-content/uploads/2011/01/ed-comparison-thumb.png" alt="Comparison of edge detectors" width="600" height="200" /></a><p class="wp-caption-text">Comparison of edge detectors: original image (left), Sobel filter (middle), Frei-Chen filter (right).</p></div>
<p>The reason the Frei-Chen edge detector seems to work better is that its construction includes a normalization factor as well as other factors that are meant to exclude all features other than edges. A normalization factor can also be added to the Sobel filter by introducing a third mask that is equivalent to the ninth Frei-Chen mask and is used to normalize the gradients. This could help reduce the number of undetected edges and the amount of noise, both of which arise from the fact that the Sobel filter calculates absolute gradients rather than relative ones.</p>
<p>From a performance point of view, the Frei-Chen edge detector is much more heavyweight as it uses nine masks instead of two. In practice, however, the performance difference between the two is much smaller, considering that both use the same texel footprint and that the computational performance of today&#8217;s GPUs is usually much higher than their texture fetching performance.</p>
<h2>Conclusion</h2>
<p>We presented an alternative to the Sobel filter in the form of the Frei-Chen edge detector which, while having little impact on performance compared to the Sobel operator, provides better edge detection quality. As there is little to no difference in how the input data has to be organized and how the result is output, the Frei-Chen edge detector can easily be used as a drop-in replacement in implementations that previously used the Sobel filter.</p>
<p><strong>Source code</strong> and <strong>Win32 binary</strong> can be acquired in the <a title="Frei-Chen Edge Detector" href="http://rastergrid.com/blog/downloads/frei-chen-edge-detector/">downloads section</a>.</p>
<p>I would like to encourage those who read this article to add the Frei-Chen edge detector to their software and compare whether it yields better results than the Sobel filter for applications that rely on the output of the edge detection filter. I would be interested to hear how the filter works in real-life computer graphics scenarios.</p>
<p>Thanks in advance and hope you enjoyed the article!</p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2011/01/frei-chen-edge-detector/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Suggestions for OpenGL 4.2 and beyond</title>
		<link>http://rastergrid.com/blog/2010/11/suggestions-for-opengl-4-2-and-beyond/</link>
		<comments>http://rastergrid.com/blog/2010/11/suggestions-for-opengl-4-2-and-beyond/#comments</comments>
		<pubDate>Sun, 14 Nov 2010 17:15:23 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[callback]]></category>
		<category><![CDATA[fragment shader]]></category>
		<category><![CDATA[geometry instancing]]></category>
		<category><![CDATA[GLSL]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[texture buffer]]></category>
		<category><![CDATA[transform feedback]]></category>
		<category><![CDATA[uniform buffer]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=504</guid>
		<description><![CDATA[The Khronos Group did a great job in the last few years to once again prove that OpenGL is still in game and that it can become the ultimate graphics API of choice, if it is not that already. However, we must note that it is not quite yet true that OpenGL 4.1 is a]]></description>
			<content:encoded><![CDATA[
<p>The Khronos Group did a great job in the last few years to once again prove that OpenGL is still in the game and that it can become the ultimate graphics API of choice, if it is not that already. However, we must note that it is not quite true yet that OpenGL 4.1 is a superset of its competitor, DirectX 11. There are still some holes to be filled, and I think the ARB should not stop there: there is much more potential in current hardware architectures than what is exposed by any graphics API today, so establishing the future of OpenGL should start by going one step further than DX11. In this article I would like to present my vision of the important items that should be included in the next revision of the specification and how I see the future of OpenGL.</p>
<p><span id="more-504"></span>Since the original OpenGL Longs Peak announcement, graphics developers were really excited to get their hands on the completely revised OpenGL 3 specification. Due to severe backward compatibility and portability issues, however, the original plan seemed to have failed, and developers expressed great disappointment about the ARB&#8217;s decision to choose a more evolutionary move away from the legacy API instead of the radical rewrite. Still, the Khronos Group has proved that the decision was not necessarily bad for OpenGL, and in fact we now have a pretty powerful API, even though the coexistence of the legacy and the new design greatly increased the complexity of the specification.</p>
<p>What we have now is an API that can really compete with DirectX 11, but I strongly believe that this is not the end of the story, as we still have a lot of things to do ahead of us. I mean this both from the point of view of exposing more hardware capabilities and of streamlining the API itself to increase the productivity of the developers who use it. My plan is to address both of these issues in this article, also trying to focus on hardware functionality that is not even exposed by other graphics APIs yet.</p>
<h2>Exposing more hardware capabilities</h2>
<p>In this chapter of the article I will talk about some familiar and some not so familiar hardware features and corresponding OpenGL extensions that should be included in the next revision of the specification in order to be able to confidently say that OpenGL is a strict superset of the competing graphics APIs. The extensions listed here are not in any particular priority order; they are just listed in an order that eases the discussion of their functionality.</p>
<h3><a title="GL_EXT_shader_image_load_store" href="http://www.opengl.org/registry/specs/EXT/shader_image_load_store.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/shader_image_load_store.txt?referer=');">GL_EXT_shader_image_load_store</a></h3>
<p>This extension provides GLSL built-in functions allowing shaders to load from, store to, and perform atomic read-modify-write operations on a single level of a texture from any shader stage. The extension also indirectly enables the same operations for buffer objects by using texture buffers. This enables developers to implement more sophisticated shader-based algorithms that require more complex data structures than just plain arrays.</p>
<p>An example use case can be the implementation of Order-Independent Transparency (OIT) using fragment linked lists as presented by <a title="OIT And Indirect Illumination Using Dx11 Linked Lists" href="http://www.slideshare.net/hgruen/oit-and-indirect-illumination-using-dx11-linked-lists" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.slideshare.net/hgruen/oit-and-indirect-illumination-using-dx11-linked-lists?referer=');">AMD at GDC10</a>. Of course, there are a lot of other techniques that could benefit from hardware accelerated random access images (called UAV textures/buffers in DX11 terminology) including algorithms related to global illumination, ray tracing, and my personal favorite: scene management.</p>
<p>As the introduction of new write operations to fragment shaders besides the traditional framebuffer writes makes the execution of the shaders sensitive to whether early-Z is used by the hardware or not, the extension also introduces a new fragment shader input layout qualifier called &#8220;early_fragment_tests&#8221; to force OpenGL to use early depth and stencil tests. Without the qualifier, the existing specification language remains valid, which states that the depth and stencil tests are performed after fragment shader execution.</p>
<p>Finally, the extension enables some form of control over the order of image loads, stores, and atomics relative to other pipeline operations accessing the same memory region both using the OpenGL API and from within shaders.</p>
<p>The API itself provides a DSA-style binding mechanism that enables binding to so-called &#8220;image units&#8221; that are separate from the texture image units. In the same style, the specification language and GLSL refer to the introduced read-write textures with the term &#8220;image&#8221;.</p>
<p>In my opinion this is one of the most important extensions that should be made core with OpenGL 4.2 and I&#8217;m pretty sure this will actually happen.</p>
<h3><a title="GL_NV_texture_barrier" href="http://www.opengl.org/registry/specs/NV/texture_barrier.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/NV/texture_barrier.txt?referer=');">GL_NV_texture_barrier</a></h3>
<p>This extension relaxes the restrictions of OpenGL on rendering to a currently bound texture and provides a mechanism to avoid read-after-write problems. More precisely, the extension allows rendering to a currently bound texture in the following cases:</p>
<ul>
<li>If the reads and writes are from/to disjoint sets of texels (after accounting for texture filtering rules) so it should work unless the drawn areas overlap, or</li>
<li>If there is only a single read and write of each texel, and the read is in the fragment shader invocation that writes the same texel (e.g. using texelFetch2D).</li>
</ul>
<p>Some of these situations were already supported implicitly, like rendering to one texture level while fetching from another. But the extension goes further and provides an API function to put an explicit barrier between draw calls to ensure proper rendering.</p>
<p>The extension can be used to accomplish a limited form of programmable blending and can eliminate the need of any image or buffer data copy in case we can live with the restrictions mentioned above.</p>
<p>One may ask why we need this extension if we have the <a title="GL_EXT_shader_image_load_store" href="http://www.opengl.org/registry/specs/EXT/shader_image_load_store.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/shader_image_load_store.txt?referer=');">GL_EXT_shader_image_load_store</a> extension, as this one is just a subset of the functionality provided by that. The answer is simple: performance. While read-write textures can mimic the same functionality, they usually use different hardware paths that are slower than regular read-only texture accesses. So it would be a definite benefit to also have this extension in core OpenGL.</p>
<h3>GL_ARB_shader_atomic_counters</h3>
<p>This extension does not have a public specification yet; however, it can be found in the extension lists of the latest Catalyst driver releases, sometimes with the EXT and sometimes with the ARB prefix. The extension itself provides an API to access a number of hardware atomic counters that provide efficient counter operations on a GPU-global scale.</p>
<p>Atomic counters come in handy when one has to read or write individual elements of a buffer or texture. As an example, this extension is needed to efficiently implement the OIT algorithm mentioned earlier because, when constructing the fragment linked list, we need unique offsets into the linked list buffer. These unique offsets can, of course, be acquired by using atomic read-modify-write operations, but those perform much slower than hardware atomic counters.</p>
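<p>The role of the counter in the linked list construction can be illustrated with a CPU-side analogy written in C11. This is not shader code and not the API of the extension (whose specification is not public): all type and function names are hypothetical, and C11 atomics merely stand in for the hardware counter and for the atomic exchange on the per-pixel head pointer.</p>

```c
#include <stdatomic.h>
#include <stdint.h>

#define NODE_POOL_SIZE 1024u
#define END_OF_LIST    0xFFFFFFFFu

/* One entry of the fragment linked list buffer. */
typedef struct {
    float    depth;
    uint32_t color;
    uint32_t next;   /* index of the next node or END_OF_LIST */
} FragmentNode;

typedef struct {
    atomic_uint  node_counter;           /* plays the role of the atomic counter */
    FragmentNode nodes[NODE_POOL_SIZE];  /* plays the role of the list buffer    */
} FragmentPool;

void pool_reset(FragmentPool *pool)
{
    atomic_store(&pool->node_counter, 0u);
}

/* Appends a fragment to the list whose head index is stored in *head.
 * Returns the index of the new node, or END_OF_LIST if the pool is full. */
uint32_t append_fragment(FragmentPool *pool, atomic_uint *head,
                         float depth, uint32_t color)
{
    /* Every caller gets a unique slot, no matter how many run concurrently. */
    uint32_t index = atomic_fetch_add(&pool->node_counter, 1u);
    if (index >= NODE_POOL_SIZE)
        return END_OF_LIST;

    pool->nodes[index].depth = depth;
    pool->nodes[index].color = color;
    /* Swap the new node in as list head and link it to the previous head. */
    pool->nodes[index].next = atomic_exchange(head, index);
    return index;
}
```

<p>In the GPU version each fragment shader invocation would perform these two steps for its own pixel: fetch a unique node index from the counter, then exchange the head pointer stored in a read-write image.</p>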
<p>Besides the mentioned example, atomic counters are useful in many algorithms from many domains. One important use case is to perform feedback operations similar to those provided by transform feedback. Such feedback operations can be used to implement various scene management or culling mechanisms.</p>
<p>The extension provides access to these atomic counters from GLSL and also makes it possible to back them with buffer objects, so that after OpenGL draw calls the values of the counters are preserved in these buffers for subsequent use.</p>
<h3><a title="GL_AMD_conservative_depth" href="http://www.opengl.org/registry/specs/AMD/conservative_depth.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/AMD/conservative_depth.txt?referer=');">GL_AMD_conservative_depth</a></h3>
<p>Early depth test is a common optimization for hardware accelerated graphics that can skip the evaluation of fragment shaders for fragments that end up being discarded because they don&#8217;t pass the depth test. The problem is that if the fragment shader modifies the depth value of the fragment, the early depth test is disabled. One can force early depth test with the functionality introduced by the extension <a title="GL_EXT_shader_image_load_store" href="http://www.opengl.org/registry/specs/EXT/shader_image_load_store.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/shader_image_load_store.txt?referer=');">GL_EXT_shader_image_load_store</a>, but that can lead to rendering artifacts as the modified depth value output by the fragment shader is not taken into account.</p>
<p>This extension allows the application to pass enough information to the GL implementation to activate some early depth test optimizations safely while still preserving the ability to account for the final depth value in the depth test. To achieve this, the extension introduces four new layout qualifiers that can be applied when redeclaring gl_FragDepth in the fragment shader: &#8220;depth_unchanged&#8221;, &#8220;depth_any&#8221;, &#8220;depth_greater&#8221; and &#8220;depth_less&#8221;. The most interesting ones are the last two, which provide the ability to do early-Z and hierarchical-Z tests from one direction to discard some groups of fragments and still allow the fragment shader to safely modify the depth value.</p>
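<p>The reasoning behind the one-directional test can be written down as a small conceptual model in C. This is my own simplification and not a description of how any driver or GPU implements it: the enums and the function are hypothetical, and only the &#8220;less&#8221; and &#8220;greater&#8221; depth functions are modeled.</p>

```c
#include <stdbool.h>

/* Promise made by the shader about the depth value it will write. */
typedef enum { DEPTH_ANY, DEPTH_UNCHANGED, DEPTH_GREATER, DEPTH_LESS } DepthHint;

/* Depth comparison configured by the application (subset only). */
typedef enum { FUNC_LESS, FUNC_GREATER } DepthFunc;

static bool depth_passes(DepthFunc func, float fragment, float stored)
{
    return (func == FUNC_LESS) ? (fragment < stored) : (fragment > stored);
}

/* Returns true if the fragment can be discarded before the shader runs,
 * based only on the interpolated depth and the shader's promise. */
bool can_reject_early(DepthHint hint, DepthFunc func,
                      float interpolated, float stored)
{
    bool fails_now = !depth_passes(func, interpolated, stored);
    switch (hint) {
    case DEPTH_UNCHANGED:
        /* The shader writes the interpolated value: the early result is final. */
        return fails_now;
    case DEPTH_GREATER:
        /* Final depth >= interpolated depth: a fragment that already fails
         * a "less" test can only fail it by a larger margin later. */
        return func == FUNC_LESS && fails_now;
    case DEPTH_LESS:
        /* Final depth <= interpolated depth: the mirrored case. */
        return func == FUNC_GREATER && fails_now;
    case DEPTH_ANY:
    default:
        /* No promise: the test has to wait for the shader. */
        return false;
    }
}
```

<p>Fragments that are not rejected this way still have to go through the regular late depth test with the value written by the shader, so the final image stays correct.</p>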
<p>This technique comes in very handy when rendering volumetric particles, decals or billboards. Without this extension one has to sacrifice early rejection of fragments in order to be able to create the volumetric primitives mentioned.</p>
<p>As far as I know this feature is also present in DirectX 11 so it should be a must for OpenGL 4.x also. As the extension is an AMD one, I don&#8217;t know whether NVIDIA GPUs do support anything like this in hardware but even if not, they can simply ignore the new layout qualifiers and do late depth test instead. Of course, it would result in lower performance but if only functionality is concerned it should be just okay.</p>
<h3>GL_ARB_instanced_arrays2</h3>
<p>OpenGL provides two means to perform geometry instancing via the extensions <a title="GL_ARB_draw_instanced" href="http://www.opengl.org/registry/specs/ARB/draw_instanced.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/draw_instanced.txt?referer=');">GL_ARB_draw_instanced</a> and <a title="GL_ARB_instanced_arrays" href="http://www.opengl.org/registry/specs/ARB/instanced_arrays.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/instanced_arrays.txt?referer=');">GL_ARB_instanced_arrays</a>. While this (as yet non-existent) extension would extend both, it is more relevant to the latter, so I named it accordingly.</p>
<p>The extension should trivially add the possibility to specify a &#8220;first instance&#8221; parameter for the instanced draw commands. Whether this is accomplished by introducing new variants of the glDrawElements* and glDrawArrays* draw commands or by having a separate command for specifying the new parameter is up to the ARB. The extension should also interact with <a title="GL_ARB_draw_indirect" href="http://www.opengl.org/registry/specs/ARB/draw_indirect.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/draw_indirect.txt?referer=');">GL_ARB_draw_indirect</a>, which already mentions the lack of this parameter in GL and already reserves a field in the indirect draw command structure for specifying the &#8220;first instance&#8221; parameter.</p>
<p>This extension would be more of a bug fix than a completely new feature, as this functionality should have been exposed when instancing was first introduced to OpenGL.</p>
<h3>GL_ARB_draw_indirect2</h3>
<p>This is one of the extensions I would be the most happy to see in the next release of the OpenGL specification. It would be a functional addition to the <a title="GL_ARB_draw_indirect" href="http://www.opengl.org/registry/specs/ARB/draw_indirect.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/draw_indirect.txt?referer=');">GL_ARB_draw_indirect</a> extension that currently only allows the execution of a single instanced draw command that sources its parameters from a buffer object.</p>
<p>The new extension would add a new buffer binding point called e.g. GL_DRAW_INDIRECT_PRIMITIVE_COUNT that would specify the source of the &#8220;primcount&#8221; parameter to the following newly introduced draw commands:</p>
<pre>    void <strong>MultiDrawArraysIndirect</strong>( enum <em>mode</em>, sizei stride,
                                  const void *<em>indirect</em>,
                                  const void *<em>primcount</em> );
    void <strong>MultiDrawElementsIndirect</strong>( enum <em>mode</em>, enum <em>type</em>, sizei stride,
                                    const void *<em>indirect</em>,
                                    const void *<em>primcount</em> );</pre>
<p>This would not just allow executing multiple indirect draw commands at once, without further CPU action, but would also source the &#8220;primcount&#8221; parameter from a buffer object. Thus, if the draw commands are generated using transform feedback, read-write buffers or OpenCL (e.g. based on some GPU-based scene management algorithm), the application does not have to use asynchronous queries or other means that may introduce sync points in the rendering in order to feed the &#8220;primcount&#8221; parameter.</p>
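<p>To illustrate the intended semantics, here is a small CPU-side emulation in C of what the proposed MultiDrawArraysIndirect would conceptually do. The command structure follows the layout given in GL_ARB_draw_indirect; everything else is hypothetical, including the function name, the plain memory pointers standing in for buffer objects, the reading of &#8220;primcount&#8221; as the number of draw commands (as in glMultiDrawArrays), and the convention that a stride of zero means tightly packed commands.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout of one indirect draw command as defined by GL_ARB_draw_indirect. */
typedef struct {
    uint32_t count;               /* vertices per instance               */
    uint32_t primCount;           /* number of instances                 */
    uint32_t first;               /* first vertex                        */
    uint32_t reservedMustBeZero;  /* candidate for "first instance"      */
} DrawArraysIndirectCommand;

/* Expands a multi-draw into the individual commands it stands for.
 * indirect_buffer/count_buffer emulate the two bound buffer objects and
 * the offsets emulate the pointer arguments of the proposed command.
 * Returns the number of commands written to out (at most max_out). */
size_t expand_multi_draw_arrays_indirect(const unsigned char *indirect_buffer,
                                         size_t indirect_offset,
                                         size_t stride,
                                         const unsigned char *count_buffer,
                                         size_t count_offset,
                                         DrawArraysIndirectCommand *out,
                                         size_t max_out)
{
    /* The number of draws is read from memory, not passed in by the CPU. */
    uint32_t draw_count;
    memcpy(&draw_count, count_buffer + count_offset, sizeof draw_count);

    /* Assumption: stride 0 means the commands are tightly packed. */
    if (stride == 0)
        stride = sizeof(DrawArraysIndirectCommand);

    size_t n = (draw_count < max_out) ? draw_count : max_out;
    for (size_t i = 0; i < n; ++i)
        memcpy(&out[i], indirect_buffer + indirect_offset + i * stride,
               sizeof out[i]);
    return n;
}
```

<p>On the GPU both buffers would be filled by an earlier pass, so the CPU never needs to know how many commands were generated.</p>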
<p>Some people said that this is quite a futuristic feature to expect and that such functionality will most probably be available only on newer generations of GPUs and maybe with OpenGL 5. I was not that pessimistic, so I decided to raise the question with the relevant ARB members at NVIDIA and AMD. While I did not receive any answer from NVIDIA, I did receive some good news from AMD, as they said that this functionality can be implemented for Shader Model 5.0 level hardware.</p>
<p>What this extension would give developers is a way to efficiently implement GPU-based scene management where the GPU bakes together all the rendering commands for the current frame using atomic counters and buffer writes, and the CPU just has to issue a few or maybe just a single MultiDraw*Indirect command to render the whole scene. But of course, the feature can also increase draw command throughput in the case of CPU-based scene management.</p>
<p>So my message to the Khronos Group is: please start working on such an extension, as this would not just make developers happy but would also strengthen OpenGL&#8217;s position in the industry by putting something into the specification that even DirectX 11 cannot do.</p>
<h3><a title="GL_AMD_transform_feedback3_lines_triangles" href="http://www.opengl.org/registry/specs/AMD/transform_feedback3_lines_triangles.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/AMD/transform_feedback3_lines_triangles.txt?referer=');">GL_AMD_transform_feedback3_lines_triangles</a></h3>
<p>OpenGL 4.0 introduced the extension <a title="GL_ARB_transform_feedback3" href="http://www.opengl.org/registry/specs/ARB/transform_feedback3.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/transform_feedback3.txt?referer=');">GL_ARB_transform_feedback3</a> that further extended the transform feedback capabilities provided by earlier extensions to allow output to separate vertex streams. However, there is one caveat: separate vertex streams are only supported for point primitives.</p>
<p>This new AMD extension does nothing more than remove that restriction for separate output streams, allowing the same set of primitive types to be used with multiple transform feedback streams as with a single stream, as long as the primitive types are the same for all output streams.</p>
<p>Limiting the possible output primitive types for transform feedback into multiple streams should not be a problem unless you also want to rasterize some triangles at the same time you output. Without relaxing this restriction, one can do this only by issuing two separate draw commands, which incurs a performance hit.</p>
<p>I don&#8217;t know if the restriction is present in the ARB extension because NVIDIA does not support this in hardware, but if this is not the case then I think this extension should be included in the next release of the specification. Otherwise, NVIDIA, please include this feature in your next GPU generation.</p>
<h3><a title="GL_NV_copy_image" href="http://www.opengl.org/registry/specs/NV/copy_image.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/NV/copy_image.txt?referer=');">GL_NV_copy_image</a></h3>
<p>OpenGL 3.1 already introduced a method to provide GPU accelerated copy of buffer data. This NVIDIA extension provides a similar functionality that can be used to execute efficient image data transfer between image objects (i.e. textures and renderbuffers).</p>
<p>While there are already methods to perform image data copies between textures, e.g. using the <a title="GL_EXT_framebuffer_blit" href="http://www.opengl.org/registry/specs/EXT/framebuffer_blit.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/framebuffer_blit.txt?referer=');">GL_EXT_framebuffer_blit</a> extension promoted to core with OpenGL 3.0, these require expensive framebuffer object operations and also lack direct support for transferring 3D image data.</p>
<p>This extension simply introduces a single command that allows such image data copies for every type of texture (including cube maps, 3D textures and array textures) without the need to bind the image objects or otherwise configure the rendering.</p>
<h3><a title="GL_AMD_depth_clamp_separate" href="http://www.opengl.org/registry/specs/AMD/depth_clamp_separate.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/AMD/depth_clamp_separate.txt?referer=');">GL_AMD_depth_clamp_separate</a></h3>
<p>The extension <a title="GL_ARB_depth_clamp" href="http://www.opengl.org/registry/specs/ARB/depth_clamp.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/depth_clamp.txt?referer=');">GL_ARB_depth_clamp</a> promoted to core with OpenGL 3.2 introduced the ability to control the clamping of the depth value for both the near and far clip planes. This eliminates artifacts like seeing inside an object happening when the object&#8217;s geometry is clipped by the near clip plane.</p>
<p>This new extension provides a means for the application to enable depth clamping separately for the near and the far clip plane. This increases the flexibility of depth clamping and can save some fill-rate in certain situations.</p>
<h3><a title="GL_EXT_texture_filter_anisotropic" href="http://www.opengl.org/registry/specs/EXT/texture_filter_anisotropic.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/texture_filter_anisotropic.txt?referer=');">GL_EXT_texture_filter_anisotropic</a></h3>
<p>I don&#8217;t think that I have to talk too much about this extension as it should be familiar to all of you. It simply makes it possible to use anisotropic filtering on a per-texture basis. I really wonder why this extension hasn&#8217;t made its way into core, as it has been supported by hardware for more than a decade.</p>
<p>I know that the extension itself is supported by all relevant graphics driver vendors, but really, why can&#8217;t we simply include it in the core specification?</p>
<h3>GL_ARB_texture_gather_lod</h3>
<p>This is another yet non-existent extension that would extend <a title="GL_ARB_texture_gather" href="http://www.opengl.org/registry/specs/ARB/texture_gather.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_gather.txt?referer=');">GL_ARB_texture_gather</a> by adding GLSL built-in functions called textureGatherLod that would allow gathered fetches with explicit LOD. I&#8217;m not sure if these functions are missing from the specification because of lack of hardware support or just because the ARB thought they might not be of any use. Anyway, if the hardware supports it then OpenGL should expose it to developers as there are certain situations when one has to use explicit LOD and could benefit from the increased fetching performance enabled by gathered fetches.</p>
<h3><a title="GL_ARB_shader_stencil_export" href="http://www.opengl.org/registry/specs/ARB/shader_stencil_export.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shader_stencil_export.txt?referer=');">GL_ARB_shader_stencil_export</a></h3>
<p>This extension was published at the time the OpenGL 4.1 specification came out and provides the ability for the fragment shader to output the stencil reference value that was otherwise configurable only using API calls. This enables a great level of flexibility to existing and future stencil buffer based algorithms making it possible also to directly write independent values to the stencil buffer on a per-fragment basis.</p>
<p>The predecessor of the extension is <a title="GL_AMD_shader_stencil_export" href="http://www.opengl.org/registry/specs/AMD/shader_stencil_export.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/AMD/shader_stencil_export.txt?referer=');">GL_AMD_shader_stencil_export</a>, which suggests that it may only be supported in hardware on AMD GPUs. However, if this is not the case and NVIDIA could support this as well, then I think it is worth promoting this feature to core OpenGL too.</p>
<h2>Streamlining the API</h2>
<p>After discussing the long list of functional features that would be nice to see in the next release of OpenGL, let&#8217;s focus on the API improvement extensions and ideas that are necessary to improve the usability of the API itself. Actually, this part could go on much longer than what I&#8217;ll discuss because as we get more and more features in OpenGL, developers struggle with the increased complexity of the API. I&#8217;ll try to focus on the most crucial issues.</p>
<h3><a title="GL_EXT_direct_state_access" href="http://www.opengl.org/registry/specs/EXT/direct_state_access.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/direct_state_access.txt?referer=');">GL_EXT_direct_state_access</a></h3>
<p>This is the extension that all OpenGL developers have been awaiting for a long time now. Direct state access eliminates the OpenGL API&#8217;s stupid &#8220;bind-to-modify&#8221; nature.</p>
<p>For a very long time the only vendor supporting the extension was NVIDIA. Fortunately, since Catalyst 10.7 AMD also exposes the extension to developers. Still, I have one problem: this extension is very poorly designed.</p>
<p>The main problem with the extension is that the functions were designed in a way that a naive implementation could be done by simply using &#8220;bind-to-modify&#8221; under the hood. That&#8217;s what resulted in crazy API functions like MultiTexParameter* and friends. Also, enabling DSA for all of the deprecated functionality would result in an explosion of the API specification and, as a consequence, in bloated specification language. Finally, I would also like to object somewhat to the contributors&#8217; lack of creativity regarding the awkward naming conventions present in the current DSA extension.</p>
<p>In my opinion the Khronos Group has to address the issue by creating a new ARB version of the DSA extension that focuses strictly on core functionality, throws away DSA support for deprecated features (if somebody needs to use deprecated features they can still use the EXT version) and provides a naming convention that fits much better into the current API language.</p>
<p>Anyway, I completely agree with the other developers out there and scream for DSA. I think the Khronos Group has to eliminate the problem of the &#8220;bind-to-modify&#8221; semantics as soon as possible; otherwise, even though the core specification exposes more and more hardware features, developers will not be attracted to OpenGL.</p>
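<p>For readers who have not run into the problem themselves, the difference between the two styles can be shown with a toy state machine in C. None of these functions are real OpenGL entry points: the toy_* names only loosely model the glBindTexture/glTexParameteri pair and a DSA-style setter, and the context holds a single binding point and a single parameter for brevity.</p>

```c
#define MAX_TEXTURES 8

typedef struct {
    int min_filter;
} Texture;

typedef struct {
    Texture  textures[MAX_TEXTURES];
    unsigned bound_texture;  /* name bound to the one and only target */
} ToyContext;

/* Bind-to-modify style: the setter acts on whatever is currently bound.
 * Callers are assumed to pass names below MAX_TEXTURES. */
void toy_bind_texture(ToyContext *ctx, unsigned name)
{
    ctx->bound_texture = name;
}

void toy_tex_parameter(ToyContext *ctx, int value)
{
    ctx->textures[ctx->bound_texture].min_filter = value;
}

/* DSA style: the object to modify is named explicitly, so the binding
 * state used for rendering is left untouched. */
void toy_texture_parameter(ToyContext *ctx, unsigned name, int value)
{
    ctx->textures[name].min_filter = value;
}

/* What a library has to do with bind-to-modify if it must not disturb
 * the caller's state: query, bind, modify, restore. */
void set_filter_bind_to_modify(ToyContext *ctx, unsigned name, int value)
{
    unsigned previous = ctx->bound_texture;
    toy_bind_texture(ctx, name);
    toy_tex_parameter(ctx, value);
    toy_bind_texture(ctx, previous);
}
```

<p>The save and restore dance in the last function is exactly the kind of boilerplate and hidden state dependency that direct state access removes.</p>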
<h3>GL_ARB_explicit_sampler_location</h3>
<p>The ARB moved in the right direction when they introduced the <a title="GL_ARB_explicit_attrib_location" href="http://www.opengl.org/registry/specs/ARB/explicit_attrib_location.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/explicit_attrib_location.txt?referer=');">GL_ARB_explicit_attrib_location</a> extension, eliminating the need to use dummy API calls to bind vertex attributes and output buffers to shader variables, but they should not stop here. One of the most important additions could be a similar language syntax in GLSL that would allow us to bind sampler uniforms to texture image units. Obviously, the same goes for read-write images if <a title="GL_EXT_shader_image_load_store" href="http://www.opengl.org/registry/specs/EXT/shader_image_load_store.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/shader_image_load_store.txt?referer=');">GL_EXT_shader_image_load_store</a> is included.</p>
<h3>GL_ARB_explicit_uniform_block_index</h3>
<p>Similar to the previous request, uniform block indices should also be explicitly specifiable in the shaders themselves. This extension would add exactly that functionality. The implementation is also straightforward: only a simple uniform block layout qualifier has to be added.</p>
<h3>Other API clarifications</h3>
<p>Besides the major issues the current specification language also has some bugs and unclear parts that should be addressed as well:</p>
<ul>
<li>Program pipeline objects are created by binding the object name, which is not in line with the rest of the API language.</li>
<li>There is no language about whether program pipeline objects are shared among contexts or not, which suggests that they aren&#8217;t, and that is not in line with the fact that program and shader objects are shared.</li>
</ul>
<p>Most probably there are a lot more issues with the specification language, but for now only these come to mind. Maybe some of you can extend the list with other specification mistakes.</p>
<h2>OpenGL 4.2 and beyond</h2>
<p>While my feature requests cover most of the needed functionality that should be included in the next revision of the OpenGL specification, there are a lot of other things that could be very useful for developers but are very unlikely to find their way into the specification any time soon. I will talk about these features in this section of the article, as they raise many more questions than could be answered by simply including them in OpenGL 4.2.</p>
<h3>Affinity contexts</h3>
<p>We have had multi-GPU designs like SLI and CrossFire for a long time now. Fortunately, we also have vendor specific extensions to create affinity contexts that are associated with a single GPU of a multi-GPU configuration. We have <a title="WGL_AMD_gpu_association" href="http://www.opengl.org/registry/specs/AMD/wgl_gpu_association.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/AMD/wgl_gpu_association.txt?referer=');">WGL_AMD_gpu_association</a> and <a title="WGL_NV_gpu_affinity" href="http://www.opengl.org/registry/specs/NV/gpu_affinity.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/NV/gpu_affinity.txt?referer=');">WGL_NV_gpu_affinity</a> for Windows and <a title="GLX_AMD_gpu_association" href="http://www.opengl.org/registry/specs/AMD/glx_gpu_association.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/AMD/glx_gpu_association.txt?referer=');">GLX_AMD_gpu_association</a> on GLX based platforms. I have just two problems with this:</p>
<ul>
<li>First, these are vendor specific extensions.</li>
<li>Second, NVIDIA exposes its affinity context support only on Windows and just for their professional cards, leaving consumer hardware owners without affinity context support.</li>
</ul>
<p>I would be pleased to see in the future extensions like <span style="text-decoration: underline;">WGL_ARB_gpu_affinity_context</span> and <span style="text-decoration: underline;">GLX_ARB_gpu_affinity_context</span> that will be supported by both NVIDIA and AMD, and that are supported on both professional and consumer hardware.</p>
<h3>Command buffers</h3>
<p>I would like to see something in OpenGL similar to what we have in OpenCL. Having several separate command buffers for a single OpenGL context can have its performance benefits, as some of the implicit sync points that are otherwise present in OpenGL draw commands could be eliminated. Another solution would be to simply use multiple GL contexts, but that is much more complicated and context switches are quite heavy-weight operations. This would be something like how framebuffer objects replaced pbuffers.</p>
<p>This could even go as far as encapsulating state manipulation data into command buffers, similar to how display lists allowed this in many cases, just in a more efficient and hardware centric manner.</p>
<h3>Immutable state objects</h3>
<p>Another thing strongly related to the previous idea would be immutable state objects. If state management data could not be efficiently stored in such a command buffer, we could instead use immutable state objects that would be very similar in nature to display lists, which hide the underlying representation of the commands.</p>
<p>Display lists are deprecated and I don&#8217;t think it was a wrong decision. They made the API language complex and you never knew which commands were compiled into display lists and how. I remember the time I was making an OpenGL app on my GeForce2 and used DrawElements calls inside display lists that referenced buffer object data. Funnily, it was working on NVIDIA hardware, even though the specification says otherwise, and I was wondering why my app crashed on ATI cards.</p>
<p>Anyway, display lists are gone, but we need some complex state objects that could fill those holes that were left after them.</p>
<h3>More callbacks</h3>
<p>I was very happy to see the appearance of an extension that introduced the callback concept into OpenGL (<a title="GL_AMD_debug_output" href="http://www.opengl.org/registry/specs/AMD/debug_output.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/AMD/debug_output.txt?referer=');">GL_AMD_debug_output</a>). Since then, the functionality has been promoted to an ARB extension, meaning that the ARB has accepted the fact that we need callbacks.</p>
<p>What I would like to see in the future is more OpenGL callbacks. One of the most trivial use cases I can think of is asynchronous queries. It would be so much easier if we were able to receive a callback from OpenGL when the results of our asynchronous queries are available, rather than having to manually poll for the result in various phases of the rendering.</p>
<p>Actually, I could imagine callbacks for every rendering command issued that will be called by the driver as soon as the actual rendering is complete on the GPU side.</p>
<h3>Programmable blending</h3>
<p>This is yet another thing that developers are screaming for. Fortunately, we now have indirect methods to solve most of the issues of programmable blending via the extensions <a title="GL_EXT_shader_image_load_store" href="http://www.opengl.org/registry/specs/EXT/shader_image_load_store.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/shader_image_load_store.txt?referer=');">GL_EXT_shader_image_load_store</a> and <a title="GL_NV_texture_barrier" href="http://www.opengl.org/registry/specs/NV/texture_barrier.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/NV/texture_barrier.txt?referer=');">GL_NV_texture_barrier</a>; however, a more general solution would be welcome.</p>
<p>I don&#8217;t know whether this would be actually possible on current hardware but if not, then this is a message to hardware vendors to solve the issue in the near future.</p>
<h2>Summary</h2>
<p>We&#8217;ve seen that even though OpenGL is on track and the Khronos Group is keeping up the pace with its competitors, there is still a lot of room for improvement in the OpenGL specification, both from a functional point of view and from an API design point of view.</p>
<p>I would like to end the article with a summary of what I expect to be part of the OpenGL 4.2 specification and my personal wish-list beyond those in some kind of priority order.</p>
<p><strong>My expectations for OpenGL 4.2:</strong></p>
<ul>
<li>GL_EXT_shader_image_load_store</li>
<li>GL_ARB_shader_atomic_counters</li>
<li>GL_ARB_instanced_arrays2</li>
<li>GL_ARB_explicit_sampler_location</li>
<li>GL_ARB_explicit_uniform_block_index</li>
</ul>
<p><strong>My personal wish-list for OpenGL 4.2:</strong></p>
<ul>
<li>GL_ARB_draw_indirect2</li>
<li>GL_ARB_direct_state_access</li>
<li>GL_NV_texture_barrier</li>
<li>GL_AMD_conservative_depth</li>
<li>GL_ARB_texture_gather_lod</li>
<li>GL_NV_copy_image</li>
<li>GL_EXT_texture_filter_anisotropic</li>
<li>GL_ARB_shader_stencil_export</li>
<li>GL_AMD_depth_clamp_separate</li>
<li>GL_AMD_transform_feedback3_lines_triangles</li>
</ul>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/11/suggestions-for-opengl-4-2-and-beyond/feed/</wfw:commentRss>
		<slash:comments>29</slash:comments>
		</item>
		<item>
		<title>Hierarchical-Z map based occlusion culling</title>
		<link>http://rastergrid.com/blog/2010/10/hierarchical-z-map-based-occlusion-culling/</link>
		<comments>http://rastergrid.com/blog/2010/10/hierarchical-z-map-based-occlusion-culling/#comments</comments>
		<pubDate>Tue, 19 Oct 2010 19:13:32 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Samples]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[culling]]></category>
		<category><![CDATA[depth buffer]]></category>
		<category><![CDATA[fragment shader]]></category>
		<category><![CDATA[geometry instancing]]></category>
		<category><![CDATA[geometry shader]]></category>
		<category><![CDATA[GLSL]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[LOD]]></category>
		<category><![CDATA[mipmap]]></category>
		<category><![CDATA[occlusion culling]]></category>
		<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[transform feedback]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=397</guid>
		<description><![CDATA[Hierarchical-Z is a well known and standard feature of modern GPUs that allows them to speed up depth testing by rejecting large group of incoming fragments using a reduced and compressed version of the depth buffer that resides in on-chip memory. The technique presented in this article uses the same basic idea to allow batched]]></description>
			<content:encoded><![CDATA[
<div class="wp-caption alignleft" style="width: 210px"><a href="http://rastergrid.com/blog/wp-content/uploads/2010/10/mountains.png"><img class="  " title="Click to enlarge" src="http://www.rastergrid.com/blog/wp-content/uploads/2010/10/mountains-thumb.png" alt="OpenGL 4.0 - Mountains demo" width="200" height="150" /></a><p class="wp-caption-text">OpenGL 4.0 - Mountains demo</p></div>
<p>Hierarchical-Z is a well known and standard feature of modern GPUs that allows them to speed up depth testing by rejecting large groups of incoming fragments using a reduced and compressed version of the depth buffer that resides in on-chip memory. The technique presented in this article uses the same basic idea to allow batched occlusion culling for a large number of individual objects using a geometry shader, without the CPU intervention that is unavoidable with traditional occlusion queries. The article also provides a reference implementation in the form of the OpenGL 4.0 Mountains demo that uses the technique for culling thousands of object instances.</p>
<p><span id="more-397"></span></p>
<h2>Introduction</h2>
<p>Occlusion culling is a visibility determination algorithm that is used to identify those objects that reside in the view volume but still aren&#8217;t visible on the screen due to occlusion. That means they are hidden by other objects that reside closer to the camera.</p>
<p>For several generations now GPUs have offered hardware accelerated methods to perform occlusion culling in the form of occlusion queries. OpenGL provides the functionality via the extension <a title="GL_ARB_occlusion_query" href="http://www.opengl.org/registry/specs/ARB/occlusion_query.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/occlusion_query.txt?referer=');">ARB_occlusion_query</a>. Occlusion queries are very simple: when you draw an object with an occlusion query enabled, the query returns the number of samples that passed the depth test (or simply returns true or false based on whether any samples of the object passed the depth test, as provided by the OpenGL extension <a title="GL_ARB_occlusion_query2" href="http://www.opengl.org/registry/specs/ARB/occlusion_query2.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/occlusion_query2.txt?referer=');">ARB_occlusion_query2</a>).</p>
<p>So actually performing occlusion culling using occlusion queries means simply the following:</p>
<ol>
<li>Draw the object while occlusion query is enabled.</li>
<li>If the query result is that the object is visible then draw the object.</li>
</ol>
<p>At first, this may sound stupid as you have to draw the object in order to tell whether it is visible or not. While in this form it really sounds silly, in practice occlusion queries can save a lot of work for the GPU. Suppose you have a complex object with several thousands of triangles. To determine its visibility using an occlusion query you would simply render e.g. the bounding box of the object, and if the bounding box is visible (the occlusion query returns that some samples have passed) then the object itself is most probably visible. This way you can save the GPU from the unnecessary processing of a large amount of geometry.</p>
<p>I have to mention here that I intentionally used the expression &#8220;most probably visible&#8221; as occlusion queries provide just a conservative estimate on whether the object is visible or not rather than an exact result. This is because the bounding box occupies a different (larger) portion of the screen than the original geometry. So what we expect from an occlusion culling algorithm is to give one of the following results: the object is not visible or the object is most probably visible. The bigger this probability is the better the occlusion culling effectiveness is.</p>
<p>While we would always want an occlusion culling algorithm to be as effective as possible usually we have to make a trade-off between effectiveness and efficiency. In the above example if we would like to have 100% effectiveness then we would have to draw the whole object and that would defeat most of the goals of occlusion culling. The algorithm presented in this article is somewhat even more conservative but enables the use of occlusion culling for much larger datasets.</p>
<h2>Motivation</h2>
<p>While hardware accelerated occlusion query is a powerful tool for visibility determination, it puts a considerable burden on the application, which has to manage the occlusion queries and draw the objects based on the results when they become available (taking into consideration the asynchronous nature of occlusion queries). The most naive use of occlusion queries would be to execute the query right before we have to draw the object. While this seems like a feasible idea, it does not perform well in practice: the CPU has to be stalled until the result of the query is available, which also leaves the GPU idle and thus results in unacceptable performance. In order to resolve this, the application has to fill the time between the query execution and the drawing of the object based on the query result. While there are techniques to accomplish this, it definitely comes at a cost as the implementation becomes more complex.</p>
<p>The aforementioned problem is somewhat resolved by using conditional rendering introduced in OpenGL 3 (<a title="GL_NV_conditional_render" href="http://www.opengl.org/registry/specs/NV/conditional_render.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/NV/conditional_render.txt?referer=');">NV_conditional_render</a> extension). However, all this feature does is that if the result of the query is not available yet, the object is simply drawn no matter whether it is visible or not. This avoids stalling the rendering pipeline and can be done in software if the extension is not available; however, it somewhat defeats the purpose of occlusion culling.</p>
<p>Another drawback of occlusion queries is that CPU intervention is still needed in order to make a decision about the visibility of the object. For today&#8217;s hardware, where proper batching is one of the most crucial aspects of the renderer, such an approach is rather ineffective.</p>
<p>The occlusion culling technique presented in this article solves both of these issues by providing an implementation that is very simple to integrate into any renderer, puts little to no burden on the renderer and makes the decision about the visibility of objects entirely on the GPU.</p>
<h2>The algorithm</h2>
<p>As in the case of many other GPU based culling algorithms presented by me and others, the hierarchical-Z map based occlusion culling uses the geometry shader&#8217;s ability to skip the emission of primitives that are determined to be invisible in the final rendering. The shader will only emit data for those objects that are visible, and this data is streamed out into a buffer object using transform feedback.</p>
<p>The algorithm itself is similar in spirit to the hierarchical Z testing that is implemented in modern GPUs. After rendering all the occluders in the scene, we construct a hierarchical depth image from the depth buffer which we will refer to as the Hi-Z map. This texture map is a mip-mapped, screen resolution image where each texel in mip level <em>i</em> contains the maximum depth of all corresponding texels in mip level <em>i-1</em>. This depth information can be collected during the main rendering pass for the occluding objects as we need a texture of the same resolution so we don&#8217;t need a separate depth pass. This can be simply accomplished using OpenGL framebuffer objects.</p>
<p>After the construction of the Hi-Z map, occlusion culling can be performed by comparing the depth value of the object&#8217;s bounding volume with the depth information stored in the Hi-Z map. This is when the hierarchical mip-mapped structure of the Hi-Z map comes in handy, as we can do conservative depth comparisons with fewer texture fetches by sampling directly from a particular mip level.</p>
<p>This is why we constructed the Hi-Z map using a &#8220;store maximum depth&#8221; policy. This will work with a usual depth buffer setup where the depth comparison function is either LESS or LEQUAL. For a reverse directed depth buffer the &#8220;store minimum depth&#8221; policy has to be used.</p>
<h3>Hi-Z map construction</h3>
<p>In case of single-sample rendering, one can use the Hi-Z map as the main depth buffer for rendering the scene. The technique extends also to multi-sampled rendering but in this case a separate full-screen quad pass is needed to calculate the maximum depth of each individual sample in the multi-sampled depth buffer and store it in the single-sampled Hi-Z map. This is possible since OpenGL 3.2 or using the extension <a title="GL_ARB_texture_multisample" href="http://www.opengl.org/registry/specs/ARB/texture_multisample.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_multisample.txt?referer=');">ARB_texture_multisample</a>. Besides this additional step, the algorithm remains the same.</p>
<p>The Hi-Z map can be constructed using OpenGL framebuffer objects by rendering a full-screen quad pass for each mip level where the previous mip level is bound as the input texture and the current mip level is bound as the render target. As OpenGL does allow rendering from and to the same texture object as long as we don&#8217;t access the same mip level for both reading and writing, the algorithm simply looks like the following:</p>
<pre class="brush:cpp">// bind depth texture
glBindTexture(GL_TEXTURE_2D, depthTexture);
// calculate the number of mipmap levels for NPOT texture
int numLevels = 1 + (int)floorf(log2f(fmaxf(SCREEN_WIDTH, SCREEN_HEIGHT)));
int currentWidth = SCREEN_WIDTH;
int currentHeight = SCREEN_HEIGHT;
for (int i=1; i&lt;numLevels; i++) {
  // calculate next viewport size
  currentWidth /= 2;
  currentHeight /= 2;
  // ensure that the viewport size is always at least 1x1
  currentWidth = currentWidth &gt; 0 ? currentWidth : 1;
  currentHeight = currentHeight &gt; 0 ? currentHeight : 1;
  glViewport(0, 0, currentWidth, currentHeight);
  // bind next level for rendering but first restrict fetches only to previous level
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, i-1);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, i-1);
  glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                         GL_TEXTURE_2D, depthTexture, i);
  // draw full-screen quad
  ............
}
// reset mipmap level range for the depth image
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, 0);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, numLevels-1);</pre>
<p>It is very important not to forget the step that ensures the viewport size is always at least 1&#215;1, as in the case of non-power-of-two (NPOT) textures the repeated halving rounds one of the dimensions down to zero before the last mip level is reached. I forgot this at first and spent an hour wondering why my last mip level didn&#8217;t get filled.</p>
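<p>To make the rounding issue concrete, here is a minimal CPU-side sketch that lists the size of every mip level using the same halve-and-clamp rule as the loop above. The function names are hypothetical and are not taken from the demo source; the level count is computed with integer arithmetic instead of <code>log2f</code>, which should give the same result for positive sizes.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: number of levels in a full mip chain,
// i.e. 1 + floor(log2(max(width, height))), computed with integers.
int mipLevelCount(int width, int height) {
    assert(width > 0 && height > 0);
    int levels = 1;
    for (int size = std::max(width, height); size > 1; size /= 2) {
        ++levels;
    }
    return levels;
}

// Hypothetical helper: (width, height) of every mip level, halving with
// the "floor" convention and clamping each dimension to at least 1.
std::vector<std::pair<int, int>> mipChainSizes(int width, int height) {
    std::vector<std::pair<int, int>> sizes;
    const int levels = mipLevelCount(width, height);
    for (int i = 0; i < levels; ++i) {
        sizes.emplace_back(width, height);
        width = std::max(width / 2, 1);   // without the clamp this reaches 0
        height = std::max(height / 2, 1); // before the other dimension is 1
    }
    return sizes;
}
```

<p>For a 1024x768 depth texture this yields 11 levels, and the height already reaches 1 at level 9 (2x1), so the final 1x1 level only gets a valid viewport because of the clamp.</p>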
<p>While one may wonder how this technique can be efficient after so many full-screen quad passes, it is in fact very efficient and it constructs the Hi-Z map on my Radeon HD5770 in less than <strong>0.2 milliseconds</strong>. The measurement should be quite accurate as I&#8217;ve done it using OpenGL timer queries (see the extension <a title="GL_ARB_timer_query" href="http://www.opengl.org/registry/specs/ARB/timer_query.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/timer_query.txt?referer=');">ARB_timer_query</a>).</p>
<p>The fragment shader used for the construction of the Hi-Z map is very straightforward except for one thing. We use an NPOT depth texture due to the aspect ratio of the window, and as NPOT textures use a &#8220;floor&#8221; convention to determine the size of subsequent mip levels (see the extension <a title="GL_ARB_texture_non_power_of_two" href="http://www.opengl.org/registry/specs/ARB/texture_non_power_of_two.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_non_power_of_two.txt?referer=');">ARB_texture_non_power_of_two</a>), we need predicated fetches: when reducing from odd-sized mip levels we should not forget about the edge texels:</p>
<pre class="brush:c">#version 400 core

uniform sampler2D LastMip;
uniform ivec2 LastMipSize;

in vec2 TexCoord;

void main(void)
{
  vec4 texels;
  texels.x = texture( LastMip, TexCoord ).x;
  texels.y = textureOffset( LastMip, TexCoord, ivec2(-1, 0) ).x;
  texels.z = textureOffset( LastMip, TexCoord, ivec2(-1,-1) ).x;
  texels.w = textureOffset( LastMip, TexCoord, ivec2( 0,-1) ).x;

  float maxZ = max( max( texels.x, texels.y ), max( texels.z, texels.w ) );

  vec3 extra;
  // if we are reducing an odd-width texture then fetch the edge texels
  if ( ( (LastMipSize.x &amp; 1) != 0 ) &amp;&amp; ( int(gl_FragCoord.x) == LastMipSize.x-3 ) ) {
    // if both edges are odd, fetch the top-left corner texel
    if ( ( (LastMipSize.y &amp; 1) != 0 ) &amp;&amp; ( int(gl_FragCoord.y) == LastMipSize.y-3 ) ) {
      extra.z = textureOffset( LastMip, TexCoord, ivec2( 1, 1) ).x;
      maxZ = max( maxZ, extra.z );
    }
    extra.x = textureOffset( LastMip, TexCoord, ivec2( 1, 0) ).x;
    extra.y = textureOffset( LastMip, TexCoord, ivec2( 1,-1) ).x;
    maxZ = max( maxZ, max( extra.x, extra.y ) );
  } else
  // if we are reducing an odd-height texture then fetch the edge texels
  if ( ( (LastMipSize.y &amp; 1) != 0 ) &amp;&amp; ( int(gl_FragCoord.y) == LastMipSize.y-3 ) ) {
    extra.x = textureOffset( LastMip, TexCoord, ivec2( 0, 1) ).x;
    extra.y = textureOffset( LastMip, TexCoord, ivec2(-1, 1) ).x;
    maxZ = max( maxZ, max( extra.x, extra.y ) );
  }

  gl_FragDepth = maxZ;
}</pre>
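<p>For readers who want to check the reduction logic without a GPU, the following is a rough CPU reference of a conservative &#8220;store maximum depth&#8221; reduction. It is only a sketch under my own assumptions: the <code>DepthLevel</code> type and <code>reduceMax</code> function are hypothetical, and the edge handling is expressed by widening the footprint of the last row and column rather than by the <code>gl_FragCoord</code> tests used in the shader above.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for one mip level of the Hi-Z map.
struct DepthLevel {
    int width;
    int height;
    std::vector<float> depth; // row-major, width * height values
};

// Build the next mip level by storing the maximum depth of each footprint.
// The last destination row/column also absorbs the extra source row/column
// that the "floor" size convention would otherwise drop for odd sizes.
DepthLevel reduceMax(const DepthLevel& src) {
    assert(src.width > 0 && src.height > 0);
    assert(src.depth.size() == static_cast<std::size_t>(src.width) * src.height);

    DepthLevel dst;
    dst.width = std::max(src.width / 2, 1);
    dst.height = std::max(src.height / 2, 1);
    dst.depth.resize(static_cast<std::size_t>(dst.width) * dst.height);

    for (int y = 0; y < dst.height; ++y) {
        for (int x = 0; x < dst.width; ++x) {
            const int x0 = 2 * x;
            const int y0 = 2 * y;
            // Regular texels cover 2x2 source texels; edge texels extend
            // to the last source texel (a 3-wide footprint for odd sizes).
            const int x1 = (x == dst.width - 1) ? src.width - 1 : x0 + 1;
            const int y1 = (y == dst.height - 1) ? src.height - 1 : y0 + 1;

            float maxZ = 0.0f; // depth values are assumed to be in [0,1]
            for (int sy = y0; sy <= y1; ++sy) {
                for (int sx = x0; sx <= x1; ++sx) {
                    maxZ = std::max(maxZ, src.depth[sy * src.width + sx]);
                }
            }
            dst.depth[y * dst.width + x] = maxZ;
        }
    }
    return dst;
}
```

<p>Reducing a 5x2 level, for example, produces a 2x1 level whose second texel covers source columns 2 to 4, so a far depth value stored in the last column is not lost.</p>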
<p>I experimented with texture gather lookups to reduce the number of texture fetches from 4-to-7 fetches per fragment down to 1-to-3 fetches per fragment (see the extension <a title="GL_ARB_texture_gather" href="http://www.opengl.org/registry/specs/ARB/texture_gather.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_gather.txt?referer=');">ARB_texture_gather</a>). However, it seems that texture gather works only if the image is linearly sampled, and to avoid the additional burden of switching filtering state during rendering I stuck to simple texture lookups, as using texture gather lookups did not show any visible effect on the construction time of the Hi-Z map.</p>
<div class="wp-caption aligncenter" style="width: 602px"><img title="Various mip levels of the Hi-Z map" src="http://www.rastergrid.com/blog/wp-content/uploads/2010/10/depth-lods.png" alt="Various mip levels of the Hi-Z map" width="592" height="144" /><p class="wp-caption-text">Various mip levels of the Hi-Z map. The Hi-Z map size is 1024x768 and the displayed mip levels are: level 4 (left), level 5 (middle) and level 6 (right).</p></div>
<p>For debugging and demonstration purposes the Mountains demo has a built-in function to display the content of the various mip levels of the Hi-Z map. This is available by pressing the F4 key while Hi-Z map based occlusion culling is enabled. The + and &#8211; keys can be used to switch between the mip levels.</p>
<p>In order to better visualize the depth information in the depth buffer I converted the non-linear depth values stored in the depth texture into linear depth values as presented in <a title="[GeeXLab] How to Visualize the Depth Buffer in GLSL" href="http://www.geeks3d.com/20091216/geexlab-how-to-visualize-the-depth-buffer-in-glsl/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.geeks3d.com/20091216/geexlab-how-to-visualize-the-depth-buffer-in-glsl/?referer=');">[GeeXLab] How to Visualize the Depth Buffer in GLSL</a>.</p>
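<p>As an illustration of what such a conversion does, here is a small sketch of the commonly used depth linearization for a standard OpenGL perspective projection. It is not necessarily the exact formula used by the demo or by the linked article; the function name and the <code>zNear</code>/<code>zFar</code> parameters are my own assumptions.</p>

```cpp
#include <cassert>

// Hypothetical helper: convert a depth buffer value in [0,1] into an
// eye-space distance in [zNear, zFar], assuming a standard OpenGL
// perspective projection and the default depth range.
float linearizeDepth(float depth, float zNear, float zFar) {
    assert(zNear > 0.0f && zFar > zNear);
    const float ndc = 2.0f * depth - 1.0f; // back to the [-1,1] NDC range
    return (2.0f * zNear * zFar) / (zFar + zNear - ndc * (zFar - zNear));
}
```

<p>With a near plane at 1 and a far plane at 100, a stored depth of 0.5 corresponds to a distance of just under 2, which shows how strongly the stored values are concentrated near the camera and why the raw depth image looks almost uniformly white.</p>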
<h3>Culling with the Hi-Z map</h3>
<p>Once we have constructed the Hi-Z map, we can perform the actual occlusion culling by fetching the 2&#215;2 texel neighborhood corresponding to the screen area occupied by the bounding volume of the object whose visibility has to be determined. In the demo I used bounding boxes but any other bounding volume can be used (e.g. a bounding sphere is usually accurate enough for this technique).</p>
<p>First, we have to calculate the clip space bounding rectangle of the bounding volume. In the bounding box case this is done by transforming the bounding box vertices into clip space and then calculating the minimum and maximum X and Y coordinates. This bounding rectangle will be used for two things: it defines the texture coordinates that we&#8217;ll have to use for the Hi-Z map lookup and it helps to determine the appropriate LOD for the texture lookup.</p>
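<p>The bounding rectangle computation can be sketched on the CPU as follows. This is only an illustration with hypothetical types and names (<code>Vec3</code>, <code>Vec4</code>, <code>Mat4</code>, <code>projectBoundingBox</code>). It assumes a column-major model-view-projection matrix, that all box corners are in front of the camera, and that the rectangle is wanted in the [0,1] range so it can be used directly as texture coordinates.</p>

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <limits>

// Hypothetical helper types for this sketch (not taken from the demo source).
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };
using Mat4 = std::array<float, 16>; // column-major storage, as OpenGL expects

// Multiply a column-major 4x4 matrix with a column vector.
Vec4 transform(const Mat4& m, const Vec4& v) {
    return Vec4{
        m[0] * v.x + m[4] * v.y + m[8]  * v.z + m[12] * v.w,
        m[1] * v.x + m[5] * v.y + m[9]  * v.z + m[13] * v.w,
        m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14] * v.w,
        m[3] * v.x + m[7] * v.y + m[11] * v.z + m[15] * v.w};
}

struct ScreenBounds {
    float minX, minY, maxX, maxY; // bounding rectangle in [0,1] texture space
    float minDepth;               // depth of the front-most corner in [0,1]
};

// Assumption: every corner lies in front of the camera (clip-space w > 0).
ScreenBounds projectBoundingBox(const Mat4& mvp, const Vec3& boxMin, const Vec3& boxMax) {
    const float big = std::numeric_limits<float>::max();
    ScreenBounds b{big, big, -big, -big, big};
    for (int i = 0; i < 8; ++i) {
        // Select one of the eight corners from the bits of i.
        const Vec4 corner{(i & 1) ? boxMax.x : boxMin.x,
                          (i & 2) ? boxMax.y : boxMin.y,
                          (i & 4) ? boxMax.z : boxMin.z,
                          1.0f};
        const Vec4 clip = transform(mvp, corner);
        assert(clip.w > 0.0f);
        // Perspective divide, then map from [-1,1] to [0,1].
        const float x = 0.5f * clip.x / clip.w + 0.5f;
        const float y = 0.5f * clip.y / clip.w + 0.5f;
        const float z = 0.5f * clip.z / clip.w + 0.5f;
        b.minX = std::min(b.minX, x);
        b.minY = std::min(b.minY, y);
        b.maxX = std::max(b.maxX, x);
        b.maxY = std::max(b.maxY, y);
        b.minDepth = std::min(b.minDepth, z);
    }
    return b;
}
```

<p>Objects that cross the near plane violate the stated assumption and would need separate handling, for example by always treating them as visible.</p>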
<p>In order to determine the texture LOD that we&#8217;ll have to fetch we have to calculate the screen space size of the bounding square corresponding to the clip space bounding rectangle determined previously. This can be simply done by calculating the width and height of the bounding rectangle in clip space and then transforming this into screen space:</p>
<pre class="brush:c">float ViewSizeX = (BoundingRect[1].x-BoundingRect[0].x) * Transform.Viewport.y;
float ViewSizeY = (BoundingRect[1].y-BoundingRect[0].y) * Transform.Viewport.z;</pre>
<p>After this, the texture LOD can be simply calculated using the following formula:</p>
<pre class="brush:c">float LOD = ceil( log2( max( ViewSizeX, ViewSizeY ) / 2.0 ) );</pre>
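<p>As a quick sanity check of the formula, here is a CPU-side sketch of the same calculation. The function name is hypothetical, and it assumes the bounding rectangle size is given in normalized [0,1] units and the viewport size in pixels. The clamp to level 0 for very small rectangles is my own addition, since the formula yields negative values there.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: pick the Hi-Z mip level at which a 2x2 texel
// footprint is at least as large as the object's screen-space rectangle.
int hiZLod(float rectWidth, float rectHeight, int viewportWidth, int viewportHeight) {
    assert(rectWidth >= 0.0f && rectHeight >= 0.0f);
    assert(viewportWidth > 0 && viewportHeight > 0);
    const float viewSizeX = rectWidth * static_cast<float>(viewportWidth);
    const float viewSizeY = rectHeight * static_cast<float>(viewportHeight);
    const float size = std::max(viewSizeX, viewSizeY);
    // Assumption: rectangles of two pixels or less simply use level 0.
    if (size <= 2.0f) {
        return 0;
    }
    return static_cast<int>(std::ceil(std::log2(size / 2.0f)));
}
```

<p>For example, a rectangle that is 100 pixels wide on a 1024x768 viewport maps to level 6, where each texel covers 64 pixels of the base level and the 2x2 footprint therefore spans 128 pixels.</p>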
<p>Finally, as we have the texture coordinates (the vertices of the clip space bounding rectangle) and the texture LOD, we simply have to make four texture lookups into the Hi-Z map using these parameters, calculate the maximum of the four depth values returned and compare it to the depth value corresponding to the object (this is the object&#8217;s front-most point&#8217;s depth value that comes also from the clip space coordinates of the bounding box). If the object depth is greater than the reference depth the object is occluded and so it is culled by the geometry shader as usual.</p>
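<p>The final comparison can be illustrated with the following CPU-side sketch. It mimics the four lookups with a nearest-texel fetch and clamp-to-edge addressing; the <code>HiZLevel</code> type and both functions are hypothetical stand-ins for the texture lookups done in the geometry shader, and a &#8220;store maximum depth&#8221; Hi-Z map is assumed.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the selected mip level of the Hi-Z map.
struct HiZLevel {
    int width;
    int height;
    std::vector<float> depth; // row-major, width * height values
};

// Nearest-texel fetch with clamp-to-edge addressing, (u, v) in [0,1].
float fetchDepth(const HiZLevel& level, float u, float v) {
    assert(level.depth.size() == static_cast<std::size_t>(level.width) * level.height);
    const int x = std::min(std::max(static_cast<int>(u * level.width), 0), level.width - 1);
    const int y = std::min(std::max(static_cast<int>(v * level.height), 0), level.height - 1);
    return level.depth[y * level.width + x];
}

// Returns true if the object can be culled: its front-most depth lies
// behind the farthest occluder depth found at the rectangle's corners.
bool isOccluded(const HiZLevel& level,
                float minU, float minV, float maxU, float maxV,
                float objectDepth) {
    const float reference = std::max(
        std::max(fetchDepth(level, minU, minV), fetchDepth(level, maxU, minV)),
        std::max(fetchDepth(level, minU, maxV), fetchDepth(level, maxU, maxV)));
    return objectDepth > reference;
}
```

<p>Because the reference value is the maximum of the fetched depths, a single far (or empty) texel under the rectangle is enough to keep the object, which is what makes the test conservative.</p>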
<p>One may ask why we use a 2&#215;2 texel footprint for calculating the reference depth value instead of fetching the next mip level only once (as there we also get the maximum values of a 2&#215;2 texel footprint due to the Hi-Z map construction method). That&#8217;s what I asked myself at first as well, but I quickly figured out the reason (see the figure below).</p>
<div class="wp-caption aligncenter" style="width: 530px"><img class=" " title="Comparison of four texel fetches and one texel fetch for depth comparison" src="http://www.rastergrid.com/blog/wp-content/uploads/2010/10/fetch-modes.png" alt="Comparison of four texel fetches and one texel fetch for depth comparison" width="520" height="256" /><p class="wp-caption-text">Comparison of number of fetches used for occlusion culling. Both figures show the magnified screen coverage of a single Hi-Z map texel at mip level N, texel coverage for mip level N-1 is in cyan and texel coverage for mip level N-2 is in blue. The object is shown in red, and yellow indicates the fetched texels.</p></div>
<p>In case of four texels not just the determination of the texture LOD is much easier but also it better encompasses the actual object bounding rectangle. In case of one texture fetch the computation of texture LOD is more complicated and expensive but the main problem is that a larger LOD has to be fetched and it is not always the LOD determined in the case of four fetches plus one. In the most extreme situation (if the bounding rectangle is right at the middle of the screen) it is possible that we have to fetch the largest LOD. This does not result in any false culling but it severely degrades the effectiveness of the culling.</p>
<p>Of course, it is possible to use a more complex screen space bounding polygon as well as more fetches, but the gain in culling effectiveness would not be worth the additional burden and expensive operations.</p>
<h2>Conclusion</h2>
<p>We&#8217;ve seen how traditional hardware occlusion culling works by using occlusion queries. We also discussed that we sometimes need a better algorithm that does the occlusion culling for a large number of objects without CPU intervention.</p>
<p>The article also described a way to implement such an occlusion culling algorithm by using a hierarchical-Z map and geometry shaders. We&#8217;ve also managed to provide a reference implementation in the form of the demo called Mountains that can be downloaded with full source code in the <a title="OpenGL 4.0 - Mountains demo download" href="http://rastergrid.com/blog/downloads/mountains-demo/">downloads section</a>.</p>
<p>The algorithm performs very well in practice on current hardware. The Hi-Z map construction takes less than 0.2 milliseconds and the actual culling comes at almost no cost for even thousands of objects. For more detail about performance comparison between rendering with and without hierarchical-Z map based occlusion culling read the article about the <a title="OpenGL 4.0 - Mountains demo released" href="http://rastergrid.com/blog/2010/10/opengl-4-0-mountains-demo-released/">OpenGL 4.0 Mountains Demo</a>.</p>
<p>While the demo uses the technique only for culling instances of the same object, the technique can be easily extended to work for a heterogeneous set of objects, as the actual culling algorithm works on a per-object basis and is completely indifferent to the method used for rendering the actual geometry.</p>
<p>This technique can be thought of as the next step towards a completely GPU based visibility determination and scene management system.</p>
<p>Acknowledgements go to Jeremy Shopf, Joshua Barczak, Christopher Oat and Natalya Tatarchuk and their <a title="SIGGRAPH 2008 Course Notes about the March of the Froblins" href="http://developer.amd.com/documentation/presentations/legacy/Chapter03-SBOT-March_of_The_Froblins.pdf" target="_blank" onclick="pageTracker._trackPageview('/outgoing/developer.amd.com/documentation/presentations/legacy/Chapter03-SBOT-March_of_The_Froblins.pdf?referer=');">SIGGRAPH 2008 Course Notes</a> that inspired this work.</p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/10/hierarchical-z-map-based-occlusion-culling/feed/</wfw:commentRss>
		<slash:comments>26</slash:comments>
		</item>
		<item>
		<title>Efficient Gaussian blur with linear sampling</title>
		<link>http://rastergrid.com/blog/2010/09/efficient-gaussian-blur-with-linear-sampling/</link>
		<comments>http://rastergrid.com/blog/2010/09/efficient-gaussian-blur-with-linear-sampling/#comments</comments>
		<pubDate>Tue, 07 Sep 2010 20:48:16 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Samples]]></category>
		<category><![CDATA[bloom]]></category>
		<category><![CDATA[blur]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[depth-of-field]]></category>
		<category><![CDATA[filter]]></category>
		<category><![CDATA[fragment shader]]></category>
		<category><![CDATA[GLEW]]></category>
		<category><![CDATA[GLM]]></category>
		<category><![CDATA[GLSL]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[postprocessing]]></category>
		<category><![CDATA[SFML]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=299</guid>
		<description><![CDATA[Gaussian blur is an image space effect that is used to create a softly blurred version of the original image. This image then can be used by more sophisticated algorithms to produce effects like bloom, depth-of-field, heat haze or fuzzy glass. In this article I will present how to take advantage of the various properties]]></description>
			<content:encoded><![CDATA[
<div class="wp-caption alignleft" style="width: 160px"><br />
<img class=" " title="Gaussian blur" src="http://www.rastergrid.com/blog/wp-content/uploads/2010/09/gaussian_thumbnail.png" alt="Gaussian blur" width="150" height="150" /><p class="wp-caption-text">Gaussian blur</p></div>
<p>Gaussian blur is an image space effect that is used to create a softly blurred version of the original image. This image can then be used by more sophisticated algorithms to produce effects like bloom, depth-of-field, heat haze or fuzzy glass. In this article I will present how to take advantage of the various properties of the Gaussian filter to create an efficient implementation, as well as a technique that greatly improves the performance of a naive Gaussian blur implementation by using bilinear texture filtering to reduce the number of texture lookups. While the article focuses on the Gaussian blur filter, most of the principles presented are valid for most convolution filters used in real-time graphics.</p>
<p><span id="more-299"></span></p>
<p>Gaussian blur is a widely used technique in computer graphics, and many rendering techniques rely on it to produce convincing photorealistic effects, no matter whether we talk about an offline renderer or a game engine. Since the advent of configurable fragment processing, first through texture combiners and then through fragment shaders, Gaussian blur or some other blur filter has become almost a must for every rendering engine. While the basic convolution filter algorithm is rather expensive, there are a lot of neat techniques that can drastically reduce its computational cost, making it usable for real-time rendering even on pretty outdated hardware. This article is mostly a tutorial that tries to present most of the available optimization techniques. Some of them may be familiar to all of you, but linear sampling may still bring some surprise. Let&#8217;s not jump ahead, though, and start with the basics.</p>
<h2>Terminology</h2>
<p>To prevent any confusion, I&#8217;ll start the article by introducing some terms and concepts that I will use in the post.</p>
<p><strong>Convolution filter</strong> &#8211; An algorithm that combines the color values of a group of pixels.</p>
<p><strong>NxN-tap filter &#8211; </strong>A filter that uses a square shaped footprint of pixels with the square&#8217;s side length being N pixels.</p>
<p><strong>N-tap filter</strong> &#8211; A filter that uses an N-pixel footprint. Note that an N-tap filter does <em>not</em> necessarily have to sample N texels, as we will see that an N-tap filter can be implemented using fewer than N texel fetches.</p>
<p><strong>Filter kernel</strong> &#8211; A collection of relative coordinates and weights that are used to combine the pixel footprint of the filter.</p>
<p><strong>Discrete sampling</strong> &#8211; Texture sampling method where we fetch the data of exactly one texel (aka GL_NEAREST filtering).</p>
<p><strong>Linear sampling</strong> &#8211; Texture sampling method where we fetch a footprint of 2&#215;2 texels and apply a bilinear filter to acquire the final color information (aka GL_LINEAR filtering).</p>
<h2>Gaussian filter</h2>
<p>The image space Gaussian filter is an NxN-tap convolution filter that weights the pixels inside of its footprint based on the Gaussian function:</p>
<p style="text-align: center;"><img class=" aligncenter" title="Gaussian function 2D" src="http://www.rastergrid.com/blog/wp-content/uploads/2010/09/gaussian_function_2D.png" alt="Gaussian function 2D" width="190" height="41" /></p>
<p>The pixels of the filter footprint are weighted using the values obtained from the Gaussian function, thus providing a blur effect. The spatial representation of the Gaussian filter, sometimes referred to as the &#8220;bell surface&#8221;, demonstrates how much the individual pixels of the footprint contribute to the final pixel color.</p>
<div class="wp-caption aligncenter" style="width: 444px"><img title="Gaussian function graphical representation" src="http://www.rastergrid.com/blog/wp-content/uploads/2010/09/gaussian_graph.png" alt="Gaussian function graphical representation" width="434" height="351" /><p class="wp-caption-text">The graphical representation of the 2-dimensional Gaussian function</p></div>
<p>Based on this, some of you may already say &#8220;aha, so we simply need to do NxN texture fetches and weight them together and voilà&#8221;. While this is true, it is not as efficient as it looks. For a 1024&#215;1024 image, a fragment shader that implements a 33&#215;33-tap Gaussian filter this way would need an enormous 1024*1024*33*33 ≈ 1.14 billion texture fetches to blur the whole image.</p>
<p>In order to get to a more efficient algorithm we have to look at some of the nice properties of the Gaussian function:</p>
<ul>
<li>The 2-dimensional Gaussian function can be calculated by multiplying two 1-dimensional Gaussian functions:</li>
</ul>
<p style="text-align: center;"><img class="aligncenter" title="Gaussian function 1D" src="http://www.rastergrid.com/blog/wp-content/uploads/2010/09/gaussian_function_1D.png" alt="Gaussian function 1D" width="190" height="41" /></p>
<ul>
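<li>Convolving two Gaussian functions with a standard deviation of σ yields a Gaussian function with a standard deviation of σ√2 (the variances add up), so applying a narrow Gaussian filter several times is equivalent to applying a single wider one.</li>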
</ul>
<p>Both of these properties of the Gaussian function give us room for heavy optimization.</p>
<p>Based on the first property, we can separate our 2-dimensional Gaussian function into two 1-dimensional ones. For the fragment shader implementation this means that we can separate our Gaussian filter into a horizontal and a vertical blur filter and still get accurate results after rendering. This results in two N-tap filters and an additional rendering pass needed for the second filter. Getting back to our example, applying two 33-tap Gaussian filters to a 1024&#215;1024 image takes 1024*1024*33*2 ≈ 69 million texture fetches. That is already more than an order of magnitude less than what the original approach required.</p>
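<p>As a quick sanity check of the first property, here is a minimal C++ sketch that compares the 2-dimensional Gaussian with the product of two 1-dimensional ones over a square footprint. It assumes the standard definitions of the Gaussian function with the same σ along both axes; the function names are hypothetical and not taken from the demo source.</p>
<pre class="brush:cpp">#include &lt;cassert&gt;
#include &lt;cmath&gt;

// 1D Gaussian with standard deviation sigma (standard definition, assumed here).
double gaussian1D(double x, double sigma)
{
    assert(sigma &gt; 0.0);
    const double pi = std::acos(-1.0);
    return std::exp(-(x * x) / (2.0 * sigma * sigma)) / std::sqrt(2.0 * pi * sigma * sigma);
}

// 2D Gaussian with the same sigma along both axes.
double gaussian2D(double x, double y, double sigma)
{
    assert(sigma &gt; 0.0);
    const double pi = std::acos(-1.0);
    return std::exp(-(x * x + y * y) / (2.0 * sigma * sigma)) / (2.0 * pi * sigma * sigma);
}

// Largest |G(x, y) - G(x) * G(y)| over a (2 * radius + 1)^2 footprint.
double maxSeparabilityError(int radius, double sigma)
{
    double maxError = 0.0;
    for (int y = -radius; y &lt;= radius; ++y) {
        for (int x = -radius; x &lt;= radius; ++x) {
            const double product = gaussian1D(x, sigma) * gaussian1D(y, sigma);
            maxError = std::fmax(maxError, std::fabs(gaussian2D(x, y, sigma) - product));
        }
    }
    return maxError;
}</pre>
<p>Calling <code>maxSeparabilityError(16, 4.0)</code> covers a 33&#215;33 footprint; the returned value should differ from zero only by floating point rounding.</p>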
<p>Using the second property of the Gaussian function, we can replace our 33&#215;33-tap filter with three 9&#215;9-tap filters applied one after the other (9+8=17, 17+16=33). Back to our example, for the 1024&#215;1024 image this results in 1024*1024*9*9*3 ≈ 255 million texture fetches. As we can see, this approach also saves a large share of the texture fetches.</p>
<p>Of course, the two techniques can also be combined. That means we both separate our filter into a vertical and a horizontal filter and decompose our 33-tap filter into three 9-tap filters. This gets us to the almost optimal number of 1024*1024*9*3*2 ≈ 56 million texture fetches.</p>
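<p>The fetch counts quoted above are simple products, which the following C++ sketch reproduces. The helper name and the constants are hypothetical and only mirror the arithmetic used in the text for the 1024&#215;1024 example.</p>
<pre class="brush:cpp">#include &lt;cassert&gt;
#include &lt;cstdint&gt;

// Texture fetches needed to filter a width x height image:
// fetches per fragment in one pass, times the number of full-screen passes.
std::uint64_t textureFetches(std::uint64_t width, std::uint64_t height,
                             std::uint64_t fetchesPerFragment, std::uint64_t passes)
{
    assert(fetchesPerFragment &gt; 0);
    return width * height * fetchesPerFragment * passes;
}

// The four strategies discussed in the text, for a 1024x1024 image.
const std::uint64_t kNaive2D    = textureFetches(1024, 1024, 33 * 33, 1); // one 33x33-tap pass
const std::uint64_t kSeparable  = textureFetches(1024, 1024, 33, 2);      // 33-tap horizontal + vertical
const std::uint64_t kRepeated2D = textureFetches(1024, 1024, 9 * 9, 3);   // three 9x9-tap passes
const std::uint64_t kCombined   = textureFetches(1024, 1024, 9, 3 * 2);   // three 9-tap H+V pairs</pre>
<p>The exact values are 1,141,899,264, 69,206,016, 254,803,968 and 56,623,104 fetches respectively.</p>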
<h2>Gaussian kernel weights</h2>
<p>We&#8217;ve seen how to implement an efficient Gaussian blur filter for our application, at least in theory, but we haven&#8217;t talked about how to calculate the weights for each pixel we combine using the filter in order to get proper results. The most straightforward way to determine the kernel weights is to simply evaluate the Gaussian function for various distribution and coordinate values. While this is the most generic solution, there is a simpler way to get some weights by using the binomial coefficients. Why can we do that? Because the Gaussian function is the probability density function of the normal distribution, and the normal distribution&#8217;s discrete counterpart is the binomial distribution, which uses the binomial coefficients for weighting its samples.</p>
<div class="wp-caption aligncenter" style="width: 630px"><img title="Binomial coefficients" src="http://www.rastergrid.com/blog/wp-content/uploads/2010/09/binomial_coeff2.png" alt="Binomial coefficients" width="620" height="300" /><p class="wp-caption-text">The Pascal triangle showcasing the binomial coefficients that can be used to calculate the kernel weights (each element in the succeeding rows is the sum of its &quot;parents&quot;).</p></div>
<p>For implementing our 9-tap horizontal and vertical Gaussian filter we will use the last row of the Pascal triangle illustrated above (the row with index 12) to calculate our weights. One may ask why we don&#8217;t use the row with index 8, as it has exactly 9 coefficients. This is a justifiable question, but it is rather easy to answer. With a typical 32 bit color buffer the outermost coefficients of that row don&#8217;t have any effect on the final image, while the second outermost ones have little to no effect. We would like to minimize the number of texture fetches but provide the highest quality blur possible with our 9-tap filter. Obviously, if very high precision results are a must and a higher precision color buffer is available, preferably a floating point one, using the row with index 8 is better. But let&#8217;s stick to our original idea and use the last row&#8230;</p>
<p>Having the necessary coefficients, it is very easy to calculate the weights that will be used to combine our pixels. We just have to divide each coefficient by the sum of the coefficients, which is 4096 in this case. Of course, to compensate for the elimination of the four outermost coefficients (1 and 12 on each side), we have to reduce the sum to 4096 - 2*(1+12) = 4070, otherwise the image may get darker if we apply the filter several times.</p>
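<p>The weight calculation described above can be sketched in a few lines of C++. This is only an illustration: the function names are hypothetical, and the code assumes a row with an even index (so that there is a single center coefficient) from which the same number of coefficients is dropped on both ends.</p>
<pre class="brush:cpp">#include &lt;cassert&gt;
#include &lt;cstdint&gt;
#include &lt;vector&gt;

// Row n of the Pascal triangle: C(n, 0) ... C(n, n).
std::vector&lt;std::uint64_t&gt; pascalRow(unsigned n)
{
    std::vector&lt;std::uint64_t&gt; row(n + 1, 1);
    for (unsigned k = 1; k &lt;= n; ++k)
        row[k] = row[k - 1] * (n - k + 1) / k;  // exact: C(n,k) = C(n,k-1) * (n-k+1) / k
    return row;
}

// Normalized one-sided weights (center first) after dropping `trim`
// coefficients from each end of row n.
std::vector&lt;double&gt; trimmedBinomialWeights(unsigned n, unsigned trim)
{
    assert(n % 2 == 0);     // an even row index gives a single center coefficient
    assert(trim &lt;= n / 2);  // at least the center coefficient must remain

    const std::vector&lt;std::uint64_t&gt; row = pascalRow(n);

    // Sum of the coefficients that are kept (4070 for n = 12, trim = 2).
    std::uint64_t sum = 0;
    for (unsigned k = trim; k + trim &lt;= n; ++k)
        sum += row[k];

    // The kernel is symmetric, so only the center and one side are stored.
    std::vector&lt;double&gt; weights;
    for (unsigned k = n / 2; k + trim &lt;= n; ++k)
        weights.push_back(static_cast&lt;double&gt;(row[k]) / static_cast&lt;double&gt;(sum));
    return weights;
}</pre>
<p>With these assumptions, <code>trimmedBinomialWeights(12, 2)</code> divides the coefficients 924, 792, 495, 220 and 66 by 4070, which gives the five weights used by the 9-tap shader in this article.</p>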
<p>Now that we have our weights, it is very straightforward to implement our fragment shaders. Let&#8217;s see what the vertical filter shader looks like in GLSL:</p>
<pre class="brush:cpp">#version 330 core

uniform sampler2D image;

out vec4 FragmentColor;

// Texel offsets from the center and the matching kernel weights (one side only).
uniform float offset[5] = float[]( 0.0, 1.0, 2.0, 3.0, 4.0 );
uniform float weight[5] = float[]( 0.2270270270, 0.1945945946, 0.1216216216,
                                   0.0540540541, 0.0162162162 );

void main(void)
{
    // Center texel; the image size (1024) is hardcoded.
    FragmentColor = texture( image, vec2(gl_FragCoord)/1024.0 ) * weight[0];
    // Symmetric taps above and below the center texel.
    for (int i=1; i&lt;5; i++) {
        FragmentColor +=
            texture( image, ( vec2(gl_FragCoord)+vec2(0.0, offset[i]) )/1024.0 )
                * weight[i];
        FragmentColor +=
            texture( image, ( vec2(gl_FragCoord)-vec2(0.0, offset[i]) )/1024.0 )
                * weight[i];
    }
}</pre>
<p>Obviously the horizontal filter is no different, except that the offset value is applied to the X component rather than to the Y component of the fragment coordinate. Note that we hardcoded the size of the image here, as we divide the window space coordinate by 1024. In a real life scenario one would replace that with a uniform or simply use texture rectangles, which don&#8217;t use normalized texture coordinates.</p>
<p>If you have to apply the filter several times in order to get a stronger blur effect, the only thing you have to do is ping-pong between two framebuffers and apply the shaders to the result of the previous step.</p>
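<p>The ping-pong pattern itself is independent of OpenGL, so here is a CPU-side C++ sketch of it on a single row of pixels. The two <code>std::vector</code> buffers merely stand in for the two framebuffers, clamp-to-edge addressing is assumed, and the function names are hypothetical; a real implementation would render a horizontal and a vertical pass into framebuffer objects instead.</p>
<pre class="brush:cpp">#include &lt;algorithm&gt;
#include &lt;cassert&gt;
#include &lt;utility&gt;
#include &lt;vector&gt;

// One 9-tap blur pass over a row of pixels: reads `src`, writes `dst`.
// Out-of-range taps are clamped to the edge pixels.
void blurPass(const std::vector&lt;double&gt;&amp; src, std::vector&lt;double&gt;&amp; dst)
{
    static const double weight[5] = { 0.2270270270, 0.1945945946, 0.1216216216,
                                      0.0540540541, 0.0162162162 };
    const long last = static_cast&lt;long&gt;(src.size()) - 1;
    dst.resize(src.size());
    for (long x = 0; x &lt;= last; ++x) {
        double sum = src[x] * weight[0];
        for (long i = 1; i &lt; 5; ++i) {
            sum += src[std::min(x + i, last)] * weight[i];
            sum += src[std::max(x - i, 0L)] * weight[i];
        }
        dst[x] = sum;
    }
}

// Applies the pass `passes` times, ping-ponging between two buffers.
std::vector&lt;double&gt; blurRepeated(std::vector&lt;double&gt; image, int passes)
{
    assert(passes &gt;= 0);
    std::vector&lt;double&gt; scratch(image.size());
    for (int p = 0; p &lt; passes; ++p) {
        blurPass(image, scratch);   // read the current buffer, write the other one
        std::swap(image, scratch);  // the result becomes the input of the next pass
    }
    return image;
}</pre>
<p>Since the weights add up to one, blurring a constant row any number of times should leave it unchanged, which is a handy check that no brightness is lost between passes.</p>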
<div class="wp-caption aligncenter" style="width: 610px"><a href="http://www.rastergrid.com/blog/wp-content/uploads/2010/09/gaussian1.png" onclick="pageTracker._trackPageview('/outgoing/www.rastergrid.com/blog/wp-content/uploads/2010/09/gaussian1.png?referer=');"><img class=" " title="Gaussian blur effect" src="http://www.rastergrid.com/blog/wp-content/uploads/2010/09/gaussian1_thumbnail.png" alt="Gaussian blur effect" width="600" height="200" /></a><p class="wp-caption-text">9-tap Gaussian blur filter applied to an image of size 1024x1024: no filter applied (left), applied once (middle), applied nine times (right). Click to view the full-sized image in order to better see the difference.</p></div>
<h2>Linear sampling</h2>
<p>So far, we have seen how to implement a separable Gaussian filter using two rendering passes in order to get a 9-tap Gaussian blur. We&#8217;ve also seen that we can run this filter three times over a 1024&#215;1024 image in order to get a 33-tap Gaussian blur using only 56 million texture fetches. While this is already quite efficient, it does not really exploit any capability specific to the GPU, as this form of the algorithm would work almost unmodified on a CPU as well.</p>
<p>Now we will see that we can take advantage of the fixed function hardware available on the GPU to reduce the number of required texture fetches even further. To get to this optimization, let&#8217;s revisit one of the assumptions that we made at the beginning of the article:</p>
<p>So far, we assumed that in order to get information about a single pixel we have to make a texture fetch, which means that for 9 pixels we need 9 texture fetches. While this is true for a CPU implementation, it is not necessarily true for a GPU implementation. This is because on the GPU we have bilinear texture filtering at our disposal, which comes at practically no cost. That means that if we don&#8217;t sample our texture at texel center positions, a single fetch gives us information about multiple pixels. As we already use the separability property of the Gaussian function, we are actually working in 1D, so for us the bilinear filter provides information about two pixels. How much each texel contributes to the final color value depends on the coordinate that we use.</p>
<p>By properly adjusting the texture coordinate offsets we can get the accurate information of two texels or pixels using a single texture fetch. That means that for implementing a 9-tap horizontal/vertical Gaussian filter we need only 5 texture fetches. In general, an N-tap filter needs about N/2 texture fetches.</p>
<p>What does this mean for the weight values previously used for the discrete sampled Gaussian filter? It means that each time we use a single texture fetch to get information about two texels, we have to weight the retrieved color value by the sum of the weights corresponding to the two texels. Now that we know our weights, we just have to calculate the texture coordinate offsets properly.</p>
<p>For the texture coordinates, we could simply use the middle coordinate between the two texel centers. While this is a good approximation, we won&#8217;t accept it, as we can calculate much better coordinates that give us exactly the same values as discrete sampling.</p>
<p>When merging two texels this way, we have to choose the coordinate so that its distance from the center of texel #1 equals the weight of texel #2 divided by the sum of the two weights. Likewise, its distance from the center of texel #2 equals the weight of texel #1 divided by the sum of the two weights.</p>
<p>As a result, we get the following formulas to determine the weights and offsets for our linear sampled Gaussian blur filter:</p>
<p style="text-align: center;"><img class="aligncenter" title="Weight and offset calculation for linear sampling" src="http://www.rastergrid.com/blog/wp-content/uploads/2010/09/equation.png" alt="Weight and offset calculation for linear sampling" width="597" height="116" /></p>
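<p>To make the rule concrete, the following C++ sketch merges neighbouring discrete taps into single linear taps: the merged weight is the sum of the two weights, and the merged offset is the average of the two offsets weighted by their weights, which is the same as the distance rule described above. The <code>Kernel</code> struct and the function names are hypothetical, and the code assumes a center tap followed by a whole number of tap pairs.</p>
<pre class="brush:cpp">#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// One side of a symmetric kernel, center tap first.
struct Kernel {
    std::vector&lt;double&gt; offset;
    std::vector&lt;double&gt; weight;
};

// Merges discrete taps (1,2), (3,4), ... into single linear taps; tap 0 is kept as is.
Kernel toLinearSampling(const Kernel&amp; discrete)
{
    assert(discrete.offset.size() == discrete.weight.size());
    assert(discrete.offset.size() % 2 == 1);  // center tap + whole pairs

    Kernel linear;
    linear.offset.push_back(discrete.offset[0]);
    linear.weight.push_back(discrete.weight[0]);
    for (std::size_t i = 1; i + 1 &lt; discrete.offset.size(); i += 2) {
        const double w = discrete.weight[i] + discrete.weight[i + 1];
        linear.weight.push_back(w);
        // Weighted average of the two texel offsets.
        linear.offset.push_back((discrete.offset[i] * discrete.weight[i] +
                                 discrete.offset[i + 1] * discrete.weight[i + 1]) / w);
    }
    return linear;
}

// The discrete 9-tap kernel used earlier in this article.
Kernel discreteNineTap()
{
    return Kernel{ { 0.0, 1.0, 2.0, 3.0, 4.0 },
                   { 0.2270270270, 0.1945945946, 0.1216216216,
                     0.0540540541, 0.0162162162 } };
}</pre>
<p>Applied to the discrete 9-tap kernel, <code>toLinearSampling(discreteNineTap())</code> yields offsets of roughly 0, 1.3846 and 3.2308 with weights of roughly 0.2270, 0.3162 and 0.0703.</p>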
<p>By using this information we just have to replace our uniform constants and decrease the number of iterations in our vertical filter shader and we get the following:</p>
<pre class="brush:cpp">#version 330 core

uniform sampler2D image;

out vec4 FragmentColor;

// Merged offsets and weights: every tap except the center covers two texels.
uniform float offset[3] = float[]( 0.0, 1.3846153846, 3.2307692308 );
uniform float weight[3] = float[]( 0.2270270270, 0.3162162162, 0.0702702703 );

void main(void)
{
    // Center texel; the image size (1024) is hardcoded.
    FragmentColor = texture( image, vec2(gl_FragCoord)/1024.0 ) * weight[0];
    // Each fetch lands between two texel centers and relies on GL_LINEAR filtering.
    for (int i=1; i&lt;3; i++) {
        FragmentColor +=
            texture( image, ( vec2(gl_FragCoord)+vec2(0.0, offset[i]) )/1024.0 )
                * weight[i];
        FragmentColor +=
            texture( image, ( vec2(gl_FragCoord)-vec2(0.0, offset[i]) )/1024.0 )
                * weight[i];
    }
}</pre>
<p>This simplification of the algorithm is mathematically correct, and if we don&#8217;t consider possible rounding errors resulting from the hardware implementation of the bilinear filter, we should get exactly the same result with our linear sampling shader as with the discrete sampling one.</p>
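<p>This equivalence is easy to check on the CPU. The C++ sketch below emulates a linearly filtered 1D texture with clamp-to-edge addressing and compares the 5-weight discrete kernel with the 3-weight linear kernel on an arbitrary row of values. All names are hypothetical and the fetch function is only a simplified model of GL_LINEAR filtering, not what the hardware does bit for bit.</p>
<pre class="brush:cpp">#include &lt;algorithm&gt;
#include &lt;cassert&gt;
#include &lt;cmath&gt;
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Linearly filtered fetch from a 1D "texture"; coord is measured in texels,
// with integer values falling exactly on texel centers.
double fetchLinear(const std::vector&lt;double&gt;&amp; texels, double coord)
{
    assert(!texels.empty());
    const double clamped = std::min(std::max(coord, 0.0),
                                    static_cast&lt;double&gt;(texels.size() - 1));
    const std::size_t i0 = static_cast&lt;std::size_t&gt;(std::floor(clamped));
    const std::size_t i1 = std::min(i0 + 1, texels.size() - 1);
    const double t = clamped - static_cast&lt;double&gt;(i0);
    return texels[i0] * (1.0 - t) + texels[i1] * t;
}

// Symmetric blur around `center` using one-sided offsets and weights.
double blurAt(const std::vector&lt;double&gt;&amp; texels, std::size_t center,
              const std::vector&lt;double&gt;&amp; offset, const std::vector&lt;double&gt;&amp; weight)
{
    const double c = static_cast&lt;double&gt;(center);
    double result = fetchLinear(texels, c) * weight[0];
    for (std::size_t i = 1; i &lt; offset.size(); ++i) {
        result += fetchLinear(texels, c + offset[i]) * weight[i];
        result += fetchLinear(texels, c - offset[i]) * weight[i];
    }
    return result;
}

// Largest difference between the discrete and the linear kernel over the whole row.
double maxDifference(const std::vector&lt;double&gt;&amp; texels)
{
    const std::vector&lt;double&gt; discreteOffset{ 0.0, 1.0, 2.0, 3.0, 4.0 };
    const std::vector&lt;double&gt; discreteWeight{ 0.2270270270, 0.1945945946, 0.1216216216,
                                              0.0540540541, 0.0162162162 };
    const std::vector&lt;double&gt; linearOffset{ 0.0, 1.3846153846, 3.2307692308 };
    const std::vector&lt;double&gt; linearWeight{ 0.2270270270, 0.3162162162, 0.0702702703 };

    double maxDiff = 0.0;
    for (std::size_t x = 0; x &lt; texels.size(); ++x) {
        const double d = blurAt(texels, x, discreteOffset, discreteWeight);
        const double l = blurAt(texels, x, linearOffset, linearWeight);
        maxDiff = std::fmax(maxDiff, std::fabs(d - l));
    }
    return maxDiff;
}</pre>
<p>For values in the 0 to 1 range the two kernels should agree up to the rounding of the ten-digit constants, so the difference stays far below what an 8 bit per channel color buffer can represent.</p>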
<div class="wp-caption aligncenter" style="width: 523px"><a href="http://www.rastergrid.com/blog/wp-content/uploads/2010/09/side2side.png" onclick="pageTracker._trackPageview('/outgoing/www.rastergrid.com/blog/wp-content/uploads/2010/09/side2side.png?referer=');"><img class=" " title="Side-to-side comparison of Gaussian blur with discrete and linear sampling" src="http://www.rastergrid.com/blog/wp-content/uploads/2010/09/side2side_thumbnail.png" alt="Side-to-side comparison of Gaussian blur with discrete and linear sampling" width="513" height="250" /></a><p class="wp-caption-text">9-tap Gaussian blur applied nine times with discrete sampling (left) and linear sampling (right). Click for the full resolution of the image. Note that there is no visible difference between the two techniques even after several passes.</p></div>
<p>While the implementation of linear sampling is pretty straightforward, it has a quite visible effect on the performance of the Gaussian blur filter. Taking into consideration that we managed to implement a 9-tap filter using just five texture fetches instead of nine, back to our example, blurring a 1024&#215;1024 image with a 33-tap filter takes only 1024*1024*5*3*2 ≈ 31 million texture fetches instead of the 56 million required by discrete sampling. This is a considerable difference, and in order to better show how much it matters I ran some experiments to measure the difference between the two techniques. The result speaks for itself:</p>
<div class="wp-caption aligncenter" style="width: 532px"><img title="Performance comparison of discrete and linear sampling" src="http://www.rastergrid.com/blog/wp-content/uploads/2010/09/comparison2.png" alt="Performance comparison of discrete and linear sampling" width="522" height="400" /><p class="wp-caption-text">Performance comparison of the 9-tap Gaussian blur filter with discrete and linear sampling on a Radeon HD5770. The vertical axis is the frames per second (higher is better) and the horizontal axis represents results with various number of blur steps (higher is blurrier).</p></div>
<p>As we can see, the Gaussian filter implemented with linear sampling is about 60% faster than the one implemented with discrete sampling, regardless of the number of blur steps applied to the image. This is roughly proportional to the number of texture fetches saved by using linear sampling.</p>
<h2>Conclusion</h2>
<p>We&#8217;ve seen that implementing an efficient Gaussian blur filter is quite straightforward and the result is a very fast real-time algorithm, especially using the linear sampling, that can be used as the basis of more advanced rendering techniques.</p>
<p>Even though we concentrated on Gaussian blur in this article, many of the discussed principles apply to most convolution filter types. Also, most of the theory, including linear sampling, applies when we need a blurred image of reduced size, as is usually the case for the bloom effect. The only thing that is really different for a reduced size blurred image is that our center pixel is also a &#8220;double-pixel&#8221;. This means that we have to use a row from our Pascal triangle that has an even number of coefficients, as we would like to linear sample the middle texels as well.</p>
<p>We&#8217;ve also had a brief insight into the computational complexity of the various techniques and how the filter can be efficiently implemented on the GPU.</p>
<p>The demo application used for the measurements performed to compare the discrete and linear sampling method can be downloaded here:</p>
<h3>Binary release</h3>
<p><strong>Platform:</strong> Windows<br />
<strong>Dependency:</strong> OpenGL 3.3 capable graphics driver<br />
<strong>Download link:</strong> <a href="http://www.rastergrid.com/blog/wp-content/uploads/2010/09/gaussian_win32.zip" onclick="pageTracker._trackPageview('/outgoing/www.rastergrid.com/blog/wp-content/uploads/2010/09/gaussian_win32.zip?referer=');">gaussian_win32.zip (2.96MB)</a></p>
<h3>Source code</h3>
<p><strong>Language:</strong> C++<br />
<strong>Platform:</strong> cross-platform<br />
<strong>Dependency:</strong> GLEW, SFML, GLM<br />
<strong>Download link:</strong> <a href="http://www.rastergrid.com/blog/wp-content/uploads/2010/09/gaussian_src.zip" onclick="pageTracker._trackPageview('/outgoing/www.rastergrid.com/blog/wp-content/uploads/2010/09/gaussian_src.zip?referer=');">gaussian_src.zip (5.37KB)</a></p>
<p>P.S.: Sorry for the high minimum requirements of the application, but I would really like to stick to strict OpenGL 3+ demos.</p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/09/efficient-gaussian-blur-with-linear-sampling/feed/</wfw:commentRss>
		<slash:comments>52</slash:comments>
		</item>
		<item>
		<title>An introduction to OpenGL 4.1</title>
		<link>http://rastergrid.com/blog/2010/08/an-introduction-to-opengl-4-1/</link>
		<comments>http://rastergrid.com/blog/2010/08/an-introduction-to-opengl-4-1/#comments</comments>
		<pubDate>Tue, 24 Aug 2010 19:32:51 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[binary shader]]></category>
		<category><![CDATA[callback]]></category>
		<category><![CDATA[fragment shader]]></category>
		<category><![CDATA[geometry shader]]></category>
		<category><![CDATA[GLSL]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[OpenGL ES]]></category>
		<category><![CDATA[stencil]]></category>
		<category><![CDATA[vertex shader]]></category>
		<category><![CDATA[viewport]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=290</guid>
		<description><![CDATA[The Khronos Group keeps the pace that they set themselves being able to deliver the latest specification of OpenGL less than half year after the revolutionary appearance of OpenGL 4. Abandoning the OpenGL 3.x line of the specification (at least for a while) the new update concentrates on Shader Model 5.0 class GPUs and extensions]]></description>
			<content:encoded><![CDATA[
<p>The Khronos Group keeps up the pace it set for itself, delivering the latest specification of OpenGL less than half a year after the revolutionary appearance of OpenGL 4. Abandoning the OpenGL 3.x line of the specification (at least for a while), the new update concentrates on Shader Model 5.0 class GPUs and on extensions heavily promoted by the community. Besides all this, the Khronos Group now openly moves towards convergence with OpenGL ES, making the desktop version of the specification downward compatible with its embedded brother. In this article I would like to present the features introduced with the latest revision of the specification.</p>
<p><span id="more-290"></span>At the time of the release of the OpenGL 4 specification I was able to quickly deliver a <a title="A brief preview of the new features introduced by OpenGL 3.3 and 4.0" href="http://rastergrid.com/blog/2010/03/a-brief-preview-of-the-new-features-introduced-by-opengl-3-3-and-4-0/">thorough presentation</a> of all the new features introduced by that revision of the specification. This time I am already quite late, however I hope that this article will still prove valuable for many of you, especially for those who haven&#8217;t had time recently to dig into the details of the new API version.</p>
<p>OpenGL 4.1 is not as revolutionary and feature-rich as its predecessor, however the latest revision was well received by the community as it brought core extensions to the API that the community had been waiting for a long time. The new revision of the specification was accompanied by a couple of other ARB extensions that have not yet been included into core. I will still talk about some of them, as they indicate a slight shift in the balance of influence of the various vendors and representatives inside the <a title="About the OpenGL ARB &quot;Architecture Review Board&quot;" href="http://www.opengl.org/about/arb/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/about/arb/?referer=');">Architecture Review Board (ARB)</a>.</p>
<h2>New features of OpenGL 4.1</h2>
<p>Let&#8217;s start with the presentation of the new features arriving with the OpenGL 4.1 specification, which primarily targets Shader Model 5.0 hardware. Here you will see a lot of harmonization features as well as community&#8217;s choice features that are squarely intended to increase OpenGL development efficiency and freedom.</p>
<h3><a title="GL_ARB_ES2_compatibility" href="http://www.opengl.org/registry/specs/ARB/ES2_compatibility.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/ES2_compatibility.txt?referer=');">ARB_ES2_compatibility</a></h3>
<p>For a long time there have been rumors about the Khronos Group preparing a convergence between desktop OpenGL and OpenGL ES. This extension of the core specification clearly makes the first step towards this goal by providing an all-in-one specification pack that makes the desktop version of the specification downward compatible with ES. The extension adds support for features of OpenGL ES 2.0 that are missing from OpenGL 3+. According to the extension specification, enabling these features will ease the process of porting applications from OpenGL ES 2.0 to OpenGL.</p>
<p>More precisely, <a title="GL_ARB_ES2_compatibility" href="http://www.opengl.org/registry/specs/ARB/ES2_compatibility.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/ES2_compatibility.txt?referer=');">GL_ARB_ES2_compatibility</a> not only exposes all the functions and tokens that weren&#8217;t present in the desktop version of the specification but also completes it with all the semantics that were specified only in the embedded version. To mention just a few of these:</p>
<ul>
<li>The vertex data formats are extended with 16.16 fixed-point values by exposing the GL_FIXED type identifier token.</li>
<li>The precision format used internally by shaders can be queried.</li>
<li>GLSL ES can be used for writing shaders for desktop GL.</li>
</ul>
<p>While having this extension under the hood does not mean that we can simply pick our last game made for e.g. Symbian and just drop it on our PC, this extension may prove to be of great value for GL ES developers migrating their software to desktop platforms.</p>
<h3><a title="GL_ARB_get_program_binary" href="http://www.opengl.org/registry/specs/ARB/get_program_binary.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/get_program_binary.txt?referer=');">ARB_get_program_binary</a></h3>
<p>This is one of the additions to the core specification most awaited by the developer community. This extension makes it possible to retrieve a binary representation of a compiled and linked program object, which can later be used to specify the program object directly, thus providing a caching mechanism that eliminates the need for compilation and linking the next time the shader has to be used. This also makes it possible to create an offline GLSL compiler using just the OpenGL API itself.</p>
<p>Still, it has to be mentioned that having this feature at hand does not necessarily mean that we can simply create our shader binaries offline and then distribute our software without the shader source itself, as the binary formats supported by a particular implementation heavily depend on the hardware vendor as well as on the driver version. This is due to the fact that the shader binary most probably consists of instructions generated specifically for the particular GPU-driver combo. The only way to relax this limitation would be to have some sort of cross-platform byte-code for shaders, but that would in fact defeat most of the benefits of the extension on its own. Additionally, this extension does not provide any binary formats but leaves this to vendor specific extensions. It only exposes a common infrastructure for acquiring and loading program binaries.</p>
<p>While this extension does not completely eliminate the need for shader source compilation, it can limit compilation and linking to installation time or to the first run, with the stored binaries being used later. It also opens up room for SDK tools that provide offline shader compilers with more aggressive optimizations at their disposal. Such tools are a real possibility, as the specification explicitly mentions that binaries generated by the GL at run-time should be interchangeable with those generated by offline SDK tools.</p>
<h3><a title="GL_ARB_separate_shader_objects" href="http://www.opengl.org/registry/specs/ARB/separate_shader_objects.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/separate_shader_objects.txt?referer=');">ARB_separate_shader_objects</a></h3>
<p>This is another extension that the community has requested on several forums. The feature has a longer history, as it is based on the existing and widely supported extension <a title="GL_EXT_separate_shader_objects" href="http://www.opengl.org/registry/specs/EXT/separate_shader_objects.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/separate_shader_objects.txt?referer=');">GL_EXT_separate_shader_objects</a> by NVIDIA. Those who are already familiar with the predecessor won&#8217;t find much new in the specification of the ARB version, but it is still a must-read for them as well. Even though there are few semantic differences between the two, their usage differs quite a lot, as the ARB version solves the design issues of its predecessor by introducing a new type of GL object that I will talk about in a moment.</p>
<p>In a nutshell, this extension provides a way to create program objects from any subset of shader stages and to bind several of them to the current rendering context together. Previously there was no way to bind multiple program objects to the context, as the program object was designed to be a container for all the shaders forming the rendering pipeline. This was a design decision made during the development of GLSL: before this extension, the varyings of subsequent shader stages were connected by name. As the names can only be matched up at link time, the shaders were tightly coupled, meaning that a change in the code of any shader stage required relinking the complete program object.</p>
<p>This proved to be very unpleasant for OpenGL developers as usually every rendering engine has its own set of vertex and fragment shaders (maybe accompanied with other shader types) that are used in various combinations. As an example, let&#8217;s take two vertex shaders: a simple MVP matrix based transformation shader and a more complex one that also supports skeletal animation. Also let&#8217;s take two fragment shaders: one for diffuse material and one for reflective material. We can have several types of objects: static with diffuse material, static with reflective material, animated with diffuse material and animated with reflective material.</p>
<p>In traditional GLSL the vertex and fragment shaders are bound together at link time rather than at the time they are bound to the context, as was the case with legacy shaders (<a title="GL_ARB_vertex_program" href="http://www.opengl.org/registry/specs/ARB/vertex_program.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/vertex_program.txt?referer=');">GL_ARB_vertex_program</a>, <a title="GL_ARB_fragment_program" href="http://www.opengl.org/registry/specs/ARB/fragment_program.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/fragment_program.txt?referer=');">GL_ARB_fragment_program</a> and others). This means that in order to be able to use any combination of vertex and fragment shaders (and maybe some geometry and tessellation shaders as well) we end up with two possible solutions, both with severe drawbacks:</p>
<p><strong><em>Link every combination of the shader objects</em></strong></p>
<p>While this sounds like a viable solution and is still used by most developers, it has several problems. First of all, it wastes resources, as we now have several copies of the same piece of code, and the number of combinations can be pretty high, especially if not just vertex and fragment shaders are in use. While this is already a considerable issue, the biggest problem arises when the application developer has to maintain an individual set of uniform locations for every program, as well as binding points for vertex attributes, draw buffers and possibly transform feedback buffers. While the <a title="GL_ARB_explicit_attrib_location" href="http://www.opengl.org/registry/specs/ARB/explicit_attrib_location.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/explicit_attrib_location.txt?referer=');">GL_ARB_explicit_attrib_location</a> extension already eliminates the need for maintaining binding points for vertex attributes, this solution is still simply unacceptable.</p>
<p><strong><em>Link the program objects on an on-demand basis</em></strong></p>
<p>With this alternative we link the shader objects only when they are actually needed. While this eliminates the need for a possibly huge number of program objects, it introduces a considerable run-time performance hit due to the additional relinking. Additionally, this solution proves to be even worse than the previous one, as the uniform locations are determined at link time, so it causes no less of a headache for the application developer.</p>
<p>This is the rationale behind this extension and why it is included in the core specification. The extension relaxes the tightly coupled behavior of GLSL and adopts a mix-and-match shader stage model, allowing multiple program objects to be bound at once, each to its own set of rendering pipeline stages, independently of the other stage bindings.</p>
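<p>To put rough numbers on the two models, here is a small sketch in C. The function names are made up for illustration, and it only counts objects for the vertex/fragment case used in the example above; it says nothing about actual driver cost.</p>

```c
#include <assert.h>
#include <limits.h>

/* Monolithic GLSL: one linked program object per vertex/fragment pairing. */
unsigned int monolithic_program_count(unsigned int vertex_shaders,
                                      unsigned int fragment_shaders)
{
    /* Guard against overflow of the product. */
    assert(fragment_shaders == 0 || vertex_shaders <= UINT_MAX / fragment_shaders);
    return vertex_shaders * fragment_shaders;
}

/* Separate shader objects: one single-stage program per shader, combined
 * freely at bind time. */
unsigned int separable_program_count(unsigned int vertex_shaders,
                                     unsigned int fragment_shaders)
{
    assert(vertex_shaders <= UINT_MAX - fragment_shaders);
    return vertex_shaders + fragment_shaders;
}
```

<p>For the two vertex and two fragment shaders of the example both models need four program objects, but with the separable model each shader is linked exactly once and owns a single set of uniform locations. With, say, 6 vertex and 10 fragment shaders the counts are 60 versus 16.</p>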
<p>Because program objects are no longer the topmost containers for the code currently used by the rendering pipeline, the ARB decided to introduce a new container object called a &#8220;program pipeline object&#8221; that can contain a set of program objects, each bound to its own set of shader stages. This is the main difference between the EXT and the ARB version of the extension. I think it was a good decision to introduce this new type of object and the associated semantics, as I always thought that the EXT version wasn&#8217;t really well designed; I saw it as kind of a hack to relax the limitations of GLSL. The program pipeline object idea is definitely superior, and I hope that GLSL does not have too many such annoying design issues hidden within.</p>
<h3><a title="GL_ARB_shader_precision" href="http://www.opengl.org/registry/specs/ARB/shader_precision.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shader_precision.txt?referer=');">ARB_shader_precision</a></h3>
<p>This extension is much more a clarification of the existing specification than a new feature. It states the precision requirements for GLSL implementations more strictly. According to the specification, the extension is meant to define more precisely the precision of arithmetic operations (addition, multiplication, etc.) and transcendentals (log, exp, pow, etc.), when <a title="NaN - Wikipedia" href="http://en.wikipedia.org/wiki/NaN" target="_blank" onclick="pageTracker._trackPageview('/outgoing/en.wikipedia.org/wiki/NaN?referer=');">NaN</a>s (not-a-number) and INFs (infinities) will be accepted and generated, and the denorm flushing behavior. The precision of the remaining operations, including trigonometric functions, is not addressed by the extension. For further details, please refer to the extension specification.</p>
<h3><a title="GL_ARB_vertex_attrib_64bit" href="http://www.opengl.org/registry/specs/ARB/vertex_attrib_64bit.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/vertex_attrib_64bit.txt?referer=');">ARB_vertex_attrib_64bit</a></h3>
<p>This extension trivially introduces 64-bit floating-point types into the list of supported vertex attribute component types. Nominally OpenGL did support this already from the very early stages of its history, however in practice only the latest generation of hardware does really accept vertex attributes in double precision floating-point type. While OpenGL 4 already introduced support for 64-bit floating-point values in GLSL and most of the shaders&#8217; environment, vertex attributes gained the 64-bit precision only with this new extension.</p>
<p>This new feature makes it possible to use high precision for positions and other attributes of our geometry. While this sounds pretty awesome, and it actually is, game developers and other real-time graphics users shouldn&#8217;t rush to switch to the new precision; it should be used only when the precision requirements of the application really demand it. Using 64-bit floating-point values for vertex attributes does not just double the memory consumption but also involves a serious performance hit due to bandwidth limitations, and vertex attributes of this type may count double against the implementation-dependent limit on the number of vertex shader attribute vectors.</p>
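<p>The memory side of this trade-off is simple arithmetic. The sketch below uses a hypothetical helper, attribute_array_bytes, and assumes tightly packed arrays with 4-byte GL_FLOAT and 8-byte GL_DOUBLE components.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Component sizes in bytes: GL_FLOAT is 32 bits wide, GL_DOUBLE is 64 bits. */
enum { FLOAT_COMPONENT_BYTES = 4, DOUBLE_COMPONENT_BYTES = 8 };

/* Bytes needed for one tightly packed vertex attribute array. */
size_t attribute_array_bytes(size_t vertex_count, size_t components,
                             size_t component_bytes)
{
    assert(components >= 1 && components <= 4);
    return vertex_count * components * component_bytes;
}
```

<p>For one million vertices with three-component positions this gives 12,000,000 bytes in single precision and 24,000,000 bytes in double precision, all of which has to travel through the same memory bandwidth.</p>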
<h3><a title="GL_ARB_viewport_array" href="http://www.opengl.org/registry/specs/ARB/viewport_array.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/viewport_array.txt?referer=');">ARB_viewport_array</a></h3>
<p>Previously, the configuration of the viewport, i.e. the transformation that generates window coordinates from the normalized device coordinates of the vertices, was a global setting that affected all draw commands. In order to draw a primitive into multiple viewports, the OpenGL viewport had to be changed between several draw calls. While this limitation wasn&#8217;t really an issue in the past, geometry shaders introduced the possibility to amplify geometry and produce multiple output primitives for each input primitive, which justifies the need for several separately configurable viewports. Why? Because even though one was able to render the output primitives into separate render targets, they still shared the same global viewport.</p>
<p>This extension enhances OpenGL by providing a mechanism to specify multiple viewports and by giving the geometry shader the ability to select the viewport on a per-primitive basis. This does not just mean that separate viewports can be used for separate render targets; it also makes it possible to use multiple viewports when rendering to the same render target.</p>
<p>Additionally, the introduction of a viewport array means that we&#8217;re gonna have a separate scissor rectangle for each viewport in the array as well. This can come in handy for deferred shading based renderers that often use the scissor rectangle to limit the number of pixels touched when rendering the effect of a light source. Having multiple scissors means that we have to change state less often, so batching is much less of an issue even with heavy scissor rectangle usage.</p>
<p>Finally, the new viewport specification commands accept floating point values thus providing additional flexibility to the application developer to define their very own pixel center conventions.</p>
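<p>To illustrate what a fractional viewport origin does, here is a minimal sketch of the viewport transformation in C. The Viewport struct and the two functions are hypothetical; the struct only mirrors the x, y, width and height values that a command such as glViewportIndexedf takes, and the depth range is left out.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct mirroring the x, y, width and height parameters of a
 * floating-point viewport. */
typedef struct {
    float x, y, width, height;
} Viewport;

/* Maps a normalized device x coordinate in [-1, 1] to a window x coordinate. */
float ndc_to_window_x(const Viewport *vp, float x_ndc)
{
    assert(vp != NULL);
    return vp->x + (x_ndc + 1.0f) * 0.5f * vp->width;
}

/* Maps a normalized device y coordinate in [-1, 1] to a window y coordinate. */
float ndc_to_window_y(const Viewport *vp, float y_ndc)
{
    assert(vp != NULL);
    return vp->y + (y_ndc + 1.0f) * 0.5f * vp->height;
}
```

<p>With a viewport of x = 0.5 and width = 100, the center of normalized device space lands at window x = 50.5, a half-pixel shift that an integer-only viewport could not express.</p>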
<p>I&#8217;m pretty unsure whether this feature depends on any Shader Model 5.0 hardware, maybe others are more aware of this. Anyway, I wouldn&#8217;t be surprised if this extension will be supported by a much larger range of graphics cards than just pure SM5 GPUs. Actually this is true for many other extensions introduced by OpenGL 4.1 but let&#8217;s not guess but wait for the upcoming drivers to see whether I&#8217;m right or wrong.</p>
<h2>Some other interesting extensions</h2>
<p>So far I have presented the new features of the latest revision of the OpenGL specification. While this was the main topic of this article, at about the same time the specification was published a lot of other ARB extensions appeared in the <a title="OpenGL Extension Registry" href="http://www.opengl.org/registry/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/?referer=');">registry</a>. While these extensions are not yet included in core and I cannot know whether they ever will be, I would like to talk about some of them as they led me to an interesting conclusion.</p>
<h3><a title="GL_ARB_shader_stencil_export" href="http://www.opengl.org/registry/specs/ARB/shader_stencil_export.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shader_stencil_export.txt?referer=');">ARB_shader_stencil_export</a></h3>
<p>The stencil test is a powerful mechanism of OpenGL to selectively discard fragments based on the content of the stencil buffer that is used in a wide variety of rendering techniques including shadow volumes and deferred shading. However, the whole configuration of the stencil test and stencil operations is completely fixed function that is limited to operations such as incrementing, decrementing the existing value, or replacing the existing value in the stencil buffer with a fixed reference value.</p>
<p>This extension provides some programmability to the fixed function stencil operations by enabling the fragment shader to output a stencil reference value on a per-fragment basis. When stencil testing is enabled, this allows the test to be performed against the value generated in the shader. Also, when the stencil operation is set to GL_REPLACE, this allows a value generated in the shader to be written to the stencil buffer directly.</p>
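<p>The effect of a per-fragment reference value can be modeled on the CPU. The following C sketch is only a simulation under stated assumptions: the stencil function is GL_EQUAL, the pass operation is GL_REPLACE, the fail operation is GL_KEEP and the write mask is 0xFF. The function names are made up, and shader_ref stands for the value a fragment shader would write to gl_FragStencilRefARB.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Models the GL_EQUAL stencil function: the masked reference value is
 * compared with the masked value already stored in the stencil buffer. */
bool stencil_test_equal(uint8_t ref, uint8_t stored, uint8_t mask)
{
    return (ref & mask) == (stored & mask);
}

/* Models one fragment. shader_ref is the per-fragment reference value
 * exported by the shader. Returns true if the fragment survives the test. */
bool process_fragment(uint8_t *stencil_texel, uint8_t shader_ref, uint8_t mask)
{
    assert(stencil_texel != NULL);
    if (!stencil_test_equal(shader_ref, *stencil_texel, mask))
        return false;            /* fragment discarded, buffer left as is */
    *stencil_texel = shader_ref; /* GL_REPLACE stores the reference value */
    return true;
}
```

<p>Without the extension, shader_ref would be the same constant for every fragment of a draw call; with it, each fragment can both test against and write its own value.</p>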
<p>This opens up a lot of possibilities; however, I need to think much more about it, as the best use cases of this feature are far from basic. Obviously, exporting the stencil reference value from a fragment shader disables the early stencil test, in the same way as exporting a new depth value from within a fragment shader disables the early depth test.</p>
<h3><a title="GL_ARB_debug_output" href="http://www.opengl.org/registry/specs/ARB/debug_output.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/debug_output.txt?referer=');">ARB_debug_output</a></h3>
<p>This extension allows OpenGL to notify the application when various events occur, which can come in handy during application development and debugging. These events include errors, usage of deprecated functionality, configurations that result in undefined behavior, and portability or performance issues. The application is notified about these events through a callback function that is registered by passing a function pointer to the appropriate OpenGL command.</p>
<p>While this extension provides a callback mechanism only for debugging purposes, the most revolutionary thing about such an ARB extension is that it is the first official appearance of a feature that supports callbacks into application code. I&#8217;m most probably not the only person who would like to see a lot of other callbacks included in the OpenGL API in the future, as we could benefit from being notified about, e.g., the completion of various asynchronous commands issued previously. This would not just provide a lot of flexibility but might also help in optimizing the rendering code based on information that is currently available only through polling.</p>
<h3>Why are these extensions so interesting?</h3>
<p>The two extensions presented above already have great value on their own, but this isn&#8217;t why I mentioned them. The reason I found these extensions so interesting is that they are both obviously based on vendor-specific extensions released in the recent past by AMD, namely <a title="GL_AMD_shader_stencil_export" href="http://www.opengl.org/registry/specs/AMD/shader_stencil_export.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/AMD/shader_stencil_export.txt?referer=');">GL_AMD_shader_stencil_export</a> and <a title="GL_AMD_debug_output" href="http://www.opengl.org/registry/specs/AMD/debug_output.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/AMD/debug_output.txt?referer=');">GL_AMD_debug_output</a>. This clearly reveals that AMD has serious plans for its OpenGL support, and this is something that a lot of those crazy folks who, like me, develop OpenGL stuff using ATI cards have been waiting for.</p>
<p>I think this also means that the NVIDIA monopoly in the ARB is over, and the result is competition from which OpenGL and its community will definitely benefit in the long run.</p>
<h2>Conclusion</h2>
<p>The article ran out of control again, like the one I wrote about the previous release of the specification. Again, hope there are at least a few of you who kept up reading and finally got to this last chapter of the article. We can again quote the always recurring question of the community:</p>
<blockquote><p>Where is direct state access?</p>
</blockquote>
<p>Well, it is still not here; however, AMD has finally finished implementing it as well and made it public. They have been working on it for quite some time, but it became officially public only with Catalyst 10.7. I haven&#8217;t used it so far, so maybe plenty of hidden bugs are still in it, but at least they have it. This is another thing that strengthens my prediction that AMD has committed itself to supporting OpenGL, as previously they barely added support for any extensions besides core features.</p>
<p>Back to the topic of the OpenGL 4.1 specification: while it is not as revolutionary as the previous update got us used to, OpenGL is still on track, and this is thanks to the Khronos Group and obviously to the great community. If OpenGL keeps evolving at the pace we&#8217;ve seen in the last two years, Microsoft will have a difficult time keeping up.</p>
<p>Thanks for reading this not-so-short article!</p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/08/an-introduction-to-opengl-4-1/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>A brief preview of the new features introduced by OpenGL 3.3 and 4.0</title>
		<link>http://rastergrid.com/blog/2010/03/a-brief-preview-of-the-new-features-introduced-by-opengl-3-3-and-4-0/</link>
		<comments>http://rastergrid.com/blog/2010/03/a-brief-preview-of-the-new-features-introduced-by-opengl-3-3-and-4-0/#comments</comments>
		<pubDate>Mon, 15 Mar 2010 16:23:17 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[fragment shader]]></category>
		<category><![CDATA[geometry instancing]]></category>
		<category><![CDATA[geometry shader]]></category>
		<category><![CDATA[GLSL]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[synchronization]]></category>
		<category><![CDATA[tessellation]]></category>
		<category><![CDATA[tessellation control shader]]></category>
		<category><![CDATA[tessellation evaluation shader]]></category>
		<category><![CDATA[texture array]]></category>
		<category><![CDATA[texture buffer]]></category>
		<category><![CDATA[transform feedback]]></category>
		<category><![CDATA[uniform buffer]]></category>
		<category><![CDATA[vertex shader]]></category>
		<category><![CDATA[vertex stream]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=207</guid>
		<description><![CDATA[The Khronos Group continues the progress of streamlining the OpenGL API. One very important step in this battle has been made just a few days ago by releasing two concurrent core releases of the OpenGL specification, namely version 3.3 and 4.0. This is a major update of the standard containing many revolutionary additions to the]]></description>
			<content:encoded><![CDATA[
<p>The Khronos Group continues the progress of streamlining the OpenGL API. One very important step in this battle has been made just a few days ago by releasing two concurrent core releases of the OpenGL specification, namely version 3.3 and 4.0. This is a major update of the standard containing many revolutionary additions to the tool-set of OpenGL that need careful examination. In this article I would like to talk about these new features trying to point out their importance and touching also some practical use case scenarios.</p>
<p><span id="more-207"></span>This is the fourth revision of the OpenGL API standard in the last two years. This fast-paced revolution started about one and a half years ago with the release of version 3.0 of the specification. At that time, a great feeling of disappointment overcame developers due to the lack of the promised rewrite of the whole API. Others, who had to deal with legacy code, were also disappointed, but for the opposite reason: the new revision of the API threatened them with the removal of old features. These two opposing forces put the Khronos Group into a situation where it was very difficult to make a decision that would make everybody happy. After two more releases, this issue was mostly resolved with OpenGL 3.2, and lots of missing features were integrated into the core API in the meantime.</p>
<p>Even though great steps have been made in order to fulfill everybody&#8217;s needs, the gap between the core functionality of OpenGL and the DirectX API still grew, especially due to the introduction of Shader Model 5.0 hardware. OpenGL was in a position where it had to adopt the features of the new hardware generation and also try to catch up on Shader Model 4.0 hardware. My personal wish was for two new versions of the API: one that complements the OpenGL 3.x API with the missing features and another that catches up to DirectX 11. My wish actually came true: for the first time in the history of OpenGL we got two new releases of the standard at once, and finally we have an API that is a really competitive alternative to Microsoft&#8217;s DirectX API. I think I can say this in the name of every OpenGL developer: Thank you Khronos!</p>
<p>Okay, but that&#8217;s enough about history and acknowledgements. Let&#8217;s see what&#8217;s under the hood of the new API revisions! When I read the good news at <a title="OpenGL.org" href="http://www.opengl.org/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/?referer=');">OpenGL.org</a> I felt like a child at Christmas taking the first look at the presents under the tree: I was in great ecstasy and started to &#8220;open the presents&#8221; as fast as I could&#8230;</p>
<h2>New features of OpenGL 3.3</h2>
<p>Let&#8217;s start with the new version of the API targeting Shader Model 4.x hardware. It seems that the major release 4.0 didn&#8217;t capture the attention of the ARB exclusively, as we have many interesting features already in the first box&#8230;</p>
<h3><a title="ARB_blend_func_extended" href="http://www.opengl.org/registry/specs/ARB/blend_func_extended.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/blend_func_extended.txt?referer=');">ARB_blend_func_extended</a></h3>
<p>This is a feature for which I&#8217;ve seen many requests on the OpenGL discussion forums. It enables fragment shaders to output an additional color per render target that can be used as a blending factor for either the source or the destination color, providing an additional degree of freedom in how fragments are blended into the destination buffers. This functionality has been supported by the underlying hardware for a while, but without API support it was impossible to take advantage of it. As it is very straightforward how this feature works, I will not talk about it too much. Just one additional comment: surprisingly, <a title="ATI Catalyst 10.2: Better CrossFire and OpenGL Support" href="http://www.geeks3d.com/20100218/test-ati-catalyst-10-2-better-crossfire-and-opengl-support/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.geeks3d.com/20100218/test-ati-catalyst-10-2-better-crossfire-and-opengl-support/?referer=');">AMD already supports this extension</a> in its latest graphics drivers, which is remarkable considering that AMD drivers were always a step behind the NVIDIA ones in the race to adopt the latest OpenGL features. It seems that AMD now takes OpenGL support seriously, and this is good news for all the developers out there, especially for me, being an ATI fan.</p>
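<p>As a small worked example of what the second color output buys us, the C sketch below evaluates the blend equation for one color channel. It assumes the blend function is set up as glBlendFunc(GL_SRC1_COLOR, GL_ONE_MINUS_SRC1_COLOR) with the default additive blend equation; the function name blend_dual_source is made up for illustration.</p>

```c
#include <assert.h>

/* One color channel blended as: src0 * src1 + dst * (1 - src1).
 * src0 is the first fragment shader color output, src1 the second one,
 * which acts purely as a per-channel blend factor. */
float blend_dual_source(float src0, float src1, float dst)
{
    assert(src1 >= 0.0f && src1 <= 1.0f);
    return src0 * src1 + dst * (1.0f - src1);
}
```

<p>With src0 = 0.5, src1 = 0.25 and dst = 1.0 the result is 0.875. Because src1 is a full color, each channel can use a different factor, which a single alpha value cannot provide.</p>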
<h3><a title="ARB_explicit_attrib_location" href="http://www.opengl.org/registry/specs/ARB/explicit_attrib_location.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/explicit_attrib_location.txt?referer=');">ARB_explicit_attrib_location</a></h3>
<p>Most probably not just for me, the way vertex attributes were bound to shader inputs and shader outputs to render targets caused a big headache from the point of view of both modular software design and efficiency. Previously, the application developer had little to no control over how to connect these elements together in a shader-independent way. This tight coupling between the host application code and the shaders just made the work of developers cumbersome. This feature improves the binding process by allowing a particular semantic meaning to be assigned globally to an attribute location without knowing how that attribute will be named in any particular shader, decoupling the host application from the shaders. This extension is a typical example of how design abstractions can ease the life of the developer without any dependency on hardware support.</p>
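<p>One way to picture such a global convention is a single enum on the host side from which the matching GLSL declarations are generated. This is only a sketch: the enum values, the attribute names and the format_attrib_decl helper are hypothetical, and the only thing taken from the extension is the layout(location = N) qualifier syntax.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical engine-wide convention: every semantic has a fixed location. */
enum AttribLocation {
    ATTRIB_POSITION = 0,
    ATTRIB_NORMAL   = 1,
    ATTRIB_TEXCOORD = 2
};

/* Writes a GLSL input declaration with an explicit location into out.
 * Returns the number of characters written, or -1 if it did not fit. */
int format_attrib_decl(char *out, size_t out_size, enum AttribLocation location,
                       const char *glsl_type, const char *name)
{
    assert(out != NULL && glsl_type != NULL && name != NULL);
    assert(strlen(glsl_type) > 0 && strlen(name) > 0);
    int n = snprintf(out, out_size, "layout(location = %d) in %s %s;",
                     (int)location, glsl_type, name);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

<p>Calling it with ATTRIB_NORMAL, "vec3" and "a_normal" produces the line layout(location = 1) in vec3 a_normal; and the host code can set up its vertex arrays against location 1 regardless of what any individual shader calls the input.</p>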
<h3><a title="ARB_occlusion_query2" href="http://www.opengl.org/registry/specs/ARB/occlusion_query2.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/occlusion_query2.txt?referer=');">ARB_occlusion_query2</a></h3>
<p>Well, there isn&#8217;t much to say about this extension, as it just adds a new occlusion query type that reports only a boolean value about the visibility of the object rather than the actual number of samples passed. It is somewhat equivalent to the occlusion query extensions prior to <a title="ARB_occlusion_query" href="http://www.opengl.org/registry/specs/ARB/occlusion_query.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/occlusion_query.txt?referer=');">ARB_occlusion_query</a>. Don&#8217;t ask me why this feature is important, but they felt that it might be useful. One thing I can think of is that with such a query we might get the result for the proxy object sooner, as we only have to wait until the first sample passes, but I&#8217;m not sure whether such a thing is supported by either the hardware or the drivers.</p>
<h3><a title="ARB_sampler_objects" href="http://www.opengl.org/registry/specs/ARB/sampler_objects.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/sampler_objects.txt?referer=');">ARB_sampler_objects</a></h3>
<p>This is another feature that people have been awaiting for years. This extension decouples texture image data from sampler state. Previously, if a texture image had to be used with different sampler modes, no matter whether we talk about filtering modes or texture coordinate wrapping, one had to do expensive state changes to modify the sampler state of the texture object, implement the needed filtering or wrapping in shaders or, in the worst case, duplicate the texture image data in order to have access to the same texture with different sampler parameters. The primary intent of this feature is to solve these problems.</p>
<p>One thing to remark about this extension is that even though it is a long-awaited addition to the API, several people have already expressed their discontent with the fact that the texture unit semantics have been kept. I also expected the introduction of this feature to be the point when the texture unit semantics would have to go. However, after seeing <a title="ARB_explicit_attrib_location" href="http://www.opengl.org/registry/specs/ARB/explicit_attrib_location.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/explicit_attrib_location.txt?referer=');">ARB_explicit_attrib_location</a> as an example of how to decouple the shader code from the host application code, I tend to agree with Khronos on this decision: from now on we can think of the texture units as an adapter layer between GPU and CPU code, and as such the decision seems reasonable.</p>
<h3><a title="ARB_shader_bit_encoding" href="http://www.opengl.org/registry/specs/ARB/shader_bit_encoding.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shader_bit_encoding.txt?referer=');">ARB_shader_bit_encoding</a></h3>
<p>This extension adds built-in functions for getting and setting the bit encoding of floating-point values in the OpenGL Shading Language. As it mostly indicates functionality added to the Shading Language, I would rather not go into details here, as I will talk about the new Shading Language later.</p>
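<p>For readers who have not met such functions before, this is what the GLSL built-ins floatBitsToUint and uintBitsToFloat do, expressed in C. It is a CPU-side analogy only, assuming a platform with 32-bit IEEE 754 floats; the C function names are made up.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* C counterpart of GLSL floatBitsToUint: returns the raw bit pattern of the
 * value without any numeric conversion. */
uint32_t float_bits_to_uint(float value)
{
    uint32_t bits;
    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* C counterpart of GLSL uintBitsToFloat: reinterprets a bit pattern as a
 * floating-point value. */
float uint_bits_to_float(uint32_t bits)
{
    float value;
    assert(sizeof bits == sizeof value);
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

<p>For instance, 1.0 maps to the bit pattern 0x3F800000, and 0x40000000 maps back to 2.0. Inside a shader this is useful for packing floats into integer buffers or for bit tricks on the exponent and mantissa.</p>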
<h3><a title="ARB_texture_rgb10_a2ui" href="http://www.opengl.org/registry/specs/ARB/texture_rgb10_a2ui.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_rgb10_a2ui.txt?referer=');">ARB_texture_rgb10_a2ui</a></h3>
<p>Again, an extension that is quite self-explanatory: a new texture image format called RGB10_A2UI that stores non-normalized unsigned integers in the 10/10/10/2 layout of the existing RGB10_A2 format. This is nothing more than another gap closed between hardware capabilities and API support.</p>
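<p>A quick sketch of how such a texel could be packed on the CPU before upload. The function name is hypothetical, and the bit layout is the one of the GL_UNSIGNED_INT_2_10_10_10_REV pixel type: red in the lowest 10 bits, then green, then blue, with alpha in the top 2 bits.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Packs three 10-bit and one 2-bit unsigned integer into a 32-bit word. */
uint32_t pack_rgb10_a2ui(uint32_t r, uint32_t g, uint32_t b, uint32_t a)
{
    assert(r < 1024u && g < 1024u && b < 1024u && a < 4u);
    return r | (g << 10) | (b << 20) | (a << 30);
}
```

<p>Since the format is not normalized, a shader reading this texel through an unsigned integer sampler sees the stored integers directly, e.g. 1023 rather than 1.0.</p>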
<h3><a title="ARB_texture_swizzle" href="http://www.opengl.org/registry/specs/ARB/texture_swizzle.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_swizzle.txt?referer=');">ARB_texture_swizzle</a></h3>
<p>Especially when using one or two component texture formats, like in the case of shadow maps, the specification was somewhat unclear about how these components are finally mapped to RGBA quadruples, and it provided little to no facilities to control this process. As if developers weren&#8217;t already fed up with this, the chance of running into problems was even higher because driver implementations often behaved differently as well. This issue has finally been resolved by this extension, which gives the application developer an explicit tool to control the swizzling of the components, applied implicitly after every single texture fetch. The new state is part of the texture object state, which provides fine-grained control over when and how to use swizzling. According to the extension specification, this feature notably helps with porting legacy OpenGL applications as well as games written for PlayStation 3, as the console already provides such functionality.</p>
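<p>The swizzle itself is easy to model on the CPU. In the C sketch below the Swizzle enum mirrors the values a swizzle parameter can take (GL_RED, GL_GREEN, GL_BLUE, GL_ALPHA, GL_ZERO and GL_ONE); the type and function names are made up, and in real code the state would be set with glTexParameteri and the GL_TEXTURE_SWIZZLE_R, _G, _B and _A parameters.</p>

```c
#include <assert.h>

/* Mirrors the possible swizzle sources for one output component. */
typedef enum { SWZ_RED, SWZ_GREEN, SWZ_BLUE, SWZ_ALPHA, SWZ_ZERO, SWZ_ONE } Swizzle;

typedef struct { float r, g, b, a; } Texel;

/* Picks the source for a single output component. */
static float select_component(Texel t, Swizzle s)
{
    switch (s) {
    case SWZ_RED:   return t.r;
    case SWZ_GREEN: return t.g;
    case SWZ_BLUE:  return t.b;
    case SWZ_ALPHA: return t.a;
    case SWZ_ZERO:  return 0.0f;
    case SWZ_ONE:   return 1.0f;
    }
    assert(!"invalid swizzle value");
    return 0.0f;
}

/* Applies a four-entry swizzle (for R, G, B, A) to a fetched texel. */
Texel apply_swizzle(Texel fetched, const Swizzle swizzle[4])
{
    Texel out = {
        select_component(fetched, swizzle[0]),
        select_component(fetched, swizzle[1]),
        select_component(fetched, swizzle[2]),
        select_component(fetched, swizzle[3])
    };
    return out;
}
```

<p>A single-channel texel fetched as (0.5, 0, 0, 1) with the swizzle (RED, RED, RED, ONE) comes out as (0.5, 0.5, 0.5, 1), which reproduces the behavior of the old luminance formats without touching the shader.</p>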
<h3><a title="ARB_timer_query" href="http://www.opengl.org/registry/specs/ARB/timer_query.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/timer_query.txt?referer=');">ARB_timer_query</a></h3>
<p>Prior to this extension, runtime performance measurements were limited to the use of client side timing information or relying on the use of offline profiling mechanisms like that of AMD&#8217;s <a title="GPU PerfStudio" href="http://developer.amd.com/gpu/perfstudio/Pages/default.aspx" target="_blank" onclick="pageTracker._trackPageview('/outgoing/developer.amd.com/gpu/perfstudio/Pages/default.aspx?referer=');">GPUPerfStudio</a>. During development, this timing information can help identify application, driver or GPU bottlenecks. At runtime, this data can be used to dynamically optimize the scene to achieve reasonable frame rates. While today&#8217;s hardware provides a great repertoire of performance measurement metrics there was no API support to access these previously. This feature provides an additional asynchronous query type that enables application developers to measure the driver and GPU time that is required to complete a set of rendering commands, thus providing additional flexibility for both offline and runtime optimizations. While this extension does not guarantee 100% consistency and repeatability, the information gathered with timer queries will definitely make it possible to identify server side bottlenecks and the reasons behind them.</p>
<h3><a title="ARB_instanced_arrays" href="http://www.opengl.org/registry/specs/ARB/instanced_arrays.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/instanced_arrays.txt?referer=');">ARB_instanced_arrays</a></h3>
<p>Many people argued with me at the OpenGL discussion forums when I stated that instanced arrays should be included in core OpenGL. Their reasoning was built on the fact that we already have the <a title="ARB_draw_instanced" href="http://www.opengl.org/registry/specs/ARB/draw_instanced.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/draw_instanced.txt?referer=');">ARB_draw_instanced</a> extension, which provides a shader-based and thus much more flexible way to handle instanced geometry. While from this point of view I tend to agree with them, there are many non-trivial use cases which prove that my reasoning is not pointless. It seems that Khronos agrees with me on this topic.</p>
<p>In a nutshell, the instanced arrays feature enables the use of vertex attributes as a source of instance data. This is done by introducing a so-called &#8220;array divisor&#8221; that specifies how the corresponding vertex attributes are mapped to instances. Usually a vertex attribute advances on a per-vertex basis. With instanced arrays and a non-zero divisor N, the attribute advances only once every N instances, where each instance is conceptually equivalent to one traditional, non-instanced draw call.</p>
<p>One use case is dealing with a huge number of instances where the per-instance data simply does not fit into uniform buffers. While in such cases one can use a texture buffer instead to source the instance data, as mentioned in my article <a title="Uniform Buffers VS Texture Buffers - RasterGrid Blog" href="http://rastergrid.com/blog/2010/01/uniform-buffers-vs-texture-buffers/">Uniform Buffers VS Texture Buffers</a>, accepting the additional overhead of texture fetches may not be the wisest decision performance-wise. Beside the standard instancing use cases, there are plenty of nasty tricks that can be achieved efficiently with this feature, but that goes far beyond the scope of this article and requires a separate discussion that I will most probably come back to in the near future.</p>
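<p>The effect of the divisor can be sketched in a few lines of C++. This is only a model of the addressing rule, not driver code, and the function name <code>attribElementIndex</code> is made up for the illustration: a divisor of zero keeps the usual per-vertex behaviour, while a divisor of N makes the attribute advance once every N instances.</p>

```cpp
#include <cstdint>

// Index of the array element that an attribute reads for a given vertex
// of a given instance (hypothetical helper, for illustration only).
//   divisor == 0 : the attribute advances per vertex, as usual
//   divisor == N : the attribute advances once every N instances
inline std::uint32_t attribElementIndex(std::uint32_t vertex,
                                        std::uint32_t instance,
                                        std::uint32_t divisor) {
    return divisor == 0 ? vertex : instance / divisor;
}
```

<p>With a divisor of 1 every instance gets its own element (for example its own transformation), with a divisor of 2 each element is shared by two consecutive instances. In the API the divisor is set per attribute with glVertexAttribDivisorARB.</p>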
<h3><a title="ARB_vertex_type_2_10_10_10_rev" href="http://www.opengl.org/registry/specs/ARB/vertex_type_2_10_10_10_rev.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/vertex_type_2_10_10_10_rev.txt?referer=');">ARB_vertex_type_2_10_10_10_rev</a></h3>
<p>We&#8217;ve arrived at the final new extension included in core OpenGL 3.3. This is another gap-filling extension that provides two new vertex attribute data formats: a signed and an unsigned format with 10 bits for each of the three main components and 2 bits for the fourth. The most typical use of this format is to store vertex normals in the signed-normalized version of the format in order to have a compact (4 bytes per normal) yet high-precision (10 bits per component) representation that reduces memory needs and bandwidth requirements while retaining sufficient precision. Previously, there was no way to have such high precision for vertex attributes with a 4-byte footprint.</p>
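<p>As a rough sketch of how a normal could be packed into the signed variant on the CPU side: the component order (x in the lowest 10 bits, then y, then z, with the 2-bit w on top) follows the &#8220;rev&#8221; layout, but the float-to-integer rule used here, simple scaling by the largest positive value and rounding, is an assumption of this sketch; the exact conversion rule is defined by the specification version in use. The helper names <code>toSnorm</code> and <code>packNormal</code> are hypothetical.</p>

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Convert a float in [-1, 1] to a two's complement field of `bits` bits.
// Assumption of this sketch: scale by the largest positive value
// (511 for 10 bits, 1 for 2 bits) and round to the nearest integer.
inline std::uint32_t toSnorm(float v, int bits) {
    const float maxValue = static_cast<float>((1 << (bits - 1)) - 1);
    const float clamped  = std::max(-1.0f, std::min(1.0f, v));
    const std::int32_t q = static_cast<std::int32_t>(std::lround(clamped * maxValue));
    return static_cast<std::uint32_t>(q) & ((1u << bits) - 1u);
}

// Pack a normal into one 32-bit word: x in bits 0-9, y in bits 10-19,
// z in bits 20-29 and w in bits 30-31.
inline std::uint32_t packNormal(float x, float y, float z, float w = 0.0f) {
    return  toSnorm(x, 10)
         | (toSnorm(y, 10) << 10)
         | (toSnorm(z, 10) << 20)
         | (toSnorm(w, 2)  << 30);
}
```

<p>An array of such words can then be supplied as a four-component, normalized vertex attribute of type GL_INT_2_10_10_10_REV, taking 4 bytes per normal instead of the 12 bytes of three floats.</p>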
<h3>The OpenGL Shading Language 3.30</h3>
<p>The first remarkable thing is the shift in the versioning of the Shading Language. It seems that from now on it will be aligned with the core specification version. This decision was most probably made because two release branches of the specification were introduced, in order to avoid confusion about the correspondence between API and Shading Language versions.</p>
<p>As it is much more difficult to summarize the new features of the OpenGL Shading Language together with corresponding use cases, I will simply limit my comments to an excerpt from its specification on the features added in this new version:</p>
<ul>
<li>Layout qualifiers can be used to declare the location of vertex shader inputs and fragment shader outputs, in line with the API functionality provided by <a title="ARB_explicit_attrib_location" href="http://www.opengl.org/registry/specs/ARB/explicit_attrib_location.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/explicit_attrib_location.txt?referer=');">ARB_explicit_attrib_location</a> as mentioned before.</li>
<li>Built-in functions are provided for converting floating-point values to integers representing their encoding.</li>
<li>Some clarification of already existing facilities of the language.</li>
</ul>
<h2>New features of OpenGL 4.0</h2>
<p>The major version number change obviously indicates that this revision of the specification targets Shader Model 5.0 hardware. To be honest, as I was never really interested in DirectX, I barely know the features introduced by DX11, but it seems that there are some great facilities in OpenGL 4.0 that I never knew the hardware supported. This may be because DX11 does not even expose such functionality, or simply because I don&#8217;t know enough about DX11. Anyway, let&#8217;s see the revolutionary things that we face when checking out the latest version of the OpenGL specification&#8230;</p>
<h3><a title="ARB_draw_buffers_blend" href="http://www.opengl.org/registry/specs/ARB/draw_buffers_blend.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/draw_buffers_blend.txt?referer=');">ARB_draw_buffers_blend</a></h3>
<p>Using this feature one is able to select individual blend equations and blend functions for each render target. This extension has been exposed for a few months now, so most probably everybody has heard about it, and even if not, the functionality is very straightforward: it simply removes some of the restrictions when dealing with multiple render targets (MRT). One interesting thing is that the Khronos Group decided to include this extension in the 4.0 version of the API but not in 3.3. This is odd, as Shader Model 4.0 capable hardware already supports this feature, or at least I have the extension on my Radeon HD2600, which raises the question: why only in 4.0? Unfortunately, I don&#8217;t know the answer, but I hope the ARB has a good reason for this. As we will see later, there are other features that for some reason were only exposed in the latest version of the API but not in core for Shader Model 4.0 hardware.</p>
<h3><a title="ARB_sample_shading" href="http://www.opengl.org/registry/specs/ARB/sample_shading.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/sample_shading.txt?referer=');">ARB_sample_shading</a></h3>
<p>In traditional multisample rendering the hardware optimizes multisampling so that the fragment shader is executed only once for each fragment. This is possible because the specification relaxes how the implementation has to supply color and texture coordinate values for each sample. While this optimization usually does not cause any rendering artifacts and heavily reduces the pressure on the GPU, there are some situations where it results in aliasing artifacts. One example is the rendering of alpha-tested primitives.</p>
<p>This extension provides a global state for enabling and disabling sample shading and a way to control how fine-grained per-sample shading should be, by supplying the minimum fraction of samples that need to be shaded separately. Beside this, it also introduces the required language elements to the OpenGL Shading Language to support sample shading.</p>
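<p>The granularity control can be illustrated with a small calculation. As far as I read the extension specification, the value passed to glMinSampleShadingARB is a fraction in the range [0, 1], and the implementation has to shade at least max(ceil(value &#215; samples), 1) samples separately. The function below is just a sketch of that formula; its name <code>minShadedSamples</code> is hypothetical.</p>

```cpp
#include <algorithm>
#include <cmath>

// Minimum number of samples that get their own shading result per
// fragment, following max(ceil(minSampleShading * samples), 1).
// The input fraction is clamped to [0, 1] first.
inline int minShadedSamples(float minSampleShading, int samples) {
    const float fraction = std::max(0.0f, std::min(1.0f, minSampleShading));
    const int shaded =
        static_cast<int>(std::ceil(fraction * static_cast<float>(samples)));
    return std::max(shaded, 1);
}
```

<p>So with 4&#215; multisampling a value of 1.0 requests full per-sample shading, 0.5 requests at least two separately shaded samples, and 0.0 falls back to at least one, which matches the traditional behaviour.</p>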
<h3><a title="ARB_shader_subroutine" href="http://www.opengl.org/registry/specs/ARB/shader_subroutine.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shader_subroutine.txt?referer=');">ARB_shader_subroutine</a></h3>
<p>In my humble opinion, this is one of the most important features introduced in this new version of the API specification. So far, many engine and shader developers have faced problems inherent in the Shading Language that heavily reduced the ability to create a modular shader design that separates the independent tasks done in shaders nowadays. One initiative was the idea behind the <a title="EXT_separate_shaders" href="http://www.opengl.org/registry/specs/EXT/separate_shader_objects.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/separate_shader_objects.txt?referer=');">EXT_separate_shader_objects</a> extension. While that extension removed the dependency between shader stages, it does not address the tight coupling inside one shader stage. The aforementioned extension also defeats some of the design goals of the Shading Language by introducing complicated language semantics in order to solve the problem of inter-stage dependency.</p>
<p>Just to emphasize the importance of this new functionality with a very basic example, let&#8217;s take a simple rendering engine that supports skeletal animated geometry, materials and lights. In such a use case both the vertex and fragment shaders have multiple roles: the vertex shader has to perform the skeletal animation (property of the geometry) and the view transformation (property of the camera, or of the light in case of shadow map rendering), and the fragment shader has to calculate the light incident on the surface point (property of the light) and then calculate the resulting illumination (property of the material). With the traditional tool-set these components of the shaders were tightly coupled, and in order to support the combination of any geometry type (animated or not, skeletal or morph animation, etc.), any light type (directional, point, etc.) and any material type (diffuse, phong, environment mapped, etc.), one had to compile all possible combinations of the shaders or create uber-shaders that make run-time decisions to handle the heterogeneous inputs. Both of these solutions incur additional hardware resource usage and possible runtime overhead.</p>
<p>This extension adds some kind of polymorphism support to shaders. This way a single shader can include many alternative subroutines for a particular task and dynamically select through the API which subroutine is called from each call site. This opens the doors for modular shader designs while retaining most of the performance of specialized shaders.</p>
<h3><a title="ARB_tessellation_shader" href="http://www.opengl.org/registry/specs/ARB/tessellation_shader.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/tessellation_shader.txt?referer=');">ARB_tessellation_shader</a></h3>
<p>Yes, this is about the new geometry tessellation mechanism introduced by Shader Model 5.0 hardware. The extension itself introduces three new stages that are roughly situated between the vertex shader and the geometry shader:</p>
<ul>
<li><strong>Tessellation Control Shader</strong> &#8211; This new shader type operates on a patch, which is nothing more than a fixed-size collection of vertices, each with per-vertex attributes, plus a number of associated per-patch attributes. Also note that while it operates on a patch, it is invoked on a per-vertex basis. The most important role of this shader is to set the tessellation levels of the patch, which control how finely the patch will be tessellated. Usually you can think of a patch as a triangle or quad. This shader is equivalent to DX11&#8217;s hull shader.</li>
<li><strong>Fixed-function tessellation primitive generator</strong> &#8211; The role of this new stage is to subdivide the incoming patch based on the tessellation level and related configuration that the unit gets as input.</li>
<li><strong>Tessellation Evaluation Shader</strong> &#8211; This new shader type is responsible for calculating the position and other attributes of the vertices produced by the tessellator. This shader is equivalent to DX11&#8217;s domain shader.</li>
</ul>
<p>One important thing to notice is that a new primitive type is introduced, namely the patch. A patch on its own is not directly or indirectly related to any traditional OpenGL primitive, as it cannot be rendered directly. It is used only as the input type for the tessellator. However, a patch supplies the control grid of the geometry to be generated via tessellation, so in practice it is most likely to correspond to triangles or quads, but it is important to note the difference.</p>
<p>As this is maybe the most well-known feature of Shader Model 5.0 hardware, I would rather not talk about it more, as everybody knows what it is for and it would take rather long to explain how to use it. Also, it is not the intention of this article to fully cover the usage of all the new features; it is just a quick summary of the new possibilities.</p>
<h3><a title="ARB_texture_buffer_object_rgb32" href="http://www.opengl.org/registry/specs/ARB/texture_buffer_object_rgb32.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_buffer_object_rgb32.txt?referer=');">ARB_texture_buffer_object_rgb32</a></h3>
<p>Yet another extension that introduces additional formats, this time for texture buffers. Previously, texture buffers had no three-component formats; this extension adds 32-bit three-component ones. As I currently cannot think of any practical use case where this would be useful, I would rather not make one up. However, my opinion is that these formats most probably perform worse than the four-component ones: even though the memory footprint and bandwidth usage may be somewhat lower, I have concerns about alignment-related performance issues.</p>
<h3><a title="ARB_texture_cube_map_array" href="http://www.opengl.org/registry/specs/ARB/texture_cube_map_array.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_cube_map_array.txt?referer=');">ARB_texture_cube_map_array</a></h3>
<p>Those who already use texture arrays to improve batching and remove unnecessary state changes will most probably adore this extension, as it brings texture array capabilities to cube map textures as well. This comes in handy especially when many materials use environment cube maps or when shadow cube maps are used for point lights. To give an even more concrete example, you can render the shadow cube maps for many hundreds of point lights with a single draw call by taking advantage of the layered rendering capability of geometry shaders and the possibility to bind texture arrays as render targets.</p>
<p>One more thing to notice here is that cube map arrays are already supported by Shader Model 4.1 hardware so the question to the ARB is again there, however, as OpenGL 3.3 still targets Shader Model 4.0 hardware maybe we will see a 3.x version of the specification that will also include this extension. The judgement is up to you whether you agree with me or not.</p>
<h3><a title="ARB_texture_gather" href="http://www.opengl.org/registry/specs/ARB/texture_gather.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_gather.txt?referer=');">ARB_texture_gather</a></h3>
<p>Another feature from the repertoire of Shader Model 4.1. This extension introduces new texture fetching functions to the Shading Language that determine the 2&#215;2 footprint of the texture that would be used for linear filtering in a texture lookup and return a vector consisting of the first component from each of the four texels in the footprint. This is the so-called Gather4 texture fetching mode, and it can be useful to accelerate percentage closer filtering of shadow maps as it can fetch four samples at once. Still, there are some limitations on the use of this fetching mode; one important thing is that a shader cannot use normal and gather fetches on the same sampler. This makes me wonder why this feature is not part of the sampler object state instead of being a Shading Language construct. Anyway, as in typical use cases these limitations do not defeat the goal of the feature, I would not consider this a design issue.</p>
<h3><a title="ARB_transform_feedback2" href="http://www.opengl.org/registry/specs/ARB/transform_feedback2.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/transform_feedback2.txt?referer=');">ARB_transform_feedback2</a></h3>
<p>The transform feedback mechanism has already proved to me that it is a great addition to the tool-set of graphics application developers. This feature extends transform feedback with an object type that encapsulates transform feedback related state to enable configuration reuse. It also provides a way to pause and resume transform feedback mode if, for some reason, some rendering commands should be excluded from the feedback process.</p>
<p>The last and maybe most important benefit of this extension is the ability to draw primitives captured in transform feedback mode without querying the captured primitive count. It is roughly equivalent to DX10&#8217;s AutoDraw feature, and its purpose is to eliminate the need to query the number of previously generated primitives in order to supply it to an OpenGL draw command. This solves the synchronization issues that previously occurred between the CPU and the GPU.</p>
<p>One example is when skeletal animated geometry has to be used in a multipass rendering technique. We can think of traditional forward rendering or of multiple shadow maps that have to be generated. As the calculations needed to perform skeletal animation are rather expensive, it is wasteful to perform them in each pass. A common way to solve this problem is to use transform feedback to capture the geometry emitted by a vertex shader that simply executes the skeletal animation on the input geometry. In subsequent rendering passes this feedback buffer can be used to source the geometry data, eliminating the need to recompute the animation. Without this extension, the application is most probably stalled in such cases until the feedback process ends, as it needs to query the number of generated primitives. With this extension, this is solved, as we don&#8217;t have to know the results of the previous transform feedback in order to issue a draw command that sources the data from the feedback buffer. By the way, this seems logical: the information is already on the GPU, so why should it ping-pong between the CPU and the GPU?</p>
<p>As I mentioned before, the functionality provided by this extension is equivalent to DX10&#8217;s AutoDraw feature. This time my question is really serious: why hasn&#8217;t this feature been included in OpenGL 3.3? It would provide a great benefit for those who use transform feedback, and I don&#8217;t see any reason for not supporting it because, as far as I can tell, it is supported on the corresponding hardware.</p>
<h3><a title="ARB_transform_feedback3" href="http://www.opengl.org/registry/specs/ARB/transform_feedback3.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/transform_feedback3.txt?referer=');">ARB_transform_feedback3</a></h3>
<p>Surprisingly, OpenGL 4.0 comes with another transform feedback extension as well, but this time it is a true Shader Model 5.0 feature. The new hardware generation has the ability to emit vertices from the geometry shader to multiple vertex streams. In order to provide clever API support, the ARB decided to relax the previous limitation of transform feedback mode that output can be either in interleaved format or in separate buffers. This new extension enables the use of both together, and also provides a way to organize geometry shader outputs into groups in order to target the individual vertex streams.</p>
<p>The most important benefit of this feature is still that we have separate streams, each with its own primitive emission counter, so the outputs do not necessarily have to have the same granularity. This provides room for very clever rendering techniques. As an example, remember NVIDIA&#8217;s <a title="NVIDIA Skinned Instancing demo" href="http://developer.download.nvidia.com/SDK/10/direct3d/samples.html" target="_blank" onclick="pageTracker._trackPageview('/outgoing/developer.download.nvidia.com/SDK/10/direct3d/samples.html?referer=');">Skinned Instancing</a> demo that used one draw call per geometry LOD to sort instance data on a per-LOD basis. Using this extension, this preprocessing step can be done with a single draw call. The abilities of this feature go far beyond such a simple use case, though; I will talk a bit about another one in the next section.</p>
<p>One of my less technical notes is that it seems that the Khronos Group members have good sense of humor. I realized this when I met the &#8220;manbearbig&#8221; when reading one of the examples in the extension specification.</p>
<h3><a title="ARB_draw_indirect" href="http://www.opengl.org/registry/specs/ARB/draw_indirect.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/draw_indirect.txt?referer=');">ARB_draw_indirect</a></h3>
<p>We&#8217;ve arrived at the high point of the list of features introduced. It is hard to say such a thing, but in my humble opinion this extension can be the Holy Grail of next generation rendering engines. I will explain why I think so&#8230;</p>
<p>The extension provides a way to source the parameters of instanced draw commands from within buffer objects. One naive use case would be to put all the rendering command parameters to a buffer object using the host application and then draw everything with a single command. While this simple method already has its benefits, this feature provides much more flexibility than this. The most revolutionary is that, using this extension, one is able to generate instanced draw commands with the GPU on-the-fly. Together with <a title="ARB_transform_feedback3" href="http://www.opengl.org/registry/specs/ARB/transform_feedback3.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/transform_feedback3.txt?referer=');">ARB_transform_feedback3</a> it is possible to write a completely GPU based scene management system.</p>
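<p>To give an idea of what actually lives in such a buffer, here are the two parameter layouts listed in the extension specification, written as C++ structs. Fixed-width integer types stand in for GLuint and GLint, which is an assumption of this sketch, and the <code>makeInstancedDraw</code> helper is hypothetical; it only shows which fields a culling pass would typically fill in.</p>

```cpp
#include <cstdint>

// Parameter layouts as listed in the ARB_draw_indirect specification.
// Fixed-width types stand in for GLuint/GLint (assumption of this sketch).
struct DrawArraysIndirectCommand {
    std::uint32_t count;               // number of vertices per instance
    std::uint32_t primCount;           // number of instances
    std::uint32_t first;               // first vertex
    std::uint32_t reservedMustBeZero;
};

struct DrawElementsIndirectCommand {
    std::uint32_t count;               // number of indices per instance
    std::uint32_t primCount;           // number of instances
    std::uint32_t firstIndex;          // offset into the index buffer
    std::int32_t  baseVertex;          // value added to each index
    std::uint32_t reservedMustBeZero;
};

// The GPU reads these as tightly packed 32-bit words.
static_assert(sizeof(DrawArraysIndirectCommand) == 16, "unexpected padding");
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "unexpected padding");

// Hypothetical helper: an indexed draw of one mesh where only the
// instance count varies, e.g. the number of instances that survived culling.
inline DrawElementsIndirectCommand makeInstancedDraw(std::uint32_t indexCount,
                                                     std::uint32_t visibleInstances) {
    DrawElementsIndirectCommand cmd = {};  // zero all fields first
    cmd.count     = indexCount;
    cmd.primCount = visibleInstances;
    return cmd;
}
```

<p>When the buffer is filled by the host, it is bound to GL_DRAW_INDIRECT_BUFFER and consumed by glDrawArraysIndirect or glDrawElementsIndirect. The interesting part is that nothing forces the CPU to be the writer: the primCount field can just as well be produced by an earlier GPU pass.</p>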
<p>Those who remember my <em>Instance Cloud Reduction</em> (ICR) algorithm, presented in the article <a title="Instance culling using geometry shaders - RasterGrid Blog" href="http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/">Instance culling using geometry shaders</a>, know that the required synchronization points between the CPU and the GPU heavily limited the practical utility of the culling technique. Taking advantage of the aforementioned features in ICR does not just eliminate the synchronization issues that I&#8217;ve spoken of, but also makes the technique practical for heavily heterogeneous scenes with virtually any number of geometries, even if there are multiple LOD levels for them, and all of this can be done with even fewer draw calls than in the demo that accompanied my article. As soon as we see OpenGL 4.0 capable drivers I will write an article about this technique, supplying a reference implementation as well.</p>
<h3><a title="ARB_texture_query_lod" href="http://www.opengl.org/registry/specs/ARB/texture_query_lod.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_query_lod.txt?referer=');">ARB_texture_query_lod</a></h3>
<p>This extension provides new fragment shader texture functions, namely textureQueryLOD*, that return the results of the automatic LOD computations that would be performed if a texture lookup were performed. These functions return a two-component vector. The X component of the result contains information about the mipmap level that would be used if a normal texture lookup were made with the same coordinates. This value can be a concrete mipmap level or a value between two levels if trilinear filtering is in use. The Y component of the result holds the computed LOD lambda-prime; see the OpenGL specification to check where it comes from and how it is calculated.</p>
<p>One interesting thing that this extension can be used for is implementing some shader-based filtering and addressing method for textures. As an example, let&#8217;s take a mega-texture implementation that uses a 3D texture for storage, without actual mipmaps, where the addressing, filtering and mipmapping are done with shader code. As right now this is the only example that comes to my mind and it is already awkward enough, I would rather leave further discussion of the importance of this feature to more competent people.</p>
<h3><a title="ARB_gpu_shader5" href="http://www.opengl.org/registry/specs/ARB/gpu_shader5.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/gpu_shader5.txt?referer=');">ARB_gpu_shader5</a></h3>
<p>Basically, this extension is nothing more than a big umbrella feature under which all the additional general or minor API changes go. Just to sum up the miscellaneous features provided by this extension, here is an excerpt from the extension specification:</p>
<ul>
<li>Support for indexing into arrays of samplers using non-constant indices.</li>
<li>Support for indexing into an array of uniform blocks.</li>
<li>Extending Gather4 with the ability to select any single component of a multi-component texture, to perform per-sample depth comparison, and to specify arbitrary offsets computed at runtime when gathering the 2&#215;2 footprint.</li>
<li>Support for instanced geometry shaders, where a geometry shader may be run multiple times for each primitive.</li>
</ul>
<p>For a full list of new facilities introduced by the extension refer to the extension specification.</p>
<h3><a title="ARB_gpu_shader_fp64" href="http://www.opengl.org/registry/specs/ARB/gpu_shader_fp64.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/gpu_shader_fp64.txt?referer=');">ARB_gpu_shader_fp64</a></h3>
<p>This extension enables the use of double-precision floating-point data types and arithmetic from within shaders, and also provides API entry points for double-precision data where they were missing. While one may think that the added precision is somewhat wasted on real-time graphics, it is important to note that GPUs are more and more often used for scientific calculations, not even necessarily for graphics related tasks. Taking this into consideration, the importance of double-precision floating-point support should not be underestimated. Beside that, maybe standard graphics application developers can also take advantage of the higher precision in some extreme use cases.</p>
<h3>The OpenGL Shading Language 4.00</h3>
<p>Beside what I&#8217;ve already mentioned, there is nothing important left to mention about the Shading Language. There were many changes, but most of them simply provide Shading Language support for the API extensions. What I haven&#8217;t mentioned so far is the synchronization possibility for tessellation shaders, more implicit conversions, more integer functions, packing and unpacking facilities for floating-point formats, and a new qualifier to force precision and disallow optimizations that re-order operations or treat different instances of the same operator with different precision.</p>
<h2>Conclusion</h2>
<p>I hope that some of you haven&#8217;t given up reading so far. Sorry, but it seems that this article has gone wild and still didn&#8217;t manage to cover all the topics I intended to talk about. But still, maybe I&#8217;ll come back to those subjects later.</p>
<blockquote><p>Where is direct state access?</p></blockquote>
<p>The original promise of eliminating the bind-to-modify semantics from the OpenGL API is still not fulfilled, and the first reaction of many people is still to ask this question. While the bind-to-modify semantics is a rather annoying &#8220;feature&#8221; of OpenGL, I tend to say that, unless we are talking about legacy OpenGL, direct state access is becoming less and less relevant, as we can already heavily reduce the number of state changes and API calls in our applications thanks to the fast-paced evolution of OpenGL. I sincerely think that with a modern rendering engine design built upon the idioms behind the new versions of the OpenGL API one should not face any significant scalability issues due to the outdated bind-to-modify semantics, but maybe I&#8217;m wrong.</p>
<p>Personally, I have only one problem with the newly released specification versions, which I&#8217;ve already tried to emphasize several times: the fact that so far many Shader Model 4.x features are missing from the 3.x line of the API specification. Hopefully that will be solved sooner or later; however, these issues should be addressed before the hardware concerned becomes outdated.</p>
<p>Anyway, we should not have any harsh complaints, as the Khronos Group did a great job again. They managed to keep the half-year schedule once more and they even published two parallel releases at once! Anyone who still says that the DirectX API is superior to OpenGL should think twice, as it seems that OpenGL is just starting to evolve faster and faster. Besides that, as AMD is now also active in the OpenGL world, we can expect good support from both the industry and the developer community.</p>
<p>My respect for the Khronos Group and thanks for reading the article!</p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/03/a-brief-preview-of-the-new-features-introduced-by-opengl-3-3-and-4-0/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Instance culling using geometry shaders</title>
		<link>http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/</link>
		<comments>http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/#comments</comments>
		<pubDate>Mon, 08 Feb 2010 22:58:53 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Samples]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[culling]]></category>
		<category><![CDATA[fragment shader]]></category>
		<category><![CDATA[geometry instancing]]></category>
		<category><![CDATA[geometry shader]]></category>
		<category><![CDATA[GLEW]]></category>
		<category><![CDATA[GLM]]></category>
		<category><![CDATA[GLSL]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[SFML]]></category>
		<category><![CDATA[texture buffer]]></category>
		<category><![CDATA[transform feedback]]></category>
		<category><![CDATA[uniform buffer]]></category>
		<category><![CDATA[vertex buffer]]></category>
		<category><![CDATA[vertex shader]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=135</guid>
		<description><![CDATA[Since the appearance of Shader Model 4.0 people wonder how to take advantage of the newly introduced programmable pipeline stage. The most important feature enabled by geometry shaders is that one can change the amount of emitted primitives inside the pipeline. The first thing that a naive developer would try to do with it is]]></description>
			<content:encoded><![CDATA[
<div id="attachment_136" class="wp-caption alignleft" style="width: 160px"><a href="http://rastergrid.com/blog/wp-content/uploads/2010/02/Nature-2010-02-08-20-20-36-24.png"><img class="size-thumbnail wp-image-136  " title="Nature demo screenshot" src="http://rastergrid.com/blog/wp-content/uploads/2010/02/Nature-2010-02-08-20-20-36-24-150x150.png" alt="Nature demo screenshot" width="150" height="150" /></a><p class="wp-caption-text">OpenGL 3.2 - Nature</p></div>
<p>Since the appearance of Shader Model 4.0 people have wondered how to take advantage of the newly introduced programmable pipeline stage. The most important feature enabled by geometry shaders is that one can change the number of emitted primitives inside the pipeline. The first thing that a naive developer would try to do with it is geometry tessellation. However, the new shader stage performs very badly when used for tessellation in a real-life scenario, even though there are demos showcasing this possibility. If we take a closer look at the new feature, we observe that its most revolutionary aspect is not that it can raise the number of emitted primitives but that it can discard them. This article presents a rendering technique that takes advantage of this aspect of geometry shaders to enable GPU-accelerated culling of higher-order primitives.</p>
<p><span id="more-135"></span>Geometry shaders can be used for many advanced rendering techniques that were impossible before the introduction of this flexible programmable shader stage. In this article I would like to present one use case that seems to me to be one of the most practical applications of the primitive manipulation possibilities introduced by geometry shaders. As I haven&#8217;t seen any whitepaper talking specifically about this particular technique, even if some of them implicitly used it, I dare to name it myself: <strong>Instance Cloud Reduction</strong>. I will also present a demo program that shows how to take advantage of the technique in a heavy-workload situation.</p>
<p>The idea itself was inspired by AMD&#8217;s tech demo for the Radeon 4800 series cards called <a title="March of the Froblins" href="http://developer.amd.com/samples/demos/pages/froblins.aspx" target="_blank" onclick="pageTracker._trackPageview('/outgoing/developer.amd.com/samples/demos/pages/froblins.aspx?referer=');">March of the Froblins</a>. That demo uses a technique almost identical to the one presented in this article to cull a large number of animated creatures against the view frustum. A somewhat similar technique is used in NVIDIA&#8217;s <a title="Skinned Instancing" href="http://developer.download.nvidia.com/SDK/10/direct3d/samples.html" target="_blank" onclick="pageTracker._trackPageview('/outgoing/developer.download.nvidia.com/SDK/10/direct3d/samples.html?referer=');">Skinned Instancing</a> demo for determining LOD instance sets. Unfortunately, both demos are for DirectX only and, as far as I can tell, there is no OpenGL demo showing any of the aforementioned rendering techniques.</p>
<h3>Motivation</h3>
<p>Nowadays, as the computational capabilities of GPUs are growing at a much faster pace than those of CPUs, graphics developers face more and more optimization problems related to CPU-bound applications. More and more focus is on minimizing the number of driver invocations; in fact, that&#8217;s what motivated the restructuring of the two most commonly used graphics APIs. As a result we now have DirectX 10+ and OpenGL 3+. However, even with the introduction of geometry instancing, texture arrays and local memory buffer storage for the most important rendering inputs, graphics programmers still need to make wise decisions to take full advantage of the horsepower of the latest GPUs.</p>
<p>Earlier graphics applications relied strongly on CPU-based culling techniques, whether the quite outdated BSPs or the more generic and still heavily used hierarchical culling techniques. We&#8217;ve already reached the point where sometimes even the most efficient CPU-based culling techniques are too expensive and usually introduce the small batch problem. Instanced rendering is no exception.</p>
<p>The applicability of geometry instancing is strongly limited by several factors. One of the most important is the culling of instanced geometries. One may choose to cull these objects in the same fashion as others, using the CPU, but that usually breaks the batch, and we may lose the benefits of geometry instancing. On the other hand, sending the whole set of instances down the graphics pipeline without any culling may choke the vertex processor if we have high-poly geometries and a large number of instances. A GPU-based alternative is therefore increasingly important.</p>
<p>The rendering technique presented in this article tries to provide such an alternative. We will use a multi-pass technique that in the first pass culls the object instances against the view frustum using the GPU and in the second pass renders only those instances that are likely to be visible in the final scene. This way we can greatly reduce the amount of vertex data sent through the graphics pipeline.</p>
<h3>Implementation</h3>
<p>To some people the promise of such a technique might seem too naive, or likely to rely on very exotic OpenGL features, heavy misuse of some basic features or data conversions during frame rendering. Surprisingly, this is not the case, as OpenGL 3.2 has everything we need to implement the object culling method sketched above. All we need is the following:</p>
<ul>
<li>instanced rendering (core since OpenGL 3.1)</li>
<li>geometry shaders (core since OpenGL 3.2)</li>
<li>transform feedback (core since OpenGL 3.0)</li>
<li>uniform or texture buffers (core since OpenGL 3.1)</li>
</ul>
<p>The method itself is a multi-pass rendering technique. However, unlike other multi-pass techniques, it does not produce any fragments in the first pass; instead, the first pass does the view frustum culling and processes data entirely inside buffer objects.</p>
<h3>Culling pass</h3>
<p>In the first pass we feed the graphics pipeline with the information about the instances that is needed to perform the view frustum culling. The executed shaders need two inputs to perform the required calculations:</p>
<ol>
<li><strong>Instance transformation data</strong> (whether it be a simple transformation matrix or quaternions or whatever) -- This preferably comes from one or more buffer objects that are bound as vertex buffers to the context.</li>
<li><strong>Object extents information</strong> -- Besides the instance positions we have to know the extents of an instance in order to perform correct culling. This can be either a single float representing the object radius, if we choose to use bounding spheres for the culling, or a three-dimensional extent vector, if we would like to use bounding boxes.</li>
</ol>
<p>Using these as input, we feed the instance transformation data to our culling shader as attributes of point primitives. The culling shader is composed of a vertex and a geometry shader. In a typical setup the roles are the following: the vertex shader determines whether the bounding volume of the current object instance is inside the view frustum and passes a visibility flag to the geometry shader. The geometry shader emits the instance data to the destination buffer if the flag says that the instance is likely to be visible, and emits nothing if the instance is out of view.</p>
<p>Next, transform feedback is used to capture the primitives emitted by the geometry shader into another buffer object that will be used in the actual rendering pass to source instance transformation data. Besides this, we also need an asynchronous query that counts the primitives generated, so that we know how many instances of the object we actually need to render. The following figure shows the workflow of the first pass:</p>
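<p>The vertex shader half of the culling shader could be sketched as follows. This is only a minimal illustration that assumes bounding spheres and world-space frustum planes supplied by the application; the uniforms FrustumPlanes and ObjectRadius and the attribute InstancePosition are names made up for this example and are not taken from the demo source. The outputs are named to match the inputs of the geometry shader shown later in this section.</p>
<pre class="brush: c">#version 150 core

/* Hypothetical uniforms (not from the demo source):
   six world-space frustum planes, xyz = unit normal pointing into
   the frustum, w = plane offset, plus the bounding sphere radius. */
uniform vec4 FrustumPlanes[6];
uniform float ObjectRadius;

/* hypothetical attribute: world-space instance position */
in vec4 InstancePosition;

out vec4 OrigPosition;
flat out int objectVisible;

void main() {

	/* pass the instance data through unchanged */
	OrigPosition = InstancePosition;

	/* assume visible until the sphere lies fully behind one plane */
	objectVisible = 1;
	for (int i = 0; i &lt; 6; ++i)
	{
		float dist = dot(FrustumPlanes[i].xyz, InstancePosition.xyz) + FrustumPlanes[i].w;
		if ( dist &lt; -ObjectRadius )
			objectVisible = 0;
	}
}</pre>
<p>Testing each plane independently is conservative: a sphere close to a frustum corner may be reported as visible even though it is not, which is consistent with rendering the instances that are &#8220;likely to be visible&#8221;.</p>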
<div id="attachment_146" class="wp-caption aligncenter" style="width: 460px"><a href="http://rastergrid.com/blog/wp-content/uploads/2010/02/icr_pass1.png"><img class="size-full wp-image-146" title="Culling pass" src="http://rastergrid.com/blog/wp-content/uploads/2010/02/icr_pass1.png" alt="Culling pass" width="450" height="200" /></a><p class="wp-caption-text">Instance Cloud Reduction - Pass 1: Culling</p></div>
<p>The geometry shader that performs the actual culling, based on the view frustum check done by the vertex shader, should look like the following:</p>
<pre class="brush: c">#version 150 core

layout(points) in;
layout(points, max_vertices = 1) out;

in vec4 OrigPosition[1];
flat in int objectVisible[1];

out vec4 CulledPosition;

void main() {

	/* only emit primitive if the object is visible */
	if ( objectVisible[0] == 1 )
	{
		CulledPosition = OrigPosition[0];
		EmitVertex();
		EndPrimitive();
	}
}</pre>
<p>In this example we used only a simple four-component position vector for the instance transformation data, but the technique works just as well for transformation matrices and quaternions.</p>
<p>One more thing: besides setting up transform feedback to write into the buffer object dedicated to the culled instance data, and starting an asynchronous query to determine the number of primitives written into that buffer, it is also useful to turn off rasterization (for example by enabling GL_RASTERIZER_DISCARD), as we don&#8217;t want the first pass to produce any fragments.</p>
<h3>Rendering pass</h3>
<p>In the second pass there is nothing special to do. Simply use whatever rendering setup you would like to use. Only two things change compared to your existing rendering path: the instance data must be sourced from the culled instance data buffer generated in the first pass, and the number of instances passed to the instanced drawing functions must be changed so that only the visible instances are rendered. This number can be read from the result of the asynchronous query that we started in the first pass.</p>
<p>The instance data in the rendering pass can be, of course, sourced from either a uniform or a texture buffer object. This depends on the actual use case and is more clearly explained in the article <a href="http://rastergrid.com/blog/2010/01/uniform-buffers-vs-texture-buffers/">Uniform Buffers VS Texture Buffers</a>.</p>
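<p>As an illustration of the texture buffer variant, the vertex shader of the rendering pass might fetch the per-instance position like this. It is a minimal sketch that assumes one four-component float texel per visible instance, written by the culling pass; CulledInstances, ViewProjection and VertexPosition are hypothetical names, and lighting and texturing are left out.</p>
<pre class="brush: c">#version 150 core

/* Hypothetical names; the demo's actual identifiers may differ. */
uniform samplerBuffer CulledInstances;	/* buffer filled by the culling pass */
uniform mat4 ViewProjection;

in vec3 VertexPosition;

void main() {

	/* one texel per visible instance, indexed by the instance ID */
	vec4 instancePosition = texelFetch(CulledInstances, gl_InstanceID);

	/* translate the model vertex to the instance's world-space position */
	gl_Position = ViewProjection * vec4(VertexPosition + instancePosition.xyz, 1.0);
}</pre>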
<p>An important note: when one has to deal with several instanced geometries, it is recommended to run the culling pass for all of them before rendering any instanced primitives, for the following reasons:</p>
<ul>
<li>The culling of the first instance cloud is more likely to have finished on the GPU, so reading the asynchronous query result to determine the number of visible instances causes no sync issues.</li>
<li>Probably fewer state changes are needed, as the two passes require very different setups.</li>
<li>It results in a tidier renderer design, as culling is clearly separated from actual rendering.</li>
</ul>
<p>Putting everything together, the application of the presented technique would result in the following workflow on the GPU:</p>
<div id="attachment_150" class="wp-caption aligncenter" style="width: 660px"><a href="http://rastergrid.com/blog/wp-content/uploads/2010/02/icr_combined.png"><img class="size-full wp-image-150" title="Instance Cloud Reduction" src="http://rastergrid.com/blog/wp-content/uploads/2010/02/icr_combined.png" alt="Instance Cloud Reduction" width="650" height="347" /></a><p class="wp-caption-text">Instance Cloud Reduction - Combined view of Pass 1 + Pass 2</p></div>
<h3>Conclusion</h3>
<p>We&#8217;ve seen how the presented rendering technique helps when we have to deal with a large number of instanced geometries, and how to take advantage of the latest features of graphics cards and OpenGL to perform view frustum culling on the GPU. This saves us from complicated and expensive CPU-based object culling methods that break the drawing batches, especially when dealing with dynamic objects. To ease the decision whether to incorporate this technique in your rendering engine, I would like to present its advantages and disadvantages.</p>
<p><strong>Advantages:</strong></p>
<ul>
<li>Heavily reduces the amount of processed data compared to a naive implementation.</li>
<li>No need for any space partitioning methods in the host application to handle the culling of dynamic objects.</li>
<li>Can handle a huge number of instanced objects thanks to the enormous horsepower of today&#8217;s GPUs.</li>
<li>Scales well with an increasing number of instances, as the per-instance cost is relatively low.</li>
<li>Relies strictly on OpenGL 3.2 core features.</li>
<li>No need for OpenCL capable hardware.</li>
</ul>
<p><strong>Disadvantages:</strong></p>
<ul>
<li>Needs an extra rendering pass to perform the culling.</li>
<li>Requires the usage of asynchronous queries to determine the number of visible instances.</li>
</ul>
<p>I hope you agree with me and see this technique as one more step towards fully GPU-based scene management. If you have any remarks or improvement ideas regarding the rendering technique itself, feel free to tell me.</p>
<h3>The Demo</h3>
<p>As promised, the technique presented above comes with a live demo, which actually took most of the time I dedicated to this blog in the last two weeks. The demo is more of a technical showcase than a presentation of a real-life use case.</p>
<p>First of all, I used high polygon count models to emphasize how much valuable GPU time the culling phase saves. In a real-world application one would never do something like this. As a result, the demo is more like a benchmark than an interactive application. However, on high-end graphics cards it may perform pretty well.</p>
<p>The demo scene consists of two object types: trees and grass blocks. The tree model is further divided into two parts, the trunk and the foliage, as they need different textures. Obviously, this additional burden could be avoided by using texture arrays, which would remove the need for separate draw calls to render the trunk and the foliage.</p>
<p>The tree trunk consists of 33138 triangles, the tree foliage has 16069 triangles and the faking-free grass block consists of 8961 triangles, which I had to model myself as I didn&#8217;t find any suitable model. This modeling step actually consumed a considerable part of the time I spent on the demo, as I&#8217;m not an expert in this domain. As you can see, these are not models that one would use in an interactive real-time application like a game. However, they seemed very suitable for the purpose of the demonstration.</p>
<p>What really pushes the limits of GPUs is that the demo renders 10,000 trees and 250,000 grass blocks using instancing. That is 10,000 &#215; (33138 + 16069) + 250,000 &#215; 8961 triangles, which ends up at more than <strong>2.7 billion triangles</strong> in the scene. This is far more than a GPU can handle without the aid of some scene management and culling. However, we will use no scene management at all, and the only culling method that we will use is the one presented in this article.</p>
<p>The actual results are quite promising. The view frustum culling step usually spares more than <strong>99.9%</strong> of the GPU horsepower, as the number of triangles actually rendered after the culling step is far below 2 million. This is still quite a lot, but as we use high polygon count models and no LOD techniques, it seems reasonable.</p>
<p>Even if the demo scene statistics don&#8217;t look like a typical use case, the ease of implementation and the compelling visual results pleased me anyway:</p>
<p style="text-align: center;"><span class="youtube">
<object width="640" height="480">
<param name="movie" value="http://www.youtube.com/v/srbOFTLTe8k?color1=3a3a3a&amp;color2=999999&amp;border=0&amp;fs=1&amp;hl=en&amp;modestbranding=1&amp;loop=&amp;showinfo=0&amp;iv_load_policy=3&amp;showsearch=0&amp;rel=1&amp;hd=1" />
<param name="allowFullScreen" value="true" />
<embed wmode="opaque" src="http://www.youtube.com/v/srbOFTLTe8k?color1=3a3a3a&amp;color2=999999&amp;border=0&amp;fs=1&amp;hl=en&amp;modestbranding=1&amp;loop=&amp;showinfo=0&amp;iv_load_policy=3&amp;showsearch=0&amp;rel=1&amp;hd=1" type="application/x-shockwave-flash" allowfullscreen="true" width="640" height="480"></embed>
<param name="wmode" value="opaque" />
</object>
</span><p><a href="http://www.youtube.com/watch?v=srbOFTLTe8k&fmt=18" onclick="pageTracker._trackPageview('/outgoing/www.youtube.com/watch?v=srbOFTLTe8k_fmt=18&amp;referer=');">www.youtube.com/watch?v=srbOFTLTe8k</a></p></p>
<p>On my Radeon HD2600XT I achieved 6-7 frames per second, which is acceptable taking into consideration the huge amount of geometry data still passed to the graphics card. On more recent cards I suppose it should run at good frame rates; however, due to the lack of hardware to test on, these are my only results. If anybody manages to take a better screen capture than mine above, please let me know.</p>
<h3>Implementation details</h3>
<p>To say a few words about the techniques and tricks I&#8217;ve used while creating the demo, here is a list of the most important ones:</p>
<ul>
<li>Three models are used with high instance counts, giving over 2.7 billion triangles in the scene in total, as mentioned above.</li>
<li>Three 512x512 RGBA textures are used for the models. They are partially handmade and, again, I&#8217;m not a texture artist, so sorry if they don&#8217;t look flawless.</li>
<li>The Wavefront model and TGA image loaders that accompany the demo are very roughly implemented for the demo only, so I would strongly encourage you not to use them for any other purpose, as they handle only a subset of the possibilities of the file formats.</li>
<li>The vertex data from the Wavefront model files is transferred in a very naive way, so vertex reuse isn&#8217;t taken into account.</li>
<li>The instance data consists of simple four-component vectors representing the world-space position of the instance. This seemed to be the simplest choice for demonstration purposes.</li>
<li>In the second pass, the instance data is sourced from a texture buffer, though not because the visible instance count exceeded the amount that would fit in a uniform buffer. I used texture buffers because for this simple demonstration they seemed a little easier to integrate.</li>
<li>The morphing effect that simulates wind is done using hard-coded geometry deformation in the vertex shader. It is not physically correct but visually compelling.</li>
<li>The lighting is a simple directional light using Phong&#8217;s shading and reflection model.</li>
<li>Simple fog is simulated with some awkward formula that I&#8217;ve chosen after a few test runs.</li>
<li>Alpha testing is achieved by using the discard operation in the fragment shader.</li>
</ul>
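<p>The last item of the list, alpha testing with discard, can be sketched in a few lines. This is a minimal example rather than the demo&#8217;s actual fragment shader: the names DiffuseTexture, TexCoord and FragColor and the 0.5 threshold are assumptions, and the lighting and fog terms are omitted.</p>
<pre class="brush: c">#version 150 core

/* Hypothetical names, not taken from the demo source. */
uniform sampler2D DiffuseTexture;

in vec2 TexCoord;

out vec4 FragColor;

void main() {

	vec4 color = texture(DiffuseTexture, TexCoord);

	/* alpha test: drop mostly transparent texels (0.5 is an assumed threshold) */
	if ( color.a &lt; 0.5 )
		discard;

	FragColor = color;
}</pre>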
<h3>Driver issues</h3>
<p>During the development of the demo I met several driver-related problems, as I had never used the latest OpenGL features so heavily before. I worked with Catalyst 9.12 and 10.1, but both seemed to lack a proper GLSL compiler. Here are some of the issues I met:</p>
<ul>
<li>When I forgot to declare the varyings in the geometry shader as arrays, as the standard requires, the driver didn&#8217;t complain about any syntax error, but the program crashed when it tried to execute the code.</li>
<li>Except for the texture sampler uniform, all other uniforms failed to work when used only in the fragment shader, so I&#8217;ve put them all in the vertex shader.</li>
<li>For loops seemed not to work inside the geometry shader; that&#8217;s why the culling itself is done in the vertex shader in the demo.</li>
</ul>
<p>All these problems required nasty tricks to make things work and ended up in awful shader code. Sorry for that. At least it now works on my configuration, but I&#8217;m pretty unsure whether it will work on other graphics card and driver combos. Please report any success or failure when trying out the demo. Anyway, be sure to have the latest graphics drivers installed as, at least in the case of AMD, OpenGL 3.2 drivers came out only in the fall of 2009.</p>
<p><em><strong>Edit:</strong></em></p>
<p><em>Thanks to the information I got from Pierre Boudier from AMD, I&#8217;ve updated both the source and binary releases to support the latest drivers properly. The problem was that I didn&#8217;t use attribute location binding as specified in the standard.</em></p>
<p><em>I also have to mention that with my new Radeon HD5770 I managed to achieve over 90 frames per second, which shows that this technique can in fact be used for games and other interactive applications.</em></p>
<p><em>One more thing in the end. As you know, this version of the Nature demo uses a texture buffer to source instance positions. I plan to create another version that takes advantage of the instanced arrays introduced in core with OpenGL 3.3. I expect quite a reasonable speedup, as that would eliminate the need for texture fetches in the vertex shader by dedicating a vertex fetcher to the purpose, thus increasing the overall performance of the technique.</em></p>
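<p>For comparison, a vertex shader for such an instanced arrays version might look like the sketch below. It is not part of the demo: the names and attribute locations are hypothetical, and it assumes that the host code marks the second attribute as per-instance, for example with glVertexAttribDivisor(1, 1).</p>
<pre class="brush: c">#version 330 core

/* Hypothetical names and locations, not taken from the demo source. */
uniform mat4 ViewProjection;

layout(location = 0) in vec3 VertexPosition;

/* per-instance attribute: assumed to advance once per instance,
   set up by the host with glVertexAttribDivisor(1, 1) */
layout(location = 1) in vec4 InstancePosition;

void main() {

	/* no texture fetch needed: the instance position arrives as an attribute */
	gl_Position = ViewProjection * vec4(VertexPosition + InstancePosition.xyz, 1.0);
}</pre>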
<h3>Binary release</h3>
<p><strong>Platform:</strong> Windows<br />
<strong>Dependency:</strong> OpenGL 3.2 capable graphics driver<br />
<strong>Download link:</strong> <a href="http://rastergrid.com/blog/wp-content/uploads/2010/06/nature12_win32.zip" target="_blank">nature12_win32.zip (3.58MB)<br />
</a><strong>Comments:</strong> Includes the update that makes it work even with the latest drivers.</p>
<h3>Full source code</h3>
<p><strong>Language:</strong> C++<br />
<strong>Platform:</strong> cross-platform<br />
<strong>Dependency:</strong> GLEW, SFML, GLM<br />
<strong>Download link:</strong> <a href="http://rastergrid.com/blog/wp-content/uploads/2010/06/nature12_src.zip" target="_blank">nature12_src.zip (12.6KB)<br />
</a><strong>Comments:</strong> Sorry for the many dependencies, however, I would recommend the mentioned libraries for everybody who is doing OpenGL development.</p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/feed/</wfw:commentRss>
		<slash:comments>46</slash:comments>
		</item>
	</channel>
</rss>

