<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>RasterGrid Blog</title>
	<atom:link href="http://rastergrid.com/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://rastergrid.com/blog</link>
	<description>A technical blog from Daniel Rákos (aka aqnuep)</description>
	<lastBuildDate>Wed, 30 Jun 2010 21:30:51 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Instance Cloud Reduction reloaded</title>
		<link>http://rastergrid.com/blog/2010/06/instance-cloud-reduction-reloaded/</link>
		<comments>http://rastergrid.com/blog/2010/06/instance-cloud-reduction-reloaded/#comments</comments>
		<pubDate>Wed, 30 Jun 2010 19:36:38 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[attribute divisor]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[culling]]></category>
		<category><![CDATA[geometry instancing]]></category>
		<category><![CDATA[geometry shader]]></category>
		<category><![CDATA[GLEW]]></category>
		<category><![CDATA[GLM]]></category>
		<category><![CDATA[GLSL]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[instanced array]]></category>
		<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[SFML]]></category>
		<category><![CDATA[texture buffer]]></category>
		<category><![CDATA[transform feedback]]></category>
		<category><![CDATA[uniform buffer]]></category>
		<category><![CDATA[vertex buffer]]></category>
		<category><![CDATA[vertex shader]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=251</guid>
		<description><![CDATA[
A few months ago I&#8217;ve presented an object culling mechanism that I&#8217;ve named Instance Cloud Reduction (ICR) in the article Instance culling using geometry shaders. The technique targets the first generation of OpenGL 3 capable cards and takes advantage of geometry shaders&#8217; capability to reduce the emitted geometry amount in order to get to a [...]]]></description>
			<content:encoded><![CDATA[
<div class="wp-caption alignleft" style="width: 160px"><img src="http://rastergrid.com/blog/wp-content/uploads/2010/02/Nature-2010-02-08-20-20-36-24-150x150.png" alt="" width="150" height="150" /><p class="wp-caption-text">OpenGL 3.3 - Nature</p></div>
<p>A few months ago I presented an object culling mechanism that I named Instance Cloud Reduction (ICR) in the article <a title="Instance culling using geometry shaders" href="http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/">Instance culling using geometry shaders</a>. The technique targets the first generation of OpenGL 3 capable cards and takes advantage of the geometry shader&#8217;s ability to reduce the amount of emitted geometry. The result is a fully GPU accelerated algorithm that performs view frustum culling on instanced geometry without the need for OpenCL or any other GPU compute API. After the culling step, the reduced set of instance data is fed to the drawing pass in the form of a texture buffer. In this article I will present an improved version of the algorithm that uses instanced arrays, recently introduced in OpenGL 3.3, to optimize it further.</p>
<p><span id="more-251"></span>Let&#8217;s recap the basics of the algorithm before I present the improved technique. Geometry shaders have a very nice feature: they can not only emit a modified version of the input geometry but can also change the number of emitted primitives compared to the number of received ones. This works both ways, which means that we can decrease the number of primitives as well as increase it. This is what the technique takes advantage of.</p>
<p>In the first pass we feed a simple vertex shader &#8211; geometry shader pair with the instance data of the geometries as if it were the data of point primitives. The vertex shader checks whether the current instance is inside the view frustum and sends the result to the geometry shader. If the instance is visible, the geometry shader outputs the instance data; otherwise it discards it. The primitives emitted by the geometry shader are then captured into a buffer object using transform feedback. A query object is also needed to get the number of instances that passed the view frustum culling. In the drawing pass we use the result of the query to decide how many instances to draw, and the captured feedback buffer is used as instance data.</p>
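<p>To make the data flow of the culling pass concrete, here is a minimal CPU-side sketch of what the two shader stages compute together. It is only an illustration and not the code of the demo: the <code>Vec3</code> and <code>Plane</code> types, the function names and the bounding-sphere test are all assumptions made for this example. On the GPU the visibility test runs in the vertex shader, the conditional emit happens in the geometry shader, the output array is the transform feedback buffer, and its size is what the query object reports.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper types for this sketch (the demo itself uses GLM and GLSL).
struct Vec3 { float x, y, z; };

// Plane in the form dot(n, p) + d = 0, with the normal pointing into the frustum.
struct Plane { Vec3 n; float d; };

// Signed distance of point p from the plane; positive means on the inner side.
inline float signedDistance(const Plane& plane, const Vec3& p) {
    return plane.n.x * p.x + plane.n.y * p.y + plane.n.z * p.z + plane.d;
}

// Role of the vertex shader: decide whether an instance is inside the frustum.
// Assumption: every instance is approximated by a bounding sphere of the same radius.
inline bool isVisible(const std::vector<Plane>& frustum, const Vec3& center, float radius) {
    for (const Plane& plane : frustum) {
        if (signedDistance(plane, center) < -radius) {
            return false; // completely behind one plane, so outside the frustum
        }
    }
    return true;
}

// Role of the geometry shader plus transform feedback: emit the instance data only
// for visible instances. The size of the returned vector plays the role of the
// "primitives written" query result used by the drawing pass.
inline std::vector<Vec3> cullInstances(const std::vector<Plane>& frustum,
                                       const std::vector<Vec3>& instances,
                                       float radius) {
    std::vector<Vec3> visible;
    visible.reserve(instances.size());
    for (const Vec3& position : instances) {
        if (isVisible(frustum, position, radius)) {
            visible.push_back(position);
        }
    }
    return visible;
}
```

<p>The output keeps the order of the input and contains only the instances that survived, which is exactly the compacted buffer that the drawing pass consumes.</p>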
<div class="wp-caption aligncenter" style="width: 660px"><img src="http://rastergrid.com/blog/wp-content/uploads/2010/02/icr_combined.png" alt="" width="650" height="347" /><p class="wp-caption-text">Instance Cloud Reduction - Combined view of Pass 1 + Pass 2</p></div>
<p>This is a very brief description of the culling mechanism so for a complete specification please read the <a title="Instance culling using geometry shaders" href="http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/">original article</a>.</p>
<h3>Motivation</h3>
<p>While Instance Cloud Reduction is a quite robust technique that can greatly simplify and speed up the rendering of large amounts of instanced geometry, its performance is limited by some hardware and API restrictions. The most important ones are the following:</p>
<ul>
<li>Needs an extra rendering pass to perform the culling.</li>
<li>Requires the usage of asynchronous queries to determine the number of visible instances.</li>
<li>Uses texture fetching in the vertex shader of the actual drawing pass.</li>
</ul>
<p>The first drawback means that more draw commands are required, and that the later ones use the output of the first pass as input. This and the second disadvantage may cause stalls because the CPU has to wait for the data to be ready before issuing the second pass, so the GPU is not used effectively.</p>
<p>This improvement tries to solve the third problem. Texture fetching itself is quite fast on the latest generation of hardware; however, texture fetches still introduce latency that causes some slowdown, even though GPUs use latency hiding techniques.</p>
<p>Instanced arrays provide a way to replace texture fetching with vertex fetching, which is usually done by a different hardware unit that works synchronously with the execution of vertex shaders. I expected a reasonable speedup from taking advantage of instanced arrays; however, as we will see, the actual results were far from my initial expectations.</p>
<h3>Implementation</h3>
<p>Traditional vertex fetching happens in a way that one element is fetched from each enabled input attribute buffer and the vertex shader is issued with these values. One element in a vertex attribute buffer can mean up to four floating point or integer values and for each execution of the vertex shader one set of these elements is used. There is an internal counter that is increased after each fetch and the next vertex attribute fetch will use this counter as an index into the buffer object.</p>
<p>While this mechanism is satisfactory for most attributes of a vertex, it is not practical for instance data, as such data belongs to an instance rather than a vertex. To source instance data from vertex attributes with traditional vertex fetching, a large amount of redundant storage is required, because the same information has to be repeated for every vertex belonging to a particular instance. This is not just a waste of memory but also a waste of bandwidth, and it defeats the goal of Instance Cloud Reduction.</p>
<p>Compared to traditional vertex fetching, instanced arrays provide a different way to advance the internal counter used as the index into the vertex attribute buffer. In particular, one can set the frequency of the increase using a vertex attribute divisor that specifies after how many instances the counter shall be increased. This is a per-attribute property, and by setting it to one we end up with exactly what we need: one vertex fetch per instance.</p>
<p>This means that we need just a very minor change compared to the original technique. More precisely, we replace our texture buffer with a vertex attribute buffer that has a divisor of one and use it as the source of instance data in the vertex shader of the drawing pass.</p>
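<p>The effect of the divisor on the fetch index can be sketched in a few lines. In OpenGL 3.3 the divisor itself is set with <code>glVertexAttribDivisor(index, divisor)</code> (<code>glVertexAttribDivisorARB</code> with the extension); the function below is only a hypothetical CPU model of the index computation, written for this explanation, and it ignores any base vertex or base instance offsets.</p>

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of which element of an attribute array is fetched for a
// given vertex of a given instance.
//   divisor == 0: traditional fetching, the index advances with every vertex.
//   divisor  > 0: instanced array, the index advances once every `divisor` instances.
inline std::size_t fetchIndex(std::size_t vertexId,
                              std::size_t instanceId,
                              std::size_t divisor) {
    return divisor == 0 ? vertexId : instanceId / divisor;
}
```

<p>With a divisor of one, every vertex of instance <em>i</em> reads element <em>i</em> of the captured feedback buffer, so the instance data is stored once per instance instead of once per vertex.</p>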
<h3>Execution results</h3>
<p>As we are not talking about a new technique but just an optimized implementation of the same method, the best way to evaluate it is by comparing the performance of the new version with the original one.</p>
<p>As I mentioned earlier, I expected a reasonable performance increase from replacing texture fetches with vertex fetches, but in practice the difference was not that significant. However, the performance difference between the two implementations can depend heavily on the underlying hardware, so cards from different vendors and GPU generations may behave quite differently. In fact, even driver versions may have an effect on the results.</p>
<div class="wp-caption aligncenter" style="width: 620px"><img class="  " src="http://rastergrid.com/blog/wp-content/uploads/2010/06/comparison.png" alt="" width="610" height="139" /><p class="wp-caption-text">Performance comparison of the old implementation and the presented one on an AMD Radeon HD5770. Scale is in frames per second (higher value is better).</p></div>
<p>Due to lack of hardware to use for testing, I checked only with one card, namely a Radeon HD5770 with Catalyst 10.6 drivers. I noticed roughly a 10% speedup, as the new version of the Nature demo showed 100 FPS compared to the 90 FPS observed with the old implementation.</p>
<p>Even though this was not exactly the outcome I expected from the new implementation, the assumption may still hold for older generations of GPUs or for NVIDIA cards. I suspect so because on Shader Model 4.0 cards the hardware implementations of the texture fetching unit and the vertex fetching unit were most probably more different from each other than on the latest GPUs. My guess is also that the difference may be larger on NVIDIA cards, as the vertex fetching hardware in SM 4.0 GeForce cards is less flexible than that of AMD&#8217;s, considering that the first HD series Radeons already had some form of tessellation functionality that requires more freedom from the vertex pushing hardware.</p>
<p>In order to get a better picture about how effective the presented optimization is, I would like to ask all the visitors of this post to try the two releases and send me feedback about it.</p>
<h3>Conclusion</h3>
<p>We&#8217;ve seen how easy it is to take advantage of instanced arrays in an existing implementation of the ICR technique and how it performs on the latest generation of GPUs compared to the previous version. While this small addition provides some benefits, it also comes at a cost, and we have to talk about that as well.</p>
<p><strong>Advantages:</strong></p>
<ul>
<li>Eliminates the need for texture fetching in the vertex shader thus improving performance.</li>
<li>Does not compromise the goal and the implementation architecture of the original method.</li>
<li>Frees up one texture unit that was previously reserved for the texture buffer containing the instance data.</li>
</ul>
<p><strong>Disadvantages:</strong></p>
<ul>
<li>Requires OpenGL 3.3 or the <a title="GL_ARB_instanced_arrays" href="http://www.opengl.org/registry/specs/ARB/instanced_arrays.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/instanced_arrays.txt?referer=');">GL_ARB_instanced_arrays</a> extension in addition to the OpenGL 3.2 features.</li>
<li>We may have to sacrifice multiple vertex input attributes to feed the instance data to the shaders.</li>
</ul>
<p>Most of the mentioned benefits and drawbacks are self-explanatory, however I would like to say a few words about the last mentioned one&#8230;</p>
<p>For the purposes of the demo I used a simple translation as instance data, which means a single vector of floats. In a real-life situation one may need more complex transformation data that can only be stored in a matrix. While in the demo the instance data consumed only one vertex attribute slot, a full transformation matrix would require four of them (not to mention other possible instance attributes). As the maximum number of input attributes is severely limited, usually to 16, the optimization can only be applied when all the vertex and instance attributes fit into this limit.</p>
<p>In the original implementation, where a texture buffer was used as input, this did not cause any problem, as the vertex shader is free to fetch any number of texels from it (still, performance can be a concern in this case). To help in situations where input attribute slots are at a premium, in real-life scenarios it is recommended to use quaternions instead of transformation matrices, as they consume half as many attribute slots. Actually, this can be a general recommendation, as using quaternions decreases the bandwidth requirements of the instance data fetch and thus increases performance even when there are enough input attribute slots available.</p>
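<p>As an illustration of the quaternion recommendation, the sketch below applies a rotation quaternion plus a translation to a vertex position. This is the kind of work the vertex shader of the drawing pass would do with two vec4 instance attributes instead of the four needed for a 4x4 matrix. The type and function names are assumptions made for this example, the quaternion is assumed to be of unit length, and scaling is not covered.</p>

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper types for this sketch.
struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; }; // assumed to be a unit quaternion

inline Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Rotates v by the unit quaternion q using the identity
//   v' = v + w * t + cross(u, t), where u = (q.x, q.y, q.z) and t = 2 * cross(u, v).
inline Vec3 rotate(const Quat& q, const Vec3& v) {
    const Vec3 u{ q.x, q.y, q.z };
    Vec3 t = cross(u, v);
    t = { 2.0f * t.x, 2.0f * t.y, 2.0f * t.z };
    const Vec3 c = cross(u, t);
    return { v.x + q.w * t.x + c.x,
             v.y + q.w * t.y + c.y,
             v.z + q.w * t.z + c.z };
}

// Per-instance transform: rotation (one attribute slot) followed by
// translation (a second attribute slot).
inline Vec3 transformInstance(const Quat& rotation, const Vec3& translation, const Vec3& v) {
    const Vec3 r = rotate(rotation, v);
    return { r.x + translation.x, r.y + translation.y, r.z + translation.z };
}
```

<p>The same few multiply-adds translate directly to GLSL, so the saved attribute slots cost only a handful of extra instructions per vertex.</p>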
<p>In order to ease the performance comparison for you, you can find download links for both versions of the Nature demo.</p>
<h3>Old version binary release</h3>
<p><strong>Platform:</strong> Windows<br />
<strong>Dependency:</strong> OpenGL 3.2 capable graphics driver<br />
<strong>Download link:</strong> <a href="http://rastergrid.com/blog/wp-content/uploads/2010/06/nature12_win32.zip">nature12_win32.zip (3.58MB)</a><br />
<strong>Comments:</strong> This version does <strong>NOT </strong>include the optimization presented in this article.</p>
<h3>Old version source code</h3>
<p><strong>Language:</strong> C++<br />
<strong>Platform:</strong> cross-platform<br />
<strong>Dependency:</strong> GLEW, SFML, GLM<br />
<strong>Download link:</strong> <a href="http://rastergrid.com/blog/wp-content/uploads/2010/06/nature12_src.zip">nature12_src.zip (12.6KB)</a><br />
<strong>Comments:</strong> This version does <strong>NOT </strong>include the optimization presented in this article.</p>
<h3>New version binary release</h3>
<p><strong>Platform:</strong> Windows<br />
<strong>Dependency:</strong> OpenGL 3.3 capable graphics driver<br />
<strong>Download link:</strong> <a href="http://rastergrid.com/blog/wp-content/uploads/2010/06/nature20_win32.zip">nature20_win32.zip (3.58MB)</a><br />
<strong>Comments:</strong> This version includes the optimization presented in this article.</p>
<h3>New version source code</h3>
<p><strong>Language:</strong> C++<br />
<strong>Platform:</strong> cross-platform<br />
<strong>Dependency:</strong> GLEW, SFML, GLM<br />
<strong>Download link:</strong> <a href="http://rastergrid.com/blog/wp-content/uploads/2010/06/nature20_src.zip">nature20_src.zip (12.8KB)</a><br />
<strong>Comments:</strong> This version includes the optimization presented in this article.</p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/06/instance-cloud-reduction-reloaded/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Common pitfalls of iPhone development</title>
		<link>http://rastergrid.com/blog/2010/05/common-pitfalls-of-iphone-development/</link>
		<comments>http://rastergrid.com/blog/2010/05/common-pitfalls-of-iphone-development/#comments</comments>
		<pubDate>Mon, 10 May 2010 19:04:20 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Telecommunication]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[IDE]]></category>
		<category><![CDATA[iPhone]]></category>
		<category><![CDATA[Mac OS X]]></category>
		<category><![CDATA[mobile technology]]></category>
		<category><![CDATA[Objective-C]]></category>
		<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[phone]]></category>
		<category><![CDATA[Visual Studio]]></category>
		<category><![CDATA[Xcode]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=234</guid>
		<description><![CDATA[
I haven&#8217;t written any posts lately. This is because I dug into iPhone application development and this really consumed most of my spare time. As you may remember, I&#8217;ve already mentioned that I would like to start dealing with mobile platforms as a target for my OpenGL related experiments and projects.  After Android, this time [...]]]></description>
			<content:encoded><![CDATA[
<p>I haven&#8217;t written any posts lately. This is because I dug into iPhone application development and this really consumed most of my spare time. As you may remember, I&#8217;ve already mentioned that I would like to start dealing with mobile platforms as a target for my OpenGL related experiments and projects. After Android, this time I got my hands on a Mac mini and took a look at the currently most popular mobile gaming platform. Actually, these initial experiments wouldn&#8217;t have taken that long if I had had to deal with just a new API and not with a brand new world with its own benefits and drawbacks.</p>
<p><span id="more-234"></span>I have long experience in using Windows and Linux as development platforms, with tons of different development environments and programming languages. Besides that, I&#8217;ve also done some Mac application development, at least if a cross-platform application that works on all three of these desktop operating systems counts as such. Taking these facts into consideration, I thought that starting to develop under Mac OS X for the iPhone platform would be a piece of cake, as I would just have to master one more programming language and API, namely Objective-C and Cocoa Touch. Well, it turned out that I was too optimistic and it is not as simple as it looks (at least for me, who hardly ever used a Macintosh before).</p>
<h2><img class="alignleft" title="Mac OS X Leopard" src="http://www.rastergrid.com/blog/wp-content/uploads/2010/05/leopard-logo.png" alt="Mac OS X Leopard" width="150" height="172" />The GUI of Mac OS X Leopard</h2>
<p>The first thing I found very unusual is the user interface of the operating system. Primarily, I&#8217;m talking about the following little differences:</p>
<ul>
<li>The menu of the windows is at the top of the desktop, not the top of the window.</li>
<li>The system buttons on the window title bar are on the left side, not on the right side.</li>
<li>Double-clicking the title bar doesn&#8217;t maximize the window, it minimizes it instead.</li>
<li>The most used key combinations like Copy-Paste and stuff like that are also different.</li>
</ul>
<p>Okay, I know that these are things that everybody who has already seen the GUI of Mac OS X knows, but still, it&#8217;s quite annoying that Apple is going totally against the rest of the PC world. I agree that different operating systems can define their own direction regarding the layout of the GUI, but I wonder why I didn&#8217;t notice such huge conceptual differences when I first started to work on Linux after several years of Windows user experience?</p>
<p>Anyway, these are just small things and it&#8217;s just a matter of time to get used to the new interface. Also, I can say that the Mac OS X user experience is excellent. I don&#8217;t say that it is any better than that of Windows or some well-made Linux distributions but it is not worse either.</p>
<h2><img class="alignleft" title="Xcode" src="http://www.rastergrid.com/blog/wp-content/uploads/2010/05/xcode.png" alt="" width="150" height="150" />The Xcode IDE</h2>
<p>Those who heard anything about iPhone development know that the only legal possibility to write applications for this mobile platform is to do it under Mac OS X using the Xcode development environment. I know that thousands of people will blame me for what I&#8217;m going to say but I sincerely think that Xcode is the worst development environment I&#8217;ve worked with lately.</p>
<p>First of all, it was completely alien to me even though I have worked with dozens of development environments, ranging from basic text editors to full-fledged IDEs with every tool integrated into them. While the most used editor features like syntax highlighting and code completion work flawlessly, a large number of features that are common in other IDEs are missing, like a source code outline window. But this is not the only thing I can complain about&#8230;</p>
<p>As I mentioned before, I have dealt with many different development environments, the most notable ones being CodeGear RAD Studio, Visual Studio and Eclipse. What is common to these (and a lot of other Windows and Linux IDEs) is that if you migrate from one to another you most probably won&#8217;t notice much except some minor differences in key bindings. I think this is actually how it should work: if I were the head of a software development company, I would invest in a development environment that can be easily learned no matter what my future employees used earlier. This reduces competence development cost and enables the programmers to work effectively within a relatively short time-frame. Well, it seems that Apple disagrees with me, as comparing the aforementioned IDEs with Xcode is like comparing an armchair with a stool.</p>
<p>In spite of all its faults, Xcode also has some clever solutions for some common tasks that make it popular, even though Apple has again gone against the rest of the world with this software. So I don&#8217;t say that Xcode is completely useless, but what&#8217;s for sure is that it has a long way to go in order to be able to compete with the other development environments out there. I don&#8217;t even understand why they didn&#8217;t just make a simple Eclipse plugin for their purposes like all the other players do.</p>
<p>While maybe I&#8217;ve already scared many of you from using Xcode, yet I haven&#8217;t talked about the &#8220;feature&#8221; that made me the most frustrated about my new development platform. As I already mentioned, there are big differences between the key bindings of usual PC platforms and Mac OS X. I haven&#8217;t really worried about them, because for the basic navigation it is not a huge burden to get used to it, but when it comes to code writing I&#8217;m more choosy&#8230;</p>
<p>It seems that in the Mac world the Home and End keys have a totally different meaning: they don&#8217;t move the cursor within the current line but within the whole file. As I use these keys heavily during source code editing, I figured it would be a huge burden to get used to this little but important difference. Fortunately, Xcode makes it possible to change the key bindings of the editor, but I was already quite pessimistic about how much hassle the key bindings would cause, so I thought one step further and started to google for a quasi-normal key binding set for Xcode. Soon, I found a Visual Studio-like one <a title="Code Dojo - How to make Xcode feel like MS Visual Studio (MSVC)" href="http://www.codedojo.com/?p=580" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.codedojo.com/?p=580&amp;referer=');">here</a> that you can download below:</p>
<p><a title="Visual Studio Key Bindings for Xcode" href="http://rastergrid.com/blog/wp-content/uploads/2010/05/files/MSVC_xcode_config.zip">Visual Studio Key Bindings for Xcode</a></p>
<p>In order to install it, you only need to unzip the file <strong>MSVC.pbxkeys</strong>, copy it to /Users/YourUserName/Library/Application Support/Xcode/Key Bindings/ (if the directory doesn&#8217;t exist, just create it), restart Xcode and select the configuration set in the key binding tab of Xcode&#8217;s preferences window.</p>
<h2><img class="alignleft" title="Apple Developer Program" src="http://www.rastergrid.com/blog/wp-content/uploads/2010/05/apple-logo.png" alt="" width="150" height="184" />The Apple Developer Program</h2>
<p>While the new key configuration solved most of the problems that prevented me from working efficiently on my first iPhone application, I soon realized how much hassle an iPhone developer has to deal with on a daily basis&#8230;</p>
<p>As soon as I mastered the very basics of developing for the iPhone OS, I registered for the Apple Developer Program. It costs $99 per year but it is most probably worth its price: besides the fact that they deal with the publishing and marketing of your application, according to other developers I know, they also give adequate support for the money. The registration itself was very straightforward. I just had to fill in and fax an application form, and the next day my developer program was already active. What outraged me in the end is the amount of administrative work that is required from iPhone developers in order to accomplish the simplest tasks.</p>
<p>If you just want to play with the iPhone Simulator, the basic configuration of Xcode and the iPhone SDK is adequate, but as soon as you would like to try out your application on a real device, things get more complicated. As an example, I wrote my first cube rotating OpenGL app and wanted to test it on the iPhone of my friend. First, I thought that this will need only three things: connect the iPhone to the Mac, select the real device as target in Xcode and then press the Build&amp;Go button. Well, it turned out that it is much much more complicated.</p>
<p>First, you have to create a certificate on your Mac, upload it to the Apple Provisioning Portal, then download it from there and install it on your Mac. Next, connect the iPhone to the Mac, open the Organizer in Xcode, copy the device identifier of the iPhone, go back to the Portal and add the device. Then go back to Xcode, copy the reverse URI of your application and add the application in the Portal. Finally, create a provisioning profile for the device-application pair on the Portal, download it and add it to Xcode. Only now can you select the device as the target and press Build&amp;Go. This would even be acceptable if you had to do it only once, as a preparation for development, but this is not the case. You have to create an individual provisioning profile for each and every device-application pair that you want to test. This is simply too much, especially in the early phases of development, when you may start with several different test projects.</p>
<h2><img class="alignleft" title="iPhone SDK" src="http://www.rastergrid.com/blog/wp-content/uploads/2010/05/iphone_sdk.png" alt="" width="150" height="187" />iPhone SDK and Mac OS X Leopard</h2>
<p>You may think that after these steps I already had my box rotating app running on the iPhone. Well, if so, you&#8217;re wrong. Things just got more annoying after this&#8230; After investing money into a Mac mini and paying $99 to Apple for entering the Developer Program, I thought that I could start on my first real project, but I was wrong as well.</p>
<p>I bought the Mac mini from the brother-in-law of one of my friends. As he had already developed some iPhone applications on it, Xcode and iPhone SDK 3.0 were already installed. The problem I encountered was that just the day before my friend Imi brought his iPhone to my place to test the application, he had to update the firmware of the phone to iPhone OS 3.1.3. You may be surprised, but Xcode with iPhone SDK 3.0 doesn&#8217;t work with an iPhone that has version 3.1.3 of the OS installed.</p>
<p>Well, I thought I would just download the latest version of the SDK from Apple&#8217;s website, but I faced some further surprises&#8230; There is indeed a free download of Xcode 3.2 and iPhone SDK 3.2 on the site. The problem is that it works only on Mac OS X Snow Leopard, not on Leopard.</p>
<p>At this point I was thinking that I had already invested a huge amount of money and now I would have to pay another $29 for the OS update just to run my box rotating app on a real device. Anyway, I chose to invest this remaining amount, as I really want to develop some games for the iPhone. Unfortunately, I soon figured out that you can order the OS update only inside the US and there is no way to get a downloadable version for your money (at least as far as I can tell, because I was already too pissed off to spend more time on Apple&#8217;s site looking for a solution).</p>
<p>Our next idea was to downgrade the operating system of the phone to match the SDK we had, but Imi informed me that the oldest OS you can downgrade to is 3.1.2. This was when I went mad. I found it really ridiculous that someone buys a developer machine that was earlier capable of the purpose and afterwards, due to some stupid decision at Apple, it is simply not capable of it anymore, unless you pay them more money for the OS upgrade. Anyway, by this time I would have been satisfied if I could just pay that $29 for the upgrade to run the app.</p>
<p>It was actually pure luck that I didn&#8217;t give up and started to google for people who had recently met the same problem. Fortunately, I found a guy struggling with the same problem and he <a title="iPhone SDK 3.1.3 Download for Leopard (if you don’t have Snow Leopard!)" href="http://www.dropthenerd.com/iphone-sdk-3-1-3-download-for-leopard-if-you-dont-have-snow-leopard/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.dropthenerd.com/iphone-sdk-3-1-3-download-for-leopard-if-you-dont-have-snow-leopard/?referer=');">posted</a> a download link for the SDK 3.1.3 that also works on Leopard. He really saved the day! If you are having the same problem, just grab the SDK from the following address:</p>
<p><a title="iPhone SDK 3.1.3 and Xcode 3.1.4" href="http://developer.apple.com/iphone/download.action?path=%2Fiphone%2Fiphone_sdk_3.1.3__final%2Fiphone_sdk_3.1.3_with_xcode_3.1.4__leopard__9m2809a.dmg" onclick="pageTracker._trackPageview('/outgoing/developer.apple.com/iphone/download.action?path=_2Fiphone_2Fiphone_sdk_3.1.3_final_2Fiphone_sdk_3.1.3_with_xcode_3.1.4_leopard_9m2809a.dmg&amp;referer=');">Download iPhone SDK 3.1.3 and Xcode 3.1.4</a></p>
<h2>Conclusion</h2>
<p>Without the intention to blame the guys at Apple or anybody else, I can say that becoming an iPhone developer involves much more trouble than I ever thought it could. Anyway, I have now managed to get to the point where I can actually start to be productive, so I won&#8217;t give up now&#8230;</p>
<p>I also planned to talk about implementation related issues and tips in this article, which you are most probably more interested in, but it seems that this topic is left for a future article, as I had so many things to share now that it simply didn&#8217;t fit into the time-box. Anyway, I will come back to the topic in the near future.</p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/05/common-pitfalls-of-iphone-development/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Going mobile with OpenGL ES</title>
		<link>http://rastergrid.com/blog/2010/04/going-mobile-with-opengl-es/</link>
		<comments>http://rastergrid.com/blog/2010/04/going-mobile-with-opengl-es/#comments</comments>
		<pubDate>Sun, 18 Apr 2010 16:34:53 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Samples]]></category>
		<category><![CDATA[Telecommunication]]></category>
		<category><![CDATA[Android]]></category>
		<category><![CDATA[iPhone]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[mobile technology]]></category>
		<category><![CDATA[Objective-C]]></category>
		<category><![CDATA[OpenAL]]></category>
		<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[OpenGL ES]]></category>
		<category><![CDATA[phone]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=230</guid>
		<description><![CDATA[
Many things have changed since the first time the public put their hands on the first mobile phone device as these days the end user rarely makes their choices when buying a mobile equipment based on their telephony capabilities. In fact, nowadays these devices are one of the most popular entertainment platforms out there. The [...]]]></description>
			<content:encoded><![CDATA[
<p>Many things have changed since the public first put their hands on a mobile phone: these days end users rarely choose a mobile device based on its telephony capabilities. In fact, these devices are nowadays one of the most popular entertainment platforms out there. The main problem for application developers used to be that these platforms were very heterogeneous in terms of both hardware architecture and API support. Meanwhile, things have changed. While the underlying hardware still varies a lot from device to device, the work of application developers has been eased by cross-platform mobile operating systems and open standards, in particular OpenGL ES, the embedded version of the popular graphics API. In this article I would like to talk about some of the big players of the mobile OS industry and about using OpenGL ES for creating impressive mobile applications.</p>
<p><span id="more-230"></span>The first version of the OpenGL ES specification was released in order to provide a lightweight API for embedded graphics using a well-defined subset of the functionality of desktop OpenGL. While the specification has been out for quite a while, wide industry adoption and strong interest from application developers came only recently. Currently, we have several mobile platforms that ship with 3D accelerators and provide a feature set via OpenGL ES that enables developers to create games that weren&#8217;t possible even on desktop platforms about ten years ago.</p>
<h3>Going 3D on mobiles</h3>
<p>Those who know me know well that I have always been interested in graphics, especially when it is used for entertainment purposes. In particular, I have wanted to develop video games since the first time I put my hands on a computer. This is no different now: I&#8217;m writing about OpenGL ES and mobile platforms because I got interested in creating games for mobile phones.</p>
<p>As I&#8217;ve already mentioned, the problem with developing for mobile devices is the variety of hardware and software platforms that they are built on. For somebody who is already familiar with desktop OpenGL, having OpenGL ES in the tool-set eliminates some of the burden I have to face.</p>
<p>The application platform side has also changed a lot. Nowadays, we have just a few big players in the mobile OS industry, which eases the work of developers. More precisely, an application developer who plans to go mobile and would like to reach the biggest market audience can limit their efforts to the following platforms:</p>
<ul>
<li><strong>iPhone OS</strong> &#8211; This is the one that drives Apple&#8217;s iPhone mobile devices as well as the iPod Touch. It provides an application platform similar to the one Mac developers are used to. It can be said that this platform is the most popular in the industry, especially for gaming applications.</li>
<li><strong>Android</strong> &#8211; This is the newest player in the field, brought by Google. While it&#8217;s a newcomer in the industry, it has already captured the attention of tons of developers. We can say that Android and iPhone are currently dictating the direction of mobile entertainment.</li>
<li><strong>Symbian OS</strong> &#8211; Symbian has the largest share in most markets worldwide, yet it is not that popular in the mobile gaming industry. It is the operating system running on most of today&#8217;s Nokia phones.</li>
<li><strong>Windows Mobile</strong> &#8211; Microsoft&#8217;s product built on Windows CE, the company&#8217;s embedded operating system.</li>
<li><strong>RIM Blackberry OS</strong> &#8211; Operating system primarily designed for the business industry.</li>
</ul>
<p>While most of these mobile operating systems are built on the same design concepts, it is very difficult for a developer to create cross-platform applications for all of them, as they differ in language and tool-set support, which minimizes the possibilities for code reuse. Unfortunately, this goes against one of the most important rules of mobile development: maximize portability.</p>
<p>It is not 100% true that there is no way to achieve good portability across all these platforms, but if we choose this direction we are limited to two possibilities: cross-platform Java applications and web-based applications. While these seem to be excellent alternatives to native programming, they severely limit the developer in creating applications that take full advantage of the underlying hardware. This is where OpenGL ES comes into the picture, as all these platforms support the API, thus providing at least some possibility of code reuse when dealing with entertainment applications.</p>
<p>Now, I would like to continue by talking about the two platforms that I&#8217;m most interested in.</p>
<h3>iPhone OS</h3>
<p>I started to get involved in iPhone game development because one of my friends pushed me to do so after seeing the great success of his brother-in-law, <a title="zhooley's iPhone applications" href="http://www.zhooley.hu/iphone/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.zhooley.hu/iphone/?referer=');">zhooley</a>, who had some great titles. I don&#8217;t have a Mac to develop on yet, but I have already read some material about iPhone development. This is where the following information comes from.</p>
<p>iPhone is currently the most important platform for mobile application developers. It became such an important factor in the industry thanks to Apple&#8217;s AppStore. Previously there was little to no way for end users to extend their mobile software base so easily. While this is good for the end user, it is maybe even better for application developers as the AppStore provides them with quite a large market audience.</p>
<p>The secret why iPhone is an excellent gaming platform lies in the palette of features that the phone hardware and the software frameworks provide. Just to mention the most important ones:</p>
<ul>
<li>Touch screen control with support for multi-touch events capturing the movement of up to five fingers.</li>
<li>Three accelerometers for tracking the spatial movement and direction of the device in all axes.</li>
<li>MVC inspired GUI framework for enhanced productivity.</li>
<li>Support for several industry standard APIs like OpenGL ES, OpenAL and much more.</li>
</ul>
<p>But that&#8217;s enough general talk, let&#8217;s see what OpenGL ES support looks like on the iPhones&#8230;</p>
<p>As far as I can tell, not being an iPhone owner, the graphics hardware bundled with the phone comes in the form of PowerVR accelerators: MBX and SGX.</p>
<p>The PowerVR MBX supports OpenGL ES 1.1, which is roughly equivalent to OpenGL 1.5, and runs a tile-based deferred renderer that is suitable for most 3D applications. That means it has only fixed function capabilities; however, that is usually enough for most mobile applications. Also note that it has a very limited amount of texture memory: 24MB.</p>
<p>The PowerVR SGX is a more powerful processor that also supports OpenGL ES 2.0, roughly equivalent to OpenGL 2.0, and has optimized fixed function shaders that provide flawless backward compatibility for OpenGL ES 1.1 applications.</p>
<p>Most importantly, all iPhones are able to do floating point math natively and efficiently. This is an important factor for OpenGL applications, as using the fixed point types can be quite a burden for developers, especially for those migrating from desktop development.</p>
<p>Additionally, the OpenGL ES implementation on iPhone provides some nice extensions like <a title="GL_OES_framebuffer_object" href="http://www.khronos.org/registry/gles/extensions/OES/OES_framebuffer_object.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.khronos.org/registry/gles/extensions/OES/OES_framebuffer_object.txt?referer=');">GL_OES_framebuffer_object</a>, <a title="GL_OES_compressed_paletted_texture" href="http://www.khronos.org/registry/gles/extensions/OES/OES_compressed_paletted_texture.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.khronos.org/registry/gles/extensions/OES/OES_compressed_paletted_texture.txt?referer=');">GL_OES_compressed_paletted_texture</a> and <a title="GL_OES_point_sprite" href="http://www.khronos.org/registry/gles/extensions/OES/OES_point_sprite.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.khronos.org/registry/gles/extensions/OES/OES_point_sprite.txt?referer=');">GL_OES_point_sprite</a>. Also, thanks to the iPhone simulator that comes with the SDK, it is easy to test the application during development without an actual device. Still, one important thing to keep in mind is that the iPhone simulator has different OpenGL ES capabilities than the actual hardware, and performance measured on the simulator should not be taken as valid because the simulator does not really simulate the graphics hardware but only the software platform.</p>
<p>iPhone development is done using the Cocoa API and preferably Objective-C; however, C, C++ and Objective-C++ can also be used. One just has to interface with the Cocoa API somehow and the rest can be done in almost any native programming language. That is one of the key advantages of the iPhone platform: one can develop native applications without needing Java or web-based solutions.</p>
<p>While the iPhone may seem to be a perfect choice of mobile game platform, we should not forget about one big disadvantage: one cannot develop legal iPhone applications without owning a Mac.</p>
<h3>Android</h3>
<p>The Android platform was suggested by one of my workmates who had just bought a Droid. That phone is actually capable of competing with the iPhone in terms of both features and performance.</p>
<p>Android was the big hit of last year and my forecast is that it will be one of the most relevant platforms of the upcoming years. Google adopted Apple&#8217;s idea and also created an open market for software that end users can easily download and install on their devices. This is the AndroidMarket, which can easily become a powerful competitor of the AppStore.</p>
<p>While the Motorola Droid, for example, supports about the same feature set that makes the iPhone an excellent gaming platform, this cannot be said about most phones running Android. This is maybe one of the biggest disadvantages of the Android platform. However, we can also take it as an advantage, as it makes it possible for more phones to adopt this operating system.</p>
<p>As the Android operating system runs on various phones from different vendors with different hardware capabilities, there isn&#8217;t much to say about graphics hardware capabilities in general, except that some devices not only lack a graphics accelerator but also lack floating point support. This is another disadvantage, as it forces developers either to stick to fixed point math in their OpenGL ES applications to maximize portability or to maintain two different rendering paths.</p>
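<p>To give an idea of what sticking to fixed point math means in practice, here is a minimal C++ sketch of the 16.16 format that the GLfixed type of OpenGL ES 1.x uses. The names Fixed, kOne, toFixed, toFloat and mulFixed are hypothetical helpers made up for this illustration; they are not part of any SDK, and a real application would also have to think about overflow and rounding.</p>

```cpp
#include <cassert>
#include <cstdint>

// 16.16 signed fixed point: the upper 16 bits hold the integer part,
// the lower 16 bits hold the fraction (same layout as GLfixed).
using Fixed = std::int32_t;
constexpr Fixed kOne = 1 << 16;  // the value 1.0 in 16.16 format

// Convert a float to 16.16 fixed point (truncates toward zero).
constexpr Fixed toFixed(float v) {
    return static_cast<Fixed>(v * 65536.0f);
}

// Convert a 16.16 fixed point value back to float.
constexpr float toFloat(Fixed v) {
    return static_cast<float>(v) / 65536.0f;
}

// Multiply two fixed point values. The 64-bit intermediate avoids
// overflow; dividing by 65536 rescales the product back to 16.16.
constexpr Fixed mulFixed(Fixed a, Fixed b) {
    return static_cast<Fixed>((static_cast<std::int64_t>(a) * b) / 65536);
}
```

<p>Every multiplication needs such a rescaling step, which hints at why maintaining a fixed point rendering path next to a floating point one is a real burden.</p>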
<p>Originally, Android supported only OpenGL ES 1.0, which is roughly equivalent to OpenGL 1.3. However, since NDK r3 there is OpenGL ES 2.0 support for Android as well. The feature set here varies much more, in terms of both hardware and extension support.</p>
<p>Development for Android is done in Java using a proprietary SDK for accessing the Android API. The SDK comes with a simulator that works fine, except for the long initial boot time that really surprised me when I first tried it out.</p>
<p>One advantage of the SDK is that it can be used on virtually any operating system, so application developers can work on Windows, Linux, MacOSX or other platforms. There is also a nice Eclipse plugin that makes application development for Android even easier. That&#8217;s why I started with this one.</p>
<p>Just to illustrate how easy it is to put together a working demo with a good SDK, I&#8217;ve created a simple rotating box app to demonstrate OpenGL ES usage on Android. From installation to having a working application, it took no more than two hours. You can find the download links for both the source code and the binary release at the end of the article.</p>
<h3>Why mobile games?</h3>
<p>I am a person who was, is and will be interested in developing computer games. Previously, I worked with desktop platforms, and when I was 10 years old it was satisfying to put together some simple 2D game, but not anymore.</p>
<p>I had always planned to create a state-of-the-art game engine and use it for some game, like most people like me do, but the efforts of one person are simply insufficient to compete with the players in the industry. Even though I feel capable of writing such an engine, it would take more time than I have now that I am working. Even if I managed to accomplish it in a year or two, the problem of content creation would come into the picture. For an AAA PC game, content creation takes several times longer than the actual programming, and here I even lack the knowledge to do it. On the other hand, mobile game creation is a much shorter process where you can get to actual results in a matter of weeks, which compares far better with PC game creation.</p>
<p>Also, I would never use third party game engines, except some basic libraries like OpenGL, a physics library and things like that, because otherwise I wouldn&#8217;t feel that the results are my own creation.</p>
<p>Having game development as a hobby works well during high school and university, but it gets quite difficult once you are out in the world with a job and responsibilities. Maybe I should have taken the time earlier to develop something concrete for PC but, as most fellow hobbyists know, you usually end up having hundreds of unfinished projects.</p>
<p>While I will never forget about desktop platforms and will actively keep up with the evolution of the industry, mobile application development has opened another world for me where I can express myself.</p>
<h3>HelloAndroid Demo</h3>
<p>Source code: <a href="http://rastergrid.com/blog/wp-content/uploads/2010/04/files/helloandroid_src.zip">helloandroid_src.zip</a><br />
Binary release: <a href="http://rastergrid.com/blog/wp-content/uploads/2010/04/files/HelloAndroid.apk">HelloAndroid.apk</a></p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/04/going-mobile-with-opengl-es/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sad facts about OpenGL extension libraries</title>
		<link>http://rastergrid.com/blog/2010/03/sad-facts-about-opengl-extension-libraries/</link>
		<comments>http://rastergrid.com/blog/2010/03/sad-facts-about-opengl-extension-libraries/#comments</comments>
		<pubDate>Wed, 31 Mar 2010 21:15:16 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[GLee]]></category>
		<category><![CDATA[GLEW]]></category>
		<category><![CDATA[GLLoader]]></category>
		<category><![CDATA[OpenGL]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=224</guid>
		<description><![CDATA[
Everybody who used to make OpenGL applications, whether it be a simple triangle-of-death demo or a comprehensive rendering engine at some point needs to use extensions or later OpenGL versions. Usually many people start this by creating their own initializer library that loads the required entry points from the OpenGL library by hand. What is [...]]]></description>
			<content:encoded><![CDATA[
<p>Everybody who makes OpenGL applications, whether it be a simple triangle-of-death demo or a comprehensive rendering engine, at some point needs to use extensions or later OpenGL versions. Many people start by creating their own initializer library that loads the required entry points from the OpenGL library by hand. What is sure is that at some point everybody realizes that this process is just a waste of time and starts to look for an extension loading library. This is the obvious solution, as it makes no sense to reinvent the wheel all the time. However, after using one of these libraries for a while, one faces the problem that they are not as nice as they seemed. In this article I will talk about some of these libraries and share some of my thoughts about them.</p>
<p><span id="more-224"></span>OpenGL is evolving faster and faster nowadays and it is crucial to have an extension library that serves your needs and is up-to-date enough that you can easily adopt the latest features of the API. The sad truth is that this is not the case, or at least there are some pitfalls that can cause you a lot of headaches when relying on these libraries.</p>
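<p>For those who have never loaded entry points by hand, the pattern looks roughly like the following C++ sketch. Please note the assumptions: lookupProc is a stub I made up to stand in for the platform lookup function (wglGetProcAddress on Windows, glXGetProcAddress on X11) so that the sketch works without an OpenGL context, and glExampleEntryPoint with its signature is a hypothetical entry point, not a real OpenGL function.</p>

```cpp
#include <cassert>
#include <cstring>

// Generic function pointer, standing in for the untyped pointer
// that the platform lookup function returns.
using GenericProc = void (*)();

// Hypothetical entry point signature, used only for this illustration.
using ExampleProc = int (*)(int, int);

// Pretend driver-side implementation of the hypothetical entry point.
static int exampleDriverImpl(int a, int b) { return a + b; }

// ASSUMPTION: stub replacing wglGetProcAddress / glXGetProcAddress.
// Like the real functions, it yields a null pointer for unknown names.
static GenericProc lookupProc(const char* name) {
    if (std::strcmp(name, "glExampleEntryPoint") == 0)
        return reinterpret_cast<GenericProc>(&exampleDriverImpl);
    return nullptr;
}

// The hand-written "initializer library": one typed pointer per entry point.
struct EntryPoints {
    ExampleProc exampleEntryPoint = nullptr;
};

// Looks up the named entry point, casts it to its real signature and
// reports whether it was found. A real loader repeats this for every
// function of every extension and core version it needs.
static bool loadEntryPoints(EntryPoints& ep, const char* name) {
    ep.exampleEntryPoint = reinterpret_cast<ExampleProc>(lookupProc(name));
    return ep.exampleEntryPoint != nullptr;
}
```

<p>Writing a typedef, a pointer and a lookup for each of the hundreds of functions is exactly the repetitive work that extension libraries are supposed to take off your shoulders.</p>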
<p>This week, I planned to create a new version of my Nature demo, first presented in my article <a title="Instance culling using geometry shaders" href="http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/">Instance culling using geometry shaders</a>, that adopts the latest features of OpenGL 3.3, concentrating especially on how <a title="GL_ARB_instanced_arrays" href="http://www.opengl.org/registry/specs/ARB/instanced_arrays.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/instanced_arrays.txt?referer=');">GL_ARB_instanced_arrays</a> can improve the throughput of the technique. I know that I would be able to do this without actual OpenGL 3.3 support from the extension library by using the extension itself rather than the core functions, but how would it look if I published a demo stating it&#8217;s OpenGL 3.3 while internally using OpenGL 3.2 with extensions? That would just be too flimsy.</p>
<p>As you may know, I used GLEW in my previous demo as the extension library of choice, but I had some difficulties at that time as well. This time I got pissed off much more easily, as the bad memories had left their mark on my expectations. I will talk about the reasons behind this later. First I would like to clarify the subject of this article.</p>
<p>As I had several bad experiences with various libraries, especially this week, I thought it would be nice to write a summary of the options we have and of the advantages and disadvantages of each, at least based on my experience. The libraries I will talk about are: <a title="GLEE" href="http://elf-stone.com/glee.php" target="_blank" onclick="pageTracker._trackPageview('/outgoing/elf-stone.com/glee.php?referer=');">GLee</a>, <a title="GLEW" href="http://glew.sourceforge.net/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/glew.sourceforge.net/?referer=');">GLEW</a> and <a title="GLLoader" href="http://sourceforge.net/projects/klayge/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/sourceforge.net/projects/klayge/?referer=');">GLLoader</a>. I don&#8217;t want to blame the people who developed these libraries, so excuse me if I am sometimes too harsh. I should be satisfied with the fact that I at least have the opportunity to use such open source tools instead of reinventing the wheel myself, but if you understand the background of my feelings you may easily accept why I&#8217;m unsatisfied&#8230;</p>
<p>A few years ago I was developing all my hobby projects in Delphi/Free Pascal due to the great facilities and the nicer coding style that the language provides. I changed my mind recently and switched to C/C++ because of the larger developer community behind the languages and to eliminate the need to develop supporting libraries myself that already exist for other languages. So my assumption was that after this shift I would have far fewer problems gathering the modules that do the secondary stuff needed by my projects. Unfortunately, I soon became very disappointed by the quality of those third-party tools that I thought could be the holy grail for my hobby projects. Of course, I had many good experiences too; there is one that <a title="Flawless alternative to SDL" href="http://rastergrid.com/blog/2010/01/flawless-alternative-to-sdl/">I shared with you</a>. But enough beating around the bush, let&#8217;s see what we have&#8230;</p>
<h3>GLee (GL Easy Extension library)</h3>
<p>I don&#8217;t have much experience with this library as the last time I used it was years ago. Because of this, it is hard to say much about it, but I think one simple fact is enough to justify why this library should not be the primary choice for enthusiasts: it is pretty outdated, as it supports only OpenGL 3.0, which was released quite a long time ago. It even seems to me that it is no longer developed. It is written as hard-coded source, which is not easy to maintain; most probably that is the reason behind its disappearance.</p>
<p>Besides that, there are also some good things about this one. Namely, it comes as source code that can be easily integrated and compiled without the need for any third party software, and I seriously consider this an advantage; later you will understand why. Also, it comes with a BSD license that allows enough freedom for almost any project.</p>
<p>Anyway, even though this library is simple and good for many applications, the fact that it&#8217;s outdated and not really maintained is a warning sign that every developer starting to use it must consider.</p>
<h3>GLEW (OpenGL Extension Wrangler library)</h3>
<p>As I mentioned earlier, this is the library I used for my latest demos, so I have up-to-date information about it. The thing I like most about it is that there is an excellent design idea behind it that, in theory, would make it the best library for the purpose. I intentionally used the expression &#8220;in theory&#8221;; I will explain why soon. The library comes with a BSD or MIT license, which is also a nice thing.</p>
<p>GLEW has a very nice build system behind it that can automatically download extension specifications from the <a title="OpenGL extension registry" href="http://www.opengl.org/registry/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/?referer=');">OpenGL Extension Registry</a> and generate the library using that information. Unfortunately, this build system is what can give you a lot of headaches if you would like to use it under Windows, as it relies on many POSIX tools.</p>
<p>According to the homepage, GLEW should compile very easily even under Windows with <a title="cygwin" href="http://www.cygwin.com/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.cygwin.com/?referer=');">cygwin</a>. I have only two problems with this:</p>
<ol>
<li>I use Windows and I use <a title="MinGW" href="http://www.mingw.org/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.mingw.org/?referer=');">MinGW</a> for compilation, and even with the tool-set of <a title="MSYS" href="http://www.mingw.org/wiki/MSYS" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.mingw.org/wiki/MSYS?referer=');">MSYS</a>, building the library proves almost impossible.</li>
<li>As a last resort, I also tried cygwin but had similar results with it as well.</li>
</ol>
<p>Actually, this wouldn&#8217;t be a problem on its own if the project maintainers released Windows binaries for every update of the OpenGL core specification or, at least, generated the source files in such cases, since for compiling the sources themselves there is already a nice Makefile and a Visual Studio project file. Unfortunately, this is not the case. I don&#8217;t remember when they released the binaries for OpenGL 3.2 but I&#8217;m sure it was long after the specification release. Anyway, the best way to solve the problem would be to create a build system that also works on Windows, as the current one makes the library quite inconvenient for Windows developers.</p>
<h3>GLLoader (part of Klay Game Engine)</h3>
<p>I heard about this one only recently, after the release of OpenGL 3.3 and 4.0. This was the first extension library stating that it&#8217;s OpenGL 4.0 capable. Well, there is not much to complain about with this tool, just a few things:</p>
<ul>
<li>By default, it comes with a dynamic library project file and I don&#8217;t really like to supply a DLL with my demos just for extension loading.</li>
<li>There are some mistakes in the code (at least I found one related to glMapBufferRange function).</li>
</ul>
<p>Also, unfortunately, it comes with GPL licensing that is too restrictive for many use cases.</p>
<h3>Conclusion</h3>
<p>There are actually several options when one has to choose an OpenGL extension library, but unfortunately each has its drawbacks. GLEW would obviously be the best solution if there were no problems with its build system. I would even consider correcting the problems that come up during compilation (which usually occur because the code generation does not handle all special cases, as happened earlier with WGL_ARB_create_context, which had multiple versions). GLee seems to be a definite no, but GLLoader may be worth considering.</p>
<p>We will see how each project progresses and whether they will care about the minor but annoying problems with their products (I have already filed some bug/support reports regarding the issues mentioned). Anyway, it currently seems that this area still has gaps to fill, so one can easily drop in with one&#8217;s own library and capture the attention of developers who are looking for third-party supplementary tools. My hope is that the developers of GLEW will take a step and resolve the issues, thus creating a real plug &amp; play library that everybody can rely upon.</p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/03/sad-facts-about-opengl-extension-libraries/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>A brief preview of the new features introduced by OpenGL 3.3 and 4.0</title>
		<link>http://rastergrid.com/blog/2010/03/a-brief-preview-of-the-new-features-introduced-by-opengl-3-3-and-4-0/</link>
		<comments>http://rastergrid.com/blog/2010/03/a-brief-preview-of-the-new-features-introduced-by-opengl-3-3-and-4-0/#comments</comments>
		<pubDate>Mon, 15 Mar 2010 16:23:17 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[fragment shader]]></category>
		<category><![CDATA[geometry instancing]]></category>
		<category><![CDATA[geometry shader]]></category>
		<category><![CDATA[GLSL]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[synchronization]]></category>
		<category><![CDATA[tessellation]]></category>
		<category><![CDATA[tessellation control shader]]></category>
		<category><![CDATA[tessellation evaluation shader]]></category>
		<category><![CDATA[texture array]]></category>
		<category><![CDATA[texture buffer]]></category>
		<category><![CDATA[transform feedback]]></category>
		<category><![CDATA[uniform buffer]]></category>
		<category><![CDATA[vertex shader]]></category>
		<category><![CDATA[vertex stream]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=207</guid>
		<description><![CDATA[
The Khronos Group continues the progress of streamlining the OpenGL API. One very important step in this battle has been made just a few days ago by releasing two concurrent core releases of the OpenGL specification, namely version 3.3 and 4.0. This is a major update of the standard containing many revolutionary additions to the [...]]]></description>
			<content:encoded><![CDATA[
<p>The Khronos Group continues the progress of streamlining the OpenGL API. One very important step in this battle has been made just a few days ago by releasing two concurrent core releases of the OpenGL specification, namely version 3.3 and 4.0. This is a major update of the standard containing many revolutionary additions to the tool-set of OpenGL that need careful examination. In this article I would like to talk about these new features trying to point out their importance and touching also some practical use case scenarios.</p>
<p><span id="more-207"></span>This is the fourth revision of the OpenGL API standard in the last two years. This fast pace revolution started about one and half years ago with the release of the version 3.0 of the specification. At that time, a great feel of disappointment has overcame the developers due to the lack of the promised rewrite of the whole API. Others, who had to deal with legacy code were also disappointed but they felt so because the new revision of the API threatened them with removing old features. These two opposing forces have put the Khronos Group into a situation where there was very difficult to make a decision that would make everybody happy. After two releases, this issue has been mostly resolved with OpenGL 3.2 and also lots of missing features have been integrated into the core API meanwhile.</p>
<p>Even though great steps has been made in order to fulfill everybody&#8217;s needs, the gap between the core functionality of OpenGL and the DirectX API still increased, especially due to the introduction of Shader Model 5.0 hardware. OpenGL was in a position when it had to adopt the features of the new hardware generation and also try to make up leeway in case of Shader Model 4.0 hardware. My personal wish was that there should be two new versions of the API: one that complements the OpenGL 3.x API with the missing features and another that catches up to DirectX 11. Actually my wish became true as the first time in the history of OpenGL we got two new releases of the standard at once, and finally, we got an API that is a really competitive alternative for Microsoft&#8217;s DirectX API. I think I can say this in the name of every OpenGL developer: Thank you Khronos!</p>
<p>Okay, but that&#8217;s enough about history and acknowledgements. Let&#8217;s see what&#8217;s under the hood of the new API revisions! When I read the good news at <a title="OpenGL.org" href="http://www.opengl.org/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/?referer=');">OpenGL.org</a> I felt like a child at Christmas taking the first look at the presents under the tree: I was in great ecstasy and started to &#8220;open the presents&#8221; as fast as I could&#8230;</p>
<h2>New features of OpenGL 3.3</h2>
<p>Let&#8217;s start with the new version of the API targeting Shader Model 4.x hardware. It seems that the major 4.0 release did not capture the ARB&#8217;s attention exclusively, as we have many interesting features already in the first box&#8230;</p>
<h3><a title="ARB_blend_func_extended" href="http://www.opengl.org/registry/specs/ARB/blend_func_extended.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/blend_func_extended.txt?referer=');">ARB_blend_func_extended</a></h3>
<p>This is a feature for which I&#8217;ve seen many requests on the OpenGL discussion forums. It enables fragment shaders to output an additional color per render target that can be used as a blending factor for either the source or the destination color, providing an additional degree of freedom in how fragments are blended into the destination buffers. The underlying hardware has supported this functionality for a while, but without API support it was impossible to take advantage of it. As the feature is very straightforward, I will not go into much detail. Just one additional comment: surprisingly, <a title="ATI Catalyst 10.2: Better CrossFire and OpenGL Support" href="http://www.geeks3d.com/20100218/test-ati-catalyst-10-2-better-crossfire-and-opengl-support/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.geeks3d.com/20100218/test-ati-catalyst-10-2-better-crossfire-and-opengl-support/?referer=');">AMD already supports this extension</a> in its latest graphics drivers, which is remarkable considering that AMD drivers were always a step behind the NVIDIA ones in the race to adopt the latest OpenGL features. It seems that AMD now takes OpenGL support seriously, and this is good news for all the developers out there, especially for me, being an ATI fan.</p>
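<p>To make the blending factor idea concrete, here is a minimal CPU-side sketch of what the blend stage would compute when the second fragment shader output is used as the destination factor. It assumes a configuration along the lines of <code>glBlendFunc(GL_ONE, GL_SRC1_COLOR)</code> with the default additive blend equation; the type and function names are hypothetical and only serve the illustration.</p>

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// RGBA color with components in [0, 1].
using Color = std::array<float, 4>;

// CPU model (illustration only) of the blend stage configured roughly as
//   glBlendFunc(GL_ONE, GL_SRC1_COLOR);
// with the additive blend equation: result = src0 * 1 + dst * src1.
// src0 is the first fragment shader output, src1 the additional one.
Color blendDualSource(const Color& src0, const Color& src1, const Color& dst) {
    Color out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        float blended = src0[i] + dst[i] * src1[i];
        out[i] = std::max(0.0f, std::min(1.0f, blended));  // clamp like a normalized target
    }
    return out;
}
```

<p>With a single source color the destination could only be scaled by one color or alpha value taken from that same output; here each channel of the destination gets its own independent factor, which is handy for things like colored transmittance.</p>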
<h3><a title="ARB_explicit_attrib_location" href="http://www.opengl.org/registry/specs/ARB/explicit_attrib_location.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/explicit_attrib_location.txt?referer=');">ARB_explicit_attrib_location</a></h3>
<p>Probably not just for me, the way vertex attributes were bound to shader attributes and shader outputs to render targets caused a big headache from the point of view of both modular software design and efficiency. Previously, the application developer had little to no control over how to connect these elements together in a shader-independent way. This tight coupling between the host application code and the shaders just made the developers&#8217; work cumbersome. This feature improves the binding process by allowing a particular semantic meaning to be assigned globally to an attribute location without knowing how that attribute will be named in any particular shader, decoupling the host application from the shaders. This extension is a typical example of how design abstractions can ease the life of the developer without any dependency on hardware support.</p>
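<p>As a rough sketch of how this decoupling could look on the host side, the snippet below fixes an engine-wide location per semantic and generates the matching GLSL 3.30 input declarations. The enum, the struct and the helper function are hypothetical names made up for this example; only the <code>layout(location = N)</code> qualifier itself comes from the extension.</p>

```cpp
#include <string>
#include <vector>

// Hypothetical engine-wide convention: every semantic owns a fixed location,
// regardless of how a particular shader names the attribute.
enum AttribLocation : unsigned {
    ATTRIB_POSITION = 0,
    ATTRIB_NORMAL = 1,
    ATTRIB_TEXCOORD = 2
};

struct AttribDecl {
    AttribLocation location;
    std::string type;  // GLSL type, e.g. "vec3"
    std::string name;  // shader-local name, free to differ per shader
};

// Builds the input declarations of a GLSL 3.30 vertex shader.
std::string buildVertexInputs(const std::vector<AttribDecl>& decls) {
    std::string src = "#version 330 core\n";
    for (const AttribDecl& d : decls) {
        src += "layout(location = " +
               std::to_string(static_cast<unsigned>(d.location)) + ") in " +
               d.type + " " + d.name + ";\n";
    }
    return src;
}
```

<p>The host code can then set up its vertex arrays against <code>ATTRIB_POSITION</code> and friends once, without ever querying attribute names from a linked program.</p>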
<h3><a title="ARB_occlusion_query2" href="http://www.opengl.org/registry/specs/ARB/occlusion_query2.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/occlusion_query2.txt?referer=');">ARB_occlusion_query2</a></h3>
<p>Well, there isn&#8217;t too much to say about this extension, as it just adds a new occlusion query type that reports only a boolean value about the visibility of the object rather than the number of samples passed. It is somewhat equivalent to the occlusion query extensions prior to <a title="ARB_occlusion_query" href="http://www.opengl.org/registry/specs/ARB/occlusion_query.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/occlusion_query.txt?referer=');">ARB_occlusion_query</a>. Don&#8217;t ask me why this feature is important, but the ARB felt that it might be useful. One benefit I can think of is that with such a query we might get the result for the proxy object sooner, as we only have to wait until the first sample passes, but I&#8217;m not confident whether such a thing is supported by either the hardware or the drivers.</p>
<h3><a title="ARB_sampler_objects" href="http://www.opengl.org/registry/specs/ARB/sampler_objects.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/sampler_objects.txt?referer=');">ARB_sampler_objects</a></h3>
<p>This is another feature that people have been awaiting for years. This extension decouples texture image data from sampler state. Previously, if a texture image had to be used with different sampler modes, no matter whether we talk about filtering modes or texture coordinate wrapping, one had to do expensive state changes to modify the sampler state of the texture object, implement the needed filtering or wrapping within shaders or, in the worst case, duplicate the texture image data in order to have access to the same texture with different sampler parameters. The primary intent of this feature is to solve these problems.</p>
<p>One thing to remark regarding this extension is that even though it is a long-awaited addition to the API, several people have already expressed their discontent about the fact that the texture unit semantics have been kept. I also expected the introduction of this feature to be the point when the texture unit semantics would go, but after seeing the example of <a title="ARB_explicit_attrib_location" href="http://www.opengl.org/registry/specs/ARB/explicit_attrib_location.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/explicit_attrib_location.txt?referer=');">ARB_explicit_attrib_location</a> as a way to decouple the shader code from the host application code, I tend to agree with Khronos: from now on we can think of the texture units as an adapter layer between GPU and CPU code, and as such the decision seems reasonable.</p>
<h3><a title="ARB_shader_bit_encoding" href="http://www.opengl.org/registry/specs/ARB/shader_bit_encoding.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shader_bit_encoding.txt?referer=');">ARB_shader_bit_encoding</a></h3>
<p>This extension adds built-in functions for getting and setting the bit encoding of floating-point values in the OpenGL Shading Language. As it is mostly an indicator extension for functionality added to the Shading Language, I would rather not go into details here, as I will talk about the new Shading Language later.</p>
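<p>For those who have not met such functions before, the following is a small CPU-side analogue of what built-ins like <code>floatBitsToUint</code> and <code>uintBitsToFloat</code> do: they reinterpret the bit pattern instead of converting the value. The C++ functions merely borrow the GLSL names for readability and assume a 32-bit IEEE 754 <code>float</code>.</p>

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");

// Returns the raw encoding of a float; the numeric value is NOT converted.
std::uint32_t floatBitsToUint(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Inverse operation: builds a float from a raw 32-bit encoding.
float uintBitsToFloat(std::uint32_t bits) {
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

<p>This kind of reinterpretation is what makes it possible to, for example, store floats in integer texture or buffer formats and recover them losslessly in the shader.</p>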
<h3><a title="ARB_texture_rgb10_a2ui" href="http://www.opengl.org/registry/specs/ARB/texture_rgb10_a2ui.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_rgb10_a2ui.txt?referer=');">ARB_texture_rgb10_a2ui</a></h3>
<p>Again, an extension that is quite self-explanatory: a new texture image format called RGB10_A2UI that stores non-normalized unsigned integers. This is nothing more than another gap filled between hardware and API support.</p>
<h3><a title="ARB_texture_swizzle" href="http://www.opengl.org/registry/specs/ARB/texture_swizzle.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_swizzle.txt?referer=');">ARB_texture_swizzle</a></h3>
<p>Especially when using one- or two-component texture formats, as in the case of shadow maps, the specification was somewhat unclear about how these components are finally mapped to RGBA quadruples, and it provided little to no facilities to control this process. As if developers weren&#8217;t already fed up with this, the chance of problems increased even further because driver implementations often behaved differently as well. This extension finally clarifies the issue by giving the application developer an explicit tool to control the swizzling of the components, which is applied implicitly after every single texture fetch. The new state is introduced as part of the texture object state, providing fine-grained control over when and how swizzling is used. According to the extension specification, this feature is notably helpful when porting legacy OpenGL applications as well as games written for PlayStation 3, as the console already provides such functionality.</p>
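<p>The effect of the swizzle state can be modelled on the CPU in a few lines. The sketch below mimics a mask like the one that would be passed via <code>glTexParameteriv</code> with <code>GL_TEXTURE_SWIZZLE_RGBA</code>; the enum and function names are hypothetical stand-ins for the GL selectors <code>GL_RED</code>, <code>GL_GREEN</code>, <code>GL_BLUE</code>, <code>GL_ALPHA</code>, <code>GL_ZERO</code> and <code>GL_ONE</code>.</p>

```cpp
#include <array>
#include <cstddef>

// Stand-ins for the GL swizzle selectors.
enum class Swizzle { Red, Green, Blue, Alpha, Zero, One };

using Texel = std::array<float, 4>;          // fetched RGBA value
using SwizzleMask = std::array<Swizzle, 4>;  // selector for each output channel

// Applies the per-texture swizzle to a fetched texel (CPU illustration only).
Texel applySwizzle(const Texel& fetched, const SwizzleMask& mask) {
    Texel out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        switch (mask[i]) {
            case Swizzle::Red:   out[i] = fetched[0]; break;
            case Swizzle::Green: out[i] = fetched[1]; break;
            case Swizzle::Blue:  out[i] = fetched[2]; break;
            case Swizzle::Alpha: out[i] = fetched[3]; break;
            case Swizzle::Zero:  out[i] = 0.0f; break;
            case Swizzle::One:   out[i] = 1.0f; break;
        }
    }
    return out;
}
```

<p>A mask of (Red, Red, Red, One) makes a single-channel texture behave like the old luminance formats, which is exactly the kind of legacy behavior the extension is meant to help with.</p>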
<h3><a title="ARB_timer_query" href="http://www.opengl.org/registry/specs/ARB/timer_query.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/timer_query.txt?referer=');">ARB_timer_query</a></h3>
<p>Prior to this extension, runtime performance measurements were limited to the use of client side timing information or relying on the use of offline profiling mechanisms like that of AMD&#8217;s <a title="GPU PerfStudio" href="http://developer.amd.com/gpu/perfstudio/Pages/default.aspx" target="_blank" onclick="pageTracker._trackPageview('/outgoing/developer.amd.com/gpu/perfstudio/Pages/default.aspx?referer=');">GPUPerfStudio</a>. During development, this timing information can help identify application, driver or GPU bottlenecks. At runtime, this data can be used to dynamically optimize the scene to achieve reasonable frame rates. While today&#8217;s hardware provides a great repertoire of performance measurement metrics there was no API support to access these previously. This feature provides an additional asynchronous query type that enables application developers to measure the driver and GPU time that is required to complete a set of rendering commands, thus providing additional flexibility for both offline and runtime optimizations. While this extension does not guarantee 100% consistency and repeatability, the information gathered with timer queries will definitely make it possible to identify server side bottlenecks and the reasons behind them.</p>
<h3><a title="ARB_instanced_arrays" href="http://www.opengl.org/registry/specs/ARB/instanced_arrays.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/instanced_arrays.txt?referer=');">ARB_instanced_arrays</a></h3>
<p>Many people argued with me on the OpenGL discussion forums when I stated that instanced arrays should be included in core OpenGL. Their reasoning was built on the fact that we already have the <a title="ARB_draw_instanced" href="http://www.opengl.org/registry/specs/ARB/draw_instanced.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/draw_instanced.txt?referer=');">ARB_draw_instanced</a> extension, which provides a shader-based and thus much more flexible way to handle instanced geometry. While from this point of view I tend to agree with them, there are many non-trivial use cases which prove that my reasoning is not pointless. It seems that Khronos agrees with me on this topic.</p>
<p>In a nutshell, the instanced arrays feature enables the use of vertex attributes as a source of instance data. This is done by introducing a so-called &#8220;array divisor&#8221; that specifies how the corresponding vertex attributes are mapped to instances. Usually a vertex attribute advances on a per-vertex basis. With instanced arrays and a divisor of N, the attribute advances only once every N instances, where each instance is conceptually equivalent to one traditional, non-instanced draw command.</p>
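<p>The mapping controlled by the divisor boils down to a one-line rule. The following sketch expresses it as a plain C++ function; the function name is hypothetical, and the divisor is the value that would be set with <code>glVertexAttribDivisor</code>.</p>

```cpp
#include <cstddef>

// Returns which element of an attribute array is fetched for the given
// vertex of the given instance (CPU illustration of the divisor rule).
//   divisor == 0: regular attribute, advances once per vertex
//   divisor == N: instanced attribute, advances once every N instances
std::size_t attribElementIndex(std::size_t vertex, std::size_t instance,
                               unsigned divisor) {
    if (divisor == 0) {
        return vertex;
    }
    return instance / divisor;  // integer division, rounds down
}
```

<p>So with a divisor of 1 every instance gets its own element (for example a per-instance transform), while a divisor of 2 makes each pair of consecutive instances share one.</p>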
<p>One use case is dealing with a huge number of instances where the per-instance data simply does not fit into uniform buffers. While in such cases one can use a texture buffer instead to source the instance data, as mentioned in my article <a title="Uniform Buffers VS Texture Buffers - RasterGrid Blog" href="http://rastergrid.com/blog/2010/01/uniform-buffers-vs-texture-buffers/">Uniform Buffers VS Texture Buffers</a>, accepting the additional overhead of texture fetches may not be the wisest decision performance-wise. Besides the standard instancing use cases, there are plenty of nasty tricks that can be done efficiently using this feature, but that goes far beyond the scope of this article and requires a separate discussion, which I will most probably come back to in the near future.</p>
<h3><a title="ARB_vertex_type_2_10_10_10_rev" href="http://www.opengl.org/registry/specs/ARB/vertex_type_2_10_10_10_rev.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/vertex_type_2_10_10_10_rev.txt?referer=');">ARB_vertex_type_2_10_10_10_rev</a></h3>
<p>We&#8217;ve arrived at the final new extension included in core OpenGL 3.3. This is another gap-filling extension that provides two new vertex attribute data formats: a signed and an unsigned format with 10 bits for each of the three main components and 2 bits for the fourth. The most typical use of this format is to store vertex normals in the signed-normalized version of the format in order to have a compact (4 bytes per normal) yet high precision (due to 10 bits per component) format that can reduce memory needs and bandwidth requirements while retaining sufficient precision. Previously, there was no way to have such high precision for the vertex attributes in case of a 4-byte footprint.</p>
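<p>As an illustration of the normal storage use case, here is a minimal sketch of packing a unit normal into the bit layout of the signed format (x in bits 0&#8211;9, y in bits 10&#8211;19, z in bits 20&#8211;29, the 2-bit w left at zero). The function names are hypothetical, and the rounding rule (scale by 511 and round to nearest) is an assumption of this example; the exact normalized conversion is defined by the specification and should be checked there.</p>

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Quantizes a value in [-1, 1] to a 10-bit two's complement field.
// Assumption of this sketch: scale by 511 and round to nearest.
std::uint32_t toSnorm10(float v) {
    float clamped = std::max(-1.0f, std::min(1.0f, v));
    int q = static_cast<int>(std::lround(clamped * 511.0f));
    return static_cast<std::uint32_t>(q) & 0x3FFu;  // keep the low 10 bits
}

// Packs a normal into a 32-bit word: x in bits 0-9, y in 10-19, z in 20-29.
// The 2-bit w field (bits 30-31) is left at zero.
std::uint32_t packNormal(float x, float y, float z) {
    return toSnorm10(x) | (toSnorm10(y) << 10) | (toSnorm10(z) << 20);
}
```

<p>An array of such words would then be described to OpenGL as a four-component attribute of type <code>GL_INT_2_10_10_10_REV</code> with normalization enabled, taking 4 bytes per normal instead of 12.</p>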
<h3>The OpenGL Shading Language 3.30</h3>
<p>The first remarkable thing is the shift in the versioning of the Shading Language. It seems that from now on it will be aligned with the core specification version. This decision was most probably made because of the introduction of two release branches of the specification, in order to avoid confusion about the correspondence between API and Shading Language versions.</p>
<p>As it is much more difficult to summarize the new features of the OpenGL Shading Language together with corresponding use cases, I will simply limit my comments to an excerpt from its specification listing the features added in this new version:</p>
<ul>
<li>Layout qualifiers can be used to declare the location of vertex shader inputs and fragment shader outputs, in line with the API functionality provided by <a title="ARB_explicit_attrib_location" href="http://www.opengl.org/registry/specs/ARB/explicit_attrib_location.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/explicit_attrib_location.txt?referer=');">ARB_explicit_attrib_location</a> as mentioned before.</li>
<li>Built-in functions are provided for converting floating-point values to integer ones representing their encoding.</li>
<li>Some clarification of already existing facilities of the language.</li>
</ul>
<h2>New features of OpenGL 4.0</h2>
<p>It is quite obvious that the major version number change indicates that this revision of the specification targets Shader Model 5.0 hardware. To be honest, as I was never really interested in DirectX, I barely know all the features introduced by DX11, but it seems that there are some great facilities in OpenGL 4.0 that I never heard the hardware supported. This may be because DX11 does not expose such functionality, or maybe because I don&#8217;t know enough details about DX11. Anyway, let&#8217;s see the revolutionary things that we face when checking out the latest version of the OpenGL specification&#8230;</p>
<h3><a title="ARB_draw_buffers_blend" href="http://www.opengl.org/registry/specs/ARB/draw_buffers_blend.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/draw_buffers_blend.txt?referer=');">ARB_draw_buffers_blend</a></h3>
<p>Using this feature one is able to select individual blend equations and blend functions for each render target. This extension has already been exposed for a few months now, so most probably everybody has heard about it, and even if not, the functionality is very straightforward. It simply removes some of the restrictions when dealing with multiple render targets (MRT). One interesting thing is that the Khronos Group decided to include this extension in version 4.0 of the API but not in 3.3. This is odd, as Shader Model 4.0 capable hardware already supports this feature, or at least I have the extension on my Radeon HD2600, which raises the question: why only in 4.0? Unfortunately, I don&#8217;t know the answer, but I hope the ARB has a good reason behind this. As we will see later, there are other features that for some reason were only exposed in the latest version of the API but not in core for Shader Model 4.0 hardware.</p>
<h3><a title="ARB_sample_shading" href="http://www.opengl.org/registry/specs/ARB/sample_shading.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/sample_shading.txt?referer=');">ARB_sample_shading</a></h3>
<p>In traditional multisample rendering the hardware optimizes multisampling so that the fragment shader is executed only once for each fragment. This is possible because the specification relaxes how the implementation has to feed color and texture coordinate values for each sample. While this optimization usually does not produce any rendering artifacts and heavily reduces the pressure on the GPU, there are some situations when it results in aliasing artifacts. One example is when alpha-tested primitives are rendered.</p>
<p>This extension provides a global state for enabling and disabling sample shading and a way to control how fine-grained per-sample shading should be by supplying a minimum number of samples that need to be shaded. Beside this, it also introduces the required language elements to the OpenGL Shading Language to support sample shading.</p>
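<p>To give a feel for how the minimum fraction translates into actual shader invocations, the sketch below computes the smallest number of unique samples that have to be shaded per fragment. It follows my reading of the rule in the extension specification, <code>max(ceil(value * samples), 1)</code> with the value clamped to [0, 1]; the function name is hypothetical.</p>

```cpp
#include <algorithm>
#include <cmath>

// Minimum number of samples that must be shaded individually per fragment
// when sample shading is enabled (illustration of the rule described above).
//   minSampleShadingValue: the fraction that would be passed to glMinSampleShading
//   samples: number of samples of the render target
int minShadedSamples(float minSampleShadingValue, int samples) {
    float fraction = std::max(0.0f, std::min(1.0f, minSampleShadingValue));
    int required = static_cast<int>(std::ceil(fraction * static_cast<float>(samples)));
    return std::max(required, 1);  // at least one shader invocation per fragment
}
```

<p>With a 4x multisampled target, a value of 0.5 asks for at least two shaded samples per fragment, while 1.0 requests full per-sample shading.</p>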
<h3><a title="ARB_shader_subroutine" href="http://www.opengl.org/registry/specs/ARB/shader_subroutine.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/shader_subroutine.txt?referer=');">ARB_shader_subroutine</a></h3>
<p>In my humble opinion, this is one of the most important features introduced in this new version of the API specification. So far, many engine and shader developers faced problems inherent in the Shading Language that heavily reduced the ability to create a modular shader design separating the independent tasks done in shaders nowadays. One initiative was the idea behind the <a title="EXT_separate_shaders" href="http://www.opengl.org/registry/specs/EXT/separate_shader_objects.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/EXT/separate_shader_objects.txt?referer=');">EXT_separate_shader_objects</a> extension. While that extension removed the dependency between shader stages, it does not address the problem of tight coupling inside one shader stage. Also, the aforementioned extension defeats some of the design goals of the Shading Language by introducing complicated language semantics in order to solve the problem of inter-stage dependency.</p>
<p>Just to emphasize the importance of this new functionality with a very basic example, let&#8217;s take a simple rendering engine that supports skeletal animated geometry, materials and lights. In such a use case both the vertex and fragment shaders have multiple roles: the vertex shader has to perform the skeletal animation (property of the geometry) and the view transformation (property of the camera, or of the light in case of shadow map rendering), and the fragment shader has to calculate the light incident on the surface point (property of the light) and then calculate the resulting illumination (property of the material). With the traditional tool-set these components of the shaders were tightly coupled, and in order to support the combination of any geometry type (animated or not, skeletal or morph animation, etc.), any light type (directional, point, etc.) and any material type (diffuse, phong, environment mapped, etc.), one had to compile all possible combinations of the shaders or create uber-shaders that make run-time decisions to handle the heterogeneous inputs. Both of these solutions incur additional hardware resource usage and possible runtime overhead.</p>
<p>This extension adds some kind of polymorphism support to shaders. This way a single shader can include many alternative subroutines for a particular task and dynamically select through the API which subroutine is called from each call site. This opens the doors for modular shader designs while retaining most of the performance of specialized shaders.</p>
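<p>The idea is easiest to grasp through a CPU-side analogy. In the sketch below a &#8220;shader&#8221; holds two interchangeable diffuse models with the same signature, and the host picks one by index before &#8220;drawing&#8221;. This is only an analogy written in C++ with hypothetical names; in real code the alternatives would be GLSL subroutines and the selection would be made through the API (for example with <code>glUniformSubroutinesuiv</code>).</p>

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Two interchangeable "subroutines": diffuse term computed from N dot L.
float lambert(float nDotL) { return nDotL > 0.0f ? nDotL : 0.0f; }
float halfLambert(float nDotL) {
    float wrapped = nDotL * 0.5f + 0.5f;
    return wrapped * wrapped;
}

using DiffuseModel = float (*)(float);

// Analogy of a single shader that contains several alternatives for one task.
struct Shader {
    std::array<DiffuseModel, 2> subroutines{{lambert, halfLambert}};
    std::size_t active = 0;  // plays the role of the subroutine uniform

    // Host-side selection, done once before issuing draw calls.
    void select(std::size_t index) {
        assert(index < subroutines.size());
        active = index;
    }

    // The call site inside the shader: no branching on material type here.
    float shade(float nDotL) const { return subroutines[active](nDotL); }
};
```

<p>The point of the analogy is that the shading code itself stays free of material or light type conditionals; swapping behavior is a cheap host-side selection rather than a program switch or a recompilation.</p>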
<h3><a title="ARB_tessellation_shader" href="http://www.opengl.org/registry/specs/ARB/tessellation_shader.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/tessellation_shader.txt?referer=');">ARB_tessellation_shader</a></h3>
<p>Yes, this is about the new geometry tessellation mechanism introduced by Shader Model 5.0 hardware. The extension itself introduces three new stages that are roughly situated between the vertex shader and the geometry shader:</p>
<ul>
<li><strong>Tessellation Control Shader</strong> &#8211; This new shader type operates on a patch, which is nothing more than a fixed-size collection of vertices, each with per-vertex attributes, plus a number of associated per-patch attributes. Also note that while it operates on a patch, it is invoked on a per-vertex basis. The most important role of this shader is to set the tessellation levels of the patch, which control how finely the patch will be tessellated. Usually you can think of a patch as a triangle or quad. This shader is equivalent to DX11&#8217;s hull shader.</li>
<li><strong>Fixed-function tessellation primitive generator</strong> &#8211; The role of this new stage is to subdivide the incoming patch based on the tessellation level and related configuration that the unit gets as input.</li>
<li><strong>Tessellation Evaluation Shader</strong> &#8211; This new shader type is responsible for calculating the position and other attributes of the vertices produced by the tessellator. This shader is equivalent to DX11&#8217;s domain shader.</li>
</ul>
<p>One important thing to notice is that a new primitive type is introduced, namely the patch. A patch on its own is not related to any traditional OpenGL primitive, as it cannot be rendered directly. It is used only as the input type for the tessellator. However, a patch supplies the control grid of the geometry to be generated via tessellation, so in practice it is most likely to correspond to triangles or quads, but it is important to note the difference.</p>
<p>As this is maybe the most well-known feature of Shader Model 5.0 hardware, I won&#8217;t talk about it more, as everybody knows what it is for and it would take rather long to explain how to use it. Also, it is not the intention of this article to fully cover the usage of all the new features; it is just a quick summary of the new possibilities.</p>
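<p>To illustrate what the evaluation stage conceptually does with a three-vertex patch, here is a minimal CPU-side sketch that places a generated vertex by barycentric interpolation of the patch corners. The weights play the role of the tessellation coordinate handed to the evaluation shader (<code>gl_TessCoord</code> in GLSL); the struct and function names are hypothetical, and a real shader would typically also displace or smooth the result.</p>

```cpp
#include <array>

struct Vec3 {
    float x, y, z;
};

// CPU analogue of a very simple tessellation evaluation step for a
// triangular patch: the new vertex is the weighted sum of the three
// patch corners, with barycentric weights (u, v, w) that sum to one.
Vec3 evaluateTrianglePatch(const std::array<Vec3, 3>& corners, const Vec3& tessCoord) {
    Vec3 p;
    p.x = tessCoord.x * corners[0].x + tessCoord.y * corners[1].x + tessCoord.z * corners[2].x;
    p.y = tessCoord.x * corners[0].y + tessCoord.y * corners[1].y + tessCoord.z * corners[2].y;
    p.z = tessCoord.x * corners[0].z + tessCoord.y * corners[1].z + tessCoord.z * corners[2].z;
    return p;
}
```

<p>The tessellator decides how many such coordinates are generated for a patch, based on the tessellation levels; the evaluation step only turns each of them into an actual vertex.</p>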
<h3><a title="ARB_texture_buffer_object_rgb32" href="http://www.opengl.org/registry/specs/ARB/texture_buffer_object_rgb32.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_buffer_object_rgb32.txt?referer=');">ARB_texture_buffer_object_rgb32</a></h3>
<p>Yet another extension that introduces additional formats, this time for texture buffers. Previously, texture buffers had no support for three-component formats; this extension adds them. As I currently have no practical use case in mind where this would be useful, I would rather not make one up. However, my opinion is that these formats most probably work with reduced performance compared to the four-component ones: even though the memory footprint and bandwidth usage may be somewhat lower, I have concerns about alignment-related performance issues.</p>
<h3><a title="ARB_texture_cube_map_array" href="http://www.opengl.org/registry/specs/ARB/texture_cube_map_array.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_cube_map_array.txt?referer=');">ARB_texture_cube_map_array</a></h3>
<p>Those who already use texture arrays to improve batching and remove unnecessary state changes will most probably adore this extension, as it enables texture array capabilities for cube map textures as well. This comes in handy especially when many materials use environment cube maps or when shadow cube maps are used for point lights. To come up with an even more concrete example, you can render the shadow cube maps for many hundreds of point lights with a single draw call by taking advantage of the layered rendering capability of geometry shaders and the possibility to bind texture arrays as render targets.</p>
<p>One more thing to notice here is that cube map arrays are already supported by Shader Model 4.1 hardware, so the same question to the ARB arises again. However, as OpenGL 3.3 still targets Shader Model 4.0 hardware, maybe we will see a 3.x version of the specification that will also include this extension. It is up to you to judge whether you agree with me or not.</p>
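<p>For the layered rendering case, the only piece of arithmetic needed is the mapping from a cube map index and a face to the layer that the geometry shader has to target. A minimal sketch, with a hypothetical function name and the usual face order (+X, -X, +Y, -Y, +Z, -Z) assumed:</p>

```cpp
#include <cassert>

// Layer index inside a cube map array when it is treated as a stack of
// 2D layers: six consecutive layers per cube map, one per face.
// Face order assumed here: 0 = +X, 1 = -X, 2 = +Y, 3 = -Y, 4 = +Z, 5 = -Z.
int cubeArrayLayerFace(int cubeIndex, int face) {
    assert(cubeIndex >= 0);
    assert(face >= 0 && face < 6);
    return cubeIndex * 6 + face;
}
```

<p>In the shadow map example this is the value a geometry shader would write to <code>gl_Layer</code> for the light with index <code>cubeIndex</code> when emitting the primitive towards a particular face.</p>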
<h3><a title="ARB_texture_gather" href="http://www.opengl.org/registry/specs/ARB/texture_gather.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_gather.txt?referer=');">ARB_texture_gather</a></h3>
<p>Another feature from the repertoire of Shader Model 4.1. This extension introduces new texture fetching functions to the Shading Language that determine the 2&#215;2 footprint of the texture that would be used for linear filtering in a texture lookup and return a vector consisting of the first component of each of the four texels in the footprint. This is the so-called Gather4 texture fetching mode, and it can be useful to accelerate percentage closer filtering of shadow maps as it fetches four samples at once. Still, there are some limitations on the use of this fetching mode; an important one is that a shader cannot use normal and gather fetches on the same sampler. This makes me wonder why this feature is not part of the sampler object state instead of being a Shading Language construct. Anyway, as in typical use cases these limitations do not defeat the goal of the feature, I would not consider this a design issue.</p>
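<p>The following CPU-side sketch models a gather fetch and its use for a four-tap percentage closer filter. The texture type and function names are hypothetical; the component order of the result, (i0, j1), (i1, j1), (i1, j0), (i0, j0), follows my reading of the extension specification, and clamp-to-edge addressing is simply assumed for brevity.</p>

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Minimal single-channel texture; red[j * width + i] is texel (i, j).
struct Texture2D {
    int width;
    int height;
    std::vector<float> red;

    // Fetches the first component with clamp-to-edge addressing.
    float fetch(int i, int j) const {
        i = std::max(0, std::min(width - 1, i));
        j = std::max(0, std::min(height - 1, j));
        return red[static_cast<std::size_t>(j * width + i)];
    }
};

// Gathers the first component of the 2x2 footprint whose lower-left texel
// is (i0, j0), in the order (i0,j1), (i1,j1), (i1,j0), (i0,j0).
std::array<float, 4> gather4(const Texture2D& tex, int i0, int j0) {
    int i1 = i0 + 1;
    int j1 = j0 + 1;
    return {{tex.fetch(i0, j1), tex.fetch(i1, j1), tex.fetch(i1, j0), tex.fetch(i0, j0)}};
}

// Four-tap percentage closer filtering on gathered depths: the fraction of
// texels whose stored depth is not closer than the reference depth.
float pcf4(const std::array<float, 4>& depths, float reference) {
    float lit = 0.0f;
    for (float d : depths) {
        if (d >= reference) lit += 0.25f;
    }
    return lit;
}
```

<p>A shader doing the same with ordinary fetches would need four separate texture lookups, which is where the potential saving comes from.</p>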
<h3><a title="ARB_transform_feedback2" href="http://www.opengl.org/registry/specs/ARB/transform_feedback2.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/transform_feedback2.txt?referer=');">ARB_transform_feedback2</a></h3>
<p>The transform feedback mechanism has already proved to me that it is a great addition to the tool-set of graphics application developers. This feature extends transform feedback with an object type that encapsulates transform feedback related state to enable configuration reuse. It also provides a way to pause and resume transform feedback mode if, for some reason, some rendering commands should be excluded from the feedback process.</p>
<p>The last and maybe most important benefit of this extension is the ability to draw primitives captured in transform feedback mode without querying the captured primitive count. It is roughly equivalent to DX10&#8217;s AutoDraw feature and the purpose of it is to eliminate the need to query the number of previously generated primitives in order to supply it to an OpenGL draw command. This solves the synchronization issues that previously happened between the CPU and the GPU.</p>
<p>One example is when skeletally animated geometry has to be used in a multipass rendering technique. We can think of traditional forward rendering or of generating multiple shadow maps. As the calculations needed to perform skeletal animation are rather expensive, it is wasteful to repeat them in each pass. A common way to solve this problem is to use transform feedback to capture the geometry emitted by a vertex shader that simply executes the skeletal animation on the input geometry. In subsequent rendering passes this feedback buffer can be used to source the geometry data, eliminating the need to recompute the animation. Without this extension, the application is most probably stalled in such cases until the feedback process ends, as it needs to query the number of generated primitives. With this extension, this is solved, as we don&#8217;t have to know the results of the previous transform feedback in order to issue a draw command that sources the data from the feedback buffer. By the way, this seems logical: the information is already on the GPU, so why should it ping-pong between the CPU and the GPU?</p>
<p>As I mentioned before, the functionality provided by this extension is equivalent to DX10&#8217;s AutoDraw feature. This time my question is really serious: why hasn&#8217;t this feature been included in OpenGL 3.3? It would provide a great benefit for those who use transform feedback and I don&#8217;t see any reason for not supporting it because, as far as I can tell, it is supported on the corresponding hardware.</p>
<h3><a title="ARB_transform_feedback3" href="http://www.opengl.org/registry/specs/ARB/transform_feedback3.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/transform_feedback3.txt?referer=');">ARB_transform_feedback3</a></h3>
<p>Surprisingly, OpenGL 4.0 comes with another transform feedback extension as well, but this time it is a true Shader Model 5.0 feature. The new hardware generation has the ability to emit vertices from the geometry shader to multiple vertex streams. In order to provide proper API support, the ARB decided to relax the previous limitation of transform feedback mode that output can be either in interleaved format or written to separate buffers. This new extension enables the use of both together, and also provides a way to organize geometry shader outputs into groups in order to target the individual vertex streams.</p>
<p>The most important benefit of this feature is that we have separate streams, each with its own primitive emission counter, so the outputs do not necessarily have to have the same granularity. This provides room for very clever rendering techniques. As an example, remember NVIDIA&#8217;s <a title="NVIDIA Skinned Instancing demo" href="http://developer.download.nvidia.com/SDK/10/direct3d/samples.html" target="_blank" onclick="pageTracker._trackPageview('/outgoing/developer.download.nvidia.com/SDK/10/direct3d/samples.html?referer=');">Skinned Instancing</a> demo that used one draw call per geometry LOD to sort instance data on a per-LOD basis. Using this extension, this preprocessing step can be done with a single draw call. The abilities of this feature go far beyond such a simple use case, though, and I will talk a bit about another one in the next section.</p>
<p>One of my less technical notes is that the Khronos Group members seem to have a good sense of humor. I realized this when I came across the &#8220;manbearbig&#8221; while reading one of the examples in the extension specification.</p>
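<p>The per-LOD sorting idea can be modelled on the CPU as a single pass that appends each instance to the stream of its LOD, with every stream keeping its own element count. This is only a sketch with hypothetical names and distance thresholds; on the GPU the loop body would live in a geometry shader that emits to the chosen vertex stream, and the containers would be transform feedback buffers.</p>

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct Instance {
    float distance;  // distance from the camera
    int id;          // stands in for the full per-instance data
};

constexpr std::size_t kLodCount = 3;

// Single pass over all instances: each one is written to the output stream
// of its LOD. The size of each vector plays the role of the per-stream
// primitive counter.
std::array<std::vector<int>, kLodCount> sortByLod(const std::vector<Instance>& instances,
                                                  float lod1Start, float lod2Start) {
    std::array<std::vector<int>, kLodCount> streams;
    for (const Instance& inst : instances) {
        std::size_t lod = 0;
        if (inst.distance >= lod2Start) {
            lod = 2;
        } else if (inst.distance >= lod1Start) {
            lod = 1;
        }
        streams[lod].push_back(inst.id);
    }
    return streams;
}
```

<p>The streams end up with different lengths, which is exactly the &#8220;different granularity&#8221; that separate emission counters make possible.</p>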
<h3><a title="ARB_draw_indirect" href="http://www.opengl.org/registry/specs/ARB/draw_indirect.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/draw_indirect.txt?referer=');">ARB_draw_indirect</a></h3>
<p>We&#8217;ve arrived at the high point of the list of newly introduced features. It is hard to say such a thing, but in my humble opinion this extension can be the Holy Grail of next generation rendering engines. I will explain why I think so&#8230;</p>
<p>The extension provides a way to source the parameters of instanced draw commands from within buffer objects. One naive use case would be to put all the rendering command parameters into a buffer object using the host application and then draw everything with a single command. While this simple method already has its benefits, the feature provides much more flexibility than this. The most revolutionary aspect is that, using this extension, one is able to generate instanced draw commands with the GPU on-the-fly. Together with <a title="ARB_transform_feedback3" href="http://www.opengl.org/registry/specs/ARB/transform_feedback3.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/transform_feedback3.txt?referer=');">ARB_transform_feedback3</a> it is possible to write a completely GPU based scene management system.</p>
<p>Those who remember my <em>Instance Cloud Reduction</em> (ICR) algorithm, presented in the article <a title="Instance culling using geometry shaders - RasterGrid Blog" href="http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/">Instance culling using geometry shaders</a>, know that the required synchronization points between the CPU and the GPU heavily limited the practical utility of the culling technique. Taking advantage of the aforementioned features in ICR does not just eliminate the synchronization issues that I&#8217;ve spoken of, but also makes the technique practical for heavily heterogeneous scenes with virtually any number of geometries, even if each of them has multiple LOD levels. All this can be done with even fewer draw calls than in the demo that accompanied my article. As soon as OpenGL 4.0 capable drivers appear, I will write an article about this technique, also supplying a reference implementation.</p>
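<p>To make the idea of sourcing draw parameters from a buffer more concrete, here is a minimal CPU-side sketch of the record layout that the ARB_draw_indirect specification describes for non-indexed draws. The helper <code>makeInstancedDraw</code> is a hypothetical name used only for illustration, and fixed-width integers stand in for <code>GLuint</code> so the snippet compiles without any OpenGL headers.</p>

```cpp
#include <cstdint>

// One indirect draw record for non-indexed draws, following the field order
// given in the ARB_draw_indirect specification: four packed 32-bit unsigned
// integers. std::uint32_t is used here in place of GLuint (assumption: the
// two have the same size, as on all common platforms).
struct DrawArraysIndirectCommand {
    std::uint32_t count;              // number of vertices per instance
    std::uint32_t primCount;          // number of instances to draw
    std::uint32_t first;              // index of the first vertex
    std::uint32_t reservedMustBeZero; // reserved by the extension, must be 0
};

static_assert(sizeof(DrawArraysIndirectCommand) == 16,
              "the record is expected to be four tightly packed 32-bit values");

// Hypothetical helper: fills one record on the CPU. In a GPU driven setup the
// primCount field would instead be written by the GPU (for example by a
// culling pass), so the host never needs to read the visible instance count.
inline DrawArraysIndirectCommand makeInstancedDraw(std::uint32_t vertexCount,
                                                   std::uint32_t instanceCount,
                                                   std::uint32_t firstVertex = 0) {
    return {vertexCount, instanceCount, firstVertex, 0u};
}
```

<p>Such a record would then be stored in a buffer object bound to <code>GL_DRAW_INDIRECT_BUFFER</code> and consumed by <code>glDrawArraysIndirect</code>, which takes the primitive mode and an offset into that buffer instead of explicit draw parameters.</p>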
<h3><a title="ARB_texture_query_lod" href="http://www.opengl.org/registry/specs/ARB/texture_query_lod.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/texture_query_lod.txt?referer=');">ARB_texture_query_lod</a></h3>
<p>This extension provides new fragment shader texture functions, namely textureQueryLOD, that return the results of the automatic LOD computations that would be performed if a texture lookup were performed. These functions return a two-component vector. The X component of the result vector contains information about the mipmap level that would be used if a normal texture lookup were made with the same coordinates. This value can be a concrete mipmap level or a value between two levels if trilinear filtering is in use. The Y component of the result holds the computed LOD lambda-prime; see the OpenGL specification to check where it actually comes from and how it is calculated.</p>
<p>One interesting use of this extension is when one implements some shader based filtering and addressing method for textures. As an example, let&#8217;s take a mega-texture implementation that uses a 3D texture for storage, without actual mipmaps, where the addressing, filtering and mipmapping are done in shader code. As this is the only example that comes to my mind right now, and it is already awkward enough, I would rather leave further discussion of the importance of this feature to more competent people.</p>
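<p>For readers who want a feeling for where the two returned values come from, the following is a rough CPU-side sketch of the idealized LOD computation for a 2D texture. It is only an approximation under stated assumptions: the derivatives are given in texel units, LOD bias and the min/max LOD clamps are ignored, and real implementations are allowed to approximate the scale factor. The names <code>LodResult</code> and <code>estimateLod</code> are hypothetical.</p>

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical result type mirroring the two components described above.
struct LodResult {
    float level;  // mipmap level that would be accessed (clamped to the chain)
    float lambda; // computed LOD before clamping to the available levels
};

// Simplified LOD estimate for a 2D texture.
// dudx, dvdx, dudy, dvdy: screen-space derivatives of the texel coordinates.
// mipLevels: number of levels in the mipmap chain (assumed to be >= 1).
inline LodResult estimateLod(float dudx, float dvdx,
                             float dudy, float dvdy,
                             int mipLevels) {
    // Scale factor: the longer of the two derivative vectors.
    const float rho = std::max(std::hypot(dudx, dvdx), std::hypot(dudy, dvdy));
    // Base-2 logarithm of the scale factor gives the level of detail.
    const float lambda = std::log2(rho);
    // Clamp to the range of levels that actually exist.
    const float maxLevel = static_cast<float>(mipLevels - 1);
    const float level = std::min(std::max(lambda, 0.0f), maxLevel);
    return {level, lambda};
}
```

<p>With a footprint of four texels per pixel in each direction this yields a level of 2, while a magnified texture produces a negative lambda and a level clamped to 0.</p>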
<h3><a title="ARB_gpu_shader5" href="http://www.opengl.org/registry/specs/ARB/gpu_shader5.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/gpu_shader5.txt?referer=');">ARB_gpu_shader5</a></h3>
<p>Basically, this extension is nothing more than a big umbrella feature under which all the additional general or minor API changes go. To sum up some of the miscellaneous features provided by this extension, here is an excerpt from the extension specification:</p>
<ul>
<li>Support for indexing into arrays of samplers using non-constant indices.</li>
<li>Support for indexing into an array of uniform blocks.</li>
<li>Extending Gather4 with the ability to select any single component of a multi-component texture, to perform per-sample depth comparison, and to specify arbitrary offsets computed at runtime when gathering the 2&#215;2 footprint.</li>
<li>Support for instanced geometry shaders, where a geometry shader may be run multiple times for each primitive.</li>
</ul>
<p>For a full list of new facilities introduced by the extension refer to the extension specification.</p>
<h3><a title="ARB_gpu_shader_fp64" href="http://www.opengl.org/registry/specs/ARB/gpu_shader_fp64.txt" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.opengl.org/registry/specs/ARB/gpu_shader_fp64.txt?referer=');">ARB_gpu_shader_fp64</a></h3>
<p>This extension enables the use of double-precision floating-point data types and arithmetic from within shaders, also providing API entry points for double-precision data where they were missing. While one may think that the added precision is somewhat wasteful for real-time graphics, it is important to note that GPUs are used more and more often for scientific calculations, not even necessarily for graphics related tasks. Taking this into consideration, the importance of double-precision floating-point support should not be underestimated. Besides, standard graphics application developers may also be able to take advantage of the higher precision in some extreme use cases.</p>
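<p>A tiny host-side illustration of why the extra precision can matter: a single-precision float has a 24-bit significand, so above 2<sup>24</sup> it can no longer represent every integer, while a double still can. The helper names below are hypothetical, and the same effect applies to shader-side arithmetic on large coordinates.</p>

```cpp
// Returns true if adding 1 to x produces a value different from x when the
// result is stored as a 32-bit float. The volatile qualifier forces the sum
// to be stored as a float instead of being kept in a wider register.
inline bool incrementSurvivesFloat(float x) {
    volatile float y = x + 1.0f;
    return y != x;
}

// Same check using 64-bit doubles.
inline bool incrementSurvivesDouble(double x) {
    volatile double y = x + 1.0;
    return y != x;
}
```

<p>For 16777216 (2<sup>24</sup>) the increment is lost in single precision but preserved in double precision, which is the kind of situation that large world coordinates or long accumulated sums can run into.</p>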
<h3>The OpenGL Shading Language 4.00</h3>
<p>Besides what I&#8217;ve already mentioned, there is nothing particularly important to say about the Shading Language. There were many changes, but most of them simply provide Shading Language support for the API extensions. What I haven&#8217;t mentioned so far are the synchronization possibility for tessellation shaders, more implicit conversions, more integer functions, packing and unpacking facilities for floating-point formats and a new qualifier to force precision and disallow optimizations that re-order operations or treat different instances of the same operator with different precision.</p>
<h2>Conclusion</h2>
<p>I hope that some of you have not given up reading so far. Sorry, but it seems that this article has gone wild and still didn&#8217;t manage to cover all the topics I intended to talk about. But still, maybe I&#8217;ll return to those subjects later.</p>
<blockquote><p>Where is direct state access?</p></blockquote>
<p>The original promise of eliminating the bind-to-modify semantics from the OpenGL API has still not been fulfilled. The first reaction of many people is still to ask this question. While the bind-to-modify semantics is a rather annoying &#8220;feature&#8221; of OpenGL, I tend to think that, if we are not talking about legacy OpenGL, direct state access is becoming less and less important, as we can already heavily reduce the number of state changes and API calls in our applications thanks to the fast-paced evolution of OpenGL. I sincerely think that with a modern rendering engine design built upon the idioms behind the new versions of the OpenGL API one should not face any significant scalability issues due to the outdated bind-to-modify semantics, but maybe I&#8217;m wrong.</p>
<p>Personally, I have only one problem with the newly released specification versions, one that I&#8217;ve already tried to emphasize several times: so far many Shader Model 4.x features are missing from the 3.x line of the API specification. Hopefully that will be solved sooner or later, but these issues should be addressed before the supporting hardware becomes outdated.</p>
<p>Anyway, we should not have any harsh complaints, as the Khronos Group did a great job again. They once again managed to keep the half-year schedule and they even published two parallel releases at once! Anyone who still says that the DirectX API is superior to OpenGL should think twice, as it seems that OpenGL is evolving faster and faster. Besides, now that AMD is also active in the OpenGL world, we can expect good support from both the industry and the developer community.</p>
<p>My respect for the Khronos Group and thanks for reading the article!</p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/03/a-brief-preview-of-the-new-features-introduced-by-opengl-3-3-and-4-0/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>RasterGrid Blog crossed the 10000 threshold</title>
		<link>http://rastergrid.com/blog/2010/03/rastergrid-blog-crossed-the-10000-threshold/</link>
		<comments>http://rastergrid.com/blog/2010/03/rastergrid-blog-crossed-the-10000-threshold/#comments</comments>
		<pubDate>Tue, 09 Mar 2010 20:35:41 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=204</guid>
		<description><![CDATA[
I am proud to announce that the number of visits has just gone over 10000. I must admit that I didn&#8217;t expect such great success in less than two months. I can hardly thank all my occasional, and especially my returning, visitors enough!
When I&#8217;ve started to write this blog [...]]]></description>
			<content:encoded><![CDATA[
<p>I am proud to announce that the number of visits has just gone over 10000. I must admit that I didn&#8217;t expect such great success in less than two months. I can hardly thank all my occasional, and especially my returning, visitors enough!</p>
<p>When I started to write this blog my primary intention was to share my knowledge and ideas, no matter if they are legitimate enough or not. I did this in the hope that the articles on this blog may help others. In the end, it turned out that I myself learned a lot from it as, thanks to You, I&#8217;ve got great improvement ideas and feedback about my writings.</p>
<p><span id="more-204"></span>During the last two months I came up with articles that evoked various feelings in the Reader. I&#8217;ve received encouraging comments that gave me great strength to carry on. Sometimes I also faced conflicts due to different points of view and opinions, but I think those were also very instructive for both me and others. Also, one of my best experiences was when You came up with excellent improvement ideas regarding the presented source code or whatever. The only thing I feel sorry for is that I haven&#8217;t had sufficient time to write further articles. Unfortunately, I cannot promise that this will change in the future, but I will do my best, and I hope the quality and the utility of my articles will improve over time.</p>
<p>As a schedule of 2-3 articles per week turned out to be impossible to fit into my time-frame, I will most probably try to stick to one post per week. As a foretaste, here are some topics that I would like to talk about in the near future:</p>
<ul>
<li>Further application of geometry shaders (cube rendering and some more nifty tricks)</li>
<li>AMD tessellation demo with practical use cases</li>
<li>WebGL, COLLADA and other good to know stuff regarding portable graphics</li>
<li>Physics and rigid body dynamics</li>
<li>More info about some of the best unit test practices</li>
<li>C++ messaging and state machines</li>
<li>Maybe some further development software reviews</li>
</ul>
<p>As a final word: thanks for being interested!</p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/03/rastergrid-blog-crossed-the-10000-threshold/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Flexible static analysis for C++ code bases</title>
		<link>http://rastergrid.com/blog/2010/03/flexible-static-analysis-for-c-code-bases/</link>
		<comments>http://rastergrid.com/blog/2010/03/flexible-static-analysis-for-c-code-bases/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 17:12:37 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[code analysis]]></category>
		<category><![CDATA[CppDepend]]></category>
		<category><![CDATA[GLM]]></category>
		<category><![CDATA[GoogleMock]]></category>
		<category><![CDATA[maintenance]]></category>
		<category><![CDATA[refactoring]]></category>
		<category><![CDATA[SFML]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=190</guid>
		<description><![CDATA[
The importance of static code analysis is already well known in the domain of software development. There are plenty of useful and less useful tools for the purpose, especially in the case of C++. However, even if the quality of these tools is generally adequate, they usually suffer from the inability to [...]]]></description>
			<content:encoded><![CDATA[
<p>The importance of static code analysis is already well known in the domain of software development. There are plenty of useful and less useful tools for the purpose, especially in the case of C++. However, even if the quality of these tools is generally adequate, they usually suffer from the inability to extend or customize their behavior. Also, a common problem arises from the fact that the C++ language syntax is overwhelmingly complex, which makes writing the code parser of any static analysis tool a nightmare. In this article I would like to present a tool called CppDepend that gracefully solves the aforementioned problems, primarily focusing on providing an interface that enables 100% adaptability and extensibility for creating customized metrics that are relevant or applicable in a particular domain.</p>
<p><span id="more-190"></span></p>
<h3>Why static code analysis?</h3>
<p>Analysis of computer software, in particular verification and validation, is a very important factor in professional software development. The process itself can take different forms. Generally, all kinds of verification and validation techniques can be categorized into two major groups: static analysis and dynamic analysis. The key difference between the two is that while dynamic analysis verifies the execution of the code, static analysis works strictly on the code base itself.</p>
<p>Well, there are thousands of reasons why using a static code analysis tool benefits a particular software development process. If you ask various people, they will all have their own reasons and rationale. Just to mention my favorites, here is a brief excerpt from the long list:</p>
<ul>
<li>Finds coding errors before a single line of code is executed. This is important as it does not require the project to be built or executed, and in many cases these two additional phases can be quite expensive in terms of both time and budget.</li>
<li>Identifies parts of the code that seem to be difficult to maintain or do not conform to various policies of a particular company or organization. This helps us move towards sustainable development by heavily reducing maintenance costs.</li>
<li>Provides miscellaneous metrics about our code that can be of key importance in measuring the quality of the code base.</li>
</ul>
<blockquote><p>If you can&#8217;t measure it, you can&#8217;t improve it &#8211; Lord Kelvin</p></blockquote>
<p>Many people still think that code metrics are overrated. Even if at first sight this seems to be true for micro-projects, their importance becomes very obvious when one meets a large code base, especially in situations when legacy code is inherited from earlier generations of software developers. When the size of the software goes beyond what a programmer is capable of keeping in mind (which is the case for the vast majority of software products), code metrics provide great value in identifying &#8220;hot spots&#8221; in the code base, no matter what actual situation we are talking about.</p>
<p>Also, deciding whether the evolution of the software goes in the right direction is very difficult, if not impossible, without ways of measuring the quality of the code. The most naive solution for this problem is to measure the number of bug reports filed over time; however, code metrics provide a much more sophisticated way of measuring the quality from different aspects and on different levels.</p>
<p>During my career as a software developer, I also faced many situations when the inspection of legacy code was necessary in order to introduce new functionality. Unfortunately, in most of the cases, due to the lack of an adequate static code analyzer, this required developers to read and manually inspect the code in order to solve the particular problem. I can tell you that it&#8217;s not a joyful duty. Just to mention some of the most critical situations that developers currently meet regarding the topic:</p>
<p><strong>Removing dependencies on deprecated features.</strong> This is something that every software development project faces from time to time. The interval is usually relatively short, as we are talking about a few years, which is quite frequent compared to other industries. Just think about situations when one migrates to a new version of a third party library that the whole software depends on. As a recent event, we can talk about the release of version 3 of the OpenGL specification. CAD software developer companies faced a huge challenge by being forced to adopt the new features as the old ones became deprecated and obsolete. Actually they were quite lucky that vendors declined to drop features from their implementations. Using a code analyzer one can easily identify the modules that need to be modified in order to adapt to the latest changes.</p>
<p><strong>Introducing multiprocessing.</strong> This is also a pressing problem that every software development company will face sooner or later. Code bases inherited from previous decades were not prepared to handle concurrent execution of the code, thus causing big headaches for software architects who have to redesign the code to be SMP compliant, especially when dealing with multi-core processors. I&#8217;ve also faced this situation during my career, and it was a painful lesson that code analysis capabilities are of great importance. Without carefully inspecting the whole code base it is very difficult to identify the possible problems that may arise from the introduction of multiprocessing. Automatic inspection of the code can be a very handy tool for minimizing the required effort.</p>
<h3>What makes up a good static code analysis tool?</h3>
<p>There are many different aspects that affect how good a particular static code analysis tool is. In many situations competing alternatives for this purpose are scarce. Fortunately, this is not the case for C++, as it is a programming language well supported by the community. However, in order to choose a suitable alternative we have to collect our requirements:</p>
<ul>
<li><strong>Correctness</strong> &#8211; It must correctly analyze the code. This is a very basic requirement for any software development tool. While this seems to be a completely obvious requirement and one expects tools to behave accordingly, most such tools for C++ do not conform to this principle. Those who know the C++ language standard know well that writing a good parser for it is almost impossible.</li>
<li><strong>Usefulness</strong> &#8211; There is no sense in using a static code analyzer if we don&#8217;t get any benefits from it. The reports generated by the analyzer should provide useful information that is directly applicable in a particular use case. One typical example that I also faced quite often is when one analyses legacy code and gets a report about thousands of problematic code parts. These reports are almost impossible to handle, and it gives developers a headache even to answer the very simple question: where to start?</li>
<li><strong>Customizability</strong> &#8211; This requirement directly relates to the previous one. Taking the previous example, if there were some way to customize the reports to cover only the 10 most problematic modules, they would be much easier to handle. However, this requirement goes far beyond this. As an example, besides the built-in metrics of the analysis tool, it should provide means to add or modify metrics in order to have more relevant measures about the code that fit a particular domain or use case.</li>
</ul>
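<p>The &#8220;10 most problematic modules&#8221; idea from the customizability requirement can be sketched in a few lines of C++. This is only an illustration of the kind of report customization meant here, not the way any particular tool implements it; the type <code>ModuleReport</code>, its fields and the function name are all hypothetical.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical summary of an analysis run for one module.
struct ModuleReport {
    std::string module; // module name
    int issueCount;     // number of problems reported for the module
};

// Returns the n modules with the most reported issues, worst first.
// The input is taken by value so the caller's list stays untouched.
inline std::vector<ModuleReport> topProblematicModules(
        std::vector<ModuleReport> reports, std::size_t n) {
    n = std::min(n, reports.size());
    // Only the first n positions need to be sorted.
    std::partial_sort(reports.begin(),
                      reports.begin() + static_cast<std::ptrdiff_t>(n),
                      reports.end(),
                      [](const ModuleReport& a, const ModuleReport& b) {
                          return a.issueCount > b.issueCount;
                      });
    reports.resize(n);
    return reports;
}
```

<p>Instead of thousands of unordered findings, the developer gets a short, ordered list that directly answers the &#8220;where to start?&#8221; question.</p>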
<p>We&#8217;ve explicitly mentioned only three requirements and we have already heavily reduced the number of alternatives&#8230;</p>
<h3>CppDepend as a flawless alternative</h3>
<p>Recently I got a request to review a C++ static code analysis tool called CppDepend. After a brief look at the product I realized that it deserves a thorough inspection, as it features a revolutionary technology called CQL that I will talk about a bit later in the article.</p>
<p>CppDepend was developed in partnership with NDepend. It was released six months ago after two years of development by a very small team of experts. It is accompanied by its siblings NDepend and XDepend, which do the same job for .NET and Java projects respectively.</p>
<p>We are talking about a Windows application that has tight integration with Visual Studio projects but can also be applied to projects built with other development toolsets. Besides being a command-line static code analysis tool for the C++ language, it provides a powerful GUI tool for visual inspection of different aspects of the code base, thus enabling increased productivity and ease of use.</p>
<p>Let&#8217;s take a first look at the tool by using the visual interface to analyse a sample code base, which in our case will be the source code of <a title="Simple and Fast Multimedia Library" href="http://www.sfml-dev.org/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.sfml-dev.org/?referer=');">SFML</a>.</p>
<p>Setting up the basic configuration for an analysis project is very straightforward. Besides that, the code analysis itself is surprisingly fast. While testing, the longest run was when I parsed the code of the <a title="Bullet Physics Library" href="http://bulletphysics.org/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/bulletphysics.org/?referer=');">Bullet Physics Library</a>, but even that didn&#8217;t take a minute on my system.</p>
<div id="attachment_194" class="wp-caption aligncenter" style="width: 624px"><a href="http://rastergrid.com/blog/wp-content/uploads/2010/03/cppdepend.png"><img class="size-large wp-image-194 " title="CppDepend graphical user interface" src="http://rastergrid.com/blog/wp-content/uploads/2010/03/cppdepend-1024x789.png" alt="CppDepend graphical user interface" width="614" height="473" /></a><p class="wp-caption-text">CppDepend graphical user interface</p></div>
<p>The visual controls themselves are sometimes not very responsive due to the complex structures and relationships they present, but we soon forgive CppDepend this minor issue when we take a closer look at the navigation possibilities offered by the tool.</p>
<p>At first sight, the user interface seems to be a bit overcomplicated, but we soon realize that each and every element of it is there on purpose in order to provide as much freedom in navigation as possible. Just to mention the most interesting ones, here&#8217;s an explanation of the purpose of the graphical figures at the top right part of the GUI:</p>
<ul>
<li>At the top left we see a graphical representation of the currently selected code metric. It shows the magnitude of the result of the metric according to the selected level of granularity. Here we can easily visualize, as an example, how the sizes of different classes of our project compare to each other.</li>
<li>At the middle left is the dependency matrix of our solution. We can easily find &#8220;hot spots&#8221; in our code regarding coupling, by default, on project level. The granularity of the table can be easily changed in a non-proportional way from project level down to method level. I used the word &#8220;non-proportional&#8221; intentionally, as we can examine dependency even between a method and a foreign project, thus providing additional flexibility over how fine grained we would like our numbers to be.</li>
<li>My favorite is in the middle, called the dependency graph. It can present the dependencies between different software elements from project level down to method level, as usual, by means of a graph that is very convenient for human inspection.</li>
</ul>
<p>The whole user interface is designed in a way that each time we point at a particular element it shows useful information about that element and its environment, no matter if we talk about the metrics view, the dependency graph or the matrix.</p>
<p>Besides the tools for navigation and easy visualization, the GUI provides a collection of built-in reports about different aspects of the code. One of the first things everybody would try out from these is the query called &#8220;Quick summary of methods to refactor&#8221;. This is exactly the answer the developer would like to have to the question &#8220;where to start?&#8221; that I mentioned earlier.</p>
<p>To emphasize even more how convenient the user interface is: when one selects a particular query it immediately shows the results as a list of classes, methods or whatever, and the code elements in question are also immediately highlighted in the relevant graphical views.</p>
<p>Maybe I have already convinced most of you that CppDepend deserves attention as a valuable tool in good hands, but I haven&#8217;t even talked about the most interesting feature that really makes it a uniquely powerful piece of software.</p>
<h3>The power of extensibility</h3>
<p>I have repeatedly emphasized the importance of extensibility and customizability of a static code analyzer. This, in fact, is not just my obsession but an important factor in the decision of most software developers out there. Being able to get some common metrics about the code is one thing; having the possibility to define our own metrics and analysis criteria is another&#8230;</p>
<p>The power of CppDepend lies in a revolutionary technology that provides us with an interface to retrieve the information about the code that is relevant for us as easily as querying a relational database. The apparatus in our hands to achieve this is the <a title="Code Query Language 1.8 Specification" href="http://www.cppdepend.com/CQL.htm" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.cppdepend.com/CQL.htm?referer=');">Code Query Language (CQL)</a>. CppDepend actually builds an internal database structure from the source code and provides us with an SQL-like language to write queries that fetch reports from this internal database. Those who are already familiar with SQL will adore this feature. Just to illustrate how easy it is to use CQL to build custom queries, querying the classes that have more than 20 methods is as simple as the following line of CQL code:</p>
<pre class="brush: sql">SELECT TYPES WHERE NbMethods &gt; 20</pre>
<p>Simple, isn&#8217;t it? For further details, please refer to the specification of the Code Query Language: <a href="http://www.cppdepend.com/CQL.htm" onclick="pageTracker._trackPageview('/outgoing/www.cppdepend.com/CQL.htm?referer=');">http://www.cppdepend.com/CQL.htm</a></p>
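<p>To show what the one-line query above expresses, here is a rough C++ equivalent written by hand. It is purely a conceptual sketch and says nothing about how CppDepend evaluates CQL internally; the <code>TypeMetrics</code> record and the function name are hypothetical.</p>

```cpp
#include <string>
#include <vector>

// Hypothetical record holding one metric per analyzed type.
struct TypeMetrics {
    std::string name; // name of the class or struct
    int nbMethods;    // number of methods declared by the type
};

// Hand-written counterpart of "SELECT TYPES WHERE NbMethods > threshold":
// returns the names of all types whose method count exceeds the threshold.
inline std::vector<std::string> selectTypesWithMoreMethodsThan(
        const std::vector<TypeMetrics>& types, int threshold) {
    std::vector<std::string> result;
    for (const TypeMetrics& type : types) {
        if (type.nbMethods > threshold) {
            result.push_back(type.name);
        }
    }
    return result;
}
```

<p>Every new metric or condition would need another hand-written function like this one, which is exactly the effort a declarative query language saves.</p>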
<p>This means that software developers have complete freedom over how they define the metrics that indicate whether the code quality reaches the levels required by company policies or individual needs. It is also useful for solving the problems arising from the sample situations I&#8217;ve mentioned earlier, namely the dependency on deprecated features and the introduction of multiprocessing, by easily and clearly identifying the modules that need to be changed even when the code base is extremely large and traditional ways of identifying affected modules are not applicable or simply not feasible.</p>
<h3>Endurance test</h3>
<p>Well, I&#8217;ve already talked enough about the abilities of CppDepend regarding usefulness and customizability; however, I&#8217;ve barely touched the topic of correctness. As I&#8217;ve already mentioned, parsing C++ code correctly is not as easy as it may look. For this purpose I&#8217;ve prepared a bunch of template heavy libraries like <a title="OpenGL Mathematics" href="http://glm.g-truc.net/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/glm.g-truc.net/?referer=');">GLM</a> and <a title="GoogleMock" href="http://code.google.com/p/googlemock/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/code.google.com/p/googlemock/?referer=');">GoogleMock</a> to check how well CppDepend handles code bases when it comes to awkward features of the C++ language.</p>
<p>Even though static analysis tools generally do not provide much useful information about such projects, due to their special nature, it still seemed worthwhile to have CppDepend parse these libraries in order to get a picture of how it would handle huge projects that also take advantage of the templating mechanisms of C++. I have to say that the results are very promising, as it had problems only with GoogleMock, and the developers have already been informed about the problem I&#8217;ve encountered.</p>
<h3>The dark side of the story</h3>
<p>While CppDepend is an excellent tool for software developers working under Windows, especially if they use Visual Studio, I would like to see a cross-platform version of CppDepend in the future, at least for Linux and MacOSX.</p>
<p>Also, CppDepend does not come for free, though its price is reasonable. Even though individuals and hobbyists would most probably not consider buying it, for enterprises, even for small ones, the tool will most probably pay for itself soon by heavily decreasing the short- and long-run maintenance costs of development.</p>
<h3>Conclusion</h3>
<p>A clever static code analysis tool is nowadays a must for every software development company that deals with code whose size has already exceeded a certain threshold, but it is also good to use one from the very beginning of a new project. Selecting a particular tool for this purpose is the choice of the enterprise; still, the requirements for such software are usually the same.</p>
<p>CppDepend proved to me to be a valuable piece of software in the tool-chain of every C++ programmer using Windows as the primary development platform. If you are still not convinced then check out the <a title="CppDepend - Features" href="http://www.cppdepend.com/Features.aspx" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.cppdepend.com/Features.aspx?referer=');">full feature list</a> on the official site.</p>
<p>Even if you are not interested in using CppDepend or in static analysis tools at all, you should still take a look at CQL and the great idea behind it, as it is a perfect example of how a solution for a well discussed problem can ascend to new levels by adopting good practices from other domains, in this case from relational databases and related technologies.</p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/03/flexible-static-analysis-for-c-code-bases/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Unit testing OpenGL applications</title>
		<link>http://rastergrid.com/blog/2010/02/unit-testing-opengl-applications/</link>
		<comments>http://rastergrid.com/blog/2010/02/unit-testing-opengl-applications/#comments</comments>
		<pubDate>Mon, 22 Feb 2010 19:54:15 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[GLEW]]></category>
		<category><![CDATA[GoogleMock]]></category>
		<category><![CDATA[macro]]></category>
		<category><![CDATA[mocks]]></category>
		<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[TDD]]></category>
		<category><![CDATA[unit test]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=182</guid>
		<description><![CDATA[
Nowadays comprehensive testing is a must for any software product. However, this is not such a general rule when it comes to graphics applications. Many developers face difficulties when they have to test their rendering code. Manual tests and visual feedback are sometimes satisfactory, but if one would like to have automated regression tests, the usual approaches [...]]]></description>
			<content:encoded><![CDATA[
<p>Nowadays comprehensive testing is a must for any software product. However, this is not such a general rule when it comes to graphics applications. Many developers face difficulties when they have to test their rendering code. Manual tests and visual feedback are sometimes satisfactory, but if one would like to have automated regression tests, the usual approaches seem to fail. Even if unit testing of rendering code doesn&#8217;t look straightforward at first sight, in fact it is, and OpenGL is no exception. In this article I would like to briefly present a few methods for unit testing OpenGL rendering code, and also present my choice and the reasons behind the decision.</p>
<p><span id="more-182"></span>There are several ways to create automated test cases for rendering code. To present the different approaches we first have to select a small portion of our rendering code on which to demonstrate the differences between the techniques, mentioning the strengths and weaknesses of each.</p>
<p>Before going any further, we have to lay down our requirements for a good OpenGL unit testing environment:</p>
<ul>
<li><strong>Verifies results</strong> &#8211; This is the most basic requirement for any testing framework. We have to have the ability to check whether the rendering code executed by the module is valid and works as expected.</li>
<li><strong>Productive</strong> &#8211; The usage and maintenance of the framework shall require minimal effort. Unit testing is often criticized because it requires writing additional code. While this is generally true, a good unit testing environment can be kept very simple yet flexible. An OpenGL testing environment shouldn&#8217;t be different.</li>
<li><strong>Fast</strong> &#8211; This is a general requirement for any unit testing environment, especially when combined with a continuous integration framework. We want our test results as fast as possible, as long feedback cycles severely slow down the development process.</li>
<li><strong>Standalone</strong> &#8211; Does not require complex setup or environmental support in order to be executed. This is a general requirement of unit testing: if the code is tightly coupled to its surroundings, both development and maintenance costs increase.</li>
<li><strong>Compatible</strong> &#8211; Does not require any special hardware, so the tests can run on a machine that wouldn&#8217;t necessarily be suitable for manually testing the actual product. This is especially important when the target hardware is some type of embedded platform. It is also important to ensure that the tests work on hardware provided by different vendors. In short, the environment should comply with the standard, not with driver implementations.</li>
<li><strong>Cross-platform</strong> &#8211; Does not rely on the services of a particular operating system or platform; instead it can be executed on any machine, as unit tests usually can. Of course, this restriction can be relaxed depending on actual use case scenarios.</li>
</ul>
<p>Now that we know what we would like to achieve, we can continue with a sample use case. Let&#8217;s say we would like to create an OpenGL 3.2 based rendering engine. One of the first things that we would write is a class (or set of classes) that helps us handle OpenGL buffer objects, as these are among the main building blocks of such a system. As a very basic example, the first version of our buffer handling class will act simply as a wrapper for buffer objects with the following interface:</p>
<pre class="brush: cpp">class Buffer {
public:
    Buffer();
    virtual ~Buffer();
};</pre>
<p>As can be seen, for now we just require our class to handle the creation and deletion of a buffer object. Obviously, our test has to check that the constructor creates a buffer by calling <em>glGenBuffers</em> and that the destructor deletes it by calling <em>glDeleteBuffers</em> with the proper arguments. Now let&#8217;s see what possibilities we have for testing OpenGL rendering code, whether each of them conforms to our requirements, and whether it is able to test our simple module.</p>
<h3>Checking rendered image</h3>
<p>The most naive solution for creating automated tests for rendering code is to actually execute the OpenGL commands and check whether the rendering happened as expected. This can be done by comparing the actual rendering results to reference ones. This approach has the benefit that we verify the concrete behavior, but let&#8217;s see how it fares when checked against our previously laid down requirements:</p>
<ul>
<li><strong>Verifies results</strong> &#8211; Partially fulfilled. We check against the correct behavior, however, reproducing the exact same image is often difficult if not impossible, because both the standard and the driver implementations allow some latitude regarding precision. In order to have reproducible results the testing environment also has to provide some mechanism that tolerates slight differences.</li>
<li><strong>Productive</strong> &#8211; Not met. It can be quite expensive to create an assertion system. Also, the production of reference data can be quite time consuming.</li>
<li><strong>Fast</strong> &#8211; Not met. Even if the checkers are highly optimized components of the framework, executing possibly thousands of test cases that each require complete verification of the produced image wouldn&#8217;t fit into the time-frame of unit test cycles.</li>
<li><strong>Standalone</strong> &#8211; Not met. We have to set up a complete rendering environment in order to test even the simplest rendering code. Also, the approach relies on the assumption that the rendering code actually produces some image. As we can see in our buffer handling example, this is not always the case.</li>
<li><strong>Compatible</strong> &#8211; Not met. We need a testing machine that has the hardware capabilities to execute the rendering code and produce the required image.</li>
<li><strong>Cross-platform</strong> &#8211; Partially fulfilled. If our rendering code is cross-platform then it is possible to test it on any of the supported platforms. However, this makes the assertion system even more complicated as it also has to support the target platforms. Also, driver implementations may vary even further when dealing with different operating systems.</li>
</ul>
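<p>The tolerance mechanism mentioned in the first item can be sketched as follows. This is only a minimal illustration: the function name <em>imagesMatch</em>, the 8-bit channel layout and the per-channel threshold are assumptions made for this example, not part of any existing framework.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper: compares two images stored as flat arrays of 8-bit
// channel values and accepts a per-channel difference of up to `tolerance`.
bool imagesMatch(const std::vector<std::uint8_t>& reference,
                 const std::vector<std::uint8_t>& actual,
                 int tolerance) {
    // Images of different size can never match.
    if (reference.size() != actual.size()) return false;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        const int diff = std::abs(int(reference[i]) - int(actual[i]));
        if (diff > tolerance) return false;  // channel deviates too much
    }
    return true;
}
```

<p>Even such a simple checker has to touch every channel of every pixel, which hints at why this approach scales poorly to thousands of test cases.</p>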
<p>As we can see, even if this approach is quite a natural way of thinking, it&#8217;s simply impractical and not feasible for actual use. To find a good solution we must look deeper into what unit testing exactly is, as the solution presented above has little to do with it. In order to do real unit testing we have to eliminate the dependency on OpenGL driver implementations and concentrate strictly on the module under test.</p>
<h3>Fake OpenGL driver</h3>
<p>The second solution is to create a layer between the code under test and the actual OpenGL driver implementation. This can be easily achieved by creating a fake driver, for example a dynamic library called <em>opengl32.dll</em> in case of Windows. This additional layer would do nothing more than record the API calls and check whether the required ones happened as expected, while providing an interface towards the testing environment that can be used to request the information needed to reach a verdict about the success of the test case.</p>
<p>Besides fitting the idea behind unit testing much better, this approach also has the benefit that it acts as a totally independent layer and does not directly disturb the development of the actual code. Still, if we go back to our checklist, some issues raise concerns regarding the applicability of this approach:</p>
<ul>
<li><strong>Verifies results</strong> &#8211; Partially fulfilled. It is up to the implementation of the new layer whether it provides the facilities required to properly check the behavior of our tested code. It also depends heavily on the implementation how we define correct behavior and where the responsibility of the library ends.</li>
<li><strong>Productive</strong> &#8211; Partially fulfilled. Now we have a separate module that helps us in the testing. This may introduce some additional maintenance work but, of course, this depends on how intelligently the library is actually implemented.</li>
<li><strong>Fast</strong> &#8211; Mostly resolved. We do not have expensive assertions, however, as we have a quite restricted interface between our testing environment and the new layer, we will most probably meet situations where we have to make trade-offs between speed and flexibility.</li>
<li><strong>Standalone</strong> &#8211; Resolved. We have a totally independent module that is responsible for simulating the surrounding environment of the code under test, as it should be when doing unit testing. However, the question arises whether we would like this layer to be that separated from the testing code.</li>
<li><strong>Compatible</strong> &#8211; Resolved. There is no dependency on dedicated graphics hardware or any other piece of metal. In case of a robust driver simulation layer we can test our code on whatever platform we prefer.</li>
<li><strong>Cross-platform</strong> &#8211; Resolved. As previously mentioned, if the additional layer is well designed, there should be no problems regarding this issue.</li>
</ul>
<p>Now we have a solution that can be seriously considered as a good way to test rendering code. It can also be applied to test our buffer handling code. Moreover, as it is a totally standalone software element, it is very portable, so it is easy to reuse between projects written in different programming languages and for different platforms.</p>
<p>Still, there is one thing that may need further investigation. Most probably we already use some kind of mocking mechanism for unit testing the other portions of our production code. Having an additional interface type to handle the OpenGL related mocking (as the presented fake driver approach is nothing more than a mock library for OpenGL) may reduce the productivity of our developers. It can also make the testing code less uniform, introducing a slight maintenance penalty. At least for comparison, we should try to integrate the OpenGL mocking into our existing mocking facilities.</p>
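<p>To make the idea more tangible, here is a minimal sketch of what the recording part of such a fake driver could look like, reduced to the two entry points needed by our buffer class. The typedefs stand in for the real OpenGL ones, and the query interface (<em>FakeGLCall</em>, <em>fakeGLCalls</em>, <em>fakeGLReset</em>) is purely hypothetical. A real fake driver would live in its own library and export the complete API.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-ins for the real OpenGL typedefs so the sketch compiles on its own.
typedef int GLsizei;
typedef unsigned int GLuint;

// Hypothetical record of one API call, as exposed to the testing environment.
struct FakeGLCall {
    std::string name;   // name of the API function that was called
    GLsizei count;      // value of the `n` argument
    GLuint firstId;     // first buffer name involved in the call
};

static std::vector<FakeGLCall> g_fakeCalls;
static GLuint g_nextBufferId = 1;

// Hypothetical query interface towards the test code.
const std::vector<FakeGLCall>& fakeGLCalls() { return g_fakeCalls; }
void fakeGLReset() { g_fakeCalls.clear(); g_nextBufferId = 1; }

// Replacement entry points: they only hand out fake names and record the call.
extern "C" void glGenBuffers(GLsizei n, GLuint* buffers) {
    for (GLsizei i = 0; i < n; ++i) buffers[i] = g_nextBufferId++;
    g_fakeCalls.push_back({"glGenBuffers", n, n > 0 ? buffers[0] : 0});
}

extern "C" void glDeleteBuffers(GLsizei n, const GLuint* buffers) {
    g_fakeCalls.push_back({"glDeleteBuffers", n, n > 0 ? buffers[0] : 0});
}

// Code under test: it calls the API as usual and knows nothing about the fake.
class Buffer {
public:
    Buffer() : _id(0) { glGenBuffers(1, &_id); }
    virtual ~Buffer() { glDeleteBuffers(1, &_id); }
private:
    GLuint _id;
};
```

<p>A test would reset the recorder, create and destroy a <em>Buffer</em>, and then inspect the recorded calls through the query interface.</p>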
<h3>API mocks</h3>
<p>All the people who seriously do unit testing use some mocking techniques to eliminate dependency on any external software element like databases, network or another code element. Why should the OpenGL API be different?</p>
<p>As I have already written, I use <a title="GoogleMock" href="http://code.google.com/p/googlemock/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/code.google.com/p/googlemock/?referer=');">GoogleMock</a> to test my C++ code. Let&#8217;s see how well this mocking framework can remove OpenGL related dependencies. By default, GoogleMock supports only class mocks, however, it is fairly straightforward to mock out OpenGL API functions as well. As an example, our buffer handling class needs at least a mock for the <em>glGenBuffers</em> and <em>glDeleteBuffers</em> API functions. These mocks can be created very easily using GoogleMock as part of a class in the following way:</p>
<pre class="brush: cpp">// Mock class whose methods stand in for the OpenGL API functions.
class CGLMock {
public:
    MOCK_METHOD2( GenBuffers, void(GLsizei n, GLuint* buffers) );
    // glDeleteBuffers takes a pointer to const buffer names.
    MOCK_METHOD2( DeleteBuffers, void(GLsizei n, const GLuint* buffers) );
};
// Global mock instance that the API function names get redirected to.
CGLMock GLMock;</pre>
<p>This, however, is not enough to replace the already existing real API function pointers with the fake ones. I did this with a nasty little trick by taking advantage of the C preprocessor:</p>
<pre class="brush: cpp">#undef glGenBuffers
#define glGenBuffers                  GLMock.GenBuffers
#undef glDeleteBuffers
#define glDeleteBuffers               GLMock.DeleteBuffers</pre>
<p>The <em>#undef</em> is needed because I use <a title="GLEW" href="http://glew.sourceforge.net/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/glew.sourceforge.net/?referer=');">GLEW</a> for accessing OpenGL API functions and it uses macros for the API function names as well.</p>
<p>All of this is put into a file called something like <em>glmock.h</em>. In order to force the production code to use these definitions when accessing the API inside a test case, we have to create a wrapper header called something like <em>opengl.h</em> that includes the original headers in case of a normal build and the mock library in case of a unit test build. This is kind of a workaround but it works quite well in practice.</p>
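<p>The following sketch collapses such a wrapper header and a piece of production code into a single file to show the redirection at work. It is an illustration under several assumptions: the <em>UNIT_TEST_BUILD</em> switch is a made-up name, the typedefs stand in for the real OpenGL ones, and a small hand-written recorder (<em>GLRecorder</em>) takes the place of the GoogleMock class so that the example has no external dependencies.</p>

```cpp
#include <cassert>
#include <vector>

// Hypothetical switch; a real project would set it from the build system.
#define UNIT_TEST_BUILD 1

// ---- contents of the hypothetical opengl.h wrapper ----
#if UNIT_TEST_BUILD
// Stand-ins for the OpenGL typedefs, so that no GL header is needed.
typedef int GLsizei;
typedef unsigned int GLuint;

// Hand-written recorder playing the role of the mock class from glmock.h.
struct GLRecorder {
    std::vector<GLuint> generated;
    std::vector<GLuint> deleted;
    void GenBuffers(GLsizei n, GLuint* buffers) {
        for (GLsizei i = 0; i < n; ++i) {
            buffers[i] = 13 + static_cast<GLuint>(i);  // arbitrary fake names
            generated.push_back(buffers[i]);
        }
    }
    void DeleteBuffers(GLsizei n, const GLuint* buffers) {
        for (GLsizei i = 0; i < n; ++i) deleted.push_back(buffers[i]);
    }
};
static GLRecorder GLMock;

// Redirect the API names to the recorder, exactly like the macros above.
#undef glGenBuffers
#define glGenBuffers    GLMock.GenBuffers
#undef glDeleteBuffers
#define glDeleteBuffers GLMock.DeleteBuffers
#else
#include <GL/glew.h>  // normal build: the real API
#endif

// ---- production code, unaware of the redirection ----
class Buffer {
public:
    Buffer() : _id(0) { glGenBuffers(1, &_id); }
    virtual ~Buffer() { glDeleteBuffers(1, &_id); }
private:
    GLuint _id;
};
```

<p>The production code is compiled unchanged in both configurations. Only the wrapper header decides whether the API names resolve to the real functions or to the test double.</p>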
<p>In theory, this trick can be applied in case of any mocking framework. As a result, from now we can write a very simple test case to check the creation and deletion of our buffer object as easily as the following few lines of code:</p>
<pre class="brush: cpp">TEST(BufferTest, CreationAndDestruction) {
    EXPECT_CALL(GLMock, GenBuffers(1,_))
        .WillOnce(SetArgumentPointee&lt;1&gt;(13));
    Buffer* buffer = new Buffer;
    EXPECT_CALL(GLMock, DeleteBuffers(1,Pointee(13)));
    delete buffer;
}</pre>
<p>I would not like to go into the details of the GoogleMock interface. In short, the test case above checks whether the constructor calls <em>glGenBuffers</em> requesting exactly one buffer object, makes the mock return a buffer ID through the pointer argument, and at the end checks whether <em>glDeleteBuffers</em> was called with the buffer ID obtained at creation.</p>
<p>It is perhaps a matter of taste whether the second or this third solution is more attractive to you. My choice was the last one because I didn&#8217;t want to develop a separate library and was also afraid of cluttering my test code with different syntactical representations of mocks. Finally, let&#8217;s sum up the achievements of this last version:</p>
<ul>
<li><strong>Verifies results</strong> &#8211; Fulfilled. An existing mocking framework is used for emulating the OpenGL API thus we have all the facilities required for the proper checking of the API calls.</li>
<li><strong>Productive</strong> &#8211; Fulfilled. Again, we don&#8217;t have to write our own mocking mechanism as we have everything out of the box. We can also incrementally extend our mock library on-the-fly while editing the test cases and the production code.</li>
<li><strong>Fast</strong> &#8211; Resolved. Our rendering related unit test cases should be as fast as any other test code, as they are no different in nature; only their purpose differs.</li>
<li><strong>Standalone</strong> &#8211; Mostly resolved. The mocking library is independent, however, as we&#8217;ve seen, introducing it may require some nasty tricks in order to inject foreign code into the production code.</li>
<li><strong>Compatible</strong> &#8211; Resolved. From this point of view, this approach behaves the same as the previous version.</li>
<li><strong>Cross-platform</strong> &#8211; Resolved. Again, the same as in the previous case, maybe even a bit easier to make portable.</li>
</ul>
<h3>Conclusion</h3>
<p>We&#8217;ve seen a few ways to extend our testing environment in order to support the verification of rendering code. We&#8217;ve also seen that they range from high level methods suitable especially for functional testing to very low level methods that integrate tightly into the mocking methodology of unit testing. These, of course, do not replace traditional testing methods; rather, they extend them in order to find problems in the early phases of software development.</p>
<p>I also tried to present a very basic example of production code that needs such a facility in order to be tested, as well as a sample test case written using GoogleMock applying the last presented technique.</p>
<p>While writing this article I got the idea that it would be nice to have a complete and general framework for OpenGL testing. If there is interest in it, maybe I&#8217;ll allocate some time to write one. I&#8217;m also interested in which approach is the most attractive to you, especially if you have some concrete experience with any of these or with some other technique.</p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/02/unit-testing-opengl-applications/feed/</wfw:commentRss>
		<slash:comments>22</slash:comments>
		</item>
		<item>
		<title>One more degree of freedom for C++</title>
		<link>http://rastergrid.com/blog/2010/02/one-more-degree-of-freedom-for-c/</link>
		<comments>http://rastergrid.com/blog/2010/02/one-more-degree-of-freedom-for-c/#comments</comments>
		<pubDate>Sun, 14 Feb 2010 14:38:42 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[callback]]></category>
		<category><![CDATA[delegate]]></category>
		<category><![CDATA[delegation]]></category>
		<category><![CDATA[Delphi]]></category>
		<category><![CDATA[event handling]]></category>
		<category><![CDATA[message]]></category>
		<category><![CDATA[signal]]></category>
		<category><![CDATA[signals and slots]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=176</guid>
		<description><![CDATA[
Those who have worked enough with C or other procedure oriented languages know how much flexibility callbacks provide. The simplest example is the qsort function of the C standard library. It is no accident that many libraries, windowing system APIs and operating system APIs rely heavily on callbacks to pass a particular task over [...]]]></description>
			<content:encoded><![CDATA[
<p>Those who have worked enough with C or other procedure oriented languages know how much flexibility callbacks provide. The simplest example is the qsort function of the C standard library. It is no accident that many libraries, windowing system APIs and operating system APIs rely heavily on callbacks to pass a particular task over to another program module, and callbacks are one of the fundamental tools needed to implement an event-driven application. At the same time, object oriented languages do not directly support the concept of callbacks as they don&#8217;t really fit into the paradigms used by these languages. Fortunately, even if not always as a language feature, object oriented languages generally support a facility similar to callbacks in the form of delegates.</p>
<p><span id="more-176"></span>Delegation as a design pattern is used to describe the situation when one object passes on the implementation of a particular task to another object. This clearly reflects the purpose of callbacks used in procedure oriented languages. Many languages natively support some form of delegation; well known examples are C# and Delphi.</p>
<h2>Callbacks</h2>
<p>As mentioned before, procedure oriented languages enable the delegation of functionality to other modules through callbacks. These callbacks are specified by passing function pointers to registration functions provided by the library. Here is a very simple C example:</p>
<pre class="brush: c">/* server header */
void registerFooCallback(int (*fooCB)(int, float));
int doFoo(int a, float b);

/* client code */
#include &lt;stdio.h&gt;

int myFooCallback(int a, float b) {
    /* ... do something ... */
    return a; /* placeholder result */
}

int main() {
    registerFooCallback(myFooCallback);
    printf("%d\n", doFoo(5, 3.2f));
    return 0;
}</pre>
<p>Here we can see how easily callbacks allow user code to be injected for handling events that happen in the server.</p>
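<p>The server side is not shown above. One possible implementation, written here as C++ for consistency with the rest of the article, could look like the sketch below. The <em>FooCallback</em> typedef, the global storage and the behavior of returning 0 when no callback is registered are assumptions of this example.</p>

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical server implementation: it stores a single function pointer.
typedef int (*FooCallback)(int, float);
static FooCallback g_fooCB = NULL;

void registerFooCallback(FooCallback fooCB) { g_fooCB = fooCB; }

int doFoo(int a, float b) {
    // Assumption of this sketch: without a registered handler the result is 0.
    if (g_fooCB == NULL) return 0;
    // Hand the actual work over to the user supplied function.
    return g_fooCB(a, b);
}

// Example client callback: adds the truncated float to the integer.
int addTruncated(int a, float b) { return a + static_cast<int>(b); }
```

<p>The server never needs to know what the registered function does, only its signature.</p>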
<h2>Delegation as a design pattern</h2>
<p>The simplest way to create object oriented callbacks is by applying the design pattern of delegation. If we would like to construct the C++ equivalent of the example above using the mentioned pattern, we end up with something like the following:</p>
<pre class="brush: cpp">#include &lt;iostream&gt;

/* server header */
class IFooCallback {
public:
    virtual ~IFooCallback() {}
    virtual int operator() (int a, float b) = 0;
};

class Foo {
private:
    IFooCallback* _fooCB;
public:
    void registerCallback(IFooCallback* fooCB);
    int doFoo(int a, float b);
};

/* client code */
class MyFooCallback: public IFooCallback {
public:
    int operator() (int a, float b) {
        /* ... do something ... */
        return a;  // placeholder result
    }
};

int main() {
    Foo foo;
    MyFooCallback fooCB;
    foo.registerCallback(&amp;fooCB);  // the server expects a pointer
    std::cout &lt;&lt; foo.doFoo(5, 3.2f);
    return 0;
}</pre>
<p>As you can see, it is quite straightforward to provide an object oriented alternative to callbacks. However, there is a very significant drawback to the technique above, namely the type intrusion inherent in this definition of a callback. The client code needs to explicitly derive its own class from a type defined in the server. This results in tight coupling and is likely to bring further disadvantages regarding maintainability and migration.</p>
<h2>Delegate methods</h2>
<p>In our previous attempt to provide an easy to use C++ alternative for callbacks with OOP in mind, we replaced function pointers with an abstract base class that acts as an interface definition for our callback. However, this somewhat violates the original goal of delegates, which by definition should be some form of run-time inheritance (this varies from definition to definition, still, this is the one that I&#8217;m referring to in this article). We soon figure out that the most convenient solution would be the ability to assign member functions of any class as a callback. Obviously, the parameters and return type should still match as before to provide type safety, but we would like to remove any additional dependencies between the client and the server.</p>
<p>While C++ does have the concept of pointers to member functions, there is no easy and standard way to implement callbacks using them. Or is there? First of all, there is no particular problem with static member functions as they are much like C functions, however, limiting delegates to static methods heavily restricts the freedom of the developer. The problem with non-static member functions, and especially with virtual member functions, is that they have the implicit parameter <strong>this</strong> that enables them to access the object they belong to.</p>
<p>The popular Boost library provides mechanisms that enable the use of non-static member functions as separate entities through the <strong>bind</strong> functor adaptor, which was also adopted into the C++ library extensions of <a title="ISO/IEC TR 19768:2007" href="http://www.iso.org/iso/catalogue_detail.htm?csnumber=43289" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.iso.org/iso/catalogue_detail.htm?csnumber=43289&amp;referer=');">Technical Report 1</a>. This extension makes it possible to use member functions as delegates in a way that does not involve any type intrusion side effects.</p>
<p>Unfortunately, these facilities can involve a noticeable performance hit when the callback is invoked compared to simple method invocations. Also, using functor adaptors for implementing delegates is not the most straightforward approach and makes the code quite ugly compared to an ideal situation where delegates are part of the language itself. Of course, this is only my opinion, others who have used these libraries more often may have a different view of the topic.</p>
<p>Anyway, as performance is always a concern for me, I started to look around for alternatives. To my surprise, I quickly found two of them:</p>
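<p>The role of the implicit <strong>this</strong> parameter becomes visible when a pointer to a member function is invoked by hand. In the small sketch below (the <em>Counter</em> class and the <em>invoke</em> helper are made up for illustration) the pointer alone is not callable. An object always has to be supplied next to it, and the pointer type also names the class, so a server storing such a pointer would still depend on the client type.</p>

```cpp
#include <cassert>

// Hypothetical client class with an ordinary (non-static) member function.
class Counter {
public:
    explicit Counter(int start) : _value(start) {}
    int add(int amount) { _value += amount; return _value; }
private:
    int _value;
};

// The pointer type carries the class name, which ties it to Counter.
typedef int (Counter::*CounterMethod)(int);

// Invoking the member function requires both the pointer and an object.
int invoke(Counter& object, CounterMethod method, int argument) {
    return (object.*method)(argument);
}
```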
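<p>As an illustration of this approach, the sketch below uses <em>std::function</em> and <em>std::bind</em> from the C++11 standard library, which are the standardized descendants of the Boost and TR1 facilities. With the TR1 versions the code would look almost the same, only the namespaces differ. The classes, the <em>connect</em> helper and the behavior of returning 0 without a registered callback are assumptions of this example.</p>

```cpp
#include <cassert>
#include <functional>

/* server: depends only on the signature of the callback */
class Foo {
public:
    typedef std::function<int(int, float)> Callback;
    void registerCallback(Callback cb) { _fooCB = cb; }
    // Assumption of this sketch: without a callback the result is 0.
    int doFoo(int a, float b) { return _fooCB ? _fooCB(a, b) : 0; }
private:
    Callback _fooCB;
};

/* client: an ordinary class that does not derive from any server type */
class MyClass {
public:
    explicit MyClass(int offset) : _offset(offset) {}
    int handleFoo(int a, float b) { return _offset + a + static_cast<int>(b); }
private:
    int _offset;
};

// Binds the receiver object (the implicit `this`) together with the member
// function; the placeholders forward the two call arguments.
void connect(Foo& foo, MyClass& receiver) {
    using namespace std::placeholders;
    foo.registerCallback(std::bind(&MyClass::handleFoo, &receiver, _1, _2));
}
```

<p>Note that the bound object has to outlive the delegate, since only its address is stored.</p>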
<ul>
<li><a title="Fastest Possible C++ Delegates" href="http://www.codeproject.com/KB/cpp/FastDelegate.aspx" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.codeproject.com/KB/cpp/FastDelegate.aspx?referer=');">Fastest Possible C++ Delegates</a> by Don Clugston &#8211; This is a library that provides delegates that are as fast as simple virtual method invocations. The implementation strongly relies on the behavior of different compilers, yet is very portable, at least as far as I can tell.</li>
<li><a title="The Impossibly Fast C++ Delegates" href="http://www.codeproject.com/KB/cpp/ImpossiblyFastCppDelegate.aspx" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.codeproject.com/KB/cpp/ImpossiblyFastCppDelegate.aspx?referer=');">The Impossibly Fast C++ Delegates</a> by Sergey Ryazanov &#8211; This library was introduced as an alternative to the previous one that relies strictly on standard features of the language. Surprisingly, the latter is supported by fewer compiler implementations and it is also somewhat slower than the previous one.</li>
</ul>
<p>Personally, I go with the first one, as for me performance and portability are more important than conformance with the standard. And, of course, it is not that hard to change the back-end of the delegate support later if I change my mind. Finally, let&#8217;s see what our foo callback looks like when using the fast delegates of Don Clugston:</p>
<pre class="brush: cpp">#include &lt;iostream&gt;
#include "FastDelegate.h"  // header of the fast delegate library
using namespace fastdelegate;

/* server header */
class Foo {
private:
    FastDelegate2&lt;int, float, int&gt; _fooCB;
public:
    void registerCallback(FastDelegate2&lt;int, float, int&gt; fooCB);
    int doFoo(int a, float b);
};

/* client code */
class MyClass {
public:  // must be accessible so that main() can take its address
    virtual int handleFoo(int a, float b) {
        /* ... do something ... */
        return a;  // placeholder result
    }
};

int main() {
    Foo foo;
    MyClass myObj;
    // Bind the object and its member function into a single delegate.
    foo.registerCallback( MakeDelegate(&amp;myObj, &amp;MyClass::handleFoo) );
    std::cout &lt;&lt; foo.doFoo(5, 3.2f);
    return 0;
}</pre>
<h2>Multicast delegates</h2>
<p>The delegates presented previously can only be bound to a single method, as delegates usually behave this way, although a single method can be bound by many delegates. The signals and slots model extends this to a many-to-many relationship. Thus a signal is actually just a delegate that can be bound to multiple methods at once. Such a primitive is sometimes also referred to as a multicast delegate.</p>
<p>Multicast delegates come in handy especially in user interface programming and other situations where the event based programming model is used. The foundation of this programming model is the idea of &#8220;subscribe and notify&#8221;. That means there are <em>publishers</em> that perform some logic and sometimes publish <em>events</em>. When such an <em>event</em> is published, it is sent out to the <em>subscribers</em> that have subscribed to receive that specific event. At implementation level this is nothing more than having a multicast delegate in the <em>publisher</em> object and providing an interface through which the <em>subscriber</em> objects register one of their methods to be called when a particular <em>event</em> occurs.</p>
<p>There are plenty of signals and slots libraries out there including but not limited to the Boost Signals library. However, again, if performance is a concern one must look around carefully to find a library suitable for the particular purpose. One such library, which extends the fast delegates of Clugston with a signals and slots framework, is that of <a title="Simpler UI Code With Signals and Slots" href="http://www.gallantgames.com/2009/12/13/simpler-ui-code-with-signals-and-slots" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.gallantgames.com/2009/12/13/simpler-ui-code-with-signals-and-slots?referer=');">Patrick Hogan</a>.</p>
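<p>A bare-bones version of this publisher and subscriber arrangement can be sketched in a few lines. The example is built on <em>std::function</em> from C++11 rather than on any of the libraries mentioned in this article, and all names in it (<em>Signal</em>, <em>Button</em>, <em>ClickCounter</em>) are invented for illustration. Disconnecting subscribers and lifetime management are deliberately left out.</p>

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Minimal multicast delegate: a list of slots sharing one signature.
template <typename... Args>
class Signal {
public:
    typedef std::function<void(Args...)> Slot;
    // Subscribers register a callable to be notified.
    void connect(Slot slot) { _slots.push_back(slot); }
    // Calls every registered slot in the order of registration.
    void publish(Args... args) const {
        for (const Slot& slot : _slots) slot(args...);
    }
private:
    std::vector<Slot> _slots;
};

// Publisher: owns the signal and publishes an event when clicked.
class Button {
public:
    Signal<int> clicked;
    void click(int x) { clicked.publish(x); }
};

// Subscriber: any class with a method of a matching signature.
class ClickCounter {
public:
    ClickCounter() : total(0) {}
    void onClick(int x) { total += x; }
    int total;
};
```

<p>A subscriber is attached with a call such as <em>button.clicked.connect(...)</em>, passing a lambda or a bound member function that forwards to <em>onClick</em>.</p>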
<h2>Asynchronous delegates</h2>
<p>If we take one more step forward, we arrive at asynchronous delegates, which can provide us with a flexible yet efficient messaging system for multi-threaded applications. The only additional things we have to implement are a message queue on the callee side and, optionally, some form of synchronization if we would also like to make it possible for the asynchronous delegates to return data to the caller.</p>
<p>As this topic deserves a thorough discussion of its own, I will return to the subject in a future article and try to provide a sample implementation using OpenMP as usual.</p>
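<p>Just to give a rough feeling for the idea, the sketch below shows a message queue on the callee side and a delegate whose invocation only enqueues the call. It uses the C++11 standard library (<em>std::mutex</em>, <em>std::function</em>) and not OpenMP. All class and method names are hypothetical, and returning data to the caller is not covered.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical message queue owned by the callee.
class MessageQueue {
public:
    // May be called from any thread: stores the call for later execution.
    void post(std::function<void()> message) {
        std::lock_guard<std::mutex> lock(_mutex);
        _messages.push(std::move(message));
    }
    // Called by the callee thread: executes everything queued so far and
    // returns the number of executed messages.
    std::size_t dispatchPending() {
        std::queue<std::function<void()> > pending;
        {
            std::lock_guard<std::mutex> lock(_mutex);
            std::swap(pending, _messages);  // take the messages, release lock
        }
        const std::size_t count = pending.size();
        for (; !pending.empty(); pending.pop()) pending.front()();
        return count;
    }
private:
    std::mutex _mutex;
    std::queue<std::function<void()> > _messages;
};

// Delegate whose invocation does not run the target, only enqueues the call.
class AsyncDelegate {
public:
    AsyncDelegate(MessageQueue& queue, std::function<void(int)> target)
        : _queue(queue), _target(std::move(target)) {}
    void operator()(int arg) const {
        std::function<void(int)> target = _target;  // copy for the closure
        _queue.post([target, arg] { target(arg); });
    }
private:
    MessageQueue& _queue;
    std::function<void(int)> _target;
};
```

<p>In a real application the callee thread would call <em>dispatchPending</em> from its main loop, while other threads invoke the delegate whenever they want to send a message.</p>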
<h2>Conclusion</h2>
<p>We&#8217;ve only scratched the surface of the possible use cases of delegates one can meet during software development, still, we&#8217;ve seen how many advantages such a programming primitive can give to C++ developers, no matter whether they are implementing a very simple library of sorting algorithms like the qsort function of the C standard library or a robust, fully event-driven multi-threaded application. We&#8217;ve also seen that there exist several efficient implementations of such a framework for performance fanatics like me.</p>
<p>This is a perfect example of how easily one can extend C++ with a facility that is usually available only in the most modern managed languages. By the way, I would be interested to hear which features you like the most in other languages like Java and C# and miss in C++. Maybe there exist C++ alternatives for those facilities as well, we just have to look around to find them&#8230;</p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/02/one-more-degree-of-freedom-for-c/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Instance culling using geometry shaders</title>
		<link>http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/</link>
		<comments>http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/#comments</comments>
		<pubDate>Mon, 08 Feb 2010 22:58:53 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Samples]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[culling]]></category>
		<category><![CDATA[fragment shader]]></category>
		<category><![CDATA[geometry instancing]]></category>
		<category><![CDATA[geometry shader]]></category>
		<category><![CDATA[GLEW]]></category>
		<category><![CDATA[GLM]]></category>
		<category><![CDATA[GLSL]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[SFML]]></category>
		<category><![CDATA[texture buffer]]></category>
		<category><![CDATA[transform feedback]]></category>
		<category><![CDATA[uniform buffer]]></category>
		<category><![CDATA[vertex buffer]]></category>
		<category><![CDATA[vertex shader]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=135</guid>
		<description><![CDATA[
Since the appearance of Shader Model 4.0 people wonder how to take advantage of the newly introduced programmable pipeline stage. The most important feature enabled by geometry shaders is that one can change the amount of emitted primitives inside the pipeline. The first thing that a naive developer would try to do with it is [...]]]></description>
			<content:encoded><![CDATA[
<div id="attachment_136" class="wp-caption alignleft" style="width: 160px"><a href="http://rastergrid.com/blog/wp-content/uploads/2010/02/Nature-2010-02-08-20-20-36-24.png"><img class="size-thumbnail wp-image-136  " title="Nature demo screenshot" src="http://rastergrid.com/blog/wp-content/uploads/2010/02/Nature-2010-02-08-20-20-36-24-150x150.png" alt="Nature demo screenshot" width="150" height="150" /></a><p class="wp-caption-text">OpenGL 3.2 - Nature</p></div>
<p>Since the appearance of Shader Model 4.0, people have wondered how to take advantage of the newly introduced programmable pipeline stage. The most important feature enabled by geometry shaders is that one can change the number of emitted primitives inside the pipeline. The first thing that a naive developer would try to do with it is geometry tessellation. However, the new shader stage performs very badly when used for tessellation in a real-life scenario, even though there are demos showcasing this possibility. If we take a closer look at the new feature, we observe that its most revolutionary aspect is not that it can raise the number of emitted primitives but that it can discard them. This article presents a rendering technique that takes advantage of this aspect of geometry shaders to enable GPU-accelerated culling of higher order primitives.</p>
<p><span id="more-135"></span>Geometry shaders can be used for many different advanced rendering techniques that were impossible before the introduction of this flexible programmable shader stage. In this article I would like to present one use case that seemed to me to be one of the most practical applications of the primitive manipulation possibilities introduced by geometry shaders. As I haven&#8217;t seen any whitepaper talking specifically about this particular technique, even if some of them inherently used it, I dare to name the technique myself: <strong>Instance Cloud Reduction</strong>. I will also present a demo program that shows how to take advantage of the technique in a heavy workload situation.</p>
<p>The idea itself was inspired by AMD&#8217;s tech demo for the Radeon 4800 series cards called <a title="March of the Froblins" href="http://developer.amd.com/samples/demos/pages/froblins.aspx" target="_blank" onclick="pageTracker._trackPageview('/outgoing/developer.amd.com/samples/demos/pages/froblins.aspx?referer=');">March of the Froblins</a>. A technique almost identical to the one presented in this article is used in that demo to cull a large number of animated creatures against the view frustum. A somewhat similar technique is also used in NVIDIA&#8217;s <a title="Skinned Instancing" href="http://developer.download.nvidia.com/SDK/10/direct3d/samples.html" target="_blank" onclick="pageTracker._trackPageview('/outgoing/developer.download.nvidia.com/SDK/10/direct3d/samples.html?referer=');">Skinned Instancing</a> demo for determining LOD instance sets. Unfortunately, both demos are for DirectX only and, as far as I can tell, there is no OpenGL demo showing any of the aforementioned rendering techniques.</p>
<h3>Motivation</h3>
<p>Nowadays, as the computational capabilities of GPUs are growing at a much faster pace than those of CPUs, graphics developers meet more and more optimization problems related to CPU-bound applications. More and more focus is on minimizing the number of driver invocations; in fact, that&#8217;s what motivated the restructuring of the two most commonly used graphics APIs. As a result we now have DirectX 10+ and OpenGL 3+. However, even with the introduction of geometry instancing, texture arrays and local memory buffer storage for the most important rendering inputs, graphics programmers still need to make wise decisions to take full advantage of the horsepower of the latest GPUs.</p>
<p>Earlier graphics applications strongly relied on CPU based culling techniques, whether it be the usage of the quite outdated BSPs or the more generic and still heavily applied hierarchical culling techniques. We&#8217;ve already reached the point that sometimes even the most efficient CPU based culling techniques seem to be too expensive and usually introduce the small batch problem. Instanced rendering is not an exception.</p>
<p>The applicability of geometry instancing is strongly limited by several factors. One of the most important ones is the culling of instanced geometries. One may choose to cull these objects in the same fashion as others, using the CPU, but that usually breaks the batch, and we may lose the benefits of geometry instancing. It is more and more pressing to have a GPU based alternative. Without CPU based culling, sending the whole bunch of instances down the graphics pipeline may choke our vertex processor if we have high-poly geometry and a large number of instances of it.</p>
<p>The rendering technique presented in this article will try to achieve this goal. We will use a multi-pass technique that in the first pass culls the object instances against the view frustum using the GPU and in the second pass renders only those instances that are likely to be visible in the final scene. This way we can severely reduce the amount of vertex data sent through the graphics pipeline.</p>
<h3>Implementation</h3>
<p>To some people the promise of such a technique might seem simply too naive, most probably relying on very exotic OpenGL features, heavy misuse of some basic features, or data conversions during frame rendering. Fortunately, this is not the case, as we have everything we need in OpenGL 3.2 to implement the object culling method sketched above. All we need is the following:</p>
<ul>
<li>instanced rendering (core since OpenGL 3.1)</li>
<li>geometry shaders (core since OpenGL 3.2)</li>
<li>transform feedback (core since OpenGL 3.0)</li>
<li>uniform or texture buffers (core since OpenGL 3.1)</li>
</ul>
<p>The method itself is a multi-pass rendering technique. However, unlike other multi-pass rendering techniques, it does not produce any fragments in the first pass. Instead, the first pass does the view frustum culling and processes data entirely inside buffer objects.</p>
<h3>Culling pass</h3>
<p>In the first pass we will feed the graphics pipeline with information about the instances that are needed to perform the view frustum culling. For this we need two inputs for the executed shaders in order to be able to perform the required calculations:</p>
<ol>
<li><strong>Instance transformation data</strong> (whether it be a simple transformation matrix or quaternions or whatever) -- This preferably comes from one or more buffer objects that are bound as vertex buffers to the context.</li>
<li><strong>Object extents information</strong> -- Besides the instance positions we have to know the extents of an instance in order to perform correct culling. This can be either a single float representing the object radius if we choose to use bounding spheres for the culling, or a three-dimensional extent vector if we would like to use bounding boxes.</li>
</ol>
<p>Using these as input, we can feed the instance transformation data to our culling shader as attributes of point primitives. The culling shader is composed of a vertex and a geometry shader. In a typical setup the role of each is the following: the vertex shader determines whether the current object instance&#8217;s bounding volume is inside the view frustum and passes a visibility flag to the geometry shader, which emits the instance data to the destination buffer if the flag says that the instance is likely to be visible, and emits nothing if the object instance is determined to be out of view.</p>
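<p>As a rough illustration of the visibility test the vertex shader performs, here is a minimal CPU-side C++ sketch of a bounding sphere versus view frustum check. It is not the demo&#8217;s shader code: the Vec3, Plane and Frustum types, the isSphereVisible function and the makeBoxFrustum helper are hypothetical names, and the sketch assumes the six planes are given with unit-length normals pointing into the frustum.</p>

```cpp
#include <array>
#include <cassert>

// Hypothetical plain types for illustration; the demo itself relies on GLM.
struct Vec3 { float x, y, z; };

// Plane in the form dot(n, p) + d = 0, with the unit normal n pointing
// towards the inside of the frustum (an assumption of this sketch).
struct Plane { Vec3 n; float d; };

using Frustum = std::array<Plane, 6>;

// Signed distance of point p from the plane (positive means inside).
inline float signedDistance(const Plane& plane, const Vec3& p) {
    return plane.n.x * p.x + plane.n.y * p.y + plane.n.z * p.z + plane.d;
}

// Returns 1 if the bounding sphere is likely to be visible and 0 if it lies
// completely outside at least one plane, mirroring the objectVisible flag
// consumed by the geometry shader.
inline int isSphereVisible(const Frustum& frustum, const Vec3& center, float radius) {
    assert(radius >= 0.0f);
    for (const Plane& plane : frustum) {
        if (signedDistance(plane, center) < -radius) {
            return 0;  // entirely behind this plane: culled
        }
    }
    return 1;  // conservative: may still be just outside near a corner
}

// Hypothetical helper: an axis-aligned box "frustum" spanning
// [-halfSize, halfSize] on every axis, handy for trying the test out.
inline Frustum makeBoxFrustum(float halfSize) {
    return Frustum{{
        {{ 1.0f,  0.0f,  0.0f}, halfSize}, {{-1.0f,  0.0f,  0.0f}, halfSize},
        {{ 0.0f,  1.0f,  0.0f}, halfSize}, {{ 0.0f, -1.0f,  0.0f}, halfSize},
        {{ 0.0f,  0.0f,  1.0f}, halfSize}, {{ 0.0f,  0.0f, -1.0f}, halfSize},
    }};
}
```

<p>For a box frustum built with makeBoxFrustum(10.0f), a unit sphere centred at the origin yields 1, while one centred at x = 12 yields 0. In a shader version, the plane equations would typically be derived from the view-projection matrix and passed in as uniforms.</p>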
<p>Next, transform feedback is used to capture the primitives emitted by the geometry shader into another buffer object that will be used in the actual rendering pass to source instance transformation data. Besides this, we also need an asynchronous query to determine the number of primitives generated, so that we know how many instances of the object we actually need to render. The following figure shows the workflow of the first pass:</p>
<div id="attachment_146" class="wp-caption aligncenter" style="width: 460px"><a href="http://rastergrid.com/blog/wp-content/uploads/2010/02/icr_pass1.png"><img class="size-full wp-image-146" title="Culling pass" src="http://rastergrid.com/blog/wp-content/uploads/2010/02/icr_pass1.png" alt="Culling pass" width="450" height="200" /></a><p class="wp-caption-text">Instance Cloud Reduction - Pass 1: Culling</p></div>
<p>The geometry shader that performs the actual culling, based on the view frustum check done by the vertex shader, should look like the following chunk:</p>
<pre class="brush: c">#version 150 core

layout(points) in;
layout(points, max_vertices = 1) out;

in vec4 OrigPosition[1];
flat in int objectVisible[1];

out vec4 CulledPosition;

void main() {

	/* only emit primitive if the object is visible */
	if ( objectVisible[0] == 1 )
	{
		CulledPosition = OrigPosition[0];
		EmitVertex();
		EndPrimitive();
	}
}</pre>
<p>In this example we used only a simple four-component position vector for the instance transformation data, but the technique works just as well for transformation matrices and quaternions.</p>
<p>One more thing: besides setting up transform feedback to write into the buffer object dedicated to the culled instance data and starting an asynchronous query to determine the number of primitives written into that buffer object, it is also useful to turn off rasterization (by enabling GL_RASTERIZER_DISCARD), as we don&#8217;t want to produce any fragments in the first pass.</p>
<h3>Rendering pass</h3>
<p>In the second pass there is nothing special to do. Simply use whatever rendering setup you would like to use. The only changes needed compared to your existing rendering path are that the instance data must be sourced from the generated culled instance data buffer and, as a result, the number of instances passed to the instanced drawing functions must be changed so that only the visible instances are rendered. This number can be read from the result of the asynchronous query that we started in the first pass.</p>
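<p>The data flow of the two passes can be mimicked on the CPU, which may help when reasoning about buffer sizes and the instance count. The following C++ sketch is only an analogy, not OpenGL code: the Vec4 and CullResult types and the cullInstances function are hypothetical, the output vector stands in for the transform feedback buffer and visibleCount stands in for the query result.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instance record: a four-component world-space position.
struct Vec4 { float x, y, z, w; };

struct CullResult {
    std::vector<Vec4> culled;  // sized for every instance, like the feedback buffer
    std::size_t visibleCount;  // number of valid entries at the front of `culled`
};

// "Pass 1": copy only the instances accepted by isVisible to the front of the
// output buffer and count them. The predicate plays the role of the frustum check.
template <typename Predicate>
CullResult cullInstances(const std::vector<Vec4>& instances, Predicate isVisible) {
    CullResult result{std::vector<Vec4>(instances.size()), 0};
    for (const Vec4& instance : instances) {
        if (isVisible(instance)) {
            result.culled[result.visibleCount++] = instance;
        }
    }
    assert(result.visibleCount <= result.culled.size());
    return result;
}
```

<p>The output buffer is sized for the worst case in which every instance is visible, and only the first visibleCount entries are valid. That is why the rendering pass needs the count in addition to the buffer itself.</p>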
<p>The instance data in the rendering pass can be, of course, sourced from either a uniform or a texture buffer object. This depends on the actual use case and is more clearly explained in the article <a href="http://rastergrid.com/blog/2010/01/uniform-buffers-vs-texture-buffers/">Uniform Buffers VS Texture Buffers</a>.</p>
<p>An important note: when one has to deal with several instanced geometries, it is recommended to do the culling phase for all of them prior to rendering any instanced primitives, for the following reasons:</p>
<ul>
<li>The result of the first instance cloud&#8217;s culling is more likely to be finished on the GPU so no sync issues arise from reading the asynchronous query result to determine the number of visible instances.</li>
<li>Probably fewer state changes are needed, as the two passes require very different setups.</li>
<li>Results in tidier renderer design as culling is clearly separated from actual rendering.</li>
</ul>
<p>Putting everything together, the application of the presented technique would result in the following workflow on the GPU:</p>
<div id="attachment_150" class="wp-caption aligncenter" style="width: 660px"><a href="http://rastergrid.com/blog/wp-content/uploads/2010/02/icr_combined.png"><img class="size-full wp-image-150" title="Instance Cloud Reduction" src="http://rastergrid.com/blog/wp-content/uploads/2010/02/icr_combined.png" alt="Instance Cloud Reduction" width="650" height="347" /></a><p class="wp-caption-text">Instance Cloud Reduction - Combined view of Pass 1 + Pass 2</p></div>
<h3>Conclusion</h3>
<p>We&#8217;ve seen that the presented rendering technique can help in situations where we have to deal with a large number of instanced geometries, and how to take advantage of the latest features of graphics cards and OpenGL to perform view frustum culling calculations on the GPU. This saves us from having to deal with complicated and expensive CPU based object culling methods that break the drawing batches, especially when dealing with dynamic objects. To ease the decision whether to incorporate this technique into your rendering engine, I would like to present its advantages and disadvantages.</p>
<p><strong>Advantages:</strong></p>
<ul>
<li>Heavily reduces the amount of processed data compared to a naive implementation.</li>
<li>No need for any space partitioning methods in the host application to handle the culling of dynamic objects.</li>
<li>Can handle huge amount of instanced objects due to the enormous horsepower of today&#8217;s GPUs.</li>
<li>Scales well with an increased number of instances, as the per-instance calculation cost is relatively low.</li>
<li>Relies strictly on OpenGL 3.2 core features.</li>
<li>No need for OpenCL capable hardware.</li>
</ul>
<p><strong>Disadvantages:</strong></p>
<ul>
<li>Needs an extra rendering pass to perform the culling.</li>
<li>Requires the usage of asynchronous queries to determine the number of visible instances.</li>
</ul>
<p>I hope you agree with me and see this technique as one more step towards fully GPU based scene management. If you have any remarks or improvement ideas regarding the rendering technique itself, feel free to tell me.</p>
<h3>The Demo</h3>
<p>As I promised, the technique presented above comes with a live demo, which actually took most of the time I dedicated to writing this blog in the last two weeks. The demo itself is more of a technical showcase than a presentation of a real-life use case scenario.</p>
<p>First of all, I used high polygon count models for the rendering to emphasize how much of our GPU&#8217;s valuable time the culling phase saves. In a real-world application one would never do something like this. As a result, the demo is more of a benchmark than an interactive application. However, on high-end graphics cards it may perform pretty well.</p>
<p>The demo scene consists of two object types: trees and grass blocks. The tree model is further divided into two parts as they need different textures: the tree trunk and the tree foliage. Obviously, this additional burden can be prevented by using texture arrays to avoid the need of separate draw calls to render the trunk and the foliage.</p>
<p>The tree trunk consists of 33138 triangles, the tree foliage has 16069 triangles and the faking-free grass block consists of 8961 triangles, which I had to model myself as I didn&#8217;t find any suitable model. Actually, this modeling step consumed quite a considerable amount of the time I spent on the demo, as I&#8217;m not an expert in this domain. As you can see, these models are not the ones that one might use in an interactive real-time application like a game. However, they seemed to be very suitable for the purpose of the demonstration.</p>
<p>What really pushes the limits of GPUs is that the demo renders 10,000 trees and 250,000 grass blocks using instancing. This ends up in more than <strong>2.7 billion triangles</strong> in the scene. This is far more than a GPU can handle without the aid of some scene management and culling. However, we will use no scene management at all and the only culling method that we will use is the one presented in this article.</p>
<p>The actual results are quite promising. The view frustum culling step usually saves more than <strong>99.9%</strong> of the GPU horsepower, as the number of triangles actually rendered after the culling step is far below 2 million. This is still quite a lot, but as we use high polygon count models and we don&#8217;t use any LOD techniques, this seems reasonable.</p>
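<p>The figures above are easy to double check. The following small C++ sketch only redoes the arithmetic from the numbers quoted in this article; the constant and function names are made up for the illustration, and 64-bit integers are used because the total does not fit into 32 bits.</p>

```cpp
#include <cstdint>

// Figures quoted in the article.
constexpr std::uint64_t kTrunkTriangles   = 33138;
constexpr std::uint64_t kFoliageTriangles = 16069;
constexpr std::uint64_t kGrassTriangles   = 8961;
constexpr std::uint64_t kTreeCount        = 10000;
constexpr std::uint64_t kGrassBlockCount  = 250000;

// Total triangle count of the scene before any culling.
constexpr std::uint64_t totalSceneTriangles() {
    return kTreeCount * (kTrunkTriangles + kFoliageTriangles)
         + kGrassBlockCount * kGrassTriangles;
}

// Fraction of the scene's triangles skipped when only `rendered` remain.
constexpr double culledFraction(std::uint64_t rendered) {
    return 1.0 - static_cast<double>(rendered)
               / static_cast<double>(totalSceneTriangles());
}
```

<p>Here totalSceneTriangles() evaluates to 2,732,320,000, and taking the 2 million rendered triangles as an upper bound, culledFraction(2000000) is roughly 0.9993, which is consistent with the &#8220;more than 99.9%&#8221; figure.</p>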
<p>Even if the demo scene statistics don&#8217;t look like a typical use case scenario, the ease of implementation and the compelling visual results pleased me anyway:</p>
<p style="text-align: center;"><span class="youtube">
<object width="640" height="480">
<param name="movie" value="http://www.youtube.com/v/srbOFTLTe8k&amp;rel=1&amp;color1=3a3a3a&amp;color2=999999&amp;border=0&amp;fs=1&amp;hl=en&amp;autoplay=0&amp;showinfo=0&amp;iv_load_policy=3&amp;showsearch=0&amp;hd=1" />
<param name="allowFullScreen" value="true" />
<embed wmode="transparent" src="http://www.youtube.com/v/srbOFTLTe8k&amp;rel=1&amp;color1=3a3a3a&amp;color2=999999&amp;border=0&amp;fs=1&amp;hl=en&amp;autoplay=0&amp;showinfo=0&amp;iv_load_policy=3&amp;showsearch=0&amp;hd=1" type="application/x-shockwave-flash" allowfullscreen="true" width="640" height="480"></embed>
<param name="wmode" value="transparent" />
</object>
</span><p><a href="http://www.youtube.com/watch?v=srbOFTLTe8k&fmt=18" onclick="pageTracker._trackPageview('/outgoing/www.youtube.com/watch?v=srbOFTLTe8k_fmt=18&amp;referer=');">www.youtube.com/watch?v=srbOFTLTe8k</a></p></p>
<p>On my Radeon HD2600XT I achieved 6-7 frames per second, which is acceptable taking into consideration the huge amount of geometry data still passed to the graphics card. On more recent cards I suppose it should run at good frame rates; however, due to the lack of hardware to test on, these are my only results. If anybody manages to take a better screen capture than mine above, please let me know.</p>
<h3>Implementation details</h3>
<p>To say a few words about the techniques and tricks I&#8217;ve used while creating the demo, here is a list of the most important ones:</p>
<ul>
<li>Three models are used with high instance counts, giving over 2.7 billion triangles in total in the scene, as mentioned previously.</li>
<li>Three 512x512 RGBA textures are used for the models. They are partially handmade and, again, I&#8217;m not a texture artist, so sorry if they don&#8217;t look flawless.</li>
<li>The Wavefront model and TGA image loaders that accompany the demo are very roughly implemented for the demo only, so I would strongly encourage you not to use them for any other purpose, as they handle only a subset of the possibilities of the file formats.</li>
<li>The vertex data from the Wavefront model files is transferred in a very naive way, so vertex reuse isn&#8217;t taken into account.</li>
<li>The instance data consists of simple four-component vectors representing the world-space position of the instance. This seemed to be the simplest choice for demonstration purposes.</li>
<li>In the second pass, the instance data is sourced from a texture buffer, though not because the visible instance count exceeds the amount that would fit in a uniform buffer. I used texture buffers because for this simple demonstration they seemed to be a little easier to integrate.</li>
<li>The morphing effect that simulates wind is done using hard-coded geometry deformation in the vertex shader. It is not physically correct but visually compelling.</li>
<li>The lighting is a simple directional light using Phong&#8217;s shading and reflection model.</li>
<li>Simple fog is simulated with some awkward formula that I&#8217;ve chosen after a few test runs.</li>
<li>Alpha testing is achieved by using the discard operation in the fragment shader.</li>
</ul>
<h3>Driver issues</h3>
<p>During the development of the demonstration program I met several driver-related problems, as I had never used the latest OpenGL features so heavily before. I worked with Catalyst 9.12 and 10.1, but both seemed to lack a proper GLSL compiler. Here are some of the issues I met:</p>
<ul>
<li>When I forgot to declare the varyings in the geometry shader as arrays, as the standard requires, the driver didn&#8217;t complain about any syntax error, but the program crashed when it tried to execute the code.</li>
<li>Except for the texture sampler uniform, all other uniforms failed to work when used in the fragment shader only, so I&#8217;ve put them all in the vertex shader.</li>
<li>For loops seemed not to work when used inside the geometry shader, that&#8217;s why the culling itself is done in the vertex shader in the demo.</li>
</ul>
<p>All these problems resulted in nasty tricks to make things work and ended up in awful shader code. Sorry for that. At least now it works on my configuration, but I&#8217;m pretty unsure whether it will work on other graphics card and driver combos. Please report any success or failure when trying out the demo. Anyway, be sure to have the latest graphics drivers installed as, at least in the case of AMD, OpenGL 3.2 drivers came out only in the fall of 2009.</p>
<p><em><strong>Edit:</strong></em></p>
<p><em>Thanks to information from Pierre Boudier from AMD, I&#8217;ve updated both the source and binary releases to support the latest drivers properly. The problem was that I didn&#8217;t use attribute location binding as specified in the standard.</em></p>
<p><em>I also have to mention that with my new Radeon HD5770 I managed to achieve over 90 frames per second, which actually shows that this technique can in fact be used for games and other interactive applications.</em></p>
<p><em>One more thing in the end. As you know, this version of the Nature demo uses a texture buffer to source instance positions. I plan to create another version that will take advantage of the instanced arrays introduced in core with OpenGL 3.3. I expect quite a reasonable speedup, as that would eliminate the need for texture fetches in the vertex shader by dedicating a vertex fetcher to the purpose instead, thus increasing the overall performance of the technique.</em></p>
<h3>Binary release</h3>
<p><strong>Platform:</strong> Windows<br />
<strong>Dependency:</strong> OpenGL 3.2 capable graphics driver<br />
<strong>Download link:</strong> <a href="http://rastergrid.com/blog/wp-content/uploads/2010/06/nature12_win32.zip" target="_blank">nature12_win32.zip (3.58MB)<br />
</a><strong>Comments:</strong> Includes the update that makes it work even with the latest drivers.</p>
<h3>Full source code</h3>
<p><strong>Language:</strong> C++<br />
<strong>Platform:</strong> cross-platform<br />
<strong>Dependency:</strong> GLEW, SFML, GLM<br />
<strong>Download link:</strong> <a href="http://rastergrid.com/blog/wp-content/uploads/2010/06/nature12_src.zip" target="_blank">nature12_src.zip (12.6KB)<br />
</a><strong>Comments:</strong> Sorry for the many dependencies, however, I would recommend the mentioned libraries for everybody who is doing OpenGL development.</p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/feed/</wfw:commentRss>
		<slash:comments>34</slash:comments>
		</item>
	</channel>
</rss>
