<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Instance culling using geometry shaders</title>
	<atom:link href="http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/feed/" rel="self" type="application/rss+xml" />
	<link>http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/</link>
	<description>A technical blog from Daniel Rákos (aka aqnuep)</description>
	<lastBuildDate>Mon, 16 Apr 2012 23:29:18 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: feng</title>
		<link>http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/comment-page-1/#comment-27278</link>
		<dc:creator>feng</dc:creator>
		<pubDate>Mon, 13 Feb 2012 09:17:13 +0000</pubDate>
		<guid isPermaLink="false">http://rastergrid.com/blog/?p=135#comment-27278</guid>
		<description>Thanks for the reply. Looking forward to the article! (Though my cards are limited to OpenGL 3.3.)
By the way, about instancing: if I use crossed billboards for the grass instead of a model, I can still instance them with an attribute divisor and draw everything in one call; or I can just draw the positions as points (glDrawArrays(GL_POINTS,..), also one draw call) and expand them in the geometry shader. Which one is likely to be better?

Another question about instancing: if I do the culling as described above, I first pass the instance data (positions) through the pipeline, and then the actual vertices in a second pass. Since the ratio of vertices to instances is merely 4:1, or at most 12:1, I tend to think the cost of the culling pass is not that small compared to the total &quot;vertices&quot; sent. Or is it maybe still a gain over CPU culling? Is that right or not? Thanks.</description>
		<content:encoded><![CDATA[<p>Thanks for the reply. Looking forward to the article! (Though my cards are limited to OpenGL 3.3.)<br />
By the way, about instancing: if I use crossed billboards for the grass instead of a model, I can still instance them with an attribute divisor and draw everything in one call; or I can just draw the positions as points (glDrawArrays(GL_POINTS,..), also one draw call) and expand them in the geometry shader. Which one is likely to be better?</p>
<p>Another question about instancing: if I do the culling as described above, I first pass the instance data (positions) through the pipeline, and then the actual vertices in a second pass. Since the ratio of vertices to instances is merely 4:1, or at most 12:1, I tend to think the cost of the culling pass is not that small compared to the total &#8220;vertices&#8221; sent. Or is it maybe still a gain over CPU culling? Is that right or not? Thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Rákos</title>
		<link>http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/comment-page-1/#comment-26638</link>
		<dc:creator>Daniel Rákos</dc:creator>
		<pubDate>Fri, 10 Feb 2012 14:53:02 +0000</pubDate>
		<guid isPermaLink="false">http://rastergrid.com/blog/?p=135#comment-26638</guid>
		<description>If you don&#039;t use instancing, or the instance count is too small, then the delay caused by waiting for the culling pass&#039;s result may be prohibitive.

However, note that no actual vertex data is passed to the culling pass, so it&#039;s not true that the vertex data has to be sent twice: the culling pass uses only the instance data buffer.

In most cases it is not the culling pass itself that takes time; waiting for the culling results is the expensive part.

Actually, there is a solution for both problems (i.e. non-instanced data and the query result delay) if you use OpenGL 4.2+, and I plan to write a demo that shows this technique. The technique presented in this article, however, provides advantages only for instanced data, and only when the instance count is relatively high.</description>
		<content:encoded><![CDATA[<p>If you don&#8217;t use instancing, or the instance count is too small, then the delay caused by waiting for the culling pass&#8217;s result may be prohibitive.</p>
<p>However, note that no actual vertex data is passed to the culling pass, so it&#8217;s not true that the vertex data has to be sent twice: the culling pass uses only the instance data buffer.</p>
<p>In most cases it is not the culling pass itself that takes time; waiting for the culling results is the expensive part.</p>
<p>Actually, there is a solution for both problems (i.e. non-instanced data and the query result delay) if you use OpenGL 4.2+, and I plan to write a demo that shows this technique. The technique presented in this article, however, provides advantages only for instanced data, and only when the instance count is relatively high.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: feng</title>
		<link>http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/comment-page-1/#comment-26497</link>
		<dc:creator>feng</dc:creator>
		<pubDate>Fri, 10 Feb 2012 06:01:24 +0000</pubDate>
		<guid isPermaLink="false">http://rastergrid.com/blog/?p=135#comment-26497</guid>
		<description>If the data is not instanced, or the instance count is not that large, would the method here still be a gain? Compared to culling on the CPU, the bulk of the vertex data must be sent almost twice. And it&#039;s only after the geometry stage that vertices are automatically culled against the viewport. Thank you.</description>
		<content:encoded><![CDATA[<p>If the data is not instanced, or the instance count is not that large, would the method here still be a gain? Compared to culling on the CPU, the bulk of the vertex data must be sent almost twice. And it&#8217;s only after the geometry stage that vertices are automatically culled against the viewport. Thank you.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dmitry Duka</title>
		<link>http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/comment-page-1/#comment-1992</link>
		<dc:creator>Dmitry Duka</dc:creator>
		<pubDate>Fri, 17 Jun 2011 12:02:47 +0000</pubDate>
		<guid isPermaLink="false">http://rastergrid.com/blog/?p=135#comment-1992</guid>
		<description>GTX 465 - 240 fps on average, rarely falling down to minimum of 150 fps for a second. Nice demo :)</description>
		<content:encoded><![CDATA[<p>GTX 465 &#8211; 240 fps on average, rarely falling down to minimum of 150 fps for a second. Nice demo <img src='http://rastergrid.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Rákos</title>
		<link>http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/comment-page-1/#comment-794</link>
		<dc:creator>Daniel Rákos</dc:creator>
		<pubDate>Fri, 19 Nov 2010 15:55:24 +0000</pubDate>
		<guid isPermaLink="false">http://rastergrid.com/blog/?p=135#comment-794</guid>
		<description>Yes, you should do it that way.
Actually, how much you care about the parallel nature of the client-server architecture is what can distinguish an efficient OpenGL app from an inefficient one.</description>
		<content:encoded><![CDATA[<p>Yes, you should do it that way.<br />
Actually, how much you care about the parallel nature of the client-server architecture is what can distinguish an efficient OpenGL app from an inefficient one.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dude</title>
		<link>http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/comment-page-1/#comment-793</link>
		<dc:creator>Dude</dc:creator>
		<pubDate>Fri, 19 Nov 2010 12:18:59 +0000</pubDate>
		<guid isPermaLink="false">http://rastergrid.com/blog/?p=135#comment-793</guid>
		<description>Ahh, I see. I have always used only one query object. So I should have an array of them, one for each block? And maybe I should also do some trivial task between culling and drawing, like drawing the skybox?

And yeah, it helps and makes sense now. It was getting me really frustrated; I am so used to thinking that everything needs to be sequential.</description>
		<content:encoded><![CDATA[<p>Ahh, I see. I have always used only one query object. So I should have an array of them, one for each block? And maybe I should also do some trivial task between culling and drawing, like drawing the skybox?</p>
<p>And yeah, it helps and makes sense now. It was getting me really frustrated; I am so used to thinking that everything needs to be sequential.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Rákos</title>
		<link>http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/comment-page-1/#comment-788</link>
		<dc:creator>Daniel Rákos</dc:creator>
		<pubDate>Thu, 18 Nov 2010 16:03:41 +0000</pubDate>
		<guid isPermaLink="false">http://rastergrid.com/blog/?p=135#comment-788</guid>
		<description>Sorry, I don&#039;t really read the YouTube comments.

Actually, querying transform feedback is a cheap operation; the only problem is that reading back the query result blocks by default until the result is available. So it seemed expensive to you only because you were waiting out the latency between the geometry culling pass and the retrieval of the query results. In practice you should fill that time with other rendering tasks so that you don&#039;t stall the CPU (and, as a consequence, the GPU) with a blocking query readback.
In your case, where you have several blocks to cull, I would do it in the following way:

&lt;code&gt;for i = 1 to block.count do
    cull_with_geometry_shader(block[i])
endfor

for i = 1 to block.count do
    inst_count = get_query_result(i)
    draw_instanced(block[i], inst_count)
endfor
&lt;/code&gt;
I hope this helps.</description>
		<content:encoded><![CDATA[<p>Sorry, I don&#8217;t really read the YouTube comments.</p>
<p>Actually, querying transform feedback is a cheap operation; the only problem is that reading back the query result blocks by default until the result is available. So it seemed expensive to you only because you were waiting out the latency between the geometry culling pass and the retrieval of the query results. In practice you should fill that time with other rendering tasks so that you don&#8217;t stall the CPU (and, as a consequence, the GPU) with a blocking query readback.<br />
In your case, where you have several blocks to cull, I would do it in the following way:</p>
<p><code>for i = 1 to block.count do<br />
    cull_with_geometry_shader(block[i])<br />
endfor<br />
<br />
for i = 1 to block.count do<br />
    inst_count = get_query_result(i)<br />
    draw_instanced(block[i], inst_count)<br />
endfor<br />
</code><br />
I hope this helps.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dude</title>
		<link>http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/comment-page-1/#comment-786</link>
		<dc:creator>Dude</dc:creator>
		<pubDate>Thu, 18 Nov 2010 15:36:00 +0000</pubDate>
		<guid isPermaLink="false">http://rastergrid.com/blog/?p=135#comment-786</guid>
		<description>Hey dude, I sent you a message some time ago on YouTube asking for the models you used in this demo, which was quite silly of me because they were actually included along with the binary. Anyway, I have a slight issue: I generate terrain blocks on the GPU using a density volume, and as such I have quite a few blocks (4*4 minimum, excluding the coarser (larger) blocks for LOD). For each of these I do another geometry shader pass to generate points for instances (currently just a sphere with 75 vertices or so), and I found that doing the query to retrieve the culled instances for each block was quite expensive (performed once for each block that I know contains instances), so I wound up putting all those instances into one huge buffer and doing just one query. Is querying a transform feedback really that expensive?

Btw, I ran your demo on my GT9500, and while I cannot remember the fps, it ran quite smoothly.</description>
		<content:encoded><![CDATA[<p>Hey dude, I sent you a message some time ago on YouTube asking for the models you used in this demo, which was quite silly of me because they were actually included along with the binary. Anyway, I have a slight issue: I generate terrain blocks on the GPU using a density volume, and as such I have quite a few blocks (4*4 minimum, excluding the coarser (larger) blocks for LOD). For each of these I do another geometry shader pass to generate points for instances (currently just a sphere with 75 vertices or so), and I found that doing the query to retrieve the culled instances for each block was quite expensive (performed once for each block that I know contains instances), so I wound up putting all those instances into one huge buffer and doing just one query. Is querying a transform feedback really that expensive?</p>
<p>Btw, I ran your demo on my GT9500, and while I cannot remember the fps, it ran quite smoothly.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Rákos</title>
		<link>http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/comment-page-1/#comment-437</link>
		<dc:creator>Daniel Rákos</dc:creator>
		<pubDate>Tue, 07 Sep 2010 12:08:18 +0000</pubDate>
		<guid isPermaLink="false">http://rastergrid.com/blog/?p=135#comment-437</guid>
		<description>Since then I&#039;ve optimized the algorithm a bit, so it would most probably run faster on the 2600XT as well; however, I&#039;ve already replaced my video card with a 5770. Anyway, it is possible that the algorithm performs better on NVIDIA cards.</description>
		<content:encoded><![CDATA[<p>Since then I&#8217;ve optimized the algorithm a bit, so it would most probably run faster on the 2600XT as well; however, I&#8217;ve already replaced my video card with a 5770. Anyway, it is possible that the algorithm performs better on NVIDIA cards.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Yours3lf</title>
		<link>http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/comment-page-1/#comment-436</link>
		<dc:creator>Yours3lf</dc:creator>
		<pubDate>Tue, 07 Sep 2010 10:54:24 +0000</pubDate>
		<guid isPermaLink="false">http://rastergrid.com/blog/?p=135#comment-436</guid>
		<description>Hello, I&#039;m also from Hungary. I achieved 50-60 FPS on an 8600GT, which is similar to the 2600XT, so I can&#039;t understand that 6 FPS :)</description>
		<content:encoded><![CDATA[<p>Hello, I&#8217;m also from Hungary. I achieved 50-60 FPS on an 8600GT, which is similar to the 2600XT, so I can&#8217;t understand that 6 FPS <img src='http://rastergrid.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
</channel>
</rss>

