OpenGL vs DirectX: The War Is Far From Over

I chose the title in response to the popular article that tries to prove that OpenGL lost the war against Direct3D. To be honest, I didn’t really like that article at all, first of all because it compared OpenGL 3, which targeted Shader Model 4.0 hardware, with DirectX 11, which targeted Shader Model 5.0 hardware. Besides that, as we will see, the war is really far from over… This article aims to list the most important features introduced by OpenGL 3.x, OpenGL 4.x, Direct3D 10 and Direct3D 11, and, to be fair to DirectX, we will also talk about the promised features of the upcoming Direct3D 11.1.
After I wrote my article about the latest features introduced in OpenGL, someone asked me whether I could write an article comparing the hardware features exposed by OpenGL and Direct3D. Instead of a long explanation, I decided to simply create a table of the features introduced by the APIs. Please note that the list focuses on hardware features and does not discuss API feature differences between the two APIs. The list may be far from complete, and I’m happy to get feedback about what is missing from the table so that I can extend it. There are also features for which I could not find out whether an equivalent exists in D3D; these are marked with a question mark. I did not find a specification of the HLSL versions, so if anybody can point me to the answer, I would be happy.
| HARDWARE FEATURES EXPOSED | |||||
| Draw command related features | |||||
| GL 3.x | GL 4.x | DX 10 | DX 11 | DX 11.1 | |
| Conditional/predicated rendering based on the result of occlusion queries (NV_conditional_render) | |||||
| Basic geometry instancing support and instanced draw commands (ARB_draw_instanced) | |||||
| Geometry instancing with the ability to specify instanced vertex attributes (ARB_instanced_arrays) | |||||
| Primitive restart (cut index) feature for batching multiple strips together (NV_primitive_restart) | |||||
| Draw commands allowing modification of the base vertex index (ARB_draw_elements_base_vertex) | |||||
| Indirect draw commands that source their parameters from server side buffers (ARB_draw_indirect) | |||||
| New shader type related features | |||||
| GL 3.x | GL 4.x | DX 10 | DX 11 | DX 11.1 | |
| Geometry shader support and adjacency primitive support (ARB_geometry_shader4) | |||||
| Instanced geometry shader support with fixed number of invocations (ARB_gpu_shader5) | |||||
| Tessellation control and evaluation (hull and domain) shader support (ARB_tessellation_shader) | |||||
| Transform feedback (stream-output) related features | |||||
| GL 3.x | GL 4.x | DX 10 | DX 11 | DX 11.1 | |
| Basic transform feedback (stream-output) support (EXT_transform_feedback) | |||||
| Transform feedback support without a geometry shader being active (EXT_transform_feedback) | |||||
| Support for pausing and resuming transform feedback (stream-output) (ARB_transform_feedback2) | |||||
| Auto-draw support (feed back the contents of the transform feedback buffer) (ARB_transform_feedback2) | |||||
| Instanced auto-draw support (transform feedback buffer drawing with instancing support) (ARB_transform_feedback_instanced) | |||||
| Support for outputting multiple primitive streams using transform feedback (stream-output) (ARB_transform_feedback3) | |||||
| Asynchronous queries and related features | |||||
| GL 3.x | GL 4.x | DX 10 | DX 11 | DX 11.1 | |
| Support for occlusion query for getting number of samples passed (ARB_occlusion_query) | |||||
| Support for occlusion query for getting only a boolean value about visibility (ARB_occlusion_query2) | |||||
| Support to query the number of vertices processed and the number of vertex shader invocations | [1] | ||||
| Support to query the number of geometry shader invocations in case a geometry shader is active | [1] | ||||
| Support to query the number of primitives output by the geometry shader (EXT_transform_feedback) | |||||
| Support to query the number of primitives that were sent to the rasterizer (EXT_transform_feedback) | |||||
| Support to query the number of primitives that passed clipping and were actually rendered | [1] | ||||
| Support to query the number of times a fragment/pixel shader was invoked | [1] | ||||
| Support to query the number of primitives written during transform feedback (stream-output) (EXT_transform_feedback) | |||||
| Support to query the number of primitives generated during transform feedback (stream-output) (EXT_transform_feedback) | |||||
| Support to query a server side high resolution timestamp (ARB_timer_query) | |||||
| Support to query the completeness of rendering commands (ARB_sync) | |||||
| Texture, vertex and renderbuffer format related features | |||||
| GL 3.x | GL 4.x | DX 10 | DX 11 | DX 11.1 | |
| Floating point color and depth formats for textures and render buffers (various extensions) | |||||
| Cube map textures with depth component internal format (EXT_gpu_shader4) | |||||
| Half-float (16-bit) vertex and pixel data support (NV_half_float, ARB_half_float_pixel) | |||||
| Non-normalized integer color formats for textures and renderbuffers (EXT_texture_integer) | |||||
| Packed depth/stencil texture and renderbuffer formats (EXT_packed_depth_stencil) | |||||
| RGTC texture compression for two-component textures (EXT_texture_compression_rgtc) | |||||
| Signed normalized texture component formats (EXT_texture_snorm) | |||||
| Seamless cube map filtering support (to hide artifacts at cube map edges) (ARB_seamless_cube_map) | |||||
| Support for swizzling the components of a texture (ARB_texture_swizzle) | |||||
| BPTC texture compression for floating point and unsigned normalized textures (ARB_texture_compression_bptc) | |||||
| 64-bit floating point vertex attribute formats (ARB_vertex_attrib_64bit) | |||||
| New texture type related features | |||||
| GL 3.x | GL 4.x | DX 10 | DX 11 | DX 11.1 | |
| One- and two-dimensional layered array textures (EXT_texture_array) | |||||
| Cube map array textures as special two-dimensional array textures (ARB_texture_cube_map_array) | |||||
| Rectangular textures with no mipmap support and that are accessed with integer coordinates (ARB_texture_rectangle) | |||||
| Multisampled textures and support for fetching specific sample locations (ARB_texture_multisample) | |||||
| Casting a texture’s interpreted internal format to another internal format | [4] | [4] | |||
| Uniform buffer (constant buffer) related features | |||||
| GL 3.x | GL 4.x | DX 10 | DX 11 | DX 11.1 | |
| Basic uniform buffer (constant buffer) support (ARB_uniform_buffer_object) | |||||
| Support for large uniform buffers and binding subranges (ARB_uniform_buffer_object) | |||||
| Framebuffer and texture rendering related features | |||||
| GL 3.x | GL 4.x | DX 10 | DX 11 | DX 11.1 | |
| Rendering to textures and renderbuffers (EXT_framebuffer_object) | |||||
| Multisample stretch blit functionality (EXT_framebuffer_multisample, EXT_framebuffer_blit) | |||||
| sRGB rendering and blending support for framebuffers (EXT_framebuffer_sRGB) | |||||
| Support for enabling or disabling clamping of the depth of fragments (ARB_depth_clamp) | |||||
| Support for logical operations on integer render targets (supported for a decade in OpenGL) | |||||
| Blending related features | |||||
| GL 3.x | GL 4.x | DX 10 | DX 11 | DX 11.1 | |
| Support for alpha-to-coverage when using multisampling (ARB_multisample) | |||||
| Per-color-buffer blend enables and color writemasks (EXT_draw_buffers2) | |||||
| Dual-source color blending support based on a secondary output of the fragment shader (ARB_blend_func_extended) | |||||
| Individual blend equations and blend functions support for each color output (ARB_draw_buffers_blend) | |||||
| Shader related features | |||||
| GL 3.x | GL 4.x | DX 10 | DX 11 | DX 11.1 | |
| Texture lookup functions to access individual texels of a LOD using integer coordinates (EXT_gpu_shader4) | |||||
| Query the dimensions of a specific LOD of a texture in shaders (EXT_gpu_shader4) | |||||
| Ability to apply integer offsets to the texel location during texture lookup (EXT_gpu_shader4) | |||||
| Ability to explicitly pass in derivative values that are used to compute LOD during texture lookup (EXT_gpu_shader4) | |||||
| Control over varying variable interpolation: non-perspective, flat, centroid sampling, etc. (EXT_gpu_shader4) | |||||
| Full signed and unsigned integer support in shaders (EXT_gpu_shader4) | |||||
| Vertex ID built-in variable available in vertex shader (EXT_gpu_shader4) | |||||
| Primitive ID built-in variable available in geometry and fragment shader (EXT_gpu_shader4) | |||||
| Instance ID built-in variable available in vertex shader (ARB_draw_instanced) | |||||
| Shader fragment coordinate convention control (ARB_fragment_coord_conventions) | |||||
| Provoking vertex control (for flat shaded varying value selection) (ARB_provoking_vertex) | |||||
| Support for encoding and decoding floating point values from and to integers (ARB_shader_bit_encoding) | |||||
| Support for getting the results of the automatic LOD computations in shaders (ARB_texture_query_lod) | |||||
| Support for coherent indexing into arrays of samplers using non-constant indices (addressable samplers) (ARB_gpu_shader5) | |||||
| Support for indexing into arrays of uniform blocks (addressable constant buffers) (ARB_gpu_shader5) | |||||
| Gathered texture fetches over a 2×2 footprint (with custom offsets) (ARB_texture_gather) | |||||
| Invocation ID built-in variable available in geometry shader (ARB_gpu_shader5) | |||||
| Support for double-precision floating-point data types in shaders (ARB_gpu_shader_fp64) | |||||
| Support for sample-frequency fragment shader execution (ARB_sample_shading) | |||||
| Support indirect subroutine calls in all shader stages (ARB_shader_subroutine) | |||||
| Support for selecting from multiple viewports using a geometry shader (ARB_viewport_array) | |||||
| Support for dedicated atomic counters in shaders (ARB_shader_atomic_counters) | [2] | [2] | |||
| Support for backing up dedicated atomic counters with buffers (ARB_shader_atomic_counters) | [5] | [5] | |||
| Support for load/store (read/write) buffers and textures in shaders (ARB_shader_image_load_store) | [3] | ||||
| Support for atomic operations on load/store buffers and textures (ARB_shader_image_load_store) | |||||
| Support for disabling or forcing early depth test (ARB_shader_image_load_store) | |||||
| Support for conservative depth (enabling safe early tests even when modifying depth) (ARB_conservative_depth) | |||||
| Support for coverage as input to the fragment shader (ARB_gpu_shader5) | |||||
| Miscellaneous features | |||||
| GL 3.x | GL 4.x | DX 10 | DX 11 | DX 11.1 | |
| Support for floating point viewport specification (ARB_viewport_array) | |||||
| Per-texture mipmap clamping (supported since the very early versions of OpenGL) | |||||
| Support to use a single depth texture for depth testing and as texture input (when depth writes are disabled) | |||||
[1] There is no support for these counters in OpenGL; however, they can be implemented with the help of shader atomic counters.
[2] There is no support in Direct3D to use the dedicated atomic counter hardware (supported currently only by AMD GPUs) only by using an append/consume buffer. However, as atomic counters are part of UAVs and an arbitrary number of UAVs can be attached to a single resource, the same functionality is supported indirectly.
[3] Direct3D 11 supports read/write buffers and textures, but only in the fragment (pixel) shader. Direct3D 11.1 plans to remove this restriction.
[4] There is no support for texture format casting in OpenGL; the conversion can, however, be done with a copy, preferably using pixel buffer objects.
[5] There is no support for automatic storage of atomic counter values in buffers in Direct3D; however, their values can be manually copied to arbitrary resources.
In conclusion, I would like to say just one thing: even though each API lacks some features that the other exposes, we really can say that the two APIs are on par in the number of hardware features they expose.
(Sorry in advance for any mistakes; it took quite some time to create this table and I may have become too tired by the end.)
This entry was posted by Daniel Rákos on October 7, 2011 at 7:02 pm, and is filed under Graphics, Programming.

about 6 months ago
That’s all cool, but practice doesn’t confirm this. See benchmarks for games that have both DX9 and DX10/11 mode. It’s not like DX10 mode is twice as fast. In fact, differences are often negligible and in some cases DX10/11 mode offers no or minimal visual boost with severe performance hit (Polish Crysis 2 DX9/11 test, translated: http://translate.google.com/translate?u=http%3A%2F%2Fwww.benchmark.pl%2Fmini-recenzje%2FCrysis_2_DX9_vs_DX11_-_roznica_w_grafice_oraz_w_wydajnosci-4014.html&hl=en&langpair=auto|en&tbb=1&ie=UTF-8 ).
I’m mentioning multithreaded rendering because that’s what possibly could justify new API drawbacks (like reduced audience).
While I appreciate your work on 3D graphics and promoting OpenGL, you write mostly from a 3D graphics demo perspective and focus on new, cool features. I’m just saying that there are other factors, like the business POV (targeting the right people and hardware) and the technical POV (having a usable debugger and reliable drivers), and that in a real-world, non-synthetic environment (i.e. a real game) new features don’t offer clearly visible benefits. Of course, you can wonder whether writing new tech from scratch would change this, as major engines were mostly upgraded to DX10+ from DX9 code. But don’t forget that consoles are much more profitable and they are still ~SM 3.0 level hardware (but with low-level access).
about 6 months ago
Another example would be tessellation. If used for real (not as a better bump mapping), it could vastly change not only graphics but also whole content creation pipelines. However, Steam says that only 5.6% of systems have a DX11 GPU. Now, according to vgchartz.com, PC sales are usually 10-15% of all sales for multiplatform AAA titles. So the question is whether you *really* want to limit your audience to 0.5% of the usual one. For 1M sales that would be 5000 boxes.
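The estimate in this comment can be checked with a couple of multiplications. The sketch below is only back-of-the-envelope arithmetic on the figures quoted there (5.6% DX11 GPUs, a 10-15% PC share, 1M total sales). The function name is made up for illustration, and it assumes that the Steam hardware share applies uniformly to PC buyers.

```python
def dx11_pc_copies(total_sales, pc_share, dx11_share):
    """Copies sold to PC players who own a DX11-capable GPU (assumed uniform shares)."""
    return total_sales * pc_share * dx11_share


if __name__ == "__main__":
    # Figures quoted in the comment: 1M sales, 10-15% on PC, 5.6% DX11 GPUs.
    for pc_share in (0.10, 0.15):
        copies = dx11_pc_copies(1_000_000, pc_share, 0.056)
        print(f"PC share {pc_share:.0%}: about {copies:,.0f} copies "
              f"({pc_share * 0.056:.2%} of all sales)")
```

With these inputs the share works out to roughly 0.56% to 0.84% of all sales, so about 5,600 to 8,400 copies out of a million, the same order of magnitude as the rounded 0.5% and 5000 boxes mentioned.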
about 6 months ago
Yes, the only metric that matters is device support. The market for mobile games is insanely huge these days, and the fact that GL runs on both mobile and desktop makes it a no-brainer. Give your PC and console ports a little more visual gravy (mainly just to please the platform holder), but spend the bulk of your effort on the lowest common denominator: mobile.
Anyone not playing the GL game is oblivious to the direction the game biz is going.
about 6 months ago
I’m not going to get into the GL vs DX debate any further than to note that to any professional studio or programmer the API differences are unimportant and the decision is dominated more by issues like the stability of GL vs DX drivers, etc. It’s *really easy* to use a different API on different platforms, so you use the one that works best on each platform. Only very small teams care about the portability of a graphics API to different platforms.
That said let me correct a few factual errors:
1) Your note (2) is incorrect. DirectX 11+ has IncrementCounter() and DecrementCounter() functions that expose this hardware. This is how all modern order-independent transparency per-pixel lists work.
2) Your next box “support for backing up dedicated atomic counters with buffers (ARB_shader_atomic_counters)” is sort of misleading and irrelevant. These counters are stored together with buffers in DirectX 11+, so I’d argue you should either remove this line entirely or mark it as supported in DirectX as well.
3) Remove note (4) and put red boxes for OpenGL there… that’s just waffling nonsense. You could put a similar note for every single feature that is red-boxed in DirectX. Copying an entire buffer/texture does not constitute an acceptable workaround.
I know that your intention here is not to do an unbiased comparison, but you really ought to include some of the larger features that DirectX supports and OpenGL does not for completeness. The big ones that you really need to include:
1) Multithreaded rendering and object creation.
2) Compute shader; yes this *is* a graphics feature. It is used in nearly every modern game for everything from shading to texture compression/decompression to post-processing. And no, OpenCL interop does not provide an equivalent mechanism since it causes a scheduling stall at every interop point (compute shader does not since it uses the same scheduler and resources).
Those are the two big ones you really need to add. I’d argue that compiled state handling is still significantly superior in DX10+ compared to OpenGL, but including it is not as critical as the above two.
about 6 months ago
1) The problem is that you think that atomic counters are only useful for order-independent transparency and other per-pixel linked list based techniques. In case of OpenGL, you can use atomic counters for whatever you wish, not just for offsets into append/consume buffers.
2) Again, you assume that atomic counters are just for append/consume buffers. No, in OpenGL you can save them in arbitrary buffers so you can e.g. generate the instance count field of an indirect draw command (it is called indirect dispatch or something like that in D3D) structure, but this is something that is not supported by D3D at all, so no, there is no need to remove the line (and there are countless other possible use cases).
3) That line was actually introduced based on one of the comments. Yes, maybe red box would be better, I can change that. Here your reasoning has some point.
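As a hedged illustration of point 2, the sketch below simulates on the CPU what backing an atomic counter with buffer storage makes possible: the counter lives at the byte offset of the instance count field of an indirect draw command, so a culling pass fills that field in without a separate copy. This is plain Python, not OpenGL. The function names and the visibility test are made up for the example, and the four-field layout (vertex count, instance count, first vertex, and a last field that is reserved in ARB_draw_indirect and used as the base instance in OpenGL 4.2) is an assumption based on the DrawArraysIndirectCommand structure.

```python
import struct

# Assumed layout of DrawArraysIndirectCommand: four packed 32-bit unsigned ints
# (vertex count, instance count, first vertex, reserved/base instance).
COMMAND_FORMAT = "<4I"
INSTANCE_COUNT_OFFSET = 4  # byte offset of the second field


def make_indirect_command(vertex_count, first_vertex=0):
    """Create the buffer contents with the instance count starting at zero."""
    buf = bytearray(struct.calcsize(COMMAND_FORMAT))
    struct.pack_into(COMMAND_FORMAT, buf, 0, vertex_count, 0, first_vertex, 0)
    return buf


def counter_increment(buf, offset):
    """Stand-in for a shader-side atomic counter increment; returns the old value."""
    (old,) = struct.unpack_from("<I", buf, offset)
    struct.pack_into("<I", buf, offset, old + 1)
    return old


def cull_pass(buf, objects, is_visible):
    """Hypothetical culling pass: count visible objects directly in the command."""
    visible = []
    for obj in objects:
        if is_visible(obj):
            slot = counter_increment(buf, INSTANCE_COUNT_OFFSET)
            visible.append((slot, obj))  # slot = index this instance would get
    return visible


if __name__ == "__main__":
    command = make_indirect_command(vertex_count=36)
    cull_pass(command, range(10), lambda obj: obj % 2 == 0)
    print(struct.unpack(COMMAND_FORMAT, command))
```

On a real GPU the increments happen in parallel, so the slot each object receives is not deterministic; only the final count is.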
It seems that you are simply unaware of what OpenGL and the hardware support and just see it from the Direct3D perspective. I never said that my opinion is not biased, but these are more or less facts (except for the yellow cases that may have room for argument). You should first check what extras are exposed by OpenGL that are not supported by D3D before forming any preconceptions. Atomic counters are simply something that is only implicitly supported by D3D in a very, very limited form. If you don’t believe me, just ask any of the experts at AMD or NVIDIA.
1) Multithreaded rendering and object creation is an API feature, not a hardware feature. As was stated in the article and multiple times in my comments, this is about exposed hardware features. Btw, OpenGL has allowed multithreaded object creation for a long time, as well as limited multithreaded rendering. I agree that D3D’s multithreading support is far more robust and superior, but these things are totally out of the scope of this article.
2) Compute shader’s equivalent is OpenCL, not OpenGL. Microsoft decided to include the computing API inside Direct3D, a graphics API. That’s their design decision; it does not mean that everybody has to follow it. OpenCL and OpenGL can interoperate, and maybe OpenCL is less capable than D3D11 (I don’t know for sure; I’m not an expert on the topic, so I won’t offer an opinion about it). OpenCL interop allows sharing resources with OpenGL and also allows synchronization between OpenGL rendering commands and OpenCL compute kernels, so your argument about “same scheduler and resources” is pointless.
I agree that the API choice is not the most important thing in developing graphics software and I agree that driver quality matters, but again, this is not part of the scope of this article.
Before “protecting” one API against another one, please take your time to better understand and learn the other API, I did it with D3D…
about 6 months ago
1) No, it’s exactly the same. IncrementCounter() returns an integer. This is separate from Append(). You can use it for whatever you want.
2) Again, you’re incorrect; look it up. No offense, but have you ever actually used DirectX 11? If not, you should at least take my word for it.
And trust me, I’m extremely aware of what both the hardware and software support across OpenGL and DirectX is. I do this professionally.
About compute shader – read what I wrote again please. I did mention OpenCL and why it is not a comparable alternative option.
I’m honestly not trying to “protect” any API. I use them both and have no vested interest in either. I’m actually just trying to correct a few factual errors and give a slightly less biased comparison in your table.
about 6 months ago
Can you access the value of the atomic counter on the CPU side? Can you access the value of the atomic counter indirectly, through a buffer on the GPU side?
I’m not questioning your knowledge of DirectX, but I could also ask you whether you’ve used OpenGL 4.2 or whether you follow the latest specification changes.
About the OpenCL interop, I need you to clarify a bit more what you mean by a scheduling stall: CPU side or GPU side, why, and when. I reacted to what I understood; maybe I’m wrong, but from the information you wrote it is difficult to judge.
I don’t want to refer to any particular persons by name, but this article was read by certain representatives of GPU vendors and professional game developers. Even if not officially, they approved most of the data here (at least from the point of view of their expertise, both D3D and GL).
You know, maybe I’ll come back sometime when I have time with an atomic counter based algorithm that cannot be implemented in D3D, at least not with the same performance (because, of course, you can emulate atomic counters with buffer storage using read/write image atomics, it will just be slower).
Sorry if I was harsh, but if you check the many comments, some people with little to no background came here to stand up for what they think is correct. Sometimes I even had to defend D3D, because there were such strange opinions promoting D3D9 against D3D10+. So I read your comment with the same feelings, and answered accordingly. Sometimes it is really difficult to pick out the legitimate comments, but I still don’t accept the atomic counter thing…
about 6 months ago
Yes you can copy the counter value to an arbitrary location using ID3D11DeviceContext::CopyStructureCount, including copying it to a staging buffer for readback to the CPU. Just go look up the documentation related to that function, anything with “UAV_FLAG_COUNTER”, IncrementCounter(), etc. The public DirectX documentation is unfortunately pretty bad (one area where OpenGL is way better!) but you should be able to get the idea.
I do follow OpenGL, yes, although probably not quite as much as you (I skim the specs and some extensions, often before they are released). That said, I’m quite aware of what the hardware supports and how various features are implemented so I generally know if there’s something that is not exposed in one API or another.
What I mean with respect to OpenCL is that the OpenCL runtime has its own scheduler, command buffers and resource handles. It has its own DAG for working out when it’s safe to execute dispatched commands (by tracking input/output buffer dependencies). Because OpenGL has a separate DAG, they do not interact very cleanly. Since there is no global scheduler that decides whether it’s now best to run a compute task or a graphics task, the best you can get is to insert dummy nodes into the other graph and implement them as stalls in one or the other scheduler. This ensures correctness (i.e. a compute task that depends on a graphics task will not start too early), but not efficiency, since it is a very coarse-grained mechanism. Thus while you can do this a couple of times per frame, you can’t do fine-grained graphics/compute interaction, which is a shame since a large portion of future-looking renderers have significant portions that run more efficiently in the compute pipeline. Now don’t get me wrong… having an interop mechanism is better than nothing, but it’s not really good enough to be usable in a shipping game yet.
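A toy model may help to picture the difference described here. The sketch below is not how any OpenCL or OpenGL driver works. It only assumes that each command lists the buffers it reads and writes, that a shared scheduler adds a dependency edge only for a real hazard, and that an interop point forces a command to wait for everything submitted earlier on the other API. All command and buffer names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    name: str
    api: str  # "gl" or "cl"
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def conflicts(earlier, later):
    """True if `later` must wait for `earlier` (RAW, WAW or WAR hazard on a buffer)."""
    return bool(earlier.writes & (later.reads | later.writes)
                or earlier.reads & later.writes)


def shared_scheduler_edges(commands):
    """One dependency graph for everything: edges only where buffers really conflict."""
    return {(a.name, b.name)
            for i, b in enumerate(commands)
            for a in commands[:i]
            if conflicts(a, b)}


def interop_stall_edges(commands):
    """Two per-API graphs; a cross-API conflict stalls on all earlier work of the other API."""
    edges = set()
    for i, b in enumerate(commands):
        earlier = commands[:i]
        # Within one API the runtime tracks real buffer hazards.
        edges |= {(a.name, b.name) for a in earlier
                  if a.api == b.api and conflicts(a, b)}
        # Across APIs, any shared buffer turns into a wait on everything earlier.
        other = [a for a in earlier if a.api != b.api]
        if any(conflicts(a, b) for a in other):
            edges |= {(a.name, b.name) for a in other}
    return edges


# Invented frame: two GL passes, two CL kernels, one final GL pass.
FRAME = [
    Command("draw_shadow", "gl", writes=frozenset({"shadow_map"})),
    Command("draw_gbuffer", "gl", writes=frozenset({"gbuffer"})),
    Command("particle_sim", "cl", reads=frozenset({"particles"}),
            writes=frozenset({"particles"})),
    Command("tile_lighting", "cl", reads=frozenset({"gbuffer"}),
            writes=frozenset({"light_list"})),
    Command("draw_final", "gl",
            reads=frozenset({"gbuffer", "shadow_map", "light_list"}),
            writes=frozenset({"backbuffer"})),
]

if __name__ == "__main__":
    false_deps = interop_stall_edges(FRAME) - shared_scheduler_edges(FRAME)
    print(sorted(false_deps))
```

In this made-up frame the coarse model adds two waits that no buffer requires: the lighting kernel waits for the shadow pass, and the final draw waits for the particle simulation.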
Anyways I can respect that people have looked this over and I’m not blaming you for getting anything wrong here what-so-ever – it’s subtle stuff that most people wouldn’t catch or really know how it works internally. Hence why I’m just trying to correct the cases that I’m familiar with. Anyways it’s not really a big deal, but your post was going around on twitter and I figured since you obviously spent a lot of time putting the table together there were a couple cases that are worth correcting.
I’ll accept your point that multithreaded rendering is primarily an API/software/driver feature, although it does have some minor implications on hardware. Given that goal in this post, you’re right that it’s fine not to include it.
I totally understand your response in that posts like this typically attract trolls and inexperienced programmers. I can assure you I’m not one of those (my name is in the OpenCL spec, among others).
about 6 months ago
Okay, thanks for the clarification about the CopyStructureCount method, though, to be fair, if you say that the texture format casting vs pixel buffer object case should be red, then this one should still remain red, because it requires an explicit copy as well.
One more question, though: can you create and use atomic counters in D3D without creating a UAV buffer? I suppose not, but please correct me if I’m wrong.
About D3D documentation, well, yes, I also had a hard time figuring out whether certain features are supported in D3D or not, because the documentation is not really the best; it’s more like a reference manual than clear documentation of the API.
With respect to OpenCL interop, well, I think what you describe is more of an implementation question. I believe that OpenCL and OpenGL could, at least in theory, share a command queue at a low level, so the synchronization problems that you’ve mentioned could be prevented. I don’t know the OpenCL implementations that well, so maybe it is not done this way, but this issue is not limited to OpenCL interop; it also applies to multiple contexts in OpenGL or D3D. That’s why I think, again, at least in theory, there is a solution for this issue.
About multithreaded rendering, I think I understand what you mean by saying it has some minor implications on hardware. I suppose you mean here the possibility to dispatch multiple rendering commands in parallel, but I think this hardware feature is not limited to multithreaded rendering as even in case of a single rendering context it is beneficial to be able to dispatch multiple rendering commands in case there are such ones that are independent from each other.
By the way, are you Andrew Lauritzen of Intel? You said that your name is in the OpenCL spec, so I checked it and fortunately there was only one Andrew L.
Well, if you are from Intel then it is great that you have such in-depth knowledge about modern OpenGL and D3D, considering the poor representation of Intel in the GPU industry, not even talking about driver quality (don’t take it seriously, I’m just joking here).
about 6 months ago
Sure you technically have to “copy” the structure count if you want it somewhere random in a buffer, but copying 1 uint is slightly different than copying an entire resource.
Yes you need a separate UAV for each counter that you want, but note that UAVs are *views* in DirectX, not buffers. So you can have 1 buffer (even with no data) and attach as many counter UAV views to it as you want (up to whatever maximum which will be similar to GL). If you “copy” the counter value it can go to any buffer, not just the one that it is attached to.
And yeah, the public documentation is awful. The spec is slightly better but unfortunately it’s not public.
With regards to OpenCL/OpenGL, the issue is that someone really needs to provide a scheduling interface that would work across vendors and implementations for that to work nicely. Microsoft provides this in the driver interface to DirectX, but currently every vendor does their own, and separate ones for GL/CL. So certainly it’s technically possible; the issue, as always, with GL/CL is getting the vendors to cooperate. For that reason it is unlikely to happen unless there’s a large company with enough ability to force the hardware vendors into line. Such companies exist (not too hard to figure out who I’m hinting at), but so far they haven’t been too interested in driving graphics innovation.
I imagined you’d work out who I was from that comment, but I really have no reason to be anonymous.
And yeah, Intel has only recently gotten serious about graphics so while great strides have been made in Sandy Bridge, there’s still a long way to go, particularly on the OpenGL side. That said, the DirectX drivers are getting pretty decent, although there are still improvements to be made there too.
In any case I’m not directly involved with the driver effort. I’m more on the research side, looking at future rendering techniques and trying to get interesting and useful features into the hardware.
about 6 months ago
Yes, copying 1 uint is different than copying an entire resource, but even if the former is less work on the GPU side, both have similar CPU side performance penalties.
Thanks for the clarification, yes, I wasn’t even considering that multiple UAVs can share the same resource backend.
With regards to OpenCL interop, well, let’s say I have an AMD GPU; there the OpenCL and OpenGL drivers can share the same scheduler, why not? In case the OpenCL kernel is run on the CPU, things are of course different, as the CPU vendor may differ from the GPU vendor, but that’s not what we were talking about. GPU GL/CL drivers should work with the same scheduler, that’s for sure. Of course, I have doubts whether this is really done like this in practice, but it is definitely the way to go.
About your “background check”: I didn’t think that you wanted to stay anonymous. It’s just that, you know, I like to know who I am talking with; that’s why I was interested in the perspective from which you are “looking at” what’s going on with GL and D3D.
about 6 months ago
Sure, the CPU overhead is similar, but I don’t think that would be a big issue in this case, since the total number of counters that can be bound in a given pass is fairly small. You generally don’t need this function anyway, since you typically just store the count(s) directly in a buffer when you read/increment/decrement them on the GPU in a shader. You could argue that this copy function is just convenient syntactic sugar for launching a 1×1 compute kernel that reads the counter and writes it elsewhere.
Yes if the drivers are provided by the same vendor it’s theoretically possible for them to use a shared scheduler, but it does involve having an internal layer where things like resources and jobs are submitted from both frontend APIs, similar to the WDDM layer that Microsoft provides. I’m almost certain that no one does this right now though and I imagine it would take a strong push from game developers for anyone to bother.
about 6 months ago
Having a shared internal layer is beneficial anyway. Personally, I would go with the exact same internal layer for providing the functionality to D3D, GL and CL as well. Of course, when you have such long-running legacy code, such a design switch may not be so easy, though.
about 6 months ago
I see you put some updates in there – thanks! To be fair though, this is really not correct: “There is no support in Direct3D to use the dedicated atomic counter hardware (supported currently only by AMD GPUs) only by using an append/consume buffer”. The DX mechanism is exactly the same as the GL one and uses the same hardware. The only differences are in the API, and there are no differences in the power or flexibility of either solution. I don’t mind the footnote for the item below it about buffers, but this one really ought to just be green for both GL and DX.
about 6 months ago
Yes, I meant to change that to green but keep the footnote; I just forgot the former. However, I would still stick with yellow in the case of the atomic counter with buffer storage.
about 6 months ago
Yup that’s fair – thanks for making the updates!
about 6 months ago
It’s also worth mentioning that until OpenCL 1.2 (which has just been ratified and currently has no drivers), Compute Shader offers some extra image functionality. OpenCL, however, offers access to a few extra HW bits that the fixed Compute model doesn’t.
As we move to compute rendering this will become more important, so imho it’s worth comparing OGL 4.2 + OCL to DX11 + Compute, as in reality any modern compute renderer will interop a lot.
Off the top of my head: for Compute, support for MSAA surfaces, texture arrays and generally more flexible image support (1.2 adds some); for OpenCL, more hardware support, variable-sized local memory (for example supporting 48 KiB on NV), explicit support for out-of-order queues and better multi-device support.
about 5 months ago
OpenGL 4 is pretty comparable to DX11, but none of that matters. All that matters is who provides absolute bottom-of-the-barrel 3D rendering on the widest range of devices possible. I’m convinced the general population lacks the ability to distinguish between an image with basic 3D graphics and no shadows at all, and a high-end 3D render, because they continually throw their money away on low-quality trash.
about 5 months ago
Andrew, maybe you know why Intel has an asymmetrical approach to supporting DX and OGL in its chips?
Every GPU that supports DX11 also supports OGL 4.2 (or will when stable drivers come out), except Intel’s.
Is it a hardware problem (e.g. OGL needs more than DX?) or a GPU driver problem?
(and a special one:)
Can OGL 3.3+ be implemented on your Sandy Bridge/Ivy Bridge (e.g. as a by-product of the Mesa, Gallium and Linux efforts)?
It seems that OpenCL has some drawbacks when compared to DX Compute (like the inability to bind depth buffers to OpenCL without copying them). But there will be a time when comparing D3D and OGL will also require comparing DX Compute and OpenCL.
about 5 months ago
As far as I can see from the table, OpenGL 4.2 supports all the features (even if some are implemented with atomic counters), while DirectX 11.1 doesn’t. However, while DirectX drivers all fully support the standard, OpenGL drivers don’t; they’re slower (a little bit, max. -5 FPS with the Unigine benchmark) and sometimes they’re buggy.
Despite all this I still can’t stand DirectX. I tried both, but OpenGL just rules (at least for me…). Plus, as I see it, while DX is only on Windows and Xbox, OpenGL is on Windows, Linux, Mac, iOS and Android, there’s WebGL, and even the PS3 has an OpenGL-like API. So from a developer’s viewpoint, choosing DX is, I think, just shooting yourself in the foot financially…
Still, a lot of companies manage to live with it.