# Efficient Gaussian blur with linear sampling

Gaussian blur is an image space effect that is used to create a softly blurred version of the original image. This image then can be used by more sophisticated algorithms to produce effects like bloom, depth-of-field, heat haze or fuzzy glass. In this article I will present how to take advantage of the various properties of the Gaussian filter to create an efficient implementation as well as a technique that can greatly improve the performance of a naive Gaussian blur filter implementation by taking advantage of bilinear texture filtering to reduce the number of necessary texture lookups. While the article focuses on the Gaussian blur filter, most of the principles presented are valid for most convolution filters used in real-time graphics.

Gaussian blur is a widely used technique in the domain of computer graphics and many rendering techniques rely on it in order to produce convincing photorealistic effects, no matter if we talk about an offline renderer or a game engine. Since the advent of configurable fragment processing through texture combiners and then using fragment shaders the use of Gaussian blur or some other blur filter is almost a must for every rendering engine. While the basic convolution filter algorithm is a rather expensive one, there are a lot of neat techniques that can drastically reduce the computational cost of it, making it available for real-time rendering even on pretty outdated hardware. This article will be most like a tutorial article that tries to present most of the available optimization techniques. Some of them may be familiar to all of you but maybe the linear sampling will bring you some surprise, but let’s not go that far but start with the basics.

## Terminology

In order to precede any possibility of confusion, I’ll start the article with the introduction of some terms and concepts that I will use in the post.

**Convolution filter** – An algorithm that combines the color value of a group of pixels.

**NxN-tap filter – **A filter that uses a square shaped footprint of pixels with the square’s side length being N pixels.

**N-tap filter** – A filter that uses an N-pixel footprint. Note that an N-tap filter does *not* necessarily mean that the filter has to sample N texels as we will see that an N-tap filter can be implemented using less than N texel fetches.

**Filter kernel** – A collection of relative coordinates and weights that are used to combine the pixel footprint of the filter.

**Discrete sampling** – Texture sampling method when we fetch the data of exactly one texel (aka GL_NEAREST filtering).

**Linear sampling** – Texture sampling method when we fetch a footprint of 2×2 texels and we apply a bilinear filter to aquire the final color information (aka GL_LINEAR filtering).

## Gaussian filter

The image space Gaussian filter is an NxN-tap convolution filter that weights the pixels inside of its footprint based on the Gaussian function:

The pixels of the filter footprint are weighted using the values got from the Gaussian function thus providing a blur effect. The spacial representation of the Gaussian filter, sometimes referred to as the “bell surface”, demonstrates how much the individual pixels of the footprint contribute to the final pixel color.

Based on this some of you may already say “aha, so we simply need to do NxN texture fetches and weight them together and voilà”. While this is true, it is not that efficient as it looks like. In case of a 1024×1024 image, using a fragment shader that implements a 33×33-tap Gaussian filter based on this approach would need an enormous number of 1024*1024*33*33 ≈ 1.14 billion texture fetches in order to apply the blur filter for the whole image.

In order to get to a more efficient algorithm we have to analyze a bit some of the nice properties of the Gaussian function:

- The 2-dimensional Gaussian function can be calculated by multiplying two 1-dimensional Gaussian function:

- A Gaussian function with a distribution of 2σ is equivalent with the product of two Gaussian functions with a distribution of σ.

Both of these properties of the Gaussian function give us room for heavy optimization.

Based on the first property, we can separate our 2-dimensional Gaussian function into two 1-dimensional one. In case of the fragment shader implementation this means that we can separate our Gaussian filter into a horizontal blur filter and the vertical blur filter, still getting the accurate results after the rendering. This results in two N-tap filters and an additional rendering pass needed for the second filter. Getting back to our example, applying the two filters to a 1024×1024 image using two 33-tap Gaussian filters will get us to 1024*1024*33*2 ≈ 69 million texture fetches. That is more than an order of magnitude less than the original approach made possible.

The second property can be used to bypass hardware limitations on platforms that support only a limited number of texture fetches in a single pass.

## Gaussian kernel weights

We’ve seen how to implement an efficient Gaussian blur filter for our application, at least in theory, but we haven’t talked about how we should calculate the weights for each pixel we combine using the filter in order to get the proper results. The most straightforward way to determine the kernel weights is by simply calculating the value of the Gaussian function for various distribution and coordinate values. While this is the most generic solution, there is a simpler way to get some weights by using the binomial coefficients. Why we can do that? Because the Gaussian function is actually the distribution function of the normal distribution and the normal distribution’s discrete equivalent is the binomial distribution which uses the binomial coefficients for weighting its samples.

For implementing our 9-tap horizontal and vertical Gaussian filter we will use the last row of the Pascal triangle illustrated above in order to calculate our weights. One may ask why we don’t use the row with index 8 as it has 9 coefficients. This is a justifiable question, but it is rather easy to answer it. This is because with a typical 32 bit color buffer the outermost coefficients don’t have any effect on the final image while the second outermost ones have little to no effect. We would like to minimize the number of texture fetches but provide the highest quality blur as possible with our 9-tap filter. Obviously, in case very high precision results are a must and a higher precision color buffer is available, preferably a floating point one, using the row with index 8 is better. But let’s stick to our original idea and use the last row…

By having the necessary coefficients, it is very easy to calculate the weights that will be used to linearly interpolate our pixels. We just have to divide the coefficient by the sum of the coefficients that is 4096 in this case. Of course, for correcting the elimination of the four outermost coefficients, we shall reduce the sum to 4070, otherwise if we apply the filter several times the image may get darker.

Now, as we have our weights it is very straightforward to implement our fragment shaders. Let’s see how the vertical file shader will look like in GLSL:

uniform sampler2D image; out vec4 FragmentColor; uniform float offset[5] = float[]( 0.0, 1.0, 2.0, 3.0, 4.0 ); uniform float weight[5] = float[]( 0.2270270270, 0.1945945946, 0.1216216216, 0.0540540541, 0.0162162162 ); void main(void) { FragmentColor = texture2D( image, vec2(gl_FragCoord)/1024.0 ) * weight[0]; for (int i=1; i<5; i++) { FragmentColor += texture2D( image, ( vec2(gl_FragCoord)+vec2(0.0, offset[i]) )/1024.0 ) * weight[i]; FragmentColor += texture2D( image, ( vec2(gl_FragCoord)-vec2(0.0, offset[i]) )/1024.0 ) * weight[i]; } }

Obviously the horizontal filter is no different just the offset value is applied to the X component rather than to the Y component of the fragment coordinate. Note that we hardcoded here the size of the image as we divide the resulting window space coordinate by 1024. In a real life scenario one may replace that with a uniform or simply use texture rectangles that don’t use normalized texture coordinates.

If you have to apply the filter several times in order to get a more strong blur effect, the only thing you have to do is ping-pong between two framebuffers and apply the shaders to the result of the previous step.

## Linear sampling

So far, we were able to see how to implement a separable Gaussian filter using two rendering pass in order to get a 9-tap Gaussian blur. We’ve also seen that we can run this filter three times over a 1024×1024 sized image in order to get a 33-tap Gaussian blur by using only 56 million texture fetches. While this is already quite efficient it does not really expose any possibilities of the GPUs as this form of the algorithm would work perfectly almost unmodified on a CPU as well.

Now, we will see that we can take advantage of the fixed function hardware available on the GPU that can even further reduce the number of required texture fetches. In order to get to this optimization let’s discuss one of the assumptions that we made from the beginning of the article:

So far, we assumed that in order to get information about a single pixel we have to make a texture fetch, that means for 9 pixels we need 9 texture fetches. While this is true in case of a CPU implementation, it is not necessarily true in case of a GPU implementation. This is because in the GPU case we have bilinear texture filtering at our disposal that comes with practically no cost. That means if we don’t fetch at texel center positions our texture then we can get information about multiple pixels. As we already use the separability property of the Gaussian function we actually working in 1D so for us bilinear filter will provide information about two pixels. The amount of how much each texel contribute to the final color value is based on the coordinate that we use.

By properly adjusting the texture coordinate offsets we can get the accurate information of two texels or pixels using a single texture fetch. That means for implementing a 9-tap horizontal/vertical Gaussian filter we need only 5 texture fetches. In general, for an N-tap filter we need [N/2] texture fetches.

What this will mean for our weight values previously used for the discrete sampled Gaussian filter? It means that each case we use a single texture fetch to get information about two texels we have to weight the color value retrieved by the sum of the weights corresponding to the two texels. Now that we know what are our weights, we just have to calculate the texture coordinate offsets properly.

For texture coordinates, we can simply use the middle coordinate between the two texel centers. While this is a good approximation, we won’t accept it as we can calculate much better coordinates that will result us exactly the same values as when we used discrete sampling.

In case of such a merge of two texels we have to adjust the coordinates that the distance of the determined coordinate from the texel #1 center should be equal to the weight of texel #2 divided by the sum of the two weights. In the same style, the distance of the determined coordinate from the texel #2 center should be equal to the weight of texel #1 divided by the sum of the two weights.

As a result, we get the following formulas to determine the weights and offsets for our linear sampled Gaussian blur filter:

By using this information we just have to replace our uniform constants and decrease the number of iterations in our vertical filter shader and we get the following:

uniform sampler2D image; out vec4 FragmentColor; uniform float offset[3] = float[]( 0.0, 1.3846153846, 3.2307692308 ); uniform float weight[3] = float[]( 0.2270270270, 0.3162162162, 0.0702702703 ); void main(void) { FragmentColor = texture2D( image, vec2(gl_FragCoord)/1024.0 ) * weight[0]; for (int i=1; i<3; i++) { FragmentColor += texture2D( image, ( vec2(gl_FragCoord)+vec2(0.0, offset[i]) )/1024.0 ) * weight[i]; FragmentColor += texture2D( image, ( vec2(gl_FragCoord)-vec2(0.0, offset[i]) )/1024.0 ) * weight[i]; } }

This simplification of the algorithm is mathematically correct and if we don’t consider possible rounding errors resulting from the hardware implementation of the bilinear filter we should get the exact same result with our linear sampling shader like in case of the discrete sampling one.

While the implementation of the linear sampling is pretty straightforward, it has a quite visible effect on the performance of the Gaussian blur filter. Taking into consideration that we managed to implement a 9-tap filter using just five texture fetches instead of nine, back to our example, blurring a 1024×1024 image with a 33-tap filter takes only 1024*1024*5*3*2 ≈ 31 million texture fetches instead of the 56 million required by discrete sampling. This is a quite reasonable difference and in order to better present how much that matters I’ve done some experiment to measure the difference between the two techniques. The result speaks for itself:

As we can see, the performance of the Gaussian filter implemented with linear sampling is about 60% faster than the one implemented with discrete sampling indifferent from the number of blur steps applied to the image. This roughly proportional to the number of texture fetches spared by using linear filtering.

## Conclusion

We’ve seen that implementing an efficient Gaussian blur filter is quite straightforward and the result is a very fast real-time algorithm, especially using the linear sampling, that can be used as the basis of more advanced rendering techniques.

Even though we concentrated on Gaussian blur in this article, many of the discussed principles apply to most convolution filter types. Also, most of the theory applies in case we need a blurred image of reduced size like it is usually needed by the bloom effect, even the linear sampling. The only thing that is really different in case of a reduced size blurred image is that our center pixel is also a “double-pixel”. This means that we have to use a row from our Pascal triangle that has even number of coefficients as we would like to linear sample the middle texels as well.

We’ve also had a brief insight into the computational complexity of the various techniques and how the filter can be efficiently implemented on the GPU.

The demo application used for the measurements performed to compare the discrete and linear sampling method can be downloaded here:

### Binary release

**Platform:** Windows

**Dependency:** OpenGL 3.3 capable graphics driver

**Download link: gaussian_win32.zip (2.96MB)**

**Language:** C++

**Platform:** cross-platform

**Dependency:** GLEW, SFML, GLM

**Download link:** gaussian_src.zip (5.37KB)

** **

P.S.: Sorry for the high minimum requirements of the application just I would really like to stick to strict OpenGL 3+ demos.

Print article | This entry was posted by Daniel Rákos on September 7, 2010 at 8:48 pm, and is filed under Graphics, Programming, Samples. Follow any responses to this post through RSS 2.0. You can leave a response or trackback from your own site. |

djmjabout 2 years ago

You are looking up 10 and not 9 pixels! Coordinate 0.0 intersects two pixels and so you take full advantage of linear interpolation! Else you need to offset by half a pixel!

Where is the benefit of the usual easier way? Lookup at coordinates [-4, -2, 0, 2, 4] to get 10 pixel range and then apply the weighting of each two texels and calculate final value?

about 2 years ago

Hi Daniel,

This article was incredibly clear and informative. Thank you for taking the time to put it together. I’m still learning this stuff, so this was very helpful.

I wrote up a small Python script based on this post to generate the gaussian texture lookups, optimized for linear sampling. Maybe it can be useful to someone else too:

https://gist.github.com/2332010

Thanks again

about 1 year ago

Hey Daniel,

nice article, the connection with Pascal’s triangle is indeed very interesting.

I would like to add two things. First, maybe it would be good to add that Gaussian has a unit integral, since it’s probably one of the reasons the filtering works (in the continuous case at least). The 3D plot should also reflect this (meaning that such a wide Gaussian certainly wouldn’t reach a value of 1.6). This is of course just minor.

The second thing that struck me was:

“A Gaussian function with a distribution of 2σ is equivalent with the product of two Gaussian functions with a distribution of σ.”

This is just my opinion, but I think this doesn’t hold. First, I think such product won’t be a 2σ-Gaussian (properly normalized and all). Second, a repeated application of any filter corresponds to applying their *convolution*, which differs from product. And, convolution of two Gaussians with StD=σ produces a Gaussian with StD=Sqrt(2*σ^2)=σ*Sqrt(2).

I think what you meant was that product of 2 polynoms with order n (n-th row of Pascal triangle) gives you a polynom of order 2n? That should be true.

What do you think?

Oskar

Christian Cann Schuldt Jensenabout 1 year ago

You can do a 9-tap gaussian using only 2 texture fetches by sampling at one of corner of the center pixel in the first pass (f.x with an xy-offset of -0.5,0.5 ) and then sample the opposite corner in the next pass ( xy-offset of 0.5,-0.5 )

This results in the standard

121

242

121

9-tap gaussian.

You can do the same gaussian in a single pass if you sample at all 4 corners (4 texture fetches)

You can do a nice and very fast 7-tap blur in a single pass if you sample two opposite corners with an offset of 1/3 away from the center.

This results in a

121

282

121

convolution kernel.

If you calculate the strength of the blur by subtracting from the center pixel you can then scale the result so that it closely matches the regular 9-tap gaussian.

I also do a 17-tap blur using only 5 fetches in a single pass by sampling the center pixel and then offsets (0.4,-1.2) , (-1.2,-0.4) , (1.2,0.4) and (-0.4,1.2).

I then adjust the strength of it to more closely match the 9-tap gaussian.

Billabout 1 year ago

Thank you. This article it’s awesome. It helped me a lot !

After implemented your blurring function, I have implemented what I think is a “bilateral filter” too. I don’t know if it is 100% correct but seams to do what I mean.

Pseudocode:

[code]

#define DENOISE_AMOUNT 1 // between 0-1

ref = texture( imageMap, IMAGE_TEX_COORDS );

sumColor = ref * weight[0] ;

#define CHECK(S, X) step( X , intensity_(S) * DENOISE_AMOUNT )

for (int i=1; i<BLUR_SAMPLE_COUNT; i++) {

cSamp1 = texture( imageMap, IMAGE_TEX_COORDS + OFFSET_BLUR );

cSamp2 = texture( imageMap, IMAGE_TEX_COORDS - OFFSET_BLUR );

float diff1 = intensity_(abs(cSamp1 - ref));

float diff2 = intensity_(abs(cSamp2 - ref));

sumColor += mix( ref, cSamp1, CHECK(cSamp1* weight[i], diff1 ) ) * weight[i] ;

sumColor += mix( ref , cSamp2, CHECK(cSamp2* weight[i], diff2 ) ) * weight[i] ;

}

return max(sumColor, 0);

[/code]

for RGB, 'intensity_()' it's the simple luminance of the color

I hope I can use this for things like blurring the SSAO map, but I have not tested it for that thing yet.

If someone have any advice.. I much appreciate!

@Christian Cann Schuldt Jensen

I doubt that you are right for the single-pass. The gaussian it's separable, but if in the texture there are not blurred value..?! but I'm beginning so… correct me if I'm wrong.

P.S.

sorry if I made any language's mistake, I'm italian