Sunday, June 8, 2014

Temporal subpixel reconstruction antialiasing (Temporal SRAA)


Lately I've been experimenting with different kinds of antialiasing for the graphics engine I'm working on. Since I'm using HDR rendering and deferred shading, I've been looking at alternatives to multisampling, which I perceive as too complex to implement, too slow, too memory-hungry and too low-quality (see my previous post for potential solutions there, though).

Subpixel reconstruction antialiasing

A pretty unknown antialiasing technique is Subpixel Reconstruction Antialiasing (SRAA). The paper on SRAA can be found here: https://research.nvidia.com/sites/default/files/publications/I3D11.pdf. In short, SRAA renders the scene in two passes. In the first pass, the scene is rendered and shaded as normal. The second pass then renders the scene with multisampling but no shading to generate additional coverage data for each pixel.

In a way, SRAA works similarly to FXAA in that it blends each pixel with its neighbors, but it does not rely on edge detection to do this. Instead it relies on the additional coverage data from the second pass when deciding which shaded pixel to read color from. That gives SRAA much better temporal stability for moving edges, but it also suffers from a similar drawback: if a triangle is too small or thin to get shaded in the first pass, there is no color information for SRAA to work with, so even if the multisampled pass captures that triangle the algorithm breaks down. In this sense, SRAA carries the same limitation as FXAA; its advantage lies solely in better temporal stability under subpixel movement of edges.
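The resolve step can be sketched like this: for each of a pixel's geometry samples, search the nearby shaded pixels for one belonging to the same primitive and borrow its color. Below is a CPU-side C sketch under assumed data layouts of my own invention (one shaded color and one primitive ID per pixel from the first pass, one primitive ID per sample from the multisampled pass); the real implementation would of course be a shader.

```c
#define W 4
#define H 4
#define S 8   /* geometry samples per pixel */

typedef struct { float r, g, b; } Color;

/* Resolve one pixel: for each of its S geometry samples, look for a
   nearby shaded pixel whose primitive ID matches the sample's ID.
   If none is found we fall back to the pixel's own color -- this is
   exactly where thin triangles break plain SRAA down. */
Color sraa_resolve(Color color[H][W], int pixel_id[H][W],
                   int sample_id[H][W][S], int x, int y)
{
    Color sum = {0.0f, 0.0f, 0.0f};
    for (int s = 0; s < S; s++) {
        int id = sample_id[y][x][s];
        Color src = color[y][x];              /* fallback color */
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++) {
                int nx = x + dx, ny = y + dy;
                if (nx >= 0 && ny >= 0 && nx < W && ny < H &&
                    pixel_id[ny][nx] == id) {
                    src = color[ny][nx];      /* same primitive: borrow */
                    goto matched;
                }
            }
    matched:
        sum.r += src.r; sum.g += src.g; sum.b += src.b;
    }
    sum.r /= S; sum.g /= S; sum.b /= S;
    return sum;
}
```

A pixel whose samples are split between two primitives then ends up as the coverage-weighted mix of the two shaded colors.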

Here's an example of a triangle soup with no antialiasing and with 8x SRAA.

No antialiasing


At first glance, it looks like SRAA solves everything, but a closer look shows that SRAA wasn't able to reconstruct all parts of the screen. The SRAA version is to the right.

There's an improvement, but since the triangle is so thin that it barely covers any pixels in the shading pass, the resolving pass simply does not have enough color data to work with. In the following screenshot, any pixel that lacked a color sample for at least one of its eight samples is highlighted in white.

Temporal SRAA

This is where the temporal component comes into play. We can sample the previous frame as well as the current one, preferably using reprojection to compensate for the movement of our triangles, but that can cause ghosting. In normal temporal supersampling, we have no good way of determining if the sample we get from the previous frame will cause ghosting or not, but with SRAA we have primitive IDs that we can match together! That means that we can completely avoid geometry ghosting by only picking samples from the previous frame that actually belonged to the triangle we're processing, so if the triangle was occluded in the previous frame we simply ignore the samples from the previous frame.
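The sample-selection logic then becomes roughly the following (a hypothetical C helper; the names and data layout are mine): try the current frame's candidates first, and only when that fails consult the reprojected history pixel, accepting it only on an exact primitive ID match.

```c
typedef struct { float r, g, b; } Color3;

/* Pick a color for one geometry sample. Current-frame candidates are
   tried first; the reprojected history color is used only when its
   primitive ID matches the sample's ID exactly. A mismatch means the
   triangle was occluded (or absent) last frame, so the history sample
   is rejected -- this rejection is what eliminates geometry ghosting.
   Returns 1 if a color was found, 0 if the caller must fall back. */
int pick_sample_color(int sample_id,
                      const int *cur_ids, const Color3 *cur_colors, int n_cur,
                      int hist_id, Color3 hist_color,
                      Color3 *out)
{
    for (int i = 0; i < n_cur; i++) {
        if (cur_ids[i] == sample_id) { *out = cur_colors[i]; return 1; }
    }
    if (hist_id == sample_id) { *out = hist_color; return 1; }
    return 0;
}
```

Note that an ID mismatch simply drops the history sample rather than blending it in, which is why this avoids ghosting where ordinary temporal supersampling cannot.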

Here's an example of 8x TSRAA in action...

And a visualization of pixels that missed color data for at least one sample.

Much better! Finally, how does our thin green friend look?

Awesome! Despite the triangle being much thinner than a pixel, we had enough data to at least reconstruct a continuous rasterized triangle out of it which doesn't flicker during sub-pixel motion!


The edge accuracy of both SRAA and temporal SRAA solely depends on the amount of multisampling used, in my case 8x. While SRAA only has a subpixel accuracy of slightly under 1 pixel, I estimate the subpixel accuracy of TSRAA at between half a pixel and 1/3rd of a pixel, which is a massive improvement for the very small performance hit of adding the temporal component. The cost is also kept separate from the shading cost, since this works as a post-processing step. Another good improvement over plain MSAA is that the resolve step can be done after tonemapping.

Although geometry ghosting from the temporal component is completely avoided, ghosting may still appear if SRAA is applied after transparent particles or objects have been rendered on top of the triangles. An implementation problem is that the triangle IDs need to remain consistent between frames, so each instance in the scene needs to be assigned a permanent ID number (or have its ID derived from something constant, like a static position).

The fact that the scene has to be rendered twice can be a limiting factor when it comes to vertex processing, but it's possible to simply create an MSAA render target for your G-buffer, then only shade the first sample as usual and use the rest of the samples for SRAA, effectively wasting a bit of VRAM and bandwidth to avoid the second pass.

If anyone's interested in actually implementing this and not just the theory, I'd be glad to follow this up with more implementation details.

Monday, January 21, 2013

HDR inverse tone mapping MSAA resolve

The problem:
Usually when mixing HDR rendering with MSAA, the super-simplified pipeline looks something like this:

Render the scene with MSAA
Compute lighting with sub-sample accuracy where it's needed
Resolve MSAA texture to a normal HDR texture
Tone mapping

However, we actually want to do the tone mapping per sample, not on the already resolved value. If we don't, we get this:

This is a triangle rendered with 4xMSAA. The corners have the colors (0, 0, 0), (1, 1, 1) and (50, 50, 50), respectively. Notice how the anti-aliasing looks pretty good near the left edge of the triangle, while to the right the effect is completely lost. That's because we're averaging samples close to (50, 50, 50) with samples close to (0, 0, 0) (the black background), so we still get very high values even if just a single sample was (50, 50, 50). In a way, the bright samples completely take over the pixel. Tone mapping doesn't help either, since even values like (10, 10, 10) get mapped close to white. In short, we completely lose the anti-aliasing effect.
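The failure is easy to reproduce with plain numbers. A hypothetical edge pixel covered 25% by the bright (50, 50, 50) triangle and 75% by the black background should read as roughly 25% gray after tone mapping, but resolving first and tone mapping afterwards gives almost pure white. A single-channel C sketch, using the simple Reinhard curve x/(x+1) as a stand-in tone mapper:

```c
/* Reinhard tone mapping for a single channel. */
float tonemap(float c) { return c / (c + 1.0f); }

/* What the standard pipeline does: average the HDR samples first,
   then tone map the resolved value. */
float resolve_then_tonemap(const float *samples, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) sum += samples[i];
    return tonemap(sum / n);
}

/* What we actually want: tone map per sample, then average. */
float tonemap_then_resolve(const float *samples, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) sum += tonemap(samples[i]);
    return sum / n;
}
```

With samples {50, 0, 0, 0}, resolving first gives tonemap(12.5) ≈ 0.93, nearly white, while tone mapping per sample gives ≈ 0.25, the 25% coverage we expect.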

We need to do the tone-mapping per sample, but that's not practical in a real game. Running post processing effects like depth of field and motion blur on an MSAA render target would be incredibly expensive. We also can't move those two effects to after tone mapping since a big part of the effect is that they should be done in HDR.

Yesterday I had an idea that "solves" this. We need to do the resolve with tone mapped colors, but afterwards we also want the post processing to take advantage of HDR. Therefore I tone mapped each sample, averaged them together and then simply ran the new value through the inverse of the tone mapping function to get back an HDR value. When the tone mapping is later redone as usual after post-processing, the result will be perfectly anti-aliased tone-mapped values. Although the post processing may blur the values, I'm assuming that the blur will hide any aliasing introduced by my little trick, but I haven't tested that yet.
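The whole trick can be sketched in a few lines of C, using the simple Reinhard curve x/(x+1) (whose inverse is y/(1-y)) as the operator; names here are illustrative, not the actual shader:

```c
/* Reinhard tone mapping and its analytic inverse. */
float reinhard(float c)         { return c / (c + 1.0f); }
float inverse_reinhard(float y) { return y / (1.0f - y); }   /* valid for y < 1 */

/* Tone map each sample, average in LDR, then map the average back to
   HDR so later post processing still sees an HDR value. */
float resolve_inverse_tonemap(const float *samples, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) sum += reinhard(samples[i]);
    return inverse_reinhard(sum / n);
}
```

When the final tone mapping pass later runs the forward curve on this value again, it lands exactly on the average of the tone mapped samples, i.e. the properly anti-aliased result.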

For the simplest tone mapping function out there,
color = color / (color + 1);
called Reinhard tone mapping, calculating its inverse was very simple and the result was extremely satisfying:

Standard resolve:

Inverse tone mapping resolve:

The result is simply perfect. However, Reinhard's function is rarely used: while it does the job, it doesn't look very good and desaturates both colors and blacks. More advanced functions can cause a few problems, though. John Hable's blog post on the tone mapping function used by Uncharted 2 seemed like a respectable candidate: http://filmicgames.com/archives/75

The function looks like this:

float A = 0.15;
float B = 0.50;
float C = 0.10;
float D = 0.20;
float E = 0.02;
float F = 0.30;
float W = 11.2;

vec3 toneMap(vec3 x)
{
    return ((x*(A*x+C*B)+D*E)/(x*(A*x+B)+D*F))-E/F;
}

First I just ran the function through an equation solver to get the inverse function, which horribly enough looks like this:

vec3 inverseToneMap(vec3 x)
{
    return (sqrt((4*x-4*x*x)*A*D*F*F*F+(-4*x*A*D*E+B*B*C*C-2*x*B*B*C+x*x*B*B)*F*F+(2*x*B*B-2*B*B*C)*E*F+B*B*E*E)+(B*C-x*B)*F-B*E)/((2*x-2)*A*F+2*A*E);
}

Although most of this will be precomputed by the shader compiler, that's still quite a few operations. Luckily, the resolve shader is heavily bandwidth limited thanks to the HDR texture and multiple samples, so increasing the number of computations won't have a very big impact on performance.

The results were a bit disappointing.

Normal resolve:

Inverse tone mapping resolve:

Oops. There wasn't enough floating-point precision to rebuild the HDR values after tone mapping. As color intensity approaches infinity, the tone mapped color approaches 1.0. Since we're using floating point values, we have lots of precision close to 0.0 but relatively bad precision close to 1.0. If we flip the tone mapped color during the resolve, much more precision will be available where it matters.

vec3 toneMap(vec3 x)
{
    //return ((x*(A*x+C*B)+D*E)/(x*(A*x+B)+D*F)-E/F);
    return 1.0-((x*(A*x+C*B)+D*E)/(x*(A*x+B)+D*F)-E/F);
}

vec3 inverseToneMap(vec3 x)
{
    //return (sqrt((4*x-4*x*x)*A*D*F*F*F+(-4*x*A*D*E+B*B*C*C-2*x*B*B*C+x*x*B*B)*F*F+(2*x*B*B-2*B*B*C)*E*F+B*B*E*E)+(B*C-x*B)*F-B*E)/((2*x-2)*A*F+2*A*E);
    return (sqrt((4*x-4*x*x)*A*D*F*F*F+((4*x-4)*A*D*E+B*B*C*C+(2*x-2)*B*B*C+(x*x-2*x+1)*B*B)*F*F+((2-2*x)*B*B-2*B*B*C)*E*F+B*B*E*E)+((1-x)*B-B*C)*F+B*E)/(2*x*A*F-2*A*E);
}
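Why flipping the range helps is easy to demonstrate in 32-bit floats (shown here with the simpler Reinhard curve for brevity; the effect is the same for the Uncharted 2 operator). Floats have roughly constant relative precision, so a tone mapped value just below 1.0 keeps only a handful of bits for the tiny distance to 1.0, while storing that distance directly keeps full relative precision:

```c
/* Reinhard and its inverse, in both orientations. The "flipped"
   variants store 1 - tonemap(x), which is a small number with full
   relative float precision instead of a value crammed up against 1.0. */
float tonemap(float x)          { return x / (x + 1.0f); }
float inverse(float y)          { return y / (1.0f - y); }
float tonemap_flipped(float x)  { return 1.0f / (x + 1.0f); }
float inverse_flipped(float z)  { return (1.0f - z) / z; }
```

Round-tripping a bright value like x = 50000 through the direct form loses the information in the float's last bits near 1.0 and comes back off by dozens, while the flipped form comes back essentially exact.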

The result is perfect!

First of all, ignore the FPS values on the pictures! They're affected by all kinds of YouTube videos and other instances of the test program running simultaneously.

I tested the different resolve algorithms at 1920x1080 with 4xMSAA on a GTX 295 (SLI disabled) which performance-wise lies somewhere in between a GTX 260 and a GTX 275. These numbers include the cost of rendering the triangle, resolving the MSAA texture and tone mapping the final resolve.

Hardware blitting resolve: 906 FPS (1.104 ms)
Custom shader resolve (no tone mapping): 850 FPS (1.176 ms)
Custom shader resolve (inverse Reinhard): 770 FPS (1.299 ms)
Custom shader resolve (inverse Uncharted 2): 485 FPS (2.062 ms)

The numbers would look a lot better if I could get SLI working. Anyway, a GTX 680 has around 150% better performance compared to a GTX 295 with SLI enabled, so expect around 250% better performance with today's high-end hardware compared to this benchmark (around 0.6ms for the inverse Uncharted 2 resolve).

And finally, here's the source code for the GLSL resolve fragment shader. Note that this shader is fed with unnormalized texture coordinates.

And for reference, here's the source code for the tone mapping shader, which uses normalized texture coordinates.

Now get out there and fix your AA resolves! I'm looking at you, BF3!

PS: After Googling a bit, I found out that this trick has already been invented. Figures. -_-'