VTK Technical Highlight: Dual Depth Peeling

October 18, 2016

Visualization is the practice of discerning meaning from data. To aid this practice, the Visualization Toolkit (VTK) provides two sets of facilities. One set massages data into more useful formats (filters), and another set displays data (rendering techniques). The major focus of the VTK 7 series has been improved and expanded rendering. Let’s examine the improvements we’ve made so far.

Things started rolling in version 7.0 with the promotion of the previously experimental “OpenGL2” rendering backend to the default. The name is a misnomer, as it has nothing to do with OpenGL version 2. A more precise name would be “OpenGL>=3.2” because we replaced our fixed-function rendering calls with modern and much faster programmable-shader-style rendering calls. VTK is a big library that supports many features, so this was a more arduous task than it may appear. It took at least 773 commits (and counting) and touched hundreds of classes as well as tens of thousands of lines of code to get everything displayed as least as well as before. It was well worth doing, though. Our tests show that volume rendering is roughly two times faster and that surface rendering is roughly 100 times faster.

One more thing about OpenGL2: Besides being faster, OpenGL2 makes it easier to inject low-level shader code to build up entirely new rendering capabilities. We recently added direct value rendering in floating point buffers to support deferred rendering in Cinema, hidden line removal for wireframe renderings, and Fast Approximate Anti-Aliasing (FXAA) to make more appealing images and preserve the ability to perform depth compositing at scale (e.g., in IceT for ParaView).

We’ve also been busy with several other rendering improvements. These include off-screen rendering with the EGL native platform interface; compatibility with the OpenSWR software rasterizer, which is optimized for Intel’s central processing units (CPUs); integration with the OSPRay ray tracer, which is similarly optimized; and support for the new commodity-class virtual reality systems like HTC Vive and Oculus Rift.

VTK Technical Highlight 1
OSPRay renders a Vector Particle-In-Cell (VPIC) dataset with scaled implicit spheres, shadows, and ambient occlusion.

Now, let’s focus on a new dual depth peeling algorithm that we added to VTK. To begin, let’s look at depth peeling.

Depth Peeling

Translucency is tricky with a rasterizer. Without a guarantee of the order in which primitives are drawn, the naive approach of simply accumulating translucent triangles onto the screen will produce incorrect pictures, where distant objects appear to be in front of nearby ones. Sorting primitives by distance to the camera can help, but it is compute intensive, it must be done every frame, and it will not work in cases where groups of primitives self overlap.

With naive alpha blending, the blue dragon, which is nearest, appears to be behind the other dragons.
With naive alpha blending, the blue dragon, which is nearest, appears to be behind the other dragons.

Depth peeling is a technique for rendering translucent geometry correctly in a rasterizer like OpenGL. VTK’s first incarnation of depth peeling arrived in 2006. Back then, the required programmable shader features were somewhat rare, so the VTK code to enable and use them was necessarily convoluted. The code was widely used, however, and we strove to maintain and improve it over the years.

Depth Peeling Algorithm

VTK’s depth peeling algorithm renders the scene repeatedly. At each pass, the algorithm uses the depth buffer to only keep contributions from the next-nearest layer (or “peel”) of space. The algorithm then blends the contributions onto the already rendered results. The core of the algorithm is simplicity.

float fragDepth = gl_FragCoord.z
float odepth = texture2D(opaqueZTexture, gl_FragCoord.xy/screenSize).r;
float tdepth = texture2D(translucentZTexture, gl_FragCoord.xy/screenSize).r;
if (fragDepth >= odepth || // Occluded by opaque geometry
    fragDepth <= tdepth)   // Already handled in a previous peel
  {
  discard;
  }

Over the series of passes, the image improves, and fewer and fewer pixels change.

Peel 1 draws the nearest sides of each dragon (left) onto the screen (right).

VTK Technical Highlight 3
The fragments processed / screen size equals 135 percent.

Peel 2 draws whatever is behind the nearest sides, i.e., the farthest sides.

The fragments processes / screen size equals 96 percent.
The fragments processed / screen size equals 96 percent.

Peel 3 draws whatever is next, behind the farthest sides.

The fragments processes / screen size equals 56 percent.
The fragments processed / screen size equals 56 percent.

After some number of passes (17 in this case), the algorithm exceeds the depth complexity of the scene. No more pixels will change, and we can terminate the algorithm.

The final correct image shows the blue dragon in front and the red in back all on a gradient background.
The final correct image shows the blue dragon in front and the red dragon in back, all on a gradient background.

Dual Depth Peeling

Dual depth peeling is an improvement over the above. Essentially, at each pass, VTK’s dual depth peeling algorithm, vtkDualDepthPeelingPass, simultaneously peels layers of fragments from both the nearest and farthest sides of the scene. As such, the algorithm converges and terminates in approximately half the time. The algorithm was originally developed by NVIDIA in 2008.

Dual Depth Peeling Algorithm

The code in vtkDualDepthPeelingPass is somewhat complicated by the requirement to use a single blending specification while both peels’ color and depth information are computed and accumulated into separate render targets. The solution here is to use six input/output textures (BackPeel, BackAccum, FrontAccumIn, FrontAccumOut, MinMaxDepthIn, and MinMaxDepthOut) and MAX blending for all render targets. To implement this solution, we play some tricks. We negate values that we expect to decrease (such as the front peel’s alpha value), and we write to an empty texture and blend in a separate pass with the appropriate blend state values that we think may either increase or decrease (such as the back peel RGBA information).

The core code is as follows.

vec4 front = texelFetch(lastFrontPeel, pixelCoord, 0);
vec2 minMaxDepth = texelFetch(lastDepthPeel, pixelCoord, 0).xy;
float minDepth = -minMaxDepth.x; // negated for MAX blending
float maxDepth = minMaxDepth.y;
gl_FragDepth = gl_FragCoord.z;



// Default outputs (no data/change):
fragOutput0 = vec4(0.);     // BackPeel - only write rgba for current back peel
fragOutput1 = front;        // FrontA/B - write accumulated front peel rgba.
fragOutput2.xy = vec2(-1.); // DepthA/B - blends to form (-min, max) depth for

                           // next peel pair.



if (gl_FragDepth < minDepth || gl_FragDepth > maxDepth)
  { // Outside current peels
  return; // Already peeled
  }



if (gl_FragDepth > minDepth && gl_FragDepth < maxDepth)
  { // Inside current peels. Write out depth info and let MAX blending resolve

   // the next peel pair.
  fragOutput2.xy = vec2(-gl_FragDepth, gl_FragDepth);
  return;
  }




vec4 frag = ComputeFragmentRGBA(); // Current fragment color

// This fragment is on a current peel:
if (gl_FragDepth == minDepth)
    { // Front peel:
    front.a = 1. - front.a; // Front alpha is negated for MAX blending
    // Use under-blending to combine with front color:
    fragOutput1.rgb = front.a * frag.a * frag.rgb + front.rgb;
    fragOutput1.a = 1. - (front.a * (1. - frag.a)); // negated again
    }
  else // (gl_FragDepth == maxDepth)
    { // Back peel:
    frag.rgb *= frag.a; // Fragment with premultiplied alpha
    fragOutput0 = frag; // Blended in later fullscreen pass.
    }

In Peel 1, the left shows the back peel, the middle shows the min/max depth buffer, and the right shows the accumulated front peel.

VTK Technical Highlight 7
The dual depth peeling algorithm modifies 39 percent of screen pixels.

As in Peel 1, in Peel 2, the left shows the back peel, the middle shows the min/max depth buffer, and the right shows the accumulated front peel.

The shader modifies 21 percent of screen pixels.
The dual depth peeling algorithm modifies 21 percent of screen pixels.

Once again, in Peel 3, the left shows the back peel, the middle shows the min/max depth buffer, and the right shows the accumulated front peel.

VTK Technical Highlight 10
The dual depth peeling algorithm modifies 5.7 percent of screen pixels.

We end up with the same image as before. Since dual depth peeling extracts two peels with each geometry pass, however, it takes half the time. In this case, the algorithm converged to a correct image in only nine peeling passes (compared to 17 for traditional depth peeling), plus the overhead of a single pass to initialize depth buffers. In other words, the frame rate for correct rendering of translucent geometry nearly doubled.

In summary, we’ve made great strides in VTK’s rendering capabilities. The updated OpenGL rendering code is an improvement not only because it is faster, but also because the coding paradigm is more familiar to today’s OpenGL programmers and because the low-level OpenGL Shading Language code is easier to access. Overall, we can expect rendering improvements like the dual depth peeling implementation to come more frequently now, while VTK as a whole continues to move forward.

To learn more about dual depth peeling and the future of VTK, come see the Kitware booth (#3437) at The International Conference for High Performance Computing, Networking, Storage and Analysis (SC16). In addition to the booth, you can find us at other activities, such as a tutorial on “Large Scale Visualization with ParaView.” For the dates and times of these activities, please refer to the Kitware SC16 event listing.

7 comments to VTK Technical Highlight: Dual Depth Peeling

  1. Very cool! Have you thought about using single-pass OIT? Also, does vtkDepthSortPolyData sort by triangle or by mesh?

    1. I’ve seen some single-pass OIT implementations (e.g. weighted alpha blending, https://mynameismjp.wordpress.com/2014/02/03/weighted-blended-oit/), but these are too inaccurate for scientific computing. They hold promise for gaming and other qualitative applications where “looks okay” is good enough, but we strive to produce quantitatively correct results whenever possible.

      vtkDepthSortPolyData works at the cell / triangle level, rather than the dataset / mesh level.

        1. Cool, thanks for the link — I hadn’t seen that method before! I’ll give this a read later.

          I’m very much looking forward to OpenGL 4+, since features like the atomic counters open up a lot of possibilities for interesting rendering techniques. Unfortunately VTK / ParaView are, for the most part, restricted to OpenGL 3.2 to ensure compatibility with most users, so I don’t get to play with many of these features (yet!).

    1. That’s still the procedure. Just let the renderer know that you want depth peeling, and if dual depth peeling is supported it will be used, otherwise it falls back to the single peel implementation.

  2. When we started this rewrite OpenGL2 really did mean OpenGL 2.1 on the desktop, and OpenGL ES 2.0 on the embedded platforms, using only the common subset of API and removing calls to the fixed pipeline in favor of the programmable. That evolved over the course of the rewrite to mean OpenGL 3.2, just thought I would add a little of the history. What you state is of course correct now. The depth peeling and antialiasing look great, and perform really well!

Leave a Reply