VTK/ParaView Filters: Performance Improvements

Over the last year, the performance of many VTK/ParaView filters has improved significantly. Because most of these filters are used very often, the changes have further enhanced the overall responsiveness of VTK/ParaView. This work is a continuation of last year’s VTK/ParaView performance improvements [1] that build on recent vtkSMPTools improvements [2]. It is a community effort supported by Kitware’s collaborators, community members, and customers, both commercial and government. All of this work is part of the vision to enhance the overall performance of VTK/ParaView and increase the productivity of scientists who use Kitware’s open source platforms.

In this post, we give an overview of how each filter’s performance has been improved, along with the timings used to calculate the achieved speed-up. Several approaches contributed to the performance improvements shown below: 1) employing multithreading using vtkSMPTools, 2) using caching whenever it’s deemed beneficial, 3) utilizing memory pools, and 4) improving certain cell/point locator operations. Note that performance may vary depending on the input dataset and the particulars of the computing platform, especially the number of computing cores.

In the following performance analysis, we use the terms speed-up and parallel efficiency to characterize performance improvements. The speed-up is the original execution time over the new/enhanced execution time; for the case of an algorithm that utilizes many cores, speed-up is the execution time using 1 core over the execution time utilizing n cores. Parallel efficiency is the speed-up over the number of cores.
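The two definitions above can be written down as a minimal sketch. The function names and the sample timings are hypothetical and chosen only for illustration; they are not measurements from this post.

```python
def speed_up(baseline_sec: float, improved_sec: float) -> float:
    """Baseline execution time divided by the new/enhanced execution time."""
    return baseline_sec / improved_sec


def parallel_efficiency(one_core_sec: float, n_core_sec: float, n_cores: int) -> float:
    """Speed-up from 1 core to n cores, divided by the number of cores."""
    return speed_up(one_core_sec, n_core_sec) / n_cores


# Hypothetical timings: 4.0 sec on 1 core and 1.0 sec on 8 cores.
s = speed_up(4.0, 1.0)                # speed-up of 4
e = parallel_efficiency(4.0, 1.0, 8)  # 0.5, i.e. 50% parallel efficiency
```

An efficiency of 100% would mean that the run time drops exactly in proportion to the number of cores.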

  1. vtkMergeVectorComponents is a filter that is responsible for merging three different PointData/CellData arrays into one array with tuple size = 3. Its performance has been improved by employing multithreading using vtkSMPTools.

| Metric | Value |
|---|---|
| 1 Thread | 0.510 sec |
| 8 Threads | 0.152 sec |
| Parallel Efficiency | 42% |
| Speed-up | x3.35 |

  1. vtkExtractVectorComponents is a filter that is responsible for creating three different PointData/CellData arrays out of one array with tuple size = 3. Its performance has been improved by employing multithreading using vtkSMPTools.

| Metric | Value |
|---|---|
| 1 Thread | 0.561 sec |
| 8 Threads | 0.150 sec |
| Parallel Efficiency | 47% |
| Speed-up | x3.74 |

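As a rough illustration of what "employing multithreading using vtkSMPTools" means for array filters like these two, the sketch below splits an index range into disjoint sub-ranges and lets each worker fill only its own output slots, so no locking is needed. This is a pure-Python stand-in, not VTK code: `merge_components` is a hypothetical helper, and Python threads will not reproduce the speed-ups reported here.

```python
from concurrent.futures import ThreadPoolExecutor


def merge_components(x, y, z, n_threads=8):
    """Merge three equally sized component arrays into one list of 3-tuples."""
    n = len(x)
    if len(y) != n or len(z) != n:
        raise ValueError("component arrays must have the same length")
    merged = [None] * n

    def work(begin, end):
        # Each call owns the half-open range [begin, end) exclusively.
        for i in range(begin, end):
            merged[i] = (x[i], y[i], z[i])

    chunk = max(1, -(-n // n_threads))  # ceiling division
    ranges = [(b, min(b + chunk, n)) for b in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # list() waits for all sub-ranges and re-raises worker exceptions.
        list(pool.map(lambda r: work(*r), ranges))
    return merged
```

Because every output index is written by exactly one worker, the result does not depend on how the range is partitioned.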
  1. vtkArrayCalculator is a filter that is responsible for calculating mathematical expressions on vector/scalar/coordinate arrays. Its performance has been improved by 1) employing multithreading using vtkSMPTools, 2) using a new, robust function parser based on the ExprTk library, and 3) removing certain atomic operations that were creating unnecessary contention.

| | Before | After |
|---|---|---|
| 1 Thread | 6.312 sec | 3.640 sec |
| 8 Threads | | 0.542 sec |
| Parallel Efficiency | | 84% |
| Speed-up | | x6.72 |
| Before-After Speed-up | | x11.65 |

  1. vtkExtractHistogram is a filter that creates a histogram for a specific PointData/CellData array. Its performance has been improved by employing multithreading using vtkSMPTools.

| Metric | Value |
|---|---|
| 1 Thread | 0.333 sec |
| 8 Threads | 0.046 sec |
| Parallel Efficiency | 90% |
| Speed-up | x7.23 |

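A histogram is a reduction, so a common way to multithread it is to let each thread fill a private set of bins and then sum the partial results. The sketch below shows that pattern in plain Python. Whether vtkExtractHistogram is implemented exactly this way is an assumption, and `parallel_histogram` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_histogram(values, lo, hi, n_bins, n_threads=8):
    """Count values in n_bins equal-width bins over [lo, hi]; requires hi > lo."""
    width = (hi - lo) / n_bins

    def local_histogram(chunk):
        bins = [0] * n_bins  # private to this task, so no synchronization
        for v in chunk:
            if lo <= v <= hi:
                # Values equal to hi fall into the last bin.
                bins[min(int((v - lo) / width), n_bins - 1)] += 1
        return bins

    size = max(1, -(-len(values) // n_threads))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(local_histogram, chunks))

    total = [0] * n_bins
    for partial in partials:  # serial reduction of the per-task bins
        for b, count in enumerate(partial):
            total[b] += count
    return total
```

The reduction step costs time proportional to the number of bins times the number of tasks, which is usually negligible next to the pass over the data.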
  1. vtkVortexCore is a filter that computes vortex core lines using the parallel vectors method. Its performance has been improved by employing multithreading using vtkSMPTools.

| | Fast Mode = Off | Fast Mode = On |
|---|---|---|
| 1 Thread | 90.8 sec | 36.4 sec |
| 8 Threads | 12.33 sec | 5.23 sec |
| Parallel Efficiency | 92% | 87% |
| Speed-up | x7.36 | x6.95 |

  1. vtkMeshQuality is a filter that can compute many different metrics related to the quality of the cells of a mesh. Its performance has been improved by employing multithreading using vtkSMPTools.

| Metric | Value |
|---|---|
| 1 Thread | 1.276 sec |
| 8 Threads | 0.190 sec |
| Parallel Efficiency | 84% |
| Speed-up | x6.72 |

  1. vtkPointDataToCellData is a filter that transforms PointData to CellData. Its performance has been improved by employing multithreading using vtkSMPTools for the case that input PointData arrays are to be treated as categorical data.

| Metric | Value |
|---|---|
| 1 Thread | 0.318 sec |
| 8 Threads | 0.048 sec |
| Parallel Efficiency | 83% |
| Speed-up | x6.63 |

  1. vtkStreamTracer is a filter that integrates vector fields to generate streamlines. Its performance has been improved by 1) employing multithreading using vtkSMPTools and 2) minimizing the usage of the locators used for integration.

| Surface DataSet | Before with Point Locator | After with Point Locator | After with Cell Locator |
|---|---|---|---|
| 1 Thread | 71.8 sec | 35.259 sec | 11.241 sec |
| 8 Threads | | 5.892 sec | 1.857 sec |
| Parallel Efficiency | | 75% | 76% |
| Speed-up | | x5.98 | x6.05 |
| Before-After Speed-up | | x12.18 | x38.66 |

| Volume DataSet | Before with Point Locator | After with Point Locator | After with Cell Locator |
|---|---|---|---|
| 1 Thread | 8.936 sec | 5.843 sec | 7.189 sec |
| 8 Threads | | 1.179 sec | 2.431 sec |
| Parallel Efficiency | | 62% | 37% |
| Speed-up | | x4.96 | x2.96 |
| Before-After Speed-up | | x7.6 | x3.67 |

Note that for the volume dataset, the integration step has 90% parallel efficiency, but the BuildLocator functions of the cell and point locators have only 21.8% and 39.7% parallel efficiency, respectively, because they create a uniform structure whereas the input volume dataset is not uniform.

  1. vtkParticleTracer is a filter that integrates a vector field to advect particles (changes in this filter also affect vtkParticlePathFilter/vtkStreaklineFilter). Its performance has been improved by 1) employing multithreading using vtkSMPTools, 2) caching the cell bounds of the input mesh, and 3) utilizing transformation techniques to avoid rebuilding the cell locator when the input dataset at each timestep is a linear transformation of the first timestep.

| | Before with Cell Locator | After with Point Locator and Linear Transform Optimization = Off | After with Cell Locator and Linear Transform Optimization = Off | After with Point Locator and Linear Transform Optimization = On | After with Cell Locator and Linear Transform Optimization = On |
|---|---|---|---|---|---|
| 1 Thread | 1362.5 sec | 1301.5 sec | 1752.5 sec | 200.66 sec | 109.25 sec |
| 8 Threads | | 165.68 sec | 223.7 sec | 54.425 sec | 17.244 sec |
| Parallel Efficiency | | 98% | 98% | 46% | 79% |
| Speed-up | | x7.85 | x7.83 | x3.68 | x6.33 |
| Before-After Speed-up | | x8.22 | x6.09 | x25.03 | x79.01 |

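One way to picture the linear transform optimization in vtkParticleTracer: if the mesh at a later timestep is the first-timestep mesh mapped through an invertible transform, a query point can be mapped back into the first timestep's coordinates and looked up in the locator that was built once. Cell containment is preserved under invertible affine maps, so the answer is the same. The 2D sketch below assumes this is the underlying idea; the brute-force `FirstStepLocator` and all function names are hypothetical and are not VTK's implementation.

```python
def invert_2x2(a):
    """Inverse of a 2x2 matrix given as ((a00, a01), (a10, a11))."""
    (a00, a01), (a10, a11) = a
    det = a00 * a11 - a01 * a10
    if det == 0:
        raise ValueError("transform is not invertible")
    return ((a11 / det, -a01 / det), (-a10 / det, a00 / det))


def contains(tri, p):
    """True if 2D point p lies inside or on the triangle tri."""
    (ax, ay), (bx, by), (cx, cy) = tri
    d1 = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
    d2 = (cx - bx) * (p[1] - by) - (cy - by) * (p[0] - bx)
    d3 = (ax - cx) * (p[1] - cy) - (ay - cy) * (p[0] - cx)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


class FirstStepLocator:
    """Stand-in for a cell locator built once on the first timestep."""

    def __init__(self, triangles):
        self.triangles = triangles  # a real locator would build a spatial index

    def find_cell(self, p):
        for i, tri in enumerate(self.triangles):
            if contains(tri, p):
                return i
        return -1


def find_cell_in_transformed(locator, a, t, q):
    """Find the cell containing q when the mesh is x' = a @ x + t."""
    inv = invert_2x2(a)
    sx, sy = q[0] - t[0], q[1] - t[1]
    q0 = (inv[0][0] * sx + inv[0][1] * sy, inv[1][0] * sx + inv[1][1] * sy)
    return locator.find_cell(q0)  # no locator rebuild for this timestep
```

The per-query cost is one small matrix-vector product, which is far cheaper than rebuilding a spatial index at every timestep.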
  1. vtkGeometryFilter is a filter that extracts the boundary geometry (surface) of a dataset. Its performance has been improved by 1) replacing all the existing algorithms with the ones found in vtkDataSetSurfaceFilter and utilizing the existing multithreaded infrastructure, 2) improving the query speed of the existing hash tables by removing duplicate faces when found, 3) consuming 50% less memory by identifying whether int cell point IDs are sufficient or vtkIdType (long long) IDs are needed, and 4) converting a vtkUnstructuredGrid to vtkPolyData almost instantaneously if it contains only vertices, lines, polys, or strips, since it is then already a surface. vtkGeometryFilter delegates to vtkDataSetSurfaceFilter when the input is a vtkUnstructuredGrid that contains nonlinear cells. vtkDataSetSurfaceFilter will be removed when vtkGeometryFilter is extended to support nonlinear cells.

| UnstructuredGrid with Tetra | vtkDataSetSurfaceFilter | Before vtkGeometryFilter | After vtkGeometryFilter |
|---|---|---|---|
| 1 Thread | 4.749 sec | 64.820 sec | 2.650 sec |
| 8 Threads | | 9.875 sec | 0.440 sec |
| Parallel Efficiency | | 82% | 75% |
| Speed-up | | x6.56 | x6.02 |
| Before-After Speed-up | | | x22.44 |
| vtkDataSetSurfaceFilter-After Speed-up | | | x10.79 |

| UnstructuredGrid with triangles | vtkDataSetSurfaceFilter | Before vtkGeometryFilter | After vtkGeometryFilter |
|---|---|---|---|
| 1 Thread | 1.599 sec | 1.801 sec | 0.008 sec |
| 8 Threads | | 0.363 sec | 0.002 sec |
| Parallel Efficiency | | 62% | 50% |
| Speed-up | | x4.96 | x4 |
| Before-After Speed-up | | | x181.5 |
| vtkDataSetSurfaceFilter-After Speed-up | | | x799.5 |

| ImageData without hidden cells | vtkDataSetSurfaceFilter | Before vtkGeometryFilter | After vtkGeometryFilter |
|---|---|---|---|
| 1 Thread | 0.158 sec | 2.191 sec | 0.540 sec |
| 8 Threads | | 1.061 sec | 0.201 sec |
| Parallel Efficiency | | 25% | 33% |
| Speed-up | | x2.07 | x2.69 |
| Before-After Speed-up | | | x5.28 |
| vtkDataSetSurfaceFilter-After Speed-up | | | x0.78 |

Note that vtkDataSetSurfaceFilter generates a surface that includes duplicate points on the side of each face of an image; therefore, it would be unfair to compare vtkDataSetSurfaceFilter with vtkGeometryFilter. Also, vtkGeometryFilter did not previously support ImageData with hidden cells.

| ImageData with hidden cells | vtkDataSetSurfaceFilter | Before vtkGeometryFilter | After vtkGeometryFilter |
|---|---|---|---|
| 1 Thread | 3.632 sec | | 3.934 sec |
| 8 Threads | | | 0.720 sec |
| Parallel Efficiency | | | 68% |
| Speed-up | | | x5.46 |
| Before-After Speed-up | | | |
| vtkDataSetSurfaceFilter-After Speed-up | | | x5.04 |

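To illustrate the role a face hash table plays in surface extraction: a face shared by two cells is interior, while a face that belongs to only one cell lies on the boundary. The sketch below applies that textbook rule to tetrahedra and drops a face from the table as soon as its duplicate is found, which keeps the table small. It is an illustration of the general technique, not the vtkGeometryFilter code; `boundary_faces` is a hypothetical name, and the sketch ignores face orientation.

```python
def boundary_faces(tetras):
    """Return the triangular faces that belong to exactly one tetrahedron.

    Each tetrahedron is a tuple of four point IDs. Assumes a valid mesh in
    which a face is shared by at most two cells.
    """
    faces = {}
    for a, b, c, d in tetras:
        for face in ((a, b, c), (a, b, d), (a, c, d), (b, c, d)):
            key = tuple(sorted(face))  # same key regardless of ID order
            if key in faces:
                del faces[key]  # second occurrence: interior face, drop it
            else:
                faces[key] = face
    return list(faces.values())


# Two tetrahedra glued along face (1, 2, 3): 8 faces in total, 6 on the boundary.
surface = boundary_faces([(0, 1, 2, 3), (1, 2, 3, 4)])
```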
  1. vtkResampleWithDataSet is a filter that takes two inputs, Input and Source, and samples the point and cell values of Source onto the point locations of Input (changes in this filter also affect vtkCompositeDataProbeFilter/vtkProbeFilter). Its performance has been improved for the case where the input and source are not vtkImageData by employing multithreading using vtkSMPTools and caching found cells at each query.

| | Before | After |
|---|---|---|
| 1 Thread | 170.4 sec | 168.07 sec |
| 8 Threads | | 42.741 sec |
| Parallel Efficiency | | 49% |
| Speed-up | | x3.93 |
| Before-After Speed-up | | x3.98 |

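Caching found cells pays off because consecutive probe points are often close to each other, so the cell that contained the previous point is a good first guess for the next one. The sketch below shows the idea with 1D intervals standing in for cells. `CachedCellFinder` is hypothetical, and the assumption that each thread would keep its own cache in a multithreaded setting is ours, not a statement about the VTK code.

```python
class CachedCellFinder:
    """Find the interval (cell) containing x, trying the last hit first."""

    def __init__(self, cells):
        self.cells = cells       # list of (lo, hi) intervals
        self.last = -1           # index of the most recently found cell
        self.full_searches = 0   # how often the slow path was taken

    def find(self, x):
        if self.last >= 0:
            lo, hi = self.cells[self.last]
            if lo <= x <= hi:
                return self.last  # cache hit: no search needed
        self.full_searches += 1
        for i, (lo, hi) in enumerate(self.cells):  # stand-in for a locator query
            if lo <= x <= hi:
                self.last = i
                return i
        return -1


finder = CachedCellFinder([(0, 1), (1, 2), (2, 3)])
hits = [finder.find(x) for x in (0.1, 0.2, 0.3, 1.5, 1.6)]
```

In this run, five queries need only two full searches because three of them land in the previously found cell.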
  1. vtkCutter is a filter that cuts/slices a dataset using any subclass of vtkImplicitFunction. vtkCutter now delegates to vtkPlaneCutter, which is specialized for planes. vtkPlaneCutter delegates to 1) vtkFlyingEdgesPlaneCutter if the input is vtkImageData and triangles are requested, 2) vtkStructuredDataPlaneCutter if the input is vtkImageData and polygons are requested, or if the input is vtkStructuredGrid/vtkRectilinearGrid, 3) vtkPolyDataPlaneCutter if the input is vtkPolyData with convex polygon cells, and 4) vtk3DLinearGridPlaneCutter if the input is vtkUnstructuredGrid with 3D linear cells. All filters that vtkPlaneCutter delegates to employ multithreading using vtkSMPTools.

    All the following performance results use a plane as the cut function that cuts the dataset in half.

| UnstructuredGrid with Tetra | Before | After |
|---|---|---|
| 1 Thread | 0.441 sec | 0.094 sec |
| 8 Threads | | 0.023 sec |
| Parallel Efficiency | | 51% |
| Speed-up | | x4.09 |
| Before-After Speed-up | | x19.17 |

| PolyData with triangles | Before | After |
|---|---|---|
| 1 Thread | 1.511 sec | 0.031 sec |
| 8 Threads | | 0.011 sec |
| Parallel Efficiency | | 35% |
| Speed-up | | x2.82 |
| Before-After Speed-up | | x137.36 |

| ImageData | Before | After |
|---|---|---|
| 1 Thread | 2.146 sec | 0.115 sec |
| 8 Threads | | 0.037 sec |
| Parallel Efficiency | | 39% |
| Speed-up | | x3.11 |
| Before-After Speed-up | | x85 |

| RectilinearGrid | Before | After |
|---|---|---|
| 1 Thread | 3.716 sec | 5.128 sec |
| 8 Threads | | 1.011 sec |
| Parallel Efficiency | | 63% |
| Speed-up | | x5.07 |
| Before-After Speed-up | | x3.68 |

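The delegation rules described for vtkPlaneCutter can be summarized as a small decision function. This is only a restatement of the rules in this post using class names as strings. The function `choose_plane_cutter` and its flags are hypothetical and do not exist in VTK, and returning `None` simply means that the text names no specialized delegate for that case.

```python
def choose_plane_cutter(input_type, generate_triangles=True,
                        convex_polygons_only=False, linear_3d_cells_only=False):
    """Return the name of the specialized plane cutter described for this input."""
    if input_type == "vtkImageData":
        # Triangles go to flying edges; polygons go to the structured cutter.
        if generate_triangles:
            return "vtkFlyingEdgesPlaneCutter"
        return "vtkStructuredDataPlaneCutter"
    if input_type in ("vtkStructuredGrid", "vtkRectilinearGrid"):
        return "vtkStructuredDataPlaneCutter"
    if input_type == "vtkPolyData" and convex_polygons_only:
        return "vtkPolyDataPlaneCutter"
    if input_type == "vtkUnstructuredGrid" and linear_3d_cells_only:
        return "vtk3DLinearGridPlaneCutter"
    return None  # no specialized delegate is named for this case
```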
  1. vtkTableBasedClipDataSet is a filter to clip a dataset using any subclass of vtkImplicitFunction. vtkTableBasedClipDataSet delegates to vtkClipDataSet only if 1) input is vtkPolyData that includes cells other than vertex, line, triangle, or quad, or 2) if input is vtkUnstructuredGrid that includes cells other than vertex, line, triangle, quad, pixel, tetrahedron, pyramid, wedge, hexahedron, or voxel. Its performance has been improved by 1) employing multithreading using vtkSMPTools, 2) utilizing vtkStaticEdgeLocatorTemplate, which is a more efficient edge hash table than the preexisting one, and 3) consuming 50% less memory by identifying if vtkIdType (long long) or int cell point IDs are sufficient.


All the following performance results use a plane as the clip function that clips the dataset in half.

| UnstructuredGrid with Tetra | Before | After |
|---|---|---|
| 1 Thread | 1.035 sec | 0.650 sec |
| 8 Threads | | 0.109 sec |
| Parallel Efficiency | | 74% |
| Speed-up | | x5.96 |
| Before-After Speed-up | | x9.49 |

| PolyData with triangles | Before | After |
|---|---|---|
| 1 Thread | 0.680 sec | 0.532 sec |
| 8 Threads | | 0.129 sec |
| Parallel Efficiency | | 52% |
| Speed-up | | x4.11 |
| Before-After Speed-up | | x5.27 |

| ImageData | Before | After |
|---|---|---|
| 1 Thread | 20.056 sec | 14.161 sec |
| 8 Threads | | 2.49 sec |
| Parallel Efficiency | | 71% |
| Speed-up | | x5.69 |
| Before-After Speed-up | | x8.05 |

| RectilinearGrid | Before | After |
|---|---|---|
| 1 Thread | 19.683 sec | 14.293 sec |
| 8 Threads | | 2.598 sec |
| Parallel Efficiency | | 69% |
| Speed-up | | x5.5 |
| Before-After Speed-up | | x7.58 |
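An edge table matters in clipping because neighboring cells share edges, and the intersection point on a shared edge should be created only once. The sketch below uses a plain dictionary keyed by the sorted pair of point IDs to show that role. It is not how vtkStaticEdgeLocatorTemplate is implemented, and `clip_edge_points` is a hypothetical helper.

```python
def clip_edge_points(cells, scalars, iso=0.0):
    """Create one intersection point per edge crossed by the iso value.

    cells: tuples of point IDs forming closed polygons.
    scalars: per-point values of the clip function.
    Returns (edge -> new point index, list of (v0, v1, t)) where the new
    point lies at parameter t along the edge from v0 to v1.
    """
    edge_to_point = {}
    new_points = []
    for cell in cells:
        n = len(cell)
        for k in range(n):
            a, b = cell[k], cell[(k + 1) % n]
            if (scalars[a] - iso < 0) == (scalars[b] - iso < 0):
                continue  # both end points on the same side: edge not cut
            key = (min(a, b), max(a, b))  # same key from either neighbor
            if key not in edge_to_point:
                v0, v1 = key
                t = (iso - scalars[v0]) / (scalars[v1] - scalars[v0])
                edge_to_point[key] = len(new_points)
                new_points.append((v0, v1, t))
    return edge_to_point, new_points


# Two triangles sharing edge (1, 2); the shared edge yields a single point.
edges, points = clip_edge_points([(0, 1, 2), (1, 3, 2)], [-1.0, 1.0, -1.0, 1.0])
```

Without the table, the shared edge would produce two coincident points that would have to be merged afterwards.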

Future work:

As future work toward this vision, we plan to explore the use of better cell/point locators for filters that heavily depend on them, such as vtkParticleTracer, vtkStreamTracer, and vtkResampleWithDataSet. Additionally, we plan to multithread vtkContourFilter, vtkPVGlyphFilter, and vtkPolyDataNormals. Finally, we plan to speed up the readers and writers by utilizing fast and robust open source libraries, such as fmt [3] and scnlib [4].

If you have particular needs to improve the performance of your VTK/ParaView-based applications, please contact us with ideas, suggestions, and feedback. At Kitware, we regularly engage customers to improve the performance of their software; contact us if you’d like us to assist you in this way.

References:

[1] Ongoing VTK / ParaView Performance Improvements https://www.kitware.com/ongoing-vtk-paraview-performance-improvements/

[2] VTK Shared Memory Parallelism Tools, 2021 updates https://www.kitware.com/vtk-shared-memory-parallelism-tools-2021-updates/

[3] fmt https://github.com/fmtlib/fmt

[4] scnlib https://github.com/eliaskosunen/scnlib

Documentation:

Parallel Processing with VTK’s SMP Framework Documentation
