Over the last year, the performance of many VTK/ParaView filters has improved significantly. Because most of these filters are used very often, the changes further improve the overall responsiveness of VTK/ParaView. This work continues last year’s VTK/ParaView performance improvements [1], which build on recent vtkSMPTools improvements [2]. It is a community effort supported by Kitware’s collaborators, community members, and customers, both commercial and government. All of this work is part of the vision to enhance the overall performance of VTK/ParaView and increase the productivity of scientists who use Kitware’s open source platforms.
In this post, we give an overview of how each filter’s performance has been improved, together with the timings used to calculate the achieved speed-up. Several approaches contributed to the improvements shown below: 1) employing multithreading using vtkSMPTools, 2) caching wherever it is beneficial, 3) utilizing memory pools, and 4) improving certain cell/point locator operations. Note that performance may vary depending on the input dataset and the particulars of the computing platform, especially the number of computing cores.
In the following performance analysis, we use the terms speed-up and parallel efficiency to characterize performance improvements. The speed-up is the original execution time over the new/enhanced execution time; for the case of an algorithm that utilizes many cores, speed-up is the execution time using 1 core over the execution time utilizing n cores. Parallel efficiency is the speed-up over the number of cores.
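The two definitions above reduce to a couple of one-line formulas. The following is a minimal sketch; the function names are ours, and the sample timings are the 1-thread and 8-thread values reported for vtkExtractVectorComponents in this post.

```python
def speed_up(time_before: float, time_after: float) -> float:
    """Original (or 1-core) execution time over the new (or n-core) execution time."""
    if time_after <= 0:
        raise ValueError("time_after must be positive")
    return time_before / time_after


def parallel_efficiency(time_one_core: float, time_n_cores: float, n_cores: int) -> float:
    """Speed-up divided by the number of cores, returned as a fraction."""
    return speed_up(time_one_core, time_n_cores) / n_cores


# vtkExtractVectorComponents: 0.561 sec on 1 thread, 0.150 sec on 8 threads.
s = speed_up(0.561, 0.150)
e = parallel_efficiency(0.561, 0.150, 8)
print(f"speed-up x{s:.2f}, parallel efficiency {e:.0%}")
```

Some speed-up values in the tables appear to be truncated rather than rounded, so recomputing them this way can differ in the last digit.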
vtkMergeVectorComponents is a filter that is responsible for merging three different PointData/CellData arrays into one array with tuple size = 3. Its performance has been improved by employing multithreading using vtkSMPTools.
| Measurement | Result |
| --- | --- |
| 1 Thread | 0.510 sec |
| 8 Threads | 0.152 sec |
| Parallel Efficiency | 42% |
| Speed-up | x3.35 |
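Most of the filters in this post were multithreaded with the same range-based pattern: the index range is split into chunks, and a functor processes each chunk independently. The sketch below mimics that structure in plain Python for the merge operation. It is not the VTK implementation: `parallel_for` and `merge_vector_components` are hypothetical names, and because of CPython's global interpreter lock this version shows the structure only, not the speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_for(first, last, functor, n_threads=8):
    """Split [first, last) into contiguous chunks and run functor(begin, end) on each."""
    n = last - first
    if n <= 0:
        return
    chunk = max(1, -(-n // n_threads))  # ceiling division
    ranges = [(b, min(b + chunk, last)) for b in range(first, last, chunk)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # list() consumes the results and re-raises any worker exception.
        list(pool.map(lambda r: functor(*r), ranges))


def merge_vector_components(x, y, z, n_threads=8):
    """Interleave three scalar arrays into one flat array of 3-tuples."""
    if not (len(x) == len(y) == len(z)):
        raise ValueError("component arrays must have the same length")
    n = len(x)
    merged = [0.0] * (3 * n)

    def functor(begin, end):
        # Each chunk writes a disjoint slice of the output, so no locking is needed.
        for i in range(begin, end):
            merged[3 * i] = x[i]
            merged[3 * i + 1] = y[i]
            merged[3 * i + 2] = z[i]

    parallel_for(0, n, functor, n_threads)
    return merged


vectors = merge_vector_components([1.0, 2.0], [10.0, 20.0], [100.0, 200.0])
```

Because every output tuple depends only on the inputs at the same index, the chunks never touch each other's data, which is what makes this kind of filter straightforward to parallelize.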
vtkExtractVectorComponents is a filter that is responsible for creating three different PointData/CellData arrays out of one array with tuple size = 3. Its performance has been improved by employing multithreading using vtkSMPTools.
| Measurement | Result |
| --- | --- |
| 1 Thread | 0.561 sec |
| 8 Threads | 0.150 sec |
| Parallel Efficiency | 47% |
| Speed-up | x3.74 |
vtkArrayCalculator is a filter that is responsible for evaluating mathematical expressions on vector, scalar, and coordinate arrays. Its performance has been improved by 1) employing multithreading using vtkSMPTools, 2) using a new, robust function parser based on the ExprTk library, and 3) removing certain atomic operations that were creating unnecessary contention.
| Measurement | Before | After |
| --- | --- | --- |
| 1 Thread | 6.312 sec | 3.640 sec |
| 8 Threads | – | 0.542 sec |
| Parallel Efficiency | – | 84% |
| Speed-up | – | x6.72 |
| Before-After Speed-up | – | x11.65 |
vtkExtractHistogram is a filter that creates a histogram for a specific PointData/CellData array. Its performance has been improved by employing multithreading using vtkSMPTools.
| Measurement | Result |
| --- | --- |
| 1 Thread | 0.333 sec |
| 8 Threads | 0.046 sec |
| Parallel Efficiency | 90% |
| Speed-up | x7.23 |
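A histogram is harder to parallelize than an element-wise filter, because several threads may want to increment the same bin. One common way to avoid that contention is to give each chunk a private histogram and sum the partial results in a final reduce step. The sketch below illustrates this idea in plain Python; it is an assumption about the general technique, not a description of the actual vtkExtractHistogram code, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_histogram(values, begin, end, lo, hi, n_bins):
    """Histogram of values[begin:end], private to one chunk."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for i in range(begin, end):
        b = int((values[i] - lo) / width)
        # Clamp so that hi (and any out-of-range value) lands in an edge bin.
        counts[min(max(b, 0), n_bins - 1)] += 1
    return counts


def parallel_histogram(values, lo, hi, n_bins, n_threads=8):
    if hi <= lo or n_bins < 1:
        raise ValueError("need hi > lo and at least one bin")
    n = len(values)
    chunk = max(1, -(-n // n_threads))  # ceiling division
    ranges = [(b, min(b + chunk, n)) for b in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(
            lambda r: partial_histogram(values, r[0], r[1], lo, hi, n_bins), ranges))
    # Reduce step: sum the private histograms bin by bin.
    total = [0] * n_bins
    for counts in partials:
        for b, c in enumerate(counts):
            total[b] += c
    return total


hist = parallel_histogram([0.0, 0.1, 0.5, 0.9, 1.0], lo=0.0, hi=1.0, n_bins=2)
```

The reduce step costs one pass over `n_bins` per chunk, which is negligible when the array is much larger than the number of bins.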
vtkVortexCore is a filter that computes vortex core lines using the parallel vectors method. Its performance has been improved by employing multithreading using vtkSMPTools.
| Measurement | Fast Mode = Off | Fast Mode = On |
| --- | --- | --- |
| 1 Thread | 90.8 sec | 36.4 sec |
| 8 Threads | 12.33 sec | 5.23 sec |
| Parallel Efficiency | 92% | 87% |
| Speed-up | x7.36 | x6.95 |
vtkMeshQuality is a filter that can compute many different metrics related to the quality of the cells of a mesh. Its performance has been improved by employing multithreading using vtkSMPTools.
| Measurement | Result |
| --- | --- |
| 1 Thread | 1.276 sec |
| 8 Threads | 0.190 sec |
| Parallel Efficiency | 84% |
| Speed-up | x6.72 |
vtkPointDataToCellData is a filter that transforms PointData to CellData. Its performance has been improved by employing multithreading using vtkSMPTools for the case where the input PointData arrays are treated as categorical data.
| Measurement | Result |
| --- | --- |
| 1 Thread | 0.318 sec |
| 8 Threads | 0.048 sec |
| Parallel Efficiency | 83% |
| Speed-up | x6.63 |
vtkStreamTracer is a filter that integrates vector fields to generate streamlines. Its performance has been improved by 1) employing multithreading using vtkSMPTools and 2) minimizing the usage of the locators used for integration.
| Surface DataSet | Before with Point Locator | After with Point Locator | After with Cell Locator |
| --- | --- | --- | --- |
| 1 Thread | 71.8 sec | 35.259 sec | 11.241 sec |
| 8 Threads | – | 5.892 sec | 1.857 sec |
| Parallel Efficiency | – | 75% | 76% |
| Speed-up | – | x5.98 | x6.05 |
| Before-After Speed-up | – | x12.18 | x38.66 |

| Volume DataSet | Before with Point Locator | After with Point Locator | After with Cell Locator |
| --- | --- | --- | --- |
| 1 Thread | 8.936 sec | 5.843 sec | 7.189 sec |
| 8 Threads | – | 1.179 sec | 2.431 sec |
| Parallel Efficiency | – | 62% | 37% |
| Speed-up | – | x4.96 | x2.96 |
| Before-After Speed-up | – | x7.6 | x3.67 |
Note that for the volume dataset, the integration step has 90% parallel efficiency, but the BuildLocator functions of the cell and point locators have only 21.8% and 39.7% parallel efficiency, respectively, because they build a uniform structure whereas the input volume dataset is not uniform.
vtkParticleTracer is a filter that integrates a vector field to advect particles (changes in this filter also affect vtkParticlePathFilter/vtkStreaklineFilter). Its performance has been improved by 1) employing multithreading using vtkSMPTools, 2) caching the cell bounds of the input mesh, and 3) utilizing transformation techniques to avoid rebuilding the cell locator when the input dataset at each timestep is a linear transformation of the first timestep.
| Measurement | Before with Cell Locator | After with Point Locator and Linear Transform Optimization = Off | After with Cell Locator and Linear Transform Optimization = Off | After with Point Locator and Linear Transform Optimization = On | After with Cell Locator and Linear Transform Optimization = On |
| --- | --- | --- | --- | --- | --- |
| 1 Thread | 1362.5 sec | 1301.5 sec | 1752.5 sec | 200.66 sec | 109.25 sec |
| 8 Threads | – | 165.68 sec | 223.7 sec | 54.425 sec | 17.244 sec |
| Parallel Efficiency | – | 98% | 98% | 46% | 79% |
| Speed-up | – | x7.85 | x7.83 | x3.68 | x6.33 |
| Before-After Speed-up | – | x8.22 | x6.09 | x25.03 | x79.01 |
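The linear transform optimization can be pictured as follows: if the mesh at a later timestep is a known transform of the first timestep, a query point can be mapped back into the first timestep's frame and looked up in the locator that was built once. The sketch below shows this in 2D with triangles and an affine map (a 2x2 matrix plus an offset). It illustrates the general idea only, under our own assumptions; `FirstTimestepLocator` and the other names are hypothetical and do not correspond to VTK classes.

```python
def point_in_triangle(p, tri):
    """Barycentric containment test for a 2D point and a triangle (3 vertices)."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        return False  # degenerate triangle
    a = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    b = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return a >= 0 and b >= 0 and (1 - a - b) >= 0


class FirstTimestepLocator:
    """Stand-in for a cell locator, built once from the first timestep's cells."""

    def __init__(self, triangles):
        # A real locator would build a spatial index here; that is the costly step.
        self.triangles = list(triangles)

    def find_cell(self, p):
        for cell_id, tri in enumerate(self.triangles):
            if point_in_triangle(p, tri):
                return cell_id
        return -1


def invert_affine(matrix, offset):
    """Invert x -> M x + t for a 2x2 matrix M; returns (M_inv, t_inv)."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("transform is not invertible")
    inv = ((d / det, -b / det), (-c / det, a / det))
    t_inv = (-(inv[0][0] * offset[0] + inv[0][1] * offset[1]),
             -(inv[1][0] * offset[0] + inv[1][1] * offset[1]))
    return inv, t_inv


def find_cell_at_timestep(locator, p, matrix, offset):
    """Locate p in the mesh M * (first timestep) + t without rebuilding the locator."""
    inv, t_inv = invert_affine(matrix, offset)
    p0 = (inv[0][0] * p[0] + inv[0][1] * p[1] + t_inv[0],
          inv[1][0] * p[0] + inv[1][1] * p[1] + t_inv[1])
    return locator.find_cell(p0)


locator = FirstTimestepLocator([((0, 0), (1, 0), (0, 1)), ((1, 0), (1, 1), (0, 1))])
scale2_shift10 = (((2.0, 0.0), (0.0, 2.0)), (10.0, 0.0))
cell = find_cell_at_timestep(locator, (11.5, 1.5), *scale2_shift10)
```

This works for containment queries because an invertible affine map preserves whether a point lies inside a cell. It would not be valid for nearest-neighbor distances under non-rigid transforms.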
vtkGeometryFilter is a filter that extracts the boundary geometry (surface) of a dataset. Its performance has been improved by 1) replacing all the existing algorithms with the ones found in vtkDataSetSurfaceFilter and utilizing the existing multithreaded infrastructure, 2) improving the query speed of the existing hash tables by removing duplicate faces when found, 3) consuming 50% less memory by identifying whether int cell point IDs are sufficient in place of vtkIdType (long long), and 4) converting a vtkUnstructuredGrid to vtkPolyData almost instantaneously if it contains only vertices, lines, polys, or strips, since such a dataset is already a surface. vtkGeometryFilter delegates to vtkDataSetSurfaceFilter when the input is a vtkUnstructuredGrid that contains nonlinear cells. vtkDataSetSurfaceFilter will be removed when vtkGeometryFilter is extended to support nonlinear cells.
| UnstructuredGrid with Tetra | vtkDataSetSurfaceFilter | Before vtkGeometryFilter | After vtkGeometryFilter |
| --- | --- | --- | --- |
| 1 Thread | 4.749 sec | 64.820 sec | 2.650 sec |
| 8 Threads | – | 9.875 sec | 0.440 sec |
| Parallel Efficiency | – | 82% | 75% |
| Speed-up | – | x6.56 | x6.02 |
| Before-After Speed-up | – | – | x22.44 |
| vtkDataSetSurfaceFilter-After Speed-up | – | – | x10.79 |

| UnstructuredGrid with triangles | vtkDataSetSurfaceFilter | Before vtkGeometryFilter | After vtkGeometryFilter |
| --- | --- | --- | --- |
| 1 Thread | 1.599 sec | 1.801 sec | 0.008 sec |
| 8 Threads | – | 0.363 sec | 0.002 sec |
| Parallel Efficiency | – | 62% | 50% |
| Speed-up | – | x4.96 | x4 |
| Before-After Speed-up | – | – | x181.5 |
| vtkDataSetSurfaceFilter-After Speed-up | – | – | x799.5 |
| ImageData without hidden cells | vtkDataSetSurfaceFilter | Before vtkGeometryFilter | After vtkGeometryFilter |
| --- | --- | --- | --- |
| 1 Thread | 0.158 sec | 2.191 sec | 0.540 sec |
| 8 Threads | – | 1.061 sec | 0.201 sec |
| Parallel Efficiency | – | 25% | 33% |
| Speed-up | – | x2.07 | x2.69 |
| Before-After Speed-up | – | – | x5.28 |
| vtkDataSetSurfaceFilter-After Speed-up | – | – | x0.78 |
Note that vtkDataSetSurfaceFilter generates a surface that includes duplicate points on the side of each face of an image; therefore, it would be unfair to compare vtkDataSetSurfaceFilter with vtkGeometryFilter. Also, vtkGeometryFilter did not previously support ImageData with hidden cells.
| ImageData with hidden cells | vtkDataSetSurfaceFilter | Before vtkGeometryFilter | After vtkGeometryFilter |
| --- | --- | --- | --- |
| 1 Thread | 3.632 sec | – | 3.934 sec |
| 8 Threads | – | – | 0.720 sec |
| Parallel Efficiency | – | – | 68% |
| Speed-up | – | – | x5.46 |
| Before-After Speed-up | – | – | – |
| vtkDataSetSurfaceFilter-After Speed-up | – | – | x5.04 |
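The hash-table idea behind surface extraction can be sketched in a few lines: every cell face is inserted under an orientation-independent key, and a face that shows up a second time is shared by two cells, so it is interior and can be dropped immediately. The example below does this for tetrahedra in plain Python. It is a simplified illustration under our own assumptions (each face is shared by at most two cells, face orientation is ignored), not the vtkGeometryFilter implementation, and the function names are hypothetical.

```python
def tetra_faces(tet):
    """The four triangular faces of a tetrahedron given as 4 point ids."""
    a, b, c, d = tet
    return [(a, b, c), (a, b, d), (a, c, d), (b, c, d)]


def boundary_faces(tets):
    """Return the faces used by exactly one tetrahedron, i.e. the surface."""
    table = {}
    for cell_id, tet in enumerate(tets):
        for face in tetra_faces(tet):
            key = tuple(sorted(face))  # same key regardless of vertex order
            if key in table:
                # Second visit: the face is shared by two cells, so it is interior.
                # Deleting it right away keeps the table small for later lookups.
                del table[key]
            else:
                table[key] = (cell_id, face)
    return sorted(face for _, face in table.values())


# Two tetrahedra glued along face (1, 2, 3): 8 faces in total, 6 on the surface.
surface = boundary_faces([(0, 1, 2, 3), (1, 2, 3, 4)])
```

Removing matched faces as soon as they are found means the table only ever holds faces that are still candidates for the boundary.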
vtkResampleWithDataSet is a filter that takes two inputs, Input and Source, and samples the point and cell values of Source onto the point locations of Input (changes in this filter also affect vtkCompositeDataProbeFilter/vtkProbeFilter). Its performance has been improved, for the case where the input and source are not vtkImageData, by employing multithreading using vtkSMPTools and caching found cells at each query.
| Measurement | Before | After |
| --- | --- | --- |
| 1 Thread | 170.4 sec | 168.07 sec |
| 8 Threads | – | 42.741 sec |
| Parallel Efficiency | – | 49% |
| Speed-up | – | x3.93 |
| Before-After Speed-up | – | x3.98 |
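Caching found cells pays off because consecutive query points tend to be close together, so the cell that contained the previous point often contains the next one too. The sketch below shows the idea with 1D cells: the cached cell is tested first, and the full search runs only on a miss. This is a hypothetical illustration, not the vtkProbeFilter code; in a multithreaded setting we would assume one such cache per thread.

```python
class CachingProbe:
    """Find the 1D cell [edges[i], edges[i+1]) containing x, trying the last hit first."""

    def __init__(self, edges):
        self.edges = list(edges)
        self.last_cell = -1      # id of the most recently found cell
        self.full_searches = 0   # counts how often the expensive path runs

    def _contains(self, cell_id, x):
        return self.edges[cell_id] <= x < self.edges[cell_id + 1]

    def _full_search(self, x):
        # Stand-in for a locator query, which is the expensive operation.
        self.full_searches += 1
        for cell_id in range(len(self.edges) - 1):
            if self._contains(cell_id, x):
                return cell_id
        return -1

    def find_cell(self, x):
        if self.last_cell >= 0 and self._contains(self.last_cell, x):
            return self.last_cell  # cache hit: no search needed
        cell_id = self._full_search(x)
        if cell_id >= 0:
            self.last_cell = cell_id
        return cell_id


probe = CachingProbe([0.0, 1.0, 2.0, 3.0])
cells = [probe.find_cell(x) for x in (0.1, 0.2, 0.9, 1.5, 1.6)]
```

In this run, five queries trigger only two full searches, one for each distinct cell visited.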
vtkCutter is a filter that cuts/slices a dataset using any subclass of vtkImplicitFunction. vtkCutter now delegates to vtkPlaneCutter, which is specialized for planes. vtkPlaneCutter in turn delegates to 1) vtkFlyingEdgesPlaneCutter if the input is vtkImageData and triangles are requested, 2) vtkStructuredDataPlaneCutter if the input is vtkImageData and polygons are requested, or if the input is vtkStructuredGrid/vtkRectilinearGrid, 3) vtkPolyDataPlaneCutter if the input is vtkPolyData with convex polygon cells, and 4) vtk3DLinearGridPlaneCutter if the input is vtkUnstructuredGrid with 3D linear cells. All filters that vtkPlaneCutter delegates to employ multithreading using vtkSMPTools.
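The delegation rules above can be summarized as a small decision function. The sketch below only restates the four cases listed in the text; the function name and its boolean parameters are hypothetical, and it returns None for any combination the text does not cover rather than guessing at the fallback behavior.

```python
def select_plane_cutter(input_type, generate_triangles=True,
                        convex_polygons_only=False, linear_3d_cells_only=False):
    """Return the delegate class name described in the text, or None if not covered."""
    if input_type == "vtkImageData":
        # Triangles requested: flying edges. Polygons requested: structured cutter.
        if generate_triangles:
            return "vtkFlyingEdgesPlaneCutter"
        return "vtkStructuredDataPlaneCutter"
    if input_type in ("vtkStructuredGrid", "vtkRectilinearGrid"):
        return "vtkStructuredDataPlaneCutter"
    if input_type == "vtkPolyData" and convex_polygons_only:
        return "vtkPolyDataPlaneCutter"
    if input_type == "vtkUnstructuredGrid" and linear_3d_cells_only:
        return "vtk3DLinearGridPlaneCutter"
    return None  # case not described in this post


delegate = select_plane_cutter("vtkImageData", generate_triangles=False)
```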
All the following performance results use a plane as the cut function that cuts the dataset in half.
| UnstructuredGrid with Tetra | Before | After |
| --- | --- | --- |
| 1 Thread | 0.441 sec | 0.094 sec |
| 8 Threads | – | 0.023 sec |
| Parallel Efficiency | – | 51% |
| Speed-up | – | x4.09 |
| Before-After Speed-up | – | x19.17 |

| PolyData with triangles | Before | After |
| --- | --- | --- |
| 1 Thread | 1.511 sec | 0.031 sec |
| 8 Threads | – | 0.011 sec |
| Parallel Efficiency | – | 35% |
| Speed-up | – | x2.82 |
| Before-After Speed-up | – | x137.36 |

| ImageData | Before | After |
| --- | --- | --- |
| 1 Thread | 2.146 sec | 0.115 sec |
| 8 Threads | – | 0.037 sec |
| Parallel Efficiency | – | 39% |
| Speed-up | – | x3.11 |
| Before-After Speed-up | – | x58 |

| RectilinearGrid | Before | After |
| --- | --- | --- |
| 1 Thread | 3.716 sec | 5.128 sec |
| 8 Threads | – | 1.011 sec |
| Parallel Efficiency | – | 63% |
| Speed-up | – | x5.07 |
| Before-After Speed-up | – | x3.68 |
vtkTableBasedClipDataSet is a filter that clips a dataset using any subclass of vtkImplicitFunction. vtkTableBasedClipDataSet delegates to vtkClipDataSet only if 1) the input is vtkPolyData that includes cells other than vertex, line, triangle, or quad, or 2) the input is vtkUnstructuredGrid that includes cells other than vertex, line, triangle, quad, pixel, tetrahedron, pyramid, wedge, hexahedron, or voxel. Its performance has been improved by 1) employing multithreading using vtkSMPTools, 2) utilizing vtkStaticEdgeLocatorTemplate, which is a more efficient edge hash table than the preexisting one, and 3) consuming 50% less memory by identifying whether int cell point IDs are sufficient in place of vtkIdType (long long).
All the following performance results use a plane as the clip function that clips the dataset in half.
| UnstructuredGrid with Tetra | Before | After |
| --- | --- | --- |
| 1 Thread | 1.035 sec | 0.650 sec |
| 8 Threads | – | 0.109 sec |
| Parallel Efficiency | – | 74% |
| Speed-up | – | x5.96 |
| Before-After Speed-up | – | x9.49 |

| PolyData with triangles | Before | After |
| --- | --- | --- |
| 1 Thread | 0.680 sec | 0.532 sec |
| 8 Threads | – | 0.129 sec |
| Parallel Efficiency | – | 52% |
| Speed-up | – | x4.11 |
| Before-After Speed-up | – | x5.27 |

| ImageData | Before | After |
| --- | --- | --- |
| 1 Thread | 20.056 sec | 14.161 sec |
| 8 Threads | – | 2.49 sec |
| Parallel Efficiency | – | 71% |
| Speed-up | – | x5.69 |
| Before-After Speed-up | – | x8.05 |

| RectilinearGrid | Before | After |
| --- | --- | --- |
| 1 Thread | 19.683 sec | 14.293 sec |
| 8 Threads | – | 2.598 sec |
| Parallel Efficiency | – | 69% |
| Speed-up | – | x5.5 |
| Before-After Speed-up | – | x7.58 |
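The 50% memory saving mentioned for cell point IDs comes from storing 32-bit integers instead of 64-bit ones whenever every ID fits. The following sketch shows that decision with Python's `array` module; the helper name is hypothetical, IDs are assumed to be non-negative, and the typecode `"i"` is assumed to be 4 bytes wide, which holds on mainstream platforms but is not guaranteed by the language.

```python
from array import array

INT32_MAX = 2**31 - 1


def make_id_array(ids):
    """Store ids as 32-bit ints when every id fits, otherwise as 64-bit ints."""
    ids = list(ids)
    fits_in_32_bits = not ids or max(ids) <= INT32_MAX
    typecode = "i" if fits_in_32_bits else "q"  # "q" is a 64-bit long long
    return array(typecode, ids)


small = make_id_array(range(10))      # all ids fit in 32 bits
large = make_id_array([0, 2**31])     # one id needs 64 bits
print(small.itemsize, large.itemsize)
```

The check costs a single pass over the IDs, after which the connectivity array takes half the memory for any dataset with fewer than about 2.1 billion points.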
Future work:
As future work toward this vision, we plan to explore better cell/point locators for filters that depend heavily on them, such as vtkParticleTracer, vtkStreamTracer, and vtkResampleWithDataSet. Additionally, we plan to multithread vtkContourFilter, vtkPVGlyphFilter, and vtkPolyDataNormals. Finally, we plan to speed up the readers and writers by utilizing fast and robust open source libraries, such as fmt [3] and scnlib [4].
If you have particular needs to improve the performance of your VTK/ParaView-based applications, please contact us with ideas, suggestions, and feedback. At Kitware, we regularly engage customers to improve the performance of their software; contact us if you’d like us to assist you in this way.
4 comments to VTK/ParaView Filters: Performance Improvements
I’m excited to test several of the performance enhancements described here, particularly those to vtkGeometryFilter and vtkCutter. Which version of VTK are these improvements delivered in? What are the “Before” and “After” versions used in the performance tables?
You can use VTK master as “After” and VTK 9.0 as “Before”.
Can the author provide sample code or related test code?
Thanks a lot!
With regards to what filter?