Faster data loading in VTK

September 13, 2023

Alexy Pellegrini, Mathieu Westphal, Julien Fausty, Julien Finet and Francois Mazen

Introduction

VTK 9.3 improves the data file loading process and its performance. Let’s take a look at how VTK loads data, what has changed since the last VTK release, and what the performance benefits are!

Performance analysis

Thanks to the new vtkResourceParser and a little refactoring, vtkOBJReader is now about seven times faster than before. It can also load OBJ data from any kind of stream (e.g. a string) instead of being limited to file names.
vtkPLYReader also benefits from vtkResourceParser: it is now about three times faster than before when reading text-encoded PLY files, and it also shows a 35% performance improvement when reading binary-encoded PLY files.

Performance on text-encoded files, with both vtkOBJReader and vtkPLYReader, increases significantly thanks to the new number parsing algorithm included in vtkResourceParser.
Parsing of binary-encoded files with vtkPLYReader is also faster thanks to the “ReadLine” function of vtkResourceParser, which is about four times faster than its standard counterpart, std::istream::getline.
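
To give an intuition for why line reading over an in-memory buffer can be cheap, here is a minimal C++ sketch. It is not the vtkResourceParser implementation: the `LineReader` class and its `ReadLine` method are hypothetical names used only for illustration. The idea shown is to scan for the newline character and hand back a non-owning `std::string_view`, so no per-line allocation or copy takes place.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper for illustration only; this is NOT vtkResourceParser.
// It walks a caller-owned buffer and yields each line as a non-owning view.
class LineReader
{
public:
  explicit LineReader(std::string_view buffer)
    : Remaining(buffer)
  {
  }

  // Returns true and stores the next line (without its terminator) in `line`.
  // Returns false once the whole buffer has been consumed.
  bool ReadLine(std::string_view& line)
  {
    if (this->Remaining.empty())
    {
      return false;
    }
    const std::size_t pos = this->Remaining.find('\n');
    if (pos == std::string_view::npos)
    {
      // Last line without a trailing newline.
      line = this->Remaining;
      this->Remaining = std::string_view{};
    }
    else
    {
      line = this->Remaining.substr(0, pos);
      this->Remaining.remove_prefix(pos + 1);
    }
    // Accept Windows line endings by dropping a trailing '\r'.
    if (!line.empty() && line.back() == '\r')
    {
      line.remove_suffix(1);
    }
    // Postcondition: a returned line never contains a newline.
    assert(line.find('\n') == std::string_view::npos);
    return true;
  }

private:
  std::string_view Remaining;
};
```

A caller would construct a `LineReader` over the loaded bytes and call `ReadLine` in a loop until it returns false. The views remain valid only as long as the underlying buffer does, which is the trade-off for avoiding copies.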

vtkGLTFReader performance is essentially unchanged, because JSON parsing is still handled by the same third-party JSON parser. The difference in the previous graph is not significant and is probably due to the few changes introduced by vtkResourceStream support.

All tested files are part of VTK testing data.

What changed

If you are curious about what led to those improvements and the motivations behind the changes, read the next sections; otherwise, jump to the current status part.

Current status of data loading in VTK

When it comes to loading data, VTK 9.2 and older had one standard way of doing it: the SetFileName() function. It was the only loading function defined in every reader.
Some readers implemented other ways to load data, generally taking a string as input, which induces a copy of the input data, and more rarely a raw buffer.
This had the following limitations:

- Inconsistent API: some readers supported more input methods than others, and the only common input method was the file name.
- Some readers only supported file names: if data came from another source, such as a web resource, it had to be written to a file and then read back from it.
  - This has several drawbacks. The first is performance, because of file system access. The second is extra code, which reduces maintainability and creates more opportunities for errors. The last is the additional questions it raises, such as where to put the file. Putting confidential data on disk may also require more work to ensure it cannot be accessed later.
- Some readers supported reading from a string, but this always performed an additional copy of the input data.

Present and future of data loading in VTK

Resource streams are intended to become the new standard for data loading in VTK 9.3; they provide a common, simple and extensible interface for data loading.

A resource stream is represented by the vtkResourceStream class, an abstract class that represents a raw data stream (i.e. sequence of bytes).

This abstract class has four virtual methods: Read, EndOfStream, Seek and Tell. Only the first two are required. When only these two methods are defined, the stream is “non-seekable”. These four methods are standard stream functions that can be found in most languages. vtkResourceStream is an input stream, but may become a bidirectional stream in the future.
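
The shape of such an interface can be sketched in plain C++. The following is a simplified, hypothetical model and not the real vtkResourceStream declaration: the names `ByteStream`, `MemoryByteStream` and `SupportSeek`, as well as the exact signatures, are assumptions made for illustration. It shows two mandatory methods (Read, EndOfStream), two optional ones (Seek, Tell), and a seekable stream over caller-owned memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Simplified, hypothetical model of a resource stream; NOT the real
// vtkResourceStream declaration. Only Read and EndOfStream are mandatory.
class ByteStream
{
public:
  virtual ~ByteStream() = default;

  // Copy up to `bytes` bytes into `buffer`; return how many were read.
  virtual std::size_t Read(void* buffer, std::size_t bytes) = 0;
  // True once no more data can be read.
  virtual bool EndOfStream() = 0;

  // Optional part: non-seekable streams simply keep these defaults.
  virtual bool SupportSeek() const { return false; }
  virtual bool Seek(std::size_t /*position*/) { return false; }
  virtual std::size_t Tell() { return 0; }
};

// Seekable stream over caller-owned memory; the source buffer itself is
// never duplicated, bytes are only copied into the reader's buffer on Read.
class MemoryByteStream : public ByteStream
{
public:
  MemoryByteStream(const void* data, std::size_t size)
    : Data(static_cast<const unsigned char*>(data))
    , Size(size)
  {
    assert(data != nullptr || size == 0);
  }

  std::size_t Read(void* buffer, std::size_t bytes) override
  {
    const std::size_t count = std::min(bytes, this->Size - this->Position);
    if (count > 0)
    {
      std::memcpy(buffer, this->Data + this->Position, count);
    }
    this->Position += count;
    return count;
  }

  bool EndOfStream() override { return this->Position >= this->Size; }
  bool SupportSeek() const override { return true; }

  // Move to an absolute position; positions past the end are rejected.
  bool Seek(std::size_t position) override
  {
    if (position > this->Size)
    {
      return false;
    }
    this->Position = position;
    return true;
  }

  std::size_t Tell() override { return this->Position; }

private:
  const unsigned char* Data;
  std::size_t Size;
  std::size_t Position = 0;
};
```

A stream for a resource that cannot rewind, such as a network socket, would override only Read and EndOfStream in this model and leave the seek-related defaults untouched.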

The current implementation supports file input through vtkFileResourceStream and memory input through vtkMemoryResourceStream. Additional streams supporting other types of resources, such as web resources or object storage, can easily be added in the future.

Current status

Reader	Status	VTK version
vtkPLYReader	✅ Done | Merge Request	9.3.0
vtkOBJReader	✅ Done | Merge Request	9.3.0
vtkGLTFReader	✅ Done | Merge Request	9.3.0
vtkXMLReader (vti, vtp, …)	📅 Planned	TBD

Possible contributions

A lot of work still needs to be done in order to support resource streams everywhere in VTK and more ways of loading data:

- Add resource stream support to all readers and importers
- Add new resource stream types to support other input methods, e.g. HTTP(s), zip archives…

Please reach out to Kitware if you need our support with this work.

Acknowledgement

This work has been implemented by Kitware Europe and was funded by the CALM-AA European project (cofunded by the European fund for regional development).

3 comments to Faster data loading in VTK

Sean Ahern says:

September 18, 2023 at 11:28 am

It’s great to hear about the rearchitecting to allow data to come from sources other than files. What was the fundamental change that enabled an almost order-of-magnitude speed increase for OBJ and PLY files?

1. Alexy Pellegrini says:
  
  September 19, 2023 at 3:34 am
  
  The main improvement is the replacement of the standard float parsing functions (strtod, istream, etc) with the fast float parsing implementation from Daniel Lemire (https://github.com/fastfloat/fast_float). This simple change makes text-based formats that mainly contain decimal values much faster to read. This is the reason why text-based formats benefit so much from vtkResourceParser.
  Other improvements have been made to different aspects of file parsing, such as a faster algorithm for line-by-line reading, a better code architecture (especially in vtkOBJReader) and a slightly faster text-to-int parsing algorithm.
  
Jaswant Panchumarti says:

September 18, 2023 at 11:58 am

This is very cool from a WebAssembly point of view. I’ve currently hit a limit attempting to load a .vtp file larger than 2GB in Chrome because the Chrome JavaScript engine cannot allocate more than 2 billion bytes for an ArrayBuffer. Both the input string and file modes involve a copy from the JavaScript heap onto the WebAssembly heap. This motivates me to investigate whether it’s possible to connect the Web Storage API to VTK with a custom vtkResourceStream.