Supporting tomorrow’s data storage systems in VTK / ParaView
Context
As part of the SAGE2 project, funded by the European Union, Kitware worked on porting VTK to a new generation of data storage systems.
SAGE2 is a consortium of European experts in data storage, data management and supercomputing technologies, led by Seagate.
The precursor SAGE project built a prototype storage system in 2017 that is now running and being extended at the Juelich Supercomputing Centre in Germany.
This prototype:
- Gives us a deeper understanding of the usage of Non-Volatile Memory (NVM)
- Is a storage system that can accommodate any storage device type (Disk, SSD, NVM)
- Runs on software that helps the storage system keep growing indefinitely
- Is a storage system that can also do computations
- Can work with low-power processing technology based on Arm
Most of the project was open-sourced as part of the Cortx community.
Among other components, a distributed object store called Motr was developed to replace the usual filesystem.
The main challenge for Kitware was to use one of Motr's higher-level APIs to read data from the object store and pass it to a visualization pipeline.
Developments
To interact with Motr, we chose the Ummap-io library, improved by Atos, as it offers a generic API that works with several back-ends (in our case, the filesystem and Motr).
A first implementation was done as a ParaView plugin. This lets us use a release version of ParaView instead of creating patches or branches that are harder to maintain and update.
The plugin adds a new source to ParaView, the MeroVTKReader (Mero is the old name of Motr), with two properties: the URI and the size of the object.
Internally, we forward this information to Ummap-io and get back a pointer to the corresponding memory.
Then we pass this pointer to a VTK ASCII reader, which creates a vtkDataObject to feed the pipeline.
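To make the pointer-based path concrete, here is a standard-library sketch of what parsing straight from memory involves. It reads the header of a legacy VTK ASCII file from a (pointer, size) pair, like the one the plugin gets back from the mapping. The header has four lines: a version line, a title, ASCII or BINARY, and a DATASET line. VTK's own legacy readers derive from vtkDataReader, which can read from an in-memory string. `LegacyVtkHeader`, `nextLine` and `parseLegacyHeader` are hypothetical names, not VTK code, and in the usage a string literal stands in for the mapped object.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical summary of the header of a legacy VTK file.
struct LegacyVtkHeader
{
  std::string version;  // e.g. "3.0"
  std::string title;    // free-form second line
  bool ascii = false;   // true for ASCII, false for BINARY
  std::string dataset;  // e.g. "POLYDATA"
};

// Return the line starting at `pos` (without '\n' or a trailing '\r')
// and move `pos` past it.
std::string_view nextLine(std::string_view buf, std::size_t& pos)
{
  if (pos >= buf.size())
    return {};
  std::size_t end = buf.find('\n', pos);
  if (end == std::string_view::npos)
    end = buf.size();
  std::string_view line = buf.substr(pos, end - pos);
  pos = end + 1;
  if (!line.empty() && line.back() == '\r')
    line.remove_suffix(1);
  return line;
}

// Parse the header straight from memory: no file path, no fstream,
// and the buffer itself is never copied.
std::optional<LegacyVtkHeader> parseLegacyHeader(const char* data, std::size_t size)
{
  assert(data != nullptr || size == 0);
  const std::string_view buf(data, size);
  std::size_t pos = 0;

  constexpr std::string_view magic = "# vtk DataFile Version ";
  std::string_view line = nextLine(buf, pos);
  if (line.substr(0, magic.size()) != magic)
    return std::nullopt;

  LegacyVtkHeader header;
  header.version = std::string(line.substr(magic.size()));
  header.title = std::string(nextLine(buf, pos));

  const std::string_view format = nextLine(buf, pos);
  if (format == "ASCII")
    header.ascii = true;
  else if (format != "BINARY")
    return std::nullopt;

  constexpr std::string_view keyword = "DATASET ";
  line = nextLine(buf, pos);
  if (line.substr(0, keyword.size()) != keyword)
    return std::nullopt;
  header.dataset = std::string(line.substr(keyword.size()));
  return header;
}
```

A pointer and a size are all this format needs, and that is why the plugin works well for ASCII readers.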
Developing a ParaView plugin has a nice side effect: it brings the SAGE2 technologies to the Web!
ParaView Visualizer supports the plugin natively, and ParaView Lite only requires some additional UI elements.
Good points: this solution is small, efficient and non-intrusive.
Not so good point: most file readers cannot read from a raw pointer. They require a file path, from which they create an fstream, so our approach cannot be used for those readers.
Next step: Boost streams
We then turned to the Boost.Iostreams library, which makes it easy to write custom STL-compatible streams.
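We wrote a UmmapioSource, a Boost.Iostreams source that creates a ummap-io mapping and can be used as a standard stream, for instance:
```cpp
#include <boost/iostreams/stream.hpp>
#include <iostream>
#include <string>
namespace io = boost::iostreams;
// Read the object at `uri` line by line through our UmmapioSource (a ummap-io mapping).
io::stream<UmmapioSource> fileStream(uri);
for (std::string line; std::getline(fileStream, line);)
{
  std::cout << line << '\n';
}
```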
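Boost.Iostreams only asks a Source for a `read(char* s, std::streamsize n)` member that returns the number of characters read, or -1 at the end of the stream. `io::stream<Source>` then supplies the std::istream interface. The sketch below mimics that contract with the standard library only, so it builds without Boost. `MappedSource` is a hypothetical stand-in for UmmapioSource that reads from a plain memory buffer instead of a ummap-io mapping. `SourceStreambuf` is a simplified analogue of what io::stream provides, not Boost's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>  // SourceStreambuf is meant to back a std::istream
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical stand-in for UmmapioSource: follows the Boost.Iostreams
// Source contract, but reads from a memory buffer instead of a ummap-io mapping.
class MappedSource
{
public:
  MappedSource(const char* data, std::size_t size)
    : data_(data), size_(size)
  {
    assert(data != nullptr || size == 0);
  }

  // Copy up to n characters into s; return how many were copied,
  // or -1 once the end of the object is reached.
  std::streamsize read(char* s, std::streamsize n)
  {
    if (pos_ >= size_)
      return -1;
    const std::size_t count = std::min(static_cast<std::size_t>(n), size_ - pos_);
    std::copy(data_ + pos_, data_ + pos_ + count, s);
    pos_ += count;
    return static_cast<std::streamsize>(count);
  }

private:
  const char* data_;
  std::size_t size_;
  std::size_t pos_ = 0;
};

// Simplified analogue of io::stream<Source>: a streambuf that refills its
// get area from any type exposing read(char*, std::streamsize).
template <typename Source>
class SourceStreambuf : public std::streambuf
{
public:
  explicit SourceStreambuf(Source source) : source_(std::move(source)) {}

protected:
  int_type underflow() override
  {
    if (gptr() < egptr())
      return traits_type::to_int_type(*gptr());
    const std::streamsize n = source_.read(buffer_, sizeof(buffer_));
    if (n <= 0)
      return traits_type::eof();
    setg(buffer_, buffer_, buffer_ + n);
    return traits_type::to_int_type(*gptr());
  }

private:
  Source source_;
  char buffer_[64];  // small refill buffer; the size is arbitrary
};
```

Usage: `SourceStreambuf<MappedSource> buf{MappedSource{data, size}}; std::istream in(&buf);` gives a stream that any std::getline loop can consume. Swapping the stand-in for a source backed by a real mapping would leave that reading code unchanged. With Boost, `io::stream<UmmapioSource>` plays both roles at once.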
The idea behind this is to hook vtksys::ifstream and replace it with a stream built on our UmmapioSource, so that most VTK readers can be ported to the Motr storage system without additional work.
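Hooking the place where the stream is opened lets readers keep their parsing code untouched. The hypothetical sketch below shows the idea with the standard library only. `openInput` returns a std::ifstream for plain paths and an in-memory stream for names with a `motr://` prefix, looked up in a map that stands in for the object store. The prefix, `fakeObjectStore`, `openInput` and `countLines` are illustrative assumptions, not vtksys, VTK or ummap-io APIs. Note that std::istringstream copies the object, whereas a ummap-io-backed stream would read the mapping in place.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <map>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical object store contents, standing in for Motr objects.
std::map<std::string, std::string>& fakeObjectStore()
{
  static std::map<std::string, std::string> store;
  return store;
}

// Hypothetical hook: the only place that knows where the bytes come from.
std::unique_ptr<std::istream> openInput(const std::string& uri)
{
  assert(!uri.empty());
  const std::string prefix = "motr://";  // illustrative, not a real URI format
  if (uri.compare(0, prefix.size(), prefix) == 0)
  {
    auto it = fakeObjectStore().find(uri.substr(prefix.size()));
    if (it == fakeObjectStore().end())
      return nullptr;
    return std::make_unique<std::istringstream>(it->second);
  }
  std::unique_ptr<std::istream> file = std::make_unique<std::ifstream>(uri);
  if (!*file)
    return nullptr;  // the file could not be opened
  return file;
}

// A "reader" written against std::istream: it does not care about the back-end.
int countLines(const std::string& uri)
{
  std::unique_ptr<std::istream> in = openInput(uri);
  if (!in)
    return -1;
  int count = 0;
  for (std::string line; std::getline(*in, line);)
    ++count;
  return count;
}
```

Only `openInput` knows about the storage system, so supporting another back-end means touching one function rather than every reader.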
Pros: with this approach, many readers and writers can benefit from new storage technologies. We did it with SAGE2 Motr, but we can imagine even more: reading from the network, from a database, etc.
Cons: it requires modifying the inner tools (the VTK library itself). Adding a new dependency (such as ummap-io) and modifying core code is harder and takes longer than writing a plugin.
While the plugin has been developed and tested, the Boost approach is still at the proof-of-concept stage.
References
- SAGE2 main website: https://sagestorage.eu/
- Cortx resources: https://github.com/Seagate/cortx
- Boost.Iostreams tutorial on writing a source: https://www.boost.org/doc/libs/1_77_0/libs/iostreams/doc/tutorial/container_source.html
- ParaView Lite: https://kitware.github.io/paraview-lite/
- ParaView Visualizer: https://kitware.github.io/visualizer/docs/
Acknowledgments
- SAGE2 consortium funded by the European Union
- Sebastien Vallat @ Atos for ummap-io support.