Kitware, NVIDIA, and ORNL Collaborate on Enabling Titan to be used as the World’s Largest Visualization Cluster
With SC15 underway, Kitware is pleased to announce its collaboration with NVIDIA and Oak Ridge National Laboratory (ORNL) to upgrade ParaView on ORNL’s Titan supercomputer.
The technical innovation behind the machine's renaming to Titan in 2012 was the introduction of general purpose graphics processing unit (GPGPU) co-processing hardware in the form of NVIDIA Tesla K20X accelerators on each node. Surprisingly, the GPU capability, which accounts for roughly 90 percent of the machine's theoretical capacity, remained inaccessible to traditional graphics applications like ParaView because of peculiarities of the machine configuration. In fact, as recently as version 4.3, the ParaView installation there was a direct descendant of the version that ran on the machine in its initial XT3 "Jaguar" configuration. In that installation, rendering was done by a statically linked, cross-compiled build of Mesa's classic (non-threaded) back end.
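To make the "roughly 90 percent" figure concrete, the short calculation below compares per-node peak double-precision throughput. The inputs are assumptions taken from commonly published specifications, not from this announcement: about 1,310 GFLOPS for one Tesla K20X, one 16-core AMD Opteron 6274 at 2.2 GHz with 4 double-precision FLOPs per cycle per core, and 18,688 nodes. It is a back-of-the-envelope sketch, not an official accounting of Titan's capacity.

```python
# Assumed per-node peak figures (double precision, GFLOPS); not from the article.
K20X_PEAK_GFLOPS = 1310.0              # one NVIDIA Tesla K20X per node
CPU_PEAK_GFLOPS = 16 * 2.2 * 4         # assumed: 16 cores x 2.2 GHz x 4 DP FLOPs/cycle
NODES = 18_688                         # assumed node count


def gpu_share(gpu_gflops: float, cpu_gflops: float) -> float:
    """Fraction of a node's peak throughput contributed by the GPU."""
    return gpu_gflops / (gpu_gflops + cpu_gflops)


def system_peak_pflops(nodes: int, gpu_gflops: float, cpu_gflops: float) -> float:
    """Whole-machine peak in PFLOPS (1 PFLOPS = 1e6 GFLOPS)."""
    return nodes * (gpu_gflops + cpu_gflops) / 1e6


if __name__ == "__main__":
    share = gpu_share(K20X_PEAK_GFLOPS, CPU_PEAK_GFLOPS)
    peak = system_peak_pflops(NODES, K20X_PEAK_GFLOPS, CPU_PEAK_GFLOPS)
    print(f"GPU share of node peak: {share:.1%}")
    print(f"System peak: {peak:.1f} PFLOPS")
```

Under these assumptions the GPU supplies about 90.3 percent of each node's peak, and the whole machine comes to roughly 27.1 PFLOPS.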
The team is happy to announce that we have finally overcome all of the system-level stumbling blocks, so ParaView/Catalyst users now have access to the full graphics capacity of the machine. With updated drivers, a version of X11 that runs on the compute nodes, and restructured build and job launch scripts, ParaView visualization jobs and Catalyst in situ visualization jobs now run significantly faster than before.
To try it out from ParaView, choose File > Connect, click Fetch Servers, import TITAN@ORNL or windows to TITAN@ORNL, click Connect, and enter your credentials in the pop-up login terminal window. For complete instructions, including how to use the installation in batch mode and with Catalyst-enabled simulations, see:
https://www.olcf.ornl.gov/tutorials/running-paraview-on-titan
Furthermore, ParaView 5.0 brings a completely refactored and modernized OpenGL interface to ParaView. Together, the revamped installation and the modernized rendering code increase rendering performance for polygonal models by a factor of more than 1000, from barely 0.3 million triangles per second per processor (single-threaded Mesa classic) to approximately 450 million triangles per second per GPU (OpenGL2).
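The arithmetic behind the "factor of more than 1000" is shown below using the two throughput figures quoted above. The 10-million-triangle model is a hypothetical size chosen only for illustration, and the per-frame times ignore everything except raw triangle throughput, so treat them as a rough sketch rather than measured frame rates.

```python
# Throughput figures quoted in the text, in triangles per second.
MESA_CLASSIC_TPS = 300_000        # ~0.3 million/s per processor (single-threaded Mesa classic)
OPENGL2_GPU_TPS = 450_000_000     # ~450 million/s per GPU (OpenGL2 back end)


def speedup(old_tps: float, new_tps: float) -> float:
    """Ratio of new to old rendering throughput."""
    return new_tps / old_tps


def seconds_per_frame(triangles: int, tps: float) -> float:
    """Time to draw one frame of `triangles` at throughput `tps` (throughput only)."""
    return triangles / tps


if __name__ == "__main__":
    print(f"Speedup: {speedup(MESA_CLASSIC_TPS, OPENGL2_GPU_TPS):.0f}x")
    model = 10_000_000  # hypothetical model size, for illustration only
    print(f"Mesa classic: {seconds_per_frame(model, MESA_CLASSIC_TPS):.1f} s/frame")
    print(f"OpenGL2 GPU:  {seconds_per_frame(model, OPENGL2_GPU_TPS) * 1000:.1f} ms/frame")
```

The ratio works out to 1500x, which is why the text can safely say "more than 1000": a frame that would take about 33 seconds with the old software path takes about 22 milliseconds on one GPU.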