Data Operators in Tomviz
A core feature of Tomviz is the ability to modify 3D image data with one or more of the data operators available in the program. These operators can be used to perform 3D reconstruction of raw electron tomography data, to imbue images with metadata, or to perform general 3D image processing and analysis on reconstructed images. Tomviz has a couple of ways to implement operators, one of which involves the use of Python. In this blog post I summarize the different ways to implement operators, with specific emphasis on defining them in Python.
Types of Data Operators
The core of Tomviz is written in C++, and some data operators are expressed in that language. The bulk of the processing functionality is contained in a function that takes the current dataset, operates on it, and then returns a boolean value indicating the success or failure of the operator. Tomviz invokes this function while executing each data operator in a pipeline.
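The calling convention described above can be modeled in a few lines of Python. This is only an illustrative sketch, not Tomviz's C++ code: the names run_pipeline, double_values, and always_fails are hypothetical, a plain list stands in for the dataset, and stopping at the first failed operator is an assumption made for the example.

```python
from typing import Callable, List

# An operator modifies the dataset in place and reports success or failure.
Operator = Callable[[List[float]], bool]


def double_values(dataset: List[float]) -> bool:
    """Hypothetical operator: double every value in place."""
    for i, value in enumerate(dataset):
        dataset[i] = 2 * value
    return True


def always_fails(dataset: List[float]) -> bool:
    """Hypothetical operator that reports failure without touching the data."""
    return False


def run_pipeline(dataset: List[float], operators: List[Operator]) -> bool:
    """Apply operators in order; stop at the first failure (an assumption)."""
    for operator in operators:
        if not operator(dataset):
            return False
    return True


data = [1.0, -2.0, 3.0]
ok = run_pipeline(data, [double_values, double_values])
```

Each operator sees whatever the previous one left behind, which is the property that matters for the Python operators discussed later.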
Data operators may accept user input in the form of parameters that control how the operator executes. To obtain those parameters, some operators create a custom dialog box with user interface elements where parameters can be set. When the “OK” button is clicked in the dialog box, the input parameters are copied from the user interface to the data operator, and then the operator executes.
There is great flexibility in the types of data operators and in creating rich user interfaces for controlling them within Tomviz at the cost of some implementation complexity. For many operations, however, it is advantageous to have a fast path to implementing a new operator, especially when that path makes use of a language that is becoming increasingly popular among domain scientists.
Python Operators
A Python data operator executes a Python script when the data operator is applied. This design is very flexible and enables implementation of a wide range of reconstruction algorithms and data operators with minimal code. In fact, Tomviz 0.9.3 supplies a total of 51 Python data operators and reconstruction algorithms, compared to 5 operators defined purely in C++, making Python by far the predominant means by which functionality has been added to Tomviz.
Tomviz provides extensive support for manipulating data through Python. The ParaView platform, on which Tomviz is built, provides Python bindings for both ParaView itself and the Visualization Toolkit (VTK). NumPy and SciPy are also included with Tomviz so that data can be manipulated with functions in those popular Python packages. Additionally, Python bindings for the Insight Segmentation and Registration Toolkit (ITK) are included. Bridges between NumPy, VTK, and ITK make it possible to mix functionality from all of these Python packages to create powerful data operators.
Python Script Types
There are two types of Python scripts that can be used to define a data operator: simple scripts and cancelable scripts.
Simple Scripts
The general form of a simple Python script is:
    def transform_scalars(dataset):
        # apply operations to dataset
        # place results back in the dataset passed
        # into this function
        pass
The dataset parameter is a vtkImageData object containing the current version of the data. I say “current version” because data operators can be chained together into a pipeline in the user interface to create complex data processing operations. This means that the script in the current data operator may be getting a dataset that has been modified by one or more data operators operating before it. Script writers need not, and indeed should not, make assumptions about whether any prior data operators exist, or their order.
Let’s take a look at a very simple example of an operator that sets all the voxels in the dataset to zero if they are less than zero.
    def transform_scalars(dataset):
        """Set negative voxels to zero"""
        from tomviz import utils

        data = utils.get_array(dataset)

        data[data < 0] = 0  # Set negative voxels to zero

        # Set the result as the new scalars.
        utils.set_array(dataset, data)
This script imports the tomviz.utils module to access useful functions. The tomviz.utils.get_array() function takes a dataset and returns a NumPy array. The returned array can be manipulated like any other NumPy array – here, numbers below zero are set to zero. At the end of the script, the tomviz.utils.set_array(…) function replaces the data in the dataset with the contents of the NumPy array.
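Because the array is an ordinary NumPy array, the core of such a script can be tried out away from Tomviz. The following is a minimal sketch that assumes only NumPy is installed. The helper name clamp_negative is hypothetical, and a small hand-made array stands in for the data that Tomviz would supply.

```python
import numpy as np


def clamp_negative(data):
    """Hypothetical helper: set negative voxels to zero, in place."""
    data[data < 0] = 0
    return data


# A small 2x2x2 volume standing in for the array that Tomviz would provide.
volume = np.array([[[-1.0, 2.0], [3.0, -4.0]],
                   [[5.0, -6.0], [-7.0, 8.0]]])
clamped = clamp_negative(volume)
```

Inside Tomviz, the same helper would sit between the get_array and set_array calls.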
Cancelable Scripts
For potentially long-running operators that should report progress, be cancelable, or both, a slightly different Python script can be used. In it, you define a subclass of the tomviz.operators.CancelableOperator Python class. The CancelableOperator class provides two features: the ability to cancel an operation while it is still executing, and the ability to report operator progress. The only method that needs to be defined in this subclass is called transform_scalars, and it takes the standard self parameter in addition to the dataset parameter.
Let’s take a look at an example of a cancelable data operator:
    import tomviz.operators

    NUMBER_OF_CHUNKS = 10


    class InvertOperator(tomviz.operators.CancelableOperator):

        def transform_scalars(self, dataset):
            from tomviz import utils
            import numpy as np

            self.progress.maximum = NUMBER_OF_CHUNKS

            scalars = utils.get_scalars(dataset)
            if scalars is None:
                raise RuntimeError("No scalars found!")

            result = np.float32(scalars)
            # Named max_value to avoid shadowing the built-in max().
            max_value = np.amax(scalars)
            step = 0
            for chunk in np.array_split(result, NUMBER_OF_CHUNKS):
                # Stop early, leaving the dataset untouched, if canceled.
                if self.canceled:
                    return
                chunk[:] = max_value - chunk
                step += 1
                self.progress.value = step

            utils.set_scalars(dataset, result)
This operator inverts the voxel values in the dataset. Note that the class declaration of InvertOperator is a subclass of tomviz.operators.CancelableOperator.
The CancelableOperator class has some member variables that are used to communicate progress information back to the application. The CancelableOperator.progress.maximum member variable defines the number of units of work. In this example, the work is split up into 10 chunks. The CancelableOperator.progress.value member variable communicates which unit of work has been completed. Additional progress information can be communicated to the user by setting the CancelableOperator.progress.message member variable to a message string.
The inherited member variable CancelableOperator.canceled indicates whether Tomviz has requested cancellation of the data operator, usually at the direction of the user. The script can check this flag at places where stopping the operator makes sense. In this example, this flag is checked prior to computing each chunk.
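The interplay between the progress fields and the canceled flag can be exercised without Tomviz by using stand-in objects. The sketch below is an illustration only: FakeProgress, FakeOperator, and invert_in_chunks are hypothetical names, the stand-ins merely expose the attribute names described above, and they say nothing about how Tomviz implements them. Plain lists are used instead of NumPy arrays to keep the example dependency-free.

```python
class FakeProgress:
    """Stand-in exposing the three progress attributes described above."""

    def __init__(self):
        self.maximum = 0
        self.value = 0
        self.message = ""


class FakeOperator:
    """Stand-in for an operator object; not the Tomviz class."""

    def __init__(self, cancel_after=None):
        self.progress = FakeProgress()
        self._cancel_after = cancel_after

    @property
    def canceled(self):
        # Pretend the user canceled once enough steps have finished.
        return (self._cancel_after is not None
                and self.progress.value >= self._cancel_after)


def invert_in_chunks(operator, values, number_of_chunks=4):
    """Return the inverted values, or None if the operator was canceled."""
    operator.progress.maximum = number_of_chunks
    max_value = max(values)
    chunk_size = -(-len(values) // number_of_chunks)  # ceiling division
    result = []
    for step in range(number_of_chunks):
        if operator.canceled:
            return None
        chunk = values[step * chunk_size:(step + 1) * chunk_size]
        result.extend(max_value - v for v in chunk)
        operator.progress.value = step + 1
        operator.progress.message = (
            f"Finished chunk {step + 1} of {number_of_chunks}")
    return result


finished = FakeOperator()
inverted = invert_in_chunks(finished, [0, 1, 2, 3, 4, 5, 6, 7])

stopped = FakeOperator(cancel_after=2)
partial = invert_in_chunks(stopped, [0, 1, 2, 3, 4, 5, 6, 7])
```

In the canceled run, the function returns before producing a result, mirroring how the example operator returns before writing anything back to the dataset.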
Additional information for defining data operators is available in the Tomviz Python Operators documentation.
Defining Data Operator Parameters
As mentioned, data operators can have user-supplied parameters that affect their operation. For Python-based data operators, these parameters can be described succinctly in a JSON file. The user interface for setting these parameters is generated automatically from the JSON description, eliminating the need to design a custom user interface for each data operator. Additionally, the values from the automatically generated user interface are added as optional named Python function arguments that are passed to the transform_scalars function defined in the Python script. Writing the parameter description once in JSON gets you the user interface and argument passing to the Python script for free, saving you a lot of time when developing new data operators.
An example JSON description looks like the following:
    {
      "name" : "Rotate",
      "label" : "Rotate",
      "description" : "Rotate dataset along a given axis.",
      "parameters" : [
        {
          "name" : "rotation_angle",
          "label" : "Angle",
          "description" : "Rotation angle in degrees.",
          "type" : "double",
          "default" : 90.0,
          "minimum" : -360.0,
          "maximum" : 360.0
        },
        {
          "name" : "rotation_axis",
          "label" : "Axis",
          "description" : "Axis of rotation.",
          "type" : "enumeration",
          "default" : 0,
          "options" : [
            {"X" : 0},
            {"Y" : 1},
            {"Z" : 2}
          ]
        }
      ]
    }
From this description, Tomviz creates the following user interface.
[Figure: dialog generated from the Rotate description, with inputs labeled "Angle" and "Axis".]
Several top-level JSON keys provide basic information about the data operator, including its name, the label that should be used to represent it in the user interface, and a more detailed description of the data operator. The parameters key specifies the list of all the parameters defined for the data operator.
Each parameter is described as a JSON object with several key-value pairs. Some of these key-value pairs are required. The name key gives the name of the parameter, the label key gives the text used to represent the parameter in the user interface, the description is a detailed description of the parameter, the type specifies the type of parameter, and the default entry specifies the default value of the parameter.
Other keys are optional or depend on the type of the parameter. The minimum and maximum keys specify the range of values that a parameter may take. Neither, one, or both of minimum and maximum can be provided. If only one is given, then the range is bounded by that constraint alone.
Numeric type parameters can have one or more elements. For parameters with one element, the default, minimum, and maximum values should be defined by a single numeric JSON value, e.g., 1. Parameters with two or more elements are defined as a list of numeric JSON values, e.g., [0, 1].
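To make the range rules concrete, here is a small sketch of how a value could be checked against optional bounds, for both single-element and multi-element parameters. The function in_range is hypothetical and is not how Tomviz validates input. Treating the bounds as inclusive is an assumption.

```python
def in_range(value, minimum=None, maximum=None):
    """Check a scalar or list value against optional, inclusive bounds.

    For multi-element parameters, list bounds must match the value's length.
    """
    values = value if isinstance(value, list) else [value]
    lows = minimum if isinstance(minimum, list) else [minimum] * len(values)
    highs = maximum if isinstance(maximum, list) else [maximum] * len(values)
    for v, low, high in zip(values, lows, highs):
        if low is not None and v < low:
            return False
        if high is not None and v > high:
            return False
    return True


angle_ok = in_range(90.0, minimum=-360.0, maximum=360.0)
only_upper_bound = in_range(-1000.0, maximum=360.0)
pair_ok = in_range([0, 1], minimum=[0, 0], maximum=[10, 10])
```

With only a maximum given, arbitrarily small values pass, which matches the "bounded by only that constraint" rule.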
The enumeration type requires another key-value pair named options. The options entry is a list of JSON objects, each with a single key: the key is the name of the enumeration value, and its value is the integer associated with that name. That integer is the parameter value passed to the Python script. See the Tomviz documentation for more information about the JSON description format.
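The structure of the options list can be illustrated by reading it with Python's json module. This is a sketch for inspecting a description by hand. The helper enumeration_values is hypothetical and not part of Tomviz.

```python
import json

parameter = json.loads("""
{
  "name": "rotation_axis",
  "label": "Axis",
  "description": "Axis of rotation.",
  "type": "enumeration",
  "default": 0,
  "options": [{"X": 0}, {"Y": 1}, {"Z": 2}]
}
""")


def enumeration_values(parameter):
    """Map each option name to the integer passed to the Python script."""
    mapping = {}
    for option in parameter["options"]:
        if len(option) != 1:
            raise ValueError("each option must have exactly one key")
        name, value = next(iter(option.items()))
        mapping[name] = value
    return mapping


axis_values = enumeration_values(parameter)
```

Here axis_values maps "X", "Y", and "Z" to 0, 1, and 2, so selecting "Z" means the script receives 2.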
Handling Parameter Values in Python Scripts
The parameters defined in the JSON description are passed as named arguments to the transform_scalars function (or member function) in the Python script. These parameters should be defined as optional arguments with default values in the Python script. For example, the transform_scalars function for the Rotate data operator is defined as follows:
    def transform_scalars(dataset, rotation_angle=90.0, rotation_axis=0):
        # processing occurs here
        pass
Note the function arguments rotation_angle and rotation_axis match the names of the parameters defined in the JSON file.
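Since a mismatch between the JSON names and the function arguments is easy to introduce, a quick check can be written with the standard inspect module. This is a sketch, not a Tomviz facility: unmatched_parameters is a hypothetical helper, the JSON is a trimmed copy of the Rotate description, and the function body is a placeholder that simply echoes its arguments.

```python
import inspect
import json

# Trimmed copy of the Rotate description; only the fields used here.
description = json.loads("""
{
  "name": "Rotate",
  "parameters": [
    {"name": "rotation_angle", "type": "double", "default": 90.0},
    {"name": "rotation_axis", "type": "enumeration", "default": 0}
  ]
}
""")


def transform_scalars(dataset, rotation_angle=90.0, rotation_axis=0):
    """Placeholder body that just echoes the arguments it received."""
    return rotation_angle, rotation_axis


def unmatched_parameters(function, description):
    """Return JSON parameter names that the function does not accept."""
    accepted = inspect.signature(function).parameters
    return [p["name"] for p in description["parameters"]
            if p["name"] not in accepted]


missing = unmatched_parameters(transform_scalars, description)

# Passing values by name; arguments not supplied fall back to their defaults.
user_values = {"rotation_angle": 45.0}
received = transform_scalars(None, **user_values)
```

An empty missing list means every JSON parameter has a matching argument. Any argument not supplied by name keeps its default, which is why the defaults in the script should agree with those in the JSON file.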
Conclusion
This concludes our tour of data operators in Tomviz. The facilities for defining operators with Python make it possible to rapidly develop new operators with the features users expect or need, such as the ability to set custom parameters and cancel the operator. Future versions of Tomviz will load new data operators created by users rather than exposing only those packaged with it.