Developing HDF5 readers using vtkPythonAlgorithm

In my last article, I introduced the new vtkPythonAlgorithm and showed how it can be used to develop fully functional VTK algorithms in Python. In this one, we are going to put this knowledge to use and develop a set of HDF5 readers using the wonderful h5py package.

First, let’s use h5py to write a series of simple HDF5 files.

Here I used the vtkRTAnalyticSource which generates synthetic data for testing purposes. I varied its X Frequency parameter in order to create a file series. The output will be a set of files ranging from data60.h5 to data79.h5. Each file will contain one 3D dataset.

Here is my first pass at a vtkPythonAlgorithm-based reader.

Note that this is fairly basic. It performs no error checking and hard-codes the RTData dataset in RequestData. We can test the reader with a script like this:

Guess what. It doesn’t work 🙂 This is because the producer of any structured dataset (vtkImageData, vtkRectilinearGrid and vtkStructuredGrid) has to report the whole extent of the data in the index space. So we need to add the method to do that:

With this addition, we get the following result.

[Figure: animation of the result]


As I discussed previously, RequestInformation provides meta-data downstream. This meta-data is usually lightweight. In this example, we used f['RTData'].shape to read extent meta-data from the HDF5 file. This does not read any heavyweight data. Later on, we will see other examples of meta-data provided during RequestInformation.

Let’s make our reader a bit more sophisticated. Notice that the RequestData we implemented above always reads the whole dataset. However, VTK’s pipeline is designed such that algorithms can ask a data producer for a subset of its whole extent. This is done using the UPDATE_EXTENT key. Let’s change our RequestData to handle UPDATE_EXTENT:

Here the key is the following:
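The essential step is turning the inclusive VTK update extent into a slice of the dataset. A minimal sketch of that mapping, independent of VTK, is shown here. Both helper names are hypothetical, and the code assumes the dataset is stored in C order as (z, y, x), which is why the axes are reversed. VTK extents are inclusive at both ends, hence the + 1 on each upper bound.

```python
def extent_to_slices(extent):
    """Convert a VTK extent (x0, x1, y0, y1, z0, z1) to (z, y, x) slices."""
    x0, x1, y0, y1, z0, z1 = extent
    return (slice(z0, z1 + 1), slice(y0, y1 + 1), slice(x0, x1 + 1))


def extent_dimensions(extent):
    """Number of points along x, y and z covered by an inclusive extent."""
    x0, x1, y0, y1, z0, z1 = extent
    return (x1 - x0 + 1, y1 - y0 + 1, z1 - z0 + 1)


slices = extent_to_slices((5, 10, 5, 10, 0, 10))
# slices == (slice(0, 11), slice(5, 11), slice(5, 11))
dims = extent_dimensions((5, 10, 5, 10, 0, 10))
# dims == (6, 6, 11)
```

The resulting tuple can index an h5py dataset directly, for example f['RTData'][slices], so that only the requested block is read from disk.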

Thanks to h5py’s support for reading subsets (called hyperslabs in HDF5 speak), we had to do very little to support the UPDATE_EXTENT request. Let’s try it out.

This will print (5, 10, 5, 10, 0, 10) as expected. A few notes:

  • UpdateInformation() tells the algorithm (and the pipeline upstream of the algorithm) to produce meta-data. In our example, this will lead to a call to HDF5Source.RequestInformation().
  • It is essential to call UpdateInformation() before setting any requests. Otherwise, any user-set requests will be overwritten.
  • SetUpdateExtent() tells the algorithm to produce a given extent, in this case expressed in index space. There are other signatures of SetUpdateExtent() that we will discover later.
  • This works only if the requests are set on the algorithm that Update() will be called on. If you were to set requests on an algorithm further upstream in the pipeline, they would be overwritten by downstream filters.

As a final exercise in this article, let’s write a Python filter that asks the HDF5Source to produce only a sub-extent so that we can contour and render a subset (à la vtkExtractVOI). Here it is.

If you look at RequestData() alone, this is a pass-through filter: it shallow copies its input to its output. The trick is in RequestUpdateExtent(), where the filter asks its input for the user-defined extent. Combined with the reader’s ability to read requested subsets, this makes the filter act as a subset filter that produces the user-requested sub-extent. RequestInformation() is written to reflect this: it tells downstream that the filter will produce the extent requested by the user.

Let’s put these two algorithms in a pipeline:

This will produce the following.

[Figure: animation of the result]


We now have a fairly complex reader in our hands. With some digging through h5py’s documentation and a bit of numpy knowledge, you can easily put together readers that do a lot more. In upcoming posts, I will build on this foundation to highlight other features of VTK’s pipeline.

If you got a little lost in the details of the pipeline passes, don’t worry. In my next blog, I will discuss in a bit more detail how the various pipeline passes work and what they do.

5 Responses to Developing HDF5 readers using vtkPythonAlgorithm

  1. Andrew Maclean says:

    This is really impressive.

    In the last part, where you put the two algorithms in a pipeline could I suggest changing:

    for xfreq in range(60, 80):
        alg.SetFileName('data%d.h5' % xfreq)


    for xfreq in range(60, 80):
        alg.SetFileName('data%d.h5' % xfreq)

    import time

  2. Very nice post. Thanks!

    If I understand it correctly, you could have used a regular vtkExtractVOI? The RequestSubset was just to give another example of a Python algorithm?

  3. Another short question: In the commit

    from jan 14, you removed the vtkself parameter of VTKPythonAlgorithmBase.RequestData, and I noticed that it’s not there in the reader class for this blog post (which is much older, 2014).

    I saw that in my version of VTK (6.2), the vtkself parameter _is_ there (my IDE warned me that my signature didn’t match the base class). So I’m curious: Was it always a mistake (copy/paste error?) that it was there in the first place?
