VTK Wrapper Overhaul 2010

The VTK wrappers received their last major overhaul in 1998 and have done well to serve the VTK community. In recent years, however, the wrappers have started to show some cracks, particularly as new types such as vtkVariant and vtkUnicodeString have been introduced to VTK but have not been made available in the wrappers. One reason for the wrappers’ slow development compared to the rest of VTK is undoubtedly the “intimidation factor” of the wrapper-generator code, which is very complex and lacking in documentation.

The VTK wrappers were recently overhauled again. The four main goals of the overhaul project were as follows:

  • Cleaning up the wrapper-generator code by removing hard-coded hexadecimal constants, reducing the use of global variables, and improving code documentation;
  • Properly wrapping vtkStdString, a crucial interface type that was previously only partly wrapped;
  • Wrapping vtkVariant and other new VTK types in Python; and
  • Eliminating the need for BTX/ETX “unwrappable section” markers in the code.

The overarching goal is to provide a new foundation for the wrapper-generators that will make it easier to move the wrappers forward. These changes have been made while maintaining backwards compatibility with existing Tcl, Python, and Java programs that use VTK. The old wrapper-generator code has not been replaced; it has only been cleaned up and enhanced.

WRAPPER PRIMER
To provide some background, the wrapper-generator code consists of a “front-end” parser that reads the VTK header files and stores the class declarations in simple data structures, and three “back-ends” that read those data structures and generate the wrapper code for each of the wrapper languages. Most of this project has focused on the front end, but enhancements have been added to the back end as well, particularly the Python back end.

The parser front-end can be further subdivided into a “lex” tokenizer (which also does rudimentary preprocessing) and a “yacc” parser that understands the C++ grammar. These two pieces are the foundation of the wrappers, or less generously, they are the bottleneck. The wrappers are only able to wrap the class and method definitions the parser can pull from the header files. Because of the parser’s importance, it has received more attention during this update than any other part of the wrappers.

An important feature of the VTK wrappers is that they wrap the VTK classes one class at a time using only the header file for that class, with a minimal amount of hinting. When combined with CMake, this approach is easily scalable to a very large number of classes in different directories or even in different packages. The new wrappers further enhance this approach by automatically generating “hierarchy” files that describe all types defined in any particular VTK source directory. These files are discussed in detail later in this article.

A SHINY NEW PARSER
The new parser is a significant improvement to the old parser code. The original code took a minimalist approach to parsing the VTK header files. It looked for the first VTK class defined in the header, and then extracted only the methods defined in that class. The rest of the file would be ignored, but since the parser lacked a full C++ grammar, it was not always successful in skipping over parts it was supposed to ignore. These troublesome patches of code had to be surrounded by BTX/ETX exclusion markers so that they could be removed at the preprocessing stage.

The new parser code reverses this minimalist approach: it reads all the declarations in the header file and stores them all for use in the wrappers. This means that typedefs, templates, constant definitions, enum types, operator overloads and namespaces are all available to the wrapper-generators. Each piece of information from the header file is stored in a C struct, with the most-used struct being the ValueInfo struct that is used for method arguments, variables, constants and typedefs. The following is from the vtkParse.h header file:

struct ValueInfo
{
  parse_item_t   ItemType;           /* var, typedef, etc. */
  parse_access_t Access;             /* public, private, etc. */
  const char    *Name;
  const char    *Comment;
  const char    *Value;              /* value or default val */
  unsigned int   Type;               /* see vtkParseType.h */
  const char    *Class;              /* type as a string */
  int            Count;              /* for arrays */
  int            NumberOfDimensions;
  const char   **Dimensions;
  FunctionInfo  *Function;           /* for function ptrs */
  int            IsStatic;           /* class variables only */
  int            IsEnum;             /* for constants only */
};

One should note that in contrast to the old parser, arrays can now be multi-dimensional. The dimensions are stored as strings, in case the dimensions are symbolic values (e.g., template parameters) that cannot be evaluated until compile time. The product of the dimensions is stored as an “int” if all the dimensions are integer literals.
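The rule for the array count can be sketched in a few lines of Python. This is only an illustration of the idea, not the parser's actual code: the function name array_count is hypothetical, and returning 0 when a dimension is symbolic is an assumption of this sketch.

```python
def array_count(dimensions):
    """Return the product of the dimensions, or 0 if any one is symbolic.

    `dimensions` is a list of strings, mirroring how the parser keeps
    each dimension as text. The 0 result for symbolic dimensions is an
    assumption made for this sketch.
    """
    count = 1
    for dim in dimensions:
        # Only plain decimal integer literals can be evaluated at parse time.
        if not (dim.isascii() and dim.isdigit()):
            return 0
        count *= int(dim)
    return count


print(array_count(["3", "4"]))  # both dimensions are integer literals
print(array_count(["3", "N"]))  # "N" stands for a template parameter
```

Keeping the dimensions as strings means that a symbolic dimension such as a template parameter is still available to a back end, even though no numeric count can be computed for it.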

Similar structs provide information for functions, classes, namespaces, templates, and so on. The FunctionInfo and ClassInfo structs have backward compatibility sections that provide their info in the old wrapper format, so that wrapper-generators based on the old structs can easily be made to work with the new parser.

The new parser also features a preprocessor, something that was conspicuously absent before. The preprocessor stores defined macros and provides conditional parsing based on #if directives, eliminating yet another previous use of the BTX/ETX markers. Unlike a traditional preprocessor, the parser stores macros but does not expand them. This is by design, since several VTK macros have special meaning to the wrappers that would be lost if those macros were expanded. The parser can query the macros to get their value.
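As a rough illustration of "store but do not expand", the following Python sketch handles a tiny subset of preprocessor directives. It is not the VTK preprocessor: the function name preprocess and the macro DEMO_VERSION are hypothetical, and full #if expression evaluation is deliberately left out.

```python
def preprocess(lines, macros=None):
    """Return (kept_lines, macros) for a tiny subset of the C preprocessor.

    Handles #define, #ifdef, #ifndef, #else and #endif only. Macros are
    recorded in a table but never substituted into the kept lines.
    """
    macros = dict(macros or {})
    kept = []
    stack = []  # one bool per open conditional: is that branch active?
    for line in lines:
        words = line.split()
        directive = words[0] if words and words[0].startswith("#") else None
        active = all(stack)  # every enclosing branch must be active
        if directive == "#ifdef":
            stack.append(words[1] in macros)
        elif directive == "#ifndef":
            stack.append(words[1] not in macros)
        elif directive == "#else":
            stack[-1] = not stack[-1]
        elif directive == "#endif":
            stack.pop()
        elif directive == "#define":
            if active:
                macros[words[1]] = " ".join(words[2:])
        elif active:
            kept.append(line)  # kept verbatim, with no macro expansion
    return kept, macros


header = [
    "#define DEMO_VERSION 5",
    "#ifdef DEMO_VERSION",
    "int GetVersion() { return DEMO_VERSION; }",
    "#else",
    "int GetVersion();",
    "#endif",
]
kept, macros = preprocess(header)
print(kept)    # the macro name is still present in the kept line
print(macros)  # the value can be looked up separately
```

Because the kept line still contains the macro name, a later stage can recognize macros that carry special meaning, while the table answers queries about their values.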

HIERARCHIES FOR ALL
A fundamental addition for the new wrappers is a collection of “hierarchy” files, one per source directory, that list the full genealogy of all the VTK classes in each directory. The file vtkCommonHierarchy.txt, for example, lists all classes defined in the Common directory. The file structure is simple:

vtkArray : vtkObject ; vtkArray.h ; ABSTRACT
vtkArraySort ; vtkArraySort.h ; WRAP_EXCLUDE

The name of the class comes first, followed by any superclasses, then the header file, and finally any of the CMake flags WRAP_EXCLUDE, WRAP_SPECIAL or ABSTRACT that apply to the class. Classes that have a name that is different from the name of their header file are automatically labelled with WRAP_EXCLUDE. Note that the file format might change in the future, so anyone who is interested in using this file should always use the functions defined in VTK/Wrapping/vtkParseHierarchy.h to read the file, instead of writing their own code to do so.
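Purely to make the layout of a class line concrete, here is a small Python sketch that splits the two sample lines into their fields. It is an illustration only and is no substitute for the vtkParseHierarchy.h functions: the function name parse_hierarchy_line is hypothetical, the comma separator between multiple superclasses is an assumption, and typedef or enum lines are not handled.

```python
def parse_hierarchy_line(line):
    """Split one class line of a hierarchy file into its fields.

    Assumes the layout shown in the samples:
        name [: superclasses] ; header ; flag ; flag ...
    The comma between multiple superclasses is an assumption.
    """
    fields = [field.strip() for field in line.split(";")]
    # " : " (with spaces) separates the name from its superclasses, so a
    # "::" inside a qualified name would not be mistaken for the separator.
    name, _, supers = fields[0].partition(" : ")
    return {
        "name": name.strip(),
        "superclasses": [s.strip() for s in supers.split(",") if s.strip()],
        "header": fields[1],
        "flags": fields[2:],
    }


print(parse_hierarchy_line("vtkArray : vtkObject ; vtkArray.h ; ABSTRACT"))
print(parse_hierarchy_line("vtkArraySort ; vtkArraySort.h ; WRAP_EXCLUDE"))
```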

In addition to classes, the hierarchy files also include all typedefs and enum types encountered in the header files, in order to provide a comprehensive list of all types defined in VTK. The rationale behind the hierarchy files is as follows: previously, the wrappers would assume that any type with a “vtk” prefix was derived from vtkObjectBase, excepting types like vtkIdType that were specifically caught by the parser. This is the other reason that BTX/ETX had to be used so often (in addition to the aforementioned limitations of the parser), since methods that used new types like vtkVariant or vtkUnicodeString had to be BTX’d because these types were misidentified by the wrappers. By using the hierarchy files, the wrappers can identify any type defined in VTK.
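The difference between the old prefix-based guess and a hierarchy lookup can be sketched as follows. The dictionary below is a hand-written stand-in for the contents of a hierarchy file, and both function names are hypothetical.

```python
def old_guess(type_name):
    """Old heuristic: anything with a "vtk" prefix is assumed to be an object."""
    return type_name.startswith("vtk")


def is_derived_from(hierarchy, type_name, base="vtkObjectBase"):
    """Walk the superclass lists to see whether type_name descends from base."""
    seen = set()
    pending = [type_name]
    while pending:
        current = pending.pop()
        if current == base:
            return True
        if current in seen:
            continue
        seen.add(current)
        pending.extend(hierarchy.get(current, []))
    return False


# Hand-written stand-in for hierarchy data: type name -> list of superclasses.
hierarchy = {
    "vtkObjectBase": [],
    "vtkObject": ["vtkObjectBase"],
    "vtkArray": ["vtkObject"],
    "vtkVariant": [],
}

for name in ("vtkArray", "vtkVariant"):
    print(name, old_guess(name), is_derived_from(hierarchy, name))
```

With the prefix heuristic both names look like vtkObjectBase-derived classes, while the lookup reports that only vtkArray actually is one.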

WRAPPER COMMAND LINE ARGUMENTS
The command line for the wrapper-generators has been modified to make it easier to invoke the wrappers by hand. The old calling convention still works, in order to support older CMake scripts. The new command line is as follows:

vtkWrapPython [options] input_file output_file

  --concrete      tell wrappers that class is concrete
  --abstract      tell wrappers that class is abstract
  --vtkobject     class is derived from vtkObjectBase
  --special       class not derived from vtkObjectBase
  --hints <file>  specify a hints file
  --types <file>  specify a hierarchy file
  -I <dir>        add an include directory
  -D <macro>      define a preprocessor macro

All of these arguments are optional. For instance, the wrapper-generators will automatically guess that any class with pure virtual methods is an abstract class. However, the --concrete option is needed in cases where an abstract class is being wrapped that will be replaced by a concrete factory-generated class at run time.

The “--hints” option provides a way of specifying the hints file that has been used by the wrappers since day one. The hints file is the only part of the wrappers that still uses hexadecimal literals to describe types. Support for it will be maintained indefinitely to ensure backwards compatibility. The new “--types” option gives the wrappers access to the new hierarchy files that were described above.

The “-I” and “-D” options are forwarded to the preprocessor, so that if the file being wrapped includes important header files like vtkConfigure.h, the preprocessor can read those files and use the macro values defined within them.
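For anyone scripting the wrappers by hand, the argument list might be assembled as in the Python sketch below. It only builds the list and does not run anything. The helper name build_wrap_command, the file and directory names, and the macro name are all hypothetical, and the long options are assumed to be spelled with two ASCII hyphens.

```python
def build_wrap_command(input_file, output_file, *, hints=None, types=None,
                       include_dirs=(), defines=(), concrete=False):
    """Assemble an argument list following the usage text shown above."""
    cmd = ["vtkWrapPython"]
    if concrete:
        cmd.append("--concrete")
    if hints:
        cmd += ["--hints", hints]
    if types:
        cmd += ["--types", types]
    for directory in include_dirs:
        cmd += ["-I", directory]
    for macro in defines:
        cmd += ["-D", macro]
    # The input header and the output file always come last.
    cmd += [input_file, output_file]
    return cmd


# All paths and the macro name below are placeholders for illustration.
command = build_wrap_command(
    "vtkArray.h", "vtkArrayPython.cxx",
    types="vtkCommonHierarchy.txt",
    include_dirs=["Common"],
    defines=["DEMO_MACRO"],
)
print(" ".join(command))
```

A list built this way could be handed to subprocess.run once real paths are substituted.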

STRINGS AND UNICODE
The vtkStdString type was introduced several years ago, and when it was added to the wrappers, it was wrapped by identifying it as “const char *” in the parser, with its c_str() method used to do the conversion. This caused problems, because most VTK methods return vtkStdString by value, resulting in the creation of a temporary string object on the stack, for which the char pointer returned by c_str() is likewise only temporarily valid. Because of this, only methods that returned vtkStdString by reference could safely be wrapped, and methods that passed vtkStdString by value were blocked with BTX/ETX. One of the reasons that vtkStdString was wrapped this way is that the old parser had only 15 slots available to identify fundamental VTK types, compared to nearly 255 slots for the new parser.

The new Tcl, Python and Java wrappers have all been modified so that they correctly identify vtkStdString as a std::string subclass and wrap it appropriately, allowing the BTX/ETX markers to be removed from methods that pass vtkStdString by value. The vtkUnicodeString type has also been added to the parser as a new type, and the Python wrappers have been modified to transparently convert between vtkUnicodeString and Python’s own unicode type.

WRAPPING VARIANTS IN PYTHON
There are two approaches that could have been used to wrap vtkVariant. The first approach would have been to have the Python wrappers implicitly convert Python types to and from vtkVariant, so that if a C++ VTK method returned a variant, the wrapped Python method would automatically extract and return the value stored in the variant. This would have been convenient, but also would have resulted in loss of information. For example, any integer type between “unsigned char” and “long” would automatically be converted into a Python integer, and Python users would not be able to discover the original C++ type. For this reason, an approach in which vtkVariant is wrapped as its own distinct Python type was used instead.

The technique used to wrap vtkVariant is similar to the technique used to wrap vtkObjectBase-derived objects. The main differences are the way memory management is done and the way that constructors are handled. Unlike vtkObjects, the vtkVariant is not reference-counted, so if a variant is passed as a method argument, it is copied just as if it was an int or any other basic type. Wrapping the many constructors for vtkVariant was a challenge, because the old Python wrapper code for resolving overloaded methods was inadequate for this task: it would simply try each overload in turn until one of the overloads could be called without generating an argument type error. Hence, calling vtkVariant(10) from Python would create an “unsigned char” variant since vtkVariant(unsigned char) is the first constructor in the header file. For this wrapper update project, code was added to the Python wrappers so that they compare all passed arguments against those of the various method signatures, and call the overload that provides the best match. This new code is used for all Python method calls, not just for constructors, so there might be some small backwards compatibility problems since the old code always called the first matched method, while the new code calls the best match.
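The difference between "first match" and "best match" can be sketched in plain Python. This is not the wrapper code itself: the penalty table COSTS and both function names are hypothetical, and the real wrappers may score candidates quite differently.

```python
# Hypothetical penalties for passing a Python value as a C++ type.
# Lower is better; a missing entry means the conversion is not allowed.
COSTS = {
    int: {"int": 0, "long": 1, "unsigned char": 2, "double": 3},
    float: {"double": 0, "float": 1},
    str: {"const char *": 0},
}


def cost(value, cpp_type):
    """Return the penalty for passing value as cpp_type, or None."""
    return COSTS.get(type(value), {}).get(cpp_type)


def first_match(overloads, args):
    """Old behaviour: take the first signature that accepts the arguments."""
    for signature in overloads:
        if len(signature) == len(args) and all(
                cost(a, t) is not None for a, t in zip(args, signature)):
            return signature
    return None


def best_match(overloads, args):
    """New behaviour: score every signature and keep the cheapest one."""
    best, best_cost = None, None
    for signature in overloads:
        if len(signature) != len(args):
            continue
        costs = [cost(a, t) for a, t in zip(args, signature)]
        if None in costs:
            continue
        total = sum(costs)
        if best_cost is None or total < best_cost:
            best, best_cost = signature, total
    return best


# Constructor signatures in header order, with "unsigned char" listed first.
overloads = [("unsigned char",), ("int",), ("double",), ("const char *",)]
print(first_match(overloads, (10,)))  # ('unsigned char',)
print(best_match(overloads, (10,)))   # ('int',)
```

Ties in this sketch go to the earlier signature, so code that relied on declaration order only changes behaviour when a strictly better match exists.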

Another feature that was added to the Python wrappers is automatic argument conversion via constructors. In C++, if a method requires a vtkVariant argument but is passed an “int”, the vtkVariant(int) constructor will automatically be called to convert the “int”. No such automatic conversion exists in Python; it is instead the responsibility of the method writer to explicitly convert the arguments. Fortunately, since all the VTK/Python methods are generated by the wrapper-generator code, it only required some creative programming to have the Python wrappers automatically look through the constructors of all wrapped types and do conversions as necessary.
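The idea of conversion via constructors can be mimicked in pure Python as follows. The Variant class here is a stand-in, not the real vtkVariant, and the helper convert_argument and the method set_value are hypothetical names used only for this sketch.

```python
class Variant:
    """Stand-in for a wrapped special type; not the real vtkVariant."""

    # Python types accepted by this stand-in's single-argument constructors.
    constructors = (int, float, str)

    def __init__(self, value):
        self.value = value


def convert_argument(value, wanted):
    """Return value as an instance of wanted, converting if a constructor fits."""
    if isinstance(value, wanted):
        return value
    if type(value) in getattr(wanted, "constructors", ()):
        return wanted(value)  # implicit conversion through the constructor
    raise TypeError(
        f"cannot convert {type(value).__name__} to {wanted.__name__}")


def set_value(arg):
    """Hypothetical wrapped method that requires a Variant argument."""
    return convert_argument(arg, Variant)


converted = set_value(10)
print(type(converted).__name__, converted.value)
```

Because the conversion lives in generated glue code rather than in each method, every wrapped method that takes such a type gains the behaviour at once.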

BEYOND VARIANTS
The new wrappers handle the Python-wrapping of vtkVariant automatically, and a few other special VTK types are similarly wrapped. These types are wrapped by marking them with VTK_WRAP_SPECIAL in the CMakeLists.txt file, and they must also have a public copy constructor and an “=” operator. Other special types that have been wrapped include the vtkArray helper types: vtkArrayCoordinates, vtkArrayExtents, vtkArrayRange, and vtkArrayWeights.

Even though the parser now recognizes all operator methods, at this point in time the Python wrappers only wrap the comparison operators “<”, “<=”, “==”, “!=”, “>=” and “>”, and the stream output operator “<<” via Python’s “print” statement. Some types will need additional operator methods to be wrapped in order to make them truly useful from Python.

CONCLUSION
The overhauled parser and the hierarchy files provide a solid new foundation for VTK wrapper development, but work still needs to be done to update the back-end wrapper generators for Tcl, Python and Java. The Python wrappers now support vtkUnicodeString and several other special VTK types, and the code has been reorganized and documented, but entities like multi-dimensional arrays and templated classes that are parsed by the new parser are not yet wrapped in any language. All of these can hopefully be added to the wrappers in the coming years.

ACKNOWLEDGEMENTS
I would like to acknowledge the work performed by Marcus Hanwell and Keith Fieldhouse in testing the new code and merging it with VTK, and would also like to thank Bill Hoffman, Will Schroeder and Ken Martin for allowing me to be part of their open-source experiment for these past twelve years.

David Gobbi is a software consultant in Calgary, Alberta who specializes in software for medical imaging and image-guided neurosurgery. He received his Ph.D. from the University of Western Ontario in 2003, and has been contributing to the VTK wrapper code since 1999.

Questions or comments are always welcome!