VTK/Wrapper Update 2010

From KitwarePublic

The new wrappers for VTK are not really "new", but they are a drastic clean-up of the original wrappers, which were completed circa 1998. The wrapper renewal project is currently an open project, with four necessary items and a long wish list of desired new features. The four main goals of this project are:

  1. Clean up the wrapper code by removing hard-coded hexadecimal constants and reducing the voodoo factor.
  2. Properly wrap vtkStdString, because it is a crucial interface type.
  3. Wrap vtkVariant in Python, especially for use in ParaView.
  4. Eliminate the need for BTX/ETX markers in the code.

Overview

The main design goals for VTK wrappers have not changed since 1998. Specifically, they must be:

  1. Scalable to a very large number of classes with minimal additional compile-time or run-time overhead.
  2. Able to wrap VTK classes as automatically as possible, with a minimal amount of hinting.
  3. Able to support multiple wrapper back-ends for different wrapper languages.

The core of the wrapper is a lex/yacc parser that reads C++ header files and stores information about the classes in C data structures that can be used by the wrapper-generator back-ends. This parser and its data structures are what have received the most attention during this wrapper update. Some important points about the parser (both the new and the old) are as follows:

  1. It only parses the header file in question; it does not pull in the included header files.
  2. It understands all (or nearly all) of the VTK macros defined in vtkSetGet.h.

These two points are important for the efficiency and simplicity of the parser. The parser does not have a C preprocessor, and it does not read more than one file at a time. Instead, it relies on its built-in knowledge of the VTK macros. The new parser front-end does have the ability to read and parse multiple header files, but this feature is not taken advantage of for VTK. It is primarily just a result of the code cleanup.

The four main items

The big cleanup

The first part of the cleanup was to remove all the hexadecimal constants like 0x303 from the files in the Wrapper directory, and replace them with named constants defined in a new header file called vtkParseType.h. This was a tedious job, but by itself it was enough to make the code much more readable. For example, VTK_PARSE_CHAR_PTR is obvious, while 0x303 is not.

The second stage of cleanup was to create a new vtkParseMain.c to hold a shared "main()" function for the parsers, along with a better way of parsing command-line arguments. This new file gives the wrapper-generators the ability to receive "-I" arguments so that they can access all of the VTK include directories, though none of the VTK wrapper-generators utilize this feature yet. The coding style of vtkWrapPython.c was also improved by using consistent method naming and by eliminating global variables.

The third, and most crucial, part of the cleanup was a reorganization of vtkParse.y and vtkParse.l, which hold the yacc and lex rules for the parser, as well as an update of the data structures in vtkParse.h in which the class information is stored. These files were a complete mess before, and though they are still very difficult code to read and understand, they are at least manageable now. Also, the hard-coded limitations in these files have been removed, and the data structures have been updated to capture the full richness of C++ types.

Note that many of these improvements to the parser have not yet been propagated to the wrapper-generators for Python, Tcl, and Java. For example, the new parser stores information about template types, multi-dimensional arrays, enums, preprocessor constants and constant variables, namespaces, typedefs, etcetera. It will still require a substantial programming effort to implement these features in the language wrappers.

Wrapping vtkStdString

The vtkStdString type was introduced in VTK 5.0, as a VTK-standard subclass of std::string. It was initially wrapped via the expedient of adding a "const char *" typecast operator to it so that the wrappers could simply treat vtkStdString return values as if they were "const char *". This trick unfortunately only works for methods that return "vtkStdString&", i.e. methods that return a reference to a persistent string. As a result, VTK methods that returned vtkStdString by value had to be surrounded by BTX/ETX because, if they were wrapped, they would return a temporary vtkStdString object to the wrappers, which would then grab a pointer to the internal "char *", which would immediately become invalid. This issue of having to BTX/ETX methods that return vtkStdString persisted from 2005 to 2010, with only a select few methods that returned "vtkStdString&" being properly wrapped. The original addition of vtkStdString to the wrappers is logged as follows:

ENH: Wrap vtkStringArray by adding vtkStdString as a special token and mapping 
it to "const char *" in the wrappers.  vtkStringArray::GetValue() was changed to 
return a reference because otherwise c_str() is called on a temporary 
vtkStdString object.
dgobbi (author) May 21, 2005

In the new wrappers, vtkStdString (and vtkstd::string and std::string) are recognized by the parser as their own types, rather than as "const char *", so now all VTK methods that use vtkStdString can be properly and safely wrapped.

The parser also recognizes vtkUnicodeString, but only the Python wrappers handle this type. In the Python wrappers, vtkUnicodeString is synonymous with Python's unicode type, with automatic conversion between the two.

Wrapping vtkVariant

The vtkVariant type is a VTK type that can hold any of the types commonly used in VTK, such as the C++ numeric types, vtkObjects, vtkStdString, and vtkUnicodeString. It is, in other words, an interface to a union of these types. An increasing number of classes in VTK use it as an interface, so there was a strong interest in wrapping it, particularly for use in ParaView's Python scripting engine. The new wrappers make vtkVariant available in Python, but not in Java or Tcl (and not, as of yet, in ParaView's ClientServer wrapper).

Two approaches could have been taken for wrapping vtkVariant. The first approach would have been to make vtkVariant invisible from Python, i.e. methods taking vtkVariant arguments would automatically convert the given Python type into a vtkVariant, and methods returning vtkVariant would automatically convert the vtkVariant to a native Python type (or to a vtkObject). The second approach was to explicitly wrap vtkVariant and make it possible to construct and use vtkVariant objects within Python. This latter approach was taken, because it makes Python VTK code much easier to compare with and convert to C++ VTK code.

One concession was made, however. The VTK/Python wrappers were modified to support automatic argument conversion via the vtkVariant constructors. So if a VTK method accepts a vtkVariant, then you can pass a numeric value, a string, a unicode string, or a vtkObject and the vtkVariant will be constructed automatically. This kind of argument conversion is standard in C++ but not in Python; the VTK/Python wrappers are an exception.

The method used to wrap vtkVariant is generic, and can be applied to other special VTK types. Currently the special-wrapped types for VTK/Python are vtkVariant, vtkTimeStamp, vtkArrayCoordinates, vtkArrayExtents, and vtkArrayRange.

Also see the following project page: Wrapping special types (start Apr 28, 2010, finish Jun 18, 2010)

Eliminating BTX/ETX from VTK header files

BTX/ETX markers have two main uses in the VTK header files. The first is to block off code that the VTK wrapper parser cannot parse, since it does not understand all C++ syntax. The second is to block off methods that, if they were wrapped, would cause the wrappers to either fail to compile, or compile and then segfault when the method was called.

The new wrappers tackle both of these issues in order to make it possible to remove BTX/ETX from the code. The main feature of the new parser is that it is a full C++ parser, and it is likely to be confused only by unrecognized preprocessor macros (since the wrapper's parser lacks a true preprocessor).

The second issue, i.e. the problem of wrapped methods either not compiling or segfaulting when used, was due to the inability of the wrappers to properly recognize anything but basic C types and vtkIdType. When the wrappers saw a vtkSomething as an argument, they would always assume that this was a vtkObjectBase-derived type. There are only two ways for the wrappers to figure out types: the first is to have them go through all included header files and look for class definitions and typedefs, and the second is for them to be given a list of types that they can consult.

This "list of types" is provided by the new vtkWrapHierarchy tool, which has its own project page. The vtkWrapHierarchy tool reads all the VTK header files in one go and writes out a file that lists all the classes, typedefs, and enums that are defined within the kit. The wrappers then read this information and use it to properly wrap method arguments and return types.

Wish list items

The four items listed above are the only items that are certain to be done for the VTK 6.0 release. There are several desired features, however, that might be added as resources allow:

Complete vtkParse.y and vtkParse.l

The new parser is able to parse virtually any C++ code, but there are a few bits of information that it does not yet store for use by the wrappers. The short list of definitions that are not stored for use by the wrappers is as follows:

  1. Nested classes
  2. Unions
  3. Anonymous structs and classes
  4. Typedefs involving struct or class definitions
  5. Variables

None of these items are used as part of the VTK interface, which is why they are not yet supported. However, it would still be nice if vtkParse.y were complete in this regard.

A more serious problem with the parser is that it lacks a proper preprocessor. There are rules in vtkParse.l that recognize certain #ifdefs, but only in a very crude manner. If vtkParse.l were made to properly handle #if directives, and furthermore made to include vtkConfigure.h, then the parser would become much more robust. Macros, however, would still need to be dealt with in an ad-hoc manner because they are wrapped directly, rather than being expanded and then wrapped.

Generalized special type wrapping for Python

The technique that was used to wrap vtkVariant in Python is general and extensible, but it requires that all the wrapped special types have the following basic properties:

  1. support pass-by-value, i.e. a public copy constructor and assignment operator
  2. have a "<<" operator function to allow printing to a stream

The second requirement is problematic and should be removed. The new parser can tell vtkWrapPython.c about any operator and constructor methods defined for an object, and this information should be used to allow wrapping of types that don't have "<<" support. Two other highly desirable features for special-type wrapping are:

  1. support for hierarchies of special types (i.e. vtkValue, vtkDataValue)
  2. support for templated VTK types in python

Also see this wiki page: Python wrapper enhancements

Wrapping of constants

The new parser provides a list of all constants defined in the header files. These constants should be automatically wrapped in all of the wrapper languages. In Python it would be very easy to wrap public enum constants defined inside classes, in addition to those defined in the global namespace.

Unicode in Java, Tcl

Adding vtkUnicodeString support to the Python wrappers was a simple feat that required only a few hours of work. It should be similarly easy to add support for Tcl and Java.

Cleanup of the Java and Tcl wrappers

The worst code-style offences in vtkWrapJava.c and vtkWrapTcl.c should be fixed: 1) elimination of global variables, 2) reduction of code duplication, 3) better method names and improved code documentation.

Unified wrapper for VTK and ITK?

It is uncertain whether adding recognition of ITK macros to the parser would be enough to make it able to parse ITK header files. It would also have to deal with VNL header files, and would have to recognize all the basic STL container types. It might, in fact, be necessary to make it recognize #include directives and parse through all included header files so that it can keep track of all typedefs and other relevant information. The parser does, however, already handle templates and typedef statements.

The back-end wrapper generators for Python and Tcl would need to be modified to utilize the template information and all the other ITK-relevant information that the parser provides.

The advantage of this approach is that it would be possible to wrap classes that utilize both ITK and VTK types in their interfaces. It could also make wrapper compilation for ITK much faster, and in general it would make using ITK and VTK together in the wrapper languages a more pleasant experience.