VTK/Unwrappable Code
Latest revision as of 02:28, 18 October 2015
The VTK Tcl, Java, and Python wrappers use a custom parser to read the VTK C++ header files. This parser consists of the following pieces:
- a C++ preprocessor
- a lex/yacc C++ parser (a GNU bison GLR parser)
- a set of data structures for describing a C++ API
As of this writing, the above are based on the C++11 grammar and are being updated for C++14 and C++17.
Syntax that the wrappers cannot parse
The parser was written based on the ISO draft standards for C++98, C++11, and C++14. However, there are specific parts of the C++ grammar that were not implemented. These are described below.
Backslash line continuation in odd places
According to the C++ standard, any backslash that occurs at the end of a line (unless it occurs within a raw string) is meant to indicate that the following newline should be ignored. The wrapper preprocessor, however, does not allow a backslash to be used within any token except for a string literal.
This code will work:
  #define mymacro(x) \
    (2*(x))

  const char *s = "this is a long\
   string broken in two.";
This code will not work:
  class MyClassHasAVeryLongNameSo\
  IBrokeItWithABackslash;

  const int i = 'A\
  ';
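As a rough sketch of how to stay within these limits, the declarations below keep every identifier and character literal on a single line and use backslash continuation only in the two places described above as working: between the tokens of a macro body and inside a string literal. All names are hypothetical and chosen only for illustration. That the wrappers accept `constexpr` declarations like these is an assumption, not something stated on this page.

```cpp
// Hypothetical names, for illustration only.

// A macro body may be continued with a backslash between tokens.
#define MY_DOUBLE(x) \
  (2 * (x))

// Keep the whole identifier on one line, however long it is.
class MyClassHasAVeryLongNameSoIKeptItOnOneLine;

// Keep a character literal on one line as well.
constexpr int letterA = 'A';

// A backslash inside a string literal is accepted by the wrapper preprocessor.
constexpr char longText[] = "this is a long\
 string broken in two.";
```

The forward declaration and the literal are the same constructs as in the failing example, only without the line break inside the token.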
Universal character names in identifiers
In C++11, universal character names \uXXXX and \UXXXXXXXX can be used in place of non-ASCII characters. The wrapper preprocessor allows these only in string literals and character literals, not in identifiers. However, the wrappers do allow UTF-8 encoded characters in identifiers (and in strings, characters, and comments).
  // this is fine
  const char16_t *s = u"Hello\u00A0There";

  // this will break things
  const char *encyclop\u00C6dia = "Britannica";
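A minimal sketch of the safe pattern follows, with hypothetical names: universal character names stay inside literals, and the identifier uses a plain ASCII spelling. Writing the character directly in UTF-8 in the identifier is the other option mentioned above; the ASCII spelling is used here only because it does not depend on the compiler's source-encoding support.

```cpp
// Hypothetical names, for illustration only.

// Universal character names are accepted inside string and character literals.
constexpr char16_t greeting[] = u"Hello\u00A0There";
constexpr char16_t aeLigature = u'\u00C6';

// In an identifier, avoid \uXXXX. An ASCII spelling is used here instead.
constexpr char encyclopaedia[] = "Britannica";
```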
Ambiguous member variable definition
C++ has an ambiguous grammar. One of the most common sources of ambiguity is that a name will sometimes be identified as a type, and sometimes as a function or variable name, depending on context.
  struct x {
    typedef int z;

    // this kinda looks like a constructor
    x(z);

    // so would you believe that this defines a variable y of type z?
    z(y);

    // it does, because it is equivalent to writing this!
    z y;
  };
The wrapper's parser does not distinguish type names from other names within its grammar rules, so it cannot disambiguate between the constructor declaration at the top and the funny-looking variable declaration in the middle. It will try to interpret both as constructor declarations.
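The practical consequence is to always declare member variables in the plain form, without parentheses around the name. The sketch below reuses the struct from the example above with the parenthesized declaration removed. It is an assumption here that the wrappers then read `y` as a member variable, since the page only documents the failing form.

```cpp
#include <type_traits>

// Same hypothetical struct as in the example above.
struct x
{
  typedef int z;

  // Constructor declaration taking a z.
  x(z);

  // Member variable of type z, written without the parentheses
  // that make it look like a constructor declaration.
  z y;
};

// Compile-time check that y is a data member of type z (that is, int).
static_assert(std::is_same<decltype(x::y), x::z>::value, "y should have type z");
```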
Ambiguous angle brackets
The wrappers fail when angle brackets occur on the right-hand side of an assignment, unless the assignment takes place within a function body (the parser ignores all function bodies, because they are part of the "implementation" rather than part of the "interface").
  // this looks totally natural, and is valid C++ code
  const T unity = static_cast<T>(1.0);

  // whitespace makes it look a bit less natural
  const T unity = static_cast < T > (1.0);
The difficulty is that our parser thinks that the angle brackets might be less-than and greater-than operators, because it doesn't know that T names a type and not a constant. So it conks out after reporting "syntax is ambiguous". It is possible to disambiguate by writing the code as follows, which causes the parser to take a different path:
  // this causes the parse to succeed
  const T unity = (static_cast<T>(1.0));
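For context, here is a sketch of where such an initializer might appear in a header: a static constant inside a class template, which is outside any function body and so is seen by the parser. The class name `ExampleConstants` and the member `Unity` are hypothetical, and whether the wrappers accept this exact declaration is an assumption. Only the parenthesized cast itself comes from the text above.

```cpp
// Hypothetical class template, for illustration only.
template <typename T>
struct ExampleConstants
{
  // The extra parentheses around the cast are the workaround described above.
  static constexpr T Unity = (static_cast<T>(1.0));
};
```

The parentheses do not change the value of the expression, so the workaround costs nothing for ordinary compilers.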