[Insight-users] Using ITK with unicode character set

Sean McBride sean at rogue-research.com
Fri Jul 6 10:06:03 EDT 2007


Luis,

Thanks for your reply.

>I'm not sure if you could solve the ambiguity of unicode by
>passing std::string to the SetFileName() methods.

I am not very familiar with STL, but I don't believe std::string is very
Unicode-friendly as it is basically an array of bytes.
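
To illustrate the "array of bytes" point, here is a minimal sketch, assuming
the string holds valid UTF-8.  CountUtf8CodePoints(), MakeSampleName() and
CheckByteAndCodePointCounts() are hypothetical names made up for this
example; they are not part of ITK or the STL.  The sketch only shows that
size() counts bytes rather than characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of ITK or the STL): counts code points in a
// string that is assumed to hold valid UTF-8, by skipping continuation bytes
// (those of the form 10xxxxxx).
std::size_t CountUtf8CodePoints(const std::string & utf8)
{
  std::size_t count = 0;
  for (std::string::size_type i = 0; i < utf8.size(); ++i)
  {
    const unsigned char byte = static_cast<unsigned char>(utf8[i]);
    if ((byte & 0xC0) != 0x80)
    {
      ++count;
    }
  }
  return count;
}

// "Montréal", with the é spelled out as its two UTF-8 bytes (0xC3 0xA9).
// The literal is split so that the 'a' is not read as part of the \x escape.
std::string MakeSampleName()
{
  return std::string("Montr\xC3\xA9" "al");
}

// std::string reports bytes; the code point count has to be computed.
void CheckByteAndCodePointCounts()
{
  const std::string name = MakeSampleName();
  assert(name.size() == 9);               // 8 characters, but 9 bytes
  assert(CountUtf8CodePoints(name) == 8);
}
```

std::string stores and returns the bytes unchanged, so it can carry UTF-8,
but it knows nothing about the encoding itself.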

>If that is not the case,
>
>then we could add another overload to that StringMacro in order
>to have methods with the wide char pointer as argument.
>
>That's a relatively minor change to make.

wchar_t is evil; I strongly suggest you never use it, especially if ITK
is already wchar_t-free.  Why?  Well, that has been discussed on many
lists many times.  Here's a decent summary:
<http://lists.apple.com/archives/carbon-dev/2005/Feb/msg00491.html>

String handling is a rather complex subject whose details many people do
not understand.  I really can't recommend this article enough:
<http://www.joelonsoftware.com/articles/Unicode.html>

This multi-part article is also really good; parts 1 and 2 give a
good background.
<http://blogs.msdn.com/ryanmy/archive/tags/I18N/default.aspx>

If you haven't read these, they are absolutely worth your time.

Now, for the matter at hand.  In our usage of ITK's SetFileName() we
pass a char* encoded as UTF8.  On Mac OS X, this is the correct thing to
do; I am able to open filenames with the oddest of characters.  This
works because OS X's file system APIs, like the POSIX open() API, are
documented to require UTF8.  Our char* gets converted into a
std::string (array of bytes) by ITK, then it is presumably converted
back to a char* in the bowels of the STL when it is passed to open().
The important part here is that open(), as implemented on OS X, is
documented to take a char* encoded as UTF8.
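
The round trip described above can be sketched as follows.  This is only an
illustration under stated assumptions: FileNameHolder and RoundTripFile()
are hypothetical names, not ITK code, and std::fopen() stands in for open()
to keep the example short.  It also assumes a host whose file APIs accept
UTF-8 file names, as on OS X.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in for an object with a SetFileName()-style API.
// This is not ITK code; it only mimics the char* -> std::string -> char* path.
class FileNameHolder
{
public:
  void SetFileName(const char * name)
  {
    assert(name != nullptr);
    m_FileName = name; // bytes are copied as-is, no transcoding
  }
  const char * GetFileName() const { return m_FileName.c_str(); }

private:
  std::string m_FileName;
};

// Creates the named file, reads it back, then deletes it.
// Returns true only if every step succeeds.
bool RoundTripFile(const char * utf8Name)
{
  FileNameHolder holder;
  holder.SetFileName(utf8Name);

  std::FILE * out = std::fopen(holder.GetFileName(), "wb");
  if (out == nullptr) return false;
  const bool wrote = std::fputs("ok", out) >= 0;
  std::fclose(out);

  std::FILE * in = std::fopen(holder.GetFileName(), "rb");
  if (in == nullptr) return false;
  char buffer[3] = { 0, 0, 0 };
  const std::size_t got = std::fread(buffer, 1, 2, in);
  std::fclose(in);

  const bool removed = std::remove(holder.GetFileName()) == 0;
  return wrote && got == 2 && std::string(buffer) == "ok" && removed;
}
```

Calling RoundTripFile("Montr\xC3\xA9" "al.txt") passes the UTF-8 bytes of
"Montréal.txt" through unchanged; whether the file system interprets them
as intended depends on the encoding the host OS expects.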

What character encoding do the Windows file system APIs require?

I suspect the correct thing from ITK's point of view is to document
SetFileName(), and similar APIs, saying "the string should be encoded
using the character encoding required by the host OS.  For example, UTF8
on Mac OS X and ??? on Windows".

-- 
____________________________________________________________
Sean McBride, B. Eng                 sean at rogue-research.com
Rogue Research                        www.rogue-research.com 
Mac Software Developer              Montréal, Québec, Canada



