[Insight-developers] Image writing with unicode filename impossible with MSVC

Tom Vercauteren tom.vercauteren at m4x.org
Mon Oct 26 16:14:40 EDT 2009


Hi all,

I'm back on the unicode filenames topic and really need your feedback
before I start touching every IO class...

I have uploaded a preliminary patch on the bug tracker that allows the
use of utf-8 encoded strings on windows for several IO classes:
  http://public.kitware.com/Bug/file_download.php?file_id=2601&type=bug

Namely, what is working already is writing (and maybe reading) of
unicode filenames on windows for the following formats:
- jpeg
- png
- meta (mhd and mha)
- tiff

My approach was to convert the utf-8 encoded std::string to a utf-16
encoded wstring (on windows only) when it becomes necessary. This is
done using the utfcpp library:
  http://utfcpp.sourceforge.net/
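
To make the conversion step concrete, here is a minimal standalone sketch of
a utf-8 to utf-16 conversion. It is not the code from the patch: the patch
relies on utfcpp (which offers utf8::utf8to16 for this), whereas the
function below is a hand-written stand-in so that the example compiles
without any third-party header. The name Utf8ToUtf16 is hypothetical, and
std::u16string is used instead of std::wstring only so that the result is
16-bit on every platform.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of ITK or utfcpp): decode a UTF-8 string and
// re-encode it as UTF-16 code units. Throws std::invalid_argument on
// truncated sequences, bad continuation bytes, overlong forms, surrogate
// code points, and values above U+10FFFF.
std::u16string Utf8ToUtf16(const std::string& in)
{
  std::u16string out;
  std::size_t i = 0;
  while (i < in.size())
  {
    const unsigned char lead = static_cast<unsigned char>(in[i]);
    std::uint32_t cp = 0;      // decoded code point
    std::size_t len = 0;       // number of bytes in this sequence
    std::uint32_t minCp = 0;   // smallest code point allowed for this length

    if (lead < 0x80)                { cp = lead;        len = 1; minCp = 0; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; minCp = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; minCp = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; minCp = 0x10000; }
    else throw std::invalid_argument("invalid UTF-8 lead byte");

    if (i + len > in.size())
      throw std::invalid_argument("truncated UTF-8 sequence");

    // Fold in the 6 payload bits of each continuation byte.
    for (std::size_t k = 1; k < len; ++k)
    {
      const unsigned char b = static_cast<unsigned char>(in[i + k]);
      if ((b & 0xC0) != 0x80)
        throw std::invalid_argument("invalid UTF-8 continuation byte");
      cp = (cp << 6) | (b & 0x3F);
    }

    if (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      throw std::invalid_argument("invalid code point in UTF-8 input");

    if (cp < 0x10000)
    {
      out.push_back(static_cast<char16_t>(cp));
    }
    else
    {
      // Code points outside the BMP become a surrogate pair.
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    i += len;
  }
  return out;
}
```

On windows, where wchar_t is 16 bits wide, the same code units could be
stored in a std::wstring and handed to the wide-character file APIs.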

For backward compatibility reasons, this conversion is activated by a
cmake variable:
  ITK_USE_REVIEW_UTF8_STRINGS

For png, jpeg and tiff, no modifications were necessary to the
underlying third party libraries.

For metaio, one file had to be modified. For backward compatibility
reasons, the new behavior is only activated if
  METAIO_USE_REVIEW_UTF8_STRINGS
is defined. Of course, turning ITK_USE_REVIEW_UTF8_STRINGS on in
cmake also defines METAIO_USE_REVIEW_UTF8_STRINGS in the c++ code.
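
As a rough illustration of how the two switches could fit together, here is
a sketch of an opt-in, utf-8 aware file open. It is not taken from the
patch: the names OpenFileUtf8 and WidenUtf8 are hypothetical, the sketch
calls _wfopen, and it uses the Win32 function MultiByteToWideChar in place
of utfcpp so that it needs no third-party header. Without the switch, or on
non-windows systems, it falls back to a plain fopen.

```cpp
#include <cstdio>
#include <string>

// Assumption: the ITK-level switch simply implies the MetaIO-level one.
#if defined(ITK_USE_REVIEW_UTF8_STRINGS) && !defined(METAIO_USE_REVIEW_UTF8_STRINGS)
#  define METAIO_USE_REVIEW_UTF8_STRINGS
#endif

#if defined(_WIN32) && defined(METAIO_USE_REVIEW_UTF8_STRINGS)
#  include <wchar.h>    // _wfopen
#  include <windows.h>  // MultiByteToWideChar

// Hypothetical helper: utf-8 std::string to utf-16 std::wstring.
// Returns an empty string if the input is empty or not valid utf-8.
static std::wstring WidenUtf8(const std::string& s)
{
  if (s.empty())
    return std::wstring();
  const int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                    s.data(), static_cast<int>(s.size()),
                                    NULL, 0);
  if (n <= 0)
    return std::wstring();
  std::wstring w(static_cast<std::size_t>(n), L'\0');
  MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                      s.data(), static_cast<int>(s.size()), &w[0], n);
  return w;
}
#endif

// Hypothetical fopen-like wrapper that accepts a utf-8 encoded filename.
std::FILE* OpenFileUtf8(const std::string& name, const char* mode)
{
#if defined(_WIN32) && defined(METAIO_USE_REVIEW_UTF8_STRINGS)
  // New behavior: go through the wide-character API.
  return _wfopen(WidenUtf8(name).c_str(), WidenUtf8(mode).c_str());
#else
  // Old behavior: the name is passed through unchanged.
  return std::fopen(name.c_str(), mode);
#endif
}
```

With the switch off, callers get exactly the old behavior, which is the
point of guarding the change behind a macro.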

Could you take a look at my preliminary patch and tell me if something
along those lines could be accepted into ITK?

Cheers,
Tom

On Tue, Oct 20, 2009 at 18:40, Tom Vercauteren <tom.vercauteren at m4x.org> wrote:
> Hi all,
>
> Thanks for your constructive feedback.
>
> Benjamin and I have looked a bit further into this issue and into utfcpp.
>
> Unfortunately utfcpp does not provide all the features we would
> like, namely:
>
> - It does not define a separate utf8 string class, it uses std::string
> as a container
>
> - It does not allow the creation of a utf8 encoded std::string from a
> std::string encoded with the default encoding
>
>
> That being said, we can still make efficient use of it. Here is a proposal:
>
> 1) We keep the current API that only allows users to set char* or
> std::string filenames
>
> 2) We specify in the documentation that these strings have to be
> encoded in utf8 on MSVC (and other utf8-based systems as previously)
>
> 3) On MSVC, we use utfcpp to check whether the filename actually is
> encoded in utf8 and we throw an exception otherwise
>
> 4) We write fopen-like functions in ITK (say itk::fopen) that work
> with utf8 filenames (For MSVC, this will basically use utfcpp to
> convert the utf8 encoded string to a utf16 encoded wstring and call
> _wopen)
>
> 5) We use itk::fopen when possible instead of fopen
>
> Some preliminary experiments are shown here:
> http://public.kitware.com/Bug/file_download.php?file_id=2574&type=bug
>
> The only drawback of this approach is that it is not strictly backward
> compatible for MSVC. More specifically it will work as previously with
> ASCII filenames but will not work without prior utf8 conversion for
> non-ASCII filenames that could be represented in the local codepage.
> We could of course add a cmake switch to maintain strict backward
> compatibility if deemed necessary.
>
> Thoughts?
>
> Tom
>
>
>
> On Tue, Oct 20, 2009 at 14:56, Brad King <brad.king at kitware.com> wrote:
>> Sean McBride wrote:
>>>
>>> On 10/19/09 11:15 AM, Brad King said:
>>>
>>>> As the primary maintainer of KWSys I prefer to put as little
>>>> in the library as possible.
>>>
>>> Perhaps I haven't been following closely enough, but do you mean you
>>> wouldn't want to create a utf8 lib from scratch in KWSys or that you
>>> don't even want a thin wrapper over utf-cpp in KWSys?
>>
>> Both.
>>
>> There is no reason to create a utf8 lib from scratch when there are
>> plenty of third-party libraries available.  We cannot do a thin-wrapper
>> because KWSys cannot have third-party dependencies.
>>
>> IMO KWSys already has too much.  Originally it was just supposed to avoid
>> duplicate Kitware-written code that was copied between VTK and ITK.  It
>> was a/my mistake to add things like the MD5 hash implementation to it.
>>
>>> If the latter, that means we'd end up with both a vtkUnicodeString and
>>> itkUnicodeString?  What if CMake needs to process utf8?
>>
>> We already have zlib in all three projects, named vtkzlib, itkzlib, and
>> cmzlib.  Each project mangles the symbols to avoid conflicts, and they
>> all support sharing a system-installed version.
>>
>> -Brad
>>
>

