[Insight-developers] Image writing with unicode filename impossible with MSVC

Tom Vercauteren tom.vercauteren at m4x.org
Mon Oct 26 16:54:55 EDT 2009


Hey Bill,

> Is there a way to try the portability of this solution without
> touching any ITK classes? Can you check in a test that verifies the
> functionality and portability of your proposed solution before we
> commit to this solution?

Sure, I can make a unit test out of my previous preliminary
experiments with utfcpp:
http://public.kitware.com/Bug/file_download.php?file_id=2574&type=bug

This will just require committing one new test and four files from
utfcpp (e.g. in itkExtHeaders). Would this be fine?
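
To give an idea of what such a test could check, here is a minimal
sketch. It is not the code from the attachment above: it uses a
hand-written decoder as a stand-in for utfcpp's utf8::utf8to16 so that
it stays self-contained, the name Utf8ToUtf16 is hypothetical, and it
returns std::u16string (which assumes a C++11 compiler) where the
Windows code would use std::wstring:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for utfcpp's utf8::utf8to16: decodes a UTF-8
// encoded std::string into UTF-16 code units and throws
// std::runtime_error on malformed input.
std::u16string Utf8ToUtf16(const std::string& in)
{
  // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
  static const char32_t minForLength[4] = { 0x0, 0x80, 0x800, 0x10000 };

  std::u16string out;
  std::size_t i = 0;
  while (i < in.size())
  {
    const unsigned char lead = static_cast<unsigned char>(in[i]);
    char32_t cp = 0;
    std::size_t extra = 0; // number of continuation bytes expected
    if (lead < 0x80)                { cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
    else { throw std::runtime_error("invalid UTF-8 lead byte"); }

    if (extra >= in.size() - i)
    {
      throw std::runtime_error("truncated UTF-8 sequence");
    }
    for (std::size_t k = 1; k <= extra; ++k)
    {
      const unsigned char c = static_cast<unsigned char>(in[i + k]);
      if ((c & 0xC0) != 0x80)
      {
        throw std::runtime_error("invalid UTF-8 continuation byte");
      }
      cp = (cp << 6) | (c & 0x3F);
    }
    i += extra + 1;

    // Reject overlong encodings, surrogates and out-of-range values.
    if (cp < minForLength[extra] || cp > 0x10FFFF ||
        (cp >= 0xD800 && cp <= 0xDFFF))
    {
      throw std::runtime_error("invalid UTF-8 code point");
    }

    if (cp < 0x10000)
    {
      out.push_back(static_cast<char16_t>(cp));
    }
    else
    {
      // Code points above the BMP become a surrogate pair.
      const char32_t v = cp - 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
  }
  return out;
}
```

A test built on this would compare the result against known code
units, e.g. that the UTF-8 bytes C3 A9 decode to the single unit
0x00E9, and that a Latin-1 encoded name is rejected.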

Tom


> On Mon, Oct 26, 2009 at 4:14 PM, Tom Vercauteren
> <tom.vercauteren at m4x.org> wrote:
>> Hi all,
>>
>> I'm back on the unicode filenames topic and really need your feedback
>> before I start touching every IO class...
>>
>> I have uploaded a preliminary patch on the bug tracker that allows the
>> use of utf-8 encoded strings on Windows for several IO classes:
>>  http://public.kitware.com/Bug/file_download.php?file_id=2601&type=bug
>>
>> Namely, what is working already is writing (and maybe reading) files
>> with unicode filenames on Windows for the following formats:
>> - jpeg
>> - png
>> - meta (mhd and mha)
>> - tiff
>>
>> My approach was to convert the utf-8 encoded std::string to a utf-16
>> encoded wstring (on Windows only) when it becomes necessary. This is
>> done using the utfcpp library:
>>  http://utfcpp.sourceforge.net/
>>
>> For backward compatibility reasons, this conversion is activated by a
>> CMake variable:
>>  ITK_USE_REVIEW_UTF8_STRINGS
>>
>> For png, jpeg and tiff, no modifications were necessary to the
>> underlying third-party libraries.
>>
>> For metaio, one file had to be modified. For backward compatibility
>> reasons, the new behavior is only activated if
>>  METAIO_USE_REVIEW_UTF8_STRINGS
>> is defined. Of course, turning ITK_USE_REVIEW_UTF8_STRINGS on in
>> CMake turns METAIO_USE_REVIEW_UTF8_STRINGS on in the C++ code.
>>
>> Could you take a look at my preliminary patch and tell me if something
>> along those lines could be accepted into ITK?
>>
>> Cheers,
>> Tom
>>
>> On Tue, Oct 20, 2009 at 18:40, Tom Vercauteren <tom.vercauteren at m4x.org> wrote:
>>> Hi all,
>>>
>>> Thanks for your constructive feedback.
>>>
>>> Benjamin and I have looked a bit further into this issue and into utfcpp.
>>>
>>> Unfortunately, utfcpp does not provide all the features we would
>>> like, namely:
>>>
>>> - It does not define a separate utf8 string class, it uses std::string
>>> as a container
>>>
>>> - It does not allow the creation of a utf8 encoded std::string from a
>>> std::string encoded with the default encoding
>>>
>>>
>>> That being said, we can still make efficient use of it. Here is a proposal:
>>>
>>> 1) We keep the current API that only allows users to set char* or
>>> std::string filenames
>>>
>>> 2) We specify in the documentation that these strings have to be
>>> encoded in utf8 on MSVC (and other utf8-based systems as previously)
>>>
>>> 3) On MSVC, we use utfcpp to check whether the filename actually is
>>> encoded in utf8 and we throw an exception otherwise
>>>
>>> 4) We write fopen-like functions in ITK (say itk::fopen) that work
>>> with utf8 filenames (For MSVC, this will basically use utfcpp to
>>> convert the utf8 encoded string to a utf16 encoded wstring and call
>>> _wopen)
>>>
>>> 5) We use itk::fopen when possible instead of fopen
>>>
>>> Some preliminary experiments are shown here:
>>> http://public.kitware.com/Bug/file_download.php?file_id=2574&type=bug
>>>
>>> The only drawback of this approach is that it is not strictly backward
>>> compatible for MSVC. More specifically, it will work as before with
>>> ASCII filenames, but non-ASCII filenames that could be represented in
>>> the local codepage will no longer work unless they are first converted
>>> to utf8. We could of course add a CMake switch to maintain strict
>>> backward compatibility if deemed necessary.
>>>
>>> Thoughts?
>>>
>>> Tom
>>>
>>>
>>>
>>> On Tue, Oct 20, 2009 at 14:56, Brad King <brad.king at kitware.com> wrote:
>>>> Sean McBride wrote:
>>>>>
>>>>> On 10/19/09 11:15 AM, Brad King said:
>>>>>
>>>>>> As the primary maintainer of KWSys I prefer to put as little
>>>>>> in the library as possible.
>>>>>
>>>>> Perhaps I haven't been following closely enough, but do you mean you
>>>>> wouldn't want to create a utf8 lib from scratch in KWSys or that you
>>>>> don't even want a thin wrapper over utf-cpp in KWSys?
>>>>
>>>> Both.
>>>>
>>>> There is no reason to create a utf8 lib from scratch when there are
>>>> plenty of third-party libraries available.  We cannot do a thin-wrapper
>>>> because KWSys cannot have third-party dependencies.
>>>>
>>>> IMO KWSys already has too much.  Originally it was just supposed to avoid
>>>> duplicate Kitware-written code that was copied between VTK and ITK.  It
>>>> was a/my mistake to add things like the MD5 hash implementation to it.
>>>>
>>>>> If the latter, that means we'd end up with both a vtkUnicodeString and
>>>>> itkUnicodeString?  What if CMake needs to process utf8?
>>>>
>>>> We already have zlib in all three projects, named vtkzlib, itkzlib, and
>>>> cmzlib.  Each project mangles the symbols to avoid conflicts, and they
>>>> all support sharing a system-installed version.
>>>>
>>>> -Brad
>>>>
>>>
>
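
For illustration, steps 3) to 5) of the proposal quoted above could be
sketched roughly as follows. This is only a sketch under several
assumptions: the names IsValidUtf8, Widen and Utf8Fopen are
hypothetical and not part of ITK; the validity check is hand-written
as a stand-in for utfcpp's utf8::is_valid; the Windows branch uses the
Win32 call MultiByteToWideChar instead of utfcpp for the conversion;
and it opens the file with _wfopen, the FILE*-returning wide-character
counterpart of fopen.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#  include <wchar.h>
#  include <windows.h>
#endif

// Returns true if s is well-formed UTF-8 (hand-written stand-in for
// utfcpp's utf8::is_valid).
bool IsValidUtf8(const std::string& s)
{
  // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
  static const unsigned long minForLength[4] = { 0x0, 0x80, 0x800, 0x10000 };
  std::size_t i = 0;
  while (i < s.size())
  {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    unsigned long cp = 0;
    std::size_t extra = 0; // number of continuation bytes expected
    if (lead < 0x80)                { cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
    else { return false; }

    if (extra >= s.size() - i) { return false; } // truncated sequence
    for (std::size_t k = 1; k <= extra; ++k)
    {
      const unsigned char c = static_cast<unsigned char>(s[i + k]);
      if ((c & 0xC0) != 0x80) { return false; }
      cp = (cp << 6) | (c & 0x3F);
    }
    // Reject overlong encodings, surrogates and out-of-range values.
    if (cp < minForLength[extra] || cp > 0x10FFFF ||
        (cp >= 0xD800 && cp <= 0xDFFF))
    {
      return false;
    }
    i += extra + 1;
  }
  return true;
}

#ifdef _WIN32
// Windows only: convert UTF-8 to a UTF-16 std::wstring via the Win32 API.
static std::wstring Widen(const std::string& utf8)
{
  if (utf8.empty()) { return std::wstring(); }
  const int len = static_cast<int>(utf8.size());
  const int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                    utf8.data(), len, NULL, 0);
  if (n <= 0) { throw std::runtime_error("UTF-8 to UTF-16 conversion failed"); }
  std::wstring out(static_cast<std::size_t>(n), L'\0');
  MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                      utf8.data(), len, &out[0], n);
  return out;
}
#endif

// Hypothetical fopen-like helper taking a UTF-8 encoded filename.
// Throws std::invalid_argument if the name is not valid UTF-8 (step 3);
// returns NULL, like fopen, if the file cannot be opened.
std::FILE* Utf8Fopen(const std::string& utf8Name, const std::string& mode)
{
  if (!IsValidUtf8(utf8Name))
  {
    throw std::invalid_argument("filename is not valid UTF-8");
  }
#ifdef _WIN32
  return _wfopen(Widen(utf8Name).c_str(), Widen(mode).c_str());
#else
  // Elsewhere the narrow string is passed through unchanged.
  return std::fopen(utf8Name.c_str(), mode.c_str());
#endif
}
```

With a helper of this shape, an IO class would call
Utf8Fopen(m_FileName, "wb") where it currently calls fopen, and a
filename in the local codepage (for example Latin-1) would be reported
through the exception instead of silently opening the wrong file.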
