[Insight-developers] Image writing with unicode filename impossible with MSVC

Tom Vercauteren tom.vercauteren at m4x.org
Tue Oct 13 06:51:53 EDT 2009


Hi Luis,

Thanks for your feedback. Please find my comments below.

> Sean's warning about the inherent evilness of wchar_t was
> quite convincing...
>
> That plus the perspective of adding code that only works
> on Windows are not quite motivating...

I do agree that C++ doesn't provide much incentive for dealing with
Unicode questions.

Sean was perfectly right in saying that wchar_t is evil (especially on
Linux and Mac). However, there seems to be no simple way to avoid
dealing with evil in the real world...

With Visual Studio, the only approach we found that works out of the
box to open a file with a Unicode filename is the Windows-only
function _wopen, which takes a wchar_t * as its filename argument:
http://msdn.microsoft.com/en-us/library/z0kc8e3z(VS.80).aspx
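
As a rough illustration of that approach, here is a minimal sketch that
assumes the caller holds the filename as UTF-8. The helper names
open_utf8_for_writing and close_fd are hypothetical (they are not ITK
functions). On Windows the name is converted with MultiByteToWideChar
and handed to _wopen; elsewhere the UTF-8 bytes go straight to open().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <fcntl.h>
#include <sys/stat.h>
#ifdef _WIN32
#include <io.h>
#include <windows.h>
#else
#include <unistd.h>
#endif

// Hypothetical helper (not part of ITK): create or truncate a file whose
// name is given as UTF-8. Returns a file descriptor, or -1 on failure.
int open_utf8_for_writing(const std::string& utf8_name)
{
  assert(!utf8_name.empty());
#ifdef _WIN32
  // First call: ask how many wchar_t units are needed (terminating null included).
  const int n = MultiByteToWideChar(CP_UTF8, 0, utf8_name.c_str(), -1, NULL, 0);
  if (n <= 0) return -1;
  std::wstring wide(static_cast<std::size_t>(n), L'\0');
  // Second call: perform the UTF-8 to UTF-16 conversion.
  if (MultiByteToWideChar(CP_UTF8, 0, utf8_name.c_str(), -1, &wide[0], n) <= 0)
    return -1;
  return _wopen(wide.c_str(),
                _O_WRONLY | _O_CREAT | _O_TRUNC | _O_BINARY,
                _S_IREAD | _S_IWRITE);
#else
  // Assumption: the filesystem accepts UTF-8 encoded char* names as-is.
  return open(utf8_name.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
#endif
}

// Close a descriptor returned by open_utf8_for_writing.
int close_fd(int fd)
{
#ifdef _WIN32
  return _close(fd);
#else
  return close(fd);
#endif
}
```

A caller would check that the returned descriptor is non-negative and
release it with close_fd. This only covers the open call itself; it
does not help when a third party library opens the file on its own.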


> So,
> to be pragmatic, we certainly could add the wchar_t
> versions....
>
> but the question is:
>
>
>                  How is this going to help ?
>
>
> if what matters is what each one of the particular ImageIO
> classes does with the m_Filename string.
>
> To be more precise, most of the ImageIO classes end up
> passing the string to a third party library (such as PNG,
> TIFF, JPEG, META, Nifti, Nrrd, GDCM...), so, if that library
> doesn't support wchar_t,   we are still dead in the water.

That's correct. I reckon it is not a simple problem to solve. Note,
however, that some libraries may already support Unicode filenames. At
least libtiff does:
  TIFFOpenW takes a wchar_t * filename (TIFFOpen takes a char *)
  http://www.itk.org/cgi-bin/viewcvs.cgi/Utilities/itktiff/tiffio.h?root=Insight&sortby=file&view=markup


> Most of these libraries will end up using :
>
>     ifstream.open()
>     ofstream.open()
>     fopen()
>
>
> How are we going to pass those wchar_t down these
> other libraries and have them to be interpreted as the
> proper character set ?
>
>
> Are we going to add more Windows conditional code
> to ALL of these libraries ?
> (I hope the answer here is: no)

I do not know of any other workable alternative :(

But please, if someone knows, speak out!


> I'll suggest we start with adding minimal tests
> that just attempts to do simple operations such as
> creating new files with filenames that require a particular
> encoding, and see how this can be made portable
> across platforms.

Sure, this seems good.
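
Such a test could look roughly like the sketch below. It is not the
unit test attached to the bug tracker entry; filename_round_trips is a
hypothetical name, and the sketch deliberately goes through the narrow
char * interfaces (std::ofstream, std::ifstream, std::remove) because
those are what most of the third party libraries end up using.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical test helper: returns true if a file with the given name
// can be created, read back and deleted through the narrow char* API.
bool filename_round_trips(const std::string& name)
{
  assert(!name.empty());
  const std::string payload = "itk";

  {
    std::ofstream out(name.c_str(), std::ios::binary);
    if (!out) return false;   // the file could not be created
    out << payload;
    out.close();
    if (!out) return false;   // writing or flushing failed
  }

  std::string got;
  {
    std::ifstream in(name.c_str(), std::ios::binary);
    if (!in) return false;    // the name did not resolve to the same file
    in >> got;
  }

  // Remove the file so that repeated runs start from a clean state.
  const bool removed = (std::remove(name.c_str()) == 0);
  return removed && got == payload;
}
```

Calling it with a plain ASCII name and then with a UTF-8 encoded name
such as "\xC3\xA9\xE6\x97\xA5.txt" gives a quick portability check.
With Visual C++ the second call would be expected to fail or to create
a file with a different name, since the bytes are interpreted in the
current codepage.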


> Would that make sense ?

It definitely makes sense to me.

The other thing we need to take care of is how a new wchar_t * API
would behave on systems other than Visual C++.

A potential alternative could be to add the ICU library
  http://site.icu-project.org/
to the list of third-party libraries and use ICU for the API.

This would have the advantage of providing a portable Unicode
interface. The gory details can then be handled at the latest possible
stage and could be conditioned on the platform, e.g.:

  * use conversion to wchar_t * on Windows if the third-party IO
    library supports it
  http://icu-project.org/apiref/icu4c/ustring_8h.html#184562a078b0a961d9281b0c29bb5406

  * use (if possible) conversion to local codepage encoded char * on Windows
  http://icu-project.org/apiref/icu4c/ustring_8h.html#c80eca8339bf48f3cb650d31d4a9ef80

  * use conversion to UTF-8 encoded char * otherwise
  http://icu-project.org/apiref/icu4c/ustring_8h.html#0ca7af2cf47b116454eed92331594afa
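
To make the late conversion idea concrete, here is a minimal sketch of
the first and last cases. It does not use ICU: a hand-written UTF-16 to
UTF-8 encoder stands in for the ICU conversion calls so that the
example stays self-contained, and std::u16string stands in for an ICU
string. The names utf16_to_utf8, NativePath and to_native_path are
hypothetical. The local codepage case is not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for an ICU conversion call (assumption: ICU is not used here).
// Encodes UTF-16 code units as UTF-8; unpaired surrogates become U+FFFD.
std::string utf16_to_utf8(const std::u16string& in)
{
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    char32_t cp = in[i];
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      // High surrogate: combine with the following low surrogate if present.
      if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
        ++i;
      } else {
        cp = 0xFFFD;
      }
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      cp = 0xFFFD;  // low surrogate without a preceding high surrogate
    }
    assert(cp <= 0x10FFFF);

    if (cp < 0x80) {
      out += static_cast<char>(cp);
    } else if (cp < 0x800) {
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}

// The platform decision is taken only here, at the last possible stage.
#ifdef _WIN32
typedef std::wstring NativePath;  // wchar_t holds UTF-16 code units on Windows
inline NativePath to_native_path(const std::u16string& name)
{
  return NativePath(name.begin(), name.end());
}
#else
typedef std::string NativePath;   // UTF-8 encoded char* everywhere else
inline NativePath to_native_path(const std::u16string& name)
{
  return utf16_to_utf8(name);
}
#endif
```

The result of to_native_path(name).c_str() would then go to a wchar_t *
entry point such as TIFFOpenW on Windows, or to the usual char * entry
point elsewhere. With ICU in place, the hand-written encoder would be
replaced by the corresponding ICU conversion function.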

Would that look better to you?

Tom

>
>
>      Luis
>
>
> -----------------------------------------------------------------------------------------
> On Wed, Sep 30, 2009 at 12:04 PM, Tom Vercauteren
> <tom.vercauteren at m4x.org> wrote:
>> Hi all,
>>
>> Benjamin, a colleague of mine, tried to move forward on the topic of
>> Unicode filenames. There was a thread on the mailing list in
>> 2007:
>> http://www.itk.org/pipermail/insight-users/2007-July/022828.html
>>
>> However, there seems to be no workable alternative with Visual C++.
>> Basically, ITK relies on char* filenames that are assumed to be encoded in
>> * UTF-8 on Linux and Mac
>> * some local encoding on Windows based on the current codepage
>>
>> The problem is that a given codepage does not cover all possible
>> Unicode characters (unlike UTF-8). It seems that on Windows wchar_t*
>> is required.
>>
>> There is more information and a unit test on the bug tracker:
>> http://public.kitware.com/Bug/view.php?id=9623
>>
>> Are we missing something here?
>>
>> Best regards,
>> Tom

