[Insight-developers] Image writing with unicode filename impossible with MSVC

Luis Ibanez luis.ibanez at kitware.com
Tue Oct 13 09:58:29 EDT 2009


Hi Tom,

I'm curious to look at other libraries and
see how they have managed this issue.

A good candidate to learn from is Qt, since they
have such an international user base, and support
anything from KDE to cell phone applications.

Some greps in: qt-x11-opensource-src-4.5.0-beta1

reveal instances like:

src/gui/kernel/qapplication.cpp

lines 4282-4303

#ifndef QT_NO_SESSIONMANAGER
#if defined(Q_WS_WIN) || defined(Q_WS_MAC) || defined(Q_WS_QWS)

#if defined(Q_OS_WINCE)
HRESULT qt_CoCreateGuid(GUID* guid)
{
    // We will use the following information to create the GUID
    // 1. absolute path to application
    wchar_t tempFilename[512];
    if (!GetModuleFileNameW(0, tempFilename, 512))
        return S_FALSE;
    unsigned int hash = qHash(QString::fromUtf16((const unsigned short *) tempFilename));
    guid->Data1 = hash;
    // 2. creation time of file
    QFileInfo info(QString::fromUtf16((const unsigned short *) tempFilename));
    guid->Data2 = qHash(info.created().toTime_t());
    // 3. current system time
    guid->Data3 = qHash(QDateTime::currentDateTime().toTime_t());
    return S_OK;
}
#if !defined(OLE32_MCOMGUID) || defined(QT_WINCE_FORCE_CREATE_GUID)
#define CoCreateGuid qt_CoCreateGuid
#endif



The classical (inherently Evil) #ifdef for Windows...
and some internal methods for dealing with conversions
to and from QString.
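
To make the conversion part concrete, here is a rough sketch of what a
UTF-16 to UTF-8 conversion has to do, written without Qt. The names
AppendUtf8 and Utf16ToUtf8 are hypothetical (they are neither Qt nor ITK
functions), and the sketch assumes a C++11 compiler for std::u16string.
It is only meant to illustrate the surrogate-pair handling, not to
describe how QString::fromUtf16 is implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not Qt, not ITK): append one code point as UTF-8.
static void AppendUtf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a UTF-16 string to UTF-8.
// A high surrogate followed by a low surrogate is combined into one
// code point; an unpaired surrogate is passed through unchanged
// (a stricter version would reject it).
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            const std::uint32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i; // the low surrogate has been consumed
            }
        }
        AppendUtf8(out, cp);
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits wide, a wchar_t buffer could be
copied into a std::u16string and passed to such a function; on Linux
and Mac wchar_t is typically 32 bits, so the same cast would not apply.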


Similar in lines 4324-4333:


#if defined(Q_WS_WIN)
    wchar_t guidstr[40];
    GUID guid;
    CoCreateGuid(&guid);
    StringFromGUID2(guid, guidstr, 40);
    id = QString::fromUtf16((ushort*)guidstr);
    CoCreateGuid(&guid);
    StringFromGUID2(guid, guidstr, 40);
    key = QString::fromUtf16((ushort*)guidstr);
#endif


In total there are only 37 instances of wchar_t
in Qt (once we exclude the 3rdparty libraries).


In short: I don't see too much trouble in
making this sort of modification in ITK itself.
What will be tricky is propagating it down to
each ImageIO class and its respective
library (PNG, JPEG, gdcm, Meta, Nifti).
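
As a sketch of the kind of change each library would need, assuming we
settle on UTF-8 encoded char* filenames at the API level: a single
helper could hide the #ifdef and replace direct fopen() calls. The name
OpenUtf8 is hypothetical and nothing like it exists in ITK today;
MultiByteToWideChar and _wfopen are the standard Win32 / MSVC runtime
functions.

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

#ifdef _WIN32
// Convert a UTF-8 string to UTF-16 using the Win32 API.
static std::wstring Widen(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();
    const int len = static_cast<int>(utf8.size());
    // First call: ask for the number of wchar_t needed.
    const int n = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, NULL, 0);
    if (n <= 0) return std::wstring();
    std::wstring wide(n, L'\0');
    // Second call: perform the conversion into the buffer.
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, &wide[0], n);
    return wide;
}
#endif

// Hypothetical helper (not an ITK API): open a file whose name is
// given as UTF-8, regardless of the platform.
std::FILE* OpenUtf8(const std::string& utf8Path, const std::string& mode)
{
#ifdef _WIN32
    // The narrow fopen() would interpret the name in the current
    // codepage, so go through the wide-character entry point.
    return _wfopen(Widen(utf8Path).c_str(), Widen(mode).c_str());
#else
    // Assumption: on Linux and Mac the filesystem accepts UTF-8 names.
    return std::fopen(utf8Path.c_str(), mode.c_str());
#endif
}
```

This only covers the libraries that open files through fopen(); the
ones using ifstream/ofstream or their own open routines would each need
an equivalent change.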

The option of using ICU is tempting.

The license is fine:
http://source.icu-project.org/repos/icu/icu/trunk/license.html

The source code size, however, is 69 MB.    :-/

ITK itself (with all the third party libraries) is 133 MB.

I'm finding it difficult to justify including such a large library
for dealing with a rather localized feature.

.....

Let's start with the isolated experiment,
to test the waters... (that is, just a test that
doesn't yet modify any of the ITK classes).
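
A minimal sketch of what such an experiment could look like, using only
the standard library. RoundTripFile is a hypothetical name, and the
assumption is that the filename is passed as a UTF-8 encoded char
string. I would expect this to pass on Linux and Mac, and to fail or
create a mangled name with MSVC, which is exactly what the test is
meant to reveal.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical test helper (not an ITK class): write a small payload
// to the given file, read it back, delete the file, and report
// whether the round trip succeeded.
bool RoundTripFile(const std::string& fileName)
{
    const std::string payload = "unicode filename test";

    {
        std::ofstream out(fileName.c_str(), std::ios::binary);
        if (!out) return false; // could not create the file
        out << payload;
    } // the stream is closed here

    std::string readBack;
    {
        std::ifstream in(fileName.c_str(), std::ios::binary);
        if (!in) return false; // could not reopen the file
        std::getline(in, readBack);
    }

    std::remove(fileName.c_str());
    return readBack == payload;
}
```

A driver would call it with a few names, for example
RoundTripFile("itk_unicode_\xCE\xB1\xCE\xB2\xCE\xB3.txt") for a name
containing Greek letters, and report the result per platform on the
dashboard.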

Would you like to take a first crack at it ?



     Luis


---------------------------------------------------------------------------
On Tue, Oct 13, 2009 at 6:51 AM, Tom Vercauteren
<tom.vercauteren at m4x.org> wrote:
> Hi Luis,
>
> Thanks for your feedback. Please find my comments below.
>
>> Sean's warning about the inherent Evilness of wchar_t was
>> quite convincing...
>>
>> That plus the prospect of adding code that only works
>> on Windows is not quite motivating...
>
> I do agree that c++ doesn't provide much incentive for dealing with
> unicode questions.
>
> Sean was perfectly right in saying that wchar_t is evil (especially on
> linux and mac). However, there seems to be no simple way to avoid
> dealing with evil in the real world...
>
> Using Visual Studio, the only way we could find that works out of the box
> to open a file that has a unicode filename is to use the Windows-only
> function _wopen that takes a wchar_t * as argument:
> http://msdn.microsoft.com/en-us/library/z0kc8e3z(VS.80).aspx
>
>
>> So,
>> to be pragmatic, we certainly could add the wchar_t
>> versions....
>>
>> but the question is:
>>
>>
>>                  How is this going to help ?
>>
>>
>> if what matters is what each one of the particular ImageIO
>> classes does with the m_Filename string.
>>
>> To be more precise, most of the ImageIO classes end up
>> passing the string to a third party library (such as PNG,
>> TIFF, JPEG, META, Nifti, Nrrd, GDCM...), so, if that library
>> doesn't support wchar_t,   we are still dead in the water.
>
> That's correct. I reckon it is not a simple problem to solve. Note
> however that some libraries may already support unicode filenames. At
> least libtiff does:
>  TIFFOpenW uses wchar_t (TIFFOpen uses char)
>  http://www.itk.org/cgi-bin/viewcvs.cgi/Utilities/itktiff/tiffio.h?root=Insight&sortby=file&view=markup
>
>
>> Most of these libraries will end up using :
>>
>>     ifstream.open()
>>     ofstream.open()
>>     fopen()
>>
>>
>> How are we going to pass those wchar_t down to these
>> other libraries and have them interpreted as the
>> proper character set ?
>>
>>
>> Are we going to add more Windows conditional code
>> to ALL of these libraries ?
>> (I hope the answer here is: no)
>
> I do not know any other workable alternative :(
>
> But please, if someone knows, speak out!
>
>
>> I'll suggest we start with adding a minimal test
>> that just attempts to do simple operations such as
>> creating new files with filenames that require a particular
>> encoding, and see how this can be made portable
>> across platforms.
>
> Sure, this seems good.
>
>
>> Would that make sense ?
>
> It definitely makes sense to me.
>
> The other thing we need to take care of is how a new wchar_t * API
> would behave on non-visual c++ systems.
>
> A potential alternative could be to add the ICU library
>  http://site.icu-project.org/
> to the list of third party-library and use ICU for the API.
>
> This would have the advantage of providing a portable unicode
> interface. The gory details can then be handled at the latest possible
> stage and could be conditioned on the platform, e.g.:
>
>  * use conversion to wchar_t * on windows if the third party IO
> library supports it
>  http://icu-project.org/apiref/icu4c/ustring_8h.html#184562a078b0a961d9281b0c29bb5406
>
>  * use (if possible) conversion to local codepage encoded char * on windows
>  http://icu-project.org/apiref/icu4c/ustring_8h.html#c80eca8339bf48f3cb650d31d4a9ef80
>
>  * use conversion to utf8 encoded char * otherwise
>  http://icu-project.org/apiref/icu4c/ustring_8h.html#0ca7af2cf47b116454eed92331594afa
>
> Would that look better to you?
>
> Tom
>
>>
>>
>>      Luis
>>
>>
>> -----------------------------------------------------------------------------------------
>> On Wed, Sep 30, 2009 at 12:04 PM, Tom Vercauteren
>> <tom.vercauteren at m4x.org> wrote:
>>> Hi all,
>>>
>>> Benjamin, a colleague of mine, tried to move forward on the topic of
>>> unicode filenames. There was a thread going on on the mailing list in
>>> 2007:
>>> http://www.itk.org/pipermail/insight-users/2007-July/022828.html
>>>
>>> However, there seems to be no workable alternative with visual c++.
>>> Basically, itk relies on char* filenames that are assumed to be encoded in
>>> * utf-8 on linux and mac
>>> * some local encoding on windows based on the current codepage
>>>
>>> The problem is that a given codepage does not cover all possible
>>> unicode characters (unlike utf-8). It seems that on windows wchar_t*
>>> is required.
>>>
>>> There is more information and a unit test on the bug tracker:
>>> http://public.kitware.com/Bug/view.php?id=9623
>>>
>>> Are we missing something here?
>>>
>>> Best regards,
>>> Tom
>>>
>>
>
