[Insight-developers] itkTimeStamp Test Failures Mistery

Wed Feb 18 04:02:30 EST 2009

Hi Luis,

> 1) Yeap, I haven't experimented either with removing the
>   optimization, to see if this is a "feature" of the
>   optimized code. Note that since it shows up in multiple
>   platforms, it would be affected the different optimization
>   methods that are platform-specific.

That's right but all optimized version use the same pattern:
  m_ModifiedTime = (unsigned
long)some_atomic_increment_function(itkTimeStampTime);
which is why the problem might be in the cast and assign.

> 2) I see your point regarding the timeIncrement[] array.
>   It is really not testing that the tsp Modified time was
>   incremented in *this* thread.

I removed this array from the test.

> 3) This still leaves us with the mistery of why some of
>   the counts fail.  Notice that the counters[] array,
>   still indicates that all the threads have been visited.

Indeed, this leaves me perplexed...

> I guess I'll try disabling to optimization to see what
> happens in that case...

I realized that it was not necessary to modify the TimeStamp code in
order to test the non-optimized behavior. We can indeed only add,
within the test, a mutex lock around the call to modified. I have
committed it to the repository:
http://www.itk.org/cgi-bin/viewcvs.cgi/Testing/Code/Common/itkTimeStampTest.cxx?root=Insight&r1=1.8&r2=1.9&sortby=date

Both optimized and pseudo-non-optimized versions will thus be tested.
I hope this will help better understand the problem.

Regards,
Tom

>
>
>    Thanks
>
>
>        Luis
>
>
>
> ----------------------
> Tom Vercauteren wrote:
>>
>> Hi Luis,
>>
>> Even though I am the unfortunate initial author of that time stamp
>> test, this bug is definitely out of my c++ expertise.
>>
>> That being said, I am not sure that the new timeIncrement array is
>> used in a meaningful way. The modified_function of the test is indeed
>> called "simultaneously" by many threads. It should therefore only use
>> thread safe calls.
>>
>> If I get it right the following line is OK because threadId is unique
>> for each thread:
>>  helper->counters[threadId]++;
>>
>> However I don't get the following lines:
>>  const signed long time1 = tsp->GetMTime();
>>  tsp->Modified();
>>  const signed long time2 = tsp->GetMTime();
>>  helper->timeIncrement[threadId] = time2 - time1;
>>
>> Since other threads might have called tsp->Modified() between the
>> allocation of time1 and time2, I see no reason for
>> helper->timeIncrement[threadId] to be equal to one.
>>
>>
>> Anyhow, it seems that the problem is either in the the timestamp
>> itself or in the multithreader.
>>
>> When I first saw this bug I was planning to try and use an alternative
>> threading framework such as boost thread or a direct use of pthreads.
>>
>> The second thing I wanted to try, was a revert to non-optimized
>> TimeStamps. Since I committed the test after the optimization, I am
>> not even sure that non-optimized TimeStamps don't suffer from the same
>> bug. I guess that even though the non-optimized version has that bug,
>> it would be far more difficult to highlight because the mutexes lock
>> would slow down everything.
>>
>> My only fear with the optimized version is that even though the
>> increment is atomic (and thus thread safe) the casting assignment
>> might be more problematic:
>>  m_ModifiedTime = (unsigned long)InterlockedIncrement(&itkTimeStampTime);
>>
>> If the problem really comes from here, maybe adding
>>  ts.Modified();
>> just after
>>  multithreader->SingleMethodExecute();
>> might make the test pass. This won't of course be a fix for the bug
>> but it might help understand what's wrong.
>>
>> Unfortunately, I never took the time to actually try all that because
>> the test doesn't fail on my machine.
>>
>> Hope this helps a little.
>>
>> Cheers,
>> Tom
>>
>>
>> On Mon, Feb 16, 2009 at 20:29, Luis Ibanez <luis.ibanez at kitware.com>
>> wrote:
>>
>>> The TimeStamp test continues to mysteriously fail in several
>>> platforms, apparently at random occasions.
>>>
>>> mini3.nlm       MacOSX-gcc4.0-rel    Failed    17.00
>>> gingee          Linux-c++            Failed     6.16
>>> mini3.nlm       MacOSX-gcc4.0-rel    Failed    18.00
>>> zion.kitware    Linux-g++-4.1        Failed    10.34
>>> BillsBasement   Linux-gcc43-release  Failed    16.24
>>> crl.med.harvard.edu Linux-x86_64-release-gcc  Failed    5.58
>>> mini1.nlm       MacOSX-Xcode3.0-dbg           Failed   17.00
>>> murron.hobbs-hancock  Linux-gcc-4.3.2-x86_64  Failed  149.23
>>> mini2.nlm       MacOSX-Xcode3.0-dbg           Failed   17.00
>>> Eternia.kitware gcc_review_optreg_libxml2     Failed    6.46
>>> mini3.nlm       MacOSX-gcc4.0-rel             Failed   18.00
>>>
>>> So here is the mystery:
>>>
>>> The structure of the test is that it creates one TimeStamp
>>> instance.
>>>
>>> It sets up a function to be called from a large number of
>>> threads (128 in some platforms). Each call to the function,
>>> triggers an update in the TimeStamp instance.
>>>
>>> Therefore the expectation of the test is that after the
>>> call to SingleExecuteMethod() the Modified time of the
>>> TimeStamp will have increased by (at least) the number
>>> of threads.
>>>
>>> The increase may be larger than the number of threads
>>> if any other ITK class happens to have an instance of
>>> the TimeStamp updated in between...
>>>
>>> The test assumes that if the difference between the modified
>>> time before and after the call to
>>>
>>>   multithreader->SingleMethodExecute();
>>>
>>> if less than the number of threads, then the TimeStamp
>>> failed to increment its counter in between the round
>>> of threaded executions.
>>>
>>> The platforms above report such occurrences.
>>>
>>> However,
>>> The test also has two arrays with a number of elements
>>> equal to the number of threads, and in them we verify
>>> that the threaded function has indeed been called for
>>> each one of the threads.
>>>
>>> When the test fails, we still observe that such counters
>>> are incremented correctly.
>>>
>>> The failure cases are behaving "as if" a number of the
>>> increments to the time stamp where "delayed" to be
>>> executed after the call to SingleMethodExecute().
>>>
>>> It looks as if the threads were interrupted in the
>>> middle of their execution and then allowed to continue.
>>>
>>> Particularly because, when the test fail, by reporting
>>> a default in the TimeStamp increment, the next iteration
>>> report and excess by the same amount. E.g. as if the
>>> TimeStamp were catching up the missing increments in
>>> the next iteration of the test.
>>>
>>>
>>> Any suggestions, hints, advice will be appreciated,
>>>
>>>
>>>   Thanks
>>>
>>>
>>>      Luis
>>>
>>>
>>> _______________________________________________
>>> Powered by www.kitware.com
>>>
>>> Visit other Kitware open-source projects at
>>> http://www.kitware.com/opensource/opensource.html
>>>
>>> Please keep messages on-topic and check the ITK FAQ at:
>>> http://www.itk.org/Wiki/ITK_FAQ
>>>
>>> Follow this link to subscribe/unsubscribe:
>>> http://www.itk.org/mailman/listinfo/insight-developers
>>>
>>
>>
>