[Insight-developers] itkTimeStamp Test Failures Mistery

Tue Feb 17 13:38:14 EST 2009

Hi Luis,

Even though I am the unfortunate initial author of that time stamp
test, this bug is definitely out of my c++ expertise.

That being said, I am not sure that the new timeIncrement array is
used in a meaningful way. The modified_function of the test is indeed
called "simultaneously" by many threads. It should therefore only use
thread safe calls.

If I get it right the following line is OK because threadId is unique
for each thread:
  helper->counters[threadId]++;

However I don't get the following lines:
  const signed long time1 = tsp->GetMTime();
  tsp->Modified();
  const signed long time2 = tsp->GetMTime();
  helper->timeIncrement[threadId] = time2 - time1;

Since other threads might have called tsp->Modified() between the
allocation of time1 and time2, I see no reason for
helper->timeIncrement[threadId] to be equal to one.

Anyhow, it seems that the problem is either in the the timestamp
itself or in the multithreader.

When I first saw this bug I was planning to try and use an alternative
threading framework such as boost thread or a direct use of pthreads.

The second thing I wanted to try, was a revert to non-optimized
TimeStamps. Since I committed the test after the optimization, I am
not even sure that non-optimized TimeStamps don't suffer from the same
bug. I guess that even though the non-optimized version has that bug,
it would be far more difficult to highlight because the mutexes lock
would slow down everything.

My only fear with the optimized version is that even though the
increment is atomic (and thus thread safe) the casting assignment
might be more problematic:
  m_ModifiedTime = (unsigned long)InterlockedIncrement(&itkTimeStampTime);

If the problem really comes from here, maybe adding
  ts.Modified();
just after
  multithreader->SingleMethodExecute();
might make the test pass. This won't of course be a fix for the bug
but it might help understand what's wrong.

Unfortunately, I never took the time to actually try all that because
the test doesn't fail on my machine.

Hope this helps a little.

Cheers,
Tom

On Mon, Feb 16, 2009 at 20:29, Luis Ibanez <luis.ibanez at kitware.com> wrote:
>
> The TimeStamp test continues to mysteriously fail in several
> platforms, apparently at random occasions.
>
> mini3.nlm       MacOSX-gcc4.0-rel    Failed    17.00
> gingee          Linux-c++            Failed     6.16
> mini3.nlm       MacOSX-gcc4.0-rel    Failed    18.00
> zion.kitware    Linux-g++-4.1        Failed    10.34
> BillsBasement   Linux-gcc43-release  Failed    16.24
> crl.med.harvard.edu Linux-x86_64-release-gcc  Failed    5.58
> mini1.nlm       MacOSX-Xcode3.0-dbg           Failed   17.00
> murron.hobbs-hancock  Linux-gcc-4.3.2-x86_64  Failed  149.23
> mini2.nlm       MacOSX-Xcode3.0-dbg           Failed   17.00
> Eternia.kitware gcc_review_optreg_libxml2     Failed    6.46
> mini3.nlm       MacOSX-gcc4.0-rel             Failed   18.00
>
> So here is the mystery:
>
> The structure of the test is that it creates one TimeStamp
> instance.
>
> It sets up a function to be called from a large number of
> threads (128 in some platforms). Each call to the function,
> triggers an update in the TimeStamp instance.
>
> Therefore the expectation of the test is that after the
> call to SingleExecuteMethod() the Modified time of the
> TimeStamp will have increased by (at least) the number
> of threads.
>
> The increase may be larger than the number of threads
> if any other ITK class happens to have an instance of
> the TimeStamp updated in between...
>
> The test assumes that if the difference between the modified
> time before and after the call to
>
>     multithreader->SingleMethodExecute();
>
> if less than the number of threads, then the TimeStamp
> failed to increment its counter in between the round
> of threaded executions.
>
> The platforms above report such occurrences.
>
> However,
> The test also has two arrays with a number of elements
> equal to the number of threads, and in them we verify
> that the threaded function has indeed been called for
> each one of the threads.
>
> When the test fails, we still observe that such counters
> are incremented correctly.
>
> The failure cases are behaving "as if" a number of the
> increments to the time stamp where "delayed" to be
> executed after the call to SingleMethodExecute().
>
> It looks as if the threads were interrupted in the
> middle of their execution and then allowed to continue.
>
> Particularly because, when the test fail, by reporting
> a default in the TimeStamp increment, the next iteration
> report and excess by the same amount. E.g. as if the
> TimeStamp were catching up the missing increments in
> the next iteration of the test.
>
>
> Any suggestions, hints, advice will be appreciated,
>
>
>     Thanks
>
>
>        Luis
>
>
> _______________________________________________
> Powered by www.kitware.com
>
> Visit other Kitware open-source projects at
> http://www.kitware.com/opensource/opensource.html
>
> Please keep messages on-topic and check the ITK FAQ at:
> http://www.itk.org/Wiki/ITK_FAQ
>
> Follow this link to subscribe/unsubscribe:
> http://www.itk.org/mailman/listinfo/insight-developers
>