[ITK] [ITK-dev] [ITK Community] [Insight-developers] non-deterministic v4 registrations in 4.5.x

Simon Alexander skalexander at gmail.com
Wed Mar 19 12:24:22 EDT 2014


Brian,

I could have sworn I had initially added a follow up email clarifying this
but since I can't find it in the current quoted exchange, let me reiterate:

This is not a case of different results on different systems.  This is
a case of different results on the same system if you use a different
number of threads.

So while that possibly could be some odd intrinsics issue, for example, the
far more likely thing is that data partitioning is not being handled in a
way that ensures consistency.
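
To make the partitioning point concrete, here is a minimal sketch. It is
not ITK code and not how the Mattes metric is implemented; the function
names and the chunk size are hypothetical, and the workers are simulated
serially because only the order of the additions matters. The first
function splits the data by worker count, so the floating-point additions
are grouped differently for each count. The second always cuts the data
into the same fixed-size chunks and combines the partial sums in chunk
order, so the additions are grouped identically for any worker count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Splits the data into one contiguous range per worker. The grouping of
// the additions depends on numWorkers, so the rounding can differ too.
double NaiveSplitSum(const std::vector<double>& values,
                     std::size_t numWorkers)
{
  assert(numWorkers > 0);
  const std::size_t n = values.size();
  double total = 0.0;
  for (std::size_t w = 0; w < numWorkers; ++w) {
    double partial = 0.0;
    for (std::size_t i = w * n / numWorkers;
         i < (w + 1) * n / numWorkers; ++i) {
      partial += values[i];
    }
    total += partial;
  }
  return total;
}

// Cuts the data into fixed-size chunks regardless of numWorkers. Worker w
// takes chunks w, w + numWorkers, ...; each chunk's partial sum is stored
// by chunk index and the partials are combined in chunk order.
double FixedChunkSum(const std::vector<double>& values,
                     std::size_t numWorkers,
                     std::size_t chunkSize = 4096)
{
  assert(numWorkers > 0 && chunkSize > 0);
  const std::size_t n = values.size();
  const std::size_t numChunks = (n + chunkSize - 1) / chunkSize;
  std::vector<double> partial(numChunks, 0.0);
  for (std::size_t w = 0; w < numWorkers; ++w) {  // one pass per worker
    for (std::size_t c = w; c < numChunks; c += numWorkers) {
      const std::size_t end = std::min((c + 1) * chunkSize, n);
      for (std::size_t i = c * chunkSize; i < end; ++i) {
        partial[c] += values[i];
      }
    }
  }
  double total = 0.0;
  for (double p : partial) total += p;  // serial combine, chunk order
  return total;
}
```

In a real multithreaded version each pass of the outer worker loop would
run on its own thread; the chunk boundaries and the final combine order
would stay the same, which is what keeps the result independent of the
thread count.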

Originally I was also seeing inter-system differences due to internal
precision, but that was a separate issue and has been solved.

Hope that is more clear!



On Wed, Mar 19, 2014 at 12:13 PM, Simon Alexander <skalexander at gmail.com> wrote:

> Brian,
>
> Do you mean the generality of my AVX internal precision problem?
>
> I agree that is a very common issue. The surprising thing there was that
> we were already constraining the code generation in a way that worked
> across the different processor generations and types we used, up until we
> hit the first Haswell CPUs with AVX2 support (even though no AVX2
> instructions were generated).  Perhaps it shouldn't have surprised me, but
> it took me a few tests to work that out because the problem was confounded
> with the problem I discuss in this thread (which is unrelated).  Once I
> separated them it was easy to spot.
>
> So that is a solved issue for now, but I am still interested in the
> partitioning issue in the image metric, as I only have a workaround for
> now.
>
>
>
> On Wed, Mar 19, 2014 at 11:24 AM, brian avants <stnava at gmail.com> wrote:
>
>>
>> http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler
>>
>> just as an example of the generality of this problem
>>
>>
>> brian
>>
>>
>>
>>
>> On Wed, Mar 19, 2014 at 11:22 AM, Simon Alexander <skalexander at gmail.com> wrote:
>>
>>> Brian, Luis,
>>>
>>> Thanks.  I have been using Mattes as you suspect.
>>>
>>> I don't quite understand how precision is specifically the issue with the
>>> number of cores.  There are all kinds of issues with precision and order
>>> of operations in numerical analysis, but often data partitioning schemes
>>> (i.e. for concurrency) can be set up so that the actual sums are done the
>>> same way regardless of the number of workers, which keeps your final
>>> results identical.  Is there some reason this can't be done for the
>>> Mattes metric?  I really should look at the implementation to answer
>>> that, of course.
>>>
>>> Do you have a pointer to earlier discussions?  If I can find the time
>>> I'd like to dig into this a bit, but I'm not sure when I'll have the
>>> bandwidth.  I've "solved" this currently by constraining the core count.
>>>
>>> Perhaps interestingly, my earlier experiments were confounded a bit by a
>>> precision issue, but that had to do with intrinsics generation on my
>>> compiler behaving differently on systems with AVX2 (even though only AVX
>>> intrinsics were being generated).  So that made things confusing at first
>>> until I separated the issues.
>>>
>>>
>>> On Wed, Mar 19, 2014 at 9:49 AM, brian avants <stnava at gmail.com> wrote:
>>>
>>>> yes - we had several discussions about this during v4 development.
>>>>
>>>> experiments showed that differences are due to precision.
>>>>
>>>> one solution was to truncate precision to the point that is reliable.
>>>>
>>>> but there are problems with that too.   last i checked, this was an
>>>> open problem, in general, in computer science.
>>>>
>>>>
>>>> brian
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Mar 19, 2014 at 9:16 AM, Luis Ibanez <luis.ibanez at kitware.com> wrote:
>>>>
>>>>> Hi Simon,
>>>>>
>>>>> We are aware of some multi-threading related issues in
>>>>> the registration process that result in metric values changing
>>>>> depending on the number of cores used.
>>>>>
>>>>> Are you using the MattesMutualInformationMetric ?
>>>>>
>>>>> At some point it was suspected that the problem was the
>>>>> result of accumulated rounding in the contributions that
>>>>> each pixel makes to the metric value... this may or may
>>>>> not be related to what you are observing.
>>>>>
>>>>>
>>>>>    Thanks
>>>>>
>>>>>        Luis
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Feb 20, 2014 at 3:27 PM, Simon Alexander <
>>>>> skalexander at gmail.com> wrote:
>>>>>
>>>>>> I've been finding some regressions in registration results when using
>>>>>> systems with different numbers of cores (so the thread count is different).
>>>>>> This is resolved by fixing the global maximum number of threads.
>>>>>>
>>>>>> It's difficult for me to run the identical code against 4.4.2, but
>>>>>> similar experiments were run in that timeframe without these regressions.
>>>>>>
>>>>>> I recall that there were changes affecting multithreading in the v4
>>>>>> registration in the 4.5.0 release, so I thought this might be a side effect.
>>>>>>
>>>>>> So a few questions:
>>>>>>
>>>>>> Is this behaviour expected?
>>>>>>
>>>>>> Am I correct that this was not the behaviour in 4.4.x ?
>>>>>>
>>>>>> Does anyone who has a feel for the recent changes 4.4.2 -> 4.5.[0,1]
>>>>>> have a good idea where to start looking?  I haven't yet dug into the
>>>>>> multithreading architecture, but this "smells" like a data partitioning
>>>>>> issue to me.
>>>>>>
>>>>>> Any other thoughts?
>>>>>>
>>>>>> cheers,
>>>>>> Simon
>>>>>>
>>>>>> _______________________________________________
>>>>>> Powered by www.kitware.com
>>>>>>
>>>>>> Visit other Kitware open-source projects at
>>>>>> http://www.kitware.com/opensource/opensource.html
>>>>>>
>>>>>> Kitware offers ITK Training Courses, for more information visit:
>>>>>> http://kitware.com/products/protraining.php
>>>>>>
>>>>>> Please keep messages on-topic and check the ITK FAQ at:
>>>>>> http://www.itk.org/Wiki/ITK_FAQ
>>>>>>
>>>>>> Follow this link to subscribe/unsubscribe:
>>>>>> http://www.itk.org/mailman/listinfo/insight-developers
>>>>>>
>>>>>> _______________________________________________
>>>>>> Community mailing list
>>>>>> Community at itk.org
>>>>>> http://public.kitware.com/cgi-bin/mailman/listinfo/community
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>