[ITK] [ITK-dev] [ITK Community] [Insight-developers] non-deterministic v4 registrations in 4.5.x

Simon Alexander skalexander at gmail.com
Wed Mar 19 12:43:20 EDT 2014


Thanks for the summary, Brian.

A lot of partitioning issues fundamentally come down to the lack of
associativity and distributivity of floating-point operations.  I'm not
sure I can do anything practical to improve it, but I will have a look if
I can find a bit of my "copious free time".
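
To make the associativity point concrete, here is a minimal Python sketch.
It is not ITK code: the function names and the four-value input are made up
for illustration, and the "workers" are simulated sequentially rather than
run as real threads.  It shows a sum whose value changes with the worker
count when the slices depend on that count, and a variant where the chunk
boundaries and the combine order are fixed, so the worker count no longer
matters.

```python
def naive_sum(values):
    """Plain left-to-right floating-point sum (explicit loop, no compensation)."""
    total = 0.0
    for v in values:
        total += v
    return total


def sliced_sum(values, n_workers):
    """Split into n_workers contiguous slices, sum each, then combine.

    The slice boundaries depend on n_workers, so the order of the
    floating-point additions changes with the worker count.
    """
    n = len(values)
    bounds = [n * i // n_workers for i in range(n_workers + 1)]
    partials = [naive_sum(values[bounds[i]:bounds[i + 1]])
                for i in range(n_workers)]
    return naive_sum(partials)


def fixed_chunk_sum(values, n_workers, chunk_size=2):
    """Sum with chunk boundaries that do NOT depend on n_workers.

    Chunks are handed to the (simulated) workers round-robin, but each
    partial sum is stored at its chunk index and the partials are combined
    in chunk order, so every addition happens in the same order for any
    worker count.
    """
    chunks = [values[i:i + chunk_size]
              for i in range(0, len(values), chunk_size)]
    partials = [0.0] * len(chunks)
    for worker in range(n_workers):
        for idx in range(worker, len(chunks), n_workers):
            partials[idx] = naive_sum(chunks[idx])
    return naive_sum(partials)


# Floating-point addition is not associative.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

# Hypothetical input chosen so the effect is visible with four values.
values = [1e16, 1.0, -1e16, 1.0]

print(sliced_sum(values, 1))  # 1.0
print(sliced_sum(values, 2))  # 0.0

# Same result for every worker count.
print([fixed_chunk_sum(values, w) for w in (1, 2, 3, 4)])
```

Note that the fixed-chunk version is reproducible, not more accurate: the
exact sum of this input is 2.0, and none of the variants above returns it.
Whether the v4 metrics could be partitioned this way is a separate
question that would need a look at the implementation.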


On Wed, Mar 19, 2014 at 12:29 PM, brian avants <stnava at gmail.com> wrote:

> yes - i understand.
>
> * matt mccormick implemented compensated summation to address this - it
> helps but is not a full fix
>
> * truncating floating point precision greatly reduces the effect you are
> talking about but is unsatisfactory to most people ... not sure if the
> functionality for that truncation was taken out of the v4 metrics but it
> was in there at one point.
>
> * there may be a small and undiscovered bug that contributes to this in
> mattes specifically but i don't think that's the issue.  we saw this effect
> even in mean squares.  if there is a bug it may be beyond just mattes.   we
> cannot disprove that there is a bug.  if anyone knows of a way to do that,
> let me know.
>
> * any help is appreciated
>
>
> brian
>
>
>
>
> On Wed, Mar 19, 2014 at 12:24 PM, Simon Alexander <skalexander at gmail.com> wrote:
>
>> Brian,
>>
>> I could have sworn I had initially added a follow up email clarifying
>> this but since I can't find it in the current quoted exchange, let me
>> reiterate:
>>
>> This is not a case of different results on different systems.  This
>> is a case of different results on the same system if you use a different
>> number of threads.
>>
>> So while that possibly could be some odd intrinsics issue, for example,
>> the far more likely thing is that data partitioning is not being handled in
>> a way that ensures consistency.
>>
>> Originally I was also seeing inter-system differences due to internal
>> precision, but that was a separate issue and has been solved.
>>
>> Hope that is more clear!
>>
>>
>>
>> On Wed, Mar 19, 2014 at 12:13 PM, Simon Alexander <skalexander at gmail.com> wrote:
>>
>>> Brian,
>>>
>>> Do you mean the generality of my AVX internal precision problem?
>>>
>>> I agree that is a very common issue.  The surprising thing there was that
>>> we were already constraining the code generation in a way that worked
>>> across the different processor generations and types we used, up until we
>>> hit the first Haswell cpus with AVX2 support (even though no AVX2
>>> instructions were generated).  Perhaps it shouldn't have surprised me, but
>>> it took me a few tests to work that out because the problem was confounded
>>> with the problem I discuss in this thread (which is unrelated).  Once I
>>> separated them it was easy to spot.
>>>
>>> So that is a solved issue for now, but I am still interested in the
>>> partitioning issue in the image metric, as I only have a workaround for
>>> now.
>>>
>>>
>>>
>>> On Wed, Mar 19, 2014 at 11:24 AM, brian avants <stnava at gmail.com> wrote:
>>>
>>>>
>>>> http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler
>>>>
>>>> just as an example of the generality of this problem
>>>>
>>>>
>>>> brian
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Mar 19, 2014 at 11:22 AM, Simon Alexander <
>>>> skalexander at gmail.com> wrote:
>>>>
>>>>> Brian, Luis,
>>>>>
>>>>> Thanks.  I have been using Mattes as you suspect.
>>>>>
>>>>> I don't quite understand how precision is specifically the issue with
>>>>> # of cores.  There are all kinds of issues with precision and order of
>>>>> operations in numerical analysis, but often data partitioning (i.e. for
>>>>> concurrency) schemes can be set up so that the actual sums are done the
>>>>> same way regardless of number of workers, which keeps your final results
>>>>> identical.  Is there some reason this can't be done for the Mattes metric?
>>>>> I really should look at the implementation to answer that, of course.
>>>>>
>>>>> Do you have a pointer to earlier discussions?  If I can find the time
>>>>> I'd like to dig into this a bit, but I'm not sure when I'll have the
>>>>> bandwidth.  I've "solved" this currently by constraining the core count.
>>>>>
>>>>> Perhaps interestingly, my earlier experiments were confounded a bit by
>>>>> a precision issue, but that had to do with intrinsics generation on my
>>>>> compiler behaving differently on systems with AVX2 (even though only AVX
>>>>> intrinsics were being generated).  So that made things confusing at first
>>>>> until I separated the issues.
>>>>>
>>>>>
>>>>> On Wed, Mar 19, 2014 at 9:49 AM, brian avants <stnava at gmail.com> wrote:
>>>>>
>>>>>> yes - we had several discussions about this during v4 development.
>>>>>>
>>>>>> experiments showed that differences are due to precision.
>>>>>>
>>>>>> one solution was to truncate precision to the point that is reliable.
>>>>>>
>>>>>> but there are problems with that too.   last i checked, this was an
>>>>>> open problem, in general, in computer science.
>>>>>>
>>>>>>
>>>>>> brian
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 19, 2014 at 9:16 AM, Luis Ibanez <luis.ibanez at kitware.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Hi Simon,
>>>>>>>
>>>>>>> We are aware of some multi-threading related issues in
>>>>>>> the registration process that result in metric values changing
>>>>>>> depending on the number of cores used.
>>>>>>>
>>>>>>> Are you using the MattesMutualInformationMetric ?
>>>>>>>
>>>>>>> At some point it was suspected that the problem was the
>>>>>>> result of accumulated rounding in the contributions that
>>>>>>> each pixel makes to the metric value. This may or may
>>>>>>> not be related to what you are observing.
>>>>>>>
>>>>>>>
>>>>>>>    Thanks
>>>>>>>
>>>>>>>        Luis
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Feb 20, 2014 at 3:27 PM, Simon Alexander <
>>>>>>> skalexander at gmail.com> wrote:
>>>>>>>
>>>>>>>> I've been finding some regressions in registration results when
>>>>>>>> using systems with different numbers of cores (so the thread count is
>>>>>>>> different).  This is resolved by fixing the global maximum number of
>>>>>>>> threads.
>>>>>>>>
>>>>>>>> It's difficult for me to run the identical code against 4.4.2,
>>>>>>>> but similar experiments were run in that timeframe without these
>>>>>>>> regressions.
>>>>>>>>
>>>>>>>> I recall that there were changes affecting multithreading in the v4
>>>>>>>> registration in the 4.5.0 release, so I thought this might be a side
>>>>>>>> effect.
>>>>>>>>
>>>>>>>> So a few questions:
>>>>>>>>
>>>>>>>> Is this behaviour expected?
>>>>>>>>
>>>>>>>> Am I correct that this was not the behaviour in 4.4.x ?
>>>>>>>>
>>>>>>>> Does anyone who has a feel for the recent changes 4.4.2 ->
>>>>>>>> 4.5.[0,1] have a good idea where to start looking?  I haven't yet dug into
>>>>>>>> the multithreading architecture, but this "smells" like a data partitioning
>>>>>>>> issue to me.
>>>>>>>>
>>>>>>>> Any other thoughts?
>>>>>>>>
>>>>>>>> cheers,
>>>>>>>> Simon
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Powered by www.kitware.com
>>>>>>>>
>>>>>>>> Visit other Kitware open-source projects at
>>>>>>>> http://www.kitware.com/opensource/opensource.html
>>>>>>>>
>>>>>>>> Kitware offers ITK Training Courses, for more information visit:
>>>>>>>> http://kitware.com/products/protraining.php
>>>>>>>>
>>>>>>>> Please keep messages on-topic and check the ITK FAQ at:
>>>>>>>> http://www.itk.org/Wiki/ITK_FAQ
>>>>>>>>
>>>>>>>> Follow this link to subscribe/unsubscribe:
>>>>>>>> http://www.itk.org/mailman/listinfo/insight-developers
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Community mailing list
>>>>>>>> Community at itk.org
>>>>>>>> http://public.kitware.com/cgi-bin/mailman/listinfo/community
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>