[Insight-users] Kd-tree implementation in ITK is buggy and unreliable

Luis Ibanez luis.ibanez at kitware.com
Tue Apr 29 12:27:46 EDT 2008


Hi Bill,


I undid part of those changes this
morining when fixing compiler warnings   :-(

...

Here is the problem,

In the effort to reduce the bounds
http://public.kitware.com/cgi-bin/viewcvs.cgi/Code/Numerics/Statistics/itkKdTree.txx?root=Insight&r1=1.23&r2=1.24

the code used:

lowerBound[d] = -vcl_sqrt(NumericTraits< MeasurementType >::max())/2.0;

but some of the examples use MeasurementTypes that are unsinged,
and the negative value is misinterpreted in that context. In
general we cannot assume that the vector use signed components.

The code above was replaced with
http://public.kitware.com/cgi-bin/viewcvs.cgi/Code/Numerics/Statistics/itkKdTree.txx?root=Insight&r1=1.23&r2=1.26


lowerBound[d] =
    NumericTraits< MeasurementType >::NonpositiveMin() / 2.0;


It is missing the sqrt() part of your fix...

Do you  think that the 1/2.0 factor may be enough for
dealing with the numerical overflow ?


----

BTW: Somewere in the MeanShift class there is a mislead
      typedef for MeasurementType.

      I had to add the declaration:
typedef typename TSample::MeasurementType  MeasurementType;

      in order to avoid conversion warnings with gcc.




    Luis



---------------------
Bill Lorensen wrote:
> I committed last night. You should see the results on tomorrow's dashboard.
> 
> Bill
> 
> On Tue, Apr 29, 2008 at 11:43 AM, Luis Ibanez <luis.ibanez at kitware.com> wrote:
> 
>>Hi Bill,
>>
>>         Great !
>>
>>
>>Have you committed the fix ?
>>Or, could you post the patch
>>
>>
>>On a parallel note,
>>I'm tracking the problem with the time out
>>of about 6 test. The new version of QuickSelect
>>fails for cases where there are many repeated
>>values in the collection.  I'm looking at
>>replacing it with the implementation of
>>NthElement from STL.
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>    Luis
>>
>>
>>-------------------
>>Bill Lorensen wrote:
>>
>>>Luis,
>>>
>>>I fixed the floating point overflow problem in KdTree.
>>>
>>>Bill
>>>
>>>On Mon, Apr 28, 2008 at 8:21 AM, Luis Ibanez <luis.ibanez at kitware.com>
>>
>>wrote:
>>
>>>
>>>>Hi Ali,
>>>>
>>>>
>>>>A quick update on the status of this problem:
>>>>
>>>>
>>>>Bugs were identified and fixed at the level of the
>>>>StatisticsAlgorithms:
>>>>
>>>> * Partition
>>>> * QuickSelect
>>>>
>>>>
>>
>>http://www.itk.org/cgi-bin/viewcvs.cgi/Code/Numerics/Statistics/itkStatisticsAlgorithm.txx?root=Insight&r1=1.19&r2=1.21&sortby=date
>>
>>http://www.itk.org/cgi-bin/viewcvs.cgi/Code/Numerics/Statistics/itkKdTreeGenerator.txx?root=Insight&r1=1.14&r2=1.18&sortby=date
>>
>>>>Explicit tests were added for
>>>>
>>>> * QuickSelect
>>>> * KdTree
>>>> * KdTreeGenerator
>>>>
>>>>(about 12 new tests).
>>>>
>>>>These new tests are passing.
>>>>
>>>>
>>>>There are still side effects in the following (now failing) tests
>>>>
>>>>* BayesianClassifierInitializeTest
>>>>* itkSampleMeanSjhiftClusteringFilterTest
>>>>* RBFTest1
>>>>* ScalarImageKmeansClassifierTest
>>>>
>>>>They are currently timing-out, (probably due to infinite-loops)
>>>>We suspect that they are related to the WeightedCentroidKdTreeGenerator
>>>>(but that is only  early speculation).
>>>>
>>>>
>>>>We are now tracking those issues...
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>  Luis
>>>>
>>>>
>>>>
>>>>--------------------
>>>>Luis Ibanez wrote:
>>>>
>>>>
>>>>
>>>>>Hi Bill,
>>>>>
>>>>>Yeap, the problem was in the combination of QuickSelect and Partition.
>>>>>
>>>>>Both of them are defined in
>>>>>
>>>>> Insight/Code/Numerics/Statistics/itkStatisticsAlgorithms.txx
>>>>>
>>>>>Sorting the list before QuickSelect would only work when the
>>
>>measurement
>>
>>>>>vector has a single dimension. Otherwise, a single sort can not
>>
>>possibly
>>
>>>>>organize the list in all dimensions. We could call sort just before
>>>>>Partition, to sort the activeDimension, but this is an overkill, since
>>>>>Partition is indeed intended for parforming a partial sorting.
>>>>>
>>>>>
>>>>>I have replaced both the QuickSelect and Partition functions with
>>>>>implementations based on the description of the Partition and
>>>>>QuickSelect algorithms in the Wikipedia:
>>>>>
>>>>>  http://en.wikipedia.org/wiki/Quickselect
>>>>>
>>>>>This works fine for KdTreeTest1 and KdTreeTest2.
>>>>>
>>>>>
>>>>>However, the test for the StatisticsAlgorithms itself fails (timesout)
>>>>>due to the fact that its list contains multiple entries with
>>>>>values equal to the partition value. A case that is not considered
>>>>>in the current algorithm. The current algorithm enters into an
>>>>>infinite loop in this case.
>>>>>
>>>>>I'm modifying the implementation for covering that case.
>>>>>
>>>>>--
>>>>>
>>>>>BTW, about 10 test more were added, for exercising the KdTree
>>>>>at different bucket sizes.
>>>>>
>>>>>
>>>>>To be revisited: The tests in Borland are producing floating-point
>>>>>              overflows... not a good sign. There are possible
>>>>>              other issues in the computation.
>>>>>
>>>>>
>>>>>
>>>>> Luis
>>>>>
>>>>>
>>>>>
>>>>>---------------------
>>>>>Bill Lorensen wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>Luis,
>>>>>>
>>>>>>I think the main problem is that the samples need to be sorted. I
>>
>>added:
>>
>>>>>>HeapSort<SubsampleType>(m_Subsample, partitionDimension, beginIndex,
>>>>>>endIndex);
>>>>>>just before the QuickSelect call and itkKdTreeTest1 passes
>>>>>>consistently.
>>>>>>
>>>>>>However, this is not a good fix. I think the QuickSelect code should
>>>>>>work. Probably the error is in Partition. It should group a list
>>
>>into
>>
>>>>>>two parts, one less than the partition value and another greater
>>
>>than
>>
>>>>>>or equal the value. Currently it looks like Partition assumes the
>>
>>list
>>
>>>>>>is sorted.
>>>>>>
>>>>>>BTW, itkKdTreeTest2 produces a tree different from the one in
>>
>>wikipedia.
>>
>>>>>>HTH,
>>>>>>Bill
>>>>>>
>>>>>>On Fri, Apr 25, 2008 at 12:59 PM, Luis Ibanez
>>
>><luis.ibanez at kitware.com>
>>
>>>>wrote:
>>>>
>>>>
>>>>
>>>>>>
>>>>>>>Hi Ali,
>>>>>>>
>>>>>>>
>>>>>>>An update on the KdTree segmentation fault
>>>>>>>(that may or may not be related to the problem that you observe).
>>>>>>>
>>>>>>>
>>>>>>>This test is running the example tree provided in the Wikipedia
>>>>>>>http://en.wikipedia.org/wiki/Kdtree
>>>>>>>http://en.wikipedia.org/wiki/Kdtree#Constructing_a_kd-tree
>>>>>>>
>>>>>>>We are feeding the set of points:
>>>>>>>
>>>>>>>    (2,3), (5,4), (9,6), (4,7), (8,1), (7,2)
>>>>>>>
>>>>>>>and expecting to get the tree:
>>>>>>>http://upload.wikimedia.org/wikipedia/en/f/f1/Kd_tree.png
>>>>>>>
>>>>>>>            (7,2)
>>>>>>>              /\
>>>>>>>             /  \
>>>>>>>            /    \
>>>>>>>        (5,4)    (9,6)
>>>>>>>          /\       \
>>>>>>>         /  \       \
>>>>>>>        /    \       \
>>>>>>>     (2,3)  (4,7)    (8,1)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>Here is what we currently observe as output for different
>>>>>>>bucket sizes:
>>>>>>>
>>>>>>>-----
>>>>>>>
>>>>>>>Case BucketSize == 1:
>>>>>>>
>>>>>>>The segmentation fault of itkKdTreeTest2 with BucketSize==1
>>>>>>>happens because the recursive function:
>>>>>>>
>>>>>>> KdTreeGenerator::GenerateTreeLoop()  (line 179)
>>>>>>>
>>>>>>>ends up calling itself indifinitely, until it overflows the stack.
>>>>>>>It enter in an infinite (recursive) loop calling:
>>>>>>>
>>>>>>>GenerateTreeLoop( 0, 0, level++)
>>>>>>>GenerateTreeLoop( 0, 2, level++)
>>>>>>>GenerateTreeLoop( 0, 0, level++)
>>>>>>>GenerateTreeLoop( 0, 2, level++)
>>>>>>>GenerateTreeLoop( 0, 0, level++)
>>>>>>>GenerateTreeLoop( 0, 2, level++)
>>>>>>>....
>>>>>>>
>>>>>>>where "0" and "2" here are the indexes of the points in the
>>
>>sample.
>>
>>>>>>>-----
>>>>>>>
>>>>>>>Case BucketSize == 2:
>>>>>>>
>>>>>>>The segmentation fault of itkKdTreeTest2 with BucketSize==2
>>>>>>>happens also because the recursive function:
>>>>>>>
>>>>>>> KdTreeGenerator::GenerateTreeLoop()  (line 179)
>>>>>>>
>>>>>>>ends up calling itself indifinitely, until it overflows the stack.
>>>>>>>It enter in an infinite (recursive) loop calling:
>>>>>>>
>>>>>>>GenerateTreeLoop( 3, 3, level++)
>>>>>>>GenerateTreeLoop( 3, 6, level++)
>>>>>>>GenerateTreeLoop( 3, 3, level++)
>>>>>>>GenerateTreeLoop( 3, 6, level++)
>>>>>>>GenerateTreeLoop( 3, 3, level++)
>>>>>>>GenerateTreeLoop( 3, 6, level++)
>>>>>>>....
>>>>>>>
>>>>>>>where "3" and "6" here are the indexes of the points in the
>>
>>sample.
>>
>>>>>>>-----
>>>>>>>
>>>>>>>Case BucketSize == 3:
>>>>>>>
>>>>>>>When using a bucket size of 3, the function GenerateTreeLoop()
>>>>>>>is being called only once, but it generates the following tree
>>>>>>>
>>>>>>>              NTN
>>>>>>>              /\
>>>>>>>             /  \
>>>>>>>            /    \
>>>>>>> [(2,3),(4,7)]     NTN
>>>>>>>                  /\
>>>>>>>                 /  \
>>>>>>>                /    \
>>>>>>>            [(8,1)]     [ (7,2) (5,4) (9,6)]
>>>>>>>
>>>>>>>
>>>>>>>Where "NTN" stands for "non-terminal node".
>>>>>>>and the groups with parenthesis "[ ]" represent buckets.
>>>>>>>
>>>>>>>
>>>>>>>That is a first plane dividing the space at X=4.5
>>>>>>>One bucket on the left for points with X values < 4.5
>>>>>>>and on the right, a second plane dividing the space
>>>>>>>at Y = 1.5, to create on bucket on the left with (8,1)
>>>>>>>and on the right another bucket with points with Y values
>>>>>>>larger than 1.5, namely: [ (7,2) (5,4) (9,6)]
>>>>>>>
>>>>>>>
>>>>>>>-----
>>>>>>>
>>>>>>>Case BucketSize == 4:
>>>>>>>
>>>>>>>When using a bucket size of 4 , the function GenerateTreeLoop()
>>>>>>>is being called three times, and it generates the following tree
>>>>>>>
>>>>>>>
>>>>>>>              NTN
>>>>>>>              /\
>>>>>>>             /  \
>>>>>>>            /    \
>>>>>>> [(2,3),(4,7)]     [(8,1) (7,2) (5,4) (9,6)]
>>>>>>>
>>>>>>>
>>>>>>>That is a single plane dividing the space at X=4.5
>>>>>>>
>>>>>>>
>>>>>>>----------
>>>>>>>
>>>>>>>In  either case,
>>>>>>>the partition is not what we could expect from a KdTree...
>>>>>>>
>>>>>>>
>>>>>>>Just to clarify, the real problem seems to be in the
>>>>>>>implementation of the algorithm in the KdTreeGenerator,
>>>>>>>not in the KdTree class itself...
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Luis
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>-------------------
>>>>>>>
>>>>>>>
>>>>>>>Luis Ibanez wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>Hi Ali,
>>>>>>>>
>>>>>>>>Thanks for the additinonal information.
>>>>>>>>
>>>>>>>> Yeap, the algorithm should work regardless
>>>>>>>> of the bucket size. That one was a false alarm.
>>>>>>>>
>>>>>>>>
>>>>>>>>I'm tracking now the cases where itkKdTreeTest2 segfaults
>>>>>>>>with a bucket size == 2.  Hopefully this will illuminate
>>>>>>>>at least part of the problem.
>>>>>>>>
>>>>>>>>
>>>>>>>>Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>> Luis
>>>>>>>>
>>>>>>>>
>>>>>>>>---------------
>>>>>>>>Ali - wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>Luis,
>>>>>>>>>
>>>>>>>>>I used a bucket size of 16 with about 100 particles. Like you
>>>>>>>>>
>>>>>>>>
>>>>mentioned,
>>>>
>>>>
>>>>
>>>>>>>the algorithm must work fine independent of these parameters.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>-Ali
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>Hi Ali,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>Could you please report
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>1) How many points (measurement vectors) are you
>>>>>>>>>> currently feeding into the KdTreeGenerator ?
>>>>>>>>>>
>>>>>>>>>>and
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>2) What value of "BucketSize" did you use ?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>The problem reported in the bug
>>>>>>>>>>http://public.kitware.com/Bug/view.php?id=5082
>>>>>>>>>>
>>>>>>>>>>Seems to be simply the result of a poor choice
>>>>>>>>>>on the value of the bucket size parameter.
>>>>>>>>>>For the test case reported in that bug, increasing
>>>>>>>>>>the value from 16 to 30 solves the problem.
>>>>>>>>>>
>>>>>>>>>>We are wondering if the problem that you are
>>>>>>>>>>experiencing is also the result of poor parameter
>>>>>>>>>>setting.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>Please let us know,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>     Luis
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>------------
>>>>>>>>>>Ali - wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>Hi,
>>>>>>>>>>>
>>>>>>>>>>>A quick search in the mailing list shows that, during the
>>
>>past
>>
>>>>half
>>>>
>>>>
>>>>
>>>>>>>a decade, it has been reported many times that the kd-tree classes
>>
>>in
>>
>>>>ITK
>>>>
>>>>
>>>>
>>>>>>>are buggy. Writing some library based on ITK, it took me a few
>>
>>days to
>>
>>>>find
>>>>
>>>>
>>>>
>>>>>>>out that the bug in my library is actually introduced by the
>>
>>kd-tree
>>
>>>>>>>implementation in ITK which FINDS THE WRONG NEAREST NEIGHBOUR.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>>>I have no idea, against many warnings from the users, why
>>
>>the
>>
>>>>bug
>>>>
>>>>
>>>>
>>>>>>>has not been resolved yet. In the case it is difficult to address
>>
>>the
>>
>>>>bug,
>>>>
>>>>
>>>>
>>>>>>>one option is to wrap many other existing implementations,
>>
>>personally
>>
>>>>I
>>>>
>>>>
>>>>
>>>>>>>switched to libkdtree++.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>_________________________________________________________________
>>>>
>>>>
>>>>
>>>>>>>>>>>100's of prizes to be won at BigSnapSearch.com
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>http://www.bigsnapsearch.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>>>_______________________________________________
>>>>>>>>>>>Insight-users mailing list
>>>>>>>>>>>Insight-users at itk.org
>>>>>>>>>>>http://www.itk.org/mailman/listinfo/insight-users
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>_________________________________________________________________
>>
>>>>>>>>>100's of prizes to be won at BigSnapSearch.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>http://www.bigsnapsearch.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>_______________________________________________
>>>>>>>Insight-users mailing list
>>>>>>>Insight-users at itk.org
>>>>>>>http://www.itk.org/mailman/listinfo/insight-users
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
> 


More information about the Insight-users mailing list