[Insight-users] Kd-tree implementation in ITK is buggy and unreliable

Luis Ibanez luis.ibanez at kitware.com
Sun Apr 27 13:51:57 EDT 2008


Hi Bill,

Yeap, the problem was in the combination of QuickSelect and Partition.

Both of them are defined in

    Insight/Code/Numerics/Statistics/itkStatisticsAlgorithms.txx

Sorting the list before QuickSelect would only work when the measurement
vector has a single dimension. Otherwise, a single sort can not possibly
organize the list in all dimensions. We could call sort just before
Partition, to sort the activeDimension, but this is an overkill, since
Partition is indeed intended for parforming a partial sorting.


I have replaced both the QuickSelect and Partition functions with
implementations based on the description of the Partition and
QuickSelect algorithms in the Wikipedia:

      http://en.wikipedia.org/wiki/Quickselect

This works fine for KdTreeTest1 and KdTreeTest2.


However, the test for the StatisticsAlgorithms itself fails (timesout)
due to the fact that its list contains multiple entries with
values equal to the partition value. A case that is not considered
in the current algorithm. The current algorithm enters into an
infinite loop in this case.

I'm modifying the implementation for covering that case.

--

BTW, about 10 test more were added, for exercising the KdTree
at different bucket sizes.


To be revisited: The tests in Borland are producing floating-point
                  overflows... not a good sign. There are possible
                  other issues in the computation.



    Luis



---------------------
Bill Lorensen wrote:
> Luis,
> 
> I think the main problem is that the samples need to be sorted. I added:
>   HeapSort<SubsampleType>(m_Subsample, partitionDimension, beginIndex,
> endIndex);
> just before the QuickSelect call and itkKdTreeTest1 passes
> consistently.
> 
> However, this is not a good fix. I think the QuickSelect code should
> work. Probably the error is in Partition. It should group a list into
> two parts, one less than the partition value and another greater than
> or equal the value. Currently it looks like Partition assumes the list
> is sorted.
> 
> BTW, itkKdTreeTest2 produces a tree different from the one in wikipedia.
> 
> HTH,
> Bill
> 
> On Fri, Apr 25, 2008 at 12:59 PM, Luis Ibanez <luis.ibanez at kitware.com> wrote:
> 
>>Hi Ali,
>>
>>
>>An update on the KdTree segmentation fault
>>(that may or may not be related to the problem that you observe).
>>
>>
>>This test is running the example tree provided in the Wikipedia
>>http://en.wikipedia.org/wiki/Kdtree
>>http://en.wikipedia.org/wiki/Kdtree#Constructing_a_kd-tree
>>
>>We are feeding the set of points:
>>
>>        (2,3), (5,4), (9,6), (4,7), (8,1), (7,2)
>>
>>and expecting to get the tree:
>>http://upload.wikimedia.org/wikipedia/en/f/f1/Kd_tree.png
>>
>>                (7,2)
>>                  /\
>>                 /  \
>>                /    \
>>            (5,4)    (9,6)
>>              /\       \
>>             /  \       \
>>            /    \       \
>>         (2,3)  (4,7)    (8,1)
>>
>>
>>
>>Here is what we currently observe as output for different
>>bucket sizes:
>>
>>-----
>>
>>Case BucketSize == 1:
>>
>>The segmentation fault of itkKdTreeTest2 with BucketSize==1
>>happens because the recursive function:
>>
>>    KdTreeGenerator::GenerateTreeLoop()  (line 179)
>>
>>ends up calling itself indifinitely, until it overflows the stack.
>>It enter in an infinite (recursive) loop calling:
>>
>>  GenerateTreeLoop( 0, 0, level++)
>>  GenerateTreeLoop( 0, 2, level++)
>>  GenerateTreeLoop( 0, 0, level++)
>>  GenerateTreeLoop( 0, 2, level++)
>>  GenerateTreeLoop( 0, 0, level++)
>>  GenerateTreeLoop( 0, 2, level++)
>>  ....
>>
>>where "0" and "2" here are the indexes of the points in the sample.
>>
>>-----
>>
>>Case BucketSize == 2:
>>
>>The segmentation fault of itkKdTreeTest2 with BucketSize==2
>>happens also because the recursive function:
>>
>>    KdTreeGenerator::GenerateTreeLoop()  (line 179)
>>
>>ends up calling itself indifinitely, until it overflows the stack.
>>It enter in an infinite (recursive) loop calling:
>>
>>  GenerateTreeLoop( 3, 3, level++)
>>  GenerateTreeLoop( 3, 6, level++)
>>  GenerateTreeLoop( 3, 3, level++)
>>  GenerateTreeLoop( 3, 6, level++)
>>  GenerateTreeLoop( 3, 3, level++)
>>  GenerateTreeLoop( 3, 6, level++)
>>  ....
>>
>>where "3" and "6" here are the indexes of the points in the sample.
>>
>>-----
>>
>>Case BucketSize == 3:
>>
>>When using a bucket size of 3, the function GenerateTreeLoop()
>>is being called only once, but it generates the following tree
>>
>>                  NTN
>>                  /\
>>                 /  \
>>                /    \
>>    [(2,3),(4,7)]     NTN
>>                      /\
>>                     /  \
>>                    /    \
>>                [(8,1)]     [ (7,2) (5,4) (9,6)]
>>
>>
>>Where "NTN" stands for "non-terminal node".
>>and the groups with parenthesis "[ ]" represent buckets.
>>
>>
>>That is a first plane dividing the space at X=4.5
>>One bucket on the left for points with X values < 4.5
>>and on the right, a second plane dividing the space
>>at Y = 1.5, to create on bucket on the left with (8,1)
>>and on the right another bucket with points with Y values
>>larger than 1.5, namely: [ (7,2) (5,4) (9,6)]
>>
>>
>>-----
>>
>>Case BucketSize == 4:
>>
>>When using a bucket size of 4 , the function GenerateTreeLoop()
>>is being called three times, and it generates the following tree
>>
>>
>>                  NTN
>>                  /\
>>                 /  \
>>                /    \
>>    [(2,3),(4,7)]     [(8,1) (7,2) (5,4) (9,6)]
>>
>>
>>That is a single plane dividing the space at X=4.5
>>
>>
>>----------
>>
>>In  either case,
>>the partition is not what we could expect from a KdTree...
>>
>>
>>Just to clarify, the real problem seems to be in the
>>implementation of the algorithm in the KdTreeGenerator,
>>not in the KdTree class itself...
>>
>>
>>
>>    Luis
>>
>>
>>
>>-------------------
>>
>>
>>Luis Ibanez wrote:
>>
>>>Hi Ali,
>>>
>>>Thanks for the additinonal information.
>>>
>>>    Yeap, the algorithm should work regardless
>>>    of the bucket size. That one was a false alarm.
>>>
>>>
>>>I'm tracking now the cases where itkKdTreeTest2 segfaults
>>>with a bucket size == 2.  Hopefully this will illuminate
>>>at least part of the problem.
>>>
>>>
>>> Thanks
>>>
>>>
>>>    Luis
>>>
>>>
>>>---------------
>>>Ali - wrote:
>>>
>>>
>>>
>>>>Luis,
>>>>
>>>>I used a bucket size of 16 with about 100 particles. Like you mentioned,
>>
>>the algorithm must work fine independent of these parameters.
>>
>>>>
>>>>-Ali
>>>>
>>>>
>>>>
>>>>
>>>>>Hi Ali,
>>>>>
>>>>>
>>>>>Could you please report
>>>>>
>>>>>
>>>>> 1) How many points (measurement vectors) are you
>>>>>    currently feeding into the KdTreeGenerator ?
>>>>>
>>>>>and
>>>>>
>>>>>
>>>>> 2) What value of "BucketSize" did you use ?
>>>>>
>>>>>
>>>>>
>>>>>The problem reported in the bug
>>>>>http://public.kitware.com/Bug/view.php?id=5082
>>>>>
>>>>>Seems to be simply the result of a poor choice
>>>>>on the value of the bucket size parameter.
>>>>>For the test case reported in that bug, increasing
>>>>>the value from 16 to 30 solves the problem.
>>>>>
>>>>>We are wondering if the problem that you are
>>>>>experiencing is also the result of poor parameter
>>>>>setting.
>>>>>
>>>>>
>>>>>  Please let us know,
>>>>>
>>>>>
>>>>>     Thanks
>>>>>
>>>>>
>>>>>
>>>>>         Luis
>>>>>
>>>>>
>>>>>
>>>>>------------
>>>>>Ali - wrote:
>>>>>
>>>>>
>>>>>
>>>>>>Hi,
>>>>>>
>>>>>>A quick search in the mailing list shows that, during the past half
>>
>>a decade, it has been reported many times that the kd-tree classes in ITK
>>are buggy. Writing some library based on ITK, it took me a few days to find
>>out that the bug in my library is actually introduced by the kd-tree
>>implementation in ITK which FINDS THE WRONG NEAREST NEIGHBOUR.
>>
>>>>>>I have no idea, against many warnings from the users, why the bug
>>
>>has not been resolved yet. In the case it is difficult to address the bug,
>>one option is to wrap many other existing implementations, personally I
>>switched to libkdtree++.
>>
>>>>>>_________________________________________________________________
>>>>>>100's of prizes to be won at BigSnapSearch.com
>>
>>http://www.bigsnapsearch.com
>>
>>>>>>_______________________________________________
>>>>>>Insight-users mailing list
>>>>>>Insight-users at itk.org
>>>>>>http://www.itk.org/mailman/listinfo/insight-users
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>_________________________________________________________________
>>>>100's of prizes to be won at BigSnapSearch.com
>>
>>http://www.bigsnapsearch.com
>>
>>>
>>>
>>_______________________________________________
>>Insight-users mailing list
>>Insight-users at itk.org
>>http://www.itk.org/mailman/listinfo/insight-users
>>
> 
> 


More information about the Insight-users mailing list