[Insight-users] Kd-tree implementation in ITK is buggy and unreliable

Luis Ibanez luis.ibanez at kitware.com
Mon Apr 28 08:21:49 EDT 2008



Hi Ali,


A quick update on the status of this problem:


Bugs were identified and fixed at the level of the
StatisticsAlgorithms:

    * Partition
    * QuickSelect

http://www.itk.org/cgi-bin/viewcvs.cgi/Code/Numerics/Statistics/itkStatisticsAlgorithm.txx?root=Insight&r1=1.19&r2=1.21&sortby=date 

http://www.itk.org/cgi-bin/viewcvs.cgi/Code/Numerics/Statistics/itkKdTreeGenerator.txx?root=Insight&r1=1.14&r2=1.18&sortby=date

Explicit tests were added for

    * QuickSelect
    * KdTree
    * KdTreeGenerator

(about 12 new tests).

These new tests are passing.


There are still side effects in the following (now failing) tests

   * BayesianClassifierInitializeTest
   * itkSampleMeanSjhiftClusteringFilterTest
   * RBFTest1
   * ScalarImageKmeansClassifierTest

They are currently timing-out, (probably due to infinite-loops)
We suspect that they are related to the WeightedCentroidKdTreeGenerator
(but that is only  early speculation).


We are now tracking those issues...



      Luis



--------------------
Luis Ibanez wrote:
> 
> Hi Bill,
> 
> Yeap, the problem was in the combination of QuickSelect and Partition.
> 
> Both of them are defined in
> 
>    Insight/Code/Numerics/Statistics/itkStatisticsAlgorithms.txx
> 
> Sorting the list before QuickSelect would only work when the measurement
> vector has a single dimension. Otherwise, a single sort can not possibly
> organize the list in all dimensions. We could call sort just before
> Partition, to sort the activeDimension, but this is an overkill, since
> Partition is indeed intended for parforming a partial sorting.
> 
> 
> I have replaced both the QuickSelect and Partition functions with
> implementations based on the description of the Partition and
> QuickSelect algorithms in the Wikipedia:
> 
>      http://en.wikipedia.org/wiki/Quickselect
> 
> This works fine for KdTreeTest1 and KdTreeTest2.
> 
> 
> However, the test for the StatisticsAlgorithms itself fails (timesout)
> due to the fact that its list contains multiple entries with
> values equal to the partition value. A case that is not considered
> in the current algorithm. The current algorithm enters into an
> infinite loop in this case.
> 
> I'm modifying the implementation for covering that case.
> 
> -- 
> 
> BTW, about 10 test more were added, for exercising the KdTree
> at different bucket sizes.
> 
> 
> To be revisited: The tests in Borland are producing floating-point
>                  overflows... not a good sign. There are possible
>                  other issues in the computation.
> 
> 
> 
>    Luis
> 
> 
> 
> ---------------------
> Bill Lorensen wrote:
> 
>> Luis,
>>
>> I think the main problem is that the samples need to be sorted. I added:
>>   HeapSort<SubsampleType>(m_Subsample, partitionDimension, beginIndex,
>> endIndex);
>> just before the QuickSelect call and itkKdTreeTest1 passes
>> consistently.
>>
>> However, this is not a good fix. I think the QuickSelect code should
>> work. Probably the error is in Partition. It should group a list into
>> two parts, one less than the partition value and another greater than
>> or equal the value. Currently it looks like Partition assumes the list
>> is sorted.
>>
>> BTW, itkKdTreeTest2 produces a tree different from the one in wikipedia.
>>
>> HTH,
>> Bill
>>
>> On Fri, Apr 25, 2008 at 12:59 PM, Luis Ibanez 
>> <luis.ibanez at kitware.com> wrote:
>>
>>> Hi Ali,
>>>
>>>
>>> An update on the KdTree segmentation fault
>>> (that may or may not be related to the problem that you observe).
>>>
>>>
>>> This test is running the example tree provided in the Wikipedia
>>> http://en.wikipedia.org/wiki/Kdtree
>>> http://en.wikipedia.org/wiki/Kdtree#Constructing_a_kd-tree
>>>
>>> We are feeding the set of points:
>>>
>>>        (2,3), (5,4), (9,6), (4,7), (8,1), (7,2)
>>>
>>> and expecting to get the tree:
>>> http://upload.wikimedia.org/wikipedia/en/f/f1/Kd_tree.png
>>>
>>>                (7,2)
>>>                  /\
>>>                 /  \
>>>                /    \
>>>            (5,4)    (9,6)
>>>              /\       \
>>>             /  \       \
>>>            /    \       \
>>>         (2,3)  (4,7)    (8,1)
>>>
>>>
>>>
>>> Here is what we currently observe as output for different
>>> bucket sizes:
>>>
>>> -----
>>>
>>> Case BucketSize == 1:
>>>
>>> The segmentation fault of itkKdTreeTest2 with BucketSize==1
>>> happens because the recursive function:
>>>
>>>    KdTreeGenerator::GenerateTreeLoop()  (line 179)
>>>
>>> ends up calling itself indifinitely, until it overflows the stack.
>>> It enter in an infinite (recursive) loop calling:
>>>
>>>  GenerateTreeLoop( 0, 0, level++)
>>>  GenerateTreeLoop( 0, 2, level++)
>>>  GenerateTreeLoop( 0, 0, level++)
>>>  GenerateTreeLoop( 0, 2, level++)
>>>  GenerateTreeLoop( 0, 0, level++)
>>>  GenerateTreeLoop( 0, 2, level++)
>>>  ....
>>>
>>> where "0" and "2" here are the indexes of the points in the sample.
>>>
>>> -----
>>>
>>> Case BucketSize == 2:
>>>
>>> The segmentation fault of itkKdTreeTest2 with BucketSize==2
>>> happens also because the recursive function:
>>>
>>>    KdTreeGenerator::GenerateTreeLoop()  (line 179)
>>>
>>> ends up calling itself indifinitely, until it overflows the stack.
>>> It enter in an infinite (recursive) loop calling:
>>>
>>>  GenerateTreeLoop( 3, 3, level++)
>>>  GenerateTreeLoop( 3, 6, level++)
>>>  GenerateTreeLoop( 3, 3, level++)
>>>  GenerateTreeLoop( 3, 6, level++)
>>>  GenerateTreeLoop( 3, 3, level++)
>>>  GenerateTreeLoop( 3, 6, level++)
>>>  ....
>>>
>>> where "3" and "6" here are the indexes of the points in the sample.
>>>
>>> -----
>>>
>>> Case BucketSize == 3:
>>>
>>> When using a bucket size of 3, the function GenerateTreeLoop()
>>> is being called only once, but it generates the following tree
>>>
>>>                  NTN
>>>                  /\
>>>                 /  \
>>>                /    \
>>>    [(2,3),(4,7)]     NTN
>>>                      /\
>>>                     /  \
>>>                    /    \
>>>                [(8,1)]     [ (7,2) (5,4) (9,6)]
>>>
>>>
>>> Where "NTN" stands for "non-terminal node".
>>> and the groups with parenthesis "[ ]" represent buckets.
>>>
>>>
>>> That is a first plane dividing the space at X=4.5
>>> One bucket on the left for points with X values < 4.5
>>> and on the right, a second plane dividing the space
>>> at Y = 1.5, to create on bucket on the left with (8,1)
>>> and on the right another bucket with points with Y values
>>> larger than 1.5, namely: [ (7,2) (5,4) (9,6)]
>>>
>>>
>>> -----
>>>
>>> Case BucketSize == 4:
>>>
>>> When using a bucket size of 4 , the function GenerateTreeLoop()
>>> is being called three times, and it generates the following tree
>>>
>>>
>>>                  NTN
>>>                  /\
>>>                 /  \
>>>                /    \
>>>    [(2,3),(4,7)]     [(8,1) (7,2) (5,4) (9,6)]
>>>
>>>
>>> That is a single plane dividing the space at X=4.5
>>>
>>>
>>> ----------
>>>
>>> In  either case,
>>> the partition is not what we could expect from a KdTree...
>>>
>>>
>>> Just to clarify, the real problem seems to be in the
>>> implementation of the algorithm in the KdTreeGenerator,
>>> not in the KdTree class itself...
>>>
>>>
>>>
>>>    Luis
>>>
>>>
>>>
>>> -------------------
>>>
>>>
>>> Luis Ibanez wrote:
>>>
>>>> Hi Ali,
>>>>
>>>> Thanks for the additinonal information.
>>>>
>>>>    Yeap, the algorithm should work regardless
>>>>    of the bucket size. That one was a false alarm.
>>>>
>>>>
>>>> I'm tracking now the cases where itkKdTreeTest2 segfaults
>>>> with a bucket size == 2.  Hopefully this will illuminate
>>>> at least part of the problem.
>>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>>    Luis
>>>>
>>>>
>>>> ---------------
>>>> Ali - wrote:
>>>>
>>>>
>>>>
>>>>> Luis,
>>>>>
>>>>> I used a bucket size of 16 with about 100 particles. Like you 
>>>>> mentioned,
>>>
>>>
>>> the algorithm must work fine independent of these parameters.
>>>
>>>>>
>>>>> -Ali
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Hi Ali,
>>>>>>
>>>>>>
>>>>>> Could you please report
>>>>>>
>>>>>>
>>>>>> 1) How many points (measurement vectors) are you
>>>>>>    currently feeding into the KdTreeGenerator ?
>>>>>>
>>>>>> and
>>>>>>
>>>>>>
>>>>>> 2) What value of "BucketSize" did you use ?
>>>>>>
>>>>>>
>>>>>>
>>>>>> The problem reported in the bug
>>>>>> http://public.kitware.com/Bug/view.php?id=5082
>>>>>>
>>>>>> Seems to be simply the result of a poor choice
>>>>>> on the value of the bucket size parameter.
>>>>>> For the test case reported in that bug, increasing
>>>>>> the value from 16 to 30 solves the problem.
>>>>>>
>>>>>> We are wondering if the problem that you are
>>>>>> experiencing is also the result of poor parameter
>>>>>> setting.
>>>>>>
>>>>>>
>>>>>>  Please let us know,
>>>>>>
>>>>>>
>>>>>>     Thanks
>>>>>>
>>>>>>
>>>>>>
>>>>>>         Luis
>>>>>>
>>>>>>
>>>>>>
>>>>>> ------------
>>>>>> Ali - wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> A quick search in the mailing list shows that, during the past half
>>>
>>>
>>> a decade, it has been reported many times that the kd-tree classes in 
>>> ITK
>>> are buggy. Writing some library based on ITK, it took me a few days 
>>> to find
>>> out that the bug in my library is actually introduced by the kd-tree
>>> implementation in ITK which FINDS THE WRONG NEAREST NEIGHBOUR.
>>>
>>>>>>> I have no idea, against many warnings from the users, why the bug
>>>
>>>
>>> has not been resolved yet. In the case it is difficult to address the 
>>> bug,
>>> one option is to wrap many other existing implementations, personally I
>>> switched to libkdtree++.
>>>
>>>>>>> _________________________________________________________________
>>>>>>> 100's of prizes to be won at BigSnapSearch.com
>>>
>>>
>>> http://www.bigsnapsearch.com
>>>
>>>>>>> _______________________________________________
>>>>>>> Insight-users mailing list
>>>>>>> Insight-users at itk.org
>>>>>>> http://www.itk.org/mailman/listinfo/insight-users
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>> _________________________________________________________________
>>>>> 100's of prizes to be won at BigSnapSearch.com
>>>
>>>
>>> http://www.bigsnapsearch.com
>>>
>>>>
>>>>
>>> _______________________________________________
>>> Insight-users mailing list
>>> Insight-users at itk.org
>>> http://www.itk.org/mailman/listinfo/insight-users
>>>
>>
>>
> 


More information about the Insight-users mailing list