[Insight-users] Kd-tree implementation in ITK is buggy and unreliable
Luis Ibanez
luis.ibanez at kitware.com
Tue Apr 29 11:43:09 EDT 2008
Hi Bill,
Great !
Have you committed the fix ?
Or, could you post the patch
On a parallel note,
I'm tracking the problem with the time out
of about 6 test. The new version of QuickSelect
fails for cases where there are many repeated
values in the collection. I'm looking at
replacing it with the implementation of
NthElement from STL.
Thanks
Luis
-------------------
Bill Lorensen wrote:
> Luis,
>
> I fixed the floating point overflow problem in KdTree.
>
> Bill
>
> On Mon, Apr 28, 2008 at 8:21 AM, Luis Ibanez <luis.ibanez at kitware.com> wrote:
>
>>
>>Hi Ali,
>>
>>
>>A quick update on the status of this problem:
>>
>>
>>Bugs were identified and fixed at the level of the
>>StatisticsAlgorithms:
>>
>> * Partition
>> * QuickSelect
>>
>>http://www.itk.org/cgi-bin/viewcvs.cgi/Code/Numerics/Statistics/itkStatisticsAlgorithm.txx?root=Insight&r1=1.19&r2=1.21&sortby=date
>>http://www.itk.org/cgi-bin/viewcvs.cgi/Code/Numerics/Statistics/itkKdTreeGenerator.txx?root=Insight&r1=1.14&r2=1.18&sortby=date
>>
>>Explicit tests were added for
>>
>> * QuickSelect
>> * KdTree
>> * KdTreeGenerator
>>
>>(about 12 new tests).
>>
>>These new tests are passing.
>>
>>
>>There are still side effects in the following (now failing) tests
>>
>> * BayesianClassifierInitializeTest
>> * itkSampleMeanSjhiftClusteringFilterTest
>> * RBFTest1
>> * ScalarImageKmeansClassifierTest
>>
>>They are currently timing-out, (probably due to infinite-loops)
>>We suspect that they are related to the WeightedCentroidKdTreeGenerator
>>(but that is only early speculation).
>>
>>
>>We are now tracking those issues...
>>
>>
>>
>>
>>
>> Luis
>>
>>
>>
>>--------------------
>>Luis Ibanez wrote:
>>
>>>Hi Bill,
>>>
>>>Yeap, the problem was in the combination of QuickSelect and Partition.
>>>
>>>Both of them are defined in
>>>
>>> Insight/Code/Numerics/Statistics/itkStatisticsAlgorithms.txx
>>>
>>>Sorting the list before QuickSelect would only work when the measurement
>>>vector has a single dimension. Otherwise, a single sort can not possibly
>>>organize the list in all dimensions. We could call sort just before
>>>Partition, to sort the activeDimension, but this is an overkill, since
>>>Partition is indeed intended for parforming a partial sorting.
>>>
>>>
>>>I have replaced both the QuickSelect and Partition functions with
>>>implementations based on the description of the Partition and
>>>QuickSelect algorithms in the Wikipedia:
>>>
>>> http://en.wikipedia.org/wiki/Quickselect
>>>
>>>This works fine for KdTreeTest1 and KdTreeTest2.
>>>
>>>
>>>However, the test for the StatisticsAlgorithms itself fails (timesout)
>>>due to the fact that its list contains multiple entries with
>>>values equal to the partition value. A case that is not considered
>>>in the current algorithm. The current algorithm enters into an
>>>infinite loop in this case.
>>>
>>>I'm modifying the implementation for covering that case.
>>>
>>>--
>>>
>>>BTW, about 10 test more were added, for exercising the KdTree
>>>at different bucket sizes.
>>>
>>>
>>>To be revisited: The tests in Borland are producing floating-point
>>> overflows... not a good sign. There are possible
>>> other issues in the computation.
>>>
>>>
>>>
>>> Luis
>>>
>>>
>>>
>>>---------------------
>>>Bill Lorensen wrote:
>>>
>>>
>>>
>>>>Luis,
>>>>
>>>>I think the main problem is that the samples need to be sorted. I added:
>>>> HeapSort<SubsampleType>(m_Subsample, partitionDimension, beginIndex,
>>>>endIndex);
>>>>just before the QuickSelect call and itkKdTreeTest1 passes
>>>>consistently.
>>>>
>>>>However, this is not a good fix. I think the QuickSelect code should
>>>>work. Probably the error is in Partition. It should group a list into
>>>>two parts, one less than the partition value and another greater than
>>>>or equal the value. Currently it looks like Partition assumes the list
>>>>is sorted.
>>>>
>>>>BTW, itkKdTreeTest2 produces a tree different from the one in wikipedia.
>>>>
>>>>HTH,
>>>>Bill
>>>>
>>>>On Fri, Apr 25, 2008 at 12:59 PM, Luis Ibanez <luis.ibanez at kitware.com>
>>
>>wrote:
>>
>>>>
>>>>>Hi Ali,
>>>>>
>>>>>
>>>>>An update on the KdTree segmentation fault
>>>>>(that may or may not be related to the problem that you observe).
>>>>>
>>>>>
>>>>>This test is running the example tree provided in the Wikipedia
>>>>>http://en.wikipedia.org/wiki/Kdtree
>>>>>http://en.wikipedia.org/wiki/Kdtree#Constructing_a_kd-tree
>>>>>
>>>>>We are feeding the set of points:
>>>>>
>>>>> (2,3), (5,4), (9,6), (4,7), (8,1), (7,2)
>>>>>
>>>>>and expecting to get the tree:
>>>>>http://upload.wikimedia.org/wikipedia/en/f/f1/Kd_tree.png
>>>>>
>>>>> (7,2)
>>>>> /\
>>>>> / \
>>>>> / \
>>>>> (5,4) (9,6)
>>>>> /\ \
>>>>> / \ \
>>>>> / \ \
>>>>> (2,3) (4,7) (8,1)
>>>>>
>>>>>
>>>>>
>>>>>Here is what we currently observe as output for different
>>>>>bucket sizes:
>>>>>
>>>>>-----
>>>>>
>>>>>Case BucketSize == 1:
>>>>>
>>>>>The segmentation fault of itkKdTreeTest2 with BucketSize==1
>>>>>happens because the recursive function:
>>>>>
>>>>> KdTreeGenerator::GenerateTreeLoop() (line 179)
>>>>>
>>>>>ends up calling itself indifinitely, until it overflows the stack.
>>>>>It enter in an infinite (recursive) loop calling:
>>>>>
>>>>> GenerateTreeLoop( 0, 0, level++)
>>>>> GenerateTreeLoop( 0, 2, level++)
>>>>> GenerateTreeLoop( 0, 0, level++)
>>>>> GenerateTreeLoop( 0, 2, level++)
>>>>> GenerateTreeLoop( 0, 0, level++)
>>>>> GenerateTreeLoop( 0, 2, level++)
>>>>> ....
>>>>>
>>>>>where "0" and "2" here are the indexes of the points in the sample.
>>>>>
>>>>>-----
>>>>>
>>>>>Case BucketSize == 2:
>>>>>
>>>>>The segmentation fault of itkKdTreeTest2 with BucketSize==2
>>>>>happens also because the recursive function:
>>>>>
>>>>> KdTreeGenerator::GenerateTreeLoop() (line 179)
>>>>>
>>>>>ends up calling itself indifinitely, until it overflows the stack.
>>>>>It enter in an infinite (recursive) loop calling:
>>>>>
>>>>> GenerateTreeLoop( 3, 3, level++)
>>>>> GenerateTreeLoop( 3, 6, level++)
>>>>> GenerateTreeLoop( 3, 3, level++)
>>>>> GenerateTreeLoop( 3, 6, level++)
>>>>> GenerateTreeLoop( 3, 3, level++)
>>>>> GenerateTreeLoop( 3, 6, level++)
>>>>> ....
>>>>>
>>>>>where "3" and "6" here are the indexes of the points in the sample.
>>>>>
>>>>>-----
>>>>>
>>>>>Case BucketSize == 3:
>>>>>
>>>>>When using a bucket size of 3, the function GenerateTreeLoop()
>>>>>is being called only once, but it generates the following tree
>>>>>
>>>>> NTN
>>>>> /\
>>>>> / \
>>>>> / \
>>>>> [(2,3),(4,7)] NTN
>>>>> /\
>>>>> / \
>>>>> / \
>>>>> [(8,1)] [ (7,2) (5,4) (9,6)]
>>>>>
>>>>>
>>>>>Where "NTN" stands for "non-terminal node".
>>>>>and the groups with parenthesis "[ ]" represent buckets.
>>>>>
>>>>>
>>>>>That is a first plane dividing the space at X=4.5
>>>>>One bucket on the left for points with X values < 4.5
>>>>>and on the right, a second plane dividing the space
>>>>>at Y = 1.5, to create on bucket on the left with (8,1)
>>>>>and on the right another bucket with points with Y values
>>>>>larger than 1.5, namely: [ (7,2) (5,4) (9,6)]
>>>>>
>>>>>
>>>>>-----
>>>>>
>>>>>Case BucketSize == 4:
>>>>>
>>>>>When using a bucket size of 4 , the function GenerateTreeLoop()
>>>>>is being called three times, and it generates the following tree
>>>>>
>>>>>
>>>>> NTN
>>>>> /\
>>>>> / \
>>>>> / \
>>>>> [(2,3),(4,7)] [(8,1) (7,2) (5,4) (9,6)]
>>>>>
>>>>>
>>>>>That is a single plane dividing the space at X=4.5
>>>>>
>>>>>
>>>>>----------
>>>>>
>>>>>In either case,
>>>>>the partition is not what we could expect from a KdTree...
>>>>>
>>>>>
>>>>>Just to clarify, the real problem seems to be in the
>>>>>implementation of the algorithm in the KdTreeGenerator,
>>>>>not in the KdTree class itself...
>>>>>
>>>>>
>>>>>
>>>>> Luis
>>>>>
>>>>>
>>>>>
>>>>>-------------------
>>>>>
>>>>>
>>>>>Luis Ibanez wrote:
>>>>>
>>>>>
>>>>>
>>>>>>Hi Ali,
>>>>>>
>>>>>>Thanks for the additinonal information.
>>>>>>
>>>>>> Yeap, the algorithm should work regardless
>>>>>> of the bucket size. That one was a false alarm.
>>>>>>
>>>>>>
>>>>>>I'm tracking now the cases where itkKdTreeTest2 segfaults
>>>>>>with a bucket size == 2. Hopefully this will illuminate
>>>>>>at least part of the problem.
>>>>>>
>>>>>>
>>>>>>Thanks
>>>>>>
>>>>>>
>>>>>> Luis
>>>>>>
>>>>>>
>>>>>>---------------
>>>>>>Ali - wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>Luis,
>>>>>>>
>>>>>>>I used a bucket size of 16 with about 100 particles. Like you
>>
>>mentioned,
>>
>>>>>
>>>>>the algorithm must work fine independent of these parameters.
>>>>>
>>>>>
>>>>>
>>>>>>>-Ali
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>Hi Ali,
>>>>>>>>
>>>>>>>>
>>>>>>>>Could you please report
>>>>>>>>
>>>>>>>>
>>>>>>>>1) How many points (measurement vectors) are you
>>>>>>>> currently feeding into the KdTreeGenerator ?
>>>>>>>>
>>>>>>>>and
>>>>>>>>
>>>>>>>>
>>>>>>>>2) What value of "BucketSize" did you use ?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>The problem reported in the bug
>>>>>>>>http://public.kitware.com/Bug/view.php?id=5082
>>>>>>>>
>>>>>>>>Seems to be simply the result of a poor choice
>>>>>>>>on the value of the bucket size parameter.
>>>>>>>>For the test case reported in that bug, increasing
>>>>>>>>the value from 16 to 30 solves the problem.
>>>>>>>>
>>>>>>>>We are wondering if the problem that you are
>>>>>>>>experiencing is also the result of poor parameter
>>>>>>>>setting.
>>>>>>>>
>>>>>>>>
>>>>>>>> Please let us know,
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Luis
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>------------
>>>>>>>>Ali - wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>Hi,
>>>>>>>>>
>>>>>>>>>A quick search in the mailing list shows that, during the past
>>
>>half
>>
>>>>>
>>>>>a decade, it has been reported many times that the kd-tree classes in
>>
>>ITK
>>
>>>>>are buggy. Writing some library based on ITK, it took me a few days to
>>
>>find
>>
>>>>>out that the bug in my library is actually introduced by the kd-tree
>>>>>implementation in ITK which FINDS THE WRONG NEAREST NEIGHBOUR.
>>>>>
>>>>>
>>>>>
>>>>>>>>>I have no idea, against many warnings from the users, why the
>>
>>bug
>>
>>>>>
>>>>>has not been resolved yet. In the case it is difficult to address the
>>
>>bug,
>>
>>>>>one option is to wrap many other existing implementations, personally
>>
>>I
>>
>>>>>switched to libkdtree++.
>>>>>
>>>>>
>>>>>
>>_________________________________________________________________
>>
>>>>>>>>>100's of prizes to be won at BigSnapSearch.com
>>>>>>>>>
>>>>>>>>
>>>>>
>>>>>http://www.bigsnapsearch.com
>>>>>
>>>>>
>>>>>
>>>>>>>>>_______________________________________________
>>>>>>>>>Insight-users mailing list
>>>>>>>>>Insight-users at itk.org
>>>>>>>>>http://www.itk.org/mailman/listinfo/insight-users
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>_________________________________________________________________
>>>>>>>100's of prizes to be won at BigSnapSearch.com
>>>>>>>
>>>>>>
>>>>>
>>>>>http://www.bigsnapsearch.com
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>_______________________________________________
>>>>>Insight-users mailing list
>>>>>Insight-users at itk.org
>>>>>http://www.itk.org/mailman/listinfo/insight-users
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>
More information about the Insight-users
mailing list