[Insight-users] Kd-tree implementation in ITK is buggy and unreliable

Bill Lorensen bill.lorensen at gmail.com
Tue Apr 29 09:09:58 EDT 2008


Luis,

I fixed the floating point overflow problem in KdTree.

Bill

On Mon, Apr 28, 2008 at 8:21 AM, Luis Ibanez <luis.ibanez at kitware.com> wrote:
>
>
> Hi Ali,
>
>
> A quick update on the status of this problem:
>
>
> Bugs were identified and fixed at the level of the
> StatisticsAlgorithms:
>
>   * Partition
>   * QuickSelect
>
> http://www.itk.org/cgi-bin/viewcvs.cgi/Code/Numerics/Statistics/itkStatisticsAlgorithm.txx?root=Insight&r1=1.19&r2=1.21&sortby=date
> http://www.itk.org/cgi-bin/viewcvs.cgi/Code/Numerics/Statistics/itkKdTreeGenerator.txx?root=Insight&r1=1.14&r2=1.18&sortby=date
>
> Explicit tests were added for
>
>   * QuickSelect
>   * KdTree
>   * KdTreeGenerator
>
> (about 12 new tests).
>
> These new tests are passing.
>
>
> There are still side effects in the following (now failing) tests
>
>  * BayesianClassifierInitializeTest
>  * itkSampleMeanSjhiftClusteringFilterTest
>  * RBFTest1
>  * ScalarImageKmeansClassifierTest
>
> They are currently timing-out, (probably due to infinite-loops)
> We suspect that they are related to the WeightedCentroidKdTreeGenerator
> (but that is only  early speculation).
>
>
> We are now tracking those issues...
>
>
>
>
>
>     Luis
>
>
>
> --------------------
> Luis Ibanez wrote:
> >
> > Hi Bill,
> >
> > Yeap, the problem was in the combination of QuickSelect and Partition.
> >
> > Both of them are defined in
> >
> >   Insight/Code/Numerics/Statistics/itkStatisticsAlgorithms.txx
> >
> > Sorting the list before QuickSelect would only work when the measurement
> > vector has a single dimension. Otherwise, a single sort can not possibly
> > organize the list in all dimensions. We could call sort just before
> > Partition, to sort the activeDimension, but this is an overkill, since
> > Partition is indeed intended for parforming a partial sorting.
> >
> >
> > I have replaced both the QuickSelect and Partition functions with
> > implementations based on the description of the Partition and
> > QuickSelect algorithms in the Wikipedia:
> >
> >     http://en.wikipedia.org/wiki/Quickselect
> >
> > This works fine for KdTreeTest1 and KdTreeTest2.
> >
> >
> > However, the test for the StatisticsAlgorithms itself fails (timesout)
> > due to the fact that its list contains multiple entries with
> > values equal to the partition value. A case that is not considered
> > in the current algorithm. The current algorithm enters into an
> > infinite loop in this case.
> >
> > I'm modifying the implementation for covering that case.
> >
> > --
> >
> > BTW, about 10 test more were added, for exercising the KdTree
> > at different bucket sizes.
> >
> >
> > To be revisited: The tests in Borland are producing floating-point
> >                 overflows... not a good sign. There are possible
> >                 other issues in the computation.
> >
> >
> >
> >   Luis
> >
> >
> >
> > ---------------------
> > Bill Lorensen wrote:
> >
> >
> > > Luis,
> > >
> > > I think the main problem is that the samples need to be sorted. I added:
> > >  HeapSort<SubsampleType>(m_Subsample, partitionDimension, beginIndex,
> > > endIndex);
> > > just before the QuickSelect call and itkKdTreeTest1 passes
> > > consistently.
> > >
> > > However, this is not a good fix. I think the QuickSelect code should
> > > work. Probably the error is in Partition. It should group a list into
> > > two parts, one less than the partition value and another greater than
> > > or equal the value. Currently it looks like Partition assumes the list
> > > is sorted.
> > >
> > > BTW, itkKdTreeTest2 produces a tree different from the one in wikipedia.
> > >
> > > HTH,
> > > Bill
> > >
> > > On Fri, Apr 25, 2008 at 12:59 PM, Luis Ibanez <luis.ibanez at kitware.com>
> wrote:
> > >
> > >
> > > > Hi Ali,
> > > >
> > > >
> > > > An update on the KdTree segmentation fault
> > > > (that may or may not be related to the problem that you observe).
> > > >
> > > >
> > > > This test is running the example tree provided in the Wikipedia
> > > > http://en.wikipedia.org/wiki/Kdtree
> > > > http://en.wikipedia.org/wiki/Kdtree#Constructing_a_kd-tree
> > > >
> > > > We are feeding the set of points:
> > > >
> > > >       (2,3), (5,4), (9,6), (4,7), (8,1), (7,2)
> > > >
> > > > and expecting to get the tree:
> > > > http://upload.wikimedia.org/wikipedia/en/f/f1/Kd_tree.png
> > > >
> > > >               (7,2)
> > > >                 /\
> > > >                /  \
> > > >               /    \
> > > >           (5,4)    (9,6)
> > > >             /\       \
> > > >            /  \       \
> > > >           /    \       \
> > > >        (2,3)  (4,7)    (8,1)
> > > >
> > > >
> > > >
> > > > Here is what we currently observe as output for different
> > > > bucket sizes:
> > > >
> > > > -----
> > > >
> > > > Case BucketSize == 1:
> > > >
> > > > The segmentation fault of itkKdTreeTest2 with BucketSize==1
> > > > happens because the recursive function:
> > > >
> > > >   KdTreeGenerator::GenerateTreeLoop()  (line 179)
> > > >
> > > > ends up calling itself indifinitely, until it overflows the stack.
> > > > It enter in an infinite (recursive) loop calling:
> > > >
> > > >  GenerateTreeLoop( 0, 0, level++)
> > > >  GenerateTreeLoop( 0, 2, level++)
> > > >  GenerateTreeLoop( 0, 0, level++)
> > > >  GenerateTreeLoop( 0, 2, level++)
> > > >  GenerateTreeLoop( 0, 0, level++)
> > > >  GenerateTreeLoop( 0, 2, level++)
> > > >  ....
> > > >
> > > > where "0" and "2" here are the indexes of the points in the sample.
> > > >
> > > > -----
> > > >
> > > > Case BucketSize == 2:
> > > >
> > > > The segmentation fault of itkKdTreeTest2 with BucketSize==2
> > > > happens also because the recursive function:
> > > >
> > > >   KdTreeGenerator::GenerateTreeLoop()  (line 179)
> > > >
> > > > ends up calling itself indifinitely, until it overflows the stack.
> > > > It enter in an infinite (recursive) loop calling:
> > > >
> > > >  GenerateTreeLoop( 3, 3, level++)
> > > >  GenerateTreeLoop( 3, 6, level++)
> > > >  GenerateTreeLoop( 3, 3, level++)
> > > >  GenerateTreeLoop( 3, 6, level++)
> > > >  GenerateTreeLoop( 3, 3, level++)
> > > >  GenerateTreeLoop( 3, 6, level++)
> > > >  ....
> > > >
> > > > where "3" and "6" here are the indexes of the points in the sample.
> > > >
> > > > -----
> > > >
> > > > Case BucketSize == 3:
> > > >
> > > > When using a bucket size of 3, the function GenerateTreeLoop()
> > > > is being called only once, but it generates the following tree
> > > >
> > > >                 NTN
> > > >                 /\
> > > >                /  \
> > > >               /    \
> > > >   [(2,3),(4,7)]     NTN
> > > >                     /\
> > > >                    /  \
> > > >                   /    \
> > > >               [(8,1)]     [ (7,2) (5,4) (9,6)]
> > > >
> > > >
> > > > Where "NTN" stands for "non-terminal node".
> > > > and the groups with parenthesis "[ ]" represent buckets.
> > > >
> > > >
> > > > That is a first plane dividing the space at X=4.5
> > > > One bucket on the left for points with X values < 4.5
> > > > and on the right, a second plane dividing the space
> > > > at Y = 1.5, to create on bucket on the left with (8,1)
> > > > and on the right another bucket with points with Y values
> > > > larger than 1.5, namely: [ (7,2) (5,4) (9,6)]
> > > >
> > > >
> > > > -----
> > > >
> > > > Case BucketSize == 4:
> > > >
> > > > When using a bucket size of 4 , the function GenerateTreeLoop()
> > > > is being called three times, and it generates the following tree
> > > >
> > > >
> > > >                 NTN
> > > >                 /\
> > > >                /  \
> > > >               /    \
> > > >   [(2,3),(4,7)]     [(8,1) (7,2) (5,4) (9,6)]
> > > >
> > > >
> > > > That is a single plane dividing the space at X=4.5
> > > >
> > > >
> > > > ----------
> > > >
> > > > In  either case,
> > > > the partition is not what we could expect from a KdTree...
> > > >
> > > >
> > > > Just to clarify, the real problem seems to be in the
> > > > implementation of the algorithm in the KdTreeGenerator,
> > > > not in the KdTree class itself...
> > > >
> > > >
> > > >
> > > >   Luis
> > > >
> > > >
> > > >
> > > > -------------------
> > > >
> > > >
> > > > Luis Ibanez wrote:
> > > >
> > > >
> > > > > Hi Ali,
> > > > >
> > > > > Thanks for the additinonal information.
> > > > >
> > > > >   Yeap, the algorithm should work regardless
> > > > >   of the bucket size. That one was a false alarm.
> > > > >
> > > > >
> > > > > I'm tracking now the cases where itkKdTreeTest2 segfaults
> > > > > with a bucket size == 2.  Hopefully this will illuminate
> > > > > at least part of the problem.
> > > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > >   Luis
> > > > >
> > > > >
> > > > > ---------------
> > > > > Ali - wrote:
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > Luis,
> > > > > >
> > > > > > I used a bucket size of 16 with about 100 particles. Like you
> mentioned,
> > > > > >
> > > > >
> > > >
> > > >
> > > > the algorithm must work fine independent of these parameters.
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > -Ali
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > Hi Ali,
> > > > > > >
> > > > > > >
> > > > > > > Could you please report
> > > > > > >
> > > > > > >
> > > > > > > 1) How many points (measurement vectors) are you
> > > > > > >   currently feeding into the KdTreeGenerator ?
> > > > > > >
> > > > > > > and
> > > > > > >
> > > > > > >
> > > > > > > 2) What value of "BucketSize" did you use ?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > The problem reported in the bug
> > > > > > > http://public.kitware.com/Bug/view.php?id=5082
> > > > > > >
> > > > > > > Seems to be simply the result of a poor choice
> > > > > > > on the value of the bucket size parameter.
> > > > > > > For the test case reported in that bug, increasing
> > > > > > > the value from 16 to 30 solves the problem.
> > > > > > >
> > > > > > > We are wondering if the problem that you are
> > > > > > > experiencing is also the result of poor parameter
> > > > > > > setting.
> > > > > > >
> > > > > > >
> > > > > > >  Please let us know,
> > > > > > >
> > > > > > >
> > > > > > >    Thanks
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >        Luis
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > ------------
> > > > > > > Ali - wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > A quick search in the mailing list shows that, during the past
> half
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > a decade, it has been reported many times that the kd-tree classes in
> ITK
> > > > are buggy. Writing some library based on ITK, it took me a few days to
> find
> > > > out that the bug in my library is actually introduced by the kd-tree
> > > > implementation in ITK which FINDS THE WRONG NEAREST NEIGHBOUR.
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > I have no idea, against many warnings from the users, why the
> bug
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > has not been resolved yet. In the case it is difficult to address the
> bug,
> > > > one option is to wrap many other existing implementations, personally
> I
> > > > switched to libkdtree++.
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> _________________________________________________________________
> > > > > > > > 100's of prizes to be won at BigSnapSearch.com
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > http://www.bigsnapsearch.com
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > Insight-users mailing list
> > > > > > > > Insight-users at itk.org
> > > > > > > > http://www.itk.org/mailman/listinfo/insight-users
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > _________________________________________________________________
> > > > > > 100's of prizes to be won at BigSnapSearch.com
> > > > > >
> > > > >
> > > >
> > > >
> > > > http://www.bigsnapsearch.com
> > > >
> > > >
> > > > >
> > > > >
> > > > >
> > > > _______________________________________________
> > > > Insight-users mailing list
> > > > Insight-users at itk.org
> > > > http://www.itk.org/mailman/listinfo/insight-users
> > > >
> > > >
> > >
> > >
> > >
> >
> >
>


More information about the Insight-users mailing list