[Insight-users] Kd-tree implementation in ITK is buggy and unreliable

Bill Lorensen bill.lorensen at gmail.com
Tue Apr 29 12:08:11 EDT 2008


I committed last night. You should see the results on tomorrow's dashboard.

Bill

On Tue, Apr 29, 2008 at 11:43 AM, Luis Ibanez <luis.ibanez at kitware.com> wrote:
>
> Hi Bill,
>
>          Great !
>
>
> Have you committed the fix ?
> Or, could you post the patch
>
>
> On a parallel note,
> I'm tracking the problem with the time out
> of about 6 test. The new version of QuickSelect
> fails for cases where there are many repeated
> values in the collection.  I'm looking at
> replacing it with the implementation of
> NthElement from STL.
>
>
>
>  Thanks
>
>
>
>
>     Luis
>
>
> -------------------
> Bill Lorensen wrote:
> > Luis,
> >
> > I fixed the floating point overflow problem in KdTree.
> >
> > Bill
> >
> > On Mon, Apr 28, 2008 at 8:21 AM, Luis Ibanez <luis.ibanez at kitware.com>
> wrote:
> >
> >
> > >
> > > Hi Ali,
> > >
> > >
> > > A quick update on the status of this problem:
> > >
> > >
> > > Bugs were identified and fixed at the level of the
> > > StatisticsAlgorithms:
> > >
> > >  * Partition
> > >  * QuickSelect
> > >
> > >
> http://www.itk.org/cgi-bin/viewcvs.cgi/Code/Numerics/Statistics/itkStatisticsAlgorithm.txx?root=Insight&r1=1.19&r2=1.21&sortby=date
> > >
> http://www.itk.org/cgi-bin/viewcvs.cgi/Code/Numerics/Statistics/itkKdTreeGenerator.txx?root=Insight&r1=1.14&r2=1.18&sortby=date
> > >
> > > Explicit tests were added for
> > >
> > >  * QuickSelect
> > >  * KdTree
> > >  * KdTreeGenerator
> > >
> > > (about 12 new tests).
> > >
> > > These new tests are passing.
> > >
> > >
> > > There are still side effects in the following (now failing) tests
> > >
> > > * BayesianClassifierInitializeTest
> > > * itkSampleMeanSjhiftClusteringFilterTest
> > > * RBFTest1
> > > * ScalarImageKmeansClassifierTest
> > >
> > > They are currently timing-out, (probably due to infinite-loops)
> > > We suspect that they are related to the WeightedCentroidKdTreeGenerator
> > > (but that is only  early speculation).
> > >
> > >
> > > We are now tracking those issues...
> > >
> > >
> > >
> > >
> > >
> > >   Luis
> > >
> > >
> > >
> > > --------------------
> > > Luis Ibanez wrote:
> > >
> > >
> > > > Hi Bill,
> > > >
> > > > Yeap, the problem was in the combination of QuickSelect and Partition.
> > > >
> > > > Both of them are defined in
> > > >
> > > >  Insight/Code/Numerics/Statistics/itkStatisticsAlgorithms.txx
> > > >
> > > > Sorting the list before QuickSelect would only work when the
> measurement
> > > > vector has a single dimension. Otherwise, a single sort can not
> possibly
> > > > organize the list in all dimensions. We could call sort just before
> > > > Partition, to sort the activeDimension, but this is an overkill, since
> > > > Partition is indeed intended for parforming a partial sorting.
> > > >
> > > >
> > > > I have replaced both the QuickSelect and Partition functions with
> > > > implementations based on the description of the Partition and
> > > > QuickSelect algorithms in the Wikipedia:
> > > >
> > > >   http://en.wikipedia.org/wiki/Quickselect
> > > >
> > > > This works fine for KdTreeTest1 and KdTreeTest2.
> > > >
> > > >
> > > > However, the test for the StatisticsAlgorithms itself fails (timesout)
> > > > due to the fact that its list contains multiple entries with
> > > > values equal to the partition value. A case that is not considered
> > > > in the current algorithm. The current algorithm enters into an
> > > > infinite loop in this case.
> > > >
> > > > I'm modifying the implementation for covering that case.
> > > >
> > > > --
> > > >
> > > > BTW, about 10 test more were added, for exercising the KdTree
> > > > at different bucket sizes.
> > > >
> > > >
> > > > To be revisited: The tests in Borland are producing floating-point
> > > >               overflows... not a good sign. There are possible
> > > >               other issues in the computation.
> > > >
> > > >
> > > >
> > > >  Luis
> > > >
> > > >
> > > >
> > > > ---------------------
> > > > Bill Lorensen wrote:
> > > >
> > > >
> > > >
> > > >
> > > > > Luis,
> > > > >
> > > > > I think the main problem is that the samples need to be sorted. I
> added:
> > > > > HeapSort<SubsampleType>(m_Subsample, partitionDimension, beginIndex,
> > > > > endIndex);
> > > > > just before the QuickSelect call and itkKdTreeTest1 passes
> > > > > consistently.
> > > > >
> > > > > However, this is not a good fix. I think the QuickSelect code should
> > > > > work. Probably the error is in Partition. It should group a list
> into
> > > > > two parts, one less than the partition value and another greater
> than
> > > > > or equal the value. Currently it looks like Partition assumes the
> list
> > > > > is sorted.
> > > > >
> > > > > BTW, itkKdTreeTest2 produces a tree different from the one in
> wikipedia.
> > > > >
> > > > > HTH,
> > > > > Bill
> > > > >
> > > > > On Fri, Apr 25, 2008 at 12:59 PM, Luis Ibanez
> <luis.ibanez at kitware.com>
> > > > >
> > > >
> > >
> > > wrote:
> > >
> > >
> > > >
> > > > >
> > > > >
> > > > > > Hi Ali,
> > > > > >
> > > > > >
> > > > > > An update on the KdTree segmentation fault
> > > > > > (that may or may not be related to the problem that you observe).
> > > > > >
> > > > > >
> > > > > > This test is running the example tree provided in the Wikipedia
> > > > > > http://en.wikipedia.org/wiki/Kdtree
> > > > > > http://en.wikipedia.org/wiki/Kdtree#Constructing_a_kd-tree
> > > > > >
> > > > > > We are feeding the set of points:
> > > > > >
> > > > > >     (2,3), (5,4), (9,6), (4,7), (8,1), (7,2)
> > > > > >
> > > > > > and expecting to get the tree:
> > > > > > http://upload.wikimedia.org/wikipedia/en/f/f1/Kd_tree.png
> > > > > >
> > > > > >             (7,2)
> > > > > >               /\
> > > > > >              /  \
> > > > > >             /    \
> > > > > >         (5,4)    (9,6)
> > > > > >           /\       \
> > > > > >          /  \       \
> > > > > >         /    \       \
> > > > > >      (2,3)  (4,7)    (8,1)
> > > > > >
> > > > > >
> > > > > >
> > > > > > Here is what we currently observe as output for different
> > > > > > bucket sizes:
> > > > > >
> > > > > > -----
> > > > > >
> > > > > > Case BucketSize == 1:
> > > > > >
> > > > > > The segmentation fault of itkKdTreeTest2 with BucketSize==1
> > > > > > happens because the recursive function:
> > > > > >
> > > > > >  KdTreeGenerator::GenerateTreeLoop()  (line 179)
> > > > > >
> > > > > > ends up calling itself indifinitely, until it overflows the stack.
> > > > > > It enter in an infinite (recursive) loop calling:
> > > > > >
> > > > > > GenerateTreeLoop( 0, 0, level++)
> > > > > > GenerateTreeLoop( 0, 2, level++)
> > > > > > GenerateTreeLoop( 0, 0, level++)
> > > > > > GenerateTreeLoop( 0, 2, level++)
> > > > > > GenerateTreeLoop( 0, 0, level++)
> > > > > > GenerateTreeLoop( 0, 2, level++)
> > > > > > ....
> > > > > >
> > > > > > where "0" and "2" here are the indexes of the points in the
> sample.
> > > > > >
> > > > > > -----
> > > > > >
> > > > > > Case BucketSize == 2:
> > > > > >
> > > > > > The segmentation fault of itkKdTreeTest2 with BucketSize==2
> > > > > > happens also because the recursive function:
> > > > > >
> > > > > >  KdTreeGenerator::GenerateTreeLoop()  (line 179)
> > > > > >
> > > > > > ends up calling itself indifinitely, until it overflows the stack.
> > > > > > It enter in an infinite (recursive) loop calling:
> > > > > >
> > > > > > GenerateTreeLoop( 3, 3, level++)
> > > > > > GenerateTreeLoop( 3, 6, level++)
> > > > > > GenerateTreeLoop( 3, 3, level++)
> > > > > > GenerateTreeLoop( 3, 6, level++)
> > > > > > GenerateTreeLoop( 3, 3, level++)
> > > > > > GenerateTreeLoop( 3, 6, level++)
> > > > > > ....
> > > > > >
> > > > > > where "3" and "6" here are the indexes of the points in the
> sample.
> > > > > >
> > > > > > -----
> > > > > >
> > > > > > Case BucketSize == 3:
> > > > > >
> > > > > > When using a bucket size of 3, the function GenerateTreeLoop()
> > > > > > is being called only once, but it generates the following tree
> > > > > >
> > > > > >               NTN
> > > > > >               /\
> > > > > >              /  \
> > > > > >             /    \
> > > > > >  [(2,3),(4,7)]     NTN
> > > > > >                   /\
> > > > > >                  /  \
> > > > > >                 /    \
> > > > > >             [(8,1)]     [ (7,2) (5,4) (9,6)]
> > > > > >
> > > > > >
> > > > > > Where "NTN" stands for "non-terminal node".
> > > > > > and the groups with parenthesis "[ ]" represent buckets.
> > > > > >
> > > > > >
> > > > > > That is a first plane dividing the space at X=4.5
> > > > > > One bucket on the left for points with X values < 4.5
> > > > > > and on the right, a second plane dividing the space
> > > > > > at Y = 1.5, to create on bucket on the left with (8,1)
> > > > > > and on the right another bucket with points with Y values
> > > > > > larger than 1.5, namely: [ (7,2) (5,4) (9,6)]
> > > > > >
> > > > > >
> > > > > > -----
> > > > > >
> > > > > > Case BucketSize == 4:
> > > > > >
> > > > > > When using a bucket size of 4 , the function GenerateTreeLoop()
> > > > > > is being called three times, and it generates the following tree
> > > > > >
> > > > > >
> > > > > >               NTN
> > > > > >               /\
> > > > > >              /  \
> > > > > >             /    \
> > > > > >  [(2,3),(4,7)]     [(8,1) (7,2) (5,4) (9,6)]
> > > > > >
> > > > > >
> > > > > > That is a single plane dividing the space at X=4.5
> > > > > >
> > > > > >
> > > > > > ----------
> > > > > >
> > > > > > In  either case,
> > > > > > the partition is not what we could expect from a KdTree...
> > > > > >
> > > > > >
> > > > > > Just to clarify, the real problem seems to be in the
> > > > > > implementation of the algorithm in the KdTreeGenerator,
> > > > > > not in the KdTree class itself...
> > > > > >
> > > > > >
> > > > > >
> > > > > >  Luis
> > > > > >
> > > > > >
> > > > > >
> > > > > > -------------------
> > > > > >
> > > > > >
> > > > > > Luis Ibanez wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > Hi Ali,
> > > > > > >
> > > > > > > Thanks for the additinonal information.
> > > > > > >
> > > > > > >  Yeap, the algorithm should work regardless
> > > > > > >  of the bucket size. That one was a false alarm.
> > > > > > >
> > > > > > >
> > > > > > > I'm tracking now the cases where itkKdTreeTest2 segfaults
> > > > > > > with a bucket size == 2.  Hopefully this will illuminate
> > > > > > > at least part of the problem.
> > > > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > >
> > > > > > >  Luis
> > > > > > >
> > > > > > >
> > > > > > > ---------------
> > > > > > > Ali - wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > Luis,
> > > > > > > >
> > > > > > > > I used a bucket size of 16 with about 100 particles. Like you
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > > mentioned,
> > >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > the algorithm must work fine independent of these parameters.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > > -Ali
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > Hi Ali,
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Could you please report
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 1) How many points (measurement vectors) are you
> > > > > > > > >  currently feeding into the KdTreeGenerator ?
> > > > > > > > >
> > > > > > > > > and
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 2) What value of "BucketSize" did you use ?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > The problem reported in the bug
> > > > > > > > > http://public.kitware.com/Bug/view.php?id=5082
> > > > > > > > >
> > > > > > > > > Seems to be simply the result of a poor choice
> > > > > > > > > on the value of the bucket size parameter.
> > > > > > > > > For the test case reported in that bug, increasing
> > > > > > > > > the value from 16 to 30 solves the problem.
> > > > > > > > >
> > > > > > > > > We are wondering if the problem that you are
> > > > > > > > > experiencing is also the result of poor parameter
> > > > > > > > > setting.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Please let us know,
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >  Thanks
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >      Luis
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > ------------
> > > > > > > > > Ali - wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > A quick search in the mailing list shows that, during the
> past
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > > half
> > >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > a decade, it has been reported many times that the kd-tree classes
> in
> > > > > >
> > > > >
> > > >
> > >
> > > ITK
> > >
> > >
> > > >
> > > > >
> > > > > > are buggy. Writing some library based on ITK, it took me a few
> days to
> > > > > >
> > > > >
> > > >
> > >
> > > find
> > >
> > >
> > > >
> > > > >
> > > > > > out that the bug in my library is actually introduced by the
> kd-tree
> > > > > > implementation in ITK which FINDS THE WRONG NEAREST NEIGHBOUR.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > I have no idea, against many warnings from the users, why
> the
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > > bug
> > >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > has not been resolved yet. In the case it is difficult to address
> the
> > > > > >
> > > > >
> > > >
> > >
> > > bug,
> > >
> > >
> > > >
> > > > >
> > > > > > one option is to wrap many other existing implementations,
> personally
> > > > > >
> > > > >
> > > >
> > >
> > > I
> > >
> > >
> > > >
> > > > >
> > > > > > switched to libkdtree++.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > > _________________________________________________________________
> > >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > 100's of prizes to be won at BigSnapSearch.com
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > > http://www.bigsnapsearch.com
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > _______________________________________________
> > > > > > > > > > Insight-users mailing list
> > > > > > > > > > Insight-users at itk.org
> > > > > > > > > > http://www.itk.org/mailman/listinfo/insight-users
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> _________________________________________________________________
> > > > > > > > 100's of prizes to be won at BigSnapSearch.com
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > http://www.bigsnapsearch.com
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > _______________________________________________
> > > > > > Insight-users mailing list
> > > > > > Insight-users at itk.org
> > > > > > http://www.itk.org/mailman/listinfo/insight-users
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> >
> >
>


More information about the Insight-users mailing list