[Insight-users] KD Tree Search Question...

Fri Mar 24 12:09:15 EST 2006

Jim,

  In my opinion, I think returning self-matches are the way to go.
Users that don't want self-matches can check against a 0+epsilon
distance.  ITK could even mask the whole self-match checking into a
member function, as you suggest, to make everyone's life easier.
Concomitant to this should be an enforcement of how the NN are returned:
i.e. in ascending or descending order.  That way users could easily
filter for the NN's they want just in case the ITK 0 distance check is
not what they are looking for.

  I have replaced the ITK KDtree with ANN
( http://www.cs.umd.edu/~mount/ANN/ )to help speed-up the development of
my algorithm.  Once I get my application coded up and crunching numbers,
I'll re-generate my test case and submit it to the ITK bug tracker.

Best regards,

Tuhin

On Fri, 2006-03-24 at 09:28 -0500, Miller, James V (GE, Research) wrote:
> Tuhin,
> 
> This sounds like a bug.  The KdTree should do one or the other
> but not sometimes one and sometimes the other.
> 
> Now, the question becomes what should it do?
> 
> I see use cases for both.  For instance, you may not know (realize)
> that you point you are querying with is already in the KdTree and
> you want to know if this position has already been used. So in this
> case, you would want to be able to query a position in the domain
> and if that position is in the tree, return it in the nearest
> neighbor list.  This would occur if you are "testing" the tree
> against data other than what it was built with.
> 
> On the other hand, if you built a tree, you may now want to 
> reason/cluster based on the tree.  So when you query for the
> k-nearest neighbors, you want the k-nearest that do not 
> include the point the in question.
> 
> I see two solutions:
> 
> 1) If the point exists in the tree, it is returned as part of the 
> nearest neighbor set.  It would be an algorithms responsibility to 
> decide whether to use that point (sort by distance and ignore any
> point with a distance of zero).
> 
> 2) Provide two sets of methods.  One which will return the query point
> in the nearest neighbor set if the query point is in the tree. And one
> which will not return the query point in the nearest neighbor set even
> if the query point is the tree.
> 
> Any preferences?
> 
> The bottom line is that it should be deterministic and not do one
> thing sometimes and another thing another time.  Can you construct
> a small test case (maybe lower dimension) that illustrates the bug?
> 
> Jim
> 
> 
> 
> -----Original Message-----
> From: insight-users-bounces+millerjv=crd.ge.com at itk.org
> [mailto:insight-users-bounces+millerjv=crd.ge.com at itk.org]On Behalf Of
> Tuhin Sinha
> Sent: Thursday, March 23, 2006 5:43 PM
> To: insight-users at itk.org
> Subject: [Insight-users] KD Tree Search Question...
> 
> 
> Hello insight-users,
> 
> I have a quick question about the KdTree implementation in ITK.  I have
> started using the KdTree in ITK to help speed up nearest-neighbor
> searches in high-dimensional spaces (D > ~500).  In the process of
> testing the KdTree, I noticed that for a k-nearest-neighbor search: 1)
> the nearest-neighbors are not returned in any specific order, and 2)
> _sometimes_ the query point itself (which is guaranteed to be in the
> KdTree) is returned as a nearest-neighbor.
> 
> The first observation is not a big issue, but the second one is
> problematic for my specific needs.  I would like to enforce some
> consistency in the behavior of the KdTree in terms of the returned
> nearest-neighbors; either give me the query point every time or never.
> 
>   Does the behavior I am getting from the KdTree correlate with what the
> developers intended?  That is, if one query's a KDTree in ITK with a
> point that is in the KDTree what should be the returned
> nearest-neighbor, itself or it's closest neighbor?  In my testing a
> 1-neighbor search yeilds both results; sometimes I get the query point,
> and sometimes I don't.  Is the end-user supposed to enforce some sort of
> redundancy checking of the returned nearest neighbors for KD searches in
> ITK? Or, do I have a bug in my code?  I am running ITK 2.4 under Linux.
> 
> Thanks,
-- 
Tuhin K. Sinha, Ph.D.
Institute of Imaging Science
Department of Radiology and Radiological Sciences
Vanderbilt University
Medical Center North,  Suite R-1302
Nashville, TN 37232-2310