[Insight-developers] Proposed requirements for convolution filters in ITK

Thu Jun 2 13:07:33 EDT 2011

Hello Cory,

Thanks a lot for your suggestions on how to organize the convolution
filters.  I agree with most of your points as well as Gaetan's replies.
Here are some thoughts about your questions.

"The output image should be the same size as the first input image."  
Perhaps the main reason for this is to make streaming easy.  But in my
implementations, I generally set the output to the full convolution
size (N+M-1,N+M-1), where N is the size of image 1, and M is the size of
image 2.  As long as we have an accessor for this full-size image and
perhaps one for only the valid region, I think that is fine.  We should
keep in mind that if the images are the same size, the valid region is
only 1 pixel.
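
To make the size bookkeeping above concrete, here is a minimal sketch in
Python.  The function names are hypothetical and are not part of ITK; the
sizes are given per dimension.

```python
def full_convolution_size(image_size, kernel_size):
    """Per-dimension size of the full convolution: N + M - 1."""
    return tuple(n + m - 1 for n, m in zip(image_size, kernel_size))


def valid_region_size(image_size, kernel_size):
    """Per-dimension size of the region where the smaller input lies
    entirely inside the larger one: |N - M| + 1."""
    return tuple(abs(n - m) + 1 for n, m in zip(image_size, kernel_size))
```

With two 5x5 inputs, the full convolution is 9x9 and the valid region is a
single pixel, which is the degenerate case mentioned above.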

"Treatment of regions beyond the boundaries of the input images should
be controllable by the user."
I believe that no handling of boundaries is necessary because the
algorithm should simply calculate the convolution in the overlap region
as the images are convolved.  This can be done explicitly in the spatial
domain, and it can be accomplished in the Fourier domain by padding both
images first with zeros so that their size is (N+M-1,N+M-1), where N is
the size of image 1, and M is the size of image 2.
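
The equivalence described above can be checked on a small 1-D example.
The following is only a sketch: it uses a naive O(n^2) DFT from the
Python standard library rather than VNL or FFTW, and all function names
are hypothetical.  Both inputs are zero-padded to N+M-1 so that the
circular convolution computed in the Fourier domain does not wrap around.

```python
import cmath


def convolve_direct(a, b):
    """Full linear convolution computed in the spatial domain."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def dft(x, inverse=False):
    """Naive discrete Fourier transform (illustration only, not an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        result.append(total / n if inverse else total)
    return result


def convolve_fourier(a, b):
    """Full linear convolution via zero-padding to N+M-1 and the DFT."""
    size = len(a) + len(b) - 1
    padded_a = list(a) + [0.0] * (size - len(a))
    padded_b = list(b) + [0.0] * (size - len(b))
    product = [u * v for u, v in zip(dft(padded_a), dft(padded_b))]
    return [value.real for value in dft(product, inverse=True)]
```

The two results should agree up to floating-point round-off, which is
the kind of tolerance-based comparison one would expect between the
spatial and Fourier domain filters.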

"Normalization of the kernel image should be an option. Default is off."
Do you mean normalization to have a sum of one?  I agree that it should
be off by default.  For example, if the user wants to convolve two
images, it would be confusing if the algorithm normalizes the second
image.  The user should have to explicitly specify normalization or do
the normalization himself/herself beforehand.
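
For reference, "normalization to have a sum of one" as I understand it
would amount to something like the sketch below.  The function name is
hypothetical, and the kernel is shown as a flat list of pixel values.

```python
def normalize_to_unit_sum(kernel):
    """Scale the kernel values so that they sum to one."""
    total = sum(kernel)
    if total == 0:
        raise ValueError("kernel sums to zero and cannot be normalized")
    return [value / total for value in kernel]
```

A user who wants a normalized kernel could apply such a step explicitly
before calling the filter, which keeps the default behavior unsurprising.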

"VNL operates only on images whose sizes are powers of two."
This is not entirely true; VNL requires only that the image size in each
dimension be a product of 2s, 3s, and 5s, which is far less restrictive
(and much faster).  It also turns out that, even though FFTW handles any
size, it is much faster when the image sizes are products of 2s, 3s, and
5s.  In my implementation, I wrote a function that calculates the next
larger image size in each dimension that is a product of 2s, 3s, and 5s
and then pads the image with zeros before doing the Fourier domain
computation.  Thus, the padding size should not be a parameter.  At the
end, it crops the correlation map back to the correct size.  Then the
result can be compared with the spatial domain version since the spatial
domain doesn't require any padding.
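
A minimal sketch of the size calculation described above, written in
Python for brevity.  This is not the code from my filter; the function
names are hypothetical, and the search is a simple linear scan upward
from the requested size.

```python
def has_only_small_factors(n, primes=(2, 3, 5)):
    """True if n has no prime factors other than those listed."""
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1


def next_fft_friendly_size(n, primes=(2, 3, 5)):
    """Smallest size >= n whose only prime factors are 2, 3, and 5."""
    if n < 1:
        raise ValueError("size must be positive")
    while not has_only_small_factors(n, primes):
        n += 1
    return n


def padded_fft_size(image_size, kernel_size):
    """Per-dimension padded size: at least N + M - 1, rounded up to the
    next size that factors into 2s, 3s, and 5s."""
    return tuple(next_fft_friendly_size(n + m - 1)
                 for n, m in zip(image_size, kernel_size))
```

For example, a 100x100 image and a 30x30 kernel need at least 129
samples per dimension, and the next acceptable size is 135 (27 * 5).
The result is then cropped back to the size the caller asked for.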

I have already implemented a filter that addresses most of these
concerns in the context of Normalized Cross-Correlation in the Fourier
domain using VNL.  It is based on my "Masked FFT Registration" CVPR 2010
paper.  Many parts of that code should be relevant for the convolution
filter, and I can share it with you if that is helpful.

Thanks,
Dirk

Date: Wed, 1 Jun 2011 11:17:02 -0400
From: Cory Quammen <cquammen at cs.unc.edu>
Subject: [Insight-developers] Proposed requirements for convolution
 filters in ITK
To: Insight Developers <insight-developers at itk.org>,
 Nicholas Tustison <ntustison at gmail.com>,
 Luis Ibanez <luis.ibanez at kitware.com>
Message-ID: <BANLkTikVqQuHX7ibA_zW0sXQz7H+0vMvWQ at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Here is what I would like to see in the convolution filters in ITK.

- Convolution filters should have the same interface to make it easy
to switch between the spatial domain and Fourier domain methods. This
suggests a class hierarchy for image convolution filters, perhaps
organized as follows:

--ConvolutionImageFilter
---SpatialDomainConvolutionImageFilter
---FourierDomainConvolutionImageFilter

- Each algorithm should take two inputs. The first is treated as the
image to convolve and the second as the convolution kernel.

- The output image should be the same size as the first input image.

- Both the image to convolve and the kernel can have an arbitrary
size. There should be no assumption that requires the kernel to be
smaller than the input image in any dimension or to have an even or
odd size.

- Results from the spatial domain implementation and Fourier domain
implementation should differ only by a tolerance when the same
settings are used for each filter and the pixel type of the images is
compatible with the underlying FFT routines used in the Fourier domain
implementation.

- The spacing of the two input images should be required to be the
same (other specialized convolution algorithms may handle other
cases). *Or* the image with coarser spacing should be resampled to the
same spacing as the image with finer spacing.

- The user must have the ability to specify which index is treated as
the center of the second input. This should default to
floor((2*index+size-1)/2) where index and size are from the largest
possible region of the kernel image input.

- The treatment of regions beyond the boundaries of the input images
should be controllable by the user. Typical choices include constant
padding, periodic, mirror, and zero-flux Neumann boundary conditions.
Default should be constant padding with a constant of zero.

- Normalization of the kernel image should be an option. Default is off.
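
Two of the requirements in the list above, the default kernel center and
the choice of boundary condition, can be illustrated with a short 1-D
sketch in Python.  The function names and mode strings are hypothetical
and are not ITK API.  The "mirror" mode here assumes a reflection that
repeats the edge sample; other conventions exist.

```python
def default_kernel_center(index, size):
    """Default center along one dimension: floor((2*index + size - 1) / 2),
    where index and size describe the kernel's largest possible region."""
    return (2 * index + size - 1) // 2  # // is floor division in Python


def sample(signal, i, mode="constant", constant=0.0):
    """Value of a 1-D signal at index i, which may lie outside the signal."""
    n = len(signal)
    if 0 <= i < n:
        return signal[i]
    if mode == "constant":
        return constant
    if mode == "periodic":
        return signal[i % n]
    if mode == "mirror":
        # Assumed convention: reflect about the edge, repeating the edge sample.
        j = i % (2 * n)
        return signal[j] if j < n else signal[2 * n - 1 - j]
    if mode == "zero_flux_neumann":
        # Zero derivative at the boundary: clamp to the nearest edge sample.
        return signal[min(max(i, 0), n - 1)]
    raise ValueError("unknown boundary mode: " + mode)
```

For a kernel region starting at index 0, the default center is 2 for
size 5 and 1 for size 4, so an even-sized kernel is centered on the
lower of its two middle pixels.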

Outstanding question:

- VNL operates only on images whose sizes are powers of two. For best
performance, FFTW requires that the largest prime factor in an image
size should be 13 or less. Input images should be padded to meet those
size constraints in the Fourier domain filter, but no such restriction
exists for padding in the spatial domain filter. Nevertheless, to
compare outputs of the spatial and Fourier domain filters, users
should be able to set the padding size in the spatial domain filter to
be the same as the padding in the Fourier domain filter. Is exposing
a class method to control this justified if it is used only for
testing? Or should testing for agreement between the spatial and
Fourier domain filters be restricted to cases where no special padding
for VNL or FFTW is required?

Please let me know what you think.

Thanks,
Cory