[Insight-users] Speed optimizations in mutual information registration
Hannu Helminen
Hannu.Helminen.0001@dosetek.varian.com
Wed, 22 May 2002 13:08:58 +0300
Hi!
I did some profiling of the mutual information multi-resolution matching
routines. It seems that a considerable amount of the total optimization
time is spent in the dynamic allocation of vectors. Since the vector type
itk::Array has a variable size, it has to do all of its memory allocations
from the heap, and heap allocations and deallocations tend to be quite
expensive in terms of speed.
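To illustrate the difference, here is a minimal sketch (the function name
and the six-parameter count are made-up examples, e.g. a 3-D rigid
transform):

#include "itkArray.h"
#include "itkFixedArray.h"

void ComputeDerivativeForOneSample()
{
  // One heap allocation here, and a matching deallocation when the
  // object goes out of scope -- this happens on every call.
  itk::Array< double >          dynamicDerivative( 6 );

  // No heap traffic at all: the six doubles live on the stack.
  itk::FixedArray< double, 6 >  fixedDerivative;

  for ( unsigned int i = 0; i < 6; ++i )
    {
    dynamicDerivative[ i ] = 0.0;
    fixedDerivative[ i ]   = 0.0;
    }
}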
I then did a quick and dirty hack that replaces the use of itk::Array with
itk::FixedArray. This resulted in quite a large speedup: the total
registration time (which includes the read & write routines etc.) was cut
in half. I also measured the time spent in the core optimization routine
m_Optimizer->StartOptimization() by accumulating the run times of each
round:
Time before the change: 270 seconds
Time after the change: 38 seconds
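(The accumulation can be done along these lines -- a sketch only, not the
exact code I used; the optimizer type below is just an example, any ITK
optimizer exposing StartOptimization() works the same way.)

#include <ctime>
#include "itkGradientDescentOptimizer.h"

typedef itk::GradientDescentOptimizer OptimizerType;

static double totalOptimizationSeconds = 0.0;

void TimeOneResolutionLevel( OptimizerType * optimizer )
{
  const std::clock_t start = std::clock();
  optimizer->StartOptimization();  // the core optimization loop of one level
  totalOptimizationSeconds +=
    double( std::clock() - start ) / CLOCKS_PER_SEC;
}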
My test setup: 2 x Xeon 1.7 GHz, Windows 2000, Visual Studio 6.0,
RelWithDebInfo configuration, CT image 512x512x76, MR image 256x256x82,
latest CVS version of ITK, 5 levels of multi-resolution with 2500
iterations per level.
Now my question is: is there any interest among the ITK developers or the
ITK community in these speed-ups? If there is, I could spend part of my
work time on this project if needed.
Here are some of my thoughts about speed-up:
1) Changing the type of the parameter and derivative of
MutualInformationImageToImageMetric to itk::FixedArray. This is what I did
in my quick hack. Unfortunately this change is not small: currently all
the optimization routines in the Numerics subdirectory are hard-coded to
use itk::Array as the parameter and derivative type.
However, I have a feeling that templatizing the optimization routines over
the vector type would be in the spirit of ITK: it would make them more
generic, much as STL's algorithms are. A rough sketch of what such an
interface might look like is shown after this list.
2) A smaller change would be to use a fixed-size array inside
itk::MutualInformationImageToImageMetric for all of its internal
calculations. I have not benchmarked how this would affect the run times.
(A small illustration is shown after this list.)
3) Yet another possibility would be to use a specialized fixed-size
allocator in itk::Array, such as the small-object allocator in the Loki
library [1]. Again, I do not have any data on the speed-ups attainable
with this method. (A toy sketch of such a pool is shown after this list.)
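To make 1) more concrete, here is a rough, hypothetical sketch of an
optimizer templated over its parameter type. None of these class or member
names exist in ITK; they only show the shape such an interface could take:

template < typename TCostFunction, typename TParameters >
class TemplatedGradientDescent
{
public:
  void StartOptimization( TCostFunction & costFunction,
                          TParameters &   position,
                          double          learningRate,
                          unsigned int    numberOfIterations )
  {
    // When TParameters is an itk::FixedArray, this temporary lives on
    // the stack and the loops below do no heap allocations at all.
    TParameters gradient;

    for ( unsigned int it = 0; it < numberOfIterations; ++it )
      {
      costFunction.GetDerivative( position, gradient );
      for ( unsigned int i = 0; i < position.Size(); ++i )
        {
        position[ i ] -= learningRate * gradient[ i ];
        }
      }
  }
};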
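For 2), the idea would be to copy the incoming itk::Array parameters once
into a stack-based FixedArray and do all of the per-sample work with that,
so the public interface stays as it is. A purely illustrative sketch (the
parameter count and the function are made up):

#include "itkArray.h"
#include "itkFixedArray.h"

const unsigned int NumberOfParameters = 6;   // example: 3-D rigid transform

void EvaluateWithFixedArrays( const itk::Array< double > & parameters )
{
  itk::FixedArray< double, NumberOfParameters > localParameters;

  // One copy per evaluation instead of heap allocations in the inner loops.
  for ( unsigned int i = 0; i < NumberOfParameters; ++i )
    {
    localParameters[ i ] = parameters[ i ];
    }

  // ... the per-sample loops would then use localParameters and other
  // FixedArray temporaries only ...
}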
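And for 3), something in the spirit of Loki's small-object allocator could
look like the toy pool below (an illustration only, not Loki's actual
interface): blocks of one fixed size are recycled through a free list, so
repeated allocate/deallocate cycles stop hitting the general-purpose heap.

#include <cstddef>
#include <vector>

class FixedSizePool
{
public:
  explicit FixedSizePool( std::size_t blockSize )
    : m_BlockSize( blockSize < sizeof( void * ) ? sizeof( void * )
                                                : blockSize ),
      m_FreeList( 0 ) {}

  ~FixedSizePool()
  {
    for ( std::size_t i = 0; i < m_Blocks.size(); ++i )
      {
      delete [] m_Blocks[ i ];
      }
  }

  void * Allocate()
  {
    if ( m_FreeList )
      {
      void * block = m_FreeList;                    // reuse a freed block
      m_FreeList = *static_cast< void ** >( m_FreeList );
      return block;
      }
    char * block = new char[ m_BlockSize ];         // grow the pool
    m_Blocks.push_back( block );
    return block;
  }

  void Deallocate( void * block )
  {
    *static_cast< void ** >( block ) = m_FreeList;  // push onto free list
    m_FreeList = block;
  }

private:
  std::size_t            m_BlockSize;
  void *                 m_FreeList;
  std::vector< char * >  m_Blocks;
};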
Any thoughts?
Hannu Helminen
Varian Medical Systems Finland Oy
[1] <http://sourceforge.net/projects/loki-lib/>