[Insight-users] Understanding OptimizerScales in its entirety

Fri Aug 26 15:52:53 EDT 2011

Hi Rupert,
     I am still working my way through your thesis, and I have also  
downloaded the trust region optimizer you provided for the Insight  
Journal.  I am still having trouble getting the results I desire using  
your optimizer for registering multimodal (T1, T2, DWI) brain data  
using a Rigid 3D Versor transform and the mutual information metric  
(this is done as an initialization to then feed deformable  
registrations).  Based on what I've read and experimented with so far,  
I believe this trouble can be attributed to some or all of the  
following issues:

1.  I don't think my scaling is right yet.  Do you have an  
implementation of your scaling calculation available?

2.  I think the optimizer is not using enough samples.  According to a  
comment I found in one of the ITK examples, "Regulating the number of  
samples in the Metric is equivalent to performing multi-resolution  
registration because it is indeed a sub-sampling of the image."   
However, based on my own experiments and also based on your  
evaluation, just arbitrarily setting a fixed percentage of pixels to  
use does not perform well especially in the mutual information case.   
I feel like I should be doing an actual multi-resolution registration  
or at least using all of the pixels in the image for a single-level  
registration.

3.  I am not sure that I can guarantee (even if I fix 1-2 above) that  
I will be starting within the capture radius of the optimizer.  Do you  
have any thoughts / recommendations / suggestions on how to better  
initialize the transform?  Or do you have an analysis of what the ITK  
optimizers' capture radii are like?  I still see significant  
translation and rotation error when I use the  
CenteredVersorTransformInitializer, both with and without moments on.

4. I modified VersorRigid3DTransformOptimizer to derive from your  
trust region optimizer, but found that the API for StepAlongGradient  
no longer provides the factor as an argument.  The Versor optimizer  
was scaling the rotation and the transformed gradient by that factor.  
Is this no longer necessary?

5.  Can you provide access to your modified Hessian approximation for  
mutual information?

Thanks!
Kris Zygmunt
krismz at sci.utah.edu
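
Regarding question 1, here is a minimal sketch of the "equalize the average pixel motion" heuristic mentioned in the quoted message below. It is not Rupert's implementation and not ITK code. The names `rigid2d` and `estimate_scales`, the toy 2D transform (the thread itself concerns a 3D versor transform), the finite-difference step `delta`, and the choice to square the mean shift are all assumptions made for illustration. The squaring follows the quoted remark that the scales act like rescaling the parameters by their square roots.

```python
import math

def rigid2d(params, point):
    """Toy 2D rigid transform (hypothetical): params = (angle_rad, tx, ty)."""
    angle, tx, ty = params
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + tx, s * x + c * y + ty)

def estimate_scales(transform, params, points, delta=1e-3):
    """Return one scale per parameter: (mean point shift per unit parameter)^2.

    The shift is estimated by a forward finite difference of size delta.
    """
    base = [transform(params, p) for p in points]
    scales = []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += delta
        moved = [transform(shifted, p) for p in points]
        mean_shift = sum(
            math.hypot(a[0] - b[0], a[1] - b[1]) for a, b in zip(base, moved)
        ) / len(points)
        scales.append((mean_shift / delta) ** 2)
    return scales

# Sample points on a circle of radius 100 (say, mm) around the rotation center.
points = [
    (100.0 * math.cos(2 * math.pi * k / 16), 100.0 * math.sin(2 * math.pi * k / 16))
    for k in range(16)
]
scales = estimate_scales(rigid2d, (0.0, 0.0, 0.0), points)
print(scales)  # rotation scale is far larger than the two translation scales
```

With these sample points, one radian of rotation moves a point roughly 100 times farther than one unit of translation, so the rotation scale comes out about 10^4 times the translation scales.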

>
>
> Hi Rupert,
>
> thank you for your very helpful explanations and the link to your  
> thesis!!
> Your thesis incidentally answered some other questions I had.
>
> regards
> Levin
> ________________________________________
> From: Rupert Brooks [rupert.brooks at gmail.com]
> Sent: Tuesday, 16 August 2011 04:01
> To: Wolf, Levin
> Cc: insight-users at itk.org
> Subject: Re: [Insight-users] Understanding OptimizerScales in its  
> entirety
>
> Hi Levin,
>
> Perhaps no one responded because understanding the optimizer scales in
> their _entirety_ is a very tall order. :-) I'll take a shot at
> answering the immediate question anyway.
>
> The optimizer scales are, unfortunately, not consistently applied
> across all the ITK optimizers.  However, VersorRigid3DTransformOptimizer
> and RegularStepGradientDescentOptimizer share the same base class
> (RegularStepGradientDescentBaseOptimizer) and they both work the same
> way.
>
> In these optimizers, the gradient is divided by the scales, element by
> element.  The step is then taken along this scaled direction, with its
> length normalized to the current step length.
>
> This sounds simple but the effect is a bit counterintuitive.  It is
> like scaling the transform parameters by the square roots of the
> optimizer scale factors, and then limiting the step to a circle in the
> original parameter space, which would be an ellipse in the new one.
> Why square root?  Because you change the derivative by changing the
> scales, and then consider it a direction in the original parameter
> space.
>
> I apologize in advance for self-promotion, but I just put an optimizer
> on the Insight-Journal that may interest you, if you are digging into
> this subject.  http://www.insight-journal.org/browse/publication/834
> Different people have different theories about what the scales
> accomplish.  If you are up to some rather dry reading, I'll refer you
> to Section 4.5 of my thesis:
> http://www.rupertbrooks.ca/downloads/Brooks_PhDThesis.pdf
>
> And yes, different people have different heuristics for how to set
> these scales.  In the thesis I argued that they should be chosen to
> precondition the Hessian matrix of the cost function.  Others will
> tell you they should roughly equalize the average pixel motion in the
> image due to a unit shift of the parameters.  It turns out that these
> are roughly the same thing.  It's also important to consider both how
> the scales affect the optimizer path through parameter space, and how
> they affect the stopping criteria.
>
> Hope that helps a little,
> Rupert
>
>
> --------------------------------------------------------------
> Rupert Brooks
> rupert.brooks at gmail.com
>
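
The step rule described in the quoted message above (divide the gradient by the scales, then normalize the result to the step length) can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the ITK source: the function name `scaled_step` and the `minimize` flag are hypothetical, and the example gradient and scale values are made up.

```python
import math

def scaled_step(gradient, scales, step_length, minimize=True):
    """Return the parameter update for one regular-step iteration (sketch).

    The gradient is divided by the scales element by element, and the
    resulting direction is rescaled so the update has length step_length
    in the original (unscaled) parameter space.
    """
    scaled = [g / s for g, s in zip(gradient, scales)]
    magnitude = math.sqrt(sum(v * v for v in scaled))
    if magnitude == 0.0:
        return [0.0] * len(scaled)  # zero gradient: no movement
    sign = -1.0 if minimize else 1.0  # descend when minimizing
    return [sign * step_length * v / magnitude for v in scaled]

# Unit scales: the step is simply the normalized negative gradient.
step = scaled_step([3.0, 4.0], [1.0, 1.0], step_length=1.0)

# A large scale on the first parameter (e.g. a rotation) shrinks its share
# of the step, even though its raw gradient component is 100 times larger.
step_rot = scaled_step([100.0, 1.0], [10000.0, 1.0], step_length=1.0)
print(step, step_rot)
```

Because the normalization happens after the division, the update always has the requested length in the original parameter space, which is the "circle in the original parameter space" the message refers to.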
