[Insight-users] Re: itkGradientDescentOptimizer

Luis Ibanez luis.ibanez@kitware.com
Fri, 18 Oct 2002 10:40:49 -0400


Hi digvijay,

The gradient descent optimizer implemented
in ITK is designed for finding the optimal
value of a single valued function of type

   V = f( P ) = f( x1, x2, x3, .... xn )

V = scalar
P = {x1,x2,...,xn} are the parameters of the
function to be optimized. They define an
N-dimensional parametric space.

The optimizer explores this N-D space
following the direction of the gradient
of f().

At each N-D point, P={x1,x2,...,xn}
the gradient of f(P) is computed as:

   Gf = {df/dx1, df/dx2,....,df/dxn }

This vector (actually a covariant vector) is
then used to define the direction along which
the optimizer will move to the next position
in the parametric space. The length of the
step is computed using the learning rate.

Like this:

       P' = P +  R * Gf

R  = scalar = learning rate
Gf = N components: gradient of F
P  = N components: point in the parametric space

The same equation is better presented in
the doxygen documentation:
http://www.itk.org/Insight/Doxygen/html/classitk_1_1GradientDescentOptimizer.html
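
To make the update concrete, here is a minimal sketch of that
single step written in plain C++ (just an illustration with raw
vectors, not the actual ITK classes; the gradient Gf is assumed
to be computed elsewhere, e.g. by your cost function):

   #include <vector>

   // One gradient descent step:  P' = P + R * Gf
   std::vector<double> Step( const std::vector<double> & P,
                             const std::vector<double> & Gf,
                             double R )   // R = learning rate
   {
     std::vector<double> next( P.size() );
     for( unsigned int i = 0; i < P.size(); ++i )
       {
       next[i] = P[i] + R * Gf[i];
       }
     return next;
   }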

The learning rate defines how big the steps
should be in the parametric space.

A large learning rate will make the optimization faster,
but it will also make it unstable, since in places
with high gradients it may send the next point too
far in the parametric space, breaking the continuous
walk that it is supposed to follow.

Small learning rates will be more stable and reliable
but will result in slow optimizations.

The problem with the learning rate is that it is
difficult to find an optimal value for it.
If the optimization path crosses regions of high
and low gradients, the learning rate value may not
be appropriate for all of those conditions.

"Learning Rate" is a bit missleading here since it
looks like some training is going on (like in neural
networks). This is not the case here, the learning
rate parameter is just a multiplier of the gradient.
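
In terms of code, that multiplier is the only tuning knob you set
on this optimizer. A rough sketch of a typical setup (assuming you
already have a single-valued cost function to plug in; please check
the headers/doxygen for the exact method names in your version):

   #include "itkGradientDescentOptimizer.h"

   typedef itk::GradientDescentOptimizer  OptimizerType;

   OptimizerType::Pointer optimizer = OptimizerType::New();

   optimizer->SetCostFunction( costFunction );   // your f(P)
   optimizer->SetLearningRate( 0.05 );           // R in the equation above
   optimizer->SetNumberOfIterations( 200 );
   optimizer->StartOptimization();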



An alternative is to use the
RegularStepGradientDescentOptimizer
http://www.itk.org/Insight/Doxygen/html/classitk_1_1RegularStepGradientDescentBaseOptimizer.html
which walks steadily at a regular step length
regardless of the gradient magnitude. The gradient
vector is only used to determine the direction of the step.

The step length is reduced only when the gradient
changes direction suddenly (in that case the step
length is divided by 2 at each direction change).
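
To illustrate the difference, here is a rough sketch of that
regular-step idea in plain C++ (again just an illustration,
not the actual ITK implementation):

   #include <vector>
   #include <cmath>

   // Advance P by a fixed stepLength along the gradient direction.
   // The step length is halved whenever the gradient reverses
   // direction with respect to the previous iteration.
   void RegularStep( std::vector<double> &       P,
                     const std::vector<double> & Gf,
                     const std::vector<double> & previousGf,
                     double &                    stepLength )
   {
     double dot  = 0.0;
     double norm = 0.0;
     for( unsigned int i = 0; i < Gf.size(); ++i )
       {
       dot  += Gf[i] * previousGf[i];
       norm += Gf[i] * Gf[i];
       }
     norm = std::sqrt( norm );
     if( norm == 0.0 )
       {
       return;  // zero gradient: nothing to do
       }

     if( dot < 0.0 )
       {
       stepLength /= 2.0;   // direction change: halve the step
       }

     // Walk a full stepLength along the unit gradient direction,
     // regardless of the gradient magnitude.
     for( unsigned int i = 0; i < Gf.size(); ++i )
       {
       P[i] += stepLength * Gf[i] / norm;
       }
   }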



Please let us know if you have further questions,


   Thanks


     Luis


==============================================
digvijay singh wrote:
> Hi all...
> I am trying to use the itkgradientdescent optimizer
> Could somebody please explain 
> 1-)The use of GetDerivative: i have an inkling that it
> is  used to give system constraints but i am not sure.
> 2-)Is the learning rate  applied to just the error
> function or to the individual parameters as well
> Any info would be appreciated.
> thanks
> digvijay
> 
>