[Insight-users] Quaternions and Versors

Sat Jun 14 08:36:33 EDT 2008

Hi Luis,

thank you so much for the extensive answer. It helped me understand the 
whole idea a lot better. So basically, as long as I actually want to use 
a similarity transform for my registration, I can just use a quaternion 
transform with for instance an LBFGS optimizer.

But the whole thing sparked my interest a bit more. I've looked into 
Hamilton's book and the tutorials more closely, in order to understand 
what is actually going on with the itk::VersorTransform and -Optimizer.
It looks like the rotation performed by the versor transform which 
transforms a vector v can be expressed by the quaternion equation:

    f(q) = q . v . q^(-1)

In order to minimize with respect to q, we form the differential 
according to all the rules in Hamilton's book:

   dfq =  df(q,dq) = dq . v . q^(-1)  -  q . v . q^(-1) . dq . q^(-1)

So how do I get from this expression to the Jacobian implemented in the 
itk::VersorTransform?

Looking at the whole problem form the optimization point of view poses 
yet another question:
Thanks to the Taylor expansion we have

    f(q + dq) \approx f(q) + dfq

and

    f(q + x dq) \approx f(q) + x dfq

In regular gradient descent optimization, the corresponding linear 
approximation can be seen as the basis for the minimization. For smarter 
optimization techniques, the second term of the Taylor expansion can be 
used as well.
So if we assume that f is now the cost function for which we wish to 
optimize with respect to the versor or quaternion q, this implies using 
an additive update dq or x dq . But you have convinced me that just 
adding an update would not be the correct operation in the versor space. 
So how is the optimization formulated and motivated? At this point, the 
tutorial doesn't offer too many details besides a couple of drawings. Do 
we just assume that:

    f(q . dq) \approx f(q) + dfq

or

    f(q . dq^x) \approx f(q) + x dfq

Or can this actually be proved?

Well , I hope you find some time to shed a little more light on this 
interesting yet complicated topic.

Thanks

Tom

Luis Ibanez wrote:
>
>
> Hi Thomas,
>
>
> Thanks for the very detailed introduction to your questions.
>
>
> Yes, as you pointed out, the Versor space is not a Vector space, in
>      the sense that in order to combine two Versors you must use the
>      operation of Versor composition, instead of the simple addition
>      of their components.
>
>
> About your specific questions:
>
>
> A) "How is the versor derivative defined and calculated?"
>
>    Hamilton's book on quaternions describes the proper definition
>    of derivatives that applies to the Quaternion space.
>
>    Instead of the well-know definition of differential that
>    we use for Vector spaces:
>
>        df(x)/dx =  lim( h->0 )( ( f(x+h) - f(x) ) / h )
>
>     Hamilton uses the following definition of differential for
>     Quaternions spaces:
>
>        df(x)    = lim ( n->inf )[  f( x + (dx/n) ) - f(x) ]
>
>
> A.1) "Is there any publication outlining the exact calculations
>       leading to the versor derivative implemented in the
>       VersorRigid3DTransform?"
>
>      Yes, the computations are based on Hamilton's book description.
>
>      The full analysis is presented in the ITK Tutorials
>
>             http://www.itk.org/HTML/Tutorials.htm
>
>      specifically in the QuaternionsII.pdf tutorial
>      (toward the end of the Tutorials HTML page):
>
>      http://www.itk.org/CourseWare/Training/QuaternionsII.pdf
>
>      See page 24 for the definition of "Differentials" suitable
>      for Quaternion spaces. See page 25 for an example of how
>      you will apply that definition for computing the derivative
>      of f(x) = x^2.
>
>
>
>
> A.2) "I checked out Hamilton's book on quaternions,
>       but it didn't help me much":
>
>
>      Hamilton's book requires to be read
>      in sequence from the beggining. :-)
>
>      We are so accustomed now to "cartesian" notation that it
>      is hard to appreciate the great clarity of Hamilton's book.
>
>      I would strongly encourage you to give a second try at
>      Hamilton's book, maybe as a summer reading. If you follow
>      his description of the philosophical principles behind the
>      mathematical concepts, things will fall into place with
>      great clarity.
>
>
>
> A.3) "Is there any literature on the subject from this century,
>       preferably focusing on the given problem of image registration?"
>
>
>      Modern books on Quaternions are actually of very inferior
>      quality to Hamilton's one. They are more focused on the
>      notation details, and are usually handicaped by the obsesive
>      compulsion for describing quaterions as "even harder than
>      complex numbers".
>
>
>      The Tutorials on quaternions
>
>      http://www.itk.org/CourseWare/Training/QuaternionsI.pdf
>      http://www.itk.org/CourseWare/Training/QuaternionsII.pdf
>
>      will give you a Geometrical/Visual introductions to (some of)
>      the concepts described in Hamilton's book.
>
>
> B) "How does the Taylor expansion look like when using the versor
>     derivative?"
>
>    You will find the Taylor expansion described in
>
>         http://www.itk.org/CourseWare/Training/QuaternionsII.pdf
>
>    pages 43-44.
>
>
>      a) given recursive definition
>
>                  d^m(f(q)) = d( d^(m-1)( f(q) ) )
>
>      b) the Taylor's expansion of f(q), where q is a quaterion,
>         is
>
>       f(q+dq) = f(q) + df(q)/1! + d^2(f(q))/2! + d^3(f(q))/3!...
>
>
>      c) where f(q+dq) is a *linear* function of {q,dq,d^2(q),d^3(q)...}
>         and dq is *not necessarily* small.
>
>
>     The Taylor's approximation becomes then:
>
>
>        Fx = f(q+x.dq) - f(q) - x.d(f(q))/1! - x^2 d^2(f(q))/2! ....
>
>     (page 44).
>
>
>
> C) "Why is it possible to use an additive update here"
>     (in the   Quaternion Transform GetJacobian() method)
>    "even though it is claimed in Insight into Images that this
>     is not possible and a compositional update should be used?"
>
>
>    Excellent question!
>
>    The answer is that: mathematically, this is the incorrect operation,
>    *but* if the resulting Jacobian is used for performing a "small
>    enough" step, *and* it is followed by renomalization of the
>    Quaternion, then the result can be expected to be "close enough".
>
>    In other words, it is an engineering approximation to the proper
>    "on the sphere" computation.
>
>
>        Part of the great confusion about Quaternions on our field
>        and in computer graphics is that not enough people have
>        read Hamilton's original book    :-)
>
>
>    When Shoemaker introduced Quaternions as a mechanism for representing
>    rotations in 3D space, not enough emphasis was placed on the fact
>    that only "Unit Quaternions" are a proper representation of pure
>    rotations, while full Quaternions are a representation of rotation
>    *and scaling*.
>
>    The proper mathematical construct to use, should have always been
>    "Versors" instead of "Quaternions".
>
>    The apparent similarity of Quaterions addition operations with
>    standard Vector space operations can be easily misleading.
>
>    The computation of the GetJacobian() in the Quaternion transform
>    "allows" the quaterion to get out of the unit sphere, and then
>    reproject it back into the sphere. This is ok as long as it doesn't
>    travel too far.
>
>
>
> D) "Why is the regular derivative used here and not the versor
>     or quaternion derivative?"
>
>     Because the derivation is done with respect to the Quaternion
>     components without including the constraint of remaining in
>     the surface of the unit sphere.
>
>     In other words, this computation allows scaling factors to
>     be "momentarily" present in the Quaternion.
>
>     If you want to be formal, you should use the Versor version
>     of the Jacobian computation.
>
>
>
> E) "If we used quaternions for representing a similarity transform
>     instead of a rigid transform, could we just use an off-the-shelf
>     optimizer with an additive update, based on the regular Taylor
>     expansion?"
>
>     Yes, and this is indeed the case in which you want to use full
>     Quaternions. That is, a Quaternion is suitable for representing
>     *Similarity transforms* since they are a composition of Rigid
>     rotations and uniform Scaling.
>
>
> E.1) "If using an additive update works for quaternions,
>       why doesn't it work for versors?"
>
>      The additive update works for Quaterions, *BUT* it
>      corresponds to the space of Similarity transforms,
>      not to the space of rigid transforms. If you want to
>      use Quaternions for representing rigid transforms then
>      the computation of derivatives is equivalent to the
>      following:
>
>      Imagine that you are living in a sphere (e.g. the planet) and you
>      throw and object in a straight line tangent to the sphere surface
>      (disregard gravity for the sake of this example).  If your object
>      travels for 1km along that tangent line and then drops vertically
>      down to the surface. It will probably land in the same location
>      as if you have walked 1km ON the sphere's surface.
>
>      Imagine now what happens if you let the object travel for 100km
>      in the straight tangent line, and then drop vertically.
>
>      Imagine now what happens if you let the object travel for 1000km
>      in the straight tangent line, and then drop vertically.
>
>      (Keep in mind that Earth's radius is about 6,378 Km)
>
>      Imagine now what happens if you let the object travel for 6000km
>      in the straight tangent line, and then drop vertically.
>
>      at this point, you will see, how the Quaternion differential
>      will differ from the Versor differential.  In the Quaterion
>      differential you are allowed to go into space (out of earths's
>      surface), while in Versor differentials you are forced to
>      *walk on the surface* of the planet.
>
>
>
> E.2) "Why is the space of versor parameters not a vector space?"
>
>
>      Because adding the components of two Versor *does not* result
>      in another Versor.  In other words, Versors and the addition
>      operation do not form a Group.
>
>      This can be easily illustrated with the following example:
>
>
>                    Versor1 = ( 0.5, 0.0 ,0.0 )
>
>      is actually a Quaternion of the following components:
>
>               Quaterion1 = ( 0.5, 0.0, 0.0, sqrt(3)/2 )
>
>      since in this way we respect the condition of having
>      a "unit" quaternion.
>
>
>      Versor1 = Quaternion1 represents a rotation of 60 degrees
>      around the X axis.
>
>      because Versor components are
>
>                   sin(angle/2) * unit axis
>
>
>      if we add the components of two of these versors we get
>
>          Quaternion1 + Quaternion1 = ( 1.0, 0.0, 0.0, sqrt(3) )
>
>      which *is not* a unit quaternion, and therefore *is not*
>      a Versor.
>
>
>      You could "project it" on the Versor space, but that
>      will result on the issues that we described with the
>      example of throwing an object along the tangent of
>      the planet. That is, the trick works as long as you
>      don't throw the object very far (compared to the
>      radius of the planet).
>
>      For example,... if we normalize the result of
>      ( Quaternion1 + Quaternion1 ), we get....
>
>                  Quaternion1    :-)
>
>
> E.3) "Doesn't every triple (x,y,z) represent a valid versor rotation?"
>
>       Yes, every triplet (x,y,z) *with* ( (x^2 + y^2 + z^2 ) <= 1.0 )
>       represents a valid versor.
>
>        For example, the triplet
>
>                      ( 2.0, 0.0, 0.0 )
>
>       can't possibly represent a Versor.
>
>
>       The operation of addition of components is *not defined*
>       among Versors. Addition of components is defined among
>       Quaternions though, and leads to the behaviors observed
>       in (E.2).
>
>
> ---
>
>
>   Versors are better understood by using geometrical diagrams
>   drawn on the surface of the unit sphere (as presented on the
>   QuaternionI.pdf and QuaternionsII.pdf tutorials), than by
>   using "cartesian"-like notation of components. It is quite
>   unfortunate that our current education is more focused on
>   mathematical notation than on geometrical understanding...
>   but that's a discussion for a different mailing list  :-)
>
>
> ----
>
>
>
>    Please let us know if you have any further questions,
>
>
>
>       Thanks
>
>
>
>          Luis
>
>
>
>
> ----------------------
> Thomas Albrecht wrote:
>> Hi everybody,
>>
>> I have a couple of questions about the good old problem of optimizing 
>> with respect to a rigid transform represented by quaternions or 
>> versors. In ITK, there is a QuaternionRigidTransform and a 
>> VersorRigid3DTransform. Both implement a rigid transform in 3D. It is 
>> argued in the ITK Software Guide as well as in the "theory book" 
>> Insight into Images, that the space of unit quaternions or versors is 
>> not a vector space and therefore special optimization methods need to 
>> be designed in order to optimize the quaternion or versor parameters 
>> in a registration algorithm. Most importantly Insight into Images 
>> states that when optimizing with a gradient descent type optimizer, 
>> the update step needs to be performed by versor or quaternion 
>> composition and NOT by adding an update term to the current parameters.
>> For this reason, there are two specialized gradient descent 
>> optimizers: the QuaternionRigidTransformGradientDescentOptimizer and 
>> the VersorRigid3DTransformOptimizer. The versor optimizer 
>> re-implements the update step of the regular gradient descent 
>> optimizer in order to update the parameters by versor composition 
>> instead of addition.
>> However, the method of minimizing by steepest descent relies on the 
>> first term of the Taylor expansion, i.e. the linearization of the 
>> function with help of the first derivative. Using the regular 
>> derivative we all know and love implies the use of an additive update 
>> step. Therefore, in order to be able to perform a compositional 
>> update, the GetJacobian method of the VersorRigid3DTransform 
>> implements what is called the "versor derivative" in the ITK Software 
>> Guide. So here finally, are my first two questions:
>>
>>   How is the versor derivative defined and calculated?
>>   Is there any publication outlining the exact calculations leading 
>> to the versor derivative implemented in the VersorRigid3DTransform?
>>   I checked out Hamilton's book on quaternions, but it didn't help me 
>> much. Is there any literature on the subject from this century, 
>> preferably focusing on the given problem of image registration?
>>
>> I would like to use a smarter optimizer than the gradient descent 
>> already implemented in ITK. But in order to write a Newton or 
>> Quasi-Newton (Levenberg-Marquardt) optimization for the 
>> VersorRigid3DTransform (cf. 
>> http://www.itk.org/pipermail/insight-users/2006-September/019324.html 
>> ), one would need to know the Taylor expansion up to the second term 
>> of the cost function. This leads to the next question:
>>
>>   How does the Taylor expansion look like when using the versor 
>> derivative?
>>
>> The whole problem seems to be solved differently in the quaternion 
>> transform implemented in ITK. Contrary to the versor transform, the 
>> GetJacobian method of the QuaternionRigidTransform implements a 
>> regular derivative as we all know it. It is easily calculated by 
>> writing out the matrix-times-vector representation of the versor 
>> transform and taking the regular partial derivatives with respect to 
>> the four quaternion parameters w, x, y, and z. Accordingly, the 
>> QuaternionRigidTransformGradientDescentOptimizer uses a regular 
>> additive update in the gradient descent optimization. The only change 
>> is that it enforces the unit length of the updating quaternion by 
>> normalization in each step.
>>
>>   Why is it possible to use an additive update here even though it is 
>> claimed in Insight into Images that this is not possible and a 
>> compositional update should be used?
>>   Why is the regular derivative used here and not the versor or 
>> quaternion derivative?
>>   If we used quaternions for representing a similarity transform 
>> instead of a rigid transform, could we just use an off-the-shelf 
>> optimizer with an additive update, based on the regular Taylor 
>> expansion?
>>  This leads to two more questions:
>>
>>   If using an additive update works for quaternions, why doesn't it 
>> work for versors?
>>   Why is the space of versor parameters not a vector space? Doesn't 
>> every triple (x,y,z) represent a valid versor rotation?
>>
>> I hope that somebody finds the time to answer at least some of these 
>> numerous and complicated questions.
>>
>> Thanks
>>
>> Tom
>>
>> _______________________________________________
>> Insight-users mailing list
>> Insight-users at itk.org
>> http://www.itk.org/mailman/listinfo/insight-users
>>
>