[Insight-users] Quaternions and Versors

Wed Jun 18 16:06:25 EDT 2008

Hi Thomas,

1) Yes, As long as you are addressing a Similarity Transform (not the
    more restrictive Rigid Transform) you should be fine using a
    Quaternion along with any optimizer that assumes that the parametric
    space is additive. (e.g the LBFGS optimizer).

    Note that the itkQuaternionRigidTransform is crafted for perfoming
    Rigid Transforms, not Similarity ones.  You may modify this class
    (and rename it to something like itkQuaternionSimilarityTransform)
    and remove the quaternion normalization that are performed inside.

    and... of course.. at that point, it will be great if you contribute
    this class to ITK by submitting it to the Insight Journal   :-)

2) It is great that you are digging more into Hamilton's book.

    Note that in ITK we are not deriving with respect to the Versor.

    We are deriving with respect to the Versor components, which is
    not quite the same thing.

3) About the Taylor expansion

    As you correctly pointed out,

    the expression:

       f( q + dq )

    is not the same as

       f( q . dq )

    The concept of a Taylor expansion applies to Vector spaces
    (e.g. those were the addition operation form a group)

    One way to approach this problem is to note that quaternion
    composition (and therefore Versor composition) behave more
    like exponential functions if the rotation angles.

    therefore the concept of

                          f( q + x dq )

    is better represented as

                          f( q . (dq)^x )

    This is explained towards the end of the slides as part of
    the description of the spherical linear interpolation concept.

      Regards,

          Luis

----------------------
Thomas Albrecht wrote:
> Hi Luis,
> 
> thank you so much for the extensive answer. It helped me understand the 
> whole idea a lot better. So basically, as long as I actually want to use 
> a similarity transform for my registration, I can just use a quaternion 
> transform with for instance an LBFGS optimizer.
> 
> But the whole thing sparked my interest a bit more. I've looked into 
> Hamilton's book and the tutorials more closely, in order to understand 
> what is actually going on with the itk::VersorTransform and -Optimizer.
> It looks like the rotation performed by the versor transform which 
> transforms a vector v can be expressed by the quaternion equation:
> 
>    f(q) = q . v . q^(-1)
> 
> In order to minimize with respect to q, we form the differential 
> according to all the rules in Hamilton's book:
> 
>   dfq =  df(q,dq) = dq . v . q^(-1)  -  q . v . q^(-1) . dq . q^(-1)
> 
> So how do I get from this expression to the Jacobian implemented in the 
> itk::VersorTransform?
> 
> Looking at the whole problem form the optimization point of view poses 
> yet another question:
> Thanks to the Taylor expansion we have
> 
>    f(q + dq) \approx f(q) + dfq
> 
> and
> 
>    f(q + x dq) \approx f(q) + x dfq
> 
> In regular gradient descent optimization, the corresponding linear 
> approximation can be seen as the basis for the minimization. For smarter 
> optimization techniques, the second term of the Taylor expansion can be 
> used as well.
> So if we assume that f is now the cost function for which we wish to 
> optimize with respect to the versor or quaternion q, this implies using 
> an additive update dq or x dq . But you have convinced me that just 
> adding an update would not be the correct operation in the versor space. 
> So how is the optimization formulated and motivated? At this point, the 
> tutorial doesn't offer too many details besides a couple of drawings. Do 
> we just assume that:
> 
>    f(q . dq) \approx f(q) + dfq
> 
> or
> 
>    f(q . dq^x) \approx f(q) + x dfq
> 
> 
> Or can this actually be proved?
> 
> Well , I hope you find some time to shed a little more light on this 
> interesting yet complicated topic.
> 
> Thanks
> 
> Tom
> 
> 
> 
> Luis Ibanez wrote:
> 
>>
>>
>> Hi Thomas,
>>
>>
>> Thanks for the very detailed introduction to your questions.
>>
>>
>> Yes, as you pointed out, the Versor space is not a Vector space, in
>>      the sense that in order to combine two Versors you must use the
>>      operation of Versor composition, instead of the simple addition
>>      of their components.
>>
>>
>> About your specific questions:
>>
>>
>> A) "How is the versor derivative defined and calculated?"
>>
>>    Hamilton's book on quaternions describes the proper definition
>>    of derivatives that applies to the Quaternion space.
>>
>>    Instead of the well-know definition of differential that
>>    we use for Vector spaces:
>>
>>        df(x)/dx =  lim( h->0 )( ( f(x+h) - f(x) ) / h )
>>
>>     Hamilton uses the following definition of differential for
>>     Quaternions spaces:
>>
>>        df(x)    = lim ( n->inf )[  f( x + (dx/n) ) - f(x) ]
>>
>>
>> A.1) "Is there any publication outlining the exact calculations
>>       leading to the versor derivative implemented in the
>>       VersorRigid3DTransform?"
>>
>>      Yes, the computations are based on Hamilton's book description.
>>
>>      The full analysis is presented in the ITK Tutorials
>>
>>             http://www.itk.org/HTML/Tutorials.htm
>>
>>      specifically in the QuaternionsII.pdf tutorial
>>      (toward the end of the Tutorials HTML page):
>>
>>      http://www.itk.org/CourseWare/Training/QuaternionsII.pdf
>>
>>      See page 24 for the definition of "Differentials" suitable
>>      for Quaternion spaces. See page 25 for an example of how
>>      you will apply that definition for computing the derivative
>>      of f(x) = x^2.
>>
>>
>>
>>
>> A.2) "I checked out Hamilton's book on quaternions,
>>       but it didn't help me much":
>>
>>
>>      Hamilton's book requires to be read
>>      in sequence from the beggining. :-)
>>
>>      We are so accustomed now to "cartesian" notation that it
>>      is hard to appreciate the great clarity of Hamilton's book.
>>
>>      I would strongly encourage you to give a second try at
>>      Hamilton's book, maybe as a summer reading. If you follow
>>      his description of the philosophical principles behind the
>>      mathematical concepts, things will fall into place with
>>      great clarity.
>>
>>
>>
>> A.3) "Is there any literature on the subject from this century,
>>       preferably focusing on the given problem of image registration?"
>>
>>
>>      Modern books on Quaternions are actually of very inferior
>>      quality to Hamilton's one. They are more focused on the
>>      notation details, and are usually handicaped by the obsesive
>>      compulsion for describing quaterions as "even harder than
>>      complex numbers".
>>
>>
>>      The Tutorials on quaternions
>>
>>      http://www.itk.org/CourseWare/Training/QuaternionsI.pdf
>>      http://www.itk.org/CourseWare/Training/QuaternionsII.pdf
>>
>>      will give you a Geometrical/Visual introductions to (some of)
>>      the concepts described in Hamilton's book.
>>
>>
>> B) "How does the Taylor expansion look like when using the versor
>>     derivative?"
>>
>>    You will find the Taylor expansion described in
>>
>>         http://www.itk.org/CourseWare/Training/QuaternionsII.pdf
>>
>>    pages 43-44.
>>
>>
>>      a) given recursive definition
>>
>>                  d^m(f(q)) = d( d^(m-1)( f(q) ) )
>>
>>      b) the Taylor's expansion of f(q), where q is a quaterion,
>>         is
>>
>>       f(q+dq) = f(q) + df(q)/1! + d^2(f(q))/2! + d^3(f(q))/3!...
>>
>>
>>      c) where f(q+dq) is a *linear* function of {q,dq,d^2(q),d^3(q)...}
>>         and dq is *not necessarily* small.
>>
>>
>>     The Taylor's approximation becomes then:
>>
>>
>>        Fx = f(q+x.dq) - f(q) - x.d(f(q))/1! - x^2 d^2(f(q))/2! ....
>>
>>     (page 44).
>>
>>
>>
>> C) "Why is it possible to use an additive update here"
>>     (in the   Quaternion Transform GetJacobian() method)
>>    "even though it is claimed in Insight into Images that this
>>     is not possible and a compositional update should be used?"
>>
>>
>>    Excellent question!
>>
>>    The answer is that: mathematically, this is the incorrect operation,
>>    *but* if the resulting Jacobian is used for performing a "small
>>    enough" step, *and* it is followed by renomalization of the
>>    Quaternion, then the result can be expected to be "close enough".
>>
>>    In other words, it is an engineering approximation to the proper
>>    "on the sphere" computation.
>>
>>
>>        Part of the great confusion about Quaternions on our field
>>        and in computer graphics is that not enough people have
>>        read Hamilton's original book    :-)
>>
>>
>>    When Shoemaker introduced Quaternions as a mechanism for representing
>>    rotations in 3D space, not enough emphasis was placed on the fact
>>    that only "Unit Quaternions" are a proper representation of pure
>>    rotations, while full Quaternions are a representation of rotation
>>    *and scaling*.
>>
>>    The proper mathematical construct to use, should have always been
>>    "Versors" instead of "Quaternions".
>>
>>    The apparent similarity of Quaterions addition operations with
>>    standard Vector space operations can be easily misleading.
>>
>>    The computation of the GetJacobian() in the Quaternion transform
>>    "allows" the quaterion to get out of the unit sphere, and then
>>    reproject it back into the sphere. This is ok as long as it doesn't
>>    travel too far.
>>
>>
>>
>> D) "Why is the regular derivative used here and not the versor
>>     or quaternion derivative?"
>>
>>     Because the derivation is done with respect to the Quaternion
>>     components without including the constraint of remaining in
>>     the surface of the unit sphere.
>>
>>     In other words, this computation allows scaling factors to
>>     be "momentarily" present in the Quaternion.
>>
>>     If you want to be formal, you should use the Versor version
>>     of the Jacobian computation.
>>
>>
>>
>> E) "If we used quaternions for representing a similarity transform
>>     instead of a rigid transform, could we just use an off-the-shelf
>>     optimizer with an additive update, based on the regular Taylor
>>     expansion?"
>>
>>     Yes, and this is indeed the case in which you want to use full
>>     Quaternions. That is, a Quaternion is suitable for representing
>>     *Similarity transforms* since they are a composition of Rigid
>>     rotations and uniform Scaling.
>>
>>
>> E.1) "If using an additive update works for quaternions,
>>       why doesn't it work for versors?"
>>
>>      The additive update works for Quaterions, *BUT* it
>>      corresponds to the space of Similarity transforms,
>>      not to the space of rigid transforms. If you want to
>>      use Quaternions for representing rigid transforms then
>>      the computation of derivatives is equivalent to the
>>      following:
>>
>>      Imagine that you are living in a sphere (e.g. the planet) and you
>>      throw and object in a straight line tangent to the sphere surface
>>      (disregard gravity for the sake of this example).  If your object
>>      travels for 1km along that tangent line and then drops vertically
>>      down to the surface. It will probably land in the same location
>>      as if you have walked 1km ON the sphere's surface.
>>
>>      Imagine now what happens if you let the object travel for 100km
>>      in the straight tangent line, and then drop vertically.
>>
>>      Imagine now what happens if you let the object travel for 1000km
>>      in the straight tangent line, and then drop vertically.
>>
>>      (Keep in mind that Earth's radius is about 6,378 Km)
>>
>>      Imagine now what happens if you let the object travel for 6000km
>>      in the straight tangent line, and then drop vertically.
>>
>>      at this point, you will see, how the Quaternion differential
>>      will differ from the Versor differential.  In the Quaterion
>>      differential you are allowed to go into space (out of earths's
>>      surface), while in Versor differentials you are forced to
>>      *walk on the surface* of the planet.
>>
>>
>>
>> E.2) "Why is the space of versor parameters not a vector space?"
>>
>>
>>      Because adding the components of two Versor *does not* result
>>      in another Versor.  In other words, Versors and the addition
>>      operation do not form a Group.
>>
>>      This can be easily illustrated with the following example:
>>
>>
>>                    Versor1 = ( 0.5, 0.0 ,0.0 )
>>
>>      is actually a Quaternion of the following components:
>>
>>               Quaterion1 = ( 0.5, 0.0, 0.0, sqrt(3)/2 )
>>
>>      since in this way we respect the condition of having
>>      a "unit" quaternion.
>>
>>
>>      Versor1 = Quaternion1 represents a rotation of 60 degrees
>>      around the X axis.
>>
>>      because Versor components are
>>
>>                   sin(angle/2) * unit axis
>>
>>
>>      if we add the components of two of these versors we get
>>
>>          Quaternion1 + Quaternion1 = ( 1.0, 0.0, 0.0, sqrt(3) )
>>
>>      which *is not* a unit quaternion, and therefore *is not*
>>      a Versor.
>>
>>
>>      You could "project it" on the Versor space, but that
>>      will result on the issues that we described with the
>>      example of throwing an object along the tangent of
>>      the planet. That is, the trick works as long as you
>>      don't throw the object very far (compared to the
>>      radius of the planet).
>>
>>      For example,... if we normalize the result of
>>      ( Quaternion1 + Quaternion1 ), we get....
>>
>>                  Quaternion1    :-)
>>
>>
>> E.3) "Doesn't every triple (x,y,z) represent a valid versor rotation?"
>>
>>       Yes, every triplet (x,y,z) *with* ( (x^2 + y^2 + z^2 ) <= 1.0 )
>>       represents a valid versor.
>>
>>        For example, the triplet
>>
>>                      ( 2.0, 0.0, 0.0 )
>>
>>       can't possibly represent a Versor.
>>
>>
>>       The operation of addition of components is *not defined*
>>       among Versors. Addition of components is defined among
>>       Quaternions though, and leads to the behaviors observed
>>       in (E.2).
>>
>>
>> ---
>>
>>
>>   Versors are better understood by using geometrical diagrams
>>   drawn on the surface of the unit sphere (as presented on the
>>   QuaternionI.pdf and QuaternionsII.pdf tutorials), than by
>>   using "cartesian"-like notation of components. It is quite
>>   unfortunate that our current education is more focused on
>>   mathematical notation than on geometrical understanding...
>>   but that's a discussion for a different mailing list  :-)
>>
>>
>> ----
>>
>>
>>
>>    Please let us know if you have any further questions,
>>
>>
>>
>>       Thanks
>>
>>
>>
>>          Luis
>>
>>
>>
>>
>> ----------------------
>> Thomas Albrecht wrote:
>>
>>> Hi everybody,
>>>
>>> I have a couple of questions about the good old problem of optimizing 
>>> with respect to a rigid transform represented by quaternions or 
>>> versors. In ITK, there is a QuaternionRigidTransform and a 
>>> VersorRigid3DTransform. Both implement a rigid transform in 3D. It is 
>>> argued in the ITK Software Guide as well as in the "theory book" 
>>> Insight into Images, that the space of unit quaternions or versors is 
>>> not a vector space and therefore special optimization methods need to 
>>> be designed in order to optimize the quaternion or versor parameters 
>>> in a registration algorithm. Most importantly Insight into Images 
>>> states that when optimizing with a gradient descent type optimizer, 
>>> the update step needs to be performed by versor or quaternion 
>>> composition and NOT by adding an update term to the current parameters.
>>> For this reason, there are two specialized gradient descent 
>>> optimizers: the QuaternionRigidTransformGradientDescentOptimizer and 
>>> the VersorRigid3DTransformOptimizer. The versor optimizer 
>>> re-implements the update step of the regular gradient descent 
>>> optimizer in order to update the parameters by versor composition 
>>> instead of addition.
>>> However, the method of minimizing by steepest descent relies on the 
>>> first term of the Taylor expansion, i.e. the linearization of the 
>>> function with help of the first derivative. Using the regular 
>>> derivative we all know and love implies the use of an additive update 
>>> step. Therefore, in order to be able to perform a compositional 
>>> update, the GetJacobian method of the VersorRigid3DTransform 
>>> implements what is called the "versor derivative" in the ITK Software 
>>> Guide. So here finally, are my first two questions:
>>>
>>>   How is the versor derivative defined and calculated?
>>>   Is there any publication outlining the exact calculations leading 
>>> to the versor derivative implemented in the VersorRigid3DTransform?
>>>   I checked out Hamilton's book on quaternions, but it didn't help me 
>>> much. Is there any literature on the subject from this century, 
>>> preferably focusing on the given problem of image registration?
>>>
>>> I would like to use a smarter optimizer than the gradient descent 
>>> already implemented in ITK. But in order to write a Newton or 
>>> Quasi-Newton (Levenberg-Marquardt) optimization for the 
>>> VersorRigid3DTransform (cf. 
>>> http://www.itk.org/pipermail/insight-users/2006-September/019324.html 
>>> ), one would need to know the Taylor expansion up to the second term 
>>> of the cost function. This leads to the next question:
>>>
>>>   How does the Taylor expansion look like when using the versor 
>>> derivative?
>>>
>>> The whole problem seems to be solved differently in the quaternion 
>>> transform implemented in ITK. Contrary to the versor transform, the 
>>> GetJacobian method of the QuaternionRigidTransform implements a 
>>> regular derivative as we all know it. It is easily calculated by 
>>> writing out the matrix-times-vector representation of the versor 
>>> transform and taking the regular partial derivatives with respect to 
>>> the four quaternion parameters w, x, y, and z. Accordingly, the 
>>> QuaternionRigidTransformGradientDescentOptimizer uses a regular 
>>> additive update in the gradient descent optimization. The only change 
>>> is that it enforces the unit length of the updating quaternion by 
>>> normalization in each step.
>>>
>>>   Why is it possible to use an additive update here even though it is 
>>> claimed in Insight into Images that this is not possible and a 
>>> compositional update should be used?
>>>   Why is the regular derivative used here and not the versor or 
>>> quaternion derivative?
>>>   If we used quaternions for representing a similarity transform 
>>> instead of a rigid transform, could we just use an off-the-shelf 
>>> optimizer with an additive update, based on the regular Taylor 
>>> expansion?
>>>  This leads to two more questions:
>>>
>>>   If using an additive update works for quaternions, why doesn't it 
>>> work for versors?
>>>   Why is the space of versor parameters not a vector space? Doesn't 
>>> every triple (x,y,z) represent a valid versor rotation?
>>>
>>> I hope that somebody finds the time to answer at least some of these 
>>> numerous and complicated questions.
>>>
>>> Thanks
>>>
>>> Tom
>>>
>>> _______________________________________________
>>> Insight-users mailing list
>>> Insight-users at itk.org
>>> http://www.itk.org/mailman/listinfo/insight-users
>>>
>>
> 
> _______________________________________________
> Insight-users mailing list
> Insight-users at itk.org
> http://www.itk.org/mailman/listinfo/insight-users
>