[ITK] [ITK-dev] efficiency of vnl_matrix

Jian Cheng jian.cheng.1983 at gmail.com
Thu Mar 12 11:39:03 EDT 2015


Hi,

For (2), the attachment is a comparison I made.
It adds vnl tests to the benchmark code from the link mentioned earlier,
http://nghiaho.com/?p=1726 .
The header files it needs can be downloaded from
https://github.com/DiffusionMRITool/dmritool/tree/master/Modules/HelperFunctions/include

Then you can build and run it as follows:

// With eigen
// g++ -DTEST_EIGEN test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse \
//     -lopencv_core -O3 -DNDEBUG

// With armadillo + OpenBLAS
// g++ -DTEST_ARMA test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse \
//     -lopencv_core -larmadillo -lgomp -fopenmp -lopenblas \
//     -O3 -DNDEBUG -DHAVE_INLINE

// With vnl
// g++ -DTEST_VNL test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse \
//     -lopencv_core -lvnl -lvnl_algo \
//     -I/usr/include/vxl/core -I/usr/include/vxl/vcl \
//     -O3 -DNDEBUG -DUTL_USE_FASTLAPACK

// With vnl + OpenBLAS
// g++ -DTEST_VNL_BLAS test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse \
//     -lopencv_core -lopenblas -lvnl -lvnl_algo \
//     -I/usr/include/vxl/core -I/usr/include/vxl/vcl \
//     -O3 -DNDEBUG -DUTL_USE_FASTLAPACK

// With vnl + MKL
// g++ -DTEST_VNL_BLAS test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse \
//     -lopencv_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm \
//     -lvnl -lvnl_algo -I/usr/include/vxl/core -I/usr/include/vxl/vcl \
//     -O3 -DNDEBUG -DUTL_USE_FASTLAPACK

// With utl + OpenBLAS
// g++ -DTEST_UTL test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse \
//     -lopencv_core -lopenblas -lvnl -lvnl_algo \
//     -I/usr/include/vxl/core -I/usr/include/vxl/vcl \
//     -O3 -DNDEBUG -DUTL_USE_FASTLAPACK

// With utl + MKL
// g++ -DTEST_UTL test_matrix_pseudoinverse.cpp -o test_matrix_pseudoinverse \
//     -lopencv_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm \
//     -lvnl -lvnl_algo -I/usr/include/vxl/core -I/usr/include/vxl/vcl \
//     -O3 -DNDEBUG -DUTL_USE_FASTLAPACK

In my experiments, using the BLAS functions from MKL was the most efficient option.
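
Roughly, the vnl part of the comparison boils down to timing an SVD-based
pseudo-inverse in a loop. A minimal stand-alone sketch (this is not the
attached file; it assumes vnl_svd's pinverse() as the vnl pseudo-inverse and
uses arbitrary made-up data) looks like this:

#include <cmath>
#include <cstdio>
#include <ctime>
#include <vnl/vnl_matrix.h>
#include <vnl/algo/vnl_svd.h>

int main()
{
  const unsigned m = 100, n = 60, iters = 100;

  // Arbitrary dense test data; the attached test uses its own matrices.
  vnl_matrix<double> A(m, n);
  for (unsigned i = 0; i < m; ++i)
    for (unsigned j = 0; j < n; ++j)
      A(i, j) = std::sin(1.0 + i) * std::cos(2.0 + j);

  vnl_matrix<double> pinvA;
  std::clock_t t0 = std::clock();
  for (unsigned k = 0; k < iters; ++k)
  {
    vnl_svd<double> svd(A);     // vnl computes the pseudo-inverse via SVD
    pinvA = svd.pinverse();
  }
  std::clock_t t1 = std::clock();

  std::printf("vnl pseudo-inverse: %.3f ms/iteration\n",
              1000.0 * double(t1 - t0) / CLOCKS_PER_SEC / iters);
  return pinvA.rows() == n ? 0 : 1;   // use the result so it is not optimized away
}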

best,
Jian Cheng

On 03/12/2015 10:41 AM, Matt McCormick wrote:
> Hi,
>
> From the discussion so far, it appears the following series of steps
> could be taken to move forward on performance:
>
> 1) Replace vnl_vector and vnl_matrix with vnl_vector_fixed and
> vnl_matrix_fixed when possible.
>
> 2) Add Jian Cheng's BLAS and LAPACK backends for vnl_vector and vnl_matrix.
>
> 3) Add support for armadillo or eigen.
>
> 1) and 2) will be relatively easy to make, and will hopefully have an
> immediate impact on performance.  3) will take more work to happen,
> and it will take longer to impact the toolkit.  We will need to have
> cross-platform builds of the libraries in the repository.  Also, many
> ITK classes encapsulate their use of VNL very poorly, so it will not
> be as simple as swapping out or improving their backends.
>
> 2 cents,
> Matt
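
Regarding (1), a minimal illustration of what the substitution looks like
for a 3-D transform (the function names here are only for the example):

#include <vnl/vnl_matrix.h>
#include <vnl/vnl_vector.h>
#include <vnl/vnl_matrix_fixed.h>
#include <vnl/vnl_vector_fixed.h>

// Dynamic version: the sizes are run-time values, the storage is
// heap-allocated, and every call pays for allocating temporaries.
vnl_vector<double> transform_dynamic(const vnl_matrix<double> & R,
                                     const vnl_vector<double> & p)
{
  return R * p;
}

// Fixed version: the 3x3 size is a compile-time constant, the storage lives
// on the stack, and the small product can be inlined and unrolled.
vnl_vector_fixed<double, 3> transform_fixed(const vnl_matrix_fixed<double, 3, 3> & R,
                                            const vnl_vector_fixed<double, 3> & p)
{
  return R * p;
}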
>
> On Thu, Mar 12, 2015 at 10:20 AM, Bradley Lowekamp
> <blowekamp at mail.nih.gov> wrote:
>> Chuck,
>>
>> Thank you for giving us that important conclusion, under quite difficult
>> circumstances.
>>
>> I wonder if there is any distinction in the usage of vnl_matrix vs
>> vnl_matrix_fixed. I would expect that operations done for pixel transforms
>> should have their dimensions known at compile time and should be able to
>> use vnl_matrix_fixed.
>>
>> I have also considered methods to transform a whole array of points at a
>> time. I wonder whether, for 3x3 * 3x256 sized operations (scan-line size),
>> there would be a benefit from the library-based operations.
>>
>> Brad
>>
>>
>> On Mar 12, 2015, at 10:02 AM, Chuck Atkins <chuck.atkins at kitware.com> wrote:
>>
>> I worked with Julie Langou, maintainer of LAPACK, on this project a few
>> years ago.  The funding situation ended up very strange and messy, and we
>> basically had to cram 3 months' worth of effort into 3 weeks, so needless
>> to say, we were not able to really achieve our goals.  However, we spent a
>> fair amount of time profiling ITK and analyzing its vnl hot spots to
>> determine where to best spend the small amount of time we had.  The
>> results were not as straightforward as we expected.  It turns out that
>> most of the use of vnl_matrix and vnl_vector was actually for an enormous
>> number of operations on very small vectors and matrices (dimensions of 2,
>> 3, or 4), often for coordinate and geometry calculations or for small
>> per-pixel operations that were not easily vectorized in the implementation
>> at the time.  In these cases, the overhead of calling out to a BLAS or
>> LAPACK library was much too expensive, and the existing use of VNL was far
>> more efficient.  This falls apart, however, when trying to use vnl for
>> more complex algorithms, since the larger matrix operations are where the
>> benefit can be seen.  So just re-implementing the vnl vector and matrix
>> classes and operators with underlying BLAS and LAPACK routines turned out
>> not to be the best solution for ITK as a whole.
>>
>> To take advantage of the performance gains of large block matrix and
>> vector operations seen with optimized BLAS and LAPACK libraries, the
>> computations needed to be re-worked to act in an SoA* (struct of arrays)
>> fashion instead.  Given our limited time and resources, this was out of
>> scope for what we could tackle.
>>
>> * Typically AoS and SoA refer to storage layout, but I'm using it to
>> refer to computation layout.  The terminology may not be correct, but
>> I think you can understand what I mean.
>>
>> - Chuck
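
To make Brad's scan-line batching and the SoA re-working Chuck describes
more concrete: one option is to pack the points of a scan line into the
columns of a single matrix, so the whole line is one 3xN product instead of
N tiny ones. A sketch with plain vnl types (the function names are only
illustrative):

#include <vnl/vnl_matrix.h>

// Per-point (AoS-style): N separate 3x3 * 3x1 products.
void transform_pointwise(const vnl_matrix<double> & R,   // 3 x 3
                         vnl_matrix<double> & points)    // 3 x N, one point per column
{
  for (unsigned j = 0; j < points.cols(); ++j)
    points.set_column(j, R * points.get_column(j));
}

// Batched (SoA-style): one 3x3 * 3xN product for the whole scan line,
// which is the shape an optimized BLAS gemm is good at.
vnl_matrix<double> transform_scanline(const vnl_matrix<double> & R,       // 3 x 3
                                      const vnl_matrix<double> & points)  // 3 x N
{
  return R * points;
}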
>> On Thu, Mar 12, 2015 at 8:32 AM, Bradley Lowekamp <blowekamp at mail.nih.gov>
>> wrote:
>>> Hello,
>>>
>>> If I were writing my own ITK classes and needed a fast matrix library, I
>>> would likely pursue an additional dependency on an efficient numeric
>>> library for that project, such as Eigen.
>>>
>>> However, for the broad appeal of ITK, I would think a flexible back end
>>> would be best, as there are a variety of BLAS and LAPACK libraries
>>> available (commercial, open source, vendor free). It would be nice to pick
>>> one that has been optimized for the current architecture. I would think it
>>> would be most flexible to use this interface in the back end of a chosen
>>> numeric interface (currently VNL). Unfortunately, I don't have as much
>>> experience with these libraries as I'd like.
>>>
>>> Brad
>>>
>>> On Mar 12, 2015, at 5:15 AM, m.staring at lumc.nl wrote:
>>>
>>>> Hi,
>>>>
>>>> I think the eigen library is a mature and very fast library for this
>>>> kind of thing:
>>>> http://eigen.tuxfamily.org/index.php?title=Main_Page
>>>>
>>>> You may want to check it out, to see if it offers what you need.
>>>>
>>>> It would be great to be able to use it within ITK.
>>>>
>>>> 2c
>>>> Marius
>>>>
>>>> -----Original Message-----
>>>> From: Insight-developers [mailto:insight-developers-bounces at itk.org] On
>>>> Behalf Of Jian Cheng
>>>> Sent: Wednesday, March 11, 2015 23:17
>>>> To: Matt McCormick
>>>> Cc: Chuck Atkins; ITK
>>>> Subject: Re: [ITK-dev] efficiency of vnl_matrix
>>>>
>>>> Hi Matt,
>>>>
>>>> Thanks for your help, and also for the ITK workshop at UNC last time.
>>>>
>>>> It is very unfortunate. The efficiency of these numerical math operations
>>>> is very important for many applications.
>>>>
>>>> I recently released an ITK-based toolbox, called dmritool, for diffusion
>>>> MRI data processing.
>>>> It has some files that add BLAS, LAPACK, and MKL support to vnl_matrix
>>>> and vnl_vector.
>>>>
>>>> http://diffusionmritool.github.io/dmritool_doxygen/utlBlas_8h_source.html
>>>>
>>>> http://diffusionmritool.github.io/dmritool_doxygen/utlVNLBlas_8h_source.html
>>>>
>>>> Those functions are not internal to the vnl_matrix class; they operate on
>>>> the data pointer stored in the vnl_matrix object.
>>>> Thus, I later made an N-dimensional array library which uses those
>>>> functions internally and also supports expression templates to avoid
>>>> temporary copies.
>>>>
>>>> http://diffusionmritool.github.io/dmritool_doxygen/utlMatrix_8h_source.html
>>>>
>>>> http://diffusionmritool.github.io/dmritool_doxygen/utlVector_8h_source.html
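
In other words, the wrappers hand the buffers of existing vnl objects
straight to BLAS instead of changing vnl itself. A minimal sketch of that
idea with cblas_dgemm (an illustration of the approach only, not the utl
code):

#include <cblas.h>
#include <vnl/vnl_matrix.h>

// C = A * B using whatever BLAS is linked (OpenBLAS, MKL, ...), operating
// directly on the row-major buffers held by the vnl_matrix objects.
void gemm_vnl(const vnl_matrix<double> & A,
              const vnl_matrix<double> & B,
              vnl_matrix<double> & C)
{
  const int m = A.rows(), k = A.cols(), n = B.cols();
  C.set_size(m, n);
  cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
              m, n, k,
              1.0, A.data_block(), k,
                   B.data_block(), n,
              0.0, C.data_block(), n);
}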
>>>>
>>>> An efficiency comparison between vnl_vector/vnl_matrix and the utl
>>>> vector/matrix using OpenBLAS, LAPACK, or MKL can be obtained by running
>>>> these two tests:
>>>> https://github.com/DiffusionMRITool/dmritool/blob/master/Modules/HelperFunctions/test/utlVNLBlasGTest.cxx
>>>>
>>>> https://github.com/DiffusionMRITool/dmritool/blob/master/Modules/HelperFunctions/test/utlVNLLapackGTest.cxx
>>>>
>>>> Maybe some of this code can be used for patches somewhere in ITK; I am
>>>> not sure, and maybe we need more discussion on it.
>>>> With your help and discussion, I would be very glad to make my first
>>>> patch to ITK.
>>>> Thanks.
>>>>
>>>> best,
>>>> Jian Cheng
>>>>
>>>>
>>>> On 03/11/2015 04:39 PM, Matt McCormick wrote:
>>>>> Hi Jian,
>>>>>
>>>>> Yes, it would be wonderful to improve the efficiency of these basic
>>>>> numerical operations.
>>>>>
>>>>> Funding for the Refactor Numerical Libraries effort has ended, and
>>>>> the work is currently frozen.  However, you are more than welcome to
>>>>> pick it up, and we can help you get it into ITK.  More information on
>>>>> the patch submission process can be found here [1] and in the ITK
>>>>> Software Guide.
>>>>>
>>>>> Thanks,
>>>>> Matt
>>>>>
>>>>> [1] https://insightsoftwareconsortium.github.io/ITKBarCamp-doc/CommunitySoftwareProcess/SubmitAPatchToGerrit/index.html
>>>>>
>>>>> On Wed, Mar 11, 2015 at 4:07 PM, Jian Cheng <jian.cheng.1983 at gmail.com>
>>>>> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> My ITK-based task involves intensive matrix-matrix products,
>>>>>> pseudo-inverses, etc., so its performance is mainly determined by the
>>>>>> matrix library I use.
>>>>>> At first I used vnl_matrix and vnl_vector in ITK, but I found them very
>>>>>> inefficient because the vnl matrix library does not use BLAS and LAPACK.
>>>>>> After I wrote my own matrix class that uses OpenBLAS and LAPACK, I got a
>>>>>> huge gain in performance.
>>>>>>
>>>>>> I found there is a proposal to improve the efficiency of numerical
>>>>>> libraries in ITK:
>>>>>> http://www.itk.org/Wiki/ITK/Release_4/Refactor_Numerical_Libraries
>>>>>> I am not sure what the status of that proposal is.
>>>>>> I wonder when the vnl matrix library will internally support BLAS and
>>>>>> LAPACK, or MKL, so that we can just use it without losing efficiency.
>>>>>> Thanks.
>>>>>>
>>>>>> best,
>>>>>> Jian Cheng
>>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_matrix_pseudoinverse.cpp
Type: text/x-c++src
Size: 11455 bytes
Desc: not available
URL: <http://public.kitware.com/pipermail/community/attachments/20150312/0a5c1e63/attachment-0001.cpp>