[Insight-users] parallelization

Mon Sep 7 02:47:25 EDT 2009

Solved!

I had requested 8 cores spread over multiple machines. That is fine for an MPI
job, but not for SMP, where the threads of one process can only use the
cores of a single node.
Therefore, running the code on cores that all sit on a single node
saved the day: with 8 cores it now takes only 8 minutes (instead
of 58' for the single-core computation). That's pretty good.
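
For anyone who wants to put a number on "pretty good", here is a minimal sketch of the usual speedup and parallel-efficiency arithmetic, using the timings from this thread. The function names are only illustrative and are not part of ITK.

```python
def speedup(serial_time, parallel_time):
    """Ratio of single-core wall time to multi-core wall time (same units)."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, cores):
    """Speedup divided by the core count (1.0 would be ideal scaling)."""
    return speedup(serial_time, parallel_time) / cores


# Final run: about 58 minutes on one core, 8 minutes on 8 cores of one node.
s = speedup(58.0, 8.0)
e = efficiency(58.0, 8.0, 8)
print(f"single node: speedup {s:.2f}x, efficiency {e:.0%}")

# Earlier runs, in seconds: 56'14" on one processor, 56'16" on "8 processors"
# that were spread over several machines.
s_spread = speedup(56 * 60 + 14, 56 * 60 + 16)
print(f"spread over machines: speedup {s_spread:.2f}x")
```

A speedup close to 1.0x, as in the earlier runs, is a hint that the extra cores were never used by the threaded code at all.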

cheers,
Mauro

On Sat, Sep 5, 2009 at 1:58 AM, Bill Lorensen<bill.lorensen at gmail.com> wrote:
> Are you sure your HPC facility allows you to get more than one
> processor? Have you tried a small example on a local machine with
> multiple processors?
>
> Bill
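
A quick way to act on Bill's question, on a local machine or inside a batch job, is to ask the operating system how many cores the current process may actually use. The sketch below uses only the Python standard library; `usable_cores` is a hypothetical helper name, and `os.sched_getaffinity` is only available on some platforms (Linux among them), hence the fallback.

```python
import os


def usable_cores():
    """Number of cores this process may run on, on the current machine only."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: respects CPU affinity masks set by a scheduler or taskset.
        return len(os.sched_getaffinity(0))
    # Fallback: total logical CPUs reported by the OS (may be None).
    return os.cpu_count() or 1


print("logical CPUs on this node:", os.cpu_count())
print("cores usable by this process:", usable_cores())
```

If this reports fewer cores than were requested from the scheduler, a threaded (SMP) program cannot be expected to scale beyond that number.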
>
> On Fri, Sep 4, 2009 at 11:51 AM, Mauro Maiorca<mauromaiorca at gmail.com> wrote:
>> I compile with this:
>> $ c++ -v
>> Using built-in specs.
>> Target: x86_64-linux-gnu
>> Configured with: ../src/configure -v --with-pkgversion='Ubuntu
>> 4.3.3-5ubuntu4'
>> --with-bugurl=file:///usr/share/doc/gcc-4.3/README.Bugs
>> --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr
>> --enable-shared --with-system-zlib --libexecdir=/usr/lib
>> --without-included-gettext --enable-threads=posix --enable-nls
>> --with-gxx-include-dir=/usr/include/c++/4.3 --program-suffix=-4.3
>> --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc
>> --enable-mpfr --with-tune=generic --enable-checking=release
>> --build=x86_64-linux-gnu --host=x86_64-linux-gnu
>> --target=x86_64-linux-gnu
>> Thread model: posix
>> gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
>>
>> I run it at a High Performance Computing (HPC) facility running Linux (I
>> launch the program with a predefined Portable Batch System script,
>> in which I can choose the number of processors for the task).
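
With PBS it can help to check where the requested processors actually are. The sketch below assumes the scheduler sets the `PBS_NODEFILE` environment variable to a file with one hostname per allocated slot, as Torque and PBS Pro usually do; your site may differ. The helper names are hypothetical.

```python
import os
from collections import Counter


def cores_per_host(nodefile_path):
    """Count how many allocated slots each host has in a PBS node file."""
    with open(nodefile_path) as f:
        hosts = [line.strip() for line in f if line.strip()]
    return Counter(hosts)


def fits_on_one_node(nodefile_path):
    """True when every allocated slot is on the same host."""
    return len(cores_per_host(nodefile_path)) == 1


if __name__ == "__main__":
    path = os.environ.get("PBS_NODEFILE")
    if path is None:
        print("PBS_NODEFILE is not set; probably not inside a PBS job.")
    else:
        for host, slots in cores_per_host(path).items():
            print(f"{host}: {slots} slot(s)")
        if not fits_on_one_node(path):
            print("Allocation spans several hosts; threads can only use one.")
```

If the allocation spans several hosts, a request that keeps all processors on one node (on Torque-style systems typically something like `#PBS -l nodes=1:ppn=8`, though the exact syntax depends on the site) is what a multithreaded program needs.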
>>
>> cheers,
>> Mauro
>>
>>
>> On Sat, Sep 5, 2009 at 1:30 AM, Bill Lorensen<bill.lorensen at gmail.com> wrote:
>>> What hardware, OS and compiler are you using?
>>>
>>> On Fri, Sep 4, 2009 at 7:39 AM, Mauro Maiorca<mauromaiorca at gmail.com> wrote:
>>>> Hi Dan & list,
>>>>
>>>> I'm using
>>>> MattesMutualInformationImageToImageMetric
>>>> LinearInterpolateImageFunction
>>>>
>>>> thanks for mentioning the IJ, it's an excellent source of info indeed!
>>>> However, according to the article "Optimizing ITK’s Registration
>>>> Methods for Multi-processor, Shared-Memory Systems" my code should be
>>>> optimized.
>>>> I would be really pleased to see any improvement on a multicore
>>>> architecture (especially because I'm planning to register quite big volumes
>>>> using MMI).
>>>>
>>>> Do you have any other clue/test/whatever to solve the problem?
>>>>
>>>>
>>>> cheers,
>>>> Mauro
>>>>
>>>>
>>>> On Fri, Sep 4, 2009 at 8:59 PM, Dan Mueller<dan.muel at gmail.com> wrote:
>>>>> Hi Mauro,
>>>>>
>>>>> Please report to the list what registration components you are using.
>>>>>
>>>>> i.e. Linear Interpolator, Mattes Mutual Information Metric, Regular
>>>>> Step Gradient Descent.
>>>>>
>>>>> Only _some_ components have been optimized for parallelization. To see
>>>>> which ones, please refer to the IJ article or the Code/Review folder (look
>>>>> for itkOpt*).
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> Cheers, Dan
>>>>>
>>>>> 2009/9/4 Mauro Maiorca <mauromaiorca at gmail.com>:
>>>>>> Thank you very much Dan, John, and Michael,
>>>>>>
>>>>>> I recompiled ITK with:
>>>>>> ITK_USE_OPTIMIZED_REGISTRATION_METHODS = ON
>>>>>> And still the same execution time!
>>>>>>
>>>>>> I sent the task to only one processor (exec time 56'14"), four
>>>>>> processors (56'57"), and 8 processors (56'16"). Pretty much the same
>>>>>> time, no matter the number of processors involved in the computation.
>>>>>>
>>>>>> I also tried adding this in ccmake for my application:
>>>>>> CMAKE_CXX_FLAGS   -lpthread
>>>>>> It doesn't make any (significant) difference at all!
>>>>>>
>>>>>> any other clue?
>>>>>> cheers,
>>>>>> Mauro
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 4, 2009 at 1:10 AM, Dan Mueller<dan.muel at gmail.com> wrote:
>>>>>>> Hi Mauro,
>>>>>>>
>>>>>>> To make use of multiple cores for registration, please set the CMake variable
>>>>>>>    ITK_USE_OPTIMIZED_REGISTRATION_METHODS = ON
>>>>>>>
>>>>>>> This allows some interpolators/metrics to utilize multiple cores. For
>>>>>>> full details, please see:
>>>>>>>    http://www.insight-journal.org/browse/publication/172
>>>>>>>
>>>>>>> Hope this helps.
>>>>>>>
>>>>>>> Regards, Dan
>>>>>>>
>>>>>>> 2009/9/3 Mauro Maiorca <mauromaiorca at gmail.com>:
>>>>>>>> Hello list,
>>>>>>>>
>>>>>>>> I'm wondering whether the ITK code is optimized for running on a
>>>>>>>> multiprocessor machine?
>>>>>>>> I wrote a program to register 2 images but it is quite slow because of
>>>>>>>> many resamplings, so I'm running it on an 8-processor architecture ...
>>>>>>>> but with no significant improvement.
>>>>>>>>
>>>>>>>> I'm using ITK 3.14 compiled (Linux) with the following flags:
>>>>>>>>
>>>>>>>>  CMAKE_THREAD_LIBS                -lpthread
>>>>>>>>  CMAKE_USE_PTHREADS               ON
>>>>>>>>
>>>>>>>> any suggestion?
>>>>>>>>
>>>>>>>> cheers,
>>>>>>>> Mauro
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>