[Insight-users] parallelization

Mon Sep 7 07:44:19 EDT 2009

Also, make sure your build type is Release.

On Mon, Sep 7, 2009 at 2:47 AM, Mauro Maiorca<mauromaiorca at gmail.com> wrote:
> Solved!
>
> I had been using 8 cores spread over multiple machines. That is fine for an MPI
> job, but not for SMP.
> Running the code on the cores available on a single node
> saved the day: with 8 cores it now takes only 8 minutes (instead
> of the 58 minutes of a single-core computation). That's pretty good.
>
> cheers,
> Mauro
>
>
> On Sat, Sep 5, 2009 at 1:58 AM, Bill Lorensen<bill.lorensen at gmail.com> wrote:
>> Are you sure your HPC facility allows you to get more than one
>> processor? Have you tried a small example on a local machine with
>> multiple processors?
>>
>> Bill
>>
>> On Fri, Sep 4, 2009 at 11:51 AM, Mauro Maiorca<mauromaiorca at gmail.com> wrote:
>>> I compile with this:
>>> $ c++ -v
>>> Using built-in specs.
>>> Target: x86_64-linux-gnu
>>> Configured with: ../src/configure -v --with-pkgversion='Ubuntu
>>> 4.3.3-5ubuntu4'
>>> --with-bugurl=file:///usr/share/doc/gcc-4.3/README.Bugs
>>> --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr
>>> --enable-shared --with-system-zlib --libexecdir=/usr/lib
>>> --without-included-gettext --enable-threads=posix --enable-nls
>>> --with-gxx-include-dir=/usr/include/c++/4.3 --program-suffix=-4.3
>>> --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc
>>> --enable-mpfr --with-tune=generic --enable-checking=release
>>> --build=x86_64-linux-gnu --host=x86_64-linux-gnu
>>> --target=x86_64-linux-gnu
>>> Thread model: posix
>>> gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
>>>
>>> I run it at a High Performance Computing (HPC) facility running Linux (I
>>> launch the program using a predefined Portable Batch System script,
>>> where I can choose the number of processors for the task).
>>>
>>> cheers,
>>> Mauro
>>>
>>>
>>> On Sat, Sep 5, 2009 at 1:30 AM, Bill Lorensen<bill.lorensen at gmail.com> wrote:
>>>> What hardware, OS and compiler are you using?
>>>>
>>>> On Fri, Sep 4, 2009 at 7:39 AM, Mauro Maiorca<mauromaiorca at gmail.com> wrote:
>>>>> Hi Dan & list,
>>>>>
>>>>> I'm using
>>>>> MattesMutualInformationImageToImageMetric
>>>>> LinearInterpolateImageFunction
>>>>>
>>>>> Thanks for mentioning the IJ, it's an excellent source of info indeed!
>>>>> However, according to the article "Optimizing ITK’s Registration
>>>>> Methods for Multi-processor, Shared-Memory Systems", my code should
>>>>> already be optimized.
>>>>> I would be really pleased to see any improvement on a multicore
>>>>> architecture (especially because I'm planning to register quite big volumes
>>>>> using MMI).
>>>>>
>>>>> Do you have any other clue/test/whatever to solve the problem?
>>>>>
>>>>>
>>>>> cheers,
>>>>> Mauro
>>>>>
>>>>>
>>>>> On Fri, Sep 4, 2009 at 8:59 PM, Dan Mueller<dan.muel at gmail.com> wrote:
>>>>>> Hi Mauro,
>>>>>>
>>>>>> Please report to the list what registration components you are using.
>>>>>>
>>>>>> E.g. Linear Interpolator, Mattes Mutual Information Metric, Regular
>>>>>> Step Gradient Descent.
>>>>>>
>>>>>> Only _some_ components have been optimized for parallelization. To see
>>>>>> which, please refer to the IJ article or the Code/Review folder (look
>>>>>> for itkOpt*).
>>>>>>
>>>>>> Hope this helps.
>>>>>>
>>>>>> Cheers, Dan
>>>>>>
>>>>>> 2009/9/4 Mauro Maiorca <mauromaiorca at gmail.com>:
>>>>>>> Thank you very much Dan, John, and Michael,
>>>>>>>
>>>>>>> I recompiled ITK with:
>>>>>>> ITK_USE_OPTIMIZED_REGISTRATION_METHODS = ON
>>>>>>> And still the same execution time!
>>>>>>>
>>>>>>> I sent the task to only one processor (execution time 56'14"), four
>>>>>>> processors (56'57"), and 8 processors (56'16"). Pretty much the same
>>>>>>> time, no matter the number of processors involved in the computation.
>>>>>>>
>>>>>>> I also tried adding this in ccmake for my application:
>>>>>>> CMAKE_CXX_FLAGS   -lpthread
>>>>>>> It doesn't make any (significant) difference at all!
>>>>>>>
>>>>>>> any other clue?
>>>>>>> cheers,
>>>>>>> Mauro
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Sep 4, 2009 at 1:10 AM, Dan Mueller<dan.muel at gmail.com> wrote:
>>>>>>>> Hi Mauro,
>>>>>>>>
>>>>>>>> To make use of multiple cores for registration, please set the CMake variable
>>>>>>>>    ITK_USE_OPTIMIZED_REGISTRATION_METHODS = ON
>>>>>>>>
>>>>>>>> This allows some interpolators/metrics to utilize multiple cores. For
>>>>>>>> full details, please see:
>>>>>>>>    http://www.insight-journal.org/browse/publication/172
>>>>>>>>
>>>>>>>> Hope this helps.
>>>>>>>>
>>>>>>>> Regards, Dan
>>>>>>>>
>>>>>>>> 2009/9/3 Mauro Maiorca <mauromaiorca at gmail.com>:
>>>>>>>>> Hello list,
>>>>>>>>>
>>>>>>>>> I'm wondering whether the ITK code is optimized for running on a
>>>>>>>>> multiprocessor machine?
>>>>>>>>> I wrote a program to register 2 images, but it is quite slow because of
>>>>>>>>> many resamplings, so I'm running it on an 8-processor architecture ...
>>>>>>>>> but with no significant improvement.
>>>>>>>>>
>>>>>>>>> I'm using ITK 3.14 compiled (Linux) with the following flags:
>>>>>>>>>
>>>>>>>>>  CMAKE_THREAD_LIBS                -lpthread
>>>>>>>>>  CMAKE_USE_PTHREADS               ON
>>>>>>>>>
>>>>>>>>> any suggestion?
>>>>>>>>>
>>>>>>>>> cheers,
>>>>>>>>> Mauro
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>
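
The timings quoted in this thread can be turned into a speedup and a parallel efficiency figure, which makes the difference between the two runs easier to see. The following is only a minimal sketch: the helper names (to_seconds, speedup, efficiency) are hypothetical and are not part of ITK or of any poster's code.

```cpp
#include <cassert>

// Hypothetical helpers, not part of ITK: they only do the arithmetic
// on the wall-clock times reported in the thread.

// Convert a time given as minutes and seconds into seconds.
double to_seconds(int minutes, int seconds) {
    assert(minutes >= 0 && seconds >= 0);
    return minutes * 60.0 + seconds;
}

// Speedup: single-core time divided by multi-core time.
double speedup(double t_single, double t_parallel) {
    assert(t_parallel > 0.0);
    return t_single / t_parallel;
}

// Parallel efficiency: speedup divided by the number of cores (1.0 is ideal).
double efficiency(double t_single, double t_parallel, int cores) {
    assert(cores > 0);
    return speedup(t_single, t_parallel) / cores;
}
```

With the single-node figures, speedup(to_seconds(58, 0), to_seconds(8, 0)) is 7.25, which on 8 cores is an efficiency of about 0.91. With the earlier multi-machine figures (56'14" on one processor against 56'16" on eight) the speedup comes out just below 1, which is what one would expect if the extra cores were never actually used by the threads.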