[Insight-users] parallelization
Mauro Maiorca
mauromaiorca at gmail.com
Fri Sep 4 13:10:07 EDT 2009
On Sat, Sep 5, 2009 at 1:58 AM, Bill Lorensen<bill.lorensen at gmail.com> wrote:
> Are you sure your HPC facility allows you to get more than one
> processor?
100% sure: I got a notification by mail with the execution time and the
names of the processors that ran the task.
> Have you tried a small example on a local machine with
> multiple processors?
I'm quite sure the problem is not in the HPC: I'm just using the
standard script, which works with other programs. I'll ask the HPC
admin and try to get detailed monitoring of what is going on in the
CPUs. However, I'm quite sure the problem lies somewhere in the CMake
settings.
Is there a test program online (executable + source + CMakeLists.txt)
that shows an improvement on a multicore architecture? Or, even
easier, the example registration8.cxx compiled for Linux and
optimized for multicore architectures?
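
For reference, the timings reported further down in this thread (56'14"
on one processor, 56'57" on four, 56'16" on eight) can be turned into
speedup and parallel-efficiency figures. The following is only a minimal
sketch in plain C++, not ITK code; the function names toSeconds, speedup
and efficiency are made up for illustration.

```cpp
#include <stdexcept>

// Illustrative helpers (hypothetical names, not part of ITK).

// Convert a wall-clock time given as minutes and seconds into seconds.
inline double toSeconds(int minutes, int seconds)
{
  return 60.0 * minutes + seconds;
}

// Speedup relative to the single-processor run: T(1) / T(n).
inline double speedup(double serialSeconds, double parallelSeconds)
{
  if (serialSeconds <= 0.0 || parallelSeconds <= 0.0)
  {
    throw std::invalid_argument("times must be positive");
  }
  return serialSeconds / parallelSeconds;
}

// Parallel efficiency: speedup divided by the number of processors.
// A value near 1.0 means good scaling; a value near 1/n means the
// extra processors contributed nothing.
inline double efficiency(double serialSeconds, double parallelSeconds,
                         int processors)
{
  if (processors < 1)
  {
    throw std::invalid_argument("processor count must be at least 1");
  }
  return speedup(serialSeconds, parallelSeconds) / processors;
}
```

With the numbers above, speedup(toSeconds(56, 14), toSeconds(56, 57)) is
about 0.99 for four processors, and the efficiency for eight processors
is about 0.12, which is roughly 1/8: the run behaves as if it were
single-threaded.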
cheers,
Mauro
>
> Bill
>
> On Fri, Sep 4, 2009 at 11:51 AM, Mauro Maiorca<mauromaiorca at gmail.com> wrote:
>> I compile with this:
>> $ c++ -v
>> Using built-in specs.
>> Target: x86_64-linux-gnu
>> Configured with: ../src/configure -v --with-pkgversion='Ubuntu
>> 4.3.3-5ubuntu4'
>> --with-bugurl=file:///usr/share/doc/gcc-4.3/README.Bugs
>> --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr
>> --enable-shared --with-system-zlib --libexecdir=/usr/lib
>> --without-included-gettext --enable-threads=posix --enable-nls
>> --with-gxx-include-dir=/usr/include/c++/4.3 --program-suffix=-4.3
>> --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc
>> --enable-mpfr --with-tune=generic --enable-checking=release
>> --build=x86_64-linux-gnu --host=x86_64-linux-gnu
>> --target=x86_64-linux-gnu
>> Thread model: posix
>> gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
>>
>> I execute in a High Performance Computing (HPC) facility with linux (I
>> launch the program using a predefined Portable Batch System script,
>> where I can choose the number of processors for the task).
>>
>> cheers,
>> Mauro
>>
>>
>> On Sat, Sep 5, 2009 at 1:30 AM, Bill Lorensen<bill.lorensen at gmail.com> wrote:
>>> What hardware, OS and compiler are you using?
>>>
>>> On Fri, Sep 4, 2009 at 7:39 AM, Mauro Maiorca<mauromaiorca at gmail.com> wrote:
>>>> Hi Dan & list,
>>>>
>>>> I'm using
>>>> MattesMutualInformationImageToImageMetric
>>>> LinearInterpolateImageFunction
>>>>
>>>> Thanks for mentioning the IJ, it's an excellent source of info indeed!
>>>> However, according to the article "Optimizing ITK’s Registration
>>>> Methods for Multi-processor, Shared-Memory Systems" my code should be
>>>> optimized.
>>>> I would be really pleased to see any improvement on a multicore
>>>> architecture (especially because I'm planning to register quite big
>>>> volumes using MMI).
>>>>
>>>> Do you have any other clue/test/whatever to solve the problem?
>>>>
>>>>
>>>> cheers,
>>>> Mauro
>>>>
>>>>
>>>> On Fri, Sep 4, 2009 at 8:59 PM, Dan Mueller<dan.muel at gmail.com> wrote:
>>>>> Hi Mauro,
>>>>>
>>>>> Please report to the list what registration components you are using.
>>>>>
>>>>> i.e. Linear Interpolator, Mattes Mutual Information Metric, Regular
>>>>> Step Gradient Descent.
>>>>>
>>>>> Only _some_ components have been optimized for parallelization. To
>>>>> see which, please refer to the IJ article or the Code/Review folder
>>>>> (look for itkOpt*).
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> Cheers, Dan
>>>>>
>>>>> 2009/9/4 Mauro Maiorca <mauromaiorca at gmail.com>:
>>>>>> Thank you very much Dan, John, and Michael,
>>>>>>
>>>>>> I recompiled ITK with:
>>>>>> ITK_USE_OPTIMIZED_REGISTRATION_METHODS = ON
>>>>>> And still the same execution time!
>>>>>>
>>>>>> I sent the task to only one processor (exec time 56'14"), four
>>>>>> processors (56'57"), and 8 processors (56'16"). Pretty much the same
>>>>>> time, no matter the number of processors involved in the computation.
>>>>>>
>>>>>> I also tried adding this in ccmake for my application:
>>>>>> CMAKE_CXX_FLAGS -lpthread
>>>>>> It doesn't make any (significant) difference at all!
>>>>>>
>>>>>> any other clue?
>>>>>> cheers,
>>>>>> Mauro
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 4, 2009 at 1:10 AM, Dan Mueller<dan.muel at gmail.com> wrote:
>>>>>>> Hi Mauro,
>>>>>>>
>>>>>>> To make use of multiple cores for registration, please set the CMake variable
>>>>>>> ITK_USE_OPTIMIZED_REGISTRATION_METHODS = ON
>>>>>>>
>>>>>>> This allows some interpolators/metrics to utilize multiple cores. For
>>>>>>> full details, please see:
>>>>>>> http://www.insight-journal.org/browse/publication/172
>>>>>>>
>>>>>>> Hope this helps.
>>>>>>>
>>>>>>> Regards, Dan
>>>>>>>
>>>>>>> 2009/9/3 Mauro Maiorca <mauromaiorca at gmail.com>:
>>>>>>>> Hello list,
>>>>>>>>
>>>>>>>> I'm wondering whether the ITK code is optimized for running on a
>>>>>>>> multiprocessor machine.
>>>>>>>> I wrote a program to register 2 images but it is quite slow because of
>>>>>>>> many resamplings, so I'm running it on an 8-processor architecture ...
>>>>>>>> but with no significant improvement.
>>>>>>>>
>>>>>>>> I'm using ITK 3.14 compiled (Linux) with the following flags:
>>>>>>>>
>>>>>>>> CMAKE_THREAD_LIBS -lpthread
>>>>>>>> CMAKE_USE_PTHREADS ON
>>>>>>>>
>>>>>>>> any suggestion?
>>>>>>>>
>>>>>>>> cheers,
>>>>>>>> Mauro
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>