<html><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hans,<div><br></div><div>I have seen poor speedup from using more threads in my brief usage of ANTS. If I recall correctly, more threads were actually slower for whatever I was doing at the time.</div><div><br></div><div>It sounds like you are on the right track with the tools you are using and with trying to remove unnecessary use of smart pointers' atomic reference counting.</div><div><br></div><div>Another thing to keep in mind is the memory layout of the data being processed, and the impact of multithreaded memory allocation.</div><div><br></div><div>When I looked at these methods, I didn't have time to figure out all the data structures and methods involved to get a good handle on what was going on.</div><div><br></div><div>I'd be interested in seeing speedup numbers for some of this problematic code, that is, the speedup with 2, 3, 4, 8, 16 threads (true cores, not hyper-threaded), etc. Or perhaps the ratio of total CPU time (summed across threads) to wall time, or something similar, would be a simple, fair number.</div><div><br></div><div>Good luck,</div><div>Brad</div><div><br><div><div>On Sep 23, 2013, at 7:46 AM, Brian Avants &lt;<a href="mailto:stnava@gmail.com">stnava@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"><div dir="ltr">hi hans <div><br></div><div>thanks for looking at this - i suppose the good news is that there is plenty of room for improvement. </div><div><br></div><div>do you have a sense of whether this is a registration-specific issue or if this is multi-threading in itk, in general?</div>
<div><br></div><div>am wondering if there is a simplified case that we can invent or find that will help clarify/isolate the issues.</div><div><br></div><div>see you tomorrow, probably. i just got in @ 6pm .... </div>
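<div>One way to build such a simplified case, using the measurements Brad proposes (speedup at 2, 3, 4, 8 and 16 threads, and total CPU time versus wall time), is a small standalone benchmark. The C++ below is a minimal sketch under stated assumptions, not ANTs or ITK code: Work(), Run(), Timing and PrintScalingTable() are hypothetical names, and Work() is only a placeholder for whatever filter or metric is being profiled.</div>

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstdio>
#include <ctime>
#include <initializer_list>
#include <thread>
#include <vector>

// Hypothetical stand-in workload: independent arithmetic with no shared
// state, so near-linear scaling is the expected baseline. Replace Work()
// with the code under study.
double Work(long long begin, long long end) {
  double acc = 0.0;
  for (long long i = begin; i < end; ++i) acc += std::sqrt(static_cast<double>(i));
  return acc;
}

struct Timing {
  double wallSeconds;  // elapsed wall-clock time
  double cpuSeconds;   // std::clock(): summed over all threads on Linux/glibc,
                       // but wall time on MSVC, so the ratio is not portable
  double result;
};

// Split [0, n) into `threads` equal chunks and run each chunk on its own thread.
Timing Run(unsigned threads, long long n) {
  assert(threads > 0);
  std::vector<double> partial(threads, 0.0);
  const std::clock_t c0 = std::clock();
  const auto w0 = std::chrono::steady_clock::now();
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < threads; ++t) {
    const long long b = n * t / threads;
    const long long e = n * (t + 1) / threads;
    pool.emplace_back([&partial, t, b, e] { partial[t] = Work(b, e); });
  }
  for (std::thread& th : pool) th.join();
  const auto w1 = std::chrono::steady_clock::now();
  const std::clock_t c1 = std::clock();
  double sum = 0.0;
  for (double p : partial) sum += p;
  return {std::chrono::duration<double>(w1 - w0).count(),
          static_cast<double>(c1 - c0) / CLOCKS_PER_SEC, sum};
}

// Speedup S(N) = T(1) / T(N); a CPU/wall ratio near N means N threads were busy.
void PrintScalingTable(long long n) {
  const Timing base = Run(1, n);
  std::printf("threads  wall[s]  speedup  cpu/wall\n");
  for (unsigned t : {1u, 2u, 3u, 4u, 8u, 16u}) {
    const Timing r = (t == 1) ? base : Run(t, n);
    std::printf("%7u  %7.3f  %7.2f  %8.2f\n", t, r.wallSeconds,
                base.wallSeconds / r.wallSeconds, r.cpuSeconds / r.wallSeconds);
  }
}
```

<div>Calling PrintScalingTable() from main() with a large n prints one row per thread count. If speedup stalls while cpu/wall also stays low, the threads are mostly blocked waiting; if cpu/wall grows but speedup does not, the threads are burning CPU in contention (for example, spinning on locks). Either pattern would be consistent with only 1 or 2 threads doing productive work, as Hans reports below.</div>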
</div><div class="gmail_extra"><br clear="all"><div><div><br></div>brian<br><div><br></div><div><br></div></div>
<br><br><div class="gmail_quote">On Mon, Sep 23, 2013 at 8:16 AM, Johnson, Hans J <span dir="ltr">&lt;<a href="mailto:hans-johnson@uiowa.edu" target="_blank">hans-johnson@uiowa.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; position: static; z-index: auto; ">
<div style="font-size:14px;font-family:Calibri,sans-serif;word-wrap:break-word">
<div>All,</div>
<div><br>
</div>
<div>
<div>Based on a discussion with Nick Tustison on the train from Nagoya airport to the MICCAI conference, I started some profiling to determine what is actually causing registration to be so slow. Some fixes have already been pushed to gerrit (<a href="http://review.source.kitware.com/#/c/12747/" target="_blank">http://review.source.kitware.com/#/c/12747/</a>)
and they have yielded about a 15% speed improvement. This, however, appears to be only the tip of the iceberg. </div>
</div>
<div><br>
</div>
<div>In addition, I have been greatly disappointed that converting from double to float precision did not improve performance (even though all my past experience indicates that it should!). If these multithreading issues
turn out to be the problem, that would explain why improving floating-point performance does not improve overall performance. </div>
<div><br>
</div>
<div>
<div>=================</div>
</div>
<div><br>
</div>
<div>So far, everything I've profiled with regard to ANTs registration indicates that there is a serious flaw in the multithreaded implementation.</div>
<div><br>
</div>
<div>20 of the 52 seconds are spent waiting for condition variables to clear (i.e., variables are shared and require synchronization to complete). The thread concurrency histogram is particularly troubling: only 1 or 2 threads are actually doing productive work
at the same time. NOTE: THIS IS A REAL program that is actually in use for affine registration. I use it every day and have been terribly disappointed in its speed. Every ANTs registration that you run likely has this behavior.</div>
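<div>Threads blocking on shared variables that need synchronization is a general pattern, and one common remedy (not necessarily what the ANTs/ITK fix will be) is to give each thread private partial results and combine them once after the threads join. The sketch below contrasts the two patterns under stated assumptions: SampleValue(), SharedAccumulate(), PerThreadAccumulate() and PartialSum are hypothetical names, SampleValue() stands in for a per-sample metric contribution, and the alignas(64) padding assumes a 64-byte cache line, which also touches on the memory-layout concern Brad raises.</div>

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical per-sample contribution (stand-in for a metric term).
// Values are multiples of 0.5, so sums are exact and order-independent.
double SampleValue(std::size_t i) { return static_cast<double>(i % 7) * 0.5; }

// Pattern A: every sample updates one shared total under a mutex, so the
// threads serialize on the lock and spend much of their time waiting.
double SharedAccumulate(std::size_t n, unsigned threads) {
  assert(threads > 0);
  double total = 0.0;
  std::mutex totalMutex;
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < threads; ++t) {
    pool.emplace_back([&, t] {
      for (std::size_t i = t; i < n; i += threads) {
        const double v = SampleValue(i);
        std::lock_guard<std::mutex> lock(totalMutex);  // one lock per sample
        total += v;
      }
    });
  }
  for (std::thread& th : pool) th.join();
  return total;
}

// Pattern B: each thread accumulates privately and writes one result into
// its own slot; alignas(64) (an assumed cache-line size) keeps slots from
// sharing a cache line. The only synchronization is join().
struct alignas(64) PartialSum {
  double value = 0.0;
};

double PerThreadAccumulate(std::size_t n, unsigned threads) {
  assert(threads > 0);
  std::vector<PartialSum> partial(threads);  // over-aligned storage needs C++17
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < threads; ++t) {
    pool.emplace_back([&partial, n, threads, t] {
      double local = 0.0;
      for (std::size_t i = t; i < n; i += threads) local += SampleValue(i);
      partial[t].value = local;
    });
  }
  for (std::thread& th : pool) th.join();
  double total = 0.0;
  for (const PartialSum& p : partial) total += p.value;  // serial reduction
  return total;
}
```

<div>Both functions return the same total here, because the sample values are exact in floating point and summation order does not matter. Timed with increasing thread counts, SharedAccumulate() typically gets no faster (or slower) as threads are added, while PerThreadAccumulate() scales with the number of cores.</div>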
<div><br>
</div>
<div>=================</div>
<div><br>
</div>
<div>I'll continue to track down where the issues are, but they appear to be in places where a transform is referenced from multiple threads and each reference requires updating the internal reference count of the smart pointer. Each smart pointer reference count update
requires a global lock on that object to do the increment/decrement.</div>
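<div>The cost described here can be isolated in a small model. The sketch below assumes, as stated above, that each reference-count update takes a lock on the object; LockedRefCounted, Ptr, EvaluateByValue(), EvaluateByRef() and CountLocks() are hypothetical stand-ins, not ITK's SmartPointer. It counts lock acquisitions when worker threads receive the object by smart-pointer value versus by const reference. (std::shared_ptr typically uses atomic instructions rather than a mutex, but concurrent increments on one counter still contend for the same cache line.)</div>

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical intrusive ref-counted object whose Register()/UnRegister()
// take a per-object mutex, modelling the locking described above.
// Deletion at zero is omitted; the demo object lives on the stack.
class LockedRefCounted {
 public:
  void Register() const {
    std::lock_guard<std::mutex> lock(mutex_);
    ++refCount_;
    ++lockAcquisitions_;
  }
  void UnRegister() const {
    std::lock_guard<std::mutex> lock(mutex_);
    --refCount_;
    ++lockAcquisitions_;
  }
  long LockAcquisitions() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return lockAcquisitions_;
  }
  // Read-only stand-in for something like a point-transform call.
  double Evaluate(double x) const { return 2.0 * x + 1.0; }

 private:
  mutable std::mutex mutex_;
  mutable long refCount_ = 0;
  mutable long lockAcquisitions_ = 0;
};

// Minimal intrusive smart pointer: each copy calls Register() and each
// destruction calls UnRegister(), i.e. two lock round-trips per copy.
class Ptr {
 public:
  explicit Ptr(const LockedRefCounted* p) : p_(p) { if (p_) p_->Register(); }
  Ptr(const Ptr& other) : p_(other.p_) { if (p_) p_->Register(); }
  Ptr& operator=(const Ptr&) = delete;  // not needed for this sketch
  ~Ptr() { if (p_) p_->UnRegister(); }
  const LockedRefCounted* operator->() const { return p_; }

 private:
  const LockedRefCounted* p_;
};

// By value: the parameter copy costs an increment plus a decrement per call.
double EvaluateByValue(Ptr obj, double x) { return obj->Evaluate(x); }

// By const reference: no reference-count traffic, no locking.
double EvaluateByRef(const LockedRefCounted& obj, double x) { return obj.Evaluate(x); }

// Returns the number of lock acquisitions caused by the worker threads.
long CountLocks(unsigned threads, long callsPerThread, bool byValue) {
  LockedRefCounted obj;
  const Ptr owner(&obj);  // one owning reference, taken outside the threads
  const long before = obj.LockAcquisitions();
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < threads; ++t) {
    pool.emplace_back([&] {
      double acc = 0.0;
      for (long i = 0; i < callsPerThread; ++i) {
        acc += byValue ? EvaluateByValue(owner, static_cast<double>(i))
                       : EvaluateByRef(obj, static_cast<double>(i));
      }
      (void)acc;
    });
  }
  for (std::thread& th : pool) th.join();
  return obj.LockAcquisitions() - before;
}
```

<div>With 4 threads making 1,000 calls each, passing the smart pointer by value costs 8,000 lock acquisitions (one Register() and one UnRegister() per call), while passing a const reference costs none. In this model, keeping one owning smart pointer alive outside the threaded region and handing workers a reference or raw pointer removes the per-call locking entirely.</div>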
<div><br>
</div>
<div>More testing to follow.</div>
<div><br>
</div>
<div>Hans</div>
<div><br>
</div>
<div><span>[Attached image: BB93F41A-C611-4E82-8897-59D419BC5E08.png]</span></div>
<br>
<br>
<br>
</div>
</blockquote></div><br></div>
</blockquote></div><br></div></body></html>