On Sun, Sep 2, 2012 at 4:24 PM, Bradley Lowekamp &lt;<a href="mailto:blowekamp@mail.nih.gov">blowekamp@mail.nih.gov</a>&gt; wrote:<br>&gt;<br>&gt; Hello David,<br>&gt;<br>&gt; Is the code you use to profile this available?<br>

&gt;<br>&gt; Little bits of code like this can be very compiler and architecture dependent. There is a lot of potential optimization. Even the way you convert/round the result of the method to the integer you&#39;re assigning it to can make a big difference; you could look at the round methods in the itk::Math namespace.<br>

&gt;<br>&gt; Also what architecture and SSE instruction sets are you compiling for? x64 arch?<br>&gt;<br>&gt; Brad<br><br><br>Hi Brad,<br><br>Here is a demo (previously I had actually put this function in the itkCovariantVector class, but this way seemed easier for a demo and I don&#39;t think it changes anything):<div>

<br><a href="https://github.com/daviddoria/ITKTimingDemos/blob/master/SquaredNorm/SquaredNorm.cpp">https://github.com/daviddoria/ITKTimingDemos/blob/master/SquaredNorm/SquaredNorm.cpp</a><br><br>I am compiling with g++ 4.6.3. Using these flags: <div>

<br></div><div>-O3 </div><div><br></div><div>and </div><div><br></div><div>-O3 -msse2 </div><div><br></div><div>seems to produce the same timings:<br><br>Built in time: 3.049<br><br></div><div>Custom time: 2.00261<br><br>

Those are the only compiler flags I was passing. Would it help to explicitly specify an architecture? I am using a 32-bit system.</div><div><br>David </div></div>
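For readers who cannot open the linked demo, here is a rough sketch of the kind of comparison being timed. It is plain C++, not ITK code and not the contents of SquaredNorm.cpp: the names Vec3, squaredNormLoop, squaredNormUnrolled and timeSeconds are all hypothetical. The assumption that one version accumulates in double while the other stays in float is only there to illustrate how conversions can affect timings.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical stand-in for a 3-component vector; this is NOT an ITK type.
using Vec3 = std::array<float, 3>;

// Loop-based version that converts each component to double and accumulates in double.
inline double squaredNormLoop(const Vec3& v) {
    double sum = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        sum += static_cast<double>(v[i]) * static_cast<double>(v[i]);
    }
    return sum;
}

// Hand-unrolled version that stays in the component type (float).
inline float squaredNormUnrolled(const Vec3& v) {
    return v[0] * v[0] + v[1] * v[1] + v[2] * v[2];
}

// Calls f `iterations` times and returns the elapsed wall-clock time in seconds.
// The running total is written to `sink` so the calls are less likely to be
// optimized away entirely.
template <typename F>
double timeSeconds(F f, std::size_t iterations, double& sink) {
    Vec3 v = {{1.0f, 2.0f, 3.0f}};
    double total = 0.0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        v[0] = static_cast<float>(i % 7);  // vary the input between calls
        total += f(v);
    }
    const auto stop = std::chrono::steady_clock::now();
    sink = total;
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling timeSeconds(squaredNormLoop, n, sink) and timeSeconds(squaredNormUnrolled, n, sink) with the same n gives two numbers to compare. As far as I know, on 32-bit x86 GCC keeps scalar floating-point math on the x87 unit unless -mfpmath=sse is given in addition to -msse2, which might be one reason the two flag sets above time the same; treat that as something to verify rather than a diagnosis.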