<font size="2"><font face="verdana,sans-serif">Instead of writing the loop "by hand", you could use an optimized linear algebra library. ITK includes VNL, but you can also use something else (I use <a href="http://eigen.tuxfamily.org/index.php?title=Main_Page">Eigen</a>). It should be faster, because such libraries are written to take advantage of SIMD instruction sets such as MMX, SSE2 and AVX, among other performance optimizations.<br>
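<br>To make the SIMD point concrete, here is a rough sketch of your loop written with SSE2 intrinsics, which handle 8 shorts per instruction. This is not how VNL or Eigen implement it; the function name multiply_add and the scalar fallback are my own assumptions, and the result wraps on overflow the same way your narrowing to short does on common compilers.<br>
<br>
```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1
#endif

// Hypothetical helper: c[i] = a[i] * k + b[i] on 16-bit elements,
// keeping only the low 16 bits of each result.
void multiply_add(const std::int16_t* a, const std::int16_t* b,
                  std::int16_t* c, std::size_t n, std::int16_t k)
{
    std::size_t i = 0;
#ifdef HAVE_SSE2
    const __m128i vk = _mm_set1_epi16(k);  // k replicated into 8 lanes
    for (; i + 8 <= n; i += 8) {
        // Unaligned loads, so the arrays need no special alignment.
        const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        // Low 16 bits of each product, then a wrapping 16-bit add.
        const __m128i vc = _mm_add_epi16(_mm_mullo_epi16(va, vk), vb);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(c + i), vc);
    }
#endif
    // Scalar tail (or the whole array when SSE2 is not available).
    for (; i < n; ++i) {
        c[i] = static_cast<std::int16_t>(a[i] * k + b[i]);
    }
}
```
<br>
With your arrays you would call it as multiply_add(a, b, c, 4000*3000, 3000). Each pass touches about 72 MB (three arrays of 24 MB each), so memory bandwidth rather than arithmetic may end up being the limit; measure before and after.<br>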
<br>But before that, make sure you are compiling your code in release mode. 12 million multiplications + additions should take much closer to 2ms than 20ms.<br><br>About transferring data to GPU and back: you are right, it makes no sense to do it if you only need to execute 2 operations on each element.<br>
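<br>To check the release-mode timing, a small harness along these lines may help. It is only a sketch: the function name time_multiply_add_ms is made up, and it uses std::chrono::steady_clock (C++11) for wall-clock time instead of clock(), which on some platforms reports CPU time summed over all OpenMP threads. Build it with optimizations on (the Release configuration in Visual Studio, or -O2 with GCC/Clang).<br>
<br>
```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs c[i] = a[i] * k + b[i] `passes` times and
// returns the average wall-clock time per pass in milliseconds.
double time_multiply_add_ms(const std::vector<short>& a,
                            const std::vector<short>& b,
                            std::vector<short>& c,
                            int passes, short k)
{
    assert(a.size() == b.size() && a.size() == c.size());
    assert(passes > 0);
    const std::size_t n = a.size();

    const auto start = std::chrono::steady_clock::now();
    for (int p = 0; p < passes; ++p) {
        for (std::size_t i = 0; i < n; ++i) {
            // Computed in int, then narrowed back to short as in the original loop.
            c[i] = static_cast<short>(a[i] * k + b[i]);
        }
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> total = stop - start;
    return total.count() / passes;
}
```
<br>
Call it with three std::vector<short> of 4000*3000 elements and passes = 1000 to reproduce your measurement, and read a few values of c afterwards so the optimizer cannot discard the work.<br>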
<br>HTH<br></font></font><br><div class="gmail_quote">On Mon, Oct 24, 2011 at 17:36, zlf <span dir="ltr"><<a href="mailto:jxdw_zlf@yahoo.com.cn">jxdw_zlf@yahoo.com.cn</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Hi<br>
I have two 4000x3000 matrices, say matrix A and matrix C. I want to add the<br>
two matrices. It takes me 20 ms now.<br>
<br>
But in my application, I need it to finish in 2 ms!<br>
<br>
Moreover, I cannot move the calculation to the GPU because the data transfer<br>
between main memory and the GPU is time-consuming.<br>
<br>
Is there any way to accelerate the calculation on the CPU?<br>
<br>
#include "stdafx.h"<br>
#include <iostream><br>
#include <stdio.h><br>
#include <time.h><br>
#include <conio.h><br>
<br>
int _tmain(int argc, _TCHAR* argv[])<br>
{<br>
// value-initialise the inputs so the loop does not read indeterminate memory<br>
short* a = new short[4000*3000]();<br>
short* b = new short[4000*3000]();<br>
short* c = new short[4000*3000];<br>
<br>
clock_t cstart, cend;<br>
double spend;<br>
cstart = clock();<br>
<br>
for(int i = 0 ; i < 1000;++i){<br>
#pragma omp parallel for<br>
for(int x = 0 ; x < 4000*3000;++x){<br>
//for(int y = 0 ; y < 3000;++y){<br>
short value = a[x] * 3000 + b[x];<br>
c[x] = value;<br>
//}<br>
}<br>
}<br>
<br>
cend = clock();<br>
// total seconds for 1000 passes == average milliseconds per pass<br>
spend = ((double)(cend-cstart)) / (double)CLOCKS_PER_SEC;<br>
printf("spend: %f (ms)\n", spend);<br>
<br>
return 0;<br>
}<br>
<br>
Thanks<br>
<br>
Jerry<br>
<br>
--<br>
View this message in context: <a href="http://itk-insight-users.2283740.n2.nabble.com/Fast-large-matrix-add-tp6925448p6925448.html" target="_blank">http://itk-insight-users.2283740.n2.nabble.com/Fast-large-matrix-add-tp6925448p6925448.html</a><br>
Sent from the ITK Insight Users mailing list archive at Nabble.com.<br>
</blockquote></div><br>