<html><head><style type="text/css"><!-- DIV {margin:0px;} --></style></head><body><div style="font-family:times new roman,new york,times,serif;font-size:14pt"><div>Thank you Andriy for your reply.<br><br>Actually, my problem is much simpler than that. I want to multiply matrices (ordinary matrix-matrix multiplication and element-wise multiplication), but my algorithm is iterative and multiplies large matrices repeatedly. I tried to exploit multi-threading with OpenMP (not OpenMPI). I searched online and found this simple code for multi-threaded matrix-matrix multiplication:<br><br>from: https://computing.llnl.gov/tutorials/openMP/samples/C/omp_mm.c <br>/******************************************************************************<br>* FILE: omp_mm.c<br>* DESCRIPTION: <br>* OpenMp Example - Matrix Multiply - C Version<br>* Demonstrates a matrix multiply using OpenMP. Threads share row iterations<br>* according to a predefined
chunk size.<br>* AUTHOR: Blaise Barney<br>* LAST REVISED: 06/28/05<br>******************************************************************************/<br>#include <omp.h><br>#include <stdio.h><br>#include <stdlib.h><br><br>#define NRA 62 /* number of rows in matrix A */<br>#define NCA 15 /* number of columns in matrix A */<br>#define NCB 7 /* number of columns in matrix B */<br><br>int main (int argc, char *argv[]) <br>{<br>int tid, nthreads, i, j, k, chunk;<br>double a[NRA][NCA], /* matrix A to be multiplied */<br>
b[NCA][NCB], /* matrix B to be multiplied */<br> c[NRA][NCB]; /* result matrix C */<br><br>chunk = 10; /* set loop iteration chunk size */<br><br>/*** Spawn a parallel region explicitly scoping all variables ***/<br>#pragma omp parallel shared(a,b,c,nthreads,chunk) private(tid,i,j,k)<br> {<br> tid = omp_get_thread_num();<br> if (tid == 0)<br> {<br> nthreads = omp_get_num_threads();<br> printf("Starting matrix multiple example with %d threads\n",nthreads);<br> printf("Initializing matrices...\n");<br> }<br> /*** Initialize matrices ***/<br> #pragma omp for schedule (static, chunk)
<br> for (i=0; i<NRA; i++)<br> for (j=0; j<NCA; j++)<br> a[i][j]= i+j;<br> #pragma omp for schedule (static, chunk)<br> for (i=0; i<NCA; i++)<br> for (j=0; j<NCB; j++)<br> b[i][j]= i*j;<br> #pragma omp for schedule (static, chunk)<br> for (i=0; i<NRA; i++)<br> for (j=0; j<NCB; j++)<br> c[i][j]= 0;<br><br> /*** Do matrix multiply sharing iterations on outer loop ***/<br> /*** Display who does which iterations for demonstration purposes ***/<br> printf("Thread %d starting matrix multiply...\n",tid);<br> #pragma omp for schedule (static, chunk)<br> for (i=0; i<NRA; i++) <br> {<br> printf("Thread=%d did row=%d\n",tid,i);<br> for(j=0; j<NCB;
 j++) <br> for (k=0; k<NCA; k++)<br> c[i][j] += a[i][k] * b[k][j];<br> }<br> } /*** End of parallel region ***/<br><br>/*** Print results ***/<br>printf("******************************************************\n");<br>printf("Result Matrix:\n");<br>for (i=0; i<NRA; i++)<br> {<br> for (j=0; j<NCB; j++) <br> printf("%6.2f ", c[i][j]);<br> printf("\n"); <br> }<br>printf("******************************************************\n");<br>printf ("Done.\n");<br><br>}<br><br><br>It compiles and works perfectly when I type: <br>>> g++ omp_mm.c -o omp_mm -fopenmp<br><br>But when I add the following lines to my CMakeLists.txt:<br>SET(CurrentExe "omp_mm")<br>ADD_EXECUTABLE(${CurrentExe} omp_mm.c ${SRCS} ${HDRS}) <br>TARGET_LINK_LIBRARIES(${CurrentExe}
${Libraries})<br><br>and set:<br>CMAKE_EXE_LINKER_FLAGS : -fopenmp<br><br>here is what I get at compile time:<br>/home/kaveh/Src/ITKNMF/Source/omp_mm.c: In function 'main':<br>/home/kaveh/Src/ITKNMF/Source/omp_mm.c:28: warning: ignoring #pragma omp parallel<br>/home/kaveh/Src/ITKNMF/Source/omp_mm.c:38: warning: ignoring #pragma omp for<br>/home/kaveh/Src/ITKNMF/Source/omp_mm.c:42: warning: ignoring #pragma omp for<br>/home/kaveh/Src/ITKNMF/Source/omp_mm.c:46: warning: ignoring #pragma omp for<br>/home/kaveh/Src/ITKNMF/Source/omp_mm.c:54: warning: ignoring #pragma omp for<br>/home/kaveh/Src/ITKNMF/Source/omp_mm.c:76: warning: control reaches end of non-void function<br><br>and no matter how I set the environment variable $OMP_NUM_THREADS, it always launches one thread, unlike when I compile it from the command line (without cmake), where it works perfectly! <br><br>Now I am confused! There must be something wrong with my cmake setup, but how
can I fix it?<br><br>My second question: ITK uses LAPACK, and LAPACK is multithreaded (correct me if I am wrong). If so, how can I set the number of threads for matrix multiplication?<br><br>Regards,<br>Kaveh<br> </div><div style="font-family: times new roman,new york,times,serif; font-size: 14pt;"><div style="font-family: arial,helvetica,sans-serif; font-size: 13px;"><br>Kaveh,<br><br>You can look at PETSc for parallel linear algebra routines:<br><a href="http://www.mcs.anl.gov/petsc/petsc-as/" target="_blank">http://www.mcs.anl.gov/petsc/petsc-as/</a>.<br><br>In the past I used PETSc in conjunction with ITK code. You have to be<br>careful to make sure that both PETSc and ITK are compiled with the<br>same mpicc/mpicxx/mpiCC compiler.<br><br>Hope this helps<br><br>Andriy Fedorov<br><br><br><br>> Date: Wed, 21 Jan 2009 09:01:08 -0800 (PST)<br>> From: Kaveh Kohan <<a ymailto="mailto:kaveh.kohan@yahoo.com"
href="mailto:kaveh.kohan@yahoo.com">kaveh.kohan@yahoo.com</a>><br>> Subject: [Insight-users] about multithreading in ITK<br>> To: <a ymailto="mailto:insight-users@itk.org" href="mailto:insight-users@itk.org">insight-users@itk.org</a><br>> Message-ID: <<a ymailto="mailto:352748.10256.qm@web59915.mail.ac4.yahoo.com" href="mailto:352748.10256.qm@web59915.mail.ac4.yahoo.com">352748.10256.qm@web59915.mail.ac4.yahoo.com</a>><br>> Content-Type: text/plain; charset="us-ascii"<br>><br>> Hello All,<br>><br>> I have a general question about multi-threading in ITK and I would be thankful if anybody<br>> could answer:<br>><br>> As far as I know, ITK uses vnl for linear algebra operations, but it is not<br>> multithreaded (please correct me if I am wrong). While many complicated<br>> algorithms are already multithreaded in ITK, was there any technical reason that<br>> linear algebra operation (like matrix-matrix
multiplication, solving linear<br>> systems, LU, ...) is not multithreaded? Perhaps it didn't have priority.<br>><br>> I would appreciate it if you could tell me the best way to implement efficient,<br>> multithreaded linear algebra operations. In case it is not a good idea to use the ITK<br>> API for multithreading such operations, is there any third-party multithreaded<br>> library for linear algebra operations that I can link to ITK?<br>><br>><br>> Regards,<br>> Kaveh<br>><br>><br>><br>><br><br></div></div></div><br>
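<div style="font-family:times new roman,new york,times,serif;font-size:14pt"><br>A note on the CMake question above: the "ignoring #pragma omp" warnings suggest that -fopenmp reached only the link step, since CMAKE_EXE_LINKER_FLAGS is not used when compiling. GCC needs the flag at compile time as well, which the command-line build gets for free because a single g++ invocation both compiles and links. Below is a minimal sketch of one way to pass the flag in both places. It assumes a CMake version that ships the FindOpenMP module, and it reuses the target name and the ${SRCS}, ${HDRS} and ${Libraries} variables from the snippet in the question; it is not a tested fix for this particular project.<br><br><pre>
# Sketch only: assumes the FindOpenMP module is available in this CMake version.
FIND_PACKAGE(OpenMP)
IF(OPENMP_FOUND)
  # Compile step: without the OpenMP flag, GCC ignores the "#pragma omp" lines.
  SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
  SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
  # Link step: the same flag makes GCC link the OpenMP runtime.
  SET(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_C_FLAGS}")
ENDIF(OPENMP_FOUND)

# Target definition as in the question.
SET(CurrentExe "omp_mm")
ADD_EXECUTABLE(${CurrentExe} omp_mm.c ${SRCS} ${HDRS})
TARGET_LINK_LIBRARIES(${CurrentExe} ${Libraries})
</pre><br>If the flag then shows up in the compile commands (for example when building with make VERBOSE=1), the pragma warnings should disappear and $OMP_NUM_THREADS should take effect as it does for the command-line build.<br></div>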
</body></html>