[Insight-users] about multithreading in ITK (Andriy Fedorov)

Thu Jan 22 20:35:24 EST 2009

Thank you Andriy for your reply.

Actually my problem is way simpler than that. I want to multiply matrices (ordinary matrix-matrix multiplication and element-wise multiplication) but my algorithm involves iterative and large matrix multiplication. I tried to exploit multi-threading by openmp (not openmpi). I googled it and find this simple code for multi-threaded matrix-matrix multiplication:

from: https://computing.llnl.gov/tutorials/openMP/samples/C/omp_mm.c
/******************************************************************************
* FILE: omp_mm.c
* DESCRIPTION:  
*   OpenMp Example - Matrix Multiply - C Version
*   Demonstrates a matrix multiply using OpenMP. Threads share row iterations
*   according to a predefined chunk size.
* AUTHOR: Blaise Barney
* LAST REVISED: 06/28/05
******************************************************************************/
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NRA 62                 /* number of rows in matrix A */
#define NCA 15                 /* number of columns in matrix A */
#define NCB 7                  /* number of columns in matrix B */

int main (int argc, char *argv[]) 
{
int    tid, nthreads, i, j, k, chunk;
double    a[NRA][NCA],           /* matrix A to be multiplied */
    b[NCA][NCB],           /* matrix B to be multiplied */
    c[NRA][NCB];           /* result matrix C */

chunk = 10;                    /* set loop iteration chunk size */

/*** Spawn a parallel region explicitly scoping all variables ***/
#pragma omp parallel shared(a,b,c,nthreads,chunk) private(tid,i,j,k)
  {
  tid = omp_get_thread_num();
  if (tid == 0)
    {
    nthreads = omp_get_num_threads();
    printf("Starting matrix multiple example with %d threads\n",nthreads);
    printf("Initializing matrices...\n");
    }
  /*** Initialize matrices ***/
  #pragma omp for schedule (static, chunk) 
  for (i=0; i<NRA; i++)
    for (j=0; j<NCA; j++)
      a[i][j]= i+j;
  #pragma omp for schedule (static, chunk)
  for (i=0; i<NCA; i++)
    for (j=0; j<NCB; j++)
      b[i][j]= i*j;
  #pragma omp for schedule (static, chunk)
  for (i=0; i<NRA; i++)
    for (j=0; j<NCB; j++)
      c[i][j]= 0;

  /*** Do matrix multiply sharing iterations on outer loop ***/
  /*** Display who does which iterations for demonstration purposes ***/
  printf("Thread %d starting matrix multiply...\n",tid);
  #pragma omp for schedule (static, chunk)
  for (i=0; i<NRA; i++)    
    {
    printf("Thread=%d did row=%d\n",tid,i);
    for(j=0; j<NCB; j++)       
      for (k=0; k<NCA; k++)
        c[i][j] += a[i][k] * b[k][j];
    }
  }   /*** End of parallel region ***/

/*** Print results ***/
printf("******************************************************\n");
printf("Result Matrix:\n");
for (i=0; i<NRA; i++)
  {
  for (j=0; j<NCB; j++) 
    printf("%6.2f   ", c[i][j]);
  printf("\n"); 
  }
printf("******************************************************\n");
printf ("Done.\n");

}

it compiles and works perfectly when I type: 
>> g++  omp_mm.c  -o omp_mm -fopenmp

but when I add following lines to my CMakefile:
SET(CurrentExe "omp_mm")
ADD_EXECUTABLE(${CurrentExe} omp_mm.c ${SRCS} ${HDRS}) 
TARGET_LINK_LIBRARIES(${CurrentExe} ${Libraries})

and change :
CMAKE_EXE_LINKER_FLAGS : -fopenmp

here is what I get in compilation time:
/home/kaveh/Src/ITKNMF/Source/omp_mm.c: In function ‘main’:
/home/kaveh/Src/ITKNMF/Source/omp_mm.c:28: warning: ignoring #pragma omp parallel
/home/kaveh/Src/ITKNMF/Source/omp_mm.c:38: warning: ignoring #pragma omp for
/home/kaveh/Src/ITKNMF/Source/omp_mm.c:42: warning: ignoring #pragma omp for
/home/kaveh/Src/ITKNMF/Source/omp_mm.c:46: warning: ignoring #pragma omp for
/home/kaveh/Src/ITKNMF/Source/omp_mm.c:54: warning: ignoring #pragma omp for
/home/kaveh/Src/ITKNMF/Source/omp_mm.c:76: warning: control reaches end of non-void function

and no matter how I change the enviroment variable of $OMP_NUM_THREADS, it always lunches one thread. Unlike when I compile it with command line (without cmake), in which it works perfectly!!!! 

Now, I am confused!! There should be something wrong with cmake but how can I fix it?

Second question is, ITK uses LAPACK and LAPACK is multithreaded (corret me if I am wrong). If yes how can I set number of threads for matrix multiplication?

Regards,
Kaveh

Kaveh,

You can look at PETSc for parallel linear algebra routines:
http://www.mcs.anl.gov/petsc/petsc-as/.

In the past I used PETSc in conjunction with ITK code. You have to be
careful to make sure that both PETSc and ITK are compiled with the
same mpicc/mpicxx/mpiCC compiler.

Hope this helps

Andriy Fedorov

> Date: Wed, 21 Jan 2009 09:01:08 -0800 (PST)
> From: Kaveh Kohan <kaveh.kohan at yahoo.com>
> Subject: [Insight-users] about multithreading in ITK
> To: insight-users at itk.org
> Message-ID: <352748.10256.qm at web59915.mail.ac4.yahoo.com>
> Content-Type: text/plain; charset="us-ascii"
>
> Hello All,
>
> I have a general question about multi-threading in ITK and I would be thankful if anybody
> answer:
>
> As far as I know, ITK uses vnl for linear albera operation but it is not
> multithreaded (Please correct me if I am wrong). While many complicated
> algorithm are already multithreaded in ITK, was there any technical reason that
> linear algebra operation (like matrix-matrix multiplication, solving linear
> system, LU, ....) is not multithreaded? Perhaps it didn't have priority.
>
> I would be appreciated if you tell what is the best way to implement efficient,
> multithread linear algebra operation. In case it is not a good idea to use ITK
> API for multithreading of such operation, is there any third party multithreaded
> libray for linear algebra operation that I can link to ITK.
>
>
> Regards,
> Kaveh
>
>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.itk.org/pipermail/insight-users/attachments/20090122/5f577b31/attachment-0001.htm>