[Insight-users] FW: Question about using OpenCL shared buffer for more than one kernel

Joachim Weber joachim.weber at stud.hs-regensburg.de
Wed Feb 20 12:23:10 EST 2013


Thank you very much! I think I can go from there.

But I still have one last question.
I tried to implement a TreeKMean algorithm on the GPU (only the classification is done on the GPU).
This is done by a filter which takes an input image.
The filter classifies each pixel according to a classification vector (5 centers are stored) and writes the nearest center into an output image.
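For illustration, that classification step could be sketched host-side in plain C++ (a standalone stand-in, not the actual GPU filter code; `nearestCenter` and `classify` are made-up names):

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Return the index of the center closest to the given intensity value.
// This mirrors the classification step: each pixel is assigned the
// label of its nearest cluster center.
std::size_t nearestCenter(float value, const std::vector<float>& centers)
{
    std::size_t best = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t i = 0; i < centers.size(); ++i)
    {
        const float d = std::fabs(value - centers[i]);
        if (d < bestDist)
        {
            bestDist = d;
            best = i;
        }
    }
    return best;
}

// Classify every pixel of an input image into an output label image.
std::vector<std::size_t> classify(const std::vector<float>& image,
                                  const std::vector<float>& centers)
{
    std::vector<std::size_t> labels(image.size());
    for (std::size_t p = 0; p < image.size(); ++p)
    {
        labels[p] = nearestCenter(image[p], centers);
    }
    return labels;
}
```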
This is done several times. Now the problem: for each iteration I need to create the filter anew, because the input image doesn't change.
It looks like nothing is done if the inputs haven't changed, so I need to recreate the whole filter (with all the inputs set anew).
Is there a way to tell the filter just to do its work again?
The classification vector is not set via the classic ITK way (filter->SetInput(inPtr)).

It looks like there is information stored for the input images which indicates whether the image has been modified or not.
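That is indeed how the pipeline behaves: ITK objects carry a modified-time stamp, and a filter re-executes only when some input's stamp is newer than its last execution. A toy model of that mechanism (hypothetical names, not the real itk::ProcessObject code):

```cpp
#include <cstdint>

// A minimal model of ITK's modified-time mechanism: a filter only
// re-runs Update() when a time stamp is newer than the time of the
// last execution.
struct Object
{
    std::uint64_t m_MTime = 0;
    static std::uint64_t& GlobalClock() { static std::uint64_t t = 0; return t; }
    void Modified() { m_MTime = ++GlobalClock(); }
};

struct Filter : Object
{
    Object* m_Input = nullptr;
    std::uint64_t m_LastExecuted = 0;
    int m_ExecutionCount = 0;

    void SetInput(Object* in) { m_Input = in; Modified(); }

    void Update()
    {
        // Skip the work if nothing changed since the last run --
        // this is why re-calling Update() appears to do nothing.
        const std::uint64_t newest =
            (m_Input && m_Input->m_MTime > m_MTime) ? m_Input->m_MTime : m_MTime;
        if (newest <= m_LastExecuted) { return; }
        ++m_ExecutionCount;              // stand-in for the real work
        m_LastExecuted = ++GlobalClock();
    }
};
```

In real ITK code, calling filter->Modified() before filter->Update() bumps the filter's time stamp past its last execution, which is the usual way to force a re-run without recreating the filter.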

Thanks again,
Weber Joachim

________________________________________
From: Denis Shamonin [dshamoni at gmail.com]
Sent: Wednesday, February 20, 2013 5:05 PM
To: Joachim Weber
Cc: insight-users at itk.org
Subject: Re: [Insight-users] FW: Question about using OpenCL shared buffer for more than one kernel

Hi,

The itkGPUDataManager is a simple wrapper around CPU <-> GPU buffers with some support for synchronization logic.
In order to share the same CPU->GPU block of memory across multiple kernels, follow the same recipe
that was used for the itkGPUResampleImageFilter.

Some pseudocode below:

1. Create the data manager

typename GPUDataManager::Pointer m_MyBuffer;
m_MyBuffer = GPUDataManager::New();

m_MyBuffer->Initialize();
m_MyBuffer->SetBufferFlag( CL_MEM_READ_WRITE ); // or CL_MEM_READ_ONLY if the buffer will only be read
m_MyBuffer->SetBufferSize( mem_size_MB ); // size of your buffer
m_MyBuffer->Allocate();

There is nothing in this buffer yet; you could fill it on the GPU side, for example.
You could also fill it beforehand with some values on the CPU side and then copy it to the GPU.
If you want to fill it on the CPU side, do this:

m_MyBuffer->SetCPUBufferPointer( ... );
m_MyBuffer->SetGPUDirtyFlag( true );
m_MyBuffer->UpdateGPUBuffer();
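The dirty-flag sequence above only triggers an actual host-to-device copy when the GPU copy is stale. The idea can be sketched in plain C++, with the device side simulated by a second host buffer (ToyDataManager is a made-up illustration, not the real itkGPUDataManager):

```cpp
#include <cstddef>
#include <vector>

// A toy model of the data manager's synchronization logic: two buffers
// (CPU side and a simulated "GPU" side) plus a dirty flag recording
// that the GPU copy is stale.  UpdateGPUBuffer() copies only when the
// GPU copy is actually out of date.
class ToyDataManager
{
public:
    void SetBufferSize(std::size_t n) { m_CPU.assign(n, 0.0f); m_GPU.assign(n, 0.0f); }
    void SetCPUBufferPointer(const std::vector<float>& data) { m_CPU = data; }
    void SetGPUDirtyFlag(bool dirty) { m_GPUDirty = dirty; }

    void UpdateGPUBuffer()
    {
        if (!m_GPUDirty) { return; }   // nothing to do: GPU copy is current
        m_GPU = m_CPU;                 // stands in for clEnqueueWriteBuffer
        m_GPUDirty = false;
        ++m_TransferCount;
    }

    int TransferCount() const { return m_TransferCount; }
    const std::vector<float>& GPUBuffer() const { return m_GPU; }

private:
    std::vector<float> m_CPU, m_GPU;
    bool m_GPUDirty = false;
    int m_TransferCount = 0;
};
```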

2. Define two kernel managers (also possible with only one, using different kernel handles)

typedef typename GPUKernelManager::Pointer GPUKernelManagerPointer;
GPUKernelManagerPointer m_AKernelManager;
GPUKernelManagerPointer m_BKernelManager;

3. Create handles
m_AHandle = m_AKernelManager->CreateKernel( "OpenCLFilterA" ); // Your OpenCL code A
m_BHandle = m_BKernelManager->CreateKernel( "OpenCLFilterB" ); // Your OpenCL code B

4. Set kernels with this block of memory
m_AKernelManager->SetKernelArgWithImage( m_AHandle, argidx++, m_MyBuffer ); // the name is sometimes confusing; it should be called SetKernelArgWithBuffer or something like it
m_BKernelManager->SetKernelArgWithImage( m_BHandle, argidx++, m_MyBuffer );

5. Launch kernel B after kernel A, using OpenCLEvent and OpenCLEventList to synchronize them.

OpenCLEventList eventList;

OpenCLEvent AEvent = m_AKernelManager->LaunchKernel( m_AHandle, eventList );
eventList.Append( AEvent );

OpenCLEvent BEvent = m_BKernelManager->LaunchKernel( m_BHandle, eventList );
eventList.Append( BEvent );

eventList.WaitForFinished();

6. Inside the OpenCL kernel called 'OpenCLFilterA' you can fill the buffer with some values, or reuse the ones defined on the CPU side.
When kernel 'OpenCLFilterB' is launched, it can retrieve these values, compute something else, and write the results back to the same buffer or to an output buffer.
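The whole A-then-B pattern, with both kernels bound to one shared buffer, looks like this as a host-side stand-in (plain C++ functions in place of the two OpenCL kernels; the event-list ordering reduces to "run A, then B" with no intermediate copy back to the host):

```cpp
#include <cstddef>
#include <vector>

// Stand-in for kernel "OpenCLFilterA": fills the shared buffer.
void kernelA(std::vector<float>& sharedBuffer)
{
    for (std::size_t i = 0; i < sharedBuffer.size(); ++i)
        sharedBuffer[i] = static_cast<float>(i);
}

// Stand-in for kernel "OpenCLFilterB": reads the values A produced,
// computes something else, and writes back to the same buffer.
void kernelB(std::vector<float>& sharedBuffer)
{
    for (float& v : sharedBuffer)
        v = v * 2.0f + 1.0f;
}

// The event list in the ITK code serializes the launches; here that
// ordering is simply "call A, then B" on the one shared buffer.
std::vector<float> runPipeline(std::size_t n)
{
    std::vector<float> shared(n, 0.0f);
    kernelA(shared);
    kernelB(shared);
    return shared;
}
```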

That is it. Good luck.
Denis

On Wed, Feb 20, 2013 at 2:42 PM, Joachim Weber <joachim.weber at stud.hs-regensburg.de> wrote:
Thanks for that fast answer.

1. I've already implemented my own version of a Gaussian filter, following FSL FAST.
2. I've looked into itkGPUDataManager. How is that done? Is it necessary to set a certain OpenCL flag for this?
Normally it is recommended just to create a cl_mem object and set certain flags on it.
I can't find any information on how to do this. It looks like, if I set up the GPUDataManager correctly for each GPUImage,
I can achieve exactly what I want. But I am not sure...

Regards
Weber Joachim
________________________________________
From: Denis Shamonin [dshamoni at gmail.com]
Sent: Wednesday, February 20, 2013 10:00 AM
To: Joachim Weber
Cc: insight-users at itk.org
Subject: Re: [Insight-users] Question about using OpenCL shared buffer for more than one kernel

Dear Weber,

1. I've implemented the Gaussian filter:
    itkGPURecursiveGaussianImageFilter and itkGPUSmoothingRecursiveGaussianImageFilter,
    described in my paper http://hdl.handle.net/10380/3393; have a look at the article.
2. Yes, the memory can be copied only once from CPU->GPU and then used by multiple kernels via pointers; use itkGPUDataManager for that.

Regards,

-Denis Shamonin, MSc
Division of Image Processing (LKEB)
Department of Radiology
Leiden University Medical Center

On Wed, Feb 20, 2013 at 12:02 AM, Joachim Weber <joachim.weber at stud.hs-regensburg.de> wrote:
Hi,
I want to write a Gaussian filter with ITK-GPU. I have a 3D image which has to be filtered in all 3 directions (first Z, then X, then Y).
Because ITK encapsulates the whole OpenCL functionality inside its own methods and classes, I find it very hard to define memory objects for multiple kernels.
I don't want to copy the image between host and device all the time, which would result in a heavy performance penalty.
Is there a simple way to define OpenCL memory objects which can be used by multiple kernels?
I am not too deep into ITK GPU, but it looks like defining memory buffers for multiple kernels (which stay on the device) is not possible right now.
Or do all kernels run in the same context?
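For reference, the three-direction pass structure can be sketched host-side (plain C++, not ITK; the names and the 3-tap kernel are purely illustrative):

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Host-side sketch of separable 3D smoothing: a 3-tap kernel is
// applied along one axis at a time over an image stored as a flat
// vector (x fastest, then y, then z).  Running the pass once per axis
// (Z, then X, then Y) yields the full 3D filter, which is why the
// image should stay in one device buffer between the passes.
struct Size3 { std::size_t x, y, z; };

std::vector<float> smoothAxis(const std::vector<float>& in, Size3 dim, int axis)
{
    const std::array<float, 3> kernel = {0.25f, 0.5f, 0.25f};
    const std::ptrdiff_t stride[3] = {
        1,
        static_cast<std::ptrdiff_t>(dim.x),
        static_cast<std::ptrdiff_t>(dim.x * dim.y)};
    const std::size_t extent[3] = {dim.x, dim.y, dim.z};

    std::vector<float> out(in.size());
    for (std::size_t z = 0; z < dim.z; ++z)
        for (std::size_t y = 0; y < dim.y; ++y)
            for (std::size_t x = 0; x < dim.x; ++x)
            {
                const std::size_t idx[3] = {x, y, z};
                const std::ptrdiff_t base =
                    static_cast<std::ptrdiff_t>(x + y * dim.x + z * dim.x * dim.y);
                float sum = 0.0f;
                for (int t = -1; t <= 1; ++t)
                {
                    // clamp the neighbor coordinate at the image border
                    std::ptrdiff_t c = static_cast<std::ptrdiff_t>(idx[axis]) + t;
                    if (c < 0) c = 0;
                    if (c > static_cast<std::ptrdiff_t>(extent[axis]) - 1)
                        c = static_cast<std::ptrdiff_t>(extent[axis]) - 1;
                    const std::ptrdiff_t offset =
                        (c - static_cast<std::ptrdiff_t>(idx[axis])) * stride[axis];
                    sum += kernel[static_cast<std::size_t>(t + 1)]
                           * in[static_cast<std::size_t>(base + offset)];
                }
                out[static_cast<std::size_t>(base)] = sum;
            }
    return out;
}

// One pass per direction, reusing the same buffer -- no host round trip.
std::vector<float> smooth3D(std::vector<float> img, Size3 dim)
{
    for (int axis : {2, 0, 1})  // Z, then X, then Y, as in the question
        img = smoothAxis(img, dim, axis);
    return img;
}
```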


Many thanks in advance,
Weber Joachim
_____________________________________
Powered by www.kitware.com

Visit other Kitware open-source projects at
http://www.kitware.com/opensource/opensource.html

Kitware offers ITK Training Courses, for more information visit:
http://www.kitware.com/products/protraining.php

Please keep messages on-topic and check the ITK FAQ at:
http://www.itk.org/Wiki/ITK_FAQ

Follow this link to subscribe/unsubscribe:
http://www.itk.org/mailman/listinfo/insight-users



