[ITK-users] GPUDiscreteGaussian not working

Tue Apr 22 06:43:11 EDT 2014

Denis,

I am not following your recommendations for Jose. 

Are you stating that sometimes ITK does not copy the result of the GPU filter back into the CPU memory?

A user should not have to use e methods you are directing Jose towards. 

Jim

> On Apr 22, 2014, at 5:28 AM, Denis Shamonin <dshamoni at gmail.com> wrote:
> 
> Hi Jose,
> 
>  
> 
> The synchronization from CPU to GPU image and back, may or may not be triggered by the default in the current ITK implementation.
> 
> You have to make an extra effort to control it. Basically, you have to make sure that GPU input image is allocated and copied from CPU to GPU,
> 
> execute the filter which only use input and output images, copy GPU output image back to CPU.
> 
>  
> 
> Ideally, you should not control it and that has to be managed by the ITK GPU pipeline itself (hided from the user), but this is not a case right now.
> 
>  
> 
> There are few problems. First, what you get after calling reader->GetOutput() is normal ITK image that you passing to the GPU filter,
> 
> while expecting GPU image at this moment. What you want is that GPU image is ready when you call reader->GetOutput() (created, allocated and copied to GPU).
> 
> The second problem may happen right after calling the GPU filter, the memory for output are not copied back to CPU output image.
> 
>  
> 
> What you should do is following:
> 
> 1 Create GPU input image.
> 
>  
> 
> 1.1 Register the GPUImageFactory before calling GPUReader = ReaderType::New();
> 
>   itk::ObjectFactoryBase::RegisterFactory( itk::GPUImageFactory::New() );
> 
>  
> 
>   At this moment ALL ITK images which are created will be itk::GPUImage's with memory allocated on GPU.
> 
>   Use it with care, you may end up with not intended copying to GPU when you modify this images.
> 
>   The reader once created will also have GPU image inside and that what you need.
> 
>   You may unregister factory right after you have used it.
> 
>  
> 
> 1.2 Alternative way if registering factory is not possible for you application.
> 
>   But you still want to use GPU filter in the middle of your application you may consider following:
> 
>       gpuInputImage = GPUInputImageType::New();
> 
>       gpuInputImage->GraftITKImage( itkimage );  // normal itk image here
> 
>       gpuInputImage->AllocateGPU(); // allocate only on GPU
> 
>       gpuInputImage->GetGPUDataManager()->SetCPUBufferLock( true ); // we don't want to change it CPU input
> 
>       gpuInputImage->GetGPUDataManager()->SetGPUDirtyFlag( true );  // set gpu dirty flag
> 
>       gpuInputImage->GetGPUDataManager()->UpdateGPUBuffer();  // copy cpu -> gpu
> 
>      
> 
> 2. Construct you filter, set input from the reader (or gpuInput image), call your filter.
> 
>   At the moment of construction GPU filter will create GPU output image for you.
> 
>  
> 
> 3. Call extra synchronization step after GPUFilter->Update(); (listed below)
> 
> itk::GPUExplicitSync< FilterType, OutputImageType >( GPUFilter, false );
> 
>  
> 
> 4. Write results
> 
>  
> 
> I hope that helps a bit. Making or using GPU filters with ITK is a bit of the challenge right now.
> 
> Specially to get it working across all GPU cards available.
> 
>  
> 
> You could check the correct execution for itkGPUShrinkImageFilter as example in The Insight Journal paper:
> 
> http://www.insight-journal.org/browse/publication/884
> 
>  
> 
> Regards,
> 
> -Denis Shamonin
> 
> Division of Image Processing (LKEB)
> 
> Department of Radiology
> 
> Leiden University Medical Center
> 
> PO Box 9600, 2300 RC Leiden, The Netherlands
> 
> 
> 
> //------------------------------------------------------------------------------
> // GPU explicit synchronization helper function
> template< class ImageToImageFilterType, class OutputImageType >
> void
> GPUExplicitSync( typename ImageToImageFilterType::Pointer & filter,
>   const bool filterUpdate = true,
>   const bool releaseGPUMemory = false )
> {
>   if( filter.IsNotNull() )
>   {
>     if( filterUpdate )
>     {
>       filter->Update();
>     }
> 
>     typedef typename OutputImageType::PixelType                               OutputImagePixelType;
>     typedef GPUImage< OutputImagePixelType, OutputImageType::ImageDimension > GPUOutputImageType;
>     GPUOutputImageType * GPUOutput = dynamic_cast< GPUOutputImageType * >( filter->GetOutput() );
>     if( GPUOutput )
>     {
>       GPUOutput->UpdateBuffers();
>     }
> 
>     if( releaseGPUMemory )
>     {
>       GPUOutput->GetGPUDataManager()->Initialize();
>     }
>   }
>   else
>   {
>     itkGenericExceptionMacro( << "The filter pointer is null." );
>   }
> }
> 
> 
> 
>> On Wed, Apr 16, 2014 at 3:24 PM, Jose Ignacio Prieto <joseignacio.prieto at gmail.com> wrote:
>> Hi Jim,
>> 
>> It had a different problem when using float. It would show a NAN on the results. That's why I changed to short. 
>> The card has 4GB ram. 
>> 
>> 
>>> On Tue, Apr 15, 2014 at 7:40 PM, Jim Miller <millerjv at gmail.com> wrote:
>>> Does the test for GPUDiscreteGaussian run on your platform?
>>> 
>>> The test uses a pixel type of float. Your code does not. You might try float. 
>>> 
>>> The Gaussian filter will require much more GPU memory than the mean filter. How much memory does your GPU have?
>>> 
>>> Jim
>>> 
>>>> On Apr 15, 2014, at 11:18 AM, Jose Ignacio Prieto <joseignacio.prieto at gmail.com> wrote:
>>>> 
>>>> Hi all, I am having trouble using GPUdiscretegaussian. It works for me on CPU but GPU version gives output 0. I tried running the test code but no help. I do run GPUMean filter. My card is AMDw7000 and using opencl 1.2, itk 4.6
>>>> 
>>>> Here is the code and the output. The images are vtk files of 320x320x231, ushort.
>>>> 
>>>> /*=========================================================================
>>>> *
>>>> *  Copyright Insight Software Consortium
>>>> *
>>>> *  Licensed under the Apache License, Version 2.0 (the "License");
>>>> *  you may not use this file except in compliance with the License.
>>>> *  You may obtain a copy of the License at
>>>> *
>>>> *         http://www.apache.org/licenses/LICENSE-2.0.txt
>>>> *
>>>> *  Unless required by applicable law or agreed to in writing, software
>>>> *  distributed under the License is distributed on an "AS IS" BASIS,
>>>> *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>>> *  See the License for the specific language governing permissions and
>>>> *  limitations under the License.
>>>> *
>>>> *=========================================================================*/
>>>> 
>>>> #include "itkImageFileReader.h"
>>>> #include "itkImageFileWriter.h"
>>>> 
>>>> #include "itkGPUImage.h"
>>>> #include "itkGPUKernelManager.h"
>>>> #include "itkGPUContextManager.h"
>>>> #include "itkGPUImageToImageFilter.h"
>>>> #include "itkGPUNeighborhoodOperatorImageFilter.h"
>>>> 
>>>> #include "itkTimeProbe.h"
>>>> #include "itkGaussianOperator.h"
>>>> 
>>>> #include "itkDiscreteGaussianImageFilter.h"
>>>> #include "itkGPUDiscreteGaussianImageFilter.h"
>>>> #include "itkMeanImageFilter.h"
>>>> #include "itkGPUMeanImageFilter.h"
>>>> 
>>>> //  typedef float InputPixelType;
>>>> //  typedef float OutputPixelType;
>>>> typedef  short InputPixelType;
>>>> typedef  short OutputPixelType;
>>>> 
>>>> typedef itk::GPUImage< InputPixelType,  3 >   InputImageType;
>>>> typedef itk::GPUImage< OutputPixelType, 3 >   OutputImageType;
>>>> 
>>>> 
>>>> 
>>>> typedef itk::ImageFileReader< InputImageType  >  ReaderType;
>>>> typedef itk::ImageFileWriter< OutputImageType >  WriterType;
>>>> 
>>>> 
>>>> 
>>>> int main(int argc, char *argv[])
>>>> {
>>>>     if(!itk::IsGPUAvailable())
>>>>     {
>>>>         std::cerr << "OpenCL-enabled GPU is not present." << std::endl;
>>>>         return EXIT_FAILURE;
>>>>     }
>>>> 
>>>>     if( argc <  3 )
>>>>     {
>>>>         std::cerr << "Error: missing arguments" << std::endl;
>>>>         std::cerr << "inputfile outputfile [num_dimensions]" << std::endl;
>>>>         return EXIT_FAILURE;
>>>>     }
>>>> 
>>>>     std::string inFile( argv[1] );
>>>>     std::string outFile( argv[2] );
>>>> 
>>>>     unsigned int dim = 3;
>>>>     ReaderType::Pointer reader;
>>>>     WriterType::Pointer writer;
>>>>     reader = ReaderType::New();
>>>>     writer = WriterType::New();
>>>> 
>>>>     reader->SetFileName( inFile );
>>>>     writer->SetFileName( outFile );
>>>> 
>>>>     float variance = 4.0;
>>>> 
>>>>     // test 1~8 threads for CPU
>>>>     int nThreads = 8;
>>>> 
>>>>     typedef itk::DiscreteGaussianImageFilter< InputImageType, OutputImageType> CPUFilterType;
>>>>     CPUFilterType::Pointer CPUFilter = CPUFilterType::New();
>>>>     itk::TimeProbe cputimer;
>>>>     cputimer.Start();
>>>>     CPUFilter->SetNumberOfThreads( nThreads );
>>>>     CPUFilter->SetInput( reader->GetOutput() );
>>>>     CPUFilter->SetMaximumKernelWidth(10);
>>>>     CPUFilter->SetUseImageSpacingOff();
>>>>     CPUFilter->SetVariance( variance );
>>>>     CPUFilter->Update();
>>>>     cputimer.Stop();
>>>> 
>>>> //    typedef itk::MeanImageFilter< InputImageType, OutputImageType> CPUFilterType;
>>>> //    CPUFilterType::Pointer CPUFilter = CPUFilterType::New();
>>>> //    itk::TimeProbe cputimer;
>>>> //    cputimer.Start();
>>>> //    CPUFilter->SetNumberOfThreads( nThreads );
>>>> //    CPUFilter->SetInput( reader->GetOutput() );
>>>> ////    CPUFilter->SetMaximumKernelWidth(10);
>>>> ////    CPUFilter->SetUseImageSpacingOff();
>>>> //    CPUFilter->SetRadius( variance );
>>>> //    CPUFilter->Update();
>>>> //    cputimer.Stop();
>>>> 
>>>>     std::cout << "CPU Gaussian Filter took " << cputimer.GetMean() << " seconds with "
>>>>               << CPUFilter->GetNumberOfThreads() << " threads.\n" << std::endl;
>>>> 
>>>>     // -------
>>>> 
>>>>     typedef itk::GPUDiscreteGaussianImageFilter< InputImageType, OutputImageType> GPUFilterType;
>>>>     GPUFilterType::Pointer GPUFilter = GPUFilterType::New();
>>>>     itk::TimeProbe gputimer;
>>>>     gputimer.Start();
>>>>     GPUFilter->SetInput( reader->GetOutput() );
>>>>     GPUFilter->SetVariance( variance );
>>>>     GPUFilter->SetMaximumKernelWidth(10);
>>>>     GPUFilter->SetUseImageSpacingOff();
>>>> //    GPUFilter->DebugOn();
>>>> //    GPUFilter->GPUEnabledOff();
>>>>     GPUFilter->Print(std::cout);
>>>>     GPUFilter->Update();
>>>>     GPUFilter->GetOutput()->UpdateBuffers(); // synchronization point (GPU->CPU memcpy)
>>>>     gputimer.Stop();
>>>>     std::cout << "GPU Gaussian Filter took " << gputimer.GetMean() << " seconds.\n" << std::endl;
>>>> 
>>>> //    typedef itk::GPUMeanImageFilter< InputImageType, OutputImageType> GPUFilterType;
>>>> //    GPUFilterType::Pointer GPUFilter = GPUFilterType::New();
>>>> //    itk::TimeProbe gputimer;
>>>> //    gputimer.Start();
>>>> //    GPUFilter->SetInput( reader->GetOutput() );
>>>> ////    GPUFilter->SetVariance( variance );
>>>> ////    GPUFilter->SetMaximumKernelWidth(10);
>>>> ////    GPUFilter->SetUseImageSpacingOff();
>>>> ////    GPUFilter->DebugOn();
>>>> ////    GPUFilter->Print(std::cout);
>>>> //    GPUFilter->SetRadius( variance );
>>>> //    GPUFilter->Update();
>>>> //    GPUFilter->GetOutput()->UpdateBuffers(); // synchronization point (GPU->CPU memcpy)
>>>> //    gputimer.Stop();
>>>> //    std::cout << "GPU Gaussian Filter took " << gputimer.GetMean() << " seconds.\n" << std::endl;
>>>> 
>>>>     // ---------------
>>>>     // RMS Error check
>>>>     // ---------------
>>>> 
>>>>     double diff = 0;
>>>>     unsigned int nPix = 0;
>>>>     itk::ImageRegionIterator<OutputImageType> cit(CPUFilter->GetOutput(), CPUFilter->GetOutput()->GetLargestPossibleRegion());
>>>>     itk::ImageRegionIterator<OutputImageType> git(GPUFilter->GetOutput(), GPUFilter->GetOutput()->GetLargestPossibleRegion());
>>>> 
>>>>     for(cit.GoToBegin(), git.GoToBegin(); !cit.IsAtEnd(); ++cit, ++git)
>>>>     {
>>>>         double err = (double)(cit.Get()) - (double)(git.Get());
>>>>         //         if(err > 0.1 || (double)cit.Get() < 0.1) std::cout << "CPU : " << (double)(cit.Get()) << ", GPU : " << (double)(git.Get()) << std::endl;
>>>>         diff += err*err;
>>>>         nPix++;
>>>>     }
>>>> 
>>>>     writer->SetInput( GPUFilter->GetOutput() );
>>>> //    writer->SetInput( CPUFilter->GetOutput() );
>>>>     writer->Update();
>>>> 
>>>>     if (nPix > 0)
>>>>     {
>>>>         double RMSError = sqrt( diff / (double)nPix );
>>>>         std::cout << "RMS Error : " << RMSError << std::endl;
>>>>         // the CPU filter operator has type double
>>>>         // but the double precision is not well-supported on most GPUs
>>>>         // and by most drivers at this time.  Therefore, the GPU filter
>>>>         // operator has type float
>>>>         // relax the RMS threshold here to allow for errors due to
>>>>         // differences in precision
>>>>         // NOTE:
>>>>         //   a threshold of 1.2e-5 worked on linux and Mac, but not Windows
>>>>         //   why?
>>>>         double RMSThreshold = 1.7e-5;
>>>>         if (vnl_math_isnan(RMSError))
>>>>         {
>>>>             std::cout << "RMS Error is NaN! nPix: " << nPix << std::endl;
>>>>             return EXIT_FAILURE;
>>>>         }
>>>>         if (RMSError > RMSThreshold)
>>>>         {
>>>>             std::cout << "RMS Error exceeds threshold (" << RMSThreshold << ")" << std::endl;
>>>>             return EXIT_FAILURE;
>>>>         }
>>>>     }
>>>>     else
>>>>     {
>>>>         std::cout << "No pixels in output!" << std::endl;
>>>>         return EXIT_FAILURE;
>>>>     }
>>>> 
>>>> }
>>>> 
>>>> 
>>>> OUTPUT
>>>> 
>>>> 
>>>> Starting C:\DocsMaracuya\Build\Ejemplos\Gpu\GPUTest.exe...
>>>> Platform : AMD Accelerated Parallel Processing
>>>> Platform : AMD Accelerated Parallel Processing
>>>> Pitcairn
>>>> Maximum Work Item Sizes : { 256, 256, 256 }
>>>> Maximum Work Group Size : 256
>>>> Alignment in bits of the base address : 2048
>>>> Smallest alignment in bytes for any data type : 128
>>>> cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_amd_bus_addressable_memory cl_amd_c1x_atomics
>>>> CPU Gaussian Filter took 1.70355 seconds with 8 threads.
>>>> 
>>>> Defines: #define DIM_3
>>>> #define INTYPE short
>>>> #define OUTTYPE short
>>>> #define OPTYPE short
>>>> 
>>>> Defines: #define DIM_3
>>>> #define INTYPE short
>>>> #define OUTTYPE short
>>>> #define OPTYPE short
>>>> 
>>>> Defines: #define DIM_3
>>>> #define INTYPE short
>>>> #define OUTTYPE short
>>>> #define OPTYPE short
>>>> 
>>>> GPUDiscreteGaussianImageFilter (0000000002205DF0)
>>>> RTTI typeinfo: class itk::GPUDiscreteGaussianImageFilter<class itk::GPUImage<short,3>,class itk::GPUImage<short,3> >
>>>> Reference Count: 1
>>>> Modified Time: 560
>>>> Debug: Off
>>>> Object Name:
>>>> Observers:
>>>> none
>>>> Inputs:
>>>> Primary: (000000000216E560) *
>>>> Indexed Inputs:
>>>> 0: Primary (000000000216E560)
>>>> Required Input Names: Primary
>>>> NumberOfRequiredInputs: 1
>>>> Outputs:
>>>> Primary: (000000000218A070)
>>>> Indexed Outputs:
>>>> 0: Primary (000000000218A070)
>>>> NumberOfRequiredOutputs: 1
>>>> Number Of Threads: 8
>>>> ReleaseDataFlag: Off
>>>> ReleaseDataBeforeUpdateFlag: Off
>>>> AbortGenerateData: Off
>>>> Progress: 0
>>>> Multithreader:
>>>> RTTI typeinfo: class itk::MultiThreader
>>>> Reference Count: 1
>>>> Modified Time: 499
>>>> Debug: Off
>>>> Object Name:
>>>> Observers:
>>>> none
>>>> Thread Count: 8
>>>> Global Maximum Number Of Threads: 128
>>>> Global Default Number Of Threads: 8
>>>> CoordinateTolerance: 1e-006
>>>> DirectionTolerance: 1e-006
>>>> Variance: [4, 4, 4]
>>>> MaximumError: [0.01, 0.01, 0.01]
>>>> MaximumKernelWidth: 10
>>>> FilterDimensionality: 3
>>>> UseImageSpacing: 0
>>>> InternalNumberOfStreamDivisions: 9
>>>> GPU: Enabled
>>>> GPU Gaussian Filter took 0.111351 seconds.
>>>> 
>>>> RMS Error : 26.4279
>>>> RMS Error exceeds threshold (1.7e-005)
>>>> C:\DocsMaracuya\Build\Ejemplos\Gpu\GPUTest.exe exited with code 1
>>>> 
>>>> 
>>>> -- 
>>>> José Ignacio Prieto
>>>> celular(nuevo): 94348182
>>>> _____________________________________
>>>> Powered by www.kitware.com
>>>> 
>>>> Visit other Kitware open-source projects at
>>>> http://www.kitware.com/opensource/opensource.html
>>>> 
>>>> Kitware offers ITK Training Courses, for more information visit:
>>>> http://www.kitware.com/products/protraining.php
>>>> 
>>>> Please keep messages on-topic and check the ITK FAQ at:
>>>> http://www.itk.org/Wiki/ITK_FAQ
>>>> 
>>>> Follow this link to subscribe/unsubscribe:
>>>> http://www.itk.org/mailman/listinfo/insight-users
>> 
>> 
>> 
>> -- 
>> José Ignacio Prieto
>> celular(nuevo): 94348182
>> 
>> _____________________________________
>> Powered by www.kitware.com
>> 
>> Visit other Kitware open-source projects at
>> http://www.kitware.com/opensource/opensource.html
>> 
>> Kitware offers ITK Training Courses, for more information visit:
>> http://www.kitware.com/products/protraining.php
>> 
>> Please keep messages on-topic and check the ITK FAQ at:
>> http://www.itk.org/Wiki/ITK_FAQ
>> 
>> Follow this link to subscribe/unsubscribe:
>> http://www.itk.org/mailman/listinfo/insight-users
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.itk.org/pipermail/insight-users/attachments/20140422/8d28a13f/attachment-0001.html>