[ITK-users] GPUDiscreteGaussian not working

Wed Apr 23 14:28:41 EDT 2014

I also get an output image filled with 0 when I run the GPUDiscreteGaussianImageFilterTest with the ITK test image (Examples/Data/BrainProtonDensitySlice.png) and pixel types changed to short.  However, I do not see the NaNs that José saw when using float.  José, can you try running your code again with pixel types as float using either the ITK test image or using another image that starts out as float (perhaps there is trouble reading in your vtk ushort image) to see if you get NaNs with these images as well?  I do think there is a bug with the way shorts are handled in this filter, but I'm trying to determine whether your float NaNs are a separate bug or a related issue.

I used the following command to run the test code with various test images :
>> /path/to/build/ITK/bin/ITKGPUSmoothingTestDriver itkGPUDiscreteGaussianImageFilterTest /path/to/source/ITK/Examples/Data/BrainProtonDensitySlice.png /path/to/output/gpuGaussianImageFilterTest3DOutput.mha 3

-Kris

> I cannot follow this either. ITK v4 GPU framework do not require users to manually synchronize because ITK will handle dirty flags / data synchronization automatically.
> 
> Won-Ki
> 
> 
> 2014-04-22 19:43 GMT+09:00 Jim Miller <millerjv at gmail.com>:
> Denis,
> 
> I am not following your recommendations for Jose. 
> 
> Are you stating that sometimes ITK does not copy the result of the GPU filter back into the CPU memory?
> 
> A user should not have to use e methods you are directing Jose towards. 
> 
> Jim
> 
> On Apr 22, 2014, at 5:28 AM, Denis Shamonin <dshamoni at gmail.com> wrote:
> 
>> Hi Jose,
>> 
>>  
>> The synchronization from CPU to GPU image and back, may or may not be triggered by the default in the current ITK implementation.
>> 
>> You have to make an extra effort to control it. Basically, you have to make sure that GPU input image is allocated and copied from CPU to GPU,
>> 
>> execute the filter which only use input and output images, copy GPU output image back to CPU.
>> 
>>  
>> Ideally, you should not control it and that has to be managed by the ITK GPU pipeline itself (hided from the user), but this is not a case right now.
>> 
>>  
>> There are few problems. First, what you get after calling reader->GetOutput() is normal ITK image that you passing to the GPU filter,
>> 
>> while expecting GPU image at this moment. What you want is that GPU image is ready when you call reader->GetOutput() (created, allocated and copied to GPU).
>> 
>> The second problem may happen right after calling the GPU filter, the memory for output are not copied back to CPU output image.
>> 
>>  
>> What you should do is following:
>> 
>> 1 Create GPU input image.
>> 
>>  
>> 1.1 Register the GPUImageFactory before calling GPUReader = ReaderType::New();
>> 
>>   itk::ObjectFactoryBase::RegisterFactory( itk::GPUImageFactory::New() );
>> 
>>   
>>   At this moment ALL ITK images which are created will be itk::GPUImage's with memory allocated on GPU.
>> 
>>   Use it with care, you may end up with not intended copying to GPU when you modify this images.
>> 
>>   The reader once created will also have GPU image inside and that what you need.
>> 
>>   You may unregister factory right after you have used it.
>> 
>>  
>> 1.2 Alternative way if registering factory is not possible for you application.
>> 
>>   But you still want to use GPU filter in the middle of your application you may consider following:
>> 
>>       gpuInputImage = GPUInputImageType::New();
>> 
>>       gpuInputImage->GraftITKImage( itkimage );  // normal itk image here
>> 
>>       gpuInputImage->AllocateGPU(); // allocate only on GPU
>> 
>>       gpuInputImage->GetGPUDataManager()->SetCPUBufferLock( true ); // we don't want to change it CPU input
>> 
>>       gpuInputImage->GetGPUDataManager()->SetGPUDirtyFlag( true );  // set gpu dirty flag
>> 
>>       gpuInputImage->GetGPUDataManager()->UpdateGPUBuffer();  // copy cpu -> gpu
>> 
>>       
>> 2. Construct you filter, set input from the reader (or gpuInput image), call your filter.
>> 
>>   At the moment of construction GPU filter will create GPU output image for you.
>> 
>>  
>> 3. Call extra synchronization step after GPUFilter->Update(); (listed below)
>> 
>> itk::GPUExplicitSync< FilterType, OutputImageType >( GPUFilter, false );
>> 
>>  
>> 4. Write results
>> 
>>  
>> I hope that helps a bit. Making or using GPU filters with ITK is a bit of the challenge right now.
>> 
>> Specially to get it working across all GPU cards available.
>> 
>>  
>> You could check the correct execution for itkGPUShrinkImageFilter as example in The Insight Journal paper:
>> 
>> http://www.insight-journal.org/browse/publication/884
>> 
>>  
>> Regards,
>> 
>> -Denis Shamonin
>> 
>> Division of Image Processing (LKEB)
>> 
>> Department of Radiology
>> 
>> Leiden University Medical Center
>> 
>> PO Box 9600, 2300 RC Leiden, The Netherlands
>> 
>> 
>> 
>> //------------------------------------------------------------------------------
>> // GPU explicit synchronization helper function
>> template< class ImageToImageFilterType, class OutputImageType >
>> void
>> GPUExplicitSync( typename ImageToImageFilterType::Pointer & filter,
>>   const bool filterUpdate = true,
>>   const bool releaseGPUMemory = false )
>> {
>>   if( filter.IsNotNull() )
>>   {
>>     if( filterUpdate )
>>     {
>>       filter->Update();
>>     }
>> 
>>     typedef typename OutputImageType::PixelType                               OutputImagePixelType;
>>     typedef GPUImage< OutputImagePixelType, OutputImageType::ImageDimension > GPUOutputImageType;
>>     GPUOutputImageType * GPUOutput = dynamic_cast< GPUOutputImageType * >( filter->GetOutput() );
>>     if( GPUOutput )
>>     {
>>       GPUOutput->UpdateBuffers();
>>     }
>> 
>>     if( releaseGPUMemory )
>>     {
>>       GPUOutput->GetGPUDataManager()->Initialize();
>>     }
>>   }
>>   else
>>   {
>>     itkGenericExceptionMacro( << "The filter pointer is null." );
>>   }
>> }
>> 
>> 
>> 
>> On Wed, Apr 16, 2014 at 3:24 PM, Jose Ignacio Prieto <joseignacio.prieto at gmail.com> wrote:
>> Hi Jim,
>> 
>> It had a different problem when using float. It would show a NAN on the results. That's why I changed to short. 
>> The card has 4GB ram. 
>> 
>> 
>> On Tue, Apr 15, 2014 at 7:40 PM, Jim Miller <millerjv at gmail.com> wrote:
>> Does the test for GPUDiscreteGaussian run on your platform?
>> 
>> The test uses a pixel type of float. Your code does not. You might try float. 
>> 
>> The Gaussian filter will require much more GPU memory than the mean filter. How much memory does your GPU have?
>> 
>> Jim
>> 
>> On Apr 15, 2014, at 11:18 AM, Jose Ignacio Prieto <joseignacio.prieto at gmail.com> wrote:
>> 
>>> Hi all, I am having trouble using GPUdiscretegaussian. It works for me on CPU but GPU version gives output 0. I tried running the test code but no help. I do run GPUMean filter. My card is AMDw7000 and using opencl 1.2, itk 4.6
>>> 
>>> Here is the code and the output. The images are vtk files of 320x320x231, ushort.
>>> 
>>> /*=========================================================================
>>> *
>>> *  Copyright Insight Software Consortium
>>> *
>>> *  Licensed under the Apache License, Version 2.0 (the "License");
>>> *  you may not use this file except in compliance with the License.
>>> *  You may obtain a copy of the License at
>>> *
>>> *         http://www.apache.org/licenses/LICENSE-2.0.txt
>>> *
>>> *  Unless required by applicable law or agreed to in writing, software
>>> *  distributed under the License is distributed on an "AS IS" BASIS,
>>> *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>> *  See the License for the specific language governing permissions and
>>> *  limitations under the License.
>>> *
>>> *=========================================================================*/
>>> 
>>> #include "itkImageFileReader.h"
>>> #include "itkImageFileWriter.h"
>>> 
>>> #include "itkGPUImage.h"
>>> #include "itkGPUKernelManager.h"
>>> #include "itkGPUContextManager.h"
>>> #include "itkGPUImageToImageFilter.h"
>>> #include "itkGPUNeighborhoodOperatorImageFilter.h"
>>> 
>>> #include "itkTimeProbe.h"
>>> #include "itkGaussianOperator.h"
>>> 
>>> #include "itkDiscreteGaussianImageFilter.h"
>>> #include "itkGPUDiscreteGaussianImageFilter.h"
>>> #include "itkMeanImageFilter.h"
>>> #include "itkGPUMeanImageFilter.h"
>>> 
>>> //  typedef float InputPixelType;
>>> //  typedef float OutputPixelType;
>>> typedef  short InputPixelType;
>>> typedef  short OutputPixelType;
>>> 
>>> typedef itk::GPUImage< InputPixelType,  3 >   InputImageType;
>>> typedef itk::GPUImage< OutputPixelType, 3 >   OutputImageType;
>>> 
>>> 
>>> 
>>> typedef itk::ImageFileReader< InputImageType  >  ReaderType;
>>> typedef itk::ImageFileWriter< OutputImageType >  WriterType;
>>> 
>>> 
>>> 
>>> int main(int argc, char *argv[])
>>> {
>>>     if(!itk::IsGPUAvailable())
>>>     {
>>>         std::cerr << "OpenCL-enabled GPU is not present." << std::endl;
>>>         return EXIT_FAILURE;
>>>     }
>>> 
>>>     if( argc <  3 )
>>>     {
>>>         std::cerr << "Error: missing arguments" << std::endl;
>>>         std::cerr << "inputfile outputfile [num_dimensions]" << std::endl;
>>>         return EXIT_FAILURE;
>>>     }
>>> 
>>>     std::string inFile( argv[1] );
>>>     std::string outFile( argv[2] );
>>> 
>>>     unsigned int dim = 3;
>>>     ReaderType::Pointer reader;
>>>     WriterType::Pointer writer;
>>>     reader = ReaderType::New();
>>>     writer = WriterType::New();
>>> 
>>>     reader->SetFileName( inFile );
>>>     writer->SetFileName( outFile );
>>> 
>>>     float variance = 4.0;
>>> 
>>>     // test 1~8 threads for CPU
>>>     int nThreads = 8;
>>> 
>>>     typedef itk::DiscreteGaussianImageFilter< InputImageType, OutputImageType> CPUFilterType;
>>>     CPUFilterType::Pointer CPUFilter = CPUFilterType::New();
>>>     itk::TimeProbe cputimer;
>>>     cputimer.Start();
>>>     CPUFilter->SetNumberOfThreads( nThreads );
>>>     CPUFilter->SetInput( reader->GetOutput() );
>>>     CPUFilter->SetMaximumKernelWidth(10);
>>>     CPUFilter->SetUseImageSpacingOff();
>>>     CPUFilter->SetVariance( variance );
>>>     CPUFilter->Update();
>>>     cputimer.Stop();
>>> 
>>> //    typedef itk::MeanImageFilter< InputImageType, OutputImageType> CPUFilterType;
>>> //    CPUFilterType::Pointer CPUFilter = CPUFilterType::New();
>>> //    itk::TimeProbe cputimer;
>>> //    cputimer.Start();
>>> //    CPUFilter->SetNumberOfThreads( nThreads );
>>> //    CPUFilter->SetInput( reader->GetOutput() );
>>> ////    CPUFilter->SetMaximumKernelWidth(10);
>>> ////    CPUFilter->SetUseImageSpacingOff();
>>> //    CPUFilter->SetRadius( variance );
>>> //    CPUFilter->Update();
>>> //    cputimer.Stop();
>>> 
>>>     std::cout << "CPU Gaussian Filter took " << cputimer.GetMean() << " seconds with "
>>>               << CPUFilter->GetNumberOfThreads() << " threads.\n" << std::endl;
>>> 
>>>     // -------
>>> 
>>>     typedef itk::GPUDiscreteGaussianImageFilter< InputImageType, OutputImageType> GPUFilterType;
>>>     GPUFilterType::Pointer GPUFilter = GPUFilterType::New();
>>>     itk::TimeProbe gputimer;
>>>     gputimer.Start();
>>>     GPUFilter->SetInput( reader->GetOutput() );
>>>     GPUFilter->SetVariance( variance );
>>>     GPUFilter->SetMaximumKernelWidth(10);
>>>     GPUFilter->SetUseImageSpacingOff();
>>> //    GPUFilter->DebugOn();
>>> //    GPUFilter->GPUEnabledOff();
>>>     GPUFilter->Print(std::cout);
>>>     GPUFilter->Update();
>>>     GPUFilter->GetOutput()->UpdateBuffers(); // synchronization point (GPU->CPU memcpy)
>>>     gputimer.Stop();
>>>     std::cout << "GPU Gaussian Filter took " << gputimer.GetMean() << " seconds.\n" << std::endl;
>>> 
>>> //    typedef itk::GPUMeanImageFilter< InputImageType, OutputImageType> GPUFilterType;
>>> //    GPUFilterType::Pointer GPUFilter = GPUFilterType::New();
>>> //    itk::TimeProbe gputimer;
>>> //    gputimer.Start();
>>> //    GPUFilter->SetInput( reader->GetOutput() );
>>> ////    GPUFilter->SetVariance( variance );
>>> ////    GPUFilter->SetMaximumKernelWidth(10);
>>> ////    GPUFilter->SetUseImageSpacingOff();
>>> ////    GPUFilter->DebugOn();
>>> ////    GPUFilter->Print(std::cout);
>>> //    GPUFilter->SetRadius( variance );
>>> //    GPUFilter->Update();
>>> //    GPUFilter->GetOutput()->UpdateBuffers(); // synchronization point (GPU->CPU memcpy)
>>> //    gputimer.Stop();
>>> //    std::cout << "GPU Gaussian Filter took " << gputimer.GetMean() << " seconds.\n" << std::endl;
>>> 
>>>     // ---------------
>>>     // RMS Error check
>>>     // ---------------
>>> 
>>>     double diff = 0;
>>>     unsigned int nPix = 0;
>>>     itk::ImageRegionIterator<OutputImageType> cit(CPUFilter->GetOutput(), CPUFilter->GetOutput()->GetLargestPossibleRegion());
>>>     itk::ImageRegionIterator<OutputImageType> git(GPUFilter->GetOutput(), GPUFilter->GetOutput()->GetLargestPossibleRegion());
>>> 
>>>     for(cit.GoToBegin(), git.GoToBegin(); !cit.IsAtEnd(); ++cit, ++git)
>>>     {
>>>         double err = (double)(cit.Get()) - (double)(git.Get());
>>>         //         if(err > 0.1 || (double)cit.Get() < 0.1) std::cout << "CPU : " << (double)(cit.Get()) << ", GPU : " << (double)(git.Get()) << std::endl;
>>>         diff += err*err;
>>>         nPix++;
>>>     }
>>> 
>>>     writer->SetInput( GPUFilter->GetOutput() );
>>> //    writer->SetInput( CPUFilter->GetOutput() );
>>>     writer->Update();
>>> 
>>>     if (nPix > 0)
>>>     {
>>>         double RMSError = sqrt( diff / (double)nPix );
>>>         std::cout << "RMS Error : " << RMSError << std::endl;
>>>         // the CPU filter operator has type double
>>>         // but the double precision is not well-supported on most GPUs
>>>         // and by most drivers at this time.  Therefore, the GPU filter
>>>         // operator has type float
>>>         // relax the RMS threshold here to allow for errors due to
>>>         // differences in precision
>>>         // NOTE:
>>>         //   a threshold of 1.2e-5 worked on linux and Mac, but not Windows
>>>         //   why?
>>>         double RMSThreshold = 1.7e-5;
>>>         if (vnl_math_isnan(RMSError))
>>>         {
>>>             std::cout << "RMS Error is NaN! nPix: " << nPix << std::endl;
>>>             return EXIT_FAILURE;
>>>         }
>>>         if (RMSError > RMSThreshold)
>>>         {
>>>             std::cout << "RMS Error exceeds threshold (" << RMSThreshold << ")" << std::endl;
>>>             return EXIT_FAILURE;
>>>         }
>>>     }
>>>     else
>>>     {
>>>         std::cout << "No pixels in output!" << std::endl;
>>>         return EXIT_FAILURE;
>>>     }
>>> 
>>> }
>>> 
>>> 
>>> OUTPUT
>>> 
>>> 
>>> Starting C:\DocsMaracuya\Build\Ejemplos\Gpu\GPUTest.exe...
>>> Platform : AMD Accelerated Parallel Processing
>>> Platform : AMD Accelerated Parallel Processing
>>> Pitcairn
>>> Maximum Work Item Sizes : { 256, 256, 256 }
>>> Maximum Work Group Size : 256
>>> Alignment in bits of the base address : 2048
>>> Smallest alignment in bytes for any data type : 128
>>> cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_amd_bus_addressable_memory cl_amd_c1x_atomics
>>> CPU Gaussian Filter took 1.70355 seconds with 8 threads.
>>> 
>>> Defines: #define DIM_3
>>> #define INTYPE short
>>> #define OUTTYPE short
>>> #define OPTYPE short
>>> 
>>> Defines: #define DIM_3
>>> #define INTYPE short
>>> #define OUTTYPE short
>>> #define OPTYPE short
>>> 
>>> Defines: #define DIM_3
>>> #define INTYPE short
>>> #define OUTTYPE short
>>> #define OPTYPE short
>>> 
>>> GPUDiscreteGaussianImageFilter (0000000002205DF0)
>>> RTTI typeinfo: class itk::GPUDiscreteGaussianImageFilter<class itk::GPUImage<short,3>,class itk::GPUImage<short,3> >
>>> Reference Count: 1
>>> Modified Time: 560
>>> Debug: Off
>>> Object Name:
>>> Observers:
>>> none
>>> Inputs:
>>> Primary: (000000000216E560) *
>>> Indexed Inputs:
>>> 0: Primary (000000000216E560)
>>> Required Input Names: Primary
>>> NumberOfRequiredInputs: 1
>>> Outputs:
>>> Primary: (000000000218A070)
>>> Indexed Outputs:
>>> 0: Primary (000000000218A070)
>>> NumberOfRequiredOutputs: 1
>>> Number Of Threads: 8
>>> ReleaseDataFlag: Off
>>> ReleaseDataBeforeUpdateFlag: Off
>>> AbortGenerateData: Off
>>> Progress: 0
>>> Multithreader:
>>> RTTI typeinfo: class itk::MultiThreader
>>> Reference Count: 1
>>> Modified Time: 499
>>> Debug: Off
>>> Object Name:
>>> Observers:
>>> none
>>> Thread Count: 8
>>> Global Maximum Number Of Threads: 128
>>> Global Default Number Of Threads: 8
>>> CoordinateTolerance: 1e-006
>>> DirectionTolerance: 1e-006
>>> Variance: [4, 4, 4]
>>> MaximumError: [0.01, 0.01, 0.01]
>>> MaximumKernelWidth: 10
>>> FilterDimensionality: 3
>>> UseImageSpacing: 0
>>> InternalNumberOfStreamDivisions: 9
>>> GPU: Enabled
>>> GPU Gaussian Filter took 0.111351 seconds.
>>> 
>>> RMS Error : 26.4279
>>> RMS Error exceeds threshold (1.7e-005)
>>> C:\DocsMaracuya\Build\Ejemplos\Gpu\GPUTest.exe exited with code 1
>>> 
>>> 
>>> -- 
>>> José Ignacio Prieto
>>> celular(nuevo): 94348182
>>> _____________________________________
>>> Powered by www.kitware.com
>>> 
>>> Visit other Kitware open-source projects at
>>> http://www.kitware.com/opensource/opensource.html
>>> 
>>> Kitware offers ITK Training Courses, for more information visit:
>>> http://www.kitware.com/products/protraining.php
>>> 
>>> Please keep messages on-topic and check the ITK FAQ at:
>>> http://www.itk.org/Wiki/ITK_FAQ
>>> 
>>> Follow this link to subscribe/unsubscribe:
>>> http://www.itk.org/mailman/listinfo/insight-users
>> 
>> 
>> 
>> -- 
>> José Ignacio Prieto
>> celular(nuevo): 94348182
>> 
>> _____________________________________
>> Powered by www.kitware.com
>> 
>> Visit other Kitware open-source projects at
>> http://www.kitware.com/opensource/opensource.html
>> 
>> Kitware offers ITK Training Courses, for more information visit:
>> http://www.kitware.com/products/protraining.php
>> 
>> Please keep messages on-topic and check the ITK FAQ at:
>> http://www.itk.org/Wiki/ITK_FAQ
>> 
>> Follow this link to subscribe/unsubscribe:
>> http://www.itk.org/mailman/listinfo/insight-users
>> 
>> 
> 
> 
> 
> -- 
> Won-Ki Jeong, PhD
> Assistant Professor
> Electrical and Computer Engineering
> Ulsan National Institute of Science and Technology (UNIST)
> 100 Banyeon-ri Eonyang-eup, Ulju-gun
> Ulsan, Korea 689-798
> Tel: +82-52-217-2131
> http://hvcl.unist.ac.kr
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.itk.org/pipermail/insight-users/attachments/20140423/a5a755f1/attachment-0001.html>