[ITK-users] GPUDiscreteGaussian not working

Denis Shamonin dshamoni at gmail.com
Tue Apr 22 05:28:19 EDT 2014


Hi Jose,



The synchronization from CPU to GPU image and back, may or may not be
triggered by the default in the current ITK implementation.

You have to make an extra effort to control it. Basically, you have to make
sure that GPU input image is allocated and copied from CPU to GPU,

execute the filter which only use input and output images, copy GPU output
image back to CPU.



Ideally, you should not control it and that has to be managed by the ITK
GPU pipeline itself (hided from the user), but this is not a case right now.



There are few problems. First, what you get after calling
reader->GetOutput() is normal ITK image that you passing to the GPU filter,

while expecting GPU image at this moment. What you want is that GPU image
is ready when you call reader->GetOutput() (created, allocated and copied
to GPU).

The second problem may happen right after calling the GPU filter, the
memory for output are not copied back to CPU output image.



What you should do is following:

1 Create GPU input image.



1.1 Register the GPUImageFactory before calling GPUReader =
ReaderType::New();

  itk::ObjectFactoryBase::RegisterFactory( itk::GPUImageFactory::New() );



  At this moment ALL ITK images which are created will be itk::GPUImage's
with memory allocated on GPU.

  Use it with care, you may end up with not intended copying to GPU when
you modify this images.

  The reader once created will also have GPU image inside and that what you
need.

  You may unregister factory right after you have used it.



1.2 Alternative way if registering factory is not possible for you
application.

  But you still want to use GPU filter in the middle of your application
you may consider following:

      gpuInputImage = GPUInputImageType::New();

      gpuInputImage->GraftITKImage( itkimage );  // normal itk image here

      gpuInputImage->AllocateGPU(); // allocate only on GPU

      gpuInputImage->GetGPUDataManager()->SetCPUBufferLock( true ); // we
don't want to change it CPU input

      gpuInputImage->GetGPUDataManager()->SetGPUDirtyFlag( true );  // set
gpu dirty flag

      gpuInputImage->GetGPUDataManager()->UpdateGPUBuffer();  // copy cpu
-> gpu



2. Construct you filter, set input from the reader (or gpuInput image),
call your filter.

  At the moment of construction GPU filter will create GPU output image for
you.



3. Call extra synchronization step after GPUFilter->Update(); (listed below)

itk::GPUExplicitSync< FilterType, OutputImageType >( GPUFilter, false );



4. Write results



I hope that helps a bit. Making or using GPU filters with ITK is a bit of
the challenge right now.

Specially to get it working across all GPU cards available.



You could check the correct execution for itkGPUShrinkImageFilter as
example in The Insight Journal paper:

http://www.insight-journal.org/browse/publication/884



Regards,

-Denis Shamonin

Division of Image Processing (LKEB)

Department of Radiology

Leiden University Medical Center

PO Box 9600, 2300 RC Leiden, The Netherlands


//------------------------------------------------------------------------------
// GPU explicit synchronization helper function
template< class ImageToImageFilterType, class OutputImageType >
void
GPUExplicitSync( typename ImageToImageFilterType::Pointer & filter,
  const bool filterUpdate = true,
  const bool releaseGPUMemory = false )
{
  if( filter.IsNotNull() )
  {
    if( filterUpdate )
    {
      filter->Update();
    }

    typedef typename
OutputImageType::PixelType
OutputImagePixelType;
    typedef GPUImage< OutputImagePixelType, OutputImageType::ImageDimension
> GPUOutputImageType;
    GPUOutputImageType * GPUOutput = dynamic_cast< GPUOutputImageType * >(
filter->GetOutput() );
    if( GPUOutput )
    {
      GPUOutput->UpdateBuffers();
    }

    if( releaseGPUMemory )
    {
      GPUOutput->GetGPUDataManager()->Initialize();
    }
  }
  else
  {
    itkGenericExceptionMacro( << "The filter pointer is null." );
  }
}



On Wed, Apr 16, 2014 at 3:24 PM, Jose Ignacio Prieto <
joseignacio.prieto at gmail.com> wrote:

> Hi Jim,
>
> It had a different problem when using float. It would show a NAN on the
> results. That's why I changed to short.
> The card has 4GB ram.
>
>
> On Tue, Apr 15, 2014 at 7:40 PM, Jim Miller <millerjv at gmail.com> wrote:
>
>> Does the test for GPUDiscreteGaussian run on your platform?
>>
>> The test uses a pixel type of float. Your code does not. You might try
>> float.
>>
>> The Gaussian filter will require much more GPU memory than the mean
>> filter. How much memory does your GPU have?
>>
>> Jim
>>
>> On Apr 15, 2014, at 11:18 AM, Jose Ignacio Prieto <
>> joseignacio.prieto at gmail.com> wrote:
>>
>> Hi all, I am having trouble using GPUdiscretegaussian. It works for me on
>> CPU but GPU version gives output 0. I tried running the test code but no
>> help. I do run GPUMean filter. My card is AMDw7000 and using opencl 1.2,
>> itk 4.6
>>
>> Here is the code and the output. The images are vtk files of 320x320x231,
>> ushort.
>>
>> /*=========================================================================
>>
>> *
>>
>> *  Copyright Insight Software Consortium
>>
>> *
>>
>> *  Licensed under the Apache License, Version 2.0 (the "License");
>>
>> *  you may not use this file except in compliance with the License.
>>
>> *  You may obtain a copy of the License at
>>
>> *
>>
>> *         http://www.apache.org/licenses/LICENSE-2.0.txt
>>
>> *
>>
>> *  Unless required by applicable law or agreed to in writing, software
>>
>> *  distributed under the License is distributed on an "AS IS" BASIS,
>>
>> *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>
>> *  See the License for the specific language governing permissions and
>>
>> *  limitations under the License.
>>
>> *
>>
>> *=========================================================================*/
>>
>>
>> #include "itkImageFileReader.h"
>>
>> #include "itkImageFileWriter.h"
>>
>>
>> #include "itkGPUImage.h"
>>
>> #include "itkGPUKernelManager.h"
>>
>> #include "itkGPUContextManager.h"
>>
>> #include "itkGPUImageToImageFilter.h"
>>
>> #include "itkGPUNeighborhoodOperatorImageFilter.h"
>>
>>
>> #include "itkTimeProbe.h"
>>
>> #include "itkGaussianOperator.h"
>>
>>
>> #include "itkDiscreteGaussianImageFilter.h"
>>
>> #include "itkGPUDiscreteGaussianImageFilter.h"
>>
>> #include "itkMeanImageFilter.h"
>>
>> #include "itkGPUMeanImageFilter.h"
>>
>>
>> //  typedef float InputPixelType;
>>
>> //  typedef float OutputPixelType;
>>
>> typedef  short InputPixelType;
>>
>> typedef  short OutputPixelType;
>>
>>
>> typedef itk::GPUImage< InputPixelType,  3 >   InputImageType;
>>
>> typedef itk::GPUImage< OutputPixelType, 3 >   OutputImageType;
>>
>>
>>
>>
>> typedef itk::ImageFileReader< InputImageType  >  ReaderType;
>>
>> typedef itk::ImageFileWriter< OutputImageType >  WriterType;
>>
>>
>>
>>
>> int main(int argc, char *argv[])
>>
>> {
>>
>>     if(!itk::IsGPUAvailable())
>>
>>     {
>>
>>         std::cerr << "OpenCL-enabled GPU is not present." << std::endl;
>>
>>         return EXIT_FAILURE;
>>
>>     }
>>
>>
>>     if( argc <  3 )
>>
>>     {
>>
>>         std::cerr << "Error: missing arguments" << std::endl;
>>
>>         std::cerr << "inputfile outputfile [num_dimensions]" << std::endl;
>>
>>         return EXIT_FAILURE;
>>
>>     }
>>
>>
>>     std::string inFile( argv[1] );
>>
>>     std::string outFile( argv[2] );
>>
>>
>>     unsigned int dim = 3;
>>
>>     ReaderType::Pointer reader;
>>
>>     WriterType::Pointer writer;
>>
>>     reader = ReaderType::New();
>>
>>     writer = WriterType::New();
>>
>>
>>     reader->SetFileName( inFile );
>>
>>     writer->SetFileName( outFile );
>>
>>
>>     float variance = 4.0;
>>
>>
>>     // test 1~8 threads for CPU
>>
>>     int nThreads = 8;
>>
>>
>>     typedef itk::DiscreteGaussianImageFilter< InputImageType, OutputImageType> CPUFilterType;
>>
>>     CPUFilterType::Pointer CPUFilter = CPUFilterType::New();
>>
>>     itk::TimeProbe cputimer;
>>
>>     cputimer.Start();
>>
>>     CPUFilter->SetNumberOfThreads( nThreads );
>>
>>     CPUFilter->SetInput( reader->GetOutput() );
>>
>>     CPUFilter->SetMaximumKernelWidth(10);
>>
>>     CPUFilter->SetUseImageSpacingOff();
>>
>>     CPUFilter->SetVariance( variance );
>>
>>     CPUFilter->Update();
>>
>>     cputimer.Stop();
>>
>>
>> //    typedef itk::MeanImageFilter< InputImageType, OutputImageType> CPUFilterType;
>>
>> //    CPUFilterType::Pointer CPUFilter = CPUFilterType::New();
>>
>> //    itk::TimeProbe cputimer;
>>
>> //    cputimer.Start();
>>
>> //    CPUFilter->SetNumberOfThreads( nThreads );
>>
>> //    CPUFilter->SetInput( reader->GetOutput() );
>>
>> ////    CPUFilter->SetMaximumKernelWidth(10);
>>
>> ////    CPUFilter->SetUseImageSpacingOff();
>>
>> //    CPUFilter->SetRadius( variance );
>>
>> //    CPUFilter->Update();
>>
>> //    cputimer.Stop();
>>
>>
>>     std::cout << "CPU Gaussian Filter took " << cputimer.GetMean() << " seconds with "
>>
>>               << CPUFilter->GetNumberOfThreads() << " threads.\n" << std::endl;
>>
>>
>>     // -------
>>
>>
>>     typedef itk::GPUDiscreteGaussianImageFilter< InputImageType, OutputImageType> GPUFilterType;
>>
>>     GPUFilterType::Pointer GPUFilter = GPUFilterType::New();
>>
>>     itk::TimeProbe gputimer;
>>
>>     gputimer.Start();
>>
>>     GPUFilter->SetInput( reader->GetOutput() );
>>
>>     GPUFilter->SetVariance( variance );
>>
>>     GPUFilter->SetMaximumKernelWidth(10);
>>
>>     GPUFilter->SetUseImageSpacingOff();
>>
>> //    GPUFilter->DebugOn();
>>
>> //    GPUFilter->GPUEnabledOff();
>>
>>     GPUFilter->Print(std::cout);
>>
>>     GPUFilter->Update();
>>
>>     GPUFilter->GetOutput()->UpdateBuffers(); // synchronization point (GPU->CPU memcpy)
>>
>>     gputimer.Stop();
>>
>>     std::cout << "GPU Gaussian Filter took " << gputimer.GetMean() << " seconds.\n" << std::endl;
>>
>>
>> //    typedef itk::GPUMeanImageFilter< InputImageType, OutputImageType> GPUFilterType;
>>
>> //    GPUFilterType::Pointer GPUFilter = GPUFilterType::New();
>>
>> //    itk::TimeProbe gputimer;
>>
>> //    gputimer.Start();
>>
>> //    GPUFilter->SetInput( reader->GetOutput() );
>>
>> ////    GPUFilter->SetVariance( variance );
>>
>> ////    GPUFilter->SetMaximumKernelWidth(10);
>>
>> ////    GPUFilter->SetUseImageSpacingOff();
>>
>> ////    GPUFilter->DebugOn();
>>
>> ////    GPUFilter->Print(std::cout);
>>
>> //    GPUFilter->SetRadius( variance );
>>
>> //    GPUFilter->Update();
>>
>> //    GPUFilter->GetOutput()->UpdateBuffers(); // synchronization point (GPU->CPU memcpy)
>>
>> //    gputimer.Stop();
>>
>> //    std::cout << "GPU Gaussian Filter took " << gputimer.GetMean() << " seconds.\n" << std::endl;
>>
>>
>>     // ---------------
>>
>>     // RMS Error check
>>
>>     // ---------------
>>
>>
>>     double diff = 0;
>>
>>     unsigned int nPix = 0;
>>
>>     itk::ImageRegionIterator<OutputImageType> cit(CPUFilter->GetOutput(), CPUFilter->GetOutput()->GetLargestPossibleRegion());
>>
>>     itk::ImageRegionIterator<OutputImageType> git(GPUFilter->GetOutput(), GPUFilter->GetOutput()->GetLargestPossibleRegion());
>>
>>
>>     for(cit.GoToBegin(), git.GoToBegin(); !cit.IsAtEnd(); ++cit, ++git)
>>
>>     {
>>
>>         double err = (double)(cit.Get()) - (double)(git.Get());
>>
>>         //         if(err > 0.1 || (double)cit.Get() < 0.1) std::cout << "CPU : " << (double)(cit.Get()) << ", GPU : " << (double)(git.Get()) << std::endl;
>>
>>         diff += err*err;
>>
>>         nPix++;
>>
>>     }
>>
>>
>>     writer->SetInput( GPUFilter->GetOutput() );
>>
>> //    writer->SetInput( CPUFilter->GetOutput() );
>>
>>     writer->Update();
>>
>>
>>     if (nPix > 0)
>>
>>     {
>>
>>         double RMSError = sqrt( diff / (double)nPix );
>>
>>         std::cout << "RMS Error : " << RMSError << std::endl;
>>
>>         // the CPU filter operator has type double
>>
>>         // but the double precision is not well-supported on most GPUs
>>
>>         // and by most drivers at this time.  Therefore, the GPU filter
>>
>>         // operator has type float
>>
>>         // relax the RMS threshold here to allow for errors due to
>>
>>         // differences in precision
>>
>>         // NOTE:
>>
>>         //   a threshold of 1.2e-5 worked on linux and Mac, but not Windows
>>
>>         //   why?
>>
>>         double RMSThreshold = 1.7e-5;
>>
>>         if (vnl_math_isnan(RMSError))
>>
>>         {
>>
>>             std::cout << "RMS Error is NaN! nPix: " << nPix << std::endl;
>>
>>             return EXIT_FAILURE;
>>
>>         }
>>
>>         if (RMSError > RMSThreshold)
>>
>>         {
>>
>>             std::cout << "RMS Error exceeds threshold (" << RMSThreshold << ")" << std::endl;
>>
>>             return EXIT_FAILURE;
>>
>>         }
>>
>>     }
>>
>>     else
>>
>>     {
>>
>>         std::cout << "No pixels in output!" << std::endl;
>>
>>         return EXIT_FAILURE;
>>
>>     }
>>
>>
>> }
>>
>>
>>
>> OUTPUT
>>
>>
>> Starting C:\DocsMaracuya\Build\Ejemplos\Gpu\GPUTest.exe...
>>
>> Platform : AMD Accelerated Parallel Processing
>>
>> Platform : AMD Accelerated Parallel Processing
>>
>> Pitcairn
>>
>> Maximum Work Item Sizes : { 256, 256, 256 }
>>
>> Maximum Work Group Size : 256
>>
>> Alignment in bits of the base address : 2048
>>
>> Smallest alignment in bytes for any data type : 128
>>
>> cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics
>> cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics
>> cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics
>> cl_khr_int64_extended_atomics cl_khr_3d_image_writes
>> cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32
>> cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops
>> cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing
>> cl_amd_bus_addressable_memory cl_amd_c1x_atomics
>>
>> CPU Gaussian Filter took 1.70355 seconds with 8 threads.
>>
>>
>> Defines: #define DIM_3
>>
>> #define INTYPE short
>>
>> #define OUTTYPE short
>>
>> #define OPTYPE short
>>
>>
>> Defines: #define DIM_3
>>
>> #define INTYPE short
>>
>> #define OUTTYPE short
>>
>> #define OPTYPE short
>>
>>
>> Defines: #define DIM_3
>>
>> #define INTYPE short
>>
>> #define OUTTYPE short
>>
>> #define OPTYPE short
>>
>>
>> GPUDiscreteGaussianImageFilter (0000000002205DF0)
>>
>> RTTI typeinfo: class itk::GPUDiscreteGaussianImageFilter<class
>> itk::GPUImage<short,3>,class itk::GPUImage<short,3> >
>>
>> Reference Count: 1
>>
>> Modified Time: 560
>>
>> Debug: Off
>>
>> Object Name:
>>
>> Observers:
>>
>> none
>>
>> Inputs:
>>
>> Primary: (000000000216E560) *
>>
>> Indexed Inputs:
>>
>> 0: Primary (000000000216E560)
>>
>> Required Input Names: Primary
>>
>> NumberOfRequiredInputs: 1
>>
>> Outputs:
>>
>> Primary: (000000000218A070)
>>
>> Indexed Outputs:
>>
>> 0: Primary (000000000218A070)
>>
>> NumberOfRequiredOutputs: 1
>>
>> Number Of Threads: 8
>>
>> ReleaseDataFlag: Off
>>
>> ReleaseDataBeforeUpdateFlag: Off
>>
>> AbortGenerateData: Off
>>
>> Progress: 0
>>
>> Multithreader:
>>
>> RTTI typeinfo: class itk::MultiThreader
>>
>> Reference Count: 1
>>
>> Modified Time: 499
>>
>> Debug: Off
>>
>> Object Name:
>>
>> Observers:
>>
>> none
>>
>> Thread Count: 8
>>
>> Global Maximum Number Of Threads: 128
>>
>> Global Default Number Of Threads: 8
>>
>> CoordinateTolerance: 1e-006
>>
>> DirectionTolerance: 1e-006
>>
>> Variance: [4, 4, 4]
>>
>> MaximumError: [0.01, 0.01, 0.01]
>>
>> MaximumKernelWidth: 10
>>
>> FilterDimensionality: 3
>>
>> UseImageSpacing: 0
>>
>> InternalNumberOfStreamDivisions: 9
>>
>> GPU: Enabled
>>
>> GPU Gaussian Filter took 0.111351 seconds.
>>
>>
>> RMS Error : 26.4279
>>
>> RMS Error exceeds threshold (1.7e-005)
>>
>> C:\DocsMaracuya\Build\Ejemplos\Gpu\GPUTest.exe exited with code 1
>>
>>
>> --
>> José Ignacio Prieto
>> celular(nuevo): 94348182
>>
>> _____________________________________
>> Powered by www.kitware.com
>>
>> Visit other Kitware open-source projects at
>> http://www.kitware.com/opensource/opensource.html
>>
>> Kitware offers ITK Training Courses, for more information visit:
>> http://www.kitware.com/products/protraining.php
>>
>> Please keep messages on-topic and check the ITK FAQ at:
>> http://www.itk.org/Wiki/ITK_FAQ
>>
>> Follow this link to subscribe/unsubscribe:
>> http://www.itk.org/mailman/listinfo/insight-users
>>
>>
>
>
> --
> José Ignacio Prieto
> celular(nuevo): 94348182
>
> _____________________________________
> Powered by www.kitware.com
>
> Visit other Kitware open-source projects at
> http://www.kitware.com/opensource/opensource.html
>
> Kitware offers ITK Training Courses, for more information visit:
> http://www.kitware.com/products/protraining.php
>
> Please keep messages on-topic and check the ITK FAQ at:
> http://www.itk.org/Wiki/ITK_FAQ
>
> Follow this link to subscribe/unsubscribe:
> http://www.itk.org/mailman/listinfo/insight-users
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.itk.org/pipermail/insight-users/attachments/20140422/d13b6802/attachment-0001.html>


More information about the Insight-users mailing list