ITK/Release 4/GPU Acceleration


Latest revision as of 19:25, 18 October 2012

This page outlines the proposed GPU acceleration framework in ITK v4. The GPU has become a cost-effective parallel computing platform for computationally expensive problems. Although many ITK image filters can benefit from the GPU, there has been no GPU support in ITK as of today. We propose to add a new data structure, framework, and some basic image operations that support the GPU in order to allow ITK developers to easily implement their filters running on both the CPU and GPU.

Goals

  • Add support for GPU processing in ITK
    • GPU image class
    • Extension of ITK multithreading model to support the GPU
    • Pipeline supporting both CPU and GPU filters and images
    • Basic GPU image operators

Authors

GPU acceleration for ITK v4 was proposed by Harvard University and the University of Utah (PI: Jim Miller from GE). The University of Pennsylvania is also participating, working on GPU image registration.

  • Won-Ki Jeong (wkjeong -at- seas.harvard.edu)
  • Baohua Wu (baohua -at- seas.upenn.edu)

Summary

  • GPU image class that manages GPU and CPU data transparently to users
  • GPU image filter base class that supports pipelining
  • GPU data, context, and kernel managers that help users run OpenCL code easily with ITK
  • Object factory automatically creates images and filters for target architecture
  • GPU inPlace image filter
  • GPU Finite Difference Image Filter and Function classes
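
The object-factory item above can be pictured with a small stand-alone sketch. It is not ITK's factory code: `FactorySketch`, `MeanFilterBase`, and `GPUMeanFilterSketch` are hypothetical names, and the only idea illustrated is that a registered override makes the same creation call return a GPU implementation.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical CPU filter and its GPU replacement (not ITK classes).
struct MeanFilterBase
{
  virtual ~MeanFilterBase() = default;
  virtual std::string Device() const { return "CPU"; }
};

struct GPUMeanFilterSketch : MeanFilterBase
{
  std::string Device() const override { return "GPU"; }
};

// Hypothetical factory: an override registered under a class name replaces
// the default CPU object returned by Create().
class FactorySketch
{
public:
  using Creator = std::function<std::unique_ptr<MeanFilterBase>()>;

  static void RegisterOverride(const std::string & name, Creator creator)
  {
    Registry()[name] = std::move(creator);
  }

  static std::unique_ptr<MeanFilterBase> Create(const std::string & name)
  {
    auto it = Registry().find(name);
    if (it != Registry().end())
    {
      return it->second(); // an override exists: build the GPU version
    }
    return std::unique_ptr<MeanFilterBase>(new MeanFilterBase); // default CPU version
  }

private:
  static std::map<std::string, Creator> & Registry()
  {
    static std::map<std::string, Creator> registry;
    return registry;
  }
};
```

With this arrangement, calling code keeps asking for the same class name and does not need to know whether an override was registered.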

Example Filters

  • GPUMeanImageFilter
  • GPUBinaryThresholdImageFilter
  • GPUGradientAnisotropicDiffusionImageFilter
  • GPUDemonsRegistrationFilter

New Classes

GPUImage

GPU image class.

GPUDataManager

Manage GPU data container and synchronize between the CPU and GPU. Used by GPU image class.

GPUImageDataManager

GPU data manager for GPUImage data, derived from GPUDataManager base class.

GPUContextManager

Manage GPU contexts and Command Queues.

GPUKernelManager

Manage GPU programs and kernels, and execute kernels.

GPUImageToImageFilter

Base class for GPU-based ImageToImage filters. To write your own filter, derive a child class from this base class and implement GPUGenerateData() accordingly.

GPUMeanImageFilter

Mean image filter implementation.

GPUInPlaceImageFilter

Base class for GPU-based in-place image filters, in which the input and output images share the same buffer. Graft() is implemented for GPU data.

GPUBinaryThresholdImageFilter

Binary threshold image filter, provided as an example of an in-place image filter.

GPUFiniteDifferenceImageFilter

Base class for GPU-based finite difference image filters.

Usage Example

ITK GPU classes hide the low-level details of managing GPU resources and greatly reduce the programmer's effort. You create and modify GPU images as you would a regular ITK image (e.g., using pixel iterators, FillBuffer(), or SetPixel()). Creating a GPU kernel takes as few as three lines of code (creating the kernel manager, loading the source program, and creating the kernel). After running a kernel on GPU images, you can access pixel values through the normal ITK pixel access APIs (e.g., GetPixel()). Synchronization between the CPU and GPU is performed automatically and lazily, transparent to the user. Example:

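#include "itkGPUImage.h"
#include "itkGPUKernelManager.h"

typedef itk::GPUImage<float, 2> GPUImage1f;

//
// Create GPU images as normal itk images
//
GPUImage1f::Pointer srcA, srcB, dest;
srcA = GPUImage1f::New();
srcB = GPUImage1f::New();
dest = GPUImage1f::New();

//
// Allocate each image before filling it (256 x 256 matches nElem = 65536 below)
//
GPUImage1f::IndexType start;
start.Fill(0);
GPUImage1f::SizeType size;
size.Fill(256);
GPUImage1f::RegionType region(start, size);
srcA->SetRegions(region); srcA->Allocate();
srcB->SetRegions(region); srcB->Allocate();
dest->SetRegions(region); dest->Allocate();

//
// Initialize GPU images as you normally do for regular itk images
//
srcA->FillBuffer(1.0f);
srcB->FillBuffer(2.0f);
dest->FillBuffer(3.0f);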

//
// Create GPU program object
//
itk::GPUKernelManager::Pointer kernelManager = itk::GPUKernelManager::New();

//
// Load OpenCL source code and compile
//
kernelManager->LoadProgramFromFile("ImageOps.cl");

//
// Create kernel
//
int kernel_add = kernelManager->CreateKernel("ImageAdd");

//
// Set parameters (nElem = 256 * 256 pixels)
//
unsigned int nElem = 65536;
kernelManager->SetKernelArgWithImage(kernel_add, 0, srcA->GetGPUDataManager());
kernelManager->SetKernelArgWithImage(kernel_add, 1, srcB->GetGPUDataManager());
kernelManager->SetKernelArgWithImage(kernel_add, 2, dest->GetGPUDataManager());
kernelManager->SetKernelArg(kernel_add, 3, sizeof(unsigned int), &nElem);

//
// Launch kernel over the 256 x 256 image with 16 x 16 work-groups
//
kernelManager->LaunchKernel2D(kernel_add, 256, 256, 16, 16);
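
Why the example passes nElem to the kernel can be illustrated with a CPU-only sketch. It assumes, without confirmation from this page, that the launch pads the global work size up to a multiple of the work-group size and that the kernel skips work-items whose flat index is out of range. `RoundUp` and `EmulateImageAdd` are hypothetical helpers, not ITK functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Round n up to the next multiple of block (block must be > 0).
inline std::size_t RoundUp(std::size_t n, std::size_t block)
{
  return ((n + block - 1) / block) * block;
}

// CPU emulation of an element-wise add launched over a padded 2D grid.
// Each (x, y) pair stands for one work-item; items whose flat index falls
// outside the image are skipped, which is the role nElem plays here.
inline std::vector<float> EmulateImageAdd(const std::vector<float> & a,
                                          const std::vector<float> & b,
                                          std::size_t width, std::size_t height,
                                          std::size_t localX, std::size_t localY)
{
  const std::size_t nElem = width * height;
  const std::size_t globalX = RoundUp(width, localX);
  const std::size_t globalY = RoundUp(height, localY);

  std::vector<float> out(nElem, 0.0f);
  for (std::size_t y = 0; y < globalY; ++y)
  {
    for (std::size_t x = 0; x < globalX; ++x)
    {
      const std::size_t gidx = y * globalX + x; // flat work-item index
      if (gidx < nElem)                         // guard against padding
      {
        out[gidx] = a[gidx] + b[gidx];
      }
    }
  }
  return out;
}
```

For a 256 x 256 image and 16 x 16 work-groups no padding is needed, but a 20 x 10 image would be padded to 32 x 16 work-items, and the bound check keeps the extra ones from touching memory past the buffer.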

Latest Code

  • git@github.com:graphor/ITK.git (branch: GPU-Alpha)

ToDo List

  • GPUThreadedGenerateData() for multi-GPU support
  • Context/device management
  • InPlace GPU filter base class
    • Grafting for GPU data object

Plans

GPU image class

We propose a new GPU image class, itk::GPUImage, which provides a GPU data container and functions for implicit and explicit data transfers between the CPU and the GPU memory spaces. itk::GPUImage will contain two snapshots of the current image—one on the CPU and one on the GPU—but provide the functionality of a single image to the user. itk::GPUImage inherits all the public functions from itk::Image, so it can be used with the existing CPU ITK image filters as before. All the pixel operators, for example GetPixel(), and the image iterators can be used to modify pixel values on the CPU side. Conversely, GPU code will modify the pixel values on the GPU side. We propose an automatic synchronization mechanism between the CPU and GPU buffers, transparent to the user. Specifically, we propose the following functionalities for the ITK GPU image class:

  • Efficient GPU memory management
  • CPU and GPU synchronization scheme
  • GPU buffer interface for direct access
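
The synchronization scheme proposed above can be sketched with two buffers and two stale flags. This is a minimal illustration under stated assumptions, not the itk::GPUImage implementation: `MirroredBuffer` is a hypothetical class, and the "GPU" side is an ordinary std::vector so the sketch runs without OpenCL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU/GPU mirrored buffer with lazy synchronization.
class MirroredBuffer
{
public:
  explicit MirroredBuffer(std::size_t n) : m_Cpu(n, 0.0f), m_Gpu(n, 0.0f) {}

  // CPU-side write: refresh the CPU copy if needed, then mark the GPU copy stale.
  void SetPixel(std::size_t i, float v)
  {
    SyncToCpu();
    m_Cpu[i] = v;
    m_GpuStale = true;
  }

  // CPU-side read: copy back from the GPU only if the GPU wrote last.
  float GetPixel(std::size_t i)
  {
    SyncToCpu();
    return m_Cpu[i];
  }

  // Simulated kernel: adds a constant to every element on the "GPU" side.
  void RunAddKernel(float c)
  {
    SyncToGpu();
    for (float & v : m_Gpu) { v += c; }
    m_CpuStale = true;
  }

  // Number of whole-buffer copies performed so far.
  int TransferCount() const { return m_Transfers; }

private:
  void SyncToCpu()
  {
    if (m_CpuStale) { m_Cpu = m_Gpu; m_CpuStale = false; ++m_Transfers; }
  }

  void SyncToGpu()
  {
    if (m_GpuStale) { m_Gpu = m_Cpu; m_GpuStale = false; ++m_Transfers; }
  }

  std::vector<float> m_Cpu;
  std::vector<float> m_Gpu;
  bool m_CpuStale = false; // true when the GPU copy is newer
  bool m_GpuStale = false; // true when the CPU copy is newer
  int  m_Transfers = 0;
};
```

Because copies happen only when the stale side is actually accessed, several CPU writes followed by several kernel launches cost a single upload, and the download is deferred until the first CPU read.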

GPU support for ITK multithreading model

We will extend the ITK multithreaded architecture by introducing two new virtual functions, GPUGenerateData() and GPUThreadedGenerateData(). These methods will borrow the implicit thread management design from the existing architecture but manage threads based on GPU resources rather than CPU resources. When the filter is called, a superclass of the filter will decide between single-threaded and multithreaded execution and determine where to run the code, on the CPU or on the GPU. The superclass will then spawn threads and call one of the four functions accordingly: GenerateData(), ThreadedGenerateData(), GPUGenerateData(), or GPUThreadedGenerateData().
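
The dispatch described above might look roughly like the following sketch. `FilterSketch` and its setters are hypothetical, the work units run sequentially instead of on real threads, and each method only records its name so the chosen path is visible; only the four method names come from the text.

```cpp
#include <cassert>
#include <string>

// Hypothetical filter base class (not ITK code).
class FilterSketch
{
public:
  virtual ~FilterSketch() = default;

  void SetGPUEnabled(bool on) { m_GPUEnabled = on; }
  void SetNumberOfWorkUnits(int n) { m_WorkUnits = n; }

  // The superclass chooses the device and the single- or multi-unit path.
  std::string Update()
  {
    m_Log.clear();
    const bool threaded = m_WorkUnits > 1;
    if (m_GPUEnabled && threaded)
    {
      for (int i = 0; i < m_WorkUnits; ++i) { GPUThreadedGenerateData(i); }
    }
    else if (m_GPUEnabled)
    {
      GPUGenerateData();
    }
    else if (threaded)
    {
      for (int i = 0; i < m_WorkUnits; ++i) { ThreadedGenerateData(i); }
    }
    else
    {
      GenerateData();
    }
    return m_Log;
  }

protected:
  // Subclasses would override these; the defaults just log the call.
  virtual void GenerateData() { m_Log += "cpu;"; }
  virtual void ThreadedGenerateData(int id) { m_Log += "cpu" + std::to_string(id) + ";"; }
  virtual void GPUGenerateData() { m_Log += "gpu;"; }
  virtual void GPUThreadedGenerateData(int id) { m_Log += "gpu" + std::to_string(id) + ";"; }

  std::string m_Log;

private:
  bool m_GPUEnabled = false;
  int  m_WorkUnits = 1;
};
```

A filter author then overrides only the variants the filter supports, and the caller still invokes a single Update().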

Filter API to support GPU code

We will implement a filter class that has an API to execute GPU code written in OpenCL.

Basic GPU image operators

We propose a set of basic GPU image operators and filters that can be used as building blocks for more complicated numerical algorithms, such as:

  • Addition, subtraction, division, multiplication, inner product, reduction, copy and assignment operators
  • Neighborhood operator filter (for convolution-type filter)

Wish List of Classes to Support GPU

Target architecture

We are going to use OpenCL to implement GPU code for wide applicability (Intel, AMD, and NVIDIA). We will consider supporting NVIDIA CUDA as well if required (for example, to employ existing GPU libraries, such as CUFFT or CUBLAS).

Architectures tested as of ITK 4.2 (not every configuration passes):

OS | GPU | Driver | OpenCL | Test Status
Mac 10.6.8 | NVIDIA GeForce GT 120 (9500 GT) | nvidia, included with OS | v 12.3.6, get info string 1.5.6 | PASS
Mac 10.7.? | NVIDIA GeForce 9400M and NVIDIA GeForce 9600M GT | included with OS | ? | ?
Mac 10.7.3 | ATI Radeon HD 5870 | nvidia, included with OS | ? | 3D Mean Filter FAILS
Mac 10.7.2 | NVIDIA GeForce GT 120 (9500 GT) | nvidia, included with OS | v 1.50.63, get info string 1.50.63 | PASS
Linux OpenSUSE 11.4 (x86_64) | NVIDIA GeForce GTX 570 | x86_64 280.13 | NVIDIA GPU Computing SDK v? | PASS
Linux OpenSUSE 11.4 (x86_64) | NVIDIA Tesla C2070 | x86_64 280.13 | NVIDIA GPU Computing SDK v? | PASS
Linux OpenSUSE 12.1 (x86_64) | NVIDIA Quadro FX 5600 | x86_64 295.33 | nvidia Revision 10327 | PASS
Linux OpenSUSE 12.1 (x86_64) | NVIDIA GeForce GT120 (9500 GT) | x86_64 295.40-15.1 | nvidia 295.40-15.1 | PASS
Linux OpenSUSE 12.1 (x86_64) | NVIDIA GeForce GT120 (9500 GT) | x86_64 295.40-15.1 | intel 1.5 x64 | PASS
Windows 7 Professional SP1 (64-bit) | NVIDIA GeForce 8800 GTX | 8.17.12.8562 (285.62) | NVIDIA GPU Computing SDK 4.0.19 Win 64 | PASS
Windows 7 Professional SP1 (64-bit) | NVIDIA GeForce 9500 GT (GT120) | 8.17.12.9610 | Intel SDK for ocl applications 2012 x64 | PASS
Windows 7 Professional SP1 (64-bit) | NVIDIA GeForce 9500 GT (GT120) | 8.17.12.9610 | NVIDIA GPU Computing SDK 4.2.9 Win 64 | PASS
Windows Vista Ultimate SP2 (64-bit) | ATI FireGL V7600 | 8.850.7.2000 | AMD APP SDK v2.5 Windows 64 (AMD APP SDK Runtime 2.4.595.10) | Code compiles, but all tests fail because the GPU device is not found. According to http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=126463, this card is not supported


Potential error messages and fixes

Error | OS | Fix
clGetPlatformIDs returns -1001 and 0 platforms | Linux | Make sure permissions for /dev/nvidia0 and /dev/nvidiactl are rw for your user
error : Instruction ‘mov’ requires SM 1.3 or higher, or map_f64_to_f32 directive … ptxas fatal : Ptx assembly aborted due to errors … ptxas application ptx input, line 109; warning : Double is not supported. Demoting to float … | Any | Not all GPUs and/or drivers support double precision calculations. Be sure to only instantiate floating point or integer filter types
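
One way to follow the advice about double precision is to reject double pixel types at compile time. The sketch below is only an illustration: `IsPortableGPUPixel` and `PortableGPUFilterSketch` are hypothetical names, not ITK classes, and the rule encoded here (integers and float allowed, everything else rejected) is an assumption drawn from the fix described above.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical trait: pixel types this sketch treats as safe on devices
// without double precision support (integer types and float).
template <typename TPixel>
struct IsPortableGPUPixel
  : std::integral_constant<bool,
                           std::is_integral<TPixel>::value ||
                             std::is_same<TPixel, float>::value>
{
};

// Hypothetical filter wrapper that refuses to compile for other pixel types.
template <typename TPixel>
class PortableGPUFilterSketch
{
  static_assert(IsPortableGPUPixel<TPixel>::value,
                "Pixel type may need double precision, which some GPUs and drivers lack");

public:
  using PixelType = TPixel;
};
```

Instantiating the wrapper with float or unsigned char compiles, while instantiating it with double stops the build with the message above instead of failing later inside the OpenCL compiler.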

Instructions for Installing OpenCL

In specific platforms

Tcons