Task Stealing framework: Difference between revisions

From KitwarePublic
Jump to navigationJump to search
(Created page with "==Purpose== Task Stealing is a programming model that can be used to efficiently develop multi threaded applications, and to use hybrid resources. This model uses a few concepts...")
 
No edit summary
Line 6: Line 6:
==Task Stealing description==
==Task Stealing description==


The task stealing framework can be applied to tasks that can be recursively split into smaller tasks.  
The task stealing framework can be applied to tasks that can be recursively split into smaller tasks.


When a resource has finished its task, it can then steal work from another resource by splitting its task in two and taking the second half for itself.
A resource executes first its local tasks (newly generated tasks are pushed in the local task queue).
When a resource becomes idle, it randomly selects a victim, and if this victim has some work left, the thief steals part of it (50% usually).
If no work can be stolen from this vitcim, the theif select an other victim.


The last part of the task stealing framework is to merge all the contributions from all the workers into a single output.
This algorithm has a provable performance.


==High level interface==
The last part of the task stealing framework is to merge all the contributions from all workers into a single output.
 
==Application to scientific visualization==
 
Many visualization filters  tend to iterate on a data structure to execute  a function on some local data. The result is a new data structure.
A way to parallelize such algorithms while minimizing the impact on the code is to rely on a two phase parallel algorithm where the iteraction becomes a parallel loop (iteractions are independant tasks)  executed in parallel (work stealing takes care of splitting
the iterations according to the available resources). Results produced are stored in an intermediate data structure (local to each resource when possible). A second step  applies on these intermediate data structures a (parallel) reduction operator to produce the final data structure (again
work stealing takes care of task scheduling during this reduction).
 
==Working plan==
 
===High level interface===


The first proposal is to expose an interface that can be used with minimal modification to the existing code. This interface can be used for algorithms that can be splitted and the generated pieces merged by a single strategy.  
The first proposal is to expose an interface that can be used with minimal modification to the existing code. This interface can be used for algorithms that can be splitted and the generated pieces merged by a single strategy.  
In order to be parallelized, the algorithm would have to provided the Splitter and the Merger, and the RequestData method would be called on subsets of the data.
 
In order to be parallelized, the algorithm would have to provide the Splitter and the Merger (for which implementations for the most usual cases will be provided), and the RequestData method would be called on subsets of the data.


Examples :  
Examples :  
* Contouring : the input is splitted by cell, and during the merge operation, me have to manage several piece of surface generated generated by each worker.
* Contouring : the input is splitted by cell, and during the merge operation, we have to manage several pieces of surface generated generated by each worker.
* streamlines : the seeds can be distributed and each streamline traced independently
* streamlines : the seeds can be distributed and each streamline traced independently


==Low level interface==
===Low level interface===


This second level of interface will be necessary for algorithms for which there is not a single split or merge operation.
This second level of interface will be necessary for algorithms that can not be decomposed into the split, parallel task, then merge steps.


In this case, the developer will have to define the tasks of his algorithm that can be parallelized within the RequestData method, and explicitly provide split and merge operation for each task.
In this case, the developer will have to define the tasks of his algorithm that can be parallelized within the RequestData method, and explicitly provide split and merge operation for each task.


==Multi-threading improvements==
===Multi-threading improvements===


In order to help developers to write multi-threaded algorithms, we have to provide some help to improve thread concurrency in VTK. In the high level interface, the RequestData method has to be reentrant.
In order to help developers to write multi-threaded algorithms, we have to provide some help to improve thread concurrency in VTK. In the high level interface, the RequestData method has to be reentrant.


Several designs used in VTK are not adapted in VTK, and replacements can be progressively provided :  
Several designs used in VTK are not adapted, and replacements can be progressively provided :  
* caches : many algorithms have internal ivars that are used as cache, to use during the RequestData method. A possibility would be to provide a vtkThreadSpecific<> template that could be used as a smart pointer, be that will keep one copy of the cached object per thread.
* caches : many algorithms have internal ivars that are used as cache during the RequestData method. A possibility would be to provide a vtkThreadSpecific<> template that could be used as a smart pointer, but that will keep one copy of the cached object per thread.
* iterators : Next() method are still present on VTK containers, we will certainly have to replace all methods by iterator-based traversal.
* iterators : InitTraversal() and Next() methods are still present on VTK containers. We will certainly have to replace those methods by iterator-based traversal.
 
===Implementations===
The plan is to propose a vtk interface for task-stealing, and two implementations to begin with :
* a sequential implementation
* a [http://kaapi.gforge.inria.fr/ kaapi]-based implementation

Revision as of 12:45, 1 October 2010

Purpose

Task Stealing is a programming model that can be used to efficiently develop multi threaded applications, and to use hybrid resources. This model uses a few concepts that the developers have to become used to, but most of the resource management is hidden to the user.

Task Stealing description

The task stealing framework can be applied to tasks that can be recursively split into smaller tasks.

A resource executes first its local tasks (newly generated tasks are pushed in the local task queue). When a resource becomes idle, it randomly selects a victim, and if this victim has some work left, the thief steals part of it (50% usually). If no work can be stolen from this vitcim, the theif select an other victim.

This algorithm has a provable performance.

The last part of the task stealing framework is to merge all the contributions from all workers into a single output.

Application to scientific visualization

Many visualization filters tend to iterate on a data structure to execute a function on some local data. The result is a new data structure. A way to parallelize such algorithms while minimizing the impact on the code is to rely on a two phase parallel algorithm where the iteraction becomes a parallel loop (iteractions are independant tasks) executed in parallel (work stealing takes care of splitting the iterations according to the available resources). Results produced are stored in an intermediate data structure (local to each resource when possible). A second step applies on these intermediate data structures a (parallel) reduction operator to produce the final data structure (again work stealing takes care of task scheduling during this reduction).

Working plan

High level interface

The first proposal is to expose an interface that can be used with minimal modification to the existing code. This interface can be used for algorithms that can be splitted and the generated pieces merged by a single strategy.

In order to be parallelized, the algorithm would have to provide the Splitter and the Merger (for which implementations for the most usual cases will be provided), and the RequestData method would be called on subsets of the data.

Examples :

  • Contouring : the input is splitted by cell, and during the merge operation, we have to manage several pieces of surface generated generated by each worker.
  • streamlines : the seeds can be distributed and each streamline traced independently

Low level interface

This second level of interface will be necessary for algorithms that can not be decomposed into the split, parallel task, then merge steps.

In this case, the developer will have to define the tasks of his algorithm that can be parallelized within the RequestData method, and explicitly provide split and merge operation for each task.

Multi-threading improvements

In order to help developers to write multi-threaded algorithms, we have to provide some help to improve thread concurrency in VTK. In the high level interface, the RequestData method has to be reentrant.

Several designs used in VTK are not adapted, and replacements can be progressively provided :

  • caches : many algorithms have internal ivars that are used as cache during the RequestData method. A possibility would be to provide a vtkThreadSpecific<> template that could be used as a smart pointer, but that will keep one copy of the cached object per thread.
  • iterators : InitTraversal() and Next() methods are still present on VTK containers. We will certainly have to replace those methods by iterator-based traversal.

Implementations

The plan is to propose a vtk interface for task-stealing, and two implementations to begin with :

  • a sequential implementation
  • a kaapi-based implementation