[Insight-developers] Re: multi-thread efficient malloc for LevelSetNode

Simon Warfield warfield@bwh.harvard.edu
Wed, 17 Apr 2002 08:35:49 -0400


Hi Joshua,

> We are not using smart pointers for the LevelSetNodes, but calling new and
> delete has the same mutex problem (there is no way around this, right?).

  It is possible to implement an efficient memory allocator (i.e. malloc(),
free(),new(),delete()) that minimizes mutex contention.  Basically, rather 
than a single global heap protected by a single mutex that all threads contend 
for, you can use a set of heaps and organize for each thread to try to 
allocate new memory from its own heap.  

There have been several implementations of this idea.  One with a good 
reputation is Hoard:
  http://www.cs.utexas.edu/users/emery/hoard/
  ftp://ftp.cs.utexas.edu/pub/emery/papers/asplos2000.pdf

  I used a different implementation (ptmalloc) a few of years ago with good 
results.  ptmalloc() has become the default memory allocator in the GNU libc,
so in fact the assumption that the default malloc() involves contending for 
a single mutex isn't true on systems using GNU libc (i.e. linux) and may not
be true for other systems either.
  One paper on performance tests for ptmalloc() is at:
http://www.citi.umich.edu/techreports/reports/citi-tr-00-5.pdf
However, Hoard is apparently even better.

  All of that said, you might be surprised to not find much of a performance
hit in doing 'parallel' memory allocation.  If your threads are not allocating
memory at the same time then lock acquisition (for an uncontended mutex) can
be very fast (Sun said 2 microseconds for their mutex implementation on
their hardware a couple of years ago).  

  You can also mitigate the contention problem by the usual solution of
allocating more than you need at each allocation time, and then handing
it out as you need it.  From your description below you seem to be doing
something like this. The idea is you need to allocate one LevelSetNode,
then allocate ten LevelSetNode's, and don't do another allocation until you 
have made use of the other nine.  This reduces memory system contention by a 
factor of ten at the cost of slightly increased memory usage. I suggest to
do careful timing of the actual impact of mutex contention or memory
allocation costs before going to a lot of trouble to optimize for this. One
simple test would be to drop in Hoard and see if you see much impact on runtime.
If you don't, then this probably isn't worth optimizing on.

> Date: Tue, 16 Apr 2002 12:21:36 -0600 (MDT)
> From: Joshua Cates <cates@poblano.sci.utah.edu>
> To: "Miller, James V (Research)" <millerjv@crd.ge.com>
> Cc: Insight-Developers <insight-developers@public.kitware.com>
> Subject: RE: [Insight-developers] design issue
> 
> Hi Jim,
> 
> Thanks for the input.  I'll take a look at these articles.
> 
> >
> > Can the LevelSetNode be a stacked based class? And hence not use
> > SmartPointers at all? And hence avoid the mutexes?
> >
> 
> We are not using smart pointers for the LevelSetNodes, but calling new and
> delete has the same mutex problem (there is no way around this, right?).
> I have considered using std::vector for the storage class of my object.
> It has the advantage that different allocators could be used, i.e. a
> non-threadsafe allocator could gain some speed if you know you don't need
> thread safety.
> 
> Josh.
> 
> ______________________________
>  Josh Cates
>  School of Computer Science
>  University of Utah
>  Email: cates@sci.utah.edu
>  Phone: (801) 587-7697
>  URL:   www.cs.utk.edu/~cates
> 
> 
> On Tue, 16 Apr 2002, Miller, James V (Research) wrote:
> 
> > Here are my thoughts:
> >
> > Not knowing anything about your implementation or level sets....
> >
> > Can the LevelSetNode be a stacked based class? And hence not use
> > SmartPointers at all? And hence avoid the mutexes?
> >
> > You could create an std::vector of LevelSetNodes for each thread.
> >
> > In the current issue of C/C++ Users Journal (May 2002), there are a
> > number of articles on threading.  One article covers the boost thread
> > library which looks pretty nice.  Other articles are on "thread local
> > storage" (globals per thread). These are slower than real globals but
> > faster than mutex code.
> >
> > Another article talks about different mechanisms for mutexes.  One
> > interesting idea is a recursive mutex where a thread pays a penalty
> > for the mutex the first time it grabs it. As long as it doesn't release
> > the mutex, subsequent calls to get the mutex are fast.
> >
> > I'll be looking at these articles in a little more detail to see
> > if there is anything we want to fold into ITK.
> >
> >
> >
> > -----Original Message-----
> > From: Joshua Cates [mailto:cates@poblano.sci.utah.edu]
> > Sent: Monday, April 15, 2002 2:41 PM
> > To: Insight-Developers
> > Subject: [Insight-developers] design issue
> >
> >
> > Hi all,
> >
> > I'm soliciting feedback on a design issue.
> >
> > In our SparseFieldLevelSetImageFilter we manage lists of small data
> > objects (call them LevelSetNodes).  These lists must grow and shrink
> > dynamically. Because the filter is multithreaded, allocating and freeing
> > LevelSetObjects as they are needed is not an option.  It's one of those
> > cases where a naive implementation could actually be slower in a
> > multithreaded environment because of mutex locks.
> >
> > So what we do is to pre-allocate a "store" of objects for each thread from
> > which that thread borrows to grow its lists.  This allows us to operate in
> > a distributed memory mode.  Each thread has a private reserve of memory
> > that will not be touched by other threads, eliminating the need for mutex
> > locks. Blocking only occurs when a thread runs out of memory and has to
> > enlarge its store.  I am planning to encapsulate this functionality into
> > an itk::DataObject subclass.
> >
> > Questions are these:
> >
> > Have other developers come up with different solutions to this same
> > problem?
> >
> > If so, what memory management objects might already exist in Itk (or STL)
> > to help?
> >
> > Would other developers find this "itkObjectStore" class useful?
> >
> > Josh.
> >
> >
> >
> > ______________________________
> >  Josh Cates
> >  School of Computer Science
> >  University of Utah
> >  Email: cates@cs.utah.edu
> >  Phone: (801) 587-7697
> >  URL:   www.cs.utk.edu/~cates

--
Simon Warfield, Ph.D. warfield@bwh.harvard.edu Phone:617-732-7090
http://www.spl.harvard.edu/~warfield           FAX:  617-582-6033
Assistant Professor of Radiology,          Harvard Medical School
Director, Computational Radiology Laboratory
Thorn 329, Dept Radiology,  Brigham and Women's Hospital 
75 Francis St, Boston, MA, 02115