[Haskell-cafe] Basic questions about concurrency in Haskell

Fri Aug 7 11:13:15 EDT 2009

Bringing the cafe back in.

If I remember correctly, tuning the GC is one of the things they worked on
for the next release (in relation to parallelism). Here's a link to the
paper:
http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/multicore-ghc.pdf

You can allocate dynamic memory on the C heap using the functions in
Foreign.Marshal.Alloc. The Storable class allows you to specify
alignment.
However, it's not clear you will need alignment in all cases. Most of the
time, IME, all you really need is for "the business end" of objects A and B
to end up on different cache lines. Padding each of them by the size of a
cache line will accomplish that even if an object starts in the middle of a
cache line. In other words, as long as you ensure that there are at least
CACHE_LINE bytes between the last bit of data in A and the first bit of data
in B, they won't exhibit false sharing.
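
To make the padding idea concrete, here is a rough sketch, not a tested
recipe. It assumes a 64-byte cache line (check your CPU), and the names
cacheLine, slot, withPaddedCounters and bump are made up for illustration.
It uses allocaBytesAligned from Foreign.Marshal.Alloc to get a temporary
aligned buffer and gives each Word64 counter its own cache-line-sized slot:

```haskell
import Control.Monad (forM_)
import Data.Word (Word64)
import Foreign.Marshal.Alloc (allocaBytesAligned)
import Foreign.Ptr (Ptr, plusPtr)
import Foreign.Storable (peek, poke)

-- Assumption: 64-byte cache lines. Common, but not universal.
cacheLine :: Int
cacheLine = 64

-- Address of the i-th counter. Consecutive counters are cacheLine bytes
-- apart, so no two of them can sit on the same cache line.
slot :: Ptr Word64 -> Int -> Ptr Word64
slot base i = base `plusPtr` (i * cacheLine)

-- Allocate n padded counters in a buffer aligned to cacheLine, zero them,
-- and run the action. The buffer is released when the action returns, so
-- the pointer must not escape.
withPaddedCounters :: Int -> (Ptr Word64 -> IO a) -> IO a
withPaddedCounters n act =
  allocaBytesAligned (n * cacheLine) cacheLine $ \base -> do
    forM_ [0 .. n - 1] $ \i -> poke (slot base i) 0
    act base

-- Non-atomic increment of counter i; intended for one writer per counter.
bump :: Ptr Word64 -> Int -> IO ()
bump base i = do
  let p = slot base i
  v <- peek p
  poke p (v + 1)
```

Each worker would then call bump with its own index. Because the buffer is
aligned here, the first counter also cannot share a line with whatever
happens to sit just before the buffer; with plain mallocBytes you would rely
on the padding alone, as described above.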

I'm not sure if any of the "standard" mutable references etc. respect this.
Maybe someone can clarify? If you modify a thread-local IORef, will that
cause another thread to stall because it has its own thread-local IORef on
the same cache line?

On Fri, Aug 7, 2009 at 12:42 PM, Thomas Witzel <witzel.thomas at gmail.com> wrote:

> I have not yet been able to get the dev version compiled, but I'm
> looking into it (basically compiles to 99% and then stops because of a
> missing or defective .mk file). Anyhow, I played more with the release
> version and one thing I noticed is that the GC time increases almost
> linearly with the number of cores. Is there a document somewhere that
> explains the overall architecture of the GHC RTS and, in particular, how
> the GC behaves in concurrent situations?
>
> As for your comment regarding padding the data structures, I'll of
> course also need to control the alignment in such a case. Is there
> such a thing as explicit dynamic memory allocation in Haskell?
>
> Thanks, Thomas
>
> On Wed, Aug 5, 2009 at 3:16 PM, Sebastian
> Sylvan<sebastian.sylvan at gmail.com> wrote:
> > GHC doesn't have per-thread allocation so it's probably a bit tricky to
> > get that working. Plus, for parallelism it's not clear that a piece of
> > data is necessarily "owned" by one thread, since it could be produced by
> > one spark and consumed by another. Those two independent sparks may not
> > necessarily occupy the same thread, which means that any *other* data
> > accessed by the first thread could thrash the cache. So really you'd need
> > per-spark allocation areas, which would probably make sparks very
> > heavyweight.
> > In other words, I think there's plenty of research that needs to be done
> > w.r.t. scheduling things in time and space so as to avoid false sharing.
> > You could, of course, always chunk your work manually, and make sure that
> > each "chunk" works on a big block that won't share cache lines with
> > anything else (e.g. by padding the data structures).
> > Also, while GHC does a fair bit of mutation on its own internal data
> > (thunks etc.), most of the "user data" is read-only, which should help.
> > I.e. once a cache line has been filled up, there won't be any
> > synchronisation needed on that data again.
> >
> > On Wed, Aug 5, 2009 at 8:04 PM, Thomas Witzel <witzel.thomas at gmail.com>
> > wrote:
> >>
> >> I'll try that. I'd like to stick with it. As for the memory, although
> >> it's probably quite a bit of work, it should be doable to have code
> >> generated where the threads have their own, non-overlapping, memory
> >> pages, so that the CPUs don't go into a cache-thrashing death-match.
> >> I'll spend some more time with Haskell and then go from there.
> >>
> >> On Wed, Aug 5, 2009 at 3:01 PM, Sebastian
> >> Sylvan<sebastian.sylvan at gmail.com> wrote:
> >> >
> >> >
> >> > On Wed, Aug 5, 2009 at 6:59 PM, Thomas Witzel
> >> > <witzel.thomas at gmail.com> wrote:
> >> >>
> >> >> 2. I started with the very simple nfib example given in the manual
> >> >> for Control.Parallel (Section 7.18). On my systems using multiple
> >> >> cores makes the code actually slower than just using a single core.
> >> >> While the manual cautions that this could be the case for certain
> >> >> algorithms, I'm wondering whether this is the desired behaviour for
> >> >> this example.
> >> >>
> >> >> I'm using ghc 6.10.4 right now.
> >> >
> >> > IIRC the development version of GHC has some major work to optimize
> >> > concurrency, so it may be worth trying that. In particular I believe
> >> > it executes sparks in batches, to reduce the overhead (which
> >> > hopefully fixes your issue).
> >> >
> >> > --
> >> > Sebastian Sylvan
> >> >
> >
> >
> >
> > --
> > Sebastian Sylvan
> >
>

-- 
Sebastian Sylvan