Using mutable array after an unsafeFreezeArray, and GC details

Fri May 9 23:49:07 UTC 2014

Any chance you could try to use storable or unboxed vectors?

On Friday, May 9, 2014, Brandon Simmons <brandon.m.simmons at gmail.com> wrote:

>
> On May 9, 2014 5:13 PM, "Edward Z. Yang" <ezyang at mit.edu<javascript:_e(%7B%7D,'cvml','ezyang at mit.edu');>>
> wrote:
> >
> > Hello Brandon,
> >
> > Excerpts from Brandon Simmons's message of 2014-05-08 16:18:48 -0700:
> > > I have an unusual application with some unusual performance problems
> > > and I'm trying to understand how I might use unsafeFreezeArray to help
> > > me, as well as understand in detail what's going on with boxed mutable
> > > arrays and GC. I'm using the interface from 'primitive' below.
> > >
> > > First some basic questions, then a bit more background
> > >
> > > 1) What happens when I do `newArray s x >>= \a-> unsafeFreezeArray a
> > > >> return a` and then use `a`? What problems could that cause?
> >
> > Your code as written wouldn't compile, but assuming you're talking about
> > the primops newArray# and unsafeFreezeArray#, what this operation does
> > is allocate a new array of pointers (initially recorded as mutable), and
> > then freezes it in-place (by changing the info-table associated with
> > it), but while maintaining a pointer to the original mutable array.
>  Nothing bad
> > will happen immediately, but if you use this mutable reference to mutate
> > the pointer array, you can cause a crash (in particular, if the array
> > makes it to the old generation, it will not be on the mutable list and
> > so if you mutate it, you may be missing roots.)
> >
> > > 2) And what if a do a `cloneMutableArray` on `a` and likewise use the
> > > resulting array?
> >
> > If you do the clone before freezing, that's fine for all use-cases;
> > if you do the clone after, you will end up with the same result as (1).
> >
> > > Background: I've been looking into an issue [1] in a library in which
> > > as more mutable arrays are allocated, GC dominates (I think I verified
> > > this?) and all code gets slower in proportion to the number of mutable
> > > arrays that are hanging around.
> > >
> > > I've been trying to understand how this is working internally. I don't
> > > quite understand how the "remembered set" works with respect to
> > > MutableArray. As best I understand: the remembered set in generation G
> > > points to certain objects in older generations, which objects hold
> > > references to objects in G. Then for MutableArrays specifically,
> > > card-marking is used to mark regions of the array with garbage in some
> > > way.
> > >
> > > So my hypothesis is the slowdown is associated with the size of the
> > > remembered set, and whatever the GC has to do on it. And in my tests,
> > > freezing the array seems to make that overhead (at least the overhead
> > > proportional to number of arrays) disappear.
> >
> > You're basically correct.  In the current GC design, mutable arrays of
> > pointers are always placed on the mutable list.  The mutable list of
> > generations which are not being collected are always traversed; thus,
> > the number of pointer arrays corresponds to a linear overhead for minor
> GCs.
> >
> > Here is a feature request tracking many of the infelicities that our
> > current GC design has:  https://ghc.haskell.org/trac/ghc/ticket/7662
> > The upshot is that the Haskell GC is very nicely tuned for mostly
> > immutable workloads, but there are some bad asymptotics when your
> > heap has lots of mutable objects.  This is generally a hard problem:
> > tuned GC implementations for mutable languages are a lot of work!
> > (Just ask the JVM implementors.)
> >
>
> Very helpful, thanks! And take some internet points.
>
> > > Now I'm really lost in the woods though. My hope is that I might be
> > > able to safely use unsafeFreezeArray to help me here [3]. Here are the
> > > particulars of how I use MutableArray in my algorithm, which are
> > > somewhat unusual:
> > >   - keep around a small template `MutableArray Nothing`
> > >   - use cloneMutableArray for fast allocation of new arrays
> > >   - for each array only a *single* write (CAS actually) happens at
> each position
> > >
> > > In fact as far as I can reason, there ought to be no garbage to
> > > collect at all until the entire array becomes garbage (the initial
> > > value is surely shared, especially since I'm keeping this template
> > > array around to clone from, right?). In fact I was even playing with
> > > the idea of rolling a new CAS that skips the card-marking stuff.
> >
> > I don't understand your full workload, but if you have a workload that
> > involves creating an array, mutating it over a short period of time,
> > and then never mutating it afterwards, you should simply freeze it after
> > you are done writing it.  Once frozen, the array will no longer be kept
> > on the mutable list and you won't pay for it when doing GC.  However,
> > the fact that you are doing a CAS makes it seem to me that your workflow
> > may be more complicated than that...
>
> Yes I think for my use case the overhead required to determine when the
> array can be frozen would not be worth it. I think I have some other knobs
> I can twist here.
>
> I'll keep an eye on that ticket and maybe chime in if I have any ideas.
>
> Thanks,
> Brandon
>
> >
> > Cheers,
> > Edward
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20140509/32360768/attachment-0001.html>