[GHC] #8199: Get rid of HEAP_ALLOCED

GHC ghc-devs at haskell.org
Thu Apr 10 12:02:32 UTC 2014


#8199: Get rid of HEAP_ALLOCED
-------------------------------------+-------------------------------------------
        Reporter:  ezyang            |            Owner:  ezyang
            Type:  feature request   |           Status:  new
        Priority:  normal            |        Milestone:  7.10.1
       Component:  Compiler          |          Version:  7.7
      Resolution:                    |         Keywords:
Operating System:  Unknown/Multiple  |     Architecture:  Unknown/Multiple
 Type of failure:  None/Unknown      |       Difficulty:  Project (more than a week)
       Test Case:                    |       Blocked By:  5435
        Blocking:                    |  Related Tickets:
-------------------------------------+-------------------------------------------
Description changed by ezyang:

Old description:

> This bug is to track progress of removing HEAP_ALLOCED from GHC,
> promising faster GC (especially for large/scattered heaps), as long as we
> can keep the cost of indirections down.
>
> The relevant wiki article:
> http://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapAlloced ;
> we are implementing method 2.  Version 2 of the patchset is probably
> correct.
>
> Blocking problems:
>
> * Properly handle the Windows DLL case (e.g. SRTs). We will probably have
> to reorganize how the indirections are laid out.
>
> * ~~Make it work for GHCi linking of static objects.~~ Blocked on #2841,
> I have it working for ELF, and can make it work for other platforms as
> soon as I get relevant machines.
>
> * Bikeshed hs_main API changes (because closures are not valid prior to
> RTS initialization, so you have to pass in an indirection instead)
>
> Performance improvements possible:
>
> * ~~This patch introduces a lot of new symbols; ensure we are not unduly
> polluting the symbol tables. (In particular, I think _static_closure
> symbols can be made hidden).~~ I've eliminated all of these except for
> the init symbols, which cross the stub object and assembly file boundary,
> and so would need to be made invisible by the linker.
>
> * Don't pay for a double indirection when -fPIC is turned on.  Probably
> the easiest way to do this is to *not* bake the indirections into
> compiled code when it is -fPIC'd, and instead scribble over the GOT.
> However, I don't know how to go backwards from a symbol to a GOT entry,
> so we might need some heinous assembly hacks to get this working.
>
> * The old HEAP_ALLOCED is supposed to be pessimal on very large heaps. Do
> some performance tests under those workloads.
>
> * Make sure the extra indirection is not causing any C-- optimizations to
> stop firing (it might be, because I put it in as a literal CmmLoad)
>
> * Once a static thunk is updated, we can tag the indirection to let
> other code segments know about the good news. One way to do this is to
> have the update frame for a static indirection hold a reference to
> the *indirection*, not the closure itself. However, this scheme will not
> affect other static closures which have references to the thunk.
>
> * Closure tables should have their indirections short-circuited by the
> initialization code. But maybe it is not worth the cost of telling the
> RTS about the closure tables (also, they would need to be made
> writeable).
>
> * We are paying an indirection when a GC occurs when the closure is not
> in R1. According to the wiki page, technically this is not needed, but I
> don't know how to eliminate references to the closure itself from
> stg_gc_fun.
>
> * ~~Save tags inside the indirection tables, so that we don't spend
> instructions retagging after following the indirection.~~ Done.
>
> * ~~Improve static indirection and stable pointer registration, avoiding
> binary bloat from `__attribute__((constructor))` stubs.~~ After discussing
> this with some folks, it seems that there isn't really a portable way to
> do this when we are using the system dynamic linker to load libraries at
> startup.  The problem is that we need to somehow get a list of all the
> GHC-compiled libraries which got loaded, and really the easiest way to
> get that is to just build it ourselves.
>
> * ~~Need to implement a new megablock tracking structure so we can
> free/check for lost blocks~~. Now that efficient lookup is not necessary,
> perhaps we can write-optimize the megablock tracking structures.
>
> Speculative improvements:
>
> * Now that static closures live in blocks, can we GC them like we GC
> normal data? This would be beneficial for static thunks, which now can
> have their indirections completely removed; reverting CAFs may be
> somewhat tricky, however.
>
> * Now that HEAP_ALLOCED is greatly simplified, can we further simplify some
> aspects of the GC? At the very least, we ought to be able to make
> megablock allocation cheaper, by figuring out how to remove the
> spinlocks, etc.
>
> * Another possibility is to adopt a hybrid approach, where we manually
> lay out closures when compiling statically, and indirect when compiling
> dynamically. In some sense, this gets the best of both worlds, since we
> expect to not pay any extra cost for indirection due to PIC.

New description:

 This ticket tracks progress on removing HEAP_ALLOCED from GHC, which
 promises faster GC (especially for large/scattered heaps), as long as we
 can keep the cost of indirections down.

 The relevant wiki article:
 http://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapAlloced ;
 we are implementing method 2.  Version 2 of the patchset is probably
 correct.
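
 As a rough illustration of the kind of indirection discussed in this
 ticket (not the actual patch): the sketch below assumes that static
 closures are copied into runtime-allocated memory at startup and that
 compiled code reaches them through a per-closure slot, with the pointer
 tag stored in the slot. `Closure`, `foo_static_closure`,
 `foo_indirection`, `init_indirection`, and the use of `malloc` in place of
 the block allocator are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical two-word closure; real GHC closures are laid out differently. */
typedef struct {
    uintptr_t info;
    uintptr_t payload;
} Closure;

/* Low pointer bits are free because allocations are at least 4-byte aligned. */
#define TAG_MASK ((uintptr_t)3)

/* Template emitted at compile time; code never references it directly. */
const Closure foo_static_closure = { 0x1000, 42 };

/* Indirection slot; it holds no valid pointer until init_indirection runs. */
uintptr_t foo_indirection = 0;

/* Copy the template into runtime-allocated memory (malloc stands in for the
 * block allocator here) and store a tagged pointer to the copy in the slot. */
void init_indirection(uintptr_t *slot, const Closure *tmpl, uintptr_t tag) {
    Closure *copy = malloc(sizeof *copy);
    assert(copy != NULL);
    assert(((uintptr_t)copy & TAG_MASK) == 0);
    assert(tag <= TAG_MASK);
    *copy = *tmpl;
    *slot = (uintptr_t)copy | tag;
}

/* Extract the tag bits from a slot value. */
uintptr_t get_tag(uintptr_t p) { return p & TAG_MASK; }

/* Strip the tag bits to recover the closure pointer. */
Closure *untag(uintptr_t p) { return (Closure *)(p & ~TAG_MASK); }
```

 In this sketch the slot is meaningless before initialization, which is
 presumably why an entry point such as hs_main would need to be handed an
 indirection rather than a closure.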

 Blocking problems:

 * Properly handle the Windows DLL case (e.g. SRTs). We will probably have
 to reorganize how the indirections are laid out.

 * ~~Make it work for GHCi linking of static objects.~~ Blocked on #2841; I
 have it working for ELF and can make it work for other platforms as soon
 as I get access to the relevant machines.

 * Bikeshed hs_main API changes (because closures are not valid prior to
 RTS initialization, so you have to pass in an indirection instead)

 * Does not work with unloadObj (see comments below)

 Performance improvements possible:

 * ~~This patch introduces a lot of new symbols; ensure we are not unduly
 polluting the symbol tables. (In particular, I think _static_closure
 symbols can be made hidden).~~ I've eliminated all of these except for the
 init symbols, which cross the stub object and assembly file boundary, and
 so would need to be made invisible by the linker.

 * Don't pay for a double indirection when -fPIC is turned on.  Probably
 the easiest way to do this is to *not* bake the indirections into
 compiled code when it is -fPIC'd, and instead scribble over the GOT.
 However, I don't know how to go backwards from a symbol to a GOT entry, so
 we might need some heinous assembly hacks to get this working.

 * The old HEAP_ALLOCED is supposed to be pessimal on very large heaps. Do
 some performance tests under those workloads.

 * Make sure the extra indirection is not causing any C-- optimizations to
 stop firing (it might be, because I put it in as a literal CmmLoad)

 * Once a static thunk is updated, we can tag the indirection to let other
 code segments know about the good news. One way to do this is to have the
 update frame for a static indirection hold a reference to the
 *indirection*, not the closure itself. However, this scheme will not
 affect other static closures which have references to the thunk.

 * Closure tables should have their indirections short-circuited by the
 initialization code. But maybe it is not worth the cost of telling the RTS
 about the closure tables (also, they would need to be made writeable).

 * We pay for an indirection when a GC occurs while the closure is not in
 R1. According to the wiki page, technically this is not needed, but I
 don't know how to eliminate references to the closure itself from
 stg_gc_fun.

 * ~~Save tags inside the indirection tables, so that we don't spend
 instructions retagging after following the indirection.~~ Done.

 * ~~Improve static indirection and stable pointer registration, avoiding
 binary bloat from `__attribute__((constructor))` stubs.~~ After discussing
 this with some folks, it seems that there isn't really a portable way to
 do this when we are using the system dynamic linker to load libraries at
 startup.  The problem is that we need to somehow get a list of all the
 GHC-compiled libraries which got loaded, and really the easiest way to get
 that is to just build it ourselves.

 * ~~Need to implement a new megablock tracking structure so we can
 free/check for lost blocks~~. Now that efficient lookup is not necessary,
 perhaps we can write-optimize the megablock tracking structures.
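
 The registration approach mentioned in the list above (each library
 announces itself from a constructor stub, so the RTS builds its own list of
 loaded GHC-compiled libraries) could look roughly like the following
 sketch. It relies on the GCC/Clang `__attribute__((constructor))`
 extension; `StaticTable`, `register_static_table`,
 `registered_table_count`, and the fixed-size array are hypothetical names
 chosen for illustration, not the RTS's actual interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for one library's static indirections. */
typedef struct {
    const char *name;          /* library name, for illustration only */
    void      **indirections;  /* table of indirection slots */
    size_t      count;         /* number of slots in the table */
} StaticTable;

#define MAX_TABLES 64

/* List built up by the constructor stubs; zero-initialized before any
 * constructor runs because it has static storage duration. */
static StaticTable *registered[MAX_TABLES];
static size_t n_registered = 0;

/* Called from each library's constructor stub. */
void register_static_table(StaticTable *t) {
    assert(n_registered < MAX_TABLES);
    registered[n_registered++] = t;
}

/* Number of tables registered so far. */
size_t registered_table_count(void) { return n_registered; }

/* One stub of this shape would be emitted per library (or per module). */
static void *example_slots[2];
static StaticTable example_table = { "example", example_slots, 2 };

/* Runs when the object is loaded, before main() is entered. */
__attribute__((constructor))
static void example_register(void) {
    register_static_table(&example_table);
}
```

 Every library carries its own stub in this scheme, which is the source of
 the binary bloat noted above; the benefit is that it does not depend on
 asking the system dynamic linker what it loaded.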

 Speculative improvements:

 * Now that static closures live in blocks, can we GC them like we GC normal
 data? This would be beneficial for static thunks, which now can have their
 indirections completely removed; reverting CAFs may be somewhat tricky,
 however.

 * Now that HEAP_ALLOCED is greatly simplified, can we further simplify some
 aspects of the GC? At the very least, we ought to be able to make
 megablock allocation cheaper, by figuring out how to remove the spinlocks,
 etc.

 * Another possibility is to adopt a hybrid approach, where we manually lay
 out closures when compiling statically, and indirect when compiling
 dynamically. In some sense, this gets the best of both worlds, since we
 expect to not pay any extra cost for indirection due to PIC.

--

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8199#comment:28>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

