[GHC] #8199: Get rid of HEAP_ALLOCED
GHC
ghc-devs at haskell.org
Fri Apr 11 19:30:58 UTC 2014
#8199: Get rid of HEAP_ALLOCED
-------------------------------------+-------------------------------------
        Reporter:  ezyang            |            Owner:  ezyang
            Type:  feature request   |           Status:  new
        Priority:  normal            |        Milestone:  7.10.1
       Component:  Compiler          |          Version:  7.7
      Resolution:                    |         Keywords:
Operating System:  Unknown/Multiple  |     Architecture:  Unknown/Multiple
 Type of failure:  None/Unknown      |       Difficulty:  Project (more than a week)
       Test Case:                    |       Blocked By:  5435
        Blocking:                    |  Related Tickets:
-------------------------------------+-------------------------------------
Old description:
> This bug is to track progress of removing HEAP_ALLOCED from GHC,
> promising faster GC (especially for large/scattered heaps), as long as we
> can keep the cost of indirections down.
>
> The relevant wiki article:
> http://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapAlloced ;
> we are implementing method 2. Version 2 of the patchset is probably
> correct.
>
> Blocking problems:
>
> * Properly handle the Windows DLL case (e.g. SRTs). We will probably have
> to reorganize how the indirections are laid out.
>
> * ~~Make it work for GHCi linking of static objects.~~ Blocked on #2841,
> I have it working for ELF, and can make it work for other platforms as
> soon as I get relevant machines.
>
> * Bikeshed hs_main API changes (because closures are not valid prior to
> RTS initialization, so you have to pass in an indirection instead)
>
> * Does not work with unloadObj (see comments below)
>
> Performance improvements possible:
>
> * ~~This patch introduces a lot of new symbols; ensure we are not unduly
> polluting the symbol tables. (In particular, I think _static_closure
> symbols can be made hidden).~~ I've eliminated all of these except for
> the init symbols, which cross the stub object and assembly file boundary,
> and so would need to be made invisible by the linker.
>
> * Don't pay for a double indirection when -fPIC is turned on. Probably
> the easiest way to do this is to *not* bake the indirections into
> compiled code when it is -fPIC'd, and instead scribble over the GOT.
> However, I don't know how to go backwards from a symbol to a GOT entry,
> so we might need some heinous assembly hacks to get this working.
>
> * The old HEAP_ALLOCED is supposed to be pessimal on very large heaps. Do
> some performance tests under those workloads.
>
> * Make sure the extra indirection is not causing any C-- optimizations to
> stop firing (it might be, because I put it in as a literal CmmLoad)
>
> * Once a static thunk is updated, we can tag the indirection to let
> other code segments know about the good news. One way to do this is to
> have the update frame for a static indirection hold a reference to
> the *indirection*, not the closure itself. However, this scheme will not
> affect other static closures which have references to the thunk.
>
> * Closure tables should have their indirections short-circuited by the
> initialization code. But maybe it is not worth the cost of telling the
> RTS about the closure tables (also, they would need to be made
> writeable).
>
> * We pay for an indirection when a GC occurs while the closure is not
> in R1. According to the wiki page, technically this is not needed, but I
> don't know how to eliminate references to the closure itself from
> stg_gc_fun.
>
> * ~~Save tags inside the indirection tables, so that we don't spend
> instructions retagging after following the indirection.~~ Done.
>
> * ~~Improve static indirection and stable pointer registration, avoiding
> binary bloat from `__attribute__((constructor))` stubs.~~ After discussing
> this with some folks, it seems that there isn't really a portable way to
> do this when we are using the system dynamic linker to load libraries at
> startup. The problem is that we need to somehow get a list of all the
> GHC-compiled libraries which got loaded, and really the easiest way to
> get that is to just build it ourselves.
>
> * ~~Need to implement a new megablock tracking structure so we can
> free/check for lost blocks~~. Now that efficient lookup is not necessary,
> perhaps we can write-optimize the megablock tracking structures.
>
> Speculative improvements:
>
> * Now that static closures live in a block, can we GC them like we GC normal
> data? This would be beneficial for static thunks, which now can have
> their indirections completely removed; reverting CAFs may be somewhat
> tricky, however.
>
> * Now that HEAP_ALLOCED is greatly simplified, can we further simplify some
> aspects of the GC? At the very least, we ought to be able to make
> megablock allocation cheaper, by figuring out how to remove the
> spinlocks, etc.
>
> * Another possibility is to adopt a hybrid approach, where we manually
> lay out closures when compiling statically, and indirect when compiling
> dynamically. In some sense, this gets the best of both worlds, since we
> expect to not pay any extra cost for indirection due to PIC.
New description:
This bug is to track progress of removing HEAP_ALLOCED from GHC, promising
faster GC (especially for large/scattered heaps), as long as we can keep
the cost of indirections down.
The relevant wiki article:
http://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapAlloced ;
we are implementing method 2. Version 2 of the patchset is probably
correct.
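As a rough illustration of the idea (a toy model, not the actual RTS code): static closures are copied into heap memory at startup, and compiled code reaches them through an indirection slot, so the collector no longer needs a separate test to tell static objects from heap objects. Every name below (`Closure`, `static_template`, `indirection`, `toy_init`, `read_payload`) is hypothetical, and `malloc` stands in for block allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an RTS closure: a header plus one payload word. */
typedef struct {
    const char *info;   /* stand-in for the info pointer */
    long payload;
} Closure;

/* Read-only template, as it might sit in the object file's data section. */
static const Closure static_template = { "Foo_closure", 42 };

/* Indirection slot: compiled code loads the closure address from here. */
static Closure *indirection = NULL;

/* Toy initialization: copy the template into heap memory and point the
 * indirection at the copy. Returns 0 on success, -1 on allocation failure. */
static int toy_init(void) {
    Closure *copy = malloc(sizeof *copy);
    if (copy == NULL) return -1;
    memcpy(copy, &static_template, sizeof *copy);
    indirection = copy;
    return 0;
}

/* The cost the ticket worries about: every reference pays one extra load. */
static long read_payload(void) {
    assert(indirection != NULL);   /* the slot is invalid before toy_init */
    return indirection->payload;
}
```

Calling `toy_init()` and then `read_payload()` reads the payload from the heap copy rather than from the template.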
Blocking problems:
* Properly handle the Windows DLL case (e.g. SRTs). We will probably have
to reorganize how the indirections are laid out.
* ~~Make it work for GHCi linking of static objects.~~ Blocked on #2841, I
have it working for ELF, and can make it work for other platforms as soon
as I get relevant machines.
* Bikeshed hs_main API changes (because closures are not valid prior to
RTS initialization, so you have to pass in an indirection instead)
* Does not work with unloadObj (see comments below)
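The hs_main point can be sketched as follows. This is only a toy under stated assumptions: `toy_hs_main`, `toy_rts_init`, and `main_indirection` are hypothetical names, not the real RTS API, and the actual signature change is still to be decided. The sketch shows why the caller has to hand over the address of the slot rather than its contents.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { long payload; } Closure;   /* hypothetical closure type */

static Closure heap_copy;                   /* stands in for heap memory */
static Closure *main_indirection = NULL;    /* filled in by toy_rts_init */

/* Toy RTS initialization: only after this does the slot hold a valid
 * closure pointer. */
static void toy_rts_init(void) {
    heap_copy.payload = 7;
    main_indirection = &heap_copy;
}

/* Hypothetical entry point: takes the slot's address, because the slot is
 * still NULL at the moment the caller evaluates its arguments. */
static long toy_hs_main(Closure **main_closure_slot) {
    toy_rts_init();
    assert(*main_closure_slot != NULL);     /* valid only after init */
    return (*main_closure_slot)->payload;
}
```

A caller would write `toy_hs_main(&main_indirection)`; passing `main_indirection` itself would pass NULL, since initialization has not run yet.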
Performance improvements possible:
* ~~This patch introduces a lot of new symbols; ensure we are not unduly
polluting the symbol tables. (In particular, I think _static_closure
symbols can be made hidden).~~ ~~I've eliminated all of these except for
the init symbols, which cross the stub object and assembly file boundary,
and so would need to be made invisible by the linker.~~ I needed to make
local info tables public.
* Don't pay for a double indirection when -fPIC is turned on. Probably
the easiest way to do this is to *not* bake the indirections into
compiled code when it is -fPIC'd, and instead scribble over the GOT.
However, I don't know how to go backwards from a symbol to a GOT entry, so
we might need some heinous assembly hacks to get this working.
* The old HEAP_ALLOCED is supposed to be pessimal on very large heaps. Do
some performance tests under those workloads.
* Make sure the extra indirection is not causing any C-- optimizations to
stop firing (it might be, because I put it in as a literal CmmLoad)
* Once a static thunk is updated, we can tag the indirection to let other
code segments know about the good news. One way to do this is to have the
update frame for a static indirection hold a reference to the
*indirection*, not the closure itself. However, this scheme will not
affect other static closures which have references to the thunk.
* Closure tables should have their indirections short-circuited by the
initialization code. But maybe it is not worth the cost of telling the RTS
about the closure tables (also, they would need to be made writeable).
* We pay for an indirection when a GC occurs while the closure is not in
R1. According to the wiki page, technically this is not needed, but I
don't know how to eliminate references to the closure itself from
stg_gc_fun.
* ~~Save tags inside the indirection tables, so that we don't spend
instructions retagging after following the indirection.~~ Done.
* ~~Improve static indirection and stable pointer registration, avoiding
binary bloat from `__attribute__((constructor))` stubs.~~ After discussing
this with some folks, it seems that there isn't really a portable way to
do this when we are using the system dynamic linker to load libraries at
startup. The problem is that we need to somehow get a list of all the
GHC-compiled libraries which got loaded, and really the easiest way to get
that is to just build it ourselves.
* ~~Need to implement a new megablock tracking structure so we can
free/check for lost blocks~~. Now that efficient lookup is not necessary,
perhaps we can write-optimize the megablock tracking structures.
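To make the "save tags inside the indirection tables" item concrete, here is a minimal sketch of storing an already-tagged pointer in the slot, so that a reader gets the tag with the same load and does not need to retag. It assumes 8-byte-aligned closures and three tag bits; the names (`tag_ptr`, `get_tag`, `untag`, `indirection_slot`, `init_slot`) are hypothetical and this is not the patchset's code.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)7)   /* assumption: 8-byte aligned closures */

typedef struct { long payload; } Closure;   /* hypothetical closure type */

/* Pack a small tag into the low bits of an aligned pointer. */
static uintptr_t tag_ptr(Closure *p, unsigned tag) {
    assert(((uintptr_t)p & TAG_MASK) == 0 && tag <= TAG_MASK);
    return (uintptr_t)p | tag;
}

static unsigned get_tag(uintptr_t w) { return (unsigned)(w & TAG_MASK); }
static Closure *untag(uintptr_t w)   { return (Closure *)(w & ~TAG_MASK); }

/* An evaluated closure, plus the slot that compiled code would load from. */
static _Alignas(8) Closure evaluated_closure = { 99 };
static uintptr_t indirection_slot;

/* Store the pointer with its tag already applied (tag 1 is arbitrary here). */
static void init_slot(void) {
    indirection_slot = tag_ptr(&evaluated_closure, 1);
}
```

After `init_slot()`, a single load of `indirection_slot` yields both the address (via `untag`) and the tag (via `get_tag`).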
Speculative improvements:
* Now that static closures live in a block, can we GC them like we GC normal data?
This would be beneficial for static thunks, which now can have their
indirections completely removed; reverting CAFs may be somewhat tricky,
however.
* Now that HEAP_ALLOCED is greatly simplified, can we further simplify some
aspects of the GC? At the very least, we ought to be able to make
megablock allocation cheaper, by figuring out how to remove the spinlocks,
etc.
* Another possibility is to adopt a hybrid approach, where we manually lay
out closures when compiling statically, and indirect when compiling
dynamically. In some sense, this gets the best of both worlds, since we
expect to not pay any extra cost for indirection due to PIC.
--
Comment (by ezyang):
Great news: with the overlapping/compressed static closure representation,
overall binary size is now smaller than it used to be! The downside is that I
still need to collect static closures together, away from their definitions,
which means that I now need to export symbols for local info tables (which I
know we'd previously been keeping local to reduce the number of symbols).
Remaining tasks: unloadObj support and T3294 performance debugging.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8199#comment:32>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler