RFC: ghc's dynamic linker

Duncan Coutts duncan@coutts.uklinux.net
Wed, 28 Aug 2002 00:50:50 +0100

On Tue, 27 Aug 2002 17:19:15 +0100
"Simon Marlow" <simonmar@microsoft.com> wrote:

> Right, now what would it take to implement this.  As Duncan points out,
> this is almost possible already using the GHCi dynamic linker, which is
> available to any program compiled with GHC via the FFI.  The interface
> is fairly straightforward, eg:
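For reference, the RTS linker entry points Simon mentions can be bound from a GHC-compiled program via the FFI roughly like this (a sketch only: the names follow ghc/rts/Linker.c, and the exact signatures may differ between GHC versions, so check your Linker.h):

```haskell
module LinkerBindings where

import Foreign.C.String (CString, withCString)
import Foreign.Ptr      (Ptr, nullPtr)

-- Assumed bindings to the RTS linker (see ghc/rts/Linker.c);
-- names and result types may vary across GHC versions.
foreign import ccall unsafe "loadObj"
  c_loadObj :: CString -> IO Int

foreign import ccall unsafe "resolveObjs"
  c_resolveObjs :: IO Int

foreign import ccall unsafe "lookupSymbol"
  c_lookupSymbol :: CString -> IO (Ptr a)

-- Load a .o, resolve its references, and look up a symbol.
-- The symbol name must be the Z-encoded linker name, e.g.
-- "Main_plugin_closure".
loadSymbol :: FilePath -> String -> IO (Maybe (Ptr a))
loadSymbol obj sym = do
  _ <- withCString obj c_loadObj
  _ <- c_resolveObjs
  p <- withCString sym c_lookupSymbol
  return (if p == nullPtr then Nothing else Just p)
```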


This is what Andre Pang has done, modulo any changes between ghc 5.03 & 5.04.

Andre says:
  The actual runtime loader itself is in a runtime_loader/
  directory in the tarball.  The best example of how to use it is
  in the tests/ChibaTest* files.

> but the main problem is that the dynamic linker can't link new modules
> to symbols in the currently running binary.  So, in order to link a new
> Haskell module, you first have to load up a fresh copy of the 'base' and
> 'haskell98' packages, just like GHCi does.  It *almost* works to do
> this, except that you get strange effects, one of which is that you have
> two copies of stdout each with their own buffer.

This is exactly what Andre does in Chiba: he has to load extra copies of
certain interface modules, but it is ok since the Haskell modules are

> Going the final step and allowing linkage to the current binary is
> possible, it just means the linker has to know how to read the symbol
> table out of the binary, and you have to avoid running 'strip'.  I
> believe reading the symbol table is quite straightforward, the main
> problem being that on Unix you don't actually know where the binary
> lives, so you have to wire it in or search the PATH.  
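On the "where does the binary live" point: the usual trick is to take the program name, use it directly if it contains a slash, and otherwise search each directory of $PATH. A minimal sketch (the function names here are mine, and this ignores corner cases like relative paths and execute permission):

```haskell
module FindSelf where

import System.Directory   (doesFileExist)
import System.Environment (getEnv, getProgName)

-- Split a $PATH-style string on ':'.  An empty component
-- conventionally means the current directory.
splitSearchDirs :: String -> [FilePath]
splitSearchDirs s = case break (== ':') s of
    (d, [])     -> [dir d]
    (d, _:rest) -> dir d : splitSearchDirs rest
  where
    dir "" = "."
    dir d  = d

-- Try each directory in turn until the file is found.
findOnPath :: String -> [FilePath] -> IO (Maybe FilePath)
findOnPath _    []     = return Nothing
findOnPath prog (d:ds) = do
  let candidate = d ++ "/" ++ prog
  ok <- doesFileExist candidate
  if ok then return (Just candidate) else findOnPath prog ds

-- Guess the path of the running binary from its name and $PATH.
findSelf :: IO (Maybe FilePath)
findSelf = do
  prog <- getProgName
  path <- getEnv "PATH"
  findOnPath prog (splitSearchDirs path)
```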

Would it be easier/better to explicitly specify a symbol map file for the
linker to use to locate the appropriate points in the current binary?
Then perhaps we need a flag to ask ghc to spit out a symbol map along
with the .o. Alternatively there may be tools to extract the map from a
.o, I don't know - I'm not a binutils guru!
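For what it's worth, BSD-style `nm` output (address, type letter, name per line) is already close to such a symbol map, and parsing it is trivial. A sketch (the function names are illustrative, not from any existing tool):

```haskell
module SymbolMap where

import Data.Maybe (mapMaybe)

-- Parse one line of BSD-style `nm` output, e.g.
--   "0804a1b0 T Main_main_closure"
-- into (symbol name, address string).  Lines with fewer than
-- three fields (e.g. undefined symbols) are skipped.
parseNmLine :: String -> Maybe (String, String)
parseNmLine l = case words l of
  [addr, _ty, name] -> Just (name, addr)
  _                 -> Nothing

-- Build an association list from a whole `nm` dump.
symbolMap :: String -> [(String, String)]
symbolMap = mapMaybe parseNmLine . lines
```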

> Another problem is that you don't normally link a *complete* copy of the
> base package into your binary, you only link the bits you need.  Linking
> the whole lot would mean every binary would be about 10M; but doing this
> on the basis of a flag which you turn on when you want to do dynamic
> linking maybe isn't so bad.

The only bit that I would want to include completely is the API module,
which would likely be quite small as it would only re-export other parts
of the program through a smaller, simpler interface.

Ah, I see what you're saying now: we'd have to include the whole of the
standard library, or indeed any library that we wanted the plugins to be
able to use. The system's dynamic linker doesn't have this problem
because it always has all of the libraries available and just loads them
on demand. With static linking we have to predict what will be wanted
beforehand. Aaarg! Perhaps linking all of the standard library wouldn't
be so bad (behind a special flag, of course) since only the bits that are
used get loaded into memory, leaving just the large disk overhead.

> There are a couple of other options:
>   - make your program into a collection of dynamically-linked
>     libraries itself.  i.e. have a little stub main() which links
>     with the RTS, and loads up 'base' followed by your program
>     when it starts.  The startup cost would be high (we don't
>     do lazy linking in Haskell), but you'd only get one copy of
>     the base package and this is possible right now.

I don't understand this; would you mind explaining a bit more?

> Summary: extending GHC's dynamic linker to be able to slurp in the
> symbol table from the currently running binary would be useful, and is a
> good bite-sized GHC hacker task.  I can't guarantee that we'll get
> around to it in a timely fashion, but contributions are, as always,
> entirely welcome...

Having made the suggestion, it's only right that I contribute my (limited)
skills. I have done some gdb hacking before (not out of choice, you
understand!) so I ought to know a bit about .o's, ELF sections and such.