RFC: ghc's dynamic linker

Andre Pang ozone@algorithm.com.au
Wed, 28 Aug 2002 15:56:36 +1000


On Tue, Aug 27, 2002 at 05:19:15 +0100, Simon Marlow wrote:

> 'haskell98' packages, just like GHCi does.  It *almost* works to do
> this, except that you get strange effects, one of which is that you have
> two copies of stdout each with their own buffer.

If it's not too much trouble, do you mind explaining why this is
so?  It's just to satisfy my curiosity; don't worry if it's too
long-winded or contains really heavy wizardry :).

> Going the final step and allowing linkage to the current binary is
> possible, it just means the linker has to know how to read the symbol
> table out of the binary, and you have to avoid running 'strip'.  I
> believe reading the symbol table is quite straightforward, the main
> problem being that on Unix you don't actually know where the binary
> lives, so you have to wire it in or search the PATH.  

You've already got the symbol table in memory though, right?  Is
it absolutely necessary to re-read the binary?

BTW, I tried using objcopy (part of binutils) to 'merge' together
several plugin modules by copying over all the symbols in a bunch
of files to a single .o file.  Loading that up using the GHCi
linker didn't work :(.  If there's no reason why it shouldn't
work, I'll try again ... it's entirely possible that I stuffed up
somewhere.

> Another problem is that you don't normally link a *complete* copy of the
> base package into your binary, you only link the bits you need.  Linking
> the whole lot would mean every binary would be about 10M; but doing this
> on the basis of a flag which you turn on when you want to do dynamic
> linking maybe isn't so bad.

How about a feature (maybe a tool separate from GHC) which can find
the dependencies required for a particular symbol, and remove
all the excess baggage?

e.g. You have a program called, uhh, "Program", and a plugin
called, uhh, "Plugin", with Program containing the symbols 1, 2,
3, and Plugin containing symbols A B C.  Symbol "1" in Program
uses the "head" function from the standard library, so you need
to compile that into Program, and symbol "B" in Plugin uses the
"tail" function, so you need to compile that in:

    Program: 1 head 2 3
    Plugin: A B tail C

That should work, no?  Maybe it's even possible to do this right
now using a combination of evil GHC hacks and binutils?
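
To make the pruning idea concrete, here is a minimal sketch, not
anything GHC or binutils actually provides.  It assumes you already
have a dependency graph from symbol names to the symbols they use
directly; the names DepGraph, closure and exampleGraph are made up
for illustration.  It keeps only the symbols reachable from a set of
roots, so unused library symbols are dropped:

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical dependency graph: each symbol maps to the symbols it
-- uses directly.  A real tool would have to extract this from object
-- files; here it is written out by hand.
type DepGraph = Map.Map String [String]

-- All symbols reachable from the given roots (the roots included).
closure :: DepGraph -> [String] -> Set.Set String
closure graph = go Set.empty
  where
    go seen [] = seen
    go seen (s:rest)
      | s `Set.member` seen = go seen rest
      | otherwise =
          let deps = Map.findWithDefault [] s graph
          in go (Set.insert s seen) (deps ++ rest)

-- Toy graph matching the example: "1" uses head, "B" uses tail.
-- "map" stands in for library baggage that nothing refers to.
exampleGraph :: DepGraph
exampleGraph = Map.fromList
  [ ("1", ["head"]), ("2", []), ("3", [])
  , ("A", []), ("B", ["tail"]), ("C", [])
  , ("head", []), ("tail", []), ("map", [])
  ]

-- What would need to be linked into Program and Plugin respectively.
programSymbols, pluginSymbols :: Set.Set String
programSymbols = closure exampleGraph ["1", "2", "3"]
pluginSymbols  = closure exampleGraph ["A", "B", "C"]
```

Here Set.toList programSymbols gives ["1","2","3","head"] and
Set.toList pluginSymbols gives ["A","B","C","tail"]; "map" ends up in
neither.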

However, then you have the problem that the RTS doesn't _know_
that it has to load the "tail" symbol when it loads the plugin.
Program will just load symbols A, B, C, and then die a sad death
when it realises it can't resolve the symbols (since the tail
symbol required for B is missing).  I guess you could work around
this by using some "stub" function (like "dependentSymbols")
which the linker first loads.

In Plugin.hs:

    dependentSymbols = ["tail"]

In Program.hs:

    loadModule "plugin"
    -- Load the symbols which A, B, C require
    deps <- loadFunction "dependentSymbols"
    resolveFunctions
    mapM_ loadFunction deps
    -- Load A, B, C themselves
    mapM_ loadFunction ["A", "B", "C"]
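
The load order above can be simulated without touching the real
linker.  The following is only a sketch under stated assumptions: the
types Plugin and Loaded and the functions loadSymbol and loadPlugin
are hypothetical, and sets of names stand in for actual object code.
It shows the dependencies declared by the stub being resolved before
the plugin's own symbols, with a clear failure when one is missing:

```haskell
import Control.Monad (foldM)
import qualified Data.Set as Set

-- Hypothetical model of a plugin: the symbols it exports, plus the
-- library symbols it declares through its "dependentSymbols" stub.
data Plugin = Plugin
  { exports          :: [String]
  , dependentSymbols :: [String]
  }

-- Symbols currently resolved in the running program.
type Loaded = Set.Set String

-- Load one symbol from the library unless it is already present;
-- fail if the library does not contain it.
loadSymbol :: Set.Set String -> Loaded -> String -> Either String Loaded
loadSymbol library loaded sym
  | sym `Set.member` loaded  = Right loaded
  | sym `Set.member` library = Right (Set.insert sym loaded)
  | otherwise                = Left ("unresolved symbol: " ++ sym)

-- Resolve the declared dependencies first, then add the plugin's
-- own exports.
loadPlugin :: Set.Set String -> Loaded -> Plugin -> Either String Loaded
loadPlugin library loaded plugin = do
  withDeps <- foldM (loadSymbol library) loaded (dependentSymbols plugin)
  return (foldr Set.insert withDeps (exports plugin))

-- The example from the text: Program already holds 1, 2, 3 and head.
examplePlugin :: Plugin
examplePlugin = Plugin { exports = ["A", "B", "C"], dependentSymbols = ["tail"] }

programLoaded, baseLibrary :: Set.Set String
programLoaded = Set.fromList ["1", "2", "3", "head"]
baseLibrary   = Set.fromList ["head", "tail", "map"]
```

With these definitions, loadPlugin baseLibrary programLoaded
examplePlugin yields Right of the set containing 1, 2, 3, A, B, C,
head and tail, while passing a library without "tail" yields
Left "unresolved symbol: tail" instead of a later crash.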

Hopefully I'm not describing non-issues here ...

>   - make your program into a collection of dynamically-linked
>     libraries itself.  i.e. have a little stub main() which links
>     with the RTS, and loads up 'base' followed by your program
>     when it starts.  The startup cost would be high (we don't
>     do lazy linking in Haskell), but you'd only get one copy of
>     the base package and this is possible right now.

I was thinking of doing this when I started my own project.
However, I don't think it's really acceptable, because:

    1. You still need the base Haskell libraries on the system,
       which means that you either ship it with your application,
       or the user needs GHC installed on their system.  (I'm
       a big fan of the "it should just work" principle when a user
       downloads and installs applications.)  If the user has GHC
       installed on their system, it probably also needs to be
       the same version of GHC, otherwise you will probably run
       into Bad Problems.

    2. As you say, startup cost (time) is high.  This is fine for
       some applications, but my next project will be invoked as
       a CGI, where the ~2 second overhead involved at startup
       really kills performance (to the point where it won't
       scale to handle lots of users).

Of course, the big advantage is that you can do this right now.

>   - make GHC generate objects that play nicely with the standard
>     dynamic linker on your system.  This is entirely non-trivial,
>     I believe.  See previous discussions on this list.  However,
>     it might get easier in the future; I'm currently working on
>     removing the need to distinguish code from data in GHC's RTS,
>     which will eliminate some of the problems.

Just a comment: it's, well, interesting how GHC has this
fantastic method of importing modules at runtime, which is
similar (at least in what it achieves) to the dynamic linker.
I dunno, it feels like reinvent-wheel syndrome.  Not saying
that's a bad or good thing, just an observation.


-- 
#ozone/algorithm <ozone@algorithm.com.au>          - trust.in.love.to.save