[Haskell-cafe] is 256M RAM insufficient for a 20 million element Int/Int map?

Max Bolingbroke batterseapower at hotmail.com
Sun Oct 19 20:35:52 EDT 2008


2008/10/20 Don Stewart <dons at galois.com>:
> claus.reinke:
>> Ideally, I'd just like to indicate the strictness in the types (hint to ghc
>> hackers: 'Data.Map.Map Int !Int' and '[!a]' would really be useful!-),
> In general, being able to specialise polymorphic structures so they look like unpacked
> monomorphic ones would be awesome.
>
>    (!Int, !Bool)     ->   (,) {-# UNPACK #-}!Int {-# UNPACK #-}!Bool
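(For comparison, the monomorphic version of this is already writable by hand today - a small illustrative example, with names of my own choosing:)

```haskell
-- What GHC can already do for a dedicated monomorphic type: the
-- strict Int field is unpacked straight into the constructor.
-- (Bool only gets the strictness annotation; as a multi-constructor
-- type it cannot be unpacked.)
data P = P {-# UNPACK #-} !Int !Bool

firstP :: P -> Int
firstP (P i _) = i
```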

I spent some time thinking about how one might actually go about
implementing this in GHC today, and came to the conclusion that it
interacts far too badly with separate compilation.

In an ideal world, you would know which specialisations to generate for the
data structure when compiling it in the first place. You would then of
course generate the appropriate specialisations for all subsequent
polymorphic function definitions that use the specialised type ((,),
IntMap or whatever), so you don't lose the ability to apply your
unpacked version to a function of a type like (a, b) -> a - see
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.55.8984, which
Claus pointed to. But we don't have that information yet at the point
where we compile a data definition...
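To illustrate: a hand-specialised pair, and the matching specialisation of fst that would have to be generated alongside it, might look like this (hypothetical names, not actual generated code):

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int#, Int(I#))

-- Hypothetical result of specialising (,) at an unboxed Int first field.
data PairIntB b = PairIntB Int# b

mkPairIntB :: Int -> b -> PairIntB b
mkPairIntB (I# i) b = PairIntB i b

-- A polymorphic function like fst :: (a, b) -> a cannot consume the
-- unpacked representation as-is; it needs its own specialisation:
fstIntB :: PairIntB b -> Int
fstIntB (PairIntB i _) = I# i
```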

The CLR has a similar problem but gets away with it because it has a
JIT compiler that can see the original definitions of classes /
methods and instantiate them at the necessary types at runtime (see
http://research.microsoft.com/~akenn/generics/space2004generics.pdf).

One possible approach for GHC would be to generate all possible
specialisations for a datatype + functions when compiling a module.
This sounds bad, but in fact you only need to cater for at most three
unboxed field sizes:

1) 32 bit (Int#, Word#, Float#, Addr#)
2) 64 bit (Int64#, Word64#)
3) Usually 64 bit (Double#)

Of course, if you have a type like (,,,) with 4 type arguments you
still potentially have to generate 3^4 = 81 specialisations (and
corresponding functions!), so this /can/ get bad quickly :-).

This is also sort of a hack, in that it crucially relies on the fact
that unboxed tuples don't really make sense as type arguments yet.
This is because types like (# Int, Bool #) -> Int are rejected by the
compiler - unboxed tuples can only occur on the right of function
arrows. However, this is just a technical limitation, and were it to
be lifted it might make sense to be able to instantiate data types at
unboxed tuple types, blowing this scheme out of the water.
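Concretely, the restriction looks like this today (the function names here are mine, purely for illustration):

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

-- Allowed: an unboxed tuple to the *right* of the function arrow.
divMod# :: Int -> Int -> (# Int, Int #)
divMod# x y = (# x `div` y, x `mod` y #)

-- Rejected (at the time of writing): unboxed tuples as arguments or
-- as arguments to type constructors, e.g.
--   f  :: (# Int, Bool #) -> Int
--   xs :: [(# Int, Bool #)]
```

A caller must scrutinise the result immediately, e.g. `case divMod# 7 2 of (# d, m #) -> ...`, so the unboxed tuple never needs a heap representation.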

You could also export the entire definition of the functions and data
structures you might need to specialise across module boundaries, and
generate specialisations where they are needed. However, this can lead
to redundant code generation with diamond dependencies (see the
Syme/Kennedy paper - though I don't think their problem with redundant
type descriptors applies to Haskell, which would make the approach
easier to apply) and would bloat .hi files. This might still be the
best way to go about it, should someone choose to implement it.

A further alternative would be to just compile every function once,
but pass something like a dictionary to every function to tell it how
to handle its own arguments (how much stuff to pop off the stack etc).
Clean, but the huge runtime overhead of this approach rather defeats
the point of using unboxed types in the first place.
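(In Haskell terms, such a scheme would amount to threading a representation descriptor through every call - sketched here with entirely made-up names, not any real GHC mechanism:)

```haskell
-- A made-up "representation dictionary": each field of a polymorphic
-- structure carries a tag saying how wide its unboxed form is.
data FieldRep = Rep32 | Rep64

fieldBytes :: FieldRep -> Int
fieldBytes Rep32 = 4
fieldBytes Rep64 = 8

-- A single compiled function would consult the dictionary at runtime,
-- e.g. to know how much to pop off the stack - paying a cost on
-- every single call:
frameBytes :: [FieldRep] -> Int
frameBytes = sum . map fieldBytes
```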

A final alternative would be to somehow recompile a module with the
desired instantiation if it turned out not to contain it. But this
could potentially lead to greatly increased compile times...

So, I can't really see a clean way to go about this without going the
whole hog and turning GHC into a whole-program compiler :-)

Cheers,
Max
