Inferring from context declarations
Mark P Jones
mpj@cse.ogi.edu
Wed, 21 Feb 2001 15:59:44 -0800
| [One way to compile polymorphic code is to inline all uses of
| polymorphic values...]
|
| It would require keeping the bodies of all polymorphic functions in
| a form that allows instantiation in different modules. For example,
| ghc already requires too much memory (as much as a hundred MB for
| large modules), and specialization leads to code bloat, so in the
| current state of technology it's not advisable to always compile
| out polymorphism.
Separate compilation is an issue here, but I'm not at all sure that
code bloat would be a problem in practice. I say this on the basis
of some experiments that I did (a long time ago now), which you can
find described in my paper:
Dictionary-free Overloading by Partial Evaluation
http://www.cse.ogi.edu/~mpj/pubs/pepm94.html
Be sure to get the (longer) Yale Technical report version,
not the PEPM paper, which omits some specific material that
is relevant to this discussion.
The primary purpose of this paper was to investigate what would happen
if overloading was implemented by generating specialized versions of
each overloaded function. (In other words, by using a specialized form
of partial evaluation to eliminate the need for dictionaries.) In every
single case that I tried, specialization resulted in *smaller* programs:
although some functions were duplicated, others---which had previously
sat unreferenced inside dictionaries---could now be identified (and
hence removed) as dead code.
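To illustrate the idea (a hypothetical sketch, not the code from the
paper: the names 'double', 'doubleInt', and 'doubleDouble' are mine),
an overloaded function is ordinarily compiled to take an implicit
dictionary argument for its class constraint; specializing it at each
use type yields monomorphic versions that need no dictionary at all:

```haskell
-- Overloaded source version: under dictionary-passing, 'double'
-- receives a Num dictionary and looks up (+) inside it at run time.
double :: Num a => a -> a
double x = x + x

-- Specialized versions a compiler could generate instead: each one
-- calls the concrete addition directly, so no dictionary is passed
-- and unused dictionary entries become dead code.
doubleInt :: Int -> Int
doubleInt x = x + x

doubleDouble :: Double -> Double
doubleDouble x = x + x

main :: IO ()
main = do
  print (doubleInt 21)
  print (doubleDouble 1.5)
```

Once every call site uses a specialized version, the dictionaries
themselves are unreferenced and can be discarded, which is where the
size savings reported above come from.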
I also experimented to see what kind of effect specialization might
have if used to eliminate polymorphism. (You need to read the Technical
Report version of the paper for this.) The results of that suggested
that the resulting programs might contain perhaps twice as many
functions as the original. But note:
1) Polymorphic functions tend to be quite small (intuitively, the
bigger the code for a function gets, the more constrained its
type becomes). Because these are the functions whose bodies are
duplicated, a doubling in the number of functions may result in
something much less than a doubling in overall code size.
2) Small functions are obvious candidates for inlining, so worries
about code bloat may not be realized because the code for many
of these functions is already being duplicated in existing,
inlining compilers.
3) It may not be necessary to have distinct versions of the code
for a polymorphic function at every possible type; instead,
the implementations at different types could share the same
code. For example, if, at the lowest level, integers and
list values are represented as 32 bit quantities, then the
implementations for the identity function on Int and for the
identity function on lists could well be the same. In other
words, we might only need to index implementations on the size
of polymorphic type parameters, and not on the parameters
themselves. This, I believe, would address the problems that
Lennart described involving polymorphic recursion. It might
also help to solve the separate compilation issue.
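A small sketch of point 3 (the names 'identity', 'idInt', and 'idList'
are mine, chosen for illustration): both instantiations below come from
the same polymorphic source, and since an Int and a list pointer are
both machine-word sized at run time, a compiler following this scheme
could back both with a single compiled body indexed only by the size of
the type parameter:

```haskell
-- One polymorphic source function.
identity :: a -> a
identity x = x

-- Two instantiations at types with the same low-level representation
-- (a word-sized value in each case); under size-indexed sharing these
-- would not need separate compiled code.
idInt :: Int -> Int
idInt = identity

idList :: [Int] -> [Int]
idList = identity

main :: IO ()
main = do
  print (idInt 5)
  print (idList [1, 2, 3])
```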
All the best,
Mark