Inferring from context declarations
Mark P Jones
mpj@cse.ogi.edu
Wed, 21 Feb 2001 15:59:44 -0800
| [One way to compile polymorphic code is to inline all uses of
| polymorphic values...]
|
| It would require keeping the bodies of all polymorphic functions in
| a form that allows instantiation in different modules. For example,
| ghc already requires too much memory (as much as a hundred MB for
| large modules), and specialization leads to code bloat, so in the
| current state of technology it's not advisable to always compile
| out polymorphism.
Separate compilation is an issue here, but I'm not at all sure that
code bloat would be a problem in practice. I say this on the basis
of some experiments that I did (a long time ago now), which you can
find described in my paper:
Dictionary-free Overloading by Partial Evaluation
http://www.cse.ogi.edu/~mpj/pubs/pepm94.html
Be sure to get the (longer) Yale Technical report version,
not the PEPM paper, which omits some specific material that
is relevant to this discussion.
The primary purpose of this paper was to investigate what would happen
if overloading was implemented by generating specialized versions of
each overloaded function. (In other words, by using a specialized form
of partial evaluation to eliminate the need for dictionaries.) In every
single case that I tried, specialization resulted in *smaller* programs:
although some functions were duplicated, others---which had previously
sat unreferenced inside dictionaries---could now be identified (and
hence removed) as dead code.
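To illustrate the idea (a hypothetical sketch, not the code from the
paper: the names 'double', 'doubleInt', and 'doubleDouble' are mine),
an overloaded function is ordinarily compiled to take an implicit
dictionary argument for its class constraint; specializing it at each
use type yields monomorphic versions that need no dictionary at all:

```haskell
-- Overloaded source version: under dictionary-passing, 'double'
-- receives a Num dictionary and looks up (+) inside it at run time.
double :: Num a => a -> a
double x = x + x

-- Specialized versions a compiler could generate instead: each one
-- calls the concrete addition directly, so no dictionary is passed
-- and unused dictionary entries become dead code.
doubleInt :: Int -> Int
doubleInt x = x + x

doubleDouble :: Double -> Double
doubleDouble x = x + x

main :: IO ()
main = do
  print (doubleInt 21)
  print (doubleDouble 1.5)
```

Once every call site uses a specialized version, the dictionaries
themselves are unreferenced and can be discarded, which is where the
size savings reported above come from.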
I also experimented to see what kind of effect specialization might
have if used to eliminate polymorphism. (You need to read the Technical
Report version of the paper for this.) The results of that suggested
that the resulting programs might contain perhaps twice as many
functions as the original. But note:
1) Polymorphic functions tend to be quite small (intuitively, the
bigger the code for a function gets, the more constrained its
type becomes). Because these are the functions whose bodies are
duplicated, a doubling in the number of functions may result in
something much less than a doubling in overall code size.
2) Small functions are obvious candidates for inlining, so worries
about code bloat may not be realized because the code for many
of these functions is already being duplicated in existing,
inlining compilers.
3) It may not be necessary to have distinct versions of the code
for a polymorphic function at every possible type; instead,
the implementations at different types could share the same
code. For example, if, at the lowest level, integers and
list values are represented as 32 bit quantities, then the
implementations for the identity function on Int and for the
identity function on lists could well be the same. In other
words, we might only need to index implementations on the size
of polymorphic type parameters, and not on the parameters
themselves. This, I believe, would address the problems that
Lennart described involving polymorphic recursion. It might
also help to solve the separate compilation issue.
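A small sketch of point 3 (the names 'identity', 'idInt', and 'idList'
are mine, chosen for illustration): both instantiations below come from
the same polymorphic source, and since an Int and a list pointer are
both machine-word sized at run time, a compiler following this scheme
could back both with a single compiled body indexed only by the size of
the type parameter:

```haskell
-- One polymorphic source function.
identity :: a -> a
identity x = x

-- Two instantiations at types with the same low-level representation
-- (a word-sized value in each case); under size-indexed sharing these
-- would not need separate compiled code.
idInt :: Int -> Int
idInt = identity

idList :: [Int] -> [Int]
idList = identity

main :: IO ()
main = do
  print (idInt 5)
  print (idList [1, 2, 3])
```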
All the best,
Mark